NYC Citi Bike Analysis Using QGIS with Boundless

Victor Olaya

Citi Bike, the bike share system in New York City, provides some interesting data that can be analyzed in many different ways. We liked this analysis from Ben Wellington, performed using IPython and presented using QGIS. Since we wanted to demonstrate the power of QGIS, we decided to replicate his analysis entirely in QGIS and go a little further by adding some extra tasks. We automated the whole process, making it easy to add new layers corresponding to more recent data in the future.

Data can be found on the Citi Bike website and is provided as monthly files. We will show how to process one of them (February 2014) and then discuss how to automate processing for the whole set of files. We will also use a layer with the borough boundaries from the NYC Department of City Planning.

Getting started

Open both the table and the layer in QGIS. Tables open as vector layers and are handled in much the same way, except that they lack geometries. This is what the table with trip data looks like.


We have to process the data in order to create a new points layer, with each point representing the location of an available bike station and having the following associated values:

  • Median rider age
  • Median trip duration
  • Percentage of male users
  • Percentage of subscribed users
  • Total number of outgoing trips

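The logic of that summary can be sketched in plain Python, independent of QGIS. The field names used here ('start station id', 'tripduration', 'birth year', 'gender', 'usertype') follow the column names of the Citi Bike trip CSVs, but treat them as assumptions and check them against your downloaded file:

```python
from statistics import median

def summarize_trips(trips, year=2014):
    """Aggregate Citi Bike trip records into per-station statistics.

    `trips` is an iterable of dicts with (assumed) Citi Bike CSV fields:
    'start station id', 'tripduration' (seconds), 'birth year',
    'gender' ('1' = male) and 'usertype' ('Subscriber' or 'Customer').
    """
    # Group trips by their start station
    by_station = {}
    for t in trips:
        by_station.setdefault(t['start station id'], []).append(t)

    stats = {}
    for station, rows in by_station.items():
        n = len(rows)
        # Birth year can be missing in the source data; skip those rows
        ages = [year - int(r['birth year']) for r in rows
                if r['birth year'] not in ('', r'\N')]
        stats[station] = {
            'median_age': median(ages) if ages else None,
            'median_duration': median(int(r['tripduration']) for r in rows),
            'pct_male': 100.0 * sum(1 for r in rows if r['gender'] == '1') / n,
            'pct_subscribers': 100.0 * sum(1 for r in rows
                                           if r['usertype'] == 'Subscriber') / n,
            'trips_out': n,
        }
    return stats
```

The actual Processing script additionally joins each station id to its coordinates to build the point layer; this sketch only covers the aggregation step.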
Computing a new layer

As we did in a previous bike share post, we can use a script to compute this new layer. We can add a new algorithm to the Processing framework by writing a Python script. This will make the algorithm available for later use and, as we will see, will integrate it with all the Processing components.

You can find the script here. Add it to your collection of scripts and you should see a new algorithm named “Summarize Citi Bike data” in the toolbox.

Double-click on the algorithm name to execute it and enter the table with the trip data as input.


Running the algorithm will create a new layer with stations, containing the computed statistics for each of them in the attributes table.


To create an influence zone around each point, we can use the Voronoi Polygons algorithm.


We are using a buffer zone so the polygons cover all of Manhattan; otherwise, they would only cover the area within the convex hull of the station points. Here is the output layer.


The final step is to clip the polygons with the borough boundaries layer, removing the areas that overlap water. We will use the Clip algorithm, resulting in this:


Visualizing the data

We can now change the style and set a color ramp based on any of the variables that we have computed. Here is one based on the median trip duration.


Up to this point, we have replicated the result of the original blog entry, but we can go a bit further rather easily.

Creating a model

For instance, suppose that you want to do the same calculation for other months. The simplest approach would be to open all the corresponding tables and re-run all the above steps for each of them. However, it would be better to put all of those steps into a single algorithm that computes the final polygons from the input table. We can do that by creating a model.

Models are created by opening the graphical modeler and adding inputs and algorithms to define a workflow. A model defining the workflow that we have followed would look like this.


In case you want to try it yourself, you can download it here. Now, for a new set of data (a new input table), you just have to run a single algorithm (the model that we have just created) in order to get the corresponding polygons. The parameter dialog of the model looks like this:



Batch processing and other options

This, however, might become a lengthy operation once we have more than a few tables to process, so we can add some additional automation. The model we have created can be used just like any other algorithm in Processing, meaning that we can use it in the Processing batch interface. Right-clicking on the model and selecting “Execute as batch process” will open the batch processing dialog.


Just select the tables to process and the output filenames in the corresponding columns, and the whole set of resulting layers will be computed in a single execution.
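Filling the batch table by hand is also easy to script: a run over a whole year only needs the list of monthly input tables and matching output filenames, which a model run (or the batch dialog) can then consume. A small sketch, where the file naming pattern is an assumption to be adapted to your actual downloaded files:

```python
def batch_pairs(year, months, in_dir='data', out_dir='results'):
    """Build (input table, output layer) path pairs for a batch run.

    The Citi Bike file naming used here is hypothetical; adjust the
    pattern to the names of the monthly files you downloaded.
    """
    pairs = []
    for m in months:
        table = f'{in_dir}/{year}-{m:02d}-citibike-tripdata.csv'
        output = f'{out_dir}/stations-{year}-{m:02d}.shp'
        pairs.append((table, output))
    return pairs
```

Inside the QGIS console, each pair could then be fed to the model with a single Processing call per month instead of filling the batch table manually.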

With a bit of additional work, these layers can be used, for instance, with the TimeManager plugin to create an animation, which helps show how bike system usage varies over the course of the year.

Other improvements can also be added. One would be to write a short script that creates an SLD file for each layer based on its data, adjusting the boundaries of the color ramp to the minimum and maximum values in the layer, or using some other criteria. That would give us a data-driven symbology, and the algorithm created with that script could be added as a new step in our model.
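The core of such a script, rescaling a color ramp to the layer's value range, can be sketched in plain Python. The class count and the yellow-to-red colors here are arbitrary choices; a full script would additionally wrap each class in an SLD rule element and write the file to disk:

```python
def interpolate_color(c0, c1, t):
    """Linearly interpolate between two RGB tuples, 0 <= t <= 1."""
    return tuple(int(a + (b - a) * t) for a, b in zip(c0, c1))

def ramp_classes(vmin, vmax, n=5, c0=(255, 255, 178), c1=(189, 0, 38)):
    """Split [vmin, vmax] into n classes with interpolated colors.

    Returns (upper_bound, '#rrggbb') pairs that a script could turn
    into graduated-symbology SLD rules for GeoServer.
    """
    step = (vmax - vmin) / float(n)
    classes = []
    for i in range(n):
        rgb = interpolate_color(c0, c1, i / float(n - 1))
        classes.append((vmin + (i + 1) * step, '#%02x%02x%02x' % rgb))
    return classes
```

Running this against each layer's actual minimum and maximum gives every month its own properly scaled ramp instead of one fixed legend.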

Publishing to OpenGeo Suite

Another improvement that we can add is to link our model to the publishing capabilities of the OpenGeo Suite plugin. If we want to publish the layers that we create to a GeoServer instance, we can also do it from QGIS. Furthermore, we can call that functionality from Processing, so publishing a layer and setting a style can be included as additional steps of our model, automating the whole workflow.

First, we need a script that uploads the layer and a style to GeoServer by calling the OpenGeo Suite plugin API. The script looks like this:

##Import styled layer to GeoServer=name
##Layer=vector

from qgis.core import *
from PyQt4.QtCore import *
import processing
from opengeo.qgis.catalog import createGeoServerCatalog

# Connection details are hardcoded; adapt them to your own server.
# (These values are the stock GeoServer defaults; the workspace name
# is just an example.)
url = 'http://localhost:8080/geoserver'
user = 'admin'
password = 'geoserver'
workspace = 'citibike'

# 'Layer' is the vector layer input declared in the header above
layer = processing.getObject(Layer)
catalog = createGeoServerCatalog(url, user, password)
ws = catalog.catalog.get_workspace(workspace)
catalog.publishLayer(layer, ws)

You can copy that code and create a new script or you can install this script file.

A new algorithm is now available in your toolbox: “Import styled layer to GeoServer”.


To edit the model we created, right-click on its name and select “Edit model”.

You can now add the import script algorithm to the model, so it takes the resulting layer and imports it.


Notice that, although our script needs the URL and the other parameters for the import operation, these are not requested from the user when running the model, since we have hardcoded them, assuming that we will always upload to the same server. This can, of course, easily be changed to adapt to a given scenario.
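If we did want the model to ask for those values instead, the QGIS 2.x Processing script syntax supports it directly: each `##name=string` line in the header becomes an input in the algorithm's dialog, available in the script body as a variable of the same name. A sketch of an alternative header (the default URL shown is just the stock GeoServer address):

```python
##Import styled layer to GeoServer=name
##Layer=vector
##url=string http://localhost:8080/geoserver
##user=string admin
##password=string
##workspace=string
```

With this header, the hardcoded assignments in the script body would be removed, and the values would flow in from the model's own parameter dialog.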

Running the algorithm will now compute the polygons from the input table, import them into GeoServer, and set a given style, all in a single operation.

The style is selected as another input of the model and has to be provided as an SLD file. You can easily generate an SLD file from QGIS by defining the style in the layer properties and then exporting it as SLD. The SLD produced by QGIS is not fully compatible with GeoServer, but the OpenGeo Suite plugin takes care of that before actually sending it to GeoServer.

Although we have left the style as an input that has to be selected on each execution, we could hardcode it in the model, or even in the import script itself. Again, there are several ways to build our scripts and models.

Of course, this improved model can also be run in batch processing mode, like we did with the original one.


QGIS is an ideal tool for working with and analyzing spatial data, and with our plugin it is also an easy interface for publishing data to OpenGeo Suite. Integrating Processing with the OpenGeo Suite plugin enables all sorts of automated analysis and publishing workflows, allowing complete workflows to be executed from a single application.