Stay Connected with the Boundless Blog

QGIS and the Bay Area Bike Share Open Data Challenge

Gretchen PetersonThe San Francisco Bay Area Bike Share Open Data Challenge is now underway, with entries due April 25, 2014. The idea is to use their open data on bike stations, bicycle traffic patterns, and weather to create an interesting visualization, map, or other product that adds value to the program.

Using open source mapping tools is a great way to explore the data and create winning entries for the contest. For those who are new to making maps out of open data, we’re here to help you get started. In this tutorial we’ll show you how to use QGIS, a popular mapping software product, to create a simple map out of the data. Build on this foundation to create your own contest entries and learn about data and geospatial technology along the way.

To get started, download all four data files from the contest website here. After you unzip the data you’ll see that it’s in CSV format. This is a comma delimited text file format that’s useful for spreadsheets and geospatial tables.

Installing QGIS

First, download and install QGIS. Then install the OpenLayers plugin, which simplifies adding some of the most common base layers, such as Google Maps or OSM, and makes it easier to visualize the bike station locations. You can install it by opening the Plugin Manager, selecting the Get more section and then searching for OpenLayers.


Now in the Plugins menu you should have a new entry where you can select the layers to add.



Adding bike share data

Add whichever basemap you like. In the following screenshots, you will see that we’ve added the Bing Road layer, which is less saturated than some of the others. A less saturated basemap helps to highlight the data that will be overlaid. You can zoom into San Francisco now or wait until the bike share data is added. To create the bike share data overlay, use the Add Delimited Text Layer button.


Add the station data file using the browse button. Use the x and y drop down selectors under Geometry definition to tell QGIS which fields have the latitude coordinates and which have the longitude coordinates. Latitude is y and longitude is x.


Your input should look like the screenshot above. Press OK. In the Coordinate Reference System Selector, type in “4326” in the Filter box and select WGS 84 in the box directly beneath it. Many — but not all — datasets in open data formats are in the WGS 84, or EPSG 4326, coordinate system.


The QGIS map should now look similar to the screenshot below. If it isn’t zoomed in properly, you can right-click the station_data layer in the Layers list and choose Zoom to Layer Extent.


Joining and analyzing bike share data

There is a lot of data in the other three tables to explore but they need to be joined to the station data first since the station data contains the geometry for displaying the data on the map, while the other tables are related to the station data geometry via its station_id field. Fields that can be used for joining are often described in files that come with the data. The README.txt file that came with this data follows this convention.

In this tutorial we’ll use the trip_data table to perform an analysis and display the results on the map. First the trip_data table needs to be added to QGIS. Click Add Delimited Text Layer again, browse to the trip_data table, and choose “No geometry” next to Geometry layer definition. Press ok. The table is added to the Layers list in QGIS. Right-click the table name in the Layers list and click Open Attribute Table. You can see the data has loaded correctly. Notice that the station_id is used in the Start Terminal and End Terminal fields.


The average duration of a trip from each station is a good first analysis. To get the average duration we have to total up the durations of each trip by Start Terminal. This could be done in a spreadsheet program, exported as a CSV file, and then added into QGIS using the steps described above for loading non-spatial tables. Alternatively, we are providing, a script created that will do the calculation within QGIS.

In the Processing menu under Options and configuration, expand Scripts and view the folder path. This is the folder path in which to save the Python script. Once the script is saved to that path, restart QGIS.

Open the Processing Toolbox by clicking Processing > Toolbox. It will appear on the right-hand side of the QGIS window. Expand Scripts, Boundless, and double-click “avg.” Fill out the dialog with the following, making sure to save the table as a CSV file in the path of your choosing.


Now you can join the output table with the station data layer in order to visualize the average duration (in seconds) of trips from each station. Double-click the station_data layer in the Layers list to bring up the Properties window. Choose Joins and click the green plus sign near the bottom of the window. Pick the table from the list that contains the average data and the field that has the station ID number. If you used the QGIS script, these will be “output_table” and “class.” The Target field is “station_id.”


Now you can look at the attribute table for the station_data to make sure the join worked properly. If it did, the fields from station_data are now in the table. (If the fields are added to the table but the cells are populated with NULL values, the wrong id field was used in the join process.)

To visualize the duration field, double click the station_data in the Layers list to open the Layer Properties and choose Style. Choose Graduated, output_table_avg for the column to style, and change the color in the color ramp as per your preference. Change the mode to Natural Breaks and press ok. (Choosing a mode that makes sense for the data and for the map is an important part of the analytical process. Here is more information on modes).SmallScale1.png

Zoom in to the denser section to see that data more clearly. Enlarge the circles by double clicking station_data, Style, click the change symbol button and change the size to 3. Click OK twice.


The trips in San Francisco appear to be shorter than the trips in Redwood City. Hopefully this tutorial on using QGIS with the Bay Area Bike Share open data provides a springboard for contest entrants. Good luck!

Interested in QGIS? Learn more at the first QGIS user group meetup in the United States on Friday, April 11!