GeoGit for Python: Announcing geogit-py

Victor Olaya

At Boundless, we usually describe GeoGit not as an application itself, but as a library. We see it as a basic component of geospatial data management on top of which other applications can be built. While GeoGit currently has a command-line interface (CLI), adding new ways of interacting with GeoGit will increase the possibilities for creating new projects that rely on GeoGit to manage changes to geospatial data. We hope to see GeoGit as the core of an ecosystem of tools that solve a variety of problems.

We have started developing geogit-py, a Python library that makes it much easier to create GeoGit-based applications. Since Python is a widely used scripting language, this will allow other developers to add GeoGit-based capabilities to many other applications. In fact, we are already using it to create a plugin that brings GeoGit's versioning capabilities to QGIS.

Basic GeoGit Automation

The geogit-py library will also make it easier to automate tasks when working with a GeoGit repository, since all the features of the Python language can be used alongside GeoGit methods. This benefits all GeoGit users, especially those whose workflows can be partially or fully automated.

Here are some examples to provide an idea of what using the library is like. A basic workflow should start with something like this:

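# Imports (module paths assumed from the geogit-py source tree; adjust if they differ)
from geogitpy import geogit
from geogitpy.repo import Repository

# Create repo
repo = Repository('path/to/repo/folder', init=True)

# Configure
repo.config(geogit.USER_NAME, 'myuser')
repo.config(geogit.USER_EMAIL, '')

# Add some data and create a snapshot
repo.importshp('path/to/layer.shp')  # import method name assumed; check the Repository API
repo.addandcommit('first import')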

You can automate this first step to import a set of layers, creating a separate snapshot for each one. Assuming that we have a set of shapefiles in a folder, the following code does it:

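import os
folder = 'path/to/shapefile/folder'  # hypothetical path

for f in os.listdir(folder):
    if f.endswith('.shp'):
        path = os.path.join(folder, f)
        repo.importshp(path)  # import method name assumed; see geogit-py's Repository
        repo.addandcommit('Imported ' + f)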
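The loop above can be factored into a reusable helper. The sketch below uses only the standard library and assumes nothing about geogit-py beyond what the loop shows: the import and commit steps are passed in as callables (the parameters `import_layer` and `commit` and the function `import_each_layer` are hypothetical names), so you can plug in whichever geogit-py calls you actually use. It also sorts the file names, because `os.listdir` returns entries in arbitrary order and a predictable commit history is easier to review, and it matches the `.shp` extension case-insensitively.

```python
import os


def import_each_layer(folder, import_layer, commit):
    """Import every shapefile in `folder`, creating one snapshot per layer.

    `import_layer(path)` and `commit(message)` are hypothetical callables
    standing in for the geogit-py calls you would actually use.
    Returns the imported file names, in commit order.
    """
    imported = []
    # Sort so the commit order does not depend on the filesystem's listing order.
    for name in sorted(os.listdir(folder)):
        # Match the extension case-insensitively ('roads.SHP' is not unusual).
        if name.lower().endswith('.shp'):
            import_layer(os.path.join(folder, name))
            commit('Imported ' + name)
            imported.append(name)
    return imported
```

With a geogit-py repository, a call might look like `import_each_layer('path/to/shapefiles', repo.importshp, repo.addandcommit)`, where `importshp` is again an assumed method name.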

Editing Features

In a normal GeoGit workflow, you export data from a GeoGit repository, edit it with the tool of your choice (e.g. a desktop GIS such as QGIS), and then import the changed layer back. GeoGit computes the changes that have been introduced, and those changes are then used to create a new snapshot.
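To make the idea of "computing the changes" concrete, here is a conceptual sketch, not GeoGit's actual implementation. It assumes a layer can be modeled as a plain dict mapping feature IDs (such as `'parks/1'`) to attribute dicts, and compares the version that was exported with the version that was imported back. GeoGit uses its own object model and handles geometries natively; the dicts, the sample data and the `layer_diff` name are made up to keep the example self-contained.

```python
def layer_diff(before, after):
    """Classify feature IDs as added, removed or modified between two versions.

    `before` and `after` map feature IDs (e.g. 'parks/1') to attribute dicts.
    """
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    # A feature present in both versions counts as modified if any value differs.
    modified = sorted(fid for fid in set(before) & set(after)
                      if before[fid] != after[fid])
    return {'added': added, 'removed': removed, 'modified': modified}


# The layer as exported from the repository...
exported = {
    'parks/1': {'name': 'North Park', 'area': 1200},
    'parks/2': {'name': 'South Park', 'area': 800},
}
# ...and after editing it in a desktop GIS: parks/1 changed,
# parks/2 deleted, parks/3 added.
edited = {
    'parks/1': {'name': 'North Park', 'area': 1350},
    'parks/3': {'name': 'East Park', 'area': 400},
}

diff = layer_diff(exported, edited)
# diff['added'] == ['parks/3'], diff['removed'] == ['parks/2'],
# diff['modified'] == ['parks/1']
```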

With geogit-py, that export/edit/import approach is still possible, but you can also modify a feature directly, without exporting the layer first. Internally, geogit-py still calls the GeoGit import and export commands, but it wraps them in a more practical interface. This is also more efficient, since it does not import or export the whole layer. Here's an example.

# Take a feature and modify its geometry
feature = repo.feature(geogit.HEAD, 'parks/1')
geom = feature.geom
attributes = feature.attributesnogeom
newgeom = geom.buffer(5.0)

# insert the modified geometry and create a new snapshot with the changes
repo.insertfeature(feature.path, attributes, newgeom)
repo.addandcommit('modified parks/1 (buffer computed)')

In this case we have computed a buffer, but you can modify the geometry however you like. Geometries are Shapely objects, so you can use all the methods of that powerful library to work with them. You can also modify the non-geometry attributes of the feature (though this example does not).
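The example above leaves the non-geometry attributes unchanged. If you want to change them too, one approach is to build a modified copy of the attribute mapping before calling `insertfeature`. The sketch below assumes, without having checked it against geogit-py, that `attributesnogeom` behaves like a Python dict keyed by attribute name; the helper `with_changes` is hypothetical and uses only the standard library. Rejecting unknown names guards against a typo silently adding a new field.

```python
def with_changes(attributes, **changes):
    """Return a copy of `attributes` with some values replaced.

    Raises KeyError if a change names an attribute that does not exist,
    so a typo does not silently add a new field to the feature.
    """
    unknown = sorted(set(changes) - set(attributes))
    if unknown:
        raise KeyError('Unknown attributes: ' + ', '.join(unknown))
    updated = dict(attributes)  # shallow copy; the original mapping is untouched
    updated.update(changes)
    return updated


# Hypothetical usage with the objects from the previous example
# (assumes attributesnogeom is dict-like):
# attributes = with_changes(feature.attributesnogeom, name='Central Park')
# repo.insertfeature(feature.path, attributes, newgeom)
```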

Working with GeoGit Branches

Working with branches is also rather simple:

# Create a branch at the current HEAD commit to work on it
repo.createbranch(repo.head, 'mybranch', checkout = True)

# [...] Perform some work on the branch, modifying the repo data and creating new commits

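# Bring changes to master branch (which might itself have changes)
from geogitpy.geogitexception import GeoGitConflictException  # module path assumed
repo.checkout('master')  # checkout/merge method names assumed from geogit-py's Repository
try:
    repo.merge('mybranch')
    print('Merge correctly executed. No merge conflicts')
except GeoGitConflictException:
    print('Cannot merge. There are merge conflicts')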
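The snippet above anticipates that a merge can fail because of conflicts. A conflict arises when both branches have changed the same feature since they diverged, and in different ways, so there is no single obvious result. Here is a conceptual sketch of that three-way rule, comparing each side against the common ancestor; it is not GeoGit's merge implementation. As before, layers are modeled, as an assumption, as dicts mapping feature IDs to attribute dicts, with a deleted feature simply absent; `conflicting_features` and the sample data are hypothetical.

```python
_MISSING = object()  # sentinel: the feature does not exist in that version


def conflicting_features(base, ours, theirs):
    """Return sorted IDs of features that both branches changed differently.

    `base` is the common ancestor; `ours` and `theirs` are the two branch tips.
    Each maps feature IDs to attribute dicts; a missing ID means "deleted".
    """
    conflicts = []
    for fid in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(fid, _MISSING)
        o = ours.get(fid, _MISSING)
        t = theirs.get(fid, _MISSING)
        # A change on only one side (or the same change on both) merges cleanly.
        if o != b and t != b and o != t:
            conflicts.append(fid)
    return conflicts


base = {'parks/1': {'area': 1200}, 'parks/2': {'area': 800}}
master = {'parks/1': {'area': 1300}, 'parks/2': {'area': 800}}
mybranch = {'parks/1': {'area': 1350}, 'parks/2': {'area': 900}}
print(conflicting_features(base, master, mybranch))  # ['parks/1']
```

Here `parks/2` was changed only on `mybranch`, so it merges cleanly, while `parks/1` was edited differently on both branches and would need to be resolved by hand.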

Growing GeoGit

Although most of GeoGit's functionality is already available through geogit-py, some features are not yet supported. These are mostly convenience options that would be easy to implement, or that can be replicated with a few lines of Python code.

A side effect of developing geogit-py is that it has helped us improve GeoGit itself. Several new features have been added to GeoGit commands to allow better integration, and some new commands have even been implemented. Using GeoGit from geogit-py has given us more insight into how different applications and services can use GeoGit, and has helped us shape and improve it.

geogit-py includes a comprehensive test suite, which also serves as a real-world test suite for GeoGit itself, adding to GeoGit's already large collection of tests. We also plan to use geogit-py as part of our quality assurance, specifically for testing GeoGit and prototyping GeoGit use cases.

We will soon release some of the projects we are working on that rely on geogit-py. Stay tuned for further updates.