Stay Connected with the Boundless Blog

Making Geography Faster With OpenGEO Geospatial Software

When the “geography” type was added to PostGIS in version 1.5, I knew that one of the first things people would notice when they moved a process from geometry to geography was that geography was much slower.

The reason is two-fold. Firstly, geometry already has some fancy caching routines to allow caching of object indexes between spatial function calls. And secondly, the basic algorithms of computational geometry on the sphere are just much more expensive than those for the plane.

However, I knew that there was lots of room for improvement, by applying the same caching behaviour to geography that is used in geometry.  And fortunately, a client has been using PostGIS geography enough to require the kind of speed-up that a caching system will provide (10x to 20x faster).

The caching system looks at a function call (for example ST_Intersects(A, B)) and determines if either of the arguments is a repeat from the previous call. If it is, it builds an index over the object’s edges, which allows subsequent calculations to go much faster. Then it stores the index, in case the next call also has a repeat argument.

For cartesian geometry, the index of edges is based on bounding boxes around the edges. For spherical geometry, bounding boxes don’t make any sense, but bounding circles work very nicely.

Once you have a bounding circle for every edge, you can build a tree by merging adjacent circles into parents circles.

Continuing to merge each set until you have a single parent bounding circle.

The nice thing about bounding circles is they work the same regardless of where they are on the globe (they don’t care about datelines or poles) and the test for whether they contain a point, or intersect each other is relatively cheap (if the distance between the centers of two bounding circles is greater than the sum of their radii, they don’t intersect).

Using bounding circle trees and a new generic caching system (now the geography cache and the geometry cache use the same underlying cache control, which has cut down the amount of code substantially) we have been able to demonstrate a very substantial performance boost.

For example, calculating ST_Distance between Canada (3316 vertices) and Brazil (410 vertices) in a country boundary table. Using the traditional distance code, which compares every edge in Canada to every edge in Brazil (that’s 1359150 comparisons), takes 11940ms. Using the tree-based distance calculation takes 35ms. That’s 341 times faster. It’s an extreme case, because both objects are very large, but’s it’s indicative of the kinds of wins available with this improvement.

The speed difference for the distance between Germany (126 vertices) and Spain (130 vertices) is 6ms versus 138ms, over 20 times faster. Between Ireland (71 vertices) and Iceland (106 vertices) the difference is 6ms versus 60ms, 10 times faster.

This improvement has been applied to the following functions:

  • ST_Distance(geography, geography)
  • ST_DWithin(geography, geography, radius)

It could also be added to ST_Intersects(geography, geography) relatively easily.

A cartesian version of this code (using bounding rectangles instead of circles) could also be written, which would dramatically speed up ST_Distance(geometry, geometry) and ST_DWithin(geometry, geometry). The current geometry caching code only works for the boolean predicates (ST_Intersects, ST_Contains, ST_Within) so the geometry distance code could be very substantially sped up, if there is a client demand.

If you’re willing to fund faster distance calculations for the geometry type, get in touch.