Stay Connected with the Boundless Blog

PgCon Notes 3 Open Source Conference On Open Source Mapping Software

One of the joys of an open source conference is meeting people face to face who you have only previously corresponded with via e-mail. PgCon this year I got to meet Oleg Bartunov and Teodor Sigaev, the team behind the PostgreSQL GiST and GIN indexes.

The GiST and GIN indexes are interesting beasts. Both offer APIs for data type authors (like the PostGIS team) to attach their particular type to an index strategy that makes sense for it. So the GiST index can be used to build an R-Tree pattern, but it can also be used to index arrays.

At PgCon this year, Oleg and Teodor presented their initial findings on a new generalized index, that they are calling “SP-GiST” for “Spatial Partitioning Generalized Search Tree”.

Spatial partitioning is an important topic for GIS data, because GIS data tends to be queried in spatially consistent groups. When data is spatially partitioned, objects in the same index node tend to be close together. The net result should in theory be faster retrieval times for spatial queries. Oleg and Teodor implemented a very basic SP-GiST with a quadtree binding and ran performance tests to see how it fared.

Using a GIS data set (the GeoNames corpus of 2 million points), and comparing a GiST R-Tree to an SP-GiST quad-tree, they found the SP-GiST quad-tree built 3 times faster and fulfilled bounding box queries 6 times faster.

In case you skipped that last paragraph: the SP-GiST quad-tree was six times faster.

The demonstration code was suitable for the performance test, but is not ready for production. It needs to use the PostgreSQL write-ahead log for durability, to support concurrency for transactions, to support vacuuming so it doesn’t bloat over time, and to use a better fill algorithm for database pages.

However, with funding the work could be ready to be part of the PostgreSQL 9.2 development cycle which will likely release to production in the fall of next year. If you’re interested in seeing your index queries run 6 times faster in PostGIS, then contact me to discuss sponsoring this development.