Stay Connected with the Boundless Blog

OGR Foreign Data Wrap With PostgreSQL

Merry Christmas, nerds! Have you ever wanted to connect your PostGIS database directly to an existing GIS file or database and read from it without importing the data. I have good news, repeat after me: OGR FDW FTW!

(How did this happen? Boundless provides its engineers with “innovation time” to pursue personal technical projects, and this year I chose to blow all my time in one go on this project. Like the idea of innovation time? Join us!)

OGR, is the vector subsystem of the GDAL open source format library. The OGR API lets applications read and write to many different formats (Shapefile, Oracle, SDE, MapInfo, PostGIS, FGDB, GML, etc) without having to import them first.

FDW, is the PostgreSQL API for implementing “foreign data wrappers”: virtual tables in the database that are actually connections to remote data files, repositories and servers. There are FDW implementations to connect to MySQL, Oracle, SQLite, and even flat text files!

FTW, is “for the win”! Because the OGR API is a simple table-oriented read-write library that is practically begging to be hooked up to the PostgreSQL FDW system and expose OGR data sources as tables in PostgreSQL.

Here’s how it works.

First, go to the source code repository, build and install the ogr_fdw extension.

Next, create the ogr_fdw extension and the postgis extension.

CREATE EXTENSION postgis;
CREATE EXTENSION ogr_fdw;

Now create a “server”. For a database FDW, the server would be an actual database server somewhere. For the OGR FDW, a server is an OGR connection string: it could be a database server, a directory full of files, or (as in this case) a web service:

CREATE SERVER opengeo
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (
    datasource 'WFS:http://demo.opengeo.org/geoserver/wfs',
    format 'WFS' );

Now create a “foreign table”. This will look just like a table, to the database, but accessing it will actually create an access to the remote server.

CREATE FOREIGN TABLE topp_states (
  fid integer,
  geom geometry,
  gml_id varchar,
  state_name varchar,
  state_fips varchar,
  sub_region varchar,
  state_abbr varchar,
  land_km real,
  water_km real )
  SERVER opengeo
  OPTIONS ( layer 'topp:states' );

Now, treat the table like you would any other PostGIS table, and ask it a question in SQL:

SELECT st_area(geom::geography) 
FROM topp_states 
WHERE state_name = 'Missouri';

And the answer comes back: 180863 sq km.

How does it work? The PostgreSQL query fires off an OGR query to the server (in this case, the OpenGeo Suite demo server) which pulls the table down, and it is then filtered and calculated upon locally in PostgreSQL.

Could it be better? Sure!

It could push SQL restriction clauses down to the OGR driver, reducing the quantity of data returned to the server. For big tables, this will be very important.

It could restrict the number of columns it returns to just the ones needed for the query. This will make things a little faster.

It could allow read/write access to the table, so that INSERT, UPDATE and DELETE queries can also be run. This opens up a whole world of interoperability possibilities: imagine your database being able to directly edit a File Geodatabase on the file system? Or directly edit an ArcSDE server in a workgroup?

The biggest limitation of the PostgreSQL FDW system is that it requires a table definition before it can work, so you require a priori knowledge of the table structure to set up your tables. Because this just creates busywork, I’ve also bundled a utility program with the ogr_fdw extension: ogr_fdw_info. The utility will read an OGR data source and layer and return the SQL you need to create an FDW server and table for reading that layer.

Enjoy wrapping your foreign tables, and enjoy the holidays!