HTTP for PostgreSQL

I’m not sure if this falls into the category of “world changing innovation” or “stupid pet tricks”, probably the latter, but earlier this week I published the first revision of a library that lets you make HTTP requests from inside SQL statements.

SELECT status, content_type, content FROM http_get('http://localhost');

 status | content_type |                   content
--------+--------------+----------------------------------------------
    200 | text/html    | <html><body><h1>It works!</h1></body></html>
(1 row)

What good is it? Well, the first non-silly use case I came up with was setting up a geocoding hook so that you can call a geocoding web service as new addresses are added to a table, and this morning I thought I would write it up to see what it looked like.

The Google Geocoding API takes in URLs that look like this:

http://maps.googleapis.com/maps/api/geocode/xml?sensor=false&address=1234+main+st+chicago

And returns results that look like this:

<?xml version="1.0" encoding="UTF-8"?>
<GeocodeResponse>
 <status>OK</status>
 <result>
  <type>street_address</type>
  <formatted_address>1234 Main St, Evanston, IL 60202, USA</formatted_address>
  <address_component>
   <long_name>1234</long_name>
   <short_name>1234</short_name>
   <type>street_number</type>
  </address_component>
  <address_component>
   <long_name>Main St</long_name>
   <short_name>Main St</short_name>
   <type>route</type>
  </address_component>
  ......
  <geometry>
   <location>
    <lat>41.6372458</lat>
    <lng>-87.5860067</lng>
   </location>
  </geometry>
  <partial_match>true</partial_match>
 </result>
</GeocodeResponse>

The HTTP library can get us the content, and we just need to extract the coordinates. There are all sorts of interesting text processing functions in PostgreSQL, including a full regular expressions system, but for maximum nerdiness it makes sense to process the XML document with the native PostgreSQL XML support (you will need to have your PostgreSQL compiled with XML support for this to work).

Here’s a function that takes in a street address, calls the geocoder, strips out the coordinates and constructs a PostGIS geometry return value.

CREATE OR REPLACE FUNCTION address_to_geom(text)
RETURNS geometry
AS
$$
WITH http AS (
  SELECT status, content_type, content::xml
  FROM http_get('https://maps.googleapis.com/maps/api/geocode/xml?sensor=false&address=' || replace($1, ' ', '+'))
),
parts AS (
  SELECT
    status, content_type,
    (xpath('//location/lat/text()',content))[1] AS lat,
    (xpath('//location/lng/text()',content))[1] AS lng
  FROM http
)
SELECT
  ST_SetSRID(ST_MakePoint(lng::text::float8, lat::text::float8),4326) AS geom
FROM parts

$$
LANGUAGE 'SQL';

I’m using my new favorite PostgreSQL syntactic sugar, the “WITH” keyword, to order the subqueries from top to bottom so you can read them serially.

First we run the HTTP call, lightly URL-encoding the input address before appending it to the base URL. (Future enhancement, do a real URL encoding.)

Then we extract the coordinates from the XML, using an XPath query! Note that we’re only extracting the first result, but with XPath we could extract them all into arrays if we wished. (Future enhancement, watch the return status and check for web service error conditions at this step.)

Finally we convert the coordinates into a geometry, casting the coordinates from XML to text to float, and then setting the SRID to 4326, since Google only works in WGS84.

Now that we have the function, writing a trigger is trivial.

CREATE TABLE addresses (
  id SERIAL PRIMARY KEY,
  address VARCHAR,
  geom GEOMETRY(Point, 4326)
);

CREATE OR REPLACE FUNCTION update_address_geometry()
RETURNS TRIGGER AS
$$
  BEGIN
  NEW.geom = address_to_geom(NEW.address);
  RETURN NEW;
  END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER address_trigger
  BEFORE UPDATE OR INSERT
  ON addresses
  FOR EACH ROW EXECUTE PROCEDURE update_address_geometry();

Now every time an address is inserted or updated, the associated geometry will be looked up in the geocoding service automagically.  What could possibly go wrong? Well, if the Google web service is slow, each insert and update will take as long as the web service, which would not be good in a high-volume environment, to say the least. And as noted above, this example includes no error handling at all. However, it’s a cute trick and I’m sure there will be more to come.

The reason I wrote this little library is to allow PostgreSQL and PostGIS to talk to GeoWebCache directly, so that edits at the database level can inform the caching layer that the cache needs to be updated where the edits have occurred. This is important for any system that uses cached tiles on top of live data. This update will be in the OpenGeo Suite 3.X series, which is coming later this year.

EmailTwitterFacebookGoogle+tumblrLinkedIn

4 thoughts on “HTTP for PostgreSQL

  1. Hi Paul,
    Wow, your implementation of the Google Geocoding API looks great. That would actually exactly be what I need. I tried to make your HTTP client for PostgreSQL run on Windows, but haven’t succeeded. Could you give me a hint how to get it working on Windows 7? I use the latest 32bit version of PostgreSQL (9.1.3) and pgadmin.
    Thanks a lot for your help.
    Best,
    Johannes

  2. Hi!

    Awesome idea, keep it up!

    I’m not sure yet what to make of this, but together with https://wiki.postgresql.org/wiki/HTTP_API
    it might be funny or it might get scary :)

    Regarding your notes on timeout at https://github.com/pramsey/pgsql-http#why-this-is-a-bad-idea – timeout would be an important feature.
    On the command line, netcat’s timeout feature seems robust. Unfortunately it is not available as a library.

    (Uh, pretty hard captchas here combined with the duplicate message check)

  3. black magic!

    I have a big geocoding project coming up and I’m going to give this a shot…

  4. Hi,

    I wanted to use this with PostGreSQL 9.2, but it seems its not supported. Any plans of releasing a version with 9.2 support.

Comments are closed.