Stay Connected with the Boundless Blog

Make HTTP Requests From Inside SQL Statements For PostgreSQL

I’m not sure if this falls into the category of “world changing innovation” or “stupid pet tricks”, probably the latter, but earlier this week I published the first revision of a library that lets you make HTTP requests from inside SQL statements.

SELECT status, content_type, content FROM http_get('http://localhost');

 status | content_type |                   content
    200 | text/html    | <html><body><h1>It works!</h1></body></html>
(1 row)

What good is it? Well, the first non-silly use case I came up with was setting up a geocoding hook so that you can call a geocoding web service as new addresses are added to a table, and this morning I thought I would write it up to see what it looked like.

The Google Geocoding API takes in URLs that look like this:

And returns results that look like this:

<?xml version="1.0" encoding="UTF-8"?>
  <formatted_address>1234 Main St, Evanston, IL 60202, USA</formatted_address>
   <long_name>Main St</long_name>
   <short_name>Main St</short_name>

The HTTP library can get us the content, and we just need to extract the coordinates. There are all sorts of interesting text processing functions in PostgreSQL, including a full regular expressions system, but for maximum nerdiness it makes sense to process the XML document with the native PostgreSQL XML support (you will need to have your PostgreSQL compiled with XML support for this to work).

Here’s a function that takes in a street address, calls the geocoder, strips out the coordinates and constructs a PostGIS geometry return value.

CREATE OR REPLACE FUNCTION address_to_geom(text)
RETURNS geometry
WITH http AS (
  SELECT status, content_type, content::xml
  FROM http_get('' || replace($1, ' ', '+'))
parts AS (
    status, content_type,
    (xpath('//location/lat/text()',content))[1] AS lat,
    (xpath('//location/lng/text()',content))[1] AS lng
  FROM http
  ST_SetSRID(ST_MakePoint(lng::text::float8, lat::text::float8),4326) AS geom
FROM parts


I’m using my new favorite PostgreSQL syntactic sugar, the “WITH” keyword, to order the subqueries from top to bottom so you can read them serially.

First we run the HTTP call, lightly URL-encoding the input address before appending it to the base URL. (Future enhancement, do a real URL encoding.)

Then we extract the coordinates from the XML, using an XPath query! Note that we’re only extracting the first result, but with XPath we could extract them all into arrays if we wished. (Future enhancement, watch the return status and check for web service error conditions at this step.)

Finally we convert the coordinates into a geometry, casting the coordinates from XML to text to float, and then setting the SRID to 4326, since Google only works in WGS84.

Now that we have the function, writing a trigger is trivial.

CREATE TABLE addresses (
  address VARCHAR,
  geom GEOMETRY(Point, 4326)

CREATE OR REPLACE FUNCTION update_address_geometry()
  NEW.geom = address_to_geom(NEW.address);
LANGUAGE plpgsql;

CREATE TRIGGER address_trigger
  ON addresses
  FOR EACH ROW EXECUTE PROCEDURE update_address_geometry();

Now every time an address is inserted or updated, the associated geometry will be looked up in the geocoding service automagically.  What could possibly go wrong? Well, if the Google web service is slow, each insert and update will take as long as the web service, which would not be good in a high-volume environment, to say the least. And as noted above, this example includes no error handling at all. However, it’s a cute trick and I’m sure there will be more to come.

The reason I wrote this little library is to allow PostgreSQL and PostGIS to talk to GeoWebCache directly, so that edits at the database level can inform the caching layer that the cache needs to be updated where the edits have occurred. This is important for any system that uses cached tiles on top of live data. This update will be in the OpenGeo Suite 3.X series, which is coming later this year.