RDS is database-in-the-cloud with all the annoying, repetitive administrative tasks taken care of: automatic backups, restore, replication, replication across availability zones, and I/O provisioning. You can build your own version of RDS using just vanilla cloud servers and your brain, but you’ll spend a lot of time on it, and it still might not work as well.
The first version of RDS was MySQL-based, and Amazon has since added support for Oracle and SQL Server, but the arrival of Amazon RDS for PostgreSQL is a big deal, because it means PostGIS is now an RDS option too.
Using Amazon RDS for PostgreSQL and OpenGeo Suite on AWS Marketplace, setting up a cloud web service is entirely a push-button affair.
Launching Amazon RDS for PostgreSQL
Getting the PostgreSQL instance up and running is almost comically easy.
- Log into AWS (you’ll need to set up an account) and go to the RDS section of the console. Setting up an instance only requires a button click.
- Select a PostgreSQL instance type, and then choose details of the deployment. A big instance? Replicate across zones? Apply minor upgrades (do this always)?
- Set up security and database. For new users, the only tricky part of AWS servers and databases is security. “VPC”s are “virtual private clouds”, network segments that are kept isolated from the world by firewall rules. If you run your database and servers in the same VPC, they can easily connect to each other.
- And then the database starts up!
Configuring Security & Adding Data
Pretty incredible, and particularly gratifying to get up-and-running with a fully backed-up system in just a handful of mouse clicks. But an empty running database holds limited interest… what can it do with data?!?
In order to load data directly from your home/office computer to the RDS instance, you’ll have to open up connections into the virtual private cloud for your home network. Add a rule to the VPC “security group” assigned to your RDS instance, allowing your computer access to port 5432 (the PgSQL port).
Once the rule was changed, I could connect easily the RDS instance using my favorite command-line SQL tool,
psql --host boundless.cyvm9oy2wq1b.us-west-2.rds.amazonaws.com
I loaded two tables of spatial data, an electoral districts (ed) file of 85 multipolygons and a census distribution area (da) file of 55529 multipolygons. Then I did a full spatial join of the two.
SELECT ed.edname, Sum(da.popn) AS popn
JOIN da ON ST_Intersects(ed.geom, da.geom)
WHERE ST_Contains(ed.geom, ST_Centroid(da.geom))
GROUP BY ed.edname
ORDER BY ed.edname;
The RDS instance took about 13 seconds. My MacBook laptop (1.8Ghz) took 28 seconds. So there’s some decent CPU horsepower in these AWS instances.
Since the RDS instance was accessible from my desktop computer, I decided to see if I could connect to it and view data from QGIS.Yep, no problem!
Connecting to OpenGeo Suite on AWS
Finally, I wanted to see how OpenGeo Suite performed accessing the RDS instance. I started OpenGeo Suite on an AWS instance and ensured it was a member of the VPC my RDS was in.
Suite connected to the RDS instance no problem, and I was able to quickly point and click to configure my big census table as a layer.
Speed was great for zoomed in rendering. For zoomed out rendering, where the whole 50Mb file needed to be scanned and rendered, things slowed down a bit, which is to be expected when the data and renderer are running on separate instances with a network interconnect.
The future is certainly cloudy!
With Amazon Web Services, I had a backed-up and replication-ready database up and running in under an hour, and OpenGeo Suite tied into it in another hour. With the knowledge I’ve gained, next time will take only a few minutes.