Stay Connected with the Boundless Blog

OpenGeo GeoScript in Action Part One: Exploring Data Subsets

This is the first post of a three-part series dedicated to showing the versatility and functionality of GeoScript. 

The rumors are true: GeoScript is pretty awesome. How awesome? We’ll let you be the judge. In this post we’ll focus on data exploration and show you how to create a few visualizations right from the GeoScript command line, but GeoScript can do much more than that. In subsequent posts we’ll build up enough code and data to create other GeoScript products, then we’ll show how to easily refactor this code into processing web services. If you’d like to follow along with this post please install GeoScript. For the purposes of these exercises make sure that you download the latest version. For our examples we’ll be using the Python flavor of GeoScript. Make sure to keep the API doc close at hand and don’t hesitate to experiment. If you get stuck, please contact us and we’ll try to help you out.

Getting the Source Data. We’ll be using solar resource data from the National Renewable Energy Lab. The data set contains direct normal irradiance (DNI) average values for the contiguous United States and Hawaii. If you want to find out more details should be contained within the metadata. You can download the entire data set to work through the examples below. Unzip to a directory, navigate to it, and fire up the GeoScript shell. (If you see some diagnostic messages along the way, just ignore them and remember that GeoScript is a work in progress.) Let’s get started!

Loading and Exploring Data. First, let’s load our two shapefiles into GeoServer:

>>> from geoscript.layer import Shapefile
>>> solar_all_poly = Shapefile("solar_dni_polygons.shp")
>>> states = Shapefile("usa_l48.shp")


What’s the structure (schema) of the ‘solar’ dataset and how many features does it contain?

>>> print(solar_all_poly.schema)
solar_dni_polygons [the_geom: MultiPolygon, ID: long, GRIDCODE: float, ANN_DNI: float, JAN: float, FEB: float, MAR: float, APR: float, MAY: float, JUN: float, JUL: float, AUG: float, SEP: float, OCT: float, NOV: float, DEC: float]
>>> print(solar_all_poly.count())
91439

If you want, you could print the entire ‘states’ dataset, but watch out though, as the result will be a pretty big data dump.

>>> print list(states.features())
[usa_l48.1 {the_geom: MULTIPOLYGON (((-77.05354916699991 38.8816935760002, -77.05099573299992 38.87975426100007, -77.04814371899994 38.87760644600007, -77.04687615699987 38.876303672999995, -77.04721013299991 38.87585838300009, ... ... ...

How about printing only the values of the STATE attribute? (You’ll have to hit Enter twice after this command.)

>>> for f in states.features(): print f["STATE"]
... 
District of Columbia
U.S. Virgin Islands
Indiana
Pennsylvania
Rhode Island
Illinois
...

Now, let’s draw the ‘states’ layer. Nothing fancy for now, but let’s ensure correct image proportions by applying the ‘state’ data’s inherent aspect ratio:

>>> from geoscript.render import draw
>>> aspect_ratio = states.bounds().aspect
>>> draw(states, size=(int(200 * aspect_ratio),200))

 

States layer
States layer

A Bit of Processing: Clipping. In the next few steps, we will clip our ‘solar’ layer to a single state. First, let’s decide which state we are interested in:

>>> state = "Florida"

Now, we will create a workspace to hold intermediate processing results in memory, otherwise they will be saved to disk:

>>> from geoscript.workspace import Memory
>>> memory_workspace = Memory() 
>>> state_geom_temp = memory_workspace.add(states)

And then apply a filter to select the state we want and extract its geometry:

>>> state_geom = state_geom_temp.filter("STATE='%s'"%state).features().next().geom

We’re set to run the Clip process. Before we do though, let’s take a look at all Processes available right here in the GeoScript shell. (If you want to jump ahead a little, check out how the same Clip process works as a Web Processing Service.)

>>> from geoscript.process import Process
>>> print list(Process.list())
[(u'geo', u'length'), (u'geo', u'isEmpty'), (u'geo', u'contains'), (u'geo', u'disjoint'), (u'geo', u'intersects'), (u'geo', u'isClosed'), (u'geo', u'isValid'), (u'geo', u'buffer'), (u'geo', u'getY'), (u'geo', u'getX'), ... ... ...
>>> process_clip = Process.lookup("vec:Clip")

Which parameters does the Clip process expect and what does it return?

>>> print "Clip inputs: " + str(process_clip.inputs) 
Clip inputs: {u'clip': (u'clip', <type 'com.vividsolutions.jts.geom.Geometry'>, Geometry to use for clipping (in same CRS as input features)), u'features': (u'features', <class 'geoscript.layer.layer.Layer'>, Input feature collection)}
>>> print "Clip outputs: " + str(process_clip.outputs)
Clip outputs: {u'result': (u'result', <class 'geoscript.layer.layer.Layer'>, Clipped feature collection)}

Now that we know what to feed into the Clip process and what to expect in return, let’s execute it. Please be patient, this process may take a bit longer than the previous commands.

>>> solar_state = memory_workspace.add(process_clip.run(clip=state_geom, features=solar_all_poly).get("result"))

Let’s check how many features were returned and draw the clipped output:

>>> print solar_state.count()
2294
>>> draw(solar_state)
fla_state
Solar resource data clipped to the bounds of Florida

Creating a Histogram. Let’s say we are interested in the distribution of the total solar energy over the course of a year. A histogram is a simple way to visualize this distribution. We can build out histogram for the entire extent of our solar resource data, or for the single state we clipped out in the previous step. Let’s stick with the single state for now.

>>> aoi = solar_state
>>> from geoscript.plot import bar
# create a list of attributes containing months
>>> months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
# compute solar power potential totals for each month
>>> month_values_per_feature = [[f.get(mo) for mo in months] for f in aoi.features()]
>>> mo_totals = map(sum, zip(*month_values_per_feature))
# assign an index to each month starting at 1
>>> mo_numbers = range(1,len(months)+1)
# build data for the histogram, print it and draw
>>> histo_xy = zip(mo_numbers, mo_totals)
>>> print histo_xy
[(1, 3016.6060161298988), (2, 3502.787616071399), (3, 3869.839870968297), (4, 3800.790388888799), (5, 3838.1435860215997), (6, 3984.095752778796), (7, 3930.639688172497), (8, 3797.606723118299), (9, 3542.7847666671996), ... ... ... 
>>> bar.xy(histo_xy).show()
solar_fla
Distribution of the total solar energy resource over the course of an (average) year in Florida

Creating an Animation. We’ll now create an animated GIF to show how average intensity of solar resource potential varies month-by-month, over a year.

>>> from geoscript.style import * 
>>> from geoscript.render import GIF
>>> from geoscript.util import interpolate
# compute aspect ratio
>>> bbox_aspect = aoi.bounds().aspect
# compute classes based on global minmax across all months
>>> classes = 100
>>> minmax = zip(*[aoi.minmax(m) for m in months])
# interpolate using exponential function
>>> vals = interpolate(min(minmax[0]), max(minmax[1]), classes, "exp")
>>> print vals
[3.4250456989, 3.44100384308852, 3.457216649642982, 3.4736881825021397, 3.4904225704576706, 3.507424008189102, 3.5246967573152643, 3.5422451474625136, 3.560073577350009 ... ... ...
# initiate empty list for image frames and set color gradient bounds for our symbols
>>> frames = []
>>> gradient_color_min = 'yellow'
>>> gradient_color_max = 'red'
>>> for m in months:

        # generate buckets for fills and strokes
...     fills = Fill(gradient_color_min).interpolate(Fill(gradient_color_max), classes)
...     strokes = Stroke(gradient_color_min).interpolate(Stroke(gradient_color_max), classes)

        # build styles
...     styles = map(lambda x: x[0] + x[1], zip(strokes, fills))
...     x = zip(zip(vals[:-1], vals[1:]), styles)
...     parts = map(lambda t: t[1].where( m + ' BETWEEN %f and %f' % (t[0][0], t[0][1])), x) 
...     style = reduce(lambda x,y: x+y, parts)

        # draw a map, apply the style, correct aspect ratio and save it as a frame
...     frames.append(draw(aoi, style, size=(int(400*bbox_aspect), 400), format='gif'))
...
# finally, generate the animation from the frames; 
# the image will be placed in the directory with your shapefiles
>>> GIF.animated(frames, 'solar.gif', delay=50, loop=True)
florida
Solar resource animation for Florida

Now that we’ve tested our logic on data for a single state, we can build an animation for the entire data set. To do this, simply reassign the aoi variable:

>>> aoi = solar_all_poly

and then re-run the animation generation steps. You should end up with something that looks like this:

solar
Solar resource animation for the contiguous USA

Stay tuned! Next time, we’ll five deeper into visualization and develop some processing services around this code and data.

Have you used GeoScript for a project? If so, please tell us about it. If not, what are you waiting for? Go give it a try! My special thanks for Justin Deoliveira and Mike Pumphrey for handholding and sanity checks respectively.