Google & GeoServer Support Geospatial Big Data in the Cloud

Our friends over at CCRi released an exciting announcement today describing their collaboration with Google on the initial release of GeoMesa for Google Cloud Bigtable, creating a vastly scalable platform for geospatial analysis that leverages the cost effectiveness and management ease of the cloud.

If you aren’t familiar with GeoMesa, it’s an open-source toolkit that quickly stores, indexes, and queries hundreds of billions of geospatial features in a distributed database built on Apache Accumulo. GeoMesa leverages GeoServer for its spatial processing, and we’ve been working with CCRi for a while to combine the data management and publishing capabilities of OpenGeo Suite with the big data analytics capabilities of GeoMesa.

At the same time, Google today announced Google Cloud Bigtable: a fully managed, high-performance, extremely scalable NoSQL database service accessible through the industry-standard, open-source Apache HBase API. Under the hood, this new service is powered by Bigtable, the same database that drives nearly all of Google’s largest applications.

CCRi’s announcement means that GeoMesa is now supported on Google Cloud Bigtable. As noted in CCRi’s blog post, when using Google Cloud Bigtable to back GeoMesa, developers and IT professionals are freed from the need to stand up and maintain complex cloud computing environments. These environments are not only expensive to build, but they also require highly trained DevOps engineers to maintain and grow them as the data accumulates. Because GeoMesa supports Open Geospatial Consortium (OGC) standards, developers can easily migrate existing systems or build new systems on top of GeoMesa. Developers familiar with GeoServer or the OpenGeo Suite can use the GeoMesa plugin to add new data stores backed by Google Cloud Bigtable.

Let’s think for a moment about the opportunity here. Across the industry, organizations like CCRi are continuing to advance how spatial processing can be applied to big data stores (NoSQL, key-value, graph), and GeoMesa is a great example of this. I have also seen examples of OpenGeo Suite spatially enabling content in a speed layer of a Lambda architecture leveraging Apache Spark or Apache Storm. And while these advancements illustrate real added value, the infrastructure and knowledge needed to set up these architectures are not trivial. Leveraging capabilities like GeoMesa for Google Cloud Bigtable makes geospatial analytics with big data accessible to a much wider audience.

Considering a Hybrid Proprietary/Open-Source Architecture

A discussion I find myself having more and more with customers is how best to migrate to a hybrid architecture based on a combination of both proprietary and open-source technologies.  Customers have realized that building a platform with both proprietary and open-source tools can help an organization reduce risk and add value in several ways:

  • Avoiding Single Vendor Lock-in
  • Reducing Costs Associated with Licensing
  • Promoting Interoperability with Existing Software and Architecture

In your typical proprietary environment, beyond just software license costs there are additional, sometimes hidden, costs captured in the graphic below. While individual costs may be nominal, they can add up and ultimately affect the total cost of ownership of a solely proprietary solution.

[Graphic: notional costs of a proprietary deployment]

Customers are also realizing that hybrid architectures allow for more gradual, risk-appropriate migration strategies. In other words, you do not have to rip and replace all of your existing proprietary software with open-source software. Many times this is impossible due to specific feature limitations or a steep learning curve, or it is simply too cost prohibitive. So I encourage customers to consider implementing only portions of their architecture at a time, and only where it makes sense to do so.

Remember that the OpenGeo Suite includes software at the database, application server, and user interface tiers without strict dependencies between them. This means you can focus on integrating open source one tier at a time without interrupting the entire enterprise. I see many customers start at the database tier because changes there are largely ‘hidden’ from the end user. They are still using the same user interface they are accustomed to, but are in many cases unknowingly connecting to a different endpoint to retrieve their data. Everybody wins.

Still other organizations have realized that the best migration point is at the user interface tier.  They are not leveraging the value of expensive proprietary applications because their users require and use only a fraction of the potential capabilities.  In other words, they are paying for a Ferrari, when they could easily use a Vespa.  Hybrid migration strategies targeted at non-power users can quickly realize significant savings in license costs.

It is worth adding that adopting this hybrid approach early in the evolution of your architecture ensures more choices for migration and an overall cost savings. The old FRAM oil filter commercials of the 1970s just popped into my head: “You can pay me now, or you can pay me (a lot more) later.”

While the why of migrating to a hybrid architecture is generally understood, I tend to get a lot of questions from customers regarding the how.  Boundless architects will happily sit one-on-one with you to discuss the specifics of your migration, but this is also one area where Boundless Professional Services can greatly help.  Our expert technologists will work side-by-side with your team to guarantee that best practices are met at every phase of your project, and that you make the most of your investment in OpenGeo Suite and Boundless.  We’ve handled engagements of all sizes, and can tailor them to meet the needs of your organization.  Consider the following packages to help your migration to a hybrid architecture:

  • Migration Assessment: Most organizations we see are not starting from scratch. This package will capture details about your as-is state and where you want to go. Perhaps you are looking to migrate your database from Oracle to PostGIS, or move from ArcGIS Server to GeoServer. To ensure comprehensive coverage, we will document details about your current missions and business goals, legacy systems, users and workflows, the present costs of your software inventory, and any indirect infrastructure and software costs. Finally, as a best practice to ensure quality of communication, we will prepare a comprehensive report containing an executive summary, findings, a plan for incremental migration, and any relevant risk mitigation strategies.
  • OpenGeo Suite Pilot: Customers don’t always know the art of the possible and what they can actually achieve with OpenGeo Suite. Getting up and running with OpenGeo Suite can be as simple as running an installer, but what do you do from there? This package accelerates your understanding of Boundless capabilities and provides a picture of your future solution via hands-on activity. Whether you already have a geospatial technology legacy or are starting from scratch, we help you stand up a working demo, gain experience using your own data, and plan next steps.
  • Architecture & Design Review: The Architecture and Design Review is a tool for your team to discover your solution’s strengths and areas of improvement. During this engagement, our senior engineers review your requirements and will answer any questions you have at this critical phase. You will benefit from improved solution architecture, improved infrastructure design, and best practices most relevant to your solution giving your architects and developers confidence to embark on the implementation.
  • Scale Up & Out: Many customers are getting ready for the cloud or are looking to optimize OpenGeo Suite in an elastic environment. We can review and benchmark your spatial IT infrastructure, and give you the advice you need to parallelize, how to set up high availability and how to configure your services for maximum performance and fault-tolerance. This package is for those getting ready to run GeoServer and/or PostGIS in parallel clusters, and for those looking to squeeze more performance from their existing infrastructure. We will measure and benchmark as-is performance of your OpenGeo Suite deployment, diagnose and resolve performance bottlenecks and help you migrate to an improved configuration.

There are additional Professional Services packages available for review on our website http://boundlessgeo.com/solutions/professional-services/, and Boundless can work to customize or combine these packages to best fit your organization’s needs.

One final thought worth mentioning: many customers I have talked to think this migration will happen quickly, in a matter of weeks or even months. But the reality I’ve witnessed is that, depending on the complexity of your data, your current architecture, the availability of resources, and the end user applications, the process could take significantly longer. This is not necessarily a bad thing, and you can use it to your advantage. By completing your migration in phases you won’t shock your end users with a big change all at once. It also gives you plenty of room to adjust and adapt as challenges arise. You can see an example of a notional timeline below.
[Image: notional migration timeline]

Bottom line: a hybrid platform built on both proprietary and open source tools can help an organization reduce risk and add value in several ways. Boundless has experience implementing hybrid architectures of all sizes, and has a staff that can help assess your migration path as well. Packages from Boundless Professional Services are a great way to kick-start that migration and point you in the right direction from the start.

Advanced Styling with OpenLayers 3

As we see growth in adoption of OpenLayers 3, we get a lot of feedback from the community about how we can enhance its usefulness and functionality. We’ve been pleased in 2015 to release iterations of OL3 on a monthly basis, but here I’m going to highlight some great new functionality added by my colleague Andreas Hocevar late last year.

While styling in OpenLayers normally uses the geometry of a feature, when you’re doing visualizations it can be beneficial for a style to provide its own geometry. This means you can use OL3 to easily provide additional context within visualizations based on the style itself. As a very simple example, you can take a polygon and use this new feature to show all the vertices of the polygon in addition to the polygon geometry itself, or even show the interior point of the polygon.

As a visual:

[Image: a polygon rendered with its vertices shown as points]

To achieve the above effect, you can add a constructor option called geometry to ol.style.Style. This option can take a function that receives the feature as an argument, a geometry instance, or the name of a feature attribute. If it’s a function, we can, for example, get all the vertices of the polygon and transform them into a multipoint geometry that is then rendered with the corresponding style.

You can see sample code for this OpenLayers polygon-styles example at http://openlayers.org/en/master/examples/polygon-styles.html.

[Side note: Since the OpenLayers development team got together for a code sprint in Schladming, Austria, the polygon-styles example page now has a “Create JSFiddle” button (above the example code) which allows you to experiment quickly with the code from the OpenLayers examples. Thanks to the sprint team for adding this convenient functionality!]

Another example to connect this with more practical use cases: you can use this functionality to show arrows at the segments of a line string.
[Image: a line string with arrows rendered at each segment]

As before with the polygon-styles example, you can see what’s behind this line-arrows example at http://openlayers.org/en/master/examples/line-arrows.html.

Lastly, we’ve provided an earthquake-clusters example (reviewable at http://openlayers.org/en/master/examples/earthquake-clusters.html) showing off this new functionality with a slightly different twist. When you hover over an earthquake cluster, you’ll see the individual earthquake locations styled by their magnitude as a regular shape (star):
[Image: an earthquake cluster expanded into individual earthquakes styled as stars]

Please don’t hesitate to let Boundless know if you have any questions about how we did this in OL3, or any other questions you may have about OpenLayers or OpenGeo Suite!

 

MGRS Coordinates in QGIS

One of the main characteristics of QGIS, and one of the reasons developers like myself appreciate it so much, is its extensibility. Using its Python API, new functionality can be added by writing plugins, and those plugins can be shared with the community. The ability to share scripts and plugins in an open-source medium has caused QGIS functionality to grow exponentially.  The Python API lowers the barrier to entry for programmers, who can now contribute to the project without having to work with the much more intimidating core C++ QGIS codebase.
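To make this concrete, here is a minimal sketch of the structure QGIS expects from a Python plugin, based on the QGIS 2.x API of the era. The plugin name, file names, and menu label are all illustrative, and a metadata.txt file describing the plugin is also required:

# __init__.py -- the entry point QGIS looks for in every Python plugin
def classFactory(iface):
    from hello_plugin import HelloPlugin
    return HelloPlugin(iface)

# hello_plugin.py -- a tiny plugin that adds a single menu action
from PyQt4.QtGui import QAction

class HelloPlugin(object):
    def __init__(self, iface):
        self.iface = iface  # reference to the running QGIS interface

    def initGui(self):
        # Called when the plugin is loaded: register a menu entry.
        self.action = QAction("Say hello", self.iface.mainWindow())
        self.action.triggered.connect(self.run)
        self.iface.addPluginToMenu("&Hello Plugin", self.action)

    def unload(self):
        # Called when the plugin is unloaded: remove the menu entry.
        self.iface.removePluginToMenu("&Hello Plugin", self.action)

    def run(self):
        self.iface.messageBar().pushMessage("Hello from a QGIS plugin!")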

At Boundless, we have created plugins for QGIS such as the OpenGeo Explorer plugin, which allows QGIS users to interact with Suite elements such as PostGIS and GeoServer and provides an easy, intuitive interface for managing them.

Boundless is also involved in the development and improvement of core plugins (plugins that, due to their importance, are distributed by default with QGIS instead of installed optionally by the user). For instance, Boundless is the main contributor to the Processing framework, where most of the analysis capabilities of QGIS reside.

Although both Processing and the OpenGeo Explorer are rather large plugins, most of the plugins available for QGIS (of which there are currently more than a hundred) are smaller, adding just some simple functionality. That is the case with one of our latest developments, the mgrs-tools plugin, which adds support for using MGRS coordinates when working with a QGIS map.

The military grid reference system (MGRS) is a geocoordinate standard which permits points on the earth to be expressed as alphanumeric strings. QGIS has no native support for MGRS coordinates, so the mgrs-tools plugin fills a growing need for users of the standard.

Unlike other coordinate systems that are supported by QGIS, MGRS coordinates are not composed of a pair of values (i.e., lat/lon or x/y) but of a single value. For this reason, implementing support required a different approach.

We created a small plugin that has two features: centering the view on a given MGRS coordinate, and showing the MGRS coordinate at the current mouse position.

The coordinates to zoom to are entered in a panel at the top of the map view, which accepts MGRS coordinates of any degree of precision. The view is moved to that point and a marker is added to the map canvas.
[Image: the MGRS zoom-to panel and the marker added to the map canvas]

 

When the MGRS coordinates map tool is selected, the MGRS coordinates corresponding to the current mouse position in the map will be displayed in the QGIS status bar.
[Image: MGRS coordinates for the mouse position shown in the QGIS status bar]

Both of these features make use of the Python mgrs library, using it to convert the coordinates of the QGIS map canvas into MGRS coordinates or the other way around.
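For the curious, here is a small sketch of the round trip the plugin performs with the mgrs library. The coordinates are illustrative, and depending on the library version toMGRS() may return bytes rather than a string:

import mgrs

m = mgrs.MGRS()

# Latitude/longitude (WGS84) to an MGRS string; the plugin first
# transforms map canvas coordinates to WGS84.
coord = m.toMGRS(48.4284, -123.3656)
print(coord)

# ...and back from an MGRS string to a (lat, lon) tuple.
lat, lon = m.toLatLon(coord)
print(lat, lon)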

In spite of its simplicity, this plugin is of great use for all those working with MGRS coordinates, who until now had no way of using them in QGIS. New routines can be added to extend the functionality, and we plan to do that in the near future.

As you can see, creating Python plugins is the easiest and most practical way of adding new functionality to QGIS or customizing it. Through this extensibility, the QGIS community has reduced the barriers to solving new challenges. At Boundless, we use our extensive experience creating and maintaining QGIS plugins to provide effective solutions to our QGIS customers, and we provide training through workshops and training programs for those wanting to learn how to do it themselves. Let us know your needs and we will help you get the most out of QGIS.

(Note: The mgrs-tools plugin is currently available at https://github.com/boundlessgeo/mgrs-tools)

 

Connecting The “Dots” of Your Supply Chain With OpenGeo Suite

The value of leveraging GIS in your supply chain is well known: it helps you more effectively communicate the current state and relationships of your supply chain, detect events, model changes, and so on. OpenGeo Suite can readily enable organizations with supply chain requirements to use their data to visualize and analyze relationships between supply chain participants. As a sample proof exercise, I’m going to use OpenGeo Suite to identify supply lines between production plants and suppliers for visualization and analysis.

To begin, I have two separate shapefiles, one for the production plants and one for the suppliers. The production_plants layer contains a field that houses the supplier ID which will be used to join the two sets of data. 

Next we need to create the lines themselves. There are multiple methods available within QGIS to create the supply lines. For the purpose of this exercise, we will leverage the MMQGIS plugin and its Hub Lines tool.


In the dialog, choose the production_plants and suppliers layers along with the field that joins the two. In this case it is the supplier_id field in the Production Plants layer and the id field in the Suppliers layer. Choose the location where you would like the shapefile to be stored and click OK.

The tool generates the lines between plants and suppliers and adds the resulting layer to our layer list.

From here we can publish the shapefile to GeoServer or import it into our DB. 
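If you want to script the publishing step, GeoServer’s REST API accepts a zipped shapefile directly. Here is a minimal sketch in Python; the workspace, store name, credentials, and file name are all illustrative:

import requests

geoserver = "http://localhost:8080/geoserver"

# Upload a zipped shapefile (.shp/.shx/.dbf/.prj) to a datastore;
# GeoServer creates the store and publishes the layer in one step.
with open("supply_lines.zip", "rb") as f:
    resp = requests.put(
        geoserver + "/rest/workspaces/supply/datastores/supply_lines/file.shp",
        auth=("admin", "geoserver"),
        headers={"Content-type": "application/zip"},
        data=f,
    )
resp.raise_for_status()  # 201 Created means the layer was published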

This is a good methodology if the dataset is static, or if you are processing this for another user and sending them the data for further desktop analysis. However, your needs may not be so straightforward. What if we want to make this a more dynamic process? Maybe this dataset and its relationships change on a regular basis, or the relationship is defined in another system of record (BI, production management, etc.)?

One choice would be to script this process and run it on a regular interval that corresponds to the data update cycle (i.e., quarterly, yearly, etc.).

That works well for slowly changing data, but if we want to automate the line generation and see the changes each time we refresh our map, we can use the power of the database to perform this task for us.

Let’s look at one way to do this using a view. 

First, I’ve imported my production_plants and suppliers layers into the DB. Next we’ll create a view that generates our supply lines for us, and then register the view with GeoServer.

The SQL below joins the tables based on supplier_id and id, just like we did in QGIS.

CREATE OR REPLACE VIEW plant_supplier_lines AS
SELECT p.supplier_id,
    s.id,
    ST_MakeLine(p.geom, s.geom) AS geom
FROM production_plants AS p JOIN suppliers AS s
    ON p.supplier_id = s.id;


In addition to the automation, this method allows us to easily incorporate other attributes into the line. For example, if we update the view to include the number of shipments in transit, we can use that in the symbology.

CREATE OR REPLACE VIEW plant_supplier_lines_attribs AS
SELECT p.supplier_id,
   s.id,
   s.shipments_in_transit,
   ST_MakeLine(p.geom, s.geom) AS geom
FROM production_plants AS p JOIN suppliers AS s
   ON p.supplier_id = s.id;


Supply lines with a low volume of goods in transit are represented by a thin green line, moderate-volume lines by a medium yellow line, and high-volume lines by a thick red line.

This example was fairly straightforward in that our plant-supplier relationship is 1-to-1. If your data is 1-to-many or many-to-many, a similar DB view based on a relationship table could be used, as sketched below.
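Here is one way that could look, wrapped in a small Python script so it can run as part of an update job. The plant_suppliers relationship table, its plant_id and supplier_id columns, an id column on production_plants, and the connection details are all assumptions; adapt them to your schema:

import psycopg2

DDL = """
CREATE OR REPLACE VIEW plant_supplier_lines_m2m AS
SELECT r.plant_id,
       r.supplier_id,
       ST_MakeLine(p.geom, s.geom) AS geom
FROM plant_suppliers AS r
JOIN production_plants AS p ON p.id = r.plant_id
JOIN suppliers AS s ON s.id = r.supplier_id;
"""

conn = psycopg2.connect("dbname=supply_chain")  # connection string is illustrative
with conn, conn.cursor() as cur:
    cur.execute(DDL)  # commits on success, rolls back on error
conn.close()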

If you would like to know more about using DB views, including parameterized views, see http://boundlessgeo.com/2015/03/support-story-getting-sql-views/

Creating the supply lines has helped us visualize the connection between our plants and suppliers as well as provided us with more data for future analysis. The next step in driving efficiencies into our supply chain is adding data for events that could adversely impact our supply chain; this includes weather, transportation outages, disasters, etc. In the next blog post we will look at ways to incorporate some of these data feeds into the system and build towards automated alerting.

 

Pinpointing Disasters with OpenGeo Suite

Boundless’ Victoria, British Columbia office sits at the southern tip of Vancouver Island, a region that is used to mild earthquake activity. So when a colleague back east asked if we’d felt “the earthquake near Port Hardy”, we weren’t particularly surprised that there had been one or that it had gone unnoticed locally.

We were a little surprised, however, when we saw that the epicentre was well off the west coast of the island, while Port Hardy sits on the east coast. Looking more closely at the USGS map showing the location of the earthquake, one wonders why Port Hardy was chosen as a reference point in news reports and not, say, Tofino, which is roughly due east of the epicentre.

Like many OpenGeo Suite users, I treat PostGIS as my go-to tool for quick spatial analysis, so I loaded the Natural Earth populated places data set into a fresh database and whipped up a quick query to see what was happening within 250 kilometres of the quake.

$ shp2pgsql -W LATIN1 -I -s 4326 ne_10m_populated_places.shp | psql earthquakes

WITH constants AS 
(SELECT ST_SetSRID(ST_MakePoint(-128.15, 49.42), 4326)::geography AS quake)
(
  SELECT name, pop_max, ST_Distance(geom::geography, quake) / 1000 AS distance
  FROM ne_10m_populated_places, constants
  WHERE ST_DWithin(quake, geom::geography, 250*1000)
  ORDER BY DISTANCE
);

And sure enough:

      name      | pop_max |     distance
----------------+---------+------------------
 Port Hardy     |    2295 | 151.591959991648
 Tofino         |    1655 | 170.322296453086
 Campbell River |   33430 | 219.404018781354
 Courtenay      |   32793 | 229.792897985687

Port Hardy just edges out Tofino as the closest settlement in my data set. So do the USGS and other organisations simply use a naive algorithm to find a reference point for earthquakes? It sure looks like it!

[Note: if you’re wondering about the WITH clause above, that’s just an easy way to effectively define a constant that can be referenced throughout the main part of the query. Remember this syntax because we’ll be using it again below.]

Google went with Port Hardy, based on USGS data (although they calculate the distance differently than their source).

Natural Resources Canada’s website, on the other hand, referenced the small village of Port Alice, which is closer to the action but wouldn’t make the population threshold in most data sets.

Writing a better algorithm

If we agree that Port Hardy probably isn’t the best reference point for an event in the Pacific Ocean, then we are left with the question: can we design a better algorithm? A simple but effective improvement would be to calculate the distance from each nearby settlement to the epicentre, but double the distance for the bits that cross over land.

So from Port Hardy to the epicentre is about 150 kilometres, but we’ll need to add about 75 km extra because about half of that is overland. From Tofino, however, it’s 170 km from point to point and only a smidgen extra for crossing Vargas Island on the way out to sea. That’s 225 km to 175: Tofino wins!

 

We’re going to build up our query in three parts, which isn’t strictly necessary but does make things a little easier to read and maintain.

The first step is to get a list of populated places that are within a reasonable range of the earthquake (we’ll use 250 km again) and which have more than 1000 inhabitants. Additionally, we’ll use PostGIS to create a line from that point out to the quake and pass up all this information to the next step in our query.

SELECT
  name, pop_max, ST_MakeLine(geom, quake) AS line
FROM constants, ne_10m_populated_places
WHERE
  pop_max > 1000 AND
  ST_DWithin(quake::geography, geom::geography, 250*1000)

The sub-query above will be referenced as the places table in the query below, which is where most of the hard work actually happens.

We take the results from the places query and join them to another Natural Earth data set which contains states, provinces and other subdivisions of countries around the world (you can load it with the same shp2pgsql command I used above). Basically, this new table tells us what parts of the world are covered by land and what parts are not. By finding the intersection between our line and any land polygons, we can calculate a new line_land geometry for each of the places we found above.

SELECT
  places.name, places.pop_max, places.line,
  ST_Collect(ST_Intersection(places.line, land.geom)) AS line_land,
  ST_Length(line::geography) / 1000 AS distance
FROM
  ne_10m_admin_1_states_provinces_shp AS land, places
WHERE ST_Intersects(places.line, land.geom)
GROUP BY places.name, places.pop_max, places.line

We’ll add this new geometry and its length to the data we’ve collected about each place and pass them all up to the final part of our query, referring to this as the places_land table.

SELECT
  name, 
  pop_max,
  distance,
  distance + ST_Length(line_land::geography) / 1000 AS weighted_distance
FROM places_land
ORDER BY weighted_distance

This is where we wrap everything up by calculating the weighted_distance, which is just the regular distance plus the distance of the part that crossed over land (dividing by 1000 since the length of the PostGIS geography data type is measured in meters).

Pulling these together we get this final, three-step query:

WITH constants AS 
(SELECT ST_SetSRID(ST_MakePoint(-128.15, 49.42), 4326) AS quake)
(
  SELECT 
    name, 
    pop_max, 
    distance, 
    distance + ST_Length(line_land::geography) / 1000 AS weighted_distance
  FROM
  (
    SELECT
      places.name, 
      places.pop_max,
      places.line,
      ST_Collect(ST_Intersection(places.line, land.geom)) AS line_land,
      ST_Length(line::geography) / 1000 AS distance
    FROM
      ne_10m_admin_1_states_provinces_shp AS land,
    (
    SELECT
      name, pop_max, ST_MakeLine(geom, quake) AS line
    FROM constants, ne_10m_populated_places
    WHERE
      pop_max > 1000 AND
      ST_DWithin(quake::geography, geom::geography, 250*1000)
    ) AS places
    WHERE ST_Intersects(places.line, land.geom)
    GROUP BY places.name, places.pop_max, places.line
  ) AS places_land
  ORDER BY weighted_distance
);

All that’s left is to run the query and see what we get:

      name      | pop_max |     distance     | weighted_distance
----------------+---------+------------------+-------------------
 Tofino         |    1655 | 170.322296453086 |  170.996532624216
 Port Hardy     |    2295 | 151.591959991648 |  213.424215448539
 Campbell River |   33430 | 219.404018781354 |  336.005258086551
 Courtenay      |   32793 | 229.792897985687 |  344.417265701763

It works: this earthquake is best described as being 170 km from Tofino by our reckoning!

The query above is only really suitable for points at sea, but you can adapt this code for cases where points are on land as well … and of course the exercise is not limited to earthquakes, but can be applied to any kind of disaster or event. With some additional creativity, we could also tune our algorithm to prefer places with more inhabitants over those with fewer. And of course, you can always change the search radius of 250 km or the population threshold of 1000 inhabitants.

Finally, if you want to pack this all up and create an application with OpenGeo Suite, I suggest checking out our recent blog post on publishing SQL queries like this in GeoServer and passing parameters to make an interactive and dynamic layer to show on a map!

 

My First FOSS4G

A Whole New World

Being a new developer at Boundless, fresh out of university with my degree in Computer Science, I haven’t had much practical experience yet. While I have a decent understanding of programming and an awareness of popular languages and frameworks, I am still building my knowledge of geospatial software while working on GeoServer at Boundless. Open source communities, too, were something I knew about but had never been a part of.

So, with FOSS4G NA 2015, I was really diving into a whole new world. I didn’t know what to expect out of an open source conference, or if it would prove friendly to newbies.

Here is what I found.

Diversity

Something that impressed me at the conference was the diversity. I found a wide range of people from varying backgrounds and levels of experience. Some were highly academic and studying sciences, while others were masterful cartographers. Then there were quite a few developers like myself who really weren’t veterans of geospatial. Notably, a sizable portion of the attendees were women, many of whom were presenters.

The topics were diverse too. The first day had some beginner-friendly sessions, including introductions to GeoServer and QGIS. Then there were “theme days”,  a concept I thought was awesome: Tuesday was PostgreSQL day, Wednesday was Big Data day, and Thursday was Web Mapping day. Other talks were going on, but I found the theme talks were especially popular. Plenty of beginner material was presented along with more advanced topics, so there was something for everyone.

Further, EclipseCon was hosted jointly with FOSS4G NA this year, providing even more diversity of people and backgrounds. Everyone who registered for one conference could attend sessions at the other. This allowed an interesting overlap of developers and scientists to mix and talk to each other. Seeing how people from different fields work differently can provide interesting insights and expand our knowledge.

Learning GIS and Software Development

Above all, my goal going into FOSS4G was to learn. A few sessions stood out in particular as great resources to me.

On the geospatial side, the list is too long to put everything here, but I’ll highlight the sessions I found most helpful. For those interested in scripting with Python, the Intro to Spatial Data Analysis in Python by Jenny Palomino provided a lot of background on the many available libraries and frameworks for working with spatial data. Paul Ramsey’s PostGIS Feature Frenzy was great for introducing the power of PostGIS. There was also a whole educational “training academy” presented by Philip Davis in Building a Sustainable Open Source Training Academy Using QGIS. Finally, the Birds of a Feather session for GeoServer had the Boundless team as well as Andrea Aime from GeoSolutions there to answer questions and help people with GeoServer.

From the Eclipse side of things, I thought Shay Shmeltzer’s presentation on iOS Development with Eclipse and Java was particularly interesting, especially because a lot of people assumed that wasn’t possible! Another fascinating presentation was Think Before You Code by Lizzi Slivinski which promoted a good discussion about user experience (UX) and design in general. Finally, Katy DeCorah provided guidelines and considerations for writing in her presentation, Writing For Everyone.

Code Sprints and Hackathons

For the developers, there were plenty of opportunities to write some code and receive help from experienced members of the community. Tuesday night had a room dedicated to a hackathon, where I was able to meet with Andrea Aime to do some much-needed bug fixing for GeoServer. Also, Boundless hosted an additional code sprint on Friday. Torben Barsballe, another new developer at Boundless working on GeoServer, and I got some help from Andrea to get started with CITE tests. The time went by really fast, but we got quite a bit done for only having a few hours. Thanks to the organizers for providing us a hackathon space, and to Boundless for the space for an additional code sprint.

Conclusion

FOSS4G really broadened my perspective. There are a lot of exciting things going on, especially with advances in web mapping and a greater desire to move to the cloud and process big data. It aligns well with what we’ve been working on for GeoServer and making sure it’s ready to scale up and out to meet client needs.

I think the best part about FOSS4G was that it felt welcoming. People were very open and friendly. Experts were happy to talk and share what they know, even with newbies. All the knowledge felt available for anyone who wished to pursue it. Regardless of who you are or where you stand, FOSS4G is a great experience.

Thank you to everyone who organized this year’s FOSS4G NA, and a big thanks to Boundless for sponsoring the event and giving me the opportunity to attend. Looking forward to next year.

 

Embarking on a Journey with FOSS4G-NA 2015

I’m relatively new to Boundless as I build my career as a Java developer, so it was timely that during the week of March 9 I was able to attend FOSS4G-NA in San Francisco. As someone new to software conferences like this, I’d like to offer some reflections to share how these events can be as positive an experience for the new guy as they are for the veteran.

Here’s what I found – the conference schedule was well paced, with a good variety of presenters and presentations covering numerous topics in the FOSS4G space. There were also a number of events outside of the presentation schedule where I got to interact with other people who were here for the conference – in many ways, this was the most eye-opening part of the experience for me. Real work gets done at conferences outside of the sessions, not just sitting in the auditoriums.

As noted, I am a Java Developer, so I also felt lucky the event was co-located with EclipseCon. It gave me the opportunity to address multiple interests within one event, and I can only hope other conferences offer me this breadth of information.

Some of the highlights of my experiences at FOSS4G were:

PlanetLabs

PlanetLabs presented a number of talks about their project to image the entire planet at 3-5 m resolution using a fleet of satellites. These satellites are built in-house using an “agile aerospace” approach, which entails lots of small, cheap satellites with fast turnover. This is a novel change from the conventional monolithic aerospace development strategy, and allows PlanetLabs to deploy and test improvements and changes quickly and cheaply. Since each satellite is a relatively low investment it also means that individual failures are not catastrophic. PlanetLabs also hosted a social event at their office/lab, and showed us where they build and program their satellites and mapping software.

Mapping Drones

One of the themes of Wednesday was drones. I saw presentations on how to build your own drone, and on OpenDroneMap, a piece of software for rendering drone-collected imagery in three dimensions. I also attended a presentation about kites as an alternative to drones: they are stronger, cheaper, and can stay up longer, and people don’t feel threatened by kites the way they do by drones. This is especially relevant with all the discussion about drones and privacy these days, and it provides an interesting look into human psychology and how we are more accepting of what is familiar than of what is new.

Java 9 and Beyond

As part of EclipseCon, I attended a keynote on upcoming Java features. This included a discussion of a major feature of Java 9, the modular JVM. Even more interesting were some of the plans for future Java releases. These include a “value” class, which is essentially a class that behaves like a primitive, as well as primitive support for Java Generics. These future additions have been a long time coming. Primitive support for generics will be especially nice as it will eliminate the need to null-check every simple list of numbers, and greatly enhance memory efficiency as well.

Cesium

As most GIS people probably already know, Cesium is a JavaScript globe and mapping library. The Cesium team made a strong showing at FOSS4G with a demonstration of 3D temporal data visualization of GPS traces using Cesium. They also presented a number of cool demo applications (that you should totally check out), which are available online.


Overall, I found FOSS4G-NA to be a valuable experience, and I would be interested in attending future conferences if given the opportunity. For this FOSS4G, I tried to go to talks on a wide variety of topics to explore what was out there. While this was valuable for me as a beginner, some things definitely went over my head. If I were to go to similar events in the future, I feel I could focus more strongly on topics that would broaden and develop my skill set as a GIS Java developer.

Using WMS time to explore your data

A feature of GeoServer that’s not very well known is that it can publish layers that contain a time component. Clients can request data for a specific date/time, a list of dates/times, or even a range. This is built into the WMS protocol; no third-party requirements are necessary here.

This can of course be used to create animations (see the MapStory project for a high-profile example of this), but even simpler, it can be used to just highlight another dimension of a dataset in a useful way.

With the release of GeoServer 2.7, the ability to request relative intervals has been added as well. Previously each request had to include absolute intervals, but now you can request data with parameters that mean, for example, “<some date> and the 30 days before it” or, more interestingly, “the 10 years before <today>”, where <today> is the server time.

So I thought it would be interesting to highlight this feature in GeoServer to give you some ideas and inspiration on how you can adapt this to your data.

The states and the union

First, let’s take a look at a dataset. One that I happen to have handy is the order in which the states of the USA joined the union. (Taken from MapStory.)

I’ve stripped down the style here to make comprehension easier. For each feature, there is an attribute called Date_of_St which is a timestamp accurate to the day.

For example, the value of the Date_of_St attribute for the feature representing Hawaii is listed as 1959-01-03T08:00:00.000Z. This jibes with the official date as listed on Wikipedia, though I suspect the hour may be one level of precision too ambitious.

We can set this attribute as a “special” time dimension in GeoServer. Once the layer is loaded, in the Edit Layer configuration area, the Dimensions tab contains the ability to link an attribute to the time dimension.

 

For our purposes, we really only care about the year. Luckily—and this is a big advantage of using WMS dimensions for this—requests need only be as precise as you want. So if you want to request a date range of 1900-1950, you don’t need to specify it as:

time=1900-01-01T00:00:00.000Z/1950-12-31T23:59:59.999Z

Instead, you can just write:

time=1900/1950

(Imagine trying to make this flexibility of input happen without this feature. Think of generating a text string search to match up the dates exactly. No fun at all.)

We’re going to make requests to GeoServer to find out which states joined at which times. The full WMS request, utilizing OpenLayers, is a mouthful:

http://localhost:8080/geoserver/time/wms?service=WMS&version=1.1.0&
request=GetMap&layers=time:StateJoined&styles=&bbox=-178.21759836236586,
18.92178634508703,-66.96927103600244,71.40623536725487&width=699&
height=330&srs=EPSG:4326&format=application/openlayers

But for tutorial purposes, I always prefer using the WMS Reflector, which makes reasonable assumptions about the request for the sake of brevity. That same request above can be shrunk to this:

http://localhost:8080/geoserver/wms/reflect?layers=time:StateJoined&format=application/openlayers

Much better, right? There is one little pitfall about enabling WMS time, though: when no time is specified, the map will only render features with the latest time, which leaves us with sad little Hawaii (being the most recent state added).

 

But with this setup, it’s quite easy to make maps of states at certain times.
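Before walking through them, here is a small sketch of how you might generate these requests programmatically in Python. The layer name and server location match the examples above, and urlencode takes care of escaping the time string:

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

BASE = "http://localhost:8080/geoserver/wms/reflect"

def states_map_url(time_range):
    # The reflector fills in reasonable defaults for everything else.
    params = {
        "layers": "time:StateJoined",
        "format": "application/openlayers",
        "time": time_range,
    }
    return BASE + "?" + urlencode(params)

print(states_map_url("1780/1790"))  # the thirteen original colonies
print(states_map_url("P120Y/PRESENT"))  # the last 120 years, relative to now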

The states of the states

(The following code snippets need to be appended to the above request.)

The thirteen original colonies (1790):

time=1780/1790

[Map: the thirteen original colonies]

The states as of the onset of the Civil War (1861):

time=1700/1861

[Map: lots of independent territory still out there]

The states that joined up during the Civil War. The US Civil War was fought from 1861-1865, but if we were to just use the string 1861/1865, we’d include Kansas, which just predates the Civil War (as seen above).

So we’ll need to get more precise, and add in the month: April 1861 to April 1865.

time=1861-04/1865-04

I did not know about Nevada joining during the Civil War, so I learned something here.

(Again, notice how easy this is with WMS dimensions; all of the work of interpreting the time is done for us.)

Finally, the states that were created in the last 120 years:

time=P120Y/PRESENT

[Map: I believe the US stays at 50 states because it's such a nice round number...]

This takes advantage of the new relative time support. This also means that the output of this request could itself change over time.

(The above image was reprojected to make Alaska look less huge. This is way easier using the WMS Reflector, as all you need to add is the srs parameter.)

More interactivity?

Now, it’s easy to envision a little app that takes as input Start and End dates and refreshes the map accordingly. And if people want to see that done (or anything else along that line), please leave a comment below.

And if you want to see this dataset animated, check it out over on MapStory.

No matter how long you’ve been working with GeoServer, there’s always more to be learned. Check out our Education Center to learn more!

Have you used the WMS time feature? Let us know how in the comments below!

Support story: Getting more out of SQL Views

Most GeoServer users know how to publish tables from a database such as PostgreSQL or Oracle as vector layers. More seasoned GeoServer users, however, also take advantage of SQL Views to publish those same tables with even greater control.

So what is a SQL View? A SQL View is a type of layer that is based on a database query that you write inside GeoServer itself. To the end user it looks like a regular layer, but behind the curtain you have all the power of spatial SQL at your fingertips to enrich the data that your users receive.

Uses of SQL Views

There are a number of different reasons to incorporate an SQL View into your application. For example, it’s possible to:

… only expose certain attributes to users:

SELECT geom, id, name FROM banks

… run spatial queries to do spatial computation or analysis:

SELECT *, ST_Area(geom) AS area FROM countries

… join two tables together:

SELECT airport.geom, airport.name, city.population
FROM airports AS airport, cities AS city
WHERE airport.city = city.id

… convert data types that GeoServer doesn’t support:

SELECT id, name, geom, iwxxm::text FROM weather_stations

IWXXM data is stored in XML, which can be stored and validated natively by PostgreSQL (as can JSON, arrays, and other types) but is not available in GeoServer. By adding ::text, however, we convert it to text and expose it as a regular attribute in our layers.

Using view parameters

We can take these static SQL Views one step further by adding parameters to our SQL queries to make dynamic OGC requests based on user input. Boundless has some examples of using parameterized SQL Views in our various tutorials, including GeoNames Heat Map, Building a Census Map, and Building a Routing Application.

The trick is to add parameters to the SQL View that can be specified during a WMS or a WFS request:

SELECT * FROM buildings WHERE type = '%type%'

When we make a WMS GetMap request, for example, we can add the following to the URL:

…&VIEWPARAMS=type:hospital

The result will be the execution of the following query on the database:

SELECT * FROM buildings WHERE type = 'hospital'

If you go to our Geonames Word Map demo and type “canyon” as the search term, you can see the following WMS request being sent to our server:

http://apps.opengeo.org/geoserver/wms?FORMAT=image/png&TRANSPARENT=TRUE&
LAYERS=opengeo:geonames,opengeo:geonames&
STYLES=point,heatmap&SERVICE=WMS&VERSION=1.1.1&
REQUEST=GetMap&SRS=EPSG:900913&
VIEWPARAMS=word:canyon&
BBOX=-16077730,2576500,-6186167,7155384&
WIDTH=2022&HEIGHT=936

Buried in there is VIEWPARAMS=word:canyon, and if you open this URL in your browser, you’ll see the results of the query are taken by GeoServer to generate a heatmap.

Our routing application uses multiple parameters (source, target and cost) to generate the path between two points.
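Scripting such requests is straightforward as well. Here is a sketch in Python using the buildings example from above; the layer name, bounding box, and parameter values are illustrative:

import requests

params = {
    "SERVICE": "WMS",
    "VERSION": "1.1.1",
    "REQUEST": "GetMap",
    "LAYERS": "opengeo:buildings",
    "STYLES": "",
    "SRS": "EPSG:4326",
    "BBOX": "-123.4,48.4,-123.3,48.5",
    "WIDTH": "512",
    "HEIGHT": "512",
    "FORMAT": "image/png",
    # Multiple parameters are passed as comma-separated key:value pairs.
    "VIEWPARAMS": "type:hospital",
}

resp = requests.get("http://localhost:8080/geoserver/wms", params=params)
with open("hospitals.png", "wb") as f:
    f.write(resp.content)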

Once you’ve added them to your GeoServer toolkit, you’ll wonder how you ever did without SQL Views!

Security

Allowing a GeoServer client to influence the SQL that will be executed on your database opens the door to an SQL injection attack. To prevent the client from running arbitrary SQL code, GeoServer comes with parameter validation using regular expressions. We know that regular expressions can have a steep learning curve, but let’s go over some easy examples.

The first step in crafting a validation expression is to consider what values we want to allow for each parameter.

Take the following SQL query:

SELECT * FROM roads
WHERE city = '%city%' AND population > %rank% AND type = '%type%'

In this example, we will accept any city name made up of alphabetic characters (A-z) and spaces (\s), giving a regular expression of ^[A-z\s]+$. Population is always a number (\d), so we can use ^\d+$. Finally, for the road type, we only want to accept one of the following: highway, primary, secondary, or residential. This gives the following expression: ^(highway|primary|secondary|residential)$.

By default, GeoServer uses a slightly more permissive (but usually safe!) ^[\w\d\s]+$, which allows letters, numbers and spaces for all parameters.
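You can test candidate expressions before trusting them with your database. Here is a quick Python session exercising the expressions above (note that A-z also matches a few punctuation characters that fall between Z and a in ASCII, so [A-Za-z] is slightly stricter):

import re

city_re = r'^[A-z\s]+$'
rank_re = r'^\d+$'
type_re = r'^(highway|primary|secondary|residential)$'

assert re.match(city_re, 'San Francisco')
assert re.match(rank_re, '10000')
assert re.match(type_re, 'residential')
assert not re.match(type_re, 'tertiary')  # rejected, as in the example below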

With these controls, we have prevented a malicious client from crafting a SQL injection that could potentially destroy our data. If a client does attempt to use a parameter that fails the regular expression check (for example: VIEWPARAMS=city:London,rank:1,type:tertiary), GeoServer will return an error:

Invalid value for parameter name

More importantly, the SQL will not be executed!

Caveats

So what’s the catch? There are a few things we have to take into consideration when using SQL Views. The main points of attention are:

First, be aware of potential performance implications. If you write complex SQL queries, remember that each time you make a request, the query will be executed. If it’s a slow query, your users will be waiting that much longer for a response.

Second, layers built from SQL Views are read-only. This means that you can’t use WFS-T to write back to the layer, unlike regular layers that have been published from regular database tables.

More reading

The Boundless Workshops page is a great place to read about the practical use of SQL Views in applications.

For a thorough discussion, see the OpenGeo Suite User manual’s section on SQL Views.