In our last post on clustering, we talked about the theory behind some different options for clustering. In this post, we’ll go into an example of clustering, taken from our recent experience with one of our OpenGeo Suite Enterprise clients. If you’ll be attending FOSS4G-NA and want to learn more about clustering and GeoServer, consider attending our GeoServer training and Juan Marin’s GeoServer in Production presentation (scheduled for 5/23/2013 at 11:30 am).
In the following scenario, we will work through the installation and configuration of two GeoServers, each inside its own servlet container instance on the same machine. Each servlet container will use the same JRE and the same container binaries (Apache Tomcat 7), but they will have independent configurations that allow them to run on different ports. These two GeoServer/Tomcat instances will be fronted by a local software proxy called HAProxy, which acts as an HTTP/TCP load balancer. The load balancer configuration used here provides very basic “round robin” balancing of the GeoServers. More sophisticated load-balancing configurations are possible, but are beyond the scope of this example.
All GeoServers will be deployed as WAR files placed into each of the Tomcat webapps directories. It is possible to have multiple instances of Tomcat share a single web application through the use of contexts. This is useful if you anticipate that your web application (GeoServer) will be changed or updated frequently, but it isn’t necessary.
The following steps will walk through the installation and configuration of a basic cluster containing two GeoServers in separate Tomcat servlet containers, behind an HAProxy load-balancer/proxy on the same machine (high-performance). The steps are:
- Download and unpack Tomcat binaries
- Create individual Tomcat instances
- Start Tomcat instances and deploy GeoServer applications
- Install and configure load balancer / proxy server
Following this walk-through, we will discuss other options for and extensions of this configuration such as high-availability, alternate proxy tools/configurations, database-backed catalogs, and triggered configuration reloads.
Download and unpack Tomcat binaries
- Start by downloading a binary distribution of Tomcat to your machine. This example uses the latest version (7.0.39 at the time of writing) as a .tar.gz file from http://tomcat.apache.org/download-70.cgi.
- Unpack this archive to a suitable location. In this example the entire contents of the archive are extracted to
/var/tomcat. By convention, we’ll refer to this directory as
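The download and unpack steps above can be sketched as a pair of shell helpers. This is only a minimal sketch, assuming `curl` and a `tar` that supports `--strip-components` are installed; the archive URL layout on archive.apache.org and the function names `tomcat_url` and `unpack_tomcat` are assumptions for illustration, not part of the original walk-through.

```shell
#!/bin/sh
# Sketch only: helpers for fetching and unpacking the Tomcat binaries.
# The archive URL layout and the function names are assumptions.

TOMCAT_VERSION="${TOMCAT_VERSION:-7.0.39}"

# Print the (assumed) archive URL for the chosen Tomcat 7 version.
tomcat_url() {
    echo "https://archive.apache.org/dist/tomcat/tomcat-7/v${TOMCAT_VERSION}/bin/apache-tomcat-${TOMCAT_VERSION}.tar.gz"
}

# Unpack an archive into a target directory, dropping the top-level
# apache-tomcat-<version>/ folder so bin/, conf/ and lib/ sit directly in it.
unpack_tomcat() {
    archive="$1"
    target="$2"
    mkdir -p "$target" && tar -xzf "$archive" -C "$target" --strip-components=1
}

# Example usage (as a user who can write to /var):
#   curl -fLO "$(tomcat_url)"
#   unpack_tomcat "apache-tomcat-${TOMCAT_VERSION}.tar.gz" /var/tomcat
```

Stripping the top-level folder keeps the layout consistent with the rest of this example, where the binaries live directly under /var/tomcat.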
Create individual Tomcat instances
Next we’ll make two directories for two separate instances of Tomcat to run from. Both instances will use the same Tomcat binaries from the directory created above; the directories created here will just hold the logs and configuration for each of our instances. In this example we’ll make two instance directories,
/var/tomcat2. These will be referred to as
$CATALINA_BASE directories. Each of these directories needs a basic structure and some initial content to host a Tomcat instance. Run the following commands (or a variation thereof) to create the basic directory structure.
# mkdir -p /var/tomcat1/conf
# mkdir -p /var/tomcat1/logs
# mkdir -p /var/tomcat1/temp
# mkdir -p /var/tomcat1/webapps
# mkdir -p /var/tomcat1/work
# mkdir -p /var/tomcat2/conf
# mkdir -p /var/tomcat2/logs
# mkdir -p /var/tomcat2/temp
# mkdir -p /var/tomcat2/webapps
# mkdir -p /var/tomcat2/work
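If you expect to add more instances later, the same layout can be created with a small loop. This is a sketch only; the function name `make_instance_dirs` is hypothetical.

```shell
#!/bin/sh
# Sketch only: create the directory layout for one Tomcat instance.
# The function name is hypothetical.
make_instance_dirs() {
    base="$1"
    for sub in conf logs temp webapps work; do
        mkdir -p "$base/$sub" || return 1
    done
}

# Example usage:
#   make_instance_dirs /var/tomcat1
#   make_instance_dirs /var/tomcat2
```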
Each Tomcat instance needs two configuration files to start with that define how the service will run (name, host(s), and ports). Copy the files server.xml and web.xml from the $CATALINA_HOME/conf directory into each of the $CATALINA_BASE/conf directories. We’ll need to edit each of the server.xml files so that each Tomcat instance runs and shuts down on different ports. In each file, look for lines like:
<Server port="8005" shutdown="SHUTDOWN">
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
Change these port values. The values can typically be any unused port above 1024. Within each file the SHUTDOWN port and the HTTP Connector port must differ from each other, and the values in tomcat1/conf/server.xml must differ from those in tomcat2/conf/server.xml. For example:
- On tomcat1: SHUTDOWN on port 8005, serve HTTP requests on 8085
- On tomcat2: SHUTDOWN on port 8006, serve HTTP requests on 8086
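The copy-and-edit step can also be scripted. The following is a minimal sketch, assuming the stock server.xml still uses the default ports 8005 and 8080 in the form shown above; the function name `configure_instance` is hypothetical. A stock Tomcat 7 server.xml usually also defines an AJP connector on port 8009, so the sketch accepts an optional fifth argument to move that port as well; if your file has no such connector, the extra substitution simply does nothing.

```shell
#!/bin/sh
# Sketch only: copy web.xml and server.xml into an instance and rewrite ports.
# Usage: configure_instance HOME BASE SHUTDOWN_PORT HTTP_PORT [AJP_PORT]
# Assumes server.xml still contains the default ports 8005, 8080 (and 8009).
configure_instance() {
    home="$1"
    base="$2"
    shutdown_port="$3"
    http_port="$4"
    ajp_port="${5:-8009}"   # optional; the default leaves the AJP port as-is

    mkdir -p "$base/conf" || return 1
    cp "$home/conf/web.xml" "$base/conf/web.xml" || return 1

    # Rewrite the default ports while copying server.xml.
    sed -e "s/<Server port=\"8005\"/<Server port=\"${shutdown_port}\"/" \
        -e "s/<Connector port=\"8080\"/<Connector port=\"${http_port}\"/" \
        -e "s/<Connector port=\"8009\"/<Connector port=\"${ajp_port}\"/" \
        "$home/conf/server.xml" > "$base/conf/server.xml"
}

# Example usage, matching the ports listed above:
#   configure_instance /var/tomcat /var/tomcat1 8005 8085
#   configure_instance /var/tomcat /var/tomcat2 8006 8086 8010
```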
Start Tomcat instances and deploy GeoServer applications
Once configured, we’re ready to start up the servlet containers. We can do this by running a series of commands from the terminal; however, it is more practical to write a small script, since these steps often need to be repeated. Create two scripts:
/var/tomcat2/tomcat2.sh. These scripts will be identical except for the value of the
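#!/bin/sh
# Shared Tomcat binaries (the directory the archive was unpacked to)
export CATALINA_HOME=/var/tomcat
# Instance directory for this container (use /var/tomcat2 in the second script)
export CATALINA_BASE=/var/tomcat1
export CATALINA_TMPDIR=$CATALINA_BASE/temp
# Adjust to the location of the JRE on your system
export JRE_HOME=/usr/lib/jvm/java-6-openjre/jre
# Entries are separated by ":" on Linux (a ";" would end the command)
export CLASSPATH=/var/tomcat/bin/bootstrap.jar:/var/tomcat/bin/tomcat-juli.jar
$CATALINA_HOME/bin/catalina.sh start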
These scripts will perform the following functions:
- Define the location of the Tomcat binaries in
- Set the location for the current container in
- Define the location of the JRE for Tomcat to use (we’re assuming Java is installed)
- Define the CLASSPATH to the core Tomcat JARs
- Export each environment variable so they are available to the Tomcat start-up calls
- Run the Catalina control script to start the Tomcat service
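Because the two scripts differ only in the instance directory, one wrapper taking the instance as an argument can replace both, and it can also stop an instance. This is a sketch under assumptions: the function name `tomcat_ctl` is hypothetical, `JRE_HOME` is expected to be set in the environment already, and `/var/tomcat` is used as the default location of the binaries as in this example.

```shell
#!/bin/sh
# Sketch only: start or stop one Tomcat instance by its base directory.
# Usage: tomcat_ctl BASE [start|stop]
# The function name is hypothetical; JRE_HOME is taken from the environment.
tomcat_ctl() {
    base="$1"
    action="${2:-start}"
    CATALINA_HOME="${CATALINA_HOME:-/var/tomcat}" \
    CATALINA_BASE="$base" \
    CATALINA_TMPDIR="$base/temp" \
    "${CATALINA_HOME:-/var/tomcat}/bin/catalina.sh" "$action"
}

# Example usage:
#   tomcat_ctl /var/tomcat1 start
#   tomcat_ctl /var/tomcat2 start
#   tomcat_ctl /var/tomcat1 stop
```

Stopping through `catalina.sh stop` uses the SHUTDOWN port configured in that instance's server.xml, which is one reason each instance needs its own value.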
With the files created, run both
/var/tomcat2/tomcat2.sh to configure and start the two services. Note that you’ll need to make the files executable. To confirm that the services started correctly, you can tail the
catalina.out files in
/var/tomcat2/logs. Next, copy the
geoserver.war file (or unpacked application directory) into each of the
/var/tomcat2/webapps directories. The applications should automatically deploy. Confirm that you have two GeoServers running independently by browsing to each one on their respective port.
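Instead of opening a browser, you can check both instances from the terminal. The sketch below assumes `curl` is installed, that the WAR was deployed under the default `geoserver` context, and that the ports from the earlier example (8085 and 8086) are in use; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: confirm that each GeoServer answers on its own port.
# Assumes the default "geoserver" context name; function names are hypothetical.

# Build the URL of the GeoServer web interface for a given port.
geoserver_url() {
    echo "http://localhost:$1/geoserver/web/"
}

# Request the web interface and succeed only on a final HTTP 200.
check_geoserver() {
    code=$(curl -s -L -o /dev/null -w '%{http_code}' "$(geoserver_url "$1")")
    echo "port $1: HTTP $code"
    [ "$code" = "200" ]
}

# Example usage:
#   check_geoserver 8085 && check_geoserver 8086
```

GeoServer can take a little while to deploy, so a failed check right after copying the WAR may just mean the application is still starting.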
Install and configure load balancer / proxy server
Install a web server that is capable of acting as a load balancer in front of the cluster. In this example we use HAProxy installed on Ubuntu Linux using standard package management. That said, there are many tools other than HAProxy that you can use as a front-end load balancer / proxy server. One example would be Apache HTTP Server with mod_proxy_http and mod_proxy_balancer. Microsoft IIS can also sit in front of Tomcat instances using Network Load Balancing (NLB) or Application Request Routing (ARR).
Configure HAProxy to act as a load balancer using the following sample in /etc/haproxy/haproxy.cfg. Stop HAProxy, copy or edit the config file in place, and then restart.
global
    maxconn 4096
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000
    log 127.0.0.1 local0
    log 127.0.0.1 local7 debug

frontend http-in
    bind *:80
    default_backend GeoServer

backend GeoServer
    balance roundrobin
    server GeoServer1 localhost:8085 maxconn 32 check
    server GeoServer2 localhost:8086 maxconn 32 check
    ##option httpchk
    ##option forwardfor

listen admin
    bind *:8080
    stats enable
(Note that these are very basic configurations. Your system administrator will have a better idea of what is the norm for your organization.)
There are a few ways you can confirm that HAProxy is working and balancing the clustered back-ends as anticipated.
1) Browse to the HAProxy admin page at
In the figure above we see that HAProxy recognizes the GeoServer backend with two members, GeoServer1 and GeoServer2.
2) You might also make a noticeable configuration change to one of the GeoServers and observe that requests to the single proxy URL are answered by all members of the backend. For example, change a single SLD file on one GeoServer and then repeatedly request a layer that uses that style through HAProxy. Seeing both the old and the new styling across requests confirms that both of our GeoServers are handling requests.
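The SLD check can be automated by requesting the same map several times through the proxy and counting how many different responses come back. This is a rough sketch, assuming `curl` and `cksum` are available and that the style was changed on only one GeoServer; the function names and the sample layer in the usage comment (`topp:states`, from GeoServer's default data directory) are assumptions.

```shell
#!/bin/sh
# Sketch only: count distinct responses for repeated requests to one URL.
# With round-robin balancing and a style changed on one node only,
# two distinct responses suggest both nodes are serving requests.

# Fetch a URL; kept as a separate function so it can be swapped out.
fetch() {
    curl -s "$1"
}

# Usage: count_distinct_responses URL [REQUESTS]
count_distinct_responses() {
    url="$1"
    n="${2:-10}"
    i=0
    while [ "$i" -lt "$n" ]; do
        fetch "$url" | cksum      # one checksum line per response
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}

# Example usage (layer name and extent are assumptions):
#   haproxy -c -f /etc/haproxy/haproxy.cfg   # syntax-check the config first
#   count_distinct_responses "http://localhost/geoserver/wms?service=WMS&version=1.1.1&request=GetMap&layers=topp:states&styles=&bbox=-125,24,-66,50&width=512&height=256&srs=EPSG:4326&format=image/png" 10
```

A count of 1 would mean every request produced an identical image, which usually indicates that only one backend is answering or that the style change did not take effect.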
A Common Data Directory
Normally, GeoServer data directories in a cluster will be identical in content. In this example we will set a common data directory for all cluster members using a context parameter in the
web.xml file of each GeoServer in our cluster. You can specify the GeoServer data directory location several other ways. Make a new directory for your shared GeoServer Data Directory.
# mkdir /geoserver_data
Copy a template data directory into the new location. This step is optional. As long as the base data directory exists, GeoServer will create the basic configurations it needs if they’re not found on start-up. It’s just a bit more painless for this example if they are where we expect them.
# cp -r /var/tomcat1/webapps/geoserver/data/* /geoserver_data
Stop the two GeoServer / Tomcat instances so we can reconfigure our data directory locations, either by killing the identified pids for the java processes that our GeoServers are running under, or (in a more sophisticated installation) using a service. Update the web.xml file in each web application’s WEB-INF directory. This file typically lives at $CATALINA_HOME/webapps/geoserver/WEB-INF/web.xml; in this multi-instance setup it is found under each instance’s $CATALINA_BASE/webapps directory instead. Make four changes to each file:
- Specify the new location of the
- Specify a
GEOSERVER_LOG_LOCATION for this particular instance to log to. This will avoid collisions with the other GeoServer nodes in the cluster writing to the same location. For example, set to
true. This will avoid collisions with the other GeoServer nodes’ GWCs writing disk use information to common locations.
true. This will avoid collisions with the other GeoServer nodes’ GWCs writing cache status information to common locations.
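As one of the other ways to specify these locations, GeoServer also reads `GEOSERVER_DATA_DIR` and `GEOSERVER_LOG_LOCATION` from Java system properties, which can be passed per instance through `CATALINA_OPTS` in the start-up scripts instead of editing each web.xml. The following is a minimal sketch of that alternative, not the method used in this walk-through; the function name and the per-instance log file path are hypothetical.

```shell
#!/bin/sh
# Sketch only: build per-instance JVM options for GeoServer.
# Usage: geoserver_opts DATA_DIR INSTANCE_NAME
# The function name and the log file layout are hypothetical.
geoserver_opts() {
    data_dir="$1"
    instance="$2"
    echo "-DGEOSERVER_DATA_DIR=${data_dir} -DGEOSERVER_LOG_LOCATION=${data_dir}/logs/${instance}.log"
}

# Example usage inside an instance start-up script, before catalina.sh runs:
#   export CATALINA_OPTS="$(geoserver_opts /geoserver_data geoserver1)"
```

Keeping these settings in the start-up script means they survive a redeploy of the WAR, whereas edits to WEB-INF/web.xml are lost when the application directory is replaced.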
Extending These Examples
This document is intended only as an example of setting up GeoServer in a clustered configuration. It is designed to move users towards a more scalable GeoServer installation, but it might not be suitable for production in all environments. Future versions and alternate scenarios will take into consideration:
- Scaling GeoServer horizontally (on multiple machines)
- Hybrids of vertically and horizontally scaled instances
- Better ways to manage and configure multiple servlet containers (Tomcat) and web applications (GeoServer)
What sort of GeoServer clustering environments are you interested in setting up? Let us know in the comments below.