Finding Performance Regressions with Git Bisect

As a result of adding LIDAR pointcloud support to PostgreSQL, I have been working increasingly with PDAL, the pointcloud transformation and processing library, mostly to add support for reading and writing to my new database format.

Having put aside my PDAL work a couple of weeks ago, I was surprised to come back to it this week and find that a translation process that previously took 40 seconds now took 4 minutes. Something had changed, and not in a good way! Obviously, this wasn’t a deliberate change, but an unfortunate side effect of a separate change a developer had made in good faith to fix some other problem.

How to find the problem?

The PDAL project is managed in the git source code control system, and fortunately git provides an automatic system for tracking down bad commits: git-bisect. Bisection starts from known “good” and “bad” commits in the code record (for my purposes I used the present for “bad” and two weeks ago for “good”) and searches the commits in between by repeatedly halving the set of candidates: it tests a commit near the middle, discards the half that cannot contain the first bad commit, and repeats. The number of tests therefore grows only with the logarithm of the number of commits.
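To make the halving concrete, here is a minimal sketch in plain bash that simulates the search rather than calling git. The `times` array is invented data for illustration, and the sketch assumes a simple linear history, which real repositories with merges do not always have.

```shell
#!/bin/bash
# Hypothetical run times (seconds) for 16 commits, oldest first.
# Commits 0..10 are fast; commit 11 introduces the slowdown.
times=(40 40 41 40 39 40 42 40 41 40 40 240 241 240 239 240)

# A commit is "bad" if its run time reaches the 100s threshold.
is_bad() { [ "${times[$1]}" -ge 100 ]; }

lo=0                         # index of a known good commit
hi=$(( ${#times[@]} - 1 ))   # index of a known bad commit
steps=0

# Keep testing the midpoint until good and bad are adjacent.
while [ $(( hi - lo )) -gt 1 ]; do
  mid=$(( (lo + hi) / 2 ))
  steps=$(( steps + 1 ))
  if is_bad "$mid"; then hi=$mid; else lo=$mid; fi
done

echo "first bad commit: index ${hi}, found in ${steps} tests"
# prints: first bad commit: index 11, found in 4 tests
```

Four tests instead of fourteen is a modest saving here, but when each test includes a full build and a multi-minute translation run, it adds up quickly.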

You can automate bisection by writing a script that, given a source tree, will run a test to determine if the code is in a good or bad state. In my case, the script

  1. builds the code
  2. starts a timer
  3. runs the test translation
  4. compares the timing with my 100s threshold
  5. returns 0 (good) if the run beat the threshold, or 1 (bad) if it did not

The actual code (pdal_bisection.sh) looks like this:

#!/bin/bash

# a good time is less than 100 seconds
goodtime=100

cd ${HOME}/PDAL-build
make -j3 pcpipeline

# get start time (in seconds)
T="$(date +%s)"

# Do the work here
cd ${HOME}/lidar-test
${HOME}/PDAL-build/bin/pcpipeline las2las_test.xml

# elapsed time: subtract start time from end time
T="$(($(date +%s)-T))"

if [ ${T} -lt ${goodtime} ]; then
  echo "Conversion time ${T}s: code is good"
  exit 0
else
  echo "Conversion time ${T}s: code is BAD"
  exit 1
fi
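One refinement worth considering: git bisect run treats exit code 0 as good, 1 through 127 (except 125) as bad, and 125 as "this commit cannot be tested, skip it". The script above does not check whether make succeeded, so a commit that fails to build could be timed against a stale binary. The sketch below shows one way the decision could be factored out; the function name `bisect_exit_code` and its arguments are hypothetical, not part of the original script.

```shell
#!/bin/bash
# bisect_exit_code BUILD_STATUS ELAPSED THRESHOLD   (hypothetical helper)
# Prints the exit code a test script could hand back to "git bisect run".
bisect_exit_code() {
  local build_status=$1 elapsed=$2 threshold=$3
  if [ "$build_status" -ne 0 ]; then
    echo 125   # build failed: ask git bisect to skip this commit
  elif [ "$elapsed" -lt "$threshold" ]; then
    echo 0     # fast enough: good
  else
    echo 1     # too slow: bad
  fi
}

bisect_exit_code 0 40 100    # prints 0
bisect_exit_code 0 240 100   # prints 1
bisect_exit_code 2 0 100     # prints 125
```

In a real script, the build status would come from `$?` right after the make step, and the script would end with `exit "$(bisect_exit_code ...)"`.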

Once the script was written, running the bisection was easy:

# Enter bisection mode
git bisect start
# Tell git where a good commit is
git bisect good 5c824fec60105559bf1ae96f406f8fde3ebe9fa1
# Tell git where a bad commit is
git bisect bad 2ec95574ae6eb737e32520245486106edaafd299
# Run the bisection to find the first bad commit
# (the script must be executable and on the PATH; otherwise give its full path)
git bisect run pdal_bisection.sh

Git then takes over and runs the bisection automatically, checking out a commit, running the script, and reading its return value at each step, until it finds the first bad commit (which in my case was e3ab7125, a sensible repair of a segfault condition, with a fix underway at #153).
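If you want to watch the whole cycle without a slow build, the following sketch runs it on a throwaway repository. Everything in it is made up for the demo: the `cost` file stands in for a measured run time, and the "regression" is planted in commit 6 of 8. It assumes git is installed and uses an inline `sh -c` test in place of a script file.

```shell
#!/bin/bash
# Throwaway demo of "git bisect run"; all names and values are hypothetical.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Bisect Demo"
git config user.email "demo@example.com"

# Eight commits; the "cost" file jumps from 40 to 240 at commit 6.
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -lt 6 ]; then echo 40 > cost; else echo 240 > cost; fi
  echo "$i" > version
  git add cost version
  git commit -q -m "commit $i"
done

# Bad is the newest commit (HEAD), good is the oldest (HEAD~7).
git bisect start HEAD HEAD~7

# The test command exits 0 (good) when cost is under 100, 1 (bad) otherwise.
git bisect run sh -c '[ "$(cat cost)" -lt 100 ]'

# When the run finishes, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: ${first_bad}"

# Leave bisection mode and return to the original checkout.
git bisect reset
```

The final `git bisect reset` matters in real use too: until it is run, the work tree stays on whichever commit the bisection last checked out.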

There is nothing quite so satisfying as watching your computer automatically work through a problem in a few minutes that would otherwise have required hours of laborious manual sleuthing. Bisect, bisect, bisect!