Monday, August 25, 2008

Keep an Eye on Performance

With an Agile development methodology, my team has reached an impressive level of productivity. We are able to respond very quickly to regressions and to new requirements coming from our customers.

Using Hudson, we have implemented a tight loop to catch functional regressions. Every hour or so, Hudson checks out our latest revision, executes the unit and regression tests, and produces a detailed report. Unfortunately, we didn't add any performance tests to this loop: we didn't have the hardware for a dedicated performance-testing environment, and we thought it would be good enough to test performance manually, in a separate environment, every week or so.
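For reference, the build step behind that loop can be as simple as a couple of Rake tasks that Hudson invokes after each checkout. The following is only a minimal sketch; the task names and the test/ directory layout are assumptions, not our actual project structure.

    # Rakefile -- minimal sketch of the CI build step (layout is assumed)
    require 'rake/testtask'

    Rake::TestTask.new(:units) do |t|
      t.libs << 'test'
      t.pattern = 'test/unit/**/*_test.rb'
      t.verbose = true
    end

    Rake::TestTask.new(:regressions) do |t|
      t.libs << 'test'
      t.pattern = 'test/regression/**/*_test.rb'
      t.verbose = true
    end

    # Hudson runs `rake ci` on the fresh checkout; a non-zero exit status
    # marks the build as failed in the hourly report.
    task :ci => [:units, :regressions]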

For a couple of weeks we were unable to do any performance testing (we were busy getting our RC out on time, and with one less headcount). During that period we added around 350 revisions to our SCM repository. When I got the chance to test performance again, I noticed a significant degradation. At first we thought that one particular revision had introduced the problem, and we went looking for it. We had some good candidates, like the migration to JRuby 1.3.3 or Rails 2.1, but we were unable to find a culprit revision.

Then I decided to run a very simple benchmark against several revisions submitted over the span of those weeks. What we found was that performance had not deteriorated in one single revision, or even two: it had been deteriorating steadily across all of them.
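The benchmark itself was nothing fancy. Here is a rough sketch of the idea, assuming a Subversion repository and an application answering on localhost:3000; the revision numbers, URL, and request count are placeholders, and the server restart step is elided.

    # benchmark_revisions.rb -- time a fixed workload against old revisions
    require 'benchmark'
    require 'net/http'
    require 'uri'

    REVISIONS = [1450, 1500, 1550, 1600, 1650, 1700]  # placeholder numbers
    URL = URI.parse('http://localhost:3000/orders')   # placeholder endpoint

    REVISIONS.each do |rev|
      system("svn update -r #{rev}") or abort "update to r#{rev} failed"
      # ... restart the application server on this revision here ...

      # Run the same fixed workload for every revision so the numbers
      # are directly comparable.
      elapsed = Benchmark.realtime do
        100.times { Net::HTTP.get_response(URL) }
      end
      puts "r#{rev}: #{'%.2f' % elapsed}s for 100 requests"
    end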

With JRuby, Rails, and agile development you can make your product evolve very quickly, but you can also hurt performance dramatically without really noticing. We've learned our lesson, and we are adding a very simple performance test to our Hudson environment. These tests will only exercise some basic, common operations, but that will be good enough to give us an idea of where we stand on performance.
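Something along these lines is what we have in mind. It is only a sketch: the endpoint, request count, and time budget below are illustrative, not the numbers from our real suite.

    # test/performance/smoke_test.rb -- illustrative performance smoke test
    require 'test/unit'
    require 'benchmark'
    require 'net/http'
    require 'uri'

    class PerformanceSmokeTest < Test::Unit::TestCase
      MAX_SECONDS = 5.0  # assumed budget for the workload on the CI box

      def test_common_operations_stay_within_budget
        url = URI.parse('http://localhost:3000/')  # placeholder endpoint
        elapsed = Benchmark.realtime do
          50.times { Net::HTTP.get_response(url) }
        end
        # Failing the build when the budget is blown is what surfaces
        # gradual regressions in the hourly Hudson report.
        assert elapsed < MAX_SECONDS,
               "50 requests took #{'%.2f' % elapsed}s (budget #{MAX_SECONDS}s)"
      end
    end

Because a blown budget fails the build, a gradual slowdown now shows up within an hour of the revision that pushed it over the line, instead of weeks later.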

We still didn't get any more hardware for this shared environment, so we will create a Solaris Container with dedicated CPU resources to get reliable numbers out of these tests. The moral of the story (which you probably already knew) is to keep a close eye on performance and include it in the environment you use for regression testing.
