Monday, August 25, 2008

Keep an Eye on Performance

With the Agile development methodology my team has been able to accomplish an incredible level of productivity. We are able to respond very quickly to regressions or new requirements that come from our customers.

Using Hudson, we have implemented a tight loop to catch functional regressions. Every hour or so Hudson checks out our latest revision, executes unit and regression tests, and gives a detailed report. Unfortunately, we didn't add any performance tests to this loop. We didn't have spare hardware for an environment dedicated to performance testing, and we thought it would be good enough to test this manually in a separate environment every week or so.

For a couple of weeks we were unable to do any performance testing (we were busy getting our RC out on time, with one less person on the team). During that lapse we added around 350 revisions to our SCM repository. When I got the chance to test performance again I noticed a significant degradation. At first we thought a particular revision had introduced this problem, and we went looking for it. We had some good candidates, like the migration to JRuby 1.3.3 or Rails 2.1, but we were unable to find the culprit revision.

Then I decided to run a very simple benchmark against several revisions submitted in the span of those weeks. What we found was that performance had not deteriorated in one single revision, or two. Performance had been deteriorating continuously across all those revisions.
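To make the "one culprit or gradual drift" question concrete, here is a minimal sketch of how the timings from such a benchmark could be compared. The helper name and the numbers are made up for illustration; they are not our measurements.

```ruby
# Hypothetical helper: given [revision, seconds] samples ordered by revision,
# return the share of the total slowdown explained by the single worst step
# between two consecutive samples.
def largest_step_share(samples)
  times = samples.map { |_revision, seconds| seconds }
  total = times.last - times.first
  return 0.0 unless total > 0

  steps = times.each_cons(2).map { |before, after| after - before }
  steps.max / total
end

# Made-up numbers for illustration only.
gradual = [[100, 1.0], [150, 1.2], [200, 1.4], [250, 1.6], [300, 1.8]]
sudden  = [[100, 1.0], [150, 1.0], [200, 1.8], [250, 1.8], [300, 1.8]]

puts format("gradual: %.2f", largest_step_share(gradual))
puts format("sudden:  %.2f", largest_step_share(sudden))
```

A share close to 1 points at a single revision worth hunting for. A small share, spread over many steps, is the pattern we saw: no single culprit.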




With JRuby, Rails and agile development you can make your product evolve very quickly, but you can also hurt performance dramatically without really noticing. We've learned our lesson and we are adding a very simple performance test to our Hudson environment. These tests will only do some basic and common operations, but that will be good enough to give us an idea of where we stand in performance.
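A very simple performance test of this kind could look like the sketch below. It assumes the build job treats a non-zero exit status as a failure. The workload and the 2 second budget are placeholders, not what we actually run.

```ruby
require 'benchmark'

# Run the block `runs` times and return the median wall-clock time in seconds.
# The median is used so that one unusually slow run does not fail the build.
def median_time(runs)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  times.sort[times.size / 2]
end

# Hypothetical budget; pick one from your own baseline measurements.
budget_seconds = 2.0

# Stand-in workload. In a real job this would be a basic, common operation
# of the application.
elapsed = median_time(5) { (1..100_000).reduce(:+) }

if elapsed > budget_seconds
  # abort prints the message to stderr and exits with status 1.
  abort format("FAIL: median %.3fs exceeds budget of %.3fs", elapsed, budget_seconds)
end
puts format("OK: median %.3fs within budget of %.3fs", elapsed, budget_seconds)
```

The point is not precision. A coarse budget checked on every build is enough to notice a slow drift long before 350 revisions have piled up.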

We didn't get any more hardware for this shared environment, so we will create a Solaris container with dedicated CPU resources to get reliable numbers out of this test. The moral of the story (which you probably already knew) is to keep a close eye on performance and include it in the environment you use for regression testing.

Sunday, August 24, 2008

JRuby Memory Requirements

As most of you know, a Ruby runtime is a single-threaded process. To take advantage of modern servers that have multiple cores and/or hardware threads, you will want your JRuby application to have many runtimes ready to serve incoming requests. Warbler will nicely take care of creating these runtimes and maintaining the pool, but how many should you have inside one JVM?

In our environment we deploy our JRuby application in GlassFish, which we run on T2000 servers. These servers have 8 independent cores, and each core has 4 hardware threads. They are not very fast (1.4 GHz), but they can run many threads in parallel, which makes them a good fit for this kind of application.

In an ideal scenario, where you have threads that don't wait for resources (like memory, locks or I/O), 8 threads could fully utilize these 8 cores. But in reality you want to have lots of runtimes ready in the run queue, so there is always a thread ready to take over an idle CPU while other threads wait for I/O, memory or locks.

vmstat will show you the run queue. Notice in this vmstat sample how the run queue (first column, r) always has some threads waiting for a CPU. You'll want to have some, but not too many. Also notice how CPU idle time (last column, id) is close to zero (in a benchmark you will want to see this, but not in production).

 
kthr   memory           page                      disk        faults          cpu
 r b w     swap    free  re   mf pi  po  fr de sr s0 s1 s2 --   in    sy   cs us sy id
 7 0 0 14910496 5852176 306 1638  0 176 121  0  0  8  0  0  0 6192 44501 8200 93  5  2
11 0 0 14792088 5731032 175  837  0 113  87  0  0  6  0  0  0 6250 40901 8670 93  5  2
 9 0 0 14775944 5712792 227 1203  0 176 143  0  0  8  1  0  0 5910 48072 8433 94  5  1
10 0 0 14852816 5783816 116  552  0 104  73  0  0  6  0  0  0 6461 35296 8412 96  4  0
...
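If you collect a lot of these samples, a few lines of Ruby can summarize them. This is only a sketch: the method name is mine, and it assumes the column layout shown in the sample above, with the run queue in the first column and idle time in the last.

```ruby
# Summarize vmstat rows: average run queue (first column, r) and average
# CPU idle percentage (last column, id).
# Assumption: only data rows consist entirely of integers, so header lines
# and the "..." marker are skipped.
def summarize_vmstat(text)
  rows = text.lines.map(&:split).select do |fields|
    !fields.empty? && fields.all? { |field| field.match?(/\A\d+\z/) }
  end
  return nil if rows.empty?

  {
    avg_run_queue: rows.sum { |fields| fields.first.to_i } / rows.size.to_f,
    avg_idle_pct:  rows.sum { |fields| fields.last.to_i } / rows.size.to_f
  }
end

# The four data rows from the sample above.
sample = <<~ROWS
  r b w swap free re mf pi po fr de sr s0 s1 s2 -- in sy cs us sy id
  7 0 0 14910496 5852176 306 1638 0 176 121 0 0 8 0 0 0 6192 44501 8200 93 5 2
  11 0 0 14792088 5731032 175 837 0 113 87 0 0 6 0 0 0 6250 40901 8670 93 5 2
  9 0 0 14775944 5712792 227 1203 0 176 143 0 0 8 1 0 0 5910 48072 8433 94 5 1
  10 0 0 14852816 5783816 116 552 0 104 73 0 0 6 0 0 0 6461 35296 8412 96 4 0
  ...
ROWS

summary = summarize_vmstat(sample)
puts format("average run queue: %.2f", summary[:avg_run_queue])
puts format("average idle: %.2f%%", summary[:avg_idle_pct])
```

For the four rows above this gives an average run queue of 9.25 threads and an average idle time of 1.25%.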



Based on these facts, in my first try I decided to configure Warbler and GlassFish so that I could use 20 Ruby runtimes in a single 32-bit JVM (the JVM had 3.7G of virtual address space). The results were very interesting. We first noticed that we would run out of perm space, even after allocating 512M to it. These perm space problems were quickly fixed in JRuby 1.1, but after that we were still having problems.

What we noticed was that each Ruby runtime was taking around 20M of heap space just to be instantiated. In addition, when runtimes were invoked they were creating many objects and holding on to them for very long periods of time, so long that the objects would be tenured and moved to the old generation. jstat was showing us that the old generation was eventually filling up and we were incurring full garbage collections (full GCs should be avoided if possible, because they stop the world for long periods of time). In addition, every full garbage collection would reclaim only a tiny percentage of the old generation, so they were very frequent. Even though we were using the parallel garbage collector, full GCs were bringing our performance to its knees.

There was always the option to move to a 64-bit JVM, but by doing so performance would take a hit of 20% or more. To solve this problem and make better use of the 32G of memory in these servers we decided to deploy 6 JVMs, configuring each of them with only 5 Ruby runtimes. This solution kept our full GCs well under control, and allowed us to max out the CPU in the server while keeping our response time in check.
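The arithmetic behind the two layouts is simple enough to write down. The sketch below only uses the numbers from this post (8 cores with 4 hardware threads each, roughly 20M per runtime at instantiation); the method name is mine, and the 20M figure covers instantiation only, not the objects a runtime accumulates while serving requests.

```ruby
# Hypothetical helper: total runtimes and the heap needed just to instantiate
# the runtimes of one JVM, for a given layout.
def describe_layout(jvms, runtimes_per_jvm, runtime_baseline_mb)
  {
    total_runtimes: jvms * runtimes_per_jvm,
    baseline_mb_per_jvm: runtimes_per_jvm * runtime_baseline_mb
  }
end

hardware_threads = 8 * 4   # 8 cores, 4 hardware threads per core
runtime_baseline_mb = 20   # approximate heap per runtime at instantiation

first_try = describe_layout(1, 20, runtime_baseline_mb)
final     = describe_layout(6, 5, runtime_baseline_mb)

puts "hardware threads: #{hardware_threads}"
puts "1 JVM x 20 runtimes: #{first_try[:total_runtimes]} runtimes, " \
     "#{first_try[:baseline_mb_per_jvm]}M baseline per JVM"
puts "6 JVMs x 5 runtimes: #{final[:total_runtimes]} runtimes, " \
     "#{final[:baseline_mb_per_jvm]}M baseline per JVM"
```

With 6 JVMs we ended up with 30 runtimes for 32 hardware threads, and each JVM only has to carry about 100M of runtime baseline instead of 400M.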


Saturday, August 23, 2008

Performance and JRuby

About a year ago I joined a new team at Sun. They were a handful of developers using JRuby on Rails to create a web application. My role in this team was to help with database availability and performance. In my previous positions at Sun I had published several SPEC benchmarks, so I was familiar with that type of work.

By the time I joined the team little attention had been paid to the application's performance, but the assumption was that the database would be the biggest bottleneck. After all, that is usually the case, isn't it? Well, a year later I can say that we were all wrong.

In the first couple of days I started getting familiar with the application and using JMeter. I created a simple experiment that simulated 1 user hitting the home page several times. To my surprise, the average response time I got was more than 15 seconds and none of it was spent in the database: it was all in the app and web tiers! Lots of things have changed since then. We have found problems in many places and made big changes to keep performance in check.
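I used JMeter for this, but the same one-user experiment can be sketched in a few lines of Ruby. This is not what I ran: the HOME_PAGE_URL environment variable and the method name are assumptions for the example, and the script does nothing unless that variable is set.

```ruby
require 'benchmark'
require 'net/http'
require 'uri'

# Call the block `requests` times, one after another (a single simulated
# user), and return the average wall-clock time per call in seconds.
def average_response_time(requests)
  total = 0.0
  requests.times { total += Benchmark.realtime { yield } }
  total / requests
end

# HOME_PAGE_URL is a hypothetical variable pointing at the page to test,
# for example the home page of a test deployment.
if ENV['HOME_PAGE_URL']
  uri = URI(ENV['HOME_PAGE_URL'])
  average = average_response_time(10) { Net::HTTP.get_response(uri) }
  puts format("average response time: %.3f s", average)
end
```

A number like this only tells you that something is slow. Finding out which tier the time goes to, as described above, takes a closer look at each one.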

In this series of blog posts I'll go into detail about some of the problems we encountered and how they got fixed. Many of these topics will be touched on in my presentation at this year's Rails conference, but most probably in less detail.

I hope these blogs help you tune your application and I hope they spare you some time.

Happy tuning!