Sunday, August 24, 2008

JRuby Memory Requirements

As most of you know, a Ruby runtime is effectively single-threaded. To take advantage of modern servers with multiple cores and hardware threads, you will want your JRuby application to have many runtimes ready to serve incoming requests. Warbler takes care of creating these runtimes and maintaining the pool, but how many should you have inside one JVM?
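For reference, the pool size is set in Warbler's configuration file. The fragment below is only a sketch: the values are placeholders rather than recommendations, and it assumes a Warbler version that exposes the min/max runtime settings through config.webxml.

```ruby
# config/warble.rb
# Sketch only: the pool sizes below are placeholders, not recommendations.
Warbler::Config.new do |config|
  # Number of runtimes created up front and kept in the pool.
  config.webxml.jruby.min.runtimes = 2
  # Upper bound on the number of runtimes the pool may create under load.
  config.webxml.jruby.max.runtimes = 4
end
```

Setting both values to the same number gives a fixed-size pool, which makes the memory footprint of the JVM easier to reason about.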

In our environment we deploy our JRuby application in GlassFish, which we run on T2000 servers. These servers have 8 independent cores, and each core has 4 hardware threads. The cores are not very fast (1.4 GHz), but they can run many threads in parallel, which makes them a good fit for this kind of application.

In an ideal scenario, where threads never wait for resources (like memory, locks or I/O), 8 threads could fully utilize these 8 cores. In reality you want lots of runtimes in the run queue, so that there is always a thread ready to take over an idle CPU while other threads wait for I/O, memory or locks.

vmstat will show you the run queue. Notice in this vmstat sample how the run queue (first column, r) always has some threads waiting for a CPU. You want some, but not too many. Also notice that CPU idle time (last column, id) is close to zero; you want to see this in a benchmark, but not in production.

 
 kthr       memory                page               disk         faults        cpu
 r b w     swap    free  re   mf pi  po  fr de sr s0 s1 s2 --   in    sy   cs us sy id
 7 0 0 14910496 5852176 306 1638  0 176 121  0  0  8  0  0  0 6192 44501 8200 93  5  2
11 0 0 14792088 5731032 175  837  0 113  87  0  0  6  0  0  0 6250 40901 8670 93  5  2
 9 0 0 14775944 5712792 227 1203  0 176 143  0  0  8  1  0  0 5910 48072 8433 94  5  1
10 0 0 14852816 5783816 116  552  0 104  73  0  0  6  0  0  0 6461 35296 8412 96  4  0
...
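If you want to track these two numbers over a longer run, a small script can do the reading for you. The following is a minimal sketch, assuming the Solaris vmstat column layout shown in the sample (run queue first, idle percentage last); the summarize helper is a hypothetical name used only for illustration, and the rows are the four data rows from the sample.

```ruby
# Data rows copied from the vmstat sample (Solaris column layout assumed).
SAMPLE = <<~ROWS
  7 0 0 14910496 5852176 306 1638 0 176 121 0 0 8 0 0 0 6192 44501 8200 93 5 2
  11 0 0 14792088 5731032 175 837 0 113 87 0 0 6 0 0 0 6250 40901 8670 93 5 2
  9 0 0 14775944 5712792 227 1203 0 176 143 0 0 8 1 0 0 5910 48072 8433 94 5 1
  10 0 0 14852816 5783816 116 552 0 104 73 0 0 6 0 0 0 6461 35296 8412 96 4 0
ROWS

# Hypothetical helper: averages the run queue (first column, "r")
# and the CPU idle percentage (last column, "id") over all rows.
def summarize(text)
  rows = text.lines.map(&:split).reject(&:empty?)
  run_queue = rows.map { |row| row.first.to_i }
  idle = rows.map { |row| row.last.to_i }
  {
    avg_run_queue: run_queue.sum.to_f / rows.size,
    avg_idle_pct: idle.sum.to_f / rows.size
  }
end

summary = summarize(SAMPLE)
puts format("avg run queue: %.2f, avg idle: %.2f%%",
            summary[:avg_run_queue], summary[:avg_idle_pct])
# prints "avg run queue: 9.25, avg idle: 1.25%"
```

In practice you would feed it the output of a live vmstat run, with the header lines stripped, instead of a pasted sample.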



Based on these facts, in my first try I configured Warbler and GlassFish to use 20 Ruby runtimes in a single 32-bit JVM (the JVM had 3.7G of virtual address space). The results were very interesting. We first noticed that we would run out of perm space, even after allocating 512M to it. These perm space problems were quickly fixed in JRuby 1.1, but after that we were still having problems.

What we noticed was that each Ruby runtime took around 20M of heap space just to be instantiated. In addition, when runtimes were invoked they created many objects and held on to them for very long periods of time, long enough that the objects were tenured and moved to the old generation. jstat showed us that the old generation eventually filled up and that we were incurring full garbage collections (full GCs should be avoided if possible, because they stop the world for long periods of time). Worse, each full garbage collection reclaimed only a tiny percentage of the old generation, so they were very frequent. Even though we were using the parallel garbage collector, full GCs were bringing our performance to its knees.

There was always the option of moving to a 64-bit JVM, but doing so would cost us 20% or more in performance. To solve this problem and make better use of the 32G of memory in these servers, we decided to deploy 6 JVMs, each configured with only 5 Ruby runtimes. This solution kept our full GCs well under control and allowed us to max out the CPUs in the server while keeping our response time in check.
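The arithmetic behind the two layouts can be sketched in a few lines. This is a rough back-of-the-envelope model, not a sizing tool: it uses only the figures quoted in this post (about 20M per instantiated runtime, 3.7G of address space per 32-bit JVM, 32G in the server), and the Layout struct and its method names are hypothetical. It deliberately ignores the objects that runtimes retain while serving requests, which is what actually filled the old generation.

```ruby
RUNTIME_BASELINE_MB  = 20   # approx. heap per freshly instantiated runtime (from the post)
JVM_ADDRESS_SPACE_GB = 3.7  # virtual address space of one 32-bit JVM (from the post)
SERVER_MEMORY_GB     = 32   # memory in the server (from the post)

# Hypothetical model of a deployment: some JVMs, each with a fixed runtime pool.
Layout = Struct.new(:jvms, :runtimes_per_jvm) do
  def total_runtimes
    jvms * runtimes_per_jvm
  end

  # Heap used just to instantiate the runtimes of one JVM.
  def baseline_heap_per_jvm_mb
    runtimes_per_jvm * RUNTIME_BASELINE_MB
  end

  # Worst case: every JVM grows to its full address space.
  def max_address_space_gb
    jvms * JVM_ADDRESS_SPACE_GB
  end

  def fits_in_server?
    max_address_space_gb <= SERVER_MEMORY_GB
  end
end

first_try = Layout.new(1, 20)
final     = Layout.new(6, 5)

[first_try, final].each do |layout|
  puts "#{layout.jvms} JVM(s) x #{layout.runtimes_per_jvm} runtimes: " \
       "#{layout.total_runtimes} runtimes total, " \
       "#{layout.baseline_heap_per_jvm_mb}M baseline heap per JVM, " \
       "up to #{layout.max_address_space_gb.round(1)}G of address space"
end
```

Under these assumptions the 6 x 5 layout gives 30 runtimes instead of 20, cuts the baseline per JVM from 400M to 100M, and even in the worst case stays within the 32G of the server.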


2 comments:

Jay said...

You said moving to 64 bit jvm would be a 20% performance hit (or more). Why is that? thanks!

Jay

fdo said...

Mostly because with a 64-bit JVM you are handling pointers to memory that are twice the size.