Two weeks ago, Antonio Cangiano compared the performance of different Ruby implementations using the Ruby 1.9 (YARV) benchmark suite. His numbers got me thinking: all the alternative implementations performed badly, and most were far slower than Ruby 1.8.5. Does this mean that the JVM and .NET are bad platforms for the Ruby language?
With this doubt in mind, I ran the benchmark with XRuby. XRuby is a Ruby compiler: unlike other implementations, it generates Java bytecode that runs directly on the JVM. At first the numbers were not impressive: version 0.1.2 was still slower than Ruby 1.8.5 in most cases.
Maybe I should mention that the XRuby team had done virtually nothing for performance before, and we will avoid optimization for as long as possible if it makes our code complicated. But after some measurements, it turned out that our bad performance was largely caused by a logic 'error': as we know, a Ruby Fixnum cannot have singleton methods, but 0.1.2 still looked up an empty method table for it. Together with some bad coding practices (iterating over an empty ArrayList without first checking whether it is empty, etc.), this made method lookup much slower than it should be.
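To make the lookup problem concrete, here is a minimal sketch of the idea in Java. It is not XRuby's actual code: the names `RClass`, `RValue`, `naiveLookup` and `guardedLookup` are made up for illustration, and the probe counter exists only to show how much work each path does. The naive path always consults the receiver's singleton method table, even for a Fixnum where that table is always empty; the guarded path skips it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the XRuby source.
public class MethodLookupSketch {
    // Counts method-table probes, for illustration only.
    static int probes = 0;

    // Minimal stand-in for a Ruby class: a method table plus a superclass.
    static class RClass {
        final RClass superclass;
        final Map<String, String> methods = new HashMap<>();
        RClass(RClass superclass) { this.superclass = superclass; }
    }

    // Minimal stand-in for a Ruby value. A Fixnum never has singleton methods,
    // so its singleton table stays empty forever.
    static class RValue {
        final RClass klass;
        final boolean isFixnum;
        final Map<String, String> singletonMethods = new HashMap<>();
        RValue(RClass klass, boolean isFixnum) {
            this.klass = klass;
            this.isFixnum = isFixnum;
        }
    }

    static String probe(Map<String, String> table, String name) {
        probes++;
        return table.get(name);
    }

    // Walks the class chain until a method with the given name is found.
    static String lookupInClass(RClass start, String name) {
        for (RClass k = start; k != null; k = k.superclass) {
            String m = probe(k.methods, name);
            if (m != null) return m;
        }
        return null;
    }

    // Naive path: always probes the singleton table first, even for a Fixnum.
    static String naiveLookup(RValue v, String name) {
        String m = probe(v.singletonMethods, name);
        if (m != null) return m;
        return lookupInClass(v.klass, name);
    }

    // Guarded path: skips the singleton table for Fixnum and for empty tables.
    static String guardedLookup(RValue v, String name) {
        if (!v.isFixnum && !v.singletonMethods.isEmpty()) {
            String m = probe(v.singletonMethods, name);
            if (m != null) return m;
        }
        return lookupInClass(v.klass, name);
    }

    public static void main(String[] args) {
        RClass object = new RClass(null);
        RClass fixnum = new RClass(object);
        fixnum.methods.put("+", "Fixnum#+");
        RValue one = new RValue(fixnum, true);

        probes = 0;
        String a = naiveLookup(one, "+");
        int naiveProbes = probes;

        probes = 0;
        String b = guardedLookup(one, "+");
        int guardedProbes = probes;

        System.out.println(a + " naive probes=" + naiveProbes);
        System.out.println(b + " guarded probes=" + guardedProbes);
    }
}
```

Both paths resolve the same method; the guarded one simply does one table probe fewer per call, which adds up when every `1 + 2` goes through method lookup. The same kind of `isEmpty()` check applies before iterating over a list that is usually empty.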
I fixed the problem by adding about 10 lines of code, and got a great result:
In most benchmarks, XRuby 0.1.3 is faster than Ruby 1.8.5, and in some it is significantly faster. There are still some tests in which we are slower, but that appears to be caused by poorly implemented builtins.
The following table shows the benchmark results for XRuby 0.1.3. The best part is that we did it without a method cache. YARV is still faster than XRuby, but we have lots of room to improve too.
>java -Xmx512m -jar xruby-0.1.3.jar benchmark\run.rb
Test | Ruby 1.8.5 | XRuby 0.1.3 |
bm_app_answer.rb | fail | fail |
bm_app_factorial.rb | fail | fail |
bm_app_fib.rb | 20.02 | 12.29 |
bm_app_mandelbrot.rb | 7.099 | 8.252 |
bm_app_pentomino.rb | 289.8 | 538.5 |
bm_app_raise.rb | 4.846 | 3.986 |
bm_app_strconcat.rb | 5.898 | 3.234 |
bm_app_tak.rb | 26.14 | 22.12 |
bm_app_tarai.rb | 20.89 | 18.35 |
bm_loop_times.rb | 14.28 | 19.30 |
bm_loop_whileloop.rb | 26.03 | 19.27 |
bm_loop_whileloop2.rb | 5.257 | 4.786 |
bm_so_ackermann.rb | fail | fail |
bm_so_array.rb | 19.17 | 46.84 |
bm_so_concatenate.rb | 5.727 | 9.684 |
bm_so_count_words.rb | 2.944 | 45.50 |
bm_so_exception.rb | 9.793 | 7.399 |
bm_so_lists.rb | 3.666 | 24.59 |
bm_so_matrix.rb | 6.249 | 8.452 |
bm_so_nested_loop.rb | 15.17 | 13.45 |
bm_so_object.rb | 21.49 | 7.991 |
bm_so_random.rb | 6.169 | 5.888 |
bm_so_sieve.rb | 2.042 | 2.753 |
bm_vm1_block.rb | 64.57 | 38.69 |
bm_vm1_const.rb | 47.47 | 25.57 |
bm_vm1_ensure.rb | 45.54 | 20.01 |
bm_vm1_length.rb | 55.50 | 40.89 |
bm_vm1_rescue.rb | 39.61 | 20.64 |
bm_vm1_simplereturn.rb | 56.02 | 29.06 |
bm_vm1_swap.rb | 76.35 | 30.52 |
bm_vm2_array.rb | 19.34 | 8.532 |
bm_vm2_method.rb | 33.72 | 19.63 |
bm_vm2_poly_method.rb | 45.23 | 20.62 |
bm_vm2_poly_method_ov.rb | 12.64 | 8.261 |
bm_vm2_proc.rb | 21.08 | 17.86 |
bm_vm2_regexp.rb | 13.09 | 30.87 |
bm_vm2_send.rb | 11.71 | 15.75 |
bm_vm2_super.rb | 13.92 | 7.510 |
bm_vm2_unif1.rb | 11.30 | 8.292 |
bm_vm2_zsuper.rb | 15.71 | 7.740 |
bm_vm3_thread_create_join.rb | 0.110 | 1.331 |
* The test environment is an Intel Pentium M 1G CPU, 1G memory, Windows XP SP2, and Java 1.5.0_09.