War of allocators: The QtLauncher's coast

After I benchmarked JavaScriptCore with our new participant, DLmalloc, it was suggested that I test it with QtLauncher as well. I compiled DLmalloc in thread-safe mode (USE_LOCKS=1) so that it could serve WebCore's memory requests. Another solution might have been to turn off every use of threads in QtLauncher/WebCore, but I don't think that would be a fortunate approach...
I ran the benchmarks on x86 Debian Lenny (SMP kernel, dual-core 2.33 GHz CPU), using the Linux Qt port of WebKit at the official r55365 revision. The memory results are provided by our patched Linux kernel and represent the maximum resident set size (RSS).
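For readers who want to reproduce a peak-RSS figure without a patched kernel, here is a minimal sketch, not the method used for these results. It assumes a Linux kernel recent enough to fill in `ru_maxrss` (2.6.32 or later, so newer than the stock Lenny kernel), and the helper name `peak_rss_kb` is made up for illustration.

```python
import resource
import subprocess
import sys


def peak_rss_kb(cmd):
    """Run `cmd` to completion and return the peak RSS of waited-for children.

    On Linux the value is in kilobytes. RUSAGE_CHILDREN reports the largest
    peak among all children this process has waited for so far, so call this
    from a fresh process for each command you want to measure.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Example: measure a trivial child process (replace with the real command).
    print(peak_rss_kb([sys.executable, "-c", "pass"]), "KB")
```

The number covers the whole process lifetime, which is the same kind of quantity (a high-water mark) as the maximum RSS reported above, although the two measurement paths are not guaranteed to agree exactly.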

Methanol

Methanol is our WebCore page rendering/painting benchmark, which uses QtLauncher. It loads and renders web pages one by one (currently 9 pages, 5 times each) and measures the time with JavaScript.
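The structure of such a run (every page visited in order, the whole pass repeated several times) can be sketched as follows. This is only an illustration in Python, not Methanol's code, which runs as JavaScript inside the page. `load_page` is a hypothetical callable that blocks until a page has been rendered, and averaging the samples is my assumption about how the times could be summarized.

```python
import time
from statistics import mean


def time_pages(pages, load_page, rounds=5):
    """Return {page: mean seconds} over `rounds` passes through `pages`."""
    samples = {page: [] for page in pages}
    for _ in range(rounds):
        for page in pages:  # pages are visited one by one, in order
            start = time.perf_counter()
            load_page(page)  # hypothetical: returns once the page is rendered
            samples[page].append(time.perf_counter() - start)
    return {page: mean(times) for page, times in samples.items()}
```

With 9 pages and 5 rounds this performs 45 timed loads, which matches the run described above.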
As the chart shows, DLmalloc outperforms the system allocator and JEmalloc, but TCmalloc is faster still, by 4.7%. On the memory consumption side, the chart shows exactly the inverse of the performance chart: TCmalloc consumes the most memory and JEmalloc the least. DLmalloc consumes 3.9% less memory than TCmalloc, but this amounts to only ~2.3 megabytes.
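To put the percentage and the absolute saving side by side, the two figures above can be combined to estimate the total they refer to. This is a small arithmetic sketch; the function names are made up for illustration, and the result is only approximate because both inputs are rounded.

```python
def implied_total_mb(saving_mb, saving_percent):
    """Total (in MB) for which `saving_mb` is `saving_percent` percent."""
    return saving_mb / (saving_percent / 100.0)


def relative_saving_percent(baseline, candidate):
    """How much smaller `candidate` is than `baseline`, in percent of baseline."""
    return (baseline - candidate) / baseline * 100.0


if __name__ == "__main__":
    # 2.3 MB being 3.9% of TCmalloc's peak implies a peak of roughly 59 MB.
    print(round(implied_total_mb(2.3, 3.9)), "MB")
```

So a 3.9% difference corresponds to a TCmalloc peak of roughly 59 MB in this test, which is why the absolute saving looks small on a desktop machine.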

SunSpider in QtLauncher

This test runs the SunSpider benchmark inside a web page, but otherwise it contains exactly the same tests as those in the trunk/SunSpider directory of WebKit. In terms of performance, the SunSpider charts don't show significant differences between the allocators, but in this case DLmalloc is the fastest and it consumes the least memory, which makes it better than TCmalloc by 14.3%.

V8 in QtLauncher

In the case of the V8 benchmark suite, TCmalloc is the fastest allocator again, but it also consumes the most memory. Compared to DLmalloc, TCmalloc is 3.9% faster and consumes 4.8% more memory.

WindScorpion in QtLauncher

WindScorpion is our collection of real-life JavaScript applications. It emphasizes TCmalloc's performance advantage: TCmalloc is 10% faster than DLmalloc, but it also consumes 7% more memory.

Summary

I tried every important benchmark with our new participant (DLmalloc), but it seems that in most cases TCmalloc still provides the best performance results.

I found a promising multi-threaded, C++-based allocator called Hoard. In my next post I'll write about its results.

Thomas Fletcher (not verified) - 03/11/2010 - 04:03

These are great tests. Having spent a fair amount of time working with WebKit on embedded systems, it is nice to see the memory consumption laid out side by side with performance ... both of which matter for low end systems.

If you've only got 32M of total memory for your system, then a difference of 2.3M can be pretty significant (though I'm sure the percentages scale). Looking forward to your results with Hoard. My experience is that it has good performance for multi-threaded apps, but in a single threaded shoot out doesn't do as well.

Thomas
www.cranksoftware.com
