War of allocators: TLSF in action

The main focus of my next experiment is yet another allocator called TLSF (Two-Level Segregated Fit). According to its website, TLSF offers bounded response time, fast and efficient allocation, and efficient memory usage. Furthermore, the site promises low memory fragmentation. Can it deliver all this inside WebKit?
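
To give a feel for where the bounded response time comes from: TLSF finds the right free list with a few bit operations on the size instead of walking a list. Below is a minimal sketch of that two-level mapping, written in Python for brevity. The constant `SL_INDEX_BITS` and the function name `tlsf_mapping` are illustrative choices of mine, not taken from the TLSF implementation benchmarked in this post.

```python
SL_INDEX_BITS = 5  # assumption: 2**5 = 32 second-level lists per first-level class


def tlsf_mapping(size):
    """Map a block size to (first-level, second-level) free-list indices."""
    if size < (1 << SL_INDEX_BITS):
        # Real allocators treat very small blocks specially; this sketch does not.
        raise ValueError("size too small for this sketch")
    # First level: floor(log2(size)), i.e. the position of the highest set bit.
    fl = size.bit_length() - 1
    # Second level: drop the leading bit and keep the next SL_INDEX_BITS bits,
    # which splits [2**fl, 2**(fl + 1)) into 2**SL_INDEX_BITS equal sub-ranges.
    sl = (size >> (fl - SL_INDEX_BITS)) - (1 << SL_INDEX_BITS)
    return fl, sl
```

For example, `tlsf_mapping(460)` lands in first-level class 8 (sizes 256 to 511) and one of its 32 sub-ranges. The work done does not depend on how many free blocks exist, which is the point of the design.
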
This post is an extended version of my "War of allocators: hoard or hoards?" post, so the non-TLSF results might be familiar from my former post. My benchmark machine is an x86 Debian-Lenny with an SMP kernel and a dual-core 2.33GHz CPU. Naturally, I used the Linux-Qt port of WebKit and I chose its official r55720 revision. The memory results represent the maximum resident set size (RSS) of the process, and they are provided by our modified Linux kernel.
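
The numbers in this post come from our modified kernel. For readers who want a rough equivalent without kernel changes, here is a sketch that reads the peak RSS of a child process through `getrusage`. This is a substitute technique, not what was used for the measurements here. It assumes Linux, where `ru_maxrss` is reported in kilobytes, and the helper name `peak_rss_kib` is hypothetical.

```python
import resource
import subprocess
import sys


def peak_rss_kib(command):
    """Run command to completion and return the peak RSS of waited-for children."""
    subprocess.run(command, check=True)
    # RUSAGE_CHILDREN covers every child this process has already waited for,
    # and ru_maxrss is the largest peak among them. To measure one workload
    # cleanly, call this from a fresh process per workload.
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```

Calling `peak_rss_kib([sys.executable, "-c", "pass"])` returns the peak for a trivial Python child. For a browser benchmark, the command list would be whatever launches the test run.
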

Methanol

Methanol is our JavaScript-based benchmark. It uses QtLauncher to load and render locally mirrored popular web pages one by one (currently 9 pages, 5 times each).

I think Methanol is the most "close to real life" benchmark, and unfortunately the TLSF implementation I used doesn't show good values here. TLSF is 33% slower than TCmalloc... I hoped this was only a measurement glitch, but the value is an average over ten runs with a low (<5%) standard deviation. Furthermore, to deliver this lame performance result, it consumes 20% more memory than TCmalloc.
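
The comparison above comes down to three steps: average the repeated runs, check that the relative standard deviation is small, then express the difference relative to the baseline. Here is a sketch of that arithmetic. The function names are mine, and the sample timings are invented for illustration only; they are not measured data.

```python
from statistics import mean, stdev


def summarize(samples):
    """Return (mean, relative standard deviation in percent) of repeated runs."""
    m = mean(samples)
    return m, 100.0 * stdev(samples) / m


def slowdown_percent(candidate_mean, baseline_mean):
    """How much slower the candidate is than the baseline, in percent."""
    return 100.0 * (candidate_mean - baseline_mean) / baseline_mean


# Hypothetical timings in milliseconds, made up for this example.
baseline_runs = [200, 204, 196]
candidate_runs = [250, 255, 245]

baseline_mean, baseline_rsd = summarize(baseline_runs)
candidate_mean, candidate_rsd = summarize(candidate_runs)
print(slowdown_percent(candidate_mean, baseline_mean), baseline_rsd, candidate_rsd)
```

Note that "X% slower" here is always relative to the baseline's mean, so swapping the roles of the two allocators gives a different percentage.
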

SunSpider in QtLauncher

This test runs the popular SunSpider benchmark suite inside QtLauncher. It does minimal rendering and puts the emphasis on executing SunSpider's JavaScripts.

On the performance side, TLSF produces the same results as the system malloc, which means only a 6 ms handicap compared to TCmalloc. On the memory side, TLSF consumes 3.5% less memory than TCmalloc. Wow, I would not have believed this after Methanol's results.

V8 in QtLauncher

The V8 benchmark suite is the official benchmark of the V8 JavaScript engine. How do WebCore and JavaScriptCore perform with TLSF on V8?

TCmalloc is faster than TLSF by 5.6%, although it consumes 1.6% more memory. I think these values are not too bad.

WindScorpion in QtLauncher

WindScorpion is our collection of real-life JavaScripts. These scripts were not written with benchmarking in mind.

Huh, I don't know why TLSF performs so badly on these long-running tests, but it is 25% slower than TCmalloc, and it consumes 24% more memory. This is unacceptable...

Two SunSpider workers in QtLauncher

With JavaScript workers we are able to run JavaScript applications simultaneously... So let's do this with the SunSpider benchmark suite!

As the chart shows, TCmalloc is the fastest allocator on this multi-threaded benchmark. It is 31% faster than TLSF, but the good result has a price: it consumes 10.8% more memory than TLSF.

Two V8 workers in QtLauncher

We can run the V8 tests simultaneously as well. Let's do it!

TCmalloc performs as well as it did in the SunSpider case. It is 28% faster than TLSF, but it consumes 3.2% more memory.

Summary

In most cases, TCmalloc provides better values than TLSF. Interestingly, on the tests that consist of micro-benchmarks, TLSF beats TCmalloc on memory consumption, but unfortunately it does not on the long-running, real-life tests. We cannot say that it is bad for WebKit, but it doesn't fully convince me.

lu_zero (not verified) - 03/27/2010 - 13:34

What about checking http://code.google.com/p/compcache/wiki/xvMalloc as next candidate?

acrespo (not verified) - 03/30/2010 - 22:04

One crucial aspect when you compare several allocators is to know in detail which is the load profile (histograms are very useful).

Some allocators work well with small size requests, and if the load is concentrated in small sizes the performance could appear good.

From the temporal point of view it is important to know which is the average but also the standard deviation.

TLSF was designed for real-time systems, and some possible optimisations (e.g. to handle small size requests) were not considered in order to maintain the constant cost.

Troglodyte (not verified) - 04/13/2010 - 02:56

Would you test nedmalloc? On his homepage he says,

"nedmalloc is a VERY fast, VERY scalable, multithreaded memory allocator with little memory fragmentation. It is faster in real world code than Hoard, faster than tcmalloc, faster than ptmalloc2 and it scales with extra processing cores better than Hoard, better than tcmalloc and better than ptmalloc2 or ptmalloc3. Put another way, there is no faster portable memory allocator out there!"

... but is it true??

zoltan.horvath - 04/13/2010 - 10:22

lu_zero: I checked xvMalloc and talked with Nitin Gupta. xvMalloc is part of the Linux kernel, and for our purposes it would need to be ported to userspace, which would be a non-trivial effort. Nitin said TLSF and xvMalloc should give nearly identical performance results, so this post is enough. Btw, thanks for your response and the tip!

acrespo: You are right! Thanks for your points. We're now working on a more detailed memory measurement system. I hope I can show its results in my next post.

Troglodyte: Nedmalloc could be interesting. I'll check it, but is it working on Linux as well?

Troglodyte (not verified) - 04/14/2010 - 19:04

yes, it's working on Linux.

for an example of nedmalloc in portable project, you might check on the Ogre3D engine. nedmalloc is Ogre's default memory allocator -- users have been reporting faster load times since nedmalloc became the default :)

Pignot (not verified) - 08/05/2010 - 20:51

hi, what about nedmalloc, any tests done ???

Patrick (not verified) - 10/21/2014 - 11:17

Version matters for this post to have long lasting value.

According to https://www.facebook.com/notes/facebook-engineering/scalable-memory-allo..., jemalloc performs better than tcmalloc at Facebook, which contrasts with the results here.

So, what are the versions?
