A new tool to extract meaningful memory consumption values. Valgrind shines again.

Every program allocates memory, regardless of its size. Reducing it is an important task and an eternal struggle. Using advanced memory profiling tools, it is easy to see that the most memory is consumed by ... hash tables, vectors, unicode strings, mmaped memory regions, etc. But is THIS what you are interested in? Isn't it would be better to know how much memory consumed by WebCore, JavaScriptCore, Qt, or by a particular component, like WebCore/css?

About the freya tool

Ever heard about valgrind? Many knows it is a leak hunting tool. Well, that is half (or less) of the story. Valgring is a JIT engine, which can fully recompile existing binaries. Why is that useful? Because you can add custom code modifications to existing binary programs, like calling a user callback function, whenever a memory read or write happens, and capturing the address and size of that particular operation. Although the generated code is slower than the original one, it is still way faster than debugging the program step-by-step. Several tools built upon valgrind, like the famous memcheck tool. However valgrind has several other tools, like dynamic call graph tracer and memory profiler.

I developed a new tool to push further the limits of valgrind, called freya. Freya is memory allocation tracker, both for explicit (malloc, posix_memalign) and implicit (allocating 4k pages from an address space returned by mmap). It not just records the allocations, but records their stack trace as well. The real interesting feature of freya is that you can define rules about how the extracted information should be organized to get meaningful values.

Rules

The rules are divided into two groups:

  • First, we can define rules to discard the top items of the stack trace. This is useful, if we don't interested in hash table and vector allocations, only those functions, which actually called them. The trace cannot be fully discarded, at least the last item remains.
  • The reamining stack trace is stored in a tree, so the allocations in the same code position are grouped together. We can define rules where the stack trace should be inserted into the tree. (The concept is similar to file system mounting). These rules are evaluated in first come first served basis.
 +--+--+--+--+--+--+-           +--+--+-
 | 0| 1| 2| 3| 4| 5| ...   ->   | 4| 5| ...
 +--+--+--+--+--+--+-           +--+--+-
   ^        ^
   |        |

For example, if a discard rule matches to the first and fourth items of the stack trace, the first four items of the trace are discarded. The position of the fifth element is determined by the second set of rules, and the remaining stack trace is inserted to that position in the tree.

 +--+--+--+--+
 | 0| 1| 2|3a| - Stack trace 1
 +--+--+--+--+

 +--+--+--+--+--+
 | 0| 1| 2|3b|4b| - stack trace 2
 +--+--+--+--+--+

 Generated tree:

       +--+
       | 0|
       +--+
        |
       +--+
       | 1|
       +--+
        |
       +--+
       | 2|
       +--+
     /      \
   +--+    +--+
   |3a|    |3b|
   +--+    +--+
            |
           +--+
           |4b|
           +--+

The items in the stack trace become tree nodes, creating natural groups. These groups collect the allocations on the same code locations.

Compiling the program

We should note first, that valgrind has its own application binary interface, so WebKit JIT should be disabled when WebKit is run by valgrind. Moreover, to evaluating the rules, we need debug information (-g for gcc).

In short, you have to pass the following arguments to build-webkit (under Qt):
QMAKE_CXXFLAGS+=-g QMAKE_LFLAGS+=-g DEFINES+=ENABLE_JIT=0 DEFINES+=ENABLE_YARR=0 DEFINES+=ENABLE_YARR_JIT=0 DEFINES+=USE_SYSTEM_MALLOC

Configuring freya

The rules are stored in a config file, each line contains either a rule, or a tree node. The lines which started by hash-mark (#) or only contains spaces are ignored.

The rules have the following format: rule_name function_name path

The rule_name can be a unique identifier, or a hypen, which means this rule is a discard rule. If a discard rule matches to an item in the stack trace, all items are removed to that point in the stack trace.

Both function_name and path are optional, but at least one of them must be appear in the line. The function_name can be a name of a function enclosed in parentheses (), or a namespace name enclosed in braces {}. The function_name is omitted, if neither parentheses nor braces are found. The name of a function must be a fully qualified name, and must match to the current function name. The name of a namespace must match to the beginning of the current function name, followed by two colons ::.

The path is created from the remaining characters of the line. The spaces before the path are omitted. The path contains directory names separated by slash /, and they must partially match to the path of the current file name. I.e: dir1/dir2 does not match to /home/user/dir1/dir22/main.cpp, but dir1/main.cpp matches to both /home/user/dir1/main.cpp and /home/user/dir1/main.cpp/real_main.cpp.

If both function_name and path are specified, they both must be match to the current function.

# Examples:

# Matches to main (...)
  rule1   (main)

# Matches to main (...) in main.cpp
# Note: this rule never matches, since rule1 already captures all main(...)-s
  rule2   (main)   main.cpp

# Discard all functions which qualified name starts with NS1::NS2::
  -  {NS1::NS2}

# Mathes to all functions in dir1/main.cpp
  rule3   dir1/main.cpp

The tree nodes must be enclosed in square brackets, braces, or parentheses.

[name] defines a new tree node. In case of tree nodes, the indentation is very important, since its also represents the depth in the tree (the nodes which have the same indentation have the same depth). The tree nodes collects the memory consumption information (total, peak, etc.).

{name} attach an already defined rule to the last tree node. One rule can only be attached to at most one tree node. If any of the attached rule matches, the current stack trace is inserted to that tree node.

(name) is just an abbreviation of [name] followed by a {name}

To make a node to the default tree node (for functions which does not match to any rule), specify a plus sign + after the node definition.

# Example:
  [Total]
     (rule1)
     [Sub-total]+
        [for rule2]
           {rule2}
  (rule3)
     [Empty node]

Note: Only spaces can be used for separation, tabs are threated as normal characters!

Where can I get the source code?

The source code of freya is attached to this post. It is the source code only for the tool, not the whole valgrind. Valgrind has a nice documentation about writing new tools here. The freya tool works well with valgrind-3.5.0 (stable).

AttachmentSize
freya.tgz9.71 KB

lu_zero (not verified) - 02/16/2010 - 17:25

Would be helpful if you update this post with a mention about how to run freya and how to feed the configuration file

zoltan.herczeg - 02/17/2010 - 08:15

Just like any other tools:
valgrind --tool=freya --config="config file path" [... other options ...]

Help is also available:
valgrind --tool=freya --help

research paper (not verified) - 02/18/2010 - 06:49

yeah right. great suggestion lu_zero. The author did not mention it. I hope he will update this one as soon as possible cos I can make use of freya for future projects. Thanks.

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • No HTML tags allowed
  • Lines and paragraphs break automatically.

More information about formatting options

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
Fill in the blank