Skip to content

Generating Flamegraphs for a Running Node Service on Linux

This guide assumes a running instance of Lodestar and will walk through how to generate a flamegraph for the process while running on Linux. While it is possible to run Lodestar in a number of ways, for performance profiling it is recommended to not use Dockerized implementations. It is best to run Lodestar as a service on a Linux machine. Follow the Lodestar docs to get the service installed and running. Then come back here when you are ready to generate the flamegraph.

Modifying Linux and Lodestar

Use the following two commands to install perf for generating the stack traces. You may get a warning about needing to restart the VM due to kernel updates. This is nothing to be concerned with and if so, cancel out of the restart dialog.

sudo apt-get install linux-tools-common linux-tools-generic
sudo apt-get install linux-tools-`uname -r`  # empirically this throws if run on the same line above

Next we need to update the Lodestar service by modifying the start script. We need to add a necessary flag --perf-basic-prof to allow the stack traces to be useful. Node is a virtual machine and perf is designed to capture host stack traces. In order to allow the JavaScript functions to be captured meaningfully, v8 can provide some help. Generally Lodestar is started with a script like the following:

Example start_lodestar.sh

node \
  --perf-basic-prof \
  --max-old-space-size=8192 \
  /usr/src/lodestar/packages/cli/bin/lodestar \
  beacon \
  --rcConfig /home/devops/beacon/rcconfig.yml

After updating the start script, restart the node process running the beacon service. Note in the command below, that the beacon service may have a different name or restart command, depending on your setup.

admin@12.34.56.78: sudo systemctl restart beacon

The flag that was added notifies V8 to output a map of functions and their addresses. This is necessary for perf to generate the stack traces for the virtual machine in addition to the traditional host stack traces. There is a very small, performance overhead to output the maps. After a short while, once the process runs for a bit the functions will no longer be moving in memory and the overhead will be significantly reduced. The VM will still be moving objects around but this flag is generally safe to run in production. After a few minutes of running, listing the directory with the start script (process.cwd()) will look similar:

-rw-r--r--  1 admin admin   9701529 May 22 00:36 beacon-2023-05-22.log
-rwxrwxr-x  1 admin root        421 May 22 00:31 beacon_run.sh
drwxr-xr-x  2 admin admin    917504 May 22 00:35 chain-db
-rw-r--r--  1 admin admin   2861242 May 22 00:36 isolate-0x6761520-2085004-v8.log
-rw-r--r--  1 admin admin    203172 May 22 00:36 isolate-0x7fa2f0001060-2085004-v8.log
-rw-r--r--  1 admin admin     68044 May 22 00:36 isolate-0x7fcd80001060-2085004-v8.log
-rw-r--r--  1 admin admin    420809 May 22 00:36 isolate-0x7fcd84001060-2085004-v8.log
-rw-r--r--  1 admin admin    123919 May 22 00:36 isolate-0x7fcd88001060-2085004-v8.log
-rw-r--r--  1 admin admin     94391 May 22 00:35 isolate-0x7fcd8c001060-2085004-v8.log
-rw-r--r--  1 admin admin    183831 May 22 00:36 isolate-0x7fcd90000e60-2085004-v8.log
-rw-r--r--  1 admin admin    152786 May 22 00:36 isolate-0x7fcd94000e60-2085004-v8.log
-rw-r--r--  1 admin admin    262333 May 22 00:36 isolate-0x7fcd98000e60-2085004-v8.log
-rw-r--r--  1 admin admin    218473 May 22 00:36 isolate-0x7fcd9c000e60-2085004-v8.log
-rw-r--r--  1 admin admin    366788 May 22 00:36 isolate-0x7fcda0000e60-2085004-v8.log
-rw-r--r--  1 admin admin    304917 May 22 00:36 isolate-0x7fcda4000e60-2085004-v8.log
-rw-r--r--  1 admin admin    586238 May 22 00:36 isolate-0x7fcda8000e60-2085004-v8.log
-rw-r--r--  1 admin admin    450675 May 22 00:36 isolate-0x7fcdac000e60-2085004-v8.log
-rw-r--r--  1 admin admin    768470 May 22 00:36 isolate-0x7fcdb8000d60-2085004-v8.log
-rw-r--r--  1 admin root        559 May 21 14:17 rcconfig.yml

The isolate-*-v8.log files are the maps that v8 outputs for the perf command to reference. You are now ready to collect the stack traces.

Capturing Stack Traces

The first command below will run perf for 60 seconds, and then save the output to a file named perf.out. The second one will merge the exported, unknown, tokens with the isolate maps and output full stack traces for the render. Running both perf commands in the folder with the isolate maps will allow the data to be seamlessly spliced. Once the output is saved, update the permissions so the file can be copied to your local machine via scp.

You can modify the frequency of capture by changing -F 99 to a different number. Try to stay away from whole numbers as they are more likely to cause interference with periodically scheduled tasks. As an example use 99Hz or 997Hz instead of 100Hz or 1000Hz. In testing neither seemed to have an appreciable affect on CPU usage when run for a short period of time.

To change the period of capture adjust the sleep duration (which is in seconds).

The pgrep command is used to find the process id to capture against. Feel free to pass a number to the -p flag if you know the process id, or adjust the file path if the executable is in a different location.

admin@12.34.56.78: sudo perf record -F 99 -p $(pgrep -f '/usr/src/lodestar/packages/cli/bin/lodestar beacon') -g -- sleep 60
admin@12.34.56.78: sudo perf script -f > perf.out
admin@12.34.56.78: sudo chmod 777 ~/beacon/perf.out

And then copy the perf.out file to your local machine to render the flamegraph. Running at 99Hz for 180 seconds results in a file size of about 3.5MB and 997Hz for 60 seconds is roughly 4.4MB.

scp admin@12.34.56.78:/home/devops/beacon/out.perf /some_temp_dir/perf.out

Rendering a Flamegraph

By far the best tool to render flamegraphs is flamescope from Netflix. It allows for easy analysis and zooming into specific time periods. It also give a holistic view of how the process is performing over time.

Installation

Python3 is required. Clone the repository and install the dependencies:

The original is no longer maintained and had a configuration bug. This is a fork that fixes the issue.

git clone https://github.com/matthewkeil/flamescope
cd flamescope
pip3 install -r requirements.txt
yarn

Usage

mv /some_temp_dir/perf.out /path/to/flamescope/examples
yarn dev

Then navigate in a browser to http://localhost:8080 and begin analyzing the data.

flamescope home screen flamescope home screen flamescope home screen flamescope home screen flamescope home screen

Filtering Results

There can be a lot of "noise" in the stack traces with libc, v8 and libuv calls. It is possible to filter the results to make it more useful, but note this will skew the results. Looking at the graph both filtered and unfiltered can be beneficial. The following sed command will remove the noise from the stack traces.

sed -r -e "/( __libc_start| uv_| LazyCompile | v8::internal::| node::| Builtins_| Builtin:| Stub:| LoadIC:| \\[unknown\\]| LoadPolymorphicIC:)/d" -e 's/ LazyCompile:[*~]?/ /'

Unfiltered

flamescope home screen

Filtered

flamescope home screen

References

List of Web References

Visualization Tools

Collecting on Linux

Collecting on MacOS