Heap Analyzer
WARPO heap-analyzer is a heap snapshot analysis tool for AssemblyScript programs running on the incremental runtime. It is designed for the stage where you already know that memory is growing or staying alive for too long, but a raw linear-memory dump is still too low-level to explain why.
1. Why do we need heap-analyzer?
At runtime, the WebAssembly heap is just bytes. A raw memory dump tells you that memory exists, but not which objects are alive, which constructors dominate the live heap, or what changed between two moments in time.
There are two recurring debugging questions:
- What is currently taking space?
- What changed after a specific action, scene switch, request, or frame?
WARPO heap-analyzer answers these two questions with two views modeled after the Chrome DevTools heap snapshot workflow.
Constructor View
Constructor view solves the “what is taking space right now?” problem.
An object-by-object list is usually too noisy:
- addresses are unstable across runs
- hundreds or thousands of objects from the same class get mixed together
- root-oriented data explains reachability, but not where memory is concentrated
Constructor view groups live objects by className first, then lets you drill down into individual instances. That makes it easy to spot patterns such as:
- one constructor unexpectedly dominating retained size
- a container class with a small shallow size but a very large retained size
- a single oversized instance that keeps an entire object subgraph alive
Comparison View
Comparison view solves the “what changed between two snapshots?” problem.
Without a structured diff, you end up manually comparing addresses or scanning two large JSON files side by side. That is both slow and misleading, because the interesting question is not “which addresses differ?” but rather:
- which constructors gained objects
- which constructors lost objects
- how many bytes were newly allocated
- how many bytes were freed
WARPO comparison view follows the DevTools mental model:
- constructor rows show New, Delete, Delta, Alloc. size, Freed size, and Size Delta
- expanded instance rows show only changed instances
- instance rows keep per-object retainedSize for drill-down, while constructor diff rows intentionally use shallow-size-based columns
This split matters. Retained size is very useful for understanding ownership in a single snapshot, but it overlaps across objects and is not a good top-level diff column. For change analysis, shallow-size deltas are much easier to reason about.
2. How to use heap-analyzer?
Prerequisites
heap-analyzer needs two inputs:
- A dump file captured from the running WebAssembly module.
- The corresponding wasm binary compiled with DWARF debug information.
The debug information is required because heap-analyzer reconstructs class layouts, field metadata, and global-root metadata from DWARF. Without --debug, the analyzer cannot map runtime objects back to source-level constructors and reference fields reliably.
Compile your wasm with debug info enabled:
./build/warpo/warpo_asc ./build_work/dwarfFixture.ts -o ./build_work/dwarfFixture.wasm --debug
If you use project configuration instead of direct compiler arguments, the effective requirement is the same: the build must emit DWARF debug information.
Dump a Heap Snapshot at Runtime
heap-analyzer does not pause a running wasm VM by itself. Instead, your program imports a host function that writes a dump at the point you choose.
On the wasm side, declare a host import and call it with a UTF-8 encoded output path:
@external("MemoryDump", "dumpMemoryRegion")
declare function dumpMemoryRegion(filePathOffset: i32, filePathSize: i32): void;
const dumpPath = "./build_work/example-before.dump";
const encodedPath = String.UTF8.encode(dumpPath);
dumpMemoryRegion(changetype<i32>(encodedPath), String.UTF8.byteLength(dumpPath));
The import itself is just a hook. The host runtime must implement MemoryDump.dumpMemoryRegion and serialize the current heap state to disk. If you are using wasm-compiler as the host, this import is already supported by its MemoryDump extension: MemoryDumpAPI.cpp.
In a real Node host, dumpMemoryRegion also needs to decode the UTF-8 path from wasm linear memory and then forward that path to the dump writer:
const imports = {
  MemoryDump: {
    dumpMemoryRegion(offset, size) {
      const guestPath = Buffer.from(exports.memory.buffer, offset, size).toString("utf8");
      writeDump(exports, resolve(projectRoot, guestPath));
    },
  },
};
A minimal JavaScript host-side implementation looks like this:
import { mkdirSync, writeFileSync } from "node:fs";
import { dirname, resolve } from "node:path";

const DUMP_MAGIC = new Uint8Array([0x41, 0x53, 0x48, 0x44]); // 'A' 'S' 'H' 'D'
const HEADER_SIZE = 24; // six little-endian u32 fields
const DUMP_VERSION = 2;

function writeDump(exports, outputPath) {
  // Copy the whole linear memory behind a fixed-size header.
  const raw = new Uint8Array(exports.memory.buffer);
  const dump = new Uint8Array(HEADER_SIZE + raw.byteLength);
  const view = new DataView(dump.buffer);

  dump.set(DUMP_MAGIC, 0);
  view.setUint32(4, DUMP_VERSION, true);
  // Runtime boundary globals exported by the module.
  view.setUint32(8, Number(exports.__data_end.value), true);
  view.setUint32(12, Number(exports.__heap_base.value), true);
  view.setUint32(16, Number(exports.__stack_pointer.value), true);
  // numMutableI32GlobalValues: this minimal writer serializes none.
  view.setUint32(20, 0, true);
  dump.set(raw, HEADER_SIZE);

  mkdirSync(dirname(outputPath), { recursive: true });
  writeFileSync(resolve(outputPath), dump);
}
The dump file format is:
- magic: 'A' 'S' 'H' 'D'
- version
- __data_end
- __heap_base
- __stack_pointer
- numMutableI32GlobalValues
- mutableI32GlobalValues[]
- the raw linear memory payload
The minimal Node example above intentionally writes 0 for numMutableI32GlobalValues. That is enough for simple fixtures, but it is not the full dump format.
The mutableI32GlobalValues[] section is used to reconstruct GC global roots. heap-analyzer maps the serialized mutable i32 global values back to wasm global indices, then uses them to recover managed objects kept alive by global variables.
Today, the Node-side example does not have a general way to enumerate and serialize all runtime mutable i32 globals from an arbitrary instance. Because of that, the example focuses on the stable part of the format: runtime boundary globals plus the raw memory image.
If your program keeps managed objects in mutable wasm globals, omitting mutableI32GlobalValues[] can make some live objects look unreachable. For full GC-global accuracy, your host-side dump writer needs to serialize those mutable i32 globals before the raw memory payload.
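As a rough sketch of what a fuller writer could look like, the helper below builds the dump bytes with a non-empty mutableI32GlobalValues[] section. The function name buildDump, the bounds object, and the encoding of each value (a 4-byte little-endian i32, in whatever order the analyzer expects) are assumptions made for illustration; the source does not specify them, and collecting the values from the instance is left to the host.

```javascript
// Sketch only. Assumption: each global value is written as a 4-byte
// little-endian i32 directly after the 24-byte fixed header.
const DUMP_MAGIC = [0x41, 0x53, 0x48, 0x44]; // 'A' 'S' 'H' 'D'
const FIXED_HEADER_SIZE = 24;
const DUMP_VERSION = 2;

// memoryBytes: Uint8Array copy of linear memory
// bounds: { dataEnd, heapBase, stackPointer } (hypothetical shape)
// mutableI32GlobalValues: array of i32 values gathered by the host
function buildDump(memoryBytes, bounds, mutableI32GlobalValues) {
  const globalsSize = mutableI32GlobalValues.length * 4;
  const dump = new Uint8Array(FIXED_HEADER_SIZE + globalsSize + memoryBytes.byteLength);
  const view = new DataView(dump.buffer);

  dump.set(DUMP_MAGIC, 0);
  view.setUint32(4, DUMP_VERSION, true);
  view.setUint32(8, bounds.dataEnd, true);
  view.setUint32(12, bounds.heapBase, true);
  view.setUint32(16, bounds.stackPointer, true);
  view.setUint32(20, mutableI32GlobalValues.length, true);

  // Global values come before the raw memory payload.
  mutableI32GlobalValues.forEach((value, i) => {
    view.setInt32(FIXED_HEADER_SIZE + i * 4, value, true);
  });
  dump.set(memoryBytes, FIXED_HEADER_SIZE + globalsSize);
  return dump;
}
```

The returned bytes can be written to disk the same way the minimal example does. Which globals to include, and in which order, has to match what heap-analyzer expects for the given wasm binary.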
To use comparison view, capture two dumps at two different moments, for example:
- before entering a scene and after leaving it
- before a request and after the response is fully processed
- before and after a stress loop that is suspected to leak
Run Constructor View
If warpo is installed in your project, invoke heap-analyzer directly from the package contents:
node ./node_modules/warpo/dist/heap_analyzer/cli.js
Inside the repository, the equivalent entry point is still node ./tools/heap_analyzer/bin/cli.js.
Basic usage:
node ./node_modules/warpo/dist/heap_analyzer/cli.js analyze ./build_work/example-before.dump --wasm ./build_work/dwarfFixture.wasm
Useful options:
node ./node_modules/warpo/dist/heap_analyzer/cli.js analyze ./build_work/example-before.dump --wasm ./build_work/dwarfFixture.wasm --sort retained
node ./node_modules/warpo/dist/heap_analyzer/cli.js analyze ./build_work/example-before.dump --wasm ./build_work/dwarfFixture.wasm --sort shallow
node ./node_modules/warpo/dist/heap_analyzer/cli.js analyze ./build_work/example-before.dump --wasm ./build_work/dwarfFixture.wasm --sort count --top 20
The output is JSON with this top-level shape:
{
  "totalHeapSize": 27476,
  "totalLiveSize": 5904,
  "constructors": [
    {
      "className": "build_work/dwarfFixture/TreeNode",
      "count": 31,
      "totalShallowSize": 992,
      "totalRetainedSize": 4128,
      "instances": [
        {
          "address": 41088,
          "shallowSize": 32,
          "retainedSize": 992
        }
      ]
    }
  ]
}
Field meanings:
- totalHeapSize: total TLSF-managed heap region currently present in linear memory. This is allocator space, not just live objects.
- totalLiveSize: sum of shallow sizes of all reachable objects.
- className: source-level constructor name resolved from DWARF.
- count: number of live instances for that constructor.
- totalShallowSize: sum of shallow sizes for instances of that constructor.
- totalRetainedSize: sum of retained sizes for instances of that constructor.
- instances: per-instance drill-down rows.
For each instance:
- address: payload pointer in wasm linear memory.
- shallowSize: allocator cost of that object itself. In WARPO this is based on the object block size, so it includes allocator/header overhead rather than only source-visible field bytes.
- retainedSize: the amount of memory that would be freed if this object were deleted and everything it exclusively keeps alive became unreachable.
One important caveat: constructor-level totalRetainedSize is useful for ranking, but it is not a partition of totalLiveSize. Retained regions overlap when you sum them across many objects, so constructor totals can exceed the live heap size.
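A tiny worked example may make the overlap concrete. The numbers below are hypothetical: three 32-byte objects form a chain a -> b -> c, and only a is rooted.

```javascript
// Hypothetical chain of three 32-byte objects: a -> b -> c, only `a` is rooted.
const shallowSize = 32;

const retained = {
  a: 3 * shallowSize, // a exclusively keeps b and c alive
  b: 2 * shallowSize, // b exclusively keeps c alive
  c: 1 * shallowSize, // c keeps nothing else alive
};

const totalLiveSize = 3 * shallowSize;
const summedRetained = retained.a + retained.b + retained.c;

console.log({ totalLiveSize, summedRetained });
// { totalLiveSize: 96, summedRetained: 192 }
```

The bytes of c are counted three times in the sum, which is why a summed retained column is suited to ranking rather than accounting.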
Run Comparison View
Comparison view takes a baseline dump and a current dump:
node ./node_modules/warpo/dist/heap_analyzer/cli.js diff ./build_work/example-before.dump ./build_work/example-after.dump --wasm ./build_work/dwarfFixture.wasm
Useful options:
node ./node_modules/warpo/dist/heap_analyzer/cli.js diff ./build_work/example-before.dump ./build_work/example-after.dump --wasm ./build_work/dwarfFixture.wasm --sort delta
node ./node_modules/warpo/dist/heap_analyzer/cli.js diff ./build_work/example-before.dump ./build_work/example-after.dump --wasm ./build_work/dwarfFixture.wasm --sort alloc
node ./node_modules/warpo/dist/heap_analyzer/cli.js diff ./build_work/example-before.dump ./build_work/example-after.dump --wasm ./build_work/dwarfFixture.wasm --sort freed --top 20
The top-level output shape is:
{
  "beforeTotalHeapSize": 27476,
  "afterTotalHeapSize": 28000,
  "totalHeapSizeDelta": 524,
  "beforeTotalLiveSize": 5904,
  "afterTotalLiveSize": 6400,
  "totalLiveSizeDelta": 496,
  "constructors": [
    {
      "className": "build_work/dwarfFixture/TreeNode",
      "newCount": 2,
      "deletedCount": 0,
      "countDelta": 2,
      "allocatedSize": 64,
      "freedSize": 0,
      "sizeDelta": 64,
      "instances": [
        {
          "address": 50000,
          "shallowSize": 32,
          "retainedSize": 96,
          "changeKind": "new"
        }
      ]
    }
  ]
}
Field meanings follow the DevTools comparison model:
- newCount: instances present only in the current dump.
- deletedCount: instances present only in the baseline dump.
- countDelta: newCount - deletedCount.
- allocatedSize: sum of shallow sizes for new instances.
- freedSize: sum of shallow sizes for delete instances.
- sizeDelta: allocatedSize - freedSize.
Expanded instance rows keep:
- address
- shallowSize
- retainedSize
- changeKind: "new" | "delete"
Comparison view matches instances by className + address. If the same class/address pair exists in both dumps, it is treated as unchanged and omitted from the expanded change list.
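The matching rule and the diff columns can be sketched in a few lines. This is an illustration under assumptions, not the analyzer's actual code: the function name diffSnapshots and the flat input rows of { className, address, shallowSize } are hypothetical.

```javascript
// Sketch: diff two flat lists of live objects by className + address.
// Input rows are hypothetical: { className, address, shallowSize }.
function diffSnapshots(before, after) {
  const key = (o) => `${o.className}@${o.address}`;
  const beforeKeys = new Set(before.map(key));
  const afterKeys = new Set(after.map(key));

  const rows = new Map();
  const rowFor = (className) => {
    if (!rows.has(className)) {
      rows.set(className, {
        className, newCount: 0, deletedCount: 0, countDelta: 0,
        allocatedSize: 0, freedSize: 0, sizeDelta: 0, instances: [],
      });
    }
    return rows.get(className);
  };

  // Present only in the current dump: "new".
  for (const o of after) {
    if (beforeKeys.has(key(o))) continue; // unchanged, omitted
    const row = rowFor(o.className);
    row.newCount += 1;
    row.allocatedSize += o.shallowSize;
    row.instances.push({ ...o, changeKind: "new" });
  }
  // Present only in the baseline dump: "delete".
  for (const o of before) {
    if (afterKeys.has(key(o))) continue;
    const row = rowFor(o.className);
    row.deletedCount += 1;
    row.freedSize += o.shallowSize;
    row.instances.push({ ...o, changeKind: "delete" });
  }
  for (const row of rows.values()) {
    row.countDelta = row.newCount - row.deletedCount;
    row.sizeDelta = row.allocatedSize - row.freedSize;
  }
  return [...rows.values()];
}
```

One consequence of address-based matching is worth keeping in mind: if an object is freed and a different object of the same class is later allocated at the same address, this rule treats the pair as unchanged.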
3. How is heap-analyzer implemented?
At a high level, heap-analyzer has three layers:
- dump capture
- single-snapshot analysis
- snapshot-to-snapshot comparison
Dump Capture
The runtime dump file is a lightweight container around a raw memory snapshot. The current format stores:
- dump magic and version
- __data_end
- __heap_base
- __stack_pointer
- the number of serialized mutable i32 global values
- serialized mutable i32 global values
- the linear memory payload itself
The analyzer entrypoint first parses this dump header and then exposes the remaining bytes as a DataView over wasm memory. Those serialized mutable i32 globals are specifically used to recover GC global roots.
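A header parser along these lines could look like the sketch below. The name parseDump and the returned object shape are hypothetical, and the sketch assumes all header fields are little-endian u32 values and that each serialized global occupies 4 bytes, which the source does not spell out.

```javascript
// Sketch: parse the dump header and expose the rest as a DataView.
// Assumption: little-endian u32 fields, 4 bytes per serialized global.
function parseDump(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (magic !== "ASHD") throw new Error(`unexpected dump magic: ${magic}`);

  const globalCount = view.getUint32(20, true);
  const mutableI32GlobalValues = [];
  for (let i = 0; i < globalCount; i++) {
    mutableI32GlobalValues.push(view.getInt32(24 + i * 4, true));
  }

  // Everything after the header and globals is the linear memory image,
  // so offsets into `memory` are wasm addresses.
  const memoryOffset = 24 + globalCount * 4;
  return {
    version: view.getUint32(4, true),
    dataEnd: view.getUint32(8, true),
    heapBase: view.getUint32(12, true),
    stackPointer: view.getUint32(16, true),
    mutableI32GlobalValues,
    memory: new DataView(bytes.buffer, bytes.byteOffset + memoryOffset, bytes.byteLength - memoryOffset),
  };
}
```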
Single-Snapshot Analysis Pipeline
The constructor view is built as a pipeline over one dump and one debug-enabled wasm binary.
1. Parse DWARF and runtime metadata
The analyzer reads DWARF from the wasm binary to reconstruct:
- class names
- field layouts
- reference fields
- global-root metadata
This is what lets the tool map a raw runtime type ID back to a constructor name and know where reference edges exist inside each object.
2. Enumerate heap objects from TLSF blocks
The AssemblyScript incremental runtime stores heap objects in TLSF blocks. heap-analyzer walks these blocks to enumerate allocated objects and compute their shallow sizes.
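The block walk can be illustrated with a short sketch. It assumes the block header layout of the upstream AssemblyScript TLSF allocator (a 4-byte size word whose two low bits are FREE and LEFTFREE tags, with blocks laid out back to back); WARPO's runtime may differ in detail. The function name walkBlocks and its firstBlock and end parameters are hypothetical.

```javascript
// Assumed layout (upstream AssemblyScript TLSF; WARPO may differ):
//   each block starts with a 4-byte `mmInfo` word,
//   low bit 0 = FREE, low bit 1 = LEFTFREE, remaining bits = payload size,
//   and the next block starts at block + 4 + size.
const BLOCK_OVERHEAD = 4;
const FREE = 1;
const TAGS_MASK = 3;

// memory: DataView over linear memory; firstBlock/end: byte offsets.
function walkBlocks(memory, firstBlock, end) {
  const used = [];
  let block = firstBlock;
  while (block + BLOCK_OVERHEAD <= end) {
    const mmInfo = memory.getUint32(block, true);
    const size = (mmInfo & ~TAGS_MASK) >>> 0;
    if (size === 0) break; // a zero-sized tail block ends the walk
    if ((mmInfo & FREE) === 0) used.push({ block, size });
    block += BLOCK_OVERHEAD + size;
  }
  return used;
}
```

The sketch only separates used blocks from free ones. Turning a used block into an object (payload pointer, runtime type ID, shallow size) depends on the object header layout, which is not shown here.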
3. Scan references and build the object graph
Using the DWARF-derived layouts, the analyzer scans each live block and extracts outgoing references. This produces a graph where nodes are payload pointers and edges are object references.
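In outline, the scan can be pictured as below. The input shapes are hypothetical: objects is a list of { address, className } rows, and layouts maps each class name to the byte offsets of its reference fields. The sketch also assumes references are 4-byte little-endian payload pointers.

```javascript
// Sketch: build an adjacency map from payload pointer to referenced pointers.
// memory: DataView over linear memory
// objects: [{ address, className }] (hypothetical shape)
// layouts: Map<className, number[]> of reference-field byte offsets (hypothetical)
function buildObjectGraph(memory, objects, layouts) {
  const known = new Set(objects.map((o) => o.address));
  const edges = new Map();
  for (const obj of objects) {
    const targets = [];
    for (const offset of layouts.get(obj.className) ?? []) {
      const target = memory.getUint32(obj.address + offset, true);
      // Skip null references and values that are not known payload pointers.
      if (target !== 0 && known.has(target)) targets.push(target);
    }
    edges.set(obj.address, targets);
  }
  return edges;
}
```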
4. Discover roots
The current implementation recognizes roots from three places:
- globals resolved from DWARF and wasm global metadata
- the shadow stack range between __stack_pointer and __heap_base
- pinned objects identified by the transparent GC color
5. Mark the live set
Starting from the discovered roots, the analyzer traverses the object graph and keeps only reachable objects. Everything after this point operates on the live graph only.
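The marking step is a plain graph traversal. A minimal sketch, assuming roots is an iterable of payload pointers and edges is a Map from pointer to referenced pointers (both hypothetical shapes):

```javascript
// Sketch: collect every object reachable from the roots.
function markLive(roots, edges) {
  const live = new Set();
  const worklist = [...roots];
  while (worklist.length > 0) {
    const address = worklist.pop();
    if (live.has(address)) continue; // already visited, also handles cycles
    live.add(address);
    for (const target of edges.get(address) ?? []) {
      if (!live.has(target)) worklist.push(target);
    }
  }
  return live;
}
```

An explicit worklist is used instead of recursion so that long reference chains, such as linked lists, cannot overflow the call stack.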
6. Compute retained sizes with a dominator tree
Retained size is implemented through dominator analysis.
In the live graph, if every path to object B goes through object A, then A dominates B. The analyzer builds the dominator tree and aggregates retained sizes bottom-up.
This gives a precise operational meaning to retained size:
- shallow size of the object itself
- plus the shallow sizes of live descendants that are exclusively kept alive by it
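As an illustration of the idea, the sketch below computes immediate dominators with the simple iterative algorithm by Cooper, Harvey, and Kennedy and then sums shallow sizes up the dominator tree. The source does not say which dominator algorithm heap-analyzer uses, so this is a stand-in; the function name and input shape are hypothetical.

```javascript
// Sketch: retained sizes via an iterative dominator computation.
// nodes: Map<id, { shallowSize, refs: id[] }>; roots: id[] (hypothetical shapes).
function computeRetainedSizes(nodes, roots) {
  const ROOT = Symbol("virtual-root"); // single entry that points at all roots
  const succ = (id) => (id === ROOT ? roots : nodes.get(id).refs);

  // Iterative depth-first search producing a postorder of reachable nodes.
  const order = [];
  const visited = new Set([ROOT]);
  const stack = [[ROOT, 0]];
  while (stack.length > 0) {
    const frame = stack[stack.length - 1];
    const children = succ(frame[0]);
    if (frame[1] < children.length) {
      const child = children[frame[1]++];
      if (!visited.has(child)) {
        visited.add(child);
        stack.push([child, 0]);
      }
    } else {
      order.push(frame[0]);
      stack.pop();
    }
  }
  const postIndex = new Map(order.map((id, i) => [id, i]));

  const preds = new Map(order.map((id) => [id, []]));
  for (const id of order) for (const t of succ(id)) preds.get(t).push(id);

  // Walk two candidates up the partial dominator tree until they meet.
  const idom = new Map([[ROOT, ROOT]]);
  const intersect = (a, b) => {
    while (a !== b) {
      while (postIndex.get(a) < postIndex.get(b)) a = idom.get(a);
      while (postIndex.get(b) < postIndex.get(a)) b = idom.get(b);
    }
    return a;
  };

  let changed = true;
  while (changed) {
    changed = false;
    // Reverse postorder, skipping the virtual root (last in postorder).
    for (let i = order.length - 2; i >= 0; i--) {
      const id = order[i];
      let best;
      for (const p of preds.get(id)) {
        if (!idom.has(p)) continue;
        best = best === undefined ? p : intersect(p, best);
      }
      if (idom.get(id) !== best) {
        idom.set(id, best);
        changed = true;
      }
    }
  }

  // Postorder visits dominated nodes before their dominators,
  // so one pass is enough to aggregate bottom-up.
  const retained = new Map();
  for (const id of order) if (id !== ROOT) retained.set(id, nodes.get(id).shallowSize);
  for (const id of order) {
    if (id === ROOT) continue;
    const parent = idom.get(id);
    if (parent !== ROOT) retained.set(parent, retained.get(parent) + retained.get(id));
  }
  return retained;
}
```

For a diamond-shaped graph (a references b and c, both reference d), d is dominated by a rather than by b or c, so its bytes count toward a's retained size only.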
7. Aggregate by constructor
Finally, live objects are grouped by className. For each constructor row, the analyzer records:
- instance count
- total shallow size
- total retained size
- per-instance rows sorted by retained size
This is the constructor view.
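The grouping step can be sketched as follows. The function name and the flat input rows of { className, address, shallowSize, retainedSize } are hypothetical; the output fields mirror the JSON shape shown earlier, and sorting constructor rows by total retained size is only one of the possible orderings.

```javascript
// Sketch: group live objects into constructor rows.
// objects: [{ className, address, shallowSize, retainedSize }] (hypothetical shape)
function aggregateByConstructor(objects) {
  const rows = new Map();
  for (const { className, address, shallowSize, retainedSize } of objects) {
    let row = rows.get(className);
    if (!row) {
      row = { className, count: 0, totalShallowSize: 0, totalRetainedSize: 0, instances: [] };
      rows.set(className, row);
    }
    row.count += 1;
    row.totalShallowSize += shallowSize;
    row.totalRetainedSize += retainedSize;
    row.instances.push({ address, shallowSize, retainedSize });
  }
  // Largest retained size first, both within a row and across rows.
  for (const row of rows.values()) {
    row.instances.sort((a, b) => b.retainedSize - a.retainedSize);
  }
  return [...rows.values()].sort((a, b) => b.totalRetainedSize - a.totalRetainedSize);
}
```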
Snapshot Comparison Pipeline
Comparison view does not diff raw memory directly. Instead, it compares two constructor snapshots.
The process is:
- analyze the baseline dump into a constructor snapshot
- analyze the current dump into another constructor snapshot
- group both snapshots by className
- match instances by address within each constructor
- aggregate added and removed instances into DevTools-style diff columns
This design keeps the diff logic narrow and predictable:
- top-level diff columns use shallow size
- expanded rows show only changed instances
- unchanged same-address instances are ignored
That is why constructor view and comparison view complement each other:
- constructor view explains current ownership and retention
- comparison view explains change over time
Together they provide a practical workflow for memory debugging in WARPO-generated WebAssembly.