GC lowering in warpo is a shadow-stack lowering pass: it turns the GC MIR markers (`~lib/rt/__localtostack` / `~lib/rt/__tmptostack`) into explicit stores to linear memory, and inserts stack-pointer adjustments (prologue / epilogue) around the region where those roots must be visible to the collector.
This page is a high-level overview of the assumptions and the workflow implemented in `passes/GC/*`.
## Assumptions
GC lowering relies on the following runtime + IR conventions:
### Runtime symbols exist
The module is expected to contain:
- Globals
  - `~lib/memory/__stack_pointer` (`i32`): base pointer for the shadow-stack region.
  - `~lib/memory/__data_end` (`i32`): lowest legal stack address (stack-overflow check).
- Builtins used as IR markers
  - `~lib/rt/__localtostack`, `~lib/rt/__tmptostack`.
- Potential GC triggers (used for leaf-function classification)
  - `~lib/rt/itcms/__new`, `~lib/rt/itcms/__collect`, `~lib/rt/tcms/__new`, `~lib/rt/tcms/__collect`.
### Linear memory is available
Lowering writes roots into linear memory using the module’s first memory (`m->memories.front()`).
### `tostack` call shapes are constrained
The passes assume the frontend emits:
- `__tmptostack(v)` for temporary values (often as operands to `call`/`call_indirect`).
- `__localtostack(v)` for locals (typically used under `local.set`, i.e. `(local.set $x (call $__localtostack ...))`).
In the optimized path, `__localtostack` is assumed to appear only under `local.set`; other uses are treated as invalid during analysis.
### Caller-managed arguments
warpo’s convention is that the caller ensures the lifetime of GC-typed arguments across `call` and `call_indirect`. GC lowering focuses on locals and temporaries that need to be rooted across potential GC trigger points.
## Workflow
There are two main lowering modes:
- Fast lowering (`gc::FastLower`): minimal analysis, prioritizes compile time.
- Optimized lowering (`gc::OptLower`): uses liveness- and CFG-based analyses to reduce shadow-stack traffic.
Both end up with the same essential shape in Wasm:
- Reserve a per-function shadow-stack “frame” by adjusting `__stack_pointer`.
- Replace `__localtostack`/`__tmptostack` with stores into that frame.
- Restore `__stack_pointer` on all exits.
## Stack-pointer helpers
GC lowering materializes two helper functions (see `passes/GC/BaseLower.cpp`):
`~lib/rt/__decrease_sp(bytes)`:

- `__stack_pointer -= bytes`
- `memory.fill(__stack_pointer, 0, bytes)` (clears the newly allocated slots)
- traps if `__stack_pointer < __data_end` (stack overflow / collision)

`~lib/rt/__increase_sp(bytes)`:

- `__stack_pointer += bytes`
The explicit zeroing is important: it prevents stale pointers in unused slots from keeping objects alive.
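The combined effect of the helpers and the lowered builtins can be modeled in a few lines of Python. This is an illustrative sketch of the semantics described above, not the emitted Wasm; the memory size and `__data_end` value are invented for the example:

```python
# Toy model of the shadow-stack helper semantics.
# MEM_SIZE and DATA_END are illustrative constants, not taken from warpo.
MEM_SIZE = 1024
DATA_END = 64            # lowest legal stack address

memory = bytearray(MEM_SIZE)
stack_pointer = MEM_SIZE  # the shadow stack grows downward

def decrease_sp(nbytes: int) -> None:
    """Model of ~lib/rt/__decrease_sp: reserve and zero a frame."""
    global stack_pointer
    stack_pointer -= nbytes
    if stack_pointer < DATA_END:
        raise RuntimeError("stack overflow: shadow stack hit __data_end")
    # memory.fill(__stack_pointer, 0, nbytes): clear the new slots so
    # stale pointers cannot keep dead objects alive.
    memory[stack_pointer:stack_pointer + nbytes] = bytes(nbytes)

def increase_sp(nbytes: int) -> None:
    """Model of ~lib/rt/__increase_sp: release the frame."""
    global stack_pointer
    stack_pointer += nbytes

def tostack(value: int, offset: int) -> int:
    """Model of the lowered __localtostack/__tmptostack:
    store `value` at __stack_pointer + offset, then return it."""
    addr = stack_pointer + offset
    memory[addr:addr + 4] = value.to_bytes(4, "little")
    return value

# A per-function frame with two 4-byte slots:
decrease_sp(8)
assert tostack(0x1234, 0) == 0x1234   # root a value in slot 0
assert tostack(0x5678, 4) == 0x5678   # slot 1
increase_sp(8)                        # epilogue on function exit
```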
## Fast lowering (`gc::FastLower`)
Fast lowering avoids CFG/liveness work and instead assigns slots with a simple, local strategy:
- Assign slots + attach offsets
  - A walker scans the function and:
    - maps each local that goes through `__localtostack` to a stable slot;
    - allocates slots for `__tmptostack` values using a stack-like allocator that respects nested calls.
  - It rewrites each marker call from unary to binary form, e.g. `__localtostack(v) -> __localtostack(v, b)` and `__tmptostack(v) -> __tmptostack(v, b)`, where `b` is the shadow-stack byte offset for the assigned slot (`b = 4 * slotIndex`).
- Insert prologue/epilogue
  - `PrologEpilogInserter` reserves `maxOffset` bytes at function entry (or at an equivalent safe point), and restores the stack pointer on all exits.
- Lower the builtins into real code
  - The imported `__localtostack`/`__tmptostack` are turned into normal (non-imported) functions with signature `(value: i32, offset: i32) -> i32` that:
    - compute `address = __stack_pointer + offset`;
    - store `value` at `address`;
    - return `value`.
This mode is fast and robust, but may reserve more space than strictly necessary.
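The stack-like temporary-slot allocator can be pictured with a toy push/pop model. This is a Python sketch of the general idea, not warpo’s actual allocator: each `__tmptostack` operand takes the next free slot, and a call’s completion releases its operands’ slots, so temporaries that are live at the same time never share a slot while sequential sibling calls may reuse one:

```python
class TmpSlotAllocator:
    """Stack-like slot allocator (simplified sketch, not warpo's code).
    Temporaries live at the same time (nested calls) get distinct slots;
    later siblings whose lifetimes ended may reuse freed slots."""
    def __init__(self) -> None:
        self.depth = 0       # slots currently in use
        self.max_slots = 0   # frame size in slots

    def push_tmp(self) -> int:
        """Reserve the next free slot for a __tmptostack operand."""
        slot = self.depth
        self.depth += 1
        self.max_slots = max(self.max_slots, self.depth)
        return slot          # byte offset would be 4 * slot

    def pop_tmps(self, n: int) -> None:
        """A call consumed n rooted operands; free their slots."""
        self.depth -= n

# (call $foo (tostack $arg0) (call $bar (tostack $arg1) (tostack $arg2)))
alloc = TmpSlotAllocator()
s0 = alloc.push_tmp()        # $arg0, must stay rooted across $bar
s1 = alloc.push_tmp()        # $arg1
s2 = alloc.push_tmp()        # $arg2
alloc.pop_tmps(2)            # inner call $bar consumed $arg1/$arg2
alloc.pop_tmps(1)            # outer call $foo consumed $arg0
assert (s0, s1, s2) == (0, 1, 2)   # nested operands: all distinct slots

s3 = alloc.push_tmp()        # a later sibling call can reuse slot 0
alloc.pop_tmps(1)
assert s3 == 0
```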
### Nested calls in fast lowering

> **WARNING**
> Nested calls are possible. The allocator must ensure that the slot assigned to the first argument is distinct from those of the second and the third.
```wat
(call $foo
  (call $tostack (local.get $arg0))
  (call $bar
    (call $tostack (local.get $arg1))
    (call $tostack (local.get $arg2))
  )
)
```

## Optimized lowering (`gc::OptLower`)
Optimized lowering runs a pipeline to minimize both the reserved frame size and the number of stores. For more detailed per-pass notes, see `opt_lower_details.md`.
`OptLower::run` also has an early, test-only control path: after preprocessing, `--gc-test-only-control-group` makes it return immediately (no analysis or lowering passes are scheduled).
- Preprocess
  - `vacuum` and `merge-blocks` (Binaryen passes) to reduce IR noise / block count.
- Remove obviously-unnecessary `tostack`
  - `ImmutableGlobalToStackRemover` is enabled by default unless `--no-gc-opt` is set, and can also be enabled explicitly via `--gc-immutable-global-to-stack-remover`.
  - It drops `__tmptostack(global.get $immutable)` because immutable globals don’t need re-rooting.
- Build module-level SSA map
  - `ModuleLevelSSAMap::create(m)` creates the SSA model over locals, temporaries, and parameters that are treated as GC objects.
- Compute call graph
  - `CallGraphBuilder` is always added to the pass pipeline.
- Compute leaf functions (optional)
  - `CollectLeafFunction` is scheduled only when leaf filtering is enabled.
- Object liveness analysis
  - `ObjLivenessAnalyzer` computes a liveness bitset per expression for the SSA objects.
- SSA cleanup / merging (optional)
  - `MergeSSA` can merge certain tmp-SSA values into the local-SSA they reference, to reduce slots.
  - `LeafFunctionFilter` is always scheduled after liveness (and after `MergeSSA`, when enabled). It uses leaf-function info when available, and invalidates SSA values that are never live around a non-leaf call.
- Assign stack offsets
  - `StackAssigner` assigns offsets to the `tostack` sites where an SSA value becomes live.
  - Mode is selected by flags: optimized mode uses greedy conflict-graph coloring; otherwise, vanilla mode.
- Insert prologue/epilogue
  - If enabled, `ShrinkWrapAnalysis` chooses insertion points (dominating / post-dominating all required stack activity) and avoids loops. Otherwise, dummy insertion hints are used.
  - `PrologEpilogInserter` uses the computed max offset and insertion hints; it also rewrites `return` sites to ensure the epilogue executes on early returns.
- Inline `tostack` into stores (or remove it)
  - `ToStackReplacer` replaces `__localtostack`/`__tmptostack` with an `i32.store` into `__stack_pointer + offset`, returning the original value.
  - If a call has no assigned offset, it is dropped (the value is used directly).
- `runner.run()`
  - Executes all scheduled passes in the above order.
- Cleanup
  - Removes the marker functions `__localtostack`/`__tmptostack` from the module and adds the stack-pointer helpers.
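As a rough illustration of the greedy conflict-graph coloring strategy used by `StackAssigner`’s optimized mode, the sketch below colors invented liveness ranges; warpo’s real pass works over per-expression liveness bitsets and its own conflict construction, so treat the range representation and all names here as assumptions:

```python
# Greedy conflict-graph coloring over liveness ranges (illustrative only).
def assign_slots(live_ranges):
    """live_ranges: {name: (start, end)} as half-open program points.
    Two values conflict if their ranges overlap; conflicting values
    must not share a shadow-stack slot."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    slots = {}
    # Greedy order: by start of liveness, lowest free color first.
    for name in sorted(live_ranges, key=lambda n: live_ranges[n][0]):
        taken = {slots[other] for other in slots
                 if overlaps(live_ranges[name], live_ranges[other])}
        color = 0
        while color in taken:
            color += 1
        slots[name] = color
    return slots

ranges = {
    "a": (0, 10),   # live across the whole region
    "b": (2, 5),    # conflicts with a -> needs a second slot
    "c": (6, 9),    # conflicts with a, not with b -> reuses b's slot
}
slots = assign_slots(ranges)
assert slots == {"a": 0, "b": 1, "c": 1}
# Byte offsets would be 4 * slot; frame size 4 * (max slot + 1).
```

Reusing a slot whenever two values are never simultaneously live is what shrinks the reserved frame compared with fast lowering’s one-slot-per-value strategy.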