This reference explains how webnn-graph lowers ONNX graphs into the WebNN DSL.
- MatMul, Transpose, Concat, Reduce*, Gather, Reshape (with known newShape), etc., derive output
  shapes. Some unresolved symbolic input metadata may still be preserved in v2 graphs, but
  shape-critical paths must be static.
- The .webnn output stores only structure (inputs, const declarations, nodes, outputs). Actual
  tensor bytes live in model.weights, with offsets and types in model.manifest.json. Inline bytes
  are allowed for tiny scalars; everything else is referenced via @weights("key").
- ONNX models may declare symbolic dimensions (e.g., batch, seq_len), and graphs may manipulate
  shapes at runtime (Shape → Gather → Concat → Reshape pipelines, dynamic Slice, etc.).

1) Prepare the ONNX graph for static lowering
- Use --override-dim name=value to pin symbolic dims (e.g., batch=1, seq_len=128). A sidecar
  *.dims.json can supply the same values. Defaults may kick in for common names (batch,
  sequence_length, etc.) when an override is missing.
- Use the --optimize flag to activate built-in constant folding, which eliminates dynamic-shape
  plumbing such as Shape → Gather → Concat → Reshape.
- Unresolved symbolic input metadata may be preserved when --experimental-dynamic-inputs is enabled.

2) Lower the ONNX graph to WebNN DSL
- ai.onnx opset 11–18 is accepted.
- Constant folding (--optimize): evaluates Shape/Gather/Concat/Cast/Squeeze/Unsqueeze operations
  at conversion time, replacing them with their computed constant values. Reduces graph size by
  40-50% for transformer models.
- Ops are mapped through OpRegistry; purely constant outputs are emitted as consts and skipped as
  nodes.
- Output is .webnn (structure only) plus .weights and .manifest.json (raw bytes + offsets). Inline
  bytes are used only for tiny scalars; real tensors use @weights("key").

flowchart LR
A["ONNX model (may have symbolic dims)"]
B["Shape overrides (--override-dim) + constant folding (--optimize)"]
C["Static shape inference"]
D["Constant folding: Shape/Gather/Concat/etc"]
E["Op mapping (OpRegistry)"]
F["WebNN DSL (.webnn)"]
G["Weights sidecars (.weights + .manifest.json)"]
A --> B --> C --> D --> E --> F
D --> G
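
The shape-override step at the start of this pipeline can be pictured with a small sketch. This is not webnn-graph's own code: `pin_dims` and the dict-based override table are hypothetical names used only for illustration, assuming an override simply maps a symbolic dim name to an integer.

```python
from typing import Dict, List, Union

# An int is a static dim; a str is a symbolic name such as "batch".
Dim = Union[int, str]


def pin_dims(shape: List[Dim], overrides: Dict[str, int]) -> List[int]:
    """Replace symbolic dims with override values (hypothetical helper).

    Raises ValueError if any symbolic dim has no override, mirroring the
    requirement that shape-critical paths be static.
    """
    pinned = []
    for dim in shape:
        if isinstance(dim, int):
            pinned.append(dim)
        elif dim in overrides:
            pinned.append(overrides[dim])
        else:
            raise ValueError(f"unresolved symbolic dim: {dim!r}")
    return pinned


# Equivalent in spirit to: --override-dim batch=1 --override-dim seq_len=128
overrides = {"batch": 1, "seq_len": 128}
print(pin_dims(["batch", "seq_len", 384], overrides))  # [1, 128, 384]
```

Once every input shape is a list of integers, the later stages (shape inference and folding) can work with plain arithmetic.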
- Unresolved symbolic input metadata may be preserved when --experimental-dynamic-inputs is
  enabled, but conversion still requires concrete values wherever shape math must be static.
- Shape-related ops (Shape, Gather, Concat, Unsqueeze, Squeeze, Cast of ints): each can be emitted
  as a WebNN shape/gather/concat/unsqueeze/squeeze node, but only when its shape impact is already
  known and compatible with WebNN.
- Reshape: WebNN requires newShape to be fully known. The converter pulls it from constant tensors
  or folded const values; -1 is resolved using the input element count. If newShape is not fully
  static, conversion fails with a clear “WebNN requires static newShape” error.

- ONNX: X -> Shape -> Gather -> Concat -> Reshape(X, newShape=tensor) with symbolic dims.
- With --optimize: newShape becomes a constant tensor (e.g., [1,128,12,32]); Shape/Gather/Concat
  are removed.
- WebNN: a single reshape(X, newShape=[1,128,12,32]) node; newShape is static and embedded as an
  inline small const if needed.
flowchart LR
A[X] --> B[Shape]
B --> C[Gather]
C --> D[Concat]
D --> E["Reshape X,newShape=tensor"]
subgraph constant_folding
F[X] --> G["Reshape X,newShape=[1,128,12,32]"]
end
subgraph webnn
H["reshape(X,[1,128,12,32])"]
end
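
To make the reshape folding concrete, here is a minimal sketch of evaluating a Shape → Gather → Concat chain at conversion time and then resolving a -1 entry from the input element count. The helper names are hypothetical and the code is not taken from webnn-graph; it also ignores the ONNX convention that a 0 in newShape copies the input dim.

```python
from math import prod
from typing import List


def fold_new_shape(x_shape: List[int], gather_indices: List[int],
                   extra_dims: List[int]) -> List[int]:
    """Fold Shape(X) -> Gather(indices) -> Concat(extra_dims) into a constant."""
    picked = [x_shape[i] for i in gather_indices]  # Gather on the shape vector
    return picked + extra_dims                     # Concat with constant dims


def resolve_reshape(x_shape: List[int], new_shape: List[int]) -> List[int]:
    """Resolve at most one -1 in new_shape using the input element count."""
    if new_shape.count(-1) > 1:
        raise ValueError("newShape may contain at most one -1")
    total = prod(x_shape)
    known = prod(d for d in new_shape if d != -1)
    if -1 in new_shape:
        if known == 0 or total % known != 0:
            raise ValueError("cannot infer -1: element counts do not divide")
        return [total // known if d == -1 else d for d in new_shape]
    if known != total:
        raise ValueError("newShape does not match the input element count")
    return list(new_shape)


x_shape = [1, 128, 384]                              # static after overrides
folded = fold_new_shape(x_shape, [0, 1], [12, -1])   # [1, 128, 12, -1]
print(resolve_reshape(x_shape, folded))              # [1, 128, 12, 32]
```

The result matches the [1,128,12,32] constant in the example: 1 * 128 * 384 elements divided by 1 * 128 * 12 leaves 32 for the inferred dim.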
- ONNX: Slice with starts/ends/axes/steps produced by small subgraphs.
- With --optimize: those tensors become constants with normalized axes; negative indices are
  resolved.
- WebNN: a slice node with static starts/ends/axes/steps; rejected if any bound stays dynamic or
  step != 1.
flowchart LR
A[starts subgraph] --> B[Slice]
C[ends subgraph] --> B
D[axes subgraph] --> B
E[steps subgraph] --> B
subgraph constant_folding
F[starts const] --> G["Slice(static)"]
H[ends const] --> G
I[axes const] --> G
J[steps const] --> G
end
subgraph webnn
K[slice static bounds]
end
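
The normalization described for Slice can be sketched as follows. This is an illustrative helper, not the converter's implementation: `normalize_slice` is a hypothetical name, and the clamping follows the ONNX Slice rules for a step of 1.

```python
from typing import List, Optional, Tuple


def normalize_slice(shape: List[int], starts: List[int], ends: List[int],
                    axes: Optional[List[int]] = None,
                    steps: Optional[List[int]] = None
                    ) -> Tuple[List[int], List[int]]:
    """Return full-rank, non-negative (starts, ends) for a static slice."""
    rank = len(shape)
    axes = list(range(len(starts))) if axes is None else list(axes)
    steps = [1] * len(starts) if steps is None else list(steps)
    full_starts = [0] * rank       # untouched axes keep their whole range
    full_ends = list(shape)
    for start, end, axis, step in zip(starts, ends, axes, steps):
        if step != 1:
            raise ValueError("only step == 1 is supported")
        if axis < 0:
            axis += rank           # normalize a negative axis
        if not 0 <= axis < rank:
            raise ValueError("axis out of range")
        dim = shape[axis]
        if start < 0:
            start += dim           # resolve negative indices
        if end < 0:
            end += dim
        full_starts[axis] = min(max(start, 0), dim)  # clamp into [0, dim]
        full_ends[axis] = min(max(end, 0), dim)
    return full_starts, full_ends


# Keep the last 64 positions along the second-to-last axis.
print(normalize_slice([1, 128, 384], [-64], [2**31 - 1], axes=[-2]))
# ([0, 64, 0], [1, 128, 384])
```

A very large `ends` value, a common ONNX idiom for "to the end", is clamped to the dimension size.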
- ONNX: axes for Unsqueeze/Squeeze/Reduce* or perm for Transpose are fed by tensors.
- With --optimize: axes/perm are folded into constant tensors.
- WebNN: ops take static axes or permutation arrays; if still dynamic, the op is rejected.
flowchart LR
A[axes tensor] --> B[Unsqueeze]
subgraph constant_folding
C[axes const] --> D["Unsqueeze axes=[1,3]"]
end
subgraph webnn
E["unsqueeze axes=[1,3]"]
end
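
Once axes are constant, the output shape follows directly. The sketch below shows this for Unsqueeze under ONNX semantics, where axes index into the output rank; `unsqueeze_shape` is a hypothetical helper, not part of webnn-graph.

```python
from typing import List


def unsqueeze_shape(shape: List[int], axes: List[int]) -> List[int]:
    """Compute the static output shape of Unsqueeze for constant axes."""
    out_rank = len(shape) + len(axes)
    # Negative axes count from the end of the output shape.
    norm = [a + out_rank if a < 0 else a for a in axes]
    if len(set(norm)) != len(norm):
        raise ValueError("duplicate axes")
    if any(not 0 <= a < out_rank for a in norm):
        raise ValueError("axis out of range")
    remaining = iter(shape)
    # Insert a 1 at each requested axis; fill the rest from the input shape.
    return [1 if i in norm else next(remaining) for i in range(out_rank)]


print(unsqueeze_shape([2, 3], [1, 3]))  # [2, 1, 3, 1]
```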
- ONNX: Gather(Shape(X), idx) to pick a dim.
- With --optimize: Shape and Gather are removed and replaced by a scalar constant (e.g., sequence
  length).
- WebNN: no gather node is emitted.
flowchart LR
A[X] --> B[Shape]
B --> C[Gather idx]
subgraph constant_folding
D[const dim]:::const
end
subgraph webnn
E[inline scalar const]:::const
end
classDef const fill:#eef,stroke:#66f;
- With --optimize: shapes are static and compatible; broadcasting stays implicit.

flowchart LR
A["X (static shape)"] --> C[Add]
B["Y (static shape)"] --> C
subgraph webnn
D["add(X,Y) with inferred broadcast"]
end
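
The "inferred broadcast" step can be illustrated with the usual right-aligned broadcasting rule used by ONNX elementwise ops. This is a generic sketch with a hypothetical helper name, not code from the converter.

```python
from itertools import zip_longest
from typing import List


def broadcast_shape(a: List[int], b: List[int]) -> List[int]:
    """Infer the broadcast output shape of two static shapes."""
    out = []
    # Align shapes from the trailing dim; missing leading dims count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"incompatible dims: {x} vs {y}")
    return out[::-1]


print(broadcast_shape([1, 128, 384], [384]))  # [1, 128, 384]
print(broadcast_shape([8, 1, 6, 1], [7, 1, 5]))  # [8, 7, 6, 5]
```

Because both input shapes are static at this point, an incompatible pair can be reported at conversion time rather than at runtime.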
- MatMul and Gemm lower to matmul (plus optional transposes, alpha/beta scaling, and bias add for
  Gemm). Transposes are emitted as separate transpose nodes when requested.
- Add, Sub, Mul, Div, Pow map directly to WebNN elementwise ops with broadcasted shapes already
  inferred.
- Relu, Gelu, Tanh, Sigmoid, Sqrt, Exp, Log, Abs, Neg, Erf map one-to-one.
- LayerNormalization (with epsilon/axes) and Softmax (axis).
- Reshape, Transpose, Concat, Split, Unsqueeze, Squeeze with static axes/newShape/permutation;
  conversion fails if they are not static.
- Shape, Gather, Slice as described above; Cast with supported dtype mapping.
- ReduceMean, ReduceSum, ReduceMax, ReduceMin with axes and keepdims.

Unsupported ops (or ops with remaining dynamism) fail fast with an explicit “unsupported operator”
or “WebNN requires static …” message to keep the pipeline predictable.
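
The Gemm decomposition mentioned in the op list (transposes, matmul, alpha/beta scaling, bias add) can be written out as a sequence of simpler steps. The sketch below uses plain Python lists and a 1-D bias for clarity; the function names are illustrative and do not reflect how webnn-graph structures its lowering.

```python
from typing import List, Optional

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gemm(a: Matrix, b: Matrix, c: Optional[List[float]] = None,
         alpha: float = 1.0, beta: float = 1.0,
         trans_a: bool = False, trans_b: bool = False) -> Matrix:
    """Y = alpha * A' @ B' + beta * C, expressed as separate steps."""
    if trans_a:
        a = transpose(a)                               # separate transpose step
    if trans_b:
        b = transpose(b)
    y = matmul(a, b)                                   # matmul step
    y = [[alpha * v for v in row] for row in y]        # alpha scaling
    if c is not None:                                  # bias add, broadcast per row
        y = [[v + beta * bias for v, bias in zip(row, c)] for row in y]
    return y


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(gemm(a, b, c=[1.0, 2.0], alpha=0.5, beta=2.0, trans_b=True))
# [[10.5, 15.5], [21.5, 30.5]]
```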
Example (all-MiniLM-L6-v2-webnn)

1) Convert with overrides + folding:
webnn-graph convert-onnx --input model.onnx --optimize \
  --override-dim batch_size=1 \
  --override-dim sequence_length=128 \
  --output model.webnn --weights model.weights --manifest model.manifest.json
2) Lowering behavior:
- model.webnn: structure-only graph; inputs pinned; consts reference @weights("…") or inline tiny
  scalars; nodes are WebNN ops with sanitized IDs; outputs expose last_hidden_state.
- model.weights: raw little-endian tensor bytes, concatenated.
- model.manifest.json: dtype/shape/byte offsets for every @weights tensor.
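
As a rough illustration of how the two sidecars fit together, the sketch below looks up a tensor's dtype, shape, and byte offset in a manifest and decodes the matching little-endian bytes. The manifest layout used here (a "tensors" map with "dtype", "shape", and "byteOffset" fields) is an assumption made for the example; the actual model.manifest.json schema may differ and should be checked against the tool's output.

```python
import struct
from math import prod

# struct format codes for a few dtypes (assumed dtype names).
FORMATS = {"float32": "f", "int32": "i", "int64": "q"}


def read_tensor(weights: bytes, manifest: dict, key: str) -> list:
    """Decode one tensor from concatenated little-endian bytes.

    The manifest field names are hypothetical, chosen for this sketch.
    """
    entry = manifest["tensors"][key]
    fmt = FORMATS[entry["dtype"]]
    count = prod(entry["shape"])            # an empty shape means a scalar
    start = entry["byteOffset"]
    size = struct.calcsize("<" + fmt) * count
    return list(struct.unpack(f"<{count}{fmt}", weights[start:start + size]))


# In-memory stand-ins for model.weights and model.manifest.json.
weights = struct.pack("<2f", 1.5, -2.0) + struct.pack("<q", 128)
manifest = {
    "tensors": {
        "bias": {"dtype": "float32", "shape": [2], "byteOffset": 0},
        "seq_len": {"dtype": "int64", "shape": [], "byteOffset": 8},
    }
}
print(read_tensor(weights, manifest, "bias"))     # [1.5, -2.0]
print(read_tensor(weights, manifest, "seq_len"))  # [128]
```

Keeping bytes out of the .webnn file means the graph text stays small and diffable, while the manifest provides everything needed to slice a tensor out of the binary blob.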