DCP-AI Canonicalization Profile¶
Profile identifier: dcp-jcs-v1
Status: Published — v2.0
First shipped in: Python 2.8.0, Rust 2.8.0, Go v2.8.0, TypeScript 2.1.0
This document is the normative reference for byte-exact JSON
canonicalization in DCP-AI. It supersedes the five-bullet summary that
appeared in spec/DCP-AI-v2.0.md § 15. Every signature, every hash,
every Merkle leaf in a DCP artifact is taken over the output of the
algorithm specified here.
The profile is a strict subset of RFC 8785 (JCS) with explicit treatment of the cases JCS leaves implementation-defined. SDKs that claim profile compliance MUST produce byte-identical output for the edge-case table in §3.
1. Why a profile and not just "JCS"¶
RFC 8785 fixes the structural rules — sorted keys, compact form, string
escaping, number representation — but defers a few decisions to the
producer's host language: how undefined is treated, what happens to
floats that happen to have integer value, whether 1e2 is normalized,
how null-equivalent host types map to JSON null. Across four SDKs
(TypeScript, Python, Rust, Go) those defaults diverge and produce
different bytes for the same source object. dcp-jcs-v1 pins each of
those decisions so all four SDKs produce identical output.
2. Normative rules¶
A producer that implements dcp-jcs-v1 MUST honour rules 1 through 8.
A verifier that consumes dcp-jcs-v1 SHOULD reject artifacts whose
output diverges from these rules.
Rule 1 — Sorted keys¶
Object keys are sorted lexicographically by Unicode code point in ascending order. Sorting is performed at every nesting depth.
Rule 2 — Compact form¶
No whitespace anywhere outside string values. Pair separator is ,,
key-value separator is :. No trailing commas.
Rule 3 — String escaping¶
Strings are encoded UTF-8. Escaping follows RFC 8785 §3.2: only the
characters that JSON requires escaping (control characters U+0000
through U+001F, ", \) are escaped. Forward slash, non-ASCII
characters, emoji, kanji, etc. are emitted verbatim as their UTF-8
byte sequence.
Rule 4 — Numeric values: integer-only¶
The wire format accepts only numbers whose value is an exact integer. "Exact integer" is decided post-parse:
- A JSON
Numberis valid iff (a) it is finite, and (b) its value has no fractional part (floor(n) == n). - A
Numberwhose source literal was1.0,1.00,1e2,100, or-0is valid because all those literals decode to an integer value after parsing. - A
Numberwhose value is0.1,1.5,1e-1,NaN, or±Infinityis invalid and the canonicalizer MUST raise an error before any output is produced.
Output normalization: every valid number is emitted in plain integer
form, no decimal point, no exponent, no leading zeros (except for
0), no trailing zeros.
{"n": 1.0} → {"n":1}
{"n": 1e2} → {"n":100}
{"n": -0} → {"n":0}
{"n": 0.1} → ERROR
{"n": NaN} → ERROR
This rule is what makes the four SDKs converge: a JavaScript producer
can pass 1.0 as a literal (it has no integer type), a Python
producer can pass int(100), a Go producer can pass 1e2 parsed from
JSON, and all four emit the same bytes.
Rule 5 — Null preservation¶
A JSON null value MUST be preserved with its key:
Null inside an array preserves its position:
Rule 6 — Null and undefined handling (cross-language)¶
JSON has one absent-value channel (null). Host languages have more.
TypeScript is the only host language that distinguishes undefined
from null; Python, Go, and Rust do not. The canonicalizer MUST treat
each host value as follows:
| Host language | Host value | In an object key | In an array slot |
|---|---|---|---|
| TypeScript | undefined |
omit the key | emit null |
| TypeScript | null |
emit null |
emit null |
| Python | absent key | omit the key | n/a |
| Python | None |
emit null |
emit null |
| Go | absent map entry | omit the key | n/a |
| Go | nil interface / explicit JSON null |
emit null |
emit null |
| Rust | absent struct field with skip_serializing_if=None |
omit the key | n/a |
| Rust | Option::None serialized without skip |
emit null |
emit null |
| Rust | Value::Null |
emit null |
emit null |
input → { a: 1, b: undefined, c: 3 } (TS)
output → {"a":1,"c":3}
input → [1, undefined, 3] (TS)
output → [1,null,3]
input → {"a": 1, "b": None, "c": 3} (Python)
output → {"a":1,"b":null,"c":3}
This is intentionally asymmetric: in TypeScript, an object with an
absent key and an object with b: undefined produce identical output,
matching JSON's inability to express "absent vs undefined-valued"
distinctly. Inside an array the position is meaningful, so
undefined materializes as null.
SDKs MUST NOT invent an undefined channel where the host language has
none. Python None, Go nil of an interface type that produced a JSON
null on parse, and Rust Value::Null all map to JSON null.
Rule 7 — Booleans¶
Booleans are emitted as the literals true and false.
Rule 8 — Empty containers¶
An empty object emits {}. An empty array emits []. Neither is
omitted.
Rule 9 — Unicode normalization is OUT OF SCOPE¶
Strings are emitted as their UTF-8 byte sequence with the escaping
rules of Rule 3, without any Unicode normalization (NFC, NFD,
NFKC, NFKD). A producer that supplies the precomposed string "é"
(U+00E9) and a producer that supplies the decomposed sequence
"e" + U+0301 will produce different bytes, even though the
two strings render identically and compare equal under canonical
equivalence.
This is a deliberate non-decision: requiring a normalization form at the canonicalization layer would force every SDK to ship an ICU- or unicode-tables-equivalent dependency and would still leave open which form (NFC vs NFKC) wins. Applications that care about homograph resistance MUST normalize their strings before they hand the value to the canonicalizer (NFC is the conventional choice for JSON-on-the-wire interchange — RFC 8259 § 8.1).
A future profile (dcp-jcs-v2 or later) MAY pin a normalization form;
that decision will live in its own profile document.
Note on
NaN/±Infinity: these values are not part of RFC 8259 JSON wire format. Most JSON parsers reject them at parse time, so the canonicalizer never sees them. Where a parser does accept them (e.g. Pythonjson.loads(allow_nan=True), JavaScript hand-built values), Rule 4 still rejects them post-parse. The interop fixtures deliberately do not ship a NaN/Infinity vector because no portable JSON syntax can express them; rejection is verified by per-SDK unit tests (e.g.sdks/rust/src/v2/canonicalize.rstest_rejects_non_finite).
3. Edge-case acceptance table¶
These are the cases that diverged across SDKs before the profile was formalized. A profile-compliant SDK MUST match every cell in this table.
| Input (JSON literal) | Output | Notes |
|---|---|---|
null |
null |
Rule 5 |
true |
true |
Rule 7 |
false |
false |
Rule 7 |
0 |
0 |
Rule 4 |
-0 |
0 |
Rule 4, sign normalised |
1 |
1 |
Rule 4 |
1.0 |
1 |
Rule 4, integer-valued float |
1.00 |
1 |
Rule 4, trailing zeros |
1e2 |
100 |
Rule 4, scientific notation with integer value |
100 |
100 |
Rule 4 |
-42 |
-42 |
Rule 4 |
0.1 |
ERROR | Rule 4, fractional value |
1.5 |
ERROR | Rule 4, fractional value |
1.0e-1 |
ERROR | Rule 4, fractional value |
NaN |
ERROR | Rule 4, non-finite |
Infinity |
ERROR | Rule 4, non-finite |
{} |
{} |
Rule 8 |
[] |
[] |
Rule 8 |
{"x": null, "y": 1} |
{"x":null,"y":1} |
Rule 5 |
[1, null, 3] |
[1,null,3] |
Rule 5 |
{"é": 1, "e": 2, "z": 3} |
{"e":2,"z":3,"é":1} |
Rule 1, code-point ordering |
{"a": {"b": {"c": 42}}} |
{"a":{"b":{"c":42}}} |
Rule 1, recursive |
4. Bundle manifest declaration¶
Bundles produced under this profile SHOULD include a top-level
canonicalization_profile field on the manifest:
{
"manifest": {
"canonicalization_profile": "dcp-jcs-v1",
"session_nonce": "...",
"rpr_hash": "sha256:...",
...
}
}
The field is optional for backward compatibility with bundles produced
before the profile shipped. Verifiers that encounter a bundle without
the field MUST assume dcp-jcs-v1 (because no other profile exists
yet). When future profiles ship (dcp-jcs-v2, ...), the field will
become required to disambiguate.
The field is added to schemas/v2/bundle_manifest.schema.json:
"canonicalization_profile": {
"type": "string",
"const": "dcp-jcs-v1",
"description": "Identifier of the canonicalization profile under which this bundle was produced."
}
5. Versioning¶
This document defines profile dcp-jcs-v1, which is frozen: any
change to the rules above requires a new profile identifier. The
governance for adding a new profile:
- A change that admits new inputs or normalises differently → new
profile (e.g.
dcp-jcs-v2). - A change that only tightens validation (rejecting inputs that were previously accepted) → consider whether downstream verifiers can upgrade in lockstep; if not, requires a new profile.
- A clarification that documents existing behaviour without changing it → in-place update, recorded in CHANGELOG with the same profile id.
When a future profile ships:
- The new profile gets its own document in
spec/(e.g.CANONICALIZATION_PROFILE_V2.md). - SDKs publish their support matrix in their READMEs.
- The bundle manifest's
canonicalization_profilebecomes a required field, and verifiers route to the appropriate canonicalizer based on its value.
6. Conformance vectors¶
The shared tests/interop/v2/interop_vectors.json file contains a
canonicalization.edge_cases block that materialises §3 as a set of
test fixtures. All four SDKs (TypeScript, Python, Rust, Go) MUST pass
this block to claim profile compliance.
The vectors are consumed by the per-SDK interop tests:
sdks/typescript/src/__tests__/interop.test.tssdks/python/tests/test_interop.pysdks/rust/tests/interop.rssdks/go/dcp/interop_test.go
CI runs all four in parallel on every PR. A divergence between SDKs is a release blocker.
7. References¶
- RFC 8785 — JSON Canonicalization Scheme (JCS) — base spec.
- RFC 8259 — JSON Data Interchange Format — wire-level grammar.
spec/DCP-AI-v2.0.md§ 15 — protocol-level summary.schemas/v2/bundle_manifest.schema.json— manifest field declaration.tests/interop/v2/interop_vectors.json#canonicalization.edge_cases— test fixtures.