I’m implementing an ABI encoder, and the ABI specification seems to be wrong about offsets, specifically about which slot they are relative to.
The specification is here; it is hard to follow and lacks more complex, nested examples:
https://docs.soliditylang.org/en/latest/abi-spec.html
As a check against a known-good reference, I have also written a small Go CLI tool that uses go-ethereum to encode a dynamic array of dynamic arrays, uint8[][], with the value [[1, 2], [3, 4]].
Here’s the setup part of my Go reference tool:
case 14: // foo(uint8[][]) - [[1, 2], [3, 4]]
    uint8DynDyn, _ := abi.NewType("uint8[][]", "", nil) // error deliberately ignored in this test harness
    types = []abi.Type{uint8DynDyn}
    values = []interface{}{[][]uint8{{1, 2}, {3, 4}}}
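Stripped of the CLI scaffolding, the whole check boils down to roughly this (a minimal sketch; abi.Arguments.Pack from go-ethereum is what does the actual encoding):

package main

import (
    "encoding/hex"
    "fmt"
    "log"

    "github.com/ethereum/go-ethereum/accounts/abi"
)

func main() {
    uint8DynDyn, err := abi.NewType("uint8[][]", "", nil)
    if err != nil {
        log.Fatal(err)
    }
    args := abi.Arguments{{Type: uint8DynDyn}}
    encoded, err := args.Pack([][]uint8{{1, 2}, {3, 4}})
    if err != nil {
        log.Fatal(err)
    }
    // Print one 32-byte slot per line, matching the listing below.
    for i := 0; i*32 < len(encoded); i++ {
        fmt.Printf("Slot %d: %s\n", i, hex.EncodeToString(encoded[i*32:(i+1)*32]))
    }
}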
This results in what is (or at least should be) the correct encoding:
Slot 0: 0000000000000000000000000000000000000000000000000000000000000020
Slot 1: 0000000000000000000000000000000000000000000000000000000000000002
Slot 2: 0000000000000000000000000000000000000000000000000000000000000040
Slot 3: 00000000000000000000000000000000000000000000000000000000000000a0
Slot 4: 0000000000000000000000000000000000000000000000000000000000000002
Slot 5: 0000000000000000000000000000000000000000000000000000000000000001
Slot 6: 0000000000000000000000000000000000000000000000000000000000000002
Slot 7: 0000000000000000000000000000000000000000000000000000000000000002
Slot 8: 0000000000000000000000000000000000000000000000000000000000000003
Slot 9: 0000000000000000000000000000000000000000000000000000000000000004
The spec says about offsets: "The value of head(X(i)) is the offset of the beginning of tail(X(i)) relative to the start of enc(X)."
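For context, the surrounding definitions on the same page say (paraphrasing): for a tuple X = (X(1), ..., X(k)),

    enc(X) = head(X(1)) ... head(X(k)) tail(X(1)) ... tail(X(k))

and a dynamic array X with k elements is encoded as its length followed by its elements as if they were a k-tuple:

    enc(X) = enc(k) enc([X(1), ..., X(k)])

What I can't tell from this is which byte counts as "the start of enc(X)" when the offsets for the inner arrays are computed.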
I don’t really know what that means, and neither does any LLM I’ve asked. I thought it meant the start of the data section for the value, which would be its length slot, but that does not yield the expected …40 and …a0 offsets.
If you work backwards from the slots the two offsets point at, they are both relative to slot 2, the slot holding the …40 offset itself, which is neither the start of the encoding of the root parameter nor the start of its data section, which should be slot 1, the length.
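To make that concrete (one slot = 32 bytes):

    0x40 = 64 bytes = 2 slots; slot 2 + 2 = slot 4, the first inner array's length
    0xa0 = 160 bytes = 5 slots; slot 2 + 5 = slot 7, the second inner array's length

If the base were slot 1 (the outer length) instead, …40 would land on slot 3 and …a0 on slot 6, neither of which is the start of an inner array.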
It’s weird. Here’s what my implementation produces:
[Failed] No14_JaggedDynamicUint8Array_ReturnsCorrectEncoding
Message:
CollectionAssert.AreEquivalent failed. The expected collection contains 1 occurrence(s) of <0x0000000000000000000000000000000000000000000000000000000000000040>. The actual collection contains 0 occurrence(s).
Block:
0x0000000000000000000000000000000000000000000000000000000000000020 (id: 154b, off: 0, ord: 0, ptr: a589, rel: 154b - uint8[][].pointer_dyn_item)
0x0000000000000000000000000000000000000000000000000000000000000002 (id: a589, off: 32, ord: 1 - uint8[][].uint8[][].count)
0x0000000000000000000000000000000000000000000000000000000000000060 (id: 11d4, off: 64, ord: 2, ptr: 86f1, rel: a589 - uint8[][].uint8[][].pointer_dyn_elem_0)
0x00000000000000000000000000000000000000000000000000000000000000c0 (id: 9ff0, off: 96, ord: 3, ptr: 923d, rel: a589 - .uint8[][].pointer_dyn_elem_1)
0x0000000000000000000000000000000000000000000000000000000000000002 (id: 86f1, off: 128, ord: 4 - .uint8[][].uint8[].count)
0x0000000000000000000000000000000000000000000000000000000000000001 (id: 0ef2, off: 160, ord: 5 - .uint8[][].uint8[].uint8.value)
0x0000000000000000000000000000000000000000000000000000000000000002 (id: 359c, off: 192, ord: 6 - .uint8[][].uint8[].uint8.value)
0x0000000000000000000000000000000000000000000000000000000000000002 (id: 923d, off: 224, ord: 7 - .uint8[][].uint8[].count)
0x0000000000000000000000000000000000000000000000000000000000000003 (id: c315, off: 256, ord: 8 - .uint8[][].uint8[].uint8.value)
0x0000000000000000000000000000000000000000000000000000000000000004 (id: bed6, off: 288, ord: 9 - .uint8[][].uint8[].uint8.value)
In my output the offsets are …60 and …c0, measured relative to the start of the data block, i.e. slot 1, the length slot.
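For comparison, here is a throwaway sketch, hard-coded to this one value, that reproduces the go-ethereum bytes when I measure the element offsets from the slot immediately after the outer length word. That rule is reverse-engineered from the reference output, not something I can point to in the spec:

package main

import (
    "encoding/hex"
    "fmt"
    "math/big"
)

// word encodes n as a 32-byte big-endian word.
func word(n int64) []byte {
    return new(big.Int).SetInt64(n).FillBytes(make([]byte, 32))
}

// encodeInner encodes a uint8[] as its length followed by one padded word per element.
func encodeInner(vals []uint8) []byte {
    out := word(int64(len(vals)))
    for _, v := range vals {
        out = append(out, word(int64(v))...)
    }
    return out
}

func main() {
    outer := [][]uint8{{1, 2}, {3, 4}}

    // Tails: the encodings of the inner arrays.
    var tails [][]byte
    for _, inner := range outer {
        tails = append(tails, encodeInner(inner))
    }

    // Heads: one offset word per element. The offsets are measured from the
    // start of the head area, i.e. the slot right after the outer length word
    // (my reverse-engineered guess at the rule).
    headSize := 32 * len(outer)
    var body []byte
    offset := headSize
    for _, t := range tails {
        body = append(body, word(int64(offset))...)
        offset += len(t)
    }
    for _, t := range tails {
        body = append(body, t...)
    }

    // Top level: offset to the outer array, then its length, then the body.
    enc := append(word(32), word(int64(len(outer)))...)
    enc = append(enc, body...)

    for i := 0; i*32 < len(enc); i++ {
        fmt.Printf("Slot %d: %s\n", i, hex.EncodeToString(enc[i*32:(i+1)*32]))
    }
}

This prints exactly the ten slots in the reference listing above.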
What should the offsets be relative to? Is the spec wrong? When challenged, LLMs do concede that it is.