Ok this is a bit later than I intended to submit it, but I think I’ve figured out most of what I would like to see:
Binary Transaction Request Format
We can use the first byte as a variant and flag byte. The first 5 bits of a
bech32-encoded message correspond to the first character after the `1`
separator, so we can tell at a glance what general kind of request it is.
This format, as discussed previously, is informed by BOLT-11 but leaves out a
few Lightning-specific features like message signatures. Similarly, we
probably must ignore the length limitation that bech32 has (unless we decide to
use a different polynomial for the error correction).
Each variant gets its own set of TLV tags. A variant could even pick an
entirely different representation. This would be safe, since a wallet that
doesn’t support a variant wouldn’t try to parse that non-TLV format at all.
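Here is a rough sketch of how a wallet could read the variant from the first symbol and bail out early on variants it doesn't know. The `ethtx` prefix in the usage note and the `KNOWN_VARIANTS` table are placeholders I made up for illustration, not part of the proposal.

```python
# The bech32 alphabet from BIP-173; a symbol's index is its 5-bit value.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

# Hypothetical registry holding only the two variants discussed in this post.
KNOWN_VARIANTS = {
    0: "variant 0 (EIP-681-style TLVs)",
    1: "variant 1 (concise transfers)",
}


def read_variant(request: str) -> int:
    """Return the 5-bit value of the first data symbol after the last '1'."""
    # '1' is not in the bech32 alphabet, so the last '1' is the separator.
    sep = request.rfind("1")
    if sep == -1 or sep + 1 >= len(request):
        raise ValueError("no data part after the '1' separator")
    symbol = request[sep + 1].lower()
    value = CHARSET.find(symbol)
    if value == -1:
        raise ValueError(f"{symbol!r} is not a bech32 symbol")
    return value


def describe(request: str) -> str:
    """Name the variant, without touching the payload of unknown variants."""
    return KNOWN_VARIANTS.get(read_variant(request), "unsupported variant")
```

For example, `read_variant("ethtx1qqqq")` gives 0 and `read_variant("ethtx1pqqq")` gives 1, matching the `q` and `p` prefixes used for the variants in this post.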
We keep a similar encoding:
- `type` (5 bits)
- `data_length` (10 bits, big-endian)
- `data` (`data_length` * 5 bits)
The choice of 5-bit units here is to line up with the bech32 symbols.
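To make the layout concrete, here is a minimal sketch of encoding and decoding entries as lists of 5-bit symbols. The function names are mine, and I'm assuming `data_length` counts symbols (as in BOLT-11) and that entries are simply concatenated.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def encode_tlv(tag: str, data: list[int]) -> list[int]:
    """Encode one entry: 1 type symbol, 2 length symbols, then the data."""
    if len(data) >= 1 << 10:
        raise ValueError("data_length must fit in 10 bits")
    if any(not 0 <= s < 32 for s in data):
        raise ValueError("data symbols must be 5-bit values")
    type_symbol = CHARSET.index(tag)
    # 10-bit big-endian length split into two 5-bit symbols.
    return [type_symbol, len(data) >> 5, len(data) & 31, *data]


def decode_tlvs(symbols: list[int]) -> list[tuple[str, list[int]]]:
    """Split a symbol stream into (tag letter, data symbols) pairs."""
    entries = []
    pos = 0
    while pos < len(symbols):
        if pos + 3 > len(symbols):
            raise ValueError("truncated TLV header")
        tag = CHARSET[symbols[pos]]
        length = (symbols[pos + 1] << 5) | symbols[pos + 2]
        start, end = pos + 3, pos + 3 + length
        if end > len(symbols):
            raise ValueError("truncated TLV data")
        entries.append((tag, symbols[start:end]))
        pos = end
    return entries
```

Because every field is a whole number of symbols, the type of each entry shows up as a single readable character in the encoded string.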
Variant 0 (`q` prefix)
Variant 0 tries to match directly to EIP-681-style request URLs.
Not all of these TLVs would be required, but we can include:
- `a` (29) - 20 bytes (32 symbols) - target address (required)
- `q` (0) - `data_length` bytes - value [see value encoding]
- `f` (9) - `data_length` bytes (trailing padded) - ASCII function signature [see note 1]
- `s` (16) - 4 bytes (7 symbols, padded) - raw 4-byte function selector
- `m` (27) - variable bytes (trailing padded) - message call bytes, decodable using function signature
- `c` (24) - 4 bytes (7 symbols, padded) - chain ID
- `p` (1) - `data_length` bytes - ASCII purpose or some other message
- `l` (31) - ? bytes - gas limit
- `g` (8) - 10 bits (2 symbols) - gas price
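For the byte-valued fields, here is a sketch of packing bytes into 5-bit symbols with trailing zero padding, which is useful for double-checking the symbol counts in the list. The helper names are hypothetical. Packing `n` bytes this way takes `ceil(8 * n / 5)` symbols, so 20 bytes take 32 symbols and 4 bytes take 7.

```python
def bytes_to_symbols(data: bytes) -> list[int]:
    """Pack bytes into 5-bit symbols, zero-padding the final symbol."""
    acc = 0   # bit accumulator
    bits = 0  # number of bits in acc not yet emitted
    out = []
    for byte in data:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append((acc >> bits) & 31)
    if bits:
        # Left-align the leftover bits in one last symbol.
        out.append((acc << (5 - bits)) & 31)
    return out


def symbols_to_bytes(symbols: list[int], length: int) -> bytes:
    """Unpack 5-bit symbols into exactly `length` bytes, checking padding."""
    acc = 0
    for s in symbols:
        acc = (acc << 5) | s
    pad = len(symbols) * 5 - length * 8
    if not 0 <= pad < 5 or acc & ((1 << pad) - 1):
        raise ValueError("wrong symbol count or non-zero padding")
    return (acc >> pad).to_bytes(length, "big")
```

A 4-byte chain ID of 1, for instance, packs to `[0, 0, 0, 0, 0, 0, 8]`: the low bit lands in the seventh symbol, followed by 3 padding bits.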
If the value field is missing, we can assume it’s 0. We can pass call data
with `m` and specify only the selector with `s`, but that leaves the wallet
more work to figure out how to present a useful UI to the user. If we only
pass a function signature with `f`, then we can give the user options to
decide what to do with it (as we could with EIP-681). We don’t actually have
to include a separate TLV entry for each parameter if we have a signature to
decode the calldata bytes with.
This would of course be useful even for simple ETH value transfers, and would
encode them pretty succinctly. If we omit the chain ID we can assume it’s
referring to the user’s current chain, as with EIP-681.
We could also add another tag `n` to refer to a target address by ENS name.
Value encoding
It’s often not necessary to include extremely precise quantities. Even when we
do, it’s wasteful to spend space on lots of zero bits. So it makes more sense
to me to encode quantities in an exponent-mantissa form, like floating point
but restricted to integers.
Since `10 ** 18` takes 60 bits to represent, if we set aside 10 bits (2
symbols) as an exponent, we get more than we really need (a scale factor of up
to `2 ** (2 ** 10 - 1)`), but setting aside only 1 symbol (shifts of at most 31
bits) isn’t enough. The following symbols we can treat as a kind-of mantissa.
This seems complicated, but it’s actually easy to implement. We just treat the
3rd and later symbols as a big-endian number and shift it left by the amount
of the exponent. We go the other way by counting the trailing zero bits and
working backwards.
// TODO python reference code
This does have a limitation that it’s possible to encode the same number in
different ways, but we are able to decide that the form with the highest
possible exponent (no trailing zeros in the mantissa) is the canonical form.
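As a starting point for the reference code, here is a minimal sketch of the scheme as I've described it: 2 exponent symbols, then the mantissa as big-endian symbols. Two details are assumptions on my part: zero is encoded as exponent 0 with an empty mantissa, and the mantissa uses the fewest symbols possible (no leading zero symbols).

```python
MAX_EXPONENT = (1 << 10) - 1  # largest shift a 10-bit exponent can hold


def encode_value(value: int) -> list[int]:
    """Encode a non-negative integer in canonical exponent-mantissa form."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return [0, 0]  # assumption: zero has exponent 0 and no mantissa
    # Trailing zero bits of value give the highest usable exponent.
    trailing_zeros = (value & -value).bit_length() - 1
    exponent = min(trailing_zeros, MAX_EXPONENT)
    mantissa = value >> exponent
    symbols = []
    while mantissa:
        symbols.append(mantissa & 31)
        mantissa >>= 5
    symbols.reverse()  # big-endian symbol order
    return [exponent >> 5, exponent & 31, *symbols]


def decode_value(symbols: list[int]) -> int:
    """Read the 10-bit exponent, then shift the big-endian mantissa by it."""
    if len(symbols) < 2:
        raise ValueError("need at least the two exponent symbols")
    exponent = (symbols[0] << 5) | symbols[1]
    mantissa = 0
    for s in symbols[2:]:
        mantissa = (mantissa << 5) | s
    return mantissa << exponent


def is_canonical(symbols: list[int]) -> bool:
    """True if re-encoding the decoded value gives back the same symbols."""
    return encode_value(decode_value(symbols)) == symbols
```

With this sketch, `encode_value(10 ** 18)` comes out as 11 symbols: an exponent of 18 followed by a 9-symbol mantissa holding `5 ** 18`. A non-canonical input such as `[0, 0, 1, 0]` still decodes to 32, but `is_canonical` rejects it in favour of `[0, 5, 1]`.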
Variant 1 (`p` prefix)
As mentioned before, I think it’s worthwhile to develop a more concise format
for representing common operations like token transfers. This would let us
more efficiently represent a few parameters and let us further optimize what we
are able to do with this request pattern. We wouldn’t have to include this in
any initial EIP, but I would like to see it…
- `a` (29) - 20 bytes (32 symbols) - target address (required)
- `q` (0) - `data_length` bytes - value [see value encoding]
- `t` (11) - 20 or 24 bytes (32 or 39 symbols, padded) - localized token contract address (repeatable)
- `k` (22) - variable bytes - token identifier
- `c` (24) - 4 bytes (7 symbols, padded) - chain ID (repeatable)
- `p` (1) - `data_length` bytes - ASCII purpose or some other message
The “localized token contract address” param here is a 20-byte contract address
and optionally another 4 bytes to indicate chain ID. If a receiver has funds
on multiple chains and doesn’t know where the sender has funds, they can
specify a list of these localized tokens to present the sender with a list of
options to choose from if they have the appropriate funds without the receiver
having to produce several payment requests.
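A small sketch of how a wallet might handle these, assuming the optional 4-byte chain ID is big-endian and that a missing chain ID means "use the request's chain". Both are my assumptions, and the names are hypothetical.

```python
from typing import NamedTuple, Optional


class LocalizedToken(NamedTuple):
    address: bytes           # 20-byte token contract address
    chain_id: Optional[int]  # None: fall back to the request's chain (assumed)


def parse_localized_token(payload: bytes) -> LocalizedToken:
    """Split a 20- or 24-byte payload into address and optional chain ID."""
    if len(payload) == 20:
        return LocalizedToken(payload, None)
    if len(payload) == 24:
        chain_id = int.from_bytes(payload[20:], "big")  # assumed byte order
        return LocalizedToken(payload[:20], chain_id)
    raise ValueError("expected 20 or 24 bytes")


def payable_options(tokens: list[LocalizedToken],
                    funded_chains: set[int]) -> list[LocalizedToken]:
    """Keep only the tokens on chains where the sender holds funds."""
    return [t for t in tokens if t.chain_id in funded_chains]
```

The sender's wallet would parse every repeated entry and then offer only the options returned by something like `payable_options`.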
Now, this involves some duplication. If there was a generally-agreed-upon
registry of equivalent token contracts on different ledgers (tokenlists is a
good place to start), then we could reference that (with `k`), avoid including
the full token contract addresses, and only include the list of chain IDs
(with `c`) that we know have the token contracts, then look up those addresses
locally. To make this less error-prone, we could add an extra checksum tag or
some other tag to ensure that we find the same list of contracts. This also
lets this format more naturally support networks (like zkSync v1, as
previously discussed) that don’t have their own token address space but
inherit it from elsewhere and reference tokens by index.
If we borrow the `n` tag from above to refer to a receiver succinctly, then
it’s possible we can have really short payment requests in this model,
since we don’t include any full addresses that require 32 symbols to encode.
Notes
- A binary ABI encoding of function signatures that matches general Solidity patterns would be ideal here, but for the purposes of spec discussion we can assume it’s just a regular textual ABI string. I’m not sure whether it’s worth trying to match textual ABIs 1:1, or whether we can optimize and include only 4-byte selectors, expecting wallets to infer what’s actually happening.