Increasing address size from 20 to 32 bytes

I’m wondering if, instead of deciding on one address format with hardcoded support for shard/L2 chain addressing, it might not be better to have optional address extensions.

So in the base 32 byte address, there could be a byte stating whether the address is followed by a second 32 byte extended address. If the byte is 0, there is no extended address information. If the byte is not 0, its value would specify the type of the address extension that follows in the next 32 bytes. Any address extension would be optional, but as you described providing it might be a way to optimize transfers.
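A minimal sketch of how a parser for this scheme might look. The position of the flag byte (`EXT_FLAG_INDEX`) and the function name are assumptions made for illustration; the post does not say which byte of the base address would carry the flag.

```python
from typing import Optional, Tuple

BASE_LEN = 32        # base address size discussed in the thread
EXT_LEN = 32         # size of the optional extended address
EXT_FLAG_INDEX = 0   # hypothetical: which byte of the base address holds the flag


def parse_address(data: bytes) -> Tuple[bytes, int, Optional[bytes]]:
    """Split raw bytes into (base address, extension type, extension or None)."""
    if len(data) < BASE_LEN:
        raise ValueError("address is shorter than the 32-byte base")
    base = data[:BASE_LEN]
    ext_type = base[EXT_FLAG_INDEX]
    if ext_type == 0:
        # Flag byte 0: no extended address information may follow.
        if len(data) != BASE_LEN:
            raise ValueError("flag byte is 0 but extra bytes follow")
        return base, 0, None
    # Non-zero flag: its value names the type of the 32-byte extension.
    if len(data) != BASE_LEN + EXT_LEN:
        raise ValueError("flag byte is non-zero but no 32-byte extension follows")
    return base, ext_type, data[BASE_LEN:]
```

A caller that does not understand a given extension type could still fall back to the base address, which is what makes the extension optional.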

Great idea! Doesn’t increase the base address size, and provides 32 bytes in which to encode multiple chain IDs when this functionality is needed :)

IMHO this extension (that enables multiple chain IDs to be encoded) will be needed in most cases when an address that tokens might be sent to is shared/published. E.g. when users share addresses with one another, most of the time they should be using this address extension.

Just enforcement, and obviousness that the checksum exists.

Why not allow for the full 256 bit keccak256 checksum? GUIs can include however many bytes they want at the end of the address when copying and pasting. Ethereum can specify that anything under 4 bytes is invalid for GUIs.
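A rough sketch of what a variable-length checksum suffix could look like. Note the loud assumption: Python's standard library has no keccak256, so `hashlib.sha3_256` (NIST SHA-3, which produces different digests) is used purely as a stand-in. The function names and the "address followed by checksum bytes" layout are also illustrative, not part of any proposal.

```python
import hashlib

MIN_CHECKSUM_BYTES = 4   # minimum suggested in the post for GUIs
MAX_CHECKSUM_BYTES = 32  # the full 256-bit digest


def _digest(address: bytes) -> bytes:
    # Stand-in only: sha3_256 is NOT Ethereum's keccak256.
    return hashlib.sha3_256(address).digest()


def with_checksum(address: bytes, n: int = MIN_CHECKSUM_BYTES) -> bytes:
    """Append the first n bytes of the digest to the address."""
    if not MIN_CHECKSUM_BYTES <= n <= MAX_CHECKSUM_BYTES:
        raise ValueError("checksum length must be between 4 and 32 bytes")
    return address + _digest(address)[:n]


def verify(data: bytes, address_len: int = 32) -> bool:
    """Accept any checksum length from 4 to 32 bytes after the address."""
    address, checksum = data[:address_len], data[address_len:]
    if not MIN_CHECKSUM_BYTES <= len(checksum) <= MAX_CHECKSUM_BYTES:
        return False
    return _digest(address)[: len(checksum)] == checksum
```

Because the verifier compares against a prefix of the full digest, a GUI that shows 4 checksum bytes and one that shows 8 would both validate against the same underlying address.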

I’m all for longer addresses… but if you’re going to make big incompatible changes, why not bundle that with switching from EVM to EWASM or RISCV or something, instead of making an ugly and futile attempt at backwards compatibility?

This may be misleading. It also requires 2**80 bytes (around 1 bn petabytes) of (fast) memory. You can trade off to something like 1,000 petabytes of memory and 2**100 hashes, but it’s still hard.
I’m not saying this is not a problem, just making the current situation somewhat clearer.


This may be misleading. It also requires 2**80 bytes (around 1 bn petabytes) of (fast) memory.

I’m pretty sure you can use cycle finding algos to find a collision in sqrt time and O(1) memory.

E.g., https://diglib.tugraz.at/download.php?id=576a7826f0534&location=browse discusses some approaches to this around page 55.
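To make the constant-memory claim concrete, here is a toy sketch using Floyd's cycle-finding algorithm. It is one of several cycle-finding methods and not necessarily the one described in the linked document. The hash is SHA-256 truncated to 24 bits (an assumption chosen so the search finishes in well under a second); for a 160-bit address the same walk would take on the order of 2**80 steps.

```python
import hashlib


def h(x: bytes, nbytes: int = 3) -> bytes:
    """Toy hash: SHA-256 truncated to nbytes (24 bits by default)."""
    return hashlib.sha256(x).digest()[:nbytes]


def find_collision(seed: bytes, nbytes: int = 3):
    """Return (a, b) with a != b and h(a) == h(b), storing only two states.

    The seed should not itself be a possible hash output (for example,
    give it a different length), so the walk starts outside the cycle.
    """
    f = lambda x: h(x, nbytes)
    # Phase 1: advance at speeds 1 and 2 until the walkers meet on the cycle.
    tortoise, hare = f(seed), f(f(seed))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))
    # Phase 2: restart one walker at the seed. They converge on the cycle
    # entry, and the two states just before it hash to the same value.
    tortoise = seed
    if tortoise == hare:
        return None  # seed was on the cycle; no distinct predecessors
    while True:
        next_t, next_h = f(tortoise), f(hare)
        if next_t == next_h:
            return tortoise, hare
        tortoise, hare = next_t, next_h
```

Calling `find_collision(b"seed")` returns two distinct byte strings with the same truncated hash, without ever building a table of previously seen values.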


What was the motivation to use 20 byte addresses instead of the 32-byte addresses which are generated by default?


It was a holdover from Bitcoin (much like the v value in signatures being increased by 27).


This is a brief survey of all (?) EVM opcodes which interact with addresses.

ADDRESS, ORIGIN, CALLER, COINBASE

Input: nothing.
Output: stack item with an address from the execution environment. Currently 160 bits.

BALANCE, EXTCODESIZE, EXTCODECOPY, EXTCODEHASH

Input: an address from stack, which we currently truncate to 160 bits.
Output: info about that address is possibly pushed to stack.

CALL, CALLCODE, DELEGATECALL, STATICCALL

Input: an address from stack, which we currently truncate to 160 bits.
Output: error code from the message call.

CREATE, CREATE2

Input: The init code, endowment, and possibly a salt.
Output: new account’s address (currently 160 bits) to stack, where the new address is roughly: Hash(creating contract's address, creating contract's nonce or a salt, init code)[12:32]. This interacts with addresses in two ways: as the input and output of a hash.

SELFDESTRUCT

Input: an address from stack, which we currently truncate to 160 bits.
Output: send the remaining balance to that address.

SLOAD, SSTORE

These only implicitly touch the current contract’s address to read/write its storage.
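As a small illustration of the "truncate to 160 bits" behaviour mentioned in the survey above, the following sketch masks a 256-bit stack word down to its low 160 bits. The function name is made up for this example; it also shows that taking bytes `[12:32]` of a 32-byte big-endian hash is the same operation.

```python
ADDRESS_BITS = 160
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def truncate_to_address(stack_word: int) -> int:
    """Keep only the low 160 bits of a 256-bit stack word."""
    if not 0 <= stack_word < (1 << 256):
        raise ValueError("stack words are unsigned 256-bit integers")
    return stack_word & ADDRESS_MASK


# Two words that differ only in their top 96 bits name the same account today.
word_a = (0xDEAD << 160) | 0x1234
word_b = 0x1234
assert truncate_to_address(word_a) == truncate_to_address(word_b) == 0x1234

# Slicing a 32-byte big-endian digest with [12:32] is the same truncation.
digest = bytes(range(32))
assert int.from_bytes(digest[12:32], "big") == truncate_to_address(
    int.from_bytes(digest, "big")
)
```

Any move to 32-byte addresses has to decide what happens to existing code that relies on those top 96 bits being silently discarded.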


Has anyone considered the effects of this change on vanity addresses? It seems to me that 0x01000000000157aE408398dF7E5f4552091A69125d5dFcb7B8C2659029395bdF might be considered a vanity address, and that it will be harder in the future to differentiate between “high-effort” and “low-effort” ones. It might not be the most important detail, but as users often check some parts of the address to verify its correctness, we should keep this in mind.


Is it worth considering some of the following?

(extensions may want to consider multihash “support” https://w3c-ccg.github.io/multihash/index.xml)

A random collision indeed. But here you need chosen prefixes - two different ones. Notice the fact that the prefixes are different is meaningful since the iterators travel different paths in the space.

Notice the fact that the prefixes are different is meaningful since the iterators travel different paths in the space.

At worst, you can just have an iterator that randomly hops between both parts of the space (EOA pubkeys and contract codes fitting a template), and if you find a collision there’s a 50% chance that one preimage is a pubkey and the other preimage is a contract code. So I don’t think this is a barrier.

Err… You need everything to be deterministic so you keep cycling.

Edit: Ah of course if you use (state % 2) as your random number then everything works out deterministically and perfectly. True.
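A toy sketch of the `(state % 2)` trick, under several assumptions: SHA-256 truncated to 24 bits stands in for the address hash, the `pubkey:` and `code:` prefixes are made-up placeholders for the two preimage templates, and the per-attempt `salt` plus the retry loop are additions for the demo so that a failed attempt (both preimages of the same kind) can be retried on a fresh walk.

```python
import hashlib
from typing import Optional, Tuple

NBYTES = 3  # toy output size (24 bits) so the search finishes quickly


def trunc_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()[:NBYTES]


def make_preimage(state: bytes, salt: bytes) -> bytes:
    # The low bit of the state picks the branch, so the walk is deterministic.
    kind = b"pubkey:" if state[-1] % 2 == 0 else b"code:"
    return kind + salt + state


def find_mixed_collision(max_attempts: int = 64) -> Optional[Tuple[bytes, bytes]]:
    """Look for one 'pubkey' and one 'code' preimage with the same hash."""
    for attempt in range(max_attempts):
        salt = b"%d:" % attempt
        step = lambda s: trunc_hash(make_preimage(s, salt))
        seed = b"seed"  # 4 bytes, so it can never equal a 3-byte hash output
        # Floyd phase 1: find a meeting point on the cycle.
        tortoise, hare = step(seed), step(step(seed))
        while tortoise != hare:
            tortoise, hare = step(tortoise), step(step(hare))
        # Floyd phase 2: walk to the two states just before the cycle entry.
        tortoise = seed
        while step(tortoise) != step(hare):
            tortoise, hare = step(tortoise), step(hare)
        a, b = make_preimage(tortoise, salt), make_preimage(hare, salt)
        # Roughly half of the collisions pair one preimage of each kind.
        if a.startswith(b"pubkey:") != b.startswith(b"pubkey:"):
            return a, b
    return None
```

Each attempt uses constant memory, and since about half of the collisions found are mixed, the retry loop only adds a small constant factor to the expected work.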

If we’re defining a new address format, can we please define a canonical text representation that is not just the hexadecimal encoding of the address? Ethereum’s lack of a checksum in its text representation is one of its greatest weaknesses, and if everyone has to support a new address format anyway, that’s an excellent time to fix it. This should be a core part of any new address proposal, and not an afterthought - if 32 byte hexadecimal addresses get a foothold, it will be impossible to fix this (again).


Why not still keep 20 bytes in the EVM, while adding extra fields in the tx/msg to include chain_id/epoch_id? From a normal user’s perspective, the address is 32 bytes instead of 20 bytes, but wallets will automatically translate the 32 bytes into a 20-byte hash + chain_id + others and put them in the proper data fields.

For example, a tx sending to a 32-byte address will put “to” field as 20 bytes, and “to_chain_id” field from a 32-byte address.

And the EVM can stay as it is, except for adding a few opcodes to read values like the “to_chain_id” of the current tx context.
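A sketch of the wallet-side translation described above. The byte layout is purely an assumption for illustration (the first 12 bytes carry the chain id, the last 20 bytes carry the address hash); the post does not define one, and `TxTarget`, `split_user_address`, and `join_user_address` are hypothetical names.

```python
from dataclasses import dataclass

CHAIN_ID_BYTES = 12  # hypothetical layout: 12-byte chain id + 20-byte hash
HASH_BYTES = 20


@dataclass(frozen=True)
class TxTarget:
    to: bytes          # 20-byte address, as the EVM sees it today
    to_chain_id: int   # carried in a separate transaction field


def split_user_address(addr32: bytes) -> TxTarget:
    """Translate a user-facing 32-byte address into tx fields."""
    if len(addr32) != CHAIN_ID_BYTES + HASH_BYTES:
        raise ValueError("expected a 32-byte address")
    chain_id = int.from_bytes(addr32[:CHAIN_ID_BYTES], "big")
    return TxTarget(to=addr32[CHAIN_ID_BYTES:], to_chain_id=chain_id)


def join_user_address(target: TxTarget) -> bytes:
    """Inverse of split_user_address, for displaying an address to the user."""
    if len(target.to) != HASH_BYTES:
        raise ValueError("expected a 20-byte 'to' field")
    return target.to_chain_id.to_bytes(CHAIN_ID_BYTES, "big") + target.to
```

In this layout the EVM-visible `to` field is unchanged, which is the point of the proposal; it also means the 160-bit hash itself is no longer than before.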

I don’t think that would actually solve either the address space expansion problem or the security problem… the issues all happen in the EVM, not in clients.


Contract level checksum validation or (maybe in the future) EVM level checksum validation.


Is there a benefit to having the shard id in the address, instead of adding it in the transaction along with the chain id?