Increasing address size from 20 to 32 bytes

I’m all for longer addresses… but if you’re going to make big incompatible changes, why not bundle that with switching from EVM to EWASM or RISC-V or something, instead of making an ugly and futile attempt at backwards compatibility?

This may be misleading. It also requires 2**80 bytes (around 1 bn petabytes) of (fast) memory. You can trade memory for time, e.g. something like 1,000 petabytes of memory and 2**100 hashes, but it’s still hard.
I’m not saying this is not a problem, just making the current situation somewhat clearer.
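For reference, these figures come from the birthday bound: a generic collision on an n-bit hash costs about 2**(n/2) evaluations. A quick sketch of the arithmetic, assuming decimal petabytes (10**15 bytes) and reusing the post's 2**80-byte memory figure as-is:

```python
# Rough birthday-bound arithmetic for address collisions.
PETABYTE = 10**15  # decimal petabyte, in bytes (assumption about the unit)


def birthday_work(bits: int) -> int:
    """Approximate hash evaluations (2**(bits/2)) for a generic collision."""
    return 2 ** (bits // 2)


work_160 = birthday_work(160)   # 20-byte addresses: 2**80
memory_pb = 2**80 / PETABYTE    # the post's 2**80-byte memory figure, in PB
work_256 = birthday_work(256)   # 32-byte addresses: 2**128

print(f"160-bit: {work_160:.3e} hashes, {memory_pb:.2e} PB of memory")
print(f"256-bit: {work_256:.3e} hashes")
```

So "around 1 bn petabytes" is about 1.2 billion PB, while moving to 32-byte addresses raises the generic collision cost from 2**80 to 2**128.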

This may be misleading. It also requires 2**80 bytes (around 1 bn petabytes) of (fast) memory.

I’m pretty sure you can use cycle-finding algorithms to find a collision in square-root time (about 2**80 hashes for a 160-bit hash) and O(1) memory.

E.g. https://diglib.tugraz.at/download.php?id=576a7826f0534&location=browse discusses some approaches to this around page 55.
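A toy sketch of the idea using Floyd's cycle-finding algorithm (the textbook variant, not necessarily one of the approaches in the linked document). Assumptions: SHA-256 truncated to 24 bits stands in for the 160-bit address hash so the demo finishes quickly, and `h`, `floyd_collision` and `find_collision` are hypothetical names. Only a constant number of values is kept in memory; at 160 bits the same walk would take on the order of 2**80 steps.

```python
import hashlib

BITS = 24                  # toy output size (assumption); addresses use 160
MASK = (1 << BITS) - 1


def h(x: int) -> int:
    """Truncated SHA-256 as a stand-in for the address hash."""
    digest = hashlib.sha256(x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big") & MASK


def floyd_collision(start: int):
    """Return (a, b) with a != b and h(a) == h(b), using O(1) memory."""
    # Phase 1: tortoise takes one step, hare two, until they meet on the cycle.
    tortoise, hare = h(start), h(h(start))
    while tortoise != hare:
        tortoise, hare = h(tortoise), h(h(hare))
    # Phase 2: restart the tortoise; both now step once per iteration.
    tortoise = start
    if tortoise == hare:
        return None  # start is on the cycle itself, so there is no tail to collide
    # Stop just before they meet: the two predecessors of the cycle entry collide.
    while h(tortoise) != h(hare):
        tortoise, hare = h(tortoise), h(hare)
    return tortoise, hare


def find_collision(max_starts: int = 100):
    for start in range(max_starts):
        pair = floyd_collision(start)
        if pair is not None:
            return pair
    raise RuntimeError("no collision found")


a, b = find_collision()
print(f"h({a}) == h({b}) == {h(a)}")
```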

What was the motivation for using 20-byte addresses instead of the full 32-byte hash output that is generated by default?

It was a holdover from Bitcoin (much like the v value in signatures being offset by 27).

This is a brief survey of all (?) EVM opcodes which interact with addresses.

ADDRESS, ORIGIN, CALLER, COINBASE

Input: nothing.
Output: stack item with an address from the execution environment. Currently 160 bits.

BALANCE, EXTCODESIZE, EXTCODECOPY, EXTCODEHASH

Input: an address from the stack, which we currently truncate to 160 bits.
Output: information about that address, pushed to the stack (EXTCODECOPY instead copies code to memory).

CALL, CALLCODE, DELEGATECALL, STATICCALL

Input: an address from the stack, which we currently truncate to 160 bits.
Output: a success flag from the message call (1 on success, 0 on failure).

CREATE, CREATE2

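Input: the init code, endowment, and (for CREATE2) a salt.
Output: the new account’s address (currently 160 bits) pushed to the stack, where the new address is roughly Hash(creating contract's address, creating contract's nonce or a salt, init code)[12:32]. More precisely, CREATE hashes the creator's address and nonce, while CREATE2 hashes 0xff, the creator's address, the salt, and the hash of the init code. This interacts with addresses in two ways: as the input and as the output of a hash.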
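A sketch of the CREATE2 shape, showing both places where addresses touch the hash. Python's standard library has no Keccak-256, so `hashlib.sha3_256` is used as a stand-in (an assumption: the outputs will not match real Ethereum addresses). The `size=32` variant is hypothetical and only illustrates what keeping the whole digest would look like; all function names are made up for this example.

```python
import hashlib


def stand_in_hash(data: bytes) -> bytes:
    # ASSUMPTION: SHA3-256 stands in for Keccak-256, which Python's stdlib lacks.
    # The two differ in padding, so outputs will not match real Ethereum addresses.
    return hashlib.sha3_256(data).digest()


def create2_digest(sender: bytes, salt: bytes, init_code: bytes) -> bytes:
    """Full 32-byte digest shaped like CREATE2: H(0xff ++ sender ++ salt ++ H(init_code))."""
    if len(salt) != 32:
        raise ValueError("salt must be 32 bytes")
    # Addresses enter the hash as input: today sender is 20 bytes; in a
    # 32-byte scheme it would be 32 bytes (hypothetical).
    return stand_in_hash(b"\xff" + sender + salt + stand_in_hash(init_code))


def create2_address(sender: bytes, salt: bytes, init_code: bytes, size: int = 20) -> bytes:
    """Addresses leave the hash as output: keep the last `size` bytes."""
    return create2_digest(sender, salt, init_code)[32 - size:]


sender = bytes(20)
salt = bytes(32)
addr20 = create2_address(sender, salt, b"\x00")           # bytes 12..31, as today
addr32 = create2_address(sender, salt, b"\x00", size=32)  # hypothetical: keep all 32
print(addr20.hex(), addr32.hex())
```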

SELFDESTRUCT

Input: an address from the stack, which we currently truncate to 160 bits.
Output: send the remaining balance to that address.

SLOAD, SSTORE

These only implicitly touch the current contract’s address to read/write its storage.
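A minimal model of the truncation mentioned above for BALANCE, CALL, SELFDESTRUCT and friends (the names are hypothetical). It shows one reason the change is not backwards compatible: today any junk in the top 96 bits of a stack word is silently ignored, whereas with 32-byte addresses those bits would select a different account.

```python
ADDRESS_BITS = 160
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
WORD_MASK = (1 << 256) - 1


def address_operand(stack_word: int) -> int:
    """Model of how address-taking opcodes read an address today:
    the 256-bit stack word is reduced mod 2**160, dropping the top 96 bits."""
    return (stack_word & WORD_MASK) & ADDRESS_MASK


legacy = 0x1234
dirty = (0xDEAD << 160) | legacy    # same low 160 bits, junk in the high bits
print(hex(address_operand(dirty)))  # 0x1234: the junk is silently ignored
```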

Has anyone considered the effects of this change on vanity addresses? It seems to me that 0x01000000000157aE408398dF7E5f4552091A69125d5dFcb7B8C2659029395bdF could be taken for a vanity address, and that it will become harder to tell “high-effort” vanity addresses from “low-effort” ones. This may not be the most important detail, but since users often check only some parts of an address to verify that it is correct, we should keep it in mind.

Is it worth considering some of the following?

(extensions may want to consider multihash “support” https://w3c-ccg.github.io/multihash/index.xml)

A random collision, indeed. But here you need chosen prefixes, two different ones. Note that the prefixes being different matters, since the iterators travel different paths through the space.

Note that the prefixes being different matters, since the iterators travel different paths through the space.

At worst, you can just have an iterator that randomly hops between both parts of the space (EOA pubkeys and contract codes fitting a template), and if you find a collision there’s a 50% chance that one preimage is a pubkey and the other preimage is a contract code. So I don’t think this is a barrier.

Err… you need everything to be deterministic so that the walk keeps cycling.

Edit: Ah, of course: if you use (state % 2) as your random bit, everything works out deterministically and perfectly. True.
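The `(state % 2)` trick can be sketched as a toy Floyd walk over two input domains. Assumptions: 24-bit truncated SHA-256 instead of a 160-bit address hash, byte-string tags standing in for "EOA pubkey" and "contract code fitting a template", and a per-attempt salt to get fresh walks. All names are hypothetical.

```python
import hashlib

BITS = 24                                      # toy size (assumption); addresses use 160
MASK = (1 << BITS) - 1
DOMAINS = (b"eoa-pubkey:", b"contract-code:")  # hypothetical domain tags


def f(x: int, salt: int) -> int:
    """One step of the walk: pick a domain from (x % 2), hash, truncate.

    Using the state's low bit as the "random" choice keeps the walk
    deterministic, so cycle finding still works.
    """
    data = DOMAINS[x % 2] + salt.to_bytes(4, "big") + x.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") & MASK


def floyd_collision(salt: int, start: int = 0):
    def step(v: int) -> int:
        return f(v, salt)

    tortoise, hare = step(start), step(step(start))
    while tortoise != hare:
        tortoise, hare = step(tortoise), step(step(hare))
    tortoise = start
    if tortoise == hare:
        return None  # start lies on the cycle: no tail, no collision
    while step(tortoise) != step(hare):
        tortoise, hare = step(tortoise), step(hare)
    return tortoise, hare


def cross_domain_collision(max_tries: int = 64):
    # Each salt gives a fresh pseudo-random walk; about half of the
    # collisions found pair one state from each domain.
    for salt in range(max_tries):
        pair = floyd_collision(salt)
        if pair is not None and pair[0] % 2 != pair[1] % 2:
            return salt, pair
    raise RuntimeError("no cross-domain collision found")


salt, (a, b) = cross_domain_collision()
print(DOMAINS[a % 2], DOMAINS[b % 2], f(a, salt) == f(b, salt))
```

Each attempt yields a collision whose two preimages land in different domains about half the time, matching the 50% estimate above, so the expected number of attempts is small.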

If we’re defining a new address format, can we please define a canonical text representation that is not just the hexadecimal encoding of the address? Ethereum’s lack of a checksum in its text representation is one of its greatest weaknesses, and if everyone has to support a new address format anyway, that’s an excellent time to fix it. This should be a core part of any new address proposal, not an afterthought: if 32-byte hexadecimal addresses get a foothold, it will be impossible to fix this (again).
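One established checksummed text format is Bitcoin's Base58Check: a version byte and payload, followed by the first 4 bytes of a double SHA-256 over them, all Base58-encoded. A minimal sketch applied to a 32-byte address, with a hypothetical version byte `0x02` and hypothetical function names, purely to illustrate the idea rather than propose a standard:

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
VERSION = b"\x02"  # hypothetical version byte for 32-byte addresses


def _checksum(payload: bytes) -> bytes:
    # First 4 bytes of double SHA-256, as in Bitcoin's Base58Check.
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def b58encode(data: bytes) -> str:
    num = int.from_bytes(data, "big")
    chars = []
    while num > 0:
        num, rem = divmod(num, 58)
        chars.append(ALPHABET[rem])
    # Each leading zero byte is encoded as a literal '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(chars))


def b58decode(text: str) -> bytes:
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)  # ValueError on invalid characters
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + num.to_bytes((num.bit_length() + 7) // 8, "big")


def encode_address(addr: bytes) -> str:
    if len(addr) != 32:
        raise ValueError("expected a 32-byte address")
    payload = VERSION + addr
    return b58encode(payload + _checksum(payload))


def decode_address(text: str) -> bytes:
    raw = b58decode(text)
    payload, check = raw[:-4], raw[-4:]
    if len(payload) != 33 or payload[:1] != VERSION or _checksum(payload) != check:
        raise ValueError("invalid address or checksum")
    return payload[1:]
```

With a 4-byte checksum, a random typo slips through only about once in 2**32 attempts.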

Why not keep 20-byte addresses in the EVM, while adding extra fields to transactions/messages to include chain_id/epoch_id? From a normal user’s perspective the address is 32 bytes instead of 20, but wallets would automatically translate the 32 bytes into a 20-byte hash + chain_id + other values and put them in the proper data fields.

For example, a tx sent to a 32-byte address would set the “to” field to the 20-byte hash, and a “to_chain_id” field derived from the 32-byte address.

And the EVM could stay as it is, apart from adding a few opcodes to read values like “to_chain_id” from the current tx context.

I don’t think that would actually solve either the address space expansion problem or the security problem… the issues all happen in the EVM, not in clients.

Contract-level checksum validation, or (maybe in the future) EVM-level checksum validation.

Is there a benefit to having the shard id in the address, instead of adding it in the transaction along with the chain id?

The situation is actually much worse than this: you can write a harmless-looking contract (let’s say a token wrapper or a Uniswap clone) and use a collision to deploy it to an address that’s also an EOA. You deploy the contract, and after user funds have poured in (trusting the functionality promised by the immutable contract code) you can steal everything using the EOA key.

We (me, @chfast, @gumb0, @hugo-dc) have been working on the address space extension topic for the past two weeks, and it seems to open up a lot of questions. First, we have written what we hope is a more comprehensive specification based on this forum thread, and explored open questions (listed in the latter part of the document): ASE (Address Space Extension) with Translation Map - HackMD

Posting it now in case someone else is working on this too; it would be nice to collaborate to find answers more quickly. We expect to update this document as we go next week. Any feedback is welcome, but be aware that the content is in flux.

There are some differences from the description in this forum thread, which are listed as bullet points in the document. One difference is that we removed references to epochs and just state that bytes 1-5 are reserved and must be zero (i.e. it would be epoch 0). This is meant to help decouple this proposal from state expiry, in the sense that it could be introduced earlier and independently.
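The reserved-bytes rule is easy to check. A sketch, assuming "bytes 1-5" means zero-indexed bytes 1 through 5 inclusive (`reserved_bytes_ok` is a hypothetical name):

```python
def reserved_bytes_ok(addr: bytes) -> bool:
    """True if addr is 32 bytes long and bytes 1-5 (inclusive) are all zero."""
    return len(addr) == 32 and addr[1:6] == bytes(5)


# The example address from the vanity-address post above has 0x01 in byte 5.
example = bytes.fromhex("01000000000157aE408398dF7E5f4552091A69125d5dFcb7B8C2659029395bdF")
print(reserved_bytes_ok(example))                              # False
print(reserved_bytes_ok(example[:5] + b"\x00" + example[6:]))  # True
```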

We are now working on some new ideas and analysis of some concerns we have identified.

We have published this collection of concerns here: Issues with ASE (with a translation map) - HackMD

It is a very long list and some points are more developed than others.

32-byte new address:
Bytes 0-11: EVM/contract internal ID
Bytes 12-31: old 20-byte address
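A minimal sketch of this layout, assuming a big-endian ID (the post doesn't say) and hypothetical function names:

```python
ID_LEN, LEGACY_LEN = 12, 20  # bytes 0-11 and bytes 12-31 of the layout above


def pack_address(internal_id: int, legacy: bytes) -> bytes:
    """Build a 32-byte address from a 12-byte ID and an old 20-byte address."""
    if len(legacy) != LEGACY_LEN:
        raise ValueError("legacy address must be 20 bytes")
    return internal_id.to_bytes(ID_LEN, "big") + legacy  # OverflowError if ID > 96 bits


def unpack_address(addr: bytes) -> tuple[int, bytes]:
    """Split a 32-byte address back into (internal ID, old 20-byte address)."""
    if len(addr) != ID_LEN + LEGACY_LEN:
        raise ValueError("address must be 32 bytes")
    return int.from_bytes(addr[:ID_LEN], "big"), addr[ID_LEN:]


legacy = bytes.fromhex("00" * 19 + "01")
wide = pack_address(7, legacy)
assert unpack_address(wide) == (7, legacy)
# With ID 0, the 32-byte form equals the old address left-padded to a 256-bit word.
assert int.from_bytes(pack_address(0, legacy), "big") == int.from_bytes(legacy, "big")
```

One property of this layout: with ID 0, the 32-byte form is exactly the old address left-padded to a 256-bit stack word, which is how 20-byte addresses already sit on the EVM stack.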