Increasing address size from 20 to 32 bytes

How so? Remember that there’s no need for a checksum within the EVM; the checksum is a user-interface-level convenience. So it should be part of the representation format (as the mixed-case-hex thing is), and not part of the raw address format.

making such entry more difficult and error prone by making the addresses case sensitive

The mixed-case checksums don’t have this risk. If you get the uppercase/lowercase wrong, that can ONLY lead to a “bad checksum” error, NEVER to you accidentally typing in a different but valid address.
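To make the mechanism concrete, here is a minimal sketch of a mixed-case checksum. One loud assumption: EIP-55 uses Keccak-256, which is not in Python's standard library, so this sketch swaps in `hashlib.sha3_256` as a stand-in. Its output therefore does NOT match real EIP-55 strings; only the structure is the same. The names `mixed_case` and `is_valid` are made up for this illustration.

```python
import hashlib

def mixed_case(hex_body: str) -> str:
    """Re-derive the letter casing from a hash of the lowercase address.

    ASSUMPTION: sha3_256 is a stand-in for Keccak-256, so the casing
    produced here differs from real EIP-55 output.
    """
    lower = hex_body.lower()
    digest_hex = hashlib.sha3_256(lower.encode("ascii")).hexdigest()
    return "".join(
        # A letter is uppercased when the matching hash nibble is >= 8.
        ch.upper() if ch in "abcdef" and int(digest_hex[i], 16) >= 8 else ch
        for i, ch in enumerate(lower)
    )

def is_valid(hex_body: str) -> bool:
    """An address string is accepted only if its casing matches the hash."""
    return mixed_case(hex_body) == hex_body

body = "ab" * 20                       # 40 hex characters, all letters
encoded = mixed_case(body)
flipped = encoded[0].swapcase() + encoded[1:]
print(is_valid(encoded), is_valid(flipped))
```

The casing is a pure function of the lowercase address, so a case slip can only turn an accepted string into a rejected one that still names the same 20 bytes.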

Is it possible that an address could get corrupted while sending a transaction or executing contract code?

I can’t possibly imagine what could cause such a thing to happen.

so maybe that could be an argument for longer addresses (40 bytes?)

Addresses longer than 32 bytes are extra-problematic because there are 32 byte limits everywhere in ethereum: storage slot keys, storage slot values, ABI bytes32 values, SSZ chunk sizes… so IMO better to just stick to 32.

Actually, on second thought, maybe it would be nice to start at version 226, so that eth addresses would start with 0xE2 which could be read as Ethereum 2.0.
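A quick check of the arithmetic, as a sketch. It assumes the version is the first byte of the 32-byte address; `versioned_address` is a hypothetical helper, and the remaining 31 bytes are left as zeros rather than guessing at their layout.

```python
def versioned_address(version: int, rest: bytes) -> bytes:
    """Prefix a 31-byte payload with a one-byte version (assumed layout)."""
    if not 0 <= version <= 255:
        raise ValueError("version must fit in one byte")
    if len(rest) != 31:
        raise ValueError("payload must be 31 bytes")
    return bytes([version]) + rest

addr = versioned_address(226, bytes(31))
display = "0x" + addr.hex().upper()
print(display[:4])  # the leading byte as it would appear to a user
```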

I’m ok with this if people want it!

If you get the uppercase/lowercase wrong, that can ONLY lead to a “bad checksum” error, NEVER to you accidentally typing in a different but valid address

I get that, but it may still be frustrating and not user friendly. Suppose someone is trying to send a large amount to a paper wallet. He enters the address, but we just made this entry more difficult because the letters are now case-sensitive. He clicks “send” and gets an error that the address is incorrect. He now totally freaks out.

I think most users would rather type case-insensitive addresses with an additional checksum byte than have to worry about upper and lower case letters in addresses.

I would describe your argument as saying: “It is OK if I make your task more difficult, because I can guarantee that I will let you know if you fail it, and therefore it won’t have terrible consequences”.

There could also be a benefit to be able to check whether addresses are valid in program databases (beacon nodes, etc.). Maybe?

Yes please! At least a nibble, if not four bytes like Bitcoin. Two bytes would let 1 in 65,536 mistakes get through. A nibble would let 1 in 16 mistakes get through.
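The figures follow from treating the checksum as ideal, so that a random corruption passes with probability 2^-n for an n-bit checksum. A small sketch under that assumption (`undetected_error_rate` is an illustrative name):

```python
def undetected_error_rate(checksum_bits: int) -> float:
    """Chance that a random corruption still passes an ideal n-bit checksum."""
    return 2.0 ** -checksum_bits

for label, bits in [("nibble", 4), ("two bytes", 16), ("four bytes", 32)]:
    print(f"{label}: 1 in {2 ** bits} mistakes get through")
```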

Another point: It’s not just users that bit flip. It’s not just for user interfaces. This is a real issue, for computers as well.

See my post on Reddit.

EIP-55 addresses are just regular addresses on chain. There’s no checksum on chain.

EIP-55 addresses are also optional. What I’m asking for is mandatory checksums.

Thank goodness applications like Metamask use checksums by default. But then again, it’s only ~15 bits on average.

But that’s another problem in and of itself. The checksum strength is not guaranteed; it’s only an average value. The checksum could actually contain very few bits. It depends entirely on how many letters (a-f) the original address has.
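Under EIP-55 only the hex letters a-f can carry a checksum bit (through their case); digits carry none. A sketch that counts the bits available for a given address, with `checksum_bits` as an illustrative helper name:

```python
def checksum_bits(address: str) -> int:
    """Number of hex letters in the address, i.e. characters whose case can vary."""
    body = address[2:] if address[:2].lower() == "0x" else address
    return sum(ch in "abcdef" for ch in body.lower())

# A uniformly random hex character is a letter with probability 6/16,
# so a 40-character address averages 40 * 6 / 16 checksum bits.
EXPECTED_BITS = 40 * 6 / 16

print(checksum_bits("0x" + "1" * 40))   # all digits: nothing to checksum with
print(checksum_bits("0x" + "aB" * 20))  # all letters: every character counts
print(EXPECTED_BITS)
```

An address that happens to be mostly digits gets almost no protection, which is the "not guaranteed" part of the complaint.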

Ethereum addresses are 42 characters long with (when checksummed) a base 22 character set.

If they were efficient (which they are not) they could hold 179 bits of data.

With a string length of 42 in “base 66”, you can store 242 bits, which is enough room to do everything Vitalik is suggesting and still have bits left over for checksums.

If using Bitcoin’s base 58, you’d need 44 characters.
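The capacity of a string is its length times log2 of the alphabet size. The sketch below assumes the counts refer to the 40 characters after the `0x` prefix, since that is the reading under which the figures above come out close (roughly 178.4 and 241.8 bits). It also assumes the base 58 figure is for a 256-bit payload. Both are my reading, not something the post states.

```python
import math

def capacity_bits(length: int, base: int) -> float:
    """Information capacity, in bits, of `length` symbols from a `base`-symbol alphabet."""
    return length * math.log2(base)

def min_length(bits: int, base: int) -> int:
    """Fewest symbols of the given alphabet needed to hold `bits` bits."""
    return math.ceil(bits / math.log2(base))

print(capacity_bits(40, 22))   # mixed-case hex: 10 digits + 6 lower + 6 upper
print(capacity_bits(40, 66))   # the hypothetical "base 66" alphabet
print(min_length(256, 58))     # characters needed for 256 bits in base 58
```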

It’s worth noting that in bitcoin, the checksum is also purely a user convenience, not an in-protocol thing; the bitcoin protocol treats addresses purely as a 20-byte hash with no redundancy or error detection of any kind. So as far as I can tell, that is the industry standard way of doing it. And if there are bit flips, they’re far more likely to happen in the much larger parts of the protocol that are not addresses.

Even if it had no technical value (a fact that I am not quite convinced is true), having a checksum is a very potent bragging claim (even better if bitcoin doesn’t have it).

It allows us to say:

Ethereum has a built-in way to validate that addresses are indeed valid addresses, and not just any random string of bits. Before any token transfer, addresses are validated against the embedded checksum and any inconsistency will cause the transaction to be aborted.

Sadly, marketing is important, even for things which make little technical sense.

Proposal: any new address schema should support encoding multiple shard and L2 rollup chain IDs in a single address

The shard and L2 rollup chain IDs encoded in the address would represent the destination chains on which the address owner is happy to receive tokens. For example, if an address includes the IDs for the Ethereum L1, Optimism L2 and ZKSync L2 chains, this would signal that the owner is happy for tokens sent to this address to arrive on any of those three chains.

In a world with multiple rollups, encoding multiple chain IDs in the address would enable wallets to automatically know which L2 rollup chains are valid destinations (for that address) without requiring any additional user input.

Without this change, when (in a world with multiple rollups) a user wishes to make a token transfer, the user would need to know the name of the destination chain in addition to the destination address. This is much worse than the token transfer user experience on Ethereum today!

Requiring users to enter a destination chain name (in addition to a destination address) introduces the following UX regressions:

  • Significantly increases scope for user error (e.g. user sends tokens to the correct address but on the wrong chain)

  • Adds another concept (the fact there are multiple chains with different names) for users to learn, hindering adoption. For mainstream adoption we need to make Ethereum easier to use, not harder.

  • Because in reality most users would only enter/select a single destination chain, in many cases the routing of these cross chain transactions would be suboptimal, increasing the transaction cost the user has to pay, decreasing the transaction speed, and increasing the number of transactions that need to be performed which needlessly places additional load on the network. Many to many chain routing is better than many to one chain routing.

If multiple chain IDs are encoded in Ethereum addresses, all these problems go away and the token transfer user experience in a world with multiple rollups is just as good as the token transfer user experience today.

How it would work:

Let’s say Alice has funds associated with her address on rollups A, B and C. She wants to send funds to Bob who uses his address on rollups B, C and D. Bob sends his address to Alice, and in the address it is encoded that Bob is happy to receive funds to that address on rollups B, C and D.

Alice has 30 ETH associated with her address in total, and this is split equally across rollups A, B and C (10 ETH on each). Alice wants to send Bob 20 ETH. Because Bob’s address has told Alice’s wallet that he is happy to accept funds on rollups B, C and D, Alice’s wallet automatically works out that the quickest and cheapest way to send 20 ETH to Bob is to send 10 ETH from her address to his address on rollup B and another 10 ETH from her address to his address on rollup C. This means that the transfer doesn’t have to cross any rollup boundaries, thereby avoiding the latency and cost of moving funds between rollups.

In terms of the routing of payments, the above is one of the simplest possible examples. In reality different L2s will have different bridge costs and delays, and the sender’s funds will frequently be spread across multiple rollups in a way that won’t match up so neatly with the rollups on which a recipient is happy to receive funds. And when L2 to L2 transfers become feasible there will also be different costs for moving funds between different L2s, adding another layer of routing complexity. It’s unreasonable to expect users to understand and manually work out the best answer to these complex routing problems each time they want to make a token transfer.

Encoding multiple chain IDs in ethereum addresses provides wallets with the information they require to compute the quickest and cheapest way of transferring funds (that are assigned to the same address across multiple rollups) to a recipient whose address is also used across multiple rollups.
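The Alice and Bob example can be sketched as a very naive planner. This is only an illustration of the idea: `plan_transfer` is a hypothetical function, it ignores fees, bridge costs and delays entirely, and it simply gives up when the overlap between the two parties' rollups cannot cover the amount.

```python
def plan_transfer(balances: dict, accepted: set, amount: int) -> dict:
    """Greedy sketch: spend only from rollups the recipient also accepts."""
    plan = {}
    remaining = amount
    for chain in sorted(balances):
        if remaining == 0:
            break
        if chain not in accepted:
            continue  # recipient did not list this rollup in their address
        send = min(balances[chain], remaining)
        if send > 0:
            plan[chain] = send
            remaining -= send
    if remaining > 0:
        raise ValueError("overlap is too small; a cross-rollup bridge would be needed")
    return plan

alice_balances = {"A": 10, "B": 10, "C": 10}   # ETH per rollup
bob_accepts = {"B", "C", "D"}                  # decoded from Bob's address
print(plan_transfer(alice_balances, bob_accepts, 20))
```

A real wallet would replace the greedy loop with a cost-aware search, which is exactly the complexity the paragraph above argues users should not have to handle by hand.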

Assumptions around token UX in a world with multiple rollups

  • In the future, wallets will include simultaneous support for multiple rollup chains (in addition to Ethereum L1)

  • When this happens, wallets will display the token balance for a given address across both Ethereum L1 and all the rollup chains that the wallet supports.

  • Token bridges will be built into wallets, so when a user wishes to transfer tokens across chains, they will be able to do this directly in their wallet without needing to navigate to a token bridge DApp.

See this rough visual mockup which illustrates these UX assumptions and shows how a token transfer flow could work in a world of multiple rollups if multiple chain IDs can be encoded in an address.

Benefits of being able to encode multiple chain IDs in an address in a world with multiple rollups:

  • The user sending tokens only needs to know the destination address (just like when using Ethereum today). Without this change, in a world with multiple rollups, a simple user to user token transfer requires the user to know both the destination address and the destination chain name.

  • Less scope for user error e.g. the user doesn’t have the opportunity to mistakenly enter the name of the wrong destination chain. Follows the UX principle that “when possible, the best way of handling errors is to remove the possibility of the user making the error in the first place!”

  • Lower transaction fees, faster transactions and reduced number of transactions… Wallets would be able to automatically calculate the most efficient routing to transfer X tokens from account A to account B when (for example) account A holds X tokens across rollup A, rollup B and rollup C, and where account B is happy to receive tokens on rollup B, rollup C and rollup D.

  • Requiring the user to manually enter the destination chain for a transaction makes the UX of transferring tokens more complex. By having the ability to encode multiple chain IDs in an address, Ethereum doesn’t get harder to use than it is today in a multi-rollup world.

Open questions:

  • What is the max number of chain IDs we would allow to be encoded in a single address? The cost of allowing a greater number of chain IDs to be encoded in a single address is increased address length. Could allowing a max of say 8 or 16 chain IDs to be encoded in a single address be a sensible number?

  • What is the global max number of chain IDs that would be available for use in the future? The cost of having a greater global max number of chain IDs is increased address length. On the flip side, the greater the number of chain IDs supported, the more future proof this format would be. Perhaps aim to support 32k or 64k chain IDs in total?

  • A table of which chain IDs map to which chains would have to live somewhere
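To put rough numbers on the first two open questions, here is a sketch of the address-length cost. It assumes a plain fixed-width encoding with no compression or length prefix, which is only one of many possible layouts; `bits_per_chain_id` and `extra_bytes` are illustrative names.

```python
def bits_per_chain_id(total_ids: int) -> int:
    """Bits needed to distinguish `total_ids` different chain IDs."""
    return (total_ids - 1).bit_length()

def extra_bytes(ids_per_address: int, total_ids: int) -> int:
    """Bytes added to an address by a fixed-width list of chain IDs."""
    total_bits = ids_per_address * bits_per_chain_id(total_ids)
    return (total_bits + 7) // 8  # round up to whole bytes

for per_address in (8, 16):
    for total in (32 * 1024, 64 * 1024):
        print(per_address, total, extra_bytes(per_address, total))
```

Under these assumptions, 16 chain IDs drawn from a 64k space cost exactly 32 bytes.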

Other thoughts

  • Encoding Chain IDs in addresses should be optional e.g. an address should be valid even if no Chain IDs are encoded in it.

  • I’m +1 on adding native support for checksums to any future address format.

  • We’ve been looking into how we can encode multiple chain IDs into Ethereum’s current address format using mixed case encoding (EIP-55’s checksum encoding mechanism), but it’s not possible to use mixed case encoding for both multiple chain IDs and a checksum at the same time. Natively supporting multiple chain IDs in a new address format would resolve this issue.

IMHO the ability to encode multiple chain IDs in ethereum addresses is highly desirable from a user experience perspective!

That sounds like a fair compromise. I would ask to enforce checksums for user interfaces.

I’m wondering if, instead of deciding on one address format which has hardcoded support for shards/L2 chains addressing, it might not be better to have optional address extensions.

So in the base 32 byte address, there could be a byte stating whether the address is followed by a second 32 byte extended address. If the byte is 0, there is no extended address information. If the byte is not 0, its value would specify the type of the address extension that follows in the next 32 bytes. Any address extension would be optional, but as you described providing it might be a way to optimize transfers.
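A minimal sketch of how a parser might handle this. The position of the flag byte is not specified above, so `EXT_FLAG_INDEX` is purely a placeholder, and `split_address` is a hypothetical helper.

```python
BASE_LEN = 32
EXT_LEN = 32
EXT_FLAG_INDEX = 1  # ASSUMPTION: which byte holds the extension type is not specified

def split_address(raw: bytes):
    """Return (base, extension_type, extension) for a 32- or 64-byte address."""
    if len(raw) < BASE_LEN:
        raise ValueError("address shorter than the 32-byte base")
    base = raw[:BASE_LEN]
    ext_type = base[EXT_FLAG_INDEX]
    if ext_type == 0:
        # Flag byte 0: no extension may follow.
        if len(raw) != BASE_LEN:
            raise ValueError("unexpected bytes after a base-only address")
        return base, 0, b""
    # Non-zero flag: its value names the type of the next 32 bytes.
    if len(raw) != BASE_LEN + EXT_LEN:
        raise ValueError("extension missing or of the wrong length")
    return base, ext_type, raw[BASE_LEN:]

plain = bytes([1, 0]) + bytes(30)
extended = bytes([1, 7]) + bytes(30) + bytes(range(32))
print(split_address(plain)[1], split_address(extended)[1])
```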

Great idea! Doesn’t increase the base address size, and provides 32 bytes in which to encode multiple chain IDs when this functionality is needed :)

IMHO this extension (that enables multiple chain IDs to be encoded) will be needed in most cases when an address that tokens might be sent to is shared/published. E.g. when users share addresses with one another, most of the time they should be using this address extension.

Just enforcement, and obviousness that the checksum exists.

Why not allow for the full 256 bit keccak256 checksum? GUIs can include however many bytes they want at the end of the address when copying and pasting. Ethereum can specify that anything under 4 bytes is invalid for GUIs.

I’m all for longer addresses… but if you’re going to make big incompatible changes, why not bundle that with switching from EVM to EWASM or RISCV or something, instead of making an ugly and futile attempt at backwards compatibility?

This may be misleading. It also requires 2**80 bytes (around 1 bn petabytes) of (fast) memory. You can trade off to something like 1,000 petabytes memory and 2**100 hashes, but it’s still hard.
I’m not saying this is not a problem, just making the current situation somewhat clearer.
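A quick unit check of those figures, assuming decimal petabytes (10^15 bytes). The observation that 1,000 petabytes is about 2**60 bytes, so that memory times hashes stays near 2**160 in both scenarios, is my own reading and may not be the trade-off curve the author had in mind.

```python
import math

PETABYTE = 10 ** 15  # ASSUMPTION: decimal petabytes

full_table_pb = 2 ** 80 / PETABYTE                  # memory for the 2**80-byte variant
reduced_memory_log2 = math.log2(1000 * PETABYTE)    # 1,000 PB expressed as a power of two

print(f"2**80 bytes is about {full_table_pb:.2e} petabytes")
print(f"1,000 petabytes is about 2**{reduced_memory_log2:.1f} bytes")
```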

This may be misleading. It also requires 2**80 bytes (around 1 bn petabytes) of (fast) memory.

I’m pretty sure you can use cycle finding algos to find a collision in sqrt time and O(1) memory.

Eg. https://diglib.tugraz.at/download.php?id=576a7826f0534&location=browse around page 55 talks about some approaches to do this.
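As an illustration of the constant-memory idea, here is a sketch using Floyd's tortoise-and-hare cycle finding on a toy hash. This is not necessarily the approach described in the linked document. The 24-bit truncation of SHA-256 is an assumption chosen so the demo finishes quickly; at 160 bits the same walk would need on the order of 2**80 hash evaluations.

```python
import hashlib

TRUNC_BYTES = 3  # toy 24-bit "address" space; real addresses are 20 bytes

def f(data: bytes) -> bytes:
    """Toy hash: SHA-256 truncated to TRUNC_BYTES bytes."""
    return hashlib.sha256(data).digest()[:TRUNC_BYTES]

def find_collision(seed: bytes):
    """Return two distinct inputs with the same f output, storing only a few values.

    The seed must not be TRUNC_BYTES long, so it can never lie on the cycle itself.
    """
    if len(seed) == TRUNC_BYTES:
        raise ValueError("seed length must differ from the output length")
    # Phase 1: the hare moves twice as fast until both land on the same value.
    tortoise, hare = f(seed), f(f(seed))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))
    # Phase 2: restart the tortoise at the seed and step both in lockstep.
    # The values held just before they coincide are the colliding pair.
    tortoise = seed
    while True:
        next_t, next_h = f(tortoise), f(hare)
        if next_t == next_h:
            return tortoise, hare
        tortoise, hare = next_t, next_h

a, b = find_collision(b"seed")
print(a.hex(), b.hex(), f(a).hex())
```

Only a handful of values are held at any time, regardless of how long the walk takes.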

What was the motivation to use 20 byte addresses instead of the 32-byte addresses which are generated by default?

It was a holdover from bitcoin (much like the v value in signatures being increased by 27).

This is a brief survey of all (?) EVM opcodes which interact with addresses.

ADDRESS, ORIGIN, CALLER, COINBASE

Input: nothing.
Output: stack item with an address from the execution environment. Currently 160 bits.

BALANCE, EXTCODESIZE, EXTCODECOPY, EXTCODEHASH

Input: an address from stack, which we currently truncate to 160 bits.
Output: info about that address is possibly pushed to stack.

CALL, CALLCODE, DELEGATECALL, STATICCALL

Input: an address from stack, which we currently truncate to 160 bits.
Output: error code from the message call.

CREATE, CREATE2

Input: The init code, endowment, and possibly a salt.
Output: new account’s address (currently 160 bits) to stack, where the new address is roughly: Hash(creating contract's address, creating contract's nonce or a salt, init code)[12:32]. This interacts with addresses in two ways: as the input and output of a hash.

SELFDESTRUCT

Input: an address from stack, which we currently truncate to 160 bits.
Output: send the remaining balance to that address.

SLOAD, SSTORE

These only implicitly touch the current contract’s address to read/write its storage.
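The "truncate to 160 bits" behaviour in the survey above can be sketched with plain integers. The helper names are illustrative and this models only the masking and the `[12:32]` slice, not any real EVM implementation.

```python
ADDRESS_BITS = 160
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1

def truncate_to_address(stack_word: int) -> int:
    """Keep the low 160 bits of a 256-bit stack word; the top 96 bits are dropped."""
    return stack_word & ADDRESS_MASK

def low_20_bytes(digest: bytes) -> bytes:
    """The [12:32] slice of a 32-byte hash, i.e. its last 20 bytes."""
    return digest[12:32]

word = (0xDEAD << 160) | 0x1234          # junk in the high bits, address in the low bits
sample_digest = bytes(range(32))         # placeholder 32-byte value, not a real hash

print(hex(truncate_to_address(word)))
print(low_20_bytes(sample_digest).hex())
```

Masking the integer and slicing bytes 12 to 31 of the big-endian encoding select the same 160 bits, so widening addresses to 32 bytes would touch every place where this truncation happens.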

Did someone consider the effects of this change on vanity addresses? It seems to me that 0x01000000000157aE408398dF7E5f4552091A69125d5dFcb7B8C2659029395bdF might be considered a vanity address and that it will be harder in the future to differentiate between “high-effort” and “low-effort” ones. Might not be the most important detail, but as users often check some parts of the address to verify its correctness, we should keep this in mind.

Is it worth considering some of the following?

(extensions may want to consider multihash “support” https://w3c-ccg.github.io/multihash/index.xml)