EIP-615: Subroutines and Static Jumps for the EVM

charles-cooper · March 1, 2019, 8:11pm

Maybe I have an answer to my own question. If you treat JUMPV as an optimized version of a tree of JUMPIF, having all 1’s as conditions would jump to the last leaf in the tree. But that is not really an explanation of JUMPSUBV because it cannot be an optimization of JUMPSUBI (because it is not in the proposal).

gcolvin · March 2, 2019, 7:31am

What is the wording you want @Arachnid? Is there existing wording in the YP that fits? MSB-first, unsigned, highest bit set to 0.

I’m following Wasm’s semantics here. I should probably say so.

gcolvin · March 2, 2019, 7:52am

Sidney Amani extended Yoichi’s work to handle this EIP in Lem. We worked on it off-and-on from Nov '17 through Jan '18. https://github.com/seed/eth-isabelle/tree/evm15

gcolvin · March 2, 2019, 9:41am

They are integers, not arbitrary bit patterns, whether they are described as @chfast does or as @Arachnid does. And the same bit patterns. So “MSB-first, twos-complement, signed, two-byte positive integers”, or “MSB-first, unsigned, two-byte integers less than 2^63.” I don’t have an opinion, @expede.

montyly · March 15, 2019, 8:07pm

I am really enthusiastic about the use of static calls and subroutines. At Trail of Bits, we have spent a lot of effort building static analyzers and reverse-engineering tools for the EVM, and we always struggle because of the lack of a clear stack-frame definition.

These changes will clearly make EVM more suitable for code analysis and help anyone doing program analysis.

Is there any blocker for pushing this EIP? What are the next steps/deadlines?

gcolvin · March 17, 2019, 2:16pm

In a discussion on AllCoreDevs, Martin Swende noticed problems in the Backwards Compatibility section that are probably going to require a versioning scheme.

Greg Colvin @gcolvin 01:42
@holiman I think you are right. A proposal for versioning code upfront is probably needed. eWasm will need one too, if it hasn’t proposed one already.

Noel Maersk @veox 02:44
@gcolvin I vaguely remember @Arachnid proposing a contract versioning scheme (with a VERSION operation IIRC); not sure if this was just chat or a forum topic.
Here: EM thread 2440 (but see ensuing discussion, next few comments at least).
Also with keyword “versioning”: EM thread 2286 (which I haven’t seen before).

Noel Maersk @veox 02:49
There’s also ethereum/EIPs#1712 (draft, “Disallow Deployment of Unused Opcodes”; discussion: sorpaas/EIPs#4) which may be tangentially related.

Noel Maersk @veox 02:59
And issue ethereum/EIPs#154 from 2016; and pull ethereum/EIPs#1707, linked therein…
In short: there’s 3-5 proposals on versioning, in various states of “stuck draft”.

veox · March 17, 2019, 4:50pm

The links in above copy-paste (in order of appearance):

gcolvin · March 17, 2019, 5:10pm

Thanks, Noel @veox.


chfast · March 18, 2019, 10:43am

And https://github.com/ethereum/EIPs/issues/178.

So yes, a number of proposals for EVM versioning have been made. None of them reached a proper EIP draft.

I also believe this is a prerequisite for static jumps.

gcolvin · March 18, 2019, 11:47am

Main blocker has been lack of Foundation funding. (Edit: not so much blocker as slower-downer.)
Next step will be Last Call once issues are resolved here.
Deadlines are tracked at https://en.ethereum.wiki/roadmap/istanbul

2019-05-17 (Fri) hard deadline to accept proposals for “Istanbul”
2019-07-19 (Fri) soft deadline for major client implementations

gcolvin · March 19, 2019, 2:36pm

@chfast @veox @holiman

Most of the existing versioning proposals involve starting the contract with a currently invalid bytecode and interpreting what follows as some sort of version name or number. If we don’t want to sort through them all, reopen their discussions, and get consensus on a general scheme, then we can solve the problem just for this proposal.

We can insist in EIP-615 that the implicit main routine that begins each contract must instead start with an explicit BEGINSUB 0,0. That marks post-615 code and makes it invalid to pre-615 VMs.

In Phase One only post-615 code must be valid. In an optional Phase Two we stop allowing pre-615 contracts at all, except as created by pre-615 contracts.

holiman · March 28, 2019, 7:05pm

Well, what if I, today, deploy a contract that starts with BEGINSUB 0,0? It won’t be executable now, but after the fork, it will look like one of the new contracts. The difference is that my jumps have not been validated, and any deploy-time validations that should have been done have thus been skipped.

gcolvin · March 28, 2019, 9:53pm

Damn. Do we really do that little checking now? I shudder to go look in the Yellow Paper. And fear this is why all of the versioning EIPs are in some state of stuckness. But yes, it would be a breaking change to cause a program that used to stop immediately with an invalid instruction to instead do something unintended.

gcolvin · March 29, 2019, 7:02am

@holiman One way out (and we start getting into solving the whole versioning problem here) is to follow the leading new bytecode with something that is statistically highly unlikely to be there, like a hash of the rest of the contract’s code.

holiman · March 29, 2019, 8:34am

Well, that won’t stop a malicious coder from doing the same thing, right?
Unfortunately, we don’t have address namespaces, which would have been great to have from the beginning, so that certain address spaces could have different mechanics.

gcolvin · March 29, 2019, 1:45pm

Not sure what the malicious coder gets besides marking a contract as post-Istanbul. The rest of the bytecode still has to be valid, and the hash still has to be right.

The idea is just to be sure that no pre-Istanbul contracts accidentally look like post-Istanbul contracts.

Not sure what you mean by address namespaces.

gcolvin · March 30, 2019, 4:32am

@holiman @sorpaas @chfast

None of the proposals listed above take note of the problem you found. But this proposal doesn’t have that problem.
EIP-1707: Use Version Byte Prefix for Contract Account Versioning
It uses 0x00 as the leading byte, in combination with there being sufficient following bytes to hold the version identifier. But this assumes that a contract beginning with STOP cannot have any following bytes, which I’m not sure is true.

ajsutton · March 30, 2019, 6:12am

I don’t believe there’s any restrictions on what the code for a contract can be. Whatever is returned from the creation transaction is stored without question. So there is no way to embed the version in the code itself without it being possible to deliberately set that up in a contract.

I believe you’d have to add a field to the account state entry to indicate the code version being used by the contract (if the field isn’t present it’s assumed to be the version we’re running now). Transactions would either be assumed to be the latest or could potentially have a similar version field added.
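A minimal sketch of that idea, with hypothetical names throughout (`Account`, `code_version`, `LEGACY_VERSION`). The four existing fields mirror the account state in the Yellow Paper, but nothing here reflects how any actual client stores or encodes accounts.

```python
from dataclasses import dataclass
from typing import Optional

LEGACY_VERSION = 0  # hypothetical number for "the version we're running now"


@dataclass
class Account:
    nonce: int
    balance: int
    storage_root: bytes
    code_hash: bytes
    # Hypothetical extra field; absent (None) on accounts created before the fork.
    code_version: Optional[int] = None


def effective_code_version(account: Account) -> int:
    """Treat a missing version field as the legacy (current) EVM version."""
    if account.code_version is None:
        return LEGACY_VERSION
    return account.code_version
```

With this shape the version lives next to the code hash rather than inside the code, so no byte sequence a creation transaction returns can claim a version for itself.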

gcolvin · March 30, 2019, 7:30pm

ajsutton: “I don’t believe there’s any restrictions on what the code for a contract can be.”
That was my fear. I do think my idea of using a hash of the contract might work, despite @holiman’s worries.
To be concrete:

A new type of contract (EIP-615, eWasm, or whatever) has this layout.
a. the code length is at least 21 + TBD bytes
b. the first byte is null
c. the next TBD bytes are a valid version identifier
d. the following bytes up to the DATA section are valid code
e. the last 20 bytes are the correct RIPEMD-160 hash of all the preceding bytes
Anything else is an old type of contract.
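The layout above can be sketched as a classifier. This is only an illustration under stated assumptions: `VERSION_LEN`, `KNOWN_VERSIONS`, and `is_valid_code` are hypothetical placeholders for the parts still marked TBD, and the DATA section boundary is ignored. The digest function is passed in because RIPEMD-160 is not available in every Python/OpenSSL build; `stand_in_digest` is a truncated SHA-256 used purely so the sketch runs anywhere.

```python
import hashlib
from typing import Callable

VERSION_LEN = 1             # hypothetical: the proposal leaves this as TBD
KNOWN_VERSIONS = {b"\x01"}  # hypothetical set of valid version identifiers
HASH_LEN = 20               # RIPEMD-160 digests are 20 bytes


def is_valid_code(body: bytes) -> bool:
    """Placeholder for rule (d); real code validation is out of scope here."""
    return True


def looks_like_new_contract(code: bytes, digest20: Callable[[bytes], bytes]) -> bool:
    # (a) long enough to hold the null byte, the version, and the hash
    if len(code) < 1 + VERSION_LEN + HASH_LEN:
        return False
    # (b) the first byte is null
    if code[0] != 0:
        return False
    # (c) the next bytes are a valid version identifier
    if code[1:1 + VERSION_LEN] not in KNOWN_VERSIONS:
        return False
    # (d) the bytes between the version and the hash are valid code
    if not is_valid_code(code[1 + VERSION_LEN:-HASH_LEN]):
        return False
    # (e) the last 20 bytes are the hash of all the preceding bytes
    return digest20(code[:-HASH_LEN]) == code[-HASH_LEN:]


def stand_in_digest(data: bytes) -> bytes:
    """Stand-in only: truncated SHA-256, NOT RIPEMD-160."""
    return hashlib.sha256(data).digest()[:HASH_LEN]
```

Anything for which `looks_like_new_contract` returns `False` would be run as an old contract.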

The problem we want to prevent is an old contract accidentally looking like a new contract. I’m not up for the calculation right now, but the odds of the last 20 bytes just happening to be the right hash of the preceding bytes look to be vanishingly small. (Please correct me if I’m being stupid.) And in that case the version identifier and the code would also have to be valid.
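A rough version of that calculation, assuming the hash behaves like a uniform random function (an idealization), so that any fixed 20-byte suffix matches with probability 2^-160. The contract count is an arbitrary, made-up figure used only to show the union bound.

```python
from fractions import Fraction

HASH_BITS = 160  # RIPEMD-160 output size in bits

# Chance that one arbitrary old contract happens to end in the right hash,
# under the uniform-random-hash assumption.
p_single = Fraction(1, 2 ** HASH_BITS)


def accidental_match_bound(num_contracts: int) -> Fraction:
    """Union bound: P(at least one accidental match) <= n * 2^-160."""
    return num_contracts * p_single


assumed_contracts = 10 ** 9  # hypothetical count, not a measured figure
print(float(p_single))
print(float(accidental_match_bound(assumed_contracts)))
```

That is on the order of 7e-49 per contract, before even requiring the null first byte, a valid version identifier, and valid code.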

I’m not sure what an exploit would look like. An attacker can only deploy code that either looks like old code and is run as such, or looks like new code and is validated and run as such.

Adding a field to the account state, as @ajsutton suggests, would work fine, of course. The disadvantage is that you couldn’t tell from the bytecode itself what kind of bytecode it is. This makes it harder on tools that only have the bytecode to work with.

sorpaas · March 30, 2019, 11:03pm

I want to repeat my argument that treating contract code as data is problematic (both because it may break things we currently have and because it will be a roadblock for future hard forks). So far I don’t see any uses of it other than Solidity’s metadata postfix. I suggest we put a stop to that behavior.

I don’t think you even need the 20-byte RIPEMD postfix. The chance of an old contract accidentally looking like a new contract wouldn’t be higher than in the scenario where we add new opcodes to the EVM and some old contract’s data accidentally contains a new opcode, drastically changing the old contract’s behavior. We have deployed many new opcodes and no one seems to be complaining. So I argue that, from a risk-assessment perspective, adding a new type of contract wouldn’t be an issue.

And to be honest, all of this is caused by the fact that we allow code as data, even though we did not actually intend to (the EVM’s JUMPDEST analysis treats all code as code), and that we provide no basic guarantee (as the ELF format does) that a data section won’t be accidentally executed. I suggest we just disallow this behavior (using EIP-1712 or something of the sort) to clear the path for future hard forks.
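To illustrate the JUMPDEST-analysis point, here is a minimal sketch of the usual linear scan, which skips only PUSH immediates and otherwise treats every byte as an instruction. The opcode values (0x5b for JUMPDEST, 0x60 through 0x7f for PUSH1 through PUSH32) are the standard EVM ones; the function name and the sample bytes are illustrative only.

```python
from typing import Set

JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def valid_jumpdests(code: bytes) -> Set[int]:
    """Return the offsets a linear scan would accept as jump destinations."""
    dests: Set[int] = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            dests.add(pc)
        if PUSH1 <= op <= PUSH32:
            # Skip the 1..32 immediate bytes that follow a PUSHn.
            pc += op - PUSH1 + 1
        pc += 1
    return dests


# PUSH1 0x5b; STOP; then a trailing 0x5b byte meant as data.
sample = bytes([0x60, 0x5B, 0x00, 0x5B])
print(valid_jumpdests(sample))
```

The 0x5b inside the PUSH1 immediate is skipped, but the trailing 0x5b after STOP is accepted as a destination at offset 3, even though it was meant as data: nothing in the bytecode marks where code ends and data begins.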