The future of EVM
EVM Object Format
Andrei Maiboroda
Whoami
- Ipsilon team @ EF (previously called Ewasm)
- Team focus: improvements and analysis for execution (EVM)
Publishing on
EVM is awesome
- Small number of simple rules / instructions.
- Very simple interpreter.
- Every byte is an instruction.
- Plenty of optimization techniques.
- Well-developed tooling ecosystem.
EVM is awesome... but not quite
- Every byte is an instruction... but we also have (push)data bytes.
- Rules are simple... but there are some exceptional cases and quirks.
- Too simplistic, some more features would be nice to have, e.g. subroutines.
- Hard to extend / upgrade.
Issue 1: New instructions
- Any byte can exist in deployed contracts.
- Using an undefined instruction results in a failure of execution.
- Introducing a new instruction means change of behavior for such contracts.
Currently addressed by a recommendation: never rely on undefined instructions staying undefined.
Issue 2: Data bytes and JUMPs
- The PUSH instructions have immediate data
- These data bytes cannot be executed
- This means JUMPing to such an offset is invalid
Checking jump validity at runtime is challenging.
It is mostly done upfront in a process called jumpdest analysis.
Jumpdest analysis explained
- The JUMPDEST (0x5b) instruction marks a valid jump destination
- But not every 0x5b byte is a JUMPDEST (it is not if it is in PUSH immediate bytes)
Goal: collect valid JUMPDEST offsets before execution
JUMPDESTs also help with JIT/AOT compilation
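The analysis described above can be sketched as a single linear scan. This is a minimal illustration, not any client's implementation; the function name `analyze_jumpdests` is hypothetical.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def analyze_jumpdests(code: bytes) -> set:
    """Return the offsets of all valid JUMPDEST instructions in legacy code."""
    valid = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            valid.add(pc)
        elif PUSH1 <= op <= PUSH32:
            # Skip the immediate data bytes: PUSH1 carries 1, PUSH32 carries 32.
            pc += op - PUSH1 + 1
        pc += 1
    return valid


# PUSH1 0x5b, JUMPDEST, STOP: only the 0x5b at offset 2 is an instruction.
code = bytes.fromhex("605b5b00")
print(analyze_jumpdests(code))  # {2}
```

The 0x5b at offset 1 is skipped because it is the immediate of the PUSH1 at offset 0.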
Problems with jumpdest analysis
- Can be costly in terms of CPU time and subject to DOS attacks.
- Is repeated every time before code is executed.
- Need to be aware of new instructions with immediates.
- This also makes it harder to introduce new instructions with immediates, see the DUPn/SWAPn proposal.
Issue 3: "Data sections"
Contracts frequently contain trailing data:
- Solidity places "metadata" and other large constants at the end (useful for contract verification, acquiring ABI, etc.)
- Contract creation code has the returned runtime code there.
This data can interfere with jumpdest analysis:
- It can be a lot of data to analyse
- It is still possible to jump there (if it contains JUMPDEST), breaking the boundary of code and data
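A small sketch of the interference: a legacy-style scan cannot tell where code ends, so it walks the trailing data too and accepts any 0x5b it finds there. The byte values and the helper name `analyze_jumpdests` are made up for illustration.

```python
def analyze_jumpdests(code: bytes) -> set:
    """Linear legacy-style scan: collect 0x5b offsets outside PUSH immediates."""
    valid, pc = set(), 0
    while pc < len(code):
        op = code[pc]
        if op == 0x5B:
            valid.add(pc)
        elif 0x60 <= op <= 0x7F:
            pc += op - 0x5F  # skip PUSHn immediate bytes
        pc += 1
    return valid


code_part = bytes.fromhex("00")        # STOP: the only real instruction
data_part = bytes.fromhex("c0ffee5b")  # arbitrary trailing data that contains 0x5b
legacy = code_part + data_part

# All 5 bytes are scanned, and offset 4 (inside the data) counts as a jump target.
print(analyze_jumpdests(legacy))  # {4}
```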
A long history of trying to address it
- EIP-2327: BEGINDATA opcode
- EIP-615: Subroutines and Static Jumps for the EVM
The Solution
EVM Object Format (EOF)
Goals:
- Separate code and data
- Code validation (at deploy time)
- Extensibility (via versioning)
EOF example
Legacy code:
600480600B6000396000F3600080FD
EOF code:
EF000101000B0200040060048060156000396000F3600080FD
The same EOF code, with spaces separating the header, the code section, and the data section:
EF000101000B02000400 60048060156000396000F3 600080FD
Bytes EF00: the magic.
Byte 01: the version number, currently set to 1.
Bytes 01 000B: the first section, with kind 01 meaning code, and length 0xB (11).
Bytes 02 0004: the (optional) second section, with kind 02 meaning data, and length 0x4.
Byte 00: the terminator for the header (i.e. section kind 0).
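The header layout walked through above (magic, version, then kind/length entries until a zero terminator) can be read with a few lines of code. This is a sketch based only on the example container shown here; the function name `parse_eof_header` and its return shape are hypothetical, and real validation has more rules.

```python
MAGIC = bytes.fromhex("EF00")
KIND_TERMINATOR = 0x00


def parse_eof_header(container: bytes):
    """Return (version, [(kind, size), ...], header_length_in_bytes)."""
    if len(container) < 3 or container[:2] != MAGIC:
        raise ValueError("missing EOF magic or version")
    version = container[2]
    sections = []
    pos = 3
    while True:
        if pos >= len(container):
            raise ValueError("header is not terminated")
        kind = container[pos]
        pos += 1
        if kind == KIND_TERMINATOR:
            break
        if pos + 2 > len(container):
            raise ValueError("truncated section size")
        # Section sizes are two bytes, big-endian (0x000B == 11 in the example).
        size = int.from_bytes(container[pos:pos + 2], "big")
        sections.append((kind, size))
        pos += 2
    return version, sections, pos


container = bytes.fromhex("EF000101000B0200040060048060156000396000F3600080FD")
print(parse_eof_header(container))  # (1, [(1, 11), (2, 4)], 10)
```

The header is 10 bytes long, so the code section starts at offset 10 of the container.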
Bytes 60048060156000396000F3: the content of the first section (the code):
PUSH 4
DUP1
PUSH 21
PUSH 0
CODECOPY
PUSH 0
RETURN
Bytes 600080FD: the content of the second section (the data):
PUSH 0
DUP1
REVERT
In our case we actually encoded the runtime code here.
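To see why the code pushes 21 where the legacy version pushed 11 (0x0B), the container can be sliced by hand: 10 header bytes plus 11 code bytes put the data at offset 21 (0x15). The sketch below hard-codes the lengths read off the header and uses a tiny mnemonic table covering only the opcodes in this example; `disassemble` is a hypothetical helper.

```python
container = bytes.fromhex(
    "EF000101000B02000400" "60048060156000396000F3" "600080FD"
)
HEADER_LEN, CODE_LEN, DATA_LEN = 10, 0x0B, 0x04  # read off the header above

code = container[HEADER_LEN:HEADER_LEN + CODE_LEN]
data_offset = HEADER_LEN + CODE_LEN
data = container[data_offset:data_offset + DATA_LEN]

# Only the opcodes that occur in this example.
NAMES = {0x39: "CODECOPY", 0x80: "DUP1", 0xF3: "RETURN", 0xFD: "REVERT"}


def disassemble(code: bytes) -> list:
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32
            n = op - 0x5F       # number of immediate bytes
            out.append(f"PUSH {int.from_bytes(code[pc + 1:pc + 1 + n], 'big')}")
            pc += n
        else:
            out.append(NAMES.get(op, f"0x{op:02x}"))
        pc += 1
    return out


print(data_offset)        # 21
print(disassemble(code))  # ['PUSH 4', 'DUP1', 'PUSH 21', 'PUSH 0', 'CODECOPY', 'PUSH 0', 'RETURN']
print(disassemble(data))  # ['PUSH 0', 'DUP1', 'REVERT']
```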
Additional execution rules
- JUMPs are allowed only inside the code section
- PC is relative to the code section (starts from 0)
- CODECOPY/EXTCODECOPY can be used to copy from the data section
Contract creation has validation procedure
Applies to contract creation: create transaction, CREATE, CREATE2
- Is initcode valid?
- If not, contract creation fails
- Is code valid?
- If not, contract creation fails
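The two checks above might be wired together roughly as follows. This is a sketch under assumptions: `execute` (runs initcode and returns the code to deploy) and `validate_eof` (returns True for a valid EOF container) are hypothetical callables supplied by the caller, and the sketch assumes that code without the EOF magic is treated as legacy and skips validation.

```python
MAGIC = bytes.fromhex("EF00")


def has_eof_magic(code: bytes) -> bool:
    return code[:2] == MAGIC


def create_contract(initcode: bytes, execute, validate_eof):
    """Return the deployed code, or None if contract creation fails."""
    # Is initcode valid? Only EOF-formatted initcode is validated (assumption).
    if has_eof_magic(initcode) and not validate_eof(initcode):
        return None
    code = execute(initcode)
    # Is code valid? Deployed code with the EOF prefix must be valid.
    if has_eof_magic(code) and not validate_eof(code):
        return None
    return code


# Usage with stand-in callables: a validator that rejects everything
# still lets a purely legacy creation through.
reject_all = lambda c: False
print(create_contract(b"\x60\x00", lambda ic: b"\x00", reject_all))  # b'\x00'
```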
Contract creation is flexible
EOF initcode can create EOF code
EF000101000B02000B00 600B8060156000396000F3 EF000101000400600080FD
EOF initcode can create legacy code
EF000101000B02000400 60048060156000396000F3 600080FD
Legacy initcode can create EOF code
Role of magic
EF000101000B02000400 60048060156000396000F3 600080FD
- Requirement: Deployed code with EOF prefix must be valid.
- Magic guarantees that no currently deployed contracts are recognized as EOF.
- EIP-3541: Reject new contract code starting with the 0xEF byte (activated in London upgrade) guarantees that no contracts can be deployed now that will be interpreted later as invalid EOF.
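The EIP-3541 rule stated above amounts to a one-byte check at deploy time. A minimal sketch, with hypothetical function names:

```python
def deployable_under_eip3541(code: bytes) -> bool:
    """New contract code starting with the 0xEF byte is rejected."""
    return not code.startswith(b"\xEF")


def has_eof_magic(code: bytes) -> bool:
    """Code is recognized as EOF only if it starts with the magic EF00."""
    return code[:2] == bytes.fromhex("EF00")


legacy = bytes.fromhex("600480600B6000396000F3600080FD")
print(deployable_under_eip3541(legacy), has_eof_magic(legacy))  # True False
print(deployable_under_eip3541(bytes.fromhex("EF0001")))        # False
```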
EIP-3670: Code Validation
Additional code validation rules:
- Deploying undefined opcodes is forbidden.
- PUSH at the end of code without enough data is forbidden (no implicit zero bytes)
- Bonus: code must end with a terminator instruction (STOP, RETURN etc.)
(More potential rules to come)
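The three rules above fit in one pass over the code section. The sketch below is illustrative only: the `DEFINED` table is an assumption reflecting the London-era opcode set, and the terminator set (STOP, RETURN, REVERT, INVALID, SELFDESTRUCT) is an assumed expansion of "STOP, RETURN etc."; `validate_code` is a hypothetical name.

```python
def _ranges(*pairs):
    ops = set()
    for lo, hi in pairs:
        ops.update(range(lo, hi + 1))
    return ops


# Assumed London-era opcode table (illustrative, not normative).
DEFINED = _ranges(
    (0x00, 0x0B), (0x10, 0x1D), (0x20, 0x20), (0x30, 0x48), (0x50, 0x5B),
    (0x60, 0xA4),  # PUSH1..PUSH32, DUP1..DUP16, SWAP1..SWAP16, LOG0..LOG4
    (0xF0, 0xF5), (0xFA, 0xFA), (0xFD, 0xFF),
)
# STOP, RETURN, REVERT, INVALID, SELFDESTRUCT (assumed terminator set).
TERMINATORS = {0x00, 0xF3, 0xFD, 0xFE, 0xFF}


def validate_code(code: bytes) -> bool:
    pc, last = 0, None
    while pc < len(code):
        op = code[pc]
        if op not in DEFINED:
            return False          # undefined opcode
        last = op
        if 0x60 <= op <= 0x7F:
            pc += op - 0x5F       # skip PUSHn immediate bytes
        pc += 1
    if pc != len(code):
        return False              # a PUSH ran past the end of the code
    return last in TERMINATORS    # also rejects empty code (last is None)


print(validate_code(bytes.fromhex("60048060156000396000F3")))  # True
print(validate_code(bytes.fromhex("6000")))                    # False: no terminator
print(validate_code(bytes.fromhex("60")))                      # False: truncated PUSH
```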
Why?
- Do not deploy bad code
- More efficient execution (fewer runtime checks)
- Easier rollout of new EVM features
EIP-4200: Static relative jumps
- New instructions RJUMP/RJUMPI take the jump offset as an immediate argument instead of reading it from the stack
- Helps to move/replace parts of bytecode, may be useful to L2
- Reduces "dynamic" JUMPs usage
- Reduces JUMPDESTs usage
- May be combined with functions/subroutines to eliminate "dynamic" JUMPs and JUMPDESTs
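Because the offset lives in the bytecode, a relative jump target can be computed without looking at the stack. The sketch below assumes a 16-bit signed big-endian immediate taken relative to the instruction that follows the jump; the slides do not state the width, signedness, or base, so these are assumptions, and the opcode byte 0x5C used in the example is only a placeholder.

```python
def rjump_target(code: bytes, pc: int) -> int:
    """Target of the relative jump whose opcode byte sits at code[pc].

    Assumes a 2-byte signed big-endian offset, relative to the next instruction.
    """
    offset = int.from_bytes(code[pc + 1:pc + 3], "big", signed=True)
    target = pc + 3 + offset
    if not 0 <= target < len(code):
        raise ValueError("jump target outside code")
    return target


# Forward: placeholder opcode 0x5C at pc 0 with offset +1 skips one byte.
print(rjump_target(bytes.fromhex("5c00010000"), 0))  # 4
# Backward: offset 0xFFFC is -4, jumping from pc 1 back to offset 0.
print(rjump_target(bytes.fromhex("005cfffc"), 1))    # 0
```

Since the target depends only on the bytecode, it can be checked once at validation time rather than on every execution.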
More future ideas
- Deprecation of CALLCODE, SELFDESTRUCT etc.
- Subroutines (can be in separate code sections)
- Helping with EIP-3074 (AUTH/AUTHCALL), validate section for Account Abstraction
- Helping with Address Space Extension, encoding in EOF special kind of contracts bridging between old and new addresses.
- Rich code validation, EIP-3779: Safer Control Flow for the EVM (draft)
When EOF?
Hopefully the first feature upgrade after The Merge.
The future of EVM
EVM Object Format
Andrei Maiboroda
twitter.com/gumb00
github.com/gumb0