EIP-5018: Directory Standard


eip: 5018
title: Directory Standard
description: A standard interface for filesystem directories.
author: Qi Zhou (@qizhou)
discussions-to: TBD
status: Draft
type: Standards Track
category: ERC
created: 2022-04-18

Abstract

The following standard allows for the implementation of a standard API for filesystem directories within smart contracts.
This standard provides basic functionality to read and write binary objects of any size, and it allows reading and writing individual chunks of an object when the object is too large to fit in a single transaction.

I have a couple non-formatting related comments:

  • What makes this EIP filesystem-like? It seems like it’s really a key-value store?
  • What data structure do you envision backing this interface?
  • Why use chunk ids instead of byte ranges?

Thanks for the comment. I agree that the current interface is quite similar to a key-value store. The reason is that we want a filesystem-like smart contract with minimal necessary interfaces that can host a decentralized website:

  1. chunk-based functions are needed because we want to support reading a large BLOB that cannot fit in a single tx;
  2. ls (list directory contents) may not be needed in the minimal version, as most websites do not offer it to users;
  3. sub-directories can be achieved by allowing “/” characters in the filename, e.g., “/a/b/c/d”, so we may not need an explicit sub-directory interface.

Note that a key-value store may not provide 1. Moreover, if applications do need 2 or 3, we can create extension EIPs to add those features (similar to the extensions to ERC-20/ERC-721).
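To make the difference from a plain key-value store concrete, here is a minimal in-memory sketch in Python of a directory with chunk-level access and "/"-style names. The class and method names (`Directory`, `write_chunk`, `read_chunk`, `count_chunks`, `read`) are hypothetical and chosen for illustration only; they are not taken from the EIP text.

```python
class Directory:
    """Toy in-memory model of a chunked directory (all names are hypothetical)."""

    def __init__(self):
        # filename -> {chunk_id: chunk bytes}
        self._files = {}

    def write_chunk(self, name: bytes, chunk_id: int, data: bytes) -> None:
        # Each call models one transaction that stores a single chunk.
        self._files.setdefault(name, {})[chunk_id] = data

    def read_chunk(self, name: bytes, chunk_id: int) -> bytes:
        # A missing file or chunk reads as empty bytes in this sketch.
        return self._files.get(name, {}).get(chunk_id, b"")

    def count_chunks(self, name: bytes) -> int:
        return len(self._files.get(name, {}))

    def read(self, name: bytes) -> bytes:
        # Whole-object read: concatenate chunks in chunk-id order.
        chunks = self._files.get(name, {})
        return b"".join(chunks[i] for i in sorted(chunks))


d = Directory()
# Sub-directories are just "/" characters inside the filename.
d.write_chunk(b"/a/b/index.html", 0, b"<html>")
d.write_chunk(b"/a/b/index.html", 1, b"</html>")
```

A large object is uploaded one chunk per call, while a reader can fetch either a single chunk or the whole object.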

Thanks for the comment. The EVM currently supports two types of storage:

  1. local contract storage via SLOAD/SSTORE; and
  2. contract-code-based storage via CREATE/CREATE2/EXTCODECOPY.

The first type is efficient for 32-byte operations, but if the data is large and dynamically sized, contract-code-based storage can be more efficient in both gas and IO. The following table shows the gas cost for each storage type (note that the gas for put only accounts for a first-time put):

| Storage | Opcode | 1k | 4k | 8k | 12k |
|---|---|---|---|---|---|
| Local contract (get) | SLOAD | 96212 | 310514 | 596473 | 882688 |
| Local contract (put) | SSTORE | 771051 | 2949132 | 5853295 | 8757522 |
| Code-based storage (get) | EXTCODECOPY | 30502 (1/3.15x) | 38987 (1/7.96x) | 50525 (1/11.8x) | 62319 (1/14.1x) |
| Code-based storage (put) | CREATE | 387383 (1/2x) | 1128673 (1/2.61x) | 2117788 (1/2.76x) | 3104698 (1/2.82x) |
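
The ratios in parentheses can be reproduced from the raw gas figures by dividing the local-contract cost by the code-based cost at each size. A short Python check, using only the numbers from the table (the variable names are illustrative):

```python
# Gas figures copied from the table, for object sizes 1k, 4k, 8k, 12k.
sizes = ["1k", "4k", "8k", "12k"]
sload = [96212, 310514, 596473, 882688]          # local contract, get
sstore = [771051, 2949132, 5853295, 8757522]     # local contract, first-time put
extcodecopy = [30502, 38987, 50525, 62319]       # code-based, get
create = [387383, 1128673, 2117788, 3104698]     # code-based, first-time put


def savings(local, code_based):
    """How many times cheaper code-based storage is at each size."""
    return [l / c for l, c in zip(local, code_based)]


get_ratio = savings(sload, extcodecopy)
put_ratio = savings(sstore, create)

for size, g, p in zip(sizes, get_ratio, put_ratio):
    print(f"{size}: get {g:.2f}x cheaper, put {p:.2f}x cheaper")
```

The read advantage grows with object size, while the write advantage levels off below 3x.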

We have implemented both types of storage following the standard; the code can be found in the GitHub repository ethstorage/evm-large-storage-bak, where:

  • for local contract storage, we use keccak256(filename || chunkId) as the key, and the value is an optimized version of Solidity bytes storage;
  • for contract-code-based storage, we use keccak256(filename || chunkId) as the key, and the value is a contract whose code contains the corresponding chunk data.
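
The key derivation can be sketched off-chain in Python as follows. Two assumptions are made here that the post does not confirm: the chunk id is encoded as a 32-byte big-endian word before concatenation, and `hashlib.sha3_256` is used as a stand-in hash. Note that NIST SHA3-256 is not the same function as Ethereum's keccak256 (the padding differs), so the digests below will not match on-chain keys; a real client would need a keccak256 implementation.

```python
import hashlib


def chunk_key(filename: bytes, chunk_id: int) -> bytes:
    """Derive a 32-byte storage key from filename || chunkId.

    ASSUMPTION: chunk_id is encoded as a 32-byte big-endian word.
    STAND-IN: sha3_256 replaces keccak256, so digests differ from on-chain values.
    """
    return hashlib.sha3_256(filename + chunk_id.to_bytes(32, "big")).digest()


# Toy key-value store playing the role of either backing storage.
store = {}
store[chunk_key(b"/index.html", 0)] = b"first chunk"
store[chunk_key(b"/index.html", 1)] = b"second chunk"
```

Because every (filename, chunk id) pair hashes to its own key, each chunk can be read or replaced without touching the others.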

Furthermore, the code is deployed to Rinkeby to store ENS and Uniswap homepages, and works very well:

A good design question! To support large BLOBs, we could certainly use byte ranges, as UNIX read/write do. However, this becomes complicated in cases such as read-modify-write: if a write spans multiple physical storage units (e.g., storage slots or contract code), the contract needs to read the existing data, overwrite part of it, and write the result back. Using chunk ids, we can simplify the logic and let the off-chain application determine how to use them (and do read-modify-write off-chain). What do you think?
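
As a rough illustration of doing read-modify-write off-chain, the Python sketch below maps a byte-range write onto whole-chunk rewrites. It assumes fixed-size chunks, which the discussion does not require; the `write_range` helper and the tiny `CHUNK_SIZE` are hypothetical and exist only for this example.

```python
CHUNK_SIZE = 4  # ASSUMPTION: fixed chunk size, kept tiny for illustration


def write_range(chunks: dict, offset: int, data: bytes,
                chunk_size: int = CHUNK_SIZE) -> list:
    """Apply a byte-range write to {chunk_id: bytes}; return the chunk ids rewritten."""
    if not data:
        return []
    end = offset + len(data)
    touched = []
    for cid in range(offset // chunk_size, (end - 1) // chunk_size + 1):
        start = cid * chunk_size
        lo = max(offset, start) - start            # first byte to patch in this chunk
        hi = min(end, start + chunk_size) - start  # one past the last byte to patch
        # Read the existing chunk, zero-padding it if the write extends past its end.
        buf = bytearray(chunks.get(cid, b"").ljust(hi, b"\x00"))
        # Modify: copy the matching slice of the new data into place.
        buf[lo:hi] = data[start + lo - offset:start + hi - offset]
        # Write the whole chunk back (one chunk write per touched chunk).
        chunks[cid] = bytes(buf)
        touched.append(cid)
    return touched


chunks = {0: b"abcd", 1: b"efgh", 2: b"ij"}
touched = write_range(chunks, 2, b"XYZW")
```

Here the write at offset 2 straddles chunks 0 and 1, so the client rewrites exactly those two chunks and leaves chunk 2 alone; the contract only ever sees whole-chunk writes.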