EIP-5656: MCOPY instruction

axic · September 15, 2022, 2:52pm

This is the discussion topic for

Since external links are not allowed in EIPs:

here is the referenced analysis
and the EVM384 memory overhead discussion

matt · April 27, 2023, 2:56pm

Could you please explicitly state the gas costs in the specification section?

charles-cooper · April 27, 2023, 3:19pm

Thanks for catching that; updating here: ethereum/EIPs Pull Request #6942, “eip 5656: specify gas costs” (charles-cooper).

RenanSouza2 · April 27, 2023, 6:59pm

Where are the tests being listed? Is there a case where src and dst overlap?

Recmo · May 2, 2023, 8:43am

Copying non-exact words is trickier, as for the last partial word both the source and destination need to be loaded, masked, OR’d, and stored again. This overhead is significant. One edge case is when the last “partial word” is a single byte: it can be stored efficiently using MSTORE8.

Alternatively, if there is more than one word, you can make the last partial word overlap the previous one so that the end aligns correctly. You would be copying some bytes twice, but this is idempotent. I found this method to be more efficient.
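
The end-aligned final word can be sketched in Python, modelling EVM memory as a bytearray. The helper name copy_words_end_aligned is hypothetical; the sketch assumes length >= 32 and non-overlapping ranges, since shorter copies need the load/mask/or/store path and overlapping ranges need a direction choice.

```python
WORD = 32


def copy_words_end_aligned(mem: bytearray, dst: int, src: int, length: int) -> None:
    """Copy `length` bytes from mem[src:] to mem[dst:] one 32-byte word at a time.

    Hypothetical helper: assumes length >= 32 and that the two ranges do not
    overlap. The final word is aligned to the end of the range, so it may
    rewrite bytes already written by the previous word (harmless, since the
    same bytes are written again).
    """
    if length < WORD:
        raise ValueError("partial words need the load/mask/or/store path")
    last = length - WORD  # offset of the end-aligned final word
    off = 0
    while off < last:
        mem[dst + off:dst + off + WORD] = mem[src + off:src + off + WORD]
        off += WORD
    mem[dst + last:dst + last + WORD] = mem[src + last:src + last + WORD]


# Copy 70 bytes: full words at offsets 0 and 32, then the final word at 38,
# which rewrites bytes 38..63 a second time.
mem = bytearray(range(256))
copy_words_end_aligned(mem, dst=128, src=0, length=70)
assert mem[128:198] == bytes(range(70))
```

Three word writes cover 70 bytes, instead of two word writes plus a masked read-modify-write for the trailing 6 bytes.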

An important edge case not discussed in the EIP is overlapping source and destination. This may force you to do the copy backwards to avoid clobbering the input.
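
To illustrate the clobbering problem, here is a Python sketch (hypothetical helper names, memory modelled as a bytearray) comparing a naive front-to-back byte copy with one that switches to back-to-front when the destination starts after the source.

```python
def naive_forward_copy(mem: bytearray, dst: int, src: int, length: int) -> None:
    # Always front to back: breaks when dst lands inside the source range.
    for i in range(length):
        mem[dst + i] = mem[src + i]


def direction_aware_copy(mem: bytearray, dst: int, src: int, length: int) -> None:
    # Copy back to front when dst > src, so each source byte is read
    # before the copy can overwrite it.
    indices = range(length - 1, -1, -1) if dst > src else range(length)
    for i in indices:
        mem[dst + i] = mem[src + i]


a = bytearray(b"abcdef")
naive_forward_copy(a, dst=2, src=0, length=4)
b = bytearray(b"abcdef")
direction_aware_copy(b, dst=2, src=0, length=4)
print(a, b)  # bytearray(b'ababab') bytearray(b'ababcd')
```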

All combined, I ended up with the following (source):

    /// @dev Copies `length` bytes from memory location `source` to `dest`.
    /// @param dest memory address to copy bytes to.
    /// @param source memory address to copy bytes from.
    /// @param length number of bytes to copy.
    function memCopy(uint256 dest, uint256 source, uint256 length) internal pure {
        if (length < 32) {
            // Handle a partial word by reading destination and masking
            // off the bits we are interested in.
            // This correctly handles overlap, zero lengths and source == dest
            assembly {
                let mask := sub(exp(256, sub(32, length)), 1)
                let s := and(mload(source), not(mask))
                let d := and(mload(dest), mask)
                mstore(dest, or(s, d))
            }
        } else {
            // Skip the O(length) loop when source == dest.
            if (source == dest) {
                return;
            }

            // For large copies we copy whole words at a time. The final
            // word is aligned to the end of the range (instead of after the
            // previous) to handle partial words. So a copy will look like this:
            //
            //  ####
            //      ####
            //          ####
            //            ####
            //
            // We handle overlap in the source and destination range by
            // changing the copying direction. This prevents us from
            // overwriting parts of source that we still need to copy.
            //
            // This correctly handles source == dest
            //
            if (source > dest) {
                assembly {
                    // We subtract 32 from `sEnd` and `dEnd` because it
                    // is easier to compare with in the loop, and these
                    // are also the addresses we need for copying the
                    // last bytes.
                    length := sub(length, 32)
                    let sEnd := add(source, length)
                    let dEnd := add(dest, length)

                    // Remember the last 32 bytes of source
                    // This needs to be done here and not after the loop
                    // because we may have overwritten the last bytes in
                    // source already due to overlap.
                    let last := mload(sEnd)

                    // Copy whole words front to back
                    // Note: for length > 32 the first check is always true,
                    // so this could have been a do-while loop.
                    for {} lt(source, sEnd) {} {
                        mstore(dest, mload(source))
                        source := add(source, 32)
                        dest := add(dest, 32)
                    }

                    // Write the last 32 bytes
                    mstore(dEnd, last)
                }
            } else {
                assembly {
                    // We subtract 32 from `sEnd` and `dEnd` because those
                    // are the starting points when copying a word at the end.
                    length := sub(length, 32)
                    let sEnd := add(source, length)
                    let dEnd := add(dest, length)

                    // Remember the first 32 bytes of source
                    // This needs to be done here and not after the loop
                    // because we may have overwritten the first bytes in
                    // source already due to overlap.
                    let first := mload(source)

                    // Copy whole words back to front
                    // We use a signed comparison here to allow dEnd to become
                    // negative (happens when source and dest < 32). Valid
                    // addresses in local memory will never be larger than
                    // 2**255, so they can be safely re-interpreted as signed.
                    // Note: for length > 32 the first check is always true,
                    // so this could have been a do-while loop.
                    for {} slt(dest, dEnd) {} {
                        mstore(dEnd, mload(sEnd))
                        sEnd := sub(sEnd, 32)
                        dEnd := sub(dEnd, 32)
                    }

                    // Write the first 32 bytes
                    mstore(dest, first)
                }
            }
        }
    }
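
The mask arithmetic in the `length < 32` branch can be checked with a small Python model of 256-bit words. The mload/mstore functions here are hypothetical stand-ins for the EVM opcodes, and the model assumes memory already covers 32 bytes at both offsets (the EVM would have expanded it).

```python
MOD = 2**256
ALL_ONES = MOD - 1


def mload(mem: bytearray, off: int) -> int:
    # Read a big-endian 32-byte word (assumes mem covers off..off+31).
    return int.from_bytes(mem[off:off + 32], "big")


def mstore(mem: bytearray, off: int, value: int) -> None:
    # Write a big-endian 32-byte word.
    mem[off:off + 32] = value.to_bytes(32, "big")


def partial_word_copy(mem: bytearray, dest: int, source: int, length: int) -> None:
    """Model of the `length < 32` branch of memCopy using 256-bit arithmetic."""
    assert 0 <= length < 32
    # exp(256, 32 - length) wraps to 0 mod 2**256 when length == 0, so
    # sub(..., 1) gives an all-ones mask and dest keeps all of its bytes.
    mask = (pow(256, 32 - length, MOD) - 1) % MOD
    s = mload(mem, source) & (ALL_ONES ^ mask)  # top `length` bytes of source
    d = mload(mem, dest) & mask                  # low bytes of dest, kept as-is
    mstore(mem, dest, s | d)


mem = bytearray(range(64))
partial_word_copy(mem, dest=2, source=0, length=10)  # overlapping ranges
assert mem[2:12] == bytes(range(10))
assert mem[12:] == bytes(range(12, 64))
```

Because both words are loaded before the single store, overlap between source and dest is handled without any direction logic.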

charles-cooper · May 3, 2023, 10:15pm

The overlapping case is specified in the EIP:

It copies length bytes from the offset pointed at src to the offset pointed at dst in memory. Copying takes place as if an intermediate buffer was used, allowing the destination and source to overlap.

This is typically handled by the runtime or whatever standard memory copying routine is used by the client. For instance, the Go specification states:

The built-in functions append and copy assist in common slice operations. For both functions, the result is independent of whether the memory referenced by the arguments overlaps.

The same is true of C stdlib’s memmove (which is probably used under the hood by most language runtimes for copy operations):

Copying takes place as if an intermediate buffer were used, allowing the destination and source to overlap.
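
A minimal Python model of the “as if an intermediate buffer was used” rule: snapshot the source, then write it out. The name mcopy_in_place is hypothetical, and memory expansion is deliberately left out.

```python
def mcopy_in_place(mem: bytearray, dst: int, src: int, length: int) -> None:
    # Snapshot the source first (the "intermediate buffer"), then write it.
    # Assumes mem already covers both ranges (no memory expansion modelled).
    buf = bytes(mem[src:src + length])
    mem[dst:dst + length] = buf


fwd = bytearray(b"abcdef")
mcopy_in_place(fwd, dst=2, src=0, length=4)   # destination after source
back = bytearray(b"abcdef")
mcopy_in_place(back, dst=0, src=2, length=4)  # destination before source
print(fwd, back)  # bytearray(b'ababcd') bytearray(b'cdefef')
```

Both overlap directions give the same result as copying from an untouched snapshot, which is the property memmove and Go’s copy guarantee.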

charles-cooper · May 6, 2023, 7:02pm

Please see ethereum/EIPs commit 94d9af0, “Update EIP-5656: add test cases including overlapping memory regions”.

wjmelements · May 8, 2023, 11:33pm

I’ve written dozens of smart contracts in solidity and assembly. I’ve never wished I had memcpy but I see how for sufficiently complex situations it might be helpful. So, I’m interested in the Motivation section of the EIP expanding on a scenario that would benefit from memcpy.

charles-cooper · May 18, 2023, 1:55pm

I mean, the motivation section is already pretty detailed. Maybe it matters more for compilers than for user code, but, for instance, every single assignment of the form x = y where x is larger than a single word can be optimized using MCOPY. From the EIP:

Memory copying is used by languages like Solidity and Vyper, where we expect this improvement to provide efficient means of building data structures, including efficient sliced access and copies of memory objects. Having a dedicated MCOPY instruction would also add forward protection against future gas cost changes to CALL instructions in general.

jochem-brouwer · May 27, 2023, 2:11pm

Hi all, I am implementing the EIP. It is nice that there are test cases, but:

There are no test cases where it copies memory from outside the current memory range, or where it increases the memory size.
Gas costs are not listed.

charles-cooper · May 28, 2023, 1:49pm

Thanks, I can add those. The semantics should do the “expected” thing, though: copying from outside the current memory range should copy zeroes and expand memory, and increasing the memory size should also expand memory. For reference, see the evmone implementation: https://github.com/ethereum/evmone/pull/629/files#diff-0bab705191941f15a86a89eda1bea9c06947e63f4baf4ccb4909e7bfd50185a3R909-R922 (note that check_memory() expands memory).
The gas costs are listed twice in the latest version of the EIP, once in EIP-5656: MCOPY - Memory copying instruction and once in EIP-5656: MCOPY - Memory copying instruction.
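
A sketch of those semantics in Python, assuming memory is a bytearray that only grows in whole 32-byte words and is zero-filled when it grows. Gas charging is not modelled, the helper names are hypothetical, and this is not the evmone code.

```python
WORD = 32


def expand(mem: bytearray, offset: int, length: int) -> None:
    """Zero-extend mem to cover [offset, offset + length), in whole 32-byte words."""
    if length == 0:
        return  # zero-length accesses do not expand memory
    needed = (offset + length + WORD - 1) // WORD * WORD
    if needed > len(mem):
        mem.extend(bytes(needed - len(mem)))


def mcopy(mem: bytearray, dst: int, src: int, length: int) -> None:
    # Both the source and the destination range can expand memory.
    expand(mem, src, length)
    expand(mem, dst, length)
    mem[dst:dst + length] = bytes(mem[src:src + length])


mem = bytearray(b"\x11" * 32)
mcopy(mem, dst=0, src=64, length=8)  # source is past the end of memory
print(len(mem), mem[:9].hex())       # 96 000000000000000011
```

Reading past the end therefore copies zeroes, and memory grows to 96 bytes because the source range [64, 72) must be covered.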

jochem-brouwer · May 31, 2023, 3:09pm

Sorry, I meant that the gas costs are not listed in the test cases

Magicking · June 9, 2023, 9:59am

I’m currently working on a bitmap rendering library that runs on the EVM for the art scene. This opcode will directly translate into a larger surface of pixels available to render, thanks to cheaper computation when large texture surfaces are compiled together within a gas limit.

Can’t wait for this EIP to be on the canonical chain!

radek · June 12, 2023, 8:45pm

Edge cases not clear from the EIP:

dst = 0, src = type(uint256).max, len = 2+
dst = type(uint256).max, src = 0, len = 2+

…

charles-cooper · June 14, 2023, 5:33pm

These will fail at gas-checking time due to memory expansion costs.
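
The expansion cost makes this concrete. A Python sketch, assuming the Yellow Paper memory cost C(a) = 3a + floor(a**2 / 512) for a 32-byte words, and that offset + length is not wrapped modulo 2**256 when sizing memory. The 30M gas figure is an illustrative assumption, not something stated in this thread.

```python
def memory_cost(words: int) -> int:
    # Yellow Paper memory cost: 3 gas per word plus a quadratic term.
    return 3 * words + words * words // 512


def expansion_cost(current_words: int, offset: int, length: int) -> int:
    # Extra gas to grow memory so that [offset, offset + length) is covered.
    if length == 0:
        return 0
    new_words = max(current_words, (offset + length + 31) // 32)
    return memory_cost(new_words) - memory_cost(current_words)


UINT256_MAX = 2**256 - 1
ASSUMED_GAS_LIMIT = 30_000_000  # illustrative assumption

# dst = 0, src = type(uint256).max, len = 2: the source range ends past 2**256.
cost = expansion_cost(0, UINT256_MAX, 2)
print(cost > ASSUMED_GAS_LIMIT)  # True
```

The same holds with the roles swapped (dst = type(uint256).max), since the destination range is sized the same way.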

jochem-brouwer · June 22, 2023, 12:14pm

I have a problem with the test cases in the EIP. The last and the second-to-last test cases have a pre-state of 33 bytes of the memory. This is not possible in EVM since memory length is always a multiple of 32 bytes (and is filled with zeros if some region of this memory is not written to). I am assuming one zero-byte has been added accidentally to these tests.

Also, could these test cases report how much gas should be used when using MCOPY?

jochem-brouwer · June 22, 2023, 12:18pm

Also, the last test case output does not seem correct to me (will test on EthJS and will then report back)

EthereumJS reports:
000001020304050607 080000000000000000000000000000000000000000000000

It passes the other tests.

charles-cooper · June 30, 2023, 7:02am

Nice catch, thank you! Fixed here: ethereum/EIPs Pull Request #7257, “fix eip-5656 test cases and add gas costs” (charles-cooper).

That PR also adds gas costs, which I believe are 6 for all of the test cases in the EIP, but let me know if I made a mistake or if you find any other issues.

jochem-brouwer · July 5, 2023, 6:19pm

I just checked and can confirm that for all MCOPY tests we indeed charge 6 gas.
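
As I read the EIP’s gas rule (Gverylow = 3, plus Gcopy = 3 per 32-byte word copied, plus any memory expansion), a small Python sketch shows why a copy of 1 to 32 bytes that needs no expansion comes to 6 gas. The function name mcopy_gas is hypothetical.

```python
G_VERYLOW = 3  # static cost
G_COPY = 3     # per 32-byte word copied


def mcopy_gas(length: int, expansion_cost: int = 0) -> int:
    # Round the copy length up to whole words, then add any expansion cost.
    words = (length + 31) // 32
    return G_VERYLOW + G_COPY * words + expansion_cost


# Any copy of 1 to 32 bytes with no memory expansion costs 3 + 3 * 1 = 6.
print([mcopy_gas(n) for n in (0, 1, 32, 33, 64)])  # [3, 6, 6, 9, 9]
```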

dror · January 8, 2024, 9:16pm

One use case for MCOPY is fill-with-zero: this is currently possible by setting the src offset to a high value (e.g. 0xffffffff), but that triggers a “memory expansion” and is thus very expensive.
Instead, I suggest defining this offset as “always zero”, so that it works without triggering such memory expansion, thus making MCOPY also act as ZCOPY…