EIP-7517: Content Consent for AI/ML Data Mining

A proposal to add a “dataMiningPreference” property to content metadata, preserving the digital content’s original intent and respecting creators’ rights.


This EIP proposes a standardized approach to declaring data mining preferences for digital media content on EVM-compatible blockchains. It extends digital media metadata standards such as ERC-7053 and NFT metadata standards such as ERC-721 and ERC-1155, allowing asset creators to specify how their assets may be used in data mining, AI training, and machine learning workflows.

Motivation

As digital assets are increasingly used in AI and machine learning workflows, it is critical that the rights and preferences of asset creators and license owners are respected, and that AI/ML creators can check and collect data easily and safely. Much as robots.txt does for websites, content owners and creators are looking for more direct control over how their creative works are used.

This proposal introduces a standardized method of declaring these preferences. Adding dataMiningPreference to the content metadata lets creators state whether the asset may be used as part of a data mining or AI/ML training workflow. This ensures the original intent of the content is maintained.

For AI-focused applications, this information serves as a guideline, facilitating the ethical and efficient use of content while respecting the creator’s rights and building a sustainable data mining and AI/ML environment.

The introduction of the dataMiningPreference property in digital asset metadata addresses the following considerations:

  • Accessibility: A clear, easily accessible method, both human-readable and machine-readable, for digital asset creators and license owners to express their preferences for how their assets are used in data mining and AI/ML training workflows. AI/ML creators can check and collect data systematically.

  • Adoption: As the Coalition for Content Provenance and Authenticity (C2PA) already outlines guidelines for indicating whether an asset may be used in data mining or AI/ML training, it’s crucial that onchain metadata aligns with these standards. This ensures compatibility between in-media metadata and onchain records.
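
As a rough illustration of the machine-readable side, here is a minimal Python sketch of how an AI/ML pipeline might check the property before collecting an asset. Only the dataMiningPreference property name comes from the text above. The nested keys ("dataMining", "aiTraining", "aiInference"), the values ("allowed", "notAllowed", "constrained"), and the helper names are assumptions loosely modelled on C2PA's training and data mining guidance, so check them against the actual specification. Treating a missing or unrecognized entry as "notAllowed" is a conservative choice made for this sketch, not something the proposal states.

```python
import json

# Assumed value vocabulary (hypothetical, modelled on C2PA guidance).
ALLOWED = "allowed"
NOT_ALLOWED = "notAllowed"
CONSTRAINED = "constrained"


def mining_preference(metadata: dict, use: str) -> str:
    """Return the declared preference for `use`.

    Falls back to NOT_ALLOWED when the property is missing, malformed,
    or carries a value outside the assumed vocabulary.
    """
    prefs = metadata.get("dataMiningPreference")
    if not isinstance(prefs, dict):
        return NOT_ALLOWED
    value = prefs.get(use)
    if value in (ALLOWED, NOT_ALLOWED, CONSTRAINED):
        return value
    return NOT_ALLOWED


def may_use(metadata: dict, use: str) -> bool:
    """True only when the creator explicitly allowed this use."""
    return mining_preference(metadata, use) == ALLOWED


# Hypothetical token metadata, e.g. the JSON behind an ERC-721 tokenURI.
token_metadata = json.loads("""
{
  "name": "Example artwork",
  "image": "ipfs://example-cid",
  "dataMiningPreference": {
    "dataMining": "allowed",
    "aiTraining": "notAllowed"
  }
}
""")

for use in ("dataMining", "aiTraining", "aiInference"):
    print(use, may_use(token_metadata, use))
```

A "constrained" value is not treated as permission here; a real pipeline would presumably need to look up the accompanying terms before using the asset.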


Please see the latest proposal here and provide your comments below. Thanks!

Very nice work! I’d recommend also checking first what rights the corresponding owner has under the jurisdiction and legal system in force at the time the asset is created. To my knowledge, legal issues may arise if, e.g., a token is created on a blockchain that is subject to an existing body of law at the time of token creation. In that case, even if a license for the blockchain is missing (e.g., the owner overlooked it or did not create one himself), all rights belong to the blockchain owner by default, to the best of my knowledge. Mining preferences might therefore be rejected, or may create a legal grey area.


This suffers from the evil bit problem. This doesn’t necessarily need fixing, since many other standards that suffer from the same problem do exist (e.g. robots.txt), but due to the nature of how AI models are currently trained and the decentralized nature of Ethereum, it may be harder to enforce than other evil bit standards. Might be worth including in the Security Considerations?


Please remember that in almost every jurisdiction nowadays, a person has the right to their personal data. They may revoke consent to its processing, request their personal data or data porting, ask for false data to be corrected, or ask for any personal data to be erased (e.g. under GDPR/EU law or most cyber law). I myself would never allow my personal data, from blockchain data collections for example, to be processed by AI/ML data mining, so I’m not sure this is really fashionable. Just saying…


Unfortunately, this depends on users’ jurisdictions and is likely impossible to codify.