Thanks everyone for spearheading making GraphQL for Ethereum a real thing!
Given there is now an expressive syntax for querying a node with the implementation of this proposal, the door is open to providing much friendlier APIs to dapp developers: ones which standardize and simplify their jobs, and can provide performance and scalability improvements to those apps.
I’d like to suggest a feature (or perhaps a class of features) that would be a natural extension to the current schema and is perhaps supportable in a GraphQL Extension proposal – if not added directly to EIP-1767.
I’m looking forward to thoughts, feedback, and ideas on the best way to continue moving this proposal forward.
Thanks in advance!
Proposal
Extend the `Query` schema to support grouping logs by topics and returning the FIRST N or LAST N logs from each group.
Example Schema
The following example GraphQL schema addition would support all necessary operations for this grouped-selection proposal. Other schema formulations may do the same, so another may work as well as or better than this example.
```graphql
type GroupedLogs {
  groupTopics: [Bytes32!]!
  count: Long!
  logs(first: Int = null, last: Int = null): [Log!]
}
```
```graphql
type Query {
  logs(filter: FilterCriteria!): [Log!]!

  # The same FilterCriteria as `logs` is used, but specifying topic indices
  # in the groupTopics parameter would allow grouping the resulting logs by
  # the topics specified.
  #
  # For the context of this proposal, logs are always sorted in natural
  # (BlockNumber, LogIndex) order.
  groupedLogs(filter: FilterCriteria!, groupTopics: [Int!]!): [GroupedLogs!]!
}
```
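To illustrate the intended shape of a request against this schema, here is a sketch of a query that groups matching logs by the topic at index 1 and returns only the most recent log per group. (The `FilterCriteria` fields follow the existing EIP-1767 log filter; the topic value is a placeholder, and the exact field names inside `Log` are whatever the base schema provides.)

```graphql
{
  groupedLogs(
    filter: {fromBlock: 0, topics: [["<event signature hash>"]]}
    groupTopics: [1]
  ) {
    groupTopics        # the topic value(s) identifying this group
    count              # total logs in the group
    logs(last: 1) {    # only the most recent log, per the proposal
      topics
      data
    }
  }
}
```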
Rationale
Logs are emitted by smart contracts for a number of reasons. For complex applications, they are often the only way of notifying a client of certain results, and being able to quickly select the last N available logs would mean clients don’t need to scan nodes for ALL logs just to retrieve the most recent event for the index they care about.
An Example
For example, say you have a game that generates NFTs (let’s call them Wizards), where the Wizard has a contract function `evolve`. Holders of a Wizard can call `evolve` at any block, and by virtue of doing that transaction, there is some probability that the Wizard evolves and has its “power” incremented by some amount.
Assuming you were building a UI that wanted to display all created Wizards, sorted by their current power, there are a few options for how to implement this.
- Keep track of this large sorted list of Wizards on chain as each Wizard is evolved.
  - This doesn’t scale for a number of reasons, most notably that the unbounded data structure and algorithmic runtime don’t work within a gas-capped system.
- Track all Wizards created with an off-chain aggregation (e.g. by specifying a log filter for a Created event), and using that list, issue a large number of queries for the current chain state of each of the Wizards, then organize and sort those items.
  - This requires issuing a large number of queries to eth nodes to fetch the current state of the chain, potentially across thousands of `eth_call`s. Even with batching this is particularly expensive, especially for light nodes, where you’re penalized heavily for asking for too much data. Also, if it’s not coupled with an event system, there is no efficient way to query power-over-time for a set of Wizards. When implemented this way, current solutions effectively use centralized application servers to consolidate the on-chain data and provide a nice query interface.
- Each time a Wizard is evolved, an Evolved event can be emitted, containing the Wizard’s address as well as its power level after the evolution. Using this method, a client can issue a `Query.logs` request to fetch ALL logs across all Wizards; the client can then store those logs and do various operations on them, including sorting and grabbing the last Evolved event, and use this to drive the UI.
  - This approach is architecturally nice because it allows things users expect, like being able to easily see power-over-time for Wizards, but it comes with the drawback that a client must effectively fetch all available data just to figure out the current power level of each Wizard. Fetching all that data is a costly endeavor.
- This proposal: architecturally we could use the above approach, coupled with the ability of an eth node to return the last N Evolved logs for each Wizard. With this proposal, a dapp builder doesn’t need to do any expensive blockchain scanning, nor coordinate large-scale `eth_call` spam to a node. It also totally eschews the need for business logic in a centralized app server, and means that any node implementing this protocol can efficiently power the user interface for this theorized game. It also means that any third-party node provider can easily cache the results of this particular query and scale it out even further, without having any dapp-specific business logic implemented.
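Concretely, the Wizard UI described above could fetch the latest Evolved event for every Wizard in a single request. A sketch under this proposal’s schema (the Evolved event signature hash is a placeholder, and the assumption that topic index 1 carries the Wizard identifier is hypothetical):

```graphql
{
  # Group all Evolved logs by the Wizard identifier in topic index 1,
  # and return only the single most recent log for each Wizard.
  groupedLogs(
    filter: {topics: [["<keccak256 hash of the Evolved event signature>"]]}
    groupTopics: [1]
  ) {
    groupTopics      # identifies which Wizard this group belongs to
    count            # how many times this Wizard has evolved
    logs(last: 1) {
      data           # would carry the post-evolution power level
    }
  }
}
```

The client receives exactly one log per Wizard rather than the full event history, which is the data-reduction argument made above.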
Real-Life Usage
The current Augur event architecture works similarly to the second-to-last example above. Each Augur market is represented by a contract deployment, and each matched trade of shares on the market is logged by a parent contract (Augur) so that the trades can be analyzed by clients to support common trading use cases like Last Trade Price, or to display a user’s current Profit and Loss across all the markets in which they participate.
Currently, fetching the most recent state of the application involves scanning and caching all log messages for the set of markets, and then using that to drive basic UI functionality, like sorting based on last trade price.
This is incredibly expensive per client, putting load on eth nodes to synchronize all log state to each client instead of returning the exact relevant data (the current state of each market over a range of blocks).
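Under this proposal, that workload could instead be expressed as one grouped query per refresh. A sketch (the trade event signature is a placeholder, and the assumption that a topic index carries the market address is hypothetical):

```graphql
{
  # Group matched-trade logs by market, returning only the most recent
  # trade per market to drive last-trade-price sorting in the UI.
  groupedLogs(
    filter: {topics: [["<keccak256 hash of the trade event signature>"]]}
    groupTopics: [1]
  ) {
    groupTopics      # the market this group belongs to
    logs(last: 1) {
      data           # would carry the matched trade details
    }
  }
}
```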
Potential Objections
Adding more advanced Filtering may add load on already loaded nodes
While this may seem true at the outset, and will be true to some extent, I believe that this load pales in comparison to the load generated by the alternatives. Take a common case: a client is listening to the events coming off an eth node, goes offline for some number of hours, and then needs to catch up to the state of the chain.
- In the case where they are able to query the chain directly for the data, they may need to do an unbounded number of requests to refresh the state of all objects that may have changed. In this case, a client falls into the second case above and is required to issue potentially thousands of `eth_call`s to the node in order to understand the current state.
- In the case where the client relies upon log notifications, they may need to scan up to 12 hours of logs to fetch the most recent events they care about. In this case there are two pieces: the index scan, and returning the data. The index scan will need to happen both in the naive log-fetching implementation and in this proposal, but the amount of data returned stands to be significantly reduced if only a LAST N logs filter per group were used.
But Logs are supposed to be for Events, not long term storage!
Even if this is true, there are cases where relatively short-term storage of logs, for reliable delivery to nodes which may go away for some small number of hours or days, could benefit from the ability to efficiently query for logs. If, on the other hand, it is decided that log storage should be for a moderate, long, or indefinite time scale for full nodes, then being able to efficiently query large amounts of event data becomes even more useful.