GraphQL interface to Ethereum node data

I don’t see how pending and the BlockNumber type are related. I mean to replace all occurrences of number: Long, both in block() and in blocks().

Parity has assigned GraphQL support to their next milestone. Yay.


As I understand it, the reason for a special BlockNumber type is because in addition to numbers, we have two special values: pending and latest. In the current schema, the pending query handles the former, and the latest block can be fetched with a block query specifying neither number nor hash.
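
For example, all three cases can be expressed with the current draft schema (field names as in the EIP; aliases added so the two block fields can share one query):

{
  pending { transactionCount }                 # the pending block
  latest: block { number }                     # no arguments: the latest block
  specific: block(number: 5000000) { number }  # a particular block
}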

Interesting idea, but I’m not sure how this can be implemented efficiently on a node. As I understand it, returning the set of GroupedTopics would require scanning all the logs covered by the FilterCriteria, the same as a regular logs query. Why not simply do a regular query with the same criteria and do the grouping on the client side? The load on the node will be the same, and the reduction in data transmitted doesn’t seem substantial.

@pgebheim I see a lot of these ideas developing as extension EIPs that provide an API layer on top of the nodes, but not necessarily in them. There is definitely a base layer of GraphQL that should be supported by the nodes, for all of the reasons listed in the original justification, as well as the extra benefits you get with query stitching and composing extensions on top of the base API. In my opinion, EIP-1767 should cover the base functionality of what nodes currently expose for data in and out.

There are additional layers that can be added on top of the nodes to provide extra functionality. This can include an extension API for event/log indexing, transaction filtering, unrolling of RLP-encoded data, etc. As to the differences between what is in EIP-1767 and what is currently in the EthQL project, I view EthQL as one implementation moving in this direction. For this particular standard (EIP-1767), I think it should limit the functionality and concentrate on providing a solid foundation from which we can extend in different directions, including concepts like paging, log querying, and higher-level convenience functions that are currently embedded in application code.

I don’t see it specified how many items are returned from a collection. For example, GitHub has these requirements:

  • Clients must supply a first or last argument on any connection.
  • Values of first and last must be within 1-100.

This is one way to be explicit about it and it still allows for different pagination strategies.
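
As a sketch, GitHub-style connection arguments could look like this in SDL (the LogConnection type and the placement of the arguments are illustrative, not part of the EIP):

type Query {
  # Clients must supply first or last; values outside 1-100 are rejected.
  logs(filter: FilterCriteria!, first: Int, last: Int): LogConnection!
}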

This feels like a reasonable approach.

Any thoughts on how we should approach various proposals for layers on top of EIP-1767? Is there a need for separate EIPs for each extension or group of extensions?

In a naive implementation an eth node could scan all blocks to find this data. In an optimized case, this would be a composite-key lookup, followed by a HEAD or TAIL operation on the data based off a natural index. DBMSs can handle these sorts of queries over vast data sets optimally. It’s really a problem with known solutions; we just need to decide whether we want to do this in Ethereum nodes.

I personally find these sorts of premature-optimization conversations to be a limiting factor in bringing good interfaces to application developers. I can construct any number of examples where the data size of log fetching becomes substantial. But even worse than that is pushing that development cost onto every single application development team, and the cost of degraded UX from web applications that now need to fetch potentially thousands of documents spanning thousands of blocks in order to render data from a small set.

The performance penalty here becomes particularly poor when the client is speaking to a light client, where actually fetching logs from every block in a range requires also fetching block headers for each block returned even if only a small subset of them are actually used in the application.

I have concrete examples from Augur if you want me to go into them.

We have half a dozen node implementations for Ethereum, and if we want GraphQL to be a standard, they’ll all be expected to implement whatever is defined here. Thus, we should strive for a minimum viable interface first.

For a feature to add value, it should do something that’s significantly more efficient to do on the node than on the client. In cases like this, it seems like you could implement it just as efficiently by using the existing log filter support, then doing a grouping operation in the client. This functionality can be provided via a client-side library, meaning it only has to be implemented once - instead of half a dozen times.

Let’s do some math:

For Augur’s first-order volume goals, we expect around 2,600 trades per week. Currently, the logs emitted for a single trade clock in at around 900 bytes. This means that a normal weekly cadence of coming back to trade will cause each user to download 2.34 MB of data for this one piece alone, just to bring their local database of orders up to date. Those trades are likely to be spread across ~30 markets at a time, meaning that if we could efficiently fetch just the latest log for each market, we would transmit 30 × 900 bytes, or around 27 kB, to get that user’s state up to date.

In any case, transmitting an extra ~2.3 MB of data to the client, plus the associated cost of deserialization, grouping, and so on, is going to make the user experience of a dApp that uses an eth node directly far worse.

Now, let’s take a look at this from the perspective of a user that is getting this data from a light node :wink:

Based on the way that light nodes fetch data from full nodes and verify the blocks before handing them to a client, a light client needs to fetch, on average, 8x as much data in block headers as a full node.

In the example above, a light client would need to fetch ~2,600 blocks’ worth of headers from a full node to scan and return all the logs. With the 8x overhead, that means requesting ~20,000 block headers from the full nodes serving that light client in order to return the data.

Contrast this with the case where the light client can ask the full node directly for the latest logs: it would need to fetch and validate headers for only 30 blocks, which with the same 8x overhead comes to 240 headers in total.
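
Summarizing the arithmetic:

Full scan:               2,600 logs × 900 B ≈ 2.34 MB transferred
Latest log per market:      30 logs × 900 B ≈ 27 kB transferred
Light client, full scan: 2,600 blocks × 8 ≈ 20,800 headers fetched
Light client, targeted:     30 blocks × 8 = 240 headers fetched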

Put that into the context of light client throttling and you’ll see clearly that expecting all clients to just fetch all the logs is going to put undue stress on the entire network and degrade UX for any client that is attempting to take advantage of eth nodes.

In the edge case that you only want the latest log entry for each group, this would save some data from node to client, yes. I’d suggest, though, that this is quite uncommon - usually we need all the logs in order to reconstruct a state.

Neither solution is scalable, though, because of the load it puts on the node - a solution like The Graph makes a lot more sense.

Also - 900 bytes per log entry?! That’s 28 words. What on earth is being recorded here?

Er? Where are you getting this from?

Adding this interface to the GraphQL API won’t allow this - you’d need to change the light client protocol instead. I don’t see how this is possible, though, as there’s no way at present to generate a proof of nonexistence of a more recent entry than the one the full node returned.

You say that if a functionality can be added both on the node side and on the client side, we should strive for the minimum on the node side. I think the math is actually the inverse here. We have 5-6 node implementations, and we’d need at least as many client-side libraries as there are mainstream programming languages. So if we want to gain wider adoption of GraphQL, we have two options to support debated features:
a) Make it part of the spec and include it in the node implementation
b) Create libraries for the mainstream languages and implement it there

It’s not clear to me who the target audience / potential users of GraphQL would be. The EIP suggests it’s a long-term replacement for JSON-RPC, but it covers only a subset of JSON-RPC. Do you see the two APIs living together in the long run? Will their feature sets diverge?

Why do from(block: Long): Account! and to(block: Long): Account! in a transaction take a block argument? I’d specify the block like this:

{
  block(number: 42) {
    transactions {
      from { address }
      to { address }
    }
  }
}

Now I have to specify a – potentially different – block for the accounts.

For most purposes the standard GraphQL libraries already available in the user’s language of choice should be sufficient - though in practice, most consumers are in JavaScript.

If we start adding everything and the kitchen sink, we will end up with 0 implementations on the server side, or several incompatible implementations, and arguing about how many languages will have to add support on the client side will be academic.

There’s a table in the EIP that shows JSON-RPC coverage; the GraphQL API covers all JSON-RPC functionality other than deprecated functionality (mining interface, transaction signing, etc).

To give you the flexibility to specify the block you want to fetch the account at. We should make it clear that the default value is the block the transaction was mined in, though, or the latest block if the transaction has not been mined.
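
Under that defaulting rule, both of these would be valid (field names from the draft schema; the specific block numbers are arbitrary):

{
  block(number: 42) {
    transactions {
      from { balance }             # defaults to the block the transaction was mined in
      to(block: 41) { balance }    # account state at an explicitly different block
    }
  }
}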

I set up a Slack channel for all Ethereum+GraphQL related stuff, including this EIP. Feel free to join. It may enable a quicker feedback loop for clients wanting to implement or implementing a GraphQL interface.

I created an initial test suite based on Pantheon’s work. It’s a collection of GraphQL files and their expected output as JSON. This is not its final location; we’re working on a framework to run the tests across clients. See the README.md in the parent folder for how to run the tests.

According to these tests, the Geth and Pantheon implementations are somewhat different: 8 of 67 tests fail with Geth. I’ll look into them and compare with the current spec.

Please, mercy, I can’t handle another Slack tab open in my browser all the time. Can’t we use Gitter, or Discord, if we must?


Sorry, I know that among the different people in this space, everyone prefers a different chat app. I have 9 installed on my phone :wink: You can set up a Zapier integration from Slack to Discord. Or I’ll cross-post the most important updates here.

I’m checking the differences between the Geth and Pantheon implementations. What I’ve found so far is how they differ in error handling.

They both handle the same cases, but return different error messages (for example “Invalid params” vs “hex number with leading zero digits”). I see a few options here.

  • Standardize error messages
  • Ignore the message part and standardize only the categories (the “extensions” part of the returned JSON).
  • Standardize “compile-time” error messages only (missing fields, etc).

They also treat missing entities differently. If you query the balance of a non-existent account or a property of a non-existent block, Geth returns 0 and null, respectively. Pantheon returns an error in both cases (specific to the case). Here I’d return both data and errors (which is OK by the spec): a dumber client could use the 0 balance as before, while a smarter client could tell that, say, the supplied address was too short.
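
For illustration, a response carrying both could look like this (the query uses the draft schema’s account field; the error message and extensions content are hypothetical):

# Query with a deliberately malformed (too short) address:
{
  account(address: "0x1234") {
    balance
  }
}
# A spec-compliant response could then include both data and errors:
# { "data":   { "account": { "balance": "0x0" } },
#   "errors": [ { "message": "address too short",
#                 "extensions": { "category": "INVALID_PARAMS" } } ] }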

I wrote a draft of how to handle error messages. It’s in the form of a WIP pull request so we can discuss it. I think an important lesson of JSON-RPC is that we should standardize error codes and messages.

Thanks for your hard work! I’m just about to leave on a week’s vacation, but I’ll take a close look as soon as I get back.


I encountered an issue with Long. JavaScript can safely represent integers only up to 53 bits, which is why Long is not part of the GraphQL spec. Our schema uses a custom Long type defined as a 64-bit int in these cases:

  • gas-related (cumulativeGasUsed, estimateGas, gas, gasLimit, gasUsed)
  • block-indexing (highestBlock, etc)
  • counting states (knownStates, pulledStates)
  • nonce
  • transactionCount

I think we should use a custom Int53 in cases where it’s feasible and use BigInt for the rest. Or maybe define Long in a way that it returns an integer if the value fits in 53 bits and a hex-encoded string otherwise, like in this example. So it would be an error to input or output “0x1”, because that value fits into 53 bits.
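
A rough SDL sketch of that split (the scalar names and which fields get which type are my suggestion, not from the EIP):

# Always fits in a JavaScript number: output as a plain integer
scalar Int53

# May exceed 53 bits (balances, etc.): output as a 0x-prefixed hex string
scalar BigInt

type Transaction {
  nonce: Int53!
  gas: Int53!
  gasPrice: BigInt!
  value: BigInt!
}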