[ERC-2159] Common chain metrics

EIP: http://eips.ethereum.org/EIPS/eip-2159

I’ve found it very useful to be able to monitor a group of Pantheon nodes using Prometheus metrics and a Grafana dashboard (https://grafana.com/dashboards/10273 specifically). With Geth 1.9 also adding a Prometheus endpoint to expose its metrics, it seems like it would be worthwhile agreeing on standard names for a few of the core metrics like current chain height and peer count. It’s a minor thing but makes it easier to use a single dashboard to monitor nodes using different clients. I’ll write this up as an informational EIP if no-one screams too loudly.

Eth 2.0 an informative spec for metrics at https://github.com/ethereum/eth2.0-metrics/blob/master/metrics.md though I’m inclined to be even less prescriptive and basically just define the metrics, omitting anything about how to configure the client.

I’d suggest metrics for:

  • Current chain height: ethereum_block_number
  • Best known block number (similar to highestBlock from eth_syncing JSON-RPC but would be the current chain head when not syncing rather than null): ethereum_best_known_block_number
  • Current peer count: ethereum_peer_count

Them names for metrics around CPU and memory usage tend to be pretty standard so those metrics enable almost the entire dashboard I’ve been using which is a really nice overview.

You can generate quite a useful dashboard with just those. There are also some for fast sync (pivot block, number of times pivot block change, downloaded world state, known world state remaining) which may be worth setting some standard names for even though they won’t apply to every client.

And there will always be a ton of very client specific metrics which we shouldn’t even attempt to standardise.

2 Likes

We have hundreds of metrics that we send from Nethermind to Prometheus. Will post the list here later so maybe they will be an inspiration for some more detailed tracking. Will be glad to see your lists too.

Pantheon publishes quite a few but currently doesn’t make any promises about any of them being stable. I think the ones above are the most useful to provide stable names for and making them consistent across clients would be useful. They are also the ones I’ve heard from clients are useful (or been directly requested by clients).

Other possible options are number of network messages received (by type), counts of different disconnect reasons but I think this are more likely to be used to work out why something isn’t going well rather than just the high level overview. Once you get to that level you’re well into client-specific land anyway.

Example Pantheon metrics from one deliberately very under-powered box are in https://gist.github.com/ajsutton/09535958828b4b2a1a1f7da8f8d865e2 if you’re interested.

Drafted an Informational EIP at: https://github.com/ethereum/EIPs/pull/2159

1 Like

Left a question on the pull request:

So is this an ERC, Interface or Informational?

Is this proposing that clients when integrating with Prometheus should use the names below? What does integrating with Prometheus means?

In short: does this proposes something clients would implement or is this implemented by third parties when integrating a client into Prometheus?

Thanks for the rapid review. I left this on the PR as well but for people following along here:

This is something that clients would implement, but exactly how they integrate with Prometheus is up to them - there are a few ways a client and Prometheus could be configured to work together and no real benefit in trying to standardise it.

What is useful is if a client is sending metrics to Prometheus (via whatever means), that these few properties have the same names and meanings. Then you can create a dashboard like https://grafana.com/api/dashboards/10273/images/6473/image to monitor groups of nodes.

I’m not exactly sure what category this should go under. Interface is a possibility and would definitely be the case if it were standardising how to integrate with Prometheus. I’ve put it as Informational here because its essentially a recommended convention. It’s not trying to be the definitive spec on metrics for clients and its entirely optional whether clients choose to support it or not. It’s just a little more convenient if they do.

I don’t mind which category it winds up in though.

Can you include ERC-2159 in the title and http://eips.ethereum.org/EIPS/eip-2159 in the top comment?