I’ve found it very useful to be able to monitor a group of Pantheon nodes using Prometheus metrics and a Grafana dashboard (https://grafana.com/dashboards/10273 specifically). With Geth 1.9 also adding a Prometheus endpoint to expose its metrics, it seems like it would be worthwhile agreeing on standard names for a few of the core metrics like current chain height and peer count. It’s a minor thing but makes it easier to use a single dashboard to monitor nodes using different clients. I’ll write this up as an informational EIP if no-one screams too loudly.
Eth 2.0 has an informative spec for metrics at https://github.com/ethereum/eth2.0-metrics/blob/master/metrics.md, though I’m inclined to be even less prescriptive and just define the metrics, omitting anything about how to configure the client.
I’d suggest metrics for:
- Current chain height:
- Best known block number (similar to the highest block reported by the eth_syncing JSON-RPC method, but it would report the current chain head when not syncing rather than returning false)
- Current peer count:
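For anyone less familiar with Prometheus, here is a minimal sketch of how a client might expose the metrics above in the Prometheus text exposition format. The metric names are hypothetical placeholders (prefixed example_), not the names being proposed, and NodeStatus and render are illustrative helpers rather than code from any real client.

```python
# Sketch: render core node metrics in the Prometheus text exposition format.
# NOTE: the metric names are HYPOTHETICAL placeholders, not agreed standard names.
from dataclasses import dataclass


@dataclass
class NodeStatus:
    chain_height: int      # block number of the current chain head
    best_known_block: int  # highest block number known (equals chain_height when in sync)
    peer_count: int        # number of currently connected peers


# (metric name, HELP text, NodeStatus attribute); names are hypothetical.
METRICS = [
    ("example_chain_height", "Current chain height", "chain_height"),
    ("example_best_known_block", "Best known block number", "best_known_block"),
    ("example_peer_count", "Current peer count", "peer_count"),
]


def render(status: NodeStatus) -> str:
    """Return the status as Prometheus text format, exposing every metric as a gauge."""
    lines = []
    for name, help_text, attr in METRICS:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {getattr(status, attr)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render(NodeStatus(chain_height=100, best_known_block=105, peer_count=7)), end="")
```

All three are gauges because each value can go up or down over time (peer count obviously, chain height on reorgs), which is how Prometheus expects point-in-time values to be typed.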
The names for CPU and memory usage metrics tend to be fairly standard already, so together with these metrics they drive almost the entire dashboard I’ve been using, which gives a really nice overview.
You can build quite a useful dashboard with just those. There are also some metrics for fast sync (pivot block, number of times the pivot block has changed, downloaded world state, known world state remaining) which may be worth standardising names for, even though they won’t apply to every client.
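As an illustration of why shared names matter for a single dashboard, the sketch below computes how many blocks each node is behind from its scraped metrics text, using the same logic regardless of which client produced it. The metric names are hypothetical placeholders, and the parser only handles simple unlabelled samples; a real setup would let Prometheus and Grafana do this with a query instead.

```python
# Sketch: compute "blocks behind" per node from scraped Prometheus text.
# NOTE: metric names are HYPOTHETICAL placeholders; the point is that one set of
# names works for every client, so one panel/query covers all nodes.
HEIGHT = "example_chain_height"
BEST_KNOWN = "example_best_known_block"


def parse_samples(text: str) -> dict[str, float]:
    """Parse unlabelled samples; skips comments, blank lines and labelled series."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        if "{" in name:
            continue  # labelled series are ignored in this sketch
        samples[name] = float(value.split()[0])  # drop optional timestamp
    return samples


def blocks_behind(text: str) -> int:
    """Best known block minus current chain height (0 when the node is in sync)."""
    s = parse_samples(text)
    return int(s[BEST_KNOWN] - s[HEIGHT])


if __name__ == "__main__":
    scrapes = {
        "node-a": "example_chain_height 8000000\nexample_best_known_block 8000000\n",
        "node-b": "# TYPE example_chain_height gauge\n"
                  "example_chain_height 7999950\nexample_best_known_block 8000000\n",
    }
    for node, text in scrapes.items():
        print(node, blocks_behind(text))
```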
And there will always be a ton of very client specific metrics which we shouldn’t even attempt to standardise.