I’m not sure which category fits this question best.
I set up a geth 1.9.42 full node and it, unfortunately, has these problems:
- eth.syncing.highestBlock never changes once geth has started (also filed in geth’s GitHub issues).
- eth.syncing.currentBlock is usually ~70,000 blocks behind the latest block reported by Etherscan, and it grows by ~1 block per minute, so it will never catch up.
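For anyone who wants to reproduce the observation, this is roughly how I check it from the geth JavaScript console (a sketch; the IPC path is the default one and may differ on your setup):

$ geth attach ~/.ethereum/geth.ipc
> eth.syncing.currentBlock    # the block my node has imported so far
> eth.syncing.highestBlock    # the target reported by peers; never changes for me
> eth.syncing.highestBlock - eth.syncing.currentBlock    # blocks still to go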
I’ve spent too many hours trying to solve this problem. This is the 3rd time I have tried to set up a full node in the last 3 years, and I was a Linux sysop for years with good knowledge of network configuration and dæmons, yet I’m at my wit’s end. If you operate a full node and can spend some time helping me over Zoom or similar, kindly PM me your Ethereum address so I can pay you for your time.
Every year I make it a project to run a full node, and every year I fail, so it’s fair to say I have committed more than 100 working hours over the last 3 years to getting a node up and running without ever achieving it. I have set up Bitcoin, ed2k, BitTorrent etc. nodes with relative ease; geth is the trickiest of them all.
When I talk to Ethereum users at meetups and on Telegram, I am typically told to check
a) the network, and
b) whether the host is too weak to run a full node.
So here we are:
a) Network
There are constantly 50 peers, and eth.syncing.currentBlock kept growing (at a slow rate), so that rules out a lack-of-peers issue.
NAT is often the first suspect in any network problem, but in my case admin.peers showed 50 peers, many with network.inbound == true, meaning that they connected to this node. All ports are open and tested (with netcat), as shown below.
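Concretely, the checks look like this (a sketch; 192.0.2.10 stands in for my node’s public address, and 30303 is geth’s default P2P port):

# from an external host: is the P2P TCP port reachable?
$ nc -zv 192.0.2.10 30303
# in the geth console: total peers, and how many dialled in to us
> admin.peers.length
> admin.peers.filter(function (p) { return p.network.inbound; }).length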
Furthermore, the download/upload bandwidth used by geth, 1.3 Mbit/s down and 300 kbit/s up, is 1/16 of the available bandwidth, so it doesn’t look like we are choking on bandwidth. I called the fibre-to-the-curb provider to double the bandwidth and verified the upgrade took effect (by downloading files), yet geth still uses the same amount of bandwidth (now 1/32 of what is available), so that rules out a bandwidth problem.
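(For the record, the per-process bandwidth figure comes from watching the node with a tool such as nethogs, which attributes live traffic to individual processes; the interface name below is a placeholder:)

$ sudo nethogs eth0    # shows geth’s live download/upload rates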
b) System too weak?
I can’t answer this because I don’t have a working geth setup to compare with, but the following data from my system shows that all resources are underutilised: CPU, memory, disk I/O, available bandwidth. If the host configuration were too low, I would expect at least one of these resources to be fully used.
Memory: 80% unused
$ free -m
              total        used        free      shared  buff/cache   available
Mem:          20022        4584         407           1       15030       15824
Swap:         13311           1       13310
This is not what I expected. Geth should use at least 16 GB of memory, since I have given it --cache 16384 (the value is in MB). top(1) shows geth typically using 10% to ~20% of memory.
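For reference, the node is started along these lines (a sketch; the flags besides --cache are illustrative, not my exact command line):

# --cache is specified in MB, so 16384 should allow up to 16 GB of internal caching
$ geth --syncmode full --cache 16384 --maxpeers 50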
System load: medium-high but not maxed out
Since the CPU has 4 cores, a load average of 2.7~3.5 indicates mid-to-high but not full load (sometimes it drops as low as 1.3). In my sysop years, clients started to report errors when a server’s load approached 1.5~2 times the number of cores, so this load looks okay to me.
$ uptime
23:47:24 up 15 min, 2 users, load average: 3.54, 3.20, 1.95
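As a quick sanity check, the load per core can be computed directly (3.54 / 4 ≈ 0.89, i.e. under one runnable task per core):

$ echo "$(cut -d' ' -f1 /proc/loadavg) / $(nproc)" | bc -l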
CPU usage: 33% of one core (4 cores total)
Geth typically uses 33%~34% of one CPU. (It doesn’t look like the work is multi-threaded.)
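To check whether the work is spread across cores, per-thread CPU usage can be inspected like this (a sketch; assumes a single geth process):

# -H shows the individual threads of the geth process
$ top -H -p "$(pgrep -x geth)"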
RAID performance: 170 MB/s to 522 MB/s
Using a RAID 5 array of 4 disks. When not under load (not running geth), I ran hdparm -t 5 times and got this:
a@osboxes:~$ sudo hdparm -t /dev/sdb
/dev/sdb:
Timing buffered disk reads: 1046 MB in 3.01 seconds = 347.88 MB/sec
a@osboxes:~$ sudo hdparm -t /dev/sdb
/dev/sdb:
Timing buffered disk reads: 1150 MB in 3.00 seconds = 382.90 MB/sec
a@osboxes:~$ sudo hdparm -t /dev/sdb
/dev/sdb:
Timing buffered disk reads: 1322 MB in 3.02 seconds = 437.09 MB/sec
a@osboxes:~$ sudo hdparm -t /dev/sdb
/dev/sdb:
Timing buffered disk reads: 1472 MB in 3.02 seconds = 487.37 MB/sec
a@osboxes:~$ sudo hdparm -t /dev/sdb
/dev/sdb:
Timing buffered disk reads: 1570 MB in 3.01 seconds = 522.44 MB/sec
But sometimes hdparm -t reports only 170 MB/s.
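One caveat worth noting: hdparm -t measures sequential buffered reads, while geth’s database workload is dominated by small random reads and writes, so a random-read benchmark may be more representative. A sketch with fio (the directory is a placeholder for the RAID mount point):

# 4 KiB random reads for 60 s, bypassing the page cache
$ sudo fio --name=randread --directory=/mnt/raid --size=1G --rw=randread \
      --bs=4k --ioengine=libaio --iodepth=16 --direct=1 --runtime=60 \
      --time_based --group_reporting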