Not sure which category is the best for this question.
I set up a
geth-1.9.42 full node and it, unfortunately, has the problem that
eth.syncing.highestBlock never changes once geth has started (also filed in geth’s GitHub issues).
eth.syncing.currentBlock is usually ~70,000 blocks behind the latest block reported by Etherscan.
eth.syncing.currentBlock grows by ~1 block per minute, so it will never catch up.
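For concreteness, the arithmetic behind “it will never catch up” (the chain-growth rate below is an assumption based on mainnet’s roughly 13-second block time; the other numbers are the observations above):

```shell
# Mainnet produces a block roughly every 13 s, i.e. ~4-5 blocks per minute.
CHAIN_RATE=4    # assumed: new blocks per minute appearing on mainnet
SYNC_RATE=1     # observed: blocks per minute this node imports
GAP=70000       # observed: blocks behind Etherscan
if [ "$SYNC_RATE" -le "$CHAIN_RATE" ]; then
  echo "a gap of $GAP blocks will never close: importing $SYNC_RATE block/min vs $CHAIN_RATE produced"
fi
```

As long as the import rate stays at or below the production rate, the gap can only grow.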
I’ve spent too many hours trying to solve this problem. This is the third time in three years that I have tried to set up a full node, and although I was a Linux sysop for years with good knowledge of network configuration and dæmons, I’m at my wit’s end. If you operate a full node and can spend some time helping me over Zoom or similar, kindly PM me your Ethereum address so I can pay you for your time.
Since every year I make a project of running a full node and fail, it’s fair to say I have put more than 100 working hours into this over the last 3 years without ever getting a node fully synced. I have set up Bitcoin, ed2k, and BitTorrent nodes with relative ease; geth is the trickiest of them all.
When I talk to Ethereum users at meetups and on Telegram, I am typically told to check
a) the network, and
b) whether the host is too weak to run a full node.
So here we are:
a) Network problems?
There are constantly 50 peers and eth.syncing.currentBlock keeps growing (at a slow rate), which rules out a lack-of-peers issue.
NAT is often the first suspect in any network problem, but in my case
admin.peers shows 50 peers, many of them with
network.inbound == true, meaning they connected to this node. All ports are open and tested (with netcat).
Furthermore, the download/upload bandwidth used by geth, 1.3 Mbit/s down and 300 kbit/s up, is 1/16 of the available bandwidth, so it doesn’t look like we are choking on bandwidth. I called the fibre-to-the-curb provider to double the bandwidth and verified the upgrade took effect (by downloading files), yet geth still uses the same amount of bandwidth (now 1/32 of what is available), so that rules out a bandwidth problem.
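A back-of-envelope check of those bandwidth figures (the link sizes below are derived from the 1/16 and 1/32 fractions stated above, not measured directly):

```shell
USED_KBIT=1300                    # observed geth download rate, ~1.3 Mbit/s
LINK_KBIT=$((USED_KBIT * 16))     # original link: geth used 1/16 of it
UPGRADED_KBIT=$((LINK_KBIT * 2))  # after the 2x upgrade
echo "geth uses 1/$((UPGRADED_KBIT / USED_KBIT)) of the upgraded link"
```

If geth were network-bound, doubling the link should have raised its throughput; an unchanged 1.3 Mbit/s points at a bottleneck elsewhere.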
b) System too weak?
I can’t answer this directly because I don’t have a working geth setup to compare against, but the data below shows all resources are under-utilised: CPU, memory, disk I/O, and available bandwidth. If the host were too weak, I would expect at least one of these resources to be fully used.
Memory: 80% unused
$ free -m
              total        used        free      shared  buff/cache   available
Mem:          20022        4584         407           1       15030       15824
Swap:         13311           1       13310
This is not expected: I would expect geth to use at least 16 GB of memory, since I have given it
top(1) shows geth typically using 10% to ~20% of memory.
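Translating those top(1) percentages into bytes against the 20022 MB total from free -m (note that, as far as I understand, --cache is an upper bound geth grows into rather than a fixed allocation, so low usage by itself is not necessarily a fault):

```shell
TOTAL_MB=20022          # total RAM from free -m above
for PCT in 10 20; do
  echo "${PCT}% of RAM = $((TOTAL_MB * PCT / 100)) MB"
done
```

So geth is sitting at roughly 2-4 GB resident, far below the hoped-for 16 GB.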
System load: medium-high but not maxed out
Since the CPU has 4 cores, a load average of 2.7~3.5 indicates mid-to-high but not full load (sometimes it drops as low as 1.3). In my sysop years, clients started reporting errors when a server’s load approached 1.5~2 times the number of cores, so this load looks okay to me.
$ uptime
 23:47:24 up 15 min,  2 users,  load average: 3.54, 3.20, 1.95
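The same figure normalised per core (a sketch; the core count is taken from the text above):

```shell
CORES=4
LOAD=3.54    # 1-minute load average from uptime above
# awk handles the floating-point division that plain sh cannot:
awk -v l="$LOAD" -v c="$CORES" 'BEGIN { printf "load per core: %.2f\n", l / c }'
```

A per-core load under 1.0 means runnable work is not, on average, waiting for a CPU.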
CPU usage: 33% of one core (of 4 cores)
Geth typically uses 33%~34% of one CPU. (It does not appear to parallelise across cores.)
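One way to test the single-thread suspicion is to look at per-thread CPU usage (a sketch for Linux procps; point it at the geth PID, e.g. from pidof geth — the current shell’s PID stands in here so the command runs as-is):

```shell
# If one geth thread sits near one full core while the rest idle,
# block import is bound on a single thread.
ps -L -o tid,pcpu,comm -p $$
```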
RAID performance: 170MB/s to 522MB/s
Using a RAID 5 array of 4 disks. When not under load (geth not running), I ran
hdparm -t 5 times and got:
/dev/sdb:
 Timing buffered disk reads: 1046 MB in 3.01 seconds = 347.88 MB/sec
a@osboxes:~$ sudo hdparm -t /dev/sdb
/dev/sdb:
 Timing buffered disk reads: 1150 MB in 3.00 seconds = 382.90 MB/sec
a@osboxes:~$ sudo hdparm -t /dev/sdb
/dev/sdb:
 Timing buffered disk reads: 1322 MB in 3.02 seconds = 437.09 MB/sec
a@osboxes:~$ sudo hdparm -t /dev/sdb
/dev/sdb:
 Timing buffered disk reads: 1472 MB in 3.02 seconds = 487.37 MB/sec
a@osboxes:~$ sudo hdparm -t /dev/sdb
/dev/sdb:
 Timing buffered disk reads: 1570 MB in 3.01 seconds = 522.44 MB/sec
Under load (geth running), hdparm -t reports only 170 MB/s.
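One caveat worth noting: hdparm -t measures large sequential reads, while geth’s state sync is dominated by small random reads, where a spinning-disk RAID 5 array fares far worse than these numbers suggest. A crude random-read probe with plain dd (the file size and read count are arbitrary choices; add iflag=direct and use a file larger than RAM for a measurement that bypasses the page cache):

```shell
FILE=randprobe.bin
dd if=/dev/zero of="$FILE" bs=1M count=64 2>/dev/null   # 64 MiB scratch file
START=$(date +%s)
i=0
while [ "$i" -lt 200 ]; do
  OFF=$(( (i * 7919) % 16384 ))   # pseudo-random 4 KiB block within the file
  dd if="$FILE" of=/dev/null bs=4k count=1 skip="$OFF" 2>/dev/null
  i=$((i + 1))
done
END=$(date +%s)
echo "200 random 4k reads took $((END - START)) s"
rm -f "$FILE"
```

If random 4 KiB reads are slow while sequential throughput looks healthy, disk seek latency (not bandwidth) may be the actual sync bottleneck.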