ProgPoW Audit Delay Issue

epic.henry · June 4, 2019, 12:00am

I have to comment on your post as the information you present is incomplete and misleading.

For the record, I am with ePIC Blockchain, a team of former AMD engineers with over a century of combined experience designing GPUs, mobile SoCs and blockchain ASICs. ePIC does not have a vested interest in either Ethash or ProgPOW.

Our sole interest is to help set the record straight with facts, or to pose questions that a hardware auditor needs to address in order to conduct a comprehensive audit and reach a valid assessment for the Ethereum community.

I will address the two key claims in your post above: that ProgPOW favors Nvidia, and that ProgPOW ASICs are easy to design.

1. Misleading claim that ProgPOW favors Nvidia

Sonia-Chen:

ProgPoW risks turning Ethereum into an Nvidia-managed game, with ETH devs becoming unpaid support staff to keep the game going.

I’m relaying the following quote from people with more GPU expertise than we have or are willing to acquire (not from Linzhi, take it fwiw):

"On AMD V_MUL_LO_U32 uses 16 cycles, on Nvidia starting from Volta IMAD uses 5 cycles.

32-bit multiplication in ProgPoW was pushed under the pretense of it being inefficient on both manufacturers but that turns out to be a lie as on Nvidia it was only inefficient on Pascal. The algorithm is tuned to still let Pascal utilize full memory bandwidth due to simply sheer compute capacity difference coming partially from higher die size which is why comparing GPUs based on price is being pushed as of late. It’s not a secret that 4xx & 5xx series AMD GPUs are not high-end but because ProgPoW’s compute to memory bandwidth ratio is tuned to match Nvidia GPUs, AMD GPUs are not utilized to their fullest, most importantly losing the full memory bandwidth utilization which is the very basic foundation of the Ethash algorithm."

Measured benchmark results do not support the claim in Linzhi's post that ProgPOW favors Nvidia over AMD. I will point to a Medium post, Comprehensive ProgPoW Benchmark by Theodor Ghannam, to support my observations.

From the Ethereum Hashrates chart, I extracted the following data comparing ProgPOW to Ethash performance for 6-8GB cards using GDDR5 memory. I excluded GDDR5X and GDDR6 cards to maintain an apples-to-apples comparison, as faster memory improves ProgPOW performance.

| Card | ProgPOW (MH/s) | Ethash (MH/s) | Relative perf |
|---|---|---|---|
| RX470 | 9.7 | 20.6 | 47.1% |
| RX480 | 11.4 | 21 | 54.3% |
| RX580 | 12.7 | 21 | 60% |
| 1060 | 10.2 | 20.6 | 49.5% |
| 1060ti | 14.8 | 26 | 56.9% |
| 1070 | 13.7 | 27.7 | 49.5% |
| 1070ti | 13.9 | 27.7 | 50.2% |

Note that these results were measured using factory clocks and are therefore slower than what miners are used to seeing. I should point out that comparing against BIOSes tuned for Ethash is not valid, as one typically increases the memory clock while underclocking and undervolting the core. In ProgPOW, one needs to overclock the core to get higher compute performance.

The results show that Nvidia and AMD both drop about 50% in performance on ProgPOW. This is to be expected, as memory accesses are now 32 bytes as opposed to 16 bytes, requiring double the memory bandwidth.
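As a quick sanity check on the "about 50%" figure, the ratios can be recomputed directly from the table. The sketch below only uses the numbers listed above; the unit (MH/s) and the helper names `relative_perf` and `vendor_average` are my own assumptions for illustration, not part of the original benchmark.

```python
# Hash rates copied from the table above (assumed to be MH/s).
# Each entry: card -> (vendor, ProgPoW rate, Ethash rate).
RATES = {
    "RX470":  ("AMD",    9.7,  20.6),
    "RX480":  ("AMD",    11.4, 21.0),
    "RX580":  ("AMD",    12.7, 21.0),
    "1060":   ("Nvidia", 10.2, 20.6),
    "1060ti": ("Nvidia", 14.8, 26.0),
    "1070":   ("Nvidia", 13.7, 27.7),
    "1070ti": ("Nvidia", 13.9, 27.7),
}


def relative_perf(progpow: float, ethash: float) -> float:
    """ProgPoW hash rate as a percentage of the Ethash hash rate."""
    return 100.0 * progpow / ethash


def vendor_average(vendor: str) -> float:
    """Unweighted mean of the relative performance for one vendor's cards."""
    values = [relative_perf(p, e) for v, p, e in RATES.values() if v == vendor]
    return sum(values) / len(values)


for card, (vendor, progpow, ethash) in RATES.items():
    print(f"{card:7s} {vendor:7s} {relative_perf(progpow, ethash):5.1f}%")

for vendor in ("AMD", "Nvidia"):
    print(f"{vendor} average: {vendor_average(vendor):.1f}%")
```

On these seven cards the unweighted averages land within a few points of each other, with AMD slightly ahead, which is the opposite of what a design tuned against AMD would be expected to show.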

I will reiterate that the claim in Linzhi's post that ProgPOW favors Nvidia over AMD is wrong. I would also urge Linzhi to do their homework and check the facts before posting incorrect information.

2. Misleading claim that ProgPOW is easy to implement in an ASIC

Buried inside the article is this misleading claim from Linzhi:

The random instruction of EIP 1057 increases die cost/power by about 1%, and causes a die increase of <1mm². The proposed open design is demonstrating a logic-only performance of 1.2 GHash at 30W and could be deployed by Bitmain or Innosilicon, resulting in a machine with about half the hashrate of their predecessors, similar to the best GPUs.

These statements are completely incorrect for the following reasons.

While it is true that the ProgPOW logic is small and easy to design, Linzhi completely ignores the fact that the performance limitation of a custom ASIC still remains the required memory bandwidth and the resulting logic required for caching and performance.

For ProgPOW, sequential accesses to memory are the performance-limiting factor. These operations are driven from a 16 kB cache, which would be needed in the math block to keep the pipeline full and the performance optimal. This would add area (depending on the node/technology chosen) which is not included in the original design.

Take this GitHub cache load example:

Light DAG init done
Full DAG init done
ProgPoW version 0.9.3
Block 30000
Digest 6018c151b0f9895ebe44a4ca6ce2829e5ba6ae1a68a4ccd05a67ac01219655c1
Result 34d8436444aa5c61761ce0bcce0f11401df2eace77f5c14ba7039b86b5800c08
DAG 64 loads, 16384 bytes
Cache 11264 loads, 45056 bytes
Merge 33792 total (8192 7168 10240 8192)
Math 18432 total (0 2048 0 5120 5120 0 0 0 4096 0 2048)
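The counters in this output can be cross-checked with a few lines of arithmetic. The sketch below simply copies the numbers from the log; the variable names are mine, and it makes no claim about how the reference implementation derives them.

```python
# Counters copied verbatim from the sample output above.
dag_loads, dag_bytes = 64, 16384
cache_loads, cache_bytes = 11264, 45056
merge_total, merge_parts = 33792, (8192, 7168, 10240, 8192)
math_total = 18432
math_parts = (0, 2048, 0, 5120, 5120, 0, 0, 0, 4096, 0, 2048)

# The per-operation breakdowns should add up to the reported totals.
merge_consistent = sum(merge_parts) == merge_total
math_consistent = sum(math_parts) == math_total

# Average size of each kind of load.
bytes_per_dag_load = dag_bytes // dag_loads        # wide DAG reads
bytes_per_cache_load = cache_bytes // cache_loads  # narrow cache reads

# How many cache loads occur for every DAG load and every math op.
cache_loads_per_dag_load = cache_loads // dag_loads
cache_loads_per_math_op = cache_loads / math_total

print(f"merge breakdown consistent: {merge_consistent}")
print(f"math breakdown consistent:  {math_consistent}")
print(f"bytes per DAG load:         {bytes_per_dag_load}")
print(f"bytes per cache load:       {bytes_per_cache_load}")
print(f"cache loads per DAG load:   {cache_loads_per_dag_load}")
print(f"cache loads per math op:    {cache_loads_per_math_op:.2f}")
```

In this sample, each DAG load moves 256 bytes while each cache load moves only 4 bytes, and there are 176 cache loads for every DAG load, or roughly six cache loads for every ten math operations.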

The cache loads are very significant and play a major role in PPA (performance, power and area). This fact was noted by someone who responded to your Medium post. However, the reply was somehow removed from public viewing. Luckily, it was captured on Reddit in this counterpoint refuting Linzhi's ProgPOW post.

From Medium: Sarah Osbourne reply to Linzhi

The first major thing you omitted in your attack on the insignificant ALU is the size of the register file relative to the ALU. A 12 kiB register file will require almost 400,000 gates just for storing the bits. This completely dwarfs the ‘large’ multiplier at only 20,000 gates.

The most important thing that you have conveniently omitted are the 12 completely random cache operations per loop. Note that there are only 20 math operations per loop, so the cache operations are extremely significant. The cost of integrating a 16 kiB SRAM would be at least 500,000 gates just for storing the bits. You would need a 12-port SRAM just to service these requests for one pipelined processing element (which would destroy your simple register file idea), and that is completely ignoring address conflicts and many more problems. If you want anything even remotely resembling the heavily banked SRAMs in GPUs with all their advanced request conflict resolution and so on, then it’s going to take a lot of engineering effort. You might as well become a GPU manufacturer at this point.
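The gate counts in the quoted reply can be put in perspective with a back-of-the-envelope calculation. The sketch below takes the quoted figures at face value (12 KiB register file, 16 KiB SRAM, 400,000 / 500,000 / 20,000 gates) and derives the gates-per-bit ratio they imply. That ratio is only what the quote implies, not a property of any particular cell library, and the function name `storage_bits` is mine.

```python
BITS_PER_BYTE = 8
BYTES_PER_KIB = 1024

# Figures taken from the quoted reply.
REGISTER_FILE_KIB = 12
CACHE_KIB = 16
REGISTER_FILE_GATES = 400_000
CACHE_GATES = 500_000
MULTIPLIER_GATES = 20_000


def storage_bits(kib: int) -> int:
    """Number of bits that must be stored for a memory of the given size."""
    return kib * BYTES_PER_KIB * BITS_PER_BYTE


register_file_bits = storage_bits(REGISTER_FILE_KIB)
cache_bits = storage_bits(CACHE_KIB)

# Gates per stored bit implied by the quoted gate counts.
rf_gates_per_bit = REGISTER_FILE_GATES / register_file_bits
cache_gates_per_bit = CACHE_GATES / cache_bits

# Storage cost relative to the 32-bit multiplier.
rf_vs_multiplier = REGISTER_FILE_GATES / MULTIPLIER_GATES
cache_vs_multiplier = CACHE_GATES / MULTIPLIER_GATES

print(f"register file: {register_file_bits} bits, "
      f"{rf_gates_per_bit:.2f} gates/bit, {rf_vs_multiplier:.0f}x multiplier")
print(f"cache SRAM:    {cache_bits} bits, "
      f"{cache_gates_per_bit:.2f} gates/bit, {cache_vs_multiplier:.0f}x multiplier")
```

Both quoted numbers work out to roughly four gates per stored bit, and to storage that is 20 to 25 times larger than the multiplier, before any multi-porting or banking logic is counted.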

Posting such misleading claims does not help Linzhi's credibility in the Ethereum community. Some people would view this as a deliberate attempt to sabotage ProgPOW in order to protect your Linzhi Ethash miner, which would be bricked by a change to ProgPOW.

Let the hardware auditors and the experts in the community present their findings based on facts and then we can discuss why we agree or disagree.

Why can’t we all be friends and have FUN like SpongeBOB instead of spreading FUD like Plankton?