My technical take on ProgPow's weakest link

jcyr · March 28, 2019, 4:15pm

In theory sure. In practice I doubt you could measure a difference. You’re talking about a few extra instructions to do an indexed load from a branch table, then a jump to the entry’s content.

Also do we know that Nvidia’s just-in-time nvrtc compiler does as good a job optimizing as the command-line compiler nvcc?

gcolvin · March 28, 2019, 4:39pm

… but I think I get from your argument that it improves performance on GPUs enough that, if it is possible, miners will do it, so there is no point making it unnecessary. Do I understand you?

Anlan · March 28, 2019, 4:57pm

@gcolvin that is exactly what I meant

Anlan · March 28, 2019, 5:25pm

According to the CUDA documentation, the “extra” -O CLI argument supplied by nvcc affects only “host” code (NVIDIA CUDA Compiler Driver).
NVCC is meant to do a lot more than simply compile a device kernel. It links host and device code into a fatbinary (so you can invoke kernels using <<<>>>), allows profiling of code, allows mixed use of host and device context, etc.
But for PTX generation both nvcc and nvrtc produce the very same output (tested).

Anlan · March 28, 2019, 5:50pm

Not true. Remove conditionals, remove index increments, remove everything not necessary, use inline asm (when possible) during the immutability period and you easily gain hashes and hashes per second from kernels. But you should already know that having contributed to ethminer for so long and having witnessed all the small adjustments that have brought ethminer to be on par with claymore (at least on CUDA).
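To make the claim concrete, here is a minimal CPU-side sketch of the difference between a generic kernel and one specialized for a single period. The names (Step, run_generic, run_specialized, kPeriodSteps) and the step values are made up for illustration and are not taken from ethminer or ProgPoW; the sketch shows only which work disappears, not how much time it saves on a GPU.

```cpp
#include <cstdint>

enum class Op : uint8_t { Add, Xor, Mul };

struct Step {
    Op op;
    uint32_t operand;
};

// Generic kernel: one loop that interprets whatever step list it is given.
// Every step pays for an index increment, a loop test, and a branch on the op.
uint32_t run_generic(const Step* steps, int count, uint32_t x) {
    for (int i = 0; i < count; ++i) {
        switch (steps[i].op) {
            case Op::Add: x += steps[i].operand; break;
            case Op::Xor: x ^= steps[i].operand; break;
            case Op::Mul: x *= steps[i].operand; break;
        }
    }
    return x;
}

// Hypothetical step list for one period (values are invented).
static const Step kPeriodSteps[3] = {
    {Op::Add, 5u}, {Op::Xor, 0xFFu}, {Op::Mul, 3u}};

// Specialized kernel for that same period: the loop, the index, and the
// conditionals are gone, and the operands are baked in as constants.
uint32_t run_specialized(uint32_t x) {
    x += 5u;
    x ^= 0xFFu;
    x *= 3u;
    return x;
}
```

Both functions compute the same value for the same input; the specialized one simply has nothing left for the compiler to branch on.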

jcyr · March 28, 2019, 8:14pm

That’s all very speculative.

What you’re saying is that:

invoke_super_optimized_inlined_sequence_for_period(n);

will be significantly faster than:

switch (period) {
  case n:
     invoke_super_optimized_inlined_sequence_for_period(n);
     break;
  case n + 1:
     invoke_super_optimized_inlined_sequence_for_period(n + 1);
     break;
  case n + 2:
     invoke_super_optimized_inlined_sequence_for_period(n + 2);
     break;
  case n + 3:
     invoke_super_optimized_inlined_sequence_for_period(n + 3);
     break;
  ...
}

I really don’t think it would be perceptibly slower for any given n. For a switch statement whose selector is a sequential integer, a compiler will generate an indexed load from a static table of branch addresses, followed by a jump to that address, and a branch out of the switch at the end. Not much overhead considering the amount of stuff inside each sequence, and the same whiz-bang optimizations can be applied in each case. Hence my earlier suggestion to limit the choice to 32 possible sequences. Such things have been tried in other PoW algorithms, but I don’t have any data about how effective it was, so that’s also very speculative.
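As a rough illustration of that lowering, the sketch below writes the same dispatch twice: once as a dense switch and once as a hand-built branch table. The four sequence functions are hypothetical stand-ins for the per-period sequences, and this is host-side C++, so it says nothing about how the cost compares inside an actual GPU kernel.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the per-period optimized sequences.
// Each mixes its input differently so the results are distinguishable.
static uint32_t sequence0(uint32_t x) { return x * 33u + 1u; }
static uint32_t sequence1(uint32_t x) { return (x << 3) ^ 0x9E3779B9u; }
static uint32_t sequence2(uint32_t x) { return ~x + 7u; }
static uint32_t sequence3(uint32_t x) { return (x >> 1) | 0x80000000u; }

// Dense switch on a small integer selector. Compilers commonly lower this
// to an indexed load from a table of addresses followed by a jump.
uint32_t dispatch_switch(uint32_t period, uint32_t x) {
    switch (period % 4u) {
        case 0: return sequence0(x);
        case 1: return sequence1(x);
        case 2: return sequence2(x);
        default: return sequence3(x);
    }
}

// The same dispatch with the branch table written out explicitly.
uint32_t dispatch_table(uint32_t period, uint32_t x) {
    using Fn = uint32_t (*)(uint32_t);
    static const Fn table[4] = {sequence0, sequence1, sequence2, sequence3};
    return table[period % 4u](x);
}
```

The per-call overhead in either form is one table lookup and one indirect jump, independent of how much work each sequence does.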

Of course it would be impractical to do this with a large set of choices. I’m just trying to give a sense of the actual performance degradation you speak of.

Anyway, I think this has been sufficiently flogged and I will leave it at that. It is likely too late to journey into even more unknown territory anyway!

gcolvin · March 28, 2019, 8:54pm

Agreed, reluctantly. Doing self-modifying code by printing out text and running it back through a compiler at runtime is usually an anti-pattern, but one I should have complained about a long time ago.
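For readers unfamiliar with the pattern being criticized, here is a toy sketch of "printing out text" for a kernel: a function that derives a short sequence of operations from the period number and emits it as source code, which would then be handed to a runtime compiler. Everything here is an assumption for illustration: the xorshift PRNG, the operation set, and the names generate_kernel_source and mix are invented and are not ProgPoW's actual generator.

```cpp
#include <cstdint>
#include <string>

// Toy xorshift32 PRNG, used only to make the output depend on the period.
// This is NOT the generator ProgPoW uses.
static uint32_t next_random(uint32_t& state) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// Emit source text for a hypothetical function "mix" whose body is a
// period-dependent sequence of operations with constant operands.
std::string generate_kernel_source(uint32_t period, int steps) {
    static const char* const ops[] = {"+", "^", "*"};
    uint32_t state = period + 1u;  // xorshift must not be seeded with zero
    std::string src = "uint32_t mix(uint32_t x) {\n";
    for (int i = 0; i < steps; ++i) {
        uint32_t r = next_random(state);
        src += "    x = x ";
        src += ops[r % 3u];
        src += " " + std::to_string(r >> 8) + "u;\n";
    }
    src += "    return x;\n}\n";
    return src;
}
```

The same period always yields the same text, so every miner and verifier would agree on the program; the fragility discussed here comes from the extra step of compiling that text at runtime.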

lightuponlight · March 29, 2019, 2:13pm

Yes agree, especially when you have little control over the compiler.

ProgPoW seems to add a huge amount of complexity to the mining algorithm with many new added dependencies. This looks like a recipe for lowered reliability and perhaps even potential network attacks.

Whether it will end up showing unreliability in actual operation is certainly unclear, but the large increase in complexity, layered components and dynamic behavior makes it a lot more likely than with the existing hash algo.

gcolvin · March 29, 2019, 3:04pm

I think it’s worth considering whether we can reduce the complexity without reducing the security, but somebody (and @ifdefelse might not want to volunteer) has to do the work, and I’m not seeing this as a showstopper.

gcolvin · March 29, 2019, 3:08pm

How hard would it be to back off to EthHash temporarily if a problem is found?

jcyr · March 29, 2019, 8:32pm

There already exists a miner that can do both Ethash and Progpow. However, there are no provisions in the protocol for signaling the choice of algorithm. Can’t speak for the nodes.

gcolvin · March 30, 2019, 12:04am

I don’t know that it would need to be in the protocol.
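One possible reading of this is that the algorithm could be chosen from the block number alone, the way hard forks are usually scheduled, so nothing needs to be signaled between nodes and miners. The sketch below assumes that reading; ForkSchedule, its field names, and algo_for_block are hypothetical, and any block numbers plugged in would be placeholders rather than real network parameters.

```cpp
#include <cstdint>

enum class PowAlgo { Ethash, ProgPoW };

// Hypothetical fork schedule. A rollback block of UINT64_MAX means
// "no rollback planned".
struct ForkSchedule {
    uint64_t progpow_activation_block;  // first block sealed with ProgPoW
    uint64_t progpow_rollback_block;    // first block back on Ethash
};

// Pick the proof-of-work algorithm for a block purely from its number.
PowAlgo algo_for_block(const ForkSchedule& s, uint64_t block_number) {
    if (block_number >= s.progpow_activation_block &&
        block_number < s.progpow_rollback_block) {
        return PowAlgo::ProgPoW;
    }
    return PowAlgo::Ethash;
}
```

Under this scheme, backing off to Ethash would amount to shipping a client release that sets the rollback block, which is itself a coordinated fork.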

ifdefelse · March 30, 2019, 9:43am

Appreciate the technical discussion here. We have thought through many other options. In the end, the increased complexity in data manipulation and code generation was the best way to ensure hardware architecture affinity.

Furthermore, I’d like to echo an earlier comment that is very important when considering bugs that might break things. We’re always dependent on some party’s driver software or firmware - whether it is a cryptocurrency-ASIC or a GPU-ASIC. With bad software/firmware updates, hardware can be taken offline. Consider this: is it better to trust a party with a vested interest in getting the driver software/firmware right at enormous mission-critical scales and which is audited/tested every instant by an independent global computing ecosystem? Or, is it better to trust the alternative parties whose interests are profiting from mining hardware, who are historically well-known for backdoors and a lack of transparency, and naturally have a much smaller ecosystem of users?

gcolvin · March 30, 2019, 7:46pm

Thus my reluctant agreement, @ifdefelse. The code generation introduces some weakness and complexity, but I am not sure that adequate security can be had without some sort of code generation, and starting into a redesign now seems a bigger risk. And I’m hearing that any problems that arise can be easily mitigated.

But some discussion of whether we can simplify things safely in a future upgrade may be worthwhile after the current storm has passed.

And, thankfully, I don’t think the GPU over ASIC arguments are relevant here. We are looking for weaknesses with an eye to correcting or mitigating them. If this is indeed the weakest link, ProgPoW is looking pretty good.

ifdefelse · April 1, 2019, 3:40am

Appreciate the technical discussion here. We have thought through many other options. In the end, the increased complexity in data manipulation and code generation was the best way to ensure hardware architecture affinity.

Furthermore, I’d like to echo an earlier comment that is very important when considering bugs that might break things. We’re always dependent on some party’s driver software or firmware. With bad software/firmware updates, hardware can be taken offline. Consider this: Is it better to trust a party with a vested interest in getting the driver software/firmware right at enormous mission-critical scales and which is audited/tested every instant by an independent global computing ecosystem? Or, is it better to trust the alternative parties whose interests are profiting from mining hardware, and which have a much smaller ecosystem of users?

Anlan · April 1, 2019, 7:46am

Protocol can be easily tweaked to signal mining algo.
See EIPs/EIPS/eip-1571.md at master · ethereum/EIPs · GitHub