My technical take on ProgPow's weakest link

progpow

#41

In theory sure. In practice I doubt you could measure a difference. You’re talking about a few extra instructions to do an indexed load from a branch table, then a jump to the entry’s content.

Also do we know that Nvidia’s just-in-time nvrtc compiler does as good a job optimizing as the command-line compiler nvcc?


#42

… but I think I get from your argument that it improves performance on GPUs enough that, if it is possible, miners will do it, so there is no point in making it unnecessary. Do I understand you?


#43

@gcolvin that is exactly what I meant


#44

According to CUDA documentation the “extra” -O cli argument supplied by nvcc has effect only on “host” code (https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#options-for-altering-compiler-linker-behavior-optimize).
NVCC is meant to do a lot more than simply compile a device kernel. It links fatbinary host and device code (so you can invoke kernels using <<<>>>), allows profiling of code, allows mixed use of host and device context, etc.
But for ptx generation both nvcc and nvrtc produce the very same output (tested).


#45

Not true. Remove conditionals, remove index increments, remove everything not necessary, use inline asm (when possible) during the immutability period and you easily gain hashes and hashes per second from kernels. But you should already know that having contributed to ethminer for so long and having witnessed all the small adjustments that have brought ethminer to be on par with claymore (at least on CUDA).


#46

That’s all very speculative

What you’re saying is that:

invoke_super_optimized_inlined_sequence_for_period(n);

will be significantly faster than:

switch (period) {
  case n:
    invoke_super_optimized_inlined_sequence_for_period(n);
    break;
  case n + 1:
    invoke_super_optimized_inlined_sequence_for_period(n + 1);
    break;
  case n + 2:
    invoke_super_optimized_inlined_sequence_for_period(n + 2);
    break;
  case n + 3:
    invoke_super_optimized_inlined_sequence_for_period(n + 3);
    break;
  ...
}

I really don’t think it would be perceptibly slower for any given n. For a switch statement whose selector is a sequential integer, a compiler will generate an indexed load from a static table of branch addresses, followed by a jump to that address, and a branch out of the switch at the end. That is not much overhead considering the amount of stuff inside each sequence, and the same whiz-bang optimizations can be applied in each case. Hence my earlier suggestion to limit the choice to 32 possible sequences. Such things have been tried in other PoW algorithms, but I don’t have any data about how effective it was, so that’s also very speculative.
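To make the comparison concrete, here is a minimal host-side C++ sketch of the switch-based dispatch described above. The four `seqN` functions are hypothetical stand-ins for the per-period inlined sequences (real ones would be generated and much larger), and `run_period` is an illustrative name, not anything from the ProgPoW code. Only four cases are shown instead of 32 to keep it short.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left; r must be in 1..31 so neither shift is by 32 bits.
static inline uint32_t rotl32(uint32_t x, unsigned r) {
    assert(r > 0 && r < 32);
    return (x << r) | (x >> (32 - r));
}

// Hypothetical stand-ins for the per-period inlined sequences.
static inline uint32_t seq0(uint32_t x) { return rotl32(x, 5) ^ 0x9E3779B9u; }
static inline uint32_t seq1(uint32_t x) { return (x * 33u) + 0x7F4A7C15u; }
static inline uint32_t seq2(uint32_t x) { return rotl32(x ^ 0x85EBCA6Bu, 13); }
static inline uint32_t seq3(uint32_t x) { return (x + 0xC2B2AE35u) * 5u; }

// Dispatch on a dense, contiguous selector. With contiguous case values a
// compiler is free to emit a jump table (or a short compare chain when there
// are only a few cases); the selected body is the same code either way.
uint32_t run_period(uint32_t period, uint32_t x) {
    switch (period % 4u) {
        case 0:  return seq0(x);
        case 1:  return seq1(x);
        case 2:  return seq2(x);
        default: return seq3(x);
    }
}
```

The dispatch cost is paid once per call, so whether it matters depends on how much work sits inside each sequence. Measuring that on real kernels is the only way to settle the disagreement in this thread; the sketch only shows the shape of the code being argued about.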

Of course it would be impractical to do this with a large set of choices. I’m just trying to give a sense of the actual performance degradation you speak of.

Anyway, I think this has been sufficiently flogged and I will leave it at that. It is likely too late to journey into even more unknown territory anyway!


#47

Agreed, reluctantly. Doing self-modifying code by printing out text and running it back through a compiler at runtime is usually an anti-pattern, but one I should have complained about a long time ago.
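For readers unfamiliar with the pattern being criticized, the following is a rough C++ sketch of "printing out text" for a period-specific routine. Everything here is an assumption made for illustration: `generate_source`, the four-entry operation list, and the LCG used to pick operations are not taken from the ProgPoW specification. In a real miner the resulting string would then be handed to a runtime compiler such as nvrtc.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical generator: builds the source text of a small mixing function
// whose operation sequence depends on the period number.
std::string generate_source(uint64_t period, int num_ops) {
    assert(num_ops > 0);
    // Illustrative operation menu, not the ProgPoW one.
    static const char* const ops[] = {"a + b", "a * b", "a ^ b", "a & b"};

    uint64_t state = period;  // seed the PRNG from the period
    std::string src = "// period " + std::to_string(period) + "\n";
    src += "uint32_t mix(uint32_t a, uint32_t b) {\n";
    for (int i = 0; i < num_ops; ++i) {
        // 64-bit LCG step; the top two bits select one of the four ops.
        state = state * 6364136223846793005ull + 1442695040888963407ull;
        src += "    a = ";
        src += ops[state >> 62];
        src += ";\n";
    }
    src += "    return a;\n}\n";
    return src;
}
```

The output is deterministic for a given period, which is what lets every miner and verifier agree on the program. The reliability concern raised in the thread is about the next step, where that text goes through a compiler the miner does not control.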


#48

Yes agree, especially when you have little control over the compiler.

ProgPoW seems to add a huge amount of complexity to the mining algorithm with many new added dependencies. This looks like a recipe for lowered reliability and perhaps even potential network attacks.

Whether it will end up showing unreliability in actual operation is certainly unclear, but the large increase in complexity, layered components and dynamic behavior makes it a lot more likely than with the existing hash algo.


#49

I think it’s worth considering whether we can reduce the complexity without reducing the security, but somebody (and @ifdefelse might not want to volunteer) has to do the work, and I’m not seeing this as a showstopper.


#50

How hard would it be to back off to Ethash temporarily if a problem is found?


#51

There already exists a miner that can do both Ethash and ProgPoW. However, there are no provisions in the protocol for signaling the choice of algorithm. Can’t speak for the nodes.


#52

I don’t know that it would need to be in the protocol.
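One possible reading of this, offered only as an assumption: if the algorithm switch (and any temporary fallback) is tied to block height, the way a fork is, then miners and nodes can each derive the algorithm locally without a signaling message. A minimal C++ sketch of that idea follows; `PowSchedule`, its field names, and `algo_for_block` are hypothetical and do not come from any client.

```cpp
#include <cassert>
#include <cstdint>

enum class PowAlgo { Ethash, ProgPoW };

// Hypothetical schedule; heights would be agreed on out of band, like a fork.
struct PowSchedule {
    uint64_t progpow_from;   // first block sealed with ProgPoW
    uint64_t fallback_from;  // first block that reverts to Ethash;
                             // UINT64_MAX means no fallback is scheduled
};

// Pick the algorithm for a block purely from its height.
PowAlgo algo_for_block(const PowSchedule& s, uint64_t block) {
    assert(s.progpow_from <= s.fallback_from);
    if (block >= s.progpow_from && block < s.fallback_from) {
        return PowAlgo::ProgPoW;
    }
    return PowAlgo::Ethash;
}
```

A height-based rule like this still requires a coordinated client release to set `fallback_from`, so it does not make a fallback instant; it only removes the need for a wire-level signal.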


#53

Appreciate the technical discussion here. We have thought through many other options. In the end, the increased complexity in data manipulation and code generation was the best way to ensure hardware architecture affinity.

Furthermore, I’d like to echo an earlier comment that is very important when considering bugs that might break things. We’re always dependent on some party’s driver software or firmware, whether it is a cryptocurrency-ASIC or a GPU-ASIC. With bad software/firmware updates, hardware can be taken offline. Consider this: is it better to trust a party with a vested interest in getting the driver software/firmware right at enormous mission-critical scales and which is audited/tested every instant by an independent global computing ecosystem? Or is it better to trust the alternative parties whose interests are profiting from mining hardware, who are historically well-known for backdoors and a lack of transparency, and naturally have a much smaller ecosystem of users?


#54

Thus my reluctant agreement, @ifdefelse. The code generation introduces some weakness and complexity, but I am not sure that adequate security can be had without some sort of code generation, and starting into a redesign now seems a bigger risk. And I’m hearing that any problems that arise can be easily mitigated.

But some discussion of whether we can simplify things safely in a future upgrade may be worthwhile after the current storm has passed.

And, thankfully, I don’t think the GPU over ASIC arguments are relevant here :) We are looking for weaknesses with an eye to correcting or mitigating them. If this is indeed the weakest link, ProgPoW is looking pretty good.


#55

Appreciate the technical discussion here. We have thought through many other options. In the end, the increased complexity in data manipulation and code generation was the best way to ensure hardware architecture affinity.

Furthermore, I’d like to echo an earlier comment that is very important when considering bugs that might break things. We’re always dependent on some party’s driver software or firmware. With bad software/firmware updates, hardware can be taken offline. Consider this: Is it better to trust a party with a vested interest in getting the driver software/firmware right at enormous mission-critical scales and which is audited/tested every instant by an independent global computing ecosystem? Or, is it better to trust the alternative parties whose interests are profiting from mining hardware, and which have a much smaller ecosystem of users?


#56

The protocol can easily be tweaked to signal the mining algo.
See https://github.com/ethereum/EIPs/blob/master/EIPS/eip-1571.md