ProgPoW Audit Delay Issue

Beacon chain is fun because there exists no EIP for that. I’m not even sure how we would standardize something like that, it’s basically a new thing. Do we need a new EIP process? That rabbithole is looking pretty deep…

Anyways, I had a suggestion to add tags on EIPs requesting particular specialized review by subsets of the community. I think the eventual plan was to align these tags with FEM rings that maintain channels of communication and discuss the relevant proposals on something like a monthly basis. Not sure that makes sense to me any more, but it might get the juices flowing on how we make something like this happen.

@souptacular and Ethereum core devs,

ePIC Blockchain would like to help with the ProgPOW hardware audit.

We are former engineers with GPU and mobile SoC semiconductor firms with over 120 years of combined experience designing and manufacturing GPU, mobile SoC and various ASIC’s.

Our team consists of key architects and designers who have worked on 8 generations of GPU’s, the shader core for an upcoming game console, as well as several crypto ASIC’s. Our supply chain team has put into production 100’s of millions of ASIC’s and GPU’s for ATI/AMD and Qualcomm. Our foundry experience and contacts range across TSMC, GF, Samsung and others through 7nm to 65nm nodes.

We would be pleased to work with the Ethereum community in the ProgPOW audit with the neutrality that is needed. Who better to conduct the audit than a team that builds both GPU and ASIC’s and has the full ASIC supply chain and foundry expertise?

Be sure to check out our website at www.epicblockchain.io. Also, if any of you are in Toronto for the @ScalingETH Dev Workshop next week and want to meet to discuss ASIC’s or what the hotspots in Toronto are, drop me a PM.

Cheers, Henry

PS. I should also point out that ePIC makes ASIC’s and we do not mine.

3 Likes

Hi, I’ve engineered both GPU and ASIC miners, and I’ve also operated large-scale CPU and medium-sized GPU farms. Together with 7400 and Salt, we released an open-source ASIC miner for Cryptonight Classic that was 5x better H/J than Bitmain’s X1, while using only 28nm. Recently, the Monero PoW team invited me to review RandomX, and I’m also releasing new technical work on Cuckoo Cycle.

Our hardware team at altASIC has done an initial review of ProgPoW and posted our comments to their GitHub issue. We’re unwilling to go on record as to whether ProgPoW will meet its objectives of keeping ASIC’s to within 2x and thereby being considered more ASIC-resistant than Ethash. Such determination would require more implementation work than we are willing to commit, but we call attention to the published design by Sonia Chen at Linzhi, also referenced in the ProgPoW issue.

I’ve cross-posted our comments here from the ProgPoW issue, and hope they’re helpful to the Ethereum community:

Overall, ProgPoW looks to be a genuine, professional attempt by GPU-interested parties. That is to say, we found no obvious backdoors or any suggestion that ProgPoW is anything but an honest attempt at ASIC resistance.

The inner loop does try to cover the shader datapaths pretty well, but obviously GPU’s without a unified texture/L1 architecture will waste some texture area, and all geometry pipelines go unused. Also, ProgPoW is strictly integer math, while GPU’s predominantly focus on float performance, so that overlap is also less than 100%. However, we are not GPU insiders and cannot quantify the GPU die area that would go unused in ProgPoW.

We do point out that while GPU’s are not especially good at integer multiplication and are outright bad at bit operations, five of eleven random math operations in ProgPoW are bitops and two are multiplies. For Nvidia, bitops take 2 cycles each, the same as addition, and multiplies are so slow the official docs say only “multiple cycles.” In ASIC’s, bit operations especially can run considerably faster.

We suspect a VLIW architecture may help exploit this, by combining multiple instructions into a bundle that can be computed in fewer clock cycles than each instruction individually. If we group the 11 operations into three categories: bitops, adds, and muls (also rot), then our slow-seed compiler can generate instructions like bitop-muladd-bitop that frequently match branches of the abstract syntax tree and run in far less than the 8+ cycles this would take on a GPU. The timings and dependencies of instructions may be precalculated by the compiler, such that no on-chip sequencing logic is necessary. Also, the set of VLIW instructions may be generated from the distribution of the program space, and this distribution may also inform the number of compute instances for each instruction. There would be many bitop-bitop units for example, and fewer bitop-add-mul-add-bitop units, efficiently matching transistor count to the frequency of each op sequence.
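To make the unit-allocation idea above a bit more concrete, here is a minimal sketch. It is not our compiler and not ProgPoW code: the op names, their grouping into categories, the uniform op selection, and the treatment of a program as a flat sequence (ignoring real data dependencies in the syntax tree) are all simplifying assumptions for illustration. It samples random op sequences, counts how often each category bundle occurs, and splits a budget of hardware units in proportion to those counts.

```python
from collections import Counter
import random

# ASSUMPTION: illustrative op names and category grouping, chosen to match
# "five bitops, two multiplies" from the post; not taken from any spec text here.
CATEGORY = {
    "and": "bitop", "or": "bitop", "xor": "bitop", "clz": "bitop", "popcount": "bitop",
    "mul": "mul", "mul_hi": "mul",
    "add": "add", "min": "add",
    "rotl": "rot", "rotr": "rot",
}


def bundle_histogram(num_programs, ops_per_program, bundle_len, seed=0):
    """Count how often each category sequence of length bundle_len appears
    across randomly generated op sequences (uniform choice over the ops)."""
    rng = random.Random(seed)
    ops = sorted(CATEGORY)
    counts = Counter()
    for _ in range(num_programs):
        program = [CATEGORY[rng.choice(ops)] for _ in range(ops_per_program)]
        for i in range(len(program) - bundle_len + 1):
            counts[tuple(program[i:i + bundle_len])] += 1
    return counts


def allocate_units(counts, total_units):
    """Split a hypothetical budget of compute units in proportion to how
    frequently each bundle shows up in the sampled programs."""
    total = sum(counts.values())
    return {bundle: round(total_units * n / total) for bundle, n in counts.items()}


hist = bundle_histogram(num_programs=200, ops_per_program=18, bundle_len=2)
units = allocate_units(hist, total_units=1000)
```

With five of eleven ops being bitops, bitop-bitop bundles dominate the histogram and get the most units, while mul-mul bundles are rare. A real design would walk the actual dependency graph of each generated program rather than adjacent ops.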

These gains all together may or may not give a 2x speedup, and we can’t say without deeper analysis.

Overall we think ProgPoW is a good try, probably the best anti-ASIC attempt so far. It is relatively simple and straightforward, and professionally designed and documented, yet we remain uncertain of its chances for keeping the ASIC gap to 2x.

3 Likes

Actual ethash suffers from the same waste.

FP operations imply other kinds of problems.
I quote @shemnon from the gitter channel:
“Floating point precision is different between GPUs (and vendors and versions) and CPUs. Java has a strictfp keyword just to deal with that. EWASM has banned it from their specification for similar reasons. Integer math provides portable exactness.”
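A small sketch of the exactness point in the quote. It does not reproduce a GPU-vs-CPU mismatch. It only shows that float results depend on evaluation order, which is one way different compilers or hardware can end up disagreeing, while 32-bit wrapping integer math gives the same answer in any order. The helper name `add_u32` and the sample values are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep results in the unsigned 32-bit range


def add_u32(a, b):
    """Wrapping 32-bit unsigned addition (hypothetical helper)."""
    return (a + b) & MASK32


# Float addition is not associative: regrouping changes the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
floats_agree = left == right

# Wrapping integer addition is associative: regrouping never changes the result.
x, y, z = 0xDEADBEEF, 0x12345678, 0xCAFEBABE
ints_agree = add_u32(add_u32(x, y), z) == add_u32(x, add_u32(y, z))

print("floats agree:", floats_agree)
print("ints agree:", ints_agree)
```

A hash has to be bit-identical on every verifier, so any operation whose low bits can vary with ordering or rounding needs extra care.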

1 Like

I’m not suggesting that the PoW use floats, and of course I’m well aware of the issues. Actually RandomX does use floats in a CPU-oriented PoW, but they take care to avoid non-normal conditions like underflow and NaN, and they’ve verified implementation consistency with the IEEE standard across Intel, AMD, and ARM, exhaustively validating all possible outputs of their float operators.

My point is that GPU’s are primarily designed for floating point performance, not integers or bitops or flow control. In fact, the only operation that runs in a single cycle on an Nvidia card is a half-precision float muladd. Full precision float multiplies run in just 2 cycles even though their integer multiply counterpart is much slower. GPU’s can do this because you just don’t need exact answers in graphics shaders, only something that’s close enough to look good to a human eye. GPU’s do the best they can in a short time and leave some small errors in the LSB’s of the significand, which is obviously unsuitable for a PoW. PoW’s require exactness and integer arithmetic, and using GPU’s for PoW is like using a hammer to drive a screw. The main thing they do well, float muladds, goes unused.

That is changing recently with the greater emphasis on GPGPU computing, but the focus is really on scientific compute and deep learning, which are both primarily all float ops. There is not much incentive outside crypto for GPU manufacturers to improve integer performance.

Although I want to maintain a neutral stance on ProgPoW vs Ethash, in general I think ASIC resistance is a fool’s errand, and GPU mining is doomed. The ProgPoW team did what they could within the constraints provided, but you can’t turn a boat into a car. Not their fault if it doesn’t work. GPU’s are just suboptimal for PoW computations.

ProgPoW’s hope is that if they do get “close enough” to ASIC performance, the economies of scale in the GPU industry will make smaller ASIC manufacturers unable to compete, but this is an Economy of Scale effect, not a technical difference between ASIC’s and GPU’s. ASIC’s by their very definition will always be faster than general compute units at the task for which the ASIC was designed. The question is whether a GPU’s integer performance is good enough relative to ASIC’s to allow the GPU industry’s economy of scale to make up the difference.

If GPU integer performance sucks very badly then ProgPoW may be worse than Ethash. If GPU’s perform pretty well at integers, then ProgPoW may be better. We’re not willing to speculate without a lot more work.

My personal view on PoW is that there is no such thing as ASIC resistance. Furthermore, GPU’s can jump coins and are easily rented and are therefore less secure than ASIC’s even before you consider hashrates. A PoW needs to be simple to understand and implement, and it needs to be well reviewed by serious cryptographers. IMO Ethereum should dump both ProgPoW and Ethash and use Keccak. But there’s too much GPU-interested politics in Ethereum for that…

This is a double-edged sword that almost everyone points out the bad side of. The good side is that because GPU miners’ hardware is less specialized (they can always sell the hardware at a loss or mine another coin), they will not dig as deep a position against an existential threat to their operations (for example, the upcoming move to PoS and the mining reward reduction when we employ the finality gadget). If PoW mining were the long-term goal of Ethereum, then it would absolutely make the most sense to adopt an ASIC friendly position, because that type of deep hardware investment aligns these actors long-term with the health and security of the current chain. But we are not trying to do this! As a community we are looking to make a move to a completely different consensus mechanism that eliminates the need for any sort of mining over time.

To respond to the specific criticism though, rental attacks hold true for any PoW algorithm, no matter what is considered the most profitable hardware for mining it (CPUs, GPUs, ASICs, etc.). In the general case, if you do not have a supermajority share of the hardware considered to be the most profitable for mining a particular PoW algorithm, you will be at risk of a rental attack. Ethereum currently attracts the largest share of GPUs used for mining. If it were to change its algorithm to something that attracts more specialized hardware that currently mines another coin more profitably, the network would become much more insecure as a result. Therefore changing to another algorithm where the GPU is not the most profitable hardware for mining it would almost certainly lead to a loss of security over what we currently have. It also represents a centralization of power, as these manufacturers are much better coordinated and have larger incentives to disrupt any unprofitable changes to the current mining algorithm. It simply should not be done.

I don’t think it’s controversial to say we should either make a change in an attempt to further ensure ETH is the most valuable coin for GPU miners to mine, or accept that we as a community no longer value ASIC resistance and we prefer to let specialized hardware own our network security from the near future onwards, for better or worse. The latter option will happen by default as it does not require any change to the current technical design of Ethereum, but we should recognize it represents a change to the social contract laid out from the beginning in the Whitepaper. However, deciding to go all in on allowing ASICs may see an exodus of GPU miners as they divest themselves from involvement in our community, so we may see short term insecurity as a result. Failure to make an active decision one way or another represents a win for specialized hardware manufacturers, and invites them to make further investment into our ecosystem with the knowledge that they can win against the community on any dispute much more easily than if we did make an active decision.

Whatever we decide, it should be understood that it may come at the detriment of our future plans for a transition to PoS.

3 Likes

@timolson Have you watched the DevConIV ProgPow talk by @ohgodagirl? I think it addresses most of the issues, apart from the deep dive into floats. I would ask that anyone who is serious about debating ProgPow watch it first. She is Miss If (of If/Def/Else) so her word is gospel when it comes to the authors’ intent.

First, let’s not conflate ASIC-resistance with ASIC-proof. No algorithm is ASIC-proof. What ASIC-resistance aims to do is to resist, not prevent. By making an algorithm that is very well suited to GPGPUs, it makes the economics of producing and maintaining an ASIC unpalatable, short of a sustained tripling of Eth’s all time high. Making GPUs the best device on the market to do it and designing the algorithm so that an ASIC would look like a GPU missing some ports and chips is their method. Miss If admits it is a short term solution and that this resistance has a time limit. The goal is to make that time limit past the Proof of Stake transition.

And that is one of the reasons I think ProgPow should be done. We need to move the cheese before ASICs dominate the mining aspect in the way they have controlled BTC and other cryptocurrencies. The need for a return on investment is a contradictory alignment of interests. By moving to a GPU favorable algorithm the miners who do remain will have devices that are still useful when mining on the original Ethereum mainnet becomes moot, because Serenity is fully activated. If all the miners were on ASICs their incentive would be to sabotage such a transition as their hardware would be useless. But with a GPGPU there is still utility in AI, other cryptos, or even gasp video games.

So there is a community commitment to “move the cheese” once Proof-of-Stake becomes viable. If we flinch now at updating the Proof-of-Work algorithm because the chosen algorithm isn’t ASIC-proof, then there is a reason to doubt if the community will ever be willing to pull the trigger to end the mainnet and let the difficulty bomb fully explode. Remember, we were supposed to already be on a Proof of Stake network by now; the plan has already changed.

6 Likes

FFS I try to post a highly-educated technical review and first get insulted that I don’t even know the basics of floats, and now you think I don’t know who Kristy is or understand that ASIC resistance is a continuum?

We specifically call out the 2x threshold which seems to be a common number used for “resistance” and also say it’s not clear that ProgPoW is an improvement over Ethash (but it’s an honest try.) Really, it could go either way. If you just care about delaying ASIC’s with forks then the PoW doesn’t matter much. Just keep forking to something new every 6 months. If you think ProgPoW will last 12 months then that is purely speculation.

I thought the community would want an expert opinion that didn’t come from the authors. If you’d like to ask for more detail on our assessment I’m happy to oblige, but I have no interest in devolving into a philosophical PoW conversation. I shouldn’t have said anything about my personal view on PoW.

To my knowledge, no one has mentioned or considered a VLIW architecture for ProgPoW before, and it’s a clever approach that will give a nontrivial performance increase, so you’re welcome. I could have sat on this design and made a lot of money selling a ProgPoW ASIC.

I find it strange that no one has yet commented on this VLIW proposal or its implications for ASIC resistance. Instead, you seem most interested in just quoting KLM to me and defending a philosophical position you were already entrenched in… Have you even bothered to add up the gate counts in Linzhi’s design? Hmm? What do you get for your die area calculations?

3 Likes

My intent wasn’t to insult. Many people read these threads so I wanted more available resources. I’m not denying there are clever ways to create new ASICs, some we know about and some that will be invented, that is the nature of the beast. VLIW is one of those possible methods.

My first goal was to clarify the meaning of ASIC resistance vs ASIC proof. Your statement “My personal view on PoW is that there is no such thing as ASIC resistance” contradicts your new claim that you “understand that ASIC resistance is a continuum.” I’ll give you the benefit of the doubt and interpret that when you said “there is no such thing” what you were arguing was that ASIC resistance is too easily overcome to provide meaningful value. We both agree there is no such thing as ASIC proof. The main question is not whether we can delay the entry of ASICs into a ProgPoW ecosystem, but whether the proposed mechanism delays it in a fashion that makes economic sense. A two week delay fixed by a change order is a delay, but a pointless one. Is ProgPow a similarly pointless delay? That’s the answer I’m looking for from the audit.

3 Likes

Sorry, it was my fault to say anything beyond the ProgPoW facts. I shouldn’t be snippy.

When I say ASIC resistance is not possible, it is an opinion that “even economies of scale in GPU’s or CPU’s are not enough to prevent a 2x improvement by ASIC’s in terms of total-cost-per-hash for any given PoW.” I don’t mean ASIC-proof; even a loose “economic resistance” seems too much. And even if 2x is achieved, I do not think that is enough to save GPU’s. But these are personal opinions not ProgPoW facts.

Maybe ProgPoW proves me wrong. I’m open to that. Like I said, I think it’s the best try so far, but I’m not sure if it’s better than Ethash. It’s a good try for sure.

Certainly ProgPoW is much more than an ECO on Ethash. It’s an all-new chip for sure. But the spec has been around long enough that you should expect a great deal of ASIC design work to already be completed. You could see ProgPoW ASIC miners on the network within 4-6 months of a switch to ProgPoW being finalized. I would expect manufacturers to wait for certainty on ProgPoW before taping out, so it’s actually to your benefit to keep alive the possibility of not switching. That’s 4 months from decision time, not from launch time…

For a case study in forking to fight ASIC’s, look at Monero. Their first attempt was useless (quickly ECO’d) but even big changes were overcome by ASIC’s within 6 months or so. The thing currently keeping ASIC’s at bay in Monero is the threat of further imminent forks, not anything to do with the PoW.

3 Likes

A separate point about mining economics:

Many people assume hash-per-joule is the ultimate mining metric, but anyone who has run a mine knows that capital expense is actually a HUGE factor, because the lifetime of mining hardware is so short. Of course the total lifetime cost for a miner is capex + time*opex, so if time is short, capex dominates. It’s a line with capex at time 0, going up with slope opex.

< deleted some wrong speculation about ethash vs progpow in capex vs opex >

ProgPoW will have both a bigger capex and opex cost compared to Ethash, so both the offset and slope of the total cost line will go up. That may seem like ProgPoW is an instant win vs Ethash, but it’s not clear, because maybe ProgPoW ASIC’s improve the slope of the cost line vs GPU’s more than Ethash ASIC’s. Then things depend on the lifetime you choose for the equipment, and the economic-ASIC-resistance lines of the two PoW’s will cross at some point…
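The cost lines described above can be sketched in a few lines. This is only the arithmetic of capex + time*opex and where two such lines cross. The function names and all dollar figures are hypothetical, and a real comparison would normalize everything per unit of hashrate.

```python
def total_cost(capex, opex_per_day, days):
    """Lifetime cost of a miner: a line starting at capex with slope opex."""
    return capex + opex_per_day * days


def break_even_days(capex_a, opex_a, capex_b, opex_b):
    """Day on which the two cost lines cross, or None if they never cross
    at a positive time (parallel lines, or one machine is always cheaper)."""
    if opex_a == opex_b:
        return None
    t = (capex_b - capex_a) / (opex_a - opex_b)
    return t if t > 0 else None


# Hypothetical numbers: machine A is cheap to buy but costly to run,
# machine B is the reverse.
cross = break_even_days(capex_a=1000, opex_a=2.0, capex_b=2000, opex_b=1.0)
print(cross)  # 1000.0 days
```

If the hardware is obsolete well before the crossing day, the cheaper-to-buy machine wins no matter how good the other one’s efficiency is, which is why short lifetimes make capex dominate.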

You need to basically make an ASIC in order to know its power and area well enough to compute the cost lines that would inform any economic-ASIC-resistance determination.

Also, no one should worry about the audit firm dropping the job. It’s a thankless task and honestly very little money. Generally you’d need something like $500k to $1m to do enough work to make specific power and area claims. With the amount you have in the fund, under $20k I believe, you shouldn’t expect any real conclusion on ProgPoW to come out of the audit. Not sure what your specific goals are for that review.

There was more money allocated privately than that donations pool, but you’re right that it was nowhere near $500k. It’s somewhere around 1/4 to 1/3 of that.

This is a very interesting point, but I’m not sure you give a clear explanation for why that is so. At least, not clear to me. Would definitely appreciate if you could make the argument of “capex vs opex” differences in the algorithms, using some of the features of the algorithms to show why those points are true.

At a high level, the opex/capex argument is interesting to me. It would seem to make most sense that ensuring the opex for specialized hardware vs. commodity GPUs is as similar as possible and capex somewhat favors commodity GPUs, with a higher ratio of capex to opex, would ensure that the chosen algorithm has a good chance of resisting specialized hardware production for a reasonable period of time (12-18 months being a target). It would also be interesting to hear your take on what practical ranges of these parameters can be expected and under what time frames they would probably be valid.

1 Like

Let me emphasize the word “might” in all those capex/opex statements. It’s probably not fair for me to conjecture. To explain:

Ethash primarily relies on bandwidth to memory with only very light computation, but ProgPoW adds a lot of computation, which means both extra silicon (capex) and extra power (opex). Actually it probably adds more to the opex side than the capex side, but I don’t know, so I shouldn’t have said anything. But yes they are using more chip, which means more up-front cost (capex), but there will be a lot more power too (opex). I don’t want to speculate whether one or the other will be better for typical miner lifetimes, only to point out the complexity of calculating economic ASIC-resistance.

This is wrong BS by me; don’t believe it. In fact it’s probably the opposite way around. I was thinking only about the silicon, not the power… We won’t know the power requirements of a ProgPoW ASIC without basically prototyping one, like I said above about the audit cost.

Definitely makes it really hard to speculate on what is the best decision here. On one hand, making the change has a non-trivial chance of being worse than Ethash in terms of efficiency by specialized mining hardware, although that is hopefully not the case since we have more data now. On the other hand, doing nothing seems to all but invite manufacturers of that hardware to the table, with whatever incentive misalignments they may bring. The best approach, as you said above, is to create the threat of making a change, but we are a large community that requires a lot of transparency to function, so it would seem unlikely we could fake it for very long. In my mind, the best scenario is therefore to build, test, and integrate ProgPoW, and at least have it on a hair trigger for going live. At that point, you might as well just try the damn thing so you are not seen as bluffing. The more well-researched and implemented options we have at our disposal for alternative mining algorithms, the less appealing it would be to develop hardware for any one of them.

The VLIW proposal sounds very interesting to me!

1 Like

[Disclaimer/Background: I work for Linzhi Shenzhen, a small independent privately funded ASIC startup in Shenzhen. We are working on an upcoming crypto chip whose first application will be Ethash, as announced at ETC Summit 2018 in Seoul. We are facing some delays, but all is fine. All things worth doing take longer than expected.]

In my view, @timolson did the Ethereum community a great - free - service with his series of posts and replies in this thread. Thank you Tim!

We at Linzhi are a little more open to speculating about the intentions or business deals that may have led to ProgPoW, which look very different from those behind RandomX. The anonymous nature of Mr. Else and Mr. Def as well as a number of surprisingly uniform accounts and articles/presentations are quite intriguing.
We don’t know the deals between Core Scientific (where Kristy-Leigh Minehan, the main ProgPoW proponent, is CTO) and Nvidia. We believe cost advantage leads to centralization.

ProgPoW risks turning Ethereum into an Nvidia-managed game, with ETH devs becoming unpaid support staff to keep the game going.

I’m relaying the following quote from people with more GPU expertise than we have or are willing to acquire (not from Linzhi, take it fwiw):

"On AMD V_MUL_LO_U32 uses 16 cycles, on Nvidia starting from Volta IMAD uses 5 cycles.

32-bit multiplication in ProgPoW was pushed under the pretense of it being inefficient on both manufacturers but that turns out to be a lie as on Nvidia it was only inefficient on Pascal. The algorithm is tuned to still let Pascal utilize full memory bandwidth due to simply sheer compute capacity difference coming partially from higher die size which is why comparing GPUs based on price is being pushed as of late. It’s not a secret that 4xx & 5xx series AMD GPUs are not high-end but because ProgPoW’s compute to memory bandwidth ratio is tuned to match Nvidia GPUs, AMD GPUs are not utilized to their fullest, most importantly losing the full memory bandwidth utilization which is the very basic foundation of the Ethash algorithm."

Is this true? If so would it leave much of the “honesty of the attempt” that @timolson sees?

Phil Daian wrote a great paper over a year ago
https://pdaian.com/blog/anti-asic-forks-considered-harmful/

There is another anonymous guy out there writing about ProgPoW, ether4life (no, it’s not us).

ASICs are designed and manufactured when there are buyers.
Currently there is sufficient demand if a machine can offer 150 days ROI (with flat/flat assumption, that is coin price flat, difficulty flat).
ETH currently pays out 3.6 mio USD / day or about 100 mio USD / month.

Without doing much math, it’s not hard to see how a system that pays out 100 mio USD / month can incentivize the design and development of chips. A lot can be done for a few million USD and it’s only natural that different businesses are competing for that money.
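For anyone who does want the math, here is a rough sketch of the flat/flat ROI arithmetic using the figures above (3.6 mio USD per day, 150-day payback). The function names, the `network_share` parameter, and the option to subtract a power cost are assumptions added for illustration.

```python
DAILY_PAYOUT_USD = 3.6e6   # figure quoted above
TARGET_ROI_DAYS = 150      # figure quoted above


def payback_days(machine_price, network_share, daily_payout, daily_power_cost=0.0):
    """Days to earn back the purchase price with coin price and difficulty flat."""
    daily_net = network_share * daily_payout - daily_power_cost
    if daily_net <= 0:
        return float("inf")
    return machine_price / daily_net


def max_price_for_roi(network_share, daily_payout, target_days, daily_power_cost=0.0):
    """Highest price a buyer would pay for hardware earning this share of
    the payout, if they insist on paying it back within target_days."""
    return target_days * (network_share * daily_payout - daily_power_cost)


monthly_payout = DAILY_PAYOUT_USD * 30
whole_network_budget = max_price_for_roi(1.0, DAILY_PAYOUT_USD, TARGET_ROI_DAYS)
print(monthly_payout)        # 108000000.0
print(whole_network_budget)  # 540000000.0, ignoring power costs
```

Ignoring electricity, the payout stream supports on the order of half a billion USD of hardware at a 150-day payback, which is the pool that chip designers are competing for.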

@fubuloubu - prepared or irregular PoW changes are a big incentive for secret mining. The alternative would be to pre-announce a PoW change one or two years in advance.

Quoting @timolson
“If GPU integer performance sucks very badly then ProgPoW may be worse than Ethash. If GPU’s perform pretty well at integers, then ProgPoW may be better. We’re not willing to speculate without a lot more work.”

+1 from Linzhi.

Tim made many other good points, I can tell you what I am doing: re-read his points, think. There are always new realizations to be had. What about the VLIW design? :)
We are happy to have more discussions also in our Telegram group LinzhiCorp.

We did not look at the AMD vs Nvidia issue. When we say “honest attempt” we mean that ProgPoW doesn’t have some secret way to make ASIC’s easy, or any backdoor, or anything like that. It is definitely pro-GPU.

It may be true—almost inevitable—that one manufacturer or card series outperforms the other. It may be true that the authors have specifically tuned the PoW to favor Nvidia over AMD.

AMD vs. Nvidia may be tuned using ProgPoW’s loop constants for compute and memory, and IMO it’s up to the Ethereum community to test different values on different cards and decide what’s fair. The structure and design of ProgPoW does not fundamentally favor either vendor in my view, and the tunings may be adjusted and tested by any GPU enthusiast who can change one number and compile code. If the Eth community doesn’t want to do the simple work of trying different loop constants on different cards, then the hard work of fighting over what values are fair, then they can just accept the authors’ suggested tunings. But don’t be afraid to adjust the loop constants if you are concerned about GPU vendor bias.
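As a rough illustration of why one loop constant can shift the balance between cards, here is a toy roofline-style model. It is not a benchmark and not ProgPoW code: the card figures, `ops_per_hash`, and `bytes_per_hash` are invented numbers, and real GPUs overlap compute and memory in more complicated ways.

```python
def hashes_per_second(compute_ops_per_s, mem_bytes_per_s, ops_per_hash, bytes_per_hash):
    """Toy model: throughput is capped by whichever resource runs out first."""
    compute_bound = compute_ops_per_s / ops_per_hash
    memory_bound = mem_bytes_per_s / bytes_per_hash
    return min(compute_bound, memory_bound)


def memory_utilization(compute_ops_per_s, mem_bytes_per_s, ops_per_hash, bytes_per_hash):
    """Fraction of the card's memory bandwidth actually used at that rate."""
    rate = hashes_per_second(compute_ops_per_s, mem_bytes_per_s, ops_per_hash, bytes_per_hash)
    return rate * bytes_per_hash / mem_bytes_per_s


# HYPOTHETICAL cards with equal bandwidth but different compute capacity.
CARD_A = {"compute_ops_per_s": 10e12, "mem_bytes_per_s": 400e9}
CARD_B = {"compute_ops_per_s": 5e12, "mem_bytes_per_s": 400e9}
BYTES_PER_HASH = 16384  # hypothetical

for ops_per_hash in (300_000, 200_000):  # stand-in for a compute loop constant
    a = memory_utilization(**CARD_A, ops_per_hash=ops_per_hash, bytes_per_hash=BYTES_PER_HASH)
    b = memory_utilization(**CARD_B, ops_per_hash=ops_per_hash, bytes_per_hash=BYTES_PER_HASH)
    print(ops_per_hash, round(a, 3), round(b, 3))
```

In this toy setup the heavier compute setting leaves card B compute-bound with idle bandwidth, while the lighter setting lets both cards saturate memory. That is the kind of difference you would look for when trying different constants on real hardware.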

3 Likes

Thanks Sonia, you too! Linzhi has also offered valuable free insight into ProgPoW including a basic ASIC design with gate counts, and they’ve been generous in also helping RandomX improve their effort. Sonia knows what she’s talking about, and free advice in hardware is really rare. It’s wonderful to see such entrepreneurial spirit and transparency!