The Joy of Consensus Mechanisms: A Primer on Ethereum's Proof-of-Work vs Proof-of-Stake
What they are, how they work, and their relative strengths and weaknesses
Introduction
As Ethereum core developers enter their final stages of testing before the great switch from Proof-of-Work (PoW) to Proof-of-Stake (PoS) later this year (maybe as soon as August), I wanted to explain the basics behind what these two ‘consensus mechanisms’ are and how they work, with a description of their relative strengths and weaknesses.
I will attempt to present all of this without bias or prejudice in order to provide you, the reader, with as much information as possible so that you’re able to consider for yourself the various trade-offs that each approach makes.
Let’s begin with some background and context.
What is a consensus mechanism?
Even before we describe what a consensus mechanism is, we need to zoom out a little in order to make sense of how everything fits together.
First off, what is Ethereum? Ethereum is, at a high level:
a public ‘virtual computer’1 …
containing accounts, balances and smart contract code …
that is shared across a global network of computers (called nodes).
At any given point in time, this virtual computer presents a state of the world (e.g. what each individual or smart contract wallet is holding, etc.) that everyone on the network must agree on. When people talk about the Ethereum Virtual Machine (‘EVM’), they are talking about this single, canonical state machine.
So where does the ‘blockchain’ bit come in? The blocks in a blockchain contain transaction or computation requests from users, which themselves can range from simple requests to transfer ETH between two wallets, to more complex requests for interactions with smart contracts. As the blocks containing these requests are in turn:
1. mined by miners (in the case of PoW) or proposed by validators (in the case of PoS);
2. verified by other nodes; and
3. appended onto the tail of the blockchain (which in turn finalises and commits the transaction requests);
this causes a state change in the EVM, which is propagated throughout the entire network.
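To make the ‘state machine’ idea concrete, here is a minimal sketch in Python. It is a toy model, not how Ethereum actually stores or hashes state: the account names, the `apply_block` and `state_checksum` helpers, and the use of SHA-256 over JSON are all assumptions made for illustration.

```python
import hashlib
import json

def apply_block(state, block):
    """Return a new state after applying every transfer in the block, in order."""
    new_state = dict(state)  # copy, so the previous state is left untouched
    for tx in block:
        sender, recipient, amount = tx["from"], tx["to"], tx["amount"]
        if new_state.get(sender, 0) < amount:
            raise ValueError(f"insufficient balance for {sender}")
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state

def state_checksum(state):
    """Deterministic fingerprint of a state (a toy stand-in for a real state root)."""
    encoded = json.dumps(state, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

# Hypothetical starting balances and a one-transaction block.
genesis = {"alice": 10, "bob": 0}
block = [{"from": "alice", "to": "bob", "amount": 3}]

# Two nodes that start from the same state and apply the same block
# independently arrive at the same new state, and so the same checksum.
node_a = apply_block(genesis, block)
node_b = apply_block(genesis, block)
print(node_a, state_checksum(node_a) == state_checksum(node_b))
```

The point of the sketch is simply that the state transition is deterministic: the same starting state plus the same block gives every node the same result, which is what allows the network to agree on it.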
So far, so good I hope. Let’s pause here for a minute however and focus a little more on the steps outlined above.
Step 1, i.e. the act of mining/proposing blocks, is not a consensus mechanism all by itself. It is technically a Sybil resistance mechanism (great, more jargon!). Sybil attacks are when one user or group pretends to be many users. Resistance to this type of attack is essential for a decentralised blockchain, and both PoW and PoS protect against this by making users respectively either expend a lot of energy or put up a lot of collateral, providing an economic deterrent to Sybil attacks.
Steps 2 & 3, i.e. the act of verification by the other nodes and eventual appending onto the tail of the blockchain, employ something called a chain selection rule. Both Ethereum and Bitcoin, under PoW, currently use what is commonly called the longest chain rule. In the event of two valid blocks getting mined at the same time, creating a temporary fork, one of these chains will eventually become the accepted chain once a subsequent block has been mined and added to it, making it longer. Whichever chain is the longest (more precisely, the one that has had the most cumulative ‘work’ performed on it) will be the one the rest of the nodes ultimately accept as valid and verify against.
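As a rough illustration of a chain selection rule, the sketch below picks between two competing forks by summing the work in each. The block dictionaries, the `difficulty` field and the function names are hypothetical, and a real client tracks far more than this.

```python
def total_work(chain):
    """Sum the difficulty of every block in a candidate chain."""
    return sum(block["difficulty"] for block in chain)

def select_chain(candidates):
    """Pick the candidate with the most accumulated work.

    Assumption: on an exact tie, max() keeps the first candidate seen,
    which mirrors a node sticking with the fork it heard about first.
    """
    return max(candidates, key=total_work)

# Two blocks shared by both forks, then a temporary split.
shared = [{"id": "A", "difficulty": 10}, {"id": "B", "difficulty": 10}]
fork_1 = shared + [{"id": "C1", "difficulty": 10}]
fork_2 = shared + [{"id": "C2", "difficulty": 10}, {"id": "D2", "difficulty": 10}]

best = select_chain([fork_1, fork_2])
print([block["id"] for block in best])
```

Here the second fork wins because a further block has been mined on top of it, so nodes that had accepted C1 would switch over and C1's transactions would return to the mempool.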
Even though most people describe PoW and PoS as ‘consensus mechanisms’ in their own right, this isn’t entirely accurate as the individual miner or validator proposing the block isn’t actually performing any consensus by themselves. Instead, it’s the combination of the PoW/PoS Sybil resistance mechanism and chain selection rule in place which determines the overall consensus mechanism, i.e.:
Consensus mechanism = Sybil resistance mechanism + chain selection rule
With all of that definitional hair-splitting out of the way, let’s focus on how PoW and PoS respectively do what they do.
How does Ethereum’s implementation of Proof-of-Work work?
Ethereum, like Bitcoin, currently uses PoW, i.e. it uses ‘mining’ to create a block of transactions to be added to the Ethereum blockchain. The steps below describe in detail how Ethereum transactions are mined2:
1. Alice broadcasts a transaction request to the entire Ethereum network from some node.
2. Upon hearing about the new transaction request, each node in the Ethereum network adds the request to their local mempool - a list of all transaction requests they’ve heard about that have not yet been committed to the blockchain in a block.
3. At some point, a mining node aggregates several dozen or hundred transaction requests into a potential block, in a way that maximizes the tips they earn from the transaction requests whilst still staying under the block gas (or computation) limit. The mining node then:
   a. Verifies the validity of each transaction request (e.g. no one is trying to transfer ETH out of an account they haven’t produced a signature for, etc.), and then executes the code of the request, altering the state of their local copy of the EVM. The miner awards the ‘tip’ part of the transaction fee for each such transaction request to their own account.
   b. Begins the process of producing the proof-of-work “certificate of legitimacy” for the potential block, once all transaction requests in the block have been verified and executed on the local EVM copy.
4. Eventually, a miner will finish producing a certificate for a block which includes Alice’s specific transaction request. The miner then broadcasts the completed block, which includes the certificate and a checksum of the claimed new EVM state.
5. Other nodes hear about the new block. They verify the certificate, execute all transactions on the block themselves (including the transaction originally broadcasted by Alice), and verify that the checksum of their new EVM state after the execution of all transactions matches the checksum of the state claimed by the miner’s block. (This verification is trivial to do as the miner has done the hard work for them.) Only then do these nodes append this block to the tail of their blockchain, and accept the new EVM state as the canonical state.
6. Each node removes all transactions in this new block from their local mempool of unfulfilled transaction requests.
Miners who successfully create a block get rewarded with 2 newly minted ETH along with the tips linked to the transaction requests.
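The block-building step, where a miner picks transaction requests from the mempool to maximise tips while staying under the gas limit, can be sketched as follows. This is a simple greedy heuristic for illustration only: the transaction fields, the gas figures and the `build_block` function are hypothetical, and real mining software also has to respect things like per-account transaction ordering.

```python
def build_block(mempool, gas_limit):
    """Greedily pick transactions by tip per unit of gas until no more fit."""
    chosen, gas_used = [], 0
    ranked = sorted(mempool, key=lambda tx: tx["tip_per_gas"], reverse=True)
    for tx in ranked:
        if gas_used + tx["gas"] <= gas_limit:
            chosen.append(tx)
            gas_used += tx["gas"]
    return chosen

# Hypothetical pending transaction requests.
mempool = [
    {"id": "t1", "gas": 21_000, "tip_per_gas": 2},
    {"id": "t2", "gas": 100_000, "tip_per_gas": 5},
    {"id": "t3", "gas": 50_000, "tip_per_gas": 1},
    {"id": "t4", "gas": 60_000, "tip_per_gas": 3},
]

block = build_block(mempool, gas_limit=150_000)

# Once the block is accepted, its transactions leave the mempool.
remaining = [tx for tx in mempool if tx not in block]
total_tips = sum(tx["gas"] * tx["tip_per_gas"] for tx in block)
print([tx["id"] for tx in block], total_tips)
```

In this example t4 pays a better tip than t1 but does not fit alongside t2, so the miner takes t1 instead and leaves t3 and t4 waiting for a later block.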
Let’s pause again and focus a little on Step 4 above. How does the miner specifically produce that certificate, i.e. what does ‘mining’ actually involve?
Below is a brilliant, relatively short video which provides a simplified example of how PoW blockchains actually work and the trial-and-error computations performed by miners to produce blocks. I would highly recommend watching it before moving on as this whole process is much easier to understand when demonstrated in a video rather than simply being read about in a post.
As a high-level summary:
Each block is assigned a difficulty based on the global computing power (or hash rate) being thrown at the network in order to produce blocks. The difficulty is automatically adjusted block-to-block such that a block is only able to be mined roughly once every 12 to 14 seconds. (In the case of Bitcoin, block production is aimed for once every 10 minutes.)
This difficulty determines how many leading zeros the hash of the block must have. A hash is a single fixed-length string which acts as a fingerprint of the contents of the block, e.g. 0x00000881b07a6a09f83b130798072441705d9a665c5ac8bdf2f39a3cdf3bee29 - which has 5 leading zeros in this instance. (A hash is not encryption: it cannot be reversed to recover the block.) The higher the number of leading zeros required, the more difficult the block will be to mine.
The contents of the current block being mined include transaction requests and the hash of the previous block, as well as a number called a nonce. The miner’s software will go through an intense race of trial and error to find a nonce for the block such that the hash for that block has the prescribed number of leading zeros as required by the difficulty setting. Solving this nonce puzzle is the ‘work’ in Proof-of-Work.
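The trial-and-error search described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: it uses SHA-256 from the standard library as a stand-in (Ethereum's PoW uses its own algorithm, Ethash), it counts leading zeros in the hexadecimal hash rather than comparing against a numeric target, and the block contents and `mine`/`verify` helpers are made up for the example.

```python
import hashlib

def block_hash(block_data, previous_hash, nonce):
    """Hash the block contents together with a candidate nonce."""
    payload = f"{previous_hash}|{block_data}|{nonce}".encode()
    return hashlib.sha256(payload).hexdigest()

def mine(block_data, previous_hash, leading_zeros):
    """Try nonces one by one until the hash has enough leading zeros."""
    prefix = "0" * leading_zeros
    nonce = 0
    while True:
        digest = block_hash(block_data, previous_hash, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

def verify(block_data, previous_hash, nonce, leading_zeros):
    """Checking a claimed nonce takes a single hash, however long mining took."""
    digest = block_hash(block_data, previous_hash, nonce)
    return digest.startswith("0" * leading_zeros)

# Hypothetical block contents and previous hash.
nonce, digest = mine("alice pays bob 3 ETH", "00ab", leading_zeros=4)
print(nonce, digest)
print(verify("alice pays bob 3 ETH", "00ab", nonce, leading_zeros=4))
```

Each extra hexadecimal zero makes the search roughly 16 times longer on average, while verification stays at one hash. That asymmetry is why other nodes can cheaply check a miner's work.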
Relative strengths of Proof-of-Work
Security
An attacker needs 51% of the network's hash rate to censor transactions and reorganise blocks. This would require such huge investments in equipment and energy that they would likely spend more than they would gain. Further, any attempt at producing invalid transactions from this (e.g. double spending) is likely to be rejected by the honest full nodes partaking in consensus.
More generally, the indirect penalties for miners submitting incorrect transaction data or blocks would be the sunk cost of this computing power, time and energy.
A virtuous circle can be considered at play when thinking about the security of PoW blockchains:
Neutrality
Miners don't need ETH to get started and block rewards allow them to go from zero ETH to a positive balance. PoS requires a starting balance of ETH to participate in staking and validation.
Battle-testedness
PoW is a tried and tested consensus mechanism that has kept Ethereum (and of course Bitcoin) secure and decentralised for many years.
Relative weaknesses of Proof-of-Work
Finality
Finality refers to the time you should wait before considering a transaction (effectively) “irreversible”.
Under PoW, temporary forks can occur with different nodes accepting different blocks at any given point in time. As such, the transactions in the latest block cannot be considered immediately ‘final’, as they could exist in a temporary fork which later gets rejected, and the transactions reversed.
As such, PoW blockchains are said to have ‘probabilistic’ finality, whereby the probability of reversing transactions is a function of cost, i.e. the more blocks that are built on top of a given block, the more hash rate is required to create an alternative longest chain from that given block, lowering the probability of a reorganisation.
For Ethereum, the recommended time after which you can say with relative confidence that a transaction was successful is six blocks or just over 1 minute. (This doesn’t allow for the time waiting for the transaction request to be picked up by the miner.)
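To get a feel for why waiting for more blocks helps, the sketch below uses the simplified catch-up formula from the Bitcoin whitepaper: an attacker with a share q of the hash rate, who is z blocks behind an honest majority with share p = 1 - q, ever catches up with probability (q/p)^z. Treat this as an illustration of the shape of the curve rather than a statement about Ethereum specifically; the 25% attacker share and the 13-second block time (the midpoint of the 12 to 14 seconds mentioned earlier) are assumptions.

```python
def catch_up_probability(attacker_share, blocks_behind):
    """Chance that an attacker ever overtakes the honest chain from a deficit."""
    honest_share = 1.0 - attacker_share
    if attacker_share >= honest_share:
        return 1.0  # with half or more of the hash rate, catching up is certain
    return (attacker_share / honest_share) ** blocks_behind

# Assumed attacker share of 25% of the network hash rate.
odds_by_depth = {depth: catch_up_probability(0.25, depth) for depth in (1, 3, 6, 12)}
for depth, odds in odds_by_depth.items():
    print(f"{depth:>2} blocks deep: {odds:.6f}")

# Six confirmations at an assumed ~13 seconds per block.
ASSUMED_BLOCK_TIME_SECONDS = 13
wait_seconds = 6 * ASSUMED_BLOCK_TIME_SECONDS
print(wait_seconds, "seconds")
```

The probability falls off exponentially with depth but never reaches exactly zero, which is what ‘probabilistic’ finality means in practice.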
Barriers to entry
Mining requires the purchase, regular maintenance, upgrading and/or replacement3 of specialised equipment (e.g. GPUs for Ethereum, ASICs for Bitcoin). Since the right to mine a block requires solving an arbitrary computational puzzle, miners can increase their odds of success by investing in more powerful hardware. These investments are predominantly made by professional large-scale mining operations (or mining farms) using thousands of GPUs/ASICs, and as such solo small-scale miners are unlikely to capture enough block rewards to offset their upfront (and ongoing) equipment and energy costs, unless they themselves join mining pools with other individuals to increase their reward consistency.
Energy usage
Due to ever-increasing computing requirements associated with maintaining an edge on mining blocks, an enormous amount of energy is used to generate the hash rate that keeps the network safe. To maintain this security, Ethereum on PoW is currently estimated to consume 86.4 TWh annually, comparable to the power consumption of Finland, and produce 48.2 Mt CO2 annually, comparable to the carbon footprint of Norway4.
Rate of return
The considerable economies of scale experienced by professional large-scale mining operations can increase the rate of returns they experience over the individual miner. For example, individual miners are usually not in a position to negotiate directly with utility companies, but larger operations can secure discounted energy rates by guaranteeing consistent and large usage. These savings can be reinvested into new equipment, facilities, etc. which increases their share of the hash rate within the network, and hence the frequency at which they receive block rewards.
Centralisation/Cartelisation risk
Due to increasing competition for block rewards, the consolidation of mining into a small number of professional large-scale mining operations and mining pools has contributed to centralisation and/or cartelisation risks. At the time of writing, the top five Bitcoin and Ethereum mining pools both have a 69.5% share of their total respective network hash rates, increasing the risk of collusion between these parties to control these chains. It should be noted however that, with these mining pools themselves consisting of multiple entities, persuasion/coercion of these miners by mining pool coordinators to defraud the chain might not be so easy a task. It should also be noted that there is a historical precedent for mining pools self-restricting their share of the total network hash rate when they have become too large5.
Attack recovery
With PoW, if an attacker acquires 51% of the network’s computing power, there is no real mechanism to prevent them from continuously and endlessly attacking the chain, rendering it useless (known as a Spawn-Camp Attack).
Under this scenario, there are two possible solutions for honest miners: source additional computing power and/or move away from the primary hardware used for mining at that point in time (e.g. under Bitcoin, move from using ASICs to GPUs). Whilst both options will be difficult from a logistical point of view, the second option has the added complication of requiring significant human coordination and off-chain interaction between the remaining honest parties on how best to proceed.
Real world risk
GPUs and ASICs are ‘real world’ hardware that, in the case of PoW, are predominantly held in mining facilities around the world. As such, these facilities and their hardware are increasingly subject to nation-state bans, political pressure, activism, supply chain issues, and nerfing.
How will Ethereum’s implementation of Proof-of-Stake work?
Under Ethereum’s implementation of PoS, miners are now replaced with validators. To participate as a validator, a user must deposit (or stake) 32 ETH6 into Ethereum’s deposit contract as collateral to set up a staking node.
Whereas under PoW, the timing of blocks is determined by the mining difficulty, in PoS, the tempo is now fixed as there is no more ‘work’ to do. Time in PoS Ethereum is divided into slots (12 seconds) and epochs (32 slots). One validator is randomly selected via an algorithm to be a block proposer in every slot. This validator is responsible for creating a new block and sending it out to other nodes on the network.7
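The fixed tempo makes the timekeeping simple arithmetic. The sketch below uses the two figures given above (12-second slots, 32 slots per epoch); the function names and the idea of measuring time in seconds since the chain's start are assumptions for illustration.

```python
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32

def slot_at(seconds_since_genesis):
    """Which slot a given moment falls in, counting from slot 0."""
    return seconds_since_genesis // SECONDS_PER_SLOT

def epoch_of(slot):
    """Which epoch a slot belongs to, counting from epoch 0."""
    return slot // SLOTS_PER_EPOCH

def is_epoch_start(slot):
    """True for the first slot of an epoch."""
    return slot % SLOTS_PER_EPOCH == 0

epoch_seconds = SECONDS_PER_SLOT * SLOTS_PER_EPOCH
print(epoch_seconds, "seconds per epoch, i.e.", epoch_seconds / 60, "minutes")
print(slot_at(400), epoch_of(slot_at(400)), is_epoch_start(64))
```

So an epoch lasts 384 seconds, or 6.4 minutes, and at most one block can be proposed in each of its 32 slots.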
Also in every slot, a committee of 128 other validators is randomly chosen, whose votes are used to determine the validity of the block being proposed. In order for the committee to perform this task, the transactions delivered in the block are re-executed, and the block signature is checked to ensure the block is valid. These validators then send votes (called attestations) in favour of that block across the network.
As a reward for their efforts, all staking nodes (regardless of whether or not they have been block proposers or attestors in any given slot) are recompensed in ETH issuance via a variable annual percentage yield (or APY), i.e. their staked balance increases.8 The staking APY is a function of Ethereum’s fee revenue and is also broadly inversely proportional to the amount of ETH staked, in order to encourage the use of ETH in more productive ways within the wider network.
As we discussed above, a consensus mechanism is a combination of a Sybil resistance mechanism and a chain selection rule. We described how the longest chain rule was the chain selection rule for PoW. For PoS, it’s a little different: When the network performs optimally and honestly, there is only ever one new block at the head of the chain, and all validators attest to it. However, it is possible for validators to have different views of the head of the chain due to network latency or because a block proposer has equivocated (i.e. proposed multiple blocks)9. As such, an algorithm called LMD-GHOST is used to decide which one to favour, which works by identifying the fork that has the greatest weight of attestations in its history.
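A heavily simplified sketch of the idea behind LMD-GHOST follows. Starting from a known block, it repeatedly steps to whichever child has the most stake attesting to it or its descendants, counting only each validator's latest attestation. The data structures, the validator names and the tie-break by block name are assumptions for the example; the real fork choice in the Ethereum specification has many more rules.

```python
def lmd_ghost_head(parents, latest_votes, stakes, root):
    """Walk from root towards the leaves, following the heaviest subtree.

    parents:      child block -> parent block
    latest_votes: validator -> the block named in their most recent attestation
    stakes:       validator -> staked ETH
    """
    children = {}
    for block, parent in parents.items():
        children.setdefault(parent, []).append(block)

    def is_descendant(block, ancestor):
        while block is not None:
            if block == ancestor:
                return True
            block = parents.get(block)
        return False

    def weight(block):
        # Stake whose latest vote is for this block or anything built on it.
        return sum(stakes[v] for v, voted in latest_votes.items()
                   if is_descendant(voted, block))

    head = root
    while children.get(head):
        # Assumed tie-break: the block name, purely to keep the toy deterministic.
        head = max(children[head], key=lambda b: (weight(b), b))
    return head

# A fork after block A: one branch is A-B-D, the other is A-C-E.
parents = {"B": "A", "C": "A", "D": "B", "E": "C"}
latest_votes = {"v1": "D", "v2": "B", "v3": "E", "v4": "E", "v5": "C"}
stakes = {validator: 32 for validator in latest_votes}

head = lmd_ghost_head(parents, latest_votes, stakes, root="A")
print(head)
```

In this example the C branch carries 96 ETH of latest attestations against 64 ETH for the B branch, so the head of the chain is taken to be E.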
Relative strengths of Proof-of-Stake
Security
An attacker needs 51% of the total staked ETH to censor transactions and reorganise blocks (currently just over 6.5m ETH, around $13bn10). However, not all nodes partaking in consensus are staking nodes, and the honest non-staking full nodes are likely to reject any attempt at producing invalid transactions from this (e.g. double spending).
Furthermore, the community can also resort to social recovery of an honest chain if a 51% attack were to overcome the crypto-economic defences. For example, the honest validators could decide to fork and keep building on the minority chain and encourage apps, exchanges, etc. to do the same, whilst removing the attacker from the network11 and destroying their staked ETH.
These options would obviously require their own significant amount of human coordination & off-chain interaction, although it could be argued that forking a chain (in the case of PoS) would be easier & cheaper to do than organising real-world hardware changes (e.g. ASICs to GPUs) or supply ramp-ups in the case of PoW.
Due to the ETH slashing punishment for an attacker, multiple attacks will become prohibitively expensive, requiring the purchase of billions more dollars worth of ETH every time.
There is an additional deterrent in that those partaking in malicious behaviour (e.g. equivocating and/or submitting contradictory attestations) can have their stake(s) slashed by up to 100%. (This is in contrast to PoW, where malicious actors can not be explicitly punished.)
Finality
In contrast to PoW’s ‘probabilistic’ finality, PoS blockchains exhibit ‘deterministic’ finality, whereby once a transaction is included in a valid block and satisfies the relevant consensus rules, it can no longer be reversed. This matters because, in the absence of the costs associated with ‘work’ under PoW, it would otherwise be theoretically possible for a wealthy individual or entity to rewrite the history of the chain simply by using their significant stake - for example with a long range attack.
In order to achieve such deterministic finality within a blockchain, a vote or agreement between a set of network participants is required. Because of this arrangement, the ‘correct’ chain is not decided by the longest chain rule but is instead one that is determined and approved by a set of network nodes, acting as the ultimate authorities and accomplishing finality through their votes.
In Ethereum’s PoS implementation, deterministic finality is applied periodically to ‘checkpoint’ blocks. The first block in each epoch is considered a ‘checkpoint’ block, and it is finalised one epoch in arrears through agreement between at least two-thirds of the validators. Validators must bet their entire stake on this, so if they try to collude down the line, they will risk losing their entire stake.12
Since finality requires a two-thirds majority, an attacker could prevent the network from reaching finality by voting with one-third of the total stake (and hence risk losing, through slashing, the one-third of the total staked ETH they have acquired - currently around 4.3m ETH or $8.5bn). Even with this slashing deterrent, there is a mechanism in place to defend against this: the inactivity leak. This activates whenever the chain fails to finalise for more than four epochs. The inactivity leak bleeds away the staked ETH from validators voting against the majority, allowing the honest validators to regain a two-thirds majority and finalise the chain.
If the attacker was able to acquire over two-thirds of the total stake, this would allow them to force finality on all transactions. In this event, the same social recovery options as for the 51% attack above could be used to circumvent the attack.
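The thresholds in this section are easy to check with a little arithmetic. The sketch below is illustrative only: the function names are made up, and the total staked figure is backed out from the article's own "51% is just over 6.5m ETH" number rather than taken from live data.

```python
def has_supermajority(attesting_stake, total_stake):
    """True when at least two-thirds of the total stake has voted in favour."""
    return 3 * attesting_stake >= 2 * total_stake

def can_block_finality(attacker_stake, total_stake):
    """True when the remaining stake alone can no longer reach two-thirds."""
    return not has_supermajority(total_stake - attacker_stake, total_stake)

# Figure quoted earlier in the article; everything below is derived from it.
STAKE_FOR_51_PERCENT = 6_500_000  # ETH
total_staked = STAKE_FOR_51_PERCENT / 0.51
one_third = total_staked / 3
two_thirds = 2 * total_staked / 3

print(f"total staked ~ {total_staked:,.0f} ETH")
print(f"one-third    ~ {one_third:,.0f} ETH (enough to stall finality)")
print(f"two-thirds   ~ {two_thirds:,.0f} ETH (enough to force finality)")
```

Note that with an "at least two-thirds" rule, an attacker needs fractionally more than one-third to stall finality, since honest validators holding exactly two-thirds can still finalise.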
Barriers to entry
This requirement to only possess capital rather than purchase, maintain and regularly replace specialised hardware to participate in block creation should promote decentralisation and lead to more individual nodes securing the network under PoS.
Even the 32 ETH minimum deposit for would-be stakers can be circumvented through the use of pooled staking services (e.g. using Lido or Rocket Pool), although these come with their own centralisation/cartelisation risks (discussed again below under ‘Relative weaknesses of Proof-of-Stake’).
Energy usage
As there is no need to perform ‘work’ to mine & propose blocks, energy usage for PoS blockchains is around 99.95% lower than for PoW blockchains.
Rate of return
The staking APY is the same for all validators, regardless of how many validator nodes may be owned or controlled by an individual or entity. Note that obviously this doesn’t mean the absolute returns in ETH are going to be the same for everyone (discussed again below under ‘Relative weaknesses of Proof-of-Stake’).
Issuance
Due to the low energy requirements around PoS, less ETH issuance is required to incentivise participation - the current issuance of 13,500 ETH per day under PoW is estimated to drop by up to 90%. It can be argued that this low, persistent inflation is healthier for the sustainability of blockchains than a higher-issuance alternative, and also reduces the cost of capital around not staking.
Relative weaknesses of Proof-of-Stake
Battle-testedness
PoS is younger and less battle-tested compared to PoW. There could be bugs, exploits or as-yet-unforeseen opportunities to game the system.
Centralisation risk: Multiple-node stakers
Although the staking APY is the same for all validators (as previously mentioned), this also means that individuals or entities controlling multiple nodes will naturally receive higher absolute returns vs. those staking with only (one set of) 32 ETH. This is commonly referred to as the ‘rich-get-richer’ problem amongst opponents of PoS systems, and a perceived centralisation risk. This may be exacerbated when issuance drops after the switch to PoS, which may further cement the positions of incumbent multiple-node stakers.
In my reading around this, adherents of PoS have generally pointed to how PoW also has ‘rich-get-richer’ tendencies through economies of scale vectors (as discussed in the ‘Rate of return’ sub-section under ‘Relative weaknesses of Proof-of-Work’), but there is no explicit refutation around this ‘rich-get-richer’ claim for PoS itself - i.e. absolute returns will vary based on who’s staking what, that’s a mathematical fact.
In order for this not to become a significant centralisation problem, there is an implied reliance on sufficient numbers of single/low-number staking node validators in the network to balance out the larger multiple-node stakers and maintain decentralisation. Social pressure could also play a strong part, and we should remember that people will continue to take profits from staking. If that all fails, the punitive slashing mechanisms built into Ethereum’s PoS implementation are intended to be a significant means of deterrence against any attempts at centralisation and collusion. Furthermore, there may also be economic ways to discourage centralisation.
It should be noted that multiple-node stakers will increase their chances of being selected by the algorithm to propose blocks. This in and of itself isn’t an attack-vector problem as there would be 128 independent attestors in a committee verifying the proposed blocks at any given time. In terms of the likelihood of this malicious actor being able to get enough of their nodes as attestors as well to push through their blocks, there is less than a 1-in-trillion chance that an attacker controlling one-third of the validators on the network would also control two-thirds of the attesting validators in a committee to successfully execute an attack.
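The ‘less than 1-in-a-trillion’ figure can be sanity-checked with a binomial tail calculation. The sketch below assumes each of the 128 committee seats is filled independently, with a one-third chance of going to the attacker; real committee selection shuffles a finite validator set, so this is an approximation, and the function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb

def seats_needed(committee_size, needed_share):
    """Smallest whole number of seats that is at least the required share."""
    return ceil(committee_size * needed_share)

def committee_capture_probability(committee_size, attacker_share, needed_share):
    """Chance the attacker fills at least the required share of one committee."""
    needed = seats_needed(committee_size, needed_share)
    return sum(
        comb(committee_size, k)
        * attacker_share ** k
        * (1 - attacker_share) ** (committee_size - k)
        for k in range(needed, committee_size + 1)
    )

# Exact fractions avoid rounding errors when adding up very small terms.
probability = committee_capture_probability(
    committee_size=128,
    attacker_share=Fraction(1, 3),
    needed_share=Fraction(2, 3),
)
print(seats_needed(128, Fraction(2, 3)), "seats needed")
print(f"{float(probability):.3e}")
```

Under these assumptions the attacker would need 86 of the 128 seats when they would expect only around 43, and the resulting probability comes out below one in a trillion, consistent with the claim above.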
Centralisation/cartelisation risk: Pooled staking services
The use of pooled staking services comes with its own centralisation or cartelisation risks.
To expound on this, using Lido as an example: Potential stakers provide their ETH to Lido and Lido allocates it in turn to one of their (currently) 20+ professional node operators. These node operators stake the ETH and the staker receives a staking derivative called ‘stETH’ in return, which accumulates the rewards from staking and can later be redeemed 1:1 for ETH post-switch to PoS (it can be swapped now for ETH if necessary, albeit slightly under the 1 ETH peg at the time of writing).
Lido has on-chain governance through its LDO token, and it also uses a permissioned set of node operators. Whilst this permissioned list of operators allows Lido to develop off-chain relationships and, in theory, quickly fix any on-chain issues, it’s not impossible to consider a future where Lido governance might act against the best interests of Ethereum and decide to coerce its operators in a unified & insidious way, e.g. block re-organisations, transaction censoring, etc.
Even in a world without Lido, it is not unreasonable to imagine that centralisation would simply occur under node operators directly instead, as people generally wouldn’t want to operate their own nodes or bear the slashing risks associated with downtime or mismanagement. However, as node operators function more like PoW mining pools in the direct creation of blocks, and mining pools have traditionally self-restricted their percentage share control of the total network hash rate (through social pressure or otherwise), it’s not unreasonable to believe that PoS node operators might also act accordingly. Lido, through acting as a middle layer, distributing ETH to node operators, may be less incentivised to do this. Currently Lido controls around 32% of the total ETH staked on Ethereum’s PoS (‘Beacon’) chain.
To Lido’s benefit, they have been open about decentralisation and are actively engaging in ways to limit risks to the Ethereum protocol.
Regardless of any of these centralising/cartelising forces, Ethereum under PoS is still planned to be resilient to any attacks performed as a result of them, using slashing and/or forking mechanisms to compel honest actions.
Increased trust assumptions
Rather than relying on computational work to solve for a block, PoS instead relies more on proposals and attestations from validators, and as such requires an increase in trust assumptions versus PoW. There is, as such, a reliance on slashing mechanisms to keep all validators / staking nodes in check. Furthermore, not all nodes partaking in consensus are staking nodes, and the honest non-staking full nodes could just as well ultimately reject proposals/attestations if they are believed to be inaccurate or dishonest.
Complexity of implementation
PoS is more complex to implement than PoW, e.g. users need to run three pieces of software to participate in Ethereum's PoS compared to one for PoW.
Afterword
Well, there we are! I hope you’ve learnt something from the above. As you’ve hopefully discovered too, there are strengths and weaknesses for both PoW and PoS that should be borne in mind when considering and debating these mechanisms, and the security each brings to a blockchain.
As a one-sentence summary of this key difference in security philosophy, Vitalik himself does an excellent job here:
The “one-sentence philosophy” of proof of stake is thus not “security comes from burning energy”, but rather “security comes from putting up economic value-at-loss”.
Until Ethereum switches to PoS later in 2022, we won’t truly know whether this philosophy will bear out in practice, but it’s safe to say, given the time taken to implement, the Ethereum Foundation and core developers have considered numerous potential pitfalls with great care. It’s up to you, dear reader, whether you believe it will be sufficient.
Further Reading:
Ethereum documentation on foundational topics (including both PoW and PoS): https://ethereum.org/en/developers/docs/
Why Proof-of-Stake (Nov 2020): https://vitalik.ca/general/2020/11/06/pos2020.html
Proof-of-Stake FAQ: https://vitalik.ca/general/2017/12/31/pos_faq.html
2022-06-08 edits:
Updated ‘Security’ sub-sections under both PoW and PoS to better reflect exactly what a 51% attack might involve (i.e. censoring transactions and reorganisation of blocks), and why proposing invalid transactions would likely fail from rejection by honest full nodes.
Updated ‘Finality’ sub-sections under both PoW and PoS to further clarify their respective ‘probabilistic’ and ‘deterministic’ finalities.
We commonly hear the terms ‘distributed ledger’ or ‘database’ when people try to describe blockchains. This isn’t strictly true for Ethereum as its smart contract layer is there to support general computations, hence it is more like a ‘virtual computer’ than the aforementioned two descriptors.
Taken from https://ethereum.org/en/developers/docs/consensus-mechanisms/pow/mining/ with minor wording revisions.
As a comparison, Bitcoin is currently estimated to consume 204.50 TWh annually, comparable to the power consumption of Thailand, and produce 114.06 Mt CO2 annually, comparable to the carbon footprint of the Czech Republic. As a counterpoint, when looking at Bitcoin at least, it can be argued that miners are constantly seeking the cheapest and most efficient sources of energy, with strides having been made in recent months to increase the sustainable electricity mix of mining operations (acknowledging the potential self-selection bias of the survey participants in that link).
I haven’t found any written evidence demonstrating when this has been done, but am instead going off the discussion from a recent Uncommon Core podcast episode.
Why 32 ETH specifically? Explained here. In short, the figure is a trade-off between the costs to be a validator and the hardware requirements needed to follow the chain. An increase in the latter would harm decentralisation of the network.
Note that this description is most relevant for solo stakers, or those that use staking-as-a-service providers. For those that want to stake but don’t have 32 ETH, pooled staking (e.g. via Lido or Rocket Pool) and staking via centralised exchanges are also possible. For more information, visit https://ethereum.org/en/staking/.
Not all nodes on the Ethereum network are staking nodes, for example nodes can just be maintained for consensus and don’t need to be actively staking ETH.
Also known as the ‘nothing-at-stake’ problem.
Not to mention that this $ figure would likely increase due to the inherent ETH price appreciation from an attacker trying to buy enough ETH to acquire a 51% stake.
The attacker’s specific address(es) would be viewable to everyone, making this act of identification trivial.
This idea of checkpoints is also used to address the issue of subjectivity inherent in PoS blockchains.