Updated: Sep 5
The groth16 algorithm enables a quadratic arithmetic program to be computed by a prover over elliptic curve points derived in a trusted setup, and quickly checked by a verifier. It uses auxiliary elliptic curve points from the trusted setup to prevent forged proofs.
Given an R1CS Ls⊙Rs = Os with matrices of dimension n rows and n columns, we refer to the polynomials u, v, and w as
Where φ is the transformation from R1CS to QAP, i.e. interpolating the m R1CS with m polynomials.
To refer to a coefficient within a polynomial, we will sometimes write pᵢⱼ, where j is the jth coefficient of the ith polynomial. If j = 0, we refer to the constant portion of the polynomial, otherwise, it is the power of x associated with that polynomial. For example, p₂₃ is the index 2 polynomial, and the coefficient associated with x³.
We refer to elliptic curve points with square brackets, with a subscript referring to which group they belong to. For example, [b]₂ is an elliptic curve point in G₂, and it is equivalent to writing [bG₂]₂. Similarly, [f]₁ is an elliptic curve point in G₁, and [y]₁₂ is an elliptic curve point in G₁₂. Although groth16 isn’t required to use the bilinear pairing we described in the article referenced above, in practice this is the most common elliptic curve family it is used with, so we will stick to denoting the groups as G₁, G₂, and G₁₂ without loss of generality.
Because of the discrete logarithm problem, f cannot be extracted from [f]₁ except by the agent who computed the point multiplication fG₁, we simply refer to the point as [f]₁ for convenience.
In our chapter on evaluating a Quadratic Arithmetic Program at a hidden point τ, we had a significant issue that the prover can simply invent values a, b, c where ab = c and present those as elliptic curve points to the prover.
Thus, the verifier has no idea if elliptic curve points [A]₁, [B]₂, and [C]₁ were the result of a satisfied QAP or made up values.
We need to force the prover to be honest without introducing too much computational overhead. The first algorithm to accomplish this was “Pinocchio: Nearly Practical Verifiable Computation.” This was usable enough for zcash to base the first version of their blockchain on it.
However, groth16 was able to accomplish the same thing in much fewer steps, and the algorithm is still widely in use today, as no algorithm since has produced as efficient a verifier step (though other algorithms have eliminated the trusted setup or significantly reduced the amount of work for the prover).
Preventing forgery part 1: introducing α and β
It’s easy for the prover to fake values because they know the values of [A]₁, [B]₂, and [C]₁. But what if we force the prover to shift the values [A]₁ and [B]₂ by an unknown amount? This can be accomplished by adding elliptic curve points [αG] and [βG] respectively. These are created by the trusted setup agent randomly sampling α and β, creating elliptic curve points [αG] and [βG] and then destroying α and β.
That is, our updated [A] and [B] points are
Now the prover cannot predict the result of pairing [A]₁ and [B]₂! because they no longer know the values of [A]₁ and [B]₂
Despite shifting A and B in this manner, we can still create a balanced equation from the original QAP. Consider the following:
The section in the box is computed by the verifier, and the underlined sections are computed by the prover. The above formulas illustrate what is happening conceptually. We will give a precise definition for the trusted setup and encrypted evaluation of the QAP later. We must note that α and β are elliptic curve points in the actual algorithm, so there are some implementation details in the evaluation.
The point of the exercise above is to show how we can introduce shifts and keep the QAP balanced.
For now, we wish to show that shifting in this manner prevents forgeries if α and β are unknown.
Our previous verification formula:
Under the new formula, the prover can no longer invent A, B and compute C, or invent C and derive A and B. Let’s demonstrate this.
Attack 1: Forging A and B and deriving C
Suppose the prover randomly selects a’ and b’ to produce [A]₁ and [B]₂ and tries to derive a value [C’] that is compatible with the verifier’s formula. Because of the [α]₁[β]₂ term, the prover cannot simply provide c’ = a’b’ then compute [C’].
Here is how the malicious prover would try:
First they pick [A’]₁ and [B’]₂ which results in a pairing [D]₁₂. The malicious prover then does
However, the prover now has a G₁₂ point for C instead of a G₁ point, which the verification formula needs. To get that G₁ point, the malicious prover would have to take the discrete logarithm of [C’]₁₂ to get c’ and then multiply that with G₁ to get [c’G₁]₁.
But the discrete logarithm is infeasible, so this attack does not work.
Note that the malicious prover knows [D], but the reason the attack fails is because they do not know αβ
Let’s try attacking in the other direction.
Attack 2: Forging C and deriving A and B
Here the prover picks a random point c’ and compute [C’]₁. Because they know c’, they can try to discover a compatible combination of a' and b' such that a'b' = c'.
Hence, they compute
Now they need to split [D’]₁₂ into [A]₁ and [B]₂. However, the malicious prover runs into the discrete logarithm problem again because they don’t know the preimage of [D’]₁₂. Given an elliptic curve point in G₁₂, you cannot find two points from G₁ and G₂ that pair to it unless you know the underlying field element that generated the G₁₂ element.
Attack 3: Forging [A’] and [B’] with [a’G₁] + [α]₁ and [b’G₂] + [β]₂ and computing [C’]
This is a bit more clever. It seems like the equation should identically balance out right?
Here is the steps for the attack
Pick a random a’ and b’
Generate [A’] = a’[G] + [α], [B’] = b’[G] + [β]
The rest of the attack, and why it fails, is an exercise for the reader.
Adjusting the trusted setup to accommodate for the new C
When we balanced out the QAP equation after shifting by α and β, we glossed over how exactly the prover would compute C in the portion below:
Because beta will be an elliptic curve point after the trusted setup is complete, it cannot be multiplied with the polynomial without doing a pairing, but that would cause C to no longer be a G₁ point.
However, the term βu(τ) + αv(τ) + w(τ) can be precomputed at τ at the trusted setup. For example, to compute the first polynomial, the following setup is done after τ is selected:
Note that the powers of tau here is implied in the notation. To compute, for example, vᵢ(τ), it is evaluated as
The rather ugly expansion of the above notation is as follows:
Recap: Computing a proof
We begin with a circuit definition for a QAP
Where m is the number of columns in the original R1CS, and n is the number of rows. The polynomials that interpolate the columns will have degree n - 1, so we only need to compute powers of tau up to that power.
During the trusted setup, we compute the following:
We randomly chose (α,β,τ) from our finite field then
Admittedly, that is a lot of notation. The underlined parts are what we send to the prover. The non-underline portions are just a visual expansion of the terms.
As verbose as the notation above is, it will be even less readable in most programming languages, so we leave it as it.
Note that all the elliptic curve points (on the right hand side of the equations) above have been precomputed at the trusted setup phase.
The verifier computes
And accepts if the equation is true.
Preventing forgeries part 2: separating public and private inputs with γ and δ
The equation does not change significantly when we add the public portion of the witness in.
Recall how we split the QAP into the portions computed by the verifier (the public inputs) and the prover (the private inputs).
Here ℓ refers to the first ℓ elements of the witness vector which are public. A typical witness vector looks like [1, out, x1, x2, …] so in this case, ℓ will be 1, because element 0 and element 1 are public. This of course assumes that out is public, which it usually is.
The only thing that changes from above is that [C] is computed over the private inputs, instead of the entire witness vector.
and the verifier has to compute the public portion using the elliptic curve points from the trusted setup. This is the new verification formula
Preventing forgeries with public inputs
The assumption we are making above is that polynomial evaluations βu₀(τ) αw₀(τ) and w₁(τ) are used by the verifier, and not the prover. But there is nothing stopping the prover from using w₀(τ) w₁(τ) and possibly deriving a false proof (this is challenging to do, but it is possible).
To prevent this, the trusted setup agent divides w₀(τ) and w₁(τ) by a secret variable γ the prover portion by a different variable δ. The encrypted versions of these variables [γ] and [δ] are made available so that the verifier and prover can cancel them out if they are honest.
Our trusted setup now needs to incorporate these new values:
We pick random field elements (τ,α,β,γ,δ) and do
All the field elements are done over the order of the elliptic curve we are using, so taking the inverse is just applying the extending euclidian algorithm.
Verification step with γ and δ
Instead of pairing with G₂ at the verification step, we pair with [γ] and [δ]
The [γ] and [δ] terms will cancel out if the prover truly used the polynomials from the trusted setup. The prover (and verifier) do not know the field element that corresponds to [δ], so they cannot cause the terms to cancel out unless they use the values from the trusted setup.
If the prover uses polynomial evaluations from the public input portion of the witness, the γ and δ terms will not cancel.
Enforcing true zero knowledge: r and s
Our scheme is not yet truly zero knowledge. If an attacker is able to guess our witness vector (which is possible if there is only a small range of valid inputs, e.g. secret voting from privileged addresses), then they can verify their guess is correct by comparing their constructed proof to the original proof.
To do this, we introduce another random shift, but this time at the proving phase instead of the setup phase.
The prover samples two random field elements r and s and shifts their values accordingly
Note that we need to include [β]₁ and [δ]₁ as part of the trusted setup for this to work.
It is an exercise for the reader to show that adding r and s in this manner do not alter the balance of the equation for the verifier, regardless of the r and s values shown. To make this exercise easier, consider setting α= β = 0, and γ = δ = 1.
Final groth16 algorithm
Rather than summarize what we have above, we will show screenshots from the original groth16 paper (page 17 and 18).
Some minor differences in notation:
What we call τ the paper calls x.
The paper’s notation of τ is a collection of variables. You can ignore this.
The paper’s notation does encrypted evaluation implicitly, since cryptographers don’t need this explained.
The paper writes γ and δ as division instead of inverse.
We write pairing as pairing(), the paper uses [A]₁·[B]₂
The paper refers to the proof tuple ([A]₁, [B]₂, [C]₁) as π
The paper refers to a collection of G₁ points from the trusted setup as σ₁ and G₂ points as σ₂.
If you can read the paper’s formulas without assistance at this point, that’s good! You’ve taken your first step into a larger world!
Groth16 trusted setup
Verifying Groth16 in Solidity
At this point, you have sufficient knowledge to understand the proof verification code in Solidity.
We link here to tornado cash’s proof verification code. The reader is encouraged to read the source code closely. If the reader is comfortable with Solidity assembly programming, then understanding this source code will not be difficult as the variable names are consistent with the ones in this article.
There is also library support for groth16 on Solana.