What are Pedersen Commitments and How They Work
Pedersen commitments allow us to represent arbitrarily large vectors with a single elliptic curve point, while optionally hiding any information about the vector. It gives us an important primitive to prove things about the vector encoded with that point without revealing the vector.
When we discuss Bulletproof Zero Knowledge Proofs, they will generally be of the form “I have two vectors whose inner product is c.” This might seem basic, but you can actually use this mechanism to prove very non-trivial claims. We’ll get to that later.
But for such a proof to work, the vectors can’t just exist in the prover’s head – otherwise the prover can change them at will. They have to be mathematical entities in the real world. Generally, the prover does not want to just pass the two vectors to the verifier, but they still need to “pass something” to the verifier to represent that they’ve selected a pair of vectors and cannot change it.
This is where Pedersen commitments enter the picture.
We assume the reader is already familiar with elliptic curve point addition and multiplication and what it means for a point to be “on the curve.” If not, please refer to the first four chapters of our Zero Knowledge Proof Book.
Notation-wise, we say [A] is an elliptic curve (EC) point, a is a finite field element, and a[A] is point multiplication between finite field element a and EC point [A]. The expression [A] + [B] denotes elliptic curve point addition.
When we design commit reveal functions in smart contracts, they are usually of the form
commitment = hash(value, salt)
where the salt is a random value to prevent an attacker from brute-forcing our guess.
For example, if we were committing a vote, there are only a limited number of choices, and thus the vote selection can be guessed by trying all the votes seeing which hash matches.
The academic terminology for the salt variable here is the blinding factor. Because it is random, the attacker is blinded from being able to guess the value committed
Because the value “commitment” cannot be obtained by an adversary, we say this commitment scheme is hiding.
During the reveal phase, this is called the opening in the literature, the committer reveals the value and the salt, so the other party (or smart contract) can validate that it matches the original commitment. It isn’t possible to obtain another (value, salt) pair that can result in the same commitment, so we say this scheme is binding – the committer cannot change (i.e. is bound to) their committed value after the fact.
A hiding commitment does not allow an adversary to know what value was selected by the commiter. This is usually accomplished by including a random term that the attacker cannot guess.
A blinding term is the random number that makes the commitment impossible to guess. In some situations where we don’t care about zero knowledge (just succinctness), the blinding term might not be included.
An opening is the values that will compute to the commitment.
A binding commitment does not allow the committer to compute a hash with different values. That is, they cannot find two (value, salt) pairs that hash to the same value.
Pedersen commitments behave exactly the same way, except that they use elliptic curve groups instead of cryptographic hash functions.
Under the discrete logarithm assumption, given elliptic curve points [U] and [G], we cannot compute u where [U] = u[G]
Or in Python
from py_ecc.bn128 import G1, multiply u = 569723450 # chosen randomly U = multiply(G1, u) print(U, "cannot compute the discrete log of U")
We say u is the discrete logarithm of [U]. We still refer to u as the discrete logarithm of [U] even though we cannot compute it, because we know it exists. All (cryptographic) elliptic curve points have a discrete logarithm, even if it cannot be computed.
In this sense, elliptic curve point multiplication behaves like a hash function. They are binding as long as we only allow openings within the curve order.
However, although they are binding and we cannot invert them via the discrete logarithm, we have the same problem when u falls in a small range of possible guesses (like in our voting application example). It can be discovered by looping over all possible values of x and multiplying that x by G.
We can make a The Pedersen hiding in the following manner:
commitment = v[G] + s[Q]
where v is the value and s is the salt (or blinding factor) and Q is another elliptic curve point that the committer does not know the discrete logarithm of.
We should emphasize that although the discrete logarithms are unknown, the points [G] and [Q] are public and known to both the verifier and the committer.
Why the committer must not know the discrete logarithm of [Q]
Suppose the committer knows the discrete logarithm of [G] and [Q] and that [Q] = d[G]. This will enable them to find two unequal openings for the same commitment.
Here is how they could cheat the system
[C] = v[G] + s[G][C] = 10[G] + 15[Q] = 10[G] + 15[dG] [C′] = 11[G] + (15d−1)[Q] = 11[G] + (15d−1)[G] [C] = [C′]
The reader can mentally substitute some arbitrary number for d and see this works.
The committer must not know the discrete logarithm relationship between the elliptic curve points they are using.
One way to accomplish this is to have a verifier supply the elliptic curve points for the committer. A simpler way however is to pick the elliptic curve points in a random and transparent way, such as by pseudorandomly selecting elliptic curve points. Given a random elliptic curve point, we do not know its discrete logarithm.
For example, we could start with the generator point, hash the x and y values, then use that to seed a pseudorandom but deterministic search for the next point.
Why are Pedersen Commitments useful?
It seems like Pedersen Commitments are just a normal commit-reveal with a different hash-function, so what’s the point?
This scheme has a couple advantages
Given a generator G, we can add two commitments together aG + bG = (a + b)G. If we include random blinding terms, we can still do a valid opening:]
This is not the case for regular cryptographic hash functions like sha256.
The two points generated from b[G] and s[Q] will not “conflict” with each other when added.
Consider that following is true by associativity:
(a[G]+b[H]) + (c[G]+d[H]) = (a+c)[G] + (c+d)[H]
where [G] and [H] are two elliptic curves with an unknown discrete log relation.
You can think of the elliptic curve points as a basis for a linear combination over orthogonal dimensions.
Although there exist field elements a₁, a₂, b₁, b₂ and elliptic curve points [G], [H], where [G] ≠ [H], a₁ ≠ a₂ and b₁ ≠ b₂ but a₁[G] + b₁[H] = a₂[G] + b₂[H], it is not possible to compute (a₁, a₂, b₁, b₂) without computing the discrete log.
And when the order of our groups is over sufficiently large, the chances of finding such a match by random chance is negligible.
In other words, suppose two different people commit two different [C] = aG + bH and [C’] = a’ + b’H where a ≠ a’ and b ≠ b’. If the discrete log relationship between [G] and [H] is unknown, it is extremely unlikely [C] will equal [C’].
Pedersen commitments are zk-friendly
Elliptic curve addition and multiplication are relatively easy to make circuits for as the only require regular arithmetic operators, while normal hash functions require bitshifting and XOR which require a lot of constraints to encode as a zk circuit.
We can encode as many points as we like in a single point
Our example of using [G] and [Q] can also be thought of a 2D vector commitment without a blinding term. But we can add as many elliptic curve points as we like [G₁, G₂, …, Gₙ] and commit as many scalars as we like.
Pedersen Vector Commitments
We can take the above scheme as step further and commit a set of values rather than a value and a blinding term.
Vector commitment scheme
Suppose we have a set of random elliptic curve points (G₁,…,Gₙ) (that we do not know the discrete logarithm of), and we do the following:
This lets us commit four values to [C] and hide it with r.
Since the committer does not know the discrete logarithm of any of the generators, they don’t know the discrete logarithm of [C]. Hence, this scheme is binding: they can only reveal (v₁,…,vₙ) to produce [C] later, they cannot produce another vector.
Vector commitments can be combined
We can add two Pedersen Vector Commitments to get one commitment to two vectors. This will still only allow the committer to open to the original vectors. The important implementation detail is that we have to use a different set of elliptic curve points to commit against.
Here, r[Q] and s[Q] are the blinding terms. Even if the committer commits the zero vector, the commitment will still appear to be a random point.
The committer will later reveal the original vectors (v₁…vₙ) and (w₁…wₙ) and the blinding term r + s. This is binding: they cannot reveal another pair of vectors and blinding terms.
The fact that we are using (G₁,…,Gₙ) for one vector and (H₁,…,Hₙ) should not imply that there is a special relationship among the G points and a special relationship among the H points. All the points need to be selected pseudorandomly. This is merely notational convenience for saying “this vector of elliptic curve points goes with this vector of field elements, and this other vector of EC points goes with this other vector of field elements.”
There is no practical upper limit to the number of vectors we can commit.
Exercise for the reader: If we used the same G₁…Gₙ for both vectors before adding them, how could a committer open two different vectors for C_3? Give an example. How does using a different set of points H₁…Hₙ prevent this?
Exercise for the reader: What happens if the committer tries to switch the same elements inside the vector?
For example, they commit:
But open with
That is, they switch the first two elements leaving everything else unchanged. Assume that the vector G₁…Gₙ is unpermuted.
Generating random points transparently
How can we generate these random elliptic curve points? One obvious solution is to use a trusted setup, but this isn’t necessary. The committer is able to set up the points in a way they cannot know their discrete logarithm by randomly selecting the points in a transparent way.
They can pick the generator point, mix in a publicly chosen random number, and hash that result (and take it modulo the field_modulus) to obtain another value. if that results in an x value that lies on the elliptic curve, use that as the next generator and hash the (x, y) pair again. Otherwise, if the x-value does not land on the curve, increment x until it does. Because the committer is not generating the points, they don’t know their discrete log. The implementation details of this algorithm are left as an exercise for the reader.
At no point should you generate a point via multiplication, as that would lead to the discrete logarithm being known. You need to select the x values pseudorandomly via a hash function and figure out if it is on the curve.
It is okay to start with the generator (which has a known discrete logarithm of 1) and generate the other points.
Exercise for the reader: Suppose we commit a 2D vector to points [G1] and [G2]. The discrete logarithm for [G1] is known, but the discrete logarithm for [G2] is not known. We will ignore the blinding term for now. Can the committer open to two different vectors? Why or why not?
Learn More with RareSkills
Check out our zk bootcamp if you are looking to learn zero knowledge proofs.