404
Page not found :(
The requested page could not be found.
================================================
FILE: docs/Gemfile
================================================
source "https://rubygems.org"
# Hello! This is where you manage which Jekyll version is used to run.
# When you want to use a different version, change it below, save the
# file and run `bundle install`. Run Jekyll with `bundle exec`, like so:
#
# bundle exec jekyll serve
#
# This will help ensure the proper Jekyll version is running.
# Happy Jekylling!
#gem "jekyll", "~> 3.9.0"
# This is the default theme for new Jekyll sites. You may change this to anything you like.
gem "minima", "~> 2.0"
# If you want to use GitHub Pages, remove the "gem "jekyll"" above and
# uncomment the line below. To upgrade, run `bundle update github-pages`.
gem "github-pages", "~> 219", group: :jekyll_plugins
# If you have any plugins, put them here!
group :jekyll_plugins do
  gem "jekyll-feed", "~> 0.6"
end
# Windows does not include zoneinfo files, so bundle the tzinfo-data gem
# and associated library.
platforms :mingw, :x64_mingw, :mswin, :jruby do
  gem "tzinfo", "~> 1.2"
  gem "tzinfo-data"
end
# Performance-booster for watching directories on Windows
gem "wdm", "~> 0.1.0", :platforms => [:mingw, :x64_mingw, :mswin]
# kramdown v2 ships without the gfm parser by default. If you're using
# kramdown v1, comment out this line.
gem "kramdown-parser-gfm"
gem "nokogiri", ">= 1.12.5"
================================================
FILE: docs/_config.yml
================================================
# Welcome to Jekyll!
#
# This config file is meant for settings that affect your whole blog, values
# which you are expected to set up once and rarely edit after that. If you find
# yourself editing this file very often, consider using Jekyll's data files
# feature for the data you need to update frequently.
#
# For technical reasons, this file is *NOT* reloaded automatically when you use
# 'bundle exec jekyll serve'. If you change this file, please restart the server process.
# Site settings
# These are used to personalize your new site. If you look in the HTML files,
# you will see them accessed via {{ site.title }}, {{ site.email }}, and so on.
# You can create any custom variable you would like, and they will be accessible
# in the templates via {{ site.myvariable }}.
title: Anatomy of a STARK
email: alan@nervos.org
description: STARK tutorial with supporting source code in python.
twitter_username: aszepieniec
github_username: aszepieniec
# Build settings
markdown: kramdown
#theme: jekyll-theme-minimal
# Exclude from processing.
# The following items will not be processed, by default. Create a custom list
# to override the default setting.
# exclude:
# - Gemfile
# - Gemfile.lock
# - node_modules
# - vendor/bundle/
# - vendor/cache/
# - vendor/gems/
# - vendor/ruby/
================================================
FILE: docs/_includes/head-custom.html
================================================
================================================
FILE: docs/_posts/2021-10-20-welcome-to-jekyll.markdown
================================================
---
layout: post
title: "Welcome to Jekyll!"
date: 2021-10-20 16:21:33 +0200
categories: jekyll update
---
You’ll find this post in your `_posts` directory. Go ahead and edit it and re-build the site to see your changes. You can rebuild the site in many different ways, but the most common way is to run `jekyll serve`, which launches a web server and auto-regenerates your site when a file is updated.
To add new posts, simply add a file in the `_posts` directory that follows the convention `YYYY-MM-DD-name-of-post.ext` and includes the necessary front matter. Take a look at the source for this post to get an idea about how it works.
Jekyll also offers powerful support for code snippets:
{% highlight ruby %}
def print_hi(name)
  puts "Hi, #{name}"
end
print_hi('Tom')
#=> prints 'Hi, Tom' to STDOUT.
{% endhighlight %}
Check out the [Jekyll docs][jekyll-docs] for more info on how to get the most out of Jekyll. File all bugs/feature requests at [Jekyll’s GitHub repo][jekyll-gh]. If you have questions, you can ask them on [Jekyll Talk][jekyll-talk].
[jekyll-docs]: https://jekyllrb.com/docs/home
[jekyll-gh]: https://github.com/jekyll/jekyll
[jekyll-talk]: https://talk.jekyllrb.com/
================================================
FILE: docs/about.md
================================================
---
layout: page
title: About
permalink: /about/
---
This is the base Jekyll theme. You can find out more info about customizing your Jekyll theme, as well as basic Jekyll usage documentation at [jekyllrb.com](https://jekyllrb.com/)
You can find the source code for Minima at GitHub:
[jekyll][jekyll-organization] /
[minima](https://github.com/jekyll/minima)
You can find the source code for Jekyll at GitHub:
[jekyll][jekyll-organization] /
[jekyll](https://github.com/jekyll/jekyll)
[jekyll-organization]: https://github.com/jekyll
================================================
FILE: docs/basic-tools.md
================================================
# Anatomy of a STARK, Part 2: Basic Tools
## Finite Field Arithmetic
[Finite fields](https://en.wikipedia.org/wiki/Finite_field) are ubiquitous throughout cryptography because they are natively compatible with computers. For instance, they cannot generate overflow or underflow errors, and their elements have a finite representation in terms of bits.
The easiest way to build a finite field is to select a prime number $p$, use the elements $\mathbb{F}_p \stackrel{\triangle}{=} \lbrace 0, 1, \ldots, p-1\rbrace$, and define the usual addition and multiplication operations in terms of their counterparts for the integers, followed by reduction modulo $p$. Subtraction is equivalent to addition of the left-hand side to the negation of the right-hand side, and negation represents multiplication by $-1 \equiv p-1 \mod p$. Similarly, division is equivalent to multiplication of the left-hand side by the multiplicative inverse of the right-hand side. This inverse can be found using the [extended Euclidean algorithm](https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm), which on input two integers $x$ and $y$, returns their greatest common divisor $g$ along with matching [Bézout coefficients](https://en.wikipedia.org/wiki/B%C3%A9zout's_identity) $a, b$ such that $ax + by = g$. Indeed, whenever $\gcd(x,p) = 1$ the inverse of $x \in \mathbb{F}_p$ is $a$ because $ax + bp \equiv 1 \mod p$. Powers of field elements can be computed with the [square-and-multiply](https://en.wikipedia.org/wiki/Exponentiation_by_squaring) algorithm, which iterates over the bits in the expansion of the exponent, squares an accumulator variable in each iteration, and additionally multiplies it by the base element if the bit is set.
For the purpose of building STARKs we need finite fields with a particular structure[^1]: the field needs to contain a multiplicative subgroup of order $2^k$ for some sufficiently large $k$. We consider prime fields whose defining modulus has the form $p = f \cdot 2^k + 1$, where $f$ is some cofactor that makes the number prime. In this case, the multiplicative group $(\mathbb{F}_p \backslash \lbrace 0\rbrace, \times)$ has a subgroup of order $2^k$. For all intents and purposes, one can identify this subgroup with $2^k$ evenly spaced points on the complex unit circle.
An implementation starts with the extended Euclidean algorithm, for computing multiplicative inverses.
```python
def xgcd( x, y ):
    old_r, r = (x, y)
    old_s, s = (1, 0)
    old_t, t = (0, 1)

    while r != 0:
        quotient = old_r // r
        old_r, r = (r, old_r - quotient * r)
        old_s, s = (s, old_s - quotient * s)
        old_t, t = (t, old_t - quotient * t)

    return old_s, old_t, old_r # a, b, g
```
It makes sense to separate the logic concerning the field from the logic concerning the field elements. To this end, every field element holds a reference to its field object as a member; this field object implements the arithmetic. Furthermore, Python supports operator overloading, so we can repurpose the natural arithmetic operators to do field arithmetic instead.
```python
class FieldElement:
    def __init__( self, value, field ):
        self.value = value
        self.field = field

    def __add__( self, right ):
        return self.field.add(self, right)

    def __mul__( self, right ):
        return self.field.multiply(self, right)

    def __sub__( self, right ):
        return self.field.subtract(self, right)

    def __truediv__( self, right ):
        return self.field.divide(self, right)

    def __neg__( self ):
        return self.field.negate(self)

    def inverse( self ):
        return self.field.inverse(self)

    # modular exponentiation -- be sure to encapsulate in parentheses!
    def __xor__( self, exponent ):
        acc = FieldElement(1, self.field)
        val = FieldElement(self.value, self.field)
        for i in reversed(range(len(bin(exponent)[2:]))):
            acc = acc * acc
            if (1 << i) & exponent != 0:
                acc = acc * val
        return acc

    def __eq__( self, other ):
        return self.value == other.value

    def __ne__( self, other ): # Python 3 dispatches != to __ne__
        return self.value != other.value

    def __str__( self ):
        return str(self.value)

    def __bytes__( self ):
        return bytes(str(self).encode())

    def is_zero( self ):
        if self.value == 0:
            return True
        else:
            return False

class Field:
    def __init__( self, p ):
        self.p = p

    def zero( self ):
        return FieldElement(0, self)

    def one( self ):
        return FieldElement(1, self)

    def multiply( self, left, right ):
        return FieldElement((left.value * right.value) % self.p, self)

    def add( self, left, right ):
        return FieldElement((left.value + right.value) % self.p, self)

    def subtract( self, left, right ):
        return FieldElement((self.p + left.value - right.value) % self.p, self)

    def negate( self, operand ):
        return FieldElement((self.p - operand.value) % self.p, self)

    def inverse( self, operand ):
        a, b, g = xgcd(operand.value, self.p)
        return FieldElement(a % self.p, self) # reduce the Bezout coefficient to a canonical representative

    def divide( self, left, right ):
        assert(not right.is_zero()), "divide by zero"
        a, b, g = xgcd(right.value, self.p)
        return FieldElement(left.value * a % self.p, self)
```
Implementing fields generically is nice. However, in this tutorial we will not use any field other than the one with $1+407 \cdot 2^{119}$ elements. This field has a sufficiently large subgroup of power-of-two order.
```python
def main():
    p = 1 + 407 * ( 1 << 119 ) # 1 + 11 * 37 * 2^119
    return Field(p)
```
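As a small usage sketch (not part of the tutorial's code base; it merely exercises the classes defined above), the overloaded operators behave as one would hope, with the caveat already noted in the code that `^` binds more loosely than the other operators and therefore needs parentheses:

```python
# Usage sketch: arithmetic in the tutorial's prime field via the overloaded operators.
field = Field(1 + 407 * ( 1 << 119 ))
a = FieldElement(17, field)
b = FieldElement(5, field)

assert a + b == FieldElement(22, field)   # addition modulo p
assert (a - b) + b == a                   # subtraction undoes addition
assert (a / b) * b == a                   # division is multiplication by the inverse
assert (a^3) == a * a * a                 # exponentiation -- note the parentheses around a^3
```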
Besides ensuring that the subgroup of power-of-two order exists, the code also needs to supply the user with a generator for the entire multiplicative group, as well as the power-of-two subgroups. A generator for such a subgroup of order $n$ will be called a primitive $n$th root.
```python
def generator( self ):
    assert(self.p == 1 + 407 * ( 1 << 119 )), "Do not know generator for other fields beyond 1+407*2^119"
    return FieldElement(85408008396924667383611388730472331217, self)

def primitive_nth_root( self, n ):
    if self.p == 1 + 407 * ( 1 << 119 ):
        assert(n <= 1 << 119 and (n & (n-1)) == 0), "Field does not have nth root of unity where n > 2^119 or not power of two."
        root = FieldElement(85408008396924667383611388730472331217, self)
        order = 1 << 119
        while order != n:
            root = root^2
            order = order // 2 # integer division keeps the order an exact power of two
        return root
    else:
        assert(False), "Unknown field, can't return root of unity."
```
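As a quick illustration (a sketch that only exercises the method above), the primitive $n$th root returned for $n = 8$ has order exactly 8:

```python
# Illustrative check: the returned element generates the subgroup of order 8.
field = Field(1 + 407 * ( 1 << 119 ))
omega = field.primitive_nth_root(8)

assert (omega^8) == field.one()   # omega^8 = 1, so the order divides 8 ...
assert (omega^4) != field.one()   # ... and is no proper divisor of 8, hence exactly 8
```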
Lastly, the protocol requires the ability to sample field elements randomly and pseudorandomly. To do this, the user supplies random bytes and the field logic turns them into a field element. The user should take care to provide enough random bytes.
```python
def sample( self, byte_array ):
    acc = 0
    for b in byte_array:
        acc = (acc << 8) ^ int(b)
    return FieldElement(acc % self.p, self)
```
## Univariate Polynomials
A *univariate polynomial* is a weighted sum of non-negative powers of a single formal indeterminate. We write polynomials as a formal sum of terms, *i.e.*, $f(X) = c_0 + c_1 \cdot X + \cdots + c_d X^d$ or $f(X) = \sum_{i=0}^d c_i X^i$ because a) the value of the indeterminate $X$ is generally unknown and b) this form emphasises the polynomial's semantic origin and is thus more conducive to building intuition. In these expressions, the $c_i$ are called *coefficients* and $d$ is the polynomial's *degree*.
Univariate polynomials are immensely useful in proof systems because relations that apply to their coefficient vectors extend to their values on a potentially much larger domain. If polynomials are equal, they are equal everywhere; whereas if they are unequal, they are unequal almost everywhere. By this feature, univariate polynomials reduce claims about large vectors to claims about the values of their corresponding polynomials in a small selection of sufficiently random points.
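To quantify "almost everywhere": two distinct polynomials of degree at most $d$ agree in at most $d$ points, because their difference is a nonzero polynomial of degree at most $d$ and therefore has at most $d$ zeros. So for a uniformly random $z \in \mathbb{F}_p$, unequal polynomials satisfy $\Pr[f(z) = g(z)] \leq d / p$, which is negligible when the field is much larger than the degree.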
An implementation of univariate polynomial algebra starts with overloading the standard arithmetic operators to compute the right function of the polynomials' coefficient vectors. One important point requires special attention. It is impossible for the *leading coefficient* of a polynomial to be zero, since the leading coefficient means the coefficient of the highest-degree *non-zero* term. However, the implemented vector of coefficients might have trailing zeros, which should be ignored for all intents and purposes. The degree function comes in handy; it is defined here as one less than the length of the vector of coefficients after ignoring trailing zeros. This also means that the zero polynomial has degree $-1$ even though $-\infty$ makes more sense.
```python
from algebra import *

class Polynomial:
    def __init__( self, coefficients ):
        self.coefficients = [c for c in coefficients]

    def degree( self ):
        if self.coefficients == []:
            return -1
        zero = self.coefficients[0].field.zero()
        if self.coefficients == [zero] * len(self.coefficients):
            return -1
        maxindex = 0
        for i in range(len(self.coefficients)):
            if self.coefficients[i] != zero:
                maxindex = i
        return maxindex

    def __neg__( self ):
        return Polynomial([-c for c in self.coefficients])

    def __add__( self, other ):
        if self.degree() == -1:
            return other
        elif other.degree() == -1:
            return self
        field = self.coefficients[0].field
        coeffs = [field.zero()] * max(len(self.coefficients), len(other.coefficients))
        for i in range(len(self.coefficients)):
            coeffs[i] = coeffs[i] + self.coefficients[i]
        for i in range(len(other.coefficients)):
            coeffs[i] = coeffs[i] + other.coefficients[i]
        return Polynomial(coeffs)

    def __sub__( self, other ):
        return self.__add__(-other)

    def __mul__( self, other ):
        if self.coefficients == [] or other.coefficients == []:
            return Polynomial([])
        zero = self.coefficients[0].field.zero()
        buf = [zero] * (len(self.coefficients) + len(other.coefficients) - 1)
        for i in range(len(self.coefficients)):
            if self.coefficients[i].is_zero():
                continue # optimization for sparse polynomials
            for j in range(len(other.coefficients)):
                buf[i+j] = buf[i+j] + self.coefficients[i] * other.coefficients[j]
        return Polynomial(buf)

    def __eq__( self, other ):
        if self.degree() != other.degree():
            return False
        if self.degree() == -1:
            return True
        # compare coefficients only up to the degree, ignoring any trailing zeros
        return all(self.coefficients[i] == other.coefficients[i] for i in range(self.degree()+1))

    def __ne__( self, other ): # Python 3 dispatches != to __ne__
        return not self.__eq__(other)

    def is_zero( self ):
        if self.degree() == -1:
            return True
        return False

    def leading_coefficient( self ):
        return self.coefficients[self.degree()]
```
Things always get a little tricky when implementing division of polynomials. The intuition behind the schoolbook algorithm is that in every iteration you multiply the divisor by the correct term so that the subtraction cancels the running remainder's leading term. Once no such term can be found, you have your remainder.
```python
def divide( numerator, denominator ):
    if denominator.degree() == -1:
        return None
    if numerator.degree() < denominator.degree():
        return (Polynomial([]), numerator)
    field = denominator.coefficients[0].field
    remainder = Polynomial([n for n in numerator.coefficients])
    quotient_coefficients = [field.zero() for i in range(numerator.degree()-denominator.degree()+1)]
    for i in range(numerator.degree()-denominator.degree()+1):
        if remainder.degree() < denominator.degree():
            break
        coefficient = remainder.leading_coefficient() / denominator.leading_coefficient()
        shift = remainder.degree() - denominator.degree()
        subtractee = Polynomial([field.zero()] * shift + [coefficient]) * denominator
        quotient_coefficients[shift] = coefficient
        remainder = remainder - subtractee
    quotient = Polynomial(quotient_coefficients)
    return quotient, remainder

def __truediv__( self, other ):
    quo, rem = Polynomial.divide(self, other)
    assert(rem.is_zero()), "cannot perform polynomial division because remainder is not zero"
    return quo

def __mod__( self, other ):
    quo, rem = Polynomial.divide(self, other)
    return rem
```
In terms of basic arithmetic operations, it is worth including a powering map, although mostly for notational ease rather than performance.
```python
def __xor__( self, exponent ):
    if self.is_zero():
        return Polynomial([])
    if exponent == 0:
        return Polynomial([self.coefficients[0].field.one()])
    acc = Polynomial([self.coefficients[0].field.one()])
    for i in reversed(range(len(bin(exponent)[2:]))):
        acc = acc * acc
        if (1 << i) & exponent != 0:
            acc = acc * self
    return acc
```
A polynomial is quite pointless if it does not admit evaluation at a given arbitrary point. For STARKs we need something more general -- polynomial evaluation on a *domain* of values, rather than a single point. Performance is not a concern at this point so the following implementation follows a straightforward iterative method. Conversely, STARKs also require polynomial interpolation, where the x-coordinates are a given list of values. Once again, performance is not an immediate issue, so for the time being standard [Lagrange interpolation](https://en.wikipedia.org/wiki/Lagrange_polynomial) suffices.
```python
def evaluate( self, point ):
    xi = point.field.one()
    value = point.field.zero()
    for c in self.coefficients:
        value = value + c * xi
        xi = xi * point
    return value

def evaluate_domain( self, domain ):
    return [self.evaluate(d) for d in domain]

def interpolate_domain( domain, values ):
    assert(len(domain) == len(values)), "number of elements in domain does not match number of values -- cannot interpolate"
    assert(len(domain) > 0), "cannot interpolate between zero points"
    field = domain[0].field
    x = Polynomial([field.zero(), field.one()])
    acc = Polynomial([])
    for i in range(len(domain)):
        prod = Polynomial([values[i]])
        for j in range(len(domain)):
            if j == i:
                continue
            prod = prod * (x - Polynomial([domain[j]])) * Polynomial([(domain[i] - domain[j]).inverse()])
        acc = acc + prod
    return acc
```
Speaking of domains: one thing that recurs time and again is the computation of polynomials that vanish on them. Any such polynomial is the multiple of $Z_D(X) = \prod_{d \in D} (X-d)$, the unique monic[^2] lowest-degree polynomial that takes the value 0 in all the points of $D$. This polynomial is usually called the *vanishing polynomial* and sometimes the *zerofier*. This tutorial prefers the second term.
```python
def zerofier_domain( domain ):
    field = domain[0].field
    x = Polynomial([field.zero(), field.one()])
    acc = Polynomial([field.one()])
    for d in domain:
        acc = acc * (x - Polynomial([d]))
    return acc
```
Another useful tool is the ability to *scale* polynomials. Specifically, this means obtaining the vector of coefficients of $f(c \cdot X)$ from that of $f(X)$. This function is particularly useful when $f(X)$ is defined to take a sequence of values on the powers of $c$: $v_i = f(c^i)$. Then $f(c \cdot X)$ represents the same sequence of values but shifted by one position.
```python
def scale( self, factor ):
    return Polynomial([(factor^i) * self.coefficients[i] for i in range(len(self.coefficients))])
```
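A small sketch (illustrative only; it relies on the field and polynomial classes defined above) makes the shift-by-one effect concrete:

```python
# Sketch: scaling by c turns the value sequence (f(c^i))_i into its shift by one position.
field = Field(1 + 407 * ( 1 << 119 ))
c = field.primitive_nth_root(8)
domain = [c^i for i in range(8)]

f = Polynomial([FieldElement(v, field) for v in [3, 1, 4, 1, 5]])
values = f.evaluate_domain(domain)            # v_i = f(c^i)
shifted = f.scale(c).evaluate_domain(domain)  # f(c*X) evaluated at c^i equals f(c^(i+1))

assert shifted[:-1] == values[1:]
assert shifted[-1] == values[0]               # the shift wraps around because c^8 = 1
```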
The last function that belongs to the univariate polynomial module anticipates a key operation in the FRI protocol, namely testing whether a triple of points falls on the same line -- a fancy word for which is *colinearity*.
```python
def test_colinearity( points ):
    domain = [p[0] for p in points]
    values = [p[1] for p in points]
    polynomial = Polynomial.interpolate_domain(domain, values)
    return polynomial.degree() <= 1
```
Before moving on to the next section, it is worth pausing to note that all ingredients are in place for *finite extension fields*, or simply *extension fields*. A finite field is simply a set equipped with addition and multiplication operators that behave according to high school algebra rules, *e.g.* every nonzero element has an inverse, or no two nonzero elements multiplied give zero. There are two ways to obtain them:
1. Start with the set of integers, and reduce the result of any addition or multiplication modulo a given prime number $p$.
2. Start with the set of polynomials over a finite field, and reduce the result of any addition or multiplication modulo a given *irreducible polynomial* $p(X)$. A polynomial is *irreducible* when it cannot be decomposed as the product of two smaller polynomials, analogously to prime numbers.
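As a minimal illustration of the second construction (purely expository; the tiny field $\mathbb{F}_7$ and the irreducible polynomial $X^2 + 1$ are chosen here for readability and are not used anywhere else in this tutorial), the `Polynomial` class together with the `%` operator already suffices for arithmetic in a quadratic extension field:

```python
# Sketch: multiplication in F_49 = F_7[X] / (X^2 + 1), reusing the Polynomial class.
# X^2 + 1 is irreducible over F_7 because -1 is not a square modulo 7.
field = Field(7)
irreducible = Polynomial([field.one(), field.zero(), field.one()])        # X^2 + 1

a = Polynomial([FieldElement(3, field), FieldElement(2, field)])          # 3 + 2X
b = Polynomial([FieldElement(5, field), FieldElement(6, field)])          # 5 + 6X

product = (a * b) % irreducible   # multiply, then reduce modulo the irreducible polynomial
assert product.degree() == 0 and product.coefficients[0] == FieldElement(3, field)
```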
The point is that it is possible to do the arithmetization in a smaller field than the one used in the cryptographic compilation step, as long as the latter uses an extension field of the former. Specifically and for example, [EthSTARK](https://github.com/starkware-libs/ethSTARK) operates over the finite field defined by a 62-bit prime, but the FRI step operates over a quadratic extension field thereof in order to target a higher security level.
This tutorial will not use extension fields, and so an elaborate discussion of the topic is out of scope.
## Multivariate Polynomials
*Multivariate polynomials* generalize univariate polynomials to many indeterminates -- not just $X$, but $X, Y, Z, \ldots$. Where univariate polynomials are useful for reducing big claims about large vectors to small claims about scalar values in random points, multivariate polynomials are useful for articulating the arithmetic constraints that an integral computation satisfies.
For example, consider the [arithmetic-geometric mean](https://en.wikipedia.org/wiki/Arithmetic%E2%80%93geometric_mean), which is defined as the limit of either the first or second coordinate (which are equal in the limit) of the sequence $(a, b) \mapsto \left( \frac{a+b}{2}, \sqrt{a \cdot b} \right)$, for a given starting point $(a_0, b_0)$. In order to prove the integrity of several iterations of this process[^3], what is needed is a set of multivariate polynomials that capture the constraint of the correct application of a single iteration that relates the current state, $X_0, X_1$ to the next state, $Y_0, Y_1$. In this phrase, the word *capture* means that the polynomial evaluates to zero if the computation is integral. These polynomials might be $m_0(X_0, X_1, Y_0, Y_1) = Y_0 - \frac{X_0 + X_1}{2}$ and $m_1(X_0, X_1, Y_0, Y_1) = Y_1^2 - X_0 \cdot X_1$. (Note that the natural choice $m_1(X_0, X_1, Y_0, Y_1) = Y_1 - \sqrt{X_0 \cdot X_1}$ is not in fact a polynomial, but has the same zeros.)
Where the natural structure for implementing univariate polynomials is a list of coefficients, the natural structure for multivariate polynomials is a dictionary mapping exponent vectors to coefficients. Whenever this dictionary contains zero coefficients, they should be ignored. As usual, the first step is to overload the standard arithmetic operators, basic constructors, and standard functionalities.
```python
class MPolynomial:
    def __init__( self, dictionary ):
        self.dictionary = dictionary

    def zero():
        return MPolynomial(dict())

    def __add__( self, other ):
        dictionary = dict()
        num_variables = max([len(k) for k in self.dictionary.keys()] + [len(k) for k in other.dictionary.keys()])
        for k, v in self.dictionary.items():
            pad = list(k) + [0] * (num_variables - len(k))
            pad = tuple(pad)
            dictionary[pad] = v
        for k, v in other.dictionary.items():
            pad = list(k) + [0] * (num_variables - len(k))
            pad = tuple(pad)
            if pad in dictionary.keys():
                dictionary[pad] = dictionary[pad] + v
            else:
                dictionary[pad] = v
        return MPolynomial(dictionary)

    def __mul__( self, other ):
        dictionary = dict()
        num_variables = max([len(k) for k in self.dictionary.keys()] + [len(k) for k in other.dictionary.keys()])
        for k0, v0 in self.dictionary.items():
            for k1, v1 in other.dictionary.items():
                exponent = [0] * num_variables
                for k in range(len(k0)):
                    exponent[k] += k0[k]
                for k in range(len(k1)):
                    exponent[k] += k1[k]
                exponent = tuple(exponent)
                if exponent in dictionary.keys():
                    dictionary[exponent] = dictionary[exponent] + v0 * v1
                else:
                    dictionary[exponent] = v0 * v1
        return MPolynomial(dictionary)

    def __sub__( self, other ):
        return self + (-other)

    def __neg__( self ):
        dictionary = dict()
        for k, v in self.dictionary.items():
            dictionary[k] = -v
        return MPolynomial(dictionary)

    def __xor__( self, exponent ):
        if self.is_zero():
            return MPolynomial(dict())
        field = list(self.dictionary.values())[0].field
        num_variables = len(list(self.dictionary.keys())[0])
        exp = [0] * num_variables
        acc = MPolynomial({tuple(exp): field.one()})
        for b in bin(exponent)[2:]:
            acc = acc * acc
            if b == '1':
                acc = acc * self
        return acc

    def constant( element ):
        return MPolynomial({tuple([0]): element})

    def is_zero( self ):
        if not self.dictionary:
            return True
        else:
            for v in self.dictionary.values():
                if v.is_zero() == False:
                    return False
            return True

    def variables( num_variables, field ):
        variables = []
        for i in range(num_variables):
            exponent = [0] * i + [1] + [0] * (num_variables - i - 1)
            variables = variables + [MPolynomial({tuple(exponent): field.one()})]
        return variables
```
Since multivariate polynomials are a generalization of univariate polynomials, there needs to be a way to reuse the logic that was already defined for the univariate class. The function `lift` does this by lifting a univariate polynomial into the ring of multivariate polynomials. The second argument is the index of the indeterminate that corresponds to the univariate indeterminate.
```python
def lift( polynomial, variable_index ):
    if polynomial.is_zero():
        return MPolynomial({})
    field = polynomial.coefficients[0].field
    variables = MPolynomial.variables(variable_index+1, field)
    x = variables[-1]
    acc = MPolynomial({})
    for i in range(len(polynomial.coefficients)):
        acc = acc + MPolynomial.constant(polynomial.coefficients[i]) * (x^i)
    return acc
```
Next up is evaluation. The argument to this method needs to be a tuple of scalars since it needs to assign a value to every indeterminate. However, it is worth anticipating a feature used in the STARK whereby the evaluation is *symbolic*: instead of evaluating the multivariate polynomial in a tuple of scalars, it is evaluated in a tuple of *univariate polynomials*. The result is not a scalar, but a new univariate polynomial.
```python
def evaluate( self, point ):
    acc = point[0].field.zero()
    for k, v in self.dictionary.items():
        prod = v
        for i in range(len(k)):
            prod = prod * (point[i]^k[i])
        acc = acc + prod
    return acc

def evaluate_symbolic( self, point ):
    acc = Polynomial([])
    for k, v in self.dictionary.items():
        prod = Polynomial([v])
        for i in range(len(k)):
            prod = prod * (point[i]^k[i])
        acc = acc + prod
    return acc
```
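Returning to the arithmetic-geometric mean example from the beginning of this section, a short sketch (illustrative only; the concrete numbers are arbitrary) shows how the constraints $m_0$ and $m_1$ are assembled from `variables` and `constant`, and how `evaluate` confirms that they vanish on a correct transition:

```python
# Sketch: the AGM transition constraints evaluate to zero on a consistent state transition.
field = Field(1 + 407 * ( 1 << 119 ))
X0, X1, Y0, Y1 = MPolynomial.variables(4, field)
one_half = MPolynomial.constant(FieldElement(2, field).inverse())

m0 = Y0 - (X0 + X1) * one_half    # Y0 - (X0 + X1)/2
m1 = (Y1^2) - X0 * X1             # Y1^2 - X0 * X1

# choose a transition that satisfies both constraints
x0 = FieldElement(9, field)
y1 = FieldElement(6, field)
x1 = (y1 * y1) / x0                      # ensures y1^2 = x0 * x1
y0 = (x0 + x1) / FieldElement(2, field)

point = [x0, x1, y0, y1]
assert m0.evaluate(point).is_zero()
assert m1.evaluate(point).is_zero()
```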
## The Fiat-Shamir Transform
In an interactive public coin protocol, the verifier's messages are pure randomness sampled from a distribution that *anyone* can sample from. The objective is to obtain a non-interactive protocol that proves the same thing, without sacrificing security. The Fiat-Shamir transform achieves this.

It turns out that, for security against malicious provers, generating the verifier's messages randomly as the interactive protocol stipulates is overkill. It is sufficient that the verifier's messages be difficult for the prover to predict. Hash functions are deterministic but still satisfy this property: their outputs are difficult to predict. So intuitively, the protocol remains secure if the verifier's authentic randomness is replaced by a hash function's pseudorandom output. It is necessary to restrict the prover's control over what input goes into the hash function, because otherwise he can grind until he finds a suitable output. It suffices to set the input to the transcript of the protocol up until the point where the verifier's message is needed.
This is exactly the intuition behind the Fiat-Shamir transform: replace the verifier's random messages by the hash of the transcript of the protocol up until those points. The *Fiat-Shamir heuristic* states that this transform retains security. In an idealized model of the hash function called the *random oracle model*, this security is provable.
The Fiat-Shamir transform presents the first engineering challenge. The interactive protocol is described in terms of a *channel* which passes messages from prover to verifier and the other way around. The transform serializes this communication while enabling a description of the prover that abstracts the channel away. The transform does modify the description of the verifier, which becomes deterministic.
A *proof stream* is a useful concept to simulate this channel. The difference with respect to regular streams in programming is that no actual transmission to another process or computer takes place, nor do sender and receiver need to operate simultaneously. It is not a simple queue either, because the prover and the verifier have access to a function that computes pseudorandomness by hashing their view of the channel. For the prover, this view is the entire list of all messages *sent* so far. For the verifier, this view is the sublist of messages *read* so far. The verifier's messages are not added to the list because they can be computed deterministically from it. Given the list of the prover's messages, serialization is straightforward. The non-interactive proof is exactly this serialization.

In terms of implementation, what is needed is a class `ProofStream` that supports 3 functionalities.
1. Pushing and pulling objects to and from a queue. The queue is simulated by a list with a read index. Whenever an item is pushed, it is appended. Whenever an item is pulled, the read index is incremented by one.
2. Serialization and deserialization. The amazing python library `pickle` does this.
3. Fiat-Shamir. Hashing is done below by first serializing the queue or the first part of it, and then applying SHAKE-256. SHAKE-256 admits a variable output length, which the particular application may want to set. By default the output length is set to 32 bytes.
```python
from hashlib import shake_256
import pickle as pickle # serialization

class ProofStream:
    def __init__( self ):
        self.objects = []
        self.read_index = 0

    def push( self, obj ):
        self.objects += [obj]

    def pull( self ):
        assert(self.read_index < len(self.objects)), "ProofStream: cannot pull object; queue empty."
        obj = self.objects[self.read_index]
        self.read_index += 1
        return obj

    def serialize( self ):
        return pickle.dumps(self.objects)

    def deserialize( bb ):
        ps = ProofStream()
        ps.objects = pickle.loads(bb)
        return ps

    def prover_fiat_shamir( self, num_bytes=32 ):
        return shake_256(self.serialize()).digest(num_bytes)

    def verifier_fiat_shamir( self, num_bytes=32 ):
        return shake_256(pickle.dumps(self.objects[:self.read_index])).digest(num_bytes)
```
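A short round trip (a usage sketch, not part of the protocol code) shows how the prover's and the verifier's views give rise to the same Fiat-Shamir challenge:

```python
# Sketch: the prover pushes a message, the verifier re-reads it, and both derive the same challenge.
prover_stream = ProofStream()
prover_stream.push("first message")
prover_challenge = prover_stream.prover_fiat_shamir()        # hash of everything sent so far

proof = prover_stream.serialize()                            # this is the non-interactive proof

verifier_stream = ProofStream.deserialize(proof)
assert verifier_stream.pull() == "first message"
verifier_challenge = verifier_stream.verifier_fiat_shamir()  # hash of everything read so far

assert prover_challenge == verifier_challenge
```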
## Merkle Tree
A [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree) is a vector commitment scheme built from a collision-resistant hash function[^4]. Specifically, it allows the user to commit to an array of $2^N$ items such that:
- The commitment is a single hash digest and this commitment is *binding* -- it represents the array in a way that prevents the user from changing it without first breaking the hash function;
- For any index $i \in \lbrace0, \ldots, 2^N-1\rbrace$, the value in location $i$ of the array represented by the commitment can be proven with $N$ more hash digests.
Specifically, every leaf of the binary tree represents the hash of a data element. Every non-leaf node represents the hash of the concatenation of its two children. The root of the tree is the commitment. A membership proof consists of all siblings of the nodes on the path from the indicated leaf to the root. This list of siblings is called an *authentication path*; it provides the verifier with a complete hash preimage at every step along the path -- $N$ in total -- leading to a final test against the root.

An implementation of this construct needs to provide three functionalities:
1. $\mathsf{commit}$ -- computes the Merkle root of a given array.
2. $\mathsf{open}$ -- computes the authentication path of an indicated leaf in the Merkle tree.
3. $\mathsf{verify}$ -- verifies that a given leaf is an element of the committed vector at the given index.
If performance is not an issue (and for this tutorial it is not), the recursive nature of these functionalities gives rise to a wonderfully functional implementation.
```python
from hashlib import blake2b

class Merkle:
    H = blake2b

    def commit_( leafs ):
        assert(len(leafs) & (len(leafs)-1) == 0), "length must be power of two"
        if len(leafs) == 1:
            return leafs[0]
        else:
            return Merkle.H(Merkle.commit_(leafs[:len(leafs)//2]) + Merkle.commit_(leafs[len(leafs)//2:])).digest()

    def open_( index, leafs ):
        assert(len(leafs) & (len(leafs)-1) == 0), "length must be power of two"
        assert(0 <= index and index < len(leafs)), "cannot open invalid index"
        if len(leafs) == 2:
            return [leafs[1 - index]]
        elif index < (len(leafs)//2):
            return Merkle.open_(index, leafs[:len(leafs)//2]) + [Merkle.commit_(leafs[len(leafs)//2:])]
        else:
            return Merkle.open_(index - len(leafs)//2, leafs[len(leafs)//2:]) + [Merkle.commit_(leafs[:len(leafs)//2])]

    def verify_( root, index, path, leaf ):
        assert(0 <= index and index < (1 << len(path))), "cannot verify invalid index"
        if len(path) == 1:
            if index == 0:
                return root == Merkle.H(leaf + path[0]).digest()
            else:
                return root == Merkle.H(path[0] + leaf).digest()
        else:
            if index % 2 == 0:
                return Merkle.verify_(root, index >> 1, path[1:], Merkle.H(leaf + path[0]).digest())
            else:
                return Merkle.verify_(root, index >> 1, path[1:], Merkle.H(path[0] + leaf).digest())
```
This functional implementation overlooks one important aspect: the data objects are rarely hash digests. So in order to use these functions in combination with real-world data, the real-world data elements must be hashed first. This hashing for preprocessing is part of the Merkle tree logic, so the Merkle tree module needs to be extended to accommodate this.
```python
def commit( data_array ):
    return Merkle.commit_([Merkle.H(bytes(da)).digest() for da in data_array])

def open( index, data_array ):
    return Merkle.open_(index, [Merkle.H(bytes(da)).digest() for da in data_array])

def verify( root, index, path, data_element ):
    return Merkle.verify_(root, index, path, Merkle.H(bytes(data_element)).digest())
```
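A small usage sketch (illustrative only) ties the three functionalities together:

```python
# Sketch: commit to eight data elements, open index 5, and verify the authentication path.
data = [bytes([i]) * 4 for i in range(8)]        # arbitrary data; the length must be a power of two

root = Merkle.commit(data)
path = Merkle.open(5, data)

assert Merkle.verify(root, 5, path, data[5])         # the correct leaf and index verify
assert not Merkle.verify(root, 4, path, data[5])     # the same path fails for the wrong index
```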
[0](index) - [1](overview) - **2** - [3](fri) - [4](stark) - [5](rescue-prime) - [6](faster)
[^1]: Actually, an [amazing new paper](https://arxiv.org/pdf/2107.08473.pdf) by the StarkWare team shows how to apply the same techniques in *any* finite field, whether it has the requisite structure or not. This tutorial explains the construction the simple way, using structured finite fields.
[^2]: A *monic* polynomial is one whose leading coefficient is one.
[^3]: Never mind that it does not make any sense to prove the correct computation of the algebraic-geometric mean of finite field elements; it serves the purpose of illustration.
[^4]: In some cases, such as hash-based signatures, collision-resistance may be overkill and a more basic security notion such as second-preimage resistance may suffice.
================================================
FILE: docs/faster.md
================================================
# Anatomy of a STARK, Part 6: Speeding Things Up
The previous part of this tutorial posed the question whether maths-level improvements can reduce the running times of the STARK algorithms. Indeed they can! There are folklore computational algebra tricks that are independent of the STARK machinery, as well as some techniques specific to interactive proof systems.
## The Number Theoretic Transform and its Applications
### The Fast Fourier Transform
Let $f(X)$ be a polynomial of degree at most $2^k - 1$ with complex numbers as coefficients. What is the most efficient way to find the list of evaluations of $f(X)$ on the $2^k$ complex $2^k$th roots of unity? Specifically, let $\omega = e^{2 \pi i / 2^k}$, then the output of the algorithm should be $(f(\omega^i))_{i=0}^{2^k-1} = (f(1), f(\omega), f(\omega^2), \ldots, f(\omega^{2^k-1}))$.
The naïve solution is to sequentially compute each evaluation individually. A more intelligent solution relies on the observation that $f(\omega^i) = \sum_{j=0}^{2^k-1} \omega^{ij} f_j$ and splitting the even and odd terms gives
$$ f(\omega^i) = \sum_{j=0}^{2^{k-1}-1} \omega^{i(2j)}f_{2j} + \sum_{j=0}^{2^{k-1}-1} \omega^{i(2j+1)} f_{2j+1} \\
= \sum_{j=0}^{2^{k-1}-1} \omega^{i(2j)}f_{2j} + \omega^i \cdot \sum_{j=0}^{2^{k-1}-1} \omega^{i(2j)} f_{2j+1} \\
= f_E(\omega^{2i}) + \omega^i \cdot f_O(\omega^{2i}) \enspace , $$
where $f_E(X)$ and $f_O(X)$ are the polynomials whose coefficients are the even coefficients, and odd coefficients respectively, of $f(X)$.
In other words, the evaluation of $f(X)$ at $\omega^i$ can be described in terms of the evaluations of $f_E(X)$ and $f_O(X)$ at $\omega^{2i}$. The same is true for a batch of points $\lbrace\omega^{ij}\rbrace_ {j=0}^{2^k-1}$, in which case the values of $f_E(X)$ and $f_O(X)$ on a domain of only half the size are needed: $\lbrace(\omega^{ij})^2\rbrace_ {j=0}^{2^k-1} = \lbrace(\omega^{2i})^j\rbrace_ {j=0}^{2^{k-1}-1}$. Note that tasks of batch-evaluating $f_E(X)$ and $f_O(X)$ are independent tasks of half the size. This screams divide and conquer! Specifically, the following strategy suggests itself:
- split the coefficient vector into even and odd parts;
- evaluate $f_E(X)$ on $\lbrace(\omega^{2i})^j\rbrace_{j=0}^{2^{k-1}-1}$ by recursion;
- evaluate $f_O(X)$ on $\lbrace(\omega^{2i})^j\rbrace_{j=0}^{2^{k-1}-1}$ by recursion;
- merge the evaluation vectors using the formula $f(\omega^i) = f_E(\omega^{2i}) + \omega^i \cdot f_O(\omega^{2i})$.
Voilà! That's the fast Fourier transform (FFT). The reason why the $2^k$th root of unity is needed is that it guarantees that $\lbrace(\omega^{ij})^2\rbrace_ {j=0}^{2^k-1} = \lbrace(\omega^{2i})^j\rbrace_ {j=0}^{2^{k-1}-1}$, and so the recursion really is on a domain of half the size. Phrased differently, if you were to use a similar strategy to evaluate $f(X)$ in $\lbrace z^j\rbrace_{j=0}^{2^k-1}$ where $z$ is not a primitive $2^k$th root of unity then the evaluation domain would not shrink with every recursion step. There are $k$ recursion steps, and at each level there are $2^k$ multiplications and additions, so the complexity of this algorithm is $O(2^k \cdot k)$, or expressed in terms of the length of the coefficient vector $N = 2^k$, $O(N \cdot \log N)$. A lot faster than the $O(N^2)$ complexity of the naïve sequential algorithm.
Note that the only property that we need from $\omega$ is that the set of squares of $\lbrace\omega^j\rbrace_{j=0}^{2^k-1}$ is a set of half the size. The number $\omega$ satisfies this property because $\omega^{2^{k-1}+i} = -\omega^i$. Importantly, $\omega$ does not need to be a complex number as long as it satisfies this property. In fact, whenever a finite field has a subgroup of order $2^k$, this subgroup is generated by some $\omega$, and this $\omega$ can be used in exactly the same way. The resulting algorithm is a finite field analogue of the FFT, sometimes called the *Number Theoretic Transform (NTT)*.
```python
def ntt( primitive_root, values ):
    assert(len(values) & (len(values) - 1) == 0), "cannot compute ntt of non-power-of-two sequence"
    if len(values) <= 1:
        return values

    field = values[0].field
    assert(primitive_root^len(values) == field.one()), "primitive root must be nth root of unity, where n is len(values)"
    assert(primitive_root^(len(values)//2) != field.one()), "primitive root is not primitive nth root of unity, where n is len(values)"

    half = len(values) // 2

    odds = ntt(primitive_root^2, values[1::2])
    evens = ntt(primitive_root^2, values[::2])

    return [evens[i % half] + (primitive_root^i) * odds[i % half] for i in range(len(values))]
```
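The NTT computes exactly the same thing as evaluating the polynomial on the powers of the primitive root, only faster. A quick consistency check (a sketch relying on the field and polynomial classes from part 2):

```python
# Sketch: the NTT of a coefficient vector equals the evaluations on the subgroup generated by omega.
field = Field(1 + 407 * ( 1 << 119 ))
n = 8
omega = field.primitive_nth_root(n)

coefficients = [FieldElement(c, field) for c in [3, 1, 4, 1, 5, 9, 2, 6]]
polynomial = Polynomial(coefficients)
domain = [omega^i for i in range(n)]

assert ntt(omega, coefficients) == polynomial.evaluate_domain(domain)
```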
The real magic comes into play when we apply the FFT (or NTT) twice, but use the inverse of $\omega$ for the second layer. Specifically, what happens if we treat the list of evaluations as a list of polynomial coefficients, and evaluate this polynomial in the $2^k$th roots of unity, in opposite order?
Recall that the $i$th coefficient of the Fourier transform is $f(\omega^i) = \sum_{j=0}^{2^k-1} f_j \omega^{ij}$. So the $l$th coefficient of the double Fourier transform is
$$ \sum_{i=0}^{2^k-1} f(\omega^i) \omega^{-il} = \sum_{i=0}^{2^k-1} \left( \sum_{j=0}^{2^k-1} f_j \omega^{ij} \right) \omega^{-il} \\
= \sum_{j=0}^{2^k-1} f_j \sum_{i=0}^{2^k-1} \omega^{i(l-j)} \enspace .$$
Whenever $l-j \neq 0$, the sum $\sum_{i=0}^{2^k-1} \omega^{i(l-j)}$ vanishes. To see this, recall that $\omega^{2^{k-1} + i} = -\omega^i$ for all $i$, so when $l-j$ is odd every term in this sum has an equal and opposite term that cancels it; when $l-j$ is even but nonzero, the same argument applies recursively to the half-size sum over powers of $\omega^2$. So in the formula above, the only coefficient $f_j$ that is multiplied by a nonzero sum is $f_{l}$, and in fact this sum is $\sum_{i=0}^{2^k-1}1 = 2^k$. So in summary, the $l$th coefficient of the double Fourier transform of $\mathbf{f}$ is $2^k \cdot f_{l}$, which is the same as the $l$th coefficient of $\mathbf{f}$ but scaled by a factor $2^k$.
What was derived was an inverse fast Fourier transform. Specifically, this inverse is the same as the regular fast Fourier transform, except:
- it uses $\omega^{-1}$ instead of $\omega$; and
- it needs to undo the scaling factor $2^k$ on every coefficient.
Once again, the logic applies to finite fields that are equipped with a subgroup of order $2^k$ without any change, resulting in the inverse NTT.
```python
def intt( primitive_root, values ):
    assert(len(values) & (len(values) - 1) == 0), "cannot compute intt of non-power-of-two sequence"
    if len(values) == 1:
        return values

    field = values[0].field
    ninv = FieldElement(len(values), field).inverse()

    transformed_values = ntt(primitive_root.inverse(), values)
    return [ninv*tv for tv in transformed_values]
```
### Fast Polynomial Arithmetic
The NTT is popular in computer algebra because the Fourier transform induces a homomorphism between polynomials and their values. Specifically, multiplication of polynomials corresponds to element-wise multiplication of their Fourier transforms. To see why this is true, remember that the Fourier transform represents the *evaluations* of a polynomial: clearly, the evaluation of $h(X) = f(X) \cdot g(X)$ in any point $z$ is the product of the evaluations of $f(X)$ and $g(X)$ in $z$. As long as $\mathsf{deg}(h(X)) < 2^k$, we can compute this product by:
- computing the NTT;
- multiplying the resulting vectors element-wise; and
- computing the inverse NTT.
```python
def fast_multiply( lhs, rhs, primitive_root, root_order ):
    assert(primitive_root^root_order == primitive_root.field.one()), "supplied root does not have supplied order"
    assert(primitive_root^(root_order//2) != primitive_root.field.one()), "supplied root is not primitive root of supplied order"

    if lhs.is_zero() or rhs.is_zero():
        return Polynomial([])

    field = lhs.coefficients[0].field
    root = primitive_root
    order = root_order
    degree = lhs.degree() + rhs.degree()

    if degree < 8:
        return lhs * rhs

    while degree < order // 2:
        root = root^2
        order = order // 2

    lhs_coefficients = lhs.coefficients[:(lhs.degree()+1)]
    while len(lhs_coefficients) < order:
        lhs_coefficients += [field.zero()]
    rhs_coefficients = rhs.coefficients[:(rhs.degree()+1)]
    while len(rhs_coefficients) < order:
        rhs_coefficients += [field.zero()]

    lhs_codeword = ntt(root, lhs_coefficients)
    rhs_codeword = ntt(root, rhs_coefficients)

    hadamard_product = [l * r for (l, r) in zip(lhs_codeword, rhs_codeword)]

    product_coefficients = intt(root, hadamard_product)
    return Polynomial(product_coefficients[0:(degree+1)])
```
Fast multiplication serves as the basis for a bunch of fast polynomial arithmetic algorithms. Of particular interest to this tutorial is the calculation of *zerofiers* -- the polynomials that vanish on a given list of points called the *domain*. For this task, the divide-and-conquer strategy suggests itself:
- divide the domain into two equal parts;
- compute the zerofiers for the two parts separately; and
- multiply the zerofiers using fast multiplication.
```python
def fast_zerofier( domain, primitive_root, root_order ):
    assert(primitive_root^root_order == primitive_root.field.one()), "supplied root does not have supplied order"
    assert(primitive_root^(root_order//2) != primitive_root.field.one()), "supplied root is not primitive root of supplied order"

    if len(domain) == 0:
        return Polynomial([])
    if len(domain) == 1:
        return Polynomial([-domain[0], primitive_root.field.one()])

    half = len(domain) // 2

    left = fast_zerofier(domain[:half], primitive_root, root_order)
    right = fast_zerofier(domain[half:], primitive_root, root_order)
    return fast_multiply(left, right, primitive_root, root_order)
```
Another task benefiting from fast multiplication (not to mention fast zerofier calculation) is batch evaluation in an arbitrary domain. The idea behind the algorithm is to progressively reduce the given polynomial to a new polynomial that takes the same values on a subset of the domain. The term "reduce" is not a metaphor -- it is polynomial reduction modulo the zerofier for that domain. So this gives rise to another divide-and-conquer algorithm:
- divide the domain into two halves, left and right;
- compute the zerofier for each half;
- reduce the polynomial modulo left zerofier and modulo right zerofier;
- batch-evaluate the left remainder in the left domain half and the right remainder in the right domain half;
- concatenate the vectors of evaluation.
Note that the zerofiers, which are calculated by another divide-and-conquer algorithm, are used in the opposite order to how they are produced. A slightly more complex algorithm makes use of memoization for a performance boost.
```python
def fast_evaluate( polynomial, domain, primitive_root, root_order ):
    assert(primitive_root^root_order == primitive_root.field.one()), "supplied root does not have supplied order"
    assert(primitive_root^(root_order//2) != primitive_root.field.one()), "supplied root is not primitive root of supplied order"

    if len(domain) == 0:
        return []
    if len(domain) == 1:
        return [polynomial.evaluate(domain[0])]

    half = len(domain) // 2

    left_zerofier = fast_zerofier(domain[:half], primitive_root, root_order)
    right_zerofier = fast_zerofier(domain[half:], primitive_root, root_order)

    left = fast_evaluate(polynomial % left_zerofier, domain[:half], primitive_root, root_order)
    right = fast_evaluate(polynomial % right_zerofier, domain[half:], primitive_root, root_order)

    return left + right
```
Let's now turn to the opposite of evaluation -- polynomial interpolation. Ideally, we would like to apply another divide-and-conquer strategy, but it's tricky. We can divide the set of points into two halves and find the interpolants for each, but then how do we combine them?
How about finding the polynomial that passes through the left half of points, and takes the value 0 in the x-coordinates of the right half, and vice versa? This is certainly progress because adding them will give the desired interpolant. However, this is no longer a divide-and-conquer algorithm because after one recursion step the magnitude of the problem is still the same.
What if we find the interpolant through the left half of points, and multiply it by the zerofier of right half's x-coordinates? Close, but no cigar: the zerofier will take values different from 1 on the left x-coordinates, meaning that multiplication will destroy the information embedded in the left interpolant.
But the right zerofier's values in the left x-coordinates are not random, and can be predicted simply by calculating the right zerofier and batch-evaluating it in the left x-coordinates. What needs to be done is to find the polynomial that passes through points whose x-coordinates correspond to the left half of points, and whose y-coordinates anticipate multiplication by the zerofier. These are just the left y-coordinates, divided by values of the right zerofier in the matching x-coordinates.
```python
def fast_interpolate( domain, values, primitive_root, root_order ):
    assert(primitive_root^root_order == primitive_root.field.one()), "supplied root does not have supplied order"
    assert(primitive_root^(root_order//2) != primitive_root.field.one()), "supplied root is not primitive root of supplied order"
    assert(len(domain) == len(values)), "cannot interpolate over domain of different length than values list"

    if len(domain) == 0:
        return Polynomial([])
    if len(domain) == 1:
        return Polynomial([values[0]])

    half = len(domain) // 2

    left_zerofier = fast_zerofier(domain[:half], primitive_root, root_order)
    right_zerofier = fast_zerofier(domain[half:], primitive_root, root_order)

    left_offset = fast_evaluate(right_zerofier, domain[:half], primitive_root, root_order)
    right_offset = fast_evaluate(left_zerofier, domain[half:], primitive_root, root_order)

    # debug output: a zero offset means two domain points coincide and the division below would fail
    if not all(not v.is_zero() for v in left_offset):
        print("left_offset:", " ".join(str(v) for v in left_offset))

    left_targets = [n / d for (n,d) in zip(values[:half], left_offset)]
    right_targets = [n / d for (n,d) in zip(values[half:], right_offset)]

    left_interpolant = fast_interpolate(domain[:half], left_targets, primitive_root, root_order)
    right_interpolant = fast_interpolate(domain[half:], right_targets, primitive_root, root_order)

    return left_interpolant * right_zerofier + right_interpolant * left_zerofier
```
Next up: fast evaluation on a coset. This task is needed in the STARK pipeline when transforming a polynomial into a codeword to be input to FRI. It is possible to solve this problem using fast batch-evaluation on arbitrary domains. However, when the given domain coincides with a coset of order $2^k$, it would be a shame not to use the NTT directly. The only question is how to shift the domain of evaluation. This is precisely what polynomial scaling achieves.
```python
def fast_coset_evaluate( polynomial, offset, generator, order ):
    scaled_polynomial = polynomial.scale(offset)
    values = ntt(generator, scaled_polynomial.coefficients + [offset.field.zero()] * (order - len(polynomial.coefficients)))
    return values
```
Fast evaluation on a coset allows us to answer a pesky problem that arises when adapting the fast multiplication procedure to divide instead of multiply. Where fast multiplication used element-wise multiplication on codewords, fast division uses element-wise division on codewords, where the codewords are obtained by applying the NTT to the polynomials' coefficient vectors. The problem is this: what happens when the divisor codeword is zero in a given location? If the numerator codeword is not zero in that location, then the division is unclean and has a nonzero remainder. In this case the entire operation can be flagged as erroneous. However, there can still be clean division if the numerator is also zero in the given location. The naïve fast division algorithm fails because of a zero-divided-by-zero error, even though the underlying polynomials generate a clean division. This is exactly the problem that occurs when attempting to use NTTs to divide out the zerofiers. We got around this problem in the previous part of the tutorial by using polynomial long division instead, but this solution has a *quadratic* running time. We want quasilinear!
The solution is to perform the element-wise division on codewords arising from evaluation on a coset of the group over which the NTT is defined. Specifically, the procedure involves five steps:
- scale,
- NTT,
- element-wise divide,
- inverse NTT, and
- unscale.
This solution only works if the denominator polynomials do not have any zeros on the coset. However, in some cases (like dividing out zerofiers), the denominator is *known* not to have zeros on a particular coset.
The Python code has a lot of boilerplate to deal with special circumstances, but in the end it boils down to those five steps.
```python
def fast_coset_divide( lhs, rhs, offset, primitive_root, root_order ): # clean division only!
    assert(primitive_root^root_order == primitive_root.field.one()), "supplied root does not have supplied order"
    assert(primitive_root^(root_order//2) != primitive_root.field.one()), "supplied root is not primitive root of supplied order"
    assert(not rhs.is_zero()), "cannot divide by zero polynomial"

    if lhs.is_zero():
        return Polynomial([])

    assert(rhs.degree() <= lhs.degree()), "cannot divide by polynomial of larger degree"

    field = lhs.coefficients[0].field
    root = primitive_root
    order = root_order
    degree = max(lhs.degree(), rhs.degree())

    if degree < 8:
        return lhs / rhs

    while degree < order // 2:
        root = root^2
        order = order // 2

    scaled_lhs = lhs.scale(offset)
    scaled_rhs = rhs.scale(offset)

    lhs_coefficients = scaled_lhs.coefficients[:(lhs.degree()+1)]
    while len(lhs_coefficients) < order:
        lhs_coefficients += [field.zero()]
    rhs_coefficients = scaled_rhs.coefficients[:(rhs.degree()+1)]
    while len(rhs_coefficients) < order:
        rhs_coefficients += [field.zero()]

    lhs_codeword = ntt(root, lhs_coefficients)
    rhs_codeword = ntt(root, rhs_coefficients)

    quotient_codeword = [l / r for (l, r) in zip(lhs_codeword, rhs_codeword)]
    scaled_quotient_coefficients = intt(root, quotient_codeword)
    scaled_quotient = Polynomial(scaled_quotient_coefficients[:(lhs.degree() - rhs.degree() + 1)])

    return scaled_quotient.scale(offset.inverse())
```
## Fast Zerofier Evaluation
The algorithms described above chiefly apply to the prover, whose complexity drops from $O(T^2)$ to $O(T \log T)$. Scalability for the prover is achieved. The verifier's bottleneck is the evaluation of the transition zerofier, which is in general a dense polynomial of degree $T$. As a result, roughly $T$ coefficients will be possibly nonzero, and since the verifier must touch all of them to compute the polynomial's value, his running time will be on the same order of magnitude. For scalable verifiers, we need a running time of at most $\tilde{O}(\log T)$. There are two strategies to achieve this: (1) sparse zerofiers based on group theory and (2) preprocessed dense zerofiers.
### Sparse Zerofiers with Group Theory
It is an elementary fact of group theory that every element raised to its order gives the identity. For example, an element $x$ of the subgroup of order $r$ of the multiplicative group of a finite field $\mathbb{F}_ p \backslash \lbrace 0 \rbrace$ satisfies $x^r = 1$. Rearranging, and replacing $x$ with a formal indeterminate $X$, we get a polynomial
$$ X^r - 1 $$
that is guaranteed to evaluate to zero in every element of the order-$r$ subgroup. Furthermore, this polynomial is monic (*i.e.*, the leading coefficient is one) and of minimal degree (across all polynomials that vanish on all $r$ points of the subgroup). Therefore, this sparse polynomial is exactly the zerofier for the subgroup!
For STARKs, we are already using finite fields that come with subgroups of order $2^k$ for many $k$. Therefore, if the execution trace is interpolated over $\lbrace \omicron^i \, \vert \, 0 \leq i < 2^k \rbrace$ where $\omicron$ is a generator of the subgroup of order $2^k$, then the zerofier for $\lbrace \omicron^i \, \vert \, 0 \leq i < 2^k - 1\rbrace$ is equal to the rational expression
$$ \frac{X^{2^k} - 1}{X - \omicron^{-1}} $$
in all points $X$ except for $X = \omicron^{-1}$, where the rational expression is undefined.
The verifier obviously does not perform the division because it turns a sparse polynomial into a dense one. Instead, the verifier evaluates the numerator sparsely and divides it by the value of the denominator. This works as long as the verifier does not need to evaluate the zerofier in $\omicron^{-1}$, which is precisely what the coset-trick of FRI guarantees.
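Concretely, the verifier's work per query then amounts to one modular exponentiation and one field division. A minimal sketch (the helper name `evaluate_transition_zerofier` is hypothetical; the code supporting this tutorial takes the preprocessing route described below instead):

```python
# Sketch: evaluate the zerofier for {omicron^i | 0 <= i < 2^k - 1} at a single point.
# The point is assumed to lie on the FRI coset and therefore never equals omicron^(-1).
def evaluate_transition_zerofier( point, omicron, k ):
    numerator = (point^(1 << k)) - point.field.one()   # X^(2^k) - 1 is sparse: one exponentiation
    denominator = point - omicron.inverse()            # excludes the last row of the trace domain
    return numerator / denominator
```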
To apply this strategy, the STARK trace length must be a power of 2. If the trace is far from a power of two, say by a difference of $d$, then the verifier needs to evaluate a zerofier that has $d-1$ factors in the denominator. In other words, *the trace length must be a power of two in order for the verifier to be fast*.
The solution is to pad the trace until its length is the next power of 2. Clearly this padding must be compatible with the transition constraints so that the composition polynomials still evaluate to zero on all (but one point) of the power-of-two subgroup. The natural solution is to apply the same transition function for a power of two number of cycles, and have the boundary conditions refer to the "output" whose cycle index is somewhere in the middle. However, this design decision introduces a problem when it comes to appending randomizers to the trace for the purpose of leaking zero knowledge.
- If the randomizers are appended after padding the trace, then the randomized trace does not fit into the power-of-two subgroup. In this case the interpolant must be computed such that:
- over the power-of-two subgroup it evaluates to the execution trace; and
- over a distinct domain it evaluates to the uniformly random randomizers.
- If the randomizers are appended before padding, then the transition constraints must be compatible with this operation, or else the composition polynomials will not evaluate to zero in the entire power-of-two subgroup. This option requires changing the AIR.
### Preprocessing
Where a standard IOP consists of two parties, the prover and the verifier, a *Preprocessing IOP* consists of three: a prover, a verifier, and an *indexer*. (The indexer is sometimes also called the *preprocessor* or the *helper*.)
The role of the indexer is to perform computations that help the verifier (not to mention the prover) but that are too expensive for the verifier to perform directly. The catch is that the indexer does not receive the same input as the verifier does. The indexer's input (the *index*) is information about the computation that can be computed ahead of time, before specific data is known. For example, the index could be the number of cycles that the computation is supposed to take, along with the transition constraints. The specific information about the computation, or *instance*, would be the boundary constraints. The verifier's input is the instance as well as the indexer's output (which itself may include the index). The point is that from the verifier's point of view, the indexer's output is trusted.

The formal definition of STARKs does not capture proof systems with preprocessing, and when counting the indexer's work as verifier work, a proof system with preprocessing is arguably not scalable. Nevertheless, a preprocessing proof system can be scalable in the English sense of the word if the verifier's work (not counting that of the indexer) is polylogarithmic in the size of the computation.
### Preprocessed Dense Zerofiers
Concretely, the indexer's output to the verifier will be a commitment to the zerofier $Z(X) = \prod_{i=0}^{T-1} (X-\omicron^i)$ via the familiar Merkle root of Reed-Solomon codeword construction. Whenever the verifier needs the value of this zerofier in a point, the prover supplies this leaf along with an authentication path. Note that the verifier does not need to evaluate the zerofier in points outside the FRI domain. As a result, there is no need to prove that the zerofier has a low degree; it comes straight from the trusted indexer.
This description highlights the main drawback of using preprocessing to achieve scalability: the proof is larger because it includes more Merkle authentication paths. Another drawback is the slightly stronger security model: the verifier needs to trust the indexer's output. Even though the preprocessing is transparent here, re-running the indexer in order to justify this trust might be prohibitively expensive. The code supporting this tutorial achieves scalability through preprocessing as opposed to group theory.
### Variable Execution Times
The solution described above works perfectly fine if the execution time $T$ is known beforehand. What to do, however, when the execution time is not known beforehand, and thus cannot be included in the index?
Preprocessing still holds a solution, but at the cost of a slightly more expensive verifier. The indexer commits to each member of a family of zerofiers $\{Z_ {2^k}(X)\}_ k$ where $Z_{2^k}(X) = \prod_{i=0}^{2^k-1} (X - \omicron^i)$. Let $t = \lfloor \log_2 T \rfloor$ such that $Z_{2^t}(X)$ belongs to this family.
The prover wishes to show that a certain transition polynomial $p(X)$ evaluates to zero on $\{\omicron^i\}_ {i=0}^{T-1}$. Without preprocessing, he would commit to and prove the bounded degree of a quotient polynomial $q(X) = p(X) / Z_{T}(X)$, where $Z_{T}(X) = \prod_{i=0}^{T-1} (X - \omicron^i)$. With preprocessing, he must commit to and prove the bounded degree of two quotient polynomials:
1. $q_l(X) = \frac{p(X) }{ Z_{2^t}(X)}$ and
2. $q_r(X) = \frac{p(X) }{\omicron^{T-1-2^t} \cdot Z_{2^t}(\omicron^{2^t-T+1} \cdot X)}$.
The denominator of the second polynomial is exactly the zerofier $\prod_{i=T-1-2^t}^{T-1} (X - \omicron^i)$. The transition polynomial is divisible by both zerofiers if and only if it is divisible by the union zerofier $\prod_{i=0}^{T-1} (X - \omicron^i)$.
While this solution works adequately in the general case, for the Rescue-Prime computation, the cycle count is known. Therefore, the implementation reflects this setting.
## Fast STARKs
Now it is time to apply the developed tools to make the STARK algorithmically efficient.
First, add a preprocessing function. This function is a member of the STARK class with access to its fields (such as the number of cycles). It produces two outputs: one for the prover, and one for the verifier. In this concrete case, the prover receives the zerofier polynomial and zerofier codeword, and the verifier receives the zerofier Merkle root.
```python
# class FastStark:
# [...]
    def preprocess( self ):
        transition_zerofier = fast_zerofier(self.omicron_domain[:(self.original_trace_length-1)], self.omicron, len(self.omicron_domain))
        transition_zerofier_codeword = fast_coset_evaluate(transition_zerofier, self.generator, self.omega, self.fri.domain_length)
        transition_zerofier_root = Merkle.commit(transition_zerofier_codeword)
        return transition_zerofier, transition_zerofier_codeword, transition_zerofier_root
```
The argument lists of `prove` and `verify` must be adapted accordingly.
```python
# class FastStark:
# [...]
    def prove( self, trace, transition_constraints, boundary, transition_zerofier, transition_zerofier_codeword, proof_stream=None ):
        # [...]

    def verify( self, proof, transition_constraints, boundary, transition_zerofier_root, proof_stream=None ):
```
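Putting the pieces together, the intended workflow is sketched below. The sketch assumes a `FastStark` object `fast_stark` as well as the trace, transition constraints, and boundary conditions from the previous parts; it is illustrative rather than part of the supporting code.
```python
# the indexer runs once, ahead of time
transition_zerofier, transition_zerofier_codeword, transition_zerofier_root = fast_stark.preprocess()

# the prover receives the zerofier polynomial and its codeword ...
proof = fast_stark.prove(trace, transition_constraints, boundary, transition_zerofier, transition_zerofier_codeword)

# ... whereas the verifier only receives the Merkle root
assert fast_stark.verify(proof, transition_constraints, boundary, transition_zerofier_root)
```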
The prover can use fast coset division to divide out the transition zerofier; note that the denominator is exactly the zerofier that was received as an argument.
```python
# class FastStark:
# [...]
    # def prove( [..] ):
        # [...]
        # divide out zerofier
        transition_quotients = [fast_coset_divide(tp, transition_zerofier, self.generator, self.omicron, self.omicron_domain_length) for tp in transition_polynomials]
```
The verifier needs to perform this division in a number of locations, which means that he needs the value of the zerofier in those locations. Therefore, the prover must provide these values along with their authentication paths.
```python
# class FastStark:
# [...]
    # def prove( [..] ):
        # [...]
        # ... and also in the zerofier!
        for i in quadrupled_indices:
            proof_stream.push(transition_zerofier_codeword[i])
            path = Merkle.open(i, transition_zerofier_codeword)
            proof_stream.push(path)
```
The verifier, in turn, needs to read these values and their authentication paths from the proof stream before verifying the authentication paths and storing the zerofier values in a structure for later use. Note that these authentication paths are verified against the Merkle root, which is the new input to the verifier.
```python
# class FastStark:
# [...]
    # def verify( [..] ):
        # [...]
        # read and verify transition zerofier leafs
        transition_zerofier = dict()
        for i in duplicated_indices:
            transition_zerofier[i] = proof_stream.pull()
            path = proof_stream.pull()
            verifier_accepts = verifier_accepts and Merkle.verify(transition_zerofier_root, i, path, transition_zerofier[i])
            if not verifier_accepts:
                return False
```
Finally, when the nonlinear combination is computed, these values can be read from memory and used.
```python
# class FastStark:
# [...]
    # def verify( [..] ):
        # [...]
        quotient = tcv / transition_zerofier[current_index]
```
At this point, what remains is to switch to fast polynomial arithmetic outside the context of preprocessing. The first opportunity is interpolating the trace.
```python
# class FastStark:
# [...]
    # def prove( [..] ):
        # [...]
        trace_polynomials = trace_polynomials + [fast_interpolate(trace_domain, single_trace, self.omicron, self.omicron_domain_length)]
```
Next, when committing to the boundary quotients, use fast coset evaluation. Same goes for the randomizer polynomial and the combination polynomial.
```python
# class FastStark:
# [...]
    # def prove( [..] ):
        # [...]
        # commit to boundary quotients
        # [...]
        for s in range(self.num_registers):
            boundary_quotient_codewords = boundary_quotient_codewords + [fast_coset_evaluate(boundary_quotients[s], self.generator, self.omega, self.fri_domain_length)]
            merkle_root = Merkle.commit(boundary_quotient_codewords[s])
            proof_stream.push(merkle_root)
        # [...]
        # commit to randomizer polynomial
        randomizer_polynomial = Polynomial([self.field.sample(os.urandom(17)) for i in range(self.max_degree(transition_constraints)+1)])
        randomizer_codeword = fast_coset_evaluate(randomizer_polynomial, self.generator, self.omega, self.fri_domain_length)
        randomizer_root = Merkle.commit(randomizer_codeword)
        proof_stream.push(randomizer_root)
        # [...]
        # compute matching codeword
        combined_codeword = fast_coset_evaluate(combination, self.generator, self.omega, self.fri_domain_length)
```
Dividing out the transition zerofier is a pretty intense task. It pays to switch to NTT-based division. Note that coset division is needed here, since the zerofier definitely takes the value zero on points of the trace domain.
```python
# divide out zerofier
transition_quotients = [fast_coset_divide(tp, transition_zerofier, self.generator, self.omicron, self.omicron_domain_length) for tp in transition_polynomials]
```
Lastly, in the FRI verifier, switch out the slow Lagrange interpolation for the much faster (coset) NTT based interpolation.
```python
# class Fri:
# [...]
    # def verify( [..] ):
        # [...]
        # compute interpolant
        last_domain = [last_offset * (last_omega^i) for i in range(len(last_codeword))]
        coefficients = intt(last_omega, last_codeword)
        poly = Polynomial(coefficients).scale(last_offset.inverse())
```
Modifying the Rescue-Prime signature scheme to use the new `FastStark` class and methods gives rise to a significantly faster signature scheme.
- secret key size: 16 bytes (yay!)
- public key size: 16 bytes (yay!)
- signature size: **~160 kB**
- keygen time: 0.01 seconds (acceptable)
- signing time: **72 seconds**
- verification time: **8 seconds**
How's that for an improvement? The proof is larger because there are many more Merkle paths associated with zerofier leafs, but in exchange for this larger proof, verification is an order of magnitude faster. Of course there is no shortage of further improvements, but those are beyond the scope of this tutorial and left as exercises to the reader.
[0](index) - [1](overview) - [2](basic-tools) - [3](fri) - [4](stark) - [5](rescue-prime) - **6**
================================================
FILE: docs/fri.md
================================================
# Anatomy of a STARK, Part 3: FRI
FRI is a protocol that establishes that a committed polynomial has a bounded degree. The acronym FRI stands for *Fast Reed-Solomon IOP of Proximity*, where IOP stands for *interactive oracle proof*. FRI is presented in the language of codewords: the prover sends codewords to the verifier who does not read them whole but who makes oracle-queries to read them in select locations. The codewords in this protocol are *Reed-Solomon codewords*, meaning that their values correspond to the evaluation of some low-degree polynomial in a list of points called the domain $D$. The length of this list is larger than the number of possibly nonzero coefficients in the polynomial by a factor called the *expansion factor* (also *blowup factor*), which is the reciprocal of the code's *rate* $\rho$.
Since the codewords represent low-degree polynomials, and since the codewords are hidden behind Merkle trees in any real-world deployment, it is arguably more natural to present FRI from the point of view of a polynomial commitment scheme, with some caveats. There is scientific merit in separating the type of codewords from the IOP, and those two from the Merkle tree that simulates the oracles. However, from an accessibility point of view, it is beneficial to consider them as three components of one basic primitive that relates to polynomial commitment schemes. For the remainder of this tutorial, we will use the term FRI in this sense.
In a regular polynomial commitment scheme, a prover commits to a polynomial $f(X)$ that is later opened at a given point $z$ such that it cannot equivocate between two different values of $f(z)$. The scheme consists of three algorithms:
- $\mathsf{commit}$, which computes a binding commitment from the polynomial;
- $\mathsf{open}$, which produces a proof that $f(z) = y$ for some $z$ and for the polynomial $f(X)$ that matches with the given commitment;
- $\mathsf{verify}$, which verifies the proof produced by $\mathsf{open}$.
The FRI scheme has a different interface, but a later section shows how it can simulate the standard polynomial commitment scheme interface without much overhead. FRI is a protocol between a prover and a verifier, which establishes that a given codeword belongs to a polynomial of low degree -- low meaning at most $\rho$ times the length of the codeword. Without losing much generality[^1], the prover knows this codeword explicitly, whereas the verifier knows only its Merkle root and leafs of his choosing, assuming the successful validation of the authentication paths that establish the leafs' membership to the Merkle tree.
## Split-and-Fold
One of the great ideas for proof systems in recent years is the *split-and-fold* technique. The idea is to reduce a claim to two claims of half the size. Then both claims are merged into one using random weights supplied by the verifier. After logarithmically many steps (as a function of the size of the original claim) the claim has been reduced to one of a trivial size, which is true if and only if (modulo some negligible security degradation) the original claim was true.
In the case of FRI, this computational claim asserts that the given codeword corresponds to a polynomial of low degree. Specifically, let $N$ be the length of the codeword, and $d$ be the maximum degree of the polynomial that it corresponds[^2] to. Let this polynomial be $f(X) = \sum_{i=0}^{d} c_i X^i$.
Following the divide-and-conquer strategy of the fast Fourier transform, this polynomial is divided into even and odd terms.
$$ f(X) = f_E(X^2) + X \cdot f_O(X^2) $$
where
$$ f_E(X^2) = \frac{f(X) + f(-X)}{2} = \sum_{i=0}^{\frac{d+1}{2}-1} c_{2i} X^{2i} $$
and
$$f_O(X^2) = \frac{f(X) - f(-X)}{2X} = \sum_{i=0}^{\frac{d+1}{2}-1} c_{2i+1} X^{2i} \enspace ,$$
where the upper bound $\frac{d+1}{2}-1$ reflects that a degree-$d$ polynomial has $\frac{d+1}{2}$ even (and as many odd) terms, indexed from zero.
Keep in mind that for our use case $d = 2^k-1$ for some $k \in \mathbb{N}_{+}$, so the expression is always an integer.
To see that this decomposition is correct, observe that for $f_E(X)$, the odd terms cancel; whereas for $f_O(X)$, it is the even terms that cancel. The key step of the protocol derives a codeword for $f^\star(X) = f_E(X) + \alpha \cdot f_O(X)$ from the codeword for $f(X)$, where $\alpha$ is a random scalar supplied by the verifier.
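As a sanity check, this decomposition is easy to verify with plain integer arithmetic modulo the prime field used throughout this tutorial. The sketch below is illustrative only and uses raw Python integers instead of `FieldElement`s.
```python
# Verify f(X) = f_E(X^2) + X * f_O(X^2) at an arbitrary point, modulo p.
p = 407 * 2**119 + 1  # the prime of this tutorial's field

def evaluate( coefficients, x ):
    # Horner evaluation modulo p
    acc = 0
    for c in reversed(coefficients):
        acc = (acc * x + c) % p
    return acc

f = [5, 7, 11, 13, 17, 19, 23, 29]  # coefficients of f, degree d = 7 = 2^3 - 1
f_E = f[0::2]                       # even-index coefficients
f_O = f[1::2]                       # odd-index coefficients

x = 123456789
assert evaluate(f, x) == (evaluate(f_E, x*x % p) + x * evaluate(f_O, x*x % p)) % p
```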
Let $D$ be a subgroup of even order $N$ of the multiplicative group of the field, and let $\omega$ generate this subgroup: $\langle \omega \rangle = D \subset \mathbb{F}_p \backslash\lbrace 0\rbrace.$
Let $\lbrace f(\omega^i)\rbrace_{i=0}^{N-1}$ be the codeword for $f(X)$, corresponding with evaluation on $D$. Let $D^\star = \langle \omega^2 \rangle$ be another domain, of half the length, and $\lbrace f_ E(\omega^{2i})\rbrace_{i=0}^{N/2-1}$, $\lbrace f_ O(\omega^{2i})\rbrace_{i=0}^{N/2-1}$, and $\lbrace f^\star(\omega^{2i})\rbrace_ {i=0}^{N/2-1}$ be the codewords for $f_E(X)$, $f_O(X)$, and $f^\star(X)$, respectively, corresponding to evaluation on $D^\star$.
Expanding the definition of $f^\star(X)$ gives
$$ \lbrace f^\star(\omega^{2i})\rbrace_{i=0}^{N/2-1} = \lbrace f_E(\omega^{2i}) + \alpha \cdot f_O(\omega^{2i})\rbrace_{i=0}^{N/2-1} . $$
Expand again, this time with the definition of $f_E(X^2)$ and $f_O(X^2)$.
$$ \lbrace f^\star(\omega^{2i})\rbrace_{i=0}^{N/2-1} $$
$$ = \left\lbrace \frac{f(\omega^i) + f(-\omega^i)}{2} + \alpha \cdot \frac{f(\omega^i) - f(-\omega^i)}{2 \omega^i} \right\rbrace_{i=0}^{N/2-1} $$
$$ = \lbrace 2^{-1} \cdot \left( ( 1 + \alpha \cdot \omega^{-i} ) \cdot f(\omega^i) + (1 - \alpha \cdot \omega^{-i} ) \cdot f(-\omega^i) \right) \rbrace_{i=0}^{N/2-1} $$
Since the order of $\omega$ is $N$, we have $\omega^{N/2} = -1$, and therefore $f(-\omega^i) = f(\omega^{N/2 + i})$. This substitution makes it clear that even though the index iterates over half the range (from $0$ to $N/2-1$), all the points of $\lbrace f(\omega^i)\rbrace_{i=0}^{N-1}$
are involved in the derivation of $\lbrace f^\star(\omega^{2i})\rbrace_{i=0}^{N/2-1}$. It does not matter that the latter codeword has half the length; its polynomial has half the degree.
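In code, one such folding step might look like the sketch below; it mirrors the formula above and the `commit` loop shown later in this part. The function name and signature are illustrative; `codeword` is a list of `FieldElement`s of even length evaluated on $\langle \omega \rangle$.
```python
# Illustrative sketch of one split-and-fold step: derive the length-N/2 codeword of
# f*(X) = f_E(X) + alpha * f_O(X) from the length-N codeword of f(X).
def fold( codeword, alpha, omega, field ):
    two_inverse = FieldElement(2, field).inverse()
    half = len(codeword) // 2
    return [two_inverse * ( (codeword[i] + codeword[half+i])
                          + alpha / (omega^i) * (codeword[i] - codeword[half+i]) )
            for i in range(half)]
```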
At this point it is possible to describe the mechanics for one round of the FRI protocol. The prover commits to $f(X)$ by sending the Merkle root of its codeword to the verifier. The verifier responds with the random challenge $\alpha$. The prover computes $f^\star(X)$ and commits to it by sending the Merkle root of $\lbrace f^\star(\omega^{2i})\rbrace_{i=0}^{N/2-1}$ to the verifier.
The verifier now has two commitments to polynomials, and his task is to verify that the correct relation between them holds. Specifically, the verifier should reject the proof if $f^\star(X^2) \neq 2^{-1} \cdot \left( (1 + \alpha X^{-1}) \cdot f(X) + (1 - \alpha X^{-1} ) \cdot f(-X) \right)$. (Ignore the case where $X=0$.) To do this, the verifier randomly samples an index $i \xleftarrow{\$} \lbrace 0, \ldots, N/2-1\rbrace$, which defines 3 points:
- $A: (\omega^i, f(\omega^i))$,
- $B: (\omega^{N/2+i}, f(\omega^{N/2+i}))$,
- $C: (\alpha, f^\star(\omega^{2i}))$.
Notice that the x-coordinates of $A$ and $B$ are the square roots of $\omega^{2i}$. Upon receiving the index $i$ from the verifier, the prover provides the y-coordinates along with their Merkle authentication paths. The verifier verifies these paths against their proper roots and follows up by verifying that $A$, $B$, and $C$ fall on a straight line. This test is known as the *colinearity check*.
Why would $A$, $B$, and $C$ lie on a straight line? Let's find the line that passes through $A$ and $B$ and see what that means for $C$. An elementary Lagrange interpolation yields
$$ y = \sum_i y_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j} \\
= f(\omega^i) \cdot \frac{x - \omega^{N/2+i}}{\omega^{i} - \omega^{N/2+i}} + f(\omega^{N/2+i}) \cdot \frac{x - \omega^{i}}{\omega^{N/2+i} - \omega^{i}} \\
= f(\omega^i) \cdot 2^{-1} \cdot \omega^{-i} \cdot (x + \omega^i) - f(\omega^{N/2+i}) \cdot 2^{-1} \cdot \omega^{-i} (x - \omega^i) \\
= 2^{-1} \cdot \left( (1 + x \cdot \omega^{-i}) \cdot f(\omega^i) + (1 - x \cdot \omega^{-i}) \cdot f(\omega^{N/2 + i}) \right) \enspace .$$
By setting $x = \alpha$ we get exactly the y-coordinate of $C$.
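The verifier's check uses exactly this observation. A minimal sketch of such a colinearity test, assuming the `Polynomial` class from part 2, could look as follows (the supporting code ships its own `test_colinearity` helper, which is used in the verifier below):
```python
# Illustrative sketch: three points are colinear iff their interpolant has degree at most 1.
def test_colinearity( points ):
    domain = [p[0] for p in points]
    values = [p[1] for p in points]
    polynomial = Polynomial.interpolate_domain(domain, values)
    return polynomial.degree() <= 1
```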
This description covers one round, at the end of which the prover and verifier are in the same position as they were at the start. The prover wishes to establish that a given Merkle root decommits to a codeword whose defining polynomial has a bounded degree. There is one important difference though: as a result of running one round of FRI, the length of the codeword as well as the number of possibly nonzero coefficients of the polynomial have halved. Prover and verifier can set $f = f^\star$, $D = D^\star$, and repeat the process. After running $\lceil \log_2 (d+1) \rceil - 1$ rounds of FRI, where $d$ is the degree of the original polynomial, prover and verifier end up with a constant polynomial whose codeword is also constant. At this point, the prover sends this constant[^3] instead of the codeword's Merkle root, making it abundantly clear that it corresponds to a polynomial of degree $0$.

In production systems, the length of the codeword is often reduced not by a factor 2, but a small power of 2. This optimization reduces the proof size and might even generate running time improvements. However, this tutorial optimizes for simplicity and any further discussion about higher folding factors is out of scope.
### Index Folding
The above description glosses over a subtle but important point: *the random indices are not independent between rounds*. Instead, the same index is re-used across all rounds, with reductions modulo the codeword length when necessary.
The reason why sampling the indices independently in each round is less secure is that it is likely to fail to catch hybrid codewords, as the next picture shows.

The blue codeword is far from any codeword that matches with a low degree polynomial, whereas the green codeword does correspond to a low degree polynomial. In order to switch from blue to green, the malicious prover uses a hybrid codeword in the second round. This hybrid codeword is obtained by selecting the values from the one codeword or the other based on a randomly chosen partition. The malicious prover succeeds when all colinearity checks involve points of the same color.
The attack is thwarted when the same indices are used. The hybrid codeword necessarily generates a colinearity test of mismatching colors -- either in the current or in the next round.
### Intuition for Security
The polynomial $f^\star(X^2)$ is a random linear combination of $f(X)$ and $f(-X)$. Clearly, if the prover is honest, then $f^\star(X)$ and its codeword satisfy this relation. What is less intuitive is when the prover is dishonest in more subtle ways than the hybrid codeword attack -- what is it about this colinearity check that makes the verifier likely to notice the fraud?
A fraudulent prover is successful when the verifier accepts a codeword that does not correspond to a low degree polynomial. Let $\lbrace f(\omega^i)\rbrace_{i=0}^{N-1}$ be such a fraudulent codeword, corresponding to a polynomial $f(X)$ of degree $N-1$. Then $f_E(X)$ and $f_O(X)$ will be of degree at most $N/2 - 1$, and so will their linear combination $f^\star(X) = f_E(X) + \alpha \cdot f_O(X)$. At this point the malicious prover has two options.
1. He computes the codeword $\lbrace f^\star(\omega^{2i})\rbrace_{i=0}^{N/2-1}$ honestly by evaluating $f^\star(X)$ on $D^\star = \langle \omega^2 \rangle$. This does not improve his situation because instead of "proving" that a codeword of length $N$ corresponds to a polynomial of degree less than $\rho \cdot N$, he now has to "prove" that this codeword of length $N/2$ corresponds to a polynomial of degree less than $\rho \cdot N/2$. There is no reason to assume this false claim is any easier to prove than the one he started out with.
2. He sends a different codeword $\lbrace v_i\rbrace_{i=0}^{N/2-1}$ that disagrees with $f^\star(X)$ in *enough* points of $D^\star = \langle \omega^2 \rangle$. This is exactly the type of fraud that is likely to be exposed by the verifier's colinearity checks.
Intuitively, a prover who lies in one location is hardly cheating, because it is a single error in an error-correcting code. The other $N/2-1$ values of the codeword still uniquely identify the codeword's defining polynomial. A cheating prover needs the fraudulent codeword $\lbrace v_i\rbrace_{i=0}^{N/2-1}$ to correspond to a polynomial of degree less than $\rho \cdot N/2$, and for that to be the case it needs to agree with this low degree polynomial in many more points than just one. But as the number of points where the malicious prover is being dishonest increases, so too does the probability of this fraud being exposed by the colinearity check.
### Security Level
How many colinearity checks are needed for a target security level of $\lambda$ bits? That's the million dollar question.
The [FRI paper](https://eccc.weizmann.ac.il/report/2017/134/revision/1/download/), the [DEEP-FRI follow-up](https://sites.math.rutgers.edu/~sk1233/deep-fri.pdf), and the [follow-up to the follow-up](https://eprint.iacr.org/2020/654), present a sequence of increasingly refined arguments relying crucially on the code rate $\rho$. I do not pretend to understand these proofs and will content myself with merely reciting the rule of thumb used in the [EthSTARK documentation](https://eprint.iacr.org/2021/582) for conjectural security[^4]:
- The hash function used for building Merkle trees needs to have at least $2\lambda$ output bits.
- The field needs to have at least $2^\lambda$ elements. (Note that this refers to the field used for FRI. In particular, you can switch to an extension field if the base field is not large enough.)
- You get $\log_2 \rho^{-1}$ bits of security for every colinearity check, so setting the number of colinearity checks to $s = \lceil \lambda / \log_2 \rho^{-1} \rceil$ achieves $\lambda$ bits of security.
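For example, targeting $\lambda = 128$ bits with expansion factor 4 (so $\rho = 1/4$) gives $\lceil 128 / 2 \rceil = 64$ colinearity checks. A one-line helper capturing this rule of thumb (illustrative, not part of the supporting code):
```python
from math import ceil, log2

# rule-of-thumb number of colinearity checks for a target security level,
# given the expansion factor (= 1/rho)
def num_colinearity_checks( security_level, expansion_factor ):
    return ceil(security_level / log2(expansion_factor))

assert num_colinearity_checks(128, 4) == 64  # rho = 1/4 gives 2 bits per check
```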
### Coset-FRI
The description of the FRI protocol up until now involves codewords defined as the list of values taken by a polynomial of low degree on a given *evaluation domain* $D$, where $D$ is a subgroup of order $2^k$ spanned by some subgroup generator $\omega$. This leads to problems later on, when linking the FRI together with the STARK machinery. Specifically, the STARK protocol is *also* defined in terms of Reed-Solomon codewords. It is worthwhile to anticipate the problems that can occur when the points of evaluation coincide, by choosing two disjoint sets.
Specifically, let the new evaluation domain be a *coset* of the subgroup of order $2^k$ defined by some *offset* $g$ which is not a member of the subgroup $\langle \omega \rangle, \cdot$. Concretely, $D = \lbrace g \cdot \omega^i \vert i \in \mathbb{Z}\rbrace$. The most straightforward choice is to set $g$ to a generator of the entire multiplicative group $\mathbb{F} \backslash \lbrace 0\rbrace, \cdot$. The evaluation domain for the next codeword is given by the set of squares of $D$: $D^\star = \lbrace d^2 \vert d \in D\rbrace = \lbrace g^2 \cdot \omega^{2i} \vert i \in \mathbb{Z}\rbrace$.
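Concretely, the two domains might be computed as in the following sketch, which assumes `offset`, `omega`, and `domain_length` are defined as above and uses the tutorial's `FieldElement` arithmetic with `^` as exponentiation.
```python
# Illustrative sketch: the coset evaluation domain D and the next round's domain D*.
D = [offset * (omega^i) for i in range(domain_length)]
D_star = [(offset^2) * ((omega^2)^i) for i in range(domain_length // 2)]

# squaring an element of D lands in D*
assert (D[1]^2) in D_star
```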
## Implementation
Let's implement the algorithms described above in a class called `Fri`. Aside from logic for the prover and the verifier, it has helper methods to derive the number of rounds and the initial evaluation domain.
```python
import math
from hashlib import blake2b

class Fri:
    def __init__( self, offset, omega, initial_domain_length, expansion_factor, num_colinearity_tests ):
        self.offset = offset
        self.omega = omega
        self.domain_length = initial_domain_length
        self.field = omega.field
        self.expansion_factor = expansion_factor
        self.num_colinearity_tests = num_colinearity_tests

    def num_rounds( self ):
        codeword_length = self.domain_length
        num_rounds = 0
        while codeword_length > self.expansion_factor and 4*self.num_colinearity_tests < codeword_length:
            codeword_length /= 2
            num_rounds += 1
        return num_rounds

    def eval_domain( self ):
        return [self.offset * (self.omega^i) for i in range(self.domain_length)]
```
Note that the method to compute the number of rounds terminates the protocol early. Specifically, it terminates as soon as the number of colinearity checks is more than one quarter the length of the working codeword. If there were another step, more than half the points in the codeword would be a $C$ point in some colinearity test. At this point, the entropy of a random selection of indices drops significantly.
### Prove
The FRI protocol consists of two phases, called *commit* and *query*. In the commit phase, the prover sends Merkle roots of codewords to the verifier, and the verifier supplies random field elements as input to the split-and-fold procedure. In the query phase, the verifier selects indices of leafs, which the prover then opens, so that the verifier can check the colinearity requirement.
It is important to keep track of the set of indices of leafs of the initial codeword that the verifier wants to inspect. This is the point where the FRI protocol links into the Polynomial IOP that comes before it. Specifically, the larger protocol that uses FRI as a subroutine needs to verify that the leafs of the initial Merkle tree opened by the FRI protocol actually correspond to the codeword that the FRI protocol is supposedly about.
```python
    def prove( self, codeword, proof_stream ):
        assert(self.domain_length == len(codeword)), "initial codeword length does not match length of initial codeword"

        # commit phase
        codewords = self.commit(codeword, proof_stream)

        # get indices
        top_level_indices = self.sample_indices(proof_stream.prover_fiat_shamir(), len(codewords[1]), len(codewords[-1]), self.num_colinearity_tests)
        indices = [index for index in top_level_indices]

        # query phase
        for i in range(len(codewords)-1):
            indices = [index % (len(codewords[i])//2) for index in indices] # fold
            self.query(codewords[i], codewords[i+1], indices, proof_stream)

        return top_level_indices
```
The commit phase consists of several rounds in which:
- The Merkle root of the working codeword is computed.
- The Merkle root is sent to the verifier.
- The verifier supplies a random challenge $\alpha$.
- The prover applies the split-and-fold formula to derive a codeword for the next round.
- The prover squares both the offset $g$ and generator $\omega$ such that $\lbrace g \cdot \omega^i \vert i \in \mathbb{Z}\rbrace$ always corresponds to the working codeword's evaluation domain.
After running the loop, the prover is left with a codeword. It sends this codeword to the verifier in the clear. Lastly, the prover needs to keep track of the codewords computed in every round in order to open the Merkle trees generated from them in the next phase.
```python
    def commit( self, codeword, proof_stream, round_index=0 ):
        one = self.field.one()
        two = FieldElement(2, self.field)
        omega = self.omega
        offset = self.offset
        codewords = []

        # for each round
        for r in range(self.num_rounds()):

            # compute and send Merkle root
            root = Merkle.commit(codeword)
            proof_stream.push(root)

            # prepare next round, if necessary
            if r == self.num_rounds() - 1:
                break

            # get challenge
            alpha = self.field.sample(proof_stream.prover_fiat_shamir())

            # collect codeword
            codewords += [codeword]

            # split and fold
            codeword = [two.inverse() * ( (one + alpha / (offset * (omega^i)) ) * codeword[i] + (one - alpha / (offset * (omega^i)) ) * codeword[len(codeword)//2 + i] ) for i in range(len(codeword)//2)]

            omega = omega^2
            offset = offset^2

        # send last codeword
        proof_stream.push(codeword)

        # collect last codeword too
        codewords = codewords + [codeword]

        return codewords
```
The query phase consists of the same number of iterations of a different loop:
- The indices for the x-coordinates of the $A$ and $B$ points are derived from the set of indices for the x-coordinates of $C$ points.
- The indicated codeword values are sent to the verifier, along with their authentication paths.
The prover needs to record the indices of the first round.
```python
    def query( self, current_codeword, next_codeword, c_indices, proof_stream ):
        # infer a and b indices
        a_indices = [index for index in c_indices]
        b_indices = [index + len(current_codeword)//2 for index in c_indices]

        # reveal leafs
        for s in range(self.num_colinearity_tests):
            proof_stream.push((current_codeword[a_indices[s]], current_codeword[b_indices[s]], next_codeword[c_indices[s]]))

        # reveal authentication paths
        for s in range(self.num_colinearity_tests):
            proof_stream.push(Merkle.open(a_indices[s], current_codeword))
            proof_stream.push(Merkle.open(b_indices[s], current_codeword))
            proof_stream.push(Merkle.open(c_indices[s], next_codeword))

        return a_indices + b_indices
```
In the above snippet, the sampling of indices is hidden away behind the argument `c_indices`. The wrapper function `prove` invokes the function `sample_indices` to sample the set of master indices. This method takes a seed, a list size, a reduced list size, and a desired number, and generates that many uniformly pseudorandom indices in the given range. The actual logic is tricky. It involves repeatedly sampling a single index by calling `blake2b` on the seed appended with an increasing counter. The function keeps track of the indices after folding, *i.e.*, of the locations they indicate in the last codeword. Sampled indices that generate a collision through folding are rejected.
```python
    def sample_index( byte_array, size ):
        acc = 0
        for b in byte_array:
            acc = (acc << 8) ^ int(b)
        return acc % size

    def sample_indices( self, seed, size, reduced_size, number ):
        assert(number <= 2*reduced_size), "not enough entropy in indices wrt last codeword"
        assert(number <= reduced_size), f"cannot sample more indices than available in last codeword; requested: {number}, available: {reduced_size}"

        indices = []
        reduced_indices = []
        counter = 0
        while len(indices) < number:
            index = Fri.sample_index(blake2b(seed + bytes(counter)).digest(), size)
            reduced_index = index % reduced_size
            counter += 1
            if reduced_index not in reduced_indices:
                indices += [index]
                reduced_indices += [reduced_index]

        return indices
```
### Verify
The verifier follows a complementary checklist to the prover's, executing steps that correspond to each phase of the prover's process. Specifically, the verifier:
- Reads the Merkle roots from the proof stream and reproduces the random scalars $\alpha$ with Fiat-Shamir;
- Reads the last codeword from the proof stream and checks that it matches with a low degree polynomial as well as with the last Merkle root to be sent;
- Reproduces the master list of random indices with Fiat-Shamir, and infers the remaining indices for the colinearity checks;
- Reads the Merkle leafs and their authentication paths from the proof stream, and verifies their authenticity against the indices;
- Runs the colinearity checks for every pair of consecutive codewords.
```python
    def verify( self, proof_stream, polynomial_values ):
        omega = self.omega
        offset = self.offset

        # extract all roots and alphas
        roots = []
        alphas = []
        for r in range(self.num_rounds()):
            roots += [proof_stream.pull()]
            alphas += [self.field.sample(proof_stream.verifier_fiat_shamir())]

        # extract last codeword
        last_codeword = proof_stream.pull()

        # check if it matches the given root
        if roots[-1] != Merkle.commit(last_codeword):
            print("last codeword is not well formed")
            return False

        # check if it is low degree
        degree = (len(last_codeword) // self.expansion_factor) - 1
        last_omega = omega
        last_offset = offset
        for r in range(self.num_rounds()-1):
            last_omega = last_omega^2
            last_offset = last_offset^2

        # assert that last_omega has the right order
        assert(last_omega.inverse() == last_omega^(len(last_codeword)-1)), "omega does not have right order"

        # compute interpolant
        last_domain = [last_offset * (last_omega^i) for i in range(len(last_codeword))]
        poly = Polynomial.interpolate_domain(last_domain, last_codeword)

        assert(poly.evaluate_domain(last_domain) == last_codeword), "re-evaluated codeword does not match original!"
        if poly.degree() > degree:
            print("last codeword does not correspond to polynomial of low enough degree")
            print("observed degree:", poly.degree())
            print("but should be:", degree)
            return False

        # get indices
        top_level_indices = self.sample_indices(proof_stream.verifier_fiat_shamir(), self.domain_length >> 1, self.domain_length >> (self.num_rounds()-1), self.num_colinearity_tests)

        # for every round, check consistency of subsequent layers
        for r in range(0, self.num_rounds()-1):

            # fold c indices
            c_indices = [index % (self.domain_length >> (r+1)) for index in top_level_indices]

            # infer a and b indices
            a_indices = [index for index in c_indices]
            b_indices = [index + (self.domain_length >> (r+1)) for index in a_indices]

            # read values and check colinearity
            aa = []
            bb = []
            cc = []
            for s in range(self.num_colinearity_tests):
                (ay, by, cy) = proof_stream.pull()
                aa += [ay]
                bb += [by]
                cc += [cy]

                # record top-layer values for later verification
                if r == 0:
                    polynomial_values += [(a_indices[s], ay), (b_indices[s], by)]

                # colinearity check
                ax = offset * (omega^a_indices[s])
                bx = offset * (omega^b_indices[s])
                cx = alphas[r]
                if test_colinearity([(ax, ay), (bx, by), (cx, cy)]) == False:
                    print("colinearity check failure")
                    return False

            # verify authentication paths
            for i in range(self.num_colinearity_tests):
                path = proof_stream.pull()
                if Merkle.verify(roots[r], a_indices[i], path, aa[i]) == False:
                    print("merkle authentication path verification fails for aa")
                    return False
                path = proof_stream.pull()
                if Merkle.verify(roots[r], b_indices[i], path, bb[i]) == False:
                    print("merkle authentication path verification fails for bb")
                    return False
                path = proof_stream.pull()
                if Merkle.verify(roots[r+1], c_indices[i], path, cc[i]) == False:
                    print("merkle authentication path verification fails for cc")
                    return False

            # square omega and offset to prepare for next round
            omega = omega^2
            offset = offset^2

        # all checks passed
        return True
```
## Simulating a Polynomial Commitment Scheme
FRI establishes that a given Merkle root decommits to (the evaluations of) a polynomial of degree less than $2^k$. The Merkle root can therefore double as the commitment of a polynomial commitment scheme. But then how to realize the $\mathsf{open}$ and $\mathsf{verify}$ procedures? Two steps achieve this transformation.
**One:** arbitrary degree bounds. If the prover instead wants to prove that a committed polynomial has degree at most $d$, where $d+1$ is not a power of two, can he use FRI? The answer is yes!
Let $f(X)$ be a polynomial of degree at most $d$. Let $g(X)$ be another polynomial defined as $$g(X) = f(X) + X^{2^k - d - 1} \cdot f(X) \enspace ,$$
where $2^k > d+1$. Then $g(X)$ has degree less than $2^k$ only if $f(X)$ has degree at most $d$. So the prover who uses FRI to establish that $g(X)$ has degree less than $2^k$ automatically establishes that $f(X)$ has degree at most $d$.
In order to link the Merkle root for $g(X)$ to the Merkle root for $f(X)$, the verifier supplies a bunch (about $\lambda$) of random indices $i$ and the prover responds with the leafs at those indices and their authentication paths. The verifier then verifies that for every such index $i$, with $x_i$ the $i$th point of the evaluation domain, $g(x_i) = (1+x_i^{2^k - d - 1}) \cdot f(x_i)$. Alternatively, the first codeword in FRI can be omitted altogether; in this case the verifier relates the second FRI codeword $g^\star(X)$ to $f(X)$ by eliminating the values of $g(X)$ using the same formula.
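In codeword terms this is a pointwise check. A hedged sketch follows; the names are illustrative, `domain` is the evaluation domain, and `f_codeword`, `g_codeword` are the codewords behind the two Merkle roots.
```python
# Illustrative sketch: at a queried index i, check g(x_i) = (1 + x_i^(2^k - d - 1)) * f(x_i).
def check_degree_bound_link( i, domain, f_codeword, g_codeword, k, d, field ):
    x = domain[i]
    return g_codeword[i] == (field.one() + (x^(2**k - d - 1))) * f_codeword[i]
```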
**Two:** dividing out the zerofier. The verifier asks for the value of a committed polynomial $f(X)$ in a given point $z$. The prover responds: $f(z) = y$. Can he authenticate this response? Once again, the answer is yes!
Let $f(X)$ be a polynomial of degree at most $d$, and let $y$ be the purported value of $f(X)$ at $X=z$. Then the polynomial $f(X) - y$ has a zero in $X=z$. Evaluating in $X=z$ is equivalent to modular reduction by $X-z$, so we can write $f(X)-y \equiv 0 \mod X-z$. This implies that $X-z$ divides $f(X)-y$. However, if $f(z) \neq y$, then $X-z$ does not divide $f(X)-y$.
This is useful for FRI because the codeword for $f(X)$ corresponds to a low degree polynomial. Furthermore, the verifier who inspects this codeword in a given point $x$ can compute the value of $\frac{f(X) - y}{X-z}$, giving rise to a new codeword. This derived codeword corresponds to a low degree polynomial if and only if $f(z) = y$. So the prover who lies about $y = f(z)$ will be exposed when trying to use FRI to "prove" that $\frac{f(X) - y}{X-z}$ has degree at most $d-1$.
This process can be repeated any number of times. Suppose the verifier asks for the values of $f(X)$ in $z_0, \ldots, z_{n-1}$. The prover responds with $y_0, \ldots, y_{n-1}$, supposedly the values of $f(X)$ in these points. Let $p(X)$ be the polynomial of minimal degree that interpolates between $(z_0, y_0), \ldots, (z_{n-1}, y_{n-1})$. Then $f(X) - p(X)$ has zeros at $X \in \lbrace z_0, \ldots, z_{n-1}\rbrace$, and so $\prod_{i=0}^{n-1} (X - z_i)$ divides $f(X) - p(X)$. The verifier who authenticates a Merkle leaf of the tree associated with $f(X)$ can compute the matching value of $\frac{f(X) - p(X)}{\prod_{i=0}^{n-1} (X-z_i)}$. FRI will establish that this codeword corresponds to a polynomial of degree at most $d-n$ if and only if the prover was honest about all the values of $f(X)$.
So where's the code implementing this logic? Other Polynomial IOPs do rely on the verifier asking for the values of committed polynomials in arbitrary points, but it turns out that the STARK IOP does not. Nevertheless, it does implicitly rely on much of the same logic as was described here.
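Nevertheless, for illustration, here is a hedged sketch of the prover-side computation just described. The names are hypothetical; `Polynomial.interpolate_domain` and `evaluate` are assumed to be the routines from part 2, and the evaluation domain is assumed to be disjoint from the points $z_i$ (which the coset trick guarantees).
```python
# Illustrative sketch: derive the codeword of (f(X) - p(X)) / prod_i (X - z_i) from the
# codeword of f(X), where p(X) interpolates the claimed values y_i at the points z_i.
# The result is a low-degree codeword if and only if all claimed values are honest.
def divide_out_zerofier( f_codeword, domain, zs, ys ):
    p = Polynomial.interpolate_domain(zs, ys)
    quotient_codeword = []
    for x, fx in zip(domain, f_codeword):
        numerator = fx - p.evaluate(x)
        denominator = x - zs[0]
        for z in zs[1:]:
            denominator = denominator * (x - z)
        quotient_codeword += [numerator / denominator]
    return quotient_codeword
```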
## Compiling a Polynomial IOP
The previous section explains how to use FRI to establish that committed polynomials a) satisfy arbitrary degree bounds; and b) satisfy point-value relations. In any non-trivial IOP, STARKs included, there will be many polynomials for which these constraints are being claimed to hold. Since FRI is a comparatively expensive protocol, it pays to *batch all invocations into one*.
Suppose the prover wants to establish that the Merkle root commitments for $f_0(X), \ldots, f_{n-1}(X)$ represent polynomials of degrees bounded by $d_0, \ldots, d_{n-1}$. To establish this using one FRI claim, the prover and verifier first calculate a *weighted nonlinear sum*:
$$ g(X) = \sum_{i=0}^{n-1} \alpha_i \cdot f_i(X) + \beta_i \cdot X^{2^k-d_i-1} \cdot f_i(X) \enspace .$$
The coefficients $\alpha_i$ and $\beta_i$ are drawn at random and supplied by the verifier. FRI is used once to establish that $g(X)$ has degree less than $2^k > \max_i d_i$.
The first Merkle root of the FRI protocol decommits to the codeword associated with $g(X)$. To relate this codeword to the right-hand side of the above formula, the verifier can verify the random nonlinear combination in a bunch (say, $\lambda$) of random points $x_i$ belonging to the evaluation domain. Alternatively, this Merkle root can be omitted entirely. In this case, the verifier directly relates the second FRI codeword associated with $g^\star(X)$ to the random nonlinear combination in the indicated points.
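A hedged sketch of the prover's side of this batching step follows; the names are illustrative, `alphas` and `betas` are the verifier-supplied weights, `domain` is the FRI evaluation domain, and the `Field` object is assumed to expose `zero()` as in part 2.
```python
# Illustrative sketch: compute the codeword of the weighted nonlinear combination
#     g(X) = sum_i alpha_i * f_i(X) + beta_i * X^(2^k - d_i - 1) * f_i(X)
# from the codewords of the f_i, evaluated on the FRI domain.
def combine_codewords( codewords, degree_bounds, alphas, betas, domain, k, field ):
    combined = [field.zero() for _ in domain]
    for f_codeword, d, alpha, beta in zip(codewords, degree_bounds, alphas, betas):
        for i, x in enumerate(domain):
            combined[i] = combined[i] + alpha * f_codeword[i] + beta * (x^(2**k - d - 1)) * f_codeword[i]
    return combined
```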
The intuition why this random nonlinear combination trick is secure is as follows. If all the polynomials $f_i(X)$ satisfy their proper degree bounds, then clearly $g(X)$ has degree less than $2^k$ and the FRI protocol succeeds. However, if any one $f_i(X)$ has degree larger than $d_i$, then with overwhelming probability over the randomly chosen $\alpha_i$ and $\beta_i$, the degree of $g(X)$ will be larger than or equal to $2^k$. As a result, FRI will fail.
The astute reader notices that the above random nonlinear combination is similar to the deterministic nonlinear combination in step one of simulating a polynomial commitment scheme. The difference is the absence of random weights in that formula. As a matter of fact, the formula
$$ g(X) = f(X) + X^{2^k - d-1} \cdot f(X) $$
captures the right intuition but is concretely insecure. The reason is that when evaluation is restricted to $\mathbb{F}_p$, polynomials behave identically to their representatives modulo $X^p - X$. And so the right summand in the above expression can contain terms that are cancelled by the left when evaluated on a subset of $\mathbb{F}_p$ for the purpose of computing the polynomial's codeword. As a result, the codeword might correspond to a polynomial of low degree even though $f(X)$ has a *very* high degree!
The involvement of random coefficients $\alpha$ and $\beta$ supplied by the verifier makes this combination secure:
$$ g(X) = \alpha \cdot f(X) + \beta \cdot X^{2^k - d-1} \cdot f(X) $$
When the random coefficients are present, the cancellation of high degree terms occurs with negligible probability.
[0](index) - [1](overview) - [2](basic-tools) - **3** - [4](stark) - [5](rescue-prime) - [6](faster)
[^1]: The generality lost in this description has to do with when the codeword in question is compiled on the fly from applying arithmetic operations to other codewords.
[^2]: The term "corresponds" is used informally here in a manner that allows an error-correcting codeword to disagree slightly with its generating polynomial. FRI makes no distinction between codewords that agree exactly with a low degree polynomial on the given domain, and codewords that are merely close to such codewords in terms of Hamming distance.
[^3]: It might make sense to terminate the protocol early, in which case the prover must send a non-trivial codeword in the clear and the verifier must verify that it has a defining polynomial of bounded degree.
[^4]: The [EthSTARK documentation](https://eprint.iacr.org/2021/582.pdf) also provides a significantly more complex formula for the security level provably achieved without relying on coding theoretic conjectures.
================================================
FILE: docs/index.md
================================================
# Anatomy of a STARK, Part 0: Introduction
This series of articles is a six part tutorial explaining the mechanics of the STARK proof system. It is directed towards a technically-inclined audience with knowledge of basic maths and programming.
- Part 0: Introduction
- [Part 1: STARK Overview](overview)
- [Part 2: Basic Tools](basic-tools)
- [Part 3: FRI](fri)
- [Part 4: The STARK Polynomial IOP](stark)
- [Part 5: A Rescue-Prime STARK](rescue-prime)
- [Part 6: Speeding Things Up](faster)
## What Are STARKs?
One of the most exciting recent advances in the field of cryptographic proof systems is the development of STARKs. It comes in the wake of a booming blockchain industry, for which proof systems in general seem tailor-made: blockchain networks typically consist of *mutually distrusting parties* that wish to *transact*, or generally *update collective state* according to *state evolution rules*, using *secret information*. Since the participants are mutually distrusting, they require the means to verify the validity of transactions (or state updates) proposed by their peers. *Zk-SNARKs* are naturally equipped to provide assurance of computational integrity in this environment, as a consequence of their features:
- zk-SNARKs are (typically) universal, meaning that they are capable of proving the integrity of arbitrary computations;
- zk-SNARKs are non-interactive, meaning that the entire integrity proof consists of a single message;
- zk-SNARKs are efficiently verifiable, meaning that the verifier has an order of magnitude less work compared to naïvely re-running the computation;
- zk-SNARKs are zero-knowledge, meaning that they do not leak any information about secret inputs to the computation.

Zk-SNARKs have existed for a while, but the STARK proof system is a relatively new thing. It stands out for several reasons:
- While traditional zk-SNARKs rely on cutting-edge cryptographic hard problems and assumptions, the only cryptographic ingredient in a STARK proof system is a collision-resistant hash function. As a result, the proof system is provably post-quantum under an idealized model of the hash function [^1]. This stands in contrast to the first generation of SNARKs which use bilinear maps and are only provably secure under unfalsifiable assumptions.
- The field of arithmetization for STARKs is independent of the cryptographic hard problem, and so this field can be chosen specifically to optimize performance. As a result, STARKs promise concretely fast provers.
- Traditional zk-SNARKs rely on a trusted setup ceremony to produce public parameters. After the ceremony, the used randomness must be securely forgotten. The ceremony is trusted because if the participants refuse or neglect to delete this cryptographic toxic waste, they retain the ability to forge proofs. In contrast, STARKs have no trusted setup and hence no cryptographic toxic waste.

In this tutorial I attempt to explain how many of the pieces work together. This textual explanation is supported by a python implementation for proving and verifying a simple computation based on the [Rescue-Prime](https://eprint.iacr.org/2020/1143.pdf) hash function. After reading or studying this tutorial, you should be able to write your own zero-knowledge STARK prover and verifier for a computation of your choice.
## Why?
It should be noted early on that there are a variety of sources for learning about STARKs. Here is an incomplete list.
- The scientific papers on [FRI](https://eccc.weizmann.ac.il/report/2017/134/revision/1/download/), [STARK](https://eprint.iacr.org/2018/046.pdf), [DEEP-FRI](https://eprint.iacr.org/2019/336.pdf), and the latest [soundness analysis for FRI](https://eccc.weizmann.ac.il/report/2020/083/)
- A multi-part tutorial by Vitalik Buterin (parts [I](https://vitalik.eth.limo/general/2017/11/09/starks_part_1.html)/[II](https://vitalik.eth.limo/general/2017/11/22/starks_part_2.html)/[3](https://vitalik.eth.limo/general/2018/07/21/starks_part_3.html))
- A series of blog posts by StarkWare (parts [1](https://medium.com/starkware/stark-math-the-journey-begins-51bd2b063c71), [2](https://medium.com/starkware/arithmetization-i-15c046390862), [3](https://medium.com/starkware/arithmetization-ii-403c3b3f4355), [4](https://medium.com/starkware/low-degree-testing-f7614f5172db), [5](https://medium.com/starkware/a-framework-for-efficient-starks-19608ba06fbe))
- The [STARK @ Home](https://www.youtube.com/playlist?list=PLcIyXLwiPilUFGw7r2uyWerOkbx4GFMXq) webcasts by StarkWare
- The [STARK 101](https://starkware.co/developers-community/stark101-onlinecourse/) online course by StarkWare
- The [EthStark documentation](https://eprint.iacr.org/2021/582.pdf) by StarkWare
- generally speaking, anything put out by [StarkWare](https://starkware.co)
- [A summary on the FRI low degree test](https://eprint.iacr.org/2022/1216) by Ulrich Haböck
With these sources available, why am I writing another tutorial?
*The tutorials are superficial.* The tutorials do a wonderful job explaining from a high level how the techniques work and conveying an intuition why it could work. However, they fall short of describing a complete system ready for deployment. For instance, none of the tutorials describe how to achieve zero-knowledge, how to batch various low degree proofs, or how to determine the resulting security level. The EthSTARK documentation does provide a complete reference to answer most of these questions, but it is tailored to one particular computation, does not cover zero-knowledge, and does not emphasize an accessible intuitive explanation.
*The papers are inaccessible.* Sadly, the incentives in scientific publishing are set up to make scientific papers unreadable to a layperson audience. Tutorials such as this one are needed then, to make those papers accessible to a wider audience.
*Sources are out of date.* Many of the techniques described in the various tutorials have since been improved upon. For instance, the EthSTARK documentation (the most recent document cited above) describes a *DEEP insertion technique* in order to reduce claims of correct evaluations to those of polynomials having bounded degrees. The tutorials do not mention this technique because they pre-date it.
*I prefer my own style.* I disagree with a lot of the symbols and names and I wish people would use the correct ones, dammit. In particular, I like to focus on polynomials as the most fundamental objects of the proof system. In contrast, all the other sources describe the mechanics of the proof system in terms of operations on Reed-Solomon codewords[^2] instead.
*It helps me to make sense of things.* Writing this tutorial helps me systematize my own knowledge and identify areas where it is shallow or wholly lacking.
## Required Background Knowledge
This tutorial does re-hash the background material when it is needed. However, the reader might want to study up on the following topics because if they are unfamiliar with them, the presentation here might be too dense.
- finite fields, and extension fields thereof
- polynomials over finite fields, both univariate and multivariate ones
- the fast fourier transform
- hash functions
## Roadmap
- [Part 1: STARK Overview](overview) paints a high-level picture of the concepts and workflow.
- [Part 2: Basic Tools](basic-tools) introduces the basic mathematical and cryptographic tools from which the proof system will be built.
- [Part 3: FRI](fri) covers the low degree test, which is the cryptographic heart of the proof system.
- [Part 4: The STARK IOP](stark) explains the information-theoretical machinery that generates an abstract proof system from arbitrary computational claims.
- [Part 5: A Rescue-Prime STARK](rescue-prime) puts the tools together and builds a transparent zero-knowledge proof system for a simple computation.
- [Part 6: Speeding Things Up](faster) introduces algorithms and techniques to make the whole thing faster, effectively putting the "S" into the STARK.
## Supporting Python Code
In addition to the code snippets contained in the text, there is a full working python implementation. Clone the repository from [here](https://github.com/aszepieniec/stark-anatomy). Incidentally, if you find a bug or a typo, or if you have an improvement you would like to suggest, feel free to make a pull request.
## Questions and Discussion
The best place for questions and discussion is on the [community forum of the zero-knowledge podcast](https://community.zeroknowledge.fm).
## Acknowledgements
The author wishes to thank Bobbin Threadbare, Thorkil Værge, and Eli Ben-Sasson for useful feedback and comments, as well as [Nervos](https://nervos.org) Foundation for financial support. Send him an email at `alan@nervos.org` or follow `aszepieniec` on twitter or Github. Consider donating [btc](bitcoin:bc1qg32wme6sqltus5e9yzuq4y56xxc0rutly8ak7y), [ckb](nervos:ckb1qyq9s4rvld206a3rl6jmzxav4ffx58uj5prsv867ml) or [eth](ethereum:0x934B24cE32ceEDB38ce088Da1D9366Fa23F7B3f4).
## Mirrors
This tutorial is hosted in several locations. If you're hosting an identical copy too, or a translation, let me know.
- [GitHub Pages](https://aszepieniec.github.io/stark-anatomy/)
- [Neptune Project Website](https://neptune.cash/learn/stark-anatomy/)
**0** - [1](overview) - [2](basic-tools) - [3](fri) - [4](stark) - [5](rescue-prime) - [6](faster)
[^1]: In the literature, this idealization is known as the quantum random oracle model.
[^2]: A Reed-Solomon codeword is the vector of evaluations of a low degree polynomial on a given domain of points. Different codewords belong to the same code when their defining polynomials are different but the evaluation domain is the same.
================================================
FILE: docs/latex/.gitignore
================================================
graphics.aux
graphics.log
graphics.nav
graphics.out
graphics.pdf
graphics.snm
graphics.synctex.gz
graphics.toc
================================================
FILE: docs/latex/graphics.tex
================================================
\documentclass[11pt]{beamer}
%\usetheme{Boadilla}
\usepackage[utf8]{inputenc}
\usepackage[english]{babel}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{tikz}
\usefonttheme[onlymath]{serif}
\author{Alan Szepieniec}
%\title{}
\setbeamercovered{transparent}
\setbeamertemplate{navigation symbols}{}
%\logo{}
%\institute{}
%\date{}
%\subject{}
\begin{document}
%\begin{frame}
%\titlepage
%\end{frame}
%\begin{frame}
%\tableofcontents
%\end{frame}
\begin{frame}{}
\centering
\begin{tikzpicture}[scale=0.9, every node/.style={scale=0.9}]
\node[draw, line width=1, rounded corners=0.5cm, minimum width=5cm, minimum height=1cm] (computation box) at (0,0) {};
\node[anchor=north] (computation label) at (computation box.north) {\textsc{computation}};
\node[draw, line width=1, rounded corners=0.5cm, minimum width=5cm, minimum height=1cm, color=red!50!black] (arithmetic constraint system box) at (0, -2) {\begin{tabular}{c}
\textsc{arithmetic} \\
\textsc{constraint system}
\end{tabular}};
\node[draw, line width=1, rounded corners=0.5cm, minimum width=5cm, minimum height=1cm, color=blue!50!black] (polynomial iop box) at (0, -4) {};
\node[anchor=north, color=blue!50!black] (polynomial iop label) at (polynomial iop box.north) {\textsc{polynomial iop}};
\node[draw, line width=1, rounded corners=0.5cm, minimum width=5cm, minimum height=1cm, color=green!50!black] (cryptographic proof system) at (0,-6) {\begin{tabular}{c}
\textsc{cryptographic} \\
\textsc{proof system}
\end{tabular}};
%\pause
%
%\node[] (bits) at (-4.5, 0) {$\{0,1\}^*, \wedge, \vee, \Leftrightarrow \!\!\neg$};
%
%\pause
%
%\node[] (field elements) at (-4.5, -2) {$\mathbb{F}^{\cdot \times \cdot}, \times, +, \circ, \cdot^\mathsf{T}$};
%
%\pause
%
%\node[] (polynomials) at (-4.5, -4) {$\mathbb{F}[X], z \xleftarrow{\S} \mathbb{F}, \langle \mathsf{P} \leftrightarrow \mathsf{V} \rangle$};
%
%\pause
%
%\node[] (crypto) at (-4.5, -6) {$\mathbb{F}, \mathsf{H}(\cdot), \mathsf{g}^x, \mathbf{A}s + e$};
%
%\pause
\draw[->, line width=2] (2.5, -0.5) .. controls (3, -0.5) and (3, -1.5) .. (2.5, -1.5) node[right, midway, xshift=0.2cm] {\textit{arithmetization}};
\draw[->, line width=2] (2.5, -2.5) .. controls (3, -2.5) and (3, -3.5) .. (2.5, -3.5) node[right, midway, xshift=0.2cm] {\textit{interpolation}};
\draw[->, line width=2] (2.5, -4.5) .. controls (3, -4.5) and (3, -5.5) .. (2.5, -5.5) node[right, midway] {\begin{tabular}{c}
\textit{cryptographic} \\
\textit{compilation}
\end{tabular}};
\end{tikzpicture}
\end{frame}
\begin{frame}
\centering
\begin{tikzpicture}[scale=0.8, every node/.style={scale=0.8}]
\node[scale=2, color=black] (computation) at (2, 6) {computation};
\node[scale=2, color=red!50!black] (trace) at (0, 3) {$T$};
\node[anchor=east, xshift=-0.25cm, color=red!50!black] (aet) at (trace.west) {\begin{tabular}{c}
\textit{algebraic} \\
\textit{execution} \\
\textit{trace}
\end{tabular}};
\node[scale=1.5, color=red!50!black] (constraints) at (3, 3) {constraints};
\node[anchor=west, xshift=0.25cm, color=red!50!black] (air) at (constraints.east) {\begin{tabular}{c}
\textit{algebraic} \\
\textit{intermediate} \\
\textit{representation}
\end{tabular}};
\draw[->, line width=1] (computation) -- (trace) {};
\draw[->, dashed, color=gray, line width=1] (computation) -- (constraints) {};
\node[scale=2, color=blue!50!black] (tx) at (0,0) {$\boldsymbol{t}(X)$};
\node[anchor=east, xshift=-0.25cm, color=blue!50!black] (trace polynomials) at (tx.west) {\begin{tabular}{c}
\textit{trace} \\
\textit{polynomials}
\end{tabular}};
\node[scale=4] (mapsto) at (3, 0) {$\longmapsto$};
\node[scale=2, color=blue!50!black] (qx) at (6, 0) {$\boldsymbol{q}(X)$};
\node[anchor=west, xshift=0.25cm, color=blue!50!black] (quotient polynomials) at (qx.east) {\begin{tabular}{c}
\textit{quotient} \\
\textit{polynomials}
\end{tabular}};
\draw[->, line width=1] (trace) -- (tx) {};
\draw[->, dashed, color=gray, line width=1] (constraints) -- (3,0.1) {};
\node[color=green!50!black, draw, rounded corners=0.25cm, scale=2, minimum width=1.5cm, minimum height=0.75cm] (fri) at (6, -2.5) {FRI};
\draw[->, line width=1] (qx) -- (fri) {};
\end{tikzpicture}
\end{frame}
\begin{frame}{}
\centering
\begin{tikzpicture}[node distance=2.5cm,auto]
\node [coordinate] (root) []{};
\node [coordinate, yshift=-0.5cm, xshift=-1cm] (l) []{};
\node [coordinate, yshift=-0.5cm, xshift=1cm] (r) []{};
\node [coordinate, yshift=-0.5cm, xshift=-0.5cm] (ll) at (l) []{};
\node [coordinate, yshift=-0.5cm, xshift=0.5cm] (lr) at (l) []{};
\node [coordinate, yshift=-0.5cm, xshift=-0.5cm] (rl) at (r) []{};
\node [coordinate, yshift=-0.5cm, xshift=0.5cm] (rr) at (r) []{};
\node [coordinate, yshift=-0.75cm, xshift=-0.75cm] (lll) at (ll) []{};
\node [coordinate, xshift=0.65cm] (llr) at (lll) []{};
\node [coordinate, xshift=0.65cm] (lrl) at (llr) []{};
\node [coordinate, xshift=0.65cm] (lrr) at (lrl) []{};
\node [coordinate, xshift=0.65cm] (rll) at (lrr) []{};
\node [coordinate, xshift=0.65cm] (rlr) at (rll) []{};
\node [coordinate, xshift=0.65cm] (rrl) at (rlr) []{};
\node [coordinate, xshift=0.65cm] (rrr) at (rrl) []{};
\path[-] (root) edge (l);
\path[-] (root) edge (r);
\path[-] (l) edge (ll);
\path[-] (l) edge (lr);
\path[-] (r) edge (rl);
\path[-] (r) edge (rr);
\path[-] (ll) edge (lll);
\path[-] (ll) edge (llr);
\path[-] (lr) edge (lrl);
\path[-] (lr) edge (lrr);
\path[-] (rl) edge (rll);
\path[-] (rl) edge (rlr);
\path[-] (rr) edge (rrl);
\path[-] (rr) edge (rrr);
\draw [thick, fill=blue!50!black] (root) circle (0.25cm);
\draw [thick, fill=gray] (l) circle (0.25cm);
\draw [thick, fill=red!50!black] (r) circle (0.25cm);
\draw [thick, fill=red!50!black] (ll) circle (0.25cm);
\draw [thick, fill=gray] (lr) circle (0.25cm);
\draw [thick, fill=white] (rl) circle (0.25cm);
\draw [thick, fill=white] (rr) circle (0.25cm);
\draw [thick, fill=white] (lll) circle(0.25cm);
\draw [thick, fill=white] (llr) circle(0.25cm);
\draw [thick, fill=green!50!black] (lrl) circle(0.25cm);
\draw [thick, fill=red!50!black] (lrr) circle(0.25cm);
\draw [thick, fill=white] (rll) circle(0.25cm);
\draw [thick, fill=white] (rlr) circle(0.25cm);
\draw [thick, fill=white] (rrl) circle(0.25cm);
\draw [thick, fill=white] (rrr) circle(0.25cm);
\node[coordinate] (root legend) at (-2.5, -2.75) {};
\draw[thick, fill=blue!50!black] (root legend) circle (0.25cm);
\node[anchor=north, yshift=-0.25cm, color=blue!50!black] (root label) at (root legend.south) {root};
\node[coordinate] (leaf legend) at (-0.75, -2.75) {};
\draw[thick, fill=green!50!black] (leaf legend) circle (0.25cm);
\node[anchor=north, yshift=-0.25cm, color=green!50!black] (leaf label) at (leaf legend.south) {leaf};
\node[coordinate] (authentication path legend 1) at (1, -2.75) {};
\node[coordinate] (authentication path legend 2) at (1.75, -2.75) {};
\node[coordinate] (authentication path legend 3) at (2.5, -2.75) {};
\draw[thick, fill=red!50!black] (authentication path legend 1) circle (0.25cm);
\draw[thick, fill=red!50!black] (authentication path legend 2) circle (0.25cm);
\draw[thick, fill=red!50!black] (authentication path legend 3) circle (0.25cm);
\node[anchor=north, yshift=-0.25cm, color=red!50!black] (authentication) at (authentication path legend 2.south) {authentication};
\node[anchor=north, color=red!50!black, yshift=0.125cm] (path) at (authentication.south) {path};
%\path[->] (start) edge node {} (e);
\end{tikzpicture}
\end{frame}
\begin{frame}{}
\centering
\begin{tikzpicture}
\draw[-, rounded corners=0.25cm] (-5, 0) -- (-6, 0) -- (-6, -6) -- (-4, -6) -- (-4, 0) -- (-5, 0) {};
\draw[-, rounded corners=0.25cm] (2, 0) -- (1, 0) -- (1, -6) -- (3, -6) -- (3, 0) -- (2, 0) {};
\node[anchor=north] (prover) at (-5, 0) {\textsc{Prover}};
\node[anchor=north] (verifier) at (2, 0) {\textsc{Verifier}};
\draw[->, line width=1] (-4, -1) -- (1, -1) {};
\draw[->, line width=1] (1, -2) -- (-4, -2) node[above, midway] {\textit{\textcolor{gray}{random}}};
\draw[->, line width=1] (-4, -3) -- (1, -3) {};
\draw[->, line width=1] (1, -4) -- (-4, -4) node[above, midway] {\textit{\textcolor{gray}{random}}};
\draw[dashed, color=gray!50!black] (-1.5, -4.5) -- (-1.5, -5) {};
\draw[->, line width=1] (-4, -5.5) -- (1, -5.5) {};
\draw[->, line width=1] (-5.5, 0.75) -- (-5.5, 0) node[above, near start, yshift=0.25cm] {\textit{claim}};
\draw[->, line width=1] (-4.5, 0.5) -- (-4.5, 0) node[above, near start, yshift=0.125cm] {\textit{witness}};
\draw[->, line width=1] (1.5, 0.75) -- (1.5, 0) node[above, near start, yshift=0.25cm] {\textit{claim}};
\draw[->, line width=1] (2, -6) -- (2, -6.75) node[below, near end, yshift=-0.25cm] {verdict};
\end{tikzpicture}
\end{frame}
\begin{frame}{}
\centering
\begin{tikzpicture}[overlay, yshift=2.5cm]
\draw[-, rounded corners=0.25cm] (-5, 0) -- (-6, 0) -- (-6, -6) -- (-4, -6) -- (-4, 0) -- (-5, 0) {};
\draw[-, rounded corners=0.25cm] (5, 0) -- (4, 0) -- (4, -6.5) -- (6, -6.5) -- (6, 0) -- (5, 0) {};
\draw[-, rounded corners=0.25cm, fill=black] (0,-0.5) -- (-0.5, -0.5) -- (-0.5, -6.5) -- (0.5, -6.5) -- (0.5, -0.5) -- (0, -0.5) {};
\node[anchor=north] (prover) at (-5, 0) {\textsc{Prover}};
\node[anchor=north] (proof stream) at (0, 0) {\textsc{Proof Stream}};
\node[anchor=north] (verifier) at (5, 0) {\textsc{Verifier}};
\draw[->, line width=1] (-4, -1) -- (-0.5, -1) {};
\draw[->, line width=1] (-0.5, -2) -- (-4, -2) node[above, midway] {\textit{pseudo\textcolor{gray}{random}}};
\draw[->, line width=1] (-4, -3) -- (-0.5, -3) {};
\draw[->, line width=1] (-0.5, -4) -- (-4, -4) node[above, midway] {\textit{pseudo\textcolor{gray}{random}}};
\draw[dashed, color=gray!50!black] (-2.5, -4.5) -- (-2.5, -5) {};
\draw[->, line width=1] (-4, -5.5) -- (-0.5, -5.5) {};
\draw[->, line width=1] (-5.5, 0.75) -- (-5.5, 0) node[above, near start, yshift=0.25cm] {\textit{claim}};
\draw[->, line width=1] (-4.5, 0.5) -- (-4.5, 0) node[above, near start, yshift=0.125cm] {\textit{witness}};
\draw[->, line width=1] (4.5, 0.75) -- (4.5, 0) node[above, near start, yshift=0.25cm] {\textit{claim}};
\draw[->, line width=1] (0.5, -6) -- (4, -6) node[above, midway] {\textit{serialized proof}};
\draw[->, line width=1] (5, -6.5) -- (5, -7.25) node[below, near end, yshift=-0.25cm] {verdict};
\end{tikzpicture}
\end{frame}
\begin{frame}{}
\centering
\begin{tikzpicture}[scale=0.9, every node/.style={scale=0.9}]
\draw[-, dashed, line width=2, color=gray!50!white, rounded corners=0.5cm] (3, -0.5) -- (-1.4, -0.5) -- (-1.4, -5.3) -- (10.3, -5.3) -- (10.3, -0.5) -- (3, -0.5) {};
\draw[-, dashed, line width=2, color=gray!50!white, rounded corners=0.5cm] (3, -5.7) -- (-1.4, -5.7) -- (-1.4, -8.8) -- (10.3, -8.8) -- (10.3, -5.7) -- (3, -5.7) {};
\node[color=gray] (commit phase) at (6, -1) {\textsc{commit phase}};
\node[color=gray] (commit phase) at (6, -6.2) {\textsc{query phase}};
\node[color=blue!50!black] (polynomial) at (0,0) {polynomial};
\node[draw, color=green!50!black, rounded corners=0.125cm] (evaluate) at (0, -1) {evaluate};
\draw[->] (polynomial) -- (evaluate) {};
\node[color=green!50!black] (codeword) at (0, -2) {codeword};
\draw[->] (evaluate) -- (codeword) {};
\node[draw, color=green!50!black, rounded corners=0.125cm] (merkleize) at (2, -3) {Merkleize};
\draw[->] (codeword) -- (merkleize) {};
\node[color=green!50!black] (merkle root) at (5, -3) {Merkle root};
\draw[->] (merkleize) -- (merkle root) {};
\draw[-, line width=1] (8, 0 ) -- (8, -9) {};
\draw[-, line width=1] (10, 0) -- (10, -9) {};
\node[] (verifier) at (9, 0) {\textsc{Verifier}};
\draw[->] (merkle root) -- (7.5, -3) {};
\node[color=green!50!black] (weight) at (5, -4) {weight $\alpha$};
\draw[->] (7.5, -4) -- (weight) {};
\node[draw, color=green!50!black, rounded corners=0.125cm] (split and fold) at (0, -4) {split-and-fold};
\draw[->] (weight) -- (split and fold) {};
\draw[->] (split and fold) -- (codeword) {};
\node[color=green!50!black] (final codeword) at (0, -5) {final codeword};
\draw[->] (split and fold) -- (final codeword) {};
\draw[->] (final codeword) -- (7.5, -5) {};
\node[color=green!50!black] (merkle trees) at (2, -6) {Merkle trees};
\draw[->, line width=2, color=white] (merkleize) -- (merkle trees) {};
\draw[->] (merkleize) -- (merkle trees) {};
\node[draw, color=green!50!black, rounded corners=0.125cm] (decommitment) at (2, -7) {decommitment};
\draw[->] (merkle trees) -- (decommitment) {};
\node[color=green!50!black] (indices) at (5, -7) {indices};
\draw[->] (7.5, -7) -- (indices) {};
\draw[->] (indices) -- (decommitment) {};
\node[color=green!50!black] (leafs) at (1, -8) {leafs};
\node[color=green!50!black] (paths) at (4, -8) {authentication paths};
\draw[->] (decommitment) -- (leafs) {};
\draw[->] (decommitment) -- (paths) {};
\draw[->, rounded corners=0.125cm] (leafs) -- (1, -8.5) -- (7.5, -8.5) {};
\draw[-, rounded corners=0.125cm] (paths) -- (4, -8.5) -- (4.125, -8.5) {};
\end{tikzpicture}
\end{frame}
\begin{frame}{}
\centering
\begin{tikzpicture}
\node[color=red!50!black] (execution trace) at (0,0) {\begin{tabular}{c}
execution \\
trace
\end{tabular}};
\node[color=red!50!black, scale=0.9] (boundary constraints) at (-3, 0) {\begin{tabular}{c}
\textsc{Boundary} \\
\textsc{Constraints}
\end{tabular}};
\node[color=red!50!black, scale=0.9] (transition constraints) at (3, 0) {\begin{tabular}{c}
\textsc{Transition} \\
\textsc{Constraints}
\end{tabular}};
\node[color=blue!50!black] (trace polynomials) at (0, -3) {\begin{tabular}{c}
trace \\
polynomials
\end{tabular}};
\draw[->] (execution trace) -- (trace polynomials) node[above, midway, sloped, color=gray, scale=0.75] {\textit{interpolation}};
\node[color=blue!50!black] (boundary polynomials) at (-3 ,-5) {\begin{tabular}{c}
boundary \\
polynomials
\end{tabular}};
\node[color=blue!50!black] (transition polynomials) at (3 ,-5) {\begin{tabular}{c}
transition \\
polynomials
\end{tabular}};
\draw[->] (boundary constraints) -- (boundary polynomials) node[below, midway, sloped, color=gray, scale=0.75] {\textit{interpolate and subtract}};
\draw[<->] (trace polynomials) -- (-3, -3) {};
\draw[->] (transition constraints) -- (transition polynomials) node[above, midway, sloped, color=gray, scale=0.75] {\textit{symbolic evaluation}};
\draw[->] (trace polynomials) -- (3, -3) {};
\node[draw, color=green!50!black] (boundary quotients) at (-3, -7.5) {\textcolor{blue!50!black}{\begin{tabular}{c}
boundary \\
quotients
\end{tabular}}};
\node[draw, color=green!50!black] (transition quotients) at (3, -7.5) {\textcolor{blue!50!black}{\begin{tabular}{c}
transition \\
quotients
\end{tabular}}};
\draw[<->] (boundary polynomials) -- (boundary quotients) node[midway, right, color=gray, scale=0.75] {\begin{tabular}{l}
multiply or \\
divide by \\
zerofier
\end{tabular}};
\draw[<->] (transition polynomials) -- (transition quotients) node[midway, right, color=gray, scale=0.75] {\begin{tabular}{l}
multiply or \\
divide by \\
zerofier
\end{tabular}};
\end{tikzpicture}
\end{frame}
\begin{frame}{}
\centering
\begin{tikzpicture}
\node[color=red!50!black] (execution trace) at (0,0) {\begin{tabular}{c}
execution \\
trace
\end{tabular}};
\node[color=red!50!black, scale=0.9] (transition constraints) at (3, 0) {\begin{tabular}{c}
\textsc{AIR} \\
\textsc{Constraints}
\end{tabular}};
\node[] (trace polynomials) at (0, -3) {\begin{tabular}{c}
\textcolor{blue!50!black}{trace} \\
\textcolor{blue!50!black}{polynomials}
\end{tabular}};
\draw[dashed, color=green!50!black] (trace polynomials.north west) -- (trace polynomials.north east) -- (trace polynomials.south east) -- (trace polynomials.south west) -- (trace polynomials.north west) {};
\draw[->] (execution trace) -- (trace polynomials) node[above, midway, sloped, color=gray, scale=0.75] {\textit{interpolation}};
\node[color=blue!50!black] (transition polynomials) at (3 ,-5) {\begin{tabular}{c}
AIR evaluation \\
polynomials
\end{tabular}};
\draw[->] (transition constraints) -- (transition polynomials) node[above, midway, sloped, color=gray, scale=0.75] {\textit{symbolic evaluation}};
\draw[->] (trace polynomials) -- (3, -3) {};
\node[draw, color=green!50!black] (transition quotients) at (3, -7.5) {\textcolor{blue!50!black}{\begin{tabular}{c}
quotients
\end{tabular}}};
\draw[<->] (transition polynomials) -- (transition quotients) node[midway, right, color=gray, scale=0.75] {\begin{tabular}{l}
multiply or \\
divide by \\
zerofier
\end{tabular}};
\end{tikzpicture}
\end{frame}
\begin{frame}{}
\centering
\begin{tikzpicture}
\node[color=red!50!black] (witness) at (-3, 0) {witness};
\node[color=red!50!black] (index) at (0, 0) {index};
\node[color=red!50!black] (instance) at (3, 0) {instance};
\node[minimum height=2cm, minimum width=3cm, draw, rounded corners=0.125cm] (indexer box) at (0, -3) {};
\node[anchor=north] (indexer label) at (indexer box.north) {\textsc{Indexer}};
\node[minimum height=3cm, minimum width=3cm, draw, rounded corners=0.125cm] (prover box) at (-3, -7) {};
\node[anchor=north] (prover label) at (prover box.north) {\textsc{Prover}};
\node[minimum height=3cm, minimum width=3cm, draw, rounded corners=0.125cm] (verifier box) at (3, -7) {};
\node[anchor=north] (verifier label) at (verifier box.north) {\textsc{Verifier}};
\draw[->, rounded corners=0.5cm] ([xshift=-0.25cm] instance.south) -- (2.75, -1) -- (-2.5, -1) -- (-2.5, -5.5) {};
\draw[-, color=white, line width=4] (index.south) -- (0, -1.5) {};
\draw[->] (index.south) -- (0, -2) {};
\draw[->] (witness.south) -- (-3, -5.5) {};
\draw[->] ([xshift=0.25cm] instance.south) -- (3.25, -5.5) {};
\draw[->, rounded corners=0.5cm] (-0.5, -4) -- (-0.5, -4.75) -- (-2, -4.75) -- (-2, -5.5) {};
\draw[->, rounded corners=0.5cm] (0.5, -4) -- (0.5, -4.75) -- (2, -4.75) -- (2, -5.5) {};
\draw[<->] (prover box) -- (verifier box) {};
\end{tikzpicture}
\end{frame}
\begin{frame}{}
\begin{tikzpicture}
\begin{scope}[xshift=0]
\draw[-, line width=4, color=blue!50!black] (-4.5, 0) -- (-0.5, 0) {};
\draw[-, line width=4, color=blue!50!black] (-4.5, -1) -- (-2.5, -1) {};
\draw[-, line width=4, color=green!50!black] (-4.5, -1) -- (-4.25, -1) {};
\draw[-, line width=4, color=green!50!black] (-3.5, -1) -- (-3.25, -1) {};
\draw[-, line width=4, color=green!50!black] (-4.2, -1) -- (-4.1, -1) {};
\draw[-, line width=4, color=green!50!black] (-3.2, -1) -- (-3.1, -1) {};
\draw[-, line width=4, color=green!50!black] (-4, -1) -- (-3.9, -1) {};
\draw[-, line width=4, color=green!50!black] (-3, -1) -- (-2.9, -1) {};
\draw[-, line width=4, color=green!50!black] (-4.5, -2) -- (-3.5, -2) {};
\draw[-, line width=4, color=green!50!black] (-4.5, -3) -- (-4, -3) {};
\node[] (independent rounds) at (-2.5, 1) {independent rounds};
\draw[->, transform canvas={xshift=-2.7cm}, color=red!50!black, line width=1] (-1, -0.7) -- (-1, -0.9) {};
\draw[->, transform canvas={xshift=-2.7cm}, color=red!50!black, line width=1] (-1, -0.3) -- (-1, -0.1) {};
\draw[->, transform canvas={xshift=-0.7cm}, color=red!50!black, line width=1] (-1, -0.3) -- (-1, -0.1) {};
\draw[color=gray, line width=0.3] (-3.7, -0.35) -- (-3.7, -0.65);
\draw[color=gray, line width=0.3] (-3.7, -0.5) -- (-1.7, -0.5) -- (-1.7, -0.35);
\draw[->, transform canvas={xshift=-3.4cm}, color=red!50!black, line width=1] (-1, -1.7) -- (-1, -1.9) {};
\draw[->, transform canvas={xshift=-3.4cm}, color=red!50!black, line width=1] (-1, -1.3) -- (-1, -1.1) {};
\draw[->, transform canvas={xshift=-2.4cm}, color=red!50!black, line width=1] (-1, -1.3) -- (-1, -1.1) {};
\draw[color=gray, line width=0.3] (-4.4, -1.35) -- (-4.4, -1.65);
\draw[color=gray, line width=0.3] (-4.4, -1.5) -- (-3.4, -1.5) -- (-3.4, -1.35);
\draw[->, transform canvas={xshift=-3.1cm}, color=red!50!black, line width=1] (-1, -2.7) -- (-1, -2.9) {};
\draw[->, transform canvas={xshift=-3.1cm}, color=red!50!black, line width=1] (-1, -2.3) -- (-1, -2.1) {};
\draw[->, transform canvas={xshift=-2.6cm}, color=red!50!black, line width=1] (-1, -2.3) -- (-1, -2.1) {};
\draw[color=gray, line width=0.3] (-4.1, -2.35) -- (-4.1, -2.65);
\draw[color=gray, line width=0.3] (-4.1, -2.5) -- (-3.6, -2.5) -- (-3.6, -2.35);
\end{scope}
\begin{scope}[xshift=6cm]
\draw[-, line width=4, color=blue!50!black] (-4.5, 0) -- (-0.5, 0) {};
\draw[-, line width=4, color=blue!50!black] (-4.5, -1) -- (-2.5, -1) {};
\draw[-, line width=4, color=green!50!black] (-4.5, -1) -- (-4.25, -1) {};
\draw[-, line width=4, color=green!50!black] (-3.5, -1) -- (-3.25, -1) {};
\draw[-, line width=4, color=green!50!black] (-4.2, -1) -- (-4.1, -1) {};
\draw[-, line width=4, color=green!50!black] (-3.2, -1) -- (-3.1, -1) {};
\draw[-, line width=4, color=green!50!black] (-4, -1) -- (-3.9, -1) {};
\draw[-, line width=4, color=green!50!black] (-3, -1) -- (-2.9, -1) {};
\draw[-, line width=4, color=green!50!black] (-4.5, -2) -- (-3.5, -2) {};
\draw[-, line width=4, color=green!50!black] (-4.5, -3) -- (-4, -3) {};
\node[] (index folding) at (-2.5, 1) {index folding};
\draw[->, transform canvas={xshift=-2.7cm}, color=red!50!black, line width=1] (-1, -0.7) -- (-1, -0.9) {};
\draw[->, transform canvas={xshift=-2.7cm}, color=red!50!black, line width=1] (-1, -0.3) -- (-1, -0.1) {};
\draw[->, transform canvas={xshift=-0.7cm}, color=red!50!black, line width=1] (-1, -0.3) -- (-1, -0.1) {};
\draw[color=gray, line width=0.3] (-3.7, -0.35) -- (-3.7, -0.65);
\draw[color=gray, line width=0.3] (-3.7, -0.5) -- (-1.7, -0.5) -- (-1.7, -0.35);
\draw[->, transform canvas={xshift=-2.7cm}, color=red!50!black, line width=1] (-1, -1.7) -- (-1, -1.9) {};
\draw[->, transform canvas={xshift=-2.7cm}, color=red!50!black, line width=1] (-1, -1.3) -- (-1, -1.1) {};
\draw[->, transform canvas={xshift=-1.7cm}, color=red!50!black, line width=1] (-1, -1.3) -- (-1, -1.1) {};
\draw[color=gray, line width=0.3] (-3.7, -1.35) -- (-3.7, -1.65);
\draw[color=gray, line width=0.3] (-3.7, -1.5) -- (-2.7, -1.5) -- (-2.7, -1.35);
\draw[->, transform canvas={xshift=-3.2cm}, color=red!50!black, line width=1] (-1, -2.7) -- (-1, -2.9) {};
\draw[->, transform canvas={xshift=-3.2cm}, color=red!50!black, line width=1] (-1, -2.3) -- (-1, -2.1) {};
\draw[->, transform canvas={xshift=-2.7cm}, color=red!50!black, line width=1] (-1, -2.3) -- (-1, -2.1) {};
\draw[color=gray, line width=0.3] (-4.2, -2.35) -- (-4.2, -2.65);
\draw[color=gray, line width=0.3] (-4.2, -2.5) -- (-3.7, -2.5) -- (-3.7, -2.35);
\end{scope}
\end{tikzpicture}
\end{frame}
\end{document}
================================================
FILE: docs/overview.md
================================================
# Anatomy of a STARK, Part 1: STARK Overview
STARKs are a class of interactive proof systems, but for the purpose of this tutorial it's good to think of them as a special case of SNARKs in which
- hash functions are the only cryptographic ingredient;
- arithmetization is based on AIR (algebraic intermediate representation [^1]), and reduces the claim about computational integrity to one about the low degree of certain polynomials;
- the low degree of polynomials is proven by using FRI as a subprotocol, and FRI itself is instantiated with Merkle trees [^2];
- zero-knowledge is optional.
This part of the tutorial is about explaining the key terms in this definition of STARKs.
## Interactive Proof Systems
In computational complexity theory, an interactive proof system is a protocol between at least two parties in which one party, the verifier, is convinced of the correctness of a certain mathematical claim if and only if that claim is true. In theory, the claim could be anything expressible by mathematical symbols, such as the Birch and Swinnerton-Dyer conjecture, $\mathbf{P} \neq \mathbf{NP}$, or "the fifteenth Fibonacci number is 643617." (In a sound proof system, the verifier will reject that last claim.)
A cryptographic proof system turns this abstract notion of interactive proof systems into a concrete object intended for deployment in the real world. This restriction to real world applications induces a couple of simplifications:
- The claim is not about a mathematical conjecture but concerns the integrity of a particular computation, like "circuit $C$ gives output $y$ when evaluated on input $x$", or "Turing machine $M$ outputs $y$ after $T$ steps". The proof system is said to establish *computational integrity*.
- There are two parties to the protocol, the prover and the verifier. Without loss of generality, the messages sent by the verifier to the prover consist of unadulterated randomness; in this case (so: almost always) the proof system can be made non-interactive with the *Fiat-Shamir transform*. Non-interactive proof systems consist of a single message from the prover to the verifier.
- Instead of perfect security, it is acceptable for the verifier to have a nonzero but negligibly small false positive or false negative rate. Alternatively, it is acceptable for the proof system to offer security only against provers whose computational power is bounded. After all, all computers are computationally bounded in practice. Sometimes authors use the term *argument system* to distinguish the protocol from a proof system that offers security against computationally unbounded provers, and *argument* for the transcript resulting from the non-interactivity transform.
- There has to be a compelling reason why the verifier cannot naïvely re-run the computation whose integrity is asserted by the computational integrity claim. This is because the prover has access to resources that the verifier does not.
- When the restricted resource is time, the verifier should run an order of magnitude faster than a naïve re-execution of the program. Proof systems that achieve this property are said to be *succinct* or have *succinct verification*.
- Succinct verification requires short proofs, but some proof systems like [Bulletproofs](https://eprint.iacr.org/2017/1066.pdf) or [Aurora](https://eprint.iacr.org/2018/828.pdf) feature compact proofs but still have slow verifiers.
- When the verifier has no access to secret information that is available to the prover, and when the proof system protects the confidentiality of this secret, the proof system satisfies *zero-knowledge*. The verifier is convinced of the truth of a computational claim while learning no information about some or all of the inputs to that computation.
- Especially in the context of zero-knowledge proof systems, the computational integrity claim may need a subtle amendment. In some contexts it is not enough to prove the correctness of a claim, but the prover must additionally prove that he *knows* the secret additional input, and could as well have outputted the secret directly instead of producing the proof.[^3] Proof systems that achieve this stronger notion of soundness called knowledge-soundness are called *proofs (or arguments) of knowledge*.
A SNARK is a *Succinct Non-interactive ARgument of Knowledge*. The [paper](https://eprint.iacr.org/2011/443.pdf) that coined the term SNARK used *succinct* to denote proof systems with efficient verifiers. However, in recent years the meaning of the term has been diluted to include any system whose proofs are compact. This tutorial takes the side of the original definition.
## STARK Overview
The acronym STARK stands for Scalable Transparent ARgument of Knowledge. *Scalable* refers to the fact that two things occur simultaneously: (1) the prover has a running time that is at most quasilinear in the size of the computation, in contrast to SNARKs where the prover is allowed to have a prohibitively expensive complexity, and (2) verification time is poly-logarithmic in the size of the computation. *Transparent* refers to the fact that all verifier messages are just publicly sampled random coins. In particular, no trusted setup procedure is needed to instantiate the proof system, and hence there is no cryptographic toxic waste. The acronym's denotation suggests that non-interactive STARKs are a subclass of SNARKs, and indeed they are, but the term is generally used to refer to a *specific* construction for scalable transparent SNARKs.
The particular qualities of this construction are best illustrated in the context of the compilation pipeline. Depending on the level of granularity, one might opt to subdivide this process into more or fewer steps. For the purpose of introducing STARKs, the compilation pipeline is divided into four stages and three transformations. Later on in this tutorial, there will be a much more fine-grained pipeline and diagram.

### Computation
The input to the entire pipeline is a *computation*, which you can think of as a program, an input, and an output. All three are provided in a machine-friendly format, such as a list of bytes. In general, the program consists of instructions that determine how a machine manipulates its resources. If the right list of instructions can simulate an arbitrary Turing machine, then the machine architecture is Turing-complete.
In this tutorial the program is hardcoded into the machine architecture. As a result, the space of allowable computations is rather limited. Nevertheless, the inputs and outputs remain variable.
The *resources* that a computation requires could be *time*, *memory*, *randomness*, *secret information*, or *parallelism*. The goal is to transform the computation into a format that enables a resource-constrained verifier to verify its integrity. It is possible to study more types of resources still, such as entangled qubits, non-determinism, or oracles that compute a given black box function, but the resulting questions are typically the subject of computational complexity theory rather than cryptographic practice.
### Arithmetization and Arithmetic Constraint System
The first transformation in the pipeline is known as *arithmetization*. In this procedure, the sequence of elementary logical and arithmetical operations on strings of bits is transformed into a sequence of native finite field operations on finite field elements, such that the two represent the same computation. The output is an arithmetic constraint system, essentially a bunch of equations with coefficients and variables taking values from the finite field. The computation is integral *if and only if* the constraint system has a satisfying solution -- meaning, a single assignment to the variables such that all the equations hold.
The STARK proof system arithmetizes a computation as follows. At any point in time, the state of the computation is contained in a tuple of $\mathsf{w}$ registers that take values from the finite field $\mathbb{F}$. The machine defines a *state transition function* $f : \mathbb{F}^\mathsf{w} \rightarrow \mathbb{F}^\mathsf{w}$ that updates the state every cycle. The *algebraic execution trace (AET)* is the list of all state tuples in chronological order.
The arithmetic constraint system defines at least two types of constraints on the algebraic execution trace:
- *Boundary constraints*: at the start or at the end of the computation an indicated register has a given value.
- *Transition constraints*: any two consecutive state tuples evolved in accordance with the state transition function.
Collectively, these constraints are known as the *algebraic intermediate representation*, or *AIR*. Advanced STARKs may define more constraint types in order to deal with memory or with consistency of registers within one cycle.
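To make these notions concrete, here is a minimal sketch of a hypothetical toy machine (one register over a small prime field, not part of the tutorial's codebase): the machine squares its register every cycle, the transition constraint checks exactly that relation between consecutive rows of the trace, and the boundary constraints pin down indicated cells.
```python
# Toy illustration only: one register over a small prime field, squared every cycle.
p = 97        # small prime field for illustration; the tutorial uses a much larger one
T = 5         # number of cycles

def transition(x):
    return (x * x) % p    # state transition function f

# algebraic execution trace (AET): the state at every point in time, in chronological order
trace = [3]               # initial state
for _ in range(T):
    trace.append(transition(trace[-1]))

# transition constraints: every pair of consecutive states is related by f
assert all(trace[i + 1] == transition(trace[i]) for i in range(T))

# boundary constraints: (cycle, register, value) triples pinning down start and end
boundary = [(0, 0, 3), (T, 0, trace[T])]
assert all(trace[cycle] == value for (cycle, _register, value) in boundary)
```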
### Interpolation and RS IOPs
Interpolation in the usual sense means finding a polynomial that passes through a set of data points. In the context of the STARK compilation pipeline, *interpolation* means finding a representation of the arithmetic constraint system in terms of polynomials. The resulting object is not an arithmetic constraint system but an abstract protocol.
The prover in a regular proof system sends messages to the verifier. But what happens when the verifier is not allowed to read them? Specifically, if the messages from the prover are replaced by oracles, abstract black-box functionalities that the verifier can query in points of his choosing, the protocol is an *interactive oracle proof (IOP)*. When the oracles correspond to polynomials of low degree, it is a *Polynomial IOP*. The intuition is that the honest prover obtains a polynomial constraint system whose equations hold, and that the cheating prover must use a constraint system where at least one equation is false. When polynomials are equal, they are equal everywhere, and in particular in random points of the verifier's choosing. But when polynomials are unequal, they are unequal *almost* everywhere, and this inequality is exposed with high probability when the verifier probes the left and right hand sides in a random point.
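To illustrate the probing argument, here is a minimal sketch (plain Python with coefficient lists, standing in for the tutorial's polynomial type) of two distinct degree-3 polynomials over the prime field used later in this tutorial: they agree in at most 3 points, so a uniformly random probe distinguishes them except with probability at most $3/p$.
```python
import random

p = 270497897142230380135924736767050121217   # the prime 407 * 2^119 + 1 used in this tutorial

def evaluate(coeffs, x):
    # evaluate a polynomial, given as a list of coefficients of X^0, X^1, ..., via Horner's rule
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

left  = [1, 2, 3, 4]        # 1 + 2X + 3X^2 + 4X^3
right = [1, 2, 3, 5]        # a different polynomial of degree 3
z = random.randrange(p)     # the verifier's random probe
# two distinct polynomials of degree at most 3 agree in at most 3 of the p points,
# so this check fails with probability at most 3/p
assert evaluate(left, z) != evaluate(right, z)
```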
The STARK proof system interpolates the algebraic execution trace literally -- that is to say, it finds $\mathsf{w}$ polynomials $t_i(X)$ such that the values $t_i(X)$ takes on a domain $D$ correspond to the algebraic execution trace of the register $i$. These polynomials are sent as oracles to the verifier. At this point the AIR constraints give rise to operations on polynomials that send low-degree polynomials to low-degree polynomials only if the constraints are satisfied. The verifier simulates these operations and can thus derive new polynomials whose low degree certifies the satisfiability of the constraint system, and thus the integrity of the computation. In other words, the interpolation step reduces the satisfiability of an arithmetic constraint system to a claim about the low degree of certain polynomials.
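As a minimal sketch of this interpolation step (naive Lagrange interpolation over a toy field, rather than the tutorial's `Polynomial.interpolate_domain`), one trace column is matched by a polynomial that takes exactly those values on the interpolation domain.
```python
p = 97                                    # toy prime field for illustration

def inverse(a):
    return pow(a, p - 2, p)               # Fermat inversion in the prime field

def lagrange_evaluate(domain, values, x):
    # evaluate, at x, the unique low-degree interpolant through (domain[i], values[i])
    total = 0
    for i, (xi, yi) in enumerate(zip(domain, values)):
        num, den = 1, 1
        for j, xj in enumerate(domain):
            if j != i:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * inverse(den)) % p
    return total

column = [3, 9, 81, 66, 87]               # arbitrary trace values of one register
domain = [1, 2, 3, 4, 5]                  # in the real protocol: powers of a generator
# the trace polynomial t_i(X) agrees with the trace column on the domain
assert all(lagrange_evaluate(domain, column, d) == v for d, v in zip(domain, column))
```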
### Cryptographic Compilation with FRI
In the real world, polynomial oracles do not exist. The protocol designer who wants to use a Polynomial IOP as an intermediate stage must find a way to commit to a polynomial and then open that polynomial in a point of the verifier's choosing. FRI is a key component of a STARK proof that achieves this task by using Merkle trees of Reed-Solomon codewords to prove the boundedness of a polynomial's degree.
The Reed-Solomon codeword associated with a polynomial $f(X) \in \mathbb{F}[X]$ is the list of values it takes on a given domain $D \subset \mathbb{F}$. Consider without loss of generality domains $D$ whose cardinality is larger than the maximum allowable degree for polynomials. These values can be put into a Merkle tree, in which case the root represents a commitment to the polynomial. The *Fast Reed-Solomon IOP of Proximity (FRI)* is a protocol whose prover sends a sequence of Merkle roots corresponding to codewords whose lengths halve in every iteration. The verifier inspects the Merkle trees (specifically: asks the prover to provide the indicated leafs with their authentication paths) of consecutive rounds to test a simple linear relation. For honest provers, the degree of the represented polynomials likewise halves in each round, and is thus much smaller than the length of the codeword. However, for malicious provers, this degree is one less than the length of the codeword. In the last step, the prover sends a non-trivial codeword corresponding to a constant polynomial.
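The following is a rough sketch of the split-and-fold idea behind this halving, working with coefficient lists over a toy field; the actual protocol, covered in [part 3](fri), works with the Reed-Solomon codewords of these polynomials and commits to them with Merkle trees. The polynomial is split into its even and odd parts and recombined with a random weight supplied by the verifier, so the degree bound halves in every round.
```python
import random

p = 97                                        # toy prime field for illustration

def fold(coeffs, alpha):
    # split f(X) = f_E(X^2) + X * f_O(X^2) into even and odd parts and combine them
    # with the verifier's random weight: f*(Y) = f_E(Y) + alpha * f_O(Y)
    even = coeffs[0::2]
    odd = coeffs[1::2]
    return [(e + alpha * o) % p for e, o in zip(even, odd + [0])]

f = [random.randrange(p) for _ in range(8)]   # a polynomial of degree at most 7
alpha = random.randrange(p)                   # verifier-supplied randomness
folded = fold(f, alpha)
assert len(folded) == 4                       # the degree bound halves: at most 3
```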
There is a minor issue the above description does not capture: how does the verifier query a committed polynomial $f(X)$ in a point $z$ that does not belong to the domain? In principle, there is an obvious and straightforward solution: the verifier sends $z$ to the prover, and the prover responds by sending $y=f(z)$. The polynomial $f(X) - y$ has a zero in $X=z$ and so must be divisible by $X-z$. So both prover and verifier have access to a new low degree polynomial, $\frac{f(X) - y}{X-z}$. If the prover was lying about $f(z)=y$, then he is incapable of proving the low degree of $\frac{f(X) - y}{X-z}$, and so his fraud will be exposed in the course of the FRI protocol. This is, in fact, the exact mechanism that enforces the boundary constraints; a slightly more involved, but similar, construction enforces the transition constraints. The new polynomials are the result of dividing out known factors, so they will be called *quotients* and denoted $q_i(X)$.
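Here is a minimal sketch of that mechanism (coefficient lists over a toy prime field, not the tutorial's Polynomial class): an honest opening $y = f(z)$ makes $f(X) - y$ divisible by $X - z$, whereas a wrong opening leaves a nonzero remainder, so no polynomial quotient exists.
```python
p = 97                                     # toy prime field for illustration

def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):             # Horner's rule; coeffs[i] is the coefficient of X^i
        acc = (acc * x + c) % p
    return acc

def divide_out_root(coeffs, z):
    # synthetic division of f(X) by (X - z): returns quotient coefficients and remainder
    quotient = [0] * (len(coeffs) - 1)
    acc = 0
    for i in reversed(range(1, len(coeffs))):
        acc = (acc + coeffs[i]) % p
        quotient[i - 1] = acc
        acc = (acc * z) % p
    remainder = (acc + coeffs[0]) % p
    return quotient, remainder

f = [5, 1, 0, 7]                           # f(X) = 5 + X + 7X^3, arbitrary example
z = 13
y = evaluate(f, z)                         # honest opening: y = f(z)
quotient, remainder = divide_out_root([(f[0] - y) % p] + f[1:], z)
assert remainder == 0                      # (f(X) - y) / (X - z) is again a polynomial
_, bad = divide_out_root([(f[0] - y - 1) % p] + f[1:], z)
assert bad != 0                            # a false opening cannot be divided out cleanly
```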
The terms "IOP" and "Polynomial IOP" in general refer to different, but similar, things. The messages sent by the prover are *codewords* in the case of IOPs, but *polynomials* in the case of Polynomial IOPs. However, when the FRI protocol is used, then the polynomials themselves are represented by their Reed-Solomon codewords. In other words, in the context of FRI, the terms "IOP" and "Polynomial IOP" become interchangeable.
At this point, the Polynomial IOP has been compiled into an interactive concrete proof system. In principle, the protocol could be executed. However, it pays to do one more step of cryptographic compilation: replace the verifier's random coins (a.k.a. *randomness*) by something pseudorandom -- but deterministic. This is exactly what the Fiat-Shamir transform does, and the result is the non-interactive proof known as the STARK.
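As a minimal sketch of this transform (the hash function and transcript encoding here are arbitrary illustrative choices; the tutorial's `ProofStream` class is what the actual protocol uses), every verifier challenge is computed as a hash of the transcript so far.
```python
from hashlib import shake_256

def fiat_shamir_challenge(transcript, num_bytes=32):
    # derive the verifier's "random" challenge deterministically from the transcript so far
    return shake_256(transcript).digest(num_bytes)

# the prover appends each of its messages to the transcript and hashes to get the next challenge
transcript = b""
transcript += b"first prover message (e.g., a Merkle root)"
challenge_1 = fiat_shamir_challenge(transcript)
transcript += b"second prover message"
challenge_2 = fiat_shamir_challenge(transcript)
assert challenge_1 != challenge_2          # later challenges depend on the longer transcript
```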

This description glosses over many details. The remainder of this tutorial will explain the construction in more concrete and tangible terms, and will insert more fine-grained components into the diagram.
[0](index) - **1** - [2](basic-tools) - [3](fri) - [4](stark) - [5](rescue-prime) - [6](faster)
[^1]: Also, algebraic *internal* representation.
[^2]: Note that FRI is defined in terms of abstract oracles which can be queried in arbitrary locations; a FRI protocol can thus be compiled into a concrete protocol by simulating the oracles with any cryptographic vector commitment scheme. Merkle trees provide this functionality but are not the only cryptographic primitive to do it.
[^3]: Formally, *knowledge* is defined as follows: an extractor algorithm must exist which has oracle access to a possibly-malicious prover, pretends to be the matching verifier (and in particular reads the messages coming from the prover and sends its own via the same interface), has the power to rewind the possibly-malicious prover to any earlier point in time, runs in polynomial time, and outputs the witness. STARKs have been shown to satisfy this property, see section 5 of the [EthSTARK documentation](https://eprint.iacr.org/2021/582.pdf).
================================================
FILE: docs/rescue-prime.md
================================================
# Anatomy of a STARK, Part 5: A Rescue-Prime STARK
This part of the tutorial puts the tools developed in the previous parts together to build a concretely useful STARK proof system. This application produces a STARK proof of correct evaluation of the Rescue-Prime hash function on a secret input with a known output. It is concretely useful because the resulting non-interactive proof doubles as a post-quantum signature scheme.
## Rescue-Prime
[Rescue-Prime](https://eprint.iacr.org/2020/1143.pdf) is an arithmetization-oriented hash function, meaning that it has a compact description in terms of AIR. It is a sponge function constructed from the Rescue-XLIX permutation $f_{\mathrm{R}^{\mathrm{XLIX}}} : \mathbb{F}^m \rightarrow \mathbb{F}^m$, consisting of several almost-identical rounds. Every round consists of six steps:
1. Forward S-box. Every element of the state is raised to the power $\alpha$, where $\alpha$ is the smallest integer for which the map $x \mapsto x^\alpha$ is invertible.
2. MDS. The vector of state elements is multiplied by a matrix with special properties.
3. Round constants. Pre-defined constants are added to every element of the state.
4. Backward S-box. Every element of the state is raised to the power $\alpha^{-1}$, which is the integer whose power map is the inverse of $x \mapsto x^\alpha$.
5. MDS. The vector of state elements is multiplied by a matrix with special properties.
6. Round constants. Pre-defined constants are added to every element of the state.

The rounds are *almost* identical but not quite, because the constants are different in every round. While the Backward S-box step seems like a high degree operation, as we shall see, all six steps of the Rescue-XLIX round function can be captured by non-deterministic transition constraints of degree $\alpha$.
Once the Rescue-XLIX permutation is defined, one obtains Rescue-Prime by instantiating a sponge function with it. In this construction, input field elements are absorbed into the top $r$ elements of the state in between permutations. After one final permutation, the top $r$ elements are read out. The Rescue-Prime hash digest consists of these $r$ elements.

For the present STARK proof the following parameters are used:
- prime field of $p = 407 \cdot 2^{119} + 1$ elements
- $\alpha = 3$ and $\alpha^{-1} = 180331931428153586757283157844700080811$
- $m = 2$
- $r = 1$
Furthermore, the input to the hash computation will be a single field element. So in particular, there will be only one round of absorbing and one application of the permutation.
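As a quick sanity check of the parameters above (a sketch using only the Python standard library), $\alpha^{-1}$ is the inverse of $\alpha$ modulo $p - 1$, so that the backward S-box undoes the forward S-box.
```python
p = 407 * (1 << 119) + 1
alpha = 3
alpha_inv = 180331931428153586757283157844700080811

# alpha_inv inverts alpha as an exponent, so x^(alpha * alpha_inv) = x for all nonzero x
assert (alpha * alpha_inv) % (p - 1) == 1
# spot-check on an arbitrary nonzero field element
x = 123456789
assert pow(pow(x, alpha, p), alpha_inv, p) == x
```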
## Implementation
The Rescue-Prime [paper](https://eprint.iacr.org/2020/1143.pdf) provides a nearly complete reference implementation. However, the code here is tailored to this one application.
```python
class RescuePrime:
def __init__( self ):
self.p = 407 * (1 << 119) + 1
self.field = Field(self.p)
self.m = 2
self.rate = 1
self.capacity = 1
self.N = 27
self.alpha = 3
self.alphainv = 180331931428153586757283157844700080811
self.MDS = [[FieldElement(v, self.field) for v in [270497897142230380135924736767050121214, 4]],
[FieldElement(v, self.field) for v in [270497897142230380135924736767050121205, 13]]]
self.MDSinv = [[FieldElement(v, self.field) for v in [210387253332845851216830350818816760948, 60110643809384528919094385948233360270]],
[FieldElement(v, self.field) for v in [90165965714076793378641578922350040407, 180331931428153586757283157844700080811]]]
self.round_constants = [FieldElement(v, self.field) for v in [174420698556543096520990950387834928928,
109797589356993153279775383318666383471,
228209559001143551442223248324541026000,
268065703411175077628483247596226793933,
250145786294793103303712876509736552288,
154077925986488943960463842753819802236,
204351119916823989032262966063401835731,
57645879694647124999765652767459586992,
102595110702094480597072290517349480965,
8547439040206095323896524760274454544,
50572190394727023982626065566525285390,
87212354645973284136664042673979287772,
64194686442324278631544434661927384193,
23568247650578792137833165499572533289,
264007385962234849237916966106429729444,
227358300354534643391164539784212796168,
179708233992972292788270914486717436725,
102544935062767739638603684272741145148,
65916940568893052493361867756647855734,
144640159807528060664543800548526463356,
58854991566939066418297427463486407598,
144030533171309201969715569323510469388,
264508722432906572066373216583268225708,
22822825100935314666408731317941213728,
33847779135505989201180138242500409760,
146019284593100673590036640208621384175,
51518045467620803302456472369449375741,
73980612169525564135758195254813968438,
31385101081646507577789564023348734881,
270440021758749482599657914695597186347,
185230877992845332344172234234093900282,
210581925261995303483700331833844461519,
233206235520000865382510460029939548462,
178264060478215643105832556466392228683,
69838834175855952450551936238929375468,
75130152423898813192534713014890860884,
59548275327570508231574439445023390415,
43940979610564284967906719248029560342,
95698099945510403318638730212513975543,
77477281413246683919638580088082585351,
206782304337497407273753387483545866988,
141354674678885463410629926929791411677,
19199940390616847185791261689448703536,
177613618019817222931832611307175416361,
267907751104005095811361156810067173120,
33296937002574626161968730356414562829,
63869971087730263431297345514089710163,
200481282361858638356211874793723910968,
69328322389827264175963301685224506573,
239701591437699235962505536113880102063,
17960711445525398132996203513667829940,
219475635972825920849300179026969104558,
230038611061931950901316413728344422823,
149446814906994196814403811767389273580,
25535582028106779796087284957910475912,
93289417880348777872263904150910422367,
4779480286211196984451238384230810357,
208762241641328369347598009494500117007,
34228805619823025763071411313049761059,
158261639460060679368122984607245246072,
65048656051037025727800046057154042857,
134082885477766198947293095565706395050,
23967684755547703714152865513907888630,
8509910504689758897218307536423349149,
232305018091414643115319608123377855094,
170072389454430682177687789261779760420,
62135161769871915508973643543011377095,
15206455074148527786017895403501783555,
201789266626211748844060539344508876901,
179184798347291033565902633932801007181,
9615415305648972863990712807943643216,
95833504353120759807903032286346974132,
181975981662825791627439958531194157276,
267590267548392311337348990085222348350,
49899900194200760923895805362651210299,
89154519171560176870922732825690870368,
265649728290587561988835145059696796797,
140583850659111280842212115981043548773,
266613908274746297875734026718148328473,
236645120614796645424209995934912005038,
265994065390091692951198742962775551587,
59082836245981276360468435361137847418,
26520064393601763202002257967586372271,
108781692876845940775123575518154991932,
138658034947980464912436420092172339656,
45127926643030464660360100330441456786,
210648707238405606524318597107528368459,
42375307814689058540930810881506327698,
237653383836912953043082350232373669114,
236638771475482562810484106048928039069,
168366677297979943348866069441526047857,
195301262267610361172900534545341678525,
2123819604855435621395010720102555908,
96986567016099155020743003059932893278,
248057324456138589201107100302767574618,
198550227406618432920989444844179399959,
177812676254201468976352471992022853250,
211374136170376198628213577084029234846,
105785712445518775732830634260671010540,
122179368175793934687780753063673096166,
126848216361173160497844444214866193172,
22264167580742653700039698161547403113,
234275908658634858929918842923795514466,
189409811294589697028796856023159619258,
75017033107075630953974011872571911999,
144945344860351075586575129489570116296,
261991152616933455169437121254310265934,
18450316039330448878816627264054416127]]
def hash( self, input_element ):
# absorb
state = [input_element] + [self.field.zero()] * (self.m - 1)
# permutation
for r in range(self.N):
# forward half-round
# S-box
for i in range(self.m):
state[i] = state[i]^self.alpha
# matrix
temp = [self.field.zero() for i in range(self.m)]
for i in range(self.m):
for j in range(self.m):
temp[i] = temp[i] + self.MDS[i][j] * state[j]
# constants
state = [temp[i] + self.round_constants[2*r*self.m+i] for i in range(self.m)]
# backward half-round
# S-box
for i in range(self.m):
state[i] = state[i]^self.alphainv
# matrix
temp = [self.field.zero() for i in range(self.m)]
for i in range(self.m):
for j in range(self.m):
temp[i] = temp[i] + self.MDS[i][j] * state[j]
# constants
state = [temp[i] + self.round_constants[2*r*self.m+self.m+i] for i in range(self.m)]
# squeeze
return state[0]
```
### Rescue-Prime AIR
The transition constraints for a single round of the Rescue-XLIX permutation are obtained by expressing the state values in the middle of the round in terms of the state values at the beginning, and again in terms of the state values at the end, and then equating both expressions. Specifically, let $\boldsymbol{s}_ {i}$ denote the state values at the beginning of round $i$, let $\boldsymbol{c}_ {2i}$ and $\boldsymbol{c}_{2i+1}$ be round constants, let $M$ be the MDS matrix, and let superscript denote element-wise powering. Then the transition of a single round is captured by the equation:
$$ M (\boldsymbol{s}_i^\alpha) + \boldsymbol{c}_{2i} = \left(M^{-1} (\boldsymbol{s}_{i+1} - \boldsymbol{c}_{2i+1})\right)^\alpha \enspace $$
To be used in a STARK, transition constraints cannot depend on the round. In other words, what is needed is a single equation that describes all rounds, not just the $i$th round. Let $\boldsymbol{X}$ denote the vector of variables representing the current state (beginning of the round), and $\boldsymbol{Y}$ denote the vector of variables representing the next state (at the end of the round). Furthermore, let $\mathbf{f}_ {\boldsymbol{c}_ {2i}}(W)$ denote the vector of $m$ polynomials that take the value $\boldsymbol{c}_ {2i}$ on $\omicron^i$, and analogously for $\mathbf{f}_ {\boldsymbol{c}_{2i+1}}(W)$. Suppose without loss of generality that the execution trace will be interpolated on the domain $\lbrace \omicron^i \vert 0 \leq i \leq T\rbrace$ for some $T$. Then the above family of arithmetic transition constraints gives rise to the following equation capturing the same transition conditions:
$$ M(\boldsymbol{X}^\alpha) + \mathbf{f}_ {\boldsymbol{c}_ {2i}}(W) = \left(M^{-1}(\boldsymbol{Y} - \mathbf{f}_ {\boldsymbol{c}_{2i+1}}(W))\right)^\alpha $$
The transition constraint polynomials are obtained by moving all terms to the left-hand side, so that the right-hand side is zero. Note that there are $2m+1$ variables: one for the cycle (the interpolants' variable $W$), plus $m = \mathsf{w}$ variables for the current state and another $m$ for the next state.
```python
def round_constants_polynomials( self, omicron ):
first_step_constants = []
for i in range(self.m):
domain = [omicron^r for r in range(0, self.N)]
values = [self.round_constants[2*r*self.m+i] for r in range(0, self.N)]
univariate = Polynomial.interpolate_domain(domain, values)
multivariate = MPolynomial.lift(univariate, 0)
first_step_constants += [multivariate]
second_step_constants = []
for i in range(self.m):
domain = [omicron^r for r in range(0, self.N)]
values = [self.field.zero()] * self.N
#for r in range(self.N):
# print("len(round_constants):", len(self.round_constants), " but grabbing index:", 2*r*self.m+self.m+i, "for r=", r, "for m=", self.m, "for i=", i)
# values[r] = self.round_constants[2*r*self.m + self.m + i]
values = [self.round_constants[2*r*self.m+self.m+i] for r in range(self.N)]
univariate = Polynomial.interpolate_domain(domain, values)
multivariate = MPolynomial.lift(univariate, 0)
second_step_constants += [multivariate]
return first_step_constants, second_step_constants
def transition_constraints( self, omicron ):
# get polynomials that interpolate through the round constants
first_step_constants, second_step_constants = self.round_constants_polynomials(omicron)
# arithmetize one round of Rescue-Prime
variables = MPolynomial.variables(1 + 2*self.m, self.field)
cycle_index = variables[0]
previous_state = variables[1:(1+self.m)]
next_state = variables[(1+self.m):(1+2*self.m)]
air = []
for i in range(self.m):
# compute left hand side symbolically
# lhs = sum(MPolynomial.constant(self.MDS[i][k]) * (previous_state[k]^self.alpha) for k in range(self.m)) + first_step_constants[i]
lhs = MPolynomial.constant(self.field.zero())
for k in range(self.m):
lhs = lhs + MPolynomial.constant(self.MDS[i][k]) * (previous_state[k]^self.alpha)
lhs = lhs + first_step_constants[i]
# compute right hand side symbolically
# rhs = sum(MPolynomial.constant(self.MDSinv[i][k]) * (next_state[k] - second_step_constants[k]) for k in range(self.m))^self.alpha
rhs = MPolynomial.constant(self.field.zero())
for k in range(self.m):
rhs = rhs + MPolynomial.constant(self.MDSinv[i][k]) * (next_state[k] - second_step_constants[k])
rhs = rhs^self.alpha
# equate left and right hand sides
air += [lhs-rhs]
return air
```
The boundary constraints are a lot simpler. At the beginning, the first state element is the unknown secret and the second state element is zero, because the sponge construction defines it as such. At the end (after all $N$ rounds or $T$ cycles), the first state element is the single element of the known hash digest $[h]$, and the second state element is unconstrained. Note that this second state element must remain secret for the scheme to be secure -- otherwise the attacker can invert the permutation. This description gives rise to the following set $\mathcal{B}$ of triples $(c, r, e) \in \lbrace 0, \ldots, T \rbrace \times \lbrace 0, \ldots, \mathsf{w}-1 \rbrace \times \mathbb{F}$:
- $(0, 1, 0)$
- $(T, 0, h)$.
```python
def boundary_constraints( self, output_element ):
constraints = []
# at start, capacity is zero
constraints += [(0, 1, self.field.zero())]
# at end, rate part is the given output element
constraints += [(self.N, 0, output_element)]
return constraints
```
The remaining piece of the arithmetization is the witness, which for STARKs is the execution trace. In the case of this particular computation, the trace is the collection of states after every round, in addition to the state at the very beginning.
```python
def trace( self, input_element ):
trace = []
# absorb
state = [input_element] + [self.field.zero()] * (self.m - 1)
# explicit copy to record state into trace
trace += [[s for s in state]]
# permutation
for r in range(self.N):
# forward half-round
# S-box
for i in range(self.m):
state[i] = state[i]^self.alpha
# matrix
temp = [self.field.zero() for i in range(self.m)]
for i in range(self.m):
for j in range(self.m):
temp[i] = temp[i] + self.MDS[i][j] * state[j]
# constants
state = [temp[i] + self.round_constants[2*r*self.m+i] for i in range(self.m)]
# backward half-round
# S-box
for i in range(self.m):
state[i] = state[i]^self.alphainv
# matrix
temp = [self.field.zero() for i in range(self.m)]
for i in range(self.m):
for j in range(self.m):
temp[i] = temp[i] + self.MDS[i][j] * state[j]
# constants
state = [temp[i] + self.round_constants[2*r*self.m+self.m+i] for i in range(self.m)]
# record state at this point, with explicit copy
trace += [[s for s in state]]
return trace
```
## STARK-based Signatures
A non-interactive zero-knowledge proof system can be transformed into a signature scheme. The catch is that it must be capable of proving knowledge of a solution to a cryptographically hard problem. STARKs can be used to prove arbitrarily complex computational statements. However, the whole point of Rescue-Prime is that it generates cryptographically hard problem instances in a STARK-friendly way -- concretely, with a compact AIR. So let's transform a STARK for Rescue-Prime into a signature scheme.
### Rescue-Prime STARK
Producing a prover and verifier for a Rescue-Prime STARK consists of little more than linking together existing code snippets.
```python
class RPSSS:
def __init__( self ):
self.field = Field.main()
expansion_factor = 4
num_colinearity_checks = 64
security_level = 2 * num_colinearity_checks
self.rp = RescuePrime()
num_cycles = self.rp.N+1
state_width = self.rp.m
self.stark = Stark(self.field, expansion_factor, num_colinearity_checks, security_level, state_width, num_cycles, transition_constraints_degree=3)
def stark_prove( self, input_element, proof_stream ):
output_element = self.rp.hash(input_element)
trace = self.rp.trace(input_element)
transition_constraints = self.rp.transition_constraints(self.stark.omicron)
boundary_constraints = self.rp.boundary_constraints(output_element)
proof = self.stark.prove(trace, transition_constraints, boundary_constraints, proof_stream)
return proof
def stark_verify( self, output_element, stark_proof, proof_stream ):
boundary_constraints = self.rp.boundary_constraints(output_element)
transition_constraints = self.rp.transition_constraints(self.stark.omicron)
return self.stark.verify(stark_proof, transition_constraints, boundary_constraints, proof_stream)
```
Note the explicit argument concerning the proof stream. This needs to be a special object that simulates a *message-dependent* Fiat-Shamir transform, as opposed to a regular one.
### Message-Dependent Fiat-Shamir
In order to transform a zero-knowledge proof system into a signature scheme, a non-interactive proof must be tied to the document that is being signed. Traditionally, the way to do this via Fiat-Shamir is to define the verifier's pseudorandom response as the hash digest of the document concatenated with the entire protocol transcript up until the point where its output is required.
In terms of implementation, this requires a new proof stream object -- one that is aware of the document for which the signature is to be generated or verified. The next class achieves just this.
```python
class SignatureProofStream(ProofStream):
def __init__( self, document ):
ProofStream.__init__(self)
self.document = document
self.prefix = blake2s(bytes(document)).digest()
def prover_fiat_shamir( self, num_bytes=32 ):
return shake_256(self.prefix + self.serialize()).digest(num_bytes)
def verifier_fiat_shamir( self, num_bytes=32 ):
return shake_256(self.prefix + pickle.dumps(self.objects[:self.read_index])).digest(num_bytes)
def deserialize( self, bb ):
sps = SignatureProofStream(self.document)
sps.objects = pickle.loads(bb)
return sps
```
### Signature Scheme
At this point it is possible to define the key generation, signature generation, and signature verification functions that make up a signature scheme. Note that these functions are members of the Rescue-Prime STARK Signature Scheme (`RPSSS`) class whose definition started earlier.
```python
# class RPSSS:
def keygen( self ):
sk = self.field.sample(os.urandom(17))
pk = self.rp.hash(sk)
return sk, pk
def sign( self, sk, document ):
sps = SignatureProofStream(document)
signature = self.stark_prove(sk, sps)
return signature
def verify( self, pk, document, signature ):
sps = SignatureProofStream(document)
return self.stark_verify(pk, signature, sps)
```
This code defines a *provably secure*[^1], *post-quantum* signature scheme that (almost) achieves a 128 bit security level. While this description sounds flattering, the scheme's performance metrics are much less so:
- secret key size: 16 bytes (yay!)
- public key size: 16 bytes (yay!)
- signature size: **~133 kB**
- keygen time: 0.01 seconds (acceptable)
- signing time: **250 seconds**
- verification time: **444 seconds**
There might be a few optimizations available that can reduce the proof's size, such as merging common paths when opening a batch of Merkle leafs. However, these optimizations distract from the purpose of this tutorial, which is to highlight and explain the mathematics involved.
In terms of speed, a lot of the poor performance is due to using Python instead of a language that is closer to the hardware such as C or Rust. Python was chosen for the same reason -- to highlight and explain the maths. But the biggest performance gain in terms of speed is going to come from switching to faster algorithms for key operations. This is the topic of the next and last part of the tutorial.
[0](index) - [1](overview) - [2](basic-tools) - [3](fri) - [4](stark) - **5** - [6](faster)
[^1]: More specifically, in the random oracle model, a successful signature-forger gives rise to an adversary who breaks the one-wayness of Rescue-Prime with polynomially related running time and success probability.
================================================
FILE: docs/stark.md
================================================
# Anatomy of a STARK, Part 4: The STARK IOP
This part of the tutorial deals with the information-theoretic backbone of the STARK proof system, which you might call the STARK IOP. Recall that the compilation pipeline of SNARKs involves intermediate stages, the first two of which are the *arithmetic constraint system* and the *IOP*. This tutorial does describe the properties of the arithmetic constraint system. However, a discussion about the *arithmetization* step, which transforms the initial computation into an arithmetic constraint system, is out of scope. The *interpolation* step, which transforms this arithmetic constraint system into an IOP, is discussed at length. The final IOP can be compiled into a concrete proof system using the FRI-based compiler described in [part 3](fri).
## Arithmetic Intermediate Representation (AIR)
The *arithmetic intermediate representation (AIR)* (also, arithmetic *internal* representation) is a way of describing a computation in terms of an execution trace that satisfies a number of constraints induced by the correct evolution of the state. The term *arithmetic* refers to the fact that this execution trace consists of a list of finite field elements (or an array, if more than one register is involved), and that the constraints are expressible as low degree polynomials.
Let's make this more concrete. Let $\mathbb{F}_p$ be the field of definition. Without loss of generality, the computation describes the evolution of a *state* of $\mathsf{w}$ registers for $T$ cycles. The *algebraic execution trace (AET)* is the table of $T \times \mathsf{w}$ field elements where every row describes the state of the system at the given point in time, and every column tracks the value of the given register.
A *state transition function*
$$f : \mathbb{F}_p^\mathsf{w} \rightarrow \mathbb{F}_p^\mathsf{w} $$
determines the state at the next cycle as a function of the state at the previous cycle. Furthermore, a list of boundary conditions
$$ \mathcal{B} \subseteq \mathbb{Z}_T \times \mathbb{Z}_\mathsf{w} \times \mathbb{F}_p $$
enforces the correct values of some or all registers at the first cycle, the last cycle, or even at arbitrary cycles.
The *computational integrity claim* consists of the state transition function and the boundary conditions. The *witness* to this claim is the algebraic execution trace. The claim is *true* if there is a witness $W \in \mathbb{F}_p^{T \times \mathsf{w}}$ such that:
- For every cycle, the state evolves correctly: $\forall i \in \lbrace 0, \ldots, T-2 \rbrace \, . \, f(W_{[i,:]}) = W_{[i+1,:]}$.
- All boundary conditions are satisfied: $\forall (i, w, e) \in \mathcal{B} \, . \, W_{[i,w]} = e$.
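To make the two conditions above concrete, here is a minimal sketch, separate from the tutorial's codebase, that checks a toy claim directly. The prime `p`, the transition function `f`, and the boundary list are hypothetical stand-ins chosen only for illustration.
```python
# a toy computational integrity check; plain integers modulo a small prime
# stand in for field elements
p = 17

def f(state):
    # hypothetical state transition function on two registers: (a, b) -> (b, a + b) mod p
    a, b = state
    return [b, (a + b) % p]

T = 5                                  # number of cycles
boundary = [(0, 0, 1), (0, 1, 1)]      # (cycle, register, value) triples

# witness: the algebraic execution trace, one row per cycle
W = [[1, 1]]
for i in range(T - 1):
    W += [f(W[-1])]

# transition constraints: every consecutive pair of rows is linked by f
assert all(f(W[i]) == W[i+1] for i in range(T - 1))
# boundary constraints: prescribed cells hold the prescribed values
assert all(W[i][w] == e for (i, w, e) in boundary)
```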
The state transition function hides a lot of complexity. For the purpose of STARKs, it needs to be describable as low degree polynomials that are *independent of the cycle*. However, this list of polynomials does not need to compute the next state from the current one; it merely needs to distinguish correct evolutions from incorrect ones. Specifically, the function
$$ f : \mathbb{F}_p^\mathsf{w} \rightarrow \mathbb{F}_p^\mathsf{w} $$
is represented by a list of polynomials $\mathbf{p}(X_0, \ldots, X_{\mathsf{w}-1}, Y_{0}, \ldots, Y_{ \mathsf{w}-1})$ such that $f(\mathbf{x}) = \mathbf{y}$ if and only if $\mathbf{p}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$. Say there are $r$ such state transition verification polynomials. Then the transition constraints become:
- $\forall i \in \lbrace 0, \ldots, T - 2 \rbrace \, . \, \forall j \in \lbrace 0, \ldots, r-1\rbrace \, . \, p_j(W_{[i,0]}, \ldots, W_{[i, \mathsf{w}-1]}, W_{[i+1,0]}, \ldots, W_{[i+1, \mathsf{w}-1]}) = 0$.
This representation admits *non-determinism*, which has the capacity to replace high degree state transition *computation* polynomials with low degree state transition *verification* polynomials. For example: the state transition function $f : \mathbb{F}_p \rightarrow \mathbb{F}_p$ given by
$$
x \mapsto \begin{cases}
x^{-1} & \text{if } x \neq 0, \\
0 & \text{if } x = 0
\end{cases}
$$
can be represented as a computation polynomial $f(x) = x^{p-2}$ or as a pair of verification polynomials $\mathbf{p}(x,y) = (x(xy-1), y(xy-1))$. The degree drops from $p-2$ to 3.
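The degree drop can be checked directly. The following snippet, which is separate from the tutorial's codebase and uses plain integers modulo a small prime as stand-ins for field elements, confirms that the verification polynomials vanish exactly on the correct input-output pairs.
```python
# sanity check of the verification polynomials for the inversion map
p = 17

def f(x):
    # computation polynomial: x -> x^(p-2), i.e. x^{-1} for x != 0, and 0 for x = 0
    return pow(x, p - 2, p)

def verification(x, y):
    # verification polynomials p(x, y) = (x(xy - 1), y(xy - 1))
    return ((x * (x*y - 1)) % p, (y * (x*y - 1)) % p)

for x in range(p):
    for y in range(p):
        vanishes = verification(x, y) == (0, 0)
        assert vanishes == (y == f(x))
```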
Not all lists of $\mathsf{w}$ field elements represent valid states. For instance, some registers may be constrained to bits and thus take only values from $\lbrace 0, 1\rbrace$. The state transition function is what guarantees that the next state is well-formed if the current state is. When translating to verification polynomials, these *consistency constraints* are polynomials in the first half of variables only ($X_0, \ldots, X_{\mathsf{w}-1}$) because they apply to every single row in the AET, as opposed to every consecutive pair of rows. For the sake of simplicity, this tutorial will ignore consistency constraints and pretend as though every $\mathsf{w}$-tuple of field elements represents a valid state.
## Interpolation
The arithmetic constraint system described above already represents the computational integrity claim as a bunch of polynomials; each such polynomial corresponds to a constraint. Transforming this constraint system into an IOP requires extending this representation in terms of polynomials to the witness and extending the notion of *valid* witnesses to *witness polynomials*. Specifically, we need to represent the conditions for true computational integrity claims in terms of identities of polynomials.
Let $D$ be a list of points referred to from here on out as the *trace evaluation domain*. Typically, $D$ is set to the span of a generator $\omicron$ of a subgroup of order $2^k \geq T+1$. So for the time being, set $D = \lbrace \omicron^i \vert i \in \mathbb{Z}\rbrace$. The Greek letter $\omicron$ ("omicron") indicates that the trace evaluation domain is smaller than the FRI evaluation domain by a factor exactly equal to the expansion factor[^1].
Let $\boldsymbol{t}(X) \in (\mathbb{F}_p[X])^\mathsf{w}$ be a list of $\mathsf{w}$ univariate polynomials that interpolate through $W$ on $D$. Specifically, the *trace polynomial* $t_w(X)$ for register $w$ is the univariate polynomial of lowest degree such that $\forall i \in \lbrace 0, \ldots, T\rbrace \, . \, t_w(\omicron^i) = W[i, w]$. The trace polynomials are a representation of the algebraic execution trace in terms of univariate polynomials.
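In code, computing the trace polynomials could look like the following sketch, assuming the `Polynomial.interpolate_domain` function from part 2 of this tutorial; the trace `W`, the generator `omicron`, and the register count are placeholders supplied by the caller.
```python
# a minimal sketch of computing the trace polynomials
def trace_polynomials(W, omicron, num_registers):
    domain = [omicron^i for i in range(len(W))]
    polys = []
    for w in range(num_registers):
        column = [W[i][w] for i in range(len(W))]
        # t_w is the lowest-degree polynomial with t_w(omicron^i) = W[i][w]
        polys += [Polynomial.interpolate_domain(domain, column)]
    return polys
```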
Translating the conditions for true computational integrity claims to the trace polynomials, one gets both:
- All boundary constraints are satisfied: $\forall (i, w, e) \in \mathcal{B} \, . \, t_w(\omicron^i) = e$.
- For all cycles, all transition constraints are satisfied: $\forall i \in \lbrace 0, \ldots, T-2 \rbrace \, . \, \forall j \in \lbrace 0, \ldots, r-1 \rbrace \, . \, p_j( t_0(\omicron^i), \ldots, t_{\mathsf{w}-1}(\omicron^i), t_0(\omicron^{i+1}), \ldots, t_{\mathsf{w}-1}(\omicron^{i+1})) = 0$.
The last expression looks complicated. However, observe that the left hand side of the equation corresponds to the univariate polynomial $p_j(t_0(X), \ldots, t_{\mathsf{w}-1}(X), t_0(\omicron \cdot X), \ldots, t_{\mathsf{w}-1}(\omicron \cdot X))$. The entire expression simply says that all $r$ of these *transition polynomials* evaluate to 0 in $\lbrace \omicron^i \vert i \in \lbrace 0, \ldots, T-2 \rbrace \rbrace$.
This observation gives rise to the following high-level protocol:
1. The prover commits to the trace polynomials $\boldsymbol{t}(X)$.
2. The verifier checks that $t_w(X)$ evaluates to $e$ in $\omicron^i$ for all $(i, w, e) \in \mathcal{B}$.
3. The prover commits to the transition polynomials $\mathbf{c}(X) = \mathbf{p}(t_0(X), \ldots, t_{\mathsf{w}-1}(X), t_0(\omicron \cdot X), \ldots, t_{\mathsf{w}-1}(\omicron \cdot X))$.
4. The verifier checks that $\mathbf{c}(X)$ and $\boldsymbol{t}(X)$ are correctly related by:
- Choosing a random point $z$ drawn uniformly from the field excluding the element 0.
- Querying the values of $\boldsymbol{t}(X)$ in $z$ and $\omicron \cdot z$.
- Evaluating the transition verification polynomials $\mathbf{p}(X_0, \ldots, X_{\mathsf{w}-1}, Y_0, \ldots, Y_{\mathsf{w}-1})$ in $(X_0, \ldots, X_{\mathsf{w}-1}, Y_0, \ldots, Y_{\mathsf{w}-1}) = (t_0(z), \ldots, t_{\mathsf{w}-1}(z), t_0(\omicron \cdot z), \ldots, t_{\mathsf{w}-1}(\omicron \cdot z))$.
- Querying the values of $\mathbf{c}(X)$ in $z$.
- Checking that the values obtained in the previous two steps match.
5. The verifier checks that the transition polynomials $\mathbf{c}(X)$ evaluate to zero in $\lbrace \omicron^i \vert i \in \lbrace 0, \ldots, T-2 \rbrace \rbrace$.
In fact, the commitment of the transition polynomials can be omitted. Instead, the verifier uses the evaluation of $\boldsymbol{t}(X)$ in $z$ and $\omicron \cdot z$ to compute the value of $\mathbf{c}(X)$ in the one point needed to verify that $\mathbf{c}(X)$ evaluates to 0 in $\lbrace \omicron^i \vert i \in \lbrace 0, \ldots, T-2 \rbrace \rbrace$.
There is another layer of redundancy, but it is only apparent after the evaluation checks are unrolled. The FRI compiler simulates an evaluation check by a) subtracting the y-coordinate, b) dividing out the zerofier, which is the minimal polynomial that vanishes at the x-coordinate, and c) proving that the resulting quotient has a bounded degree. This procedure happens twice for the STARK polynomials: first, applied to the trace polynomials to show satisfaction of the boundary constraints; and second, applied to the transition polynomials to show that the transition constraints are satisfied. We call the resulting lists of quotient polynomials the *boundary quotients* and the *transition quotients* respectively.
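For a single point, this evaluation check could be implemented roughly as in the following sketch, which assumes the `Polynomial` and `FieldElement` classes from part 2 (in particular negation and clean division); `t` is the committed polynomial and `(x0, y0)` the claimed evaluation.
```python
# a minimal sketch of the evaluation check simulated by the FRI compiler
def evaluation_quotient(t, x0, y0):
    numerator = t - Polynomial([y0])                # a) subtract the y-coordinate
    zerofier = Polynomial([-x0, x0.field.one()])    # b) minimal polynomial vanishing at x0: X - x0
    # c) the division is clean iff t(x0) == y0; FRI then proves the quotient's bounded degree
    return numerator / zerofier
```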
The redundancy comes from the fact that the trace polynomials relate to both quotients. It can therefore be eliminated by merging the equations they are involved in. The next diagram illustrates this elimination in the context of the STARK IOP workflow. The green box indicates that the polynomials are committed to through the familiar evaluation and Merkle root procedure and are provided as input to FRI.

At the top of this diagram in red are the objects associated with the arithmetic constraint system, with the constraints written in small caps font to indicate that they are known to the verifier. The prover interpolates the execution trace to obtain the trace polynomials, but it is not necessary to commit to these polynomials. Instead, the prover interpolates the boundary points and subtracts the resulting interpolants from the trace polynomials. This procedure produces the *boundary polynomials*, for lack of a better name. To obtain the *boundary quotients* from the boundary polynomials, the prover divides out the zerofier. Note that the boundary quotients and trace polynomials are equivalent in the following sense: if the verifier knows a value in a given point of one, he can compute the matching value of the other using only public information.
To obtain the transition polynomials, the prover evaluates the transition constraints (recall, these are given as multivariate polynomials) symbolically in the trace polynomials. To get the transition quotients from the transition polynomials, divide out the zerofier. Assume for the time being that the verifier is capable of evaluating this zerofier efficiently. Note that the transition quotients and the trace polynomials are not equivalent -- the verifier cannot necessarily undo the symbolic evaluation. However, this non-equivalence does not matter. What the verifier needs to verify is that the boundary quotients and the transition quotients are linked. Traveling from the boundary quotients to the transition quotients, and performing the indicated arithmetic along the way, establishes this link. The remaining part of the entire computational integrity claim is the bounded degree of the quotient polynomials, and this is exactly what FRI already solves.
The use of the plural on the right hand side is slightly misleading. After the boundary quotients have been committed to by sending their Merkle roots to the verifier, the prover obtains from the verifier random weights with which to compress the transition constraints to a single linear combination. As a result of this compression, there is one transition constraint, one transition polynomial, and one transition quotient. Nevertheless, this compression may be omitted without affecting security; it merely requires more work on the part of both the prover and the verifier.
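The compression itself is not implemented in this tutorial (it is left as an exercise later on), but a minimal sketch, assuming the `MPolynomial` class from part 2 and a list of verifier-supplied weights, could look as follows.
```python
# a minimal sketch of compressing r transition constraints into one master constraint
def compress_transition_constraints(transition_constraints, weights):
    # master constraint = sum_j weights[j] * p_j
    field = weights[0].field
    master = MPolynomial.constant(field.zero())
    for weight, constraint in zip(weights, transition_constraints):
        master = master + MPolynomial.constant(weight) * constraint
    return master
```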
To summarize, this workflow generates two recipes: one for the prover and one for the verifier. They are presented here in abstract terms and in interactive form.
Prover:
- Interpolate the execution trace to obtain the trace polynomials.
- Interpolate the boundary points to obtain the boundary interpolants, and compute the boundary zerofiers along the way.
- Subtract the boundary interpolants from the trace polynomials, and divide out the boundary zerofier, giving rise to the boundary quotients.
- Commit to the boundary quotients.
- Get $r$ random coefficients from the verifier.
- Compress the $r$ transition constraints into one master constraint that is the weighted sum.
- Symbolically evaluate the master constraint in the trace polynomials, thus generating the transition polynomial.
- Divide out the transition zerofier to get the transition quotient.
- Commit to the transition quotient.
- Run FRI on all the committed polynomials: the boundary quotients, the transition quotients, and the transition zerofier.
- Supply the Merkle leafs and authentication paths that are requested by the verifier.
Verifier:
- Read the commitments to the boundary quotients.
- Supply the random coefficients for the master transition constraint.
- Read the commitment to the transition quotient.
- Run the FRI verifier.
- Verify the link between boundary quotients and transition quotient. To do this:
- For all points of the transition quotient codeword that were queried in the first round of FRI do:
- Let the point be $(x, y)$.
- Query the matching points on the boundary quotient codewords. Note that there are two of them, $x$ and $\omicron \cdot x$, indicating points "one cycle apart".
- Multiply the y-coordinates of these points by the zerofiers' values in $x$ and $\omicron \cdot x$.
- Add the boundary interpolants' values.
- Evaluate the master transition constraint in this point.
- Divide by the value of the transition zerofier in $x$.
- Verify that the resulting value equals $y$.
## Generalized AIR Constraints
The description so far makes a clear distinction between transition constraints on the one hand, and boundary constraints on the other hand. However, there is a unifying perspective that characterizes both as ratios of polynomials that divide cleanly. More precisely, the denominator divides the numerator cleanly if the computation is integral; otherwise, there is a nonzero remainder.
Such a generalized AIR constraint is given by two polynomials.
- The *numerator* determines which equations between elements of the algebraic execution trace hold, in a manner that is independent of the cycle. For transition constraints, the numerator is exactly the transition constraint polynomial. For boundary constraints, the numerator is simply $t_ i(X) - y$ where $y$ is supposedly the value of $t_ i(X)$ at the given boundary.
- The *denominator* is a zerofier that determines *where* the equality is supposed to hold, by vanishing (*i.e.*, evaluating to zero) in those points. For transition constraints, this zerofier vanishes on all points of the trace evaluation domain except the last. For the boundary constraints, this zerofier vanishes only on the boundary.
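In code, such a numerator/denominator pair could be tested as in the following sketch, which assumes that `Polynomial.divide` from part 2 returns a quotient and a remainder.
```python
# a minimal sketch of testing a generalized AIR constraint,
# given as a (numerator, denominator) pair of polynomials
def generalized_constraint_holds(numerator, denominator):
    quotient, remainder = Polynomial.divide(numerator, denominator)
    # the constraint holds iff the denominator divides the numerator cleanly
    return remainder.is_zero()
```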
Treating boundary constraints and transition constraints as subclasses of generalized AIR constraints leads to a simpler workflow diagram. Now the prover commits to the raw trace polynomials, but these polynomials are not input to the FRI subprotocol. Instead, they are used only to verify that the leafs of the first Merkle tree of FRI were computed correctly.

The implementation of this tutorial follows the earlier workflow, which separates boundary constraints from transition constraints, rather than the workflow informed by a generalized notion of AIR constraints.
## Adding Zero-Knowledge
Formally, an interactive proof system is *zero-knowledge* if the distribution of transcripts arising from authentic executions of the protocol is independent of the witness and can be sampled efficiently with public information only. In practice, this means that the prover randomizes the data structures and proof arithmetic using randomness that also remains secret. The transcript is independent of the witness because *any* transcript can be explained by the right choice of randomizers.
With respect to randomizing the STARK proof system, it is worth separating the mechanism into two parts and randomizing them separately.
1. The FRI bounded degree proof. This component is randomized by adding a randomizer codeword to the nonlinear combination. This randomizer codeword corresponds to a polynomial of maximal degree whose coefficients are drawn uniformly at random.
2. The linking part that establishes that the boundary quotients are linked to the transition quotient(s). To randomize this, the execution trace for every register is extended with $4s$ uniformly random field elements. The number $4s$ comes from the number $s$ of colinearity checks in the FRI protocol: every colinearity check induces two queries in the initial codeword. The two values of the transition quotient codeword need to be linked to four values of the boundary quotient codewords.
It is important to guarantee that none of the x-coordinates that are queried as part of FRI correspond to x-coordinates used for interpolating the execution trace. This is one of the reasons why coset-FRI comes in handy. Nevertheless, other solutions can address this problem.
Lastly, if the field is not large enough (specifically, if its cardinality is significantly less than $2^\lambda$ for security level $\lambda$), then salts need to be appended to the leafs when building the Merkle tree. Specifically, every leaf needs $\lambda$ bits of randomness, and if it does not come from the field element then it must come from an explicit appendix.
Without leaf salts, the Merkle tree and its paths are deterministic for a given codeword. This codeword is still somewhat random, because the polynomial that generates it has randomizers. However, every leaf has at most $\log_2 \vert \mathbb{F}_ p \vert$ bits of entropy, and when this number is smaller than $\lambda$, the attacker is likely to find duplicate hash digests. In other words, he can notice, with less than $2^\lambda$ work, that the same value is being input to the hash function. This observation leads to a distinguisher between authentic and simulated transcripts, which in turn undermines zero-knowledge.
The code presented here omits leaf salts because the field is large enough.
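If the field were too small, salting the leafs could look like the following hypothetical sketch; it is not part of the tutorial's code, and `bytes(leaf)` assumes the leafs can be serialized as in earlier parts.
```python
from hashlib import blake2b
import os

# hypothetical sketch: give every Merkle leaf security_level bits of randomness
def salted_leafs(leafs, security_level=128):
    salts = [os.urandom(security_level // 8) for _ in leafs]
    # the salted digests are what the Merkle tree is built over;
    # the salts must be revealed alongside the opened leafs
    salted = [blake2b(bytes(leaf) + salt).digest() for leaf, salt in zip(leafs, salts)]
    return salted, salts
```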
## Implementation
Like the FRI module, the STARK module starts with an initializer function that sets the class's fields to the initialization arguments or values inferred from them.
```python
from functools import reduce
import os
class Stark:
    def __init__( self, field, expansion_factor, num_colinearity_checks, security_level, num_registers, num_cycles, transition_constraints_degree=2 ):
        assert(len(bin(field.p)) - 2 >= security_level), "p must have at least as many bits as security level"
        assert(expansion_factor & (expansion_factor - 1) == 0), "expansion factor must be a power of 2"
        assert(expansion_factor >= 4), "expansion factor must be 4 or greater"
        assert(num_colinearity_checks * 2 >= security_level), "number of colinearity checks must be at least half of security level"

        self.field = field
        self.expansion_factor = expansion_factor
        self.num_colinearity_checks = num_colinearity_checks
        self.security_level = security_level

        self.num_randomizers = 4*num_colinearity_checks
        self.num_registers = num_registers
        self.original_trace_length = num_cycles

        randomized_trace_length = self.original_trace_length + self.num_randomizers
        omicron_domain_length = 1 << len(bin(randomized_trace_length * transition_constraints_degree)[2:])
        fri_domain_length = omicron_domain_length * expansion_factor

        self.generator = self.field.generator()
        self.omega = self.field.primitive_nth_root(fri_domain_length)
        self.omicron = self.field.primitive_nth_root(omicron_domain_length)
        self.omicron_domain = [self.omicron^i for i in range(omicron_domain_length)]

        self.fri = Fri(self.generator, self.omega, fri_domain_length, self.expansion_factor, self.num_colinearity_checks)
```
The code makes a distinction between the *original trace length*, which is one greater than the number of cycles, and the *randomized trace length*, which extends the previous trace length with $4s$ additional randomizers. A third related variable is the `omicron_domain`, which is the list of points in the subgroup of order $2^k$, where $k$ is the smallest integer such that $2^k$ is larger than the randomized trace length multiplied by the degree of the transition constraints.
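As a worked example of this arithmetic, the following snippet uses hypothetical parameter values chosen only for illustration.
```python
# hypothetical numbers, chosen only to illustrate the initializer's arithmetic
original_trace_length = 23                  # rows in the original execution trace
num_randomizers = 4 * 26                    # 4s randomizers for s = 26 colinearity checks
transition_constraints_degree = 2
expansion_factor = 4

randomized_trace_length = original_trace_length + num_randomizers      # 127
omicron_domain_length = 1 << len(bin(randomized_trace_length * transition_constraints_degree)[2:])
fri_domain_length = omicron_domain_length * expansion_factor

assert randomized_trace_length == 127
assert omicron_domain_length == 256   # the power of two just above 127 * 2 = 254
assert fri_domain_length == 1024
```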
Next up are the helper functions. First are the degree bounds calculators for a) transition polynomials; b) transition quotient polynomials; and c) the nonlinear random combination of polynomials that goes into FRI. This last number is one less than the next power of two.
```python
    def transition_degree_bounds( self, transition_constraints ):
        point_degrees = [1] + [self.original_trace_length+self.num_randomizers-1] * 2*self.num_registers
        return [max( sum(r*l for r, l in zip(point_degrees, k)) for k, v in a.dictionary.items()) for a in transition_constraints]

    def transition_quotient_degree_bounds( self, transition_constraints ):
        return [d - (self.original_trace_length-1) for d in self.transition_degree_bounds(transition_constraints)]

    def max_degree( self, transition_constraints ):
        md = max(self.transition_quotient_degree_bounds(transition_constraints))
        return (1 << (len(bin(md)[2:]))) - 1
```
Note that this code is not compressing the many transition constraints into one. As a result, there are many transition polynomials and many transition quotients.
Up next are zerofier polynomials, which come in two categories: boundary zerofiers and transition zerofiers.
```python
    def transition_zerofier( self ):
        # vanishes on all points of the trace evaluation domain except the last
        domain = self.omicron_domain[0:(self.original_trace_length-1)]
        return Polynomial.zerofier_domain(domain)

    def boundary_zerofiers( self, boundary ):
        # one zerofier per register, vanishing on that register's boundary points
        zerofiers = []
        for s in range(self.num_registers):
            points = [self.omicron^c for c, r, v in boundary if r == s]
            zerofiers = zerofiers + [Polynomial.zerofier_domain(points)]
        return zerofiers
```
The next function computes polynomials that interpolate through the (location,value)-pairs of the boundary conditions. This function also enables a boundary counterpart to the transition quotient degree bounds calculator.
```python
    def boundary_interpolants( self, boundary ):
        interpolants = []
        for s in range(self.num_registers):
            points = [(c,v) for c, r, v in boundary if r == s]
            domain = [self.omicron^c for c,v in points]
            values = [v for c,v in points]
            interpolants = interpolants + [Polynomial.interpolate_domain(domain, values)]
        return interpolants

    def boundary_quotient_degree_bounds( self, randomized_trace_length, boundary ):
        randomized_trace_degree = randomized_trace_length - 1
        return [randomized_trace_degree - bz.degree() for bz in self.boundary_zerofiers(boundary)]
```
The last helper function is used by the prover and verifier when they want to transform a seed, obtained from the Fiat-Shamir transform, into a list of field elements. The resulting field elements are used as weights in the nonlinear combination of polynomials before starting FRI.
```python
    def sample_weights( self, number, randomness ):
        return [self.field.sample(blake2b(randomness + bytes(i)).digest()) for i in range(0, number)]
```
### Prove
Next up is the prover. The big difference with respect to the explanation above is that there is no compression of transition constraints into one master constraint. This task is left as an exercise to the reader.
Another difference is that the transition constraints have $2\mathsf{w}+1$ variables rather than $2\mathsf{w}$. The extra variable takes the value of the evaluation domain over which the execution trace is interpolated. This feature anticipates constraints that depend on the cycle, for instance to evaluate a hash function that uses round constants that are different in each round.
```python
    def prove( self, trace, transition_constraints, boundary, proof_stream=None ):
        # create proof stream object if necessary
        if proof_stream == None:
            proof_stream = ProofStream()

        # concatenate randomizers
        for k in range(self.num_randomizers):
            trace = trace + [[self.field.sample(os.urandom(17)) for s in range(self.num_registers)]]

        # interpolate
        trace_domain = [self.omicron^i for i in range(len(trace))]
        trace_polynomials = []
        for s in range(self.num_registers):
            single_trace = [trace[c][s] for c in range(len(trace))]
            trace_polynomials = trace_polynomials + [Polynomial.interpolate_domain(trace_domain, single_trace)]

        # subtract boundary interpolants and divide out boundary zerofiers
        boundary_quotients = []
        for s in range(self.num_registers):
            interpolant = self.boundary_interpolants(boundary)[s]
            zerofier = self.boundary_zerofiers(boundary)[s]
            quotient = (trace_polynomials[s] - interpolant) / zerofier
            boundary_quotients += [quotient]

        # commit to boundary quotients
        fri_domain = self.fri.eval_domain()
        boundary_quotient_codewords = []
        boundary_quotient_Merkle_roots = []
        for s in range(self.num_registers):
            boundary_quotient_codewords = boundary_quotient_codewords + [boundary_quotients[s].evaluate_domain(fri_domain)]
            merkle_root = Merkle.commit(boundary_quotient_codewords[s])
            proof_stream.push(merkle_root)

        # symbolically evaluate transition constraints
        point = [Polynomial([self.field.zero(), self.field.one()])] + trace_polynomials + [tp.scale(self.omicron) for tp in trace_polynomials]
        transition_polynomials = [a.evaluate_symbolic(point) for a in transition_constraints]

        # divide out zerofier
        transition_quotients = [tp / self.transition_zerofier() for tp in transition_polynomials]

        # commit to randomizer polynomial
        randomizer_polynomial = Polynomial([self.field.sample(os.urandom(17)) for i in range(self.max_degree(transition_constraints)+1)])
        randomizer_codeword = randomizer_polynomial.evaluate_domain(fri_domain)
        randomizer_root = Merkle.commit(randomizer_codeword)
        proof_stream.push(randomizer_root)

        # get weights for nonlinear combination
        # - 1 randomizer
        # - 2 for every transition quotient
        # - 2 for every boundary quotient
        weights = self.sample_weights(1 + 2*len(transition_quotients) + 2*len(boundary_quotients), proof_stream.prover_fiat_shamir())

        assert([tq.degree() for tq in transition_quotients] == self.transition_quotient_degree_bounds(transition_constraints)), "transition quotient degrees do not match with expectation"

        # compute terms of nonlinear combination polynomial
        x = Polynomial([self.field.zero(), self.field.one()])
        terms = []
        terms += [randomizer_polynomial]
        for i in range(len(transition_quotients)):
            terms += [transition_quotients[i]]
            shift = self.max_degree(transition_constraints) - self.transition_quotient_degree_bounds(transition_constraints)[i]
            terms += [(x^shift) * transition_quotients[i]]
        for i in range(self.num_registers):
            terms += [boundary_quotients[i]]
            shift = self.max_degree(transition_constraints) - self.boundary_quotient_degree_bounds(len(trace), boundary)[i]
            terms += [(x^shift) * boundary_quotients[i]]

        # take weighted sum
        # combination = sum(weights[i] * terms[i] for all i)
        combination = reduce(lambda a, b : a+b, [Polynomial([weights[i]]) * terms[i] for i in range(len(terms))], Polynomial([]))

        # compute matching codeword
        combined_codeword = combination.evaluate_domain(fri_domain)

        # prove low degree of combination polynomial
        indices = self.fri.prove(combined_codeword, proof_stream)
        indices.sort()
        duplicated_indices = [i for i in indices] + [(i + self.expansion_factor) % self.fri.domain_length for i in indices]

        # open indicated positions in the boundary quotient codewords
        for bqc in boundary_quotient_codewords:
            for i in duplicated_indices:
                proof_stream.push(bqc[i])
                path = Merkle.open(i, bqc)
                proof_stream.push(path)

        # ... as well as in the randomizer
        for i in indices:
            proof_stream.push(randomizer_codeword[i])
            path = Merkle.open(i, randomizer_codeword)
            proof_stream.push(path)

        # the final proof is just the serialized stream
        return proof_stream.serialize()
```
### Verify
Last is the verifier. It comes with the same caveat and exercise.
```python
    def verify( self, proof, transition_constraints, boundary, proof_stream=None ):
        H = blake2b

        # infer trace length from boundary conditions
        original_trace_length = 1 + max(c for c, r, v in boundary)
        randomized_trace_length = original_trace_length + self.num_randomizers

        # deserialize with right proof stream
        if proof_stream == None:
            proof_stream = ProofStream()
        proof_stream = proof_stream.deserialize(proof)

        # get Merkle roots of boundary quotient codewords
        boundary_quotient_roots = []
        for s in range(self.num_registers):
            boundary_quotient_roots = boundary_quotient_roots + [proof_stream.pull()]

        # get Merkle root of randomizer polynomial
        randomizer_root = proof_stream.pull()

        # get weights for nonlinear combination
        weights = self.sample_weights(1 + 2*len(transition_constraints) + 2*len(self.boundary_interpolants(boundary)), proof_stream.verifier_fiat_shamir())

        # verify low degree of combination polynomial
        polynomial_values = []
        verifier_accepts = self.fri.verify(proof_stream, polynomial_values)
        polynomial_values.sort(key=lambda iv : iv[0])
        if not verifier_accepts:
            return False

        indices = [i for i,v in polynomial_values]
        values = [v for i,v in polynomial_values]

        # read and verify leafs, which are elements of boundary quotient codewords
        duplicated_indices = [i for i in indices] + [(i + self.expansion_factor) % self.fri.domain_length for i in indices]
        leafs = []
        for r in range(len(boundary_quotient_roots)):
            leafs = leafs + [dict()]
            for i in duplicated_indices:
                leafs[r][i] = proof_stream.pull()
                path = proof_stream.pull()
                verifier_accepts = verifier_accepts and Merkle.verify(boundary_quotient_roots[r], i, path, leafs[r][i])
                if not verifier_accepts:
                    return False

        # read and verify randomizer leafs
        randomizer = dict()
        for i in indices:
            randomizer[i] = proof_stream.pull()
            path = proof_stream.pull()
            verifier_accepts = verifier_accepts and Merkle.verify(randomizer_root, i, path, randomizer[i])

        # verify leafs of combination polynomial
        for i in range(len(indices)):
            current_index = indices[i] # do need i

            # get trace values by applying a correction to the boundary quotient values (which are the leafs)
            domain_current_index = self.generator * (self.omega^current_index)
            next_index = (current_index + self.expansion_factor) % self.fri.domain_length
            domain_next_index = self.generator * (self.omega^next_index)
            current_trace = [self.field.zero() for s in range(self.num_registers)]
            next_trace = [self.field.zero() for s in range(self.num_registers)]
            for s in range(self.num_registers):
                zerofier = self.boundary_zerofiers(boundary)[s]
                interpolant = self.boundary_interpolants(boundary)[s]

                current_trace[s] = leafs[s][current_index] * zerofier.evaluate(domain_current_index) + interpolant.evaluate(domain_current_index)
                next_trace[s] = leafs[s][next_index] * zerofier.evaluate(domain_next_index) + interpolant.evaluate(domain_next_index)

            point = [domain_current_index] + current_trace + next_trace
            transition_constraints_values = [transition_constraints[s].evaluate(point) for s in range(len(transition_constraints))]

            # compute nonlinear combination
            counter = 0
            terms = []
            terms += [randomizer[current_index]]
            for s in range(len(transition_constraints_values)):
                tcv = transition_constraints_values[s]
                quotient = tcv / self.transition_zerofier().evaluate(domain_current_index)
                terms += [quotient]
                shift = self.max_degree(transition_constraints) - self.transition_quotient_degree_bounds(transition_constraints)[s]
                terms += [quotient * (domain_current_index^shift)]
            for s in range(self.num_registers):
                bqv = leafs[s][current_index] # boundary quotient value
                terms += [bqv]
                shift = self.max_degree(transition_constraints) - self.boundary_quotient_degree_bounds(randomized_trace_length, boundary)[s]
                terms += [bqv * (domain_current_index^shift)]
            combination = reduce(lambda a, b : a+b, [terms[j] * weights[j] for j in range(len(terms))], self.field.zero())

            # verify against combination polynomial value
            verifier_accepts = verifier_accepts and (combination == values[i])
            if not verifier_accepts:
                return False

        return verifier_accepts
```
[0](index) - [1](overview) - [2](basic-tools) - [3](fri) - **4** - [5](rescue-prime) - [6](faster)
[^1]: It is worth ensuring that the trace evaluation domain is disjoint from the FRI evaluation domain, for example by using the coset-trick. However, if overlapping subgroups are used for both domains, then $\omega^{1 / \rho} = \omicron$ and $\omega$ generates the larger domain whereas $\omicron$ generates the smaller one.