01-interference.Rmd

# (PART) Foundations {-} 

# Quantum interference: an overview {#chapter1}

> About complex numbers, called **probability amplitudes**, that, unlike probabilities, can cancel each other out, leading to **quantum interference**, and consequently qualitatively new ways of processing information.

The classical theory of computation does not usually refer to physics.
Pioneers such as Alan Turing, Alonzo Church, Emil Post, and Kurt Gödel managed to capture the correct classical theory by intuition alone and, as a result, it is often falsely assumed that its foundations are self-evident and purely abstract.
They are not!^[Computation is a physical process. Computation is a physical process. Computation is ...]

The concepts of information and computation can be properly formulated only in the context of a physical theory --- information is stored, transmitted and processed always by _physical_ means.
Computers are physical objects and computation is a physical process.
Indeed, any computation, classical or quantum, can be viewed in terms of physical experiments, which produce **outputs** that depend on initial preparations called **inputs**.
Once we abandon the classical view of computation as a purely logical notion independent of the laws of physics it becomes clear that whenever we improve our knowledge about physical reality, we may also gain new means of computation.
Thus, from this perspective, it is not very surprising that the discovery of quantum mechanics in particular has changed our understanding of the nature of computation.
In order to explain what makes quantum computers so different from their classical counterparts, we begin with the rudiments of quantum theory.

Some of what we say in this chapter will be repeated in later chapters, but usually in much more detail.
Feel free to think of this chapter as a sort of "aeroplane tour" of the rudiments, knowing that we will soon land on the ground to go out exploring by foot.


## Two basic rules

Quantum theory, at least at some instrumental level, can be viewed as a modification of probability theory.
We replace positive numbers (probabilities) with complex numbers $z$ (called **probability amplitudes**) such that the squares of their absolute values, $|z|^2$, are interpreted as probabilities.

::: {.idea data-latex=""}
The correspondence between probability amplitudes $z$ and probabilities $|z|^2$ is known as **Born's Rule**.
:::

The rules for combining amplitudes are very reminiscent of the rules for combining probabilities:

1. Whenever something can happen in a sequence of independent steps, we multiply the amplitudes of each step.

```{r,engine='tikz',fig.width=3}
\definecolor{primary}{RGB}{177,98,78}
\tikzset{blob/.style args={#1}{circle,draw,fill={#1}}}
\tikzset{process/.style={thick,->,shorten >=0.1cm,shorten <=0.1cm}}
\begin{tikzpicture}
  \node [blob=primary] (initial) at (0,0) {};
  \node [blob=primary] (middle) at (2,-0.3) {};
  \node [blob=primary] (final) at (4,-0.6) {};
  \draw [process] (initial) to [out=40,in=120] node[pos=0.5,fill=white] {$z_1$} (middle);
  \draw [process] (middle) to [out=40,in=120] node[pos=0.5,fill=white] {$z_2$} (final);
  \node at (2,-1) {$z = z_1 z_2$};
\end{tikzpicture}
```

2. Whenever something can happen in several alternative ways, we add the amplitudes for each separate way.

```{r,engine='tikz',fig.width=2.25}
\definecolor{primary}{RGB}{177,98,78}
\tikzset{blob/.style args={#1}{circle,draw,fill={#1}}}
\tikzset{process/.style={thick,->,shorten >=0.1cm,shorten <=0.1cm}}
\begin{tikzpicture}
  \node (initial) at (0,0) [blob=primary] {};
  \node (final) at (3,-0.5) [blob=primary] {};
  \draw [process] (initial) to [out=40,in=120] node[pos=0.5,fill=white] {$z_1$} (final);
  \draw [process] (initial) to [out=-60,in=-140] node[pos=0.5,fill=white] {$z_2$} (final);
  \node at (1.5,-1.5) {$z = z_1 + z_2$};
\end{tikzpicture}
```

That's it!
These two rules are basically all you need to manipulate amplitudes in any physical process, no matter how complicated.^[We will, however, amend the two rules later on when we touch upon particle statistics.]
They are universal and apply to any physical system, from elementary particles through atoms and molecules to white dwarfs stars.
They also apply to information, since, as we have already emphasised, information is physical.
The two rules look deceptively simple but, as you will see in a moment, their consequences are anything but trivial.


## Quantum interference: the failure of probability theory

Modern mathematical probability theory is based on three axioms, proposed by Andrey Nikolaevich Kolmogorov (1903--1987) in his monograph with the impressive German title _Grundbegriffe der Wahrscheinlichkeitsrechnung_ ("Foundations of Probability Theory").
The **Kolmogorov axioms** are simple and intuitive:^[I always found it an interesting coincidence that the two basic ingredients of modern quantum theory, namely probability and complex numbers, were discovered by the same person, an extraordinary man of many talents: a gambling scholar by the name of Girolamo Cardano (1501--1576).]

1. Once you identify all elementary outcomes, or events, you may then assign probabilities to them.
2. Probability is a number between $0$ and $1$, and an event which is certain has probability $1$.
3. Last but not least, the probability of any event can be calculated using a deceptively simple rule --- the **additivity axiom**:
  _Whenever an event can occur in several mutually exclusive ways, the probability for the event is the sum of the probabilities for each way considered separately._

Obvious, isn't it?
So obvious, in fact, that probability theory was accepted as a mathematical framework theory, a language that can be used to describe actual physical phenomena.
Physics should be able to identify elementary events and assign numerical probabilities to them.
Once this is done we may revert to mathematical formalism of probability theory.
The Kolmogorov axioms will take care of the mathematical consistency and will guide us whenever there is a need to calculate probabilities of more complex events.
This is a very sensible approach, apart from the fact that it does not always work!
Today, we know that probability theory, as ubiquitous as it is, fails to describe many common quantum phenomena.
In order to see the need for quantum theory let us consider a simple experiment in which probability theory fails to give the right predictions.


### The double slit experiment

In a double slit experiment, a particle emitted from a source $S$ can reach the detector $D$ by taking two different paths, e.g. through an upper or a lower slit in a barrier between the source and the detector.
After sufficiently many repetitions of this experiment we can evaluate the frequency of clicks in the detector $D$ and show that it is inconsistent with the predictions based on probability theory.
Let us use the quantum approach to show how the discrepancy arises.

The particle emitted from a source $S$ can reach detector $D$ by taking two different paths, with amplitudes $z_1$ and $z_2$ respectively.
We may say that the upper slit is taken with probability $p_1=|z_1|^2$ and the lower slit with probability $p_2=|z_2|^2$.
These are two mutually exclusive events.
With the two slits open, probability theory declares (by the additivity axiom) that the particle should reach the detector with probability $p_1+p_2= |z_1|^2+|z_2|^2$.
But this is not what happens experimentally!

Following the "quantum rules", first we add the amplitudes and then we square the absolute value of the sum to get the probability.
Thus, the particle will reach the detector with probability
$$
  \begin{aligned}
    p &= |z|^2
  \\& = |z_1 + z_2|^2
  \\& = |z_1|^2 + |z_2|^2
        + z_1^\star z_2 + z_1 z_2^\star
  \\& = p_1 + p_2
        + |z_1||z_2|\left(
          e^{i(\varphi_2-\varphi_1)}
          + e^{-i(\varphi_2-\varphi_1)}
        \right)
  \\& = p_1 + p_2
        + 2 \sqrt{p_1 p_2} \cos(\varphi_2-\varphi_1)
  \\& = p_1 + p_2 + \text{interference terms}
  \end{aligned}
\tag{1.2.1.1}
$$
where we have expressed the amplitudes in their polar forms
$$
\begin{aligned}
  z_1 &= |z_1|e^{i\varphi_1}
\\z_2 &= |z_2|e^{i\varphi_2}.
\end{aligned}
$$
The appearance of the interference terms marks the departure from the classical theory of probability.
The probability of any two seemingly mutually exclusive events is the sum of the probabilities of the individual events, $p_1 + p_2$, _modified_ by the **interference term** $2 \sqrt{p_1p_2}\cos(\varphi_2-\varphi_1)$.
Depending on the **relative phase** $\varphi_2-\varphi_1$, the interference term can be either negative (which we call **destructive** interference) or positive (**constructive** interference), leading to either suppression or enhancement of the total probability $p$.

The algebra is simple; our focus is on the physical interpretation.
Firstly, note that the important quantity here is the _relative_ phase $\varphi_2-\varphi_1$ rather than the individual values $\varphi_1$ and $\varphi_2$.
This observation is not trivial at all: if a particle reacts only to the difference of the two phases, each pertaining to a separate path, then it must have, somehow, experienced the two paths, right?
Thus we cannot say that the particle has travelled _either_ through the upper or the lower slit, because it has travelled through _both_.
In the same way, quantum computers follow, in some tangible way, _all_ computational paths simultaneously, producing answers that depend on _all_ these alternative calculations.
Weird, but this is how it is!

Secondly, what has happened to the additivity axiom in probability theory?
What was wrong with it?
One problem is the assumption that the processes of taking the upper or the lower slit are mutually exclusive; in reality, as we have just mentioned, the two transitions _both occur_, simultaneously.
However, we cannot learn this from probability theory, nor from any other _a priori_ mathematical construct.^[According to the philosopher Karl Popper (1902--1994) a theory is genuinely scientific only if it is possible, in principle, to establish that it is false. Genuinely scientific theories are never finally confirmed because no matter how many confirming observations have been made observations that are inconsistent with the empirical predictions of the theory are always possible.]

::: {.idea data-latex=""}
There is no fundamental reason why Nature should conform to the additivity axiom.
:::

We find out how nature works by making intelligent guesses, running experiments, checking what happens and formulating physical theories.
If our guess disagrees with experiments then it is wrong, so we try another intelligent guess, and another, etc.
Right now, quantum theory is the best guess we have: it offers good explanations and predictions that have not been falsified by any of the existing experiments.
This said, rest assured that one day quantum theory _will_ be falsified, and then we will have to start guessing all over again.


## Superpositions

Amplitudes are more than just tools for calculating probabilities: they tell us something about physical reality.
When we deal with probabilities, we may think about them as numbers that quantify our lack of knowledge.
Indeed, when we say that a particle goes through the upper or the lower slit with some respective probabilities, it does go through one of the two slits, we just do not know which one.
In contrast, according to quantum theory, a particle that goes through the upper and the lower slit with certain amplitudes does explore _both_ of the two paths, not just one of them.
This is a statement about a real physical situation --- about something that is out there and something we can experiment with.

::: {.idea data-latex=""}
The assumption that the particle goes through one of the two slits, but just that we do not know which one, is inconsistent with _many_ experimental observations.
:::

We have to accept that, apart from some easy to visualise states, known as the **basis states**, (such as the particle at the upper slit or the particle at the lower slit), there are infinitely many other states, all of them equally real, in which the particle is in a **superposition** of the two basis states.
This rather bizarre picture of reality is the best we have at the moment, and it works, at least for now.

Physicists write such superposition states as^[Dirac notation will likely be familiar to physicists, but may look odd to mathematicians or computer scientists. Love it or hate it (and I suggest the former), the notation is so common that you simply have no choice but to learn it, especially if you want to study anything related to quantum theory.]
$$
  \ket{\psi}=\alpha \ket{\text{at the upper slit}} +\beta \ket{\text{at the lower slit}},
$$
meaning the particle at the upper slit with amplitude $\alpha$ and at the lower slit with amplitude $\beta$.
Mathematically, you can think about this expression as a vector $\ket{\psi}$ in a two-dimensional complex vector space written in terms of the two basis vectors $\ket{\text{at the upper slit}}$ and $\ket{\text{at the lower slit}}$.
You could also write this vector as a column vector with two complex entries $\alpha$ and $\beta$, but then you would have to explain the _physical meaning_ of the basis states.
Here, we use the $\ket{\cdot}$ notation, introduced by Paul Dirac in the early days of the quantum theory as a useful way to write and manipulate vectors.
In Dirac notation you can put into the box $\ket{\phantom{0}}$ anything that serves to specify what the vector is: it could be $\ket{\uparrow}$ for spin up and $\ket{\downarrow}$ for spin down, or $\ket{0}$ for a quantum bit holding logical $0$ and $\ket{1}$ for a quantum bit holding logical $1$, etc.
As we shall see soon, there is much more to this notation, and learning to manipulate it will help you greatly.


## Interferometers

Many modern interference experiments are performed using internal degrees of freedom of atoms and ions.
For example, **Ramsey interferometry**, named after American physicist Norman Ramsey, is a generic name for an interference experiment in which atoms are sent through two separate resonant interaction zones, known as **Ramsey zones**, separated by an intermediate dispersive interaction zone.

Many beautiful experiments of this type were carried out in the 1990s in Serge Haroche's lab at the Ecole Normale Supérieure in Paris.
Rubidium atoms were sent through two separate interaction zones (resonant interaction in the first and the third cavity) separated by a phase inducing dispersive interaction zone (the central cavity).
The atoms were subsequently measured, via a selective ionisation, and found to be in one of the two preselected energy states, here labeled as $\ket{0}$ and $\ket{1}$.
The fraction of atoms found in states $\ket{0}$ or $\ket{1}$ showed a clear dependence on the phase shifts induced by the dispersive interaction in the central cavity.
In 2012, Serge Haroche and Dave Wineland shared the Nobel Prize in physics for "ground-breaking experimental methods that enable measuring and manipulation of individual quantum systems."

(ref:ramsey-figure-caption) A schematic diagram of a Ramsey interference experiment.

```{r ramsey-figure,engine='tikz',fig.cap='(ref:ramsey-figure-caption)'}
\definecolor{primary}{RGB}{177,98,78}
\definecolor{secondary}{RGB}{91,132,177}
\tikzset{blob/.style args={#1}{circle,draw,fill={#1}}}
\newcommand{\ket}[1]{|#1\rangle}
\begin{tikzpicture}
  \begin{scope}[shift={(-1,0.5)},scale=0.7]
  \draw[thick]  (0,0) ellipse (20pt and 28pt);
  \draw (-0.4,-0.4) -- (0.4,-0.4);
  \draw (-0.4,0.4) -- (0.4,0.4);
  \node [blob=primary] at (0,0.4) {};
  \node at (-1.1,0.5) {$\ket{0}$};
  \end{scope}
  \begin{scope}[shift={(7.6,-0.8)},scale=0.7]
  \draw[thick]  (0,0) ellipse (20pt and 28pt);
  \draw (-0.4,-0.4) -- (0.4,-0.4);
  \draw (-0.4,0.4) -- (0.4,0.4);
  \node [blob=primary] at (0,-0.4) {};
  \node at (1.2,-0.4) {$\ket{1}$};
  \end{scope}
  \begin{scope}[shift={(7.6,1.8)},scale=0.7]
  \draw[thick]  (0,0) ellipse (20pt and 28pt);
  \draw (-0.4,-0.4) -- (0.4,-0.4);
  \draw (-0.4,0.4) -- (0.4,0.4);
  \node [blob=primary] at (0,0.4) {};
  \node at (1.2,0.4) {$\ket{0}$};
  \end{scope}
  \draw [thick] (0,0.5) -- (6,0.5);
  \draw [thick,->] (6,0.5) -- (7,1.5);
  \draw [thick,->] (6,0.5) -- (7,-0.5);
  \draw [ultra thick,dashed,primary] (0.4,-0.5) rectangle (2,1.5);
  \draw [ultra thick,dashed,secondary] (2.5,-0.5) rectangle (3.6,1.5);
  \draw [ultra thick,dashed,primary] (4,-0.5) rectangle (5.6,1.5);
  \node at (1,-1) {resonant};
  \node at (3,-1.03) {dispersive};
  \node at (5,-1) {resonant};
\end{tikzpicture}
```

The three rectangular boxes in Figure \@ref(fig:ramsey-figure) represent three cavities, each cavity being an arrangement of mirrors which traps electromagnetic field (think about standing waves in between two mirrors).
The oval shapes represent rubidium atoms with two preselected energy states labelled as $\ket{0}$ and $\ket{1}$.
Each atom is initially prepared in a highly excited internal energy state $\ket{0}$ and zips through the three cavities, from the left to the right.
In each cavity the atom interacts with the cavity field.
The first and the third cavities are, for all theoretical purposes, identical: their frequencies are tuned to the resonant frequency of the atom, and the atom exchanges energy with the cavity, going back and forth between its energy states $\ket{0}$ and $\ket{1}$.
In contrast, in the second (central) cavity, the atom undergoes the so-called dispersive interaction: it is too off-resonance to exchange energy with the field but its energy states "feel" the field and acquire phase shifts.
After experiencing this well timed sequence of resonant--dispersive--resonant interactions, the energy of the atom is measured and the atom is found to be either in state $\ket{0}$ or state $\ket{1}$.
The fraction of atoms found in state $\ket{0}$ or $\ket{1}$ shows a clear dependence on the phase shifts induced by the dispersive interaction in the central cavity.

We can understand this interference better if we follow the two internal states of the atom as it moves through the three cavities.

(ref:ramsey-figure-2-caption) The Ramsey interferometer represented as an abstract diagram. It should be read from left to right. The line segments represent transitions between the two states, $\ket{0}$ and $\ket{1}$,  and the numbers are the corresponding probability amplitudes.

```{r ramsey-figure-2,engine='tikz',fig.cap='(ref:ramsey-figure-2-caption)'}
\definecolor{primary}{RGB}{177,98,78}
\definecolor{secondary}{RGB}{91,132,177}
\definecolor{highlight}{RGB}{0,0,0}
\definecolor{lowlight}{RGB}{184,177,162}
\tikzset{blob/.style args={#1}{circle,draw,fill={#1}}}
\tikzset{dot/.style={fill,shape=circle,minimum size=5pt,inner sep=0pt}}
\newcommand{\ket}[1]{|#1\rangle}
\begin{tikzpicture}[scale=1.7]
  \draw [ultra thick,dashed,primary] (0.6,2.5) rectangle (2.3,-0.5);
  \draw [ultra thick,dashed,secondary] (2.5,2.5) rectangle (3.4,-0.5);
  \draw [ultra thick,dashed,primary] (3.6,2.5) rectangle (5.3,-0.5);
  \node at (1.5,-1) {resonant};
  \node at (3,-1) {dispersive};
  \node at (4.5,-1) {resonant};
  \node [label=left:{$\ket{0}$}] (input0) at (0.2,2) {};
  \node [label=left:{$\ket{1}$}] (input1) at (0.2,0) {};
  \node [label=right:{$\ket{0}$}] (output0) at (5.7,2) {};
  \node [label=right:{$\ket{1}$}] (output1) at (5.7,0) {};
  \foreach \x in {1,2,4,5}
    \foreach \y in {0,2}
      \node [dot] (n\x\y) at (\x,\y) {};
  \draw [ultra thick,highlight] (input0) to (n12) to (n22);
  \draw [thick,lowlight] (input1) to (n10) to (n20);
  \draw [thick,lowlight] (n42) to (n52) to (output0);
  \draw [ultra thick,highlight] (n40) to (n50) to (output1);
  \draw [ultra thick,highlight] (n22) to node [pos=0.5,fill=white] {$e^{i\varphi_0}$} (n42);
  \draw [ultra thick,highlight] (n20) to node [pos=0.5,fill=white] {$e^{i\varphi_1}$} (n40);
  \draw [thick,lowlight] (n10) to node [pos=0.25,fill=white] {\small$\frac1{\sqrt{2}}$} (n22);
  \draw [thick,lowlight] (n40) to node [pos=0.25,fill=white] {\small$\frac1{\sqrt{2}}$} (n52);
  \draw [ultra thick,highlight] (n12) to node [pos=0.25,fill=white] {\small$\frac1{\sqrt{2}}$} (n20);
  \draw [ultra thick,highlight] (n42) to node [pos=0.25,fill=white] {\small$\frac1{\sqrt{2}}$} (n50);
  \node [highlight,fill=white] at (1.6,2) {\small$\frac1{\sqrt{2}}$};
  \node [lowlight,fill=white] at (4.6,2) {\small$\frac1{\sqrt{2}}$};
  \node [lowlight,fill=white] at (1.6,0) {\small$\frac{-1}{\sqrt{2}}$};
  \node [highlight,fill=white] at (4.6,0) {\small$\frac{-1}{\sqrt{2}}$};
\end{tikzpicture}
```

Suppose we are interested in the probability that the atom, initially in state $\ket{0}$, will be found, after completing its journey through the three cavities, in state $\ket{1}$.
As you can see in Figure \@ref(fig:ramsey-figure-2), this can happen in two ways, as indicated by the two red paths connecting the input state $\ket{0}$ on the left with the output state $\ket{1}$ on the right.
Again, let $U_{ij}$ denote the probability amplitude that input $\ket{j}$ generates output $\ket{i}$ (for $i,j=0,1$).
We can see from the diagram that
$$
  \begin{aligned}
    U_{10}
    &= \frac{1}{\sqrt2} e^{i\varphi_0}\frac{1}{\sqrt2} + \frac{1}{\sqrt2} e^{i\varphi_1}\frac{-1}{\sqrt2}
  \\&= \frac12 e^{i\varphi_0} - \frac12 e^{i\varphi_1}
  \\&= -ie^{i\varphi/2}\sin\frac{\varphi}{2},
  \end{aligned}
$$
where $\varphi = \varphi_0-\varphi_1$ is the relative phase.
The corresponding probability reads^[From the classical probability theory perspective the resonant interaction induces a random switch between $\ket{0}$ and $\ket{1}$ (why?) and the dispersive interaction has no effect on these two states (why?). Hence, one random switch followed by another random switch gives exactly a single random switch, which gives $\frac12$ for the probability that input $\ket{0}$ becomes output $\ket{1}$.]
$$
  \begin{aligned}
    P_{10}
    &= \vert U_{10}\vert^2
  \\&= \left\vert \frac12 e^{i\varphi_0} - \frac12 e^{i\varphi_1}\right\vert^2
  \\&= \frac12 - \frac12\cos\varphi.
  \end{aligned}
$$
You should recognise the first term, $\frac12$, as the "classical" probability and the second one, $-\frac12\cos\varphi$, as the interference term.
We can repeat such calculations for any other pair of input--output states.
This approach works fine here but, in general, tracking all possible paths in evolving quantum systems can become messy when the number of input and output states increases.
There is, however, a neat way of doing it via matrix multiplication.

The effect of each interaction on atomic states can be described by a matrix of transition amplitudes, as illustrated in Figure \@ref(fig:interference-matrix).
Then a sequence of independent interactions is described by the product of these matrices.
$$
  \begin{aligned}
    U &=
    \begin{bmatrix}
      \frac{1}{\sqrt2} & \frac{1}{\sqrt2}
    \\\frac{1}{\sqrt2} & \frac{-1}{\sqrt2}
    \end{bmatrix}
    \begin{bmatrix}
      e^{i\varphi_0} & 0
    \\0 & e^{i\varphi_1}
    \end{bmatrix}
    \begin{bmatrix}
      \frac{1}{\sqrt2} & \frac{1}{\sqrt2}
    \\\frac{1}{\sqrt2} & \frac{-1}{\sqrt2}
    \end{bmatrix}
  \\&= e^{i\frac{\varphi_0+\varphi_1}{2}}
    \begin{bmatrix}
      \cos\frac{\varphi}{2} & -i\sin\frac{\varphi}{2}
    \\\ -i\sin\frac{\varphi}{2}& \cos\frac{\varphi}{2}
    \end{bmatrix},
  \end{aligned}
$$
where $\varphi = \varphi_0-\varphi_1$.

(ref:interference-matrix-caption) The Ramsey interferometer represented as an abstract diagram (matrix approach). Here we have omitted the $\ket{0}$ and $\ket{1}$ labels, just to simply the diagram. We also ignore the global phase factor of $e^{i\frac{\varphi_0+\varphi_1}{2}}$.

```{r interference-matrix,engine='tikz',fig.cap='(ref:interference-matrix-caption)'}
\definecolor{primary}{RGB}{177,98,78}
\definecolor{secondary}{RGB}{91,132,177}
\tikzset{blob/.style args={#1}{circle,draw,fill={#1}}}
\tikzset{dot/.style={fill,shape=circle,minimum size=5pt,inner sep=0pt}}
\newcommand{\ket}[1]{|#1\rangle}
\begin{tikzpicture}[xscale=1.35]
  \draw [ultra thick,dashed,primary] (0.6,2.5) rectangle (2.3,-0.5);
  \draw [ultra thick,dashed,secondary] (2.5,2.5) rectangle (3.4,-0.5);
  \draw [ultra thick,dashed,primary] (3.6,2.5) rectangle (5.3,-0.5);
  \node at (1.5,-1.2) {$\begin{bmatrix}\frac{1}{\sqrt2}&\frac{1}{\sqrt2}\\\frac{1}{\sqrt2}&\frac{-1}{\sqrt2}\end{bmatrix}$};
  \node at (3,-1.2) {$\begin{bmatrix}e^{i\varphi_0}&0\\0&e^{i\varphi_1}\end{bmatrix}$};
  \node at (4.5,-1.2) {$\begin{bmatrix}\frac{1}{\sqrt2}&\frac{1}{\sqrt2}\\\frac{1}{\sqrt2}&\frac{-1}{\sqrt2}\end{bmatrix}$};
  \node at (6.15,-1.2) {$=$};
  \node at (8,-1.2) {$\begin{bmatrix}\cos\frac\varphi2&-i\sin\frac\varphi2\\-i\sin\frac\varphi2&\cos\frac\varphi2\end{bmatrix}$};
  \node (input1) at (0.2,2) [] {};
  \node (input2) at (0.2,0) [] {};
  \node (output1) at (5.7,2) {};
  \node (output2) at (5.7,0) {};
  \draw (input1)--(1,2)--(2,2)--(4,2);
  \draw (4,2)--(5,2)--(output1);
  \draw (input2)--(1,0)--(2,0);
  \draw (2,0)--(4,0)--(5,0)--(output2);
  \node at (6.15,1) {$=$};
  \foreach \x in {1,2,4,5,7,9}
    \foreach \y in {0,2}
      \node [dot] (n\x\y) at (\x,\y) {};
  \draw [-,thick] (n12) to node [pos=0.25,fill=white] {\small$\frac{1}{\sqrt2}$} (n20);
  \draw [-,thick] (n10) to node [pos=0.25,fill=white] {\small$\frac{1}{\sqrt2}$} (n22);
  \draw [-,thick] (n42) to node [pos=0.25,fill=white] {\small$\frac{1}{\sqrt2}$} (n50);
  \draw [-,thick] (n40) to node [pos=0.25,fill=white] {\small$\frac{1}{\sqrt2}$} (n52);
  \draw [-,thick] (n70) to node [pos=0.25,fill=white] {\small$-i\sin\frac{\varphi}{2}$} (n92);
  \draw [-,thick] (n70) to node [pos=0.5,fill=white] {\small$\cos\frac{\varphi}{2}$} (n90);
  \draw [-,thick] (n72) to node [pos=0.25,fill=white] {\small$-i\sin\frac{\varphi}{2}$} (n90);
  \draw [-,thick] (n72) to node [pos=0.5,fill=white] {\small$\cos\frac{\varphi}{2}$} (n92);
  \draw [-,thick] (9,0)--(9.3,0);
  \draw [-,thick] (9,2)--(9.3,2);
  \draw [-,thick] (6.7,0)--(7,0);
  \draw [-,thick] (6.7,2)--(7,2);
  \node [fill=white] at (4.6,2) {\small$\frac{1}{\sqrt2}$};
  \node [fill=white] at (1.6,2) {\small$\frac{1}{\sqrt2}$};
  \node [fill=white] at (3,2) {\small$e^{i\varphi_0}$};
  \node [fill=white] at (3,0) {\small$e^{i\varphi_1}$};
  \node [fill=white] at (1.6,0) {\small$\frac{1}{\sqrt2}$};
  \node [fill=white] at (4.6,0) {\small$\frac{-1}{\sqrt2}$};
\end{tikzpicture}
```

In general, quantum operation $A$ followed by another quantum operation $B$ is a quantum operation described by the matrix product $BA$ (watch the order of matrices).
Indeed, the expression $(BA)_{ij}=\sum_k B_{ik}A_{kj}$ is the sum over amplitudes that input $\ket{j}$ generates output $\ket{i}$ via a specific intermediate state $\ket{k}$.
As you can see, the matrix approach is a wonderful bookkeeping tool for in one swap it takes care of both multiplying and adding probability amplitudes corresponding to all the contributing paths.


## Qubits, gates, and circuits

Atoms, trapped ions, molecules, nuclear spins and many other quantum objects with two pre-selected basis states labeled as $\ket{0}$ and $\ket{1}$ (from now on we will call such objects quantum bits or **qubits**) can be used to implement simple quantum interference.
There is no need to learn about physics behind these diverse technologies if all you want is to understand the basics of quantum theory.
We may now conveniently forget about any specific experimental realisation of a qubit and represent a generic **single qubit interference** graphically as a **circuit diagram**:^[Do not confuse the interference diagrams of Figure \@ref(fig:ramsey-figure) and Figure \@ref(fig:interference-matrix) with the circuit diagram. In the circuit diagrams, which we will use a lot from now on, a single qubit is represented by a single line.]

```{r,engine='tikz',engine.opts=list(template="tikz2pdf.tex"),fig.width=4}
\begin{quantikz}
  \lstick{$\ket{0}$} \qw
  & \gate{H}
  & \phase{\varphi}
  & \gate{H}
  & \qw \rstick{$\cos\frac{\varphi}{2}\ket{0}-i\sin\frac{\varphi}{2}\ket{1}$}
\end{quantikz}
```

This diagram should be read from left to right.
The horizontal line represents a qubit that is inertly carried from one quantum operation to another. We often call this line a **quantum wire**.
The wire may describe translation in space (e.g. atoms travelling through cavities) or translation in time (e.g. a sequence of operations performed on a trapped ion).
The boxes or circles on the wire represent elementary quantum operations, called **quantum logic gates**.
Here we have two types of gates: two Hadamard gates $H$ (think about resonant interactions) and one phase gate $P_\varphi$ (think about dispersive interaction), where^[Global phase factors are irrelevant, it is the relative phase $\varphi =\varphi_1-\varphi_0$ that matters. In a single qubit phase gate we usually factor out $e^{i\varphi_0}$, which leaves us with the two diagonal entries: $1$ and $e^{i\varphi}$.]
$$
H=\begin{bmatrix}
    \frac{1}{\sqrt2} & \frac{1}{\sqrt2}
  \\\frac{1}{\sqrt2} & \frac{-1}{\sqrt2}
  \end{bmatrix}
  \quad\text{and}\quad
P_\varphi  = \begin{bmatrix}
    1 & 0
  \\0 & e^{i\varphi}
  \end{bmatrix}.
$$

The input qubits appear as state vectors on the left side of circuit diagrams, and the output qubits as state vectors on the right.
The product of the three matrices $HP_\varphi H$ (see Figure \@ref(fig:interference-matrix)) describes the action of the whole circuit: it maps input state vectors to output state vectors^[$HP_\varphi H =\begin{bmatrix}\cos\frac{\varphi}{2} & -i\sin\frac{\varphi}{2}\\\ -i\sin\frac{\varphi}{2}& \cos\frac{\varphi}{2}\end{bmatrix}$]:
$$
  \begin{array}{lcr}
    \ket{0} & \longmapsto & \cos\frac{\varphi}{2}\ket{0} - i\sin\frac{\varphi}{2}\ket{1},
  \\\ket{1}
    & \longmapsto
    &- i\sin\frac{\varphi}{2}\ket{0} + \cos\frac{\varphi}{2}\ket{1}.
  \end{array}
$$


## Quantum decoherence

We do need quantum theory to describe many physical phenomena, but, at the same time, there are many other  phenomena where the classical theory of probability works pretty well.
We hardly see quantum interference on a daily basis.
Why?
The answer is **decoherence**.
The addition of probability amplitudes, rather than probabilities, applies to physical systems which are completely isolated.
However, it is almost impossible to isolate a complex quantum system, such as a quantum computer, from the rest of the world.
There will always be spurious interactions with the environment, and when we add amplitudes, we have to take into account not only different configurations of the physical system at hand, but also different configurations of the environment.

For example, consider an isolated system composed of a quantum computer and its environment.
The computer is prepared in some input state $I$ and generates output $O$.
Let us look at the following two scenarios:

1. _The computer is isolated and quantum computation does not affect the environment._
  The computer and the environment evolve independently from each other and, as a result, the environment does not hold any physical record of how the computer reached output $O$.
  In this case we add the amplitudes for each of the two alternative computational paths.

```{r,engine='tikz',fig.width=1.5}
\definecolor{primary}{RGB}{177,98,78}
\tikzset{blob/.style args={#1}{circle,draw,fill={#1}}}
\tikzset{process/.style={thick,->,shorten >=0.1cm,shorten <=0.1cm}}
\begin{tikzpicture}[scale=0.5]
  \node (initial) at (0,0) [blob=primary] {\small $I$};
  \node (final) at (4,-1) [blob=primary] {\small $O$};
  \draw [process] (initial) to [out=40,in=120] node[pos=0.5,fill=white] {$z_1$} (final);
  \draw [process] (initial) to [out=-60,in=-140] node[pos=0.5,fill=white] {$z_2$} (final);
  \node at (2,-3) {$p = |z_1 + z_2|^2$};
\end{tikzpicture}
```

2. _Quantum computation affects the environment._
  The environment now holds a physical record of how the computer reached output $O$, which results in two final states of the composed system (computer + environment) which we denote $O_1$ and $O_2$.
  We add the probabilities for each of the two alternative computational paths.

```{r,engine='tikz',fig.width=1.5}
\definecolor{primary}{RGB}{177,98,78}
\tikzset{blob/.style args={#1}{circle,draw,fill={#1}}}
\tikzset{process/.style={thick,->,shorten >=0.1cm,shorten <=0.1cm}}
\begin{tikzpicture}[scale=0.5]
  \node (initial) at (0,0) [blob=primary] {\small I};
  \node (final1) at (4.2,-0.5) [blob=primary,inner sep=2pt] {\small $O_1$};
  \node (final2) at (3.8,-1.5) [blob=primary,inner sep=2pt] {\small $O_2$};
  \draw [process] (initial) to [out=50,in=120] node[pos=0.5,fill=white] {$z_1$} (final1);
  \draw [process] (initial) to [out=-70,in=-150] node[pos=0.5,fill=white] {$z_2$} (final2);
  \node at (2,-3.2) {$p = |z_1|^2 + |z_2|^2$};
\end{tikzpicture}
```

When quantum computation affects the environment, we have to include the environment in our analysis for it now takes part in the computation.
Depending on which computational path was taken, the environment may end up in two distinct states.
The computer itself may show output $O$, but when we include the environment we have not one, but two outputs, $O_1$ and $O_2$, denoting, respectively, "computer shows output $O$ and the environment knows that path $1$ was taken" and "computer shows output $O$ and the environment knows that path $2$ was taken".
There are no alternative ways of reaching $O_1$ or $O_2$, hence there is no interference, and the corresponding probabilities read $p_1=|z_1|^2$ for $O_1$, and $p_2=|z_2|^2$ for $O_2$.
The probability that the computer shows output $O$, regardless the state of the environment, is the sum of of the two probabilities: $p=p_1+p_2$.
We have lost the interference term and any advantages of quantum computation along with it.
In the presence of decoherence, the interference formula in Equation (1.2.1.1) is modified and reads
$$
p
= p_1 + p_2 + 2 v \sqrt{p_1 p_2}\cos (\varphi_2-\varphi_1),
$$
where the parameter $v$, called the **visibility** of the interference pattern, ranges from $0$ (the environment can perfectly distinguish between the two paths, total decoherence, no interference) to $1$ (the environment cannot distinguish between the two paths, no decoherence, full interference), with the values in between corresponding to partial decoherence.

```{r,engine='tikz',engine.opts=list(template="tikz2pdf.tex"),fig.width=4}
\definecolor{secondary}{RGB}{91,132,177}
\begin{tikzpicture}
  \begin{axis}[
    xmin=0,xmax=5,
    xlabel=relative phase,
    ylabel=$\mbox{probability $p$}$,
    ymin=0,ymax=1.1,
    axis lines=middle,
    xtick={0},
    xticklabels={$0$},
    ytick={0.5,1},
    yticklabels={$p_1+p_2$,1},
    xlabel style={font=\Large,anchor=north,at={(axis description cs:0.8,0)}},
    ylabel style={font=\Large,anchor=south},very thick,
    yticklabel style={font=\Large},
  ]
    \addplot[domain=0:1.6*pi,samples=200,secondary,line width=2pt]{0.5+0.15*cos(deg(3*x))};
    \addplot[domain=0:1.6*pi,samples=200,gray,very thick]{0.5};
  \end{axis}
\end{tikzpicture}
```

We shall derive this formula later on, and you will see that $v$ quantifies the degree of distinguishability between $O_1$ and $O_2$.
The more the environment knows about which path was taken the less interference we see.

::: {.idea data-latex=""}
Decoherence suppresses quantum interference.
:::

Decoherence is chiefly responsible for our classical description of the world --- without interference terms we may as well add probabilities instead of amplitudes.
While decoherence is a serious impediment to building quantum computers, depriving us of the power of quantum interference, it is not all doom and gloom: there are clever ways around decoherence, such as quantum error correction and fault-tolerant methods we will meet later.


## Computation: deterministic, probabilistic, and quantum

Take one physical bit or a qubit.
It has two logical states: $\ket{0}$ and $\ket{1}$.
Bring another qubit and the combined systems has four logical states $\ket{00}, \ket{01},\ket{10}$ and $\ket{11}$.
In general $n$ qubits will give us $2^n$ states representing all possible binary strings of length $n$.
It is important to use subsystems --- here qubits --- rather than one chunk of matter, for operating on at most $n$ qubits we can reach any of the $2^n$ states of the composed system.
Now, let the qubits interact in a controllable fashion.
We are computing.
Think about computation as a physical process that evolves a prescribed initial configuration of a computing machine, called **$\texttt{INPUT}$**, into some final configuration, called **$\texttt{OUTPUT}$**.
We shall refer to the configurations as **states**.
Figure \@ref(fig:deterministic-computation) shows five consecutive computational steps performed on four distinct states.

```{r deterministic-computation,engine='tikz',fig.cap='Deterministic computation.'}
\usetikzlibrary{arrows.meta}
\tikzset{dot/.style={fill,shape=circle,minimum size=5pt,inner sep=0pt}}
\tikzset{process/.style={thick,-{Latex},shorten >=0.1cm,shorten <=0.1cm}}
\begin{tikzpicture}[xscale=2.5,yscale=0.8]
    \foreach \x in {0,1,2,3,4,5}
      \foreach \y in {0,1,2,3}
        \node [dot] (n\x\y) at (\x,\y) {};
    \node at (-0.3,3) {\sc input};
    \node at (5.3,1) {\sc output};
    \draw [process] (n03) to (n11);
    \draw [process] (n11) to (n22);
    \draw [process] (n22) to (n30);
    \draw [process] (n30) to (n42);
    \draw [process] (n42) to (n51);
  \end{tikzpicture}
```

That computation was **deterministic**: every time you run it with the same input, you get the same output.
But a computation does not have to be deterministic --- we can augment a computing machine by allowing it to "toss an unbiased coin" and to choose its steps randomly.
It can then be viewed as a directed^[So we read left to right, and omit the arrowheads.] tree-like graph where each node corresponds to a state of the machine, and each edge represents one step of the computation, as shown in Figure \@ref(fig:probabilistic-computation)

```{r probabilistic-computation,engine='tikz',fig.cap='Probabilistic computation.'}
\tikzset{dot/.style={fill,shape=circle,minimum size=5pt,inner sep=0pt}}
\begin{tikzpicture}[xscale=2.5,yscale=0.8]
  \foreach \x in {0,...,5}
    \foreach \y in {0,1,2,3}
      \node [dot] (n\x\y) at (\x,\y) {};
  \foreach \x in {1,2,3,4}
    \foreach \y in {0,1,2,3}
      \foreach \z in {0,1,2,3}
        \pgfmathtruncatemacro{\a}{\x+1}
        \draw [lightgray] (n\x\y) to (n\a\z);
  \foreach \y in {0,1,2,3}
    \draw [lightgray] (n03) to (n1\y);
  \draw [thick,black] (n03) to (n12) to (n20) to (n31) to (n40) to (n51);
  \draw [thick,black] (n03) to (n11) to (n22) to (n30) to (n42) to (n51);
  \node at (-0.3,3) {\sc input};
  \node at (5.5,1) {$p=p_1+p_2$};
\end{tikzpicture}
```

The computation starts from some initial state ($\texttt{INPUT}$) and it subsequently branches into other nodes representing states reachable with non-zero probability from the initial state.
The probability of a particular final state ($\texttt{OUTPUT}$)  being reached is equal to the sum of the probabilities along all mutually exclusive paths which connect the initial state with that particular state.
Figure \@ref(fig:probabilistic-computation) shows only two computational paths, but, in general, there could be many more paths (here, up to 256) contributing to the final probability.
Quantum computation can be represented by a similar graph, as in \@ref(fig:quantum-computation).

```{r quantum-computation,engine='tikz',fig.cap='Quantum computation.'}
\tikzset{dot/.style={fill,shape=circle,minimum size=5pt,inner sep=0pt}}
\begin{tikzpicture}[xscale=2.5,yscale=0.8]
  \foreach \x in {0,...,5}
    \foreach \y in {0,1,2,3}
      \node [dot] (n\x\y) at (\x,\y) {};
  \foreach \x in {1,2,3,4}
    \foreach \y in {0,1,2,3}
      \foreach \z in {0,1,2,3}
        \pgfmathtruncatemacro{\a}{\x+1}
        \draw [lightgray] (n\x\y) to (n\a\z);
  \foreach \y in {0,1,2,3}
    \draw [lightgray] (n03) to (n1\y);
  \draw [thick,black] (n03) to (n12) to (n20) to (n31) to (n40) to (n51);
  \draw [thick,black] (n03) to (n11) to (n22) to (n30) to (n42) to (n51);
  \node at (-0.3,3) {\sc input};
  \node at (5.5,1) {$p=p_1+p_2$};
  \node at (5.8,0.4) {$+2\sqrt{p_1p_2}\cos(\varphi_2-\varphi_1)$};
\end{tikzpicture}
```

For quantum computations, we associate with each edge in the graph the probability _amplitude_ that the computation follows that edge.
The probability amplitude of a particular path to be followed is the product of amplitudes pertaining to transitions in each step.
The probability amplitude of a particular final state being reached is equal to the sum of the amplitudes along all mutually exclusive paths which connect the initial state with that particular state:
$$
  z = \sum_{\mathrm{all\,paths}\,k} z_k.
$$
The resulting probability, as we have just seen, is the sum of the probabilities pertaining to each computational path $p_k$ modified by the interference terms:
$$
  \begin{aligned}
    p
    &= |z|^2
  \\&= \sum_{k,j} z_j^\star z_k
  \\&= \sum_k p_k + \sum_{k\ne j} \sqrt{p_k p_j}\cos(\varphi_k-\varphi_j).
  \end{aligned}
$$

::: {.idea data-latex=""}
Quantum computation can be viewed as a complex multi-particle quantum interference involving many computational paths through a computing device.
The art of quantum computation is to shape quantum interference, through a sequence of computational steps, enhancing probabilities of correct outputs and suppressing probabilities of the wrong ones.
:::


## Computational complexity

Is there a compelling reason why we should care about quantum computation?
It may sound like an extravagant way to compute something that can be computed anyway.
Indeed, your standard laptop, given enough time and memory, can simulate pretty much any physical process.
In principle, it can also simulate any quantum interference and compute everything that quantum computers can compute.
The snag is, this simulation, in general, is very inefficient.
And efficiency does matter, especially if you have to wait more than the age of the Universe for your laptop to stop and deliver an answer!^[The age of the Universe is currently estimated at 13.772 billion years.]

In order to solve a particular problem, computers (classical or quantum) follow a precise set of instructions called an **algorithm**.
Computer scientists quantify the efficiency of an algorithm according to how rapidly its running time, or the use of memory, increases when it is given ever larger inputs to work on.
An algorithm is said to be **efficient** if the number of elementary operations taken to execute it increases no faster than a polynomial function of the size of the input.^[Note that the technological progress alone, such as increasing the speed of classical computers, will never turn an inefficient algorithm (exponential scaling) into an efficient one (polynomial scaling). Why?]
We take the input size to be the total number of binary digits (bits) needed to specify the input.
For example, using the algorithm taught in elementary school, one can multiply two $n$ digit numbers in a time that grows like the number of digits squared, $n^2$.
In contrast, the fastest-known method for the reverse operation --- factoring an $n$-digit integer into prime numbers --- takes a time that grows exponentially, roughly as $2^n$.
That is considered inefficient.

```{r,engine='tikz',engine.opts=list(template="tikz2pdf.tex"),fig.width=4}
\definecolor{primary}{RGB}{177,98,78}
\definecolor{secondary}{RGB}{91,132,177}
\begin{tikzpicture}
  \begin{axis}[
    xmin=0,xmax=10,
    xlabel=input size $n$,
    ylabel=execution time,
    ymin=0,ymax=100,
    axis lines=middle,
    xtick={0},
    xticklabels={$0$},
    ytick={0},
    yticklabels={$0$},
    xlabel style={anchor=north,at={(axis description cs:0.8,0)}},
    ylabel style={anchor=south,at={(axis description cs:0.15,1.05)}},
  ]
    \addplot[domain=0:10,samples=200,primary,line width=2pt]{x^2};
    \addplot[domain=0:10,samples=200,secondary,line width=2pt]{2^x};
  \end{axis}
  \node at (4.1,5.5) {$2^n$};
  \node at (6.2,5.5) {$n^2$};
  \node at (6.2,2.3) {\footnotesize polynomial};
  \node at (6.2,2) {\footnotesize is good {:)}};
  \node at (2.5,3.8) {\footnotesize exponential};
  \node at (2.5,3.5) {\footnotesize is bad {:(}};
\end{tikzpicture}
```

The class of problems that can be solved by a deterministic computer in polynomial time is represented by the capital letter $\texttt{P}$, for _polynomial_ time.
The class of problems that can be solved in polynomial time by a probabilistic computer is called $\texttt{BPP}$, for _bounded-error probabilistic polynomial_ time.
It is clear that $\texttt{BPP}$ contains $\texttt{P}$, since a deterministic computation is a special case of a probabilistic computation in which we never consult the source of randomness.
When we run a probabilistic (a.k.a. randomised) computation many times on the same input, we will not get the same answer every time, but the computation is useful if the probability of getting the right answer is high enough.
Finally, the complexity class $\texttt{BQP}$, for _bounded-error quantum polynomial_, is the class of problems that can be solved in polynomial time by a quantum computer.

```{r,engine='tikz',fig.width=2}
\begin{tikzpicture}
  \node at (0,0) {{\footnotesize\texttt{P}}};
  \node at (0,0.7) {{\footnotesize\texttt{BPP}}};
  \node at (0,1.2) {{\footnotesize\texttt{BQP}}};
  \draw (0,0) circle (0.5cm);
  \draw (0,0) circle (1cm);
  \draw (0,0) circle (1.5cm);
\end{tikzpicture}
```

Since a quantum computer can easily generate random bits and simulate a probabilistic classical computer, $\texttt{BQP}$ certainly contains the class $\texttt{BPP}$.
Here we are interested in problems that are in $\texttt{BQP}$ but not known to be in $\texttt{BPP}$.
The most popular example of such a problem is factoring.

A quantum algorithm, discovered by Peter Shor in 1994, can factor $n$-digit numbers in a number of steps that grows only as $n^2$, as opposed to the $2^n$ that we have classically.^[It must be stressed that not all quantum algorithms are so efficient, in fact many are no faster than their classical counterparts. Which particular problems will lend themselves to quantum speed-ups is an open question.]
Since the intractability of factorisation underpins the security of many methods of encryption, Shor's algorithm was soon hailed as the first `killer application' for quantum computation: something very useful that only a quantum computer could do.
Since then, the hunt has been on for interesting things for quantum computers to do, and at the same time, for the scientific and technological advances that could allow us to build quantum computers.


## Outlook

When the physics of computation was first investigated, starting in the 1960s, one of the main motivations was a fear that quantum-mechanical effects might place fundamental bounds on the accuracy with which physical objects could render the properties of the abstract entities, such as logical variables and operations, that appear in the theory of computation.
It turned out, however, that quantum mechanics itself imposes no significant limits, but does break through some of those that classical physics imposed.
The quantum world has a richness and intricacy that allows new practical technologies, and new kinds of knowledge.
In this course we will merely scratch the surface of the rapidly developing field of quantum computation.
We will concentrate mostly on the fundamental issues and skip many experimental details.
However, it should be mentioned that quantum computing is a serious possibility for future generations of computing devices.
At present it is not clear how and when fully-fledged quantum computers will eventually be built;
but this notwithstanding, the quantum theory of computation already plays a much more fundamental role in the scheme of things than its classical predecessor did.
I believe that anyone who seeks a fundamental understanding of either physics, computation or logic must incorporate its new insights into their world view.


## Remarks and exercises

### {}

As a purely historical remark: back in 1926, Max Born simply postulated the connection between amplitudes and probabilities, but did not get it quite right on his first attempt.
In the original paper^[Max Born, "Zur Quantenmechanik der Stoßvorgänge", _Zeitschrift für Physik_ **37** (1926), 893--867.] proposing the probability interpretation of the state vector (wavefunction) he wrote:

> ... If one translates this result into terms of particles only one interpretation is possible.
> $\Theta_{\eta,\tau,m}(\alpha,\beta,\gamma)$ [the wavefunction for the particular problem he is considering] gives the probability$^*$ for the electron arriving from the $z$ direction to be thrown out into the direction designated by the angles $\alpha,\beta,\gamma$...
>
> $^*$ Addition in proof: More careful considerations show that the probability is proportional to the square of the quantity $\Theta_{\eta,\tau,m}(\alpha,\beta,\gamma)$.

### {}

Suppose that we modified the Born rule, so that probabilities were given by the absolute values of amplitudes raised to power $p$ (for $p$ not necessarily equal to $2$).
Then admissible physical evolutions must preserve the normalisation of probability.
Mathematically speaking, they must be isometries of $p$-norms.

Recall that the $p$-norm of vector $v$, with components $v_1, v_2,\ldots, v_n$, is defined as
$$
  \sqrt[{}^p]{|v_1|^p + |v_2|^p + \ldots + |v_n|^p}.
$$
It is clear that any permutation of vector components and multiplication by phase factors (i.e. unit complex numbers) will leave any $p$-norm unchanged.
It turns out that these complex permutations are the _only_ isometries, except for _one_ special case!
For $p=2$, the isometries are unitary operations, which form a continuous group; in all other cases we are restricted to discrete permutations.
We do not have to go into details of the proof since we can _see_ this result.

(ref:p-norm-unit-spheres-caption) The unit spheres in the $p$-norm for $p=1,2,42,\infty$.

```{r p-norm-unit-spheres,engine='tikz',fig.width=4,fig.cap='(ref:p-norm-unit-spheres-caption)'}
\definecolor{primary}{RGB}{177,98,78}
\definecolor{secondary}{RGB}{91,132,177}
\begin{tikzpicture}[scale=2.5]
  \draw [->](-1.3,0) to (1.3,0) node [above] {$v_1$};
  \draw [->](0,-1.3) to (0,1.3) node [right] {$v_2$};
  %
  \draw (1,0) to (0,1) to (-1,0) to (0,-1) to cycle;
  \node [label={[rotate=-45]below:${\tiny{p=1}}$}] at (0.52,0.52) {};
  %
  \draw (0,0) [primary] circle (1);
  \node [label={[primary,rotate=-45]below:${\tiny p=2}$}] at (0.68,0.79) {};
  %
  \draw [domain=0:1,samples=40,secondary] plot ({\x}, {sqrt(1-\x^6)});
  \draw [domain=0:1,samples=40,secondary] plot ({\x}, {-sqrt(1-\x^6)});
  \draw [domain=0:1,samples=40,secondary] plot ({-\x}, {-sqrt(1-\x^6)});
  \draw [domain=0:1,samples=40,secondary] plot ({-\x}, {sqrt(1-\x^6)});
  \node [label={[secondary,rotate=-45]below: ${\tiny p=42}$}] at (0.79,0.94) {};
  %
  \draw (-1,-1) rectangle (1,1);
  \node[] at (0.8,1.07) {$p=\infty$};
\end{tikzpicture}
```

In particular, the image of the unit sphere must be preserved under probability preserving operations.
As we can see in Figure \@ref(fig:p-norm-unit-spheres), the $2$-norm is special because of its rotational invariance --- the probability measure picks out no preferred basis in the space of state vectors.
Moreover, it respects unitary operations and does not restrict them in any way.
If the admissible physical evolutions were restricted to discrete symmetries, e.g. permutations, then there would be no continuity, and no concept of "time" as we know it.

### {}

Complex numbers have many applications in physics, however, not until the advent of quantum theory was their ubiquitous and fundamental role in the description of the actual physical world so evident.
Even today, their profound link with probabilities appears to be a rather mysterious connection.
Mathematically speaking, the set of complex numbers is a field. This is an important algebraic structure used in almost all branches of mathematics.
You do not have to know much about algebraic fields to follow these lectures, but still, you should know the basics.
Look them up.

a. The set of rational numbers and the set of real numbers are both fields, but the set of integers is not. Why?
b. What does it mean to say that the field of complex numbers is **algebraically closed**?
c. Evaluate each of the following quantities:
  $$
    1+e^{-i\pi},
    \quad
    |1+i|,
    \quad
    (1+i)^{42},
    \quad
    \sqrt{i},
    \quad
    2^i,
    \quad
    i^i.
  $$
d. Here is a simple proof that $+1=-1$: $$1=\sqrt{1}=\sqrt{(-1)(-1)}=\sqrt{-1}\sqrt{-1}=i^2=-1.$$ What is wrong with it?

### {}

A quantum computer starts calculations in some initial state, then follows $n$ different computational paths which lead to the final output.
The computational paths are followed with probability amplitudes $\frac{1}{\sqrt n}e^{i k \varphi}$, where $\varphi$ is a fixed angle $0< \varphi <2\pi$ and $k=0,1,...n-1$.
Show that the probability of generating the output is^[$1+z+z^2+\ldots + z^n= \frac{1-z^{n+1}}{1-z}$]
$$
  \frac{1}{n}\left\vert
    \frac{1-e^{i n\varphi}}{1-e^{i\varphi}}
  \right\vert^2
  = \frac{1}{n} \frac{\sin^2 (n\frac{\varphi}{2})}{\sin^2 (\frac{\varphi}{2})}.
$$
for $0<\varphi<2\pi$, and $1$ for $\varphi=0$.
Plot the probability as a function of $\varphi$.

### {}

Imagine two distant stars, A and B, that emit _identical_ photons.
If you point a single detector towards them you will register a click every now and then, but you never know which star the photon came from.
Now prepare two detectors and point them towards the stars.
Assume the photons arrive with the probability amplitudes specified in Figure \@ref(fig:photons-from-stars).
Every now and then you will register a coincidence: the two detectors will fire.

a. Calculate the probability of a coincidence.
b. Now, assume that $z\approx \frac{1}{r}e^{i\frac{2r\pi}{\lambda}}$, where $r$ is the distance between detectors and the stars. How can we use this to measure $r$?

```{r photons-from-stars,engine='tikz',fig.width=3.5,fig.cap='Two photon detectors pointing at two stars, with the probabilities of detection.'}
\usetikzlibrary{shapes}
\usetikzlibrary{arrows.meta}
\tikzset{dot/.style={fill,shape=circle,minimum size=5pt,inner sep=0pt}}
\tikzset{process/.style={thick,-{Latex},shorten >=0.1cm,shorten <=0.1cm}}
\tikzset{pics/measure left/.style args={#1}{code={\draw [fill=gray!15] (0,-0.5) arc (-90:90:0.5cm) -- cycle;\node [anchor=mid] at (0.22,0) {#1};}}}
\begin{tikzpicture}[xscale=4,yscale=3]
  \tikzstyle{nicestar}=[draw,star,star points=7,star point ratio=0.55,minimum size=12pt]
  \node [nicestar,fill=gray!40] (star1) at (0,1) {A};
  \node [nicestar,fill=gray!40] (star2) at (0,0) {B};
  \node (detector1) at (1,1) {};
  \node (detector2) at (1,0) {};
  \draw (detector1) pic {measure left={1}};
  \draw (detector2) pic {measure left={2}};
  \draw [process] (star1) to node [pos=0.5,fill=white] {$z$} (detector1);
  \draw [process] (star2) to node [pos=0.5,fill=white] {$z$} (detector2);
  \draw [process] (star1) to node [pos=0.25,fill=white] {$ze^{i\phi}$} (detector2);
  \draw [process] (star2) to node [pos=0.25,fill=white] {$ze^{i\phi}$} (detector1);
\end{tikzpicture}
```

### Physics against logic?

Now that we have poked our heads into the quantum world, let us see how quantum interference challenges conventional logic and leads to qualitatively different computations.
Consider the following task (which we will return to a few more times in later chapters): design a logic gate that operates on a single bit such that, when it is followed by another, identical, logic gate, the output is _always_ the negation of the input.
Let us call this logic gate **the square root of $\texttt{NOT}$**, or $\sqrt{\texttt{NOT}}$.
A simple check, such as an attempt to construct a truth table, should persuade you that there is no such operation in logic.
It may seem reasonable to argue that since there is no such operation in logic, $\sqrt{\texttt{NOT}}$ is impossible.
Think again!

Figure \@ref(fig:sqrt-not) shows a simple computation, two identical computational steps performed on two states labelled as $0$ and $1$, i.e. on one bit.
An interplay of constructive and destructive interference makes some transitions impossible and the result is the logical $\texttt{NOT}$.
Thus, quantum theory declares, the square root of $\texttt{NOT}$ is possible.
And it does exist!
Experimental physicists routinely construct this and many other "impossible" gates in their laboratories.
They are the building blocks of a quantum computer.
Quantum theory explains the behaviour of $\sqrt{\texttt{NOT}}$, hence, reassured by the physical experiments that corroborate this theory, logicians are now entitled to propose a new logical operation $\sqrt{\texttt{NOT}}$.
Why?
Because a faithful physical model for it exists in nature.

Write a $2\times 2$ matrix which describes the $\sqrt{\texttt{NOT}}$ operation.
Is there just one such a matrix?
Suppose you are given a supply of Hadamard and phase gates with tuneable phase settings.
How would you construct the $\sqrt{\texttt{NOT}}$ gate?

(ref:sqrt-not-caption) A computation that, when repeated, gives exactly $\texttt{NOT}$. An unlabelled line means that it has probability $1$, and the lack of a line corresponds to having probability $0$.

```{r sqrt-not,engine='tikz',fig.cap='(ref:sqrt-not-caption)'}
\tikzset{dot/.style={fill,shape=circle,minimum size=5pt,inner sep=0pt}}
\tikzset{centrelabel/.style={pos=0.5,fill=white}}
\begin{tikzpicture}[xscale=3,yscale=2]
  \node [dot,label=left:{$0$}] (i0) at (0,1) {};
  \node [dot,label=left:{$1$}] (i1) at (0,0) {};
  \node [dot] (m0) at (1,1) {};
  \node [dot] (m1) at (1,0) {};
  \node [dot,label=right:{$0$}] (o0) at (2,1) {};
  \node [dot,label=right:{$1$}] (o1) at (2,0) {};
  \draw [thick] (i0) to node [centrelabel] {$\frac{1+i}{\sqrt2}$} (1,1);
  \draw [thick] (i0) to node [pos=0.25,fill=white] {$\frac{1-i}{\sqrt2}$} (1,0);
  \draw [thick] (i1) to node [pos=0.25,fill=white] {$\frac{1-i}{\sqrt2}$} (1,1);
  \draw [thick] (i1) to node [centrelabel] {$\frac{1+i}{\sqrt2}$} (1,0);
  \draw [thick] (m0) to node [centrelabel] {$\frac{1+i}{\sqrt2}$} (o0);
  \draw [thick] (m0) to node [pos=0.25,fill=white] {$\frac{1-i}{\sqrt2}$} (o1);
  \draw [thick] (m1) to node [pos=0.25,fill=white] {$\frac{1-i}{\sqrt2}$} (o0);
  \draw [thick] (m1) to node [centrelabel] {$\frac{1+i}{\sqrt2}$} (o1);
  %
  \node at (2.5,0.5) {$=$};
  %
  \node [dot,label=left:{$0$}] (i0) at (3,1) {};
  \node [dot,label=left:{$1$}] (i1) at (3,0) {};
  \node [dot,label=right:{$0$}] (o0) at (4,1) {};
  \node [dot,label=right:{$1$}] (o1) at (4,0) {};
  \draw [thick] (i0) to (o1);
  \draw [thick] (i1) to (o0);
  %
  \node at (0.5,-0.45) {$\sqrt{\texttt{NOT}}$};
  \node at (1.5,-0.45) {$\sqrt{\texttt{NOT}}$};
  \node at (3.5,-0.45) {\texttt{NOT}};
\end{tikzpicture}
```

### Quantum bomb tester

You have been drafted by the government to help in the demining effort in a former war-zone.^[This is a slightly modified version of a bomb testing problem described by Avshalom Elitzur and Lev Vaidman in _Quantum-mechanical interaction-free measurement_, Found. Phys. **47** (1993), 987-997.]
In particular, retreating forces have left very sensitive bombs in some of the sealed rooms.
The bombs are configured such that if even one photon of light is absorbed by the fuse (i.e. if someone looks into the room), the bomb will go off.
Each room has an input and output port which can be hooked up to external devices.
An empty room will let light go from the input to the output ports unaffected, whilst a room with a bomb will explode if light is shone into the input port and the bomb absorbs even just one photon --- see Figure \@ref(fig:bomb-detecting-scenario).

(ref:bomb-detecting-scenario-caption) _Left_ --- the passage of a photon through an empty room. _Right_ --- the passage of a photon through a room containing a bomb.

```{r bomb-detecting-scenario,engine='tikz',fig.cap='(ref:bomb-detecting-scenario-caption)'}
\usetikzlibrary{arrows.meta}
\usetikzlibrary{shapes}
\definecolor{secondary}{RGB}{91,132,177}
\begin{minipage}{0.4\textwidth}
  \centering
  \begin{tikzpicture}[scale=0.2]
    \draw [thick,fill=gray!40] (0,0) rectangle (13,8);
    \draw [thick,fill=white] (1,1) rectangle (12,7);
    \draw [white,fill=white] (-0.1,3.5) rectangle (13.1,4.5);
    \draw [thick] (0,3.5) -- (1,3.5);
    \draw [thick] (0,4.5) -- (1,4.5);
    \draw [thick] (12,3.5) -- (13,3.5);
    \draw [thick] (12,4.5) -- (13,4.5);
    \draw [-Latex,thick] (-3,4) to (16,4);
    \node at (6.5,2.5) {empty room};
  \end{tikzpicture}
\end{minipage}
\hfill
\begin{minipage}{0.4\textwidth}
  \begin{tikzpicture}[scale=0.2]
    \node [draw,star,star points=11,minimum size=5cm,thick,fill=yellow] at (6.5,4) {};
    \node [draw,star,star points=11,minimum size=4cm,thick,fill=orange] at (6.5,4) {};
    \node [draw,star,star points=11,minimum size=3cm,thick,fill=red] at (6.5,4) {};
    %
    \draw [thick,fill=gray!40]
      (0,0) -- (0,3.5) -- (1,3.5) -- (1,1) --
      (12,1) -- (12,3.5) -- (13,3.5) -- (13,0) -- cycle;
    \draw [thick,fill=gray!40]
      (0,8) -- (0,4.5) -- (1,4.5) -- (1,7) --
      (12,7) -- (12,4.5) -- (13,4.5) -- (13,8) -- cycle;
    \draw [thick,fill=secondary] (6.5,4) circle (2.5cm);
    \draw [-Latex,thick] (-3,4) to (4,4);
    \node at (16,4) {};
    \node at (6.5,4) {\color{white}\large\sc bomb};
  \end{tikzpicture}
\end{minipage}
```
    
Your task is to find a way of determining whether a room has a bomb in it without blowing it up, so that specialised (limited and expensive) equipment can be devoted to defusing that particular room.
You would like to know with certainty whether a particular room had a bomb in it.

1. To start with, consider the setup in Figure \@ref(fig:mach-zehnder-bomb-tester), where the input and output ports are hooked up in the lower arm of a Mach--Zehnder interferometer.^[Read about Mach--Zehnder interferometers in [Chapter 3](#chapter3).]
    a. Assume an empty room.
      Send a photon to input port $\ket{0}$.
      Which detector, at the output port, will register the photon?
    b. Now assume that the room does contain a bomb.
      Again, send a photon to input port $\ket{0}$.
      Which detector will register the photon and with which probability?
    c. Design a scheme that allows you --- at least some of the time --- to decide whether a room has a bomb in it without blowing it up.
      If you iterate the procedure, what is its overall success rate for the detection of a bomb without blowing it up?

```{r mach-zehnder-bomb-tester,engine='tikz',fig.cap='The Mach--Zehnder interferometer hooked up to the bomb-testing room.'}
\usetikzlibrary{arrows.meta}
\usetikzlibrary{decorations}
\usetikzlibrary{decorations.markings}
\tikzset{beamsplitter/.pic={\draw [fill=gray!10] (-0.4,-0.4) -- (-0.4,0.4) -- (0.4,0.4);\draw [fill=gray!40] (-0.4,-0.4) -- (0.4,-0.4) -- (0.4,0.4);\draw (-0.4,-0.4) -- (0.4,0.4);}}
\tikzset{mirror/.pic={\draw [line width=2pt] (-0.3,-0.3) -- (0.3,0.3);}}
\tikzset{pics/measure left/.style args={#1}{code={\draw [fill=gray!15] (0,-0.5) arc (-90:90:0.5cm) -- cycle;\node [anchor=mid] at (0.22,0) {#1};}}}
\tikzset{pics/measure top/.style args={#1}{code={\draw [fill=gray!15] (0.5,0) arc (0:180:0.5cm) -- cycle;\node [anchor=mid] at (0,0.22) {#1};}}}
\tikzset{middlearrow/.style 2 args={decoration={markings,mark=at position #2 with {\arrow{#1}}},postaction=decorate}}
\newcommand{\ket}[1]{|#1\rangle}
\begin{tikzpicture}
  \draw [thick,fill=gray!40] (1.5,-0.5) rectangle (3,0.5);
  \draw [thick,fill=white] (1.7,-0.3) rectangle (2.8,0.3);
  \draw [white,fill=white] (1.4,-0.1) rectangle (3.1,0.1);
  \draw [thick] (1.5,-0.1) -- (1.7,-0.1);
  \draw [thick] (1.5,0.1) -- (1.7,0.1);
  \draw [thick] (2.8,-0.1) -- (3,-0.1);
  \draw [thick] (2.8,0.1) -- (3,0.1);
  %
  \path (0,0) pic {beamsplitter};
  \path (4,2) pic {beamsplitter};
  \draw [thick,middlearrow={Latex}{0.6}] (0,-1.1) to (0,2) pic {mirror};
  \draw [thick,-Latex,shorten >=0.1cm] (0,2) to (5.1,2) pic {measure left={2}};
  \draw [thick,middlearrow={Latex}{0.5}] (-1.1,0) to (4,0) pic {mirror};
  \draw [thick,-Latex,shorten >=0.1cm] (4,0) to (4,3.1) pic {measure top={1}};
  \node at (-1.5,0) {$\ket{0}$};
  \node at (0,-1.5) {$\ket{1}$};
  \node [fill=white] at (2.25,0) {room};
\end{tikzpicture}
```

2. Assume that the two beam splitters in the interferometer are different.
  Say the first beam-splitter reflects incoming light with probability $r$ and transmits with probability $t=1-r$, and the second one transmits with probability $r$ and reflects with probability $t$.
  Would the new setup improve the overall success rate of the detection of a bomb without blowing it up?

3. There exists a scheme, involving many beam-splitters and something called the **quantum Zeno effect**, such that the success rate for detecting a bomb without blowing it up approaches 100%.
  Try to work it out, or find a solution on the internet.

### {}

A quantum machine has $N$ perfectly distinguishable configurations.
What is the maximum number of computational paths connecting a specific input with a specific output after $k$ steps of the machine?
Suppose you are using your laptop to add together amplitudes pertaining to each of the paths.
As $k$ and $N$ increase you may need more time and more memory to complete the task.
How does the execution time and the memory requirements grow with $k$ and $N$?
Will you need more time or more memory or both?

### {}

The classical theory of computation is essentially the theory of the universal Turing machine --- the most popular mathematical model of classical computation.
Its significance relies on the fact that, given a large but finite amount of time, the universal Turing machine is capable of any computation that can be done by any modern classical digital computer, no matter how powerful.
The concept of Turing machines may be modified to incorporate quantum computation, but we will not follow this path.
It is much easier to explain the essence of quantum computation talking about quantum logic gates and quantum Boolean networks or circuits.
The two approaches are computationally equivalent, even though certain theoretical concepts, e.g. in computational complexity, are easier to formulate precisely using the Turing machine model.
The main advantage of quantum circuits is that they relate far more directly to proposed experimental realisations of quantum computation.

### {}

In computational complexity the basic distinction is between polynomial and exponential algorithms.
Polynomial growth is good and exponential growth is bad, especially if you have to pay for it.
There is an old story about the legendary inventor of chess who asked the Persian king to be paid only by a grain of cereal, doubled on each of the 64 squares of a chess board.
The king placed one grain of rice on the first square, two on the second, four on the third, and he was supposed to keep on doubling until the board was full.
The last square would then have $2^{63}=9,223,372,036,854,775,808$ grains of rice, more than has been ever harvested on planet Earth, to which we must add the grains of all previous squares, making the total number about twice as large.
If we placed that many grains in an unbroken line we would reach the nearest star Alpha Centauri, our closest celestial neighbour beyond the solar system, about $4.4$ light-years away.^[One light year (the distance that light travels through a vacuum in one year) is $9.4607\times10^{15}$ metres.]
The moral of the story: if whatever you do requires an exponential use of resources, you are in trouble.

### {}

In order to make qualitative distinctions between how different functions grow we will often use the asymptotic big-$O$ notation.
For example, suppose an algorithm running on input of size $n$ takes $a n^2+bn+c$ elementary steps, for some positive constants $a, b$ and $c$.
These constants depend mainly on the details of the implementation and the choice of elementary steps.
What we really care about is that, for large $n$, the whole expression is dominated by its quadratic term.
We then say that the running time of this algorithm grows as $n^2$, and we write it as $O(n^2)$, ignoring the less significant terms and the constant coefficients.
More precisely, let $f(n)$ and $g(n)$ be functions from positive integers to positive reals.
You may think of $f(n)$ and $g(n)$ as the running times of two algorithms on inputs of size $n$.
We say $f=O(g)$,^[$f=O(g)$ is pronounced as "$f$ is big-oh of $g$".] which means that $f$ grows no faster than $g$, if there is a constant $c>0$ such that $f(n)\leqslant c g(n)$ for all sufficiently large values of $n$.
Essentially, $f=O(g)$ is a very loose analogue of $f \leqslant g$.
In addition to the big-$O$ notation, computer scientists often use $\Omega$ for lower bounds: $f=\Omega (g)$ means $g=O(f)$.
Again, this is a very loose analogue of $f \geqslant g$.

1. When we say that $f(n)=O(\log n)$, why don't we have to specify the base of the logarithm?
2. Let $f(n)=5n^3+1000n+50$. Is $f(n)=O(n^3)$, or $O(n^4)$, or both?
3. Which of the following statements are true?

    a. $n^k=O(2^n)$ for any constant $k$
    b. $n!=O(n^n)$
    c. if $f_1=O(g)$ and $f_2=O(g)$, then $f_1+f_2=O(g)$.

### {}

There exists a randomised algorithm which tests whether a given number $N$ is prime.^[Primality used to be given as the classic example of a problem in $\texttt{BPP}$ but not $\texttt{P}$. However, in 2002 a deterministic polynomial time test for primality was proposed by Manindra Agrawal, Neeraj Kayal, and Nitin Saxena. Thus, since 2002, primality has been in $\texttt{P}$.]
The algorithm always returns $\texttt{yes}$ when $N$ is prime, and the probability it returns $\texttt{yes}$ when $N$ is not prime is $\epsilon$, which not greater than a half (independently, each time you run the algorithm).
You run this algorithm (for the same $N$) $r$ times and each time the algorithm returns $\texttt{yes}$.
What is the probability that $N$ is not prime?

### {}

Suppose a randomised algorithm solves a decision problem, returning $\texttt{yes}$ or $\texttt{no}$ answers.
It gets the answer wrong with a probability not greater than $\frac12-\delta$, where $\delta>0$ is a constant.^[This result is known as the **Chernoff bound**.].
If we are willing to accept a probability of error no larger than $\epsilon$, then it suffices to run the computation $r$ times, where $r=O(\log 1/\epsilon)$.}

```{r,engine='tikz',engine.opts=list(template="tikz2pdf.tex"),fig.width=4}
\definecolor{primary}{RGB}{177,98,78}
\definecolor{secondary}{RGB}{91,132,177}
\begin{tikzpicture}
    \begin{axis}[
        xmin=-3,xmax=3,
        xlabel=$x$,
        ymin=-4,ymax=7,
        axis lines=middle,
        xtick={0},
        xticklabels={$0$},
        ytick={0},
        yticklabels={$0$},
        xlabel style={font=\Large,anchor=north,at={(axis description cs:0.95,0.35)}},
      ]
      \addplot[domain=-3:3,samples=200,line width=2pt,primary]{1+x};
      \addplot[domain=-3:3,samples=200,line width=2pt,secondary]{e^x};
    \end{axis}
    \node at (6.5,3.3) {$1+x$};
    \node at (5,5) {$e^x$};
\end{tikzpicture}
```

1. If we perform this computation $r$ times, how many possible sequences of outcomes are there?
2. Give a bound on the probability of any particular sequence with $w$ wrong answers.
3. If we look at the set of $r$ outcomes, we will determine the final outcome by performing a majority vote.
  This can only go wrong if $w>r/2$.
  Give an upper bound on the probability of any single sequence that would lead us to the wrong conclusion.
4. Using the bound $1-x\leq e^{-x}$, conclude that the probability of our coming to the wrong conclusion is upper bounded by $e^{-2r\delta^2}$.