One of the main objectives of CERN Quantum Technology Initiative (CERN QTI) is to investigate if quantum computing can be used in the field of high-energy physics. With this tutorial/exercise we are going to explore quantum machine learning through the use of Variational Quantum Circuits (VQCs) applied to solve the problem of classification between Supersymmetry (signal) and Standard Model (background) type of events.
The simulations used were produced by the CMS collaboration and aim to simulate proton-pronton collisions and detection by the CMS detector at tht LHC - CERN.
Stop 4-Body decay in 1 lepton Final State
We are going to focus on the search for stop pair production where each stop decays via the four body decay mode and there's only a single lepton in the final state.
We'll be looking in the compressed scenario where: Δm = m(stop)-m(neutralino) < 80 GeV using 2017 samples. Since we don't know the mass of the stop nor the mass of the neutralino our signal will be a Δm (from 10 to 80 GeV) composed of different signal points with the same Δm.
Our background is all the standard model processes that have the same signature as the signal: 1 lepton, jets and missing transverse energy(MET). To separate signal from background we're going to develop a Variational Quantum Classifier(VQC).
You'll find everything you need to know on the public CMS results: CMS-PAS-SUS-21-003 !
Machine learning (general) approach
Use data samples to construct a model that minimizes cost of unseen data.
The Qubit
A qubit (or quantum bit) is a basic unit of quantum information, meaning we can take advantage of properties found in quantum mechanics and use them in our computations such as superposition and entanglement.
While the classic bit can only have a value of 1 or 0, a qubit can be in a superposition of 1 and 0. The qubit's 0 and 1 states are expressed as follows:
∣0⟩=(10),∣1⟩=(01)
As a consequence of this mathematical description, unlike a bit, a qubit is not limited to being in one of these two states. Qubits can exist in what's known as a superposition of states. A superposition is a linear combination of two basis vectors. Mathematically:
∣ψ⟩=α∣0⟩+β∣1⟩=α(10)+β(01)=(αβ)
where α and β are complex numbers and have to satisfy:
αα∗+ββ∗=1
To note that when a qubit is in a superposition, it is not in both ∣0⟩ and ∣1⟩ "simultaneously", but is rather in some intermediate combination of the two.
We can visualize a qubit as follows:
This is called the Bloch sphere and you can play with it in this demo !
A Quantum Circuit
Quantum circuits are a common means of visually representing the sequence of operations that are performed on qubits during a quantum computation. They consist of a set of operations, or gates, applied to a set of qubits (or more generally, wires). Each wire in the diagram represents a qubit. Circuits are read from left to right, and this is the order in which operations are applied.
An example of a quantum circuit:
Quantum circuits end in a measurement of one or more qubits. For example, the circuit below has 3 qubits that start in state |0⟩, applies 5 operations, and measures every qubit at the end.
Quantum computation is all about manipulating qubit states by using gates and seeing the outcome of such manipulations by measuring the qubits.
The bit flip operation: NOT Gate
Performing quantum operations involves multiplication by matrices that send valid, normalized quantum states to other normalized quantum states. The simplest quantum operation as the following effect:
X∣0⟩=∣1⟩,
X∣1⟩=∣0⟩
Known as bit flip, Pauli-X or NOT gate, it can be mathematically represented as:
X=(0110)
and drawn in a quantum circuit diagram like:
Let's now confirm that all of this works mathematically:
X∣0⟩=[0110][10]=[01]=∣1⟩
We've flipped a quantum bit!
Rotations, rotations, rotations: the RX, RZ, and RY Gates
As we saw, we can represent a qubit in 3D-space - the Bloch sphere. RX, RZ, and RY rotate the qubit's state vector about the appropriate axis.
These operations become very useful in the context of QML because it means we can manipulate the qubits as a function of a parameter/angle θ in order to find some kind of optimum in the quantum algorithm we design.
The matrix representation of an RX(θ) is:
RX(θ)=(cos(2θ)−isin(2θ)−isin(2θ)cos(2θ))
Multiple Qubit states
A single classical bit has 2 possible states, a qubit has 2 complex amplitudes. In a similar manner, 2 classical bits have 4 possible states:
00 01 10 11
To describe a state of 2 qubits, 4 complex amplitudes are required:
Until here, we saw gates that act only on single qubits, to complete all the ingredients we need gates that can act on multiple qubits in order to entangle them. The idea being entanglement is that knowing the state of a qubit, means knowing the state of another qubit. And the idea behind the CNOT gate is that we can use a qubit to control the state of another one. When this 2 concepts are merged together we can entagle qubits. Let's see how that's done.
By definition, a state is entangled if it cannot be described as a tensor product of individual qubit states (if it can, it is separable). An entangled state can only be described by specifying the full state. One example of an entangled state:
∣ψ⟩=21(∣00⟩+∣11⟩)
In order to describe this state using the kronecker tensor we need to find a0, a1, b0 and b1 such that:
b0a0=1b0a1=0b1a0=0b1a1=1
as one can see, there is no solution, it's not possible to describe this state as two separate qubits, so they are entangled!
Let's now see how we can make such a state using the controlled-NOT or CNOT gate. This is a two-qubit gate that performs an operation (specifically, a Pauli X or NOT gate) on one qubit depending on the state of another:
The first qubit, denoted with the solid dot, is the control qubit. The state of this qubit does not change, but its state is the one that determines whether the operation is performed. The second qubit is the target qubit. It's Matrix representation:
CNOT=⎝⎛1000010000010010⎠⎞
CNOT acting on the computational basis states ∣ab⟩:
∣ab⟩
CNOTab∣ab⟩
∣00⟩
∣00⟩
∣01⟩
∣01⟩
∣10⟩
∣11⟩
∣11⟩
∣10⟩
If, the first qubit is the target and the second one the control, CNOT is:
CNOT=⎝⎛1000000100100100⎠⎞
We have seen how this acts on classical states, but let’s now see how it acts on a qubit in a superposition ∣q0⟩ (see Note) and ∣q1⟩ in state ∣0⟩:
∣q0⟩=21(∣0⟩+∣1⟩);
∣q1⟩ = ∣0⟩
The statevector of such system is: [212100]
∣q0q1⟩=21(∣00⟩+∣01⟩)
∣q0q1⟩ is not entangled, but if we apply the CNOT gate:
Note: this is known as the ∣+⟩ state in the Bloch sphere and can be obtained by applaying an Hadamard gate to ∣0⟩.
Variational quantum circuits
VQCs are the practical embodiement of the idea: Let's train our quantum computers like we train our neural networks
it looks like trainable model in θ...
How could quantum computing help with ML?
Data
speed up sampling from data distributions
use fewer data samples (e.g., Arunachalam 1701.06806)
Model
speed up existing models (Pararo et al. 1401.4997, Low et al. 1402.7359, Allcock et al. 1812.03089)
design better models (Amin et al. 1601.02036, Benedetti et al. 1906.07682)
Cost
speed up optimisation (Wiebe et al. 1204.5242, Rebentrost et al. 1307.0471,
Denil & Freitas ∼ 2012 cs.ubc.ca/ ̃nando/)
find better solutions
Can we take advantage from Quantum Machine Learning in High-Energy Physics?
The strategy
We want to somehow encode our information into qubits, create a relation between them and in the end measure a qubit. That qubit should tell us if the event that it processed is either a signal type or background type of event: either SUSY or SM.
We are going to use supervised learning in a strategy similar to the one used in Neural Network:
Prepare labeled data
Build a trainable quantum circuit: Variational Quantum Circuit (VQC)
Train our VQC
Comparing quantum VS classical: VQC VS BDT
Conclusions
Software needed
Setup
Copy
import random
import uproot
import numpy
from itertools import chain
import pennylane as qml
from pennylane import numpy as np
np.random.seed(42)
from sklearn import decomposition
from sklearn import metrics as metrics
from sklearn.preprocessing import minmax_scale
from sklearn.model_selection import train_test_split
import pandas as pd
import matplotlib.pyplot as plt
from math import log
from alive_progress import alive_bar
plt.rcParams['figure.figsize'] = [24, 24]
# plot all input variables and BDT output
plotInVars = False
if plotInVars:
fig, axs = plt.subplots(5, 3)
axs = axs.ravel()
canvas = 0
for feature in myFeatures:
axs[canvas].hist(sgDict[feature], density=True, color = "blue", alpha=0.25)
axs[canvas].hist(bkDict[feature], density=True, color = "red", alpha=0.25)
axs[canvas].set_title(feature)
canvas+= 1
plt.show()
# Do 2D plots
plot2D = False
if plot2D:
flist = ["LepPt", "Met", "Jet1Pt","mt","BDT"]
fig, axs = plt.subplots(len(flist), len(flist))
fig.subplots_adjust(hspace = 0.5, wspace=0.5)
axs = axs.ravel()
canvas = 0
for xFeature in flist:
for yFeature in flist:
axs[canvas].scatter(sgDict[xFeature], sgDict[yFeature], color = "blue", alpha=0.25)
axs[canvas].scatter(bkDict[xFeature], bkDict[yFeature], color = "red", alpha=0.25)
axs[canvas].set_title(xFeature+" VS "+yFeature)
canvas+= 1
plt.show()
labeling the data
Copy
# Fix signal XS
sgDict["XS"] = numpy.ones(sgJet1Pt.size)
## Add categoty: 1 for signal and -1 for background
sgDict["category"] = numpy.ones(sgJet1Pt.size)
bkDict["category"] = -numpy.ones(bkJet1Pt.size)
def dataForQDevice(n_evts, sampleDict, features):
tensor = np.zeros([n_evts,len(features)])
for ev in range(0,n_evts):
event = np.zeros([1,len(features)])
for feature in range(0,len(features)):
event[0][feature] = sampleDict[features[feature]][ev]
#event[0][feature] = sampleDict[features[feature]][ev]*sampleDict["XS"][ev]/n_evts
tensor[ev] = event
return tensor
Copy
# events to train
n_evts = 512
features = ["LepPt", "Met", "Jet1Pt","mt","HT","JetHBpt"]
n_features = len(features)
weights = ["XS","Nevt"]
features.extend(weights)
sg_evts = n_evts # use sgJet1Pt.size if you want to use the complete sample size
bk_evts = n_evts # use bkJet1Pt.size if ...
# get events to train in the VQC - features
X0 = dataForQDevice(sg_evts,sgDict,features)
X1 = dataForQDevice(bk_evts,bkDict,features)
# events category
Y0 = np.array(sgDict["category"][:sg_evts])
Y1 = np.array(bkDict["category"][:bk_evts])
# events weight
W0 = np.array(sgDict["XS"][:sg_evts])/np.array(sgDict["Nevt"][:sg_evts])
W1 = np.array(bkDict["XS"][:bk_evts])/np.array(bkDict["Nevt"][:bk_evts])
X = np.concatenate([X0,X1], axis=0)
Y = np.concatenate([Y0,Y1], axis=0)
W = np.concatenate([W0,W1], axis=0)
normalizing data for the quantum model
We are going to be performing rotations on the qubits, the idea is to encode our data into angles and perform rotations until our final measurment gives us the predicted category for the event.
Copy
# normalize features
X[:,:-2] = minmax_scale(X[:,:-2],feature_range=(0, numpy.pi))
# split data into train + validation and test
X_train_val , X_test , y_train_val , y_test=train_test_split(X , Y , test_size=0.2)
2. Build a trainable quantum circuit: Variational Quantum Circuit (VQC)
Copy
# we enconde each feature in a qubit
n_qubits = n_features
# start a quantum device with n_features qubits
dev = qml.device('default.qubit', wires=n_qubits) # qml.device('qiskit.aer', wires=n_qubits)
# if you want to use IBM's qiskit
angle embedding
Encodes N features into the rotation angles of n qubits, where N≤n.
This is the first step into turning our data into quantum information.
strongly entangling layers
Layers consisting of single qubit rotations and entanglers, inspired by the circuit-centric classifier design arXiv:1804.00633
# classification of the events used in this exercise by the VQC
y_sig = prediction_numpy(theta_weights, theta_bias, X[:n_evts])
y_bkg = prediction_numpy(theta_weights, theta_bias, X[n_evts:])
classification output from BDT
Copy
# by the BDT, this one done apriori
BDT0 = np.array(sgDict["BDT"][:sg_evts])
BDT1 = np.array(bkDict["BDT"][:bk_evts])
BDT = np.concatenate([BDT0,BDT1], axis=0)
Copy
# plotting the prediction vs the real value
plt.rcParams['figure.figsize'] = [15,5]
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
binning=np.arange(-1,1,0.1)
ax1.set_title("Ideal")
ax1.hist(Y0, 1, weights=W0, color = "blue", alpha=0.25, label="Signal")
ax1.hist(Y1, 1, weights=W1, color = "red" , alpha=0.25, label="Background")
ax1.set_yscale("log")
ax1.set_xlim([-1, 1])
ax1.legend(loc="best")
ax2.set_title("VQC")
ax2.hist(y_sig, bins=binning, color = "blue", weights=W0, alpha=0.25, label="Signal")
ax2.hist(y_bkg, bins=binning, color = "red" , weights=W1, alpha=0.25, label="Background")
ax2.set_yscale("log")
ax2.set_xlim([-1, 1])
ax2.legend(loc="best")
ax3.set_title("BDT")
ax3.hist(BDT0, bins=binning, color = "blue", weights=W0, alpha=0.25, label="Signal")
ax3.hist(BDT1, bins=binning, color = "red" , weights=W1, alpha=0.25, label="Background")
ax3.set_yscale("log")
ax3.set_xlim([-1, 1])
ax3.legend(loc="best")
Until now, it's not obvious which algorithm is better and the overall impact of using it in the search for supersymmetry. We need to build a function that takes as inputs the signal and background yield precicted by placig a 1-D cut on the output our machine learning algorithm (remember, this type of methods take N-Dimensional variables and translate them into a 1-D output). In experimental particle physics we call this funtion a Figure of Metric(FOM):
# measure the process yield as a function of the ML cut
def getYield(dataPD, ml="VQC", cut=0.0, luminosity=41479, splitfactor=2):
if ml == "VQC":
selEvents = dataPD[dataPD.VQC>cut]
elif ml == "BDT":
selEvents = dataPD[dataPD.BDT>cut]
Yield = selEvents.weight.sum() * luminosity * splitfactor
YieldUnc = np.sqrt(np.sum(np.square(selEvents.weight))) * luminosity * splitfactor
return Yield, YieldUnc
Copy
# FOM as a function of the signal and background yields for different ML output cuts
def FOM(signal, background, f=0.20):
s, sErr = signal
b, bErr = background
if b == 0: b=0.0001 # check if NaN => TODO: fix this hack
sigmaB2 = (f*b)**2 # we set the relative uncertainty of the background yield
# to be 20%: f=0.20
term1 = (s+b)*log(((s+b)*(b+sigmaB2))/(b**2+(s+b)*sigmaB2))
term2 = b**2/sigmaB2*log(1+(sigmaB2*s)/(b*(b+sigmaB2)))
fom = (2*term1-term2)**0.5
return fom
Now, we want to see how the FOM evolves as a function of the ML output for both methods and compare them.
# plot fom
plt.title("FOM")
plt.plot(cuts,fomEvo_vqc, color = "blue", label="VQC")
plt.plot(cuts,fomEvo_bdt, color = "red", label="BDT")
plt.xlabel("FOM >")
plt.legend(loc='upper left')
plt.grid()
<Figure size 1080x360 with 1 Axes>
6. Conclusions
We showed that with quantum simulation we can use the ideas of quantum mechanics to build a classifier suitable for High Energy Physics
Like BDTs, VQCs are linear classifiers and have similar performance: VQCs don't provide any known advantage. The key is in how we embedd the feature map
VQCs can achieve the performance of BDTs with less epochs and samples