---
author: David Leppla-Weber
#title: Search for excited quark states decaying to qW/qZ
lang: en-GB
header-includes: |
    \usepackage[onehalfspacing]{setspace}
    \usepackage{siunitx}
    \usepackage{tikz-feynman}
    \usepackage{csquotes}
    \usepackage{abstract}
    \pagenumbering{gobble}
    \setlength{\parskip}{0.5em}
    \bibliographystyle{lucas_unsrt}
documentclass: article
geometry:
- top=2.5cm
- left=2.5cm
- right=2.5cm
- bottom=2cm
papersize: a4
mainfont: Times New Roman
fontsize: 12pt
toc: false
nocite: |
    @*
figPrefix: "Fig."
tblPrefix: "Table"
secPrefix: "Sec."
eqnPrefix: "Eq."
---
\begin{titlepage}
\begin{center}
       \vspace*{1cm}
       \Huge
       \rule{\textwidth}{0.1cm}
       \textbf{Search for excited quark states decaying to qW/qZ with the CMS experiment}
       \rule{\textwidth}{0.1cm}

       \vspace{2.5cm}
       \Large
       by\\
       \LARGE
       David Leppla-Weber\\
       \Large
       \vspace{0.5cm}
       Born on\\
       18.09.1996

       \vfill

       Bachelor's thesis in Physics\\
       Universität Hamburg\\
       November 2019

   \end{center}
\end{titlepage}

\newpage
\mbox{}
\vfill

1. Examiner: Dr. Andreas Hinzmann
2. Examiner: Jun.-Prof. Dr. Gregor Kasieczka

\newpage

\begin{abstract}
A search for an excited quark state, called q*, is presented using proton-proton collision data recorded by the CMS
experiment at the LHC during the years 2016, 2017 and 2018 at a centre-of-mass energy of $\sqrt{s} =
\SI{13}{\tera\eV}$, corresponding to an integrated luminosity of $\SI{137.19}{\per\femto\barn}$. Its decays to q + W
and q + Z are analysed, with the vector boson further decaying hadronically, giving the final states $q + q\bar{q}'$
and $q + q\bar{q}$, respectively, which are reconstructed as two jets. The dijet invariant mass spectrum of those two
jets is used to look for a resonance and to reconstruct the q* mass. To identify jets originating from the decay of a
vector boson, a V-tagger is needed. For that, the new DeepAK8 tagger, based on a neural network, is compared to the
older N-subjettiness tagger. No significant deviation from the Standard Model is observed. The q* is therefore
excluded up to a mass of 6.1\ TeV (qW) and 5.5\ TeV (qZ) at a confidence level of 95\ \%. These limits are about
1\ TeV higher than those of a previous analysis of data with an integrated luminosity of
$\SI{35.92}{\per\femto\barn}$ collected by the CMS experiment in 2016, which excluded the q* particle up to a mass of
5.0\ TeV and 4.7\ TeV, respectively. The DeepAK8 tagger is found to currently perform at a level comparable to the
N-subjettiness tagger, giving a limit that is $\SI{0.1}{\tera\eV}$ better for the decay to qW but
$\SI{0.6}{\tera\eV}$ worse for the decay to qZ. By optimising the neural network's training for the datasets of 2016,
2017 and 2018, the sensitivity can likely be improved.

\end{abstract}
\newpage
\renewcommand{\abstractname}{Zusammenfassung}
\begin{abstract}

In dieser Arbeit wird eine Suche nach angeregten Quarkzuständen, genannt q*, durchgeführt. Dafür werden Daten von
Proton-Proton Kollisionen am LHC mit einer integrierten Luminosität von $\SI{137.19}{\per\femto\barn}$ analysiert,
welche über die Jahre 2016, 2017 und 2018 bei einer Schwerpunktsenergie von $\sqrt{s} = \SI{13}{\tera\eV}$ vom CMS
Experiment aufgenommen wurden. Es wird der Zerfall des q* Teilchens zu q + W und q + Z untersucht, bei anschließendem
hadronischen Zerfall des Vektorbosons zu $q\bar{q}'$ bzw. $q\bar{q}$. Der gesamte Zerfall resultiert damit in zwei Jets,
mithilfe deren invariantem Massenspektrum die q* Masse rekonstruiert und nach einer Resonanz gesucht wird. Zur
Identifizerung von Jets, welche durch den Zerfall eines Vektorbosons entstanden sind, wird ein V-Tagger benötigt.
Hierfür wird der neue DeepAK8 Tagger, welcher auf einem neuronalen Netzwerk basiert, mit dem älteren N-Subjettiness
Tagger verglichen. Im Ergebnis kann keine signifikante Abweichung vom Standardmodell beobachtet werden. Das q* Teilchen
wird mit einem Konfidenzniveau von 95 \% bis zu einer Masse von 6.1\ TeV (qW) bzw. 5.5\ TeV (qZ) ausgeschlossen. Das Limit
liegt etwa 1\ TeV höher, als das anhand des $\SI{35.92}{\per\femto\barn}$ großen Datensatzes von 2016 gefundene von 5.0
TeV bzw. 4.7\ TeV. Beim Zerfall zu qW erzielt der DeepAK8 Tagger ein um $\SI{0.1}{\tera\eV}$ besseres Ergebnis, als der
N-Subjettiness Tagger, beim Zerfall zu qZ jedoch ein um $\SI{0.6}{\tera\eV}$ schlechteres. Durch Verbesserung des
Trainings des neuronalen Netzwerkes für die drei Datensätze von 2016, 2017 und 2018, gibt es aber noch Potential die
Sensitivität zu verbessern.

\end{abstract}

\newpage
\setcounter{tocdepth}{3}
\tableofcontents

\newpage
\pagenumbering{arabic}

# Introduction

The Standard Model is a very successful theory that describes most of the interactions between elementary particles.
Still, it has a number of limitations which show that it is not yet a complete "theory of everything". Many theories
beyond the Standard Model exist that extend it in different ways in order to address these shortcomings.

One category of such theories is based on composite quark models. The Standard Model currently treats quarks as
elementary particles. Composite quark models, on the other hand, predict that quarks consist of so far unknown
particles or can bind to other particles through unknown forces. This could explain the symmetries between particles
and reduce the number of constants needed to describe the properties of the known particles. One common prediction of
these theories is the existence of excited quark states: quark states of higher energy that can decay to an unexcited
quark under the emission of a boson. This thesis searches for their decay to a quark and a vector boson that then
decays hadronically. The final state of this decay consists only of quarks, which form two jets, making Quantum
Chromodynamics processes the main background.

In a previous analysis [@PREV_RESEARCH], an exclusion limit on the mass of an excited quark has already been set using
data from the 2016 run of the Large Hadron Collider with an integrated luminosity of $\SI{35.92}{\per\femto\barn}$.
Since then, a lot more data has been collected by the CMS experiment, adding up to $\SI{137.19}{\per\femto\barn}$ of
data usable for analysis. This thesis takes advantage of this larger dataset as well as of a new technique, based on a
deep neural network, to identify decays of highly boosted particles. By using more data and new tagging techniques, it
aims to either confirm the existence of the q\* particle or raise the previously set lower limits on its mass of 5 TeV
for the decay to qW and 4.7 TeV for the decay to qZ. It also directly compares the performance of this new tagging
technique to that of an older tagger, based on jet substructure, which was used in the previous analysis.

In chapter 2, a theoretical background will be presented briefly explaining the Standard Model, its shortcomings and the
theory of excited quarks. Then, in chapter 3, the Large Hadron Collider and the Compact Muon Solenoid, the detector that
collected the data for this analysis, will be described. After that, in chapters 4-7, the main analysis part follows,
describing how the data were used to extract limits on the mass of the excited quark particle. At the very end, in
chapter 8, the results are presented and compared to previous research.

\newpage

# Theoretical motivation

This chapter presents a short summary of the theoretical background relevant to this thesis. It first gives an
introduction to the Standard Model itself and some of the questions it leaves open. It then explains the theory of the
q\* and the background processes of quantum chromodynamics, which are the relevant phenomena for the search described
in this thesis.

## Standard Model {#sec:sm}

The Standard Model of particle physics has proved to be very successful in describing three of the four fundamental
interactions currently known: the electromagnetic, weak and strong interactions. The fourth, gravity, has not yet been
successfully included in this theory.

The Standard Model divides all particles into fermions, which carry half-integer spin, and bosons, which carry integer
spin. All known elementary fermions have spin $\frac{1}{2}$, while the known elementary bosons have either spin one
(gauge bosons) or spin zero (scalar bosons). Fermions are further classified into quarks and leptons.
Both quarks and leptons are grouped into three generations, each of which contains two particles of different flavour.
For leptons, each generation consists of a charged lepton, namely the electron, the muon or the tau, and its
corresponding neutrino. The three quark generations consist of, first, the up and down, second, the charm and strange,
and third, the top and bottom quark. A full list of the particles of the Standard Model can be found in [@fig:sm].
Furthermore, every fermion has an associated antiparticle with opposite charge. Bound states of multiple quarks also
exist and are called hadrons.

![
Elementary particles of the Standard Model and their mass, charge and spin. Taken from [@SM].
](./figures/sm_wikipedia.pdf){width=50% #fig:sm}

The gauge bosons, namely the photon, the $W^\pm$ bosons, the $Z^0$ boson and the gluon, are the mediators of the
different forces of the Standard Model.

The photon mediates the electromagnetic force and therefore interacts with all electrically charged particles. It
carries no electric charge itself and has no mass. Possible interactions include scattering and absorption. Photons of
different energies can also be described as electromagnetic waves of different wavelengths.

The $W^\pm$ and $Z^0$ bosons mediate the weak force. All quarks and leptons carry a flavour, which is conserved in all
interactions except the weak one. There, a quark or lepton can change its flavour by interacting with a $W^\pm$ boson.
For quarks, the probabilities of the different flavour changes are determined by the Cabibbo-Kobayashi-Maskawa (CKM)
matrix:

\begin{equation}
  V_{CKM} =
    \begin{pmatrix}
      |V_{ud}| & |V_{us}| & |V_{ub}| \\
      |V_{cd}| & |V_{cs}| & |V_{cb}| \\
      |V_{td}| & |V_{ts}| & |V_{tb}|
    \end{pmatrix}
  =
    \begin{pmatrix}
      0.974 & 0.225 & 0.004 \\
      0.224 & 0.974 & 0.042 \\
      0.008 & 0.041 & 0.999
    \end{pmatrix}
\end{equation}

The probability of a quark changing its flavour from $i$ to $j$ is proportional to the square of the absolute value of
the matrix element $V_{ij}$. The diagonal elements are close to one, so a change of flavour within the same generation
is far more likely than any other flavour change.
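
As a small numerical illustration, the squared magnitudes can be computed directly from the values quoted in the matrix
above. The following is only a sketch; the dictionary layout and the function name `transition_probability` are chosen
here for illustration and are not part of the analysis code.

```python
# Magnitudes |V_ij| as quoted in the CKM matrix above
# (rows: up-type quarks u, c, t; columns: down-type quarks d, s, b).
CKM = {
    "u": {"d": 0.974, "s": 0.225, "b": 0.004},
    "c": {"d": 0.224, "s": 0.974, "b": 0.042},
    "t": {"d": 0.008, "s": 0.041, "b": 0.999},
}


def transition_probability(up_type: str, down_type: str) -> float:
    """Relative probability |V_ij|^2 of a flavour change between two quarks."""
    return CKM[up_type][down_type] ** 2


for up_type, row in CKM.items():
    probabilities = {down: transition_probability(up_type, down) for down in row}
    rounded = {down: round(p, 4) for down, p in probabilities.items()}
    # Each row sums to approximately one, as expected for a unitary matrix.
    print(up_type, rounded, "sum =", round(sum(probabilities.values()), 3))
```

For each row, the entry belonging to the same generation carries about 95 % or more of the total, while the remaining
entries are suppressed by at least one order of magnitude.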

Due to their high masses of 80.39 GeV and 91.19 GeV, respectively, the $W^\pm$ and $Z^0$ bosons decay very quickly,
either in the leptonic or in the hadronic decay channel. In the leptonic channel, the $W^\pm$ decays to a charged
lepton and the corresponding neutrino or antineutrino; in the hadronic channel, it decays to a quark and an antiquark
of a different flavour. Since the $Z^0$ boson carries no charge, it always decays to a fermion and its antiparticle.
In the leptonic channel this might be, for example, an electron-positron pair, in the hadronic channel an up and
anti-up quark pair. This thesis examines the hadronic decay channel, in which the vector boson decays to two quarks.

Quantum chromodynamics (QCD) describes the strong interaction. It applies to all particles carrying colour charge
(e.g. quarks). The force is mediated by gluons. These bosons carry colour as well, although not a single colour but a
combination of a colour and an anticolour. They therefore exist in eight different variants and can interact with each
other. As a result, processes in which a gluon splits into two gluons are possible. Furthermore, the energy stored in
the strong field between two colour-carrying particles increases with their distance, so that at a certain point it
becomes energetically more favourable to form a new quark-antiquark pair than to separate the two particles even
further. This effect is known as colour confinement. Because of it, colour-carrying particles cannot be observed
directly. Instead, they form so-called jets, cone-like structures made of hadrons and other particles, which cause
hadronic showers in the detector. The formation of these hadrons is called hadronisation [@HADRONIZATION].

### Shortcomings of the Standard Model

While being very successful in describing the effects observed at particle colliders and the particles reaching Earth
from cosmic sources, the Standard Model still has several shortcomings.

- **Gravity**: As already noted, the Standard Model does not include gravity as a force.
- **Dark Matter**: Observations of the rotational velocity of galaxies cannot be explained by the known matter alone.
  Dark matter is currently the most popular explanation for these observations.
- **Matter-antimatter asymmetry**: The amount of matter vastly outweighs the amount of antimatter in the observable
  universe. This cannot be explained by the Standard Model, which predicts similar amounts of matter and antimatter.
- **Symmetries between particles**: Why do exactly three generations of fermions exist? Why are the charges of the
  quarks exact multiples of one third of the charge of the electron? How are the masses of the particles related?
  These and further questions cannot be answered by the Standard Model.
- **Hierarchy problem**: The weak force is approximately $10^{24}$ times stronger than gravity and, so far, there is
  no satisfactory explanation as to why that is.

## Excited quark states {#sec:qs}

One category of theories that try to explain the symmetries between the particles of the Standard Model are the
composite quark models. They state that quarks consist of so far unknown particles, which could explain the symmetries
between the different fermions. A common prediction of these models is the existence of excited quark states (q\*,
q\*\*, q\*\*\*...) [@QSTAR_THEORY]. Similar to atoms, which can be excited by the absorption of a photon and then decay
again under emission of a photon with an energy corresponding to the excitation, excited quark states could decay under
the emission of any boson. Quarks are measured to be smaller than $10^{-18}$ m, which corresponds to an energy scale of
approximately 1 TeV. The excited quark states are therefore expected in that energy region, which causes the emitted
boson to be highly boosted.
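
The relation between a length scale and an energy scale can be sketched with the conversion constant
$\hbar c \approx 197.327$ MeV fm. The snippet below is an order-of-magnitude illustration only. Whether the length is
identified with a reduced wavelength ($E = \hbar c / r$) or with a full wavelength ($E = hc / \lambda$) is a matter of
convention that is assumed here and not taken from the cited theory paper; the function name is likewise illustrative.

```python
import math

# Conversion constant hbar * c in MeV * fm (CODATA value).
HBAR_C_MEV_FM = 197.3269804


def energy_scale_tev(length_m: float, full_wavelength: bool = False) -> float:
    """Energy scale in TeV associated with a length scale given in metres.

    With full_wavelength=False the estimate is E = hbar*c / r,
    with full_wavelength=True it is E = h*c / lambda = 2*pi*hbar*c / lambda.
    """
    length_fm = length_m / 1e-15
    energy_mev = HBAR_C_MEV_FM / length_fm
    if full_wavelength:
        energy_mev *= 2.0 * math.pi
    return energy_mev / 1e6  # MeV -> TeV


print(round(energy_scale_tev(1e-18), 2))                        # about 0.2 TeV
print(round(energy_scale_tev(1e-18, full_wavelength=True), 2))  # about 1.2 TeV
```

Both conventions give a value between a few hundred GeV and a few TeV, consistent with the TeV scale quoted above.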

\begin{figure}
\centering
\feynmandiagram [large, horizontal=qs to v] {
  a -- qs -- b,
  qs -- [fermion, edge label=\(q*\)] v,
  q1 [particle=\(q\)] -- v -- w [particle=\(W\)],
  q2 [particle=\(q\)] -- w -- q3 [particle=\(q\)],
};
\caption{Feynman diagram showing the decay of a q* particle to a W boson and a quark with the W boson decaying
hadronically.} \label{fig:qsfeynman}
\end{figure}

This thesis searches data collected by the CMS experiment in the years 2016, 2017 and 2018 for the decay of a single
excited quark state q\* to a quark and a vector boson. An example of a q\* decaying to a quark and a W boson can be
seen in [@fig:qsfeynman]. As explained in [@sec:sm], the vector boson can then decay either in the hadronic or in the
leptonic decay channel. This analysis investigates only the hadronic channel with two quarks from the boson decay.
Because the boson is highly boosted, these two quarks are very close together and therefore appear to the detector as
only one jet. This means that the investigated decay of a q\* particle has two jets in the final state and is therefore
hard to distinguish from the QCD background described in [@sec:qcdbg].

The choice of only examining the decay of the q\* particle to the vector bosons is motivated by the branching ratios
calculated for the decay [@QSTAR_THEORY]:


: Branching ratios of the decaying q\* particle.

| Decay mode                | Branching ratio [%] | Decay mode                | Branching ratio [%] |
|---------------------------|---------------------|---------------------------|---------------------|
| $U^* \rightarrow ug$      | 83.4                | $D^* \rightarrow dg$      | 83.4                |
| $U^* \rightarrow dW$      | 10.9                | $D^* \rightarrow uW$      | 10.9                |
| $U^* \rightarrow u\gamma$ | 2.2                 | $D^* \rightarrow d\gamma$ | 0.5                 |
| $U^* \rightarrow uZ$      | 3.5                 | $D^* \rightarrow dZ$      | 5.1                 |

The decays to the vector bosons have the second highest branching ratio. The decay to a quark and a gluon is dominant,
but virtually impossible to distinguish from the QCD background described in the next section. This makes the decays
to the vector bosons the most promising choice.

To reconstruct the mass of the q\* particle from an event identified as the decay of such a particle, the dijet
invariant mass has to be calculated. This is done by adding the four-momenta of the two jets in the final state, where
a four-momentum is a vector consisting of the energy and the momentum of a particle. The mass then follows from the
summed four-momentum by solving $E=\sqrt{p^2 + m^2}$ (in natural units) for $m$, which gives $m = \sqrt{E^2 - p^2}$.
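
The calculation can be sketched in a few lines of Python. This is a minimal illustration in natural units with
Cartesian momentum components; the class `FourMomentum` and the function `dijet_mass` are names introduced here and do
not refer to the analysis framework.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FourMomentum:
    """Four-momentum (E, px, py, pz) in natural units, e.g. GeV."""
    e: float
    px: float
    py: float
    pz: float

    def __add__(self, other: "FourMomentum") -> "FourMomentum":
        return FourMomentum(self.e + other.e, self.px + other.px,
                            self.py + other.py, self.pz + other.pz)

    @property
    def mass(self) -> float:
        """Invariant mass m = sqrt(E^2 - |p|^2)."""
        m_squared = self.e ** 2 - (self.px ** 2 + self.py ** 2 + self.pz ** 2)
        # Guard against tiny negative values caused by rounding.
        return math.sqrt(max(m_squared, 0.0))


def dijet_mass(jet1: FourMomentum, jet2: FourMomentum) -> float:
    """Invariant mass of the system formed by two jets."""
    return (jet1 + jet2).mass


# Two massless, back-to-back jets with 1500 GeV each.
jet1 = FourMomentum(1500.0, 1500.0, 0.0, 0.0)
jet2 = FourMomentum(1500.0, -1500.0, 0.0, 0.0)
print(dijet_mass(jet1, jet2))
```

In this idealised example each jet is massless on its own, yet the dijet system has an invariant mass of 3000 GeV,
which is the quantity in which a q\* resonance would show up.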

A search for the excited quark predicted by this theory has already been performed in [@PREV_RESEARCH], analysing data
with an integrated luminosity of $\SI{35.92}{\per\femto\barn}$ recorded by the CMS experiment in 2016. Using the
hadronic decay of the vector boson, it excluded the q\* particle up to a mass of 5 TeV for the decay to qW and 4.7 TeV
for the decay to qZ. This thesis aims to either exclude the particle up to higher masses or find a resonance showing
its existence, using the larger dataset that is available now.

### Quantum Chromodynamic background {#sec:qcdbg}

In this thesis, a decay with two jets in the final state is analysed. It is therefore hard to distinguish the signal
processes from QCD processes, which can also produce two jets in the final state, as can be seen in
[@fig:qcdfeynman]. Such processes occur very frequently in proton-proton collisions like those at the Large Hadron
Collider. This is a consequence of the structure of the proton. It consists not only of three quarks, called valence
quarks, but also of gluons and many quark-antiquark pairs, called sea quarks, which arise from the self-interaction of
the gluons binding the three valence quarks. Therefore the QCD multijet background is the dominant background for the
signal described in [@sec:qs].

\begin{figure}
\centering
\feynmandiagram [horizontal=v1 to v2] {
    q1 [particle=\(q\)] -- [fermion] v1 -- [gluon] g1 [particle=\(g\)],
    v1 -- [gluon] v2,
    q2 [particle=\(q\)] -- [fermion] v2 -- [gluon] g2 [particle=\(g\)],
};
\feynmandiagram [horizontal=v1 to v2] {
    g1 [particle=\(g\)] -- [gluon] v1 -- [gluon] g2 [particle=\(g\)],
    v1 -- [gluon] v2,
    g3 [particle=\(g\)] -- [gluon] v2 -- [gluon] g4 [particle=\(g\)],
};
\caption{Two examples of QCD processes resulting in two jets.} \label{fig:qcdfeynman}
\end{figure}

\newpage

# Experimental Setup

This chapter describes the experimental setup used to collect the data analysed in this thesis.

## Large Hadron Collider

The Large Hadron Collider (LHC) [@LHC_MACHINE] is the world's largest and most powerful particle accelerator. It has a
circumference of 27 km and can accelerate two beams of protons to an energy of 6.5 TeV each, resulting in collisions
at a centre-of-mass energy of 13 TeV. It is home to several experiments, among others the Compact Muon Solenoid (CMS),
which recorded the data used for the search presented in this thesis. CMS is a general-purpose detector built to
investigate the particles produced in the collisions. The LHC can also collide ions, but this capability is of no
interest for this research.

Because the two colliding beams consist of particles of the same charge, it is not possible to use the same magnetic
field for both beams. Therefore, the two rings have opposite magnetic dipole fields, which guide the beams in opposite
directions.

Particle colliders are characterised by their luminosity $L$. It relates the number of events per second produced in
the collisions to the cross section $\sigma_{event}$ of the corresponding process via
$\dot{N}_{event} = L\sigma_{event}$. The LHC aims for a peak luminosity of
$\SI{e34}{\per\square\centi\metre\per\second}$. This is achieved by colliding two bunches of protons every
$\SI{25}{\nano\second}$, with each proton beam consisting of 2808 bunches. Furthermore, the integrated luminosity,
defined as $\int L\,\mathrm{d}t$, is used to describe the amount of data collected over a specific time interval.
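
To make the relation between luminosity, cross section and event counts concrete, the following sketch converts the
units explicitly. The cross section of 1 pb is a hypothetical value chosen for illustration, and the function names are
introduced here.

```python
CM2_PER_BARN = 1e-24     # 1 barn = 1e-24 cm^2
BARN_PER_PB = 1e-12      # 1 pb = 1e-12 barn
FB_PER_PB = 1000.0       # 1 pb = 1000 fb


def event_rate_hz(luminosity_cm2_s: float, cross_section_pb: float) -> float:
    """Event rate dN/dt = L * sigma for L in cm^-2 s^-1 and sigma in pb."""
    cross_section_cm2 = cross_section_pb * BARN_PER_PB * CM2_PER_BARN
    return luminosity_cm2_s * cross_section_cm2


def expected_events(integrated_luminosity_fb_inv: float, cross_section_pb: float) -> float:
    """Number of produced events N = sigma * integral(L dt), with the latter in fb^-1."""
    return integrated_luminosity_fb_inv * cross_section_pb * FB_PER_PB


# Peak luminosity of 1e34 cm^-2 s^-1 and a hypothetical 1 pb process.
print(event_rate_hz(1e34, 1.0))      # roughly 0.01 events per second
print(expected_events(137.19, 1.0))  # roughly 137 000 events in 137.19 fb^-1
```

These are numbers of produced events; the number of selected events is lower, since it also depends on the detector
acceptance and the selection efficiency.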

## Compact Muon Solenoid

The data used in this thesis were recorded by the Compact Muon Solenoid (CMS) [@CMS_REPORT]. It is one of the four
main experiments at the Large Hadron Collider. It can detect all elementary particles of the Standard Model except
neutrinos. For that, it has an onion-like structure, as can be seen in [@fig:cms_setup]. The particles produced in a
collision first pass through a tracking system. They then traverse an electromagnetic and a hadronic calorimeter. This
part is surrounded by a superconducting solenoid that generates a magnetic field of 3.8 T. Outside of the solenoid are
large muon chambers. In 2016, CMS recorded data corresponding to an integrated luminosity of
$\SI{37.80}{\per\femto\barn}$. In 2017 it recorded $\SI{44.98}{\per\femto\barn}$ and in 2018
$\SI{63.67}{\per\femto\barn}$. Because of possible inconsistencies in the detector setup, some data have to be
discarded. The amounts of usable data are $\SI{35.92}{\per\femto\barn}$, $\SI{41.53}{\per\femto\barn}$ and
$\SI{59.74}{\per\femto\barn}$ for the years 2016, 2017 and 2018, respectively, adding up to
$\SI{137.19}{\per\femto\barn}$.

![
The setup of the Compact Muon Solenoid showing its onion like structure, the different detector parts and where
different particles are detected [@CMS_PLOT].
](./figures/cms_setup.png){#fig:cms_setup}


### Coordinate conventions

By convention, the z axis points along the beam axis in the direction of the magnetic field of the solenoid, the y
axis points upwards and the x axis points horizontally towards the centre of the LHC ring. In addition, the azimuthal
angle $\phi$, which is measured in the x-y plane, the polar angle $\theta$, which is measured from the z axis, and the
pseudorapidity $\eta$, defined as $\eta = -\ln\left(\tan\frac{\theta}{2}\right)$, are used. The coordinates are
visualised in [@fig:cmscoords]. Furthermore, a particle's momentum is often described by its transverse momentum
$p_t$, which is the component of the momentum perpendicular to the beam axis. Before the collision, the total
transverse momentum is negligible. Due to conservation of momentum, the vector sum of all transverse momenta after the
collision therefore has to vanish as well. If this is not the case for the detected particles, it indicates particles
that were not detected, such as neutrinos.
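
The quantities defined above translate directly into code. The sketch below uses Cartesian momentum components and
angles in radians; all function names are introduced here for illustration.

```python
import math


def pseudorapidity(theta: float) -> float:
    """eta = -ln(tan(theta / 2)) for a polar angle theta in radians (0 < theta < pi)."""
    return -math.log(math.tan(theta / 2.0))


def transverse_momentum(px: float, py: float) -> float:
    """Magnitude of the momentum component perpendicular to the beam (z) axis."""
    return math.hypot(px, py)


def missing_transverse_momentum(visible: list[tuple[float, float]]) -> float:
    """Magnitude of the vector sum of the (px, py) pairs of all detected particles.

    A value clearly different from zero hints at undetected particles.
    """
    sum_px = sum(px for px, _ in visible)
    sum_py = sum(py for _, py in visible)
    return math.hypot(sum_px, sum_py)


print(pseudorapidity(math.pi / 2))  # perpendicular to the beam: eta is (numerically) zero
print(pseudorapidity(0.1))          # close to the beam axis: eta of about 3
print(missing_transverse_momentum([(30.0, 0.0), (-10.0, 15.0)]))  # 25.0
```

A particle emitted perpendicular to the beam has $\eta = 0$, while $|\eta|$ grows without bound as the direction
approaches the beam axis.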

![Coordinate conventions of the CMS illustrating the use of $\eta$ and
$\phi$. The z axis points in beam direction. Taken from [@COORD_PLOT].
](./figures/cms_coordinates.png){#fig:cmscoords width=60%}

### The tracking system

The tracking system consists of two parts: a pixel detector closest to the collision point, surrounded by silicon
strip sensors. They record the hits left by charged particles, from which the tracks of these particles and thereby
their charge sign, direction and momentum are reconstructed. The sensors are placed as close to the collision point as
possible in order to be able to identify secondary vertices.

### The electromagnetic calorimeter

The electromagnetic calorimeter (ECAL) measures the energy of photons and electrons. It is made of lead tungstate
crystals and photodetectors. When traversed by particles, the crystals produce scintillation light in proportion to the
deposited energy. This light is measured by the photodetectors, which convert it to an electrical signal. To measure a
particle's energy, the particle has to deposit its whole energy in the ECAL. This is the case for photons and
electrons, but not for other particles such as hadrons and muons. Those interact with matter differently and therefore
deposit only part of their energy in the ECAL without being stopped by it.

### The hadronic calorimeter

The hadronic calorimeter (HCAL) is used to detect high energy hadronic particles. It surrounds the ECAL and is made of
alternating layers of active and absorber material. While the absorber material with its high density causes the hadrons
to shower, the active material then detects those showers and measures their energy, similar to how the ECAL works.

### The solenoid

The solenoid, which gives the detector its name, is one of its most important components. It creates a magnetic field
of 3.8 T, which bends the tracks of charged particles and thereby makes it possible to measure their momentum.

### The muon system

Outside of the solenoid, embedded in its return yoke, is the muon system. It consists of three types of gas detectors:
drift tubes, cathode strip chambers and resistive plate chambers. It covers the range $|\eta| < 2.4$. Muons are the
only detectable particles that can pass all the other subsystems without significant energy loss.

### The Trigger system

CMS features a two-level trigger system. It is necessary because the limited bandwidth makes it impossible to read out
and store every event. The Level 1 trigger reduces the event rate from 40 MHz to 100 kHz. The software-based High
Level trigger then further reduces the rate to 1 kHz. The Level 1 trigger uses the data from the electromagnetic and
hadronic calorimeters as well as the muon chambers to decide whether to keep an event. The High Level trigger uses a
streamlined version of the CMS offline reconstruction software for its decision making.

### The Particle Flow algorithm

The particle flow algorithm [@PARTICLE_FLOW] is used to identify and reconstruct all the particles arising from the
proton-proton collision by combining the information available from the different sub-detectors. It does so by
extrapolating the tracks through the different calorimeters and associating the clusters they cross with them. The
clusters already associated with a track are then no longer used for the reconstruction of other particles. This is
first done for muons and then for charged hadrons, so that a muon cannot give rise to a wrongly identified charged
hadron. Due to the emission of bremsstrahlung photons, electrons are harder to reconstruct. For them, a dedicated track
reconstruction algorithm is used [@ERECO]. After charged hadrons, muons and electrons have been identified, all
remaining clusters in the HCAL are attributed to neutral hadrons and those in the ECAL to photons. Once the list of
particles and their corresponding energy deposits is established, it is used to determine the particles'
four-momenta. From these, the missing transverse energy can be calculated and tau particles can be reconstructed from
their decay products.

## Jet clustering

Because of hadronisation, it is not possible to uniquely identify the particle from which a jet originates.
Nonetheless, several algorithms exist that group the detected particles into jets in a well-defined way. The algorithm
used in this thesis is the anti-$k_t$ [@ANTIKT] clustering algorithm. It arises from a generalisation of the $k_t$ and
Cambridge/Aachen sequential recombination algorithms.

The anti-$k_t$ clustering algorithm associates high-$p_t$ particles with the lower-$p_t$ particles surrounding them
within a radius R in the $\eta$-$\phi$ plane, forming cone-like jets. If two jets overlap, their shapes are modified
according to their hardness, i.e. their transverse momentum: the softer jet changes its shape more than the harder
one. A visual comparison of four different clustering algorithms can be seen in [@fig:antiktcomparison]. It shows that
the jets reconstructed with the anti-$k_t$ algorithm have the clearest cone-like shape, which is why this algorithm is
chosen for this thesis. For this analysis, a radius of 0.8 is used.
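
The clustering procedure can be sketched as follows. It uses the anti-$k_t$ distance measures
$d_{ij} = \min(p_{t,i}^{-2}, p_{t,j}^{-2})\,\Delta R_{ij}^2 / R^2$ and $d_{iB} = p_{t,i}^{-2}$ and repeatedly either
merges the closest pair or promotes a pseudojet to a final jet. This is a naive illustration and not the implementation
used in the analysis: particles are simple `(pt, y, phi)` tuples, and the $p_t$-weighted recombination in `merge` is a
simplifying assumption made here, whereas the usual scheme adds four-momenta.

```python
import math


def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into (-pi, pi]."""
    diff = (phi1 - phi2) % (2.0 * math.pi)  # value in [0, 2*pi)
    return diff - 2.0 * math.pi if diff > math.pi else diff


def merge(a, b):
    """Simplified pt-weighted recombination of two pseudojets (assumption)."""
    pt = a[0] + b[0]
    weight = b[0] / pt
    y = a[1] + weight * (b[1] - a[1])
    phi = a[2] + weight * delta_phi(b[2], a[2])
    return (pt, y, phi)


def anti_kt(particles, radius=0.8):
    """Cluster (pt, y, phi) tuples into jets, returned sorted by decreasing pt."""
    pseudojets = list(particles)
    jets = []
    while pseudojets:
        # Smallest beam distance d_iB = 1 / pt_i^2.
        best_single = min(range(len(pseudojets)), key=lambda i: pseudojets[i][0] ** -2)
        best_distance = pseudojets[best_single][0] ** -2
        best_pair = None
        # Smallest pair distance d_ij, if it undercuts the beam distance.
        for i in range(len(pseudojets)):
            for j in range(i + 1, len(pseudojets)):
                pt_i, y_i, phi_i = pseudojets[i]
                pt_j, y_j, phi_j = pseudojets[j]
                dr2 = (y_i - y_j) ** 2 + delta_phi(phi_i, phi_j) ** 2
                d_ij = min(pt_i ** -2, pt_j ** -2) * dr2 / radius ** 2
                if d_ij < best_distance:
                    best_distance, best_pair = d_ij, (i, j)
        if best_pair is None:
            # The beam distance is smallest: declare this pseudojet a final jet.
            jets.append(pseudojets.pop(best_single))
        else:
            i, j = best_pair
            merged = merge(pseudojets[i], pseudojets[j])
            pseudojets.pop(j)  # j > i, so remove j first
            pseudojets.pop(i)
            pseudojets.append(merged)
    return sorted(jets, key=lambda jet: jet[0], reverse=True)


# Two hard particles, each accompanied by a nearby soft particle.
particles = [(100.0, 0.0, 0.0), (5.0, 0.3, 0.2), (80.0, 0.1, 3.0), (3.0, -0.2, 2.8)]
jets = anti_kt(particles)
print([round(jet[0]) for jet in jets])
```

Because the distance is weighted with the inverse squared transverse momentum of the harder particle, soft particles
are absorbed by the hard particle they are closest to before they can cluster among themselves.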

![
Comparison of the $k_t$, Cambridge/Aachen, SISCone and anti-$k_t$ algorithms clustering a sample parton-level event
with many random soft "ghosts". Taken from [@ANTIKT]
](./figures/antikt-comparision.png){#fig:antiktcomparison}

Furthermore, to approximate the mass of a heavy particle that caused a jet, the soft-drop mass [@SDM] can be used. In
its calculation, wide-angle soft particles are removed from the jet in order to reduce the contamination from initial
state radiation, the underlying event and multiple hadron scattering. It is therefore a more accurate estimate of the
mass of the particle causing a jet than the combined mass of all constituent particles of the jet.
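
The core of the soft-drop procedure is a condition that is tested each time the jet is split into two subjets: the
softer subjet is dropped unless
$\min(p_{t,1}, p_{t,2}) / (p_{t,1} + p_{t,2}) > z_{cut} \, (\Delta R_{12} / R_0)^{\beta}$. The sketch below implements
only this condition, not the full declustering. The default values $z_{cut} = 0.1$ and $\beta = 0$ are assumptions
made here for illustration, since the parameters used in the analysis are not stated in this chapter.

```python
def passes_soft_drop(pt1: float, pt2: float, delta_r: float,
                     z_cut: float = 0.1, beta: float = 0.0, r0: float = 0.8) -> bool:
    """Soft-drop condition for two subjets separated by delta_r.

    z_cut and beta are assumed illustrative defaults; r0 is the jet radius.
    """
    momentum_fraction = min(pt1, pt2) / (pt1 + pt2)
    return momentum_fraction > z_cut * (delta_r / r0) ** beta


print(passes_soft_drop(100.0, 5.0, 0.4))   # False: the soft subjet would be removed
print(passes_soft_drop(100.0, 40.0, 0.4))  # True: both subjets are kept
```

With $\beta = 0$ the condition depends only on the momentum sharing, so a symmetric two-prong decay such as that of a
boosted vector boson passes, while soft radiation is removed.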

\newpage

# Method of analysis {#sec:moa}

This chapter gives an overview of how the data collected by CMS are analysed in order to either exclude the q\*
particle up to even higher masses than before or confirm its existence.

As described in [@sec:qs], the decay of the q\* particle to a quark and a vector boson, with the vector boson then
decaying hadronically, is investigated. This is the second most probable decay of the q\* particle and easier to
analyse than the dominant decay to a quark and a gluon, which makes it a good choice for this search.
It results in two jets, because the decay products of the heavy vector boson are highly boosted, causing them to be
very close together and therefore to be reconstructed as one jet. The dijet invariant mass of the two jets in the final
state is then used to reconstruct the mass of the q\* particle. The only background considered is the QCD multijet
background described in [@sec:qcdbg]. A selection based on different kinematic variables as well as a tagger that
identifies jets from the decay of a vector boson is introduced to reduce the background and increase the sensitivity to
the signal. After that, the dijet invariant mass distribution is searched for a peak at the resonance mass of the q\*
particle.

The data studied were collected by the CMS experiment in the years 2016, 2017 and 2018. They are processed with the
Particle Flow algorithm to reconstruct all particles produced in the collision. These particles are then clustered
into jets using the anti-$k_t$ algorithm with a distance parameter R of 0.8.

The analysis is conducted in two steps. First, only the data collected by the CMS experiment in 2016, corresponding to
an integrated luminosity of $\SI{35.92}{\per\femto\barn}$, are used in order to compare the results to the previous
analysis [@PREV_RESEARCH]. Then the combined data from 2016, 2017 and 2018, corresponding to an integrated luminosity
of $\SI{137.19}{\per\femto\barn}$, are used to improve the previously set limits on the mass of the q\* particle. In
addition, two different V-tagging methods are used in order to compare their performance: one is based on the
N-subjettiness variable used in the previous analysis [@PREV_RESEARCH], the other is a novel approach using a deep
neural network, which is explained in the following.

## Signal and Background modelling

Before looking at the data collected by the CMS experiment, Monte Carlo simulations [@MONTECARLO] of background and
signal are used to understand what the data are expected to look like. To replicate the QCD background processes, the
different particle interactions that take place in a proton-proton collision are simulated using the probabilities
provided by the Standard Model, obtained by calculating the cross sections of the different Feynman diagrams. This was
done using MadGraph and Pythia 8. Afterwards, detector effects (like its limited resolution) are applied to make sure
that the simulated events look like real data coming from the CMS detector.

The q\* signal samples are simulated using the probabilities given by the q\* theory [@QSTAR_THEORY] and assuming a
cross section of $\SI{1}{\pico\barn}$. The simulation was done using MadGraph for eleven mass points between 1.6 TeV
and 7 TeV. Because of the expected high mass, the signal width is dominated by the resolution of the detector, not by
the natural resonance width.

The dijet invariant mass distribution of the QCD background is expected to fall smoothly towards higher masses.
It is therefore fitted using the following smoothly falling function with three parameters $p_0$, $p_1$ and $p_2$:
\begin{equation}
\frac{dN}{dm_{jj}} = \frac{p_0 \cdot ( 1 - m_{jj} / \sqrt{s} )^{p_2}}{ (m_{jj} / \sqrt{s})^{p_1}}
\end{equation}
Here, $m_{jj}$ is the invariant mass of the dijet system and $p_0$ is a normalisation parameter. It is the same function
as used in the previous research studying 2016 data only, but it was also found to reliably reproduce the background
shape of the other years.
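
The background parametrisation above can be written down in a few lines. The following is a minimal sketch, not the
fitting code used for this analysis; the function name and the parameter values are purely illustrative assumptions, and
only the functional form and $\sqrt{s} = \SI{13}{\tera\eV}$ are taken from the text.

```python
SQRT_S = 13000.0  # centre-of-mass energy in GeV


def dijet_background(m_jj, p0, p1, p2, sqrt_s=SQRT_S):
    """Three-parameter dijet function dN/dm_jj evaluated at m_jj (in GeV)."""
    x = m_jj / sqrt_s
    return p0 * (1.0 - x) ** p2 / x ** p1


# Illustrative parameter values (assumed, not fitted to any data)
example_params = {"p0": 1e-3, "p1": 5.0, "p2": 10.0}
example_masses = [1050.0, 2000.0, 4000.0, 6000.0]
example_values = [dijet_background(m, **example_params) for m in example_masses]
```

For positive $p_1$ and $p_2$ the function falls monotonically over the mass range of interest, which is the behaviour
expected of the QCD background.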

The signal is fitted using a double-sided Crystal Ball function. It has six parameters:

- mean: the function's mean, in this case the resonance mass
- sigma: the function's width, in this case the resolution of the detector due to the very small resonance width expected
- n1, n2, alpha1, alpha2: parameters influencing the shape of the left and right tail

A Gaussian and a Poisson function have also been studied but were found to be unable to reproduce the signal shape, as
they could not model the tails on both sides of the peak.
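
To illustrate the role of the six parameters, the following sketch implements a common unnormalised form of the
double-sided Crystal Ball function: a Gaussian core that is continuously matched to a power-law tail on each side. The
exact parametrisation and normalisation used in the analysis are not given in the text, so this form and the example
values are assumptions.

```python
import math


def double_sided_crystal_ball(x, mean, sigma, alpha1, n1, alpha2, n2):
    """Unnormalised double-sided Crystal Ball shape (requires alpha, n > 0).

    alpha1/n1 describe the left tail, alpha2/n2 the right tail.
    """
    t = (x - mean) / sigma
    if t < -alpha1:
        # left power-law tail, matched to the Gaussian at t = -alpha1
        a = (n1 / alpha1) ** n1 * math.exp(-0.5 * alpha1 ** 2)
        b = n1 / alpha1 - alpha1
        return a * (b - t) ** (-n1)
    if t > alpha2:
        # right power-law tail, matched to the Gaussian at t = +alpha2
        a = (n2 / alpha2) ** n2 * math.exp(-0.5 * alpha2 ** 2)
        b = n2 / alpha2 - alpha2
        return a * (b + t) ** (-n2)
    # Gaussian core
    return math.exp(-0.5 * t * t)


# Assumed example values for a 3 TeV resonance (masses in GeV)
shape = dict(mean=3000.0, sigma=150.0, alpha1=1.5, n1=3.0, alpha2=2.0, n2=4.0)
peak_value = double_sided_crystal_ball(3000.0, **shape)
left_tail_value = double_sided_crystal_ball(2250.0, **shape)
```

Far from the peak the power-law tails lie well above a pure Gaussian of the same width, which is why this function can
describe the signal tails where a Gaussian cannot.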

A linear combination of the signal and background functions is then fitted to a toy dataset with Gaussian errors and a
simulated signal cross section of $\SI{1}{\pico\barn}$. The resulting coefficients of this combination then give the
expected signal rate for the simulated cross section. An example of such a fit can be seen in [@fig:cb_fit]. In this
figure, a binning of 200 GeV is used for presentational purposes. The analysis itself is conducted using a 1 GeV
binning. The fit describes the toy data very well, which supports the choice of functions used to model signal and
background. This is reflected in a $\chi^2 /$ ndof of 0.5 and a fitted signal mean of $2999 \pm 23$
$\si{\giga\eV}$, which is in very good agreement with the simulated resonance mass of 3000 GeV. These numbers show that
the method in use is able to successfully describe the simulated toy data.
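
The quoted $\chi^2 /$ ndof is the usual goodness-of-fit measure for a binned fit with Gaussian errors. As a reminder of
how it is obtained, here is a minimal sketch; the function name and the toy numbers are assumptions for illustration
and are not taken from the fit shown in the figure.

```python
def chi2_per_ndof(observed, expected, errors, n_params):
    """Chi-square per degree of freedom for a binned fit with Gaussian errors."""
    chi2 = sum(((o - e) / s) ** 2 for o, e, s in zip(observed, expected, errors))
    ndof = len(observed) - n_params
    if ndof <= 0:
        raise ValueError("need more bins than fitted parameters")
    return chi2 / ndof


# Toy example: four bins, one fitted parameter
toy_ratio = chi2_per_ndof([10, 12, 9, 11], [10, 10, 10, 10], [2, 2, 1, 1], n_params=1)
```

Values close to or below one indicate that the deviations between the fitted function and the toy data are compatible
with the assumed bin errors.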

![
Combined fit of signal and background on a toy dataset with Gaussian errors and a simulated resonance mass of 3 TeV.
](./figures/cb_fit.pdf){#fig:cb_fit}

\newpage

# Preselection and data quality

To reduce the background and increase the signal sensitivity, a selection of events that satisfy certain requirements is
introduced. The selection is divided into two stages. The first one (the preselection) applies general, physically
motivated requirements on kinematic variables and is also used to ensure a high trigger efficiency. In the second stage,
the discriminants provided by different taggers are used to identify jets originating from the decay of a vector boson.
After the preselection, it is verified that the simulated samples represent the real data well by comparing the data
with the simulation in the signal region as well as in a sideband region, where no signal events are expected.

## Preselection

First, all events are cleaned of jets with $p_t < \SI{200}{\giga\eV}$ or a pseudorapidity $|\eta| > 2.4$. This is to
discard soft background and to make sure the particles are in the barrel region of the detector for an optimal track
reconstruction. Furthermore, all events in which one of the two highest $p_t$ jets has an angular separation smaller
than 0.8 from any electron or muon are discarded, to allow future use of the data in studies investigating the leptonic
decay channel of the vector boson.

From a decaying q\* particle, two jets are expected in the final state. The dijet invariant mass of those two jets will
be used to reconstruct the mass of the q\* particle. Therefore, at least two jets are required per event. More than two
jets are allowed, as additional jets can be caused, for example, by gluon radiation of a quark or other QCD effects. In
that case, the two jets with the highest $p_t$ are used for the reconstruction of the q\* mass.
The distributions of the number of jets before and after the selection can be seen in [@fig:njets]. The light blue
filled histogram shows the QCD background, the green and red lines show the expected signal for a decay of the q\*
particle to qW with a mass of 2 TeV (green) and 5 TeV (red). Comparing the left and right distributions shows that the
requirement of at least two jets reduces the background significantly while keeping almost all signal events.
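
The reconstruction of the dijet invariant mass from the two leading jets can be sketched as follows. The representation
of a jet as a tuple `(pt, eta, phi, mass)` and the function names are assumptions made for this illustration; the
analysis framework itself is not shown in this thesis.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) into (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def dijet_invariant_mass(jets):
    """Invariant mass of the two highest-pt jets, or None for fewer than two jets."""
    if len(jets) < 2:
        return None
    leading = sorted(jets, key=lambda jet: jet[0], reverse=True)[:2]
    # sum the four-vectors of the two leading jets component by component
    e, px, py, pz = (sum(c) for c in zip(*(four_vector(*jet) for jet in leading)))
    m_squared = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m_squared, 0.0))


# Two massless back-to-back jets plus a softer third jet that is ignored
example_jets = [(1000.0, 0.0, 0.0, 0.0), (300.0, 1.0, 1.5, 0.0), (1000.0, 0.0, math.pi, 0.0)]
example_mjj = dijet_invariant_mass(example_jets)
```

In this example the two jets with a $p_t$ of 1 TeV each are exactly back to back, so the reconstructed mass is 2 TeV
regardless of the additional soft jet.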

\begin{figure}
\begin{minipage}{\textwidth}
\centering\textbf{Comparison for 2016}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/v1_Cleaner_N_jets_stack.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/v1_Njet_N_jets_stack.eps}
\end{minipage}
\begin{minipage}{\textwidth}
\centering\textbf{Comparison for the combined dataset}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/v1_Cleaner_N_jets_stack.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/v1_Njet_N_jets_stack.eps}
\end{minipage}
\caption{Comparison of the distribution of the number of jets before and after requiring at least two jets. \newline
Left: distribution before the cut. Right: distribution after the cut. \newline
The signal curves are amplified by a factor of 10'000 to be visible.}
\label{fig:njets}
\end{figure}

The next selection is done using $\Delta\eta = |\eta_1 - \eta_2|$, with $\eta_1$ and $\eta_2$ being the $\eta$ of the
two jets with the highest transverse momentum. The q\* particle is expected to be very heavy relative to the
centre-of-mass energy of the collision and will therefore be almost stationary. Its decay products should therefore be
close to back to back, which means the $\Delta\eta$ distribution is expected to peak at zero. At the same time, jets
originating from QCD effects are expected to have a higher $\Delta\eta$. To maintain comparability, the same selection
as in previous research, $\Delta\eta \le 1.3$, is used. The comparison of the $m_{jj}$ distribution before and after the
cut in [@fig:deta] clearly shows that the signal sensitivity is greatly improved by this cut.

\begin{figure}
\begin{minipage}{\textwidth}
\centering\textbf{$\Delta\eta$ cut with signal amplified by 10'000}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/v1_Njet_deta_stack.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/v1_Njet_deta_stack.eps}
\end{minipage}
\begin{minipage}{\textwidth}
\vspace{0.1cm}
\centering\textbf{$m_{jj}$ distribution before the cut}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/v1_Njet_invMass_stack.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/v1_Njet_invMass_stack.eps}
\end{minipage}
\begin{minipage}{\textwidth}
\vspace{0.1cm}
\centering\textbf{$m_{jj}$ distribution after the cut}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/v1_Eta_invMass_stack.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/v1_Eta_invMass_stack.eps}
\end{minipage}
\caption{Demonstration of the effect of the $\Delta\eta$ cut at $\Delta\eta \le 1.3$. \newline
Left: Partial dataset of $\SI{35.92}{\per\femto\barn}$ Right: Full dataset of $\SI{137.19}{\per\femto\barn}$.
}
\label{fig:deta}
\end{figure}

The last selection in the preselection is on the dijet invariant mass: $m_{jj} \ge \SI{1050}{\giga\eV}$. This requirement
ensures a trigger efficiency higher than 99 % when a soft-drop mass cut of $m_{SDM} > \SI{65}{\giga\eV}$ is applied to
the jet with the highest transverse momentum. A comparison of the $m_{jj}$ distribution before and after the selection
can be seen in [@fig:invmass]. The cut also strongly reduces the background, because the background usually consists of
lighter particles. The q\*, on the other hand, is expected to have a very high invariant mass of more than 1 TeV. The
$m_{jj}$ distribution should be a smoothly falling function for the QCD background and peak at the simulated resonance
mass for the signal events.
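
Taken together, the kinematic part of the preselection amounts to a short chain of requirements. The sketch below
summarises it under simplifying assumptions: jets are given as hypothetical `(pt, eta)` tuples, the dijet invariant mass
of the two leading jets is passed in as an already computed number, and the lepton veto is left out.

```python
PT_MIN = 200.0     # GeV, minimum jet pt
ETA_MAX = 2.4      # maximum |eta| of a jet
DETA_MAX = 1.3     # maximum |eta_1 - eta_2| of the two leading jets
MJJ_MIN = 1050.0   # GeV, minimum dijet invariant mass


def passes_preselection(jets, m_jj):
    """Apply jet cleaning, the >= 2 jets requirement, the delta-eta and the m_jj cut.

    jets: list of (pt, eta) tuples; m_jj: invariant mass of the two leading jets in GeV.
    """
    good = sorted(
        (jet for jet in jets if jet[0] >= PT_MIN and abs(jet[1]) <= ETA_MAX),
        key=lambda jet: jet[0],
        reverse=True,
    )
    if len(good) < 2:
        return False
    delta_eta = abs(good[0][1] - good[1][1])
    return delta_eta <= DETA_MAX and m_jj >= MJJ_MIN


central_event = passes_preselection([(900.0, 0.2), (850.0, -0.5)], m_jj=2000.0)
forward_event = passes_preselection([(900.0, 1.0), (850.0, -1.0)], m_jj=2000.0)
```

The first example event passes all requirements, while the second one fails because its two leading jets are separated
by $\Delta\eta = 2.0$.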

\begin{figure}
\begin{minipage}{\textwidth}
\centering\textbf{Comparison for 2016}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/v1_Eta_invMass_stack.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/v1_invmass_invMass_stack.eps}
\end{minipage}
\begin{minipage}{\textwidth}
\centering\textbf{Comparison for the combined dataset}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/v1_Eta_invMass_stack.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/v1_invmass_invMass_stack.eps}
\end{minipage}
\caption{Comparison of the invariant mass distribution before and after the cut at $m_{jj} \ge \SI{1050}{\giga\eV}$. It
shows the expected smoothly falling shape of the background, whereas the signal peaks at the simulated resonance mass.
\newline
Left: distribution before the cut. Right: distribution after the cut.}
\label{fig:invmass}
\end{figure}

After the preselection, the signal efficiency for q\* decaying to qW in the 2016 dataset ranges from 48 % for 1.6 TeV to
49 % for 7 TeV. For the decay to qZ, the efficiencies are between 45 % (1.6 TeV) and 50 % (7 TeV). The amount of
background after the preselection is reduced to 5 % of the original events. For the combined data of the three years
those values look similar. For the decay to qW, signal efficiencies between 49 % (1.6 TeV) and 56 % (7 TeV) are reached,
whereas the efficiencies for the decay to qZ are in the range of 46 % (1.6 TeV) to 50 % (7 TeV). Here, the background
is reduced to 8 % of the original events. So while keeping around 50 % of the signal, the background is already reduced
to less than a tenth.

## Data - Monte Carlo Comparison

To ensure that the simulation reproduces the data well, the simulated QCD background sample is now compared to the
data of the corresponding year collected by the CMS detector. This is done for the partial dataset of year 2016 and for
the full dataset separately. In [@fig:data-mc], this comparison can be seen for the distributions of the variables used
during the preselection.
To compensate for the simulation overpredicting the scale of the QCD background, the histograms are rescaled so that the
dijet invariant mass distributions of data and simulation have the same integral.
The invariant mass distribution of the 2016 data falls slightly faster than the simulated one. Apart from that, the
distributions are in very good agreement.
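
The rescaling described above only changes the overall normalisation of the simulation and leaves its shape untouched.
A minimal sketch, with histograms represented as plain lists of bin contents and made-up numbers, could look like this:

```python
def scale_to_data(mc_counts, data_counts):
    """Scale simulated bin contents so that their integral matches the data.

    Returns the scaled bin contents and the scale factor.
    """
    mc_total = sum(mc_counts)
    if mc_total == 0:
        raise ValueError("simulated histogram is empty")
    factor = sum(data_counts) / mc_total
    return [count * factor for count in mc_counts], factor


# Illustrative bin contents of an m_jj histogram (not real data)
mc_hist = [200.0, 100.0, 50.0]
data_hist = [120.0, 50.0, 5.0]
scaled_mc, scale_factor = scale_to_data(mc_hist, data_hist)
```

After the scaling both histograms have the same integral, so any remaining difference between them, such as data falling
faster than the simulation, is a difference in shape.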

For analysing the data from the CMS experiment, jet energy corrections have to be applied. These calibrate the response
of the ECAL and HCAL parts of the CMS detector, so that the energy of the detected particles can be measured correctly.
The corrections used were published by the CMS collaboration. [cite todo]

\begin{figure}
\begin{minipage}{0.5\textwidth}
\centering\textbf{2016}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\centering\textbf{Combined}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/DATA/v1_invmass_N_jets.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/DATA/v1_invmass_N_jets.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/DATA/v1_invmass_deta.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/DATA/v1_invmass_deta.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/DATA/v1_invmass_invMass.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/DATA/v1_invmass_invMass.eps}
\end{minipage}
\caption{Comparison of data with the Monte Carlo simulation.}
\label{fig:data-mc}
\end{figure}


### Sideband

The sideband region is used to make sure that no bias is introduced in the data and the Monte Carlo simulation, and to
verify their agreement. It is a region in which no signal events are expected. Again, data and the Monte Carlo
simulation are compared. For this analysis, the region where the soft-drop mass of both of the two jets with
the highest transverse momentum is more than 105 GeV is chosen. This is well above the mass of the Z boson of 91 GeV,
the heavier of the two vector bosons. It is therefore very unlikely that a jet with a higher mass originates from
the decay of a vector boson. In [@fig:sideband], the comparison of data with simulation in the sideband region can be
seen for the soft-drop mass distribution as well as the dijet invariant mass distribution. Data and simulation match
very well in the sideband region.

\begin{figure}
\begin{minipage}{\textwidth}
\centering\textbf{2016}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/sideband/v1_SDM_SoftDropMass_1.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/2016/sideband/v1_SDM_invMass.eps}
\end{minipage}
\begin{minipage}{\textwidth}
\centering\textbf{Combined dataset}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/sideband/v1_SDM_SoftDropMass_1.eps}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/combined/sideband/v1_SDM_invMass.eps}
\end{minipage}
\caption{Comparison of data with the Monte Carlo simulation in the sideband region.}
\label{fig:sideband}
\end{figure}

\newpage

# Jet substructure selection

So far, it has been verified that the data collected by the CMS experiment and the simulation are in good agreement
after the preselection and that the cuts used do not introduce unwanted side effects in the data. Now another selection
has to be introduced to further reduce the background, in order to be able to look for the hypothetical signal events in
the data.

This is done by distinguishing between QCD and signal events using a tagger to identify jets coming
from a vector boson. Two different taggers will be used to later compare their performance. The decay analysed includes
either a W or a Z boson, which are very heavy compared to the particles in QCD effects. This can be exploited by adding
a selection on the soft-drop mass of a jet. The soft-drop mass of at least one of the two leading jets is required to
be between $\SI{35}{\giga\eV}$ and $\SI{105}{\giga\eV}$. This cut already provides a good separation of QCD and signal
events, on which the two taggers presented next can build.

Both taggers provide a discriminant to decide whether an event can be classified as the decay of a vector boson or
originates from QCD effects. The cut on this value will be optimized afterwards to make sure the maximum possible signal
significance is achieved.

## N-Subjettiness

The N-subjettiness [@TAU21] $\tau_N$ is a jet shape parameter designed to identify boosted hadronically-decaying
objects. When a vector boson decays hadronically, it produces two quarks, each causing a jet. But in the case of the
decay of a q\* particle, the vector boson is highly boosted and so are its decay products. They therefore appear, after
applying a clustering algorithm, as just one jet. The N-subjettiness is used to find out whether such a jet might
consist of two subjets, using the kinematics and positions of the constituent particles of this jet.
The N-subjettiness is defined as

\begin{equation} \tau_N = \frac{1}{d_0} \sum_k p_{T,k} \cdot \min\{ \Delta R_{1,k}, \Delta R_{2,k}, \dots, \Delta
R_{N,k} \} \end{equation}

with $k$ running over the constituent particles in a given jet, $p_{T,k}$ being their transverse momenta, $d_0$ being a
normalisation factor and $\Delta R_{J,k} = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ being the distance between a
candidate subjet $J$ and a constituent particle $k$ in the $\eta$ - $\phi$ plane. It quantifies to what degree a jet can
be regarded as a jet composed of $N$ subjets. In the hadronic decay of a highly boosted vector boson, two subjets are
expected. Therefore it seems that $\tau_2$ would be a good choice for a discriminant. However, experiments showed that,
rather than using $\tau_2$ directly, the ratio $\tau_{21} = \tau_2/\tau_1$ is a better discriminant between QCD effects
and events originating from the decay of a boosted vector boson.
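
The definition can be made concrete with a small numerical sketch. It assumes that the candidate subjet axes are already
known and passed in as `(eta, phi)` pairs (finding them is a separate step that is not shown), and it uses the
normalisation $d_0 = \sum_k p_{T,k} R_0$ from the usual definition in the N-subjettiness literature, with $R_0 = 0.8$
being the jet distance parameter. All names and numbers are illustrative.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane, with delta-phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def n_subjettiness(constituents, axes, r0=0.8):
    """tau_N for constituents (pt, eta, phi) and N candidate subjet axes (eta, phi)."""
    d0 = r0 * sum(pt for pt, _, _ in constituents)
    weighted = sum(
        pt * min(delta_r(eta, phi, axis_eta, axis_phi) for axis_eta, axis_phi in axes)
        for pt, eta, phi in constituents
    )
    return weighted / d0


# Idealised two-prong jet: two equally hard constituents at eta = +0.2 and -0.2
two_prong = [(100.0, 0.2, 0.0), (100.0, -0.2, 0.0)]
tau1 = n_subjettiness(two_prong, axes=[(0.0, 0.0)])
tau2 = n_subjettiness(two_prong, axes=[(0.2, 0.0), (-0.2, 0.0)])
```

For this idealised jet a single axis leaves every constituent at a distance of 0.2, whereas two axes describe the jet
perfectly, so $\tau_2$ vanishes and the ratio $\tau_{21}$ is zero. A jet without a two-prong structure would instead
yield a $\tau_2$ comparable to $\tau_1$.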

The lower the $\tau_{21}$, the more likely a jet is caused by the decay of a vector boson. Therefore a selection is
introduced that requires the $\tau_{21}$ of one candidate jet to be smaller than some value that will be determined by
the optimization process described in the next chapter. As candidate jet, the one of the two highest $p_t$ jets passing
the soft-drop mass window is used. If both of them pass, the one with higher $p_t$ is chosen.
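
The choice of the candidate jet can be written as a small helper. In this sketch, jets are hypothetical dictionaries
with the keys `pt` and `sd_mass` (soft-drop mass in GeV); the mass window of 35 GeV to 105 GeV is the one quoted above,
everything else is an assumption for illustration.

```python
SD_MASS_LOW = 35.0    # GeV, lower edge of the soft-drop mass window
SD_MASS_HIGH = 105.0  # GeV, upper edge of the soft-drop mass window


def candidate_jet(leading_jets, low=SD_MASS_LOW, high=SD_MASS_HIGH):
    """Return the highest-pt jet among the two leading jets inside the mass window.

    Returns None if neither of the two leading jets passes the window.
    """
    passing = [jet for jet in leading_jets[:2] if low <= jet["sd_mass"] <= high]
    if not passing:
        return None
    return max(passing, key=lambda jet: jet["pt"])


leading = [{"pt": 1200.0, "sd_mass": 12.0}, {"pt": 1100.0, "sd_mass": 82.0}]
chosen = candidate_jet(leading)
```

In the example only the second jet lies inside the mass window, so it becomes the candidate jet on which the tagger
discriminant is then evaluated.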

## DeepAK8

The DeepAK8 tagger [@DEEP_BOOSTED] uses a deep neural network (DNN) to identify jets originating from the decay of a
vector boson. It reduces the background rate by up to a factor of about 10 at the same signal efficiency compared to
non-machine-learning approaches like the N-subjettiness method. This is shown in [@fig:ak8_eff], which compares the
background and signal efficiency of the DeepAK8 tagger with, among others, the $\tau_{21}$ tagger that is also used in
this analysis.

![Comparison of tagger efficiencies, showing, among others, the DeepAK8-MD (which stands for mass decorrelated and is
the one used for this research) and $\tau_{21}$ tagger used in this analysis.
Taken from [@DEEP_BOOSTED]](./figures/deep_ak8.pdf){#fig:ak8_eff width=60%}

The DNN has two input lists for each jet. The first is a list of up to 100 constituent particles of the jet, sorted by
decreasing $p_t$. A total of 42 properties of the particles, such as $p_t$, energy deposit, charge and the
angular separation between the particle and the jet or subjet axes, are included. The second input list is a list of up
to seven secondary vertices, each with 15 features, such as the kinematics, displacement and quality criteria.
To process those inputs, a customised DNN architecture has been developed. It consists of two convolutional neural
networks (CNN) that each process one of the input lists. The outputs of the two CNNs are then combined and processed by
a fully-connected network to identify the jet. The network was trained with a sample of 40 million jets; another 10
million jets were used for development and validation.

In this thesis, the mass decorrelated version of the DeepAK8 tagger is used. It adds an additional mass predictor layer
that is trained to quantify how strongly the output of the non-decorrelated tagger is correlated to the mass of a
particle. Its output is fed back to the network as a penalty, so that the network avoids using features of the particles
correlated to their mass. The result is a largely mass decorrelated tagger of heavy resonances that does not introduce a
bias in the jet mass shape. As can be seen in [@fig:ak8_eff], it does not perform as well as the non-mass-decorrelated
version, but still better than the other taggers it was compared to.

The higher the discriminant value of the deep boosted tagger, called WvsQCD or ZvsQCD respectively, the more likely the
jet is to be caused by the decay of a vector boson. Therefore, choosing the candidate jet in the same way as for the
N-subjettiness tagger, a selection is applied that requires this candidate jet to have a WvsQCD/ZvsQCD value greater
than some value determined by the optimization presented next.

## Optimization {#sec:opt}

To find the best value to cut on the discriminants introduced by the two taggers, a figure of merit that quantifies how
good a cut is has to be introduced. For that, the significance calculated by $\frac{S}{\sqrt{B}}$ will be used. S stands
for the number of signal events and B for the number of background events in a given interval. This value assumes a
Gaussian error on the background. The assumption follows from the central limit theorem, which states that the sum of
independent, identically distributed random variables converges to a Gaussian distribution. The significance is
therefore calculated for the 2 TeV masspoint, where enough background events exist to justify this assumption. It
represents how well the signal can be distinguished from the background in units of the standard deviation of the
background. As interval, a 10 % margin around the nominal resonance mass is chosen. The significance is then calculated
for different selections on the discriminant of the two taggers and plotted as a function of the minimum allowed value
of the discriminant for the deep boosted tagger and of the maximum allowed value for the N-subjettiness tagger.
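
The scan over cut values can be sketched as follows. Events are represented as hypothetical `(m_jj, discriminant,
weight)` tuples, and the 10 % margin is interpreted here as a window of $\pm 10$ % around the nominal resonance mass,
which is an assumption. The toy events are made up and only serve to show the mechanics.

```python
import math


def significance(n_signal, n_background):
    """S / sqrt(B); returns 0.0 if no background is left."""
    return n_signal / math.sqrt(n_background) if n_background > 0 else 0.0


def in_mass_window(m_jj, resonance_mass, margin=0.10):
    """True if m_jj lies within +-margin around the nominal resonance mass."""
    return abs(m_jj - resonance_mass) <= margin * resonance_mass


def scan_cuts(signal, background, cuts, resonance_mass, keep_above=True):
    """Return (cut, significance) pairs for events given as (m_jj, discriminant, weight).

    keep_above=True keeps events with discriminant >= cut (deep boosted tagger),
    keep_above=False keeps events with discriminant <= cut (N-subjettiness).
    """
    def count(events, cut):
        return sum(
            weight
            for m_jj, disc, weight in events
            if in_mass_window(m_jj, resonance_mass)
            and (disc >= cut if keep_above else disc <= cut)
        )

    return [(cut, significance(count(signal, cut), count(background, cut))) for cut in cuts]


# Toy events for a 2 TeV resonance (illustrative only)
toy_signal = [(2000.0, 0.99, 1.0), (2050.0, 0.96, 1.0), (1950.0, 0.90, 1.0), (3000.0, 0.99, 1.0)]
toy_background = [(2000.0, 0.2, 4.0), (2100.0, 0.97, 1.0), (1900.0, 0.5, 4.0), (2500.0, 0.99, 10.0)]
toy_scan = scan_cuts(toy_signal, toy_background, cuts=[0.0, 0.95], resonance_mass=2000.0)
```

In this toy example the tighter cut removes one of three signal events inside the window but most of the background, so
the significance increases. The optimal cut is the one that maximises this quantity.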

The optimization process is done using only the data from year 2018, assuming the taggers have similar performances on
the data of the different years.

\begin{figure}
  \begin{minipage}{0.5\textwidth}
    \includegraphics{./figures/sig-db.pdf}
  \end{minipage}
  \begin{minipage}{0.5\textwidth}
    \includegraphics{./figures/sig-tau.pdf}
  \end{minipage}
\caption{Significance plots for the deep boosted (left) and N-subjettiness (right) tagger at the 2 TeV masspoint.}
\label{fig:sig}
\end{figure}

As a result, the $\tau_{21}$ cut is placed at $\le 0.35$, confirming the value chosen by previous research, and the deep
boosted cut is placed at $\ge 0.95$. For the deep boosted tagger, 0.97 would give a slightly higher significance.
However, this value is very close to the edge where the significance drops steeply, and the higher the cut, the less
background is left to calculate the cross section limits, especially at higher resonance masses. Therefore the slightly
less strict cut is chosen.

For both taggers, a low purity category is also introduced for high resonance masses. Using the cuts optimized for
2 TeV, very few background events are left at higher resonance masses, but those are needed to reliably calculate cross
section limits. Therefore, in the final cross section calculation, the two categories are combined to have a high
signal sensitivity for all simulated masspoints between 1.6 TeV and 7 TeV. As low purity category for the
N-subjettiness tagger, a cut at $0.35 < \tau_{21} < 0.75$ is used. For the deep boosted tagger the opposite cut from the
high purity category is used: $\text{VvsQCD} < 0.95$.

\newpage

# Signal extraction {#sec:extr}

With the optimization done, the optimal selection for the N-subjettiness as well as the deep boosted tagger is
applied to the simulated samples as well as the data collected by the CMS experiment. The fit described in [@sec:moa] is
performed for all masspoints of the decay to qW and qZ and for the partial dataset of $\SI{35.92}{\per\femto\barn}$ as
well as the complete dataset of $\SI{137.19}{\per\femto\barn}$ separately.

To test for the presence of a resonance in the data, the cross section limits of the signal event are calculated using a
frequentist asymptotic limit criterion described in [@ASYMPTOTIC_LIMIT]. Using the parameters and signal rate obtained
by the method described in [@sec:moa] as well as a shape analysis of the data recorded by the CMS experiment, it
determines an expected and an observed cross section limit by doing a signal + background versus background-only
hypothesis test. It also calculates upper and lower limits of the expected cross section corresponding to a confidence
level of 95 %.

In the absence of the q\* particle in the data, the observed limits are expected to lie within the $2\sigma$ band of the
expected limit, corresponding to a 95 % confidence level. The observed limit is plotted together with a theory line,
representing the cross section expected if the q\* predicted by [@QSTAR_THEORY] existed.
Since no significant deviation from the Standard Model is found while looking for the resonance, the crossing of the
theory line with the observed limit is calculated to obtain the mass up to which the existence of the q\* particle
can be excluded. To find the uncertainty of this result, the crossing of the theory line plus, respectively minus, its
uncertainty with the observed limit is also calculated.
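
Finding the crossing of the two curves can be sketched as follows. The text does not state how the curves are
interpolated between the simulated masspoints, so the sketch assumes a linear interpolation of $\log\sigma$ between
neighbouring masspoints; the function name and the numbers are purely illustrative.

```python
import math


def crossing_mass(masses, theory, observed):
    """Mass at which the theory cross section drops below the observed limit.

    masses must be increasing; theory and observed are cross sections at those masses.
    Between masspoints, log(sigma) is interpolated linearly (an assumption).
    Returns None if the curves do not cross in the given range.
    """
    for i in range(len(masses) - 1):
        diff_low = math.log(theory[i]) - math.log(observed[i])
        diff_high = math.log(theory[i + 1]) - math.log(observed[i + 1])
        if diff_low > 0 and diff_high <= 0:
            fraction = diff_low / (diff_low - diff_high)
            return masses[i] + fraction * (masses[i + 1] - masses[i])
    return None


# Illustrative curves (masses in GeV, cross sections in arbitrary units)
toy_masses = [4000.0, 5000.0, 6000.0]
toy_theory = [1e-1, 1e-2, 1e-3]
toy_observed = [math.sqrt(1e-5)] * 3
toy_limit = crossing_mass(toy_masses, toy_theory, toy_observed)

# Shifting the theory line up by its (here assumed) uncertainty moves the crossing up
toy_upper = crossing_mass(toy_masses, [2.0 * t for t in toy_theory], toy_observed)
```

Because the theory line falls with increasing mass, masses below the crossing are excluded. Repeating the calculation
with the theory line shifted up or down by its uncertainty gives the upper and lower values of the mass limit.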

## Systematic Uncertainties

The variables used in this analysis are affected by systematic uncertainties.
For calculating the cross section of the signal, four sources of such uncertainties are considered.

First, the uncertainty of the jet energy corrections. When measuring a particle's energy with the ECAL or HCAL part of
the CMS detector, the electronic signals sent by the photodetectors in the calorimeters have to be converted to actual
energy values. An error in this calibration causes the measured energy to be shifted to higher or lower values, which
also causes the position of the signal peak in the $m_{jj}$ distribution to vary. The uncertainty is approximated to be
2 %.

Second, the tagger does not work perfectly. Some events that do not originate from a V boson are wrongly selected,
while some events that do originate from one are rejected. This influences the events chosen for the
analysis and is therefore also considered as an uncertainty, which is approximated to be 6 %.

Third, the uncertainty of the parameters of the background fit is also considered, as it might change the background
shape a little and therefore influence how many signal and background events are reconstructed from the data.

Fourth, the uncertainty on the luminosity influences the normalization of the processes. Its value is 2.5 % [cite todo].
\newpage

# Results

This chapter starts by presenting the results for the partial dataset of year 2016 with an integrated luminosity of
$\SI{35.92}{\per\femto\barn}$ using both taggers and comparing them to the previous research [@PREV_RESEARCH]. It
then goes on to show the results for the combined dataset with an integrated luminosity of
$\SI{137.19}{\per\femto\barn}$, again using both taggers and comparing their performance.

## Partial dataset

Using the $\SI{35.92}{\per\femto\barn}$ of data collected by the CMS experiment during 2016, the cross section limits
seen in [@fig:res2016] were obtained.

As described in [@sec:extr], the calculated cross section limits are then used to calculate a mass limit, meaning the
lowest possible mass of the q\* particle, by finding the crossing of the theory line with the observed cross section
limit. In [@fig:res2016dw,@fig:res2016dz] it can be seen that the observed limit obtained with the deep boosted tagger
is very high in the region where theory and observed limit cross, compared to the one obtained with the N-subjettiness
tagger. The two lines therefore cross at lower resonance masses, which results in lower exclusion limits on the mass of
the q\* particle. In terms of establishing those limits, the deep boosted tagger thus performs worse than the
N-subjettiness tagger, as can be seen in [@tbl:res2016]. The table also shows the upper and lower limits on the mass,
found by calculating the crossing of the theory line plus, respectively minus, its uncertainty. Because both the theory
line and the observed limit fall only slowly in the high-mass region, even a small uncertainty of the theory can cause a
large difference in the mass limit.


: Mass limits found using the partial dataset of $\SI{35.92}{\per\femto\barn}$ {#tbl:res2016}

| Decay | Tagger       | Limit [TeV] | Upper Limit [TeV] | Lower Limit [TeV] |
|-------|--------------|-------------|-------------------|-------------------|
| qW    | $\tau_{21}$  | 5.39        | 6.01              | 4.99              |
| qW    | deep boosted | 4.96        | 5.19              | 4.84              |
| qZ    | $\tau_{21}$  | 4.86        | 4.96              | 4.70              |
| qZ    | deep boosted | 4.62        | 4.71              | 4.49              |


\begin{figure}%
  \centering
  \subfloat[Decay to qW, using N-subjettiness tagger]{%
  \label{fig:res2016tw}%
  \includegraphics[width=0.5\textwidth]{./figures/results/brazilianFlag_QtoqW_2016tau_13TeV.pdf}}
  \subfloat[Decay to qW, using deep boosted tagger]{%
  \includegraphics[width=0.5\textwidth]{./figures/results/brazilianFlag_QtoqW_2016db_13TeV.pdf}%
  \label{fig:res2016dw}}\\
  \subfloat[Decay to qZ, using N-subjettiness tagger]{%
  \includegraphics[width=0.5\textwidth]{./figures/results/brazilianFlag_QtoqZ_2016tau_13TeV.pdf}%
  \label{fig:res2016tz}}%
  \subfloat[Decay to qZ, using deep boosted tagger]{%
  \includegraphics[width=0.5\textwidth]{./figures/results/brazilianFlag_QtoqZ_2016db_13TeV.pdf}%
  \label{fig:res2016dz}}%
\caption{Results of the cross section limits for the partial dataset of 2016 using the $\tau_{21}$ tagger and the deep
boosted tagger.}
\label{fig:res2016}
\end{figure}

### Comparison with existing results

The result will now be compared to an existing result using the same dataset. This research, however, uses a newer
detector calibration as well as an improved reconstruction, so slight variations in the results are to be expected.

The limit established by using the N-subjettiness tagger with the partial dataset is $\SI{0.39}{\tera\eV}$ (decay to qW)
and $\SI{0.16}{\tera\eV}$ (decay to qZ) higher than the one from previous research, which was found to be 5 TeV for
the decay to qW and 4.7 TeV for the decay to qZ. This is mainly due to the fact that in our data, the observed limit at
the intersection point happens to be in the lower region of the expected limit interval, which causes a very
late crossing with the theory line when using the N-subjettiness tagger (as can be seen in [@fig:res2016]). Comparing
the expected limits, the values calculated in this thesis differ by between 2 % and 30 % from those of the previous
research. Neither of the two results is consistently lower or higher than the other; the differences rather fluctuate.
As slight variations in the results were expected, the results can be considered to be in good agreement. The cross
section limits of the previous research can be seen in [@fig:prev].

\begin{figure}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/results/prev_qW.png}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/results/prev_qZ.png}
\end{minipage}
\caption{Previous results of the cross section limits for q\* decaying to qW (left) and q\* decaying to qZ (right).
Taken from \cite{PREV_RESEARCH}.}
\label{fig:prev}
\end{figure}

## Combined dataset

Using the full available dataset of $\SI{137.19}{\per\femto\barn}$, the cross section limits seen in [@fig:resCombined]
were obtained. The cross section limits are, compared to only using the 2016 dataset, reduced to about 50 %. This shows
the big improvement achieved by using more than three times the amount of data.

The results for the mass limits of the combined years are presented in the following table.


: Mass limits found using $\SI{137.19}{\per\femto\barn}$ of data {#tbl:resCombined}

| Decay | Tagger       | Limit [TeV] | Upper Limit [TeV] | Lower Limit [TeV] |
|-------|--------------|-------------|-------------------|-------------------|
| qW    | $\tau_{21}$  | 6.00        | 6.26              | 5.74              |
| qW    | deep boosted | 6.11        | 6.31              | 5.39              |
| qZ    | $\tau_{21}$  | 5.49        | 5.76              | 5.29              |
| qZ    | deep boosted | 4.95        | 5.13              | 4.85              |


The combination of the three years improved not only the cross section limits, but also the limit on the mass of the
q\* particle. The final result is 1 TeV higher for the decay to qW and almost 0.8 TeV higher for the decay to qZ than
what was concluded by the previous research [@PREV_RESEARCH].


\begin{figure}%
  \centering
  \subfloat[Decay to qW, using N-subjettiness tagger]{%
  \label{fig:resCombinedtw}%
  \includegraphics[width=0.5\textwidth]{./figures/results/brazilianFlag_QtoqW_Combinedtau_13TeV.pdf}}
  \subfloat[Decay to qW, using deep boosted tagger]{%
  \includegraphics[width=0.5\textwidth]{./figures/results/brazilianFlag_QtoqW_Combineddb_13TeV.pdf}%
  \label{fig:resCombineddw}}\\
  \subfloat[Decay to qZ, using N-subjettiness tagger]{%
  \includegraphics[width=0.5\textwidth]{./figures/results/brazilianFlag_QtoqZ_Combinedtau_13TeV.pdf}%
  \label{fig:resCombinedtz}}%
  \subfloat[Decay to qZ, using deep boosted tagger]{%
  \includegraphics[width=0.5\textwidth]{./figures/results/brazilianFlag_QtoqZ_Combineddb_13TeV.pdf}%
  \label{fig:resCombineddz}}%
\caption{Results of the cross section limits for the combined dataset using the $\tau_{21}$ tagger and the deep boosted
tagger.}
\label{fig:resCombined}
\end{figure}


## Comparison of taggers

The results presented in [@tbl:res2016; @tbl:resCombined] show that the deep boosted tagger was not able to
significantly improve the results compared to the N-subjettiness tagger.
For further comparison, [@fig:limit_comp] shows the expected limits of the different taggers for the q\* $\rightarrow$
qW and the q\* $\rightarrow$ qZ decay. It can be seen that the deep boosted tagger is at best as good as the
N-subjettiness tagger. This was not the expected result, as the deep neural network was found to provide a higher
significance in the optimisation done in [@sec:opt], and a higher significance should also result in lower cross section
limits. To make sure that there is no mistake in the setup, the expected cross section limits using only the high purity
category of the two taggers with 2018 data are also compared in [@fig:comp_2018]. There, the cross section limits
calculated using the deep boosted tagger are slightly lower than those of the N-subjettiness tagger. This shows that the
optimisation method works, but that the assumption of it also applying to the combined dataset did not hold.

This can be explained by training issues that were identified recently.
The DeepAK8 tagger was trained for the data of 2016 and therefore performs differently on the data of the other years.
This caused the DeepAK8 tagger to perform significantly worse than it could have, for two reasons.
First, the optimisation done for the 2018 data could not be applied to the other datasets. Second, even for the 2016
data, a newer version of the background simulation was used which, in combination with the samples used for the signal,
turned out to be the worst-case scenario for the training used.
Recently, the training was improved to perform better across all datasets, but those changes could not be incorporated
into this thesis within a reasonable timeframe.


\begin{figure}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/limit_comp_w.pdf}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\includegraphics{./figures/limit_comp_z.pdf}
\end{minipage}
\caption{Comparison of expected limits of the different taggers using different datasets. Left: decay to qW. Right:
decay to qZ}
\label{fig:limit_comp}
\end{figure}


![Comparison of the deep boosted and the N-subjettiness tagger in the high purity category using the data from year
2018.](./figures/limit_comp_2018.pdf){#fig:comp_2018 width=70%}

\clearpage
\newpage

# Summary

In this thesis, a search for the q\* particle decaying to q + W and q + Z was presented. Proton-proton collision data
corresponding to an integrated luminosity of $\SI{137.19}{\per\femto\barn}$, collected by the CMS experiment at the LHC
at a centre-of-mass energy of $\sqrt{s} = \SI{13}{\tera\eV}$, were analysed. In addition, a partial dataset of
$\SI{35.92}{\per\femto\barn}$ was analysed to allow a comparison of the results with previous research. Monte Carlo
simulations were used to estimate the QCD background and the signal.

A selection was introduced to reduce background events and enhance signal sensitivity. This selection required at least
two jets, a pseudorapidity separation of $|\Delta\eta| \le 1.3$ between the two highest $p_t$ jets, an invariant mass of
the two highest $p_t$ jets greater than $\SI{1050}{\giga\eV}$ and a soft-drop mass of at least one jet between
$\SI{35}{\giga\eV}$ and $\SI{105}{\giga\eV}$.
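
To illustrate how these requirements combine, the following is a minimal sketch and not the code used in the analysis.
The `Jet` container, its field names and the function names are hypothetical. The pseudorapidity requirement is
implemented as an upper bound on $|\Delta\eta|$, as is customary in dijet resonance searches, and the soft-drop mass
window is assumed to be inclusive and to apply to the two highest $p_t$ jets; both are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Jet:
    """Hypothetical jet container; field names are illustrative only."""
    pt: float             # transverse momentum in GeV
    eta: float            # pseudorapidity
    phi: float            # azimuthal angle in rad
    mass: float           # jet mass in GeV
    softdrop_mass: float  # soft-drop mass in GeV


def four_momentum(jet):
    """Convert (pt, eta, phi, mass) to Cartesian (E, px, py, pz)."""
    px = jet.pt * math.cos(jet.phi)
    py = jet.pt * math.sin(jet.phi)
    pz = jet.pt * math.sinh(jet.eta)
    energy = math.sqrt(px**2 + py**2 + pz**2 + jet.mass**2)
    return energy, px, py, pz


def dijet_mass(j1, j2):
    """Invariant mass of the two-jet system in GeV."""
    e1, x1, y1, z1 = four_momentum(j1)
    e2, x2, y2, z2 = four_momentum(j2)
    m2 = (e1 + e2)**2 - (x1 + x2)**2 - (y1 + y2)**2 - (z1 + z2)**2
    return math.sqrt(max(m2, 0.0))  # guard against rounding below zero


def passes_selection(jets, max_delta_eta=1.3, min_mjj=1050.0,
                     sd_window=(35.0, 105.0)):
    """Apply the four selection requirements to a list of jets."""
    if len(jets) < 2:
        return False
    # The two jets with the highest transverse momentum
    j1, j2 = sorted(jets, key=lambda j: j.pt, reverse=True)[:2]
    # Assumption: upper bound on the pseudorapidity separation
    if abs(j1.eta - j2.eta) > max_delta_eta:
        return False
    # Dijet invariant mass must exceed the threshold
    if dijet_mass(j1, j2) <= min_mjj:
        return False
    # At least one leading jet in the soft-drop mass window
    low, high = sd_window
    return any(low <= j.softdrop_mass <= high for j in (j1, j2))
```

An event with two back-to-back jets of $p_t = \SI{1}{\tera\eV}$ each, a small pseudorapidity separation and one jet with
a soft-drop mass near the W boson mass would pass this sketch of the selection.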

Two taggers, the DeepAK8 and the N-subjettiness tagger, have been used to identify jets originating from the decay of a
vector boson. For both of them, two categories were introduced: a high purity category, aiming for maximal signal
sensitivity in the low TeV region of the invariant mass spectrum, and a low purity category, aiming for better
statistics in the high TeV region. For the DeepAK8 tagger, a high purity category of $VvsQCD > 0.95$ and a low purity
category of $VvsQCD \le 0.95$ were used. For the N-subjettiness tagger, the high purity category was $\tau_{21} < 0.35$
and the low purity category $0.35 < \tau_{21} < 0.75$. These values were obtained by optimising for the highest possible
significance of the signal.
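
The category definitions can be written down compactly. The following is a minimal sketch with hypothetical function
names and category labels, not the analysis code. The text leaves the assignment of a jet with $\tau_{21}$ exactly equal
to 0.35 open; assigning it to the low purity category is an assumption made here.

```python
def deepak8_category(v_vs_qcd, hp_threshold=0.95):
    """Return 'HP' or 'LP' for a DeepAK8 VvsQCD score."""
    return "HP" if v_vs_qcd > hp_threshold else "LP"


def tau21_category(tau21, hp_threshold=0.35, lp_threshold=0.75):
    """Return 'HP', 'LP' or None (rejected) for an N-subjettiness ratio."""
    if tau21 < hp_threshold:
        return "HP"
    # Assumption: the boundary value tau21 == hp_threshold counts as low purity
    if tau21 < lp_threshold:
        return "LP"
    return None
```

Note the opposite direction of the two cuts: a high DeepAK8 score indicates a vector boson jet, whereas for the
N-subjettiness tagger a low $\tau_{21}$ value does.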

A combined fit of background plus signal to the dijet invariant mass distribution has been used to determine their shape
parameters and the expected signal rate. With those results, the cross section limits were extracted from the data.
Because no significant deviation from the Standard Model was observed, new exclusion limits on the mass of the q\*
particle were set. These are 6.1 TeV for the decay to qW and 5.5 TeV for the decay to qZ. Those limits are about 1 TeV
higher than the ones from previous research, which found 5 TeV and 4.7 TeV, respectively.

The DeepAK8 tagger performed worse than expected. This can be explained by recently identified training issues.
With an updated training, the presented results are therefore expected to improve further.

\newpage

\nocite{*}