# Running ILD simulation and reconstruction

This exercise aims at showing you how to run full simulation as well as
reconstruction using `ddsim` and the Gaudi based Key4hep framework respectively.
You will
- Run `ddsim` to produce SIM level input files for the reconstruction in EDM4hep
  format
- Learn how to use the tools provided by
  [`k4MarlinWrapper`](https://github.com/key4hep/k4MarlinWrapper) that allows to
  run workflows that were originally developed for the `Marlin` in the Gaudi
  based framework of Key4hep. This includes
  - Converting a Marlin steering file to a Gaudi options file,
  - Adapting the options file to be able to read and write EDM4hep output
  - Running this Gaudi options file via `k4run`

In this particular case we are using the ILD configuration to do this but the
conceptual steps are very similar for other detector concepts that used Marlin
originally. 

## Setup
If you haven't done it yet, source a Key4hep software environment via

```bash
source /cvmfs/sw.hsf.org/key4hep/setup.sh
```

For the remainder of the tutorial we will assume that you are working within the
`key4hep_tut_ild_reco` directory, i.e. 

```bash 
mkdir key4hep_tut_ild_reco
cd key4hep_tut_ild_reco 
``` 

However, this is a minor detail and you can choose whatever directory you want.
We do suggest a clean directory though.

Next we will be using the standard simulation and reconstruction
configuration for ILD which we can get via

```bash
git clone https://github.com/iLCSoft/ILDConfig
```

For the rest of this tutorial we will be working in the
`ILDConfig/StandardConfig/production` folder

```bash
cd ILDConfig/StandardConfig/production
```

## Running the simulation

We will use the output file of [*the whizard
tutorial*](https://github.com/key4hep/key4hep-tutorials/blob/main/whizard_gen/README.md)
as generator level input. In case you have not done that exercise you can get
one via

```bash
wget https://raw.githubusercontent.com/key4hep/key4hep-tutorials/main/gaudi_ild_reco/input_files/zh_mumu.slcio
```

Simulating a few events with `ddsim` is straight forward. `ddsim` can produce
EDM4hep and LCIO format output files, and it decides which format to used based
on the name of the output file:
- Names ending on `.slcio` will result in LCIO output files
- Names ending in `edm4hep.root` will result in in EDM4hep output files

In the course of this exercise we will only need the EDM4hep format, we simply
provide both options for convenience here.

::::{tab-set}
:::{tab-item} EDM4hep
:sync: edm4hep

To run the simulation with EDM4hep output you can use the following command
```bash
ddsim --compactFile $k4geo_DIR/ILD/compact/ILD_l5_v02/ILD_l5_v02.xml \
      --steeringFile ddsim_steer.py \
      --inputFiles zh_mumu.slcio \
      --outputFile zh_mumu_SIM.edm4hep.root
```

:::
:::{tab-item} LCIO
:sync: lcio

To run the simulation with LCIO output you can use the following command
``` bash
ddsim --compactFile $k4geo_DIR/ILD/compact/ILD_l5_v02/ILD_l5_v02.xml \
      --steeringFile ddsim_steer.py \
      --inputFiles zh_mumu.slcio \
      --outputFile zh_mumu_SIM.slcio
```

:::
::::

Depending on the machine where you are running this, this will take up to a few
minutes to complete. You can start this and read on in the meantime.

## Reconstruction 

To run the reconstruction we will use the Gaudi based Key4hep framework. Note
that we can run the reconstruction just the same as within iLCSoft via `Marlin`.
However, we will not show that in this tutorial.

### Using `ILDReconstruction.py` from ILDConfig

In order to run the *standard reconstruction* simply run the following command

```bash
k4run ILDReconstruction.py \
      --detectorModel=ILD_l5_o1_v02 \
      --inputFiles=zh_mumu_SIM.edm4hep.root \
      --outputFileBase=zh_mumu
```

The `ILDReconstruction.py` configuration can handle several different ILD
detector configurations, so we have to choose one via the `--detectorModel`
argument. Make sure that this is compatible to the one that has been used for
simulation.

This will produce several new output files

- `zh_mumu_REC.edm4hep.root` - The output of the reconstruction in EDM4hep
  format
- `zh_mumu_AIDA.root` - The histograms that were produce by wrapped processors
  using the AIDA interface
- `zh_mumu_PfoAnalysis.root` - The output file of the Pandora PFO analysis
  processor

`ILDReconstruction.py` also supports reading LCIO inputs and will detect this
automatically from the input file name. It also supports producing LCIO output
files via the `--lcioOutput` flag, which can either be `off` (default), `on` or
`only` (for not producing any EMD4hep output).

Check [the introduction to EDM4hep /
podio](https://key4hep.github.io/key4hep-doc/how-tos/key4hep-tutorials/edm4hep_analysis/edm4hep_api_intro.html)
for more details on how to read and analyse this file.

### Creating a Gaudi options file

```{warning}
These instructions show you how to convert an existing Marlin steering file to a
Gaudi options file using the full ILD reconstruction as an example since it
exhibits a few of the potential issues that you might run into along the way.
However, we **strongly recommend using the existing reconstruction configuration
that is available from
[ILDConfig](https://github.com/iLCSoft/ILDConfig/tree/master/StandardConfig/production)
for actually running the reconstruction.** It has more features and also
supports running with different detector models, something that the following
steps will not achieve!.
```

The bulk of the work for creating such an options file from an existing Marlin
steering file in XML format can be done with the
`convertMarlinSteeringToGaudi.py` converter script. We will start by converting
the `MarlinStdReco.xml` steering file and then do some minor adjustments to the
converted options file. The main thing to consider for the ILD configuration is
that `MarlinStdReco.xml` makes use of several include statements to pull in more
configuration. Hence, we first have to create a Marlin steering file with these
includes resolved. We also have to provide a `DetectorModel` constant here,
since some of the includes depend on this.

```bash
Marlin -n MarlinStdReco.xml --constant.DetectorModel=ILD_l5_o1_v02
```

You should now have a `MarlinStdRecoParsed.xml` file. This is the one that we
will convert using the converter script via

```bash
convertMarlinSteeringToGaudi.py MarlinStdRecoParsed.xml MarlinStdReco.py
```

Since some parts of the Marlin steering file conversion can not be handled
automatically we have to make a few adjustments to `MarlinStdReco.py`. We
recommend to simply edit the file directly, but you can also use the `sed`
commands below to do these adjustments. The adjustments are:
- Give the `lcgeo_DIR` constant (first entry in the `CONSTANTS` dict) a
  meaningful value. The easiest way to do this is to simply get the value of the
  corresponding environment variable via `os.environ["lcgeo_DIR"]` (don't forget
  to `import os` at the top)
- Exclude the `BgOverlayWW`, `BgOverlayBB`, `BgOverlayBW`, `BgOverlayWB` and `PairBgOverlay`
  algorithms from being run, by simply commenting out the lines where these are
  appended to the `algList` (this list is populated at almost the end of the
  file).

:::{dropdown} `sed` commands for adjustments

``` bash
sed -i '1s/^/import os\n/' MarlinStdReco.py
sed -i 's/\( *.lcgeo_DIR.:\).*/\1 os.environ["lcgeo_DIR"],'/ MarlinStdReco.py
sed -i 's/algList.append(BgOverlayWW)/# algList.append(BgOverlayWW)/' MarlinStdReco.py
sed -i 's/algList.append(BgOverlayWB)/# algList.append(BgOverlayWB)/' MarlinStdReco.py
sed -i 's/algList.append(BgOverlayBW)/# algList.append(BgOverlayBW)/' MarlinStdReco.py
sed -i 's/algList.append(BgOverlayBB)/# algList.append(BgOverlayBB)/' MarlinStdReco.py
sed -i 's/algList.append(PairBgOverlay)/# algList.append(PairBgOverlay)/' MarlinStdReco.py
```

:::

With the state the options file is in now, you would be able to run it with LCIO
input.

:::{dropdown} Running the reconstruction with LCIO
To run the reconstruction with LCIO inputs and outputs we now simply need to
pass in the input file that we have created at the simulation step

```bash
k4run MarlinStdReco.py --LcioEvent.Files=zh_mumu_SIM.slcio
```

This should take somewhere between 20 seconds up to roughly a minute to run. If
you haven't changed anything else you should now have a few output files:

```bash
ls StandardReco_*.*
```

should now show a `REC` and `DST` file, as well as a `PfoAnalysis` and an `AIDA`
file. You can change the names of these files by adjusting the `OutputBaseName`,
resp. The corresponding filename constants values in `CONSTANTS`.

:::

### Adapting the options file for EDM4hep

It is necessary to adapt the Gaudi options file a bit further:
- Replace the `LcioEvent` algorithm with the `PodioInput` algorithm 
  - Make sure to replace the `Files` option with the `collections` option and to
    populate this option with the list of collections you want to read (see
    below)
- Replace the `EventDataSvc` with the `k4DataSvc` (remember to instantiate it
  with `"EventDataSvc"` as name)
- Add a `PodioOutput` algorithm to write EDM4hep output (don't forget to add it
  to the `algList` at the very end)
  - (For the sake of this exercise) configure this to only write the
    `MCParticlesSkimmed`, `PandoraPFOs` and the `RecoMCTruthLink` collections
- Attach the necessary in-memory on-the-fly converters between EDM4hep and LCIO
  (and vice versa)
  - For the conversion of the EDM4hep inputs to LCIO instantiate a
    `EDM4hep2LcioTool` and attach it to the first wrapped processor that is run
    (`MyAIDAProcessor`). See detailed description below.
  - For the conversion of the LCIO outputs to EDM4hep instantiate a
    `Lcio2EDM4hepTool` and attach it to the last wrapped processor that is run
    before the `PodioOutput` algorithm that you just added (`MyPfoAnalysis`).
    Also see below.
    
**For all of these steps make sure that you `import` all the necessary tools and
algorithms from `Configurables`!**
  
The top of your file should now look something like this

```python
from Configurables import (
    PodioInput, PodioOutput, k4DataSvc, MarlinProcessorWrapper,
    EDM4hep2LcioTool, Lcio2EDM4hepTool
    )
from k4MarlinWrapper.parseConstants import *
algList = []
evtsvc = k4DataSvc("EventDataSvc")
```

while the configuration for the input reader and the `EDM4hep2LcioTool` should
look like this

```python
read = PodioInput()
read.OutputLevel = INFO
read.collections = [
    # ... list of collection names
]
algList.append(read)

edm4hep2LcioConv = EDM4hep2LcioTool()
edm4hep2LcioConv.collNameMapping = {
    "MCParticles": "MCParticle"
}

# ... Unchanged config of MyAIDAProcessor

MyAIDAProcessor.EDM4hep2LcioTool = edm4hep2LcioConv
```

:::{dropdown} list of collection names

The list of collections that is populated by standard configuration of ILD for
simulation looks like this. You can simply copy this into the options file

```python
read.collections = [
     "BeamCalCollection",
     "BeamCalCollectionContributions",
     "ECalBarrelScHitsEven",
     "ECalBarrelScHitsEvenContributions",
     "ECalBarrelScHitsOdd",
     "ECalBarrelScHitsOddContributions",
     "ECalBarrelSiHitsEven",
     "ECalBarrelSiHitsEvenContributions",
     "ECalBarrelSiHitsOdd",
     "ECalBarrelSiHitsOddContributions",
     "EcalEndcapRingCollection",
     "EcalEndcapRingCollectionContributions",
     "ECalEndcapScHitsEven",
     "ECalEndcapScHitsEvenContributions",
     "ECalEndcapScHitsOdd",
     "ECalEndcapScHitsOddContributions",
     "ECalEndcapSiHitsEven",
     "ECalEndcapSiHitsEvenContributions",
     "ECalEndcapSiHitsOdd",
     "ECalEndcapSiHitsOddContributions",
     "EventHeader",
     "FTDCollection",
     "HcalBarrelRegCollection",
     "HcalBarrelRegCollectionContributions",
     "HCalBarrelRPCHits",
     "HCalBarrelRPCHitsContributions",
     "HCalECRingRPCHits",
     "HCalECRingRPCHitsContributions",
     "HcalEndcapRingCollection",
     "HcalEndcapRingCollectionContributions",
     "HCalEndcapRPCHits",
     "HCalEndcapRPCHitsContributions",
     "HcalEndcapsCollection",
     "HcalEndcapsCollectionContributions",
     "LHCalCollection",
     "LHCalCollectionContributions",
     "LumiCalCollection",
     "LumiCalCollectionContributions",
     "MCParticles",
     "SETCollection",
     "SITCollection",
     "TPCCollection",
     "TPCLowPtCollection",
     "TPCSpacePointCollection",
     "VXDCollection",
     "YokeBarrelCollection",
     "YokeBarrelCollectionContributions",
     "YokeEndcapsCollection",
     "YokeEndcapsCollectionContributions",
]
```

:::

Finally, the `PodioOutput` algorithm and the `Lcio2EDM4hepTool` can be
configuration should look something like this

```python
# ... MyPfoAnalysis configuration unchanged

lcio2edm4hepConv = Lcio2EDM4hepTool()
lcio2edm4hepConv.collNameMapping = {
    "MCParticle": "MCParticles"
}
MyPfoAnalysis.Lcio2EDM4hepTool = lcio2edm4hepConv

edm4hepOutput = PodioOutput()
edm4hepOutput.filename = "zh_mumu_reco.edm4hep.root"
edm4hepOutput.outputCommands = [
    "drop *",
    "keep MCParticlesSkimmed",
    "keep PandoraPFOs",
    "keep RecoMCTruthLink",
]

# ... the complete algList
algList.append(edm4hepOutput)

# ... ApplicationMgr config
```

### Running the reconstruction with `k4run`

After all these adaptions it is now possible to run the full reconstruction
chain on the previously simulated input with `k4run`

```bash
k4run MarlinStdReco.py --num-events=3 --EventDataSvc.input=zh_mumu_SIM.edm4hep.root
```

Here we are again using the command line to specify the input file, we could
have just as well used the `input` option of the `evtsvc` in the options file.
Note also that we explicitly pass in the number of events, this is a workaround
for [this issue](https://github.com/key4hep/k4MarlinWrapper/issues/94).

You should now have a `zh_mumu_reco.edm4hep.root` file that contains the
complete events in all their glory. For a more practical output you can tweak
the `edm4hepOutput.outputCommands` option in order to keep only "interesting"
collections. Also note that the REC and DST LCIO output files are still
produced. Can you reproduce these data tiers for EDM4hep?