Running ILD simulation and reconstruction

This exercise shows you how to run the full simulation with ddsim and the reconstruction with the Gaudi-based Key4hep framework. You will

  • Run ddsim to produce SIM level input files for the reconstruction in EDM4hep format

  • Learn how to use the tools provided by k4MarlinWrapper, which allow you to run workflows that were originally developed for Marlin within the Gaudi-based Key4hep framework. This includes

    • Converting a Marlin steering file to a Gaudi options file,

    • Adapting the options file to be able to read and write EDM4hep output

    • Running this Gaudi options file via k4run

Here we use the ILD configuration, but the conceptual steps are very similar for other detector concepts that originally used Marlin.

Setup

If you haven’t done it yet, source a Key4hep software environment via

source /cvmfs/sw.hsf.org/key4hep/setup.sh

For the remainder of the tutorial we will assume that you are working within the key4hep_tut_ild_reco directory, i.e.

mkdir key4hep_tut_ild_reco
cd key4hep_tut_ild_reco 

However, this is a minor detail and you can choose whatever directory you want. We do suggest a clean directory though.

Next, we need the standard ILD simulation and reconstruction configuration, which we can get via

git clone https://github.com/iLCSoft/ILDConfig

For the rest of this tutorial we will be working in the ILDConfig/StandardConfig/production folder

cd ILDConfig/StandardConfig/production

Running the simulation

We use the output file of the Whizard tutorial as generator-level input. If you have not done that exercise, you can download one via

wget https://raw.githubusercontent.com/key4hep/key4hep-tutorials/main/gaudi_ild_reco/input_files/zh_mumu.slcio

Simulating a few events with ddsim is straightforward. ddsim can produce EDM4hep and LCIO output files, and it decides which format to use based on the name of the output file:

  • Names ending in .slcio result in LCIO output files

  • Names ending in edm4hep.root result in EDM4hep output files

Most of this exercise only needs the EDM4hep format; the LCIO output is used only in the "Running the reconstruction with LCIO" step further below. Both options are given here for convenience.
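The naming rule above can be modelled with a small Python function. This only illustrates the two cases listed here and is not ddsim's actual implementation; the function name ddsim_output_format is hypothetical, and in this sketch any other file name raises an error, whereas ddsim itself may accept further suffixes.

```python
def ddsim_output_format(output_file):
    """Illustrative model of how the output file name selects the format.

    Only the two cases described in the text are covered; this is not
    ddsim's own code.
    """
    if output_file.endswith(".slcio"):
        return "LCIO"
    if output_file.endswith("edm4hep.root"):
        return "EDM4hep"
    # Simplification of this sketch: anything else is rejected here
    raise ValueError(f"no output format known for {output_file!r}")


for name in ("zh_mumu_SIM.slcio", "zh_mumu_SIM.edm4hep.root"):
    print(f"{name} -> {ddsim_output_format(name)}")
```

Note that the EDM4hep rule checks the full edm4hep.root ending, so a plain .root name does not select EDM4hep in this model.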

To run the simulation with EDM4hep output you can use the following command

ddsim --compactFile $k4geo_DIR/ILD/compact/ILD_l5_v02/ILD_l5_v02.xml \
      --steeringFile ddsim_steer.py \
      --inputFiles zh_mumu.slcio \
      --outputFile zh_mumu_SIM.edm4hep.root

To run the simulation with LCIO output you can use the following command

ddsim --compactFile $k4geo_DIR/ILD/compact/ILD_l5_v02/ILD_l5_v02.xml \
      --steeringFile ddsim_steer.py \
      --inputFiles zh_mumu.slcio \
      --outputFile zh_mumu_SIM.slcio

Depending on your machine, this can take up to a few minutes to complete. You can start it and read on in the meantime.

Reconstruction

To run the reconstruction we use the Gaudi-based Key4hep framework. Note that the reconstruction can also be run with Marlin, just as in iLCSoft; however, this tutorial does not cover that.

Using ILDReconstruction.py from ILDConfig

To run the standard reconstruction, simply use the following command

k4run ILDReconstruction.py \
      --detectorModel=ILD_l5_o1_v02 \
      --inputFiles=zh_mumu_SIM.edm4hep.root \
      --outputFileBase=zh_mumu

The ILDReconstruction.py configuration can handle several different ILD detector configurations, so we have to choose one via the --detectorModel argument. Make sure that it is compatible with the one used for the simulation.

This will produce several new output files

  • zh_mumu_REC.edm4hep.root - The output of the reconstruction in EDM4hep format

  • zh_mumu_AIDA.root - The histograms that were produced by wrapped processors using the AIDA interface

  • zh_mumu_PfoAnalysis.root - The output file of the Pandora PFO analysis processor
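To check that the reconstruction produced everything, a small script like the following can help. It assumes, based on the three names listed above for --outputFileBase=zh_mumu, that each output name is the base name plus a fixed suffix; the helper reco_output_files is hypothetical and does not read anything from ILDReconstruction.py.

```python
from pathlib import Path


def reco_output_files(output_file_base):
    """Hypothetical helper: expected output names for a given --outputFileBase.

    The suffixes are copied from the file list in the tutorial text.
    """
    return {
        "EDM4hep reconstruction output": f"{output_file_base}_REC.edm4hep.root",
        "AIDA histograms": f"{output_file_base}_AIDA.root",
        "Pandora PFO analysis": f"{output_file_base}_PfoAnalysis.root",
    }


if __name__ == "__main__":
    # Report which of the expected files exist in the current directory
    for description, name in reco_output_files("zh_mumu").items():
        status = "found" if Path(name).exists() else "missing"
        print(f"{description:30} {name:30} {status}")
```

Run it from the directory where you called k4run; a "missing" entry points to a failed or differently configured run.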

ILDReconstruction.py also supports reading LCIO input and detects this automatically from the input file name. It can also produce LCIO output files via the --lcioOutput flag, which can be off (default), on, or only (for not producing any EDM4hep output).

Check the introduction to EDM4hep / podio for more details on how to read and analyse the zh_mumu_REC.edm4hep.root file.
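The three values of --lcioOutput determine which output formats ILDReconstruction.py writes. The sketch below simply encodes that description as a lookup table; planned_output_formats is a hypothetical name, and this is not how ILDReconstruction.py implements the flag.

```python
def planned_output_formats(lcio_output="off"):
    """Return the set of output formats written for a given --lcioOutput value."""
    formats = {
        "off": {"EDM4hep"},           # default: EDM4hep only
        "on": {"EDM4hep", "LCIO"},    # both formats
        "only": {"LCIO"},             # LCIO only, no EDM4hep output
    }
    if lcio_output not in formats:
        raise ValueError(f"--lcioOutput must be one of {sorted(formats)}, got {lcio_output!r}")
    return formats[lcio_output]


for value in ("off", "on", "only"):
    print(f"--lcioOutput={value}: {sorted(planned_output_formats(value))}")
```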

Creating a Gaudi options file

Warning

These instructions show you how to convert an existing Marlin steering file to a Gaudi options file, using the full ILD reconstruction as an example since it exhibits a few of the issues that you might run into along the way. However, for actually running the reconstruction we strongly recommend using the existing reconstruction configuration from ILDConfig. It has more features and also supports running with different detector models, which the following steps do not!

The bulk of the work for creating such an options file from an existing Marlin steering file in XML format can be done with the convertMarlinSteeringToGaudi.py converter script. We will start by converting the MarlinStdReco.xml steering file and then make some minor adjustments to the converted options file. The main thing to consider for the ILD configuration is that MarlinStdReco.xml uses several include statements to pull in further configuration. Hence, we first have to create a Marlin steering file with these includes resolved. We also have to provide a DetectorModel constant here, since some of the includes depend on it.

Marlin -n MarlinStdReco.xml --constant.DetectorModel=ILD_l5_o1_v02

You should now have a MarlinStdRecoParsed.xml file. This is the one that we will convert using the converter script via

convertMarlinSteeringToGaudi.py MarlinStdRecoParsed.xml MarlinStdReco.py

Since some parts of the Marlin steering file cannot be converted automatically, we have to make a few adjustments to MarlinStdReco.py. We recommend editing the file directly, but you can also use the sed commands below. The adjustments are:

  • Give the lcgeo_DIR constant (first entry in the CONSTANTS dict) a meaningful value. The easiest way to do this is to simply get the value of the corresponding environment variable via os.environ["lcgeo_DIR"] (don’t forget to import os at the top)

  • Exclude the BgOverlayWW, BgOverlayBB, BgOverlayBW, BgOverlayWB and PairBgOverlay algorithms from being run by commenting out the lines where they are appended to the algList (this list is populated near the end of the file).

sed commands for adjustments
sed -i '1s/^/import os\n/' MarlinStdReco.py
sed -i 's/\( *.lcgeo_DIR.:\).*/\1 os.environ["lcgeo_DIR"],/' MarlinStdReco.py
sed -i 's/algList.append(BgOverlayWW)/# algList.append(BgOverlayWW)/' MarlinStdReco.py
sed -i 's/algList.append(BgOverlayWB)/# algList.append(BgOverlayWB)/' MarlinStdReco.py
sed -i 's/algList.append(BgOverlayBW)/# algList.append(BgOverlayBW)/' MarlinStdReco.py
sed -i 's/algList.append(BgOverlayBB)/# algList.append(BgOverlayBB)/' MarlinStdReco.py
sed -i 's/algList.append(PairBgOverlay)/# algList.append(PairBgOverlay)/' MarlinStdReco.py
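If you prefer Python over sed, the same adjustments can be applied with a short script. This is a sketch that mirrors the sed commands above (same regular expression, same list of overlay algorithms); the script and its adjust_options function are not part of k4MarlinWrapper. Like the sed commands, it is not idempotent: running it twice adds import os twice.

```python
import re
from pathlib import Path

# Algorithms whose algList.append(...) lines are commented out
BG_OVERLAYS = ["BgOverlayWW", "BgOverlayWB", "BgOverlayBW", "BgOverlayBB", "PairBgOverlay"]


def adjust_options(text):
    """Apply the same edits as the sed commands to the options file text."""
    # Prepend "import os" (like sed '1s/^/import os\n/')
    text = "import os\n" + text
    # Replace the value of the lcgeo_DIR entry with the environment variable
    text = re.sub(r'( *.lcgeo_DIR.:).*', r'\1 os.environ["lcgeo_DIR"],', text)
    # Comment out the background overlay algorithms
    for alg in BG_OVERLAYS:
        text = text.replace(f"algList.append({alg})", f"# algList.append({alg})")
    return text


if __name__ == "__main__":
    options = Path("MarlinStdReco.py")
    if options.exists():
        options.write_text(adjust_options(options.read_text()))
    else:
        print("MarlinStdReco.py not found, run the converter script first")
```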

In its current state, the options file can already be run with LCIO input.

Running the reconstruction with LCIO

To run the reconstruction with LCIO input and output, we now simply pass in the LCIO file created in the simulation step

k4run MarlinStdReco.py --LcioEvent.Files=zh_mumu_SIM.slcio

This should take between 20 seconds and roughly a minute. If you haven't changed anything else, you should now have a few output files:

ls StandardReco_*.*

should now show a REC and a DST file, as well as a PfoAnalysis and an AIDA file. You can change the names of these files by adjusting OutputBaseName or the corresponding file-name constants in CONSTANTS.

Adapting the options file for EDM4hep

To read and write EDM4hep, the Gaudi options file needs a few further adaptations:

  • Replace the LcioEvent algorithm with the PodioInput algorithm

    • Make sure to replace the Files option with the collections option and to populate this option with the list of collections you want to read (see below)

  • Replace the EventDataSvc with the k4DataSvc (remember to instantiate it with "EventDataSvc" as name)

  • Add a PodioOutput algorithm to write EDM4hep output (don’t forget to add it to the algList at the very end)

    • (For the sake of this exercise) configure this to only write the MCParticlesSkimmed, PandoraPFOs and the RecoMCTruthLink collections

  • Attach the necessary in-memory on-the-fly converters between EDM4hep and LCIO (and vice versa)

    • For the conversion of the EDM4hep inputs to LCIO, instantiate an EDM4hep2LcioTool and attach it to the first wrapped processor that is run (MyAIDAProcessor). See the detailed description below.

    • For the conversion of the LCIO outputs to EDM4hep, instantiate an Lcio2EDM4hepTool and attach it to the last wrapped processor that runs before the PodioOutput algorithm you just added (MyPfoAnalysis). Also see below.
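In this setup, the only collection whose name differs between the two formats is the MC particle collection: the EDM4hep simulation output calls it MCParticles, while the wrapped ILD processors expect the LCIO name MCParticle. The collNameMapping dictionaries configured for the two converter tools are therefore inverses of each other. The plain-Python sketch below illustrates this renaming; the helper names are hypothetical, and it assumes that collections without a mapping entry are converted under their unchanged name.

```python
# EDM4hep -> LCIO names, as configured for the EDM4hep2LcioTool in this tutorial
edm4hep_to_lcio = {"MCParticles": "MCParticle"}

# The Lcio2EDM4hepTool mapping is simply the inverse (LCIO -> EDM4hep)
lcio_to_edm4hep = {lcio: edm4hep for edm4hep, lcio in edm4hep_to_lcio.items()}


def to_lcio_name(edm4hep_name):
    # Assumption of this sketch: unmapped collections keep their name
    return edm4hep_to_lcio.get(edm4hep_name, edm4hep_name)


def to_edm4hep_name(lcio_name):
    return lcio_to_edm4hep.get(lcio_name, lcio_name)


for name in ("MCParticles", "TPCCollection"):
    lcio = to_lcio_name(name)
    print(f"{name} -> {lcio} -> {to_edm4hep_name(lcio)}")
```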

For all of these steps make sure that you import all the necessary tools and algorithms from Configurables!

The top of your file should now look something like this

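import os
from Gaudi.Configuration import *  # provides INFO and the other output levels
from Configurables import (
    PodioInput, PodioOutput, k4DataSvc, MarlinProcessorWrapper,
    EDM4hep2LcioTool, Lcio2EDM4hepTool
    )
from k4MarlinWrapper.parseConstants import *
algList = []
# k4DataSvc replaces the EventDataSvc and must be registered under that name
evtsvc = k4DataSvc("EventDataSvc")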

while the configuration for the input reader and the EDM4hep2LcioTool should look like this

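# PodioInput replaces the LcioEvent reader
read = PodioInput()
read.OutputLevel = INFO
read.collections = [
    # ... list of collection names
]
algList.append(read)

# Converts the EDM4hep input collections to LCIO for the wrapped processors,
# renaming MCParticles (EDM4hep) to MCParticle (LCIO)
edm4hep2LcioConv = EDM4hep2LcioTool()
edm4hep2LcioConv.collNameMapping = {
    "MCParticles": "MCParticle"
}

# ... Unchanged config of MyAIDAProcessor

# Attach the converter to the first wrapped processor so that it runs before it
MyAIDAProcessor.EDM4hep2LcioTool = edm4hep2LcioConv

list of collection names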

The list of collections produced by the standard ILD simulation configuration looks like this. You can simply copy it into the options file:

read.collections = [
     "BeamCalCollection",
     "BeamCalCollectionContributions",
     "ECalBarrelScHitsEven",
     "ECalBarrelScHitsEvenContributions",
     "ECalBarrelScHitsOdd",
     "ECalBarrelScHitsOddContributions",
     "ECalBarrelSiHitsEven",
     "ECalBarrelSiHitsEvenContributions",
     "ECalBarrelSiHitsOdd",
     "ECalBarrelSiHitsOddContributions",
     "EcalEndcapRingCollection",
     "EcalEndcapRingCollectionContributions",
     "ECalEndcapScHitsEven",
     "ECalEndcapScHitsEvenContributions",
     "ECalEndcapScHitsOdd",
     "ECalEndcapScHitsOddContributions",
     "ECalEndcapSiHitsEven",
     "ECalEndcapSiHitsEvenContributions",
     "ECalEndcapSiHitsOdd",
     "ECalEndcapSiHitsOddContributions",
     "EventHeader",
     "FTDCollection",
     "HcalBarrelRegCollection",
     "HcalBarrelRegCollectionContributions",
     "HCalBarrelRPCHits",
     "HCalBarrelRPCHitsContributions",
     "HCalECRingRPCHits",
     "HCalECRingRPCHitsContributions",
     "HcalEndcapRingCollection",
     "HcalEndcapRingCollectionContributions",
     "HCalEndcapRPCHits",
     "HCalEndcapRPCHitsContributions",
     "HcalEndcapsCollection",
     "HcalEndcapsCollectionContributions",
     "LHCalCollection",
     "LHCalCollectionContributions",
     "LumiCalCollection",
     "LumiCalCollectionContributions",
     "MCParticles",
     "SETCollection",
     "SITCollection",
     "TPCCollection",
     "TPCLowPtCollection",
     "TPCSpacePointCollection",
     "VXDCollection",
     "YokeBarrelCollection",
     "YokeBarrelCollectionContributions",
     "YokeEndcapsCollection",
     "YokeEndcapsCollectionContributions",
]

Finally, the configuration of the PodioOutput algorithm and the Lcio2EDM4hepTool should look something like this:

# ... MyPfoAnalysis configuration unchanged

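# Converts the LCIO collections back to EDM4hep after MyPfoAnalysis has run,
# renaming MCParticle (LCIO) to MCParticles (EDM4hep)
lcio2edm4hepConv = Lcio2EDM4hepTool()
lcio2edm4hepConv.collNameMapping = {
    "MCParticle": "MCParticles"
}
MyPfoAnalysis.Lcio2EDM4hepTool = lcio2edm4hepConv

# Write only the selected collections to the EDM4hep output file
edm4hepOutput = PodioOutput()
edm4hepOutput.filename = "zh_mumu_reco.edm4hep.root"
edm4hepOutput.outputCommands = [
    "drop *",
    "keep MCParticlesSkimmed",
    "keep PandoraPFOs",
    "keep RecoMCTruthLink",
]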

# ... the complete algList
algList.append(edm4hepOutput)

# ... ApplicationMgr config
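With "drop *" as the first entry, only the collections that are explicitly kept afterwards end up in the output file. The sketch below models this selection in plain Python, assuming that the commands are applied in order, that the last matching command decides, that unmatched collections are kept, and that * behaves like a shell wildcard. It is an illustration, not the PodioOutput implementation; select_collections is a hypothetical name and the names in available are only an example.

```python
from fnmatch import fnmatchcase


def select_collections(collections, output_commands):
    """Return the collections that a list of keep/drop commands would write.

    Assumptions of this sketch: commands are applied in order, the last
    matching command decides, and unmatched collections are kept.
    """
    selected = []
    for name in collections:
        keep = True
        for command in output_commands:
            action, pattern = command.split(maxsplit=1)
            if fnmatchcase(name, pattern):
                keep = action == "keep"
        if keep:
            selected.append(name)
    return selected


commands = ["drop *", "keep MCParticlesSkimmed", "keep PandoraPFOs", "keep RecoMCTruthLink"]
# Example collection names, not the full content of a reconstructed event
available = ["MCParticles", "MCParticlesSkimmed", "PandoraPFOs", "RecoMCTruthLink", "TPCCollection"]
print(select_collections(available, commands))
# ['MCParticlesSkimmed', 'PandoraPFOs', 'RecoMCTruthLink']
```

Starting with "drop *" turns the command list into an explicit whitelist, which keeps the output small even if the reconstruction adds new collections.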

Running the reconstruction with k4run

After all these adaptations it is now possible to run the full reconstruction chain on the previously simulated input with k4run

k4run MarlinStdReco.py --num-events=3 --EventDataSvc.input=zh_mumu_SIM.edm4hep.root

Here we again use the command line to specify the input file; we could just as well have set the input option of the evtsvc in the options file. Note also that we explicitly pass in the number of events; this is a workaround for this issue.
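For reference, setting the input in the options file instead of on the command line could look like the following fragment. It assumes the evtsvc instance created at the top of MarlinStdReco.py and only works inside a Key4hep environment when run via k4run, so it is a configuration fragment rather than a standalone script.

```python
# In MarlinStdReco.py, directly after the k4DataSvc instantiation at the top
evtsvc = k4DataSvc("EventDataSvc")
# Same effect as passing --EventDataSvc.input=zh_mumu_SIM.edm4hep.root to k4run
evtsvc.input = "zh_mumu_SIM.edm4hep.root"
```

With this in place, k4run MarlinStdReco.py --num-events=3 should be enough to start the reconstruction.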

You should now have a zh_mumu_reco.edm4hep.root file containing the MCParticlesSkimmed, PandoraPFOs and RecoMCTruthLink collections. To write more of the event, tweak the edm4hepOutput.outputCommands option to keep further collections you find interesting. Also note that the REC and DST LCIO output files are still produced. Can you reproduce these data tiers for EDM4hep?