Reading and writing EDM4hep files in Gaudi

The facilities to read and write EDM4hep (or, more generally, any event data model based on podio) are provided by k4FWCore. This page describes how to use them without going into much detail about their internals. It also assumes some familiarity with Gaudi, i.e. most of the snippets show only the relevant part of a configuration, not a complete runnable example.

The k4DataSvc

Whenever you want to work with EDM4hep in the Gaudi-based framework of Key4hep, you need to use the k4DataSvc as the EventDataSvc. You can instantiate and configure this service as follows:

from Gaudi.Configuration import *
from Configurables import k4DataSvc

evtSvc = k4DataSvc("EventDataSvc")

It is important that the name is EventDataSvc, because Gaudi assumes this name for the event data service. Once you have instantiated the k4DataSvc, you still have to make the ApplicationMgr aware of it by adding evtSvc to the list of external services (ExtSvc):

from Configurables import ApplicationMgr
ApplicationMgr(
    # other args
    ExtSvc = [evtSvc]
)

Reading events

To read events you need the PodioInput algorithm in addition to the k4DataSvc. Currently, the input file is passed to the k4DataSvc via its input option, while the collections that you want to read are passed to the PodioInput. We are working on making this more consistent (discussion happens in this issue). The parts of your options file related to reading EDM4hep files will look something like this:

from Configurables import PodioInput, k4DataSvc

evtSvc = k4DataSvc("EventDataSvc")
evtSvc.input = "/path/to/your/input-file.root"

podioInput = PodioInput()

It is possible to change the input file from the command line via

k4run <your-options-file> --EventDataSvc.input=<input-file>

By default the PodioInput will read all collections that are available from the input file. It is possible to limit the collections that should become available via the collections option

podioInput.collections = [
  # List of collection names that should be made available
]

Writing events

To write events you will need to use the PodioOutput algorithm in addition to the k4DataSvc:

from Configurables import PodioOutput

podioOutput = PodioOutput("PodioOutput", filename="my_output.root")

By default this will write the complete event contents to the output file.
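The reading and writing snippets above can be combined into one options file. The following is a minimal sketch, not a reference configuration: the file paths are placeholders, and the values chosen for EvtMax and OutputLevel are assumptions made for illustration. It relies on the fact that Gaudi only runs algorithms that are listed in the TopAlg property of the ApplicationMgr.

```python
from Gaudi.Configuration import INFO
from Configurables import ApplicationMgr, PodioInput, PodioOutput, k4DataSvc

# The event data service must be named "EventDataSvc"
evtSvc = k4DataSvc("EventDataSvc")
evtSvc.input = "/path/to/your/input-file.root"  # placeholder path

# Reads all collections from the input file by default
podioInput = PodioInput("PodioInput")

# Writes the complete event contents by default
podioOutput = PodioOutput("PodioOutput", filename="my_output.root")

ApplicationMgr(
    TopAlg=[podioInput, podioOutput],  # algorithms run in this order
    EvtSel="NONE",                     # no Gaudi event selector, PodioInput provides the events
    EvtMax=10,                         # assumption: process only the first 10 events
    ExtSvc=[evtSvc],
    OutputLevel=INFO,                  # assumption: default verbosity
)
```

Any algorithms that process the events would go between podioInput and podioOutput in TopAlg.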

Writing only a subset of collections

Sometimes it is desirable to write only a subset of the collections available in the EventStore. The PodioOutput supports this via the outputCommands option, which takes a list of keep or drop commands. Each command consists of the keep or drop keyword followed by a target. The target is a collection name that may include the ? or * wildcards. This might look like the following

podioOutput.outputCommands = ["keep *"]

which will keep everything (the default), while

podioOutput.outputCommands = ["drop *"]

will simply drop all collections and effectively write an empty file (apart from some metadata). A common pattern is to "drop *" and then selectively add keep commands for the collections you want to retain, e.g. to keep only the highest-level MC and reco information:

podioOutput.outputCommands = [
    "drop *",
    "keep MCParticlesSkimmed",
    "keep PandoraPFOs",
    "keep RecoMCTruthLink",
]
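
To get a feeling for how a list of keep and drop commands narrows down the set of collections, the selection can be mimicked in plain Python. This is only an illustrative sketch and not the code that PodioOutput uses: the function select_collections is hypothetical, it assumes that commands are applied in order so that later commands override earlier ones, and it uses fnmatchcase from the standard library for the wildcard matching (which, unlike the description above, also understands [...] character sets). The collection names SiTracks and PandoraClusters are made up for the example.

```python
from fnmatch import fnmatchcase


def select_collections(available, output_commands):
    """Return the sorted names from `available` that survive the commands.

    Assumption: commands are applied in order, later ones override earlier ones.
    """
    kept = set(available)  # without any command everything is kept
    for command in output_commands:
        parts = command.split()
        # Each command must be "<keep|drop> <target>"
        if len(parts) != 2 or parts[0] not in ("keep", "drop"):
            raise ValueError(f"malformed output command: {command!r}")
        action, pattern = parts
        # Case-sensitive wildcard match of the target against all names
        matches = {name for name in available if fnmatchcase(name, pattern)}
        if action == "keep":
            kept |= matches
        else:
            kept -= matches
    return sorted(kept)


# Hypothetical event content, for illustration only
available = [
    "MCParticlesSkimmed",
    "PandoraPFOs",
    "PandoraClusters",
    "RecoMCTruthLink",
    "SiTracks",
]

commands = [
    "drop *",
    "keep MCParticlesSkimmed",
    "keep PandoraPFOs",
    "keep RecoMCTruthLink",
]

print(select_collections(available, commands))
```

With the commands from the example above, only the three explicitly kept collections remain. Replacing the keep commands with a single "keep Pandora*" would instead retain every collection whose name starts with Pandora.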