How to run multithreading with k4MarlinWrapper (Gaudi)

Supported

  • Reading LCIO events with LcioEvent()

  • Writing LCIO events with LcioEventOutput()

  • Running MarlinProcessorWrappers with no converters

  • Using whiteboard as ExtSvc (no k4DataSvc or EventDataSvc)

Not supported

  • Using EDM converters EDM4hep2LcioTool and Lcio2EDM4hepTool()

  • Reading EDM4hep events with PodioInput()

  • Writing EDM4hep events with PodioOutput()

  • Writing EDM4hep events with MarlinProcessorWrapper of type LCIOOutputProcessor

  • Running non-thread-safe algorithms/processors in parallel

  • Running wrapped Marlin processors that use the isFirstEvent method inside processEvent to do some setup only on the first event. There is no way to make this thread-safe. If you want your processor to be usable in a multi-threaded environment, you have to move this setup to init.

Running Gaudi with multithreading support

Gaudi uses Intel TBB under the hood for multithreading.

Gaudi exposes two main levels of parallelism:

  • Inter-event parallelism: running multiple events in parallel

  • Intra-event parallelism: running multiple algorithms in parallel, within an event

The two levels of parallelism can be combined: events can run in parallel, and algorithms within the events can run in parallel.
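
As a rough illustration of how the two levels nest, the following plain-Python sketch uses two thread pools: one for events and one for the algorithms inside each event. It is only an analogy and does not use Gaudi or TBB; the names run_algorithm, process_event and run are hypothetical, and the algorithms are assumed to be independent of each other.

```python
from concurrent.futures import ThreadPoolExecutor


def run_algorithm(event_id, name):
    # Stand-in for one algorithm processing one event (hypothetical).
    return f"event {event_id}: {name}"


def process_event(event_id, algorithms, algo_pool):
    # Intra-event parallelism: the algorithms of one event run concurrently.
    futures = [algo_pool.submit(run_algorithm, event_id, name) for name in algorithms]
    return [f.result() for f in futures]


def run(events, algorithms, event_slots=2, threads=4):
    # Inter-event parallelism: up to event_slots events are in flight at once.
    with ThreadPoolExecutor(max_workers=threads) as algo_pool, \
         ThreadPoolExecutor(max_workers=event_slots) as event_pool:
        return list(event_pool.map(
            lambda e: process_event(e, algorithms, algo_pool), events))


results = run(range(3), ["alg1", "alg2"])
```

In Gaudi a single scheduler and thread pool handle both levels; the separate pools here only keep the sketch easy to follow.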

How to run with inter-event parallelism

The following components are used to achieve parallelism:

from Configurables import (HiveWhiteBoard, HiveSlimEventLoopMgr, AvalancheSchedulerSvc)

These 3 components need to be configured to adapt the level of parallelism to the sequence, algorithms and hardware to be used.

  • Event Data Service: HiveWhiteBoard

    • This is the Event Data Service; it needs to know the number of EventSlots

    • EventSlots: number of events that may run in parallel, each with its own EventStore

  • Event Loop Manager: HiveSlimEventLoopMgr

    • Event Loop Manager with parallelism support

  • Thread Scheduling config: AvalancheSchedulerSvc

    • Scheduler to indicate the number of threads to use

    • It needs the total number of threads to use: this determines how many events and algorithms can be in flight (run in parallel)

    • The default value is -1, which lets TBB take over the machine with whatever it decides is the optimal configuration
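
Before putting values into the options file, it can help to sanity-check the two settings described above. The sketch below is a hypothetical helper, not part of Gaudi or k4MarlinWrapper. It assumes that -1 can be approximated by the number of hardware threads reported by os.cpu_count(), and it does not handle any other special values Gaudi may accept.

```python
import os


def resolve_parallelism(evtslots, threads=-1):
    """Validate EventSlots and ThreadPoolSize and estimate the thread count.

    Hypothetical helper: -1 is approximated by os.cpu_count(), which is an
    assumption about what TBB will pick, not a guarantee.
    """
    if evtslots < 1:
        raise ValueError("EventSlots must be at least 1")
    if threads != -1 and threads < 1:
        raise ValueError("ThreadPoolSize must be positive, or -1 for the default")
    effective = (os.cpu_count() or 1) if threads == -1 else threads
    return {
        "EventSlots": evtslots,
        "ThreadPoolSize": threads,
        "effective_threads": effective,
    }


settings = resolve_parallelism(evtslots=4, threads=4)
```

The returned EventSlots and ThreadPoolSize values can then be passed to HiveWhiteBoard and AvalancheSchedulerSvc respectively.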

All these components can be set as follows in the options file:

from Gaudi.Configuration import DEBUG, WARNING

evtslots = 4  # number of events that may be in flight at once
threads = 4   # total size of the thread pool

whiteboard = HiveWhiteBoard("EventDataSvc", EventSlots=evtslots)
slimeventloopmgr = HiveSlimEventLoopMgr(SchedulerName="AvalancheSchedulerSvc", OutputLevel=DEBUG)
scheduler = AvalancheSchedulerSvc(ThreadPoolSize=threads, OutputLevel=WARNING)

# seq is the GaudiSequencer holding the algorithms to run
from Configurables import ApplicationMgr
ApplicationMgr( TopAlg = [seq],
                EvtSel = 'NONE',
                EvtMax = 10,
                ExtSvc = [whiteboard],
                EventLoop=slimeventloopmgr,
                MessageSvcType="InertMessageSvc"
              )

To only run events in parallel, with no parallelism between the algorithms within each event, the cardinality of the algorithms must be set:

  • Given a list of algorithms, set the cardinality of all to be 1

  • Create a GaudiSequencer with all the algorithms as Members

  • Set the GaudiSequencer sequential property to true

  • Pass the created GaudiSequencer to the Application Manager in the TopAlg: TopAlg = [seq]

from Gaudi.Configuration import DEBUG, VERBOSE
from Configurables import MarlinProcessorWrapper, GaudiSequencer

cardinality = 1

alg1 = MarlinProcessorWrapper("alg1")
alg2 = MarlinProcessorWrapper("alg2")
alg3 = MarlinProcessorWrapper("alg3")
alg4 = MarlinProcessorWrapper("alg4")

algList = [alg1, alg2, alg3, alg4]

for algo in algList:
    algo.Cardinality = cardinality
    algo.OutputLevel = DEBUG

seq = GaudiSequencer(
    "createViewSeq",
    Members=algList,
    Sequential=True,
    OutputLevel=VERBOSE)

from Configurables import ApplicationMgr
ApplicationMgr( TopAlg = [seq],
                EvtSel = 'NONE',
                EvtMax = 10,
                ExtSvc = [whiteboard],
                EventLoop=slimeventloopmgr,
                MessageSvcType="InertMessageSvc"
              )
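
The steps listed above can also be wrapped in a small helper. This is a minimal sketch: build_sequential_sequence is a hypothetical name, and the sequencer class is passed in as an argument (GaudiSequencer in an options file) so that the function itself does not depend on Gaudi being importable.

```python
def build_sequential_sequence(sequencer_cls, algorithms, name="EventSeq", cardinality=1):
    """Set the cardinality of every algorithm and wrap them in a sequential sequencer.

    sequencer_cls is expected to accept (name, Members=..., Sequential=...),
    as GaudiSequencer does in the example above.
    """
    for alg in algorithms:
        # One instance per algorithm: no parallelism between copies of it.
        alg.Cardinality = cardinality
    # Sequential=True makes the members of each event run one after another.
    return sequencer_cls(name, Members=list(algorithms), Sequential=True)
```

In an options file this would be called as seq = build_sequential_sequence(GaudiSequencer, [alg1, alg2, alg3, alg4]), and seq is then passed to TopAlg as before.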

Running example

The CLIC reconstruction can be run in multi-threaded mode with LCIO input and output. After a successful build, run the following from the build location:

# Check available tests
ctest -N

# Run multi-threaded clicReconstruction test
ctest -R clicRec_lcio_mt