<!--
Copyright (c) 2019-2024 Key4hep-Project.

This file is part of Key4hep.
See https://key4hep.github.io/key4hep-doc/ for further info.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# How to run multithreading with k4MarlinWrapper (Gaudi)

**Supported**
- Reading LCIO events with `LcioEvent()`
- Writing LCIO events with `LcioEventOutput()`
- Running `MarlinProcessorWrapper`s with **no** converters
- Using `whiteboard` as `ExtSvc` (no k4DataSvc or EventDataSvc)

**Not supported**
- Using the EDM converters `EDM4hep2LcioTool()` and `Lcio2EDM4hepTool()`
- Reading EDM4hep events with `PodioInput()`
- Writing EDM4hep events with `PodioOutput()`
- Writing LCIO events with a `MarlinProcessorWrapper` of type `LCIOOutputProcessor`
- Running non-thread-safe algorithms/processors in parallel
- Running wrapped Marlin processors that use the `isFirstEvent` method in their `processEvent` method to do some setup only in the first event. There is no way to make this thread-safe. If you want your processor to be usable in a multi-threaded environment, you have to move this setup to `init`.
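
The last point can be illustrated with a toy model in plain Python. This is only a sketch: real Marlin processors are C++ classes, and the class `InitSetupProcessor`, its `calibration` attribute and the value `2.0` are hypothetical names chosen for illustration. It shows the recommended pattern, where state is prepared once in `init` before any event is processed, so that the per-event method only reads it.

```python
from concurrent.futures import ThreadPoolExecutor


class InitSetupProcessor:
    """Toy stand-in for a processor (hypothetical, not the Marlin API)."""

    def __init__(self):
        self.calibration = None

    def init(self):
        # One-time setup, executed once before the event loop starts
        self.calibration = 2.0

    def process_event(self, event):
        # Only reads shared state, so concurrent calls do not interfere
        return event * self.calibration


processor = InitSetupProcessor()
processor.init()

# Process several "events" concurrently, as with multiple event slots
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(processor.process_event, range(8)))
```

If the setup instead lived inside `process_event` behind a first-event flag, one thread could read `calibration` before another thread had finished setting it.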

## Running Gaudi with multithreading support

Gaudi uses [oneTBB](https://uxlfoundation.github.io/oneTBB/) under the hood for multithreading.

Gaudi exposes two main levels of parallelism:
- Inter-event parallelism: running multiple events in parallel
- Intra-event parallelism: running multiple algorithms in parallel, within an event

The two levels of parallelism can be combined: events can run in parallel, and algorithms within the events can run in parallel.
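
As a rough mental model, the two levels can be sketched with plain Python thread pools. This is not Gaudi code and Gaudi's scheduler works differently internally; the functions `run_algorithm`, `process_event` and `run_events`, and the use of two separate pools, are simplifications assumed here purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_algorithm(name, event_id):
    # Stand-in for one algorithm processing one event
    return f"{name}:event{event_id}"


def process_event(event_id, algorithms, algo_pool):
    # Intra-event parallelism: the algorithms of one event run concurrently
    futures = [algo_pool.submit(run_algorithm, name, event_id) for name in algorithms]
    return [future.result() for future in futures]


def run_events(event_ids, algorithms, event_slots, threads):
    with ThreadPoolExecutor(max_workers=threads) as algo_pool:
        with ThreadPoolExecutor(max_workers=event_slots) as event_pool:
            # Inter-event parallelism: up to `event_slots` events are in flight
            return list(event_pool.map(
                lambda event_id: process_event(event_id, algorithms, algo_pool),
                event_ids))


results = run_events(range(3), ["alg1", "alg2"], event_slots=2, threads=4)
```

Each entry of `results` holds the outputs of all algorithms for one event, in event order.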

### How to run with inter-event parallelism

The following components are used to achieve parallelism:

```python
from Configurables import (HiveWhiteBoard, HiveSlimEventLoopMgr, AvalancheSchedulerSvc)
```

These 3 components need to be configured to adapt the level of parallelism to the sequence, algorithms and hardware to be used.

- Event Data Service: `HiveWhiteBoard`
  + The Event Data Service, which needs to know the number of event slots
  + *EventSlots*: number of events that may run in parallel, each with its own EventStore
- Event Loop Manager: `HiveSlimEventLoopMgr`
  + Event Loop Manager with parallelism support
- Thread Scheduling config: `AvalancheSchedulerSvc`
  + Scheduler to indicate the number of threads to use
  + *ThreadPoolSize*: total number of threads to use; this determines how many events and algorithms can be in flight (run in parallel)
  + The default value is `-1`, which lets TBB take over the machine with what it decides to be the optimal configuration
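
One possible way to pick these two numbers is sketched below. The helper `parallel_settings` is hypothetical and not part of Gaudi or k4MarlinWrapper; it assumes that one event slot per thread is an acceptable starting point, and it always chooses an explicit thread count instead of relying on the `-1` default.

```python
import os


def parallel_settings(event_slots=None, threads=None):
    """Hypothetical helper returning values for EventSlots and ThreadPoolSize."""
    if threads is None:
        # Fall back to the number of CPUs reported by the OS (at least 1)
        threads = os.cpu_count() or 1
    if event_slots is None:
        # Assumption: one event slot per thread as a simple starting point
        event_slots = threads
    if event_slots < 1 or threads < 1:
        raise ValueError("event_slots and threads must be positive")
    return {"EventSlots": event_slots, "ThreadPoolSize": threads}


settings = parallel_settings(event_slots=4, threads=4)
```

The resulting values could then be passed on as `EventSlots=settings["EventSlots"]` to `HiveWhiteBoard` and `ThreadPoolSize=settings["ThreadPoolSize"]` to `AvalancheSchedulerSvc`.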


All these components can be set as follows in the options file:

```python
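from Gaudi.Configuration import DEBUG, WARNING
from Configurables import HiveWhiteBoard, HiveSlimEventLoopMgr, AvalancheSchedulerSvc

evtslots = 4  # number of events that may be in flight
threads = 4   # size of the thread pool

whiteboard = HiveWhiteBoard("EventDataSvc", EventSlots=evtslots)
slimeventloopmgr = HiveSlimEventLoopMgr(SchedulerName="AvalancheSchedulerSvc", OutputLevel=DEBUG)
scheduler = AvalancheSchedulerSvc(ThreadPoolSize=threads, OutputLevel=WARNING)

from Configurables import ApplicationMgr
# seq is the GaudiSequencer holding the algorithms to run
ApplicationMgr( TopAlg = [seq],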
                EvtSel = 'NONE',
                EvtMax = 10,
                ExtSvc = [whiteboard],
                EventLoop=slimeventloopmgr,
                MessageSvcType="InertMessageSvc"
              )
```

To only run events in parallel, with no parallelism between the algorithms within each event, the cardinality of the algorithms must be set:

- Given a list of algorithms, set the `Cardinality` of each one to 1
- Create a `GaudiSequencer` with all the algorithms as `Members`
- Set the `Sequential` property of the `GaudiSequencer` to `True`
- Pass the created `GaudiSequencer` to the Application Manager in the `TopAlg`: `TopAlg = [seq]`

```python
from Gaudi.Configuration import DEBUG, VERBOSE
from Configurables import GaudiSequencer, MarlinProcessorWrapper

cardinality = 1

alg1 = MarlinProcessorWrapper("alg1")
alg2 = MarlinProcessorWrapper("alg2")
alg3 = MarlinProcessorWrapper("alg3")
alg4 = MarlinProcessorWrapper("alg4")

algList = [alg1, alg2, alg3, alg4]

for algo in algList:
    # A single instance of each algorithm
    algo.Cardinality = cardinality
    algo.OutputLevel = DEBUG

seq = GaudiSequencer(
    "createViewSeq",
    Members=algList,
    Sequential=True,  # run the members one after another within each event
    OutputLevel=VERBOSE)

from Configurables import ApplicationMgr
ApplicationMgr( TopAlg = [seq],
                EvtSel = 'NONE',
                EvtMax = 10,
                ExtSvc = [whiteboard],
                EventLoop=slimeventloopmgr,
                MessageSvcType="InertMessageSvc"
              )
```

## Running example

The CLIC reconstruction can be run in multi-threaded mode, with LCIO input and output.
After successful compilation, from the build location:

```sh
# Check available tests
ctest -N

# Run multi-threaded clicReconstruction test
ctest -R clicRec_lcio_mt
```