Description
I would like to use Dragon within the context of a larger piece of Python software (Cubed - see cubed-dev/cubed#467). In particular I want to write an equivalent to Cubed's `ProcessesExecutor` (or `ThreadsExecutor`) but which uses Dragon as the `concurrent_executor`.
All this executor needs to do is execute a series of stages, each made up of a number of embarrassingly parallel tasks (each of which is a Python function). I just want Dragon to launch the tasks in parallel for me across a whole HPC allocation.
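For concreteness, here is a minimal sketch of the pattern I mean, using the standard-library `ThreadPoolExecutor` as a stand-in for a Dragon-backed executor (the stage/task structure is illustrative, not Cubed's actual API):

```python
# Sketch of the execution model: sequential stages, where each stage is
# an embarrassingly parallel map over its inputs. ThreadPoolExecutor is
# a placeholder for whatever Dragon-backed executor would fill this role.
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def run_stage(executor, func, inputs):
    # Tasks within a stage run concurrently; stages run one after another.
    return list(executor.map(func, inputs))


with ThreadPoolExecutor(max_workers=4) as ex:
    # Two sequential stages: square the inputs, then double the results.
    stage1 = run_stage(ex, square, range(4))
    stage2 = run_stage(ex, lambda x: x * 2, stage1)

print(stage2)  # [0, 2, 8, 18]
```

The executor I want would keep exactly this interface but fan the tasks out across the allocation instead of local threads.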
I'm looking through the docs and I have two main questions:
1) Should I use the `dragon.workflows.parsl_executor.DragonPoolExecutor`?

That seems like a drop-in replacement, but if it's actually using Parsl (which I noticed got built when I built the `dragon` executable), and Parsl is all I want to use, then would I be better off not bothering with Dragon and just using the `parsl.executors.ThreadPoolExecutor` instead? What's the difference?
2) How do I launch Dragon from within the context of another Python program?

All the docs examples seem to say that you use Dragon to launch another Python program from the command line, like this: `dragon my_python_script.py`. Dragon works by

> replacing all standard Multiprocessing classes with Dragon equivalent classes before CPython resolves the inheritance tree.

(from Inheritance and Multiple Start Methods.)
But this is inconvenient if I can't represent my workload as a single standalone Python script. Instead, I ideally want to be able to call the executor from inside a running Python process on an interactive job (e.g. from within a Jupyter notebook cell) and have it execute across a whole allocation.
Do I need to somehow auto-generate this script and make a `subprocess.call` to the `dragon` executable?
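In other words, something like this hypothetical glue code (none of this is a documented Dragon API; it just writes a generated script to disk and builds the launch command):

```python
# Hypothetical workaround: serialize the work into a standalone script
# and hand it to the `dragon` launcher via subprocess. Everything here
# is illustrative glue code, not a documented Dragon API.
import subprocess
import tempfile
import textwrap


def build_dragon_command(script_body):
    # Write the generated script to a temp file and return the command
    # list that would pass it to the dragon executable.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".py", delete=False
    ) as f:
        f.write(textwrap.dedent(script_body))
    return ["dragon", f.name]


cmd = build_dragon_command(
    """
    import multiprocessing as mp
    mp.set_start_method("dragon")  # Dragon's multiprocessing start method
    # ... run the stage's embarrassingly parallel tasks here ...
    """
)
# subprocess.run(cmd, check=True)  # would require a Dragon installation
```

This feels clunky compared to calling an executor directly from the running process, which is why I'm asking whether there's a supported way.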
Or if I try omitting the `dragon` executable (mentioned on this page), then I'm not sure what this implies:
> The Dragon core library can still be imported via e.g. `from dragon.managed_memory import MemoryPool` and used. In this case, the "dragon" start method must not be set. The infrastructure will not be started.
This part:

> Note that all other parts of the Dragon stack, in particular the Dragon Native API, require the running Dragon infrastructure and are thus not supported without patching Multiprocessing.

seems to be saying that I can still use Dragon Core but not Dragon Native from within a Python program that I didn't launch using the `dragon` executable. Is the `dragon.workflows.parsl_executor.DragonPoolExecutor` in Dragon Core or Dragon Native?
@applio you said you

> got dragon to run the `add-asarray.py` example single node as the executor already

so I'm curious what your approach was?
cc @tomwhite