I'm working with a very large frame range (tens of thousands of frames), where a simple Python script processes each frame. I've noticed that running the script over the frames sequentially in a single session is much faster than processing the frames as parallel work items, one frame per item. This seems to be due to the overhead of importing the Python libraries for every work item, which adds up significantly across that many frames.
I'd like to optimize the workflow by running a chunk of work items sequentially within a single work item, so that the Python modules are loaded only once and the script then processes several frames before the modules have to be loaded again. However, I'm having trouble setting this up.
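To make the goal concrete, this is roughly the behaviour I'm after inside a single work item (a minimal sketch only; the script name `batch_frames.py`, the placeholder imports, and `process_frame` are hypothetical stand-ins for my actual code):

```python
import sys

# In the real script these are the heavy imports that currently run once
# per frame; a couple of standard modules stand in for them here.
import json   # placeholder for a heavy import
import math   # placeholder for a heavy import


def process_frame(frame: int) -> None:
    """Placeholder for the actual per-frame processing."""
    print(f"processed frame {frame}")


if __name__ == "__main__":
    # The work item passes a frame range instead of a single frame,
    # e.g.  python batch_frames.py 1001 1010
    start, end = int(sys.argv[1]), int(sys.argv[2])
    for frame in range(start, end + 1):
        process_frame(frame)
```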
I would like to group, say, 10 work items together and run them sequentially as if they were one. So far, my attempts at partitioning work items either end up processing the chunks one after another (still paying the loading overhead for each) or treating each work item separately, which brings me back to the original performance issue.
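For reference, this is how I picture the chunking on the dispatch side (again just a sketch with hypothetical frame numbers; in the real setup each chunk would become one work item that runs in parallel with the others, not a local loop of subprocesses):

```python
import subprocess
import sys

FRAME_START = 1001   # hypothetical frame range for illustration
FRAME_END = 1100
CHUNK_SIZE = 10      # ten frames handled as one work item

frames = list(range(FRAME_START, FRAME_END + 1))

# Split the full range into chunks of CHUNK_SIZE and run each chunk as a
# single invocation of the batch script above, so the Python modules are
# imported once per chunk rather than once per frame.
for i in range(0, len(frames), CHUNK_SIZE):
    chunk = frames[i:i + CHUNK_SIZE]
    subprocess.run(
        [sys.executable, "batch_frames.py", str(chunk[0]), str(chunk[-1])],
        check=True,
    )
```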
How do I approach this?