I think we have just hit this issue on our studio's render farm. When it happens, a node gets into this state on 4 out of 5 jobs. It affects render jobs that don't come from IFDs: Karma CPU or XPU, Redshift, and even geo processing. The last thing in the logs is:
Rendering frames 1010-1019
Then nothing. The node sits at 1% CPU and does not unlock unless it times out. It's hard to debug as there is nothing else in the logs.
We found one thing that helped. In the hython script that loads the file to render, the pipeline guys put a 10 second pause. This gave the hython session some time to catch up with the loading of the hip file, so it did not error.
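
For anyone who wants to try the same thing, here is a rough sketch of the idea, not our actual pipeline script. I am assuming the pause goes right after the hip file load. The helper name `load_hip_with_pause` and its parameters are made up for illustration. The loader is passed in, so in hython you would hand it `hou.hipFile.load`.

```python
import time


def load_hip_with_pause(load_fn, hip_path, pause_seconds=10.0, sleep_fn=time.sleep):
    """Load a hip file, then wait a fixed time before continuing.

    load_fn       -- callable that loads the scene (e.g. hou.hipFile.load in hython)
    hip_path      -- path of the hip file to load
    pause_seconds -- how long to wait after the load (10 s worked for us)
    sleep_fn      -- injectable sleep, only so the helper can be tested quickly
    """
    result = load_fn(hip_path)
    # Assumption: the pause sits after the load and before any render call.
    sleep_fn(pause_seconds)
    return result


# Usage inside a hython session (the path is a placeholder):
#   import hou
#   load_hip_with_pause(hou.hipFile.load, "/path/to/scene.hip")
#   ... then trigger the render as usual
```

A fixed sleep is a blunt workaround and does not explain the hang, but it is cheap to test on a farm that shows the same symptoms.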