PDG Graphs on Farm

sanostol
Hello,
we successfully connected our dispatcher (Muster) to PDG, and the tasks are executed as expected on the assigned computers. That's very cool, but I must admit I do not fully understand how this is supposed to work.
In this setup I can launch PDG graphs on the dispatcher, but as soon as I quit Houdini, or it dies (which happens from time to time), the graph is no longer executed. It seems all the actual dispatching work is done in PDG, and as soon as it goes down, the current tasks are finished but no new tasks are started. As expected.

But often we would need a more disconnected way to work with PDG graphs. For example, it would be quite unrealistic to keep Houdini open for the duration of a three-day render on the farm just so it can proceed with some ffmpeg and comp tasks.
Let's say I have a TOP network with a sim, an IFD generate, an IFD render and an ffmpeg task, and I just want to send it to the farm and then close Houdini, or do something different.
Are there already any methods to do this?
Where does Pilot fit into this, by the way?

thanks
Martin
shadesoforange
Hey Martin,
you can send the graph execution itself to the farm. You can find it under the submitAsJob callback.
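For reference, on a custom Python Scheduler that callback might be shaped roughly like the sketch below. The callback name and its (graph_file, node_path) parameters come from the PDG scheduler interface; submit_to_muster is a made-up placeholder for whatever your farm's actual submission API looks like:

    # Method on a custom Python scheduler; PDG calls it when you use
    # "Submit Graph as Job". It should create one farm job that cooks the
    # whole TOP network and return a status string for the UI.
    def submitAsJob(self, graph_file, node_path):
        # graph_file: the saved scene/graph to cook on the farm
        # node_path:  the path of the TOP network to cook
        job_id = submit_to_muster(graph_file, node_path)  # hypothetical helper
        return "muster job {}".format(job_id)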


I have a quick question about Muster, if you don't mind me asking. Is Muster able to expand existing jobs/tasks/… (I don't know Muster's terminology)? If yes, lucky you! If not, how did you go about translating dynamic PDG work items to Muster?
Edited by shadesoforange - Oct. 13, 2019 02:15:02

Attachments:
Capture.PNG (39.3 KB)

Manuel Köster - Senior Technical Artist @Remedy Entertainment

https://www.remedygames.com/
http://shadesoforange.de/
https://twitter.com/ShadesOfOrange_
sanostol
Hi Manuel, thank you.
I'm a bit confused, as we already drive the farm renders with a Muster scheduler, so I'm not sure where an additional scheduler fits in here. But I see there are “submit graph as job” buttons in HQueue and Deadline as well. Does that mean that the complete graph is processed as a job on the farm, just as if you opened the scene and did a Cook Output Node?

I was looking for an even more static way: all static tasks are evaluated with all their dependencies, and then every node that creates files is submitted as an “independent” job to the dispatcher, with the correct dependencies set at the job or chunk level.
I tried to resolve the dependencies with the API, but I could not easily get past nodes like Wait for All, as they seem to have no staticWorkItems, just staticWrappers. That made me wonder whether there is already a defined way in the API to resolve all dependencies.

About the expanding jobs question: I think this is possible. Right now, when I start a job with several TOP nodes, the first batches of tasks get launched on the assigned computers, and no other job exists in Muster at that moment. As soon as one batch is finished, a dependent batch pops into existence in Muster. The only info it has is the TOP node, the task name and the frame range.
It looks like this: --batch -p "filename" -n "\obj\geo1\fileCache1\render" -i "ropfetch40" -fs 1 -fe 1 -fi 1
Muster seems to just connect to the farm; all the dispatching happens in PDG. Hope that helps.

Martin
Edited by sanostol - Oct. 14, 2019 11:14:42
chrisgreb
sanostol
But I see there are “submit graph as job” buttons in HQueue and Deadline as well. Does that mean that the complete graph is processed as a job on the farm, just as if you opened the scene and did a Cook Output Node?
Yes, the graph is cooked using hython on the farm, which spawns jobs on the farm as usual.
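Roughly, the submitted job boils down to something like this hython snippet (a sketch with hypothetical paths, not the exact wrapper script that ships with Houdini), which then schedules the per-item jobs as usual:

    import hou

    hou.hipFile.load("/farm/projects/shot010.hip")  # hypothetical hip path
    top_node = hou.node("/obj/topnet1/ffmpeg1")     # hypothetical output TOP node
    top_node.executeGraph(block=True)               # cook and wait until done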

I was looking for an even more static way: all static tasks are evaluated with all their dependencies, and then every node that creates files is submitted as an “independent” job to the dispatcher, with the correct dependencies set at the job or chunk level.
This is possible but not recommended. If you call the pdg.GraphContext.cook function with pdg.cookType.StaticDepsFull, your scheduler will get the onScheduleStatic callback with the complete task graph information.
The reason it's not recommended is that much of TOPs' functionality relies on dynamic generation of work items, which can't work with a static job description. In addition, many job scripts assume they can communicate with PDG and will fail if it's not there, ROP fetch batches for example. Also, it's generally not easy to translate a graph with fan-in and fan-out dependencies into a typical static job description.
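For what it's worth, a custom scheduler's hook for that static cook mode could be shaped like this sketch. The exact types and contents of the callback arguments may differ between Houdini versions, and submit_job_with_deps is a hypothetical stand-in for a real farm submission call:

    # Method on a custom Python scheduler; PDG invokes it when the graph is
    # cooked with pdg.cookType.StaticDepsFull and hands over the full static
    # task graph up front.
    def onScheduleStatic(self, dependencies, dependents, ready_items):
        # Mirror PDG's upstream dependencies as job/chunk-level dependencies.
        for item, deps in dependencies.items():
            submit_job_with_deps(item.name, [d.name for d in deps])  # hypothetical
        # Items with no pending dependencies could be released right away.
        for item in ready_items:
            print("ready:", item.name)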
shadesoforange
sanostol
Hi Manuel, thank you.
I'm a bit confused, as we already drive the farm renders with a Muster scheduler, so I'm not sure where an additional scheduler fits in here.



About the expanding jobs question: I think this is possible. Right now, when I start a job with several TOP nodes, the first batches of tasks get launched on the assigned computers, and no other job exists in Muster at that moment. As soon as one batch is finished, a dependent batch pops into existence in Muster. The only info it has is the TOP node, the task name and the frame range.
It looks like this: --batch -p "filename" -n "\obj\geo1\fileCache1\render" -i "ropfetch40" -fs 1 -fe 1 -fi 1
Muster seems to just connect to the farm; all the dispatching happens in PDG. Hope that helps.

Ah, I see what you mean. I downloaded Muster yesterday on my home PC and took a small dive into its PDG stuff. I'm guessing you are “just” using the default scheduler that ships with Muster.
That one does not seem to have the “Submit Graph as Job” functionality. Maybe you could talk to vvertex and see if they are interested in implementing it. Otherwise, you could also extend the provided scripts yourselves to add that functionality.

Maybe this helps.

chrisgreb
Also, it's generally not easy to translate a graph with fan-in and fan-out dependencies into a typical static job description.

Could you elaborate on what fan-in/fan-out dependencies are, Chris?

Thanks!
Manuel Köster - Senior Technical Artist @Remedy Entertainment

https://www.remedygames.com/
http://shadesoforange.de/
https://twitter.com/ShadesOfOrange_
sanostol
Hi Manuel,

yes, we are using the Muster scheduler. The vvertex developer is very supportive, very cool, and we are in contact with him.

Hi Chris,

I get your point (more or less): with a typical static job description you lose a lot of functionality. I'm still trying to grasp the broader concepts of submitting work to farms; I think at some point I'll have to install HQueue to be able to compare. The downside I see right now with a full PDG submission to the farm is the lack of overview: jobs pop into existence depending on previous jobs, which makes it hard to keep track when you have multiple jobs with dependencies. Right now, if a job in the chain fails, I'm not too sure whether I can restart that specific job/chunk and everything will just be fine.

I specifically have tasks in mind that are low maintenance, proven and high in count. Generic stuff, creature FX, things like that; they are less dynamic in job creation: asset imports, collision preparation, presim, sim and postsim passes, flipbook and ffmpeg, maybe a bit of wedging. They are not difficult, but most of the time you have to deal with huge amounts and just want to throw it all onto the farm and see result clips before you push it on to the next department.

It would be easier with the TOP interface embedded into the scheduler.
And what role does Pilot play here?

Thanks for your insight,
Martin
chrisgreb
shadesoforange
Could you elaborate on what fan-in/fan-out dependencies are, Chris?
Thanks!
By fan-in, I mean that jobs can have multiple upstream dependencies (like when generating from a partition), and by fan-out, multiple downstream dependents. This might be complicated to translate, depending on what the job spec supports.
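As a plain illustration (made-up item names, not a real job spec), the same idea written out as dependency lists looks like this; a job spec that only allows a single parent per task cannot express the fan-in case directly:

    # "partition1" fans in from three wedge items; "cache1" fans out to two
    # downstream items.
    dependencies = {
        "partition1": ["wedge1", "wedge2", "wedge3"],  # fan-in
        "render1":    ["cache1"],                      # fan-out from cache1
        "comp1":      ["cache1"],
    }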
chrisgreb
sanostol
The downside I see right now with a full PDG submission to the farm is the lack of overview: jobs pop into existence depending on previous jobs, which makes it hard to keep track when you have multiple jobs with dependencies. Right now, if a job in the chain fails, I'm not too sure whether I can restart that specific job/chunk and everything will just be fine.

We've done some work for the next release of Houdini to improve visibility of such submitted jobs. There will be a (preview, still WIP) way to attach a Pilot PDG session to a remotely executing cook.

It would be easier with the top interface embedded into the scheduler.

In that future release, this may be possible, since a web viewer could be attached to the session.
Edited by chrisgreb - Oct. 17, 2019 10:17:25
sanostol
That would help a lot.
Tomas Slancik
I'm starting to look into TOP farm submissions and am curious whether there have been any improvements in static submissions.

Being able to submit jobs statically for completely static graphs is crucial for visibility.

Also, some nodes like Fetch TOP are unnecessarily dynamic even when fetching from within the same file, and when forced to be static they create just a single opaque work item without any info passed through, so there is no static transparency into the resolved work items inside the fetched topnet.
Ideally, fetching from the same hip file would just behave as if the fetched branch were part of the same network; otherwise fetching a topnet from within a File Cache, especially the Labs wedged File Cache, is tedious, and the proposed workaround of a channel-referenced File Cache TOP is very dirty and not Houdini-like.
(I have RFE #138516 about a lightweight Fetch TOP that just does this kind of live fetching from a different location within the same file, essentially behaving like an instance of the fetched node.)

Ideally, fetching and submission could be as simple as they are within a ROP network with Fetch ROPs, where all dependencies can be statically resolved (treating all fetched networks as a single complex graph) and submitted easily.

Submit Graph as Job in TOPs is in theory a good idea, but the thought that every submitted graph needs to consume a Houdini Engine license just to manage the work of other workers is not viable, especially for schedulers that make it difficult to stack low-core jobs on the same machine to save on licenses.

I know this may get complicated, given the possibility of the graph-running machine having to execute tasks in-process, etc., so I'd also like to hear from people who successfully use TOPs in production in a way that doesn't lock up local machines, and whether running dynamically as a graph-executing job is really the only option.
Tomas Slancik
FX Supervisor
Method Studios, NY