Simplest way to add a dependency in the Python TOP


BitreelTD: Member; 19 posts; Joined: Dec. 2019; Offline

Jan. 24, 2023 3:32 p.m.

I'm struggling with figuring out how to get this working. Ideally, each of the tasks that this Python node would generate would simply wait for the previous one that was added before it to finish.

I tried putting this in the dependency callback and got this error. Where did I go wrong?

Attachments:
internal_dependencies.jpg (1.1 MB)


tpetrick: Staff; 600 posts; Joined: May 2014; Offline

Jan. 24, 2023 4:41 p.m.

Node callbacks are only permitted to access the local variables passed into them. In this case, the callback is only allowed to add dependencies between the work items from the internal_items list.

This is especially important if your node is dynamic. A dynamic node generates work items each time an input work item cooks, and it might be doing that in parallel. Therefore, the list of work items on the node itself may be incomplete at any point in time.

It's important to note that onAddInternalDependencies is called each time the node generates work items, not once per node. For example, if your node is dynamic and has 10 input items, onGenerate will be called 10 times (once for each cooked input work item). onAddInternalDependencies will also be called 10 times, once for each of the lists of work items produced by the corresponding onGenerate call.

You will need to change your Generate When parameter to "All Upstream Items are Generated" or "All Upstream Items are Cooked" if you need to access all work items at the same time.
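
To make the batching concrete, here is a small stand-alone simulation in plain Python. It does not use the PDG API: the `link_in_order` helper and the item names are made up for illustration, and it only mimics the fact that the dependency callback sees one batch of items per generate call.

```python
def link_in_order(batch):
    """Return (item, depends_on) pairs chaining each item to the one before it."""
    return [(cur, prev) for prev, cur in zip(batch, batch[1:])]


# Dynamic generation: 10 upstream items, each triggering its own generate
# call that produces one work item. The callback only ever sees one batch.
dynamic_batches = [[f"item{i}"] for i in range(10)]
dynamic_edges = [edge for batch in dynamic_batches for edge in link_in_order(batch)]

# Generating everything at once: a single call that sees all 10 items.
full_batch = [f"item{i}" for i in range(10)]
full_edges = link_in_order(full_batch)

print(len(dynamic_edges))  # 0
print(len(full_edges))     # 9
```

With one item per batch there is never a "previous" item inside the callback, so no dependencies get added at all.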


BitreelTD: Member; 19 posts; Joined: Dec. 2019; Offline

Jan. 24, 2023 5:25 p.m.

Thanks, that does make sense, but it doesn't give me the solution I'm looking for.

For this dynamically generated node, I want to form an internal dependency so that each item depends on the last item that was added. Where would I add this code, in "on generate" I take it? And how would this dependency holder be constructed?


tpetrick: Staff; 600 posts; Joined: May 2014; Offline

Jan. 24, 2023 5:37 p.m.

In order to do that, you need the node to generate all of its work items in one go.

If you set "Generate When" to "All Upstream Items are Cooked", then your node will generate all its work items with a single onGenerate call, once all upstream work items are cooked. It will then invoke onAddInternalDependencies once as well, on the full list of generated work items, which will allow you to add the dependencies you need.

If your node only needs the inputs to be generated in order to create its work items, then you can use "All Upstream Items are Generated" instead. But you have to use one of those options to force the node to do its generation and internal dependencies for the whole node at once.

Edited by tpetrick - Jan. 24, 2023 17:37:54


BitreelTD: Member; 19 posts; Joined: Dec. 2019; Offline

Jan. 24, 2023 5:53 p.m.

I might be after a different workflow altogether then. I've had this come up multiple times for various tasks: creating a directory, cleaning a directory, creating a file.

In these instances, I might have 100 tasks, but I only want this operation done once. The reason for this is that other conditions might not be ready before these tasks will run.

Here would be a scenario.

I want this Python TOP to write files to a directory.
This directory should be cleaned before the Python TOP runs.
I don't want to wait for all of its parent tasks to complete before running the first task, and I also don't want each task to delete the outputs of other tasks.

To put it more simply:
I want all files in this folder to be deleted before this TOP is allowed to run, but there is no need to wait for all of its parent tasks to complete to do this.

Is there some sort of a pre-run area that can be accessed for a top/pdg node? Similar to the pre-render script of a rop?


tpetrick: Staff; 600 posts; Joined: May 2014; Offline

Jan. 25, 2023 10:56 a.m.

Processor nodes have a Pre Cook callback which might do what you need, but it isn't exposed on the Python Processor TOP: https://www.sidefx.com/docs/houdini/tops/processors.html#onprecook-pdg-result It's currently only available if you write a custom node, but we can easily promote it up onto the Python Processor if you think that would be useful. Please file an RFE for that.

I do have a follow up question to your use case though. What happens if you cancel the cook with half of the work items completed, and then cook the node again to trigger the remaining work items? It sounds like you'd probably NOT want to run the pre cook logic in that case, even though it's technically a new cook, otherwise it would delete the outputs produced by the work items that were already cooked. So what you'd actually need is a "pre cook, but only before any work items cook" operation.

It might be tempting to assume that's the same as running the pre cook as part of the first work item, but work items don't necessarily cook in order. The first logical work item in the node may in fact be the last one to start cooking, depending on when its input tasks actually finish. Or the first one might fail, and then be cooked a second time at a later point due to scheduler retry settings, etc.

Adding a dependency between the first item and the remaining items in the node would fix that, as you were trying to do, but it would mean that any change to the first work item's state would effectively dirty the whole node. And every other work item in the node would end up having to wait on the first work item, which could end up being the same thing as waiting for all inputs to cook, depending on the cook order of the input work items.

Generally speaking, PDG is designed to encourage work items that operate on local state only, i.e. the attributes and files associated with the work item. That way things in the same node can be run in parallel and in any order, once their dependencies are satisfied.

In other words, I'm not sure if there's a way to do exactly what you want, depending on what's expected in the case where a node is only partially cooked and then resumed. I think adding pre/post hooks or a pre/post node is a valid RFE, but the exact behavior needs to be well defined.
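
One possible way to pin down "pre cook, but only before any work items cook" is a marker file: clean the directory only when no marker exists, then write the marker. The sketch below is plain Python and hypothetical, not a PDG feature. The `clean_once` function and the `.cleaned` file name are made up, it assumes the cleanup is triggered from a single place, and it does nothing to guard against parallel work items racing each other.

```python
import tempfile
from pathlib import Path

MARKER_NAME = ".cleaned"  # hypothetical marker file name


def clean_once(directory):
    """Delete the files in `directory` unless a previous cook already did.

    Returns True if the directory was cleaned, False if it was left alone.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    marker = directory / MARKER_NAME
    if marker.exists():
        return False  # resumed cook: keep outputs from finished work items
    for entry in directory.iterdir():
        if entry.is_file():
            entry.unlink()
    marker.touch()
    return True


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "stale.txt").write_text("old output")
    print(clean_once(tmp))  # True: first cook cleans the directory
    (Path(tmp) / "new.txt").write_text("fresh output")
    print(clean_once(tmp))  # False: resumed cook leaves new.txt in place
```

Deleting the marker by hand would be the way to request a full clean on the next cook, which is exactly the kind of behavior an RFE would need to spell out.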

Edited by tpetrick - Jan. 25, 2023 10:59:35


BitreelTD: Member; 19 posts; Joined: Dec. 2019; Offline

Jan. 25, 2023 12:50 p.m.

This sounds like a great solution. I brutally hacked a solution; I'll post it once it's sane, and file an RFE as well. But the best way is probably the one you've described.

Edited by BitreelTD - Jan. 25, 2023 12:51:44


florens: Member; 23 posts; Joined: March 2020; Offline

Aug. 17, 2023 9:43 p.m.

Heyja, just reviving this. I'm trying to follow pretty much Bitreel's workflow, but my use case is a bit different.

What I essentially want is n*2 work items, where the first item of each pair computes before the second in a Generic Generator TOP. For now I've made a very simple scene with a Python Processor running this on the Generate tab:

i = 0
for k in range(3):
    new_item = item_holder.addWorkItem(index = i, priority = 0)
    i+=1
    
    new_item2 = item_holder.addWorkItem(index = i, priority = 1)
    i+=1
    item_holder.addDependency(new_item2, new_item)

and then afterwards I've got the Generic Generator, which for now is just running "sleep 3" to see if the dependencies are making it through.

When I run this I can see the work items linked to each other on the Python Processor, but the generator still runs them all together.

I could ofc just set the generator to sequential but for the actual usecase I'll be spawning a couple hundred workitems some taking longer than others and would really just love that dependency to make it through alright. I've set both the processor and the generator to generate when all upstream items are cooked.

My plan B is to just duplicate the generator and split the items into their own stream. Then wait until stream A is cooked and only then cook stream B. Feels dirty though.

How could I go about this?


tpetrick: Staff; 600 posts; Joined: May 2014; Offline

Aug. 17, 2023 10:07 p.m.

In order to add dependencies between work items in the same node, you need to implement the Add Internal Dependencies callback on the Python Processor. The generation step can't add dependencies. The purpose of the different node callbacks is described here: https://www.sidefx.com/docs/houdini/tops/processors.html

For example, the implementation used by the Generic Generator looks like the following, when making work items cook in order:

    def onAddInternalDependencies(self, dependency_holder, internal_items, is_static):
        # Only chain the items when the node's 'sequential' parameter is enabled
        if not self['sequential'].evaluateBool():
            return pdg.result.Success

        # Make each work item depend on the one that precedes it in the list
        previous_item = None
        for internal_item in internal_items:
            if previous_item:
                dependency_holder.addDependency(internal_item, previous_item)
            previous_item = internal_item
        return pdg.result.Success
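
Applied to the paired work items from the question, the same callback could link each odd-indexed item to the even-indexed item before it instead of chaining everything. The following is only a sketch that runs outside Houdini: `RecordingHolder` and the `SimpleNamespace` items are hypothetical stand-ins, and it assumes each work item exposes the `index` it was created with.

```python
from types import SimpleNamespace


class RecordingHolder:
    """Hypothetical stand-in for the dependency holder; it only records calls."""

    def __init__(self):
        self.edges = []

    def addDependency(self, item, depends_on):
        self.edges.append((item, depends_on))


def link_pairs(dependency_holder, internal_items):
    """Make each odd-indexed item depend on the even-indexed item before it."""
    by_index = {item.index: item for item in internal_items}
    for index, item in by_index.items():
        if index % 2 == 1 and (index - 1) in by_index:
            dependency_holder.addDependency(item, by_index[index - 1])


# Stand-in work items with indices 0..5, i.e. three pairs.
items = [SimpleNamespace(index=i) for i in range(6)]
holder = RecordingHolder()
link_pairs(holder, items)
print([(a.index, b.index) for a, b in holder.edges])  # [(1, 0), (3, 2), (5, 4)]
```

Looking items up by index rather than by list position avoids relying on the order in which the list is handed to the callback.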


florens: Member; 23 posts; Joined: March 2020; Offline

Aug. 17, 2023 11:36 p.m.

Thank you for the quick reply!

How exactly would I implement this?

I had a look at the docs but couldn't quite understand where to add this snippet either.

I tried just copy-pasting this into the Add Internal Dependencies tab of the processor and adding a call to the function underneath.

I took off the addDependency call in the onGenerate tab and had to take off the if sequential check because otherwise the node would error since the python processor doesn't have that parm.

I'm still seeing the work items on the generator cook simultaneously, though.

It's hard to say; I believe they start one after another, but they don't wait for the previous one to finish cooking before the next one starts.

cheers,
Florens


tpetrick: Staff; 600 posts; Joined: May 2014; Offline

Aug. 17, 2023 11:47 p.m.

You don't need to call the function, you can just paste the code in for adding dependencies. PDG invokes the callbacks automatically on the Python Processor for you, when needed. I've attached an example file that creates 10 work items in a Python Processor that sleep for 2 seconds each, and are sequentially dependent on one another.
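
As a rough illustration of "just paste the code in", the body of the Add Internal Dependencies tab might look like the lower half of this sketch: no `def`, no explicit call, and no `sequential` check. The `RecordingHolder` class and the string items at the top are hypothetical stand-ins so the snippet can run outside Houdini; inside the node those two names are expected to be supplied as local variables.

```python
class RecordingHolder:
    """Hypothetical stand-in for the dependency holder; it only records calls."""

    def __init__(self):
        self.edges = []

    def addDependency(self, item, depends_on):
        self.edges.append((item, depends_on))


# Stand-ins so the snippet runs outside Houdini (assumptions, not PDG objects).
dependency_holder = RecordingHolder()
internal_items = ["item0", "item1", "item2"]

# Body as it might be pasted into the Add Internal Dependencies tab.
previous_item = None
for internal_item in internal_items:
    if previous_item is not None:
        # Each item waits for the one that precedes it in the list.
        dependency_holder.addDependency(internal_item, previous_item)
    previous_item = internal_item

print(dependency_holder.edges)  # [('item1', 'item0'), ('item2', 'item1')]
```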

Edited by tpetrick - Aug. 17, 2023 23:48:17

Attachments:
sequential.hip (86.7 KB)


florens: Member; 23 posts; Joined: March 2020; Offline

Aug. 17, 2023 11:49 p.m.

Maybe just to add

As a plan B I wanted to try priorities. That doesn't fully guarantee that the first work item of each array will cook first, but it would mitigate the issue. I'm getting warnings on the processor, though:

Constructing work item with custom priority, but the containing node's Work Item Priority parameter is not set to 'Node Defines Priority'

I can't find the parm it's referring to anywhere, neither on the Python Processor itself nor anywhere in the add parameter window or node presets. Where can I find that parameter, and what values does it expect?


florens: Member; 23 posts; Joined: March 2020; Offline

Aug. 17, 2023 11:54 p.m.

tpetrick
You don't need to call the function, you can just paste the code in for adding dependencies. PDG invokes the callbacks automatically on the Python Processor for you, when needed. I've attached an example file that creates 10 work items in a Python Processor that sleep for 2 seconds each, and are sequentially dependent on one another.

Aaaaah interesting you're calling the command directly inside the processor. Is there any chance to have the dependencies travel across tops?


tpetrick: Staff; 600 posts; Joined: May 2014; Offline

Aug. 17, 2023 11:59 p.m.

The setCommand call is just there to add a delay to the work items -- it's done using Python so that it doesn't rely on any particular shell features. It's basically just a command line string that causes each work item to sleep for 2 seconds as its "job", so the work items don't cook instantly.


tpetrick: Staff; 600 posts; Joined: May 2014; Offline

Aug. 18, 2023 12:07 a.m.

To add to that -- the setCommand API method sets the command line string that gets run when the work item is scheduled to cook out of process. It doesn't run immediately; it's just a string field on the work item, which is eventually run whenever the work item actually gets to cook. You can right-click on the node and select "Generate Node" to generate all tasks without running anything, and then inspect the dependencies of those work items, their command line strings, etc.
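
A minimal sketch of that idea, runnable outside Houdini: the command is built as a plain string and merely stored on the item. `FakeWorkItem` is a hypothetical stand-in for a work item, `build_sleep_command` is a made-up helper, and using `sys.executable` as the interpreter path is an assumption for the sake of the example.

```python
import sys


def build_sleep_command(seconds):
    """Return a command line string that sleeps via Python, not a shell builtin."""
    return f'"{sys.executable}" -c "import time; time.sleep({seconds})"'


class FakeWorkItem:
    """Hypothetical stand-in for a work item; setCommand only stores the string."""

    def __init__(self):
        self.command = ""

    def setCommand(self, command):
        self.command = command


item = FakeWorkItem()
item.setCommand(build_sleep_command(2))

# Nothing has been executed at this point; the command is just a stored string.
print(item.command)
```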

As for traveling across TOPs, you might want to look into Feedback loops if you want the dependency relationship to be maintained. Feedback blocks create tasks that behave like a traditional for loop -- each iteration cooks all the way down to the bottom of the block, before beginning the next iteration at the top of the block. Iterations also cook in serial, whereas TOPs work items normally cook in parallel.


florens: Member; 23 posts; Joined: March 2020; Offline

Aug. 18, 2023 12:11 a.m.

A for loop is an interesting call. Tbf I can probably make my case work inside one Python Processor, though out of curiosity: would each iteration of the for loop cook in parallel as well?
