Houdini 20.5 Executing tasks with PDG/TOPs

PDG Data Layer (PDGD)

PDG data subscription service that automatically sends subscribers updates on what has changed with their data.

Overview

What is the PDG data layer?

The PDG data layer or PDGD is an alternative method for accessing PDG data.

PDGD is a subscription-based system. You subscribe to a PDG object, and PDGD then watches that object’s data and automatically sends you updates whenever the data changes.

Basic usage

To be able to access a PDG object’s data, you first need to create a visualizer for the object and then create a subscription to the object.

Code

Behavior

pdgd.util module

Provides utility functions and classes to make it easier to interact with PDGD.

get_local_data_layer() method

Returns the data layer interface for the current Houdini instance.

pdgd.util.PDGDObject constructor

Accepts two parameters:

  • The PDG object path.

  • The data layer interface. This can be the local data layer interface returned by pdgd.util.get_local_data_layer() or a remote data layer interface.

Example

import pdgd.util 

data_layer = pdgd.util.get_local_data_layer() 
visualizer_object = pdgd.util.PDGDObject('pdg/object/path', data_layer)

def my_callback(pdg_object_path, object, message): 
        pass

# It is important to note that at this point the visualizer_object 
# may already have received data, so you might want to manually call 
# my_callback once if you need to process that initial data.

visualizer_object.addListener(my_callback)

The pdgd.util.PDGDObject class abstracts communication with PDGD. As soon as you create it, it subscribes to the PDGD object, constructs an internal representation of the object, and converts all data to Python types. You can access this internal representation at any time using the object member. You can also register listener functions with PDGDObject. These functions are called every time the object’s data changes.

PDGD property names are simple strings like “graph_list” or “id”. In order to maintain compatibility with future versions of PDGD, these property names are exposed through pdgd.PropertyNames. So instead of directly using a property name like “graph_list”, we recommend that you use pdgd.PropertyNames.GraphList instead.

Tip

You can find examples that show basic PDGD usage here: $HFS/pdgd/examples/visualizer.

PDG graphs

There is a high-level PDG object that keeps track of all the PDG graphs created by the current Houdini instance. The path for this high-level object is 'pdg'.

Example

import pdgd.util

data_layer = pdgd.util.get_local_data_layer() 
visualizer_object = pdgd.util.PDGDObject('pdg', data_layer) 

print(visualizer_object.object)

The example above prints a dictionary with a graph_list property. This is a list of all the PDG graphs created by the current Houdini instance. If you create a new TOP network, the object updates to include the new graph in that list.

The graph_list property holds a list of graph names. You can use these names to build their graph object paths using the format 'pdg/{graph_name}'.

Example

import pdgd 
import pdgd.util

GRAPH_LIST_KEY = pdgd.PropertyNames.GraphList

data_layer = pdgd.util.get_local_data_layer()
visualizer_object = pdgd.util.PDGDObject('pdg', data_layer)

graph_objects = {} 

if GRAPH_LIST_KEY in visualizer_object.object: 
    for graph_name in visualizer_object.object[GRAPH_LIST_KEY]: 
        graph_objects[graph_name] = pdgd.util.PDGDObject('pdg/' + graph_name, data_layer) 

PDG nodes

You can access nodes using the format 'pdg/{graph_name}/node_{node_id}'.

Nodes are owned by the graph, and you can obtain a node ID list by subscribing to the graph.

The pdgd.PropertyNames.NodeList property holds a list of IDs that you can use to build node object paths.

PDG work items

You can access work items using the format 'pdg/{graph_name}/work_item_{work_item_id}'.

Work items are owned by the graph, and you can obtain a work item ID list by subscribing to the graph.

The pdgd.PropertyNames.WorkItems property holds a list of IDs that you can use to build work item object paths, and this list holds all work items in the graph. Node objects also have a pdgd.PropertyNames.WorkItems property, but this list only holds the node’s work items.
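The graph, node, and work item path formats described above can be wrapped in small helpers. The following is a minimal sketch that uses only string formatting. The function names, graph name, and IDs are hypothetical and are not part of pdgd.

```python
def graph_path(graph_name):
    """Build the PDGD object path for a graph: 'pdg/{graph_name}'."""
    return 'pdg/{}'.format(graph_name)


def node_path(graph_name, node_id):
    """Build the PDGD object path for a node owned by a graph."""
    return '{}/node_{}'.format(graph_path(graph_name), node_id)


def work_item_path(graph_name, work_item_id):
    """Build the PDGD object path for a work item owned by a graph."""
    return '{}/work_item_{}'.format(graph_path(graph_name), work_item_id)


# Example with a made-up graph name and made-up IDs.
print(node_path('topnet1', 12))        # pdg/topnet1/node_12
print(work_item_path('topnet1', 345))  # pdg/topnet1/work_item_345
```

Each resulting path can be passed to pdgd.util.PDGDObject together with a data layer interface, in the same way as the graph paths in the earlier example.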

Remote PDGD

In order to access data from a remote PDG instance, the code only needs a remote data layer interface.

Houdini provides two default server/client implementations: a WebSocket-based one and a binary one (using TCP sockets). PDGD exposes methods to start data layer servers and to create remote data layer interfaces (clients).

To...Do this

To start a server

import pdgd

server_manager = pdgd.DataLayerServerManager.Instance()

# The first parameter is the server type. The second is the
# port the server will listen on; zero means the system will
# choose an available port.

server = server_manager.createServer('DataLayerWSServer', 0) 

server.serve() 

print('Server listening at port: {}'.format(server.getPort())) 

To connect to a remote interface

import pdgd
import pdgd.util

interface_manager = pdgd.DataLayerInterfaceManager.Instance() 
data_layer = interface_manager.createInterface('DataLayerWSClient', 'ws://host:port')

visualizer_object = pdgd.util.PDGDObject('pdg', data_layer)

When you use the local interface, the visualizer object will likely receive data as soon as it subscribes. That is not the case when you use a remote interface, so it is good practice to cover both cases.

import pdgd.util 

data_layer = pdgd.util.get_local_data_layer() 
visualizer_object = pdgd.util.PDGDObject('pdg/object/path', data_layer) 

def my_callback(pdg_object_path, object, message): 
    pass

# Manually calling the callback ensures we're checking for 
# any data received on subscribing. This is likely needed if 
# we're using a local interface. 

my_callback('pdg/object/path', visualizer_object.object, None)

visualizer_object.addListener(my_callback) 

Data layer visualizer

The PDGDObject class is an implementation of the data layer visualizer. PDGDObject is enough for most use cases, but if you need a finer degree of control, you can implement your own data layer visualizer and handle the update messages yourself.

A data layer visualizer must implement a processMessage method, and optionally a subscriptionLost method. Messages are sent every time data changes, and they contain information about which property has changed.

Object properties in PDGD are always collections of values (lists and sets are supported), but visualizers should never assume one or the other. Sets are used for improving performance locally, but when connecting to a remote data layer using JSON serialization, all properties are lists.
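Because a property can arrive as either a list or a set, a visualizer can normalize values before using them. The following is a small sketch in plain Python. The helper name as_value_list is hypothetical and is not part of pdgd.

```python
def as_value_list(values):
    """Return property values as a list, whether they arrived as a list or a set."""
    if isinstance(values, (set, frozenset)):
        # Sets are unordered, so sort when possible to get a stable result.
        try:
            return sorted(values)
        except TypeError:
            # Values of mixed types cannot be sorted; fall back to set order.
            return list(values)
    return list(values)


print(as_value_list({3, 1, 2}))           # [1, 2, 3]
print(as_value_list(['node_b', 'node_a']))  # ['node_b', 'node_a']
```

Normalizing once at the point where data enters the visualizer keeps the rest of the code identical for local and remote data layers.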

Example

import pdgd 
import pdgd.util

class MyDataVisualizer(pdgd.DataLayerVisualizer): 

        def __init__(self): 
                super(MyDataVisualizer, self).__init__() 

        def processMessage(self, message, sender): 
                print('Received message from {}: \n{}\n'.format(sender, message.toJson())) 

        def subscriptionLost(self): 
                print('Subscription to data layer was lost') 

my_visualizer = MyDataVisualizer()

data_layer = pdgd.util.get_local_data_layer() 
data_layer.subscribe(my_visualizer, 'pdg') 

Messages

Visualizers process messages in order to access data. Each message tells the visualizer what is happening to an object’s data, and a message can be either a snapshot that holds the full state for an object, or a delta message that describes how a particular property has changed.

The Python API exposes the pdgd.Message class.

You can check a message’s type by calling the getType method. The possible values include:

  • pdgd.Message.MessageType.eMessageTypeObjectSnapshot

  • pdgd.Message.MessageType.eMessageTypeAppendValues

  • pdgd.Message.MessageType.eMessageTypeRemoveValues

  • pdgd.Message.MessageType.eMessageTypeSetValues

You can also convert all message data to Python types by calling the toPython() method. This makes the messages easier to work with and can also reduce the number of calls into C++ code.

Snapshot messages

A snapshot message contains information about all the work item properties and attributes for an object. Most objects will start sending snapshot messages as soon as you add a new subscription.

You can return a pdgd.ObjectSnapshot instance by calling the getObjectSnapshot() method.

Delta messages

A delta message contains information about a specific object property and how its values have changed.

You can find which property has changed by calling the getPropertyName() method.

You can also get the list of values that have changed by calling the getPropertyArray() method. The message type determines the meaning of that list.

Set values messages

A set values message indicates that an object property value has been replaced.

The visualizer should consider the value list in the message as the complete value list for that property.

Add values messages

An add values message indicates that new values have been added to an object property.

This message contains a value list of the added values.

Remove values messages

A remove values message indicates that values have been removed from an object property.

This message contains a value list of the removed values.
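The three kinds of delta message can be pictured as operations on a local copy of the object. The following sketch is plain Python and does not call pdgd. The strings 'set', 'add', and 'remove' are placeholders for the set values, add values, and remove values message types, and the apply_delta function is hypothetical. It also assumes that a removed value is dropped wherever it appears in the list.

```python
def apply_delta(state, kind, property_name, values):
    """Apply one delta to `state`, a dict mapping property names to value lists.

    `kind` is 'set', 'add' or 'remove'. These strings are stand-ins for the
    real message types, which this sketch does not import.
    """
    current = list(state.get(property_name, []))
    if kind == 'set':
        # The message carries the complete value list for the property.
        current = list(values)
    elif kind == 'add':
        # The message carries only the newly added values.
        current.extend(values)
    elif kind == 'remove':
        # The message carries only the values that were removed.
        removed = list(values)
        current = [v for v in current if v not in removed]
    else:
        raise ValueError('Unknown delta kind: {!r}'.format(kind))
    state[property_name] = current
    return state


state = {}
apply_delta(state, 'set', 'graph_list', ['topnet1'])
apply_delta(state, 'add', 'graph_list', ['topnet2', 'topnet3'])
apply_delta(state, 'remove', 'graph_list', ['topnet1'])
print(state)  # {'graph_list': ['topnet2', 'topnet3']}
```

In a real visualizer, the kind, property name, and values would come from the message's getType(), getPropertyName(), and getPropertyArray() methods.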

Subscription handler

Subscription handlers are responsible for communicating with PDG. It is the subscription handler that knows how to build a snapshot or when to send a delta message.

PDGD provides subscription handlers for graphs, nodes, and work items, but you can also extend PDGD by writing custom subscription handlers.

PDGD uses a type system similar to PDG. You can check the registered subscription handler types by using the pdgd.TypeRegistry class.

Example

import pdgd

type_registry = pdgd.TypeRegistry.Instance() 
registered_handlers = type_registry.typeNames( 
        pdgd.registeredType.DataLayerSubscriptionHandler) 

print('Registered subscription handlers: {}'.format(registered_handlers)) 

In order to add a new subscription handler to PDGD, you need to define a class that implements the “PyDataLayerSubscriptionHandler” interface and register that class in the PDGD type registry.

Subscription handlers can add child subscription handlers, allowing you to create a data hierarchy. For example, the PDG graph subscription handler creates child node subscription handlers.

Using pdgd.util.SimpleHandler

pdgd.util.SimpleHandler is a base class that you can use to create new subscription handlers. Similar to pdgd.util.PDGDObject, this base class is an abstraction layer on top of PDGD code.

Example

import pdgd 
import pdgd.util 

class MySubscriptionHandler(pdgd.util.SimpleHandler):

        def __init__(self, handler): 
                super(MySubscriptionHandler, self).__init__(handler) 

                self.addValues('TestProperty', [0, 1, 2, 3, 4]) 
                self.addValues2('TestProperty2', ["test_string"]) 

pdgd.util.register_new_handler(MySubscriptionHandler, 'MySubscriptionHandlerName', 'my_subscription_handler_address')

data_layer = pdgd.util.get_local_data_layer()
visualizer_object = pdgd.util.PDGDObject('my_subscription_handler_address', data_layer)

In the example above, we define a custom subscription handler, register it, and then subscribe to it using a pdgd.util.PDGDObject.

The pdgd.util.register_new_handler method accepts three parameters:

  • The subscription handler definition.

  • The name that will be used for instantiating handlers of this type.

  • (Optional) The path where PDGD will create an instance of that subscription handler.

Commands

The data layer supports bidirectional communication. Subscription handlers can expose commands. Each command takes a generic parameter and can optionally return a value.

Example

import pdgd 
import pdgd.util

class MySubscriptionHandler(pdgd.util.SimpleHandler):

        def __init__(self, handler): 
                super(MySubscriptionHandler, self).__init__(handler) 

        @pdgd.util.SimpleHandler.command('MyCommand') 
        def my_command(self, params): 
                print('MyCommand command called with parameters: {}'.format(params))

pdgd.util.register_new_handler(MySubscriptionHandler, 'MySubscriptionHandlerName', 'my_subscription_handler_address')

data_layer = pdgd.util.get_local_data_layer() 
data_layer.sendCommand('MyCommand', 'Parameter', 'my_subscription_handler_address') 

Child handlers

A subscription handler can have child handler instances. You can use the SubscriptionHandlerManager instance to create new subscription handlers.

Example

import pdgd 
import pdgd.util 

class MySubscriptionHandler(pdgd.util.SimpleHandler): 

        def __init__(self, handler): 
                super(MySubscriptionHandler, self).__init__(handler) 

                this_handler_name = self.getObjectName() 
                child_handler_name = 'child'
                child_handler_path = '{}/{}'.format(
                        this_handler_name, child_handler_name)

                handler_manager = pdgd.SubscriptionHandlerManager.Instance() 
                child_handler = handler_manager.createSubscriptionHandler(child_handler_path, 'MySubscriptionHandlerName') 

                self.addChildHandler(child_handler_name, child_handler)

pdgd.util.register_new_handler(MySubscriptionHandler, 'MySubscriptionHandlerName') 

Query system

Queries are special objects that provide dynamic search results. A query object has a results property that contains all object paths that satisfy the query’s parameters. The default data layer implementation supports queries by providing a special "query?" subscription path followed by the query definition in a JSON format.

The input for a query contains a list of property paths. The query will use the property values and the format string (by replacing the format string’s braces with the property values) to build a list of objects. That list can then be filtered and/or sorted.

Query format example

query? 
{ 
    'property_paths': [ 
            'path/to/obj1/prop1', 
            'path/to/obj1/prop2', 
            'path/to/obj2/prop1' 
    ], 
    'format_string': 'path/to/other/obj/{}', 
    'sort': true,
    'reverse_sort': false,
    'sort_property': 'SortProperty',
    'filters': [ 
        [{ 
            'property_name': 'filter_prop',
            'filter_value': 'include_this',
            'negate': false 
        }, 
        { 
            'property_name': 'filter_prop_2',
            'filter_value': 'do_not_include_this',
            'negate': true 
        }] 
    ] 
}

In the example above, the 'filters' property is a list of lists. The outer list represents an AND operation, and each inner list represents an OR operation. For example, the filter definition [[A, B, C], [D, E, F], [G, H, I]] corresponds to the test ((A or B or C) and (D or E or F) and (G or H or I)).
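The AND-of-ORs rule for 'filters' can be sketched in plain Python. This is only an illustration of the logic and not PDGD's implementation. The function names are hypothetical, objects are modeled as plain dictionaries, and a property is assumed to match when it equals the filter value or, for a collection, contains it.

```python
def matches(obj, flt):
    """Test one filter definition against an object (a dict of properties)."""
    value = obj.get(flt['property_name'])
    # A property may hold a single value or a collection of values.
    if isinstance(value, (list, tuple, set, frozenset)):
        hit = flt['filter_value'] in value
    else:
        hit = value == flt['filter_value']
    # 'negate' inverts the result of the comparison.
    return not hit if flt.get('negate', False) else hit


def passes_filters(obj, filters):
    """The outer list is an AND of groups; each inner list is an OR of filters."""
    return all(any(matches(obj, flt) for flt in group) for group in filters)


# The single OR group from the query format example.
filters = [[
    {'property_name': 'filter_prop', 'filter_value': 'include_this', 'negate': False},
    {'property_name': 'filter_prop_2', 'filter_value': 'do_not_include_this', 'negate': True},
]]

kept = {'filter_prop': 'include_this', 'filter_prop_2': 'do_not_include_this'}
dropped = {'filter_prop': 'other', 'filter_prop_2': 'do_not_include_this'}
print(passes_filters(kept, filters))     # True
print(passes_filters(dropped, filters))  # False
```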

Query example

Suppose you have these objects in PDGD:

{ 
        'obj_a': { 
                'prop_a': [1, 3, 5]
        },
        'obj_b': { 
                'my_children_ids': [0, 1, 2, 3, 4, 5, 6, 7]
        }, 
        'obj_b/child0': { 'my_filter_prop': 0 },
        'obj_b/child1': { 'my_filter_prop': 1 },
        'obj_b/child2': { 'my_filter_prop': 2 },
        'obj_b/child3': { 'my_filter_prop': 3 },
        'obj_b/child4': { 'my_filter_prop': 0 },
        'obj_b/child5': { 'my_filter_prop': 1 },
        'obj_b/child6': { 'my_filter_prop': 2 },
        'obj_b/child7': { 'my_filter_prop': 3 } 
} 

If you're interested in the subset of obj_b's child objects represented by obj_a's property “prop_a”, but only if “my_filter_prop” equals 1, you can build the following query:

import pdgd 
import pdgd.util 

query_string = """
query? 
{ 
    'property_paths': [ 
            'obj_a/prop_a',
    ],
    'format_string': 'obj_b/child{}',
    'filters': [ 
        [{ 
            'property_name': 'my_filter_prop',
            'filter_value': 1, 
            'negate': false 
        }] 
    ] 
} 
""" 

data_layer = pdgd.util.get_local_data_layer() 
query_visualizer = pdgd.util.PDGDObject(query_string, data_layer) 

In the example above, the query acts just like any other PDGD object. Every time any data affecting this query changes, a message will be sent indicating how the results have changed.
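To see which results the query above should produce, the evaluation steps can be simulated in plain Python on the example objects. This is a mental model under stated assumptions and not PDGD's implementation: the run_query function is hypothetical, objects are plain dictionaries, and candidates that do not exist are skipped.

```python
objects = {
    'obj_a': {'prop_a': [1, 3, 5]},
    'obj_b': {'my_children_ids': [0, 1, 2, 3, 4, 5, 6, 7]},
}
# Children 0..7 carry my_filter_prop values 0, 1, 2, 3, 0, 1, 2, 3.
for child_id in range(8):
    objects['obj_b/child{}'.format(child_id)] = {'my_filter_prop': child_id % 4}

query = {
    'property_paths': ['obj_a/prop_a'],
    'format_string': 'obj_b/child{}',
    'filters': [[
        {'property_name': 'my_filter_prop', 'filter_value': 1, 'negate': False},
    ]],
}


def run_query(objects, query):
    """Simulate a query: gather values, expand the format string, then filter."""
    # 1. Gather the values of every listed property ('object_path/property').
    values = []
    for property_path in query['property_paths']:
        object_path, property_name = property_path.rsplit('/', 1)
        values.extend(objects[object_path][property_name])

    # 2. Replace the braces in the format string with each value.
    candidates = [query['format_string'].format(value) for value in values]

    # 3. Keep candidates that pass every OR group (outer list is an AND).
    results = []
    for path in candidates:
        obj = objects.get(path)
        if obj is None:
            continue
        if all(
            any((obj.get(f['property_name']) == f['filter_value']) != f.get('negate', False)
                for f in group)
            for group in query.get('filters', [])
        ):
            results.append(path)
    return results


print(run_query(objects, query))  # ['obj_b/child1', 'obj_b/child5']
```

The values 1, 3, and 5 expand to obj_b/child1, obj_b/child3, and obj_b/child5, and only child1 and child5 have “my_filter_prop” equal to 1.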

PDG Data Layer Panel

You can open this pane from the New Tab > New Pane Tab Type > TOPs menu.

PDG Data Layer Panel pane

You can use this pane to test your custom integrations of the data layer.

The PDG Data Layer Panel pane lets you:

  • Start PDG data layer servers using custom ports or free ports.

  • Stack multiple PDG data layer servers.

  • Stop PDG data layer servers.

  • Choose which data layer back-end to use. Currently, there is only one back-end to choose from.

  • Create a JSON change-log version of the data layer back-end’s contents and copy it to your clipboard. This is useful for debugging. For example, you could use the JSON to work out whether a problem with your custom data layer integration lies in the data layer client or in the server.
