Houdini 20.5 Executing tasks with PDG/TOPs

Troubleshooting PDG scheduler issues on the farm

Useful information to help you troubleshoot scheduling PDG work items on the farm.


This page contains general troubleshooting recommendations for all TOP schedulers. For more in-depth information about schedulers, please see the TOP scheduler documentation.

General debugging tips

Logs

  • Check the farm logs or job logs for warnings or errors.

  • Attach any warning or error messages you find to your bug reports.
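Farm logs can be long, so a small script that pulls out only the warning and error lines can make it easier to collect the messages to attach. This is a minimal sketch that assumes plain-text logs where problems are marked with the words "warning" or "error". Log formats differ between schedulers and farms, so adjust the pattern to match yours. The function name `extract_problems` is hypothetical.

```python
import re
import sys

# Assumption: problem lines contain the whole word "warning" or "error"
# (case-insensitive). Real farm log formats vary, so adjust as needed.
PROBLEM_PATTERN = re.compile(r"\b(warning|error)\b", re.IGNORECASE)


def extract_problems(lines):
    """Return (line_number, text) pairs for lines that look like warnings or errors."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if PROBLEM_PATTERN.search(line):
            problems.append((number, line.rstrip("\n")))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extract_problems.py job.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for number, text in extract_problems(log_file):
            print(f"{number}: {text}")
```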

Deadline scheduler

  • In the Deadline Scheduler node, turn on the Deadline ▸ Verbose Logging parameter to enable log output from the scheduler. The log output may contain useful warning or error messages. Please note that we will be adding this parameter to the other TOP schedulers soon.

  • Add the log output to your support tickets or your SideFX forum posts to help SideFX track down the problem.

Farm machines

  • Make sure that all your farm machines and your submitting machine have access to your network file system.

  • Ideally, you should have at least two farm machines or nodes available for cooking PDG work items. A single farm machine may not be able to run both the work item tasks and the MQ job, especially for the TOP Deadline Scheduler.

Paths

  • Do not use single backslashes (\) in paths, because Houdini evaluates them as escape sequences. Use double backslashes (\\) instead, or simply use forward slashes (/).

  • Houdini does not support unescaped spaces in paths. Surround such paths with quotation marks (") or escape each space with a backslash (\).
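You can apply the two path rules above before entering a path into a parameter. The sketch below uses a hypothetical helper name. It converts backslashes to forward slashes and wraps a path that contains spaces in double quotes, following the recommendations above. It does not model every aspect of Houdini's expression evaluation.

```python
def prepare_path(path):
    """Return a path string that follows the path recommendations above."""
    # Forward slashes avoid backslashes being evaluated as escape sequences.
    normalized = path.replace("\\", "/")
    # Quote paths that contain spaces, unless they are already quoted.
    already_quoted = normalized.startswith('"') and normalized.endswith('"')
    if " " in normalized and not already_quoted:
        normalized = f'"{normalized}"'
    return normalized
```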

Work items fail to report results due to refused connections or timeouts

PDG work items executing on farm machines have to report their results back to the Houdini process that initiated their cook. This Houdini process is typically run on a user’s workstation, also known as the submitting machine, which is not a farm machine and in some cases may even have a different network environment than the farm machines.

The results are reported back via a network socket-based Remote Procedure Call (RPC). To receive these results, a server is automatically started on the submitting machine to listen for these RPCs and to respond back if needed.

Therefore, the executing work items need to know the IP address (or host name) and port number of the submitting machine, and each farm machine needs a resolvable network route to the submitting machine.
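Each farm machine needs a working route to the submitting machine, so one quick check is to open a TCP connection from a farm machine to the submitting machine's host name (or IP address) and port. This sketch uses only Python's standard `socket` module. The host name and port in the example line are placeholders, and the port you test depends on your scheduler settings (for example, the callback port described under Firewalls).

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example, run on a farm machine (placeholder host name and port):
# print(can_reach("submitting-workstation", 53000))
```

A `False` result for a host name but `True` for the same machine's IP address points to a name-resolution problem rather than a firewall.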

Firewalls

Problems

  • Firewalls and host name resolution can cause issues with the PDG work item reporting mechanism.

  • Firewalls can get in the way of RPCs if they are enabled on any of your farm machines, your submitting machines, or between networks.

    To work around this, PDG uses the Message Queue (MQ) server. The MQ server can run as a task or job on your farm machines, behind your firewalls. It can also communicate with the submitting machine through a small number of ports (at least two), provided your firewalls allow them.

Solutions

  • Contact your IT administrator to allow a few ports through your firewalls.

  • Specify these ports in the Task Callback Port and Relay Port parameter fields on your TOP scheduler nodes.

    For more information on these nodes, see TOP nodes.
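Before you enter ports in the Task Callback Port and Relay Port fields, it can help to confirm on the submitting machine that another program is not already using them. This sketch tries to bind each candidate port with Python's `socket` module. It cannot tell you whether a firewall allows traffic on a port, so your IT administrator still needs to confirm that. The port numbers in the example are placeholders.

```python
import socket


def free_ports(ports, host="0.0.0.0"):
    """Return the subset of ports that can currently be bound on this machine."""
    available = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # Port is already in use or not permitted.
            available.append(port)
    return available


# Example with placeholder ports agreed with your IT administrator:
# print(free_ports([53000, 53001]))
```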

DNS

Problem

Domain Name System (DNS) resolution can cause issues when work items report results via RPCs. By default, the reporting mechanism currently uses host names, which must be resolved to actual IP addresses through a hosts file or DNS.
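To check whether a farm machine can resolve the host name that work items need to reach, you can query the operating system's resolver directly. This sketch uses Python's standard `socket.gethostbyname`, which goes through the machine's normal resolver (hosts file and DNS, depending on its configuration). The host name in the example is a placeholder. Running `socket.gethostname()` on the submitting machine shows the name it reports.

```python
import socket


def resolve(host):
    """Return the IPv4 address for host, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(host)
    except OSError:
        # socket.gaierror (a subclass of OSError) means the name did not resolve.
        return None


# On the submitting machine, print its own host name:
# print(socket.gethostname())
# On a farm machine (placeholder name):
# print(resolve("submitting-workstation"))
```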

Solutions

  • To resolve host names with a hosts file, edit the file for your platform:

    Windows

    C:\Windows\System32\Drivers\etc\hosts

    Linux

    /etc/hosts

    Mac

    /etc/hosts
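After you edit the hosts file, you can confirm that the expected entry is present. The sketch below parses hosts-file text in the usual format, where each line has an IP address followed by one or more names and `#` starts a comment. The function name and the sample name are hypothetical. This check complements testing actual name resolution but does not replace it.

```python
def hosts_entries(text):
    """Map each host name in hosts-file text to its IP address."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # Drop comments and surrounding whitespace.
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Host names are case-insensitive; keep the first mapping found for each name.
            entries.setdefault(name.lower(), address)
    return entries


# Example: read the hosts file for your platform (path from the list above).
# with open("/etc/hosts", encoding="utf-8") as f:
#     print(hosts_entries(f.read()).get("submitting-workstation"))
```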

  • If neither is available (for example, on an AWS farm without DNS), the RPC mechanism can instead attempt to resolve the IP address of the MQ server.

    • To enable this, set the environment variable PDGMQ_USE_IP=1 in the work item job process or in the .hip file.
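The way you set an environment variable for work items depends on your scheduler and farm. As a generic illustration only, this sketch launches a child process with `PDGMQ_USE_IP=1` added to a copy of the current environment, which is how any job process inherits a variable. The child process is a Python one-liner that stands in for a work item job and echoes the value it sees. The helper name `run_with_env` is hypothetical.

```python
import os
import subprocess
import sys


def run_with_env(command, extra_env):
    """Run command with extra_env merged into a copy of the current environment."""
    env = os.environ.copy()
    env.update(extra_env)
    result = subprocess.run(command, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# The child process stands in for a work item job; it prints the variable it sees.
child = [sys.executable, "-c", "import os; print(os.environ.get('PDGMQ_USE_IP'))"]
print(run_with_env(child, {"PDGMQ_USE_IP": "1"}))  # prints: 1
```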

MQ

  • For Submit Graph as Job cooks, the MQ server runs locally on the farm, on the same machine as the submitted job. This should avoid most networking issues between the farm and the submitting machine.

  • Running MQ as its own job or task takes up a farm machine for some scheduler set-ups. In addition, each scheduler node might run its own MQ server.

Work items fail due to required files not found

PDG on farms requires a network file system that is accessible by all machines involved in the process, including the submitting machine and all of the farm machines. All files required by the cook are copied to the PDG working directory, which is specified on the scheduler node and must be located on the network file system. For more information, see paths.
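A quick way to confirm that a machine can use the shared working directory is to write a small marker file there and read it back. If you run this sketch on the submitting machine and on each farm machine, using that machine's own path to the same directory, you exercise the network file system from each machine. The function name and path are hypothetical. The check covers only read and write access. It does not confirm that the local and remote paths point to the same location.

```python
import os
import socket
import uuid


def check_shared_dir(directory):
    """Write, read back, and delete a marker file; return True if all steps succeed."""
    marker = os.path.join(directory, f"pdg_check_{uuid.uuid4().hex}.txt")
    payload = f"written by {socket.gethostname()}"
    try:
        with open(marker, "w", encoding="utf-8") as f:
            f.write(payload)
        with open(marker, encoding="utf-8") as f:
            return f.read() == payload
    except OSError:
        return False
    finally:
        # Clean up the marker if it was created.
        try:
            os.remove(marker)
        except OSError:
            pass


# Placeholder path; use the working directory path as seen from this machine.
# print(check_shared_dir("/mnt/projects/pdg_work"))
```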

Problems

Issues that can interfere with this process are:

  • Different file paths for submitting machine vs. farm machines.

  • Non-homogeneous farm machine set-ups (for example, when you have Windows, macOS, and Linux machines in the same farm).

Solution

Each of the TOP scheduler nodes provides parameters to specify the remote file paths separately from the local file paths for the working directory.

  • Specify the local file path for the submitting machine.

  • Specify the remote file path that the farm machines can resolve.
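Conceptually, separate local and remote settings let different machines address the same file through different path prefixes. The sketch below illustrates that idea with a simple prefix swap. It is not how PDG implements this internally (PDG provides its own PDG Path Map for mapping paths between file systems). The function name and paths are placeholders.

```python
def to_remote(path, local_root, remote_root):
    """Swap a local working-directory prefix for the farm's remote prefix."""
    # Normalize separators so Windows-style local paths compare cleanly.
    path = path.replace("\\", "/")
    local_root = local_root.replace("\\", "/").rstrip("/")
    remote_root = remote_root.rstrip("/")
    if path == local_root or path.startswith(local_root + "/"):
        return remote_root + path[len(local_root):]
    raise ValueError(f"{path!r} is not under the local root {local_root!r}")


# Placeholder roots: a Windows submitting machine and Linux farm machines.
print(to_remote("P:\\pdg_work\\geo\\out.bgeo.sc", "P:\\pdg_work", "/mnt/pdg_work"))
# prints: /mnt/pdg_work/geo/out.bgeo.sc
```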

HQueue Scheduler

  • Turn on the Override Local Shared Root parameter on your TOP scheduler node and then specify the appropriate Local Shared Root Paths.

Deadline Scheduler

  • For the local file path, use the Working Directory ▸ Local Shared Path parameter field on your TOP scheduler node.

  • For the remote file path, use the Working Directory ▸ Remote Shared Path parameter field on your TOP scheduler node.

Tractor Scheduler

  • Use the Shared File Root Path parameter fields on your TOP scheduler node.

Python not found

Problem

PDG requires Python for executing work on the farm. As such, the TOP schedulers assume that the Python executable is accessible via the system path.

Solution

  1. Install Python.

    If Houdini is installed on your farm machines, you can use the Python that ships with Houdini.

    It is located in:

    Windows

    $HFS/python27/python.exe

    Linux

    $HFS/python/bin/python

    Mac

    $HFS/Frameworks/Python.framework/Versions/Current/bin/python

  2. Do one of the following:

    • For a global solution, add the directory that contains the Python executable to the system PATH environment variable.

    • For a solution specific to a single TOP scheduler and all its work items, specify the path to the Python executable in:

      • HQueue Scheduler

        The Executable Paths ▸ Python Executable parameter field on your TOP scheduler node.

      • Deadline Scheduler

        The Paths ▸ Python parameter field on your TOP scheduler node.

      • Tractor Scheduler

        The Scheduler ▸ Python Executable parameter field on your TOP scheduler node.
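To see which Python a farm machine will actually find, you can search the PATH the way a shell does and fall back to explicit locations. Run this sketch with any Python you can launch by its full path, such as the one that ships with Houdini. It uses Python's standard `shutil.which`. You supply the fallback list (for example, the Houdini Python location for your platform listed above). The function name `find_python` is hypothetical.

```python
import os
import shutil
import sys


def find_python(fallbacks, search_path=None):
    """Return the first Python found on the search path, else the first usable fallback."""
    # search_path=None means shutil.which uses the PATH of the current environment.
    for name in ("python", "python3"):
        found = shutil.which(name, path=search_path)
        if found:
            return found
    # Otherwise use the first fallback that exists and is executable.
    for candidate in fallbacks:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    # Pass fallback locations on the command line, e.g. the Houdini Python path.
    print(find_python(sys.argv[1:]) or "No Python executable found")
```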
