This page contains general troubleshooting recommendations for all TOP schedulers. For more in-depth information about schedulers, please see the TOP scheduler documentation.
General debugging tips ¶
Logs ¶
- Check the farm logs or job logs for warnings or errors.
- Attach any warning or error messages you find to your bug reports.
Deadline scheduler ¶
- In the Deadline Scheduler node, turn on the Deadline ▸ Verbose Logging parameter to enable log output from the scheduler. The log output may contain useful warning or error messages. Please note that we will be adding this parameter to the other TOP schedulers soon.
- Add the log output to your support tickets or your SideFX forum posts to help SideFX track down the problem.
Farm machines ¶
- Make sure that all your farm machines and your submitting machine have access to your network file system.
- Ideally, you should have at least two farm machines or nodes available for cooking PDG work items. A single farm machine may not be able to run both the work item tasks and the MQ job, especially for the TOP Deadline Scheduler.
Paths ¶
- Do not use single backslashes (`\`) in paths, as these are treated as escape sequences evaluated by Houdini. Instead, use double backslashes (`\\`) to accommodate Houdini's evaluation, or simply use forward slashes (`/`).
- Spaces in paths are not supported by Houdini. Instead, surround your paths with quotation marks (`"`) or use backslashes (`\`) to escape the space characters.
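As an illustration of the first point, a Windows-style path can be made safe for a parameter field by converting its backslashes to forward slashes. The helper below is a hypothetical sketch (it is not part of Houdini or PDG):

```python
def normalize_path(path):
    """Replace backslashes with forward slashes so Houdini does not
    treat them as the start of escape sequences."""
    return path.replace("\\", "/")

# A Windows-style path becomes safe to paste into a parameter field:
normalize_path("P:\\projects\\pdg_work")  # -> "P:/projects/pdg_work"
```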
Work items fail to report results due to connection refused or time out ¶
PDG work items executing on farm machines have to report their results back to the Houdini process that initiated their cook. This Houdini process is typically run on a user’s workstation, also known as the submitting machine, which is not a farm machine and in some cases may even have a different network environment than the farm machines.
The results are reported back via a network socket-based Remote Procedure Call (RPC). To receive these results, a server is automatically started on the submitting machine to listen for these RPCs and to respond back if needed.
That is why the executing work items need to know the IP address (or host name) and port number of the submitting machine, and there needs to be a resolvable network route from each farm machine to the submitting machine.
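To verify that such a route exists, you can try opening a TCP connection from a farm machine back to the submitting machine. The sketch below is a generic standard-library diagnostic, not part of PDG; the host name and port in the usage comment are placeholders for your own submitting machine's address and callback port:

```python
import socket

def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run this on a farm machine, substituting your own values:
# can_reach("workstation.example.com", 5000)
```

If this returns False, a firewall or missing network route between the farm machine and the submitting machine is a likely cause.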
Firewalls ¶
Problems
- Firewalls and host name resolution can cause issues with the PDG work item reporting mechanism.
- Firewalls can get in the way of RPCs if they are enabled on any of your farm machines, your submitting machines, or between networks.

To work around this, PDG utilizes the Message Queue (MQ) server. The MQ server can run as a task or job on your farm machines behind your firewalls. It can also use a limited number of ports (at least 2) if they are allowed through your firewalls to the submitting machine.
Solutions
- Contact your IT administrator to allow a few ports through your firewalls.
- Specify these ports in the Task Callback Port and Relay Port parameter fields on your TOP scheduler nodes.

For more information on these nodes, see TOP nodes.
DNS ¶
Problem
Host name resolution via the Domain Name System (DNS) can cause issues when reporting results via RPCs. Currently, the reporting mechanism uses the host name by default, which needs to be resolved to an actual IP address via a hosts file or DNS.
Solutions
- For the hosts file, you can edit:
  - Windows: `C:\Windows\System32\Drivers\etc\hosts`
  - Linux: `/etc/hosts`
  - Mac: `/etc/hosts`
- If neither DNS nor a hosts file is available (for example, with an AWS farm without DNS), the RPC mechanism can attempt to resolve the IP address of the MQ server instead.
- You can enable this by specifying the `PDGMQ_USE_IP=1` environment variable in the work item job process or the `.hip` file.
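A quick way to check whether the submitting machine's host name resolves at all (via the hosts file or DNS) is a standard-library lookup. This is a generic diagnostic sketch, not a PDG API:

```python
import socket

hostname = socket.gethostname()
try:
    # Resolves via the hosts file or DNS, whichever the OS consults first.
    ip = socket.gethostbyname(hostname)
    print("%s resolves to %s" % (hostname, ip))
except socket.gaierror:
    # Resolution failed: add a hosts entry, fix DNS, or fall back to
    # IP-based reporting with PDGMQ_USE_IP=1.
    print("%s does not resolve" % hostname)
```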
MQ ¶
- For Submit Graph as Job cooks, the MQ server runs locally on the submitting job on the farm. As such, this should allow it to avoid any networking issues.
- Running MQ as its own job or task takes up a farm machine for some scheduler set-ups. In addition, each scheduler node might run its own MQ server.
Work items fail due to required files not found ¶
PDG on farms requires a network file system that is accessible by all machines involved in the process; this includes the submitting machine as well as all of the farm machines. All the files required by this process are copied to the PDG working directory specified by the scheduler located on the network file system. For more information, please see paths.
Problems
Issues that can interfere with this process are:
- Different file paths for the submitting machine vs. the farm machines.
- Non-homogeneous farm machine set-ups (for example, when you have Windows, macOS, and Linux machines in the same farm).
Solution
Each of the TOP scheduler nodes provides parameters to specify the remote file paths separately from the local file paths for the working directory.

- Specify the local file path for the submitting machine.
- Specify the remote file path that the farm machines can resolve.

Depending on which of these parameters your TOP scheduler node provides, do one of the following:

- Turn on the Override Local Shared Root parameter and then specify the appropriate Local Shared Root Paths.
- For the local file path, use the Working Directory ▸ Local Shared Path parameter field; for the remote file path, use the Working Directory ▸ Remote Shared Path parameter field.
- Use the Shared File Root Path parameter fields.
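Conceptually, the scheduler rewrites the local shared root into the remote shared root whenever a path is handed to a farm machine. The hypothetical helper below illustrates that mapping with made-up example roots; in practice the scheduler node's parameters perform this remapping for you:

```python
def to_remote(path, local_root="P:/pdg_work", remote_root="/mnt/pdg_work"):
    """Map a submitting-machine path onto the farm-side mount of the same
    network file system. The default roots are illustrative placeholders."""
    path = path.replace("\\", "/")
    if path.startswith(local_root):
        return remote_root + path[len(local_root):]
    return path

# to_remote("P:/pdg_work/geo/item_1.bgeo")  # -> "/mnt/pdg_work/geo/item_1.bgeo"
```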
Python not found ¶
Problem
PDG requires Python for executing work on the farm. As such, the TOP schedulers assume that the Python executable is accessible via the system path.
Solution
- Install Python.

  If Houdini is installed on your farm machines, you can use the Python that ships with Houdini. It is located in:
  - Windows: `$HFS/python27/python.exe`
  - Linux: `$HFS/python/bin/python`
  - Mac: `$HFS/Frameworks/Python.framework/Versions/Current/bin/python`
- Do one of the following:
  - For a global solution, add the path to the Python executable to the system path environment.
  - For a solution specific to a single TOP scheduler and all its work items, specify the path to the Python executable in the parameter field your scheduler node provides:
    - The Executable Paths ▸ Python Executable parameter field.
    - The Paths ▸ Python parameter field.
    - The Scheduler ▸ Python Executable parameter field.
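If you are unsure which executable path to enter in those parameter fields, you can ask an interpreter you know works (for example, Houdini's Python shell) where it lives. `sys.executable` is standard Python, not a PDG-specific API:

```python
import sys

# Prints the absolute path of the running interpreter; this value can be
# pasted into your scheduler's Python executable parameter field.
print(sys.executable)
```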