Since | 17.5 |
This node schedules work items using HQueue in order to execute them on remote machines.
For more information on configuring HQueue, see Getting Started with HQueue or PDG For Design Work Pt. 3 - Setting Up Distributed PDG.
Cook Modes ¶
This scheduler can operate in two different cook modes:
- The normal cook mode connects to your HQueue scheduler and creates jobs for work items as they become ready to execute, and the jobs then communicate back to the submitting machine with status changes. This means that the submitting Houdini session must remain open for the duration of the cook.
  This mode is used whenever you select Cook from any of the menus or buttons in the TOP UI.
- The standalone job mode cooks the entire TOP network as a standalone job. In this mode, the submitting Houdini session is detached from the cooking of the TOP network, the .hip file is copied if necessary, and a hython process executes the TOP network using the default scheduler for that topnet. You will also not see any updates to your current Houdini session. To check the progress of your job when using this mode, you will need to use the HQueue web portal.
  This mode is used whenever you click the Submit Graph As Job > Submit button in the HQueue Scheduler’s parameters.
Network Requirements ¶
As part of the cook, a message queue (MQ) job is submitted. This job is used to communicate information from executing jobs back to the submitting machine. For this reason, your farm machines must be able to resolve the hostnames of other farm machines.
Tip
This is as simple as editing /etc/hosts (Linux / macOS) or C:\Windows\System32\Drivers\etc\hosts (Windows).
In addition, farm machines must not have firewalls between them, or you need to use the Task Callback Port parameter to specify the open port to use.
When the cook starts, the submitting machine connects to the farm machine that is running the MQ job. So farm machines also must not have firewalls between them and the submitting machine, or you need to use the Relay Port parameter to specify the open port to use.
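The two requirements above (hostname resolution and reachable ports) can be checked from any machine before a cook. The following is a minimal sketch using only the Python standard library; the helper names `resolves` and `port_open` are illustrative and are not part of PDG or HQueue.

```python
import socket


def resolves(hostname):
    """Return True if this machine can resolve hostname to an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        # socket.gaierror (failed lookup) is a subclass of OSError.
        return False


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For example, running `resolves("farm02")` on one farm machine and `port_open("farm02", 5001)` on the submitting machine would test a hypothetical client named `farm02` with a hypothetical Relay Port of 5001. Substitute the hostnames and the port values you actually set on the scheduler.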
Block on Failed Work Items
When on, if there are any failed work items on the scheduler, then the cook is blocked from completing and the PDG graph cook is prevented from ending. This allows you to manually retry your failed work items. You can cancel the scheduler’s cook when it is blocked by failed work items by pressing the ESC key, clicking the Cancels the current cook button in the TOP tasks bar, or by using the cancel API method.
Auto retry downstream tasks
When on, if a parent task is retried manually, then its child tasks will also be retried. This parameter is only available when Block on Failed Work Items is turned on.
Windows
Windows Services cannot use network-mounted drives. Since HQueue jobs on Windows are executed by a Windows Service, you should only use UNC paths. For example, use //myserver/hq/project/myhip.hip instead of H:/project/myhip.hip. Also be careful with backslashes in paths, as they are interpreted as escape sequences when evaluated by Houdini or the command shell.
Tip
On the HQueue Scheduler Node, press the Load Path Map button in the Path Mapping section to automatically load the necessary path maps.
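As an illustration of the Windows advice above, the sketch below converts backslashes to forward slashes and swaps a mapped drive letter for its UNC root. The `DRIVE_TO_UNC` table and the `to_farm_path` helper are hypothetical; they only mirror the example paths used in this section and are not a PDG API.

```python
# Hypothetical drive-to-UNC table; replace with the shares on your network.
DRIVE_TO_UNC = {"H:": "//myserver/hq"}


def to_farm_path(path, drive_map=DRIVE_TO_UNC):
    """Normalize slashes, then replace a mapped drive letter with its UNC root."""
    # Forward slashes avoid backslash escape-sequence problems.
    path = path.replace("\\", "/")
    drive, rest = path[:2], path[2:]
    unc_root = drive_map.get(drive.upper())
    if unc_root is None:
        # Not a mapped drive (for example, already a UNC path): leave as-is.
        return path
    return unc_root + rest


print(to_farm_path(r"H:\project\myhip.hip"))  # //myserver/hq/project/myhip.hip
```

In practice the PDG Path Map does this kind of substitution for you; the sketch is only meant to show what a correct farm-side path looks like.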
TOP Attributes ¶
| | integer | When the scheduler submits a work item to HQueue, it adds this attribute to the work item in order to track the HQueue job ID. |
Parameters ¶
Scheduler ¶
These are the global parameters that configure the behavior of the connection and file paths for HQueue.
HQueue Server
URL of the HQueue server. For example, http://localhost:5000.
Job Name
The name of the top-level HQueue Job for submitted cooks.
Job Description
The description of the top-level HQueue job. This can be seen in the Job Properties for the job.
Limit Jobs
When enabled, sets the maximum number of jobs that can be submitted by the scheduler at the same time.
For farm schedulers like Tractor or HQueue, this parameter can be used to limit the total number of jobs submitted to the render farm itself. Setting this parameter can help limit the load on the render farm, especially when the PDG graph has a large number of small tasks.
Block on Failed Work Items
When on, if there are any failed work items on the scheduler, then the cook is blocked from completing and the PDG graph cook is prevented from ending. This allows you to manually retry your failed work items. You can cancel the scheduler’s cook when it is blocked by failed work items by pressing the ESC key, clicking the Cancels the current cook button in the TOP tasks bar, or by using the cancel API method.
Verbose Logging
When on, prints verbose output to the console. This can be useful for debugging problems.
Tick Period
Sets the minimum time (in seconds) between calls to the onTick callback.
Max Items Per Tick
Sets the maximum number of ready item onSchedule callbacks between ticks.
Working Directory
Specifies the directory where the cook generates intermediate files and output. The intermediate files are placed in a subdirectory named pdgtemp.
If you are opening your .hip file in Houdini from the shared network path (for example, from H:/myproj/myhip.hip), you can use $HIP here (the default). However, if you are opening your .hip file from a local directory (for example, from C:/temp/myhip.hip), you have to copy it to a shared network location before it can be accessed by farm machines. In this case, the Working Directory should be an absolute or relative path to that shared network location (for example, //MYPC/Shared/myproj). The .hip file will be copied automatically in that case, but note that for cross-platform compatibility you will need to add a Path Map from your local $HIP path to the farm Working Directory (for example, c:/temp → /mnt/hq/pyproj).
Load Item Data From
Determines how jobs processed by this scheduler should load work item attributes and data.
Temporary JSON File
The scheduler writes out a .json file for each work item to the PDG temporary file directory. This option is selected by default.
RPC Message
The scheduler’s running work items request attributes and data over RPC. If the scheduler is a farm scheduler, then the job scripts running on the farm will also request item data from the submitter when creating their out-of-process work item objects.
This parameter option removes the need to write data files to disk and is useful when your local and remote machines do not share a file system.
Delete Temp Dir
Determines when PDG should automatically delete the temporary file directory associated with the scheduler.
Never
PDG never automatically deletes the temp file directory.
When Scheduler is Deleted
PDG automatically deletes the temp file directory when the scheduler is deleted or when Houdini is closed.
When Cook Completes
PDG automatically deletes the temp file directory each time a cook completes.
Compress Work Item Data
When on, PDG compresses the work item .json files when writing them to disk.
This parameter is only available when Load Item Data From is set to Temporary JSON File.
Validate Outputs When Recooking
When on, PDG validates the output files of the scheduler’s cooked work items when the graph is recooked to see if the files still exist on disk. Work items that are missing output files are then automatically dirtied and cooked again. If any work items are dirtied by parameter changes, then their cache files are also automatically invalidated. Validate Outputs When Recooking is on by default.
Check Expected Outputs on Disk
When on, PDG looks for any unexpected outputs (for example, outputs that can result from custom output handling internal logic) that were not explicitly reported when the scheduler’s work items finished cooking. This check occurs immediately after the scheduler marks work items as cooked, and expected outputs that were reported normally are not checked. If PDG finds any files that are different from the expected outputs, then they are automatically added as real output files.
Path Mapping
Global
If the PDG Path Map exists, then it is applied to file paths.
None
Delocalizes paths using the PDG_DIR token.
Path Map Zone
When on, specifies a custom mapping zone to apply to all jobs executed by this scheduler. Otherwise, the local platforms are LINUX, MAC or WIN.
Load Path Map
Opens the PDG Path Map Panel and populates it with path mappings based on the configuration of your HQueue Server for the default LINUX, MAC, and WIN zones.
The HQueue farm should be configured with a shared network filesystem and the mount point of this shared file system is specified for each platform.
These parameters are only available when Override Local Shared Root is on.
Load from HQueue
Queries the HQueue server to retrieve the local shared root paths for each platform and fills the parameters below.
Universal HFS
When on, a single path to the $HFS directory (the Houdini install directory) is used by all platforms. You can use $HQROOT and $HQCLIENTARCH to help specify the directory path.
Linux HFS Path
$HFS path for Linux.
This parameter is only available when Universal HFS is off.
macOS HFS Path
$HFS path for macOS.
Windows HFS Path
$HFS path for Windows.
This parameter is only available when Universal HFS is off.
Python
Determines which Python interpreter is used for your Python jobs. You can also specify this Python in a command using the PDG_PYTHON token.
From HFS
Use the Python interpreter that is installed with Houdini.
From HQClient
Use the same Python interpreter that HQClient is using on the farm machine.
Custom
Use the executable path specified by the Python Executable parameter.
Python Executable
This parameter is only available when Python is set to Custom.
The full path to the Python executable to use for your Python jobs.
Hython
Determines which Houdini Python interpreter (hython) is used for your Houdini jobs. You can also specify this hython in a command using the PDG_HYTHON token.
Default
Use the default hython interpreter that is installed with Houdini.
Custom
Use the executable path specified by the Hython Executable parameter.
Hython Executable
This parameter is only available when Hython is set to Custom.
The full path to the hython executable to use for your Houdini jobs.
Submit As Job ¶
Submit
Cooks the entire TOP network as a standalone job. Displays the status URI for the submitted job. The submitting Houdini session is detached from the cooking of the TOP network. The .hip file is copied if necessary and a hython process executes the TOP network normally using the default scheduler for that topnet.
Tip
You can restart a finished standalone job using the HQueue Web UI. However, you should restart the child job named TOP Cook instead of the parent job.
Job Name
Specifies the name of the submitted job.
Job Verbosity
Specifies the verbosity level of the standalone job.
Output Node
When on, specifies the path to the node to cook. If a node is not specified, the display node of the network that contains the Scheduler is cooked instead.
Save Task Graph File
When on, the submitted job will save a task graph .py file once the cook completes.
Assign To
Which clients to assign priority to.
Any Client
Assign to any client.
Listed Clients
Assign to specified clients.
Clients from Listed Groups
Assign to specified client groups.
Clients
Names of clients to assign jobs to separated by spaces.
This parameter is only available when Assign To is set to Listed Clients.
Client Groups
Names of client groups to assign jobs to separated by spaces.
This parameter is only available when Assign To is set to Clients from Listed Groups.
CPUs per Job
The maximum number of CPUs that will be consumed by the job. If the number exceeds a client machine’s number of free CPUs, then the client machine will not be assigned the job.
Enable Server
When on, turns on the data layer server for the TOP job that will cook on the farm. This allows PilotPDG or other WebSocket clients to connect to the cooking job remotely to view the state of PDG.
Server Port
Determines which server port to use for the data layer server.
This parameter is only available when Enable Server is on.
Automatic
A free TCP port to use for the data layer server chosen by the node.
Custom
A custom TCP port to use for the data layer server specified by the user.
This is useful when there is a firewall between the farm machine and the monitoring machine.
Auto Connect
When on, the scheduler will try to send a command to create a remote visualizer when the job starts. If successful, then a remote graph is created and is automatically connected to the server executing the job. The client submitting the job must be visible to the server running the job or the connection will fail.
This parameter is only available when Enable Server is on.
When Finished
Determines what to do when the TOP Cook finishes. This allows the TOP Cook job to continue running after the graph cook completes so that it can be inspected by a wrangler using a Data Layer viewer. For example, with When Finished you can retry a failed work item without restarting its whole job.
Terminate
Exit the job as normal.
Keep Open If Error
Keep the job running only if there is an error detected. You will need to kill the job manually.
Keep Open
Keep the job running. You will need to kill the job manually.
Message Queue ¶
The Message Queue (MQ) server is required to get work item results from the jobs running on the farm. Several types of MQ are provided to work around networking issues such as firewalls.
Type
The type of Message Queue (MQ) server to use.
Local
Starts or shares the MQ server on your local machine.
If another HQueue scheduler node (in the current Houdini session) already started an MQ server locally, then this scheduler node uses that MQ server automatically.
If there are not any firewalls between your local machine and the farm machines, then we recommend you use this setting.
Farm
Starts or shares the MQ server on the farm as a separate job.
If there are firewalls between your local machine and the farm machines, then we recommend you use this setting.
Connect
Connects to an already running MQ server.
The MQ server needs to have been started manually. This is the manual option for managing the MQ. It is useful for running the MQ as a centralized service on a single machine that serves all PDG jobs which use this setting.
Task Callback Port
Sets the TCP Port used by the Message Queue Server for the XMLRPC callback API. This port must be accessible between farm clients.
Relay Port
Sets the TCP Port used by the Message Queue Server connection between PDG and the client that is running the Message Queue Command. This port must be reachable on farm clients by the PDG/user machine.
Address
IP address of the machine running the persistent MQ server.
This parameter is only available when Type is set to Connect.
RPC Server ¶
Parameters for configuring the behavior of RPC connections from out of process jobs back to a scheduler instance.
Ignore RPC Errors
Determines whether RPC errors should cause out of process jobs to fail.
Never
RPC connection errors will cause work items to fail.
When Cooking Batches
RPC connection errors are ignored for batch work items, which typically make a per-frame RPC back to PDG to report output files and communicate sub item status. This option prevents long-running simulations from being killed on the farm, if the submitter Houdini session crashes or becomes unresponsive.
Always
RPC connection errors will never cause a work item to fail. Note that if a work item can’t communicate with the scheduler, it will be unable to report output files, attributes or its cook status back to the PDG graph.
Max RPC Errors
The maximum number of RPC failures that can occur before RPC is disabled in an out of process job.
Connection Timeout
The number of seconds to wait when an out of process job makes an RPC connection to the main PDG graph, before assuming the connection failed.
Connection Retries
The number of times to retry a failed RPC call made by an out of process job.
Retry Backoff
When Connection Retries is greater than 0, this parameter determines how much time should be spent between consecutive retries.
Batch Poll Rate
Determines how quickly an out of process batch work item should poll the main Houdini session for dependency status updates, if the batch is configured to cook when its first frame of work is ready. This has no impact on other types of batch work items.
Release Job Slot When Polling
Determines whether or not the scheduler should decrement the number of active workers when a batch is polling for dependency updates.
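To make the interaction between Connection Retries and Retry Backoff concrete, here is a minimal sketch of a retry loop. It is not PDG's implementation: the `call_with_retries` helper is hypothetical, and the use of a fixed wait between attempts is an assumption, since this page does not state how the backoff is applied. The Connection Timeout would be applied inside `rpc_call` itself.

```python
import time


def call_with_retries(rpc_call, retries=2, backoff=1.0, sleep=time.sleep):
    """Call rpc_call(); on ConnectionError, retry up to `retries` more times.

    Assumption: a fixed `backoff` (in seconds) is waited between attempts.
    """
    attempt = 0
    while True:
        try:
            return rpc_call()
        except ConnectionError:
            if attempt >= retries:
                # Retries exhausted: let the caller decide whether to fail.
                raise
            attempt += 1
            sleep(backoff)
```

With `retries=0` the first failure is raised immediately, which corresponds to the statement above that Retry Backoff only matters when Connection Retries is greater than 0.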
Job Parms ¶
These job-specific parameters affect all submitted jobs, but can be overridden on a node-by-node basis. For more information, see Scheduler Job Parms / Properties.
Note
Many of these parameters correspond directly to the HQueue Job Properties.
Job Priority
The job’s HQueue priority.
Jobs with higher priorities are scheduled and processed before jobs with lower priorities. 0 is the lowest priority.
Assign To
Which clients to assign priority to.
Any Client
Assign to any client.
Listed Clients
Assign to specified clients.
Clients from Listed Groups
Assign to specified client groups.
Clients
Names of clients to assign jobs to separated by spaces.
This parameter is only available when Assign To is set to Listed Clients.
Select Clients
Selects clients from HQueue to populate the Clients list.
This parameter is only available when Assign To is set to Listed Clients.
Client Groups
Names of client groups to assign jobs to separated by spaces.
This parameter is only available when Assign To is set to Clients from Listed Groups.
Select Groups
Selects client groups from HQueue to populate the Client Groups list.
This parameter is only available when Assign To is set to Clients from Listed Groups.
CPUs per Job
The maximum number of CPUs that will be consumed by the job. If the number exceeds a client machine’s number of free CPUs, then the client machine will not be assigned the job.
Note that you can control the multi-threading of some jobs with Houdini Max Threads. You can also use the Tags parm to control if this job needs a dedicated machine.
Houdini Max Threads
When on, sets the HOUDINI_MAXTHREADS environment variable to the specified value. If CPUs per Job is enabled, HOUDINI_MAXTHREADS is set to the same value unless this parameter is also enabled.
A value of 0 indicates that the job should use all available CPU cores.
Positive values limit the number of threads that can be used. For example, a value of 1 disables multi-threading entirely by limiting the job to one thread. Positive values are also clamped to the number of available CPU cores.
If the value is negative, the value is added to the maximum number of processors to determine the threading limit for the job. For example, a value of -1 uses all CPU cores except 1.
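The three cases described for Houdini Max Threads can be summarized as a small function. This is a sketch of the rules as stated above, not Houdini's own code; the `effective_threads` name is hypothetical, and the floor of one thread for large negative values is an assumption that this page does not confirm.

```python
def effective_threads(max_threads, cpu_cores):
    """Return the thread limit implied by a HOUDINI_MAXTHREADS-style value."""
    if max_threads == 0:
        # 0 means use every available core.
        return cpu_cores
    if max_threads > 0:
        # Positive values are clamped to the number of available cores.
        return min(max_threads, cpu_cores)
    # Negative values are added to the core count.
    # Assumption: the result never drops below one thread.
    return max(1, cpu_cores + max_threads)


print(effective_threads(-1, 16))  # 15
```

On a 16-core client, for example, a value of 32 is clamped to 16 and a value of -1 leaves one core free.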
Max Run Time
The maximum amount of time (in seconds) that the work item is permitted to run for. If its running time exceeds the maximum time, then it is automatically canceled by HQueue.
Create Container Job
Determines whether a node-level container job should be created in the job tree, and how it should be named.
Custom Container Name
When Create Container Job is set to Custom Name, this parameter can be set to an expression to define the container job name.
Job Description
Description property for the job.
Allowed Host
The host name of the machine that the job should execute on.
Resources
Adds custom resource-quantity pairings of HQueue Resources to the job.
Handle By
Customizes what to do when the command fails (returns a non-zero exit code).
Reporting Error
The work item fails.
Reporting Warning
The work item succeeds and a warning is added to the node.
Retrying Task
The work item is retried by HQueue for the number of Max Retries remaining.
Ignoring Exit Code
The work item succeeds.
Handle All Non Zero
When off, you can specify a particular exit code.
Exit Code
Specifies the exit code that you want to handle using Handle By. All other non-zero exit codes will be treated as a failure as normal.
This parameter is only available when Handle All Non Zero is off.
Max Retries
Number of times to retry the job when the command fails.
This parameter is only available when Handle By is set to Retrying Task.
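The Handle By, Handle All Non Zero, Exit Code, and Max Retries parameters combine into a small decision table, sketched below. The `handle_exit` function and its string values are hypothetical stand-ins for the menu options, and treating an exhausted retry budget as a failure is an assumption.

```python
def handle_exit(exit_code, handle_by, handle_all=True,
                handled_code=None, retries_left=0):
    """Return 'success', 'warning', 'retry', or 'fail' for a finished command.

    handle_by is one of 'error', 'warning', 'retry', 'ignore', standing in
    for Reporting Error, Reporting Warning, Retrying Task, Ignoring Exit Code.
    """
    if exit_code == 0:
        return "success"
    if not handle_all and exit_code != handled_code:
        # Only one specific code is handled; all others fail as normal.
        return "fail"
    if handle_by == "warning":
        return "warning"
    if handle_by == "ignore":
        return "success"
    if handle_by == "retry":
        # Assumption: once no retries remain, the work item fails.
        return "retry" if retries_left > 0 else "fail"
    return "fail"


print(handle_exit(3, "warning", handle_all=False, handled_code=3))  # warning
```

With Handle All Non Zero off and Exit Code set to 3, an exit code of 3 follows the Handle By choice, while any other non-zero code still fails the work item.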
Inherit Local Environment
When on, the environment variables in the current Houdini session are copied into the job’s environment.
Unset Variables
Space-separated list of environment variables that should be unset in the task environment.
Environment File
Specifies an environment file for environment variables to be added to the job’s environment. An environment variable from the file will overwrite an existing environment variable if they share identical names.
Environment Variables
Adds custom key-value environment variables to each task. Additional work item environment variables can be specified here. These will be added to the job’s environment. If the value of the variable is empty, it will be removed from the job’s environment.
Name
Name of the work item environment variable.
Value
Value of the work item environment variable.
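The environment parameters above (Inherit Local Environment, Unset Variables, Environment File, and Environment Variables) can be pictured as successive edits to one dictionary. The sketch below is illustrative only: `build_job_env` is a hypothetical helper, and the order in which the steps are applied is an assumption, not documented behavior.

```python
def build_job_env(local_env, inherit, unset, env_file_vars, extra_vars):
    """Assemble a job environment from the scheduler's environment settings.

    Assumed order: inherit, unset, environment file, then explicit variables.
    """
    env = dict(local_env) if inherit else {}
    # Unset Variables is a space-separated list of names.
    for name in unset.split():
        env.pop(name, None)
    # Variables from the file overwrite existing ones with the same name.
    env.update(env_file_vars)
    for name, value in extra_vars.items():
        if value == "":
            # An empty value removes the variable from the job environment.
            env.pop(name, None)
        else:
            env[name] = value
    return env


job_env = build_job_env(
    {"A": "1", "B": "2", "C": "3"}, True, "B", {"A": "9"}, {"C": "", "D": "4"}
)
print(job_env)  # {'A': '9', 'D': '4'}
```

Here `B` is unset, `A` is overwritten by the environment file, `C` is removed because its value is empty, and `D` is added.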
Pre Shell
Specifies a shell script to be executed/sourced before the command is executed.
Post Shell
Specifies a shell script to be executed/sourced after the command is executed.
Pre Python
Specifies the Python script to be executed in the wrapper script before the command process is spawned.
Post Python
Specifies the Python script to be executed in the wrapper script after the command process exits.