Since | 17.5 |
Overview ¶
This scheduler executes work items on a farm managed by Tractor.
Note
The Tractor Scheduler TOP supports Tractor 2.4 and greater.
Tractor 2.4 introduced a Python 3-compliant API, which is required by PDG and TOPs.
The Tractor Scheduler TOP requires the Tractor client Python API. You can install the client API with the Pixar RenderMan installer using the installer’s default options. To make the API available to TOPs and PDG, add the API’s location to the PYTHONPATH environment variable before launching Houdini.
The Tractor client Python API (that is, the tractor package) is installed in the following location:
Windows
C:\Program Files\Pixar\RenderManProServer-X.X\bin
Mac
/Applications/Pixar/RenderManProServer-X.X/bin
Linux
/opt/pixar/RenderManProServer-X.X/bin
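As a rough illustration of the PYTHONPATH step, the sketch below builds the platform-specific API directory from the locations listed above and prepends it to a copy of an environment dictionary. The helper names `tractor_api_dir` and `with_tractor_api` are hypothetical and not part of Houdini or Tractor, and the version string is left for you to supply.

```python
import sys

# Install locations copied from the list above. "{version}" replaces the
# "X.X" placeholder and must be supplied by the caller.
_API_DIR_TEMPLATES = {
    "win32": r"C:\Program Files\Pixar\RenderManProServer-{version}\bin",
    "darwin": "/Applications/Pixar/RenderManProServer-{version}/bin",
    "linux": "/opt/pixar/RenderManProServer-{version}/bin",
}


def tractor_api_dir(version, platform=sys.platform):
    """Return the documented API directory for a platform and version."""
    for prefix, template in _API_DIR_TEMPLATES.items():
        if platform.startswith(prefix):
            return template.format(version=version)
    raise ValueError("unsupported platform: %s" % platform)


def with_tractor_api(env, version, platform=sys.platform):
    """Return a copy of env with the API directory prepended to PYTHONPATH."""
    api_dir = tractor_api_dir(version, platform)
    # PYTHONPATH entries are separated by ";" on Windows and ":" elsewhere.
    sep = ";" if platform.startswith("win32") else ":"
    updated = dict(env)
    existing = updated.get("PYTHONPATH", "")
    updated["PYTHONPATH"] = api_dir + (sep + existing if existing else "")
    return updated
```

The returned dictionary could then be passed as the environment of whatever process launches Houdini, so that the variable is set before Houdini starts.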
Cook Modes ¶
This scheduler can operate in two different cook modes. The normal cook mode is used when selecting cook from any of the menus or buttons in the TOP UI. It connects to your Tractor engine and creates jobs for work items as they become ready to execute. The jobs then communicate back to the submitting machine with status changes. This means the submitting Houdini session must remain open for the duration of the cook.
Alternatively, you can use the Submit button under Submit Graph As Job to cook the entire TOP Network as a standalone job. In this mode, the submitting Houdini session is detached from the cooking of the TOP Network. The HIP file is copied if necessary, and a hython task executes the TOP network as normal using whatever the default scheduler is for that topnet. In this mode, you will not see any updates in your current Houdini session. Instead, check the progress of your job using the Tractor web portal.
Network Requirements ¶
As part of the cook, a message queue (MQ) job is submitted. This job is used to communicate information from executing jobs back to the submitting machine. For this reason, your farm machines must be able to resolve the hostnames of other farm machines.
Tip
This is as simple as editing the /etc/hosts file (Linux / macOS) or the C:\Windows\System32\Drivers\etc\hosts file (Windows).
In addition, farm machines must not have firewalls between them, or you need to use the Task Callback Port parameter to specify the open port to use.
When the cook starts, the submitting machine connects to the farm machine that is running the MQ job. So farm machines also must not have firewalls between them and the submitting machine, or you need to use the Relay Port parameter to specify the open port to use.
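The requirements above can be checked by hand before a cook. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the host names and ports you pass in are whatever applies to your farm (for example, the values set for Task Callback Port or Relay Port).

```python
import socket


def can_resolve(hostname):
    """Return True if this machine can resolve hostname to an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        # socket.gaierror (failed lookup) is a subclass of OSError.
        return False


def can_connect(hostname, port, timeout=3.0):
    """Return True if a TCP connection to hostname:port succeeds."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed lookups.
        return False
```

Running `can_resolve` on each blade against the other blades' host names, and `can_connect` from the submitting machine against the blade running the MQ job, gives a quick indication of whether a hosts-file entry or an open port is missing.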
Auto retry downstream tasks
When on, if a parent task is retried manually, then its child tasks are also retried. This parameter is only available when Block on Failed Work Items is turned on.
Authentication ¶
The artist submitting work to Tractor may need to supply PDG with login information. If the environment variables $TRACTOR_USER and $TRACTOR_PASSWORD are present, they are used to authenticate with the Tractor API. The Job Owner parameter sets the owner of the job; however, the environment variable $PDG_TRACTOR_USER overrides it if present. This is useful with the Submit Graph As Job workflow, because in that case PDG needs to log in to the Tractor API from the blade that is actually executing the TOP Cook job, so $PDG_TRACTOR_USER should be set in the Tractor client environment. Use the Tractor Password parameter only for debugging and never save it in the HIP file, because the parameter value is not encrypted.
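The precedence described above can be summarized in a few lines of Python. This is only an illustration of the documented behavior, not PDG's actual code. The function names are hypothetical, and falling back to the parameter values when the login variables are absent is an assumption.

```python
def resolve_login(env, parm_user="", parm_password=""):
    """Pick login credentials: environment variables first, then parameters.

    The fallback to parameter values is an assumption made for this sketch.
    """
    user = env.get("TRACTOR_USER") or parm_user
    password = env.get("TRACTOR_PASSWORD") or parm_password
    return user, password


def resolve_job_owner(env, parm_owner):
    """Return the job owner, letting PDG_TRACTOR_USER override the parameter."""
    return env.get("PDG_TRACTOR_USER") or parm_owner
```

In a Houdini session, `env` would typically be `os.environ`. Passing it in explicitly keeps the sketch easy to test.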
TOP Attributes ¶
|
integer |
When the scheduler submits a work item to Tractor, it adds this attribute to the work item to track the Tractor Job and Task IDs. The first element is the Job |
Parameters ¶
Schedulers ¶
These are global parameters for all work items using this scheduler.
Tractor Server
Specifies the Tractor server address.
Port
Specifies the Tractor server port.
Tractor User
Specifies the username for the Tractor server login. This user must have permission to submit and query job status. You can override this with $PDG_TRACTOR_USER.
Tractor Password
Specifies the password for the Tractor server login.
This is for convenience only. When saved, the password is saved in the HIP file with no encryption.
A safer alternative is to set $TRACTOR_PASSWORD in the Houdini environment. For more information, see how to set environment variables.
Limit Jobs
When enabled, sets the maximum number of jobs that can be submitted by the scheduler at the same time.
For farm schedulers like Tractor or HQueue, this parameter can be used to limit the total number of jobs submitted to the render farm itself. Setting this parameter can help limit the load on the render farm, especially when the PDG graph has a large number of small tasks.
Block on Failed Work Items
When on, if there are any failed work items on the scheduler, then the cook is blocked from completing and the PDG graph cook is prevented from ending. This allows you to manually retry your failed work items. You can cancel the scheduler’s cook when it is blocked by failed work items by pressing the ESC key, clicking the Cancels the current cook button in the TOP tasks bar, or by using the cancel API method.
Working Directory
Specifies the relative directory where the work generates intermediate files and output. The intermediate files are placed in a subdirectory. For the Local Scheduler or HQueue, typically $HIP is used. For other schedulers, this should be a directory relative to Local Shared Root Path and Remote Shared Root Path; this path is then appended to these root paths.
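To make the "appended to these root paths" rule concrete, here is a small sketch that joins one relative working directory onto both shared roots. The function name and the example paths in the usage note are hypothetical, and POSIX-style paths are assumed for simplicity.

```python
import posixpath


def shared_working_dirs(local_root, remote_root, working_dir):
    """Append the same relative working directory to both shared roots."""
    if posixpath.isabs(working_dir):
        raise ValueError("working directory must be relative to the shared roots")
    return (
        posixpath.join(local_root, working_dir),
        posixpath.join(remote_root, working_dir),
    )
```

For instance, with a hypothetical local root of `/mnt/projects`, a remote root of `/net/projects`, and a working directory of `shot010/pdg`, the submitting machine and the farm blades would each see the same directory under their own mount point.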
Load Item Data From
Determines how jobs processed by this scheduler should load work item attributes and data.
Temporary JSON File
The scheduler writes out a .json file for each work item to the PDG temporary file directory. This option is selected by default.
RPC Message
The scheduler’s running work items request attributes and data over RPC. If the scheduler is a farm scheduler, then the job scripts running on the farm will also request item data from the submitter when creating their out-of-process work item objects.
This parameter option removes the need to write data files to disk and is useful when your local and remote machines do not share a file system.
Compress Work Item Data
When on, PDG compresses the work item .json files when writing them to disk.
This parameter is only available when Load Item Data From is set to Temporary JSON File.
Python Executable
Specifies the full path to the Python executable on the farm machines. This is used to execute the job wrapper script for PDG work items.
Hython
Determines which Houdini Python interpreter (hython) is used for your Houdini jobs. You can also specify this hython in a command using the PDG_HYTHON token.
Default
Use the default hython interpreter that is installed with Houdini.
Custom
Use the executable path specified by the Hython Executable parameter.
Hython Executable
This parameter is only available when Hython is set to Custom.
The full path to the hython executable to use for your Houdini jobs.
Path Mapping
Global
If the PDG Path Map exists, then it is applied to file paths.
None
Delocalizes paths using the PDG_DIR token.
Path Map Zone
When on, specifies a custom mapping zone to apply to all jobs executed by this scheduler. Otherwise, the zone defaults to the local platform, which is one of LINUX, MAC, or WIN.
Validate Outputs When Recooking
When on, PDG validates the output files of the scheduler’s cooked work items when the graph is recooked to see if the files still exist on disk. Work items that are missing output files are then automatically dirtied and cooked again. If any work items are dirtied by parameter changes, then their cache files are also automatically invalidated. Validate Outputs When Recooking is on by default.
Check Expected Outputs on Disk
When on, PDG looks for any unexpected outputs (for example, outputs produced by custom output handling logic) that were not explicitly reported when the scheduler’s work items finished cooking. This check occurs immediately after the scheduler marks work items as cooked, and expected outputs that were reported normally are not checked. If PDG finds any files that are different from the expected outputs, then they are automatically added as real output files.
NFS
Specifies the path to the Houdini installation for farm machines in the NFS zone.
UNC (Windows)
Specifies the path to the Houdini installation for farm machines in the UNC zone.
Append PID
When on, a subdirectory is added to the location specified by the Location parameter and is named after the value of your Houdini session’s PID (Process Identifier). The PID is typically a 3-5 digit number.
This is necessary when multiple sessions of Houdini are cooking TOP graphs at the same time.
Custom
The full path to the custom temporary directory, which needs to be accessible by all blades involved in executing the Job.
Delete Temp Dir
Determines when PDG should automatically delete the temporary file directory associated with the scheduler.
Never
PDG never automatically deletes the temp file directory.
When Scheduler is Deleted
PDG automatically deletes the temp file directory when the scheduler is deleted or when Houdini is closed.
When Cook Completes
PDG automatically deletes the temp file directory each time a cook completes.
Job Spec ¶
Job Owner
Specifies the username of the owner of the job.
Job Title
Specifies the title of the top-level job for the submitted cooks.
Job Priority
Specifies the priority of the cook jobs.
Tier
Specifies a list of valid site-wide tiers, where each tier represents a particular global job priority and scheduling discipline.
Projects
Specifies the names of project affiliations for this job.
Max Active Tasks
When on, sets the maximum number of tasks that the PDG cook job is allowed to run concurrently.
After Jobs
Specifies which jobs need to be complete before job processing can begin. You can specify a single job ID or a space-separated list of multiple job IDs. Once spooled, this parameter’s settings will delay the start of job processing until the specified jobs have completed.
Verbose Logging
When on, detailed messages from scheduler binding are printed to console.
Use Session File
When on, the Tractor API creates a temporary file to avoid the need to authenticate the local user multiple times in one session of Houdini. The file is created as $TEMP/.pdgtractor.{user}.{host}.session.
Submit As Job ¶
Submit
Cooks the entire TOP Network as a standalone job and displays the status URI for the submitted job.
By default, the submitted job uses the Tractor login and sets it in the job environment with $PDG_TRACTOR_USER and $TRACTOR_PASSWORD. If these are not present, then the Tractor User and Tractor Password parameter values are used.
Job Title
Specifies the title of the job that is submitted.
Job Verbosity
Specifies the verbosity level of the standalone job.
Output Node
When on, specifies the path to the node to cook. If a node is not specified, the display node of the network that contains the scheduler is cooked instead.
Save Task Graph File
When on, the submitted job saves a task graph .py file once the cook completes.
Job Service Keys
Specifies the Tractor Service Key expression for the job that will execute the TOP graph on the farm.
You may want to use a cheaper slot for this because although executing the TOP graph requires a separate task, it does not consume much memory or CPU.
Enable Server
When on, turns on the data layer server for the TOP job that will cook on the farm. This allows PilotPDG or other WebSocket clients to connect to the cooking job remotely to view the state of PDG.
Server Port
Determines which server port to use for the data layer server.
This parameter is only available when Enable Server is on.
Automatic
A free TCP port to use for the data layer server chosen by the node.
Custom
A custom TCP port to use for the data layer server specified by the user.
This is useful when there is a firewall between the farm machine and the monitoring machine.
Auto Connect
When on, the scheduler will try to send a command to create a remote visualizer when the job starts. If successful, then a remote graph is created and is automatically connected to the server executing the job. The client submitting the job must be visible to the server running the job or the connection will fail.
This parameter is only available when Enable Server is on.
When Finished
Determines what to do when the TOP Cook finishes. This allows the TOP Cook job to continue running after the graph cook completes so that it can be inspected by a wrangler using a Data Layer viewer. For example, with When Finished you can retry a failed work item without restarting its whole job.
Terminate
Exit the job as normal.
Keep Open If Error
Keep the job running only if there is an error detected. You will need to kill the job manually.
Keep Open
Keep the job running. You will need to kill the job manually.
Message Queue ¶
Service Keys
Specifies the Tractor Service Key expression for the task that will execute the Message Queue Server.
You may want to use a cheaper slot for this because although the Message Queue Process requires a separate task, it does not consume much memory or CPU.
Please note that a Message Queue Task is not created when a graph is cooked via Submit Graph As Job.
Task Callback Port
When on, sets the TCP Port used by the Message Queue Server for the job callback API. The port must be accessible between farm blades.
Relay Port
When on, sets the TCP Port used by the Message Queue Server connection between PDG and the blade that is running the Message Queue Command. The port must be reachable on farm blades by the PDG/user machine.
RPC Server ¶
Parameters for configuring the behavior of RPC connections from out of process jobs back to a scheduler instance.
Ignore RPC Errors
Determines whether RPC errors should cause out of process jobs to fail.
Never
RPC connection errors will cause work items to fail.
When Cooking Batches
RPC connection errors are ignored for batch work items, which typically make a per-frame RPC back to PDG to report output files and communicate sub item status. This option prevents long-running simulations from being killed on the farm, if the submitter Houdini session crashes or becomes unresponsive.
Always
RPC connection errors will never cause a work item to fail. Note that if a work item can’t communicate with the scheduler, it will be unable to report output files, attributes or its cook status back to the PDG graph.
Max RPC Errors
The maximum number of RPC failures that can occur before RPC is disabled in an out of process job.
Connection Timeout
The number of seconds to wait when an out-of-process job makes an RPC connection to the main PDG graph before assuming the connection failed.
Connection Retries
The number of times to retry a failed RPC call made by an out of process job.
Retry Backoff
When Connection Retries is greater than 0, this parameter determines how much time should be spent between consecutive retries.
Batch Poll Rate
Determines how quickly an out-of-process batch work item should poll the main Houdini session for dependency status updates, if the batch is configured to cook when its first frame of work is ready. This has no impact on other types of batch work items.
Release Job Slot When Polling
Determines whether or not the scheduler should decrement the number of active workers when a batch is polling for dependency updates.
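The interaction between Connection Retries and Retry Backoff can be pictured with a generic retry loop. This is a sketch under stated assumptions, not PDG's RPC client: the function name is hypothetical, the delay between attempts is assumed to be constant, and `rpc_call` stands in for any callable that raises `ConnectionError` on failure.

```python
import time


def call_with_retries(rpc_call, retries=2, backoff=1.0, sleep=time.sleep):
    """Call rpc_call, retrying up to `retries` times after a failure.

    `backoff` is the number of seconds to wait between consecutive attempts.
    A constant delay is an assumption made for this sketch.
    """
    attempt = 0
    while True:
        try:
            return rpc_call()
        except ConnectionError:
            if attempt >= retries:
                # Out of retries: let the caller decide how to handle it.
                raise
            attempt += 1
            sleep(backoff)
```

With `retries=2`, the call is attempted at most three times in total. Whether the final failure then fails the work item is what Ignore RPC Errors controls.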
Job Parms ¶
These job-specific parameters affect all submitted jobs, but can be overridden on a node-by-node basis. See Scheduler Job Parms / Properties.
Service Key Expression
Specifies the job service key expression. This determines the type of blade that can run this job.
At Least Slots
Sets the minimum number of free slots that must be available on a Tractor blade in order to execute this command.
At Most Slots
When on, sets the maximum number of free slots that this command can use when launched. This is used as the default Houdini Max Threads value unless that parameter is explicitly set.
Houdini Max Threads
When on, sets the HOUDINI_MAXTHREADS environment variable to the given value. By default, HOUDINI_MAXTHREADS is set to the value of At Most Slots when enabled.
The default value of 0 means to use all available processors.
If the value is positive, the value limits the number of threads that can be used. A value of 1 disables multithreading entirely by limiting it to only one thread. Positive values are clamped to the number of available CPU cores.
If the value is negative, the value is added to the maximum number of processors to determine the threading limit. For example, a value of -1 uses all CPU cores except 1.
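The three cases above (zero, positive, negative) can be expressed as a small function. This sketch only restates the documented rules. The function name is hypothetical, and the floor of one thread for large negative values is an assumption.

```python
def effective_thread_limit(max_threads, cpu_count):
    """Translate a Houdini Max Threads value into an actual thread count."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    if max_threads == 0:
        # 0 means use all available processors.
        return cpu_count
    if max_threads > 0:
        # Positive values are clamped to the number of available cores.
        return min(max_threads, cpu_count)
    # Negative values are added to the processor count. Never dropping
    # below one thread is an assumption of this sketch.
    return max(1, cpu_count + max_threads)
```

On a 16-core blade, for example, a value of -1 leaves one core free, while a value of 32 is clamped to 16.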
Env Keys
Specifies a space-separated list of environment keys that are defined in the blade profiles.
Task Title
Specifies a custom task name prefix. By default, the corresponding work item name is used. The name suffix is a value used internally by PDG for bookkeeping.
Maximum Run Time
Specifies a maximum runtime limit for tasks (in seconds). When tasks run past the time limit, they are then killed. By default, there is no maximum runtime limit as the default value is 0.
Post Success Wait
Specifies how many seconds to wait before exiting a successful job. This stops Tractor from immediately re-assigning a blade before a dependent, higher priority job can be spooled by PDG.
Metadata
Specifies an arbitrary string that is attached to the task definition.
Preview Launch
Specifies a launch expression to run an external application from the Tractor UI. This allows you to view in-progress cook results using an external application.
Tip
TOPs has its own internal viewer registry.
Handle By
Determines what to do when the command fails (returns a non-zero exit code).
Reporting Error
The work item fails.
Reporting Warning
The work item succeeds and a warning is added to the node.
Retrying Task
The work item is retried by Tractor for the number of Retries remaining.
Ignoring Exit Code
The work item succeeds.
Handle All Non Zero
When off, you can specify a particular exit code.
Exit Code
Specifies the exit code that you want to handle using Handle By. All other non-zero exit codes will be treated as a failure as normal.
This parameter is only available when Handle All Non Zero is off.
Retries
Number of times to retry the job when the command fails.
This parameter is only available when Handle By is set to Retrying Task.
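Taken together, Handle By, Handle All Non Zero, Exit Code, and Retries form a small decision table. The sketch below encodes it for illustration. The enum, function name, and returned strings are hypothetical, and treating a task with no retries left as failed is an assumption.

```python
from enum import Enum


class HandleBy(Enum):
    REPORTING_ERROR = "Reporting Error"
    REPORTING_WARNING = "Reporting Warning"
    RETRYING_TASK = "Retrying Task"
    IGNORING_EXIT_CODE = "Ignoring Exit Code"


def outcome_for_exit(exit_code, handle_by, handle_all_non_zero=True,
                     handled_code=1, retries_left=0):
    """Return the work item outcome for a finished command."""
    if exit_code == 0:
        return "succeeded"
    # With Handle All Non Zero off, only the chosen Exit Code gets the
    # Handle By treatment; other non-zero codes fail as normal.
    if not handle_all_non_zero and exit_code != handled_code:
        return "failed"
    if handle_by is HandleBy.REPORTING_WARNING:
        return "succeeded with warning"
    if handle_by is HandleBy.IGNORING_EXIT_CODE:
        return "succeeded"
    if handle_by is HandleBy.RETRYING_TASK and retries_left > 0:
        return "retry"
    # Reporting Error, or no retries remaining (the latter is an assumption).
    return "failed"
```

For example, a command that exits with code 3 while Handle All Non Zero is off and Exit Code is 2 fails regardless of the Handle By setting.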
Inherit Local Environment
When on, environment variables in the current session of PDG are copied into the job’s environment.
Unset Variables
Space-separated list of environment variables that should be unset in the task environment.
Environment File
Specifies an environment file for environment variables to be added to the job’s environment. An environment variable from the file will overwrite an existing environment variable if they share identical names.
Environment Variables
Multiparm that lets you add custom key-value environment variables for each task. These are added to the job’s environment. If the value of a variable is empty, it is removed from the job’s environment.
Name
Name of the work item environment variable.
Value
Value of the work item environment variable.
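The environment-related job parameters (Inherit Local Environment, Unset Variables, Environment File, and Environment Variables) combine into one job environment. The sketch below shows one plausible way they could compose. The function name is hypothetical, and the order in which the steps are applied is an assumption, since this page does not specify it.

```python
def build_job_env(local_env, inherit=True, unset="", file_vars=None,
                  extra_vars=None):
    """Assemble a job environment from the environment-related parameters.

    Assumed order: inherited variables, then the environment file, then the
    Environment Variables multiparm, then Unset Variables.
    """
    env = dict(local_env) if inherit else {}
    # Variables from the environment file overwrite same-named entries.
    env.update(file_vars or {})
    for name, value in (extra_vars or {}).items():
        if value == "":
            # An empty value removes the variable from the job environment.
            env.pop(name, None)
        else:
            env[name] = value
    # Unset Variables is a space-separated list of names.
    for name in unset.split():
        env.pop(name, None)
    return env
```

Passing `inherit=False` starts from an empty environment, which mirrors turning Inherit Local Environment off.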
Pre Shell
Specifies a shell script to be executed/sourced before the command is executed.
Post Shell
Specifies a shell script to be executed/sourced after the command is executed.
Pre Python
Specifies the Python script to be executed in the wrapper script before the command process is spawned.
Post Python
Specifies the Python script to be executed in the wrapper script after the command process exits.