Overview ¶
Schedulers are one of the main types of node in a PDG graph. The purpose of a scheduler node is to execute ready work items that are submitted to the scheduler node by PDG.
In addition, the scheduler must report status changes and ensure that jobs can communicate back to PDG when necessary. By default, jobs use the supporting Python module pdgjson to communicate via XMLRPC, and the scheduler is responsible for ensuring that the XMLRPC server is running. A custom scheduler can replace this mechanism if required.
Scheduler Callbacks ¶
Each scheduler node has several callbacks that can be implemented to control how it operates. When writing a scheduler node, the only callback you're required to implement is onSchedule, since that hook is responsible for actually executing the ready work items. Work items that are marked as being in-process will not reach the scheduler and will instead be handled by PDG’s internal scheduler. The other callbacks are optional and are only needed to further customize the behavior of the node.
Warning
The only callback that can safely write work item attributes is onSchedule. If you want to add attributes to a work item in the onTick callback, you need to use pdg.WorkItem.lockAttributes to safely modify the work item.
Additionally, your scheduler node should only keep references to work items that are actively running. Once your scheduler notifies PDG that a work item has succeeded or failed, it should no longer hold a reference to that work item.
The full scheduler node API is described in pdg.Scheduler.
applicationBin(self, name, work_item)
→ str
This callback is used when a node is creating a command that uses an application that can be parameterized by the scheduler. For example there may be UI to control which 'python' application should be used for python-based jobs.
Custom scheduler bindings can use their own application 'names' to work with custom nodes.
At minimum 'hython' and 'python' should be supported.
onSchedule(self, work_item)
→ pdg.scheduleResult
This callback is evaluated when the given pdg.WorkItem is ready to be executed. The scheduler should create the necessary job spec for its farm scheduler and submit it if possible. If it doesn’t have enough resources to execute the work item, it should return Deferred or FullDeferred, which tells PDG that the scheduler can’t accommodate the work item right now and that PDG should check back later.
Otherwise it should return Succeeded to indicate that the work item has been accepted.
The other return values are used when a work item for some reason is handled immediately. This is not generally recommended because it will force work items to execute in series.
For example, Local Scheduler will return FullDeferred if it determines that all available 'slots' on the local machine are in use. On the other hand it will return Deferred if there are slots available but not enough for this particular work item. If there are enough slots, it will deduct the slots required, spawn a subprocess for the work item, and then add the work item to a private queue of running items to be tracked.
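The slot logic described above can be sketched in isolation. This is a minimal illustration and not the Local Scheduler's actual implementation: `ScheduleResult` is a stand-in for pdg.scheduleResult so the code runs outside Houdini, and `SlotPool` and its methods are hypothetical names.

```python
from enum import Enum


class ScheduleResult(Enum):
    """Stand-in for pdg.scheduleResult; only the values named above."""
    SUCCEEDED = "Succeeded"
    DEFERRED = "Deferred"
    FULL_DEFERRED = "FullDeferred"


class SlotPool:
    """Hypothetical slot bookkeeping for a local-style scheduler."""

    def __init__(self, total_slots):
        self.total_slots = total_slots
        self.used_slots = 0
        self.running = {}  # item name -> slots held

    def try_schedule(self, item_name, slots_needed):
        free = self.total_slots - self.used_slots
        if free <= 0:
            # Every slot is busy: nothing can be scheduled right now.
            return ScheduleResult.FULL_DEFERRED
        if slots_needed > free:
            # Some slots are free, but not enough for this item.
            return ScheduleResult.DEFERRED
        # Reserve the slots and track the item as running. A real
        # scheduler would spawn the job's subprocess at this point.
        self.used_slots += slots_needed
        self.running[item_name] = slots_needed
        return ScheduleResult.SUCCEEDED

    def release(self, item_name):
        # Return the slots and stop tracking the finished item.
        self.used_slots -= self.running.pop(item_name)


pool = SlotPool(total_slots=4)
first = pool.try_schedule("item_a", 3)   # fits: 4 slots free
second = pool.try_schedule("item_b", 2)  # only 1 slot free
third = pool.try_schedule("item_c", 1)   # takes the last slot
fourth = pool.try_schedule("item_d", 1)  # no slots free
```

In an actual onSchedule implementation the same three outcomes would be returned as the corresponding pdg.scheduleResult values.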
Note that how often this callback is called is controlled by the PDG node parameters pdg_maxitems and pdg_tickperiod (see onTick below).
onTick(self)
→ pdg.tickResult
This callback is called periodically when the graph is cooking. The callback is generally used to check the state of running work items. This is also the only safe place to cancel an ongoing cook.
The period of this callback is controlled by the PDG node parameter pdg_tickperiod, and the maximum number of onSchedule callbacks for ready items between ticks is controlled by the node parameter pdg_maxitems. For example, by default the tick period is 0.5s and the max items per tick is 30, so onSchedule will be called at most 60 times per second. Adjusting these values can be useful to control the load on the farm scheduler.
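The relationship between the two parameters is a simple ratio. The helper below is a hypothetical illustration of that arithmetic and is not part of the PDG API.

```python
def max_schedule_calls_per_second(max_items, tick_period):
    """Upper bound on onSchedule calls per second of cooking.

    max_items   -- value of pdg_maxitems (items scheduled per tick)
    tick_period -- value of pdg_tickperiod, in seconds
    """
    if tick_period <= 0:
        raise ValueError("tick_period must be positive")
    return max_items / tick_period


# Default values quoted above: 30 items per 0.5s tick.
default_rate = max_schedule_calls_per_second(max_items=30, tick_period=0.5)
# A gentler setting for a farm scheduler that is easily overloaded.
throttled_rate = max_schedule_calls_per_second(max_items=10, tick_period=2.0)
```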
The callback should return SchedulerReady if the scheduler is ready to accept new work items, and should return SchedulerBusy if it’s full at the moment. In case there is a serious problem with the scheduler (for example the connection to the farm is lost), it should return SchedulerCancelCook.
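A tick handler typically polls the jobs it is tracking and then reports its capacity. The sketch below shows one way to structure that, under several assumptions: `TickResult` is a stand-in for pdg.tickResult so the code runs outside Houdini, and `tick`, `poll_job`, and `max_running` are hypothetical names for illustration.

```python
from enum import Enum


class TickResult(Enum):
    """Stand-in for pdg.tickResult; only the values named above."""
    READY = "SchedulerReady"
    BUSY = "SchedulerBusy"
    CANCEL_COOK = "SchedulerCancelCook"


def tick(running, poll_job, max_running, farm_connected=True):
    """Poll running jobs once and decide what the tick should report.

    running     -- dict of item name -> job id, mutated in place
    poll_job    -- hypothetical callable: job id -> "running", "done" or "failed"
    max_running -- hypothetical capacity limit of this scheduler
    """
    if not farm_connected:
        # Serious problem, such as a lost farm connection.
        return TickResult.CANCEL_COOK, []
    finished = []
    for name, job_id in list(running.items()):
        state = poll_job(job_id)
        if state != "running":
            # Stop tracking the item as soon as it has finished.
            del running[name]
            finished.append((name, state))
    if len(running) >= max_running:
        return TickResult.BUSY, finished
    return TickResult.READY, finished


states = {101: "done", 102: "running", 103: "failed"}
running = {"item_a": 101, "item_b": 102, "item_c": 103}
result, finished = tick(running, states.get, max_running=2)
```

In a real scheduler, each entry in `finished` would be reported to PDG as succeeded or failed, after which the scheduler should no longer hold a reference to that work item.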
onAcceptWorkItem(self, work_item)
→ pdg.acceptResult
By default, custom schedulers will only accept out-of-process work items. In-process work items, like the ones in an Invoke TOP or a Python Script TOP, will be handled internally by PDG itself. The optional onAcceptWorkItem callback can be used to override that behavior.
The callback is called to determine if the scheduler is able to process a given work item. If it returns pdg.acceptResult.Accept, then the work item will be queued with the scheduler and passed to an onSchedule call at a later point in time. pdg.acceptResult.Reject indicates that the scheduler cannot process the specified work item, and pdg.acceptResult.Default indicates that the default behavior should be used instead.
Note
You should not try to actually cook or schedule the work item in this callback. It should only be used to determine if the work item is compatible with the custom scheduler.
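The three outcomes can be illustrated with a small decision function. This is a sketch under assumptions: `AcceptResult` is a stand-in for pdg.acceptResult so the code runs outside Houdini, and `FARM_PLATFORMS`, `accept_decision`, and both of its arguments are hypothetical. A real callback would read the equivalent information from the work item.

```python
from enum import Enum


class AcceptResult(Enum):
    """Stand-in for pdg.acceptResult."""
    ACCEPT = "Accept"
    REJECT = "Reject"
    DEFAULT = "Default"


# Hypothetical: the operating systems available on this farm.
FARM_PLATFORMS = {"linux"}


def accept_decision(in_process, required_platform=None):
    """Classify a work item without cooking or scheduling it."""
    if in_process:
        # Keep the default behavior: PDG handles in-process items itself.
        return AcceptResult.DEFAULT
    if required_platform is not None and required_platform not in FARM_PLATFORMS:
        # The farm cannot run this item at all.
        return AcceptResult.REJECT
    return AcceptResult.ACCEPT


decisions = [
    accept_decision(in_process=True),
    accept_decision(in_process=False, required_platform="windows"),
    accept_decision(in_process=False, required_platform="linux"),
]
```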
onConfigureCook(self, cook_options)
This callback is called before the graph begins to cook, after the list of schedulers for the cook is chosen. It can be used to change cook options before the cook begins.
cook_options is a writeable reference to the pdg.CookOptions used by the current cook.
Note
Not all cook options can be changed by this callback. For example, changing the pdg.CookOptions.nodeNames option will have no effect because the PDG graph will have already processed it and determined the list of nodes/schedulers to cook prior to calling this function.
onSetupCook(self)
This callback is called after onStartCook, but before any work items are scheduled.
Unlike onStartCook, which blocks the UI thread, this callback is run in the background. This makes it a better choice for setup tasks that can take a while to complete, like starting and connecting to the MQ server or copying files to remotely mounted filesystems.
onStartCook(self, static, cook_set)
→ bool
This callback is called when a PDG cook starts, after static generation.
static is True when a static cook is being performed instead of a full cook. See onScheduleStatic for details.
cook_set is the set of pdg.Node objects being cooked.
This can be used to initialize any resources or cache any values that apply to the overall cook. Returning False or raising an exception will abort the cook.
You should tell PDG what the user’s working directory is by calling:
self.setWorkingDir(local_path, remote_path)
onStopCook(self, cancel)
→ bool
Called when cooking completes or is canceled. If cancel is True, there will likely be jobs still running. In that case the scheduler should cancel them and block until they are actually canceled. This is also the time to tear down any resources that were set up in onStartCook. The return value is ignored.
onStart(self)
→ bool
Called by PDG when the scheduler is first created. Can be used to acquire resources that persist between cooks. The return value is ignored.
onStop(self)
→ bool
Called by PDG when the scheduler is cleaned up. Can be used to release resources. Note that this method may not be called in some cases when Houdini is shut down. The return value is ignored.
onCancelWorkItems(self, work_items, node)
Called when the scheduler should cancel a subset of the work items that have been scheduled during the current cook. If node is set to a value other than None, then all of the work items in the work_items list are from the same PDG node and the scheduler should cancel all tasks associated with that node. Otherwise, the scheduler should cancel the specific items listed in the work_items list.
For example, the HQueue scheduler cancels the top-level node job associated with the node if one is passed in; otherwise it cancels individual work item jobs based on the contents of the work_items list.
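The two cancellation paths can be sketched as a lookup over the scheduler's own bookkeeping. This is an illustration only and not the HQueue scheduler's code: `jobs_to_cancel`, `node_jobs`, and `item_jobs` are hypothetical names, and plain strings stand in for pdg.Node and pdg.WorkItem objects.

```python
def jobs_to_cancel(work_item_names, node_name, node_jobs, item_jobs):
    """Return the farm job ids that should be canceled.

    node_jobs -- hypothetical map of node name -> top-level farm job id
    item_jobs -- hypothetical map of work item name -> farm job id
    """
    if node_name is not None:
        # All items come from one node: cancel that node's top-level job.
        job = node_jobs.get(node_name)
        return [job] if job is not None else []
    # Otherwise cancel only the jobs for the listed work items.
    return [item_jobs[name] for name in work_item_names if name in item_jobs]


node_jobs = {"ropfetch1": 500}
item_jobs = {"ropfetch1_1": 501, "ropfetch1_2": 502, "render_7": 610}

by_node = jobs_to_cancel(["ropfetch1_1", "ropfetch1_2"], "ropfetch1", node_jobs, item_jobs)
by_item = jobs_to_cancel(["render_7", "unknown_item"], None, node_jobs, item_jobs)
```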
getStatusURI(self, work_item)
→ str
Called to return the status URI for the specified work item. This appears in the MMB detail window of a work item. It can be formatted to point to a local file with file:/// or a web page with http://.
getLogURI(self, work_item)
→ str
Returns the log URI for the specified work item. This appears in the MMB detail window of a work item, and is also available with the special @pdg_log
attribute. It can be formatted to point to a local file with file:///
or a web page with http://
.
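Both URI forms can be produced by a small helper. This is a minimal sketch, assuming that log locations are either web addresses or absolute POSIX-style paths; `to_uri` is a hypothetical function name and the example paths are made up.

```python
from urllib.parse import quote


def to_uri(location):
    """Format a log or status location as a URI string."""
    if location.startswith(("http://", "https://")):
        # Already a web address: pass it through unchanged.
        return location
    if not location.startswith("/"):
        raise ValueError("expected a URL or an absolute POSIX path")
    # Percent-encode unsafe characters; "/" is kept as a path separator.
    return "file://" + quote(location)


local_log = to_uri("/tmp/pdgtemp/logs/item_1.log")
spaced_log = to_uri("/tmp/my logs/item_2.log")
web_log = to_uri("http://farm.example.com/jobs/42/log")
```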
workItemResultServerAddr(self)
→ str
Returns the network endpoint for the work item result server, in the format <HOST>:<PORT>. This is equivalent to the __PDG_RESULT_SERVER__ command token and the job environment variable $PDG_RESULT_SERVER. This will typically be an XMLRPC API server.
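On the job side, the endpoint can be read from the environment and split into its two parts. The helper below is a hypothetical sketch and not part of pdgjson; it only assumes the <HOST>:<PORT> format described above.

```python
import os


def result_server_endpoint(environ=os.environ):
    """Split $PDG_RESULT_SERVER into a (host, port) tuple."""
    addr = environ.get("PDG_RESULT_SERVER", "")
    # Split on the last colon so the port is always the final component.
    host, sep, port = addr.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("PDG_RESULT_SERVER must look like <HOST>:<PORT>")
    return host, int(port)


# A made-up environment, standing in for the one a farm job would see.
host, port = result_server_endpoint({"PDG_RESULT_SERVER": "farmhost:8765"})
```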
onScheduleStatic(self, dependency_map, dependent_map, ready_items)
→ None
Called to do a static cook of the graph, which is a cook mode of StaticDepsFull or StaticDepsNode. Typically this function will build a complete job spec and submit this to the farm scheduler. How this is done depends on your farm scheduler API. For example the dependencies between work items may have to be translated into parent/child relationships in the job spec so that the work is executed in the correct order.
Note
This functionality is only needed if complete static cooks are required. See /tops/custom_scheduler.html#staticcook.
dependency_map is a map of pdg.WorkItem to a set of its dependency work items.
dependent_map is a map of pdg.WorkItem to a set of its dependent work items.
ready_items is a list of pdg.WorkItem that are ready to be executed.
Note that this information can be obtained via pdg.Graph.dependencyGraph:
import hou
import pdg
n = hou.node("/obj/topnet1/out")
# Call cookWorkItems to ensure PDG context is created
n.cookWorkItems(tops_only=True)
# Perform generation phase of PDG cook
n.getPDGGraphContext().cook(True, pdg.cookType.StaticDepsFull)
# Retrieve the generated task graph work items and topology
(dependencies, dependents, ready) = n.getPDGGraphContext().graph.dependencyGraph(True)
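One way to turn the three arguments into a submission order is a breadth-first topological traversal that starts from the ready items. The sketch below runs outside Houdini: plain strings stand in for pdg.WorkItem objects, `execution_order` is a hypothetical helper, and it assumes every work item appears as a key in dependency_map, with an empty set when it has no dependencies.

```python
from collections import deque


def execution_order(dependency_map, dependent_map, ready_items):
    """Return work items in an order that respects their dependencies."""
    # Count how many unfinished dependencies each item still has.
    remaining = {item: len(deps) for item, deps in dependency_map.items()}
    queue = deque(ready_items)
    order = []
    while queue:
        item = queue.popleft()
        order.append(item)
        # sorted() only keeps this string-based demo deterministic.
        for child in sorted(dependent_map.get(item, ())):
            remaining[child] -= 1
            if remaining[child] == 0:
                # All dependencies are placed, so the child can follow.
                queue.append(child)
    return order


# A diamond-shaped graph: a -> (b, c) -> d
dependency_map = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
dependent_map = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}
order = execution_order(dependency_map, dependent_map, ["a"])
```

A farm scheduler with native job dependencies would usually submit the parent/child relationships directly rather than flattening them into a list. The ordering above is mainly useful for checking that every job's parents are submitted before the job itself.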
Note
This mode of cooking is not exposed in the TOP UI and is not supported by the stock schedulers, although Local Scheduler has a basic implementation for demonstration purposes. To trigger this mode of cooking, you can call pdg.GraphContext.cook with a mode of StaticDepsFull or StaticDepsNode.
The implementation should save the required data and return immediately from this function. Then it should asynchronously manage the execution of the graph and report back all state changes via the scheduler node functions onWorkItemSucceeded, onWorkItemFailed or onWorkItemCanceled. In addition, it should ensure that all attribute changes and added files during the job are reported back to PDG, for example by calling onWorkItemAddOutput.
Once all work items have been reported back to PDG as finished the static cook will end.