Since 20.5
Overview ¶
This is a generic ML training node that can solve regression problems. In ML, regression means training a feedforward neural network (a model) so it can closely approximate a continuous function from a fixed number of input variables to a fixed number of output variables. In Houdini, this function may be provided in the form of a procedural network. See Machine Learning documentation for more general information.
For example, regression ML may learn how to realistically deform a character based on its rig pose to improve over linear blend skinning. See ML Deformer Content Library. In this case, you use ML to learn the function that maps a rig pose to a skin deformation.
ML Regression Train trains a model on a data set consisting of labeled examples, see ML Example. Each labeled example is a pair consisting of an input component and a corresponding target component. For regression, both the input component and the target component are tuples of continuous variables, which may include point coordinates, colors, or PCA components. The term regression is used because each target component consists of continuous variables. Labeled examples are the basis for training. In the specific case of the ML Deformer, each input component is a (serialized) pose and the corresponding target component is the (PCA-compressed) skin deformation obtained by a flesh simulation.
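For illustration only, a labeled example can be thought of as a pair of fixed-size float tuples. The sizes and values in this Python sketch are made up; a real ML Deformer data set uses many more variables:

```python
import numpy as np

# Hypothetical labeled example: 9 input variables and 4 target variables.
input_component = np.array([0.12, -0.48, 1.05, 0.0, 0.3, -0.7, 0.9, 0.1, -0.2])  # e.g. a serialized pose
target_component = np.array([2.4, -0.8, 0.15, 0.02])                             # e.g. PCA components of a deformation

labeled_example = (input_component, target_component)
```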
ML Regression Train creates a model that predicts a target component given some user-specified input component. The goal is a model that generalizes well; it should produce useful predictions for inputs that are not included in the training portion of the data set. ML Regression Train provides several methods that can help ensure that a trained model performs well on unseen inputs. These are called regularization methods.
The trained model produced by ML Regression Train is written out to disk in ONNX format. The easiest way to bring this model into Houdini is with the ML Regression Inference tool.
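The export itself is handled internally by the node. As a general illustration of the mechanism (not the node's actual implementation), a trained PyTorch module can be written to ONNX roughly like this, with a placeholder model and file name:

```python
import torch

# Placeholder model; in practice this would be the trained network.
model = torch.nn.Linear(8, 4)
model.eval()

# A dummy input with the shape of one input component lets the exporter trace the network.
dummy_input = torch.zeros(1, 8)

# Write the network to disk in ONNX format, ready to be loaded for inference.
torch.onnx.export(model, dummy_input, "model.onnx")
```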
ML Regression Train creates a feedforward neural network. The number of inputs in this network corresponds to the number of input variables. The number of outputs of the network corresponds to the number of target variables. ML Regression Train automatically figures out the number of inputs and outputs of the network using the dimensions stored in the raw data set that it works with.
In between, there are hidden layers; the number of hidden layers can be controlled by the user. The width of these hidden layers is the maximum of the number of inputs and the number of outputs. The behavior of the network is controlled by a set of parameters. Each parameter is either a weight (a scaling factor) or a bias (an additive constant). The training aims to optimize these parameters for the regression task.
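As a rough sketch of this kind of architecture (not the node's internal implementation), a fully connected network whose hidden width is the maximum of the input and output counts could look as follows in PyTorch; the activation function and layer counts are illustrative assumptions:

```python
import torch.nn as nn

def make_mlp(num_inputs, num_outputs, num_hidden_layers):
    # Fully connected linear layers alternated with activation layers.
    width = max(num_inputs, num_outputs)  # hidden width, as described above
    layers = [nn.Linear(num_inputs, width), nn.ReLU()]
    for _ in range(num_hidden_layers - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, num_outputs))  # one output per target variable
    return nn.Sequential(*layers)

# Example: 27 input variables (e.g. a serialized pose), 16 target variables.
model = make_mlp(27, 16, num_hidden_layers=3)
```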
ML Regression Train does multiple passes through the training portion of the data set. Each such pass is called an epoch.
The training process may be repeated multiple times, each time with different settings on the training node. These training settings are commonly referred to as hyperparameters. Common examples of such hyperparameters are settings such as the learning rate, the number of hidden layers of the network and the width of the network. The Wedge TOP allows the training process to be repeated with varying hyperparameters. For each setting in a wedge, the resulting model can be saved to a separate file by adding a TOP attribute name at the end of the Model Base Name.
ML Regression Train internally splits the data set provided by the user into two portions: a training portion and a validation portion. These portions can be controlled by the user. The training portion is used to improve the network’s parameters during each training epoch. The validation portion is used to verify the network’s performance on data that is not used for the training.
Overfitting occurs when a model produces outputs that are accurate for inputs in the training data but inaccurate for unseen inputs outside the training data. To reduce overfitting, the ML Regression Train TOP supports two simple regularization strategies: early stopping and weight decay. Early stopping checks whether the validation loss (the loss on the validation set) has stopped decreasing; if so, the training terminates. Weight decay modifies the loss function used for training to include a term that penalizes large weights.
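The snippet below sketches both ideas in isolation, independent of the node's actual training script; the network, optimizer settings, and data are placeholders:

```python
import torch

model = torch.nn.Linear(8, 4)      # placeholder network
loss_fn = torch.nn.MSELoss()

# Weight decay penalizes large weights; biases are left unregularized here.
weights = [p for n, p in model.named_parameters() if not n.endswith("bias")]
biases = [p for n, p in model.named_parameters() if n.endswith("bias")]
optimizer = torch.optim.Adam(
    [{"params": weights, "weight_decay": 1e-4},
     {"params": biases, "weight_decay": 0.0}],
    lr=1e-3)

# Placeholder data, split into a training portion and a validation portion.
inputs, targets = torch.randn(1000, 8), torch.randn(1000, 4)
train_x, val_x = inputs[:800], inputs[800:]
train_y, val_y = targets[:800], targets[800:]

best_val_loss, patience, strikes = float("inf"), 5, 0
for epoch in range(10000):
    optimizer.zero_grad()
    loss_fn(model(train_x), train_y).backward()
    optimizer.step()

    # Early stopping: terminate once the validation loss stops decreasing.
    with torch.no_grad():
        val_loss = loss_fn(model(val_x), val_y).item()
    if val_loss < best_val_loss:
        best_val_loss, strikes = val_loss, 0
    else:
        strikes += 1
        if strikes >= patience:
            break
```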
Partial Training ¶
Instead of training a model all at once, a model can be partially trained bit by bit, where each training pass consists of a limited number of epochs. Breaking up the training into several passes offers various training setup possibilities. For example, it allows partial models to be periodically saved to disk. By inferencing each of the saved-out partially trained models, it is possible to visualize the training progress.
Another training setup is to generate fresh training data before each partial training pass. This may be applicable when the data set is generated within Houdini through a procedural network.
Training on a stream of fresh training data, compared to revisiting data points in a fixed-size data set, may produce a model that generalizes better to unseen data. Another advantage is that the amount of training data does not have to be decided upfront.
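In plain PyTorch terms, partial training amounts to running a bounded number of epochs per pass, optionally on freshly generated data, while carrying the model and optimizer state from one pass to the next. A minimal sketch, with placeholder data generation and file names:

```python
import torch

model = torch.nn.Linear(8, 4)      # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

def generate_fresh_data():
    # Placeholder; in Houdini, a procedural network would produce
    # a new data set before each partial training pass.
    return torch.randn(256, 8), torch.randn(256, 4)

num_passes, epochs_per_pass = 10, 1024
for pass_index in range(num_passes):
    inputs, targets = generate_fresh_data()
    for epoch in range(epochs_per_pass):
        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()
        optimizer.step()
    # Save a snapshot after each pass, e.g. to visualize training progress.
    torch.onnx.export(model, torch.zeros(1, 8), f"model_pass{pass_index}.onnx")
```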
Breaking up the training into several passes can be achieved by placing and configuring TOP nodes around ML Regression Train as follows:

- ML Regression Train can be placed inside a feedback block, see Feedback Begin.

- The input to Feedback Begin should have one work item per partial training pass.

  - For example, a Wedge node can be put before Feedback Begin, on which Wedge Count specifies the number of partial training passes.

- The Create Iterations From parameter on Feedback Begin should be set to Upstream Items.

- The maximum number of epochs per partial training pass can be controlled by setting Max Epochs on ML Regression Train to some relatively small value such as 1024.

- To output and retain partially trained ONNX models, you can specify on the ML Regression Train node:

  - In the Files tab, a Model Base Name that includes an attribute such as @wedgeindex.

- To generate fresh training data before each partial training pass, you can put a ROP Fetch that triggers the generation of a new data set in front of ML Regression Train inside the feedback loop.
File Access ¶
ML Regression Train accesses a variety of files. Each file name can optionally be modified by inserting one or more TOP attributes into the base name of a file, for example @wedgeindex or @loopiter.
This enables various types of TOP ML training setups, such as:

- Training for varying hyperparameters with the use of a Wedge TOP,

- Performing repeated partial training in a TOP feedback loop,

- Generating additional training data on demand in a TOP feedback loop.
The files ML Regression Train accesses fall into three categories: files that are read-only, files that are both read from and written to, and files that are only written to (or appended to).
In the read-only category, there is the data set, specified using Data Set Folder and Data Set Base Name. This provides the source for the training data and, if early stopping is on, the validation data.
In the write-only category, there are partially trained models in ONNX format, with file names determined by Models Folder and Model Base Name.
The current state of the training is specified using States Folder and State Base Name. The training state is read at the start of a single invocation of ML Regression Train and written at the end. The training state allows ML Regression Train to resume training where it left off in a previous invocation, provided the state name is the same as in the previous invocation. The training state can act like a checkpoint, allowing an incomplete training to resume.
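The exact contents of the node's state files are internal to the node, but the general checkpointing idea can be sketched as follows; the file name and stored fields are illustrative:

```python
import os
import torch

model = torch.nn.Linear(8, 4)            # placeholder network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
state_file = "training_state.pt"         # hypothetical state file name

# Resume from the previous invocation if a state with this name exists.
if os.path.exists(state_file):
    state = torch.load(state_file)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])

# ... train for a limited number of epochs ...

# Write the state back out so a later invocation can pick up where this one left off.
torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict()}, state_file)
```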
Diagnostic and progress information related to the training is logged to a file location controlled using Logs Folder and Log Base Name. If a log file already exists, log information is appended to it. This allows a single log file to be created even when a single training is broken up into parts by invoking ML Regression Train multiple times, for example in a TOP feedback loop.
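Appending is a standard file operation; conceptually it amounts to something like the following, with an illustrative file name and message:

```python
# Opening the log in append mode keeps a single log file across multiple
# invocations, for example across the iterations of a TOP feedback loop.
with open("training_log.txt", "a") as log_file:
    log_file.write("epoch 42: training loss 0.0153, validation loss 0.0171\n")
```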
Limitations ¶
ML Regression Train and its associated SOP nodes provide an easy starting point for experimentation with regression ML in Houdini. At some point you may outgrow the ML Regression Train node because you need training functionality that is specific to your problem, for example other neural network architectures, a different regularization approach, a different cost function, or additional input data. In that case, you can extract and copy the internals of this node and its training script and use them as a basis for creating a modified training node. You may still be able to use many of the other tools on the SOP side of the example-based ML toolkit.
Parameters ¶
Architecture
Architecture of the network.
Fully Connected Layers (MLP)
Creates a network consisting of fully connected linear layers alternating with activation layers.
Custom
Creates a network based on an architecture specified entirely by the user.
Constant Output Layer
Creates a network that consists of only output units. All inputs are ignored.
Custom Network Format
The format used to specify a custom neural network structure. There is currently only one choice: Script.
Network Specification Script
Shuffle
When on (recommended), the elements of the data set are re-ordered randomly before any of the data set is used. Having this on ensures that the validation set, which consists of the last contiguous part of the data set, is a random sample of the data set.
Limit Size
When on, only an initial part of the data set is preserved; the remaining data is deleted. This step takes place right after shuffling, before any of the data is used for training. This option is useful for finding out how the generalization error of the trained model depends on the data size (making use of a Wedge TOP, for example). The resulting curve may indicate whether more data would be beneficial to improve the generalization.
Upper Limit
Specifies an upper limit on the number of data set elements that are preserved. The size of the remaining data set is the minimum of the initial data set size and this limit.
Random Seed
The random seed used to initialize the parameters of the neural network. Different random seeds may result in different models, with different accuracies. This hyperparameter is a candidate for wedging.
Epochs per Evaluation
The number of epochs that are trained before each validation loss evaluation.
Patience
The number of times the validation loss is evaluated without finding an improvement of the current best validation loss before giving up.
Note
This parameter is expressed in terms of the number of evaluations, not epochs. See the Epochs per Evaluation parameter for how many epochs are trained between evaluations.
Max Batch Size
Upper limit on the number of labeled examples, randomly selected from the training set, that are considered for each optimization step.
Algorithm
Choose the optimization algorithm that is used for training. Currently, the only option is the Adam optimizer (see the PyTorch documentation).
Learning Rate
This controls the step size that is used while training. The larger the learning rate, the larger each update of the network parameters tends to be. Choosing a smaller learning rate may make training take longer, but helps avoid good solutions being skipped over.
Beta1
This coefficient is specific to the Adam optimization algorithm. See the PyTorch documentation.
Beta2
This coefficient is specific to the Adam optimization algorithm. See the PyTorch documentation.
Limit Epochs
Enforce a hard upper limit on the number of epochs during training.
Max Epochs
Hard upper limit on the number of epochs. Setting this number too low may negatively affect the accuracy of the model.
Weight Decay
The higher this value is set, the more the training session will try to keep the weights small (only the weights, not the biases). This is a very basic method for preventing overfitting.
Enable Early Stopping
When on, stops the training as soon as the performance of the model on the validation set stops improving.
Training Data Proportion
The proportion of the data set that is used to train the model. This parameter applies only when Enable Early Stopping is enabled.
Validation Data Proportion
The proportion of the data set that is used to validate the performance of the model. The validation set consists of a contiguous range of elements at the end of the data (after shuffling, if that’s enabled). It is recommended to turn on the Shuffle option, otherwise the validation set will generally not consist of a random sample of the entire data set. This parameter applies only when Enable Early Stopping is enabled.
Log to Standard Output
If enabled, information is written to the standard output during training. This does not stop the same information from being written out to log files.
Data Set Folder
Source folder that contains one or more data sets.
Data Set Base Name
The base name of a data set, excluding the .raw extension.
Models Folder
Destination folder for trained models.
Model Base Name
The base name of a trained model, excluding the .onnx extension.
States Folder
Destination folder where the training node keeps the training state.
State Base Name
The base name of a training state, excluding any extensions. This is the state from which the training node will resume.
Logs Folder
Folder that contains one or more training logs.
Log Base Name
The base name of a training log, excluding the .txt extension.
Use CPU Exclusively
When on, the entire training runs on the CPU, without using the GPU. This is not recommended, as it is very slow. This option exists for debugging purposes.
Environment Path
The path to the Python virtual environment in which the internal training script of this node is run.
TOP Scheduler Override
This parameter overrides the TOP scheduler for this node.
Schedule When
When enabled, this parameter can be used to specify an expression that determines which work items from the node should be scheduled. If the expression returns zero for a given work item, that work item will immediately be marked as cooked instead of being queued with a scheduler. If the expression returns a non-zero value, the work item is scheduled normally.
Work Item Label
Determines how the node should label its work items. This parameter allows you to assign non-unique label strings to your work items which are then used to identify the work items in the attribute panel, task bar, and scheduler job names.
Use Default Label
The work items in this node will use the default label from the TOP network, or have no label if the default is unset.
Inherit From Upstream Item
The work items inherit their labels from their parent work items.
Custom Expression
The work item label is set to the Label Expression custom expression which is evaluated for each item.
Node Defines Label
The work item label is defined in the node’s internal logic.
Label Expression
When on, this parameter specifies a custom label for work items created by this node. The parameter can be an expression that includes references to work item attributes or built-in properties. For example, $OS: @pdg_frame will set the label of each work item based on its frame value.
Work Item Priority
This parameter determines how the current scheduler prioritizes the work items in this node.
Inherit From Upstream Item
The work items inherit their priority from their parent items. If a work item has no parent, its priority is set to 0.
Custom Expression
The work item priority is set to the value of Priority Expression.
Node Defines Priority
The work item priority is set based on the node’s own internal priority calculations.
This option is only available on the Python Processor TOP, ROP Fetch TOP, and ROP Output TOP nodes. These nodes define their own prioritization schemes that are implemented in their node logic.
Priority Expression
This parameter specifies an expression for work item priority. The expression is evaluated for each work item in the node.
This parameter is only available when Work Item Priority is set to Custom Expression.