Using UT_ThreadedAlgorithm

The easiest way to multithread inside the HDK is with UT_ThreadedAlgorithm. This class maintains a thread pool to allow for efficient data-parallel execution. It also ties into Houdini's -j option, so it respects command-line requests for how much threading the user wants performed.

While you can set up and invoke the UT_ThreadedAlgorithm::run method yourself, this involves a lot of boilerplate code to marshal parameters in and out of your threaded procedures. To greatly simplify the setup, there is a series of THREADED_METHOD macros (THREADED_METHOD1, THREADED_METHOD2, and so on, numbered by parameter count) which can be used to make class members multithreaded.

Consider this class, FOO, which has a single-threaded method, bar, that we want to make multithreaded.

class FOO
{
public:
    void bar(int p1, float p2);
    int myLength;
    int *myData;
};
void
FOO::bar(int p1, float p2)
{
    int i;
    for (i = 0; i < myLength; i++)
    {
        myData[i] += p1 * p2;
    }
}

We want to divide bar's for loop across separate threads.

class FOO
{
public:
    THREADED_METHOD2(                   // Construct two parameter threaded method
                    FOO,                // Name of class
                    myLength > 100,     // Evaluated to see if we should multithread.
                    bar,                // Name of function
                    int, p1,            // An integer parameter named p1
                    float, p2)          // A float parameter named p2
    void barPartial(int p1, float p2, const UT_JobInfo &info);
    int myLength;
    int *myData;
};
void
FOO::barPartial(int p1, float p2, const UT_JobInfo &info)
{
    int         i, n;
    for (info.divideWork(myLength, i, n); i < n; i++)
    {
        myData[i] += p1 * p2;
    }
}

Callers of FOO::bar() will automatically trigger a multithreaded execution of FOO::barPartial(). The UT_JobInfo class allows each instance to find out how many threads are active and which thread it is, from which it can decide on its own load balancing approach. The divideWork() method makes it easy to assign an equal share of the work to each thread.
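To make the idea of an equal assignment concrete, here is a minimal sketch of how a range of work units could be split among jobs. The function divideWorkSketch is a hypothetical helper written in standard C++ for illustration only. It is not the HDK implementation of UT_JobInfo::divideWork() and its signature is an assumption.

```cpp
#include <algorithm>

// Hypothetical helper, NOT part of the HDK: computes the half-open range
// [start, end) of work units assigned to job `job` out of `numJobs` jobs.
// Assumes numJobs > 0 and 0 <= job < numJobs.
void
divideWorkSketch(int units, int job, int numJobs, int &start, int &end)
{
    // Every job gets `base` units; the first `extra` jobs get one more,
    // so the ranges are contiguous and differ in size by at most one.
    int base = units / numJobs;
    int extra = units % numJobs;

    start = job * base + std::min(job, extra);
    end = start + base + (job < extra ? 1 : 0);
}
```

With 10 units and 4 jobs, the jobs receive the ranges [0, 3), [3, 6), [6, 8) and [8, 10). A job whose range is empty simply skips its loop.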

Callers can avoid multithreading by calling FOO::barNoThread() to invoke only a single copy of FOO::barPartial. Likewise, if the condition given to the macro (here, myLength > 100) is false, no threading occurs. Providing a lower bound is often useful to keep threading overhead from dominating small workloads.
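The relationship between bar(), barNoThread() and barPartial() can be pictured with the following sketch in standard C++. It is a conceptual model only: FooSketch and JobInfoSketch are hypothetical names, the sketch spawns fresh std::thread objects rather than using a thread pool, and it makes no claim about what the THREADED_METHOD2 macro actually expands to.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for UT_JobInfo: which job this is, out of how many.
struct JobInfoSketch
{
    std::size_t job;
    std::size_t numJobs;
};

struct FooSketch
{
    std::vector<int> myData;

    // Processes only the slice of myData that belongs to info.job.
    void barPartial(int p1, float p2, const JobInfoSketch &info)
    {
        const std::size_t length = myData.size();
        const std::size_t start = length * info.job / info.numJobs;
        const std::size_t end = length * (info.job + 1) / info.numJobs;
        for (std::size_t i = start; i < end; i++)
            myData[i] += p1 * p2;
    }

    // A single job covering the whole array, run on the calling thread.
    void barNoThread(int p1, float p2)
    {
        barPartial(p1, p2, JobInfoSketch{0, 1});
    }

    // Threads only when the workload passes the lower bound.
    void bar(int p1, float p2)
    {
        if (!(myData.size() > 100))
        {
            barNoThread(p1, p2);
            return;
        }

        std::size_t numJobs = std::thread::hardware_concurrency();
        if (numJobs == 0)
            numJobs = 1;    // hardware_concurrency() may report 0

        std::vector<std::thread> workers;
        for (std::size_t job = 0; job < numJobs; job++)
        {
            // Parameters are copied into each job, as with the macros.
            workers.emplace_back([this, p1, p2, job, numJobs]()
            {
                barPartial(p1, p2, JobInfoSketch{job, numJobs});
            });
        }
        for (std::thread &worker : workers)
            worker.join();
    }
};
```

Because each job writes to a disjoint slice of myData, no locking is needed in this case.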

Note the placement of commas in the THREADED_METHOD2 macro. Each parameter type is followed by a comma, which separates it from the parameter name. Also note that the parameters are marshalled by value into the different threads, so large structures should be passed by reference or pointer.

Note that the return type is void. If you need to return some data, you can pass in a pointer to store the result. This usually requires locking. Consider this example using FOO again, but returning the sum of myData.

class FOO
{
public:
    THREADED_METHOD1(FOO, myLength > 100,
                    sum,
                    float *, result)
    void sumPartial(float *result, const UT_JobInfo &info);
    int myLength;
    int *myData;
};
void
FOO::sumPartial(float *result, const UT_JobInfo &info)
{
    int         i, n;
    float       total = 0;
    for (info.divideWork(myLength, i, n); i < n; i++)
    {
        total += myData[i];
    }
    {
        UT_AutoJobInfoLock a(info);
        *result += total;
    }
}

Here the UT_AutoJobInfoLock is used to create a lock that lasts for the scope of that variable. This sort of auto lock is useful because you don't have to worry about making sure it is released to avoid deadlocks. If you want to lock manually, you can use the UT_JobInfo::lock and UT_JobInfo::unlock methods. These are better than using your own UT_Lock because they become no-ops when the UT_ThreadedAlgorithm is run in single-threaded mode.
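The same pattern (accumulate a private total, then take a scoped lock only for the final update) can be sketched outside the HDK with the C++ standard library. In this sketch std::lock_guard plays the role of the scoped lock; sumWithLock is a hypothetical function, and the sketch assumes numThreads is greater than zero.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical analogue of FOO::sum() using only standard C++.
// Assumes numThreads > 0.
float
sumWithLock(const std::vector<int> &data, std::size_t numThreads)
{
    float result = 0;
    std::mutex resultLock;
    std::vector<std::thread> workers;
    const std::size_t length = data.size();

    for (std::size_t job = 0; job < numThreads; job++)
    {
        workers.emplace_back([&, job]()
        {
            // Each job sums its own contiguous slice without locking.
            const std::size_t start = length * job / numThreads;
            const std::size_t end = length * (job + 1) / numThreads;
            float total = 0;
            for (std::size_t i = start; i < end; i++)
                total += data[i];

            {
                // Held only for the shared update, released at scope exit.
                std::lock_guard<std::mutex> guard(resultLock);
                result += total;
            }
        });
    }
    for (std::thread &worker : workers)
        worker.join();
    return result;
}
```

Keeping the per-element work outside the locked region means each thread takes the lock once, so contention stays low regardless of the array length.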

An example of THREADED_METHOD in use can be found in SIM/SIM_GasAdd.C and SIM/SIM_GasAdd.h.

See Also: Thread Safety

Thread Local Storage

It is often useful to keep data in static variables. Static data is usually a bad thing in multithreaded applications because different threads might clobber the shared data. However, in practice, the static data is usually semantically independent between threads. If you don't want it shared between threads, you can turn it into thread local storage.

Different compilers have different native ways of defining thread local storage. If you only target one platform, the native mechanism might be the best approach. Houdini provides its own cross-platform thread local storage implementation via UT_ThreadSpecificValue.

int
cachelastval(int val)
{
    static UT_ThreadSpecificValue<int> thelastval;
    int result = thelastval.get();
    thelastval.get() = val;
    return result;
}

This function will return the previous value passed to it. The default of an int in UT_ThreadSpecificValue is 0, so the first call will return 0. It is thread safe, meaning that the behavior of one thread will not affect another.
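For comparison, the native route mentioned above might look like the following with the C++11 thread_local keyword. This is a sketch under the assumption that your compiler supports C++11; cacheLastValNative and firstCallInNewThread are hypothetical names and are not part of the HDK.

```cpp
#include <thread>

// Same behavior as cachelastval(), but using the C++11 thread_local
// keyword instead of UT_ThreadSpecificValue.
int
cacheLastValNative(int val)
{
    static thread_local int thelastval = 0; // one copy per thread
    int result = thelastval;
    thelastval = val;
    return result;
}

// Hypothetical demo helper: reports what a brand new thread sees on its
// first call. The new thread starts with its own zero-initialized copy.
int
firstCallInNewThread(int val)
{
    int seen = -1;
    std::thread worker([&seen, val]() { seen = cacheLastValNative(val); });
    worker.join();
    return seen;
}
```

Calling cacheLastValNative(5) and then cacheLastValNative(7) on one thread returns 0 and then 5, while firstCallInNewThread(9) returns 0 because the worker thread has never stored a value of its own.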

Creating Child Threads

The UT_Thread class provides lower level functionality for explicitly creating threads. It is recommended to use UT_ThreadedAlgorithm instead wherever possible.

Conceptually, when a thread is spawned, two tasks are created: one in the new child thread, and another in the parent thread. Any code that runs in these two new child tasks may then serialize (or lock) against each other for non-thread-safe resources. To make this possible, UT_TaskScope objects must be created in the child tasks before any further HDK code is run. This allows UT_TaskLock objects to be acquired in a deadlock-free manner. UT_Thread::startThread() will automatically do this for the child thread, but you must do this explicitly if you run any code that calls the HDK in the parent thread. UT_ThreadedAlgorithm already handles all of this internally, so nothing extra needs to be done when using UT_ThreadedAlgorithm.

Here is an example of manually creating threads.

#include <UT/UT_Lock.h>
#include <UT/UT_PtrArray.h>
#include <UT/UT_TaskScope.h>
#include <UT/UT_Thread.h>
// Use a UT_Lock to serialize code that is non-threadsafe. If the work you do
// inside it might result in the lock being reacquired in a child thread, then
// use a UT_TaskLock instead. If the code inside the lock might run into
// deadlocks, then use UT_AbortableRecursiveLock, or UT_AbortableTaskLock.
static UT_Lock  theMutex;
static void *
doTask(void *data)
{
    // ... do some thread-safe work in parallel
    {
        UT_Lock::Scope  lock(theMutex); // released when it goes out of scope
        // ... do some serialized work here
    }
    return NULL; // always return NULL
}
static void
runMultithreaded()
{
    int                      thread_count = UT_Thread::getNumProcessors() - 1;
    UT_PtrArray<UT_Thread *> threads;
    void *                   data = NULL;
    // Allocate and start up threads that run doTask()
    for (int i = 0; i < thread_count; i++)
    {
        threads.append(UT_Thread::allocThread(UT_Thread::ThreadLowUsage));
        if (!threads(i) || !threads(i)->startThread(&doTask, data))
            break;  // error, failed to create thread
    }
    // Now spawn a child task in the parent thread as well
    {
        UT_TaskScope task_scope(UT_TaskScope::getCurrent());
        (void) doTask(data);
    }
    // Wait for all threads to finish, deallocate them once done
    for (int i = 0; i < threads.entries(); i++)
    {
        if (!threads(i))
            continue;   // allocation failed above, nothing to wait on
        threads(i)->waitForState(UT_Thread::ThreadIdle);
        delete threads(i);
    }
}

See Also: Thread Safety