Taichi: A better GPU programming environment for COP?

User Avatar
Member
529 posts
Joined: Aug. 2019
Offline
For now, the new COP (Copernicus) has two programming languages for custom logic. The first is the old reliable Vex. Vex is what it is, but it doesn't run on the GPU. In other words, if you use Vex in Copernicus, you're basically downgrading it to the old COP2 performance. That's okay in some cases, but if the texture is 4K or more, then god bless you.

The other one is OpenCL (C). I know SideFX has made some efforts to mitigate its verbosity, but C is still C. Even the Houdini documentation [www.sidefx.com] says:

Its use of pointers means it is easy to write unsafe code that might crash your video card driver or Houdini. While developing you may find yourself in a bad state where all kernels error on compile - restarting Houdini may be necessary to restore the driver. Rarely, you may need to restart your machine.

And it's nothing like the rest of Houdini. When you talk about being a Houdini artist/TA, people won't expect you to know C.

But there are better ways. One of them is Taichi [github.com]. It allows you to utilize GPU power with super clean Python code:

# python/taichi/examples/simulation/fractal.py

import taichi as ti

ti.init(arch=ti.gpu)

n = 320
pixels = ti.field(dtype=float, shape=(n * 2, n))

@ti.func
def complex_sqr(z):
    return ti.Vector([z[0]**2 - z[1]**2, z[1] * z[0] * 2])

@ti.kernel
def paint(t: float):
    for i, j in pixels:  # Parallelized over all pixels
        c = ti.Vector([-0.8, ti.cos(t) * 0.2])
        z = ti.Vector([i / n - 1, j / n - 0.5]) * 2
        iterations = 0
        while z.norm() < 20 and iterations < 50:
            z = complex_sqr(z) + c
            iterations += 1
        pixels[i, j] = 1 - iterations * 0.02

gui = ti.GUI("Julia Set", res=(n * 2, n))

for i in range(1000000):
    paint(i * 0.03)
    gui.set_image(pixels)
    gui.show()



It's much easier, and more importantly, safer to write than OpenCL. And every Houdini user already knows Python. It's under the Apache 2 license, so SideFX could integrate it into Houdini without worrying about GPL issues.

What do you think?
Edited by raincole - Aug. 20, 2024 23:30:01

Attachments:
fractal_small.gif (2.2 MB)

User Avatar
Member
163 posts
Joined: May 2021
Offline
Definitely do not like OpenCL at all.

Taichi is cool, but it has only a few libraries. There are also JAX, Warp, PhiFlow, and MLX. JAX has by far the most libraries. I personally am working a lot with MLX on Apple Silicon, targeting not only the GPU but also the Digital Signal Processor and Neural Engine on the Mac.

https://jax.readthedocs.io/en/latest/quickstart.html [jax.readthedocs.io]

https://github.com/NVIDIA/warp [github.com]

https://github.com/tum-pbs/PhiFlow [github.com]

https://ml-explore.github.io/mlx/build/html/index.html [ml-explore.github.io]
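For a sense of what this array-style GPU programming looks like, here is a sketch of the same Julia set written with vectorized NumPy. This is my own toy sketch, not from any of the libraries above; `jax.numpy` mirrors the NumPy API closely, so essentially the same code (with `jax.numpy` and `jax.jit`) would run on the GPU.

```python
# Vectorized Julia-set iteration in NumPy. jax.numpy mirrors this API,
# so the same functional style (no in-place mutation) maps to JAX on GPU.
import numpy as np

def julia(n=320, c=-0.8 + 0.156j, max_iter=50):
    # Pixel grid, width 2n, matching the Taichi example's framing
    y, x = np.mgrid[0:n, 0:2 * n]
    z = ((x / n - 1) + 1j * (y / n - 0.5)) * 2
    iterations = np.zeros(z.shape, dtype=np.int32)
    for _ in range(max_iter):
        mask = np.abs(z) < 20              # only non-escaped pixels advance
        z = np.where(mask, z * z + c, z)   # escaped pixels stay frozen
        iterations += mask.astype(np.int32)
    return 1.0 - iterations * 0.02         # same shading as the Taichi kernel

img = julia()
```

The whole image is computed as one array expression per step, which is exactly the shape of program these frameworks compile to GPU kernels.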

I did have the thought the other day about an alternative reality where Vex had a compiled GPU kernel LLVM backend.

I think Julia is the best programming language right now: it stays so close to the underlying math that it works well with LLMs, since there is less technical verbosity. It is a JIT-compiled language with a C API. It is very performant, has some of the most advanced math libraries, and is meant to be portable from CPU and GPU (or any other hardware accelerator) to an HPC cluster. Think about programming with concise, powerful mathematical expressions that LLMs don't have to train on specifically, because it is more math than new, language-specific lingo.

It is a bit more verbose but can be abstracted further. I like a lot of docstrings and context with my code, but here is a concise approach to a Julia set in Julia. Again, I am pair-programming with the JuliaHub AskAI feature, so it is a bit different.

https://info.juliahub.com/blog/ask-ai-chat-gpt-juliahub [info.juliahub.com]

# JULIA SET IN JULIA
# Julia set GPU kernel using CUDA.jl for better performance
using CUDA

function kernel_func(img, w, h, max_iter, c1, c2, zoom, move_x, move_y)
    tid = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    x = (tid - 1) % w + 1        # 1-based pixel coordinates
    y = (tid - 1) ÷ w + 1

    if x <= w && y <= h
        zx, zy = (x / w - 1) * zoom + move_x, (y / h - 0.5) * zoom + move_y
        iter = 0
        while zx * zx + zy * zy < 20.0f0 && iter < max_iter
            zx_new = zx * zx - zy * zy + c1
            zy = 2.0f0 * zx * zy + c2
            zx, iter = zx_new, iter + 1
        end
        img[x, y] = 1.0f0 - iter * 0.02f0
    end
    return nothing
end

# Launch one thread per pixel
function julia_kernel!(img, w, h, max_iter, c1, c2, zoom, move_x, move_y)
    @cuda threads=256 blocks=cld(w * h, 256) kernel_func(img, w, h, max_iter, c1, c2, zoom, move_x, move_y)
end

# Function to generate and update the Julia set dynamically
function generate_julia_set(; w=640, h=480, max_iter=50, c=-0.8 + 0.156im, zoom=2.0, move_x=0.0, move_y=0.0)
    img = CUDA.zeros(Float32, w, h)
    julia_kernel!(img, w, h, max_iter, real(c), imag(c), zoom, move_x, move_y)
    return Array(img)
end
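One detail worth calling out: CUDA.jl thread and block indices are 1-based, which makes the flattened thread-id to (x, y) mapping easy to get wrong by one. The intended mapping can be sanity-checked in plain Python (a standalone sketch, no GPU required):

```python
def tid_to_xy(tid, w):
    """Map a 1-based flat thread id to 1-based (x, y) pixel coordinates."""
    x = (tid - 1) % w + 1
    y = (tid - 1) // w + 1
    return x, y

# The first w threads cover row 1, the next w threads cover row 2, and so on.
print(tid_to_xy(1, 640))    # → (1, 1)
print(tid_to_xy(641, 640))  # → (1, 2)
```

Subtracting 1 before the modulo/division and adding it back afterwards keeps the arithmetic in 0-based form, where the row-major mapping is natural.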



Edited by PHENOMDESIGN - Aug. 21, 2024 02:38:09
Brave, Big-hearted Rebel
------
Design the Future, Now.
PHENOM DESIGN
------
Processing smarter, not harder on a M3 MAX 36GB
User Avatar
Member
285 posts
Joined: Jan. 2013
Offline
As for Julia, by the way: I've seen attempts to do something like this before, but it still doesn't look like a very easy or convenient way for an artist to work.

https://github.com/pedohorse/yuria [github.com]
https://www.patreon.com/posts/julia-61740179 [www.patreon.com]
User Avatar
Member
163 posts
Joined: May 2021
Offline
Very true. That example was a more traditional GPU kernel instead of a semantic program. It can be easy; you just have to write it like so. I would have made it more semantic with docstrings, but I wanted to keep it closer to the original.

Here is an Ocean Simulation:

using Oceananigans
grid = RectilinearGrid(CPU(), size=(128, 128), x=(0, 2π), y=(0, 2π), topology=(Periodic, Periodic, Flat))
model = NonhydrostaticModel(; grid, advection=WENO())
ϵ(x, y) = 2rand() - 1
set!(model, u=ϵ, v=ϵ)
simulation = Simulation(model; Δt=0.01, stop_time=4)
run!(simulation)

Here is an image dither:

using DitherPunk
using Images

img = load("image.png")           # load an input image first

d = dither(img)                   # apply default algorithm: FloydSteinberg()
d = dither(img, Bayer())          # apply algorithm of choice

dither!(img)                      # or modify the image in place
dither!(img, Bayer())             # with the algorithm of your choice
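The `Bayer()` option above is ordered dithering: each pixel is compared against a tiled threshold matrix. The core idea can be sketched in a few lines of NumPy (a toy sketch of the technique, not DitherPunk's actual implementation):

```python
import numpy as np

def bayer_dither(gray):
    """Ordered (2x2 Bayer) dithering of a grayscale image with values in [0, 1]."""
    bayer2 = np.array([[0, 2], [3, 1]], dtype=np.float64)
    threshold = (bayer2 + 0.5) / 4.0            # per-pixel thresholds in (0, 1)
    h, w = gray.shape
    # Tile the 2x2 threshold matrix over the whole image, then crop
    tiled = np.tile(threshold, (h // 2 + 1, w // 2 + 1))[:h, :w]
    return (gray > tiled).astype(np.float64)    # binary (0/1) output

# A horizontal gradient dithers into increasingly dense white pixels
gradient = np.tile(np.linspace(0, 1, 8), (4, 1))
out = bayer_dither(gradient)
```

Because the threshold pattern is fixed, ordered dithering parallelizes per pixel with no neighbor dependencies, which is also why it maps so well to GPU kernels, unlike error-diffusion methods such as Floyd-Steinberg.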

You can build language-level program abstractions. I like to take technical animation papers and translate the equations into graphics programs. Julia also allows me to write specialized programs for the Mac:

https://github.com/JuliaGPU/Metal.jl [github.com]
https://github.com/JuliaLinearAlgebra/AppleAccelerate.jl [github.com]

using Metal

function vadd(a, b, c)
    i = thread_position_in_grid_1d()
    c[i] = a[i] + b[i]
    return
end

dims = (3,4)
a = round.(rand(Float32, dims) * 100)
b = round.(rand(Float32, dims) * 100)
c = similar(a)

d_a = MtlArray(a)
d_b = MtlArray(b)
d_c = MtlArray(c)

len = prod(dims)
@metal threads=len vadd(d_a, d_b, d_c)
c = Array(d_c)

But you can wrap anything with Julia.

Here are some more artist-friendly libraries:




User Avatar
Member
529 posts
Joined: Aug. 2019
Offline
As far as I know, Warp is Nvidia-only, so I don't think it's a good idea for Houdini to use it as the default way to utilize the GPU.

For the other options, I'm not familiar enough with them to have an opinion.
User Avatar
Member
163 posts
Joined: May 2021
Offline
You are correct, and I share the sentiment. I was very impressed by the library (Miles Macklin https://scholar.google.com/citations?user=V9EUwCEAAAAJ&hl=en [scholar.google.com] is the main developer, and his research is amazing!), but for the GPU it only generates CUDA kernels, though you can select a CPU-only option. The point there would be to use Python to write direct CUDA kernels, which is pretty cool if you are on NVIDIA. And there is also the USD rendering integration too.

It looks like Taichi is CUDA- and Vulkan-specific, so it would not work as well.

I think every platform has a specific Python kernel language now, such as Apple with MLX and Metal, or Google with JAX and TPUs (JAX can be used elsewhere too).
Edited by PHENOMDESIGN - Aug. 22, 2024 10:51:36