Open CL Settings

User Avatar
Member
43 posts
Joined: April 2012
Offline
I get 1.2 fps on CPU and 3.8 fps with OpenCL on the default scene settings. I'm on

i5 450m
ati 5650m

Btw, that looks too low for a GPU. Something wrong with the drivers? Unfortunately I have no chance to change the PC now. GPU Caps Viewer sees my OpenCL devices, and it says I have the CAL 1.4.1703 (VM) driver installed. Any suggestions?
User Avatar
Staff
5202 posts
Joined: July 2005
Offline
ATI's older hardware is not as efficient for compute as the newer 7000 series and Nvidia's hardware. It uses a VLIW5 design, in which 5 shaders make up 1 compute unit. If the compiler can schedule a scalar and a vec4 operation at once, the unit runs at 100% capacity. But if it can only schedule a single vec3 operation, or worse, a lone scalar, it performs under its rated capacity.

In contrast, Nvidia's hardware and the AMD 7000 series use a SIMD approach, which allows for better utilization of shaders when scalars and smaller vectors are used.

So, in short, it's a limitation of the hardware. However, a 3x improvement isn't bad.
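The packing argument above can be sketched as a toy model: treat each operation as a lane width (scalar = 1, vec3 = 3, vec4 = 4), greedily pack them into 5-wide bundles, and measure what fraction of lanes do useful work. This is an illustrative simplification, not ATI's actual compiler (real VLIW scheduling also has dependency and port constraints):

```python
def vliw5_utilization(op_widths, lanes=5):
    """Greedy first-fit packing of ops (by lane width) into 5-wide VLIW bundles.

    Returns the fraction of shader lanes doing useful work.
    """
    bundles = []  # lanes already occupied in each issued bundle
    for width in op_widths:
        for i, used in enumerate(bundles):
            if used + width <= lanes:
                bundles[i] += width  # co-issue into an existing bundle
                break
        else:
            bundles.append(width)  # needs a fresh bundle
    return sum(op_widths) / (lanes * len(bundles))

print(vliw5_utilization([1, 4]))  # scalar + vec4 co-issued: 1.0 (100%)
print(vliw5_utilization([3]))     # lone vec3: 0.6 (60%)
```

In this toy model a stream of lone vec3s never gets above 60% of the rated throughput, which is the gap the post describes.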
User Avatar
Member
334 posts
Joined: July 2007
Offline
When I set up a test scene at 300x300x300 resolution, Houdini uses 1.4 GB of RAM. If I buy a Tesla with 6 GB, will I be able to sim as much as Houdini can on the CPU with 6 GB of RAM? Because on the CPU with 6 GB of RAM I can push a lot more than 300^3 resolution.
www.gimpville.no
User Avatar
Member
68 posts
Joined: Oct. 2011
Offline
SideFX staff, thanks for sharing.

I wanted to share my results from a humble PC: a 2600K @ 4.4 GHz with a GTX 550 Ti 2GB.
128^3: 5.0 fps (GPU) vs 1.2 fps (CPU), 623 MB, 45% GPU load
256^3: 1.2 fps (GPU) vs 8 fpm (CPU), 1200 MB, 60-90%
Beyond that, up to 304^3, is where I think the GPU's speed starts to matter: 48-30 fpm, with similar 1500-1700 MB memory usage and 60-97% GPU usage.
320^3 errors out with lack of memory:
OpenCL Context error: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_NDRANGE_KERNEL on GeForce GTX 550 Ti (Device 0).
OpenCL Exception: clEnqueueNDRangeKernel (-4)
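For what it's worth, the -4 in that exception is the OpenCL status code for CL_MEM_OBJECT_ALLOCATION_FAILURE, i.e. the driver couldn't allocate the buffers the kernel needed. A small lookup (only a subset of the codes from cl.h) makes such logs easier to read:

```python
# A few OpenCL status codes from cl.h (subset, for decoding driver logs).
CL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

def cl_error_name(code):
    """Translate a numeric OpenCL status code into its cl.h name."""
    return CL_ERRORS.get(code, "unknown status (%d)" % code)

print(cl_error_name(-4))  # CL_MEM_OBJECT_ALLOCATION_FAILURE
```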


I haven't really worked with the GPU before, so I find it really, really fast. I'm sure there's a way to write the sim from the GPU out to a file (I'll test it after this post). That would be awesome.

Edit: so rendering out using the GPU works the same.
User Avatar
Member
30 posts
Joined: July 2009
Offline
Win7, H12.1.125
i7 3770K @ 4.3 GHz
GTX 680 4GB, not OC'd

395^3: GPU 31 fpm, CPU 1.4 fpm

I'm pretty happy with that!
User Avatar
Member
172 posts
Joined: March 2012
Offline
WIN 7
HFX 12.5.408
GTX titan
2 * XEON 3.07ghz hexacore
48 GB ram

356^3: 800-980 ms
256^3: 480-500 ms
User Avatar
Member
2 posts
Joined: April 2013
Offline
100 frames, 200^3

GTX 780: 1 m 4 sec

CPU (Q6600): 14 m 15 sec



The GPU is 13 times faster. Houdini, I love you!
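That quoted speedup checks out; converting the per-100-frame times above to seconds:

```python
def to_seconds(minutes, seconds):
    """Convert a 'X m Y sec' timing into total seconds."""
    return minutes * 60 + seconds

gpu = to_seconds(1, 4)    # GTX 780: 1 m 4 sec
cpu = to_seconds(14, 15)  # Q6600: 14 m 15 sec
print(round(cpu / gpu, 1))  # 13.4
```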
User Avatar
Member
85 posts
Joined: Aug. 2010
Offline
OMG! This can't be right:

400^3 at 750 ms per frame!

Intel i7 920 @ 2.67 GHz
9GB Ram
64bit
Windows 7 Pro

Geforce GTX 680 4GB

This is crazy fast
Sam Swift-Glasman
Art Director
Five AI
User Avatar
Member
238 posts
Joined: Nov. 2013
Offline
380^3 at 500-600 ms
GTX Titan Black 6GB
Using 2.5 GB VRAM and giving OpenCL exceptions, but still simulating.
Weird…

Everything beyond 380 divisions won't work, despite the fact that 4 GB are left untouched.

Nevertheless I am impressed.
http://www.sekowfx.com [www.sekowfx.com]
User Avatar
Staff
823 posts
Joined: July 2006
Offline
sekow
Everything beyond 380 divisions won't work, despite the fact that 4 GB are left untouched.

Unfortunately the current Nvidia OpenCL drivers only allow addressing 32-bits of memory, meaning Houdini (and all OpenCL clients) can't use anything above 4GB on the larger Nvidia cards. We are in contact with Nvidia about the issue, but unfortunately have no news to report. This limitation does not exist under CUDA, so it's not a hardware issue.

The Intel CPU drivers are naturally 64-bit on 64-bit OS's, since they share the operating system memory space. Hence the very large sims possible with it.
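A rough way to see why grids blow past these limits: a dense float volume costs divisions^3 x 4 bytes per field, and a smoke sim carries many fields (density, the three velocity components, temperature, plus solver temporaries). The field count below is a guessed illustration, not Houdini's actual internal layout; it's chosen to roughly match the ~1.4 GB reported earlier in the thread for a 300^3 scene:

```python
def sim_vram_gib(divisions, n_fields=13, bytes_per_voxel=4):
    """Back-of-envelope VRAM estimate for a dense float smoke sim.

    n_fields=13 is an assumed, illustrative count of float volumes
    (density, vel.x/y/z, temperature, pressure, temporaries, ...);
    it is NOT Houdini's actual internal field layout.
    """
    return divisions ** 3 * n_fields * bytes_per_voxel / 2 ** 30

print(round(sim_vram_gib(300), 1))  # ~1.3 GiB, close to the 1.4 GB reported above
print(round(sim_vram_gib(380), 1))  # ~2.7 GiB, about where the Titan hit exceptions
```

Under this model memory grows with the cube of the division count, so each bump in resolution eats VRAM much faster than it looks.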

Although untested, it might also be possible to use above 4GB on AMD cards via an environment variable:
http://devgurus.amd.com/message/1286769 [devgurus.amd.com]
User Avatar
Member
238 posts
Joined: Nov. 2013
Offline
Then we should all hope this will be addressed in the future.
These speed gains are incredible.
http://www.sekowfx.com [www.sekowfx.com]
User Avatar
Member
48 posts
Joined: June 2011
Offline
Any word on the 32-bit issue being resolved with Nvidia? It seems like a crippling flaw, considering they already have 12 GB workstation cards on the market. I know they're somewhat renowned for dragging their heels on OpenCL support, since they have a vested interest in pushing CUDA instead, but it still seems a strange oversight for top-end kit.

Has anyone had any experience testing AMD cards with this test scene? Does that 64-bit override work in conjunction with Houdini sims?
They've recently brought a 16 GB card to market, which could be an appealing proposition if it can actually be made use of.

It would also be interesting just to know how AMD cards stack up against Nvidia in this regard… some sources suggest their OpenCL performance is considerably better than Nvidia's; others show them pretty much at parity.

I know most of the professional CG industry avoids AMD like the plague, but one could still be useful as a dedicated compute card if it happened to show a clear advantage :-)
User Avatar
Member
392 posts
Joined: Nov. 2008
Offline
As far as I know, Nvidia has yet to deliver OpenCL 1.2 support (the OpenCL 2.0 specification is already out). No ETA, as they don't bother to answer emails.

So their flagship K6000 is too expensive (read: useless) for any OpenCL task with this artificial driver limitation.

I haven't had the opportunity to test any AMD/ATI card with a high memory capacity yet, but it should be possible.
User Avatar
Member
48 posts
Joined: June 2011
Offline
Just wanted to add - I happened to try out the “Experimental OpenCL driver” for Intel CPUs using this test scene.

While it obviously isn't a patch on GPU-accelerated sims, I noticed a 2-3x speed increase on the exact same 512^3 sim on the exact same i7 3770K CPU by routing it via OpenCL. Rather remarkable, I'd say!


I'm tempted to get hold of a cheapish 4 GB AMD GPU to see whether that 64-bit enabling trick works. Am I right in thinking that, due to whatever addressing overheads, 32-bit tops out around the 2.5 GB mark? If so, a 4 GB card should be enough to assess whether there's a difference…
User Avatar
Member
8 posts
Joined: May 2013
Offline
GTX 970 4GB VRAM
i7 4790K @ 4.0 GHz
32GB RAM

400^3 div

GPU (OpenCL): 47 fpm
CPU: 3.2 fpm

Wow it's everything I imagined
User Avatar
Member
117 posts
Joined: Feb. 2015
Offline
Hi guys, I found this old thread.

I never really got any improvement from OpenCL before, but then I found this thread.

My work computer: dual Xeon, 8 cores each, 3.1 GHz
GeForce GTX 980 4GB

300x300x300 voxels, 40 frames:
GTX 980: 17 s (10x faster!)
2x 8-core 3.1 GHz CPUs: 177 s

I noticed that if I went larger than 300x300x300 it got a lot slower on the GPU; I think I was beginning to run out of memory. It will be interesting to test this on a Titan X card.

At home we have two Titan X cards with 12 GB and a 2x14-core CPU. I will run the test.

As I understand it, the sims only use one GPU; it would be great if they switched over to multiple GPU cards in the future.

Is it possible to assign a specific GPU? I was thinking I should, in theory, be able to start two Houdini sessions, one on each GPU.
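On picking a GPU per session: Houdini exposes environment variables for OpenCL device selection (HOUDINI_OCL_DEVICENUMBER to pick a device, and HOUDINI_OCL_VENDOR for the platform) — check `hconfig` on your build to confirm the exact names for your version. A hedged sketch of launching one session per card:

```python
import os
import subprocess  # used in the commented-out launch lines below

def ocl_env(device_number):
    """Environment for a Houdini session pinned to one OpenCL device.

    HOUDINI_OCL_DEVICENUMBER is assumed from the Houdini environment-variable
    docs; verify with `hconfig` on your build before relying on it.
    """
    env = dict(os.environ)
    env["HOUDINI_OCL_DEVICENUMBER"] = str(device_number)
    return env

# Hypothetical launch of two sessions, one per Titan X (scene names made up):
# subprocess.Popen(["houdini", "smoke_a.hip"], env=ocl_env(0))
# subprocess.Popen(["houdini", "smoke_b.hip"], env=ocl_env(1))
```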

It would also be great if we could have support for explosions on the GPU!
Has anyone else done similar tests?
/Andreas
User Avatar
Member
117 posts
Joined: Feb. 2015
Offline
sekow
380^3 at 500-600 ms
GTX Titan Black 6GB
Using 2.5 GB VRAM and giving OpenCL exceptions, but still simulating.
Weird…

Everything beyond 380 divisions won't work, despite the fact that 4 GB are left untouched.

Nevertheless I am impressed.

Is this something that has been addressed? 380 divisions is not that much.
/Andreas
User Avatar
Staff
5202 posts
Joined: July 2005
Offline
Andreas Öberg
sekow
380^3 at 500-600 ms
GTX Titan Black 6GB
Using 2.5 GB VRAM and giving OpenCL exceptions, but still simulating.
Weird…

Everything beyond 380 divisions won't work, despite the fact that 4 GB are left untouched.

Nevertheless I am impressed.

Is this something that has been addressed? 380 divisions is not that much.
/Andreas

Nvidia added 64-bit support to their OpenCL driver sometime in the summer. Before that they were limited to 32-bit addressing, so it wasn't possible to run sims >4GB (and often much less than that). A recent Nvidia driver should be able to use all 12GB of the card.
User Avatar
Member
117 posts
Joined: Feb. 2015
Offline
Great. I will try it at home!

/Andreas
User Avatar
Member
5 posts
Joined: May 2017
Offline
Sorry for bumping an old (but still VERY useful) thread. After all those optimizations, it has become apparent that the Volume Source is now the major point of slow-down. Here are a couple of screenshots:

[screenshots]

Perhaps there is another, more optimized way of sourcing volumes?
Edited by lamer3d - Feb. 16, 2018 09:08:08