Open CL Settings

User Avatar
Member
43 posts
Joined: April 2012
Offline
I get 1.2 fps on CPU and 3.8 fps with OpenCL on the default scene settings. I'm on

i5 450m
ati 5650m

Btw, that looks too low for a GPU. Something wrong with the drivers? Unfortunately I have no chance to change the PC now. GPU Caps Viewer sees my OpenCL devices, and it says I have the CAL 1.4.1703 (VM) driver installed. Any suggestions?
User Avatar
Staff
5202 posts
Joined: July 2005
Offline
ATI's older hardware is not as efficient for compute as the newer 7000 series and Nvidia's hardware. It uses a VLIW5 design, in which 5 shaders make up 1 compute unit. If the compiler can schedule a scalar and a vec4 operation at once, the unit runs at 100% capacity. But if it can only schedule a single vec3 operation, or worse, a lone scalar, it performs under its rated capacity.

In contrast, Nvidia's hardware and the AMD 7000 series use a SIMD approach, which allows for better utilization of shaders when scalars and smaller vectors are used.

So, in short, it's a limitation of the hardware. However, a 3x improvement isn't bad.
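The packing argument above can be sketched as a toy model: treat each operation as a lane width (scalar = 1, vec3 = 3, vec4 = 4), greedily pack them into 5-wide bundles, and measure what fraction of lanes do useful work. This is an illustrative simplification, not ATI's actual compiler (real VLIW scheduling also has dependency and port constraints):

```python
def vliw5_utilization(op_widths, lanes=5):
    """Greedy first-fit packing of ops (by lane width) into 5-wide VLIW bundles.

    Returns the fraction of shader lanes doing useful work.
    """
    bundles = []  # lanes already occupied in each issued bundle
    for width in op_widths:
        for i, used in enumerate(bundles):
            if used + width <= lanes:
                bundles[i] += width  # co-issue into an existing bundle
                break
        else:
            bundles.append(width)  # needs a fresh bundle
    return sum(op_widths) / (lanes * len(bundles))

print(vliw5_utilization([1, 4]))  # scalar + vec4 co-issued: 1.0 (100%)
print(vliw5_utilization([3]))     # lone vec3: 0.6 (60%)
```

In this toy model a stream of lone vec3s never gets above 60% of the rated throughput, which is the gap the post describes.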
User Avatar
Member
334 posts
Joined: July 2007
Offline
When I set up a test scene at 300x300x300 resolution, Houdini uses 1.4 GB of RAM. If I buy a Tesla with 6 GB, will I be able to sim as much as Houdini can on the CPU with 6 GB of RAM? Because on the CPU with 6 GB of RAM I can push a lot more than 300^3 resolution.
www.gimpville.no
User Avatar
Member
68 posts
Joined: Oct. 2011
Offline
SideFX staff, thanks for sharing.

I wanted to share my results from a humble PC: a 2600K @ 4.4 GHz with a GTX 550 Ti 2GB.
128^3: 5.0 fps (GPU) vs 1.2 fps (CPU), 623 MB, 45% GPU load
256^3: 1.2 fps (GPU) vs 8 fpm (CPU), 1200 MB, 60-90%
Beyond that, up to 304^3, is where I think the GPU's speed starts to matter: 48-30 fpm, with similar 1500-1700 MB memory usage and 60-97% GPU usage.
320^3 errors out with lack of memory:
OpenCL Context error: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_NDRANGE_KERNEL on GeForce GTX 550 Ti (Device 0).
OpenCL Exception: clEnqueueNDRangeKernel (-4)
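For what it's worth, the -4 in that exception is the OpenCL status code for CL_MEM_OBJECT_ALLOCATION_FAILURE, i.e. the driver couldn't allocate the buffers the kernel needed. A small lookup (only a subset of the codes from cl.h) makes such logs easier to read:

```python
# A few OpenCL status codes from cl.h (subset, for decoding driver logs).
CL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}

def cl_error_name(code):
    """Translate a numeric OpenCL status code into its cl.h name."""
    return CL_ERRORS.get(code, "unknown status (%d)" % code)

print(cl_error_name(-4))  # CL_MEM_OBJECT_ALLOCATION_FAILURE
```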


I haven't really worked with the GPU before, so I find it really, really fast. I'm sure there's a way to write the sim from the GPU out to a file (I'll test it after this post). That would be awesome.

Edit: so rendering out using the GPU works the same.
User Avatar
Member
30 posts
Joined: July 2009
Offline
Win7, H12.1.125
i7 3770K @ 4.3 GHz
GTX 680 4GB, not OC'd

395^3: GPU 31 fpm, CPU 1.4 fpm

I'm pretty happy with that!
User Avatar
Member
172 posts
Joined: March 2012
Offline
WIN 7
HFX 12.5.408
GTX titan
2 * XEON 3.07ghz hexacore
48 GB ram

356^3: 800-980 ms
256^3: 480-500 ms
User Avatar
Member
2 posts
Joined: April 2013
Offline
100 frames, 200^3

GTX 780: 1 m 4 sec

CPU (Q6600): 14 m 15 sec



The GPU is 13 times faster. Houdini, I love you!
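That quoted speedup checks out; converting the per-100-frame times above to seconds:

```python
def to_seconds(minutes, seconds):
    """Convert a 'X m Y sec' timing into total seconds."""
    return minutes * 60 + seconds

gpu = to_seconds(1, 4)    # GTX 780: 1 m 4 sec
cpu = to_seconds(14, 15)  # Q6600: 14 m 15 sec
print(round(cpu / gpu, 1))  # 13.4
```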
User Avatar
Member
85 posts
Joined: Aug. 2010
Offline
OMG! This can't be right:

400^3 at 750 ms per frame!

Intel i7 920 @ 2.67 GHz
9GB Ram
64bit
Windows 7 Pro

Geforce GTX 680 4GB

This is crazy fast
Sam Swift-Glasman
Art Director
Five AI
User Avatar
Member
238 posts
Joined: Nov. 2013
Offline
380^3 at 500-600 ms
GTX Titan Black 6GB
Using 2.5 GB VRAM and giving OpenCL exceptions, but still simulating.
Weird…

Everything beyond 380 divisions won't work, despite the fact that 4 GB are left untouched.

Nevertheless I am impressed.
http://www.sekowfx.com [www.sekowfx.com]
User Avatar
Staff
823 posts
Joined: July 2006
Offline
sekow
Everything beyond 380 divisions won't work, despite the fact that 4 GB are left untouched.

Unfortunately the current Nvidia OpenCL drivers only allow addressing 32-bits of memory, meaning Houdini (and all OpenCL clients) can't use anything above 4GB on the larger Nvidia cards. We are in contact with Nvidia about the issue, but unfortunately have no news to report. This limitation does not exist under CUDA, so it's not a hardware issue.

The Intel CPU drivers are naturally 64-bit on 64-bit OS's, since they share the operating system memory space. Hence the very large sims possible with it.
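A rough way to see why grids blow past these limits: a dense float volume costs divisions^3 x 4 bytes per field, and a smoke sim carries many fields (density, the three velocity components, temperature, plus solver temporaries). The field count below is a guessed illustration, not Houdini's actual internal layout; it's chosen to roughly match the ~1.4 GB reported earlier in the thread for a 300^3 scene:

```python
def sim_vram_gib(divisions, n_fields=13, bytes_per_voxel=4):
    """Back-of-envelope VRAM estimate for a dense float smoke sim.

    n_fields=13 is an assumed, illustrative count of float volumes
    (density, vel.x/y/z, temperature, pressure, temporaries, ...);
    it is NOT Houdini's actual internal field layout.
    """
    return divisions ** 3 * n_fields * bytes_per_voxel / 2 ** 30

print(round(sim_vram_gib(300), 1))  # ~1.3 GiB, close to the 1.4 GB reported above
print(round(sim_vram_gib(380), 1))  # ~2.7 GiB, about where the Titan hit exceptions
```

Under this model memory grows with the cube of the division count, so each bump in resolution eats VRAM much faster than it looks.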

Although untested, it might also be possible to use above 4GB on AMD cards via an environment variable:
http://devgurus.amd.com/message/1286769 [devgurus.amd.com]
User Avatar
Member
238 posts
Joined: Nov. 2013
Offline
Then we should all hope this will be addressed in the future.
These speed gains are incredible.
http://www.sekowfx.com [www.sekowfx.com]
User Avatar
Member
48 posts
Joined: June 2011
Offline
Any word on the 32-bit issue being resolved with Nvidia? It seems like a crippling flaw, considering they already have 12 GB workstation cards on the market. I know they're somewhat renowned for dragging their heels on OpenCL support, since they have a vested interest in pushing CUDA instead, but it still seems a strange oversight for top-end kit.

Has anyone had any experience testing AMD cards with this test scene? Does that 64-bit override work in conjunction with Houdini sims?
They've recently brought a 16 GB card to market, which could be an appealing proposition if it can actually be made use of.

It would also be interesting just to know how AMD cards stack up against Nvidia in this regard… some sources suggest their OpenCL performance is considerably better than Nvidia's; others show them pretty much at parity.

I know most of the professional CG industry avoids AMD like the plague, but one could still be useful as a dedicated compute card if it happened to show a clear advantage :-)
User Avatar
Member
392 posts
Joined: Nov. 2008
Offline
As far as I know, Nvidia has yet to deliver OpenCL 1.2 support (the OpenCL 2.0 specification is already out). No ETA, as they don't bother to answer emails.

So their flagship K6000 is too expensive (read: useless) for any OpenCL task with this artificial driver limitation.

I haven't had the opportunity to test any AMD/ATI card with a high memory capacity yet, but it should be possible.
User Avatar
Member
48 posts
Joined: June 2011
Offline
Just wanted to add - I happened to try out the “Experimental OpenCL driver” for Intel CPUs using this test scene.

While it obviously isn't a patch on GPU-accelerated sims, I noticed a 2-3x speed increase on the exact same 512^3 sim on the exact same i7 3770K CPU by routing it via OpenCL. Rather remarkable, I'd say!


I'm tempted to get hold of a cheapish 4 GB AMD GPU to see whether that 64-bit enabling trick works. Am I right in thinking that, due to whatever addressing overheads, 32-bit tops out around the 2.5 GB mark? If so, a 4 GB card should be enough to assess whether there's a difference…
User Avatar
Member
8 posts
Joined: May 2013
Offline
GTX 970 4GB VRAM
i7 4790K @ 4.0 GHz
32GB RAM

400^3 div

GPU (OpenCL): 47 fpm
CPU: 3.2 fpm

Wow it's everything I imagined
User Avatar
Member
117 posts
Joined: Feb. 2015
Offline
Hi guys, I found this old thread.

I never really got any improvement from OpenCL before, but then I found this thread.

My work computer: dual Xeon, 8 cores each, 3.1 GHz
GeForce GTX 980 4GB

300x300x300 voxels, 40 frames:
GTX 980: 17 s (10x faster!)
2x 8-core 3.1 GHz CPUs: 177 s

I noticed that if I went larger than 300x300x300 it got a lot slower on the GPU; I think I was beginning to run out of memory. It will be interesting to test this on a Titan X card.

At home we have two Titan X cards with 12 GB and a 2x14-core CPU. I will run the test.

As I understand it, the sims only use one GPU; it would be great if they switched over to multiple GPU cards in the future.

Is it possible to assign a specific GPU? I was thinking I should, in theory, be able to start two Houdini sessions, one on each GPU.
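On picking a GPU per session: Houdini exposes environment variables for OpenCL device selection (HOUDINI_OCL_DEVICENUMBER to pick a device, and HOUDINI_OCL_VENDOR for the platform) — check `hconfig` on your build to confirm the exact names for your version. A hedged sketch of launching one session per card:

```python
import os
import subprocess  # used in the commented-out launch lines below

def ocl_env(device_number):
    """Environment for a Houdini session pinned to one OpenCL device.

    HOUDINI_OCL_DEVICENUMBER is assumed from the Houdini environment-variable
    docs; verify with `hconfig` on your build before relying on it.
    """
    env = dict(os.environ)
    env["HOUDINI_OCL_DEVICENUMBER"] = str(device_number)
    return env

# Hypothetical launch of two sessions, one per Titan X (scene names made up):
# subprocess.Popen(["houdini", "smoke_a.hip"], env=ocl_env(0))
# subprocess.Popen(["houdini", "smoke_b.hip"], env=ocl_env(1))
```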

It would also be great if we could have support for explosions on the GPU!
Has anyone else done similar tests?
/Andreas
User Avatar
Member
117 posts
Joined: Feb. 2015
Offline
sekow
380^3 at 500-600 ms
GTX Titan Black 6GB
Using 2.5 GB VRAM and giving OpenCL exceptions, but still simulating.
Weird…

Everything beyond 380 divisions won't work, despite the fact that 4 GB are left untouched.

Nevertheless I am impressed.

Is this something that has been addressed? 380 divisions is not that much.
/Andreas
User Avatar
Staff
5202 posts
Joined: July 2005
Offline
Andreas Öberg
sekow
380^3 at 500-600 ms
GTX Titan Black 6GB
Using 2.5 GB VRAM and giving OpenCL exceptions, but still simulating.
Weird…

Everything beyond 380 divisions won't work, despite the fact that 4 GB are left untouched.

Nevertheless I am impressed.

Is this something that has been addressed? 380 divisions is not that much.
/Andreas

Nvidia added 64-bit support to their OpenCL driver sometime in the summer. Before that they were limited to 32-bit addressing, so it wasn't possible to run sims >4GB (and often much less than that). A recent Nvidia driver should be able to use all 12GB of the card.
User Avatar
Member
117 posts
Joined: Feb. 2015
Offline
Great. I will try it at home!

/Andreas
User Avatar
Member
5 posts
Joined: May 2017
Offline
Sorry for bumping an old (but still VERY useful) thread. After all those optimizations, it has become apparent that the Volume Source is now the major point of slow-down. Here are a couple of screenshots:

[screenshots]

Perhaps there is another, more optimized way of sourcing volumes?
Edited by lamer3d - Feb. 16, 2018 09:08:08