I get 1.2 fps on CPU and 3.8 fps with OpenCL on default scene settings. I'm on:
i5 450M
ATI 5650M
Btw, that looks too low for the GPU. Is something wrong with my drivers? Unfortunately I have no chance to change PCs right now. GPU Caps Viewer sees my OpenCL devices, and it says I have the CAL 1.4.1703 (VM) driver installed. Any suggestions?
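As a first check, here is a minimal sketch (assuming the pyopencl package, pip install pyopencl) that lists the OpenCL devices and driver versions applications can see:

```python
# List every OpenCL platform and device the driver exposes, with driver versions.
import pyopencl as cl

for platform in cl.get_platforms():
    print(platform.name, platform.version)
    for dev in platform.get_devices():
        print("  %s (driver %s)" % (dev.name, dev.driver_version))
```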
OpenCL Settings
- miklem
- Member
- 43 posts
- Joined: April 2012
- Offline
- malexander
- Staff
- 5212 posts
- Joined: July 2005
- Offline
ATI's older hardware is not as efficient for compute as the newer 7000 series or Nvidia's hardware. It uses a VLIW5 design, in which 5 shaders make up 1 compute unit. If the compiler can schedule a scalar and a vec4 operation at once, the unit runs at 100% capacity; but if it can only schedule a single vec3 operation, or worse, a lone scalar, it performs below its rated capacity.
In contrast, Nvidia's hardware and the AMD 7000 series use a SIMD approach, which allows for better utilization of shaders when scalars and smaller vectors are used.
So, in short, it's a limitation of the hardware. However, a 3x improvement isn't bad.
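To make the lane-scheduling arithmetic concrete, here's a rough sketch (illustrative numbers only, not measurements):

```python
# Back-of-the-envelope VLIW5 occupancy: each compute unit has 5 ALU lanes,
# and utilization is the fraction of lanes the compiler fills per cycle.
# Illustrative only; real instruction scheduling is more involved.
def vliw5_utilization(filled_lanes, total_lanes=5):
    return filled_lanes / float(total_lanes)

print(vliw5_utilization(5))  # vec4 + scalar co-issued -> 1.0 (100%)
print(vliw5_utilization(3))  # a lone vec3 op          -> 0.6 (60%)
print(vliw5_utilization(1))  # a lone scalar op        -> 0.2 (20%)
```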
- Korhon
- Member
- 334 posts
- Joined: July 2007
- Offline
- icerust
- Member
- 68 posts
- Joined: Oct. 2011
- Offline
SideFX staff, thanks for sharing.
I wanted to share my results from a humble PC: a 2600K @ 4.4 GHz with a GTX 550 Ti 2 GB.
128^3: 5.0 fps (GPU) vs 1.2 fps (CPU), 623 MB, 45% GPU load
256^3: 1.2 fps (GPU) vs 8 fpm (CPU), 1200 MB, 60-90% GPU load
Beyond there, up to 304^3, I think GPU speed starts to matter: 48-30 fpm, with similar 1500-1700 MB memory usage and 60-97% GPU load.
320^3 errors out with lack of memory:
OpenCL Context error: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_NDRANGE_KERNEL on GeForce GTX 550 Ti (Device 0).
OpenCL Exception: clEnqueueNDRangeKernel (-4)
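For anyone hitting the same error, a minimal sketch (assuming pyopencl is installed) that prints the limits behind CL_MEM_OBJECT_ALLOCATION_FAILURE:

```python
# Print each device's total global memory and the largest single buffer the
# driver will allocate; a dense 320^3 float field can blow past the latter
# even when total memory looks sufficient.
import pyopencl as cl

for platform in cl.get_platforms():
    for dev in platform.get_devices():
        print(dev.name)
        print("  global mem: %d MB" % (dev.global_mem_size // 2**20))
        print("  max alloc:  %d MB" % (dev.max_mem_alloc_size // 2**20))
```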
I haven't really worked with the GPU before, so I find it really, really fast. I'm sure there's a way to write the sim from the GPU out to file (I'll test it after this post). That would be awesome.
Edit: rendering out using the GPU is the same.
- craigthailand
- Member
- 30 posts
- Joined: July 2009
- Offline
- sami.tawil
- Member
- 172 posts
- Joined: March 2012
- Offline
- karoly
- Member
- 2 posts
- Joined: April 2013
- Offline
- megasets
- Member
- 85 posts
- Joined: Aug. 2010
- Offline
- sekow
- Member
- 238 posts
- Joined: Nov. 2013
- Offline
- johner
- Staff
- 823 posts
- Joined: July 2006
- Offline
sekow
Everything beyond 380 divisions won't work, despite the fact that 4 GB are left untouched.
Unfortunately the current Nvidia OpenCL drivers only allow 32-bit memory addressing, meaning Houdini (and all OpenCL clients) can't use anything above 4 GB on the larger Nvidia cards. We are in contact with Nvidia about the issue, but unfortunately have no news to report. This limitation does not exist under CUDA, so it's not a hardware issue.
The Intel CPU drivers are naturally 64-bit on 64-bit OSes, since they share the operating system's memory space; hence the very large sims possible with them.
Although untested, it might also be possible to use more than 4 GB on AMD cards via an environment variable:
http://devgurus.amd.com/message/1286769 [devgurus.amd.com]
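For reference, a minimal sketch to check the addressing width a driver reports, again assuming pyopencl; the GPU_FORCE_64BIT_PTR variable is the one discussed in the linked AMD thread and is untested here:

```python
# Report the pointer width and global memory each OpenCL device exposes.
# Older Nvidia drivers reported 32-bit addressing here, capping sims at 4 GB.
import os

# Untested: the AMD thread linked above suggests this enables 64-bit
# addressing on AMD cards. Must be set before the OpenCL context is created.
os.environ["GPU_FORCE_64BIT_PTR"] = "1"

import pyopencl as cl

for platform in cl.get_platforms():
    for dev in platform.get_devices():
        print("%s: %d-bit addressing, %d MB global memory"
              % (dev.name, dev.address_bits, dev.global_mem_size // 2**20))
```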
- sekow
- Member
- 238 posts
- Joined: Nov. 2013
- Offline
- danwood82
- Member
- 48 posts
- Joined: June 2011
- Offline
Any word on the 32-bit issue being resolved with Nvidia? It seems like a crippling flaw considering they already have 12 GB workstation cards on the market. I know they're somewhat renowned for dragging their heels on OpenCL support, since they have a vested interest in pushing CUDA instead, but it still seems a strange oversight for top-end kit.
Has anyone had any experience testing AMD cards with this test scene? Does that 64-bit override work in conjunction with Houdini sims?
They've recently brought a 16 GB card to market, which could be an appealing proposition if it can actually be made use of.
It would also be interesting just to know how AMD cards stack up against Nvidia in this regard… some sources suggest their OpenCL performance is considerably better than Nvidia's, while others show them pretty much at parity.
I know most of the professional CG industry avoids AMD like the plague, but they could still be useful as dedicated compute cards if they showed a clear advantage :-)
- pezetko
- Member
- 392 posts
- Joined: Nov. 2008
- Offline
As far as I know, Nvidia has yet to deliver OpenCL 1.2 support (and the OpenCL 2.0 specification is already out). No ETA, as they don't bother to answer emails.
So their flagship K6000 is too expensive (useless) for any OpenCL task with this artificial driver limitation.
I haven't had the opportunity to test any AMD/ATI card with a high memory capacity yet, but it should be possible.
- danwood82
- Member
- 48 posts
- Joined: June 2011
- Offline
Just wanted to add: I happened to try out the “Experimental OpenCL driver” for Intel CPUs using this test scene.
While it obviously isn't a patch on GPU-accelerated sims, I noticed a 2-3x speed increase running the exact same 512^3 sim on the exact same i7 3770K CPU by routing it via OpenCL. Rather remarkable, I'd say!
I'm tempted to get hold of a cheapish 4 GB AMD GPU to see whether that 64-bit enabling trick works. Am I right in thinking that, due to addressing overheads, 32-bit tops out around the 2.5 GB mark? If so, a 4 GB card should be enough to tell whether there's a difference…
- Dain Lewis
- Member
- 8 posts
- Joined: May 2013
- Offline
- AndreasOberg
- Member
- 117 posts
- Joined: Feb. 2015
- Offline
Hi guys, I found this old thread. I never really got any improvement from OpenCL before.
My work computer: dual 8-core Xeon @ 3.1 GHz, GeForce GTX 980 4 GB.
300x300x300 voxels, 40 frames:
GTX 980: 17 s (10x faster!)
2x 8-core @ 3.1 GHz: 177 s
I noticed that if I went larger than 300x300x300 it got a lot slower on the GPU; I think I'm beginning to run out of memory. It will be interesting to test this on a Titan X card.
At home we have two Titan X cards with 12 GB and a 2x14-core CPU. I will run the test.
As I understand it, only one GPU is used; it would be great if they switched over to multiple GPU cards in the future.
Is it possible to assign a specific GPU? In theory I should be able to start two Houdini sessions, one on each GPU (see the sketch after this post).
It would also be great if we could have support for explosions on the GPU!
Has anyone else done similar tests?
/Andreas
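A minimal sketch of the two-sessions idea above, assuming Houdini's HOUDINI_OCL_DEVICENUMBER environment variable selects the OpenCL device (verify the variable name against your Houdini version's docs; the scene file names are placeholders):

```python
# Launch one Houdini session per GPU, each pinned to its own OpenCL device
# via an environment variable. HOUDINI_OCL_DEVICENUMBER is an assumption
# here; check your Houdini version's documentation for the exact name.
import os
import subprocess

for gpu in (0, 1):
    env = os.environ.copy()
    env["HOUDINI_OCL_DEVICENUMBER"] = str(gpu)  # pick the Nth OpenCL device
    subprocess.Popen(["houdini", "sim_gpu%d.hip" % gpu], env=env)
```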
- AndreasOberg
- Member
- 117 posts
- Joined: Feb. 2015
- Offline
sekow
380 at 500-600 ms
GTX Titan Black 6 GB
using 2.5 GB VRAM and giving OpenCL exceptions, but still simulating.
Weird…
Everything beyond 380 divisions won't work, despite the fact that 4 GB are left untouched.
Nevertheless I am impressed.
Is this something that has been addressed? 380 divisions is not that much.
/Andreas
- malexander
- Staff
- 5212 posts
- Joined: July 2005
- Offline
Andreas Öberg
Is this something that has been addressed? 380 divisions is not that much.
Nvidia added 64-bit support to their OpenCL driver sometime in the summer. Before that it was limited to 32-bit addressing, so it wasn't possible to run sims over 4 GB (and often much less than that). A recent Nvidia driver should be able to use all 12 GB of the card.
- AndreasOberg
- Member
- 117 posts
- Joined: Feb. 2015
- Offline
- lamer3d
- Member
- 5 posts
- Joined: May 2017
- Offline
Sorry for bumping an old (but still VERY useful) thread. After all those optimizations, it has become apparent that the Volume Source is now the major point of slowdown. Here are a couple of screenshots:
Perhaps there is another, more optimized way of sourcing volumes?
Edited by lamer3d - Feb. 16, 2018 09:08:08