OpenCL Memory Limits

Member
258 posts
I am curious why my card has 12 GB but OpenCL says the max allocation is 3072 MB. Is there a way to change that? This is my printout from About Houdini.

OpenCL Platform NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.1 CUDA 7.0.29
OpenCL Device Quadro M6000
OpenCL Type GPU
Device Version OpenCL 1.1 CUDA
Frequency 1114 MHz
Compute Units 24
Device Address Bits 32
Global Memory 12288 MB
Max Allocation 3072 MB
Global Cache 384 KB
Max Constant Args 9
Max Constant Size 64 KB
Local Mem Size 47 KB
2D Image Support 32768x32768
3D Image Support 4096x4096x4096
Member
4189 posts
Joined: June 2012
Two things: the addressable memory in your printout is 32-bit, which makes it a 4GB limit, and the latest drivers from Nvidia are ‘64-bit’ but have a bug where you can't address above 4GB.

We are waiting for Nvidia to fix the bug.

Note that AMD and CPU OpenCL can already address the full 64-bit address space.
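For anyone who wants to check these limits outside of Houdini, here's a minimal sketch using the standard OpenCL host API; it queries the same address-bits, global-memory and max-allocation values shown in the About Houdini printout. Error handling is omitted and it just grabs the first GPU on the first platform, so treat it as illustrative only.

/* query_limits.c -- compile with e.g.: gcc query_limits.c -lOpenCL */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_uint address_bits;
    cl_ulong global_mem, max_alloc;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    /* The 32- vs 64-bit distinction discussed above. */
    clGetDeviceInfo(device, CL_DEVICE_ADDRESS_BITS,
                    sizeof(address_bits), &address_bits, NULL);
    /* Total device memory vs. the largest single buffer allowed. */
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                    sizeof(global_mem), &global_mem, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                    sizeof(max_alloc), &max_alloc, NULL);

    printf("Address bits:   %u\n", address_bits);
    printf("Global memory:  %llu MB\n", (unsigned long long)(global_mem >> 20));
    printf("Max allocation: %llu MB\n", (unsigned long long)(max_alloc >> 20));
    return 0;
}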
Member
577 posts
Joined: Nov. 2005
I really hope Nvidia sees it as a bug, not a strategic decision to push their Tesla cards.
Staff
5212 posts
Joined: July 2005
The 350 driver introduced 64-bit addressing with OpenCL 1.2 support, but it's still failing to allocate more than 4GB. We've filed a bug with Nvidia on the issue. Getting closer, but not quite there yet.
Staff
478 posts
Joined: April 2014
I think the Max Allocation number describes the largest contiguous memory block that can be allocated for a single resource.
Staff
5212 posts
Joined: July 2005
Yep, that's what Max Allocation is. However, the 350 driver still fails to allocate more than 4GB total, split over several buffers, and that's the bug we filed.
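To make the distinction concrete, here's a hedged sketch of how one might probe the total (not per-buffer) limit: create a series of 512 MB buffers and force each one to be physically backed until an allocation fails. The function name probe_total_alloc and the chunk size are arbitrary illustration choices, it assumes a context and queue are already set up, and it uses clEnqueueFillBuffer, which requires OpenCL 1.2:

#include <stdio.h>
#include <CL/cl.h>

#define CHUNK (512UL * 1024 * 1024)   /* 512 MB per buffer */
#define MAX_CHUNKS 64

static size_t probe_total_alloc(cl_context ctx, cl_command_queue queue)
{
    cl_mem bufs[MAX_CHUNKS];
    cl_uchar pattern = 0;
    size_t total = 0;
    int n = 0;

    for (int i = 0; i < MAX_CHUNKS; i++) {
        cl_int err;
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, CHUNK, NULL, &err);
        if (err != CL_SUCCESS)
            break;
        /* Buffer creation can succeed lazily; filling the buffer forces the
           driver to actually back it with device memory. A driver with the
           bug described above should start failing past the 4GB mark. */
        err = clEnqueueFillBuffer(queue, buf, &pattern, 1, 0, CHUNK,
                                  0, NULL, NULL);
        if (err != CL_SUCCESS || clFinish(queue) != CL_SUCCESS) {
            clReleaseMemObject(buf);
            break;
        }
        bufs[n++] = buf;
        total += CHUNK;
    }
    printf("Allocated %zu MB total across %d buffers\n", total >> 20, n);
    while (n-- > 0)
        clReleaseMemObject(bufs[n]);
    return total;
}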
Member
4189 posts
Joined: June 2012
sanostol
I really hope Nvidia sees it as a bug, not a strategic decision to push their Tesla cards.

The Tesla would have only 32-bit OpenCL too, but should have 64-bit CUDA.
Staff
5212 posts
Joined: July 2005
sanostol
I really hope Nvidia sees it as a bug, not a strategic decision to push their Tesla cards.

For this generation, the new Teslas are actually different GPUs from the Maxwell-based Quadros, Titans and GeForces. The new Teslas use a GPU based on the Kepler design found in the GeForce 780, while the Maxwell architecture that the new Quadro M and GeForce 900 series are based on is quite different from Kepler. So Nvidia is actually segmenting the markets by hardware now, not just software. There's good reason, though, as Maxwell's FP64 capabilities are severely limited (1/32 the FP32 rate) and a lot of Tesla users require the extra precision, so they had to keep FP64 running well in the Tesla.

But I agree, I certainly hope this is not an artificial limitation in the Maxwell-based Quadros and GeForces. Given that CUDA can manage 12GB of VRAM, it does seem like more of an OpenCL bug.
Member
4189 posts
Joined: June 2012
As a side note on the SP-to-DP disparity: there was an interesting talk at GTC by the AMBER molecular dynamics team, where IIRC they compute in single precision and accumulate in double precision, comparing DP, DPFP, SPFP, SPXP, etc.

Video:
http://on-demand.gputechconf.com/gtc/2015/video/S5478.html

Slides:
http://on-demand.gputechconf.com/gtc/2015/presentation/S5226-Ross-Walker.pdf
Member
577 posts
Joined: Nov. 2005
If this gets fixed, a dream would come true.

twod
For this generation, the new Teslas are actually different GPUs from the Maxwell-based Quadros, Titans and GeForces. The new Teslas use a GPU based on the Kepler design found in the GeForce 780, while the Maxwell architecture that the new Quadro M and GeForce 900 series are based on is quite different from Kepler. So Nvidia is actually segmenting the markets by hardware now, not just software. There's good reason, though, as Maxwell's FP64 capabilities are severely limited (1/32 the FP32 rate) and a lot of Tesla users require the extra precision, so they had to keep FP64 running well in the Tesla.

But I agree, I certainly hope this is not an artificial limitation in the Maxwell-based Quadros and GeForces. Given that CUDA can manage 12GB of VRAM, it does seem like more of an OpenCL bug.
Staff
823 posts
Joined: July 2006
MartybNz
As a side note on the SP-to-DP disparity: there was an interesting talk at GTC by the AMBER molecular dynamics team, where IIRC they compute in single precision and accumulate in double precision, comparing DP, DPFP, SPFP, SPXP, etc.

FWIW we do the same thing. Most of the internal multigrid computations are single-precision, but if we're looking at total error to determine whether we can stop iterating, we use double precision for accumulation, dot product totals, etc.
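For readers unfamiliar with the pattern johner describes, here's a generic illustration in plain C (not Houdini's actual solver code; the function names are made up): the per-element arithmetic stays in single precision, where these cards are fast, while the running total is carried in double precision so the accumulated rounding error stays small.

#include <stdio.h>

/* Dot product with FP32 multiplies but an FP64 accumulator. */
static double dot_mixed(const float *a, const float *b, int n)
{
    double sum = 0.0;                      /* FP64 running total */
    for (int i = 0; i < n; i++)
        sum += (double)(a[i] * b[i]);      /* FP32 multiply, FP64 add */
    return sum;
}

/* The same computation done entirely in FP32, for comparison. */
static float dot_float(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

int main(void)
{
    enum { N = 1000000 };
    static float a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = 0.1f; b[i] = 1.0f; }
    /* The exact answer is ~100000; the all-float accumulator drifts by
       roughly 1% here, while the mixed version stays accurate. */
    printf("FP64 accumulator: %.4f\n", dot_mixed(a, b, N));
    printf("FP32 accumulator: %.4f\n", (double)dot_float(a, b, N));
    return 0;
}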
Member
4189 posts
Joined: June 2012
Great to know, johner. It's crazy when you first hear how bad the DP rate has gotten on these cards, and awesome that we've got people working around it!
Staff
823 posts
Joined: July 2006
Just wanted to point out that the new Nvidia 352.09 beta drivers seem to fix the 4GB OpenCL limitation! I got them for Linux here:

http://www.nvidia.com/download/driverResults.aspx/85057/en-us

I don't know the status under Windows, I'm afraid.

I ran a 200M voxel smoke sim last night that used 11GB on a K6000 and solved in under 4 seconds per frame.

We'd be very curious to hear experiences, successful or otherwise, if anyone has a chance to try these.