I am curious why my card has 12 GB of memory but OpenCL says the max allocation is 3072 MB. Is there a way to change that? This is my printout from About Houdini.
OpenCL Platform NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.1 CUDA 7.0.29
OpenCL Device Quadro M6000
OpenCL Type GPU
Device Version OpenCL 1.1 CUDA
Frequency 1114 MHz
Compute Units 24
Device Address Bits 32
Global Memory 12288 MB
Max Allocation 3072 MB
Global Cache 384 KB
Max Constant Args 9
Max Constant Size 64 KB
Local Mem Size 47 KB
2D Image Support 32768x32768
3D Image Support 4096x4096x4096
Open CL Memory Limits
- sl0throp
- Member
- 258 posts
- Joined:
- Offline
- anon_user_37409885
- Member
- 4189 posts
- Joined: June 2012
- Offline
Two things: the device address space in your printout is 32-bit, which makes for a 4 GB limit, and the latest drivers from Nvidia are ‘64-bit’ but have a bug where you can't address above 4 GB.
We are waiting for Nvidia to fix the bug.
Note that AMD or CPU OpenCL can already address the full 64-bit address space.
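To put numbers on the printout above, here is a rough sketch of the arithmetic. It assumes that "Max Allocation" is the per-buffer limit (CL_DEVICE_MAX_MEM_ALLOC_SIZE) and that the driver is reporting the minimum OpenCL 1.1 requires, which as I read the spec is the larger of 1/4 of global memory and 128 MB. The function names are made up for illustration.

```python
MIB = 1024 ** 2  # bytes per MiB

def addressable_mib(address_bits):
    """Total memory reachable with pointers of the given width, in MiB."""
    return 2 ** address_bits // MIB

def min_max_alloc_mib(global_mem_mib):
    """Smallest per-buffer limit OpenCL 1.1 permits (assumption: the spec
    minimum is max(1/4 of global memory, 128 MiB))."""
    return max(global_mem_mib // 4, 128)

# Values taken from the printout in the first post.
global_mem_mib = 12288
address_bits = 32

print("addressable:", addressable_mib(address_bits), "MiB")
print("min max-allocation:", min_max_alloc_mib(global_mem_mib), "MiB")
```

So the 3072 MB figure is exactly a quarter of the 12288 MB card, and the 32 address bits cap the whole address space at 4096 MB regardless of how much memory is installed.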
- sanostol
- Member
- 577 posts
- Joined: Nov. 2005
- Offline
- malexander
- Staff
- 5218 posts
- Joined: July 2005
- Offline
- Guillaume
- Staff
- 479 posts
- Joined: April 2014
- Offline
- malexander
- Staff
- 5218 posts
- Joined: July 2005
- Offline
- anon_user_37409885
- Member
- 4189 posts
- Joined: June 2012
- Offline
- malexander
- Staff
- 5218 posts
- Joined: July 2005
- Offline
sanostol
i really hope nvidia sees it as a bug, not a strategic decision to push their tesla cards
For this generation, the new Teslas are actually different GPUs from the Maxwell-based Quadros, Titans and GeForces. The new Teslas use a GPU based on the Kepler design found in the GeForce 780, while the Maxwell architecture that the new Quadro M and GeForce 900 series are based on is quite different from Kepler. So Nvidia is actually segmenting the markets by hardware now, not just software. There's good reason though, as Maxwell's FP64 capabilities are severely limited (1/32 of the FP32 rate) and a lot of Tesla users require the extra precision, so they had to keep FP64 running well in the Tesla.
But I agree, I certainly hope this is not an artificial limitation in the Maxwell-based Quadros and GeForces. Given that CUDA can manage 12 GB of VRAM, it does seem like more of an OpenCL bug.
- anon_user_37409885
- Member
- 4189 posts
- Joined: June 2012
- Offline
As a side note on the SP-to-DP disparity: there was an interesting talk at GTC by the AMBER molecular dynamics team where, IIRC, they compute in single precision and accumulate in double precision, comparing DP, DPFP, SPFP, SPXP, etc.
Video:
http://on-demand.gputechconf.com/gtc/2015/video/S5478.html [on-demand.gputechconf.com]
Slides:
http://on-demand.gputechconf.com/gtc/2015/presentation/S5226-Ross-Walker.pdf [on-demand.gputechconf.com]
- sanostol
- Member
- 577 posts
- Joined: Nov. 2005
- Offline
if this gets fixed a dream would come true
twod
For this generation, the new Teslas are actually different GPUs from the Maxwell-based Quadros, Titans and GeForces. The new Teslas use a GPU based on the Kepler design found in the GeForce 780, while the Maxwell architecture that the new Quadro M and GeForce 900 series are based on is quite different from Kepler. So Nvidia is actually segmenting the markets by hardware now, not just software. There's good reason though, as Maxwell's FP64 capabilities are severely limited (1/32 of the FP32 rate) and a lot of Tesla users require the extra precision, so they had to keep FP64 running well in the Tesla.
But I agree, I certainly hope this is not an artificial limitation in the Maxwell-based Quadros and GeForces. Given that CUDA can manage 12 GB of VRAM, it does seem like more of an OpenCL bug.
- johner
- Staff
- 823 posts
- Joined: July 2006
- Offline
MartybNz
As a side note on the SP-to-DP disparity: there was an interesting talk at GTC by the AMBER molecular dynamics team where, IIRC, they compute in single precision and accumulate in double precision, comparing DP, DPFP, SPFP, SPXP, etc.
FWIW we do the same thing. Most of the internal multigrid computations are single-precision, but if we're looking at total error to determine whether we can stop iterating, we use double precision for accumulation, dot product totals, etc.
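For anyone wondering why the accumulator precision matters, here is a toy illustration. It is not the Houdini solver code, just a minimal sketch that emulates single precision with the standard `struct` module and sums the same single-precision values once with a single-precision running total and once with a double-precision one. All names here are hypothetical.

```python
import struct

def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_single(values):
    """Accumulate with the running total rounded to single precision each step."""
    acc = 0.0
    for v in values:
        acc = f32(acc + v)
    return acc

def sum_double(values):
    """Accumulate the same values with a double-precision running total."""
    acc = 0.0
    for v in values:
        acc += v
    return acc

n = 100_000
values = [f32(0.1)] * n      # the inputs themselves are single precision
reference = n * f32(0.1)     # reference total, computed in double

err_single = abs(sum_single(values) - reference)
err_double = abs(sum_double(values) - reference)

print("single-precision accumulator error:", err_single)
print("double-precision accumulator error:", err_double)
```

The inputs are identical in both cases; only the width of the running total changes, which is why keeping the bulk of the work in single precision and widening just the totals can be a reasonable trade-off.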
- anon_user_37409885
- Member
- 4189 posts
- Joined: June 2012
- Offline
- johner
- Staff
- 823 posts
- Joined: July 2006
- Offline
Just wanted to point out that the new Nvidia 352.09 beta drivers seem to fix the 4 GB OpenCL limitation! I got them for Linux here:
http://www.nvidia.com/download/driverResults.aspx/85057/en-us [nvidia.com]
I don't know the status under Windows, I'm afraid.
I ran a 200M voxel smoke sim last night that used 11GB in a K6000 and solved in under 4 seconds / frame.
We'd be very curious to hear experiences successful or otherwise if anyone has a chance to try these.