[cudaErrorIllegalAddress] Karma XPU error

Member
580 posts
Joined: Aug. 2014
Offline
For those of you who are interested, this bug was fixed in 20.5.370.
Member
4 posts
Joined: May 2020
Offline
I actually just updated today from 20.5.332 to 20.5.370 and started hitting this error. It seems to happen whenever I turn on a material with either transmission or SSS. For what it's worth, I had never hit this error before; it started right after updating. What's really odd is that even after rolling back to 332, I'm still hitting the error. Not sure what broke. I tried updating to the latest NVIDIA drivers as well. I'm running an RTX 4090, a 3090, and a Ryzen 9 7950X.
Member
4 posts
Joined: May 2020
Offline
This is nuts. It's now even erroring out in H20 on .hip files that I haven't touched for months.
Member
31 posts
Joined: Oct. 2022
Offline
Getting the same cudaErrorIllegalAddress on recent files. Even restarting Houdini doesn't fix it. Always the same error.

Latest NVIDIA Studio drivers (RTX 4070)
Ryzen 9 9950X
Windows 11 Pro
Latest production/daily builds

What is going on?
Staff
531 posts
Joined: May 2019
Offline
Windows may have auto-updated your driver. What happens if you roll your NVIDIA driver back to an earlier version?
Member
31 posts
Joined: Oct. 2022
Offline
There's a mismatch between the dates shown in Device Manager and in GeForce Experience, which I just updated. Is this normal?

Attachments:
NVIDIA_GeForce_Experience_VpKiXk8zC6.png (197.2 KB)

Staff
531 posts
Joined: May 2019
Offline
Maybe related?
https://www.sidefx.com/forum/topic/98268/#post-432055
Member
31 posts
Joined: Oct. 2022
Offline
I can't revert to a previous driver since I just installed Windows. This is the only driver I have on the system.
Staff
531 posts
Joined: May 2019
Offline
johancc
I can't revert to a previous driver since I just installed Windows. This is the only driver I have on the system.

What driver are you on?

You can download older drivers from the NVIDIA website.

FYI, Siegmattel fixed his issue by going with an older driver
“… I rolled my Nvidia drivers back to 560.94 (Game-ready) and uninstalled Geforce Experience and now everything is working. Seems like a compatibility issue with newer drivers...”
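
For anyone unsure which driver is actually installed (Device Manager and GeForce Experience may not show the same thing), the version can be read from `nvidia-smi`, which is installed along with the NVIDIA driver. A minimal sketch, assuming `nvidia-smi` is on your PATH; the helper names `parse_driver_info` and `query_driver_info` are made up for illustration:

```python
import subprocess


def parse_driver_info(csv_text):
    """Parse 'name, driver_version' CSV lines into (gpu_name, driver_version) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma: the driver version itself never contains one.
        name, _, version = line.rpartition(",")
        gpus.append((name.strip(), version.strip()))
    return gpus


def query_driver_info():
    """Ask nvidia-smi for each GPU's name and driver version (assumes it is on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_driver_info(result.stdout)


# Example use on a machine with an NVIDIA GPU:
# for gpu_name, driver_version in query_driver_info():
#     print(f"{gpu_name}: driver {driver_version}")
```

Posting that output along with the Houdini build number should make it easier to compare against the driver/build combinations reported in this thread.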
Staff
531 posts
Joined: May 2019
Offline
One more thing: we've found a bug in our OpenCL caching where rolling back an NVIDIA driver could cause an "OpenCL Exception" error to appear. It's been fixed in 20.5.394, but you can also fix it manually. From the devs: "They can wipe their HOUDINI_TEMP/OCL_CodeCache. Likewise their ~/.nv/ComputeCache"
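
If you'd rather script the cache wipe than delete the folders by hand, something like the following should do it. This is only a sketch: `clear_cache_dirs` is a hypothetical helper, the paths in the example call are placeholders (you need to substitute your own Houdini temp location, and `~/.nv/ComputeCache` is the location quoted above, which may differ on Windows), and it is probably best run with Houdini closed. It defaults to a dry run so nothing is deleted until you opt in.

```python
import shutil
from pathlib import Path


def clear_cache_dirs(cache_dirs, dry_run=True):
    """Delete each existing directory in cache_dirs.

    Returns the paths that were removed (or would be removed in a dry run).
    """
    removed = []
    for cache_dir in cache_dirs:
        path = Path(cache_dir).expanduser()
        if not path.is_dir():
            continue  # nothing cached at this location, skip it
        if not dry_run:
            shutil.rmtree(path)
        removed.append(path)
    return removed


# Example call (placeholder paths, replace with your own):
# clear_cache_dirs(
#     ["/path/to/houdini_temp/OCL_CodeCache", "~/.nv/ComputeCache"],
#     dry_run=False,
# )
```

Running it once with the default `dry_run=True` and checking the returned list is a cheap way to confirm that it found the directories you expected before anything is deleted.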
Member
4 posts
Joined: June 2020
Offline
I started getting this error yesterday while I was on 20.5.278.

siegmattel
I actually just updated today from 20.5.332 to 20.5.370 and started hitting this error. It seems to happen whenever I turn on a material with either transmission or SSS. For what it's worth, I had never hit this error before; it started right after updating. What's really odd is that even after rolling back to 332, I'm still hitting the error. Not sure what broke. I tried updating to the latest NVIDIA drivers as well. I'm running an RTX 4090, a 3090, and a Ryzen 9 7950X.

As siegmattel said, SSS can be the trigger. That is exactly what happens in my case: enabling SSS on a material in a simple demo scene with just a sphere and an SSS material produces the error.

brians
FYI, Siegmattel fixed his issue by going with an older driver
“… I rolled my Nvidia drivers back to 560.94 (Game-ready) and uninstalled Geforce Experience and now everything is working. Seems like a compatibility issue with newer drivers...”

Right. I tested some combinations and found that just reverting to NVIDIA's 560.94 while still using 20.5.278 made it better but didn't solve it. What currently seems to solve it is the combination of 560.94 and 20.5.370; 20.5.370 doesn't work with the most current NVIDIA driver.
Staff
531 posts
Joined: May 2019
Offline
BlackOceanTide
reverting to NVIDIA's 560.94 while still using 20.5.278 made it better but didn't solve it.

20.5.278 had an issue that was fixed in 20.5.370
https://www.sidefx.com/forum/topic/97577/?page=2#post-431105

BlackOceanTide
What currently seems to solve it is the combination of 560.94 and 20.5.370; 20.5.370 doesn't work with the most current NVIDIA driver.

The 565 drivers have been problematic
https://www.sidefx.com/forum/topic/98442/

We've put a potential workaround into 20.5.397.
Please try 20.5.397 with the latest driver and let us know how you get on.
Testing so far has been positive, so I expect a production build to go out soon.

thanks
Edited by brians - Nov. 4, 2024 10:10:24
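
For anyone checking installs across several machines, here is a small sketch for testing whether a given build is at or past 20.5.397, the build mentioned above as carrying the potential workaround. `parse_build` and `has_workaround` are hypothetical helper names, and the comparison assumes plain dotted numeric build strings:

```python
def parse_build(version_string):
    """Turn a build string such as '20.5.397' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version_string.split("."))


def has_workaround(current, minimum="20.5.397"):
    """True if `current` is at or above the build that carries the potential workaround."""
    return parse_build(current) >= parse_build(minimum)


# Inside a Houdini Python shell the running build is available as
# hou.applicationVersionString(); outside Houdini, pass the string by hand, e.g.
# has_workaround("20.5.370")
```

Comparing integer tuples rather than raw strings avoids mis-ordering builds whose numbers have different digit counts.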