To make a long story short, I'm working on a fairly big project right now and realized that my GPU (RTX 4080 16GB) isn't being used at render time at all. Instead, when I first render after restarting Houdini, I get this error:
"KarmaXPU: device Type:Optix ID:0 has registered a critical error , so will now stop functioning. Future error messages will be suppressed"
Once this error has shown up, my GPU no longer even appears in the list of XPU devices when rendering. Just 100% CPU usage. This remains until I restart Houdini, at which point it attempts to use my GPU only the first time a render initializes and eventually gives me the error above.
I've tried switching from Nvidia Game Ready Drivers to the latest Studio Driver (560.81). For the rest of my system, I'm on Windows 11 and using an AMD 7800X3D CPU, as well as 128GB of RAM. Full PC restart also doesn't fix the issue, and I'm getting the same error in the current daily Houdini build (20.5.328).
I don't have time right now to troubleshoot and find the exact source of the problem, so I'm throwing a hail mary here: Has anyone had the same issue, and if so, did you find a fix for it?
[cudaErrorIllegalAddress] Karma XPU error
- MCJamZam
- Member
- 14 posts
- Joined: May 2018
- Offline
- johnmather
- Staff
- 528 posts
- Joined: Aug. 2019
- Offline
If you open the display device and render stats in the viewport, you may get more details as to why it's failing. See "Display device and render stats in the viewport" here: https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#howto [www.sidefx.com]
- MCJamZam
- Member
- 14 posts
- Joined: May 2018
- Offline
johnmather
If you open the display device and render stats in the viewport, you may get more details as to why it's failing. See "Display device and render stats in the viewport" here: https://www.sidefx.com/docs/houdini/solaris/karma_xpu.html#howto [www.sidefx.com]
Thank you, this is helping - my guess is it has something to do with running out of VRAM. My scene is using just barely more than 16GB of memory just for geometry instances, which is of course more than the VRAM on my GPU.
If you don't mind me asking, do you know if there's a way to break these render stats down further? Now that I know I'm using too much memory on geometry, I need to find which geometry exactly is using the most amount of memory. Rather than painstakingly going through my entire scene testing objects one by one, it'd be nice to just get a list of memory usage per object, or something along those lines. (this might be something fairly basic in Solaris I've totally missed so far, seeing as I'm still learning)
- ajz3d
- Member
- 570 posts
- Joined: Aug. 2014
- Offline
I'm frequently encountering the same problem. I think I'm experiencing it since I upgraded from the current 278 production build to daily 328. It doesn't seem to be related to GPU running out of VRAM, because according to nvidia-smi, around the time when the Optix device fails there's still about 35% free VRAM available on it.
I'm using nvidia-driver 550.54.15 and RTX 3070 running on Debian Bookworm.
Is it possible to restart the Optix device without resorting to restarting the whole program and reloading the scene? This would save me a lot of time.
Edited by ajz3d - Aug. 18, 2024 17:22:13
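For anyone who wants to log the same VRAM headroom check around the time the device fails, here is a minimal Python sketch. The `--query-gpu` and `--format` options are standard nvidia-smi flags; the function names and the sample numbers are made up for illustration, and the parser assumes every field comes back as a plain MiB integer (it will not cope with `[N/A]` values).

```python
import subprocess

# Standard nvidia-smi query flags; values are reported in MiB when "nounits" is set.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_vram(csv_text):
    """Turn lines like '0, 5200, 8000' into dicts with the free VRAM percentage."""
    rows = []
    for line in csv_text.strip().splitlines():
        idx, used, total = (int(field.strip()) for field in line.split(","))
        rows.append({
            "gpu": idx,
            "used_mib": used,
            "total_mib": total,
            "free_pct": round(100.0 * (total - used) / total, 1),
        })
    return rows


def query_vram():
    """Run nvidia-smi once and return the parsed rows (needs the NVIDIA driver installed)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_vram(out)


# Hypothetical sample output, not a real measurement from this thread.
sample = "0, 5200, 8000\n"
for row in parse_vram(sample):
    print(f"GPU {row['gpu']}: {row['used_mib']} / {row['total_mib']} MiB used, "
          f"{row['free_pct']}% free")
```

Calling `query_vram()` in a loop with a short sleep while the render starts up would give a rough timeline of memory use leading up to the failure.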
- brians
- Staff
- 530 posts
- Joined: May 2019
- Offline
ajz3d
I think I'm experiencing it since I upgraded from the current 278 production build to daily 328.
Are you able to confirm that for us?
ajz3d
Is it possible to restart the Optix device without resorting to restarting the whole program and reloading the scene?
It depends on the error. Sadly cudaErrorIllegalAddress requires a full restart of Houdini
Have you tried restarting XPU (ie at the topright of the viewport, click the dropdown and choose "restart")
ajz3d
I'm frequently encountering the same problem
Are you able to reliably reproduce this?
It would be great to get a repro scene + clear repro steps from you, so we can investigate.
- ajz3d
- Member
- 570 posts
- Joined: Aug. 2014
- Offline
brians
ajz3d
I think I'm experiencing it since I upgraded from the current 278 production build to daily 328.
Are you able to confirm that for us?
Hi Brians. Yes, I confirm this. I installed daily 332 yesterday (which is now the new production build) while filling out a bug report about this particular issue, and this build also crashes OptiX on my end. So I reverted back to 278, and there are no crashes at all, no matter what I do and how hard I try. Rock solid in this dept.
brians
ajz3d
Is it possible to restart the Optix device without resorting to restarting the whole program and reloading the scene?
It depends on the error. Sadly cudaErrorIllegalAddress requires a full restart of Houdini
Have you tried restarting XPU (ie at the topright of the viewport, click the dropdown and choose "restart")
Naturally, but it doesn't restart it. I also tried one of your suggestions from some other thread. That is, to switch to CPU and then back to XPU. The result is the same, unfortunately.
brians
ajz3d
I'm frequently encountering the same problem
Are you able to reliably reproduce this?
It would be great to get a repro scene + clear repro steps from you, so we can investigate.
Yes, I can reproduce it every single time. I'm preparing the package and will send it to support.
Edited by ajz3d - Aug. 21, 2024 10:14:21
- ronald_a
- Member
- 92 posts
- Joined: Aug. 2017
- Offline
brians
Sadly cudaErrorIllegalAddress requires a full restart of Houdini
Have you tried restarting XPU (ie at the topright of the viewport, click the dropdown and choose "restart")
Is there any chance that the need to restart houdini with cudaErrorIllegalAddress will go away any time soon? This is one of the few small annoyances using xpu.
- z8editing
- Member
- 3 posts
- Joined: June 2019
- Offline
I noticed an interactivity issue while moving around the viewport with XPU when there's at least 50% of VRAM used, but only on the newest production build (.332); .278 doesn't have it.
(my very scientific method is to duplicate the test pig without instancing until it chugs in xpu)
A friend with a different setup and 2x 4070 also has the same issue
Also, I would like to point out that XPU on 20.5 takes about 25% more VRAM compared to H20 with only textures/materials. I'm trying to narrow down the issue before sending a ticket, but I'm not making sense of the inconsistency.
For example, I have a scene taking 5GB of VRAM without materials on both versions; it goes up to 6.2GB with materials on H20.0 and 8GB on H20.5 while having the same look.
- ajz3d
- Member
- 570 posts
- Joined: Aug. 2014
- Offline
Brians, did you by any chance have the opportunity to look into this issue and the scene I sent? I haven't heard from support for two weeks now, and the issue discussed here is pretty critical for me, because I can't upgrade to anything beyond the initial 278 production build without experiencing OptiX crashes. Which of course means that I have to struggle, on a daily basis, with a variety of bugs that were already fixed in newer Houdini builds.
Edited by ajz3d - Sept. 8, 2024 15:42:18
- brians
- Staff
- 530 posts
- Joined: May 2019
- Offline
We've had trouble reproducing your issue on our end.
ajz3d
I think I'm experiencing it since I upgraded from the current 278 production build to daily 328
If we could binary-search to find the exact version of Houdini that caused the issue, it would make it much easier to track down. Are you in a position to do that?
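The binary search suggested here can be sketched in a few lines of Python. This is only an illustration under some assumptions: that 278 is known good and 328 is known bad, that every build number in between is installable (in practice some numbers may not exist as dailies), and that once the crash appears it stays in all later builds. `first_bad_build` and the `crashes` check are hypothetical names; the check stands in for manually installing a build and running the repro scene.

```python
def first_bad_build(builds, is_bad):
    """Return the earliest build for which is_bad(build) is True.

    Assumes builds is sorted ascending, builds[0] is known good,
    builds[-1] is known bad, and the bug never disappears once introduced.
    """
    lo, hi = 0, len(builds) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return builds[hi]


# Hypothetical example: pretend the regression landed in build 301.
tested = []

def crashes(build):
    tested.append(build)  # stands in for installing the build and running the repro
    return build >= 301

builds = list(range(278, 329))  # 278 (good) through 328 (bad)
culprit = first_bad_build(builds, crashes)
print(f"first bad build: {culprit}, builds tested: {tested}")
```

With roughly 50 candidate builds this needs about six installs rather than fifty, which is why it is worth doing even when each test takes a while.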
- brians
- Staff
- 530 posts
- Joined: May 2019
- Offline
It can be done via the launcher (specifically the --version argument)
https://www.sidefx.com/docs/houdini/ref/utils/launcher.html [www.sidefx.com]
Let me know if this works for you, if not I'll check with our release/web guys
- ajz3d
- Member
- 570 posts
- Joined: Aug. 2014
- Offline
brians
It can be done via the launcher (specifically the --version argument)
https://www.sidefx.com/docs/houdini/ref/utils/launcher.html [www.sidefx.com]
Let me know if this works for you, if not I'll check with our release/web guys
If it's not a problem, I'd rather use legacy installers.