KarmaXPU fails with 4090

Member
8 posts
Joined: July 2018
I recently got an RTX 4090 and opened a scene I had been working on, and noticed that KarmaXPU would constantly fail and display "KarmaXPU: Unable to create CUDA context for device 0".

I decided to do some troubleshooting so I made a new scene with just a simple sphere and a grid made into a sweep. Before I created any materials I instantly threw down a karma node in a LOP network and changed it to XPU and it began rendering fine. I made two materials and put them on the sphere and the sweep and then suddenly it errored out the same way it did in my other scene.

I found that the material with transmission set to 1 instantly causes the error. However in my original scene I did not have any transmission materials, only MaterialX materials that used texture files for albedo, roughness, and metal.

[21:07:14] KarmaXPU: device Type:Optix ID:0 has registered a critical error [cudaErrorIllegalAddress], so will now stop functioning. Future error messages will be suppressed
[21:08:13] KarmaXPU: Unable to create CUDA context for device 0
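
For anyone who wants to scan a render log for this failure, here is a minimal Python sketch that picks out the two messages quoted above. The patterns are written only against the log text shown in this post, so other Houdini builds may word the messages differently, and the function name `scan_karma_log` is hypothetical (it is not part of Houdini).

```python
import re

# Patterns written against the log text quoted in this thread (an assumption:
# other Houdini builds may word these messages differently).
CRITICAL = re.compile(
    r"KarmaXPU: device Type:(?P<type>\w+) ID:(?P<id>\d+) "
    r"has registered a critical error\s*(?:\[(?P<code>\w+)\])?"
)
NO_CONTEXT = re.compile(
    r"KarmaXPU: Unable to create CUDA context for device (?P<id>\d+)"
)


def scan_karma_log(text):
    """Return (kind, device_id, error_code) tuples for each message found."""
    hits = []
    for m in CRITICAL.finditer(text):
        # The bracketed error code is optional; it is None when absent.
        hits.append(("critical", int(m.group("id")), m.group("code")))
    for m in NO_CONTEXT.finditer(text):
        hits.append(("no_context", int(m.group("id")), None))
    return hits
```

An empty result means neither message was found, which is not proof that XPU rendered on the GPU.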

Nvidia RTX 4090 Driver Version: 531.18

I've attached the simple scene with sphere and grid that fails and crashes the XPU render and falls back to CPU.
Edited by IamBramer - March 8, 2023 22:25:09

Attachments:
Fail XPU Render.hiplc (598.3 KB)

Member
2625 posts
Joined: June 2008
Testing your scene, I receive no errors after browsing to my copy of the artist_workshop.hdr.
I tried turning transmission on and off, and XPU works fine on my RTX3050 8GB running Houdini 19.5.493. nVidia Drivers 516.94
Using Houdini Indie 20.0
Windows 11 64GB Ryzen 16 core.
nVidia 3050RTX 8GB RAM.
Member
8 posts
Joined: July 2018
Enivob
Testing your scene, I receive no errors after browsing to my copy of the artist_workshop.hdr.
I tried turning transmission on and off, and XPU works fine on my RTX3050 8GB running Houdini 19.5.493. nVidia Drivers 516.94

Glad to hear!

I rolled back to the Nvidia Studio Driver 528.49 and everything is working. So it seems to be some sort of driver conflict issue.
Member
241 posts
Joined: Feb. 2016
I had the same issue with my 3060. At some point, after adding a few textures, it would crash with the same error. Rolling back to 528.49 fixed my issues as well.
Thanks,

Evan
Staff
530 posts
Joined: May 2019
I'm able to replicate this as well on my end.
So thanks for posting about it.
We'll contact NVidia about it and let you know any updates.
Cheers
Edited by brians - March 23, 2023 20:23:58
Member
143 posts
Joined: Oct. 2015
Same problem here with an Nvidia RTX 3070, driver version 531.18.
Rolling back to the previous version too.
Staff
530 posts
Joined: May 2019
This has been addressed in 19.5.568 (and also backported to 19.0.384)
(ie, after advice from NVidia, we've put a temporary workaround in the XPU code until the driver issue is fixed)
Thanks for your patience guys!
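
A quick way to check a build against those numbers, as a rough Python sketch. The build numbers are the ones reported in this post and are not independently verified; `has_xpu_workaround` is a hypothetical helper, and treating anything newer than 19.5 as including the change is an assumption.

```python
# Builds named in this thread as carrying the temporary XPU workaround.
WORKAROUND_BUILDS = {(19, 5): 568, (19, 0): 384}


def has_xpu_workaround(version):
    """version is a (major, minor, build) tuple of ints."""
    major, minor, build = version
    minimum = WORKAROUND_BUILDS.get((major, minor))
    if minimum is None:
        # Assumption: release lines newer than 19.5 include the change;
        # older or unlisted release lines are treated as not having it.
        return (major, minor) > (19, 5)
    return build >= minimum
```

Inside Houdini's Python shell the tuple can come from `hou.applicationVersion()`, for example `has_xpu_workaround(hou.applicationVersion())`.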
Edited by brians - March 28, 2023 18:45:38
Member
8 posts
Joined: July 2018
brians
This has been addressed in 19.5.568 (and also backported to 19.0.384)
(ie, after advice from NVidia, we've put a temporary workaround in the XPU code until the driver issue is fixed)
Thanks for your patience guys!

Awesome stuff brians. Always amazed at how awesome everyone at Houdini and everyone in the community is at helping out and solving stuff!
Member
2 posts
Joined: June 2022
I'm experiencing this same error with Houdini 19.5.569 and Studio Driver 531.61 for the RTX 2080 Super. Any ideas on a stable driver for this? 531.61 is the latest for the 2080 Super here.
Member
24 posts
Joined: Dec. 2020
brians
I'm able to replicate this as well on my end.
So thanks for posting about it.
We'll contact NVidia about it and let you know any updates.
Cheers
Hi, this is happening to me too.
It happens in Houdini Indie when I switch from CPU to XPU, on an RTX 4090, with an asset that uses transmission and SSS.
I have the latest Nvidia drivers (535.98) and I tried the Studio drivers too, and the same error appears.

KarmaXPU: device Type:Optix ID:0 has registered a critical error , so will now stop functioning. Future error messages will be suppressed
KarmaXPU: Unable to create CUDA context for device 0
KarmaXPU: Unable to create CUDA context for device 0

tomorrow i will share the hip file if you need it.
thanks!!
I love node based world

Learning and Playing houdini
Staff
530 posts
Joined: May 2019
ansuter
I'm experiencing this same error with Houdini 19.5.569 and Studio Driver 531.61 for the RTX 2080 Super. Any ideas on a stable driver for this? 531.61 is the latest for the 2080 Super here.

Are you still experiencing issues, ansuter?
It's possible I listed the incorrect Houdini version. I'm curious whether you're still getting issues on 531.61 with Houdini versions past 19.5.569.

Jackatack
It happens in Houdini Indie when I switch from CPU to XPU, on an RTX 4090, with an asset that uses transmission and SSS.
I have the latest Nvidia drivers (535.98) and I tried the Studio drivers too, and the same error appears.

We've been finding issues with 535.98 (will be getting in touch with NVidia)
What happens if you roll back to 532.03?


thanks!
Edited by brians - June 14, 2023 05:31:07
Member
24 posts
Joined: Dec. 2020
Brian
ansuter
I'm experiencing this same error with Houdini 19.5.569 and Studio Driver 531.61 for the RTX 2080 Super. Any ideas on a stable driver for this? 531.61 is the latest for the 2080 Super here.

Are you still experiencing issues, ansuter?
It's possible I listed the incorrect Houdini version. I'm curious whether you're still getting issues on 531.61 with Houdini versions past 19.5.569.

Jackatack
It happens in Houdini Indie when I switch from CPU to XPU, on an RTX 4090, with an asset that uses transmission and SSS.
I have the latest Nvidia drivers (535.98) and I tried the Studio drivers too, and the same error appears.

We've been finding issues with 535.98 (will be getting in touch with NVidia)
What happens if you roll back to 532.03?


thanks!
Hi Brian, nice to meet you.

I reported the bug to support. This is what I found while debugging the issue.

I can make XPU work again by rolling my GPU drivers back to 531.61-desktop-win10-win11-64bit-international-nsd-dch-whql.
So I think it's a problem with the latest Nvidia drivers.

I'm actually still having issues with XPU with the same asset, which uses transmission and SSS. If I scatter the crystal 10,000 times using an instancer in Solaris, Houdini suddenly closes.
Sometimes an error appears about a memory allocation problem.
I have Windows 10, 128 GB RAM, a Ryzen 5950X and an RTX 4090.

I'm now installing the new Nvidia drivers. I will post when I have more info.

Cheers!!


NEWS:
With Houdini 19.5.640 and the latest Studio drivers (535.98), the first XPU problem is solved. No more CUDA init errors.

But when I open a render view with Karma XPU with the crystal scattered, Houdini closes without an error every time I try.
If I reduce the number of instances, it works.
Edited by Jackatack - June 15, 2023 17:02:57
Staff
530 posts
Joined: May 2019
Hi

Jackatack
I'm actually still having issues with XPU with the same asset, which uses transmission and SSS. If I scatter the crystal 10,000 times using an instancer in Solaris, Houdini suddenly closes.

This seems like a different issue
Could you submit a new bug-report with repro scene etc..?

thanks
Brian
Member
24 posts
Joined: Dec. 2020
brians
Hi

Jackatack
I'm actually still having issues with XPU with the same asset, which uses transmission and SSS. If I scatter the crystal 10,000 times using an instancer in Solaris, Houdini suddenly closes.

This seems like a different issue
Could you submit a new bug-report with repro scene etc..?

thanks
Brian

This weekend I will do it .
Cheers
Member
22 posts
Joined: May 2015
Hello,

I may have a similar problem.
When I render with KarmaXPU, I got the following message in the terminal.

KarmaXPU: device Type:Optix ID:0 has registered a critical error , so will now stop functioning. Future error messages will be suppressed
KarmaXPU: Unable to create CUDA context for device 0

Also, when rendering with KarmaCPU, I got the following message

denoise plugin:.

I am using Houdini FX 19.5.640 on Linux Mint 20.3.
Graphics card---NVIDIA GeForce RTX 4090 with NVIDIA 535.54.03.

PS:I rolled back the NVIDIA driver to NVIDIA 525.125.06 and the problem did not occur.
Edited by Shigeru Iriki - July 11, 2023 09:45:50
Member
874 posts
Joined: Oct. 2008
It may not necessarily be the Nvidia driver's fault! I've had terrible problems with the latest Linux kernel updates breaking the Nvidia (530) drivers. I still have OpenCL problems on the CPU, but at least it runs on the GPU now.
Edited by Soothsayer - July 11, 2023 12:40:59
--
Jobless
Staff
530 posts
Joined: May 2019
Shigeru Iriki2
I may have a similar problem.
When I render with KarmaXPU, I got the following message in the terminal.

KarmaXPU: device Type:Optix ID:0 has registered a critical error , so will now stop functioning. Future error messages will be suppressed
KarmaXPU: Unable to create CUDA context for device 0


I am using Houdini FX 19.5.640 on Linux Mint 20.3.
Graphics card---NVIDIA GeForce RTX 4090 with NVIDIA 535.54.03.

PS:I rolled back the NVIDIA driver to NVIDIA 525.125.06 and the problem did not occur.

We currently are having trouble with drivers 535.98 and later.
We are in communication with NVidia about it and will let you know when we have any information to share.
In the meantime please roll back to driver 532.03 (or earlier, as you have done)
Thanks for your patience
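
To check an installed driver against that advice, here is a hedged Python sketch. The 535.98 threshold is taken from the reply above and follows the Windows driver numbering. Linux releases are numbered differently (the 535.54.03 report quoted above would not be flagged by this comparison), so treat the result as a hint only. The helper names are hypothetical. The `nvidia-smi` query assumes the tool is on the PATH.

```python
import subprocess

# First driver release reported as problematic in the staff reply above
# (Windows numbering; an assumption for any other platform).
FIRST_SUSPECT = (535, 98)


def parse_driver_version(text):
    """Turn '532.03' or '525.125.06' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def is_suspect_driver(text):
    """True if the version is at or past the first release reported as bad."""
    return parse_driver_version(text) >= FIRST_SUSPECT


def installed_driver_versions():
    """Ask nvidia-smi for the driver version, one entry per GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

For example, `[v for v in installed_driver_versions() if is_suspect_driver(v)]` lists any installed versions at or past the threshold.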
Member
6 posts
Joined: March 2016
Hi, do you have a CPU with a built-in GPU? I ask because I have the same issue. Check the GPU devices in Task Manager to see whether the CPU's integrated graphics is enabled. I disabled mine manually.
Member
24 posts
Joined: Dec. 2020
seyed Ali Hossein Izadi
Hi, do you have a CPU with a built-in GPU? I ask because I have the same issue. Check the GPU devices in Task Manager to see whether the CPU's integrated graphics is enabled. I disabled mine manually.
No, my CPU is a Ryzen 5950X, which has no integrated GPU.
Staff
530 posts
Joined: May 2019
Hi guys

More info on this:
NVidia have found the bug and will be rolling out a fix in the coming weeks.
Until then, you can use this environment variable if you're having driver issues.

KARMA_XPU_OPTIX_FORCE_GAS_TRACE=1

It should be in H19.5 from about 19.5.697
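
One way to apply that variable when starting Houdini from a script, as a minimal Python sketch. Only the variable name and value come from the post above. The helper `env_with_workaround` is hypothetical, and having a `houdini` executable on the PATH is an assumption about your install.

```python
import os

# Variable name and value as given in the staff post above.
WORKAROUND_VAR = "KARMA_XPU_OPTIX_FORCE_GAS_TRACE"


def env_with_workaround(base_env=None):
    """Return a copy of the environment with the workaround variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env[WORKAROUND_VAR] = "1"
    return env
```

Usage would look something like `subprocess.Popen(["houdini"], env=env_with_workaround())`. Setting the variable in your shell before launching, or adding a `KARMA_XPU_OPTIX_FORCE_GAS_TRACE = 1` line to your houdini.env file, should have the same effect.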