XPU stopped working after Houdini upgrade (605 to 653).
It seems that after upgrading from 20.0.605 to 20.0.653 (production build) I have lost the ability to render with Karma XPU. No errors are printed to stdout, but the Log Viewer contains several errors and warnings (logs are in the attachment). The HUD in the upper right corner of the viewport doesn't even mention OptiX. The same thing happens with the newest daily build (20.0.675).
I had to roll back to 20.0.605, where XPU still works.
My specs: Debian 12.5 (Bookworm), nvidia-driver/libnvoptix1 550.54.15-1 (upstream), RTX 3070.
Has anyone else experienced this problem?
- ajz3d
- Member
- 580 posts
- Joined: Aug. 2014
- Offline
- ajz3d
- pete4d
- Member
- 11 posts
- Joined: May 2014
- Offline
- ajz3d
Can you tell me which version of the NVIDIA driver you are using? Support suggested that I upgrade to 550.67, but this version isn't available in the upstream repository yet, so I can't test this solution.
Perhaps something changed in the XPU architecture between 605 and 653, and it now requires some functions that only exist in a newer GPU driver?
- protozoan
- Member
- 1718 posts
- Joined: March 2009
- Offline
- ajz3d
- brians
- Staff
- 531 posts
- Joined: May 2019
- Offline
- ajz3d
Bingo.
[20:16:57] KarmaXPU: Failed to load CUDA DSO [libnvidia-ml.so: cannot open shared object file: No such file or directory]
But why does it complain that this library cannot be found? I have it inside the /usr/lib/x86_64-linux-gnu/nvidia/current/ path. It's a symlink to libnvidia-ml.so.1, which in turn is a symlink to libnvidia-ml.so.550.54.15.
Edited by ajz3d - April 16, 2024 14:46:26
- brians
Maybe a path issue.
We load two files dynamically at runtime, libcuda.so and libnvidia-ml.so.
Do they live beside each other in the same directory? Or is the libcuda.so file in a different location?
Or maybe it's symlinked in another location, and that's what we're picking up?
I might add a log message showing exactly where we're picking these files up from. That should give us more clues about where they are being found at runtime.
Edited by brians - April 16, 2024 19:41:07
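Until such a log message exists, one way to see which file a process actually picked up on Linux is to read /proc/self/maps after the library has been loaded. The sketch below is only an illustration of that idea and is not how Houdini does it; the function name mapped_library_paths is made up for this example.

```python
import os


def mapped_library_paths(name_fragment, maps_text=None):
    """Return the unique file paths mapped into a process whose basename
    contains name_fragment.

    maps_text is the content of a /proc/<pid>/maps file. When omitted,
    the current process's /proc/self/maps is read (Linux only).
    """
    if maps_text is None:
        with open("/proc/self/maps") as fh:
            maps_text = fh.read()

    paths = set()
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if name_fragment in os.path.basename(path):
            paths.add(path)
    return sorted(paths)
```

In a Python session you could first call ctypes.CDLL("libcuda.so") (which raises OSError if the loader cannot find it) and then print mapped_library_paths("libcuda") to see the fully resolved file that ended up in memory.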
- ajz3d
/usr/lib/x86_64-linux-gnu/nvidia/current/ contains libcuda.so as well as the libnvidia-ml.so* set of files.
However, /usr/lib/x86_64-linux-gnu/ contains libcuda.so, but only libnvidia-ml.so.1. I believe Houdini is checking this particular path, because after I created libnvidia-ml.so as a symlink to /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1, OptiX started working again.
So yes, it's a problem with paths.
Edited by ajz3d - April 17, 2024 07:38:13
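For anyone who wants to check their own machine for the same situation, here is a small sketch that reports whether a given library name exists in a directory and which symlink hops it goes through. The helper name describe_library is hypothetical, and the directories you pass in are whatever applies to your system (the two paths from this thread are just one example).

```python
import os


def describe_library(directory, name):
    """Return (exists, chain) for directory/name.

    chain lists the path itself followed by every symlink target,
    one hop at a time, so a broken or missing link is easy to spot.
    """
    path = os.path.join(directory, name)
    chain = [path]
    seen = set()
    while os.path.islink(path) and path not in seen:
        seen.add(path)  # guards against symlink loops
        target = os.readlink(path)
        # Relative targets are resolved against the link's own directory.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return os.path.exists(path), chain
```

Calling describe_library("/usr/lib/x86_64-linux-gnu", "libnvidia-ml.so") on a setup like the one described above would be expected to report that the unversioned name is missing, while the same call with "libnvidia-ml.so.1" would show the chain down to the versioned file.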
- jsmack
- Member
- 8045 posts
- Joined: Sept. 2011
- Offline
ajz3d
/usr/lib/x86_64-linux-gnu/nvidia/current/ contains libcuda.so as well as the libnvidia-ml.so* set of files.
However, /usr/lib/x86_64-linux-gnu/ contains libcuda.so, but only libnvidia-ml.so.1. I believe Houdini is checking this particular path, because after I created libnvidia-ml.so as a symlink to /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1, OptiX started working again.
So yes, it's a problem with paths.
Does that make it a bug with the distro or the nvidia driver installer?
- ajz3d
Hard to say. I can only speculate, but I would definitely exclude the distro from the blame list, because the only thing I did before OptiX stopped working in XPU was upgrade Houdini from 20.0.605 to 20.0.653. No apt upgrades, no new nvidia-driver installations or anything like that. And rolling back to Houdini 20.0.605 makes OptiX work again in XPU. Besides, I'm not using nvidia-driver from Debian's repositories, but the one from the upstream repo, over which the Debian team has no control.
I'd say it's most likely Houdini or NVIDIA. Brians said that they're loading a new driver binary now. I assume he had this libnvidia-ml.so in mind. So maybe they're loading from the wrong path? It might also be that Houdini uses the correct path to dynamically link this library, but NVIDIA misconfigured their .deb packages and that's why the symlink to libnvidia-ml.so wasn't created in the /usr/lib/x86_64-linux-gnu path when the nvidia-driver package was installed. Who knows? :/
- jandress
- Member
- 44 posts
- Joined: Nov. 2013
- Offline
- brians
ajz3d
Brians said that they're loading a new driver binary now. I assume he had this libnvidia-ml.so in mind. So maybe they're loading from the wrong path?
The thing that has changed is that we are loading the libnvidia-ml.so file. But it should live beside libcuda.so, meaning that if we can load one, then we should be able to load the other from the same path.
I'm not sure if it's the distro or nvidia at fault, but I think we'll just fix Houdini to go looking for the libnvidia-ml.so file in the location of the actual libcuda.so binary (not the symlink). Hopefully that should address the issue.
Edited by brians - April 18, 2024 05:58:31
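The lookup described above (resolve the libcuda.so symlink, then search next to the real binary) might look roughly like the sketch below. This is purely an illustration under my own assumptions and not Houdini's actual code; sibling_candidates and its fallback to versioned names such as libnvidia-ml.so.1 are invented for the example.

```python
import glob
import os


def sibling_candidates(libcuda_path, sibling="libnvidia-ml.so"):
    """List candidate paths for `sibling` in the directory that holds the
    real libcuda binary, i.e. after all symlinks have been resolved.

    Assumption: the exact unversioned name is preferred; if it is missing,
    versioned names (sibling + ".*") in that directory are returned instead.
    """
    real_dir = os.path.dirname(os.path.realpath(libcuda_path))
    exact = os.path.join(real_dir, sibling)
    if os.path.exists(exact):
        return [exact]
    pattern = os.path.join(glob.escape(real_dir), sibling + ".*")
    return sorted(glob.glob(pattern))
```

The point of resolving first is that the directory holding a symlink and the directory holding the real file can have different contents, which is exactly the mismatch reported earlier in this thread.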
- ajz3d
- brians
- Mirko Jankovic
- Member
- 167 posts
- Joined: Aug. 2015
- Offline
- ajz3d
brians
I've made this change to 20.0.685
When you get a chance, can you please test and let me know either way.
Hi Brians,
I removed the manually created /usr/lib/x86_64-linux-gnu/libnvidia-ml.so symlink, restarted Debian (just in case), installed 20.0.685, and ran some XPU test renders from both the GUI and the offline renderer. The problem seems to be fixed, as there were no errors. OptiX kicked in and I had 99% load on the GPU.
I'm still on nvidia-driver 550.54.15-1.
- Mirko Jankovic
- ajz3d