Karma XPU barely using GPU

Member
157 posts
Joined: July 2005
But are you using Hardware Acceleration/Hardware Ray Tracing in Redshift? I believe that is what they call using OptiX for ray tracing. This would also explain why the denoiser doesn't work, as it's also using OptiX.

Redshift does render images making "full" use of the GPUs. But as you note, not only is the denoiser disabled, but I'm also getting an "OptiX msg: Error initializing RTX library" in the Redshift logs.

So, no hardware ray tracing, meaning something's obviously not right with my nvidia 515.48.07 driver installation.

I have also been unable to get the OptiX SDK sample files to run.

But everything else seems to function correctly, and Redshift is solid, rendering without crashes.

Karma XPU (Houdini 19.5), on the other hand, doesn't render at all.

Out of curiosity, is anyone out there running a Debian 11 (or other Linux) system with Karma XPU running correctly? Redshift? OptiX SDK sample files?
Edited by fgillis - July 30, 2022 05:16:51
Floyd Gillis
Member
95 posts
Joined:
fgillis
Out of curiosity, is anyone out there running a Debian 11 (or other Linux) system with Karma XPU running correctly? Redshift? OptiX SDK sample files?


Well, for me: I tested a large scene in XPU.

On Windows the GPU works perfectly, BUT on Debian 11 the GPU works yet the render fails all the time.

The big question: if Karma XPU is running out of memory, how is it possible that it works on Windows but not on Linux?
I took this even further: on Windows I opened two 3D applications at once, both with large scenes, and it still worked perfectly without failing (I also tested it with build 327 and it behaved the same).
But on Linux it fails all the time.

When I check the render stats, they show a total of 11 GB (and I have 24 GB), and it still fails on Debian or Ubuntu...
BUT on Windows, when I opened two applications with large scenes, I was amazed, because it was supposed to fail, yet it worked perfectly.
In the past I assumed that Windows was using "shared GPU memory", but someone explained to me that it's only for iGPUs (and the weird part is that Windows still uses it).

Also, Linux is far more efficient with VRAM, so it isn't supposed to fail.
So something weird is going on with Houdini's behavior on Linux.
Edited by habernir - Aug. 3, 2022 05:55:06
Member
157 posts
Joined: July 2005
Came to the conclusion my problems were caused by a conflict with previously installed nvidia driver(s).
Also, my Debian 11 install was an upgrade from Debian 10, rather than a clean install.

So I did a clean install of Debian 11 and installed the nvidia 515.65.01 driver (with CUDA) using the instructions here:
https://www.linuxcapable.com/install-510-nvidia-drivers-on-debian-11-bullseye

Redshift Render and Karma XPU now running correctly under Debian 11 using nvidia driver 515.65.01.
(Damn, XPU is fast!)

No longer getting "OptiX msg: Error initializing RTX library" with Redshift.
No longer getting "The denoiser will be disabled" with Redshift
No longer getting "Unable to create optix context for device..." with Karma XPU.

Both GPUs (RTX 2070 Super & GTX 1080) hitting 100% during rendering.

Tested with Houdini 19.5.303/Redshift 3.5.05 & Houdini 19.5.327.

So far, so good.
Edited by fgillis - Aug. 9, 2022 08:40:48

Attachments:
nvidia_driver_aug_08_22.png (211.6 KB)

Floyd Gillis
Member
590 posts
Joined: Aug. 2014
It seems that Debian uses the newest drivers from the Long Term Support Branch (LTSB). Currently that's 470.141.03, whose latest minor version was released on the 2nd of August 2022, with security support up to the 1st of September 2024. So they're not exactly old drivers, as I had initially thought.

The life cycle of R495, which was tagged as a New Feature Branch (NFB) and is currently the minimum requirement for Karma XPU, ended on the 14th of January 2022, so it will never reach Debian. I guess Debian incorporates only LTSB versions, which makes sense because it prioritizes stability and security over new features. This also explains why the newest R510 drivers, which are from the Production Branch (PB), won't move from the Experimental to the Unstable repository. If I'm correct, being from PB, they will never migrate.

So, until NVIDIA releases a newer LTSB driver, which, considering the remaining lifespan of R470, does not seem likely to happen soon, users of Debian workstations can only dream of Karma XPU. Or they can try installing drivers directly from upstream, but this potentially opens a can of worms, such as problems on kernel upgrades and, in the worst case, the system becoming unstable over time.
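To tell at a glance which side of that line a given machine is on, a minimal shell sketch like the following could compare the installed driver version against the R495 minimum. The helper names (`version_ge`, `current_driver_version`, `check_xpu_driver`) are hypothetical. The 495.89 figure is the minimum SideFX quotes later in this thread, and the comparison assumes GNU `sort -V`, which Debian's coreutils provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 >= version $2.
# Uses GNU sort's -V (version sort), so 510.85.02 > 495.89 as expected.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum driver for Karma XPU (R495), as quoted in this thread.
XPU_MIN_DRIVER="495.89"

# Query the installed driver version of the first GPU via nvidia-smi.
current_driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Print whether the given version meets the minimum; return 1 if it doesn't.
check_xpu_driver() {
    local ver="$1"
    if version_ge "$ver" "$XPU_MIN_DRIVER"; then
        echo "driver $ver meets the $XPU_MIN_DRIVER minimum"
    else
        echo "driver $ver is older than $XPU_MIN_DRIVER"
        return 1
    fi
}
```

On a live system, for example, `check_xpu_driver "$(current_driver_version)"` would report Debian Stable's 470.141.03 as too old. Version sorting is used instead of plain string comparison so that multi-part versions compare numerically, field by field.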

Perhaps SideFX have pushed down on the gas pedal a bit too strongly recently? Some of us simply can't catch up now, or at least not without undertaking drastic changes in our workstation setups, which, by the way, have been working more-or-less flawlessly with Houdini so far. I, for instance, don't remember any point in time when Houdini's driver requirements exceeded the driver version available in Debian. Maybe up to version 19.0 Houdini only required LTSB, and then that changed because some XPU features SideFX are implementing require PB drivers? That I don't know, but I hope that once XPU evolves to a production state, there will be no more reason for surprises of this kind.

Ok, enough of my rambling.

Sources: https://docs.nvidia.com/datacenter/tesla/drivers/index.html#lifecycle
Edited by ajz3d - Sept. 14, 2022 11:32:08
Member
8052 posts
Joined: Sept. 2011
Windows drivers go BRRRRR
Member
590 posts
Joined: Aug. 2014
jsmack
Windows drivers go BRRRRR
They do. Until everything stops and you can't debug.
Edited by ajz3d - Sept. 15, 2022 07:52:14

Attachments:
bsod.jpg (15.1 KB)

Staff
4201 posts
Joined: Sept. 2007
ajz3d
Perhaps SideFX have pushed down on the gas pedal a bit too strongly recently? Some of us simply can't catch up now, or at least not without undertaking drastic changes in our workstation setups, which, by the way, have been working more-or-less flawlessly with Houdini so far. I, for instance, don't remember any point in time when Houdini's driver requirements exceeded the driver version available in Debian. Maybe up to version 19.0 Houdini only required LTSB, and then that changed because some XPU features SideFX are implementing require PB drivers? That I don't know, but I hope that once XPU evolves to a production state, there will be no more reason for surprises of this kind.

XPU is still in beta, and the requirements are fairly clear about Nvidia driver versions for those interested in testing. There is technology in the Nvidia OptiX libraries that doesn't exist in those older drivers. We haven't necessarily been targeting Debian's available LTSB drivers, nor have we been specifically trying to break compatibility, but the vast majority of our customers install/manage their drivers manually, or use a distro that makes more recent driver updates/fixes available (e.g. Ubuntu LTS).

If XPU is important to you, you'll probably need to manage Nvidia drivers differently on Debian, or use (or dual-boot) a different distro. Personally, I've found that Ubuntu or derivatives like Mint/Pop!_OS strike a good balance: a stable base, but with more up-to-date drivers that are easy to install/manage compared to installing from packages.

Everyone is very excited about XPU, and we want to make it available for others to test with, but we also stress that it's still beta, and often there are OptiX features/fixes we need that require newer versions.

Good luck, thanks for the feedback!
I'm o.d.d.
Member
8052 posts
Joined: Sept. 2011
ajz3d
and you can't debug

that's Nvidia's job
Member
590 posts
Joined: Aug. 2014
Thank you, @goldleaf, for the exhaustive reply. I'm one of those excited people too, so I kept trying to figure out how to resolve this problem, and I think I have good news for Debian users. At least for those with Debian Testing as their daily driver.

I succeeded in running Karma XPU with drivers from Debian's main repository. The trick was to purge the existing *nvidia* packages and install the drivers for Tesla (nvidia-tesla-driver), which are already at 510.85.xx. Additionally, I had to install libnvidia-tesla-nvoptix1 and nvidia-tesla-opencl-icd.

I also installed nvidia-cuda-toolkit, but I'm not sure if it's required for XPU to work, because CUDA is already in libnvidia-tesla-cuda1, which is one of the Tesla driver's dependency packages, and is also in a newer version (11.6 vs 11.5). I will uninstall the toolkit later and see if it breaks anything.

Why the Tesla driver? According to the nvidia-tesla-driver package description, it also works with RTX cards (like my RTX 3070):

This version only supports GeForce, NVS, Quadro, RTX, Tesla, ... GPUs based on
the Maxwell, Pascal, Volta, Turing, Ampere or newer architectures.

Several weeks ago I read somewhere that the nvidia-driver package was created in order to avoid some problems that nvidia-tesla-driver may cause. What are those problems? I don't know, because they weren't mentioned in that article or post. I guess sooner or later I'm going to find out if there are any.

Here are the commands I issued in order to change the driver. I ran them from TTY1, not the GUI. Follow them at your own risk, and don't blame me if your video card explodes.

Instead of installing everything in one go, I rebooted the system after each installed package, just to be on the safe side. This might not be necessary, though; it might be sufficient to reboot only after purging the old driver, after installing the driver, and again after installing the supplementary packages.

apt purge "*nvidia*"
systemctl reboot
# After reboot the nouveau driver should kick in.
# Jump to TTY1 again.
apt install nvidia-tesla-driver
systemctl reboot
apt install libnvidia-tesla-nvoptix1
# Optional?
apt install nvidia-cuda-toolkit
systemctl reboot
apt install nvidia-tesla-opencl-icd
systemctl reboot
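Once the last reboot is done, a quick sanity check could confirm two things: that the proprietary module replaced nouveau, and that the OptiX runtime library shipped with the driver is visible to the dynamic loader. This is a sketch of my own, not part of the steps above. The helper names are hypothetical, and the check assumes the runtime is named `libnvoptix.so.1`, the library that packages such as libnvidia-tesla-nvoptix1 provide.

```shell
#!/usr/bin/env bash
# Hypothetical post-install check (not part of the original steps).
# has_optix_lib reads `ldconfig -p` output on stdin and succeeds if the
# OptiX runtime library libnvoptix.so.1 is registered with the loader.
has_optix_lib() {
    grep -q 'libnvoptix\.so\.1'
}

# has_nvidia_module reads /proc/modules-style text on stdin and succeeds if
# the proprietary "nvidia" kernel module (rather than nouveau) is loaded.
# The trailing space avoids matching nvidia_uvm, nvidia_drm, etc.
has_nvidia_module() {
    grep -q '^nvidia '
}
```

Both helpers read from stdin, so they can be fed live system output: `ldconfig -p | has_optix_lib` (run as root, since `ldconfig` lives in `/sbin` on Debian) and `has_nvidia_module < /proc/modules`.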

In case of trouble, follow the Troubleshooting and Uninstallation sections of https://wiki.debian.org/NvidiaGraphicsDrivers

So far I've performed only some simple XPU tests. No complex materials or other stuff.

I've already noticed one problem: OptiX fails when I try to render the Crag test geometry. Once this happens, OptiX stays broken for the rest of the Houdini session, so to fix it I need to restart Houdini. Rendering the other available test models works fine. The error that pops up while rendering Crag is:

KarmaXPU: device Type:Optix ID:0 has registered a critical error [cudaErrorIllegalAddress], so will now stop functioning.  Future error messages will be suppressed

Also, sometimes it takes a lot of time for OptiX to initialize.
Edited by ajz3d - Oct. 11, 2022 12:09:15
Staff
533 posts
Joined: May 2019
ajz3d
The error which pops up while rendering Crag is:

I've been trying to get a repro case for this.
Is this repeatable for you?
Is it possible to more thoroughly describe your setup + steps please?
e.g.
- what version of houdini
- what driver version
- what are the minimal number of exact steps needed to reproduce? (preferably from a clean/new launch of houdini)
- etc...

ajz3d
sometimes it takes a lot of time for OptiX to initialize.

Yea, that is the shaders compiling :/
We're working to make this faster.

thanks!
Member
590 posts
Joined: Aug. 2014
It occurs every time, so it's fully reproducible on my machine. I've just tested it on fresh default Houdini user preferences and Crag crashes OptiX every time if specific conditions are met (more info on that after some technicalities).

Here are some technical details you asked for:
- Houdini 19.5.368 - Py3.9 (latest production build)
- nvidia-tesla-driver version 510.85.02
- CUDA version 11.6 (according to nvidia-smi)
- RTX 3070 8GB

Installed NVIDIA libraries:
▶ apt search --names-only "[.]*nvidia[.]*" | grep installed

glx-alternative-nvidia/testing,now 1.2.1 amd64 [installed,automatic]
libegl-nvidia-tesla0/testing,now 510.85.02-1 amd64 [installed,automatic]
libgl1-nvidia-tesla-glvnd-glx/testing,now 510.85.02-1 amd64 [installed,automatic]
libgles-nvidia-tesla1/testing,now 510.85.02-1 amd64 [installed,automatic]
libgles-nvidia-tesla2/testing,now 510.85.02-1 amd64 [installed,automatic]
libglx-nvidia-tesla0/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-egl-gbm1/testing,now 1.1.0-1 amd64 [installed,automatic]
libnvidia-egl-wayland1/testing,now 1:1.1.10-1 amd64 [installed,automatic]
libnvidia-ml-dev/testing,now 11.5.50~11.5.2-2 amd64 [installed,automatic]
libnvidia-tesla-allocator1/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-tesla-cfg1/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-tesla-compiler/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-tesla-cuda1/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-tesla-eglcore/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-tesla-encode1/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-tesla-glcore/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-tesla-glvkspirv/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-tesla-ml1/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-tesla-nvcuvid1/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-tesla-nvoptix1/testing,now 510.85.02-1 amd64 [installed]
libnvidia-tesla-ptxjitcompiler1/testing,now 510.85.02-1 amd64 [installed,automatic]
libnvidia-tesla-rtcore/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-cuda-dev/testing,now 11.5.2-2 amd64 [installed,automatic]
nvidia-cuda-gdb/testing,now 11.5.114~11.5.2-2 amd64 [installed,automatic]
nvidia-cuda-toolkit/testing,now 11.5.2-2 amd64 [installed]
nvidia-cuda-toolkit-doc/testing,testing,now 11.5.2-2 all [installed,automatic]
nvidia-egl-common/testing,now 470.141.03-2 amd64 [installed,automatic]
nvidia-installer-cleanup/testing,now 20220217+1 amd64 [installed,automatic]
nvidia-kernel-common/testing,now 20220217+1 amd64 [installed,automatic]
nvidia-modprobe/testing,now 515.48.07-1 amd64 [installed,automatic]
nvidia-opencl-common/testing,now 470.141.03-2 amd64 [installed,automatic]
nvidia-openjdk-8-jre/testing,now 9.+8u342-b07-1~11.5.2-2 amd64 [installed,automatic]
nvidia-persistenced/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-profiler/testing,now 11.5.114~11.5.2-2 amd64 [installed,automatic]
nvidia-settings-tesla/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-support/testing,now 20220217+1 amd64 [installed,automatic]
nvidia-tesla-alternative/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-tesla-driver/testing,now 510.85.02-1 amd64 [installed]
nvidia-tesla-driver-bin/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-tesla-driver-libs/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-tesla-egl-icd/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-tesla-kernel-dkms/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-tesla-kernel-support/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-tesla-opencl-icd/testing,now 510.85.02-1 amd64 [installed]
nvidia-tesla-smi/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-tesla-vdpau-driver/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-tesla-vulkan-icd/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-visual-profiler/testing,now 11.5.126~11.5.2-2 amd64 [installed,automatic]
nvidia-vulkan-common/testing,now 470.141.03-2 amd64 [installed,automatic]
xserver-xorg-video-nvidia-tesla/testing,now 510.85.02-1 amd64 [installed,automatic]
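A few packages in this list do not share the driver's 510.85.02 version: nvidia-egl-common, nvidia-opencl-common and nvidia-vulkan-common are at 470.141.03, and nvidia-modprobe is at 515.48.07. A hypothetical helper like the one below could flag such packages automatically from the same `apt search ... | grep installed` output. It assumes that "driver-like" versions start with a three-digit branch number, so the CUDA toolkit, glvnd and similar packages are ignored. A mismatch is only a hint worth checking, not proof of a conflict, because some of these packages are versioned independently in Debian.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `apt search ... | grep installed` lines on stdin
# and prints "package version" for NVIDIA packages whose upstream version
# looks like a driver version but differs from the one given in $1.
find_mismatched_nvidia_pkgs() {
    local driver_ver="$1"
    awk -v want="$driver_ver" '
        {
            pkg = $1
            sub(/\/.*/, "", pkg)        # "name/testing,now" -> "name"
            ver = $2
            sub(/^[0-9]+:/, "", ver)    # drop a Debian epoch such as "1:"
            sub(/-[^-]*$/, "", ver)     # drop the Debian revision such as "-1"
            # Only compare versions starting with a 3-digit branch (470., 510., ...)
            if (ver ~ /^[0-9][0-9][0-9]\./ && ver != want)
                print pkg, ver
        }'
}
```

For this listing, the invocation could look like `apt search --names-only nvidia 2>/dev/null | grep installed | find_mismatched_nvidia_pkgs 510.85.02`.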

Let me know if you need anything else.

What are the minimal number of exact steps needed to reproduce? (preferably from a clean/new launch of houdini)

1. Start Houdini and jump to the Stage.
2. Drop a Karma LOP and set it to XPU.
3. Switch the viewport to Karma rendering.
4. Drop a Pig Head.
5. Drop Crag.
6. Drop a dome light and put it between Pig Head and Karma LOP.
7. Assign a texture to Dome Light (for example, the bundled kiara_5_noon).
8. Now pipe Crag to the input of Dome Light, replacing the Pig Head. This will crash OptiX.

What I noticed while writing this post and testing even more is that it seems to crash whenever I switch between Crag and other geometry while some kind of light is present in the stage. I just had it crash with a distant light, so it's not limited to the dome light. I've cobbled together a simple test scene in which I can reproduce the crash 100% of the time. Fiddle with the switch while Karma XPU rendering is enabled in the viewport, and once you reach Crag (the 7th input) it should crash.

And if it doesn't... Well, then it's probably one of those problems of using Tesla driver on a consumer-grade RTX. :/
Edited by ajz3d - Oct. 12, 2022 09:19:04

Attachments:
optix_crag_crash_test.hiplc.tar.gz (49.4 KB)

Staff
533 posts
Joined: May 2019
Thanks for sending this stuff through.
We're failing to reproduce the Optix failure (on both Linux and Windows), so maybe it is a driver thing as you say...
But I've put it on file in a bug report, so will refer back to it if we find this issue again.
thanks again!
Member
590 posts
Joined: Aug. 2014
I was afraid you were going to say this. Nevertheless, thanks for trying to help, @brians.

Meanwhile, I continued my own investigation. First I uninstalled nvidia-cuda-toolkit and temporarily removed the OCIO envar, just to eliminate them as potential sources of the problem. That didn't help. The only thing this told me is that the toolkit package is not required for XPU, so you can strike it from the instructions I posted recently.

But then, I was able to come up with something quite interesting.

@Brians, could you perhaps check out the attached scene and compare it to the one I posted before? At first glance, they look exactly the same, however in this one Crag works without problems, every single time so far.

This new scene is derived from the previous one, but I was doing some experimentation on it. For example, I temporarily bypassed the light and camera and piped Crag directly to the Karma LOP. I saved the scene and restarted Houdini. I moved the timeline to a different frame, saved and restarted, etc. You know, all sorts of random things we do while trying to figure out a problem.

Then, after the nth restart and reload, I noticed that OptiX no longer crashes on Crag in this scene. So I piped everything back into its former place, restored connections, removed bypasses, and saved again. After restarting Houdini, reloading the scene, and fiddling with the switch, I get no more crashes in this scene. Quite surprising, really.

Naturally, the old scene still remains borked.

You guys probably have some kind of inside tools for cracking open and analyzing HIP files, so perhaps comparing the two scenes can shed some light on what is so significantly different between them that it makes Crag crash OptiX in the old one, but not in the new one.

PS. For what it's worth, there's something I forgot to mention while listing the technical details. I'm using a custom HOUDINI_USER_PREF_DIR envar in .xsessionrc:
export HOUDINI_USER_PREF_DIR=$HOME/.config/houdini19.5
It probably doesn't matter at all, but I know that while debugging certain problems, sometimes small insignificant things like that can help to reproduce the bug.
Edited by ajz3d - Oct. 14, 2022 17:00:31
Member
590 posts
Joined: Aug. 2014
Today the nvidia-driver 510.85.02-5 metapackage migrated to Debian Testing.

https://tracker.debian.org/pkg/nvidia-graphics-drivers

I'm going to wait a couple of days to let all the dependency packages migrate too, and then I'll replace nvidia-tesla-driver with it. I wonder if it's going to solve these strange Crag OptiX crashes.
Staff
533 posts
Joined: May 2019
ajz3d
I wonder if it's going to solve these strange Crag OptiX crashes.

It would be amazing if it did.
We're still looking at this issue, FYI. It's a very stubborn bug, with heisenbug behaviours. But we'll get there eventually.
Member
590 posts
Joined: Aug. 2014
Thanks for the info, @brians. I'll keep my fingers crossed.

By the way, I noticed that I have a history of changes (sort of) that I made to that original scene, up to the state where it stopped crashing OptiX. Each file represents a point in time when I saved and restarted Houdini after altering the scene in some way. The readme.org file contains some more information about each file. Maybe you will find it helpful.
Edited by ajz3d - Oct. 24, 2022 09:43:10

Attachments:
xpu_test_working_history.tar.bz2 (318.4 KB)

Member
590 posts
Joined: Aug. 2014
@Brians, unfortunately the new "non-tesla" nvidia-driver 510.85.02-5 doesn't seem to make any difference.
The old scene still crashes, the modified one does not.

I also remade the scene with the most recent Houdini production build (19.5.403), thinking that maybe it would not crash when combined with the standard nvidia-driver, but the problem persists.

glx-alternative-nvidia/testing,now 1.2.1 amd64 [installed,automatic]
libcuda1/testing,now 510.85.02-5 amd64 [installed,automatic]
libegl-nvidia0/testing,now 510.85.02-5 amd64 [installed,automatic]
libgl1-nvidia-glvnd-glx/testing,now 510.85.02-5 amd64 [installed,automatic]
libgles-nvidia1/testing,now 510.85.02-5 amd64 [installed,automatic]
libgles-nvidia2/testing,now 510.85.02-5 amd64 [installed,automatic]
libglx-nvidia0/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvcuvid1/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvidia-allocator1/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvidia-cfg1/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvidia-compiler/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvidia-egl-gbm1/testing,now 1.1.0-1 amd64 [installed,automatic]
libnvidia-egl-wayland1/testing,now 1:1.1.10-1 amd64 [installed,automatic]
libnvidia-eglcore/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvidia-encode1/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvidia-glcore/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvidia-glvkspirv/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvidia-ml1/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvidia-ptxjitcompiler1/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvidia-rtcore/testing,now 510.85.02-5 amd64 [installed,automatic]
libnvoptix1/testing,now 510.85.02-5 amd64 [installed]
libvdpau1/testing,now 1.5-1 amd64 [installed,automatic]
libxnvctrl0/testing,now 510.85.02-2 amd64 [installed,automatic]
nvidia-alternative/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-driver/testing,now 510.85.02-5 amd64 [installed]
nvidia-driver-bin/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-driver-libs/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-egl-common/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-egl-icd/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-installer-cleanup/testing,now 20220217+1 amd64 [installed,automatic]
nvidia-kernel-common/testing,now 20220217+1 amd64 [installed,automatic]
nvidia-kernel-dkms/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-kernel-support/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-legacy-check/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-modprobe/testing,now 515.48.07-1 amd64 [installed,automatic]
nvidia-opencl-common/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-opencl-icd/testing,now 510.85.02-5 amd64 [installed]
nvidia-persistenced/testing,now 510.85.02-1 amd64 [installed,automatic]
nvidia-settings/testing,now 510.85.02-2 amd64 [installed,automatic]
nvidia-smi/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-support/testing,now 20220217+1 amd64 [installed,automatic]
nvidia-vdpau-driver/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-vulkan-common/testing,now 510.85.02-5 amd64 [installed,automatic]
nvidia-vulkan-icd/testing,now 510.85.02-5 amd64 [installed,automatic]
vdpau-driver-all/testing,now 1.5-1 amd64 [installed,automatic]
xserver-xorg-video-nouveau/testing,now 1:1.0.17-2 amd64 [installed,automatic]
xserver-xorg-video-nvidia/testing,now 510.85.02-5 amd64 [installed,automatic]
Edited by ajz3d - Oct. 24, 2022 11:50:53
Staff
533 posts
Joined: May 2019
ajz3d
maybe it will not crash when combined with the standard nvidia-driver, but the problem persists.

thanks for trying
User Avatar
Staff
533 posts
Joined: May 2019
An update:

We suspect this has been a driver issue, introduced sometime after 495.89, but it's hard to be sure given the issue exhibits heisenbug behavior. But tentatively we've arrived at this...

Our testing suggests these are fine
- 495.89
- 496.13

And it seems this is now fine too
- 526.86

But anything in the middle could be broken (we have not exhaustively searched for the exact range, sorry). If it's not possible for you to move to 526.86, could you perhaps roll back to a version as close to 495.89 as possible? (But not earlier, as that is the minimum required version for XPU.)
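The tentative boundaries above can be written down as a small shell sketch for checking a candidate driver version. The helper names are hypothetical, and the categories only encode what this post reports. Versions 495.89, 496.13 and 526.86 were tested OK, and anything below 495.89 is below the XPU minimum. Versions in between were not searched exhaustively, so "possibly affected" is a conservative label, not a diagnosis.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 >= version $2 (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Classify a driver version using only the tentative data points above.
classify_xpu_driver() {
    local v="$1"
    case "$v" in
        495.89|496.13|526.86) echo "tested OK"; return ;;
    esac
    if ! version_ge "$v" 495.89; then
        echo "unsupported"          # below the XPU minimum
    elif version_ge "$v" 526.86; then
        echo "newer than tested"    # only 526.86 itself was reported fine
    else
        echo "possibly affected"    # in the untested middle range
    fi
}
```

For example, 470.141.03 comes out as "unsupported", while 510.85.02 and 515.48.07 fall into the "possibly affected" range.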

thanks!
Member
590 posts
Joined: Aug. 2014
Thanks for the update, @Brians.

Unfortunately no nvidia-drivers "near" those version numbers are currently available on Debian. It's either 470.xx (Debian Stable), which we know doesn't work with XPU; 510.85.02-6, which I'm currently using (available in Testing and Unstable); or 515.48.07-1 (non-installable), which was uploaded to Debian Experimental at the beginning of the month.

Neither does nvidia-tesla-driver qualify: 510.85.02-4.

So the only way for me to test this now, would be to install nvidia-driver directly from the upstream, but like I wrote in my past posts, I'm unwilling to do it for security and stability reasons. I know you'll understand.

Nevertheless, I can promise you to do the testing once the appropriate driver version becomes available in the official Debian Testing repository.