CUDA_ERROR_INVALID_VALUE

Member
210 posts
Joined: Aug. 2015
Online
I'm trying to render a scene on Linux. It all starts fine, but after a bit I get this error:

2025-03-16 08:04:48: 0: STDOUT: First Pixel: 1:11.24
2025-03-16 08:04:59: 0: STDOUT: 0.8% Lap= 0:01:21.03 Left= 2:51:31 Mem= 30.38 GiB Peak= 30.38 GiB
2025-03-16 08:05:19: 0: STDOUT: 1.6% Lap= 0:01:41.21 Left= 1:46:16 Mem= 32.44 GiB Peak= 32.44 GiB
2025-03-16 08:05:21: 0: STDOUT: 2.3% Lap= 0:01:43.40 Left= 1:11:48 Mem= 32.65 GiB Peak= 32.65 GiB
2025-03-16 08:05:23: 0: STDOUT: 3.1% Lap= 0:01:44.87 Left= 0:54:11 Mem= 32.72 GiB Peak= 32.72 GiB
2025-03-16 08:05:29: 0: STDOUT: 3.9% Lap= 0:01:51.70 Left= 0:45:47 Mem= 32.81 GiB Peak= 32.81 GiB
2025-03-16 08:05:35: 0: STDOUT: KarmaXPU: device Type:Optix ID:1 has registered a critical error , so will now stop functioning. Future error messages will be suppressed
2025-03-16 08:05:38: 0: STDOUT: 4.7% Lap= 0:01:59.76 Left= 0:40:35 Mem= 32.99 GiB Peak= 32.99 GiB
2025-03-16 08:05:42: 0: STDOUT: 5.5% Lap= 0:02:03.76 Left= 0:35:39 Mem= 32.99 GiB Peak= 32.99 GiB
2025-03-16 08:05:50: 0: STDOUT: 6.2% Lap= 0:02:12.18 Left= 0:33:02 Mem= 32.99 GiB Peak= 32.99 GiB
2025-03-16 08:05:52: 0: STDOUT: 7.0% Lap= 0:02:14.04 Left= 0:29:32 Mem= 32.99 GiB Peak= 32.99 GiB
2025-03-16 08:05:53: 0: STDOUT: KarmaXPU: device Type:Optix ID:0 has registered a critical error , so will now stop functioning. Future error messages will be suppressed
2025-03-16 08:06:09: 0: STDOUT: 7.8% Lap= 0:02:31.78 Left= 0:29:51 Mem= 32.99 GiB Peak= 32.99 GiB
2025-03-16 08:06:30: 0: STDOUT: 8.6% Lap= 0:02:51.80 Left= 0:30:27 Mem= 32.99 GiB Peak= 32.99 GiB
2025-03-16 08:06:50: 0: STDOUT: 9.4% Lap= 0:03:11.85 Left= 0:30:54 Mem= 32.99 GiB Peak= 32.99 GiB
2025-03-16 08:07:09: 0: STDOUT: 10.2% Lap= 0:03:31.59 Left= 0:31:11 Mem= 32.99 GiB Peak= 32.99 GiB
2025-03-16 08:07:29: 0: STDOUT: 10.9% Lap= 0:03:51.40 Left= 0:31:24 Mem= 32.99 GiB Peak= 32.99 GiB

After that the GPUs stop working and only the CPU continues. The next frame starts fine again, then the same error pops up and the GPUs are no longer used.

Any ideas?

Houdini 20.5.522, Nobara 41 Linux, dual 4090, Driver Version: 570.124.04, CUDA Version: 12.8
Member
210 posts
Joined: Aug. 2015
Online
OK, to expand: I got the same thing on Windows as well, with a fresh install and the latest Studio drivers.

2025-03-16 16:09:40: 0: STDOUT: KarmaXPU: device Type:Optix ID:0 has registered a critical error , so will now stop functioning. Future error messages will be suppressed
2025-03-16 16:09:44: 0: STDOUT: 71.9% Lap= 0:03:22.32 Left= 0:01:19 Mem= 80.59 GiB Peak= 80.59 GiB
2025-03-16 16:09:46: 0: STDOUT: KarmaXPU: device Type:Optix ID:1 has registered a critical error , so will now stop functioning. Future error messages will be suppressed
2025-03-16 16:10:09: 0: STDOUT: 72.7% Lap= 0:03:47.63 Left= 0:01:25 Mem= 80.59 GiB Peak= 80.59 GiB
Member
210 posts
Joined: Aug. 2015
Online
Just to add: I tried another project (two characters, hair, the inside of a pharmacy) and it renders blazing fast with no hiccups at all. All machines are rendering clean and smooth.
Can it be that something in the scene is just too much?

One more thing I've just noticed.
In the first project, where my machine is failing, the CPU and both GPUs are at 100% usage, but GPU power draw is poor, around 90 W.
When rendering the other project, they draw power properly with no issues. Screenshots attached.
Edited by Mirko Jankovic - March 16, 2025 14:30:33

Attachments:
W_1.png (42.0 KB)
W_2.png (67.8 KB)

Member
210 posts
Joined: Aug. 2015
Online
And one more update. After pruning some objects that are not directly visible in one of the shots, my machine is working properly again, and full power draw is visible on the GPUs when rendering. And no crashing.
Staff
542 posts
Joined: May 2019
Offline
Mirko Jankovic
2025-03-16 16:09:40: 0: STDOUT: KarmaXPU: device Type:Optix ID:0 has registered a critical error , so will now stop functioning. Future error messages will be suppressed

Sorry but the forum strips the most important part of that message out, the actual error :/
can you type it manually? or perhaps post a screenshot?

(ps: I'll rework our error-messages so they have a format that won't get stripped by the forum)

thanks
Staff
542 posts
Joined: May 2019
Offline
Mirko Jankovic
And one more update. After pruning some objects that are not directly visible in one of the shots, my machine is working properly again, and full power draw is visible on the GPUs when rendering. And no crashing.

What's the memory usage like on your GPUs (and on your CPU) when you have issues?
Member
210 posts
Joined: Aug. 2015
Online
There is nothing more. I just posted a screenshot of the error on that other thread as well, but here it is again. It just gives that error, the GPU stops being used, and rendering continues on the CPU.

I will test again tomorrow to get relevant readings from GPU and sys ram as well.
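
For anyone wanting to capture those readings, here is a minimal sketch, assuming `nvidia-smi` is on the PATH. The queried field names are standard `nvidia-smi --query-gpu` properties; the function names, log path and polling interval are hypothetical choices, not anything from Houdini or this thread.

```python
import csv
import io
import subprocess
import time

# These names are real `nvidia-smi --query-gpu` properties; everything else
# (function names, log path, interval) is a hypothetical choice for this sketch.
FIELDS = ["timestamp", "index", "memory.used", "memory.total",
          "power.draw", "utilization.gpu"]


def parse_smi_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) != len(FIELDS):
            continue  # skip blank or unexpected lines
        rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def vram_fraction(row):
    """Fraction of VRAM in use (memory values are MiB when nounits is set)."""
    return float(row["memory.used"]) / float(row["memory.total"])


def sample_gpus():
    """Query all GPUs once. Requires nvidia-smi on the PATH."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True)
    return parse_smi_csv(result.stdout)


def log_gpus(path, interval_s=1.0, samples=60):
    """Append `samples` readings per GPU to a CSV file, one poll per interval."""
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        for _ in range(samples):
            writer.writerows(sample_gpus())
            handle.flush()  # keep the file current if the render dies
            time.sleep(interval_s)
```

Running something like `log_gpus("gpu_log.csv")` in a second terminal while the frame renders should give per-second VRAM and power readings that can be lined up against the timestamps in the render log.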

Attachments:
20250318_145703.jpg (2.5 MB)

Staff
542 posts
Joined: May 2019
Offline
Thanks Mirko
I saw Julca's (+ your) reply on the other thread.
I know Julca is busy, but were you able to reproduce this CUDA_ERROR_INVALID_VALUE on a stripped-down scene? Or does it only happen on the full scene?
Member
210 posts
Joined: Aug. 2015
Online
As soon as I prune parts of the scene it works fine, so I suspect that it is a VRAM problem.
Here is another screenshot, with nvtop as well, to show what is going on with the GPUs, if it helps:

In the upper graph I do see a moment where it seems GPU memory hits the top, and it fails after that.
I'm rendering more frames like that one now and hope to catch the exact moment of failure again.
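
To pin down that moment from the render log itself, here is a rough sketch that scans log lines in the format posted earlier in this thread and reports, for each device error, the last progress line seen before it. The regular expressions are written against those pasted lines only, and the function name is hypothetical. What the `Mem=` figure measures is not stated in the thread, so treat it as whatever Karma reports rather than as VRAM.

```python
import re
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"
STAMP = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): \d+: STDOUT: "

# Progress lines such as: "3.9% Lap= 0:01:51.70 Left= 0:45:47 Mem= 32.81 GiB Peak= 32.81 GiB"
PROGRESS = re.compile(
    STAMP + r"([\d.]+)% Lap=.*? Mem=\s*([\d.]+) GiB Peak=\s*([\d.]+) GiB")

# Device error lines such as: "KarmaXPU: device Type:Optix ID:1 has registered a critical error"
DEVICE_ERR = re.compile(
    STAMP + r"KarmaXPU: device Type:(\w+) ID:(\d+) has registered a critical error")


def device_failures(lines):
    """Return one dict per device error, with the last progress reading before it."""
    last_progress = None
    failures = []
    for line in lines:
        match = PROGRESS.match(line)
        if match:
            last_progress = {
                "time": datetime.strptime(match.group(1), TS_FORMAT),
                "percent": float(match.group(2)),
                "mem_gib": float(match.group(3)),
                "peak_gib": float(match.group(4)),
            }
            continue
        match = DEVICE_ERR.match(line)
        if match:
            failures.append({
                "time": datetime.strptime(match.group(1), TS_FORMAT),
                "device_type": match.group(2),
                "device_id": int(match.group(3)),
                "last_progress": last_progress,  # None if no progress line yet
            })
    return failures
```

Feeding it the lines of a saved log (for example `device_failures(open("render.log"))`, with a hypothetical file name) gives the failure timestamps, which can then be compared with GPU memory readings taken at the same time.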

Edited by Mirko Jankovic - March 19, 2025 04:54:28

Attachments:
error.png (337.9 KB)

Member
210 posts
Joined: Aug. 2015
Online
Another screenshot. In this case there wasn't full usage, but it still registered a failure:

Attachments:
Enter_a_filename.png (651.8 KB)

Member
210 posts
Joined: Aug. 2015
Online
Small update: Testing again on Windows on this problematic machine. Installed the latest Nvidia Studio driver update that just came out, and so far it seems to be stable. Got 5 frames out with no issues. Will see how it goes. After that, I will reinstall Linux and try again, as two other machines with Linux, also freshly installed, are working fine.
An update after another fresh install of Nobara 41: It seems to work fine, rendering rather stably! Hope it will stay that way for now, besides Wayland issues, that is.
Edited by Mirko Jankovic - March 19, 2025 09:19:38
Member
210 posts
Joined: Aug. 2015
Online
Well, the joy was short-lived. The same error came back; it seemed a bit more stable, but it is still there. Is the scene too complex, so that it hits the VRAM limit and crashes?
Staff
2560 posts
Joined: Sept. 2007
Offline
Hi Mirko! We should take this out of the forum and into the bug database. If you haven't already logged a bug for this, can you please do so along with your scene file? Thanks!

https://www.sidefx.com/forum/topic/15603/
Chris McSpurren
Senior Quality Assurance Specialist
SideFX