Husk.exe stuck not finishing task

Member
47 posts
Joined: Feb. 2020
Hi,

I have an issue with husk.exe getting "stuck" and not completing a render. It happens on random frames, but often early on: for example, with a 1001-1100 frame range it usually happens within the first 20 frames or so. I have noticed it tends to happen more often, though not exclusively, with RAM-intensive render products and with deep EXR output.
I am using the husk command "husk --restart-delegate 1" as the Render Command in the USD Render ROP.


I have attached a screenshot of two render logs. In the top log you can see that husk.exe finished the env_deep.1003.exr task and sent a signal back to PDG, resulting in PDG_RESULT.

In the second (bottom) log you can see that husk does not finish the task completely, so no signal is sent back to PDG to let it move on to the next task. I do, however, get all files on disk, so from a rendering point of view the task is "done".



Notice in Task Manager how husk.exe is holding on to RAM and using a small amount of CPU. No matter how long I leave husk.exe running, it never completes and never sends a signal to PDG.




This happens with both localScheduler and deadlineScheduler.
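Because every frame is fully written to disk before husk stops responding, one stopgap is a wrapper that treats settled output files as the real completion signal. This is a hypothetical sketch, not a husk, PDG, or Deadline feature: `outputs_ready`, `wait_or_reap`, and the settle/poll values are assumptions. A lingering process is only terminated once every expected file exists and has not been modified for `settle_s` seconds.

```python
import os
import subprocess
import time


def outputs_ready(paths, min_age_s):
    """True when every expected output exists and has not been modified for min_age_s."""
    if not paths:
        return False  # nothing to check: never treat an empty list as "done"
    now = time.time()
    for p in paths:
        try:
            st = os.stat(p)
        except FileNotFoundError:
            return False
        if now - st.st_mtime < min_age_s:
            return False  # file may still be being written
    return True


def wait_or_reap(proc, expected_outputs, settle_s=60.0, poll_s=5.0):
    """Wait for a render process; if it lingers after its outputs have settled, stop it.

    Returns "exited" if the process ended on its own, "reaped" if it had to be terminated.
    """
    while True:
        if proc.poll() is not None:
            return "exited"
        if outputs_ready(expected_outputs, settle_s):
            proc.terminate()
            try:
                proc.wait(timeout=30)
            except subprocess.TimeoutExpired:
                proc.kill()  # terminate was ignored, force it
                proc.wait()
            return "reaped"
        time.sleep(poll_s)
```

A terminated process does not exit with 0, so a wrapper like this would have to report success to the scheduler itself. It also only hides the hang: a file that stalls mid-write for longer than `settle_s` would be wrongly accepted.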


Any help would be greatly appreciated!

Thanks
Edited by kskovbo - July 23, 2023 17:29:23

Attachments:
huskIssue.png (410.1 KB)
huskIssue2.png (25.5 KB)

Member
47 posts
Joined: Feb. 2020
Is anyone from SideFX able to shed some light on this issue, please? At least to understand the expected behaviour of husk when a render completes.


thanks
Member
2 posts
Joined: Feb. 2019
kskovbo
Anyone from sideFX able to shine a light on this issue please? At least to understand the expected behaviour of Husk when a render completes.


thanks

I just started running into the same issue. Nothing has changed on my end (still using 19.5.640), but suddenly husk will not complete the task until I kill the process. I'm using a USD Render ROP with Redshift. The images are written to disk correctly, but the task never reports back as either failed or complete. Same results using Deadline or a local Render in Background.
Member
47 posts
Joined: Feb. 2020
To follow up on this topic, we tried a very simple scene using just the USD Render ROP, and after 8 frames it got stuck with husk.exe sitting there using one core. We waited over 2 hours to see if it would eventually finish, but it did not. We feel this rules out anything to do with PDG/Deadline, so it must be an issue with husk.exe and the render delegate. We are using RenderMan, but I can see above that tntbruin has the same issue with Redshift.

Have there been any updates to Husk in recent patches?

Thanks
Staff
4525 posts
Joined: July 2005
Did anyone submit a bug about this? The more "vanilla" the setup the better (i.e. if this can be reproduced with a local scheduler, using Karma instead of Redshift or PRMan, that would greatly increase the likelihood of us tracking it down). Alternatively, if anyone is able to reproduce this on a system other than Windows, maybe you could attach gdb to husk and generate a stack trace so we can see what husk is doing when it gets stuck like this?
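On Linux, the stack trace requested above can be captured without an interactive gdb session. A minimal sketch, assuming gdb is installed and the stuck husk process ID is known (for example from `pgrep -n husk`); `gdb_backtrace_cmd` and `dump_backtrace` are hypothetical helper names.

```python
import shutil
import subprocess


def gdb_backtrace_cmd(pid, gdb="gdb"):
    """Build a non-interactive gdb command that prints a backtrace of every thread in `pid`."""
    return [gdb, "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtrace(pid, log_path, gdb="gdb"):
    """Attach gdb to a running (stuck) process and write all thread backtraces to log_path.

    gdb pauses the process while attached and detaches when the batch run ends.
    Attaching usually requires running as the same user as the target (or root),
    depending on the system's ptrace settings. Returns gdb's exit code.
    """
    if shutil.which(gdb) is None:
        raise FileNotFoundError(f"{gdb} not found on PATH")
    with open(log_path, "w") as log:
        result = subprocess.run(gdb_backtrace_cmd(pid, gdb), stdout=log,
                                stderr=subprocess.STDOUT, check=False)
    return result.returncode
```

Taking two or three dumps a minute apart shows whether husk is spinning in the same place (for example during delegate teardown) or still making slow progress.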

We've got a lot of people at siggraph this week so it may take some time to get back to you on this.

Thanks for any assistance,
Mark
Member
47 posts
Joined: Feb. 2020
Hello,

I did submit a bug report, yes. I managed to boil the issue down to a simple scene that reproduces it, which I attached.

I am still getting the issue using just the USD Render ROP node (no PDG or Deadline). The environment scene I am rendering is fairly complex geometrically, but even with simple shaders and no textures, husk.exe gets stuck with RenderMan. I am using RenderMan 25.2 and Houdini 19.5.640.

The same scene works if I switch the render delegate to Karma, but with RenderMan, husk.exe still gets stuck.


Let me know if any more information is needed.


Thanks
Member
136 posts
Joined: Oct. 2020
+1, a friend and I are having the same issue with Arnold and Karma using husk on multiple machines, both locally and on multiple Deadline farms. I managed to work around the issue temporarily by enabling "cook all frames as a single process" and then "restart delegate" every 1 frame. This works with husk and Arnold, but I'm not sure how I would translate that to Karma on the farm.
https://www.youtube.com/channel/UC4NQi8wpYUbR9wLolfHrZVA
Member
20 posts
Joined: March 2019
Hi!

Did anyone find a solution to these stuck husk processes? We're having the same issue using Houdini 19.5.435 and htoa-6.2.3.2 with Tractor.
Member
1 posts
Joined: March 2022
I'm running into the same issue with Houdini 19.5.640 and RenderMan 25.2. Is there any update, or at least a workaround, for this issue?
Member
4 posts
Joined: Sept. 2020
We are having a similar issue with Arnold (HtoA) in 19.5.640+ on Linux (it does not happen in 19.5.421). In this case, it's the Houdini process that gets stuck.

We managed to pare it down to a simple scene with a sphere exported via the command line to an Arnold .ass file; this repeatedly results in a stuck process that holds on to its RAM despite the job finishing. Doing the same thing in Mantra without HtoA loaded causes no issue. Sometimes it also happens just from loading Houdini with the plugin, though this is more random.

SideFX support logged #132288 as a bug, but it's not clear to me that this is the identical problem. It sure seems similar, though.

Oddly enough, I couldn't get it to repeat if I ran the process as root.
Member
136 posts
Joined: Oct. 2020
Can you report it on the Arnold forums?
Member
4 posts
Joined: Sept. 2020
I have reported it to the Arnold team on the beta and in the forums, and will log an official ticket. I'm still not 100% sure this is just HtoA, given the similar issues with other renderers in this thread.
Member
4 posts
Joined: Sept. 2020
For people running into this issue with Arnold, setting these environment variables may be a workaround:

ARNOLD_ADP_DISABLE=1
ARNOLD_CER_ENABLED=0

This only worked for us with HtoA 6.2.1.0+.
Edited by lcymet-wefx - Nov. 10, 2023 18:47:10
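If husk (or hython) is launched from your own script rather than relying on a farm-wide environment, the two Arnold variables can be injected per process. A small sketch; `workaround_env` and `run_render` are hypothetical helpers, and on Deadline or Tractor you would more likely set the variables in the job or worker environment instead.

```python
import os
import subprocess

# Workaround reported in this thread for HtoA 6.2.1.0+ (not an official fix).
ARNOLD_HANG_WORKAROUND = {
    "ARNOLD_ADP_DISABLE": "1",
    "ARNOLD_CER_ENABLED": "0",
}


def workaround_env(base=None):
    """Return a copy of `base` (default: the current environment) with the workaround set."""
    env = dict(os.environ if base is None else base)
    env.update(ARNOLD_HANG_WORKAROUND)  # overrides any existing values
    return env


def run_render(cmd):
    """Run a render command (e.g. a husk invocation) with the workaround variables set."""
    return subprocess.run(cmd, env=workaround_env(), check=False).returncode
```

Copying the environment rather than mutating `os.environ` keeps the workaround scoped to the render process, so the launching script itself is unaffected.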
Member
61 posts
Joined: Oct. 2013
Just to hop in here, we're experiencing the same issue w/ Renderman 25.2, Houdini 19.5.569

Agree it seems suspicious that it's happening w/ both Renderman and Arnold
Grant Miller
VFX Supervisor
Ingenuity Studios
Member
4 posts
Joined: Sept. 2020
blented
Just to hop in here, we're experiencing the same issue w/ Renderman 25.2, Houdini 19.5.569

Agree it seems suspicious that it's happening w/ both Renderman and Arnold

Yeah - that's what's weird and scary to me, but it sounds like any number of things can cause an unclean termination. The approach we took with Arnold was to test the same Houdini version against different plugin versions until we isolated the version that failed, then work with the Arnold team until we figured it out.
Member
61 posts
Joined: Oct. 2013
There's a corresponding RenderMan forum post w/ sadly a similar lack of info / traction:
https://renderman.pixar.com/forum/showthread.php?s=&postid=265604#post265604

Definitely seems like the non-Karma renderers aren't "releasing resources" or "finishing" properly. Really hoping devs on both sides can put their heads together and figure this out, as it's top of the list for us right now as we attempt to migrate to RenderMan 25.2 / Houdini 19.5.

We're running a test w/ Houdini 20 tomorrow to see if the issue persists there, will update this thread w/ the results 👍
Staff
2642 posts
Joined: July 2005
husk has a --fast-exit option:
  --fast-exit arg (=2)   0 - Force a full tear down of the USD scene and Hydra
                         1 - Fast exit lets the OS tear down resources
                         2 - Use setting in UsdRenderers.json to use delegate preference

By default, this uses the setting in UsdRenderers.json, but perhaps this isn't set for the delegates? Does manually adding --fast-exit 0 fix the issue?

By default, the effective fast-exit value should be 0, so husk will tear down the delegate on exit.
Edited by mark - Nov. 26, 2023 18:00:58
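To test this suggestion reproducibly (for example from a wrapper or farm job script), the flags can be assembled in one place. A sketch only: `husk_cmd` is a hypothetical helper, only `--fast-exit` and `--restart-delegate` come from this thread (the latter assumed, as described above, to take a frame interval), and any other flags passed through `extra_args` should be checked against `husk --help` for your build.

```python
def husk_cmd(usd_file, fast_exit=0, restart_delegate=1, husk="husk", extra_args=()):
    """Build a husk argument list with an explicit --fast-exit mode.

    fast_exit: 0 = full teardown of the USD scene and Hydra,
               1 = let the OS tear down resources,
               2 = use the delegate preference from UsdRenderers.json.
    restart_delegate: value passed to --restart-delegate.
    extra_args: any other husk flags, passed through unchanged.
    """
    if fast_exit not in (0, 1, 2):
        raise ValueError(f"--fast-exit must be 0, 1 or 2, got {fast_exit!r}")
    cmd = [husk, "--fast-exit", str(fast_exit),
           "--restart-delegate", str(restart_delegate)]
    cmd.extend(extra_args)
    cmd.append(usd_file)  # the USD file to render goes last
    return cmd
```

Forcing 0 explicitly removes any dependence on whether the delegate's UsdRenderers.json entry sets a preference, which is the question raised above.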
Member
47 posts
Joined: Feb. 2020
Hi,

I mentioned this on the RenderMan forum page, but no, fast-exit did not fix it for us. We tried combinations of the flags listed on this page: https://www.sidefx.com/docs/houdini/ref/utils/husk.html

On our end this is still an unresolved issue that we never managed to fix.
Member
61 posts
Joined: Oct. 2013
We've spent the day debugging this to no avail unfortunately.

Tested:

  • fast exit = 0
  • rendering locally then copying to the network (Houdini consistently hates our SMB server)
  • reducing scene complexity / instance count
  • testing w/ several other scenes

There seems to be a loose correlation between the scene complexity and our chances of getting hung frames, with greater complexity giving a much greater chance. We reduced the instance count by half, and then to 20%, and went from most frames hanging to perhaps a third of the frames hanging.

We've also pulled in additional scenes to test, and are getting the same hanging in a simple high-poly river scene that we see in a scatter-heavy instanced scene, but with less consistency, only hanging or slowing on about 10% of the frames.

Some of our hung frames do eventually complete, often taking 4-20x longer than the original frames. Our best test case for this scenario has been the river scene, which is normally 2 minutes per frame, but consistently has several frames out of 100 taking 30+ minutes or never completing.

Progress on the hung frames often stalls out at 50-99%, with the logs simply stopping after that.
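Since hung frames show up as outliers (4-20x the normal frame time), a per-frame timeout derived from previous frames can at least turn an indefinite hang into a retry. A hypothetical sketch: `frame_timeout` and `render_frame` are illustrative names, and the 4x factor, 5-minute floor, and retry count are assumptions. `subprocess.run` kills the child process when the timeout expires.

```python
import statistics
import subprocess


def frame_timeout(past_durations_s, factor=4.0, minimum_s=300.0):
    """Pick a per-frame timeout from earlier frame times (median x factor, with a floor)."""
    if not past_durations_s:
        return None  # no history yet: wait indefinitely rather than guess
    return max(minimum_s, factor * statistics.median(past_durations_s))


def render_frame(cmd, timeout_s, retries=2):
    """Run one frame's render command, killing and retrying it if it overruns timeout_s.

    Returns the attempt number that succeeded; raises RuntimeError if every attempt
    timed out. A non-zero exit raises subprocess.CalledProcessError as usual.
    """
    for attempt in range(1, retries + 2):
        try:
            subprocess.run(cmd, timeout=timeout_s, check=True)
            return attempt
        except subprocess.TimeoutExpired:
            continue  # subprocess.run has already killed the child
    raise RuntimeError(f"frame timed out {retries + 1} times: {cmd}")
```

With the river scene's roughly 2-minute frames, `frame_timeout([120, 118, 125])` gives 480 seconds, so a frame stuck at 30+ minutes would be killed and retried after 8 minutes instead of blocking the farm.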

Apologies for the lengthy post, just trying to give as much info as we can to help solve this.

Windows 10 / Houdini 19.5.569 / RenderMan 25.2
Member
4 posts
Joined: Sept. 2016
Has anyone found a solution to husk getting stuck? I am having this problem when trying to render volumes.
I have tried all the Houdini 20 updates; I am now using 20.506 and I still have the problem.