Houdini 18 HQueue Error

User Avatar
Member
132 posts
Joined: September 2018
Offline
Hey guys,

I'm getting this error when trying to render with HQueue.

Failed to save output to file "Traceback (most recent call last):
  File "C:/PROGRA~1/SIDEEF~1/HOUDIN~1.287/houdini/soho/python2.7/HQrender.py", line 361, in <module>
    render()
  File "C:/PROGRA~1/SIDEEF~1/HOUDIN~1.287/houdini/soho/python2.7/HQrender.py", line 150, in render
    hqrop.submitJob(parms, _submitRenderJob)
  File "C:/PROGRA~1/SIDEEF~1/HOUDIN~1.287/houdini/python2.7libs\hqrop.py", line 35, in submitJob
    submit_function(parms)
  File "C:/PROGRA~1/SIDEEF~1/HOUDIN~1.287/houdini/soho/python2.7/HQrender.py", line 231, in _submitRenderJob
    parms, hip_file)
  File "C:/PROGRA~1/SIDEEF~1/HOUDIN~1.287/houdini/soho/python2.7/HQrender.py", line 350, in _getProjectName
    hqrop.getHQueueServerMachineFromURL(hq_server_url))
  File "C:/PROGRA~1/SIDEEF~1/HOUDIN~1.287/houdini/python2.7libs\rendertracker.py", line 48, in getConnection
    _render_tracker_connection.isRunning()
  File "C:/PROGRA~1/SIDEEF~1/HOUDIN~1.287/houdini/python2.7libs\hjsonrpc.py", line 92, in __call__
    response_code, response_data))
RPCException: Server returned error 500: b'


Any tips on getting this resolved? I'm running 18.0.287.

Vincent Griffith
Edited by VGriffith - December 3, 2019 16:33:35
User Avatar
Member
121 posts
Joined: July 2005
Offline
I note that there is now a daily build that fixes this problem (it did for me). That said it still isn't working for me due to other problems. I'm interested in knowing if anyone is actually using HQueue with H18 that doesn't use the default network mounts?
User Avatar
Member
132 posts
Joined: September 2018
Offline
drew
I note that there is now a daily build that fixes this problem (it did for me). That said it still isn't working for me due to other problems. I'm interested in knowing if anyone is actually using HQueue with H18 that doesn't use the default network mounts?

I updated to the latest daily build, and that did resolve the issue, but like you said, it still isn't working. I'm in contact with support about it, and hopefully the issue can be resolved.
Edited by VGriffith - December 4, 2019 01:49:23
User Avatar
Member
121 posts
Joined: July 2005
Offline
I'm doing the same.
User Avatar
Member
19 posts
Joined: October 2011
Offline
For what it's worth, I've found a couple of problems with HQueue in 18. I haven't downloaded a daily build to see if they're fixed, but I wanted to add to the thread in case anyone else is hitting the same wall.

When you run hqclientd.bat, hqnode.ini is rewritten with the default “localhost\hq” values instead of the actual server location. Everything is set up correctly as per my previous HQ configs.
With an older version of HQueue (both client and server) this problem goes away.
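
One way to confirm what is being rewritten is to save a copy of hqnode.ini before starting the client and compare it with the file afterwards. The following is only a rough sketch using Python's standard configparser; the section and key names in the sample text are hypothetical, and the real hqnode.ini layout may differ or may not parse cleanly.

```python
import configparser


def parse_ini(text):
    """Parse INI text into {section: {key: value}} without interpolation."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    # Note: configparser lowercases key names by default.
    return {name: dict(parser.items(name)) for name in parser.sections()}


def ini_changes(before_text, after_text):
    """List (section, key, old, new) for every value that differs."""
    before, after = parse_ini(before_text), parse_ini(after_text)
    changes = []
    for section in sorted(set(before) | set(after)):
        old_keys = before.get(section, {})
        new_keys = after.get(section, {})
        for key in sorted(set(old_keys) | set(new_keys)):
            old, new = old_keys.get(key), new_keys.get(key)
            if old != new:
                changes.append((section, key, old, new))
    return changes


# Hypothetical contents for illustration; not the real hqnode.ini keys.
before = "[main]\nserver = myserver\nport = 5000\n"
after = "[main]\nserver = localhost\nport = 5000\n"
for change in ini_changes(before, after):
    print(change)
```

In practice the two strings would come from reading the saved copy and the current file, which makes it easy to attach the exact difference to a bug report.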

However, just trying to submit a job in 18 throws up a “hqueue could not retrieve network folders from 192.168.1.xxx:xxxx” error.
Doesn't do that in 17.5
User Avatar
Member
734 posts
Joined: December 2006
Offline
SkippyLink
However, just trying to submit a job in 18 throws up a “hqueue could not retrieve network folders from 192.168.1.xxx:xxxx” error.
Doesn't do that in 17.5

I'm getting the same thing.
Sean Lewkiw
CG Supervisor
Machine FX - Cinesite MTL
User Avatar
Member
382 posts
Joined: November 2010
Offline
drew
I note that there is now a daily build that fixes this problem (it did for me). That said it still isn't working for me due to other problems. I'm interested in knowing if anyone is actually using HQueue with H18 that doesn't use the default network mounts?

I stopped trying to use HQueue altogether and switched to Pandora, which still has its quirks but is so much easier to set up and maintain… and it's free.

https://prism-pipeline.com/pandora/
Edited by OneBigTree - December 5, 2019 09:58:45
User Avatar
Staff
1284 posts
Joined: July 2005
Offline
Hi All,

The “RPCException: Server returned error 500: b'” error is fixed in the daily builds. Unfortunately, it wasn't detected until late in the release cycle and therefore was not fixed in time for the .287 gold build. Please use the daily builds for the HQueue server and clients.
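
For anyone unsure whether they are hitting this bug or a plain connectivity problem, a generic HTTP probe can at least separate "server unreachable" from "server answering with an error". This is a minimal sketch using only the Python standard library, not an HQueue API; the host name and port in the usage comment are assumptions to replace with your own.

```python
import urllib.error
import urllib.request


def probe_server(url, opener=urllib.request.urlopen, timeout=5):
    """Return (ok, detail) for a plain HTTP GET against the given URL."""
    try:
        with opener(url, timeout=timeout) as response:
            return True, "HTTP %d" % response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return False, "HTTP %d" % err.code
    except (urllib.error.URLError, OSError) as err:
        # No usable answer at all: DNS failure, refused connection, timeout.
        return False, "unreachable: %s" % err


# Example call (assumed host and port, adjust to your farm):
# print(probe_server("http://myserver:5000"))
```

An "HTTP 500" result means the server is up but failing internally, which matches the symptom in the traceback; an "unreachable" result points at networking or the server process instead.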

For the network folder issues, there's a new Network Folders management section in the HQueue website (go to the “hamburger” menu in the top right corner and choose Network Folders). The section is a replacement for the settings in hqserver.ini and was designed to auto-migrate the network folder settings from hqserver.ini but it sounds like there's a bug in the migration process. In any case, if you enter your network folder settings in the web page, I wonder if that will resolve some of the issues.

As for the “hqueue could not retrieve network folders from 192.168.1.xxx:xxxx” error, we haven't encountered that before and are investigating as to how that can happen. I'll post here when we have an update.

Also, please avoid using the hqclientd.bat batch file on Windows. It's a legacy file that was a workaround for networking issues experienced in old HQueue Client versions. The file can actually cause issues when executed multiple times. We removed references to that file in the HQueue documentation and will likely remove that file altogether in a future release.

Cheers,
Rob
User Avatar
Member
19 posts
Joined: October 2011
Offline
rvinluan
Hi All,

The “RPCException: Server returned error 500: b'” error is fixed in the daily builds. Unfortunately, it wasn't detected until late in the release cycle and therefore was not fixed in time for the .287 gold build. Please use the daily builds for the HQueue server and clients.

For the network folder issues, there's a new Network Folders management section in the HQueue website (go to the “hamburger” menu in the top right corner and choose Network Folders). The section is a replacement for the settings in hqserver.ini and was designed to auto-migrate the network folder settings from hqserver.ini but it sounds like there's a bug in the migration process. In any case, if you enter your network folder settings in the web page, I wonder if that will resolve some of the issues.

As for the “hqueue could not retrieve network folders from 192.168.1.xxx:xxxx” error, we haven't encountered that before and are investigating as to how that can happen. I'll post here when we have an update.

Also, please avoid using the hqclientd.bat batch file on Windows. It's a legacy file that was a workaround for networking issues experienced in old HQueue Client versions. The file can actually cause issues when executed multiple times. We removed references to that file in the HQueue documentation and will likely remove that file altogether in a future release.

Cheers,
Rob

Hi Rob, thanks for the help.

I tried the Network Folders tab on the Server page and was able to change the settings so the network folder error went away. So thanks for that! I don't know if it's a problem on my end, but in Firefox there were no fields to change, whereas it worked in Chrome. See the image below.

https://imgur.com/a/C2HBHsv

However, no matter what I tried, the job would fail with the error:

“Loading .hip file //myserver/HQueue/projects/Render_Test/RenderTest.hiplc.”
“ERROR: Cannot find file //myserver/HQueue/projects/Render_Test/RenderTest.hiplc”


I tried so many variations in the network folder fields, to no avail. But it worked as soon as I stopped the HQClient service and ran the hqclientd.bat batch file. I appreciate that there might be something wrong with my config somewhere that stops HQueue running natively, but for years I've relied on that batch file. Way back in the day, when I was first setting it up, I read on the forums that it was the solution for getting HQueue running on Windows, and it works. I can't speak for everyone here, but as far as I know that's the only way to get HQClient to work on Windows.

If someone could shed some light on getting it working without the batch file, I'd be more than happy to not use it. But as it stands, HQueue doesn't work without it, for me at least.

Appreciate all your help!
Edited by SkippyLink - December 5, 2019 18:26:14
User Avatar
Member
121 posts
Joined: July 2005
Offline
Grrrr….

After spending a few days investigating what was going on with H18 HQueue and submitting bug reports detailing how hqnode.ini files were being overwritten on startup, it's really frustrating to find out this has been moved into the web server. Why was this done? Why was nothing put into the configuration files mentioning the fact that they were now being ignored? There is nothing in the What's New about HQueue changes.

I've spent a good deal of time building automated tools for:

- spinning up new linux VMs
- rolling out new versions of HQueue, keeping versions and configurations up to date
- installing hqnode clients that start on machine reboot, and restart on errors
- running up the web server on boot, _not_ as the root user
- creating a hquser on all the machines with the same uid/gid that allows a team to use HQueue without having the network mount completely open from a security standpoint
- not requiring someone to put passwords in for every client they add, frankly it's scary that the web server wants to ssh into machines and do stuff behind my back, and doesn't scale to a large number of nodes

So the new changes just seem to be going in the opposite direction. It's obviously not testable at the moment as evidenced by the fact that it was completely broken on day one of the H18 release. Philosophically I just don't think that HQueue should be developed in this direction, it seems to me un-Houdini like. When administering multiple machines there are much better tools for system orchestration than buggy web servers holding user passwords. Configuration should be in one place, clearly documented, updateable, and automatable. At least give system admins a chance to do things the right way. Sorry for the rant but I had to get this off my chest!

BTW sometimes we have to nuke the server and database and start from scratch due to large render jobs being nearly impossible to delete. The server goes into 100% cpu and hours go by without a successful delete operation. I'm assuming that would mean putting the network shares back into the database by hand through the web GUI?

Maybe I'm expecting too much of HQueue? It's been a really useful, albeit fragile, tool that has rendered hundreds of thousands of frames for my small group over the years. We're hardware rich, but software poor so it has filled a gap when we couldn't afford Deadline, Tractor etc. But maybe now we should be focusing on developing a custom PDG scheduler for task distribution in our environment, something that I'm seriously considering?

-Drew
Edited by drew - December 5, 2019 23:46:27
User Avatar
Staff
1284 posts
Joined: July 2005
Offline
SkippyLink
I tried the Network Folders tab on the Server page and was able to change the settings so the network folder error went away. So thanks for that! I don't know if it's a problem on my end, but in Firefox there were no fields to change, whereas it worked in Chrome. See the image below.

Ah, looks like an incompatibility with Firefox. I submitted a bug (101479) for this into our system.

SkippyLink
However, no matter what I tried, the job would fail with the error:

“Loading .hip file //myserver/HQueue/projects/Render_Test/RenderTest.hiplc.”
“ERROR: Cannot find file //myserver/HQueue/projects/Render_Test/RenderTest.hiplc”


Take a look at the HQueue Client service on your client machine. The service may be running with an account that doesn't have access to the network folder. Try changing the service's Log On account to, say, your Windows user account, which I assume has access to the network folder (you can verify this by navigating to “\\myserver\HQueue” in Windows File Explorer).

After changing the Log On account, restart the service and see how that goes.
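
The same visibility check can be scripted and run under the service's Log On account instead of your interactive session, since the two can see different things. A minimal sketch, assuming the example share path from this thread; `to_unc` and `share_is_visible` are illustrative helper names, not part of HQueue.

```python
import ntpath
import os


def to_unc(path):
    """Return the backslash UNC form of a path, or None if it is not UNC."""
    normalized = ntpath.normpath(path)
    drive, _ = ntpath.splitdrive(normalized)
    # UNC "drives" look like \\server\share; mapped drives look like H:
    return normalized if drive.startswith("\\\\") else None


def share_is_visible(path):
    """True if the current process (and so its account) can see the path."""
    unc = to_unc(path)
    return unc is not None and os.path.exists(unc)


print(to_unc("//myserver/HQueue/projects/Render_Test/RenderTest.hiplc"))
```

Running `share_is_visible` from a process started as the service account gives a closer answer than browsing in File Explorer, which uses your own logged-in credentials.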

Cheers,
Rob
User Avatar
Member
19 posts
Joined: October 2011
Offline
rvinluan
Take a look at the HQueue Client service on your client machine. The service may be running with an account that doesn't have access to the network folder. Try changing the service's Log On account to, say, your Windows user account, which I assume has access to the network folder (you can verify this by navigating to “\\myserver\HQueue” in Windows File Explorer).

Thanks for the tip but I'm afraid that didn't work.

I can access the share through explorer (as you suggested) when logging onto the service with my local account but HQueue still throws up “ERROR: Cannot find file //myserver/HQueue/projects/Render_Test/RenderTest.hiplc”.

I should point out that the server and the local machine do use different passwords (same username), but I have no problem accessing the share through Windows. It is a mapped network drive, and I can also navigate to it using both its name and IP address.

If I remember correctly, in the past the message you would get was “Access Denied” if the service couldn't log on due to permissions. Here, it simply can't find the file.

Tried again with the batch file and it works.
User Avatar
Staff
1284 posts
Joined: July 2005
Offline
drew
Grrrr….

After spending a few days investigating what was going on with H18 HQueue and submitting bug reports detailing how hqnode.ini files were being overwritten on startup, it's really frustrating to find out this has been moved into the web server. Why was this done? Why was nothing put into the configuration files mentioning the fact that they were now being ignored? There is nothing in the What's New about HQueue changes.

Hi Drew,

The move to the web server was intended to make it easier for new users to configure their network folders. We had common complaints and questions about the discoverability of the settings in hqserver.ini and what the hqserver.sharedNetwork.* config keys should be set to. Moving the settings to the server meant we could present them in a more intuitive way and also bring them closer together with other management aspects of HQueue such as managing clients, groups and resources.

We added an entry to the online Journals but you're right about the What's New and configuration. That's an oversight on our end and I sincerely apologize for that.

drew
I've spent a good deal of time building automated tools for:

- spinning up new linux VMs
- rolling out new versions of HQueue, keeping versions and configurations up to date
- installing hqnode clients that start on machine reboot, and restart on errors
- running up the web server on boot, _not_ as the root user
- creating a hquser on all the machines with the same uid/gid that allows a team to use HQueue without having the network mount completely open from a security standpoint
- not requiring someone to put passwords in for every client they add, frankly it's scary that the web server wants to ssh into machines and do stuff behind my back, and doesn't scale to a large number of nodes

So the new changes just seem to be going in the opposite direction. It's obviously not testable at the moment as evidenced by the fact that it was completely broken on day one of the H18 release. Philosophically I just don't think that HQueue should be developed in this direction, it seems to me un-Houdini like. When administering multiple machines there are much better tools for system orchestration than buggy web servers holding user passwords. Configuration should be in one place, clearly documented, updateable, and automatable. At least give system admins a chance to do things the right way. Sorry for the rant but I had to get this off my chest!

No worries, Drew. You made some very good points. I'm actually very interested in understanding more about the automated processes you have to set up and configure your farm. I'll PM you and, if you're up for it, I want to figure out a more system-administrator-friendly approach that still works with the new network folder management setup.


drew
BTW sometimes we have to nuke the server and database and start from scratch due to large render jobs being nearly impossible to delete. The server goes into 100% cpu and hours go by without a successful delete operation. I'm assuming that would mean putting the network shares back into the database by hand through the web GUI?

Yes, we've experienced the same problem in-house a couple of times though only with PDG jobs and when there's about 20000+ jobs that are active in the system. We're looking into some of the bottlenecks though some are architectural so we unfortunately won't be addressing them until the next release.

If/when the server gets into that state, could you take a snapshot of the database (i.e. ./hqueue/db/hqserver.db) and send it my way?

And yes, the only way to enter the network shares back into the database is through the web GUI or, if you are willing, by executing some SQL commands in sqlite. But let's see if we can come up with a cleaner, more automated solution instead.
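
For the snapshot, and for anyone curious what the database contains before attempting any SQL, something along these lines may help. It is a sketch using Python's standard sqlite3 module; it only copies the file and lists table names, because the HQueue schema is not described in this thread and no table or column names are assumed.

```python
import pathlib
import sqlite3


def snapshot_db(source_path, dest_path):
    """Copy a SQLite database using SQLite's online backup API."""
    # Open the source read-only so taking the snapshot cannot modify it.
    source_uri = pathlib.Path(source_path).resolve().as_uri() + "?mode=ro"
    source = sqlite3.connect(source_uri, uri=True)
    dest = sqlite3.connect(dest_path)
    try:
        source.backup(dest)
    finally:
        dest.close()
        source.close()


def list_tables(db_path):
    """Return the table names in a SQLite database, sorted by name."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
        return [row[0] for row in rows]
    finally:
        conn.close()


# Example calls (paths are placeholders based on the location named above):
# snapshot_db("./hqueue/db/hqserver.db", "hqserver_snapshot.db")
# print(list_tables("hqserver_snapshot.db"))
```

Working on the snapshot rather than the live file keeps any experiments away from the running server.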

Cheers,
Rob
User Avatar
Staff
1284 posts
Joined: July 2005
Offline
SkippyLink
Thanks for the tip but I'm afraid that didn't work.

I can access the share through explorer (as you suggested) when logging onto the service with my local account but HQueue still throws up “ERROR: Cannot find file //myserver/HQueue/projects/Render_Test/RenderTest.hiplc”.

I should point out that the server and the local machine do use different passwords (same username), but I have no problem accessing the share through Windows. It is a mapped network drive, and I can also navigate to it using both its name and IP address.

If I remember correctly, in the past the message you would get was “Access Denied” if the service couldn't log on due to permissions. Here, it simply can't find the file.

Tried again with the batch file and it works.

“Access is Denied” is one manifestation of a permissions issue but I've also seen “Cannot find file” errors when the Log On account has limited networking privileges as well.

Just to confirm, you can access the network share in Windows File Explorer using the share's UNC path, right? So using “\\myserver\HQueue” instead of the mapped network drive.

Also, here are some other things to try/use:
- When specifying the Log On account in the dialog, try using the “Check username” button (I think that's what it's called, I don't have a Windows box in front of me) to verify and select the user account. It's interesting that you mentioned that there are 2 different user accounts with the same name. I recall another user running into conflicts between the local account and the network account. If I recall correctly, they had to specify the local machine name in the account name by clicking that button to get things working.
- Another option is to specify the server as an IP address instead of DNS, so “\\XX.XX.XX.XX\HQueue”, in case the service's user account still has some limited network privileges.

Also, regarding the Firefox issue. We cannot reproduce the problem in-house. Tested with Firefox 70.0.1. Silly question but does shift+click refreshing the page help?

Cheers,
Rob
User Avatar
Member
19 posts
Joined: October 2011
Offline
Hi Rob

Yep, can access using the share's UNC path, I have no troubles accessing the shares through file browsers.

I'll give the logon accounts a few more tries later today and let you know how it goes. I don't have two User accounts with the same name, my NAS (Where the share is located) and the OS have the same username, but with different passwords so I'm wondering if it's trying to access the share with the wrong password, maybe.

I did try the Check username button and it autofilled with ./username rather than the expected DESKTOP/Username, which looked a little odd to me. I'll also look into Credential Manager to see if anything looks iffy in there. I've tried using the IP address too and it worked with the batch file as per usual, but I'll change it to the IP for the sake of troubleshooting.

I'll also reinstall and see if messing around in the “Account to run service under” section works. I have a feeling it might be there….

Firefox is working now… went to check the shift+click and it was already there. Typical

Again, appreciate all your help!
User Avatar
Member
19 posts
Joined: October 2011
Offline
OK! New update!

I changed the “Account to run service under” to my username and password, which immediately failed.
In the service's Log On section I entered my username and clicked check, and it filled in “.\Username”.
I changed that to “DESKTOPNAME\Username” and it was able to find the file and start rendering!

Then I get a new error, but new is better…

ERROR: The attempted operation failed.
Error: Could not create output directory H:/projects/Render_Test/render

That directory already exists, so I assume it is having problems with permissions writing to the NAS now?

All I could think to do was add “HOUDINI_ACCESS_METHOD = 2” to the hqnode.ini, as that has allowed me to save to the NAS through Houdini normally, but that didn't work in this case.

We're getting closer!
User Avatar
Staff
1284 posts
Joined: July 2005
Offline
SkippyLink
Then I get a new error, but new is better…

ERROR: The attempted operation failed.
Error: Could not create output directory H:/projects/Render_Test/render

That's good to hear! About making progress that is, not about the error.

The HQueue Client service is unable to access mapped drive letters, so that would explain why it complains about H:. I would wager that there is something in the .hip that is referencing a hard-coded H: path.

Here are a couple of things to check:
- Check the Output Picture parameter in the Mantra ROP. If it has a hard-coded H:/ prefix, then try replacing it with a $HIP prefix instead.
- Check the Advanced -> Create Directories parameter in the HQueue Render ROP. If there are any H:/ directories listed there, then you can try replacing H:/ with $HQROOT or remove the directory entries altogether.
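
If there are many parameters to check, the drive-letter test and the $HIP rewrite can be sketched in plain Python. This is only an illustration under assumptions: the helper names are made up, the example paths are modeled on the ones in this thread, and the functions operate on strings rather than on a live Houdini session.

```python
import ntpath


def has_drive_letter(path):
    """True if the path starts with a Windows drive letter such as H:."""
    drive, _ = ntpath.splitdrive(path)
    return len(drive) == 2 and drive[1] == ":"


def relative_to_hip(path, hip_dir):
    """Rewrite path with a $HIP prefix if it lives under hip_dir."""
    norm_path = ntpath.normpath(path)
    # Compare case-insensitively, as Windows paths are not case sensitive.
    folded_path = ntpath.normcase(norm_path)
    folded_hip = ntpath.normcase(ntpath.normpath(hip_dir))
    if folded_path == folded_hip:
        return "$HIP"
    if folded_path.startswith(folded_hip + "\\"):
        tail = norm_path[len(folded_hip):]
        return "$HIP" + tail.replace("\\", "/")
    # Not under the .hip directory: leave it for manual review.
    return path


output = "H:/projects/Render_Test/render/frame.$F4.exr"
if has_drive_letter(output):
    print(relative_to_hip(output, "H:/projects/Render_Test"))
```

Inside Houdini the string to test would be the unexpanded parameter value, for example from `hou.Parm.unexpandedString()` on the Mantra ROP's output picture parameter; the node path to use depends on your scene.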


Cheers,
Rob
User Avatar
Member
19 posts
Joined: October 2011
Offline
rvinluan
The HQueue Client service is unable to access mapped drive letters, so that would explain why it complains about H:. I would wager that there is something in the .hip that is referencing a hard-coded H: path.

Spot on Rob, had a hard H: in my output picture!

Problem solved, it all works now!

Can't thank you enough, great advice.

Cheers
User Avatar
Staff
1284 posts
Joined: July 2005
Offline
Awesome! Glad to hear that it's working now.

Cheers,
Rob
User Avatar
Member
3 posts
Joined: January 2019
Offline
rvinluan
The “RPCException: Server returned error 500: b'” error is fixed in the daily builds. Unfortunately, it wasn't detected until late in the release cycle and therefore was not fixed in time for the .287 gold build. Please use the daily builds for the HQueue server and clients.

I'm also getting this error on a Mac, but I'm an Apprentice user, so I'm pretty sure I don't have access to daily builds. Any idea when another build for Apprentice users will be released, or is there a workaround?

Thanks!