Deadline scheduler _updateWorkingDir() error

Member
12 posts
Joined: September 2016
We've updated to Houdini 17.5.221 to see whether the Deadline scheduler works, but so far I can't get it to work, and it always gives the same error. Any ideas?

The folder is accessible from all machines. It's set to $HIP/jobs.

Attachments:
Untitled.png (45.4 KB)

Member
571 posts
Joined: May 2017
This was a regression caused by a recent change. Please update to 17.5.223 or newer to get the fix. In fact, I recommend getting 17.5.224 (today's build), in which the Deadline node has more robust error handling and supports an empty Repository field (so that the system's default Deadline repository setup is used).
Member
12 posts
Joined: September 2016
I updated and tested again, and leaving the repository empty seemed to work. However, nothing actually happens: lots of jobs are submitted to Deadline (apparently one per frame), no calculation is made, and Houdini crashes afterwards.

The job is a shelf explosion in a simple scene, so nothing special. Any ideas?

Edit: even as a single frame, there is still no calculation, and no machine starts the job. Here's how it appears:

Edited by romanus - April 12, 2019 10:14:29

Attachments:
Untitled.png (28.8 KB)
Untitled1.png (4.7 KB)
Untitled2.png (2.9 KB)

Member
571 posts
Joined: May 2017
Are you able to check the Houdini crash report for any clues as to the cause? As for the work not being done, can you check the job report in Deadline's Monitor for any errors? Make sure that path mapping has been set up for hython in Deadline's Repository Configuration. Also, try rendering a single frame or a subset of frames.

On the ROP Fetch, you can set Frames per Batch so that each job renders N frames. On the Deadline node, under Job Parms, you can also specify a Job Batch Name to group the Deadline jobs under a parent job.
Edited by seelan - April 12, 2019 10:34:58
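To make the Frames per Batch behaviour concrete: with a batch size of 1, a 1-50 range turns into 50 jobs, which matches the "one job per frame" symptom described earlier. The sketch below only illustrates how an inclusive frame range divides into batches; split_into_batches is a hypothetical helper, not PDG's own batching code.

```python
def split_into_batches(start, end, step, frames_per_batch):
    """Group the inclusive frame range start..end (by step) into batches.

    Hypothetical helper for illustration only; not PDG's own batching code.
    """
    if step <= 0 or frames_per_batch <= 0:
        raise ValueError("step and frames_per_batch must be positive")
    frames = list(range(start, end + 1, step))
    # Each slice of frames_per_batch frames would correspond to one job.
    return [frames[i:i + frames_per_batch]
            for i in range(0, len(frames), frames_per_batch)]


# One frame per batch: 50 separate jobs for frames 1-50.
print(len(split_into_batches(1, 50, 1, 1)))    # 50
# Ten frames per batch: 5 jobs, the last covering frames 41-50.
print(len(split_into_batches(1, 50, 1, 10)))   # 5
```

Larger batches mean fewer jobs, so the per-job scheduling overhead is paid fewer times.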
Member
571 posts
Joined: May 2017
Hmm, if the status stays at Queued, it could possibly mean that there are no available clients. Do you have the CommandLine plugin enabled in your repository configuration? I can reproduce the queued status if I disable it.
Edited by seelan - April 12, 2019 10:37:28

Attachments:
cmdline_dl.png (67.3 KB)

Member
12 posts
Joined: September 2016
Edit: OK, that seemed to work. However, the .sim files aren't being saved correctly; I'll try to figure out what's wrong now. Thanks for the prompt response, btw.
Edited by romanus - April 12, 2019 10:45:28
Member
12 posts
Joined: September 2016
After saving one file (in the wrong directory), it no longer saves any files and gives this error:

Error: Process returned non-zero exit code ‘3’


=======================================================
Error
=======================================================
Error: Process returned non-zero exit code ‘3’
at Deadline.Plugins.PluginWrapper.RenderTasks(String taskId, Int32 startFrame, Int32 endFrame, String& outMessage, AbortLevel& abortLevel)

=======================================================
Type
=======================================================
RenderPluginException

=======================================================
Stack Trace
=======================================================
at Deadline.Plugins.SandboxedPlugin.d(DeadlineMessage aog)
at Deadline.Plugins.SandboxedPlugin.RenderTask(String taskId, Int32 startFrame, Int32 endFrame)
at Deadline.Slaves.SlaveRenderThread.c(TaskLogWriter aem)

=======================================================
Log
=======================================================
2019-04-12 15:53:24: 0: Loading Job's Plugin timeout is Disabled
2019-04-12 15:53:26: 0: Executing plugin command of type ‘Sync Files for Job’
2019-04-12 15:53:26: 0: All job files are already synchronized
2019-04-12 15:53:26: 0: Plugin CommandLine was already synchronized.
2019-04-12 15:53:26: 0: Done executing plugin command of type ‘Sync Files for Job’
2019-04-12 15:53:26: 0: Executing plugin command of type ‘Initialize Plugin’
2019-04-12 15:53:26: 0: INFO: Executing plugin script ‘C:\Users\Diogo Ferreira\AppData\Local\Thinkbox\Deadline10\slave\lfdiogo\plugins\5cb0a64333b45045549a70c2\CommandLine.py’
2019-04-12 15:53:26: 0: INFO: Single Frames Only: False
2019-04-12 15:53:26: 0: INFO: About: Command Line Plugin for Deadline
2019-04-12 15:53:26: 0: INFO: Render Job As User disabled, running as current user ‘Diogo Ferreira’
2019-04-12 15:53:26: 0: INFO: The job's environment will be merged with the current environment before rendering
2019-04-12 15:53:26: 0: Done executing plugin command of type ‘Initialize Plugin’
2019-04-12 15:53:26: 0: Start Job timeout is disabled.
2019-04-12 15:53:26: 0: Task timeout is disabled.
2019-04-12 15:53:26: 0: Loaded job: ropgeometry1_ropfetch10_1_0 (5cb0a64333b45045549a70c2)
2019-04-12 15:53:26: 0: Executing plugin command of type ‘Start Job’
2019-04-12 15:53:26: 0: INFO: Executing global asset transfer preload script ‘C:\Users\Diogo Ferreira\AppData\Local\Thinkbox\Deadline10\slave\lfdiogo\plugins\5cb0a64333b45045549a70c2\GlobalAssetTransferPreLoad.py’
2019-04-12 15:53:26: 0: INFO: Looking for AWS Portal File Transfer…
2019-04-12 15:53:26: 0: INFO: Looking for File Transfer controller in C:/Program Files/Thinkbox/S3BackedCache/bin/task.py…
2019-04-12 15:53:26: 0: INFO: Could not find AWS Portal File Transfer.
2019-04-12 15:53:26: 0: INFO: AWS Portal File Transfer is not installed on the system.
2019-04-12 15:53:26: 0: Done executing plugin command of type ‘Start Job’
2019-04-12 15:53:26: 0: Plugin rendering frame(s): 0
2019-04-12 15:53:27: 0: Executing plugin command of type ‘Render Task’
2019-04-12 15:53:27: 0: INFO: Executable: C:/Program Files/Side Effects Software/Houdini 17.5.224/bin/hython.exe
2019-04-12 15:53:27: 0: INFO: Arguments: “A:/_Temp/houdini_sim_network_test/jobs/test/pdgtemp/21724/scripts/rop.py” “--batch” “-p” “A:/_Temp/houdini_sim_network_test/jobs/test/test.hip” “-n” “/obj/topnet1/ropgeometry1/ropnet1/geometry1” “-to” “/obj/topnet1/ropgeometry1” “-i” “ropgeometry1_ropfetch10_1” “-s” “GLPedro:49330” “-fs” “1” “-fe” “50” “-fi” “1”
2019-04-12 15:53:27: 0: INFO: Execute in Shell: False
2019-04-12 15:53:27: 0: INFO: Invoking: Run Process
2019-04-12 15:53:40: 0: STDOUT: VFH Loading Phoenix cache loader plugins from “A:\Software\Houdini\Vray\vray_adv_41101_houdini17.5.173_a3cbb74_3709/vfh_home/libs”…
2019-04-12 15:53:40: 0: STDOUT: Redshift for Houdini plugin version 2.6.37 (Mar 18 2019 17:46:29)
2019-04-12 15:53:40: 0: STDOUT: Plugin compile time HDK version: 17.5.173
2019-04-12 15:53:40: 0: STDOUT: Houdini host version: 17.5.224
2019-04-12 15:53:40: 0: STDOUT: Houdini and the Redshift plugin versions don't match. Houdini or Redshift may become unestable, with features not available or crashes at render time
2019-04-12 15:53:40: 0: STDOUT: Plugin dso/dll and config path: A:/Software/Houdini/Redshift/2.6.37//Plugins/Houdini/17.5.173/dso
2019-04-12 15:53:40: 0: STDOUT: Core data path: A:\Software\Houdini\Redshift\2.6.37
2019-04-12 15:53:40: 0: STDOUT: Local data path: C:\ProgramData\Redshift
2019-04-12 15:53:40: 0: STDOUT: Procedurals path: A:\Software\Houdini\Redshift\2.6.37\Procedurals
2019-04-12 15:53:40: 0: STDOUT: Preferences file path: C:\ProgramData\Redshift\preferences.xml
2019-04-12 15:53:40: 0: STDOUT: License path: C:\ProgramData\Redshift
2019-04-12 15:53:40: 0: STDOUT: VFH Build a3cbb74 from Mar 22 2019, 04:24:23
2019-04-12 15:53:42: 0: STDOUT: No licenses could be found to run this application.
2019-04-12 15:53:42: 0: STDOUT: Please check for a valid license server host
2019-04-12 15:53:42: 0: INFO: Process returned: 3
2019-04-12 15:53:42: 0: Done executing plugin command of type ‘Render Task’

I'm not even using Redshift in this scene, so I don't think it's the cause.

Edit: I disabled houdini.env to be sure; still the same error.
Edited by romanus - April 12, 2019 10:58:27
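For readers trying to interpret the "Arguments:" line in the log above, here is a minimal sketch that parses the same flags with Python's argparse. The flag meanings in the comments are inferred from their names and values in this log, not taken from PDG's rop.py, and parse_rop_args is a hypothetical helper; the .hip path is shortened. It shows that this single Deadline task was asked to cook the whole range 1-50 in one hython process.

```python
import argparse


def parse_rop_args(argv):
    """Parse the flags seen in the log (inferred meanings, not PDG's own parser)."""
    parser = argparse.ArgumentParser(prog="rop.py")
    parser.add_argument("--batch", action="store_true")    # presumably: run without a UI
    parser.add_argument("-p", dest="hip")                  # .hip file to load
    parser.add_argument("-n", dest="rop")                  # ROP node to cook
    parser.add_argument("-to", dest="top")                 # TOP node that owns the work item
    parser.add_argument("-i", dest="item")                 # work item name
    parser.add_argument("-s", dest="server")               # host:port of the submitting session
    parser.add_argument("-fs", dest="frame_start", type=int)
    parser.add_argument("-fe", dest="frame_end", type=int)
    parser.add_argument("-fi", dest="frame_inc", type=int)
    return parser.parse_args(argv)


args = parse_rop_args([
    "--batch", "-p", "test.hip",
    "-n", "/obj/topnet1/ropgeometry1/ropnet1/geometry1",
    "-to", "/obj/topnet1/ropgeometry1", "-i", "ropgeometry1_ropfetch10_1",
    "-s", "GLPedro:49330", "-fs", "1", "-fe", "50", "-fi", "1",
])
frames = range(args.frame_start, args.frame_end + 1, args.frame_inc)
print(len(frames))  # 50
```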
Member
12 posts
Joined: September 2016
OK, so I decided to test with the local scheduler, and it seems to work.

Using “Save to Disk in Background” or the Local Scheduler gives similar calculation times. However, using the Deadline scheduler on my local machine takes a LONG time, which makes it basically unusable.

Locally via “save to disk in background” = 29 secs
Locally via TOP local scheduler = 31 secs
Locally via TOP deadline scheduler = 4 minutes

This is a very basic sim. What could cause this discrepancy in calculation times?
It is also saving to a different (wrong) directory compared to the local scheduler, with the same ROP Fetch.
Edited by romanus - April 12, 2019 11:40:48
Member
571 posts
Joined: May 2017
The local scheduler runs on your local machine with your local environment. With Deadline, it depends on the farm setup and the scheduling. Are you actually scheduling jobs on a farm? How many client machines are running the jobs? Can you post a screenshot of your Deadline Monitor showing the jobs and their info?

Are you able to attach a hip file for the path issue? Make sure you set the Local Root Path and Relative Job Directory correctly.
Member
571 posts
Joined: May 2017
I just noticed that you are running your jobs locally via Deadline. I'm not sure why you'd want that over the local scheduler, but I recommend using the Deadline scheduler node for jobs on a farm. The reason is the overhead cost, which is best illustrated by the following screenshot from Deadline's Monitor window:

This shows the running time for a simple render with 5 frames in a batch. Note that from submission to finish, it took a total of 27 seconds, whereas the same job took 3 seconds in total with the local scheduler. With Deadline, the delay between submitting the job and a client picking it up was 13 seconds, and the job then ran for another ~13 seconds. In other words, just getting the job started took about as long as running it. This has nothing to do with PDG and everything to do with Deadline's architecture, the local hardware, system state, etc. The 14 seconds it took to execute the work are partly due to the TOP Deadline scheduler node itself. Each job actually consists of 2 tasks: the first task runs a preload script that sets up the environment (this is done to make use of Deadline's path mapping feature), and then the actual job runs. This is the other part of the overhead.

What this boils down to is that running jobs locally through the TOP Deadline scheduler node is not optimal; in that case, it is better to use the local scheduler. But if you are running on a farm where Deadline is set up, the overhead should be worth it.
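Using only the figures from the Monitor screenshot described above (13 s from submission to pickup, about 14 s of execution, 3 s with the local scheduler), a tiny model makes the overhead ratio explicit. JobTiming and overhead_factor are hypothetical names; this is plain arithmetic, not a Deadline or PDG API.

```python
from dataclasses import dataclass


@dataclass
class JobTiming:
    """Wall-clock breakdown of one scheduled job, in seconds (hypothetical model)."""
    queue_wait: float  # submission until a client picks the job up
    execution: float   # preload task plus the actual work

    @property
    def total(self) -> float:
        return self.queue_wait + self.execution


def overhead_factor(remote: JobTiming, local_seconds: float) -> float:
    """How many times longer the scheduled job took than the same job run locally."""
    return remote.total / local_seconds


deadline_run = JobTiming(queue_wait=13.0, execution=14.0)
print(deadline_run.total)                       # 27.0
print(overhead_factor(deadline_run, 3.0))       # 9.0
```

Applied to the timings reported earlier in the thread (about 4 minutes via Deadline vs. 29 seconds locally), the same arithmetic gives a factor of roughly 8, in line with a fixed per-job cost dominating a short sim.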

Attachments:
dl_runtimepng.png (5.5 KB)

Member
12 posts
Joined: September 2016
@seelan Thanks for the detailed explanation. Right now I don't have time to continue testing, but as soon as I do, I'll run some more tests and report back, probably later this week.
Member
151 posts
Joined: February 2009
I also have issues with this on Linux (RHEL 7.6). I have enabled the CommandLine plugin in Deadline, but I think more needs to be done for the environment variables to be initialised.

Normal Deadline submission works fine with the Deadline ROP.

This is a snippet of the output:

=======================================================
Error
=======================================================
Error: Executable "$HFS/bin/hython" is not rooted, and does not exist in the current directory or in the PATH. (System.Exception)
  at Deadline.Plugins.DeadlinePlugin.RunProcessAsUser (System.String executable, System.String arguments, System.String startupDirectory, System.Int32 timeoutMilliseconds, System.String userName, System.String domain, System.String password, System.Boolean useSu, System.Boolean preserveEnvironment, System.Boolean setHomeVariable) [0x00064] in <eedd765ca81a405c9961081363725775>:0 
  at Deadline.Plugins.DeadlinePlugin.RunProcess (System.String executable, System.String arguments, System.String startupDirectory, System.Int32 timeoutMilliseconds) [0x00045] in <eedd765ca81a405c9961081363725775>:0 
  at (wrapper managed-to-native) System.Reflection.MonoMethod:InternalInvoke (System.Reflection.MonoMethod,object,object[],System.Exception&)
  at System.Reflection.MonoMethod.Invoke (System.Object obj, System.Reflection.BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) [0x00038] in <a8460a77e67a430a8486a9751162e5f4>:0 
  at Python.Runtime.Dispatcher.TrueDispatch (System.Collections.ArrayList args) [0x00092] in <f9721944f74b45c1a0cf39a0b63e9bc8>:0 
  at Python.Runtime.Dispatcher.Dispatch (System.Collections.ArrayList args) [0x00008] in <f9721944f74b45c1a0cf39a0b63e9bc8>:0 
  at Deadline.Plugins.PluginWrapper.RenderTasks (System.String taskId, System.Int32 startFrame, System.Int32 endFrame, System.String& outMessage, FranticX.Processes.ManagedProcess+AbortLevel& abortLevel) [0x002ea] in <eedd765ca81a405c9961081363725775>:0 

=======================================================
Type
=======================================================
RenderPluginException

=======================================================
Stack Trace
=======================================================
  at Deadline.Plugins.SandboxedPlugin.d (Deadline.Net.DeadlineMessage ajq) [0x00242] in <eedd765ca81a405c9961081363725775>:0 
  at Deadline.Plugins.SandboxedPlugin.RenderTask (System.String taskId, System.Int32 startFrame, System.Int32 endFrame) [0x000df] in <eedd765ca81a405c9961081363725775>:0 
  at Deadline.Slaves.SlaveRenderThread.c (Deadline.IO.TaskLogWriter zx) [0x0073e] in <eedd765ca81a405c9961081363725775>:0 

=======================================================
Log
=======================================================
2019-05-19 23:08:59:  0: Loading Job's Plugin timeout is Disabled
2019-05-19 23:09:01:  0: Executing plugin command of type 'Sync Files for Job'
2019-05-19 23:09:01:  0: All job files are already synchronized
2019-05-19 23:09:01:  0: Plugin CommandLine was already synchronized.
2019-05-19 23:09:01:  0: Done executing plugin command of type 'Sync Files for Job'
2019-05-19 23:09:01:  0: Executing plugin command of type 'Initialize Plugin'
2019-05-19 23:09:01:  0: INFO: Executing plugin script '/home/deadlineuser/Thinkbox/Deadline10/slave/workstation/plugins/5ce1536bf20ad925194a46dc/CommandLine.py'

https://openfirehawk.com/
Support Open Firehawk - An open source cloud rendering project for Houdini on Patreon.
This project's goal is to provide an open source framework for cloud computing for heavy FX based workflows and allows end users to pay the lowest possible price for cloud resources.
Member
571 posts
Joined: May 2017
You'll need to set a path mapping for $HFS in your Deadline Repository Configuration -> Mapped Paths. Alternatively, if the HFS path is the same on your local computer as on the farm, you can remove the `\` from the `\$HFS` path in the Deadline TOP node.
Edited by seelan - May 21, 2019 08:20:59
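To illustrate why the Worker reported "$HFS/bin/hython" as not rooted: without a mapping rule, the literal string reaches the Worker unchanged, and it is not an absolute path. The sketch below is a simplified prefix replacement, assuming Mapped Paths behaves roughly like a find-and-replace on the path prefix; map_path, MAPPED_PATHS and the /opt/hfs17.5 install location are all hypothetical.

```python
import os

# Hypothetical mapping table: (prefix as written in the job, replacement on this Worker).
# "/opt/hfs17.5" is an assumed Linux install location, not a real default.
MAPPED_PATHS = [("$HFS", "/opt/hfs17.5")]


def map_path(path, rules=MAPPED_PATHS):
    """Replace the first matching prefix; return the path unchanged if none match."""
    for prefix, replacement in rules:
        if path.startswith(prefix):
            return replacement + path[len(prefix):]
    return path


# Unmapped, the Worker sees a literal string that is not a rooted path.
print(os.path.isabs("$HFS/bin/hython"))   # False
print(map_path("$HFS/bin/hython"))        # /opt/hfs17.5/bin/hython
```

The second option in the reply avoids mapping altogether: without the backslash, $HFS is expanded on the submitting machine, which only works when Houdini is installed at the same path on every farm machine.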