Hi All,
We are seeing intermittent corrupt .bgeo.sc files when running sims via Houdini Engine as a Deadline job.
No error message is displayed; the job continues and successfully writes the frames following the corrupt frame.
The job completes without error.
But when we try to reference the .bgeo.sc files, the corrupt frame cannot be loaded. The file size for the corrupt frame is also smaller than the frames around it.
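Since the corrupt frame is smaller than the frames around it, one way to catch it early is to scan the cache directory for size outliers after the job finishes. The following is only a minimal sketch: the `name.NNNN.bgeo.sc` file naming, the function name `find_suspect_frames`, and the `ratio` and `window` defaults are all assumptions for illustration, and a small file is a hint, not proof of corruption (a sim that legitimately shrinks between frames would also be flagged).

```python
import re
from pathlib import Path
from statistics import median

# Assumed naming convention: <name>.<frame>.bgeo.sc (hypothetical, adjust to your caches).
FRAME_RE = re.compile(r"\.(\d+)\.bgeo\.sc$")


def find_suspect_frames(cache_dir, ratio=0.5, window=2):
    """Return (frame, size) pairs whose file size is below `ratio` times
    the median size of up to `window` neighbouring frames on each side."""
    frames = []
    for path in Path(cache_dir).iterdir():
        match = FRAME_RE.search(path.name)
        if match:
            frames.append((int(match.group(1)), path.stat().st_size))
    frames.sort()

    suspects = []
    for i, (frame, size) in enumerate(frames):
        # Sizes of nearby frames, excluding the frame being checked.
        neighbours = [
            s for j, (_, s) in enumerate(frames)
            if j != i and abs(j - i) <= window
        ]
        if neighbours and size < ratio * median(neighbours):
            suspects.append((frame, size))
    return suspects
```

Run against a finished cache, this returns the frames worth trying to load before the cache is handed on; an empty list only means no frame was unusually small.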
When this happens, the simulation is not usable, and the sim must be run again from the beginning, which is tremendously inconvenient.
We have not been able to determine any consistent cause for this. This may happen once a week, or sometimes 3-5 times a day. Sometimes we go days without seeing this, but it always returns.
Has anyone else experienced this? Any suggestions? I would really like to be able to trust our sim jobs more than I do, but now we have to watch them closely when the sim is for a shot with a deadline.
Thank you,
Fred
Help : CORRUPT .bgeo.sc FILES
- Fred P
- Member
- 7 posts
- Joined: April 2017
- Offline
- jsmack
- Member
- 8043 posts
- Joined: Sept. 2011
- Offline
Fred P
The server reports nothing unusual and has been working well. It is always possible, I suppose. I have suspected that it could be network-related, such as a switching issue, or perhaps a write cache issue on the server. But we have not found any proof of that yet.
Does the problem go away with 'background writes' disabled on the geometry ROP?
- adlabac
- Member
- 21 posts
- Joined: Sept. 2015
- Offline
Sure. It is pretty much the same as the author's original post. We launch sims via Deadline. All our project data is stored on a huge Isilon NAS. The render slaves run Rocky Linux 8.8. It happens randomly, so we can't reproduce it on purpose. I've sent a couple of corrupted files under this RFE, https://www.sidefx.com/bugs/#/bug/135536, and I can deliver more if needed. Because Houdini doesn't report any error during simulation, we only discover the corruption after the whole simulation is done and the cache is used.
We are using several Houdini versions ranging from 19.5.xxx to 20.0.xxx. In the past we used HQueue on Windows (also storing outputs on the Isilon) and never saw this happen. As I've already mentioned, setting HOUDINI_BUFFEREDSAVE didn't help us.
Because we switched to Deadline and Linux at the same time, we cannot tell which change is the cause.
Edited by adlabac - April 23, 2024 15:32:08
- Fred P
- Member
- 7 posts
- Joined: April 2017
- Offline
adlabac
Hi, we have exactly the same problem. Did you manage to solve this issue by setting the BUFFEREDSAVE variable?
I am sorry to say we could not find an absolute cause or definite solution to this issue.
Buffered save seemed to help reduce the frequency of this issue for us. At the time, I honestly thought it could have been caused by our server's write cache not being able to keep up with the number of nodes creating the .bgeo.sc files, because we only had this happen when many machines were simultaneously writing very large cache files.
We have not seen this issue since the original post, so I am afraid I can't share much more.
I do hope you get this resolved very soon; it is terrible when this is what slows down production.
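For anyone who wants to try the buffered-save setting on a farm job, a minimal sketch of enabling it only for the launched process is shown below. The helper names `build_houdini_env` and `run_sim` are hypothetical, the value `"1"` is assumed to be what enables the variable, and the command passed to `run_sim` is whatever your Deadline job actually launches. As noted in this thread, results were mixed: it seemed to reduce the frequency for us but did not help adlabac.

```python
import os
import subprocess


def build_houdini_env(base_env=None):
    """Return a copy of the environment with buffered save switched on.

    Assumption: HOUDINI_BUFFEREDSAVE=1 enables buffered saving.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HOUDINI_BUFFEREDSAVE"] = "1"
    return env


def run_sim(command):
    """Run the sim command (a list of arguments) with the modified environment.

    `command` is a placeholder for your own launch command.
    Raises subprocess.CalledProcessError if the process exits non-zero.
    """
    return subprocess.run(command, env=build_houdini_env(), check=True)
```

Setting the variable per process, instead of globally on the render nodes, makes it easy to compare jobs with and without it.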