I'm wondering if there is any advice around how to improve performance in Copernicus? Testing it out locally, it seems to run much slower than in any of the demos that I have seen.
A good example is the mask-painting content example that can be found here: https://www.sidefx.com/contentlibrary/texture-mask-paint/
If you look at the launch presentation here: https://player.vimeo.com/video/973697874 it shows the mask paint being drawn, and updating the quick material from Copernicus a bit slowly, but just about interactively, at maybe 5-10 fps or so.
I've loaded this file up on a couple of beefy workstations now, one with a 3090 and another with a 4090, with plenty of CPU and GPU horsepower, and I'm getting about 1 fps updates if I change the mask paint. Looking at a performance capture, a single update takes about 1.08s, with the biggest culprits being the monotosdf node (0.166s), idtomask (0.149s), the two dilate erodes (about 0.11s each) etc.
The copnet node has parameters on it that sound like they could speed it up, like enabling Compiled Cook, or lowering the Default Resolution or the Precision, but none of these seem like they have any impact at all on the quality or the speed of the cook. Even the Proxy 1:2 tickbox makes no difference to the resolution or speed of the network.
I've gone through the docs, but I feel like I'm missing some key part of understanding how this should be set up correctly?
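As a rough tally of the performance capture numbers quoted above, here is a minimal sketch that adds up the named nodes and compares them to the total cook time. The variable names are made up for illustration, and the two dilate/erode entries use the approximate "about 0.11s each" figure from the post.

```python
# Timings as quoted in the post above (seconds). The two dilate/erode
# entries use the approximate "about 0.11s each" figure.
total_cook_s = 1.08
named_nodes_s = {
    "monotosdf": 0.166,
    "idtomask": 0.149,
    "dilate/erode (first)": 0.11,
    "dilate/erode (second)": 0.11,
}

# How much of the cook do the named "culprit" nodes explain?
named_total_s = sum(named_nodes_s.values())
remainder_s = total_cook_s - named_total_s
named_share = named_total_s / total_cook_s

# Print each node, slowest first, with its share of the total cook.
for name, seconds in sorted(named_nodes_s.items(), key=lambda kv: -kv[1]):
    print(f"{name:24s} {seconds:.3f}s  {seconds / total_cook_s:5.1%}")
print(f"{'named nodes':24s} {named_total_s:.3f}s  {named_share:5.1%}")
print(f"{'everything else':24s} {remainder_s:.3f}s  {1 - named_share:5.1%}")
```

On these figures the four named nodes cover roughly half of the 1.08s cook, so the other half is spread across nodes that were not listed individually.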
Copernicus Performance
- 9of9
- Member
- 39 posts
- Joined: Oct. 2017
- Offline
- alexwheezy
- Member
- 297 posts
- Joined: Jan. 2013
- Offline
- 9of9
- Member
- 39 posts
- Joined: Oct. 2017
- Offline
I mean, yeah, I've even tried outright bypassing the most expensive nodes, like monotosdf and heighttoambientocclusion, but at best I can maybe get the cook time down from just over one second to maybe 0.9 or 0.8 seconds. I can completely break the material in that regard, and yet it will remain largely unusable and very, very far from the performance shown at the keynote - so I'm feeling like there's some broader principle missing than just optimising individual nodes.
As far as computation running on the GPU goes, this material does not look that complex, and my gut says this math should fundamentally run on a modern GPU in maybe a couple of milliseconds at most. Obviously there will be some overhead from Houdini, but compiling the graph should take care of that to some degree in principle. Even smaller, simpler networks made of very simple maths functions run orders of magnitude slower than they really should - I'd love to get a better sense of where the bottlenecks are, and what the base knobs for tuning overall system performance for Copernicus are. At the very least, getting up to the speed of execution shown at the keynote seems like it should be plausible.
Edited by 9of9 - July 16, 2024 12:35:00
- ikoon
- Member
- 207 posts
- Joined: Jan. 2016
- Offline
Hi 9of9, I am not sure if I am doing this right, but I have tried this:
- open the file as it is
- set the display flag to /obj/Texture_Mask_Paint/merge1
- set the brush to Erase on the /obj/Texture_Mask_Paint/texturemaskpaint1
- hit Enter in the viewport to start the Paint tool
- (for some reason the painting transform is reversed, but I didn't investigate)
I am getting ~4 fps, as in the gif.
My specs:
- gpu: 4090
- cpu: intel i9-12900K
- windows 11
- nvidia drivers: 560.70
If you want to reach support, I can give them my Houdini_Info.txt file (Help>About>Details)
- kodra
- Member
- 373 posts
- Joined: June 2023
- Offline
- Soothsayer
- Member
- 874 posts
- Joined: Oct. 2008
- Offline
- 9of9
- Member
- 39 posts
- Joined: Oct. 2017
- Offline
ikoon
Hi 9of9, I am not sure if I am doing this right, but I have tried this:
- open the file as it is
- set the display flag to /obj/Texture_Mask_Paint/merge1
- set the brush to Erase on the /obj/Texture_Mask_Paint/texturemaskpaint1
- hit Enter in the viewport to start the Paint tool
- (for some reason the painting transform is reversed, but I didn't investigate)
I am getting ~4 fps, as in the gif.
That is far more reasonable than my results! Will try to attach a video. Your frame time looks to be about 230ms on average - mine is about 1650ms if following those exact steps, so that's a 7x difference!
While 4fps is low, that does look closer to what was demoed and is at least somewhat usable - I can see that being something that could be optimised down across the specific nodes, but I can't see a way for me to claw back ~1400ms of frame time as it stands!
My specs:
- GPU: NVIDIA 4090
- CPU: AMD Ryzen 3970X 32 Cores (64 CPUs), ~3.7GHz
- RAM: 64GB
- OS: Windows 11
- Driver: 555.99
Edited by 9of9 - July 18, 2024 08:20:29
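The frame-time comparison in the post above can be checked with a few lines of arithmetic. This is only a sketch using the two approximate averages quoted there (about 230ms and about 1650ms); the function and variable names are illustrative.

```python
def fps(frame_time_ms: float) -> float:
    """Convert an average frame time in milliseconds to frames per second."""
    return 1000.0 / frame_time_ms


# Approximate average frame times quoted in the post (milliseconds).
ikoon_ms = 230.0
poster_ms = 1650.0

slowdown = poster_ms / ikoon_ms   # how many times slower the second machine is
gap_ms = poster_ms - ikoon_ms     # frame time that would need to be recovered

print(f"ikoon:  {ikoon_ms:.0f} ms/frame = {fps(ikoon_ms):.2f} fps")
print(f"9of9:   {poster_ms:.0f} ms/frame = {fps(poster_ms):.2f} fps")
print(f"slowdown: {slowdown:.1f}x, gap: {gap_ms:.0f} ms")
```

With these inputs the ratio comes out at a little over 7x and the gap at roughly 1400ms, matching the figures in the post.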
- ikoon
- Member
- 207 posts
- Joined: Jan. 2016
- Offline
I watched my GPU and CPU usage. CPU goes to about 11% (probably a single core at full load). GPU goes to 70-90%.
Single core performance of that i9 may be 60% higher than Ryzen's. I am not sure where else might be the difference. Maybe try to update the nvidia drivers too. (I have the Studio Drivers 560.70)
- 9of9
- Member
- 39 posts
- Joined: Oct. 2017
- Offline
That's a good shout - upgrading to the latest 560.70 driver, whether Game-Ready or Studio, improves my frametime to about 850ms on the 4090, approximately halving it (and it's still over 1200ms on the 3090!). Though that's still almost four times slower than yours!
My GPU utilisation is about 90-100% while painting, whereas CPU remains steady at 5%.
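To put the driver update in numbers, here is a small sketch using the frame times mentioned in this thread. It assumes the 850ms figure was measured with the same paint steps as the earlier 1650ms figure, which the post does not state explicitly; the variable names are illustrative.

```python
# Approximate frame times from the thread (milliseconds).
before_driver_ms = 1650.0   # 4090, driver 555.99
after_driver_ms = 850.0     # 4090, driver 560.70
reference_ms = 230.0        # ikoon's 4090 on driver 560.70

# Speedup gained from the driver update alone.
driver_speedup = before_driver_ms / after_driver_ms

# How far the updated machine still is from the reference machine.
remaining_slowdown = after_driver_ms / reference_ms
remaining_gap_ms = after_driver_ms - reference_ms

print(f"driver update speedup: {driver_speedup:.2f}x")
print(f"still slower than reference by: {remaining_slowdown:.2f}x")
print(f"frame time still to recover: {remaining_gap_ms:.0f} ms")
```

The update gives a bit under a 2x speedup, while the remaining gap to the reference machine stays at a little under 4x.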