Pure OpenGL Geometry cache representations (BGEO + Alembic)

Member
763 posts
Joined: Sept. 2011
We've recently been looking into optimising Houdini scene representations that contain a lot of geometry. I imagine most studios are the same in that artists have the option of loading in everything if required, i.e. set, char, prop, camera, crowd etc.

Unfortunately, we find that compared to other packages Houdini never seems to handle lots of poly cache data well in any format (bgeo/Alembic/custom), and FPS and responsiveness really start to suffer with lots of large caches. Looking at a few other threads, others seem to be seeing the same, e.g.: http://www.sidefx.com/index.php?option=com_forum&Itemid=172&page=viewtopic&t=30310 [sidefx.com]

After working on a recent show that defaulted to displaying all geo caches primarily in OpenGL (via a custom render hook), it was a pleasant change to see how fast things could get. Maya can now do a similar thing with its Alembic CPU caching (with a nice added threading feature); see this impressive video from 2:30 onwards: http://www.youtube.com/watch?v=RkfVxFti-Sw [youtube.com]

In fact, we just tried to implement a similar thing with Alembic in Houdini by updating Alembic's free simpleABCviewer to OpenGL 3.3 and calling it through a custom GR render hook. Compared to the default H13 Alembic importer SOP, it's incredible to see multiple 300mb+ Alembic files play back in realtime, something we can't achieve with the default Alembic importer unless we switch to bounds only.


My question is: Is it time that Houdini should provide similar options to load its native geo formats as pure GL representations only? Packed prims and the Alembic load options do help but are still far too slow. When you consider that full scenes may contain large amounts of geo, of which the artist only needs 10-30% as geometry to interact with (probably less in a lighting department) while the rest is for visual reference, it makes sense to be able to push the rest to the GPU/OpenGL.

Note: As long as you stick to a methodology (similar to Alembic packed prims) of inserting one point that stores the original cache path as a point attribute, the caches can still be passed to the renderer for later expansion. Also, if the artist actually wants to manipulate the geometry or work with it in a sim, they can just change that specific File SOP or Alembic read node to the full geometry representation.
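
To illustrate the stand-in idea in the note above, here is a minimal Python sketch. It is not the render hook described in this post and does not use any Houdini API; the class, the function names and the "cache_path" attribute name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StandInPoint:
    """A single point that stands in for a whole cache on the geometry side."""
    position: tuple
    attribs: dict = field(default_factory=dict)


def make_stand_in(cache_path, position=(0.0, 0.0, 0.0)):
    # "cache_path" is an assumed attribute name, chosen for illustration only.
    return StandInPoint(position=position, attribs={"cache_path": cache_path})


def paths_for_renderer(points):
    """Collect the cache paths a renderer would expand at render time."""
    return [p.attribs["cache_path"] for p in points if "cache_path" in p.attribs]
```

The viewport would draw the cache from the GPU side, while only the one point per cache travels through the geometry pipeline and on to the renderer.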


P.S. I would love to upload some typically slow example files, but as stated this is for heavy production assets (usually 200mb+).
Miles Green, Supervising TD, Animal Logic
Member
6 posts
Joined:
I was thinking the same thing. It would be sweet to have Alembic and bgeo be offloaded to the video card with the Custom Primitive. The tech is there.

This would make me so happy. OMG, if we could get this for volumes (I guess standard bgeo would take care of this), my life would be so much better.
Staff
5212 posts
Joined: July 2005
In fact, we just tried to implement a similar thing with Alembic in Houdini by updating Alembic's free simpleABCviewer to OpenGL 3.3 and calling it through a custom GR render hook. Compared to the default H13 Alembic importer SOP, it's incredible to see multiple 300mb+ Alembic files play back in realtime, something we can't achieve with the default Alembic importer unless we switch to bounds only.
Being a relatively new feature to H13.0, packed primitives were still not fully optimized when it was released due to its very short release cycle. Alembic primitives are all packed primitives underneath, so they share some of the same optimization issues.

That being said, we've made substantial optimizations in the current development build which unfortunately can't be backported to 13.0 because of their scope. But just in case, if you would like to upload a slow Alembic to our ftp site (following the instructions @ customer support [sidefx.com]) we can take a look and see if there are any small/safe optimizations to be made to H13.

There are also some features that the native alembic primitive renderer provides that render hooks likely don't, such as selection/picking, attribute inspection, various display modifiers (ghosting), which cut into performance a bit.

My question is: Is it time that Houdini should provide similar options to load its native geo formats as pure GL representations only?
Packed primitives pretty much do this already. There is a very lightweight format that packed geometries are loaded into (to generate normals, convex, etc) which is used to feed data to GL.

You can also load bgeos as Packed Disk Primitives in the File SOP (Load As: Packed Disk Primitive, Display As: Full Geometry). It limits what you can do in SOPs in terms of geometry modification, but it does not go through the normal GU_Detail path at all for the packed geometry.
Member
763 posts
Joined: Sept. 2011
That being said, we've made substantial optimizations in the current development build which unfortunately can't be backported to 13.0 because of their scope.

OK, I'll reserve my judgement on packed primitives until the next release, but will continue to use Maya's GPU cache as the speed benchmark to test against when it releases. In the meantime I'll also try and get a suitable non-production asset to upload to your ftp. If I can, I'll also send a compiled version of our GR hook so you can see the difference in speed for yourself.

There are also some features that the native alembic primitive renderer provides that render hooks likely don't, such as selection/picking, attribute inspection, various display modifiers (ghosting), which cut into performance a bit

That's my main point: in my experience, 60 percent or more of the geometry in a production scene never needs to be selected/picked or have its attributes inspected; it just needs to be visible for scene/shot reference.

Note: We've also found that with the pure GL render hook method you can do nice tricks like switching to bounding boxes/points as objects or parts of an object reach a certain distance from camera, further speeding things up. In extremely heavy geo cases you could also start to skip a percentage of polys. Doing all this the packed primitive way would likely involve unpacking the data, adding significant time, or having to switch it manually to bounds/points at the SOP load level.
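
As a rough illustration of the distance-based switching and poly skipping mentioned in the note above, here is a small Python sketch. The thresholds, the function names and the order of the switch (full, then points, then bounds) are assumptions made for the example, not details of the actual hook.

```python
import math


def choose_representation(camera_pos, bbox_center,
                          full_dist=50.0, points_dist=200.0):
    """Pick a display mode from the camera-to-object distance.

    The thresholds are hypothetical scene units, not values from the thread.
    """
    distance = math.dist(camera_pos, bbox_center)
    if distance <= full_dist:
        return "full"
    if distance <= points_dist:
        return "points"
    return "bounds"


def skip_polys(polys, keep_every):
    """Keep every Nth polygon, e.g. keep_every=2 draws roughly half of them."""
    if keep_every < 1:
        raise ValueError("keep_every must be >= 1")
    return polys[::keep_every]
```

Because the decision only needs a bounding-box centre, it can be made per object (or per part of an object) without touching the cached geometry itself.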

One last question: does the Alembic or File SOP for bgeos (packed primitive or not) have the ability to read ahead into the cache so that subsequent frames can be preloaded for faster playback?
Miles Green, Supervising TD, Animal Logic
Staff
5212 posts
Joined: July 2005
milomilo
OK, I'll reserve my judgement on packed primitives until the next release, but will continue to use Maya's GPU cache as the speed benchmark to test against when it releases. In the meantime I'll also try and get a suitable non-production asset to upload to your ftp. If I can, I'll also send a compiled version of our GR hook so you can see the difference in speed for yourself.

Thanks! That would be very helpful, and would make sure we're on the same page. I'd hate to make performance claims only to find out your Alembic scene was different enough not to benefit from them.

That's my main point: in my experience, 60 percent or more of the geometry in a production scene never needs to be selected/picked or have its attributes inspected; it just needs to be visible for scene/shot reference.

Unfortunately it's somewhat difficult to predict exactly what the user wants to do with the geometry, but we are continually expanding our on-demand approach, which should help in these situations. The Selectable flag would also be a good hint for the Alembic Archive case. Perhaps some way to load alembics as “passive” geometry (or any geometry, for that matter) would help as well.

Note: We've also found that with the pure GL render hook method you can do nice tricks like switching to bounding boxes/points as objects or parts of an object reach a certain distance from camera, further speeding things up. In extremely heavy geo cases you could also start to skip a percentage of polys. Doing all this the packed primitive way would likely involve unpacking the data, adding significant time, or having to switch it manually to bounds/points at the SOP load level.

We've started down this path with instance reduction/standins (for point instancing) and have frustum culling implemented, so this would be a logical extension of that.
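
For readers unfamiliar with frustum culling, a generic Python sketch of the usual bounding-box test follows. It is a textbook-style illustration under assumed conventions (planes stored as (nx, ny, nz, d) with the inside where n.p + d >= 0) and is not how Houdini implements it.

```python
def aabb_outside_plane(bmin, bmax, plane):
    """True if the whole box lies on the negative side of the plane."""
    nx, ny, nz, d = plane
    # Take the box corner that lies furthest along the plane normal.
    px = bmax[0] if nx >= 0 else bmin[0]
    py = bmax[1] if ny >= 0 else bmin[1]
    pz = bmax[2] if nz >= 0 else bmin[2]
    # If even that corner is behind the plane, the box cannot be visible.
    return nx * px + ny * py + nz * pz + d < 0


def is_visible(bmin, bmax, planes):
    """Conservative test: cull only boxes fully outside at least one plane."""
    return not any(aabb_outside_plane(bmin, bmax, p) for p in planes)
```

The test is conservative: a box near a frustum corner may be kept even though it is not actually visible, but nothing visible is ever culled.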

Do you find that tumble speed is also adversely affected by these scenes, or is it more update speed (ie, animation) that is a problem?

One last question: does the Alembic or File SOP for bgeos (packed primitive or not) have the ability to read ahead into the cache so that subsequent frames can be preloaded for faster playback?

Currently, no. Read-ahead would be more helpful for HDF alembics than Ogawa, I'd think, as the latter substantially reduces the Alembic overhead. It might be able to pre-cache certain aspects of the Alembic file, like animated transforms or visibility, which could help certain files quite a bit.
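
To make the read-ahead idea concrete, here is a minimal Python sketch of a prefetching frame cache. It only illustrates the concept being asked about; the class name, the load_frame callback and the lookahead value are hypothetical, and nothing here reflects an existing Houdini feature.

```python
from concurrent.futures import ThreadPoolExecutor


class ReadAheadCache:
    """Loads the requested frame and schedules the next few in the background."""

    def __init__(self, load_frame, lookahead=2, workers=2):
        self._load = load_frame          # hypothetical loader: frame -> data
        self._lookahead = lookahead
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self.pending = {}                # frame number -> Future

    def _schedule(self, frame):
        if frame not in self.pending:
            self.pending[frame] = self._pool.submit(self._load, frame)

    def get(self, frame):
        # Request the current frame plus the frames just ahead of it.
        for f in range(frame, frame + 1 + self._lookahead):
            self._schedule(f)
        # Forget frames behind the playhead so memory use stays bounded.
        for old in [f for f in self.pending if f < frame]:
            del self.pending[old]
        return self.pending[frame].result()

    def close(self):
        self._pool.shutdown(wait=True)
```

During forward playback, the frame requested next has usually finished loading by the time it is needed, so the viewer only waits on disk for the first frame.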
Member
763 posts
Joined: Sept. 2011
Do you find that tumble speed is also adversely affected by these scenes, or is it more update speed (ie, animation) that is a problem?

We find that changing the animation time has the biggest lag, but we notice the tumble time is also a bit slower and steppier compared to the pure GL representation.

Perhaps some way to load alembics as “passive” geometry (or any geometry, for that matter) would help as well.

That's what we're trying to do with our implementation: the user gets just a GL representation and one point per archive by default, but can always switch to geometry if needed with a simple switch attribute.

It might be able to pre-cache certain aspects of the Alembic file, like animated transforms or visibility, which could help certain files quite a bit.

The current example I have has animations purely driven by transforms, with no changing point count or deformation, so some caching could help. In our current pure GL implementation we may even look at caching the VBOs per frame so that once a run-through of the timeline has taken place it's quicker the next time. Plus, when an asset is replicated multiple times (which we often see) with position or time offsets, a simple lookup into the cache can be made for even quicker display next time.
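
A minimal Python sketch of the per-frame cache idea described above, for illustration only. The class, the build_buffer callback and the way a time offset maps to a source frame are assumptions; a real version would hold GL buffer handles and free them on eviction.

```python
from collections import OrderedDict


class FrameBufferCache:
    """LRU cache of per-frame buffers, shared by all copies of an asset."""

    def __init__(self, build_buffer, max_entries=256):
        self._build = build_buffer       # hypothetical: (path, frame) -> buffer
        self._max = max_entries
        self._entries = OrderedDict()

    def get(self, cache_path, frame, time_offset=0):
        # A time-offset copy simply samples a different source frame, so it
        # can reuse a buffer that another copy has already built.
        key = (cache_path, frame + time_offset)
        if key in self._entries:
            self._entries.move_to_end(key)
            return self._entries[key]
        buf = self._build(*key)
        self._entries[key] = buf
        if len(self._entries) > self._max:
            # Evict the least recently used entry (a GL version would also
            # delete the underlying buffer here).
            self._entries.popitem(last=False)
        return buf
```

Position offsets do not need to be part of the key: each copy can draw the same cached buffer with its own transform.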

We recently updated our NVIDIA drivers to see if that would help, but it made no difference.

Current GPU specs:
--------------------

Platform: linux-x86_64-gcc4.4
Operating System: CentOS release 6.5 (Final)
Number of Cores: 16
Physical Memory: 31.28 GB

OpenGL Vendor: NVIDIA Corporation
OpenGL Renderer: Quadro 4000/PCIe/SSE2
OpenGL Version: 4.3.0 NVIDIA 319.60
OpenGL Shading Language: 4.30 NVIDIA via Cg compiler
Viewport Render Version: GL 3.3
Detected: NVidia Professional
          2048 MB
          319.60.0.0
--------------------


P.S. I uploaded an example file to support for you to test.
Miles Green, Supervising TD, Animal Logic
Member
4189 posts
Joined: June 2012
milomilo
In our current pure GL implementation we may even look at caching the VBOs per frame so that once a run-through of the timeline has taken place it's quicker the next time. Plus, when an asset is replicated multiple times (which we often see) with position or time offsets, a simple lookup into the cache can be made for even quicker display next time.

This sounds very good! Out of interest is it possible to release your pure GL implementation to the public? Thanks!
Member
763 posts
Joined: Sept. 2011
Out of interest is it possible to release your pure GL implementation to the public?

If it holds up we'll look into it (company policy permitting), but ideally this is something I'd like to see SESI take on, as they have the scope to implement it across the board and maintain it for Houdini, i.e. for bgeo and all other formats and not just for Alembic as we have done.

I also guess we will all have to eagerly await H14's packed primitive updates, as it sounds like twod may have made some significant GL speed improvements that may match what we have done in pure GL.
Miles Green, Supervising TD, Animal Logic
Member
4189 posts
Joined: June 2012
Awesome! Appreciate it.

Also definitely worth prodding twod/SESI with some screen recordings of what you can achieve too. Competition can be very productive!
Staff
5212 posts
Joined: July 2005
milomilo
when an asset is replicated multiple times (which we often see) with position or time offsets, a simple lookup into the cache can be made for even quicker display next time

This is also something that's fairly high on our list of optimizations.

I've also profiled through your example and found a few places where more work was being done than required in some cases, which improved animation performance quite a bit (50% faster). Caching the animated transforms would also remove a solid chunk of work (around 30-35% reduction in cook time after the first cook), so there still seems to be a bit of low-hanging fruit available in this case.

I'm not sure how much can be backported to 13.0 yet, however.