Scenegraph optimizations

From FlightGear wiki
Jump to navigation Jump to search
This article is a stub. You can help the wiki by expanding it.

This page is intended to gather comments related to optimizing our scene graph, which is a long-standing idea to address the issue of FlightGear being CPU bound for many end users.


RFC / 09/2020

Currently OpenGL wise we are basically geometry setup bound - at least for the models. This really means that vertices are not an issue. That still means that for setting up that one draw with 3 triangles is about as heavy as setting up say 500 triangles, but the conclusion of this is *not* that you should schedule as many triangles ass possible. The right conclusion is to collapse as many triangles as *sensible* for culling into a single draw.

A simple help is to switch on the onscreen stats in the viewer or flightgear and see if the yellow bar is longer than the orange one. A yellow bar longer than the orange one means that the cpu is not able to saturate the gpu. Again, beware this does in no way mean that you should write complicated shaders to saturate the gpu! This rather means that the geometry setup/state change time - the yellow one - must decrease![1]


Appart from OpenGL we spend a lot of time in scenegraph traversal. This is mostly due to plenty of structural and often needless nodes in the scenegraph. The reason or this is that historically the xml files did some implicit grouping for *every* animation in the xml file. To make that reilably work I had to introduce a huge amount of group nodes in the xml file loader. These really hurt today as they introduce a whole lot of group nodes that just have a single child which need to be traversed for update and for cull. Profiling shows that Group::traverse is the most used function in flightgear. The lowest hanging fruit could be to optimize away the redundant nodes from the top level model that gets loaded by a database pager request. We cannot do that recursively once a model part is loaded since the mentioned grouping nodes might be referenced in any level in the model hierarchy above the currently loaded model. So only the top level model could do this without braking tons of models.[2]

Flattening Transforms

To flatten the LOD quadtrees and transforms of the tiles. Each tile will get some top-level LOD groups for all objects (shared and static). I'm hoping in combination with the LOD-scale function in OSG, this will mean we can automatically drop random tress / building and STG objects to keep frame-rate up. (as an optional mode of course!)[3]

The trick is to minimize the number of LOD nodes introduced into the scene graph while avoiding popping effects.[4]

As already discussed this is good to do. But some of them are critical to stay like the are or at least similar. The first one that positions objects with respect to the earth centered coordinate system for precision reasons.

Also these coordinate systems coudl for drawing reasons aligned in any direction. But as long as we do simulations using the same tree and geometry alignments that we do for draw this still interferes with the bounding volumes we have for ground intersections. And this axis aligned bounding box implementation gains a lot by having the boxes horizontally aligned. Todays transforms are done so that huger things are aligned with the horizont which serves this purpose.[5]


With gDebugger that the sky dome is made of more than 23000 vertices, even if a small amount of them are really displayed on screen.

In Rembrandt, several pass are done on 1/16th of the size of the screen (1/4th of each dimension), relying on the mag filter to cover the whole screen. This is the case of the blur pass in the AO or Bloom effects.[6]

In Fred's other terrain engine project, the sky is drawn with a quad (4 vertices at the corners of the screen). The view direction is computed in the vertex shader and then interpolated in a varying and renormalized in the fragment shader. That way there is no wasted computation (except the region overdrawn by the terrain, but that could be optimized with a stencil buffer)[7]