Understanding Rembrandt
Background
Many of us are wondering why Rembrandt is so slow for us, despite having fairly powerful computers.
There’s plenty of other people with reasonable hardware who have problems with Rembrandt performance - me for example :) And while that may be related to settings, being a lay Mac user I don’t like solutions which require extensive configuration to give acceptable performance. - James Turner (Nov 27th, 2013). Re: [Flightgear-devel] Rendering strategies.
With an i3770K and a GTX670, I get some hit from ALS (10-30%), but Rembrandt instantly drops me to 20fps, and < 10fps if I use an aircraft I actually want to fly (777 or Citation) and go to any major airport (EGKK, EHAM, EDDM, EDDF, EGLC, VHHH). This is at 2560x1600, but on the 670 I would be highly surprised if I'm fill-rate limited, given that AA is off, and the general suboptimal size of our primitive batches. Emilian has explained on IRC this might be due to the out-of-the-box / default config for Rembrandt being highly suboptimal, which I didn't yet evaluate; I would be delighted to have it more usable. I'm going to test further over the weekend. - James Turner (Jun 20th, 2013). Re: [Flightgear-devel] reminder: entering feature freeze now.
Objective
Try to better understand, and document, Rembrandt internals so that we can troubleshoot performance issues more effectively.
OK, I just tried Rembrandt again (after spending 5 minutes reading the wiki article), and while my computer is much less powerful than yours, I am also getting roughly 15 fps at KSFO with the ufo. Looking at the OSG stats, there is obviously a lot going on: we have 10 different cameras, 3 of them extremely busy (0, 1 and 5).
The performance is actually decent, though ALS with all goodies maxed out gets 25 fps for the same scene under Linux (there's the grass shader needing lots of noise calls for instance). I don't get a huge green bar.
The big hit comes when I try to see Las Vegas (with the Urban shader) - that drives me down to 3 fps. Or when I try to activate filtering and switch it to 3 - then my framerate likewise dives down to 4.
So I remember how this went - the base performance of Rembrandt without shadows was actually pretty decent on my box under Windows, 30 fps or so. Switching shadows on cost me some of that but was still flyable, but the shadows were flickering so much that I got a headache after 5 minutes, so I needed to switch the filtering to max. to be able to look at it - and that killed the framerate for good.
If this is something that we really want to investigate more closely, I guess it would be a good idea to read the "deferred rendering" paper that Fred linked to in the article - at least those parts describing the 3 cameras that seem really busy (geometry/shadows/lighting)
Scene Complexity
Purely from a troubleshooting standpoint, I would like to know what kind of effect/impact we can expect from discarding vertices/triangles and quads from all three cameras (having 10 fps even at night time seems very odd), i.e. whether discarding those translates into any proportional/tangible performance gains.
Actually, the base idea of deferred rendering is that it should be pretty insensitive to the number of vertices you feed to it, because it really has a minimal geometry-pass shader (computationally cheaper than default even, it basically only notes where stuff is on the screen and stores the non-projected position in a buffer) and all the actual work of lighting etc. is done in the fragment pipelines. So I'd be very surprised if it responds at all to changes of the vertex load.
I sort of see this on my card - if I'm fragment-limited, it switches to synchronized framerates: I get either 25 or 30 or 60 fps, but not 33 or 47. It is completely different when the vertex shader jams; then I get to see arbitrary numbers. That is a neat first-look diagnostic. Rembrandt is clearly fragment-dominated on my box.
The thing is, it only takes a few errors in the C++ code to massively inflate the number of primitives sent to effects/shaders. And Rembrandt is obviously not well understood now that Fred is not currently maintaining it. So there might be some low-hanging fruit there, but I am not going to spend hours going through the code unless I see tangible results.
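The "synchronized framerates" observation can be made concrete with a little arithmetic. The sketch below assumes plain double-buffered vsync, where a finished frame waits for the next display refresh, so the visible rate is the refresh rate divided by a whole number of refresh intervals. The function names and the 60 Hz figure are illustrative assumptions, not FlightGear or OSG code.

```cpp
// Toy model of vsync-quantized frame rates (illustrative, not FlightGear code).
// Assumption: double-buffered vsync, so every frame occupies a whole number
// of display refresh intervals.

// Number of refresh intervals one frame occupies, given its render time in
// microseconds and the display refresh rate in Hz (rounded up, at least 1).
constexpr long long refreshIntervalsPerFrame(long long renderTimeUs, long long refreshHz) {
    const long long n = (renderTimeUs * refreshHz + 999999LL) / 1000000LL;
    return n < 1 ? 1 : n;
}

// Frame rate the user would see under that assumption.
constexpr double displayedFps(long long renderTimeUs, long long refreshHz) {
    return static_cast<double>(refreshHz) /
           static_cast<double>(refreshIntervalsPerFrame(renderTimeUs, refreshHz));
}
```

Under this model a 60 Hz display can only show 60, 30, 20, 15, ... fps: a 20 ms frame lands on 30 fps and a 35 ms frame on 20 fps, which is why readings such as 33 or 47 fps stand out as a sign that a different limit is in play.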
As far as I understand the wiki article, most stages are XML configurable, so we can probably customize things a bit there, or even disable certain cameras/stages - which would make sense to see if each camera's stage looks sane.
For starters, I would probably start up at night time above sea - i.e. "minimal startup profile" and see what Rembrandt is doing then in each stage. The number of vertices etc. should be fairly minimal then, shouldn't it?
OK, when going to zero-scenery places, I am getting rock-solid 60 fps/25ms here (daytime), with Rembrandt running with aircraft shadows, even with maxed out settings. Can we work with that? What about you? I remember your "orbitview" (?) project where you placed a huge sphere into the scenery. Could this help us to do some troubleshooting, i.e. using Nasal to place a few models (and possibly light sources) and see what's having an impact?
The other thing I noticed is that CPU load doesn't seem to decrease despite the AGL/ASL altitude being too high to realistically cast shadows - would these be things that we could add to the effects/shaders to reduce the Rembrandt workload a bit?
Reducing Complexity?
What would be involved in editing effects/shaders to simply discard 50% of all vertices? I just want to see for myself whether that has an effect here or not.
In a vertex shader it's fairly difficult to do. To effectively discard a vertex, you need to evaluate some criterion based on its coordinates/attributes and then, if that criterion is true, move it out of the view frustum and return, so you run a 'minimal' set of operations.
The obvious thing to do if you want to test the response to vertex numbers is to set visibility lower, so that terrain simply does not show up at the vertex shader at all, in a controllable way. A theoretically elegant way, if you can, is to set random numbers as vertex attributes and to move the vertex out of the view frustum if the random number is smaller than a threshold. But in passing attributes, you're of course changing the pipeline in a substantial way...
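The random-attribute idea could be sketched roughly as follows. The GLSL text is held in a C++ string, as it would be before being handed to the GL shader compiler. The attribute `rnd` and the uniform `dropThreshold` are hypothetical names that FlightGear does not provide, and `ftransform()` merely stands in for the real shader body. The small C++ helper mirrors the clip test that makes the parked vertex invisible.

```cpp
#include <string>

// GLSL vertex shader text for the experiment (GLSL 1.20 syntax).
// Hypothetical names: `rnd` (per-vertex random number in [0,1)) and
// `dropThreshold` would have to be supplied by the application.
const std::string kDropVerticesVert = R"(#version 120
attribute float rnd;
uniform float dropThreshold;
void main() {
    if (rnd < dropThreshold) {
        // Park the vertex outside the clip volume and stop early.
        gl_Position = vec4(2.0, 2.0, 2.0, 1.0);
        return;
    }
    gl_Position = ftransform();  // stand-in for the real shader body
}
)";

// CPU-side mirror of the clip test: a clip-space point survives only if
// -w <= x, y, z <= w.
constexpr bool insideClipVolume(double x, double y, double z, double w) {
    return -w <= x && x <= w && -w <= y && y <= w && -w <= z && z <= w;
}
```

Note that a triangle only disappears when all three of its vertices are parked; if just one or two are moved, the triangle is stretched towards the parking position instead, so for a clean test the random number would need to be identical for all vertices of a primitive.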
If you want to test scaling of fragment shaders, that's much easier to do - evaluate another criterion in the first line which is true half of the time (say whether you're in the right half of the screen, you can test against gl_FragCoord which is the fragment position on the screen) and then insert a discard; if you want to dump the fragment without computing anything.
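A minimal sketch of that half-screen test is shown below, again as GLSL text inside a C++ string. The uniform `halfWidth` (half the viewport width in pixels) is a hypothetical name that the application would need to set, and the `gl_FragColor` line is only a stand-in for the real shader body. The C++ helpers mirror the criterion on the CPU to show what fraction of fragments it drops.

```cpp
#include <string>

// GLSL fragment shader text (GLSL 1.20 syntax).
// Hypothetical uniform: `halfWidth` = half the viewport width in pixels.
const std::string kHalfScreenDiscardFrag = R"(#version 120
uniform float halfWidth;
void main() {
    if (gl_FragCoord.x > halfWidth)
        discard;                  // right half: dump the fragment, no further work
    gl_FragColor = vec4(1.0);     // stand-in for the real shader body
}
)";

// CPU mirror of the criterion. gl_FragCoord.x refers to the pixel centre,
// i.e. column index + 0.5.
constexpr bool wouldDiscard(int column, double halfWidth) {
    return (column + 0.5) > halfWidth;
}

// Number of fragments dropped for a full-screen pass of the given size.
constexpr long countDiscarded(int width, int height) {
    long n = 0;
    for (int x = 0; x < width; ++x)
        if (wouldDiscard(x, width / 2.0)) n += height;
    return n;
}
```

If the frame rate roughly doubles with this change in place, the scene is most likely fragment-limited; if it barely moves, the bottleneck is probably elsewhere.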
Profiling
I am going to check what the C++ runtime profile looks like in comparison to the classical renderer.
Okay, I left Rembrandt running for 10 minutes with the profiler enabled: fgcommand("profiler-start"). For some reason, the profile showed that "osgParticles" were eating up /some/ resources despite not being enabled. I tried to explicitly disable them, but that did not change anything, so I removed the corresponding subsystem from src/Main/fg_init.cxx, which gave me +8 fps:
diff --git a/src/Main/fg_init.cxx b/src/Main/fg_init.cxx
index 86494da..04d6f42 100644
--- a/src/Main/fg_init.cxx
+++ b/src/Main/fg_init.cxx
@@ -635,9 +635,11 @@ void fgCreateSubsystems(bool duringReset) {
// Initialize the scenery management subsystem.
////////////////////////////////////////////////////////////////////
+#if 0
globals->get_scenery()->get_scene_graph()
->addChild(simgear::Particles::getCommonRoot());
simgear::GlobalParticleCallback::setSwitch(fgGetNode("/sim/rendering/particles", true));
+#endif
////////////////////////////////////////////////////////////////////
// Initialize the flight model subsystem.
@@ -1077,8 +1079,10 @@ void fgStartNewReset()
simgear::clearEffectCache();
simgear::SGModelLib::resetPropertyRoot();
-
+
+#if 0
simgear::GlobalParticleCallback::setSwitch(NULL);
+#endif
globals->resetPropertyRoot();
fgInitConfig(0, NULL, true);
Shadows at night time
Well, something seems a bit odd there, because Rembrandt is doing shadows, right? And when I switch to night time mode, I am still just getting 10-15 fps and seeing similar subsystem/OSG activity.
As far as I know, it's doing shadows at night just as well. If you drive your ufo to a Rembrandt-defined light source, I think it will cast a shadow also at night. I don't think the shadow part is off just because the sun is down.
Wrong, Rembrandt is not doing shadows at night time: https://sourceforge.net/p/flightgear/flightgear/ci/5ccc83566785c9b5b75e8d03579dbd1aa45d7237/tree/src/Viewer/renderer.cxx#l938
The conceptual beauty of shadows in Rembrandt is that they're not faked: there's actually a physical computation going on of where light in the scene reaches and where it doesn't. The downside is that unless there's a huge amount of filtering going on, that computation suffers so much from limited numerical accuracy that it flickers all over the place.
The thing about Rembrandt and light sources is true, I remember seeing screenshots - but I cannot imagine that the amount of computation for a handful of airport lights should equal that for a "central" light source illuminating everything (aka the sun)?
I invite you to start FlightGear, enable the Rembrandt shadows and look at your framerate, then disable the Rembrandt shadows and look at your framerate again. Conclusion: how many FPS do the Rembrandt shadows cost? (Here the answer is 3~5 FPS, so Rembrandt shadows are not the "FPS killer".)
It depends on your graphics card and a lot on the aircraft. A well-made aircraft that is not split into many submodels and objects shows significantly less impact than other aircraft with shadows. The shadow distance defined with the cascades also has a big influence. It is true that shadow rendering in Rembrandt is not the main reason for the comparatively lower fps, but it can still have a big impact depending on graphics card, aircraft models and scenery complexity.
Basically, it seems there's no "LOD for shadows" taking place, i.e. computations are heavy/complex despite there being no light sources in your vicinity that could realistically have an effect. I would have expected the corresponding shaders/effects to look specifically for light sources so that the computations kick in only then - unless Rembrandt is even doing moonlight shadows?
The shadow cascades act like a "shadows LOD": you can tweak this setting at runtime and see how Rembrandt shadows are updated. After that you will conclude that "LOD for shadows" is implemented after all.
Also, those cascades have an influence on fps, as it is the number of objects, not vertices, that influences performance in Rembrandt. The more objects you have, the lower your framerate. With increasing cascade distance you get more objects to draw in scenery/aircraft, and fewer fps.
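The relationship between cascade distance and object count can be illustrated with a toy model. This is not Rembrandt's actual culling code: it simply assumes that every object closer than the far shadow distance has to be drawn again for the shadow pass, and the distances used are made-up sample values.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sample scene: distance of each object from the viewer in metres.
constexpr std::array<double, 6> kObjectDistancesM{5.0, 40.0, 90.0, 350.0, 1200.0, 4000.0};

// Toy model: count the objects that fall inside the far shadow distance and
// would therefore be submitted to the shadow pass.
template <std::size_t N>
constexpr std::size_t objectsInShadowRange(const std::array<double, N>& distances,
                                           double farRangeM) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < N; ++i)
        if (distances[i] <= farRangeM) ++count;
    return count;
}
```

In this sample scene, widening the shadow range from 100 m to 1500 m raises the number of shadow-casting objects from 3 to 5, independent of how many vertices each object has.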
Zero Scenery Tests
Using --aircraft=ufo --enable-rembrandt --prop:/sim/rendering/shadows/map-size=8192 --prop:/sim/rendering/shadows/num-cascades=4, I can even get way beyond 60 fps when there's no scenery to be displayed. It would be interesting to check what Fred did there, i.e. whether there are heuristics in place to recognize this. We would probably want to add a few static models to the scenery and see what the performance impact is like.
I think through roughly a dozen different test cases like these, one could incrementally understand Rembrandt and its stages. Obviously, one would now need to edit some of the XML files and maybe some effects/shaders to see how things are affected.
Internals
Internally, a Rembrandt buffer is not much different from any other RTT context. Canvas is all about rendering to a dynamic texture and updating it dynamically by modifying a sub-tree in the property tree, but its primary primitives are 1) osgText, 2) shivaVG/OpenVG paths, 3) static raster images, 4) groups/maps - none of these would be particularly useful in this context. But Zan's newcamera work could be turned into a new "CanvasCamera" element to allow camera views to be rendered to a Canvas, including not just scenery views but also individual rendering stages. Canvas itself maintains an FBO for each texture, which is also the mechanism in use by Rembrandt. Tim's CameraGroup code is designed such that it exposes a bunch of windowing-related attributes to the property tree; equally, our view manager is property-controlled.
Both Fred and Mathias seemed pretty eager to adopt Zan's work back then: https://www.mail-archive.com/flightgear ... 36481.html