Howto:Use the system monitor

From FlightGear wiki
Jump to: navigation, search

Profiling

For developers, and people building from source, FlightGear versions >= 2.9 provide real profiling support using Google PerfTools, for details please see Built-in Profiler.

Terms

Cquote1.png those systems are not "processes", but just "loops" - i.e. whatever the thing is called in the list, you should append "update-loop" to it - and some names are still misleading (events=Nasal).
— Hooray (Nov 13th, 2015). Re: Su-15.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png FG is basically using cooperative multi-tasking-imagine it like a team of people (15+), all of whom need to finish their work and hand it over to the next person, so that a single frame can be produced - with some people (subsystems) being allowed to redo/check their work several times in a row (fdm, autopilot), while others will only get a single opportunity to update their internal state. Finally, it is this "team of people" that will produce a single frame - so whenever "someone" is too slow, it will end up causing stuttering. No matter the reason, scenery/terrain, Nasal etc - at some point, another subsystem will have to wait, which means fewer frames created per second.
— Hooray (Nov 13th, 2015). Re: Su-15.
(powered by Instant-Cquotes)
Cquote2.png

The system monitor

Screen shot showing the system performance monitor available in FG 2.6.0+
Caution
Cquote1.png the built-in performance monitor, and especially the GUI (it being used in PUI/PLIB)- is adding to the performance overhead you're seeing
Cquote2.png
Cquote1.png you will want to use the http/telnet interfaces to watch the performance stats, because the GUI dialog using PUI/Nasal is dead-slow, and even adding to the overhead you are seeing
— Hooray (Nov 13th, 2015). Re: Su-15.
Cquote1.png "events" is the subsystem that triggers/runs Nasal timers, and any Nasal code (anywhere), may trigger the GC (garbage collector) - to see if that's the case, you can use the "GC tracker" patch in the sigar code we have.
Cquote2.png

(powered by Instant-Cquotes)
Cquote2.png


Cquote1.png You can get pretty accurate stats using the performance monitor - it will write everything to the property tree. To see for yourself, open the performance monitor dialog and then use the property browser to watch the performance monitor branch in the property tree, you can learn more by inspecting $FG_ROOT/Nasal/performance_monitor/monitor.nas

As you can see there, the switch for enabling the monitor is /sim/performance-monitor/enabled (bool) This will work regardless of the PUI dialog, and you can view/access all stats (for each subsystem!) without showing the PUI dialog (=better performance) For a list of subsystems, go to /sim/performance-monitor/subsystems

Note that you can also use the http (browser) or telnet protocols to do this while extending the logging protocol
Cquote2.png


In FlightGear >= 2.6.0, use the menu option Debug => Monitor System Performance to analyze stutters.

In a perfect world, min/max/mean values should be almost identical for every subsystem - and standard deviation should be almost 0.

Larger differences / high standard deviations result in a sloppy simulation and stuttering movements, though they'll hardly influence the frames-per-second value at all.

Also, imagine a system producing 100 frames in 0.5 seconds, then blocking completely for another 0.5 seconds: The fps display would still show "100fps", which seems great. But the 0.5 second stutter cause the visual performance to be terrible.

When evaluating simulation performance, don't get fooled by the frame rate. What's really important to us is the "worst case frame latency". Even if the system is producing a huge average of 100 frames per second, it can still look absolutely crappy, if only a single frame took more than 20-30ms, since that's immediately visible to the human eye. So, we're building a real-time system here, and 20-30ms is our timing constraint.

That's why it is preferable to display "(worst-case) frame spacing" instead of fps (View => Display Options => Show frame spacing). The frame spacing for the previous example would show "500ms", while a system producing 100 frames with perfectly even spacing would show "10ms".

So, frame spacing is a much better property to judge visual quality than just watching fps.


The new GUI dialog is available in the menu: "Debug" => "Monitor system performance".

  • Only statistics on our "subsystems" are shown. That's the FG modules running in our single-threaded main loop. Everything outside the main loop isn't shown, i.e. there are no statistics on the GPU or OSG rendering threads (use the OSG statistics screen for these).
  • Title bar shows average and worst case frame rate. In a perfect world, all frames should be spaced evenly, so both values should be almost identical. When frames are produced unevenly, then a larger difference between the worst and average frame rate is visible. Needless to say, we should try to get FG to deliver frames evenly, so movements look smooth.
  • Main table shows statistics on the individual subsystems. Helps to identify how much time certain systems currently consume, or which are responsible for jitters resulting in uneven frame rates.
  • A bit background on the FG subsystems may be necessary though to really judge what's going on:
    • For example, you'll see the "nasal" subsystem consuming almost no time at all, so it looks great. However, almost all the nasal code runs in timers, and timers are driven by the "events" subsystem. So, to judge Nasal performance, you'll mainly need to look at "events" (and yes, you'll see time being consumed and jitters being produced there, which is related due to Nasal's current MARK/SWEEP Garbage Collector implementation, i.e. see: Improving Nasal).
    • the props (property tree) system make invoke Nasal callbacks via listeners
    • The "tile-manager" an "ai-model" subsystems may also contain time consumed by the Nasal engine, since each scenery tile and each AI/MP aircraft may trigger a Nasal script ("load"/"unload" hooks).
    • The GUI system is another subsystem which may trigger Nasal hooks
    • While it is true that the current Nasal mark/sweep garbage collector has been shown to cause occasional stuttering under certain circumstances, the system monitor in its current form is not suitable to tell you the difference between a badly coded "rogue" script causing stutter or the stuttering being really caused by the Nasal GC. Also, it is worth noting that following good coding practices may actually help decrease the overall pressure on the GC, i.e. by keeping the number of references to be traversed low and pretty constant, but also by not calling unneeded code too frequently using timers or listeners.
  • At the moment, it is unfortunately not yet possible to get timing statistics for individual Nasal scripts or sub modules, but this is something that has been requested by a number of users and so we are looking into making this possible eventually.
  • Statistics data is only collected while the dialog is open (or you manually set /sim/performance-monitor/enabled).

Anyway, this alone isn't improving anything yet, but having a GUI to conveniently monitor performance will hopefully help us to see where exactly we're having issues - and help us to improve...

Discussed Enhancements

  • decouple Nasal events from the "events" subsystem and use a new "events-nasal" instead [1]
  • show average frame-spacing effect per subsystem as a percentage
  • show memory consumption per subsystem (overload the new operator)
  • plot graphs for the various metrics using the new Canvas system
  • track Nasal GC pressure (invocations) and total memory allocated [2]

Debugging stuttering

To put it simple, vertex shader performance is affected by the amount of detail visible in the geometry of scene (number of vertices visible), and of course shader complexity itself, while fragment performance is affected by screen resolution(or window size if you run it windowed) and shader complexity as well. They both affect framerate in the end, since the frame needs to be finished before being put up for display. Frame delay might be affected by a lot of other stuff, but it's a better measure of smoothness since it displays the "longest" time it had to wait for a frame in a certain amount of time. There might be some performance lost in high detail scenes, due to the culling (visibility check, and hiding of faces facing away from the camera, or being behind other faces, or too small to display (openscenegraph is smart about that one and it won't "display" faces/parts of objects that would end up too small)) that needs to be done on the scene before it goes through the shaders.

To test if it's a graphics performance issue, you'd do a couple of tests.:

First load up the ufo, over detailed scenery, remain static, don't move the view around. Check framerate and frame delay. If they're consistent with each other it might be a graphics performance issue, if not you can be sure there's some other subsystem that does background processing and needs to finish its stuff.

Move the view around, do you get stutters? If you do, then there's some pressure on your video card's RAM. As it needs to load the new geometry and textures that come into view, it might not have enough space left, so it might need to offload the "old" ones. Even like this framerate should stabilise once you stop moving the view around, or once the rate of movement is slow enough so that the video card can keep up.

Fly straight and level, at reasonable speeds, keeping the view fixed. Do you get stutters? Is framerate consistent with frame delay? If it is, and you get low fps, again it might be a graphics issue, as new random objects come into view and/or the trees shader creates new ones in the distance. If not I's still suspect another susbsytem being the culprit.

You can repeat these with some aircraft, see if and how things change.

In short: if the framerate varies along with the Frame Delay, then it's a graphics performance issue. If framerate is pretty stable, and Frame Delay varies more or less wildly I'd suspect some other subsystem being the culprit and causing more or less periodic delays as it needs to do it's stuff.

Interpreting the OSG on screen stats

1rightarrow.png See OSG on screen stats for the main article about this subject.