FlightGear benchmark

Note: While this article is based on considerable community feedback, there's nobody working on this currently.
So if you'd like to help in one way or another, please get in touch or just help improve the article in the meantime!
Useful Skills:
PropertyList XML File, Aircraft-set.xml, Property Tree, Nasal scripting, fgcommands, Nasal/Web scripting


People:

Mentors: Hooray (get in touch to learn more)
It's possible that this article hasn't been updated in a while, so to catch up with the latest developments, you are advised not to start working on anything directly related to this without first coordinating your ideas with fellow FlightGear contributors using the FlightGear developers mailing list or the FlightGear forums. See also the talk page.

Note  Also see Testing
Cquote1.png a thing FlightGear developers could do to help developers of the free drivers, to help themselves, to help users and to help the Phoronix website would be to implement a benchmark mode. Phoronix is desperate to find more up to date and graphically challenging games for their benchmarks. The driver developers do read Phoronix and use the Phoronix benchmark suite to optimize the drivers. Users make buying decisions based on these benchmarks and general reports and last but not least, you know best how difficult performance optimization is for an application developer.
— Stefan Seifert (Jan 30th, 2014). Re: [Flightgear-devel] Graphics cards.
Cquote2.png
Cquote1.png I played around with the existing capabilities last weekend and it looks like we're almost there anyway. Setting FG_HOME to a temporary directory ought to be enough to prevent leaking settings from one run to another and allows using specific settings for the benchmark run (e.g. Rembrandt/ALS). Using generic file input allows replaying a full flight and the telnet interface allows reading FPS and frame distance numbers. The script at https://github.com/flighten/test attempts to do so anyway. Combined with some static weather input and fixed random seeds (probably supplied on the command line) we'd have all we need for reproducible benchmarks. If any developer finds some time to implement the missing pieces this could help tremendously. My personal situation will improve in about half a year, but if someone can pull this off before that, we'd not only get better support but also great marketing. FlightGear would very probably be featured in every Phoronix benchmarking article and those are very frequent.
— Stefan Seifert (Jan 30th, 2014). Re: [Flightgear-devel] Graphics cards.
Cquote2.png
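
The isolation idea from the quote above (a throwaway FG_HOME plus the telnet interface) could be prepared roughly as follows. This is only a sketch in Python: the helper name `build_benchmark_launch` and the port number 5401 are made up for the example, and the replay (generic file input) and random-seed parts are left out because the quote does not spell out their parameters.

```python
import os
import tempfile


def build_benchmark_launch(fgfs="fgfs", telnet_port=5401, extra_args=()):
    """Return (argv, env, home) for an isolated benchmark run.

    Hypothetical helper: it only prepares the launch, it does not start FlightGear.
    """
    # A fresh, empty FG_HOME keeps saved settings from leaking between runs.
    home = tempfile.mkdtemp(prefix="fg-bench-home-")
    env = dict(os.environ)
    env["FG_HOME"] = home
    # --telnet=<port> enables the telnet property server used to read frame rates.
    argv = [fgfs, f"--telnet={telnet_port}", *extra_args]
    return argv, env, home


if __name__ == "__main__":
    argv, env, home = build_benchmark_launch(extra_args=["--timeofday=noon"])
    print(" ".join(argv))
    print("FG_HOME =", env["FG_HOME"])
    # To actually launch: subprocess.Popen(argv, env=env)
```

The helper returns the pieces instead of starting the simulator, so a wrapper script can decide when to launch and when to delete the temporary directory.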
Cquote1.png 3.2 switched the base-package scenery to the high-resolution (i.e. memory-intensive) version, with the result that FG on default settings hangs my system (4GB memory, Intel graphics, no swap).

It becomes usable after reducing the bare LOD range, but one needs to know to do that; I'd like to replace the fixed defaults by something that automatically adjusts to the hardware, but haven't yet got around to this.


Cquote2.png
Cquote1.png
  • we appear to be single-thread-CPU bound (and if we are on my machine, we probably are on most)
  • terrain mesh (bare LOD range) costs memory, instanced objects (random *s) and the first shader step cost frame rate, unique objects (complex airports/aircraft) cost both
  • texture format makes little difference to either main memory use or frame rate (but note that global-png and global-dds are probably not a fully like-for-like comparison)

— Rebecca Palmer (2014-09-03). [Flightgear-devel] Performance tests.
Cquote2.png
Cquote1.png Except as stated: current 3.3 with my locked-listener patch (see earlier today), c172p stationary on KSFO 28R, --timeofday=noon --disable-real-weather-fetch --disable-ai-traffic (for consistency), Terrasync scenery, LOD range 1.5/9/12km, regional textures, random buildings/objects/vegetation and precipitation/3D clouds on, shader level 1, default (looks about 1024x730) window size. Fresh run for each setting, Intel i5-3230M with integrated GPU, Ubuntu 14.04 64-bit, memory/CPU measured with System Monitor (may not include GPU memory, 25%=one core fully loaded)
  • baseline: 19fps 1.3GB memory 24% CPU
  • unlocked listener (the old, crash-prone way): 18.5fps 1.3GB 24%
  • polling (current next): 12fps 1.3GB 24%
  • global-png textures: 20fps 1.2GB 23%
  • global-dds textures: 18.5fps 1.2GB 23%

— Rebecca Palmer (2014-09-03). [Flightgear-devel] Performance tests.
Cquote2.png
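
To make the numbers in the quote above easier to compare, the relative change against the baseline can be worked out in a few lines. The Python sketch below only restates the frame rates from the list; the function names are illustrative.

```python
# Frame rates (fps) copied from the measurements quoted above.
BASELINE_FPS = 19.0
RESULTS_FPS = {
    "unlocked listener": 18.5,
    "polling": 12.0,
    "global-png textures": 20.0,
    "global-dds textures": 18.5,
}


def percent_change(value, baseline):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def compare_to_baseline(results, baseline):
    """Map each setting to its frame-rate change in percent, rounded to 0.1."""
    return {name: round(percent_change(fps, baseline), 1)
            for name, fps in results.items()}


if __name__ == "__main__":
    for name, change in compare_to_baseline(RESULTS_FPS, BASELINE_FPS).items():
        print(f"{name}: {change:+.1f}%")
```

On these figures, polling costs roughly 37% of the baseline frame rate, while the other settings stay within about 5%.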
Cquote1.png On my Linux PC FlightGear has always been quite a slow program in comparison to other graphics intensive stuff (think Steam games and so on). So I've always been interested -- how does FlightGear compare to something like Team Fortress 2? Today I did a small investigation
Cquote2.png


Screenshot showing the performance monitor in a patched version of FlightGear 3.2, where subsystem initialization is made more configurable and increasingly optional by allowing subsystems to be explicitly disabled/enabled during startup. Decoupling internal subsystem dependencies means that we can more easily provide support for benchmarking, but also headless regression testing and, eventually, a standalone FGCanvas startup mode.


Objective

Note  Based on recent experiments with benchmarking Rembrandt, it would obviously make sense to have access to individual stages (cameras)

A long time ago, FlightGear had its own benchmark suite called "FGBenchmark". Over time it was no longer updated and got phased out. Meanwhile, a number of end users and long-term contributors have been talking about reintroducing a form of scriptable benchmark directly as part of FlightGear itself, using Nasal scripting to recreate certain situations (location, aircraft, rendering settings, etc.) in order to gather runtime statistics, but also for better regression testing.

Obviously, FlightGear has drastically evolved since the early days of FGBenchmark, so lots of benchmarking metrics can now be gathered, even without touching the C++ source code and without using any external tools or introducing other platform-specific dependencies. Basically, a simple form of regression testing or benchmark (unit tests) can now be implemented directly through FlightGear and Nasal scripting. Technically, the main restrictions are currently:

  • FlightGear expects an aircraft to be selected at startup, so benchmarks can only be self-contained if they are provided as a custom set of aircraft-set.xml files, simply because we cannot yet switch aircraft at runtime. Thus, the simplest option is to create benchmarks using certain aircraft and provide an "aircraft-benchmark-set.xml" file; the ufo should work well for starters.
  • The fgfsrc, autosave.xml and preferences.xml files are user-specific and cannot currently be overridden by a benchmark. However, these files may contain many settings that affect performance, which needs fixing so that a benchmark can replicate a setup exactly. We could simply add a new command-line switch to ignore these local files in $FG_HOME/$FG_ROOT and instead refer to a corresponding PropertyList section embedded in the aircraft-set.xml file, so that user-specific settings are not loaded (except for obvious candidates like --fg-root= and some others).
  • FlightGear always expects a fully interactive GUI session to be running, see FlightGear Headless
  • Many settings are runtime-configurable and can be changed through the property tree (or fgcommands) while running FlightGear, some others still require a full simulator reset - this applies in particular to non-optional subsystems but also a bunch of rendering related settings
  • The Nasal scripting interpreter is initialized pretty late because it has some hard-coded assumptions regarding available subsystems. On the other hand, it could be doing useful work if a restricted interpreter were available earlier, e.g. to help with simulator reinitialization; see Initializing Nasal early

Our hope is that we'll be able to come up with a simple benchmark suite to help users provide better troubleshooting reports, but also allow developers to do largely automated regression tests, i.e. through benchmarks or scripted flights. The recent advances in deferred rendering support (Rembrandt) also resulted in tons of GPU/GLSL related bug reports that are often hardware-specific and difficult to reproduce. Also see: Troubleshooting performance issues#A note from the developers.

In the long run, the corresponding data could also help us to provide more reliable Hardware Recommendations.


In its simplest form, a scripted benchmark merely traverses a list of input and output properties: input properties are those that affect performance and can be modified at runtime (visibility, fog settings, shader settings), and these in turn affect certain output properties, such as frame rate or frame spacing. The inputs could then even be tuned using a PID algorithm, i.e. the autopilot system's controllers, to implement feature-scaling support.
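
As a rough illustration of such a property sweep, here is a sketch in Python rather than Nasal, so that it can run without a simulator. The `FakeSim` class and its frame-rate formula are invented stand-ins, and the input property paths are only examples; against a real FlightGear instance, `set_prop` and `get_prop` would have to talk to the property tree (for instance via the telnet interface).

```python
SHADER = "/sim/rendering/shaders/quality-level"  # example input property
CLOUDS = "/sim/rendering/draw-mask/clouds"       # example input property
FRAME_RATE = "/sim/frame-rate"                   # output property


def run_sweep(inputs, outputs, set_prop, get_prop, settle=lambda: None):
    """Try every value of every input property, one input at a time,
    and record the output properties after each change."""
    results = []
    for path, values in inputs.items():
        original = get_prop(path)
        for value in values:
            set_prop(path, value)
            settle()  # e.g. wait a few seconds so the frame rate stabilises
            row = {"input": path, "value": value}
            row.update({out: get_prop(out) for out in outputs})
            results.append(row)
        set_prop(path, original)  # restore before moving to the next input
    return results


class FakeSim:
    """Invented stand-in for a running simulator (not FlightGear code)."""

    def __init__(self):
        self.props = {SHADER: 0, CLOUDS: 1}

    def set(self, path, value):
        self.props[path] = value

    def get(self, path):
        if path == FRAME_RATE:
            # Made-up cost model: shaders and clouds both reduce the frame rate.
            return 60 - 8 * self.props[SHADER] - 5 * self.props[CLOUDS]
        return self.props[path]


if __name__ == "__main__":
    sim = FakeSim()
    for row in run_sweep({SHADER: [0, 2], CLOUDS: [0, 1]}, [FRAME_RATE],
                         sim.set, sim.get):
        print(row)
```

Changing one input at a time and restoring it afterwards keeps each measurement attributable to a single setting, at the cost of not capturing interactions between settings.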

Benchmarking results could be shared by exchanging XML files or even by directly uploading them to a server using built-in HTTP support.

FlightGear on Netbooks


Benchmark support in X-Plane

Just for reference, here's some info on X-Plane's integrated benchmarking support:

Status

Cquote1.png While working on performance issues reported by cppcheck, I wanted to compare performance to a known baseline and came up with this simple test bench.

It's:

  • written in Python
  • uses the telnet interface to FG
  • doesn't have other dependencies and
  • runs on Linux and Mac (tested) and ran on Windows in its first incarnation

The script is called framerate.py and it replays a pre-recorded flight (test2.out.gz described here: http://wiki.flightgear.org/Suggested_Prerecorded_Flights) while collecting numbers on frame rate and maximum latency.

The repository for this is at : https://github.com/flighten/test[1]
— Tom P
Cquote2.png
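
For readers who want to see what reading frame rates over the telnet interface could look like, here is a minimal Python sketch. It is not the framerate.py script from the repository above. It assumes FlightGear was started with `--telnet=5401`, that the server understands the `data` and `get` commands, and that a `get` reply is either a bare value or of the form `/sim/frame-rate = '19' (double)`; the function names and the port are illustrative.

```python
import socket
import statistics
import time


def parse_get_reply(line):
    """Extract the numeric value from a reply line.

    Accepts a bare value ("19") as well as the verbose form
    "/sim/frame-rate = '19' (double)", optionally preceded by a prompt.
    """
    text = line.strip()
    if "'" in text:
        text = text.split("'")[1]  # the value sits between the first quotes
    return float(text)


def sample_property(host, port, path, count, interval=1.0):
    """Read `path` from the telnet property server `count` times."""
    samples = []
    with socket.create_connection((host, port), timeout=10) as sock:
        reader = sock.makefile("r", encoding="ascii", errors="replace")
        sock.sendall(b"data\r\n")  # assumed: switches to bare-value replies
        for _ in range(count):
            sock.sendall(f"get {path}\r\n".encode("ascii"))
            samples.append(parse_get_reply(reader.readline()))
            time.sleep(interval)
    return samples


def summarize(samples):
    """Minimum, mean and maximum of the collected samples."""
    return {"min": min(samples),
            "mean": statistics.mean(samples),
            "max": max(samples)}


if __name__ == "__main__":
    # Needs a running FlightGear with the telnet server enabled.
    fps = sample_property("localhost", 5401, "/sim/frame-rate", count=30)
    print(summarize(fps))
```

Keeping the parsing and the statistics separate from the socket code means both can be checked without a running simulator.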

Background

Cquote1.png I'm interested in the capability of doing multiple builds with different versions, branches and options and in doing some kind of automated testing on the resulting builds[2]
— Pat
Cquote2.png
Cquote1.png Does flightgear have like a default benchmarking system?

I think the devs should define a set of standard settings and perhaps a flight recording to benchmark flightgear on, to help determine a computer's suitability to run flightgear. Obviously user submitted benchmarkings are pretty different due to different software and settings, so a set of standards like the benchmarking in arma2/just cause2 etc. would be awesome. eg. low settings benchmark would be a 10 min flight on a low-polygon airplane and simple terrain, and a high settings benchmark would be everything "maxed out" in high res, in heavily complex scenery, in like a thunderstorm with 100 AI aircraft performing CPU intensive maneuvers in close proximity etc. (I obviously don't know how fg works and which things are most CPU/GPU intensive)

Obviously a flightgear specific benchmark could be much more suitable for flightgear than a generic gaming benchmark and much more helpful for people figuring out what settings are best for their systems, so what do you guys think of the idea?

I am totally new to flightgear and I haven't seen any indication that fg has a benchmarking system, so I thought that would be nice. I could provide/specify a list of settings and record some flights (if said feature exists), although I doubt I have the fg experience and authority to do so, so I hope you guys can sort this out- besides, a benchmark system should be pretty easy to implement compared to like say adding more realism/ better graphics/ revamping the engine, so I think that including a set of specific benchmarking tools and settings is plausible[3]
— Ericolon
Cquote2.png
Cquote1.png http://wiki.flightgear.org/Howto:Debugging_FlightGear_Crashes#Minimal_Startup_Profile

I remember using this to try and figure out why I was getting 10fps on a nvidia GTX470 with the lowest settings and default renderer (I was accidentally running a debug build).

I might come up with a dash script that tests things in different areas with different settings, but that will only be helpful on Linux. I'll probably just use the telnet server to pull the property tree frame rates/spacing (I haven't used it yet but I imagine it would be quite easy).[4]
— Christopher Andrews
Cquote2.png


Cquote1.png to come up with a long line of tests, e.g. something like the minimal startup profile in the middle of the ocean and then test individual things like 3d clouds, the quality slider thing, random buildings/trees, advanced weather, different aircraft (compare ufo to concorde and you will see - but something in the default package), and then test it all again with rembrandt.

I might also come up with an "aircraft" tester to see how different planes affect frame rates.[5]
— Christopher Andrews
Cquote2.png
Cquote1.png what about performance with different views (pilot view vs chase view), and having the panel open vs. closed in pilot view?[6]
— Saikrishna Arcot
Cquote2.png
Cquote1.png I'm usually interested in very specific before/after questions. For instance, I can push some shader code into a conditional clause and benchmark this to run faster on my system. I'd like to know - does it generalize? I've learned that optimization seems to generalize across nvidia hardware, but I'd like to get feedback in a before/after situation from a Radeon user.[7]
— Renk Thorsten
Cquote2.png
Cquote1.png Or, system dependent optimizations. Stuart has introduced a cloud LOD system and has some framerate gain from it in overcast layers. I've been playing with it and couldn't get much clear difference in performance, so I just switched it off completely. What I'd be interested in is - for what hardware do we see framerate gain, and what LOD distances would people typically select in order to get a good balance between visuals and framerate. Or would they prefer to vary cloud density, or cloud visibility radius? If we would know what most people select if given the choice, we could set reasonable defaults and structure the GUI accordingly.[8]
— Renk Thorsten
Cquote2.png

A benchmark/regression testing suite could also be run through the FlightGear Build Server:

Cquote1.png A standardized benchmark would, if we get enough data, be more of a general warning system - suppose we regularly monitor performance on 50 different systems, and after some commit we see a 20% performance drop on 35 of them - that would indicate that the commit might be in some way problematic. But for this, we would require a regular time history - basically the monitoring script should run and report after every update of either FG or the drivers.[9]
— Renk Thorsten
Cquote2.png
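
The "general warning system" from the quote above amounts to comparing per-system frame rates before and after a commit. A minimal Python sketch follows; the 20% drop threshold comes from the quote, whereas the `min_share` cut-off (the share of systems that must regress before a commit is flagged), the function names and the sample data are assumptions made for illustration.

```python
def systems_with_drop(before, after, threshold=0.20):
    """Return the systems whose frame rate fell by at least `threshold`
    (a fraction of the earlier frame rate)."""
    return sorted(
        name
        for name in before.keys() & after.keys()
        if before[name] > 0
        and (before[name] - after[name]) / before[name] >= threshold
    )


def commit_looks_suspicious(before, after, threshold=0.20, min_share=0.5):
    """Flag a commit if enough of the monitored systems regressed."""
    shared = before.keys() & after.keys()
    if not shared:
        return False
    dropped = systems_with_drop(before, after, threshold)
    return len(dropped) / len(shared) >= min_share


if __name__ == "__main__":
    # Invented frame rates (fps) for four hypothetical test systems.
    before = {"a": 50, "b": 40, "c": 30, "d": 60}
    after = {"a": 39, "b": 39, "c": 20, "d": 45}
    print(systems_with_drop(before, after))
    print(commit_looks_suspicious(before, after))
```

Only systems that reported both before and after the commit are compared, which matters when the set of reporting machines changes between updates.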

Brainstorming

Note  Beginning with FlightGear 3.1, you can also toggle individual scenegraph traversal masks on/off (these can also be changed at runtime using the Property browser):
  • --prop:browser=/sim/rendering/draw-mask
  • --prop:/sim/rendering/draw-mask/terrain=0
  • --prop:/sim/rendering/draw-mask/aircraft=0
  • --prop:/sim/rendering/draw-mask/models=0
  • --prop:/sim/rendering/draw-mask/clouds=0
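
The draw-mask options listed above lend themselves to a simple "disable one thing at a time" test matrix. The Python sketch below only generates the command-line options for each run; the helper names and run labels are made up, and actually launching FlightGear and measuring the frame rate is left out.

```python
# Mask names taken from the options listed above.
DRAW_MASKS = ("terrain", "aircraft", "models", "clouds")


def draw_mask_args(disabled):
    """Return --prop options that switch off the named draw masks."""
    unknown = set(disabled) - set(DRAW_MASKS)
    if unknown:
        raise ValueError(f"unknown draw mask(s): {sorted(unknown)}")
    return [f"--prop:/sim/rendering/draw-mask/{name}=0"
            for name in DRAW_MASKS if name in disabled]


def one_at_a_time():
    """Yield (label, args) pairs: a baseline run, then one run per disabled mask."""
    yield "baseline", []
    for name in DRAW_MASKS:
        yield f"no-{name}", draw_mask_args([name])


if __name__ == "__main__":
    for label, args in one_at_a_time():
        print(label, " ".join(args))
```

Comparing each run against the baseline gives a first impression of which part of the scene costs the most frame rate on a given machine.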

Purpose

  • maybe we really do need a benchmark .fgfsrc for proper comparison. [2]
  • Even a very simple benchmark could be useful for feature-scaling purposes, and if it's implemented in a non-interactive fashion, it could also help with regression testing. Ideally, a benchmark would start out with the bare minimum settings and then dynamically change settings on the fly to determine their effect on frame rate and frame spacing, to come up with a list of configuration settings that work properly while ensuring a satisfying simulator experience. We already have various building blocks in FG to do most of this; it's really just a matter of combining and integrating existing features to provide such a simple benchmark. From a troubleshooting perspective this could also be useful, because we could ask users to open a certain dialog, run a certain benchmark and report the results here. [3]
  • We actually talked about that benchmarking idea a while ago, and I even implemented a proof of concept [4]
  • Several people mentioned that they would like to have some form of "benchmark" to run FlightGear on various different platforms to see how it performs. I think the idea is not that bad, and that this might actually help troubleshoot some issues. Also, I do think that such a benchmark could probably be implemented directly in FlightGear, just by using Nasal scripting and some custom XML files. This would be pretty much related to the idea of "feature scaling" which was discussed in the other thread. [5]
  • having a number of benchmarks available could probably provide useful metrics to get FlightGear to run. For example, even the very simple file that I posted can already be used for troubleshooting: if a user is not able to run this with more than 100 fps, he is unlikely to be able to run FlightGear with default settings.

Regression Tests

Troubleshooting bug reports is often extremely tedious, because we need to replicate lots of settings:

  • "What's the highest shader level, at which random buildings still work? Or the lowest at which they fail?" [6]
  • we should add a menu item to dump the current position and all rendering/environment settings to an XML file, so that we can more easily reproduce such things, just by loading a config from a file. [7]
  • That's a super idea! It wouldn't surprise me if some of these glitches are peculiar to specific hardware configurations, either, so perhaps that might be part of the report as well. I'll paste XML into forum posts all day if it helps the devs fix bugs. [8]
  • even just knowing that certain issues only occur with some GPUs would be VERY good to know. But obviously we would need a sane way to easily reproduce a certain configuration, including all startup settings, but also the runtime rendering settings. [9]
  • After all, having an easy way to reproduce a certain configuration, could save us tons of time and question asking - so having such a feature would be really invaluable in my opinion. We could add a dialog so that people could even describe the problem - so that the XML files would become self-contained and could be easily checked by different people without having to ask tons of tedious questions... Thinking about it, the simplest option would seem to be using existing stuff. After all, this is just about recording and replaying properties. And that's exactly what the new flight recorder (replay tapes) system does. So we could simply abuse it a little to also provide a configuration to sample the various rendering properties (see rendering dialog), which should give us a way to reproduce settings fairly well. [10]
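
To give an idea of what such a settings dump might look like, the Python sketch below turns a flat mapping of property paths into a nested PropertyList-style XML document. The property paths and values are placeholders, the helper name is made up, and indexed nodes such as `engine[1]` are not handled; inside FlightGear itself this would more naturally be done from Nasal or an fgcommand.

```python
import xml.etree.ElementTree as ET


def to_property_list(props):
    """Build a <PropertyList> element tree from {"/a/b/c": value} pairs."""
    root = ET.Element("PropertyList")
    for path, value in sorted(props.items()):
        node = root
        for part in path.strip("/").split("/"):
            # Reuse an existing child with this name, otherwise create it.
            child = next((c for c in node if c.tag == part), None)
            if child is None:
                child = ET.SubElement(node, part)
            node = child
        # bool must be checked before int, because bool is a subclass of int.
        if isinstance(value, bool):
            node.set("type", "bool")
            node.text = "true" if value else "false"
        elif isinstance(value, int):
            node.set("type", "int")
            node.text = str(value)
        elif isinstance(value, float):
            node.set("type", "double")
            node.text = repr(value)
        else:
            node.set("type", "string")
            node.text = str(value)
    return root


if __name__ == "__main__":
    # Placeholder settings; a real dump would read them from the simulator.
    settings = {
        "/sim/rendering/draw-mask/clouds": False,
        "/sim/rendering/shaders/quality-level": 2,
        "/sim/presets/airport-id": "KSFO",
    }
    print(ET.tostring(to_property_list(settings), encoding="unicode"))
```

A file in this shape could be attached to a bug report and loaded again by whoever tries to reproduce the problem.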

Aircraft-set.xml based benchmarks

  • The only problem is that FlightGear always makes the assumption that it is running some form of aircraft/vehicle, so any sort of "benchmark" needs to be provided as an aircraft. Also, one needs to override the global preferences.xml file because there is no way to use a different one.
  • Well, a while ago, we talked about creating benchmarks in the form of custom aircraft-set.xml files, which would already contain all startup settings (resolution, bpp, shaders etc) [11]
  • This would allow us to share "benchmarks" in the form of aircraft, so that people could easily launch them using fgrun (or whatever GUI frontend they have) - still, it'd be possible to export benchmark results to XML.
  • One would only need a way to create a default situation (i.e. like a custom preferences.xml file) and a way to dynamically toggle FlightGear features on/off and tweak them at runtime.

Approaches

  • This should be pretty straightforward to do, at least for those features (configuration properties) that are already using listeners or that are read every frame. This applies to most of the recent graphics additions (i.e. shaders), because these can be dynamically enabled, disabled and configured.
  • So a FlightGear benchmark would then only have to be run with common default settings (e.g. window resolution, color depth, startup airport, aircraft and environment settings) while a Nasal script could then be used to dynamically tune these settings. Reading internal counters (namely the framerate counter for the time being) would then give us an instrument to see how significant certain settings are.
  • In the beginning, the easiest way to have something like a benchmark in FlightGear would be to simply use static "situations" that are loaded from XML files, these would then override all local custom settings so that users can reliably compare their frame rates when running such "situations" on different machines.
  • imagine we would create a bunch of additional "benchmarks" like this, each of those testing individual features of FlightGear (shaders, effects, particles, shadows, AI aircraft and so on), all of these could be useful to allow users to see if their system (and configuration) is able to run FlightGear or if it needs to be modified (software/hardware configuration). [12]

Extending the replay/flight recorder subsystems

  • We do have a so called "flight recorder/replay" system that can save flights. The whole system is property-driven, and it is possible to provide custom sets of properties that should be recorded. In other words, it would be possible to create a custom "flight recorder" configuration that doesn't just record aircraft settings, but also rendering related settings [13]
  • Maybe we could use the flight recorder to record a flight, so that more people could try the same flight, recreating your settings ? That would basically be a simple benchmark [14]
  • Using a combination of prerecorded flights, the replay/flight recorder system and a Nasal script to change settings on the fly, it wouldn't necessarily be very difficult to create a simple benchmark framework. [15]
  • having an easy way to reproduce a certain configuration, could save us tons of time and question asking - so having such a feature would be really invaluable in my opinion.[16]
  • the simplest option would seem to be using existing stuff. After all, this is just about recording and replaying properties. And that's exactly what the new flight recorder (replay tapes) system does. So we could simply abuse it a little to also provide a configuration to sample the various rendering properties (see rendering dialog) and implement a benchmark, which should give us a way to reproduce settings fairly well. [17]
  • Still, the idea of creating a simple, easily recreatable benchmark flight, sounds good to me! What aircraft would be suitable? Is the UFO in the standard installation? I don't know, since I use the fgdata from git for years... Then I would make some flight over KSFO, which should be on every FG installation, and then? What would I do with the recorder tape? upload it somewhere? [18]

Community Feedback

Cquote1.png could some thought be given to producing a benchmark suite for Flightgear. It would need to take in all of the, by now well known, variables - making it by no means a simple beast to manage. If this could be automated in some way it would be much easier to capture, and then submit, consistent data. [10]
— Alan Teeder
Cquote2.png
Cquote1.png A scripted run would be an EXCELLENT tool.[11]
— geneb
Cquote2.png
Cquote1.png a scripted run can be set up to play all the tricks, even if it needs to run FG several times to e.g. reset the graphics, and it can be set up to finish the run by offering to upload the results automatically[12]
— Arnt Karlsen
Cquote2.png
Cquote1.png Is there a benchmarking tool/setup for flightgear? For example a preconfigured/prerecorded flight with fixed variables (weather, time, fov, etc), fixed nr of frames. Basically everything fixed except rendering options and that it measures how long it takes to render/run and calculate the average FPS? People would be able to compare this value, and one would not be comparing apples with pears. Everybody ran the same benchmark/flight. It would be very helpful in determining if some change brought improvement or made performance worse by a proper measuring instead of staring at the FPS counter in the bottom of the screen during gameplay and 'estimating' if things improved or not.[13]
— EViLSLT - Rob
Cquote2.png
Cquote1.png A FGLive type ISO with programs that will benchmark the hardware only.... so what if it runs under *nix.... local optimisations are a matter of personal choice albeit open for public discussion.... you guys would know if this is do-able/worth-while though .... just an idea[14]
— dene maxwell
Cquote2.png
Cquote1.png The FGBenchmark package was meant to compare performance not only under different setups of the same operating system and architecture (say Linux on x86) but to compare different architectures as well. So I put 'fgfs' binaries for different systems (Linux, FreeBSD, Solaris/sparc, Solaris/x86, IRIX) into a package and made a start script that determines which binary to run.[15]
— Martin Spott
Cquote2.png
Cquote1.png I would be very interested to know how many polygons per second FGFS is rendering. Do you have a ballpark number? It might be nice to have several sections of the benchmark and in one try to maximize poly count of the scene and minimize all else.[16]
— Wolfram Kuss
Cquote2.png
Cquote1.png a subset of the FlightGear Open Source flight simulator, packaged together with the purpose to serve as a specific benchmarking tool among different Unix platforms. The idea arose after realizing, that real world performance numbers for FlightGear on Unix workstations, especially for SGI and Sun machines, are rare because most potential users apparently don't like to share their experiences. The package would also serve as a FlightGear 'Getting Started' kit, it consists of binaries for a few platforms and a base package with high resolution textures and some aircraft removed.[17]
— Martin Spott
Cquote2.png
Cquote1.png I've assembled a 'small' (40 MByte) FlightGear package and included a README:

This is a subset of the FlightGear Open Source flight simulator, packaged together with the purpose to serve as a specific benchmarking tool among different Unix platforms. The idea arose after realizing, that real world performance numbers for Unix workstations, especially for SGI and Sun machines, are rare because most potential users apparently don't like to share their experiences. The package would also serve as a FlightGear 'Getting Started' kit.[18]
— Martin Spott
Cquote2.png
Cquote1.png How about a reproducible way to benchmark FlightGear? Something like q1test or q2test in Quake. That is: an automated sequence of flight lasting, say, 30 s to 2 min, along a predetermined path from KSFO with different views. This could be presented as a demo and at the end, a summary of framerate and performance numbers will be displayed.

This could be controlled by command line options

Just a thought,[19]
— Frederic Bouvier
Cquote2.png
  1. Tom P (Sat, 22 Jun 2013 14:27:35 -0700). Re: [Flightgear-devel] Benchmark matrix.
  2. Pat (Wed, 03 Jul 2013 17:24:19 -0700). Re: [Flightgear-devel] FG 2.12 RC Broken ?.
  3. Ericolon (Thu Feb 21, 2013 4:35 am). Flightgear-specific benchmark.
  4. Christopher Andrews (Fri, 21 Jun 2013 08:12:05 -0700). Re: [Flightgear-devel] Benchmark matrix.
  5. Christopher Andrews (Fri, 21 Jun 2013 13:17:10 -0700). Re: [Flightgear-devel] Benchmark matrix.
  6. Saikrishna Arcot (Fri, 21 Jun 2013 14:08:11 -0700). Re: [Flightgear-devel] Benchmark matrix.
  7. Renk Thorsten (Sat, 22 Jun 2013 00:35:23 -0700). Re: [Flightgear-devel] Benchmark matrix.
  8. Renk Thorsten (Sat, 22 Jun 2013 00:35:23 -0700). Re: [Flightgear-devel] Benchmark matrix.
  9. Renk Thorsten (Sat, 22 Jun 2013 00:35:23 -0700). Re: [Flightgear-devel] Benchmark matrix.
  10. Alan Teeder (Thu, 20 Jun 2013 10:15:42 -0700). [Flightgear-devel] Benchmark matrix.
  11. geneb (Thu, 20 Jun 2013 10:20:41 -0700). Re: [Flightgear-devel] Benchmark matrix.
  12. Arnt Karlsen (Fri, 21 Jun 2013 04:27:19 -0700). Re: [Flightgear-devel] Benchmark matrix.
  13. EViLSLT - Rob (Wed, 18 Apr 2012 06:51:44 -0700). Re: [Flightgear-devel] An empassioned plea.
  14. dene maxwell (Sun, 21 May 2006 03:42:17 -0700). RE: [Flightgear-devel] FGBenachmark; Was: ..FGLiveCD boot workaround.
  15. Martin Spott (Sun, 21 May 2006 04:08:03 -0700). Re: [Flightgear-devel] FGBenachmark; Was: ..FGLiveCD boot workaround.
  16. Wolfram Kuss (Wed, 25 Feb 2004 13:42:21 -0800). Re: [Flightgear-devel] FlightGear 'benchmark'.
  17. Martin Spott (Mon, 29 Mar 2004 12:27:40 -0800). [Flightgear-devel] new FGBenchmark package.
  18. Martin Spott (Mon, 23 Feb 2004 07:40:44 -0800). [Flightgear-devel] FlightGear 'benchmark'.
  19. Frederic Bouvier (Sun, 07 Apr 2002 09:59:28 -0700). Re: [Flightgear-devel] FrameRate !!.