Profiling performance

From FlightGear wiki

This page will contain info about performance bottlenecks and profiling in FG. I'm not sure where this info should go after this is done.

Background

Status: Last updated 02/2022

The problem is that FlightGear’s architecture does not use either the CPU efficiently (multi-threading) or the GPU efficiently (large draw batches with few state changes). So we have no hope of hitting 100% utilisation. You could probably run graphics equivalent to or better than FG 2020.3 on your laptop at even 60fps, *if* the renderer was re-written from scratch (and you’d see GPU at 100% and CPU use higher, maybe). But that’s an impossible amount of work, so we are trading your money (what users spend on hardware) for development time (we don’t have five full time developers to ditch OSG and re-write the entire renderer natively using Vulkan / Metal / D3D12 ... and then rebuild every aircraft model to use efficient textures, replace all Effects, etc, etc).

We especially get away with this because large old CAD applications have the same problems as FG (using archaic OpenGL, and very poor threading) and nVidia has done a lot of work making their drivers support these applications well. (Hence why the nVidia drivers are allegedly a larger codebase than the Windows kernel at this point.)

Whereas, the Intel GPUs give pretty good FPS (especially the newer Iris units) … in modern code written to use them efficiently. The minute you do something old-fashioned, you hit slow and weird paths (and bugs): same for many of the open-source drivers.

So our utilisation is terrible, but we have no ‘developer affordable’ ways to get it dramatically better in the short term. Instead I’d recommend to build a small, cheap desktop machine with a moderate CPU (no need to go crazy with 32 cores, FG will only use 3-4) and an nVidia GPU in it. You could use even second-hand parts for this: a 6th or 7th gen i5 or i7 with an nVidia 970GTX or 1060GTX will be a perfectly fine FG machine for the next few years, and when it’s not, you can drop in whatever nVidia 6090RTX that you can afford and TSMC can supply… [1]

Introduction

3d applications like FlightGear create a series of images, called frames, to be displayed on your computer screen (monitor). Each of these images shows the simulated world at a point in time - for example the state of the cockpit toggles and displays, moving objects outside like other airplanes at an airport, moving scenery objects like windsocks/trees/grass swaying in the wind, stationary scenery objects like buildings, the changing position of clouds/sun/moon/stars. What can be seen outside the cockpit depends on the position and orientation of your craft. The state of your craft and its position/orientation is influenced by the control inputs you provide - through a joystick/yoke/keyboard/rudder pedals/mouse.

There is a loop - read input and simulate the world, then draw an image:

  1. Read control input at time T0 and simulate the world at a time T1 - note T1 may be after T0. From T0 to T1 the aircraft is simulated running with no change in input.
  2. Draw an image (frame) - which the monitor reads and finishes updating at T3 - note the image of the simulated world at T1 ends up on the screen at T3
  3. Read control input at time T4 and simulate the world at a time T2
  4. Draw an image (frame) - which the monitor reads and finishes updating at T5
  5. Read control input at time T6 and ...

Frame Spacing - The time taken for a loop, e.g. from T3 to T5, is the spacing between drawn images (frames), called the frame spacing. The time taken can vary each iteration - some things might take more time than usual. For example the amount of work done to simulate the world might change - e.g. if an aircraft display was switched on. The amount of work done to draw the world might change - for example if looking at empty blue sky and changing view to look at a city or a forest. There could be events in the craft or the world that take time to simulate. There might be pauses while loading things (the design tries to avoid this). Sometimes the operating system may take away time to do an important/emergency task, or service a program running in the background (if there are programs running in the background that disturb FG you could try turning them off). If a frame takes longer than usual to complete this is known as a frame spacing spike.

Stutters - if the spacing between frames gets very long, e.g. 0.5 seconds or several seconds, then this is called a stutter. It can happen, for example, when loading.

Frame rate (frames per second, FPS) - the average number of frames rendered per second over a period. Calculated by counting the number of frames rendered over an interval, and dividing by the time taken. This is an average and it keeps varying. At 60 FPS the average frame spacing (average time between images) is 1 second / 60 = 16.67 milliseconds. At 30 FPS it is 33.3 ms. At 20 FPS it is 50 ms.
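The relation between FPS and frame spacing can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for this page and are not part of FlightGear.

```python
def frame_spacing_ms(fps):
    """Average frame spacing in milliseconds for a given average FPS."""
    return 1000.0 / fps


def average_fps(frame_times_s):
    """Average FPS from frame completion timestamps (in seconds).

    N timestamps delimit N - 1 frame intervals, so the frame count
    is len(frame_times_s) - 1.
    """
    if len(frame_times_s) < 2:
        raise ValueError("need at least two timestamps")
    elapsed = frame_times_s[-1] - frame_times_s[0]
    return (len(frame_times_s) - 1) / elapsed


for fps in (60, 30, 20):
    print(f"{fps} FPS -> {frame_spacing_ms(fps):.2f} ms")
```

Because FPS is an average over an interval, two runs with the same FPS can still have very different individual frame spacings.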

Input lag - only important for Virtual Reality (VR). The time between reading input at T0 and the VR headset or monitor finishing drawing an image (frame) at T3. For VR, only the head movement input lag is important - that is, the time delay between moving the head and the VR headset display finishing updating, e.g. head movement while looking around. For VR, it is possible to draw an image that is slightly larger than the field of view - and read head movement input after the frame has been drawn to very quickly re-project the view to use the latest head position and view direction. This re-projection reduces head input lag a lot, and it is extremely fast. Newer GPUs are also capable of interrupting drawing if it is taking too long, and quickly drawing a re-projected image using the recent head position and the previously finished image. This is called asynchronous re-projection[1] - there are different types, and SteamVR's motion smoothing (2021 - [2][3]) will re-project to adjust the head position and view direction while also using the previous 2 images to predict how objects in the image have moved by looking at the depth data.

The aircraft control input lag - the time delay between interacting with aircraft controls and seeing a change on the monitor - is not important for either normal or VR use. This is because the frame spacing is really short compared to the time it takes to move aircraft input interfaces in a real plane, not just the joystick/yoke/keyboard/mouse/pedals at a user's PC. The frame spacing is also short compared to the time it takes for the aircraft to change systems in response to input, or the time it takes for the aircraft to respond, e.g. start to noticeably change movement in response to controls.

Measuring and reporting performance

Note: FlightGear throttles frame-rates to 50 FPS by default in 2020.3 LTS. To profile performance FPS throttling must be turned off. VSYNC must also be turned off in the graphics driver control panel.

GPU performance

GPU performance is how quickly the GPU tasks get done. The tasks depend on a particular situation - for example, at certain graphics settings, in certain craft, flying over certain scenery at certain altitude, in certain weather. The size of the GPU task depends on what's in view - if tree density is very high it won't make a difference to the GPU task when flying over a desert or the sea.

Often you may want to compare a change or difference in performance - e.g. a difference in performance between an old feature and a new version of the feature, or a difference in performance with one graphics setting and another, or a difference in performance between the FlightGear LTS and next branch (nightly builds).

GPU performance, when GPU bound

FPS can be used to compare performance if your frame rate is bottlenecked by the GPU. This is useful for reporting the FPS impact of new rendering features on different hardware. New rendering features often impact the GPU task size more - for example a GPU programming change (shader change) will only change the GPU load. But new rendering features can also change the CPU task size - it's also important to report this.

Different aspects of the GPU may be the bottleneck (different stages of the pipeline). At high shader settings, you are very likely to be bound by the number of pixels - the number of fragments to be exact. This is called being fragment bound.

  • You can also tell if you are GPU bound, as GPU utilisation will be 100%.
  • When GPU fragment bound: changing the window size slightly should result in a change in FPS.
  • Change in performance = FPS2/FPS1. Examples: an increase from 15 FPS to 30 FPS = 30/15 = 2x (200%), or a 200%-100% = 100% increase. A drop from 40 FPS to 30 FPS = 30/40 = 0.75 (75%), or a 100%-75% = 25% drop.

GPU performance, when CPU bound (non-GPU bound)

  • Not GPU bound: If your bottleneck is not the GPU, it is most likely the CPU. You need to measure GPU utilisation. GPU utilisation is the GPU load - it will be less than 100%.
  • Change in performance = utilisation1/utilisation2. Examples: a drop from 40% to 20% = 40/20 = 2x increase in performance, equivalent to a doubling of FPS if the GPU were the bottleneck. Going from 40% to 30% = 40/30 = 1.33x (133% FPS), or a 133%-100% = 33% increase.
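
The two ratios above can be written down as a small Python sketch that reproduces the worked examples. The function names are illustrative only.

```python
def change_from_fps(fps_before, fps_after):
    """GPU bound case: performance ratio is FPS2 / FPS1."""
    return fps_after / fps_before


def change_from_utilisation(util_before, util_after):
    """Not GPU bound case: performance ratio is utilisation1 / utilisation2.

    A lower GPU utilisation for the same work means better GPU performance.
    """
    return util_before / util_after


def as_percent_change(ratio):
    """Convert a ratio such as 0.75 into a signed percentage such as -25.0."""
    return (ratio - 1.0) * 100.0


print(change_from_fps(15, 30))                     # 2.0
print(as_percent_change(change_from_fps(40, 30)))  # -25.0
print(change_from_utilisation(40, 20))             # 2.0
```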

CPU performance, when CPU bound (non-GPU bound)

CPU bound FPS limit: You can usually find your CPU bound FPS limit by reducing window size until FPS stops increasing. This depends on what's in view as the CPU task involves going through all the scenery objects in the OSG scenegraph.

CPU performance, when GPU bound

It's usually simple to make FlightGear non-GPU bound. Go into windowed mode (shift+f10 - Nov 2021) and shrink the window until FPS stops increasing. This usually works as people are most often fragment bound at higher shader settings.

If you cannot make FlightGear CPU bound, you need to measure CPU utilisation in the same way as above.

Benchmark based on FG-tape

Recent work on FGtape replay support may make a benchmark possible in future. See [4] [5].

Benchmark information

  • A useful bit of information is to know if FlightGear was CPU bound or GPU bound for a particular frame. This may need some querying from the drivers.
    • The benchmark could output: the fraction of frames for which FG was CPU or GPU bound, a time series of binary boundness values, a time series of GPU time/CPU time, and the ability to compare a time series of boundness with a reference (e.g. binary states for each frame could be subtracted, and then the average found).
  • A histogram of frame-spacing values.
  • Breakdown of the above by different segments of the replay - e.g. the space shuttle during different phases (altitudes / speeds) of a re-entry and landing.
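
A rough sketch of the boundness outputs listed above, in Python. It assumes per-frame CPU and GPU times are already available as lists (how to get them from the drivers is not covered here), and it assumes a frame counts as GPU bound when the GPU time is the longer of the two. Both are assumptions for illustration, not an existing FlightGear feature.

```python
def boundness_series(cpu_ms, gpu_ms):
    """Binary time series: 1 where the GPU took longer than the CPU, else 0.

    Assumption: "GPU bound" for a frame means gpu time > cpu time.
    """
    return [1 if g > c else 0 for c, g in zip(cpu_ms, gpu_ms)]


def gpu_bound_fraction(series):
    """Fraction of frames that were GPU bound."""
    return sum(series) / len(series)


def compare_with_reference(series, reference):
    """Average of the per-frame difference (this run minus the reference)."""
    if len(series) != len(reference):
        raise ValueError("series must cover the same frames")
    return sum(s - r for s, r in zip(series, reference)) / len(series)


# Made-up per-frame timings in milliseconds.
cpu = [10, 12, 9, 15]
gpu = [14, 8, 11, 16]
run = boundness_series(cpu, gpu)
print(run)                                        # [1, 0, 1, 1]
print(gpu_bound_fraction(run))                    # 0.75
print(compare_with_reference(run, [0, 0, 1, 1]))  # 0.25
```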

FlightGear, open-source, cross-platform, non-synthetic, benchmark

  • The available partially non-synthetic benchmarks are non-free - they need the paid version to unlock features. The completely non-synthetic benchmarks are short-lived games that may be heavily CPU or GPU bound and used in lazy hardware benchmarks without bound-ness tracking, or just sessions of playing games.
  • Having a FlightGear benchmark may end up giving better driver support to FG. (Currently the usual relationship between graphics hardware companies and 3d application devs hasn't started yet - see this old 2014 blog [6] for a description of the level of interaction.) This is despite FG being able to create rendering tasks that can occupy high end cards, and FlightGear determining people's hardware purchasing decisions. This is mostly due to FG being a non-game and a simulation that has been open-source and contributed to by volunteers. A benchmark will likely attract advice and optimisation contributions from hardware companies, and people following hardware performance for Flight/Space simulations or open-worlds.
  • FlightGear is a completely non-synthetic benchmark - a real application that stresses both GPU and CPU heavily in a variety of ways - one that has been and will be maintained for a long time. A FG benchmark will also have some intriguing visuals.
  • An opensource cross-platform benchmark allows some comparison between different OSes. FlightGear has binary builds for Linux now (Appimages) so benchmarking without compiler versioning and optimisation differences is possible on all 3 major OS platforms. FG even compiles for ARM processors, and Raspberry Pi.
  • An opensource benchmark means the internals are visible to all interested hardware parties. The available open-source benchmarks [7] are different from simulations like FG, are pretty synthetic, and mostly deal with the CPU.
  • Special branches of FG could contain the needed scenery and aircraft, and the versioning will mean the benchmark stays frozen.
  • Probably Phoronix (who are already familiar with FG as they have covered releases on their news site for a long time) and Openbenchmarking.org will be interested and may offer to assist with setting one up.

Simulation comfort

To do:

Due to the way human vision works, a consistent frame spacing is the major factor in comfort - rather than FPS, as is commonly misconceived. A higher FPS with inconsistent frame spacing can feel less comfortable than a lower FPS with consistent frame spacing. Of course, a higher FPS means the differences in frame spacing get proportionally smaller - so the inconsistencies in frame spacing feel more comfortable. The way the eye works is by scanning the scene with the high resolution fovea, sort of like scanning the environment with the beam from a search light, to put together a 3d model of the environment which the higher cognitive functions perceive - sort of like a person looking at a 3d model in blender [8]. The eye scans by making a series of very rapid movements called saccades, to update the 3d model of the world. To update the 3d model, the eye needs to know where to scan. The human vision system uses knowledge of the way objects in the scene move (i.e. learned physics) to help it know where to scan [9]. The reason people improve their eye coordination with practice, e.g. improving at tracking balls in games as kids, is that the vision system learns where to look to update the world better, by improving prediction.

A computer monitor updates at set intervals, e.g. 60hz. Each image shows the simulated world at a certain time. If successive images show the simulated world with the same time spacing, it is easy for the eye to update the 3d model of the world and the experience feels more comfortable, more 'smooth'. If the time spacing varies randomly then the vision system is constantly being surprised by the position of objects, and it's harder for the eye to predict - it feels less comfortable.

Update rate of human vision system and monitor refresh rates

Monitor refresh rate in hz - The number of times per second a monitor will read the output image from the GPU (known as the framebuffer). The monitor will read whatever is there, and it will display that image until it reads from the GPU again. If the GPU is still drawing the next image the monitor will read the previous image. If the GPU is in the middle of updating the output the monitor will have part of the previous image and part of the current image.

Monitor response time and ghosting (moving objects leaving a trail) - this is not important for FlightGear. It's mainly important for applications where the user turns the camera around to look in a tiny fraction of a second, and makes control inputs where a tiny fraction of a second regularly matters (which it doesn't for real aircraft). Response time is mainly used in the context of games involving tiny, low-force, finger movements and reflex muscle memory, with fast interaction physics that don't really have realistic counterparts. It's mentioned to avoid confusion with requirements mentioned elsewhere for non-simulation applications, and to help look at the right specs when choosing monitor hardware for simulation. Ghosting: The monitor takes time to update the display on the screen - response time. For LCD monitors, the RGB pixels on the screen take time to change value - the less they have to change, the shorter the time needed. If an object moves across the screen, it can leave a trail behind. If the contrast between an object and the background is big the trail gets larger. A low response time will likely not matter for the way most people use FG.

Update rate of the human visual system - this is fairly low, around the mid 20 hz. However, people perceive fast movement as a blur - try rapidly waving your fingers, or a pen, in front of your eyes.

People can tell a slight difference between a moving object on a monitor that updates at 30hz and the same object on a monitor that updates at 60hz+. The blur is different. This is because an LCD monitor will draw an image and keep it until the next image starts drawing - at 10hz it will keep the image for 100 ms, at 60hz it will keep it for 16.7 ms. The position of a steadily moving object looks like a staircase function instead of a straight line - an object undergoing a wave motion jumps between positions like this. It doesn't mean the visual system has a really high update rate - sometimes the fact that people can spot a difference between two monitors with different update rates is used as a marketing statistic to sell higher refresh rate monitors.

FPS for comfortable simulation

The exact FPS for a comfortable simulation depends on the frame spacing consistency of the simulation. The comfort also depends on the person and the particular activity they are doing in a particular craft. For the purposes of flight simulation, and FlightGear as of 2016 and earlier, 20-30 FPS is often enough [10]. Lower FPS than this might feel comfortable when the interaction with the aircraft involves using switches and toggles that have a few states - rather than using analog inputs like stick, rudder, or collective. Lower FPS might feel comfortable for piloting glass cockpit craft like airliners by programming the craft's autopilot. If the FPS is really low, even with consistent frame spacing, the very long update interval might feel somewhat odd. Turning on the DDS cache in FlightGear 2020.3 LTS helps smoothness. Frame rate consistency may have improved even more in FG by the time you read this, so lower FPS may be tolerable in the same situation compared to what you've experienced in the past.

Measure of comfort / frame spacing inconsistency

There is no measure of frame spacing inconsistency in FG currently (Nov 2021). A measure of frame spacing inconsistency is best presented as a histogram. The histogram should be of the last X seconds (moving bar graphs are harder to read) - or over a measurement interval with stop/start controls. The histogram should have a mode that's in units of percentages compared to the average frame spacing over the interval - this allows the measurements to translate better to different hardware speeds and average FPS.
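
As a minimal sketch of the percentage mode described above, the Python below bins frame spacings by their size relative to the average over the interval. The function name, the 10% bin width and the sample data are all assumptions for illustration. Nothing like this exists in FG as far as this page knows.

```python
from collections import Counter


def relative_spacing_histogram(spacings_ms, bin_width_pct=10):
    """Histogram of frame spacings as a percentage of the average spacing.

    Returns {bin_start_percent: frame_count}, sorted by bin. A bin key of
    80 with a width of 10 covers spacings from 80% up to (not including)
    90% of the average.
    """
    mean = sum(spacings_ms) / len(spacings_ms)
    counts = Counter()
    for spacing in spacings_ms:
        pct = 100.0 * spacing / mean
        bin_start = int(pct // bin_width_pct) * bin_width_pct
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Made-up data: four steady frames and one spike.
hist = relative_spacing_histogram([16, 16, 16, 16, 36])
for bin_start, count in hist.items():
    print(f"{bin_start:4d}% {'#' * count}")
```

With perfectly consistent frame spacing every frame lands in the 100% bin, whatever the average FPS is. Spikes show up as bins far to the right.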

See the trade off between graphics content, FPS, and graphics settings and the discussion on hardware recommendations.

Bottlenecks

To put it very roughly, to simulate the world and draw an image, there is a CPU task and a GPU task that must be completed each frame. The size of these tasks depends on the craft, the situation in the simulated world, and what's in view.

If the CPU finishes the task very quickly, it then has to wait until the GPU is finished. Then the bottleneck to performance is the GPU. Getting a faster CPU won't make the frame spacing shorter - and it won't increase the average FPS in such a situation. If the GPU finishes the task very quickly, then the GPU has to wait until the CPU is finished. Then the bottleneck is the CPU. Getting a faster GPU won't improve the frame spacing and average FPS.
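
A very rough toy model of this in Python, assuming the CPU and GPU tasks fully overlap so that a frame takes as long as the slower of the two. Real frames are more complicated than this and the numbers are made up.

```python
def frame_time_ms(cpu_ms, gpu_ms):
    """Toy model: whichever side finishes first waits for the other."""
    return max(cpu_ms, gpu_ms)


def bottleneck(cpu_ms, gpu_ms):
    """Name the side that limits the frame time in the toy model."""
    if cpu_ms > gpu_ms:
        return "CPU"
    if gpu_ms > cpu_ms:
        return "GPU"
    return "balanced"


def fps(cpu_ms, gpu_ms):
    """Average FPS if every frame had these task times."""
    return 1000.0 / frame_time_ms(cpu_ms, gpu_ms)


print(bottleneck(10, 25), fps(10, 25))  # GPU 40.0
print(fps(5, 25))    # CPU twice as fast: still 40.0
print(fps(10, 12.5)) # GPU twice as fast: 80.0
```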

It's more complicated than this - the CPU and GPU tasks are made of lots of different small tasks that stress different components on the CPU or GPU. For example a CPU-side task might involve moving data to the CPU and back - it might depend on the CPU's RAM cache and RAM speed. A 2nd CPU-side task might involve a lot of arithmetic - it depends on the CPU's processing speed. A 3rd CPU-side task may involve reading a large file from disk, and SSDs might load faster (the ideal design should be to let loading happen in the background so the frame spacing is unaffected). A GPU-side task might involve looking up a lot of textures stored in memory - it depends on the GPU's VRAM (Video RAM) caching and speed. Another GPU-side task might involve lots of calculations, and depend on the GPU's math speed.

Each of these sub-tasks can be broken down into even smaller operations. The components on the CPU or GPU side can also be broken down into more sub-components. For example a CPU has integer and floating point units - so different CPUs may

The CPU and GPU tasks aren't necessarily done at the same time - for example the simulation of the world has to be done before everything else and this is a CPU based task. Sometimes the GPU has to wait for the CPU. Usually in current FG a GPU drawing task follows a CPU task, but it isn't necessarily the case in future - it's possible to calculate certain things on the GPU and read them back to the CPU before using that information to decide what the GPU draws next. If the GPU is busy drawing the scene at the end, the CPU may use this time to read input and simulate the world at a suitable time for the next frame.

CPU bound and GPU bound situations

One of the CPU and GPU will be the limiting factor. If the CPU is the limiting factor then an application is called CPU bound in that particular situation/usage. Changing the CPU load or performance slightly (e.g. changing CPU clock speed via an overclocking tool) will cause a slight drop or increase in FPS - but doing the same for the GPU won't affect FPS. If the limiting factor is the GPU, an application is called GPU bound.

Profiling tools

Some info on the standalone graphics profilers that don't need compiling from source.

gDEBugger

Works with FG's older OpenGL contexts. Have to download v6 from archive.org: Windows download link. Linux download link. It seems AMD bought gDEBugger, and then took old versions offline as they supported reading NVIDIA specific GPU performance counters.

  • Create project: bin/fgfs.exe. Windows: Use --launcher as command-line, and any other command-line options you need. Working directory same as shortcut (the one /bin is in).
  • 3 modes. Each mode has different functions available. See menu and shortcut buttons. Modes have different amounts of slow down. Switch to faster modes to fiddle with FlightGear's in-sim settings, or change locations.
  • Menu bar: Play to launch/resume. Pause to halt and examine data/stats for the duration of the run. There are buttons to play one frame and stop, or progress to the next draw call.
  • Profile mode: fastest and with least features. Right click on graphs to add counters. Add counters for different rendering contexts: no 5 seems to be the main rendering one. There are GPU specific counters but they need 'NVInstEnabler.exe' to switch driver instrumentation on.
  • Debug mode has menu bar buttons to give different stages in the GPU trivial workloads, to see if the frame rate jumps up because that stage was a bottleneck. Shows OpenGL call stats & history. Shows resources like textures with ID, GLSL. Analyze mode can show redundant state changes & call stack.
  • 'NVInstEnabler.exe' from NVIDIA perfkit is needed to enable reading internal GPU counters from the driver. Publicly available perfkit only supports up to 900 series (Maxwell) GPUs. Perfworks is the new replacement. Perfworks supports 1000 series and later, but doesn't seem to be publicly available. I haven't got internal counters to work with gDEBugger yet.

RenderDoc

to do:

https://renderdoc.org/

NVIDIA Nsight Graphics

Quick look:

Complains about some of FlightGear's outdated OpenGL use. FG is moving to a higher OpenGL core profile in the next LTS, and the bits of old OpenGL code will be removed in the process. Contact the "fg-devel" mailing-list to help with this. FG currently uses interface calls that were removed, so only bits of Nsight work. Starting the analysing interface using the frame debugger/profiler/trace fails. However the internal performance counter reporting works, just enough to view counters in a graph.

Nsight v2018.4 supports OpenGL 4.5 core profile API [11]. Programs that only use API calls that are still in the 4.5 core profile can be profiled. Works with 10 series and later GPUs. Each version of Nsight needs a minimum driver version, so you may need to check / update.

Nsight is also much faster than gDEBugger even in profile mode.

See also