Resource Tracking for FlightGear

From FlightGear wiki
This article is a stub. You can help the wiki by expanding it.
System and GPU video memory exposed to the property tree (NVIDIA-only for now)
[Image: KSFO memory usage]
Started: 08/2015
Description: System and video memory utilization tracking
Contributor(s): hamzaalloush, Hooray (since 08/2015)
Status: Under active development as of 09/2015


Cquote1.png we DO load all textures for all effects right now - this is bug #610, which I was recently reminded about, and am doing some hacking on. This is certainly not helping our performance or memory footprint on lower-end machines since the various textures for highest-quality effects (the water depth shader, bump maps, reflection maps) are all being loaded. It’s also making startup / reset slower.
— James Turner (Apr 1st, 2014). Re: [Flightgear-devel] Towards better scenery rendering.
Cquote2.png


Cquote1.png we want to avoid writing explicit deletes as much as possible, as that need is the source of most memory leaks. We have two classes for smart, reference-counted pointers, osg::ref_ptr and SGSharedPtr, which should be used for all long-lived, shared objects.
— Tim Moore (Sat, 19 Jan 2008 02:40:21 -0800). Re: How can I retire from the forum?.
Cquote2.png
Cquote1.png it also opens up a larger question of how we do memory management in FG, and whether we should be doing things such as more aggressively freeing up terrain tiles. At one level, removing entire terrain tiles from memory earlier if memory occupancy becomes a concern would be a better management strategy than just stopping generating new buildings.
— stuart (Aug 24th, 2012). Re: Random Buildings.
Cquote2.png
Cquote1.png In general I’d rather not see any raw pointers as class members unless they are truly weakly-referenced, and in that case I’d prefer they were to an SGReferenced via SGWeakPtr.


BTW for C++11 folks I’m aware auto_ptr will go away but that should be a search-and-replace refactoring.


Cquote2.png


Cquote1.png Another goal is to add more node bits (and a GUI dialog to control them) so various categories of objects can be disabled during the update pass. This will mean the direct hit of, say, AI models vs particles vs random trees can be measured. Of course it won't account for resources (memory, textures) burned by such things, but would still help different people identify slowness on their setups.
— James Turner (Jul 19th, 2012). Re: [Flightgear-devel] Rendering passes question.
Cquote2.png
Cquote1.png 3.2 switched the base-package scenery to the high-resolution (i.e. memory-intensive) version, with the result that FG on default settings hangs my system (4GB memory, Intel graphics, no swap).

It becomes usable after reducing the bare LOD range, but one needs to know to do that; I'd like to replace the fixed defaults by something that automatically adjusts to the hardware, but haven't yet got around to this.


Cquote2.png
Cquote1.png On File > Reset, memory usage drops from 1.3GB to 1.1GB then rises to 2.3GB (at KSFO not doing anything), suggesting that a large part of the old used memory isn't being freed, and often causing an out-of-memory hang/crash. Is this a known issue?


— Rebecca Palmer (2014-09-04). [Flightgear-devel] Large memory leak on Reset.
Cquote2.png
Cquote1.png FlightGear currently has a large increase in memory usage on Reset (tested with c172p@...: 1.6GB -> minimum during reset 1.2GB -> probably-out-of-memory system hang at 2.0GB), but when I tried to trace this problem using AddressSanitizer's leak checker, the (many) leaks it found were much too small to explain this.
— Rebecca N. Palmer (2015-03-25). Re: [Flightgear-devel] Detecting circular-reference memory leaks.
Cquote2.png
Cquote1.png I was using some local hacks into SGSharedPtr to detect these issues when working on the reset code. Memory use at the ‘bottom’ of reset (after everything has been freed / references dropped, and before we start re-creating stuff) should be substantially lower than what you’re reporting, so indeed it sounds as if a circular reference has crept in.
Cquote2.png
Cquote1.png I was almost at the limit of available swap space, I had to add a temporary swap file for fear of running out of virtual memory and thus getting a crash. (Would have sucked after 6 hours of flight). As I said, I was up to 7GB but I use a 64 bit system. People with 32 bit machines would run into trouble much earlier, typically at 2 or 3GB already.
— Csaba Halász (Apr 13th, 2011). Re: [Flightgear-devel] OSG caching.
Cquote2.png
Cquote1.png I've done some initial measurement to identify potential memory leaks in SimGear.

Detection was limited to execution of code covered by unit tests.


— xDraconian (2015-03-24). [Flightgear-devel] SimGear Memory Leaks.
Cquote2.png

Status

A simple proof of concept is now working. It is currently restricted to Linux for RAM tracking and to NVIDIA GPUs for VRAM tracking; the corresponding stubs for other operating systems and GPU vendors are in place and waiting to be filled in over time. If you are interested in helping with this, you are invited to get involved: for the time being, it would help to have people with access to other operating systems (namely Windows and Mac OS X), as well as to other GPU makes (primarily ATI/AMD and Intel).

People wanting to get involved should ideally be able to patch and build FlightGear from source, so that we can exchange patches and integrate things step by step.

The mid-term idea is to expose relevant system metrics to the FlightGear property tree, so that they can be used internally for benchmarking and feature-scaling, but also for regression testing. Any metric exposed to the property tree can be trivially accessed using built-in means such as logging properties to CSV, which in turn makes it possible to create diagrams with gnuplot for different startup/run-time configurations (think minimal startup profile, or Rembrandt vs. ALS).

At some point, we also hope to leverage the patches from Initializing Nasal early to determine how much individual subsystems add to the overall RAM/VRAM load during initialization, and especially during reset and re-init.

Monday, August 23, 2015: NVIDIA GPU metrics added; GPUInfo::ATI_GPU and GPUInfo::INTEL_GPU available for extension

Using NVIDIA's GL_NVX_gpu_memory_info OpenGL extension, Hooray has been able to expose video memory information; this was initially tested successfully on a mobile GT540 card. To make this extensible to other GPUs, each vendor's implementation derives from a common GPUInfo class. Testing on other GPUs will be required in the future.
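The per-vendor design described above could look roughly like the following sketch. The method and class names (totalVRAMKB(), availableVRAMKB(), NvidiaGPUInfo, UnknownGPUInfo) are illustrative assumptions, not the actual patch; only the enum values come from the GL_NVX_gpu_memory_info specification, which reports sizes in kilobytes. The OpenGL query is injected as a function so that the logic can be exercised without a GL context.

```cpp
#include <functional>
#include <string>
#include <utility>

// Enum values from the GL_NVX_gpu_memory_info specification (results in kB).
constexpr unsigned kNvxDedicatedVidmem        = 0x9047;  // GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX
constexpr unsigned kNvxCurrentAvailableVidmem = 0x9049;  // GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX

// Same shape as glGetIntegerv(GLenum, GLint*); injected so the classes
// can be exercised without a live OpenGL context.
using IntegerQuery = std::function<void(unsigned, int*)>;

// Hypothetical vendor-neutral interface; method names are illustrative.
class GPUInfo {
public:
    virtual ~GPUInfo() = default;
    virtual std::string vendor() const = 0;
    virtual int totalVRAMKB() const = 0;      // -1 if unknown
    virtual int availableVRAMKB() const = 0;  // -1 if unknown

    int usedVRAMKB() const {
        const int total = totalVRAMKB();
        const int avail = availableVRAMKB();
        return (total < 0 || avail < 0) ? -1 : total - avail;
    }
};

class NvidiaGPUInfo : public GPUInfo {
public:
    explicit NvidiaGPUInfo(IntegerQuery query) : query_(std::move(query)) {}
    std::string vendor() const override { return "NVIDIA"; }
    int totalVRAMKB() const override { return get(kNvxDedicatedVidmem); }
    int availableVRAMKB() const override { return get(kNvxCurrentAvailableVidmem); }

private:
    int get(unsigned pname) const {
        int value = -1;  // stays -1 if the query does not write a result
        query_(pname, &value);
        return value;
    }
    IntegerQuery query_;
};

// Placeholder for vendors without an implementation yet (ATI/AMD, Intel).
// On Mesa drivers, GLX_MESA_query_renderer might be one way to fill this in.
class UnknownGPUInfo : public GPUInfo {
public:
    std::string vendor() const override { return "unknown"; }
    int totalVRAMKB() const override { return -1; }
    int availableVRAMKB() const override { return -1; }
};
```

In FlightGear itself, the injected query would presumably wrap glGetIntegerv, e.g. `[](unsigned p, int* v) { glGetIntegerv(p, v); }`, called with a current GL context and only after checking that the driver advertises GL_NVX_gpu_memory_info.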

Thursday, August 27, 2015: System Information Gatherer And Reporter (SIGAR) libraries adopted, new "sigar" sub-system

We decided to adopt the System Information Gatherer And Reporter (SIGAR) libraries to provide cross-platform support for resource tracking. Memory, in addition to other new metrics (shown in the Gallery section of this article), is now exposed under "/sigar" in the property tree (sigar is also the name of the new subsystem; both names may be subject to change). VRAM support for other, non-NVIDIA vendors is now postponed, because Nasal and per-subsystem resource tracking have priority and because of issues involving non-proprietary drivers. A development repository for the project is also being considered. For the time being, people wanting to test the latest patch will need to clone the sigar repository, build and install the master branch (system-wide), and then rebuild fgfs using -DENABLE_SIGAR=ON.

Gallery

GPU/VRAM

Cquote1.png Maybe the GLX_MESA_query_renderer extension would be of use? Regards, Edward. Edit: This is where I heard it: Phoronix: GLX_MESA_query_renderer Extension Published and [Mesa-dev] [RFC] GLX_MESA_query_renderer.
Cquote2.png

NVIDIA

Background

Note  For better diagnostics and better end-user bug reports, we could consider exposing a cross-platform process and system utilities module via Nasal/CppBind, such as psutil (Windows, Mac OS & BSD/Unix). Not done

ticket 1447

Although many users now run FlightGear on 64-bit operating systems with sufficient RAM (8+ GB), since early 2010 we have been seeing an increasing number of end-user reports about errors along the lines of "Open GL - out of memory error" or "Warning: detected OpenGL error 'out of memory' at after RenderBin::draw(..)", as can be seen in countless forum discussions, mailing list postings and bug reports in the issue tracker. This effort is intended to expose process-specific metrics (e.g. RAM/VRAM utilization) in the FlightGear property tree, so that this information can be used for troubleshooting, but also for benchmarking and feature-scaling.

Cquote1.png This is so useful, maybe people might want to apply this patch to the already supported platforms (Linux and NVIDIA). It really is an eye opener: just spawning at KSFO with the UFO will consume over 2 GB of RAM (never mind something like the triple 7), and as you can see in the picture, GPU memory usage is nearly at 1 GB of VRAM, which was this card's maximum. So maybe we can now establish minimum hardware requirements like other sims :) People with AMD hardware are wanted to help make this the default in FG; no further need to use the task manager to track resources... We have direct access! Not even FSX does this, once we're done.
— hamzaalloush (Aug 25th, 2015). Re: Looking for people with ATI/AMD GPUs building from sourc.
Cquote2.png
Cquote1.png we can now also begin tracking RAM/VRAM utilization for each initialized subsystem (think reset/re-init), which will help identify "rogue" (broken) subsystems, so that these can either be fixed/optimized or made entirely optional. In addition, it would become possible to track the complexity of different locations/aircraft using different startup/run-time profiles. Obviously, this could easily be the foundation for any kind of "benchmark", but also for actual feature-scaling, to dynamically adjust certain simulator settings based on things like RAM, VRAM, CPU or Nasal/GC load, while allowing content developers (those doing scenery/aircraft) to look "behind the scenes" (e.g. to help optimize aircraft like the 777 or extra500).
Cquote2.png
Cquote1.png the idea is to track resource utilization of different subsystems as they are initialized in FG (not just memory usage, but also Nasal statistics for tracking where Nasal scripts are affecting performance). This is meant to be used by developers and users who, for example, want to adjust view LOD to what system memory allows, etc... Maybe Hooray can clarify a bit. We are also entertaining the feature-scaling idea: ever played a game where you can use "automatic" settings for your system? Well, if we have all the system information, we can use it to apply optimal settings for your system.
— hamzaalloush (Aug 25th, 2015). Re: Looking for people with ATI/AMD GPUs building from sourc.
Cquote2.png
Cquote1.png In its simplest form, it will merely tell you RAM/VRAM utilization for different startup/run-time settings, so that the impact of, for example, using different aircraft/locations (or weather systems) can be compared in varying situations. Equally, this would allow us to grow a library of startup/run-time profiles and compare the memory footprint for different hardware (think graphics cards). Such a library of startup settings could be grown and maintained as part of $FG_ROOT; at some point, this could even include scripted flights (think route-manager driven, and/or replay/fgtape based).
Cquote2.png

Problem

It would be great to write subsystem-specific RAM usage info to the property tree, so that it can be shown (e.g. in the performance monitor), but also used by Nasal scripts to scale down features dynamically. The degree of swap usage vs. the amount of free RAM is probably one of the most important metrics here.
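On Linux, the system-wide figures mentioned above (free RAM vs. swap usage) can be read from /proc/meminfo. A minimal sketch, assuming a kernel that reports MemAvailable (Linux 3.14 or later); MemInfo and parseMemInfo() are hypothetical names, and on other platforms SIGAR's sigar_mem_get()/sigar_swap_get() would be the portable route. Reading from a std::istream keeps the parser testable; in practice one would pass std::ifstream("/proc/meminfo").

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Subset of Linux /proc/meminfo; all values in kB as reported by the kernel.
struct MemInfo {
    std::uint64_t memTotal = 0;
    std::uint64_t memAvailable = 0;  // estimate of RAM usable without swapping
    std::uint64_t swapTotal = 0;
    std::uint64_t swapFree = 0;

    std::uint64_t swapUsed() const { return swapTotal - swapFree; }
};

// Parses lines such as "MemTotal:        8056180 kB"; unknown keys are ignored.
MemInfo parseMemInfo(std::istream& in) {
    MemInfo info;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        std::uint64_t value = 0;
        if (!(fields >> key >> value)) continue;  // skip malformed lines
        if (key == "MemTotal:") info.memTotal = value;
        else if (key == "MemAvailable:") info.memAvailable = value;
        else if (key == "SwapTotal:") info.swapTotal = value;
        else if (key == "SwapFree:") info.swapFree = value;
    }
    return info;
}
```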

We also don't know much about aircraft/model complexity. It would surely be a good idea to track the RAM usage of each loaded model and write it to the property tree, so that we have some metrics and can actually see how much of an impact MP aircraft have.

To be brutally honest, FG is simply dumb here - it doesn't even know how much real RAM you have, or whether the OS starts swapping to disk (which is dead slow), nor does it know how much video RAM (VRAM) you have. These are things that can be fixed, and they need to be fixed first, prior to adding more and more memory-hungry features.

Seriously, it's like spending a ton of money each day, without knowing how much you have available - what we're currently doing isn't much different actually.

Honestly, it makes ZERO sense to continue developing FG like this, because we're basically designing a car without having a reliable means to tell how fast it is going or how much fuel it is consuming ...

We are adding more and more features, but we have zero clue as to how expensive they really are, because we're lacking the instruments to track resource usage properly.

Adding RAM/swap usage tracking will tell us how much RAM FG is consuming, and whether it starts swapping. Next, we would need to track RAM usage per subsystem (or feature). We could then also configure a "cap" to stop allocating RAM unless RAM is also freed; that is then touching the realms of "feature scaling" and resource management. Currently, we only care about frame rate and frame latency (spacing), i.e. how long it takes to create a single frame. We are unaware of other resource usage like RAM, which needs fixing, simple as that. FG is being developed by power users (developers) who tend to have extremely powerful computers, and I am guilty of this as well: we do not typically test our new features on less powerful hardware (or even just old computers, or netbooks).

That is a huge problem for people who do not have access to the latest hardware.

The problem mentioned on the forum (v2 scenery) is a good example for this: nobody noticed that there was a bug in TerraGear that caused the created scenery to use massive amounts of memory - it was tracked down by the atlas developer, by coincidence, because he was loading the scenery into atlas, and saw the number of UNNECESSARY triangles created by TG.

Now, this is just a symptom - and it will be fixed during the next scenery build, but it goes to show that we DO have a real problem here, because nobody really looks at memory usage until there really is a problem - and in this case, we would probably still not know about it if it wasn't for Atlas and its developer.

So in this case, it was the scenery generator (TG) that was buggy, and we never noticed it, because we are lacking the "instruments" to track memory usage per subsystem, or we would have seen much earlier that v2 scenery is eating up ~twice as much RAM, without adding any useful triangles ...

While fixing the scenery issue would be straightforward, that still would not help us with similar problems elsewhere: imagine some other system in FG leaking massive amounts of memory. As long as we aren't doing any tracking, we won't notice such things, and we cannot really evaluate how well a system performs. Currently, we're only looking at "speed" in terms of performance, but we have no idea how much system memory or video memory is being used by certain features. Other games and simulators solved this problem long ago, and it makes absolute sense. Right now, we're just making sure that the program (fgfs) runs at 30 fps or more, but we have no idea how much "fuel" (RAM) is burnt, or where it is used, because we're lacking a fuel gauge ...

There really needs to be a better memory management strategy in FlightGear in general. Smart pointers have been part of SimGear long before SGReferenced and long before osg::ref_ptr [1], yet there's lots of new stuff added each week not using any smart pointers at all - including my patch, by the way ;-)

Stuart's random buildings code is an exception in that it uses all the osg::ref_ptr machinery already...
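The circular-reference leaks mentioned in the Reset discussion are easy to reproduce with any reference-counted pointer. A minimal sketch, using std::shared_ptr and std::weak_ptr as stand-ins for SGSharedPtr and SGWeakPtr (the SimGear classes are intrusive, so their API differs); Owner, Child and ownerIsFreed() are hypothetical. With a strong back-reference, the pair is never freed once the last outside reference is dropped; making the back-reference weak breaks the cycle.

```cpp
#include <memory>

// std::shared_ptr / std::weak_ptr stand in for SGSharedPtr / SGWeakPtr here;
// Owner and Child are hypothetical long-lived objects, not SimGear classes.
struct Owner;

struct Child {
    std::shared_ptr<Owner> strongOwner;  // back-reference that forms a cycle
    std::weak_ptr<Owner> weakOwner;      // back-reference that does not
};

struct Owner {
    std::shared_ptr<Child> child;  // Owner keeps its Child alive
};

// Builds an Owner/Child pair, drops the last external reference (as a
// reset would) and reports whether the Owner was actually destroyed.
bool ownerIsFreed(bool useWeakBackReference) {
    std::weak_ptr<Owner> probe;
    {
        auto owner = std::make_shared<Owner>();
        owner->child = std::make_shared<Child>();
        if (useWeakBackReference)
            owner->child->weakOwner = owner;
        else
            owner->child->strongOwner = owner;  // use count can never reach 0
        probe = owner;
    }  // external reference dropped here
    const bool freed = probe.expired();
    // Break the cycle by hand so this demo itself does not leak.
    if (auto leaked = probe.lock())
        leaked->child->strongOwner.reset();
    return freed;
}
```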

That said, OSG has quite a lot of "clever" machinery to help with memory management, at least for all the rendering-related stuff. For "problems" like excessive RAM usage with random buildings and random trees, there are well-defined "standard" solutions available in OSG, but for many other problems there aren't any. So rendering-related issues can obviously be addressed by using what OSG provides.

Thus, the issue is really two-fold: some subsystems make excessive use of RAM, and others leak lots of memory.

That would probably justify introducing a dedicated "memory management subsystem" eventually, i.e. something like a new/delete replacement that uses the Boehm-Demers-Weiser GC to manage resources dynamically, or a custom memory pool implementation using "placement new" for subsystems and other code that allocates memory.
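A custom memory pool using placement new, as suggested above, could start out as small as the following sketch. It is a generic, single-threaded illustration (ObjectPool is a hypothetical name), not a proposal for a specific FlightGear subsystem; a real pool would also need thread safety and a policy for what happens when it is exhausted.

```cpp
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Minimal fixed-capacity object pool built on placement new. Storage for
// N objects is reserved up front, so the pool's footprint is known and capped.
template <typename T, std::size_t N>
class ObjectPool {
public:
    ObjectPool() {
        freeSlots_.reserve(N);
        for (std::size_t i = N; i-- > 0;) freeSlots_.push_back(i);
    }
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    ~ObjectPool() {
        for (T* obj : live_)
            if (obj) obj->~T();  // destroy anything the caller did not release
    }

    // Constructs a T in a free slot, or returns nullptr when the pool is
    // exhausted (instead of growing, which is the point of a capped pool).
    template <typename... Args>
    T* create(Args&&... args) {
        if (freeSlots_.empty()) return nullptr;
        const std::size_t i = freeSlots_.back();
        // Placement new: construct in pre-allocated storage. If T's
        // constructor throws, the slot is still marked free.
        T* obj = new (&storage_[i]) T(std::forward<Args>(args)...);
        freeSlots_.pop_back();
        live_[i] = obj;
        return obj;
    }

    // Destroys an object previously returned by create().
    void destroy(T* obj) {
        const std::size_t i = slotIndex(obj);
        obj->~T();
        live_[i] = nullptr;
        freeSlots_.push_back(i);
    }

    std::size_t inUse() const { return N - freeSlots_.size(); }

private:
    struct alignas(T) Slot { unsigned char bytes[sizeof(T)]; };

    std::size_t slotIndex(const T* obj) const {
        const auto* p = reinterpret_cast<const unsigned char*>(obj);
        const auto* base = reinterpret_cast<const unsigned char*>(storage_);
        return static_cast<std::size_t>(p - base) / sizeof(Slot);
    }

    Slot storage_[N];
    T* live_[N] = {};
    std::vector<std::size_t> freeSlots_;
};
```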


Manual memory management in C++ is always extremely tedious and error-prone, especially in non-trivial projects with a long history and lifespan.

Stuart: I think it also opens up a larger question of how we do memory management in FG, and whether we should be doing things such as more aggressively freeing up terrain tiles. At one level, removing entire terrain tiles from memory earlier if memory occupancy becomes a concern would be a better management strategy than just stopping generating new buildings.

SIGAR

The Sigar API provides a portable interface for gathering system information such as:

  • System memory, swap, cpu, load average, uptime, logins
  • Per-process memory, cpu, credential info, state, arguments, environment, open files
  • File system detection and metrics
  • Network interface detection, configuration info and metrics
  • TCP and UDP connection tables
  • Network route table

This information is available in most operating systems, but each OS has its own way(s) of providing it. SIGAR provides developers with one API to access this information regardless of the underlying platform. The core API is implemented in pure C.


I've skimmed through the SIGAR sources on GitHub: https://github.com/hyperic/sigar

It turns out the latest code (master branch) already has CMake support, so it is really straightforward, and lightweight, to build.

For now, I have built the whole thing out of source and installed it system-wide. However, like you said, it would probably make sense to directly absorb this into $FG_SRC.

For testing purposes, this will do - I don't expect many people to use our little patch until it has grown a little more ... I've looked at the demos/examples, and they're also fairly straightforward - for example, see: https://github.com/hyperic/sigar/blob/master/examples/cpuinfo.c

This means that you get tons of useful information with just ~15 lines of C code. So I am probably going to focus on replacing the hard-coded Linux code in the patch with the equivalent SIGAR code, because SIGAR is multi-platform and provides much more, and more accurate, information, too.

You can see a plethora of unit tests here: https://github.com/hyperic/sigar/tree/master/tests Some of those could be directly useful in FG, too.

Actually integrating this into FG should be straightforward: the main thing is extending our CMake logic to link in sigar and include sigar.h, so that we can directly call SIGAR APIs in our little "process-stats" subsystem (I may end up renaming it "sigar").

The required CMake changes can be seen in the CMakeLists.txt file in the sigar source tree.

Once we are using SIGAR, we should probably get this committed to a topic branch, so that others can more easily test/extend the code over time - I guess it would make sense to see which metrics we'll want to plot using gnuplot ...

Startup Profiles

WIP.png Work in progress
This article or section will be worked on in the upcoming hours or days.
See history for the latest developments.


Note  The following FlightGear startup profile assumes that you have the $FG_ROOT environment variable set up, or that you explicitly set fg-root using the --fg-root command line argument. This startup profile is intended to be put into your Fgfsrc file or to be used when starting FlightGear from the command line.

The profile listed below is

  • name: minimal
  • version: 3.7
  • description: n/a
# --ignore-autosave # uncomment this for FlightGear versions >= 2.99
--disable-terrasync
--disable-splash-screen
--airport=ksfo
--offset-distance=4000
--offset-azimuth=90
--altitude=500
--heading=0
--model-hz=60
--disable-random-objects
--prop:/sim/rendering/texture-compression=off
--prop:/sim/rendering/quality-level=0
--prop:/sim/rendering/shaders/quality-level=0
--disable-ai-traffic
--prop:/sim/ai/enabled=0
--aircraft=ufo
--disable-sound
--prop:/sim/rendering/random-vegetation=0
--prop:/sim/rendering/random-buildings=0
--disable-specular-highlight
--disable-ai-models
--disable-clouds
--disable-clouds3d
# --disable-textures
--fog-fastest
--visibility=5000
--disable-distance-attenuation
--disable-real-weather-fetch
--prop:/sim/rendering/particles=0
--prop:/sim/rendering/multi-sample-buffers=1
--prop:/sim/rendering/multi-samples=2
--prop:/sim/rendering/draw-mask/clouds=false
--prop:/sim/rendering/draw-mask/aircraft=false
--prop:/sim/rendering/draw-mask/models=false
--prop:/sim/rendering/draw-mask/terrain=false

--disable-random-vegetation
--disable-random-buildings
--disable-rembrandt
--disable-horizon-effect



Using gnuplot

WIP.png Work in progress
This article or section will be worked on in the upcoming hours or days.
See history for the latest developments.
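Since any metric in the property tree can be logged, the plotting side is simple. A minimal sketch of writing samples as CSV from C++, assuming one wants a file outside the built-in property logging; CsvSampleLog and the column names are hypothetical.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical CSV writer for sampled metrics, producing a file gnuplot can
// read directly. Column 1 is elapsed time in seconds; the remaining columns
// hold the sampled values in the order given by `names`.
class CsvSampleLog {
public:
    CsvSampleLog(std::ostream& out, std::vector<std::string> names)
        : out_(out), columns_(names.size()) {
        out_ << "time";
        for (const auto& name : names) out_ << ',' << name;
        out_ << '\n';
    }

    // Writes one row; returns false (and writes nothing) on a width mismatch.
    bool addRow(double t, const std::vector<double>& values) {
        if (values.size() != columns_) return false;
        out_ << t;
        for (double v : values) out_ << ',' << v;
        out_ << '\n';
        return true;
    }

private:
    std::ostream& out_;
    std::size_t columns_;
};
```

gnuplot can then read such a file with, for example, `set datafile separator ","` and `set key autotitle columnhead`, followed by `plot "memory.csv" using 1:2 with lines` (the file name being whatever the log was written to).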

Extending OSG StatsHandler

Objective

TheTom: "I don't think we should track every bit of allocated memory. This could become very slow and there already exist special malloc/new implementations to track every allocation. Probably a good start would be to have a closer look at the obvious locations like scenery, models and the property tree. I think the memory used by Tiles, Objects, and PropertyNodes should already give a good first picture (I expect the property nodes to use only a small fraction of the memory used by Tiles/Objects). Maybe it is also possible to get the GPU memory used by Vertex/Texture data."

  • expose total amount of available physical RAM via the property tree 30% completed (so far only a rough prototype for Linux)
  • expose total amount of RAM used by the fgfs process via the property tree 30% completed (so far only a rough prototype for Linux)
  • expose "swappiness" - i.e. percentage of used RAM being swapped 10% completed
  • investigate exposing subsystem-specific memory stats for all major subsystems (nasal, canvas, autopilot, gui etc) Not done
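On Linux, the per-process figures in the list above can be taken from /proc/self/status. A minimal sketch; ProcessMem and parseProcessStatus() are hypothetical names, and swappedPercent() is just one way to define the "swappiness" metric from the list (the share of the process's memory currently in swap), not the kernel's vm.swappiness setting.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Per-process figures from Linux /proc/self/status (values in kB).
struct ProcessMem {
    std::uint64_t vmRSS = 0;   // resident in physical RAM
    std::uint64_t vmSwap = 0;  // swapped out to disk

    // Share of the process's memory that currently lives in swap, in percent.
    // One possible reading of "swappiness" as used above; this is NOT the
    // kernel's vm.swappiness tunable.
    double swappedPercent() const {
        const std::uint64_t total = vmRSS + vmSwap;
        return total == 0 ? 0.0
                          : 100.0 * static_cast<double>(vmSwap) / static_cast<double>(total);
    }
};

// Parses lines such as "VmRSS:\t  123456 kB"; lines whose second field is
// not a number (e.g. "Name:\tfgfs") are skipped.
ProcessMem parseProcessStatus(std::istream& in) {
    ProcessMem mem;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        std::uint64_t value = 0;
        if (!(fields >> key >> value)) continue;
        if (key == "VmRSS:") mem.vmRSS = value;
        else if (key == "VmSwap:") mem.vmSwap = value;
    }
    return mem;
}
```

In practice the stream would be `std::ifstream("/proc/self/status")`, read once per sampling interval.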

Roadmap

  • extend the help/about dialog to also show: amount of RAM/VRAM/SWAP, and utilization [1] Not done
  • expose amount of total RAM available to the property tree 30% completed (Linux-only for now)
  • expose amount of currently used RAM to the property tree 30% completed (Linux-only for now)
  • expose amount of swap space used 30% completed (Linux-only for now)
  • expose amount of total/used VRAM 30% completed (NVIDIA-only for now)
  • expose Nasal GC internals to the property tree using ThorstenB's patch 20% completed
  • add a postinit hook to SGSubsystem (SimGear) so that initialization overhead for each subsystem can be logged Not done
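The proposed post-init hook boils down to sampling the process footprint before and after each subsystem's init(). A minimal sketch, assuming a hypothetical Subsystem interface (not SimGear's SGSubsystem API) and an injected sampler returning kB, for example VmRSS from /proc/self/status on Linux.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal subsystem interface; not SimGear's SGSubsystem.
struct Subsystem {
    virtual ~Subsystem() = default;
    virtual void init() = 0;
};

// Returns the current process footprint in kB, e.g. VmRSS from
// /proc/self/status on Linux or an equivalent SIGAR query elsewhere.
using MemorySampler = std::function<std::int64_t()>;

using NamedSubsystems = std::vector<std::pair<std::string, Subsystem*>>;

// Initializes subsystems in order and records how much the process grew
// across each init() call, i.e. roughly what a post-init hook could log.
// Allocations by other threads meanwhile will blur the figures, and a
// negative delta simply means memory was released during init().
std::map<std::string, std::int64_t> initWithAccounting(
        const NamedSubsystems& subsystems, const MemorySampler& sample) {
    std::map<std::string, std::int64_t> deltaKB;
    for (const auto& entry : subsystems) {
        const std::int64_t before = sample();
        entry.second->init();
        deltaKB[entry.first] = sample() - before;
    }
    return deltaKB;
}
```

Because RSS is process-wide, the numbers are only indicative, but logged across a reset/re-init they would show which subsystem fails to give memory back.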

Ideas & Experiments

Having access to internal RAM usage statistics would definitely be useful, and would probably help us improve subsystems further. Enabling/disabling a potential allocation block could then also be implemented through a boolean property, which the system sets automatically once certain heuristics apply; this would not even need to be implemented in C++, it could just as well be a property rule or a Nasal script. The C++ code just needs an option to be told to stop allocating new objects, while ensuring that it publishes internal stats and writes them into the property tree.
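The C++ side of such an allocation block can be very small. A minimal sketch, assuming a feature (such as random buildings) asks permission before creating new objects; AllocationGate, its byte budget and setBlocked() are hypothetical, with setBlocked() meant to be driven by a property listener, property rule or Nasal script, and used()/refused() being the kind of internal stats that would be written to the property tree.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical gate a feature (e.g. random buildings) would consult
// before creating new objects.
class AllocationGate {
public:
    explicit AllocationGate(std::size_t budgetBytes) : budget_(budgetBytes) {}

    // Meant to be driven from outside, e.g. by a property listener,
    // a property rule or a Nasal script watching memory statistics.
    void setBlocked(bool blocked) { blocked_ = blocked; }

    // Returns true (and books the bytes) if the caller may allocate;
    // refuses while blocked or when the budget would be exceeded.
    bool request(std::size_t bytes) {
        if (blocked_ || bytes > budget_ - used_) {
            ++refused_;
            return false;
        }
        used_ += bytes;
        return true;
    }

    // Gives bytes back, e.g. when a tile and its objects are unloaded.
    void release(std::size_t bytes) { used_ -= std::min(bytes, used_); }

    // Internal stats worth publishing to the property tree.
    std::size_t used() const { return used_; }
    std::size_t refused() const { return refused_; }
    bool blocked() const { return blocked_; }

private:
    std::size_t budget_;
    std::size_t used_ = 0;
    std::size_t refused_ = 0;
    bool blocked_ = false;
};
```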

Exposing internal RAM usage stats to the property tree would still seem useful, especially for people wanting to make bug reports, or even just for dynamic "feature-scaling" using Nasal and/or XML property rules. Thus, I feel that even just replacing the SG_LOG() statements with SGPropertyNode->setStringValue(...) would be a good move, as it would help people looking "under the hood" of the system, so to speak.

And like I said earlier, being able to control the behavior of the system (even if that just means disabling/enabling allocations) would definitely be useful to see if segfaults/crashes reported by users can even be remotely related to the new system or not. If we had this option now, it would be much easier for us to provide support to the users here seeing "OpenGL out of memory" errors.

It would be worthwhile to keep this in mind and possibly really come up with a subsystem that monitors FlightGear's RAM usage at configurable intervals of say 1-5 seconds, and dumps all the info to the property tree, where it can be further processed by Nasal scripts or GUI listeners. If it's really swapping that slows down or kills FG for so many people, we obviously need to prevent it - and having access to real time memory usage stats would definitely seem useful then - not just for the random buildings system, but also all other subsystems that may "over-allocate" and run into swap space.
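A monitor like this could follow the usual per-frame update(dt) pattern and only sample every few seconds. A minimal sketch; MemoryMonitor, the Sample struct and the /memory/... property paths are hypothetical, and a std::map stands in for the property tree. The sampler is injected, so it could read /proc on Linux or call SIGAR elsewhere.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Stand-in for the property tree: path -> value. Paths are hypothetical.
using PropertyMap = std::map<std::string, double>;

// Hypothetical monitor driven by a per-frame update(dt) call. It only samples
// every `interval` seconds, so reading OS statistics stays cheap.
class MemoryMonitor {
public:
    struct Sample {
        double rssMB;   // resident memory of the fgfs process
        double swapMB;  // portion of the process that is swapped out
    };
    using Sampler = std::function<Sample()>;

    MemoryMonitor(PropertyMap& props, Sampler sampler, double intervalSec = 5.0)
        : props_(props), sampler_(std::move(sampler)), interval_(intervalSec) {}

    void update(double dt) {
        elapsed_ += dt;
        if (elapsed_ < interval_) return;
        elapsed_ = 0.0;  // restart the interval after each sample
        const Sample s = sampler_();
        props_["/memory/rss-mb"] = s.rssMB;
        props_["/memory/swap-mb"] = s.swapMB;
        props_["/memory/sample-count"] = static_cast<double>(++samples_);
    }

private:
    PropertyMap& props_;
    Sampler sampler_;
    double interval_;
    double elapsed_ = 0.0;
    long samples_ = 0;
};
```

Nasal scripts or GUI listeners would then simply watch the published properties instead of querying the OS themselves.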

Thus, even just having a boolean "/memory/swap/is-swapping" would be useful, because other subsystems could monitor it using a listener and then adjust their own allocation behavior dynamically.
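One way to drive such a /memory/swap/is-swapping flag on Linux is to watch the cumulative pswpin/pswpout counters in /proc/vmstat and report swapping whenever they increase between two samples. A minimal sketch; SwapCounters, parseVmStat() and SwapActivityDetector are hypothetical names, and these counters are system-wide rather than specific to fgfs.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Cumulative swap counters from Linux /proc/vmstat (pages since boot).
struct SwapCounters {
    std::uint64_t pswpin = 0;   // pages swapped in
    std::uint64_t pswpout = 0;  // pages swapped out
};

// /proc/vmstat consists of "name value" pairs, one per line.
SwapCounters parseVmStat(std::istream& in) {
    SwapCounters counters;
    std::string key;
    std::uint64_t value = 0;
    while (in >> key >> value) {
        if (key == "pswpin") counters.pswpin = value;
        else if (key == "pswpout") counters.pswpout = value;
    }
    return counters;
}

// Reports "swapping" if any pages moved to or from swap since the previous
// sample; the first sample only establishes a baseline.
class SwapActivityDetector {
public:
    bool update(const SwapCounters& now) {
        const bool swapping = primed_ &&
            (now.pswpin > last_.pswpin || now.pswpout > last_.pswpout);
        last_ = now;
        primed_ = true;
        return swapping;
    }

private:
    SwapCounters last_;
    bool primed_ = false;
};
```

The boolean returned by update() is what would be written to the flag, so that listeners only fire when the state actually changes.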

Hooray once talked with Stuart and TheTom about also adding details about the amount of physical RAM vs. swap usage. Having these things available in FG would make troubleshooting on the forums quite a bit easier, and would also allow us to do feature-scaling (dynamically adjusting features based on resource usage). Hooray provided a patch for this (Linux-only implementation) back then, and we could probably extend it to all main operating systems (Windows, Mac, Linux): http://forum.flightgear.org/viewtopic.php?f=5&t=16083&p=165168#p164936

As can be seen in the original discussion, we specifically talked about subsystem-specific RAM usage tracking (random buildings, canvas, autopilot, replay/flight recorder etc) - analogous to ThorstenB's system monitor, just with a focus on RAM (memory).

Possibly via some custom smart pointer implementation that overloads new/delete - the main issue being that we really need a way to track memory usage per subsystem, so that we can identify "rogue" features - either to fix them, or to provide hooks for feature-scaling purposes.
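One way to get per-subsystem numbers without a full custom allocator is class-specific operator new/delete. A minimal sketch; TrackedAllocations, the tag types and Building are hypothetical, and the approach only counts the tracked objects themselves, not everything they allocate indirectly.

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Hypothetical per-category allocation counters. Classes derive from
// TrackedAllocations<Tag>; their class-specific operator new/delete then keep
// running totals for that Tag. Only single heap objects of those classes are
// counted: arrays (new[]), stack objects and memory the objects allocate
// internally (std::vector buffers, OSG geometry) are not.
template <typename Tag>
struct TrackedAllocations {
    // Function-local statics give each Tag its own pair of counters.
    static std::atomic<std::size_t>& liveBytes() {
        static std::atomic<std::size_t> bytes{0};
        return bytes;
    }
    static std::atomic<std::size_t>& liveObjects() {
        static std::atomic<std::size_t> objects{0};
        return objects;
    }

    static void* operator new(std::size_t size) {
        void* p = ::operator new(size);  // throws std::bad_alloc on failure
        liveBytes() += size;
        ++liveObjects();
        return p;
    }

    // The sized form receives the same size operator new was given, as long
    // as objects are deleted through their most-derived type (or the
    // hierarchy has a virtual destructor).
    static void operator delete(void* p, std::size_t size) noexcept {
        if (p == nullptr) return;
        liveBytes() -= size;
        --liveObjects();
        ::operator delete(p);
    }
};

// Example categories and a tracked class (all hypothetical).
struct RandomBuildingsTag {};
struct CanvasTag {};

struct Building : TrackedAllocations<RandomBuildingsTag> {
    float x = 0.0f, y = 0.0f, height = 0.0f;
};
```

A subsystem could then publish TrackedAllocations<Tag>::liveBytes() to the property tree from its update() method.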

For Nasal space, we pretty much have a working solution that could help us identify scripts that are causing lots of new naRefs or temporaries. But C++ code is a different beast, and as can be seen in many discussions here, quite a few people with 8 GB of RAM or more are seeing crashes on 64-bit operating systems due to "out of RAM" errors.

There are also many other examples of this, such as the v2 scenery eating up tons of RAM without anybody noticing, until the Atlas developer pointed this out: http://forum.flightgear.org/viewtopic.php?f=5&t=21498&p=195610&hilit=#p195610

So after the 3.0 release, we should probably investigate how to track RAM per subsystem/feature and provide some "live" stats.

Proof of concept (patch)

Note  http://codepad.org/27VcoMSX/raw.txt

Here's a first stab at a simple subsystem to monitor FlightGear memory usage on Linux at 5 second intervals, consider it a "proof of concept" prototype now.

History

Typical suggestions include:

The error is related to your fgfs running out of memory (but not your gpu memory)!

This will happen with detailed aircraft or detailed scenery.
Custom scenery like the one from the papillon81 repo will cause this to happen really quickly.
Using a detailed plane like the 787 from omega makes it worse.
(It will crash FG on startup at LOWI, for example, or cause scenery and some textures not to be loaded, which leads to a crash a couple of minutes later.)
This happens without any eye candy activated!

And we've even been seeing people reporting "memory issues" despite having 12+ GB of RAM [3].

Cquote1.png Fast forward 2 years later, there are now several options available to query the OS/GPU driver and obtain GPU specific information, such as amount of dedicated VRAM and overall GPU/VRAM utilization - here's what I've found for nvidia, and I am willing to provide a corresponding patch to expose these stats to the property tree and update them, i.e. once or twice per second:


http://www.geeks3d.com/20100531/program ... in-opengl/


— Hooray (Sat Aug 16). Re: Open GL - out of memory error..
Cquote2.png