Resource Tracking for FlightGear

From FlightGear wiki
RAM usage metrics exposed to the property tree (Linux only for now)
Started: 08/2015
Description: memory utilization tracking
Contributor(s): hamzaalloush, Hooray (since 08/2015)
Status: Under active development as of 09/2015


This article is a stub. You can help the wiki by expanding it.


"In general I’d rather not see any raw pointers as class members unless they are truly weakly-referenced, and in that case I’d prefer they were to an SGReferenced via SGWeakPtr.

BTW for C++11 folks I’m aware auto_ptr will go away but that should be a search-and-replace refactoring."

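As a minimal sketch of the first point, standard C++11 types can stand in for the SimGear ones (an assumption; SGSharedPtr/SGWeakPtr are not used here). An owning member held by std::unique_ptr, the replacement for auto_ptr, needs no hand-written delete. The class names are hypothetical and only mirror the shape of the MemoryInterface/MemoryUsageStats pair in the patch further down.

```cpp
#include <memory>
#include <utility>

// Hypothetical interface, shaped like the patch's MemoryInterface.
struct MemorySource {
    virtual ~MemorySource() = default;   // required for deletion via a base pointer
    virtual long totalKb() const = 0;
};

// Hypothetical fixed-value implementation, handy for tests.
struct FixedMemorySource : MemorySource {
    explicit FixedMemorySource(long kb) : _kb(kb) {}
    long totalKb() const override { return _kb; }
    long _kb;
};

// Owning member held by std::unique_ptr: the source is destroyed together with
// MemoryStats, so there is no destructor body and no raw delete.
class MemoryStats {
public:
    explicit MemoryStats(std::unique_ptr<MemorySource> src) : _src(std::move(src)) {}
    long total() const { return _src->totalKb(); }
private:
    std::unique_ptr<MemorySource> _src;
};
```

Because std::unique_ptr is move-only, MemoryStats also becomes non-copyable, which rules out the double delete that copying a raw owning pointer member would cause.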
"3.2 switched the base-package scenery to the high-resolution (i.e. memory-intensive) version, with the result that FG on default settings hangs my system (4GB memory, Intel graphics, no swap).

It becomes usable after reducing the bare LOD range, but one needs to know to do that; I'd like to replace the fixed defaults by something that automatically adjusts to the hardware, but haven't yet got around to this."

"On File > Reset, memory usage drops from 1.3GB to 1.1GB then rises to 2.3GB (at KSFO not doing anything), suggesting that a large part of the old used memory isn't being freed, and often causing an out-of-memory hang/crash. Is this a known issue?"
— Rebecca Palmer (2014-09-04). [Flightgear-devel] Large memory leak on Reset.

"FlightGear currently has a large increase in memory usage on Reset (tested with c172p@...: 1.6GB -> minimum during reset 1.2GB -> probably-out-of-memory system hang at 2.0GB), but when I tried to trace this problem using AddressSanitizer's leak checker, the (many) leaks it found were much too small to explain this."
— Rebecca N. Palmer (2015-03-25). Re: [Flightgear-devel] Detecting circular-reference memory leaks.

"I was using some local hacks into SGSharedPtr to detect these issues when working on the reset code. Memory use at the ‘bottom’ of reset (after everything has been freed / references dropped, and before we start re-creating stuff) should be substantially lower than what you’re reporting, so indeed it sounds as if a circular reference has crept in."

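The circular-reference problem described above can be illustrated without SimGear. This sketch uses std::shared_ptr/std::weak_ptr as stand-ins for SGSharedPtr/SGWeakPtr (an assumption; the SimGear classes are not used), and the Subsystem/Listener types are hypothetical. A strong back-reference keeps both objects alive after the last outside handle is dropped, which is exactly the memory a reset would fail to reclaim; a weak back-reference lets them be destroyed.

```cpp
#include <memory>

// Hypothetical types; std::shared_ptr / std::weak_ptr stand in for
// SimGear's SGSharedPtr / SGWeakPtr.
struct Subsystem;

struct Listener {
    std::shared_ptr<Subsystem> strongOwner;  // back-reference that forms a cycle
    std::weak_ptr<Subsystem>   weakOwner;    // back-reference that does not
};

struct Subsystem {
    std::shared_ptr<Listener> listener;      // owning forward reference
};

// Builds a Subsystem <-> Listener pair, drops the only outside handle (as a
// reset would) and reports whether the Subsystem was actually destroyed.
bool subsystemFreedAfterReset(bool weakBackReference)
{
    std::weak_ptr<Subsystem> probe;
    {
        auto sub = std::make_shared<Subsystem>();
        sub->listener = std::make_shared<Listener>();
        if (weakBackReference)
            sub->listener->weakOwner = sub;
        else
            sub->listener->strongOwner = sub;  // sub -> listener -> sub
        probe = sub;
    }  // 'sub' goes out of scope here
    return probe.expired();
}
```

With a strong back-reference the function returns false (both objects leak); with a weak one it returns true.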
"I've done some initial measurement to identify potential memory leaks in SimGear.

Detection was limited to execution of code covered by unit tests."
— xDraconian (2015-03-24). [Flightgear-devel] SimGear Memory Leaks.

GPU/VRAM

NVIDIA

Background

Note: For better diagnostics and better end-user bug reports, we could consider exposing a cross-platform process and system utilities module via Nasal/CppBind, such as psutil (Windows, macOS & BSD/Unix). Status: not done.

ticket #1447

Although many FlightGear users now run FlightGear on 64-bit operating systems with sufficient RAM (8+ GB), since early 2010 we have been seeing an increasing number of end-user reports about errors along the lines of "Open GL - out of memory error" and "Warning: detected OpenGL error 'out of memory' at after RenderBin::draw(..)", as can be seen in countless forum discussions, mailing list postings and bug reports in the issue tracker.

Typical suggestions include:

The error is related to your fgfs running out of memory (but not your GPU memory)!

This will happen with detailed aircraft or detailed scenery.
Custom scenery like the one from the papillon81 repo will cause this to happen really quickly.
Using a detailed plane like the 787 from omega makes it worse.
(For example, it will crash FG on startup at LOWI, or cause scenery and some textures not to be loaded, leading to a crash a couple of minutes later.)
This happens without any eye candy activated!

We have even seen people reporting "memory issues" despite having 12+ GB of RAM [1].

"Fast forward 2 years later, there are now several options available to query the OS/GPU driver and obtain GPU specific information, such as amount of dedicated VRAM and overall GPU/VRAM utilization - here's what I've found for nvidia, and I am willing to provide a corresponding patch to expose these stats to the property tree and update them, i.e. once or twice per second:

http://www.geeks3d.com/20100531/program ... in-opengl/"
— Hooray (Sat Aug 16). Re: Open GL - out of memory error..
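One such option on NVIDIA drivers is the GL_NVX_gpu_memory_info OpenGL extension, which reports video memory figures in kilobytes through glGetIntegerv. The sketch below is an illustration, not part of any patch: the getter is passed in as a parameter (in FlightGear it would presumably be glGetIntegerv itself) so the code compiles without GL headers. It must only be called with a current GL context, after checking that "GL_NVX_gpu_memory_info" appears in the extension list. The "used" figure is an approximation derived from two of the values, not a number the driver reports.

```cpp
// Token values defined by the GL_NVX_gpu_memory_info extension; the queried
// values are reported in kilobytes.
const unsigned int GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX         = 0x9047;
const unsigned int GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX   = 0x9048;
const unsigned int GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX = 0x9049;

struct VramStats {
    int dedicatedKb = 0;         // dedicated video memory
    int totalAvailableKb = 0;    // total memory available for allocations
    int currentAvailableKb = 0;  // currently free video memory
    // Rough estimate of memory in use (approximation, not driver-reported).
    int approxUsedKb() const { return totalAvailableKb - currentAvailableKb; }
};

// 'getIntegerv' is anything callable like glGetIntegerv(GLenum, GLint*).
template <typename GetIntegervFn>
VramStats queryNvidiaVram(GetIntegervFn getIntegerv)
{
    VramStats s;
    getIntegerv(GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX, &s.dedicatedKb);
    getIntegerv(GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX, &s.totalAvailableKb);
    getIntegerv(GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &s.currentAvailableKb);
    return s;
}
```

Making the getter a template parameter means a real glGetIntegerv (including its platform calling convention) and a test double can both be passed in.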

Problem

It would be great to write subsystem-specific RAM usage info to the property tree, so that it can be shown (e.g. in the performance monitor), but also used by Nasal scripts to scale down features dynamically. The degree of swap usage vs. amount of free RAM is probably one of the most important metrics here.

We also know very little about aircraft/model complexity. It would be a good idea to track the RAM usage of each loaded model and write it to the property tree, so that we have some metrics and can actually see how much of an impact MP aircraft have.

To be brutally honest, FG is simply dumb here: it doesn't even know how much physical RAM you have, whether the OS has started swapping to disk (which is extremely slow), or how much video RAM (VRAM) you have. These things can be fixed, and they need to be fixed first, before adding more and more memory-hungry features.
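As a sketch of the first gap (knowing how much physical RAM and swap exist), Linux provides the sysinfo(2) system call. The SystemMemory struct and querySystemMemory() are hypothetical names; Windows and macOS would need their own implementations. Note that freeram excludes the page cache, so it understates how much memory the kernel could reclaim.

```cpp
#include <cstdint>
#include <sys/sysinfo.h>

struct SystemMemory {
    std::uint64_t totalRamKb = 0;
    std::uint64_t freeRamKb = 0;    // excludes page cache and buffers
    std::uint64_t totalSwapKb = 0;
    std::uint64_t freeSwapKb = 0;
};

// Linux-only: fills 'out' from sysinfo(2). Returns false if the call fails.
bool querySystemMemory(SystemMemory& out)
{
    struct sysinfo info;
    if (sysinfo(&info) != 0)
        return false;
    // All sizes are reported in units of info.mem_unit bytes.
    const std::uint64_t unit = info.mem_unit ? info.mem_unit : 1;
    out.totalRamKb  = static_cast<std::uint64_t>(info.totalram)  * unit / 1024;
    out.freeRamKb   = static_cast<std::uint64_t>(info.freeram)   * unit / 1024;
    out.totalSwapKb = static_cast<std::uint64_t>(info.totalswap) * unit / 1024;
    out.freeSwapKb  = static_cast<std::uint64_t>(info.freeswap)  * unit / 1024;
    return true;
}
```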

Seriously, it's like spending a ton of money every day without knowing how much you have available, and what we're currently doing isn't much different.

Honestly, it makes ZERO sense to continue developing FG like this: we're basically designing a car without a reliable way to tell how fast it is going or how much fuel it is consuming ...

We are adding more and more features, but we have zero clue as to how expensive they really are, because we're lacking the instruments to track resource usage properly.

Exposing RAM/swap usage would tell us how much RAM FG is consuming, and whether it has started swapping. Next, we would need to track RAM usage per subsystem (or feature). We could then also configure a "cap", so that a subsystem stops allocating RAM unless it also frees some. That touches on the realms of "feature scaling" and resource management. Currently, we only care about frame rate and frame latency (spacing), i.e. how long it takes to create a single frame. We are unaware of other resource usage such as RAM, which needs fixing, simple as that.

FG is being developed by power users (developers) who tend to have extremely powerful computers (I am guilty of this as well), and we do not typically test our new features on less powerful hardware (or even just old computers or netbooks).
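The "cap" idea can be sketched as a simple per-subsystem budget. This is purely hypothetical (no such class exists in FlightGear or SimGear): subsystems ask for a reservation before allocating, report what they free, and the per-subsystem totals are what one might publish to the property tree.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical RAM budget shared by all subsystems.
class MemoryBudget {
public:
    explicit MemoryBudget(std::size_t capBytes) : _cap(capBytes) {}

    // Reserves 'bytes' for 'subsystem'; returns false (reserving nothing)
    // if the request would exceed the cap. Invariant: _used <= _cap.
    bool tryReserve(const std::string& subsystem, std::size_t bytes) {
        if (bytes > _cap - _used)
            return false;
        _used += bytes;
        _perSubsystem[subsystem] += bytes;
        return true;
    }

    // Returns memory to the budget; clamped so counters never underflow.
    void release(const std::string& subsystem, std::size_t bytes) {
        std::size_t& mine = _perSubsystem[subsystem];
        if (bytes > mine)
            bytes = mine;
        mine -= bytes;
        _used -= bytes;
    }

    std::size_t used() const { return _used; }

    std::size_t usedBy(const std::string& subsystem) const {
        auto it = _perSubsystem.find(subsystem);
        return it == _perSubsystem.end() ? 0 : it->second;
    }

private:
    std::size_t _cap;
    std::size_t _used = 0;
    std::map<std::string, std::size_t> _perSubsystem;
};
```

A refused reservation is the point where a subsystem such as random buildings would skip creating new objects instead of pushing the process into swap.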

This is a huge problem for people who do not have access to the latest hardware.

The problem mentioned on the forum (v2 scenery) is a good example of this: nobody noticed that a bug in TerraGear caused the generated scenery to use massive amounts of memory. It was tracked down by the Atlas developer, by coincidence, because he was loading the scenery into Atlas and saw the number of UNNECESSARY triangles created by TG.

Now, this is just a symptom, and it will be fixed during the next scenery build, but it shows that we DO have a real problem here, because nobody really looks at memory usage until there is a problem. In this case, we would probably still not know about it if it weren't for Atlas and its developer.

So in this case, it was the scenery generator (TG) that was buggy, and we never noticed it, because we are lacking the "instruments" to track memory usage per subsystem, or we would have seen much earlier that v2 scenery is eating up ~twice as much RAM, without adding any useful triangles ...

While fixing the scenery issue would be straightforward, that alone would not help us with similar problems elsewhere: imagine some other system in FG leaking massive amounts of memory. As long as we aren't doing any tracking, we won't notice such things, and we cannot really evaluate how well a system performs. Currently, we only look at "speed" in terms of performance, but we have no idea how much system memory or video memory is being used by certain features. Other games and simulators solved this problem long ago, and it makes absolute sense. Right now, we just make sure that the program (fgfs) runs at 30 fps or more, but we have no idea how much "fuel" (RAM) is burnt, and where, because we're lacking a fuel gauge ...

There really needs to be a better memory management strategy in FlightGear in general. Smart pointers have been part of SimGear since long before SGReferenced and osg::ref_ptr [1], yet lots of new code is added each week without using any smart pointers at all, including my patch, by the way ;-)

Stuart's random buildings code is an exception in that it uses all the osg::ref_ptr machinery already...

That said, OSG has quite a lot of "clever" machinery to help with memory management, at least for all the rendering-related stuff. For "problems" like excessive RAM usage with random buildings and random trees, there are well-defined "standard" solutions available in OSG, but for many other problems there aren't any. So rendering-related issues can obviously be addressed by using what OSG provides.

Thus, the issue is really two-fold: some subsystems make excessive use of RAM, and others leak lots of memory.

This would probably justify introducing a dedicated "memory management subsystem" eventually, i.e. something like a new/delete replacement that uses the Boehm-Demers-Weiser GC to manage resources dynamically, or a custom memory pool implementation using "placement new" for subsystems and other code that allocates memory.
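A custom memory pool based on placement new might look like the following minimal sketch. It is hypothetical, single-threaded and fixed-capacity: storage for all objects is allocated once up front, create() constructs objects in place and returns nullptr when the pool is exhausted (a natural point to scale features down), and destroy() calls the destructor explicitly before returning the slot to the free list.

```cpp
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

template <typename T>
class ObjectPool {
public:
    explicit ObjectPool(std::size_t capacity) : _storage(capacity)
    {
        _free.reserve(capacity);
        for (std::size_t i = 0; i < capacity; ++i)
            _free.push_back(&_storage[i]);
    }

    // Slots are handed out by address, so the pool must not be copied.
    // Objects still alive when the pool is destroyed are NOT destructed.
    ObjectPool(const ObjectPool&) = delete;
    ObjectPool& operator=(const ObjectPool&) = delete;

    // Constructs a T in a free slot via placement new; nullptr if full.
    template <typename... Args>
    T* create(Args&&... args)
    {
        if (_free.empty())
            return nullptr;
        T* obj = new (_free.back()->bytes) T(std::forward<Args>(args)...);
        _free.pop_back();  // only after construction succeeded
        return obj;
    }

    // Destroys an object obtained from create() and recycles its slot.
    void destroy(T* obj)
    {
        obj->~T();  // explicit destructor call, memory stays in the pool
        _free.push_back(reinterpret_cast<Slot*>(obj));
    }

    std::size_t available() const { return _free.size(); }

private:
    struct Slot { alignas(T) unsigned char bytes[sizeof(T)]; };
    std::vector<Slot> _storage;  // never resized, so slot addresses are stable
    std::vector<Slot*> _free;
};
```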


Manual memory management in C++ is always extremely tedious and error-prone, especially in non-trivial projects with a long history and lifespan.

Stuart: I think it also opens up a larger question of how we do memory management in FG, and whether we should be doing things such as more aggressively freeing up terrain tiles. At one level, removing entire terrain tiles from memory earlier if memory occupancy becomes a concern would be a better management strategy than just stopping generating new buildings.

SIGAR

The Sigar API provides a portable interface for gathering system information such as:

  • System memory, swap, cpu, load average, uptime, logins
  • Per-process memory, cpu, credential info, state, arguments, environment, open files
  • File system detection and metrics
  • Network interface detection, configuration info and metrics
  • TCP and UDP connection tables
  • Network route table

This information is available in most operating systems, but each OS has its own way(s) of providing it. SIGAR provides developers with one API to access this information regardless of the underlying platform. The core API is implemented in pure C.

Extending OSG StatsHandler

Objective

TheTom: "I don't think we should track every bit of allocated memory. This could become very slow and there already exist special malloc/new implementations to track every allocation. Probably a good start would be have a closer look at the obvious locations like scenery, models and the property tree. I think the memory used by Tiles, Objects, and PropertyNodes should give already a good first picture (I expect the property nodes only to use a small fraction of the memory used by Tiles/Objects). Maybe it is also able for Vertex/Texture data to get the memory used on the GPU."

  • expose total amount of available physical RAM via the property tree (30% completed; so far only a rough prototype for Linux)
  • expose total amount of RAM used by the fgfs process via the property tree (30% completed; so far only a rough prototype for Linux)
  • expose "swappiness", i.e. the percentage of used RAM that is being swapped (not to be confused with the Linux vm.swappiness kernel tunable) (10% completed)
  • investigate exposing subsystem-specific memory stats for all major subsystems (Nasal, Canvas, autopilot, GUI etc.) (not done)
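Along the lines of TheTom's suggestion (counting Tiles, Objects and PropertyNodes rather than tracking every allocation), a per-class live-instance counter is cheap to add. The CRTP mix-in below is a hypothetical sketch: it counts live instances of a class and multiplies by sizeof, so the byte figure is only a lower bound that ignores any heap memory (vertex arrays, textures, strings) the objects own. ExampleTile is a made-up class for illustration.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical CRTP mix-in counting live instances of Derived.
template <typename Derived>
class InstanceCounter {
public:
    static std::size_t liveCount() { return _live.load(); }
    // Shallow footprint only: sizeof(Derived) per instance, no owned heap data.
    static std::size_t shallowBytes() { return _live.load() * sizeof(Derived); }

protected:
    InstanceCounter() { ++_live; }
    InstanceCounter(const InstanceCounter&) { ++_live; }  // copies are new instances
    InstanceCounter& operator=(const InstanceCounter&) = default;  // count unchanged
    ~InstanceCounter() { --_live; }

private:
    static std::atomic<std::size_t> _live;
};

template <typename Derived>
std::atomic<std::size_t> InstanceCounter<Derived>::_live(0);

// Hypothetical example class (not FlightGear's real tile type).
struct ExampleTile : InstanceCounter<ExampleTile> {
    double elevation[32] = {};
};
```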

Roadmap

  • expose amount of total RAM available to property tree (30% completed; Linux-only for now)
  • expose amount of currently used RAM to property tree (30% completed; Linux-only for now)
  • expose amount of swap space used (30% completed; Linux-only for now)
  • expose amount of total/used VRAM (30% completed; NVIDIA-only for now)
  • expose Nasal GC internals to the property tree using ThorstenB's patch (20% completed)

Ideas & Experiments

Having access to internal RAM usage statistics would definitely be useful, and would probably help us improve subsystems further. Enabling/disabling a potential allocation block could then also be implemented through a boolean property that the system sets automatically once certain heuristics apply. Those heuristics would not even need to be implemented in C++; they could just as well be a property rule or a Nasal script. The C++ code just needs an option to be told to stop allocating new objects, while ensuring that it publishes internal stats and writes them to the property tree.

Exposing internal RAM usage stats to the property tree would still seem useful, especially for people wanting to file bug reports, or even just for dynamic "feature scaling" using Nasal and/or XML property rules. Thus, I feel that even just replacing the SG_LOG() statements with SGPropertyNode->setStringValue(...) calls would be a good move, as it would help people look "under the hood" of the system, so to speak.

And as I said earlier, being able to control the behavior of the system (even if that just means disabling/enabling allocations) would definitely help us determine whether segfaults/crashes reported by users are related to the new system at all. If we had this option now, it would be much easier to support the users here who are seeing "OpenGL out of memory" errors.

It would be worthwhile to keep this in mind and possibly come up with a subsystem that monitors FlightGear's RAM usage at configurable intervals (say, every 1-5 seconds) and dumps all the info to the property tree, where it can be further processed by Nasal scripts or GUI listeners. If it's really swapping that slows down or kills FG for so many people, we obviously need to prevent it, and having access to real-time memory usage stats would definitely be useful then: not just for the random buildings system, but also for all other subsystems that may "over-allocate" and run into swap space.

Thus, even just having a boolean "/memory/swap/is-swapping" would be useful, because other subsystems could monitor it using a listener and then adjust their own allocation behavior dynamically.
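As a sketch of how such an "/memory/swap/is-swapping" flag could be computed on Linux: /proc/vmstat exposes the cumulative pswpin/pswpout counters (pages swapped in/out since boot), and any increase between two samples means the system is actively swapping. These counters are system-wide, not specific to fgfs. SwapCounters and SwapActivityMonitor are hypothetical; writing the result to the property tree, and the listeners reacting to it, are left out.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

struct SwapCounters {
    std::uint64_t pagesIn = 0;   // "pswpin"
    std::uint64_t pagesOut = 0;  // "pswpout"
};

// Parses the "key value" lines of a /proc/vmstat-formatted stream.
SwapCounters parseVmstat(std::istream& in)
{
    SwapCounters c;
    std::string key;
    std::uint64_t value = 0;
    while (in >> key >> value) {
        if (key == "pswpin") c.pagesIn = value;
        else if (key == "pswpout") c.pagesOut = value;
    }
    return c;
}

// Hypothetical helper: sample() returns true if any paging to or from swap
// happened since the previous sample (always false on the first sample).
class SwapActivityMonitor {
public:
    bool sample(const SwapCounters& now)
    {
        const bool swapping = _hasPrevious &&
            (now.pagesIn > _prev.pagesIn || now.pagesOut > _prev.pagesOut);
        _prev = now;
        _hasPrevious = true;
        return swapping;
    }

    // Linux-only convenience wrapper reading the live counters.
    bool sampleProc()
    {
        std::ifstream f("/proc/vmstat");
        return sample(parseVmstat(f));
    }

private:
    SwapCounters _prev;
    bool _hasPrevious = false;
};
```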

Hooray once talked with Stuart and TheTom about also adding details about the amount of physical RAM vs. swap usage. Having these things available in FG would make troubleshooting on the forums quite a bit easier, and would also allow us to do feature scaling (dynamically adjusting features based on resource usage). Hooray provided a patch for this (Linux-only implementation) back then, and we could probably extend it to all major OSes (Windows, Mac, Linux): http://forum.flightgear.org/viewtopic.php?f=5&t=16083&p=165168#p164936

As can be seen in the original discussion, we specifically talked about subsystem-specific RAM usage tracking (random buildings, Canvas, autopilot, replay/flight recorder etc.), analogous to ThorstenB's system monitor, just with a focus on RAM (memory).

This could possibly be done via a custom smart pointer implementation that overloads new/delete. The main issue is that we really need a way to track memory usage per subsystem, so that we can identify "rogue" features, either to fix them or to provide hooks for feature-scaling purposes.
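One way to get per-subsystem byte counts without replacing the global allocator is class-specific operator new/delete. The mix-in below is a hypothetical sketch: classes derived from TrackedAllocation<Tag> have each heap allocation counted under their subsystem's tag type. It does not cover operator new[], stack or member objects, or memory the objects allocate internally (e.g. std::vector contents), and deleting a derived object through a base pointer needs a virtual destructor for the reported size to be correct.

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Hypothetical: one empty Tag type per subsystem.
template <typename Tag>
struct TrackedAllocation {
    static std::atomic<std::size_t> bytesInUse;

    // Class-specific allocation: forwards to the global operator new
    // (which throws std::bad_alloc on failure) and records the size.
    static void* operator new(std::size_t size)
    {
        void* p = ::operator new(size);
        bytesInUse += size;
        return p;
    }

    // Sized class-specific deallocation: 'size' is the size of the object
    // being deleted, so the counter is decremented exactly.
    static void operator delete(void* p, std::size_t size) noexcept
    {
        if (p == nullptr)
            return;
        bytesInUse -= size;
        ::operator delete(p);
    }
};

template <typename Tag>
std::atomic<std::size_t> TrackedAllocation<Tag>::bytesInUse(0);

// Hypothetical usage: a tag for the Canvas subsystem and an element type.
struct CanvasTag {};
struct ExampleCanvasElement : TrackedAllocation<CanvasTag> {
    double transform[16] = {};
};
```

The per-tag counters could then be copied into the property tree at the same interval as the process-wide figures.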

For Nasal space, we pretty much have a working solution that could help us identify scripts that create lots of new naRefs or temporaries. But C++ code is a different beast, and as can be seen in many discussions here, quite a few people with >= 8 GB of RAM are seeing crashes on 64-bit OSes due to "out of RAM" errors.

There are also many other examples of this, such as the v2 scenery eating up tons of RAM without anybody noticing, until the Atlas developer pointed it out: http://forum.flightgear.org/viewtopic.php?f=5&t=21498&p=195610&hilit=#p195610

So after the 3.0 release, we should probably investigate how to track RAM per subsystem/feature and provide some "live" stats.

Proof of concept (patch)

Note: The following patch implements a simple SGSubsystem by inheriting from the base class and wrapping access to /proc/pid/smaps (on Linux).

This should probably be updated to use the SGTimeStamp and SGPropertyObject APIs respectively. For the time being, the patch shown here should be considered out of date: it has since been updated to also include VRAM utilization stats, and will probably evolve to also provide other metrics over time.

Here's a first stab at a simple subsystem that monitors FlightGear memory usage on Linux at 5-second intervals. Consider it a "proof of concept" prototype for now, as it would need to be cleaned up and implemented for Mac/Windows as well. On Linux, it merely fopen()s /proc/pid/smaps, sums the per-mapping Size and Swap fields, and copies the two totals to the property tree:

diff -urN a/src/Main/CMakeLists.txt b/src/Main/CMakeLists.txt
--- a/src/Main/CMakeLists.txt 2015-08-20 23:03:15.070835000 +0300
+++ b/src/Main/CMakeLists.txt 2015-08-20 23:26:58.706798388 +0300
@@ -17,12 +17,17 @@
main.cxx
options.cxx
util.cxx
+ ram_usage.cxx
positioninit.cxx
subsystemFactory.cxx
screensaver_control.cxx
${RESOURCE_FILE}
)

+IF(${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+ list(APPEND SOURCES ram_usage_linux.cxx)
+ENDIF(${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+
set(HEADERS
fg_commands.hxx
fg_init.hxx
@@ -35,12 +40,19 @@
main.hxx
options.hxx
util.hxx
+ ram_usage.hxx
positioninit.hxx
subsystemFactory.hxx
AircraftDirVisitorBase.hxx
screensaver_control.hxx
)

+IF(${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+ list(APPEND HEADERS ram_usage_linux.hxx)
+ENDIF(${CMAKE_SYSTEM_NAME} MATCHES "Linux")
+
+
+
get_property(FG_SOURCES GLOBAL PROPERTY FG_SOURCES)
get_property(FG_HEADERS GLOBAL PROPERTY FG_HEADERS)

diff -urN a/src/Main/fg_init.cxx b/src/Main/fg_init.cxx
--- a/src/Main/fg_init.cxx 2015-08-20 23:22:52.166804000 +0300
+++ b/src/Main/fg_init.cxx 2015-08-20 23:29:15.074794813 +0300
@@ -141,6 +141,7 @@
#include "globals.hxx"
#include "logger.hxx"
#include "main.hxx"
+#include "ram_usage.hxx"
#include "positioninit.hxx"
#include "util.hxx"
#include "AircraftDirVisitorBase.hxx"
@@ -715,6 +716,10 @@
////////////////////////////////////////////////////////////////////
globals->add_subsystem("properties", new FGProperties);

+ ////////////////////////////////////////////////////////////////////
+ // Add the ram usage statistics system
+ ////////////////////////////////////////////////////////////////////
+ globals->add_subsystem("memory-stats", new MemoryUsageStats, SGSubsystemMgr::INIT, 5.00);

////////////////////////////////////////////////////////////////////
// Add the performance monitoring system.
diff -urN a/src/Main/ram_usage.cxx b/src/Main/ram_usage.cxx
--- a/src/Main/ram_usage.cxx 1970-01-01 03:00:00.000000000 +0300
+++ b/src/Main/ram_usage.cxx 2015-08-20 23:29:53.950793794 +0300
@@ -0,0 +1,22 @@
+#include "ram_usage_linux.hxx"
+
+MemoryUsageStats::MemoryUsageStats() {
+ _mem = new LinuxMemoryInterface(); //FIXME: should be implemented for Win/Mac & Linux
+}
+
+MemoryUsageStats::~MemoryUsageStats() {
+ delete _mem;
+}
+
+void
+MemoryUsageStats::update(double dt) {
+ _mem->update();
+ double swap = _mem->getSwapSize();
+ double total = _mem->getTotalSize();
+ SG_LOG(SG_GENERAL, SG_DEBUG, "Updating Memory Stats:" << total << " kb");
+ fgSetInt("/memory-usage/swap-usage-kb", swap );
+ fgSetInt("/memory-usage/total-usage-kb", total );
+}
+
+
+
diff -urN a/src/Main/ram_usage.hxx b/src/Main/ram_usage.hxx
--- a/src/Main/ram_usage.hxx 1970-01-01 03:00:00.000000000 +0300
+++ b/src/Main/ram_usage.hxx 2015-08-20 23:29:53.950793794 +0300
@@ -0,0 +1,53 @@
+#ifndef __RAM_USAGE
+#define __RAM_USAGE
+
+#include <simgear/timing/timestamp.hxx>
+#include <simgear/structure/subsystem_mgr.hxx>
+
+#include <Main/globals.hxx>
+#include <Main/fg_props.hxx>
+
+#include <string>
+#include <map>
+#include <sstream>
+
+using std::map;
+
+// Linux: /proc/pid/smaps
+// Windows: http://msdn.microsoft.com/en-us/library/windows/desktop/ms682050(v=vs.85).aspx
+
+class MemoryInterface {
+public:
+ MemoryInterface() : _total_size(0), _swap_size(0) {}
+ virtual ~MemoryInterface() {} // virtual: MemoryUsageStats deletes through a base pointer
+ typedef map<const char*, double> RamMap;
+//protected:
+ virtual void update() = 0;
+
+ double getTotalSize() const {return _total_size;}
+ //virtual void setTotalSize(double t) {_total_size=t;}
+
+ double getSwapSize() const {return _swap_size;}
+ //virtual void setSwapSize(double s) {_swap_size=s;}
+protected:
+ RamMap _size;
+ std::string _path;
+ std::stringstream _pid;
+
+ double _total_size;
+ double _swap_size;
+};
+
+class MemoryUsageStats : public SGSubsystem
+{
+public:
+ MemoryUsageStats();
+ ~MemoryUsageStats();
+ virtual void update(double);
+protected:
+private:
+ MemoryInterface* _mem;
+};
+
+#endif
+
diff -urN a/src/Main/ram_usage_linux.cxx b/src/Main/ram_usage_linux.cxx
--- a/src/Main/ram_usage_linux.cxx 1970-01-01 03:00:00.000000000 +0300
+++ b/src/Main/ram_usage_linux.cxx 2015-08-20 23:29:53.950793794 +0300
@@ -0,0 +1,49 @@
+// https://gist.github.com/896026/c346c7c8e4a9ab18577b4e6abfca37e358de83c1
+
+#include "ram_usage_linux.hxx"
+
+#include <cstring>
+#include <string>
+
+#include "Main/globals.hxx"
+
+using std::string;
+
+LinuxMemoryInterface::LinuxMemoryInterface() : file(NULL) {
+ _pid << getpid();
+ _path = "/proc/"+ _pid.str() +"/smaps";
+}
+
+void
+LinuxMemoryInterface::OpenProcFile() {
+ file = fopen(_path.c_str(),"r" );
+ if (!file) {
+ throw("MemoryTracker:Cannot open /proc/pid/smaps");
+ }
+ SG_LOG(SG_GENERAL, SG_DEBUG, "Opened:"<< _path.c_str() );
+}
+
+LinuxMemoryInterface::~LinuxMemoryInterface() {
+ if (file) fclose(file);
+}
+
+void LinuxMemoryInterface::update() {
+ OpenProcFile();
+ if (!file) throw("MemoryTracker: ProcFile not open");
+
+ _total_size = 0;
+ _swap_size = 0;
+
+ char line[1024];
+ while (fgets(line, sizeof line, file))
+ {
+ char substr[32];
+ int n;
+ if (sscanf(line, "%31[^:]: %d", substr, &n) == 2) {
+ if (strcmp(substr, "Size") == 0) { _total_size += n; } // "Size" is the mapped (virtual) size; "Rss" would be resident memory
+ else if (strcmp(substr, "Swap") == 0) { _swap_size += n; }
+ }
+ }
+ fclose(file); file = NULL; // prevents a second fclose() in the destructor
+}
+
diff -urN a/src/Main/ram_usage_linux.hxx b/src/Main/ram_usage_linux.hxx
--- a/src/Main/ram_usage_linux.hxx 1970-01-01 03:00:00.000000000 +0300
+++ b/src/Main/ram_usage_linux.hxx 2015-08-20 23:29:53.950793794 +0300
@@ -0,0 +1,22 @@
+#ifndef __RAM_USAGE_LINUX
+#define __RAM_USAGE_LINUX
+
+ #include <sys/types.h>
+ #include <unistd.h>
+ #include <stdio.h>
+
+ #include "ram_usage.hxx"
+
+class LinuxMemoryInterface : public MemoryInterface {
+public:
+ LinuxMemoryInterface();
+~LinuxMemoryInterface();
+ virtual void update();
+private:
+ void OpenProcFile();
+ const char* filename;
+ FILE *file;
+};
+
+
+#endif
+
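For comparison, the parsing step of the Linux backend can be written as a small self-contained function that avoids manual FILE* handling. This is a hypothetical sketch, not part of the patch: it additionally sums the Rss field, because the Size field that the patch reports as total-usage-kb is the mapped (virtual) size of each region, while Rss is the part actually resident in physical RAM. It reads /proc/self/smaps, which always refers to the calling process, so no PID lookup is needed.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

struct SmapsTotals {
    long sizeKb = 0;  // mapped (virtual) size
    long rssKb = 0;   // resident in physical RAM
    long swapKb = 0;  // swapped out
};

// Sums the per-mapping "Size:", "Rss:" and "Swap:" fields (all in kB) of a
// /proc/<pid>/smaps-formatted stream.
SmapsTotals sumSmaps(std::istream& in)
{
    SmapsTotals t;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        long value = 0;
        // Mapping header lines ("addr-addr perms ...") fail to parse here.
        if (!(fields >> key >> value))
            continue;
        if (key == "Size:") t.sizeKb += value;
        else if (key == "Rss:") t.rssKb += value;
        else if (key == "Swap:") t.swapKb += value;
    }
    return t;
}

// Linux-only: totals for the calling process.
SmapsTotals sumOwnSmaps()
{
    std::ifstream f("/proc/self/smaps");
    return sumSmaps(f);
}
```

Because the stream is closed by std::ifstream's destructor, the double-fclose() problem of the FILE*-based version cannot occur.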