Howto:Disable Nasal GC

From FlightGear wiki
Caution: This wiki page discusses a patch to disable the Nasal GC entirely, which means that FlightGear will simply keep allocating memory without ever freeing it. This is primarily useful for troubleshooting, i.e. to see what impact the mark/sweep collector really has. Unless you are a core developer, or at least familiar with C/C++, with patching/rebuilding fgfs from source, and with Nasal internals, you will probably not want to use any of this, and certainly not for flying!

Performance Issues

Screen shot showing a diagram created with gnuplot to visualize the relationship between frame rate, frame spacing and GC-induced pauses (exemplified by opening the FG1000 at ~03:00). This causes a lot of GC pressure and ends up running the GC more frequently than before.

To be fair, it doesn't have to be Nasal or its GC; it's just the most likely culprit. Depending on the aircraft/features you're using, setups that use little or no Nasal are unlikely to put that much pressure on the GC.

Scenery/Aircraft complexity can be excluded from the equation by using a custom/minimal startup profile to disable some of the really fancy stuff: https://wiki.flightgear.org/Minimal_Startup_Profile

To see if Nasal is the culprit, there's a rather brute force way to exclude it from the equation - disable it entirely: https://wiki.flightgear.org/Howto:Disable_Nasal_entirely

If you are still seeing stuttering, it obviously cannot be related to Nasal or its GC.

That way, we have found some enormous resource leaks over the years. If stuttering increases over time, it is most likely due to swapping, the GC, or timers/resources getting registered over and over again, so that they end up running hundreds or thousands of times unnecessarily.

The latter is rather simple to check by dumping the size() of the corresponding std::vectors of the timer/listener queues: if those grow massively over the course of an hour, you have found the culprit (you can use a pre-recorded flight/fgtape to automate that to some degree): https://wiki.flightgear.org/Performance_testing_by_replaying_recordings
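Once the queue sizes have been dumped periodically, deciding whether they "grow massively" can be automated. The following is a minimal sketch, assuming you already have an array of size samples taken at regular intervals during a replay; the function name `queue_looks_leaky` and the growth threshold are hypothetical and not part of FlightGear or SimGear.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FlightGear): given periodic samples of a
 * queue size, e.g. the number of registered timers or listeners sampled once
 * a minute while replaying a recorded flight, decide whether the queue keeps
 * growing. A healthy sim should plateau; a leak shows up as steady growth. */
static int queue_looks_leaky(const size_t *samples, size_t n, double max_growth)
{
    assert(samples != NULL || n == 0);
    if (n < 2 || samples[0] == 0)
        return 0; /* not enough data (or no baseline) to judge */

    size_t increases = 0;
    for (size_t i = 1; i < n; i++)
        if (samples[i] > samples[i - 1])
            increases++;

    double growth = (double)samples[n - 1] / (double)samples[0];
    /* Flag only if the queue grew beyond the allowed factor AND grew in most
     * sampling intervals, so a single temporary spike is not reported. */
    return growth > max_growth && increases * 2 > n - 1;
}
```

For example, samples of {120, 400, 900, 1500, 3000} with a `max_growth` of 2.0 are flagged, whereas {120, 130, 128, 131, 129} are not.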

The performance monitor dialog itself is a real resource hog, but it, too, uses your property tree code as its interface. In other words, you can activate the performance monitor and it will simply dump all stats to the global property tree, which you can then access/browse via telnet or http without clogging up the main loop even more (there is a reason that PUI is in the process of being replaced by a Canvas-based solution): https://wiki.flightgear.org/Howto:Using_the_Performance_monitor_without_PUI

Question

Anecdotally, the FG1000 seems to have more of a framerate impact than it did previously.

Has anyone experimented with not running the GC at all? Presumably the GC is just cleaning up the heap, and I wonder if memory occupancy of Nasal is that big of a problem.

Findings

We did that a couple of years ago, but it's a little more involved than it sounds. Imagine it like this: unmark all of the memory, then recursively traverse a tree structure (marking) to identify reachable nodes. This has to be done for each data type supported by Nasal (vectors, hashes, functions, ghosts etc).

Also, Nasal's GC doesn't merely allocate/free memory, it also organizes said memory into pools that are dynamically resized as needed.

So, in reality we're dealing with ~7 of these trees (memory pools) whose nodes may be connected. Thus, we cannot "simply" disable the GC without also crippling those memory pools; what's needed to make a difference is to keep the pools functioning and do away with the expensive mark/sweep phase.

However, there are other issues: as part of the whole GC journey, I also added a ton of debugging info to expose GC internals to the FG property tree, and it seems we are once again leaking GC-related resources and running code unnecessarily (remember the effects cache/listener leak a couple of years ago?). That's the sort of thing I can currently see in FGNasalSys, too.

Which is a bit unfortunate, because I started reworking the existing GC scheme into a generational GC, which wasn't all that difficult after all. But since the modified GC is based on the broken one, the only result is less stuttering: it continues to leak.

Like you said, for people like you it would make sense to provide two new startup options for troubleshooting purposes: 1) disable the GC entirely, i.e. just keep allocating like crazy, and 2) disable Nasal itself entirely. At that point, we can safely exclude the GC/Nasal from any issues.

From a troubleshooting standpoint, it could go a long way to simply tell people to use a property to disable the GC/Nasal itself and see whether the issue persists. It would also benefit us when investigating other leaks, like the infamous effects/listener cache, because Nasal could then easily be removed from the equation.

Besides, bugman was considering providing an option to disable Nasal due to his FGPythonSys and cppunit work, so he might also have a few useful comments/ideas.


Summary

Conversely, to be perfectly honest, also providing a startup option to disable the whole scenery engine (including the tile manager) would be enormously useful to acquit it. Seriously though, if you could provide such an option, it could also get more eyeballs involved in the Nasal/GC department; or, if the shit hits the fan, being able to exclude Nasal would also get more people interested in looking at resource utilization/performance challenges in other parts of FG, like the scenery department.

Other than that, the integration of the Canvas GUI system itself is leaking Nasal: Any Canvas GUI dialog will make heap utilization grow, and it will never shrink, not even after closing all Canvas UI dialogs.

I have not yet checked if the issue can be reproduced with other Canvas placements (cockpit/scenery), so it could also be an issue with the GUI placement itself, keeping smart pointers around or something along those lines.

Anyway, making use of any reset/re-init related feature in-sim will exacerbate the problem massively. I used patches shared by ThorstenB, AndersG and James to track active GC refs and objs, and the irony is that any time we reset something, Nasal's memory utilization just keeps growing even more. That even applies to James' recent work to make Canvas modules re-loadable at runtime.

Overall, I don't think the FG1000 is the problem here: Nasal itself has issues, and Erik's standalone interpreter could help us better understand whether they are FG-specific or can also be reproduced out-of-sim.

PS: In hindsight, it probably wasn't a good use of our time to tinker with the existing GC. Adding support for a more standard GC next to the existing one, i.e. something like Boehm GC, which does support generational/incremental collection, would probably have been less work and required less time.

Motivation

To provide a little more background: compared to 2003 (when Nasal was first added), Nasal is now heavily used in a ton of places. Among others, it's used for the 2D rendering API (Canvas), which is property-based: Nasal writes tons of properties, which in turn fire your SGPropertyChangeListener API, whose listeners are bound to C++ APIs in OSG/OpenGL space. We built a whole API that way, on top of the property tree. That is not the most efficient way to do things, but it makes the API accessible to all sorts of subsystems, not just the Nasal scripting system; conceptually, you could even set up 2D rendering textures over http or props (telnet).

The real issue, however, is that Nasal's GC is a rather simple mark/sweep collector. It took us many years to figure out that, depending on startup/runtime settings, Nasal GC overhead may indeed add up considerably, and Andy Ross himself commented various times on potential speed-ups. But the meat of the problem really is 1) the GC and 2) context switches between Nasal and SG (C++), because of the way we're abusing your listener interface to model a complete 2D rendering API on top of the property tree.

Don't get me wrong, it has served us rather well (and computers keep getting faster), but the combination of the "simple" GC scheme and the inevitable amount of context switches may indeed add up considerably.

To see this for yourself, you only need to open the DEBUG menu and start up the FG1000 PFD/MFD that Stuart implemented in response to the RNAV thread you started a couple of years ago (i.e. about the lack of modern glass avionics in FG; the FG1000 was implemented afterwards). Depending on your system, it may take seconds to boot up, and it may affect frame rate/spacing considerably.

(It can be rather educational to open such a Canvas-based feature, then open the property browser and browse the /canvas sub-tree to see how things hang together there.)

There is also an SGSubsystemMgr-based performance monitor available via the Debug menu. The thing to keep in mind here is that Nasal doesn't show up as "nasal" there, but rather as "events", because the majority of Nasal code is triggered via timers and listeners (which is a shortcoming of the UI, or rather of the way stats are associated at the SGSubsystem level).

Ideally, we would track GC overhead separately, as well as the overhead of timers/listeners, and then add those stats to the OSG stats.
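Tracking GC overhead separately mostly means timing each collection and keeping a few running totals. The sketch below is a minimal illustration under stated assumptions: the `gc_stats` struct and `timed_collect()` wrapper are hypothetical names, and `fake_collect()` merely stands in for a real collector entry point such as the `garbageCollect()` function in gc.c. How the numbers would be published (property tree, OSG stats) is left open.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stats record; the names are made up for illustration. */
struct gc_stats {
    unsigned long runs;   /* number of collections */
    double total_ms;      /* accumulated time spent collecting */
    double worst_ms;      /* longest single pause seen so far */
};

/* Wrap any collector entry point and record how long it took. clock()
 * measures process CPU time, a reasonable proxy for a stop-the-world
 * collection running on the main thread. */
static void timed_collect(struct gc_stats *st, void (*collect)(void))
{
    assert(st != NULL && collect != NULL);
    clock_t start = clock();
    collect();
    double ms = (double)(clock() - start) * 1000.0 / CLOCKS_PER_SEC;

    st->runs++;
    st->total_ms += ms;
    if (ms > st->worst_ms)
        st->worst_ms = ms;
}

/* Stand-in for a real collector such as gc.c's garbageCollect(). */
static volatile unsigned long gc_sink;
static void fake_collect(void)
{
    for (unsigned long i = 0; i < 1000000UL; i++)
        gc_sink += i;
}
```

The worst-case pause is arguably the more interesting number for stuttering, since a few long stop-the-world pauses hurt frame spacing more than many short ones with the same total.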

Now, it's not a good idea to start hacking away directly. What some of us have done, however, is use just a checkout of SG to patch Nasal there, and run the standalone interpreter from Andy's repo in conjunction with the SG lib used by FG. Generally speaking, we're lacking tools/better stats to understand what's happening inside FlightGear.

The truth is, in the past we've seen a number of performance problems due to leaking listeners/timers, i.e. callbacks being invoked unnecessarily, sometimes up to 3000 times in a single second. In that regard, we're also lacking tooling/stats, or at least heuristics to detect such issues.
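One simple heuristic along these lines is to count how often each callback fires within a fixed window (say, one second) and flag any that exceed a limit. This is a minimal sketch, assuming callbacks can be identified by a name string; `callback_stats`, `callback_hit()` and `callback_report()` are hypothetical and do not correspond to any existing FlightGear API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-callback counters: call callback_hit() every time a
 * timer/listener callback fires, and callback_report() once per window. */
#define MAX_CALLBACKS 64

struct callback_stats {
    const char *name[MAX_CALLBACKS];
    unsigned    hits[MAX_CALLBACKS];
    int         count;
};

static void callback_hit(struct callback_stats *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->name[i], name) == 0) {
            s->hits[i]++;
            return;
        }
    assert(s->count < MAX_CALLBACKS); /* table full: sketch does not grow */
    s->name[s->count] = name;
    s->hits[s->count] = 1;
    s->count++;
}

/* Returns how many callbacks fired more than `limit` times during the
 * window, then resets the counters for the next window. */
static int callback_report(struct callback_stats *s, unsigned limit)
{
    int suspicious = 0;
    for (int i = 0; i < s->count; i++) {
        if (s->hits[i] > limit)
            suspicious++;
        s->hits[i] = 0;
    }
    return suspicious;
}
```

A listener that was registered over and over again would show up here as a single name with an implausibly high hit count per window.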

The performance monitor uses SGSubsystem-based SGTimeStamp tracking. That's why Nasal stuff like the GC, but also complete subsystems implemented on top of listeners/timers, elude those stats. In hindsight, the correct thing would have been to expose the SGSubsystem API to scripting space, so that new Nasal modules use the same APIs rather than using timers and listeners to hook into the sim.

Understanding GC

It's worth keeping in mind that a mark/sweep collector isn't nearly as sophisticated as the SGPropertyNode class we have in SimGear, and that the two overlap considerably from a scope/algorithmic standpoint.

And indeed, a background in XML/SGPropertyNode could come in handy when working with/understanding a GC:

Behind the scenes, a mark/sweep collector really isn't that different from an algorithm that recursively walks an SGPropertyNode structure to look for nodes and child nodes and sets a "mark bit" (think XML attribute) on reachable nodes. That's indeed what's happening under the hood: you have a top-level tree/graph data structure (analogous to SGPropertyNode), and whenever such a node has child nodes, those are marked, because they are obviously reachable from one of the root nodes.

That would basically sum up the "mark" phase - all reachable nodes get an attribute (mark bit) to mark them as reachable, which means they're valid and cannot be freed/deleted.
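The mark phase described above can be sketched in a few lines of C. This is a simplified model, not Nasal's real `naObj` layout: each object here carries a mark bit and an explicit array of child pointers, much like an SGPropertyNode with children. The one detail worth noting is that already-marked objects are skipped, which keeps reference cycles from recursing forever.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model (not Nasal's real naObj layout): each object has a mark
 * bit and a list of children it references, much like an SGPropertyNode. */
struct obj {
    int mark;
    size_t nchildren;
    struct obj **children;
};

/* Mark phase: starting from a root, recursively set the mark bit on every
 * reachable object. Already-marked objects are skipped, so cycles terminate
 * and shared children are only visited once. */
static void mark(struct obj *o)
{
    if (o == NULL || o->mark)
        return;
    assert(o->nchildren == 0 || o->children != NULL);
    o->mark = 1;
    for (size_t i = 0; i < o->nchildren; i++)
        mark(o->children[i]);
}
```

Given a root `a` referencing `b`, and `b` referencing both `c` and `a` (a cycle), `mark(&a)` marks `a`, `b` and `c`, while an object that nothing points to stays unmarked.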


In the Nasal context, individual nodes may become invalid over time because they're no longer referenced anywhere (think of memory that nothing points to anymore, or just variables created by a function and no longer used after leaving the function). Think of it like an XML node that simply "expires" because the code that introduced it no longer needs it. I believe in the DHTML context you could think of a DOM node being removed by JavaScript. Either way, that node would no longer be reachable from any root node.

That is basically what's happening in Nasal space: some nodes are temporarily created/added but may simply expire (go out of scope).

And then there's a reap/sweep phase, which walks ALL nodes again, but this time not recursively from any roots; instead, it walks each memory pool completely, from start to end. It looks for said "mark bit" (think XML attribute) on memory allocated inside the type-specific memory pool, and if a node doesn't have that bit set, the node is added to a "free list", which basically marks it as freed and available for later reuse. Along the way, the mark bits are cleared again for the next GC run.
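The sweep can be sketched as a plain loop over every slot of a pool, loosely modelled on the `reap()` loop visible in the gc.c patch further down (unmarked elements are freed, every mark bit is reset). The fixed-size `pool`/`pobj` layout and the array-based free list are simplifications assumed for illustration; gc.c's real pools are made of malloc()ed blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified pool (hypothetical layout): objects live in one contiguous
 * array, and free slots are collected in a second array ("free list"). */
#define POOL_SIZE 8

struct pobj { int mark; int in_use; int value; };

struct pool {
    struct pobj objs[POOL_SIZE];
    struct pobj *free[POOL_SIZE];
    size_t nfree;
};

/* Sweep phase: visit every slot linearly (no recursion, no roots). Any
 * unmarked slot goes onto the free list; in-use ones among them are the
 * garbage being reclaimed. Mark bits are cleared for the next GC cycle.
 * Returns the number of objects reclaimed. */
static size_t sweep(struct pool *p)
{
    size_t reclaimed = 0;
    p->nfree = 0;                 /* the free list is rebuilt from scratch */
    for (size_t i = 0; i < POOL_SIZE; i++) {
        struct pobj *o = &p->objs[i];
        if (!o->mark) {
            if (o->in_use)
                reclaimed++;
            o->in_use = 0;
            p->free[p->nfree++] = o;
        }
        o->mark = 0;
    }
    assert(p->nfree <= POOL_SIZE);
    return reclaimed;
}
```

If slots 0 to 4 are in use and only slots 0 and 3 were marked, `sweep()` reclaims 3 objects and leaves 6 slots on the free list. This also shows why marking cannot be skipped on its own: with no mark bits set, every object would be treated as garbage.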

The background here is that Nasal uses the equivalent of "placement new" for its data types (variables), so there are memory pools for each data type supported by Nasal (think vectors, hashes, and C++ data structures).

Each such pool is implemented on top of a linked list of raw C memory blocks managed via malloc(), so that new allocations in Nasal space are mapped onto that raw memory. However, none of that is important to understand/fix up the current GC scheme; what's important is the recursive traversal to mark/reap objects depending on whether they are reachable or not.

Now, what happens when memory runs out is that the Nasal allocator determines that the corresponding memory pool no longer has sufficient space. At that point it pauses/halts all Nasal execution (all Nasal threads), does the recursive tree walk to find reachable objects and mark them, and then does another traversal to remove all objects that are no longer reachable from any "root" objects (think stack frames, local variables).

The hope here is to "free up" sufficient space. If that isn't sufficient, new space is allocated for the memory pool using raw malloc(), growing/shrinking the linked list as needed. At that point, the allocator has free space, and the new object is created by overlaying a struct/union on top of the raw C memory allocated by malloc().
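The allocation path just described (take a free slot, otherwise collect, otherwise grow the pool with malloc()) can be sketched as follows. Everything here is a hypothetical simplification: `pool_alloc()`, `grow()` and the pool-doubling policy are made up for illustration, and the collector is passed in as a function pointer that is expected to refill the free list.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator sketch: a pool is a linked list of malloc()ed
 * blocks, and free slots are threaded onto a singly linked free list. */
struct slot { struct slot *next_free; int value; };

struct block { struct block *next; size_t size; struct slot slots[]; };

struct pool {
    struct block *blocks;            /* raw memory blocks */
    struct slot  *free_list;         /* reusable slots */
    void (*collect)(struct pool *);  /* mark/sweep; NULL = GC disabled */
    size_t total;                    /* slots owned by the pool */
};

/* Allocate a new block of n slots and thread them onto the free list. */
static int grow(struct pool *p, size_t n)
{
    struct block *b = malloc(sizeof *b + n * sizeof(struct slot));
    if (b == NULL)
        return 0;
    b->size = n;
    b->next = p->blocks;
    p->blocks = b;
    for (size_t i = 0; i < n; i++) {
        b->slots[i].next_free = p->free_list;
        p->free_list = &b->slots[i];
    }
    p->total += n;
    return 1;
}

static struct slot *pool_alloc(struct pool *p)
{
    if (p->free_list == NULL && p->collect != NULL)
        p->collect(p);               /* the stop-the-world pause */
    if (p->free_list == NULL && !grow(p, p->total ? p->total : 4))
        return NULL;                 /* out of memory */
    struct slot *s = p->free_list;
    assert(s != NULL);
    p->free_list = s->next_free;
    return s;
}

static void pool_destroy(struct pool *p)
{
    while (p->blocks != NULL) {
        struct block *b = p->blocks;
        p->blocks = b->next;
        free(b);
    }
    p->free_list = NULL;
    p->total = 0;
}
```

With `collect` left as NULL, which is roughly what disabling the GC amounts to, the pool simply doubles each time it fills up: after 10 allocations it owns 16 slots (4, then 8, then 16), and it never gives anything back.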

And that is exactly why Nasal GC is a "stop-the-world" thing: it needs to walk the whole graph to determine what can be freed and what cannot be freed.

Andy's suggestion at the time was to allocate objects into generations, so that newer/younger objects live in the first generation, which is completely marked/reaped on every run, whereas the older generations (timestamp-based, or based on the number of GC passes survived) would be reaped less often. The assumption is that "old" objects have already survived previous GC passes, so they are less likely to be garbage now.

This would shrink the total work load of the GC, because the majority of older objects would live in older generations and be rarely subject to marking/sweeping. The young generation would only hold "recently allocated" variables.
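The workload reduction comes from how often each generation is visited. A minimal sketch of one possible policy, assuming just two generations and a fixed ratio: the young generation is collected on every cycle, the old one only on every `old_every`-th cycle. The names and the fixed-ratio policy are hypothetical; real collectors often trigger old-generation collections based on its fill level instead.

```c
#include <assert.h>

/* Hypothetical two-generation schedule (bit flags for the result). */
enum { GEN_YOUNG = 1, GEN_OLD = 2 };

struct gen_sched {
    unsigned cycle;       /* number of GC cycles so far */
    unsigned old_every;   /* collect the old generation every N cycles */
};

/* Returns a bit mask of the generations to mark/sweep in this cycle. */
static int generations_to_collect(struct gen_sched *s)
{
    assert(s->old_every > 0);
    s->cycle++;
    int gens = GEN_YOUNG;
    if (s->cycle % s->old_every == 0)
        gens |= GEN_OLD;
    return gens;
}
```

With `old_every = 4`, 20 cycles touch the old generation only 5 times. One caveat any real implementation has to handle: when only the young generation is marked, pointers from old objects into young ones must also be treated as roots (typically tracked with a write barrier or "remembered set"); otherwise young objects referenced only from old ones would be reaped.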

In Nasal space, we can implement a "generational GC" by turning the array of memory pools into a multi-dimensional array of memory pools. That's indeed the part I got working meanwhile, and the part I have tested locally in a Nasal-only SG build (much faster to test/develop things without having to boot up fgfs every time).

What is now missing are heuristics to move/promote GC objects in between generations, and implement a corresponding "promotion" routine to update the pointers of such promoted objects.
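One common heuristic for promotion is a survival counter: every time an object survives a young-generation collection its age is bumped, and once it reaches a threshold it is promoted. The sketch below only shows that decision; `PROMOTE_AFTER` and `survive_young_gc()` are hypothetical, and the part the text calls the "promotion routine" (copying the object into an old-generation pool and updating all pointers to it) is deliberately left out.

```c
#include <assert.h>

#define PROMOTE_AFTER 3   /* hypothetical tuning knob */

enum generation { YOUNG, OLD };

struct gobj {
    int mark;                 /* set by the mark phase */
    unsigned age;             /* young collections survived so far */
    enum generation gen;
};

/* Called for each young object while sweeping the young generation.
 * Returns 1 when the object should be promoted; a real promotion routine
 * would then move it to an old-generation pool and fix up every pointer
 * that refers to it. */
static int survive_young_gc(struct gobj *o)
{
    assert(o->gen == YOUNG);
    if (!o->mark)
        return 0;             /* unreachable: the sweep frees it instead */
    o->age++;
    if (o->age >= PROMOTE_AFTER) {
        o->gen = OLD;
        return 1;
    }
    return 0;
}
```

A reachable object is therefore promoted on its third surviving collection, while an unmarked one never ages and is left to the sweep.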

Again, I firmly believe that tinkering with the current 340 LOC gc.c module is more worthwhile than even just trying to update individual Nasal modules: https://sourceforge.net/p/flightgear/si ... nasal/gc.c

Over time, we have gathered quite a bit of information, so that the GC is becoming less and less obscure: https://wiki.flightgear.org/Nasal_GC_Musings https://wiki.flightgear.org/How_the_Nasal_GC_works


The SGPropertyNode comparison is probably a more workable entry point than you may think, because someone familiar with XML/tree data structures and recursion, undoubtedly also understands how a tree data structure is "walked" and how to differentiate between reachable and unreachable nodes.

With that introduction, people familiar with SGPropertyNode might be in a better position than they may think to take a look at the Nasal GC, especially in combination with the gc.c module itself, and those two wiki articles. Or at the very least understand if their performance issues are GC related or not.

I've talked to more than one person about this: improving gc.c is almost certainly going to be more worthwhile than any of the alternatives (some people even suggested replacing Nasal entirely with JavaScript or Python ...)

Overall, imagine a stack frame as an SGPropertyNode root: every time you instantiate a new variable inside the same stack frame, you are adding a child node to the root. Once you call another function, you are adding a new stack frame, and new roots, which may have their own child nodes. Upon function return, those variables are no longer reachable, so they cannot be marked, which means they can be freed.
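The stack-frame analogy can be made concrete with a toy model (not Nasal's real `naContext`): each frame is a root holding pointers to its locals, and marking starts only from the frames currently on the stack. All type and function names below are hypothetical.

```c
#include <assert.h>

#define MAX_FRAMES 16
#define MAX_LOCALS 8

struct var   { int mark; };
struct frame { struct var *locals[MAX_LOCALS]; int nlocals; };
struct stack { struct frame frames[MAX_FRAMES]; int top; };

/* Calling a function pushes a new frame, i.e. a new root. */
static struct frame *push_frame(struct stack *s)
{
    assert(s->top < MAX_FRAMES);
    struct frame *f = &s->frames[s->top++];
    f->nlocals = 0;
    return f;
}

/* Returning from a function pops its frame; its locals lose their root. */
static void pop_frame(struct stack *s)
{
    assert(s->top > 0);
    s->top--;
}

/* Declaring a local adds a "child node" to the current frame's root. */
static void add_local(struct frame *f, struct var *v)
{
    assert(f->nlocals < MAX_LOCALS);
    f->locals[f->nlocals++] = v;
}

/* Mark phase seeded only from the frames still on the stack. */
static void mark_from_stack(struct stack *s)
{
    for (int i = 0; i < s->top; i++)
        for (int j = 0; j < s->frames[i].nlocals; j++)
            s->frames[i].locals[j]->mark = 1;
}
```

If an outer function declares `outer`, calls an inner function that declares `inner`, and the inner function returns before the GC runs, only `outer` gets marked; `inner` is left unmarked and would be reclaimed by the sweep.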

Instead of being released back to the system, dead objects are put on the free list of the memory pool for their data type (each pool being a linked list of raw memory blocks), so that they can later be reused for new objects inside the same pool.

Also, working on the GC can be rather rewarding compared to having to build all of SG/FG from source, because it's such a tiny and self-contained module after all, and fixing up the GC can have such a massive impact on overall fgfs performance.

I believe anybody considering even just reworking a 30k LOC Nasal module like Advanced Weather is deluding themselves in terms of hours and manpower needed, compared to "just" understanding/fixing up 340 LOC of gc.c, or adding a 3rd party GC like Boehm GC.

Patch

The following is what I could dig out of my archives. I'm not sure if it's doing exactly what we discussed, but it should work "more or less". Obviously, it uses preprocessor #if 0 blocks to get the job done, but it should be straightforward to adapt this, or to add an extern char flag and set that from within FG via a property switch or environment variable.

diff --git a/simgear/nasal/gc.c b/simgear/nasal/gc.c
index 4d45288c..9f32f0df 100644
--- a/simgear/nasal/gc.c
+++ b/simgear/nasal/gc.c
@@ -42,6 +42,7 @@ static void garbageCollect()
     while(c) {
         for(i=0; i<NUM_NASAL_TYPES; i++)
             c->nfree[i] = 0;
+#if 0
         for(i=0; i < c->fTop; i++) {
             mark(c->fStack[i].func);
             mark(c->fStack[i].locals);
@@ -50,15 +51,18 @@ static void garbageCollect()
             mark(c->opStack[i]);
         mark(c->dieArg);
         marktemps(c);
+#endif
         c = c->nextAll;
     }
 
+#if 0
     mark(globals->save);
     mark(globals->save_hash);
     mark(globals->symbols);
     mark(globals->meRef);
     mark(globals->argRef);
     mark(globals->parentsRef);
+#endif
 
     // Finally collect all the freed objects
     for(i=0; i<NUM_NASAL_TYPES; i++)
@@ -295,6 +299,7 @@ static void reap(struct naPool* p)
     p->nfree = 0;
     p->free = p->free0;
 
+#if 0
     for(b = p->blocks; b; b = b->next)
         for(elem=0; elem < b->size; elem++) {
             struct naObj* o = (struct naObj*)(b->block + elem * p->elemsz);
@@ -302,6 +307,7 @@ static void reap(struct naPool* p)
                 freeelem(p, o);
             o->mark = 0;
         }
+#endif
 
     p->freetop = p->nfree;
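Turning the `#if 0` blocks into a runtime switch, as suggested above, could look roughly like the following. This is a sketch under explicit assumptions: the flag `naGCDisabled` and the environment variable `NASAL_DISABLE_GC` are hypothetical names that do not exist in SimGear today, and wiring the flag to a FlightGear property is not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag; a shared header would declare
 *     extern char naGCDisabled;
 * so FlightGear could also flip it from a property listener. */
char naGCDisabled = 0;

/* Accept "1" or "true" as "disable the GC". */
static int parse_gc_switch(const char *value)
{
    return value != NULL &&
           (strcmp(value, "1") == 0 || strcmp(value, "true") == 0);
}

/* Read the hypothetical NASAL_DISABLE_GC environment variable once at
 * startup. */
static void naInitGCSwitch(void)
{
    naGCDisabled = (char)parse_gc_switch(getenv("NASAL_DISABLE_GC"));
}

/* garbageCollect() and reap() would test this instead of using #if 0.
 * Both the mark loops AND the reap() loop must be skipped together:
 * skipping only the marking would leave every mark bit at 0, and reap()
 * would then free live objects. */
static int gc_enabled(void)
{
    return !naGCDisabled;
}
```

A runtime flag has the advantage that the same binary can be used for normal flying and for troubleshooting, which is what the "tell people to use a property" idea above depends on.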

Boehm GC

Screen shot showing fgfs running with an experimental BoehmGC integration
BoehmGC/fgfs running with the FG1000

The patch above could also be adapted to integrate an existing library like BoehmGC, which we did several years ago. To configure the GC for debugging purposes, you can use a few environment variables: GC_PRINT_STATS=1 (print collection statistics), GC_ENABLE_INCREMENTAL=1 (enable incremental collection) and GC_PAUSE_TIME_TARGET=10 (try to keep pauses to about 10 ms when running incrementally).