How the Nasal GC works

From FlightGear wiki
Jump to navigation Jump to search

High Level Architecture

1rightarrow.png See High-Level Architecture for the main article about this subject.

Cquote1.png Regarding Nasal as a scripting language and AW, I'm hopeful that the work I'm doing on HLA will allow us to run Nasal in a separate thread from the FDM and display, so Nasal GC no-longer can impact frame-rates. It would also allow for writing a weather simulation completely external to the FlightGear instance, which could be quite neat.
— stuart (Nov 24th, 2015). Re: the real cost of the Nasal Garbage Collector.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png I do like the idea of a stand-alone script interpreter communicating via HLA or some similar IPC, but we still have to consider fragments of script code embedded in models to create or enhance fancy animations. Not all script fragments make sense to run externally.
Cquote2.png
Cquote1.png if scripting is a major performance hog, off-load it to a different thread. I don't think we have many situation where it is, but that's what HLA is going to do.
Cquote2.png
Cquote1.png haven't made too many tests with Nasal active as anyone can do that (adjust paths to have no terrain if necessary). But most of this becomes less relevant with the HLA changes that Stuart's working on.
— Richard (Nov 24th, 2015). Re: the real cost of the Nasal Garbage Collector.
(powered by Instant-Cquotes)
Cquote2.png

Status (02/2016)

The main Nasal repository (maintained by Andy Ross) can also be found at github: https://github.com/andyross/nasal

The latest version in FlightGear is maintained in the SimGear repository.

The garbage collector in Nasal is pretty self-contained and entirely implemented in the file gc.c.

Cquote1.png Nasal's garbage collector requires that no other Nasal threads are

running, and achieves this by keeping a count of active Nasal threads
(nThreads, simgear/nasal/code.h), and having the thread that needs GC
ask all other threads to stop and wait until nThreads-1 say they have
(bottleneck(), simgear/nasal/gc.c). This will hang if there are any
threads that Nasal thinks are active but actually aren't, as they will
never reply to the stop request.


Cquote2.png



Cquote1.png Such things may look simple at first. Easy to convert a simple "hello world". But it's very complex when supporting all the features of an interpreted script language. And the funny thing is: you'd still need to worry about automatic garbage collection and count references (though that'd be a lesser issue compared to many others then). So, time wake up...
— ThorstenB (Mar 26th, 2011). Re: [Flightgear-devel] Stuttering at 1 Hz rate.
(powered by Instant-Cquotes)
Cquote2.png

Patches

Cquote1.png there is still the issue that you can't execute Nasal while the GC is running (for that you'd need a concurrent GC which is most likely not so easily implemented). The implication is that if the time required for a collection exceeds the frame dealy you'll still notice it even when the GC runs in a different thread. I have updated my experimental patch from last year, which should also work on Windows now (and it has even been tested there): Again: It doesn't solve the bigger problem of the current Nasal GC needing to run uninterrupted and in mutual exclusion - but I see very few GC runs in the main loop here (the patch prints "** Nasal GC in main thread **" in the console when that happens) except a few during startup.
— Anders Gidenstam (May 23rd, 2012). Re: [Flightgear-devel] Nasal performance.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png There are algorithms for incremental and/or concurrent and/or parallel garbage collection out there. They most likely not easy to implement and as far as I have seen so far would require (at least for concurrent and /or parallel GC) all writes of pointers to the Nasal heap (and possibly reads) to be redirected via wrapper functions (also known as (GC) read/write barriers). This will not be an easy task but in my opinion it would be a promising option. It might be possible to use a GC module from a GPL:d Java vm or similar. Btw, just running the normal (mutually exclusive) Nasal GC in another thread than the main loop is not hard - but since it is mutually exclusive to executing Nasal functions it doesn't help much when it comes to reducing the worst case latency. The small changes needed to add a separate GC thread are available here: Cquote2.png
Cquote1.png There were some serious issues with my patch - namely that the destruction of some C++ side FG objects kept alive by Nasal references must not be invoked on a different thread. I didn't see the crashes, but they appeared for people using Canvas heavy aircraft.

James did some work to fix these problems but I'm not sure how completely he managed. I still run the patch with his updates, however, and for the aircraft (I don't think I use any canvases, though) I use it works.

Running the Nasal GC in a different thread could certainly be revisited, and if the GC updated to an incremental one it could be done without impacting the main loop much.
— AndersG (Nov 24th, 2015). Re: the real cost of the Nasal Garbage Collector.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png Running the Nasal GC outside the main thread is already (seemingly) possible but code for it is not in the next branch. I think James fixed the remaining issue a few years back, namely that Nasal/C++ ghost objects must be handed back to the main thread for destruction since they invoke C++-side destructors that touch shared FG state. However, since the Nasal GC (currently) is mutual exclusive to running any Nasal context (using the same heap) this is still not a satisfactory solution once the time needed for a GC approaches the frame duration as the main thread will be blocked waiting for the GC to finish. I've been told this frequently happens for more Nasal heavy use cases, though I have not experienced it on my own ancient (by far GPU-bound) system in my use cases. There are some script use cases that require script code to run in sync with the viewer, e.g. script-based fly by, walk and MP/AI views. For such tightly coupled script code Lua (which, IIRC, offers "real" threads and an incremental GC) or an overhauled incremental and/or concurrent Nasal GC (not a negligible amount of work, though) would seem like options for this type of script code.
Cquote2.png


Cquote1.png The built-in osgviewer stats can be extended with custom stats, that works by subclassing osg::StatsHandler, this is already done in $FG_SRC/Viewer/FGEventHandler.?xx

The class can be extended to add your own stats via osgViewer::StatsHandler::addUserStatsLine()

You can even register totally custom stats via osg::Stats

The main suspects in this case are probably 1) scenery, 2) cockpit, 3) Nasal (GC - bottleneck() function in simgear/nasal )
For the Nasal stats, you'll probably want to register them in FGNasalSys::FGNasalSys() by accessing globals->viewer->
That should give you a better idea about what's going on there, and it's been suggested by core developers, too:

http://www.mail-archive.com/flightgear- ... 37823.html

Cquote1.png Another goal is to add more node bits (and a GUI dialog to control them) so

various categories of objects can be disabled during the update pass. This will
mean the direct hit of, say, AI models vs particles vs random trees can be
measured. Of course it won't account for resources (memory, textures) burned by
such things, but would still help different people identify slowness on their
setups.

Cquote2.png

— Hooray (Sun Feb 23). Re: New Aircraft: the Extra500.
(powered by Instant-Cquotes)
Cquote2.png

Problem

Cquote1.png we now know that regular and unpleasant stuttering is caused by Nasals garbage collector
Cquote2.png
Cquote1.png the garbage collection pass must be done in a single atomic step, otherwise it would leave the heap in an inconsistent state and adversely affect the scripts.
Cquote2.png
Cquote1.png There are algorithms for incremental and/or concurrent and/or parallel garbage collection out there. They most likely not easy to implement and as far as I have seen so far would require (at least for concurrent and /or parallel GC) all writes of pointers to the Nasal heap (and possibly reads) to be redirected via wrapper functions (also known as (GC) read/write barriers).
Cquote2.png
Cquote1.png This will not be an easy task but in my opinion it would be a promising option. It might be possible to use a GC module from a GPL:d Java vm or similar.
Cquote2.png
Cquote1.png just running the normal (mutually exclusive) Nasal GC in another thread than the main loop is not hard - but since it is mutually exclusive to executing Nasal functions it doesn't help much when it comes to reducing the worst case latency.

The small changes needed to add a separate GC thread are available here:* http://www.gidenstam.org/FlightGear/misc/test/sg-gc-2.diff


Cquote2.png
Cquote1.png the result is very similar to the standard FG since even if the GC is run on a different thread it still takes the same amount of time and while it is running the main thread will be blocked waiting for it as soon as it tries to run some Nasal code.

Note that there is a spurious change in simgear/nasal/code.c that makes variable creation without the var keyword illegal. I use that to make sure my Nasal scripts are clean in this regard but there are plenty of scripts in fgdata that do not pass that test (and, well, they are still valid Nasal). I'll update the patch file ASAP but please revert that file to the standard one in the meanwhile.


Cquote2.png
Cquote1.png there is still the issue that you can't execute Nasal while the GC is running (for that you'd need a concurrent GC which is most likely not so easily implemented). The implication is that if the time required for a collection exceeds the frame dealy you'll still notice it even when the GC runs in a different thread.

I have updated my experimental patch from last year, which should also work on Windows now (and it has even been tested there):


— Anders Gidenstam (2012-05-23). Re: [Flightgear-devel] Nasal performance.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png It doesn't solve the bigger problem of the current Nasal GC needing to run uninterrupted and in mutual exclusion - but I see very few GC runs in the main loop here (the patch prints "** Nasal GC in main thread **" in the console when that happens) except a few during startup.
— Anders Gidenstam (2012-05-23). Re: [Flightgear-devel] Nasal performance.
(powered by Instant-Cquotes)
Cquote2.png

The Nasal language is a dynamic programming language, where memory is not manually allocated and freed by the programmer, but instead the Nasal virtual machine automatically allocates memory from so called "memory pools", every so often these memory pools must be checked, to see how many references (i.e. objects like variables) are still reachable, so that the unreachable ones can be removed from the memory pool and/or so that memory blocks in each pool can be allocated or freed. This is called "garbage management" or "garbage collection".

The current Nasal garbage collector (GC) is known to affect the frame rate in FlightGear and occasionally causes stutter (increased frame spacing) under certain circumstances (i.e. under heavy load, due to complex scripts - such as the Bombable addon or the local weather system.

This is largely attributed to the GC being a fairly simple sequential mark/sweep collector, which needs to stop all operations, recursively mark all reachable objects and free unreachable objects in an exclusive, single-threaded, fashion.

In other words, the time spent in the GC is proportial to the number of active objects and references held (i.e. lines of code), even for objects which may not be running or active - just loaded. For additional background information, please see Improving Nasal.

Cquote1.png allocating objects into generations and only mark/reaping from the most recent one on most iterations is a straightforward optimization and will definitely make things faster.
— (Andy Ross) FlightGear Devel mailing list[1]
Cquote2.png

Objective

Provide a basic introduction to garbage collection and document how the existing Nasal GC scheme works, so that it can be fully understood, before alternatives (fixes/improvements) or additional implementations are added (feedback and contributions are encouraged and appreciated!).

Also, come up with an approach to generalize the Nasal GC API such that additional GC schemes can be easily implemented next to the native Nasal GC, providing an option to select a custom/experimental GC implementation at startup. Ideally, this should be possible by using a switch property during fgfs startup - such as --prop:/sim/nasal-gc=custom

One of the most straightforward changes would be turning the existing mark/sweep GC into a generational GC using 2-3 generations, where past GC survivors are promoted to the next generation, the GC manager naGC_get() would then by default only request new memory in generation 0, whereas resize() events in gen0 would result in the next higher generation being subject to GC (and so on). The most senior data structures and objects would over time end up in the "eden" generation.

Some of the first steps would include:

  • turning naGC_get() into a context-aware void function, so that callers do not need to be aware of the return type
  • moving GC specific data structures from the globals struct to a GC struct, and only keep a void* GC pointers in globals
  • move GC initialization to gc.c and allow different GCs to register their own init callbacks
  • move GC specific declarations from data.h to gc.h (naPool etc)
  • move GC specific stuff from misc.c to gc.c (none of the GC internals should be exposed to the rest of the code)
  • customize the existing GC to support multiple generations

Suggested readings

The wikipedia article on garbage collection contains some pretty good general GC information: http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)

In addition, the following online presentations provide a basic introduction to garbage collection:


Mark/Sweep

When using a mark-and-sweep collector, unreachable objects are not immediately reclaimed. Instead, garbage is allowed to accumulate until all available memory has been exhausted. When that happens, the execution of the program is suspended temporarily while the mark-and-sweep algorithm collects all the garbage. Once all unreferenced objects have been reclaimed, the normal execution of the program can resume.

Obviously, the main disadvantage of the mark-and-sweep approach is the fact that that normal program execution is suspended while the garbage collection algorithm runs. In particular, this can be a problem in a program that must satisfy real-time execution constraints (like a flight simulator). For example, an interactive application that uses mark-and-sweep garbage collection becomes unresponsive periodically.

The mark-and-sweep algorithm is called a tracing garbage collector because is traces out the entire collection of objects that are directly or indirectly accessible by the program. The objects that a program can access directly are those objects which are referenced by local variables on the processor stack as well as by any static variables that refer to objects. In the context of garbage collection, these variables are called the roots . An object is indirectly accessible if it is referenced by a field in some other (directly or indirectly) accessible object. An accessible object is said to be live . Conversely, an object which is not live is garbage.

The mark-and-sweep algorithm consists of two phases: In the first phase, it finds and marks all accessible objects. The first phase is called the mark phase. In the second phase, the garbage collection algorithm scans through the heap and reclaims all the unmarked objects. The second phase is called the sweep phase.

The mark/sweep algorithm can also be easily expressed using pseudo code, i.e. in Nasal itself:

 
 var garbageCollect = func (roots) {
 foreach(var root; roots) {
   mark(root);
  }
  sweep();
 }

The mark() function would just recursively check (i.e. calling itself) if the "marked" flag is set or not, and make sure to set the mark bit for all referenced objects:

 
 var mark = func (obj) {
  if (!obj.marked) {
   obj.marked=1;
   foreach( var referenced; referenced_by(obj) )
     mark(referenced);
  }
 }

Finally, the sweep() function (reap() in the Nasal GC), will free all unreachable objects and reclaim the memory:

 
 var sweep = func {
  foreach(var obj; allocated_objects) {
   if (obj.marked) obj.marked=0;
   else reclaim(obj);
  }
 }

Nasal's implementation of sweep() ( called reap() ) works such that it is always executed for a handful of different type-specific memory pools. In addition, it also makes sure to allocate new memory if required.

Pool storage

A memory pool is basically a preallocated region of memory, which is dynamically resized. The Nasal GC works such that it manages a handful of global memory pools for all native Nasal types (strings, functions, vectors, hashes etc). At the moment, the hard coded defaults ensure that 25-50% of additional object "slots" (memory blocks) are kept available during each execution of reap() [1].

Whenever new memory is requested to create a new type (such as a vector or a string), the available memory in the corresponding pool will be checked, reachable objects will be marked, and dead objects will be removed from all pools using a mark/sweep collector, new memory blocks will be allocated if necessary. All of this happens atomically, i.e. single-threaded, in a "stop-the-world" fashion.

Memory blocks

Nasal Block structure


A Nasal memory pool consists of memory blocks.

A memory block is a singly linked list: flightgear/simgear/next/simgear/nasal/gc.c#l10.

Callgraph for newBlock()

Each block contains a field to store its size, a pointer to the allocated memory, and another pointer to the next block.

struct Block {
    int   size; // size of the block 
    char* block; // pointer to the memory region
    struct Block* next;	// pointer to the next block
};

New memory blocks are allocated using newBlock() and the memory is initialized with 0: flightgear/simgear/next/simgear/nasal/gc.c#l167

static void newBlock(struct naPool* p, int need)	
{
    int i;
    struct Block* newb;
    if(need < MIN_BLOCK_SIZE) need = MIN_BLOCK_SIZE;
    // allocate a new Block
    newb = naAlloc(sizeof(struct Block));
    // initialize the Block
    newb->block = naAlloc(need * p->elemsz); // number of elements * size of element
    newb->size = need; // set block size 
    // memory blocks are circular linked lists: 
    newb->next = p->blocks;
    p->blocks = newb;
    naBZero(newb->block, need * p->elemsz);

    if(need > p->freesz - p->freetop) need = p->freesz - p->freetop;
    p->nfree = 0;
    p->free = p->free0 + p->freetop;
    // mark all new blocks as unreachable and add them to the pool's free list
    for(i=0; i < need; i++) {
        // initialize each new new memory blocks by setting up an naRef (the container for all Nasal references) 
        struct naObj* o = (struct naObj*)(newb->block + i*p->elemsz);
        o->mark = 0;
        p->free[p->nfree++] = o;	
    }
    p->freetop += need;	
}

Nasal memory pools

For each of the 7 Nasal data types, there is a separate storage pool available, declared as part of the "Globals" structure.

As can be seen, each storage pool is addressed by its enum index (0..6): flightgear/simgear/next/simgear/nasal/data.h#l65

enum { T_STR, T_VEC, T_HASH, T_CODE, T_FUNC, T_CCODE, T_GHOST, NUM_NASAL_TYPES }; // V. important that this come last!

For example, globals->pools[T_VEC] refers to the storage pools for vectors:

globals->pools[T_VEC];

The 7 storage pools are declared in code.h as part of the "Globals" structure: flightgear/simgear/next/simgear/nasal/code.h#l40

The relevant part is:

struct Globals {	
    // Garbage collecting allocators:
    struct naPool pools[NUM_NASAL_TYPES];
    int allocCount;
    // Dead blocks waiting to be freed when it is safe
    void** deadBlocks;
    int deadsz;
    int ndead;
//...
};

So, we have these pools:

  • globals->pools[T_STR]
  • globals->pools[T_VEC]
  • globals->pools[T_HASH]
  • globals->pools[T_CODE]
  • globals->pools[T_FUNC]
  • globals->pools[T_CCODE]
  • globals->pools[T_GHOST]

The Globals structure

Nasal Globals structure

The complete Globals structure looks like this:

struct Globals {	
    // Garbage collecting allocators:
    struct naPool pools[NUM_NASAL_TYPES];
    int allocCount;
    // Dead blocks waiting to be freed when it is safe
    void** deadBlocks;
    int deadsz;
    int ndead;

    // Threading stuff
	
    int nThreads;
	
    int waitCount;
	
    int needGC;
	
    int bottleneck;
	
    void* sem;
	
    void* lock;
	
	
    // Constants
	
    naRef meRef;
	
    naRef argRef;

    naRef parentsRef;

    // A hash of symbol names
    naRef symbols;
    naRef save;
    struct Context* freeContexts;
    struct Context* allContexts;	
};

The naPool structure

Nasal naPool structure

The structure keeping all pool-related information is named "naPool", the layout of the naPool structure can be seen in data.h: flightgear/simgear/next/simgear/nasal/data.h#l181

struct naPool {
    int           type; // one of T_STR, T_VEC, T_HASH etc
    int           elemsz; // size of each element - determined via naTypeSize(T_*)
    struct Block* blocks;
    void**   free0; // pointer to the alloced buffer
    int     freesz; // size of the alloced buffer
    void**    free; // current "free frame"
    int      nfree; // down-counting index within the free frame
    int    freetop; // curr. top of the free list
};

Each memory pool is initialized using naGC_init(): flightgear/simgear/next/simgear/nasal/gc.c#l192

void naGC_init(struct naPool* p, int type)
{
    p->type = type;
    p->elemsz = naTypeSize(type);
    p->blocks = 0;
    p->free0 = p->free = 0;
    p->nfree = p->freesz = p->freetop = 0;
    reap(p);	
}

As can be seen, "elemsz" just stores the size of each type-specific structure, so that it can be used for new allocations.

naGC_init is called in initGlobals() in code.c for each Nasal type (T_STR, T_VEC etc): flightgear/simgear/next/simgear/nasal/code.c#l172

for(i=0; i<NUM_NASAL_TYPES; i++)
        naGC_init(&(globals->pools[i]), i);


The size of a memory pool is determined using poolsize(): flightgear/simgear/next/simgear/nasal/gc.c#l203

static int poolsize(struct naPool* p)
{
    int total = 0;
    struct Block* b = p->blocks;
    while(b) { total += b->size; b = b->next; }
    return total;	
}

The function poolsize() just iterates over all memory blocks in the pool, and computes the total size of the pool by adding up the "size" field of all blocks in the linked list.

Allocating Nasal types

All Nasal types (scalars, vectors, hashes, funcs, ghosts) are allocated via wrappers in misc.c.

These wrappers are:

All of these wrappers are really just helpers on top of naNew(), they just set up type-specific information and initialize each Nasal type properly. For instance, first naNew() is called for the given context, along with the type info (to set up the size properly)- and then each type is initialized:

// The naNew* helpers are just wrappers to set up type-specific structures, after allocating the memory using naNew()
naRef naNewString(struct Context* c)	
{
    naRef s = naNew(c, T_STR);
    PTR(s).str->emblen = 0;
    PTR(s).str->data.ref.len = 0;
    PTR(s).str->data.ref.ptr = 0;
    PTR(s).str->hashcode = 0;
    return s;
}

naNew()

As can be seen, the key allocator function is naNew(). All memory allocations done via the naNew*() helpers will first of all set up memory using naNew().

naNew() is really just a fancy malloc() replacement, i.e. it's just getting passed the Nasal execution context, and the nasal type (which is mapped to the size of the requested memory region using naTypeSize()).

Cquote1.png New nasal objects are added to a temporary bin when they are created, because further allocation might cause a garbage collection to happen before the code that created the object can store a reference to it where the garbage collector can find it. For performance and simplicity, this list is stored per-context. When the context next executes code, it clears this list. Here's the problem: we do a lot of our naNewXX() calls in FlightGear using the old "global context" object that is created at startup. But this context is no longer used to execute scripts* at runtime, so as Csaba discovered, it's temporaries are never flushed. That essentially causes a resource leak: those allocations (mostly listener nodes) will never be freed. And all the extra "reachable" Nasal data floating around causes the garbage collector to take longer and longer to do a full collection as time goes on, causing "stutter". And scripts that use listeners extensively (the cmdarg() they use was one of the affected allocations) will see the problem more seriously.
Cquote2.png
Cquote1.png Once listeners were added, scripts could be recursive: (script A sets property X which causes listener L to fire and cause script B to run) We need two or more contexts on the stack to handle that; a single global one won't work. I didn't like the fix though. Exposing the temporary bin as part of the Nasal public API is ugly; it's an internal design feature, not something users should tune. Instead, I just hacked at the FlightGear code to reinitialie this context every frame, thus cleaning it up. A "proper" fix would be to remove the global context entirely, but that would touch a bunch of code.
Cquote2.png

The naNew() function is also implemented in misc.c, see flightgear/simgear/next/simgear/nasal/misc.c#l66:

naRef naNew(struct Context* c, int type)	
{
    naRef result;
    if(c->nfree[type] == 0)
        c->free[type] = naGC_get(&globals->pools[type],
                                 OBJ_CACHE_SZ, &c->nfree[type]);
    result = naObj(type, c->free[type][--c->nfree[type]]);
    naTempSave(c, result);
    return result;
}

First, the conditional checks if there's any free memory in the Nasal storage pools for the requested Nasal data type, if there isn't any free memory left, new memory is assigned to the contexts->free[type] pool via naGC_get().

Note: naGC_get() is the only function which really manages all memory pools and which triggers the GC. Allocations via naNew()* will end up calling naNew() and naNew() ends up calling naGC_get() always.

For the time being, OBJ_CACHE_SZ can be ignored, it's defined in code.h: flightgear/simgear/next/simgear/nasal/code.h#l12

// Number of objects (per pool per thread) asked for using naGC_get().
// The idea is that contexts can "cache" allocations to prevent thread
// contention on the global pools.  But in practice this interacts
// very badly with small subcontext calls, which grab huge numbers of
// cached objects and don't use them, causing far more collections
// than necessary.  Just leave it at 1 pending a rework of the
// collector synchronization.
#define OBJ_CACHE_SZ 1

The final argument to the naGC_get() function is a pointer to to the context's nfree[type] member which is updated by the function.

Next, a new naRef structure is set up using the allocated naObj memory and the type info:

result = naObj(type, c->free[type][--c->nfree[type]]);


naRef naObj(int type, struct naObj* o)	
{
    naRef r;
    SETPTR(r, o);
    o->type = type;
    return r;	
}

The GC interface

Nasal GC


TODO: flightgear/simgear/next/simgear/nasal/data.h#l204

  • void naGC_init(struct naPool* p, int type);
  • struct naObj** naGC_get(struct naPool* p, int n, int* nout);
  • void naGC_swapfree(void** target, void* val); (used for vector/hash resizing)
  • void naGC_freedead();
  • void naiGCMark(naRef r);
  • void naiGCMarkHash(naRef h);

The pool manager

Callgraph for naGC_get()

All high level type allocators (naNewString, naNewVector, naNewHash etc) make use of the naNew() call introduced earlier.

naNew() doesn't directly allocate new memory from the heap using naAlloc(), but instead uses the naGC_get() helper which manages memory from the global pools (i.e. preallocated memory for each type).

For each native Nasal data type (scalars, vectors, hashes, funcs etc) , there are separate memory pools in globals->pools[n]. The index (n) is one of T_VEC, T_HASH, T_FUNC etc.

The naGC_get() pool "manager" is located in gc.nas: flightgear/simgear/next/simgear/nasal/gc.c#l211

struct naObj** naGC_get(struct naPool* p, int n, int* nout)
{
    struct naObj** result;
    naCheckBottleneck();
    LOCK();
    while(globals->allocCount < 0 || (p->nfree == 0 && p->freetop >= p->freesz)) {
        globals->needGC = 1;
        bottleneck();	
    }
    if(p->nfree == 0)
        newBlock(p, poolsize(p)/8);
    n = p->nfree < n ? p->nfree : n;
    *nout = n;
    p->nfree -= n;
    globals->allocCount -= n;
    result = (struct naObj**)(p->free + p->nfree);
    UNLOCK();
    return result;	
}

As can be seen, the garbage collector itself is triggered by setting the global "needGC" flag to 1 in the globals structure and then calling the bottleneck() function: flightgear/simgear/next/simgear/nasal/gc.c#l105.

The memory region from the global pool is cast back to a pointer (struct naObj**): flightgear/simgear/next/simgear/nasal/gc.c#l226 the "nout" parameter is just there to update c->nfree[type] with the amount of free memory.

In other words, whenever new Nasal types are allocated, new memory from one of the 7 Nasal pools is requested. All of these allocation requests are channeled through naNew() and finally naGC_get(). naGC_get() is the de facto memory manager in Nasal, all memory management is triggered here - new memory block allocations for each pool, as well as garbage collection via the bottleneck() call.

This also means that we basically just need to provide an alternate naGC_get() function, next to the existing one, to add support for alternate GC schemes.

Supporting additional GC implementations

It would probably make sense to change the pool manager interface/signature such that additional implementations can be easily provided and registered, so that merely a GC_CALLBACK pointer must be set during initialization in order to use a different GC scheme:

diff --git a/simgear/nasal/data.h b/simgear/nasal/data.h
index deb4fbd..f3fcaa1 100644
--- a/simgear/nasal/data.h
+++ b/simgear/nasal/data.h
@@ -202,7 +202,7 @@ int naiHash_sym(struct naHash* h, struct naStr* sym, naRef* out);
 void naiHash_newsym(struct naHash* h, naRef* sym, naRef* val);
 
 void naGC_init(struct naPool* p, int type);
-struct naObj** naGC_get(struct naPool* p, int n, int* nout);
+void naGC_alloc(struct Context* c, int type, int n);
 void naGC_swapfree(void** target, void* val);
 void naGC_freedead();
 void naiGCMark(naRef r);
diff --git a/simgear/nasal/gc.c b/simgear/nasal/gc.c
index e9d9b7b..3bae313 100644
--- a/simgear/nasal/gc.c
+++ b/simgear/nasal/gc.c
@@ -191,8 +191,10 @@ static int poolsize(struct naPool* p)
     return total;
 }
 
-struct naObj** naGC_get(struct naPool* p, int n, int* nout)
+void naGC_alloc(struct Context* c, int type, int n)
 {
+    struct naPool* p = &globals->pools[type]; // type-specific memory pool
+    int* nout = &c->nfree[type];		// 
     struct naObj** result;
     naCheckBottleneck();
     LOCK();
@@ -208,7 +210,7 @@ struct naObj** naGC_get(struct naPool* p, int n, int* nout)
     globals->allocCount -= n;
     result = (struct naObj**)(p->free + p->nfree);
     UNLOCK();
-    return result;
+    c->free[type]=result;
 }
 
 static void markvec(naRef r)
diff --git a/simgear/nasal/misc.c b/simgear/nasal/misc.c
index dae0bfe..5caeb76 100644
--- a/simgear/nasal/misc.c
+++ b/simgear/nasal/misc.c
@@ -66,9 +66,7 @@ naRef naStringValue(naContext c, naRef r)
 naRef naNew(struct Context* c, int type)
 {
     naRef result;
-    if(c->nfree[type] == 0)
-        c->free[type] = naGC_get(&globals->pools[type],
-                                 OBJ_CACHE_SZ, &c->nfree[type]);
+    if(! c->nfree[type] ) naGC_alloc(c, type, OBJ_CACHE_SZ);
     result = naObj(type, c->free[type][--c->nfree[type]]);
     naTempSave(c, result);
     return result;

Disabling the current GC

The overhead caused by managing the memory pools and constantly scanning the heap (mark/reap) can be examined by modifying the naGC_get() function such that it no longer uses any pool memory, but directly allocates via malloc()/naAlloc() - without ever freeing any memory (i.e. leaking). While the fgfs process will eventually get killed this way, the GC will be basically switched off.

struct naObj** naGC_get(struct naPool* p, int n, int* nout)
{
    struct naObj** result;
    naCheckBottleneck();
    LOCK();
#define LEAKY 
#ifndef LEAKY
    while(globals->allocCount < 0 || (p->nfree == 0 && p->freetop >= p->freesz)) {
        globals->needGC = 1;
        bottleneck();	
    }
    if(p->nfree == 0)
        newBlock(p, poolsize(p)/8);
    n = p->nfree < n ? p->nfree : n;
    *nout = n;
    p->nfree -= n;
    globals->allocCount -= n;
    result = (struct naObj**)(p->free + p->nfree);
#else
    result = (struct naObj**) naAlloc( naTypeSize(n) );
#endif
    UNLOCK();
    return result;	
}

bottleneck()

Callergraph of naCheckBottleneck()
Callgraph of naCheckBottleneck()

Whenever new memory from one of the global pools is being requested via naGC_get(), naCheckBottleneck() will also be invoked:

flightgear/simgear/next/simgear/nasal/gc.c#l131

void naCheckBottleneck()	
{
    if(globals->bottleneck) { LOCK(); bottleneck(); UNLOCK(); }	
}

This checks if globals->bottleneck is true, and if it is, it makes sure that the bottleneck() function is only invoked after acquiring a lock.

Callgraph of bottleneck()

flightgear/simgear/next/simgear/nasal/gc.c#l100

// Must be called with the main lock.  Engages the "bottleneck", where
// all threads will block so that one (the last one to call this
// function) can run alone.  This is done for GC, and also to free the
// list of "dead" blocks when it gets full (which is part of GC, if
// you think about it).
static void bottleneck()	
{
    struct Globals* g = globals;
    g->bottleneck = 1;
    while(g->bottleneck && g->waitCount < g->nThreads - 1) {
        g->waitCount++;
        UNLOCK(); naSemDown(g->sem); LOCK();
        g->waitCount--;	
    }
    if(g->waitCount >= g->nThreads - 1) {
        freeDead();
        if(g->needGC) garbageCollect();
        if(g->waitCount) naSemUp(g->sem, g->waitCount);
        g->bottleneck = 0;	
    }	
}

The actual garbage collection takes place in line 114 where dead objects from the memory pool are freed, and the garbageCollect() function is called if globals.needGC is true.

garbageCollect()

Callgraph of garbageCollect()

This is the function that does the mark/sweep part using mark() to recursively mark reachable objects, and reap() to collect unreachable objects.

flightgear/simgear/next/simgear/nasal/gc.c#l35

// Must be called with the big lock!
static void garbageCollect()	
{
    int i;
    struct Context* c;
    globals->allocCount = 0;
    c = globals->allContexts;
    while(c) {
        for(i=0; i<NUM_NASAL_TYPES; i++)
            c->nfree[i] = 0;
        for(i=0; i < c->fTop; i++) {
            mark(c->fStack[i].func);
            mark(c->fStack[i].locals);
        }
        for(i=0; i < c->opTop; i++)
            mark(c->opStack[i]);
        mark(c->dieArg);
        marktemps(c);
        c = c->nextAll;	
    }
    mark(globals->save);
    mark(globals->symbols);
    mark(globals->meRef);
    mark(globals->argRef);
    mark(globals->parentsRef);
    // Finally collect all the freed objects
    for(i=0; i<NUM_NASAL_TYPES; i++)
        reap(&(globals->pools[i]));
    // Make enough space for the dead blocks we need to free during
    // execution.  This works out to 1 spot for every 2 live objects,
    // which should be limit the number of bottleneck operations
    // without imposing an undue burden of extra "freeable" memory.
    if(globals->deadsz < globals->allocCount) {
        globals->deadsz = globals->allocCount;
        if(globals->deadsz < 256) globals->deadsz = 256;
        naFree(globals->deadBlocks);
        globals->deadBlocks = naAlloc(sizeof(void*) * globals->deadsz);	
    }
    globals->needGC = 0;	
}

For each invocation of garbageCollect(), the following takes place:

mark()

Callgraph of mark()

All reachable objects are recursively marked using tagged naRef structures flightgear/simgear/next/simgear/nasal/data.h#l50, so that unreachable objects can be collected (i.e. because the reftag bit is not set) and freed.

TODO: flightgear/simgear/next/simgear/nasal/gc.c#l223

reap()

Callgraph for reap()

This is the collector function called by garbageCollect() which also frees elements and allocates new blocks for the specified pool:

// Collects all the unreachable objects into a free list, and
// allocates more space if needed.
static void reap(struct naPool* p)
{
    struct Block* b;
    int elem, freesz, total = poolsize(p);
    freesz = total < MIN_BLOCK_SIZE ? MIN_BLOCK_SIZE : total;
    freesz = (3 * freesz / 2) + (globals->nThreads * OBJ_CACHE_SZ);
    if(p->freesz < freesz) {
        naFree(p->free0);
        p->freesz = freesz;
        p->free = p->free0 = naAlloc(sizeof(void*) * p->freesz);
    }
    p->nfree = 0;
    p->free = p->free0;
    for(b = p->blocks; b; b = b->next)
        for(elem=0; elem < b->size; elem++) {
            struct naObj* o = (struct naObj*)(b->block + elem * p->elemsz);
            if(o->mark == 0)
                freeelem(p, o);
            o->mark = 0;
        }
    p->freetop = p->nfree;
    // allocs of this type until the next collection
    globals->allocCount += total/2;
    // Allocate more if necessary (try to keep 25-50% of the objects
    // available)
    if(p->nfree < total/4) {
        int used = total - p->nfree;
        int avail = total - used;
        int need = used/2 - avail;
        if(need > 0)
            newBlock(p, need);
    }	
}

swapfree()

Callgraph of swapfree()

freeelem()

Callgraph of freeelem()

Frees individual elements and adds freed pointers to the globals free list, implemented in gc.c: flightgear/simgear/next/simgear/nasal/gc.c#l153

static void freeelem(struct naPool* p, struct naObj* o)
{
    // Clean up any intrinsic storage the object might have...
    switch(p->type) {
    case T_STR:   naStr_gcclean  ((struct naStr*)  o); break;
    case T_VEC:   naVec_gcclean  ((struct naVec*)  o); break;
    case T_HASH:  naiGCHashClean ((struct naHash*) o); break;
    case T_CODE:  naCode_gcclean ((struct naCode*) o); break;
    case T_GHOST: naGhost_gcclean((struct naGhost*)o); break;	
    }
    p->free[p->nfree++] = o;  // ...and add it to the free list	
}

Type destructors

Will be called by freelem(), implemented for all native Nasal types:

Contexts

Cquote1.png New nasal objects are added to a temporary bin when they are created, because further allocation might cause a garbage collection to happen before the code that created the object can store a reference to it where the garbage collector can find it. For performance and simplicity, this list is stored per-context. When the context next executes code, it clears this list. Here's the problem: we do a lot of our naNewXX() calls in FlightGear using the old "global context" object that is created at startup. But this context is no longer used to execute scripts* at runtime, so as Csaba discovered, it's temporaries are never flushed. That essentially causes a resource leak: those allocations (mostly listener nodes) will never be freed. And all the extra "reachable" Nasal data floating around causes the garbage collector to take longer and longer to do a full collection as time goes on, causing "stutter". And scripts that use listeners extensively (the cmdarg() they use was one of the affected allocations) will see the problem more seriously. (That's a feature, not a bug. Once listeners were added, scripts could be recursive: (script A sets property X which causes listener L to fire and cause script B to run) We need two or more contexts on the stack to handle that; a single global one won't work.) I didn't like the fix though. Exposing the temporary bin as part of the Nasal public API is ugly; it's an internal design feature, not something users should tune. Instead, I just hacked at the FlightGear code to reinitialize this context every frame, thus cleaning it up. A "proper" fix would be to remove the global context entirely, but that would touch a bunch of code. [2]
— Andy Ross
Cquote2.png


Cquote1.png The problem is the fact that Nasal recycles context objects - it keeps a pool of freed contexts around. At the point we shutdown Nasal, all contexts should be inactive, i.e in the free list (and they are, phew). Free-ed contexts are still considered for GC however, and there seems to be an issue that the opStack (part of the stack machine which implements the core bytecode interpreter of Nasal) doesn’t end up empty in some situations. So my work around is to clear the opStack (and frameStack) to zero, inside naFreeContext. This then matches the state for a new (non-recycled) context, and ensures that on the next GC cycle, items referenced by the opStack aren’t marked as in-use. (At least to me it makes sense, that a freed context - which is therefore inaccessible - shouldn’t be retaining items in its opStack into its next re-use) The ideal fix would be to find out *why* the opStack doesn’t return to empty ‘naturally’ before the context is freed [3]
— James Turner
Cquote2.png


Also see: flightgear/flightgear/next/src/Scripting/NasalSys.cxx (in FGNasalSys::update)

    // The global context is a legacy thing.  We use dynamically
    // created contexts for naCall() now, so that we can call them
    // recursively.  But there are still spots that want to use it for
    // naNew*() calls, which end up leaking memory because the context
    // only clears out its temporary vector when it's *used*.  So just
    // junk it and fetch a new/reinitialized one every frame.  This is
    // clumsy: the right solution would use the dynamic context in all
    // cases and eliminate _context entirely.  But that's more work,
    // and this works fine (yes, they say "New" and "Free", but
    // they're very fast, just trust me). -Andy

Consider using coccinelle here to rewrite naContext handling in all places at once


Nasal Context structure

flightgear/simgear/next/simgear/nasal/code.h#l76

struct Context {
	
    // Stack(s)
	
    struct Frame fStack[MAX_RECURSION];
	
    int fTop;
	
    naRef opStack[MAX_STACK_DEPTH];
	
    int opFrame; // like Frame::bp, but for C functions
	
    int opTop;
	
    int markStack[MAX_MARK_DEPTH];
	
    int markTop;
	
	
    // Free object lists, cached from the global GC
	
    struct naObj** free[NUM_NASAL_TYPES];
	
    int nfree[NUM_NASAL_TYPES];
	
	
    // GC-findable reference point for objects that may live on the
	
    // processor ("real") stack during execution.  naNew() places them
	
    // here, and clears the array each instruction
	
    struct naObj** temps;
	
    int ntemps;
	
    int tempsz;
	
	
    // Error handling
	
    jmp_buf jumpHandle;
	
    char error[128];
	
    naRef dieArg;
	
	
    // Sub-call lists
	
    struct Context* callParent;
	
    struct Context* callChild;
	
	
    // Linked list pointers in globals
	
    struct Context* nextFree;
	
    struct Context* nextAll;
	
	
    void* userData;
	
};

Stack frames

flightgear/simgear/next/simgear/nasal/code.h#l33

Tagged structures

flightgear/simgear/next/simgear/nasal/data.h#l84

Core allocations

All core memory allocation takes place via wrappers in misc.c:

void naFree(void* m) { free(m); }	
void* naAlloc(int n) { return malloc(n); }
void* naRealloc(void* b, int n) { return realloc(b, n); }
void naBZero(void* m, int n) { memset(m, 0, n); }