How the Nasal GC works
Nasal internals |
---|
Memory Management (GC) |
Multicore |
---|
Configuration |
Ongoing Efforts |
Proposals & RFCs |
Background |
For developers |
High Level Architecture
See High-Level Architecture for the main article about this subject. |
Regarding Nasal as a scripting language and AW, I'm hopeful that the work I'm doing on HLA will allow us to run Nasal in a separate thread from the FDM and display, so Nasal GC no-longer can impact frame-rates. It would also allow for writing a weather simulation completely external to the FlightGear instance, which could be quite neat. — stuart (Nov 24th, 2015). Re: the real cost of the Nasal Garbage Collector.
(powered by Instant-Cquotes) |
I do like the idea of a stand-alone script interpreter communicating via HLA or some similar IPC, but we still have to consider fragments of script code embedded in models to create or enhance fancy animations. Not all script fragments make sense to run externally. — curt (Dec 28th, 2015). Re: FGPython an propose for Python as an nasal alternative.
(powered by Instant-Cquotes) |
if scripting is a major performance hog, off-load it to a different thread. I don't think we have many situation where it is, but that's what HLA is going to do. — Thorsten (Dec 29th, 2015). Re: FGPython an propose for Python as an nasal alternative.
(powered by Instant-Cquotes) |
haven't made too many tests with Nasal active as anyone can do that (adjust paths to have no terrain if necessary).
But most of this becomes less relevant with the HLA changes that Stuart's working on. — Richard (Nov 24th, 2015). Re: the real cost of the Nasal Garbage Collector.
(powered by Instant-Cquotes) |
Status (02/2016)
The main Nasal repository (maintained by Andy Ross) can also be found at github: https://github.com/andyross/nasal
The latest version in FlightGear is maintained in the SimGear repository.
The garbage collector in Nasal is pretty self-contained and entirely implemented in the file gc.c.
Nasal's garbage collector requires that no other Nasal threads are running, and achieves this by keeping a count of active Nasal threads — Rebecca N. Palmer (2014-10-25). [Flightgear-devel] Nasal threading, naCall() vs naCallMethodCtx().
(powered by Instant-Cquotes) |
Such things may look simple at first. Easy to convert a simple "hello world". But it's very complex when supporting all the features of an interpreted script language. And the funny thing is: you'd still need to worry about automatic garbage collection and count references (though that'd be a lesser issue compared to many others then). So, time wake up... — ThorstenB (Mar 26th, 2011). Re: [Flightgear-devel] Stuttering at 1 Hz rate.
(powered by Instant-Cquotes) |
Patches
there is still the issue that you can't execute Nasal while the GC is running (for that you'd need a concurrent GC which is most likely not so easily implemented). The implication is that if the time required for a collection exceeds the frame dealy you'll still notice it even when the GC runs in a different thread. I have updated my experimental patch from last year, which should also work on Windows now (and it has even been tested there):
— Anders Gidenstam (May 23rd, 2012). Re: [Flightgear-devel] Nasal performance.
(powered by Instant-Cquotes) |
There are algorithms for incremental and/or concurrent and/or parallel garbage collection out there. They most likely not easy to implement and as far as I have seen so far would require (at least for concurrent and /or parallel GC) all writes of pointers to the Nasal heap (and possibly reads) to be redirected via wrapper functions (also known as (GC) read/write barriers). This will not be an easy task but in my opinion it would be a promising option. It might be possible to use a GC module from a GPL:d Java vm or similar. Btw, just running the normal (mutually exclusive) Nasal GC in another thread than the main loop is not hard - but since it is mutually exclusive to executing Nasal functions it doesn't help much when it comes to reducing the worst case latency. The small changes needed to add a separate GC thread are available here:
|
There were some serious issues with my patch - namely that the destruction of some C++ side FG objects kept alive by Nasal references must not be invoked on a different thread. I didn't see the crashes, but they appeared for people using Canvas heavy aircraft.
James did some work to fix these problems but I'm not sure how completely he managed. I still run the patch with his updates, however, and for the aircraft (I don't think I use any canvases, though) I use it works. Running the Nasal GC in a different thread could certainly be revisited, and if the GC updated to an incremental one it could be done without impacting the main loop much.— AndersG (Nov 24th, 2015). Re: the real cost of the Nasal Garbage Collector.
(powered by Instant-Cquotes) |
Running the Nasal GC outside the main thread is already (seemingly) possible but code for it is not in the next branch. I think James fixed the remaining issue a few years back, namely that Nasal/C++ ghost objects must be handed back to the main thread for destruction since they invoke C++-side destructors that touch shared FG state. However, since the Nasal GC (currently) is mutual exclusive to running any Nasal context (using the same heap) this is still not a satisfactory solution once the time needed for a GC approaches the frame duration as the main thread will be blocked waiting for the GC to finish. I've been told this frequently happens for more Nasal heavy use cases, though I have not experienced it on my own ancient (by far GPU-bound) system in my use cases. There are some script use cases that require script code to run in sync with the viewer, e.g. script-based fly by, walk and MP/AI views. For such tightly coupled script code Lua (which, IIRC, offers "real" threads and an incremental GC) or an overhauled incremental and/or concurrent Nasal GC (not a negligible amount of work, though) would seem like options for this type of script code. — Anders Gidenstam (Jan 26th, 2016). Re: [Flightgear-devel] Designing a thread-safe property tree API
(was Re: A FGPythonSys implementation: ...).
(powered by Instant-Cquotes) |
The built-in osgviewer stats can be extended with custom stats, that works by subclassing osg::StatsHandler, this is already done in $FG_SRC/Viewer/FGEventHandler.?xx The class can be extended to add your own stats via osgViewer::StatsHandler::addUserStatsLine() |
Problem
we now know that regular and unpleasant stuttering is caused by Nasals garbage collector
— Robert (2011-04-10). [Flightgear-devel] Nasal Garbage Collector (was Stuttering at 1 Hz
rate).
(powered by Instant-Cquotes) |
the garbage collection pass must be done in a single atomic step, otherwise it would leave the heap in an inconsistent state and adversely affect the scripts.
— Curtis Olson (2011-04-10). Re: [Flightgear-devel] Nasal Garbage Collector (was Stuttering at 1
Hz rate).
(powered by Instant-Cquotes) |
There are algorithms for incremental and/or concurrent and/or parallel garbage collection out there. They most likely not easy to implement and as far as I have seen so far would require (at least for concurrent and /or parallel GC) all writes of pointers to the Nasal heap (and possibly reads) to be redirected via wrapper functions (also known as (GC) read/write barriers).
— Anders Gidenstam (2011-04-10). Re: [Flightgear-devel] Nasal Garbage Collector (was Stuttering at 1
Hz rate).
(powered by Instant-Cquotes) |
This will not be an easy task but in my opinion it would be a promising option. It might be possible to use a GC module from a GPL:d Java vm or similar.
— Anders Gidenstam (2011-04-10). Re: [Flightgear-devel] Nasal Garbage Collector (was Stuttering at 1
Hz rate).
(powered by Instant-Cquotes) |
just running the normal (mutually exclusive) Nasal GC in another thread than the main loop is not hard - but since it is mutually exclusive to executing Nasal functions it doesn't help much when it comes to reducing the worst case latency.
The small changes needed to add a separate GC thread are available here:* http://www.gidenstam.org/FlightGear/misc/test/sg-gc-2.diff — Anders Gidenstam (2011-04-10). Re: [Flightgear-devel] Nasal Garbage Collector (was Stuttering at 1
Hz rate).
(powered by Instant-Cquotes) |
the result is very similar to the standard FG since even if the GC is run on a different thread it still takes the same amount of time and while it is running the main thread will be blocked waiting for it as soon as it tries to run some Nasal code.
Note that there is a spurious change in simgear/nasal/code.c that makes variable creation without the var keyword illegal. I use that to make sure my Nasal scripts are clean in this regard but there are plenty of scripts in fgdata that do not pass that test (and, well, they are still valid Nasal). I'll update the patch file ASAP but please revert that file to the standard one in the meanwhile. — Anders Gidenstam (2011-04-11). Re: [Flightgear-devel] Nasal Garbage Collector (was Stuttering at 1
Hz rate).
(powered by Instant-Cquotes) |
there is still the issue that you can't execute Nasal while the GC is running (for that you'd need a concurrent GC which is most likely not so easily implemented). The implication is that if the time required for a collection exceeds the frame dealy you'll still notice it even when the GC runs in a different thread.
I have updated my experimental patch from last year, which should also work on Windows now (and it has even been tested there):
— Anders Gidenstam (2012-05-23). Re: [Flightgear-devel] Nasal performance.
(powered by Instant-Cquotes) |
It doesn't solve the bigger problem of the current Nasal GC needing to run uninterrupted and in mutual exclusion - but I see very few GC runs in the main loop here (the patch prints "** Nasal GC in main thread **" in the console when that happens) except a few during startup.
— Anders Gidenstam (2012-05-23). Re: [Flightgear-devel] Nasal performance.
(powered by Instant-Cquotes) |
The Nasal language is a dynamic programming language, where memory is not manually allocated and freed by the programmer, but instead the Nasal virtual machine automatically allocates memory from so called "memory pools", every so often these memory pools must be checked, to see how many references (i.e. objects like variables) are still reachable, so that the unreachable ones can be removed from the memory pool and/or so that memory blocks in each pool can be allocated or freed. This is called "garbage management" or "garbage collection".
The current Nasal garbage collector (GC) is known to affect the frame rate in FlightGear and occasionally causes stutter (increased frame spacing) under certain circumstances (i.e. under heavy load, due to complex scripts - such as the Bombable addon or the local weather system.
This is largely attributed to the GC being a fairly simple sequential mark/sweep collector, which needs to stop all operations, recursively mark all reachable objects and free unreachable objects in an exclusive, single-threaded, fashion.
In other words, the time spent in the GC is proportial to the number of active objects and references held (i.e. lines of code), even for objects which may not be running or active - just loaded. For additional background information, please see Improving Nasal.
allocating objects into generations and only mark/reaping from the most recent one on most iterations is a straightforward optimization and will definitely make things faster. — (Andy Ross) FlightGear Devel mailing list[1]
|
Objective
Provide a basic introduction to garbage collection and document how the existing Nasal GC scheme works, so that it can be fully understood, before alternatives (fixes/improvements) or additional implementations are added (feedback and contributions are encouraged and appreciated!).
Also, come up with an approach to generalize the Nasal GC API such that additional GC schemes can be easily implemented next to the native Nasal GC, providing an option to select a custom/experimental GC implementation at startup. Ideally, this should be possible by using a switch property during fgfs startup - such as --prop:/sim/nasal-gc=custom
One of the most straightforward changes would be turning the existing mark/sweep GC into a generational GC using 2-3 generations, where past GC survivors are promoted to the next generation, the GC manager naGC_get() would then by default only request new memory in generation 0, whereas resize() events in gen0 would result in the next higher generation being subject to GC (and so on). The most senior data structures and objects would over time end up in the "eden" generation.
Some of the first steps would include:
- turning naGC_get() into a context-aware void function, so that callers do not need to be aware of the return type
- moving GC specific data structures from the globals struct to a GC struct, and only keep a void* GC pointers in globals
- move GC initialization to gc.c and allow different GCs to register their own init callbacks
- move GC specific declarations from data.h to gc.h (naPool etc)
- move GC specific stuff from misc.c to gc.c (none of the GC internals should be exposed to the rest of the code)
- customize the existing GC to support multiple generations
Suggested readings
The wikipedia article on garbage collection contains some pretty good general GC information: http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
In addition, the following online presentations provide a basic introduction to garbage collection:
- Garbage Collection in realtime games
- Garbage Collector Basics and Performance Hints
- http://www.memorymanagement.org/glossary/g.html#garbage.collection
- Garbage Collection Basics
- http://www.slideshare.net/khuonganpt/basic-garbage-collection-techniques
- http://www.slideshare.net/achinth/garbage-collection-algorithms
- Unified Theory of Garbage Collection
- GC FAQ
- GC algorithms
- http://www.cs.kent.ac.uk/people/staff/rej/gc.html
- http://blogs.msdn.com/b/abhinaba/archive/tags/garbage+collection/
Mark/Sweep
When using a mark-and-sweep collector, unreachable objects are not immediately reclaimed. Instead, garbage is allowed to accumulate until all available memory has been exhausted. When that happens, the execution of the program is suspended temporarily while the mark-and-sweep algorithm collects all the garbage. Once all unreferenced objects have been reclaimed, the normal execution of the program can resume.
Obviously, the main disadvantage of the mark-and-sweep approach is the fact that that normal program execution is suspended while the garbage collection algorithm runs. In particular, this can be a problem in a program that must satisfy real-time execution constraints (like a flight simulator). For example, an interactive application that uses mark-and-sweep garbage collection becomes unresponsive periodically.
The mark-and-sweep algorithm is called a tracing garbage collector because is traces out the entire collection of objects that are directly or indirectly accessible by the program. The objects that a program can access directly are those objects which are referenced by local variables on the processor stack as well as by any static variables that refer to objects. In the context of garbage collection, these variables are called the roots . An object is indirectly accessible if it is referenced by a field in some other (directly or indirectly) accessible object. An accessible object is said to be live . Conversely, an object which is not live is garbage.
The mark-and-sweep algorithm consists of two phases: In the first phase, it finds and marks all accessible objects. The first phase is called the mark phase. In the second phase, the garbage collection algorithm scans through the heap and reclaims all the unmarked objects. The second phase is called the sweep phase.
The mark/sweep algorithm can also be easily expressed using pseudo code, i.e. in Nasal itself:
var garbageCollect = func (roots) {
foreach(var root; roots) {
mark(root);
}
sweep();
}
The mark() function would just recursively check (i.e. calling itself) if the "marked" flag is set or not, and make sure to set the mark bit for all referenced objects:
var mark = func (obj) {
if (!obj.marked) {
obj.marked=1;
foreach( var referenced; referenced_by(obj) )
mark(referenced);
}
}
Finally, the sweep() function (reap() in the Nasal GC), will free all unreachable objects and reclaim the memory:
var sweep = func {
foreach(var obj; allocated_objects) {
if (obj.marked) obj.marked=0;
else reclaim(obj);
}
}
Nasal's implementation of sweep() ( called reap() ) works such that it is always executed for a handful of different type-specific memory pools. In addition, it also makes sure to allocate new memory if required.
Pool storage
A memory pool is basically a preallocated region of memory, which is dynamically resized. The Nasal GC works such that it manages a handful of global memory pools for all native Nasal types (strings, functions, vectors, hashes etc). At the moment, the hard coded defaults ensure that 25-50% of additional object "slots" (memory blocks) are kept available during each execution of reap() [1].
Whenever new memory is requested to create a new type (such as a vector or a string), the available memory in the corresponding pool will be checked, reachable objects will be marked, and dead objects will be removed from all pools using a mark/sweep collector, new memory blocks will be allocated if necessary. All of this happens atomically, i.e. single-threaded, in a "stop-the-world" fashion.
Memory blocks
A Nasal memory pool consists of memory blocks.
A memory block is a singly linked list: flightgear/simgear/next/simgear/nasal/gc.c#l10.
Each block contains a field to store its size, a pointer to the allocated memory, and another pointer to the next block.
struct Block {
int size; // size of the block
char* block; // pointer to the memory region
struct Block* next; // pointer to the next block
};
New memory blocks are allocated using newBlock() and the memory is initialized with 0: flightgear/simgear/next/simgear/nasal/gc.c#l167
static void newBlock(struct naPool* p, int need)
{
int i;
struct Block* newb;
if(need < MIN_BLOCK_SIZE) need = MIN_BLOCK_SIZE;
// allocate a new Block
newb = naAlloc(sizeof(struct Block));
// initialize the Block
newb->block = naAlloc(need * p->elemsz); // number of elements * size of element
newb->size = need; // set block size
// memory blocks are circular linked lists:
newb->next = p->blocks;
p->blocks = newb;
naBZero(newb->block, need * p->elemsz);
if(need > p->freesz - p->freetop) need = p->freesz - p->freetop;
p->nfree = 0;
p->free = p->free0 + p->freetop;
// mark all new blocks as unreachable and add them to the pool's free list
for(i=0; i < need; i++) {
// initialize each new new memory blocks by setting up an naRef (the container for all Nasal references)
struct naObj* o = (struct naObj*)(newb->block + i*p->elemsz);
o->mark = 0;
p->free[p->nfree++] = o;
}
p->freetop += need;
}
Nasal memory pools
For each of the 7 Nasal data types, there is a separate storage pool available, declared as part of the "Globals" structure.
As can be seen, each storage pool is addressed by its enum index (0..6): flightgear/simgear/next/simgear/nasal/data.h#l65
enum { T_STR, T_VEC, T_HASH, T_CODE, T_FUNC, T_CCODE, T_GHOST, NUM_NASAL_TYPES }; // V. important that this come last!
For example, globals->pools[T_VEC] refers to the storage pools for vectors:
globals->pools[T_VEC];
The 7 storage pools are declared in code.h as part of the "Globals" structure: flightgear/simgear/next/simgear/nasal/code.h#l40
The relevant part is:
struct Globals {
// Garbage collecting allocators:
struct naPool pools[NUM_NASAL_TYPES];
int allocCount;
// Dead blocks waiting to be freed when it is safe
void** deadBlocks;
int deadsz;
int ndead;
//...
};
So, we have these pools:
- globals->pools[T_STR]
- globals->pools[T_VEC]
- globals->pools[T_HASH]
- globals->pools[T_CODE]
- globals->pools[T_FUNC]
- globals->pools[T_CCODE]
- globals->pools[T_GHOST]
The Globals structure
The complete Globals structure looks like this:
struct Globals {
// Garbage collecting allocators:
struct naPool pools[NUM_NASAL_TYPES];
int allocCount;
// Dead blocks waiting to be freed when it is safe
void** deadBlocks;
int deadsz;
int ndead;
// Threading stuff
int nThreads;
int waitCount;
int needGC;
int bottleneck;
void* sem;
void* lock;
// Constants
naRef meRef;
naRef argRef;
naRef parentsRef;
// A hash of symbol names
naRef symbols;
naRef save;
struct Context* freeContexts;
struct Context* allContexts;
};
The naPool structure
The structure keeping all pool-related information is named "naPool", the layout of the naPool structure can be seen in data.h: flightgear/simgear/next/simgear/nasal/data.h#l181
struct naPool {
int type; // one of T_STR, T_VEC, T_HASH etc
int elemsz; // size of each element - determined via naTypeSize(T_*)
struct Block* blocks;
void** free0; // pointer to the alloced buffer
int freesz; // size of the alloced buffer
void** free; // current "free frame"
int nfree; // down-counting index within the free frame
int freetop; // curr. top of the free list
};
Each memory pool is initialized using naGC_init(): flightgear/simgear/next/simgear/nasal/gc.c#l192
void naGC_init(struct naPool* p, int type)
{
p->type = type;
p->elemsz = naTypeSize(type);
p->blocks = 0;
p->free0 = p->free = 0;
p->nfree = p->freesz = p->freetop = 0;
reap(p);
}
As can be seen, "elemsz" just stores the size of each type-specific structure, so that it can be used for new allocations.
naGC_init is called in initGlobals() in code.c for each Nasal type (T_STR, T_VEC etc): flightgear/simgear/next/simgear/nasal/code.c#l172
for(i=0; i<NUM_NASAL_TYPES; i++)
naGC_init(&(globals->pools[i]), i);
The size of a memory pool is determined using poolsize(): flightgear/simgear/next/simgear/nasal/gc.c#l203
static int poolsize(struct naPool* p)
{
int total = 0;
struct Block* b = p->blocks;
while(b) { total += b->size; b = b->next; }
return total;
}
The function poolsize() just iterates over all memory blocks in the pool, and computes the total size of the pool by adding up the "size" field of all blocks in the linked list.
Allocating Nasal types
All Nasal types (scalars, vectors, hashes, funcs, ghosts) are allocated via wrappers in misc.c.
These wrappers are:
- naNewString(): flightgear/simgear/next/simgear/nasal/misc.c#l77
- naNewVector(): flightgear/simgear/next/simgear/nasal/misc.c#l87
- naNewHash(): flightgear/simgear/next/simgear/nasal/misc.c#l94
- naNewCode(): flightgear/simgear/next/simgear/nasal/misc.c#l101
- naNewCCode(): flightgear/simgear/next/simgear/nasal/misc.c#l106
- naNewFunc(): flightgear/simgear/next/simgear/nasal/misc.c#l131
- naNewGhost(): flightgear/simgear/next/simgear/nasal/misc.c#l140
- naNewGhost2(): flightgear/simgear/next/simgear/nasal/misc.c#l154
All of these wrappers are really just helpers on top of naNew(), they just set up type-specific information and initialize each Nasal type properly. For instance, first naNew() is called for the given context, along with the type info (to set up the size properly)- and then each type is initialized:
// The naNew* helpers are just wrappers to set up type-specific structures, after allocating the memory using naNew()
naRef naNewString(struct Context* c)
{
naRef s = naNew(c, T_STR);
PTR(s).str->emblen = 0;
PTR(s).str->data.ref.len = 0;
PTR(s).str->data.ref.ptr = 0;
PTR(s).str->hashcode = 0;
return s;
}
naNew()
As can be seen, the key allocator function is naNew(). All memory allocations done via the naNew*() helpers will first of all set up memory using naNew().
naNew() is really just a fancy malloc() replacement, i.e. it's just getting passed the Nasal execution context, and the nasal type (which is mapped to the size of the requested memory region using naTypeSize()).
New nasal objects are added to a temporary bin when they are created, because further allocation might cause a garbage collection to happen before the code that created the object can store a reference to it where the garbage collector can find it. For performance and simplicity, this list is stored per-context. When the context next executes code, it clears this list. Here's the problem: we do a lot of our naNewXX() calls in FlightGear using the old "global context" object that is created at startup. But this context is no longer used to execute scripts* at runtime, so as Csaba discovered, it's temporaries are never flushed. That essentially causes a resource leak: those allocations (mostly listener nodes) will never be freed. And all the extra "reachable" Nasal data floating around causes the garbage collector to take longer and longer to do a full collection as time goes on, causing "stutter". And scripts that use listeners extensively (the cmdarg() they use was one of the affected allocations) will see the problem more seriously. — Andy Ross (Oct 24th, 2007). [Flightgear-devel] Stutter/Nasal issue resolved (was: FlightGear/Plib periodic stutter notes).
(powered by Instant-Cquotes) |
Once listeners were added, scripts could be recursive: (script A sets property X which causes listener L to fire and cause script B to run) We need two or more contexts on the stack to handle that; a single global one won't work. I didn't like the fix though. Exposing the temporary bin as part of the Nasal public API is ugly; it's an internal design feature, not something users should tune. Instead, I just hacked at the FlightGear code to reinitialie this context every frame, thus cleaning it up. A "proper" fix would be to remove the global context entirely, but that would touch a bunch of code. — Andy Ross (Oct 24th, 2007). [Flightgear-devel] Stutter/Nasal issue resolved (was: FlightGear/Plib periodic stutter notes).
(powered by Instant-Cquotes) |
The naNew() function is also implemented in misc.c, see flightgear/simgear/next/simgear/nasal/misc.c#l66:
naRef naNew(struct Context* c, int type)
{
naRef result;
if(c->nfree[type] == 0)
c->free[type] = naGC_get(&globals->pools[type],
OBJ_CACHE_SZ, &c->nfree[type]);
result = naObj(type, c->free[type][--c->nfree[type]]);
naTempSave(c, result);
return result;
}
First, the conditional checks if there's any free memory in the Nasal storage pools for the requested Nasal data type, if there isn't any free memory left, new memory is assigned to the contexts->free[type] pool via naGC_get().
Note: naGC_get() is the only function which really manages all memory pools and which triggers the GC. Allocations via naNew()* will end up calling naNew() and naNew() ends up calling naGC_get() always.
For the time being, OBJ_CACHE_SZ can be ignored, it's defined in code.h: flightgear/simgear/next/simgear/nasal/code.h#l12
// Number of objects (per pool per thread) asked for using naGC_get().
// The idea is that contexts can "cache" allocations to prevent thread
// contention on the global pools. But in practice this interacts
// very badly with small subcontext calls, which grab huge numbers of
// cached objects and don't use them, causing far more collections
// than necessary. Just leave it at 1 pending a rework of the
// collector synchronization.
#define OBJ_CACHE_SZ 1
The final argument to the naGC_get() function is a pointer to to the context's nfree[type] member which is updated by the function.
Next, a new naRef structure is set up using the allocated naObj memory and the type info:
result = naObj(type, c->free[type][--c->nfree[type]]);
naRef naObj(int type, struct naObj* o)
{
naRef r;
SETPTR(r, o);
o->type = type;
return r;
}
The GC interface
TODO: flightgear/simgear/next/simgear/nasal/data.h#l204
- void naGC_init(struct naPool* p, int type);
- struct naObj** naGC_get(struct naPool* p, int n, int* nout);
- void naGC_swapfree(void** target, void* val); (used for vector/hash resizing)
- void naGC_freedead();
- void naiGCMark(naRef r);
- void naiGCMarkHash(naRef h);
The pool manager
All high level type allocators (naNewString, naNewVector, naNewHash etc) make use of the naNew() call introduced earlier.
naNew() doesn't directly allocate new memory from the heap using naAlloc(), but instead uses the naGC_get() helper which manages memory from the global pools (i.e. preallocated memory for each type).
For each native Nasal data type (scalars, vectors, hashes, funcs etc) , there are separate memory pools in globals->pools[n]. The index (n) is one of T_VEC, T_HASH, T_FUNC etc.
The naGC_get() pool "manager" is located in gc.nas: flightgear/simgear/next/simgear/nasal/gc.c#l211
struct naObj** naGC_get(struct naPool* p, int n, int* nout)
{
struct naObj** result;
naCheckBottleneck();
LOCK();
while(globals->allocCount < 0 || (p->nfree == 0 && p->freetop >= p->freesz)) {
globals->needGC = 1;
bottleneck();
}
if(p->nfree == 0)
newBlock(p, poolsize(p)/8);
n = p->nfree < n ? p->nfree : n;
*nout = n;
p->nfree -= n;
globals->allocCount -= n;
result = (struct naObj**)(p->free + p->nfree);
UNLOCK();
return result;
}
As can be seen, the garbage collector itself is triggered by setting the global "needGC" flag to 1 in the globals structure and then calling the bottleneck() function: flightgear/simgear/next/simgear/nasal/gc.c#l105.
The memory region from the global pool is cast back to a pointer (struct naObj**): flightgear/simgear/next/simgear/nasal/gc.c#l226 the "nout" parameter is just there to update c->nfree[type] with the amount of free memory.
In other words, whenever new Nasal types are allocated, new memory from one of the 7 Nasal pools is requested. All of these allocation requests are channeled through naNew() and finally naGC_get(). naGC_get() is the de facto memory manager in Nasal, all memory management is triggered here - new memory block allocations for each pool, as well as garbage collection via the bottleneck() call.
This also means that we basically just need to provide an alternate naGC_get() function, next to the existing one, to add support for alternate GC schemes.
Supporting additional GC implementations
It would probably make sense to change the pool manager interface/signature such that additional implementations can be easily provided and registered, so that merely a GC_CALLBACK pointer must be set during initialization in order to use a different GC scheme:
diff --git a/simgear/nasal/data.h b/simgear/nasal/data.h
index deb4fbd..f3fcaa1 100644
--- a/simgear/nasal/data.h
+++ b/simgear/nasal/data.h
@@ -202,7 +202,7 @@ int naiHash_sym(struct naHash* h, struct naStr* sym, naRef* out);
void naiHash_newsym(struct naHash* h, naRef* sym, naRef* val);
void naGC_init(struct naPool* p, int type);
-struct naObj** naGC_get(struct naPool* p, int n, int* nout);
+void naGC_alloc(struct Context* c, int type, int n);
void naGC_swapfree(void** target, void* val);
void naGC_freedead();
void naiGCMark(naRef r);
diff --git a/simgear/nasal/gc.c b/simgear/nasal/gc.c
index e9d9b7b..3bae313 100644
--- a/simgear/nasal/gc.c
+++ b/simgear/nasal/gc.c
@@ -191,8 +191,10 @@ static int poolsize(struct naPool* p)
return total;
}
-struct naObj** naGC_get(struct naPool* p, int n, int* nout)
+void naGC_alloc(struct Context* c, int type, int n)
{
+ struct naPool* p = &globals->pools[type]; // type-specific memory pool
+ int* nout = &c->nfree[type]; //
struct naObj** result;
naCheckBottleneck();
LOCK();
@@ -208,7 +210,7 @@ struct naObj** naGC_get(struct naPool* p, int n, int* nout)
globals->allocCount -= n;
result = (struct naObj**)(p->free + p->nfree);
UNLOCK();
- return result;
+ c->free[type]=result;
}
static void markvec(naRef r)
diff --git a/simgear/nasal/misc.c b/simgear/nasal/misc.c
index dae0bfe..5caeb76 100644
--- a/simgear/nasal/misc.c
+++ b/simgear/nasal/misc.c
@@ -66,9 +66,7 @@ naRef naStringValue(naContext c, naRef r)
naRef naNew(struct Context* c, int type)
{
naRef result;
- if(c->nfree[type] == 0)
- c->free[type] = naGC_get(&globals->pools[type],
- OBJ_CACHE_SZ, &c->nfree[type]);
+ if(! c->nfree[type] ) naGC_alloc(c, type, OBJ_CACHE_SZ);
result = naObj(type, c->free[type][--c->nfree[type]]);
naTempSave(c, result);
return result;
Disabling the current GC
The overhead caused by managing the memory pools and constantly scanning the heap (mark/reap) can be examined by modifying the naGC_get() function such that it no longer uses any pool memory, but directly allocates via malloc()/naAlloc() - without ever freeing any memory (i.e. leaking). While the fgfs process will eventually get killed this way, the GC will be basically switched off.
struct naObj** naGC_get(struct naPool* p, int n, int* nout)
{
struct naObj** result;
naCheckBottleneck();
LOCK();
#define LEAKY
#ifndef LEAKY
while(globals->allocCount < 0 || (p->nfree == 0 && p->freetop >= p->freesz)) {
globals->needGC = 1;
bottleneck();
}
if(p->nfree == 0)
newBlock(p, poolsize(p)/8);
n = p->nfree < n ? p->nfree : n;
*nout = n;
p->nfree -= n;
globals->allocCount -= n;
result = (struct naObj**)(p->free + p->nfree);
#else
result = (struct naObj**) naAlloc( naTypeSize(n) );
#endif
UNLOCK();
return result;
}
bottleneck()
Whenever new memory from one of the global pools is being requested via naGC_get(), naCheckBottleneck() will also be invoked:
flightgear/simgear/next/simgear/nasal/gc.c#l131
void naCheckBottleneck()
{
if(globals->bottleneck) { LOCK(); bottleneck(); UNLOCK(); }
}
This checks if globals->bottleneck is true, and if it is, it makes sure that the bottleneck() function is only invoked after acquiring a lock.
flightgear/simgear/next/simgear/nasal/gc.c#l100
// Must be called with the main lock. Engages the "bottleneck", where
// all threads will block so that one (the last one to call this
// function) can run alone. This is done for GC, and also to free the
// list of "dead" blocks when it gets full (which is part of GC, if
// you think about it).
static void bottleneck()
{
struct Globals* g = globals;
g->bottleneck = 1;
while(g->bottleneck && g->waitCount < g->nThreads - 1) {
g->waitCount++;
UNLOCK(); naSemDown(g->sem); LOCK();
g->waitCount--;
}
if(g->waitCount >= g->nThreads - 1) {
freeDead();
if(g->needGC) garbageCollect();
if(g->waitCount) naSemUp(g->sem, g->waitCount);
g->bottleneck = 0;
}
}
The actual garbage collection takes place in line 114 where dead objects from the memory pool are freed, and the garbageCollect() function is called if globals.needGC is true.
garbageCollect()
This is the function that does the mark/sweep part using mark() to recursively mark reachable objects, and reap() to collect unreachable objects.
flightgear/simgear/next/simgear/nasal/gc.c#l35
// Must be called with the big lock!
static void garbageCollect()
{
int i;
struct Context* c;
globals->allocCount = 0;
c = globals->allContexts;
while(c) {
for(i=0; i<NUM_NASAL_TYPES; i++)
c->nfree[i] = 0;
for(i=0; i < c->fTop; i++) {
mark(c->fStack[i].func);
mark(c->fStack[i].locals);
}
for(i=0; i < c->opTop; i++)
mark(c->opStack[i]);
mark(c->dieArg);
marktemps(c);
c = c->nextAll;
}
mark(globals->save);
mark(globals->symbols);
mark(globals->meRef);
mark(globals->argRef);
mark(globals->parentsRef);
// Finally collect all the freed objects
for(i=0; i<NUM_NASAL_TYPES; i++)
reap(&(globals->pools[i]));
// Make enough space for the dead blocks we need to free during
// execution. This works out to 1 spot for every 2 live objects,
// which should be limit the number of bottleneck operations
// without imposing an undue burden of extra "freeable" memory.
if(globals->deadsz < globals->allocCount) {
globals->deadsz = globals->allocCount;
if(globals->deadsz < 256) globals->deadsz = 256;
naFree(globals->deadBlocks);
globals->deadBlocks = naAlloc(sizeof(void*) * globals->deadsz);
}
globals->needGC = 0;
}
For each invocation of garbageCollect(), the following takes place:
- For each context:
- The counter for each type-specific pool will be reset to 0: flightgear/simgear/next/simgear/nasal/gc.c#l43
- for each stack frame of the given context, all reachable locals and functions will be marked as reachable: flightgear/simgear/next/simgear/nasal/gc.c#l45
- All instructions in the opcode stack will be marked as reachable from top to bottom: flightgear/simgear/next/simgear/nasal/gc.c#l49
- mark(die->Arg)
- temporary variables are marked reachable
- root references (constants) are marked reachable: flightgear/simgear/next/simgear/nasal/gc.c#l56
- foreach Nasal data type:
- unreachable objects are then collected: flightgear/simgear/next/simgear/nasal/gc.c#l63
- dead blocks are freed and new memory is allocated: flightgear/simgear/next/simgear/nasal/gc.c#l67
- the needGC flag is set to 0
mark()
All reachable objects are recursively marked using tagged naRef structures flightgear/simgear/next/simgear/nasal/data.h#l50, so that unreachable objects can be collected (i.e. because the reftag bit is not set) and freed.
TODO: flightgear/simgear/next/simgear/nasal/gc.c#l223
reap()
This is the collector function called by garbageCollect() which also frees elements and allocates new blocks for the specified pool:
// Collects all the unreachable objects into a free list, and
// allocates more space if needed.
static void reap(struct naPool* p)
{
struct Block* b;
int elem, freesz, total = poolsize(p);
freesz = total < MIN_BLOCK_SIZE ? MIN_BLOCK_SIZE : total;
freesz = (3 * freesz / 2) + (globals->nThreads * OBJ_CACHE_SZ);
if(p->freesz < freesz) {
naFree(p->free0);
p->freesz = freesz;
p->free = p->free0 = naAlloc(sizeof(void*) * p->freesz);
}
p->nfree = 0;
p->free = p->free0;
for(b = p->blocks; b; b = b->next)
for(elem=0; elem < b->size; elem++) {
struct naObj* o = (struct naObj*)(b->block + elem * p->elemsz);
if(o->mark == 0)
freeelem(p, o);
o->mark = 0;
}
p->freetop = p->nfree;
// allocs of this type until the next collection
globals->allocCount += total/2;
// Allocate more if necessary (try to keep 25-50% of the objects
// available)
if(p->nfree < total/4) {
int used = total - p->nfree;
int avail = total - used;
int need = used/2 - avail;
if(need > 0)
newBlock(p, need);
}
}
swapfree()
freeelem()
Frees individual elements and adds freed pointers to the globals free list, implemented in gc.c: flightgear/simgear/next/simgear/nasal/gc.c#l153
static void freeelem(struct naPool* p, struct naObj* o)
{
// Clean up any intrinsic storage the object might have...
switch(p->type) {
case T_STR: naStr_gcclean ((struct naStr*) o); break;
case T_VEC: naVec_gcclean ((struct naVec*) o); break;
case T_HASH: naiGCHashClean ((struct naHash*) o); break;
case T_CODE: naCode_gcclean ((struct naCode*) o); break;
case T_GHOST: naGhost_gcclean((struct naGhost*)o); break;
}
p->free[p->nfree++] = o; // ...and add it to the free list
}
Type destructors
Will be called by freelem(), implemented for all native Nasal types:
- strings (T_STR): naStr_gcclean flightgear/simgear/next/simgear/nasal/string.c#l113
- vectors (T_VEC): naVec_gcclean flightgear/simgear/next/simgear/nasal/vector.c#l22
- hashes (T_HASH): naiGCHashClean flightgear/simgear/next/simgear/nasal/hash.c#l221
- code (T_CODE): naCode_gcclean flightgear/simgear/next/simgear/nasal/gc.c#l136
- ghosts (T_GHOST): naGhost_gcclean flightgear/simgear/next/simgear/nasal/gc.c#l147
Contexts
New nasal objects are added to a temporary
bin when they are created, because further allocation might cause a
garbage collection to happen before the code that created the object
can store a reference to it where the garbage collector can find it.
For performance and simplicity, this list is stored per-context. When
the context next executes code, it clears this list.
Here's the problem: we do a lot of our naNewXX() calls in FlightGear
using the old "global context" object that is created at startup. But
this context is no longer used to execute scripts* at runtime, so as
Csaba discovered, it's temporaries are never flushed. That
essentially causes a resource leak: those allocations (mostly listener
nodes) will never be freed. And all the extra "reachable" Nasal data
floating around causes the garbage collector to take longer and longer
to do a full collection as time goes on, causing "stutter". And
scripts that use listeners extensively (the cmdarg() they use was one
of the affected allocations) will see the problem more seriously.
(That's a feature, not a bug. Once listeners were added, scripts
could be recursive: (script A sets property X which causes listener
L to fire and cause script B to run) We need two or more contexts on
the stack to handle that; a single global one won't work.)
I didn't like the fix though. Exposing the temporary bin as part of
the Nasal public API is ugly; it's an internal design feature, not
something users should tune. Instead, I just hacked at the FlightGear
code to reinitialize this context every frame, thus cleaning it up. A
"proper" fix would be to remove the global context entirely, but that
would touch a bunch of code.
[2] — Andy Ross
|
The problem is the fact that Nasal recycles context objects - it keeps a pool of freed contexts around. At the point we shutdown Nasal, all contexts should be inactive, i.e in the free list (and they are, phew). Free-ed contexts are still considered for GC however, and there seems to be an issue that the opStack (part of the stack machine which implements the core bytecode interpreter of Nasal) doesn’t end up empty in some situations.
So my work around is to clear the opStack (and frameStack) to zero, inside naFreeContext. This then matches the state for a new (non-recycled) context, and ensures that on the next GC cycle, items referenced by the opStack aren’t marked as in-use. (At least to me it makes sense, that a freed context - which is therefore inaccessible - shouldn’t be retaining items in its opStack into its next re-use)
The ideal fix would be to find out *why* the opStack doesn’t return to empty ‘naturally’ before the context is freed
[3] — James Turner
|
- ↑ Andy Ross (22 May 2012). Nasal performance.
- ↑ Andy Ross (Wed, 24 Oct 2007 11:33:36 -0700). [Flightgear-devel] Stutter/Nasal issue resolved (was: FlightGear/Plib periodic stutter notes).
- ↑ James Turner (2019-09-17 17:43:53). [Flightgear-devel] Resetting Nasal (was Re: Implementing aircraft.nas in C++).
Also see: flightgear/flightgear/next/src/Scripting/NasalSys.cxx (in FGNasalSys::update)
// The global context is a legacy thing. We use dynamically
// created contexts for naCall() now, so that we can call them
// recursively. But there are still spots that want to use it for
// naNew*() calls, which end up leaking memory because the context
// only clears out its temporary vector when it's *used*. So just
// junk it and fetch a new/reinitialized one every frame. This is
// clumsy: the right solution would use the dynamic context in all
// cases and eliminate _context entirely. But that's more work,
// and this works fine (yes, they say "New" and "Free", but
// they're very fast, just trust me). -Andy
Consider using coccinelle here to rewrite naContext handling in all places at once
flightgear/simgear/next/simgear/nasal/code.h#l76
struct Context {
// Stack(s)
struct Frame fStack[MAX_RECURSION];
int fTop;
naRef opStack[MAX_STACK_DEPTH];
int opFrame; // like Frame::bp, but for C functions
int opTop;
int markStack[MAX_MARK_DEPTH];
int markTop;
// Free object lists, cached from the global GC
struct naObj** free[NUM_NASAL_TYPES];
int nfree[NUM_NASAL_TYPES];
// GC-findable reference point for objects that may live on the
// processor ("real") stack during execution. naNew() places them
// here, and clears the array each instruction
struct naObj** temps;
int ntemps;
int tempsz;
// Error handling
jmp_buf jumpHandle;
char error[128];
naRef dieArg;
// Sub-call lists
struct Context* callParent;
struct Context* callChild;
// Linked list pointers in globals
struct Context* nextFree;
struct Context* nextAll;
void* userData;
};
Stack frames
flightgear/simgear/next/simgear/nasal/code.h#l33
Tagged structures
flightgear/simgear/next/simgear/nasal/data.h#l84
Core allocations
All core memory allocation takes place via wrappers in misc.c:
void naFree(void* m) { free(m); }
void* naAlloc(int n) { return malloc(n); }
void* naRealloc(void* b, int n) { return realloc(b, n); }
void naBZero(void* m, int n) { memset(m, 0, n); }