Improving Nasal: Difference between revisions

From FlightGear wiki
(Switch to {{gitorious url}} to fix the broken Gitorious link.)
 
(58 intermediate revisions by 5 users not shown)
{{Template:Nasal Internals}}


 
Last update: 07/2013
Last update: 10/2011


As more and more code in FlightGear is moved to the base package and thus implemented in Nasal space, some Nasal related issues have become increasingly obvious.


On the other hand, Nasal has a proven track record of success in FlightGear, and has shown remarkably few significant issues so far. Most of the more prominent issues are related to wider adoption in FlightGear, and thus to more complex features being implemented in Nasal overall, often developed by people without any formal programming training or coding experience, for whom Nasal may be their first programming language. This means that many of the issues we have been seeing are of an algorithmic nature.


So, rather than having Nasal flame wars and talking about "alternatives" like Perl, Python, Javascript or Lua, the idea is to document known Nasal issues so that they can hopefully be addressed eventually.


If you are aware of any major Nasal issues that are not yet covered here, please feel free to add them. It is also a good idea to use the FlightGear bug tracker in such cases: {{create ticket}}
 
= Get rid of the global interpreter context =
 
Source: Andy Ross, Nasal author
 
Year: 2007-2011
 
URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg13028.html
 
'''Problem:''' New nasal objects are added to a temporary
bin when they are created, because further allocation might cause a
garbage collection to happen before the code that created the object
can store a reference to it where the garbage collector can find it.
For performance and simplicity, this list is stored per-context.  When
the context next executes code, it clears this list.
 
Here's the problem: we do a lot of our naNewXX() calls in FlightGear
using the old "global context" object that is created at startup.  But
this context is no longer used to execute scripts* at runtime, so as
Csaba discovered, its temporaries are never flushed.  That
essentially causes a resource leak: those allocations (mostly listener
nodes) will never be freed.  And all the extra "reachable" Nasal data
floating around causes the garbage collector to take longer and longer
to do a full collection as time goes on, causing "stutter".  And
scripts that use listeners extensively (the cmdarg() they use was one
of the affected allocations) will see the problem more seriously.
 
(That's a feature, not a bug.  Once listeners were added, scripts
could be recursive: (script A sets property X which causes listener
L to fire and cause script B to run) We need two or more contexts on
the stack to handle that; a single global one won't work.)
 
I didn't like the fix though.  Exposing the temporary bin as part of
the Nasal public API is ugly; it's an internal design feature, not
something users should tune.  Instead, I just hacked at the FlightGear
code to reinitialize this context every frame, thus cleaning it up.  A
"proper" fix would be to remove the global context entirely, but that
would touch a bunch of code.
 
Also see: http://gitorious.org/fg/flightgear/blobs/next/src/Scripting/NasalSys.cxx (in FGNasalSys::update)
 
    // The global context is a legacy thing.  We use dynamically
    // created contexts for naCall() now, so that we can call them
    // recursively.  But there are still spots that want to use it for
    // naNew*() calls, which end up leaking memory because the context
    // only clears out its temporary vector when it's *used*.  So just
    // junk it and fetch a new/reinitialized one every frame.  This is
    // clumsy: the right solution would use the dynamic context in all
    // cases and eliminate _context entirely.  But that's more work,
    // and this works fine (yes, they say "New" and "Free", but
    // they're very fast, just trust me). -Andy
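
The leak described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: <tt>ToyContext</tt>, <tt>toy_new</tt>, <tt>toy_run</tt> and <tt>toy_simulate</tt> are hypothetical names invented for this illustration and are not part of the Nasal API. The model only captures the one property that matters here: new objects are pinned in a per-context list, and that list is emptied only when the context is used (or reinitialized).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a per-context "temporary bin". All names here are
 * hypothetical; this is not the real Nasal implementation. */
#define TOY_MAX_TEMPS 1024

typedef struct {
    int temps[TOY_MAX_TEMPS];  /* ids of objects pinned as temporaries */
    size_t ntemps;
} ToyContext;

static int toy_next_id = 1;

/* Allocate a new object id and pin it in the context's temporary bin,
 * so that a collection cannot free it before the caller stores it. */
int toy_new(ToyContext *ctx) {
    assert(ctx->ntemps < TOY_MAX_TEMPS);
    int id = toy_next_id++;
    ctx->temps[ctx->ntemps++] = id;
    return id;
}

/* Using (or reinitializing) a context is what empties its bin. */
void toy_run(ToyContext *ctx) {
    ctx->ntemps = 0;
}

/* Simulate `frames` frames with three allocations per frame.
 * Returns how many temporaries are still pinned afterwards. */
size_t toy_simulate(ToyContext *ctx, int frames, int reset_each_frame) {
    for (int f = 0; f < frames; f++) {
        if (reset_each_frame)
            toy_run(ctx);      /* models the per-frame workaround */
        for (int k = 0; k < 3; k++)
            toy_new(ctx);
    }
    return ctx->ntemps;
}
```

In this model, a context that is only ever used for allocation keeps every object pinned (300 entries after 100 frames), while resetting it once per frame bounds the bin to the allocations of a single frame.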
 
= Improve the garbage collector =
Also see: [[How the Nasal GC works]]
 
Year: 2011-2012
 
 
URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg33190.html
 
'''Problem:''' Nasal has a garbage collection problem. One solution to it is - we avoid
Nasal code wherever possible and try to hard-code everything. But Nasal
crops up on a lot of places - complex aircraft such as the Concorde come
to my mind, interactive AI models, lots of really nifty and useful
applications... - so instead of fixing things in a lot of places, one
could also think about it the other way and fix just one thing, i.e. the
garbage collection such that it doesn't hit a single frame. I fully well
realize that dragging out complicated operations across many frames while
everything else keeps changing is at least an order of magnitude more
complicated (about 1/3 of Local Weather deal with precisely that
problem...) - but I don't believe it can't be done at all. It sort of bugs
me a bit that somehow the fault is always supposed to be in using Nasal...
 
I think it's great if we have a discussion where the issues are placed on
the table to give everyone the chance to learn and understand more, and
then reasonably decide what to do. Nasal has advantages and disadvantages,
so has C++, sometimes accessibility and safety are worth a factor 3
performance (to me at least), sometimes not.  But I don't really want to
discuss dogmatics where 'truth' is a priori clear. There is a case for
having high-level routines in Nasal, there's a case to be made to switch
low level workhorses to C++ - and there's always the question of what is
the most efficient way of doing something. But I'm clearly not considering
Nasal-based systems immature or experimental per se.
 
URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg31918.html
 
As discussed in "Stuttering at 1 Hz rate" we now know that regular and
unpleasant stuttering is caused by Nasal's garbage collector.
So I thought about possibilities to improve it.
What if we could decouple the following function as a separate thread, so
that it runs *asynchronously* from the main thread?
This way it would not interfere (or much less) with the main thread and our
fps would be more consistent.
 
This is the function causing the jitter:
In "simgear/nasal/gc.c"
static void garbageCollect()
 
The thread will need to share some of the global variables from the main
thread.
 
 
URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg31919.html
 
I'm not an expert in nasal garbage collection, but I think the problem is
that garbage collection is not something we can divide up into chunks (which
is essentially what threading would do.)  In addition, threading adds a lot
of potential order dependent bugs.
 
In the case of nasal, I believe the garbage collection pass must be done in
a single atomic step, otherwise it would leave the heap in
an inconsistent state and adversely affect the scripts.
 
URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg31637.html
I don't know much about our Nasal implementation, but I suspect that
the garbage collector could be changed to trace only a portion of
Nasal's heap at each invocation, at the risk of increased memory use.
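
As a rough illustration of tracing only a portion of the heap per invocation, here is a toy mark phase driven by a work budget. It is a sketch, not Nasal's collector: the types and functions (<tt>ToyHeap</tt>, <tt>toy_mark_step</tt> and so on) are hypothetical, and the sketch assumes that the object graph does not change between steps. A real interpreter keeps running scripts between steps, so it would additionally have to track pointer writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy heap for an incremental mark phase. All names are hypothetical.
 * Assumption: the object graph is NOT mutated between mark steps. */
#define N_OBJS   8
#define MAX_REFS 2

typedef struct {
    int  refs[MAX_REFS];   /* indices of referenced objects, -1 = none */
    bool marked;
} ToyObj;

typedef struct {
    ToyObj objs[N_OBJS];
    int    gray[N_OBJS];   /* worklist of marked but unscanned objects */
    size_t ngray;
} ToyHeap;

/* Build a chain 0 -> 1 -> ... -> live-1; the rest is unreachable. */
void toy_heap_chain(ToyHeap *h, int live) {
    assert(live >= 1 && live <= N_OBJS);
    for (int i = 0; i < N_OBJS; i++) {
        h->objs[i].refs[0] = (i + 1 < live) ? i + 1 : -1;
        h->objs[i].refs[1] = -1;
        h->objs[i].marked = false;
    }
    h->ngray = 0;
}

/* Start a mark cycle from a single root object. */
void toy_mark_begin(ToyHeap *h, int root) {
    for (int i = 0; i < N_OBJS; i++)
        h->objs[i].marked = false;
    h->objs[root].marked = true;
    h->gray[0] = root;
    h->ngray = 1;
}

/* Scan at most `budget` objects; returns true once marking is complete.
 * Each object is pushed at most once, so the worklist cannot overflow. */
bool toy_mark_step(ToyHeap *h, int budget) {
    while (budget-- > 0 && h->ngray > 0) {
        ToyObj *o = &h->objs[h->gray[--h->ngray]];
        for (int k = 0; k < MAX_REFS; k++) {
            int r = o->refs[k];
            if (r >= 0 && !h->objs[r].marked) {
                h->objs[r].marked = true;
                h->gray[h->ngray++] = r;
            }
        }
    }
    return h->ngray == 0;
}

/* Run a whole mark cycle in budgeted steps; returns the step count. */
int toy_mark_in_steps(ToyHeap *h, int root, int budget) {
    int steps = 1;
    toy_mark_begin(h, root);
    while (!toy_mark_step(h, budget))
        steps++;
    return steps;
}
```

With five live objects and a budget of two objects per step, the cycle is spread over three steps instead of one long pause. Garbage can only be reclaimed after the last step, which is one way to read the "increased memory use" mentioned above.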
 
 
URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg31921.html
 
There are algorithms for incremental and/or concurrent and/or parallel
garbage collection out there. They are most likely not easy to implement and
as far as I have seen so far would require (at least for concurrent and
/or parallel GC) all writes of pointers to the Nasal heap (and possibly
reads) to be redirected via wrapper functions (also known as
(GC) read/write barriers).
 
This will not be an easy task but in my opinion it would be a promising
option. It might be possible to use a GC module from a GPL:d Java vm or
similar.
 
Btw, just running the normal (mutually exclusive) Nasal GC in another
thread than the main loop is not hard - but since it is mutually exclusive
to executing Nasal functions it doesn't help much when it comes to
reducing the worst case latency.
 
The small changes needed to add a separate GC thread are available here:
http://www.gidenstam.org/FlightGear/misc/test/sg-gc-2.diff
http://www.gidenstam.org/FlightGear/misc/test/fg-gc-1.diff
 
 
Also, I had a brief look at exactly which Nasal timers caused a jitter.
And the winner is...
... well, any. Any Nasal timer, even if it's almost empty, will every
now and then consume a much larger amount of time than normal.
Seems to be a general issue with the Nasal execution engine: could be
triggered by Nasal's garbage collector, which every now and then needs
to do extra work - and runs within the context of a normal Nasal call.
It could also be a result of Nasal's critical sections: other threads
may acquire a temporary lock to alter Nasal data structures - which may
block the execution of Nasal timers at certain points. Hmm... Best
practices for debugging a multi-threaded program anyone? :)
 
Concerning the frequency of the jitter: I guess it isn't related to the
FDM at all. It's probably just a result of Nasal complexity. The more
Nasal code is running, the more often/likely garbage collection /
blocking may occur. Frame rate may also influence it: many Nasal timers
run at delay 0 (in every update loop).
 
URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg37308.html
 
A significant part of Nasal-related frame rate impact is
caused by garbage collection. Its delay/jitter only depends on the
number of Nasal objects and their references which need to be searched.
Increasing the number of Nasal objects (code+data) existing in memory
also increases the delay.
 
The amount of Nasal code which is actually being executed only
influences the g/c frequency, i.e. whether the effect is visible every
few seconds vs several times per second.
 
 
URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg37310.html
 
I did look at incremental GC for Nasal last year, but couldn't find a 'simple
enough' generational algorithm. Still happy for someone else to try - the Nasal
GC interface is very clean and self-contained, so quite easy to experiment with
different GC schemes.
 
 
URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg37338.html


But as I said, I think really the GC needs to be addressed. There's only so
much hacking around the actual problem one can do.


URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg37338.html

Right, a problem is that I've possibly studied all Nasal documentation I could
get without finding any reference of the GC problem - that was only transmitted
to me much later. I think you'll find that most Nasal users are not aware of
any such problems, because it's not documented anywhere. It doesn't help so
much if you are aware of it.


URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg37338.html

the current GC is bad, and big Nasal shows this while small Nasal doesn't.


URL: http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg37338.html

We have an implementation of Nasal which dumps all the GC into a single frame
and is apparently sensitive to the total amount of code, regardless if the code
is actually run or not. This fact has historically not been widely advertized
or explained. That turns out to be a problem.

The way this usually comes across is 'Advanced Weather causes stutter'. But it
actually doesn't really (or at least that remains to be shown) - what causes
stutter is mainly the GC, and Advanced Weather just happens to trigger this.
The range of suggested solutions in the past included almost everything, from
avoiding Nasal to porting code to Nasal to hacking around the problem to
loading things on-demand - except fixing the actual cause of the problems.

I don't honestly know how complex code to collect garbage across many frames
is, but somehow I doubt that in terms of man-hours the effort beats porting the
existing large-scale Nasal codes to C++. Just my 2 cents in any case.


= Consider Opcode reordering =
Referring to switch/case constructs like these (also to be found in Andy's code):
<syntaxhighlight lang="c">
switch(BYTECODE(code)[i]) {
        case OP_PUSHCONST: case OP_MEMBER: case OP_LOCAL:
        case OP_JIFTRUE: case OP_JIFNOT: case OP_JIFNOTPOP:
        case OP_JIFEND: case OP_JMP: case OP_JMPLOOP:
        case OP_FCALL: case OP_MCALL:
            naVec_append(bytecode, naNum(BYTECODE(code)[++i]));
</syntaxhighlight>
Such logic can be expressed more easily by simply wrapping these OP codes in between BEGIN_IMMEDIATE_MODE and END_IMMEDIATE_MODE enums, because we then only need to do this:

<syntaxhighlight lang="c">
/* The argument is parenthesized and used on both sides of the range check */
#define IS_IMMEDIATE_MODE(op) ((op) > BEGIN_IMMEDIATE_MODE && (op) < END_IMMEDIATE_MODE)
if ( IS_IMMEDIATE_MODE(BYTECODE(code)[i]) )
  naVec_append(bytecode, naNum(BYTECODE(code)[++i]));
#undef IS_IMMEDIATE_MODE
</syntaxhighlight>

This basically means that we only need to worry about a single place when it comes to extending opcodes (and checking in run() that these invalid opcodes aren't used), which also translates into fewer assembly instructions actually being run (2 CMP vs. ~12 per insn). The bytecode interpreter routine itself could be simplified that way, too. In addition, it would make sense to augment the list of opcode enums by adding an OP_VERSION field that is incremented whenever opcodes are added/removed (which would also be a prerequisite for any caching/serialization schemes):

<syntaxhighlight lang="c">
enum {
    OPCODE_VERSION = 0x01, // for serialization and versioning (e.g. caching bytecode)
    BEGIN_OPS=0xFF,    // reserve space for 255 opcode changes (should be plenty)
    OP_NOT, OP_MUL, OP_PLUS, OP_MINUS, OP_DIV, OP_NEG, OP_CAT, OP_LT, OP_LTE,
    OP_GT, OP_GTE, OP_EQ, OP_NEQ, OP_EACH, OP_JMP, OP_JMPLOOP, OP_JIFNOTPOP,

    BEGIN_IMMEDIATE_MODE,
    OP_JIFEND, OP_FCALL, OP_MCALL, OP_RETURN, OP_PUSHCONST, OP_PUSHONE,
    END_IMMEDIATE_MODE, //FIXME: incomplete - just intended as an example!

    OP_PUSHZERO, OP_PUSHNIL, OP_POP, OP_DUP, OP_XCHG, OP_INSERT, OP_EXTRACT,
    OP_MEMBER, OP_SETMEMBER, OP_LOCAL, OP_SETLOCAL, OP_NEWVEC, OP_VAPPEND,
    OP_NEWHASH, OP_HAPPEND, OP_MARK, OP_UNMARK, OP_BREAK, OP_SETSYM, OP_DUP2,
    OP_INDEX, OP_BREAK2, OP_PUSHEND, OP_JIFTRUE, OP_JIFNOT, OP_FCALLH,
    OP_MCALLH, OP_XCHG2, OP_UNPACK, OP_SLICE, OP_SLICE2,
    NUM_OPCODES
};
</syntaxhighlight>

(The same technique could be used to unify and optimize BINOP handling and some other instructions)

At which point it would make sense to add a pre-populated environment hash like '''runtime''' - with build-time constants, like these:

* OP_VERSION
* MAX_STACK_DEPTH 512
* MAX_RECURSION 128
* MAX_MARK_DEPTH 128
* OBJ_CACHE_SZ 1
* REFMAGIC
* BIG_ENDIAN/LITTLE_ENDIAN

Having access to these would help provide better support for backwards compatibility.


= Better debugging/development support =


Besides making a full IDE (which would be ''really'' cool), there are several things that can be done by editing the source code of Nasal to enhance debugging support and speed up development:


* add build time/runtime sanity checks for Nasal core internals, especially naRef/GC stuff like Andy's pointer hacks, which did cause problems in the past, especially with respect to aggressive compiler optimizations and naHash - see {{Issue|1240}} and Philosopher's comments - in the meantime, consider making naRef stuff '''volatile''' and using gcc attributes to disable any/all optimizations here [http://gcc.gnu.org/wiki/FunctionSpecificOpt] [http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html] {{Not done}}
* Being able to dump the global namespace (see [http://forum.flightgear.org/viewtopic.php?f=30&t=19049&p=182930&#p182930 this topic] for a possible solution) or at least dump things prettily (an unreleased version of the file discussed in [[Nasal Meta-Programming]] has good support for nice formatting). This should probably be a lazy API that can dump an arbitrary namespace recursively - using the canvas, we could then map that to a TreeView
* Register a callback for handling errors using call() (for parser errors it will need the AST, for runtime errors it would need bytecode access)
* work on abstracting the GC interface (Hooray) {{Progressbar|50}}
* Register a callback for OP_FCALL et al. to be able to time function calls {{Progressbar|80}}. Example [{{gitorious url|nasal-standalone|nasal-experiments|branch=extended-f_call|path=test.nas}}].
* Set breakpoints: register callbacks for values of <tt>(struct Frame*)->ip</tt>.
** Typically supported conditional break point types are [http://www.ofb.net/gnu/gdb/gdb_28.html][http://www.delorie.com/gnu/docs/gdb/gdb_29.html][http://winappdbg.sourceforge.net/HowBreakpointsWork.html]:
** onNamespace, onFile, onLine, onValue, onTypeChange, onRead, onWrite, onExecute, onOpcode, onAlloc, onFree (most of these could be implemented in scripting space meanwhile)
* Time other parts of Nasal (not just VM) with a compile-time flag? (could be stored in the Context struct, so that sub contexts would have their own flags, i.e. recursive scripts would not affect each other)
* Also, add some form of Context-based debug/log-level flag for different verbosity levels and phases (parse,codegen,vm,gc) - and maybe don't write it directly to the console, but allow a container/callback to be specified - for better integration/processing by the host app (fgfs)
* Better error messages {{Progressbar|30}}.
** '''Parsing:''' Say something other than "parse error", like "null pointer".
** '''VM:''' Indicate type of variable if wrong type.
** '''Both:''' Could we give line ''and'' column? Note: I tried this and failed (even just copying Andy's code I got random numbers sometimes). It's a big patch...
** Most of these diagnostics could be delegated to Nasal space by using some of our C-space hooks that expose the parser/codegen and VM
* Working with bytecode:
** Expose to Nasal: {{Done}}
** Decompile to text: partial (untested) {{Progressbar|70}}
** Optimize: not started (it makes only sense to look at optimizations after we're able to instrument/profile a running FG session to come up with hot spots that are executed either frequently, or that are responsible for significant runtime overhead - i.e. due to GC pressure or other issues)
** Working with it: provide Bytecode class in Nasal: not started<!-- (the exposeOpcode() API already exists, most other machinery could be built in scripting space on top of it?)-->
* Inspect Context: {{Progressbar|80}} (passed as argument to callbacks).
* Expose Tokens to Nasal: implemented by Hooray as argument to compile(), should be extended to cover output from lex.c and after blocking in addition to the current after-prec-ing (and before freeing!) support. {{Progressbar|70}}
** compile() being used by call(), it should be straightforward to also map a call() hash.callback to do the same thing here - so that there's no disparity here.
** Yeah, just another entry in naContext->extras, right? (affirmative)
* Real function name support via assignment:
** Option 1: look at the parse tree and check if the right side of the assignment is a function. If so, go ahead and parse the function with the left side as the "name" of the function instead of falling through to more recursions of genExpr().
** Option 2: recognize assignment in the VM and if there is a bindToContext event, set the name of the function based upon either the last LOCAL/MEMBER/HINSERT or the combination of them (i.e. complex lvalues like local.fn). This presents some obvious issues, however:
*** The right-hand side of an assignment is done before the left-hand side, thus one would have to look ahead to see the assignment, which is clearly illegal for the VM to do.
*** Or one could look behind to see a naCode constant being pushed, and give some indication to its naFunc that it now has a name. This is still somewhat illegal, but not dangerous and thus could be done.
** Option 3: abandon <tt>var foo = func(){}</tt> for ECMAscript-like function declaration syntax <tt>function foo() {}</tt>. This would not affect the use of anonymous func expressions but would instead be applicable in cases where we want to say "this function is static (i.e. permanent) and should have a name" (as opposed to the case of temporary storage variables for functions). Regardless of the method used, a name member will have to be added to naFunc's and the VM and error handling procedures will have to be changed accordingly.
** Regarding the last comment: Providing an API to "lock" a symbol/naRef to become immutable would be generally useful, not just for functions - but also for constants (math.pi FT2M etc) and other stuff that may otherwise break consistency - ThorstenR mentioned a couple of times how he's intentionally replicating standard constants in LW/AW just to be on the safe side, because there's no such thing as a "constant" in Nasal. Providing a library function to make naRefs read-only should be straightforward, and could be easily implemented by hooking into the VM to register a callback that yields naRuntimeError() -aka die()- for any such attempts. The method would be scalable to also implement optional typing or min/max/stepping (value ranges), too.
** This should be doable, since I do a naRuntimeError(ctx,naGetError(subcontext)) which ''should'' pass errors from a callback (untested).
** To implement support for immutable symbols (constants/ private/protected encapsulation), one would really just need to either lock WRITE access or restrict visibility, which could work analogous to parents, just as embedded protected/private hashes that are honored by the codegen.
* Getting stats on Nasal performance while running: use the callbacks and systime to time things. Need hooks into the GC as well. Statistics worth tracking (also look at similar tools like [http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile.html google perftools]):
** Per function (also handles timers/listeners and other typical FG callbacks):
*** ncalls per frame/cumulative
*** time per call (kept as list) {{Done}} – display min/max/avg/cumulative.
*** number of GC invocations & avg/min/max time
*** number of naNew() calls
*** number & list of names of once-use variables {{Done}}; n that are numbers (i.e. non-GC-managed).
*** try to come up with heuristics to track *identical* temporaries per callback invocation: [[User:Philosopher/Howto:Write Optimized Nasal Code]]
*** maybe some holistic "GC pressure" percentage over time (5,30,60,300 seconds ?)
*** GC pressure can be computed by looking not just at new allocations, but also at realloc() events and the mark/reap phases
*** for GC stats, we can also easily access 1) size of all allocated naType pools and 2) percentage that's in use and 3) newBlock() allocations
** Per context/global:
*** time spent
*** number of GC invocations:
*** naContext leaks (see Andy's comments below)
**** per frame
**** min/max/avg
** Data per function call (not displayed, dumped to a file - maybe just accept an optional callback here too?):
*** caller line/name
** threading (eventually)
*** locking overhead


= Performance / Optimizations =
* a bunch of performance issues were reported [http://www.mail-archive.com/flightgear-devel@lists.sourceforge.net/msg36668.html] to be related to:
** accidentally registering listeners/callbacks twice without noticing (or even more often)
** never freeing timers/listeners, i.e. we could make sure to issue a warning if a listener's ghost is GC'ed while the listener is still active, because there is then no way to clean up the listener
** aircraft/addon scripts setting up listeners and timers without registering a /sim/signals/reset handler that handles cleanup
** always letting timers run at frame rate
* at least overload the settimer/setlistener API to support a singleton/ONCE param that issues a warning once the VM determines that multiple instances were registered?
* Hooray: look into adapting the existing GC scheme to support multiple generations - which is a straightforward optimization even without being a GC expert, it basically boils down to having a single typedef enum {GC_GEN0, GC_GEN1} GENERATION; in code.h and then changing all places in $SG_NASAL to use the GC_GEN0 pool by default, i.e. instead of having &globals->pools[type]; - we would have &globals->pools[GC_GEN0][type]; for starters - the GC manager would then by default only mark/reap the GEN0 (nursery, young generation), promote any objects that survived the GC phase to GEN1, and only ever mark/reap GEN1 if GEN0 has to be resized (start off with reasonably sized generations, based on real stats - e.g. GEN0 16 MB and GEN1 32MB). {{Not done}}
** http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Generational_GC_.28ephemeral_GC.29
** http://c2.com/cgi/wiki?GenerationalGarbageCollection
** http://blogs.msdn.com/b/abhinaba/archive/2009/03/02/back-to-basics-generational-garbage-collection.aspx
* other dynamic languages like lua or python have GC hooks to customize the GC and to call it on demand: [http://lua-users.org/wiki/GarbageCollectionTutorial], [http://luatut.com/collectgarbage.html]
* another straightforward optimization would be exposing an API to allocate new objects in a certain generation (GEN0/GEN1) to directly tell the interpreter about the object's lifetime ( until reset/reinit, until aircraft change, timer/frame based).
* marking/reaping could be parallelized using several threads, for each pool - by using write barriers to sync access to naRefs
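
The generational idea from the list above (pools indexed by generation, collect only GEN0, promote survivors to GEN1) could look roughly like the following toy model. It is a sketch under loud assumptions: apart from the <tt>GENERATION</tt> enum quoted above, all types and functions are hypothetical, pools are reduced to simple object counters, and reachability is supplied as a flag instead of being computed by a real mark phase. A real generational collector would also have to track references from old objects to young ones, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* GENERATION follows the enum proposed in the text; everything else
 * in this toy model is hypothetical. */
typedef enum { GC_GEN0, GC_GEN1, GC_NUM_GENERATIONS } GENERATION;
typedef enum { T_STR, T_VEC, T_HASH, T_NUM_TYPES } ToyType;

#define TOY_MAX_OBJS 64

typedef struct {
    ToyType    type;
    GENERATION gen;
    bool       in_use;
    bool       reachable;  /* stands in for the result of a mark phase */
} ToyObj;

typedef struct {
    ToyObj objs[TOY_MAX_OBJS];
    int    pools[GC_NUM_GENERATIONS][T_NUM_TYPES];  /* objects per pool */
} ToyGlobals;

void toy_init(ToyGlobals *g) {
    for (int i = 0; i < TOY_MAX_OBJS; i++)
        g->objs[i].in_use = false;
    for (int gen = 0; gen < GC_NUM_GENERATIONS; gen++)
        for (int t = 0; t < T_NUM_TYPES; t++)
            g->pools[gen][t] = 0;
}

/* New objects always start in the young generation (GEN0).
 * Returns the object index, or -1 if the toy heap is full. */
int toy_alloc(ToyGlobals *g, ToyType type) {
    for (int i = 0; i < TOY_MAX_OBJS; i++) {
        ToyObj *o = &g->objs[i];
        if (!o->in_use) {
            o->in_use = true;
            o->reachable = false;
            o->type = type;
            o->gen = GC_GEN0;
            g->pools[GC_GEN0][type]++;
            return i;
        }
    }
    return -1;
}

/* Minor collection: look at GEN0 only. Survivors are promoted to GEN1,
 * everything else in GEN0 is freed. Returns the number of freed objects. */
int toy_minor_collect(ToyGlobals *g) {
    int freed = 0;
    for (int i = 0; i < TOY_MAX_OBJS; i++) {
        ToyObj *o = &g->objs[i];
        if (!o->in_use || o->gen != GC_GEN0)
            continue;
        g->pools[GC_GEN0][o->type]--;
        if (o->reachable) {
            o->gen = GC_GEN1;
            g->pools[GC_GEN1][o->type]++;
        } else {
            o->in_use = false;
            freed++;
        }
    }
    return freed;
}
```

The point of the sketch is that a minor collection never touches GEN1 objects, so its cost depends on the number of recently allocated objects rather than on the total amount of long-lived data.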


== Separate GC implementations ==
* [http://engineering.twitter.com/search?q=Garbage+Collector+ Ruby Enterprise Edition] - performance blog
* [http://www.hpl.hp.com/personal/Hans_Boehm/gc/ Boehm GC]
* [http://www.dekorte.com/projects/opensource/libgarbagecollector/ libgarbagecollector]
* [http://www.friday.com/bbum/2008/11/11/autozone-the-objective-c-garbage-collector/ AutoZone]
* http://mono-project.com/Generational_GC
* http://mono-project.com/Working_With_SGen
* http://www.mono-project.com/Compacting_GC
* http://www.utdallas.edu/~ramakrishnan/Projects/GC_for_C/index.htm

== Expose additional threading primitives ==
Consider using pthreads; Nasal's threading support is extremely basic [http://plausible.org/nasal/lib.html].

[[Category:GSoC]]
[[Category:Core developer documentation]]
[[Category:Nasal]]
[[Category:Developer Plans]]

Latest revision as of 16:45, 9 March 2016


Last update: 07/2013

As more and more code in FlightGear is moved to the base package and thus implemented in Nasal space, some Nasal related issues have become increasingly obvious.

On the other hand, Nasal has a proven track record of success in FlightGear, and has shown remarkably few significant issues so far. Most of the more prominent issues are related to wider adoption in FlightGear, and thus to more complex features being implemented in Nasal overall, often developed by people without any formal programming training or coding experience, for whom Nasal may be their first programming language. This means that many of the issues we have been seeing are of an algorithmic nature.

So, rather than having Nasal flame wars and talking about "alternatives" like Perl, Python, Javascript or Lua, the idea is to document known Nasal issues so that they can hopefully be addressed eventually.

If you are aware of any major Nasal issues that are not yet covered here, please feel free to add them. It is also a good idea to use the FlightGear bug tracker in such cases: Create an issue tracker ticket

Consider Opcode reordering

Referring to switch/case constructs like these (also to be found in Andy's code):

switch(BYTECODE(code)[i]) {
        case OP_PUSHCONST: case OP_MEMBER: case OP_LOCAL:
        case OP_JIFTRUE: case OP_JIFNOT: case OP_JIFNOTPOP:
        case OP_JIFEND: case OP_JMP: case OP_JMPLOOP:
        case OP_FCALL: case OP_MCALL:
            naVec_append(bytecode, naNum(BYTECODE(code)[++i]));

Such logic can be expressed more easily by simply wrapping these OP codes in between BEGIN_IMMEDIATE_MODE and END_IMMEDIATE_MODE enums, because we then only need to do this:

#define IS_IMMEDIATE_MODE(op) ((op) > BEGIN_IMMEDIATE_MODE && (op) < END_IMMEDIATE_MODE)
if ( IS_IMMEDIATE_MODE(BYTECODE(code)[i]) )
  naVec_append(bytecode, naNum(BYTECODE(code)[++i]));
#undef IS_IMMEDIATE_MODE

Which basically means that we only need to worry about a single place when it comes to extending opcodes (and checking in run() that these invalid opcodes aren't used), which also translates into fewer assembly instructions that are actually run (2 CMP vs. ~12 per insn). Also, the bytecode interpreter routine itself could be simplified that way, too. In addition, it would make sense to augment the list of opcode enums by adding an OP_VERSION field that is incremented once opcodes are added/removed (which would be a prerequisite for any caching/serialization schemes too):

enum {
    OPCODE_VERSION = 0x01, // for serialization and versioning (e.g. caching bytecode)
    BEGIN_OPS=0xFF,    // reserve space for 255 opcode changes (should be plenty)
    OP_NOT, OP_MUL, OP_PLUS, OP_MINUS, OP_DIV, OP_NEG, OP_CAT, OP_LT, OP_LTE,
    OP_GT, OP_GTE, OP_EQ, OP_NEQ, OP_EACH, OP_JMP, OP_JMPLOOP, OP_JIFNOTPOP,

    BEGIN_IMMEDIATE_MODE,
     OP_JIFEND, OP_FCALL, OP_MCALL, OP_RETURN, OP_PUSHCONST, OP_PUSHONE,
    END_IMMEDIATE_MODE, //FIXME: incomplete - just intended as an example!

    OP_PUSHZERO, OP_PUSHNIL, OP_POP, OP_DUP, OP_XCHG, OP_INSERT, OP_EXTRACT,
    OP_MEMBER, OP_SETMEMBER, OP_LOCAL, OP_SETLOCAL, OP_NEWVEC, OP_VAPPEND,
    OP_NEWHASH, OP_HAPPEND, OP_MARK, OP_UNMARK, OP_BREAK, OP_SETSYM, OP_DUP2,
    OP_INDEX, OP_BREAK2, OP_PUSHEND, OP_JIFTRUE, OP_JIFNOT, OP_FCALLH,
    OP_MCALLH, OP_XCHG2, OP_UNPACK, OP_SLICE, OP_SLICE2, 
    NUM_OPCODES
};

(The same technique could be used to unify and optimize BINOP handling and some other instructions)

At which point it would make sense to add a pre-populated environment hash like runtime - with build-time constants, like these:

  • OP_VERSION
  • MAX_STACK_DEPTH 512
  • MAX_RECURSION 128
  • MAX_MARK_DEPTH 128
  • OBJ_CACHE_SZ 1
  • REFMAGIC
  • BIG_ENDIAN/LITTLE_ENDIAN

Having access to these would help provide better support for backwards compatibility
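
The range-check idea from this section can be tried out in isolation. The following is a self-contained sketch with a deliberately tiny, made-up opcode set: the opcode names are borrowed from the enum above, but their grouping, the <tt>unsigned short</tt> code format and the <tt>count_instructions</tt> helper are assumptions for illustration only, not the real Nasal layout.

```c
#include <assert.h>
#include <stddef.h>

/* Tiny, made-up opcode set. Opcodes that carry an immediate operand are
 * grouped between two sentinel values, which are never valid opcodes. */
enum {
    OP_NOT, OP_PLUS, OP_RETURN,
    BEGIN_IMMEDIATE_MODE,
    OP_PUSHCONST, OP_JMP, OP_FCALL,
    END_IMMEDIATE_MODE,
    OP_POP,
    NUM_OPCODES
};

/* Two comparisons replace a long list of case labels. */
#define IS_IMMEDIATE_MODE(op) \
    ((op) > BEGIN_IMMEDIATE_MODE && (op) < END_IMMEDIATE_MODE)

/* Count the instructions in a code array, skipping immediate operands. */
size_t count_instructions(const unsigned short *code, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (IS_IMMEDIATE_MODE(code[i]))
            i++;  /* the next slot is this instruction's operand */
        n++;
    }
    return n;
}
```

For the sequence PUSHCONST 7, PUSHCONST 3, PLUS, RETURN this yields four instructions, even though the operand 7 happens to have the same numeric value as END_IMMEDIATE_MODE. Note that the sentinels themselves consume two values of the opcode space.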

Better debugging/development support

Besides making a full IDE (which would be really cool), there are several things that can be done by editing the source code of Nasal to enhance debugging support and speed up development:

  • add build time/runtime sanity checks for Nasal core internals, especially naRef/GC stuff like Andy's pointer hacks, which did cause problems in the past, especially with respect to aggressive compiler optimizations and naHash - see ticket #1240 and Philosopher's comments - in the meantime, consider making naRef stuff volatile and using gcc attributes to disable any/all optimizations here [1] [2] Not done
  • Being able to dump the global namespace (see this topic for a possible solution) or at least dump things prettily (an unreleased version of the file discussed in Nasal Meta-Programming has good support for nice formatting). This should probably be a lazy API that can dump an arbitrary namespace recursively - using the canvas, we could then map that to a TreeView
  • Register a callback for handling errors using call() (for parser errors it will need the AST, for runtime errors it would need bytecode access)
  • work on abstracting the GC interface (Hooray) 50% completed
  • Register a callback for OP_FCALL et al. to be able to time function calls 80% completed. Example [3].
  • Set breakpoints: register callbacks for values of (struct Frame*)->ip.
    • Typically supported conditional break point types are [4][5][6]:
    • onNamespace, onFile, onLine, onValue, onTypeChange, onRead, onWrite, onExecute, onOpcode, onAlloc, onFree (most of these could be implemented in scripting space meanwhile)
  • Time other parts of Nasal (not just VM) with a compile-time flag? (could be stored in the Context struct, so that sub contexts would have their own flags, i.e. recursive scripts would not affect each other)
  • Also, add some form of Context-based debug/log-level flag for different verbosity levels and phases (parse,codegen,vm,gc) - and maybe don't write it directly to the console, but allow a container/callback to be specified - for better integration/processing by the host app (fgfs)
  • Better error messages 30% completed.
    • Parsing: Say something other than "parse error", like "null pointer".
    • VM: Indicate type of variable if wrong type.
    • Both: Could we give line and column? Note: I tried this and failed (even just copying Andy's code I got random numbers sometimes). It's a big patch...
    • Most of these diagnostics could be delegated to Nasal space by using some of our C-space hooks that expose the parser/codegen and VM
  • Working with bytecode:
    • Expose to Nasal: Done
    • Decompile to text: partial (untested) 70% completed
    • Optimize: not started (it only makes sense to look at optimizations after we're able to instrument/profile a running FG session to come up with hot spots that are either executed frequently or responsible for significant runtime overhead, e.g. due to GC pressure or other issues)
    • Working with it: provide Bytecode class in Nasal: not started
  • Inspect Context: 80% completed (passed as argument to callbacks).
  • Expose Tokens to Nasal: implemented by Hooray as argument to compile(), should be extended to cover output from lex.c and after blocking in addition to the current after-prec-ing (and before freeing!) support. 70% completed
    • compile() being used by call(), it should be straightforward to also map a call() hash.callback to do the same thing here - so that there's no disparity here.
    • Yeah, just another entry in naContext->extras, right? (affirmative)
  • Real function name support via assignment:
    • Option 1: look at the parse tree and check if the right side of the assignment is a function. If so, go ahead and parse the function with the left side as the "name" of the function instead of falling through to more recursions of genExpr().
    • Option 2: recognize assignment in the VM and if there is a bindToContext event, set the name of the function based upon either the last LOCAL/MEMBER/HINSERT or the combination of them (i.e. complex lvalues like local.fn). This presents some obvious issues, however:
      • The right-hand side of an assignment is done before the left-hand side, thus one would have to look ahead to see the assignment, which is clearly illegal for the VM to do.
      • Or one could look behind to see a naCode constant being pushed, and give some indication to its naFunc that it now has a name. This is still somewhat illegal, but not dangerous and thus could be done.
    • Option 3: abandon var foo = func(){} for ECMAScript-like function declaration syntax function foo() {}. This would not affect the use of anonymous func expressions but would instead be applicable in cases where we want to say "this function is static (i.e. permanent) and should have a name" (as opposed to the case of temporary storage variables for functions). Regardless of the method used, a name member will have to be added to naFunc's and the VM and error handling procedures will have to be changed accordingly.
    • Regarding the last comment: Providing an API to "lock" a symbol/naRef to become immutable would be generally useful, not just for functions - but also for constants (math.pi, FT2M etc.) and other stuff that may otherwise break consistency - ThorstenR mentioned a couple of times how he's intentionally replicating standard constants in LW/AW just to be on the safe side, because there's no such thing as a "constant" in Nasal. Providing a library function to make naRefs read-only should be straightforward, and could be easily implemented by hooking into the VM to register a callback that yields naRuntimeError() (aka die()) for any such attempts. The method would be scalable to also implement optional typing or min/max/stepping (value ranges), too.
    • This should be doable, since I do a naRuntimeError(ctx,naGetError(subcontext)) which should pass errors from a callback (untested).
    • To implement support for immutable symbols (constants, private/protected encapsulation), one would really just need to either lock WRITE access or restrict visibility, which could work analogously to parents, i.e. as embedded protected/private hashes that are honored by the codegen.
  • Getting stats on Nasal performance while running: use the callbacks and systime to time things. Need hooks into the GC as well. Statistics worth tracking (also look at similar tools like google perftools):
    • Per function (also handles timers/listeners and other typical FG callbacks):
      • ncalls per frame/cumulative
      • time per call (kept as list) Done - display min/max/avg/cumulative.
      • number of GC invocations & avg/min/max time
      • number of naNew() calls
      • number & list of names of once-use variables Done; n that are numbers (i.e. non-GC-managed).
      • try to come up with heuristics to track *identical* temporaries per callback invocation: User:Philosopher/Howto:Write Optimized Nasal Code
      • maybe some holistic "GC pressure" percentage over time (5,30,60,300 seconds ?)
      • GC pressure can be computed by looking not just at new allocations, but also at realloc() events and the mark/reap phases
      • for GC stats, we can also easily access 1) size of all allocated naType pools and 2) percentage that's in use and 3) newBlock() allocations
    • Per context/global:
      • time spent
      • number of GC invocations:
      • naContext leaks (see Andy's comments below)
        • per frame
        • min/max/avg
    • Data per function call (not displayed, dumped to a file - maybe just accept an optional callback here too?):
      • caller line/name
    • threading (eventually)
      • locking overhead
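
The per-function timing statistics from the list above (ncalls, min/max/avg/cumulative time per call) can be sketched as a small accumulator. This is only an illustration: struct CallStats and the stats_* functions are hypothetical names, and the durations are assumed to come from a call/return callback that reads a systime-like clock:

```c
#include <assert.h>

/* Hypothetical per-function accumulator; one instance per Nasal function. */
struct CallStats {
    unsigned long ncalls;  /* number of recorded calls */
    double total;          /* cumulative time in seconds */
    double min;            /* shortest call seen so far */
    double max;            /* longest call seen so far */
};

static void stats_init(struct CallStats *s)
{
    s->ncalls = 0;
    s->total = 0.0;
    s->min = 0.0;
    s->max = 0.0;
}

/* Record the duration of one call, as measured by the call/return callbacks. */
static void stats_record(struct CallStats *s, double seconds)
{
    assert(seconds >= 0.0);
    if (s->ncalls == 0 || seconds < s->min) s->min = seconds;
    if (s->ncalls == 0 || seconds > s->max) s->max = seconds;
    s->total += seconds;
    s->ncalls++;
}

/* Average time per call; 0 if nothing has been recorded yet. */
static double stats_avg(const struct CallStats *s)
{
    return s->ncalls ? s->total / (double)s->ncalls : 0.0;
}
```

Keeping only running values instead of a full list of samples avoids allocations inside the profiler itself, which matters when the goal is to measure GC pressure.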

Performance / Optimizations

  • a bunch of performance issues were reported [7] to be related to:
    • accidentally registering listeners/callbacks twice without noticing (or even more often)
    • never freeing timers/listeners, i.e. we could make sure to issue a warning if a listener's ghost is GC'ed while the listener is still active, because there is then no way to clean up the listener
    • aircraft/addon scripts setting up listeners and timers without registering a /sim/signals/reset handler that handles cleanup
    • always letting timers run at frame rate
  • at least overload the settimer/setlistener API to support a singleton/ONCE param that issues a warning once the VM determines that multiple instances were registered?
  • Hooray: look into adapting the existing GC scheme to support multiple generations, which is a straightforward optimization even without being a GC expert. It basically boils down to having a single typedef enum {GC_GEN0, GC_GEN1} GENERATION; in code.h and then changing all places in $SG_NASAL to use the GC_GEN0 pool by default, i.e. instead of having &globals->pools[type]; we would have &globals->pools[GC_GEN0][type]; for starters. The GC manager would then by default only mark/reap GEN0 (nursery, young generation), promote any objects that survived the GC phase to GEN1, and only ever mark/reap GEN1 if GEN0 has to be resized (start off with reasonably sized generations, based on real stats - e.g. GEN0 16 MB and GEN1 32 MB). Not done
  • other dynamic languages like Lua or Python have GC hooks to customize the GC and to call it on demand: [8], [9]
  • another straightforward optimization would be exposing an API to allocate new objects in a certain generation (GEN0/GEN1) to directly tell the interpreter about the object's lifetime (until reset/reinit, until aircraft change, timer/frame based).
  • marking/reaping could be parallelized using several threads, for each pool - by using write barriers to sync access to naRefs
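
The two-generation idea above can be illustrated with a toy model. This is not the actual Nasal GC: the object, pool and globals structs are heavily simplified stand-ins, pool_add() and collect_gen0() are hypothetical names, and the mark phase is assumed to have already set the marked flag on reachable objects:

```c
#include <assert.h>
#include <stddef.h>

typedef enum { GC_GEN0, GC_GEN1, GC_NUM_GENS } GENERATION;

enum { T_STR, T_VEC, T_HASH, NUM_TYPES };  /* stand-ins for the real types */

#define POOL_CAP 8  /* tiny fixed capacity, for illustration only */

struct Obj { int marked; };  /* set by a (not shown) mark phase */

struct Pool {
    struct Obj *objs[POOL_CAP];
    size_t count;
};

/* Pools are indexed by generation first, then by object type. */
struct Globals { struct Pool pools[GC_NUM_GENS][NUM_TYPES]; };

static int pool_add(struct Pool *p, struct Obj *o)
{
    if (p->count == POOL_CAP) return 0;
    p->objs[p->count++] = o;
    return 1;
}

/* Minor collection: only GEN0 is visited. Marked (reachable) objects are
 * promoted to GEN1, unmarked ones are dropped. Returns the number reclaimed. */
static size_t collect_gen0(struct Globals *g, int type)
{
    struct Pool *young, *old;
    size_t i, kept = 0, freed = 0;

    assert(type >= 0 && type < NUM_TYPES);
    young = &g->pools[GC_GEN0][type];
    old = &g->pools[GC_GEN1][type];

    for (i = 0; i < young->count; i++) {
        struct Obj *o = young->objs[i];
        if (!o->marked) { freed++; continue; }
        o->marked = 0;                    /* reset for the next cycle */
        if (!pool_add(old, o))
            young->objs[kept++] = o;      /* GEN1 full: stays in the nursery */
    }
    young->count = kept;
    return freed;
}
```

A major collection over GEN1 would only be triggered when GEN0 has to be resized, as described above; that part is left out of the sketch.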

Expose additional threading primitives

Consider using pthreads, since Nasal's own threading support is extremely basic [10].