Improving Nasal
Nasal internals |
---|
Memory Management (GC) |
Last update: 07/2013
As more and more code in FlightGear is moved to the base package and thus implemented in Nasal space, some Nasal related issues have become increasingly obvious.
On the other hand, Nasal has a proven track record of success in FlightGear, and has shown remarkably few significant issues so far. Most of the more prominent issues are related to a wider adoption in FlightGear, and thus more complex features being implemented in Nasal overall, often developed by people without any formal programming training and/or coding experience, for whom Nasal may be their first programming language - meaning, that many issues we've been seeing are of algorithmic nature.
So, rather than having Nasal flame wars and talking about "alternatives" like Perl, Python, Javascript or Lua, the idea is to document known Nasal issues so that they can hopefully be addressed eventually.
If you are aware of any major Nasal issues that are not yet covered here, please feel free to add them here, however it is also a good idea to use the FlightGear bug tracker in such cases: Create an issue tracker ticket
Consider Opcode reordering
Referring to switch/case constructs like these (also to be found in Andy's code):
334 switch(BYTECODE(code)[i]) {
335 case OP_PUSHCONST: case OP_MEMBER: case OP_LOCAL:
336 case OP_JIFTRUE: case OP_JIFNOT: case OP_JIFNOTPOP:
337 case OP_JIFEND: case OP_JMP: case OP_JMPLOOP:
338 case OP_FCALL: case OP_MCALL:
339 naVec_append(bytecode, naNum(BYTECODE(code)[++i]));
Such logic can be expressed more easily by simply wrapping these OP codes in between BEGIN_IMMEDIATE_MODE and END_IMMEDIATE_MODE enums, because we then only need to do this:
#define IS_IMMEDIATE_MODE(bytecode) bytecode > BEGIN_IMMEDIATE_MODE && BYTECODE(code)[i] < END_IMMEDIATE_MODE
if ( IS_IMMEDIATE_MODE(BYTECODE(code)[i]) )
naVec_append(bytecode, naNum(BYTECODE(code)[++i]));
#undef IS_IMMEDIATE_MODE
Which basically means that we only need to worry about a single place when it comes to extending opcodes (and checking in run() that these invalid opcodes aren't used), which also translates into fewer assembly instructions that are actually run (2 CMP vs. ~12 per insn). Also, the bytecode interpreter routine itself could be simplified that way, too. In addition, it would make sense to augment the list of opcode enums by adding an OP_VERSION field that is incremented once opcodes are added/removed (which would be a prerequisite for any caching/serialization schemes too):
enum {
OPCODE_VERSION = 0x01, // for serialization and versioning (e.g. caching bytecode)
BEGIN_OPS=0xFF, // reserve space for 255 opcode changes (should be plenty)
OP_NOT, OP_MUL, OP_PLUS, OP_MINUS, OP_DIV, OP_NEG, OP_CAT, OP_LT, OP_LTE,
OP_GT, OP_GTE, OP_EQ, OP_NEQ, OP_EACH, OP_JMP, OP_JMPLOOP, OP_JIFNOTPOP,
BEGIN_IMMEDIATE_MODE,
OP_JIFEND, OP_FCALL, OP_MCALL, OP_RETURN, OP_PUSHCONST, OP_PUSHONE,
END_IMMEDIATE_MODE, //FIXME: incomplete - just intended as an example!
OP_PUSHZERO, OP_PUSHNIL, OP_POP, OP_DUP, OP_XCHG, OP_INSERT, OP_EXTRACT,
OP_MEMBER, OP_SETMEMBER, OP_LOCAL, OP_SETLOCAL, OP_NEWVEC, OP_VAPPEND,
OP_NEWHASH, OP_HAPPEND, OP_MARK, OP_UNMARK, OP_BREAK, OP_SETSYM, OP_DUP2,
OP_INDEX, OP_BREAK2, OP_PUSHEND, OP_JIFTRUE, OP_JIFNOT, OP_FCALLH,
OP_MCALLH, OP_XCHG2, OP_UNPACK, OP_SLICE, OP_SLICE2,
NUM_OPCODES
};
(The same technique could be used to unify and optimize BINOP handling and some other instructions)
At which point it would make sense to add a pre-populated environment hash like runtime - with build-time constants, like these:
- OP_VERSION
- MAX_STACK_DEPTH 512
- MAX_RECURSION 128
- MAX_MARK_DEPTH 128
- OBJ_CACHE_SZ 1
- REFMAGIC
- BIG_ENDIAN/LITTLE_ENDIAN
Having access to these would help provide better support for backwards compatibility
Better debugging/development support
Besides making a full IDE (which would be really cool), there are several things that can be done by editing the source code of Nasal to enhance debugging support and increase development :
- add build time/runtime sanity checks for Nasal core internals, especially naRef/GC stuff like Andy's pointer hacks, which did cause problems in the past, especially WRT to aggressive compiler optimizations and naHash - see ticket #1240 and Philosopher's comments - in the meantime, consider making naRef stuff volatile and using gcc attributes to disable any/all optimizations here [1] [2] Not done
- Being able to dump the global namespace (see this topic for a possible solution) or at least dump things prettily (an unreleased version of the file discussed in Nasal Meta-Programming has good support for nice formatting). This should probably be lazy API that can dump an arbitrary namespace recursively - using the canvas, we could then map that to a TreeView
- Register a callback for handling errors using call() (for parser errors it will need the AST, for runtime errors it would need bytecode access)
- work on abstracting the GC interface (Hooray)
- Register a callback for OP_FCALL et al. to be able to time function calls . Example [3].
- Set breakpoints: register callbacks for values of (struct Frame*)->ip.
- Time other parts of Nasal (not just VM) with a compile-time flag? (could be stored in the Context struct, so that sub contexts would have their own flags, i.e. recursive scripts would not affect each other)
- Also, add some form of Context-based debug/log-level flag for different verbosity levels and phases (parse,codegen,vm,gc) - and maybe don't write it directly to the console, but allow a container/callback to be specified - for better integration/processing by the host app (fgfs)
- Better error messages .
- Parsing: Say something other than "parse error", like "null pointer".
- VM: Indicate type of variable if wrong type.
- Both: Could we give line and column? Note: I tried this and failed (even just copying Andy's code I got random numbers sometimes). It's a big patch...
- Most of these diagnostics could be delegated to Nasal space by using some of our C-space hooks that expose the parser/codegen and VM
- Working with bytecode:
- Expose to Nasal: Done
- Decompile to text: partial (untested)
- Optimize: not started (it makes only sense to look at optimizations after we're able to instrument/profile a running FG session to come up with hot spots that are executed either frequently, or that are responsible for significant runtime overhead - i.e. due to GC pressure or other issues)
- Working with it: provide Bytecode class in Nasal: not started
- Inspect Context: (passed as argument to callbacks).
- Expose Tokens to Nasal: implemented by Hooray as argument to compile(), should be extended to cover output from lex.c and after blocking in addition to the current after-prec-ing (and before freeing!) support.
- compile() being used by call(), it should be straightforward to also map a call() hash.callback to do the same thing here - so that there's no disparity here.
- Yeah, just another entry in naContext->extras, right? (affirmative)
- Real function name support via assignment:
- Option 1: look at the parse tree and check if the right side of the assignment is a function. If so, go ahead and parse the function with the left side as the "name" of the function instead of falling through to more recursions of genExpr().
- Option 2: recognize assignment in the VM and if there is a bindToContext event, set the name of the function based upon either the last LOCAL/MEMBER/HINSERT or the combination of them (i.e. complex lvalues like local.fn). This presents some obvious issues, however:
- The right-hand side of an assignment is done before the left-hand side, thus one would have to look ahead to see the assignment, which is clearly illegal for the VM to do.
- Or one could look behind to see a naCode constant being pushed, and give some indication to its naFunc that it now has a name. This is still somewhat illegal, but not dangerous and thus could be done.
- Option 3: abandon var foo = func(){} for ECMAscript-like function declaration syntax function foo() {}. This would not affect the use of anonymous func expressions but would instead be applicable in cases where we want to say "this function is static (i.e. permanent) and should have a name" (as opposed the the case of temporary storage variables for functions). Regardless of the method used, a name member will have to be added to naFunc's and the VM and error handling procedures will have to be changed according.
- Regarding the last comment: Providing an API to "lock" a symbol/naRef to become immutable would be generally useful, not just for functions - but also for constants (math.pi FT2M etc) and other stuff that may otherwise break consistency - ThorstenR mentioned a couple of times how he's intentionally replicating standard constants in LW/AW just to be on the safe side, because there's no such thing as a "constant" in Nasal. Providing a library function to make naRefs read-only should be straightforward, and could be easily implemented by hooking into the VM to register a callback that yields naRuntimeError() -aka die()- for any such attempts. The method would be scalable to also implement optional typing or min/max/stepping (value ranges), too.
- This should be doable, since I do a naRuntimeError(ctx,naGetError(subcontext)) which should pass errors from a callback (untested).
- To implement support for immutable symbols (constants/ private/protected encapsulation), one would really just need to either lock WRITE access or restrict visibility, which could work analogous to parents, just as embedded protected/private hashes that are honored by the codegen.
- Getting stats on Nasal performance while running: use the callbacks and systime to time things. Need hooks into the GC as well. Statistics worth tracking (also look at similar tools like google perftools):
- Per function (also handles timers/listeners and other typical FG callbacks):
- ncalls per frame/cumulative
- time per call (kept as list) Done – display min/max/avg/cumulative.
- number of GC invocations & avg/min/max time
- number of naNew() calls
- number & list of names of once-use variables Done; n that are numbers (i.e. non-GC-managed).
- try to come up with heuristics to track *identical* temporaries per callback invocation: User:Philosopher/Howto:Write Optimized Nasal Code
- maybe some holistic "GC pressure" percentage over time (5,30,60,300 seconds ?)
- GC pressure can be computed by looking not just at new allocations, but also at realloc() events and the mark/reap phases
- for GC stats, we can also easily access 1) size of all allocated naType pools and 2) percentage that's in use and 3) newBlock() allocations
- Per context/global:
- time spent
- number of GC invocations:
- naContext leaks (see Andy's comments below)
- per frame
- min/max/avg
- Data per function call (not displayed, dumped to a file - maybe just accept an optional callback here too?):
- caller line/name
- threading (eventually)
- locking overhead
- Per function (also handles timers/listeners and other typical FG callbacks):
Performance / Optimizations
- a bunch of performance issues were reported [7] to be related to:
- accidentally registering listeners/callbacks twice without noticing (or even more often)
- never freeing timers/listeners, i.e. we could make sure that issue a warning if a listener's ghost is GC'ed while the listener is still active, because there's then now way to clean up the listener
- aircraft/addon scripts setting up listeners and timers without registering a /sim/signals/reset handler that handles cleanup
- always letting timers run at frame rate
- at least overload the settimer/setlistener API to support a singleton/ONCE param that issues a warning once the VM determines that multiple instances were registered?
- Hooray: look into adapting the existing GC scheme to support multiple generations - which is a straightforward optimizations even without being a GC expert, it basically boils down to having a single typedef enum {GC_GEN0, GC_GEN1} GENERATION; in code.h and then changing all places in $SG_NASAL to use the GC_GEN0 pool by default, i.e. instead of having &globals->pools[type]; - we would have &globals->pools[GC_GEN0][type]; for starters - the GC manager would then by default only mark/reap the GEN0 (nursery, young generation), promote any objects that survived the GC phase to GEN1, and only ever mark/reap GEN1 if GEN0 has to be resized (start off with reasonably sized generations, based on real stats - e.g. GEN0 16 MB and GEN1 32MB). Not done
- other dynamic languages like lua or python have GC hooks to customize the GC and to call it on demand: [8], [9]
- another straightforward optimization would be exposing an API to allocate new objects in a certain generation (GEN0/GEN1) to directly tell the interpreter about the object's lifetime ( until reset/reinit, until aircraft change, timer/frame based).
- marking/reaping could be parallelized using several threads, for each pool - by using write barriers to sync access to naRefs
Expose additional threading primitives
Consider using pthreads, Nasal's threading support is extremely basic [10].