User:Philosopher/Optimization findings
Nasal internals |
---|
Memory Management (GC) |
Here's some findings WRT optimizations. Note that these were from a test run with the B-1B, which is particularly heavy Nasal wise, see the second heading. Also note that these were profiled with a meta-Nasal script – Nasal profiling itself might have influenced some of the findings (the sim usually slows down, say, a factor of 4-6), but it should be relatively correct. To profile these yourself, clone & compile the "extended-nasal" branches of [1] [2] [3].
props.Node.getNode()
props.Node.getNode is evil. Well, as evil as it can be, here's some raw stats:
props.wrapNode():
Global name: props.wrapNode()
Declared in /Users/philosopher/Documents/FlightGear/fgdata/Nasal/props.nas
Times called: 12797
Call time stats:
min: 0.000036/3.600121e-05
max: 0.062680/6.268001e-02
average: 0.000074/7.401553e-05
Used local variables: None
Unused local variables: None
Most called from: /Users/philosopher/Documents/FlightGear/fgdata/Nasal/props.nas
props.Node.getNode():
Global name: props.Node.getNode()
Declared in /Users/philosopher/Documents/FlightGear/fgdata/Nasal/props.nas
Times called: 10272
Call time stats:
min: 0.000182/1.819134e-04
max: 0.070453/7.045293e-02
average: 0.001123/1.122783e-03
Used local variables: None
Unused local variables: None
Most called from: /Users/philosopher/Documents/FlightGear/fgdata/Aircraft/Generic/aar.nas
That's a lot of calls to props.wrapNode (it was the most called function in this particular run) and it creates a new Nasal object each and every time – that's 12797 Nasal objects from a short test run! (At least, I believe it was short...) And each time one does getNode, that creates a new object. So using "immediate" .setValue/.getValue (those with the relative path) is going to be better in some circumstances. The best solution, I believe, is to use C++Bind for these (which of course is where we're eventually headed anyways...). The good news for now is that these functions are usually closer to the minimum value versus the maximum value (in particular, wrapNode averages out to about twice it's minimum time taken and getNode is a little less than an order of ten times it's minimum).
Aircraft/Generic/aar.nas
The worst module ever. To quote from my analyzer script: "Longest function called (if called more than three times): fuel.update_loop (max time taken: 0.6208341, called 51 times)".
Global name: fuel.update_loop()
Declared in /Users/philosopher/Documents/FlightGear/fgdata/Aircraft/Generic/aar.nas at line 216
Times called: 51
Call time stats:
min: 0.106532/1.065321e-01
max: 0.620834/6.208341e-01
average: 0.312339/3.123395e-01
Used local variables: {'lbs', 'ppg', 'selected_tanks', 'consumed', 'cap', 'ac', 'out_of_fuel', 'gals', 'mp', 'fuel_per_tank', 'fuel', 'e', 'refueling', 't'}
Unused local variables: {'tankers'}
Most called from: /Users/philosopher/Documents/FlightGear/fgdata/Aircraft/Generic/aar.nas
That's just awful, taking up a half-second max and barely managing a tenth of a second ?? This needs to be improved. (It looks like it recreates variables each time unnecessarily.)
Garbage Collector
Actually, I think it's pretty efficient if it has to sort through that:
**** Nasal garbage collection statistics: objects: 43883, references: 105527
(wink)
- to determine the potential benefits of the generational approach to GC, just add a counter to the naRef struct to count the number of GC survivals per object and increment it upon mark()ing - which should give you the percentage of "old" objects vs. "new" objects (nursery/young generation). The general assumption is that the majority of objects are in the nursery generation (gen0) and will be discarded quickly, while older objects (those having survived multiple GC passes) are assumed to stay around longer, in a larger generation (usually called "eden").
GC Pressure
Objective: Come up with heuristics to define GC pressure as a useful metric, e.g.:
- function triggers GC (normally, ANY function can trigger the GC for unrelated objects, just because the GC runs out of free memory, so we need to do tracking)
- function allocates new objects
- function causes memory pools to be resized
- objects allocated by the function (temporaries) are GC'ed during or after the invocation of the func
- percentage of mark()ed and reap()ed objects per function
- repeatedly called functions (via timers or listeners) can be sampled by comparing runtime profiles of different runs
- new objects vs. additional references
- shadowed symbols
GC Hints
Nasal being a dynamic language, we could reduce the amount of necessary GC work by giving hints to the GC, either through dedicated APIs (extension functions) or by using a custom syntax for variables that have a certain lifetime (which would also help with generational GC), e.g.:
var canvas = {};
GC.promote_to_generation(GC.eden, canvas);
A syntax/keyword-based approach would seem more intuitive, but would only take place at runtime, e.g.:
var:permanent nodes = []; # a permanent variable
var:temp i = []; # a temporary variable
We could use lookup tables to register callbacks for different var: storage specifiers. Python and Lua provide explicit hooks into the GC, to affect and customize GC behavior.
Random observations
Resizing the window seems to be particularly Nasal-heavy (i.e. takes a long time with introspection).
- I assume that's because of a bunch of Nasal scripts having listeners registered to the x/y coordinates of the window ? Or is it something else ?