Hi fellow wiki editors!

To help newly registered users get more familiar with the wiki (and maybe older users too) there is now a {{Welcome to the wiki}} template. Have a look at it and feel free to add it to new users discussion pages (and perhaps your own).

I have tried to keep the template short, but meaningful. /Johan G

Howto:Troubleshooting Nasal Callbacks

From FlightGear wiki
Revision as of 13:06, 20 February 2016 by Hooray (Talk | contribs) (Understanding Callbacks)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search
prototype UI callback monitor dialog using nasal and canvas

Motivation

1rightarrow.png See Resource Tracking for FlightGear for the main article about this subject.

Cquote1.png it's relatively easy to do bad things unintentionally. Like tie a bit of code to an FDM property and run updates of a display 120 times per second rather than the 30 times you actually need. Like start a loop multiple times so that you update the same property 30 times per frame rather than the one time you actually need. It's actually pretty hard to catch these things, because the code is formally okay, does the right thing and just eats more performance than necessary, and there's no simple output telling you that you're running 30 loops rather than the one you expect.
Cquote2.png
Cquote1.png Someone recently pointed out that the setlistener() / removelistener() API makes it easy to leak resources. So I wondered about making an alternate API where the return value from setlistener must be kept, or the listener is removed. I can imagine this with a helper object
var myL = setlistener2("some/prop", func { ... }} )

myL.addprop("some/other/prop");

myL.addprop("yet/another/prop");

Now you need to retain a ref to myL or the listeners on all the props are removed. I don't think we can retro-fit this to the existing API, because I suspect many places just ignore the return value and would break with this change. (Or even pass a vec of property names to setlistener2, to avoid all the discrete calls to 'addprop', that's a seperate detail really)

It seems like this would work, and be easy enough to implement - question is if it gives enough benefit to be worth the confusion.
— zakalawe (Oct 28th, 2013). Listener objects.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png I think the information condensed is how often we really run a particular function per frame - say how often did XXX_loop() get called via a settimer/ maketimer/ ... every frame?
— Thorsten (Feb 6th, 2016). Re: Nasal Callback tracking (timers & listeners).
(powered by Instant-Cquotes)
Cquote2.png

Background

Cquote1.png At EDXH, the Effect system attaches approx. 1500 Listeners to the property tree but only to 17 unique properties At KSFO, the Effect system attaches some 10000 Listeners to the property tree - but also only to 17 unique properties. That's with shader quality set toero (all off).
Cquote2.png
Cquote1.png On my old Laptop, (1.6GH dual core, GeForce Go 7400, 3Gb RAM) FlightGear has becom barely usable over the last few years. Framerates of max 20 at EDXH and single digits at KSFO.
Cquote2.png
Cquote1.png Without the patch, thousands of listeners were triggered each frame
— Torsten Dreyer (Aug 29th, 2014). Re: [Flightgear-devel] crash in SGPropertyNode::fireValueChanged.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png We've (hopefully successfully!) reduced the number of property listeners generated by the Effects system massively - there should now be only one per property. I'm hopeful that this may provide a performance boost for CPU-limited systems, as well as removing one crash source.
— Stuart Buchanan (Sep 27th, 2014). Re: [Flightgear-devel] crash in SGPropertyNode::fireValueChanged.
(powered by Instant-Cquotes)
Cquote2.png


Cquote1.png usually, "events" means Nasal callbacks (timers) - however, it's also often not just Nasal per se, but misuse of timers - in general, settimer() is a API that is prone to being used improperly, because it needs to be consciously used, i.e. to enable/disable loops and to prevent them from being "registered" (invoked) too often - which is the most common pattern of misuse we've seen so far: Nasal based callbacks that end up being invoked doens (or even hundreds) of times despite being only intended to be invoked once per interval.

However, to be fair, this is not specific to Nasal - A few months ago, Torsten fixed pretty much the exact same bug in the C++ effects code which was registering thousands of identical callbacks. The underlying problem here is that the SGPropertyNode code does not currently support any form of troubleshooting, i.e. to do callback tracking - for details, refer to: Towards better troubleshooting

The maketimer() API can be used to help prevent such issues when using timers, but listeners are a completely different matter (if in doubt, please file a feature request)
— Hooray (Jan 31st, 2016). Re: 777 freezes and FPS loss.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png There is a lot of Nasal/Nasal GC bashing going on, but in most cases we've seen so far, it's really improper use of lower-level APIs like timers and listeners that are not properly managed by aircraft developers - which is unlikely to show up on powerful systems, unless you keep FG running for a long time - because the only thing that is really happening is that callbacks (=code) is triggered more often than necessary.

Imagine it like configuring your mobile phone to check your eMail inbox once per hour - in FlightGear, there is no notion of a "scheduler" to do this correctly, instead "timers" and "listeners" are used to associate callbacks (=code routines) with certain events, such as a timer expiring (e.g. after a few seconds) or a certain property changing (or any combination of these, i.e. by triggering listeners to be registered when a timer expires and vice versa). What is happening behind the scenes is a bit obscure, but that has nothing to do with Nasal or its garbage collection (GC) scheme, it has more to do with the low-level nature of the functions provided by core developers to "manage" such callbacks, i.e. the settimer() and setlistener() APIs are particularly tedious to manage properly - but the de facto practice is to use those as the main building blocks to write/integrate full subsystems into the FlightGear main loop - tiny coding errors may not have much of an effect, but under certain circumstances, those coding errors will add up (i.e. over time), so that the original "task" of checking your inbox once per hour, ends up being executed hundreds of times per second - the underlying code would still be correct though, it's just the event handling code that is not written correctly, which is often the case when using reset/re-init or when changing locations - because the simulator hasn't been designed, developed and maintained with these requirements in mind. Furthermore, there's not introspection facility provided that would allow people to look behind the scenes of the simulator to understand just how often a certain timer/listener is registered and triggered, so that the whole thing is extremely obscure - but like I said, this is not a problem inherent to Nasal coding, it also happens/happened at the C++ level - e.g. look at Torsten's fix for the effects code, which was also leaking/re-running listeners like cray (unnecessarily) - thus, it's redundant work that is getting re-scheduled to over and over again due to buggy code - the problem is not necessarily the buggy code though (writing broken code is part of the whole process), the real problem is that we don't provide any functionality that would allow people to look at where their resources (CPU, RAM, VRAM, GPU) are utilized, so that uninformed conclusions are drawn by some people (like "Nasal is bad ... and needs to be replaced with .... "). Thus, what you can do is to overload the corresponding APIs (extension functions) and treat the id() of the callback as the key for a hash lookup to gather data on how many timers/listeners are registered, and sample those over time.

You will almost certainly end up seeing dozens of unnecessary callbacks being invoked by certain code - even though that may not necessarily be restricted to Nasal code, like I said.
— Hooray (Jan 31st, 2016). Re: 777 freezes and FPS loss.
(powered by Instant-Cquotes)
Cquote2.png

Understanding Callbacks

1rightarrow.png See Callbacks for the main article about this subject.

Cquote1.png To learn more about timers and how they work in Nasal, see $FG_SRC/Scripting/NasalSys.cxx

The settimer() API is mapped to the f_settimer() C function here: http://sourceforge.net/p/flightgear/flightgear/ci/next/tree/src/Scripting/NasalSys.cxx#l781

The C code for this extension function can be seen here: http://sourceforge.net/p/flightgear/flightgear/ci/next/tree/src/Scripting/NasalSys.cxx#l471

That's only 3 lines of code, which ends up calling the FGNasalSys::setTimer() API at: http://sourceforge.net/p/flightgear/flightgear/ci/next/tree/src/Scripting/NasalSys.cxx#l1265


— Hooray (Jan 31st, 2016). Re: 777 freezes and FPS loss.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png What this will do is create a temporary NasalTimer object, for the class/struct refer to: http://sourceforge.net/p/flightgear/flightgear/ci/next/tree/src/Scripting/NasalSys.hxx#l163

It will then set up the NasalTimer object according to the arguments you passed to settimer() - and ultimately, it will invoke the event manager to register the timer object: http://sourceforge.net/p/flightgear/flightgear/ci/next/tree/src/Scripting/NasalSys.cxx#l1298 That really is all you need to know in order to check if a Nasal callback gets re-added, or if it is already running, because the "handler" parameter can be checked (e.g. use SG_LOG() to print out the address of the pointer). Most of this would be also applicable to the setlistener() API, with the difference being that calls will not be specific to any particular subsystem like "events", but can be triggered by arbitrary subsystems which end up writing/modifying a property with a registered Nasal callback/listener. These two are the most likely suspects when it comes to leaking callbacks and having redundant callbacks registered to be executed unnecessarily. Note that the event manager lives in SimGear, and one simple thing you can do is to log the number of active timers to the console and/or to the property tree (e.g. using a propertyObject) - you will almost certainly see those grow over time, probably even with identical callbacks: http://api-docs.freeflightsim.org/simgear/classSGEventMgr.html The other thing that can be done is to use SGTimeStamp::now() to benchmark the execution time of different callbacks. None of this would be specific to the 777 - it would benefit any/all aircraft and other Nasal code suffering from these coding errors


Even if you don't feel comfortable writing C++ code, all you need to know is how to use the SG_LOG() macro to dump the sie of a the timer queue: http://api-docs.freeflightsim.org/simgear/classSGTimerQueue.html

It's _numEntries that's relevant here, which is private, but can be exposed by adding a corresponding method
— Hooray (Jan 31st, 2016). Re: 777 freezes and FPS loss.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png you can patch the C++ code to maintain a list of callbacks invoked via timers/listeners and log everything to the property tree/console by printing out the file/line number where each callback was registered - I previously posted a patch doing this, so you could run a forum search - it would provide you with a list of file names and line numbers, including SGTimeStamp based timing for each running callback. There's another patch identifying repeatedly registered callbacks, too - by checking the address of each Nasal callback against a vector of already known callbacks.
— Hooray (Feb 13th, 2015). Re: Sometime, with Yasim... frame-rate is slow.
(powered by Instant-Cquotes)
Cquote2.png

Approach

1rightarrow.png See Howto:Exposing Subsystems to Nasal for the main article about this subject.

Cquote1.png We cannot really/easily identify "loops" per se - i.e. as far as I remember (Philosopher may correct me if I am wrong), we have to use what's available via die(), aka naRuntimeError, which really only provides 1) filename/path and 2) line number. However, this can be used to build a list of "active callbacks" that can be associated with their "origin", i.e. where the callback was registered to be invoked via settimer()/maketimer() or setlistener().
— Hooray (Feb 6th, 2016). Re: Nasal Callback tracking (timers & listeners).
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png What's available in C++ space are primarily: the event manager subsystem (SGEventMgr) which has two internal "queues" for keeping a list of callbacks that shall be invoked when a certain event (timeout) happens, as well as the property tree interface to register Nasal callbacks via the SGPropertyChangeListener API:

http://api-docs.freeflightsim.org/simgear/classSGEventMgr.html

http://api-docs.freeflightsim.org/simgear/classSGPropertyChangeListener.html At that point, all we have access to is C++ pointers, and pointers to Nasal data structures representing a function, so the best we can currently do is probably logging filename/line number per timer/listener and provide a list of those. Invocation is a different matter, i.e. we could maintain a counter for each known callback (C/C++ pointer) and reset/increment that per frame to actually determine how often a certain callback is actually invoked per frame/second/minute, analogous to the performance monitor. From a representation perspective, my suggestion would be to highlight per-frame callbacks in yellow, and everything that gets updated several times per frame (think FDM/AP-coupled coode) would be shown in red. Doing that will inevitably require patches to the SG/FG side of FGNasalSys, though. But the UI could be coded in Nasal itself, i.e. using the Canvas system. Equally, it might make sense to gather a total of timers/listeners at different stages of the initialiation sequence, i.e. how loading/unloading scenery/aircraft affects the global count of listeners/timers, and how many of those are actually pointing to C++ code vs. Nasal callbacks, and how that is affected by reset/re-init.

Like you say, this is one of the more likely reasons for Nasal performance affecting the main simulation loop, probably even more so than the Nasal GC itself...
— Hooray (Feb 6th, 2016). Re: Nasal Callback tracking (timers & listeners).
(powered by Instant-Cquotes)
Cquote2.png

Objective

Implementation

We will be wrapping/overriding the problematic APIs at the global level to provide functions that internally support callback/registration tracking, and which can automatically detect improper use of these APIs:

var ObjectIdentity = func(obj) {
 return id(obj);
};

# TODO: need to use call/compile to track call location

var TrackableResource = {
 new: func() {
  var m={parents:[TrackableResource]};
  m.objects=[];
  return m;
 },
 del: func() {
 },
};

var TrackableAPI = {
 new: func(api) {
  var m = { parents:[TrackableAPI] };
  m.collection = {};
  return m;
 },
 del: func() {},

 getCallback: func() {return me.invoke},
 invoke: func (callback, interval) {
 var key=id(callback);
 if (me.collection[key]!=nil)
  print("Warning, key/callback already in collection: ");
 },
};


var setlistener = TrackableAPI.new(api: setlistener).getCallback();
# removelistener
var settimer = TrackableAPI.new(api: settimer).getCallback();

var test = func() {

## improper use of APIs:
var foo = func;

for (var i=0;i<=5;i++) {
 # 5 listeners registered for the same property/callback:
 setlistener("/sim/foo", foo);
 # invoke foo again after 5 seconds
 settimer(foo, 5);
} # for loop

}; # of test

Integration

The corresponding APIs should be either loaded globally via $FG_ROOT/Nasal or explicitly loaded via the -set.xml file:

Code