Howto:Troubleshooting Nasal Callbacks

From FlightGear wiki
Revision as of 20:45, 31 January 2016 by Hooray (talk | contribs) (→‎Background: +ref)
Jump to navigation Jump to search

Background

Cquote1.png Without the patch, thousands of listeners were triggered each frame
— Torsten Dreyer (Aug 29th, 2014). Re: [Flightgear-devel] crash in SGPropertyNode::fireValueChanged.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png Someone recently pointed out that the setlistener() / removelistener() API makes it easy to leak resources. So I wondered about making an alternate API where the return value from setlistener must be kept, or the listener is removed. I can imagine this with a helper object
var myL = setlistener2("some/prop", func { ... }} )

myL.addprop("some/other/prop");

myL.addprop("yet/another/prop");

Now you need to retain a ref to myL or the listeners on all the props are removed. I don't think we can retro-fit this to the existing API, because I suspect many places just ignore the return value and would break with this change. (Or even pass a vec of property names to setlistener2, to avoid all the discrete calls to 'addprop', that's a seperate detail really)

It seems like this would work, and be easy enough to implement - question is if it gives enough benefit to be worth the confusion.
— zakalawe (Oct 28th, 2013). Listener objects.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png usually, "events" means Nasal callbacks (timers) - however, it's also often not just Nasal per se, but misuse of timers - in general, settimer() is a API that is prone to being used improperly, because it needs to be consciously used, i.e. to enable/disable loops and to prevent them from being "registered" (invoked) too often - which is the most common pattern of misuse we've seen so far: Nasal based callbacks that end up being invoked doens (or even hundreds) of times despite being only intended to be invoked once per interval.

However, to be fair, this is not specific to Nasal - A few months ago, Torsten fixed pretty much the exact same bug in the C++ effects code which was registering thousands of identical callbacks. The underlying problem here is that the SGPropertyNode code does not currently support any form of troubleshooting, i.e. to do callback tracking - for details, refer to: Towards better troubleshooting

The maketimer() API can be used to help prevent such issues when using timers, but listeners are a completely different matter (if in doubt, please file a feature request)
— Hooray (Jan 31st, 2016). Re: 777 freezes and FPS loss.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png There is a lot of Nasal/Nasal GC bashing going on, but in most cases we've seen so far, it's really improper use of lower-level APIs like timers and listeners that are not properly managed by aircraft developers - which is unlikely to show up on powerful systems, unless you keep FG running for a long time - because the only thing that is really happening is that callbacks (=code) is triggered more often than necessary.

Imagine it like configuring your mobile phone to check your eMail inbox once per hour - in FlightGear, there is no notion of a "scheduler" to do this correctly, instead "timers" and "listeners" are used to associate callbacks (=code routines) with certain events, such as a timer expiring (e.g. after a few seconds) or a certain property changing (or any combination of these, i.e. by triggering listeners to be registered when a timer expires and vice versa). What is happening behind the scenes is a bit obscure, but that has nothing to do with Nasal or its garbage collection (GC) scheme, it has more to do with the low-level nature of the functions provided by core developers to "manage" such callbacks, i.e. the settimer() and setlistener() APIs are particularly tedious to manage properly - but the de facto practice is to use those as the main building blocks to write/integrate full subsystems into the FlightGear main loop - tiny coding errors may not have much of an effect, but under certain circumstances, those coding errors will add up (i.e. over time), so that the original "task" of checking your inbox once per hour, ends up being executed hundreds of times per second - the underlying code would still be correct though, it's just the event handling code that is not written correctly, which is often the case when using reset/re-init or when changing locations - because the simulator hasn't been designed, developed and maintained with these requirements in mind. Furthermore, there's not introspection facility provided that would allow people to look behind the scenes of the simulator to understand just how often a certain timer/listener is registered and triggered, so that the whole thing is extremely obscure - but like I said, this is not a problem inherent to Nasal coding, it also happens/happened at the C++ level - e.g. look at Torsten's fix for the effects code, which was also leaking/re-running listeners like cray (unnecessarily) - thus, it's redundant work that is getting re-scheduled to over and over again due to buggy code - the problem is not necessarily the buggy code though (writing broken code is part of the whole process), the real problem is that we don't provide any functionality that would allow people to look at where their resources (CPU, RAM, VRAM, GPU) are utilized, so that uninformed conclusions are drawn by some people (like "Nasal is bad ... and needs to be replaced with .... "). Thus, what you can do is to overload the corresponding APIs (extension functions) and treat the id() of the callback as the key for a hash lookup to gather data on how many timers/listeners are registered, and sample those over time.

You will almost certainly end up seeing dozens of unnecessary callbacks being invoked by certain code - even though that may not necessarily be restricted to Nasal code, like I said.
— Hooray (Jan 31st, 2016). Re: 777 freezes and FPS loss.
(powered by Instant-Cquotes)
Cquote2.png

Objective

Implementation

We will be wrapping/overriding the problematic APIs at the global level to provide functions that internally support callback/registration tracking, and which can automatically detect improper use of these APIs:

var ObjectIdentity = func(obj) {
 return id(obj);
};

# TODO: need to use call/compile to track call location

var TrackableResource = {
 new: func() {
  var m={parents:[TrackableResource]};
  m.objects=[];
  return m;
 },
 del: func() {
 },
};

var TrackableAPI = {
 new: func(api) {
  var m = { parents:[TrackableAPI] };
  m.collection = {};
  return m;
 },
 del: func() {},

 getCallback: func() {return me.invoke},
 invoke: func (callback, interval) {
 var key=id(callback);
 if (me.collection[key]!=nil)
  print("Warning, key/callback already in collection: ");
 },
};


var setlistener = TrackableAPI.new(api: setlistener).getCallback();
# removelistener
var settimer = TrackableAPI.new(api: settimer).getCallback();

var test = func() {

## improper use of APIs:
var foo = func;

for (var i=0;i<=5;i++) {
 # 5 listeners registered for the same property/callback:
 setlistener("/sim/foo", foo);
 # invoke foo again after 5 seconds
 settimer(foo, 5);
} # for loop

}; # of test

Integration

The corresponding APIs should be either loaded globally via $FG_ROOT/Nasal or explicitly loaded via the -set.xml file:

Code