User:Philosopher/Nasal FGCommand crashes: Difference between revisions

From FlightGear wiki
Jump to navigation Jump to search
(→‎Current status: good idea H, that seems right!)
Line 28: Line 28:


Well, it will keep calling bind() for each invocation/parse() of the binding [https://gitorious.org/fg/flightgear/source/740b3f35e98b8c0506bc887326ef831704520f89:src/Scripting/NasalSys.cxx#L1113]. And it's all in a fresh context using a new func naRef - so it may be possible, that previously bound stuff is still lingering around - but we need to dump the namespace to see what's going on there for tens of thousands of invocations. The short-term "fix" might be NOT re-parsing things over and over again, and checking if that solves the problem or not. After all, it seems unrelated to compiler optimizations, which is a good thing :) We may also want to check Andy's usage of naBindFunction() vs. naBindToContext() [https://gitorious.org/fg/simgear/source/19481983e5e67c60603acb41d42ff406751ada0c:simgear/nasal/code.c#L834].
Well, it will keep calling bind() for each invocation/parse() of the binding [https://gitorious.org/fg/flightgear/source/740b3f35e98b8c0506bc887326ef831704520f89:src/Scripting/NasalSys.cxx#L1113]. And it's all in a fresh context using a new func naRef - so it may be possible, that previously bound stuff is still lingering around - but we need to dump the namespace to see what's going on there for tens of thousands of invocations. The short-term "fix" might be NOT re-parsing things over and over again, and checking if that solves the problem or not. After all, it seems unrelated to compiler optimizations, which is a good thing :) We may also want to check Andy's usage of naBindFunction() vs. naBindToContext() [https://gitorious.org/fg/simgear/source/19481983e5e67c60603acb41d42ff406751ada0c:simgear/nasal/code.c#L834].
Context... that's a good point. How about [https://gitorious.org/fg/flightgear/commit/0fbc448 commit 0fbc448]? That changed it so it creates and then deletes a context (instead of using global one), and is about the right timeframe - between 2.12 and present (it's dated Nov 22, 2013). It looks like we can still revert it (there's a global context still), so I'll see what that does, but it's awfully suspicious if it would cause problems. But I do agree that freeing the wrong thing could definitely be a problem - if you lose the context and free the function, it's really going to get screwed over, and possibly rewritten with anything else. And since the whole Nasal and GC path is deterministic (so far), that would support why I keep getting the same indexes over and over again - just trigger it in the wrong place and the function goes kaboom.
: Actually, to keep going off on that, it looks like freeing the context immediately is quite dangerous. naNew(ctx...) calls naTempSave(ctx...), which allows ''active'' naContexts to keep the function alive (as long as they don't run code...), but obviously naFreeContext(ctx) doesn't keep it alive, which would allow for GC'ing the objects saved with naTempSave(ctx...) - which, for example, can happen in the if statement you mentioned, which [https://gitorious.org/fg/flightgear/source/740b3f35e98b8c0506bc887326ef831704520f89:src/Scripting/NasalSys.cxx#L1151 creates a new hash] if needed. (And I believe things started crashing much earlier when I deleted the namespace each time, which would support that.) In addition to the timing seeming right, this all seems to fit, so I'll revert the commit and test it ASAP. This also does raise doubts for me about threaded GC though... with creation and deletion of contexts, anything threaded might slip into the cracks. (You weren't on the list, but James sent out an email about 1-2 weeks ago proposing a threaded GC patch again, and I can obviously send that to you/put it up on gitorious.)
: I also added assertions (and AFAIK they're active), which basically checks that what's going in is what's going to come out, and everything looked fine so far, so that would point to corruption instead of a bad pointer being passed to it.


=== Available backtraces ===
=== Available backtraces ===

Revision as of 13:08, 9 April 2014

Just a page to gather information about the crashes we've been seeing recently. It seems to be related to running Nasal bindings, especially with joysticks.

Bug tracker

Forum threads

Comments so far

Cquote1.png I'm fairly certain this is joystick related and it is new

I downgraded to 2.12.1 and it has not crashed yet with the js plugged in

It has previously crashed on all newer versions including the latest git but only when the joystick is plugged in.
— chrisb[1]
Cquote2.png
Cquote1.png To be honest, and as previously mentioned, I find it hard to believe that there really is a bug that's just showing up in one particular subsystem (like the JS bindings) - now, I have never experienced this, but I also don't happen to use a Joystick :D I think it's been 2 releases ago that the JS stuff was revamped by Stuart, so it's possible that there's something going on in the C++ code and the way it's interacting with the Nasal engine. But so far, my impression really is that the naHash related issues are due to aggressive compiler optimizations - we have previously seen issues exactly due to this, and due to Andy's sophisticated pointer hacks, which also did show up in hash.c - to see if this could be the case here or not, I'd suggest to come up with a minimal test case that reliably triggers a segfault (no scenery, no complex aircraft - anything else disabled). Once we have that, we can either disable all optimizations, or explicitly mark hash.c (or libnasal.a) to not use any optimizations at all - if that helps, we need to look at the generated IR/ASM code to see what's going on, or simply use volatile/compiler attributes to prevent the compiler from touching some of Andy's data structures/code.

I would be very surprised if we had a bug in the Nasal engine that only shows up when using Joysticks - after all, Nasal doesn't know anything about Joysticks at all. So such an issue should show up in a lot of other places, and not just joystick bindings. Currently, I find it much more likely to be either due to compiler optimizations or due to the way the C++ code is interfacing with the corresponding ghosts.

If anybody around can come up with a very compact test case that triggers the whole thing, I may get a joystick and give it a try to see if I can reproduce the whole thing.
— Hooray[2]
Cquote2.png
  1. chrisb (7 April 2014). Re: Frequent constant random crashes.
  2. Hooray (5 April 2014). Re: FG Crashes too, too often.

Current status

It looks like something happens when calling the function, which is compiled each time from the property tree - something that didn't happen in 2.12. The crash is reproducible after enough iterations, with any kind of (valid) binding code, and regardless of other conditions. It appears that some sort of invalid symbol appears on the stack, which does not exist as a valid address.

Well, it will keep calling bind() for each invocation/parse() of the binding [1]. And it's all in a fresh context using a new func naRef - so it may be possible, that previously bound stuff is still lingering around - but we need to dump the namespace to see what's going on there for tens of thousands of invocations. The short-term "fix" might be NOT re-parsing things over and over again, and checking if that solves the problem or not. After all, it seems unrelated to compiler optimizations, which is a good thing :) We may also want to check Andy's usage of naBindFunction() vs. naBindToContext() [2].

Context... that's a good point. How about commit 0fbc448? That changed it so it creates and then deletes a context (instead of using global one), and is about the right timeframe - between 2.12 and present (it's dated Nov 22, 2013). It looks like we can still revert it (there's a global context still), so I'll see what that does, but it's awfully suspicious if it would cause problems. But I do agree that freeing the wrong thing could definitely be a problem - if you lose the context and free the function, it's really going to get screwed over, and possibly rewritten with anything else. And since the whole Nasal and GC path is deterministic (so far), that would support why I keep getting the same indexes over and over again - just trigger it in the wrong place and the function goes kaboom.

Actually, to keep going off on that, it looks like freeing the context immediately is quite dangerous. naNew(ctx...) calls naTempSave(ctx...), which allows active naContexts to keep the function alive (as long as they don't run code...), but obviously naFreeContext(ctx) doesn't keep it alive, which would allow for GC'ing the objects saved with naTempSave(ctx...) - which, for example, can happen in the if statement you mentioned, which creates a new hash if needed. (And I believe things started crashing much earlier when I deleted the namespace each time, which would support that.) In addition to the timing seeming right, this all seems to fit, so I'll revert the commit and test it ASAP. This also does raise doubts for me about threaded GC though... with creation and deletion of contexts, anything threaded might slip into the cracks. (You weren't on the list, but James sent out an email about 1-2 weeks ago proposing a threaded GC patch again, and I can obviously send that to you/put it up on gitorious.)
I also added assertions (and AFAIK they're active), which basically checks that what's going in is what's going to come out, and everything looked fine so far, so that would point to corruption instead of a bad pointer being passed to it.

Available backtraces

Complete

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000020
0x0000000100d85b04 in naiHash_newsym (hash=0x1070c9cb0, sym=0x20, val=0x7fff5fbfd978) at /Users/philosopher/Documents/FlightGear/simgear/simgear/nasal/hash.c:257
257 struct naStr *s = PTR(*sym).str;
(gdb) bt
#0 0x0000000100d85b04 in naiHash_newsym (hash=0x1070c9cb0, sym=0x20, val=0x7fff5fbfd978) at /Users/philosopher/Documents/FlightGear/simgear/simgear/nasal/hash.c:257
#1 0x0000000100d799e3 in setupArgs (ctx=0x105898400, f=0x105898400, args=0x0, nargs=0) at /Users/philosopher/Documents/FlightGear/simgear/simgear/nasal/code.c:304
#2 0x0000000100d7954e in naCall (ctx=0x105898400, func={num = 2.1730606295780482e-314, ref = {ptr = {obj = 0x1062917e0, str = 0x1062917e0, vec = 0x1062917e0, hash = 0x1062917e0, code = 0x1062917e0, func = 0x1062917e0, ccode = 0x1062917e0, ghost = 0x1062917e0}, reftag = 2147444617}}, argc=0, args=0x0, obj={num = 0, ref = {ptr = {obj = 0x0, str = 0x0, vec = 0x0, hash = 0x0, code = 0x0, func = 0x0, ccode = 0x0, ghost = 0x0}, reftag = 2147444617}}, locals={num = 2.1804274803696552e-314, ref = {ptr = {obj = 0x1070c9cb0, str = 0x1070c9cb0, vec = 0x1070c9cb0, hash = 0x1070c9cb0, code = 0x1070c9cb0, func = 0x1070c9cb0, ccode = 0x1070c9cb0, ghost = 0x1070c9cb0}, reftag = 2147444617}}) at /Users/philosopher/Documents/FlightGear/simgear/simgear/nasal/code.c:1010
#3 0x0000000100d7cdb4 in naCallMethodCtx (ctx=0x105898400, code={num = 2.1730606295780482e-314, ref = {ptr = {obj = 0x1062917e0, str = 0x1062917e0, vec = 0x1062917e0, hash = 0x1062917e0, code = 0x1062917e0, func = 0x1062917e0, ccode = 0x1062917e0, ghost = 0x1062917e0}, reftag = 2147444617}}, self={num = 0, ref = {ptr = {obj = 0x0, str = 0x0, vec = 0x0, hash = 0x0, code = 0x0, func = 0x0, ccode = 0x0, ghost = 0x0}, reftag = 2147444617}}, argc=0, args=0x0, locals={num = 2.1804274803696552e-314, ref = {ptr = {obj = 0x1070c9cb0, str = 0x1070c9cb0, vec = 0x1070c9cb0, hash = 0x1070c9cb0, code = 0x1070c9cb0, func = 0x1070c9cb0, ccode = 0x1070c9cb0, ghost = 0x1070c9cb0}, reftag = 2147444617}}) at /Users/philosopher/Documents/FlightGear/simgear/simgear/nasal/code.c:1084
#4 0x0000000100d7cec7 in naCallMethod (code={num = 2.1730606295780482e-314, ref = {ptr = {obj = 0x1062917e0, str = 0x1062917e0, vec = 0x1062917e0, hash = 0x1062917e0, code = 0x1062917e0, func = 0x1062917e0, ccode = 0x1062917e0, ghost = 0x1062917e0}, reftag = 2147444617}}, self={num = 0, ref = {ptr = {obj = 0x0, str = 0x0, vec = 0x0, hash = 0x0, code = 0x0, func = 0x0, ccode = 0x0, ghost = 0x0}, reftag = 2147444617}}, argc=0, args=0x0, locals={num = 2.1804274803696552e-314, ref = {ptr = {obj = 0x1070c9cb0, str = 0x1070c9cb0, vec = 0x1070c9cb0, hash = 0x1070c9cb0, code = 0x1070c9cb0, func = 0x1070c9cb0, ccode = 0x1070c9cb0, ghost = 0x1070c9cb0}, reftag = 2147444617}}) at /Users/philosopher/Documents/FlightGear/simgear/simgear/nasal/code.c:1095
#5 0x00000001005360ba in FGNasalSys::callMethod (this=0x110aa73f0, code={num = 2.1730606295780482e-314, ref = {ptr = {obj = 0x1062917e0, str = 0x1062917e0, vec = 0x1062917e0, hash = 0x1062917e0, code = 0x1062917e0, func = 0x1062917e0, ccode = 0x1062917e0, ghost = 0x1062917e0}, reftag = 2147444617}}, self={num = 0, ref = {ptr = {obj = 0x0, str = 0x0, vec = 0x0, hash = 0x0, code = 0x0, func = 0x0, ccode = 0x0, ghost = 0x0}, reftag = 2147444617}}, argc=0, args=0x0, locals={num = 2.1804274803696552e-314, ref = {ptr = {obj = 0x1070c9cb0, str = 0x1070c9cb0, vec = 0x1070c9cb0, hash = 0x1070c9cb0, code = 0x1070c9cb0, func = 0x1070c9cb0, ccode = 0x1070c9cb0, ghost = 0x1070c9cb0}, reftag = 2147444617}}) at /Users/philosopher/Documents/FlightGear/flightgear/src/Scripting/NasalSys.cxx:248
#6 0x0000000100536012 in FGNasalSys::call (this=0x110aa73f0, code={num = 2.1730606295780482e-314, ref = {ptr = {obj = 0x1062917e0, str = 0x1062917e0, vec = 0x1062917e0, hash = 0x1062917e0, code = 0x1062917e0, func = 0x1062917e0, ccode = 0x1062917e0, ghost = 0x1062917e0}, reftag = 2147444617}}, argc=0, args=0x0, locals={num = 2.1804274803696552e-314, ref = {ptr = {obj = 0x1070c9cb0, str = 0x1070c9cb0, vec = 0x1070c9cb0, hash = 0x1070c9cb0, code = 0x1070c9cb0, func = 0x1070c9cb0, ccode = 0x1070c9cb0, ghost = 0x1070c9cb0}, reftag = 2147444617}}) at /Users/philosopher/Documents/FlightGear/flightgear/src/Scripting/NasalSys.cxx:236
#7 0x000000010053a1c4 in FGNasalSys::handleCommand (this=0x110aa73f0, moduleName=0x10f7ee430 "__js0", fileName=0x149a93648 "/input/joysticks/js/axis/binding", src=0x115a5c9e0 "aileron.push()", arg=0x115a84860) at /Users/philosopher/Documents/FlightGear/flightgear/src/Scripting/NasalSys.cxx:1231
#8 0x000000010053a2a8 in FGNasalSys::handleCommand (this=0x110aa73f0, arg=0x115a84860) at /Users/philosopher/Documents/FlightGear/flightgear/src/Scripting/NasalSys.cxx:1241
#9 0x0000000100008949 in do_nasal (arg=0x115a84860) at /Users/philosopher/Documents/FlightGear/flightgear/src/Main/fg_commands.cxx:183
#10 0x0000000100011fde in SGCommandMgr::FunctionCommand::operator() (this=0x1069538f0, arg=0x115a84860) at commands.hxx:57
#11 0x0000000100dd0145 in SGBinding::innerFire (this=0x1305be9c0) at /Users/philosopher/Documents/FlightGear/simgear/simgear/structure/SGBinding.cxx:92
#12 0x0000000100dd07b5 in SGBinding::fire (this=0x1305be9c0, setting=0.08504398912191391) at /Users/philosopher/Documents/FlightGear/simgear/simgear/structure/SGBinding.cxx:133
#13 0x00000001002b02e8 in FGJoystickInput::updateJoystick (this=0x115adc4b0, index=0, joy=0x115adc4e0, dt=0.041666666666666664) at /Users/philosopher/Documents/FlightGear/flightgear/src/Input/FGJoystickInput.cxx:347
#14 0x00000001002b08aa in FGJoystickInput::update (this=0x115adc4b0, dt=0.041666666666666664) at /Users/philosopher/Documents/FlightGear/flightgear/src/Input/FGJoystickInput.cxx:386
#15 0x0000000100e1c5b7 in SGSubsystemGroup::Member::update (this=0x124af2600, delta_time_sec=0.041666666666666664) at /Users/philosopher/Documents/FlightGear/simgear/simgear/structure/subsystem_mgr.cxx:412
#16 0x0000000100e1b77a in SGSubsystemGroup::update (this=0x124a58790, delta_time_sec=0.041666666666666664) at /Users/philosopher/Documents/FlightGear/simgear/simgear/structure/subsystem_mgr.cxx:249
#17 0x0000000100e1c5b7 in SGSubsystemGroup::Member::update (this=0x124af26a0, delta_time_sec=0.041666666666666664) at /Users/philosopher/Documents/FlightGear/simgear/simgear/structure/subsystem_mgr.cxx:412
#18 0x0000000100e1b77a in SGSubsystemGroup::update (this=0x106822760, delta_time_sec=0.041666666666666664) at /Users/philosopher/Documents/FlightGear/simgear/simgear/structure/subsystem_mgr.cxx:249
#19 0x0000000100e1cfbb in SGSubsystemMgr::update (this=0x106822670, delta_time_sec=0.041666666666666664) at /Users/philosopher/Documents/FlightGear/simgear/simgear/structure/subsystem_mgr.cxx:515
#20 0x000000010004a184 in fgMainLoop () at /Users/philosopher/Documents/FlightGear/flightgear/src/Main/main.cxx:108
#21 0x000000010073e9d3 in fgOSMainLoop () at /Users/philosopher/Documents/FlightGear/flightgear/src/Viewer/fg_os_osgviewer.cxx:333
#22 0x0000000100049288 in fgMainInit (argc=3, argv=0x7fff5fbffa50) at /Users/philosopher/Documents/FlightGear/flightgear/src/Main/main.cxx:493
#23 0x0000000100007a3c in main (argc=3, argv=0x7fff5fbffa50) at /Users/philosopher/Documents/FlightGear/flightgear/src/Main/bootstrap.cxx:234


Incomplete

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000020
0x000000010091d821 in naiHash_newsym ()
(gdb) bt
#0  0x000000010091d821 in naiHash_newsym ()
#1  0x0000000100912b79 in setupArgs ()
#2  0x000000010091283f in naCall ()
#3  0x0000000100915664 in naCallMethodCtx ()
#4  0x0000000100915739 in naCallMethod ()
#5  0x000000010038ca0b in FGNasalSys::handleCommand ()
#6  0x000000010038ca93 in FGNasalSys::handleCommand ()
#7  0x00000001000086fc in do_nasal ()
#8  0x00000001000119fd in SGCommandMgr::FunctionCommand::operator() ()
#9  0x000000010094e48b in SGBinding::innerFire ()
#10 0x00000001001af7d7 in FGJoystickInput::updateJoystick ()
#11 0x00000001001afb19 in FGJoystickInput::update ()
#12 0x0000000100995b0e in SGSubsystemGroup::Member::update ()
#13 0x0000000100994c7f in SGSubsystemGroup::update ()
#14 0x0000000100995b0e in SGSubsystemGroup::Member::update ()
#15 0x0000000100994c7f in SGSubsystemGroup::update ()
#16 0x0000000100996807 in SGSubsystemMgr::update ()
#17 0x000000010003b326 in fgMainLoop ()
#18 0x00000001004d5098 in fgOSMainLoop ()
#19 0x000000010003ad2c in fgMainInit ()
#20 0x000000010000760c in main ()


    Core was generated by `/home/flight/FG/install/fgfs/bin/fgfs --fg-root=/home/flight/FG/install/fgfs/fg'.
    Program terminated with signal 11, Segmentation fault.
    #0  0x08bd8372 in naiHash_newsym ()
    (gdb) bt
    #0  0x08bd8372 in naiHash_newsym ()
    #1  0x08bcfbf9 in setupArgs.isra.3 ()
    #2  0x08bd37f9 in naCall ()
    #3  0x08bd3b91 in naCallMethodCtx ()
    #4  0x08bd3c8c in naCallMethod ()
    #5  0x086658d1 in callMethod (locals=..., self=..., code=..., args=0x0, argc=0,
        this=<optimized out>) at /home/flight/FG/fgfs/flightgear/src/Scripting/NasalSys.cxx:247
    #6  call (locals=..., args=0x0, argc=0, code=..., this=0xf692028)
        at /home/flight/FG/fgfs/flightgear/src/Scripting/NasalSys.cxx:235
    #7  FGNasalSys::handleCommand (this=0xf692028, moduleName=0xae98ef8 "__js0",
        fileName=0x154241dc "/input/joysticks/js/button[7]/binding",
        src=0xaf20978 "controls.adjPropeller(-0.25)", arg=0xae98df0)
        at /home/flight/FG/fgfs/flightgear/src/Scripting/NasalSys.cxx:1152
    #8  0x08660e53 in FGNasalSys::handleCommand (this=0xf692028, arg=0xae98df0)
        at /home/flight/FG/fgfs/flightgear/src/Scripting/NasalSys.cxx:1164
    #9  0x082799a8 in do_nasal (arg=0xae98df0)
        at /home/flight/FG/fgfs/flightgear/src/Main/fg_commands.cxx:183
    #10 0x08c09093 in SGBinding::innerFire() const ()
    #11 0x084385d6 in FGButton::update (this=0x10d37f48, modifiers=0, pressed=true, x=-1, y=-1)
        at /home/flight/FG/fgfs/flightgear/src/Input/FGButton.cxx:65
    #12 0x0844897e in FGJoystickInput::updateJoystick (this=0xae7b470, index=0, joy=0xae7b48c,
        dt=0.016666666666666666)
        at /home/flight/FG/fgfs/flightgear/src/Input/FGJoystickInput.cxx:360
    #13 0x08448acb in FGJoystickInput::update (this=0xae7b470, dt=0.016666666666666666)
        at /home/flight/FG/fgfs/flightgear/src/Input/FGJoystickInput.cxx:370
    #14 0x08c32039 in SGSubsystemGroup::Member::update(double) ()
    #15 0x08c322db in SGSubsystemGroup::update(double) ()
    #16 0x08c32039 in SGSubsystemGroup::Member::update(double) ()
    #17 0x08c322db in SGSubsystemGroup::update(double) ()
    #18 0x08c31e82 in SGSubsystemMgr::update(double) ()
    #19 0x0829f3ea in fgMainLoop () at /home/flight/FG/fgfs/flightgear/src/Main/main.cxx:108
    #20 0x087a52d9 in fgOSMainLoop ()
        at /home/flight/FG/fgfs/flightgear/src/Viewer/fg_os_osgviewer.cxx:333
    #21 0x0829ff97 in fgMainInit (argc=15, argv=0xbff7c144)
        at /home/flight/FG/fgfs/flightgear/src/Main/main.cxx:493
    #22 0x08251ce6 in main (argc=<error reading variable: Cannot access memory at address 0x0>,
        argv=<error reading variable: Cannot access memory at address 0x4>)
        at /home/flight/FG/fgfs/flightgear/src/Main/bootstrap.cxx:278