Howto:Reset/re-init Troubleshooting

From FlightGear wiki
This article is a stub. You can help the wiki by expanding it.


Nasal/Canvas dialog showing a reset/reinit control panel for troubleshooting

Motivation

The startup sequence has always proved to be rather tricky to get right
— Erik Hofman (Apr 18th, 2006). Re: [Flightgear-devel] Subsystem run-levels.
the property tree as it is currently is in need of some rework because of the ownship (single desktop aircraft) approach. This is easier than it sounds - basically most of the property tree becomes part of the aircraft and only a few items are shared. This will also allow the switching of aircraft. The reason to consider this now, and maybe not implement it, is to ensure that the design will support this when it is time to implement it.
— Richard Harrison (Nov 19th, 2015). Re: [Flightgear-devel] HLA developments.
Currently we don’t take good advantage of multi-core CPUs [...]

Defining a multi-threaded client-server architecture from the beginning would make a big difference today[...]

We’re also planning to use HLA/RTI, which will allow FlightGear to integrate with distributed simulation environments (important for industrial applications) and make better use of multi-core processors.
FlightGear currently has a large increase in memory usage on Reset (tested with c172p@...: 1.6GB -> minimum during reset 1.2GB -> probably-out-of-memory system hang at 2.0GB), but when I tried to trace this problem using AddressSanitizer's leak checker, the (many) leaks it found were much too small to explain this.
— Rebecca N. Palmer (Mar 25th, 2015). Re: [Flightgear-devel] Detecting circular-reference memory leaks.
Modularisation, isolation, and standardisation of the FG codebase is very much WIP. This is being pushed hard from 2 directions - James with the reinit() and Stuart with HLA. Note my CppUnit comment on the mailing list - if this takes off, then this process will be massively accelerated.


Instead of adding just-another-feature we need to strip it down to getting a fast and constant fps rendering engine. Everything else needs to run outside the main loop and has to interact with the core by, say HLA/RTI or whatever IPC we have.
I'd just recommend to not do it but to focus on detangling the subsystems and creating a distributed simulation instead. That's my vision.


Objective

reset/reinit control panel for regression testing purposes implemented using Nasal & Canvas

FlightGear does not currently support saving/loading flights or reliably switching between aircraft at run-time (this is extensively discussed at FlightGear Sessions).

Reset & re-init is an effort to refactor the FlightGear initialization process so that resetting, repositioning and switching aircraft are supported without having to exit and restart FlightGear. Currently, this functionality is exposed via the Canvas-based Aircraft Center, but most core developers consider it broken, or fragile at best.

The core developers aim to identify the dependencies between different subsystems[1] and to refactor them so that more and more subsystems can be made optional (analogous to run-levels)[2]. This would allow subsystems to be dynamically removed and re-added at run-time.

This is particularly important for untangling implicit or hard-coded dependencies among the different subsystems[3]. It is also one of the key tasks in moving certain subsystems into dedicated High-Level Architecture (HLA) federates.

One of the long-term goals is to provide a so-called "headless" mode, so that certain features/subsystems (unrelated to graphics) can be tested in isolation. An example would be running FlightGear in an automated fashion on the FlightGear Build Server. This could help automate more of the release process and related regression testing, which is another stated goal of several core developers[4][5][6][7][8].

The other goal is to increasingly modularize FlightGear by using HLA[9] and splitting off the simulation loops (see also FGViewer). It also means supporting different renderers (such as Rembrandt and ALS), scenery engines (e.g., standard and osgEarth) and weather engines, in the same way that FlightGear already supports different FDM engines (JSBSim and YASim). HLA will make it possible to move certain subsystems to dedicated cores by using separate threads or even processes. Some subsystems may even run on a different computer in a distributed setup.

The underlying requirement that these efforts share is that there needs to be a much better re-initialization process, with no hard-coded assumptions about running subsystems or initialization order.

For the time being, however, many of these efforts are not yet completely functional, so more feedback and data are needed.

You can help this effort by running the relevant APIs and reporting any segmentation faults that occur. Please include GDB (GNU Debugger) backtraces and bug reports. If you are on Windows, please send crash reports instead.


People running the code shown below should be prepared to trigger segfaults.

For testing purposes, it makes sense to run FlightGear using the minimal startup profile, with graphics/rendering disabled via draw masks.
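As a minimal sketch, the draw masks can also be switched off from the Nasal Console before running any tests. The property paths below are the ones used by the dialog script further down this page. The helper name setDrawMasks is hypothetical and not part of FGData.

```nasal
# Hypothetical helper (not part of FGData): switch the draw masks used by
# the reset/re-init dialog script on (enabled = 1) or off (enabled = 0).
var setDrawMasks = func(enabled) {
    foreach (var d; ["terrain", "aircraft", "models", "clouds"]) {
        setprop("/sim/rendering/draw-mask/" ~ d, enabled ? 1 : 0);
    }
};

# Disable rendering of terrain, aircraft, models and clouds before testing.
setDrawMasks(0);
```

Calling setDrawMasks(1) turns rendering back on.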

Background

James has added initial code to work on dynamic subsystem creation, so that subsystems can be added or removed at runtime.

Only some subsystems are supported so far, since many have non-default C++ constructors (e.g. systems designed as singletons) or other complexities (see FlightGear Run Levels for further details).

With this change, it's now possible to dynamically add and remove the traffic-manager at runtime using fgcommands, for example (pasted into the Nasal Console):

fgcommand("add-subsystem", props.Node.new({
    "subsystem": "traffic-manager",
    "name":"traffic-manager",
    "do-bind-init": 1
}));

The idea is to improve this further so that more and more subsystems in FlightGear can be toggled on and off at runtime. This should help other ongoing efforts, such as the FGCanvas project. Ultimately, it will make FlightGear more configurable and scalable. It will also make FlightGear easier to use for other purposes, such as distributed (multi-machine) setups, so that a single binary can serve different purposes.

In addition, it will be much easier for developers to do regression testing and benchmarking once subsystems can be completely disabled, a long-standing feature request (see FlightGear Headless); it would also help to simplify, and to some extent even automate, release preparations.

thanks to all the reset/re-init work, the C++ dependencies are fairly straightforward to decouple these days, I just made the tile manager optional in under 10 minutes, i.e. all the scenery/sky/stars is gone now - which means that FG is now utilizing less than 140 MB of RAM here using this "startup mode".

So this about more than just "FGCanvas": It will make troubleshooting so much more straightforward, and we can also explore feature-scaling and run-time benchmarking once this works. Currently, I'd even consider restructuring initialization to always boot up like this, because it's the safest possible subset of subsystems for now - i.e. roughly a dozen subsystems that basically stand no chance of "crashing" the sim. And from then on, additional initialization could be handled by Nasal obviously - and scripts could even scale up/down, depending on hardware support and features. With 14 subsystems remaining, I am now getting 650 fps when showing the Canvas GUI (Aircraft Center) here. As for C++ changes, I have come to the conclusion that this could be greatly simplified by making the whole aircraft shebang optional - i.e. aircraft as such, and all its related subsystems, e.g.:

  • view manager
  • FDM (obviously)
  • Autopilot, Property Rules
  • Route Manager
  • history
  • flight recorder


These are currently hard-coded as separate SGSubsystem instances in fg_init.cxx - even though this is conceptually a headache, simply because these should be all handled by a single SGSubsystemGroup and wrapped inside it.
— Hooray (Jul 6th, 2014). Re: FGCanvas Experiments & Updates.

Fgcommands

The relevant fgcommands are add-subsystem and remove-subsystem.

XML usage

<binding>
  <command>add-subsystem</command>
  <subsystem></subsystem>
  <name></name>
  <group></group>
  <do-bind-init></do-bind-init>
  <min-time-sec></min-time-sec>
</binding>
<binding>
  <command>remove-subsystem</command>
  <subsystem></subsystem>
</binding>

Nasal usage

Note: add-subsystem fails with the following error if the subsystem is already running, i.e. if it has not previously been removed:

do_add_subsystem: duplicate subsystem name:traffic-manager

fgcommand("add-subsystem", props.Node.new({
    "subsystem": "traffic-manager",
    "name": "traffic-manager",
    "group": "general",
    "do-bind-init": 1,
    "min-time-sec": 2
}));
fgcommand("remove-subsystem", props.Node.new({"subsystem": "traffic-manager"}));
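Repeated tests can trigger the duplicate-name error shown above. A small wrapper can avoid this by checking whether a subsystem is already running before adding or removing it. This is only a sketch and is not part of FGData. The helper names are hypothetical. The sketch also assumes that the subsystem-running fgcommand, which the dialog script on this page uses, returns a true value for running subsystems.

```nasal
# Hypothetical helpers (not part of FGData). isRunning() relies on the
# "subsystem-running" fgcommand used by the reset/re-init dialog script.
var isRunning = func(name) {
    return fgcommand("subsystem-running", props.Node.new({"subsystem": name}));
};

# Add a subsystem only if it is not running yet, so repeated tests do not
# trigger "do_add_subsystem: duplicate subsystem name".
var ensureSubsystem = func(name) {
    if (isRunning(name)) {
        print(name, " is already running, not adding it again");
        return 0;
    }
    return fgcommand("add-subsystem", props.Node.new({
        "subsystem": name,
        "name": name,
        "group": "general",
        "do-bind-init": 1
    }));
};

# Remove a subsystem only if it is currently running.
var dropSubsystem = func(name) {
    if (!isRunning(name)) return 0;
    return fgcommand("remove-subsystem", props.Node.new({"subsystem": name}));
};

# Usage: toggle the traffic manager off and on again.
dropSubsystem("traffic-manager");
ensureSubsystem("traffic-manager");
```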

List of subsystems

See flightgear/src/Main/subsystemFactory.cxx (line 74).
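You can also list the subsystems that are actually running at run-time by briefly enabling the performance monitor. The dialog script on this page uses the same approach. The sketch below assumes the monitor publishes entries under /sim/performance-monitor/subsystems/subsystem[n]/name, as that script expects. The 1.5 second delay is taken from the script and may need adjusting.

```nasal
# Sketch: enable the performance monitor, wait briefly, then print the
# names of the subsystems it reports and disable the monitor again.
var pm = props.globals.getNode("/sim/performance-monitor", 1);
pm.getNode("enabled", 1).setBoolValue(1);

var listTimer = maketimer(1.5, func() {
    var subsystems = pm.getNode("subsystems");
    var names = [];
    if (subsystems != nil) {
        foreach (var s; subsystems.getChildren("subsystem")) {
            var n = s.getNode("name");
            if (n != nil) append(names, n.getValue());
        }
    }
    print("Subsystems reported by the performance monitor: ", size(names));
    foreach (var name; names) {
        print("  ", name);
    }
    pm.getNode("enabled").setBoolValue(0);
});
listTimer.singleShot = 1; # run the callback only once
listTimer.start();
```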

Subsystem status

Broken Subsystems

Subsystem | Reinit | Removal | Adding | Notes | Memory | Backtrace (pastebin)
route-manager | - | segfault | segfault | needs to be tested | - |
environment | unknown | segfault | unknown | needs to be tested | - |
ephemeris | unknown | segfault | unknown | needs to be tested | - |
traffic-manager | unknown | unknown | unknown | needs to be tested | - | -
gui (PUI) | unknown | unknown | unknown | needs to be tested | bunch of runtime errors, but massive performance improvement when disabled entirely! | -
ai-model-mgr | unknown | segfault | unknown | needs to be tested | - |
aircraft-lighting | unknown | segfault | unknown | needs to be tested | - |
sound | unknown | unknown | unknown | needs to be tested | - | -

Working Subsystems

minimal subset of required fg subsystems

Note: The subsystems shown in the table below seem to be fairly safe to remove and re-add at run-time. It should therefore be possible to create an extended minimal startup profile, implemented in Nasal, that disables unneeded subsystems during startup.

Missing subsystems

Note: The following subsystems are currently not supported by subsystemFactory.cxx. The reasons are inherent complexities, non-default constructors or hard-coded run-time assumptions (e.g. Nasal/events or the tile-manager).
  • performance-monitor
  • ATC-Old
  • xml-autopilot
  • terrasync
  • navcache (not a true subsystem currently!)
  • time
  • tile-manager (reinit is supported)
  • events (keeps Nasal timers)
  • nasal

Nasal dialog

updated screenshot showing the reset/re-init control panel for helping troubleshoot reset/re-init related segfaults
Note: The following Nasal script tests different aspects of reset/re-init. You can execute it via the Nasal Console, or put it in a separate file and run it from a menu item. It is intended as a stress test: it lets you switch aircraft, relocate repeatedly and/or stop/restart and suspend/resume different subsystems. Some of these controls are still placeholders. For the time being, you are likely to trigger segfaults/crashes or other undefined behavior (e.g., memory leaks). It is therefore recommended to run FlightGear in a GNU Debugger (GDB) session to obtain a backtrace. If you manage to trigger a bug or crash, please file a bug report on the issue tracker.
canvas.MessageBox.warning(
    "Developer Feature",
    "This dialog is mainly intended for people familiar with FlightGear/core internals to help troubleshoot reset/re-init related bugs. You will probably want to run FlightGear inside a GNU Debugger (GDB) session when using this dialog",
    func(sel){
        if(sel != canvas.MessageBox.Ok) return;

###################################################################
var (width, height) = (700, 480);
# https://sourceforge.net/p/flightgear/mailman/message/35048478/
var title = 'Reset/re-init Panel (aka Segfault Paradise)';
 
var window = canvas.Window.new([width, height], "dialog")
    .set("title", title);
 
var myCanvas = window.createCanvas().set("background", canvas.style.getColor("bg_color"));
 
var root = myCanvas.createGroup();
 
var myLayout = canvas.VBoxLayout.new();

myCanvas.setLayout(myLayout);

var controls = canvas.HBoxLayout.new();
myLayout.addItem(controls);

var drawMaskHBox = canvas.HBoxLayout.new();
myLayout.addItem(drawMaskHBox);

# create a scrollbar  
var scroll = canvas.gui.widgets.ScrollArea.new(root, canvas.style, {size: [96, 128]}).move(20, 100);
myLayout.addItem(scroll, 1);

var scrollContent = scroll.getContent()
    .set("font", "LiberationFonts/LiberationSans-Bold.ttf")
    .set("character-size", 16)
    .set("alignment", "left-center");

var list = canvas.VBoxLayout.new();
scroll.setLayout(list);

var fgcommandCb = func(command, arguments) {
 return func() {
  fgcommand(command, props.Node.new(arguments) );
 };
}

##
# vector with control buttons/callbacks (shown at the top of the dialog)

var ControlButtons = [
	{name: "Global reset", callback: fgcommandCb("reset", {}) },
	# see $FG_SRC/Main/fg_commands.cxx (do_switch_aircraft)
	{name: "Reload aircraft", callback: fgcommandCb("switch-aircraft", {aircraft:'ufo'}) },
	# WIP: (placeholders)
	{name: "Relocate:KSFO", callback: func() {} },
	{name: "Relocate:KRNO", callback: func() {} },
	{name: "Pause/unpause", callback: func() {} },
];

foreach(var c; ControlButtons) {
controls.addItem(
	canvas.gui.widgets.Button.new(root, canvas.style, {})
	.setText(c.name)
	.setFixedSize(120, 25)
	.listen("clicked", c.callback)
);
} # foreach control


drawMaskHBox.addItem(
	canvas.gui.widgets.Label.new(root, canvas.style, {wordWrap: 0})
	.setText("Draw masks:")
);

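var draw_masks = ["terrain", "aircraft", "models", "clouds"];

# Nasal loops do not create a new scope, so the listener is created by a
# helper function; otherwise every checkbox would toggle the last mask ("clouds").
var makeMaskToggle = func(prop) {
    return func(e) {
        setprop(prop, e.detail.checked);
    };
};

foreach(var d; draw_masks) {
    var prop = "/sim/rendering/draw-mask/" ~ d;
    drawMaskHBox.addItem(
        canvas.gui.widgets.CheckBox.new(root, canvas.style, {})
            .setText(d)
            .setChecked(getprop(prop) ? 1 : 0)
            .listen("toggled", makeMaskToggle(prop))
    );
}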
 
##
# helper for creating an event handler
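var resetHandler = func(command, arguments){
    return func(){
        var name = arguments.name;
        logprint(4, "testing subsystemFactory for:" ~ name);
        var args = {"subsystem": name};
        # add-subsystem also takes a name and do-bind-init (see the Nasal usage example)
        if (command == "add-subsystem") {
            args["name"] = name;
            args["do-bind-init"] = 1;
        }
        fgcommand(command, props.Node.new(args));
    };
};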

##
# call this to add buttons to trigger a test
# 
var addTest = func(root, layout, test){ 
    # create a new layout
    var row = canvas.HBoxLayout.new();
    layout.addItem(row);

    var label = canvas.gui.widgets.Label.new(root, canvas.style, {wordWrap: 0});
    label.setText(test.name);
    row.addItem(label);

    var status = canvas.gui.widgets.Label.new(root, canvas.style, {wordWrap: 0});
    status.setText("nop");
    row.addItem(status);

    test.status =  status;


# this adds a row of buttons for the 3 currently supported fgcommands
# (support for suspend/resume is pending)
    foreach(var cmd; ["reinit", "remove-subsystem", "add-subsystem"]) { 
        var button = canvas.gui.widgets.Button.new(root, canvas.style, {})
            .setText(cmd)
            .setFixedSize(150, 25);
        button.listen("clicked", resetHandler(cmd, test));
        row.addItem(button);
    }

    return status; # we want to update the label elsewhere
}; # addTest
 
##
# vector with hashes containing subsystems that will be added as buttons to the dialog
# each hash will end up with an additional "label" field that is updated separately
# Also, the hash can be extended to add tooltips and/or other meta information that can be
# used for dealing with subsystems differently (think Nasal depending on events, sound/tilemgr being threaded etc).
# 
# this must match $FG_SRC/Main/subsystemFactory.cxx, so you need to extend this 
# code if you are using this in conjunction with a topic branch like the FGPythonSys branch
##
# NOTE: in the current form of the script, this is a vector with hashes
# that will be populated by briefly running the performance monitor
# with a name field containing the name of the subsystem
# and a status field pointing to the Canvas text node for showing active/inactive flags
var Tests = [];

var performance_monitor = props.globals.getNode('/sim/performance-monitor', 1);

var subsystemListTimer = maketimer(1.5, func(){
    # the performance monitor may not have published any entries yet
    var subsystemsNode = performance_monitor.getNode("subsystems");
    var entries = (subsystemsNode == nil) ? [] : subsystemsNode.getChildren('subsystem');
    foreach(var s; entries) {
        var name = s.getNode('name').getValue();
        print("Subsystem found:", name);
        append(Tests, {name: name});
    }

    print("subsystem list retrieved, disabling performance monitor again");
    performance_monitor.getNode('enabled', 1).setBoolValue(0); # disable the performance monitor again

 
debug.benchmark("button setup", func() {

##
# add buttons for each test to the scrollbar layout
foreach(var test; Tests){
    # will add label fields to each test, so that labels can be dynamically updated
    # using a different callback
    addTest(root: scrollContent, layout: list, test: test);
}
}); # button setup (benchmark)

var dynamicLabels = [];

var subsystemMonitor = func(){

    # update labels showing frame rate & frame spacing
    foreach(var d; dynamicLabels) {
        # FIXME: dynamic labels should probably be using sprintf() style format strings
        d.label.setText(d.cb() ~ d.suffix);
    }

    foreach(var test; Tests){
        var isRunning = fgcommand("subsystem-running", props.Node.new({"subsystem": test.name}));
        var neededLabel = isRunning ? " (active)" : " (inactive)";
        # only update the label when the state has changed, instead of reading
        # the private _view._text member of the Label widget
        if (contains(test, "lastLabel") and test.lastLabel == neededLabel) continue;
        test.lastLabel = neededLabel;
        test.status.setText(neededLabel);
    }
}

var statusbar =canvas.HBoxLayout.new();
myLayout.addItem(statusbar);

var version=canvas.gui.widgets.Label.new(root, canvas.style, {wordWrap: 0});
version.setText("FlightGear v" ~ getprop("/sim/version/flightgear"));
statusbar.addItem(version);

## placeholders for dynamic labels 

var fps=canvas.gui.widgets.Label.new(root, canvas.style, {wordWrap: 0});
fps.setText("45 fps");
statusbar.addItem(fps);
append(dynamicLabels, {label:fps, cb:func() {return ""~ getprop("/sim/frame-rate");}, suffix:' fps' });

var ms=canvas.gui.widgets.Label.new(root, canvas.style, {wordWrap: 0});
ms.setText("35 ms");
statusbar.addItem(ms);
append(dynamicLabels, {label:ms, cb:func() {return ""~ getprop("/sim/frame-latency-max-ms");}, suffix:' ms' });


var myTimer = maketimer(0.1, subsystemMonitor);
myTimer.start();

window.del = func(){
    print("Cleaning up window: ", title);
    myTimer.stop();
    call(canvas.Window.del, [], me);
};


}); # end of initial/setup timer

performance_monitor.getNode('enabled', 1).setBoolValue(1);
subsystemListTimer.singleShot = 1; # timer will only be run once
gui.popupTip("Getting list of known subsystems from the performance monitor ...",1.5);
subsystemListTimer.start();


###################################################################
    }, # event handler for messageBox
    canvas.MessageBox.Ok |canvas.MessageBox.Cancel | canvas.MessageBox.DontShowAgain
);