Reset & re-init

From FlightGear wiki
Jump to navigation Jump to search
Reset & Re-init
Started in 01/2013
Description Fixing reset & re-init
Maintainer(s) Zakalawe
Contributor(s) User:Zakalawe (since 01/2013),
Status merged (under active development as of 01/2013)
Screen shot showing a the performance monitor in a patched version of FlightGear 3.2 where subsystem initialization is made better configurable and increasingly optional by allowing subsystems to be explicitly disabled/enabled during startup. Decoupling internal subsystem dependencies means that we can more easily provide support for benchmarking, but also headless regression testing - and eventually, also a standalone FGCanvas startup mode.
See also Initializing Nasal early, Aircraft Center and FGCanvas.

Scheme agreed by James & Thorsten at FSWeekend 2012 to fix reset / re-init weirdness.

Motivation

  • Re-init of all subsystems doesn't work, because many subsystems don't support it. (There's an ticket #419 open bug about this in the tracker). Instead we have 'reset', which does some special operations, but is different logic from normal startup. This leads to bugs when the reset behaviour differs to a fresh start.
  • JSBSim has issues with resets since it doesn't read simulation state each frame - this is why you can't resume flying from a recording / replay for JSBSim aircraft.
    • Note this is a different issue from problems with in-air starts, which also needs some tweaking to work with JSBSim.
Cquote1.png Actually my most recent observations show some issues with multiple teleports, or combination of teleports and resets. Maybe that should be part of a minimum test before something is commited[1]
— Jim Wilson
Cquote2.png

The last time, "reset/reinit" (as well as saving/loading flights) was reported to work properly, was prior to the "presets" work:

Cquote1.png Yes, reset and save/restore are a bit broken in FlightGear right now. I'll try to fix them when I have a chance, but it will require a bit of refactoring (removing Curt's initial-state stuff, for example).[2]
— David Megginson
Cquote2.png

For details, see Fixing Presets.

Cquote1.png

The original reset was a simple kludge that worked fine for a simple program; now that FlightGear is a lot more sophisticated, we need to refactor a bit rather than just holding things together with scotch tape and bubble gum.

The problem is that right now we have two different approaches - a lot of stuff is handled cleanly in individual modules' update() methods, but a lot of stuff is still handled by hundreds of lines of BASIC-like, hard-to-read spaghetti code in main.cxx and fg_init.cxx (forget OO, it's not even functional programming).

That was the right idea to get started - I'm not a fan of wasting time on an initial OO design that you won't end up using -- but now we need to finish cleaning it up. We need to make this a high priority.[3]
— David Megginson
Cquote2.png
  1. Jim Wilson (Wed, 05 Jun 2002 16:38:32 -0700). Crash on "Reset" function..
  2. David Megginson (Mon, 24 Nov 2003 07:22:13 -0800). Re: Helicopters: wow!.
  3. David Megginson (Thu, 06 Jun 2002 05:03:39 -0700). Crash on "Reset" function..

Status / related Commits (01/2016)

reset looks to most subsystems like ‘off and on again’, i.e nothing should survive in terms of instances - everything is destroyed and re-created. This includes Nasal so add-ons should just work. Any property data needs to survive should be marked on the SGPropertyNode with a flag (I think it’s ‘archived’ but I will check), it will be saved across the reset, and hence visible to the subsystem when it comes back up again. No special Nasal or signals should be needed, since the idea is that reset looks exactly like a full shutdown and then a clean start. Adding additional signals will break that concept, and potentially bring us back to a dangerous world where some systems behave differently on reset to first-time-init, which is a very bad thing in my opinion.[1]


Cquote1.png I guess we know already that one ain't really working :-/ Unless this has been fixed, any reset freees all uniforms in the effect systems to their value at reset - doesn't show dramatically initially, but in fact lighting never changes from that point, snow line doesn't, dynamical displacement of shadows doesn't work,...
— Thorsten (Sep 25th, 2015). Re: Please test the 3.6 RC.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png We have already cleaned up a lot of subsystem / initialization things over the past year - and more things are planned. Eventually, I hope we can just recreate an FDM at any time ("out of thin air"... ;-) ) with arbitrary initial settings.
Cquote2.png
Cquote1.png In-sim reset causes a whole host of subtle issues in addition (not sure how much of these reposition triggers...) so I'd recommend not to expect this to work and not to use it without a good cause at the moment.
Cquote2.png

Over the next few days James is going to start switching over the reset system to use the new code. The following changes will happen, in order:

  • places that only need to re-position will use the new ‘reposition' command, which is a simplified version of the current reset (and will become even simpler over time, but for now the goal is maximum compatibility). This means the 'select airport' and 'position in air' dialogs, principally.
  • the reset command will be switched to use the new logic, which is slightly slower than the current logic, but much more complete. It's very close to shutting down the simulator and re-starting; in particular Nasal is restarted, and it should be possible in the near future to toggle between Rembrandt and the classic renderer. (The only UI-path that triggers a reset its the actual menu command / key-shortcut - If anyone knows of some strange way we trigger a reset, please do let me know)
  • the "save / restore initial state" logic in globals will be removed, since it will no longer ever be used
  • Many properties which are flagged as 'preserved' or 'userarchive' need to be audited for correctness in the new system. This will be an iterative process, but James will try to address issues as quickly as possible. Unfortunately each time a fix in preferences.xml is made, testers will need to delete their auto-save file. This is the price of living on the bleeding edge :)

James is not expecting any build failures since the code is already in Git, but I would ask that any bug-reports are only made /after/ wiping your autosave XML file.

See: FlightGear commit 881df711

2013-11-12 23:36 James Turner       Reset: changes for SGSubsystem ownership.
2013-11-12 23:10 James Turner       Reset: adjust for tweaked TerraSync API
2013-11-12 22:26 James Turner       Reset: explicit close-window function.
2013-11-11 10:13 James Turner       Reset: uninstall deletion-manager
2013-11-05 05:31 James Turner       Reset: route-manager guard against no plan.
2013-11-11 10:11 James Turner       Reset: ATIS shutdown fix  
2013-11-10 12:32 James Turner       Reset: panel-node cleans up listener.
2013-11-05 05:32 James Turner       Reset: guard against NULL HUD / lighting
2013-11-05 05:32 James Turner       Reset: sound manager can be shutdown cleanly
2013-11-05 05:30 James Turner       Reset: drop FGLight prop refs on unbind()
2013-11-05 05:29 James Turner       Reset: remove commands
2013-11-11 10:13 James Turner       Reset: use simple properties for TimeManager
2013-10-06 17:36 James Turner       Reset: Nasal can be shutdown.

Remaining Issues

See also ticket #1702

1rightarrow.png See Howto:Reset/re-init Troubleshooting for the main article about this subject.

Cquote1.png FlightGear currently has a large increase in memory usage on Reset (tested with c172p@...: 1.6GB -> minimum during reset 1.2GB -> probably-out-of-memory system hang at 2.0GB), but when I tried to trace this problem using AddressSanitizer's leak checker, the (many) leaks it found were much too small to explain this.
— Rebecca N. Palmer (2015-03-25). Re: [Flightgear-devel] Detecting circular-reference memory leaks.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png this is _not_ a circular-reference leak, but not actually unreachable memory at all: the old objects are still referenced from somewhere, just not anywhere that actually knows it should free them.

The amount of memory lost depends on location but not significantly on aircraft: about 150MB over ocean, plus as much as the scenery was using over land, suggesting that the lost object is roughly "all the scenery".


— Rebecca N. Palmer (2015-03-26). Re: [Flightgear-devel] Large memory "leak" on Reset.
(powered by Instant-Cquotes)
Cquote2.png
Cquote1.png memory usage (at least immediately) after Go To Airport appears to be the _maximum_ of normal usage at the 'before' and 'after' locations, not simply the usage at the 'after' location (expected) or their sum (simple leak).
— Rebecca N. Palmer (2015-03-26). Re: [Flightgear-devel] Large memory "leak" on Reset.
(powered by Instant-Cquotes)
Cquote2.png

Solution

  • Get rid of reset / reinit entirely, and simply re-run the normal init sequence after tearing down almost everything. (What remains, is to be decided by testing - probably scenery, maybe nothing else at all).
  • The requires reliable deleting + cleanup of subsystems - i.e fixing and fixing all the places using statics incorrectly. Some work towards this has already been done.
  • The current property tree mirror needs to be replaced with some similar, but simpler logic, to preserve the user-archive properties, and re-overlay them after the property tree is rebuilt.
    • This is simpler than the current 'clone the entire property tree' logic which also causes bugs. Indeed, one approach would be to actually update the autosave.xml as we do at shutdown, and re-read it. Hence no difference between a reset and quit + restart.
  • Subsystems that remain alive must hence be checked that they unbind / re-bind correct, since we will get a new, clean property tree root. This will be easy for some cases, tricky for others. Probably only subsystems related to rendering and display should even be considered for this.

Benefits

  • No more 'X breaks on reset' bugs - less code to maintain and fewer special cases in the code.
  • Dynamic switching of aircraft (saving/loading flights, see Ticket #1151)
    • Since the property tree will be rebuilt, we can safely read a different -set.xml file without the universe imploding.
  • Restarting of Nasal (re-read scripts on the fly, also see Nasal Events [2] [3])
    • Since it's a new property tree, we have to destroy and restart the Nasal context anyway. This should greatly simplify aircraft Nasal script development.
  • Separate startup GUI frontends like fgrun may become obsolete or at least optional eventually, because FlightGear itself would become increasingly runtime-configurable (we have about 10-15 GUI launchers for FG currently).
    • once FlightGear can be propertly re-initialized, providing a Canvas-based GUI on top would be fairly simple, see Aircraft Center
    • this should help improve end user usability - because FlightGear would become more self-contained, without inevitably requiring external tools to provide basic GUI configurability.
    • given the number of separate FlightGear launchers, there are also lots of developers involved in developing and maintaining these external software packages, some of them also active FlightGear core developers/contributors - so this step could further help channel important development resources, to focus on the main project, i.e. fgfs.

Implementation

  • This is a complex change, which cannot easily be done incrementally. It will inevitably cause some breakage and issues while everything is tested.
  • The init / shutdown code, including FGGlobals, needs to be adjusted to do the actual deletion and run init again. Most of the other init steps (configuration options) are already re-factored in this direction anyway.
  • Any subsystems which should be retained, such as tile-manager or renderer, need to be checked very carefully.
    • This likely requires some new SGSubsystemGroup hooks to extract the subsystems from their owning groups so they escape shutdown and deletion.
  • Any systems with other threads need to be inerted during the reset. In particular any osgDB paged loaders, or TerraSync threads, need to be paused or audited for safety. TerraSync should be relatively simple (since it's a regular subsystem), osgDB interactions, especially with properties, will be harder.
    • All animations and effects will need to be destroyed for sure, since they have property tree references.

Future Changes (post 3.2)

Based on experimenting with Initializing Nasal early and FGCanvas, here's some feedback to hopefully further simplify the whole process and make it more flexible in the future (comments welcome!):

  • making sure that all subsystems actually implement SGSubsystem would be helpful,especially those that are separately handled by reset/re-init (e.g. FGInterpolator, NavDataCache) - see Tom's commit at FlightGear commit 0f14a2d7
  • we have several corner-cases of FGGlobals fields that are separately managed by logic in fg_init.cxx: we should probably even consider turning the SGPropertyNode root in globals.cxx into a wrapped SGSubsystem - because the reset/re-init logic explicitly contains stuff to deal with the property tree: Whenever a "member" needs to be explicitly handled to support reset/re-init, we should consider wrapping them in a separate SGSubsystem class so that we can encapsulate the reset/re-init logic there, instead of doing that separately in fg_init.cxx, where all the important subsystem-specific logic is kept separate from the actual subsystem (not maintenance-friendly).
  • lots of problems result from the ambiguity of resets vs. re-init/repositioning, it would make sense to extend SGSubsystem accordingly and expose two virtual methods to handle reset and repositioning explicitly per object, and not separately-we'd want to extend SGSubsystemGroup accordingly, to ensure it can delegate events to each member
  • Nasal is currently being dealt with separately, and quite frankly, it doesn't belong into the same SGSubsystemMgr - we'd be MUCH better off by adding a simple SGSharedPtr<FGNasalSys> to globals.hxx with a getter. That way, we can easily allocate & initialize Nasal very early, without having to manually call nasal->init() in fgPostInitSubsystems()
  • init order is currently kinda fragile, it'd make sense to internally use each subsystem's bind/unbind and postinit() methods to ensure that cleanup is handled there, and not in fgPostInitSubsystems() or fgStartNewReset(), each run-level/group would then internally manage things:
    • Allocating Nasal as early as possible is a pre-requisite for using bind() to add Nasal/cppbind bindings, because the instance pointer must be valid obviously (i.e. have _context/_globals set up already)
    • each Subsystem that provides Nasal APIs, needs to implement the bind/unbind APIs to register/release Nasal APIs
    • Given the add-subsystem/remove-subsystem fgcommands/APIs, we can no longer afford having initNasalFOO() code in FGNasalSys::init() that depends on other subsystems being present. Subsystem-specific APIs must never be unconditionally added, they need to be added/released (managed) by the underlying subsystem itself, or things will break inevitably.
  • when re-ordering SGSubsystem initialization in fgCreateSubsystems I noticed that /some/ of its systems are pretty sensitive to ordering issues, which is non-obvious because things will typically "work", until the next reset, which is when race conditions may show up (or not). We should strive to formalize such dependencies and then encapsulate them using SGSubsystemGroups - which will also be useful to better establish threading-dependencies. For the time being, it helps to shut down all background threads manually when resetting.
  • getting rid of singletons could help simplify the init process
  • Once we use a PropertyObject<int> for idle_state in main.cxx, we can easily use Nasal to implement the whole splash screen loop in fgIdleFunction() using a single listener.
  • fixing up subsystems with non-default ctors:
    • performance-monitor
    • xml-autopilot
    • xml-proprules
    • httpd
    • events
    • materials
  • there are a few subsystems that are intended to remain PERSISTENT per session, e.g. terragear, time, lighting, events - those should probably be put into a dedicated SGSubsystemGroup that isn't subject to reset/re-init at all, e.g. outside the actual subsystem_mgr, or maybe using a new GroupType ?
  • fgCreateSubsystems() in fg_init.cxx and subsystemFactory.cxx are now overlapping in functionality, and we've seen some inconsistencies here that may further complicate matters FlightGear commit a52c0882. It would make sense to get rid of fgCreateSubsystems() and move this into scripting space by using the add-subsystem fgcommand APIs (have this kinda working in the Initializing Nasal early branch, using a boot script).
  • we can emulate run-levels by grouping related SGSubsystems into SGSubsystemGroups which would greatly simplify things Tom's Canvas code is already using the notion of a PropertyBasedElement which wraps SGSubsystemGroup and SGPropertyChangeListener. This means we could allocate/free groups of subsystems and treat them as run-levels e.g.:
    • CORE: logger, prop-interpolator, performance-mon
    • AUDIO: sound, fgcom, voice
    • NETWORK: io, mp, http
    • VIDEO: gui, Canvas, CanvasGUI
    • SCENERY: tile-manager, lighting, environment, ephemeris
    • WORLD: ai, atc, traffic
    • AIRCRAFT: flight, controls, systems, instrumentation, hud, cockpit-displays, replay, history, autopilot, route-manager
  • as per Initializing Nasal early, it would make sense to prepare delegating options.cxx and especially "presets" to scripting space
  • memory cleanup must be improved, see ticket #1559.
  1. James Turner  (Oct 7th, 2017).  Re: [Flightgear-devel] Addons - reinit (fgcamera) .