Failure Manager

Started in	02/2014
Description	Failure management framework
Maintainer(s)	galvedro and Hooray
Contributor(s)	galvedro
Status	Merged
Folders	$FG_ROOT/Nasal/FailureMgr; $FG_ROOT/Aircraft/Generic/Systems;
Topic branches:
fgdata	gitorious/fg/galvedros-fgdata/dev-failure-manager

Note: Whenever possible, please refrain from modeling complex systems, like an FDM, autopilot or Route Manager with Nasal. This is primarily to help reduce Nasal overhead (especially GC overhead). It will also help to unify duplicated code. The FlightGear/SimGear code base already contains fairly generic and efficient systems and helpers, implemented in C++, that merely need to be better generalized and exposed to Nasal so that they can be used elsewhere. For example, this would enable Scripted AI Objects to use full FDM implementations and/or built-in route manager systems.

Technically, this is also the correct approach, as it allows us to easily reuse existing code that is known to be stable and working correctly.

For details on exposing these C++ systems to Nasal, please refer to Nasal/CppBind. If in doubt, please get in touch via the mailing list or the forum first.

Objective

Design a framework to unify and simplify failure simulation, both from the system modeler and end user perspective.

Status (12/2014)

A first stable version of the framework has been available since 3.2. See the project sidebar for pointers to the code.
Necolatis is working on a more capable Canvas-based GUI to replace the old one.
Galvedro is revising the architecture to support aircraft-provided wear/damage/failure models.

Current Situation (3.1 and earlier)

All systems and most instruments implemented in the C++ core support basic failure simulation by means of a serviceable property. Generally, when this property is set to false, the system will stop updating itself. Some of these systems may support additional, more elaborate types of failures.
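To make the serviceable convention concrete, here is a minimal Python sketch. The real systems are implemented in C++; the Altimeter class, the plain dict standing in for the property tree, and the "reading freezes while failed" behavior are all illustrative assumptions.

```python
# Hypothetical sketch of the "serviceable" convention. The property tree is
# modeled as a plain dict; while the serviceable flag is false the instrument
# skips its update, so its last reading stays frozen.
class Altimeter:
    PATH = "instrumentation/altimeter/serviceable"

    def __init__(self, props):
        self.props = props
        self.props.setdefault(self.PATH, True)  # serviceable by default
        self.indicated_ft = 0.0

    def update(self, true_altitude_ft):
        if not self.props[self.PATH]:
            return  # failed: stop updating, display stays frozen
        self.indicated_ft = true_altitude_ft


props = {}
alt = Altimeter(props)
alt.update(1000.0)
props[Altimeter.PATH] = False  # fail the instrument
alt.update(2000.0)             # ignored while unserviceable
print(alt.indicated_ft)        # 1000.0
```

The convention is simple, but it is also strictly boolean, which is the root of several limitations listed below.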

Other than this convention of using a serviceable property, there is no framework on the C++ side with regard to failure simulation. There is, however, a Nasal submodule that can generate random failures by taking user input from the GUI and controlling the relevant properties.

The approach is good, but the main problem is that the supported types of failures are hardcoded both in the Nasal module and the GUI.

Limitations

The GUI presents a fixed set of failures that can be simulated, regardless of what systems are actually implemented in the aircraft.
Aircraft cannot add their own implemented failures to the set in a clean way.
Failures are considered boolean by the framework, i.e. either serviceable or not serviceable. There is no way to express intermediate states of failure.
Only random failures based on time or usage cycles are supported.
In general, the framework is not extensible.

Proposed improvements

The proposal is to keep the current scheme of having a Nasal submodule dedicated to failure simulation, but to overhaul it to overcome the limitations stated above. To accomplish that, we will raise its status to a full-fledged Failure Manager.

Here are some desirable traits for the new module:

To start with, the Failure Manager should definitely _not_ implement any particular system failure by default, but should provide the logic for managing random events or on-demand failure mode activation.

It should also provide a subscription interface so aircraft systems can register their own failure modes. After all, only the aircraft "object" is really aware of what systems are being modeled.

It should not make any assumptions about how failure modes are activated (i.e. it should not assume a serviceable property). Instead, the Failure Manager should use an opaque interface for setting a "failure level" and leave the details of activation to user scripts.

The Failure Manager should also support a flexible set of trigger conditions, so failure modes can be programmed to be fired in different ways, for example at a certain altitude.

GUI dialogs should be generated procedurally, based on the set of supported failure modes that has been declared by the aircraft.

Structure

The new module includes three components:

A Nasal submodule that implements the core Failure Manager ($FG_ROOT/Nasal/FailureMgr).
A Nasal library of triggers and actuators for programming the Failure Manager ($FG_ROOT/Aircraft/Generic/Systems/failures.nas).
A compatibility script that programs the Failure Manager to emulate previous behavior ($FG_ROOT/Aircraft/Generic/Systems/compat_failure_modes.nas). Currently loaded by default on startup.

The public Nasal interface for programming the FailureMgr is documented here: $FG_ROOT/Nasal/FailureMgr/public.nas

The design revolves around the following concepts, all of them implemented as Nasal objects.

FailureMode: A failure mode represents one way things can go wrong, for example, a blown tire. A given system may implement more than one failure mode. Each failure mode stores its current failure level as a floating-point number in the range [0, 1], so non-boolean failure states can be supported.
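A failure mode with a continuous level could look roughly like the Python sketch below. This is not the Nasal FailureMode class from FailureMgr; the class shape, the clamping, and the convention that 0.0 means fully serviceable and 1.0 fully failed are assumptions made for illustration.

```python
class FailureMode:
    """Hypothetical sketch: one way a system can fail, with a level in [0, 1].

    Assumed convention: 0.0 = fully serviceable, 1.0 = completely failed,
    values in between express partial failure (e.g. a slowly deflating tire).
    """

    def __init__(self, mode_id, description):
        self.id = mode_id
        self.description = description
        self._level = 0.0

    @property
    def level(self):
        return self._level

    def set_level(self, level):
        # Clamp so callers cannot push the level outside the valid range.
        self._level = min(1.0, max(0.0, float(level)))


tire = FailureMode("gear/left-tire", "Blown left main tire")
tire.set_level(0.4)  # partial failure
tire.set_level(1.7)  # clamped to 1.0
```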

FailureActuator: Actuators are attached to FailureModes and encapsulate a specific way to activate the failure simulation. They can be simple wrappers that change a property value, but they could also implement more complex operations. By encapsulating the way failure modes are activated, the Failure Manager does not depend on conventions like the serviceable property, and can be easily adapted to control systems designed in different ways.
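The simplest kind of actuator, one that wraps a serviceable property, might be sketched in Python as follows. It is loosely in the spirit of the set_unserviceable helper mentioned later on this page, but its real Nasal signature in failures.nas is not reproduced here; the class name, the dict-based property tree, and the "any non-zero level means unserviceable" mapping are assumptions.

```python
class PropertyActuator:
    """Hypothetical actuator mapping a failure level onto a boolean
    'serviceable' property. The property tree is modeled as a dict."""

    def __init__(self, props, path):
        self.props = props
        self.path = path

    def set_failure_level(self, level):
        # A boolean property cannot express partial failure, so this
        # (assumed) mapping treats any non-zero level as unserviceable.
        self.props[self.path] = (level == 0.0)

    def get_failure_level(self):
        return 0.0 if self.props.get(self.path, True) else 1.0


props = {"instrumentation/altimeter/serviceable": True}
altimeter_failure = PropertyActuator(props, "instrumentation/altimeter/serviceable")
altimeter_failure.set_failure_level(0.5)
```

The lossy level-to-boolean mapping lives in the actuator, which is the point of the design: the manager only deals in levels, and each actuator decides what a level means for its system.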

Trigger: A Trigger represents a condition that makes a given FailureMode become active. The failures.nas library currently supports the following types: altitude, waypoint proximity, timeout, MTBF (mean time between failures) and MCBF (mean cycles between failures). More can be easily implemented by extending the FailureMgr.Trigger Nasal interface.
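The idea of a trigger interface plus a concrete altitude trigger can be sketched in Python. This mirrors the concept only; the method name update(state), the state dict with an "altitude-ft" key, and the optional min/max bounds are assumptions, not the FailureMgr.Trigger Nasal interface.

```python
class Trigger:
    """Hypothetical base interface: a trigger only reports whether its
    condition is met; it knows nothing about failure modes or actuators."""

    def update(self, state):
        raise NotImplementedError


class AltitudeTrigger(Trigger):
    """Reports True while altitude is within [min_ft, max_ft].
    Either bound may be None to leave that side open."""

    def __init__(self, min_ft=None, max_ft=None):
        self.min_ft = min_ft
        self.max_ft = max_ft

    def update(self, state):
        alt = state["altitude-ft"]
        if self.min_ft is not None and alt < self.min_ft:
            return False
        if self.max_ft is not None and alt > self.max_ft:
            return False
        return True


above_fl100 = AltitudeTrigger(min_ft=10000)
above_fl100.update({"altitude-ft": 5000})   # False
above_fl100.update({"altitude-ft": 12000})  # True
```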

FailureMgr: The Failure Manager itself. Keeps a list of supported failure modes that can be added or removed dynamically using a Nasal API. It also offers a Nasal interface for attaching triggers to failure modes (one trigger per failure mode). While enabled, the FailureMgr monitors trigger conditions, and fires the relevant failure modes through their actuators when their trigger becomes active. The FailureMgr can be enabled and disabled on command, both from Nasal and the property tree.
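The monitoring loop described above can be sketched in Python as follows. It is not the Nasal API documented in public.nas: triggers and actuators are modeled as plain callables, and the choice to detach a trigger once it fires (so the mode is not re-applied every frame) is an assumption of this sketch.

```python
class FailureMgr:
    """Hypothetical sketch of the manager loop (not the Nasal API).

    Failure modes are registered with an actuator (a callable taking a level);
    triggers are callables taking a state dict and returning True when their
    condition is met. At most one trigger is attached per failure mode.
    """

    def __init__(self):
        self.actuators = {}  # mode_id -> actuator
        self.triggers = {}   # mode_id -> trigger
        self.enabled = True

    def add_failure_mode(self, mode_id, actuator):
        self.actuators[mode_id] = actuator

    def remove_failure_mode(self, mode_id):
        self.actuators.pop(mode_id, None)
        self.triggers.pop(mode_id, None)

    def set_trigger(self, mode_id, trigger):
        if mode_id not in self.actuators:
            raise KeyError(mode_id)
        self.triggers[mode_id] = trigger  # replaces any previous trigger

    def update(self, state):
        """Poll all triggers once; fire modes whose trigger is active."""
        if not self.enabled:
            return []
        fired = []
        for mode_id, trigger in list(self.triggers.items()):
            if trigger(state):
                self.actuators[mode_id](1.0)  # fully fail the mode
                del self.triggers[mode_id]    # assumed: a fired trigger is consumed
                fired.append(mode_id)
        return fired


levels = {}

def engine_actuator(level):
    levels["engine"] = level

mgr = FailureMgr()
mgr.add_failure_mode("engine", engine_actuator)
mgr.set_trigger("engine", lambda state: state["altitude-ft"] > 10000)
mgr.update({"altitude-ft": 5000})   # -> []
mgr.update({"altitude-ft": 12000})  # -> ["engine"]
```

The enabled flag gives the single on/off switch mentioned above: a disabled manager simply stops polling its triggers.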

Design

The compatibility layer makes things rather confusing since, right now, there is an unhealthy mixture of stiff legacy behaviour and the new functionality. Let's forget about the compat layer for now, including the PUI-GUI, and go over how the system is designed to work.

Without the compat layer, the FailureMgr starts empty: there are no failures registered in the system. The system is designed so that it is the aircraft itself that subscribes all the failure modes it is capable of simulating. This is important: it is not an external system that subscribes them, it is the aircraft model itself. So what exactly is a failure mode? A failure mode is something that has a failure/damage/wear level, whatever you want to call it, that makes the aircraft behave differently depending on its state.

A "broken altimeter" is a failure mode that makes the aircraft incapable of displaying altitude through the instrument.
A "stuck aileron" is a f.m. that makes the aircraft incapable of moving that aileron.
A "leaking fuel tank" is a f.m. that makes the aircraft lose fuel at a certain rate.
"Bug pollution" is a f.m. that makes an aircraft lose aerodynamic performance.
A "broken wing" is a failure mode that makes an aircraft fly in funky ways.

Now, FlightGear is capable of simulating a wide range of flying machines, not only aircraft. Helicopters, hang-gliders, balloons and space vessels are also part of the hangar, so the system should not make assumptions about the vehicle if it is to be equally valid for the whole spectrum of simulated thingies. This is why the FailureMgr expects each model to declare the collection of failure modes it supports.

The purpose of the FailureMgr as a core module is to keep all this information together in one place and in a uniform format. This should make it easier for model developers to add support for failure simulation by delegating some of the boilerplate duties to the FailureMgr, but it should also make it simpler for end users to interact with the system, for example, by being able to disable failure simulation completely from a single place, or by being able to control it from a uniform interface, e.g. Nasal & the property tree.

Actuators. Actuators do not "decide", actuators "do". An Actuator is a little script with a defined interface that is used by the FailureMgr to apply a level of failure to the simulation. Every failure mode must have one actuator associated with it for the FailureMgr to do something useful. A few are implemented in a library, but the Actuator concept is open for aircraft modellers to do whatever they need in order to recreate a failure simulation. Some examples:

A simple type of actuator can just set a property to a certain value. This is used to "actuate" on systems that simulate failures by means of a "serviceable" property.
A more complex actuator could tweak certain parameters in the FDM to modify aerodynamic performance.
An even more complex one could play some sounds, trigger animations and tweak certain performance parameters.
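The more complex actuators in the list above can be approximated by composition. The Python sketch below scales an FDM-like parameter in proportion to the failure level and fans the same level out to several effects; the parameter name "lift-factor", the linear degradation model, and both class names are hypothetical, not part of failures.nas or any FDM.

```python
class ScaleActuator:
    """Hypothetical actuator: degrades a parameter linearly with the level,
    down to base * (1 - max_loss) at full failure."""

    def __init__(self, params, key, max_loss):
        self.params = params
        self.key = key
        self.base = params[key]  # remember the healthy value
        self.max_loss = max_loss

    def __call__(self, level):
        self.params[self.key] = self.base * (1.0 - self.max_loss * level)


class CompositeActuator:
    """Applies the same failure level to several actuators at once,
    e.g. a performance tweak plus a sound or animation cue."""

    def __init__(self, *actuators):
        self.actuators = actuators

    def __call__(self, level):
        for actuator in self.actuators:
            actuator(level)


params = {"lift-factor": 1.0}
events = []
bug_pollution = CompositeActuator(
    ScaleActuator(params, "lift-factor", max_loss=0.2),
    lambda level: events.append(("sound", level)),  # stand-in for a sound cue
)
bug_pollution(0.5)  # lift-factor becomes 0.9
```

Because each part only receives a level, the same composite can be driven manually, by a trigger, or by any future wear model.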

So far, we have a collection of failure modes and one actuator for each of them. This is sufficient for a usable failure simulation, albeit one that has to be controlled manually, which is one of the use cases that should be supported. At this point, the FailureMgr has created a normalized interface in the property tree and also allows you to control the failures individually from Nasal, e.g. FailureMgr.set_failure_level(mode_id, level).
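Manual control without any triggers can be sketched in Python as below. The method name echoes FailureMgr.set_failure_level(mode_id, level), but the real Nasal signature is the one documented in public.nas; the class name, the clamping, and the repair_all helper are hypothetical additions for illustration.

```python
class ManualFailureMgr:
    """Hypothetical sketch of manual control: a registry of failure modes
    whose level is set on demand and pushed through their actuator."""

    def __init__(self):
        self._modes = {}  # mode_id -> {"actuator": callable, "level": float}

    def add_failure_mode(self, mode_id, actuator):
        self._modes[mode_id] = {"actuator": actuator, "level": 0.0}

    def set_failure_level(self, mode_id, level):
        entry = self._modes[mode_id]  # raises KeyError for unknown modes
        level = min(1.0, max(0.0, float(level)))
        entry["level"] = level
        entry["actuator"](level)

    def get_failure_level(self, mode_id):
        return self._modes[mode_id]["level"]

    def repair_all(self):
        # Hypothetical convenience: reset every mode from a single place.
        for mode_id in self._modes:
            self.set_failure_level(mode_id, 0.0)


applied = []
mgr = ManualFailureMgr()
mgr.add_failure_mode("controls/aileron", applied.append)
mgr.set_failure_level("controls/aileron", 0.3)
mgr.repair_all()
```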

The final elements in the mix are Triggers. Triggers know nothing about either FailureModes or Actuators. They are independent entities whose sole purpose in life is to flag when certain conditions are met. An AltitudeTrigger flags when certain altitude conditions are met, a WaypointTrigger flags when the current position is within range of a certain waypoint, and so on. The FailureMgr allows you to attach one Trigger to each FailureMode. You can do it, but you don't have to; it's a convenience feature that opens the door for different use cases:

An instructor could attach a certain trigger to a failure mode for simulating an emergency scenario in defined conditions.
A user can set up randomized triggers (like MTBF or MCBF) to certain failure modes to spice up a flight.
An author can use the trigger system to create a realistic failure simulation where failure modes are fired when certain flight conditions are met.
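A randomized MTBF trigger, as used in the second case above, might be sketched in Python by drawing a time-to-failure from an exponential distribution whose mean is the MTBF. This is one common way to model MTBF and is an assumption here; failures.nas may implement it differently (for example with a per-update failure probability), and the class and parameter names are hypothetical.

```python
import random


class MtbfTrigger:
    """Hypothetical MTBF trigger: samples a time-to-failure from an
    exponential distribution with mean mtbf_s, then fires once that much
    operating time has accumulated."""

    def __init__(self, mtbf_s, rng=None):
        if rng is None:
            rng = random.Random()
        # expovariate takes the rate (1 / mean) and returns a sample.
        self.fire_at_s = rng.expovariate(1.0 / mtbf_s)
        self.elapsed_s = 0.0

    def update(self, dt_s):
        """Advance operating time by dt_s; return True once due."""
        self.elapsed_s += dt_s
        return self.elapsed_s >= self.fire_at_s


trigger = MtbfTrigger(mtbf_s=3600.0, rng=random.Random(42))  # seeded for repeatability
```

Seeding the generator makes a "random" scenario reproducible, which can be handy for an instructor replaying the same emergency.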

Roadmap

Replace Nasal/failures.nas with a new module implementing the design proposed above. Wire it to the existing GUI dialogs and ensure backwards compatibility.
Replace the hard-coded dialogs with a dynamic one that reflects the set of supported failure modes.
Design an XML format so aircraft can declare their support for failures from XML instead of programmatically.
Do not load the compatibility layer globally (i.e. by default), but rather, only for those models that do not have a failures.xml.
Aircraft authors can now start customizing the failure features for their crafts in a clean way.
Extend the feature set as needs arise (instructor console, additional triggers, ground equipment failure simulation, etc).

Under consideration

Generalize the trigger system and make it available as a global service. Might be useful for missions/adventures, AI agents, etc.
Introduce the concept of Wear Models.

the other advantage is that this is going to be agnostic to the way it is controlled, i.e. there's a concept of a dedicated "failure manager", so that this can be hooked up to different front-ends, including a telnet/web-based front-end (e.g. instructor console), or even just an integrated Canvas/GUI dialog.

— Hooray (Wed Apr 30). Re: Engine wear.
(powered by Instant-Cquotes)

I am doing some work related to this. Wear, as a concept, will not be supported in the first drop of this development, but I would like to include it eventually as part of the system. What I would like to know from those of you who create aircraft models or have an opinion on the subject is: how would you expect such a feature to work?

— galvedro (Sat May 03). Re: Engine wear.

This is all looking very promising, but you guys should really be aware of galvedro's work, and flug's bombable addon - there certainly is quite some overlapping code in all 3 efforts here, and it would make sense to generalize and unify things so that code can be better reused.

— Hooray (Mon Jun 09). Re: Better nort crash.

Once we start looking at combat hits, we'll almost certainly be comparing flug and dfaber's work on projectile hits so that our method of reporting submodel hits allows compatibility where possible. Tom has already built a method of seeing tracer from AI and MP models which also checks for collisions using submodels, so our next step is to address hit compatibility and how hits are passed over MP.

— Algernon (Mon Jun 09). Re: Better nort crash.

as far as failures go, I am quite keen to find galvedro's code to find out how the failures system is changing - my intention all along has been to keep pace with FG's built in failures and adapt the damage system accordingly to make the best use of it. I'm looking through the repository at the moment but haven't yet found it. The damage system is intended to be a stage between hits and failures - failures may happen anyway, but failures are more likely to result where there is damage; to what extent will be handled between the damage script and the built in failure system. That said, I still think there will be room, a need even, for more detailed modelling of individual aircraft's particular characteristics - as an example, I've been looking at the failure probabilities for an EE Lightning; they will be significantly more prone to engine fires than the Victor!

— Algernon (Mon Jun 09). Re: Better nort crash.

Since you are actually doing failure/damage/wear modeling, I am very interested in hearing your feedback about the new failure manager architecture and functionality.

I would suggest reading the wiki page Hooray posted first, as I tried to document the motivation for the change and the design principles there. The public interface for programming the failure manager from Nasal is at Nasal/FailureMgr/public.nas. It should be reasonably documented, but please let me know if you find something confusing or unclear.

On a side note, I don't recommend using the property tree interface directly for new developments, as it is currently halfway between what it was and what I want it to be, so it is a bit dirty right now and it will change a bit in the future.

— galvedro (Thu Jun 12). Re: Better nort crash.

Algernon: Great to see that you're actually interested in collaborating here and using existing code - it is very frustrating to see other efforts whose contributors don't realize how heavily their work is related, and how much it would make sense to team up with others to collaborate in a more framework-centric fashion, rather than on some aircraft-specific feature. We've recently seen several efforts with little to zero communication and collaboration, where contributors could have saved months of work had they spoken up earlier and had they shown willingness to collaborate.

The added advantage here is that galvedro's code is a good foundation to work with, i.e. his code is exceptionally clean and he's obviously very familiar with coding, so a joint effort can be a mutually beneficial experience for all parties involved, and you'll save a ton of work and time along the way, while also ensuring that your work is generic, i.e. can be easily reused by other aircraft/developers.

— Hooray (Thu Jun 12). Re: Better nort crash.

Regarding damage modeling WRT combat/bombable, I'd like to check flug's code at some point to see if/how certain parts of it could be generalized there - even just moving useful routines to a dedicated module in $FG_ROOT/Nasal or $FG_ROOT/Aircraft would be a good thing in my opinion. Flug has written some very clever Nasal code as part of the bombable addon, and we should really try to understand how to generalize and integrate the most useful parts so that people working on similar features can reuse his work.

EDIT: bombable.nas: https://github.com/bhugh/Bombable/blob/ ... mbable.nas

— Hooray (Thu Jun 12). Re: Better nort crash.

We're definitely keen on using existing code where possible, I will admit that I need to look outside my own development sphere more as it's too tempting just to code something for hours, for fun, which is probably already extant somewhere! I believe galvedro has mentioned somewhere in a post he's interested in overhauling the Electrical.nas script - that is somewhere I'd be very interested to collaborate - battery drain and charge, AC and DC circuits, reasonably realistic load characteristics... that's something I'm excited about! I'm also always keen to get a firmer grip on Nasal, mine is still extremely basic and fairly inelegant.

— Algernon (Thu Jun 12). Re: Better nort crash.

I have tested most of it.

New failures:
McbfTrigger, TimeoutTrigger, MtbfTrigger, AltitudeTrigger, set_unserviceable, set_readonly all work.

Old failures:
Engine, gear, flaps, electrical, aileron, rudder, elevator, asi, altimeter, attitude still work.

I have yet to write a custom trigger or actuator, but I would say this system seems well designed, and very powerful.

— Necolatis (Thu Jun 12). Re: How does serviceable and failures work?.

We will talk about this a bit later on when the FGUK guys have played with the module as well. But one thing that aircraft developers are demanding in one way or another is to go one step further and do more complex system damage/wear modeling, where failures influence each other, i.e. a structural failure here produces a system failure there and so on.

This kind of modeling is likely to be fairly aircraft specific, but we will see. I have given some thinking to a next evolutionary step in this direction, and it is quite tricky to organize in a clean way actually.

— galvedro (Fri Jun 13). Re: How does serviceable and failures work?.

Custom triggers will not play well with the GUI at the moment. For this first release I just squeezed the new engine underneath the existing GUI, and the compatibility layer responds to it by emulating the former behavior (if it doesn't, it is a bug). That means that only Mtbf and Mcbf triggers are currently supported through the GUI.