Towards better troubleshooting

These are some ideas for better troubleshooting that we usually come up with after each release, to improve the quality of end-user bug reports. Most of them are fairly trivial to implement; it just seems that we don't typically need them until someone shows up with a real problem and is unable to build from source. It mostly boils down to showing additional information in the About dialog, including it in the startup log file created in $FG_HOME, or eventually making it available in the new crash reporting tool.

You may also want to check out this forum post: Re: Helping users in the long term

Background

Whatever release plan we come up with, it will never be able to solve our issues with respect to bug reporting/fixing. We have all the required infrastructure to do better here: the bugtracker, a forum, a mailing list, and we even send crash reports - at least on Windows. What we do not have are the people dealing with that. We need:

people filing proper bug reports
people classifying/reviewing the bug reports
people analyzing the crash reports
people fixing the bugs
— Torsten Dreyer (Nov 20th, 2015). Re: [Flightgear-devel] Some thoughts about the release process.
(powered by Instant-Cquotes)

whether we do frequent automated releases or single hand-picked releases, the Achilles heel for the whole process is going to be getting enough test data and processing it properly.

— Thorsten Renk (Nov 19th, 2015). Re: [Flightgear-devel] Some thoughts about the release process.

Usually reports range from 'FG sucks' all the way to suggested C++ patches to fix an issue (with a strong bias towards the former). Ironically, we get at the same time too little and too much information - too little, because often people just write a frustrated post that FG version X crashes all the time and they reverted to the previous version - so we know there is an issue

— Thorsten Renk (Nov 19th, 2015). Re: [Flightgear-devel] Some thoughts about the release process.

I feel that getting the information better organized and brought to the attention of the relevant maintainers would be the key to making better stable releases - but I am at a loss how to achieve that.

— Thorsten Renk (Nov 19th, 2015). Re: [Flightgear-devel] Some thoughts about the release process.

however we do it in the future - this is what I see as the most pressing issue to solve - how to improve the flow of information from users to developers without making everyone unhappy in the process.

— Thorsten Renk (Nov 19th, 2015). Re: [Flightgear-devel] Some thoughts about the release process.

Might something like a bug-report (or maybe crash-report?) feature be of help? We have a lot of information in the property tree, like GPU drivers etc. For instance, do we have the OS and stuff like that too? It might be fairly easy to put this important info into a tmp file and, in case we crash, add it to some kind of HTTP request to flightgear.org. An experience report might be of help: what frequently asked questions do you pose all the time? + the red box...

— chris (Nov 19th, 2015). Re: [Flightgear-devel] Some thoughts about the release process.

we should absolutely stop telling anyone to edit preferences.xml in FG_ROOT; any documentation or advice which says to should be changed ASAP.

— zakalawe (Sat Oct 26). Re: NavCache:init failed:Sqlite error:Sqlite API abuse.

I'm interested in going through the entire fleet to create a new base level that all aircraft will be flyable. I'm not talking about fixing clunky FDMs or incomplete parts, but really basic maintenance - aircraft which start with tonnes of errors, and especially aircraft that cannot be started. If you had a list or could point out a few, that'd be awesome! I know there is tonnes of information in old forum posts about this, but for me it's quicker just to start each up one by one and test. If you do have a good list, maybe it would be good to start a new topic.

— bugman (Sun Apr 26). Looking for aircraft with 'paper cut' bugs.

Trivial

The following information is already available in the property tree, so exposing it just requires editing about.xml:

Expose threading mode Not done
Expose renderer mode (classic vs. rembrandt / osgEarth) Not done
Expose availability of Built-in Profiler Not done
show the location of $FG_HOME in the help/about dialog, with a button to open it in the file explorer?
show the checksum of preferences.xml (using the md5() Nasal API) in the About dialog, to make sure that people didn't tamper with it
consider adding the checksum of preferences.xml to the binary, so that startup code can identify whether files were modified that should not be changed
show whether .fgfsrc is available/used
use the Intel detection heuristics and adapt them to detect AMD/ATI cards, and show a startup warning to inform the user that PUI/styling may interfere with effects/shaders
we keep seeing people who install mismatched versions of aircraft/fgfs - introduce a version-check property so that -set.xml files can encode version requirements, which are checked by a script in $FG_ROOT/Nasal [1]

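For the two checksum items, a minimal C++ sketch of the startup-side comparison follows. It assumes the expected hash is baked into the binary at build time (for example through a generated header, not shown), and it swaps MD5 for the much shorter 64-bit FNV-1a hash so the example stays self-contained. FNV-1a only catches accidental edits; it offers no protection against deliberate tampering. All function names are hypothetical, not existing FlightGear code.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>

// 64-bit FNV-1a hash, used here as a stand-in for MD5. It detects accidental
// edits, but is NOT suitable against deliberate tampering.
std::uint64_t fnv1a64(const std::string& data) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : data) {
        hash ^= c;
        hash *= 1099511628211ULL;                  // FNV prime
    }
    return hash;
}

// Reads a whole file into memory; returns std::nullopt if it cannot be opened.
std::optional<std::string> readFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::nullopt;
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical startup check: compares the file's hash with a value that, in
// this sketch, would have been compiled into the binary.
bool preferencesUnmodified(const std::string& path, std::uint64_t expected) {
    auto contents = readFile(path);
    return contents && fnv1a64(*contents) == expected;
}
```

On a mismatch, startup code could log a warning and show it in the About dialog instead of refusing to start.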
Better version info

add Boost version info to a property and show it in the help/about dialog

More logging

Would it be possible to print the command line options FlightGear has been started with into the log file? At least from log-level "info" upward. This would make helping users with issues a tad easier. They would not have to paste two files, the log plus the command line options. And the people trying to help would not need to explain how to gather the command line options, which is not easy with a bunch of launchers out there. It would also reduce the need for each of those launchers to implement a feature to print the command line options.

— Alex D-HUND (Jan 12th, 2016). [Flightgear-devel] Feature request - Logging / debugging.

This is a valid request, but there is a complication in how various files and options are combined. There are two ways to do this:

show the input data: .fgfsrc and other config files which people have largely forgotten exist, and the arguments. These files are applied in ‘most general’ to ‘most specific’ order, so arguments are the most specific and are applied after everything else, so you can always override config state with arguments.
show the combined data inside the options system, as it’s processed. This would still show us when options are being overridden, but would be less clear about where options come from (we no longer know the file paths of config files). This is much easier to implement, however.

The second approach is a little simpler to code, and has the advantage that nothing can be accidentally missed from it, but it’s a little harder to understand where a particular option came from. Although, there are probably ways we could track this too (more work of course).

— James Turner (Jan 12th, 2016). Re: [Flightgear-devel] Feature request - Logging / debugging.

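A minimal sketch of the second approach, extended with the kind of source tracking James mentions as possible extra work, might look like this. Sources are applied from most general to most specific (config files first, command line last), and every option remembers which source set it, so the log can show both the final value and any overrides. OptionProvenance and its methods are hypothetical; the source labels are simply whatever the caller passes in.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One option as seen while processing: name, value, and where it came from
// (e.g. a config file path or "command line").
struct OptionRecord {
    std::string name;
    std::string value;
    std::string source;
};

// Hypothetical helper: apply() must be called in precedence order, most
// general source first; later sources override earlier ones.
class OptionProvenance {
public:
    void apply(const std::string& source,
               const std::vector<std::pair<std::string, std::string>>& options) {
        for (const auto& [name, value] : options) {
            auto it = current_.find(name);
            if (it != current_.end()) {
                overrides_.push_back("--" + name + " from " + it->second.source +
                                     " overridden by " + source);
            }
            current_[name] = OptionRecord{name, value, source};
        }
    }

    // Text that would be written to the startup log.
    std::string report() const {
        std::ostringstream out;
        for (const auto& [name, rec] : current_)
            out << "--" << name << "=" << rec.value << "  [" << rec.source << "]\n";
        for (const auto& line : overrides_) out << "note: " << line << "\n";
        return out.str();
    }

    // Returns the effective record for an option, or nullptr if it was never set.
    const OptionRecord* find(const std::string& name) const {
        auto it = current_.find(name);
        return it == current_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, OptionRecord> current_;  // sorted for stable log output
    std::vector<std::string> overrides_;
};
```

In use, each config file would be passed to apply() under its path, the arguments last under "command line", and report() written to the log at "info" level.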
Core changes


Screenshot showing the property browser running on a netbook in a patched version of FlightGear with resource tracking support, using the cross-platform SIGAR library.

CPU information exposed to the property tree using the SIGAR library

These will typically require changes on the C++ side, but would still be good to have whenever someone makes a bug report:

provide a startup option/property to disable aircraft-set.xml Nasal code for troubleshooting purposes
log the frequency of Nasal callback invocations to the console, to detect FDM-coupled code?
dump node-specific osg camera stats to the console/log file (or even splash screen) during startup for the scenery/aircraft ROOT nodes [2]
separate OpenGL/GLSL logging/log file for troubleshooting graphics related startup errors rendering FG useless (e.g. faulty drivers) [3]
case-sensitive path checking [4]
write the current pid (process ID) to the property tree, so that this can be used for debugging purposes (or even just for the suffix of log files) Not done
add startup options to override/ignore startup files and trigger a navdb cache rebuild [5] Not done
expose number of active timers/listeners[6] Not done
introduce a tag for FDM properties so that listeners tied to FDM rate can optionally be detected, as per [7]
provide support for an optional TRACECB (callback) mode where multiple identical callbacks are reported ? Not done
we could also provide a property and use it as a "soft limit" for the number of listeners per property before a warning is shown, e.g. ~20-50 ? Not done
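The soft-limit idea could be prototyped as below, assuming hooks that run whenever a listener is added to or removed from a property (the hooks and the class are hypothetical). It warns once when a property crosses the limit, which would make observations like the one quoted next (thousands of listeners attached to only 17 unique properties) show up in the log.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

// Hypothetical bookkeeping: count listeners per property path and warn once
// when a path goes past the soft limit.
class ListenerBudget {
public:
    explicit ListenerBudget(std::size_t softLimit, std::ostream& log = std::cerr)
        : softLimit_(softLimit), log_(log) {}

    // Call when a listener is attached; returns true if the limit was just exceeded.
    bool onAdd(const std::string& path) {
        std::size_t n = ++counts_[path];
        if (n == softLimit_ + 1) {  // warn once, not on every further add
            log_ << "warning: " << path << " has more than " << softLimit_
                 << " listeners\n";
            return true;
        }
        return false;
    }

    // Call when a listener is removed.
    void onRemove(const std::string& path) {
        auto it = counts_.find(path);
        if (it != counts_.end() && --it->second == 0) counts_.erase(it);
    }

    std::size_t count(const std::string& path) const {
        auto it = counts_.find(path);
        return it == counts_.end() ? 0 : it->second;
    }

    // Number of distinct properties that currently have listeners.
    std::size_t uniqueProperties() const { return counts_.size(); }

private:
    std::size_t softLimit_;
    std::ostream& log_;
    std::map<std::string, std::size_t> counts_;
};
```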

I used two places to check how many Uniforms are created: EDXH (Isle of Helgoland in the North Sea), with barely any scenery, and KSFO with AI traffic enabled; you know the scenery complexity. At EDXH, the Effect system attaches approx. 1500 listeners to the property tree, but only to 17 unique properties. At KSFO, the Effect system attaches some 10000 listeners to the property tree, but also only to 17 unique properties. That's with shader quality set to zero (all off).

On my old laptop (1.6GHz dual core, GeForce Go 7400, 3GB RAM), FlightGear has become barely usable over the last few years. Framerates of max 20 at EDXH and single digits at KSFO.

— Torsten Dreyer (2014-09-04). Re: [Flightgear-devel] crash in SGPropertyNode::fireValueChanged.

Subsystem-level Memory Tracking for FlightGear (RAM/VRAM, CPU/GPU utilization) Not done
For better diagnostics and better end-user bug reports, we could consider exposing a cross-platform process and system utilities module via Nasal/CppBind, such as psutil (Windows, macOS & BSD/Unix) Not done
Nasal callback tracking (timers/listeners) [8]
Nasal callback tracing, i.e. identical callbacks being called more than once per frame ?
Expose scenery version info [9] [10] Not done
Expose all startup arguments (would just be copied to a single property) and log them to the log file [11] Not done
Expose complete content of fgfsrc (could be read into the property tree) Not done
Expose cmake/compiler flags Not done
Expose release/debug build info (osg,simgear & fg) Not done
Expose architecture (32bit/64 bit) e.g. via __i386__ and __x86_64__ on Intel or via cmake [12] [13] Not done
Expose amount of physical RAM available Not done
Expose amount of used RAM Not done
Expose degree of swapping Not done
Provide some means to show GLSL errors in a dedicated place, i.e. the loglist ? Not done
Maybe provide an option to attach the startup log file to a flight recorder "tape" so that people can more easily share reproducible flights ?
Keep track of Nasal callbacks running via timers/listeners and maintain a list of origin (FILE/LINE, handle/identifier) so that we can inspect a list of "Nasal processes" [14]
Investigate providing support for Nasal backtraces across timers/listeners as per Philosopher's posting [15]
Extend the Initializing Nasal early work to provide a startup mode in which the Nasal engine is fully shut down at run time, for better troubleshooting

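Several of the "Expose ..." items above (architecture, release/debug build) can be answered at compile time. The sketch below uses only compiler-predefined macros (__x86_64__/_M_X64, __i386__/_M_IX86, __aarch64__/_M_ARM64) and the standard NDEBUG convention. It only describes the translation unit it is compiled into, so OSG and SimGear would need their own equivalent (or CMake-generated) values. The function names are hypothetical.

```cpp
#include <string>

// CPU architecture the binary was compiled for, from predefined macros.
std::string architecture() {
#if defined(__x86_64__) || defined(_M_X64)
    return "x86_64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#else
    return "unknown";
#endif
}

// Pointer width in bits (32 or 64).
constexpr int pointerBits() { return static_cast<int>(sizeof(void*) * 8); }

// Release vs. debug via NDEBUG; CMake's Release, RelWithDebInfo and
// MinSizeRel configurations define it by default.
constexpr bool isReleaseBuild() {
#ifdef NDEBUG
    return true;
#else
    return false;
#endif
}

// One-line summary suitable for a property and the startup log.
std::string buildSummary() {
    return architecture() + ", " + std::to_string(pointerBits()) + "-bit, " +
           (isReleaseBuild() ? "release" : "debug") + " build";
}
```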
Property ownership

We need a way to register property ownership:

That's because the property is normally not under user control but set by the weather system. You have to stop the weather system before you can set it. And other values are affected too - most of the environment in fact

— Thorsten (Jan 16th, 2016). Re: Property Browser: /environment/visibility [16].

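One possible shape for such a registry is sketched below, assuming that subsystems (such as the weather system) claim the properties they drive, and that tools like the property browser check before writing. A refused write comes back with a message naming the owner, which is the explanation Thorsten had to give by hand. All names are hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical ownership registry keyed by property path.
class PropertyOwnership {
public:
    // Returns false if the path is already owned by a different subsystem.
    bool claim(const std::string& path, const std::string& owner) {
        auto [it, inserted] = owners_.emplace(path, owner);
        return inserted || it->second == owner;
    }

    // Only the current owner can release a path.
    void release(const std::string& path, const std::string& owner) {
        auto it = owners_.find(path);
        if (it != owners_.end() && it->second == owner) owners_.erase(it);
    }

    // Checks a write attempt; on refusal, `reason` explains who owns the value.
    bool mayWrite(const std::string& path, const std::string& writer,
                  std::string& reason) const {
        auto it = owners_.find(path);
        if (it == owners_.end() || it->second == writer) return true;
        reason = path + " is controlled by " + it->second +
                 "; stop or reconfigure it before changing this value";
        return false;
    }

private:
    std::map<std::string, std::string> owners_;  // path -> owning subsystem
};
```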
Version checking

Is there a way to bypass, fool or add multiple version #'s to VERSION file in the data folder so one can start any version of FG without having to edit that file every time?

— wlbragg (Mar 11th, 2014). folder [17].

I have just removed the check from the branch I am usually working with. But I admit that it would be better to provide a --prop: option to optionally disable the check - it would be useful for people juggling many different branches of SG/FG/FGDATA (while still knowing what they're doing, obviously)

— Hooray (Mar 11th, 2014). Re: [18].

It works for people who know what they're doing - but providing it as an option could be problematic, because most people are unlikely to know what this entails, so it may be more trouble supporting this than requiring people to edit that file every time

— Hooray (Mar 12th, 2014). Re: [19].

I was just wondering if there might be any other existing mechanism (other than changing the code or the VERSION file) by which you could circumvent or satisfy that check.

I think if it were up to me, because I do bounce back and forth between two data sets, (cutting edge and current stable) and at least two FG program versions, (again cutting edge and current stable), I would or maybe will, change the code at the version check to look for the correct code, then if not satisfied, read to see if there is a next line in file [VERSION]. If so and it is say, TRUE, it would let it go. Then only people that "know what they are doing" can edit that file one time and not be bothered with it again. Simple fix, but absolutely necessary, no.

At any rate it is no big deal, just an annoyance that I now have the ability to change, at least in my version.

— wlbragg (Mar 13th, 2014). Re: [20].

My main issue is when I want to run 3.0 binary with 3.1 data and then also switch back to 3.1 binary on the same 3.1 data. I don't know that I ever need to use 3.0 data with 3.1 binary, but maybe I do on occasion. I find myself doing it on a regular enough basis that I got annoyed enough to ask the question. It's a simple version check, no mystery, I for one bounce back and forth with different combinations. As for the "why" I guess I do things in 3.1 data that I want to check out using 3.0 binary.

— wlbragg (Mar 13th, 2014). Re: [21].

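wlbragg's proposal (accept a mismatch if the VERSION file in the data folder carries an explicit second-line override) can be sketched as follows. The comparison here is plain string equality, which may be stricter or looser than the check FlightGear actually performs, and the enum and function are hypothetical. As Hooray points out above, such an escape hatch is only meant for people who know what they are doing.

```cpp
#include <fstream>
#include <string>

// Outcome of the (hypothetical) data-version check.
enum class VersionCheck { Match, Overridden, Mismatch, Unreadable };

// First line: the data version, which must equal the binary's version.
// Optional second line "TRUE": accept a mismatch, so users who switch
// between branches edit the file once instead of every time.
VersionCheck checkDataVersion(const std::string& versionFile,
                              const std::string& binaryVersion) {
    std::ifstream in(versionFile);
    std::string dataVersion;
    if (!in || !std::getline(in, dataVersion)) return VersionCheck::Unreadable;

    // Tolerate Windows line endings.
    if (!dataVersion.empty() && dataVersion.back() == '\r') dataVersion.pop_back();
    if (dataVersion == binaryVersion) return VersionCheck::Match;

    std::string overrideLine;
    if (std::getline(in, overrideLine)) {
        if (!overrideLine.empty() && overrideLine.back() == '\r') overrideLine.pop_back();
        if (overrideLine == "TRUE") return VersionCheck::Overridden;
    }
    return VersionCheck::Mismatch;
}
```

Returning Overridden separately from Match lets the startup log still record that the check was bypassed, which keeps such setups visible in bug reports.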
Nasal

it's relatively easy to do bad things unintentionally. Like tie a bit of code to an FDM property and run updates of a display 120 times per second rather than the 30 times you actually need. Like start a loop multiple times so that you update the same property 30 times per frame rather than the one time you actually need. It's actually pretty hard to catch these things, because the code is formally okay, does the right thing and just eats more performance than necessary, and there's no simple output telling you that you're running 30 loops rather than the one you expect.

— Thorsten Renk (Feb 1st, 2016). Re: [Flightgear-devel] Designing a thread-safe property tree API.

There is a lot of Nasal/Nasal GC bashing going on, but in most cases we've seen so far, it's really improper use of lower-level APIs like timers and listeners that are not properly managed by aircraft developers - which is unlikely to show up on powerful systems, unless you keep FG running for a long time - because the only thing that is really happening is that callbacks (=code) are triggered more often than necessary.

Imagine it like configuring your mobile phone to check your e-mail inbox once per hour. In FlightGear, there is no notion of a "scheduler" to do this correctly; instead, "timers" and "listeners" are used to associate callbacks (=code routines) with certain events, such as a timer expiring (e.g. after a few seconds) or a certain property changing (or any combination of these, e.g. registering listeners when a timer expires and vice versa).

What is happening behind the scenes is a bit obscure, but that has nothing to do with Nasal or its garbage collection (GC) scheme. It has more to do with the low-level nature of the functions provided by core developers to "manage" such callbacks: the settimer() and setlistener() APIs are particularly tedious to manage properly, yet the de facto practice is to use them as the main building blocks to write/integrate full subsystems into the FlightGear main loop. Tiny coding errors may not have much of an effect, but under certain circumstances those errors add up over time, so that the original "task" of checking your inbox once per hour ends up being executed hundreds of times per second. The underlying code would still be correct; it's just the event handling code that is not written correctly, which is often the case when using reset/re-init or when changing locations, because the simulator hasn't been designed, developed and maintained with these requirements in mind.

Furthermore, there's no introspection facility that would allow people to look behind the scenes of the simulator to understand just how often a certain timer/listener is registered and triggered, so the whole thing is extremely obscure. But like I said, this is not a problem inherent to Nasal coding; it also happens/happened at the C++ level. For example, look at Torsten's fix for the effects code, which was also leaking/re-running listeners like crazy (unnecessarily). Thus, it's redundant work that is getting re-scheduled over and over again due to buggy code. The problem is not necessarily the buggy code, though (writing broken code is part of the whole process); the real problem is that we don't provide any functionality that would allow people to look at where their resources (CPU, RAM, VRAM, GPU) are utilized, so uninformed conclusions are drawn by some people (like "Nasal is bad ... and needs to be replaced with .... ").

Thus, what you can do is to overload the corresponding APIs (extension functions) and treat the id() of the callback as the key for a hash lookup to gather data on how many timers/listeners are registered, and sample those over time.

You will almost certainly end up seeing dozens of unnecessary callbacks being invoked by certain code - even though that may not necessarily be restricted to Nasal code, like I said.

— Hooray (Jan 31st, 2016). Re: 777 freezes and FPS loss [22].

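Hooray's suggestion of keying statistics on the callback's id() is sketched below in C++ rather than Nasal, so the callback id is just an integer and the origin string (such as file:line) is supplied by whatever wraps settimer()/setlistener(). It flags callbacks that are installed more than once or that run more than once per frame, which covers both of Thorsten's examples above (a loop started several times, display code running at FDM rate). CallbackTracker and its hooks are hypothetical; this version only keeps the worst calls-per-frame value rather than a full time series.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Per-callback bookkeeping. In Nasal the key would be id() of the function
// object; here it is simply an integer supplied by the wrapper.
struct CallbackStats {
    std::string origin;                   // e.g. "file.nas:42"
    std::size_t activeRegistrations = 0;  // timers/listeners currently installed
    std::size_t callsThisFrame = 0;
    std::size_t maxCallsPerFrame = 0;
};

class CallbackTracker {
public:
    // Call from a wrapper around timer/listener registration.
    void onRegister(std::uint64_t id, const std::string& origin) {
        CallbackStats& s = stats_[id];
        s.origin = origin;
        ++s.activeRegistrations;
    }

    // Call from a wrapper around timer/listener removal.
    void onUnregister(std::uint64_t id) {
        auto it = stats_.find(id);
        if (it != stats_.end() && it->second.activeRegistrations > 0)
            --it->second.activeRegistrations;
    }

    // Call every time the callback actually runs.
    void onInvoke(std::uint64_t id) { ++stats_[id].callsThisFrame; }

    // Call once per frame: keep the worst calls-per-frame value, reset the counter.
    void endFrame() {
        for (auto& entry : stats_) {
            CallbackStats& s = entry.second;
            if (s.callsThisFrame > s.maxCallsPerFrame) s.maxCallsPerFrame = s.callsThisFrame;
            s.callsThisFrame = 0;
        }
    }

    // Callbacks that look leaked or duplicated.
    std::vector<std::string> suspects() const {
        std::vector<std::string> out;
        for (const auto& entry : stats_) {
            const CallbackStats& s = entry.second;
            if (s.activeRegistrations > 1 || s.maxCallsPerFrame > 1)
                out.push_back(s.origin + " (registered " +
                              std::to_string(s.activeRegistrations) + "x, up to " +
                              std::to_string(s.maxCallsPerFrame) + " calls/frame)");
        }
        return out;
    }

private:
    std::unordered_map<std::uint64_t, CallbackStats> stats_;
};
```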
Canvas

Canvas/Element base class: use SGTimeStamp/PropertyObject for sampling the duration of each update step [23]
make canvas optional [24]
load Canvas/cppbind bindings on demand [25]
add draw masks for Canvas placements [26]
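For the SGTimeStamp idea in the first item, a scoped timer that accumulates per-element update durations could look like the following. std::chrono::steady_clock stands in for SGTimeStamp so the sketch compiles on its own, the class names are hypothetical, and exposing the numbers through a PropertyObject is left out.

```cpp
#include <chrono>
#include <cstddef>

// Accumulated update timings for one Canvas element (hypothetical).
class UpdateTiming {
public:
    void add(std::chrono::nanoseconds d) {
        ++samples_;
        total_ += d;
        if (d > worst_) worst_ = d;
    }
    std::size_t samples() const { return samples_; }
    std::chrono::nanoseconds worst() const { return worst_; }
    double averageMs() const {
        if (samples_ == 0) return 0.0;
        return std::chrono::duration<double, std::milli>(total_).count() / samples_;
    }

private:
    std::size_t samples_ = 0;
    std::chrono::nanoseconds total_{0};
    std::chrono::nanoseconds worst_{0};
};

// RAII guard: measures the enclosing scope, e.g. the body of an update() call.
class ScopedUpdateTimer {
public:
    explicit ScopedUpdateTimer(UpdateTiming& sink)
        : sink_(sink), start_(std::chrono::steady_clock::now()) {}
    ~ScopedUpdateTimer() {
        sink_.add(std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start_));
    }
    ScopedUpdateTimer(const ScopedUpdateTimer&) = delete;
    ScopedUpdateTimer& operator=(const ScopedUpdateTimer&) = delete;

private:
    UpdateTiming& sink_;
    std::chrono::steady_clock::time_point start_;
};
```

Placing a `ScopedUpdateTimer t(timing_);` line at the top of an element's update method would record every call, including early returns, which is the main reason for using RAII here instead of explicit start/stop calls.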

Runtime stuff

provide a switch to turn updating of canvases/groups (or placements?) on and off, to help with troubleshooting
Investigate extending Canvas::Element to optionally keep track of memory required for each element (use/sub-class osg::ref_ptr), so that we can provide some runtime info Not done
Provide a means to list all running Nasal callbacks invoked via timers/listeners [27] Not done
Extend FGStatsHandler to expose Nasal GC stats via osgViewer::StatsHandler Not done

The built-in osgViewer stats can be extended with custom stats by subclassing osgViewer::StatsHandler; this is already done in $FG_SRC/src/Viewer/FGEventHandler.hxx#L14

The class can be extended to add your own stats via osgViewer::StatsHandler::addUserStatsLine()

You can even register totally custom stats via osg::Stats

The main suspects in this case are probably 1) scenery, 2) the cockpit, 3) Nasal (GC: the bottleneck() function in simgear/nasal)
For the Nasal stats, you'll probably want to register them in FGNasalSys::FGNasalSys() by accessing globals->viewer->
That should give you a better idea about what's going on there, and it's been suggested by core developers, too:

http://www.mail-archive.com/flightgear- ... 37823.html

Another goal is to add more node bits (and a GUI dialog to control them) so various categories of objects can be disabled during the update pass. This will mean the direct hit of, say, AI models vs particles vs random trees can be measured. Of course it won't account for resources (memory, textures) burned by such things, but would still help different people identify slowness on their setups.

— Hooray (Sun Feb 23). Re: New Aircraft: the Extra500.

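The node-bit idea builds on how OSG node masks work: osg::Node::setNodeMask() tags a node, and a traversal skips the node when its mask ANDed with the traversal (or camera cull) mask is zero. The standalone sketch below mimics that test with hypothetical category bits, so a troubleshooting dialog would only need to clear one bit and compare frame times. The same scheme would also cover separate masks for cockpit panels and Canvas placements.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical category bits, one per kind of object.
enum Category : std::uint32_t {
    kTerrain     = 1u << 0,
    kAIModels    = 1u << 1,
    kParticles   = 1u << 2,
    kRandomTrees = 1u << 3,
    kCockpit     = 1u << 4,
};

struct SceneItem {
    std::string name;
    std::uint32_t nodeMask;
};

// Stand-in for a traversal: an item is visited only if (nodeMask & mask) != 0.
std::vector<std::string> visible(const std::vector<SceneItem>& items,
                                 std::uint32_t cullMask) {
    std::vector<std::string> out;
    for (const auto& item : items)
        if ((item.nodeMask & cullMask) != 0) out.push_back(item.name);
    return out;
}

// What a troubleshooting dialog would do: clear one category's bit.
constexpr std::uint32_t disable(std::uint32_t mask, Category c) {
    return mask & ~static_cast<std::uint32_t>(c);
}
```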
Canvas::Element/Canvas::Group should be extended to gather SGTimeStamp-based stats for each group/element, so that people can better tell what's going on behind the scenes [28] Not done
Use a separate draw mask for cockpit panels (e.g. via a corresponding XML attribute that serves as a marker), so that the impact of heavy cockpits can be better evaluated Not done
Whenever there's an exception while booting, we should try to re-init the sim and increase the log level to provide crashrpt with better info [29]
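The last item could be prototyped roughly like this: catch the exception from the first start-up attempt, then retry once at a more verbose log level so the crash report has more to work with. The init callback stands in for the real start-up/re-init sequence, and the log-level enum and function are hypothetical.

```cpp
#include <exception>
#include <functional>
#include <iostream>

// Hypothetical log levels, ordered from quiet to verbose.
enum class LogLevel { Alert, Warn, Info, Debug, Bulk };

// Runs `init` at the normal level; if it throws, retries once at `verbose`
// so the second attempt leaves a more detailed log. Returns true on success.
bool bootWithDiagnostics(const std::function<void(LogLevel)>& init,
                         LogLevel normal = LogLevel::Warn,
                         LogLevel verbose = LogLevel::Debug) {
    try {
        init(normal);
        return true;
    } catch (const std::exception& e) {
        std::cerr << "startup failed: " << e.what()
                  << "; retrying with verbose logging\n";
    }
    try {
        init(verbose);
        return true;
    } catch (const std::exception& e) {
        std::cerr << "startup failed again: " << e.what() << "\n";
        return false;
    }
}
```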
