Troubleshooting crashes
Crashes are never easy to troubleshoot. You will likely need to experiment a bit to find the cause, or at least to provide useful feedback, because issues depend heavily on things like hardware components, driver support and so on. Developers can't work on your machine, so you'll need to provide them with specific information. This article guides you through that.
If not only FlightGear, but the whole system crashes, or freezes, please refer to System Crashes.
If in the end you like bug hunting, please consider helping in testing future releases.
Known problems
See the bug tracker for more info.
Troubleshooting your crash
You can find the logs for FlightGear in ~/.fgfs/ or C:\Users\your username\AppData\Roaming\flightgear.org\ on Windows.
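When you are asked for log output, a small shell helper can pick out the most recently written log file. The following is only a minimal sketch for Linux/macOS: the function name latest_fg_log is made up for illustration, and the assumption that log files end in .log should be checked against what you actually find in the directory.

```shell
# Print the most recently modified *.log file in a FlightGear settings directory.
# Assumption: log files carry a .log extension; adjust the pattern if yours differ.
latest_fg_log() {
    dir="${1:-$HOME/.fgfs}"
    # ls -t sorts by modification time, newest first
    ls -t "$dir"/*.log 2>/dev/null | head -n 1
}
```

For example, tail -n 50 "$(latest_fg_log)" would show the last lines of that file, which is usually what is worth attaching to a report.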
Community assistance
If you need help with the whole process, please get in touch via the FG forum or Discord.
When doing that, please fully document the whole process, i.e. including:
- startup settings (and fgfsrc if used)
- error messages and warnings (log/console output)
- screenshots
- optionally, screenshots showing frame rate, frame spacing, performance monitor and OSG on-screen stats (see Howto:Use the system monitor)
- other important information from the About dialog
Troubleshooting Performance Issues
Debugging Segfaults & Obtaining Backtraces
Note: This section is currently Linux/gdb specific. Any help in updating it to become less platform/OS/debugger-specific would be greatly appreciated!
A segfault ("segmentation fault") is a particular kind of crash: the program does something it was not designed to do, such as accessing invalid memory, and the operating system terminates the process because its behavior is no longer valid. Segfaults typically result from coding errors, either in FlightGear or in one of its dependencies (libraries like SimGear, plib, OpenSceneGraph or OpenAL).
A typical segfault may look like this:
Initializing splash screen
Splash screen progress init
Program received signal SIGSEGV, Segmentation fault.
0xb71467e6 in __strlen_sse2_bsf () from /usr/lib/libc.so.6
The last two lines show the actual crash; anything shown earlier normally just indicates how far the code got shortly before the crash occurred.
To see what was going on when the crash occurred, you'll need to build a binary with debugging symbols included. This usually covers FlightGear and SimGear, and possibly OSG (OpenSceneGraph) as well, though normally only if your crash happens inside rendering-related code.
A so-called backtrace is much like a flight data recorder tape/log that is used to conduct a post-crash analysis.
A binary with debugging symbols included contains additional information that helps developers track down what caused a certain crash, for example the file name, function, line number and variable that was accessed before the crash happened. This is analogous to how an FDR/CVR on an airplane records certain parameters like position, orientation, altitude, velocities, configuration, cockpit conversations etc. Developers can also use this to see who is responsible for a certain segfault/bug, i.e. which commit introduced the bug.
Debugging FlightGear crashes is significantly simplified by having access to such runtime data, because we can go back in time and see what the FlightGear program was doing shortly before the crash.
Unfortunately, logging all this data has a run-time overhead and requires additional tools and knowledge, so it is not done by default: for the sake of performance, FlightGear is normally heavily optimized by your compiler and does not include code for logging debugging/post-crash parameters. However, as developers, we do appreciate all efforts to provide meaningful bug reports. A reproducible bug with a good crash analysis, i.e. a backtrace, is as good as it gets, and a bug is significantly more likely to be examined and fixed when the report includes both a backtrace and a set of steps to reproduce the error reliably.
A good way to provide a reproducible test case is to use the Replay/flight recorder system to create a flight recorder tape that triggers the crash. Alternatively, consider providing a "flight plan" that triggers the crash, ideally without it being aircraft specific, i.e. reproducible using aircraft like the ufo.
Thus, you'll want to reconfigure your SimGear/FlightGear source trees ($SG_SRC and $FG_SRC) using the -DCMAKE_BUILD_TYPE=RelWithDebInfo switch; for details, please refer to Building using CMake#Debug_Builds. It is a good idea not to touch your existing build trees for this, but instead to create an additional directory hierarchy for your debugging binary; please see Building using CMake#Multiple build directories for details.
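A separate debug build directory could be set up roughly as follows. This is a sketch, not the official build procedure: the helper names run and configure_debug_build are made up for illustration, the -S/-B options assume CMake 3.13 or newer, and the sketch does not cover installing SimGear where the FlightGear build can find it.

```shell
# Print a command and, unless DRY_RUN is set, execute it.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# Configure and build one source tree in its own RelWithDebInfo build directory.
# $1: source tree (e.g. "$SG_SRC" or "$FG_SRC"); $2: separate build directory
configure_debug_build() {
    src="$1"
    build="$2"
    run mkdir -p "$build" &&
    run cmake -S "$src" -B "$build" -DCMAKE_BUILD_TYPE=RelWithDebInfo &&
    run cmake --build "$build"
}
```

Calling DRY_RUN=1 configure_debug_build "$SG_SRC" "$HOME/fg-debug/simgear" only prints the three commands, which is a cheap way to check the paths before starting a long build.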
Once you have rebuilt and relinked SimGear and FlightGear, you'll want to use a debugger like gdb to run your new binary. It helps to have a reliable way to reproduce the crash, such as certain startup/runtime settings. For the sake of simplicity, it is usually a good idea to disable all unrelated features, subsystems and settings, including complex aircraft and complex scenery locations (airports) if possible. For details, refer to the minimal startup profile detailed in this article.
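As an illustration of such reduced startup settings, the sketch below writes a small options file with one option per line. The helper name write_minimal_fgfsrc and the default file name are hypothetical, and the three options are only examples of settings that may help rule out unrelated features; pick whatever still reproduces your crash.

```shell
# Write a reduced set of startup options, one per line, to the given file.
# Assumption: these example options suit your test case; adjust as needed.
write_minimal_fgfsrc() {
    out="${1:-./fgfsrc.minimal}"
    cat > "$out" <<'EOF'
--aircraft=ufo
--disable-sound
--log-level=info
EOF
    echo "$out"
}
```

The helper deliberately does not overwrite an existing ~/.fgfsrc; review the generated file first and copy it into place yourself if it fits.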
- For gdb to be available, you'll normally have to install the "gdb" package first using your package manager (apt, yum, yast etc.).
- Next, navigate to the build directory where your fgfs debug binary is located.
- Then run gdb.
- In the gdb shell, specify the file to be used via file src/Main/fgfs
- This will preload the binary and its dependencies.
- To actually run the file, use the run command.
- You'll normally want to pass arguments right after run, especially the mandatory ones (e.g. to specify $FG_ROOT).
- Creating a simple .fgfsrc file helps keep the command line short.
- Once the segfault/crash occurs, run backtrace (bt is the shorter equivalent).
- Only if that isn't conclusive, use thread apply all bt full instead to get a full backtrace for all threads (background tasks).
- Please use the issue tracker to post your backtrace.
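The interactive steps above can also be run in one go using gdb's batch mode, which is convenient when you want the backtrace saved to a file. This is a sketch under assumptions: the function name gdb_backtrace and the log file name are made up for illustration, and the output of the program itself ends up in the same log as the backtrace.

```shell
# Run a program under gdb non-interactively and save backtraces for all threads.
# $1: log file to write; remaining arguments: the program and its options.
# With DRY_RUN=1 the gdb command line is printed instead of executed.
gdb_backtrace() {
    log="$1"
    shift
    if [ -n "${DRY_RUN:-}" ]; then
        echo gdb -batch -ex run -ex "'thread apply all bt full'" --args "$@"
        return 0
    fi
    # -batch: exit after the -ex commands; --args: pass the options through to the program
    gdb -batch -ex run -ex 'thread apply all bt full' --args "$@" > "$log" 2>&1
}
```

From the build directory, gdb_backtrace backtrace.txt src/Main/fgfs --fg-root="$FG_ROOT" would run fgfs until it crashes or exits and leave the result in backtrace.txt. If the program exits normally, gdb simply reports that there is no stack.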
This is what a typical backtrace may look like:
(gdb) thread apply all bt
Thread 1 (Thread 0xb66c4740 (LWP 10908)):
#0 0xb71467e6 in __strlen_sse2_bsf () from /usr/lib/libc.so.6
#1 0x0886c080 in copy_string (s=s@entry=0x0)
at /home/cesar/compilation/pkg/simgear/2.10.0/src/simgear-2.10.0/simgear/props/props.cxx:161
#2 0x08870550 in set_string (val=0x0, this=0x8dd55b8)
at /home/cesar/compilation/pkg/simgear/2.10.0/src/simgear-2.10.0/simgear/props/props.cxx:524
#3 SGPropertyNode::setStringValue (this=0x8dd55b8, value=0x0)
at /home/cesar/compilation/pkg/simgear/2.10.0/src/simgear-2.10.0/simgear/props/props.cxx:1587
#4 0x08326ae1 in (anonymous namespace)::GeneralInitOperation::run (this=0x8bdea48, gc=0x8be4310)
at /home/cesar/compilation/pkg/flightgear/2.10.0/src/flightgear-2.10.0/src/GUI/gui.cxx:123
#5 0x085c2b7b in flightgear::GraphicsContextOperation::operator() (this=0x8bdea48, gc=0x8be4310)
at /home/cesar/compilation/pkg/flightgear/2.10.0/src/flightgear-2.10.0/src/Viewer/WindowSystemAdapter.cxx:41
#6 0xb78ca246 in osg::GraphicsOperation::operator()(osg::Object*) () from /usr/lib/libosg.so.80
#7 0xb78c7784 in osg::GraphicsContext::runOperations() () from /usr/lib/libosg.so.80
#8 0xb7b29fe2 in osgViewer::ViewerBase::renderingTraversals() () from /usr/lib/libosgViewer.so.80
#9 0xb7b2707c in osgViewer::ViewerBase::frame(double) () from /usr/lib/libosgViewer.so.80
#10 0x085c4841 in fgOSMainLoop ()
at /home/cesar/compilation/pkg/flightgear/2.10.0/src/flightgear-2.10.0/src/Viewer/fg_os_osgviewer.cxx:286
#11 0x082221fa in fgMainInit (argc=32, argv=0xbffff494)
at /home/cesar/compilation/pkg/flightgear/2.10.0/src/flightgear-2.10.0/src/Main/main.cxx:339
#12 0x081e71ee in main (argc=32, argv=0xbffff494)
at /home/cesar/compilation/pkg/flightgear/2.10.0/src/flightgear-2.10.0/src/Main/bootstrap.cxx:251
As can be seen, the first few lines show exactly what happened and where it happened, while the subsequent lines show the call stack, i.e. how that piece of code got triggered. Reading from the bottom up, FlightGear started in bootstrap.cxx (line 251) and then called into main.cxx (line 339), at which point the osgviewer setup took place. Next come a few lines that do not contain much interesting information, because they are all library calls into OSG. Things get interesting again around frame #5, where FlightGear-specific code is called again. In this particular example, the actual segfault was triggered in a SimGear operation that was passed a NULL pointer (value=0x0 in frames #1 to #3).
At first, this may look a bit confusing and even intimidating, but it does make sense to people familiar with C++ and FlightGear/SimGear - so even if you cannot understand a single thing, just post such output on the issue tracker, so that others can have a look. We'll ask for additional information if necessary.
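If you want a quick overview before posting, the source locations can be pulled out of a saved backtrace with grep. The sketch below assumes the backtrace was saved to a text file and that source files use the extensions listed in the pattern; the function name backtrace_locations is made up for illustration.

```shell
# List the source locations (file:line) mentioned in a saved gdb backtrace,
# in the order they appear, i.e. innermost frame first.
# Frames without debugging symbols (e.g. "from /usr/lib/libc.so.6") are skipped.
backtrace_locations() {
    grep -oE '[^/ ]+\.(cxx|hxx|cpp|h):[0-9]+' "$1"
}
```

Frames from libraries built without debugging symbols produce no output here, which is itself a hint about which components may need to be rebuilt with symbols.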
However, at times it may be nearly impossible to see what actually goes on in FG these days, simply because there are too many disparate parts involved. For example, in the rare instance that our Nasal scripting interpreter leads to a segfault, a debugger like gdb could show us where we were in the C code, but that wouldn't tell us where we were in the Nasal code.
The same applies to many things, especially things that "jump" boundaries. Another prime example is property tree listeners: Nasal listeners don't have a backtrace beyond the C code, even if the listener is triggered directly from a Nasal setprop()/.setValue(), because the call passes through C and C++ stack frames and then back into Nasal.
Thus, valgrind/gdb/cachegrind/perftools may be too low-level to be meaningful in most cases. Just look at what's going on in the scenery/aircraft department: merely identifying "hot spots" in terms of code/functions would be a red herring there, because the real culprit is typically not in the run-time code at all, but in the underlying scenegraph data, such as highly complex textures, 3D models or terrain/scenery.
This can now be easily verified by using Zakalawe's new draw-masks to switch off aircraft/scenery/clouds or AI model rendering.
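Such switches can also be set from the command line using the --prop: option. The sketch below rests on an assumption that should be verified in the property browser of your FlightGear version: that the draw-masks are boolean properties under /sim/rendering/draw-mask/ with child names such as terrain, models, aircraft and clouds. The function name draw_mask_off is made up for illustration.

```shell
# Print one fgfs option per given draw-mask category, switching that category off.
# Assumption: the switches are boolean properties under /sim/rendering/draw-mask/.
draw_mask_off() {
    for category in "$@"; do
        printf '%s\n' "--prop:bool:/sim/rendering/draw-mask/$category=false"
    done
}
```

For example, fgfs $(draw_mask_off clouds models) would start FlightGear with those two categories hidden (the command substitution is left unquoted on purpose so that each option becomes a separate argument). If the crash or slowdown disappears, the culprit is likely in that category's data.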
Having access to gdb/valgrind etc. is one thing; being able to actually run them and interpret the results is a completely different thing.
Thus, it is more promising -in the long term- to optionally expose certain info at run-time, so that users can access it to see how their aircraft/scenery/feature/configuration behaves in comparison to some other aircraft/scenery/feature/configuration.
Also, when it comes to using tools like gdb, and especially valgrind, the main challenge is that FlightGear is primarily a GUI application, and it doesn't lend itself to being debugged, profiled or leak-tested in an isolated fashion at all.
For example, for valgrind runs to be useful and straightforward, we would need to be able to minimize what's done inside fg_init.cxx so that we can disable things not needed, and also to run FlightGear in a "headless" mode without a visible GUI window, so that such tests could be run directly on the build server as part of a regression testing suite. Otherwise, even just running fgfs in valgrind for 10 minutes of sim time may turn out to take hours due to the nature of valgrind, i.e. it being an emulator.
valgrind et al. are great if you know how to use them, but for end users these tools are too fine-grained; we would ideally support some per-subsystem or per-feature granularity for end users.
Using AddressSanitizer
| AddressSanitizer does a similar job to Valgrind with less overhead (~2.5x memory use, and nearly full speed), but requires recompiling [...] (Rebecca Palmer, 2014-08-22, "Re: [Flightgear-devel] crash in SGPropertyNode::fireValueChanged") |
# this is for Linux with llvm-3.4, libsqlite3-dev, flite1-dev, libhtsengine-dev; it will probably work on Mac (with possibly different ASAN_SYMBOLIZER_PATH) but not Windows
# simgear
cmake ../../git/simgear -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_C_FLAGS="-fsanitize=address -fno-omit-frame-pointer -O2 -g" \
  -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer -O2 -g" \
  -DCMAKE_SHARED_LINKER_FLAGS="-fsanitize=address -fno-omit-frame-pointer -O2 -g" \
  -DCMAKE_VERBOSE_MAKEFILE=1 -DSIMGEAR_SHARED=ON
make
sudo make install
# flightgear
cmake ../../git/flightgear -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DCMAKE_C_FLAGS="-fsanitize=address -fno-omit-frame-pointer -O2 -g" \
  -DCMAKE_CXX_FLAGS="-fsanitize=address -fno-omit-frame-pointer -O2 -g" \
  -DCMAKE_VERBOSE_MAKEFILE=1 -DSIMGEAR_SHARED=ON -DENABLE_IAX=OFF \
  -DENABLE_FGCOM=OFF -DSYSTEM_SQLITE=ON
make
sudo make install
# run (the AddressSanitizer report goes to stderr, which is redirected to asan_log.txt)
ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-3.4/bin/llvm-symbolizer \
ASAN_OPTIONS="symbolize=1 alloc_dealloc_mismatch=0" \
fgfs [options] 2> asan_log.txt
# post asan_log.txt to the developers mailing list and/or the issue tracker