Navdata cache

From FlightGear wiki
Revision as of 17:52, 31 October 2015 by Hooray (talk | contribs) (add working screen shots)
Screen shot showing structure of $FG_HOME
Caution  This feature is crucial to the FlightGear startup sequence, but it is known for leaving FlightGear unable to start in certain situations, particularly after a FlightGear crash or when installing or running multiple FlightGear instances at the same time[1]. This is assumed to be due to a lack of synchronization and missing atexit() handlers, which allow the SQLite database to become corrupted under these circumstances.

The only way to "fix" this currently is to rename or delete the navdata cache file entirely (navdata_n_n.cache), or to rename the folder containing $FG_HOME.
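
The workaround above can be scripted. The following is a minimal sketch and not part of FlightGear: it assumes the caller already knows where $FG_HOME is and that the cache files match the navdata_*.cache pattern mentioned above. The function name and the .bak suffix are made up for illustration.

```python
from pathlib import Path


def set_aside_navcaches(fg_home, suffix=".bak"):
    """Rename every navdata_*.cache file in fg_home so that it is
    rebuilt on the next start. Returns the new paths.

    fg_home and suffix are illustrative assumptions, not FlightGear options.
    """
    renamed = []
    for cache in sorted(Path(fg_home).glob("navdata_*.cache")):
        target = cache.with_name(cache.name + suffix)
        cache.replace(target)  # overwrites an older backup of the same name
        renamed.append(target)
    return renamed
```

Renaming rather than deleting keeps the broken file around, which is useful when attaching it to a bug report.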

Unfortunately, this bug is hard to reproduce. If you encounter it, please create a ticket that includes instructions for reproducing it, so that it can hopefully be fixed.

The navdata cache (also called navdb, and sometimes navcache) is built automatically by FlightGear during startup: it parses the gzipped nav.dat file and builds a spatial, SQLite-based database in $FG_HOME. This allows more efficient queries at run-time and, in particular, when starting up or re-initializing FlightGear. Memory consumption is also lower, since airports, fixes, taxiways and runways are not kept in memory until they are needed.
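
To illustrate the idea of parsing once and querying an indexed database afterwards, here is a minimal sketch. It does not reproduce the real nav.dat format or FlightGear's schema: the three-column "ident lat lon" input, the navaid table and both function names are assumptions, and a plain index on latitude/longitude stands in for a real spatial index.

```python
import gzip
import sqlite3


def build_cache(dat_gz_path, db_path):
    """Parse a gzipped text file of 'ident lat lon' lines into SQLite."""
    db = sqlite3.connect(db_path)
    db.execute("DROP TABLE IF EXISTS navaid")
    db.execute("CREATE TABLE navaid (ident TEXT, lat REAL, lon REAL)")
    rows = []
    with gzip.open(dat_gz_path, "rt", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            rows.append((parts[0], float(parts[1]), float(parts[2])))
    db.executemany("INSERT INTO navaid VALUES (?, ?, ?)", rows)
    db.execute("CREATE INDEX navaid_pos ON navaid (lat, lon)")
    db.commit()
    return db


def navaids_in_box(db, lat_min, lat_max, lon_min, lon_max):
    """Return idents inside a bounding box, without re-parsing any text."""
    cur = db.execute(
        "SELECT ident FROM navaid WHERE lat BETWEEN ? AND ? "
        "AND lon BETWEEN ? AND ? ORDER BY ident",
        (lat_min, lat_max, lon_min, lon_max),
    )
    return [row[0] for row in cur]
```

The text file is read only in build_cache; every later lookup goes through the database.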

Screen shot showing the SQLite3 schema of the nav cache using a GUI SQLite database editor

Background

Screen shot showing several navcache (SQLite) files in $FG_HOME for various fgfs versions

We need a lot of data available at startup for position init. Parsing it from apt.dat and nav.dat is 'slow', but has been optimised over the years. (Parsing 20MB of text data each launch is still kind of crazy, though) Collecting it from many XML files would also be slow, but there's a general point that we should be decoupling availability from loading.

Cache initialization

Usually, the navdb will only be compiled once during the first start of the simulator - however, some changes may require rebuilding the cache on disk (see below for details).

Touching the navdata files will cause a rebuild, so rebuilds need to take a 'reasonable' amount of time. Shipping the binary data improves first-launch perceptions, but many applications do additional work on their first launch after an install, so it's not a top concern for the time being.

When the scenery paths change, we have to rebuild, because Airports XML data (from the <scenery>/Airports/ tree) is overlaid into the cache, replacing the data from apt.dat. This means that whenever the search paths or their order change, we have to rebuild completely, because there is no way to restore the unmodified (prior to XML overlay) version.
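
The reason a path change forces a full rebuild can be shown with a toy model. This is a sketch under stated assumptions, not FlightGear's code: airports are plain dictionary entries, each scenery path contributes one overlay dictionary, and a later path overrides an earlier one (the real precedence rule may differ).

```python
def overlay_airports(apt_dat, scenery_overlays):
    """Apply per-scenery overlays, in search-path order, on top of apt.dat data.

    apt_dat: dict mapping airport ident -> data parsed from apt.dat.
    scenery_overlays: list of dicts, one per scenery path (hypothetical model).
    """
    merged = dict(apt_dat)  # start again from the unmodified apt.dat data
    for overlay in scenery_overlays:
        merged.update(overlay)  # replaced values are gone from `merged`
    return merged
```

The merged result depends on the order of the overlays and keeps no trace of the values it replaced, so a different path list can only be applied by starting again from the apt.dat data.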

"we should absolutely stop telling anyone to edit preferences.xml in FG_ROOT; any documentation or advice which says to should be changed ASAP."

Zakalawe considers the implementation time and fragility of a system that could restore the unmodified data to be much greater than the impact of a rebuild when switching scenery paths, which most people don't do very often.

The rebuild is purely a CPU / disk-bound operation, and for most people a complete rebuild takes around 60 seconds with a debug build and 30 seconds with a release build. (That time will be dramatically reduced if we get the taxiway data out of apt.dat in the future, since taxiways account for >80% of the lines in the file.)

Usually the navdb should be under 200 MB, and normally even a rebuild should not take much longer than a few minutes. However, on some platforms the rebuild is known to take exceptionally long. This seems to affect mainly Windows users, some of whom have reported the navcache rebuild taking more than 30 minutes.


Technical details

The nav-cache is versioned: it is wiped when the cache schema version changes, but right now it is not wiped when the FlightGear version changes.

The DB schema version is tracked internally in the file, and if the schema changes during a development version (3.3, for example) it will force a drop and rebuild.
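
A version check of this kind might look like the sketch below. It is only an illustration: it stores the version in SQLite's built-in user_version pragma, which is an assumption and not necessarily where FlightGear keeps it, and SCHEMA_VERSION, open_cache and the build callback are hypothetical names.

```python
import sqlite3

SCHEMA_VERSION = 7  # hypothetical version number


def open_cache(db_path, build):
    """Open the cache, rebuilding it if the stored schema version differs.

    build(db) is a caller-supplied function that creates and fills the tables.
    Returns (connection, rebuilt).
    """
    db = sqlite3.connect(db_path)
    (stored,) = db.execute("PRAGMA user_version").fetchone()
    if stored == SCHEMA_VERSION:
        return db, False
    # Version mismatch (or brand-new file): drop all user tables and rebuild.
    tables = db.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (name,) in tables:
        db.execute(f'DROP TABLE "{name}"')
    build(db)
    db.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
    db.commit()
    return db, True
```

A new, empty database file reports user_version 0, so the first run takes the rebuild path as well.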

The naming scheme is why the first run of a new stable release of FG always does a cache rebuild. This caused some complaints, since the feedback on cache rebuilds is not great (we indicate activity but not progress through the data). It does mean, of course, that you can run stable and development versions side by side without continual cache rebuilds.

We do also run an SQLite verification check on the DB at launch time. This should catch many kinds of file-level and index-level corruption (a bad commit, etc.), and whenever we detect an issue, the solution is again to drop the cache and rebuild.
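
The article does not say which check FlightGear runs. As a sketch of what such a launch-time verification can look like, the example below uses SQLite's standard integrity_check pragma; the function name is hypothetical.

```python
import sqlite3


def cache_is_healthy(db_path):
    """Return True if SQLite's own integrity check reports no problems."""
    try:
        db = sqlite3.connect(db_path)
        try:
            rows = db.execute("PRAGMA integrity_check").fetchall()
        finally:
            db.close()
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is not a database at all.
        return False
    return rows == [("ok",)]
```

A caller would drop and rebuild the cache whenever this returns False.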

The internal format of the cache (for example, the fact that it's a SQL DB) is deliberately opaque. No guarantees are made about the format of the file, the SQL schema or anything else. There is a version field, of course, and we're not planning or expecting any changes, but we really don't want to be tied to the current schema if we discover problems or want to add something else in the future.

The cache is stored in $FG_HOME/navdata.cache, and is rebuilt if the timestamp on any of the data files changes (apt.dat, nav.dat, fix.dat and so on). When the cache needs to be rebuilt, startup takes a bit longer than before, but when the cache is valid, startup is much faster, especially for debug builds, because all the usual parsing/processing is skipped and the corresponding data is read from the binary cache instead.
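
The timestamp rule can be sketched as follows. This is an illustration only: how FlightGear actually records the timestamps is not described here, and both function names are hypothetical.

```python
import os


def stamp_files(paths):
    """Record the modification time (in nanoseconds) of every data file."""
    return {p: os.stat(p).st_mtime_ns for p in paths}


def needs_rebuild(recorded, paths):
    """True if the set of files or any recorded timestamp has changed."""
    return recorded != stamp_files(paths)
```

Because only timestamps are compared, merely touching a data file triggers a rebuild even when its content is unchanged.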

Known issues

Note  See also these bug reports:

Read only database

The SQLite-based navcache seems to have a tendency to get corrupted, which prevents FlightGear from starting and produces error messages such as the following:

  • Sqlite error : attempt to write a readonly database received from DELETE FROM groundnet_edge
  • Sqlite error : attempt to write a readonly database

The easiest way around these errors is to delete the navcache and let FlightGear rebuild it automatically on the next startup. Depending on the amount of scenery available to FlightGear on your computer, this may take some minutes.
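
Before deleting anything, it can help to rule out a plain permissions problem, since SQLite in general reports a read-only database when it cannot write to the file. Whether that is the cause of the errors above is not established here. The sketch below is a hypothetical helper and not part of FlightGear.

```python
import os


def check_cache_writable(cache_path):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    folder = os.path.dirname(os.path.abspath(cache_path))
    if not os.path.isdir(folder):
        problems.append(f"directory does not exist: {folder}")
    elif not os.access(folder, os.W_OK):
        problems.append(f"directory is not writable: {folder}")
    if os.path.exists(cache_path) and not os.access(cache_path, os.W_OK):
        problems.append(f"cache file is not writable: {cache_path}")
    return problems
```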

Corruption due to crashes

The navcache also seems to get corrupted during segfaults/crashes, especially on Windows, but more recently also on OS X:

"Sqlite error : attempt to write a readonly database received from DELETE FROM groundnet_edge"
— daniel36 (Jul 9th, 2015). Sqlite error.
"at least on my 10.7 build, I’m seeing an issue with the nav-cache not initialising. This is actually good, because it gives me a way, finally, to trace down the ‘nav-cache never initialises’ bug that some folks have reported. But it will take a little time. Since the same binary works perfectly on my 10.10 box, I assume it’s something /very/ subtle, probably uninitialised memory or some tiny change in zlib’s gzread functions (since we use the system Zlib library which hence could be different between 10.7 and 10.10)."
— James Turner (2015-02-18). Re: [Flightgear-devel] Release 3.4.0 is coming.
"I'd suggest to review the atexit() handlers in $FG_SRC/Main to ensure that the SQLite/navcache properly cleans up all resources there, i.e. by closing files etc. As far as I remember, the navcache is not an actual/proper 'SGSubsystem', so maybe the corresponding ctors aren't getting called during segfaults/crashes. IIRC, it's implemented as a singleton:

https://gitorious.org/fg/flightgear/sou ... t.cxx#L570"
— Hooray (Wed Dec 31). Re: FATAL ERROR.

Upcoming developments

We are looking at making the loading of this data asynchronous so that startup is not delayed, but we also lack Windows developers who can investigate the issue, since Linux, Mac and some Windows boxes are absolutely fine.

There's future work to move even more data into the cache — for example parking positions — which will further help performance for FMS / map systems since we won't need to parse lots of XML data repeatedly.

"This makes me think I should do some validation on FG_HOME, i.e. that it exists, and is writeable. That would have caught the UTF-8 / windows-local-8-bit encoding screw-up I had for 2.10, too. I think we have all the pieces to do this, will come up with something in the next few days."
"I need people to track down *why* the cache rebuild is being triggered. The possibilities are: a scenery path reconfiguration (added / removed / re-ordered scenery paths) or that one of the .dat files we build the cache from has been modified. All the .dat files (Navaids/nav.dat.gz, Airports/apt.dat.gz, and similar .dat.gz files) should have a modification timestamp that corresponds to when the program was installed. If people could confirm that's the case, it would be one useful step.

The next step after that is for someone with a debug build (compiled from source) to step through the cache startup code and see which conditional is tripping the rebuild."

— zakalawe (Thu Oct 17). Re: Loading forever: "loading navigation data".
"Having the option would be awesome, because it would greatly simplify troubleshooting, as you say. Having additional startup options that explicitly override/ignore autosave.xml and ~/.fgfsrc would make it much more straightforward to exclude a plethora of potential problem sources, and we could save tons of time that way, but I wouldn't make it part of the same option. Ideally, being able to ignore autosave/fgfsrc would be different command-line arguments. Preferably, we would have a 'soft ignore' mode (retaining data) and a 'delete' mode, which resets any local settings that may interfere with FG."
"I'll add a --rebuild-caches option and it can drop the aircraft and terrasync caches too."
"I mean the update cache which stops us spamming the server with update requests every launch. Instead we only check the server once per 24-hour period, which greatly cuts down on nuisance update polls to the server.

The only thing I can't decide is whether this option should drop the auto-save file. Then the option becomes a bit more of a 'reset to defaults', which might be more useful since people often end up with old values in their auto-save and again we don't have a UI to drop it."
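
The once-per-24-hours rule described in the quote above can be sketched as a simple time comparison. The names below are hypothetical, and where the time of the last check is stored is left open.

```python
CHECK_INTERVAL_S = 24 * 60 * 60  # one check per 24-hour period


def should_poll_server(last_check_s, now_s):
    """Return True if no check was recorded or the last one is at least 24 h old.

    Both arguments are timestamps in seconds; last_check_s may be None.
    """
    if last_check_s is None:
        return True
    return now_s - last_check_s >= CHECK_INTERVAL_S
```

Dropping the update cache amounts to forgetting the stored timestamp, so the next launch polls the server again.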
"So I think only the following is missing:
  • make --restore-defaults kill the caches (nav-cache, aircraft cache and terrasync update cache)
  • add a --ignore-defaults which stops both reading *and* writing of autosave.xml
  • log any XML config file we process"

Discussion

Note  Also see A NavDB Web Service.
"We do not currently have a dedicated interface for exchanging internal data structures like the navdb, terrain data or images/video streams. However, something like that would be generally useful for a whole number of purposes and projects, so please feel free to leave a feature request at the tracker:

http://flightgear-bugs.googlecode.com/"
"the navigation databases

Currently these are gzip-compressed text. They’re not small, and currently we change them infrequently, but the upstream sources do receive updates relatively often (monthly?). Again there is some code dependency here, since new versions could break things."
"My proposed initial solution, but this is where the discussion starts:

move the category [1] files, and potentially unzipped [2] files, into the main flightgear repository, under, say, ‘flightgear/data’. I don’t think a submodule gains us anything, all the files are tiny so it won’t bloat the repository. (And it avoids the complexity of forgetting to update the submodule - but if that issue can be solved, a submodule also works for me)

(We need to switch to unzipping the category [2] files anyway, for another improvement to startup performance, and once they are text rather than binary, Git will be able to diff them efficiently, hence my thought they can maybe be treated the same way. But I’m not sure. Also there is the option to start getting upstream updates more often, e.g. between FG releases)"

FGPositioned

"it will become a base for the following:
  • FGAirport
  • FGRunway
  • FGFix
  • FGNavRecord
  • ATCData"
— James Turner (2008-08-16). [Flightgear-devel] FGPositioned refactoring.
"I've been carrying on with my experiments with my FGPositioned idea, and I now have several of my original steps done:
  • Fix, NavRecord, Airport and Runway all inherit the base class
  • they all live on the heap, where previously Runway and Fix were stack based, and hence rather heavy to work with
  • since the lifetime is now (generally) long, I can use a persistent spatial index (currently not Matthias', but it's an easy, internal change)
  • I can query all of the above in a unified fashion, and add more types easily (Jon Stockhill has obstacle data I can add in)
  • I've written a bunch of test cases, all of which pass identically on the mainline and with my changes applied"
— James Turner (2008-09-06). [Flightgear-devel] FGPositioned update.
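
The design described in the quote above, a common base class that lets all positioned objects be queried in one way, can be sketched as follows. FlightGear's real classes are C++ and are not reproduced here: the field names, the reduced set of subclasses and the linear scan (in place of a spatial index) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Positioned:
    """Common base: anything with an identifier and a position."""
    ident: str
    lat: float
    lon: float


class Airport(Positioned):
    pass


class Fix(Positioned):
    pass


class NavRecord(Positioned):
    pass


def find_in_box(items, lat_min, lat_max, lon_min, lon_max, kind=Positioned):
    """Query every positioned type through the same interface."""
    return [
        p for p in items
        if isinstance(p, kind)
        and lat_min <= p.lat <= lat_max
        and lon_min <= p.lon <= lon_max
    ]
```

Adding a new type, such as obstacle data, then only needs another subclass; the query code stays unchanged.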
"it's also the starting point for working on improving the airways code and creating a standard FGFlightPlan class - an airway or flightplan is essentially built out of FGPositioned objects, tagged with some extra data."
— James Turner (2008-08-16). [Flightgear-devel] FGPositioned refactoring.
References
  1. Thorsten Renk (Nov 12th, 2014). [Flightgear-devel] Random oddities.