FlightGear Git: splitting FGData

From FlightGear wiki
Revision as of 23:16, 16 November 2011 by Durk (talk | contribs)

To Split or not to Split, that's the question

After much discussion on the mailing list, it was decided to put the existing attempt to split fgdata on hold until further notice. The main reason for postponing the split was that, while it was considered a well-intended initiative, the end result of the splitting process itself left the FlightGear fgdata project in a less than desirable state. For this reason, the pros and cons of each step should be carefully evaluated before another splitting attempt is undertaken. This article discusses some of our options and formulates a plan of approach that can be presented to, and discussed in further depth on, the developers mailing list. Several reasons have been put forth to split fgdata:

Reasons to split fgdata

  • Advantages
    • Aircraft authors can get commit access to their own aircraft, without granting them global fgdata access.
    • When pulling fgdata, one won't have to download several gigs of aircraft data. People will have to pull the base package, but any additional aircraft will be optional.
    • It will be easier for aircraft authors to check the history of their aircraft.
    • Committing will go faster, because Git will no longer have to check those thousands of files to see whether they were edited. NOTE: Can't reproduce even on really old, slow (7.2k SATA) disks.
    • fgdata size decreases from 5.6 GB to 1 GB (see statistics below).

It should also be noted, however, that a split is not without potential problems:

  • Disadvantages
    • It will be harder to keep a local up-to-date copy of all aircraft. No more "git pull" to fetch all the latest updates.
      • Might be fixed by using Git submodules.[1]
    • How to deal with licenses? Until now there was a COPYING file in fgdata. When aircraft are split into separate repositories, each will likely need to include a license reference itself.
    • Need a concept for release management: maintaining version numbers, release branches, release tags, etc.
    • Quite a few unmaintained aircraft got adopted after one of the developers accidentally tripped over them. We need a plan for how this would work with split aircraft repositories; otherwise the project would axe one of the substantial principles that contributed to its success.
    • Need an idea of how to substitute the previous "starter" package, which was offered via HTTP for those who'd like to have the entire repository.
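
The first disadvantage, losing the single "git pull", can be partly worked around with a small script, even without submodules. The following is a minimal sketch and not an agreed-upon tool: it assumes every aircraft has been cloned as a direct subdirectory of one local folder, and the function names are hypothetical. It simply runs "git pull --ff-only" in each clone it finds.

```python
import subprocess
from pathlib import Path


def find_clones(root):
    """Return the direct subdirectories of root that contain a .git entry, sorted."""
    root = Path(root)
    return sorted(p for p in root.iterdir() if p.is_dir() and (p / ".git").exists())


def pull_command(clone):
    """Build the git command that fast-forwards one clone."""
    return ["git", "-C", str(clone), "pull", "--ff-only"]


def pull_all(root, run=subprocess.run):
    """Run the pull command in every clone under root; return {name: exit code}."""
    results = {}
    for clone in find_clones(root):
        completed = run(pull_command(clone))
        results[clone.name] = completed.returncode
    return results
```

Calling pull_all("Aircraft") (the folder name is an assumption) would update every clone in that folder and report which ones failed through a non-zero exit code. Unlike submodules, this does not record which aircraft revisions belong to which fgdata revision.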

One of the most prominent reasons brought forth in favor of splitting fgdata concerns the initial clone: the Git repository is relatively large, downloads from Gitorious are relatively slow, and interrupted downloads cannot be resumed. Before discussing possible solutions to this problem, a few observations should be made with respect to the actual size of the downloaded Git package:

  • Statistics: To obtain proper Git repository size statistics, make sure to check only the size of the ".git" folder, which contains the history that belongs to the archive and needs to be downloaded. Once you check out a branch as a local "working copy", the total size of the folder on your file system increases (likely doubles), since the checkout creates a working copy of all files by extracting data from the compressed archive.
    • Size of original fgdata Git repository: 5.6 GB
    • Size of fgdata core Git repository without aircraft: 1 GB
    • Total size of all aircraft repositories: 3.1 GB
    • Number of aircraft: 385
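
The measurement described above can be sketched as follows. This is an illustration only: the function names are hypothetical, it assumes a normal (non-bare) clone, and it sums apparent file sizes, so the result only approximates what has to be downloaded (a local ".git" folder that has not been repacked can be larger than the transferred data).

```python
import os


def directory_size(path):
    """Sum the sizes in bytes of all regular files below path, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def repository_sizes(repo):
    """Return (history_bytes, working_copy_bytes) for a non-bare clone at repo."""
    history = directory_size(os.path.join(repo, ".git"))
    everything = directory_size(repo)
    return history, everything - history
```

The first value is the figure that matters for the statistics; the second shows how much the checked-out working copy adds on top of it.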

It should be noted that interrupted downloads are a potential problem; however, there are a number of viable workarounds:

    • Download an initial clone using a more robust download system, such as BitTorrent.
    • Download a snapshot without full project history.
    • Clone the repository from a faster mirror, such as the mapserver.
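
One way to obtain a snapshot without the full project history is a shallow clone; the list above does not say which mechanism was meant, so this is an assumption. The sketch below only wraps "git clone --depth", and the function names are hypothetical.

```python
import subprocess


def shallow_clone_command(url, destination, depth=1):
    """Build a git clone command that fetches only the most recent `depth` commits."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return ["git", "clone", "--depth", str(depth), url, destination]


def shallow_clone(url, destination, depth=1):
    """Run the shallow clone; raises CalledProcessError if git exits with an error."""
    subprocess.run(shallow_clone_command(url, destination, depth), check=True)
```

A shallow clone keeps the download small at the cost of missing history, and depending on the Git version it may be restricted in what it can push or fetch later.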

It should further be noted that Git's merging and update algorithms are sufficiently efficient to deal with our ever-increasing repository, so no immediate problems are to be expected in this area. Given these considerations, it appears that there are sufficient alternatives to circumvent the initial clone problem, and that the size of the Git repository as such poses no immediate problem. That said, there are a number of additional reasons that make it desirable to split the fgdata repository into smaller, more manageable chunks. Splitting off the aircraft directory from the rest is a logical first step, and the main question is how to proceed with this. There are a number of possible alternatives: 1) split off all aircraft and keep them all in a single but separate repository, 2) move each aircraft to its own repository, or 3) organize aircraft by logical units. Here are the advantages and disadvantages of keeping all aircraft in a single repository:

Keeping all aircraft under a single project

  • Advantages
    • The current fgdata-developers team can access any single aircraft for easy/quick fixes, for example when something is found to be wrong and has been copied among several aircraft (which happens due to copy and paste), or when something about the sim itself changes and aircraft must be adapted to run on an upcoming release.
    • When an aircraft developer decides to leave, the repo can easily be taken over by other developers. If the author had set up his own repository, we'd have to create a new repository (and thus change all references/links).
    • It allows us to use the bug tracker for aircraft. Most developers won't clone aircraft repos from all kinds of places just to help fix bugs.
  • Disadvantages
    • Authors won't be able to choose their own license.
      • The FlightGear Aircraft project has been set to "License: Other/Multiple". This allows (in theory, we first need to agree on this) any aircraft author to add whatever license file to his/her aircraft and still put it under the project.

Organizing Aircraft by Logical units

  • Advantages
    • Logical units remain manageable, both in number and in size.
  • Disadvantages
    • It is difficult to come up with a good set of criteria to define the aircraft categories.

Assigning each aircraft to its own project

  • Advantages
    • Each aircraft developer can get commit rights to his or her own project.
  • Disadvantages
    • It will become increasingly difficult to maintain abandoned aircraft or to conduct maintenance.
    • With over 500 individual repositories it will become increasingly difficult to keep track of new developments.


Given these considerations, it can be concluded that it is desirable to separate the aircraft from the main repository. It should also be pointed out that separating out the aircraft and moving them all into a single repository is the sole action that addresses the most urgent reason for the split, namely the opportunity to be more liberal in granting aircraft developers commit rights without having to consider the integrity of the base package as such. It should furthermore be noted that, while it is technically possible to remove an entire subdirectory from a project without losing its history, it is undesirable to do so. Every split done in this manner would force every user to reclone the entire repository in question. Additionally, it is considerably more difficult to combine repositories again once they have been split. Therefore, we should be cautious in performing split operations. For this reason, the most feasible action appears to be to just separate the aircraft directory from fgdata and move it in its entirety to a new subproject. There it can live until a new and agreed-upon classification scheme for separate repositories has been developed.
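
Why a history-preserving split forces a reclone can be illustrated with a toy model. In Git, a commit's ID is a hash over its content and the ID of its parent, so removing a directory from past commits changes the ID of every commit that contained it and of all later commits. The sketch below is a deliberate simplification and not Git's real object format; the function names and the two file paths are purely illustrative.

```python
import hashlib


def commit_id(parent_id, files):
    """Toy commit ID: a SHA-1 over the parent's ID and the sorted file contents."""
    digest = hashlib.sha1()
    digest.update(parent_id.encode())
    for path in sorted(files):
        digest.update(path.encode() + b"\0" + files[path].encode() + b"\0")
    return digest.hexdigest()


def build_history(snapshots):
    """Return the list of commit IDs for a linear history of file snapshots."""
    ids, parent = [], ""
    for files in snapshots:
        parent = commit_id(parent, files)
        ids.append(parent)
    return ids


def without_directory(snapshots, prefix):
    """Drop every path below prefix from every snapshot, as a history rewrite would."""
    return [{p: c for p, c in files.items() if not p.startswith(prefix)}
            for files in snapshots]


history = [
    {"Nasal/io.nas": "v1"},
    {"Nasal/io.nas": "v1", "Aircraft/c172p/c172p-set.xml": "v1"},
    {"Nasal/io.nas": "v2", "Aircraft/c172p/c172p-set.xml": "v1"},
]
original = build_history(history)
rewritten = build_history(without_directory(history, "Aircraft/"))
```

In this model the first commit keeps its ID because it never contained an aircraft, while the second and third get new IDs. The third changes even though it only touched a non-aircraft file, because its parent changed. Existing clones still refer to the old IDs, which is why they cannot simply pull from the rewritten repository.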

Finally, the following considerations should be taken into account.

  • FlightGear's release distributions are steadily increasing in size. With the proposed fgdata split, we should consider removing all aircraft except the default Cessna 172P from the base package. All others can simply be downloaded from the website. Considering that the c172p is an integral part of FlightGear, it should be the ONLY aircraft that remains in the base package and is not moved to the new fgaircraft repository.
  • It should be emphasized that Git is a distributed revision control system, and that our current use of Git does not take sufficient advantage of this. Aircraft developers should be encouraged to set up their own personal clone on Gitorious and to post more merge requests.
References