FlightGear Git: splitting FGData: Difference between revisions

From FlightGear wiki
[[Aircraft]] are split into separate [[Git]] repositories, under the [https://gitorious.org/flightgear-aircraft FlightGear Aircraft] project at Gitorious.
== Starting new aircraft ==
New aircraft should not be committed to fgdata. A separate repository should be created instead. Contact a FlightGear Aircraft admin to get your repository included in the FlightGear Aircraft project. You, as author, will get commit rights for that repository.


The current fgdata developers will have access to every single aircraft repository, in order to maintain aircraft that no longer have an active author, fix global bugs, etc. Other users can create merge requests to get their fixes and improvements committed.
== Step 1: Split aircraft into separate repositories (DONE) ==
Aircraft are split into "private" repositories under the [https://gitorious.org/flightgear-aircraft FlightGear Aircraft project]. Repositories for all aircraft in the current fgdata have been created as of 17 October. Repository names equal directory names, with the exception of dots (.) and capitals (ABC), as those are not supported by Gitorious.
 
There are two workflows that both do the trick:
 
=== Method 1 ===
'''Note:''' the split branch (<tt>737-300-split</tt> in this example) must NOT exist beforehand.
 
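 cd $FG_ROOT
 # Extract the history of Aircraft/737-300 into a new local branch
 git subtree split -P Aircraft/737-300 -b 737-300-split
 # Create the new aircraft repository next to fgdata
 # ($FG_ROOT must be an absolute path, since it is fetched from there)
 mkdir -p ../split-aircraft/737-300
 cd ../split-aircraft/737-300
 git init
 # Import the split branch and make it the new master
 git fetch $FG_ROOT 737-300-split
 git checkout -b master FETCH_HEAD
 # Publish the result in the FlightGear Aircraft project
 git push git@gitorious.org:flightgear-aircraft/737-300.git master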
 
=== Method 2 ===
(need to check this one)
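 cd $FG_ROOT
 # Make an independent copy of fgdata (no hard-linked object files)
 git clone --no-hardlinks $FG_ROOT ../split-aircraft/737-300
 cd ../split-aircraft/737-300
 # Drop the link to fgdata so the old history can be pruned later
 git remote rm origin
 # Rewrite history so that Aircraft/737-300 becomes the repository root
 git filter-branch --subdirectory-filter Aircraft/737-300 -- --all
 git reset --hard
 # Remove backup refs and reflogs, then delete the unreachable objects
 rm -rf .git/refs/original/
 git reflog expire --expire=now --all
 git gc --aggressive --prune=now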
 
Then all aircraft should be pushed to their new repositories. A script will take care of that.
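A minimal sketch of such a script simply repeats Method 1 for every aircraft directory. It assumes POSIX sh, an absolute <tt>$FG_ROOT</tt>, an installed <tt>git subtree</tt>, and target repositories that already exist in the FlightGear Aircraft project. The function names, the <tt>../split-aircraft</tt> working directory and the <tt>-split</tt> branch suffix are illustrative choices, not the script that was actually used. <tt>repo_name</tt> applies the Gitorious naming rule from Step 1: dots are dropped and everything is lower-cased.

```shell
#!/bin/sh
# Hypothetical helper: split every aircraft directory of fgdata into its own
# repository by repeating "Method 1" (git subtree split) for each aircraft.
# Assumes FG_ROOT is an absolute path and the target repositories exist.

# Gitorious supports neither dots nor capitals in repository names, so strip
# dots and lower-case the directory name (Supermarine-S.6B -> supermarine-s6b).
repo_name() {
    printf '%s\n' "$1" | tr -d '.' | tr '[:upper:]' '[:lower:]'
}

# split_aircraft DIR: split Aircraft/DIR and push it to its new repository.
split_aircraft() {
    aircraft=$1
    name=$(repo_name "$aircraft")
    branch="$name-split"    # must not exist beforehand
    (
        cd "$FG_ROOT" &&
        git subtree split -P "Aircraft/$aircraft" -b "$branch" &&
        mkdir -p "$FG_ROOT/../split-aircraft/$name" &&
        cd "$FG_ROOT/../split-aircraft/$name" &&
        git init &&
        git fetch "$FG_ROOT" "$branch" &&
        git checkout -b master FETCH_HEAD &&
        git push "git@gitorious.org:flightgear-aircraft/$name.git" master
    )
}

# split_all: run split_aircraft for every directory below $FG_ROOT/Aircraft.
# Review the list first: shared, non-aircraft directories may live there too.
split_all() {
    for dir in "$FG_ROOT"/Aircraft/*/; do
        split_aircraft "$(basename "$dir")" || echo "failed: $dir" >&2
    done
}
```

To try it on one aircraft first, source the file in a shell where <tt>FG_ROOT</tt> is set and run <tt>split_aircraft 737-300</tt>. After that, <tt>split_all</tt> handles the rest and reports failures instead of stopping.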
 
== Step 2: Remove all aircraft from fgdata ==
A new fgdata repository without any aircraft is created, pushed to Gitorious and tested.
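One way to create that repository is the common <tt>git filter-branch --index-filter</tt> recipe, run on a fresh clone of fgdata. This assumes the history should be rewritten so that the aircraft data really disappears from it. The function <tt>strip_aircraft</tt> is a hypothetical sketch, not the procedure that was actually used. It removes the whole <tt>Aircraft</tt> directory, so keeping a default aircraft in the base package would need an extra step.

```shell
#!/bin/sh
# Hypothetical sketch for Step 2: rewrite a *copy* of fgdata so that no commit
# contains the Aircraft/ directory any more. Never run this on the only copy.

# strip_aircraft REPO_DIR
strip_aircraft() {
    (
        cd "$1" &&
        # A fresh clone still references the old history through "origin".
        { git remote rm origin 2>/dev/null || true; } &&
        # Drop Aircraft/ from every commit; commits that only touched
        # aircraft become empty and are removed (--prune-empty).
        FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty \
            --tag-name-filter cat \
            --index-filter 'git rm -r -q --cached --ignore-unmatch Aircraft' \
            -- --all &&
        # Remove backup refs and expired objects so the repository shrinks.
        rm -rf .git/refs/original/ &&
        git reflog expire --expire=now --all &&
        git gc -q --prune=now
    )
}
```

For example: <tt>git clone --no-hardlinks $FG_ROOT ../fgdata-core && strip_aircraft ../fgdata-core</tt>.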
 
== Step 3: Write/update manuals ==
Since the aircraft development flow will be different, we should teach our developers how to use the new system.
 
'''To be written:'''
* [[FlightGear Git: aircraft authors]]
 
'''To be updated:'''
* [[FlightGear and Git]]
* [[FlightGear Git on Mac OS X]]
* [[FlightGear Git on Windows]]
 
== Step 4: Switch fgdata ==
Once tests show that the new repository is working fine and no data has been lost, the repositories will be switched.
The existing repository is renamed, the new repository takes its place, and access to the new repository is set up.
The historic fgdata is kept but stays frozen, at least until we're sure everything is safe. It may also be a good idea to keep the existing repository in any case, since it contains release branches, which are not part of the new repository and would otherwise be lost.


== Step 5: Inform the crowd ==
Post an announcement at the mailing list, forum, Facebook (?) and in the [[FlightGear Newsletter October 2011|upcoming newsletter edition]]. Link to the respective wiki articles (see [[#Step 3: Write/update manuals|Step 3]]) that contain details on how to work with the new system.


== Questions ==
* What should be done with Aircraft/UIUC? Should it be placed in one aircraft repository, like a single aircraft, or should each UIUC aircraft get its own repository?
** A flightgear-aircraft/UIUC repository has been created; we can delete it later if needed.
* Gitorious does not support dots in repository names.
** For now, dots are simply skipped, so Supermarine-S.6B has a repository called <tt>supermarine-s6b</tt> (note that Gitorious also does not distinguish capitals; this does not seem to be a problem, though, since git clone uses the original directory name).
* What rights should aircraft authors get?
** Commit/review rights might be best. If authors are granted admin rights, they can delete the repository and remove the fgdata-developers as collaborators. If they do the latter, there is no way we can regain control over the repository; we won't even be able to delete it from the FlightGear Aircraft project.
* Should we support multiple licenses under the FlightGear Aircraft project?
** Possible, but then the aircraft download page might need an update. It might be better to create one place for GNU GPL aircraft and one for aircraft under other licenses.
* How long should we test? fgdata keeps growing, as new aircraft are continuously being added.
 


{{Appendix}}

Revision as of 23:14, 16 November 2011

To Split or not to Split, that's the question

After much discussion on the mailing list, it was decided to put the existing attempt to split fgdata on hold until further notice. The main reason for postponing the split was that, while it was considered a well-intended initiative, the end result of the splitting process left the FlightGear fgdata project in a less than desirable state. For this reason, before another splitting attempt is undertaken, the pros and cons of each step should be carefully evaluated. This article discusses some of our options and formulates a plan of approach that can be presented to, and discussed in further depth on, the developers' mailing list. Several reasons have been put forth to split fgdata:

Reasons to split fgdata

  • Advantages
    • Aircraft authors can get commit access to their own aircraft, without granting them global fgdata access.
    • When pulling fgdata, one won't have to download several gigabytes of aircraft data. People will still have to pull the base package, but any additional aircraft will be optional.
    • It will be easier for aircraft authors to check the history of their aircraft.
    • Committing will go faster, because Git will no longer have to check those thousands of files to see whether they were edited. Note: this could not be reproduced, even on really old, slow (7,200 rpm SATA) disks.
    • fgdata size decreases from 5.6 GB to 1 GB (see statistics below).

It should also be noted, however, that a split is not without potential problems:

  • Disadvantages
    • It will be harder to keep a local, up-to-date copy of all aircraft: a single "git pull" will no longer fetch all the latest updates.
      • Might be fixed by using Git submodules.[1]
    • How to deal with licenses? Until now there was a COPYING file in fgdata. When aircraft are split into separate repositories, they will likely need to include a license reference themselves.
    • A concept is needed for release management: maintaining version numbers, release branches, release tags, etc.
    • Quite a few unmaintained aircraft were adopted after one of the developers accidentally tripped over them. A plan is needed for how this would work with split aircraft repositories; otherwise the project would give up one of the key principles that contributed to its success.
    • An idea is needed for how to substitute the previous "starter" package, which was offered via HTTP for those who would like to have the entire repository.
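The submodule idea could look roughly as follows. This is a sketch under assumptions, not an agreed design. A separate "all aircraft" repository records every split aircraft repository as a Git submodule below Aircraft/. The helper names add_aircraft and update_all_aircraft are hypothetical, and only standard "git submodule" commands are used.

```shell
#!/bin/sh
# Hypothetical sketch: an "all aircraft" super-repository that records each
# split aircraft repository as a Git submodule below Aircraft/.

# add_aircraft BASE_URL NAME: register BASE_URL/NAME.git as Aircraft/NAME.
add_aircraft() {
    git submodule add "$1/$2.git" "Aircraft/$2"
}

# update_all_aircraft: clone any submodules that are still missing, then
# update every aircraft from the master branch of its own remote.
update_all_aircraft() {
    git submodule update --init &&
    git submodule foreach 'git pull -q origin master'
}
```

A maintainer would run add_aircraft once per aircraft and commit the result. For example, add_aircraft git://gitorious.org/flightgear-aircraft c172p, where the URL pattern is an assumption. Users would then run update_all_aircraft where they previously used a single "git pull".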

One of the most prominent reasons brought forth in favor of splitting fgdata is the relatively large size of the initial clone of the Git repository, combined with the relatively slow download speed of Gitorious and the fact that interrupted downloads cannot be resumed. Before discussing possible alternatives to this problem, a few observations should be made with respect to the actual size of the downloaded Git repository:

  • Statistics: To obtain proper Git repository size statistics, make sure to check only the size of the ".git" folder, which contains the history that belongs to the archive and needs to be downloaded. Once you check out a branch as a working copy locally, the total size of your file system folder increases (likely doubles), since the checkout creates a working copy of all files by extracting data from the compressed archive.
    • Size of original fgdata Git repository: 5.6 GB
    • Size of fgdata core Git repository without aircraft: 1 GB
    • Total size of all aircraft repositories: 3.1 GB
    • Number of aircraft: 385
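To reproduce such figures for any local clone, the following POSIX sh sketch reports the size of the ".git" folder separately from the checked-out files. The function name repo_sizes is hypothetical. Note that du measures disk usage, which is close to, but not identical with, the download size. "git count-objects -vH" (the size-pack line) gives Git's own figure for the packed history.

```shell
#!/bin/sh
# Sketch: separate the size of a clone's history (.git) from the size of its
# checked-out working copy, as suggested for the fgdata statistics.

# repo_sizes [DIR]: print sizes in KiB for DIR (default: current directory).
repo_sizes() {
    dir=${1:-.}
    history=$(du -sk "$dir/.git" | cut -f1)
    total=$(du -sk "$dir" | cut -f1)
    echo "history (.git): $history KiB"
    echo "working copy:   $((total - history)) KiB"
    echo "total:          $total KiB"
}
```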

It should be noted that interrupted downloads are a potential problem; however, there are a number of viable workarounds:

    • Download an initial clone using a more robust download system, such as BitTorrent.
    • Download a snapshot without the full project history.
    • Clone the repository from a faster mirror, such as the mapserver.
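The snapshot workaround can be combined with shallow clones, so that an interrupted download only loses the current step. The sketch below is one possible approach, not an established FlightGear procedure. It assumes Git 2.15 or later (for "fetch --deepen" and "rev-parse --is-shallow-repository"), and the function name and default step size are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: download a repository as a snapshot first, then fetch
# the rest of the history in smaller pieces. If a transfer is interrupted,
# running the function again resumes from the history already downloaded.

# clone_in_steps URL DIR [STEP]: STEP = commits of history per fetch.
clone_in_steps() {
    url=$1 dir=$2 step=${3:-1000}
    # Step 1: the latest snapshot only (no history); skipped on a rerun.
    if [ ! -d "$dir/.git" ]; then
        git clone --depth 1 "$url" "$dir" || return 1
    fi
    # Step 2: deepen the history until the clone is no longer shallow.
    while [ "$(git -C "$dir" rev-parse --is-shallow-repository)" = true ]; do
        git -C "$dir" fetch --deepen="$step" || return 1
    done
}
```

Users who only need the current data can stop after the initial "git clone --depth 1".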

It should further be noted that Git's merging and update algorithms are efficient enough to deal with our ever-increasing repository, so no immediate problems are to be expected in this area. Given these considerations, there appear to be sufficient alternatives to circumvent the initial clone problem, and the size of the Git repository as such poses no immediate problem. That said, there are a number of additional reasons that make it desirable to split the fgdata repository into smaller, more manageable chunks. Splitting off the aircraft directory from the rest is a logical first step, and the main question is how to proceed with this. There are a number of possible alternatives: 1) split off all aircraft and keep them all in a single, but separate, repository; 2) move each aircraft to its own repository; and 3) organize aircraft by logical units. The advantages and disadvantages of each option are listed below, starting with keeping all aircraft in a single repository:

Keeping all aircraft under a single project

  • Advantages
    • The current fgdata developers team can access any aircraft for easy, quick fixes: for example, when a mistake has been copied among several aircraft (which happens due to copy and paste), or when something about the sim itself changes and aircraft must be adapted to run on an upcoming release.
    • When an aircraft developer decides to leave, the repository can easily be taken over by other developers. If the author had set up his own repository, we would have to create a new repository (and thus change all references/links).
    • It allows us to use the bug tracker for aircraft. Most developers won't clone aircraft repositories from all kinds of places just to help fix bugs.
  • Disadvantages
    • Authors won't be able to choose their own license.
      • The FlightGear Aircraft project has been set to "License: Other/Multiple". This allows (in theory, we first need to agree on this) any aircraft author to add whatever license file to his/her aircraft and still put it under the project.

Organizing Aircraft by Logical units

  • Advantages
    • Logical ordering units remain manageable, both in terms of their number and their size.
  • Disadvantages
    • It is difficult to come up with a good set of criteria to define the aircraft categories.

Assigning each aircraft to its own project

  • Advantages
    • Each aircraft developer can get commit rights to his or her own project.
  • Disadvantages
    • It will become increasingly difficult to maintain abandoned aircraft or to conduct maintenance.
    • With over 500 individual repositories it will become increasingly difficult to keep track of new developments.


Given these considerations, it can be concluded that it is desirable to separate the aircraft from the main repository. It should also be pointed out that separating out the aircraft and moving them all into a single repository is the only action that addresses the most urgent reason for the split, namely the opportunity to be more liberal in granting aircraft developers commit rights without having to consider the integrity of the base package as such. It should furthermore be noted that while it is technically possible to remove an entire subdirectory from a project without losing its history, it is undesirable to do so: every split done in this manner forces every user to reclone the entire repository in question. Additionally, it is considerably more difficult to combine repositories again once they have been split. Therefore, we should be cautious in performing split operations. For this reason, the most feasible action appears to be to separate the aircraft directory from fgdata and move it in its entirety to a new subproject. There it can live until a new, agreed-upon classification scheme for separate repositories has been developed.

Finally, the following considerations should be taken into account.

  • FlightGear's release distributions are steadily increasing in size. With the proposed fgdata split, we should consider removing all aircraft except the default Cessna 172p from the base package; all others can simply be downloaded from the website. Since the c172p is an integral part of FlightGear, it should be the ONLY aircraft that remains in the base package and is not moved to the new fgaircraft repository.
  • It should be emphasized that Git is a distributed revision control system, and that our current use of Git does not take full advantage of this. Aircraft developers should be encouraged to set up their own personal clones on Gitorious and to post more merge requests.