FlightGear Git: splitting FGData
This article is outdated but is kept for historical reference.
This split was discussed for many years and finally began in September 2014. See FlightGear Newsletter September 2014#Aircraft moved to SVN. The new repositories after the split are called FGData and FGAddon.
Project issues
After much discussion on the mailing list, it was decided to put the existing attempt to split fgdata on hold until further notice. The main reason for postponing the split was that, while it was considered a well-intended initiative, the splitting process left the FlightGear fgdata project in a less than desirable state. For this reason, the pros and cons of each step should be carefully evaluated before another splitting attempt is undertaken. This article discusses some of our options and formulates a plan of approach that can be presented to, and discussed in further depth on, the developers mailing list. Several reasons have been put forth to split fgdata:
News / Status (03/2015)
I am going to commence reducing FGDATA this weekend. The reasons for that action have been discussed here and on the forum in epic detail, so I won't — Torsten Dreyer (2015-03-05). [Flightgear-devel] FGData size reduction.
Next, I'll do a initial commit of the base package aircraft to fgaddon/svn. Following that, I create a new git repo for a smaller git fgdata and push — Torsten Dreyer (2015-03-05). [Flightgear-devel] FGData size reduction.
Following a fairly lengthy discussion, here is what we now intending to do with FGData:
— James Turner (2015-03-05). Re: [Flightgear-devel] FGData size reduction.
Reasons to split fgdata
Advantages
- Aircraft authors can get commit access to their own aircraft, without granting them global fgdata access.
- When pulling fgdata, one won't have to download several gigs of aircraft data. People will have to pull the base package, but any additional aircraft will be optional.
- It will be easier for aircraft authors to check the history of their aircraft.
- Committing will go faster, because Git will no longer have to check those thousands of files to see whether they were edited. NOTE: Can't reproduce even on really old, slow (7.2k SATA) disks.
- fgdata size decreases from 5.6 GB to 1 GB (see statistics below).
Disadvantages
It should also be noted, however, that a split is not without potential problems:
- It will be harder to keep a local, up-to-date copy of all aircraft. No more "git pull" to fetch all the latest updates.
- Might be fixed by using Git submodules.[1]
- How to deal with licenses? Until now there was a COPYING file in fgdata. When aircraft are split into separate repositories, they'll likely need to include a license reference themselves.
- A concept is needed for release management: maintaining version numbers, release branches, release tags, etc.
- Quite a few unmaintained aircraft got adopted after one of the developers accidentally tripped over them. A plan is needed for how this would work with split aircraft repositories; otherwise the project would axe one of the substantial principles that contributed to its success.
- One of the main reasons for running a community owned (aircraft or source) repository, and the reason why people have "donated" (aircraft or sources) to the common repository, is to guarantee that any contributed work lives on - for as long as the (FG) project itself exists. Private repositories, even hangars run by a small number of people, are likely to become unmaintained and even lost eventually, since people's interests and hobbies change over time. Few (FG) contributors are active for more than 5-10 years. Hence, a common and well maintained community repository is essential to every open source project.
- An idea is needed for how to substitute the previous "starter" package, which was offered via HTTP for those who'd like to have the entire repository.
One of the most prominent reasons brought forth in favor of splitting fgdata is the relatively large size of the initial clone of the Git repository, combined with the relatively slow download speed of Gitorious and the observation that interrupted downloads cannot be resumed. Before discussing possible workarounds for this problem, a few observations should be made about the actual size of the downloaded Git package:
Statistics
To obtain proper Git repository size statistics, make sure to check only the size of the ".git" folder, which contains the history that belongs to the archive and needs to be downloaded. Once you check out a branch as a local "working copy", the total size of the folder on your file system increases (likely doubles), since the checkout creates a working copy of all files by extracting data from the compressed archive.
- Size of original fgdata Git repository: 5.6 GB
- Size of fgdata core Git repository without aircraft: 1 GB
- Total size of all aircraft repositories: 3.1 GB
- Number of aircraft: 385
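The measurement described above, counting the ".git" folder separately from the checked-out files, can be sketched in Python. This is only a minimal illustration: the function names are hypothetical, and it assumes a normal, non-bare clone in which ".git" is a directory.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def repo_size_report(repo_path):
    """Return (history_bytes, working_copy_bytes) for a clone at `repo_path`.

    history_bytes is the size of the ".git" folder, which is what has to be
    downloaded; working_copy_bytes is everything else in the folder.
    """
    history = dir_size_bytes(os.path.join(repo_path, ".git"))
    working_copy = dir_size_bytes(repo_path) - history
    return history, working_copy
```

Calling `repo_size_report("fgdata")` on a local clone (the path is a placeholder) would give the two numbers separately, so the history size is not confused with the size of the whole folder.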
It should be noted that interrupted downloads are a potential problem; however, there are a number of viable workarounds:
- Download an initial clone using a more robust download system, such as BitTorrent.
- Download a snapshot without full project history.
- Clone the repository from a faster mirror, such as the mapserver.
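One possible reading of "a snapshot without full project history" is a shallow clone; the article does not say which mechanism was intended, so the following is only a sketch under that assumption. The helper names and the URL in the usage note are placeholders. Note that a shallow clone reduces the amount of data transferred; it does not make the transfer resumable.

```python
import subprocess


def shallow_clone_command(url, dest, depth=1):
    """Build the argv for `git clone --depth N`, which fetches only the
    most recent `depth` commits instead of the full history."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return ["git", "clone", "--depth", str(depth), url, dest]


def shallow_clone(url, dest, depth=1):
    """Run the shallow clone; raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(url, dest, depth), check=True)
```

For example, `shallow_clone("https://example.org/fgdata.git", "fgdata")` would fetch only the latest commit from that (hypothetical) URL.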
It should further be noted that Git's merging and update algorithms are sufficiently efficient to deal with our ever-increasing repository, so no immediate problems are to be expected in this area. Given these considerations, it appears that there are sufficient ways to circumvent the initial clone problem, and that the size of the Git repository as such poses no immediate problem. That said, there are a number of additional reasons that make it desirable to split the fgdata repository into smaller, more manageable chunks.
Options
Splitting off the aircraft directory from the rest is a logical first step, and the main question is how to proceed with this. There are a number of possible options:
- Split off all aircraft and keep them all in a single, but separate repository.
- Move each aircraft to its own repository.
- Organize aircraft by logical units.
Here are the advantages and disadvantages of each option:
Single repository
- Advantages
- The current fgdata-developers team can access any single aircraft for easy/quick fixes, for example when something is found to be wrong and has been copied among several aircraft (which happens due to copy and paste), or when something about the sim itself changes and aircraft must be adapted to run on an upcoming release.
- When an aircraft developer decides to leave, the aircraft can easily be taken over by other developers.
- It allows us to use the bug tracker for aircraft. Most developers won't clone aircraft repos from all kinds of places just to help fix bugs.
- Disadvantages
- Authors won't be able to choose their own license.
- The aircraft/ directory makes up the largest part of the current repository. A single aircraft repository would make contributing to the base package easier, but it won't have much effect on aircraft developers.
Per-aircraft repository, single project
Each aircraft would get a repository under a "FGAircraft" project at Gitorious.
- Advantages
- The current fgdata-developers team can access any single aircraft for easy/quick fixes, for example when something is found to be wrong and has been copied among several aircraft (which happens due to copy and paste), or when something about the sim itself changes and aircraft must be adapted to run on an upcoming release.
- When an aircraft developer decides to leave, the repo can easily be taken over by other developers. If the author had set up his or her own repository, we'd have to create a new repository (and thus change all references/links).
- It allows us to use the bug tracker for aircraft. Most developers won't clone aircraft repos from all kinds of places just to help fix bugs.
- Authors can choose their own license.
- The FlightGear Aircraft project can be set to "License: Other/Multiple". This allows (in theory, we first need to agree on this) any aircraft author to add whatever license file to his/her aircraft and still put it under the project.
- However, the use of a common license for all community aircraft has many advantages and has also contributed to the success of the FG project: new aircraft can easily be based on existing aircraft, and elements can be copied from other aircraft without triggering a complicated mixed-license situation. The GPL specifically has many advantages for the FG project (see Changing the FlightGear License for details on why FG wants to stick to the GPL). Authors preferring another license can of course already use one, and can continue to do so, except that their aircraft is then not in the community repository.
Per-aircraft project
- Advantages
- Each aircraft developer can get admin rights to his or her own project.
- Disadvantages
- It will become increasingly difficult to maintain abandoned aircraft or to conduct maintenance.
- With over 500 individual repositories it will become increasingly difficult to keep track of new developments.
Organizing aircraft by logical units
- Advantages
- Logical units remain manageable, both in their number and in their size.
- Disadvantages
- It is difficult to come up with a good set of criteria to define the aircraft categories.
Considerations
Given these considerations, it can be concluded that it is desirable to separate the aircraft from the main repository. It should also be pointed out that separating out the aircraft and moving them all into a single repository is the sole action that addresses the most urgent reason for the split, namely the opportunity to be more liberal in granting aircraft developers commit rights without having to consider the integrity of the base package as such. It should furthermore be noted that, while it is technically possible to remove an entire subdirectory from a project without losing its history, it is undesirable to do so. Every split done in this manner would force every user to reclone the entire repository in question. Additionally, it is considerably more difficult to combine repositories again once they have been split. Therefore, we should be cautious in performing split operations. For this reason, the most feasible action appears to be to separate just the aircraft directory from fgdata and move it in its entirety to a new subproject. There it can live until a new, agreed-upon classification scheme for separate repositories has been developed.
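For reference, the history-preserving operation mentioned above is commonly done with "git filter-branch". The sketch below only builds the command lines and does not run them; the function names are hypothetical, and the directory name in the usage note is an assumption. Because such a rewrite gives every affected commit a new ID, existing clones no longer share history with the rewritten repository, which is why users would have to reclone.

```python
import shlex


def extract_subdir_command(subdir):
    """argv that rewrites all branches so that `subdir` becomes the repository
    root, keeping only the commits that touched it."""
    return ["git", "filter-branch", "--subdirectory-filter", subdir, "--", "--all"]


def drop_subdir_command(subdir):
    """argv that rewrites all branches with `subdir` removed from every commit,
    dropping commits that become empty as a result."""
    index_filter = "git rm -r --cached --ignore-unmatch " + shlex.quote(subdir)
    return ["git", "filter-branch", "--prune-empty",
            "--index-filter", index_filter, "--", "--all"]
```

With an assumed directory name of "Aircraft", `extract_subdir_command("Aircraft")` would describe the rewrite for a copy that becomes the aircraft repository, and `drop_subdir_command("Aircraft")` the rewrite for the copy that remains as fgdata. The plan in this article avoids rewriting fgdata in this way.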
Finally, the following considerations should be taken into account.
- FlightGear's release distributions are steadily increasing in size. With the proposed fgdata split, we should consider removing all aircraft except the default Cessna 172p from the base package. All others can simply be downloaded from the website. Considering that the c172p is an integral part of FlightGear, it should be the ONLY aircraft that remains in the base package and is not moved to the new fgaircraft repository.
- It should be emphasized that Git is a distributed revision control system, and that our current use of Git does not take full advantage of this. Aircraft developers should be encouraged to set up their own personal clone on Gitorious and to post more merge requests.
- As more and more people gain commit rights, some rules and guidelines are in order. We currently have a largely unwritten code of conduct that has proven to work well. With new people entering the scene, however, it will become necessary to put these rules in writing.
A new plan for splitting fgdata
- Create a new Git repository: fgaircraft. This repository will, for the time being, contain all current and new FlightGear aircraft except the default c172p.
- Provide images of the Git repository to alleviate the initial clone problem. Preferably, this should be done using BitTorrent or another resumable download source, so that people with limited bandwidth can monitor their data consumption more flexibly.
- After a few days of testing, remove all aircraft from the main fgdata repository.
- Formalize the status of the new aircraft repositories by formalizing our existing, but unwritten, code of conduct and granting new aircraft authors who agree to this code of conduct commit rights.
- Because the new system can allow us to grant a potentially large number of aircraft developers commit access, we deem it desirable to formulate an explicit set of rules with regard to the contributors' rights and obligations. It should be noted that the following rules are not really new, but merely formalizations of our existing code of conduct. By explicitly formulating them, our main goal is to avoid misunderstanding and to provide clear guidelines in the unlikely event of misuse of commit rights.
- Post these rules in a relatively prominent location on the main FlightGear.org website.
- Formulate and discuss plans for further separation into logical categories.
- Once a consensus has been reached, further split operations can be conducted.
The rules describing the rights and obligations of committers are as follows:
- Authors who obtain commit rights to fgaircraft retain the rights to handle their own work.
- The FlightGear fgaircraft administrators are allowed to update aircraft files, but only
- to enable the use of new FlightGear features (such as adding a necessary include file or set a few property switches),
- to fix small and obvious bugs (such as fixing a misspelled property or file name), or
- to apply changes necessary to keep aircraft compatible with an upcoming FlightGear release.
- The FlightGear fgaircraft repository maintainers reserve the right to revoke commit access of an individual committer. They can only exercise their right after consensus has been reached among the maintainers, and only in cases of
- misuse of commit rights by the offending developer, or
- prolonged inactivity of the committer (more than a year of inactivity)
- Aircraft that have not been maintained by the prime committer for a prolonged period of time are considered to have been abandoned and may be assigned to a different committer. The obvious exception to this rule is formed by aircraft that have a high level of completeness that are maintained by committers who are still very active in other areas of FlightGear's development.
- Commit rights will be given after the authors have shown a reasonable level of competence in both aircraft development and GIT usage. While the aircraft developer is still in the process of obtaining the required skills, he or she can seek a mentor who will handle any merge requests.
- If the aircraft developer is uncomfortable in working with GIT he or she can also opt to choose a "mentor" who will handle the merge requests for them.
- In addition to these rules, anybody contributing to the FlightGear project is encouraged to work with personal clones and submit merge requests.
Comments on the new plan
(HHS comments) That sounds like the situation we already have. The only difference is that the aircraft are split from FGData, and the maintainer of an aircraft project can get the rights to commit without any magic. But with that come a lot of rules with exceptions. From my daily real-life job I know that many rules with exceptions may make things more complicated and provoke discussions and conflicts. So my question is: how can we see that "a reasonable level of competence in both aircraft development and GIT usage" is there? Does he or she have to be checked? Which level of competence is needed for making an aircraft?
(some answers from TorstenD) I'd define "reasonable level of competence" for GIT as "be familiar with the basic everyday GIT workflow". If everyday git is understandable, I'd say OK. And for aircraft development: be able to stay in your sandbox, don't mess with others' files, be able to reuse existing (common) files, create reasonable file sizes, accept naming conventions etc. Most important: nobody is perfect, but be able to fix what you broke ;-)
--ThorstenB 17:57, 11 December 2011 (EST): It's a huge improvement to split aircraft from the main fgdata repo. Any change to an aircraft will only affect a single aircraft. But changes to the rest of fgdata (the central Nasal, Shader, GUI directories etc) can have a massive impact: on all other aircraft, on the simulator core, or on the design/direction/structure of core parts. So, it's good to have these things separated. And I think we can allow a lot more people direct commit access once aircraft are separated - even if they all share a repo. I don't think we would see many (or any) cases of authors messing with other people's work. And I think the rules are fine and simple enough - actually almost "common sense" when working in a collaborative environment.
--Durk 05:02, 12 December 2011 (EST) I would also like to mention that we don't consider the initial split to be the final stage. Additional splits are possible, but we need to carefully evaluate how we should do that. Once we do have a decent plan we can continue with phase b) of the operation. I will add that to the plan later.
--Durk 05:22, 12 December 2011 (EST) Oh, and before I forget: James Turner is working on providing the infrastructure for aircraft meta-data. Once this is in place, we would be much more flexible in splitting, but I would be cautious not to step beyond the initial stage before James is finished with that.
(hvengel comments) Having well-defined rules is a good thing. Some (perhaps many) of the aircraft devs have little or no experience with normal software development workflows, and they will need well-defined rules to help them do the right thing. Reading through the proposed rules, I don't see too much room for things to be incorrectly used. However, because the level of experience with software workflows will be all over the place (i.e. some aircraft devs are also experienced software developers/engineers and some have absolutely no experience with this type of thing), the rules should be as simple, clear and complete as possible. Nothing should be open to interpretation.
I don't see the concern over determining GIT and/or aircraft development competence to be a significant issue. I am sure that there are a significant number of aircraft devs who are currently using merge requests who would be given commit rights right from the get go. Newer aircraft devs can work with a mentor with commit rights while they are learning GIT and aircraft development and the mentor can determine when they are ready for commit rights. However I think having a documented process for this might be a good idea.
(HHS comments)
It is good to see that the existing rules are finally formalized, so we get a clear guideline.
It is also good to see that people should be more encouraged to submit merge requests. But this also means that those with commit rights should be more encouraged to review those merge requests and commit them!
--T3r 15:06, 11 December 2011 (EST): Absolutely. However, as most committers are as lazy as I am, they do not regularly check for merge requests. Picking your most trustworthy committer from the list and asking him to handle the requests for you is probably a good idea. Reminders by PM are always welcome.
--Durk 05:02, 12 December 2011 (EST) Which is actually why we introduced the notion of a mentor. The primary purpose is that more one-to-one working relations between committers and contributors will be established. This will certainly benefit the overall workflow.
--Johan G (Talk | contribs) 17:32, 24 September 2012 (EDT) The UFO should probably have the same status as proposed for the C172, but I do not know whether it resides with the aircraft or already with the rest of FlightGear.
New Splitting Concept
See FGAddon for the new concept.
References