Software testing

Note: There is already a test suite in $FG_SRC/test_suite using CppUnit, thanks to some hard work by Edward. We need more tests written for it; submissions are welcome (pick an area of interest).[1]

FlightGear developers use various testing tools. This includes automated testing via unit tests in SimGear and a full test suite with multiple test categories in the flightgear repository, as well as manual in-sim testing. Writing tests is one of the best ways to jump into FlightGear development.

One area that keeps improving is unit testing: certain areas and features (e.g., carrier start) now 'can't break.' As we add testing in additional areas (multiplayer, AI, protocols, and replay are all possible), we increase the baseline quality and have a clearer idea of when we are making incompatible changes. (The idea is that we capture the 'supported API' in the tests: when an aircraft deviates from that, we can decide to add another test case, fix the aircraft, etc.) Of course, there are some pretty significant areas where Automated Testing Is Hard (TM).[2]

SimGear

Unit testing for the SimGear sources uses the CMake CTest unit testing infrastructure. Several older tests use the Boost unit testing framework, tied into the build system via CTest; however, the FlightGear developers are moving away from Boost, so new tests should prefer CppUnit together with CTest.
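As a rough illustration of the preferred CppUnit style, a small test of a SimGear class might look like the sketch below. The fixture and test names are invented for this example; only the SGGeod calls and the standard CppUnit macros are real, and hooking the resulting test into CTest would follow whatever the surrounding CMakeLists.txt already does.

#include <cppunit/TestFixture.h>
#include <cppunit/extensions/HelperMacros.h>

#include <simgear/math/SGGeod.hxx>

// Illustrative fixture: check that SGGeod::fromDeg() round-trips its inputs.
class GeodTests : public CppUnit::TestFixture
{
    CPPUNIT_TEST_SUITE(GeodTests);
    CPPUNIT_TEST(testFromDegRoundTrip);
    CPPUNIT_TEST_SUITE_END();

public:
    void testFromDegRoundTrip()
    {
        // SGGeod::fromDeg() takes longitude first, then latitude, in degrees.
        const SGGeod pos = SGGeod::fromDeg(9.99, 53.63);
        CPPUNIT_ASSERT_DOUBLES_EQUAL(53.63, pos.getLatitudeDeg(), 1e-9);
        CPPUNIT_ASSERT_DOUBLES_EQUAL(9.99, pos.getLongitudeDeg(), 1e-9);
    }
};

// Plain CppUnit registration; an actual SimGear test would register itself
// the same way the existing tests in its directory do.
CPPUNIT_TEST_SUITE_REGISTRATION(GeodTests);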

Building and running the SimGear tests

$ make
$ ctest

FlightGear

Testing of the flightgear sources is done via a comprehensive test suite implemented using CppUnit, a C++ port of the well-known JUnit framework.

Building the test suite

You must build FlightGear from source using CMake to run the tests. See Building FlightGear for details.

Once you have your cmake build environment, do the following:

  1. Change to your FlightGear build directory*
  2. Enable building the tests by setting a CMake variable: cmake -DBUILD_TESTING=ON .
  3. Ensure the $FG_ROOT environment variable points to fgdata, e.g. $FG_INSTALL_DIR/share/fgdata
  4. Build the test suite: make test_suite

Building the test suite will also run it, since you will typically want to write a test and then immediately compile and run it.

* Running ls in this directory should show CMakeCache.txt, cmake_install.cmake, and other CMake-generated files. You do not want to run cmake in the flightgear source directory, which contains files such as AUTHORS, COPYING, and INSTALL.

On a Windows MSVC-based build environment, after generating files with cmake, run the following command:

  1. cmake --build . --config RelWithDebInfo --target test_suite/test_suite

Running the test suite

To run the test suite, simply run ./test_suite/fgfs_test_suite

Executing fgfs_test_suite will run the entire test suite and print a Synopsis of results, as shown below.

Synopsis
========

System/functional tests ....................................... [ OK ]
Unit tests .................................................... [ OK ]
Simgear unit tests ............................................ [ OK ]
FGData tests .................................................. [ OK ]
Synopsis ...................................................... [ OK ]


You can also run individual test cases. Run ./test_suite/fgfs_test_suite -h to see the available options.

For example, fgfs_test_suite --log-level=alert -d -u GPSTests will run the GPS unit tests while displaying the output.

Why write unit tests?

A well-tested piece of software will have a much lower bug count. An extensive test suite with unit tests, system/functional tests, GUI tests, installer tests, and other categories of tests can significantly help in this regard.

The benefits of not just chasing clear "wins" are great: an excellent learning experience for new developers; the ability to catch latent, unreported bugs; making it easier to refactor current code by creating a safety net; making it easier for current developers to accept new contributions (when accompanied by passing tests); helping other test writers by contributing to the standard test suite infrastructure; and being able to easily check for memory leaks or other issues via Valgrind.[3]

If you are a new developer, jump in and write any test! It does not need to catch a bug. Do whatever you wish! Just dive into this shallow end, and you'll see that the water is not cold.

You are writing a test as a safety net. You write the test to pass, make your changes, and then make sure that the test still passes. Then, you push both the test and core changes.[4]


It is better to work in a specific area that interests you and submit merge requests. Review will usually trigger some C++ feedback, but we aren't looking for perfection here. The feedback you receive during the open and public review process increases our overall pool of knowledge of what best practice looks like, even if a given commit is less than perfect.

Having 10 or 20 people actively contributing correct and reasonable code is more important than three people contributing perfect, micro-optimised C++. [5]

Benefits of unit testing

There are lots of benefits to writing tests that pass, including:

  • Learning! New developers can learn a ton from writing several passing tests in the area they are interested in. This is one of the quickest ways to learn about a pre-existing and mature code base. You have zero worries about breaking things.
  • Latent bug uncovering. Roughly one in every ten tests you write expecting it to pass will instead fail. Tests may uncover unexpected behavior that a developer can improve.
  • Refactoring. If we had 10,000 passing tests (assuming universal test coverage), large-scale refactoring of the entire code base would be quick and reliable. It would enable refactoring on a scale currently unimaginable. I cannot emphasize enough how much of a benefit this would be.
  • Developer turnover. Again, if we had 10,000 passing tests (assuming universal test coverage), it would encourage new developers, giving them confidence that their changes will not cause problems. It is a safety net. It also would provide existing developers peace of mind when a new developer works in one of the dark parts of FlightGear that no current developer understands (there are plenty of those).
  • Test suite infrastructure. The more passing tests written, the better the test suite infrastructure will become. We can already do a lot, but adding more passing tests will help other test writers.
  • Memory checking. Running a single test through Valgrind is fantastic; running all of FlightGear through Valgrind is close to impossible. One can write tests that pass and are also useful under Valgrind for catching memory leaks!
  • Code quality and standards. If a test compiles on all OSes without warnings, it passes, and Valgrind gives it an OK, it is good enough. You don't need to be a C++ expert to dive into this shallow end of the pool.

Bootstrapping completely new tests

To start diving straight into the test suite code, first copy what has been done in commit edauvergne/flightgear/8474df.


Just modify all the names for a JSBSim test (or any other test fixture you want to write). You should then be able to compile and check that your new dummy() test passes as expected. You can then slowly build up from this basic infrastructure as you learn the FlightGear internals, C++, and Git skills required for implementing your test on a new development branch of your fork. :)
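To make that concrete, the skeleton you end up with after renaming everything looks roughly like the sketch below. All names are placeholders, and the registration of the fixture into one of the test categories should be copied from the commit above rather than from this sketch.

// MyJSBSimTests.hxx -- illustrative skeleton for a new test fixture.
#include <cppunit/TestFixture.h>
#include <cppunit/extensions/HelperMacros.h>

class MyJSBSimTests : public CppUnit::TestFixture
{
    CPPUNIT_TEST_SUITE(MyJSBSimTests);
    CPPUNIT_TEST(testDummy);
    CPPUNIT_TEST_SUITE_END();

public:
    void setUp();     // per-test initialisation (start whatever subsystems you need)
    void tearDown();  // per-test cleanup

    // A dummy test that should always pass; replace it with real checks later.
    void testDummy();
};

// MyJSBSimTests.cxx (includes MyJSBSimTests.hxx)
void MyJSBSimTests::setUp() {}
void MyJSBSimTests::tearDown() {}

void MyJSBSimTests::testDummy()
{
    CPPUNIT_ASSERT_EQUAL(2, 1 + 1);
}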

For a step-by-step description of how to add new tests, see Software Testing/Adding CPPUnit Tests.

For a detailed explanation of everything in the flightgear/test_suite folder, from top to bottom, see Software Testing/Flightgear Test Suite Details.

Headless testing

See FlightGear Headless for the main article about this subject.

For an FDM+systems test, we should run FG without a renderer (which is what the test_suite does) to benchmark the pure C++ performance of whatever system we care about (FDM or whatever). But a few hours playing with 'perf' on Linux or Instruments on macOS will show you that OSG + the GL drivers use 80% of our CPU time, and hence Amdahl's law will always get you.[6]
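As a sketch of what benchmarking the pure C++ performance of a system inside the test suite could look like, one can time a burst of update steps with std::chrono. Everything here is illustrative; stepSystemUnderTest() is a placeholder for whatever FDM or subsystem update a real test would drive.

#include <chrono>
#include <iostream>

// Placeholder for the real work, e.g. one FDM or subsystem update step.
static void stepSystemUnderTest(double dt)
{
    (void)dt;
}

// Time N iterations without any renderer and report the average cost per step.
static double averageStepSeconds(int iterations, double dt)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        stepSystemUnderTest(dt);
    const auto end = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = end - start;
    return elapsed.count() / iterations;
}

int main()
{
    // 10,000 steps at a 120 Hz timestep, purely as an example.
    std::cout << "average step time: "
              << averageStepSeconds(10000, 1.0 / 120.0) << " s\n";
    return 0;
}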

Graphics testing

Create test-case scenes where you can quickly measure differences and compare via screenshots. The brain/eye/memory are terrible at this stuff, so set up something you can load from the command line via a script to test old and new versions, if possible.[7]

For example, a test would use several rc files (.fgfsrc variants) with different renderers and threading modes, including static values for:

  • c172p at a parking position at a detailed airport
  • camera set with a specific direction and field of view
  • Advanced Weather (AW) with specific METARs around (if not possible, Basic Weather (BW) with a specific METAR)
  • fixed rendering settings[8]

We can load a replay tape on startup. Since the FDM and user input are not involved during playback, replays are well suited to testing rendering and performance.

But essentially, small amounts of shell script and Nasal hacking can implement any of these methods, and any of them would be welcome additions. The unit-test framework is excellent for lower-level tests run by developers (i.e., 'Does the API call produce the right results in the system?'), but a smoke test that regular users can run would be ideal.

A rendering performance test would likely do the following:

  • Select some particular rendering settings (clouds, draw distance, etc.)
  • Run a saved fgtape recording
  • Record the mean/min/max FPS during the run and save it to a text file or copy it to the clipboard (see the sketch after this list)
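The FPS bookkeeping itself is simple. In practice it would probably live in a small Nasal or shell wrapper, but in C++ terms the statistics being collected amount to something like this sketch (sampling hooks, property names, and so on are deliberately left out):

#include <algorithm>
#include <fstream>
#include <limits>
#include <string>
#include <vector>

// Collect FPS samples during the replay, then dump mean/min/max to a text file.
struct FpsStats
{
    std::vector<double> samples;

    void addSample(double fps) { samples.push_back(fps); }

    void writeReport(const std::string& path) const
    {
        if (samples.empty())
            return;

        double sum = 0.0;
        double minFps = std::numeric_limits<double>::max();
        double maxFps = 0.0;
        for (double s : samples) {
            sum += s;
            minFps = std::min(minFps, s);
            maxFps = std::max(maxFps, s);
        }

        std::ofstream out(path);
        out << "mean " << sum / samples.size()
            << " min " << minFps
            << " max " << maxFps << "\n";
    }
};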

So yes, if anyone wants to work on the above, the code is all there. Please jump in and start hacking. I don't think it needs anything more from the core code, but as always, please ask if it does.[9]

For example, when describing a /rendering/ test (to establish FPS), the advantage of a replay tape is that the actual position (and, therefore, the rendered scene) will be 100% consistent across different computers.

Keep in mind that the CPU use of the FDM+systems is typically < 10% of our total CPU use, even when running OSG single-threaded, so for a rendering performance test, whether the FDM is run or not is probably noise compared to other things that do run (Nasal and Canvas, for example).

Also, multi-monitor setups are an area that could use additional unit testing.[10]

FGData

Nasal scripting (comments)


The now built-in CppUnit framework can solve all the issues identified in the old Nasal Unit Testing Framework wiki article and the discussions it points to, and it can provide the full framework required.[11]

We have some very simple tests running now for the route manager, which relies on Nasal. We're skipping a few of the bigger Nasal modules (local weather, jetways) and have a few lingering issues in some other modules, but the basic concept is working.


An exciting further step, which you might wish to discuss with Edward, is writing test checks *in* Nasal, since this could be quite a fast way to test some areas of the code. There are several ways that could work, and I don't know if Edward has already planned something around this, so I won't preempt that conversation.[12]

We have route-manager tests which validate that route_manager.nas is working correctly, and we have Canvas tests ($FG_SRC/test_suite/simgear_tests/canvas) which poke the Nasal API.[13]

We need more FGData testing via the test suite.

James has suggested adding CppUnit assertions to Nasal so others can write tests in pure Nasal. James would make these changes in C++. In addition, some C++ code in a test would scan a directory for files matching a pattern, e.g., test_XYZ.nas, and run each of those automatically.

The idea for testing Nasal would be that you write a small CppUnit interface in C++ in $FG_SRC/test_suite/*_tests/ (the FGData Nasal testing would be in a fgdata_tests/ directory). This would register each test, which points to the script in $FG_SRC/test_suite/shared_data/nasal/, and the setUp() and tearDown() functions would use helper functions in the fgtest namespace to start and stop Nasal. The Nasal scripts could then call the CppUnit assertion macros, wrapped up as Nasal functions, to communicate failures and errors to the test suite.[14]
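A rough sketch of what wrapping a CppUnit assertion up as a Nasal function could look like is shown below. The function name and how it would be registered with the Nasal interpreter are assumptions; only the Nasal extension-function signature from simgear/nasal/nasal.h and the CppUnit macro are standard.

#include <string>

#include <cppunit/extensions/HelperMacros.h>

#include <simgear/nasal/nasal.h>

// Hypothetical Nasal extension function, e.g. exposed to test scripts as
// fgtest.assert(condition, message). It forwards the check to CppUnit so that
// a failing Nasal assertion fails the surrounding C++ test.
static naRef f_cppunitAssert(naContext ctx, naRef me, int argc, naRef* args)
{
    (void)ctx;
    (void)me;

    const bool condition = (argc > 0) && naTrue(args[0]);

    std::string message = "Nasal assertion failed";
    if (argc > 1 && naIsString(args[1]))
        message = naStr_data(args[1]);

    CPPUNIT_ASSERT_MESSAGE(message, condition);
    return naNil();
}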

Scanning for scripts is a great idea. Then, developers (core and content) could write tests using pure Nasal.
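The scanning logic itself is straightforward. A sketch using std::filesystem is below; a real implementation would more likely use SimGear's own path utilities, and runNasalTestScript() is a placeholder for however the suite ends up executing each script.

#include <filesystem>
#include <iostream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Placeholder: execute one Nasal test script and report success or failure.
static bool runNasalTestScript(const fs::path& script)
{
    std::cout << "running " << script << "\n";
    return true;
}

// Find every file matching test_*.nas in a directory and run it.
static void runAllNasalTests(const fs::path& dir)
{
    std::vector<fs::path> scripts;
    for (const auto& entry : fs::directory_iterator(dir)) {
        const std::string name = entry.path().filename().string();
        if (entry.is_regular_file()
            && name.rfind("test_", 0) == 0
            && entry.path().extension() == ".nas")
        {
            scripts.push_back(entry.path());
        }
    }

    for (const auto& script : scripts)
        runNasalTestScript(script);
}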

However, all scripts found and executed would be seen as a single test within the test suite. So maybe we should have a $FG_SRC/test_suite/nasal_staging/ directory for the initial development of such auto-scanned scripts, and then have someone shift them into $FG_SRC/test_suite/system_tests/, $FG_SRC/test_suite/unit_tests/, or $FG_SRC/test_suite/fgdata_tests/ later on. That would give better diagnostics and would avoid long-term clutter.[15]

  • First, we should probably hard-code tests into the C++ framework. For this, the CppUnit assertion macros will have to be wrapped up as Nasal functions.
  • Then implement the scanning code, which needs some CMake magic (probably using file(COPY ...)).
  • Finally, we must determine if and how to improve the Nasal debugging output.

The C++ code could go into a subdirectory in $FG_SRC/test_suite/fgdata_tests/, and the Nasal scripts in $FG_SRC/test_suite/shared_data/nasal/.

References