Software testing
{{Stub}}
{{Note|There’s already the test_suite in {{fg src file|path=test_suite}} using [[Cppunit effort|CppUnit]], thanks to some hard work by [[User:Bugman|Edward]]. We need more tests written for it, submissions welcome. (Pick an area of interest)<ref>https://sourceforge.net/p/flightgear/mailman/message/36972720/</ref>}}


{{See also|Cppunit effort}}
The FlightGear source code and data are tested by the FlightGear developers using a number of tools.  This includes automated testing via unit tests in [[SimGear]] and a full test suite with multiple test categories in the [[FlightGear Git|flightgear repository]], as well as manual in-sim testing.  Writing tests is one of the best ways to jump into FlightGear development.
Unit testing is one area that keeps improving: thanks to it, certain areas and features (e.g. carrier start) now "can't break". As more such areas are covered (multiplayer, AI, protocols and replay are all possible), the baseline quality increases, and incompatible changes become easier to spot. The idea is that the tests capture the "supported API": when an aircraft deviates from it, another test case can be added, the aircraft fixed, and so on. Of course, there are some pretty major areas where Automated Testing Is Hard (TM).<ref>https://sourceforge.net/p/flightgear/mailman/message/37078825/</ref>
== SimGear ==
The [[SimGear]] sources are checked via unit testing, implemented using the CMake CTest infrastructure.  A number of tests are currently written with the Boost unit testing framework and tied to the build system through CTest; however, Boost should be avoided when writing new tests, as the FlightGear developers are moving away from it.
=== Building and running the SimGear tests ===
<syntaxhighlight lang="bash">
# from the SimGear build directory
$ make
$ ctest
</syntaxhighlight>
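If only a subset of the tests needs to be rerun, the standard CTest options can be used to select and inspect them (the test-name pattern below is just an example):
<syntaxhighlight lang="bash">
$ ctest --output-on-failure   # print the output of any failing test
$ ctest -R math               # run only tests whose names match a regular expression (pattern is illustrative)
$ ctest -j4                   # run tests in parallel on four cores
</syntaxhighlight>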
== FlightGear ==
Testing of the [[FlightGear Git|flightgear sources]] is done via a comprehensive test suite implemented using [https://www.freedesktop.org/wiki/Software/cppunit/ CppUnit], a C++ port of the JUnit framework.
=== Building the test suite ===
To run the tests you will need to build FlightGear from source using CMake.  See [[Building FlightGear]] for details.
Once you have a CMake build environment set up, do the following:
# Change to your FlightGear build directory.
# Enable building the tests by setting a CMake variable:  <code>cmake -DBUILD_TESTING=ON .</code>
# Ensure the <code>[[$FG ROOT|$FG_ROOT]]</code> environment variable points to fgdata, e.g. <code>$FG_INSTALL_DIR/share/fgdata</code>.
# Build the test suite:  <code>make test_suite</code>
Building the <code>test_suite</code> target will also run the full test suite (see below), since you will typically want to write a test and then immediately compile and run it.
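Putting these steps together, a typical session might look like the following; all paths are placeholders for your own build and install locations:
<syntaxhighlight lang="bash">
# paths are examples only; adjust to your own setup
cd ~/flightgear/build                          # your FlightGear build directory
export FG_ROOT=$FG_INSTALL_DIR/share/fgdata    # point FG_ROOT at fgdata
cmake -DBUILD_TESTING=ON .                     # enable the test suite target
make test_suite                                # build (and run) the full test suite
</syntaxhighlight>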
=== Running the test suite ===
To run the test suite, simply run <code>./test_suite/fgfs_test_suite</code> from the build directory.
This will run the full test suite and print a synopsis of the results similar to the following:
<syntaxhighlight lang="text">
Synopsis
========

System/functional tests ....................................... [ OK ]
Unit tests .................................................... [ OK ]
Simgear unit tests ............................................ [ OK ]
FGData tests .................................................. [ OK ]
Synopsis ...................................................... [ OK ]
</syntaxhighlight>
You can also run individual test cases.  Run <code>./test_suite/fgfs_test_suite -h</code> to see the various options.
=== Why write unit tests? ===
A well-tested piece of software will have a much lower bug count.  An extensive test suite with '''unit tests''', system/functional tests, GUI tests, installer tests, and other categories of tests can significantly help in this regard.
The benefits of not just chasing clear "wins" are great:  An awesome learning experience for new developers; the ability to catch latent, unreported bugs; making it easier to refactor current code by creating a safety net; making it easier for current developers to accept new contributions (when accompanied with passing tests); helping other test writers by contributing to the common test suite infrastructure; and being able to easily check for memory leaks or other issues via Valgrind.<ref>https://sourceforge.net/p/flightgear/mailman/message/36977686/</ref>
If you are a new developer, just jump in and write any test!  It does not need to catch a bug. Do whatever you wish, however you wish! Just dive into this shallow end and you'll see that the water is not cold.
A test can also serve as a safety net: write the test to pass, make your changes, then make sure that the test still passes.  Then push both the test and the core changes.<ref>https://sourceforge.net/p/flightgear/mailman/message/36977465/</ref>
It is best to jump into a specific area of interest to you and submit merge requests. That will naturally trigger some C++ feedback during review, but the aim is not perfection; it is to increase the overall pool of knowledge of what best practice looks like, even if a given commit is less than perfect.
In other words, it is more important to have 10 or 20 people actively contributing correct-and-reasonable code than three people contributing absolutely perfect, micro-optimised C++.<ref>https://sourceforge.net/p/flightgear/mailman/message/36951247/</ref>
=== Benefits of unit testing ===
There are lots of benefits to writing tests that pass.

Benefits include:
* Learning!  New developers can learn a ton from writing a number of passing tests in the area they are interested in. This is one of the quickest ways to learn about a pre-existing and mature code base.  You have zero worries about breaking things.  This is diving in the shallow end.
* Latent bug uncovering.  For every ten tests you write expecting them to pass, one will probably fail, or at least uncover unexpected behavior that can be improved.
* Refactoring.  If we had 10,000 passing tests (assuming universal test coverage), large scale refactoring of the entire code base would be quick and reliable.  It would enable refactoring on a scale currently unimaginable. I cannot emphasize enough how much of a benefit this would be.
* Developer turnover.  Again, if we had 10,000 passing tests (assuming universal test coverage), it would encourage new developers.  This is because the fear of breaking something is removed.  It is a total safety net.  It also would give existing developers peace of mind when a new developer is touching one of the dark parts of FlightGear that no current developer understands (there are plenty of those).
* Test suite infrastructure.  The more passing tests written, the better the test suite infrastructure will become.  We can already do a lot.  But the addition of more passing tests will help other test writers.
* Memory checking.  Running a single test through Valgrind is amazing. Running FlightGear through Valgrind is close to impossible.  Passing tests can be written to catch memory leaks!
* Low bar for code quality and standards.  This is related to the learning point.  As long as a test compiles on all OSes without warnings, it passes, and Valgrind gives it an OK, it is good enough.  You don't need to be a C++ expert to dive into this shallow end of the pool.
 
=== Bootstrapping completely new tests ===
To start diving straight into the test suite code, first copy what was done in this commit:  {{repo link
| site  = sf
| user = edauvergne
| repo  = flightgear
| commit = 8474df
| view  = commit
}}
 
 
Just modify all the names for a JSBSim test (or any other test fixture you want to code).  You should then be able to compile and check that your new testDummy() test passes as expected.  From this basic infrastructure you can slowly build up, as you learn the FlightGear internals, C++, and Git skills required for implementing your test on a new development branch of your fork.
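As a rough sketch of that workflow (the branch name and the renaming step below are illustrative, not a fixed recipe):
<syntaxhighlight lang="bash">
# illustrative only: branch and fixture names are placeholders
git checkout -b my-jsbsim-tests next     # topic branch based on the "next" development branch
# copy the files added by the template commit, then rename the fixture,
# class and test names to match the subsystem you want to test
make test_suite                          # rebuild and rerun; the new dummy test should pass
</syntaxhighlight>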
 
=== Headless testing ===
{{Main article|FlightGear Headless}}
 
For an FDM+systems test, FlightGear should be run without a renderer (which is what the test_suite does) to benchmark the pure C++ performance of whatever system is under test (the FDM, for example). But this is not really worth doing anyway: a few hours playing with 'perf' on Linux or Instruments on macOS will show that around 80% of the CPU time is spent in OSG and the GL drivers, and hence Amdahl's law will always get you.<ref>https://sourceforge.net/p/flightgear/mailman/message/36977666/</ref>
 
=== Graphics testing ===
{{See also|FlightGear Benchmark}}
 
The other approach is to set up some test-case scenes where you can quickly measure differences and compare via screenshots. The brain, eye and memory are very bad at this; if possible, set up something you can load from the command line via a script, so that old and new versions can be tested.<ref>https://sourceforge.net/p/flightgear/mailman/message/36959002/</ref>
 
For example, a test could be a .[[fgfsrc]] that sets up (a sketch follows the list):
* the c172p at some parking position at a detailed airport;
* a camera looking at it from a specific direction with a specific FOV;
* Advanced Weather with specific METARs around (if that is not possible, Basic Weather with a specific METAR);
* fixed rendering settings (several variants may be needed for different renderers and threading modes).<ref>https://sourceforge.net/p/flightgear/mailman/message/36975122/</ref>
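A minimal sketch of such a launcher follows; the airport, parking position and METAR are arbitrary examples rather than an agreed benchmark scene, and the same options could equally live in a <code>.fgfsrc</code> file:
<syntaxhighlight lang="bash">
# hypothetical reproducible rendering scene; all values are examples
fgfs --aircraft=c172p \
     --airport=EDDF --parkpos=V266 \
     --fov=55 --timeofday=noon \
     --metar="EDDF 120950Z 24008KT 9999 FEW030 SCT100 17/07 Q1022 NOSIG" \
     --disable-real-weather-fetch
# the view direction can be pinned with --prop: arguments,
# e.g. --prop:/sim/current-view/heading-offset-deg=45
</syntaxhighlight>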
 


We can already load a [[Instant Replay|replay tape]] on startup, however, which is good for testing rendering (scenery, loading) performance, since neither the FDM nor the user interface is involved.


But essentially any and all of the methods proposed can be done with small amounts of shell-script and Nasal hacking, and any of them would be welcome additions. For lower-level tests run by developers, the unit-test framework is great (i.e. "does the API call produce the right results in the system?"), but a smoke test that regular users can run would be ideal.


A rendering performance test would likely (see the sketch after the list):
* select some particular rendering settings (clouds, draw distance, etc.);
* run a saved fgtape recording;
* record the mean/min/max FPS during the run and save it to a text file or copy it to the clipboard.
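A minimal sketch of such a benchmark run, assuming a pre-recorded tape and external sampling of the frame rate, might look like this (the tape path, threading setting and polling approach are assumptions, not an established interface):
<syntaxhighlight lang="bash">
# hypothetical benchmark launcher; tape path and settings are placeholders
fgfs --load-tape=/path/to/benchmark-tape \
     --prop:/sim/rendering/multithreading-mode=SingleThreaded \
     --timeofday=noon \
     --telnet=5401
# the frame rate could then be sampled from a separate script by polling
# /sim/frame-rate over the telnet/props interface and recording mean/min/max
</syntaxhighlight>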
So if anyone wants to work on the above, the code is all there; please jump in and start hacking. It probably does not need anything more from the core code, but as ever, please just ask if it does.<ref>https://sourceforge.net/p/flightgear/mailman/message/36975213/</ref>
For a pure rendering test (to establish FPS), the advantage of a replay tape is that the actual position (and therefore what is rendered) will be 100% consistent across different computers.
Keep in mind that the CPU use of the FDM+systems is typically < 10% of the total CPU use, even when running OSG single-threaded, so for a rendering performance test, whether the FDM runs or not is probably noise compared to other things that do run (Nasal and Canvas, for example).
Unfortunately, multi-monitor setups are not tested very commonly.<ref>https://sourceforge.net/p/flightgear/mailman/message/36904782/</ref>
== FGData ==
=== Nasal scripting (comments) ===
{{WIP}}
The now built-in CppUnit framework can solve all the issues identified in the old [[Nasal Unit Testing Framework]] wiki article and the discussions it points to, and can provide the full framework required.<ref>https://sourceforge.net/p/flightgear/mailman/message/36990615/</ref>
Some very simple tests which rely on Nasal are now running for the route manager. A few of the bigger Nasal modules (local-weather, jetways) are being skipped, and there are a few lingering issues in some other modules, but the basic concept is working.
A very interesting further step, which you might wish to discuss with Edward, is writing test checks ''in'' Nasal, since this could be quite a fast way to test some areas of the code. There are several ways that could work.<ref>https://sourceforge.net/p/flightgear/mailman/message/36764781/</ref>
There are route-manager tests which validate that route_manager.nas is working correctly, and Canvas tests ({{fg src file|path=test_suite/simgear_tests/canvas|}}) which poke the Nasal API.<ref>https://sourceforge.net/p/flightgear/mailman/message/36991200/</ref>
We need more FGData testing via the test suite.
Adding the CppUnit assertions to Nasal is the task James has been suggesting he would take on, so that others can write tests in pure Nasal.
The other piece is some C++ code to scan a directory for files matching a pattern, e.g. test_XYZ.nas, and to run each of those automatically.
The idea for testing Nasal would be to write a small CppUnit interface in C++ in $FG_SRC/test_suite/*_tests/ (the FGData Nasal testing would be in a fgdata_tests/ directory). This would register each test, which points to a script in $FG_SRC/test_suite/shared_data/nasal/, and the setUp() and tearDown() functions would use helper functions in the fgtest namespace to start and stop Nasal. The Nasal scripts could then call the CppUnit assertion macros, wrapped up as Nasal functions, to communicate failures and errors to the test suite.<ref>https://sourceforge.net/p/flightgear/mailman/message/36991150/</ref>
Scanning for scripts is a great idea.  Then developers (core and content) could write tests in pure Nasal.
However, all scripts found and executed this way would be seen as a single test within the test suite. So maybe there should be a $FG_SRC/test_suite/nasal_staging/ directory for initial development of such auto-scanned scripts, with someone shifting them into $FG_SRC/test_suite/system_tests/, $FG_SRC/test_suite/unit_tests/, or $FG_SRC/test_suite/fgdata_tests/ later on. That would give better diagnostics and would avoid long-term clutter.<ref>https://sourceforge.net/p/flightgear/mailman/message/36991198/</ref>
* First, tests should probably be hard-coded into the C++ framework. The CppUnit assertion macros will have to be wrapped up as Nasal functions for this.
* Then implement the scanning code, which needs some CMake magic (probably using file(COPY ...)).
* Finally, work out if and how the Nasal debugging output needs to be improved.
The code could go into a subdirectory of $FG_SRC/test_suite/fgdata_tests/, and the Nasal scripts into $FG_SRC/test_suite/shared_data/nasal/.


== References ==
{{Appendix}}
[[Category:Core development]]
