Test Data Generation

This chapter explains the principles behind the Test Data Generation feature of Coco.

Principle of the Genetic Algorithm

Coco discovers new test cases by applying a genetic algorithm to an existing unit test, searching for a set of input data that maximizes the code coverage.

First, the user provides a unit test that takes input parameters (integers, floats, strings, ...) and produces some output (also integers, floats, strings, ...). Suppose that test T has two parameters, a string and an integer, and one output, a float. Each call of T, together with its output, defines a row of data.

Each time this test is executed, Coco measures the code coverage that the execution achieves.

As an example:

Row   Call          Output   Coverage
1     T("", 0)      0.0      20%
2     T("a", 1)     3.0      50%

Coco tries to find new rows of test data that increase the code coverage. It does so by mixing three techniques:

  • Using a random parameter
  • Mutating a parameter
  • Performing a crossover of 2 tests

A newly generated row is kept only if it increases the overall coverage.
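The three techniques above can be sketched in a few lines of Python. This is only an illustration for the sample test T(string, integer), not Coco's implementation: the names random_row, mutate and crossover, the value ranges and the particular way a parameter is mutated are all assumptions made for this example.

```python
import random
import string

# A row holds the inputs of one call of the sample test T(text, number).
Row = tuple[str, int]


def random_row(rng: random.Random) -> Row:
    """Technique 1: draw every parameter at random (ranges are assumptions)."""
    length = rng.randint(0, 3)
    text = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return (text, rng.randint(-10, 10))


def mutate(row: Row, rng: random.Random) -> Row:
    """Technique 2: copy an existing row and change exactly one parameter."""
    text, number = row
    if rng.random() < 0.5:
        # Change the string by appending one random letter.
        return (text + rng.choice(string.ascii_lowercase), number)
    # Change the integer by one step up or down.
    return (text, number + rng.choice([-1, 1]))


def crossover(first: Row, second: Row) -> Row:
    """Technique 3: combine the string of one row with the integer of another."""
    return (first[0], second[1])


rng = random.Random(0)  # fixed seed so that repeated runs give the same rows
parent_a = random_row(rng)
parent_b = random_row(rng)
print("random:   ", parent_a, parent_b)
print("mutation: ", mutate(parent_a, rng))
print("crossover:", crossover(parent_a, parent_b))
```

Whether a row produced this way is worth keeping is decided afterwards by measuring its coverage; the operators themselves know nothing about the code under test.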

Let's run this on our sample. At the beginning, no test data is available, so the algorithm can only choose random values:

Row   Call          Output   Coverage
1     T("x", 0)     0.0      20%
2     T("a", 10)    3.0      50%
3     T("ab", -4)   3.0      same as row 2

After the execution of 3 tests, 2 rows will be kept. The 3rd has the same coverage as the second one and so is redundant. So the full list will be:

Row   Call          Output   Coverage
1     T("x", 0)     0.0      20%
2     T("a", 10)    3.0      50%
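The selection step can be illustrated as follows. The sketch assumes that the coverage of each call is available as a set of branch identifiers and that the code under test has 10 branches; both the identifiers and the function select_rows are hypothetical and only chosen so that the percentages match the sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    call: str
    output: float
    covered: frozenset[int]  # hypothetical ids of the branches this call executed


def select_rows(candidates: list[Row]) -> list[Row]:
    """Keep a row only if it covers at least one branch not covered so far."""
    kept: list[Row] = []
    seen: set[int] = set()
    for row in candidates:
        if not row.covered <= seen:  # the row adds something new
            kept.append(row)
            seen |= row.covered
    return kept


TOTAL_BRANCHES = 10  # assumed size of the code under test

candidates = [
    Row('T("x", 0)', 0.0, frozenset({0, 1})),              # 20%
    Row('T("a", 10)', 3.0, frozenset({0, 2, 3, 4, 5})),    # 50%
    Row('T("ab", -4)', 3.0, frozenset({0, 2, 3, 4, 5})),   # same as row 2
]

kept = select_rows(candidates)
for row in kept:
    percent = len(row.covered) * 100 // TOTAL_BRANCHES
    print(row.call, row.output, f"{percent}%")
```

The third candidate executes exactly the branches that the second one already covers, so it is dropped and only the first two rows remain.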

For the next test, the algorithm can choose to perform a mutation or a crossover. This decision is made randomly.

Suppose that it performs a mutation: it takes a previous row and changes one parameter. We take T("x", 0) and replace the second parameter with -1, which gives T("x", -1). If the coverage increases, the new row is kept. If not, we try other alternatives.

If a crossover is used instead, the parameters of two existing rows are mixed. In this case, we could take the first parameter of the first row and the second parameter of the second row. This would give the test T("x", 10).

This algorithm can be iterated indefinitely.
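Putting the steps together, one possible shape of the loop is sketched below. Everything in it is an assumption made for illustration: run_test is a toy stand-in for T that reports the branches it executed by hand, whereas Coco obtains coverage from instrumented code, and the loop stops after a fixed number of iterations instead of running indefinitely.

```python
import random


def run_test(text: str, number: int) -> tuple[float, frozenset[str]]:
    """Toy stand-in for T: returns its output and the branches it executed."""
    hit = {"entry"}
    result = 0.0
    if text:
        hit.add("non_empty")
        result += len(text)
        if text.startswith("a"):
            hit.add("starts_with_a")
            result += 2.0
    if number < 0:
        hit.add("negative")
        result -= 1.0
    elif number > 5:
        hit.add("large")
        result *= 2.0
    return result, frozenset(hit)


def random_row(rng):
    text = "".join(rng.choice("abx") for _ in range(rng.randint(0, 2)))
    return (text, rng.randint(-10, 10))


def next_candidate(rows, rng):
    """Choose at random between a random row, a mutation and a crossover."""
    if len(rows) < 2:
        return random_row(rng)  # not enough rows yet to mutate or cross
    choice = rng.choice(["random", "mutate", "crossover"])
    if choice == "random":
        return random_row(rng)
    if choice == "mutate":
        text, number = rng.choice(rows)
        if rng.random() < 0.5:
            return (text + rng.choice("abx"), number)
        return (text, number + rng.choice([-1, 1]))
    first, second = rng.sample(rows, 2)
    return (first[0], second[1])


def generate(iterations: int, seed: int = 0):
    rng = random.Random(seed)
    rows, covered = [], set()
    for _ in range(iterations):
        candidate = next_candidate(rows, rng)
        _, hit = run_test(*candidate)
        if not hit <= covered:  # keep only rows that add coverage
            rows.append(candidate)
            covered |= hit
    return rows, covered


rows, covered = generate(iterations=200)
print("kept rows:", rows)
print("covered branches:", sorted(covered))
```

Because a row is kept only when it reaches a branch that no earlier row reached, the list of rows stays short no matter how many iterations are run.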

Benefit of the Genetic Approach

The main benefit of the genetic approach is that once a row of data reaches a previously uncovered branch of code, mutation and crossover of that row are efficient ways to explore the branch further. The code coverage measurements guide the algorithm.

Coco v7.2.1 ©2024 The Qt Company Ltd.