Test Data

Posted in Testing on April 6, 2016 by Adrian Wyssmann ‐ 6 min read

Test data is very important for testing, but why? And what is test data?

The ISTQB Standard Glossary defines test data as follows:

Test data is data that exists (for example, in a database) before a test is executed, and that affects or is affected by the component or system under test.

Types and Forms of Test Data

I would categorize test data into three different types.

Input Data

Test data which is used in a confirmatory way, typically to verify that a given set of inputs to a given function produces some expected result. Other data may be used to challenge the ability of the program to respond to unusual, extreme, exceptional, or unexpected input. This data is usually bound to an individual test case.
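To illustrate both kinds of input data, here is a minimal sketch with one confirmatory record and one extreme record for the same function. The `add` method is a hypothetical stand-in for the function under test, and it assumes the SUT is supposed to reject integer overflow, which is why it delegates to `Math.addExact`.

```java
public class InputDataExample {

    // Hypothetical function under test: adds two ints and rejects overflow
    // (Math.addExact throws ArithmeticException if the result overflows).
    static int add(int a, int b) {
        return Math.addExact(a, b);
    }

    // Confirmatory input data: typical values with a known expected result.
    static boolean confirmatoryCase() {
        return add(1, 7) == 8;
    }

    // Extreme input data: a value at the integer boundary that must be rejected.
    static boolean extremeCase() {
        try {
            add(Integer.MAX_VALUE, 1);
            return false; // no exception: the overflow went unnoticed
        } catch (ArithmeticException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("confirmatory passed: " + confirmatoryCase());
        System.out.println("extreme passed: " + extremeCase());
    }
}
```

Both cases exercise the same function; only the input data, and therefore the expected behavior, differs.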

Baseline Data

Test data which forms the baseline for a test to be executed (e.g. data records in a database, so that you can test whether they can be deleted). Baseline test data typically refers to data in a database, but not exclusively (e.g. existing files on a drive to test file operations, or configuration files which bring your application into a desired state). This data is not necessarily bound to a specific test case but forms the basis (pre-condition) on which some test cases can be performed.
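The deletion example can be sketched as follows: a fixture puts known records into the store before the test runs, and the test itself only relies on that precondition. An in-memory `Map` stands in for the database here; `RecordStore`, `seedBaseline` and `testDeleteExistingRecord` are hypothetical names chosen for illustration. With a real database, the seeding step would typically run SQL scripts instead.

```java
import java.util.HashMap;
import java.util.Map;

public class BaselineDataExample {

    // Hypothetical stand-in for a database table: id -> name
    static class RecordStore {
        private final Map<Integer, String> rows = new HashMap<>();

        void insert(int id, String name) { rows.put(id, name); }

        boolean delete(int id) { return rows.remove(id) != null; }

        boolean exists(int id) { return rows.containsKey(id); }
    }

    // Baseline data: the precondition the deletion test depends on
    static RecordStore seedBaseline() {
        RecordStore store = new RecordStore();
        store.insert(1, "Record A");
        store.insert(2, "Record B");
        return store;
    }

    // The test only needs to know that record 1 exists beforehand
    static boolean testDeleteExistingRecord() {
        RecordStore store = seedBaseline();
        boolean deleted = store.delete(1);
        return deleted && !store.exists(1) && store.exists(2);
    }

    public static void main(String[] args) {
        System.out.println("delete test passed: " + testDeleteExistingRecord());
    }
}
```

Several test cases could share `seedBaseline`, which is what makes baseline data independent of any single test case.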

Output Data

Test data which is produced as the outcome of a test. It could be used as input data for subsequent tests or as documented evidence.

All these types of test data come in various forms, such as:

  • Parameters or variables
  • Files
    • Data lists (Excel files, CSV files, …)
    • Data files (to be fed to the SUT via a specified interface using a particular protocol, e.g. ASTM or HL7)
    • Configuration files (XML, master files, …)
  • Database

This point of view is open to debate, but if you keep in mind that you will probably handle each of them slightly differently (keyword “Test Data Management”), the categorization is not too bad.

Parameters / Variables

This form of test data is most common in unit tests, where you test a particular piece of code such as a method of a class. It is embedded in the code and is therefore stored and maintained in source control.

@Test
public void testSumPositiveNumbers() {
    Adder adder = new AdderImpl();
    // 1 and 7 are the input data, 8 is the expected result
    assertEquals(8, adder.add(1, 7));
}

The numbers 1 and 7 passed to the function add are test data, and so is the expected result 8 that the value returned by adder.add(1, 7) is compared against. This data is usually provided directly in the code and not fed from an external source such as a data list. Of course, you can also use parameterized tests; here is an example with JUnit:

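import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collection;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class SimpleConsoleParseTests {

    // Each row is one set of test data: two summands and the expected sum
    @Parameters
    public static Collection<Object[]> data() {
        return Arrays.asList(new Object[][] {
             { 1, 2, 3 },
             { 2, 2, 4 },
             { -1, -1, -2 },
             { -1, 1, 0 }
        });
    }

    private final int iNbr1;
    private final int iNbr2;
    private final int iResult;

    // JUnit calls this constructor once per row returned by data()
    public SimpleConsoleParseTests(int iNbr1, int iNbr2, int iResult) {
        this.iNbr1 = iNbr1;
        this.iNbr2 = iNbr2;
        this.iResult = iResult;
    }

    @Test
    public void testSumNumbers() {
        Adder adder = new AdderImpl();
        assertEquals(iResult, adder.add(iNbr1, iNbr2));
    }
}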

In this example you have four different sets of test data, each providing two numbers to be added and the expected result to be verified.


Data list

Data lists are test data represented as a table (Excel, CSV), where each line within the file represents one test data record. They are a common input for running one and the same test with different data, i.e. data-driven testing.


Imagine, for example, a data list with three records, each holding a username, a password, and whether the login is expected to succeed. The test runs three times, trying to log in with the given username and password and verifying that the user either can or cannot log in. The test framework reads each line of the data list and passes it to the function or interface under test. So in most cases the data in such files are also just variables. However, this form of data is treated slightly differently from variables embedded in the source code: the data is separated from the source code and is not necessarily kept in source control (although it would still be a good idea to do so).
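The login scenario just described can be sketched as a small data-driven test. The `login` method is a hypothetical SUT, and the column layout `user,password,expected` as well as the record values are assumptions made for illustration; the data list is embedded as strings so the sketch is self-contained, whereas in practice it would be read from a file, e.g. with `Files.readAllLines`.

```java
import java.util.ArrayList;
import java.util.List;

public class DataDrivenLoginTest {

    // Hypothetical SUT: only one known account can log in
    static boolean login(String user, String password) {
        return "alice".equals(user) && "secret".equals(password);
    }

    // One line of the data list: input data plus the expected outcome
    static final class LoginRecord {
        final String user;
        final String password;
        final boolean expectSuccess;

        LoginRecord(String user, String password, boolean expectSuccess) {
            this.user = user;
            this.password = password;
            this.expectSuccess = expectSuccess;
        }
    }

    // Parses "user,password,expected" lines; the first line is a header
    static List<LoginRecord> parse(List<String> lines) {
        List<LoginRecord> records = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] f = line.split(",", -1);
            records.add(new LoginRecord(f[0].trim(), f[1].trim(),
                    Boolean.parseBoolean(f[2].trim())));
        }
        return records;
    }

    // Runs the same test once per record and returns the number of failures
    static int runAll(List<LoginRecord> records) {
        int failures = 0;
        for (LoginRecord r : records) {
            boolean actual = login(r.user, r.password);
            if (actual != r.expectSuccess) {
                failures++;
                System.out.println("FAIL: " + r.user);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Hypothetical data list content (header plus three records)
        List<String> dataList = List.of(
                "user,password,expected",
                "alice,secret,true",
                "alice,wrong,false",
                "bob,secret,false");
        System.out.println("failures: " + runAll(parse(dataList)));
    }
}
```

The test logic is written once; adding a test case only means adding a line to the data list.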

Data files

Data files usually represent a more complex type of test data and are more common in integration or even system testing. In contrast to data lists, a single file usually represents a single test record. In my work as a test engineer at a medical device company I had to deal with such data files a lot. Here is an example:

O|1||^86839^^^^Syringe||||||||||||Blood^Venous^A. femoralis l.

Another example could be a CSV file, similar to a data list, but where the whole file is used as a single import, e.g. for testing an import function of your SUT.
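A test usually has to address individual fields of such a record, for example to vary one of them. A minimal sketch for splitting the ASTM-style record shown above into fields and components, assuming the default delimiters `|` (fields) and `^` (components) and ignoring repeat and escape delimiters; `AstmRecord`, `fields` and `components` are hypothetical helper names, not part of any library.

```java
import java.util.Arrays;

public class AstmRecord {

    // The example record from above
    static final String ORDER =
            "O|1||^86839^^^^Syringe||||||||||||Blood^Venous^A. femoralis l.";

    // Splits a record into fields; limit -1 keeps empty fields
    static String[] fields(String record) {
        return record.split("\\|", -1);
    }

    // Splits a single field into its components
    static String[] components(String field) {
        return field.split("\\^", -1);
    }

    public static void main(String[] args) {
        String[] f = fields(ORDER);
        System.out.println("record type: " + f[0]);
        System.out.println("field 2: " + f[1]);
        System.out.println("field 4 components: " + Arrays.toString(components(f[3])));
        System.out.println("last field: " + f[f.length - 1]);
    }
}
```

The limit of -1 matters here: without it, `split` would silently drop trailing empty fields.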

Configuration files

I also consider any configuration file to be test data. It may not necessarily be specific to a particular test case, but such files configure your SUT as desired for your testing.
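As a sketch of configuration data acting as test data, the snippet below loads a properties-style configuration that would put the SUT into a desired state before the tests run. The keys and values are hypothetical; the content is embedded as a string for self-containment, while in practice it would be a file kept alongside the tests.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class TestConfigExample {

    // Hypothetical test configuration content
    static final String TEST_CONFIG = String.join("\n",
            "# brings the SUT into the state required by the tests",
            "language=en",
            "mode=simulation",
            "timeoutSeconds=30");

    // Parses properties content; comment lines (#) are ignored
    static Properties load(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties config = load(TEST_CONFIG);
        // The SUT would be started with these settings before the tests run
        System.out.println("mode = " + config.getProperty("mode"));
        System.out.println("timeoutSeconds = " + config.getProperty("timeoutSeconds"));
    }
}
```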


Database

A database used for testing is also test data. Yes, a database is essentially a data file too, but it usually serves as a baseline for further tests, and due to its potentially large size and its format it may not be stored in source control. Whenever possible, I create script files which can re-create the data structure and the data in the database. This also simplifies corrections and adjustments to the test data, as well as versioning it in source control.
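A minimal sketch of generating such a script, assuming a hypothetical `sample` table with two columns; the generated SQL text, rather than the database itself, is what would be committed to source control. `DROP TABLE IF EXISTS` is accepted by many but not all SQL dialects, and quotes are escaped by doubling them, which is enough for this illustration but no substitute for a proper seeding tool.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SeedScriptGenerator {

    // Escapes single quotes for use inside an SQL string literal
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds a script that re-creates the table structure and its test data
    static String buildScript(Map<Integer, String> rows) {
        StringBuilder sql = new StringBuilder();
        sql.append("DROP TABLE IF EXISTS sample;\n");
        sql.append("CREATE TABLE sample (id INTEGER PRIMARY KEY, name VARCHAR(100));\n");
        for (Map.Entry<Integer, String> row : rows.entrySet()) {
            sql.append("INSERT INTO sample (id, name) VALUES (")
               .append(row.getKey()).append(", ")
               .append(quote(row.getValue())).append(");\n");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> rows = new LinkedHashMap<>();
        rows.put(1, "Record A");
        rows.put(2, "O'Brien");
        // Saved as a text file, this script can be diffed and versioned
        System.out.print(buildScript(rows));
    }
}
```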

Data Sources

Test data can come from different sources, and data acquisition and data conditioning look different depending on the source:

  • production / customer data
  • from scratch
  • existing test data

I will discuss this more thoroughly in another article.

Test Data Management

Test data management is all about handling test data (analyzing, creating, versioning, …) and is crucial in test engineering, because test data is strongly coupled to a particular version of a test case, and the test case version is in turn strongly coupled to a particular version of the test object. I always illustrate this with a simple example:

Figure: Versioning of Test Data (example of the relation of test data to a test case)

It should therefore be clear that test data must be managed to ensure that:

  • data is correct and adequate for the test goal and the test object
  • data belongs to the correct version of the test case
  • data which is referenced and deployed has not been modified (i.e. the test data matches the test case)
  • data can be easily extracted and stored
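The third point can be automated, for example by recording a checksum of each test data file together with the test case version and comparing it before the test runs. A minimal sketch using SHA-256 via the standard `MessageDigest` API; where the reference checksum is stored (here simply a local variable) is an assumption, and the sample data is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TestDataChecksum {

    // Returns the SHA-256 digest of the data as a lowercase hex string
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // True if the deployed test data still matches the checksum recorded
    // for the test case version
    static boolean matches(byte[] deployedData, String recordedChecksum) {
        return sha256Hex(deployedData).equalsIgnoreCase(recordedChecksum);
    }

    public static void main(String[] args) {
        // In practice the bytes would be read from the deployed test data file
        byte[] original = "alice,secret,true\n".getBytes(StandardCharsets.UTF_8);
        String recorded = sha256Hex(original); // stored with the test case version
        byte[] modified = "alice,secret,false\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("unchanged: " + matches(original, recorded));
        System.out.println("modified:  " + matches(modified, recorded));
    }
}
```

A mismatch does not say what changed, only that the deployed data no longer belongs to the recorded test case version.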

However, this is a big topic, so I will discuss it in another article.