1. Overview

One of the most difficult parts of unit testing a database (or a data-focused application) is how to handle test data. Traditionally there are two ways to do this:

  1. For each test suite, create all necessary test data. Run tests. Optionally, remove test data.

  2. Create a hierarchy of test suites that use the data from a previous suite.

These each have significant drawbacks:

#1 is very bad for performance. Constantly creating new data for tests becomes a significant overhead at deeper levels of object nesting. IE: To test generating a customer statement you need invoices, which need invoice line items, and purchase orders (and line items), and customer records.

#2 is much more efficient, but maintaining the dependencies between different tests can be very difficult.

A problem for both methods is how to actually reference test data. "Which customer record do we use for testing really old unpaid invoices?"

test_factory attempts to solve these problems. It provides a simple function (tf.get) that allows you to retrieve test data using a text name as an identifier. If the test data you need doesn’t already exist then it is created automatically, and a copy is kept in a real table. That copy is never removed automatically, so retrieving that data later is fast. Dependencies are easily handled in the definition of the test data itself. For example:

INSERT INTO invoice(customer_id) VALUES( (tf.get( NULL::customer, 'base' )).customer_id )

2. Quick Start

Register two Test_Sets ("base", and "scratch") for the customer Test Object.

SELECT tf.register(
        table_name := 'customer'
        , test_sets :=
    array[
                row(
                        'base'
                        ,
$$INSERT INTO customer VALUES
        ( DEFAULT -- customer_id
                , 'email', 'first', 'middle', 'last', 'suffix'
                , 'address', 'city', 'state', 'postal'
        )
        RETURNING *
$$
                        )::tf.test_set
                , row(
                        'scratch'
                        ,
$$INSERT INTO customer VALUES
        ( DEFAULT
                , 'email2', 'first', 'middle', 'last', 'suffix'
                , 'address', 'city', 'state', 'postal'
        )
        RETURNING *
$$
                        )::tf.test_set
        ]
);

Retrieve test customer data (data will be inserted if it doesn’t already exist).

SELECT * FROM tf.get( NULL::customer, 'base' );

Register a customer invoice. Note that this test set uses the already registered customer test data.

SELECT tf.register(
    table_name := 'invoice'
    , test_sets :=
      array[
        row(
            'base'
            ,
$$INSERT INTO invoice VALUES
    ( DEFAULT -- invoice_id
        , (tf.get( NULL::customer, 'base' )).customer_id <1>
        , current_date -- Invoice date
        , current_date + 30 -- Due Date
        , 'PO number'
    )
    RETURNING *
$$
            )::tf.test_set
    ]
);
  1. Note the use of tf.get() to refer to other test data.

3. Usage

There are two uses for test_factory: registering test data and getting test data.

3.1. Registering

Test data is only registered once, using <tf.register,tf.register()>>. Usually you want to do this as part of creating a table. If you later ALTER the table a second call to tf.register() will replace the old registered data with new data.

When you register tests the data is not immediately created. Instead, you are providing commands to create test data. Those commands will only be executed by tf.get(), and only if test data doesn’t already exist.

This behavior is important because it means you can register test data in any order. The only requirement is that the relevant table exists (but the table could even have no columns when you call tf.register()!)

When registering test data that references other test data, all you need to do is embed a call to tf.get() in your registered command.

3.2. Getting

Once you have data registered, tf.get() will return it to you, creating it if necessary. All subsequent calls to tf.get() will return the same test data, without creating new test data. If you install the test_factory_pgtap extension, you can also use tf.tap()

Important
When tf.get() creates data it stores a copy of that data, and that copy is what is returned. This means that if you modify the underlying data those changes will not be seen in subsequent calls to tf.get().

4. Syntax

4.1. Terms

Test Table

A table that requires test data. Currently may only be a table.

Test Set

A single set of test data for a Test Table. A set may contain multiple rows of data. Currently, a set may only have a single command to generate the data.

4.2. tf.test_set

Every Test Table has Test_Sets associated with it. To facilitate this, there is a tf.test_set data type:

CREATE TYPE tf.test_set AS (
  set_name                text
  , insert_sql  text
);

set_name is used to subsequently refer to the data created by insert_sql. Note that it is case and space sensitive. insert_sql is a command that must return test data rows in the same form as the Test Table.

Note
If you’re wondering what’s up with the leading comma… I’ve found that it’s next to impossible to forget a comma when they are leading instead of trailing. This doesn’t sound like a big deal, but over time it really adds up. It also makes editing easier, since it’s a lot more common to add an item at the end than the start. They might look weird at first, but you quickly get used to it.

Note that insert_sql does not have to be an insert statement. It could be a function, for example. The only requirement is that it returns data in the form of table rows. A function defined as "RETURNS SETOF table_name" would work.

Example 1. tf.test_set
SELECT row(
  -- set_name
  'base' <1>

  -- insert_sql
  , $sql$
INSERT INTO customer( is_test, first_name, last_name ) <2>
  VALUES( true, 'Test first name', 'Test last name' )
  RETURNING * <3>
$sql$
)::tf.test_set <4>
                          row
\--------------------------------------------------------
 (table,"                                              +
 INSERT INTO customer( is_test, first_name, last_name )+
   VALUES( true, 'Test first name', 'Test last name' ) +
   RETURNING *                                         +
 ")
(1 row)
  1. Because test sets only exist in the context of a table their names don’t really need to include the table name.

  2. Sometimes you need test data in production systems, usually for operational reasons. A is_test column is a good awy to differentiate it.

  3. The RETURNING clause is critical; this is how tf.get() is able to retrieve the newly inserted data.

  4. This cast is necessary to turn the generic row() into a test_set.

4.2.1. tf.register()

Test data is registered using tf.register(). This function accepts a table name that the test data is for, and a set of commands (in the form of test_sets that build test data.

Important
Additional calls to tf.register() will replace already defined Test Sets for the specified table.
tf.register(
  table_name text <1>
  , test_sets tf.test_set[] <2>
) RETURNS void
  1. Currently only tables are supported. May be schema-qualified. This immediately gets cast to regclass, so the table must exist when tf.register() is called.

  2. Note that this is an array of test_set.

Notes
  • Currently tf.register doesn’t remove old test_set’s that don’t appear in a function call. That will probably change in the future.

Example 2. tf.register()
SELECT tf.register(
        table_name := 'customer'
        , test_sets :=
      array[ <1>
        row(
            'base'
            ,
$$INSERT INTO invoice ( customer_id, invoice_date ) VALUES(
    (tf.get( NULL::customer, 'base' )).customer_id <2>
    , current_date -- Invoice date
  )
  RETURNING *
$$
            )::tf.test_set <3>
    , row( <4>
      'scratch'
      ,
$sql$SELECT invoice__create(
      customer := customer_id
      , due_date := current_date + 30
      , po_number := po_number
    )
  FROM tf.get( NULL::purchase_order, 'test' ) <5>
$sql$
      ) <6>
    ]
);
  1. Remember: test_sets is an array of test_sets

  2. You can refer to other test data inline. Note the extra parenthesis!

  3. Remember to cast the row to tf.test_set

  4. Here we’re defining a second test set for this table.

  5. This is how you can use multiple fields from another piece of test data.

  6. Casting to tf.test_set is optional after the first element in the array.

4.2.2. tf.get()

Returns test data.

tf.get(
  tably_type anyelement <1>
  , set_name text <2>
) RETURNS SETOF anyelement
  1. The anyelement pseudotype is necessary to tell Postgres what type of data the function will be returning.

  2. set_name is the name of a test_set that was defined for this table. If it doesn’t exist you will get an error.

Example 3. tf.get()
SELECT * FROM tf.get(
  NULL::customer <1>
  , 'base' <2>
);
 customer_id | first_name | last_name
-------------+------------+-----------
           1 | first      | last
(1 row)
  1. This is how to tell the system what table the results will take the form of. It is equivalent to cast( NULL AS customer ).

  2. The name of a test set defined for the table. If the set doesn’t exist for the table referenced in <1> you will get an error.

4.2.3. tf.tap()

This is a convenience wrapper around tf.get() for use with pgTap. It calls tf.get() in a pgTap isnt_empty() function and returns the isnt_empty() output. This ensures you actually got test data.

tf.tap(
  table_name text <1>
  , set_name text DEFAULT 'base'
) RETURNS SETOF text
  1. Note that table_name is immediately cast to regclass, so it must be a valid table

4.3. TODO

  • [ ] The array[ row()::tf.test_set ] syntax is a bit awkward. Maybe a JSON syntax would be better.

  • [ ] Repeated calls to tf.register() should replace all Test Sets registered for a table.

  • [ ] At least some of this documentation should be turned into Postgres COMMENTs on the relevant objects.

5. Copyright

Copyright © 2015 Jim C. Nasby <Jim.Nasby@BlueTreble.com>.