Friday, August 13, 2010

Fundamentals of data testing: Characteristics of a test suite

An automated test suite should be a living thing that is easy to update and easy to run. The test suite only adds value when it is run to find errors, so it should be built to run many times a day. This is just normal Agile thinking.

People who are not used to Agile sometimes think about creating an automated test suite that is open to a privileged few and used only at the end of the project, when testing is traditionally performed. At best, this leads to a situation where the suite cannot deliver maximum return on investment because it has limited chances to catch defects. At worst, the test suite will simply become an overhead that quickly falls out of use.

The most successful automated test solutions on data integration projects I have been involved with have been those that business users, analysts, developers, testers and other relevant parties all provide input to throughout the project. This means unexpected problems late in the project are easy to solve because everyone trusts that the test suite covers all possibilities. This reason alone is adequate return on the investment needed to set up the test suite.

The tests are run by the developers after every significant change, and it’s not unusual for the test suite to be run 10 to 20 times in a day. (As an aside, unit test suites in object-oriented development are run much more frequently, but data integration test suites are limited in speed because they deal with files and databases, and it’s often hard to isolate and run the lowest-level code the developer writes.)

Here, at a high level, is what works for me.

You need 3 things:

  • An environment you can tear down and rebuild many times a day
  • A way of storing input data sets and expected results
  • A way of running the input data sets through the system and verifying the output matches the expected results

I usually just build this in the development environment, where we have most control over what we can set up and delete and who is able to use it.
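
To make the three pieces above concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: an in-memory SQLite database stands in for the rebuildable environment, a plain dictionary stands in for the store of input data sets and expected results, and the transform function is a hypothetical placeholder for whatever data integration job you are actually testing. All the names are made up for the example.

```python
import sqlite3

# Hypothetical store of test cases: each pairs an input data set
# with the result we expect the transformation to produce.
TEST_CASES = {
    "trims_and_uppercases_names": {
        "input": [(1, "  alice "), (2, "Bob")],
        "expected": [(1, "ALICE"), (2, "BOB")],
    },
    "empty_input_gives_empty_output": {
        "input": [],
        "expected": [],
    },
}


def build_environment():
    """Create a fresh, throwaway database (assumed stand-in for a dev schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
    return conn


def transform(conn):
    """Placeholder for the real data transformation job under test."""
    conn.execute(
        "INSERT INTO target (id, name) SELECT id, UPPER(TRIM(name)) FROM source"
    )


def run_case(case):
    """Rebuild the environment, load the input, run the job, compare the output."""
    conn = build_environment()
    try:
        conn.executemany("INSERT INTO source VALUES (?, ?)", case["input"])
        transform(conn)
        actual = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
        return actual == case["expected"]
    finally:
        conn.close()  # tearing down is just closing the in-memory database


def run_suite(cases):
    """Run every case in isolation and report pass/fail by name."""
    return {name: run_case(case) for name, case in cases.items()}


if __name__ == "__main__":
    for name, passed in run_suite(TEST_CASES).items():
        print("PASS" if passed else "FAIL", name)
```

Because every case gets its own freshly built environment, adding a new case is just a matter of adding another input and expected result, which is what keeps the suite easy for everyone on the project to contribute to. On a real project the cases would more likely live in files or tables than in code.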

The nature of the test suite I suggest is difficult to categorise. It’s like a unit test because it’s designed to be run by the developers and uses a small volume of data. But it’s also like a system and/or user acceptance test because it often runs a sequence of related jobs and should include all the cases we expect to find in the production environment. I suppose “unit test” is the most useful description for practical purposes.

One thing it definitely is not is an integration suite or a basis for application testing. A common mistake is to try to test the flow of the data through a number of systems. It can work under specific conditions, but usually it’s working at just too high a level for your tests to run quickly. You are also at the mercy of changes in the internals, workflows and input/output interfaces of the applications, and all of this distracts you from the real goal of building good data transformation code.

There is an obvious dilemma, though; what happens if a change to an application has an impact on the data transformations?

Well, it’s OK to run the application interfaces every so often. It’s just that you should try to avoid automating the testing through them. In fact, I strongly recommend you make sure you run the whole application end to end regularly so you can get real feedback on how the system is behaving. It can be manual and doesn’t need to be heavyweight - just enough to spot problems quickly and get enough detail to update the automated tests.

This is actually a similar problem to making sure your tests cover edge cases in the data, but I’ll cover that in a future post.
