Friday, August 13, 2010

Fundamentals of data testing: Deterministic vs Heuristic

The style of test automation I find most useful when testing data integration systems is largely based on the “deterministic” testing style popularised by Agile methods. That means that for every value input, we need to be able to define an expected output from the system under test. This is appropriate for testing most situations that arise in data integration solutions but we find that there are occasional cases we will need to borrow from techniques in systems based on heuristics.


For example, consider the following mapping rules:



Source Column

Rule

Target Column


System date formatted as “YYYYMMDD HH24:MI:SS”

creation_date

flight_number

==>

flight_number

destination

Convert to Sentence Case

dest_name



The rules for populating flight_number and destination are deterministic. Flight number is straightforward because the value from the source is simply copied to the target. The rules for destination are slightly more involved but we easily construct test cases that define an expected result based on a given input - EDINBURGH -> Edinburgh, gLasgow -> Glasgow... and so on.


The "creation_date" rule is quite different. Timestamps taken from the system clock are non-deterministic because it’s impossible to define a value that will definitely match the system time at the moment the job is run. In some cases, you can get round this by injecting a known value into the system under test using a parameter (or some similar method) but it’s not always possible so we need to take a leaf out of heuristic systems thinking.


Heuristic systems use a combination of rules and data gathered at runtime to make decisions. A scheduling system used by an airline plots a schedule for the airline each day that takes into account the many factors that affect airline scheduling - plane location, crew availability, turnaround time, weather conditions, volcanic activity in iceland, and so forth. There is no “right” answer to the question of how to arrange the planes so it’s hard to test deterministically. Instead, the development team need to ensure that the correct decision making flow is executed when deciding the schedule rather than trying to predict an outcome. Emily and Geoff Bache work on such as system and gave a presentation at Agile2008 explaining how their test suite works.


Back to our system datetime. We can borrow from heuristic concepts by plugging in a set of rules that allow us to validate the date generated. A reasonable heuristic rule for the system date might be to check the date is correctly formatted and within a few minutes of the moment the test suite was run.


Take care though. Heuristic tests make your test suite more complicated because you need capture and process more data in a heuristic style test than a deterministic test. Sometimes it may just not be worth the effort. Furthermore, it’s easy to write heuristic tests that are just as complex than the system being tested. Let’s face it, the risk of getting the system date call wrong is pretty remote so maybe it’s just easier to ignore those fields and focus building watertight tests for rules you can test deterministically. That’s usually where the real business benefits are found anyway.


Above all, you should always try to find a deterministic solution and you should expect to do so at least 95% of the time.

3 comments:

Steve said...

Well said. Small typo though in the first line of the second paragraph. after the table. You wrote:

The "dest_name" rule is quite different. Timestamps taken from the system clock are non-deterministic because...

I think you meant:
The "creation_date" rule...

That's what I got from the context.

Bob said...

Thank you for introducing me to TextTest

Adrian Mowat said...

@steve Well spotted. Thanks for taking a moment to tell me. I've made the correction.