Testing & Documenting your Testing
This article is aimed at undergraduates taking courses in programming.
When I set assignments, there are usually some marks allocated for the testing of the program, which has to be documented. Some students do this very well, but I suspect other students could have earned more marks with more guidance on what testing and documenting the testing involve.
This article aims to answer common student queries about testing the functionality of software. It also gives guidance on how to present testing results as part of a submission for a programming assignment, to help students get more marks.
This article covers:
- The Goals of Testing
- Preparing to Test
- Choice of Test Cases
- Black Box Testing
- White Box Testing
- Testing the Right Things
- Debugging
The Goals of Testing
Firstly, if you haven't already, read this article about the basics of software testing.
You want to have a program that works. You want it to be bug-free. Some bugs you may already know about, because you've noticed there are times when your program isn't working correctly. In that case, you need to debug your program: finding out where the bug is located, and then fixing it.
On the other hand, maybe you've fixed all the bugs you know of, and maybe your program looks like it works, but how do you know it really does work? It may appear to work (you haven't seen any bugs recently), but how can you be confident that your program really does work 100% of the time, and that the bugs in your program aren't just hiding? This is where testing comes in. Testing is all about finding bugs in your program.
It is actually very difficult to test programs that you yourself have written, because if there aren't any obvious bugs, it is very tempting to believe that your program works. It's very difficult to imagine that there could be some hidden bugs that turn up in circumstances you haven't considered yet. But no matter how tempting it might be to not test your programs thoroughly, test you must! Bugs are very common things to find in programs.
To give you an idea of how often bugs can occur, let me tell you of a student with whom I was discussing testing. This student didn't realise how often bugs occur in programs. I asked the student how many out of, say, 50 coursework submissions he thought would contain 100% correct code. The student thought about half of the submissions would be bug-free. In reality, from experience of marking many assignments, I would expect only a couple of the 50 submissions to be bug-free, complete programs. In addition, probably about half of these coursework submissions would report that no errors were found in any of their code. So that is an awful lot of undetected bugs in submitted program code!
The best attitude towards testing is to remember that if a test finds a bug, then it is a good test! Try to adopt the attitude "There is very likely a bug in my program somewhere, and I'M GOING TO FIND IT!!!". If it helps, imagine that your friend is going to test your program too, and you will owe your friend a fiver for every bug your friend finds. So you'd better find those bugs before your friend does!
Preparing to Test
You will need a specification to test your program against. Remember that a specification is simply a clear description of what the program code is supposed to do.
Some of the specification you need will come from the coursework handout itself, because that says what work you are to do for the programming assignment. Some of the specification may come from you yourself, though. For example, if you decide for yourself that you need a procedure to append an element to the end of a list, then it will be you who specifies what that procedure is supposed to do. You can do that, for example, by writing preconditions and postconditions for your methods.
Now that you have your specification(s), before you can actually do any testing, you have to decide what it is that you are going to test. In other words, you have to form a test plan.
Your test plan will consist of lots of little tests. Each small test will investigate how your code behaves under certain conditions. These choices of tests to perform under assorted circumstances are known as test cases.
For each test case, you need to describe the conditions under which the test runs, what data is being used and what actions are being carried out. You also need to predict what the expected result of the test should be if the program behaves correctly - that is, if the code meets its specification.
Then you can run your tests, and find out what the actual results of the testing were. To see whether your code passed the tests, you compare the expected test results to the actual test results; if the actual results of a test match the expected results, then your code passes that test.
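As a minimal sketch of this idea, a test case can be written down as a description, an expected result and an actual result, and it passes only when the two match. The names TestCase, TestPlanDemo and max3 are invented for this illustration and do not come from any particular assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: one test case = conditions + expected + actual.
class TestCase {
    final String description; // conditions, data and actions of the test
    final int expected;       // predicted from the specification before running
    final int actual;         // what the code really produced

    TestCase(String description, int expected, int actual) {
        this.description = description;
        this.expected = expected;
        this.actual = actual;
    }

    // The code passes this test only if actual matches expected.
    boolean passed() {
        return expected == actual;
    }
}

class TestPlanDemo {
    // Hypothetical code under test. Specification: returns the largest of a, b and c.
    static int max3(int a, int b, int c) {
        return Math.max(a, Math.max(b, c));
    }

    // A small test plan: several test cases covering different situations.
    static List<TestCase> runPlan() {
        List<TestCase> plan = new ArrayList<>();
        plan.add(new TestCase("largest value first: max3(9, 2, 5)", 9, max3(9, 2, 5)));
        plan.add(new TestCase("largest value in the middle: max3(2, 9, 5)", 9, max3(2, 9, 5)));
        plan.add(new TestCase("largest value last, all negative: max3(-4, -7, -1)", -1, max3(-4, -7, -1)));
        plan.add(new TestCase("all values equal: max3(3, 3, 3)", 3, max3(3, 3, 3)));
        return plan;
    }

    public static void main(String[] args) {
        for (TestCase t : runPlan()) {
            System.out.println((t.passed() ? "PASS" : "FAIL") + " | " + t.description
                    + " | expected " + t.expected + " | actual " + t.actual);
        }
    }
}
```

Note that the expected values are written down by hand from the specification, not computed by calling the code under test; otherwise the comparison would prove nothing.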
Choice of Test Cases
When you are choosing which tests you are going to do on your program code, there are two general things you have to bear in mind:
The first point is that your tests should be really thorough, that is, you should cover all aspects of your program code when it comes to testing.
The other general point is that you should use a wide variety of tests, for each part of your program that you focus on testing. In other words, for each aspect of your program that you test, there should be a wide variety of test cases.
Here's a general example to illustrate these two points:
Suppose your program was a game. Your tests would have to cover all the various parts of the game - that's the thoroughness bit, covering everything.
As for variety, well, for example, one of the aspects of the game you might test would be the scoring of the game; you'd want to make sure that the code was getting the scoring correct. So you wouldn't do just one run of the game to see whether the scoring was correct, because a bug in the scoring might not show up in that particular run of the game. No, you'd do lots and lots of different runs of the game with different scores, to check that the scoring works for a wide variety of different runs of the game.
Test cases can use randomly-chosen data, but they can also use carefully chosen data to investigate what happens in certain particular situations. For example:
Suppose you had a procedure
public void append(LinkedList list, int i) {
// pre: true
// post: the item i has been appended onto the end of the list
You could try appending all kinds of different integers to a linked list, and see whether the appending is done correctly (here a method that displays the contents of your list would be useful!).
Also, you could choose your test case data carefully to cover different situations. Looking at the code for the procedure...
public void append(LinkedList list, int i) {
    if (list.last == null) { // list is empty
        list.last = new ListNode();
        list.last.data = i;
        list.last.next = null;
        list.head = list.last;
    }
    else { // list non-empty, add to end
        list.last.next = new ListNode();
        list.last.next.data = i;
        list.last.next.next = null;
        list.last = list.last.next;
    }
} // append
...we can see that there are two main cases, when the list is empty before appending and when the list is not empty. So we should try a few test cases appending different integers to an empty list and another few test cases appending different integers to non-empty lists, including some lists of different sizes.
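One possible way to set up such test cases is sketched below. The classes SimpleList and ListNode are minimal stand-ins assumed for this illustration (the article does not show its list classes), and show and check are hypothetical helper methods; the body of append follows the two-branch procedure discussed above.

```java
// Minimal stand-ins for the list classes that append assumes (hypothetical).
class ListNode {
    int data;
    ListNode next;
}

class SimpleList {
    ListNode head;
    ListNode last;
}

class AppendTests {
    // Same two-branch logic as the append procedure discussed above.
    static void append(SimpleList list, int i) {
        if (list.last == null) { // list is empty
            list.last = new ListNode();
            list.last.data = i;
            list.last.next = null;
            list.head = list.last;
        } else { // list non-empty, add to end
            list.last.next = new ListNode();
            list.last.next.data = i;
            list.last.next.next = null;
            list.last = list.last.next;
        }
    }

    // Displays the list contents, e.g. "[5, -3]", so results are easy to compare.
    static String show(SimpleList list) {
        StringBuilder sb = new StringBuilder("[");
        for (ListNode n = list.head; n != null; n = n.next) {
            if (n != list.head) sb.append(", ");
            sb.append(n.data);
        }
        return sb.append("]").toString();
    }

    // Prints one line per test: verdict, description, expected and actual.
    static void check(String description, String expected, String actual) {
        System.out.println((expected.equals(actual) ? "PASS" : "FAIL") + " | " + description
                + " | expected " + expected + " | actual " + actual);
    }

    public static void main(String[] args) {
        // Case 1: the list is empty before appending.
        SimpleList empty = new SimpleList();
        append(empty, 5);
        check("append 5 to an empty list", "[5]", show(empty));

        // Case 2: the list has one element before appending.
        SimpleList one = new SimpleList();
        append(one, 5);
        append(one, -3);
        check("append -3 to [5]", "[5, -3]", show(one));

        // Case 3: a longer non-empty list.
        SimpleList many = new SimpleList();
        for (int i = 1; i <= 4; i++) append(many, i);
        append(many, 0);
        check("append 0 to [1, 2, 3, 4]", "[1, 2, 3, 4, 0]", show(many));
    }
}
```

The expected strings are typed in by hand, so each line of output shows directly whether the actual list contents match the prediction.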
What you test depends very much on the data structures involved and what your code is actually doing. However, here are some questions you can use to inspire your choice of tests:
- For each global variable, as the program progresses, does it always contain the correct information?
- Does this method do what it is supposed to do?
Another thing to bear in mind is that it is easier to start out with small test cases that use a small quantity of data. This is because it is easier to predict what the correct answer should be with a small test. Once you have tested your program with smaller quantities of data and it seems to work ok, then you can test with larger datasets.
Once you have a little experience of testing your programs, you may find two concepts (explained below) helpful: black box testing, which concentrates on the function of your program, and white box testing, which concentrates on how your program is structured.
Black Box Testing
In black box testing, you just test what the outputs of the program should be, given the program's inputs (e.g. input from the user or from a file). This kind of testing does not pay attention to how the program is coded; it concentrates on various aspects of what the program does.
For example, in the sample testing documentation for the (Pascal) programming assignment The ESP Game, the testing of the scoring of the game is a "black box" style test: it concentrates on the aspect of the game's output that is the scoring, and in several runs of the game, the scoring is inspected to check that it is correct.
Also, in the same example, the testing of whether the randomness seemed to be working ok for random predictions was a black box style test.
Black box tests can be very useful for finding some types of bugs. What they might not be quite so good at is finding bugs that only turn up in certain unusual circumstances. A useful technique to complement black box testing is that of white box testing...
White Box Testing
Unlike black box testing, white box testing does mean looking at the program code and choosing the test cases based on the structure of the program code.
So when you do white box testing, it is based on the code. You go through your program thoroughly and systematically, thinking about each section of code - how do you know it is doing what it should be doing? For example,
- Check the code in each file/class is working ok.
- Check whether the code in each section, including each method, is working ok
- Check all the different execution paths through the code. For example, if you have an if statement, you need to check circumstances where each branch of the if will be chosen.
For example, in the sample testing documentation for the (Pascal) programming assignment The ESP Game, the test cases for the tree unit were "white box" style tests, inspired by the actual code. For example, in the tests for the GetPrediction Procedure, the code for the procedure is
function FindPrediction(t: GuessTree; gh: GuessHistory; depth:integer): char;
// Makes a prediction based on the most recent four guesses.
// depth indicates the depth in the overall tree
// gh is the most recent four guesses of the user.
// If no prediction can be made (e.g. because this sequence of guesses
// hasn't occurred before or the number of following heading guesses is
// equal to the number of following tail guesses), then the char 'N'
// is returned.
begin
  if (t=nil) then
    FindPrediction := 'N' // no prediction; user hasn't done this before
  else
  begin
    if depth=4 then // we're at the right level to make a prediction
    begin
      if t^.headcount > t^.tailcount then
        FindPrediction := 'H'
      else if t^.headcount < t^.tailcount then
        FindPrediction := 'T'
      else // could equally be either
        FindPrediction := 'N'
    end
    else // not deep enough yet
    begin
      if upCase(gh[depth+1])='H' then
        FindPrediction := FindPrediction(t^.left,gh,depth+1) // go left for head
      else
        FindPrediction := FindPrediction(t^.right,gh,depth+1) // go right for tail
    end
  end
end;

function GetPrediction(t: GuessTree; gh:GuessHistory): char;
// Makes a prediction based on the most recent four guesses.
begin
  GetPrediction := FindPrediction(t,gh,0);
end;
If you look at the test cases chosen, you can see how the various branches of the procedure were tested: there are test cases for an empty tree, there are test cases where a prediction of H was returned, there are test cases where a prediction of T was returned, and there are test cases for situations where no prediction was possible from the data in the tree.
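To illustrate choosing one test case per branch, here is a rough Java sketch of just the innermost decision in FindPrediction (the comparison of the head and tail counts). The method predictFromCounts and the class PredictionBranchTests are hypothetical names for this illustration; this is not the code or the test documentation from the ESP Game assignment, and it leaves out the tree traversal entirely.

```java
class PredictionBranchTests {
    // Hypothetical Java port of the innermost decision in FindPrediction:
    // 'H' if heads are ahead, 'T' if tails are ahead, 'N' if the counts are tied.
    static char predictFromCounts(int headCount, int tailCount) {
        if (headCount > tailCount) {
            return 'H';
        } else if (headCount < tailCount) {
            return 'T';
        } else { // could equally be either
            return 'N';
        }
    }

    // Prints one line per test: verdict, description, expected and actual.
    static void report(String description, char expected, char actual) {
        System.out.println((expected == actual ? "PASS" : "FAIL") + " | " + description
                + " | expected " + expected + " | actual " + actual);
    }

    public static void main(String[] args) {
        // At least one test case for each branch of the if statement.
        report("heads ahead (3 heads, 1 tail)", 'H', predictFromCounts(3, 1));
        report("tails ahead (0 heads, 2 tails)", 'T', predictFromCounts(0, 2));
        report("counts tied (2 heads, 2 tails)", 'N', predictFromCounts(2, 2));
        report("counts tied at zero (0 heads, 0 tails)", 'N', predictFromCounts(0, 0));
    }
}
```

Each call in main is chosen by looking at the structure of the code: the first forces the first branch, the second forces the else-if branch, and the last two force the final else.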
Testing the Right Things
There is one more thing to bear in mind when testing, which is to test the correct things. Otherwise you end up putting testing effort into tests that don't need to be run.
The classic example of testing the wrong thing comes from a psychology experiment by Peter Wason:
Suppose you are presented with four cards, all placed flat on a table in front of you. You are told that every card has a letter on one side of it and a number on the other.
You are also told the rule "Every card that has a vowel on one side of it has an even number on the other side." One card shows "E", another shows "7", and the other two show "4" and "K".
Which cards do you turn over to test whether this rule is indeed true?
Most people turn over the cards marked with "E" and "4". It is true that you do need to turn over the card marked "E", to check that there is indeed an even number on the other side. But turning over the card with "4" is unnecessary; it is testing the wrong thing. Whether the reverse shows a consonant or a vowel, neither will violate the rule. No, the other card you need to turn over is the card marked "7", because if it has a vowel on the other side, then it will violate the rule.
Similarly, you have to test the right things when you are testing.
For example, do test your handling of user input if your program is supposed to handle user errors in input, but if your program isn't supposed to handle user errors, then don't test that. For example, if your program is a game, then it may very well have to cope with errors in user input, so you do want to test how your program behaves in that situation. On the other hand, if what you coded is a hash table, and you are meant to be testing your hash table, using a test harness, then testing the user input doesn't tell you anything about whether the hash table works correctly or not.
Another common area where this can occur is that of whether a precondition is true or not for a procedure/function. For example, suppose there is a procedure
public Element pop () {
// pre: this stack is not empty
// post: the top element of the stack has been removed
// from the top of the stack and returned
To test this method, note it is only supposed to work if the precondition is true, i.e. the stack is non-empty. So it is no use testing this method to see what happens when the stack is empty. There may however be some code somewhere else in your program whose job it is to make sure the stack isn't empty before calling the pop() method, and in that case, then by all means test that bit of code to make sure it is living up to its responsibilities and behaving correctly with respect to the stack!
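As a rough sketch of what testing "that bit of code" might look like, the example below tests a caller that guards pop() instead of testing pop() on an empty stack. The method popOrDefault and the class PopGuardDemo are hypothetical, and java.util.ArrayDeque is used here only as a stand-in for your own stack class.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PopGuardDemo {
    // Hypothetical caller whose job is to respect pop()'s precondition:
    // it only calls pop() when the stack is non-empty, and otherwise
    // returns the given fallback value without touching the stack.
    static int popOrDefault(Deque<Integer> stack, int fallback) {
        if (stack.isEmpty()) {
            return fallback;
        }
        return stack.pop();
    }

    // Prints one line per test: verdict, description, expected and actual.
    static void report(String description, int expected, int actual) {
        System.out.println((expected == actual ? "PASS" : "FAIL") + " | " + description
                + " | expected " + expected + " | actual " + actual);
    }

    public static void main(String[] args) {
        // Test the guard: with an empty stack, pop() must not be called.
        Deque<Integer> stack = new ArrayDeque<>();
        report("empty stack returns the fallback", -1, popOrDefault(stack, -1));

        // Test the normal path: precondition holds, so the top element comes back.
        stack.push(4);
        stack.push(9);
        report("non-empty stack returns the top element", 9, popOrDefault(stack, -1));
        report("one element remains after the pop", 1, stack.size());
    }
}
```

The empty-stack test here is aimed at the guard in the caller, which is responsible for that situation, not at pop() itself.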
Debugging
Whilst debugging is really a whole different topic, it is worth mentioning how debugging can help inspire your testing.
If you're like most people, you're sensibly doing the code development a step at a time, and once you've compiled that bit and run it to see if it "seems" to work, then you debug and fix any obvious errors. One way debugging can help you with testing is that the debugging you do can help inspire your choice of tests - if you do see a bug then you can ask yourself "what test could I do to catch that kind of bug?"
Also, sometimes when trying to find bugs you program in little extra bits of code (like System.out.println statements) to give you more information about the variables and the data they hold. As you do this, you are actually doing little tests on your code, to see whether variables contain what they should. What you can do is incorporate that into your testing later. Leave the extra code in your program, but commented out, so the marker can see what code you used to do the tests.
Describing your Tests
When you are producing documentation to show what testing you did, what you need to aim for is to be really really clear about precisely what your tests were. For any test you do on your program, you need to say
- what are you trying to test (explain clearly in words)
- what data are you using for the test
- what operations are being performed in the test
- what result you would expect if the program passed the test
- what result you actually did get from the test
This information needs to be clear and detailed, and sufficient to describe the test so precisely that someone else could know enough to run the exact same test, just from the documentation given.
You can see various examples of tests described in this sample test documentation, which should give you an idea of how these things can be described. Here are various further hints and tips for describing your tests and the results of testing:
- Convincing a skeptic
Imagine there is an extremely skeptical person who doesn't believe that your code works: you have to SHOW that person that your code really does work. This person is not standing next to you at the computer where you could point at the results, so you have to put it all down in documentation form.
- Tabulating your tests and results
There are various ways you can describe your tests. Some people find it helpful to list some of their tests in a table, like these tables from the sample testing documentation for the programming assignment The ESP Game. For some tests, a textual description is more helpful for explaining the purpose of the test and the other information listed above. Sometimes you need to use diagrams (e.g. if you have a tree structure and you want to show what data is in the tree) and the diagrams just don't fit neatly into a table. So long as the information about your test is clearly presented, that is fine.
When you are describing the expected and actual results of your test, do not simply say "it works" or "works normally" or "the test worked as expected"; you need to be clear about precisely what it means for your code to pass the test. Yes, it may be obvious to you, but you have to put it down; otherwise how does the marker know that you really do know what the expected result of your test is? If you just say "it works", it looks as though you were not able to predict the expected result and never compared the expected and actual results, but simply assumed your program works!
- Debugging techniques
If you use debugging techniques to inspire your choice of tests, be clear about how you carried out the testing. Don't just say "I used the debugging facilities of Delphi" - anyone can say that! HOW did you use the facilities? What were you looking for? Show the reader, don't just tell the reader.
- Use of screenshots
When you carry out tests, sometimes it can be handy to show the results of your program directly. One way to do this is with screenshots. Be smart in the way you do this, though:
  - Don't give screenshots where the writing is so small it can't be read.
  - Don't give a whole big screenshot just so that the reader can see that the result was "2". Do use the crop feature to show just the relevant parts of your screenshots.
  - Don't include unnecessary screenshots; often simply stating what your results were is fine.
  - Don't leave the reader to figure out what your screenshot is supposed to convey. You still need to say what the screenshot shows: did the program pass the test or not?
- Be explicit
Don't assume that it's self-evident what you are testing - you have to state explicitly what it is you are testing / looking for. Sometimes I mark work where the student has presented a screenshot, as if the screenshot itself was enough to describe what was being tested. It isn't.
This might seem surprising, as it probably seems obvious to you what you are testing for, and you think it should be obvious to whoever is marking your work. But you do have to spell it out. Remember that there are a whole bunch of students submitting work, and different students may use the same technique for different purposes, so no, it's not obvious what YOU are doing. The marker can't read your mind!
- Similar tests
For some testing purposes, you may end up doing a lot of similar tests. In that case, when describing your tests, you may find it helpful to give one example and then describe how the rest are similar. This is just to save you writing lots and lots of information for each case - you still need to be very clear to the reader what you are testing, though!
The Importance of Honesty
You should be honest in presenting your test results. That is, you shouldn't pretend you have run a test when you haven't, and you shouldn't pretend that your code passed your test when it doesn't.
If you try to impress the marker by saying all the tests worked, even if they didn't, then not only is that falsification (a form of university misconduct) but by presenting test results that aren't true, you are simply showing yourself to be a poor tester!
A good test is one that does find a bug, and if there is a bug in your code (pretty likely, since the vast majority of student code submissions for assignments do contain bugs), you will get more marks for testing if you have found that bug! So it is best to do good, thorough testing, and be honest about the results.
If you found a bug, and you say so, then you will get marks for your good test that found the bug, even if you didn't manage to track down its cause and fix the code. On the other hand, if you pretend that your code works, either by lying about the results of your test or by omitting to describe the test that your code fails, then not only will you lose marks for your incorrect code, but you will also lose marks for poor testing!
