Debugging

Notes on Debugging -- Gunnar Gotshalks -- Revised 1998 January 7

1 -- Definitions

Testing -- Try to find errors in a program.

Debugging -- Correct known errors in a program. Alternatively, you can look at it as a search and destroy mission -- it helps if you have a can of Raid. Debugging takes place after testing shows the presence of errors. Testing is also a subpart of debugging, as you usually need to run additional tests to identify what is causing the problem.

Corollary -- Getting a program to work for specific input values.

By the way, do you know where the term bug came from? Grace Hopper, a lieutenant in the US Navy (at the time; she later became a rear admiral), was working on a computer in the late 1940s. A program failed. When she examined the machine she found a bug crushed in a relay switch, preventing it from making contact. The program worked after removal of the bug. The bug was taped into the logbook, which has been saved in the Smithsonian Museum in Washington DC.

2 -- Where bugs come from

To be able to find bugs you have to know where they come from. That gives you an idea as to where to look for them. Unfortunately, it's the entire life cycle.

requirements
specification
design
implementation
maintenance -- sometimes more bugs are introduced than are fixed!

Typical implementation bugs

compile time -- syntax errors. Try to get the syntax correct the first time. At least most modern compilers give reasonable messages.
- type mismatch, wrong variable referenced
- Problems occur with C/C++ where often anything which takes up 4 bytes is treated as an int.
- In C, use of = (assignment) instead of == (equality)
  if (a = 5) { b = 4; }   where if (a == 5) { b = 4; } was intended
run time -- many of these are trapped in Turing but try not to be lulled into a false sense of security. Almost every other language lets you shoot yourself in the foot and after you reload you can shoot the other foot. In most other languages the symptoms appear long after the erroneous statement has executed.
- array bounds are violated
- dangling pointers, uninitialized pointers.
- division by zero
general logical errors -- algorithm is incorrect -- often difficult to detect and correct. Typical errors are the following.
- off by 1 counts -- are you counting the rails or the posts of a fence?
- infinite loops -- variables in the condition are not changing, or not changing in the correct direction.
- boundary conditions -- exceptions on the first or second time through a loop.
- not setting the correct variables -- example maxcol and maxCol in Turing. One is a variable the other is a builtin constant.
- shallow copy versus deep copy of structures
- pointing to the wrong type (subtype), so that the appropriate routine is not executed.
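
The off by 1 item in the list above can be made concrete with a short C sketch. It is illustrative only and not part of the original notes; the names posts_for_rails and sum_inclusive are hypothetical.

```c
#include <assert.h>

/* A straight fence with `rails` sections needs one more post than rails.
   Hypothetical helper; assumes at least one rail. */
int posts_for_rails(int rails)
{
    assert(rails >= 1);
    return rails + 1;
}

/* Sum a[lo..hi] with BOTH ends included.
   The number of elements is hi - lo + 1, not hi - lo. */
int sum_inclusive(const int a[], int lo, int hi)
{
    int total = 0;
    for (int i = lo; i <= hi; i++) {   /* <= because hi is included */
        total += a[i];
    }
    return total;
}
```

Writing i < hi in the loop compiles and runs, but it silently drops the last element. That is why this kind of error is usually found by testing the boundary values rather than by reading the code.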

Best offense is a good defense -- have to design testing and debugging in from the start.

Need lots of experience to know what can cause various symptoms; somewhat like a doctor (see the case study below).

3 -- Actions

Have to be methodical and organized. It helps to keep notes (a reminder that documentation is always required!) as it helps you avoid going in circles and reduces the chances of misinterpreting what is happening.

Primarily use assert and put statements. Find out what is actually happening instead of moaning about what should happen and working with what you think is happening.

Use one entry one exit functions/procedures -- easier to check for entry and exit values.

Make hypotheses and compare what happens with what you think should happen; analogous to a test plan, now it is a debugging plan. Another of the myriad instances where planning is required. Use lots of pre/post conditions -- elements of top down design.

Use binary/block search to isolate the cause of the problem. Even copy the program and remove everything that is not in the path to the problem -- I found an error in a C++ compiler at IBM this way. Get to the essence and it becomes easier to see what needs to be done to fix the error.
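
One way to picture the binary search is as a search for the smallest amount of input (or program) that still shows the symptom. The sketch below assumes the failure is monotonic: once a prefix fails, every longer prefix fails too. The names fails_fn, first_failing_prefix and demo_fails are hypothetical, and demo_fails only stands in for "run the program on the first n items and look for the symptom".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero when the first n items show the symptom. */
typedef int (*fails_fn)(size_t n);

/* Precondition: fails(lo) is false, fails(hi) is true, failure is monotonic.
   Returns the smallest n in (lo, hi] for which fails(n) is true. */
size_t first_failing_prefix(size_t lo, size_t hi, fails_fn fails)
{
    assert(!fails(lo) && fails(hi));
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (fails(mid)) {
            hi = mid;      /* symptom still present: cause is at or before mid */
        } else {
            lo = mid;      /* symptom gone: cause is after mid */
        }
    }
    return hi;
}

/* Stand-in for a real experiment: pretend item 37 triggers the bug. */
int demo_fails(size_t n)
{
    return n >= 37;
}
```

With 100 items this takes about seven runs instead of up to 100. The same halving idea applies when commenting out blocks of code rather than trimming input.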

Use stubs.
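
A stub is a stand-in for a routine that is not yet written or not yet trusted. It returns a fixed, known value and announces that it was called, so the caller can be debugged in isolation. A minimal sketch in C, with hypothetical names lookup_price and order_total:

```c
#include <assert.h>
#include <stdio.h>

/* STUB: the real lookup is assumed not to exist yet.
   Always returns 100 and reports the call on stderr. */
int lookup_price(int item_id)
{
    fprintf(stderr, "STUB lookup_price(%d) called\n", item_id);
    return 100;
}

/* Routine under test: with the stub in place, the expected
   total is simply 100 * count, which is easy to check by hand. */
int order_total(const int item_ids[], int count)
{
    assert(count >= 0);
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += lookup_price(item_ids[i]);
    }
    return total;
}
```

If order_total gives a wrong answer with the stub in place, the error is in order_total and not in the lookup.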

You have to know the semantics of all operations so read references on language features and other documentation related to the program -- see the case study below.

You may have to write small test programs to check out various language and abstract data type features. Understand what happens when they are used incorrectly. Then look for those symptoms in the program you are debugging.

4 -- Debugging Aids

Use pre/post and assert statements in Turing. C and C++ have the assert macro (include assert.h), used as assert(cond).
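
As a sketch of how pre and post conditions might be written with the C assert macro, consider a one entry, one exit function. The function name max_of is hypothetical; only assert itself comes from the standard library.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the largest element of a[0..count-1]. */
int max_of(const int a[], int count)
{
    /* Precondition: there is at least one element to look at. */
    assert(a != NULL && count > 0);

    int best = a[0];
    for (int i = 1; i < count; i++) {
        if (a[i] > best) {
            best = a[i];
        }
    }

    /* Postcondition: no element is larger than the result. */
    for (int i = 0; i < count; i++) {
        assert(best >= a[i]);
    }
    return best;   /* single exit point */
}
```

Compiling with NDEBUG defined removes the assert checks, so they cost nothing in a production build.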

Use debug flags to turn on/off various sets of debugging statements. They can also be used to prevent/enable execution of different parts of the algorithm. For example:

    if DEBUG1 then ... else ... end if
    if DEBUG2 then ... else ... end if
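
The same idea can be sketched in C. The flag names DEBUG1 and DEBUG2 follow the example above; the function sum_to and what each flag controls are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define DEBUG1 1   /* 1 = trace the loop and cross-check the result */
#define DEBUG2 0   /* 1 = use the closed form instead of the loop   */

/* Sum of the integers 1..n. */
int sum_to(int n)
{
    assert(n >= 0);
    int total = 0;

    if (DEBUG2) {
        total = n * (n + 1) / 2;          /* alternative algorithm */
    } else {
        for (int i = 1; i <= n; i++) {
            total += i;
            if (DEBUG1) {
                fprintf(stderr, "DEBUG1: i=%d total=%d\n", i, total);
            }
        }
    }

    if (DEBUG1) {
        assert(total == n * (n + 1) / 2); /* cross-check both methods */
    }
    return total;
}
```

Because the flags are compile time constants, a compiler can remove the disabled branches entirely, so the debugging statements can stay in the source.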

Use a consistent output structure -- Create output procedures to standardize debug messages. Pass 1, 2, 3, ... parameters of appropriate types. One parameter is a message/number to identify which output statement produced the result. For example: debugout(5, variable1, variable2) calls a procedure to output "At location 5 value 1 is dddd and value 2 is nnnn". It is easy to get confused over what to read and how to interpret it in output, so debug output needs to be clearly self identifying. Consider the use of a table structure to reduce the need for labels, and so the volume of output to read. But the columns should be labeled on every output page.
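
A minimal C sketch of such an output procedure follows. The name debugout and the message text come from the example above; the parameter types, the use of stderr and the return value are assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Writes one self identifying debug line to stderr.
   Returns the number of characters written (negative on error). */
int debugout(int location, long value1, long value2)
{
    assert(location >= 0);   /* location numbers are assumed non-negative */
    return fprintf(stderr,
                   "At location %d value 1 is %ld and value 2 is %ld\n",
                   location, value1, value2);
}
```

A call such as debugout(5, 12, 34) writes "At location 5 value 1 is 12 and value 2 is 34". Keeping every message in this one shape makes the output easy to search for a given location number.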

Print data in different formats -- as character and as number using ord(ch) -- see the case study below.
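
In C the equivalent of ord(ch) is simply the character's integer value. The sketch below formats a character both ways, so that an invisible character such as a linefeed still shows up as a number. The name describe_char and the exact output layout are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Formats ch into buf as character and as number, e.g. 'b' = 98.
   Non-printable characters are shown by number only.
   Returns the length of the formatted text. */
int describe_char(char *buf, size_t size, char ch)
{
    unsigned char u = (unsigned char)ch;
    assert(buf != NULL && size > 0);
    if (isprint(u)) {
        return snprintf(buf, size, "'%c' = %d", ch, u);
    }
    return snprintf(buf, size, "<unprintable> = %d", u);
}
```

Printing only the character form of a linefeed produces what looks like empty output. The numeric form makes the value 10 visible.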

Debuggers -- programs to step through a program -- do the grunt work but you still have to be organized and keep track of what you are looking for and where you are in the execution sequence of a program.

5 -- CASE STUDY

The program.

var cmd : char
loop
  get cmd
  case cmd of
    label 'a': get int
    label 'b': ...
    label 'c': ...
    label: put "invalid command: ", cmd
  end case
end loop

Symptom: User enters "a" and the case statement executes once. User enters "b" or "c" and the case statement executes twice, once for the label and once for invalid command, where white space appears -- apparently no cmd.

Consider the white space. Replace the put statement with the following
put "invalid command: " , cmd , "= " , ord(cmd)

Find out what the cmd character is. Turns out it is ASCII 10 = linefeed (return key) = end of line in Unix.

A hypothesis now is that the characters after the label entry cause a problem. Try entering "b ", many spaces after the b. You find out many invalid commands are printed. So "get cmd", with cmd being of type char, reads every character. Why not the same result for "a"?

Read the specification of "get int" and you find out "get int" skips leading whitespace and reads one whitespace character after the int. "get int" consumes the return character and any other whitespace characters after the "a". Try "get skip, cmd" to skip leading whitespace. It works and another bug has been laid to rest. But do not worry, there is an unbounded supply of them.
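
The same trap exists in C, which gives a way to try the hypothesis in a small test program. With scanf and sscanf the conversion "%c" reads the next character whatever it is, while a space before it, " %c", first skips any whitespace. The sketch below uses sscanf on a string so that the input is fixed. The function read_cmd is hypothetical and is not a translation of the Turing program.

```c
#include <assert.h>
#include <stdio.h>

/* Reads one command character from text.
   skip_ws == 0 : "%c"  takes the very next character, even a linefeed.
   skip_ws != 0 : " %c" skips leading whitespace first.
   Returns the character read, or -1 if there was nothing to read. */
int read_cmd(const char *text, int skip_ws)
{
    char cmd = 0;
    int matched;

    assert(text != NULL);
    if (skip_ws) {
        matched = sscanf(text, " %c", &cmd);
    } else {
        matched = sscanf(text, "%c", &cmd);
    }
    return matched == 1 ? (unsigned char)cmd : -1;
}
```

Given the leftover input "\nc" (the return key from the previous line, then the next command), read_cmd without skipping returns 10, the linefeed, which is the value seen in the case study. With skipping it returns 'c'.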

Happy hunting.