BitterSuite Python Testing

Getting Started: test.pt and test.1 folder layout

Python testing using RST/BitterSuite is similar to Racket testing. You have to create folders called test.pt (for public tests) and test.1 (for tests you run after the due date). The test.pt and test.1 folders go inside the marking/aXX folder in the course account (aXX is the assignment number, e.g. a04).

Here is a simple example of what the test.pt folder may look like. In this example, there are two questions. Question 1 asks students to define cube(n)=n*n*n. Question 2 asks students to read from the keyboard and print some stuff to the screen.

Your test cases are placed in test.py, which is structured as a regular Python script, and is executed like one. Student code is available and acts as if the line from (studentcode) import * had been executed beforehand (i.e. all student code is available in the global scope).

Two variables in test.py have special meaning: result and expected. result should be set to the return value of the function being tested (though there are more advanced ways to use it: see the examples later). As the name implies, expected should be the value you expect the student code to produce. RST/BitterSuite will automatically compare result and expected using ==, although the comparison function can be changed (see Equality Testing below).

If there is a file called input in the test directory (i.e. alongside test.py), that file will be sent as standard (keyboard) input as test.py is executed: this can be used to fake input from the interactive shell, for example.

The Python testing system will always reload a student's code just once per question (loadcode "aXXqY.py"), even if the question uses mutation (this is in contrast to Scheme, where a loadcode command is specified for every test if mutation is used).

The important thing to remember when making test.py is that it is executed like a regular Python script, so it has access to all features of the language. Hence, you can do some fairly advanced stuff here. However, if you find yourself repeating the same tricks over and over again in separate tests, you may wish to move that behaviour into a module if possible.

provided/ is in the module search path of test.py, so any modules you put there are accessible to test.py simply by performing an import statement; you do not need to include anything in the options file in order to do this.

Creating the test.pt and test.1 folders

One way to create the test.pt and test.1 folders is to create the folders and files manually (i.e. right-click in a Finder window to create the folder, and create the files using a text editor). This method works but is very slow.

Instead, you can use the scripts make-all.py and make-tests.py described in AutomatedAutotestCreation to create the folders automatically.

Some examples of test.py

result = student_func(1,2,3)
expected = 13

This calls student_func and stores the return value in result. The testing suite will check that result is equal to 13, and will print appropriate messages if an error occurs (e.g. student_func throws an exception) or if the answer is not correct.

variable = [12]   # a mutable object; a call cannot change a plain int
student_mutate(variable)
result = variable
expected = [8]

Result does not always have to be set to the value of the function call. In cases of mutation, for example, the output of the function is often unimportant, and setting result to a mutated variable makes more sense.

In the case where there are multiple values to test (perhaps both a mutated variable and the function output are important), you have two choices: either set result to be a list or dictionary containing those values, or run one test for each important value. Running multiple tests is clearer for students and allows for part marks, so this is often the better choice when there are only two or three important values. However, when several values are important, the clarity is lost in the sheer number of tests and the part marks are probably insignificant, so making a list or dictionary likely makes more sense.
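
A minimal sketch of the dictionary approach follows. The function remove_negatives is a hypothetical stand-in for student code (it is defined here only so the example runs on its own); in a real test.py the student's function would already be in scope.

```python
# Hypothetical stand-in for student code: removes negative numbers from
# the list in place and returns how many were removed.
def remove_negatives(numbers):
    removed = 0
    for value in numbers[:]:          # iterate over a copy while mutating
        if value < 0:
            numbers.remove(value)
            removed += 1
    return removed

# What test.py itself might contain:
data = [3, -1, 4, -5]
returned = remove_negatives(data)
result = {"returned": returned, "mutated list": data}
expected = {"returned": 2, "mutated list": [3, 4]}
```

Labelling each value with a dictionary key makes it easier to see which part differs when the comparison fails.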

try:
   result = crashy_student_code()
except:
   result = "Error occurred"
finally:
   pass  # put important cleanup here
expected = "One"

If there is important cleanup to do after running student code, such as closing a file, it is often good to use try/except/finally blocks like the example above. These are also useful if you are testing students' error handling. For example, if they are only supposed to catch ZeroDivisionError, you may want to cause a different type of error but still have the student pass the test; without a try/except block, this is impossible.
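
The error-handling case could be sketched as follows. The function safe_divide is a hypothetical stand-in for student code, and the message strings are made up for illustration.

```python
# Hypothetical stand-in for student code: it is supposed to handle
# ZeroDivisionError only, and let every other exception propagate.
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return None

# What test.py itself might contain: provoke a TypeError on purpose.
try:
    safe_divide("ten", 2)
    result = "no exception raised"
except TypeError:
    result = "TypeError propagated"
except Exception:
    result = "a different exception was raised"
expected = "TypeError propagated"
```

The student passes only if the unrelated error escapes their function, which is the behaviour the question asked for.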

Input/Output

When a function has effects other than consuming and producing values, tests become more complicated. Use the methods below.

Screen Output

To test screen output (print statements) in student code, use the redirect_output module (which you can download from this page, or from the directory /u/cs116/marking/Useful Modules). A simple example is below:

from redirect_output import *
result = redirect_output(studentfunction, [arg1, arg2])
expected = ["First line of output", "Second line"]

The redirect_output function (inside the module of the same name) consumes two arguments: a function and the arguments to that function (as a list), and produces the screen output as a list of strings. Each string in the list is one line of screen output with the newline character removed.
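
For intuition, here is a minimal sketch of how a helper of this kind could work. It is not the actual redirect_output module: capture_output and greet are hypothetical names, and the real module may behave differently in details such as error handling.

```python
import sys
try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # Python 3

def capture_output(func, args):
    """Call func(*args) and return what it printed as a list of lines."""
    saved_stdout = sys.stdout
    buffer = StringIO()
    sys.stdout = buffer
    try:
        func(*args)
    finally:
        sys.stdout = saved_stdout    # always restore the real stdout
    return buffer.getvalue().splitlines()

# Hypothetical stand-in for a student function that prints.
def greet(name, times):
    for i in range(times):
        print("Hello, " + name)

result = capture_output(greet, ["Ada", 2])
expected = ["Hello, Ada", "Hello, Ada"]
```

Restoring sys.stdout in a finally block keeps later output visible even if the student function raises an exception.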

You can of course do more with this. If only particular lines matter, you can take individual elements from the list. If only particular characters in the output matter (maybe each line includes the score of a particular stage of a game), then using a for loop to replace each line with the relevant value will make the output shorter, and will prevent penalizing students for a typo in an unimportant part of the output.
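
As an illustration of keeping only the relevant part of each line, the sketch below assumes made-up game output in which the score is the last word of every line; output_lines stands in for the list that redirect_output would produce.

```python
# Lines as a redirect_output-style call might produce them (made-up data).
output_lines = [
    "Stage 1 complete! Score: 10",
    "Stage 2 complete! Score: 25",
    "Game over. Final score: 35",
]

# Keep only the last word of each line (the score), so that typos in the
# surrounding text do not affect the test.
result = []
for line in output_lines:
    result.append(int(line.split()[-1]))
expected = [10, 25, 35]
```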

IMPORTANT: redirect_output produces a list of lines rather than a single string for a reason. BitterSuite will crash with a cadr error if result or expected contains a newline character. By using a list of lines, the newline character will never appear in the output, but misplaced newline characters will still cause the test to fail (since the lines will be split in the wrong spot).

Keyboard Input

To test keyboard input, save a file named input (no file extension) in the same directory as test.py. Each time the program calls raw_input, one line of the file input is read as the keyboard input.
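
The line-by-line behaviour can be simulated outside BitterSuite by replacing standard input with an in-memory file. This is only a sketch for experimenting locally: ask_name_and_age is a hypothetical student function, the string "Ada\n36\n" plays the role of the input file, and read_line is a shim so the code runs under both Python 2 (raw_input) and Python 3 (input).

```python
import sys
try:
    from StringIO import StringIO   # Python 2
except ImportError:
    from io import StringIO         # Python 3

try:
    read_line = raw_input            # Python 2
except NameError:
    read_line = input                # Python 3

# Hypothetical stand-in for student code that reads two lines.
def ask_name_and_age():
    name = read_line()
    age = int(read_line())
    return [name, age]

saved_stdin = sys.stdin
sys.stdin = StringIO("Ada\n36\n")   # pretend contents of the input file
try:
    result = ask_name_and_age()
finally:
    sys.stdin = saved_stdin          # restore the real keyboard input
expected = ["Ada", 36]
```

Each call consumes exactly one line, so the input file needs one line per expected call, in order.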

When keyboard input is combined with screen or file output, use the suppress_prompt module. Copy the module file from the marking/Useful Modules directory into the provided folder, and include the line (modules "suppress_prompt") in the options.ss file. If this module is not included, the prompts in the student's code will appear in the screen output and make testing more difficult.

File Input

To test file input, save a file (which will be used by the function) in the directory test.X/provided. When writing the test, treat the input file as if it is in the same directory as test.py.

File Output - Option 1: Reading from the Output File

To test file output, copy fileiomod.py from the directory /u/cs116/marking/Useful Modules into the provided folder. In the test.py file, include the line from fileiomod import get_file_lines at the beginning, and include the line get_file_lines(filename) at the end, where filename is the name of the file that should be created by the function. The function get_file_lines reads the created file and produces its contents as a list of strings (where each string represents a single line in the file). In test.py, compare this list of strings with the contents that you expect.

For example, suppose a student function named write_to_file consumes a string representing the name of the output file and writes the following lines to the file:

line 1
line 2
line 3

In the test.py file, you would write the following to test that the student's function write_to_file writes the correct lines to the file:

from fileiomod import get_file_lines
try:
   write_to_file('temp')
   try:
      studentanswer = get_file_lines('temp')
      result = studentanswer
      expected = ['line 1\n', 'line 2\n', 'line 3']
   except:
      result = 'an error while reading the produced file;'
      expected = 'no errors.'
except:
   result = 'an error while running write_to_file;'
   expected = 'no errors.'
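
To try this pattern without the course modules, a reader such as get_file_lines can be approximated as below. This is a sketch, not the actual fileiomod code: read_lines_sketch is a hypothetical name, and it is assumed (from the expected value in the example above) that trailing newline characters are kept. The write_to_file defined here is a stand-in for the student function.

```python
import os
import tempfile

def read_lines_sketch(filename):
    """Return the file's contents as a list of lines, newlines kept."""
    with open(filename) as handle:
        return handle.readlines()

# Hypothetical stand-in for the student function from the example above.
def write_to_file(filename):
    with open(filename, "w") as handle:
        handle.write("line 1\nline 2\nline 3")

# Write into a fresh temporary directory so nothing else is overwritten.
path = os.path.join(tempfile.mkdtemp(), "temp")
write_to_file(path)
result = read_lines_sketch(path)
expected = ["line 1\n", "line 2\n", "line 3"]
os.remove(path)
```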

File Output - Option 2: Testing like Screen Output

To test file output, copy dumpfile.py from the directory marking/Useful Modules into the provided folder. In the test.py file, include the line from dumpfile import dumpfile at the beginning, and include the line dumpfile(filename) at the end, where filename is the name of the file that should be created by the function. dumpfile will print the contents of the produced file to the screen, so the produced file can be tested in the same way as screen output.

Equality testing

By specifying the option (equal "...") in an options.ss file, you can control how equality is checked. The default is Python's built-in "==" test, which performs deep equality testing on all built-in types (i.e. objects compare the same if they have the same contents) and shallow equality for class instances which don't define __eq__.

The option must be specified either as a lambda expression taking two values, or as a call to a function which returns such a two-argument function. For example,

(equal "lambda x,y: x is y")

gives Python's standard shallow equality tester (e.g. [1] is [1] yields False, but 1 is 1 yields True; two variables which refer to the same object in memory are also equal under shallow equality). This is somewhat similar to Scheme's eq? predicate.
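
The difference between these comparison functions can be checked in plain Python. The sketch below defines three candidates; close_enough is a hypothetical tolerance-based comparison for floats and is not something the options above are known to provide.

```python
# Candidate comparison functions: each takes two values and returns a bool.
default_equal = lambda x, y: x == y              # deep equality (the default)
identity_equal = lambda x, y: x is y             # shallow (identity) equality
close_enough = lambda x, y: abs(x - y) < 1e-6    # hypothetical float tolerance

first = [1, 2, 3]
second = [1, 2, 3]      # same contents, different object
alias = first           # same object under another name

same_contents = default_equal(first, second)     # True
same_object = identity_equal(first, second)      # False
aliased = identity_equal(first, alias)           # True
floats_match = close_enough(0.1 + 0.2, 0.3)      # True, although == says False
```

Presumably a tolerance check like this could be passed as the string in (equal "..."), in the same way as the lambda shown above, but that usage is an assumption rather than something confirmed here.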

NOTE: The directory /u/cs116/marking/Useful Modules includes a file equality.py which should be useful in defining equality checks. Read the instructions in that file, and keep in mind that it has not been thoroughly tested.

Questions with Dependencies

From time to time, an assignment will have a question which wants to use another question as a helper function. Perhaps question 1a builds a game board, and question 1b plays the game. It may be desirable to run question 1b using the model solution version of question 1a, instead of the student version. This allows students who struggle on part a to still make an attempt at part b. To do this takes two steps: you first need to delete the student's version of the code, and then provide the correct version.

If both functions are defined in the same file (perhaps 1a and 1b are both included in the file a8q1.py), and if model_solns.py is a file in the aX/test.Y/provided directory, you could include the following code in test.py

try:    del make_game_board
except: pass
from model_solns import make_game_board

The try/except is important in case the student didn't define make_game_board; if that is the case, del make_game_board will produce an error. You could also copy and paste the model definition of make_game_board into each test.py file, but importing from a single file makes it easier to change the model solution (if necessary).

If the two questions are in different files, this process is a little longer, but still fundamentally the same. Continuing the previous example, suppose question 1a is in the file a8q1a.py and question 1b is in the file a8q1b.py. Students should include import a8q1a at the beginning of a8q1b.py; they can then use a8q1a.make_game_board in their code for part (b). Assuming again that model_solns.py is a file in the aX/test.Y/provided directory, the test.py file should contain something like

try:    del a8q1a.make_game_board
except: pass

import os
os.rename('model_solns.py', 'a8q1a.py')
try:    reload(a8q1a)
except: import a8q1a

The first try/except is important for the same reasons as before. The second is needed because reload(module) only works if module has already been imported in test.py, while import module only loads the file if the module has not been imported before. (Aside: import module will not produce an error if the module has already been imported; in that case it does not re-execute the file.)

Note that you could have students copy/paste their code from part (a) into a8q1b.py, and then follow the instructions for both functions being in the same file. This method is not as good, because it makes a8q1b.py far longer than it needs to be, which makes it more difficult for markers to focus on the important elements. It also means that every time a student edits their a8q1a.py file, they need to recopy the code in order to test their a8q1b.py file; if the function is imported instead, an edit to a8q1a.py will immediately affect a8q1b.py.

Modules

See BitterSuitePythonModules.

Topic revision: r20 - 2018-12-18 - YiLee
 