Name
bittersuite3 — A framework for running tests via
RST
Description
BitterSuite3 is a testing framework that works on top of
rst. Instead of providing full
runTests and
computeMarks programs, the user simply provides stubs that redirect to the main
BitterSuite3 code. The goal is to minimize the amount of scripting required by course staff, so the focus instead can be on creating tests.
The output file (generated by
rst, and suitable either for mailing testing output to students or generating a postscript printout for handmarking) contains by default a processed version of the staff-provided
mark-scheme
, a nicely-formatted summary of all of the autotesting, side-by-side output comparisons for generated output that differs from the expected output, and optionally the text of all submitted files.
Ideally, the
runTests and
computeMarks in
/u/isg/bittersuite3
should be set as defaults in
.rstrc
. This way, once a course is set up to use
BitterSuite, course staff no longer have to think about this for every suite for every assignment.
Alternatively, the stubs that should be provided in the
RST testdir are very short. The contents of
runTests should be
#!/bin/sh
exec /u/isg/bittersuite3/runTests
and the contents of
computeMarks should be
#!/bin/sh
exec /u/isg/bittersuite3/computeMarks
with any appropriate flags appended. The
-f
flag means that any student-submitted files should appear in the output; this is desirable for output to be marked by TAs, but not for a public test system. The intent-indicating
-q
flag and the mark-scheme modifying
-n
and
-i
flags are described below.
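As a minimal sketch, the two stubs could be created as follows. The suite directory here is a temporary stand-in (an assumption for illustration; the real location is the RST testdir), and the choice to append only -q to computeMarks is likewise just an example.

```shell
#!/bin/sh
# Hypothetical suite directory; substitute the real RST testdir.
suite_dir=$(mktemp -d)

# Stub that hands control to the central runTests.
cat > "$suite_dir/runTests" <<'EOF'
#!/bin/sh
exec /u/isg/bittersuite3/runTests
EOF

# Stub that hands control to the central computeMarks, with -q appended.
cat > "$suite_dir/computeMarks" <<'EOF'
#!/bin/sh
exec /u/isg/bittersuite3/computeMarks -q
EOF

# Both stubs must be executable.
chmod +x "$suite_dir/runTests" "$suite_dir/computeMarks"
```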
Test directory setup
In addition to the
runTests and
computeMarks scripts in a given test suite directory, there are other files and subdirectories standardized by BitterSuite.
config.ss
This file specifies options that will apply to this entire
BitterSuite run. They should be specified in key-value pairs within S-expressions.
- interpret-mark-scheme
- A boolean value, set to false by default. If changed to true, bash commands in the mark-scheme file, such as autotesting grades, will be evaluated.
- nroff-mark-scheme
- A boolean value, set to false by default. If changed to true, nroff commands in the mark-scheme file, such as .ti, will be evaluated.
- print-by-question
- A boolean value, set to false by default, which determines how the autotesting results are displayed in the output file of rst. If true, autotesting totals for each question will be displayed, and the tests will be labelled as "Question X, test Y". If false, autotesting totals for each question will not be displayed, and the tests will be labelled as "Test X_Y".
- print-submit-files
- A boolean value, set to false by default. If changed to true, the student's code will be included in the output file of rst.
- test-account
- A string specifying which account the tests will be run on (default: cs
XXX
t)
- test-connect
- A list of strings specifying the command used to connect to the test account. This command will be run as test-connect test-account.
- verbosity
- A number between 0 and 10 specifying the amount of output you would like BitterSuite to produce (default: 1). This refers specifically to output visible to the person running the tests, not to the students.
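A config.ss using a few of these keys might be written as in the sketch below. The exact S-expression layout (one parenthesized key and value per line) and the spelling of the boolean as true are assumptions based on the description above, not a confirmed syntax; the suite directory is a temporary stand-in.

```shell
#!/bin/sh
# Hypothetical suite directory; substitute the real test suite directory.
suite_dir=$(mktemp -d)

# Assumed syntax: one (key value) S-expression per option.
cat > "$suite_dir/config.ss" <<'EOF'
(print-by-question true)
(nroff-mark-scheme true)
(verbosity 3)
EOF

cat "$suite_dir/config.ss"
```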
mark-scheme
This is a file that contains the marking scheme that should be at the top of the student output. If no behaviour-modifying flags are supplied to
computeMarks, then this is just used verbatim.
If the
-n
flag is used, the
mark-scheme
is formatted by nroff, allowing nroff directives to specify cleaner, page-size independent formatting.
If the
-i
flag is used, the
mark-scheme
is interpreted by the bash shell. This allows the use of a number of environment variables that mirror locations inside of the
in
hierarchy (see below) to be used internally in the marking scheme. The variables
te
and
to
specify the total earned marks and the total "out of" marks, respectively. The mark total for all tests housed under the directory
in/3/2
would be
t3_2o
, and the earned mark would be
t3_2e
. The script is interpreted by the shell by doing the equivalent of
eval echo "`cat mark-scheme`"
The simplest way to access the variables made available by
-i
is to enclose the entire contents of
mark-scheme
in double-quotes (and avoid the use of unescaped double-quotes elsewhere in the file). The shell will simply replace all variables with their values. The more complex way is to encase the marking scheme in $( ... ), which enables the use of scripting commands (and requires explicit output commands to construct the output marking scheme).
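The double-quote approach can be tried in isolation with the sketch below. The variable values are made up for the demonstration (in a real run they would come from BitterSuite), and the mark-scheme text is a hypothetical example.

```shell
#!/bin/bash
# Made-up marks for illustration; BitterSuite would supply the real values.
te=7
to=10
t3_2e=2
t3_2o=3

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A mark-scheme whose entire contents are enclosed in double quotes.
cat > mark-scheme <<'EOF'
"Question 3, part 2: $t3_2e / $t3_2o
Autotesting total: $te / $to"
EOF

# The evaluation step described above: the shell substitutes each variable.
rendered=$(eval echo "`cat mark-scheme`")
echo "$rendered"
```

With these values, the rendered scheme reads "Question 3, part 2: 2 / 3" followed by "Autotesting total: 7 / 10". An unescaped double quote inside the file would end the quoted string early, which is why the text above warns against them.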
provided
The
provided
directory simply contains any files you want to be available to the student files, but which should not have been submitted. This may include Scheme modules used for marking, provided C header files, and so on.
in
The
in
directory is where all of the tests are kept in a hierarchy of subdirectories. Tests occur at the leaf directories, and scores propagate recursively back from there to the root directory. Any
options.ss
or
options.scm
that are encountered are parsed, and the key-value S-expressions in each file modify the state of the tester at the given directory level. Options are parsed in order, so the state changes made by option N are visible to option N+1. All state changes are propagated to the child directories, but are not propagated back to the parent directory. If the
-q
flag is provided to
computeMarks, it indicates that the top-level subdirectories of
in
each represent individual questions, and the autotesting output will be formatted accordingly.
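To make the propagation rule concrete, the sketch below builds a small hypothetical hierarchy. The directory names, the option syntax (one parenthesized key and value per line), and the chosen values are all assumptions for illustration. Options set in in/options.ss would apply to every test, while the value set in in/2/options.ss would apply only to tests under in/2.

```shell
#!/bin/sh
# Hypothetical suite directory; substitute the real test suite directory.
suite_dir=$(mktemp -d)

# Two top-level directories (questions, if -q is used) with leaf test directories.
mkdir -p "$suite_dir/in/1/001" "$suite_dir/in/1/002" "$suite_dir/in/2/001"

# Root-level options: inherited by every directory below in/.
cat > "$suite_dir/in/options.ss" <<'EOF'
(language scheme/beginner)
(timeout 30)
EOF

# Options for in/2 only: visible to in/2/001, but not to in/1 or to in/ itself.
cat > "$suite_dir/in/2/options.ss" <<'EOF'
(value 2)
EOF

# Standard input for one leaf test; other leaf contents depend on the language.
printf '5\n' > "$suite_dir/in/2/001/input"

# List the resulting files.
(cd "$suite_dir" && find in -type f | sort)
```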
Default Behaviour
BitterSuite3 provides a set of default behaviours that is meant to apply to all domain-specific languages. Some of the implementation is left to these other languages (the timeout and memory options, as well as the handling of input files), but they should honour the intent of these settings as closely as they possibly can.
Allowable options
- language
- This should be followed by a single string or symbol specifying which language should be used from this point on to interpret any options/files before the default behaviour is used as a fallback. Details of these alternate languages are in a later section.
- value
- This should be followed by a single S-expression which specifies a number representing the number of points any encountered tests will be worth. This S-expression will be evaluated immediately. Ideally, this expression will result in a rational value, as in the case of (/ 4 2) or 3/5. However, expressions such as (begin (require scheme/math) pi) are also valid. The default value is 1.
- desc, description
- This should be followed by a single string representing a description of the current test. The default value is the empty string.
- timeout
- This should be followed by a single number. It represents the length of time, in seconds, that a test should be given to execute before it is timed out (likely under the assumption that it will continue executing indefinitely). The default value is 15.
- memory
- This should be followed by a single number. It represents the amount of memory, in MB, a test should be allowed to consume before it is killed.
- diff
- This should be followed by a single string. It specifies a program that takes two filenames as parameters and will be used to compare output from a student test to output from the model solution. This program will output a number representing the percentage earned on this question to file descriptor three, and an explanation for that earned mark to standard output. The default is a wrapper for diff -ibB -q which gives a mark of 100 if that command finds no differences, and 0 if it does.
- thread-children
- This should be followed by a single boolean. It specifies whether child directories should be processed in parallel or not. The default value is #f.
This is an option that must be treated very carefully. If any mutable state is shared among several tests, then this state may become incorrect. So in that case, while top-level directories may be able to run in parallel, the low-level ones would not be able to. As an example, python tests should not be run in parallel, nor should tests on a single evaluator in advanced student scheme that has global mutable state.
On the Solaris systems, an initial test produced quite unsatisfactory results. Multiple tests confirmed that on a beginning student sample, threading increased the runtime from about 2.5 minutes to about 3 minutes, and a question tested in intermediate student would consistently time out because an evaluator had not been created after 15 seconds, meaning the thread scheduling algorithm does quite a poor job of sharing time slices. This option may still be useful under one of three conditions: tests in a language like external or C where the OS can take control of some processes, Linux machines where the implementation may be better, or future versions of mzscheme that are properly multithreaded. Further testing will need to be done for confirmation on any of these scenarios.
Any other value types for these keys, or any other keys, will result in a fatal error that kills the testing suite so course staff can fix the problem.
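The contract for a diff program (two filenames in, a percentage on file descriptor three, an explanation on standard output) can be sketched as below. This is not the actual default wrapper, only an illustration of the same idea; it is written as a shell function so it can be demonstrated in place, whereas a real comparison program would be a standalone executable reading its two arguments. The file names and messages are hypothetical.

```shell
#!/bin/sh
# Sketch of a comparison program: mark goes to fd 3, explanation to stdout.
compare_outputs() {
    if diff -ibB -q "$1" "$2" >/dev/null 2>&1; then
        echo 100 >&3
        echo "Output matches the expected output."
    else
        echo 0 >&3
        echo "Output differs from the expected output."
    fi
}

# Demonstration with two files that differ only in case and spacing.
workdir=$(mktemp -d)
printf 'Hello  World\n' > "$workdir/student.out"
printf 'hello world\n' > "$workdir/expected.out"

# Capture stdout as the explanation and redirect fd 3 to a file for the mark.
explanation=$(compare_outputs "$workdir/student.out" "$workdir/expected.out" 3>"$workdir/mark")
mark=$(cat "$workdir/mark")
echo "mark=$mark explanation=$explanation"
```

Because -i ignores case and -b ignores changes in the amount of whitespace, the two sample files compare as equal and the mark written to file descriptor three is 100.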
Allowable non-option files
- input
- A file whose contents will be used as input on standard input for any encountered tests.
Any other encountered files will result in a fatal error that kills the testing suite so course staff can fix the problem.
Languages
The current language, as selected by the language option, is what provides interesting behaviour for
BitterSuite3. Any languages that are supported directly are housed in
/u/isg/bittersuite3/languages
. Alternate domain-specific or testing languages may also be placed in
/u/csXXX
/bittersuite_languages
. The current centrally-supported languages are the Scheme family (scheme/module, scheme/beginner, and so on), C, external and python.
Documentation for these languages is on the ISG TWiki, but should eventually be propagated to their own man pages (named bittersuite3-scheme, etc.).
Providing Language Implementations
definitions.ss
Language definitions must be placed in a Scheme module named
definitions.ss
inside of a subdirectory of one of the main directories mentioned in the previous section with the same name as the language itself. This module must provide four functions, which will be called from the main testing code, and which define the behaviour for this particular language.
- initialize
- hashtable -> ?: Mutates the provided hashtable so it contains an appropriate default state for this language. The produced value is discarded.
- parse-option
- hashtable symbol [value1 ... valueN] -> symbol: Mutates the provided hashtable so its state reflects the changes dictated by the provided symbol key and the list of values provided for that key. If the return value is 'not-handled, this indicates that the key was not recognized by this handler, and the default attempts to handle the key. If the return value is the symbol 'bad-value, the suite will die with an error so course staff can fix the problem. If the return value is 'handled, this indicates to the test suite that the option was recognized and handled successfully. Any other return value is an error.
- interpret-file
- hashtable path -> symbol: Mutates the provided hashtable so its state reflects the changes dictated by the provided file. If the return value is 'not-handled, it indicates the language handler could not make use of the given file, and the default handler attempts to make use of it instead. If the return value is 'handled, this indicates to the test suite that the file was recognized and handled successfully. Any other return value is an error.
- run-test
- hashtable -> (values (union number symbol) string): Run a test, given all of the state provided in the hashtable. There are two return values: either a number representing the percentage earned on this particular test or the symbol 'defer indicating output comparison tests need to be done later, and a string specifying a message explaining the mark earned.
hierarchy-runner/common
All language modules have
hierarchy-runner
added to their module search paths, which enables the use of a number of helper functions and constants when a module contains
(require hierarchy-runner/common)
.
- cond-print
- integer>0 string ... -> (void): The first parameter specifies the minimum verbosity required for the following information to be printed. If only a string is provided with no other parameters, then the string will be printed verbatim followed by a newline. If other parameters are provided, they and the initial string are passed to the format function, and the result is displayed with a trailing newline. The minimum verbosity should be used with care to try to ensure that the end user gets an appropriate amount of feedback printed.
- additional-indent
- This is a parameter that determines how much more each subsequent line should be indented by cond-print. So, for example, passing a value of 2 to additional-indent if the current value is 6 will make every subsequent line be indented by 8 space characters. The value passed must be an integer value.
- cond-set!
- hash-table any any -> (void): The second parameter specifies the key. If this key already exists in the hash table, then nothing is done; otherwise, the third parameter is inserted in the hash table as the value associated with that key. This is particularly useful in the initialize function of the language modules.
- hash-set!&success
- Applies hash-set! to the provided arguments, and returns the value 'handled. This is particularly useful in the parse-option and interpret-file functions whenever a provided key/value pair or file can be handled successfully.
computeMarks-postprocess
If an executable by this name exists, it is run at the end of
computeMarks, after all diff checking and mark-scheme processing has been done, and the default set of files have been kept for the output. It gives the option, for example, to run
keepFile on any extra files that were generated during testing in this particular language.
Suite-specific configuration
There are four executables that, if they are provided in the test suite directory along with
runTests and
computeMarks, provide entry points into
BitterSuite for various kinds of customization.
Note that these scripts often lend themselves to "quick hack" solutions that could often be implemented more properly as a customized language implementation.
- runTests-preprocess
- Run after student-submitted files and the files from provided are linked into the temporary directory, but before the
in
hierarchy has been processed.
- runTests-postprocess
- Run after the
in
hierarchy has been processed.
- computeMarks-preprocess
- Run near the beginning of computeMarks, after the environment has been set up but before other processing has been done.
- computeMarks-postprocess
- Run at the end of computeMarks, after the default files have been kept for the marking output, and after any and all language-specific postprocessors have been run.