bittersuite3 — A framework for running tests via RST
BitterSuite3 is a testing framework that works on top of rst. Instead of providing full runTests and computeMarks programs, the user simply provides stubs that redirect to the main BitterSuite3 code. The goal is to minimize the amount of scripting required by course staff, so the focus instead can be on creating tests.
The output file (generated by rst, and suitable either for mailing testing output to students or for generating a PostScript printout for handmarking) contains by default a processed version of the staff-provided mark-scheme, a nicely-formatted summary of all of the autotesting, side-by-side output comparisons for generated output that differs from the expected output, and optionally the text of all submitted files.
Alternatively, the runTests and computeMarks in /u/isg/bittersuite3 should be set as defaults in .rstrc. This way, once a course is set up to use BitterSuite, course staff no longer have to think about this for every suite of every assignment.
If stubs are used instead, the ones provided in the RST testdir are very short. The contents of runTests should be
    #!/bin/sh
    exec /u/isg/bittersuite3/runTests
and the contents of computeMarks should be
    #!/bin/sh
    exec /u/isg/bittersuite3/computeMarks
with any appropriate flags appended. The -f flag means that any student-submitted files should appear in the output; this is desirable for output to be marked by TAs, but not for a public test system. The intent-indicating -q flag and the mark-scheme-modifying -n and -i flags are described below.
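For example, a computeMarks stub intended to produce output for TA handmarking might append flags as follows; which flags are appropriate depends on the course and on the purpose of the suite:

    #!/bin/sh
    exec /u/isg/bittersuite3/computeMarks -f -n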
In addition to the runTests and computeMarks scripts in a given test suite directory, there are other files and subdirectories standardized by BitterSuite.
This file specifies options that will apply to this entire BitterSuite run. Options are specified as key-value pairs within S-expressions.
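As a rough sketch of the format, such a file might contain entries like the following; the language, timeout, and memory keys are the ones mentioned elsewhere in this document, but the particular values and value syntax shown here are only illustrative:

    (language scheme/beginner)
    (timeout 10)
    (memory 64)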
This is a file that contains the marking scheme that should be at the top of the student output. If no behaviour-modifying flags are supplied to computeMarks, then this is just used verbatim.
If the -n flag is used, the mark-scheme is formatted by nroff, allowing nroff directives to specify cleaner, page-size-independent formatting.
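For instance, a mark-scheme intended for -n might use ordinary nroff requests such as .ce, .sp, and .br; the assignment title and categories in this sketch are purely illustrative:

    .ce
    Assignment 3 Marking Scheme
    .sp 1
    Question 1: correctness          / 10
    .br
    Question 2: style                /  5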
If the -i flag is used, the mark-scheme is interpreted by the bash shell. This makes available a number of environment variables that mirror locations inside the in hierarchy (see below) for use within the marking scheme. The variables te and to specify the total earned marks and the total "out of" marks, respectively. The mark total for all tests housed under the directory in/3/2 would be t3_2o, and the earned mark would be t3_2e. The script is interpreted by the shell by doing the equivalent of

    eval echo "`cat mark-scheme`"
The simplest way to access the variables made available by -i is to enclose the entire contents of mark-scheme in double-quotes (and avoid the use of unescaped double-quotes elsewhere in the file); the shell will simply replace all variables with their values. The more complex way is to encase the marking scheme in $( ... ), which enables the use of scripting commands (and requires explicit output commands to construct the output marking scheme).
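As a sketch of the simple double-quoted style, a mark-scheme for a suite with hypothetical directories in/1 and in/2/2 might look like the following; the t1e/t1o and t2_2e/t2_2o names assume the naming pattern described above, and the question layout and totals are illustrative only:

    "Question 1:          $t1e / $t1o
    Question 2, part 2:   $t2_2e / $t2_2o
    Total:                $te / $to"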
The provided directory simply contains any files you want to be available to the student files, but which should not have been submitted. This may include Scheme modules used for marking, provided C header files, and so on.
The in directory is where all of the tests are kept, in a hierarchy of subdirectories. Tests occur at the leaf directories, and scores propagate recursively from there back to the root directory. Any options.ss or options.scm files that are encountered are parsed, and the key-value S-expressions in each file modify the state of the tester at the given directory level. Options are parsed in order, so the state changes in option N are visible to option N+1. All state changes are propagated to the child directories, but are not propagated back to the parent directory. If the -q flag is provided to computeMarks, it indicates that the top-level subdirectories of in each represent individual questions, and the autotesting output will be formatted accordingly.
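As an illustration, a hypothetical two-question suite might be laid out as follows; the directory names and depth are purely illustrative, and only the leaf directories hold actual tests:

    in/options.ss        options applying to the whole run
    in/1/options.ss      options for question 1
    in/1/1/              a leaf directory: one test for question 1
    in/1/2/
    in/2/options.ss      options for question 2
    in/2/1/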
BitterSuite3 provides a set of default behaviours that are meant to apply to all domain-specific languages. Some of the implementation is left to these other languages (the timeout and memory options, as well as the handling of input files), but they should honour the intent of these settings as closely as they possibly can.
This is an option that must be treated very carefully. If any mutable state is shared among several tests, that state may become incorrect; in such a case, while top-level directories may be able to run in parallel, the lower-level ones would not. For example, Python tests should not be run in parallel, nor should tests run on a single evaluator in Advanced Student Scheme that has global mutable state.
On the Solaris systems, an initial test produced quite unsatisfactory results. Multiple tests confirmed that, on a Beginning Student sample, threading increased the runtime from about 2.5 minutes to about 3 minutes, and a question tested in Intermediate Student would consistently time out because an evaluator was not created within 15 seconds, meaning the thread-scheduling algorithm does quite a poor job of sharing time slices. This may still be useful under one of three conditions: tests in languages like external or C, where the OS can take control of some processes; Linux machines, where the implementation may be better; or future versions of mzscheme that are properly multithreaded. Further testing will need to be done to confirm any of these scenarios.
Any other value types for these keys, or any other keys, will result in a fatal error that kills the testing suite so course staff can fix the problem.
The current language, as selected by the language option, is what provides interesting behaviour for BitterSuite3. Any languages that are supported directly are housed in /u/isg/bittersuite3/languages. Alternate domain-specific or testing languages may also be placed in /u/csXXX/bittersuite_languages. The current centrally-supported languages are the Scheme family (scheme/module, scheme/beginner, and so on), C, external, and python. Documentation for these languages is on the ISG TWiki, but should eventually be propagated to their own man pages, named bittersuite3-scheme, etc.
Language definitions must be placed in a Scheme module named definitions.ss inside a subdirectory of one of the main directories mentioned in the previous section, with the same name as the language itself. This module must provide four functions, which will be called from the main testing code, and which define the behaviour for this particular language.
All language modules have hierarchy-runner added to their module search paths, which enables the use of a number of helper functions and constants when a module contains (require hierarchy-runner/common).
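As a very rough sketch of the shape such a module can take: the module language and the four provided names below are placeholders only, not the actual required interface, which is defined by the main testing code and documented per language.

    #lang scheme
    ;; definitions.ss for a hypothetical language; names are illustrative only
    (require hierarchy-runner/common)  ; helpers and constants noted above
    (provide hook-one hook-two hook-three hook-four)
    (define (hook-one . args) (void))
    (define (hook-two . args) (void))
    (define (hook-three . args) (void))
    (define (hook-four . args) (void))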
If an executable by this name exists, it is run at the end of computeMarks, after all diff checking and mark-scheme processing has been done and the default set of files has been kept for the output. It gives the option, for example, to run keepFile on any extra files that were generated during testing in this particular language.
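A sketch of such an executable, assuming its only job is to keep one extra generated file (the filename here is purely illustrative):

    #!/bin/sh
    # keep an extra file produced during testing so it appears in the output
    keepFile extra-output.log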
There are four executables that, if they are provided in the test suite directory along with runTests and computeMarks, provide entry points into BitterSuite for various kinds of customization.
Note that these scripts lend themselves to "quick hack" solutions that could often be implemented more properly as a customized language implementation.
Two of them run relative to the processing of the in hierarchy: one before the in hierarchy has been processed, and one after it has been processed.