This walkthrough illustrates the creation of a simple Scheme marking suite using BitterSuite, version 3. For your particular course, substitute csXXX in directory paths with the appropriate course account (e.g., cs135).
As of Spring 2010, the steps in BitterSuiteInitialSetup should already have been completed for any course using BitterSuite; this walkthrough assumes that setup by default, with pointers for the cases where it does not apply.
The particulars of how marking suites are constructed are due to the behaviour of RST; more details on RST can be found in the documentation for that program. While BitterSuite is designed in a fashion that would allow it to run independently of RST, RST provides a number of safety features that help ensure basic protection is in place for your program, and it handles cycling through submissions automatically, so in practice BitterSuite is always run through RST.
All marking suites are contained inside the directory /u/csXXX/marking. Let's say we want to create a marking suite for the assignment "fake". We would then create a subdirectory with this name: /u/csXXX/marking/fake/. Various testing suites for this assignment can then be created as subdirectories of this. Let's create one called 0, so the full path to this suite will be /u/csXXX/marking/fake/0/.
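At this point, the marking tree for the course account is minimal; with csXXX substituted appropriately, it looks like this:

    /u/csXXX/marking/
        fake/
            0/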
If your .rstrc has been configured as described at BitterSuiteInitialSetup, this testing suite should automatically use the default BitterSuite hooks. Otherwise, you'll need to create explicit runTests and computeMarks scripts as described on that page.
You can now try running this testing suite to see results:
rst fake 0
Chances are this will give you an error message because this assignment
does not exist (unless it has already been created). You'll want to
create the directory /u/csXXX/handin/fake and then populate it with directories representing a fake student or two.
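For example, after creating two fake students, the handin tree might look like the following (the student directory names here are made up for illustration; their submissions will be added later):

    /u/csXXX/handin/fake/
        student1/
        student2/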
You will also likely see RST complain about a missing answers directory. This check exists to make sure that, if you're doing I/O tests, you've prepared them properly. See BitterSuiteIOWalkthrough for details; otherwise, create an empty directory /u/csXXX/marking/fake/answers/ to suppress this warning message.
Once you've done this, you will instead see a series of errors from BitterSuite because of other files that are missing. However, you'll be able to verify that rst is running runTests and computeMarks as expected.
Note that you may also see permissions errors on the marking directory if an advanced user configured RST not to overwrite marking directory permissions with friendly defaults; if this is the case, see RSTConfiguration for more information.
Naturally, at some point you will also want to test some student code.
The testing hierarchy is contained within a directory named in inside /u/csXXX/marking/fake/0. Inside in, we will create a directory for each test. For the purposes of creating a test, we will assume this assignment required students to submit a file addition.scm containing a function add4, which consumes a single number and produces a new number that is 4 larger than the consumed number.
The testing suite needs a number of pieces of information, which are provided via an options.ss file. Each option is specified as a key-value s-expression. The ordering of these expressions matters; they are processed in order. In particular, there are keys that are understood by particular languages but not by the base code, and having the base code attempt to interpret these will cause the suite to abort.
So, we'll create a test directory called 1 (full path /u/csXXX/marking/fake/0/in/1), and put the following information in a file called options.ss inside that directory:
    (language scheme/intermediate)
    (loadcode "addition.scm")
    (value 4)
    (desc "Adding to zero")
    (result (add4 0))
    (expected 4)
The first line, (language scheme/intermediate), specifies the language that should be used to run the test. For Scheme, there are a number of different teaching-language dialects; for this example, we're using Intermediate Student.
Valid language values are those listed as teaching language codes
in the sandbox.ss documentation in the DrScheme Help Desk, as well
as scheme/module.
After this, we specify which file contains the code to be tested with (loadcode "addition.scm"). This code will be loaded into a sandboxed testing environment, where it is run in the specified language.
Next, (value 4) gives the number of points this test is worth; any expression that evaluates to a numerical value will work. The default is 1, so this option is unnecessary if the test is only worth a single mark.
Finally, after all of that boilerplate, we specify the test itself. Information about a test generally consists of three Scheme expressions: a description of the test (desc), the expression to evaluate using the student's code (result), and an expression producing the expected value (expected).
The results of evaluating (add4 0) and 4 will be compared using the equal? function by default. If this returns true, the test will be marked as passed; if not, it will be marked as failed.
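Conceptually, the check performed for this test amounts to something like the following sketch (the suite's actual internals may differ):

    ; After loading the student's addition.scm into the sandbox,
    ; the suite evaluates the result and expected expressions
    ; and compares them:
    (equal? (add4 0) 4)   ; => true for a correct add4, so the test passes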
Now you can run the rst command specified earlier again. You should see much more meaningful results now. Try playing around with different fake student submissions in /u/csXXX/handin/fake to see different output.
For example, one student may correctly write the code: (define (add4 x) (+ 4 x)); another may specify it incorrectly: (define (add4 x) (+ 3 x)); another may misspell the function name: (define (ad4 x) (+ 4 x)); and a fourth student may submit something that isn't even valid Scheme: (define (hopeless.
Note that this is a required part of creating tests: you must verify that the suite behaves sensibly for each of these kinds of submission.
The autotesting results will give you some indication of the behaviour of the suite under different circumstances:
- If the file cannot be found or the code cannot be loaded properly, an appropriate failure report is given for the test, and execution of the test is not attempted.
- If the test passes, the test description is printed along with an indication that it passed.
- If the test fails, the test description is printed followed by a mention of the failure, and both the result produced by the student code and the expected value are printed so the student can try to trace what went wrong.
- If running the test on the student code causes an exception to be raised, this is just a special case of the failure situation; the string representation of the exception is used as the student's output.
To create multiple tests, you can create new subdirectories of in and create new options.ss files in those subdirectories, in a fashion similar to that given above. Note that the names of these subdirectories are slightly more restricted than general directory names: stick to alphanumeric characters, avoiding all symbols (including the hyphen and underscore), because symbols will conflict with the behaviour of the suite scripts. For example, 1, 2a, and q3 are all safe names, while q-1 and q_1 are not.
This is enough for you to be able to write a testing suite for a given Scheme assignment. However, creating an extra options file in each directory is mind-numbing when they all have very similar, if not identical, content, with the same language and loadcode expressions typed over and over. Worse, creating a new evaluator for every test will slow your rst runs to a crawl. There is a way to avoid this.
There are two key features that vastly improve the usability of this testing suite: options.ss files are inherited from parent directories as you descend through the directory hierarchy, and the evaluator is reused from one test to the next whenever the language and loaded code do not change, instead of being rebuilt for every test.
In the example above, the language and loadcode expressions could have been specified once in an options.ss file in the root in directory and omitted from all of the test subdirectories; the behaviour would have been the same.
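As a sketch of that shared layout (file contents shown inline for illustration):

    in/options.ss:
        (language scheme/intermediate)
        (loadcode "addition.scm")

    in/1/options.ss:
        (desc "Adding to zero")
        (result (add4 0))
        (expected 4)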
Normally the language option in particular is present only in the top-level directory and in no subdirectories, since a given assignment is often entirely in a single language.
Moreover, inherited options can be overridden in deeper directories. For example, say that tests 1 through 5 are supposed to be worth 1 mark each, but test 6 should be worth 2. Then the first five test directories do not need a value specified in their options files, since they inherit the value of 1 from the parent directory, but the sixth test's directory can override this with its own value option containing 2.
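Concretely, a minimal sketch of the two files involved:

    in/options.ss:
        (value 1)

    in/6/options.ss:
        (value 2)    ; overrides the inherited value of 1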
Since the directory hierarchy can be arbitrarily deep, tests can also be grouped by value, although the grouping will be reflected in the autotesting output shown to students. For example, a hierarchy could be constructed as follows:
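One possible layout consistent with the marks described next (the grouping directory name q45 is made up for illustration, and will appear in the students' output):

    in/
        options.ss       contains (value 1), plus language and loadcode
        1/
        2/
        3/
        q45/
            options.ss   contains (value 3/2)
            4/
            5/
        6/
            options.ss   contains (value 2)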
In this case, tests 1 through 3 are worth 1 mark each, tests 4 and 5 are worth 3/2 each, and test 6 is worth 2, for a total of 8 marks.
See SchemeRationalNumbers for warnings about non-integer values.
Before any tests are released, you should take a look at BitterSuiteMarkSchemeWalkthrough to make sure you are taking proper advantage of the potential of marking schemes.
If you are going to be working with tests using input and output, see BitterSuiteIOWalkthrough.
You should also read the language-specific twiki pages listed below:
These pages are designed as reference pages; walkthroughs in the style of this page should be added for these languages at some point in the future.