Walkthrough for BitterSuite

This walkthrough builds a simple marking suite to illustrate how a Scheme suite is created with BitterSuite, version 3. For your particular course, substitute the appropriate value (e.g., cs135) for course in directory paths.

As of Spring 2010, BitterSuiteInitialSetup should be done for any course using BitterSuite; this walkthrough assumes that setup by default and points out what to do otherwise.

Creation of a marking directory

The particulars of how marking suites are constructed follow from the behaviour of RST; more details are at ISGScriptsManPages#RST. While BitterSuite is designed so that it could run independently of RST, RST provides a number of safety features that help ensure basic protection is in place for your program, and it handles cycling through submissions automatically, so in practice BitterSuite is always run through RST.

All marking suites are contained inside the directory /u/course/marking. Let's say we want to create a marking suite for the assignment “fake”. We would then create a subdirectory with this name: /u/course/marking/fake/. Various testing suites for this assignment can then be created as subdirectories of it. Let's create one called 0, so the full path to this suite will be /u/course/marking/fake/0/.
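The directory creation just described can be sketched in shell. COURSE_HOME is a hypothetical variable introduced for this sketch, not something RST or BitterSuite reads; it defaults to a scratch directory so the commands can be tried safely, and would be set to /u/course on the real system.

```shell
# COURSE_HOME is a hypothetical stand-in for /u/course (an assumption of this sketch).
COURSE_HOME="${COURSE_HOME:-$(mktemp -d)}"

# Create the assignment directory "fake" and a testing suite named "0" beneath it.
mkdir -p "$COURSE_HOME/marking/fake/0"
```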

If your .rstrc has been configured as described at BitterSuiteInitialSetup, this testing suite should automatically use the default BitterSuite hooks. Otherwise, you'll need to create explicit runTests and computeMarks scripts as described at that page. You also need to create a config.ss file, as described at BitterSuiteConfig.

You can now try running this testing suite to see results:
rst fake 0
Chances are this will give you an error message because this assignment does not exist (unless it has already been created). You'll want to create the directory /u/course/handin/fake and then populate that with directories representing a fake student or two.

You will also likely see RST complain about a missing answers directory. RST checks for it to make sure that any I/O tests have been prepared properly. See BitterSuiteIOWalkthrough for details; otherwise, create an empty folder /u/course/marking/fake/answers/ to suppress this warning message.
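A minimal sketch of the two preparation steps above, assuming the same hypothetical COURSE_HOME variable standing in for /u/course (it defaults to a scratch directory). The student directory names student1 and student2 are placeholders chosen for illustration.

```shell
# COURSE_HOME is a hypothetical stand-in for /u/course (an assumption of this sketch).
COURSE_HOME="${COURSE_HOME:-$(mktemp -d)}"

# Directories representing two fake students; the names are placeholders.
mkdir -p "$COURSE_HOME/handin/fake/student1" "$COURSE_HOME/handin/fake/student2"

# An empty answers directory, for suites that do not use I/O tests.
mkdir -p "$COURSE_HOME/marking/fake/answers"
```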

Once you've done this, you will instead see a series of errors from BitterSuite because of other files that are missing. However, you'll be able to verify that rst is running runTests and computeMarks as expected.

Note that you may also see permissions errors on the marking directory if an advanced user configured RST not to overwrite marking directory permissions with friendly defaults; if this is the case, see RSTConfiguration for more information.

Jonathan Templin has created a Python script which automates the creation of this marking directory. The script has only been tested by CS 115 and CS 116 tutors, but it should be usable by any course. For more information and to download the script, see AutomatedAutotestCreation.

A Single Scheme Test

Creating a test directory

Naturally, at some point you will also want to test some student code. The testing hierarchy is contained within a directory named in, located inside /u/course/marking/fake/0.

Inside the in directory, we will create a directory for each test. For the purposes of creating a test, we will assume this assignment required students to submit a file addition.scm containing a function add4, which consumes a single number and produces a number that is 4 larger.

Creating a test

The testing suite needs to know a number of pieces of information, which are provided via an options.ss file. Each option is specified as a key-value s-expression. The ordering of these expressions matters; they are processed in order. In particular, there are keys that are understood by particular languages but not by the base code, and having the base code attempt to interpret these will cause the suite to abort.

So, we'll create a test directory called 1 (full path /u/course/marking/fake/0/in/1), and put the following information inside of a file called options.ss inside of that directory:

(language scheme/intermediate)
(loadcode "addition.scm")
(value 4)
(desc "Adding to zero")
(result (add4 0))
(expected 4)

The first line, (language scheme/intermediate), specifies the language that should be used to run the test. For Scheme, there are a number of different teaching language dialects; for this example, we're using Intermediate Student. Valid language values are those listed as teaching language codes in the sandbox.ss documentation in the DrScheme Help Desk, as well as scheme/module.

After this, we've specified what file the code to be tested is in with (loadcode "addition.scm"). This code will be loaded into a sandbox testing environment where it is run in the specified language.

Next, (value 4) gives the number of points this test is worth; any expression which evaluates to a numerical value will work. The default is 1, so this is not necessary if the test will just be worth a single mark.

Finally, after all of that boilerplate, we'll specify the test itself. Information about tests generally consists of three Scheme expressions representing the following:

Description (desc): A string describing what the test should do.
Student evaluation (result): An expression which will test the code submitted by the student.
Expected result (expected): The value expected from the evaluation of the above expression.

The results of evaluating (add4 0) and 4 will be compared using the equal? function by default. If this returns true, the test will be marked as passed; if not, it will be marked as failed.
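Putting the pieces together, the test directory and its options.ss could be created as follows. This is a sketch under the assumption that COURSE_HOME, a hypothetical variable, stands in for /u/course; it defaults to a scratch directory.

```shell
# COURSE_HOME is a hypothetical stand-in for /u/course (an assumption of this sketch).
COURSE_HOME="${COURSE_HOME:-$(mktemp -d)}"
TEST_DIR="$COURSE_HOME/marking/fake/0/in/1"
mkdir -p "$TEST_DIR"

# The quoted 'EOF' keeps the shell from expanding anything inside the file body.
cat > "$TEST_DIR/options.ss" <<'EOF'
(language scheme/intermediate)
(loadcode "addition.scm")
(value 4)
(desc "Adding to zero")
(result (add4 0))
(expected 4)
EOF
```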

Testing the tests

Now you can run the rst command specified earlier again. You should see much more meaningful results now. Try playing around with different fake student submissions in /u/course/handin/fake to see different output. For example, one student may write the code correctly: (define (add4 x) (+ 4 x)), another may write it incorrectly: (define (add4 x) (+ 3 x)), another may misspell the function name: (define (ad4 x) (+ 4 x)), and a fourth student may submit something that isn't even valid Scheme, such as (define (hopeless with its parentheses left unclosed.
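The four kinds of submission mentioned above can be set up with a few commands. In this sketch the student directory names (correct, wrong, misspelled, broken) are placeholders, COURSE_HOME is a hypothetical stand-in for /u/course that defaults to a scratch directory, and the misspelled variant is given a parameter x so that its only defect is the function name.

```shell
# COURSE_HOME is a hypothetical stand-in for /u/course (an assumption of this sketch).
COURSE_HOME="${COURSE_HOME:-$(mktemp -d)}"
HANDIN="$COURSE_HOME/handin/fake"
mkdir -p "$HANDIN/correct" "$HANDIN/wrong" "$HANDIN/misspelled" "$HANDIN/broken"

# Correct code: should pass the test.
printf '%s\n' '(define (add4 x) (+ 4 x))' > "$HANDIN/correct/addition.scm"
# Wrong arithmetic: should fail with a mismatched result.
printf '%s\n' '(define (add4 x) (+ 3 x))' > "$HANDIN/wrong/addition.scm"
# Misspelled function name: add4 is never defined.
printf '%s\n' '(define (ad4 x) (+ 4 x))' > "$HANDIN/misspelled/addition.scm"
# Unbalanced parentheses: not valid Scheme, so loading should fail.
printf '%s\n' '(define (hopeless' > "$HANDIN/broken/addition.scm"
```

After running rst fake 0 against these, each submission should produce a different kind of report.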

Note that this is a required part of creating tests. You must verify that:

The test suite you have designed does not contain any errors.
Code that is known to be correct passes your tests.
Multiple examples of code that is known to be incorrect fail the tests, and in particular fail in exactly the way that is expected and not in other ways.

The autotesting results will give you some indication of the behaviour of the suite under different circumstances. If the file cannot be found or the code cannot be loaded properly, then there will be an appropriate failure report for the test, and execution of the test will not be attempted. If the test passes, the test description is printed along with an indication that it passed. If the test fails, the test description is printed followed by a mention of failure, and both the result produced by the student code and the expected value are printed so the student can try to trace what went wrong. If running the test on the student code causes an exception to be raised, this is just a special case of the failure situation; the string representation of the exception will be used as the student's output.

Creating additional tests

To create multiple tests, you can create new subdirectories of in and create new options.ss files in those subdirectories in a fashion similar to that given above. Note that the names of these subdirectories are slightly more restricted than general directory names; you should stick to alphanumeric characters and avoid all symbols, including the hyphen and underscore, because they will conflict with the behaviour of the suite scripts.
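As a sketch, a second test directory named 2 might look like the following. The description and the input -4 are illustrative choices, not taken from any real suite, and COURSE_HOME is a hypothetical stand-in for /u/course that defaults to a scratch directory. No value expression is given, so the test would be worth the default of 1.

```shell
# COURSE_HOME is a hypothetical stand-in for /u/course (an assumption of this sketch).
COURSE_HOME="${COURSE_HOME:-$(mktemp -d)}"
TEST_DIR="$COURSE_HOME/marking/fake/0/in/2"
mkdir -p "$TEST_DIR"

# Same language and loadcode lines as test 1; only the test itself differs.
cat > "$TEST_DIR/options.ss" <<'EOF'
(language scheme/intermediate)
(loadcode "addition.scm")
(desc "Adding to a negative number")
(result (add4 -4))
(expected 0)
EOF
```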

This is enough for you to be able to write a testing suite for a given Scheme assignment. However, creating an options file in each directory is mind-numbing when the files have very similar, if not identical, content, such as the language and loadcode expressions you would retype every time. Worse, creating a new evaluator for every test will slow your rst runs to a crawl. There is a way to avoid this.

Hierarchical testing

There are two key features which vastly improve the usability of this testing suite:

The values specified in options.ss files are inherited from parent directories as you descend through the directory hierarchy.
Directories may be nested arbitrarily deep.

In the example above, the language and loadcode expressions could have been specified once in an options.ss file in the root in directory and omitted from all of the test subdirectories; the behaviour would have been the same. Normally the language option in particular is present in the top directory and in no subdirectories, since a given assignment is often entirely in a single language.
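A sketch of that arrangement for the add4 test: the shared expressions move to the root of the in directory, and the test directory keeps only what is specific to it. COURSE_HOME is a hypothetical stand-in for /u/course that defaults to a scratch directory; note that the commands overwrite any options.ss already at these paths.

```shell
# COURSE_HOME is a hypothetical stand-in for /u/course (an assumption of this sketch).
COURSE_HOME="${COURSE_HOME:-$(mktemp -d)}"
IN_DIR="$COURSE_HOME/marking/fake/0/in"
mkdir -p "$IN_DIR/1"

# Shared options are written once at the root of the in directory.
cat > "$IN_DIR/options.ss" <<'EOF'
(language scheme/intermediate)
(loadcode "addition.scm")
EOF

# The test directory inherits them and adds only its own expressions.
cat > "$IN_DIR/1/options.ss" <<'EOF'
(value 4)
(desc "Adding to zero")
(result (add4 0))
(expected 4)
EOF
```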

Moreover, these can be overridden in deeper directories. For example, say that tests 1 through 5 are supposed to have value 1, but test 6 should be worth 2. Then the first five directories do not need a value specified in the options file since they will inherit the value of 1 from the parent directory, but the sixth test's directory can override this with its own value option containing 2.
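The override could be laid out as in the sketch below. Only the value expressions are shown; each test's desc, result and expected expressions are left out. COURSE_HOME is a hypothetical stand-in for /u/course that defaults to a scratch directory, and the commands overwrite any options.ss already at these paths.

```shell
# COURSE_HOME is a hypothetical stand-in for /u/course (an assumption of this sketch).
COURSE_HOME="${COURSE_HOME:-$(mktemp -d)}"
IN_DIR="$COURSE_HOME/marking/fake/0/in"
for n in 1 2 3 4 5 6; do
    mkdir -p "$IN_DIR/$n"
done

# Value inherited by every test that does not say otherwise.
printf '%s\n' '(value 1)' > "$IN_DIR/options.ss"

# Test 6 alone overrides the inherited value.
printf '%s\n' '(value 2)' > "$IN_DIR/6/options.ss"
```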

Since the directory hierarchy can be arbitrarily deep, questions can also be grouped by value, although this grouping will be reflected in the autotesting output for the students. For example, a hierarchy could be constructed as follows:

in
- options.ss: (value 1) (language scheme/beginner) (loadcode "abc.ss")
- simple
  - 1
    - test.ss
  - 2
    - test.ss
  - 3
    - test.ss
- complex
  - options.ss: (value 3/2)
  - 4
    - test.ss
  - 5
    - test.ss
  - 6
    - options.ss: (value 2)
    - test.ss

In this case, tests 1 through 3 are worth 1 mark each, tests 4 and 5 are worth 3/2 each, and test 6 is worth 2, for a total of 8 marks.
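The hierarchy above, minus the test.ss files whose contents are not shown here, could be built as in the following sketch, which also checks the total. COURSE_HOME is a hypothetical stand-in for /u/course that defaults to a scratch directory. Shell arithmetic is integer-only, so the total is counted in half-marks first.

```shell
# COURSE_HOME is a hypothetical stand-in for /u/course (an assumption of this sketch).
COURSE_HOME="${COURSE_HOME:-$(mktemp -d)}"
IN_DIR="$COURSE_HOME/marking/fake/0/in"
mkdir -p "$IN_DIR/simple/1" "$IN_DIR/simple/2" "$IN_DIR/simple/3" \
         "$IN_DIR/complex/4" "$IN_DIR/complex/5" "$IN_DIR/complex/6"

# Options at each level; deeper files override values inherited from above.
printf '%s\n' '(value 1)' '(language scheme/beginner)' '(loadcode "abc.ss")' > "$IN_DIR/options.ss"
printf '%s\n' '(value 3/2)' > "$IN_DIR/complex/options.ss"
printf '%s\n' '(value 2)' > "$IN_DIR/complex/6/options.ss"

# Three tests at 1 (2 halves each), two at 3/2 (3 halves each), one at 2 (4 halves).
HALF_MARKS=$(( 3 * 2 + 2 * 3 + 1 * 4 ))
TOTAL=$(( HALF_MARKS / 2 ))
echo "total marks: $TOTAL"
```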

See SchemeRationalNumbers for warnings about non-integer values.