CheckTestcases

In Fall 2012, CS 135 tried using a script to check the students' check-expects. The script checks if the student has thoroughly tested their code and if they have the required testcases. This page is about that script.

Background Information

In the assignment marking scheme, there is typically a list of the testcases students should have. For example, if students have to write a function sum-lon to add a list of numbers, the required testcases might be:

  • An empty list
  • A list of length 1
  • A list of length > 1
Checking whether students have these testcases can be done automatically (instead of manually by the marker).

The script works by re-defining check-expect (and check-within and check-error) using Racket's define-syntax. It then runs the student's code in a sandbox environment. After the student's code is run, you will have access to a list containing the student's check-expects.
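As a rough illustration of the mechanism, here is a minimal Racket sketch of a check-expect that records tests instead of running them. It is not the actual collector.rkt. It handles only check-expect. It assumes each recorded entry is the tested function's name (as a symbol) followed by its evaluated arguments, which is the shape of the student-testcases-list shown later on this page.

```racket
#lang racket
;; Minimal sketch of a recording check-expect (not the real collector.rkt).
;; Assumption: each record is the tested function's name (a symbol)
;; followed by its evaluated arguments.

(define testcases '())

;; add-a-testcase : list -> void
;; Adds tc to the front of testcases.
(define (add-a-testcase tc)
  (set! testcases (cons tc testcases)))

;; get-all-testcases : -> (listof list), most recently added first
(define (get-all-testcases) testcases)

(define-syntax check-expect
  (syntax-rules ()
    ;; (check-expect (f arg ...) anything ...): quote f, evaluate each arg,
    ;; and drop everything after the first argument without expanding it.
    [(_ (f arg ...) ignored ...)
     (add-a-testcase (list 'f arg ...))]
    ;; Any other shape, e.g. (check-expect 5 5), records nothing.
    [(_ other ...) (void)]))

;; The expected value and the trailing junk are never expanded or evaluated,
;; and undefined-fcn is only quoted, so this line runs without error.
(check-expect (undefined-fcn (+ 45 2)) totally syntax error! (add1) (/ 1 0))

(get-all-testcases) ; => '((undefined-fcn 47))
```

The macro drops the extra arguments before they are expanded. That is why unbound identifiers and bad expressions after the first argument do not stop the script from getting at the 47.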

Some information about the script's behaviour:

  • The script does not care whether the check-expects pass or not. It only uses the first argument to check-expect; it ignores all other arguments completely. For example, if a student writes:
(check-expect (undefined-fcn (+ 45 2)) totally syntax error! (add1) (/ 1 0))

the script can still get at the 47, and do whatever it needs to do with it.

  • The script does not care where check-expects are placed. For CS135, examples and tests are both written with check-expects, and the script cannot differentiate between examples and tests.
  • The script does not check if helper functions have check-expects.
  • In many ways, the script is as strict as the autotest. If the student's code contains syntax errors outside of check-expect, the script won't work, because the sandbox will be unable to load the student's code. In that case, the script outputs nothing and the marker has to read the student's check-expects manually. The script will not falsely report every testcase as missing; it just quits quietly.
  • The script only checks one of the student's files at a time. If students have to submit more than one file, you can create several versions of the script (one version for each file the student has to submit).

Examples

Attached at the bottom of this page are two examples. To try out the demo example:

  1. Download and save check-testcases-demo.zip somewhere on the course account. Unzip it.
  2. In a terminal, cd into the created folder. This folder will contain four files:
    • demo.rkt, the main script that will check the student's testcases
    • collector.rkt, a file that will be require'd by the student's code
    • studentscode.rkt, the student's code
    • irisdata.rkt, a teachpack that the student can use.
  3. Run racket demo.rkt studentscode.rkt. It will print the following:

Q1: sum-lon
-----------------------------------------------------
1. Missing empty case

Checking my-equal?
-----------------------------------------------------
1. Missing test with two iris

Bonus question
-----------------------------------------------------
1. Missing test with two stings


Total missing, excluding bonus: 2


Number of testcases:
   Q1) sum-lon: 3
   Q2) my-equal?: 10
   Q3) bonus-fcn: 1

Is studentscode.rkt covered? yes

As you can see, the script outputs the missing testcases and the number of distinct, valid check-expects the student wrote. It also determines if the student's file is covered or not. A file is covered if all code is used (DrRacket will not highlight any code).

The check-testcases-a10.zip example is a real example from CS135 Fall 2012 Assignment 10. You can download the zip file, unzip it, and cd into the created folder. There are two sample submissions that you can try:

  • racket a10.rkt skyscrapers1.rkt
  • racket a10.rkt skyscrapers2.rkt
skyscrapers1.rkt has all the required testcases, and skyscrapers2.rkt is missing a few testcases.

What the Script Does in Detail

Let's trace through the demo example. There are two scripts: demo.rkt, which is the main script, and collector.rkt, which is a helper script. You can rename demo.rkt if you want.

demo.rkt needs one argument: the path to the student's file. It will first create a modified copy of the student's code, and save the modified copy as .check-testcases-tmp-file.rkt. This temporary file will be saved in the same directory as the script.

The file .check-testcases-tmp-file.rkt is the same as the student's code, except:

  • The DrRacket heading (the first three lines that DrRacket adds) is changed. The script removes installed teachpacks and forces the student's code to use the correct language level. The constant new-header defines the new header.
  • (require "collector.rkt") is added at the top.
collector.rkt re-defines check-expect (and check-within and check-error) using Racket's define-syntax, and provides the re-defined check-expect. collector.rkt also defines and provides two functions: add-a-testcase and get-all-testcases.

add-a-testcase consumes a list representing a check-expect test, and adds it to the front of the list testcases. The function produces (void). For example, (add-a-testcase '(check-expect (add1 4) 5)) will add the list '(check-expect (add1 4) 5) to the front of testcases. The script collector.rkt re-defines check-expect such that the new check-expect will call add-a-testcase.

get-all-testcases produces testcases, the list of all the testcases that have been added. You can use this function to access the student's testcases. The list it produces is in the reverse of the order in which the testcases were added. For example:

> (get-all-testcases)
'()
> (add-a-testcase '(check-expect (symbol? 'one) true))
> (add-a-testcase '(check-expect (symbol? 'two) true))
> (get-all-testcases)
'((check-expect (symbol? 'two) true) (check-expect (symbol? 'one) true))

Once the modified file .check-testcases-tmp-file.rkt is created, demo.rkt runs this file using make-module-evaluator. make-module-evaluator produces an evaluator, which is a function that consumes a list representing a Racket expression, and evaluates it in the context of the student's code. For example, if the produced evaluator is called e, then (e '(+ 1 2)) will produce 3. It will evaluate (+ 1 2) using the student's code. As another example, if the student defines a function (define (f x) x), then (e '(f 2)) will produce 2. It does not matter whether f is defined in demo.rkt or not, and (e '(f 2)) will use the f defined by the student.
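The sketch below shows the same idea with racket/sandbox. To keep it self-contained, it passes an in-memory module to make-module-evaluator instead of the temporary file. The limits given to sandbox-eval-limits (60 seconds, 256 MB) are placeholder values. How demo.rkt actually uses its timeout and memory variables is an assumption here, not something this page states.

```racket
#lang racket
;; Sketch of the evaluation step with an in-memory stand-in for the
;; student's (modified) file.
(require racket/sandbox)

;; Placeholder limits: 60 seconds and 256 MB per evaluation.
(define e
  (parameterize ([sandbox-eval-limits '(60 256)])
    (make-module-evaluator
     '(module student racket/base
        (define (f x) x)))))

(e '(+ 1 2)) ; => 3
(e '(f 2))   ; => 2, using the f defined inside the student module
```

The evaluator runs expressions inside the module's namespace. Definitions that the student never exports, such as f here, are still reachable this way.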

demo.rkt calls (e '(get-all-testcases)), which produces a list of the student's testcases in reverse order. This list is stored in the variable student-testcases-list. For the demo example, student-testcases-list looks like:

(list '(bonus-fcn 0 0) 
      (list 'my-equal? (iris 1 2 3 4) 'blueberry)
      '(my-equal? #t #f)
      '(my-equal? "a" "b")
      '(my-equal? #\a #\b)
      '(my-equal? sym1 symb2)
      '(my-equal? (1 2 (3 4)) (#t "abcdef" #\u #t #f ok))
      (list 'my-equal? (list (posn 23 23)) (list (posn 3 4)))
      '(my-equal? (a b c) #\c)
      (list 'my-equal? (posn 0 0) (posn 0 0))
      (list 'my-equal? (posn 0 0) 42)
      '(sum-lon (3.141592653589793))
      '(sum-lon "not a list")
      '(sum-lon (5 4))
      '(sum-lon (1 2 3))
      '(/ 1 0))

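There is a one-to-one correspondence between student-testcases-list and the student's testcases. Each element is the first argument of one check-expect: the name of the function being tested, as a symbol, followed by the already-evaluated arguments. For example, (sum-lon (list 1 2 3)) is recorded as '(sum-lon (1 2 3)).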

demo.rkt takes student-testcases-list and filters out "bad" testcases, such as (check-expect (/ 1 0) 'inf). It also filters out tests where the inputs violate the function's contract. For example, if students have to write a function sum-lon to add a list of numbers, and they write the test

(check-expect (sum-lon "not a list") 'error)

then this test will be filtered out. Whether a testcase is filtered out is decided by the XXX/valid? functions, where XXX is the name of a function the student has to write. The XXX/valid? functions consume a list representing the first argument to check-expect (the function name followed by the evaluated arguments). For example, if the student writes (check-expect (sum-lon (list 1 2 3)) 6), then sum-lon/valid? will consume the list '(sum-lon (1 2 3)). The XXX/valid? functions should produce true if the inputs are valid, and false otherwise. For example, for sum-lon, the validity checker is

(define (sum-lon/valid? fcn-app)
  (and (= 2 (length fcn-app))
       (equal? (first fcn-app) 'sum-lon)
       (list? (second fcn-app))
       (andmap number? (second fcn-app))))

For example, (sum-lon/valid? '(sum-lon "not a list")) is false, so the test case (check-expect (sum-lon "not a list") 'error) will be filtered out and ignored.

The filtered testcases (i.e. the valid testcases) are stored in the hash table student-testcases. In this hash table, the keys are symbols representing function names, and the values are lists of lists. Each inner list holds the arguments passed to the function being tested in one testcase. For the demo example, the hash table would look like:

(hash 'sum-lon '(((1 2 3)) 
                 ((5 4)) 
                 ((3.141592653589793)))
      'my-equal? (list (list (posn 0 0) 42)
                       (list (posn 0 0) (posn 0 0))
                       '((a b c) #\c)
                       (list (list (posn 23 23)) (list (posn 3 4)))
                       '((1 2 (3 4)) (#t "abcdef" #\u #t #f ok))
                       '(sym1 symb2)
                       '(#\a #\b)
                       '("a" "b")
                       '(#t #f)
                       (list (iris 1 2 3 4) 'blueberry))
      'bonus-fcn '((0 0)))

Note that the invalid testcase (check-expect (sum-lon "not a list") 'error) appears in student-testcases-list but not in the hash table, because it has been filtered out.
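The filtering and grouping step could look roughly like this. A few things here are illustrative only:

- build-testcase-table is a hypothetical name.
- valid-input-checker in this sketch only knows about sum-lon.
- The input is the sum-lon part of the demo's student-testcases-list plus the (/ 1 0) entry, still in reverse order.

```racket
#lang racket
;; Sketch: build the student-testcases hash table from the raw list
;; produced by (get-all-testcases). build-testcase-table is hypothetical.

;; Validity checker for sum-lon, as given earlier on this page.
(define (sum-lon/valid? fcn-app)
  (and (= 2 (length fcn-app))
       (equal? (first fcn-app) 'sum-lon)
       (list? (second fcn-app))
       (andmap number? (second fcn-app))))

(define valid-input-checker (hash 'sum-lon sum-lon/valid?))

;; build-testcase-table : (listof list) -> (hash Symbol (listof list))
;; testcases-list is newest-first, so consing each valid entry onto its
;; bucket restores the order in which the student wrote the tests.
(define (build-testcase-table testcases-list)
  (for/fold ([table (hash)])
            ([fcn-app (in-list testcases-list)])
    (define valid? (hash-ref valid-input-checker (first fcn-app) #f))
    (if (and valid? (valid? fcn-app))
        ;; Keep only the arguments, e.g. '((1 2 3)) for '(sum-lon (1 2 3)).
        (hash-update table (first fcn-app)
                     (lambda (arg-lists) (cons (rest fcn-app) arg-lists))
                     '())
        ;; No checker for this function (e.g. '/) or invalid inputs: drop it.
        table)))

(define student-testcases
  (build-testcase-table
   '((sum-lon (3.141592653589793)) (sum-lon "not a list")
     (sum-lon (5 4)) (sum-lon (1 2 3)) (/ 1 0))))

(hash-ref student-testcases 'sum-lon)
;; => '(((1 2 3)) ((5 4)) ((3.141592653589793)))
(length (remove-duplicates (hash-ref student-testcases 'sum-lon empty)))
;; => 3
```

The last expression is the distinct-count computation that demo.rkt uses when it prints the number of testcases per function.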

Once the hash table is made, the script will check which testcases are missing. It follows this algorithm:

for each required testcase T
{
    meet_testcase = false;
    for each of the student's check-expect ce
    {
        if ce satisfies T
        {
            meet_testcase = true; // the student has testcase T
            break;
        }
    }
    
    if ( ! meet_testcase)
    {
        print "Test case " + T + " not met";
    }
}

Required testcases are added using the add function.
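In Racket, the loop above might be written as follows. This is a sketch, not the code in demo.rkt:

- missing-descs and sum-lon-required are hypothetical names.
- The three required testcases mirror the sum-lon example at the top of this page.
- The argument lists are the sum-lon entries of the demo's hash table.

```racket
#lang racket
;; Sketch of the missing-testcase check using the tc structure from
;; "List of Variables, Functions, and Structures".
(define-struct tc (desc has?) #:transparent)

;; Hypothetical required testcases for sum-lon, which consumes one list.
(define sum-lon-required
  (list (make-tc "Missing empty case" empty?)
        (make-tc "Missing list of length 1" (lambda (lon) (= 1 (length lon))))
        (make-tc "Missing list of length > 1" (lambda (lon) (> (length lon) 1)))))

;; missing-descs : (listof tc) (listof list) -> (listof string)
;; arg-lists holds the arguments of each valid test, e.g. '(((1 2 3)) ((5 4))).
;; A required testcase T is met if (tc-has? T) accepts some test's arguments;
;; for/or stops at the first match, like the break in the pseudocode.
(define (missing-descs required arg-lists)
  (for/list ([t (in-list required)]
             #:unless (for/or ([args (in-list arg-lists)])
                        (apply (tc-has? t) args)))
    (tc-desc t)))

(missing-descs sum-lon-required '(((1 2 3)) ((5 4)) ((3.141592653589793))))
;; => '("Missing empty case")
```

With the demo's three valid sum-lon tests, only the empty case is reported, which matches the demo output. Only valid tests reach the has? predicates, so the predicates can assume correctly typed inputs.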

After all required testcases are checked, demo.rkt will print how many testcases the student has missed in total. This total does not include missing testcases for bonus questions.

demo.rkt then prints, for each function fcn the student has to write, how many distinct valid check-expects the student has, computed as (length (remove-duplicates (hash-ref student-testcases fcn empty))).

Finally, demo.rkt prints whether the student's file is covered or not, and then quits.

Tasks for Each Assignment

For each assignment, you have to edit the "Assignment Specifics" section of demo.rkt (you can rename demo.rkt if you want). Here is a description of the variables and functions you'll use. For examples, see demo.rkt.

List of Variables, Functions, and Structures

(define-struct tc (desc has?) #:transparent)

This structure holds information about a required testcase. desc is a string which will be printed if the student is missing the testcase. has? is a function that:

  • consumes the same thing(s) as the function the student writes, and
  • produces true if the testcase is met, and false otherwise.

timeout and memory

  • Type: Positive Integer
  • Meaning: Roughly how much time and memory the student's code can use up. timeout and memory should be set very high.

modules

  • Type: List of Strings
  • Meaning: A list of filenames of the teachpacks that the students can use. modules should also include the names of any files that students can include with (require ...). Put teachpacks and provided files in the same folder as the main script.

bonuses

  • Type: List of Symbols
  • Meaning: A list of symbols, where each symbol is the name of a function for a bonus question. Missing testcases for bonus functions are not added to the total count.

summary-line

  • Type: String
  • Meaning: summary-line will be printed right before the total number of tests that are missing.

language-level

  • Type: String
  • Meaning: The language level the assignment is in. It is picked from a list of language levels with list-ref, so enter the index (the argument to list-ref) of the language you want.

get-fcns-from-eval

  • Type: Syntax
  • Meaning: get-fcns-from-eval is defined using define-syntax near the top of the script. Give get-fcns-from-eval a list of functions which are in the student's code and which you want to use in the script. For example, if the student defines a structure (define-struct s (a b)), and you want to use s-a in demo.rkt, give s-a as an argument to get-fcns-from-eval.
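One way a macro like get-fcns-from-eval could work is sketched below. The real definition near the top of demo.rkt may differ. This version evaluates each name in the sandboxed student module and binds the result to the same name in the script. The in-memory student module is a stand-in for the student's file.

```racket
#lang racket
;; Hypothetical get-fcns-from-eval: bind each listed name in this module
;; to the value of the same name inside the sandboxed student module.
(require racket/sandbox)

(define evaluator
  (make-module-evaluator
   '(module student racket/base
      (define-struct s (a b))
      (define sample (make-s 1 2)))))

(define-syntax-rule (get-fcns-from-eval fcn ...)
  (begin (define fcn (evaluator 'fcn)) ...))

;; s-a and s-b are now the student's accessors, usable in the script.
(get-fcns-from-eval s-a s-b)

(s-a (evaluator 'sample)) ; => 1
```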

question-names

  • Type: Hash Table mapping symbols to strings
  • Meaning: Each key is a symbol representing the name of a function the student has to write. The value (a string) is the heading for that question in the output.

short-names

  • Type: Hash Table mapping symbols to strings
  • Meaning: Similar to question-names. Each key is a symbol representing the name of a function the student has to write. The value is a string, and it's used when printing the number of testcases the student has written.

required-testcases

  • Type: An association list where keys are symbols and values are (listof tc). So the type of required-testcases is (listof (list Symbol (listof tc))).
  • Meaning: Each key is a symbol representing the name of a function the student has to write. The value is a list of required testcases for that function. You do not have to populate the (listof tc) in the definition of required-testcases; you will add the required testcases using the add function.

add

  • Type: Function
  • Meaning: The add function is used to add a testcase that the students should have. add consumes a symbol (name of the function this testcase is for), and a tc structure. The tc structure stores information about a required testcase. It has two fields:

  1. a string which will be displayed if the testcase is missing, and
  2. a predicate which consumes the same things as the function the student writes, and produces true if that testcase is met.
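Here is a hypothetical sketch of how required-testcases and add fit together. The real add in demo.rkt may be implemented differently. This version rebuilds the association list and appends the new tc to the entry for the given function.

```racket
#lang racket
;; Hypothetical sketch of required-testcases and add (not the demo.rkt code).
(define-struct tc (desc has?) #:transparent)

;; (listof (list Symbol (listof tc))): one entry per function, initially empty.
(define required-testcases
  (list (list 'sum-lon empty)
        (list 'my-equal? empty)))

;; add : Symbol tc -> void
;; Appends t to the required testcases for fcn, keeping insertion order.
(define (add fcn t)
  (set! required-testcases
        (for/list ([entry (in-list required-testcases)])
          (if (eq? (first entry) fcn)
              (list fcn (append (second entry) (list t)))
              entry))))

(add 'sum-lon (make-tc "Missing empty case" empty?))
(add 'sum-lon (make-tc "Missing test with a list of length > 1"
                       (lambda (lon) (> (length lon) 1))))

(map tc-desc (second (assoc 'sum-lon required-testcases)))
;; => '("Missing empty case" "Missing test with a list of length > 1")
```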

valid-input-checker

  • Type: Hash Table mapping symbols to predicates (the XXX/valid? functions)
  • Meaning: Each key is a symbol representing the name of a function the student has to write. The value is a function which is used to determine if a check-expect is valid. The function consumes a list representing the first argument of check-expect (the function name followed by the evaluated arguments). For example, if the student writes (check-expect (sum-lon (list 1 2 3)) 6), then the function will consume the list '(sum-lon (1 2 3)). It will produce true if this testcase is valid, and false otherwise.

Summary of Variables

Variable | Type | Meaning
--- | --- | ---
timeout | Positive Integer | Roughly how much time the student's code is given to complete. The timeout and memory should be set very high.
memory | Positive Integer | Roughly how much memory the student's code can use. The timeout and memory should be set very high.
modules | List of Strings | A list of teachpacks that the students can use. modules should also include any files that students can include with (require ...).
bonuses | List of Symbols | A list of symbols, where each symbol is the name of a function for a bonus question.
summary-line | String | Printed right before the total number of tests that are missing.
language-level | String | The language level the assignment is in, picked by entering the index (argument to list-ref) of the language you want.
get-fcns-from-eval | Syntax | List the functions which are defined in the student's code and which you want to use in demo.rkt.
question-names | Hash table mapping symbols to strings | Each key is a symbol representing the name of a function the student has to write. The value is a string, and it's the heading for that question in the output.
short-names | Hash table mapping symbols to strings | Each key is a symbol representing the name of a function the student has to write. The value is a string, and it's used when printing the number of testcases the student has.
required-testcases | Association list | Each key is a symbol representing the name of a function the student has to write. The value is a list of required testcases for that function.
add | Function | Adds a testcase that the students should have. add consumes a symbol (name of a function the student has to write) and a tc structure describing the required testcase.
valid-input-checker | Hash table mapping symbols to predicates | Each key is a symbol representing the name of a function the student has to write. The value is a function which is used to determine if a check-expect is valid.

Running on All Students

For a real assignment, you should use a more meaningful name than demo.rkt. For example, say you name the file a10.rkt. To run a10.rkt on all the students, you can use a Bash script similar to the following:

#!/bin/bash

base='/u/cs135/check-testcases/a10'

# Output will be saved here
results='/u/cs135/check-testcases/a10/test-results'

# -p: do not fail if the folder already exists (e.g. on a re-run)
mkdir -p "${results}"

cd /u/cs135/handin/a10_autotest/ || exit 1
for stud in *; do
   echo "Doing work for ${stud}"
   racket "${base}/a10.rkt" "${stud}/skyscrapers.rkt"     \
          1> "${results}/${stud}_missing_testcases.txt"   \
          2> "${results}/${stud}_errors.txt"
done

This will create a lot of empty XXX_errors.txt files, where XXX is a student's Quest ID. You can remove the empty files with the Python script remove_empty_files.py attached below. This script takes one argument, which is a path to a folder, and removes all empty files (i.e. files with a size of 0 bytes) from the folder.

cs135@linux028:~/check-testcases$ python remove_empty_files.py /u/cs135/check-testcases/a10/test-results/
Cleaning up /u/cs135/check-testcases/a10/test-results/
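For reference, the same clean-up can be sketched in Racket. This is not the attached remove_empty_files.py, only a version of what it is described as doing. It deletes every regular file of size 0 bytes directly inside the given folder and leaves subdirectories alone. Whether the Python script recurses into subdirectories is not stated.

```racket
#lang racket
;; Sketch of the described clean-up (not the attached Python script).
;; remove-empty-files : path-string -> void
(define (remove-empty-files dir)
  (printf "Cleaning up ~a\n" dir)
  (for ([p (in-list (directory-list dir #:build? #t))]
        ;; Only regular files of size 0; subdirectories are skipped.
        #:when (and (file-exists? p) (zero? (file-size p))))
    (delete-file p)))
```

From a Racket REPL, (remove-empty-files "/u/cs135/check-testcases/a10/test-results/") should remove the empty XXX_errors.txt files described above.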

Fixing Permission Problems

Occasionally, you may get permission errors. The sandbox environment is very strict, and when the students' code is run in sandbox, their code cannot access anything unless you allow it. An example of a permission error is:

FATAL ERROR occured with file studentscode.rkt: #(struct:exn:fail
file-or-directory-modify-seconds: `read' access denied for 
/u3/cs135/check-testcases/temp/imagedata.rkt #<continuation-mark-set>)

To give the students' code permission to read files (such as teachpacks and provided files), add the appropriate permissions to the sandbox-path-permissions variable.
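Here is a sketch of granting read access. sandbox-path-permissions is a parameter from racket/sandbox whose entries are (list mode path) pairs. The sandbox uses its value when the evaluator is created, so it is extended around the make-module-evaluator call. make-evaluator-with-read-access is a hypothetical helper. The temporary directory stands in for the folder holding the teachpacks and provided files (for the error above, /u3/cs135/check-testcases/temp/).

```racket
#lang racket
;; Sketch: let sandboxed code read files under one directory.
(require racket/sandbox)

;; Hypothetical helper: create a module evaluator that may read under dir.
(define (make-evaluator-with-read-access module-decl dir)
  (parameterize ([sandbox-path-permissions
                  (cons (list 'read dir) (sandbox-path-permissions))])
    (make-module-evaluator module-decl)))

;; Stand-in for the teachpack folder, with one data file in it.
(define data-dir (make-temporary-file "teachpacks~a" 'directory))
(define data-file (build-path data-dir "imagedata.txt"))
(with-output-to-file data-file (lambda () (displayln "hello")))

(define e
  (make-evaluator-with-read-access
   '(module student racket/base
      (define (first-line p) (call-with-input-file p read-line)))
   data-dir))

(e `(first-line ,(path->string data-file))) ; => "hello"
```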

Notes

  • For the lambda assignment, if the question asks the student to create a function that produces a function, as of W14 there is no way to test questions like this.
Topic attachments
  • check-testcases-a10.zip (25.8 K, 2013-01-01 13:39, YiLee): CS135 Fall 2012 A10 example
  • check-testcases-demo.zip (8.6 K, 2013-01-01 13:34, YiLee): Demo Example
  • remove_empty_files.py.txt (0.3 K, 2013-01-01 14:58, YiLee): A script that removes all empty files from a folder/directory.
Topic revision: r8 - 2014-04-08 - ScottFoggo