CheckTestcases

In Fall 2012, CS 135 tried using a script to check the students' check-expects. The script checks if the student has thoroughly tested their code and if they have the required testcases. This page is about that script.

Background Information

In the assignment marking scheme, there is typically a list of the testcases students should have. For example, if students have to write a function sum-lon to add a list of numbers, the required testcases might be:

An empty list
A list of length 1
A list of length > 1

Checking whether students have these testcases can be done automatically (instead of manually by the marker).

The script works by re-defining check-expect (and check-within and check-error) using Racket's define-syntax. It then runs the student's code in a sandbox environment. After the student's code is run, you will have access to a list containing the student's check-expects.
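The redefinition described above might look roughly like the following. This is a minimal sketch under assumptions, not the code in the actual collector scripts: the names collected and recording-check-expect are made up for illustration, and the real collectors also cover check-within and check-error.

```racket
#lang racket
;; Hypothetical names throughout; the real collector differs.

;; Collected test cases, most recent first.
(define collected '())

;; Keep the function name as a symbol, evaluate only the arguments of the
;; tested call, and discard everything after the first argument unread.
(define-syntax-rule (recording-check-expect (fcn arg ...) ignored ...)
  (set! collected (cons (list 'fcn arg ...) collected)))

(recording-check-expect (sum-lon (list 1 2 3)) 6)
(recording-check-expect (undefined-fcn (+ 45 2)) totally syntax error! (add1) (/ 1 0))

collected
;; '((undefined-fcn 47) (sum-lon (1 2 3)))
```

Because the tested function is never called and the remaining arguments are never expanded, neither the undefined function nor the division by zero causes an error in this sketch.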

Some information about the script's behaviour:

The script does not care whether the check-expects pass or not. It only uses the first argument to check-expect; it ignores all other arguments completely. For example, if a student writes:

(check-expect (undefined-fcn (+ 45 2)) totally syntax error! (add1) (/ 1 0))

the script can still get at the 47, and do whatever it needs to do with it.

The script does not care where check-expects are placed. For CS135, examples and tests are both written with check-expects, and the script cannot differentiate between examples and tests.
The script does not check if helper functions have check-expects.
In many ways, the script is as strict as the autotest. If the student's code contains syntax errors outside of check-expect, then the script won't work (the sandbox will be unable to load the student's code). In that case, the script outputs nothing and the marker has to read the student's check-expects manually. The script will not incorrectly say every test case is missing; it just quits quietly.
The script only checks one of the student's files at a time. If students have to submit more than one file, you can create several versions of the script (one version for each file the student has to submit).

Examples

Attached here are two example testcase files. By convention, these files are named questionname-tc.rkt. All of these files are then put in check-testcases/aX/test-cases, where aX names the assignment (for example, a09).

redux-tc.rkt super-foldr-tc.rkt

Each question in an assignment should have its own testcase file, and all subquestions should have their own "add" call designated to them (add is explained in more detail lower on the page). All question names should then be added to a file-list.txt which is in the same folder, in the order the questions appear. If an assignment has questions name1, name2, and name3, there should be a file-list with contents "name1 name2 name3".

Refactored tc-lib update

Between Fall 2018 and Winter 2019, Paul Nijjar moved all of the common code in each test case checking file into a common tc-lib.rkt. The refactored files can be found in ~/check-testcases/aXY/f18_test_refactored, while the old ones are in the f18_test folder.

On the filesystem, the new scripts (tc-lib.rkt and the collector scripts) live in ~/check-testcases/factored/common .

The refactored scripts are stored in the ISG gitlab repo here: https://git.uwaterloo.ca/isg/racket-check-testcases .

Work for improving the shell scripts has also been done in run-student-flags.sh. For now, just stick to run-all.sh.

Producing input file for set_marks_csv

Between Fall 2018 and Winter 2019, ISAs and Nick Lee made adjustments to the tc-lib.rkt mentioned above to allow for a .csv to be produced that set_marks_csv can use. Since all the code is embedded in the Racket files, this feature is automatically available when run-all.sh or run-student.sh is executed. When a particular student's test cases are checked, there is additional code to write to a file called output.csv located in the same folder as the script (most likely f18_test_refactored). The only extra work is to actually run set_marks_csv using output.csv. You should check that output.csv looks correct and has nothing like "CATEGORY MISSING" anywhere.

There is no code to clear out output.csv when check-testcases is re-run multiple times. It only gets cleared if run-all.sh is run with the "replace" flag, but if it's run with "skip" (the default flag) and a different folder in the -output flag, the new rows will get appended to the old output.csv and it might get unnecessarily big. You can manually delete output.csv if you are doing a "fresh" check-testcases run and are expecting to re-run on every single student.

Tasks for Each Assignment

For each assignment, you have to edit the "Assignment Specifics" section of demo.rkt (you can rename demo.rkt if you want). Here is a description of the variables and functions you'll use. For examples, see demo.rkt.

Useful Type Checking

1. Define your own "member?" in tc files, since full Racket does not provide it.

   (define (member? n lst)
     (ormap (lambda (i) (equal? n i)) lst))

2. Check valid (listof Str) inputs:

   (and ...
        (list? (... fcn-app))
        (andmap string? (... fcn-app))
        ...)

3. One way to check valid BST inputs using the flatten method (assuming the leaves are Sym):

   (define (nat? n)
     (and (integer? n) (>= n 0)))

   ;; A tree is either a leaf (a symbol) or a valid node.
   (define (tree? x)
     (or (symbol? x)
         (and (node? x)
              (node/valid? x))))

   ;; Every value on the left must be smaller than the node's value,
   ;; and every value on the right must be larger.
   (define (node/valid? t)
     (and (node? t)
          (nat? (node-val t))
          (tree? (node-left t))
          (andmap (lambda (i) (< i (node-val t))) (flatten (node-left t)))
          (tree? (node-right t))
          (andmap (lambda (i) (> i (node-val t))) (flatten (node-right t)))))

   ;; Produces the node values of t in order (leaves contribute nothing).
   (define (flatten t)
     (cond [(symbol? t) empty]
           [(node? t) (append (flatten (node-left t))
                              (list (node-val t))
                              (flatten (node-right t)))]))

List of Variables, Functions, and Structures

(define-struct tc (desc has?) #:transparent)

This structure holds information about a required testcase. desc is a string which will be printed if the student is missing the testcase. has? is a function that:

consumes the same thing(s) as the function the student writes, and
produces true if the testcase is met, and false otherwise.

timeout and memory

Type: Positive Integer
Meaning: Roughly how much time and memory the student's code can use up. timeout and memory should be set very high.

modules

Type: (listof (anyof Str (list (anyof 'except-in 'only-in) Str Sym Sym ...)))
Meaning: A list of filenames of the teachpacks that the students can use. modules should also include the names of any files that students can include with (require ...). Put teachpacks and provided files in the same folder as the main script.
Update Fall 2018: There was an issue with A08 from F17 where students were allowed to require their own files (namely, they could require "ranking.rkt" in their "match.rkt"). Many students defined two constants, "students" and "employers", in both files. This broke the marking scripts because Racket complained of duplicate definitions. To get around this, we expanded the functionality of modules to allow regular filenames (as before) and filenames with particular functions/constants excluded or included. Note that you must use the 'common-collector-simple collector in collector-file below for this to work.
For example: (define modules '("a08lib.rkt" (except-in "ranking.rkt" employers students))) will require "a08lib.rkt" as usual, and require "ranking.rkt" except for the identifiers "employers" and "students".

bonuses

Type: List of Symbols
Meaning: A list of symbols, where each symbol is the name of a function for a bonus question. Missing testcases for bonus functions are not added to the total count.

summary-line

Type: String
Meaning: summary-line will be printed right before the total number of tests that are missing. This is almost never used.

language-level

Type: String
Meaning: Pick the language level the assignment is in. Enter a number (argument to list-ref) to pick a language.

collector-file

Type: String or symbol
Meaning: Indicates which collector file should be used for this question. There are three types of collector file, and you can copy one of these and modify it as necessary. The three existing ones are:
- 'common-collector-ignore-stuff : corresponds to "collector-ignore-stuff.rkt". This is usually what you want to use. It collects test cases and also rewrites templates so that they will not trigger black highlighting warnings. It doesn’t work with locals and lambdas though. So, if there is a template + locals/lambdas in the same file then black highlighting cannot be checked.

- 'common-collector-simple : corresponds to "collector-simple.rkt". This is a barebones collector that just collects the test cases without trying to ignore templates. You can use it most of the time if students are not submitting templates. Note that if a file contains both locals and templates, both the ignore-stuff and simple collectors may cause problems. This collector is necessary when including or excluding functions from imported modules, because the usual "collector-ignore-stuff.rkt" collector redefines define, and so does racket/base (which is necessary for the enhanced functionality). See the modules parameter above.

- 'common-collector-count-things : the most complicated collector. Ignores templates and also tallies up the different types of functions in student code. Breaks if students use local. This collector offers the possibility of listing functions students wrote in their code, so you can assess the quality and quantity of their helper function names.

get-fcns-from-eval

Type: Syntax
Meaning: get-fcns-from-eval is defined using define-syntax near the top of the script. Give get-fcns-from-eval a list of functions which are in the student's code and which you want to use in the script. For example, if the student defines a structure (define-struct s (a b)), and you want to use s-a in demo.rkt, give s-a as an argument to get-fcns-from-eval.

debugging

Type: boolean
Meaning: enable debugging in the script. The two main things this does are: produce a list of all the test cases the script found, and produce a copy of the munged student code (with the collector required) in the script folder.

question-names

Type: Hash Table mapping symbols to strings
Meaning: Each key is a symbol representing the name of a function the student has to write. The value (a string) is the heading for that question in the output.

short-names

Type: Hash Table mapping symbols to strings
Meaning: Similar to question-names. Each key is a symbol representing the name of a function the student has to write. The value is a string, and it's used when printing the number of testcases the student has written.

required-testcases

Type: An association list where keys are symbols and values are (listof tc). So the type of required-testcases is (listof (list Symbol (listof tc))).
Meaning: Each key is a symbol representing the name of a function the student has to write. The value is a list of required testcases for that function. You do not have to populate the (listof tc) in the definition of required-testcases; you will add the required testcases using the add function.

add

Type: Function
Meaning: The add function is used to add a testcase that the students should have. add consumes a symbol (name of the function this testcase is for), and a list of tc structures. A tc structure stores information about a required testcase. It has two fields:

a string which will be displayed if the testcase is missing, and
a predicate which consumes the same things as the function the student writes, and produces true if that testcase is met.
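As an illustration, the three required testcases for sum-lon listed under Background Information could be written as tc structures along these lines. This is only a sketch: the name sum-lon-tcs is made up, and the commented call to add assumes the signature described above (a symbol and a list of tc structures).

```racket
#lang racket
;; Same structure definition as in the script.
(define-struct tc (desc has?) #:transparent)

;; Required testcases for the sum-lon example.
;; Each predicate consumes what sum-lon consumes: one list of numbers.
(define sum-lon-tcs
  (list (make-tc "An empty list" (lambda (lon) (empty? lon)))
        (make-tc "A list of length 1" (lambda (lon) (= (length lon) 1)))
        (make-tc "A list of length > 1" (lambda (lon) (> (length lon) 1)))))

;; In a testcase file this list would be handed to add, for example
;; (add 'sum-lon sum-lon-tcs), assuming the signature described above.

;; Trying the predicates by hand:
((tc-has? (first sum-lon-tcs)) '())        ; #t
((tc-has? (second sum-lon-tcs)) '(1 2 3))  ; #f
```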

valid-input-checker

Type: Hash Table mapping symbols to predicates (the XXX/valid? functions)
Meaning: Each key is a symbol representing the name of a function the student has to write. The value is a function which is used to determine if a check-expect is valid. The function consumes a list representing the first argument of check-expect. For example, if the student writes (check-expect (sum-lon (list 1 2 3)) 6) then the function will consume the list '(sum-lon (1 2 3)), in which the function name is a symbol and the arguments have already been evaluated. It will produce true if this testcase is valid, and false otherwise.

Summary of Variables

Variable	Type	Meaning
timeout	Positive Integer	Roughly how much time the student's code is given to complete. The timeout and memory should be set very high.
memory	Positive Integer	Roughly how much memory the student's code can use. The timeout and memory should be set very high.
modules	List of String, or (list Sym Str Sym Sym...)	A list of teachpacks that the students can use. modules should also include any files that student can include with (require ...).
bonuses	List of Symbol	A list of symbols, where each symbol is the name of a function for a bonus question.
summary-line	String	summary-line will be printed right before the total number of tests that are missing.
language-level	String	Pick the language level the assignment is in. Enter a number (argument to list-ref) to pick a language.
get-fcns-from-eval	Syntax	List out the functions which are defined in the student's code, and which you want to use in demo.rkt.
question-names	Hash Table mapping symbols to strings	Each key is a symbol representing the name of a function the student has to write. The value is a string, and it's the heading for that question in the output.
short-names	Hash Table mapping symbols to strings	Each key is a symbol representing the name of a function the student has to write. The value is a string, and it's used when printing the number of testcases the student has.
required-testcases	Association list	Each key is a symbol representing the name of a function the student has to write. The value is a list of required testcases for that function.
add	Function	The add function is used to add a testcase that the students should have. add consumes a symbol (name of a function the student has to write), and a tc structure describing the required testcase.
valid-input-checker	Hash Table mapping symbols to predicates	Each key is a symbol representing the name of a function the student has to write. The value is a function which is used to determine if a check-expect is valid.
collector-file	String or Symbol	Which version of the collector will be used to tally test cases.
debugging	Boolean #t or #f	Enable some additional output for a student submission

Running on All Students

For a real assignment, you should use a more meaningful name than demo.rkt. For example, say you name the file a10.rkt. To run a10.rkt on all the students, you can use a Bash script similar to the following:

#!/bin/bash

base='/u/cs135/check-testcases/a10'

# Output will be saved here
results='/u/cs135/check-testcases/a10/test-results'

# -p: do not fail if the folder already exists
mkdir -p "${results}"

# Stop if the handin folder is missing, so nothing runs in the wrong place
cd /u/cs135/handin/a10_autotest/ || exit 1
for stud in *; do
   echo "Doing work for ${stud}"
   racket "${base}/a10.rkt" "${stud}/skyscrapers.rkt"       \
          1> "${results}/${stud}_missing_testcases.txt"   \
          2> "${results}/${stud}_errors.txt"
done

This will create a lot of empty XXX_errors.txt files, where XXX is a student's Quest ID. You can remove the empty files with the Python script remove_empty_files.py attached below. This script takes one argument, which is a path to a folder, and removes all empty files (ie files with a size of 0 bytes) from the folder.

cs135@linux028:~/check-testcases$ python remove_empty_files.py /u/cs135/check-testcases/a10/test-results/
Cleaning up /u/cs135/check-testcases/a10/test-results/

Fixing Permission Problems

Occasionally, you may get permission errors. The sandbox environment is very strict, and when the students' code is run in sandbox, their code cannot access anything unless you allow it. An example of a permission error is:

FATAL ERROR occured with file studentscode.rkt: #(struct:exn:fail
file-or-directory-modify-seconds: `read' access denied for 
/u3/cs135/check-testcases/temp/imagedata.rkt #<continuation-mark-set>)

To give students permission to read files (such as teachpacks and provided files), add the appropriate permissions to the sandbox-path-permissions variable.
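For reference, Racket's sandbox library exposes sandbox-path-permissions as a parameter holding a list of (list mode path) entries. The sketch below shows one way read access to a folder could be granted when an evaluator is created. It assumes the folder from the error message above, and it is not how the script itself sets the permissions, which this page does not show.

```racket
#lang racket
(require racket/sandbox)

;; Assumption: the provided files live in this folder (taken from the
;; error message above); adjust it to the real location.
(define provided-dir "/u3/cs135/check-testcases/temp")

(define e
  (parameterize ([sandbox-path-permissions
                  ;; Add read access on top of the default permissions.
                  (cons (list 'read provided-dir)
                        (sandbox-path-permissions))])
    ;; An inline module stands in for the student's file.
    (make-module-evaluator
     '(module student racket/base
        (define (f x) x)))))

(e '(f 2))
```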

Special Cases

1. Pre-defined Selectors

Sometimes the selectors of a structure are already defined in full Racket. In this case, use struct->vector and vector-ref to define a new function that extracts the specific field of the structure.

For example, suppose there is a structure File with the data definition

(define-struct file (name size owner))
;; A File is a (make-file Str Nat Sym)

file-size is a built-in function in full Racket. You can define the following function, which works the same as the file-size selector.

;; my-file-size: File -> Nat
(define (my-file-size f)
  (vector-ref (struct->vector f) 2))

Index 2 selects the second field (size), because index 0 of the vector produced by struct->vector holds the structure's name.
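To see the vector layout that this indexing relies on, the following sketch can be run in full Racket. The #:transparent option and the sample field values are assumptions added so that the sketch runs on its own and struct->vector can see the fields.

```racket
#lang racket
;; #:transparent is an assumption made so this sketch runs in full Racket;
;; it lets struct->vector see the fields.
(define-struct file (name size owner) #:transparent)

(define sample (make-file "notes.txt" 120 'alice))

(struct->vector sample)
;; '#(struct:file "notes.txt" 120 alice)

;; my-file-size: File -> Nat
(define (my-file-size f)
  (vector-ref (struct->vector f) 2))

(my-file-size sample)
;; 120
```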

2. Requiring Files

Sometimes students need to require other files in their solutions. Put these required files in the same folder as [file name]-tc.rkt. Also, if these required files are in a teaching language, you may want to remove #lang racket and put the following header at the beginning of each file:

;; The first three lines of this file were inserted by DrRacket. They record metadata
;; about the language level of this file in a form that our tools can easily process.
#reader(lib "htdp-intermediate-lambda-reader.ss" "lang")((modname [file-name]) (read-case-sensitive #t) (teachpacks ()) (htdp-settings #(#t constructor repeating-decimal #f #t none #f () #t)))

The reader name (htdp-intermediate-lambda-reader.ss in this example) selects the language the file uses. Set it to whichever teaching language is relevant to the assignment.

Unfilled Rubric Criteria Issues

If students have unfilled Tests/Cases or Highlighting rubric criteria in MarkUs after you run the MarkUs script that fills in the check-testcases results, one of the following may be the cause:

- Check whether the student's code has syntax or run-time errors: copy the code into DrRacket and see if it reports any. If it does, you can give 0 to every question in the file that contains the error.

- If the file runs but has a black highlighting issue, the student's code may contain run-time errors in locally defined code that the student never tests (through check-expect/within/error). Check test-result to see whether this is the case. If it is, you may have set a non-simple collector in [filename]-tc.rkt; use the simple collector if the questions don't ask for templates. If many students have this error, you may need to re-run autotesting on check testcases, re-make AUTOTESTING.ss, and re-auto fill marks.

- If none of the situations above applies, your testcase file (tc.rkt) may be missing a condition, which causes an error while the check-testcases script runs. A common example is forgetting to check list? before using andmap/ormap.

- If the student's file runs but contains something unusual (for example, incorrect structure names), ask the instructors what to do next.

Some Extra Information:

If students write check-expect/check-within with the arguments in the wrong order, as in (check-expect expected-value (function args)), this usually does not cause problems. However, the script ignores tests of the form

(check-expect some-list/structure/symbol (function args))

Tests of the following form should still be collected by the script:

(check-expect Num (function args ...))

Notes

For the lambda assignment, if the question asks the student to create a function that produces a function, as of W14 there is no way to test questions like this.
For any assignment that uses the teaching languages Beginning Student or Beginning Student with List Abbreviations, students who name a function parameter or constant "time" will cause the coverage scripts to crash, even though their own program runs fine. ISAs might need to let instructors know about this so that they can avoid assignment questions that deal with time, or ISAs might have to find another way around this (e.g. rename all of the "time" parameters, or grade the test cases manually).

What the Script Does in Detail (Outdated)

(This is extremely outdated, and not exactly how the current scripts work)

Let's trace through the demo example. There are two scripts, demo.rkt which is the main script, and collector.rkt, which is a helper script. You can rename demo.rkt if you want.

demo.rkt needs one argument: the path to the student's file. It will first create a modified copy of the student's code, and save the modified copy as .check-testcases-tmp-file.rkt. This temporary file will be saved in the same directory as the script.

The file .check-testcases-tmp-file.rkt is the same as the student's code, except:

The DrRacket heading (the first three lines that DrRacket adds) is changed. The script removes installed teachpacks and forces student to use the correct language level. The constant new-header defines the new header.
(require "collector.rkt") is added at the top.

collector.rkt re-defines check-expect (and check-within and check-error) using Racket's define-syntax, and provides the re-defined check-expect. The script also defines and provides two functions: add-a-testcase and get-all-testcases.

add-a-testcase consumes a list representing a check-expect test, and adds it to the front of the list testcases. The function produces (void). For example, (add-a-testcase '(check-expect (add1 4) 5)) will add the list '(check-expect (add1 4) 5) to the front of testcases. The script collector.rkt re-defines check-expect such that the new check-expect will call add-a-testcase.

get-all-testcases produces testcases, the list of all the testcases that have been added. You can use this function to access the student's testcases. (get-all-testcases) contains the added testcases in the reverse order that they were added. For example:


> (get-all-testcases)
'()
> (add-a-testcase '(check-expect (symbol? 'one) true))
> (add-a-testcase '(check-expect (symbol? 'two) true))
> (get-all-testcases)
'((check-expect (symbol? 'two) true) (check-expect (symbol? 'one) true))

Once the modified file .check-testcases-tmp-file.rkt is created, demo.rkt runs this file using make-module-evaluator. make-module-evaluator produces an evaluator, which is a function that consumes a list representing a Racket expression, and evaluates it in the context of the student's code. For example, if the produced evaluator is called e, then (e '(+ 1 2)) will produce 3. It will evaluate (+ 1 2) using the student's code. As another example, if the student defines a function (define (f x) x), then (e '(f 2)) will produce 2. It does not matter whether f is defined in demo.rkt or not, and (e '(f 2)) will use the f defined by the student.
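The evaluator behaviour described above can be tried with a small stand-alone sketch. Here an inline module stands in for .check-testcases-tmp-file.rkt (an assumption made to keep the example self-contained); make-module-evaluator comes from racket/sandbox.

```racket
#lang racket
(require racket/sandbox)

;; An inline module stands in for .check-testcases-tmp-file.rkt here.
(define e
  (make-module-evaluator
   '(module student racket/base
      (define (f x) x))))

(e '(+ 1 2)) ; 3
(e '(f 2))   ; 2, using the f defined inside the sandboxed module
```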

demo.rkt calls (e '(get-all-testcases)), which produces a list of the student's testcases in reverse order. This list is stored in the variable student-testcases-list. For the demo example, student-testcases-list looks like:


(list '(bonus-fcn 0 0) 
      (list 'my-equal? (iris 1 2 3 4) 'blueberry)
      '(my-equal? #t #f)
      '(my-equal? "a" "b")
      '(my-equal? #\a #\b)
      '(my-equal? sym1 symb2)
      '(my-equal? (1 2 (3 4)) (#t "abcdef" #\u #t #f ok))
      (list 'my-equal? (list (posn 23 23)) (list (posn 3 4)))
      '(my-equal? (a b c) #\c)
      (list 'my-equal? (posn 0 0) (posn 0 0))
      (list 'my-equal? (posn 0 0) 42)
      '(sum-lon (3.141592653589793))
      '(sum-lon "not a list")
      '(sum-lon (5 4))
      '(sum-lon (1 2 3))
      '(/ 1 0))

There is a one-to-one correspondence between student-testcases-list and the student's testcases.

demo.rkt takes student-testcases-list and filters out "bad" testcases, such as (check-expect (/ 1 0) 'inf). It also filters out tests where the inputs violate the function's contract. For example, if students have to write a function sum-lon to add a list of numbers, and they write the test

(check-expect (sum-lon "not a list") 'error)

then this test will be filtered out. Whether a testcase is filtered out or not is defined by the XXX/valid? functions where XXX is the name of a function the student has to write. The XXX/valid? functions consume a list representing the first argument to check-expect. For example, if the student writes (check-expect (sum-lon (list 1 2 3)) 6) then sum-lon/valid? will consume the list '(sum-lon (1 2 3)), in which the function name is a symbol and the arguments have already been evaluated. The XXX/valid? functions should produce true if the inputs are valid, and false otherwise. For example, for sum-lon, the validity checker is


(define (sum-lon/valid? fcn-app)
  (and (= 2 (length fcn-app))
       (equal? (first fcn-app) 'sum-lon)
       (list? (second fcn-app))
       (andmap number? (second fcn-app))))

For example, (sum-lon/valid? '(sum-lon "not a list")) is false, so the test case (check-expect (sum-lon "not a list") 'error) will be filtered out and ignored.

The filtered testcases (ie the valid testcases) are stored in the hash table student-testcases. In this hash table, keys are Symbols representing the function name, and the values are lists of lists. The inner lists are list of arguments passed to the function being tested. For the demo example, the hash table would look like:


(hash 'sum-lon '(((1 2 3)) 
                 ((5 4)) 
                 ((3.141592653589793)))
      'my-equal? (list (list (posn 0 0) 42)
                       (list (posn 0 0) (posn 0 0))
                       '((a b c) #\c)
                       (list (list (posn 23 23)) (list (posn 3 4)))
                       '((1 2 (3 4)) (#t "abcdef" #\u #t #f ok))
                       '(sym1 symb2)
                       '(#\a #\b)
                       '("a" "b")
                       '(#t #f)
                       (list (iris 1 2 3 4) 'blueberry))
      'bonus-fcn '((0 0)))

Note that the bad testcase (check-expect (sum-lon "not a list") 'error) is not in the hash table, but it is in student-testcases-list, because the testcase has been filtered out.
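The filtering and grouping steps described above can be sketched as follows. This is an illustration under assumptions rather than the script's own code: checkers and valid-testcase? are made-up names, and only the sum-lon and (/ 1 0) entries of the demo list are used.

```racket
#lang racket
;; Validity checker from the text.
(define (sum-lon/valid? fcn-app)
  (and (= 2 (length fcn-app))
       (equal? (first fcn-app) 'sum-lon)
       (list? (second fcn-app))
       (andmap number? (second fcn-app))))

;; Hypothetical lookup table in the spirit of valid-input-checker.
(define checkers (hash 'sum-lon sum-lon/valid?))

;; A collected test is kept only if its function has a checker and the
;; checker accepts it.
(define (valid-testcase? fcn-app)
  (define check (and (pair? fcn-app) (hash-ref checkers (first fcn-app) #f)))
  (and check (check fcn-app)))

;; The sum-lon and (/ 1 0) entries of student-testcases-list above.
(define student-testcases-list
  (list '(sum-lon (3.141592653589793))
        '(sum-lon "not a list")
        '(sum-lon (5 4))
        '(sum-lon (1 2 3))
        '(/ 1 0)))

;; Group the argument lists of the valid tests by function name.
(define student-testcases
  (for/fold ([table (hash)])
            ([fcn-app (in-list (filter valid-testcase? student-testcases-list))])
    (hash-update table (first fcn-app)
                 (lambda (args) (cons (rest fcn-app) args))
                 '())))

(hash-ref student-testcases 'sum-lon)
;; '(((1 2 3)) ((5 4)) ((3.141592653589793)))
```

Adding each argument list to the front of its entry reverses the collected order, which is why the result matches the 'sum-lon entry of the hash table shown above.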

Once the hash table is made, the script will check which testcases are missing. It follows this algorithm:


for each required testcase T
{
    meet_testcase = false;
    for each of the student's check-expect ce
    {
        if ce satisfies T
        {
            meet_testcase = true; // the student has testcase T
            break;
        }
    }
    
    if ( ! meet_testcase)
    {
        print "Test case " + T + " not met";
    }
}
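The same loop can be written compactly in Racket. The sketch below assumes the tc structure described earlier and the list-of-argument-lists layout of the student-testcases hash table; the function name missing-testcases is made up for illustration.

```racket
#lang racket
(define-struct tc (desc has?) #:transparent)

;; missing-testcases: (listof tc) (listof (listof Any)) -> (listof Str)
;; Produces the descriptions of the required testcases that none of the
;; student's argument lists satisfy.
(define (missing-testcases required student-args)
  (for/list ([t (in-list required)]
             #:unless (ormap (lambda (args) (apply (tc-has? t) args))
                             student-args))
    (tc-desc t)))

(define required
  (list (make-tc "An empty list" empty?)
        (make-tc "A list of length 1" (lambda (lon) (= (length lon) 1)))
        (make-tc "A list of length > 1" (lambda (lon) (> (length lon) 1)))))

;; The sum-lon entry of the student-testcases hash table shown earlier.
(missing-testcases required '(((1 2 3)) ((5 4)) ((3.141592653589793))))
;; '("An empty list")
```

ormap stops at the first argument list that satisfies the predicate, which plays the role of the break in the pseudocode.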

Required testcases are added using the add function.

After all required testcases are checked, demo.rkt will print how many testcases the student has missed in total. This total does not include missing testcases for bonus questions.

demo.rkt then prints how many distinct check-expects the student has for the function fcn using (length (remove-duplicates (hash-ref student-testcases fcn empty))).

Finally, demo.rkt prints whether the student's file is covered or not, and then quits.

Attachments

Topic attachments
I	Attachment	History	Action	Size	Date	Who	Comment
zip	check-testcases-a10.zip	r2 r1	manage	25.8 K	2013-01-01 - 13:39	YiLee	CS135 Fall 2012 A10 example
zip	check-testcases-demo.zip	r2 r1	manage	8.6 K	2013-01-01 - 13:34	YiLee	Demo Example
rkt	redux-tc.rkt	r1	manage	8.2 K	2020-12-23 - 11:39	AdamMehdi
txt	remove_empty_files.py.txt	r1	manage	0.3 K	2013-01-01 - 14:58	YiLee	A script that removes all empty files from a folder/directory.
rkt	super-foldr-tc.rkt	r1	manage	8.7 K	2020-12-23 - 11:39	AdamMehdi

Topic revision: r16 - 2021-12-16 - XiyuChen

Instructional Support Group, David R. Cheriton School of Computer Science, University of Waterloo