RSTPublicTests Suggested Improvements

  • There seems to be an implicit nice applied when public tests are launched, probably due to nohup; however, this nice should be explicit, as in nohup nice +6 ....
  • Request logs are currently not implemented. The runners should append an entry to the log files for each request, caching the timestamp of the request file before deleting it.
  • Arguably, the public test runner should e-mail only a brief notice such as “Public test runner $identifier reported errors, which may have resulted in incorrect public test results or prevented tests from completing or being e-mailed to you. Please try again, and contact course staff if the errors persist.” instead of dumping the actual error output, which course staff can retrieve from the log files on request.
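
The request-log suggestion above could be sketched roughly as follows. The function name, the log line format, and the use of `stat` are assumptions made for illustration; none of them come from the existing runner code.

```shell
# Hypothetical helper, not part of the existing public test scripts.
# Usage: log_and_remove_request REQUEST_FILE LOG_FILE
log_and_remove_request() {
    local request_file=$1 log_file=$2 mtime

    # Cache the request file's modification time (seconds since the epoch)
    # BEFORE deleting it. GNU stat is tried first, then BSD stat; a system
    # with neither would need another approach.
    mtime=$(stat -c %Y "$request_file" 2>/dev/null ||
            stat -f %m "$request_file" 2>/dev/null) || return 1

    rm -f -- "$request_file" || return 1

    # Append one line per serviced request (assumed format).
    printf '%s request=%s\n' "$mtime" "$(basename -- "$request_file")" >> "$log_file"
}
```

Recording the timestamp before the deletion is the important ordering; the rest of the format is open to whatever the course staff find easiest to read.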

The following proposed improvements would become irrelevant given the CSCF involvement described below.

  • Add the following documentation instruction for pub_test_request: Make sure that the course account bin is world executable (as well as every parent directory) so the executable can be found. This is only necessary without the permission change described in the next point; if that is implemented, these directories only need to be executable by the cs-marks group.
  • The public interface pub_test_request can be bypassed and the course account interface utilized directly because it requires permission 6555. There should be a cs-marks group guard on the course-account-specific version to prevent this. So, the public interface could initially be a stub script that is world-executable; this in turn would fork to executables on the different architectures which are setgid cs-marks. These would do intensive input checking and verification of the course-account files. They would then execute the course account executables (now 4550 instead of 6555), which would pick up the course-specific effective userid, and then call the main request logging code as is currently done.

CSCF involvement

Current Course Requirements for Public Tests

At the start of the term, each course is expected to take the following steps:

  • Run pub_test_kill and then pub_test_launch to ensure the test runners are in place, and are running fresh code.
  • Copy a precompiled Solaris program from the ISG account to the course account's bin directory and ensure that it is chmod 6555 so requests for tests can be made from the command line by students. This also has the additional requirement that bin and every one of its parent directories be world executable.
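
Because the directory requirement is easy to overlook, a small check along the following lines might help at the start of term. This is only a sketch: the function names are hypothetical, and it reads the permission string printed by `ls -ld` rather than using any existing ISG tool.

```shell
# Hypothetical helpers, not part of the existing public test scripts.

# Succeeds if the "other" execute bit is set on the given path.
world_executable() {
    local perms
    perms=$(ls -ld -- "$1") || return 2
    # The 10th character of the mode string is the "other" execute bit;
    # "t" means execute plus sticky, as on /tmp.
    case $perms in
        ?????????[xt]*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Walks from the given directory up to /, reporting the first directory
# that is not world executable.
check_parent_chain() {
    local dir
    dir=$(cd -- "$1" && pwd -P) || return 2
    while :; do
        if ! world_executable "$dir"; then
            printf 'not world executable: %s\n' "$dir" >&2
            return 1
        fi
        [ "$dir" = / ] && return 0
        dir=$(dirname -- "$dir")
    done
}
```

A course account could then run something like `check_parent_chain "$HOME/bin"` after copying the program into place.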

While this is a relatively small burden, it is easily overlooked by the staff of any given course, and it would be unnecessary with additional CSCF support. Also, pending any additional insights in RT #71466, both pub_test_launch and distrst (a program that automatically spreads batch autotesting load across multiple servers) are locked to Solaris 8, which is particularly undesirable given the pending shift to Ubuntu for many courses.

Expanding the portion of the public_test system removed from course account control

CSCF has already copied a particular state of pub_test_request so that it will automatically be available in the standard PATH for all students on student.cs systems. This is currently being reviewed and may end up becoming a symlink in future terms.

Right now, pub_test_request calls a command that must be put on the course accounts manually to get setuid status before calling pub_test_logger. Instead, it should be possible for this to run as some other privileged user with the ability to set its uid and gid to all course accounts and cs-marks and drop down to course account permissions to run pub_test_logger without an executable being placed on the course accounts (the primary proposal is for this privileged account to be the isg account, and for sudo or ssh access to the course account to be used). This would simplify use of the command from the instructional standpoint.

The other potential involvement would be for the pub_test_runner executables. Right now, every course launches N of these on each server, where N is the number of processors on that server. Most of the time, these processes are idling and doing unnecessary polling. Instead, it should be possible for the privileged account to launch this pool of processes, and for each of them to drop down to the appropriate course to search for requests and service them if necessary. Again, this decreases the burden on the course accounts in terms of monitoring daemon status. However, it does mean that the pollers need to do more work: rather than reading configuration once at startup, they would read the configuration files and check the appropriate directory each time they drop down to a course account. The advantage is that configuration changes are picked up automatically, without restarting the test runners.

There is also the issue that the intent is for each course to choose a particular platform on which to launch the test runners. The privileged account would have to run them on all platforms, and then the course would need a way to list every server it wanted requests serviced on so only the appropriate runner would take action.

Possible Algorithm for test runners

"Privileged user X" launches public test daemons, one per “processor” on every server in the student.cs environment. This is done to prioritize the fast servers when requests are serviced. bin/util/numprocessors in the ISG subversion repository tries to obtain this count; at the time of writing, a current unstable checkout is available at /u2/isg/u/tavaskor/working/bin/util/numprocessors. On Linux it appears to count cores and hyperthreading in addition to simple physical processors, which seems appropriate in this case.
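
For illustration only, a processor count can be approximated as below. This is not the ISG numprocessors script; it is a sketch that assumes `getconf _NPROCESSORS_ONLN` or `nproc` is available, and it falls back to 1 otherwise. On Linux both commands report logical processors, so cores and hyperthreads are included.

```shell
# Hypothetical stand-in for bin/util/numprocessors (not the real script).
count_processors() {
    local n
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
        n=$(nproc 2>/dev/null) ||
        n=1
    # Guard against empty or non-numeric output from either command.
    case $n in
        ''|*[!0-9]*) n=1 ;;
    esac
    printf '%s\n' "$n"
}
```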

These daemons should possibly be checked on periodically by a cron job to refresh any that may have crashed or been killed; if this is done automatically, it means CSCF wouldn't need to handle gripes about the runners dying in any cases where this happens.
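
One way such a cron check could work is with a PID file per daemon, as in the sketch below. The function names and the PID file approach are assumptions for illustration; the existing scripts may track runners differently. Note that `kill -0` only confirms that some process with that PID exists and is signalable by the caller, so PID reuse can produce a false positive.

```shell
# Hypothetical helpers, not part of the existing public test scripts.

# Succeeds if the PID recorded in PIDFILE refers to a live process.
runner_alive() {
    local pid
    [ -r "$1" ] || return 1
    pid=$(cat -- "$1" 2>/dev/null) || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    kill -0 "$pid" 2>/dev/null
}

# Usage: ensure_runner PIDFILE COMMAND [ARGS...]
# Starts COMMAND in the background only if the recorded runner is gone.
ensure_runner() {
    local pidfile=$1
    shift
    runner_alive "$pidfile" && return 0
    nohup "$@" >/dev/null 2>&1 &
    printf '%s\n' "$!" > "$pidfile"
}
```

A crontab entry for the privileged account could then call a small wrapper that runs `ensure_runner` once per expected daemon; the schedule and wrapper path would be site decisions.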

"Privileged user X" daemon algorithm
While true; do
  • Cycle over courses; run a modified ~isg/bin/public_test/pub_test_runner on any appropriate courses. There are various approaches to this; one may be:
    • Change directory to /u/
    • For each course in cs[1-9][0-9][0-9]; do
      • setgid cs-marks && setuid $course
      • if (~isg/bin/public_test/pub_test_runner is executable); then run it
      • return to uid X
      done
  • Sleep for a while; bin/public_test/pub_test_runner in the ISG repository currently sleeps a random amount of time, with longer sleep periods if it's been “a while” since it last needed to run tests
done
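
The loop above could be sketched as follows. This is a hedged illustration, not the actual daemon: the function names, the use of `sudo -u ... -g cs-marks`, and the sleep interval are all assumptions. Running the runner in a child process under the course identity avoids the "return to uid X" step, since a process that has fully dropped its privileges generally cannot regain them.

```shell
# Hypothetical sketch of the privileged daemon loop.

# Prints the names of course directories under BASE, one per line.
list_courses() {
    local base=$1 dir
    for dir in "$base"/cs[1-9][0-9][0-9]; do
        # An unmatched glob stays literal; the -d test filters it out.
        [ -d "$dir" ] && printf '%s\n' "${dir##*/}"
    done
    return 0
}

# Runs RUNNER once as each course account, with group cs-marks.
service_courses_once() {
    local base=$1 runner=$2 course
    [ -x "$runner" ] || return 1
    for course in $(list_courses "$base"); do
        # Assumes the privileged user has a sudoers rule permitting this.
        sudo -n -u "$course" -g cs-marks -- "$runner" ||
            printf 'runner failed for %s\n' "$course" >&2
    done
}

# Repeats forever with a randomized pause (interval values are made up).
daemon_loop() {
    while :; do
        service_courses_once "$1" "$2"
        sleep $(( 30 + ${RANDOM:-0} % 60 ))
    done
}

# Example invocation (not executed here):
#   daemon_loop /u ~isg/bin/public_test/pub_test_runner
```

Unlike the outline, this sketch tests whether the runner is executable as the privileged user rather than after switching to the course account; a real implementation would need to decide which check it wants.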

ISG account pub_test_runner
If `hostname` is in the allowable list for this course; then essentially follow the same algorithm it currently does, but without the infinite loop and sleeping as the privileged-user wrapping process would now handle that.
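
The host check at the start of the runner might look like the sketch below. The function name is hypothetical, and matching on the short host name as well as the full name is an assumption, made because `hostname` may or may not return a fully qualified name on a given server.

```shell
# Hypothetical helper, not part of the existing pub_test_runner.
# Usage: host_in_list HOST SERVER [SERVER...]
host_in_list() {
    local host=$1 server
    shift
    for server in "$@"; do
        # Accept an exact match, or a match on the part before the first dot.
        if [ "$server" = "$host" ] || [ "${server%%.*}" = "${host%%.*}" ]; then
            return 0
        fi
    done
    return 1
}

# Example use inside the runner (not executed here):
#   host_in_list "$(hostname)" "${test_servers[@]}" || exit 0
```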

Conclusion

The net effect is that some scripting and maintenance weight is lifted from ISG and the courses, since individual courses would no longer need to know how to launch and maintain the public test runners, or figure out a way to launch them on only a particular selection of servers.

However, there would still be a need at the start of term for initial configuration. To simplify this so that a single configuration option can be used for both the public test runners and distrst, the most natural option will likely be a list of allowable servers in .rstrc.

As the configuration is read in by bash scripts, this would most naturally be an array; for example,

test_servers=( cpu16.student.cs cpu18.student.cs cpu20.student.cs )
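
A script consuming this setting could load and sanity-check it roughly as follows. The sketch requires bash (for arrays), and the function name is hypothetical. It sources the file directly, which is how bash configuration is typically read, so it assumes the configuration file is trusted.

```shell
#!/usr/bin/env bash
# Hypothetical loader, not part of the existing RST scripts.
# Usage: load_test_servers /path/to/.rstrc
load_test_servers() {
    local rc=$1
    # Clear any inherited value so a stale setting cannot pass the checks.
    unset test_servers
    [ -r "$rc" ] || return 1
    # Sourcing executes the file, so it must come from a trusted account.
    . "$rc" || return 1
    # Require an indexed array (not a plain string) with at least one entry.
    [[ $(declare -p test_servers 2>/dev/null) == "declare -a"* ]] || return 1
    [ "${#test_servers[@]}" -gt 0 ]
}
```

Rejecting a plain string catches the easy mistake of writing `test_servers="cpu16.student.cs cpu18.student.cs"`, which would otherwise be treated as a single server name.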

The overall net effect is not a complete elimination of the start-of-term setup requirements for each course regarding public tests, but still a reduction to a single statement in a configuration file.

Topic revision: r5 - 2010-03-16 - TerryVaskor
 