Previous Bulletins and Downtimes


MC 3022 Terminals Down — 2016-02-24 10:30 - 2016-02-24 17:15

All Nettop (NUC) terminals in MC 3022 are now functional.
There was a problem with the server responsible for supplying the keytab files for each device in the lab. Neither logins nor NetApp file server access worked. Service was restored later the same afternoon.

Spam mail overload— 2016-01-12 13:45 - 2016-01-13 11:50

The spam issue has now been resolved. It turns out that IST made changes to their configuration yesterday that inadvertently permitted email relayed through our servers (and possibly others on campus) to pass through directly without being checked for spam. IST has corrected this configuration issue and email is now being checked for spam properly. If you have any ongoing issues or concerns, please speak to your Point of Contact.

CS webserver unavailable — 2016-01-07 19:01

The CS main webserver had a DNS server failure from 18:35 to 18:55, which affected CS Drupal pages and any pages requiring database lookups such as marmoset, cs-uops CGI scripts, ST (CSCF and MFCF), inventory (CSCF and MFCF), and CS faculty-recruiting.

mysql.cs database server down again — 2015-12-18 16:30

The database server mysql.cs (also known as database.cs) was down again due to a disk issue, from Friday 18 Dec 2015 05:00 - 13:30. Services affected were marmoset, cs-uops CGI scripts, ST (CSCF and MFCF), inventory (CSCF and MFCF), and CS faculty-recruiting. We believe we have fixed the underlying problem that caused this and the earlier incident.

mysql.cs database server down — 2015-12-14 13:00

The database server mysql.cs (also known as database.cs) was down due to a disk issue, from Sunday 13 Dec 2015 05:43 - Monday 14 Dec 2015 10:02. Services affected were marmoset, cs-uops CGI scripts, ST (CSCF and MFCF), inventory (CSCF and MFCF), and CS faculty-recruiting.

CS Drupal pages unavailable — 2015-12-14 15:36

The CS main webpage and all other Drupal pages were unavailable due to an unexplained Drupal formatting issue, from Monday 14 Dec 2015 08:55 - 09:18.

ubuntu1204-006.student.cs was down — 2015-10-29 15:25 → 2015-10-29 16:05

ubuntu1204-006.student.cs (one of the 3 linux.student.cs systems) hung from 3:25 PM to 4:05 PM due to some form of resource starvation.

CS-GENERAL SAMBA (smb-files.cs.uwaterloo.ca) file service down — 2015-10-09 15:00-16:15 estimated

Network file services in the SCS general computing environment were intermittently inaccessible due to a corrupted bind of the SAMBA server to the CS-GENERAL domain. Repairs required a reboot of the file server, so there was a final universal outage of about two minutes.

linux.student.cs:/u failing — 2015-10-01 18:00 → 2015-10-02 9:40

The directory "/u" was failing on most of the linux.student.cs servers. The cause was a configuration gadget misbehaving.

Networking problems in DC — 2015-09-24 15:30 → 2015-09-25 11:00

Various workstations were failing to boot in DC. We were taking another step towards eliminating our DHCP service when it broke (in spite of documented tests suggesting otherwise). It is working again now.

www.cs down — 2015-08-27 8:00 → 8:35

The CS WWW server (and thus the multiple associated WWW sites) ceased to answer from 8:00 to 8:35 on Thursday August 27th. The cause remains to be determined.

linux.cs, Mac labs, and others effectively down — 2015-08-18 08:40 → 11:00

A local DNS server problem (which makes hostname lookups fail for a while) was affecting logins on linux.cs and linux.student.cs, and making some www.cs access slow, and might have been affecting other systems as well. This was dodged on linux.cs as of 9:30, and www.cs as of 9:40. The DNS problem looks to have been resolved as of 9:45. In addition there was a student SMB fileserver (smb-files.student.cs) problem that wasn't resolved until around 11:00, which was preventing logins to any of the student Macs.

ubuntu1204-004.student.cs down — 2015-08-01 4:20 → 2015-08-02 11:50

ubuntu1204-004.student.cs was down due to some form of resource starvation.

CS WWW service, and @cs email, down 2015-07-28 11:30 → 12:15

The CS webserver (which provides https://cs.uwaterloo.ca, and various other WWW sites) and reception of email addressed to @cs.uwaterloo.ca were down from 11:30 to 12:15 today.

reading @cs failing (intermittently) — 2015-07-27 10:00 → 17:00

The names mails.cs.uwaterloo.ca, imaps.cs.uwaterloo.ca, and smtp.cs.uwaterloo.ca ceased to be available, which broke mail clients configured to use them (instead of mail.cs.uwaterloo.ca). While correcting that, a DNS problem arose that caused the name mail.cs.uwaterloo.ca to be unavailable (depending upon which DNS servers are queried). The problem was dodged in the afternoon, and much later, DNS caught up.
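
A client-side check of which of these names currently resolve could look roughly like the sketch below. This is an illustration only, not the procedure CSCF used: the function name `resolvable` and its `resolve` parameter are hypothetical, and the answer depends on which DNS servers the local resolver happens to query.

```python
import socket


def resolvable(names, resolve=socket.getaddrinfo):
    """Return {name: True/False} for whether each hostname resolves.

    `resolve` defaults to the system resolver; it is a parameter only so
    that the check can be exercised without network access.
    """
    status = {}
    for name in names:
        try:
            resolve(name, None)  # port is irrelevant for a pure name lookup
            status[name] = True
        except socket.gaierror:
            status[name] = False
    return status


# Example call (needs network access, so it is left commented out):
# resolvable(["mail.cs.uwaterloo.ca", "mails.cs.uwaterloo.ca",
#             "imaps.cs.uwaterloo.ca", "smtp.cs.uwaterloo.ca"])
```

Running such a check from hosts that use different DNS servers would show whether a name is missing everywhere or only from some resolvers.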

Upgraded all three linux.student.cs this afternoon 2015-07-13 15:00

CPU load went high starting sometime yesterday (July 12), and the `ps` command would hang on all three linux.student.cs hosts. Having seen this a few times lately, we updated the hosts and rebooted them one by one from 13:50 to 14:40 this afternoon.

MC3003 down — 2015-07-11 8:10 → 11:40

An electrical panel is being replaced, affecting the "southwest area of the 3rd floor". So far we've seen MC3003 affected, from 8:10 to 11:40.

Processes hanging on ubuntu1204-00[26].student.cs — 2015-06-28 → 2015-07-02 14:40

Records suggest that starting sometime June 28th processes that queried the system process table would hang. E.g. the `ps` command (with common options) would hang. This had the effect of causing some Marmoset processes to hang (as they tried to clean up at the end of a test). The effect resulted in a report Tuesday (2015-07-02) afternoon. Investigation did not reveal a way around this; it appears to be a kernel problem. So both of ubuntu1204-002.student.cs and ubuntu1204-004.student.cs were rebooted on Tuesday, returning around 2:40PM.

Mac lab file access problems — 2015-06-25 → 2015-06-29 12:20

SMB access to cs-teaching files, from machines using the cs-teaching directory service, was failing intermittently from sometime late last week to midday Monday 2015-06-29. The apparent cause was an inaccurate clock on the SMB file server.

CS General environment: The switch over to the new NetApp Research file server is complete. — 2015-06-06

9:00am
The move to the newer (faster) file server started. All systems are being shut down, and access to the CS-GENERAL NetApp home directories is being removed.
10:30am
linux.cs.uwaterloo.ca systems back in production.
Having problems with Web and Samba (CIFS) services.
Users will need to reboot their Thin-Clients to access the new location of their home directories.
1:00pm
Web Services back.
1:30pm
All Services back.

Mac lab slowness/login problems — 2015-06-04 13:00 → 2015-06-05 12:30

We were seeing intermittent failures of home directory access, affecting multiple people. We found an anomaly in the virtual machine running the SMB server that the Macs use for fileserver access. A reboot was required to fix that, around 11:45, which would have disconnected those using the Mac labs. A configuration problem with the SMB server itself was found, and resolved at about 12:30. Performance appears to have returned to normal.

CS Teaching environment: The switch over to the new NetApp Student file server is complete. — 2015-05-30

9:00am
The move to the newer (faster) file server started. All systems are being shut down, and access to the CS-TEACHING NetApp home directories is being removed.
9:15am
Final synchronisation of data from old CS-Teaching file server to new NetApp hardware started.
9:35am
Completed final synchronisation of data from old CS-Teaching file server to new NetApp hardware. Verifying new NetApp functionality.
10:10am
Starting to bring general-use linux.student.cs back online.
10:20am
linux.student.cs.uwaterloo.ca systems back in production.
11:00am
linux.student.cs and supporting services are in production. Now working on Teaching lab hosts.
12:30pm
Task is completed. linux.student.cs, services and Teaching lab hosts are in production.

Markus/Marmoset/assignments.student.cs down — 2015-05-27 17:40 → 20:45

assignments.student.cs (and hence Markus and Marmoset) was down from 5:40 PM to 8:45 PM on Wednesday 2015-05-27. The cause wasn't determined.

ubuntu1204-006.student.cs was down — 2015-05-22 15:55 → 16:05

ubuntu1204-006.student.cs (one of the 3 linux.student.cs systems) hung from 3:55 to 4:05 pm due to some form of resource starvation.

Marmoset and Markus were down during Monday night (18 May 2015)

The service was restored at 6:00 on 19 May 2015, after restarting the relevant server host.

Firewall rules to be applied to non-research servers — 2015-05-07 @ 8:30

Firewall rules will be applied to the (non-research) servers that people log in to:

linux.cs
linux.student.cs

`ssh` to these systems will still be possible. All other network protocols will be blocked from off campus unless the campus VPN (https://uwaterloo.ca/information-systems-technology/services/virtual-private-network-vpn) is used.

Other servers, for mail to @cs.uwaterloo.ca and @student.cs.uwaterloo.ca, and WWW servers such as cs.uwaterloo.ca, www.student.cs.uwaterloo.ca, markus.student.cs, and marmoset.student.cs, will also be affected in that only mail and WWW protocols will be allowed to those machines from off-campus (unless the VPN is used).

And while we're at it, a few more workstations in DC will be firewalled as well.

Feel free to ask for clarification and/or details.

This was intended to happen Wednesday morning (2015-05-06); however, that didn't happen. So we're asking for Thursday morning (2015-05-07).

CS teaching environment downtime — 2015-05-04 update

The move to the new fileserver was delayed in favour of shortening home directory names to match userids (8 characters long). That happened on the weekend. Downtime wasn't required. This is expected to be a transparent change, as links from the old names to the new names were made. Group names haven't been shortened yet.

The move to the newer (faster) file server should be doable later this week, possibly Wednesday night. Once it looks like a 30 minute downtime is practical, the move will be scheduled and announced.

Marmoset was down during the morning of 4 May 2015

The service was restored shortly after 12:00 by restarting the relevant server host. The problem was caused by an incorrect hostname update on assignments.student.cs. This has now been resolved and the service is functioning as expected.

CS teaching environment downtime — 2015-04-25 maybe!

A new central file server will be installed in the undergraduate teaching environments for both Math and Computer Science. Both environments will be completely unavailable on the day of the major changeover. We were planning for this to happen Saturday April 25th. However one of the involved vendors has yet to produce some of the needed equipment. So it might be delayed until next weekend. Notice will be provided as we learn more (due to vendor time and equipment constraints).

Firewall rules to be applied to admin staff workstations — 2015-04-22 @ 8:30

Many (however not yet all) CS administrative staff workstations are on a subnet that will have "default deny" firewall rules applied to it on the morning of 2015-04-22 (Wednesday) @ 8:30. It's highly unlikely that anyone will notice.

Marmoset down — 2015-04-06 → 2015-04-20 14:00

On 2015-04-13 Marmoset was reported down (as of about 2015-04-06). We were told it wasn't too urgent since classes had ended. The staff needed to diagnose the problem were away then; work has started now (2015-04-20). The problem appeared to be non-trivial, however it's now been resolved. A somewhat subtle change in general database configuration had caused the problem.

Firewall rules to be applied to DC "thin clients", and the student software engineering lab — 2015-04-15 @ 8:30

The "thin clients" (a.k.a. Nettops) used by some graduate students in DC will be put behind "default deny" firewall rules, as will the workstations in the student Software Engineering lab (DC2577), on the morning of 2015-04-15 @ 8:30. It's highly unlikely the thin client users will notice. In the unlikely event that students have been logging in to the DC2577 workstations from off campus, the campus VPN will have to be used.

Firewall rules to be applied to CS teaching workstations and "ugsters" — 2015-04-08 @ 8:30

Various teaching workstations will be put behind "default deny" firewall rules on the morning of 2015-04-08 @ 8:30. Most people won't notice any change at all. However, sometimes people log in to the graphics lab machines (MC3007) from off campus; for that, the campus VPN will have to be used. An alternative to the VPN will be to first `ssh linux.student.cs`, and then log in to a graphics lab machine.

In addition, the ugsterNN.student.cs systems will have the same firewall rules applied. The advice is the same as for the graphics lab machines.

ubuntu1204-006.student.cs was down — 2015-03-19 13:35 → 14:40

ubuntu1204-006.student.cs (one of the 3 linux.student.cs systems) hung from 13:35 to 14:40 due to some form of resource starvation.

ubuntu1204-006.student.cs was down — 2015-03-05 3:15 → 9:35

ubuntu1204-006.student.cs (one of the 3 linux.student.cs systems) hung from 3:15 to 9:35 due to some form of resource starvation.

database.cs (mysql170.cs) was down — 2015-02-26 12:20 → 20:45

database.cs (mysql170.cs) was down 2015-02-26 12:20 to 20:45 due to a server failure.

ubuntu1204-004.student.cs was down — 2015-02-26 16:15 → 17:40

ubuntu1204-004.student.cs (one of the 3 linux.student.cs systems) hung from 16:15 to 17:40 due to some form of resource starvation.

ubuntu1204-004.student.cs was down — 2015-02-10 15:40 → 16:25

ubuntu1204-004.student.cs (one of the 3 linux.student.cs systems) hung from 15:40 to 16:25 due to some form of resource starvation.

Maple (17) unavailable on linux.student.cs — 2015-01-25 → 2015-01-26 7:30

Running `maple` on linux.student.cs would hang, and then complain about a license server problem. The license server it was using looks to have vanished. It's now using the correct license server.

All thin clients (nettops) were down — 2015-01-14 13:30 → 2015-01-15 15:00

All thin clients (in at least MC) were down. That's at least in MC2061, MC3022, and MC3018 (the realtime lab). The failed nettop root file server has now been replaced.

https://cs.uwaterloo.ca wasn't answering — 2015-01-02 13:00 → 2015-01-03 13:40

https://cs.uwaterloo.ca wasn't answering. It was caused by a network problem that IST has since resolved.

linux.student.cs and linux.cs will be rebooted in the early morning on Sunday, Jan 4, 2015 — 2014-12-31

We will upgrade and reboot linux.student.cs (all three hosts) and linux.cs from 7:30 to 8:30 AM on Sunday, Jan 4, 2015. The downtime for each host should be less than 10 minutes.

`ssh linux.cs` failing — 2014-12-21 10:30 → 12:20

`ssh linux.cs` was failing from approximately 10:30AM to 12:20PM on Sunday 2014-12-21. The cause was an `ssh` attack. The offending network addresses have been added to a "deny" list.

email lists changed — 2014-12-16 16:30

The email lists used primarily for announcements to groups within the School of Computer Science changed their implementation details. There were a couple of hiccups at the time of the change but the lists should be consistent and correct now.

Contact your usual support person in CSCF, or the CSCF Help Desk cscfhelp@uwaterloo.ca if you encounter problems related to this.

mail and login hangs — 2014-11-20 7:00 → 9:30

Starting at about 7AM this morning, two of the (CS) DNS servers used by various servers and workstations went down. The visible symptom was very long delays in things like reading mail or logging in. It was corrected sometime around 9:30AM.

Student Labs Printing: UPDATED AGAIN — 2014-11-10

Printing directly from applications (using the File -> Print menu option) to the new Xerox printers in MC 2061, MC 3006, MC 3008 and MC 3009 is now available. Queue names on the local terminals, workstations and servers are as follows.

rs-public-mc2061
rs-public-mc3006
rs-public-mc3008
rs-public-mc3009-bw
rs-public-mc3009-colour

Print jobs sent to these queues must still be released for printing using the PaperCut (uPrint) web interface at the following address. This portal requires a user's WatIAM/Quest password.

https://uwaterloo.ca/uPrint

NOTE: As of yet, printing to rs-public-mc3009-colour fails when sent from Macintosh terminals in MC.

Student labs printing — 2014-09-11

The MC3016 printer room is no longer in operation. It has been replaced by printers in MC3006, MC3008, MC3009. Sometime soon, a printer in MC2061 will also be available.

Printing to these printers is different as well. For the moment, it is necessary to print via a WWW form at

https://uwaterloo.ca/uPrint

It will require your userid, and WatIAM/Quest password. Enabling printing directly from applications remains to be implemented.

linux.cs unavailable — 2014-10-28 10:30 - 11:45

There was a hung process that caused the load average to climb such that the machine became unusable. A reboot was required to bring it back into service.

git.cs and depot.cs down — 2014-10-11 05:00

The server that supports git.cs and depot.cs started failing (with a disk error) 5AM Saturday 2014-10-11. Repair is expected sometime 2014-10-14.

mail.cs and print.cs down — 2014-10-11 05:00 → 2014-10-14 10:30

The server that supports mail.cs (for email to @cs addresses) and print.cs (e.g. for printing to lj_cs) started failing (with a disk error) 5AM Saturday 2014-10-11. While mail.cs and print.cs were the most visible problem, this also affected bs102.cs (for Nettop workstation booting) and git.cs. And it had the potential to affect crysp.uwaterloo.ca, and depot.cs.

ubuntu1204-004.student.cs hung — 2014-10-02 18:15 → 21:30

ubuntu1204-004.student.cs, one of the 3 linux.student.cs systems, stopped responding to `ssh` at approximately 6:15PM on Thursday 2014-10-02. The cause is unknown. The system was rebooted at 9:30PM, and responds to `ssh`.

ubuntu1204-002.student.cs down since 2014-09-29 16:45

ubuntu1204-002.student.cs is currently down due to unknown causes. Uptime unknown. linux.student.cs.uwaterloo.ca still reliably accesses the other two servers.

Intermittent CS Nexus login failures — 2014-09-24 14:00 → 16:00

Access to the CS Nexus fileserver was intermittently failing from 2pm to 4pm on Wednesday September 24th. That resulted in logins to Nexus stations failing for some CS students and staff. The server ran out of memory; the cause of which is surprisingly unclear.

ubuntu1204-004.student.cs was down — 2014-09-13 00:30 → 11:50

ubuntu1204-004.student.cs was down Saturday morning (2014-09-13), due to a hardware hiccup.

windows.cs Terminal Server cluster is now upgraded to Windows Server 2012 — 2014-09-02 12:10

For those users who wish to continue using the old Windows 2008 windows.cs, it can be accessed under its new name windows-legacy.cs.uwaterloo.ca.

mail.cs and print.cs were down — 2014-08-14 9:10 → 2014-08-14 11:20

mail.cs and print.cs were down (the underlying hosting system failed again). This affected printing to the main CS printers, as well as various mailing lists, and those who still use mail.cs (instead of Connect).

The problem was fixed around 11:20.

mail.cs and print.cs were down — 2014-08-12 20:00 → 2014-08-13 10:30

mail.cs and print.cs were down (the underlying hosting system failed). This affected printing to the main CS printers, as well as various mailing lists, and those who still use mail.cs (instead of Connect).

The problem was fixed around 10:30 on Wednesday, August 13.

Login Shell For Some Research, Grad and Admin Accounts Reset to /bin/bash - 2014-08-08

Anyone whose linux login shell was reset to /bin/bash today can correct this failure by utilizing the chsh command on linux.cs. We apologize for any inconvenience this problem may have caused.
Undergrad/Teaching environment accounts would not have been affected.

A major CS research machine room (DC3556) is down — 2014-07-08

At about 8:55AM 2014-07-08 the fire alarm triggered in the DC. The A/C in the main CS research machine room (DC3556) failed, and the room got hot enough to trigger the sprinkler system. Assessment and repairs are ongoing.

A major CS machine room (DC3558) is down — 2014-07-08

At about 11:00, the main CS machine room (DC3558) had to be shut down due to water infiltration from across the hall (DC3556). Some jury-rigging has recovered most services, except for (at least): smb-files.cs and backup.cs.

Cleanup is in progress. We expect to have the room back by noon Wednesday (2014-07-09).

www.student.cs and assignments.student.cs were down — 2014-06-28 8:40 → 2014-06-29 9:00

www.student.cs and assignments.student.cs (and thus Markus and Marmoset) were down from Saturday 2014-06-28 8:40AM to Sunday 2014-06-29 9:00AM. The cause was a major networking failure in MC3015, combined with a lack of automated monitoring for the above.

linux.student.cs systems were down — 2014-06-28 8:40 → 15:00

linux.student.cs (all three systems) were down Saturday 2014-06-28 from 8:40AM to 3:00PM. The cause was a major networking failure in MC3015.

ubuntu1204-006.student.cs was down — 2014-06-22 2:30 → 14:15

ubuntu1204-006.student.cs (one of the 3 linux.student.cs systems) hung from 2:30AM Sunday 2014-06-22 to 2:15PM. A reboot was required. The cause of the hang was memory exhaustion.

ubuntu1204-004.student.cs was down — 2014-06-20 22:20 → 2014-06-21 10:05

ubuntu1204-004.student.cs (one of the 3 linux.student.cs systems) hung from 10:20PM Friday 2014-06-20 to 10:05AM Saturday 2014-06-21. A reboot was required. The cause of the hang was memory exhaustion.

linux.cs down — 2014-06-20 17:30 → 23:30

linux.cs was down from 17:30 to 23:30 on 2014-06-20. For some hours before then, various activities were hanging (e.g. the `ps` command). Investigation failed to reveal the cause of that, so a reboot was attempted. The first reboot hung as well.

Web services and the CS incoming mail handler will be offline Thursday morning between 6:00am and 6:15am

The kernel update on the IAAS server m3-3101-12.cloud.cs.private.uwaterloo.ca went well. www152 and mx100 went down at 6:00am and were back in production by 6:10am.

imap service down — 2014-06-13 13:00 → 14:40

The imap service was unavailable due to an NFS mount problem.

print.cs (and a few others) unavailable — 2014-06-08 → 2014-06-09 9:20AM

print.cs (and hence most printing), as well as git.cs (an experimental `git` repository), depot.cs (a local repository for Ubuntu software), and the boot servers for the "nettops", found mostly in MC3022 and MC2061, are down. The cause is a networking configuration problem on the hosting machine. Uptime is unknown; however, we can hope that it will be fixed by this morning.

The networking problem was fixed around 9:20 on Monday.

Emergency Linux downtimes extended — 2014-06-07 10AM to ?

We're having a few problems which may cause short interruptions of non-critical services as the day progresses.

All OAT systems should be back online. 2014-06-07 11:00AM

Emergency Linux downtimes — 2014-06-07 7AM → 10AM

All of the CSCF teaching systems, and general systems, that run Linux (Ubuntu) will be down at various times from 7AM to 10AM Saturday morning (June 7th) for emergency maintenance. Downtimes might be quick in some cases. In other cases it could take 15 minutes (maximum 30).

This will affect (at least)

In addition, we can expect that the linux-legacy.student.cs systems

as well as will remain down.

Our apologies for the short advance notice, however the need for this is urgent.

linux024.student.cs down — 2014-06-04 16:05 → 17:20

linux024.student.cs was hung from 16:05 to 17:20 2014-06-04.

ubuntu1204-006.student.cs down — 2014-06-02 21:00 → 2014-06-03 4:00

ubuntu1204-006.student.cs was hung from 2014-06-02 21:00 to 2014-06-03 4:00 apparently due to some form of resource starvation.

linux024.student.cs down — 2014-05-31 1:25 → 10:05

linux024.student.cs, one of the linux-legacy.student.cs systems, was down from about 1:25AM to 10:05AM Saturday May 31st. The cause is not yet known.

CS web server and @cs email reception were down — 2014-05-27 1:25 → 8:45

The CS web server (parts of www.cs and all of associated WWW sites) and @cs email reception (mx100.cs) were down from 1:25AM to 8:45AM on 2014-05-27. The cause of the underlying hang is currently unknown.

https://www.cs.uwaterloo.ca is working, however an `ssh www.cs` fails — 2014-05-20 18:05

While some simple tests of https://www.cs (or https://cs) web pages work, an `ssh www.cs` doesn't. The cause has yet to be determined. Don't be surprised if a reboot is needed.

ubuntu1204-004 and ubuntu1204-006.student.cs were down — 2014-05-18 4:30 → 12:00

Runaway processes appear to have been the cause.

The cs-general Windows file server (smb-files.cs) was down — 2014-05-15 16:45 → 20:55

UPDATE: cs-general Windows (SMB) fileserver (smb-files.cs) is back online as of 20:55 this evening. The problem appears to have been a networking failure on the server. Further details will be determined Friday (2014-05-16) morning.

The CS web server and @cs email service have moved to new hardware — 2014-05-12

As the true cause of last week's outages remains a mystery, the www.cs and @cs email services have been moved to new hardware.

CS web server and @cs email reception were down — 2014-05-11 5:40 → 9:55

The CS web server (parts of www.cs and all of associated WWW sites) and @cs email reception (mx100.cs) were down from 5:40AM to 9:55AM on 2014-05-11.

CS web server and @cs email reception were down — 2014-05-10 8:30 → 10:30

The CS web server (parts of www.cs and all of associated WWW sites) and @cs email reception (mx100.cs) were down from 8:30AM to 10:30AM on 2014-05-10.

CS web server and @cs email reception were down — 2014-05-09 16:45 → 16:55

The CS web server (parts of www.cs and all of associated WWW sites) and @cs email reception (mx100.cs) were down from 4:45PM to 4:55PM on 2014-05-09. The plan is to move the service to different hardware.

CS web server and @cs email reception were down — 2014-05-07 6:45 → 9:00

The CS web server (parts of www.cs and all of associated WWW sites) and @cs email reception (mx100.cs) were down from 6:45AM to 9:00AM on 2014-05-07. There is a likely hardware problem, to be resolved with a firmware update, however the symptoms suggest that more might be involved.

CS web server and @cs email reception were down — 2014-05-05 8:45 → 9:10

The CS web server (parts of www.cs and all of associated WWW sites) and @cs email reception (mx100.cs) were down from 8:45AM to 9:10AM on 2014-05-05. The cause remains to be determined.

CS web server and @cs email reception were down — 2014-05-03 10:15am → 4:45pm

The CS webserver (parts of www.cs and all of associated WWW sites) and @cs email reception (mx100.cs) were down today until 4:45pm. The cause is currently unknown.

Patch Internet Explorer

On April 26th, Microsoft announced a critical security flaw with the Internet Explorer (IE) browser. The flaw was discovered as cybercriminals were using it to target organizations in the USA. The flaw affects IE versions 6 through 11 on Microsoft Windows.

Microsoft has issued a patch (MS14-021), and it is available both from Microsoft and the campus WSUS server (which supplies patches automatically to most Windows systems on campus). The patch covers all versions of Internet Explorer from 6 through 11 and will be delivered and installed via Windows Updates.

If you need help in enabling automatic updates, please visit https://support.microsoft.com/kb/294871. If that doesn't work, then as always, feel free to ask your CSCF Point of Contact for help.

Two ~10 minute outages of Web and mail services occurred 2014-04-22 — 6:55am → 7:05am, 7:15am → 7:25am

Reconfiguring the NAT service on the IAAS host caused an unexpected outage. A second outage (reboot) was needed to fix the problem.

www.student.cs page authentication broken — 2014-04-14 23:23 → 2014-04-16 13:20

www.student.cs pages that require WatIAM authentication started failing late on 2014-04-14. The cause, a complaint about mismatched certificates with the CAS authentication service, was presumably related to the certificate updates being done to avoid the "heartbleed" bug. However, the similarly configured www.cs doesn't have the problem. The dodge has been an early move to a newer version of www.student.cs that had been scheduled for 2014-04-24, as that system doesn't exhibit the problem.

@cs email handling delayed — 2014-04-14 11:00 → 16:30

The handler for @cs was rate limiting to one connection per limit. The net effect is that mail reception and forwarding looked to be down. System configuration has been changed. Why a reboot caused that configuration remains to be determined.

@cs email reception, and many www-related services down — 2014-04-13 13:00 → 2014-04-14 11:00

Reception of email to @cs.uwaterloo.ca addresses is delayed, and various WWW related services are down. The cause is a mail handler (mx100.cs) and a database server (postgres165.cs) being inaccessible. The cause of that is unclear, other than both systems reside on a virtualization system which is otherwise accessible.

Markus and Marmoset down — 2014-03-30 23:57 → 2014-03-31 9:50

Markus (and presumably Marmoset) went down late Sunday night; the system assignments.student.cs failed. It was restored Monday morning.

www.cs down — 2014-03-17 16:30 → 17:35

The main CS web server was down from approximately 16:30 to 17:35. While many of the http://cs.uwaterloo.ca pages reside elsewhere (see https://math.uwaterloo.ca/computer-science/), the following were affected:

ai.uwaterloo.ca algcomp.uwaterloo.ca
cfm.uwaterloo.ca charisma2010.uwaterloo.ca
chil.uwaterloo.ca compstats.uwaterloo.ca
crysp.uwaterloo.ca cs.uwaterloo.ca
db.uwaterloo.ca dmet.uwaterloo.ca
epad.uwaterloo.ca hi.uwaterloo.ca
odyssey.uwaterloo.ca qc.uwaterloo.ca
requirements-engineering.org ripple.uwaterloo.ca
scicom.uwaterloo.ca se.uwaterloo.ca
softeng.uwaterloo.ca uclp.uwaterloo.ca
userver.uwaterloo.ca watform.uwaterloo.ca
www.scg.uwaterloo.ca www.swag.uwaterloo.ca

What should have been a "simple" network addition had non-trivial consequences. The root causes are still being investigated.

www.cs down — 2014-03-03 21:50 → 2014-03-04 10:15

The server responsible for www.cs failed. The hardware has been rebooted, and all web services are running as expected.

www.student.cs problem — 2014-03-03 0:35

There were some problems viewing webpages on www.student.cs from 17:00 to 19:00. The problem was mitigated by a graceful restart of the Apache2 webserver. The original cause of the problems is still unknown.

Sharepoint websites unavailable Saturday, January 25, 2014

IST has announced that the campus Sharepoint websites will be unavailable on Saturday January 25 (7 AM to 7 PM) in order to implement a major version upgrade to Sharepoint 2013. All Sharepoint content will be inaccessible during this time. See the complete IST announcement below.

What does this upgrade mean to me?

The upgrade will have two apparent changes:

  1. After the upgrade, the only credentials that will work with Sharepoint are your Nexus (WatIAM) userid & password. The "ADS" credentials previously accepted by Sharepoint will no longer work.

    When you log in, you should no longer need to specify a "nexus\" prefix to your userid. However, your browser may have cached (remembered) "ads\" as part of your userid. If you have problems logging in after Saturday, try a "nexus\" prefix to your userid. Once your browser's cache is reset, you should no longer need the "nexus\".
  2. You may see subtle changes in "look and feel" in the new version of Sharepoint. While the new version offers substantial changes, they are optional, and site owners can choose when to update their sites fully to the new version. However, there may be some minor changes that are apparent in un-updated sites.

Need help?

If you are having problems logging into Sharepoint after the change, contact your usual support person in CSCF, or the CSCF Help Desk cscfhelp@uwaterloo.ca. If you have more general questions about Sharepoint 2013, please contact IST (contact information below).

IST Announcement
Subject: SharePoint unavailable Saturday, January 25

What is being done? SharePoint is being upgraded from SharePoint 2010 to the new version, SharePoint 2013.

Why is this being done? This is part of the normal IST upgrade and maintenance process. This upgrade will provide for newer features and options available in SharePoint 2013 Server (Enterprise).

When is it being done? Saturday, January 25, 2014 between 7:00 a.m. and 7:00 p.m. If there is the need to back out of the changes, SharePoint may be down longer while a restore is done.

What is the impact to SharePoint users? SharePoint will be unavailable during this maintenance window. After the upgrade SharePoint Administrators should verify that Site Content has been migrated and Site Permissions are working correctly.

Questions/concerns? Please contact the IST Service Desk, helpdesk@uwaterloo.ca or ext. 84357.

imap.cs and smtp.cs down 2014-01-08 17:50 → 2014-01-09 16:40

imaps.cs and smtp.cs were down, due to the failure of a 10G network interface.

imap.cs and smtp.cs down 2014-01-01 15:25 → 2014-01-02 12:25

imaps.cs and smtp.cs were down, due to a hardware fault.

ubuntu1204-002.student.cs down — 2013-12-26 → 2014-01-02 9:45

ubuntu1204-002.student.cs was down from early 2013-12-26 to 9:25 2014-01-02. As of 2014-01-01 it had been removed from the linux.student.cs list, to prevent every third login from hanging. The cause of the outage remains to be determined.
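
The "every third login" effect can be illustrated with a minimal sketch. It assumes the linux.student.cs list held three hosts handed out in strict rotation; only ubuntu1204-002.student.cs is named in the bulletin, so the other two host names and both function names are hypothetical, and real name rotation is not necessarily this regular.

```python
from itertools import cycle

# Hypothetical pool. Only ubuntu1204-002.student.cs appears in the bulletin;
# the other two names are assumed purely for illustration.
POOL = [
    "ubuntu1204-001.student.cs",  # assumed name
    "ubuntu1204-002.student.cs",  # the host that was down
    "ubuntu1204-003.student.cs",  # assumed name
]
DOWN = {"ubuntu1204-002.student.cs"}


def simulate_logins(pool, down, attempts):
    """Hand out hosts in strict rotation; True marks an attempt that hangs."""
    rotation = cycle(pool)
    return [next(rotation) in down for _ in range(attempts)]


def remove_down_hosts(pool, down):
    """Model taking the failed host out of the rotation list."""
    return [host for host in pool if host not in down]


if __name__ == "__main__":
    before = simulate_logins(POOL, DOWN, 9)
    after = simulate_logins(remove_down_hosts(POOL, DOWN), DOWN, 9)
    print(sum(before), "of 9 logins hang with the dead host still listed")
    print(sum(after), "of 9 logins hang after it is removed")
```

With the dead host still listed, one attempt in three lands on it; once it is removed, the remaining two hosts share all logins.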

Web service outage — 2013-11-26 17:01 → 17:20

All CS web services were down for 20 minutes.
The problem occurred while CSCF staff were trying to configure the web servers' firewall to log outgoing SMTP connections in an attempt to catch possible email spammers.

MC3027 Mac Lab, MC4065 Tutorial Centre - logins failing — 2013-11-12 10:13 → 2013-11-15 12:15

Logins were failing for all Mac Mini terminals in MC3027 and MC4065. With the exception of three terminals, both rooms are back in working order.

Login authentication problems — 2013-11-06 12:00 → 15:40

We experienced authentication problems that affected SCS mail and most general use and teaching systems within the School of Computer Science, as well as some research systems (anything that used either the cs-teaching or cs-general directory services). It did not affect CS WWW page access.

Logins to ancient Solaris systems disabled — 2013-10-21

In preparation for their elimination, logins to: cpu108.cs, cpu112.cs (a.k.a. solaris.cs), fe102-solaris.cs, services112.cs, and core.cs have been disabled. If there is still something on Solaris that you need that we don't have on linux.cs, please let us know.

mc-dns-1 was out of service from 2013-11-04 13:35 to 2013-11-05 8:55

mc-dns-1, one of four DNS servers, was back in service at 6:10 pm on Monday, Nov 4.

Web services suffered a 10-minute outage around 5:00pm Oct 28 — 2013-10-28 17:15

Web services suffered a 10-minute outage around 5:00pm Oct 28. The hosting hardware/hypervisor hung and was not accessible via its console. The root cause of the problem is still under investigation, although a cold reboot has everything working again.

Planned network outage — MC Mac labs — 2013-10-8 at 0700h

There will be a brief interruption to the networking service to the MC Mac labs on Tuesday October 8 at 7:00 AM, lasting for about 10 minutes, in order to complete network maintenance.

Please contact Trevor Grove (trevor.grove @ uwaterloo.ca) if you have concerns about this planned outage.

Printing Spooler (print.cs) Unavailable — 2013-09-24 6:50 → 11:05

The print server (print.cs) along with various other services (e.g. depot.cs and the NetTop boot servers) failed at 6:50 this morning. A regular set of switch reboots at 6:50 this morning resulted in a single network not resetting as it should have. IST will schedule a similar reboot tomorrow morning so that we can verify that the problem won't reoccur.

CS-Teaching environment problems, Markus, Marmoset — 2013-09-21 12:00 → 2013-09-22 12:10

Markus and Marmoset were down. Logins to the environment were timing out. Failures in multiple local DNS servers, combined with a possible configuration error on the Markus/Marmoset server, appear to have been the cause. More work remains (only one of the DNS servers is back), and the possible configuration problem needs investigation.

scs-council@cs and other mailing lists failed — 2013-09-09 01:20 → 21:20

Email to the scs-council@cs, cs-faculty@cs, and possibly other mailing lists was not being delivered; it was being held in a queue for delivery. As of 21:20 the queue was draining. The underlying cause might have been inexplicable contention between filesystem locks originating from multiple mail servers (the older imaps.cs and the newer mails.cs).

imaps.cs/smtp.cs down — 2013-09-08 6:50AM → 2013-09-09 12:00PM

imaps.cs and smtp.cs were down, due to a hardware fault after which the server did not reboot automatically.

@student.cs mail, mysql070.student.cs, postgres060.student.cs down — 2013-08-23 17:00 → 2013-08-24 12:05

mail.student.cs, mysql070.student.cs, mx000.student.cs and postgres060.student.cs were down from Friday 2013-08-23 17:00 to Saturday 2013-08-24 12:05.

Mail.student.cs down — 2013-08-08 16:00 → 17:15

mail.student.cs, mysql070.student.cs, mx000.student.cs and postgres060.student.cs were down due to a faulty network connection.

Marmoset and Faculty Recruiting down — 2013-07-31 10:05 → 11:40

At around 10:00, a database server (database.cs) failed. It serves Marmoset, the Faculty recruiting application, the CSCF internal work tracking system, and the CSCF inventory system. A faulty network connection has been replaced, and the server is working again.

Exam Seating and Graduate Admissions down — 2013-07-30 17:15 → 17:40

The Exam Seating and Graduate Admissions applications were unavailable 2013-07-30 from 17:15 to 17:40. The immediate cause was a database server being down. The cause of that remains to be determined.

COMPLETED: Scheduled network outage — MC Mac labs — Thursday 2013-7-25, 7AM

The network maintenance has been completed. Please report any Mac Lab networking anomalies to Trevor Grove (trevor.grove @ uwaterloo.ca).

Scheduled network outage — MC Mac labs — Thursday 2013-7-25, 7AM

There will be a brief network outage in the Apple Mac computer labs on the second and third floors of the MC building on Thursday July 25 beginning at 7 AM and lasting for approximately 30 minutes.

No changes should be apparent to end users. Please contact Trevor Grove (trevor.grove @ uwaterloo.ca) for questions or concerns about this network change.

CS-TEACHING non-responsive — 2013-07-21 13:00 → 2013-07-22 9:15

The CS-TEACHING environment was effectively down from around Sunday 2013-07-21 13:00 to Monday 2013-07-22 9:15. A machine hosting some virtual DNS servers used by the environment had a networking problem. The details are being investigated.

Network interruption in MC scheduled for Tuesday 2013-7-16 at 7:00 AM

IST advises that there will be a brief network outage for all CS networks in the Math & Computer (MC) building (Mac labs, Linux workstations, graphics, real-time, etc) on Tuesday 2013-7-16 at 7:00 AM lasting up to 30 minutes. This will not affect the "laptop jack" networks in the labs.

Please contact Trevor Grove (trevor.grove @ uwaterloo.ca) if you have questions or concerns.

linux028.student.cs and linux032.student.cs were down — 2013-07-14 14:00 → 2013-07-15 8:55

Runaway processes appear to have been the cause.

Logins to CS-GENERAL machines down — 2013-07-13 18:00 → 2013-07-15 10:00

Logins to the CS-GENERAL machines, e.g. to linux.cs, were not working during at least the above interval. The apparent cause was a dead directory server combined with lack of redundancy.

linux024.student.cs was down — 2013-07-03 4:00 → 2013-07-03 8:50 AM

Runaway processes appear to have been the cause.

MarkUs, linux024.student.cs, linux028.student.cs down — 2013-06-23 16:00 → 2013-06-24 12:00

MarkUs was down, along with linux024.student.cs and linux028.student.cs (2 of the 3 linux.student.cs servers), from about Sunday 2013-06-23 16:00 to Monday 2013-06-24 12:00. Runaway processes and a disk quota problem appear to have been the cause.

linux032.student.cs is available — 2013-06-12 10:00

linux032.student.cs was put back on the list of machines reached by linux.student.cs.

linux032.student.cs unavailable — 2013-06-10 15:45

linux032.student.cs has been removed from the list of machines reached by linux.student.cs until a problem with a Marmoset assignment test that's disabling machines can be resolved. We expect that the worst case is resolution by end of the week.

networking outage — 2013-06-11 9:48 → 11:15

Networking, mostly for the CS admin area, was out this morning from 9:48 to 11:51. IST replaced the failed switch.

linux{028,032}.student.cs down — 2013/06/08 8:45 → 2013/06/10 9:00

linux028.student.cs looks to have been down since around 8:45 on Saturday, and linux032.student.cs since 9:00 on Saturday. That's two of the three linux.student.cs systems, and is where Marmoset runs assignments. They're back this morning @ 9:00AM.

www.student.cs problem — 2013-05-31 0:35

There were some problems viewing webpages on www.student.cs from 12:55 to 13:30. The server was overloaded when we rebooted it at 13:27 today. The problem is fixed.

CS Teaching Environment - various outages — 2013-05-30 → 2013-05-31 12:25

A hardware failure disabled:

The hardware was replaced.

Slow Networking — 2013-05-29 → 2013-05-30 evening

Network load has been swamping the CS firewall, resulting in sluggish response for many systems. One possible cause was found and corrected, which temporarily resolved the problem. However, the load returned. Further investigation has resolved the problem.

Teaching Labs - Graphics, RealTime, MC3022 — 2013-05-29 17:45 → 2013-05-30 21:00

Failure to mount home directories had disabled the MC3022 general lab, and the realtime (MC3018) and graphics (MC3007) labs. The problem has been resolved, although it's possible that each workstation will need to be rebooted.

CS Teaching Environment down — 2013-05-29 13:35 → 13:50

A power problem took out the CS teaching fileserver. We're told that power was restored quickly. linux.student.cs recovered around 13:50.

Scheduled network maintenance — Thursday 2013-5-23 (7 to 8 AM)

On Thursday 2013-5-23 from 7 to 8 AM, CSCF and IST will be making minor changes to two CSCF network connections. The change will result in a short outage on some CSCF-internal networks and should not affect end users.

Please contact Trevor Grove (trevor.grove @ uwaterloo.ca) if you have questions or concerns about this scheduled maintenance.

CS-Teaching Authentication Delays — 2013-05-22

We're seeing long delays and timeouts when authenticating in the CS-teaching environment. Investigation is underway.

database.cs was down — 2013-05-20 6:48 → 2013-05-21 8:43

database.cs was down for about a day. It would have affected Marmoset, ST, inventory, grad TAs, faculty recruiting, and various other WWW based apps.

linux.cs will be rebooted at 7:30 AM on Friday, May 17

We will apply updates on linux.cs and reboot the machine at 7:30 am on Friday, May 17. The downtime should be less than 10 minutes.

linux.student.cs login problems — 2013-05-09 10:00 → 13:00

Logins (via `ssh`) to the linux.student.cs machines became so slow that they often timed out. The problem has been resolved.

Scheduled network maintenance — Wednesday 2013-5-1 and Thursday 2013-5-2 (6:30 to 8 AM)

On the mornings of Wednesday May 1 and Thursday May 2, CSCF and IST will be making minor changes to some CS network connections. The changes will result in short (five- to ten-minute) outages on some (but not all) connections. In particular, the network connections for printers in the Davis Centre will be affected by this outage.

The maintenance will occur during a service window each day from 6:30 to 8:00 AM. Each change should take only a few minutes, but they will occur unpredictably throughout the service window. Clients should be prepared for the short outages to occur at any time during the service window.

Please contact Trevor Grove (trevor.grove @ uwaterloo.ca) if you have questions or concerns about this scheduled maintenance.

Emergency postgres update — 2013-04-04

The postgres servers postgres.cs and postgres.student.cs will undergo emergency maintenance. postgres.cs will be updated at 5pm on Thursday 4 April. Downtime is expected to be a few minutes but might be as long as an hour if there are critical problems. postgres.student.cs will be updated on the morning of Friday 5 April (time TBA).

`gripe` command to be retired, 2013-3-12

On Tuesday March 12, 2013, the Unix/Linux "gripe" command will be retired and removed from CSCF-managed systems, and the newsgroup uw.cscf.help will no longer be monitored for client inquiries or comments. Instead:

This change matches the similar change at MFCF, and reflects the diminishing use of the command and underlying newsgroup.

If you have comments or concerns about this change, please email Trevor Grove (trevor.grove @ uwaterloo.ca).

linux032.student.cs was rebooted at 14:50 on Monday, March 11, 2013

linux032 was not accepting ssh connections and was rebooted. It is one of three hosts for the round robin linux.student.cs. Users may be delayed while logging on to linux.student.cs. The outage is expected to last about one hour.

Scheduled downtime — Tuesday 2013-2-19 and Saturday 2013-2-23; 9:00 to 10:00

CSCF and MFCF are scheduling joint system downtimes during Reading Week to upgrade our central file-server capacity. The scheduled times are: