The Riemann cluster, obtained in late 2009, is an SGI XE340 cluster consisting of nine chassis with two separate machines per chassis, for a total of 18 nodes. The head node is riemann.math, which has a public network interface; the rest are on a private subnet. Home directories come from the NetApp fs01.student.math over a gigabit fibre link, also on the private subnet. The private subnet runs on switch math-sw-mc-3-3015-ah06. Each node has two quad-core Intel Nehalem CPUs. The cluster was updated to SLES 11 SP2 with SGI Foundation 2.8 and Performance Suite 1.6 in August 2013. There was an initial three-year contract for full support on hardware and software. The last renewal expires November 2014.
Initial set-up was done by SGI. There is no serious cluster administration or job scheduling suite such as Scali, Platform Manager, Grid Engine, etc. The open-source System Imager (SI) was used for basic OS imaging of the nodes. This was largely a budgetary decision. With SI it does not seem possible to propagate minor updates or OS configuration changes; one must apply such changes on a node and then capture a complete image into SI for subsequent complete rebuilding of client nodes. Consequently, it is essential that we record any configuration changes made to the nodes, so that we can either incorporate them into an image later or refer to the records to repeat the changes after a node is reinstalled.
Please be sure to check any manual changes to system configuration files into RCS. That way we can find the things we've changed by searching for RCS directories. Changes made through YaST can't be tracked that way, so we'll have to record them here (or via reference to ST items with details).
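As a rough sketch of that workflow, assuming the standard GNU RCS tools (ci) are installed: the helper names rcs_checkin and list_rcs_dirs are hypothetical, and /etc is only an assumed default search root.

```shell
# Hypothetical helpers; the names and the /etc default are assumptions,
# not existing site tools.

# rcs_checkin FILE MESSAGE: record the current state of FILE in an RCS
# directory beside it, keeping the working copy in place.
rcs_checkin() {
    file=$1
    msg=$2
    mkdir -p "$(dirname "$file")/RCS"
    # -l keeps a locked working copy; -t- supplies the initial description
    # so ci does not prompt on first check-in; -m is the log message.
    ci -l -t-"$msg" -m"$msg" "$file"
}

# list_rcs_dirs [ROOT]: print every RCS directory below ROOT (default /etc),
# which shows where configuration files have been changed by hand.
list_rcs_dirs() {
    find "${1:-/etc}" -type d -name RCS 2>/dev/null | sort
}
```

For example, rcs_checkin /etc/ntp.conf "point at campus time servers" before and after an edit, then list_rcs_dirs /etc to see every directory holding tracked files. The file name here is only an illustration.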
https://www.cs.uwaterloo.ca/cscf/internal/edocs/machines/riemann.math.uwaterloo.ca/
Roughly: netmask 255.255.0.0; data on 10.20.0.[1-99], BMC on 10.20.0.[101-199], pilatus (former home directory server) on 10.20.0.99, fs01.student.math on 10.20.0.98. All nodes other than the head node use eth0 for both the private network and the BMC. On the head node, eth0 carries the public IP 129.97.140.166 and also the BMC at 129.97.82.166, while eth1 is on the private network as 10.20.0.18.
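The addressing scheme can be sketched as a small lookup. This assumes that sub-node N uses 10.20.0.N for data and 10.20.0.(100+N) for its BMC; the offset of 100 is inferred from the ranges quoted above and is not confirmed anywhere on this page. The function names are hypothetical.

```shell
# Assumption: sub-node N uses 10.20.0.N for data and 10.20.0.(100+N) for its BMC.
node_data_ip() {
    echo "10.20.0.$1"
}

node_bmc_ip() {
    echo "10.20.0.$((100 + $1))"
}

# node_ips N: print both addresses for node N, rejecting anything that is
# not a plain number in the 1-99 range (leading zeros are refused so the
# arithmetic is never read as octal).
node_ips() {
    case $1 in
        ''|0*|*[!0-9]*) echo "usage: node_ips N (1-99)" >&2; return 1 ;;
    esac
    if [ "$1" -gt 99 ]; then
        echo "node number out of range: $1" >&2
        return 1
    fi
    echo "data=$(node_data_ip "$1") bmc=$(node_bmc_ip "$1")"
}
```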
The XE340 has a BMC which can be accessed using IPMI tools. The networked BMC connection piggybacks on the default eth0 port. (A dedicated port is available, but using it would consume twice as many network switch ports and we don't think the traffic warrants it.)
There's a graphical tool for connecting to the IPMI BMCs, which you can start with this command: /bin/sh /opt/SUPERMICRO/IPMIView/IPMIView20.bin
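A BMC can usually also be queried from the command line. The sketch below assumes the open-source ipmitool package is installed and that the BMC accepts the lanplus interface (older firmware may need -I lan instead). The ADMIN user name, the example address, and the wrapper function bmc_run are illustrative, not taken from this cluster's configuration.

```shell
# bmc_run HOST USER IPMI-COMMAND...: run an ipmitool command against a BMC.
# With DRY_RUN=1 the command line is printed instead of executed.
bmc_run() {
    host=$1
    user=$2
    shift 2
    # -a makes ipmitool prompt for the password rather than taking it on
    # the command line.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo ipmitool -I lanplus -H "$host" -U "$user" -a "$@"
    else
        ipmitool -I lanplus -H "$host" -U "$user" -a "$@"
    fi
}

# Examples (address and user are placeholders):
#   bmc_run 10.20.0.101 ADMIN chassis status
#   bmc_run 10.20.0.101 ADMIN sdr list
```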
Create the account on the riemann head node. Use the same UID and GID that the person has in the Math region.
Also, create a .forward file pointing to the person's @uwaterloo.ca account. We don't want the cluster to be a mail processing destination.
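The two account steps might look like the following sketch. The UID, GID, and user name are placeholders, write_forward is a hypothetical helper, and the code assumes the person's campus address is simply their userid followed by @uwaterloo.ca.

```shell
# Account creation on the head node would look something like this
# (placeholder values; use the person's real Math-region UID and GID,
# and note that the group must already exist):
#   useradd -m -u 12345 -g 12345 jdoe

# write_forward USER HOMEDIR: create HOMEDIR/.forward so that mail for USER
# is sent on to USER@uwaterloo.ca.
# When run as root, also chown the file to the user afterwards.
write_forward() {
    user=$1
    home=$2
    printf '%s@uwaterloo.ca\n' "$user" > "$home/.forward"
    chmod 644 "$home/.forward"
}
```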
Accounts are propagated to all sub-nodes from the head node via NIS. To make the nodes notice a new account, run YaST2 on the head node and go through the NIS Server set-up, not changing anything, just next-next-next...done. Some details follow.
Sub-nodes will point to the NIS server
On a node to find all NIS users
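The usual NIS client commands for these two checks are ypwhich and ypcat. A minimal sketch, assuming the standard NIS client tools are installed on the node; the nis_usernames helper is hypothetical and simply extracts the login-name field from passwd-format lines.

```shell
# Show which NIS server this node is bound to:
#   ypwhich
# Dump the NIS passwd map and list just the login names:
#   ypcat passwd | nis_usernames

# nis_usernames: read passwd-format lines on stdin, print sorted login names.
nis_usernames() {
    cut -d: -f1 | sort -u
}
```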
Logging in to the head node via SSH requires every user to enter a password. From the head node, users can log in to any client node without a password. In this example, we set up the keys for cscf-adm on Pilatus. Because Pilatus is NFS-mounted on riemann, all nodes see the same 'authorized2' file. This dates back to the use of pilatus as the home directory server, which is no longer the case.
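Because home directories are shared over NFS, one key pair and one authorized-keys file cover every node. The sketch below assumes OpenSSH; the file name authorized_keys is OpenSSH's default and may differ from the 'authorized2' file mentioned above, and authorize_key is a hypothetical helper.

```shell
# Generate a key pair without a passphrase (run once, as the user):
#   ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"
# Then authorize it for logins between nodes:
#   authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh"

# authorize_key PUBKEY SSHDIR: append PUBKEY to SSHDIR/authorized_keys
# unless that exact line is already present, and tighten permissions.
authorize_key() {
    pub=$1
    dir=$2
    mkdir -p "$dir"
    chmod 700 "$dir"
    touch "$dir/authorized_keys"
    # -x whole-line match, -F fixed string, -f read the pattern from PUBKEY
    if ! grep -qxF -f "$pub" "$dir/authorized_keys"; then
        cat "$pub" >> "$dir/authorized_keys"
    fi
    chmod 600 "$dir/authorized_keys"
}
```

Calling the helper twice with the same key leaves a single entry, so it is safe to re-run.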
The SuperMicro IPMI BMC comes with an embedded web server. The remote management card can be reached from any browser at https://bmc-riemann.math.uwaterloo.ca (temporarily the card is named bmc2-riemann.math.uwaterloo.ca).
From the browser connection you can view sensor information, event logs, power control, network settings, etc.
To access riemann.math via the IPMI KVM interface, select the Remote Control menu tab and then the Launch Console button. Log into riemann, start Firefox, and connect to any of the sub-node IPMI cards at http://10.20.0.101 through http://10.20.0.117.
Don't change the Configuration, LAN Select Settings. If this setting is changed to Enable On-Board or Dedicated, the node must be power-cycled; otherwise sensor readings will be inaccessible.
Java on the machine you connect from must be able to download and run jviewer.jnlp. The head node riemann.math does not have the correct Java version, so to access a sub-node, find a machine with the correct Java.
From fe105.math, start Firefox (fe105$ firefox &) and enter http://bmc-riemann.math in the address bar. Log into the IPMI card, then start a KVM session on riemann. You will be asked to download jviewer.jnlp and to select Java Web Start to run it, and to answer yes/no a couple of times. After logging in to riemann, start a Firefox GUI session there and enter the IP address of the sub-node's IPMI card in the address bar. Log into the sub-node's IPMI card and start a KVM session on the sub-node.
riemann:/opt/SUPERMICRO/IPMIView # ./ipmicfg-linux.x86_64 -m IP=129.97.82.166 MAC=00:30:48:C9:6C:55
-- RobynLanders - 04 Dec 2009