The Riemann cluster obtained in late 2009 is an SGI XE340 cluster consisting of nine chassis, with two separate machines per chassis, for a total of 18 nodes. The head node is riemann.math, with a public network interface; the rest are on a private subnet. Home directories are NFS-mounted from pilatus.cs over a gigabit fibre link, also on the private subnet. The private subnet operates on switch math-sw-mc-3-3015-ah06. Each node has two quad-core Intel Nehalem CPUs. The cluster is running SLES 11 with SGI Foundation 1 SP5 and ProPack 6 SP5. We have three years of full support on hardware and software.
Initial set-up was done by SGI. There is no full cluster administration or job scheduling suite such as Scali, Platform Manager, Grid Engine, etc.; this was largely a budgetary decision. We're using the open-source SystemImager (SI) for basic OS imaging of the nodes. With SI it does not seem possible to propagate minor updates or OS configuration changes; one must apply such changes on a node and then inhale a complete image into SI for subsequent complete rebuilding of client nodes. Consequently, it is essential that we record any configuration changes made to the nodes, so that we can later incorporate them into an image or easily repeat them after a node is reinstalled.
Please be sure to RCS any manual changes to system configuration files; that way we can find what we've changed by searching for RCS directories. Changes made through YaST can't be tracked that way, so we'll have to record them here (or via references to RT items with details).
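For example, a typical RCS sequence for a configuration file (using /etc/ntp.conf purely as a hypothetical example; substitute whatever file is being changed):

cd /etc
mkdir RCS        # once per directory; the RCS directory is what we search for later
ci -l ntp.conf   # initial check-in; -l keeps a locked, editable working copy
# ...edit the file, then check the change in with a log message:
ci -l ntp.conf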
IP network set-up, roughly: the private subnet uses netmask 255.255.0.0, with node data interfaces on 10.20.0.[1-99], BMCs on 10.20.0.[101-199], and pilatus on 10.20.1.99. All nodes other than the head node use eth0 for both the private network and the BMC. The head node uses eth0 for its public address 129.97.140.166 and for its BMC at 129.97.82.166, and eth1 for the private network at 10.20.0.18.
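For reference, the head node's private interface would be configured along these lines under SLES 11 (a sketch assuming the standard /etc/sysconfig/network layout, not copied from the machine):

# /etc/sysconfig/network/ifcfg-eth1 on riemann.math (sketch)
BOOTPROTO='static'
IPADDR='10.20.0.18'
NETMASK='255.255.0.0'
STARTMODE='auto'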
Each XE340 node has a BMC that can be accessed using IPMI tools. The networked BMC connection piggybacks on the default eth0 port. (A dedicated BMC port is available, but using it would consume twice as many network switch ports and we don't think the traffic warrants it.)
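For example, assuming the ipmitool package is installed and given valid BMC credentials (ADMIN/PASSWORD are placeholders here):

ipmitool -I lanplus -H 10.20.0.101 -U ADMIN -P PASSWORD sensor list
ipmitool -I lanplus -H 10.20.0.101 -U ADMIN -P PASSWORD chassis power status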
There's a graphical tool for connecting to the IPMI BMCs, which you can start with this command:
/bin/sh /opt/SUPERMICRO/IPMIView/IPMIView20.bin
To create an account:
Create the account on pilatus.cs first, if not already present. Because /home is NFS-mounted from pilatus, this makes the home directory visible on all riemann nodes.
Then create the account on the riemann head node, preferably with the same UID and GID as on pilatus.cs so that ownership on the NFS-mounted home directory matches.
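A sketch of both steps, with a hypothetical user jdoe and UID 5001 (substitute real values):

# On pilatus.cs (creates the NFS-shared home directory):
useradd -u 5001 -g users -m -d /home/jdoe -s /bin/bash jdoe
passwd jdoe
# On riemann.math, same UID and GID; omit -m because the NFS home directory already exists:
useradd -u 5001 -g users -d /home/jdoe -s /bin/bash jdoe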
To become another user:
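For example, as root (jdoe again being the hypothetical account from above):

su - jdoe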
Once the account is created on pilatus.cs and the riemann.math head node, it must be propagated from the head node to the sub-nodes. We use NIS for account propagation.
Use YaST2 to set up the NIS server on the head node riemann, following the YaST2 selections. Do not set up the head node as an NIS client, even though the NIS documentation suggests doing so.
Use YaST2 to set up the NIS client on the sub-nodes.
The NIS domain name is nisriemann.
To report the NIS domain name on the client node 10.20.0.5:
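ssh 10.20.0.5 domainname
(This should print nisriemann; ypdomainname reports the same thing.)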
The head node runs the ypserv daemon and the sub-nodes run the ypbind daemon; each sub-node points at the NIS server on the head node.
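A quick way to verify (rcypserv and rcypbind are the standard SLES init-script wrappers; ypwhich shows which server a client is bound to):

rcypserv status   # on the head node
rcypbind status   # on a sub-node
ypwhich           # on a sub-node; should print the head node (10.20.0.18)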
To add accounts from /etc/passwd to NIS, run YaST2 again to set up the server. Unlike on shiraz/maroo/vidal, running '/var/yp/make all' or '/var/yp/make' does not seem to work here.
To list all NIS users on a node:
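ypcat passwd
(ypcat dumps the NIS passwd map; pipe it through 'cut -d: -f1' for just the usernames.)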
SSH into the head node requires all users to enter their password; from the head node you can then enter any client node without a password. Set up the keys for cscf-adm on pilatus: since /home is NFS-mounted on riemann, all nodes will share the same 'authorized_keys2' file.
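A sketch of the key set-up as cscf-adm on pilatus (assuming RSA keys and the protocol-2 authorized_keys2 file mentioned above):

ssh-keygen -t rsa                                 # accept the defaults; empty passphrase for password-less logins
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
chmod 600 ~/.ssh/authorized_keys2

Because /home is NFS-shared, this single authorized_keys2 file covers every node.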
Test the SSH keys by logging into riemann, then from riemann logging into any client node riemann1 through riemann17. If this is the first connection to that client it may ask whether to allow the connection; reply 'yes'. After that it should not ask for a password.
The SuperMicro IPMI BMC comes with an embedded web server. Access the remote management card via any browser at https://bmc-riemann.math.uwaterloo.ca (temporarily, the card is named bmc2-riemann.math.uwaterloo.ca).
From the browser connection you can view sensor information and event logs, control power, change network settings, etc.
To access riemann.math via the iLOM KVM interface, select the Remote Control menu tab, then the Launch Console button. Log into riemann, start Firefox, and connect to any of the sub-node iLOM cards via http://10.20.0.101 through http://10.20.0.117.
Don't change the Configuration > LAN Select setting. If it is changed to Enable On-Board or Dedicated, the node must be power-cycled; otherwise sensor readings will be inaccessible.
Java on the remote machine must be able to download and run jviewer.jnlp. The head node riemann.math does not have the correct Java version, so to access a sub-node, find a machine that does.
For example, from fe105.math:
fe105$ firefox &
In the address bar enter http://bmc-riemann.math and log into the IPMI card, then start a KVM session of riemann. It will offer to download jviewer.jnlp; select Java Web Start to run it and answer the yes/no prompts. After logging into riemann, start a Firefox GUI session, enter the IP of the sub-node's IPMI card in the address bar, log into the sub-node's card, and start a KVM session of the sub-node.
To query the head node's BMC network settings locally with SuperMicro's ipmicfg:
riemann:/opt/SUPERMICRO/IPMIView # ./ipmicfg-linux.x86_64 -m
IP=129.97.82.166
MAC=00:30:48:C9:6C:55
-- RobynLanders - 04 Dec 2009