muscat Blade Centre Documentation

Physical Setup

muscat consists of two IBM BladeCenter H chassis containing a total of 28 LS21 blades. Each blade has two dual-core AMD 2212 HE CPUs running at 2.0 GHz (blades 1-14) or 2.2 GHz (blades 15-28), at least 8 GB of RAM, and a single 36 GB internal disk.

The Blade Centres are physically installed at the University of Waterloo in DC3556. Networking is provided by two 100 Mbit uplinks to a CSCF-managed switch: one of these links connects to an unmanaged gigabit switch and is used for managing the system controllers; the other is plugged directly into the Blade Centre. Each link is on a different VLAN. There are three cables for the system controllers: one is for muscat itself, and the other two are for the system controllers on the disk arrays.

The Blade Centres each have two internal switches, each with six external copper RJ45 ports. These can be thought of as plugging directly into the two network devices on the blades themselves; Linux sees them as eth0 and eth1. The first switch has one port in use, which is plugged into a switch port on VLAN 7, so any blade activating its first network device will see that network. A short patch cable connects the second switches on the two Blade Centres, so the eth1 devices can talk to each other on the cluster's private network. The eth1 interfaces of all blades use IP addresses of the form 192.168.143.x.
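
For illustration, a blade's private interface under OpenSuSE could be configured with a file along these lines; the filename follows the usual SuSE convention, but the values shown are hypothetical, not copied from an actual blade:

  # /etc/sysconfig/network/ifcfg-eth1 -- hypothetical example
  BOOTPROTO='static'
  STARTMODE='auto'
  IPADDR='192.168.143.114'
  NETMASK='255.255.255.0'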

There is a 13-disk DS4200 Express SAN attached to the Blade Centre by fibre. IBM documentation related to the DS4200 can be found at https://cs.uwaterloo.ca/cscf/research/cerasblade/documents/.

Access to the Cluster

To access the cluster, first log in to its head node, muscat01.cs.uwaterloo.ca. For example, using ssh:

  % ssh -Y -A userid@muscat01.cs.uwaterloo.ca

Once you are logged in to the head node, you can log in from there to the nodes (blades) that have been assigned to you. Please do not log in to nodes that have not been assigned to you.
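
For example, from the head node (assuming blade hostnames resolve on the cluster's private network; muscat14 is used here purely as an example of an assigned node):

  % ssh muscat14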


Node Assignments

Blades are assigned to individual users. Users should log in only to the head node and to the blades that have been assigned to them.

Blade/Node    Assignment
muscat01      head node
muscat02      cavram (Hadoop/edge)
muscat03      cavram (Hadoop/edge)
muscat04      Rajabi
muscat05      Waldman
muscat06      Waldman
muscat07      Rajabi
muscat08      Rajabi
muscat09      ak5singh
muscat10      x39liu (MySQL)
muscat11      x39liu (MySQL)
muscat12      r46liu (Cassandra/DAX)
muscat13      Rajabi
muscat14      r46liu (Cassandra/DAX)
muscat15      rgarcia
muscat16      h2saxena/mmior
muscat17      h2saxena/mmior
muscat18      h2saxena/mmior
muscat19      h2saxena/mmior
muscat20      Rajabi
muscat21      ufminhas (VoltDB)
muscat22      ufminhas (VoltDB)
muscat23      ufminhas (VoltDB)
muscat24      ufminhas (RemusDB)
muscat25      ufminhas (RemusDB)
muscat26      ufminhas (VoltDB)
muscat27      ufminhas (VoltDB)
muscat28      ufminhas (VoltDB)

Notes for sysadmins

The information in this section is probably out of date.

Network setup, SAN

All the management hosts are addressable on the private network. They are best reached using a web browser; all require a working Java environment. Windows (either XP or Vista) with IE7 seems to work well, and some installations of Ubuntu and CentOS have also been used successfully. I have had very little luck with Mac OS X 10.4 or 10.5 systems.

The SAN controller should only be used to configure the disks, and there should be no need to connect to the system controllers directly; if you need to talk to those, do it through the Management Module (MM).

Passwords for these modules can be found in The Usual Place. Please contact the CSCF RSG point of contact if you are a user who needs to configure the disks or connect to the KVMs. Users with access to the KVMs are currently Umar Minhas and Tao Zheng.

One also needs the Storage Manager software (available from IBM's website) in order to configure the disks in the DS4200. Storage Manager runs well under CentOS, OpenSuSE, or Windows, and is installed on muscat01. Windows XP users should get the Windows 2003 version, and there is also a Vista version available. The current version is 10.1; older (9.60) installations will no longer work.

DNS

Admins will have to configure the DNS information by hand; using the GUI tool currently seems to break the reverse DNS setup. BIND configurations are stored in /var/lib/named/master. The Appendix lists which IPs are currently allocated on the private network, and for what purpose.
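
For example, adding a host typically means editing both the forward and reverse zone files by hand, bumping the serial number in each, and restarting named. The zone file names below are illustrative; check /var/lib/named/master for the actual ones:

  muscat01:~ # vi /var/lib/named/master/muscat01.cs.uwaterloo.ca
  muscat01:~ # vi /var/lib/named/master/192.168.143.zone
  muscat01:~ # rcnamed restart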

CLI package management

OpenSuSE 10.3 has a new CLI tool called zypper, which will seem familiar to anybody who has used rug from previous SuSE releases or apt on Debian derivatives. The main commands are things like zypper repos, zypper search packagename, and zypper install packagename.
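
For example (emacs is just an illustrative package name):

  muscat01:~ # zypper repos
  muscat01:~ # zypper search emacs
  muscat01:~ # zypper install emacs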

Adding Accounts

The LDAP setup has notes included in RT#60251. It is possible to use the yast2 tool to add new users, but there is also a manual way that may work better. I've created tools and stashed them in /root/people on muscat01. The first thing one needs to do is figure out what userid, uid, and password to use for the new user. It is strongly recommended that we stick with CS uids whenever possible, and always truncate these userids to 8 characters. The uid can be retrieved from any core CS machine with the idregistry command, like so:

cpu104> idregistry request mpatters
mpatterson:1633
cpu104>

So, the uid for the userid mpatters is 1633.

The first tool to use is called adder.pl, and is invoked like this: adder.pl userid uid password. The password string should be encrypted; if you're not sure what to put here, ' ' will do, and you can later change the password with the passwd command. adder.pl simply creates an LDIF file that can be added to the LDAP database using a second script named addtoldap.sh, provided you know the LDAP admin password. Here you can see Real Output™ from adding a user. We already know the user's uid is 6092.

muscat01:~/people # ./adder.pl t3zheng 6092 ' '
muscat01:~/people # ls -ld t3zheng.ldif
-rw-r--r-- 1 root root 277 2008-02-13 16:12 t3zheng.ldif
muscat01:~/people # ./addtoldap.sh t3zheng.ldif 
Enter LDAP Password: 
adding new entry "uid=t3zheng,ou=people,dc=muscat01,dc=cs,dc=uwaterloo,dc=ca"

muscat01:~/people #

In this situation, where we've set the user's encrypted password to a space, we should immediately change it to something we know. If you need a good method for generating passwords, the apg program is commonly available (although not on our installation). In any event, you should immediately remove the LDIF file that was created in this process.
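
Continuing the example above, that would look something like this. This assumes passwd on muscat01 is able to update LDAP entries through PAM; if it is not, OpenLDAP's ldappasswd tool can be used instead:

  muscat01:~/people # passwd t3zheng
  muscat01:~/people # rm t3zheng.ldif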

If the user is not a member of the UW community, choose a uid in the sequence starting at 1001. Ideally their userid will not conflict with a potential UWDir userid, but it's not a disaster if it does. The usual UWDir rule is FirstInitialMiddleInitialLastname, so Henry Aaron Bloggins would be habloggins, which would then be truncated to habloggi. Note, however, that the LDAP configuration on muscat01 does not enforce any particular userid restrictions.

The fastest way to determine which uids are in use on the system is to use slapcat, something like this (note that one needs sudo privileges):

mpatters@muscat01:~> sudo /usr/sbin/slapcat | \
 grep uidNumber | awk '{print $2}' | sort -n
1001
1633
5888
6092
14499
mpatters@muscat01:~>

The GUI Way of adding users

yast2 actually has two GUIs: one ncurses-based, the other X11. The ncurses tool is not full-featured, which causes problems given that we want to be able to set a user's uid manually; it also does not seem to accept long passwords. Use the ncurses version at your own risk. A quick note about X11 and root/sudo: use the -E option for sudo, then run xauth merge /home/userid/.Xauthority. This will allow your new root shell to use your X11 tunnel cookies.
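
For example (substituting your own userid for mpatters):

  mpatters@muscat01:~> sudo -E -s
  muscat01:~ # xauth merge /home/mpatters/.Xauthority
  muscat01:~ # yast2 &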

After starting yast2, you should get a window titled "YaST Control Center". Click "Security and Users" (from the left), which will bring you right to that section. (Otherwise you could just scroll down.) Next, select "User Management". It should read in a bunch of settings, then throw you into a screen like this:

[Figure: User/group administration in yast2]

You will want to use the "Set Filter" button to restrict the user list to "LDAP Users", and you will then be prompted for the LDAP server password. The display should update and look something like this:

[Figure: LDAP User/group administration in yast2]

Click "Add User", then fill in the User Data tab. To set the uid and group memberships, choose the Details tab. (NB: the ncurses yast tool does not seem to allow this.) The user we're adding is named y6hu, has a uid of 1736, and we want this user to be able to start Xen machines:

[Figure: Adding a user to the LDAP database, details pane]

When you're happy with everything, click "Accept", then "Finish". Since you've just set the user's password, you should be able to try logging in yourself. If all goes well, then you're set.

Creating sudoers

To allow a user to use sudo on any stable node, add that user to the stableadmins group on the head node. This can be done using the standard OpenSuSE group admin tools in yast2. Members of this group can run any command on any of the stable nodes.

NB: this does not include the ability to sudo on the head node itself; for that, add the user to the wheel group on the head node.
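
For reference, the rule on the stable nodes presumably amounts to a standard sudoers group entry like the one below; this is an assumption, so check /etc/sudoers on a stable node for the actual configuration:

  # /etc/sudoers on a stable node (assumed entry -- verify locally)
  %stableadmins ALL=(ALL) ALL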

Sometimes adding a user to a group doesn't immediately "take" on client nodes. Rebooting the client node will force this; restarting the nscd service (/etc/init.d/nscd restart) may cause it to happen more quickly and less destructively. This appears to be an nscd caching issue that has not yet been properly resolved.

Working with System Imager

RT#61548 covers setting up SystemImager (SI) on muscat01. Things to keep in mind: the machine will need an entry in /etc/dhcpd.conf if it doesn't already have one, as well as in /var/lib/named/master/*, and you should also look in /var/lib/systemimager/scripts. Copying the format of other machines is fine. The current image is actually called Test02, and the scripts directory looks like this:

mpatters@muscat01:/var/lib/systemimager/scripts> ls -l
total 68
-rw-r--r-- 1 root root   210 2008-02-13 15:00 hosts
lrwxrwxrwx 1 root root    16 2008-02-13 14:27 muscat10.sh -> muscat-stable.sh
lrwxrwxrwx 1 root root    16 2008-02-13 14:27 muscat11.sh -> muscat-stable.sh
lrwxrwxrwx 1 root root    13 2008-03-12 13:23 muscat12.sh -> Test01.master
lrwxrwxrwx 1 root root    16 2008-02-13 14:27 muscat13.sh -> muscat-stable.sh
lrwxrwxrwx 1 root root    16 2008-02-13 14:27 muscat14.sh -> muscat-stable.sh
lrwxrwxrwx 1 root root    13 2008-03-12 12:02 muscat-stable.sh -> Test02.master
drwxr-xr-x 2 root root  4096 2008-02-08 16:00 post-install
drwxr-xr-x 2 root root  4096 2008-02-08 16:00 pre-install
-rw-r--r-- 1 root root 24937 2008-02-13 12:21 Test01.master
-rw-r--r-- 1 root root 24937 2008-03-12 11:58 Test02.master
mpatters@muscat01:/var/lib/systemimager/scripts>
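
A new client's /etc/dhcpd.conf entry would look something like the following; the MAC address and IP are purely illustrative, so copy an existing stanza for the real format:

  host muscat12 {
      hardware ethernet 00:1a:64:00:00:0c;   # the blade's MAC (illustrative)
      fixed-address 192.168.143.112;         # the blade's private IP (illustrative)
  }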

To prepare the golden client machine, you need to do something like:

  si_prepareclient -server 192.168.143.1 -kernel /boot/vmlinuz

Once that's completed, you need to do something like:

  si_getimage -golden-client 192.168.143.114 -image Test02 -exclude '/media/*' \
    -exclude '/scratch*' -exclude '/home/*' -exclude '/tmp/*'

Previously there were issues with networking ceasing to function shortly after a node boots up; this has been attributed to several factors and has, we believe, been resolved. See the notes in RT 64335, dated 10 October 2008, for more details.

In order to PXE boot a client to re-image it, three services need to be running on muscat01. You can start them like this:
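
For a typical SystemImager PXE setup these are the DHCP, TFTP, and SystemImager rsync daemons. The exact init script names below are assumptions, so verify them on muscat01:

  /etc/init.d/dhcpd start
  /etc/init.d/atftpd start
  /etc/init.d/systemimager-server-rsyncd start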

Once you are done imaging your new client(s), shut the services off again so that a machine that is accidentally PXE-booted won't be automatically re-imaged.

Working with the Management Module

If you want to boot a blade off a local ISO image, you need to hit F12 and choose CDROM. If you're booting a floppy disk image, choose local diskette. You need to make sure that the media tray is set to the machine you want to boot, even if it's a local image.

The console can sometimes behave very oddly, becoming off-centre or just plain not displaying much at all. Unfortunately, that seems to be partially a consequence of running an unsupported operating system. Sometimes clicking the Paint or Calibrate buttons can help. If not, try changing virtual consoles (Linux host Ctrl-Alt-Fx) or the blade to which the remote console is attached.

Network layout

[Figure: Network layout]

In the diagram, the 192.168 addresses are always assigned to the second interface on the blade; in Linux, it will be known as eth1. Care needs to be taken to avoid having blades other than 01 turn into bridges onto the private network. Ideally, the eth0 interface will not be brought up on any blade other than the first, although some management software may require it. While it is possible to disable the primary (or secondary) interface for individual blades through either the MM or the blade's BIOS, this is not recommended, as it can lead to confusion with respect to device naming.
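
A quick way to confirm that a blade is not bridging is to check that eth0 is down; if the flags in the output do not include UP, the interface is not active (the hostname in the prompt is just an example):

  muscat14:~ # ip link show eth0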

Not shown here is the connection to the DS4200s. Technically, every blade has the ability to see them; however, some of the storage partitions (the one containing the RAID5s known as home and scratch-net) should never be accessed by any machine but muscat01. Doing otherwise is almost certain to cause catastrophic data loss.

Generally speaking, we should avoid placing blades on the publicly addressable network if at all possible. This reduces the risk of intrusion, as well as the risk of accidentally creating a possibly uncontrolled bridge between the public and private networks.



Ken Salem 2013-05-02