muscat consists of two IBM BladeCenter H chassis containing a total of 28 LS21 blades. Each blade has two dual-core AMD 2212 HE CPUs running at 2.0 GHz (blades 1-14) or 2.2 GHz (blades 15-28), at least 8GB of RAM, and a single 36GB internal disk.
The BladeCenters are physically installed at the University of Waterloo in DC3556. Networking is provided by two 100 Mbit uplinks to a CSCF-managed switch; one of these links connects to an unmanaged gigabit switch and is used for managing the system controllers, while the other is plugged directly into the BladeCenter. Each link is on a different VLAN. There are three cables for the system controllers -- one for muscat itself, and two for the system controllers on the disk arrays.
Each BladeCenter has two internal switches, each with six external copper RJ45 ports. These can be thought of as plugging directly into the two network devices on the blades themselves; Linux sees them as eth0 and eth1. The first switch has one port in use, plugged into a switch port on VLAN 7, so any blade that brings up its first network device will see that network. A short patch cable connects the second switches of the two chassis, so the eth1 devices can talk to each other on the cluster's private network. The eth1 interfaces of all blades use IP addresses of the form 192.168.143.x.
There is a 13-disk DS4200 Express SAN attached to the BladeCenters by fibre. IBM documentation related to the DS4200 can be found at https://cs.uwaterloo.ca/cscf/research/cerasblade/documents/.
To access the cluster, first log in to its head node, muscat01.cs.uwaterloo.ca. For example, using ssh:
% ssh -Y -A userid@muscat01.cs.uwaterloo.ca
Once you are logged in to the head node, you can log in from there to the nodes (blades) that have been assigned to you. Blades are assigned to individual users; please log in only to the head node and to the blades that have been assigned to you.
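From the head node, assigned blades are reached over the cluster's private network. For example, assuming the blades resolve by their short names on muscat01 (an assumption; these notes do not record the head node's resolver setup):

% ssh muscat12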
Blade/Node Name | Assignment |
muscat01 | head node |
muscat02 | cavram (Hadoop/edge) |
muscat03 | cavram (Hadoop/edge) |
muscat04 | Rajabi |
muscat05 | Waldman |
muscat06 | Waldman |
muscat07 | Rajabi |
muscat08 | Rajabi |
muscat09 | ak5singh |
muscat10 | x39liu (MySQL) |
muscat11 | x39liu (MySQL) |
muscat12 | r46liu (Cassandra/DAX) |
muscat13 | Rajabi |
muscat14 | r46liu (Cassandra/DAX) |
muscat15 | rgarcia |
muscat16 | h2saxena/mmior |
muscat17 | h2saxena/mmior |
muscat18 | h2saxena/mmior |
muscat19 | h2saxena/mmior |
muscat20 | Rajabi |
muscat21 | ufminhas (VoltDB) |
muscat22 | ufminhas (VoltDB) |
muscat23 | ufminhas (VoltDB) |
muscat24 | ufminhas (RemusDB) |
muscat25 | ufminhas (RemusDB) |
muscat26 | ufminhas (VoltDB) |
muscat27 | ufminhas (VoltDB) |
muscat28 | ufminhas (VoltDB) |
The information in this section is probably out of date.
The SAN controller should only be used to configure the disks, and there should be no need to connect to the system controllers directly -- if you need to talk to those, do it through the Management Module (MM).
Passwords for these modules can be found in The Usual Place. Please contact the CSCF RSG point of contact if you are a user who needs to configure the disks or connect to the KVMs. Users with access to the KVMs are currently Umar Minhas and Tao Zheng.
One also needs the Storage Manager software (available from IBM's website) in order to configure the disks in the DS4200. SM runs well under CentOS, OpenSuSE, or Windows, and is installed on muscat01. Windows XP users should get the Windows 2003 version of the Storage Manager software; there is also a Vista version available. The current version is 10.1; older (9.60) installations will no longer work.
Admins will have to configure the DNS information by hand; using the GUI tool currently seems to break the reverse DNS setup. BIND configurations are stored in /var/lib/named/master. The Appendix lists which IPs are currently allocated on the private network, and for what purpose.
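For reference, a minimal sketch of a reverse-zone record as it might appear under /var/lib/named/master; the zone file name and hostname below are illustrative assumptions, not taken from the live configuration:

; in the 143.168.192.in-addr.arpa zone file (hypothetical name)
1    IN    PTR    muscat01.cs.uwaterloo.ca.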
Notes on the LDAP setup are included in RT#60251. It is possible to use the yast2 tool to add new users, but there is also a manual method that may work better; I've created tools for it and stashed them in /root/people on muscat01. The first thing to do is figure out what userid, uid, and password to use for the new user. It is strongly recommended that we stick with CS uids whenever possible, and always truncate these userids to 8 characters. They can be retrieved from any core CS machine with the idregistry command, like so:
cpu104> idregistry request mpatters
mpatterson:1633
cpu104>
So, the uid for the userid mpatters is 1633.
The first tool is called adder.pl, and is invoked like this: adder.pl userid uid password. The password string should be encrypted; if you're not sure what to put here, ' ' will do, and you can change it later with the passwd command. adder.pl simply creates an LDIF file that can be added to the LDAP database using a second script named addtoldap.sh -- provided you know the LDAP admin password. Here you can see Real Output(TM) from adding a user. We already know the user's uid is 6092.
muscat01:~/people # ./adder.pl t3zheng 6092 ' '
muscat01:~/people # ls -ld t3zheng.ldif
-rw-r--r-- 1 root root 277 2008-02-13 16:12 t3zheng.ldif
muscat01:~/people # ./addtoldap.sh t3zheng.ldif
Enter LDAP Password:
adding new entry "uid=t3zheng,ou=people,dc=muscat01,dc=cs,dc=uwaterloo,dc=ca"
muscat01:~/people #
In this situation, where we've set the user's encrypted password to a space, we should immediately change it to something we know. If you need a good method for generating passwords, the apg program is commonly available (although not on our installation). In any event, you should immediately remove the LDIF file that was created in this process.
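For example, resetting the placeholder password for the user added above (this assumes, per the earlier note, that passwd on muscat01 is wired up to update the LDAP entry):

muscat01:~ # passwd t3zheng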
If the user is not a member of the UW community, choose a uid in the sequence starting at 1001. Ideally their userid will not conflict with a UWDir userid, but it's not a disaster if it does. The usual rule for UWDir is FirstInitialMiddleInitialLastname, so Henry Aaron Bloggins would be habloggins, which would then be truncated to habloggi. However, the LDAP configuration on muscat01 does not enforce any particular userid restrictions.
The fastest way to determine which uids are in use on the system is to use slapcat, something like this (note that one needs sudo privileges):
mpatters@muscat01:~> sudo /usr/sbin/slapcat | \
    grep uidNumber | awk '{print $2}' | sort -n
1001
1633
5888
6092
14499
mpatters@muscat01:~>
yast2 actually has two GUIs: one ncurses-based, the other X11. The ncurses tool appears not to be full-featured, which is a problem given that we want to set a user's uid manually; it also does not seem to accept long passwords. Use the ncurses version at your own risk. A quick note about X11 and root/sudo: use the -E option for sudo, then run xauth merge /home/userid/.Xauthority. This allows your new root shell to use your X11 tunnel cookies.
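Concretely, that sequence looks like this (userid being your own login):

% sudo -E -s
# xauth merge /home/userid/.Xauthority
# yast2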
After starting yast2, you should get a window titled ``YaST Control Center''. Click `Security and Users' (on the left) to jump to that section; otherwise you could just scroll down to it. Next, select `User Management'. It will read in a bunch of settings, then drop you into the user administration screen.
You will want to use the `Set Filter' button to restrict the user list to `LDAP Users'; you will be prompted for the LDAP server password, and the display will update to show only the LDAP users.
Click `Add User', then fill in the User Data tab. To set the uid and group memberships, choose the Details tab. (NB: the ncurses yast tool does not seem to allow this.) As an example, the user we're adding is named y6hu, has a uid of 1736, and should be able to start Xen machines; all of that is set under Details.
When you're happy with everything, click `Accept', then `Finish'. Since you've just set the user's password, you should be able to try logging in yourself. If all goes well, then you're set.
To allow a user to use sudo on any stable node, add that user to the stableadmins group on the head node. This can be done using the standard OpenSuSE group admin tools in yast2. This will allow users to run any command on any of the stable nodes.
NB: This does not include the ability to sudo on the head node - for that, add the user to the group wheel on the head node itself.
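A minimal command-line sketch of both group changes, assuming the groups are ordinary local groups manageable with usermod (yast2, as described above, is the documented route):

muscat01:~ # usermod -a -G stableadmins userid    # sudo on the stable nodes
muscat01:~ # usermod -a -G wheel userid           # sudo on the head node itself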
Sometimes adding a user to a group doesn't immediately `take' on client nodes. Rebooting the client node will force it; restarting the nscd service (/etc/init.d/nscd restart) may make it happen more quickly and less destructively. This appears to be an nscd caching issue that has not yet been properly resolved.
RT#61548 covers setting up SystemImager (SI) on muscat01. Things to keep in mind: the machine will need an entry in /etc/dhcpd.conf if it doesn't already have one, as well as in /var/lib/named/master/*, and also look in /var/lib/systemimager/scripts. Copying the format of other machines is fine. The current image is actually called Test02, and the scripts directory looks like this:
mpatters@muscat01:/var/lib/systemimager/scripts> ls -l
total 68
-rw-r--r-- 1 root root   210 2008-02-13 15:00 hosts
lrwxrwxrwx 1 root root    16 2008-02-13 14:27 muscat10.sh -> muscat-stable.sh
lrwxrwxrwx 1 root root    16 2008-02-13 14:27 muscat11.sh -> muscat-stable.sh
lrwxrwxrwx 1 root root    13 2008-03-12 13:23 muscat12.sh -> Test01.master
lrwxrwxrwx 1 root root    16 2008-02-13 14:27 muscat13.sh -> muscat-stable.sh
lrwxrwxrwx 1 root root    16 2008-02-13 14:27 muscat14.sh -> muscat-stable.sh
lrwxrwxrwx 1 root root    13 2008-03-12 12:02 muscat-stable.sh -> Test02.master
drwxr-xr-x 2 root root  4096 2008-02-08 16:00 post-install
drwxr-xr-x 2 root root  4096 2008-02-08 16:00 pre-install
-rw-r--r-- 1 root root 24937 2008-02-13 12:21 Test01.master
-rw-r--r-- 1 root root 24937 2008-03-12 11:58 Test02.master
mpatters@muscat01:/var/lib/systemimager/scripts>
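Pointing an additional node at the stable image is just a matter of copying that symlink pattern; for a hypothetical new client muscatNN:

muscat01:/var/lib/systemimager/scripts # ln -s muscat-stable.sh muscatNN.sh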
To prepare the golden client machine, you need to run something like:

si_prepareclient -server 192.168.143.1 -kernel /boot/vmlinuz

Once that has completed, pull the image from the server with something like:

si_getimage -golden-client 192.168.143.114 -image Test02 -exclude '/media/*' -exclude '/scratch*' -exclude '/home/*' -exclude '/tmp/*'
Previously there have been issues with networking ceasing to function shortly after the node boots up; this has currently been attributed to several factors and has, we believe, been resolved. See the notes in RT 64335, dated 10 October 2008, for more details.
In order to PXE boot a client to re-image it, three services need to be running on muscat01.
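The three services are not named in these notes; on a SuSE-era SystemImager server they would typically be the DHCP daemon, a TFTP server, and the SystemImager rsync daemon. The init script names below are assumptions:

muscat01:~ # /etc/init.d/dhcpd start
muscat01:~ # /etc/init.d/atftpd start
muscat01:~ # /etc/init.d/systemimager-server-rsyncd start

(Run the same scripts with stop to shut them off again.)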
Once you are done imaging your new client(s), shut the services off again, so a machine that is accidentally PXE-booted won't be automatically re-imaged.
If you want to boot a blade off a local ISO image, hit F12 during boot and choose CDROM; if you're booting a floppy disk image, choose local diskette. Make sure the media tray is assigned to the machine you want to boot, even if it's a local image.
The console can sometimes behave very oddly, becoming off-centre or just plain not displaying much at all. Unfortunately, that seems to be partly a consequence of running an unsupported operating system. Sometimes clicking the Paint or Calibrate buttons helps; if not, try changing virtual consoles (Ctrl-Alt-Fx on a Linux host) or changing the blade to which the remote console is attached.
In the diagram, the 192.168 addresses are always assigned to the second interface on the blade; in Linux, it will be known as eth1. Care must be taken to avoid having blades other than 01 turn into bridges onto the private network. Ideally, the eth0 interface will not be brought up on any blade other than the first, although some management software may require that it be. While it is possible to disable the primary (or secondary) interface of individual blades through either the MM or the blade's BIOS, this is not recommended, as it can lead to confusion with respect to device naming.
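A quick way to confirm that eth0 is down on a blade (standard Linux tooling, nothing cluster-specific):

% /sbin/ip link show eth0    # the output should not report state UP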
Not shown here is the connection to the DS4200s. Technically, every blade is able to see it; however, certain storage partitions (the one containing the RAID5s known as home and scratch-net) should never be accessed by any machine but muscat01. Doing otherwise is almost certain to cause catastrophic data loss.
Generally speaking, we should avoid placing blades on the publicly addressable network if at all possible. This reduces the risk of intrusion, as well as the risk of accidentally creating a possibly uncontrolled bridge between the public and private networks.