m160 Bioinformatics Research Cluster

  • This document describes Ming Li's CCF research cluster m160.cs

EDOCS

OS

  • Ubuntu 10.04.1 LTS AMD ALT 64bit
  • See UbuntuImageCreation section on Ubuntu 10.04 for install details

ONESIS

Note - we hope to use this some day - but it is not yet configured

Quotas

  • See UbuntuQuota - the system has quotas set up, but they are not used yet

Access

Accounts

Create all accounts on m160.cs only - See: LocalLinuxAccounts

  1. Log into m160.cs
  2. sudo bash
  3. adduser userid
  4. /root/utils/sync-users
    • This copies the user's password, group, etc. over to the nodes

Software

Backups

Legato Networker

Local Scripts

  • See script /scripts/save - run from /etc/crontab
  • The following files and directories are saved to /save and then tarred and gzipped every week to /archive
  • Files in /archive older than 28 days are removed
root@m160:/scripts# cat save.list
/etc/aliases
/etc/apcupsd/apcupsd.conf
/etc/apt/apt.conf.d/10periodic
/etc/bash.bashrc
/etc/cups/cupsd.conf
/etc/default/apcupsd
/etc/default/dnsmasq
/etc/default/exim4
/etc/default/grub
/etc/default/portmap
/etc/default/snmpd
/etc/default/sysstat
/etc/denyhosts.conf
/etc/dnsmasq.conf
/etc/dnsmasq.hosts
/etc/exports
/etc/fstab
/etc/fuse.conf
/etc/gnome/defaults.list
/etc/group
/etc/gshadow
/etc/host.conf
/etc/hostname
/etc/hosts
/etc/hosts.allow
/etc/hosts.deny
/etc/init.d/networker
/etc/ld.so.conf
/etc/mailname
/etc/modprobe.d/ib.conf
/etc/network/interfaces
/etc/nscd.conf
/etc/nsswitch.conf
/etc/ntp.conf
/etcodefault/samba
/etc/pam.conf
/etc/passwd
/etc/resolv.conf
/etc/rpc
/etc/securetty
/etc/security/limits.conf
/etc/shadow
/etc/smartd.conf
/etc/snmp/snmpd.conf
/etc/sudoers
/etc/sysctl
/etc/X11/xorg.conf
/etc/xinetd.conf

# DIRS
/cscf-adm
/etc/apt
/etc/exim4
/etc/pam.d
/etc/profile
/etc/samba
/etc/ssh
/root
/scripts
/tftpboot
/usr/local/bin
/opt
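
The save, archive and prune steps described above could be sketched roughly as follows. This is not the actual /scripts/save, which is not shown here. The function name, the archive file naming and the skipping of blank and "#" lines are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a save script; NOT the real /scripts/save.
# Usage: save_and_archive LIST SAVE_DIR ARCHIVE_DIR
save_and_archive() {
    list=$1
    save_dir=$2
    archive_dir=$3
    mkdir -p "$save_dir" "$archive_dir" || return 1

    # Copy each listed file or directory into the save area, keeping its
    # full path. Blank lines and "#" lines (such as "# DIRS") are skipped.
    while IFS= read -r entry; do
        case $entry in
            ''|'#'*) continue ;;
        esac
        if [ -e "$entry" ]; then
            dest="$save_dir$(dirname "$entry")"
            mkdir -p "$dest" && cp -Rp "$entry" "$dest"/
        else
            echo "skipping missing path: $entry" >&2
        fi
    done < "$list"

    # Tar and gzip the save area into a dated archive (naming is an assumption).
    tar -czf "$archive_dir/save-$(date +%Y-%m-%d).tar.gz" -C "$save_dir" . || return 1

    # Remove archives older than 28 days.
    find "$archive_dir" -type f -name 'save-*.tar.gz' -mtime +28 -exec rm -f {} +
}
```

On m160 this would be called as: save_and_archive /scripts/save.list /save /archive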

SMART drive tools

See Raid Monitoring Section

  • Package: smartmontools
  • Config: /etc/smartd.conf

Email

  • apt-get install fetchmail bsd-mailx

Security

  • Denyhosts

Subversion

SSHfs

  • apt-get install sshfs
  • See: SshFs

CSVN

  • CollabNet Subversion Edge
  • See /opt/csvn

HADOOP

  • See /opt/hadoop - y63tang's project

Scripts

All scripts in this section are run from m160.cs

  • Locations: /root/utils
  • NODES: a common script that contains a list of the cluster nodes

Setup Auto Update of Security updates on each node

  • Script: /root/utils/auto-updates
  • Provides: configures each node to run security updates - using m160 as the reference

Basic Health Test

  • Script: /root/utils/check-nodes
  • Provides: basic ping and SSH test to each node
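
A basic ping and SSH test of this kind might look like the sketch below. It is not the real /root/utils/check-nodes. The function name, the timeouts and passing node names as arguments (rather than reading the NODES script, whose format is not shown here) are assumptions. The ping options are those of Linux iputils.

```shell
#!/bin/sh
# Hypothetical sketch of a ping + SSH health check; NOT the real check-nodes.
# Usage: check_nodes NODE...
# Prints one "OK" or "FAIL" line per node; returns non-zero if any node failed.
check_nodes() {
    failed=0
    for node in "$@"; do
        # One ICMP echo request, waiting at most 2 seconds for a reply.
        if ! ping -c 1 -W 2 "$node" >/dev/null 2>&1; then
            echo "FAIL $node ping"
            failed=1
        # BatchMode stops ssh from prompting; "true" only proves a login works.
        elif ! ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" true >/dev/null 2>&1; then
            echo "FAIL $node ssh"
            failed=1
        else
            echo "OK $node"
        fi
    done
    return "$failed"
}
```

Example: check_nodes m160-1 m160-2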

List Drive Serial Numbers using smartmontools

  • Script: /root/utils/drives
  • Provides: serial numbers
    • Note: runs a basic ping and SSH test to each node

Syncing Packages between nodes

  • Uses m160-1 as a reference node and duplicates all of its packages on each node
  • Script: /root/utils/sync-packages

Syncing userid or password updates between nodes

  • Provides: syncs account entries, passwords and SSH keys across nodes
    • Adds user SSH keys to their .ssh/authorized_keys2 file
  • Script: /root/utils/sync-users

Remounting nodes

Used if the head node restarts

  • Script: /root/utils/fix-mount

Sync user accounts and ssh keys

  • /root/utils/sync-users
    • uses add_gcos

Update nodes

  • /root/utils/update-nodes
    • Sets up unattended updates on nodes using m160 as the template

Rebooting Nodes

  • /root/utils/reboot-nodes

DNSMASQ - PXE BOOT, DHCP, DNS

  • 30 April 2015 - we switched to the updated PXE BOOT/DNS/DHCP setup
  • See /cscf-adm/src/dnsmasq
  • Provides: PXE BOOT, DNS, DHCP
  • Updating DHCP and DNS information for nodes
    • cd /cscf-adm/src/dnsmasq
    • Edit /cscf-adm/src/dnsmasq/dnsmasq.common.m160
    • Restart: make
  • PXE BOOT /tftpboot/pxes
    • Config: /tftpboot/pxes/pxelinux.cfg/default
    • Mounted ISO boot images: /tftpboot/pxes/iso

Firewall NAT

  • 30 April 2015 - we switched to the updated PXE BOOT/DNS/DHCP setup
  • This critical script allows the nodes to talk to the outside world and limits access to the head node
  • Startup script: service firewall start|stop|status|restart
  • Support scripts:
    • /usr/local/bin/common_host, /usr/local/bin/common_vars
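
For orientation, the NAT part of such a setup on a Linux head node usually comes down to a few iptables rules like those sketched below. This is not the contents of the real firewall script. The function name and the interface names are assumptions, and the rules that limit access to the head node itself are not shown. The function only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical sketch of head-node NAT rules; NOT the real firewall script.
# Usage: nat_rules EXT_IF INT_IF   (interface names are assumptions)
# Line 1 enables forwarding, line 2 masquerades outbound node traffic,
# lines 3-4 allow nodes out and only reply traffic back in.
nat_rules() {
    ext_if=$1
    int_if=$2
    cat <<EOF
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -o $ext_if -j MASQUERADE
iptables -A FORWARD -i $int_if -o $ext_if -j ACCEPT
iptables -A FORWARD -i $ext_if -o $int_if -m state --state ESTABLISHED,RELATED -j ACCEPT
EOF
}
```

Example: nat_rules eth0 eth1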

Samba Shares

  • m160 has a local Samba server configured for its local networks
  • Userid: cscf-adm - 2015 password in safe
  • SMB node images: smb://m160/images

Imaging Nodes

  • See Remote Management
  • See ImageDeploymentAcronis
  • PXE BOOT the node you want to image by powering on or resetting the node
    • Press F12 during the BIOS boot phase when you first see the BIOS screen - keep pressing F12 until
      • - or - from m160
    • ipmipxeboot lom-m160-NN (NN = 1..16)
      • (cscf-adm, 2015 password when prompted)
    • ipmipoweron or ipmireset lom-m160-NN (NN = 1..16)
      • (cscf-adm, 2015 password when prompted)
  • Start Acronis
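
If the ipmipxeboot wrapper is not available, the same effect can probably be had with plain ipmitool, as sketched below. The wrapper's contents are not shown on this page, so this is an assumed equivalent, not a copy of it. The function name and the IPMITOOL override are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a PXE-boot helper built on plain ipmitool;
# NOT the real ipmipxeboot wrapper.
# Usage: pxe_boot_node NN        (NN = 1..16)
# Set IPMITOOL=echo for a dry run that only prints the arguments.
pxe_boot_node() {
    nn=$1
    case $nn in
        [1-9]|1[0-6]) ;;
        *) echo "node number must be 1..16: $nn" >&2; return 1 ;;
    esac
    host="lom-m160-$nn"
    tool=${IPMITOOL:-ipmitool}
    # -E reads the password from the IPMI_PASSWORD environment variable,
    # which keeps it off the command line.
    # First request PXE as the boot device, then reset the node.
    $tool -H "$host" -U cscf-adm -E chassis bootdev pxe &&
    $tool -H "$host" -U cscf-adm -E chassis power reset
}
```

A reset only works on a node that is already powered on. For a node that is off, "chassis power on" would be needed instead of "chassis power reset".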

Acronis Disk Partition notes

  • Before restoring an image on a node, clear the disk partition tables of any previous BIOS, GPT or MBR partition information
    • WARNING: THIS WILL DESTROY DATA ON BOTH DISKS OF A NODE!
    • Open Shell prompt under Acronis Actions Menu
      • dd if=/dev/zero of=/dev/sda bs=1M count=1000
      • dd if=/dev/zero of=/dev/sdb bs=1M count=1000
  • Use ALT F1 to exit shell when you finish

Image Archive Locations and Access

  • smb://m160/images
    • Userid: cscf-adm - 2015 password in safe
    • Current Node Image: node-f-disks_2011_06_01_11_49_00_110D.TIB

Networking

Remote Management

NOTE: port 5 on cs-sw-mc-3015-cs2d is the primary feed for ALL IPMI interfaces in the M160 cluster and controls access to them - it is left DISABLED when not in use for added security!

Power On/Off

IPMI method

IPMI tools Power ON/Off method

20 Oct 2016
  • From any host with ipmitool installed
    • ipmitool -H lom-m160.cs.uwaterloo.ca -U cscf-adm -P <2016 cscf-adm password> power on
      • Replace <> with password
      • Note: All nodes have names lom-m160-1 .. lom-m160-16
  • Script to boot all nodes after m160 is running
    • Become root on linux.cscf, and ssh root@m160.cs
    • ipmipoweron_nodes
      • (ADMIN: cscf-adm, password 2016 version)

Login method to shutdown

  • ssh to cscf-adm@m160.cs
    • Access: cscf-adm - password in safe - 2015
  • cd /root/utils
    • ./shutdown-all

Console Access

Web Access to LOM - Windows

  • Provides: remote console and power management of the cluster
  • Notes: MUST use Windows IE Browser - does NOT work under Linux

IPMI LOM addresses

Updated 17 Jul 2013
IP addresses for the m160 cluster IPMI/LOM interfaces

  • Access: cscf-adm - password in safe - 2016
  • NOTE: ONA port for IPMI access is left DISABLED when not in use for added security!!!
  • m160.cs Headnode
    • 172.19.96.246 lom-m160.cs.uwaterloo.ca lom-m160
  • Nodes:
    • 172.19.96.227 lom-m160-16.cs.uwaterloo.ca lom-m160-16
    • 172.19.96.228 lom-m160-15.cs.uwaterloo.ca lom-m160-15
    • 172.19.96.229 lom-m160-14.cs.uwaterloo.ca lom-m160-14
    • 172.19.96.230 lom-m160-13.cs.uwaterloo.ca lom-m160-13
    • 172.19.96.231 lom-m160-12.cs.uwaterloo.ca lom-m160-12
    • 172.19.96.232 lom-m160-11.cs.uwaterloo.ca lom-m160-11
    • 172.19.96.233 lom-m160-10.cs.uwaterloo.ca lom-m160-10
    • 172.19.96.234 lom-m160-9.cs.uwaterloo.ca lom-m160-9
    • 172.19.96.238 lom-m160-8.cs.uwaterloo.ca lom-m160-8
    • 172.19.96.239 lom-m160-7.cs.uwaterloo.ca lom-m160-7
    • 172.19.96.240 lom-m160-6.cs.uwaterloo.ca lom-m160-6
    • 172.19.96.241 lom-m160-5.cs.uwaterloo.ca lom-m160-5
    • 172.19.96.242 lom-m160-4.cs.uwaterloo.ca lom-m160-4
    • 172.19.96.243 lom-m160-3.cs.uwaterloo.ca lom-m160-3
    • 172.19.96.244 lom-m160-2.cs.uwaterloo.ca lom-m160-2
    • 172.19.96.245 lom-m160-1.cs.uwaterloo.ca lom-m160-1

IPMI View utility - Linux

  • Provides: remote console and power management of the cluster
  • Notes: See IPMI for Ubuntu Linux utility
  • Head Node: lom-m160.cs
  • Nodes: lom-m160-N.cs
  • Access: cscf-adm - password in safe - 2016
  • Documentation: TWIKI page IPMI
  • Start IPMI View utility
  • All LOM interfaces live on network 172.19.96
    • Start IPMI View Search using addresses 172.19.96.227 to 172.19.96.246
    • Save the LOM interfaces the utility finds
    • OK to exit
  • Double click on the node you wish to manage - on left hand side under IPMI Domain
  • Login: cscf-adm - password in safe - 2016
  • Open KVM Console
  • Open the "Soft Keyboard" so you can send special characters that your local OS may intercept

RAID Health Monitoring

Nagios Scripts

  • /usr/local/bin/disk-stats - no options - modified Nagios perl script reports disk array status
root@m160:~/utils# disk-stats
OK - VirtualDrives=2, Degraded=0, Offline=0, PhysicalDevices=39, Disks=36, CriticalDisks=0, FailedDisks=0, MemoryCorrectableErrors=0, MemoryUncorrectableErrors=0
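
A status line in this format can be checked mechanically, for example from cron. The sketch below flags any problem counter that is not zero. The function name and the choice of which counters count as problems are assumptions, not part of disk-stats.

```shell
#!/bin/sh
# Hypothetical helper: flag any non-zero problem counter in a disk-stats line.
# Reads the status line on stdin, prints the offending "Name=value" pairs,
# and returns 1 if any are found, 0 otherwise.
raid_problems() {
    # Split on commas, strip everything up to the last space on each piece
    # (this drops the "OK - " prefix), then test the selected counters.
    tr ',' '\n' | sed 's/^.* //' | awk -F= '
        $1 ~ /^(Degraded|Offline|CriticalDisks|FailedDisks|MemoryUncorrectableErrors)$/ && $2 + 0 > 0 {
            print $1 "=" $2
            bad = 1
        }
        END { exit bad }
    '
}
```

Example: disk-stats | raid_problems || echo "RAID needs attention"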

MegaCli Utility

smartctl utilities

  • /root/utils/drives - lists drive serial numbers via smartctl - part of smartmontools
    • Note: as of 21 Oct 2015, a bug causes one of the drives not to be listed

Fix NFS mounts on nodes

  • /root/utils/fix-mount

Fix Grub on the nodes

  • /root/utils/update-grub-all
    • uses grub-fix

Install MPI on the nodes

  • /root/utils/install-mpi

Hardware and Inventory Section

  • MegaCli64 -PDList -aALL

List All drive serial Numbers

  • MegaCli64 -PDList -aALL | grep "Inquiry Data"
Updated 3 Nov 2011
root@m160:~/utils# MegaCli64 -PDList -aALL | grep "Inquiry Data"
   Inquiry Data: SEAGATE ST32000444SS    00069WM33GWT            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33KW5            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33LG7            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33CX1            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33GSG            
   Inquiry Data: SEAGATE ST32000444SS    00069WM34F8D            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33G6P            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33LY8            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33M19            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33L2C            
   Inquiry Data: SEAGATE ST32000444SS    00069WM345BT            
   Inquiry Data: SEAGATE ST32000444SS    00069WM31XJP            
   Inquiry Data: SEAGATE ST32000444SS    00069WM34FS4            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33KXB            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33KTK            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33LN4            
   Inquiry Data: SEAGATE ST32000444SS    00069WM34DNT            
   Inquiry Data: SEAGATE ST32000444SS    00069WM31W2Y            
   Inquiry Data: SEAGATE ST32000444SS    00069WM34FGR            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33LHQ            
   Inquiry Data: SEAGATE ST32000444SS    00069WM34AAN            
   Inquiry Data: SEAGATE ST32000444SS    00069WM345E6            
   Inquiry Data: SEAGATE ST32000444SS    00069WM34BEK            
   Inquiry Data: SEAGATE ST32000444SS    00069WM345FX            
   Inquiry Data: SEAGATE ST32000444SS    00069WM347EN            
   Inquiry Data: SEAGATE ST32000444SS    00069WM31Y3G            
   Inquiry Data: SEAGATE ST32000444SS    00069WM34AWW            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33E8F            
   Inquiry Data: SEAGATE ST32000444SS    00069WM34C3J            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33LWS            
   Inquiry Data: SEAGATE ST32000444SS    00069WM34H9D            
   Inquiry Data: SEAGATE ST32000444SS    00069WM2XK5C            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33D50            
   Inquiry Data: SEAGATE ST32000444SS    00069WM34K29            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33LXA            
   Inquiry Data: SEAGATE ST32000444SS    00069WM33CH4            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P068XV            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P06ZSJ            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P05DS9            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P073CP            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P06EV3            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P06ZTV            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P06ZTC            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P05DLA            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P072H2            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P06EVR            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P04Z2Z            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P06JDV            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P067YK            
   Inquiry Data: SEAGATE ST2000NM0001    0001Z1P0600E        
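
For inventory checks, a listing like the one above can be summarised per drive model. The sketch below is a hypothetical helper, not one of the scripts in /root/utils. It assumes the field layout shown above: vendor, then model, then a combined firmware and serial field.

```shell
#!/bin/sh
# Hypothetical helper: count drives per model from "Inquiry Data" lines on stdin.
# Assumes the layout shown above, where the model is the fourth
# whitespace-separated field.
count_drive_models() {
    awk '/Inquiry Data:/ { count[$4]++ }
         END { for (m in count) print count[m], m }' | sort -k2
}
```

Example: MegaCli64 -PDList -aALL | count_drive_models

For the listing above this reports 14 ST2000NM0001 drives and 36 ST32000444SS drives.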
   

PO with parts summary

  • quote_1510258_4-1.pdf: PO with parts summary

APC SUA3000XLT UPS

AP5017 APC LCD/KVM

File Server - SuperChassis Storage Extra High-Density 4U Storage Chassis

  • Monitoring: smartmontools
  • Inventory CS006522
    • Name: m160.cs
    • IPMI: lom-m160.cs

Hardware and Documents

Switches

Compute Nodes - Supermicro 6016TT-TF Twin 1U Intel Xeon 5600/5500 Series

Hardware

Inventory and Access and LOM

Hardware and Documents

HP ProCurve 2910al-24G Ethernet Switch

Hardware and Documents

DES-1024D 24-port 10/100 Desktop/Rackmount Switch

Topic attachments

  • quote_1510258_4-1.pdf (PDF, 1628.5 K, 2011-03-29 15:04, MikeGore): PO with parts summary
Topic revision: r30 - 2016-11-28 - MikeGore
 