Himrod cluster

This cluster belongs to Ashraf Aboulnaga and Hans DeSterck. It was purchased in May 2014 and has been running since October 2014.


role              count  cpu                    memory  disk                 interconnect       other
himrod.cs         1      2x Intel E5-2670 (8C)  32GB    Dell RAID (2TB)      4x 10GbE + 2x GbE
himrod-big-[1-4]  4      2x Intel E5-2670 (8C)  512GB   6x 600GB + 1x 200GB  2x 10GbE + 2x GbE
himrod-[1-23]     23     2x Intel E5-2670 (8C)  256GB   6x 600GB + 1x 200GB  2x 10GbE + 2x GbE
himrod-storage    1      2x Intel E5-2620 (6C)  64GB    Dell RAID (50TB)     2x 10GbE + 2x GbE

Sysadmin notes

Admin tools

  • IMPORTANT: A number of tools are currently located under /cscf-adm/src/cluster. The plan is to move all of them to /usr/local/bin in the very near future.
  • The tools listed below have already been moved there.

add_users file.csv

  • What:
    • Add users from a CSV file; optionally specify home directory, email address, password and groups
    • If the group is admin, the user will get all of the admin groups
    • The script will email the user their password and account information
  • Usage: ./add_users file.csv
  • See:
    • sync-users to sync all user changes to all of the nodes
  • Notes:
    • needs root
    • if the user's email is NOT in uwdir, you must add the full email to the CSV file


del-user userid

  • What:
    • Delete a user on all of the nodes
  • Usage:
    • del-user userid
  • Notes:
    • needs root perms


sync-users

  • What:
    • Sync user accounts, passwords, ssh keys and group settings to all of the nodes
  • Usage:
    • sync-users
    • This command can be run at any time, and more than once, without harm
  • Notes:
    • needs root

Checking nodes and NFS mounts

  • /cscf-adm/src/cluster/fix-mount
    • Verifies NFS mounts are working - mounts them if not
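
A check of this kind can be sketched as follows. This is not the contents of fix-mount; it is a minimal illustration that assumes each mount point has an entry in /etc/fstab and relies on the standard mountpoint and mount commands. The function name ensure_mounted is hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch, not the real fix-mount script.
# Assumes the directory passed in has an entry in /etc/fstab,
# so that "mount <dir>" knows what to mount there.

# Mount the given directory if nothing is mounted there yet.
ensure_mounted() {
    local dir="$1"
    if mountpoint -q "$dir"; then
        echo "$dir is mounted"
    else
        echo "$dir is not mounted, mounting"
        mount "$dir"
    fi
}
```

It would be called once per NFS directory, as root, for example: ensure_mounted /some/nfs/dir (a placeholder path). Note that mountpoint only reports whether something is mounted; it does not prove that the NFS server is responding.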


check-nodes

  • What:
    • Check whether each node is online
  • Usage:
    • check-nodes
  • Notes:
    • The script can also serve as a template for performing a task on all of the nodes
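
The "perform a task on all nodes" pattern might look like the sketch below. It is not the actual check-nodes script. The function names (node_is_up, for_each_node, report_online) are hypothetical, and the default node list is a placeholder; on himrod, NODES would presumably come from common_vars.

```shell
#!/bin/bash
# Hypothetical template, not the real check-nodes script.
# Assumption: NODES is a space-separated list of host names.
# A placeholder list is used here if NODES is not already set.
NODES="${NODES:-himrod-1 himrod-2}"

# Succeed if the host answers a single ping.
node_is_up() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# Run the given command once per reachable node,
# passing the node name as the last argument.
for_each_node() {
    local node
    for node in $NODES; do
        if node_is_up "$node"; then
            "$@" "$node"
        else
            echo "$node is down"
        fi
    done
}

# Example task: report that the node is online.
report_online() {
    echo "$1 is online"
}
```

Running for_each_node report_online prints one line per node. Replacing report_online with another function gives a different task with the same reachability check.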

ILOM management

  • You can do full KVM management of each node (himrod has to be up)
    • This includes powering up, console access, booting remote media from your own desktop, etc.
  • Browse directly to the ILOM interface as listed below
    • Accept the security certificate
    • Login userid: cscf-adm, with the fall 2013 password (see CSCF staff)
  • ILOM interfaces:
    • ilom-himrod-storage
    • ilom-himrod-big-1 ilom-himrod-big-2 ilom-himrod-big-3 ilom-himrod-big-4
    • ilom-himrod-1 ilom-himrod-2 ilom-himrod-3 ilom-himrod-4 ilom-himrod-5 ilom-himrod-6 ilom-himrod-7 ilom-himrod-8
    • ilom-himrod-9 ilom-himrod-10 ilom-himrod-11 ilom-himrod-12 ilom-himrod-13 ilom-himrod-14 ilom-himrod-15 ilom-himrod-16
    • ilom-himrod-17 ilom-himrod-18 ilom-himrod-19 ilom-himrod-20 ilom-himrod-21 ilom-himrod-22 ilom-himrod-23
    • ilom-himrod-switch

Himrod Admin setup overview SAMBA, DNSMASQ, DHCP, FIREWALL, APACHE, NFS, Imaging tools

  • HimrodTools: overview of the setup and installation of services on HIMROD

iDRAC tools and scripts


Imaging a node

PXE booting recovery and imaging tools

Setup and configuration Scripts


finding packages

  • Example: apt-cache search postgres -n
    • Search for postgres in the package names only
  • Example: apt-cache search postgres
    • Search for postgres in the package names and the entire description

example script to install packages on all of the nodes

  • /cscf-adm/src/cluster/install-mpi
    • The script first installs the packages listed in the script on himrod and then on the nodes
    • The script is only 27 lines long, and you only have to change 2 lines (the two update_packages lines) to install different packages
    • NODES and common_vars are pulled in from the search path, in this case /usr/local/bin
      • (i.e. they do NOT have to be in the current directory)
# Mike Gore, 10 Oct 2014
# Install openmpi on the nodes and headnode

# Pull in NODES and update_packages from the search path
. common_vars

# Install on the head node first
update_packages netpipe-openmpi openmpi-bin openmpi-checkpoint openmpi-common openmpi-doc

# Then install on every node that answers a ping
for i in $NODES
do
   if ping -c 1 "$i" >/dev/null 2>&1
   then
      cat <<EOF | ssh root@"$i"
. common_vars
update_packages netpipe-openmpi openmpi-bin openmpi-checkpoint openmpi-common openmpi-doc
EOF
   else
      echo "$i is down"
   fi
done

Disks on nodes

  • Each node has disks mounted with names /localdiskN where N is 0 .. 5
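
To see whether all six local disks are present on a node, a loop over the mount points can be used. The sketch below only assumes the /localdiskN naming described above; the function names localdisk_paths and report_localdisks are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper based on the /localdiskN naming (N = 0 .. 5).

# Print the six local disk mount points, one per line.
localdisk_paths() {
    local n
    for n in 0 1 2 3 4 5; do
        echo "/localdisk$n"
    done
}

# Show free space for each local disk directory that exists on this node.
report_localdisks() {
    local dir
    for dir in $(localdisk_paths); do
        if [ -d "$dir" ]; then
            df -h "$dir" | tail -n 1
        else
            echo "$dir is missing"
        fi
    done
}
```

Running report_localdisks on a node prints one line per disk, either its df usage line or a note that the directory is missing.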



  • I have installed openmpi on himrod and all of the nodes
  • As of 10 Oct 2014, only the configuration step remains to be completed


I have added the following lines to the /etc/security/limits.conf file for now (until I am given a better idea of the other entries we must set).
(I used the /cscf-adm/src/cluster/sync-users script to update this file on the nodes; I ran it as root.)
Note: the nodes may need to be restarted.
A script that does this restart correctly is available under /cscf-adm/src/cluster, called restart_all_nodes.
(It must be run as root.)

# system defaults
*          hard    cpu             unlimited
*          hard    nproc           unlimited
*          hard    as              unlimited
*          hard    data            unlimited
*          hard    sigpending      unlimited
*          hard    nofile          unlimited
*          hard    msgqueue        unlimited
*          hard    locks           unlimited
*          hard    file            unlimited
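
One way to confirm which hard limits a login session actually received is to query them with the bash ulimit builtin. The sketch below is an illustration only; the function name show_hard_limits is hypothetical, and it covers the limits.conf items that have a direct ulimit flag in bash on Linux.

```shell
#!/bin/bash
# Hypothetical check of the hard limits in the current session.
# Each line pairs a limits.conf item name with the bash ulimit flag for it.
show_hard_limits() {
    echo "cpu        $(ulimit -H -t)"
    echo "nproc      $(ulimit -H -u)"
    echo "as         $(ulimit -H -v)"
    echo "data       $(ulimit -H -d)"
    echo "sigpending $(ulimit -H -i)"
    echo "nofile     $(ulimit -H -n)"
    echo "msgqueue   $(ulimit -H -q)"
    echo "locks      $(ulimit -H -x)"
}
```

Calling show_hard_limits in a fresh login session on a node prints one line per item, with either a number or "unlimited" as the value.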
Topic revision: r15 - 2016-10-28 - MikeGore