-- MikeGore - 2015-08-27

Common management utilities for end users and admins

Other resources

  • ClusterTools - Cluster Tools main documentation root
  • ClusterToolsSetup
    • Cluster tools setup and installation walk-through
  • ClusterToolsScripts
    • Documentation, files and scripts for the installation and operation of a new cluster, along with some end-user tools


  • What:
    • Add users from CSV; optionally specify home directory, email address, password and groups
    • If the group is admin, the user gets all of the admin groups
  • See:
    • sync-users to sync all user changes to all of the nodes
    • usermod
  • Notes:
    • needs root
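The CSV-driven add step above can be sketched as follows. The column layout, the `users.csv` file name, and the useradd flags are assumptions, not the real tool's format; the sketch only prints the commands so they can be reviewed before being run as root:

```shell
# Hypothetical sketch: read users.csv (user,home,email,password,groups)
# and print the useradd command that would be run for each row.
cat > users.csv <<'EOF'
alice,/home/alice,alice@example.com,secret,admin
EOF
while IFS=, read -r user home email pass groups
do
    echo useradd -m -d "$home" -c "$email" -G "$groups" "$user"
done < users.csv
```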


  • usermod - Linux command to add or change groups for a user
  • Options:
    • -a append - normally always use this unless you intend to replace the user's existing groups, which is unlikely
    • -G list of group names to add
  • Admin groups:
    • admin, adm, sudo
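Putting the two options together, granting a user all three admin groups is one call ("alice" is a placeholder user; the command needs root, so this sketch only prints it for review):

```shell
# Build the usermod invocation that adds a user to all admin groups.
# -a keeps the user's existing supplementary groups; without it,
# the -G list would replace them.
cmd='usermod -a -G admin,adm,sudo alice'
echo "$cmd"
```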


  • Delete a user on all of the nodes
  • Usage:
    • del-user userid
  • Notes:
    • needs root perms


  • sync user accounts, passwords, ssh keys and group setting to all of the nodes
  • Usage:
    • sync-users
    • This command can be run any time, and more than once, without harm
  • Notes:
    • needs root
    • Creates SSH keys for a user - but only if they do not already have any
    • Adds the user's public key to their authorized_keys2 file - so in a cluster they can log into nodes that share /home
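The per-user key step described in the notes above can be sketched like this. The use of mktemp as a stand-in home directory is an assumption for illustration; sync-users operates on real home directories:

```shell
# Sketch of the key-setup step: generate a key only if none exists,
# then append the public key to authorized_keys2 so the user can ssh
# between nodes that share /home.
H=$(mktemp -d)                       # stand-in for a user's home
mkdir -p "$H/.ssh"
if [ ! -f "$H/.ssh/id_rsa" ]         # only generate if no key exists
then
    ssh-keygen -q -t rsa -N "" -f "$H/.ssh/id_rsa"
fi
cat "$H/.ssh/id_rsa.pub" >> "$H/.ssh/authorized_keys2"
chmod 600 "$H/.ssh/authorized_keys2"
```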


finding packages

  • Example: apt-cache search postgres -n
    • Search for postgres in package names only
  • Example: apt-cache search postgres
    • Search for postgres in the entire description

example script to install packages on all of the nodes

  • /cscf-adm/src/cluster/install-mpi
    • The script first installs the packages listed in the script on himrod (the headnode) and then on the nodes
    • The script is only 27 lines long and you will only have to change 2 lines!
    • NODES and "common_vars" are pulled in from the search path - in this case: /usr/local/bin
      • (ie. they do NOT have to be in the current directory)
           # Mike Gore, 10 Oct 2014
           # Install openmpi on the nodes and headnode
           . common_vars
           . NODES
           update_packages netpipe-openmpi openmpi-bin openmpi-checkpoint openmpi-common openmpi-doc
           for i in $NODES
           do
              if ping -c 1 $i >/dev/null 2>&1
              then
                 cat <<EOF | ssh root@"$i"
           . common_vars
           . NODES
           update_packages netpipe-openmpi openmpi-bin openmpi-checkpoint openmpi-common openmpi-doc
           EOF
              else
                 echo $i is down
              fi
           done


  • File /etc/security/limits.conf controls user memory, process, file handle, signal, and lock limits
  • WARNING the default Ubuntu install HAS NO LIMITS set
    • This means ANY user can CRASH an Ubuntu system by running out of system resources!


  • Defaults
          *          hard    cpu             unlimited
          *          hard    nproc           unlimited
          *          hard    as              unlimited
          *          hard    data            unlimited
          *          hard    sigpending      unlimited
          *          hard    nofile          unlimited
          *          hard    msgqueue        unlimited
          *          hard    locks           unlimited
          *          hard    fsize           unlimited
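Given the warning above, a site will normally replace these unlimited defaults with finite caps. The values below are purely illustrative assumptions to show the limits.conf syntax, not recommendations:

```
*          hard    nproc           4096
*          hard    nofile          8192
*          hard    locks           1024
```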

Checking nodes and NFS mounts

  • /cscf-adm/src/cluster/fix-mount
    • Verifies NFS mounts are working - mounts them if not


  • Check whether each node is online
  • Usage:
    • check-nodes
  • Notes:
    • This can be used as a template for performing a task on all of the nodes
    • Checks to see if all of the nodes are online
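The per-node loop used as the template can be sketched as below. The inline NODES list is a placeholder; the real list is sourced from the NODES file:

```shell
# Ping each node once and report up/down status.
NODES="node1 node2"                  # placeholder; normally ". NODES"
for i in $NODES
do
    if ping -c 1 "$i" >/dev/null 2>&1
    then
        echo "$i is up"
    else
        echo "$i is down"
    fi
done
```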



  • Grab the settings for all of the non-system users from the headnode
    • Create a script that can be run on all of the nodes to reproduce everything
  • Run this script on each node to make the changes
  • Notes:
    • We use both useradd and usermod - if useradd fails because the account already exists, usermod fixes the values
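The create-or-update fallback described in the note above follows this pattern. The user name and group list here are placeholders (the real script derives them from the headnode's account data and runs as root), so the sketch only prints the command for review:

```shell
# Print the useradd-then-usermod fallback: if useradd fails because
# the account exists, usermod applies the settings instead.
user=alice
groups=admin,adm,sudo
printf 'useradd -m -G %s %s || usermod -a -G %s %s\n' \
    "$groups" "$user" "$groups" "$user"
```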


  • Ping each node once to see if it is alive - display up/down status


  • Shutdown all of the nodes then the headnode


  • Reboots all of the nodes

all documentation below this section is a work in progress

  • As of 27 Aug 2015


  • runs mount -a on all nodes


  • Fix /etc/resolv.conf using the headnode as a template
    • Note: removes the existing /etc/resolv.conf before the copy


  • Updates
    • /etc/hosts
    • /etc/hostname
    • /etc/resolv.conf
    • /etc/udev/rules.d/70-persistent-net.rules
  • Restarts
    • networking service
    • nscd service
  • Mounts
    • mount -a


  • Updates /etc/profile on all NODES


  • local copy of profile /etc/profile used by fix-profile



  • Setup Automatic updates of critical files on all of the nodes


  • OpenMPI task/cpu sharing software
  • Install OpenMPI on all of the nodes and headnode


  • Updates the disk scheduler options on all the nodes and headnode


  • Updates the disk scheduler options


  • save important system config files in /etc/config-backups


  • Using the nodes called PACKAGE_MASTER defined in file NODES
    • Sync the packages so they are the same on all of the other nodes



  • Reinstall and configure GRUB on all of the nodes
    • Updates /etc/default/grub


  • Reinstall and configure GRUB
    • Updates /etc/default/grub
Topic revision: r3 - 2015-09-22 - MikeGore