ClusterToolsUtils < CF


CF Web>Linux>ClusterTools>ClusterToolsUtils (2015-09-22, MikeGore)


-- MikeGore - 2015-08-27

Common management utilities for end users and admins

Other resources

ClusterTools
- Cluster Tools main documentation root
ClusterToolsSetup
- Cluster tools setup and installation walk-through
ClusterToolsScripts
- Documentation, files and scripts for the installation and operation of a new cluster, along with some end user tools

add_users

What:
- Add users from a CSV file, optionally specifying home directory, email address, password and groups
- If the group is admin, the user gets all of the admin groups
See:
- sync-users to sync all user changes to all of the nodes
- usermod
Notes:
- needs root

usermod

usermod - Linux command to add or change groups for a user
Options:
- -a append: normally always use this, unless you intend to replace the user's existing groups, which is unlikely
- -G comma-separated list of group names to add
Admin groups:
- admin, adm, sudo
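
As a rough sketch of the options above, a user might be appended to the admin groups as follows. The function name add_admin_groups and the DRY_RUN switch are hypothetical, introduced only for illustration; by default the function just prints the command it would run.

```shell
#!/bin/bash
# Sketch only: add_admin_groups and DRY_RUN are hypothetical names,
# not part of the cluster tools.

ADMIN_GROUPS="admin,adm,sudo"   # admin groups listed above, comma-separated for -G

add_admin_groups() {
    local user="$1"
    if [ -z "$user" ]; then
        echo "usage: add_admin_groups userid" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command that would be run
        echo "usermod -a -G $ADMIN_GROUPS $user"
    else
        # -a appends to the existing groups instead of replacing them (needs root)
        usermod -a -G "$ADMIN_GROUPS" "$user"
    fi
}
```

For example, `add_admin_groups alice` prints the usermod command, and `DRY_RUN=0 add_admin_groups alice` runs it as root. A sync-users run afterwards would carry the change to the nodes.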

del-user

Delete a user on all of the nodes
Usage:
- del-user userid
Notes:
- needs root perms

sync-users

Sync user accounts, passwords, SSH keys and group settings to all of the nodes
Usage:
- sync-users
- This command can be run at any time, and more than once, without harm
Notes:
- needs root
- Creates SSH keys for a user, but only if they do not have any
- Adds the user's public key to their authorized_keys2 file, so in a cluster they can log into nodes that share /home
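
The key handling described in the notes could look roughly like the sketch below. It is not the sync-users source: the helper name ensure_ssh_key, the RSA key type and the empty passphrase are all assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: ensure_ssh_key is a hypothetical helper, not the sync-users source.
# Assumes an RSA key with an empty passphrase.

ensure_ssh_key() {
    local home_dir="$1"
    local ssh_dir="$home_dir/.ssh"
    local key="$ssh_dir/id_rsa"
    local auth="$ssh_dir/authorized_keys2"

    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

    # Create a key pair only if the user does not have one yet
    if [ ! -f "$key" ]; then
        ssh-keygen -q -t rsa -N "" -f "$key" || return 1
    fi

    # Append the public key once; repeated runs leave the file unchanged
    touch "$auth" && chmod 600 "$auth"
    if ! grep -qxF -- "$(cat "$key.pub")" "$auth"; then
        cat "$key.pub" >> "$auth"
    fi
    # When run as root for another user, the files would also need a
    # chown to that user (omitted here).
}
```

Because both steps check before they act, calling `ensure_ssh_key /home/alice` a second time changes nothing, which matches the note that the command can be run more than once without harm.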

Packaging

finding packages

Example: apt-cache search postgres -n
- Search for postgres in package names only
Example: apt-cache search postgres
- Search for postgres in package names and the entire description

example script to install packages on all of the nodes

/cscf-adm/src/cluster/install-mpi

The script first installs the packages listed in the script on himrod and then on the nodes
The script is only 27 lines long and you will only have to change 2 lines!

NODES and common_vars are pulled in from the search path, in this case /usr/local/bin

(i.e. they do NOT have to be in the current directory)

   #!/bin/bash
   #
   # Mike Gore, 10 Oct 2014
   #
   # Install openmpi on the nodes and headnode

   . common_vars
   . NODES

   update_list
   update_packages netpipe-openmpi openmpi-bin openmpi-checkpoint openmpi-common openmpi-doc

   # Repeat the same steps on each node that answers a ping
   for i in $NODES
   do
      if ping -c 1 "$i" >/dev/null 2>&1
      then
      cat <<EOF | ssh root@"$i"
   . common_vars
   . NODES
   update_list
   update_packages netpipe-openmpi openmpi-bin openmpi-checkpoint openmpi-common openmpi-doc
   EOF
      else
         echo "$i is down"
      fi
   done

Limits

File /etc/security/limits.conf controls user memory, process, file handle, signal and lock limits
WARNING: the default Ubuntu install HAS NO LIMITS set
- This means ANY user can CRASH an Ubuntu system by exhausting system resources!

/etc/security/limits.conf

Defaults

      *          hard    cpu             unlimited
      *          hard    nproc           unlimited
      *          hard    as              unlimited
      *          hard    data            unlimited
      *          hard    sigpending      unlimited
      *          hard    nofile          unlimited
      *          hard    msgqueue        unlimited
      *          hard    locks           unlimited
      *          hard    fsize           unlimited
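
To see which of these limits are actually in effect for a login session, the bash builtin ulimit can be queried. The sketch below prints a few hard limits; the helper name show_hard_limits is hypothetical, and only four of the items above are shown.

```shell
#!/bin/bash
# Sketch only: show_hard_limits is a hypothetical helper name.
# Reports the hard limits of the current shell, not of other users.

show_hard_limits() {
    # Each bash ulimit flag below corresponds to a limits.conf item
    echo "nproc  $(ulimit -H -u)"   # max user processes
    echo "nofile $(ulimit -H -n)"   # max open file descriptors
    echo "as     $(ulimit -H -v)"   # address space (virtual memory)
    echo "cpu    $(ulimit -H -t)"   # CPU time (ulimit reports seconds)
}

show_hard_limits
```

A value of unlimited for nproc or as means a single user can consume that resource without restriction.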

Checking nodes and NFS mounts

/cscf-adm/src/cluster/fix-mount
- Verifies NFS mounts are working - mounts them if not
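
A minimal sketch of the "verify, then mount if needed" idea is shown below. It is not the fix-mount source: the helper name ensure_mounted is hypothetical, and it assumes the directory has an entry in /etc/fstab. Note that mountpoint only checks that something is mounted on the directory, not that the NFS server is responding.

```shell
#!/bin/bash
# Sketch only: ensure_mounted is a hypothetical helper, not the fix-mount source.
# Assumes the directory is listed in /etc/fstab so "mount DIR" works (needs root).

ensure_mounted() {
    local dir="$1"
    if mountpoint -q "$dir"; then
        echo "$dir ok"
    else
        echo "$dir not mounted, mounting"
        mount "$dir"
    fi
}
```

For example, `ensure_mounted /home` on a node reports the state of /home and mounts it only when it is missing.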

check-nodes

Check whether each node is online
Usage:
- check-nodes
Notes:
- This can be used as a common template for performing a task on all nodes
- Checks to see if all of the nodes are online
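
The template idea in the notes can be sketched as a small function that pings each node and runs a task only on the ones that answer. The name for_each_node and its argument layout are hypothetical; the ping test is the same one used in the install-mpi example.

```shell
#!/bin/bash
# Sketch only: for_each_node is a hypothetical name; the real check-nodes
# source is not shown here.

# Usage: for_each_node "node1 node2 ..." command [args...]
# Runs the command with the node name appended, for every node that answers a ping.
for_each_node() {
    local nodes="$1"; shift
    local node
    for node in $nodes
    do
        if ping -c 1 "$node" >/dev/null 2>&1
        then
            "$@" "$node"
        else
            echo "$node is down"
        fi
    done
}
```

With the NODES file sourced, `for_each_node "$NODES" echo up` would give a simple up/down report, and any other command can be substituted for echo.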

Users

sync-users

Grab the settings for all of the non-system users from the headnode
- Create a script that can be run on all of the nodes to reproduce everything
Run this script on each node to make the changes
Notes:
- We use both useradd and usermod: if useradd fails because the user already exists, usermod then fixes the values
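
The useradd-then-usermod pattern can be sketched as a generator that turns passwd-format lines into commands for the nodes. This is an illustration, not the sync-users source: the name emit_user_cmds, the MIN_UID cutoff of 1000 for non-system users and the choice of fields are assumptions, and groups and passwords are left out.

```shell
#!/bin/bash
# Sketch only: emit_user_cmds is a hypothetical name and the passwd-style
# input is an assumption. Groups and passwords are not handled here.

# Reads passwd-format lines on stdin and prints one command line per regular user.
# MIN_UID=1000 is an assumed cutoff for non-system accounts.
emit_user_cmds() {
    local min_uid="${MIN_UID:-1000}"
    local name pw uid gid gecos home shell
    while IFS=: read -r name pw uid gid gecos home shell
    do
        [ "$uid" -ge "$min_uid" ] 2>/dev/null || continue
        [ "$name" = "nobody" ] && continue
        # useradd fails if the account already exists; usermod then fixes the values
        echo "useradd -u $uid -d $home -s $shell $name || usermod -u $uid -d $home -s $shell $name"
    done
}
```

On the headnode, `emit_user_cmds < /etc/passwd` prints the script text, which could then be run on each node as root.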

check-nodes

Ping each node once to see if it is alive - display up/down status

shutdown-all

Shutdown all of the nodes then the headnode

reboot-nodes

Reboots all of the nodes

All documentation below this section is work in progress

As of 27 Aug 2015

fix-mount

Runs mount -a on all nodes

fix-resolv

Fix /etc/resolv.conf using the headnode as a template
- Note: removes 127.0.0.1 before the copy
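
One reading of the note above is that nameserver lines pointing at 127.0.0.1 are dropped from the headnode's file before it is copied. A minimal sketch of that filtering step, with the hypothetical name strip_local_resolver:

```shell
#!/bin/bash
# Sketch only: strip_local_resolver is a hypothetical name, and dropping whole
# "nameserver 127.0.0.1" lines is an assumption about what fix-resolv does.

# Reads a resolv.conf on stdin and drops nameserver lines pointing at 127.0.0.1
strip_local_resolver() {
    grep -v -E '^[[:space:]]*nameserver[[:space:]]+127\.0\.0\.1([[:space:]]|$)'
}
```

For example, `strip_local_resolver < /etc/resolv.conf` on the headnode prints the version that would be sent to the nodes.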

fix-network

Updates
- /etc/hosts
- /etc/hostname
- /etc/resolv.conf
- /etc/udev/rules.d/70-persistent-net.rules
Restarts
- networking service
- nscd service
Mounts
- mount -a

fix-profile

Updates /etc/profile on all NODES

profile

Local copy of /etc/profile used by fix-profile

Software

install-autoupdates

Set up automatic updates of critical files on all of the nodes

install-mpi

OpenMPI task/cpu sharing software
Install OpenMPI on all of the nodes and headnode

install-scheduler

Updates the disk scheduler options on all the nodes and headnode

update-scheduler

Updates the disk scheduler options

save-configs

Save important system config files in /etc/config-backups

sync-packages

Using the node called PACKAGE_MASTER defined in file NODES
- Sync the packages so they are the same on all of the other nodes

GRUB

fix-grub-all

Reinstall and configure GRUB on all of the nodes
- Updates /etc/default/grub

grub-fix

Reinstall and configure GRUB
- Updates /etc/default/grub

Topic revision: r3 - 2015-09-22 - MikeGore

Information in this area is meant for use by CSCF staff and is not official documentation, but anybody who is interested is welcome to use it if they find it useful.
