SGI Altix System Administration 
Dave Wright
out of Eagan, Minnesota
daw@sgi.com

10.15.0.2-7
hostname: tng2-tng7

Course expects some amount of Red Hat background
   - SGI will be switching to Suse
   
SALE - SGI Advanced Linux Environment

EFI - goes with Itanium boxes
SGI 750 - early development boxes
   - not officially supported
   
- ELILO - unique to IA-64
   - elilo.conf
   
- startup and shutdown from a controller

- Configuring ESP

- Path conventions for SCSI and SATA

- 2.4.21 - Propack 3
- 2.6 - Propack 4

- partitioning
   - fdisk (gone with Suse)
   - parted

- XVM
   - stripes, mirrors
   - can partition a drive, but not recommended
   - works in a cluster, CXFS

- XFS
   - from SGI
   - ships with standard Suse
   - Suse on SGI uses standard 

- Performance Co-pilot - PCP
   - bundled on the Altix (licensed on Irix)
- kdb - kernel debugger
   - kernel interrupt information
   - "survival training" - get basic diagnostic info 
- LKCD, lcrash
   - obtain a system dump
   - bundled with Suse now
- module command - manipulate paths
- ethernet interface (eth0)
- SGI Advanced Linux Environment and Propack support issues
   - Suse updates (YOU) - go to sgi.com
   
- System Administration
   - Suse - YAST, YAST2
   - Linux - setup



SGI Installation
   - 4 SALE CDs
      - Disk 1 is rescue disk
   - 2 Propack CDs

   - Docs
      /usr/src/linux/Documentation
      http://techpubs.sgi.com
   
- installer is slightly different from standard Red Hat CD
   - includes support for XFS
   - understand the PROM code

Load the first CD
   - from L1 prompt, issue =reset= command
   - from boot menu, choose EFI Shell
   - choose fs# for CD-ROM
   - From fs# =cd efi\boot=
   - =elilo=
   - skip testing CD
   - custom installation
   - Autopartition
   - ignore partition errors (SGI versions)
   - Remove all partitions
   - install on target1 (not target2) (on this system)
      - 500M /boot/efi
      - 9G - swap - don't make too big (no longer needs to be size of physical memory)
      - 25G - root
      - remove everything on target2
   - network
      - pci - default internet
      - on-board disabled
      - 10.15.0.2 255.255.255.0
      - 
   - package selection
      - use lab manual 
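
Quick check of the address and netmask entered in the network step above. This is only a sketch using Python's standard `ipaddress` module; the helper name `describe` is made up for illustration. It derives the network and broadcast address from 10.15.0.2 / 255.255.255.0.

```python
import ipaddress

# Address and netmask as entered in the installer's network step.
iface = ipaddress.ip_interface("10.15.0.2/255.255.255.0")

def describe(interface):
    """Return (network address, broadcast address, prefix length)."""
    net = interface.network
    return str(net.network_address), str(net.broadcast_address), net.prefixlen

print(describe(iface))
```

The lab hosts 10.15.0.2-7 all fall inside this one /24 network.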
      
- SGI ProPack installation
   - common problems with the 750 installation
      - tty
      - image - add dig
      - mouse - link to /dev/psaux
      - modules
   - /mnt/cdrom/INSTALL
      - install ALL
      - customize software as per lab manual
      
      
      
/.unconfigured
   - prompts to change root passwd
   - runs netconfig, timeconfig, kbdconfig, authconfig, ntsysv
/fastboot, /fsckoptions, /forcefsck, /halt
   - used by /etc/rc.sysinit

System information
   - cat /etc/*rel*
      /etc/sgi-release
      LSB_Version 1.3
      Red Hat Enterprise Linux AS release 3 (Taroon)
      SGI ProPack 3
   - cd /proc/sgi_sn
      cat system_serial_number
   - /etc/sysconfig/networking/eth0_persist
      eth0 08:00:69:13:db:88
   - uname -a
   - lmhostid
      FlexLM host ID
   - Common rpm options
      -qa    query for installed packages
      -ivh   Install packages
      -ql    List files in a package
      -qf    List package that file is from
      -V      Verify package
      -U       upgrade and replace older package
      -e      Erase package
      -qpil   Query rpm file for information
      -F      Upgrade installed packages (only)
      --force  Downgrade (used with -i or -U)
      --last   History of packages
      
   - user accounts
      - useradd - add users
      - (adduser also works in RedHat - not Suse)
      - usermod
      - userdel
      passwd
      chage
      pwconv - creates /etc/shadow
      pwunconv
      
   - useradd
      - Creates home directory
      - copies files from /etc/skel into home
      - group same as username is also created, called a User Private Group
      /etc/default/useradd gives defaults
      /etc/profile.d scripts are run
   - Host info
      /etc/sysconfig/network (RedHat)
      - use =setup= in RedHat
      - use =yast= in Suse
      /etc/sysconfig/network-scripts
         ifcfg-eth0
   
- Hardware
   - I/O slots
   - 6 buses, 2 slots per bus
   - rotate through buses first
   - don't mix card types on same bus
   - IO9 card
   
   - bandwidth vs latency
      
- Kernel modules
   - keeps kernel small
   - modules located in /lib/modules/<kernelname>
      - kernel name must exist
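
A small sketch of the "kernel name must exist" point: the module directory is named after the kernel release string (what `uname -r` prints), so the directory for the running kernel has to be present. The function name `module_dir` is hypothetical.

```python
import os
import platform

def module_dir(kernel_release=None):
    """Return the module directory for a kernel release (default: running kernel)."""
    release = kernel_release or platform.release()  # same string as `uname -r`
    return os.path.join("/lib/modules", release)

path = module_dir()
print(path, "exists" if os.path.isdir(path) else "MISSING")
```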
      

Day 2
=====
Application Performance Tuning

Resources
CPU
memory
disk
cache
network
IPC


Health of System - "shrink"
   sar pcp pmchart topdisk

Quality of Service  "accountant"
 - time to solution
 - top codes, top users

Profiling "efficiency expert"
 - we'll focus on CPU time primarily
    - Floating point, Integer, Branch, inefficiencies
 
 - top application for profiling:
    * histx  http://www.sgi.com/products/software/histx.html
       - similar to "SpeedShop" in Irix
       - SGI working on an open source version of SpeedShop
    Intel: Vtune
    Linux community: PAPI  http://icl.cs.utk.edu/papi/
    NCSA: psrun (similar to histx)
    SGI Propack utility: profile.pl - avoid
    strace
    Ed: prof / gprof - "garbage"
   pfmon
   
---++ Application Behavioural Problems
---+++ Cache misses
 - Cache thrash
    - set associativity
    - avoid with padding
    - avoid array sizes that are a power of 2
 - Stride
    - how we stride through the data
   - TLB misses (http://www.cs.umass.edu/~weems/CmpSci635A/Lecture11/L11.18.html)
   - cache misses
   - columns vs rows
      - can make a significant difference going by rows vs columns or vice versa
      (eg: switch i and j, 1000 secs vs 1700 secs)
   - avoid with:
      - larger pages
      - change stride i,j / j,i
      - transposing
         - re-organize array
  - Cache busting
     - data larger than cache
     - avoid with:
        - blocking (chunking data into cache-sized pieces)
        - multithreading
  - System cache thrash
     - sharing the caches
     - swapping between processes, reloading caches
     - avoid with:
        - dplace - place/pin processes into that CPU set
        - cpuset - private CPU set 
        - page coloring (software solution)
        - do not have processes/threads share a CPU
  - TLB misses
  - Floating Point errors
     - shows up as system time
  - software pipelining
     - multiple instruction pipes
     - make sure that every pipe has something to do
     - compiler will do that, but may need clues / directives
  - False Cache sharing
     - unique to multi processor systems
     - where threads "step on each other" and cause the cpus to refresh their caches
     - 2 cpus writing to the same boundary area
  - Barrier synchronization
     - "#1 problem out there"
   - app taking 40 secs vs 3 weeks ...
   - environment variables to guide how that is done
   - CFQ - Completely Fair Queuing
      - Robert Love
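
Sketch for the stride item above (columns vs rows, i,j vs j,i). It is a toy model, not a measurement: it assumes a row-major (C style) array of 8-byte elements and a 128-byte cache line, and counts how often consecutive accesses land on a different cache line. Fortran stores arrays column-major, so there the roles of the two loop orders are reversed. The function name and parameters are made up for illustration.

```python
def line_switches(rows, cols, order, elem_size=8, line_size=128):
    """Count how often consecutive accesses land on a different cache line.

    The array is assumed to be stored row-major: element (i, j) lives at
    byte offset (i * cols + j) * elem_size.
    order="ij" walks j fastest (unit stride); order="ji" walks i fastest.
    """
    if order == "ij":
        indices = ((i, j) for i in range(rows) for j in range(cols))
    elif order == "ji":
        indices = ((i, j) for j in range(cols) for i in range(rows))
    else:
        raise ValueError("order must be 'ij' or 'ji'")

    switches = 0
    previous_line = None
    for i, j in indices:
        line = ((i * cols + j) * elem_size) // line_size
        if line != previous_line:
            switches += 1
            previous_line = line
    return switches

print(line_switches(256, 256, "ij"))  # unit stride: one switch per cache line
print(line_switches(256, 256, "ji"))  # large stride: every access switches lines
```

In this model the unit-stride walk touches each line once, while the strided walk changes line on every access, which is the effect that swapping i and j (or transposing the array) is meant to remove.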

User
System
Memory Use
I/O wait
   
Module 8 - Application User Time

 - viewing stack
    - idb, gdb, totalview
    - histx - shareware
 - Application Tuning
    - top cpu, top i/o, etc
    - csacms
    
    - top 
       then use options:  C, i, I (Irix mode), fu
       Irix mode - shows %of CPU
    
 - Application Tuning steps
    - let the compiler do the work
    - use existing libraries
    - profile the application
    - recode the expensive algorithms
    - resolve software pipelining
    - tune single threaded first
    - multi-thread and run with dplace or a cpuset
       - SGI's MPT's MPI is NUMA topology aware
    - fix barrier synchronization / load balance problems
    - resolve false cache sharing and data placement
    - do friendly, well formed I/O
 - Compiler choices
    - Intel Compilers
       - ifort - Fortran 77, 90 and 95
       - icc C and C++
       - guideefc, guidec (for OpenMP programs)
       - KAP/PRO OpenMP directives - parallel
       - Vtune analyzer (GUI)
    - Other alternatives
       - gnu tools, gcc, g77 and g++ (from Red Hat)
       - ORC - the Open Research Compiler, based on SGI's Pro64
          http://ipf-orc.sourceforge.net/
 - Compiler Optimization
    - Better runtime performance, but longer compile
       -O0      - no optimization
       -O1      - Local (just within routine)
       -O2      - Extensive but conservative - some swp - Default
       -O3      - Aggressive (prefetch, IPO, LNO)
    - Inter procedure 
 - Profiling Tools
    - gprof
    - profile.pl (Propack)
       - does not work in multi-user environment
       - samples a single CPU
    - pfmon
       - parent of application and monitors hardware counters (only 4 counters at a time)
    - histx
       - Evaluation software
       - runs as parent of application
       - iprep, csrep, lipfpm, samppm
          - report tools (after running histx)
    - strace
       - trace system calls, I/O characteristics
    - dlook
       - what node the pages are on
       - very verbose
    - top, pmap, ps -l
       - pmap: memory map
    - lsof
       - lists open files
       
 Performance Tuning and Optimization Guide
    http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?coll=0650&db=bks&cmd=toc&pth=/SGI_Developer/OrOn2_PfTune
    
    
Compiler Setup
   cat /etc/motd - read notes
   cat /local_pilatus/UsageNotes/Intel_Compilers - read notes
   bash
   source /opt/intel_cc_80/bin/iccvars.sh

Histx Setup
   cd /usr/local/histx
   source histx+.sh
    - INTEL's VTUNE GUI
    /usr/local/histx/doc/doc.txt - doc file
    
 - watching all nodes with a graphical display:
    pmshub
 
GNU gprof experiment
   info gprof
   g77 -pg -O3 -o prog prog.f
   ./prog
   gprof prog gmon.out
   more gmon.sum

Brian Sumner @ SGI - bls@sgi.com for latest histx

Application Tuning Lab
   - histx not working
   - now working - histx 1.2a

Processes
   ASE 2.0     /proc/pid/*
            /proc/tid/*
   ASE 3.0 NPTL   /proc/pid/*      (processes)
               /proc/.tid/*   (threads)
      NPTL - Native POSIX Thread Library
   2.6         /proc/pid/task/tid/*
   
Multi-threading techniques

 - Tightly Coupled
   - OpenMP                     Micro-tasking - SMP aware, not cluster aware
   - preprocessor Auto-tasking         Compiler detects - SMP aware
      - parallel
   - LD_ASSUME_KERNEL=2.4.19         Pre PP3.0 OpenMP
   - MPI (Message Passing Interface)   Macro-tasking - cluster aware
   - SHMEM                        put/get message passing
   
 - pthreads                        POSIX standard
 - clone                        Kernel system call
    - was sproc in older SGI
 - Loosely coupled
    - InterProcess Communication (IPC)
 - Other multi-threading techniques
    - LINDA   - Compiler language
    - PVM - Parallel Virtual Machine
    - MLP - NASA Ames multitasking libraries
 
 Sample app:
    - single thread
       Real: 23s, User: 23s
    - -parallel (64 processors)
       Real: 35s, User: 26m24s  (!)
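
Working through the sample timings above, as a rough sketch. The function and key names are made up for illustration. It turns the two runs into speedup, parallel efficiency, average number of busy CPUs, and total CPU time relative to the single-threaded run.

```python
def parallel_summary(serial_real_s, serial_user_s, parallel_real_s,
                     parallel_user_s, n_cpus):
    """Summarize a parallel run against its single-threaded baseline."""
    speedup = serial_real_s / parallel_real_s  # > 1 means the parallel run is faster
    return {
        "speedup": speedup,
        "efficiency": speedup / n_cpus,                       # fraction of ideal scaling
        "avg_busy_cpus": parallel_user_s / parallel_real_s,   # CPUs kept busy on average
        "cpu_time_ratio": parallel_user_s / serial_user_s,    # CPU cost vs serial run
    }

# Single thread: real 23s, user 23s. -parallel on 64 CPUs: real 35s, user 26m24s.
summary = parallel_summary(23, 23, 35, 26 * 60 + 24, 64)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The speedup comes out below 1 (the parallel run is slower) while burning roughly 69 times the CPU time. One plausible reading, given the barrier synchronization notes, is that much of that user time is threads spinning while they wait.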


Counting Threads
   - ps, top
      H option in top shows threads
      ps -m shows threads
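
The same count can be read straight from /proc. A minimal sketch, assuming a Linux 2.6 or later kernel where each thread appears as a directory under /proc/<pid>/task; the function name `count_threads` is hypothetical.

```python
import os
import threading

def count_threads(pid="self"):
    """Count the threads of a process by listing /proc/<pid>/task."""
    return len(os.listdir(f"/proc/{pid}/task"))

# Start a few idle threads so there is something to count.
stop = threading.Event()
workers = [threading.Thread(target=stop.wait) for _ in range(3)]
for w in workers:
    w.start()

print(count_threads())  # includes the main thread and the idle workers

stop.set()
for w in workers:
    w.join()
```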
      
Multi-threading issues
   - Data Locality
   - Partitioning Data (Chunk Scheduling)
      - load balancing
      - static, dynamic, guided
   - Orchestration (Thread  Scheduling)
      - static or dynamic
   - Communication (Barrier Synchronization)
      - spin or yield
   - Data and Thread Placement
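
Sketch of the chunk scheduling choices listed above. This is a simplified model, not what any particular OpenMP runtime does: static gives each thread one equal contiguous chunk, guided hands out shrinking chunks of about remaining/threads, and dynamic (not shown) hands out fixed-size chunks on demand. Function names are made up for illustration.

```python
import math

def static_chunks(n_iters, n_threads):
    """Split n_iters into one contiguous chunk per thread, as evenly as possible."""
    base, extra = divmod(n_iters, n_threads)
    return [base + (1 if t < extra else 0) for t in range(n_threads)]

def guided_chunks(n_iters, n_threads, min_chunk=1):
    """Shrinking chunks: each is about remaining / n_threads, never below min_chunk."""
    chunks = []
    remaining = n_iters
    while remaining > 0:
        size = max(min_chunk, math.ceil(remaining / n_threads))
        size = min(size, remaining)  # the last chunk cannot exceed what is left
        chunks.append(size)
        remaining -= size
    return chunks

print(static_chunks(100, 4))
print(guided_chunks(100, 4))
```

Static has the least scheduling overhead but balances poorly when iterations cost different amounts; guided starts with big chunks and finishes with small ones so threads tend to reach the barrier closer together.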

Dplace
   dplace -c16-31 -x2
   place the job's processes on CPUs 16-31; -x is a skip mask (bit N set = do not place the (N+1)th process created), so -x2 leaves the 2nd process unplaced
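
Toy model of that dplace command line, to make the two options concrete. It assumes -c takes a CPU list and -x is a bitmask over process creation order (bit N set means the (N+1)th process is not placed), as I understand the dplace man page. The function names and the wrap-around when CPUs run out are assumptions of this sketch, not dplace behavior.

```python
def expand_cpu_list(spec):
    """Expand a CPU list such as "16-31" or "0,2,4-7" into a list of ints."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus

def place(n_procs, cpu_spec, skip_mask=0):
    """Map process creation order (0-based) to a CPU, or None if skipped."""
    cpus = expand_cpu_list(cpu_spec)
    placement = {}
    next_cpu = 0
    for proc in range(n_procs):
        if skip_mask & (1 << proc):
            placement[proc] = None  # not placed; left to the normal scheduler
        else:
            # Wrap-around is an assumption made only to keep the sketch safe.
            placement[proc] = cpus[next_cpu % len(cpus)]
            next_cpu += 1
    return placement

print(place(4, "16-31", skip_mask=2))
```

With a skip mask of 2 the second process created (often a helper or monitor thread rather than a worker) is left unplaced, and the workers fill CPUs 16, 17, 18 and so on.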



Trying to determine the nature of a por

ifort -O3 -g -o code2o3 code2.f -ldl      - compile with symbol table
histx -l -e pm:L2_MISSES@50000 ./code2o3   - check L2_MISSES every 50000 occurrences
iprep *14034                         - show report with source lines
vi code2.f
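
How to read the sample counts from a run like the one above: with `@50000`, each sample stands for roughly 50000 L2 misses, so counts scale by that interval. A small sketch; the per-line sample counts below are hypothetical, not real histx or iprep output, and the function name is made up.

```python
def estimate_events(sample_counts, interval=50000):
    """Scale per-line sample counts to (estimated events, share of all samples)."""
    total_samples = sum(sample_counts.values())
    report = {}
    for line, samples in sample_counts.items():
        share = samples / total_samples if total_samples else 0.0
        report[line] = (samples * interval, share)
    return report

# Hypothetical sample counts per source line (illustration only).
samples = {"code2.f:41": 120, "code2.f:57": 60, "code2.f:88": 20}
for line, (events, share) in estimate_events(samples).items():
    print(f"{line}: ~{events} L2 misses ({share:.0%})")
```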

See: aappl1.0.pdf for documentation on handling these kinds of issues

Summary:

Kinds of Cache misses:
 - cache thrash
    - array results step on cache line
       - set associativity of the chip
       - avoid power-of-2 array sizes / use padding
       D(i) = A(i)+B(i)+C(i)
    - stride
       - TLB misses, walk through array, columns vs rows (i,j vs j,i)
       - transposing array
      
   - cache busting
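
Sketch of why the D(i) = A(i)+B(i)+C(i) case thrashes with power-of-2 array sizes and why padding helps. The cache geometry here (128-byte lines, 256 sets) and the back-to-back array layout are assumptions chosen for illustration, not the specification of any particular chip. The function name is hypothetical.

```python
def sets_touched(n, pad=0, n_arrays=4, elem_size=8,
                 line_size=128, num_sets=256, index=0):
    """Return the cache set index hit by element `index` of each array.

    Arrays are assumed to be laid out back to back in memory, each holding
    n + pad elements. Set index = (address // line_size) % num_sets.
    """
    array_bytes = (n + pad) * elem_size
    result = []
    for k in range(n_arrays):
        addr = k * array_bytes + index * elem_size
        result.append((addr // line_size) % num_sets)
    return result

print(sets_touched(2**16))          # power-of-2 size: every array hits the same set
print(sets_touched(2**16, pad=16))  # one cache line of padding spreads them out
```

When all four arrays map element i to the same set and the set holds fewer than four lines, each access evicts a line another array still needs. Padding each array by one cache line shifts them onto neighbouring sets.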

-- LawrenceFolland - 08 Jun 2005

Topic revision: r2 - 2013-02-11 - DrewPilcher
 