SGI Altix System Administration
Dave Wright
out of Eagan, Minnesota
daw@sgi.com
10.15.0.2-7
hostname: tng2-tng7
Course expects some amount of Red Hat background
- SGI will be switching to Suse
SALE - SGI Advanced Linux Environment
EFI - goes with Itanium boxes
SGI 750 - early development boxes
- not officially supported
- ELILO - unique to IA-64
- elilo.conf
- startup and shutdown from a controller
- Configuring ESP
- Path conventions for SCSI and SATA
- 2.4.21 - Propack 3
- 2.6 - Propack 4
- partitioning
- fdisk (gone with Suse)
- parted
- XVM
- stripes, mirrors
- can partition a drive, but not recommended
- works in a cluster, CXFS
- XFS
- from SGI
- ships with standard Suse
- Suse on SGI uses standard
- Performance Co-Pilot - PCP
- bundled on the Altix (licensed on Irix)
- kdb - kernel debugger
- kernel interrupt information
- "survival training" - get basic diagnostic info
- LKCD, lcrash
- obtain a system dump
- bundled with Suse now
- module command - manipulate paths
- ethernet interface (eth0)
- SGI Advanced Linux Environment and Propack support issues
- Suse updates (YOU) - go to sgi.com
- System Administration
- Suse - YaST, YaST2
- Red Hat - setup
SGI Installation
- 4 SALE CDs
- Disk 1 is rescue disk
- 2 Propack CDs
- Docs
/usr/src/linux/Documentation
http://techpubs.sgi.com
- installer is slightly different from standard Red Hat CD
- includes support for XFS
- understand the PROM code
Load the first CD
- from L1 prompt, issue =reset= command
- from boot menu, choose EFI Shell
- choose the fs# for the CD-ROM
- From fs# =cd efi\boot=
- =elilo=
- skip testing CD
- custom installation
- Autopartition
- ignore partition errors (SGI versions)
- Remove all partitions
- install on target1 (not target2) (on this system)
- 500M /boot/efi
- 9G - swap - don't make too big (no longer needs to be size of physical memory)
- 25G - root
- remove everything on target2
- network
- pci - default internet
- on-board disabled
- 10.15.0.2 255.255.255.0
- package selection
- use lab manual
- SGI ProPack installation
- common problems with the 750 installation
- tty
- image - add dig
- mouse - link to /dev/psaux
- modules
- /mnt/cdrom/INSTALL
- install ALL
- customize software as per lab manual
/.unconfigured
- prompts to change root passwd
- runs netconfig, timeconfig, kbdconfig, authconfig, ntsysv
/fastboot, /fsckoptions, /forcefsck, /halt
- used by /etc/rc.sysinit
System information
- cat /etc/*rel*
/etc/sgi-release
LSB_Version 1.3
Red Hat Enterprise Linux AS release 3 (Taroon)
SGI ProPack 3
- cd /proc/sgi_sn
cat system_serial_number
- /etc/sysconfig/networking/eth0_persist
eth0 08:00:69:13:db:88
- uname -a
- lmhostid
FlexLM host ID
- Common rpm options
-qa query for installed packages
-ivh Install packages
-ql List files in a package
-qf List package that file is from
-V Verify package
-U upgrade and replace older package
-e Erase package
-qpil Query rpm file for information
-F Upgrade installed packages (only)
--force Force the install (e.g. to downgrade)
--last History of packages
- user accounts
- useradd - add users
- (adduser also works in RedHat - not Suse)
- usermod
- userdel
- passwd
- chage
- pwconv - creates /etc/shadow
- pwunconv
- useradd
- Creates home directory
- copies files from /etc/skel into home
- group same as username is also created, called a User Private Group
/etc/default/useradd gives defaults
/etc/profile.d scripts are run
- Host info
/etc/sysconfig/network (RedHat)
- use =setup= in RedHat
- use =yast= in Suse
/etc/sysconfig/network-scripts
ifcfg-eth0
- Hardware
- I/O slots
- 6 buses, 2 slots per bus
- rotate through buses first
- don't mix card types on same bus
- IO9 card
- bandwidth vs latency
- Kernel modules
- keeps kernel small
- modules located in /lib/modules/<kernelname>
- a directory named after the running kernel release (=uname -r=) must exist
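A minimal C sketch of how that directory name is derived, assuming the standard Linux layout /lib/modules/<release>, where <release> is the string uname(2) returns (the same text =uname -r= prints). The function names are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Build "/lib/modules/<release>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
int modules_dir_for(const char *release, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/lib/modules/%s", release);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Same path for the running kernel: uname(2) fills u.release with the
 * string that `uname -r` prints. */
int running_modules_dir(char *buf, size_t len)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return modules_dir_for(u.release, buf, len);
}
```

If the path from running_modules_dir() does not exist, there are no modules installed for the booted kernel and module loading for it will fail.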
Day 2
=====
Application Performance Tuning
Resources
CPU
memory
disk
cache
network
IPC
Health of System - "shrink"
sar pcp pmchart topdisk
Quality of Service "accountant"
- time to solution
- top codes, top users
Profiling "efficiency expert"
- we'll focus on CPU time primarily
- Floating point, Integer, Branch, inefficiencies
- top application for profiling:
* histx http://www.sgi.com/products/software/histx.html
- similar to "SpeedShop" in Irix
- SGI working on an open source version of SpeedShop
Intel: Vtune
Linux community: PAPI http://icl.cs.utk.edu/papi/
NCSA: psrun (similar to histx)
SGI Propack utility: profile.pl - avoid
strace
Ed: prof / gprof - "garbage"
pfmon
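Time to solution is wall-clock time, while profiling concentrates on CPU time, so it helps to measure both. A small C sketch using the POSIX clocks CLOCK_MONOTONIC (wall time) and CLOCK_PROCESS_CPUTIME_ID (CPU time summed over all threads of the process). measure() and spin() are hypothetical helper names; spin() is only a stand-in workload.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Seconds elapsed from a to b. */
static double ts_diff(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Run work(arg) once; report wall-clock time and process CPU time. */
void measure(void (*work)(void *), void *arg, double *wall_s, double *cpu_s)
{
    struct timespec w0, w1, c0, c1;
    clock_gettime(CLOCK_MONOTONIC, &w0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
    work(arg);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
    clock_gettime(CLOCK_MONOTONIC, &w1);
    *wall_s = ts_diff(w0, w1);
    *cpu_s = ts_diff(c0, c1);
}

/* Hypothetical stand-in workload: a busy loop of n iterations. */
void spin(void *arg)
{
    volatile double x = 0.0;
    long n = *(long *)arg;
    for (long i = 0; i < n; i++)
        x += 1.0;
}
```

For a single-threaded job the two numbers should be close. CPU time well above wall time means several threads were busy; wall time well above CPU time points at waiting (I/O, locks, other jobs).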
---++ Application Behavioural Problems
---+++ Cache misses
- Cache thrash
- set associativity
- avoid with padding
- avoid array sizes that are a power of 2
- Stride
- how we stride through the data
- TLB misses (http://www.cs.umass.edu/~weems/CmpSci635A/Lecture11/L11.18.html)
- cache misses
- columns vs rows
- can make a significant difference going by rows vs columns or vice versa
(e.g. switching the i and j loops: 1000 secs vs 1700 secs)
- avoid with:
- larger pages
- change stride i,j / j,i
- transposing
- re-organize array
- Cache busting
- data larger than cache
- avoid with:
- blocking (chunking data into cache-sized pieces)
- multithreading
- System cache thrash
- sharing the caches
- swapping between processes, reloading caches
- avoid with:
- dplace - place/pin processes into that CPU set
- cpuset - private CPU set
- page coloring (software solution)
- do not have processes/threads share a CPU
- TLB misses
- Floating Point errors
- shows up as system time
- software pipelining
- multiple instruction pipes
- make sure that every pipe has something to do
- compiler will do that, but may need clues / directives
- False Cache sharing
- unique to multi processor systems
- where threads "step on each other" and cause the cpus to refresh their caches
- 2 CPUs writing to different data within the same cache line
- Barrier synchronization
- "#1 problem out there"
- app taking 40 secs vs 3 weeks ...
- environment variables to guide how that is done
- CFQ - Completely Fair Queuing (I/O scheduler)
- Robert Love
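A C sketch of the usual fix for the false cache sharing item above: give each thread's frequently written data a cache line of its own. The 128-byte line size is an assumption (check your processor), and the struct and function names are hypothetical. work() is called sequentially in the test only to keep the example self-contained; in a real program each thread would run it with its own tid.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: 128-byte cache line (check your processor). */
#define CACHE_LINE 128
#define NTHREADS   8

/* Prone to false sharing: all eight counters sit within one or two cache
 * lines, so threads updating "their own" element still invalidate each
 * other's cached copy. */
struct shared_counters {
    long count[NTHREADS];
};

/* Padded: each counter occupies a whole cache line of its own. */
struct padded_counter {
    _Alignas(CACHE_LINE) long count;
    char pad[CACHE_LINE - sizeof(long)];
};

static struct padded_counter per_thread[NTHREADS];

/* Body a thread with id `tid` would run; it writes only its own line. */
void work(int tid, long iters)
{
    for (long i = 0; i < iters; i++)
        per_thread[tid].count++;
}

/* Combine the per-thread results once all threads are done. */
long total(void)
{
    long sum = 0;
    for (int t = 0; t < NTHREADS; t++)
        sum += per_thread[t].count;
    return sum;
}
```

The padding costs memory (one line per counter) in exchange for removing coherence traffic between CPUs; requires a C11 compiler for _Alignas.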
User
System
Memory Use
I/O wait
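These are the CPU time categories that top reports. On Linux they come from the aggregate cpu line of /proc/stat, counted in clock ticks. A C sketch that parses the first five fields (user, nice, system, idle, iowait, in the order documented in proc(5)); the iowait field assumes a 2.6-style kernel, and the helper names are hypothetical. Memory use is not part of /proc/stat.

```c
#include <assert.h>
#include <stdio.h>

/* First five fields of the aggregate "cpu" line of /proc/stat, in ticks. */
struct cpu_ticks {
    unsigned long long user, nice, system, idle, iowait;
};

/* Parse a "cpu  ..." line; returns 0 on success, -1 otherwise. */
int parse_cpu_line(const char *line, struct cpu_ticks *t)
{
    int n = sscanf(line, "cpu %llu %llu %llu %llu %llu",
                   &t->user, &t->nice, &t->system, &t->idle, &t->iowait);
    return n == 5 ? 0 : -1;
}

/* Share of one category out of these five fields (later fields ignored). */
double fraction(unsigned long long part, const struct cpu_ticks *t)
{
    unsigned long long total =
        t->user + t->nice + t->system + t->idle + t->iowait;
    return total ? (double)part / (double)total : 0.0;
}

/* Read the live values from /proc/stat (Linux only). */
int read_proc_stat(struct cpu_ticks *t)
{
    char line[512];
    FILE *f = fopen("/proc/stat", "r");
    if (!f)
        return -1;
    int rc = fgets(line, sizeof line, f) ? parse_cpu_line(line, t) : -1;
    fclose(f);
    return rc;
}
```

The counts are totals since boot; for current rates, read the line twice and compute the fractions from the differences.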
Module 8 - Application User Time
- viewing stack
- idb, gdb, totalview
- histx - shareware
- Application Tuning
- top cpu, top i/o, etc
- csacms
- top
then use options: C, i, I (Irix mode), fu
Irix mode - shows % of CPU
- Application Tuning steps
- let the compiler do the work
- use existing libraries
- profile the application
- recode the expensive algorithms
- resolve software pipelining
- tune single threaded first
- multi-thread and run with dplace or a cpuset
- SGI's MPT's MPI is NUMA topology aware
- fix barrier synchronization / load balance problems
- resolve false cache sharing and data placement
- do friendly, well formed I/O
- Compiler choices
- Intel Compilers
- ifort - Fortran 77, 90 and 95
- icc - C and C++
- guideefc, guidec (for OpenMP programs)
- KAP/PRO OpenMP directives - parallel
- Vtune analyzer (GUI)
- Other alternatives
- gnu tools, gcc, g77 and g++ (from Red Hat)
- ORC - the Open Research Compiler, based on SGI's Pro64
http://ipf-orc.sourceforge.net/
- Compiler Optimization
- Better runtime performance, but longer compile
-O0 - no optimization
-O1 - Local (just within routine)
-O2 - Extensive but conservative - some SWP (software pipelining) - Default
-O3 - Aggressive (prefetch, IPO, LNO)
- IPO = interprocedural optimization, LNO = loop nest optimization
- Profiling Tools
- gprof
- profile.pl (Propack)
- does not work in multi-user environment
- samples a single CPU
- pfmon
- runs as parent of the application and monitors hardware counters (only 4 counters at a time)
- histx
- Evaluation software
- runs as parent of application
- iprep, csrep, lipfpm, samppm
- report tools (after running histx)
- strace
- trace system calls, I/O characteristics
- dlook
- what node the pages are on
- very verbose
- top, pmap, ps -l
- pmap: memory map
- lsof
- lists open files
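Software pipelining needs the compiler to prove that loop iterations are independent, and possible pointer aliasing is a common reason it cannot. One such clue in C is the C99 restrict qualifier. A minimal sketch (function names hypothetical): both versions compute y = a*x + y, but only the second promises the compiler that x and y do not overlap. Whether a given compiler actually pipelines either loop depends on the compiler and flags; check its optimization report rather than assuming.

```c
#include <assert.h>
#include <stddef.h>

/* The compiler must assume y may overlap x, which can prevent it from
 * overlapping iterations of this loop. */
void daxpy_plain(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* restrict promises x and y do not overlap, so each iteration is
 * independent. Calling this with overlapping arrays is undefined. */
void daxpy_restrict(size_t n, double a, const double *restrict x,
                    double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

The source-level change is small; the benefit shows up only if the compiler uses the extra freedom at -O2/-O3.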
Performance Tuning and Optimization Guide
http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?coll=0650&db=bks&cmd=toc&pth=/SGI_Developer/OrOn2_PfTune
Compiler Setup
cat /etc/motd - read notes
cat /local_pilatus/UsageNotes/Intel_Compilers - read notes
bash
source /opt/intel_cc_80/bin/iccvars.sh
Histx Setup
cd /usr/local/histx
source histx+.sh
- Intel's VTune GUI
/usr/local/histx/doc/doc.txt - doc file
- watching all nodes with a graphical display:
pmshub
GNU gprof experiment
info gprof
g77 -pg -O3 -o prog prog.f
./prog
gprof prog gmon.out | more
(gprof writes its report to stdout; a gmon.sum file is only created by gprof -s)
Brian Sumner @ SGI - bls@sgi.com for latest histx
Application Tuning Lab
- histx not working
- now working - histx 1.2a
Processes
ASE 2.0 /proc/pid/*
/proc/tid/*
ASE 3.0 NPTL /proc/pid/* (processes)
/proc/.tid/* (threads)
NPTL - Native POSIX Thread Library
2.6 /proc/pid/task/tid/*
Multi-threading techniques
- Tightly Coupled
- OpenMP Micro-tasking - SMP aware, not cluster aware
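- Auto-tasking (preprocessor) - compiler detects the parallelism - SMP aware
- -parallel compiler option
- LD_ASSUME_KERNEL=2.4.19 - needed to run OpenMP binaries built before ProPack 3.0 (selects the older LinuxThreads library instead of NPTL)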
- MPI (Message Passing Interface) Macro-tasking - cluster aware
- SHMEM put/get message passing
- pthreads Posix standard
- clone Kernel system call
- was sproc in older SGI (IRIX)
- Loosely coupled
- InterProcess Communication (IPC)
- Other multi-threading techniques
- LINDA - Compiler language
- PVM - Parallel Virtual Machine
- MLP - NASA Ames multitasking libraries
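A minimal POSIX threads sketch of the tightly coupled style: the loop is split into contiguous slices, each thread sums its slice into a private local variable, and pthread_join() serves as the final synchronization point. The names and the MAX_THREADS limit are hypothetical; compile with -pthread.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 64

struct slice {
    const double *a;
    long lo, hi;        /* half-open range [lo, hi) */
    double partial;     /* written once, by the owning thread */
};

static void *sum_slice(void *p)
{
    struct slice *s = p;
    double acc = 0.0;   /* local accumulator: no shared writes in the loop */
    for (long i = s->lo; i < s->hi; i++)
        acc += s->a[i];
    s->partial = acc;
    return NULL;
}

/* Sum a[0..n) with nthreads threads using static block partitioning.
 * Returns 0 on success, -1 on bad arguments or thread-creation failure. */
int parallel_sum(const double *a, long n, int nthreads, double *out)
{
    pthread_t tid[MAX_THREADS];
    struct slice s[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (int t = 0; t < nthreads; t++) {
        s[t].a = a;
        s[t].lo = n * t / nthreads;
        s[t].hi = n * (t + 1) / nthreads;
        if (pthread_create(&tid[t], NULL, sum_slice, &s[t]) != 0) {
            for (int u = 0; u < t; u++)
                pthread_join(tid[u], NULL);
            return -1;
        }
    }
    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);   /* wait for every thread */
        total += s[t].partial;
    }
    *out = total;
    return 0;
}
```

Accumulating in a local variable and storing the result once keeps the adjacent partial fields from being written repeatedly by different CPUs.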
Sample app:
- single thread
Real: 23s, User: 23s
- -parallel (64 processors)
Real: 35s, User: 26m24s (!)
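The sample numbers can be put in the usual terms, speedup S = T1/Tp and parallel efficiency E = S/p, with p = 64 and 26m24s = 1584 s of user time:

```latex
S = \frac{T_{1}}{T_{64}} = \frac{23\ \mathrm{s}}{35\ \mathrm{s}} \approx 0.66,
\qquad
E = \frac{S}{p} = \frac{0.66}{64} \approx 0.010

\frac{\mathrm{CPU}_{64}}{\mathrm{CPU}_{1}}
  = \frac{26\,\mathrm{m}\,24\,\mathrm{s}}{23\ \mathrm{s}}
  = \frac{1584\ \mathrm{s}}{23\ \mathrm{s}} \approx 68.9
```

The 64-way run is slower in wall time than the serial one while consuming roughly 69 times the CPU time, the pattern the barrier synchronization and load balancing notes warn about; the figures alone do not show which cause applies.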
Counting Threads
- ps, top
H option in top shows threads
ps -m shows threads
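On a 2.6 kernel each thread appears as a subdirectory of /proc/<pid>/task, so counting the entries there gives the thread count, the same information top's H option and ps -m display. A C sketch under that assumption (count_threads is a hypothetical name); on the 2.4 NPTL kernels described under Processes the layout differs.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Count threads of process `pid` by listing /proc/<pid>/task;
 * pid 0 means the calling process. Returns -1 on error. */
int count_threads(int pid)
{
    char path[64];
    if (pid == 0)
        snprintf(path, sizeof path, "/proc/self/task");
    else
        snprintf(path, sizeof path, "/proc/%d/task", pid);

    DIR *d = opendir(path);
    if (!d)
        return -1;
    int n = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL)
        if (e->d_name[0] != '.')      /* skip "." and ".." */
            n++;
    closedir(d);
    return n;
}
```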
Multi-threading issues
- Data Locality
- Partitioning Data (Chunk Scheduling)
- load balancing
- static, dynamic, guided
- Orchestration (Thread Scheduling)
- static or dynamic
- Communication (Barrier Synchronization)
- spin or yield
- Data and Thread Placement
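A C sketch of two of the chunk scheduling policies listed above, as pure index arithmetic. Static gives each thread one contiguous block; guided hands out chunks proportional to the remaining iterations divided by the thread count, so chunks shrink as the loop proceeds. Dynamic (fixed-size chunks taken on demand) needs no formula beyond the chunk size. OpenMP leaves the exact guided formula to the implementation; this is one common formulation, and the function names are hypothetical.

```c
#include <assert.h>

/* Static schedule: thread t of p gets one contiguous block [*lo, *hi)
 * of the n iterations; block sizes differ by at most one. */
void static_block(long n, int p, int t, long *lo, long *hi)
{
    long base = n / p, extra = n % p;
    *lo = t * base + (t < extra ? t : extra);
    *hi = *lo + base + (t < extra ? 1 : 0);
}

/* Guided schedule: next chunk = ceil(remaining / p), but never smaller
 * than min_chunk and never more than what is left. */
long guided_chunk(long remaining, int p, long min_chunk)
{
    long c = (remaining + p - 1) / p;
    if (c < min_chunk)
        c = min_chunk;
    return c < remaining ? c : remaining;
}
```

Static has the lowest overhead but balances load only when iterations cost the same; guided starts with large chunks and finishes with small ones to even out the tail.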
Dplace
dplace -c16-31 -x2
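place the processes on CPUs 16-31; -x2 is a skip mask (bit 1 set), so the 2nd process/thread created is not placed (commonly used to skip an OpenMP/pthreads helper thread)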
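dplace and cpusets restrict which CPUs a process may run on. A C sketch of the comparable Linux system call, sched_setaffinity(), pinning the calling thread to the lowest-numbered CPU its current mask already allows (so it stays inside any cpuset). This illustrates the mechanism only, not how dplace is implemented; pin_to_first_allowed_cpu is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to the first CPU in its current affinity mask.
 * Returns that CPU number, or -1 on failure. */
int pin_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, one;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof one, &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}
```

Pinning keeps a thread's cache contents and memory locality from being lost to migration, which is the point of the dplace/cpuset remedy for system cache thrash.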
Trying to determine the nature of a problem
ifort -O3 -g -o code2o3 code2.f -ldl - compile with symbol table
histx -l -e pm:L2_MISSES@50000 ./code2o3 - check L2_MISSES every 50000 occurrences
iprep *14034 - show report with source lines
vi code2.f
See: aappl1.0.pdf for documentation on handling these kinds of issues
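A C kernel of the kind an L2_MISSES profile tends to flag, with two remedies from the cache-miss notes: walking the array in memory order, and blocking. C stores arrays row-major, the opposite of Fortran, so in C the inner loop should run over the last index. N and the tile size B are illustrative assumptions (N is deliberately not a power of 2), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N 600   /* matrix order: illustrative, deliberately not a power of 2 */
#define B 40    /* tile edge; ASSUMPTION: two B x B tiles fit in cache.
                   N must be a multiple of B in this sketch. */

/* Row-major: element (i, j) of an N x N matrix is a[i * N + j]. */

/* Unit stride: the inner loop walks along a row. */
double sum_rows(const double *a)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i * N + j];
    return s;
}

/* Stride N: each inner step jumps N*8 bytes, so almost every access
 * touches a different cache line and can also miss in the TLB. */
double sum_cols(const double *a)
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i * N + j];
    return s;
}

/* Blocked transpose t = a^T: working on B x B tiles keeps the rows read
 * from a and the columns written to t in cache while a tile is done. */
void transpose_blocked(const double *a, double *t)
{
    for (size_t ii = 0; ii < N; ii += B)
        for (size_t jj = 0; jj < N; jj += B)
            for (size_t i = ii; i < ii + B; i++)
                for (size_t j = jj; j < jj + B; j++)
                    t[j * N + i] = a[i * N + j];
}
```

Profiling sum_cols() against sum_rows() with histx would be expected to show the extra L2 misses, although both return the same sum.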
Summary:
Kinds of Cache misses:
- cache thrash
- array results step on cache line
- set associativity of the chip
- avoid power-of-2 array sizes / use padding
D(i) = A(i)+B(i)+C(i)
- stride
- TLB misses, walk through array, columns vs rows (i,j vs j,i)
- transposing array
- cache busting
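A C sketch of why D(i) = A(i)+B(i)+C(i) thrashes when the arrays are power-of-2 sized and laid out back to back, and how padding helps. It computes which cache set each operand maps to under an assumed geometry of 64-byte lines and 64 sets; real caches differ, and the helper names are hypothetical. With 512-double (4096-byte) arrays all four operands land in one set; one line of padding between arrays spreads them across four sets. If a set has fewer ways than arrays competing for it, the operands evict each other on every iteration.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED geometry for illustration: 64-byte lines, 64 sets, so byte
 * addresses 4096 apart map to the same set. */
#define LINE_BYTES 64u
#define NUM_SETS   64u

/* Which cache set a byte address maps to. */
unsigned cache_set(uintptr_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Arrays A, B, C, D of `len` doubles each, placed back to back starting
 * at `base` with `pad` doubles between them. Returns how many distinct
 * sets A(i), B(i), C(i), D(i) fall into. */
unsigned distinct_sets(uintptr_t base, unsigned len, unsigned pad, unsigned i)
{
    unsigned sets[4], distinct = 0;
    for (unsigned k = 0; k < 4; k++) {
        uintptr_t addr =
            base + ((uintptr_t)k * (len + pad) + i) * sizeof(double);
        sets[k] = cache_set(addr);
        unsigned seen = 0;
        for (unsigned m = 0; m < k; m++)
            if (sets[m] == sets[k])
                seen = 1;
        distinct += !seen;
    }
    return distinct;
}
```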
--
LawrenceFolland - 08 Jun 2005