-- MikeGore - 2018-12-03

tml.cs, tml2.cs and tml3.cs hardware build notes and software configuration

Contacts

  • These machines are owned by Yaoliang Yu
  • Managed by Mike Gore

Remote Admin Access

Local Admin Access

  • tml.cs
    • cscf-adm (2017 password)
  • tml2.cs
    • cscf-adm (2019 password)
  • tml3.cs
    • cscf-adm (2019 password)

TML hardware inventory

  • Motherboard: ASUS WS X299 SAGE
  • CPU: Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz
  • Memory: 64GB (4 x 16GB Corsair CMK32GX4M2A2666C16 2133MHz)
  • Disks: Samsung NVMe 2TB SSD
  • GPU: GeForce RTX 2080 and GeForce GTX 1080

TML2 hardware inventory

  • Motherboard: ASUS X399 Taichi
  • CPU: AMD Ryzen Threadripper 2920X 12-Core Processor
  • Memory: 48GB (3 x 16GB Corsair CMK32GX4M2A2666C16 2133MHz)
  • Disks: Samsung NVMe 1TB SSD
  • GPU: GeForce RTX 2080

TML3 hardware inventory

  • Motherboard: ASUS WS X299 SAGE
  • CPU: Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz
  • Memory: 64GB (4 x 16GB Corsair CMK32GX4M2A2666C16 2133MHz)
  • Disks: Samsung NVMe 1TB SSD
  • GPU:
    • NVIDIA Corporation GV100 TITAN V
    • NVIDIA Corporation GV100 TITAN V

Notes on the original purchase of TML

  • The main constraint in the design of this machine was to permit up to 4 liquid-cooled GPU cards in one chassis.
  • The biggest problem we faced is that most GPU cards are air cooled from the side, so putting 4 of them in every other slot would obstruct the fans.
  • i9 system with 64 GB RAM (expandable to 128 GB)
  • The motherboard needs at least 7 slots and the chassis must have 8 slots (GPU cards are two slots wide)
    • GPU cards plug into slots 1, 3, 5 and 7
  • The build is only cost effective if you plan to add all 4 GPUs; currently the CPU is 35% of the overall cost
  • Overall this system runs quietly: all of the fans are 120mm and, because of the large radiators, they do not need to run fast
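
The slot arithmetic in the notes above can be checked with a short shell sketch. It assumes each card occupies the slot it plugs into plus the next one, which is what "two slots wide" implies. The function name gpu_slots is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: for N double-width cards, print the motherboard slot
# each card plugs into and the neighbouring chassis slot it also covers.
gpu_slots() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        plug=$((1 + 2 * i))      # cards sit in every other slot: 1, 3, 5, 7
        cover=$((plug + 1))      # a two-slot card also covers the next slot
        echo "GPU $((i + 1)): plugs into slot $plug, covers slot $cover"
        i=$((i + 1))
    done
}

gpu_slots 4
```

For 4 cards the last card plugs into slot 7 and covers slot 8, which is where the "7 motherboard slots, 8 chassis slots" requirement comes from.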

Parts for TML

Software Documentation

14Feb2020 - Mike Gore
  • Note - I will be moving the GPU setup scripts to another TWiki page, as they are updated frequently. I push the changes out to all of the TML machines, and to others with GPUs.
Wed Feb 12 13:40:41 EST 2020 IMPORTANT UPDATE tml3 
We now have cuda 10, cuDNN 7.31, tensorflow, pytorch and keras installed
   The packages are highly interdependent on specific cuda versions chosen.
   So we must use anaconda to permit private python environments. 

I created an anaconda environment called "ml"  - for math learning
   FYI: tensorflow and pytorch use "ml" for their installation.

To use the "ml" environment:
    source "/home/gpu-setup"/install_env  - this sets search paths and library paths
    source activate ml         - makes sure that you are in the ml workspace!

I created a Linux system group called ml to permit sharing of code
    Run "/home/gpu-setup"/update_ml_users as root at any time to add all users to the ml group
    Example ml group sharing: chgrp -R ml /home/share;  chmod -R g+w /home/share
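
For reference, a minimal sketch of what a script like update_ml_users could do. This is not the real script: it assumes "all users" means regular accounts with a UID from 1000 to 59999, and the function names are made up. getent and usermod are the standard Linux tools.

```shell
#!/bin/sh
# Sketch only: the real update_ml_users script may select users differently.
# Assumption: "all users" means regular accounts with UID 1000..59999.

# Read passwd-format lines on stdin and print the names of regular users.
list_regular_users() {
    awk -F: '$3 >= 1000 && $3 < 60000 { print $1 }'
}

# Add every regular user to the given group (must be run as root).
add_users_to_group() {
    group=$1
    getent passwd | list_regular_users | while read -r user; do
        usermod -a -G "$group" "$user"   # -a appends, keeps other groups
    done
}

# Usage (as root): add_users_to_group ml
```

Users who are already logged in need to log out and back in before the new group membership takes effect.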

The following directories belong to the "ml" group and all their files have group write permission added
    /usr/local/cuda*
    /usr/local/anaconda3
    "/home/gpu-setup"/cudnn_samples_v7

Installation scripts located in "/home/gpu-setup" were run as follows:
  install_first      - installs all required Ubuntu packages
  install_anaconda   - creates the anaconda ml environment - reboot after this
Note: the following scripts can be run any time if the system gets broken
  install_cuda       - reboot after this
  install_cuDNN      - installs cuDNN
  install_pycuda     - uses ml environment
  install_tensorflow - uses anaconda ml environment
  install_pytorch    - uses anaconda ml environment
  install_keras      - uses anaconda ml environment
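
The run order above could be wrapped in a small runner along these lines. This is a sketch, not one of the scripts in "/home/gpu-setup": the runner, the SETUP_DIR and DRY_RUN variables and the function names are hypothetical, and only the script names and reboot points come from the list above.

```shell
#!/bin/sh
# Sketch of the run order listed above. Script names are taken from the
# list; the runner itself and the DRY_RUN switch are hypothetical.
SETUP_DIR=${SETUP_DIR:-/home/gpu-setup}
DRY_RUN=${DRY_RUN:-1}            # default: only print what would run

# Steps after which the notes say to reboot.
needs_reboot() {
    case $1 in
        install_anaconda|install_cuda) return 0 ;;
        *) return 1 ;;
    esac
}

# Run the given steps in order; stop at a failure or a reboot point.
run_steps() {
    for step in "$@"; do
        echo "run: $SETUP_DIR/$step"
        if [ "$DRY_RUN" != 1 ]; then
            "$SETUP_DIR/$step" || { echo "failed: $step" >&2; return 1; }
        fi
        if needs_reboot "$step"; then
            echo "stop: reboot, then continue with the step after $step"
            return 0
        fi
    done
}

# First pass on a fresh machine; stops at the first reboot point.
run_steps install_first install_anaconda install_cuda install_cuDNN \
          install_pycuda install_tensorflow install_pytorch install_keras
```

After each reboot, call run_steps again with the remaining steps. Set DRY_RUN=0 to actually execute the scripts.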
Tests:
  cd "/home/gpu-setup"
  ./test_cuda
  ./test_pycuda
  ./test_pytorch
  ./test_tensorflow
  ./benchmark_gpu
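
To run all of the tests in one go and get a summary, something like the following could be used. It is a sketch under the assumption that each test script exits non-zero on failure, which these notes do not confirm. The run_tests name and the PASS/FAIL/SKIP output are made up.

```shell
#!/bin/sh
# Sketch: run each test script in a directory and summarise the results.
# Assumes a test signals failure through a non-zero exit status.
run_tests() {
    dir=$1
    shift
    failed=0
    for t in "$@"; do
        if [ ! -x "$dir/$t" ]; then
            echo "SKIP $t (not found or not executable)"
        elif (cd "$dir" && "./$t") >/dev/null 2>&1; then
            echo "PASS $t"
        else
            echo "FAIL $t"
            failed=$((failed + 1))
        fi
    done
    return "$failed"             # exit status = number of failed tests
}

# Usage on a TML machine:
#   run_tests /home/gpu-setup test_cuda test_pycuda test_pytorch \
#             test_tensorflow benchmark_gpu
```

Test output is discarded here to keep the summary short. Rerun a failing script by hand to see its messages.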


TML Pictures

  • TML - with covers off power supply side view:
    IMG_20181203_102802.jpg

  • TML with covers off rear top view:
    IMG_20181203_102818.jpg

  • TML covers off CPU side view:
    IMG_20181203_102831.jpg

Install scripts

  • Note: These scripts are in constant development. Please use the latest versions, which can be found at cscf-adm@asimov.uwaterloo.ca:/cscf-adm/src/gpu-setup

  • install_1st: initial install script; installs a few basic packages

  • install_2nd: installs anaconda, creates the python environment "ml" (for math learning) and installs support packages

  • install_cuda: installs cuda 9.0 and drivers from NVIDIA's site; removes any existing nvidia or cuda drivers

  • install_env: source this file in your shell scripts to set up environment and library paths
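
As an illustration of the kind of thing an environment file like install_env sets up, here is a minimal sketch. It is not the real file. It assumes CUDA is reachable under /usr/local/cuda (these notes only mention /usr/local/cuda*) and anaconda under /usr/local/anaconda3. The path_prepend helper is a made-up name.

```shell
# Sketch of an install_env-style file; the real one may differ.
# Assumptions: CUDA under /usr/local/cuda, anaconda under /usr/local/anaconda3.

# Prepend a directory to a colon-separated list unless it is already there.
# Prints the resulting list. path_prepend is a hypothetical helper.
path_prepend() {
    case ":$2:" in
        *":$1:"*) echo "$2" ;;            # already listed, keep as is
        "::")     echo "$1" ;;            # list was empty
        *)        echo "$1:$2" ;;
    esac
}

PATH=$(path_prepend /usr/local/cuda/bin "$PATH")
PATH=$(path_prepend /usr/local/anaconda3/bin "$PATH")
LD_LIBRARY_PATH=$(path_prepend /usr/local/cuda/lib64 "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

Because path_prepend skips directories that are already listed, sourcing the file more than once in the same shell does not keep growing the paths.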

Topic attachments
Attachment               Rev  Size      Date                Who       Comment
IMG_20181203_102802.jpg  r1   949.7 K   2018-12-03 - 10:38  MikeGore  TML - with covers off power supply side view
IMG_20181203_102818.jpg  r1   1014.8 K  2018-12-03 - 10:39  MikeGore  TML with covers off rear top view
IMG_20181203_102831.jpg  r1   997.0 K   2018-12-03 - 10:40  MikeGore  TML covers off CPU side view
benchmark_gpu            r1   0.2 K     2018-12-03 - 11:09  MikeGore  Cuda benchmarks
common_functions         r1   84.0 K    2018-12-03 - 11:09  MikeGore  support shell functions used in all scripts
install_cuda             r1   2.2 K     2018-12-03 - 11:06  MikeGore  install cuda 9.0 and drivers from NVIDIA's site - removes any existing nvidia or cuda drivers
install_env              r1   1.0 K     2018-12-03 - 11:08  MikeGore  source this file in your shell scripts to set up environment and library paths
install_pytorch          r1   0.8 K     2018-12-03 - 11:07  MikeGore  install pytorch
install_tensorflow       r1   1.8 K     2018-12-03 - 11:07  MikeGore  install tensorflow
test_tensorflow          r1   0.2 K     2018-12-03 - 11:10  MikeGore  test tensorflow
update_ml_users          r1   0.1 K     2018-12-03 - 11:11  MikeGore  add all users to the system group called ml
Topic revision: r8 - 2020-02-14 - DanielAllen