-- Mike Gore - 2020-08-05

Linux CUDA, cuDNN, NVIDIA and TensorFlow GPU install

  • Note: As of Ubuntu 20.04 we are switching to Docker
  • Why?
    • The Ubuntu packages and dependencies for the NVIDIA driver, CUDA and related utilities change and break as often as the weather, so what works today may not work in a few weeks, even if you install exactly the same packages by name. Worse, older drivers are removed, so you must then reevaluate new dependencies that eventually tie to user applications like TensorFlow. TensorFlow has very specific requirements for cuDNN and CUDA versions, which makes things even worse. Shortly after 20.04 came out, only a few packages needed to be installed and everything just worked; now it is far from simple.

Overview

  • This document is broken down into two sections
  1.) Administrative software and driver install
    • CSCF RSG does this if you specifically request a machine with GPU setup from us
  2.) End user install, private to their profile: set up a virtual environment and install TensorFlow

References

Docker References

Administrative section software and driver installation

  • Note: the dependency and file names seem to change every few weeks, so the examples below may quickly go out of date
    • It is important to know in advance which versions of CUDA your tools will need. For example, TensorFlow is very sensitive to versions

Ubuntu 20.04LTS easy setup with Docker install script

Updated and tested on Jan 9th 2021, Mike Gore
  • Aside: The original simple package install actions no longer work. The package dependencies were altered, so this method is no longer viable
  • Solution: we are using Docker install scripts that will do EVERYTHING, including NVIDIA drivers, CUDA and all package prerequisites

  • install_nvidia_docker: Install GPU Docker container and support packages: CUDA, NVIDIA tools and Python support
    • This will install the prerequisite packages and the correct NVIDIA drivers, then install Docker with images for testing

  • run_docker_tests: Run tests on GPU Docker container and support packages: CUDA, NVIDIA tools and Python support
    • This will run basic tests on the containers
    • Please consult this file for running applications with the installed containers

docker images
REPOSITORY              TAG             IMAGE ID       CREATED         SIZE
nvidia/cuda             10.1-base       c06d556b5f80   3 months ago    105MB
tensorflow/tensorflow   2.1.0-gpu-py3   e2a4af785bdb   12 months ago   4.11GB
hello-world             latest          bf756fb1ae65   12 months ago   13.3kB
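The contents of run_docker_tests are not reproduced on this page. As a rough sketch only, a GPU check against the images listed above might look like the following. It assumes Docker 19.03 or later with the NVIDIA container toolkit installed (which is what provides the --gpus option); the helper name run_gpu and the DRY_RUN variable are hypothetical and not part of the attached scripts.

```shell
# run_gpu IMAGE [COMMAND...]: run COMMAND inside IMAGE with all host GPUs
# exposed to the container. Hypothetical helper, not taken from the
# attached scripts. Set DRY_RUN=1 to print the command instead of running it.
run_gpu() {
    image="$1"
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "docker run --rm --gpus all $image $*"
    else
        docker run --rm --gpus all "$image" "$@"
    fi
}
```

For example, "run_gpu nvidia/cuda:10.1-base nvidia-smi" should list the GPU from inside the container if the driver and container toolkit are working; the same helper can be pointed at the tensorflow/tensorflow:2.1.0-gpu-py3 image.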

Ubuntu 20.04LTS Docker tests

Ubuntu 20.04LTS remove all nVIDIA, CUDA and docker and docker images to prepare for a clean install

  • remove_docker_nvidia: Purge all GPU Docker containers and support packages: CUDA, NVIDIA tools and Python support
    • Warning: this will remove all drivers, containers and packages related to CUDA, NVIDIA and Docker
    • This was used on the test machine to successfully remove the Docker images, CUDA driver and NVIDIA drivers to prepare for a clean install
    • Note: this script will remove existing Docker images and programs; please update the script if that is not desired

Ubuntu 18.04LTS

  • This document has been tested for installing CUDA, NVIDIA drivers and TensorFlow on Ubuntu 18.04LTS

Ubuntu 16.04 and 18.04LTS

Hardware Requirements

  • nVidia GPU card
  • CPU with AVX support
  • Open a terminal window
    • Activities -> Search -> term
      • grep -i avx /proc/cpuinfo
        • You need to see a line with avx in it
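The grep above also matches avx2 or avx512 entries. A slightly stricter check, as a minimal sketch, matches avx only as a whole word on the flags line. The function name has_avx is hypothetical, and the flags line format is that of x86 Linux.

```shell
# has_avx [FILE]: succeed if the first "flags" line of a cpuinfo-style file
# lists the avx feature. FILE defaults to /proc/cpuinfo.
has_avx() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches "avx" as a whole word, so avx2 or avx512f alone do not count
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw 'avx'
}

if has_avx; then
    echo "AVX supported"
else
    echo "AVX not found in /proc/cpuinfo"
fi
```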

Software Requirements

  • Open a terminal window
    • Activities -> Search -> term
      • For future quick access -> right click on the terminal icon now on your task bar and pick Add to favorites
  • sudo bash
    • This gives you a root shell
    • It will ask you for your normal login password
  • ubuntu-drivers devices
    • Note the recommended driver name and install it
    • Example: apt-get install nvidia-driver-440
      • Should work for 18.04
    • Example: apt-get install nvidia-driver-455
      • Should work for 20.04
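Picking out the recommended driver can be scripted. The sketch below assumes that ubuntu-drivers devices prints lines of the form "driver   : nvidia-driver-440 - distro non-free recommended"; that format may differ between releases, and the function name recommended_driver is hypothetical. It only prints the install command rather than running it.

```shell
# recommended_driver: read `ubuntu-drivers devices` output on stdin and print
# the package name from the first driver line tagged "recommended".
# Assumes the layout "driver : <package> - <notes> recommended".
recommended_driver() {
    awk '/^driver/ && /recommended/ { print $3; exit }'
}

# Only query the system if the tool is actually installed.
if command -v ubuntu-drivers >/dev/null 2>&1; then
    pkg=$(ubuntu-drivers devices 2>/dev/null | recommended_driver)
    if [ -n "$pkg" ]; then
        echo "apt-get install $pkg"
    fi
fi
```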

  • Reboot the machine before continuing to allow the drivers to load

Useful prerequisites for coding and development

apt-get install aptitude vim gdebi linux-headers-$(uname -r) curl apt-transport-https build-essential binutils gdb coreutils dpkg-dev autoconf automake make cmake patch git rcs subversion pylint python-dev python2.7-dev python3-dev swig libcupti-dev golang python-opengl python3-msgpack python-setuptools libboost-python-dev libboost-thread-dev libboost-all-dev tmux htop unzip bzip2 gzip p7zip-full p7zip-rar zip tar cabextract

CUDA, cuDNN, NVIDIA install

Note: I assume you have third-party driver support enabled in the Ubuntu Software center
  • sudo bash

18.04

  • OS=ubuntu1804
       wget https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.pin 
       sudo mv cuda-${OS}.pin /etc/apt/preferences.d/cuda-repository-pin-600
       sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/7fa2af80.pub
       sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/ /"
       sudo apt-get update
       
  • apt-get install nvidia-cuda-toolkit nvidia-cuda-gdb nvidia-cuda-doc python3-pycuda python-pycuda-doc python3-pycuda-dbg python3-numpy
  • apt-get install cuda-libraries-dev-10-1 cuda-libraries-10-1 libcublas10 libcublas-dev libcudnn7-dev libcudnn7
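The OS value used in the repository URLs above is set by hand. As a sketch, it could instead be derived from /etc/os-release, assuming the NVIDIA repository directory is named by joining ID and VERSION_ID with the dot removed (ubuntu + 18.04 gives ubuntu1804). The function name cuda_repo_os is hypothetical.

```shell
# cuda_repo_os [FILE]: print the repository directory name (e.g. ubuntu1804)
# built from the ID and VERSION_ID fields of an os-release file.
# FILE defaults to /etc/os-release.
cuda_repo_os() {
    # Source the file in a subshell so its variables do not leak into the caller
    ( . "${1:-/etc/os-release}" && echo "${ID}${VERSION_ID}" | tr -d '.' )
}

if [ -r /etc/os-release ]; then
    OS=$(cuda_repo_os)
    echo "https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/"
fi
```

Check that the printed URL actually exists before using it, since not every release has a matching repository.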

End User instructions for installing of TensorFlow GPU in a virtual environment

  • You MUST always use a Python3 virtual environment, otherwise you risk damaging the system-wide Python installation
  • Note: this installation can be done as a normal user in a terminal window
  • Open a terminal window
    • Activities -> Search -> term
  • Create a virtual environment called venv in your current directory
    • python3 -m venv --system-site-packages ./venv
    • source venv/bin/activate
    • pip install --upgrade pip
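To guard against running pip outside the virtual environment by accident, a small check can be run first. This sketch relies on the activate script setting the VIRTUAL_ENV variable; the function name in_venv is hypothetical.

```shell
# in_venv: succeed only if a Python virtual environment is active in this
# shell. venv/bin/activate exports VIRTUAL_ENV; deactivate unsets it.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "Using virtual environment: $VIRTUAL_ENV"
else
    echo "No virtual environment is active; run: source venv/bin/activate" >&2
fi
```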

20.04LTS

    • This no longer works - use Docker method
    • pip install tensorflow-gpu==2.4

18.04LTS

    • pip install tensorflow-gpu==2.0

Testing

  • For 20.04LTS consult
  • Make sure your virtual environment is activated
  • Open a terminal window
    • Activities -> Search -> term
  • Activate venv
    • source venv/bin/activate
  • python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

Topic attachments
Attachment              History          Size    Date                 Who        Comment
install_nvidia_docker   r5 r4 r3 r2 r1   5.2 K   2021-01-09 - 17:36   MikeGore   Install GPU Docker container and support packages: CUDA, NVIDIA tools and Python support
remove_docker_nvidia    r1               1.5 K   2021-01-09 - 17:36   MikeGore   Purge all GPU Docker containers and support packages: CUDA, NVIDIA tools and Python support
run_docker_tests        r2 r1            1.6 K   2021-01-09 - 18:18   MikeGore   Run tests on GPU Docker container and support packages: CUDA, NVIDIA tools and Python support
Topic revision: r13 - 2021-01-14 - MikeGore
 