CentOS7にTensorFlow(Ver:1.13.1)をBuildインストール(CUDA:10.1)

以前にも導入記録を残しているが(CentOS7(Ver:7.5)にTensorFlow(Ver:1.8.0)導入)、
時間が経過してバージョンが上がり、以前の手順では導入できなかったので再度まとめる。

※前提としてCUDA(Version 10.1)がインストール済みのこと

【目次】

  1. 環境
  2. Pythonのインストール(pyenv + Anaconda)
  3. cuDNNのインストール
  4. Bazelのインストール
  5. TensorFlowのインストトール

1.環境

  • OS : CentOS 7.6
  • CUDA : 10.1
  • cuDNN : 7.5
  • Bazel : 1.19.2
  • TensorFlow : 1.13.1
  • Python : 3.6.0

2.Pythonのインストール

下記サイトの「pyenvのインストール」〜「Pythonのインストール」までを実施

CentOS7にTensorFlowをインストールする方法

3.cuDNNのインストール

これをしないとビルド時にエラーとなるので導入

下記のページの「Download cuDNN」からダウンロードする。
今回は「Download cuDNN v7.5.0 (Feb 25, 2019), for CUDA 10.1」を選択した。
※ダウンロードするためにはメンバーシップへの参加が必要です。

https://developer.nvidia.com/cudnn

ダウンロードしたファイル(cudnn-10.1-linux-x64-v7.5.0.56.tgz)を展開して配置する。

$ tar zxvf cudnn-10.1-linux-x64-v7.5.0.56.tgz
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.5.0
cuda/lib64/libcudnn_static.a
$ sudo cp ./cuda/include/* /usr/local/cuda-10.1/include/
$ sudo cp ./cuda/lib64/* /usr/local/cuda-10.1/lib64/

— 2019/05/24 追記 —
上記の方法だとldconfigを実行すると次の警告が発生するため、修正する。
ldconfig: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7 はシンボリックリンクではありません
以下、修正方法。修正後にldconfigを実行して警告がでなければOK

$ cd /usr/local/cuda-10.1/targets/x86_64-linux/lib
$ sudo rm libcudnn.so.7 libcudnn.so
$ sudo ln -s libcudnn.so.7.5.0 libcudnn.so.7
$ sudo ln -s libcudnn.so.7 libcudnn.so

———————-

4.Bazelのインストール

今回はリポジトリからのインストールを行わない。(リポジトリに存在するバージョンが新しすぎるため)
下記のページから「bazel-0.19.2-installer-linux-x86_64.sh」をダウンロードする。(結構時間がかかりました)

https://github.com/bazelbuild/bazel/releases

Bazelインストール

# bash bazel-0.19.2-installer-linux-x86_64.sh 
Bazel installer
---------------

Bazel is bundled with software licensed under the GPLv2 with Classpath exception.
You can find the sources next to the installer on our release page:
   https://github.com/bazelbuild/bazel/releases

# Release 0.19.2 (2018-11-19)

Baseline: ac880418885061d1039ad6b3d8c28949782e02d6

Cherry picks:

   + 9bc3b20053a8b99bf2c4a31323a7f96fabb9f1ec:
     Fix the "nojava" platform and enable full presubmit checks for
     the various JDK platforms now that we have enough GCE resources.
   + 54c2572a8cabaf2b29e58abe9f04327314caa6a0:
     Add openjdk_linux_archive java_toolchain for nojava platform.
   + 20bfdc67dc1fc32ffebbda7088ba49ee17e3e182:
     Automated rollback of commit
     19a401c38e30ebc0879925a5caedcbe43de0028f.
   + 914b4ce14624171a97ff8b41f9202058f10d15b2:
     Windows: Fix Precondition check for addDynamicInputLinkOptions
   + 83d406b7da32d1b1f6dd02eae2fe98582a4556fd:
     Windows, test-setup.sh: Setting RUNFILES_MANIFEST_FILE only when
     it exists.
   + e025726006236520f7e91e196b9e7f139e0af5f4:
     Update turbine
   + 5f312dd1678878fb7563eae0cd184f2270346352:
     Fix event id for action_completed BEP events
   + f0c844c77a2406518c4e75c49188390d5e281d3d:
     Release 0.19.0 (2018-10-29)
   + c3fb1db9e4e817e8a911f5b347b30f2674a82f7c:
     Do not use CROSSTOOL to select cc_toolchain
   + 8e280838e8896a6b5eb5421fda435b96b6f8de60:
     Windows Add tests for msys gcc toolchain and mingw gcc toolchain
   + fd52341505e725487c6bc6dfbe6b5e081aa037da:
     update bazel-toolchains pin to latest release Part of changes to
     allow bazelci to use 0.19.0 configs. RBE toolchain configs at or
     before 0.17.0 are not compatible with bazel 0.19.0 or above.
   + eb2af0f699350ad187048bf814a95af23f562c77:
     Release 0.19.1 (2018-11-12)
   + 6bc452874ddff69cbf7f66186238032283f1195f:
     Also update cc_toolchain.toolchain_identifier when
     CC_TOOLCHAIN_NAME is set
   + f7e5aef145c33968f658eb2260e25630dc41cc67:
     Add cc_toolchain targets for the new entries in the default
     cc_toolchain_suite.
   + 683c302129b66a8999f986be5ae7e642707e978c:
     Read the CROSSTOOL from the package of the current cc_toolchain,
     not from --crosstool_top

- Fixes regression #6662, by fixing tools/cpp/BUILD
- Fixes regression #6665, by setting the toolchain identifier.
- CROSSTOOL file is now read from the package of cc_toolchain, not from the
  package of cc_toolchain_suite. This is not expected to break anybody since
  cc_toolchain_suite and cc_toolchain are commonly in the same package.

## Build informations
   - [Commit](https://github.com/bazelbuild/bazel/commit/6d6633a)
Uncompressing.......

Bazel is now installed!

Make sure you have "/usr/local/bin" in your path. You can also activate bash
completion by adding the following line to your ~/.bashrc:
  source /usr/local/lib/bazel/bin/bazel-complete.bash

See http://bazel.build/docs/getting-started.html to start a new project!
$ echo "source /usr/local/lib/bazel/bin/bazel-complete.bash" >> ~/.bashrc

5.TensorFlowのインストール

ビルド前の準備

$ sudo ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp

$ sudo ln -s /usr/lib64/libcublas.so.10.1.0.105 /usr/local/cuda-10.1/lib64/libcublas.so.10.1.0.105
$ sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10.1.0.105 /usr/local/cuda-10.1/lib64/libcublas.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10.1 /usr/local/cuda-10.1/lib64/libcublas.so
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcusolver.so.10.1.105 /usr/local/cuda-10.1/lib64/libcusolver.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcurand.so.10.1.105 /usr/local/cuda-10.1/lib64/libcurand.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcufft.so.10.1.105 /usr/local/cuda-10.1/lib64/libcufft.so.10.1

ビルド開始 ※結構時間かかります…

$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout refs/tags/v1.13.1
$ ./configure
# 必要に応じてYes Noを選択
# 今回はCUDAをYesにしてバージョンを10.1に
# また、cuDNN に7.5を指定した
WARNING: Running Bazel server needs to be killed, because the startup options are different.
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.2 installed.
Please specify the location of python. [Default is /home/yuya/.pyenv/versions/anaconda3-4.3.0/bin/python]:             


Found possible Python library paths:
  /home/yuya/.pyenv/versions/anaconda3-4.3.0/lib/python3.6/site-packages
Please input the desired Python library path to use.  Default is [/home/yuya/.pyenv/versions/anaconda3-4.3.0/lib/python3.6/site-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: 
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: 
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]: 10.1


Please specify the location where CUDA 10.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.5


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 


Do you wish to build TensorFlow with TensorRT support? [y/N]: 
No TensorRT support will be enabled for TensorFlow.

Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]: 


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]: 


Do you want to use clang as CUDA compiler? [y/N]: 
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 


Do you wish to build TensorFlow with MPI support? [y/N]: 
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: 


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: 
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
	--config=mkl         	# Build with MKL support.
	--config=monolithic  	# Config for mostly static monolithic build.
	--config=gdr         	# Build with GDR support.
	--config=verbs       	# Build with libverbs support.
	--config=ngraph      	# Build with Intel nGraph support.
	--config=dynamic_kernels	# (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
	--config=noaws       	# Disable AWS S3 filesystem support.
	--config=nogcp       	# Disable GCP support.
	--config=nohdfs      	# Disable HDFS support.
	--config=noignite    	# Disable Apacha Ignite support.
	--config=nokafka     	# Disable Apache Kafka support.
	--config=nonccl      	# Disable NVIDIA NCCL support.
Configuration finished
$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

インストールパッケージを生成
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
インストール

$ source activate tensorflow
$ pip install /tmp/tensorflow_pkg/tensorflow-1.13.1-cp36-cp36m-linux_x86_64.whl

動作チェック

$ cd ~/
$ python
Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 12:22:00) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2019-03-24 11:58:58.077953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-24 11:58:58.081272: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3623d70 executing computations on platform CUDA. Devices:
2019-03-24 11:58:58.081300: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2019-03-24 11:58:58.084051: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3200050000 Hz
2019-03-24 11:58:58.084286: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x368bfa0 executing computations on platform Host. Devices:
2019-03-24 11:58:58.084308: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-03-24 11:58:58.084948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:06:00.0
totalMemory: 3.95GiB freeMemory: 3.89GiB
2019-03-24 11:58:58.084972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-24 11:58:58.088318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-24 11:58:58.088342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-03-24 11:58:58.088350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-03-24 11:58:58.088869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3687 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'

以上

 

参考

TensorFlow Build from source

need guide to build with CUDA 10.1

TensorflowをCentOS7にオフライン環境でインストールする