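I have written up this installation before (Installing TensorFlow (Ver: 1.8.0) on CentOS 7 (Ver: 7.5)),
but the versions have since moved on and the old procedure no longer worked, so I am documenting it again.
Note: this assumes CUDA (Version 10.1) is already installed.
[Table of contents]
- Environment
- Installing Python (pyenv + Anaconda)
- Installing cuDNN
- Installing Bazel
- Installing TensorFlow
1. Environment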
- OS : CentOS 7.6
- CUDA : 10.1
- cuDNN : 7.5
- Bazel : 0.19.2
- TensorFlow : 1.13.1
- Python : 3.6.0
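Before starting, it can help to confirm what is actually installed on the machine. The following is a minimal sketch, not part of the original procedure: `show_version` is a hypothetical helper name, and it only prints whatever each command reports (or says the command is missing).

```shell
# Print the version reported by a command, or note that it is missing.
# show_version is a hypothetical helper; it does not modify the system.
show_version() {
  label="$1"; shift
  if command -v "$1" >/dev/null 2>&1; then
    echo "== $label"
    # Show the command's own output; ignore a non-zero exit status.
    "$@" 2>&1 || true
  else
    echo "== $label: command not found"
  fi
}

show_version "OS" cat /etc/redhat-release
show_version "CUDA" nvcc --version
show_version "Bazel" bazel version
show_version "Python" python --version
```

Compare the output against the list above; a mismatch in CUDA or Bazel is the most likely reason for the steps below to behave differently.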
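2. Installing Python
Follow the steps from "Installing pyenv" through "Installing Python" on the site below.
How to install TensorFlow on CentOS 7 (CentOS7にTensorFlowをインストールする方法)
3. Installing cuDNN
Install cuDNN, because the build fails with an error without it.
Download it from "Download cuDNN" on the page below.
This time I chose "Download cuDNN v7.5.0 (Feb 25, 2019), for CUDA 10.1".
Note: you need to sign up for the membership program to download it.
https://developer.nvidia.com/cudnn
Extract the downloaded file (cudnn-10.1-linux-x64-v7.5.0.56.tgz) and copy its contents into place.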
$ tar zxvf cudnn-10.1-linux-x64-v7.5.0.56.tgz
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.5.0
cuda/lib64/libcudnn_static.a
$ sudo cp ./cuda/include/* /usr/local/cuda-10.1/include/
$ sudo cp ./cuda/lib64/* /usr/local/cuda-10.1/lib64/
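To confirm that the header that ended up in the CUDA directory is the expected cuDNN version, the version macros can be read back from it. This is a minimal sketch under assumptions: `cudnn_version` is a hypothetical helper, and it assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` on `#define` lines, as the cuDNN 7.x `cudnn.h` does.

```shell
# Print MAJOR.MINOR.PATCHLEVEL as defined in a cudnn.h header.
# cudnn_version is a hypothetical helper; the default path is the copy
# destination used above.
cudnn_version() {
  header="${1:-/usr/local/cuda-10.1/include/cudnn.h}"
  if [ ! -r "$header" ]; then
    echo "cannot read $header" >&2
    return 1
  fi
  major=$(awk '$1 == "#define" && $2 == "CUDNN_MAJOR" {print $3}' "$header")
  minor=$(awk '$1 == "#define" && $2 == "CUDNN_MINOR" {print $3}' "$header")
  patch=$(awk '$1 == "#define" && $2 == "CUDNN_PATCHLEVEL" {print $3}' "$header")
  echo "$major.$minor.$patch"
}

# Usage:
#   cudnn_version /usr/local/cuda-10.1/include/cudnn.h
```

For the archive used here, the printed version should match the 7.5.0 in the file name.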
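[Added 2019/05/24]
With the method above, running ldconfig produces the following warning, so fix it.
This is presumably because a plain cp follows symbolic links, so libcudnn.so and libcudnn.so.7 were copied as regular files.
ldconfig: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link
The fix is shown below. After applying it, run ldconfig again; if no warning appears, it is OK.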
$ cd /usr/local/cuda-10.1/targets/x86_64-linux/lib
$ sudo rm libcudnn.so.7 libcudnn.so
$ sudo ln -s libcudnn.so.7.5.0 libcudnn.so.7
$ sudo ln -s libcudnn.so.7 libcudnn.so
———————-
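Besides re-running ldconfig, the result of the fix can be checked directly by asking whether each library name is now a symbolic link. The following is a small sketch; `report_links` is a hypothetical helper name, not something from cuDNN or CUDA.

```shell
# Report whether each given path is a symbolic link and where it points.
# Returns non-zero if any path is not a symbolic link.
report_links() {
  status=0
  for path in "$@"; do
    if [ -L "$path" ]; then
      echo "OK   $path -> $(readlink "$path")"
    else
      echo "FAIL $path is not a symbolic link"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   cd /usr/local/cuda-10.1/targets/x86_64-linux/lib
#   report_links libcudnn.so libcudnn.so.7
```

After the fix, both names should be reported as OK, with libcudnn.so pointing at libcudnn.so.7 and libcudnn.so.7 pointing at libcudnn.so.7.5.0.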
4. Installing Bazel
This time I do not install from the repository (the version available there is too new).
Download "bazel-0.19.2-installer-linux-x86_64.sh" from the page below (it took quite a while).
https://github.com/bazelbuild/bazel/releases
Install Bazel
# bash bazel-0.19.2-installer-linux-x86_64.sh
Bazel installer
---------------
Bazel is bundled with software licensed under the GPLv2 with Classpath exception.
You can find the sources next to the installer on our release page:
https://github.com/bazelbuild/bazel/releases
# Release 0.19.2 (2018-11-19)
Baseline: ac880418885061d1039ad6b3d8c28949782e02d6
Cherry picks:
+ 9bc3b20053a8b99bf2c4a31323a7f96fabb9f1ec:
Fix the "nojava" platform and enable full presubmit checks for
the various JDK platforms now that we have enough GCE resources.
+ 54c2572a8cabaf2b29e58abe9f04327314caa6a0:
Add openjdk_linux_archive java_toolchain for nojava platform.
+ 20bfdc67dc1fc32ffebbda7088ba49ee17e3e182:
Automated rollback of commit
19a401c38e30ebc0879925a5caedcbe43de0028f.
+ 914b4ce14624171a97ff8b41f9202058f10d15b2:
Windows: Fix Precondition check for addDynamicInputLinkOptions
+ 83d406b7da32d1b1f6dd02eae2fe98582a4556fd:
Windows, test-setup.sh: Setting RUNFILES_MANIFEST_FILE only when
it exists.
+ e025726006236520f7e91e196b9e7f139e0af5f4:
Update turbine
+ 5f312dd1678878fb7563eae0cd184f2270346352:
Fix event id for action_completed BEP events
+ f0c844c77a2406518c4e75c49188390d5e281d3d:
Release 0.19.0 (2018-10-29)
+ c3fb1db9e4e817e8a911f5b347b30f2674a82f7c:
Do not use CROSSTOOL to select cc_toolchain
+ 8e280838e8896a6b5eb5421fda435b96b6f8de60:
Windows Add tests for msys gcc toolchain and mingw gcc toolchain
+ fd52341505e725487c6bc6dfbe6b5e081aa037da:
update bazel-toolchains pin to latest release Part of changes to
allow bazelci to use 0.19.0 configs. RBE toolchain configs at or
before 0.17.0 are not compatible with bazel 0.19.0 or above.
+ eb2af0f699350ad187048bf814a95af23f562c77:
Release 0.19.1 (2018-11-12)
+ 6bc452874ddff69cbf7f66186238032283f1195f:
Also update cc_toolchain.toolchain_identifier when
CC_TOOLCHAIN_NAME is set
+ f7e5aef145c33968f658eb2260e25630dc41cc67:
Add cc_toolchain targets for the new entries in the default
cc_toolchain_suite.
+ 683c302129b66a8999f986be5ae7e642707e978c:
Read the CROSSTOOL from the package of the current cc_toolchain,
not from --crosstool_top
- Fixes regression #6662, by fixing tools/cpp/BUILD
- Fixes regression #6665, by setting the toolchain identifier.
- CROSSTOOL file is now read from the package of cc_toolchain, not from the
package of cc_toolchain_suite. This is not expected to break anybody since
cc_toolchain_suite and cc_toolchain are commonly in the same package.
## Build informations
- [Commit](https://github.com/bazelbuild/bazel/commit/6d6633a)
Uncompressing.......
Bazel is now installed!
Make sure you have "/usr/local/bin" in your path. You can also activate bash
completion by adding the following line to your ~/.bashrc:
source /usr/local/lib/bazel/bin/bazel-complete.bash
See http://bazel.build/docs/getting-started.html to start a new project!
$ echo "source /usr/local/lib/bazel/bin/bazel-complete.bash" >> ~/.bashrc
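Since the whole point of this step is to pin Bazel to 0.19.2, it is worth checking which version ends up on the PATH. This is a minimal sketch, assuming `bazel version` prints a line starting with `Build label: `; `bazel_label` is a hypothetical helper name.

```shell
# Read `bazel version` output on stdin and print the "Build label" value.
bazel_label() {
  sed -n 's/^Build label: //p' | head -n 1
}

# Usage (only meaningful where bazel is installed):
#   bazel version 2>/dev/null | bazel_label
```

If this prints something other than 0.19.2, another Bazel (for example one installed from the repository) is probably earlier on the PATH.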
5. Installing TensorFlow
Preparation before the build
$ sudo ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp
$ sudo ln -s /usr/lib64/libcublas.so.10.1.0.105 /usr/local/cuda-10.1/lib64/libcublas.so.10.1.0.105
$ sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10.1.0.105 /usr/local/cuda-10.1/lib64/libcublas.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10.1 /usr/local/cuda-10.1/lib64/libcublas.so
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcusolver.so.10.1.105 /usr/local/cuda-10.1/lib64/libcusolver.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcurand.so.10.1.105 /usr/local/cuda-10.1/lib64/libcurand.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcufft.so.10.1.105 /usr/local/cuda-10.1/lib64/libcufft.so.10.1
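Running the `ln -s` commands above a second time fails with "File exists", and a wrong source path silently creates a dangling link. The sketch below wraps `ln -s` so that it can be re-run safely; `link_if_missing` and the `LN_CMD` variable are hypothetical names introduced for illustration (set `LN_CMD="sudo ln"` for system directories).

```shell
# Create dst as a symlink to src unless dst already exists or src is missing.
# link_if_missing and LN_CMD are hypothetical; LN_CMD defaults to plain ln.
link_if_missing() {
  src="$1"
  dst="$2"
  if [ -e "$dst" ] || [ -L "$dst" ]; then
    echo "skip: $dst already exists"
  elif [ ! -e "$src" ]; then
    echo "skip: $src not found"
  else
    ${LN_CMD:-ln} -s "$src" "$dst" && echo "linked: $dst -> $src"
  fi
}

# Usage:
#   LN_CMD="sudo ln" link_if_missing \
#     /usr/local/cuda/include/crt/math_functions.hpp \
#     /usr/local/cuda/include/math_functions.hpp
```

The "not found" case is useful here because the library file names include the exact CUDA patch version, which may differ on another machine.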
Start the build (note: this takes quite a while)
$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout refs/tags/v1.13.1
$ ./configure
# Answer Yes/No as needed
# This time I answered Yes to CUDA and set the version to 10.1
# I also specified 7.5 for cuDNN
WARNING: Running Bazel server needs to be killed, because the startup options are different.
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.2 installed.
Please specify the location of python. [Default is /home/yuya/.pyenv/versions/anaconda3-4.3.0/bin/python]:
Found possible Python library paths:
/home/yuya/.pyenv/versions/anaconda3-4.3.0/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/home/yuya/.pyenv/versions/anaconda3-4.3.0/lib/python3.6/site-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]: 10.1
Please specify the location where CUDA 10.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.5
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.
Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]:
Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
--config=dynamic_kernels # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
--config=noaws # Disable AWS S3 filesystem support.
--config=nogcp # Disable GCP support.
--config=nohdfs # Disable HDFS support.
--config=noignite # Disable Apacha Ignite support.
--config=nokafka # Disable Apache Kafka support.
--config=nonccl # Disable NVIDIA NCCL support.
Configuration finished
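The answers typed into ./configure above can, as far as I know, also be supplied through environment variables so that the same choices are repeatable. This is only a sketch under that assumption: the variable names below are the ones I believe configure.py reads, so check them against configure.py in the checked-out tag before relying on them. Prompts that are not covered here will still be asked.

```shell
# Pre-answer the CUDA-related configure prompts (values taken from the run above).
# The variable names are an assumption; verify them in configure.py.
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=10.1
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export TF_CUDNN_VERSION=7.5
export CUDNN_INSTALL_PATH=/usr/local/cuda
export TF_CUDA_COMPUTE_CAPABILITIES=6.1

# Then, inside the tensorflow checkout:
#   ./configure
```

The compute capability 6.1 matches the default offered in the run above; use the value for your own GPU.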
$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
Generate the installation package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Install
$ source activate tensorflow
$ pip install /tmp/tensorflow_pkg/tensorflow-1.13.1-cp36-cp36m-linux_x86_64.whl
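The wheel file name depends on the TensorFlow and Python versions, so on another setup the exact name above may differ. The sketch below looks the file up instead of hard-coding it; `find_wheel` is a hypothetical helper, and the default directory is the one passed to build_pip_package above.

```shell
# Print the first tensorflow wheel found in the given directory.
# find_wheel is a hypothetical helper name.
find_wheel() {
  dir="${1:-/tmp/tensorflow_pkg}"
  for whl in "$dir"/tensorflow-*.whl; do
    if [ -e "$whl" ]; then
      echo "$whl"
      return 0
    fi
  done
  echo "no tensorflow wheel found in $dir" >&2
  return 1
}

# Usage:
#   pip install "$(find_wheel /tmp/tensorflow_pkg)"
```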
Verify that it works
$ cd ~/
$ python
Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2019-03-24 11:58:58.077953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-24 11:58:58.081272: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3623d70 executing computations on platform CUDA. Devices:
2019-03-24 11:58:58.081300: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2019-03-24 11:58:58.084051: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3200050000 Hz
2019-03-24 11:58:58.084286: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x368bfa0 executing computations on platform Host. Devices:
2019-03-24 11:58:58.084308: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-03-24 11:58:58.084948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:06:00.0
totalMemory: 3.95GiB freeMemory: 3.89GiB
2019-03-24 11:58:58.084972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-24 11:58:58.088318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-24 11:58:58.088342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-24 11:58:58.088350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-24 11:58:58.088869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3687 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
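The same check can be run without an interactive session, which is handy after a rebuild. This is a minimal sketch: `check_tf` is a hypothetical helper, and it relies on `tf.test.is_gpu_available()`, which exists in TensorFlow 1.x. TensorFlow's log messages go to stderr and are discarded here.

```shell
# Print the TensorFlow version and whether a GPU is usable.
# check_tf is a hypothetical helper; pass the interpreter to use (default: python).
check_tf() {
  py="${1:-python}"
  if ! command -v "$py" >/dev/null 2>&1; then
    echo "python not found: $py"
    return 1
  fi
  "$py" -c 'import tensorflow as tf; print(tf.__version__); print(tf.test.is_gpu_available())' 2>/dev/null \
    || { echo "tensorflow is not importable with $py"; return 1; }
}

# Usage (inside the environment where the wheel was installed):
#   check_tf python
```

On a working GPU build the second line printed should be True; False means TensorFlow imported but could not use the GPU.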
That's all.
References
TensorFlow Build from source
need guide to build with CUDA 10.1
Installing TensorFlow on CentOS 7 in an offline environment (TensorflowをCentOS7にオフライン環境でインストールする)