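I have written up this installation before (Installing TensorFlow (Ver:1.8.0) on CentOS7 (Ver:7.5)). Since then the versions have moved on and the earlier procedure no longer works, so I am writing it up again.
Note: this assumes CUDA (version 10.1) is already installed.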
[Contents]
- Environment
- Installing Python (pyenv + Anaconda)
- Installing cuDNN
- Installing Bazel
- Installing TensorFlow
1. Environment
- OS : CentOS 7.6
- CUDA : 10.1
- cuDNN : 7.5
- Bazel : 0.19.2
- TensorFlow : 1.13.1
- Python : 3.6.0
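Before starting, you can confirm that the machine matches this list by reading the CUDA release from `nvcc --version`. The sketch below makes three assumptions: nvcc sits in /usr/local/cuda-10.1/bin (the toolkit's default location), its output contains a line like "Cuda compilation tools, release 10.1, V10.1.105", and the helper name cuda_release is my own.

```bash
# Hypothetical helper: print the CUDA release number (e.g. "10.1")
# from `nvcc --version` output supplied on stdin.
cuda_release() {
  # Keep only the number between "release " and the following comma.
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}
```

For example, `/usr/local/cuda-10.1/bin/nvcc --version | cuda_release` should print 10.1, and `cat /etc/centos-release` shows the OS release.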
2. Installing Python
Follow the steps on the site below, from "Installing pyenv" through "Installing Python".
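3. Installing cuDNN
Without cuDNN the TensorFlow build fails with an error, so install it first.
Download it via "Download cuDNN" on the page below.
This time I chose "Download cuDNN v7.5.0 (Feb 25, 2019), for CUDA 10.1".
Note: you must join NVIDIA's developer membership program to download it.
https://developer.nvidia.com/cudnn
Extract the downloaded file (cudnn-10.1-linux-x64-v7.5.0.56.tgz) and copy its contents into the CUDA directories.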
$ tar zxvf cudnn-10.1-linux-x64-v7.5.0.56.tgz
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.5.0
cuda/lib64/libcudnn_static.a
$ sudo cp ./cuda/include/* /usr/local/cuda-10.1/include/
$ sudo cp ./cuda/lib64/* /usr/local/cuda-10.1/lib64/
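To confirm that the copied header is the expected 7.5.0 release, you can read the version macros from cudnn.h. This sketch assumes cuDNN 7.x, which defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL in cudnn.h; newer cuDNN releases may keep them elsewhere. The function name cudnn_version is hypothetical.

```bash
# Hypothetical helper: print the cuDNN version recorded in cudnn.h as
# MAJOR.MINOR.PATCHLEVEL. Defaults to the copy destination used above.
cudnn_version() {
  local header="${1:-/usr/local/cuda-10.1/include/cudnn.h}"
  if [ ! -r "$header" ]; then
    echo "not found: $header" >&2
    return 1
  fi
  # Collect the three version macros; fail if CUDNN_MAJOR is absent.
  awk '
    /^#define CUDNN_MAJOR /      { major = $3 }
    /^#define CUDNN_MINOR /      { minor = $3 }
    /^#define CUDNN_PATCHLEVEL / { patch = $3 }
    END { if (major == "") exit 1; printf "%s.%s.%s\n", major, minor, patch }
  ' "$header"
}
```

With the files above in place, `cudnn_version` should print 7.5.0.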
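--- Added 2019/05/24 ---
With the method above, running ldconfig prints the following warning, so I fixed it. The cause is that cp without -P or -a follows symbolic links and copies the files they point to. As a result, libcudnn.so.7 and libcudnn.so end up as regular files. The warning names targets/x86_64-linux/lib because /usr/local/cuda-10.1/lib64 normally points to that directory.
ldconfig: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link
The fix is below. After applying it, run ldconfig again; if no warning appears, the fix worked.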
$ cd /usr/local/cuda-10.1/targets/x86_64-linux/lib
$ sudo rm libcudnn.so.7 libcudnn.so
$ sudo ln -s libcudnn.so.7.5.0 libcudnn.so.7
$ sudo ln -s libcudnn.so.7 libcudnn.so
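Besides rerunning ldconfig, you can check directly that both names are symbolic links that end at the real library file. This is a sketch: the function name and its default arguments (the directory and file name used above) are my own choices.

```bash
# Hypothetical helper: verify that libcudnn.so.7 and libcudnn.so in a directory
# are symlinks resolving to the real library. Returns non-zero on any problem.
check_cudnn_links() {
  local dir="${1:-/usr/local/cuda-10.1/targets/x86_64-linux/lib}"
  local real="${2:-libcudnn.so.7.5.0}"
  local status=0 link
  for link in libcudnn.so.7 libcudnn.so; do
    if [ ! -L "$dir/$link" ]; then
      echo "$dir/$link is not a symlink" >&2
      status=1
    elif [ "$(basename "$(readlink -f "$dir/$link")")" != "$real" ]; then
      echo "$dir/$link does not resolve to $real" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `check_cudnn_links` with no arguments checks the directory fixed above.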
———————-
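4. Installing Bazel
This time I did not install Bazel from the package repository, because the version there is too new for building TensorFlow 1.13.1.
Download "bazel-0.19.2-installer-linux-x86_64.sh" from the page below. (The download took quite a while.)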
https://github.com/bazelbuild/bazel/releases
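Because the download is slow, it is worth confirming that the installer arrived intact before running it. This sketch compares the file's SHA-256 digest with an expected value. It assumes the release page publishes a checksum for the installer (Bazel releases normally ship a .sha256 file next to each asset); the hash itself is not reproduced here.

```bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected hex value.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <file>"; keep only the digest.
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got ${actual:-nothing})" >&2
    return 1
  fi
}
```

Usage: `verify_sha256 bazel-0.19.2-installer-linux-x86_64.sh <digest from the release page>`.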
Install Bazel
# bash bazel-0.19.2-installer-linux-x86_64.sh
Bazel installer
---------------
Bazel is bundled with software licensed under the GPLv2 with Classpath exception.
You can find the sources next to the installer on our release page:
   https://github.com/bazelbuild/bazel/releases
# Release 0.19.2 (2018-11-19)
Baseline: ac880418885061d1039ad6b3d8c28949782e02d6
Cherry picks:
   + 9bc3b20053a8b99bf2c4a31323a7f96fabb9f1ec: Fix the "nojava" platform and enable full presubmit checks for the various JDK platforms now that we have enough GCE resources.
   + 54c2572a8cabaf2b29e58abe9f04327314caa6a0: Add openjdk_linux_archive java_toolchain for nojava platform.
   + 20bfdc67dc1fc32ffebbda7088ba49ee17e3e182: Automated rollback of commit 19a401c38e30ebc0879925a5caedcbe43de0028f.
   + 914b4ce14624171a97ff8b41f9202058f10d15b2: Windows: Fix Precondition check for addDynamicInputLinkOptions
   + 83d406b7da32d1b1f6dd02eae2fe98582a4556fd: Windows, test-setup.sh: Setting RUNFILES_MANIFEST_FILE only when it exists.
   + e025726006236520f7e91e196b9e7f139e0af5f4: Update turbine
   + 5f312dd1678878fb7563eae0cd184f2270346352: Fix event id for action_completed BEP events
   + f0c844c77a2406518c4e75c49188390d5e281d3d: Release 0.19.0 (2018-10-29)
   + c3fb1db9e4e817e8a911f5b347b30f2674a82f7c: Do not use CROSSTOOL to select cc_toolchain
   + 8e280838e8896a6b5eb5421fda435b96b6f8de60: Windows Add tests for msys gcc toolchain and mingw gcc toolchain
   + fd52341505e725487c6bc6dfbe6b5e081aa037da: update bazel-toolchains pin to latest release Part of changes to allow bazelci to use 0.19.0 configs. RBE toolchain configs at or before 0.17.0 are not compatible with bazel 0.19.0 or above.
   + eb2af0f699350ad187048bf814a95af23f562c77: Release 0.19.1 (2018-11-12)
   + 6bc452874ddff69cbf7f66186238032283f1195f: Also update cc_toolchain.toolchain_identifier when CC_TOOLCHAIN_NAME is set
   + f7e5aef145c33968f658eb2260e25630dc41cc67: Add cc_toolchain targets for the new entries in the default cc_toolchain_suite.
   + 683c302129b66a8999f986be5ae7e642707e978c: Read the CROSSTOOL from the package of the current cc_toolchain, not from --crosstool_top
     - Fixes regression #6662, by fixing tools/cpp/BUILD
     - Fixes regression #6665, by setting the toolchain identifier.
     - CROSSTOOL file is now read from the package of cc_toolchain, not from the package of cc_toolchain_suite. This is not expected to break anybody since cc_toolchain_suite and cc_toolchain are commonly in the same package.
## Build informations
   - [Commit](https://github.com/bazelbuild/bazel/commit/6d6633a)
Uncompressing.......
Bazel is now installed!
Make sure you have "/usr/local/bin" in your path. You can also activate bash
completion by adding the following line to your ~/.bashrc:
  source /usr/local/lib/bazel/bin/bazel-complete.bash
See http://bazel.build/docs/getting-started.html to start a new project!
$ echo "source /usr/local/lib/bazel/bin/bazel-complete.bash" >> ~/.bashrc
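The installer asks you to make sure /usr/local/bin is on PATH. A quick way to confirm that the shell picks up this release, and not a newer one from a package repository, is to check the "Build label:" line that `bazel version` prints. The sketch reads that output on stdin so it can be tried without Bazel; the function name is hypothetical.

```bash
# Hypothetical helper: read `bazel version` output on stdin and check that
# its "Build label:" line matches the wanted release.
check_bazel_label() {
  local want="$1" got
  got=$(sed -n 's/^Build label: //p' | head -n 1)
  if [ "$got" = "$want" ]; then
    echo "bazel $got"
  else
    echo "expected bazel $want, found '${got:-none}'" >&2
    return 1
  fi
}
```

Usage: `bazel version | check_bazel_label 0.19.2`.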
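5. Installing TensorFlow
Preparation before the build: create the following symbolic links. They supply header and library names that the TensorFlow 1.13.1 build looks for but this CUDA 10.1 installation does not provide as-is, such as libcublas.so.10.1 under /usr/local/cuda-10.1/lib64. cuBLAS in particular is installed under /usr/lib64.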
$ sudo ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp
$ sudo ln -s /usr/lib64/libcublas.so.10.1.0.105 /usr/local/cuda-10.1/lib64/libcublas.so.10.1.0.105
$ sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10.1.0.105 /usr/local/cuda-10.1/lib64/libcublas.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/lib64/libcublas.so.10.1 /usr/local/cuda-10.1/lib64/libcublas.so
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcusolver.so.10.1.105 /usr/local/cuda-10.1/lib64/libcusolver.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcurand.so.10.1.105 /usr/local/cuda-10.1/lib64/libcurand.so.10.1
$ sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcufft.so.10.1.105 /usr/local/cuda-10.1/lib64/libcufft.so.10.1
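Before starting the long build, you can confirm that every link created above resolves to an existing file. A mistyped target leaves a dangling link that is only noticed much later. A sketch with a hypothetical function name; `[ -e ]` follows symlinks, so dangling links are reported as missing.

```bash
# Hypothetical helper: report every name under DIR that is missing
# or is a dangling symlink. Usage: require_files DIR NAME...
require_files() {
  local dir="$1" missing=0 name
  shift
  for name in "$@"; do
    if [ ! -e "$dir/$name" ]; then
      echo "missing: $dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `require_files /usr/local/cuda-10.1/lib64 libcublas.so libcublas.so.10.1 libcusolver.so.10.1 libcurand.so.10.1 libcufft.so.10.1` and `require_files /usr/local/cuda/include math_functions.hpp`.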
Start the build. (Note: this takes quite a long time.)
$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout refs/tags/v1.13.1
$ ./configure
# Answer Yes/No to each prompt as needed.
# This time I answered Yes to CUDA and set its version to 10.1,
# and specified 7.5 as the cuDNN version.
WARNING: Running Bazel server needs to be killed, because the startup options are different.
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.2 installed.
Please specify the location of python. [Default is /home/yuya/.pyenv/versions/anaconda3-4.3.0/bin/python]:
Found possible Python library paths:
  /home/yuya/.pyenv/versions/anaconda3-4.3.0/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/home/yuya/.pyenv/versions/anaconda3-4.3.0/lib/python3.6/site-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]: 10.1
Please specify the location where CUDA 10.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.5
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.
Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]:
Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
    --config=mkl               # Build with MKL support.
    --config=monolithic        # Config for mostly static monolithic build.
    --config=gdr               # Build with GDR support.
    --config=verbs             # Build with libverbs support.
    --config=ngraph            # Build with Intel nGraph support.
    --config=dynamic_kernels   # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=noaws             # Disable AWS S3 filesystem support.
    --config=nogcp             # Disable GCP support.
    --config=nohdfs            # Disable HDFS support.
    --config=noignite          # Disable Apacha Ignite support.
    --config=nokafka           # Disable Apache Kafka support.
    --config=nonccl            # Disable NVIDIA NCCL support.
Configuration finished
$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
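If you need to repeat the build, the interactive answers above can probably be supplied as environment variables instead. This is a sketch based on my reading of TensorFlow 1.13's configure.py. The variable names are assumptions to check against configure.py in your checkout, and the Python paths assume the same pyenv/Anaconda layout under $HOME. As far as I can tell, a prompt whose variable is left unset (such as the NCCL one) is still asked interactively.

```bash
# Assumed configure.py environment variables mirroring the answers given above.
export PYTHON_BIN_PATH="$HOME/.pyenv/versions/anaconda3-4.3.0/bin/python"
export PYTHON_LIB_PATH="$HOME/.pyenv/versions/anaconda3-4.3.0/lib/python3.6/site-packages"
export TF_ENABLE_XLA=1             # XLA JIT: yes (the default)
export TF_NEED_OPENCL_SYCL=0       # OpenCL SYCL: no
export TF_NEED_ROCM=0              # ROCm: no
export TF_NEED_CUDA=1              # CUDA: yes
export TF_CUDA_VERSION=10.1
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export TF_CUDNN_VERSION=7.5
export CUDNN_INSTALL_PATH=/usr/local/cuda
export TF_NEED_TENSORRT=0          # TensorRT: no
export TF_CUDA_COMPUTE_CAPABILITIES=6.1
export TF_CUDA_CLANG=0             # use nvcc, not clang
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
export TF_NEED_MPI=0               # MPI: no
export CC_OPT_FLAGS="-march=native -Wno-sign-compare"
export TF_SET_ANDROID_WORKSPACE=0  # no Android WORKSPACE
```

After exporting these, run `./configure` and the same `bazel build` command as before.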
Generate the install package (wheel)
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
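The wheel's file name depends on the TensorFlow version and the Python version it was built for. Instead of typing it out, you can look it up in the output directory. A sketch with a hypothetical function name that refuses to guess when the directory holds anything other than exactly one TensorFlow wheel (for example, leftovers from an earlier build).

```bash
# Hypothetical helper: print the path of the single tensorflow-*.whl in DIR
# (default: the output directory used above).
find_tf_wheel() {
  local dir="${1:-/tmp/tensorflow_pkg}" count
  count=$(find "$dir" -maxdepth 1 -name 'tensorflow-*.whl' | wc -l)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one tensorflow-*.whl in $dir, found $count" >&2
    return 1
  fi
  find "$dir" -maxdepth 1 -name 'tensorflow-*.whl'
}
```

Usage: `pip install "$(find_tf_wheel)"`.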
Install
$ source activate tensorflow
$ pip install /tmp/tensorflow_pkg/tensorflow-1.13.1-cp36-cp36m-linux_x86_64.whl
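pip rejects a wheel whose CPython tag does not match the active interpreter, reporting that it "is not a supported wheel on this platform". This happens if `source activate tensorflow` selects a Python other than the 3.6 used for the build. A sketch of a pre-check; both function names are hypothetical, and the wheel-name parsing assumes the standard name-version-pythontag-abitag-platform layout.

```bash
# Hypothetical helper: extract the CPython tag from a wheel file name,
# e.g. tensorflow-1.13.1-cp36-cp36m-linux_x86_64.whl -> cp36 (third field).
wheel_python_tag() {
  basename "$1" | awk -F- '{print $3}'
}

# Hypothetical helper: tag of the interpreter that `python` (or $PYTHON) runs, e.g. cp36.
current_python_tag() {
  "${PYTHON:-python}" -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}
```

After activating the environment, `[ "$(wheel_python_tag "$whl")" = "$(current_python_tag)" ]` should succeed, with whl set to the wheel path, before you run pip install.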
Verify that it works
$ cd ~/
$ python
Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2019-03-24 11:58:58.077953: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-24 11:58:58.081272: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3623d70 executing computations on platform CUDA. Devices:
2019-03-24 11:58:58.081300: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce GTX 1050 Ti, Compute Capability 6.1
2019-03-24 11:58:58.084051: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3200050000 Hz
2019-03-24 11:58:58.084286: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x368bfa0 executing computations on platform Host. Devices:
2019-03-24 11:58:58.084308: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-03-24 11:58:58.084948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:06:00.0
totalMemory: 3.95GiB freeMemory: 3.89GiB
2019-03-24 11:58:58.084972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-24 11:58:58.088318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-24 11:58:58.088342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-03-24 11:58:58.088350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-03-24 11:58:58.088869: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3687 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:06:00.0, compute capability: 6.1)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
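Above, success is judged by reading the log. The same check can be scripted by looking for the "Created TensorFlow device (...device:GPU:N" line. This sketch assumes TensorFlow 1.13 writes that line to stderr in the format shown above; gpu_devices_from_log is a made-up name.

```bash
# Hypothetical helper: read TensorFlow log output on stdin and print each GPU
# device name from "Created TensorFlow device (...)" lines; fail if there are none.
gpu_devices_from_log() {
  local found
  found=$(grep -o 'Created TensorFlow device ([^ ]*device:GPU:[0-9]*' \
    | sed 's/^Created TensorFlow device (//')
  [ -n "$found" ] || return 1
  printf '%s\n' "$found"
}
```

Usage: `python -c 'import tensorflow as tf; tf.Session()' 2>&1 | gpu_devices_from_log` should print /job:localhost/replica:0/task:0/device:GPU:0 on this machine.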
That's all.
References
need guide to build with CUDA 10.1
Installing TensorFlow on CentOS 7 in an offline environment (TensorflowをCentOS7にオフライン環境でインストールする)