【922ee最新】0490-如何为GPU环境编译CUDA9.2的TensorFlow1.8与1.12

作者：李继武

创建文档的目的

从CDSW1.1.0开始支持GPU，请参阅Fayson前面的句子。

如何在CDSW中利用GPU进行深度学习

“您可以在最新的CDSW支持GPU网站上查看相应的NVIDIA驱动器版本、CUDA版本和tensorflow版本，如下所示：

我们注意到CUDA的版本是9.2，但目前正式发布的编译版TensorFlow的CUDA版本仍然是9.0。在CDSW环境中，为了使TensorFlow在GPU上运行，必须使用CUDA9.2。我们需要手动编译TensorFlow源代码。

这里，以编译Ten和Ten的版本为例，指定CUDA的版本为9.2，cudnn的版本为7.2.1。

安装编译过程中需要的包及环境

此部分两个版本的操作都相同

1.配置JDK1.8到环境变量中

2.执行如下命令，安装依赖包

yum -y install numpy yum -y install python-devel yum -y install python-pip yum -y install python-wheel yum -y install epel-release yum -y install gcc-c++ pip install --upgrade pip enum34 pip install keras --user pip install mock

如果安装时没有可用的包，可到下面的地址下载，然后制作本地yum源：

3.下载CUDA9.2并安装

到下面的地址下载CUDA9.2安装包：

;target_arch=x86_64&target_distro=RHEL&target_version=7&target_type=runfilelocal

选择runfile(local)版本：

上传到服务器：

修改文件权限，并运行该文件：

chmod +x cuda_9.2.148_396.37_linux.run .

将CUDA添加到环境变量：

export PATH=/usr/local:$PATH export LD_LIBRARY_PATH=/usr/local:$LD_LIBRARY_PATH

执行如下命令应能看到cuda版本：

source /etc/profile nvcc -V

4.cuDNN v7.2.1 下载并安装

到如下地址下载cudnn v7.2.1，需要注册之后才能下载：

上传到服务器CUDA的安装目录/usr/local/cuda，解压到该目录下

tar -zxvf cudnn-9.2-linux-x64-v7.2.1.38.tgz

在该目录下执行下面命令将cudnn添加到cuda的库中：

sudo cp cuda/include /usr/local/cuda/include sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64 sudo chmod a+r /usr/local/cuda/include /usr/local/cuda/lib64/libcudnn*

进入lib64目录，建立一个软连接：

cd /usr/local/cuda/lib64 ln -s stub libcuda.

安装编译工具bazel

这部分编译不同的tensorflow版本需要安装不同版本的bazel，使用太新的版本有时会报错。

A.Ten使用的bazel版本为0.19.2：

1.下载bazel-0.19.2：

wget

2.添加可执行权限，并执行：

chmod +x bazel-0.19.2-in ./bazel-0.19.2-in --user

该--user标志将Bazel安装到$HOME/bin系统上的目录并设置.bazelrc路径$HOME/.bazelrc。使用该--help 命令可以查看其他安装选项。

显示下面的提示表示安装成功：

如果使用--user上面的标志运行Bazel安装程序，则Bazel可执行文件将安装在$HOME/bin目录中。将此目录添加到默认路径是个好主意，如下所示：

export PATH=$HOME/bin:$PATH

B.Ten使用的bazel版本为0.13.0：

1.下载bazel-0.13.0

wget https://github.com/bazelbuild/bazel/releases/download/0.13.0/bazel-0.13.0-in

其余的操作与上面安装bazel-0.19.2相同。

下载Tensorflow源码

A. 下载最新版的tensorflow：

git clone --recurse-submodules

该命令会在当前目录下创建一个tensorflow目录，在其中下载最新版的tensorflow源码：

编写此文档时tensorflow最新的版本为1.12。

B.下载ten:

wget /archive/v1.8.0.tar.gz

解压到当前文件夹：

wget /archive/v1.8.0.tar.gz

配置tensorflow

不同版本的配置略有不同。

A.Ten

进入ten的源码目录，执行./configure并根据提示选择：

[root@cdh4 tensorflow]# ./configure Extracting Bazel installation... WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown". INFO: Invocation ID: cc8b0ee2-5e84-4995-ba12-2c922ee3646b You have bazel 0.19.2 installed. Please specify the location of python. [Default is /usr/bin/python]: Found possible Python library paths: /usr/lib /usr/lib64 Please input the desired Python library path to use. Default is [/usr/lib] Do you wish to build TensorFlow with XLA JIT support? [Y/n]: n No XLA JIT support will be enabled for TensorFlow. Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n No OpenCL SYCL support will be enabled for TensorFlow. Do you wish to build TensorFlow with ROCm support? [y/N]: n No ROCm support will be enabled for TensorFlow. Do you wish to build TensorFlow with CUDA support? [y/N]: y CUDA support will be enabled for TensorFlow. Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 9.2 Please specify the location where CUDA 9.2 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.2.1 Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: Do you wish to build TensorFlow with TensorRT support? [y/N]: n No TensorRT support will be enabled for TensorFlow. Please specify the locally installed NCCL version you want to use. [Default is to use ]: Please specify a list of comma-separated Cuda compute capabilities you want to build with. You can find the compute capability of your device at: . Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,7.0]: Do you want to use clang as CUDA compiler? [y/N]: n nvcc will be used as CUDA compiler. Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: Do you wish to build TensorFlow with MPI support? [y/N]: n No MPI support will be enabled for TensorFlow. Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]: Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n Not configuring the WORKSPACE for Android builds. Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details. --config=mkl # Build with MKL support. --config=monolithic # Config for mostly static monolithic build. --config=gdr # Build with GDR support. --config=verbs # Build with libverbs support. --config=ngraph # Build with Intel nGraph support. --config=dynamic_kernels # (Experimental) Build kernels into separate shared objects. Preconfigured Bazel build configs to DISABLE default on features: --config=noaws # Disable AWS S3 filesystem support. --config=nogcp # Disable GCP support. --config=nohdfs # Disable HDFS support. --config=noignite # Disable Apacha Ignite support. --config=nokafka # Disable Apache Kafka support. --config=nonccl # Disable NVIDIA NCCL support. Configuration finished

B.Ten

进入ten的源码目录，执行./configure并根据提示选择：

[root@cdh2 ]# ./configure WARNING: Running Bazel server needs to be killed, because the startup options are different. You have bazel 0.13.0 installed. Please specify the location of python. [Default is /usr/bin/python]: Found possible Python library paths: /usr/lib /usr/lib64 Please input the desired Python library path to use. Default is [/usr/lib] Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y jemalloc as malloc support will be enabled for TensorFlow. Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n No Google Cloud Platform support will be enabled for TensorFlow. Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n No Hadoop File System support will be enabled for TensorFlow. Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n No Amazon S3 File System support will be enabled for TensorFlow. Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n No Apache Kafka Platform support will be enabled for TensorFlow. Do you wish to build TensorFlow with XLA JIT support? [y/N]: n No XLA JIT support will be enabled for TensorFlow. Do you wish to build TensorFlow with GDR support? [y/N]: n No GDR support will be enabled for TensorFlow. Do you wish to build TensorFlow with VERBS support? [y/N]: n No VERBS support will be enabled for TensorFlow. Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n No OpenCL SYCL support will be enabled for TensorFlow. Do you wish to build TensorFlow with CUDA support? [y/N]: y CUDA support will be enabled for TensorFlow. Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.2 Please specify the location where CUDA 9.2 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.2.1 Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: Do you wish to build TensorFlow with TensorRT support? [y/N]: n No TensorRT support will be enabled for TensorFlow. Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]: Please specify a list of comma-separated Cuda compute capabilities you want to build with. You can find the compute capability of your device at: . Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2] Do you want to use clang as CUDA compiler? [y/N]: n nvcc will be used as CUDA compiler. Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: Do you wish to build TensorFlow with MPI support? [y/N]: n No MPI support will be enabled for TensorFlow. Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n Not configuring the WORKSPACE for Android builds. Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tool for more details. --config=mkl # Build with MKL support. --config=monolithic # Config for mostly static monolithic build. Configuration finished

编译tensorflow

两个版本都使用下方的命令进行编译

bazel build --config=opt --config=cuda --config=monolithic //tensorflow/tools/pip_package:build_pip_package

注意：执行该命令要在tensorflow的源码目录下

开始编译：

等待编译结束，该过程比较耗时，出现下面提示表示编译成功。

编译结束后，执行下面命令：

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

执行完毕后可在/tmp/tensorflow_pkg目录中看到编译成功的tensorflow安装包：

注意：在编译过程中，磁盘不足或者内存不足都将导致编译失败，内存不足可能出现下面的错误，可通过设置交换区来解决。

设置缓冲区：

sudo dd if=/dev/zero of=/var/cache/swap/swap0 bs=1M count=1024 sudo chmod 0600 /var/cache/swap/swap0 sudo mkswap /var/cache/swap/swap0 sudo swapon /var/cache/swap/swap0

当编译结束后，删除该交换区：

swapoff /var/cache/swap/swap0 rm -rf /var/cache/swap/swap0

验证

此处以验证ten为例：

1.安装编译好的tensorflow安装包：

sudo pip install /tmp/tensorflow_pkg/-cp27-none-linux_x86_64.whl

2.安装成功后，打开Python的交互界面，导入tensorflow，查看版本及路径：

注意：测试的时候别在tensorflow目录下import tensorflow，可能直接引用里面的目录下的包。

提示：代码块部分可以左右滑动查看噢

为天地立心，为生民立命，为往圣继绝学，为万世开太平。
温馨提示：如果使用电脑查看图片不清晰，可以使用手机打开文章单击文中的图片放大查看高清原图。

推荐关注Hadoop实操，第一时间，分享更多Hadoop干货，欢迎转发和分享。

原创文章，欢迎转载，转载请注明：转载自微信公众号Hadoop实操

1.《【922ee最新】0490-如何为GPU环境编译CUDA9.2的TensorFlow1.8与1.12》援引自互联网，旨在传递更多网络信息知识，仅代表作者本人观点，与本网站无关，侵删请联系页脚下方联系方式。

2.《【922ee最新】0490-如何为GPU环境编译CUDA9.2的TensorFlow1.8与1.12》仅供读者参考，本网站未对该内容进行证实，对其原创性、真实性、完整性、及时性不作任何保证。

3.文章转载时请保留本站内容来源地址，https://www.lu-xu.com/yule/3196997.html

【922ee最新】0490-如何为GPU环境编译CUDA9.2的TensorFlow1.8与1.12

【什么山河】摄影家眼中的壮美山河

【922ee最新】“9.22”特大跨国电信网络诈骗专案77名嫌犯被押解回国

【922ee最新】百度i贴吧0day跨站漏洞

【922ee最新】EDG的秘密武器？找SKT打训练赛！

【922ee最新】这就是英国低端安卓手机？国内小伙伴表示呵呵