Installing CUDA 11.0 Update 1 on CentOS 8.2

This post walks through installing CUDA 11.0 on CentOS 8.2.

The procedure differed slightly from earlier versions, so I am recording it here.

Stopping the firewall

[hoge@apple ~]$ su -
Password:
[root@apple ~]#
[root@apple ~]# systemctl stop firewalld.service
[root@apple ~]# systemctl mask firewalld.service
Created symlink /etc/systemd/system/firewalld.service → /dev/null.
[root@apple ~]#
[root@apple ~]# systemctl list-unit-files | grep firewalld
firewalld.service masked
[root@apple ~]#

Disabling SELinux

[root@apple ~]# getenforce
Enforcing
[root@apple ~]#
[root@apple ~]# setenforce 0
[root@apple ~]#

[root@apple ~]# vi /etc/selinux/config
[root@apple ~]# cat /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
#SELINUX=enforcing
SELINUX=disabled
# SELINUXTYPE= can take one of these three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted

[root@apple ~]#
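
The same edit to /etc/selinux/config can be made without opening an editor. The following is a minimal sketch, assuming a sed that accepts -i with a backup suffix (GNU sed does); set_selinux_mode is a hypothetical helper name, not part of any SELinux tool. It rewrites only the active SELINUX= line and leaves comments and SELINUXTYPE= untouched.

```shell
#!/bin/sh
# Hypothetical helper: set the active SELINUX= line in an SELinux config file.
#   $1 = mode to write (enforcing, permissive or disabled)
#   $2 = config file (defaults to /etc/selinux/config)
set_selinux_mode() {
    mode="$1"
    config="${2:-/etc/selinux/config}"
    # "^SELINUX=" matches neither "#SELINUX=..." nor "SELINUXTYPE=...".
    # A copy of the original file is kept as <config>.bak.
    sed -i.bak "s/^SELINUX=.*/SELINUX=${mode}/" "$config"
}

# Example (as root): set_selinux_mode disabled
```

As with the manual edit, the new mode takes effect at the next boot; setenforce 0 only switches the running system to permissive.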

[root@apple ~]# mkdir /home/cuda
[root@apple ~]# cd /home/cuda
[root@apple cuda]#
[root@apple cuda]# wget https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda_11.0.3_450.51.06_linux.run ← Download the CUDA toolkit installer.
--2020-09-14 09:48:08-- https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda_11.0.3_450.51.06_linux.run
Resolving developer.download.nvidia.com (developer.download.nvidia.com)... 8.8.8.8
Connecting to developer.download.nvidia.com (developer.download.nvidia.com)|8.8.8.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3112522594 (2.9G) [application/octet-stream]
Saving to: 'cuda_11.0.3_450.51.06_linux.run'

cuda_11.0.3_450.51.06_linux.run 100%[===========================================================================================>] 2.90G 11.0MB/s in 4m 36s

2020-09-14 09:52:44 (10.8 MB/s) - 'cuda_11.0.3_450.51.06_linux.run' saved [3112522594/3112522594]

[root@apple cuda]#

[root@apple cuda]#
[root@apple cuda]#
[root@apple cuda]# systemctl set-default multi-user.target ← Stop X11 from starting automatically.
Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/multi-user.target.
[root@apple cuda]#
[root@apple cuda]# lsmod | grep nouveau ← Check the loaded NVIDIA driver (this is nouveau, the OS default)
nouveau 2220032 4
video 45056 1 nouveau
mxm_wmi 16384 1 nouveau
i2c_algo_bit 16384 1 nouveau
drm_kms_helper 212992 1 nouveau
ttm 114688 1 nouveau
drm 536576 7 drm_kms_helper,ttm,nouveau
wmi 32768 5 hp_wmi,intel_wmi_thunderbolt,wmi_bmof,mxm_wmi,nouveau
[root@apple cuda]#
[root@apple cuda]#
[root@apple cuda]# vi /etc/modprobe.d/blacklist-nouveau.conf
[root@apple cuda]#
[root@apple cuda]# cat /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
[root@apple cuda]#
[root@apple cuda]# dracut --force ← Rebuild the initramfs so that nouveau, the OS's standard NVIDIA driver, is no longer loaded

Note: the installer also does this itself.

[root@apple cuda]#
[root@apple cuda]# reboot

[root@apple cuda]#
[root@apple cuda]# lsmod | grep nouveau
[root@apple cuda]#
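
The blacklist step and the check after the reboot can be scripted. This is a rough sketch rather than the procedure used above: both function names are hypothetical, and the check reads /proc/modules directly instead of calling lsmod (lsmod formats the same data).

```shell
#!/bin/sh
# Hypothetical helper: write the two blacklist lines shown above.
#   $1 = target file (defaults to /etc/modprobe.d/blacklist-nouveau.conf)
write_nouveau_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# Hypothetical helper: succeed if nouveau is listed as a loaded module.
#   $1 = module list in /proc/modules format (defaults to /proc/modules)
nouveau_loaded() {
    modules_file="${1:-/proc/modules}"
    # Each line starts with the module name followed by a space.
    grep -q '^nouveau ' "$modules_file"
}

# Example (as root, after rebooting):
#   nouveau_loaded && echo "nouveau is still loaded" || echo "nouveau is gone"
```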

[hoge@apple ~]$ su -
Password:
[root@apple ~]# ls /home/cuda
cuda_11.0.3_450.51.06_linux.run
[root@apple ~]# cd /home/cuda
[root@apple cuda]#
[root@apple cuda]# chmod a+x cuda_11.0.3_450.51.06_linux.run
[root@apple cuda]#
[root@apple cuda]# ls -l
total 3039576
-rwxr-xr-x. 1 root root 3112522594 Aug  5 06:21 cuda_11.0.3_450.51.06_linux.run
[root@apple cuda]#
[root@apple cuda]# ./cuda_11.0.3_450.51.06_linux.run ← Launch the CUDA installer
Extraction failed.
Ensure there is enough space in /tmp and that the installation package is not corrupt
Signal caught, cleaning up ← This error appears when /tmp does not have enough free space
[root@apple cuda]#
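
The free space can be checked before launching the installer. The sketch below uses POSIX df; the helper names are hypothetical. How much space extraction really needs is not stated anywhere above, so the example threshold (the installer's own size in KB, taken from the ls output) is only an assumed lower bound.

```shell
#!/bin/sh
# Hypothetical helper: print the free space, in KB, of the filesystem holding $1.
avail_kb() {
    # -P forces the one-line-per-filesystem POSIX format; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Hypothetical helper: succeed if directory $1 has at least $2 KB free.
has_space() {
    [ "$(avail_kb "$1")" -ge "$2" ]
}

# Example: 3039576 KB is the installer size reported by "ls -l" above.
#   has_space /tmp 3039576 || echo "/tmp looks too small; pick another directory"
```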
[root@apple cuda]# ./cuda_11.0.3_450.51.06_linux.run -help
Options:
--silent
Performs an installation with no further user-input and minimal
command-line output based on the options provided below. Silent
installations are useful for scripting the installation of CUDA.
Using this option implies acceptance of the EULA. The following flags
can be used to customize the actions taken during installation. At
least one of --driver, --uninstall, --toolkit, and --samples must
be passed if running with non-root permissions.

--driver
Install the CUDA Driver.

--toolkit
Install the CUDA Toolkit.

--toolkitpath=<path>
Install the CUDA Toolkit to the <path> directory. If this flag is not
provided, the default path of /usr/local/cuda-10.2 is used.

--samples
Install the CUDA Samples.

--samplespath=<path>
Install the CUDA Samples to the <path> directory. If this flag is not
provided, the default path of /root/NVIDIA_CUDA-10.2_Samples is used.

--librarypath=<path>
Install libraries to the <path> directory. If this flag is not provided,
the default path of your distribution is used. This flag only applies to
libraries installed outside of the CUDA Toolkit path.

--installpath=<path>
Install everything to the <path> directory. This flag sets the same values
as the toolkitpath, samplespath, and librarypath options.

--extract=<path>
Extracts driver runfile and the raw files of the toolkit and samples to
<path>.

This is especially useful when one wants to install the driver using one or
more of the command-line options provided by the driver installer which
are not exposed in this installer.

--override
Ignores compiler version checks which would prevent installation.

--no-opengl-libs
Prevents the driver installation from installing NVIDIA's GL libraries.
Useful for systems where the display is driven by a non-NVIDIA GPU.
In such systems, NVIDIA's GL libraries could prevent X from loading
properly.

--no-man-page
Do not install the man pages under /usr/share/man.

--kernel-source-path=<path>
Tells the driver installation to use <path> as the kernel source directory
when building the NVIDIA kernel module. Required for systems where the
kernel source is installed to a non-standard location.

--run-nvidia-xconfig
Tells the driver installation to run nvidia-xconfig to update the system
X configuration file so that the NVIDIA X driver is used. The pre-existing
X configuration file will be backed up.

This option should not be used on systems that require a custom
X configuration, or on systems where a non-NVIDIA GPU is rendering the
display.

--no-drm
Do not install the nvidia-drm kernel module. This kernel module provides
several features, including X11 autoconfiguration, support for PRIME, and
DRM-KMS. The latter is used to support modesetting on windowing systems
that run independently of X11. The '--no-drm' option should only be used
to work around failures to build or install the nvidia-drm kernel module
on systems that do not need these features.

--tmpdir=<path>
Performs any temporary actions within <path> instead of /tmp. Useful in
cases where /tmp cannot be used (doesn't exist, is full, is mounted with
'noexec', etc.).

--help
Prints this help message.
[root@apple cuda]#
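
The flags in this help output can also be combined into an unattended install. The sketch below only prints a candidate command line; it does not run the installer. The flag combination is my assumption from the help text and was not tried on this machine, and cuda_silent_cmd is a hypothetical helper name. Per the help text, --silent implies acceptance of the EULA.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an unattended install command built
# only from flags documented in the installer's help output.
#   $1 = path to the runfile, $2 = directory to use instead of /tmp
cuda_silent_cmd() {
    runfile="$1"
    tmpdir="$2"
    printf '%s --silent --driver --toolkit --samples --run-nvidia-xconfig --tmpdir=%s\n' \
        "$runfile" "$tmpdir"
}

# Show the command for the file and directory used in this post.
cuda_silent_cmd ./cuda_11.0.3_450.51.06_linux.run /home/cuda
```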
[root@apple cuda]# ./cuda_11.0.3_450.51.06_linux.run --tmpdir=/home/cuda ← Launch the CUDA installer
Installation failed. See log at /var/log/cuda-installer.log for details. ← An error that comes up surprisingly often
[root@apple cuda]#
[root@apple cuda]# cat /var/log/cuda-installer.log
[INFO]: Driver not installed.
[INFO]: Checking compiler version…
[INFO]: gcc location: /usr/bin/gcc

[INFO]: gcc version: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)

[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 450.51.06
[INFO]: Executing NVIDIA-Linux-x86_64-450.51.06.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd --run-nvidia-xconfig 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 450.51.06 failed, quitting
[root@apple cuda]#
[root@apple cuda]# cat /var/log/nvidia-installer.log
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Mon Sep 14 10:20:16 2020
installer version: 450.51.06

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

nvidia-installer command line:
./nvidia-installer
--ui=none
--no-questions
--accept-license
--disable-nouveau
--no-cc-version-check
--install-libglvnd
--run-nvidia-xconfig

Using built-in stream user interface
-> Detected 40 CPUs online; setting concurrency level to 32.
-> Installing NVIDIA driver version 450.51.06.
-> For some distributions, Nouveau can be disabled by adding a file in the modprobe configuration directory. Would you like nvidia-installer to attempt to create this modprobe file for you? (Answer: Yes)
-> One or more modprobe configuration files to disable Nouveau have been written. For some distributions, this may be sufficient to disable Nouveau; other distributions may require modification of the initial ramdisk. Please reboot your system and attempt NVIDIA driver installation again. Note if you later wish to reenable Nouveau, you will need to delete these files: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf, /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
-> Performing CC sanity check with CC="/usr/bin/cc".
-> Performing CC check.
-> Kernel source path: '/lib/modules/4.18.0-193.el8.x86_64/source'
-> Kernel output path: '/lib/modules/4.18.0-193.el8.x86_64/build'
-> Performing Compiler check.
-> Performing Dom0 check.
-> Performing Xen check.
-> Performing PREEMPT_RT check.
-> Performing vgpu_kvm check.
-> Cleaning kernel module build directory.
executing: 'cd ./kernel; /usr/bin/make -k -j32 clean NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/4.18.0-193.el8.x86_64/source" SYSOUT="/lib/modules/4.18.0-193.el8.x86_64/build"'...
rm -f -r conftest
make[1]: Entering directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make[2]: Entering directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make[2]: Leaving directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make[1]: Leaving directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
-> Building kernel modules
executing: 'cd ./kernel; /usr/bin/make -k -j32 NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/4.18.0-193.el8.x86_64/source" SYSOUT="/lib/modules/4.18.0-193.el8.x86_64/build"'...
make[1]: Entering directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make[2]: Entering directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
/usr/src/kernels/4.18.0-193.el8.x86_64/Makefile:975: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel". Stop.
make[2]: Leaving directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make[1]: *** [Makefile:157: sub-make] Error 2
make[1]: Target 'modules' not remade because of errors.
make[1]: Leaving directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make: *** [Makefile:81: modules] Error 2
-> Error.
ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.
-> The command `cd ./kernel; /usr/bin/make -k -j32 NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/4.18.0-193.el8.x86_64/source" SYSOUT="/lib/modules/4.18.0-193.el8.x86_64/build"` failed with the following output:

make[1]: Entering directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make[2]: Entering directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
/usr/src/kernels/4.18.0-193.el8.x86_64/Makefile:975: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel". Stop.
make[2]: Leaving directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make[1]: *** [Makefile:157: sub-make] Error 2
make[1]: Target 'modules' not remade because of errors.
make[1]: Leaving directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make: *** [Makefile:81: modules] Error 2
-> Checking to see whether the nvidia kernel module was successfully built
executing: 'cd ./kernel; /usr/bin/make -k -j32 NV_KERNEL_MODULES="nvidia" NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/4.18.0-193.el8.x86_64/source" SYSOUT="/lib/modules/4.18.0-193.el8.x86_64/build"'...
make[1]: Entering directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make[2]: Entering directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
/usr/src/kernels/4.18.0-193.el8.x86_64/Makefile:975: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel". Stop.
make[2]: Leaving directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make[1]: *** [Makefile:157: sub-make] Error 2
make[1]: Target 'modules' not remade because of errors.
make[1]: Leaving directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make: *** [Makefile:81: modules] Error 2
-> Error.
ERROR: An error occurred while performing the step: "Checking to see whether the nvidia kernel module was successfully built". See /var/log/nvidia-installer.log for details.
-> The command `cd ./kernel; /usr/bin/make -k -j32 NV_KERNEL_MODULES="nvidia" NV_EXCLUDE_KERNEL_MODULES="" SYSSRC="/lib/modules/4.18.0-193.el8.x86_64/source" SYSOUT="/lib/modules/4.18.0-193.el8.x86_64/build"` failed with the following output:

make[1]: Entering directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make[2]: Entering directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
/usr/src/kernels/4.18.0-193.el8.x86_64/Makefile:975: *** "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel". Stop.
make[2]: Leaving directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make[1]: *** [Makefile:157: sub-make] Error 2
make[1]: Target 'modules' not remade because of errors.
make[1]: Leaving directory '/usr/src/kernels/4.18.0-193.el8.x86_64'
make: *** [Makefile:81: modules] Error 2
ERROR: The nvidia kernel module was not created.
ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
[root@apple cuda]#
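
The log repeats the same failure several times, so the useful lines are easy to miss. One way to pull them out, sketched here with a hypothetical helper name, is to keep only make's fatal "*** ... Stop." lines and the installer's ERROR: lines and drop the duplicates. In the log above, that surfaces the hint to install elfutils-libelf-devel.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct fatal line from an installer log.
#   $1 = log file (defaults to /var/log/nvidia-installer.log)
summarize_installer_log() {
    log="${1:-/var/log/nvidia-installer.log}"
    # Keep make's '*** ... Stop.' lines and lines containing 'ERROR:',
    # then collapse the repeats.
    grep -E '\*\*\* .*Stop\.|ERROR:' "$log" | sort -u
}

# Example (as root): summarize_installer_log
```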
[root@apple cuda]#
[root@apple cuda]# dnf install elfutils-libelf-devel ← Install the missing package
Last metadata expiration check: 1:15:25 ago on Mon Sep 14 09:08:49 2020.
Dependencies resolved.
================================================================================
 Package                   Architecture   Version          Repository     Size
================================================================================
Installing:
 elfutils-libelf-devel     x86_64         0.178-7.el8      BaseOS         58 k
Installing dependencies:
 zlib-devel                x86_64         1.2.11-13.el8    BaseOS         57 k

Transaction Summary
================================================================================
Install  2 Packages

Total size: 115 k
Installed size: 171 k
Is this ok [y/N]: y
Downloading Packages:
[SKIPPED] elfutils-libelf-devel-0.178-7.el8.x86_64.rpm: Already downloaded
[SKIPPED] zlib-devel-1.2.11-13.el8.x86_64.rpm: Already downloaded
--------------------------------------------------------------------------------
Total                                            11 MB/s | 115 kB     00:00
warning: /var/cache/dnf/BaseOS-929b586ef1f72f69/packages/elfutils-libelf-devel-0.178-7.el8.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 8483c65d: NOKEY
CentOS-8 - Base                                 1.6 MB/s | 1.6 kB     00:00
Importing GPG key 0x8483C65D:
 Userid     : "CentOS (CentOS Official Signing Key) <security@centos.org>"
 Fingerprint: 99DB 70FA E1D7 CE22 7FB6 4882 05B5 55B3 8483 C65D
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
Is this ok [y/N]: y
Key imported successfully
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                               1/1
  Installing       : zlib-devel-1.2.11-13.el8.x86_64               1/2
  Installing       : elfutils-libelf-devel-0.178-7.el8.x86_64      2/2
  Running scriptlet: elfutils-libelf-devel-0.178-7.el8.x86_64      2/2
  Verifying        : elfutils-libelf-devel-0.178-7.el8.x86_64      1/2
  Verifying        : zlib-devel-1.2.11-13.el8.x86_64               2/2
Installed products updated.

Installed:
  elfutils-libelf-devel-0.178-7.el8.x86_64   zlib-devel-1.2.11-13.el8.x86_64

Complete!
[root@apple cuda]#
[root@apple cuda]#

[root@apple cuda]# ./cuda_11.0.3_450.51.06_linux.run --tmpdir=/home/cuda ← Launch the CUDA installer again

┌──────────────────────────────────────────────────────────────────────────────┐
│ End User License Agreement │
│ -------------------------- │
│ │
│ NVIDIA Software License Agreement and CUDA Supplement to │
│ Software License Agreement. │
│ │
│ │
│ Preface │
│ ------- │
│ │
│ The Software License Agreement in Chapter 1 and the Supplement │
│ in Chapter 2 contain license terms and conditions that govern │
│ the use of NVIDIA software. By accepting this agreement, you │
│ agree to comply with all the terms and conditions applicable │
│ to the product(s) included herein. │
│ │
│ │
│ NVIDIA Driver │
│ │
│ │
│──────────────────────────────────────────────────────────────────────────────│
│ Do you accept the above EULA? (accept/decline/quit): │
accept
└──────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer │
│ - [X] Driver │
│ [X] 450.51.06 │
│ + [X] CUDA Toolkit 11.0 │
│ [X] CUDA Samples 11.0 │
│ [X] CUDA Demo Suite 11.0 │
│ [X] CUDA Documentation 11.0 │
│ Options │
│ Install │ ← Select Options before selecting Install.
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────┐
│ Options │
│ Driver Options │ ← Select Driver Options.
│ Toolkit Options │
│ Samples Options │
│ Library install path (Blank for system default) │
│ Done │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Driver │
│ [ ] Do not install any of the OpenGL-related driver files │
│ [ ] Do not install the nvidia-drm kernel module │
│ [X] Update the system X config file to use the NVIDIA X driver │ ← Check this item (it is unchecked by default)
│ Change directory containing the kernel source files │
│ Done │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘

===========
= Summary =
===========

Driver: Installed
Toolkit: Installed in /usr/local/cuda-11.0/
Samples: Installed in /root/, but missing recommended libraries

Please make sure that
- PATH includes /usr/local/cuda-11.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.0/lib64, or, add /usr/local/cuda-11.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
[root@apple cuda]#
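
The two path settings requested in the summary can be applied to a shell as follows. This is a minimal sketch: CUDA_HOME is a variable name I chose for illustration, and where to persist these lines (for example a login script) is left to the reader.

```shell
#!/bin/sh
# Toolkit location reported in the installer summary.
# CUDA_HOME is an illustrative variable name, not something the installer sets.
CUDA_HOME=/usr/local/cuda-11.0

# Prepend the toolkit directories. The ${VAR:+...} form avoids a stray
# trailing colon when the variable was previously empty or unset.
export PATH="${CUDA_HOME}/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Afterwards, "nvcc --version" should resolve to the 11.0 toolkit.
```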
[root@apple cuda]# nvidia-smi ← Verify the CUDA installation
Mon Sep 14 10:31:40 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P400         Off  | 00000000:2D:00.0 Off |                  N/A |
| 31%   42C    P0    N/A /  N/A |      0MiB /  1990MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[root@apple cuda]#

[root@apple cuda]# ls /etc/ld.so.conf.d
bind-export-x86_64.conf cuda-11-0.conf dyninst-x86_64.conf kernel-4.18.0-193.el8.x86_64.conf libiscsi-x86_64.conf
[root@apple cuda]#
[root@apple cuda]# cat /etc/ld.so.conf.d/cuda-11-0.conf
/usr/local/cuda-11.0/targets/x86_64-linux/lib
[root@apple cuda]#
[root@apple cuda]# ldconfig
[root@apple cuda]#
[root@apple cuda]#
[root@apple cuda]# systemctl set-default graphical.target
Removed /etc/systemd/system/default.target.
Created symlink /etc/systemd/system/default.target → /usr/lib/systemd/system/graphical.target.
[root@apple cuda]# reboot