This is a problem I ran into when I used an Ubuntu-based GPU server for AI work for the first time in a while.
(base) root@gpu-server:~$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.161
I resolved it by reinstalling the Nvidia driver.
The post linked below was a great help.
(Reference: https://dfso2222.tistory.com/69)
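Before reinstalling, it can be worth confirming where the versions disagree. The following is a minimal sketch, not taken from the referenced post: it assumes the loaded module exposes its version at /sys/module/nvidia/version and that `modinfo` is available, and `compare_versions` is a hypothetical helper name. It compares the module currently loaded in the kernel with the module file installed on disk.

```shell
#!/usr/bin/env bash
# Sketch: compare the loaded NVIDIA kernel module with the one installed on disk.

# Version of the module currently loaded in the kernel (empty if none is loaded).
loaded_version() {
    cat /sys/module/nvidia/version 2>/dev/null || true
}

# Version of the module file installed for the running kernel (empty if absent).
installed_version() {
    modinfo -F version nvidia 2>/dev/null || true
}

# Hypothetical helper: prints "match", "mismatch", or "unknown".
compare_versions() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "unknown"
    elif [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

echo "loaded:    $(loaded_version)"
echo "installed: $(installed_version)"
compare_versions "$(loaded_version)" "$(installed_version)"
```

If this prints "mismatch", the kernel is probably still running an older module than the one installed on disk, which is one common way to end up with the NVML error above.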
1. Completely removing the Nvidia driver
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get install ubuntu-desktop
sudo rm /etc/X11/xorg.conf
echo 'nouveau' | sudo tee -a /etc/modules
Running the commands above in order removes the driver cleanly.
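To double-check that nothing was left behind, something like the following may help. It is a minimal sketch, not from the referenced post: `remaining_nvidia_packages` is a hypothetical helper name, and it also lists libnvidia-* packages, which the '^nvidia-.*' pattern above does not match.

```shell
#!/usr/bin/env bash
# Sketch: list NVIDIA-related packages that are still installed after the purge.

# Reads `dpkg -l` output on stdin and prints the names of packages in the
# installed state ("ii") whose name starts with "nvidia-" or "libnvidia-".
remaining_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /^(lib)?nvidia-/ { print $2 }'
}

# Empty output means no matching package is still installed.
dpkg -l 2>/dev/null | remaining_nvidia_packages
```

Whether leftover libnvidia-* packages need to be removed depends on the setup; the list is only meant to show what is still there.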
2. Installing the Nvidia driver
(base) admin@gpu-server:~$ sudo apt install nvidia-driver-550
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
ca-certificates-java fonts-dejavu-extra g++-8 java-common javascript-common
libaccinj64-10.1 libatk-wrapper-java libatk-wrapper-java-jni libclang-cpp10
libcublas10 libcublaslt10 libcudart10.1 libcufft10 libcufftw10
libcuinj64-10.1 libcupti-dev libcupti-doc libcupti10.1 libcurand10
libcusolver10 libcusolvermg10 libcusparse10 libjs-jquery libncurses5
libnppc10 libnppial10 libnppicc10 libnppicom10 libnppidei10 libnppif10
libnppig10 libnppim10 libnppist10 libnppisu10 libnppitc10 libnpps10
libnvblas10 libnvgraph10 libnvidia-container-tools libnvidia-container1
libnvidia-ml-dev libnvjpeg10 libnvrtc10.1 libnvtoolsext1 libnvvm3
libstdc++-8-dev libthrust-dev libtinfo5 libvdpau-dev libz3-4 libz3-dev
llvm-10-tools node-html5shiv openjdk-8-jre openjdk-8-jre-headless
python3-pygments
... (output truncated) ...
3. Verifying the installation
You can verify it with the 'nvidia-smi' command.
(base) admin@gpu-server:~$ nvidia-smi
Mon Apr 22 12:04:24 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                     Off |   00000000:3B:00.0 Off |                    0 |
|  0%   39C    P8             13W /  300W |      13MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A40                     Off |   00000000:AF:00.0 Off |                    0 |
|  0%   33C    P8             13W /  300W |      13MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                              4MiB |
|    1   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+
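For a check that is easier to use in a script than the full table, `nvidia-smi` can also print selected fields as CSV. The following is a minimal sketch under that assumption; `driver_major` is a hypothetical helper name, and the output format of the loop is only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: print the driver's major version for each GPU using nvidia-smi's CSV query mode.

# Extract the major version ("550") from a driver version string ("550.54.15").
driver_major() {
    printf '%s\n' "${1%%.*}"
}

# Capture the output first so that a failing nvidia-smi is reported instead of parsed.
if out="$(nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader 2>&1)"; then
    printf '%s\n' "$out" | while IFS=, read -r idx name ver; do
        # Fields are separated by ", ", so strip the leading space from each value.
        echo "GPU ${idx}: ${name# }, driver major $(driver_major "${ver# }")"
    done
else
    echo "nvidia-smi failed: ${out}"
fi
```

On the server above this should report major version 550 for both A40 cards, in line with the table.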
Thank you.