Hi everyone,
I’m trying to run Ollama with GPU acceleration inside Docker on my QNAP NAS, but it always falls back to CPU. I’ve done quite a bit of debugging and would appreciate any advice, or confirmation of whether this is a known limitation.
My setup
- QTS: 5.2.9
- Kernel: 5.10.60-qnap
- GPU: NVIDIA GeForce RTX 3090
- NVIDIA Driver (QPKG): 575.64.05
- Driver type: NVIDIA Open Kernel Module
- Docker: Container Station + CLI (--gpus all)
What works
- nvidia-smi works on host (via container)
- nvidia-smi works inside containers
- /dev/nvidia* devices are present
- NVIDIA modules loaded:
  nvidia
  nvidia_uvm
  nvidia_modeset
  nvidia_drm
So GPU passthrough to containers seems fine.
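For anyone who wants to repeat these checks, here is a rough sketch of how they could be scripted. It assumes Python is available wherever it runs and that /proc/modules is readable there. The helper names are made up for this post and are not part of any NVIDIA or Ollama tooling.

```python
import glob

# Module names taken from the list above.
EXPECTED = ("nvidia", "nvidia_uvm", "nvidia_modeset", "nvidia_drm")

def nvidia_modules(proc_modules_text):
    """Return the set of loaded kernel modules whose name starts with 'nvidia'."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("nvidia"):
            names.add(fields[0])  # first column of /proc/modules is the module name
    return names

def passthrough_report(proc_modules_path="/proc/modules"):
    """Collect the visible /dev/nvidia* nodes and any expected modules that are absent."""
    try:
        with open(proc_modules_path) as fh:
            loaded = nvidia_modules(fh.read())
    except OSError:
        loaded = set()  # file not readable here; every module is reported as missing
    return {
        "devices": sorted(glob.glob("/dev/nvidia*")),
        "missing_modules": sorted(set(EXPECTED) - loaded),
    }

# Usage: print(passthrough_report())
```

An empty "missing_modules" list plus a non-empty "devices" list would match what I see by hand.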
What does NOT work
Ollama does not detect the GPU and always uses the CPU:
inference compute id=cpu library=cpu
total_vram="0 B"
This happens even though the GPU is visible inside the container.
What I tested
1. Different Ollama versions
- 0.20.6-rc1
- 0.20.5
- 0.19.0
→ same result (CPU only)
2. CUDA libraries
Inside container, CUDA libs are present:
/usr/lib/ollama/cuda_v12/libcudart.so.12
/usr/lib/ollama/cuda_v12/libcublas.so.12
/usr/lib/ollama/cuda_v12/libcublasLt.so.12
Initially ldd showed missing libs, but after setting:
LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/lib/x86_64-linux-gnu
→ all dependencies resolve correctly.
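The ldd check is easy to script so it can be rerun with and without the variable. A minimal sketch, assuming Python and ldd both exist in the container; `missing_libs` and `check` are hypothetical helper names, and the library path in the usage comment is one of the files listed above.

```python
import os
import subprocess

def missing_libs(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # ldd prints unresolved entries as "<name> => not found"
            missing.append(line.split("=>")[0].strip())
    return missing

def check(path, ld_library_path):
    """Run ldd on `path` with the given LD_LIBRARY_PATH and list unresolved libraries."""
    env = dict(os.environ, LD_LIBRARY_PATH=ld_library_path)
    proc = subprocess.run(["ldd", path], env=env, capture_output=True, text=True)
    return missing_libs(proc.stdout)

# Usage (inside the container):
# check("/usr/lib/ollama/cuda_v12/libcublas.so.12",
#       "/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/lib/x86_64-linux-gnu")
```

An empty list means every dependency resolved, which is what I get once the variable is set.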
3. Still fails
Despite that, Ollama fails CUDA init:
ggml_cuda_init: failed to initialize CUDA: initialization error
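To rule out Ollama itself, the same initialization can be attempted directly against the driver library. The sketch below calls cuInit(0) from the CUDA driver API through ctypes. It assumes libcuda.so.1 is on the loader path inside the container and that Python is installed there; `probe_cuda` is a hypothetical helper name.

```python
import ctypes

def probe_cuda(libname="libcuda.so.1"):
    """Load the CUDA driver library, call cuInit(0), and report what happened."""
    result = {"loaded": False, "cuInit": None, "error": None, "devices": None}
    try:
        cuda = ctypes.CDLL(libname)
    except OSError as exc:
        result["error"] = str(exc)  # library not found or not loadable
        return result
    result["loaded"] = True

    rc = cuda.cuInit(0)  # returns a CUresult; 0 is CUDA_SUCCESS
    result["cuInit"] = rc
    if rc != 0:
        # Ask the driver for a human-readable description of the error code.
        msg = ctypes.c_char_p()
        if cuda.cuGetErrorString(rc, ctypes.byref(msg)) == 0 and msg.value:
            result["error"] = msg.value.decode()
        return result

    count = ctypes.c_int(0)
    if cuda.cuDeviceGetCount(ctypes.byref(count)) == 0:
        result["devices"] = count.value
    return result

# Usage: print(probe_cuda())
```

If cuInit fails here as well, that would suggest the problem sits below Ollama, in the driver or kernel module, rather than in Ollama's bundled libraries.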
4. Kernel logs (this looks suspicious)
NVRM: nvCheckOkFailedNoLog: Check failed: Out of memory [NV_ERR_NO_MEMORY]
NVRM: faultbufCtrlCmdMmuFaultBufferRegisterNonReplayBuf_IMPL: Error allocating client shadow fault buffer
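To pull these messages out of a longer dmesg dump, something like the following could be used. It is only a sketch: `nvrm_events` is a hypothetical helper, and the bracketed NV_ERR_* pattern is inferred from the two lines above, not from any documented log format.

```python
import re

# Matches a bracketed status code such as [NV_ERR_NO_MEMORY].
NV_ERR = re.compile(r"\[(NV_ERR_[A-Z0-9_]+)\]")

def nvrm_events(log_text):
    """Return (code, message) pairs for every line containing 'NVRM:'.

    `code` is the NV_ERR_* name if one appears in brackets, otherwise None.
    """
    events = []
    for line in log_text.splitlines():
        if "NVRM:" not in line:
            continue
        message = line.split("NVRM:", 1)[1].strip()
        match = NV_ERR.search(line)
        events.append((match.group(1) if match else None, message))
    return events

# Usage: save the output of `dmesg` to a file, then
# for code, message in nvrm_events(open("dmesg.txt").read()):
#     print(code, message)
```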
My current understanding
It looks like:
- GPU passthrough works (Docker side OK)
- CUDA libraries are present
- but CUDA initialization fails at runtime
Since I’m using the NVIDIA Open Kernel Module, I suspect that either:
- it does not fully support CUDA workloads in this environment, or
- there is a compatibility issue with the QNAP kernel (5.10.60)
Questions
- Has anyone successfully run Ollama (or any CUDA-heavy app) with GPU on QNAP?
- Is this a known limitation of the NVIDIA Open Kernel Module on QNAP?
- Is it possible to use the proprietary NVIDIA driver instead of the open module?
- Has anyone seen NV_ERR_NO_MEMORY errors like this?
Workaround
Right now I’m considering:
- running Ollama on a separate Linux machine (Debian/Ubuntu)
- and keeping QNAP only for UI/services



