Ollama cannot use GPU in Docker on QNAP (RTX 3090, CUDA init fails)

Hello,

I have the same issue as you (RTX Pro 4000 Blackwell), whether it's with the RAG in Qsirch or in an Ollama container. It happens often but irregularly (not always after the same amount of time): the model runs on the CPU rather than the GPU. I have also opened a support ticket and am awaiting a response.

I've only been using Ollama for a few days and am starting to learn some basic commands, but I can't figure out where the problem is coming from.

On my end, all applications, containers, and models are stored on an SSD RAID.