On-Premise RAG Search

NA9D · December 9, 2025, 2:40pm

So I have a question and maybe this will require the QNAP staff to answer..

I wanted to experiment with on-premise LLM and RAG searches in Qsirch. However, I do not have a GPU installed in my NAS. I do have the QNAP USB AI Accelerator device.

On QNAP’s website, nothing is said about a GPU being required:

But apparently it is. But why can’t we use the AI accelerator instead of the GPU?

rinthos · December 9, 2025, 4:14pm

The On-premise RAG search loads LLMs into video memory at the time of the first search. If there’s insufficient GPU memory, it reports an error (insufficient memory).

They don’t state minimum requirements, but they do list “Recommended” requirements here:

How to Set up On-Prem Qsirch RAG | QNAP

And they “recommend” an RTX 4000 (Ada Generation, which has 20GB of vRAM) and an RTX 6000 Pro (also Ada gen, which has 96GB of vRAM). For kicks I tried with a RTX 3050 6GB and was able to load some of the smaller models, but not a lot. Performance is similar to just running ollama on a comparable desktop.

Regarding the “AI Accelerator”, I’d venture a guess it doesn’t have the memory for this sort of task, but better to let QNAP explain/cover that use-case..

Just to add, you can always use the On-line LLMs in the RAG search feature, but your NAS will ship data to Gemini, ChatGPT, etc, vs. being fully on-prem. No GPU required since all the work is being done in those cloud provider’s DCs.

Hope this helps.

NA9D · December 9, 2025, 5:58pm

Aha. OK. The memory is probably the big thing here. I would agree that the accelerators probably don’t have the memory.

I’m not going to give up my 10 Gbit fiber and put in a GPU that may or may not work well with this in my TS-873A. Not worth it.

And really don’t want to go out to the world with my searches. I can do that with just going to those websites…