On-Premise RAG Search

So I have a question, and maybe this will require the QNAP staff to answer.

I wanted to experiment with on-premise LLM and RAG searches in Qsirch. However, I do not have a GPU installed in my NAS. I do have the QNAP USB AI Accelerator device.

On QNAP’s website, nothing is said about a GPU being required.

But apparently it is. So why can’t we use the AI Accelerator instead of a GPU?

The on-premise RAG search loads LLMs into video memory at the time of the first search. If there’s insufficient GPU memory, it reports an “insufficient memory” error.
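As a rough illustration of why memory is the sticking point, here’s a back-of-envelope estimate. The bytes-per-parameter figures are general rules of thumb for LLM inference, not QNAP numbers:

```python
# Back-of-envelope VRAM estimate for loading an LLM.
# Rule of thumb (an assumption, not a QNAP spec): each parameter takes
# roughly 2 bytes at FP16, or around 0.5-0.6 bytes when 4-bit quantized,
# plus ~20% overhead for the KV cache and runtime buffers.

def estimate_vram_gb(params_billion: float,
                     bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Return an approximate VRAM footprint in GB."""
    return params_billion * bytes_per_param * overhead

for name, params, bpp in [("7B FP16", 7, 2.0),
                          ("7B 4-bit", 7, 0.55),
                          ("13B 4-bit", 13, 0.55)]:
    print(f"{name}: ~{estimate_vram_gb(params, bpp):.1f} GB")
```

By this math a 7B model at FP16 already wants well over 16 GB, while only small quantized models squeeze under 6 GB — which lines up with the recommended-GPU list and the first-search memory error.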

They don’t state minimum requirements, but they do list “Recommended” requirements here:

How to Set up On-Prem Qsirch RAG | QNAP

And they “recommend” an RTX 4000 (Ada Generation, which has 20 GB of vRAM) and an RTX 6000 Pro (which has 96 GB of vRAM). For kicks I tried with an RTX 3050 6 GB and was able to load some of the smaller models, but not many. Performance is similar to just running Ollama on a comparable desktop.

Regarding the “AI Accelerator”, I’d venture a guess it doesn’t have the memory for this sort of task, but better to let QNAP explain/cover that use case. :slight_smile:

Just to add: you can always use the online LLMs in the RAG search feature, but your NAS will ship data to Gemini, ChatGPT, etc., rather than staying fully on-prem. No GPU is required since all the work is done in those cloud providers’ data centers.

Hope this helps.

Aha. OK. The memory is probably the big thing here. I would agree that the accelerators probably don’t have the memory.

I’m not going to give up my 10 Gbit fiber and put in a GPU that may or may not work well with this in my TS-873A. Not worth it.

And I really don’t want to send my searches out to the world. I can do that just by going to those websites…

Thanks for your interest in Qsirch RAG Search.

At the moment, if you want to run Qsirch with an on-prem LLM + RAG search, it does need an NVIDIA GPU. The main reason is that local LLMs need quite a bit of GPU memory to run properly, and that’s something the AI Accelerator can’t really handle yet.

We understand that you’re concerned about data privacy and prefer to keep your data local. That makes total sense.
However, if you’re open to trying RAG search with a cloud LLM for evaluation purposes, it can be a good way to get a clear picture of how RAG works in Qsirch and how to set up OpenAI-compatible APIs.
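For anyone curious what “OpenAI-compatible” means in practice, here’s a minimal sketch of the request body such an endpoint expects at `POST /v1/chat/completions`. The base URL, API key, and model name below are placeholders, not Qsirch-specific values:

```python
import json

# Placeholders — substitute whatever your cloud provider (or a local
# OpenAI-compatible server) actually exposes.
BASE_URL = "https://api.example.com/v1"   # hypothetical endpoint
API_KEY = "sk-..."                        # your provider's key

def build_chat_request(question: str, context: str) -> dict:
    """Build the JSON body for POST {BASE_URL}/chat/completions.

    In a RAG setup, the retrieved document snippets go into the
    system message as context; the user's query goes in as-is.
    """
    return {
        "model": "gpt-4o-mini",  # placeholder model name
        "messages": [
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    }

body = build_chat_request("What is Qsirch?",
                          "Qsirch is QNAP's NAS search engine.")
print(json.dumps(body, indent=2))
```

The same payload shape works against any endpoint that implements the OpenAI chat-completions interface, which is why trying it against a cloud provider first is a low-effort way to validate the setup before investing in a GPU.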

You can also start with a small or non-sensitive dataset just to understand the workflow first, and then decide later whether running everything on-prem with a GPU makes sense for your use case.

Thank you, Vivian. I don’t really need an LLM search capability. It was really more of experimenting with the capabilities of the NAS.