I am researching various vector search methods with the aim of finding a cost-effective option for RAG. Faiss, a tool developed by Facebook Research, caught my attention, and I would like to run some benchmarks on it.
https://github.com/facebookresearch/faiss
CPU Version
The CPU version can be easily installed.
pip install faiss-cpu
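To make sure the install works, here is a minimal smoke test against raw Faiss (the dimension and index type are arbitrary choices for illustration, not the settings used below):

import numpy as np
import faiss

d = 128  # embedding dimension, arbitrary for this test
xb = np.random.random((1000, d)).astype("float32")  # 1,000 dummy database vectors
xq = np.random.random((5, d)).astype("float32")     # 5 dummy query vectors

index = faiss.IndexFlatL2(d)  # exact (brute-force) L2 index, the simplest kind
index.add(xb)                 # register the vectors
D, I = index.search(xq, 3)    # top-3 nearest neighbours per query
print(I)                      # row i holds the matches for query i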
This time, I used random text from Wikipedia, about 100,000 characters in total. In a typical company, even gathering that much usable business text can be quite a challenge.
In vector search, the original text is divided into chunks, which are then searched. For this experiment, the chunk settings are as follows:
chunk_size=500, chunk_overlap=125
I think these are appropriate, commonly used settings.
The character count is:
91254
The final number of registered chunks is:
265
Because of the 125-character overlap, this is noticeably more than a plain division would suggest (91,254 / 500 ≈ 183 chunks).
The time taken to split the text is:
0.05208230018615723 s
It's almost instantaneous, so there's no need to worry about it.
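For reference, a minimal sketch of the splitting and timing step, assuming LangChain's RecursiveCharacterTextSplitter (the splitter class and variable names are illustrative; this code isn't shown above):

import time
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=125)
start = time.time()
docs = splitter.create_documents([text])  # text holds the ~91,000-character source
print(time.time() - start)                # the split time reported above
print(len(docs))                          # 265 chunks in my run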
To vectorize the split chunks and register them in the Faiss database, the process is:
db = FAISS.from_documents(docs, embeddings)
and it takes
7.595894813537598 s
This took longer than expected. I haven't dug into the details, so I could be wrong, but vectorizing the chunks and registering them is clearly the expensive step.
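For completeness, the whole registration step looks roughly like this: a sketch assuming OpenAIEmbeddings as the embedding model (any LangChain embedding class would do) and an arbitrary save path:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()              # assumed model; swap in whatever embedding you use
db = FAISS.from_documents(docs, embeddings)  # embeds every chunk, then builds the index
db.save_local("faiss_index")                 # "faiss_index" is an arbitrary folder name

If the embeddings come from a remote API, those calls presumably account for most of the 7.6 s, not the Faiss insertion itself.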
Next, let's use the registered database. Loading the saved database takes:
0.0033731460571289062 s
It hardly takes any time. Let's perform a search. I tried it with a random prompt.
0.0436704158782959 s
So, for this size, it hardly takes any time at all.
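The load-and-search step, sketched under the same assumptions as above (recent LangChain versions may additionally require allow_dangerous_deserialization=True in load_local):

db = FAISS.load_local("faiss_index", embeddings)              # just reads the index from disk
results = db.similarity_search("some arbitrary prompt", k=4)  # embeds the query, then searches
print(results[0].page_content)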
-----
Before moving on to the GPU, I changed the chunk size a bit.
chunk_size=1000, chunk_overlap=0
This is another commonly used size. The text splitting takes:
0.013817548751831055 s
Effectively instantaneous, and even faster than before (fewer chunks to produce, no overlap).
The database registration process is also:
5.615519046783447 s
... and this is faster too, presumably because there are fewer chunks to embed. The number of registered chunks is:
96
The database load time is:
0.0034017562866210938 s
Not much different from the previous size. The search takes:
0.05048203468322754 s
This actually took slightly longer than with the smaller chunks; a larger chunk size seems to add a little to the search time.
-----
I'll try it with the GPU as well. The GPU is an RTX 3060, a readily available card.
I'll test with two chunk sizes.
Chunk size: 500, overlap: 125
Text splitting: 0.0519862174987793 s
DB registration: 7.6153154373168945 s
Retrieval: 0.05502033233642578 s

Chunk size: 1000, overlap: 0
Text splitting: 0.014249086380004883 s
DB registration: 5.6154749393463135 s
Retrieval: 0.04950141906738281 s
So, for this size, there isn't much time difference between using the GPU and the CPU.
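For reference, the standard way to move a Faiss index onto the GPU with faiss-gpu is index_cpu_to_gpu; with the LangChain wrapper, the raw index sits on db.index. A sketch, with the caveat that the attribute access is my assumption about the wrapper's internals:

import faiss

res = faiss.StandardGpuResources()                    # allocates GPU memory/streams
gpu_index = faiss.index_cpu_to_gpu(res, 0, db.index)  # copy the CPU index to GPU 0
db.index = gpu_index                                  # subsequent searches run on the GPU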
To make sure the GPU was actually being used, I checked its power draw during a run. There is some overhead in the measurement, but the consumption clearly increased: standby power is normally around 20-30 W.
NVIDIA GeForce RTX 3060: 76 W / 170 W (current draw / power limit)
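To watch the power draw live during a run, one option is to poll nvidia-smi, e.g. from Python:

import subprocess

# Print GPU name, current power draw, and power limit once per second (Ctrl+C to stop).
subprocess.run([
    "nvidia-smi",
    "--query-gpu=name,power.draw,power.limit",
    "--format=csv",
    "-l", "1",
])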