So far, I've introduced several LLMs that run on local PCs. This time, I'll discuss RAG, a very popular technique that builds on LLMs.
RAG stands for Retrieval-Augmented Generation. It combines large language models with search, which fits business needs well and has made it quite popular. As the name suggests, RAG proceeds in three steps:
- Retrieval: search for the relevant information
- Augmented: add what was found to the prompt
- Generation: have the model produce the answer
Typically, an LLM can only answer from what it learned during training, which often leads to errors. On top of that, models have a limited context window, so an LLM can only accept a bounded amount of text as input. RAG therefore adds a step in which a search system retrieves the necessary information, and the model reads it before completing the given task.
For instance, when a user wants to ask about specific knowledge, preparing a database or documents in advance, so that the necessary information can be searched and extracted, improves the accuracy of the response. Recent language models can accept very long inputs, so you could simply hand over a whole manual together with the question, but what distinguishes RAG is the explicit search step.
For the search, texts are read, divided into chunks (blocks of text), and stored in a database. A query is then matched against those chunks, and they are retrieved in order of similarity.
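Before touching any libraries, here's a minimal conceptual sketch of that retrieval step in plain Python. The function names are made up for illustration, and a real embedding model would supply the vectors; this just shows the chunk-then-rank idea.

import numpy as np

def split_into_chunks(text, chunk_size=1000):
    # Naive fixed-size chunking; real splitters try to respect sentence boundaries.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def cosine_similarity(a, b):
    # Similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec, chunks, chunk_vecs, k=3):
    # Rank all chunks against the query vector and keep the top k.
    ranked = sorted(zip(chunks, chunk_vecs),
                    key=lambda pair: cosine_similarity(query_vec, pair[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]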
This time, I'll implement RAG with the local LLMs we've covered, using the Mistral-7B model. Using the LangChain framework seems to simplify the implementation, and the official LangChain documentation is a well-organized reference.
Since the LLM runs locally and we also need an embedding model for vector search alongside it, I used an H100 for a bit more VRAM headroom; however, a standard GPU should suffice.
For tool installation, since I want to read PDF papers this time, I'll be using PyPDF.
pip install transformers accelerate langchain langchain-community sentence-transformers faiss-gpu pypdf
First, I load the local LLM. For simplicity I used a transformers pipeline, which makes the setup easy. The output length depends on the value of max_new_tokens. I asked, "What is a logical qubit?" in English.
from transformers import AutoTokenizer, pipeline

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Text-generation pipeline on GPU 0; max_new_tokens caps the response length.
pipe = pipeline("text-generation", model=model_id, tokenizer=tokenizer, device=0, max_new_tokens=300)

query = 'what is a logical qubit ?'
pipe(query)
[{'generated_text': 'what is a logical qubit ?\n\nComment: @user1266428: A qubit is a quantum bit, a quantum two-level system. It is a quantum mechanical object, not a logical one.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428:'}]
A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.
I got an answer, but it loops and isn't accurate.
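As an aside, one likely reason for the rambling is that I passed a bare string to an Instruct-tuned model. Wrapping the query in the model's chat format via the tokenizer usually helps (a small sketch; apply_chat_template is a standard transformers method, and Mistral-7B-Instruct ships a chat template):

messages = [{"role": "user", "content": query}]
# Produce a prompt in Mistral's [INST] ... [/INST] format, ready for generation.
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(pipe(prompt_text)[0]["generated_text"])

Even with proper formatting, though, the model can only draw on its training data, so let's add retrieval.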
Next, I'll gather documents to reference. This time, I'll read a paper on QuEra's cold atoms from an arXiv PDF.
The chunk_size specifies the size of the pieces the document is divided into. The model that computes similarity is called the embedding model; I've specified one from Hugging Face.
For the vector search itself I'm using FAISS (Facebook AI Similarity Search), a handy vector-search library from Facebook, with its default settings.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings.huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Load the paper straight from arXiv and split it into ~1000-character chunks.
loader = PyPDFLoader("https://arxiv.org/pdf/2312.03982.pdf")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Embedding model for similarity, taken from Hugging Face.
embeddings = HuggingFaceEmbeddings(
    model_name="intfloat/multilingual-e5-large"
)

# Build the FAISS vector store and check how many chunks were indexed.
db = FAISS.from_documents(docs, embeddings)
print(db.index.ntotal)
32
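As a side note, the index can be saved to disk and reloaded, so the PDF doesn't need to be re-embedded on every run. save_local and load_local are standard LangChain FAISS methods; the folder name here is arbitrary.

# Persist the index, then load it back with the same embedding model.
db.save_local("faiss_index")
# Newer langchain-community versions require the deserialization flag; omit it on older ones.
db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)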
To search for the chunks most similar to the query set earlier:
docs = db.similarity_search(query)
print(docs[0].page_content)
Logical quantum processor based on reconfigurable atom arrays
Dolev Bluvstein1, Simon J. Evered1, Alexandra A. Geim1, Sophie H. Li1, Hengyun Zhou1,2,
Tom Manovitz1, Sepehr Ebadi1, Madelyn Cain1, Marcin Kalinowski1, Dominik Hangleiter3, J. Pablo Bonilla
Ataides1, Nishad Maskara1, Iris Cong1, Xun Gao1, Pedro Sales Rodriguez2, Thomas Karolyshyn2,
Giulia Semeghini4, Michael J. Gullans3, Markus Greiner1, Vladan Vuletić5, and Mikhail D. Lukin1
1Department of Physics, Harvard University, Cambridge, MA 02138, USA
2QuEra Computing Inc., Boston, MA 02135, USA
3Joint Center for Quantum Information and Computer Science,
NIST/University of Maryland, College Park, Maryland 20742, USA
4John A. Paulson School of Engineering and Applied Sciences,
Harvard University, Cambridge, MA 02138, USA
5Department of Physics and Research Laboratory of Electronics,
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Suppressing errors is the central challenge for useful quantum computing [1], requiring quan-
tum error correction [2–6] for large-scale processing. However, the overhead in the realization of
error-corrected “logical” qubits, where information is encoded across many physical qubits for redun-
dancy [2–4], poses significant challenges to large-scale logical quantum computing. Here we report
the realization of a programmable quantum processor based on encoded logical qubits operating
with up to 280 physical qubits. Utilizing logical-level control and a zoned architecture in recon-
figurable neutral atom arrays [7], our system combines high two-qubit gate fidelities [8], arbitrary
connectivity [7, 9], as well as fully programmable single-qubit rotations and mid-circuit readout [10–
15]. Operating this logical processor with various types of encodings, we demonstrate improvement
of a two-qubit logic gate by scaling surface code [6] distance from d= 3 to d= 7, preparation
of color code qubits with break-even fidelities [5], fault-tolerant creation of logical GHZ states and
feedforward entanglement teleportation, as well as operation of 40 color code qubits. Finally, using
three-dimensional [[8,3,2]] code blocks [16, 17], we realize computationally complex sampling cir-
cuits [18] with up to 48 logical qubits entangled with hypercube connectivity [19] with 228 logical
two-qubit gates and 48 logical CCZ gates [20]. We find that this logical encoding substantially
improves algorithmic performance with error detection, outperforming physical qubit fidelities at
both cross-entropy benchmarking and quantum simulations of fast scrambling [21, 22]. These results
herald the advent of early error-corrected quantum computation and chart a path toward large-scale
logical processors.
(rest of the retrieved text omitted)
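If you want to see how close each hit actually is, the vector store can also return scores (with the default FAISS setup this is an L2 distance, so lower means more similar):

# Retrieve the top chunks together with their distance scores.
for doc, score in db.similarity_search_with_score(query, k=3):
    print(score, doc.page_content[:80])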
We can also use the retriever class; the result is the same.
retriever = db.as_retriever()
docs = retriever.invoke(query)
print(docs[0].page_content)
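The retriever can also be tuned; for example, to return only the two most similar chunks (using a separate variable here so the retriever used below is unchanged):

# A retriever limited to the top two chunks.
retriever_top2 = db.as_retriever(search_kwargs={"k": 2})
print(len(retriever_top2.invoke(query)))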
It seems everything is ready to use right away. The chain is configured as follows:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.llms import HuggingFacePipeline

# Wrap the transformers pipeline so LangChain can call it as an LLM.
llm = HuggingFacePipeline(pipeline=pipe)

template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    # Join the retrieved chunks into a single context string.
    return "\n\n".join([d.page_content for d in docs])

# Retrieval fills {context}; the question passes through unchanged.
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
answer = chain.invoke(query)
print(answer)
(output truncated)
Question: what is a logical qubit ?
Answer: A logical qubit is a quantum state that is encoded across many physical qubits for redundancy in quantum error correction. It is designed to operate with high fidelity by delocalizing a logical qubit degree of freedom across many redundant physical qubits, such that if any given physical qubit fails, it does not corrupt the underlying logical information. In practice, useful quantum error correction poses many challenges, including significant overhead in physical qubit numbers and highly complex gate operations between the delocalized logical degrees of freedom. The paper describes the realization of a programmable quantum processor based on hardware-efficient control over logical qubits in reconfigurable neutral atom arrays, and demonstrates key building blocks of quantum error correction and programmable logical algorithms.
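The chain can now be reused for any question about the same paper; for example (a hypothetical follow-up, output not shown):

print(chain.invoke("How many physical qubits does the processor use?"))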
It worked with impressive accuracy. The keys seem to be the choice of embedding model and LLM. That's all for this time.