LLM + RAG 超入門

これまでいくつかローカルのPCで利用するLLMの紹介をしてきました。今回はLLMを使ったとても最近人気のあるRAGについてです。

RAGはRetrieval Augmented Generationの略で、大規模言語モデルと検索を組み合わせることで企業向けの需要を満たすということで大きな人気があります。RAGは略称の通り、

Retrieval（検索）

Augmented（拡張）

Generation（生成）

の三つのステップで処理が進みます。通常のLLMでの質問は知識はLLM内部に学習された内容に限られ、しばしば間違えることもあります。また、通常のモデルではトークン数と言ってLLMが入力として理解のできる単語数に制限があります。そのため、検索のシステムを一度通し、必要な情報を取得し、それを観ながら与えられた処理を完結するというステップを踏むのがRAGになります。

例えば、ユーザーがある知識について質問をしたい場合には、あらかじめデータベースや参照できる文書を用意しておき、そちらから必要な情報を検索して抜き出しておいた上で、それらを観ながらユーザーに対して回答を行った方がより精度が上がるということがあります。最近の言語モデルでは非常に長い文章を入力として入れることもできるので、質問と一緒にマニュアルを提供してしまって探してもらうということもできますが、RAGの場合には検索を一度通すという処理を行います。

検索をするためには文章を読み込み、チャンクと言っていくつかの文字の塊に分けてデータベースに格納するようです。そして、その問いかけに対して類似度を検索して、似ている順番に取り出します。

今回はこれまで学んできたローカルLLMを使ってRAGを実装してみます。モデルは、Mistral7Bを使ってみます。

RAGを実装するにはどうやらLangchainというフレームワークを使うのが簡単のようなので使ってみます。

参考資料はLangchainのオフィシャルドキュメントがよくまとまっていました。

https://www.langchain.com/

マシンはローカルLLMを利用しますので、今回はベクトルの検索用のモデルとローカル用のLLMを使うのでちょっとVRAMに余裕を持たせてH100を使ってみました。こんなにスペックいらないので、普通のGPUでも良いと思います。

ツールのインストールですが、今回はPDFの論文を読み込ませたいので、pypdfを使います。

pip install transformers accelerate langchain langchain-community sentence-transformers faiss-gpu pypdf

最初にローカルのLLMを読み込みます。今回は簡単のためにpipelineというものを利用しました。設定が簡単です。max_new_tokensの値によって出力の長さが変わりました。英語で論理量子ビットとは何か？を聞いてみました。

from transformers import AutoTokenizer, pipeline
#import torch

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model_id, tokenizer=tokenizer, device=0, max_new_tokens=300)

query = 'what is a logical qubit ?'
pipe(query)

答えは、

[{'generated_text': 'what is a logical qubit ?\n\nComment: @user1266428: A qubit is a quantum bit, a quantum two-level system. It is a quantum mechanical object, not a logical one.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428: A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.\n\nComment: @user1266428:'}]

こんな感じです。

A logical qubit is a qubit that is used to represent a logical bit in a quantum computer.

意味がわかりませんが、なんか答えてます。正確ではありませんね。

早速参照するための文章を探します。今回はarxivのPDFから今話題のQuEraの冷却原子論文を読みます。

chunk_sizeは文章を分割するサイズを指定しています。類似度を計算するモデルはembeddingsというモデルでhugging faceから指定をしています。

今回利用するFAISSというのはベクトル検索用のfacebookの便利ライブラリのようです。何も考えずに使います。

Facebook AI Similarity Search (Faiss)

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings.huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import CharacterTextSplitter

loader = PyPDFLoader("https://arxiv.org/pdf/2312.03982.pdf")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = HuggingFaceEmbeddings(
model_name="intfloat/multilingual-e5-large"
)

db = FAISS.from_documents(docs, embeddings)
print(db.index.ntotal)

先ほど設定したqueryに対して類似度の高い文章を検索するには、

docs = db.similarity_search(query)
print(docs[0].page_content)

Logical quantum processor based on reconfigurable atom arrays
Dolev Bluvstein1, Simon J. Evered1, Alexandra A. Geim1, Sophie H. Li1, Hengyun Zhou1,2,
Tom Manovitz1, Sepehr Ebadi1, Madelyn Cain1, Marcin Kalinowski1, Dominik Hangleiter3, J. Pablo Bonilla
Ataides1, Nishad Maskara1, Iris Cong1, Xun Gao1, Pedro Sales Rodriguez2, Thomas Karolyshyn2,
Giulia Semeghini4, Michael J. Gullans3, Markus Greiner1, Vladan Vuleti´ c5, and Mikhail D. Lukin1
1Department of Physics, Harvard University, Cambridge, MA 02138, USA
2QuEra Computing Inc., Boston, MA 02135, USA
3Joint Center for Quantum Information and Computer Science,
NIST/University of Maryland, College Park, Maryland 20742, USA
4John A. Paulson School of Engineering and Applied Sciences,
Harvard University, Cambridge, MA 02138, USA
5Department of Physics and Research Laboratory of Electronics,
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Suppressing errors is the central challenge for useful quantum computing [1], requiring quan-
tum error correction [2–6] for large-scale processing. However, the overhead in the realization of
error-corrected “logical” qubits, where information is encoded across many physical qubits for redun-
dancy [2–4], poses significant challenges to large-scale logical quantum computing. Here we report
the realization of a programmable quantum processor based on encoded logical qubits operating
with up to 280 physical qubits. Utilizing logical-level control and a zoned architecture in recon-
figurable neutral atom arrays [7], our system combines high two-qubit gate fidelities [8], arbitrary
connectivity [7, 9], as well as fully programmable single-qubit rotations and mid-circuit readout [10–
15]. Operating this logical processor with various types of encodings, we demonstrate improvement
of a two-qubit logic gate by scaling surface code [6] distance from d= 3 to d= 7, preparation
of color code qubits with break-even fidelities [5], fault-tolerant creation of logical GHZ states and
feedforward entanglement teleportation, as well as operation of 40 color code qubits. Finally, using
three-dimensional [[8,3,2]] code blocks [16, 17], we realize computationally complex sampling cir-
cuits [18] with up to 48 logical qubits entangled with hypercube connectivity [19] with 228 logical
two-qubit gates and 48 logical CCZ gates [20]. We find that this logical encoding substantially
improves algorithmic performance with error detection, outperforming physical qubit fidelities at
both cross-entropy benchmarking and quantum simulations of fast scrambling [21, 22]. These results
herald the advent of early error-corrected quantum computation and chart a path toward large-scale
logical processors.
（以下略）

同じように、Langchainで使えるクラスを使っても同じように検索できます。中身は同じでした。

retriever = db.as_retriever()
docs = retriever.invoke(query)
print(docs[0].page_content)

ここまで準備できればもう早速使えるようです。

設定には、

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=pipe)

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
return "\n\n".join([d.page_content for d in docs])

chain = (
  {"context": retriever | format_docs, "question": RunnablePassthrough()}
  | prompt
  | llm
  | StrOutputParser()
)

answer = chain.invoke(query)
print(answer)

結果、、、

（略）
Question: what is a logical qubit ?

Answer: A logical qubit is a quantum state that is encoded across many physical qubits for redundancy in quantum error correction. It is designed to operate with high fidelity by delocalizing a logical qubit degree of freedom across many redundant physical qubits, such that if any given physical qubit fails, it does not corrupt the underlying logical information. In practice, useful quantum error correction poses many challenges, including significant overhead in physical qubit numbers and highly complex gate operations between the delocalized logical degrees of freedom. The paper describes the realization of a programmable quantum processor based on hardware-efficient control over logical qubits in reconfigurable neutral atom arrays, and demonstrates key building blocks of quantum error correction and programmable logical algorithms.

むちゃくちゃ正確になって成功しました。。。肝はembedding modelとLLMモデルなどですかね。以上です。

LLM + RAG 超入門

Yuichiro Minato