Running Google’s Gemma2 on an AMD GPU: A Detailed Walkthrough

Yuichiro Minato

2024/08/12 04:40

In this post, I’ll walk you through my experience running Google’s large language model, Gemma2, on an AMD RX7900XTX GPU, covering everything from environment setup to text generation.

Environment Setup and Preparation

First, I installed a ROCm build of PyTorch, which is required for PyTorch to use an AMD GPU. Next, I installed the Hugging Face Transformers library, which provides easy access to the Gemma2 model. The -U flag upgrades any existing install, which matters because Gemma2 support requires a recent release:

pip install -U transformers
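Before going further, it can help to confirm that the installed wheel really is a ROCm build. ROCm builds of PyTorch set torch.version.hip to a version string, CUDA builds set torch.version.cuda instead, and on a ROCm build torch.cuda.is_available() returns True when an AMD GPU is visible. The sketch below relies only on those standard attributes. The helper name gpu_backend is hypothetical and is not part of PyTorch.

```python
import importlib.util


def gpu_backend(torch_module) -> str:
    """Classify which GPU backend a PyTorch build targets (hypothetical helper)."""
    # ROCm wheels set torch.version.hip to a version string; other builds leave it None.
    if getattr(torch_module.version, "hip", None):
        return "rocm"
    # CUDA wheels set torch.version.cuda instead.
    if getattr(torch_module.version, "cuda", None):
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed")
    else:
        import torch

        print("backend:", gpu_backend(torch))
        # On ROCm builds the AMD GPU is exposed through the torch.cuda API.
        print("GPU visible:", torch.cuda.is_available())
        if torch.cuda.is_available():
            print("device:", torch.cuda.get_device_name(0))
```

On a working setup like the one in this post, the backend should read "rocm" and the device name should refer to the Radeon card.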

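Once the installation was complete, I set up access on Hugging Face. Gemma2 is a gated model, so I visited the following model page, agreed to the terms of use, and then created an access token in my Hugging Face account settings:

https://huggingface.co/google/gemma-2-2b

Note that this page is for the 2B variant, while the code below loads google/gemma-2-9b. Make sure the terms have been accepted for whichever model you actually load.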

Next, I logged in using the token via Jupyter Notebook:

from huggingface_hub import login
login("your-token-here")  # replace with your own token; never share or commit it

With this, I was ready to access the model through Hugging Face.
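Pasting the token directly into a notebook makes it easy to leak when the notebook is shared. A safer pattern is to export the token as an environment variable before starting Jupyter and read it at runtime. This sketch assumes the variable is named HF_TOKEN, which is the name huggingface_hub itself recognizes. The helper get_hf_token is hypothetical.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the Hugging Face token from HF_TOKEN (hypothetical helper)."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before starting Jupyter.")
    return token
```

In the notebook, the login step then becomes `login(token=get_hf_token())`. Recent versions of huggingface_hub also pick up HF_TOKEN on their own, so the explicit login call may not be needed at all.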

Running the Model

With everything set up, I ran the following code to load the model and generate text:

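import torch
from transformers import pipeline

# Build a text-generation pipeline for the 9B Gemma2 model.
pipe = pipeline(
    "text-generation",
    model="google/gemma-2-9b",
    device="cuda",  # ROCm builds of PyTorch expose AMD GPUs under the "cuda" device name
    torch_dtype=torch.float16,  # half precision halves the memory needed for the weights
)

text = "Do you know about Blueqat Corporation?"
outputs = pipe(text, max_new_tokens=256)  # generate at most 256 new tokens
response = outputs[0]["generated_text"]  # by default this includes the prompt
print(response)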
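Loading the weights in float16 is what allows the 9B model to fit on a single RX7900XTX, which has 24 GB of VRAM. The following back-of-the-envelope check assumes about 9 billion parameters (a round figure; the real count is slightly higher) and counts only the weights. Activations and the KV cache need additional memory on top of this. The helper weights_gb is hypothetical.

```python
# Bytes per parameter for common PyTorch dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weights_gb(num_params: float, dtype: str) -> float:
    """Approximate memory taken by the model weights alone, in decimal GB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    vram_gb = 24  # RX7900XTX memory size
    for dtype in ("float32", "float16"):
        need = weights_gb(9e9, dtype)
        verdict = "fits" if need < vram_gb else "does not fit"
        print(f"{dtype}: about {need:.0f} GB of weights, {verdict} in {vram_gb} GB")
```

At roughly 36 GB, float32 weights would not fit on the card. Float16 needs about 18 GB, which leaves some headroom for generation.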

Execution Results

After running the code, I obtained the following output:

Input Text:

Do you know about Blueqat Corporation?

Generated Output:

Blueqat Corporation was founded in October 2015 with the concept of "Making life with cats more enjoyable." It develops and sells products that support life with cats.
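The output above shows only the model’s continuation. However, the text-generation pipeline’s "generated_text" field includes the prompt by default. Passing return_full_text=False in the pipe(...) call asks Transformers to omit it. As a plain-Python alternative, the hypothetical helper below strips the echoed prompt from an existing result.

```python
def continuation_only(prompt: str, generated_text: str) -> str:
    """Remove the echoed prompt from a text-generation result (hypothetical helper)."""
    # With the pipeline's default settings, the returned text starts with the prompt.
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # If the prompt is not echoed (e.g. return_full_text=False), return the text unchanged.
    return generated_text
```

In the notebook, `continuation_only(text, response)` would print just the generated answer.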

Generating the output took approximately 12 seconds with the 9B model in float16 and max_new_tokens=256. That is a relatively short time, reflecting the processing power of the AMD RX7900XTX.
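The post does not say how the 12 seconds were measured. One simple way to reproduce this kind of figure is to wall-clock the generation call with time.perf_counter. This sketch assumes that approach and excludes model loading, since the pipeline is built beforehand. The helper name timed is hypothetical.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

In the notebook, `outputs, seconds = timed(pipe, text, max_new_tokens=256)` would replace the original generation call. The first call after loading may be slower than later ones, so timing a second run gives a fairer number.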

Conclusion

This experiment showed that Google’s Gemma2 model runs smoothly on an AMD GPU. AMD GPUs do not support NVIDIA’s CUDA, but a ROCm build of PyTorch exposes them through the same torch.cuda interface. As a result, code written for "cuda" devices, such as the pipeline above, works largely unchanged. The output quality was also satisfactory, showing great potential for further applications.

© 2025, blueqat Inc. All rights reserved