Without NVIDIA GPUs, it's practically impossible to try out large language models with heavy parameter counts. Fortunately, I happened to have access to a machine with 8 GPUs connected via NVLink, so I used it for this run.
Reports of people running the 7B or 13B models are common, but reports on the 70B model are scarce here in Japan, perhaps because suitable hardware is hard to obtain. This time, our company had a machine equipped with 8 NVIDIA GPUs that was intended for quantum computing work, so we used it for this purpose.
The installation process was not particularly complicated. The drivers that were initially installed were a bit outdated, so I reinstalled them along with CUDA and then tried to run the model, but it didn't run smoothly. When I looked into it, I found an unfamiliar error: the machine was missing a library needed to control the NVLink connections between the 8 GPUs. After I installed that library, everything worked without any issues.
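For anyone who hits a similar problem, one quick check is whether the GPUs and their interconnect are visible to the driver at all. The following is only a minimal sketch, not what I actually ran: it assumes nvidia-smi is on the PATH, the helper name collect_gpu_diagnostics is made up for illustration, and it only reports status. It does not identify or install the missing library.

```python
import shutil
import subprocess

# Standard nvidia-smi diagnostic subcommands.
# The dictionary and helper names are hypothetical, chosen for this sketch.
DIAGNOSTICS = {
    "gpu list": ["nvidia-smi", "-L"],
    "topology matrix": ["nvidia-smi", "topo", "-m"],
    "nvlink status": ["nvidia-smi", "nvlink", "--status"],
}


def collect_gpu_diagnostics():
    """Run each diagnostic and return {label: text}.

    Returns an empty dict when nvidia-smi is not installed.
    """
    if shutil.which("nvidia-smi") is None:
        return {}
    results = {}
    for label, cmd in DIAGNOSTICS.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        # Keep stderr on failure so the error message is not lost.
        results[label] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results


if __name__ == "__main__":
    report = collect_gpu_diagnostics()
    if not report:
        print("nvidia-smi not found; is the driver installed?")
    for label, text in report.items():
        print(f"== {label} ==")
        print(text)
```

If the GPU list shows all 8 devices but the topology or NVLink output looks wrong, that points at the interconnect rather than the driver itself.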
I'm writing this up because the 70B model ran as-is, without any quantization or special tweaks. Let's actually feed it a prompt and see what it does.
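As a rough back-of-the-envelope check on why 8 GPUs are enough without quantization, the memory needed just for the weights can be estimated from the parameter count. This sketch assumes 16-bit weights (2 bytes per parameter), nominal parameter counts, and an even split across the GPUs. Those are my assumptions for illustration, and the estimate ignores activations and the KV cache.

```python
def weight_memory_gb(n_params, bytes_per_param=2):
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights; this is an assumption.
    """
    return n_params * bytes_per_param / 1e9


# Nominal parameter counts taken from the model names.
MODEL_PARAMS = {"7B": 7e9, "13B": 13e9, "70B": 70e9}
N_GPUS = 8  # the machine used here has 8 GPUs

for name, n_params in MODEL_PARAMS.items():
    total = weight_memory_gb(n_params)
    print(f"{name}: {total:.1f} GB total, {total / N_GPUS:.2f} GB per GPU")
```

Under these assumptions the 70B weights come to about 140 GB in total, or about 17.5 GB per GPU when spread over 8 devices.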
what is quantum computer?
>
A quantum computer is a computer that makes direct use of quantum-mechanical phenomena, such as superposition and entanglement, to perform operations on data. Quantum computers are different from binary digital electronic computers based on transistors. Whereas common digital computing requires that the data be encoded into binary
==================================
量子コンピュータとは何ですか?
>
Quantum computing is the area of study focused on the development of computer technology based on the principles of quantum theory. The quantum bit, or qubit, is the basic unit of quantum information. Qubits can exist in a state of superposition, which means that they can represent both 0 and 1
Even though I asked in Japanese, it answered in English, so I tried once more, this time explicitly asking it to answer in Japanese.
The result? No luck: it still answered in English.
量子コンピュータとは何ですか?日本語で答えてください
>
Quantum computers are the next generation of computers that use quantum mechanics to solve problems. They are expected to be much more powerful than traditional computers, as they can process information in a completely different way.
Quantum computers use quantum bits, or qubits, to store and process information. Qubits can
I also ran into an unfamiliar warning during the run.
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
So it seems that by default each process is limited to 1 thread. I wasn't sure what value would be best, so I experimented: I ran the same prompt several times, setting OMP_NUM_THREADS to a different value before each run.
OMP_NUM_THREADS = 1
Loaded in 76.69 seconds
OMP_NUM_THREADS = 2
Loaded in 111.08 seconds
OMP_NUM_THREADS = 4
Loaded in 73.54 seconds
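To be honest, no clear trend emerged. Two threads was the slowest, while 1 and 4 threads were about the same, and with only a single run per setting it's hard to tell a real effect from noise.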
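To make a comparison like this more systematic, each setting could be run several times with the spread compared between settings. The following is a minimal sketch and not the procedure I used: the function names are hypothetical, the command in the example is a placeholder to be replaced with the real launch command, and it measures the wall-clock time of the whole run rather than only the "Loaded in" figure.

```python
import os
import subprocess
import sys
import time


def time_with_threads(cmd, n_threads):
    """Run cmd with OMP_NUM_THREADS set and return wall-clock seconds."""
    # Copy the current environment and override only OMP_NUM_THREADS.
    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def sweep(cmd, thread_counts=(1, 2, 4), repeats=3):
    """Return {n_threads: [seconds, ...]} so run-to-run noise is visible."""
    return {
        n: [time_with_threads(cmd, n) for _ in range(repeats)]
        for n in thread_counts
    }


if __name__ == "__main__":
    # Placeholder command that just echoes the variable it was given;
    # replace it with the actual model launch command.
    placeholder = [
        sys.executable,
        "-c",
        "import os; print(os.environ['OMP_NUM_THREADS'])",
    ]
    for n, times in sweep(placeholder).items():
        print(n, [f"{t:.2f}" for t in times])
```

If the times for one setting vary as much as the differences between settings, the thread count is probably not what dominates the load time.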
For now, the 70B model worked without any adjustments. I have some concerns about how well the 7B model performs, so I'm considering using the 13B or 70B models instead. Next, I'd like to try out Code Llama.