ChatGPT is based on the GPT model, which uses the decoder part of the Transformer architecture.
The Transformer architecture has an encoder and a decoder component; GPT uses only the decoder, in autoregressive form, which means it is optimized to predict the next token (word) in a sequence.
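A minimal sketch of what "autoregressive" means in practice: each generated token is appended to the sequence and fed back as input for the next prediction. This toy uses bigram counts as a stand-in for a learned model; it is an illustration of the decoding loop, not of GPT itself.

```python
from collections import defaultdict

# Toy corpus; a real model learns from vastly more text.
corpus = "the cat sat on the mat the cat ran".split()

# Bigram frequencies stand in for learned next-token probabilities.
bigrams = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(token):
    """Greedy prediction: pick the most frequent successor token."""
    candidates = bigrams[token]
    return max(candidates, key=candidates.get)

# Autoregressive generation: each output token conditions the next step.
sequence = ["the"]
for _ in range(3):
    sequence.append(predict_next(sequence[-1]))
print(" ".join(sequence))
```

GPT does the same thing at scale: the decoder scores every token in the vocabulary given the sequence so far, and the chosen token is fed back in.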
Optimizing only for next-token prediction causes unintended behaviors: GPT-3 often made up facts, generated biased text, or failed to follow the user's intentions.
This is one of the key areas that ChatGPT fixed.
Researchers at OpenAI replaced the pure autoregressive objective with a new one: follow the user's instructions helpfully and safely. To do this, they used Reinforcement Learning from Human Feedback (RLHF).
In RLHF, human labelers compare multiple answers to the same prompt and rank them by quality. These rankings are used to train a reward model, whose score is then used to optimize the GPT model with a reinforcement learning algorithm.
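A hedged sketch of how those human rankings become a training signal: the reward model is typically trained with a pairwise ranking loss, so that the answer the human preferred receives a higher scalar reward than the rejected one. The function names below are illustrative placeholders, not OpenAI's actual code.

```python
import math

def pairwise_ranking_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected).

    The loss is small when the reward model already scores the
    human-preferred answer higher, and large when it ranks the
    rejected answer above the preferred one.
    """
    diff = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

# Model agrees with the human ranking -> low loss:
low = pairwise_ranking_loss(2.0, 0.5)
# Model disagrees with the human ranking -> high loss:
high = pairwise_ranking_loss(0.5, 2.0)
print(low, high)
```

Once trained, the reward model scores any candidate answer, and that scalar score is what the reinforcement learning step pushes the language model to maximize.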
ChatGPT was fine-tuned from a model in the GPT-3.5 series, trained on Azure supercomputing infrastructure. GPT-3.5 is an evolution of GPT-3 that finished training in early 2022.
- ChatGPT: chat.openai.com/chat
- Blog: openai.com/blog/chatgpt
- Paper on the precursor method, InstructGPT: arxiv.org/abs/2203.02155
- Information on GPT-3.5: beta.openai.com/docs/model-index-for-researchers
© 2023, blueqat Inc. All rights reserved