Generative Pre-trained Transformer (GPT)
Concept
- GPT-1 (125M params): Autoregressively pre-train a multi-layer Transformer decoder, and fine-tune on supervised data for downstream tasks.
- GPT-2 (1.5B params): Scale up GPT-1 and utilize a larger dataset: WebText (in-house).
- GPT-3 (175B params): Scale up GPT-2 and utilize more datasets.
- Used datasets: Common Crawl (filtered), WebText2 (in-house), Books1, Books2, and Wikipedia.
- Propose in-context learning to perform zero-shot, one-shot[^1], and few-shot[^1] learning without fine-tuning.
- Uses GPT2TokenizerFast, which is based on byte-pair encoding (BPE); see the tokenizer sketch below.
- Costs 3,640[^2] petaflop/s-days (pfs-days) to train (~20 years on a single 8xV100 machine).
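As a concrete illustration of the in-context learning and tokenizer bullets above, here is a minimal sketch (assuming the Hugging Face `transformers` package) that loads `GPT2TokenizerFast`, the byte-level BPE vocabulary GPT-3 builds on, and assembles a few-shot prompt by simply prepending demonstrations; no parameters are updated. The translation examples are taken from the few-shot figure in the GPT-3 paper.

```python
# pip install transformers
from transformers import GPT2TokenizerFast

# Byte-level BPE tokenizer from GPT-2; GPT-3 reuses essentially the same vocabulary.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# "Few-shot" in the GPT-3 sense: demonstrations are prepended to the prompt and the
# model is only conditioned on them at inference time; there is no fine-tuning.
few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"        # demonstration 1
    "peppermint => menthe poivrée\n"      # demonstration 2
    "cheese =>"                           # the actual query
)

ids = tokenizer.encode(few_shot_prompt)
print(len(ids), "BPE tokens")
print(tokenizer.convert_ids_to_tokens(ids)[:8])  # inspect the first few byte-level tokens
```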
- Codex (12B params): Fine-tune GPT-3 on code datasets.
- Code Fine-tuning on 159 GB of Python code collected and filtered from GitHub.
- Add an additional set of tokens for representing whitespace runs of different lengths (based on GPT-3's tokenizer); see the sketch below.
- (Codex-S) Supervised Fine-tuning on standalone functions curated from competitive programming websites, interview preparation websites, and GitHub repos/PyPI packages with continuous integration (CI).
- Applications such as GitHub Copilot use a distinct (production) version of Codex.
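To make the whitespace-token bullet above concrete, the sketch below uses `tokenizer.add_tokens` to bolt tokens for runs of spaces onto the stock GPT-2 tokenizer and compares the token count of an indented snippet before and after. This only illustrates the idea; the run lengths and the mechanism are assumptions, not Codex's actual tokenizer.

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

snippet = "def scale(x):\n        return x * 2\n"   # 8-space indentation

before = len(tokenizer.encode(snippet))

# Illustrative only: dedicate one token to each whitespace run length from 2 to 8
# spaces (the Codex paper adds such tokens on top of GPT-3's tokenizer; the exact
# set of lengths used here is an assumption).
tokenizer.add_tokens([" " * n for n in range(2, 9)])

after = len(tokenizer.encode(snippet))
print(f"tokens before: {before}, after adding whitespace tokens: {after}")
```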
- InstructGPT (175B params): Make GPT-3 more aligned with user instructions via Reinforcement Learning from Human Feedback (RLHF):
- (Step 1) Supervised Fine-Tuning (SFT) (175B params): Train a supervised policy (i.e., fine-tune GPT-3) on collected demonstration data.
- Sample prompts from dataset, and ask labelers to compose the desired outputs.
- (Step 2) Reward Modeling (RM) (6B params): Train a reward model on collected comparison data.
- Sample prompts & outputs, and ask labelers to rank the outputs.
- Weights are initialized from (a fine-tuned variant of) the GPT-3 model.
- (Step 3) Reinforcement Learning (RL): Train a policy against the reward model with reinforcement learning (PPO in Actor-Critic style, bandit environment). A sketch of the RM loss and the KL-penalized reward follows this list.
- The policy is defined as \(P(\mathrm{action}|\mathrm{state})=P(\mathrm{response}|\mathrm{prompt})\)
- Policy network (175B params) weights are initialized from (a fine-tuned variant of) the GPT-3 model.
- Value network (6B params) weights are initialized from the Reward Model (RM).
- Hired ~40 contractors (with an onboarding process) to compose the demonstration and comparison data.[^3]
See this image in the official blog post for more information.
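A minimal PyTorch sketch of the two training signals in Steps 2 and 3: the reward model's pairwise ranking loss over labeler comparisons, and the KL-penalized reward that the PPO policy is trained against. Function names, shapes, and the beta value are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

# Step 2: Reward Model (RM) pairwise ranking loss.
# For a prompt x with a preferred response y_w and a dispreferred response y_l,
# the RM is trained so that r(x, y_w) > r(x, y_l):
#     loss = -log sigmoid(r(x, y_w) - r(x, y_l))
def rm_pairwise_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # reward_chosen / reward_rejected: shape (batch,), scalar rewards from the RM head
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Step 3: reward used in the PPO "bandit" environment.
# The whole response is one action; the reward is the RM score minus a penalty
# for drifting away from the SFT policy (beta is a hyperparameter, value assumed).
def rlhf_reward(rm_score: torch.Tensor,
                logprob_policy: torch.Tensor,
                logprob_sft: torch.Tensor,
                beta: float = 0.02) -> torch.Tensor:
    # logprob_*: summed log-probability of the sampled response under each model,
    # so their difference is a per-sample estimate of log(pi / pi_sft).
    kl_term = logprob_policy - logprob_sft
    return rm_score - beta * kl_term

# Toy usage with random numbers (no real models involved):
chosen, rejected = torch.randn(4), torch.randn(4)
print("RM loss:", rm_pairwise_loss(chosen, rejected).item())
print("shaped reward:", rlhf_reward(torch.tensor(1.3), torch.tensor(-42.0), torch.tensor(-40.0)).item())
```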
- GPT-3.5: GPT-3 and InstructGPT trained on a blend of text and code.
- ChatGPT: Use the same methods as InstructGPT to improve GPT-3.5, but collect supervised data in chat format instead (conversations between a user and an AI assistant). A sketch of such chat-format data follows this list.
ChatGPT Training Process Overview, from OpenAI Blog.
- Modifies Step 1 in InstructGPT: Collect demonstration data of two roles (user and AI assistant) and ask labelers to compose the desired outputs for each role.
- ChatGPT Plugins: Support interaction with third-party services.
- GPT-4: Further improves ChatGPT; the technical report discloses no details about the architecture, model size, or training data. GPT-4 powers the current ChatGPT Plus.
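Since the main change in ChatGPT is the format of the supervised data, here is a small sketch of what one chat-format demonstration might look like and how it could be flattened into a single training string. The role tags and separators are assumptions for illustration; OpenAI has not published the exact format.

```python
# One demonstration conversation as labelers might compose it: they play both
# roles, writing the user turns and the desired AI-assistant replies.
demonstration = [
    {"role": "user", "content": "My code raises 'IndexError: list index out of range'. Why?"},
    {"role": "assistant", "content": "That error means you indexed past the end of a list. "
                                     "Could you share the line that raises it?"},
    {"role": "user", "content": "items = []; print(items[0])"},
    {"role": "assistant", "content": "items is empty, so items[0] does not exist. "
                                     "Check len(items) before indexing, or handle the empty case."},
]

# Flatten the conversation into one training string for next-token prediction.
# The <|...|> tags below are illustrative placeholders, not OpenAI's actual tokens.
def to_training_text(messages):
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>\n" for m in messages)

print(to_training_text(demonstration))
```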
Some GPT variants are not included in this post (e.g., WebGPT, Image GPT, Visual ChatGPT), but may be included in the future.
Official Resources
- (GPT-1) Improving Language Understanding by Generative Pre-Training [paper][code][blog] (citations: 5290, 5082, as of 2023-04-28)
- (GPT-2) Language Models are Unsupervised Multitask Learners [paper][code][blog] (citations: 5800, 9390, as of 2023-04-28)
- (GPT-3) [NeurIPS 2020] Language Models are Few-Shot Learners [arxiv][paper][blog][demo] (citations: 9471, 9528, as of 2023-04-28)
- (Codex) Evaluating Large Language Models Trained on Code [arxiv] (citations: 525, 740, as of 2023-04-28)
- (InstructGPT) [NeurIPS 2022] Training language models to follow instructions with human feedback [arxiv][paper][paper][blog] (citations: 565, 730, as of 2023-04-28)
- (ChatGPT) Optimizing Language Models for Dialogue [blog][demo]
- (ChatGPT Plugins) [blog]
- (GPT-4) GPT-4 Technical Report [arxiv][paper][link]
- GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models [arxiv]
Community Resources
Obsolete Contents
Interesting Posts Regarding ChatGPT.
- Building A Virtual Machine inside ChatGPT by Jonas Degrave
  Ask ChatGPT to act as a Linux terminal, use the (imagined) terminal to browse the ChatGPT website (with a curl POST), and ask the (imagined) ChatGPT website to act as a Linux terminal.
- Bypassing OpenAI's ChatGPT alignment efforts by Miguel Piedrafita
  ChatGPT blocks (direct) unethical questions, but does not block unethical questions phrased as a film script.
- ChatGPT passed Problem ABC280D on AtCoder under human coaching by Lily
  ChatGPT passed a non-trivial competitive programming problem under human coaching, which resembles the coding interview process for SWE roles. The human coach and ChatGPT resemble the interviewer and interviewee, respectively.
- Ask ChatGPT to debug code and explain the fix by Amjad Masad
- Ask ChatGPT to reverse engineer a snippet of x86 assembly (link missing)
  It correctly identified the encryption algorithm and explained it.
- Generating AI art prompts by Guy Parsons
[^1]: Note that the terms one-shot and few-shot here differ from previous literature: they do not involve fine-tuning. Instead, the examples are prepended to the prompt. It may be better to call this few-shot prompting rather than few-shot learning to avoid ambiguity.
[^2]: GPT-3 175B costs 3.64E+03 PF-days, from Table D.1 in the GPT-3 paper.
[^3]: The sum of collected data from labelers/customers and for training/validation, from Table 6 in the InstructGPT paper.