Let's reproduce GPT-2 (124M)
Andrej Karpathy,
YouTube,
Jun 19, 2024
I am not likely to ever find the time to work through this exercise - it's a four-hour video demonstrating every step involved to reproduce GPT-2. "We reproduce the GPT-2 (124M) from scratch," writes Andrej Karpathy. "This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 paper and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusing model generations." What I really appreciate about this is that it exists. As one commenter says, "Andrej is doing himself what OpenAi was supposed to do in the early days — make AI open." Expand the video's description for a full table of contents.