New LLM Pre-training and Post-training Paradigms

Hundreds of LLM papers appear each month, proposing new techniques and approaches. However, one of the best ways to see what actually works well in practice is to look at the pre-training and post-training pipelines of the most recent state-of-the-art models. Luckily, four major new LLMs have been released in recent months, accompanied by relatively detailed technical reports. In this article, I focus on the pre-training and post-training pipelines of the following models: Alibaba’s Qwen 2, Apple Intelligence Foundation Language Models, Google’s Gemma 2, Meta AI’s Llama …