How Good Are the Latest Open LLMs? And Is DPO Better Than PPO?

What a month! We had four major open LLM releases: Mixtral, Meta AI’s Llama 3, Microsoft’s Phi-3, and Apple’s OpenELM. In my new article, I review and discuss all four of these transformer-based LLMs, followed by new research on reinforcement learning from human feedback (RLHF) methods for instruction finetuning using PPO and DPO …