All tags
Person: "jim-fan"
o1: OpenAI's new general reasoning models
o1 o1-preview o1-mini gpt-4o llama openai nvidia test-time-reasoning reasoning-tokens token-limit competitive-programming benchmarking scaling-laws ai-chip-competition inference training model-performance jason-wei jim-fan
OpenAI has released the o1 model family, including o1-preview and o1-mini, focusing on test-time reasoning with extended output token limits over 30k tokens. The models show strong performance, ranking in the 89th percentile on competitive programming, excelling in USA Math Olympiad qualifiers, and surpassing PhD-level accuracy on physics, biology, and chemistry benchmarks. Notably, o1-mini performs impressively despite its smaller size compared to gpt-4o. The release highlights new scaling laws for test-time compute that scale loglinearly. Additionally, Nvidia is reportedly losing AI chip market share to startups, with a shift in developer preference from CUDA to llama models for web development, though Nvidia remains dominant in training. This news reflects significant advances in reasoning-focused models and shifts in AI hardware competition.
Evals: The Next Generation
gpt-4 gpt-5 gpt-3.5 phi-3 mistral-7b llama-3 scale-ai mistral-ai reka-ai openai moderna sanctuary-ai microsoft mit meta-ai-fair benchmarking data-contamination multimodality fine-tuning ai-regulation ai-safety ai-weapons neural-networks model-architecture model-training model-performance robotics activation-functions long-context sam-altman jim-fan
Scale AI highlighted issues with data contamination in benchmarks like MMLU and GSM8K, proposing a new benchmark where Mistral overfits and Phi-3 performs well. Reka released the VibeEval benchmark for multimodal models addressing multiple choice benchmark limitations. Sam Altman of OpenAI discussed GPT-4 as "dumb" and hinted at GPT-5 with AI agents as a major breakthrough. Researchers jailbroke GPT-3.5 via fine-tuning. Global calls emerged to ban AI-powered weapons, with US officials urging human control over nuclear arms. Ukraine launched an AI consular avatar, while Moderna partnered with OpenAI for medical AI advancements. Sanctuary AI and Microsoft collaborate on AI for general-purpose robots. MIT introduced Kolmogorov-Arnold networks with improved neural network efficiency. Meta AI is training Llama 3 models with over 400 billion parameters, featuring multimodality and longer context.