Model: "gpt-4.1-nano"
Execuhires Round 2: Scale-Meta, Lamini-AMD, and Instacart-OpenAI
o3-pro o3 o1-pro gpt-4o gpt-4.1 gpt-4.1-mini gpt-4.1-nano meta-ai-fair scale-ai lamini amd openai gemini google anthropic model-release benchmarking reasoning fine-tuning pricing model-performance direct-preference-optimization complex-problem-solving alexandr_wang sharon_zhou fidji_simo sama jack_rae markchen90 kevinweil gdb gregkamradt lechmazur wesrothmoney paul_cal imjaredz cto_junior johnowhitaker polynoamial scaling01
Meta hires Scale AI's Alexandr Wang to lead its new "Superintelligence" division following a $15 billion investment for a 49% stake in Scale. Lamini's Sharon Zhou joins AMD as VP of AI under Lisa Su, while Instacart's Fidji Simo becomes CEO of Apps at OpenAI under Sam Altman. Meta is offering compensation packages above $10 million/year to top researchers and has recruited Jack Rae from Gemini. OpenAI releases the o3-pro model to ChatGPT Pro users and the API; it outperforms o3 and sets new records on benchmarks such as Extended NYT Connections and SnakeBench. Despite being slower than o1-pro, o3-pro excels at reasoning and complex problem-solving. OpenAI cuts o3 pricing by 80%, making it cheaper than GPT-4o and pressuring competitors like Google and Anthropic to lower prices. Users can now fine-tune the GPT-4.1 family using direct preference optimization (DPO) for subjective tasks.
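DPO fine-tuning, as mentioned above, trains on preference pairs rather than single target completions. A minimal sketch of one training record in the preference JSONL format OpenAI's fine-tuning API documents for DPO; the field names follow the published format but should be treated as an assumption and checked against the current API reference:

```python
import json

# One DPO training record: a prompt plus a preferred and a
# non-preferred completion. The model learns to favor the
# preferred output for subjective tasks (tone, style, etc.).
record = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize this release note in one sentence."}
        ]
    },
    "preferred_output": [
        {"role": "assistant",
         "content": "OpenAI cut o3 pricing by 80% and opened DPO fine-tuning for the GPT-4.1 family."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "Stuff got cheaper."}
    ],
}

# Records are uploaded as JSONL: one JSON object per line.
line = json.dumps(record)
print(line[:50])
```

The resulting file is uploaded and referenced when creating a fine-tuning job with the DPO method selected.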
not much happened today
hunyuan-turbos qwen3-235b-a22b o3 gpt-4.1-nano grok-3 gemini-2.5-pro seed1.5-vl kling-2.0 tencent openai bytedance meta-ai-fair nvidia deepseek benchmarking model-performance moe reasoning vision video-understanding vision-language multimodality model-evaluation model-optimization lmarena_ai artificialanlys gdb _jasonwei iScienceLuvr _akhaliq _philschmid teortaxesTex mervenoyann reach_vb
Tencent's Hunyuan-Turbos has risen to #8 on the LMArena leaderboard, showing strong performance across major categories and significant improvement since February. The Qwen3 model family, especially the Qwen3 235B-A22B (Reasoning) model, is noted for its intelligence and efficient parameter usage. OpenAI introduced HealthBench, a new health evaluation benchmark developed with input from over 250 physicians, on which models like o3, GPT-4.1 nano, and Grok 3 showed strong results. ByteDance released Seed1.5-VL, a vision-language model with a 532M-parameter vision encoder and a 20B-active-parameter MoE LLM, achieving state-of-the-art results on 38 public benchmarks. In multimodal models, Kling 2.0 leads image-to-video generation, while Gemini 2.5 Pro excels at video understanding with advanced multimodal capabilities. Meta's Vision-Language-Action framework and updates on VLMs for 2025 were also highlighted.
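The "efficient parameter usage" praised above comes from the MoE design encoded in the model name: Qwen3-235B-A22B holds 235B total parameters but activates only ~22B per token. A quick back-of-envelope check of that ratio:

```python
# Figures taken from the model name: 235B total, "A22B" = 22B active.
total_params_b = 235   # total parameters, in billions
active_params_b = 22   # parameters active per token, in billions

# Fraction of weights exercised on any single forward pass.
active_fraction = active_params_b / total_params_b
print(f"Active per token: {active_fraction:.1%} of total weights")
# → roughly 9.4%: dense-22B-class compute cost with 235B-class capacity
```

This is why MoE models can score well per unit of inference cost: the benchmark-relevant capacity scales with total parameters, while latency and FLOPs scale with the active subset.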
SOTA Video Gen: Veo 2 and Kling 2 are GA for developers
veo-2 gemini gpt-4.1 gpt-4o gpt-4.5-preview gpt-4.1-mini gpt-4.1-nano google openai video-generation api coding instruction-following context-window performance benchmarks model-deprecation kevinweil stevenheidel aidan_clark_
Google's Veo 2 video generation model is now available in the Gemini API at a cost of 35 cents per second of generated video, marking a significant step toward accessible video generation. Meanwhile, China's Kling 2 model launched with pricing around $2 for a 10-second clip and a minimum subscription of $700 per month for 3 months, generating excitement despite requiring some prompting skill. OpenAI announced the GPT-4.1 family release, including GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, highlighting improvements in coding, instruction following, and a 1 million token context window. The GPT-4.1 models are 26% cheaper than GPT-4o and will replace the GPT-4.5 Preview API version by July 14. Performance benchmarks show GPT-4.1 achieving 54-55% on SWE-bench Verified and a 60% improvement over GPT-4o in some internal tests, though some critiques note it underperforms alternatives such as DeepSeek V3 on coding tasks routed through OpenRouter. The release is API-only, with a prompting guide provided for developers.
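The two pricing models quoted above are easy to compare per clip: Veo 2 bills per generated second, while Kling 2 is quoted per 10-second clip (on top of its minimum subscription). A small calculator using only the figures from the summary:

```python
# Prices as quoted in the summary above.
VEO2_PER_SECOND = 0.35       # Gemini API: $0.35 per second of generated video
KLING2_PER_10S_CLIP = 2.00   # Kling 2: ~$2 per 10-second clip (subscription extra)

def veo2_cost(seconds: float) -> float:
    """Marginal cost of a Veo 2 generation of the given length."""
    return seconds * VEO2_PER_SECOND

ten_sec_veo = veo2_cost(10)
print(f"10s clip: Veo 2 ${ten_sec_veo:.2f} vs Kling 2 ${KLING2_PER_10S_CLIP:.2f}")
# Veo 2's pay-as-you-go model wins for occasional use; Kling 2's per-clip
# price is lower, but only after committing to the minimum subscription.
```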
GPT 4.1: The New OpenAI Workhorse
gpt-4.1 gpt-4.1-mini gpt-4.1-nano gpt-4o gemini-2.5-pro openai llama-index perplexity-ai google-deepmind coding instruction-following long-context benchmarks model-pricing model-integration model-deprecation sama kevinweil omarsar0 aidan_mclau danhendrycks polynoamial scaling01 aravsrinivas lmarena_ai
OpenAI released GPT-4.1, along with GPT-4.1 mini and GPT-4.1 nano, highlighting improvements in coding, instruction following, and handling long contexts up to 1 million tokens. The model achieves a 54% score on SWE-bench Verified and shows a 60% improvement over GPT-4o on internal benchmarks. Pricing for GPT-4.1 nano is notably low at $0.10/1M input tokens and $0.40/1M output tokens. GPT-4.5 Preview is being deprecated in favor of GPT-4.1. Llama Index shipped day-0 integration support, while GPT-4.1 nano drew some negative early feedback. Additionally, Perplexity's Sonar API ties with Gemini-2.5 Pro for the top spot on the LM Search Arena leaderboard. New benchmarks like MRCR and GraphWalks were introduced alongside updated prompting guides and cookbooks.
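At the GPT-4.1 nano rates quoted above, per-call cost is a simple linear function of token counts. A sketch of the arithmetic, using only the prices from the summary:

```python
# GPT-4.1 nano prices from the summary: $0.10 per 1M input tokens,
# $0.40 per 1M output tokens.
INPUT_PER_M = 0.10
OUTPUT_PER_M = 0.40

def nano_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one GPT-4.1 nano call at the quoted rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A long-context call: the full 1M-token window in, 2k tokens out.
print(f"${nano_cost(1_000_000, 2_000):.4f}")
# Even a maximal-context request costs on the order of a dime,
# which is what makes nano attractive for high-volume workloads.
```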