Topic: "coding-performance"
Gemini 2.5 Pro/Flash GA, 2.5 Flash-Lite in Preview
gemini-2.5 gemini-2.5-flash-lite gemini-2.5-flash gemini-2.5-pro gemini-2.5-ultra kimi-dev-72b nanonets-ocr-s ii-medical-8b-1706 jan-nano deepseek-r1 minimax-m1 google moonshot-ai deepseek cognitivecompai kling-ai mixture-of-experts multimodality long-horizon-planning benchmarking coding-performance long-context ocr video-generation model-releases tulsee_doshi oriolvinyalsml demishassabis officiallogank _philschmid swyx sainingxie scaling01 gneubig clementdelangue mervenoyann
Gemini 2.5 Pro and Flash are now generally available, and the new Gemini 2.5 Flash-Lite is in preview. The models are sparse Mixture-of-Experts (MoE) transformers with native multimodal support. A 30-page tech report highlights long-horizon planning, as demonstrated by Gemini Plays Pokemon. The LiveCodeBench-Pro benchmark shows that frontier LLMs still struggle with hard coding problems, while Moonshot AI open-sourced Kimi-Dev-72B, which achieves state-of-the-art results on SWE-bench Verified. Smaller specialized models such as Nanonets-OCR-s, II-Medical-8B-1706, and Jan-nano show competitive performance, suggesting that bigger models are not always better. DeepSeek-R1 ties for #1 in WebDev Arena, and MiniMax-M1 sets new standards in long-context reasoning. Kling AI demonstrated video generation capabilities.
not much happened today
claude-3.5-sonnet claude-3.5-haiku o1-preview mochi-1 stable-diffusion-3.5 embed-3 kerashub differential-transformer anthropic openai cohere microsoft computer-use coding-performance video-generation fine-tuning multimodality transformers attention-mechanisms model-optimization alexalbert fchollet rasbt
Anthropic released an upgraded Claude 3.5 Sonnet and a new Claude 3.5 Haiku, along with a computer use capability that lets the model interact with computer interfaces via screenshots and actions such as mouse movement and typing. Claude 3.5 Sonnet achieved state-of-the-art coding performance on SWE-bench Verified with a 49% score, surpassing OpenAI's o1-preview. Anthropic is focusing on teaching general computer skills rather than building task-specific tools, and expects rapid improvements. Other releases include Mochi 1, an open-source video generation model; Stable Diffusion 3.5, with Large and Medium variants; and Embed 3 by Cohere, a multimodal embedding model for text and image search. François Chollet launched KerasHub, which unifies KerasNLP and KerasCV and ships with 37 pretrained models. Microsoft introduced the Differential Transformer, which reduces attention noise via differential attention maps, and rasbt shared research on transformer attention layers.