Model: "gpt-oss"
nano-banana is Gemini-2.5-Flash-Image, beating Flux Kontext by 170 Elo with SOTA Consistency, Editing, and Multi-Image Fusion
gemini-2.5-flash-image-preview hermes-4 nemotron-nano-9b-v2 internvl3.5 gpt-oss qwen3 deepseek-v3.1 google-deepmind nous-research nvidia openai ollama huggingface openrouter image-editing natural-language-processing multi-image-composition character-consistency reasoning hybrid-models context-windows model-steerability pretraining finetuning alignment vision vision-language api model-integration sundarpichai _philschmid lmarena_ai omarsar0 skirano yupp_ai xanderatallah officiallogank mervenoyann
Google DeepMind revealed Gemini-2.5-Flash-Image-Preview, a state-of-the-art image editing model excelling in character consistency, natural-language edits, and multi-image composition, dominating the Image Edit Arena with a ~170-180 Elo lead and over 2.5M votes. It is integrated into multiple platforms including Google AI Studio and third-party services. Nous Research released Hermes 4, an open-weight hybrid reasoning model focused on steerability and STEM benchmarks. NVIDIA launched Nemotron Nano 9B V2, a hybrid Mamba-Transformer with 128k context, top-performing under 10B parameters, and released a 6.6T-token pretraining subset. InternVL3.5 introduced 32 vision-language models based on OpenAI's gpt-oss and Qwen3 backbones. Ollama v0.11.7 added DeepSeek v3.1 support with hybrid thinking and Turbo mode preview.
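An Elo lead translates directly into an expected head-to-head win rate: under the standard Elo formula, a ~170-point gap means the leading model is preferred in roughly 73% of pairwise votes. A minimal sketch (the helper name is illustrative, not an LMArena API):

```python
def elo_win_probability(elo_diff: float) -> float:
    """Expected win rate for the higher-rated model, given its Elo lead."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

print(round(elo_win_probability(170), 3))  # 0.727
```

At a 180-point lead the expected win rate rises to about 74%, so the quoted ~170-180 range pins the preference rate near three wins out of four.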
OpenAI's gpt-oss 20B and 120B, Claude Opus 4.1, DeepMind Genie 3
gpt-oss-120b gpt-oss-20b gpt-oss claude-4.1-opus claude-4.1 genie-3 openai anthropic google-deepmind mixture-of-experts model-architecture agentic-ai model-training model-performance reasoning hallucination-detection gpu-optimization open-weight-models realtime-simulation sama rasbt sebastienbubeck polynoamial kaicathyc finbarrtimbers vikhyatk scaling01 teortaxestex
OpenAI released the gpt-oss family, including gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2, designed for agentic tasks and licensed under Apache 2.0. The models use a Mixture-of-Experts (MoE) architecture that favors width over depth, along with less common design choices such as bias units in attention and a SwiGLU variant. The 120B model was trained with roughly 2.1 million H100 GPU hours. Meanwhile, Anthropic launched claude-4.1-opus, touted as the best coding model currently available, and DeepMind showcased genie-3, a realtime world-simulation model with minute-long consistency. Together, the releases highlight advances in open-weight models, reasoning capabilities, and world simulation. Commentators including @sama, @rasbt, and @SebastienBubeck provided technical insights and performance evaluations, noting both strengths and hallucination risks.
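For reference, the standard SwiGLU feed-forward block that gpt-oss reportedly modifies looks like the following. This is a minimal NumPy sketch of the common formulation only; the exact gpt-oss variant differs in details not specified here, and the dimensions are toy values:

```python
import numpy as np

def swiglu_ffn(x, w_gate, w_up, w_down):
    """Standard SwiGLU feed-forward block: a SiLU-gated linear unit."""
    gate = x @ w_gate                     # gating projection
    up = x @ w_up                         # value projection
    silu = gate / (1.0 + np.exp(-gate))   # SiLU(gate) = gate * sigmoid(gate)
    return (silu * up) @ w_down           # project back to model dimension

# Toy dimensions: model dim 4, hidden dim 8, batch of 2 tokens.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))
w_gate = rng.standard_normal((4, 8))
w_up = rng.standard_normal((4, 8))
w_down = rng.standard_normal((8, 4))
print(swiglu_ffn(x, w_gate, w_up, w_down).shape)  # (2, 4)
```

In a MoE layer, each expert is a feed-forward block of this shape, with a router selecting which experts process each token.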
Gemini 2.5 Deep Think finally ships
gemini-2.5-deep-think gpt-oss gpt-5 kimi-k2-turbo-preview qwen3-coder-flash glm-4.5 step-3 claude openai anthropic google-deepmind kimi-moonshot alibaba ollama zhipu-ai stepfun parallel-thinking model-releases moe attention-mechanisms multimodal-reasoning model-performance context-windows open-source-models model-leaks creative-ai coding reasoning model-optimization demishassabis philschmid scaling01 teortaxestex teknium1 lmarena_ai andrewyng
OpenAI is rumored to soon launch new GPT-OSS and GPT-5 models amid a dispute with Anthropic, which revoked OpenAI's access to Claude. Google DeepMind quietly launched Gemini 2.5 Deep Think, a model optimized for parallel thinking that achieved gold-medal level at the IMO and excels in reasoning, coding, and creative tasks. Leaks suggest OpenAI is developing a 120B MoE model and a 20B model with advanced attention mechanisms. Chinese AI companies such as Kimi Moonshot, Alibaba, and Zhipu AI are releasing faster and more capable open models, including kimi-k2-turbo-preview, Qwen3-Coder-Flash, and GLM-4.5, signaling strong momentum and the potential to surpass the U.S. in AI development. "The final checkpoint was selected just 5 hours before the IMO problems were released," highlighting rapid development cycles.