Topic: "multimodality"

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

Google I/O 2026: Gemini 3.5 Flash, Omni, and Google’s Agent Stack

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

Context Graphs: Hype or actually Trillion-dollar opportunity?

not much happened today

Moonshot Kimi K2.5 - Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager

not much happened today

Meta Superintelligence Labs acquires Manus AI for over $2B, at $100M ARR, 9 months after launch

not much happened today

Claude Skills grows: Open Standard, Directory, Org Admin

Gemini 3.0 Flash Preview: 1/4 cost of Pro, but ~as smart, retakes Pareto Frontier

not much happened today

not much happened today

not much happened today

Mistral 3: Mistral Large 3 + Ministral 3B/8B/14B open weights models

Gemini 3 Pro — new GDM frontier model 6, Gemini 3 Deep Think, and Antigravity IDE

GPT 5.1 in ChatGPT: No evals, but adaptive thinking and instruction following

not much happened today

DeepSeek-OCR finds vision models can decode 10x more efficiently with ~97% accuracy of text-only, 33/200k pages/day/A100

not much happened today

OpenAI Dev Day: Apps SDK, AgentKit, Codex GA, GPT‑5 Pro and Sora 2 APIs

not much happened today

Alibaba Yunqi: 7 models released in 4 days (Qwen3-Max, Qwen3-Omni, Qwen3-VL) and $52B roadmap

Grok 4 Fast: xAI's distilled, 40% more token efficient, 2M context, 344 tok/s frontier model

Softbank, NVIDIA and US Govt take 2%, 5% and 10% of Intel, will develop Intel x86 RTX SOCs for consumer & datacenters

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

not much happened today

SmolLM3: the SOTA 3B reasoning open source LLM

not much happened today

not much happened today

OpenAI releases Deep Research API (o3/o4-mini)

Zuck goes Superintelligence Founder Mode: $100M bonuses + $100M+ salaries + NFDG Buyout?

Gemini 2.5 Pro/Flash GA, 2.5 Flash-Lite in Preview

not much happened today

not much happened today

not much happened today

not much happened today

OpenAI buys Jony Ive's io for $6.5b, LMArena lands $100m seed from a16z

not much happened today

ChatGPT Codex, OpenAI's first cloud SWE agent

codex-1 openai-o3 codex-mini gemma-3 blip3-o qwen-2.5 marigold-iid deepseek-v3 lightlab gemini-2.0 lumina-next openai runway salesforce qwen deepseek google google-deepmind j1 software-engineering parallel-processing multimodality diffusion-models depth-estimation scaling-laws reinforcement-learning fine-tuning model-performance multi-turn-conversation reasoning audio-processing sama kevinweil omarsar0 iscienceluvr akhaliq osanseviero c_valenzuelab mervenoyann arankomatsuzaki jasonwei demishassabis philschmid swyx teortaxestex jaseweston

OpenAI launched Codex, a cloud-based software engineering agent powered by codex-1 (an optimized version of OpenAI o3). It is available in research preview for Pro, Enterprise, and Team ChatGPT users and runs tasks such as refactoring and bug fixing in parallel. The Codex CLI was enhanced with quick sign-in and a new low-latency model, codex-mini. Gemma 3 is highlighted as the best open model runnable on a single GPU. Runway released the Gen-4 References API for style transfer in generation. Salesforce introduced BLIP3-o, a unified multimodal model family using diffusion transformers for CLIP image features. The Qwen 2.5 models (1.5B and 3B versions) were integrated into the PocketPal app with various chat templates. Marigold IID, a new state-of-the-art open-source depth estimation model, was released. In research, DeepSeek shared insights on scaling and hardware for DeepSeek-V3. Google unveiled LightLab, a diffusion-based method for controlling light sources in images. Google DeepMind's AlphaEvolve uses Gemini 2.0 to discover new math and reduce costs without reinforcement learning. Omni-R1 studied audio's role in fine-tuning audio LLMs. Qwen proposed a parallel scaling law inspired by classifier-free guidance. Salesforce released Lumina-Next on the Qwen base, outperforming Janus-Pro. A study found that LLM performance degrades in multi-turn conversations due to unreliability. J1 incentivizes thinking in LLM-as-a-Judge models via reinforcement learning. A new Qwen study correlates question and strategy similarity to predict reasoning strategies.

not much happened today

Prime Intellect's INTELLECT-2 and PRIME-RL advance distributed reinforcement learning

Gemini 2.5 Pro Preview 05-06 (I/O edition) - the SOTA vision+coding model

gpt-image-1 - ChatGPT's imagegen model, confusingly NOT 4o, now available in API

not much happened today

not much happened today; New email provider for AINews

Gemini 2.5 Flash completes the total domination of the Pareto Frontier

OpenAI o3, o4-mini, and Codex CLI

not much happened today

Google's Agent2Agent Protocol (A2A)

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level

Llama 4's Controversial Weekend Release

not much happened today

>$41B raised today (OpenAI @ 300b, Cursor @ 9.5b, Etched @ 1.5b)

OpenAI adopts MCP

Gemini 2.5 Pro + 4o Native Image Gen

Every 7 Months: The Moore's Law for Agent Autonomy

not much happened today

Cohere's Command A claims #3 open model spot (after DeepSeek and Gemma)

Gemma 3 beats DeepSeek V3 in Elo, 2.0 Flash beats GPT4o with Native Image Gen

The new OpenAI Agents Platform

DeepSeek's Open Source Stack

not much happened today

not much happened today

GPT 4.5 — Chonky Orion ships!

not much happened today

The Ultra-Scale Playbook: Training LLMs on GPU Clusters

X.ai Grok 3 and Mira Murati's Thinking Machines

LLaDA: Large Language Diffusion Models

Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking

not much happened today

DeepSeek #1 on US App Store, Nvidia stock tanks -17%

OpenAI launches Operator, its first Agent

Bespoke-Stratos + Sky-T1: The Vicuna+Alpaca moment for reasoning

small little news items

Moondream 2025.1.9: Structured Text, Enhanced OCR, Gaze Detection in a 2B Model

not much happened to end the year

not much happened today

OpenAI Voice Mode Can See Now - After Gemini Does

Meta BLT: Tokenizer-free, Byte-level LLM

Google wakes up: Gemini 2.0 et al

$200 ChatGPT Pro and o1-full/pro, with vision, without API, and mixed reviews

not much happened today

Olympus has dropped (aka, Amazon Nova Micro|Lite|Pro|Premier|Canvas|Reel)

not much happened to end the week

Vision Everywhere: Apple AIMv2 and Jina CLIP v2

Pixtral Large (124B) beats Llama 3.2 90B with updated Mistral Large 24.11

Common Corpus: 2T Open Tokens with Provenance

Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data

OpenAI beats Anthropic to releasing Speculative Decoding

not much happened today

not much happened today

not much happened today

DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality

not much happened today

State of AI 2024

The AI Nobel Prize

not much happened this weekend

Liquid Foundation Models: A New Transformers alternative + AINews Pod 2

not much happened today

not much happened today

Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)

not much happened today

o1 destroys Lmsys Arena, Qwen 2.5, Kyutai Moshi release

a quiet weekend

Pixtral 12B: Mistral beats Llama to Multimodality

not much happened today

Ideogram 2 + Berkeley Function Calling Leaderboard V2

not much happened today

not much happened today

How Carlini Uses AI

Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o version)

Nothing much happened today

Problems with MMLU-Pro

Gemma 2: The Open Model for Everyone

Is this... OpenQ*?

Hybrid SSM/Transformers > Pure SSMs/Pure Transformers

The Last Hurrah of Stable Diffusion?

Francois Chollet launches $1m ARC Prize

Mamba-2: State Space Duality

1 TRILLION token context, real time, on device?

ALL of AI Engineering in One Place

Chameleon: Meta's (unreleased) GPT4o-like Omnimodal Model

Cursor reaches >1000 tok/s finetuning Llama3-70b for fast file editing

Not much happened today

Google I/O in 60 seconds

GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version)

GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4O version)

OpenAI's PR Campaign?

DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost

Evals: The Next Generation

Perplexity, the newest AI unicorn

Lilian Weng on Video Diffusion

Multi-modal, Multi-Aspect, Multi-Form-Factor AI

Music's Dall-E moment

not much happened today

MM1: Apple's first Large Multimodal Model

DeepMind SIMA: one AI, 9 games, 600 tasks, vision+language ONLY

Not much happened today

Stable Diffusion 3 — Rombach & Esser did it again!

Claude 3 just destroyed GPT 4 (see for yourself)

The Era of 1-bit LLMs

Sora pushes SOTA

CodeLlama 70B beats GPT4 on HumanEval

codellama miqu mistral-medium llama-2-70b aphrodite-engine mixtral flatdolphinmaid noromaid rpcal chatml mistral-7b activation-beacon eagle-7b rwkv-v5 openhermes2.5 nous-hermes-2-mixtral-8x7b-dpo imp-v1-3b bakllava moondream qwen-vl meta-ai-fair ollama nous-research mistral-ai hugging-face ai-ethics alignment gpu-optimization direct-prompt-optimization fine-tuning cuda-programming optimizer-technology quantization multimodality context-length dense-retrieval retrieval-augmented-generation multilinguality model-performance open-source code-generation classification vision

Meta AI surprised the community with the release of CodeLlama, an open-source model now available on platforms like Ollama and MLX for local use. The Miqu model sparked debate over its origins, which may trace to Mistral Medium or a fine-tuned Llama-2-70b, alongside discussions of AI ethics and alignment risks. The Aphrodite engine showed strong performance on A6000 GPUs with specific configurations. Role-playing AI models such as Mixtral and Flatdolphinmaid struggled with repetitiveness, while Noromaid and Rpcal performed better, with ChatML and DPO recommended for improved responses. Learning resources like fast.ai's course were highlighted for ML/DL beginners, and fine-tuning techniques with optimizers like paged 8-bit Lion and Adafactor were discussed. At Nous Research AI, the Activation Beacon project introduced a method for unlimited context length in LLMs using "global state" tokens, potentially transforming retrieval-augmented models. The Eagle-7B model, based on RWKV-v5, outperformed Mistral in benchmarks with efficiency and multilingual capabilities. OpenHermes2.5 was recommended for consumer hardware due to its quantization methods. Multimodal and domain-specific models like IMP v1-3b, Bakllava, Moondream, and Qwen-VL were explored for classification and vision-language tasks. The community emphasized centralizing AI resources for collaborative research.

Adept Fuyu-Heavy: Multimodal model for Agents

Google Solves Text to Video

1/16/2024: ArtificialAnalysis - a new model/host benchmark site

12/28/2023: Smol Talk updates

12/26/2023: not much happened today

12/25/2023: Nous Hermes 2 Yi 34B for Christmas

12/20/2023: Project Obsidian - Multimodal Mistral 7B from Nous

Is Google's Gemini... legit?