Topic: "vision"

GPT-5.2 (Instant/Thinking/Pro): 74% on GDPVal, 1.4x cost of GPT-5.1, on OpenAI's 10-Year Anniversary

DeepSeek-OCR finds vision models can decode 10x more efficiently with ~97% accuracy of text-only, 33/200k pages/day/A100

Claude Agent Skills - glorified AGENTS.md? or MCP killer?

Gemini 2.5 Computer Use preview beats Sonnet 4.5 and OAI CUA

Grok 4 Fast: xAI's distilled, 40% more token efficient, 2M context, 344 tok/s frontier model

SoftBank, NVIDIA and US Govt take 2%, 5% and 10% of Intel, will develop Intel x86 RTX SoCs for consumer & datacenters

not much happened today

not much happened today

not much happened today

nano-banana is Gemini‑2.5‑Flash‑Image, beating Flux Kontext by 170 Elo with SOTA Consistency, Editing, and Multi-Image Fusion

not much happened today

Western Open Models get Funding: Cohere $500m @ $6.8B, AI2 gets $152m NSF+NVIDIA grants

Figma's $50+b IPO

not much happened today

not much happened today

not much happened today

not much happened today

Cognition's DeepWiki, a free encyclopedia of all GitHub repos

OpenAI o3, o4-mini, and Codex CLI

not much happened today

Google's Agent2Agent Protocol (A2A)

not much happened today

Gemma 3 beats DeepSeek V3 in Elo, 2.0 Flash beats GPT4o with Native Image Gen

not much happened today

The Ultra-Scale Playbook: Training LLMs on GPU Clusters

not much happened today

s1: Simple test-time scaling (and Kyutai Hibiki)

DeepSeek #1 on US App Store, Nvidia stock tanks -17%

Titans: Learning to Memorize at Test Time

Moondream 2025.1.9: Structured Text, Enhanced OCR, Gaze Detection in a 2B Model

not much happened today

not much happened today

not much happened today

Genesis: Generative Physics Engine for Robotics (o1-mini version)

Genesis: Generative Physics Engine for Robotics (o1-2024-12-17)

o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO Finetuning

Meta BLT: Tokenizer-free, Byte-level LLM

$200 ChatGPT Pro and o1-full/pro, with vision, without API, and mixed reviews

not much happened today

OLMo 2 - new SOTA Fully Open LLM

Vision Everywhere: Apple AIMv2 and Jina CLIP v2

LMSys killed Model Versioning (gpt 4o 1120, gemini exp 1121)

Pixtral Large (124B) beats Llama 3.2 90B with updated Mistral Large 24.11

Claude 3.5 Sonnet (New) gets Computer Use

The AI Nobel Prize

Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)

Pixtral 12B: Mistral beats Llama to Multimodality

AIPhone 16: the Visual Intelligence Phone

CogVideoX: Zhipu's Open Source Sora

Ideogram 2 + Berkeley Function Calling Leaderboard V2

not much happened today

not much happened today

Too Cheap To Meter: AI prices cut 50-70% in last 30 days

not much happened today

GPT4o August + 100% Structured Outputs for All (GPT4o August edition)

not much happened today

We Solved Hallucinations

FlashAttention 3, PaliGemma, OpenAI's 5 Levels to Superintelligence

Qdrant's BM42: "Please don't trust us"

That GPT-4o Demo

Ways to use Anthropic's Tool Use GA

Chameleon: Meta's (unreleased) GPT4o-like Omnimodal Model

Google I/O in 60 seconds

GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4O version)

AdamW -> AaronD?

Welcome /r/LocalLlama!

Claude 3 just destroyed GPT 4 (see for yourself)

CodeLlama 70B beats GPT4 on HumanEval

codellama miqu mistral-medium llama-2-70b aphrodite-engine mixtral flatdolphinmaid noromaid rpcal chatml mistral-7b activation-beacon eagle-7b rwkv-v5 openhermes2.5 nous-hermes-2-mixtral-8x7b-dpo imp-v1-3b bakllava moondream qwen-vl meta-ai-fair ollama nous-research mistral-ai hugging-face ai-ethics alignment gpu-optimization direct-prompt-optimization fine-tuning cuda-programming optimizer-technology quantization multimodality context-length dense-retrieval retrieval-augmented-generation multilinguality model-performance open-source code-generation classification vision

Meta AI surprised the community with the release of CodeLlama, an open-source model now available on platforms like Ollama and MLX for local use. The Miqu model sparked debate over its origins, with speculation that it is linked to Mistral Medium or is a fine-tuned Llama-2-70b, alongside discussions of AI ethics and alignment risks. The Aphrodite engine showed strong performance on A6000 GPUs with specific configurations. Among role-playing models, Mixtral and Flatdolphinmaid struggled with repetitiveness, while Noromaid and Rpcal performed better; ChatML and DPO were recommended for improved responses. Learning resources like fast.ai's course were highlighted for ML/DL beginners, and fine-tuning with optimizers such as paged 8-bit Lion and Adafactor was discussed. At Nous Research AI, the Activation Beacon project introduced a method for unlimited context length in LLMs using "global state" tokens, potentially transforming retrieval-augmented models. The Eagle-7B model, based on RWKV-v5, outperformed Mistral in benchmarks while offering efficiency and multilingual capabilities. OpenHermes2.5 was recommended for consumer hardware due to its quantization methods. Multimodal and domain-specific models like IMP v1-3b, Bakllava, Moondream, and Qwen-vl were explored for classification and vision-language tasks. The community emphasized centralizing AI resources for collaborative research.

12/28/2023: Smol Talk updates

12/20/2023: Project Obsidian - Multimodal Mistral 7B from Nous

12/13/2023: SOLAR10.7B upstages Mistral7B?