Company: "gemma"

minor ai followups: MultiAgents, Meta-SSI-Scale, Karpathy, AI Engineer

gpt-4o afm-4.5b gemma qwen stt-1b-en_fr stt-2.6b-en hunyuan-3d-2.1 openai meta-ai-fair scale-ai huggingface tencent arcee-ai ai-safety alignment ai-regulation memory-optimization scalable-oversight speech-recognition 3d-generation foundation-models sama polynoamial neelnanda5 teortaxestex yoshua_bengio zachtratar ryanpgreenblatt reach_vb arankomatsuzaki code_star

OpenAI released a paper revealing how training models like GPT-4o on insecure code can cause broad misalignment, drawing reactions from experts like @sama and @polynoamial. California's AI regulation efforts were highlighted by @Yoshua_Bengio emphasizing transparency and whistleblower protections. The term "context rot" was coined to describe LLM conversation degradation, with systems like Embra using CRM-like memory for robustness. Scalable oversight research aiming to improve human control over smarter AIs was discussed by @RyanPGreenblatt. New model releases include Kyutai's speech-to-text models capable of 400 real-time streams on a single H100 GPU, Tencent's Hunyuan 3D 2.1 as the first open-source production-ready PBR 3D generative model, and Arcee's AFM-4.5B foundation model family targeting enterprise use, competitive with Gemma and Qwen.

Jun 04

AI Engineer World's Fair Talks Day 1

gemini-2.5 gemma claude-code mistral cursor anthropic openai aie google-deepmind meta-ai-fair agent-based-architecture open-source model-memorization scaling-laws quantization mixture-of-experts language-model-memorization model-generalization langgraph model-architecture

Mistral launched a new Code project, and Cursor released version 1.0. Anthropic improved Claude Code plans, while ChatGPT announced expanded connections. The day was dominated by AIE keynotes and tracks including GraphRAG, RecSys, and Tiny Teams. On Reddit, Google open-sourced the DeepSearch stack for building AI agents with Gemini 2.5 and LangGraph, enabling flexible agent architectures and integration with local LLMs like Gemma. A new Meta paper analyzed language model memorization, showing GPT-style transformers store about 3.5–4 bits/parameter and exploring the transition from memorization to generalization, with implications for Mixture-of-Experts models and quantization effects.

May 28

not much happened today

deepseek-r1-0528 pali-gemma-2 gemma-3 shieldgemma-2 txgemma gemma-3-qat gemma-3n-preview medgemma dolphingemma signgemma claude-4 opus-4 claude-sonnet-4 codestral-embed bagel qwen nemotron-cortexa gemini-2.5-pro deepseek-ai huggingface gemma claude bytedance qwen nemotron sakana-ai-labs benchmarking model-releases multimodality code-generation model-performance long-context reinforcement-learning model-optimization open-source yuchenj_uw _akhaliq clementdelangue osanseviero alexalbert__ guillaumelample theturingpost lmarena_ai epochairesearch scaling01 nrehiew_ ctnzr

DeepSeek R1 v2 model released with availability on Hugging Face and inference partners. The Gemma model family continues prolific development including PaliGemma 2, Gemma 3, and others. Claude 4 and its variants like Opus 4 and Claude Sonnet 4 show top benchmark performance, including new SOTA on ARC-AGI-2 and WebDev Arena. Codestral Embed introduces a 3072-dimensional code embedder. BAGEL, an open-source multimodal model by ByteDance, supports reading, reasoning, drawing, and editing with long mixed contexts. Benchmarking highlights include Nemotron-CORTEXA topping SWEBench and Gemini 2.5 Pro performing on VideoGameBench. Discussions on random rewards effectiveness focus on Qwen models. "Opus 4 NEW SOTA ON ARC-AGI-2. It's happening - I was right" and "Claude 4 launch has dev moving at a different pace" reflect excitement in the community.

May 12

Prime Intellect's INTELLECT-2 and PRIME-RL advance distributed reinforcement learning

intellect-2 dreamo qwen gemini-2.5-pro dynamic-byte-latent-transformer gen-4-references mistral-medium-3 le-chat-enterprise primeintellect bytedance qwen gemma meta-ai-fair runwayml mistral-ai google distributed-training reinforcement-learning gpu-clusters model-optimization quantization multimodality agentic-ai video-understanding fine-tuning _akhaliq reach_vb osanseviero aiatmeta c_valenzuelab lmarena_ai adcock_brett

Prime Intellect released INTELLECT-2, a decentralized GPU training and RL framework with a vision for distributed AI training overcoming colocation limits. ByteDance launched DreamO, a unified image customization model on Hugging Face. Qwen released models optimized for GPTQ, GGUF, and AWQ quantization. Gemma surpassed 150 million downloads on Hugging Face. Meta released weights for the Dynamic Byte Latent Transformer and the Collaborative Reasoner framework to improve language model efficiency and reasoning. RunwayML introduced Gen-4 References, a near-realtime model requiring no fine-tuning. Mistral AI released Mistral Medium 3, a strong multimodal model, and Le Chat Enterprise, an agentic AI assistant for business. Google updated Gemini 2.5 Pro Preview with video understanding and UI improvements. "Airbnb for spare GPUs from all over the world" highlights the ongoing challenges and potential of distributed GPU training.

Jul 17, 2024

Gemma 2 tops /r/LocalLlama vibe check

gemma-2-9b gemma-2-27b llama-3 mistral-7b phi-3 qwen gemma llamaindex mistral-ai cohere deepseek-ai nous-research eureka-labs model-comparison local-llms multilinguality model-efficiency fine-tuning ai-education ai-teaching-assistants andrej-karpathy

Gemma 2 (9B, 27B) is highlighted as a top-performing local LLM, praised for its speed, multilingual capabilities, and efficiency on consumer GPUs like the 2080ti. It outperforms models like Llama 3 and Mistral 7B in various tasks, including non-English text processing and reasoning. The community discussion on /r/LocalLlama reflects strong preference for Gemma 2, with 18 mentions, compared to 10 mentions for Llama 3 and 9 mentions for Mistral. Other models like Phi 3 and Qwen also received mentions but are considered surpassed by Gemma 2. Additionally, Andrej Karpathy announced the launch of Eureka Labs, an AI+Education startup aiming to create an AI-native school with AI Teaching Assistants, starting with the LLM101n course to teach AI training fundamentals. This initiative is seen as a significant development in AI education.

Jun 11, 2024

Talaria: Apple's new MLOps Superweapon

gemma mixtral phi dbrx apple google mistral-ai microsoft mosaic quantization on-device-ai adapter-models model-optimization model-latency lossless-quantization low-bit-palletization token-generation model-benchmarking human-evaluation craig-federighi andrej-karpathy

Apple Intelligence introduces a small (~3B parameters) on-device model and a larger server model running on Apple Silicon with Private Cloud Compute, aiming to surpass Google Gemma, Mistral Mixtral, Microsoft Phi, and Mosaic DBRX. The on-device model features a novel lossless quantization strategy using mixed 2-bit and 4-bit LoRA adapters averaging 3.5 bits-per-weight, enabling dynamic adapter hot-swapping and efficient memory management. Apple credits the Talaria tool for optimizing quantization and model latency, achieving about 0.6 ms time-to-first-token latency and 30 tokens per second generation rate on iPhone 15 Pro. Apple focuses on an "adapter for everything" strategy with initial deployment on SiriKit and App Intents. Performance benchmarks rely on human graders, emphasizing consumer-level adequacy over academic dominance. The Apple ML blog also mentions an Xcode code-focused model and a diffusion model for Genmoji.

Mar 12, 2024

Fixing Gemma

gemma claude-3-opus claude-3 mistral-large gpt-4 google unsloth anthropic mistral-ai finetuning numerical-precision benchmarking structured-data-extraction adaptive-equalizer information-theory hallucination-detection model-stability daniel-han yann-lecun francois-chollet arav-srinivas _aidan_clark_

Google's Gemma model was found unstable for finetuning until Daniel Han from Unsloth AI fixed 8 bugs, improving its implementation. Yann LeCun explained technical details of a pseudo-random bit sequence for adaptive equalizers, while François Chollet discussed the low information bandwidth of the human visual system. Arav Srinivas reported that Claude 3 Opus showed no hallucinations in extensive testing, outperforming GPT-4 and Mistral-Large in benchmarks. Reflections from Yann LeCun highlight ongoing AI progress toward human-level intelligence. The community is shifting pipelines to work better with Claude models, and emotional experiences in ML development were shared by Aidan Clark.

Feb 22, 2024

Google AI: Win some (Gemma, 1.5 Pro), Lose some (Image gen)

gemma-2b gemma-7b gemma gemini-pro-1.5 llama-2 llama-3 mistral google hugging-face nvidia benchmarking license-policies image-generation video-understanding long-context dataset-editing model-integration gpu-hardware bug-fixes quantization

Google's Gemma open models (2-7B parameters) outperform Llama 2 and Mistral in benchmarks but face criticism for an unusual license and poor image generation quality, which Google partially acknowledges. The upcoming Gemini Pro 1.5 model features a 1 million token context window, excelling in video understanding and needle-in-haystack tasks. Discord communities like TheBloke and LM Studio discuss mixed reception of Gemma models, anticipation for Llama 3 release, challenges in dataset editing, and hardware considerations such as NVIDIA GeForce RTX 3090 and RTX 4090 GPUs. LM Studio users report issues with version 0.2.15 Beta and ongoing integration of Gemma models, with resources shared on Hugging Face.