Person: "sebastien-bubeck"
not much happened today
phi-4 reinforce++ arc-agi-2 ai21-labs ollama langchain togethercompute groq reinforcement-learning ppo model-optimization memory-efficiency python-packages vision text-extraction frontend-code-generation workflow-automation coding-agents compute-cost-reduction ethical-ai agi-benchmarks scam-alerts sebastien-bubeck fchollet tom-doerr arohan_ bindureddy hwchase17 jonathanross321 clementdelangue vikhyatk
Sebastien Bubeck highlighted REINFORCE++, which augments classical REINFORCE with PPO-inspired techniques for roughly 30% faster training (a sketch follows below). Microsoft's Phi-4 was released under the MIT License and is accessible via Ollama. François Chollet announced plans for ARC-AGI-2 and a next-generation AGI benchmark. LangChain launched 10 new integration packages to accelerate LLM application development. Tom Doerr introduced Ollama-OCR, a Python package for text extraction with vision language models. @arohan_ optimized Shampoo for memory efficiency, cutting optimizer-state usage from 20 to 6 bytes per parameter. Bindu Reddy showcased CodeLLM v1 for frontend code generation and highlighted LlamaIndex Workflows for academic summarization and slide generation. Harrison Chase (hwchase17) collaborated with Together Compute to extend WebDev Arena with complex coding agents for LLM coding evaluations. Jonathan Ross detailed Groq's mission to cut compute costs by 1000x amid rising generative AI spending. Clement Delangue warned about scams making false claims of association with AI21. Vikhyat K raised concerns about the ethical implications and trade-offs of AGI. Memes and humor included creative AI prompts and critiques of LLM behaviors.
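The core idea behind REINFORCE++ is to keep REINFORCE's critic-free simplicity while borrowing PPO's stabilizers. Below is a minimal PyTorch sketch under stated assumptions: sequence-level rewards, a batch-statistics baseline instead of a learned value network, and a PPO-style clipped ratio. `reinforce_pp_loss` is a hypothetical helper name, and the published method also folds a token-level KL penalty against a reference policy into the reward, which is omitted here.

```python
import torch

def reinforce_pp_loss(logp, logp_old, rewards, clip_eps=0.2):
    # logp / logp_old: (batch,) summed log-probs of each sampled response
    # under the current and behavior policies; rewards: (batch,) scalars.
    # Critic-free advantage: normalize rewards across the batch instead of
    # training a value network (one source of PPO's extra compute cost).
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # PPO-inspired clipped importance ratio keeps the update close to the
    # behavior policy, as in the PPO surrogate objective.
    ratio = torch.exp(logp - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * adv, clipped * adv).mean()
```

Dropping the critic is where the speedup plausibly comes from: each step needs only policy forward/backward passes, not an extra value-model update.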
Shazeer et al. (2024): you are overpaying for inference >13x
claude-3.5-sonnet claude-3-opus character.ai anthropic memory-efficiency kv-cache attention-mechanisms stateful-caching int8-precision transformer-architecture scaling overfitting architecture noam-shazeer kevin-a-fischer sebastien-bubeck _aidan_clark_ andrej-karpathy
Noam Shazeer explains how Character.ai serves an LLM inference query volume equal to roughly 20% of Google Search traffic while reducing serving costs by a factor of 33 compared to late 2022; leading commercial APIs would cost at least 13.5x more. Key memory-efficiency techniques include multi-query attention (MQA) rather than grouped-query attention (GQA), shrinking the KV cache 8x (illustrated below); hybrid attention horizons; cross-layer KV sharing; stateful caching with a 95% cache hit rate; and native int8 precision with custom kernels. Anthropic released Claude 3.5 Sonnet, which outperforms Claude 3 Opus at twice the speed and one-fifth the cost, passes 64% of internal pull-request tests, and introduces new features such as Artifacts for real-time document and code generation. Discussions on LLM architecture highlight the dominance of transformers, challenges in scaling and overfitting, and the importance of architecture work for progress.
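To see where the 8x comes from: the KV cache grows linearly in the number of key/value heads, so going from 8 GQA key/value heads to MQA's single shared head shrinks it 8x, and int8 instead of bf16 halves it again. A back-of-the-envelope sketch with illustrative dimensions (not Character.AI's actual config; `kv_cache_bytes` is a hypothetical helper):

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, dtype_bytes):
    # Two cached tensors (K and V) per layer, each of shape
    # (seq_len, n_kv_heads, head_dim).
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * dtype_bytes

cfg = dict(seq_len=8192, n_layers=32, head_dim=128)
gqa_bf16 = kv_cache_bytes(n_kv_heads=8, dtype_bytes=2, **cfg)  # GQA, bf16
mqa_bf16 = kv_cache_bytes(n_kv_heads=1, dtype_bytes=2, **cfg)  # MQA, bf16
mqa_int8 = kv_cache_bytes(n_kv_heads=1, dtype_bytes=1, **cfg)  # MQA, int8
print(f"GQA bf16: {gqa_bf16 / 2**30:.3f} GiB per sequence")  # 1.000 GiB
print(f"MQA bf16: {mqa_bf16 / 2**30:.3f} GiB per sequence")  # 0.125 GiB, 8x smaller
print(f"MQA int8: {mqa_int8 / 2**30:.3f} GiB per sequence")  # 0.062 GiB, 2x smaller again
```

Cross-layer KV sharing and hybrid local/global attention horizons cut the cache further still, which is what makes a 95% stateful cache hit rate affordable in host memory.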