Company: "mila"

not much happened today

kimi-linear kimi-delta-attention minimax-m2 looped-llms aardvark-gpt-5 moonshot-ai minimax bytedance princeton mila openai cursor cognition hkust long-context attention-mechanisms agentic-ai tool-use adaptive-compute coding-agents performance-optimization memory-optimization reinforcement-learning model-architecture kimi_moonshot scaling01 uniartisan omarsar0 aicodeking songlinyang4 iscienceluvr nrehiew_ gdb embeddedsec auchenberg simonw

Moonshot AI released Kimi Linear, built on Kimi Delta Attention (KDA), with day-0 infrastructure and strong long-context metrics, achieving up to 75% KV cache reduction and 6x decoding throughput. MiniMax M2 pivoted to full attention for multi-hop reasoning, maintaining strong agentic coding performance with 200k context and ~100 TPS. ByteDance, Princeton, and Mila introduced Looped LLMs showing efficiency gains comparable to larger transformers. OpenAI's Aardvark (GPT-5) entered private beta as an agentic security researcher for scalable vulnerability discovery. Cursor launched faster cloud coding agents, though transparency concerns arose regarding base-model provenance. Cognition released a public beta for a desktop/mobile tool-use agent named Devin. The community discussed advanced attention mechanisms and adaptive compute techniques.

not much happened today

gpt-5-pro gemini-2.5 vllm deepseek-v3.1 openai google-deepmind microsoft epoch-ai-research togethercompute nvidia mila reasoning reinforcement-learning inference speculative-decoding sparse-attention kv-cache-management throughput-optimization compute-efficiency tokenization epochairesearch yitayml _philschmid jiqizhixin cvenhoff00 neelnanda5 lateinteraction mgoin_ blackhc teortaxestex

FrontierMath Tier 4 results show GPT-5 Pro narrowly outperforming Gemini 2.5 Deep Think in reasoning accuracy, with concerns about problem leakage clarified by Epoch AI Research. Mila and Microsoft propose Markovian Thinking to improve reasoning efficiency, enabling models to reason over 24K tokens with less compute. New research suggests base models inherently contain reasoning mechanisms, with "thinking models" learning to invoke them effectively. In systems, NVIDIA Blackwell combined with vLLM wins InferenceMAX with significant throughput gains, while Together AI's ATLAS adaptive speculative decoding achieves 4× speed improvements and reduces RL training time by over 60%. SparseServe introduces dynamic sparse attention with KV tiering to manage GPU memory, substantially improving throughput and latency.

a quiet weekend

o1 datagemma aloha demostart firefly-ai-video-model pixtral-12b gamegen-o openai google-deepmind adobe mistral-ai tencent supermaven 11x cohere anthropic latent-space-university stanford microsoft mila notre-dame reinforcement-learning chain-of-thought reasoning robotics diffusion-models multimodality video-generation model-training reflection-tuning mathematical-reasoning model-benchmarking fine-tuning george-hotz terence-tao adcock_brett rohanpaul_ai bindureddy fchollet philschmid

OpenAI released the new o1 model, leveraging reinforcement learning and chain-of-thought prompting to excel in reasoning benchmarks, achieving an IQ-like score of 120. Google DeepMind introduced DataGemma to reduce hallucinations by connecting LLMs with real-world data, and unveiled ALOHA and DemoStart for robot dexterity using diffusion methods. Adobe previewed its Firefly AI Video Model with text-to-video and generative extend features. Mistral launched the multimodal Pixtral 12B model, and Tencent presented the GameGen-O open-world video game generation model. Several research papers from Stanford, OpenAI, Microsoft, Mila, and Notre Dame focus on advanced reasoning, self-verification, and reflection tuning techniques. Experts like Terence Tao and George Hotz have shared mixed but optimistic views on o1's capabilities. Seed funding rounds include Supermaven ($12M) and 11x ($24M).

© 2026 • AINews
