Company: "princeton"
not much happened today
kimi-linear kimi-delta-attention minimax-m2 looped-llms aardvark-gpt-5 moonshot-ai minimax bytedance princeton mila openai cursor cognition hkust long-context attention-mechanisms agentic-ai tool-use adaptive-compute coding-agents performance-optimization memory-optimization reinforcement-learning model-architecture kimi_moonshot scaling01 uniartisan omarsar0 aicodeking songlinyang4 iscienceluvr nrehiew_ gdb embeddedsec auchenberg simonw
Moonshot AI released Kimi Linear, a hybrid architecture built on Kimi Delta Attention (KDA), with day-0 inference infrastructure and strong long-context results: up to 75% KV-cache reduction and up to 6x decoding throughput. MiniMax M2 pivoted to full attention for multi-hop reasoning, maintaining strong agentic coding performance with a 200k context window at ~100 tokens/s. ByteDance, Princeton, and Mila introduced looped LLMs, which reuse weight-tied layers across iterations and show efficiency comparable to larger standard transformers (a sketch of both ideas follows below). OpenAI's Aardvark (GPT-5) entered private beta as an agentic security researcher for scalable vulnerability discovery. Cursor launched faster cloud coding agents, though transparency concerns arose over base-model provenance. Cognition released a public beta of Devin, its desktop/mobile tool-use agent. The community discussed advanced attention mechanisms and adaptive-compute techniques.
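The KV-cache figure is easy to sanity-check with back-of-the-envelope arithmetic: in a hybrid stack where 3 of every 4 attention layers are linear (their recurrent state does not grow with sequence length), only a quarter of the layers still pay for a cache, giving roughly the quoted 75% reduction. A minimal sketch with assumed shapes (layer count, heads, and head dim are illustrative, not Kimi Linear's actual configuration):

```python
# Back-of-the-envelope KV-cache sizing for a hybrid attention stack.
# All shapes below are illustrative assumptions, not Kimi Linear's config.

def kv_cache_bytes(full_attn_layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Cache for full-attention layers: one K and one V tensor per layer (fp16)."""
    return 2 * full_attn_layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Baseline: 48 full-attention layers at a 1M-token context.
full = kv_cache_bytes(48, kv_heads=8, head_dim=128, seq_len=1_000_000)

# Hybrid: 3 of every 4 layers use linear attention, whose fixed-size
# recurrent state is negligible next to a 1M-token cache.
hybrid = kv_cache_bytes(12, kv_heads=8, head_dim=128, seq_len=1_000_000)

print(f"full attention: {full / 2**30:.1f} GiB")
print(f"hybrid (3:1)  : {hybrid / 2**30:.1f} GiB ({1 - hybrid / full:.0%} smaller)")
```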
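For the looped-LLM idea, the core mechanism is weight tying across depth: one block is applied repeatedly, so compute (and effective depth) scales with the loop count while the parameter count stays fixed. A hedged PyTorch sketch of the pattern (module and hyperparameters are hypothetical, not the paper's architecture):

```python
import torch
import torch.nn as nn

class LoopedStack(nn.Module):
    """Hypothetical looped-LLM core: one weight-tied transformer block is
    applied `loops` times, adding effective depth without adding parameters."""
    def __init__(self, d_model: int = 512, n_head: int = 8, loops: int = 4):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)
        self.loops = loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.loops):  # same weights reused on each pass
            x = self.block(x)
        return x

x = torch.randn(2, 16, 512)       # (batch, seq, d_model)
print(LoopedStack()(x).shape)     # torch.Size([2, 16, 512])
```

Making the loop count an inference-time knob is also one simple route to the adaptive compute the community discussion mentions.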
We Solved Hallucinations
gpt-2 flashattention-3 lynx meta-ai-fair nvidia princeton colfax patronus-ai databricks mosaic-ai openai compute-hardware gpu-optimization flashattention llm-evaluation hallucination-detection vision benchmarking synthetic-data model-training karpathy tri_dao giffmana vikhyatk dbrxmosaicai
Reddit's URL structure was causing link errors in AI-generated summaries, especially with NSFW content, affecting models like Claude and GPT-4; the team fixed the glitch while continuing to use LLMs to summarize Reddit content. GPT-2 training costs have dropped dramatically to ~$672 on H100 GPUs, driven by software improvements such as optimized CUDA kernels and FlashAttention. FlashAttention-3 was released, developed collaboratively by Meta, NVIDIA, Princeton, and Colfax: it achieves up to 740 TFLOPS on H100 GPUs, with FP8 nearing 1.2 PFLOPS, by exploiting new Hopper hardware features such as TMA and asynchronous warpgroup GEMMs. Recent research suggests synthetic data may not improve vision tasks. The Avocado360 benchmark evaluates vision-language models' ability to detect avocados in images. Patronus AI introduced Lynx, a hallucination detection model for LLMs aimed at real-world healthcare and fintech applications, trained on Databricks Mosaic AI using Composer.
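For context on how FlashAttention-style kernels are typically consumed: most frameworks expose them behind a fused attention call rather than a new API. A minimal sketch using PyTorch's scaled_dot_product_attention, which can dispatch to a FlashAttention backend on supported GPUs (PyTorch 2.3+ for the sdpa_kernel context manager; FlashAttention-3's Hopper kernels ship separately in the flash-attn package):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Illustrative shapes: (batch, heads, seq, head_dim); requires a CUDA GPU.
q = torch.randn(2, 16, 4096, 128, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict dispatch to the fused FlashAttention backend; unsupported
# shapes/dtypes/hardware will raise instead of silently falling back.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([2, 16, 4096, 128])
```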