Company: "cursor"
AI Engineer World's Fair Talks Day 1
gemini-2.5 gemma claude-code mistral cursor anthropic openai aie google-deepmind meta-ai-fair agent-based-architecture open-source model-memorization scaling-laws quantization mixture-of-experts language-model-memorization model-generalization langgraph model-architecture
Mistral launched a new Code project, and Cursor released version 1.0. Anthropic improved Claude Code plans, while ChatGPT announced expanded connections. The day was dominated by AIE keynotes and tracks including GraphRAG, RecSys, and Tiny Teams. On Reddit, Google open-sourced the DeepSearch stack for building AI agents with Gemini 2.5 and LangGraph, enabling flexible agent architectures and integration with local LLMs like Gemma. A new Meta paper analyzed language model memorization, showing GPT-style transformers store about 3.5–4 bits/parameter and exploring the transition from memorization to generalization, with implications for Mixture-of-Experts models and quantization effects.
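To give a sense of scale for the 3.5–4 bits/parameter figure, here is a rough back-of-the-envelope sketch. The 8B parameter count is an illustrative assumption, not a number from the paper, and the helper name `memorization_capacity_bytes` is hypothetical.

```python
def memorization_capacity_bytes(n_params: int, bits_per_param: float) -> float:
    """Approximate memorization capacity in bytes (8 bits per byte)."""
    return n_params * bits_per_param / 8


# Assumed model size for illustration only: 8 billion parameters.
N_PARAMS = 8_000_000_000

# Evaluate both ends of the 3.5-4 bits/parameter range quoted in the summary.
for bits in (3.5, 4.0):
    gigabytes = memorization_capacity_bytes(N_PARAMS, bits) / 1e9
    print(f"{bits} bits/param -> about {gigabytes:.1f} GB")
```

Under these assumptions the estimate lands between 3.5 and 4.0 GB, which is tiny next to a multi-trillion-token training corpus and is consistent with the summary's point that models must shift from memorization to generalization.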
Cursor @ $9b, OpenAI Buys Windsurf @ $3b
llama-nemotron-ultra llama-nemotron-super llama-nemotron-nano qwen3-235b-a22b prover-v2 phi-4-reasoning ernie-4.5-turbo ernie-x1-turbo suno-v4.5 gen-4-references o1-mini openai cursor nvidia alibaba deepseek microsoft baidu suno runway keras reasoning inference-efficiency open-license moe-models math-reasoning theorem-proving model-performance music-generation image-generation recommender-systems tpu-optimization _akhaliq adcock_brett lmarena_ai fchollet
OpenAI is reportedly close to finalizing a deal to acquire Windsurf, coinciding with Cursor's $900M funding round at a $9B valuation. Nvidia launched the Llama-Nemotron series featuring models from 8B to 253B parameters, praised for reasoning and inference efficiency. Alibaba released the Qwen3 family with MoE and dense models up to 235B parameters, ranking highly in coding and math benchmarks. DeepSeek introduced Prover-V2, an open-source AI for math reasoning with an 88.9% pass rate on MiniF2F-test. Microsoft released reasoning-focused Phi-4 models, outperforming OpenAI's o1-mini. Baidu debuted turbo versions of ERNIE 4.5 and X1 for faster, cheaper inference. Suno v4.5 added advanced AI music generation features, while Runway Gen-4 References enables placing characters into scenes with high consistency. KerasRS, a new recommender system library optimized for TPUs, was released by François Chollet.
not much happened today
phi-4 phi-4-mini-reasoning qwen3-235b qwen3-moe-235b qwen3-moe-30b qwen3-dense-32b qwen3-dense-14b qwen3-dense-8b qwen3-dense-4b qwen3-dense-0.6b qwen2.5-omni-3b deepseek-prover-v2 llama llama-guard-4 prompt-guard-2 mimo-7b microsoft anthropic cursor alibaba togethercompute deepseek meta-ai-fair xiaomi openrouterai cohere reasoning model-fine-tuning model-evaluation benchmarking model-popularity open-source math model-scaling model-filtering jailbreak-prevention cline reach_vb vipulved akhaliq omarsar0 zhs05232838 huajian_xin mervenoyann karpathy random_walker sarahookr blancheminerva clefourrier
Microsoft released Phi-4-reasoning, a finetuned 14B reasoning model that lands slightly behind QwQ and is limited by data transparency and token efficiency issues. Anthropic introduced remote MCP server support and a 45-minute Research mode in Claude. Cursor published a model popularity list. Alibaba launched Qwen3-235B and other Qwen3 variants, highlighting budget-friendly coding and reasoning capabilities, with availability on the Together AI API. Microsoft also released Phi-4-Mini-Reasoning, with reported benchmark results on AIME 2025 and OmniMath. DeepSeek announced DeepSeek-Prover V2 with state-of-the-art math problem solving, scaling to 671B parameters. Meta AI's Llama models hit 1.2 billion downloads, with new Llama Guard 4 and Prompt Guard 2 for input/output filtering and jailbreak prevention. Xiaomi released the open-source reasoning model MiMo-7B, trained on 25 trillion tokens. Discussions on AI model evaluation highlighted issues with the LMArena leaderboard, data access biases favoring proprietary models, and challenges in maintaining fair benchmarking, with suggestions for alternatives like OpenRouterAI rankings. "LMArena slop and biased" and "61.3% of all data going to proprietary model providers" were noted concerns.
>$41B raised today (OpenAI @ 300b, Cursor @ 9.5b, Etched @ 1.5b)
deepseek-v3-0324 gemini-2.5-pro claude-3.7-sonnet openai deepseek gemini cursor etched skypilot agent-evals open-models model-releases model-performance coding multimodality model-deployment cost-efficiency agent-evaluation privacy kevinweil sama lmarena_ai scaling01 iscienceluvr stevenheidel lepikhin dzhng raizamrtn karpathy
OpenAI is preparing to release a highly capable open language model, their first since GPT-2, with a focus on reasoning and community feedback, as shared by @kevinweil and @sama. DeepSeek V3 0324 has achieved the #5 spot on the Arena leaderboard, becoming the top open model with an MIT license and cost advantages. Gemini 2.5 Pro is noted for outperforming models like Claude 3.7 Sonnet in coding tasks, with upcoming pricing and improvements expected soon. New startups like Sophont are building open multimodal foundation models for healthcare. Significant fundraises include Cursor closing $625M at a $9.6B valuation and Etched raising $85M at $1.5B. Innovations in AI infrastructure include SkyPilot's cost-efficient cloud provisioning and the launch of AgentEvals, an open-source package for evaluating AI agents. Discussions on smartphone privacy highlight iPhone's stronger user defense compared to Android.
Cerebras Inference: Faster, Better, AND Cheaper
llama-3.1-8b llama-3.1-70b gemini-1.5-flash gemini-1.5-pro cogvideox-5b mamba-2 rene-1.3b llama-3.1 gemini-1.5 claude groq cerebras cursor google-deepmind anthropic inference-speed wafer-scale-chips prompt-caching model-merging benchmarking open-source-models code-editing model-optimization jeremyphoward sam-altman nat-friedman daniel-gross swyx
Groq led early 2024 with superfast LLM inference speeds, achieving ~450 tokens/sec for Mixtral 8x7B and 240 tokens/sec for Llama 2 70B. Cursor introduced a specialized code edit model hitting 1000 tokens/sec. Now, Cerebras claims the fastest inference with their wafer-scale chips, running Llama3.1-8B at 1800 tokens/sec and Llama3.1-70B at 450 tokens/sec at full precision, with competitive pricing and a generous free tier. Google's Gemini 1.5 models showed significant benchmark improvements, especially Gemini-1.5-Flash and Gemini-1.5-Pro. New open-source models like CogVideoX-5B and Mamba-2 (Rene 1.3B) were released, optimized for consumer hardware. Anthropic's Claude now supports prompt caching, improving speed and cost efficiency. "Cerebras Inference runs Llama3.1 20x faster than GPU solutions at 1/5 the price."
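To make the quoted throughput numbers more concrete, the sketch below converts them into wall-clock time for a single response. The 1000-token response length is a hypothetical choice for illustration, the helper name `seconds_to_generate` is made up, and the calculation ignores time-to-first-token and network latency.

```python
# Throughput figures (tokens/sec) as quoted in the summary above.
THROUGHPUT_TOK_PER_SEC = {
    "Groq / Mixtral 8x7B": 450,       # quoted as approximately 450
    "Groq / Llama 2 70B": 240,
    "Cursor code edit model": 1000,
    "Cerebras / Llama3.1-8B": 1800,
    "Cerebras / Llama3.1-70B": 450,
}


def seconds_to_generate(n_tokens: int, tok_per_sec: float) -> float:
    """Decode time for n_tokens at a steady tokens/sec rate."""
    return n_tokens / tok_per_sec


# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 1000

for name, tps in THROUGHPUT_TOK_PER_SEC.items():
    print(f"{name}: {seconds_to_generate(RESPONSE_TOKENS, tps):.2f} s")
```

Under these assumptions, a 1000-token answer takes roughly 4.2 s at 240 tokens/sec and roughly 0.6 s at 1800 tokens/sec, which is the difference between a noticeable wait and a near-instant reply.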
Cursor reaches >1000 tok/s finetuning Llama3-70b for fast file editing
gpt-4 gpt-4o gpt-4-turbo gpt-4o-mini llama bloom stable-diffusion cursor openai anthropic google-deepmind huggingface speculative-decoding code-edits multimodality image-generation streaming tool-use fine-tuning benchmarking mmlu model-performance evaluation synthetic-data context-windows sama abacaj imjaredz erhartford alexalbert svpino maximelabonne _philschmid
Cursor, an AI-native IDE, announced a speculative edits algorithm for code editing that surpasses GPT-4 and GPT-4o in accuracy and latency, achieving speeds of over 1000 tokens/s on a 70B model. OpenAI released GPT-4o with multimodal capabilities including audio, vision, and text, noted to be 2x faster and 50% cheaper than GPT-4 Turbo, though with mixed coding performance. Anthropic introduced streaming, forced tool use, and vision features for developers. Google DeepMind unveiled Imagen Video and Gemini 1.5 Flash, a small model with a 1M-token context window. HuggingFace is distributing $10M in free GPUs for open-source AI models like Llama, BLOOM, and Stable Diffusion. Evaluation insights highlight challenges with LLMs on novel problems and benchmark saturation, with new benchmarks like MMLU-Pro showing significant drops in top model performance.