All tags
Person: "alexandr_wang"
not much happened today
gemma-3n glm-4.1v-thinking deepseek-r1t2 mini-max-m1 o3 claude-4-opus claude-sonnet moe-72b meta scale-ai unslothai zhipu-ai deepseek huawei minimax-ai allenai sakana-ai-labs openai model-performance vision conv2d float16 training-loss open-source model-benchmarks moe load-balancing scientific-literature-evaluation code-generation adaptive-tree-search synthesis-benchmarks alexandr_wang natfriedman steph_palazzolo thegregyang teortaxes_tex denny_zhou agihippo danielhanchen osanseviero reach_vb scaling01 ndea
Meta has hired Scale AI CEO Alexandr Wang as its new Chief AI Officer, acquiring a 49% non-voting stake in Scale AI for $14.3 billion, doubling its valuation to ~$28 billion. This move is part of a major talent shuffle involving Meta, OpenAI, and Scale AI. Discussions include the impact on Yann LeCun's influence at Meta and potential responses from OpenAI. In model news, Gemma 3N faces technical issues like vision NaNs and FP16 overflows, with fixes from UnslothAI. Chinese open-source models like GLM-4.1V-Thinking by Zhipu AI and DeepSeek R1T2 show strong performance and speed improvements. Huawei open-sourced a 72B MoE model with a novel load balancing solution. The MiniMax-M1 hybrid MoE model leads math benchmarks on the Text Arena leaderboard. AllenAI launched SciArena for scientific literature evaluation, where o3 outperforms others. Research from Sakana AI Labs introduces AB-MCTS for code generation, improving synthesis benchmarks.
not much happened today
chai-2 gemini-2.5-pro deepseek-r1-0528 meta scale-ai anthropic cloudflare grammarly superhuman chai-discovery atlassian notion slack commoncrawl hugging-face sakana-ai inference model-scaling collective-intelligence zero-shot-learning enterprise-deployment data-access science-funding open-source-llms alexandr_wang nat_friedman clementdelangue teortaxestex ylecun steph_palazzolo andersonbcdefg jeremyphoward reach_vb
Meta makes a major AI move by hiring Scale AI founder Alexandr Wang as Chief AI Officer and acquiring a 49% non-voting stake in Scale AI for $14.3 billion, doubling its valuation to about $28 billion. Chai Discovery announces Chai-2, a breakthrough model for zero-shot antibody discovery and optimization. The US government faces budget cuts threatening to eliminate a quarter million science research jobs by 2026. Data access restrictions intensify as companies like Atlassian, Notion, and Slack block web crawlers including Common Crawl, raising concerns about future public internet archives. Hugging Face shuts down HuggingChat after serving over a million users, marking a significant experiment in open-source LLMs. Sakana AI releases AB-MCTS, an inference-time scaling algorithm enabling multiple models like Gemini 2.5 Pro and DeepSeek-R1-0528 to cooperate and outperform individual models.
not much happened today
o3-mini o1-mini llama hunyuan-a13b ernie-4.5 ernie-4.5-21b-a3b qwen3-30b-a3b gemini-2.5-pro meta-ai-fair openai tencent microsoft baidu gemini superintelligence ai-talent job-market open-source-models multimodality mixture-of-experts quantization fp8-training model-benchmarking model-performance model-releases api model-optimization alexandr_wang shengjia_zhao jhyuxm ren_hongyu shuchaobi saranormous teortaxesTex mckbrando yuchenj_uw francoisfleuret quanquangu reach_vb philschmid
Meta has poached top AI talent from OpenAI, including Alexandr Wang joining as Chief AI Officer to work towards superintelligence, signaling a strong push for the next Llama model. The AI job market shows polarization with high demand and compensation for top-tier talent, while credentials like strong GitHub projects gain importance. The WizardLM team moved from Microsoft to Tencent to develop open-source models like Hunyuan-A13B, highlighting shifts in China's AI industry. Rumors suggest OpenAI will release a new open-source model in July, potentially surpassing existing ChatGPT models. Baidu open-sourced multiple variants of its ERNIE 4.5 model series, featuring advanced techniques like 2-bit quantization, MoE router orthogonalization loss, and FP8 training, with models ranging from 0.3B to 424B parameters. Gemini 2.5 Pro returned to the free tier of the Gemini API, enabling developers to explore its features.
Execuhires Round 2: Scale-Meta, Lamini-AMD, and Instacart-OpenAI
o3-pro o3 o1-pro gpt-4o gpt-4.1 gpt-4.1-mini gpt-4.1-nano meta-ai-fair scale-ai lamini amd openai gemini google anthropic model-release benchmarking reasoning fine-tuning pricing model-performance direct-preference-optimization complex-problem-solving alexandr_wang sharon_zhou fidji_simo sama jack_rae markchen90 kevinweil gdb gregkamradt lechmazur wesrothmoney paul_cal imjaredz cto_junior johnowhitaker polynoamial scaling01
Meta hires Scale AI's Alexandr Wang to lead its new "Superintelligence" division following a $15 billion investment for a 49% stake in Scale. Lamini's Sharon Zhou joins AMD as VP of AI under Lisa Su, while Instacart's Fidji Simo becomes CEO of Apps at OpenAI under Sama. Meta offers over $10 million/year compensation packages to top researchers, successfully recruiting Jack Rae from Gemini. OpenAI releases o3-pro model to ChatGPT Pro users and API, outperforming o3 and setting new benchmarks like Extended NYT Connections and SnakeBench. Despite being slower than o1-pro, o3-pro excels in reasoning and complex problem-solving. OpenAI cuts o3 pricing by 80%, making it cheaper than GPT-4o and pressuring competitors like Google and Anthropic to lower prices. Users can now fine-tune the GPT-4.1 family using direct preference optimization (DPO) for subjective tasks.
Gemini 2.5 Flash completes the total domination of the Pareto Frontier
gemini-2.5-flash o3 o4-mini google openai anthropic tool-use multimodality benchmarking reasoning reinforcement-learning open-source model-releases chain-of-thought coding-agent sama kevinweil markchen90 alexandr_wang polynoamial scaling01 aidan_mclau cwolferesearch
Gemini 2.5 Flash is introduced with a new "thinking budget" feature offering more control compared to Anthropic and OpenAI models, marking a significant update in the Gemini series. OpenAI launched o3 and o4-mini models, emphasizing advanced tool use capabilities and multimodal understanding, with o3 dominating several leaderboards but receiving mixed benchmark reviews. The importance of tool use in AI research and development is highlighted, with OpenAI Codex CLI announced as a lightweight open-source coding agent. The news reflects ongoing trends in AI model releases, benchmarking, and tool integration.
Not much happened today
grok-beta llama-3-1-70b claude-3-5-haiku claude-3-opus llama-3 chatgpt gemini meta-ai-fair scale-ai anthropic perplexity-ai langchainai weights-biases qwen pricing national-security defense open-source agentic-ai retrieval-augmented-generation election-predictions real-time-updates annotation ai-ecosystem memes humor alexandr_wang svpino aravsrinivas bindureddy teortaxestex jessechenglyu junyang-lin cte_junior jerryjliu0
Grok Beta surpasses Llama 3.1 70B in intelligence but is less competitive due to its pricing at $5/1M input tokens and $15/1M output tokens. Defense Llama, developed with Meta AI and Scale AI, targets American national security applications. SWE-Kit, an open-source framework, supports building customizable AI software engineers compatible with Llama 3, ChatGPT, and Claude. LangChainAI and Weights & Biases integrate to improve retrievers and reduce hallucinations in RAG applications using Gemini. Perplexity AI offers enhanced election tracking tools for the 2024 elections, including live state results and support for Claude 3.5 Haiku. AI Talk launched featuring discussions on Chinese AI labs with guests from Qwen. Memes highlight Elon Musk and humorous AI coding mishaps.
ChatGPT Advanced Voice Mode
o1-preview qwen-2.5 llama-3 claude-3.5 openai anthropic scale-ai togethercompute kyutai-labs voice-synthesis planning multilingual-datasets retrieval-augmented-generation open-source speech-assistants enterprise-ai price-cuts benchmarking model-performance sam-altman omarsar0 bindureddy rohanpaul_ai _philschmid alexandr_wang svpino ylecun _akhaliq
OpenAI rolled out ChatGPT Advanced Voice Mode with 5 new voices and improved accent and language support, available widely in the US. Ahead of rumored updates for Llama 3 and Claude 3.5, Gemini Pro saw a significant price cut aligning with the new intelligence frontier pricing. OpenAI's o1-preview model showed promising planning task performance with 52.8% accuracy on Randomized Mystery Blocksworld. Anthropic is rumored to release a new model, generating community excitement. Qwen 2.5 was released with models up to 32B parameters and support for 128K tokens, matching GPT-4 0613 benchmarks. Research highlights include PlanBench evaluation of o1-preview, OpenAI's release of a multilingual MMMLU dataset covering 14 languages, and RAGLAB framework standardizing Retrieval-Augmented Generation research. New AI tools include PDF2Audio for converting PDFs to audio, an open-source AI starter kit for local model deployment, and Moshi, a speech-based AI assistant from Kyutai. Industry updates feature Scale AI nearing $1B ARR with 4x YoY growth and Together Compute's enterprise platform offering faster inference and cost reductions. Insights from Sam Altman's blog post were also shared.
nothing much happened today
o1 chatgpt-4o llama-3-1-405b openai lmsys scale-ai cognition langchain qdrant rohanpaul_ai reinforcement-learning model-merging embedding-models toxicity-detection image-editing dependency-management automated-code-review visual-search benchmarking denny_zhou svpino alexandr_wang cwolferesearch rohanpaul_ai _akhaliq kylebrussell
OpenAI's o1 model faces skepticism about open-source replication due to its extreme restrictions and unique training advances like RL on CoT. ChatGPT-4o shows significant performance improvements across benchmarks. Llama-3.1-405b fp8 and bf16 versions perform similarly with cost benefits for fp8. A new open-source benchmark "Humanity's Last Exam" offers $500K in prizes to challenge LLMs. Model merging benefits from neural network sparsity and linear mode connectivity. Embedding-based toxic prompt detection achieves high accuracy with low compute. InstantDrag enables fast, optimization-free drag-based image editing. LangChain v0.3 releases with improved dependency management. Automated code review tool CodeRabbit adapts to team coding styles. Visual search advances integrate multimodal data for better product search. Experts predict AI will be default software by 2030.