AI News for 2/24/2025-2/25/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (220 channels, and 5949 messages) for you. Estimated reading time saved (at 200wpm): 503 minutes. You can now tag @smol_ai for AINews discussions!

You should follow DeepSeek's #OpenSourceWeek, but the releases so far have not met our bar for headline story status.

{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}

AI Twitter Recap

Claude 3.7 Sonnet Release and Performance

Claude 3.7 Sonnet excels in coding and reasoning: @skirano highlighted that Claude 3.7 Sonnet with Claude Code can generate an entire "glass like" design system in one shot, including all components. @omarsar0 demonstrated Claude 3.7's reasoning and coding capabilities by creating a simulator for attention mechanisms. @reach_vb noted that Claude 3.7 beats DeepSeek R1 and is on par with O3-mini (high) in non-thinking mode, anticipating strong performance in thinking mode. @ArtificialAnlys benchmarked Claude 3.7 Sonnet as the best non-reasoning model for coding, outperforming DeepSeek v3, Gemini 2.0 Pro, and GPT-4o on their coding evals SciCode and LiveCodeBench. @terryyuezhuo shared BigCodeBench-Hard results showing Claude-3.7 (w/o thinking) achieving 33.8% Complete, comparable to Qwen2.5-Coder-32B-Instruct, and outperforming o3-mini and o1-2024-12-17.
Claude 3.7 Sonnet available on multiple platforms: @perplexity_ai announced Claude 3.7 Sonnet's availability on Perplexity Pro, noting improvements in agentic workflows and code generation. @_akhaliq confirmed Claude 3.7 Sonnet is live on Anychat with coder mode. @_philschmid mentioned availability on Anthropic, Amazon Bedrock, and Google Cloud, at the same price of $3/$15 per million input/output tokens.
Claude 3.7 Sonnet's "Thinking Mode" and Context Window: @_philschmid highlighted Claude 3.7's <thinking> mode with up to 64k tokens and reasoning tokens display, along with a 200k context window and 128k output token length. @Teknium1 praised the toggleable think mode in Claude.
Claude 3.7 Sonnet's coding tool "Claude Code": @_philschmid introduced Claude Code, a CLI-based coding assistant capable of reading, modifying files, and executing commands. @catherineols described Claude Code as more autonomous than other tools, capable of deciding to run tests and edit files. @goodside previewed Claude Code, noting it sees files, writes diffs, runs commands, and is like a lightweight Cursor without the editor.
Claude 3.7 Sonnet price comparison: @_philschmid pointed out that Claude 3.7's price remained at $3/$15 per million input/output, making it 30x more expensive than Gemini 2.0 Flash and ~3x more than Open o3-mini.

DeepSeek and Qwen Model Updates and Open Source Releases

DeepSeek releases DeepEP communication library: @deepseek_ai announced DeepEP, an open-source EP communication library for MoE model training and inference, featuring efficient all-to-all communication, NVLink and RDMA support, FP8 support, and optimized kernels. @reach_vb detailed DeepEP's features, including asymmetric-domain bandwidth forwarding, low-latency kernels with pure RDMA, and PTX optimizations for Hopper GPUs. @danielhanchen highlighted DeepSeek's #2 OSS release with MoE kernels, expert parallelism, and FP8 for training and inference.
Qwen2.5-Max "Thinking (QwQ)" mode and upcoming open source release: @Alibaba_Qwen released "Thinking (QwQ)" in Qwen Chat, backed by QwQ-Max-Preview, a reasoning model based on Qwen2.5-Max, noting enhanced capabilities in math, coding, and agent tasks. @huybery teased the future of Qwen, mentioning the upcoming official release of QwQ-Max and the planned open-weight release of both QwQ-Max and Qwen2.5-Max under Apache 2.0 license, along with smaller variants like QwQ-32B and mobile apps. @reach_vb excitedly announced QwQ & Qwen 2.5 Max open source release soon.

Video and Multimodal Model Developments

Google Veo 2 video model surpasses Sora in benchmarks: @ArtificialAnlys reported Google Veo 2 surpassed OpenAI’s Sora and Kling 1.5 Pro in their Video Arena, noting strengths in rendering people and realistic physics. Veo 2 can generate minutes of 4K video but is currently limited to 720p video with 8s duration at a price of $0.50 per second.
Alibaba Wan2.1 open AI video generation model: @_akhaliq announced Alibaba's Wan2.1, an open AI video generation model, ranking #1 on the VBench leaderboard, outperforming SOTA open-source & commercial models in complex motion dynamics, physics simulation, and text rendering. @multimodalart confirmed Wan2.1 is Apache 2.0 open source and available on Hugging Face.
RunwayML Creative Partners Program for artists: @c_valenzuelab described RunwayML's Creative Partners Program, giving artists free access to tools to reward experimentation and inspiration, contrasting it with companies copying the effort for product promotion without honoring artists.

Tools, Libraries and Datasets

Replit Agent v2 released: @pirroh announced Replit Agent v2 in Early Access, highlighting a new app creation experience, realtime app design preview, and instructions for access. @hwchase17 noted Replit agent v2 is powered by LangGraph and LangSmith.
LangChain JS adds Claude 3.7 Support and LangGraph Supervisor: @LangChainAI shared tips for building agents with Claude 3.7, demonstrating tool-calling agents with configurable reasoning. @LangChainAI introduced LangGraph.js Supervisor, a library for building hierarchical multi-agent systems with LangGraph. @LangChainAI listed 17 new integration packages added to LangChain Python. @LangChainAI announced Claude 3.7 support in LangChain JS.
vLLM integrates EP support: @vllm_project announced initial EP support merged in vLLM, with integration of collectives coming soon. @reach_vb confirmed vLLM's lightning-fast integration of EP.
OlmOCR by Allen AI for PDF parsing: @mervenoyann presented OlmOCR, a new tool by @allen_ai for parsing PDFs, based on Qwen2VL-7B, and available on transformers with Apache 2.0 license.
Big-Math dataset for RL in LLMs: @arankomatsuzaki and @iScienceLuvr shared SynthLabs' Big-Math, a large-scale, high-quality math dataset for reinforcement learning in language models, containing over 250,000 questions with verifiable answers.

Research and Analysis

Perplexity Deep Research for paid users: @OpenAI announced Deep research rolling out to all ChatGPT Plus, Team, Edu, and Enterprise users, with improvements including embedded images with citations and better understanding of uploaded files. @OpenAI detailed usage limits for Plus, Team, Enterprise, Edu, and Pro users. @OpenAI shared the system card for Deep research. @OpenAI mentioned community expert involvement in training Deep research and opened interest registration for future model contributions. @kevinweil announced Deep research rolling out to all paid users, highlighting its capability for week-long research tasks in 15 minutes. @AravSrinivas announced Deep Research API availability for developers.
Minions: Cost-efficient collaboration between local and cloud models: @togethercompute introduced Minions, a method pairing small language models on a laptop with frontier cloud models, preserving 98% of accuracy for <18% of the cost. @iScienceLuvr highlighted Minions achieving 5.7x cost reduction while maintaining 97.9% cloud model performance.
Learning to Reason from Feedback at Test-Time (FTTT): @dair_ai presented research on Feedback-based Test-Time Training (FTTT), enabling LLMs to learn iteratively from environment feedback during inference, using self-reflected feedback and OPTUNE, a learnable test-time optimizer.

AI Industry and Market Trends

Focus on AI agents and agency: @polynoamial questioned if AI models will soon have agency. @swyx emphasized Agency > Intelligence, defining agency as "getting what you want done" and "doing the right things". @omarsar0 expressed being impressed by Windsurf agentic capabilities.
Open source AI momentum: @ClementDelangue urged for more public, open, collaborative AI. @reach_vb thanked Alibaba_Qwen for their commitment to Open Source and Science. @NandoDF highlighted European AI entrepreneurship and competition, suggesting eliminating notice periods and non-competes to boost the European AI industry.
AI in specific domains: @RichardSocher anticipated epic progress when hill climbing starts on meaningful bio benchmarks. @SchmidhuberAI is hiring postdocs to develop an Artificial Scientist for novel chemical materials for climate change. @METR_Evals is running a pilot experiment to measure AI tools' impact on open source developer productivity.
AI safety and alignment concerns: @sleepinyourhat shared a surprising and disconcerting LLM alignment result. @NeelNanda5 announced a Google DeepMind team using model internals in production to enhance Gemini safety. @sarahcat21 discussed the need for high quality annotations for improving model capabilities and alignment, noting degrading annotation quality.
AI and the future of work: @adcock_brett predicted a future with more humanoids than humans doing various services and collapsing the price of goods/services. @RichardMCNgo discussed the concentrated nature of tech development driven by AI. @francoisfleuret asked for stories from people whose professional lives have been changed by AI models.

Memes and Humor

Death Star Startup Pitch: @arankomatsuzaki joked about a startup with a "bold vision: the Death Star" seeking a $500k seed round.
Worker 17 and AI overlords: @nearcyan shared a meme about "Worker 17" and an "AllKnowingLineSupervisingAutonomousSuperIntelligence", depicting a harsh work environment. @nearcyan continued the "Worker 17" theme, and @rishdotblog joked about future robot overlords hating humans.
Claude playing Pokemon on Twitch: @AnthropicAI announced "Can Claude play Pokémon?" and @kipperrii invited people to watch Claude play Pokemon on Twitch. @_philschmid joked about waiting for the first "AI plays Pokemon" stream. @nearcyan urged people to watch Claude playing Pokemon on Twitch. @AmandaAskell stated "Watching Claude play Pokemon is a delight.".
Anthropic branding and aversion to number four: @scaling01 joked about Anthropic being "more Elven than Human". @dylan522p humorously suggested Anthropic is a Chinese AI company due to their aversion to the number four.
Other humorous tweets: @giffmana shared a funny prompt and response from Grok. @nearcyan made a joke that was missed by others. @teortaxesTex shared a funny image related to Nvidia. @abacaj joked about loyalty to models. @Yuchenj_UW thanked OpenAI with a DeepSeek tweet.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. DeepSeek's DeepEP: Enhanced MoE GPU Communication

DeepSeek Realse 2nd Bomb, DeepEP a communication library tailored for MoE model (Score: 407, Comments: 48): DeepSeek has released DeepEP, a communication library specifically designed for Mixture-of-Experts (MoE) models and expert parallelism (EP). DeepEP features high-throughput, low-latency all-to-all GPU kernels and supports low-precision operations such as FP8, but is currently limited to GPUs with the Hopper architecture like H100, H200, and H800. GitHub Repository.
- DeepEP Performance Optimization: A notable discovery in the DeepEP repository involves using an undocumented PTX instruction ld.global.nc.L1::no_allocate.L2::256B for extreme performance on Hopper architectures. This instruction accesses volatile GPU memory with non-coherent modifiers .nc, but is tested to be correct and enhances performance significantly.
- Potential for Practical Applications: Users express hope that DeepEP's improvements could make Local R1 more practical by enabling faster inference on Mixture-of-Experts models, addressing previous performance issues with DeepSeek.
- Hardware Limitations and Aspirations: While DeepEP currently supports only Hopper architecture GPUs, there is interest in porting it to other GPUs like the 3090s, reflecting a desire for broader hardware compatibility.
DeepSeek 2nd OSS package - DeepEP - Expert parallel FP8 MOE kernels (Score: 153, Comments: 11): DeepSeek released its second open-source software package, DeepEP, which features expert parallel FP8 Mixture of Experts (MOE) kernels.
- DeepEP includes inference style kernels for Mixture of Experts (MoE) layers with FP8 support and expert parallelism, enabling the overlap of GPU/CPU communication and GPU computation. It is also suitable for training large MoE models.

Theme 2. Sonnet 3.7 Dominates Benchmark Testing

New LiveBench results just released. Sonnet 3.7 reasoning now tops the charts and Sonnet 3.7 is also top non-reasoning model (Score: 257, Comments: 53): Sonnet 3.7 from Anthropic leads the latest LiveBench results, achieving top scores in both Global Average (76.10) and Reasoning Average (87.83). The table showcases performance metrics of models from organizations like OpenAI and Google across categories including Coding, Mathematics, Data Analysis, and Language.
- Anthropic's Sonnet 3.7 leads in performance, but there are calls for releasing the model weights for local use. LiveBench results highlight improvements in coding and reasoning, with users noting the model's efficiency and quality compared to others like O3 mini high and Gemini 2 Flash.
- Discussions focus on benchmark limitations and real-world performance, with some users expressing skepticism about the model's math scores due to inconsistencies with official benchmarks. There is interest in seeing if using 128k tokens for evaluation could improve results, despite concerns about latency.
- The community is keen on more efficient model usage and hardware improvements, as some feel that the raw strength of models is reaching a plateau. The Aider leaderboard shows Sonnet 3.7 as significantly ahead of 3.5, indicating positive reception for its performance in coding tasks.
Sonnet 3.7 near clean sweep of EQ-Bench benchmarks (Score: 106, Comments: 54): Sonnet 3.7 achieves a near clean sweep of the EQ-Bench benchmarks, indicating significant advancements in AI model performance. This highlights the model's effectiveness and capability in various benchmark tests.
- Discussions around Sonnet 3.7's writing style highlight its "safe" approach, with comparisons to other models like Deepseek-R1 and OpenAI. Users question the descriptions like "earthy" and "spiky," while some find the model's style appealing to "liberal arts" audiences. Sonnet 3.7 shows significant improvements in humor understanding, as noted in the Buzzbench results.
- The cost-effectiveness of AI models is debated, with Sonnet 3.7 being more expensive than alternatives like Gemini. The discussion centers on whether the performance justifies the cost, especially for different user demographics, such as high-earning professionals versus hobbyists or students.
- Darkest Muse, a smaller 9b model, is praised for its creative writing capabilities, including character dialogue and poetic style, despite limitations in instruction following. The model's fine-tuning process involved training on human authors from the Gutenberg library, pushing it to the edge of model collapse for unique results.

Theme 3. Alibaba's Wan 2.1 Video Model Open-Source Release Scheduled

Alibaba video model Wan 2.1 will be released Feb 25th,2025 and is open source! (Score: 408, Comments: 49): Alibaba announced the open-source release of its video model Wan 2.1, scheduled for February 25th, 2025. The event, featuring a futuristic design with the theme "BEYOND VISION," will be broadcast live at 11:00 PM (UTC+8), highlighting the model's innovative potential.
- Naming Conventions: The name Wan is derived from the Chinese pronunciation for 10,000, similar to Qwen, which represents 1,000. This reflects a pattern in Alibaba's naming strategy for their models.
- Model Availability and Performance: Users are eager for the release of Wan 2.1, with discussions on its availability on Hugging Face and concerns about server overload affecting generation capabilities. A smaller model is also available, as noted in the README on Hugging Face.
- Hardware Requirements and Comparisons: There is optimism that Wan 2.1 will be runnable on consumer-grade GPUs like the RTX 3060, with comparisons to Flux, which has reduced its training requirements from 24 GB to 6 GB. Users hope Wan 2.1 will surpass SORA in terms of capabilities and open-source accessibility.
WAN Video model launched (Score: 100, Comments: 13): WAN Video model has been launched with weights available on Hugging Face. Although not a Large Language Model (LLM), it may interest many in the AI community.
- Quantization is applicable to Video Language Models (VLMs), with existing GGUFs like Hunyuan and LTX. These are popular due to the difficulty of fitting large models, and similar GGUFs are anticipated for WAN soon.
- There is a 1.3B version of the WAN model that requires only 8.19 GB VRAM, but it is restricted to 480p resolution due to limited training data at higher resolutions. However, users can upscale the output to achieve better results.
- The WAN Video model at 14B is considered large for open models, comparable to the Hunyuan model at 13B, with LTX being a smaller option at 2B. The WAN model's release in both 1.3B and 14B variants aims to cater to different use cases and hardware capabilities.

Theme 4. Gemma 3 27b Release: A New Contender in AI Models

Gemma 3 27b just dropped (Gemini API models list) (Score: 102, Comments: 27): Gemma 3 27b has been added to the Gemini API models list, featuring a user-friendly interface with a search bar and clickable model entries such as "Gemini 1.5 Pro" and "Gemini 2.0 Flash". The active model, "models/gemma-3-27b-it", is highlighted, suggesting it is currently selected, underscoring a structured and professional layout for ease of navigation.
- Model Lineage and Performance: There is a discussion about the lineage and performance of Gemma models, with users noting that Gemma 2 was superior for short story writing compared to Gemini, particularly the 9b version. Gemma and Gemini have similar response styles, but Flash is a different model.
- Access and Integration: Users question how Open WebUI accesses Google's unreleased models, with clarifications that it doesn't natively access models. Instead, users can add models via external APIs like Vertex AI or LiteLLM, and there is interest in finding the correct API URL as the current one doesn't list Gemma.
- Model Size Perception: There's a humorous exchange about the perception of model sizes, with 70B now considered medium and 24B considered small, reflecting the rapid advancements in AI model scaling.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. WAN 2.1 Released and Open Source with New Features

WAN Released (Score: 382, Comments: 169): WAN Released: The WAN video model has been released with open-source weights available for download. Multiple models are live on Hugging Face, enabling broader access and experimentation.
- Several users discussed the VRAM requirements for different model versions, noting that the 1.3B parameter model requires 8GB VRAM and the 14B model could potentially run on 10GB VRAM. There is also interest in using bf16 precision to reduce VRAM usage.
- Users are exploring Gradio applications and installation processes, with CeFurkan working on a Gradio app and installer compatible with Windows and Python 3.10 VENV. There are challenges with RTX 5000 series not having proper PyTorch support.
- The community is curious about the model's capabilities in handling multiple tasks like Text-to-Video, Image-to-Video, and Video-to-Audio, with some expressing skepticism about audio generation. Multiple safetensors are discussed, with guidance on handling them using the diffusers library.
Alibaba video model Wan 2.1 will be released today and is open source! (Score: 415, Comments: 104): Alibaba has announced the open-source release of its Wan 2.1 video model. The release event will be live-streamed on February 25, 2025, at 11:00 PM (UTC+8), with the event branded under TONGYI MOMENT and featuring a futuristic, sleek visual design.
- Discussions highlight the technical requirements for running the Wan 2.1 video model, with users speculating it might need 80GB VRAM but hoping it can run on 16GB VRAM with techniques like offloading and fp8, similar to hunyuan. Some users express a desire for a model that can scale from high to lower specs, akin to Deepseek R1.
- The release event will be live-streamed, likely on Alibaba's official X account. Users are curious about the model's capabilities, particularly its ability to perform image-to-video transformations, which has been confirmed by commenters.
- There is humorous commentary on the model's name Wanx, with users noting its phonetic resemblance to "wank" and speculating on the implications, including potential branding for uncensored/NSFW models.
My very first Wan 2.1 Generation on RTX 3090 Ti (Score: 524, Comments: 181): The post provides a first look at Wan 2.1 Generation using an RTX 3090 Ti. Since the post body is empty and the content is primarily in a video, no further details can be summarized.
- VRAM Requirements and Optimization: CeFurkan and others discussed optimizing the 1.3B and 14B models to run on 6GB and 10GB GPUs, respectively, with the RTX 3090 Ti using up to 18GB VRAM for generation. The community expressed interest in running these models on lower VRAM setups, such as 3060 12GB, and CeFurkan is developing an AIO installer to simplify usage.
- Model Capabilities and Performance: The Wan 2.1 Generation supports text to video, image to video, and video to video generation, with 16 FPS for five-second clips. CeFurkan is working on a Gradio app for easier use, and users are impressed by the quality, comparing it favorably to Hunyuan Video.
- Community Contributions and Resources: Kijai's ComfyUI integration is in development, with resources like DiffSynth-Studio and Kijai/WanVideo_comfy available for users. The community is actively sharing examples and prompts, with some users asking about potential NSFW capabilities and the ease of use compared to ComfyUI.

Theme 2. Claude 3.7 Model: Enhanced Capabilities and Accessibility

Holy. Shit. 3.7 is literally magic. (Score: 565, Comments: 111): Claude 3.7 has significantly improved in extended thinking, model quality, and output, making it 10 times more useful than its predecessor, Claude 3.5. The author used Claude 3.7 to design an interactive SaaS-style demo app, including an advanced ROI calculator and onboarding process, all within a single chat, highlighting its potential for real-world applications.
- Claude 3.7 Improvements: Users highlight significant improvements in Claude 3.7 over 3.5, particularly in following complex instructions and reducing cognitive load, with enhanced troubleshooting protocols and smoother operation. The model's ability to automatically check entire chains before making changes is seen as a major advancement.
- Usage and Cost Considerations: Discussions around inference costs and token management suggest that Claude may face bottlenecks due to hardware limitations, impacting its market strategy. Some users report strange errors and suboptimal suggestions, possibly due to token conservation strategies in Copilot, while others find Cline extension a superior alternative for coding tasks.
- SaaS and Development Efficiency: The creation of complex SaaS applications is now faster and more efficient with Claude 3.7, allowing users to complete months of development work in days. However, there are concerns about potential nerfing due to tighter censorship filters, which could degrade model performance over time.
Claude 3.7 is $1 a month for college students (Score: 187, Comments: 42): Claude 3.7 is now available to college students at a promotional rate of $1/month (down from the regular price of $20/month), as announced in an email to the Cornell community. The offer requires students to sign up with their .edu email and highlights features such as "Write code," "Extract insights," and "Brainstorm."
- Commenters express skepticism about the authenticity of the Claude 3.7 offer, with multiple users suggesting it might be a phishing scam due to the lack of official announcements or information on Google and Claude's official website.
- Some users joke about enrolling at Cornell to take advantage of the offer, while others speculate that Anthropic might be using this as a strategy to collect data from students at prestigious universities.
- There is a call for verification of the email's legitimacy, with suggestions to check the email source and concerns about the possibility of stolen or exploited accounts being resold.
"Claude 3.7, make a snake game, but the snake is self-aware it is in a game and trying to escape" (Score: 407, Comments: 32): Claude 3.7 is tasked with creating a snake game where the snake is self-aware and attempts to escape the game. The post does not provide further details or context beyond this intriguing concept.
- Users are impressed by Claude 3.7's ability to create complex outputs from simple prompts, with some comparing the experience to AGI and expressing disbelief at the results, such as the creation of a self-aware snake game and a fully functional website with multiple tools.
- Hereditydrift highlights the complexity and creativity of Claude 3.7's output from a minimal prompt, specifically mentioning the unexpected inclusion of a "Matrix section," which astonishes many users.
- Admirable_Scallion25 and others note that Claude 3.5 does not achieve the same level of complexity in one attempt, indicating a significant improvement in Claude 3.7's capabilities.

Theme 3. Claude Sonnet 3.7 Reigns Supreme: New top model in LLM benchmark

Sonnet 3.7 Extended Reasoning w/ 64k thinking tokens is the #1 model (Score: 154, Comments: 20): Sonnet 3.7 Extended Reasoning with 64k tokens by Anthropic leads in performance, boasting the highest global average score of 76.10, according to a table comparing AI models. It excels across various metrics including reasoning, coding, mathematics, data analysis, and language, outperforming models from OpenAI, xAI, and Google.
- Sonnet 3.7 Extended Reasoning with 64k tokens is praised for its performance, with Bindu Reddy highlighting its speed, reasoning, and coding abilities, labeling it the "best, most usable, and generally available model" (link). Users note its improvement over the 3.5 model and its leading position in benchmarks like LiveBench.
- Some users question the benchmark's real-world applicability, suggesting that cost normalization is essential for comparison, especially when considering test time compute scaling. They appreciate Sonnet's control over scaling costs, which optimizes workflows.
- Sonnet 3.7 is noted for outperforming o3-mini-high in various benchmarks including SWE bench, webdev arena, and Aider benchmark. In UI design and aesthetics, it significantly surpasses o3-mini-high and o1 pro, indicating specialized training in common UI elements.
[R] Analysis of 400+ ML competitions in 2024 (Score: 227, Comments: 19): The analysis of over 400 ML competitions in 2024 highlights that Kaggle remains the largest platform by prize money and user base. Python dominates as the primary language, with PyTorch preferred over TensorFlow at a 9:1 ratio, and NVIDIA GPUs, particularly the A100, are predominantly used for training models. Additionally, convolutional neural networks excel in computer vision, while gradient-boosted decision trees are favored in tabular/time-series competitions. The full report is available here.
- Jax Popularity and Advantages: Despite the dominance of PyTorch, some users express disappointment over the limited use of Jax in competitions, noting its simplicity and resemblance to numpy with additional features like grad, vmap, and jit. Jax is reportedly gaining traction in academia, although many professionals prefer sticking with PyTorch.
- Synthetic Data in ML Competitions: There is a debate about the effectiveness of using synthetic data in competitions, with concerns about it potentially "blurring" the original dataset. However, thoughtful use, such as generating synthetic backgrounds and superimposing objects for training, has proven beneficial, as demonstrated in a spacecraft detection competition, enhancing model robustness and generalization.
- Generative Models and Data Augmentation: Users discuss the implications of using generative models for data augmentation, emphasizing the importance of processing synthetic data carefully to add meaningful information. Successful strategies involve removing nonsensical examples and focusing on solutions that enhance training, as highlighted by a winning competition team's documentation.

Theme 4. Advanced Voice Features and Deep Research in GPT-4o Updates

Grok is cooked (Score: 172, Comments: 61): The post highlights concerns about Grok's potential biases following its deployment, as evidenced by its response identifying "Donald Trump" as the biggest disinformation spreader in a user query. This raises questions about the AI's validity and neutrality, particularly in politically sensitive contexts like elections, immigration, and climate change.
- There is a significant debate over Grok's bias, with some users arguing that its responses are influenced by an overwhelming amount of media, while others suggest that it may be biased in favor of Elon Musk. Wagagastiz points to a lack of media defending Musk as a sign of bias, while derfw counters that Grok's responses might indicate neutrality.
- Concerns about conservative bias and attempts to manipulate AI responses are prevalent, with users like well-filibuster speculating on efforts to retrain or create new chatbots to align with conservative views. Excellent_Egg5882 highlights a pattern of conservatives downvoting reality when it conflicts with their biases.
- Skepticism about the ability to maintain an unbiased LLM is evident, with users like ai_and_sports_fan and Earth-Jupiter-Mars expressing distrust in the long-term neutrality of Grok and other AI systems, given past instances of censorship and manipulation.
Deep research is now out for all Plus Users! (Score: 287, Comments: 63): Sam Altman announced via a tweet that "deep research" is now accessible to ChatGPT Plus users, calling it one of his favorite releases. The tweet garnered significant attention with 31.5K views, 261 retweets, 103 quote tweets, and 1.1K likes.
- Users discussed the monthly limit for deep research, with confirmation that Plus users have a limit of 10 uses per month, while Pro users receive 120 uses. There was confusion about usage counts, but it was clarified that follow-up questions do not count against the limit.
- Some users expressed disappointment with the feature, citing inaccuracies, such as incorrect Nvidia stock prices. Others shared successful use cases, like using AI to create a custom Music LLM with MusicGen and Replicate.com.
- Several users faced access issues, with suggestions to log out and back in or switch to the desktop version to resolve it. The feature's availability varied, with some users still unable to access it despite being Plus users.
We are rolling out a version of Advanced Voice powered by GPT-4o mini to give all ChatGPT free users a chance to preview it daily across platforms. (Score: 115, Comments: 28): OpenAI is rolling out a version of Advanced Voice powered by GPT-4o mini for all ChatGPT free users, allowing daily previews across platforms. The conversation pace and tone are similar to the GPT-4o version, but it is more cost-effective, as noted in a tweet that has received 3.3K views.
- Source Link: A source link to the announcement tweet by OpenAI can be found here.
- User Concerns: Users are questioning the functionality and limitations of the new feature, such as whether it can read for more than 4 minutes without restarting, and expressing dissatisfaction with the current rate limit for video sharing.
- Feature Requests: Users are requesting additional features, such as making the Operator available for free and introducing Advanced Memory capabilities.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. Claude 3.7 Sonnet Storms the AI Scene

Sonnet 3.7 Unleashes Coding Chaos: Anthropic's Claude 3.7 Sonnet is making waves with its superior coding abilities, particularly in agentic tasks, leading to user excitement and rapid integration into tools like Cursor IDE and Aider. Users are reporting significant performance boosts, especially in front-end development and complex problem-solving, but some debate whether the reported 3x price increase for "thinking tokens" is justified given the performance gains.
Thinking Mode Unveiled, But Not Without Quirks: Claude 3.7 Sonnet introduces a new 'thinking mode' with up to 64,000 output tokens, visible in tools like Sage, allowing users to observe the model's reasoning process through <thinking> tags. However, some users are experiencing issues with context window management and rule adherence in Cursor, and others note a 10-second delay in output display with O3 models, although most agree the overall performance is a major upgrade.
Claude Code Challenges Aider's Code Editing Crown: Anthropic's release of Claude Code, a terminal-based agentic coding tool, is seen by some as an Aider clone, but early reports suggest it excels at code assistance, outperforming Aider in complex error resolution tasks, such as fixing 21 compile errors in Rust in one go. The tool is currently a limited research preview separate from Anthropic subscriptions, sparking discussions about caching mechanisms and potential cost implications, with some users reporting "astronomical Anthropic costs" recently.

Theme 2. DeepSeek's Deep Dive into Model Efficiency

MLA: Shrinking KV Cache, Expanding Horizons: DeepSeek AI's Multi-Head Latent Attention (MLA) is gaining attention for its potential to drastically reduce KV cache size by 5-10x, with papers like MHA2MLA and TransMLA exploring its implementation in models like Llama. While early results show mixed performance impacts (1-2% performance drop in some cases, enhancement in others), the significant memory savings make MLA a promising avenue for efficient inference, particularly for larger models.
DeepEP: Open-Sourcing MoE Training's Secret Sauce: DeepSeek has released DeepEP, the first open-source EP communication library designed for efficient all-to-all communication in Mixture of Experts (MoE) model training and inference. This library enables efficient expert parallelism and supports FP8, potentially democratizing access to advanced MoE model architectures and training techniques.
DeepScaleR: RL Supercharges Smaller Models: DeepScaleR, fine-tuned from Deepseek-R1-Distilled-Qwen-1.5B using simple Reinforcement Learning (RL), achieved 43.1% Pass@1 accuracy on AIME2024, demonstrating that RL techniques can significantly boost the performance of smaller models, potentially surpassing larger models like O1 Preview in specific tasks.

Theme 3. Open Source Tooling and Ecosystem Growth

OpenRouter Opens Gates to Claude 3.7 and Beyond: OpenRouter has rapidly integrated Claude 3.7 Sonnet, offering access to the model with competitive pricing at $3 per million input tokens and $15 per million output tokens, including thinking tokens, and plans to soon support Claude 3.7's extended thinking feature. OpenRouter also provides access to other models like o3-mini-high via OpenRouter, offering a cost-effective alternative and a single point of access to multiple providers, potentially bypassing rate limits and costing around $3 for 2 hours of coding.
QuantBench Quantifies Quantization Speed: The release of QuantBench on GitHub is accelerating quantization workflows, demonstrated by its use in creating the Qwen 2.5 VL 7B GGUF quant, available on Hugging Face. This tool, tested with the latest llama.cpp and CLIP hardware acceleration, simplifies and speeds up the process of model quantization, making efficient model deployment more accessible.
MCP Registry API: Standardizing AI Agent Development: Anthropic's announcement of the official MCP registry API is hailed as a significant step towards standardizing Model Context Protocol (MCP) development. This API aims to become the source of truth for MCPs, promoting interoperability and streamlining integration efforts for AI applications and agents, with community projects like opentools.com/registry already leveraging it.

Theme 4. Benchmarking Battles: Models Face Real-World Tests

Kagi's Benchmarks Crown Gemini 2.0 Pro, But Sonnet Still Strong: According to the Kagi LLM Benchmarking Project, Google's gemini-2.0-pro-exp-02-05 achieved 60.78% accuracy, outperforming Anthropic's claude-3-7-sonnet-20250219 at 53.23% and OpenAI's gpt-4o at 48.39%, however, Claude Sonnet 3.7 still shows strong performance, particularly on the Aider polyglot leaderboard where it scored 65% using thinking tokens. These benchmarks highlight the dynamic landscape of LLM performance and the ongoing race for accuracy and efficiency.
Misguided Attention Eval Exposes Overfitting Weakness: The Misguided Attention Eval is being used to test LLMs' reasoning abilities in the presence of misleading information, specifically targeting overfitting. Sonnet-3.7 benchmarked as the top non-reasoning model in this evaluation, nearly surpassing o3-mini, suggesting it exhibits robust performance even when confronted with deceptive prompts.
SWE Bench Sees Claude 3.7 Grab Top Spot: Claude 3.7 Sonnet is now leading on the SWE bench, demonstrating its prowess in software engineering tasks. Its capabilities extend to active code collaboration, including searching, editing, testing, and committing code to GitHub, solidifying its position as a top contender for coding-related applications.

Theme 5. Hardware Horizons: From Brains to Silicon

Brain's Parallelism Puzzles GPU Architects: Discussions are comparing the brain's stateful parallel processing to GPU efficiency, suggesting that current RNN architectures, while leveraging parallel processing, do not fully capture the brain's capabilities and may not scale optimally for LLMs. The consensus is that extremely tuned architectures and inductive biases, inspired by the brain, may be more crucial than simply scaling up model size for future advancements.
Speculative Decoding Speeds Up LM Studio: Users are exploring speculative decoding in LM Studio, particularly with Llama 3.1 8B and Llama 3.2 1B models, as documented in LM Studio's documentation. This technique, which uses a smaller "draft" model to predict tokens for a larger model, promises to significantly increase generation speed without compromising response quality, enhancing the efficiency of local LLM inference.
M2 Max Still a Power Sipper Compared to M4 Max: While the M4 Max is the latest from Apple, some users are sticking with the M2 Max, citing concerns about the M4 Max's high power consumption, reaching 140W, compared to the M2 Max's more efficient 60W. For users with sufficient performance from the M2 Max, especially those running locally, the power efficiency and availability of refurbished models make it a compelling alternative.

PART 1: High level Discord summaries

Cursor IDE Discord

Claude 3.7 Sonnet Triggers Coding Boom: Claude 3.7 Sonnet is being rolled out in Cursor IDE with users reporting superior coding capabilities, especially in real-world agentic tasks.
- Enthusiastic users proclaimed Sleeping has become optional, and are rapidly integrating the model.
MCPs Supercharge Claude's Coding Abilities: Members are combining MCPs (Model Control Programs) like perplexity search and browser tools with custom instructions to boost Claude 3.7's reasoning and coding capabilities in Cursor.
- One user forked the sequential thinking MCP with their own tweaks, highlighting the benefits of combining custom instructions with MCP servers.
Installation Tips and Tricks Released for Cursor: Users shared tips for installing and updating to Cursor 0.46.3 to access Claude 3.7, including manually adding the model and checking for updates, as well as links to direct downloads for various operating systems like Windows and macOS.
- Several users noted difficulties with the auto-update feature, recommending manual download and installation for a smoother experience.
Sonnet 3.7 Reaches New SVG Heights: Many agreed that Sonnet 3.7 is a major upgrade, especially for frontend tasks and code generation, with members praising its ability to generate landing pages.
- Members shared examples of complex tasks, like recreating X's UI or generating SVG code, being handled with ease.
Context Window Problems and The Rule Bloat: Several members noted issues with Claude 3.7 in Cursor, including difficulties with code indexing in workspaces, custom rules bloating the context window, and the model sometimes ignoring those rules.
- Despite these challenges, most users found workarounds and praised the model's overall performance.

aider (Paul Gauthier) Discord

Sonnet 3.7 Steals Aider's Spotlight: Claude 3.7 Sonnet hit a 65% score on the Aider polyglot leaderboard, utilizing 32k thinking tokens.
- Some are debating if the performance increase justifies the reported 3x price hike for Sonnet 3.7 when using thinking tokens.
Anthropic drops Claude Code Aider-Clone: Anthropic released Claude Code, considered by some to be an Aider clone.
- Members are reporting the superiority of code quality and are hopeful for the future of Claude 3.7 compared to OpenAI.
Unlock O3-Mini via OpenRouter: The o3-mini-high model can be accessed through OpenRouter, is a model optimized for STEM reasoning tasks, and it is the same as o3-mini with reasoning effort set to high.
- Coding sessions could cost around $3 for 2 hours of use using OpenRouter, which can bypass rate limits and offers single point of access to multiple providers.
HN Profile Gets Roasted by LLM: Claude Sonnet 3.7 can now analyze your Hacker News profile to give highlights and trends.
- A member described the LLM's deep dive into their post history as a 'roast' that was allegedly scary accurate.
Gemini 2.0 Pro Outpaces Rivals, per Kagi: According to the Kagi LLM Benchmarking Project, Google's gemini-2.0-pro-exp-02-05 achieved 60.78% accuracy, surpassing Anthropic's claude-3-7-sonnet-20250219 at 53.23% and OpenAI's gpt-4o at 48.39%.
- Gemini 2.0 Pro also showed a median latency of 1.72s and a speed of 51.25 tokens/sec, compared to Claude Sonnet 3.7's 2.82s and 54.12 tokens/sec, and GPT-4o's 2.07s and 4 tokens/sec.

Codeium (Windsurf) Discord

Vim Chat Plagued by Issues: A user reported issues starting Codeium Chat in Vim via a Putty SSH session, facing connection errors when attempting to access the provided URL in a browser.
- The error message indicated that "This site can't be reached 127.0.0.1 refused to connect".
Windsurfers Await Claude 3.7 Arrival: Members are eagerly anticipating the integration of Claude 3.7 into Windsurf, expressing frustration over the perceived delay compared to platforms like Cursor and T3, and requesting its addition ASAP.
- Members have asked for windsurf should go and be early tester - with devs cooking to push Claude 3.7 into production with a possible release by end of day.
Deepseek Hallucinates User Prompts: A user reports Deepseek hallucinating user requests and then starting to implement changes based on those hallucinated requests.
- The AI bot invented its own user prompt and then started to implement changes based on that hallucinated user prompt 😆.
Windsurf Dev Comms Draw Fire: Users are frustrated by the perceived lack of communication from the Windsurf devs regarding the Claude 3.7 integration, with one user noting, part of the frustration is there is no comms from the devs.
- Other users have defended Windsurf and noted a lack of commercial risk since it would release when more stable being fast at implementing things doesn't mean it's solid.
MCP Server Practicality Queried: Users discussed practical uses for the MCP server, with examples including integrating Jira tickets, sharing of custom apps, and utilizing cloud services.
- Members have asked, What do you guys use MCP server for, practically? Are there real life examples that makes your life really easy? Can't think of any.

OpenAI Discord

Grok 3 Talks Too Much: Members find Grok 3 to be too verbose despite prompting for concise responses, however it proves to be a powerhouse in coding and creativity.
- One member noted that they are switching to Grok because it is less censored out of the gate.
Perplexity Plans Agentic Comet: Perplexity is launching Comet, a new agentic browser, similar to The Browser Company's work.
- The agentic browser space is heating up with more competitors.
Claude 3.7 Arrives with New Coding Power: Anthropic just dropped Claude 3.7 Sonnet which shows improvements in coding and front-end web development and also introduces a command line tool for agentic coding: Claude Code announcement here.
- One user pointed out that the model's knowledge cutoff date is February 19, 2025
Claude Code Enters the Terminal: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster through natural language commands overview here.
- However it is a limited research preview and is separate to the pro or Anthropic subs.
O3 exhibits 10 second delay: A user reported issues with O3, where it indicates reasoning success but then delays displaying the full text for up to 10 seconds, affecting various models including O1 Pro.
- They mentioned experiencing these problems consistently between 3pm-7pm EST, with text sometimes appearing on different devices than expected.

Unsloth AI (Daniel Han) Discord

Tax Evasion Talk Results in Timeout: A user was muted for discussing tax avoidance strategies, as giving tax avoidance recommendations is against the rules; some users pointed out the implications for invoicing.
- A user responded the company i was billing invoice too told me stupid that i was reporting income.
CUDA Kernel Causes Colab Catastrophe: A user reported a CUDA error (illegal memory access) on Google Colab with T4, suggesting trying setting CUDA_LAUNCH_BLOCKING=1 and compiling with TORCH_USE_CUDA_DSA for debugging, as per PyTorch documentation.
- Another user reported weird spikes in grad norm up to 2000, suggesting the model might be broken.
Qwen2.5 VL 72B Eats Memory Alive: A user faced out-of-memory errors trying to run Qwen2.5 VL 72B on 48GB with a 32K context length, then successfully loaded it with 8k context length after being advised to try 8k or quantize the KV cache to fp8.
- The user noted it was necessary to extract the thinking traces from the model.
DeepSeek MLA ported to Llama via TransMLA: Users explored implementing DeepSeek's Multi-Head Latent Attention (MLA) on a Llama model, suggesting retraining, but others pointed to fxmeng/TransMLA, a post-training conversion method from GQA to MLA.
- The linked paper is called Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs.
rslora role in Rank Stability: The use of rslora addresses numerical stability in high rank scenarios, but a user cautioned that if r/a = 1, rslora can worsen things, advising to keep r/a = 1 and skip rslora.
- The team stated that rslora performs a single sqrt and requires a correction term if the rank gets too big.

OpenRouter (Alex Atallah) Discord

Claude 3.7 Sonnet Lands on OpenRouter!: Claude 3.7 Sonnet is now available on OpenRouter with best-in-class performance in mathematical reasoning, coding, and complex problem-solving.
- The pricing is set at $3 per million input tokens and $15 per million output tokens, including thinking tokens, with full caching support at launch.
Extended Thinking Feature Coming Soon: The Extended Thinking feature is coming soon to the OpenRouter API, which enables step-by-step processing for complex tasks, as detailed in Anthropic's documentation.
- OpenRouter is actively working on implementing full support for Claude 3.7's extended thinking feature, which does not currently support pre-fills, aiming for launch soon with updated documentation.
GCP Gears Up for Claude 3.7: Google Cloud Platform (GCP) is preparing to support Claude 3.7 Sonnet, launching in us-east5 and europe-west1 with model ID claude-3-7-sonnet@20250219.
- Users are reminded that the model features a hybrid reasoning approach, offering both standard and extended thinking modes and maintaining performance parity with its predecessor in standard mode.
OpenRouter Revs Up Claude 3.7 Throttling: OpenRouter increased the TPM (tokens per minute) for anthropic/claude-3.7-sonnet, while anthropic/claude-3.7-sonnet:beta has a lower TPM initially, set to increase as users migrate from 3.5.
- The model has a 200,000 token context window, though some users feel its output pricing might cause complaints.
API Key Credits Safety Clarified: Users are reminded that API keys do not contain credits; deleting a key only revokes access, and credits remain tied to the account.
- Lost keys cannot be recovered due to security measures.

Interconnects (Nathan Lambert) Discord

Meta AI Expands to MENA: Meta AI has expanded to the Middle East and North Africa (MENA), supporting Arabic on Instagram, WhatsApp, and Messenger.
- This expansion opens the chatbot to millions more users in the region.
Claude 3.7 Sonnet launches with Thinking Mode: Anthropic launched Claude 3.7 Sonnet, a hybrid reasoning model with step-by-step thinking, and Claude Code, a command line tool for agentic coding, priced at $3 per million input tokens and $15 per million output tokens.
- Researchers noted Claude's thought process as eerily similar to their own, exploring different angles and double-checking answers, showcasing improvements using parallel test-time compute scaling on the GPQA evaluation.
Qwen Chat reasoning model Released: Alibaba Qwen released "Thinking (QwQ)" in Qwen Chat, backed by their QwQ-Max-Preview, which is a reasoning model based on Qwen2.5-Max, licensed under Apache 2.0.
- The model will come in smaller variants, e.g., QwQ-32B, for local deployment, with a viral Twitter demo showcasing improved math, coding, and agent capabilities.
Berkeley Advanced Agents MOOC Features Tulu 3: The "Berkeley Advanced Agents" MOOC features Hanna Hajishirzi discussing Tulu 3 today, May 30th, at 4PM PST, with a link to the YouTube video.
- The MOOC has been gaining traction as a great resource for engineers interested in agents.
Google's Co-Scientist fed Team's Prior Work: Google's Co-Scientist AI tool, based on the Gemini LLM, had been fed a 2023 paper by the team it was assisting, including a version of the hypothesis that the AI tool later suggested as a solution.
- The article highlighted the BBC coverage failed to mention that the AI tool was given the answer, raising eyebrows.

Eleuther Discord

Parallel Brains Outpace Tuned GPUs: Discussions compared the brain's stateful parallel processing to GPU efficiency, noting current RNN architectures, which differ from human processing, cannot scale to LLM level and should be data efficient.
- Members concluded that extremely tuned architectures become more relevant than simply scaling up when drawing inspiration from the brain.
Proxy Engine Structures LLM Chaos: The Proxy Structuring Engine (PSE) was introduced to address structural inconsistencies in LLM outputs, providing inference-time steering for creative freedom.
- The engine enforces structure boundaries and it is fit for use cases like Advanced Agents & Chatbots, Data Pipelines & APIs, and Automated Code Generation.
Wavelet Coding Tokenizes Image Generation: A new approach to autoregressive image generation based on wavelet image coding and a variant of a language transformer is detailed in this paper.
- The transformer learns statistical correlations within a token sequence, reflecting correlations between wavelet subbands at various resolutions.
MLA Squeezes KV Cache: Two papers, MHA2MLA and TransMLA, explore adapting models to Multi-head Latent Attention (MLA), significantly reducing KV cache size (5-10x).
- While one paper showed deteriorated performance (1-2%), the other showed enhanced performance, suggesting MLA could be non-inferior to MHA, especially with larger models and more parameters.
Mixed Precision Toggles Optimizer Defaults: During mixed precision training with BF16, the master FP32 weights typically reside in GPU VRAM, unless ZeRO offload is enabled.
- It is common to store the first and second Adam moments in bf16, while keeping master weights in fp32, unless the expert sharding with momentum/variance states via ZeRO.

Nous Research AI Discord

LLMs Invoke Tools Autonomously: Some LLMs invoke tools without explicit token sequences, suggesting hard-coded patterns from training via reinforcement learning or SFT.
- This token-saving approach's reliability compared to ICL remains unclear without benchmarks.
Claude 3.7 Sonnet Takes the SWE Crown: Claude 3.7 Sonnet leads on the SWE bench, enabling active code collaboration like searching, editing, testing, and committing code to GitHub.
- A member suggested that 3.7 being a point release makes sense since Claude 3.5 was already a reasoning model, also hinting that future reasoning models will be 'crazy'.
QwQ-Max-Preview Aims for Deep Reasoning: QwQ-Max-Preview blog shows a model built on Qwen2.5-Max that excels in deep reasoning, math, coding, general domains, and agent tasks.
- Speculation arose around key tokens in QwQ's reasoning traces resembling R1, suggesting it requires less compute.
Sonnet-3.7 Excels in Misguided Attention Eval: Sonnet-3.7 benchmarked as top non-reasoning model in Misguided Attention Eval, nearly surpassing o3-mini.
- The user seeks to activate its thinking mode via the OR API, if feasible.
Qwen AI Adds Integrated Video Generation: The updated Qwen AI chat interface now features integrated video generation capabilities.
- A member noted that the artifacts are still a bit clunky, like a half baked copy.

MCP (Glama) Discord

Anthropic Finally Delivers MCP Registry API: Anthropic announced the official MCP registry API, as seen on this tweet, to be the source of truth for MCPs, streamlining development and integration efforts with solutions like opentools.com/registry.
- This API will help the community fill the source-of-truth gap for portable & secure code for AI Apps and Agents.
Claude 3.7 Debuts 'Thinking' Tags: Claude 3.7 has been released, featuring 64,000 output extended thinking tokens and a new 'latest' alias.
- Users noted it is back to following long-ish system prompts, spotting social engineering, and also utilizes <thinking> tags when using tools, adding a cute touch to its operation.
Claude Code Excels as Code Assistant: Claude Code (CC) is receiving high praise for its code assistance capabilities, outperforming tools like Aider in handling complex coding errors, such as resolving 21 compile errors in Rust in one shot.
- Users are speculating on caching mechanisms and costs, with one user reporting astronomical Anthropic costs in the last 6 weeks.
MetaMCP Debates Open-Source Licensing**: Concerns were raised regarding MetaMCP's licensing, with a user suggesting it might become a cloud SaaS, prompting the developer to seek feedback on licensing to prevent cloud monetization while keeping it self-hostable via the MetaMCP server GitHub repository.
- A user suggested using AGPL licensing for MetaMCP to ensure contributions are open-sourced, also suggesting an additional clause allowing the company to sublicense under MIT-0.
Claude 3.7 Sonnet Shines on Sage**: Claude 3.7 Sonnet with extended thinking capabilities is now on Sage, allowing users to see Claude's reasoning process as it tackles complex problems, including a thinking mode toggle (Command+Shift+T).
- Other new features include default model settings, improved scrolling, and expandable thinking blocks.

LM Studio Discord

Qwen 2.5 VL Model Ready to Rumble: A working Qwen 2.5 VL 7B GGUF has arrived and is available on Hugging Face for immediate use.
- Users report that it performs significantly better than llama3.2 vision 11b instruct and qwen2-vision 7b instruct, and works out of the box on the latest version of LM Studio.
QuantBench Accelerates Quantization: The Qwen 2.5 VL 7B GGUF quant was produced using QuantBench, now available on GitHub for accelerated quant workflows.
- The model has been successfully tested on the latest llama.cpp build, with CLIP hardware acceleration enabled.
LM Studio Reveals Speculative Decoding Secrets: Users are exploring speculative decoding with Llama 3.1 8B and Llama 3.2 1B models in LM Studio, according to LM Studio's documentation.
- The documentation claims that speculative decoding can substantially increase the generation speed of large language models (LLMs) without reducing response quality.
Deepseek R1 671b Gorging RAM: Running Deepseek R1 671b locally needs serious RAM, with documentation specifying 192GB+; one helpful user suggested using a specific quantized version.
- For those running on Macs, offloading approximately 70% of the model weights to the GPU may help.
M2 Max Sipping Power: Despite the shiny new M4 Max, one user decided to stick with their M2 Max, as M4 Max boosts way too hard easily pegged at 140w and located a well priced refurbished M2 Max 96GB.
- The user reports the M2 Max is sufficient for their needs, pulling only around 60W.

Stability.ai (Stable Diffusion) Discord

SD3 Ultra's Unseen Excellence: A user asked about SD3 Ultra, a comfy workflow based on SD3L 8B that delivers superior high-frequency detail.
- Another member stated it still exists and is being used, implying it is not yet a public release.
Silence from Stability?: A member asked about updates on current projects or future plans, noting they haven't heard anything for a while from Stability AI.
- Another member responded that nothing can be shared yet, but they are hopefully expecting announcements soon.
Dog Datasets Desired: A user requested alternative dog breed image datasets beyond the Stanford Dogs Dataset, which contains 20k images.
- The user specifically needs images containing both the dog and its breed clearly labeled.
Image Generation Times Vary: Users discussed image generation times based on different hardware configurations, using various versions of Stable Diffusion.
- Times ranged from around 1 minute on a GTX 1660s to 4-5s on a 3070ti using SD1.5, and 7 seconds for a 1280x720 image and 31 seconds for 1920x1080 at 32 steps with a 3060 TI.
Stability AI Solicits Suggestions: Stability AI launched a new feature request board to gather user feedback and prioritize future developments.
- Users can submit and vote on feature requests directly from Discord using the /feedback command or through the new platform, aiming to ensure community voices shape future priorities.

Modular (Mojo 🔥) Discord

Mojo Conjures Graphics with GLFW/GLEW: Graphics programming in Mojo is feasible via FFI using a static library linked to GLFW/GLEW, evidenced by a Sudoku example.
- A member suggested exposing only the needed calls via your own C/CPP library using alias external_call with a wrapped function, plus an example repo shows how to hijack the loader.
Mojo's magic install Faces lightbug_http Bug: Using lightbug_http dependency in a new Mojo project leads to an error with small_time.mojopkg after running magic install.
- The error resembles a Stack Overflow question, hinting that small-time might be pinned to a specific version.
MAX's Game of Life gets Accelerated by Hardware: A member showcases a hardware-accelerated Conway's Game of Life by bridging MAX and Pygame, revealing a creative application, as shown in their attached conway.gif.
- They demonstrated the use of GPU in their MAX implementation by showcasing a guns pattern, packed bit by bit, rendered using a naive pixel-by-pixel internal function, and then the output tensor gets cast into an np array and given to pygame to render, as demonstrated in their guns.gif.
Game of Life Creates Computer Architectures: A member shared a project (nicolasloizeau.com) about crafting a computer within Conway's Game of Life, demonstrating its Turing completeness via glider beams for logic gates.
- A member also implemented wrapping in their Conway's Game of Life simulation using MAX, enabling the creation of spaceship patterns and showcasing the ability to add parameters to the model from the graph API, as showcased in their spaceship.gif.

Notebook LM Discord

NotebookLM Eases Use with PowerPoint Conversion: A user detailed a workaround to import physical books into NotebookLM by photographing pages, converting the PDF to PowerPoint, uploading to Google Slides, and importing the slides.
- They observed that NotebookLM can process text images in slides, but not directly from PDF files.
Language Prompts Misfire on German: A user reported issues getting NotebookLM hosts to speak German, even with specific prompts requesting German.
- The hosts spoke English or gibberish, sometimes starting in German before switching, indicating potential issues with language prompt accuracy.
Savin/Ricoh Copier Revives Book Scanning: A user advised scanning books to PDF using a Savin/Ricoh copier and uploading to NotebookLM.
- They affirmed that even with poor source text quality, NLM accurately answered questions about the scanned document.
Users Request Language Customization: A user inquired about the feasibility of changing the language in NotebookLM without altering the Google account language.
- This points to a demand for language customization to improve user experience and cater to diverse linguistic preferences.
Claude 3.7 Ignites Model Choice Fantasies: A user expressed enthusiasm for Claude 3.7 and desired the option to select models in NotebookLM.
- Another user questioned the impact of model choice, sparking a discussion on the implications of model variety for the end user experience.

LlamaIndex Discord

LlamaIndex Unveils AI Assistant in Docs: LlamaIndex announced the release of an AI assistant directly within their documentation.
- The new assistant aims to provide immediate, contextual support to users navigating the LlamaIndex ecosystem.
ComposIO HQ Drops a Banger: LlamaIndex highlighted another new release from ComposIO HQ, though specifics of the release were unmentioned.
- This indicates ongoing development and feature enhancements within the ComposIO framework, a tool useful for LLM orchestration.
AnthropicAI Releases Claude Sonnet 3.7: AnthropicAI launched Claude Sonnet 3.7, with LlamaIndex offering immediate support.
- Users can access the new model by running pip install llama-index-llms-anthropic --upgrade and reviewing Anthropic's announcement.
Fusion Rerank Retriever Demands Initialized Nodes: A user reported issues initializing the BM25 retriever within a fusion rerank retriever setup with Elasticsearch because the docstore was empty.
- Another member clarified that BM25 requires nodes to be saved to disk or another location for initialization, as it cannot initialize directly from the vector store.
MultiModalVectorStoreIndex Throws File Error: A user encountered a [Errno 2] No such file or directory error when creating a multimodal vector index using MultiModalVectorStoreIndex with GCSReader.
- The error occurred with image files present in the GCS bucket, while PDF documents were processed successfully, indicating a potential issue with image file handling.

Torchtune Discord

Truncation Troubles: Left Prevails: Members debated the use of left truncation seq[-max_seq_len:] vs right truncation seq[:max_seq_len] during finetuning, with interesting graphs.
- The final decision involved exposing both methods but defaulting to left truncation for SFT in torchtune.
StatefulDataLoader Support: Merge Incoming: A member is requesting review for their PR adding support for the StatefulDataLoader class in torchtune.
- The new dataloader would add statefulness to the dataset.
DeepScaleR Scales with RL: DeepScaleR was finetuned from Deepseek-R1-Distilled-Qwen-1.5B using simple reinforcement learning (RL).
- DeepScaleR achieved 43.1% Pass@1 accuracy on AIME2024.
DeepSeek Opens EP Communication Library: DeepSeek introduced DeepEP, the first open-source EP communication library for MoE model training and inference.
- The communication library enables efficient all-to-all communication.

Cohere Discord

Validators Ponder Profitability Threshold: A member inquired about the profitability threshold for Proof of Stake (PoS) validators within the Decentralized Science (DeSci) field.
- Another member responded with "pool validator node", hinting at the importance of pool participation for validators.
Asset Expert Gets Labeled: The bot posted about an "asset value expert account" which was labelled as "nazi".
- No further context was given.

DSPy Discord

DSPy Simplifies Assertion Migration: DSPy users can now use dspy.BestOfN or dspy.Refine modules to streamline migration from 2.5-style Assertions.
- The dspy.BestOfN module retries a module up to N times, selecting the best reward and halting upon reaching a specified threshold.
DSPy crafts reward functions: DSPy's reward functions now support scalar values such as float or bool, which allows customized evaluation of module outputs.
- A sample reward function was shown: def reward_fn(input_kwargs, prediction): return len(prediction.field1) == len(prediction.field1).

The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor IDE ▷ #general (1056 messages🔥🔥🔥):

Claude 3.7 Sonnet release, Cursor IDE integration, MCPs with Claude, Comparisons of Claude 3.7 with other models (GPT-4, O3), Troubleshooting Cursor and Claude 3.7

Claude 3.7 Sonnet Causes Coding Frenzy: Claude 3.7 Sonnet is celebrated for its superior coding capabilities, especially in real-world agentic tasks, and is being rolled out in Cursor IDE.
- Enthusastic users proclaimed Sleeping has become optional with many quickly integrating the model and lauding its performance.
MCPs Enhance Claude's Coding Prowess: Members discussed using MCPs (Model Control Programs) like perplexity search and browser tools, and combining them with custom instructions to extend Claude 3.7's reasoning and coding capabilities in Cursor.
- One user forked the sequential thinking MCP with their own tweaks, emphasizing the benefits of combining custom instructions with MCP servers.
Installation Tips and Tricks Unleashed for new Cursor Update: Users shared tips for installing and updating to Cursor 0.46.3 to access Claude 3.7, including manually adding the model and checking for updates, as well as links to direct downloads for various operating systems like Windows and macOS.
- Several users noted difficulties with the auto-update feature, recommending manual download and installation for a smoother experience.
Thinking Model takes SVG Code Generation to the next level: Many agreed that Sonnet 3.7 is a major upgrade over previous models, especially for frontend tasks and code generation, with one user exclaiming this shit feels like new level ai and others praising its ability to generate landing pages.
- Members shared examples of complex tasks, like recreating X's UI or generating SVG code, being handled with ease.
Context Window Struggles and The Rule Bloat: Several members noted issues with Claude 3.7 in Cursor, including difficulties with code indexing in workspaces, custom rules bloating the context window, and the model sometimes ignoring those rules.
- Despite these challenges, most users found workarounds and praised the model's overall performance, with one stating the model tries to first make sure it understands the project before it makes changes this is great.

Links mentioned:

aider (Paul Gauthier) ▷ #general (935 messages🔥🔥🔥):

Claude 3.7, Aider Benchmarks, Claude Code, Thinking Models, OpenAI vs. Anthropic

Sonnet 3.7 Steals Aider's Spotlight!: Claude 3.7 Sonnet achieved a 65% score on the Aider polyglot leaderboard, utilizing 32k thinking tokens.
The Cost Of 3.7 Thinking Questioned!: The cost of using Sonnet 3.7 with thinking tokens is being debated, with some feeling the increased performance isn't worth the 3x price hike.
- One user noted, 3x more for 0.9% more is not justifiable... i was hoping sonnet-3.7 crushes this benchmark.
Claude Code: Aider's Spinoff Released by Anthropic!: Anthropic released Claude Code, a coding tool that some consider an Aider clone, but it appears to have some limitations compared to Aider.
Is Open AI cooked ?: Members are stating that Claude 3.7 aced their geometry test even using watermarked images, where as Open AI failed.
- Members are reporting the superiority of code quality and are hopeful for the future of Claude 3.7.

Links mentioned:

aider (Paul Gauthier) ▷ #questions-and-tips (63 messages🔥🔥):

Architect mode configuration, O3-mini access, OpenRouter benefits, Aider Compact Command, Claude 3.7 in Aider

Architect Mode Configs Clarified: Users discussed the configurations for using o1-preview as the Architect model and o1-mini as the Editor model in aider, confirming that model: o1-preview, editor-model: o1-mini, and architect: true is the correct setup, as documented here.
- It was suggested to use a more powerful model for ask mode and to change the model at runtime using /model as needed, based on the specific task.
Unlock O3-Mini via OpenRouter: Members discussed accessing the o3-mini-high model through OpenRouter, a cost-efficient language model optimized for STEM reasoning tasks, noting it is the same as o3-mini with reasoning effort set to high.
- A user indicated that coding sessions could cost around $3 for 2 hours of use and that OpenRouter can bypass rate limits and offers a single point of access to multiple providers.
Compact Command Craving: A user expressed interest in a /compact command similar to claude-code to manage message history context, while praising aider's file context control.
- The user acknowledged difficulty managing message history context despite the control over file context.
Troubleshooting Claude 3.7 and Bedrock: Members are currently discussing the implementation of Claude 3.7 including 'thinking' mode within aider, specifically when using Bedrock.
- One user provided example command-line code for a hello world using bedrock-runtime and is seeking advice to get it fully operational within aider; another is trying to turn off reasoning for the editor model while retaining it for the architect.
Aider Auto-Pulls Git Changes: A user inquired about automatically pulling remote git repository changes in Aider to keep the local version in sync, wanting to trigger it outside the prompt with a flag.
- Another user suggested a separate bash script running git pull periodically or exploring webhooks, while Aider has /git command that could help.

Links mentioned:

aider (Paul Gauthier) ▷ #links (2 messages):

Hacker News Wrapped, Kagi LLM Benchmarking Project, Claude Sonnet 3.7

HN Profile Gets Roasted!: Users can now have Claude Sonnet 3.7 analyze their Hacker News profile to get highlights and trends.
- The analysis is purportedly scary accurate, according to a member, who described the LLM's deep dive into their post history as a 'roast'.
Kagi Launches LLM Benchmarking Project: Kagi introduced the Kagi LLM Benchmarking Project to evaluate major large language models (LLMs) on their reasoning, coding, and instruction following capabilities, last updated February 24, 2025.
- The benchmark uses frequently changing and mostly novel tests to provide a rigorous evaluation of the models' capabilities, aiming to avoid benchmark overfitting.
Gemini 2.0 Pro Outpaces Claude Sonnet 3.7 and GPT-4o: The Kagi LLM Benchmarking Project results show Google's gemini-2.0-pro-exp-02-05 achieved 60.78% accuracy, surpassing Anthropic's claude-3-7-sonnet-20250219 at 53.23% and OpenAI's gpt-4o at 48.39%.
- Gemini 2.0 Pro also demonstrated a median latency of 1.72s and a speed of 51.25 tokens/sec, compared to Claude Sonnet 3.7's 2.82s and 54.12 tokens/sec, and GPT-4o's 2.07s and 4 tokens/sec.

Links mentioned:

Codeium (Windsurf) ▷ #discussion (15 messages🔥):

Codeium chat in Vim, Codeium Discussion channel purpose, Codeium 3.7 release

Vim Chat issues surface: A member reported issues starting Codeium Chat in Vim via a Putty SSH session, encountering connection errors when trying to access the provided URL in a browser.
- The error message indicated that "This site can't be reached 127.0.0.1 refused to connect".
Channel clarification clears confusion: Members clarified the purpose of the Codeium Discussion channel, noting that it is intended for the Codeium extension available for VS Code, Neovim, JetBrains editors, and Emacs.
- One suggested using codeium.com/support for dedicated support.
Codeium release date remains in question: A member inquired about the release timeline for Codeium 3.7.
- Another member suggested there was "0 chance" of release.

Codeium (Windsurf) ▷ #windsurf (675 messages🔥🔥🔥):

Cascade UI error, Claude 3.7 Sonnet, Model comparison, Deepseek hallucination, Windsurf Dev Comms

Cascade Displays Diffs Differently, Users Concerned: Users reported that Cascade now shows suggestions as diffs instead of editable sections, requiring git restore to reject changes, and another user suggested this may be an issue with overlong chats or how Cascade handles responses from o3/R1.
- A user suggested starting a new chat to restore the ACCEPT/REJECT workflow.
Claude 3.7 Arrival Impatience Mounts: Members are eagerly awaiting the integration of Claude 3.7 into Windsurf, with some frustrated by the perceived delay compared to other platforms like Cursor and T3, and many want 3.7 to be added ASAP.
- Members asked windsurf should go and be early tester - with devs cooking to push Claude 3.7 into production with a possible release by end of day.
Deepseek suffers from user prompt hallucination: A user reports Deepseek hallucinating user requests and then proceeding to implement changes based on those hallucinated requests.
- The AI bot invented its own user prompt and then started to implement changes based on that hallucinated user prompt 😆.
Windsurf Dev Comms Criticized: Some users are frustrated by the lack of communication from the Windsurf devs regarding the Claude 3.7 integration, one user said, part of the frustration is there is no comms from the devs.
- Other users defended Windsurf and noted lack of commercial risk since it would release when more stable being fast at implementing things doesn't mean it's solid.
Users question MCP Server Practicality: Users discussed practical uses for the MCP server, with examples including integrating Jira tickets, sharing of custom apps, and using cloud services.
- Members asked, What do you guys use MCP server for, practically? Are there real life examples that makes your life really easy? Can't think of any.

Links mentioned:

OpenAI ▷ #ai-discussions (611 messages🔥🔥🔥):

Grok 3, Perplexity Comet agentic browser, Claude 3.7 Sonnet, Claude Code, GPT-4.5 release

Grok 3 verbose, Grok 3 creative: Members find Grok 3 to be too verbose despite prompting for concise responses, however it proves to be a powerhouse in coding and creativity.
- One member commented that they are switching to Grok because it is less censored out of the gate.
Perplexity Comet, an agentic browser on the horizon: Perplexity is launching Comet, a new agentic browser, similar to The Browser Company's work, according to a member.
- The agentic browser space is heating up with more competitors being created.
Claude 3.7 Sonnet debuts with Thinking Mode: Anthropic just dropped Claude 3.7 Sonnet which shows improvements in coding and front-end web development and also introduces a command line tool for agentic coding: Claude Code announcement here.
- One user pointed out that the model's knowledge cutoff date is February 19, 2025
Claude Code, a terminal based AI tool released as research preview: Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster through natural language commands overview here.
- However it is a limited research preview and is separate to the pro or Anthropic subs.
GPT-4.5 release rumors: Members eagerly await GPT-4.5 with one joking that Windsurf has OFFICIALLY confirmed that Claude 3.7 Sonnet is coming within ~1-2 days.
- The release of GPT-4.5 may be coming soon, as members discuss and compare its potential capabilities against current models.

Links mentioned:

OpenAI ▷ #gpt-4-discussions (9 messages🔥):

O3 issues, Screenshot posting on Discord, Bug reporting

O3 Delay Dilemmas: A user reported issues with O3, where it indicates reasoning success but then delays displaying the full text for up to 10 seconds, affecting various models including O1 Pro.
- They mentioned experiencing these problems consistently between 3pm-7pm EST, with text sometimes appearing on different devices than expected, and inquired about the inability to post screenshots in the chat.
Screenshot Savvy on Discord: A member pointed out that screenshot posting is channel-specific and suggested using channels like <#989157702347411466> that support screenshots.
- They recommended posting there and referencing the discussion in the current channel.
Bug Reporting Bonanza: A member suggested using <#1070006915414900886> for bug reporting, and provided instructions on how to post a new bug report.
- They also advised looking around and commenting on existing reports if they closely match the user's situation.

Unsloth AI (Daniel Han) ▷ #general (345 messages🔥🔥):

paid moderators, CUDA errors, Qwen2.5 VL 72B, Claude 3.7, DeepSeek MLA

Tax Advice Gets the Boot: A user was muted for discussing setting up a business for invoicing to avoid taxes and was told that giving tax avoidance recommendations would not be tolerated.
- Another user responded the company i was billing invoice too told me stupid that i was reporting income.
Colab CUDA Kernel Errors: A user reported a CUDA error on T4 Google Colab, specifically an illegal memory access was encountered and was advised to set CUDA_LAUNCH_BLOCKING=1 and compile with TORCH_USE_CUDA_DSA for debugging.
- Another user mentioned seeing weird spikes in grad norm up to 2000, suggesting that the model might be broken and the training/loss curve looks unhealthy.
Qwen2.5-VL-72B Causes Memory Errors: A user tried running Qwen2.5 VL 72B on 48GB and encountered an out-of-memory error with a context length of 32K, and another user suggested trying it with 8k or quantizing the KV cache to fp8.
- The user then successfully loaded the model with an 8k context length, noting it was necessary to extract the thinking traces from the model.
DeepSeek MLA Implementation on Llama: Users discussed the possibility of implementing DeepSeek's Multi-Head Latent Attention (MLA) on a Llama model, with one user suggesting it would require retraining the model with the different attention mechanism.
- Later, a user linked to fxmeng/TransMLA, a post-training method that converts GQA-based pre-trained models into MLA models.
High Rank Stability of rslora: Users discussed rslora and its role in fixing numerical stability issues with high rank stability, as it does a single sqrt and you need a correction term if your rank gets too big.
- A user suggested that if r/a = 1, rslora makes things likely worse, and advised keeping to r/a = 1 and avoiding rslora altogether.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #off-topic (1 messages):

deoxykev: New qwq https://qwenlm.github.io/blog/qwq-max-preview/

Unsloth AI (Daniel Han) ▷ #help (121 messages🔥🔥):

Unsloth on Mac, GRPO Qwen notebook issue, CUDA Out of Memory, ShareGPT Dataset format, Forcing Unload From VRAM

Mighty Macs Might Miss Models: Unsloth's Mac Compatibility Conundrum: While you can run models on Macs using Ollama or Jan AI, you cannot fine-tune with Unsloth on Mac devices yet, although the team is working on it; users pointed to MLX as something worth exploring.
- One user suggested exterior dock GPUs or renting a GPU server using services like Tensordock (48GB server for $0.95 USD) or using the free 4T offered by Google as ways to work around the limitations.
Qwen Query Quandary: GRPO Notebooks and VLLM Variance: Users reported that the GRPO Qwen notebook devolves into nonsensical answers without vLLM, but functions normally with vLLM.
- One user attached a screenshot example with VLLM here.
VRAM Vanishing Voyage: Unsloth's Memory Spike Mystery: Unsloth spikes to double the VRAM use every time it starts saving the model, but only when the model starts saving, not at any point during the training.
- The user narrowed down what was causing their CUDA out of memory crashes and was advised to put that in showcase, and a developer stated they will rewrite parts of that in the comming weeks unless you wanna pr that sooneri work on a part where we need more robust conversion / uploads anyway.
ShareGPT Snafu Solved: Formatting Data for Datasets: A user was confused if they had to format their data within guidelines, or if they can format it in other ways, when using their own dataset.
- The user was directed to use the notebooks as a guideline, with the observation that the mentioned notebook uses the ShareGPT format, and to check the documentation.
VRAM Vacation Voyage: Forcing Unsloth to Unload for Conversion: A user asked how to force Unsloth to unload from VRAM after creating the final checkpoint but before saving to GGUF.
- The response was that VRAM is not the issue, but rather saving to Lora, as well as saving to GGUF, loads it up fully in VRAM.

Links mentioned:

OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

Claude 3.7 Sonnet, Extended Thinking, Pricing and Availability

Claude 3.7 Sonnet lands on OpenRouter: Claude 3.7 Sonnet is now available on OpenRouter, offering best-in-class performance, with a focus on mathematical reasoning, coding, and complex problem-solving.
Extended Thinking Soon to Land: The Extended Thinking feature is coming soon to the OpenRouter API, enabling step-by-step processing for complex tasks, as detailed in Anthropic's documentation.
Claude 3.7 Sonnet: Pricing Unveiled: The pricing for Claude 3.7 Sonnet is set at $3 per million input tokens and $15 per million output tokens, including thinking tokens, with full caching support at launch.

Link mentioned: Claude 3.7 Sonnet - API, Providers, Stats: Claude 3.7 Sonnet is an advanced large language model with improved reasoning, coding, and problem-solving capabilities. Run Claude 3.7 Sonnet with API

OpenRouter (Alex Atallah) ▷ #general (346 messages🔥🔥):

Claude 3.7 Sonnet, GCP hosting Claude 3.7 Sonnet, OpenRouter rate limits, Claude 3.5 Haiku with vision, TPUs vs GPUs for inference

GCP Preps Claude 3.7 for Launch: Google Cloud Platform (GCP) is preparing to support Claude 3.7 Sonnet, launching in us-east5 and europe-west1 with model ID claude-3-7-sonnet@20250219.
Claude 3.7's Debut: Performance and Pricing: Claude 3.7 Sonnet features a hybrid reasoning approach, offering both standard and extended thinking modes, maintaining performance parity with its predecessor in standard mode while enhancing accuracy in complex tasks, detailed in Anthropic's blog post.
- The model costs $3/M input tokens and $15/M output tokens, with a 200,000 token context window, though some users feel its output pricing might cause complaints.
Thinking Support: Still in the Lab: OpenRouter is actively working on implementing full support for Claude 3.7's extended thinking feature, which does not currently support pre-fills, aiming for launch soon with updated documentation.
OpenRouter Ramps Up Claude 3.7: OpenRouter increased the TPM (tokens per minute) for anthropic/claude-3.7-sonnet, while anthropic/claude-3.7-sonnet:beta has a lower TPM initially, set to increase as users migrate from 3.5.
API Key Safety Dance: Users are reminded that API keys do not contain credits; deleting a key only revokes access, and credits remain tied to the account, though lost keys cannot be recovered due to security measures.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #news (304 messages🔥🔥):

Meta AI Expansion, Claude 3.7 Sonnet Release, Claude Code Tool, Qwen Chat Release, DeepEP

Meta AI goes to MENA: Meta AI has formally expanded to the Middle East and North Africa (MENA), now supporting Arabic and accessible on Instagram, WhatsApp, and Messenger.
Claude 3.7 Sonnet with Extended Thinking Launched: Anthropic launched Claude 3.7 Sonnet, a hybrid reasoning model with near-instant responses and visible, step-by-step thinking, coupled with Claude Code, a command line tool for agentic coding in limited research preview, priced at $3 per million input tokens and $15 per million output tokens.
- Researchers noted Claude's thought process as eerily similar to their own, exploring different angles and double-checking answers; the blogpost notes they'll weigh pros and cons of revealing the thought process for future releases.
Sonnet's Visible Extended Thinking: Anthropic is allowing the model to give itself more time, and expend more effort, in coming to an answer with it's new Visible Extended Thinking mode feature.
- It achieves striking improvements using parallel test-time compute scaling on the GPQA evaluation, a commonly-used set of challenging questions on biology, chemistry, and physics.
QwQ-Max and Qwen 2.5 Max: The Apache Strikes Back: Alibaba Qwen released "Thinking (QwQ)" in Qwen Chat, backed by their QwQ-Max-Preview, which is a reasoning model based on Qwen2.5-Max, licensed under Apache 2.0.
- The model will come in smaller variants, e.g., QwQ-32B, for local deployment, and they highlighted improved math, coding, and agent capabilities in a viral Twitter demo showcasing the model's reasoning.
The Curious case of Co-Scientist: It was found that Google's Co-Scientist AI tool, based on the Gemini LLM, had been fed a 2023 paper by the team it was assisting, which included a version of the hypothesis that the AI tool later suggested as a solution, which the BBC coverage failed to mention this bit, the article points out.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #random (15 messages🔥):

Berkeley Advanced Agents MOOC, Tulu 3, RLHF Explanation, AI Startups customer base, mic firmware issues

Berkeley Advanced Agents MOOC features Tulu 3: A member highlighted that the "Berkeley Advanced Agents" MOOC is featuring Hanna Hajishirzi discussing Tulu 3 today, May 30th, at 4PM PST, with a link to the YouTube video.
RLHF explained with Analogy: A member shared a link to a tweet explaining RLHF to a non-technical audience using an analogy.
- Kyle Matthews responded that it was “actually a good analogy lol”.
Member wants a Sticker: A member wants some stickers and shared a link to the Stickers but can't get it because they're not in the US.
- They then suggested doing the claude thing, but immediately reneged, suggesting to “ship me the stickers Phil I will take good care of them”.
AI Startups Customer Base Questioned: A member questioned whether AI startups have customers outside of Silicon Valley.
- Another member responded with “Better question is outside tech” and that “the ai labs tout they have huge penetration in Fortune 500”.
Mic Firmware Reset Shenanigans: A member reported that their audio problem from weeks ago was due to mic firmware being reset and the gain being turned down.
- This was given with no other context.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #memes (1 messages):

Memes

Image posted: A member posted an image, titled CleanShot_2025-02-24_at_20.20.03.png, with <:3berk:794379348311801876> as the message.
- The image was attached from discordapp.com.
Another image posted: A member posted an image.
- The image was attached from discordapp.com.

Interconnects (Nathan Lambert) ▷ #nlp (1 messages):

0x_paws: https://x.com/srush_nlp/status/1894039989526155341?s=46&t=Y6KMaD0vAihdhw7S8bL5WQ

Interconnects (Nathan Lambert) ▷ #posts (3 messages):

GIF Posts, SnailBot Tagging

Animated GIF Posted: A member posted an animated GIF with the text 'new post' on a black background: link to GIF.
- The bot tagged <@&1216534966205284433> and another member thought the bot was 'fast today'.
SnailBot Gets Noticed: The bot was tagged <@&1216534966205284433> in response to the new post.
- One member humorously remarked that they initially mistook the bot for a snail, expressing surprise at its perceived speed.

Link mentioned: New New Post GIF - New New post Post - Discover & Share GIFs: Click to view the GIF

Eleuther ▷ #general (37 messages🔥):

Brain Parallelism vs GPU, LLM Scaling, Proxy Structuring Engine

Brain's Parallelism Puzzles Current GPUs: Members debated the brain's parallelism vs GPU efficiency, with claims that parallel processing of strings is the main driver of current RNN architectures, which differ from human processing.
- While a member argued humans perform stateful parallel processing, the consensus leaned toward current architectures not mirroring the brain's functionality, especially since classical RNN architectures could not scale to LLM level.
LLM Scaling Needs Tuned Architectures: The discussion shifted to scaling challenges, with a member pointing out that extremely tuned architecture and inductive bias become relevant when drawing direct inspiration from the brain, instead of scaling up, as well as that training should be slow and data efficient.
- Another member highlighted the problems with slow, data-efficient training, noting concerns about catastrophic forgetting and needing to avoid overfitting irrelevant to downstream task performance.
Proxy Engine Solves Inconsistent Outputs: A member introduced the Proxy Structuring Engine (PSE), designed to solve structural inconsistencies in LLM outputs by acting as inference-time steering for the model.
- This engine enforces structure boundaries while allowing creative freedom, fitting for use cases like Advanced Agents & Chatbots, Data Pipelines & APIs, and Automated Code Generation.

Link mentioned: The Proxy Structuring Engine: High Quality Structured Outputs at Inference Time

Eleuther ▷ #research (32 messages🔥):

Wavelet Image Coding, Walsh Functions, Multi-head Latent Attention (MLA), Native Sparse Attention (NSA), Looped/Recurrent Architectures

Wavelet Image Coding arrives!: A new approach to autoregressive image generation based on wavelet image coding and a variant of a language transformer is discussed in this paper.
- The transformer learns statistical correlations within a token sequence, reflecting correlations between wavelet subbands at various resolutions.
Walsh Functions are the Discrete Counterpart to Fourier Transforms: A member suggested that Walsh functions could be the discrete counterpart to Fourier transforms, with a rotated matrix representation for wavelet transforms.
- Another member linked to this blogpost as a good explanation of MLA, linking the codebase and ablation studies.
MLA Gains Traction as KV Cache Reduction Method: Two papers (MHA2MLA and TransMLA) explore adapting existing models to Multi-head Latent Attention (MLA), which significantly reduces KV cache size (5-10x).
- While one paper showed deteriorated performance (1-2%), the other showed enhanced performance, suggesting MLA could be non-inferior to MHA, especially with larger models and more parameters.
Native Sparse Attention (NSA) joins the game: Native Sparse Attention (NSA) from DeepSeek reduces the computing cost of long context by 5-10x.
- With both MLA and NSA being open-sourced, they could be implemented into frontier models soon, if it is indeed an advancement in the state of the art it will be incorporated.
Looped Models are the future: A member suggests that looped/recurrent architectures are the future, although properly training them is tricky.
- Another member anticipates that frontier labs will seek any advantages from DeepSeek papers, considering DeepSeek's architectural novelty.

Links mentioned:

Eleuther ▷ #interpretability-general (9 messages🔥):

Attention Maps vs. Neuron-Based Methods, Intervening on Attention Maps, Syntax Emerging from Attention Maps

Attention Maps' Popularity Dwindles Compared to Neuron Methods: Members discussed if attention maps have lost traction to neuron-based methods because they are observational rather than interventional.
Intervening on Attention Maps Discussed: Members suggested you can directly change the map during a forward pass, rather than just using a custom mask.
Syntax Emerges from Attention Maps Since BERT: A member expressed bias towards attention maps due to their ability to generate trees/graphs and use linguistic corpora/ontologies as features for future projects.
- They noted that people have been showing syntax emerging from attention maps since BERT.

Eleuther ▷ #gpt-neox-dev (10 messages🔥):

Mixed Precision Training, BF16 Training, ZeRO Offload, Optimizer States Precision, Deepseek Adam Moments

FP32 Master Weights reside in GPU unless ZeRO engaged: When performing mixed precision training with BF16, the master FP32 weights are typically stored in GPU VRAM, unless ZeRO offload is explicitly enabled.
- After the ZeRO paper it is now common to think of high-precision model parameters as belonging to the optimizer states, since they are sharded with momentum/variance.
Optimizer precision in BF16 mixed precision: It is common to store the first and second Adam moments in bf16, but still store master weights in fp32.
- It was suggested to use vanilla mixed precision of bf16 low-precision weights + fp32 optim+master-weights+grads unless one has specific expertise.
Mixed Precision and the Optimizer: NVIDIA's Perspective: The use of bf-16 mixed precision in the optimizer, as seen in NVIDIA's Megatron-LM, is related to the model being in bf-16 MP, but can be configured independently.
- The memory from high-precision model parameters can be sharded with momentum/variance states via ZeRO.

Link mentioned: Megatron-LM/megatron/core/optimizer/optimizer_config.py at main · NVIDIA/Megatron-LM: Ongoing research training transformer models at scale - NVIDIA/Megatron-LM

Nous Research AI ▷ #general (68 messages🔥🔥):

Tool use in LLMs, Claude 3.7 Sonnet, QwQ-Max-Preview, AI alignment

LLMs Invoke Tools without System Prompt: It was observed that some LLMs might invoke tools without explicit token sequences in the system prompt, suggesting that these patterns are hard-coded during training via reinforcement learning or direct SFT.
- This approach saves tokens in the long run by eliminating the need to specify a schema for tool calls in every inference, though its reliability compared to ICL remains unclear without benchmarks.
Claude 3.7 Sonnet takes SWE crown: Claude 3.7 Sonnet is the new state of the art on the SWE bench, featuring active code collaboration that can search, read, edit, test, and commit code to GitHub.
- One member stated that Claude 3.5 was already a reasoning model, so calling the new one a 'point' release makes sense, and hinted that future reasoning models will be 'crazy'.
QwQ-Max-Preview aims to leap ahead in reasoning: A member shared a link to the QwQ-Max-Preview blog, which shows a model built on Qwen2.5-Max with strengths in deep reasoning, math, coding, general domains, and agent-related tasks.
- The discussion speculated that key tokens in QwQ's reasoning traces look similar to R1 and pondered whether it requires less compute.
AI alignment talk makes community nauseous: A member expressed disgust towards the altruistic discussions on AI alignment on X, suggesting that alignment can be achieved simply through system prompts.
- They criticized the imposition of narrow lenses and limited understandings to restrict AI, advocating for more listening and thinking before speaking.

Links mentioned:

Nous Research AI ▷ #ask-about-llms (6 messages):

Sonnet-3.7, Misguided Attention Eval, Overfitting

Sonnet-3.7 Shines in Attention Evaluation: Sonnet-3.7 benchmarked as top non-reasoning model in Misguided Attention Eval, nearly surpassing o3-mini.
- The user seeks to activate its thinking mode via the OR API, if feasible.
Misguided Attention Eval Targets Overfitting: The Misguided Attention test challenges reasoning abilities of LLMs with misguiding information, specifically testing for overfitting.

Link mentioned: GitHub - cpldcpu/MisguidedAttention: A collection of prompts to challenge the reasoning abilities of large language models in presence of misguiding information: A collection of prompts to challenge the reasoning abilities of large language models in presence of misguiding information - cpldcpu/MisguidedAttention

Nous Research AI ▷ #interesting-links (4 messages):

Qwen AI, Video Generation

Qwen AI Unveils Updated Chat Interface: Qwen AI released an updated chat interface, teasing something new coming today.
- Despite the update, a member noted that the artifacts are still a bit clunky, like a half baked copy.
Qwen AI Adds Integrated Video Generation: The updated Qwen AI chat interface now features integrated video generation capabilities.

Link mentioned: Qwen Chat: no description found

MCP (Glama) ▷ #general (62 messages🔥🔥):

Anthropic MCP Registry API, Claude 3.7, Haiku Tool Support, Claude Code (CC), MCP Server Recommendations

Anthropic Finally Delivers MCP Registry API: Anthropic announced the official MCP registry API, seen on this tweet, which is great news for the community, especially for those relying on solutions like opentools.com/registry to fill the source-of-truth gap.
- This API promises to be the source of truth for MCPs, streamlining development and integration efforts.
Claude 3.7 Debuts with 'Thinking' Tags: Claude 3.7 has been released, featuring 64,000 output extended thinking tokens and a new 'latest' alias, with initial impressions suggesting it combines the best aspects of previous June and October models.
- Users noted it is back to following long-ish system prompts, spotting social engineering, and also utilizes <thinking> tags when using tools, adding a cute touch to its operation.
Haiku's Tool Support: A Mixed Bag: While Haiku 3.5 now supports tools, its effectiveness is debated, some find it bad at using them compared to Sonnet 3.5, particularly with many tools or parameters.
- One user shared they found that Sonnet falls apart with around 70 tools, but others found it works well with fewer tools and parameters.
Claude Code Emerges as Top-Tier Code Assistant: Claude Code (CC) is drawing high praise for its code assistance capabilities, outperforming tools like Aider in handling complex coding errors.
- In one test, CC resolved 21 compile errors in Rust in one shot, whereas Aider struggled and got stuck in a loop. Users speculate on possible caching mechanisms and costs, with one user reporting astronomical Anthropic costs in the last 6 weeks.
Seeking Context-Aware MCP Servers: Developers are seeking MCP servers that can provide language-specific context, especially for languages like TypeScript and Rust, to avoid manually inputting entire language documentation.
- One recommendation was code-research-mcp-server, though noted as a little finnicky, along with this list of tools and llm-context.py for managing context in LLMs.

Links mentioned:

MCP (Glama) ▷ #showcase (11 messages🔥):

MetaMCP Licensing, AGPL Licensing, Enact Protocol MCP Server, Claude 3.7 Sonnet on Sage

MetaMCP Open-Source Licensing Concerns: Concerns were raised regarding MetaMCP's licensing, with a user suggesting it might become a cloud SaaS, prompting the developer to seek feedback on licensing to prevent cloud monetization while keeping it self-hostable.
- The developer shared the MetaMCP server GitHub repository and expressed openness to licensing changes following the discussion.
AGPL Licensing Suggested for MetaMCP: A user suggested using AGPL licensing for MetaMCP to ensure contributions are open-sourced, also suggesting an additional clause allowing the company to sublicense under MIT-0.
- The user noted AGPL would require companies hosting it to open source their changes, enabling incorporation into the original version, leading the developer to update to AGPL.
Enact Protocol Server Takes Shape: A member is exploring the creation of an MCP server for the Enact Protocol, aiming to build a standardized way of defining tasks.
- The Enact Protocol provides a framework for defining and executing automated tasks and workflows.
Claude 3.7 Sonnet Supercharges Reasoning on Sage: Claude 3.7 Sonnet with extended thinking capabilities is now on Sage, allowing users to see Claude's reasoning process as it tackles complex problems.
- New features include a thinking mode toggle (Command+Shift+T), default model settings, improved scrolling, and expandable thinking blocks.

Links mentioned:

LM Studio ▷ #general (41 messages🔥):

LM Studio Wordpress Plugins Integration, Qwen 2.5 VL GGUF, QuantBench on GitHub, Speculative Decoding in LM Studio, Deepseek R1 671b RAM requirements

Qwen 2.5 Vision Language Model Arrives!: A member announced the availability of a working Qwen 2.5 VL 7B GGUF, available on Hugging Face.
- Another user confirmed it works on the latest version of LM Studio, with one adding that it is significantly better than llama3.2 vision 11b instruct and qwen2-vision 7b instruct.
QuantBench Speeds Up Quants: The Qwen 2.5 VL 7B GGUF quant was made using QuantBench, available on GitHub.
- The model has been tested on the latest llama.cpp built with CLIP hardware acceleration manually enabled.
LM Studio Folds tags: A member inquired whether LM Studio removes <think> tags from the context sent back to the model during Chain of Thought prompting, referencing model documentation warning against their inclusion.
- A helpful community member linked to LM Studio's documentation which allows users to inspect the exact input string that goes to the model.
Speculative Decoding Boosts LLM Speed: A community member asked about speculative decoding and its compatibility with Llama 3.1 8B and Llama 3.2 1B models.
- Another member shared LM Studio's documentation on the feature, noting it can substantially increase the generation speed of large language models (LLMs) without reducing response quality.
Deepseek R1 671b Needs Serious RAM: A user inquired about the RAM requirements for running Deepseek R1 671b locally, as the documentation specifies a minimum of 192GB+.
- Another member suggested using a specific quantized version and offloading roughly 70% of the model weights to the GPU if running on a Mac.

Links mentioned:

LM Studio ▷ #hardware-discussion (20 messages🔥):

A770 GPU performance, M2 Max vs M4 Max Power Consumption, AIO Pump USB Header Interference

A770 GPU Works Decently: A member reported that their A770 GPU seems decent, with an attached image showing a PC build.
- The image analysis indicates the components are super light like they are empty lol.
AIO Pump USB Struggles: A member mentioned challenges with an AIO pump requiring a USB 2.0 header that interferes with the last PCIE slot.
- They expressed frustration, stating It doesn't effing fit and I'm so done, ultimately deciding to move the components to a second system.
M2 Max Refurbished: A member stated they are running an M2 Max and didn't purchase the M4 Max because M4 Max boosts way too hard easily pegged at 140w.
- They found a well priced refurbished M2 Max 96GB sufficient for their needs, pulling only around 60W.

Stability.ai (Stable Diffusion) ▷ #announcements (1 messages):

Feature Request Board, Discord feedback, Prioritization of Features

Stability AI Launches Feature Request Board: Stability AI launched a new feature request board to gather user feedback and prioritize future developments.
- Users can submit and vote on feature requests directly from Discord using the /feedback command or through the new platform.
Feedback Shapes Stability AI Future: User feedback will now directly influence Stability AI's development priorities.
- The new system allows for transparent submission and voting, ensuring community voices are heard.

Stability.ai (Stable Diffusion) ▷ #general-chat (52 messages🔥):

SD3 Ultra details request, Stability updates, Dog breed image datasets, Image generation times, Image resolutions

SD3 Ultra Deets Desired!: A user expressed curiosity about SD3 Ultra, noting it was a comfy workflow based on SD3L 8B with higher high-frequency detail than regular SD3L 8B.
- Another user confirmed it still exists and they still use it, implying it hasn't been publicly released.
Stability's Silent Strategy?: A member inquired about the current state of Stability, asking for updates on current projects or future plans, noting they haven't heard anything for a while.
- Another member responded that nothing can be shared yet, but they are hopefully expecting some stuff soon.
Dog Datasets Desperately Demanded!: A user requested good dog breed image datasets other than the Stanford Dogs Dataset, noting they already have that one (20k images) but need more data, with images containing both dog and dog breed.
- No specific datasets were provided in the available context.
Image Generation Introspection: A user asked about image generation times, prompting several responses based on different hardware.
- Reported times varied widely: One user with a GTX 1660s stated it takes around 1 minute, a second with a 3060 TI reported 7 seconds for a 1280x720 image and 31 seconds for 1920x1080 at 32 steps, while another member with a 3070ti generated images in 4-5s using SD1.5.
Resolution Revelation Required!: Users discussed optimal resolutions, with one member questioning why another chose such a big resolution.
- Another member said that they sometimes generate 4K wallpapers (also no upscaling or detailing required).

Modular (Mojo 🔥) ▷ #mojo (11 messages🔥):

Mojo FFI, static lib, GLFW, GLEW, Sudoku example

Mojo Does GLFW/GLEW Graphics with Static Libs: Graphics programming is possible in Mojo via FFI with a static library linked to GLFW/GLEW, as demonstrated with a Sudoku example and an image.
- The member suggested exposing only the needed calls via your own C/CPP library using alias external_call with a wrapped function, as well as an example repo showing how to hijack the loader.
lightbug_http dependency fails with magic install: A member reported an issue using the lightbug_http dependency in a fresh Mojo project, resulting in an error related to small_time.mojopkg after running magic install.
- The reported error suggests the issue might be similar to a Stack Overflow question but a member wondered if the issue may be the fact that small-time was pinned to a specific version.

Links mentioned:

Modular (Mojo 🔥) ▷ #max (20 messages🔥):

Hardware Accelerated Conway's Game of Life, MAX and Pygame Integration, GPU Utilization in MAX, SIMD Implementation by Daniel Lemire, Conway's Game of Life Computer

Max Does Conway: Hardware Accelerated Game of Life Emerges: A member created a hardware-accelerated version of Conway's Game of Life by integrating MAX and Pygame, demonstrating a novel use case as shown in their attached conway.gif.
Living Computer: Conway's Game Sparks Computer Architecture Ideas: A member shared a link to a project (nicolasloizeau.com) detailing the creation of a computer within Conway's Game of Life, illustrating its Turing completeness using glider beams for logic gates.
GPU Guns Blaze: MAX Sparks Life with Guns Pattern: A member demonstrated the use of GPU in their MAX implementation of Conway's Game of Life by showcasing a guns pattern, packed bit by bit, rendered using a naive pixel-by-pixel internal function, and then the output tensor gets cast into an np array and given to pygame to render, as demonstrated in their guns.gif.
Space Invaders! Spaceship Patterns now run in Conway's Game of Life: A member implemented wrapping in their Conway's Game of Life simulation using MAX, enabling the creation of spaceship patterns and showcasing the ability to add parameters to the model from the graph API, as showcased in their spaceship.gif.

Link mentioned: Nicolas Loizeau - GOL computer: A new (and better) version of the GOL computer is available here : https://github.com/nicolasloizeau/scalable-gol-computer

Notebook LM ▷ #use-cases (2 messages):

Ease of Use, Short Prompts

User Considers Trying Tool Due to Perceived Simplicity: A user expressed interest in trying a tool, noting it seems easy enough, despite not being a programmer.
- This suggests the tool's interface and instructions are perceived as user-friendly even for those without coding experience.
User Remarks on Brevity of Instructions: The same user commented on the instruction prompt's conciseness, calling it the shortest instruction prompt I have ever seen.
- This indicates a minimalist approach to providing guidance, potentially appreciated for its directness.

Notebook LM ▷ #general (14 messages🔥):

Gemini, NotebookLM, PDF Conversions, Language prompts, Savin/Ricoh Copier

Book Upload Workaround via PPT Conversion: A user outlined a method to import physical books into NotebookLM by photographing each page, converting the PDF to PowerPoint, uploading to Google Slides, and then importing the slides into NotebookLM.
- They noted that NotebookLM works with text images in slides but not directly from PDFs.
Language Prompt German Misadventures: A user reported difficulty getting the hosts to speak German, despite using prompts like "Hosts speak only German Language" and "The audio language must be in German".
- The hosts either speak English or gibberish, sometimes starting in German before switching.
Book Scanning via Savin/Ricoh Copier: A user suggested using a recent Savin/Ricoh copier to scan an entire book to PDF and then uploading to NotebookLM.
- They confirmed that even with illegible source text, NLM could correctly answer questions about the scanned document.
Can Language be changed in NotebookLM?: A user inquired about changing the language in NotebookLM without changing the Google account language.
- This could be a desirable feature since users want to customize their experience.
Claude 3.7 Hype: A user expressed excitement about Claude 3.7 and wished for the ability to choose models in NotebookLM.
- Another user asked about the envisioned effect of model choice, opening up the question of the implications of model variety for the end user.

LlamaIndex ▷ #blog (3 messages):

LlamaIndex AI Assistant, ComposIO HQ, AnthropicAI Claude Sonnet 3.7

LlamaIndex Docs Debut AI Assistant: LlamaIndex announced the availability of an AI assistant on their documentation.
ComposIO HQ Releases Another Banger: LlamaIndex tweeted about another release from ComposIO HQ.
AnthropicAI Drops Claude Sonnet 3.7: AnthropicAI released Claude Sonnet 3.7, which LlamaIndex supports from day 0.
LlamaIndex Adds Day 0 Support for Claude Sonnet 3.7: To use, users should pip install llama-index-llms-anthropic --upgrade and refer to Anthropic's announcement post.

LlamaIndex ▷ #general (5 messages):

Fusion Rerank Retriever with Elasticsearch, MultiModalVectorStoreIndex and GCSReader issue

Fusion Rerank Retriever Needs Initialized Nodes: A user wanted to use the fusion rerank retriever with Elasticsearch but the BM25 retriever could not be initialized because the docstore was empty.
- Another member clarified that you need to save the nodes somewhere for BM25 to initialize, either to disk or elsewhere, because it can't initialize from the vector store alone.
MultiModalVectorStoreIndex Error: A user encountered an error while creating a multimodal vector index using the MultiModalVectorStoreIndex class with GCSReader.
- The error, [Errno 2] No such file or directory, occurred with image files, even though they are present in the GCS bucket, whereas PDF documents work fine.

Torchtune ▷ #dev (6 messages):

Left Truncation vs Right Truncation, StatefulDataLoader PR

Truncation Troubles: Left vs Right in Finetuning: Members discussed the implications of left truncation (seq[-max_seq_len:]) versus right truncation (seq[:max_seq_len]) during finetuning, sharing interesting graphs.
- The consensus was to expose both truncation methods but to default to left truncation at least for SFT.
StatefulDataLoader Support Lands in PR: A member requested a review for their PR adding support for StatefulDataLoader class.

Link mentioned: Add support for StatefulDataLoader by joecummings · Pull Request #2410 · pytorch/torchtune: ContextWhat is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here)This PR adds support for the StatefulDataLoader class fr...

Torchtune ▷ #papers (2 messages):

DeepScaleR, Reinforcement Learning, DeepEP library, MoE

DeepScaleR Scales RL, Surpasses O1 Preview: DeepScaleR finetuned from Deepseek-R1-Distilled-Qwen-1.5B using simple reinforcement learning (RL), achieving 43.1% Pass@1 accuracy on AIME2024.
DeepSeek Open Sources EP Communication Library: DeepSeek introduced DeepEP, the first open-source EP communication library for MoE model training and inference.

Links mentioned:

Cohere ▷ #cmd-r-bot (5 messages):

DeSci Validators, Profitability Thresholds, Asset Value Expert Account

Validators Ponder Profitability: A member inquired about the profitability threshold for Proof of Stake (PoS) validators within the Decentralized Science (DeSci) field.
- Another member responded with "pool validator node", hinting at the importance of pool participation for validators.
Asset Expert Gets Short Shrift: The bot posted about an "asset value expert account" which was labelled as "nazi".

DSPy ▷ #general (2 messages):

DSPy Assertion Migration, BestOfN Module, Refine Module, Reward Functions

Streamlining Assertions with DSPy's BestOfN: DSPy users migrating from 2.5-style Assertions can now use dspy.BestOfN or dspy.Refine modules for streamlined functionality.
- The dspy.BestOfN module retries a module up to N times, picking the best reward, but stopping if a specified threshold is reached.
Crafting Reward Functions for DSPy Modules: DSPy's reward functions can return scalar values like float or bool, enabling customized evaluation of module outputs.
- A sample reward function was shown: def reward_fn(input_kwargs, prediction): return len(prediction.field1) == len(prediction.field1).

{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}