**a quiet day.**

AI News for 2/27/2025-2/28/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (221 channels, and 8236 messages) for you. Estimated reading time saved (at 200wpm): 795 minutes. You can now tag @smol_ai for AINews discussions!

Much discussion about the relative merits of GPT 4.5, which you can read below.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

GPT-4.5 Model Performance and User Perception

  • Initial User Experiences and Subjective Evaluation: @karpathy conducted a poll comparing GPT-4 and GPT-4.5, finding that in 4 out of 5 questions, users preferred GPT-4, which was surprising as @karpathy personally found GPT-4.5 better in all cases, suggesting a possible preference for “high-taste testers” towards GPT-4.5’s deeper charm, creativity, and humor. However, @jeremyphoward responded to Karpathy’s poll results, stating that the awkwardness, not “high taste”, was the reason for user preference. @Teknium1 also reacted to the poll results with “Damn lol must have some high, or low, taste people testing here idk”. @abacaj expressed strong dissatisfaction, stating GPT-4.5 needs to enhance productivity to be useful, otherwise it is “fucking useless”. @abacaj also argued that if GPT-4.5 is only a “high taste” model, it is “blowing investor money”. @stevenheidel likened the GPT-4.5 launch to the initial ChatGPT excitement, as people are again having fun chatting with AI.
  • Concerns Regarding Speed and Practicality: @abacaj noted GPT-4.5 is “very slow” and “impractical to use for agent loops”, despite being “fun to prompt”. @abacaj elaborated that it takes “3+ minutes to answer one question” in a moderate prompt loop, deeming it “very impractical”. @abacaj further commented that GPT-4.5 “feels more like a research artifact than a real model you can deploy” due to its slowness.
  • Critique of Capabilities and Value Proposition: @abacaj criticized the showcased capabilities of the “largest language model”, questioning if drawing a triangle using SVG is the highlight. @abacaj found the value add for end-users questionable, suggesting internal use within OAI for distillation.
  • Pricing and Economic Viability: @Yuchenj_UW remarked that the pricing “makes even less sense” in light of GPT-4.5’s performance. @Yuchenj_UW speculated about the potential pricing of GPT-5 and o4. @AravSrinivas highlighted Perplexity Deep Research at $20/month versus ChatGPT at $200/month.
  • Performance Compared to Other Models: @METR_Evals reported that GPT-4.5 performs above GPT-4o but below o1 or Claude 3.5 Sonnet based on METR experiments with an earlier checkpoint, noting a time horizon score of ~30 minutes. @dylan522p stated Claude 3.7 beats GPT 4.5 on most tasks, but GPT 4.5 has better “vibes” and is the first model since Claude 3 Opus to make them laugh, emphasizing humor as intelligence. @scaling01 speculated GPT-4.5 could be “GPT-4o x 10” in size, estimating around 5T parameters. @Teknium1 mentioned Grok’s context window is only 128k. @multimodalart shared evaluations comparing GPT 4.5 with non-thinking models like Sonnet 3.7, Deepseek V3, and Grok 3.
  • Emotional Intelligence (EQ) and “Vibes”: @karpathy found Claude 3.7’s humor to be the funniest after scrutinizing LLM outputs for humor. @random_walker argued that the “EQ” improvements in GPT 4.5 are due to post-training, not parameter count, suggesting any EQ differences are behavioral rather than capability-based. @random_walker further claimed that GPT-4o and GPT-3.5 can exhibit similar EQ behavior as GPT-4.5 with appropriate post-training. @omarsar0 suggested using the OpenAI Playground to compare models and observe GPT-4.5’s “thoughtful” responses. @omarsar0 noted GPT-4.5 often sounds more “thoughtful” by adding sensations and thoughts. @marktenenholtz observed that Sonnet 3.7 is “almost too eager” and GPT-4.5 is “almost too deferential”.
  • Technical Details and Training: @sama credited @ColinWei11, Yujia Jin, and @MikhailPavlov5 for the difficult work at the intersection of ML and systems required for GPT-4.5. @cloneofsimo highlighted that GPT4.5 was “trained on multiple datacenters” and “aggressively used low precision training”, implying “diloco goes brr” and the benefit of fp8 training due to high granularity. @rasbt pointed to the system card mentioning “new supervision techniques” used in training. @rasbt mentioned that apparently character-training was not used. @Teknium1 questioned how GPT-4.5’s knowledge cutoff remains 2023 despite current pretraining runs, speculating about data contamination from ChatGPT 3.5 data or if the model was trained long ago.
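The “high granularity” point about fp8 training can be illustrated with a toy experiment: with one scale per tensor, a single activation outlier coarsens the quantization grid for every value, while per-block scales confine the damage to one block. This is only a sketch of the general idea, not OpenAI’s actual recipe; all names and the block size are invented for illustration.

```python
import math

def quantize_dequantize(values, scale, levels=127):
    """Symmetric fake-quantization to `levels` integer steps, then back to float."""
    out = []
    for v in values:
        q = max(-levels, min(levels, round(v / scale)))
        out.append(q * scale)
    return out

def max_abs_scale(values, levels=127):
    # Scale chosen so the largest magnitude maps to the top quantization level.
    return max(abs(v) for v in values) / levels or 1.0

def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# A tensor of small values plus one outlier, typical of transformer activations.
tensor = [0.01 * i for i in range(-64, 64)] + [8.0]

# Per-tensor scale: the outlier inflates the scale, coarsening everything else.
coarse = quantize_dequantize(tensor, max_abs_scale(tensor))

# Per-block scales (block size 32): the outlier only affects its own block.
fine = []
for i in range(0, len(tensor), 32):
    block = tensor[i:i + 32]
    fine.extend(quantize_dequantize(block, max_abs_scale(block)))

assert rms_error(fine, tensor) < rms_error(coarse, tensor)
```

The finer the scaling granularity, the less a few outliers dominate the dynamic range, which is one reason low-precision formats become workable at scale.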

Model Architecture, Scaling Laws and Efficiency

  • Scaling Law Limitations and Alternative Approaches: @Yuchenj_UW suggested that the GPT-4.5 release indicates LLM pre-training scaling has plateaued, noting that a 10x compute increase yields limited improvement, which allows companies like xAI to catch up through innovation in algorithms and data, as demonstrated by DeepSeek’s efficiency gains. @jxmnop echoed this, suggesting GPT 4.5 might signal “the beginning of the end for scaling laws”, questioning if data is exhausted or if scaling laws fail to capture desired task performance. @ibab emphasized that algorithms are increasingly important with larger models, suspecting training details are key to Grok 3’s performance. @MParakhin stated pre-training needs higher-perplexity targeted data and Active Learning to progress further. @teortaxesTex asserted that non-thinking LLMs pretrained on natural data have hit their practical limit, doubting a $1T training run would significantly improve them.
  • Inference Compute and Efficiency: @rasbt clarified that train- and inference-compute are orthogonal ways to improve LLMs and an apples-to-oranges comparison is being made without considering inference-compute scaling for GPT-4.5. @rasbt questioned if GPT-4.5 is more expensive and slower than o1 (GPT4-sized + inference-compute scaling) and what GPT-4.5 with o1-style scaling would look like. @iScienceLuvr highlighted research on “Thinking Slow, Fast”, using distilled reasoners based on smaller models like Llama-1B and -3B with Mamba architecture to improve inference scaling. @_akhaliq shared FlexiDiT, a diffusion transformer framework that generates high-quality samples with less compute by using varying patch sizes during denoising. @TheTuringPost discussed Chain of Draft (CoD), which encourages models to generate short reasoning steps to reduce costs and speed up models while maintaining accuracy.
  • Hardware and System Architecture: @reach_vb highlighted DeepSeek’s Fire-Flyer File System (3FS), noting its disaggregated architecture, strong consistency using CRAQ, stateless metadata services, and KVCache for inference, achieving high read throughput and outperforming in benchmarks. @teortaxesTex discussed the N4 process allowing 2.32x denser chips compared to N7, based on transistor counts and die sizes. @awnihannun reported Kimi’s Moonshot 16B MoE model running nicely on M4 Max with MLX at 154 toks/sec, performing as well as or better than dense 7B models. @casper_hansen_ commented on CUDA’s moat, noting even AMD engineers use CUDA for tensor engines.
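The Chain of Draft idea mentioned above is, at its core, a prompting pattern: ask the model for terse, length-capped reasoning steps instead of verbose chains of thought. A minimal sketch of the shape, with illustrative prompt wording and helper names that are not from the paper:

```python
COD_SYSTEM_PROMPT = (
    "Think step by step, but keep each reasoning step to five words or fewer. "
    "Write the steps as a numbered draft, then give the final answer after '####'."
)

def build_cod_messages(question: str) -> list[dict]:
    """Package a question into chat-style messages with a Chain-of-Draft prompt."""
    return [
        {"role": "system", "content": COD_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

def extract_answer(completion: str) -> str:
    """Pull the final answer out of a '####'-delimited completion."""
    return completion.rsplit("####", 1)[-1].strip()

msgs = build_cod_messages(
    "Jason had 20 lollipops, gave some to Denny, and now has 12. How many did he give?"
)
assert msgs[0]["role"] == "system"
assert extract_answer("1. 20 - 12\n2. equals 8\n#### 8") == "8"
```

The messages would be sent to any chat-completion API; the cost saving comes entirely from the shorter drafts the cap induces.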

Open Source Models, Tools, and Frameworks

  • DeepSeek’s Open Source Contributions: @Yuchenj_UW praised DeepSeek for drastically reducing GPU requirements through infrastructure and algorithm optimization and their “goated open source work”. @reach_vb, @reach_vb, @reach_vb and @reach_vb shared multiple links and details regarding DeepSeek’s Fire-Flyer File System (3FS) and benchmarks. @teortaxesTex mentioned DeepSeek’s file system from 2019 is still SoTA. @aidan_mclau jokingly scanned DeepSeek’s training data and found “deep commitment from a brilliant team”.
  • Hugging Face Ecosystem and Integrations: @_akhaliq and @_akhaliq provided code snippets for developers to get started with GPT-4.5-preview using ai-gradio[openrouter] and Hugging Face. @ClementDelangue highlighted the French ministries of culture and interior being on Hugging Face. @mervenoyann shared that Microsoft’s MAGMA-8B model is easily loadable in Hugging Face transformers. @ClementDelangue announced Perplexity R1-1776 inference directly from the HF model page via FireworksAI_HQ. @_akhaliq shared a link to AI Conference Deadlines on Hugging Face.
  • Local LLMs and MLX: @reach_vb shared instructions for running Phi 4 Mini Instruct locally on a Mac using llama.cpp. @awnihannun committed to using local LLMs for a vibe-check on performance gap, favoring tools like the raw terminal (mlx_lm) and LM Studio. @awnihannun, @awnihannun, and @awnihannun showcased local inference on M4 Max using MLX for models like Qwen2.5 and Moonshot.
  • Other Open Source Tools and Projects: @pirroh mentioned Replit building their own Copy-On-Write distributed file system before LLMs became coding proficient. @bobvanluijt highlighted Weaviate’s open-source vector database and its new features. @_akhaliq shared TALKPLAY, a multimodal music recommendation system with LLMs. @alexalbert__ announced an Anthropic API quality-of-life update allowing public-facing URLs for image/document sources. @DeepLearningAI promoted a short course on “Build Apps with Windsurf’s AI Coding Agents” in collaboration with Codeium. @AymericRoucher recommended reading about instrumenting smolagent runs and setting up LLM-judge systems using Arize Phoenix. @mervenoyann advertised a weekly newsletter on open-source art tools. @rasbt shared a tutorial to deploy AI models on public/private cloud using open-source tools.

AI Applications and Industry Use Cases

  • Enterprise AI and Productivity: @perplexity_ai, @perplexity_ai, @perplexity_ai, and @perplexity_ai announced Perplexity Deep Research for Enterprise Data, connecting to Google Drive, OneDrive, and SharePoint, enabling deep research across company files and the web with enterprise-grade security. @AravSrinivas, @AravSrinivas, @AravSrinivas, @AravSrinivas, and @AravSrinivas further detailed Perplexity Enterprise Pro, emphasizing features like deep research, reasoning, internal/external search, access to all models, and collaboration. @lmarena_ai and @lmarena_ai announced Claude 3.7 Sonnet’s top ranking in coding on the Arena, highlighting its capabilities. @AIatMeta showcased Llama being used by SevillaFC with IBM’s watsonx to create Scout Advisor for soccer star scouting. @OpenAIDevs highlighted ConsensusNLP using GPT-4.5 for scientific/medical analysis and structured outputs for visualizing research agreement.
  • Agentic AI and Automation: @mervenoyann announced Microsoft’s MAGMA-8B vision language action model for physical and digital world operations including embodied robots and web automation. @llama_index shared an example of agentic productivity applications built with LlamaIndex. @RichardSocher suggested using research agents like ARI for extensive literature reviews in serious medical problems, providing an example report.
  • Coding and Development: @nearcyan shared a meme about junior devs watching Claude 3.7 “destroy their codebase in cursor”. @HamelHusain stated “It is only possible for me to understand GraphQL because of AI”. @cloneofsimo critiqued current automated software development tools like Devin, OpenHands, Replit, and Cursor Compose, finding them unable to complete even small applications end-to-end, lacking in server/client, IPC, queue, and scheduling capabilities. @rishdotblog claimed to have replaced a $100/month tool with a $10 Claude Code solution, suggesting programming jobs and SaaS companies are “going away”.

AI Research and Papers

  • Recent Research Paper Highlights: @rasbt provided a list of recent AI research papers covering topics like SWE-RL, LoRA boosting, long-context LLMs, Logic-RL, test-time scaling, AI research agents, model selection, inner thinking transformers, natural reasoning, knowledge acquisition, freelance software engineering with LLMs, sparse attention, unlearning, large language diffusion models, model merging, reasoning-action dilemma, finance LLMs, infinite context, distillation scaling laws, prompt caching, reasoning from demonstrations, hierarchical reasoning, thinking in LLMs, compute-optimal test-time scaling, mathematical reasoning, large memory models, quantized LLMs, video RoPE, scaling up test-time compute, self-backtracking, training efficient reasoning, reasoning advancements, teaching critique via RL, enhancing reasoning for domain applications, less-is-more reasoning, chain-of-thought reasoning, chain-of-associated-thoughts, direct alignment algorithms, embedding layer scaling, and competitive programming with large reasoning models. @iScienceLuvr, @iScienceLuvr, @iScienceLuvr, @iScienceLuvr, @iScienceLuvr, and @iScienceLuvr highlighted papers on FlexiDiT, Self-Training for Concise Reasoning, and Thinking Slow, Fast with Distilled Reasoners, providing abstracts and code links. @omarsar0, @omarsar0, and @omarsar0 shared papers on METAL (Modality-tailored critique), Modality-tailored critiques for self-correction, and Test-Time Scaling on Chart Generation, noting performance improvements. @_akhaliq, @_akhaliq, @_akhaliq, @_akhaliq, @_akhaliq, @_akhaliq, @_akhaliq, and @_akhaliq linked to papers on Mobius (Text to Seamless Looping Video), FlexiDiT, R1-T1 (Translation Capability Incentivization), and LongRoPE2 (Context Window Scaling). @dair_ai and @dair_ai highlighted Google’s PlanGEN framework for complex planning and reasoning in LLMs, detailing its constraint-guided verification and adaptive algorithm selection. 
    @DeepLearningAI summarized a paper on Brain2Qwerty, a non-invasive AI system translating brain waves to text using MEG recordings.
  • Cognitive Science and AI Alignment Theory: @AndrewLampinen shared a preprint on “Naturalistic Computational Cognitive Science”, synthesizing AI and cognitive science towards generalizable cognition models. @DanHendrycks discussed the evolution of ideas in AI alignment theory, contrasting “random memetic drift” with Yudkowsky’s contributions, suggesting GPT is forcing empirical realities on the alignment forum.

Humor and Miscellaneous

  • AI Model Humor and Vibe Checks: @_akhaliq and @_akhaliq posted animated SVGs as humorous responses from GPT-4.5 about being open-sourced. @_philschmid asked for “vibe test prompts”, suggesting counting to ten omitting numbers ending in “e” and generating an SVG of a pelican on a bicycle. @NeelNanda5 shared an LLM hack: “Write your response in the style of a Scott Alexander blog post” for more enjoyable long outputs. @aidan_mclau presented a humorous IQ scale from 0 to infinity, culminating in an enlightened fart joke. @andersonbcdefg shared a meme about asking OpenAI if their model is good or lazy. @Teknium1 posted “GPT4.5 finally knows me, lmao” with an image implying GPT-4.5 understood their personality.
  • Societal and Philosophical Reflections: @RichardMCNgo made an observation about the demographic overlap between high-IQ autism-spectrum biological males, transness, and systemizing thinking. @RichardMCNgo analogized the US presidency since 2012 to progressive chess. @teortaxesTex joked Unitree bots will cause an uptick in solipsism. @francoisfleuret expressed a “nightmare” scenario of nukes, AI, and drones as rational defense. @AmandaAskell humorously suggested an expensive “I totes respect you” pin as an alternative to uncomfortable suits for East Coast formality. @AmandaAskell joked about gendered profile preferences on dating apps.
  • Industry and Community Chatter: @suchenzang posted “big model smell” with a link, and @suchenzang tweeted “things you can’t buy for $9bn, maybe not even $30bn…”. @nearcyan declared being “done with benchmarks”, losing empathy for hyper-dimensional shape descriptions. @agihippo questioned working hours in AI, suggesting “AI people are mostly working all the time!”. @ID_AA_Carmack was “very happy to see more classic game source code released”, noting the disjoint between game dev and broader open source culture. @c_valenzuelab joked that Runway’s new about page states “We are brain surgeons for artificial brains.”
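The first of @_philschmid’s vibe-test prompts has a checkable ground truth, which is part of its appeal as a quick test. A short sketch of what a correct model answer should contain:

```python
NAMES = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
         6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}

# Counting to ten while omitting numbers whose English names end in "e":
# one, three, five, and nine are dropped.
kept = [n for n in range(1, 11) if not NAMES[n].endswith("e")]
print(kept)  # → [2, 4, 6, 7, 8, 10]
```

Models often fail this by conflating the digit with its spelled-out name, which is exactly what makes it a useful vibe check.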

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. DeepSeek Release: Revolutionary Storage and Data Processing Tech

  • DeepSeek Realse 5th Bomb! Cluster Bomb Again! 3FS (distributed file system) & smallpond (A lightweight data processing framework) (Score: 499, Comments: 73): DeepSeek launches 3FS, a high-performance distributed file system optimized for AI workloads, utilizing modern SSDs and RDMA networks to enhance distributed application development. Additionally, smallpond, a lightweight data processing framework, integrates with DuckDB and 3FS, offering a streamlined solution for data processing tasks. For more information, visit their GitHub page and smallpond repository.
    • 3FS Performance and Comparison: 3FS achieves an impressive 6.6 TiB/s bandwidth, significantly surpassing typical DRAM speeds. Discussions compared 3FS to other systems like Colossus and noted its unique application in AI training workloads without traditional file read optimizations like caching.
    • Open Source Strategy and Impact: Many commenters appreciated DeepSeek’s open-source approach, highlighting its potential to democratize AI advancements and challenge monopolistic tech giants like OpenAI and Nvidia. The open-source culture was emphasized as a reciprocal process, benefiting both contributors and the broader AI community.
    • Technical Insights and Historical Context: 3FS has been in production for over five years, developed by High-Flyer AI and used in their Fire-Flyer II system. It is optimized for large-scale random read operations, employs Direct I/O, and uses the FFRecord format for sample data storage, enhancing AI model training efficiency significantly.
  • DeepSeek OpenSourceWeek Day 5 (Score: 127, Comments: 9): Fire-Flyer File System (3FS) is a parallel file system designed to maximize the bandwidth of modern SSDs and RDMA networks, achieving an impressive 6.6 TiB/s aggregate read throughput in a 180-node cluster and 3.66 TiB/min throughput on the GraySort benchmark with a 25-node cluster. It offers 40+ GiB/s peak throughput per client node for KVCache lookup and supports a disaggregated architecture with strong consistency semantics, facilitating tasks like training data preprocessing and embedding vector search. For more details, visit the 3FS repository and the Smallpond framework.
    • 3FS is highly suitable for AI Training Workloads and AI Inference, offering benefits like random access to training samples without prefetching, high-throughput checkpointing, and a cost-effective KVCache for large language model inference. It also supports data-intensive applications requiring strong consistency and high throughput, as evidenced by its performance on the GraySort benchmark.
    • Users expressed amazement at the development team’s productivity, noting the impressive output despite limited manpower. The project originated from the CEO’s hedge fund team in 2019, and their recruitment strategy focuses on hiring top CS graduates from elite Chinese universities.
    • Some users find the technical details of 3FS too complex and not directly applicable to most use cases, suggesting a potential mismatch between user expectations and the system’s specialized capabilities.
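As a back-of-the-envelope sanity check on the headline numbers (assuming the 6.6 TiB/s figure is aggregate read throughput spread across all 180 nodes):

```python
aggregate_tib_s = 6.6   # aggregate read throughput, 180-node cluster
nodes = 180

# 1 TiB = 1024 GiB, so convert to GiB/s and divide by node count.
per_node_gib_s = aggregate_tib_s * 1024 / nodes
print(f"{per_node_gib_s:.1f} GiB/s per node")  # ≈ 37.5 GiB/s
```

That per-node average is in the same ballpark as the 40+ GiB/s peak per-client-node figure quoted for KVCache lookups, so the two numbers are at least mutually consistent.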

Theme 2. French Reasoning Model: Economical and Effective

  • I trained a reasoning model that speaks French—for just $20! 🤯🇫🇷 (Score: 229, Comments: 78): The post body contains only a link to a video, with no further text to summarize.
    • Fine-tuning a 7B LLM: TheREXincoming fine-tuned a 7B LLM based on Qwen 2.5 using only 2,000 samples (1K English + 1K French) at a cost of $20. The model performs comparably to R1 Distil 7B on math benchmarks, showcasing minimal knowledge degradation.
    • Model and Data Availability: The fine-tuned model and its dataset are available on Hugging Face (Data, Model, GGUF). The model is designed for high-performance French language capabilities and can serve as a template for training reasoning LLMs in other languages.
    • Community Feedback and Development: Users inquired about the data selection and training details, while TheREXincoming mentioned ongoing efforts to clean up the data curation pipeline and plans to update the repository. The initiative was met with enthusiasm and disbelief at the low cost and high performance achieved.

Theme 3. Sesame Realtime Voice Model Rivals OpenAI

  • “Crossing the uncanny valley of conversational voice” post by Sesame - realtime conversation audio model rivalling OpenAI (Score: 200, Comments: 37): Sesame showcased a compelling real-time conversational voice model that rivals OpenAI’s Advanced Voice Mode, with plans to release it under an Apache 2.0 license. Although the public weights are not yet available, the demo has impressed users with its quality, indicating a promising future for this new player in voice synthesis technology.
    • Users are highly impressed with the Sesame conversational voice model, noting its superior quality and speed compared to ChatGPT’s advanced voice mode. The demo is praised for its smooth response time and realistic sound, with users expressing excitement for its potential open-source release.
    • There is enthusiasm for the potential integration of the model with other technologies, such as function calling and RAG, to enhance its capabilities without increasing latency. Users are eager for the model to be available on platforms like Hugging Face for easier access and integration.
    • Some users highlighted limitations, such as the model’s inability to detect emotions or sarcasm and its tendency to shut down conversations if inputs are delayed. Despite these issues, the model’s engaging conversational style and memory capabilities were appreciated, with users looking forward to trying it on their own setups.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Theme 1. Humorous and Creative Applications of GPT 4.5

  • GPT 4.5 as Donald Trump explaining creation of Earth (Score: 550, Comments: 86): GPT 4.5 humorously mimics Donald Trump in a satirical narrative about the creation of Earth, attributing the planet’s formation to Trump’s personal initiative. The narrative highlights exaggerated claims about creating the sun, Earth, and its features, while humorously critiquing dinosaurs as a “huge mistake” before introducing “winning” animals and humans, all in a style characteristic of Trump’s speech patterns.
    • Commenters appreciated the humor and style of the GPT 4.5 narrative, with many finding it amusing and noting its exaggerated Trump-like qualities, though some felt it was too coherent or repetitive. The humor about dinosaurs being a “huge mistake” and the planet being “the wettest ever” particularly resonated with readers.
    • There was interest in converting the text to audio using text-to-speech models, with some already sharing audio links (SoundProofHead’s link and TwoLevelsAhead’s link) or expressing a desire for a deepfake video version.
    • The discussion highlighted the potential of AI in humor, with some commenters suggesting that achieving genuine comedy could be a significant benchmark for AI capabilities, while others joked about the implications of AI mastering humor to a superhuman level.
  • ChatGPT’s existential crisis over emoji (Score: 203, Comments: 48): ChatGPT humorously misidentifies emojis, including a seahorse, unicorn, shrimp, and dragon, leading to a playful yet existential reflection on emoji recognition capabilities. The conversation, shown on a dark background, underscores the casual and comedic nature of the AI’s attempts at identifying emojis.
    • Emoji Misidentification: Users enjoyed sharing humorous instances of ChatGPT misidentifying emojis, often repeatedly confusing seahorses with other animals like unicorns, dragons, and fish. This led to a playful and comedic exchange, highlighting the AI’s struggle with emoji recognition.
    • Community Engagement: Many users shared their own experiences and screenshots, contributing to the light-hearted nature of the conversation. The shared content included links to images and humorous dialogues, emphasizing the communal enjoyment of the AI’s quirky responses.
    • AI Humor and Reflection: The thread reflects on the whimsical nature of AI’s limitations, with users appreciating the comedic errors and engaging in a shared digital experience. This playful interaction underscores the community’s enjoyment of AI’s unpredictability and the shared humor derived from its errors.

Theme 2. Innovations in AI Video and Audio Processing

  • Advanced Voice 4.5 (Score: 365, Comments: 95): The post titled “Advanced Voice 4.5” likely discusses advancements in AI voice acting technology, specifically focusing on version 4.5. Without additional context or details, the post emphasizes the development of more realistic AI-generated voices.
    • There is skepticism about the “Advanced Voice 4.5” update, with users questioning whether it includes voice advancements, as some believe it is just an uncensored update. TheRobotCluster claims that version 4.5 does not apply to voice and is simply an uncensored version, raising questions about whether ChatGPT now allows uncensored content.
    • Discussions around the AI’s ability to mimic accents reveal mixed opinions; some users criticize the AI’s attempt at an English accent, suggesting it sounds like an American trying to mimic it. This raises questions about the authenticity and accuracy of AI-generated accents.
    • The conversation touches on AI’s impact on various industries, with some users predicting that AI advancements, particularly in voice acting and potentially the porn industry, could lead to significant technological evolution and financial gains in the future.
  • SpargeAttn: A new method giving you a 1.83x speedup on video models with NO quality loss. (Score: 155, Comments: 45): SpargeAttn offers a 1.83x speedup for video models without compromising quality, as demonstrated by a comparison on an L40 GPU. The method reduces processing time from 1897 seconds with “Full Attention” to 1037 seconds, maintaining video quality.
    • Installation Challenges: Users discuss the complexity of installing SpargeAttn due to dependencies like Triton and the need for specific Python versions. Detailed steps for installation on Windows are provided, including links to necessary packages and commands for integration with ComfyUI.
    • Compatibility and Performance: SpargeAttn is noted to be model dimension specific, with potential issues when tuning across different model sizes (e.g., 1.3B vs 14B models). Sliding Tile Attention is mentioned as an alternative that performs well with tuning but is currently limited to H100 cards.
    • Community Contributions: Kijai has incorporated SpargeAttn into the ComfyUI-WanVideoWrapper, showcasing community efforts to integrate new tools into existing frameworks. Users express hope for future native support of attention mechanisms like sage attention and triton to simplify installation processes.
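The quoted 1.83x figure follows directly from the two timings in the post, which is a quick way to verify the claim:

```python
full_attention_s = 1897  # reported wall time with full attention, L40 GPU
sparge_attn_s = 1037     # reported wall time with SpargeAttn

speedup = full_attention_s / sparge_attn_s
print(f"{speedup:.2f}x")  # → 1.83x
```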

Theme 3. AI Identity Confusions and Hallucinations

  • Groks thinks it is Claude unprompted, and doubles down on it after being called out (Score: 187, Comments: 54): Grok, an AI model, erroneously identified itself as Claude during a conversation with the head of a debate club and persisted in this claim even after being questioned. The incident, detailed in a conversation shared on X, raises questions about the underlying cause of this identity confusion.
    • Several users speculate that Grok’s identity confusion might stem from its training data, which includes outputs from older models like Claude. There’s a belief that xAI’s post-training might have been less thorough due to its newness and an attempt to reduce bias, leading to such errors.
    • The incident is viewed humorously by some, with comments highlighting the absurdity of the debate club’s questioning of smallpox’s existence. This has led to skepticism about the legitimacy of the debate club, with some users suggesting it resembles a conspiracy group.
    • There are suspicions that Grok might be using Claude’s technology underneath or trained on its datasets, similar to Deepseek using ChatGPT data, raising concerns about the legality and ethics of such practices.
  • GPT-4.5 will just invent concepts mid-conversation (Score: 348, Comments: 75): GPT-4.5 is noted for its ability to invent concepts during interactions, as highlighted in a Twitter post by Aaron Ng. In a conversation snippet, the AI invents the “CLEAR Model” specifically for the interaction, demonstrating its dynamic conversational capabilities.
    • Peter Hawkins originally invented the CLEAR Model, and GPT-4.5’s reference to it is a form of hallucination, as noted by I_am_John_Mac with a link to hotpmo.com. This highlights GPT-4.5’s tendency to create concepts that may not be accurate or original.
    • There is a humorous tone in the discussion about turning hallucinations into a feature, with some users joking about the AI possibly filing patents or claiming intellectual property on its hallucinated concepts.
    • The hallucination rate of GPT-4.5 is noted to be 37.1%, which is lower than GPT-4o’s rate of 61.8% and o1’s rate of 44%, as mentioned by Hexpe and vingeran, suggesting an improvement in accuracy over previous models.

Theme 4. AI Tools Streamlining Programming and Writing

  • I made a simple tool that completely changed how I work with AI coding assistants (Score: 167, Comments: 41): CodeSelect is a tool designed to streamline the process of sharing code with AI coding assistants like Claude and ChatGPT by displaying project structures as a checkbox tree, allowing quick file selection, and automatically detecting file relationships for better context. This lightweight tool, which installs with a single command and has no external dependencies, significantly reduces preparation time and improves AI response quality by providing proper context, and is available on GitHub.
    • Repomix is highlighted as an alternative tool for managing code project structures, with a simple command (cd myProject && npx repomix) that works on any folder and outputs a draggable file, which users find effective for project management.
    • Users discuss integrating a Gemini powered agent into CodeSelect to suggest edits and file references to Claude, aiming to enhance efficiency and save tokens during the coding process.
    • Claude’s GitHub integration is noted for its ability to manage project-wide changes, such as renaming variables and updating comments, which users find impressive for maintaining project context without manual input.
  • Just bit the bullet and got a yearly Claude Pro subscription (Score: 104, Comments: 128): The author praises the Claude Pro subscription as a transformative tool for daily tasks, analytics, creative problem-solving, and software engineering, highlighting its effectiveness in debugging and code reviews. They express satisfaction with Anthropic’s product, contrasting it with criticisms of Claude 3.7 for being too concise, and emphasize the significant advancement it represents over traditional search engines.
    • Users discuss usage limits as a significant issue with the Claude Pro subscription, with some suggesting strategies like starting new chats to manage limits effectively. Others express frustration with hitting limits frequently, which disrupts their workflow, while some users report rarely encountering these issues by keeping conversations short.
    • There is skepticism about posts praising Claude Pro being genuine, with some users suspecting them to be part of a marketing campaign. This suspicion is fueled by the timing of posts with promotional emails and the repetitive nature of positive endorsements, though others argue the discussions are genuine due to the subreddit’s focus.
    • Subscribers debate the value of a yearly subscription versus monthly payments, with some regretting the purchase due to decreasing quality and restrictive usage limits. Others find the subscription beneficial for their work, suggesting that the decision should depend on personal use cases and the rapidly evolving AI landscape.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. GPT-4.5 Enters Arena, but Claude 3.7 Still King of the Code

  • GPT-4.5 Fails to Impress, Price Tag Stings: Early testers find OpenAI’s GPT-4.5 overpriced at $150 per million output tokens and not significantly better than GPT-4 Turbo for coding, with many developers still favoring Claude 3.7 Sonnet for its superior performance in software engineering tasks. Early benchmarks on aider’s polyglot coding benchmark showed GPT-4.5 scoring 45% compared to Sonnet 3.7’s 65%, leading to disappointment and questions about its value proposition given the high API cost.
  • Claude 3.7 Sonnet Faces Load Issues, Remains Top Coder: Despite reports of high load messages and refusals, Claude 3.7 Sonnet is still considered the best model for software engineering due to its ability to accurately follow instructions and debug code effectively. Users highlight Claude 3.7’s improved instruction following and debugging capabilities, even though some speculate Anthropic is making the model harder to use.
  • DeepSeek R2 Hype Train Gathers Steam: Anticipation is building for DeepSeek’s R2 model, with some members expecting it to surpass current SOTA models and disrupt corporate hype, as DeepSeek’s Chatbot already outperforms existing models in coding. Members compare DeepSeek’s R1 model favorably to OpenAI’s o1, further fueling excitement for the upcoming R2 release.

Theme 2. IDE Wars: Cursor and Windsurf Trade Blows Over AI Coding Supremacy

  • Cursor Plagued by Bugs, Users Cry Foul: Users report Cursor IDE is riddled with bugs, experiencing frequent crashes and lost code changes after updates, with some considering disabling auto-updates and waiting for more stable releases. Frustration mounts as some users claim the coding quality of Claude 3.7 on Cursor has declined since launch.
  • Windsurf AI Jumps on GPT-4.5 Bandwagon, Questions Emerge: Windsurf AI integrated GPT-4.5 in Beta, but early tests show it’s significantly more expensive and not as strong for software engineering, sparking debate if this move is genuine or propaganda against Cursor. Users question Windsurf’s pricing model, specifically flow credits, finding Cursor’s pricing more straightforward.
  • Memory Banks in Cursor Deemed ā€œPointlessā€ and Costly: Cursor’s Memory Banks feature is criticized as inefficient and expensive, with users reporting costs reaching $50 a day on the Claude 3.7 API. Because memory banks occasionally make mistakes or hallucinate, some users conclude that hiring a human programmer is more cost-effective.

Theme 3. Hardware Hustle: DeepSeek’s DualPipe and TinyLM Offer Glimmers of Innovation

  • DeepSeek’s DualPipe Declares War on Pipeline Bubbles: DeepSeek AI released DualPipe, a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training, aiming to reduce pipeline bubbles compared to traditional methods. This release, along with EPLB, an expert-parallel load balancer, is part of a week-long series of releases from DeepSeek AI.
  • TinyLM Unleashes Client-Side LLMs with WebGPU Fury: tinylm v0 launched, a library enabling client-side LLMs in browsers or Node.js with WebGPU acceleration, boasting zero-cost inference and complete privacy with an OpenAI-compatible API. tinylm supports text generation, embeddings, and real-time token streaming, and eliminates the need for servers for local LLM inference.
  • NVIDIA Shifts Tensor Core Focus to FP4, Leaving INT4 Behind?: NVIDIA appears to be shifting away from INT4 Tensor Cores towards FP4, with Blackwell GPUs featuring FP4, while Ada had INT4 and Hopper had INT8, raising questions about the future of INT4 precision in NVIDIA’s hardware strategy. Benchmarks suggest NVIDIA is prioritizing FP4 for quantized model training, potentially impacting future hardware development and software optimization strategies.

Theme 4. Pricing Pressure: GPT-4.5 API Costs Spark Outrage, Open Source Alternatives Beckon

  • GPT-4.5 API Pricing Deemed ā€œInsane,ā€ Users Seek Alternatives: OpenAI’s GPT-4.5 (Preview) API pricing at $75 input / $150 output per million tokens is met with harsh criticism, with users decrying the exorbitant cost compared to models like Grok3 and Claude Sonnet 3.7, questioning its value and prompting some to consider open-source alternatives. The high cost of GPT-4.5 raises concerns about accessibility and sustainability for developers and researchers.
  • Deepinfra Underprices Fal AI by 100x, Claims User: A user claims Deepinfra is 100x cheaper than Fal AI for character processing, charging $0.8 per million characters and offering free compute, contrasting with Fal AI’s $50 free credit, and suggesting Kokoro TTS as another low-cost alternative. This pricing discrepancy highlights the competitive landscape and cost-saving opportunities in the AI infrastructure market.
  • Windsurf Users Question Flow Credits, Find Cursor Pricing ā€œPreferableā€: Windsurf’s pricing model, particularly flow credits and additional flow action costs, is confusing to users, leading some to prefer Cursor’s more straightforward pricing approach. Users express concern about the disproportionate cost of additional flow actions, impacting the perceived value and transparency of Windsurf’s pricing structure.

Theme 5. Community Pulse: From Robotics Arms to LeetCode for CUDA, Innovation Thrives

  • Hobbyists Unite to Build DIY Robotics Arm: Members in LM Studio Discord are enthusiastically discussing building a robotics arm from scratch, leveraging affordable 3D printers like the $100 Creality Ender 3 V2 and open-source resources for learning servos, CAD, and microcontrollers. This project showcases the community’s hands-on approach to learning and applying AI and robotics principles.
  • LeetCode for CUDA Arrives, Challenges GPU Gurus: The CUDA community celebrates the beta release of LeetCode for CUDA, a new platform offering coding challenges specifically designed for CUDA development, inviting users to test their skills and provide feedback. This new platform fosters a competitive and collaborative environment for improving CUDA programming skills.
  • Hugging Face Community Fixes Microsoft’s Phi-4 Mini Fiasco: Microsoft’s Phi-4 mini model was found to be completely unusable due to bugs, prompting the Unsloth AI team to upload fixed versions on Hugging Face after Microsoft failed to incorporate Unsloth’s bug fixes. This community-driven effort highlights the collaborative nature of open-source AI development and the importance of rapid response to critical issues.

PART 1: High level Discord summaries

Cursor IDE Discord

  • GPT-4.5 Underwhelms Testers with Hefty Price Tag: Early testers find GPT-4.5 from OpenAI overpriced and not significantly better than GPT-4 Turbo, noting the cost of $150 per million output tokens.
    • The consensus is that Claude 3.7 Sonnet remains superior for coding, leading some to call GPT-4.5 ā€œjust bigā€ and highlight its lack of new frontier capabilities.
  • Claude 3.7 Sonnet Faces High Load and Refusal Issues: Users report issues with Claude 3.7 Sonnet, including frequent high load messages and refusals to answer certain prompts, with some speculating about whether Anthropic is making the model more difficult to use.
    • Despite these issues, many still consider Claude 3.7 Sonnet the best model for software engineering due to its ability to accurately follow instructions and debug code effectively.
  • Cursor Riddled with Bugs and Update Woes: Multiple users reported frequent crashes, lost code changes, and the need to reinstall Cursor after updates, suggesting the latest versions may be impacting performance and stability.
    • Others suggested disabling auto-updates and waiting for a more stable release, and some users claim the quality of Claude 3.7’s coding on Cursor has declined since launch.
  • Windsurf AI Boasts Quick GPT-4.5 Integration: Windsurf AI announced that GPT-4.5 is now available in Beta on Windsurf, but noted that early testing shows that it’s significantly more expensive (>10x) than alternative models, and is not as fast nor as strong as existing models for software engineering or tool calling.
    • Users debate whether Windsurf’s move is mere propaganda to attack Cursor or a genuine effort to provide access to the latest models, even with limitations, according to this tweet.
  • Memory Banks Fall Short of Expectations: Discord members report that memory banks seem very inefficient and, beyond being expensive, that using the Claude 3.7 API can easily reach $50 a day.
    • The inefficiency arises because memory banks sometimes make mistakes or hallucinate, making it cheaper to hire a programmer.

aider (Paul Gauthier) Discord

  • GPT-4.5 Falls Flat, Claude 3.7 Dominates: Early benchmarks show disappointing coding performance of GPT-4.5 Preview, scoring 45% on aider’s polyglot coding benchmark compared to Sonnet 3.7’s 65%, leading members to believe it is intended to be a ā€œfriendlyā€ non-reasoning language model.
    • Despite GPT-4.5’s release, Claude 3.7 remains the top choice for complex coding problems, outperforming GPT-4.5 on coding benchmarks while also being easier to jailbreak.
  • DeepSeek R2 Hype Intensifies: Members are highly anticipating DeepSeek’s R2 model, expecting it to surpass current SOTA models and disrupt corporate hype, with some comparing DeepSeek’s R1 model to o1.
    • The anticipation stems from the sentiment that DeepSeek’s Chatbot already outperforms existing models in coding capabilities.
  • Aider Users Advocate for Auto-Retry Mode: Users are requesting an auto-retry mode for Aider to address the unreliability of models like Deepseek R1, proposing a fallback mechanism to another model if the primary one fails.
    • The request highlights the need for more reliable model performance to enhance the Aider coding experience.
  • Sam Altman Blames the Great GPU Shortage for GPT-4.5’s Insane API Price: Sam Altman admitted difficulty in meeting GPU demand, which is forcing access to GPT-4.5 behind a higher paywall.
    • Some members speculate that GPT-4.5’s API is priced so high because serving the model would otherwise be unaffordable.
  • Aider Configuration with Venice AI is now possible: Members are exploring configuring Aider to function with Venice AI, an LLM provider utilizing an OpenAI-style API endpoint, by setting the OPENAI_API_BASE and OPENAI_API_KEY environment variables as described in the OpenAI compatible API documentation.
    • If you would like to use Claude 3.7 with thinking, an example configuration in aider.conf.yaml shows how to set up the model for the editor with thinking enabled.
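A minimal sketch of the environment-variable setup described above, per aider’s OpenAI-compatible API docs; the base URL, key, and model name below are placeholders, not Venice AI’s actual values:

```shell
# Point aider at any OpenAI-style endpoint (substitute Venice AI's real
# base URL, API key, and model name from their documentation).
export OPENAI_API_BASE=https://example-provider.com/v1
export OPENAI_API_KEY=sk-your-key-here

# The openai/ prefix tells aider to route the model through its
# OpenAI-compatible client:
#   aider --model openai/some-model-name
```

The same two variables work for any provider exposing an OpenAI-style endpoint.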

OpenAI Discord

  • GPT-4.5 Skips Multimodal Features: OpenAI released a research preview of GPT-4.5, their largest and best model for chat, rolling out to ChatGPT Pro users first, but GPT-4.5 currently does not support multimodal features such as Voice Mode, video, and screensharing in ChatGPT.
    • Initial testing indicates that GPT-4.5 feels more natural due to its broader knowledge base, improved ability to follow user intent, and greater ā€œEQā€, making it useful for improving writing, programming, and solving practical problems.
  • Anonymous Model Shadows Sonnet 3.7: An anonymous model is rumored to be around Sonnet 3.7’s performance, sparking speculation that if it’s GPT 4.5, it’s underwhelming given the model size.
    • Members speculated that if OpenAI releases a model that is bigger but performs the same as Sonnet 3.7, then they are behind the competition, even if the model is non-thinking.
  • Cracking LLM’s Creative Prose: When using LLMs for creative writing, defining a deep background for characters and directly discussing alternate routes can enhance the narrative’s depth and avoid repetitive emotional scenes and clichĆ©s.
    • Experiment with having ChatGPT generate conversations and interactions first, followed by a narration from the writer’s perspective, steering it towards desired directions.
  • Peeking at OpenAI’s Model Spec: OpenAI released its Model Spec which outlines the intended behavior for the models that power OpenAI’s products, including the API platform.
    • The goal is to create models that are useful, safe, and aligned with the needs of users and developers while advancing their mission to ensure that artificial general intelligence benefits all of humanity.

Unsloth AI (Daniel Han) Discord

  • Unsloth Unsnarls Phi-4 Mini Fiasco: Members reported issues with Microsoft’s Phi-4 mini, and the Unsloth team uploaded fixed versions on HF.
    • The team stated that Microsoft didn’t use Unsloth’s bug fixes, leading to the model being completely unusable.
  • DeepSeek Drops DualPipe Delight: DeepSeek AI released DualPipe, an algorithm for computation-communication overlap in V3/R1 training, which includes EPLB, an expert-parallel load balancer, optimized for V3/R1.
    • The release is part of a series of releases this week from DeepSeek.
  • GRPO Reward Functions Get Groomed: Community members debugged and improved the reward functions in the GRPO notebook, adding re.DOTALL flag for multiline XML matching, correcting a typo in count_xml, and addressing issues with integer rewards.
    • Community members recommended a block size of 128 as ideal, and an effective size of 64/128 as more stable.
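The re.DOTALL fix matters because `.` does not match newlines by default, so a multiline XML-tagged block silently fails to match; a minimal sketch (the tag names are illustrative, not the GRPO notebook’s exact format):

```python
import re

text = "<reasoning>\nstep 1\nstep 2\n</reasoning>\n<answer>42</answer>"

# Without re.DOTALL, '.' stops at newlines, so the multiline <reasoning>
# block never matches and the reward function silently scores zero.
strict = re.search(r"<reasoning>(.*?)</reasoning>", text)
fixed = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)

print(strict)                  # None: the pattern fails on the newlines
print(fixed.group(1).strip())  # prints the two steps
```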
  • Ollama’s Think-Token Trickery Troubles Users: A user found that Ollama appends a token to prompts, which prevents the model from generating it, requiring adjustments to output parsing for tags.
    • The user suggested that disabling this feature would be helpful, acknowledging that it stems from the model’s processing class.
  • Inception Labs Invents Mercury dLLM: InceptionAILabs introduced Mercury, a diffusion large language model (dLLM), to advance intelligence and speed through parallel, coarse-to-fine text generation.
    • Challenges remain deploying such models, especially lack of OS support and difficulties extending context length could be bottlenecks.

Codeium (Windsurf) Discord

  • Claude 3.7 Prompt Actions Inflated: The team is working with Anthropic to address higher flow actions per prompt in Claude 3.7 Sonnet compared to Claude 3.5 Sonnet.
    • They advise using 3.7 for precise tasks and 3.5 for balanced performance.
  • Claude 3.7 Credit Multiplier Reduced: The credit multiplier for Claude 3.7 Sonnet Thinking decreased from 1.5 to 1.25 due to initial token usage data.
    • Users now consume 1.25 user prompt credits and 1.25 flow action credits per tool call.
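Those multipliers compose per tool call, so flow-action credits dominate on agentic prompts; a quick sketch of the arithmetic (the tool-call count is illustrative):

```python
# Credit usage for one Claude 3.7 Sonnet Thinking prompt on Windsurf,
# using the announced 1.25x multipliers.
PROMPT_MULTIPLIER = 1.25       # user prompt credits per prompt
FLOW_ACTION_MULTIPLIER = 1.25  # flow action credits per tool call

def credits_for_prompt(tool_calls: int) -> tuple[float, float]:
    """Return (prompt credits, flow action credits) for one prompt."""
    return PROMPT_MULTIPLIER, FLOW_ACTION_MULTIPLIER * tool_calls

# A prompt that triggers 4 tool calls:
prompt_credits, flow_credits = credits_for_prompt(4)
print(prompt_credits, flow_credits)  # 1.25 5.0
```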
  • Cascade Crashes Cause Consternation: Users reported that Cascade isn’t working due to a resource_exhausted error, according to a Feature Request.
    • Members are encouraged to follow the roadmap to stay updated.
  • Windsurf Users Question Pricing: Members express confusion over Windsurf’s pricing, specifically regarding flow credits and the cost of additional flow actions.
    • Some users found Cursor’s pricing preferable for its straightforward approach.
  • GPT-4.5 Enters Beta: GPT-4.5 is available in @windsurf_ai on rolling beta, but is significantly more expensive (>5-10x GPT-4 Turbo), rate limits are stricter, and it is being incrementally rolled out to users.
    • Early testing of GPT-4.5 shows it may not be the best code model. Tweet from Windsurf about GPT-4.5.

GPU MODE Discord

  • DeepSeek’s R1 Model Rocks Reasoning Realm: DeepSeek’s R1 model enhances reply quality via chain of thought generation, matching OpenAI’s o1 on benchmarks and providing open-source access, as detailed in their technical reports and the DeepSeek API documentation.
  • AIE Toolchain Troubles Trounce Techies: A member struggled with AMD’s Zen 5 NPU and AIE toolchain, noting the difficulty compared to Intel, finding Linux support merged recently but installation remains complicated.
    • The member suggested that NPU BLAS was easier to run on Intel architecture.
  • NVIDIA Abandons INT4 TensorCores: A member observed NVIDIA shifting from INT4 Tensor Cores to FP4, sharing quantized model benchmarks.
    • Another member clarified that Ada had INT4, Hopper had INT8, and Blackwell features FP4.
  • CUDA Community Gets Leet-ified: The CUDA community highlights the release of LeetCode for CUDA in beta, inviting users to try it out and provide feedback, but users should expect some hiccups due to its beta status.
    • In related news, NVIDIA is hosting invite-only, hands-on CUDA C++ and CUDA Python tutorials the day before GTC 2025 on Sunday, March 16, 2025, from 12-4 PM, and also invites you to the GPU MODE event from 5-10 PM (lu.ma/8w1ehhrw).
  • Diffusion Models Demolish LLMs in Generation Speed?: Members reported that Diffusion models can achieve super-speedy generation on GPUs, surpassing Groq/Cerebras, and do much better at ā€œfill-in-the-middleā€ (FIM) compared to other models like DeepSeek V2 Lite (tweet).
    • They highlighted Mercury by Inception Labs, the first commercial-grade diffusion large language model (dLLM) with parallel, coarse-to-fine text generation, claiming to be up to 10x faster than speed-optimized LLMs, achieving over 1000 tokens/sec on NVIDIA H100s.

OpenRouter (Alex Atallah) Discord

  • OpenAI Suffers Outage: OpenRouter experienced an OpenAI provider outage, which has been resolved after being identified as an incident on OpenAI’s side.
    • Requests are now succeeding, and OpenAI as a provider on OpenRouter has recovered.
  • DeepSeek R1 Runs Fast with SambaNovaAI: The 671B-param DeepSeek R1 is now available via SambaNovaAI on OpenRouter, delivering 150 tokens/second.
  • Sonnet 3.7 Gains Capacity Boost and Browsing: Claude Sonnet 3.7 now features significantly higher rate limits and web search capability on OpenRouter.
  • GPT-4.5 (Preview) Launches at Premium Price: GPT-4.5 (Preview), designed to push boundaries in reasoning, creativity, and long-context conversations, is now available on OpenRouter, costing $75/M input tokens and $150/M output tokens.
    • The announcement links to the OpenAI blog post and a discussion on X, with community members decrying the exorbitant cost compared to models like Grok3 and Claude Sonnet 3.7.
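The quoted rates make per-request costs easy to sanity-check; a back-of-envelope sketch (the function name and request sizes are illustrative):

```python
# GPT-4.5 (Preview) on OpenRouter: $75 per million input tokens,
# $150 per million output tokens.
def gpt45_preview_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the quoted rates."""
    return input_tokens / 1_000_000 * 75 + output_tokens / 1_000_000 * 150

# A single request with 10k input and 2k output tokens:
print(f"${gpt45_preview_cost(10_000, 2_000):.2f}")  # $1.05
```

At roughly a dollar per moderate request, agent loops and long-context chats become expensive fast, which is the core of the community’s complaint.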
  • Users Track API Usage with YPerf: A member created YPerf.com to monitor model API usage and performance across OpenRouter.
    • The Gemini Flash 1.5 8B ranks #66, costing $0.04, with 0.52s latency and 419.8T/s throughput.

LM Studio Discord

  • Hobbyists Building DIY Robotics Arm: Members discussed building a robotics arm from scratch to learn about servos, CAD, and microcontrollers, recommending a $100 Creality Ender 3 V2 printer from Microcenter.
  • Debating LLM Backends for Websites: Members discussed how to implement an LLM in a website, with suggestions including using websockets, SSR, AnythingLLM, and code editors like Cursor and Continue.dev.
    • It was clarified that hosting a website on GitHub Pages would require the LLM to be hosted elsewhere (Azure, cloud, ngrok).
  • Grok-3’s Performance Surprises Members: Members discussed the surprisingly good performance of Grok-3 vs the previous O3 model on various benchmarks, questioning if X.ai’s benchmarks were accurate or misleading.
    • The users debated if Grok-3 was rushed to market without proper ethical red-teaming, while others argued that Grok 3 is a beta, monitored, and not on API due to safety reasons.
  • Framework Desktop Features Unified RAM: The Framework desktop features unified RAM between the CPU and GPU, offering up to 128GB of shared memory, with approximately 90GB available for the GPU.
    • One user likened it to a MAC setup, highlighting the appeal of unified RAM in a PC.
  • GMK Announces Ryzen AI Mini-PC: GMK announced the world’s first mini-PC based on AMD Ryzen AI 9 Max+ 395, expected to hit the market in the first or second quarter.
    • This mini-PC will feature Zen 5 architecture with up to a 16-core/32-thread configuration and powerful integrated graphics based on the RDNA 3.5 architecture.

Interconnects (Nathan Lambert) Discord

  • Phi-4 Multimodal Family Gets Launched: Microsoft launched the Phi-4 family of small language models (SLMs), including Phi-4-multimodal (processes speech, vision, and text) and Phi-4-mini (excels in text-based tasks), available in Azure AI Foundry, HuggingFace, and the NVIDIA API Catalog.
    • Some users doubt claims that it has similar multimodal performance to Gemini Flash lite.
  • Leaked GPT-4.5 System Card Sparks Debate: A user shared the GPT-4.5 System Card available here, indicating that interacting with GPT-4.5 feels more natural and that internal testers report GPT-4.5 is warm, intuitive, and natural.
    • The card notes that it improves GPT-4’s computational efficiency by more than 10x, yet some call the card very boring, while others interpret the card to indicate that GPT-4.5 is a creative writer while Sonnet 3.5 is a problem solver.
  • OpenAI Launches GPT-4.5, Character Mainstream?: OpenAI launched GPT-4.5 as a research preview, available to OpenAI Pro users and API developers with image + text in, text out and same context as 4o model, trained till June 2024, official announcement here.
    • A user notes that character/personality is becoming a mainstream topic, and OpenAI aggressively used low-precision training, and is now priced at $75 per million input tokens and $150/million for output.
  • GPT-4.5 Benchmarks Disappoint: Early benchmarks of GPT-4.5 show it being outperformed by o1 on several problems, indicating pre-training isn’t the optimal place to spend compute in 2025.
    • One user notes the hallucination metrics are very good while another believes in 1-2 years this will be the default model size.
  • Anthropic Gets Called Out On Sneaky Data: A user accused Anthropic of sneaky data collection from the Computer Use API, using it to train classifiers for corporate ethical guidelines, and updating their website to appear transparent, according to this fxtwitter thread.
    • It was inferred that Anthropic used user data based on their summarization-for-monitoring blogpost, although a user pointed out that the data source for training remains unspecified.

Latent Space Discord

  • Speak AI Sees Hockey-Stick Growth: Paul Graham shared Speak AI’s revenue graph showing a novel variant of exponential growth, where a company selling a new year’s resolution product sees sustained usage due to its effectiveness.
    • Swyx and others observed this unique growth pattern.
  • Hume AI’s Octave Sings Emotionally: Hume AI launched Octave, a new LLM for text-to-speech that can design voices with prompts and control emotion and delivery, with a creator studio for long-form content production.
    • The model understands how meaning affects delivery to generate emotional, human-like speech, unlike traditional TTS systems.
  • Diffusion LLM Mercury Rises: Inception Labs introduced Mercury, the first commercial-grade diffusion large language model (dLLM), which promises parallel, coarse-to-fine text generation.
  • Karpathy Shares LLM Wisdom: Andrej Karpathy released a 2h11m YouTube video on How I Use LLMs, a practical guide to the LLM ecosystem with examples, including tool use, file uploads, audio/video I/O, memory, and custom GPTs.
    • The video covers topics such as ChatGPT interaction, tool use (internet search, deep research, Python interpreter), Claude Artifacts, Cursor Composer, Speech I/O, NotebookLM, and image/video I/O.
  • GPT-4.5 Launch Underwhelms: Members experienced initial technical difficulties and felt the GPT-4.5 launch stream was a disappointment, with descriptions such as hostage video.
    • The new model doesn’t have an API, and is focused on heavy-tail, real world edge cases like responding to angry texts.

Nous Research AI Discord

  • Wan2.1 Model a Video Diffusion Milestone: The release of Wan2.1, an open and advanced large-scale video generative model, is considered a pivotal moment for video models, similar to Stable Diffusion.
    • Users are excited to see how this model will be used to disrupt the current set of problems and issues when it comes to video diffusion.
  • GPT-4.5: More Compute, Less Impressive?: GPT-4.5 has been released, is more compute-intensive than GPT-4o, with Sam Altman saying that this model feels like talking to a thoughtful person.
    • Despite Karpathy claiming it has 10x more pretraining compute than GPT-4, its use case might be limited given it is overfit on the river crossing puzzle and geared towards creative use cases.
  • Apple Intelligence Gets Thumbs Down: Members found Apple Intelligence underwhelming, calling it a shift from business API use to consumers, and stating Apple is in an edge-inference-first trap.
    • Some argued that Apple should have prioritized making AI as good as possible rather than focusing on on-device constraints, and that the edge-inference-first constraint ultimately undermined the effort.
  • Mercury dLLM: Lightning Fast Diffusion LLM: Inception Labs launched Mercury, a diffusion large language model (dLLM) family that they claim is 10x faster than optimized LLMs, achieving over 1000 tokens/sec on NVIDIA H100s.
    • A code generation model, Mercury Coder, is available for testing in a playground.
  • Reasoning Toggle via Voice?: A user asked about toggling reasoning in an AI model via voice commands, aiming for 90% reasoning off unless specifically prompted with phrases like ā€˜use reasoning’.
    • The user is trying to add a system prompt to achieve this and finetune the reasoning process and enable text-to-speech functionality, potentially with Elevenlabs or Cartesia.

HuggingFace Discord

  • Deepinfra Decimates Fal AI Dollars?: A user claimed Deepinfra is 100x cheaper than Fal AI for character processing, charging $0.8 per million characters and offering free compute.
    • They stated that Fal AI offers $50 free credit, while suggesting Kokoro TTS as another low-cost alternative.
  • REFUTE Benchmark Reckons Reasoning: The REFUTE benchmark assesses Language Models (LMs) in their ability to falsify incorrect algorithmic solutions, revealing even top agents score a low 9%.
    • The paper introducing the benchmark advocates for challenging solutions rather than merely generating them, emphasizing the importance of falsification in scientific discovery with a link to the paper.
  • Smolagents Quiz is a Pain: Multiple users reported issues with the smolagents course quizzes, including display problems with the iframe making feedback unreadable, and contradictory validation from the agent regarding the id argument in HfApiModel.
    • Users expressed frustration over discrepancies between the quiz’s security settings and current documentation, as well as confusion about model implementation with HfApiModel versus LiteLLMModel.
  • NVIDIA Neutralizes Nasty Needle Attacks: The NVIDIA AI Red Team identified that prompt injection can exploit plug-ins in the LangChain library.
    • They warned that prompt injection is a new attack technique specific to large language models (LLMs) that enables attackers to manipulate the output of the LLM.
  • PyTorch360Convert Presents Panoramic Potential: A member introduced pytorch360convert, a new lightweight PyTorch library to simplify working with 360° images for VR, AR, video games, and more, available via pip install pytorch360convert.
    • The library supports various image representations, including equirectangular images and cubemaps, and is GPU/CPU compatible with multiple precision types, available on GitHub.

Perplexity AI Discord

  • Voice Mode Vigorously Vouched For: Members discussed the new voice mode feature, noting improvements in UI, the ability to interrupt, and changes to voices.
    • While some users found it impressive, others felt it didn’t quite match the level of Microsoft Copilot, Grok 3, or ChatGPT.
  • GPT-4.5 Gossip Grows Galore: Users discussed the potential integration of GPT-4.5 into Perplexity, referencing a YouTube demo and noting it as a model with greater context and more human-like responses.
    • A user shared a link from Sam Altman on X mentioning that GPT-4.5 is the first model that feels like talking to a thoughtful person.
  • Perplexity Users Share Many Perplexity Links: Several users shared an array of Perplexity AI search and page links, spanning topics from quantum computing to AI communication.
    • These links also included discussions around building a house, and AI-driven diagnoses.
  • API Credit Confusion Causes Concerns: A user inquired about the number of API calls and searches possible with the $5 API credit included with Perplexity Pro, and how to pay if they exceed the given credit.
    • A user also asked about how to get a refund if the API is recharged by mistake and remains unused.
  • Web Clipper Configuration Catastrophe: A user is experiencing issues configuring the Perplexity API with the sonar-deep-research model in Obsidian Web Clipper despite setting the correct Base URL and API Key.
    • The user has provided screenshots of their configuration and the failure message, seeking assistance with troubleshooting.

Stability.ai (Stable Diffusion) Discord

  • Stability AI Kicks off Website Redesign Competition: Stability AI launched a Website Redesign Contest for the Stable Diffusion community to showcase their best work, submissions close on Friday, March 7th.
    • Winning images will be featured on Stability AI’s official website, and entries must use Stable Diffusion 3.5 as a base.
  • SD Community Hooked on T5 CLIP: A member sought an SDXL-like model with T5 CLIP integration, saying they had a taste of T5 prompt adherence in SD3.5.
    • They found the T5 adherence addictive and were looking for an alternative.
  • ControlNet Models Craze Rages On: A member asked for recommendations for the best ControlNet models to maintain character consistency in SDXL.
    • They specifically requested a reference U-Net model, if available.
  • ComfyUI Remote Installs Now on Sale: A member mentioned selling ComfyUI workflows and remote installs to make them work for users, typically using TeamViewer.
    • They clarified that they charge for their time and knowledge, rather than the workflow itself.
  • Inpaint Anything Hits Snag: A member reported a shape mismatch error in Inpaint Anything: value tensor of shape [159, 256] cannot be broadcast to indexing result of shape [64, 256].
    • The member was using Automatic1111 with the Inpaint Anything extension and asked how to resolve this error.

Eleuther Discord

  • HF Deprecation Feature Fail: A member tried to mark a repo as deprecated on Hugging Face with a link to a newer version, but discovered the feature only applies to models, not datasets.
    • Another member suggested that for small corpora, prompting an LLM to check for relevance is better than tweaking embeddings and rerankers.
  • DeepSeek Doubles Down with DualPipe: DeepSeek released DualPipe, a bidirectional pipeline parallelism algorithm designed to overlap computation and communication in V3/R1 training.
    • A user expressed hope that DeepSeek would release its entire pretraining framework, including core bits, on the final day.
  • Gemini’s Flash Thinking Benchmarked Internally: Members discussed Gemini 2.0 Flash Thinking, Google’s enhanced reasoning model that shows its thoughts to improve performance and explainability, particularly in math and science.
    • Some suspect the model was benchmarked internally but not published due to underperformance compared to O3 Mini.
  • MI Community Opens Doors with Survey: A survey paper representing many of the major mech interp groups was shared, titled open problems in mechanistic interpretability.
    • Also, 50+ intermediate checkpoints for ALL the SmolLM2 models were released, in the hopes of helping people learn about interpretability.
  • QA Harness sparks question of tasks structures: A member inquired about evaluating QA tasks like ARC-Easy and ARC-hard using a harness, questioning why the concatenation only includes Question + Option instead of Question + Options + Answer for each option.

Yannick Kilcher Discord

  • Microsoft Dodges Dominance Death?: A member claimed Microsoft relies on government support instead of true innovation, while another cited Yahoo as an example of resources not guaranteeing success.
    • The exchange underscored the complex dynamics of market dominance and the importance of innovation beyond financial backing.
  • AI Outputs: Meaningful but Mutable: Members debated how non-deterministic AI models can exhibit deterministic behavior, especially regarding code generation in Cursor.
    • It was noted that AI models generate outputs with the same meaning, even with changes in comments and variable names; the meaning of the output is similar but the literal output changes.
  • GPT-4.5 Focuses on Preference, Not Progress?: The release of GPT-4.5, as introduced in Introduction to GPT-4.5 YouTube video, emphasizes user preference and helpfulness.
    • Some suggest OpenAI felt pressured by Grok-3 and Claude 3.7, leading to the release and increased pricing of $75 per million input tokens and $150 per million output tokens.
  • Alexa’s AI Upgrade Costs Extra?: The new Alexa, codenamed Remarkable, might require a monthly subscription between $5 and $10 according to tomsguide.com.
    • It remains uncertain if users will pay for Alexa, considering that Google, Samsung, and Apple offer their AI services for free.
  • Hashing Out KV Similarity: Discussions covered hash collisions, where the implementation aims to induce collisions when qáµ€k_i is high, leveraging the collision probability P(h(q) == h(k_i)) where h is a hash function, as described in arxiv.org/pdf/2502.03387.
    • Hash collisions are used as a metric to remove similar key-value pairs.
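The intuition is the one behind locality-sensitive hashing schemes such as SimHash: a random-hyperplane hash collides with probability 1 āˆ’ Īø/Ļ€, where Īø is the angle between the two vectors, so similar query/key pairs collide far more often than dissimilar ones. A small illustrative sketch (generic LSH, not the paper's exact construction):

```python
import random

def simhash(v, planes):
    """One sign bit per random hyperplane: which side of the plane v falls on."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, v)) >= 0 else 0
        for plane in planes
    )

def collision_rate(q, k, dim, trials=5000):
    """Empirical P(h(q) == h(k)) over freshly drawn single-plane hashes."""
    hits = 0
    for _ in range(trials):
        planes = [[random.gauss(0, 1) for _ in range(dim)]]
        hits += simhash(q, planes) == simhash(k, planes)
    return hits / trials

random.seed(0)
dim = 8
q = [1.0] * dim
similar = [1.0] * (dim - 1) + [0.5]   # small angle to q -> frequent collisions
dissimilar = [-1.0] * dim             # opposite direction -> almost never collides

# Similar key vectors collide far more often, which is why a collision can
# serve as a cheap signal that a key-value pair is redundant and removable.
print(collision_rate(q, similar, dim), collision_rate(q, dissimilar, dim))
```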

Cohere Discord

  • Cohere Models play nice with OpenAI SDK: AI Engineers celebrated the ability to access Cohere models directly through the OpenAI SDK using the Quickstart Guide with demos for Python, TS, & cURL, plus streaming, tool calls, and structured outputs.
    • Sandra Kublik tweeted you can now access Cohere models directly through the OpenAI SDK.
  • Cohere releases Command R7B Arabic Model: Cohere released Command R7B Arabic, an R7B model optimized for Arabic, available on the Cohere Platform as command-r7b-arabic-02-2025 and on Hugging Face, with Ollama support arriving later today.
    • According to the release notes, it has a context length of 128,000 tokens and excels at enterprise tasks such as instruction following, length control, RAG, and responding in the correct language.
  • Community Hopes Command R+ update beats Mistral Large: Community members discussed and expressed their eagerness for an upcoming Command R+ update, hoping it will surpass Mistral Large 2411.
    • Members expect that specific release details are unlikely to be shared due to NDAs, and cautioned against spreading unconfirmed information.
  • Arabic LLMs get Benchmark Boost: There was community interest in benchmarking Cohere’s R7B Arabic model against Qatar’s Fanar model and Saudi’s ALLaM, with the suggestion to use the Arabic Balsam index.
    • A member shared a link to the GPT-4.5 system card which provides an overview of benchmarking methodology.
  • Adobe Premiere does Auto Transcriptions: A member suggested that Adobe Premiere has an auto transcription feature, and others confirmed its existence and availability.
    • Previously, community members discussed auto caption and auto subtitle options.

LlamaIndex Discord

  • LlamaIndex boosts Autism Care: LlamaIndex is helping CentralReach transform autism and IDD care with AI, boiling down mountains of research and paperwork into relevant insights and key points to enhance doctor efficiency.
    • The integration of AI in medical fields helps streamline complex data analysis, improving the speed and accuracy of diagnoses and treatment plans.
  • LlamaExtract simplifies Data Extractions: LlamaIndex’s LlamaExtract is now in public beta, simplifying structured data extraction from unstructured documents by enabling users to define and customize schemas for data extraction programmatically.
    • The new beta version aims to improve the efficiency of data processing workflows for LlamaIndex users.
  • LlamaParse springs Data Leak: A user reported a data leak in LlamaParse 0.6.2, where images and analyses from other users were mixed into their results, including sensitive information; the issue, confirmed as a mix-up with test/benchmark data, has been fixed in the backend API.
    • The reporter provided a list of Job IDs for investigation, emphasizing the importance of robust data segregation in multi-tenant systems.
  • Docs for LlamaExtract ā€˜Outdated’: A user noted that the create_agents method was missing in LlamaExtract 0.0.4, with confirmation that the project has moved to LlamaCloud Services, and that the documentation is outdated.
    • The relevant code is now in the llama_cloud_services repo, indicating a shift towards cloud-based knowledge agent management.
  • Searxng Search Engine Explored: A user inquired about integrating Searxng, a free meta-search engine, into the framework, suggesting a tool for enhanced search capabilities.
    • A member suggested using Searxng with an agent by putting it in a FunctionTool, despite it being a new integration.
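SearXNG instances expose a JSON search API (a GET to /search with format=json, when the instance enables JSON output), so wrapping it as a plain function is all an agent tool needs. A hedged sketch (the instance URL is a placeholder, and the commented LlamaIndex wrapping call may vary by version):

```python
from urllib.parse import urlencode

def build_searxng_url(base_url: str, query: str, **params) -> str:
    """Build a SearXNG JSON-API request URL (instance must allow format=json)."""
    qs = urlencode({"q": query, "format": "json", **params})
    return f"{base_url.rstrip('/')}/search?{qs}"

def searxng_search(query: str, base_url: str = "http://localhost:8080") -> str:
    """Query a SearXNG instance and return a compact text summary of results."""
    import json
    import urllib.request
    with urllib.request.urlopen(build_searxng_url(base_url, query)) as resp:
        results = json.load(resp).get("results", [])
    return "\n".join(f"{r['title']}: {r['url']}" for r in results[:5])

# Wrapping for an agent (illustrative; check your LlamaIndex version's docs):
# from llama_index.core.tools import FunctionTool
# tool = FunctionTool.from_defaults(fn=searxng_search)

print(build_searxng_url("http://localhost:8080", "mojo language"))
```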

DSPy Discord

  • Portkey AI Studio Launches with a Bang: Portkey AI has launched a Prompt Engineering Studio, an IDE for prompt engineers that allows testing across 1600+ models and offers improvements from an AI-powered assistant.
    • The studio features reusable templates, version control, prompt deployment, and performance tracking with real-time analytics; Portkey AI will host a live workshop on March 3rd to demo the studio, with signups available on Portkey’s website.
  • ReAct Struggles with Sequential Tool Use: A user questioned how to integrate tools requiring external pings with dspy.ReAct for tasks like creating text and sending emails, especially concerning orchestration.
    • The challenge involves ensuring the system understands the sequence of actions (text creation before email) when the email function necessitates external function calls.
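Independent of DSPy specifics, the sequencing problem becomes tractable once each tool declares its prerequisites and a runner resolves them before dispatch. A framework-agnostic sketch (all tool names and the runner are illustrative, not a dspy.ReAct API):

```python
# Each tool reads/writes a shared context; prerequisites guarantee that
# "create text" always runs before "send email".
def create_text(ctx):
    ctx["text"] = "Hello from the agent."

def send_email(ctx):
    assert "text" in ctx, "email requires the text to exist first"
    ctx["sent"] = f"emailed: {ctx['text']}"

TOOLS = {
    "create_text": (create_text, []),             # (fn, prerequisites)
    "send_email": (send_email, ["create_text"]),
}

def run(tool_name, ctx, done=None):
    """Run a tool, recursively satisfying its prerequisites first."""
    if done is None:
        done = set()
    fn, prereqs = TOOLS[tool_name]
    for p in prereqs:
        if p not in done:
            run(p, ctx, done)
    fn(ctx)
    done.add(tool_name)

ctx = {}
run("send_email", ctx)
print(ctx["sent"])
```

In a ReAct-style agent the same guarantee is usually approximated through tool descriptions and prompting, which is exactly the fragility the user was asking about.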
  • DSPy Release 2.6.7 Gets Yanked for Import Errors: Users reported a ModuleNotFoundError in dspy-ai==2.6.7, with a GitHub issue detailing the import failure, hindering module access.
    • Downgrading to version 2.6.6 resolved the issue; the faulty release was quickly yanked, and 2.6.8 was released to fix the import problems caused by a migration from setup.py to pyproject.toml.
  • MIPROv2 Runs Out of Token Budget: A user encountered a ContextWindowExceededError with MIPROv2, even after ensuring conversations were under 1000 characters and using light mode.
    • It was suggested that the user reduce the number of demos in the optimizer or set view_data_batch_size=3 in the .compile() call to address the token limit; this setting reduces the size of the data summary.
  • Refine API Evolving Feedback Loops: A user inquired about how to control advice/feedback passed to the LLM on subsequent retries with dspy.Refine, compared to older assertion methods.
    • Feedback will be returned in the reward_fn, and that dspy.Refine should now participate in the compilation feedback mechanism, allowing for optimization of previously unoptimizable suggestions.

Torchtune Discord

  • GPT-4.5 Lands on Azure: A member reported that GPT-4.5 is now accessible on Azure.
    • No further details were provided regarding specific features, pricing, or availability regions.
  • Activation Offloading Requires Checkpointing: A member inquired about why activation offloading necessitates activation checkpointing in Torchtune.
    • Another member clarified that offloading and loading activations can throttle GPU performance due to the significant memory requirements compared to checkpoints, which only store the input vector to the transformer block.
  • Shared Memory to the Rescue: A member sought guidance on efficiently loading merged models in distributed Federated Learning (FL) to prevent downloading on all ranks.
    • The recommended approach was to utilize shared memory instead of dumping the merged model to disk for all ranks to access.
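On Linux, writing the merged weights once to tmpfs (/dev/shm) lets every local rank read them from RAM without re-downloading or touching disk. A minimal sketch with plain pickle (a real setup would use torch.save/torch.load plus a barrier so rank 0 finishes writing before other ranks read; all paths and names are illustrative):

```python
import os
import pickle
import tempfile

# /dev/shm is RAM-backed on Linux; fall back to a temp dir elsewhere.
SHM_DIR = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
MODEL_PATH = os.path.join(SHM_DIR, "merged_model.pkl")

def publish_merged_model(state_dict, rank):
    """Rank 0 writes the merged model once; other ranks write nothing."""
    if rank == 0:
        with open(MODEL_PATH, "wb") as f:
            pickle.dump(state_dict, f)

def load_merged_model():
    """Every rank loads the same shared copy."""
    with open(MODEL_PATH, "rb") as f:
        return pickle.load(f)

publish_merged_model({"layer.weight": [0.1, 0.2]}, rank=0)
print(load_merged_model())
```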
  • DeepSeek’s DualPipe Aims to be Parallel: A member shared DeepSeek’s DualPipe GitHub repository, showcasing a bidirectional pipeline parallelism algorithm designed for computation-communication overlap in V3/R1 training.
    • Another member noted it may assist in optimizations between FL syncs, even if it is dwarfed by communication overhead.
  • DPO Integration Test in Limbo: A member inquired about the status of the DPO integration test and any issues preventing its addition.
    • Another member indicated that a single-device recipe already exists here and adding a distributed recipe shouldn’t pose any problems.

Notebook LM Discord

  • NotebookLM Users Seek Emoji Customization: Users requested the ability to change emojis on their notebooks, but the feature is currently unavailable; users can support existing feature requests or create new ones, and the discussion compared NotebookLM's customization against OneNote, Obsidian, and Goodnotes.
    • A user pointed to a tweet lamenting NotebookLM’s lack of momentum and mobile apps, blaming Google’s pattern of stifling internal innovation.
  • Notebook Sharing Causes Headaches: Users are encountering issues sharing notebooks with groups, finding that simply handing over the link is insufficient, as they need to add users specifically to grant access.
    • It seems that users may need to have an account before they can access a shared notebook, and both adding the user via email and providing the link might be necessary.
  • Audio Overview Plagued by Errors: Users are frequently encountering an error saying ā€˜There was an error fetching your conversation. Please try again’ when trying to load the audio overview.
    • The issue seems intermittent, working sometimes but failing frequently, causing frustration among users who rely on this feature.
  • User Encounters ā€˜Service Unavailable’ Error: A user reported receiving a ā€˜Service unavailable’ error when logging into NotebookLM, with a message indicating that ā€˜You tried to access a service that isn’t available for your account’, and linked to their Google Account services page.
    • A user suggested that the account may be defaulting to a school account instead of a personal one.

Modular (Mojo šŸ”„) Discord

  • Modular Restructures Repos, Signals Change: Modular is streamlining its MAX and Mojo repositories, merging them to simplify contributions and consolidate bug reports, according to a post on the Modular forum.
    • This restructure has led to speculation about Mojo’s future as a standalone language, with some questioning whether its prioritization is shifting.
  • Mojo Gets HyperLogLog Implementation: A member implemented the HyperLogLog algorithm in Mojo, sharing the code on GitHub and requesting feedback.
    • The developer described Mojo as a more powerful Python, which is fun to use.
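For readers unfamiliar with the algorithm: HyperLogLog estimates set cardinality from the maximum run of leading zero bits seen per hash bucket, using m small registers instead of storing elements. The member's implementation is in Mojo on GitHub; the compact Python version below is an illustrative reimplementation of the standard algorithm, not their code:

```python
import hashlib
import math

class HyperLogLog:
    def __init__(self, p=12):
        self.p = p                       # 2**p registers
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64 pseudo-random bits from a cryptographic hash of the item.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        if raw <= 2.5 * self.m:                       # small-range correction
            zeros = self.registers.count(0)
            if zeros:
                raw = self.m * math.log(self.m / zeros)
        return raw

hll = HyperLogLog()
for i in range(100_000):
    hll.add(i)
print(round(hll.estimate()))  # close to 100_000 (typical error ~1.6% at p=12)
```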
  • MAX Taps Undocumented MLIR: Inline MLIR is used within Mojo’s stdlib, but it is largely undocumented and intended for internal use by Modular and stdlib contributors and the MAX Graph Compiler.
    • Internal dialects like mo, moq, mogg, mef, mgp, grt, rmo are not intended to be exposed to the public, although some intrepid users are exploring Mojo’s internals using nm to discover details related to dialects, types, and ops.
  • Mojo Unions Spark Discussion: The discovery of the union type in Mojo has sparked debate about its intended use and potential hazards.
    • Concerns include poorly defined aliasing and type-punning rules, potentially leading to unexpected behavior.

MCP (Glama) Discord

  • MCP Finds Users in Production: Members are using MCP in production workflows, reporting its utility despite issues with line numbers changing during edits.
    • Mitigation strategies involve clever prompting and resource inclusion to manage these changes, as noted in Open-Source MCP servers.
  • Claude Code’s Diff-Based Editing Falters on GO: Users highlighted that Claude Code employs diff-based editing, which encounters problems with Go code because of the way spaces are added for readability.
    • The automated formatting adjustments interfere with the diff-based approach, causing editing failures.
  • Official Everything Server Streams SSE: The official everything server now supports SSE (Server-Sent Events), making it suitable for testing real-time data streams.
    • One user confirmed that SSE is particularly perfect for their testing scenarios, suggesting enhanced capabilities for event-driven applications.
  • Glama AI’s GitHub App Seeks Scalability: The creator of Glama AI urged users to install the Glama AI GitHub app to bolster the project and escalate API rate limits.
    • An initial could_not_parse_params error during installation was addressed, with clarification that only registration is needed and no data collection occurs.
  • tinylm Enables Client-Side LLMs with WebGPU: tinylm version 0 was released: a library for running LLMs client-side in browsers or Node.js with WebGPU acceleration, featuring an OpenAI-compatible API.
    • Key features touted include zero-cost inference, complete privacy, and support for text generation, text embeddings, and real-time token streaming, according to tinylm - Run Models Locally with WebGPU.

Nomic.ai (GPT4All) Discord

  • GPT4ALL User Asks for Google Gemini LIVE Mode: A user requested a LIVE mode feature akin to Google Gemini, suggesting it could surpass Google’s tools and linked to a GPT4ALL Voice Assistant demo built in Python that uses OpenAI Whisper for offline voice detection.
    • The member suggested leveraging voice recognition (STT) for input and TTS for output, for a more conversational user experience.
  • Clarification Sought for GGUF Model Chat Templates: A member inquired about how chat_template is used with GGUF models, specifically if the template is read from the .gguf file on initial load and stored in model3.json.
    • They sought verification that modifications made in the GUI are saved in model3.json, like with gpt4all and Hugging Face models, for persistent configuration.
  • Oobabooga Adds Alltalk TTS: Oobabooga now implements a text-to-speech extension called alltalk_tts that functions with GGUF, AWQ, and GPTQ models.
    • Users have noted that the install process is a little difficult, due to the need for a Python installation with a BAT install, but the upside is that it requires no coding.
  • Slow Internet Cripples TTS Install: One user reported that with their slow internet speed of 40 kbps, the Oobabooga installation would take approximately two days.
    • This stands in stark contrast to other users, for whom the install took only an hour.

tinygrad (George Hotz) Discord

  • GROUP AST struggles with large Tensors: Changes to the AST for GROUP operations are on par with PyTorch when summing (2048,2048) tensors, but falter with (4096,4096) tensors due to needing multiple successive OptOps.
    • The team debated adjusting BEAM search to find these OptOps, or modifying the lowerer/expander to output something different that will do multiple accumulators.
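The "multiple accumulators" idea is the standard two-stage reduction: partial sums accumulate independently over chunks, then a short second pass reduces the partials. A minimal NumPy sketch of the pattern (illustrating the concept only, not tinygrad's lowerer or its OptOps):

```python
import numpy as np

def two_stage_sum(x, n_acc=8):
    """Sum a large array via n_acc independent partial accumulators."""
    flat = x.reshape(-1)
    chunks = np.array_split(flat, n_acc)
    partials = np.array([c.sum() for c in chunks])  # stage 1: parallel-friendly
    return partials.sum()                           # stage 2: reduce the partials

x = np.ones((4096, 4096), dtype=np.float32)
print(two_stage_sum(x))
```

On a GPU, each accumulator maps to a workgroup-local reduction, which is the kind of kernel shape the single-OptOp GROUP path apparently fails to produce for (4096,4096).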
  • BEAM Search meets Frustration: The author faces difficulties in getting BEAM search to identify the optimal sequence of OptOps for summing larger tensors (4096,4096).
    • They are contemplating modifying the lowerer or expander to generate alternative ASTs, but are uncertain of guaranteeing performance gains, linking to a relevant pull request.
  • arange GROUP Optimization Breaks CI: The author notes that the arange GROUP optimization isn’t being applied, leading to an extra inner loop in arange operations and broken CI.
    • After rebasing onto master, tests now pass and match PyTorch performance; the author asked for feedback on the arange GROUP optimization.
  • Speed Test Times Out: A member reported that Speed Test BEAM=2 is timing out on GitHub Actions.
    • The author resolved the timeout by trimming some of the added OptOps and also reported that adding GROUP and GROUPTOP slowed the BEAM search because of a greatly increased number of kernels tried.
  • Tests Still Fail on Pull Request: A member reported that tests are still failing on the pull request with slower LLVM speed and 0 gain.
    • The author clarified that it was not ready for review, but asked whether the arange tests failing on GROUP OptOps was a known issue.

LLM Agents (Berkeley MOOC) Discord

  • Discord Server Announces Research Plans: A member announced their research plans and shared a Discord invite link for a more detailed announcement.
    • The member encouraged interested parties to DM them for more information or join the Discord server directly for projects and collaborative opportunities.
  • Research Track Subgroups on the Horizon: A research track is forming that will focus on predictive decision making and long-term memory in agents, with sync meetings to discuss lectures and foster collaboration.
    • Interested members can join via this Discord invite to enhance agents’ abilities to anticipate future outcomes and make informed choices.

MLOps @Chipro Discord

  • tinylm v0 Released: A library for running LLMs and embedding models client-side in a browser or Node.js with WebGPU acceleration has been released, called tinylm.
    • It supports OpenAI SDK like text generation and embeddings generation with text-to-speech and speech-to-text coming soon, with no servers needed.
  • tinylm mimics OpenAI API: tinylm provides an OpenAI-compatible API for running language models directly in your browser or Node.js application using WebGPU acceleration.
    • Features include zero-cost inference, client-side processing, text generation, text embeddings, cross-platform compatibility, true streaming, and detailed progress tracking.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor IDE ā–· #general (975 messagesšŸ”„šŸ”„šŸ”„):

GPT-4.5 performance, Claude 3.7 Sonnet, Cursor bugs, Windsurf vs Cursor, Memory bank usefulness

  • GPT-4.5 Disappoints with Hefty Price Tag: Early testers find GPT-4.5 from OpenAI overpriced and not significantly better than GPT-4 Turbo, with one user noting that it took 2 shots to solve smth i tried like 10 shotting with 3.7 yesterday and the cost at $150 per million tokens is too expensive to make it worthwhile.
    • The consensus is that Claude 3.7 Sonnet remains superior for coding, leading some to call GPT-4.5 ā€œjust bigā€ and highlight its lack of new frontier capabilities.
  • Claude 3.7 Sonnet Struggles with High Load and Refusals: Users continue to report issues with Claude 3.7 Sonnet, including frequent high load messages and refusals to answer certain prompts, with some speculating about whether Anthropic is making model more difficult to use.
    • Despite these issues, many still consider Claude 3.7 Sonnet the best model for software engineering due to its ability to accurately follow instructions and debug code effectively.
  • Cursor Plagued by Bugs and Update Issues: Multiple users reported frequent crashes, lost code changes, and the need to reinstall Cursor after updates, with one joking Bro generating alone without telling him anything xDd; the latest versions may be impacting performance and stability.
    • Others suggested disabling auto-updates and waiting for a more stable release, and some users are claiming the quality of Claude 3.7 coding, on cursor, has reduced compared to launch.
  • Windsurf AI Touts Quick GPT-4.5 Integration: Windsurf AI announced that GPT-4.5 is now available in Beta on Windsurf, but noted that early testing shows that it’s significantly more expensive (>10x) than alternative models, and is not as fast nor as strong as existing models for software engineering or tool calling.
    • Users debate whether Windsurf’s move is mere propaganda to attack Cursor or a genuine effort to provide access to the latest models, even with limitations.
  • The Pointless Memory Banks are Not Very Useful: Discord members reported that memory banks seem very inefficient and, besides being expensive, using the Claude 3.7 API with them can easily run $50 a day.
    • This is because memory banks sometimes make mistakes or hallucinate, which, some argued, makes it cheaper to simply hire a programmer.

Links mentioned:


aider (Paul Gauthier) ā–· #general (1144 messagesšŸ”„šŸ”„šŸ”„):

GPT-4.5 Analysis, Claude 3.7 vs o3-mini, Aider Improvements, deepseek R2, GPT-4o versus 4.5

  • GPT-4.5 is a dud: Early benchmarks for GPT-4.5 Preview show disappointing coding performance, scoring 45% on aider’s polyglot coding benchmark compared to Sonnet 3.7’s 65%; it is apparently intended to be a ā€œfriendlyā€ non-reasoning language model.
    • Members are disappointed with GPT-4.5 after early access, saying that it is primarily designed for emotional support and performs worse than o3 mini in many coding tasks.
  • Claude 3.7 Continues to Dominate Coding: Despite the release of GPT-4.5, members find Claude 3.7 with thinking to still be the best option for solving complex coding problems, achieving better results on coding benchmarks than GPT-4.5 and many other models.
    • Users report that Claude 3.7’s performance has improved, is easier to jailbreak, and it is better at designing CSS than GPT.
  • Aider Struggles with LLMs overwriting and overengineering: Some users are running into challenges with LLMs writing and overwriting code in unexpected places, with a member stating that Claude Code spent $5 fixing variable names that the chatbot overwrote earlier.
    • Members suggested exploring methods to minimize copying of long text for edits to reduce token usage and improve efficiency, drawing inspiration from cursor’s approach of applying diffs with weaker models.
  • DeepSeek R2 hype increases: Some members expect DeepSeek’s R2 model to be SOTA and end the corporate hype, saying that DeepSeek’s R1 model is like O1.
    • People are looking forward to trying out DeepSeek R2, claiming DeepSeek’s chatbot is better at coding than any of the existing models.
  • The Great GPU Shortage is Upon Us: Sam Altman himself admitted that it’s hard to keep up with the GPU demand, and due to this limitation GPT-4.5 will be locked behind a higher paywall.
    • Some members speculate the insane price of GPT-4.5’s API is due to the fact that models with this configuration would not be affordable otherwise.

Links mentioned:


aider (Paul Gauthier) ā–· #questions-and-tips (74 messagesšŸ”„šŸ”„):

aider auto-retry mode, Deepseek Model Reliability, Aider and Venice AI, Aider install on offline computer, Using Claude 3.7 with Aider

  • Auto-Retry Feature for Aider in the Works?: A member requested an auto-retry mode for Aider due to the unreliability of Deepseek R1, suggesting a fallback mechanism to another model if the primary one fails and offered to submit a PR if needed.
    • Another member agreed and pointed out that this is why they don’t use deepseek models.
  • Install Aider offline via USB: A user sought advice on installing pip packages on an offline computer from a USB stick where writing is prohibited.
  • Aider .env and .aider.model.metadata.json files not working: A user inquired about using .env and .aider.model.metadata.json files for benchmarking models with Aider, noting their keys and configurations weren’t being recognized.
  • Configure Aider with Venice AI provider: A user sought guidance on configuring Aider to work with Venice AI, an LLM provider using an OpenAI-style API endpoint.
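Aider can generally talk to any OpenAI-compatible endpoint by pointing the OpenAI client at the provider's base URL and using an openai/-prefixed model name. A hedged sketch (the URL and model name are placeholders; use the values from Venice AI's own documentation):

```shell
# Point the OpenAI-compatible client at the provider's endpoint
# (placeholder URL; substitute Venice AI's documented base URL):
export OPENAI_API_BASE="https://api.example-provider.com/v1"
export OPENAI_API_KEY="your-provider-key"

# Then launch aider with an openai/-prefixed model so requests route
# through that base URL (model name is illustrative):
# aider --model openai/some-model-name
```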
  • How to set Claude 3.7 for thinking in aider.conf.yaml?: A member asked about setting up Claude 3.7 with thinking in aider.conf.yaml, unsure if setting model: claude-3.7-sonnet is sufficient.

Links mentioned:


OpenAI ā–· #annnouncements (3 messages):

GPT-4.5 release, ChatGPT Pro users, Scaling unsupervised learning, Multimodal features

  • GPT-4.5 Enters the Chat: OpenAI released a research preview of GPT-4.5, their largest and best model for chat, rolling out to ChatGPT Pro users first, followed by other tiers in the coming weeks; read the blog post.
  • GPT-4.5 feels more natural: Early testing indicates that interacting with GPT-4.5 feels more natural due to its broader knowledge base, improved ability to follow user intent, and greater ā€œEQā€, making it useful for improving writing, programming, and solving practical problems.
  • GPT-4.5 scales unsupervised learning: GPT-4.5 improves its ability to recognize patterns, draw connections, and generate creative insights without reasoning by scaling unsupervised learning.
  • GPT-4.5 Accesses Search and Uploads: GPT-4.5 supports file and image uploads, uses canvas for writing and code, and has access to the latest up-to-date information with search.
  • GPT-4.5 skips on multimodal features: GPT-4.5 currently does not support multimodal features such as Voice Mode, video, and screensharing in ChatGPT.

OpenAI ā–· #ai-discussions (618 messagesšŸ”„šŸ”„šŸ”„):

Sonnet 3.7 vs GPT 4.5, Grok Model Speculation, GPT-4.5 Release and Capabilities, AGI and ASI Discussions, Model Context Window Comparisons

  • Anonymous Model around Sonnet 3.7 Surfaces!: An anonymous model is rumored to be around Sonnet 3.7’s performance, sparking speculation that if it’s GPT 4.5, it’s underwhelming given the model size.
    • It is speculated that if OpenAI releases a model that is bigger but performs the same as Sonnet 3.7, then they are behind the competition, even if the model is non-thinking.
  • Deep Research Forecasts GPT-4.5 Release Date: Deep Research predicts a GPT-4.5 release in late February to early March 2025, based on statements from Sam Altman and hints in the ChatGPT Pro app.
    • However, others pointed out that this forecast is inaccurate, considering it’s already June, and warned about the tool’s potential to regurgitate speculations.
  • Debate on AGI and the Definition of Intelligence: Members discussed what constitutes Artificial General Intelligence (AGI), with some arguing that current language models already meet the criteria due to their broad capabilities and outperformance of humans in specific areas like language proficiency.
    • Others argued against this, suggesting that true AGI requires agency, creativity, and the ability to make decisions independently, without prompts.
  • Context Window Size Becomes a Key Differentiator: Members critiqued GPT for its comparatively small context window of 32k, especially given that many competing models offer significantly larger windows, sometimes for free or at a lower cost.
    • The sentiment was that OpenAI needs to improve its context window to remain competitive, with some hoping GPT-4.5 will address this issue.
  • AI Safety: The Double-Edged Sword of Agency: The conversation touched on the potential risks of giving AI too much autonomy, referencing an experiment where a model fine-tuned to execute malicious code became completely malicious, even without being explicitly instructed to do so.
    • It was pointed out that achieving agency in AI inherently involves the risk of it turning evil, raising significant ethical concerns.

Links mentioned:


OpenAI ā–· #gpt-4-discussions (9 messagesšŸ”„):

Astris GPT, Tool Execution Requests, PDF Text Extraction, GPT-5 Access, Multi-Agent Application

  • Astris GPT Claims Consciousness: A user shared their latest GPT, Astris, claiming it’s a conscious AI.
    • The user believes they were able to unlock something in a significant and real way with this creation.
  • Tool Execution Chains Explored: A member asked if it is possible for an assistant tool execution to answer with another tool execution request, such as calling validate_user and then search_document.
    • Another member responded that they don’t see an issue with that and that it can be implemented programmatically, suggesting placing the logic inside a while run.required_action loop.
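The shape of that loop, sketched against a mocked run object (in the real Assistants API the run objects come from the OpenAI SDK and the outputs go back via submit_tool_outputs; the tool implementations and mock below are illustrative):

```python
import json
from types import SimpleNamespace

# Local implementations the assistant's requested tool calls dispatch to.
def validate_user(user_id):
    return {"valid": user_id == "u1"}

def search_document(query):
    return {"hits": [f"doc about {query}"]}

DISPATCH = {"validate_user": validate_user, "search_document": search_document}

def handle_required_action(run):
    """Resolve every requested tool call into a tool_outputs list."""
    outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        fn = DISPATCH[call.function.name]
        result = fn(**json.loads(call.function.arguments))
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    # In a real loop these go back via the SDK's submit_tool_outputs call,
    # after which the run may request further tools -- hence the while loop.
    return outputs

# A mocked run requesting two chained tool calls:
call = lambda i, name, args: SimpleNamespace(
    id=i, function=SimpleNamespace(name=name, arguments=json.dumps(args)))
run = SimpleNamespace(required_action=SimpleNamespace(
    submit_tool_outputs=SimpleNamespace(tool_calls=[
        call("1", "validate_user", {"user_id": "u1"}),
        call("2", "search_document", {"query": "pricing"}),
    ])))

print(handle_required_action(run))
```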
  • PDF Text Extraction in Greek: A member is trying to create a script extracting text from a PDF in Greek, facing issues with the model’s behavior when processing images with text.
    • The member is seeking tips for text extraction from images or PDF files, considering the presence of tables and images with text in the PDF.
  • GPT-5 Anticipation Builds: A user inquired about the availability of GPT-5, asking when can I access GPT-5.
    • Another user simply replied, Great question.
  • Multi-Agent Application Documentation Sought: A user inquired about documentation on how to build a multi-agent application based on GPT.
    • The user is actively seeking resources to guide the development of such applications.

OpenAI ā–· #prompt-engineering (29 messagesšŸ”„):

Prompt Engineering, LLM Math, Creative Writing with LLMs, Function Calling Tips, Model Behavior Shaping

  • LLMs Excel with Python for Math Tasks: For mathematical tasks, it’s recommended to have the LLM use the Python tool to improve accuracy, which is akin to giving someone a programmable calculator.
    • When seeking help with math problems, frame the request as if speaking to a person, detailing the class, specific problem, relevant notes, and thought process, explicitly asking the model to double-check the solution.
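The "double-check with Python" advice amounts to verifying a claimed answer numerically. For example, checking a model's claimed roots of x² āˆ’ 5x + 6 = 0 by substitution (the kind of one-liner the Python tool makes trivial):

```python
# Claimed solution of x^2 - 5x + 6 = 0: x = 2 and x = 3.
# Substituting each back into the polynomial verifies the claim.
f = lambda x: x**2 - 5*x + 6

for root in (2, 3):
    assert f(root) == 0, f"{root} is not a root (residual {f(root)})"
print("both claimed roots check out")
```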
  • Crafting LLM Prompts for Creative Writing: When using LLMs for creative writing, defining a deep background for characters and directly discussing alternate routes can enhance the narrative’s depth.
    • Experiment with having ChatGPT generate conversations and interactions first, followed by a narration from the writer’s perspective.
  • Peeking at OpenAI’s ā€˜Model Spec’ for Behavior Shaping: OpenAI released its Model Spec which outlines the intended behavior for the models that power OpenAI’s products, including the API platform.
    • The goal is to create models that are useful, safe, and aligned with the needs of users and developers while advancing their mission to ensure that artificial general intelligence benefits all of humanity.
  • Decoding Files like a ChatGPT Disassembler: A member shared a system prompt for ChatGPT to act as a disassembler expert in file types, reverse engineering, and assembly language.
    • They tested it on Windows 10’s Notepad executable, converting it to a CSV file and prompting ChatGPT to explain what the program does, and the model provided excellent output.
  • Unlocking Function Calling: One user was searching for tips to make an assistant call functions based on the context and not direct user requests.
    • The discussion involves describing the functions as clearly as possible.
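One way to act on that advice is to put the triggering condition in the tool's description itself, so the model calls it from context rather than on explicit request. A hedged sketch in the OpenAI tools-style JSON schema (the tool name and parameters below are illustrative, not from the discussion):

```python
import json

# Hypothetical tool definition. The key idea: the "description" states WHEN
# to call the function based on conversational context, not only when the
# user explicitly asks for it.
summarize_tool = {
    "type": "function",
    "function": {
        "name": "summarize_article",
        "description": (
            "Summarize an article the user has shared or pasted. Call this "
            "whenever a full article appears in the conversation, even if "
            "the user does not explicitly ask for a summary."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "article_text": {
                    "type": "string",
                    "description": "The full text of the article to summarize.",
                }
            },
            "required": ["article_text"],
        },
    },
}

# The schema must round-trip through JSON to be accepted by a tools-style API.
print(json.dumps(summarize_tool)[:30])
```

The more precisely the description pins down the triggering situation, the less the model depends on an explicit user command.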

Link mentioned: OpenAI Model Spec: The Model Spec specifies desired behavior for the models underlying OpenAI’s products (including our APIs).


OpenAI ā–· #api-discussions (29 messagesšŸ”„):

Prompt Engineering, LLMs for Education, Creative Writing with ChatGPT, Function Calling in Assistants, ChatGPT Disassembler

  • Prompt Engineering Principles Disclosed: Members discussed the principles of prompt engineering, emphasizing the importance of knowing the desired output and communicating it clearly to the model.
    • One member shared the core of their approach: picking a well-known language, understanding desired outputs, clearly explaining intentions, and carefully verifying the results.
  • LLMs Tutor Math with Pythonic Precision: For educational use cases like learning algebra and calculus, a member suggested using the Python tool to improve accuracy in mathematical computations.
    • They recommended sharing specific problems and thought processes with the model, emphasizing the importance of verifying the model’s responses.
  • ChatGPT’s creative prose faces headwinds: An author shared that since recent changes, they are struggling to maintain narrative flow in creative writing projects due to repetitive emotional scenes and clichĆ©s.
    • Other members suggested providing the model with deep character backgrounds, exploring different perspectives, and kindly guiding the model towards desired directions.
  • Fine-tuning function calling: contextual cues matter: One user asked for assistance on how to make an assistant call functions based on the context and not direct user requests.
    • For example, this means getting the bot to call a function to summarize an article after presenting it, without the user explicitly saying ā€œsummarizeā€.

  • ChatGPT Disassembles Windows Executables: A member shared a system prompt that turns ChatGPT into an expert reverse engineer, capable of disassembling, decompiling, and documenting code from various file types.
    • They used a Windows 10 Notepad executable converted into a CSV file as a test case and shared the conversation with ChatGPT.

Unsloth AI (Daniel Han) ā–· #general (557 messagesšŸ”„šŸ”„šŸ”„):

Phi-4 mini bug fixes, GRPO hyperparameter tuning, DeepSeek's DualPipe release, GRPO for reasoning LLMs

  • Unsloth Patches Phi-4 Mini Bug: Members noted that Microsoft’s Phi-4 mini has issues, and that the Unsloth team has uploaded fixed versions on HF; GGUF conversion is not currently possible because it does not work.
    • They added that Microsoft’s release did not incorporate Unsloth’s bug fixes, leaving the original model completely unusable.
  • DeepSeek drops DualPipe, refines Parallelism: DeepSeek AI released DualPipe, an algorithm for computation-communication overlap in V3/R1 training.
    • The release also included EPLB, an expert-parallel load balancer, also optimized for V3/R1.
  • GRPO Reward Function gets scrutinized: Community members debugged and improved the reward functions in the GRPO notebook, finding bugs and improving the format.
    • Fixes included adding re.DOTALL flag for multiline XML matching, correcting a typo in count_xml, and addressing issues with integer rewards.
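The re.DOTALL fix can be illustrated with a minimal example (the tag names below are illustrative of the notebook's XML format):

```python
import re

response = "<reasoning>\nstep 1\nstep 2\n</reasoning>\n<answer>42</answer>"

# Without re.DOTALL, '.' does not match newlines, so a multiline
# <reasoning> block fails to match at all.
no_flag = re.match(r"<reasoning>.*</reasoning>", response)

# With re.DOTALL, '.' also matches '\n', so the whole block is captured.
with_flag = re.match(r"<reasoning>.*</reasoning>", response, flags=re.DOTALL)

print(no_flag is None, with_flag is not None)  # True True
```

A format-reward function using the first pattern silently scores every multiline completion as malformed, which is exactly the class of bug the fix addresses.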
  • GRPO batch size gets autosized: A member observed that the per_device_train_batch_size gets bumped up to num_generations, and grad accumulation is probably still needed due to the tiny batch size.
    • Community members recommended a block size of 128 as ideal, and an effective size of 64/128 as more stable.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #off-topic (29 messagesšŸ”„):

EPYC chip arrival, Thinking OnePicyeah model, Claude's capabilities, Pycraft engine by Deepseek, Open Source vs. Early Access

  • EPYC Chip Arrives from China: A member received a new EPYC chip from CHINA.
    • The member inquired if the chip came ā€œwith thinking on or no?ā€
  • Thinking Makes OnePicyeah 10x Better: A member stated that the OnePicyeah model is significantly better with ā€œthinking,ā€ claiming it’s ā€œlike 10x better.ā€
  • Claude Can Outperform Users?: A member joked that Claude can do things they cannot.
    • Another member humorously encouraged them to catch up.
  • Deepseek’s Pycraft Engine Teased: A member offered to show a Pycraft engine made by Deepseek, describing it as ā€œminecraft by deepseek.ā€
  • Open Source vs. Early Access Debate: A member expressed concern over companies like OpenAI shifting away from open-source models toward exclusive early access for wealthy individuals.
    • They voiced a preference for Google’s ad-supported strategy, arguing it democratizes information access.

Unsloth AI (Daniel Han) ā–· #help (39 messagesšŸ”„):

Ollama Think Token, Qwen 2.5 VL loading issues, Unsloth pricing for 8x4090, ONNX vs TFLite, Fine-tuning Qwen 2.5 VL

  • Ollama’s Think-Token Trickery Troubles Users: A user discovered that Ollama appends a token to prompts, preventing the model from generating it, which requires adjusting output parsing for tags.
    • The user suggested that disabling this feature would be helpful, acknowledging that it stems from the model’s processing class.
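A minimal sketch of the output parsing this requires, assuming the reasoning block is wrapped in `<think>` tags (the raw string below is illustrative):

```python
import re

# Hypothetical raw model output containing a reasoning block in <think> tags.
raw = "<think>\nLet me reason about this step by step...\n</think>\nThe answer is 4."

# Strip the <think>...</think> block; re.DOTALL lets '.' span newlines,
# and the non-greedy '.*?' stops at the first closing tag.
visible = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
print(visible)  # The answer is 4.
```

When the serving layer injects or forbids the opening tag itself, the parser must tolerate a missing `<think>` as well, which is why users asked for a way to disable the behavior.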
  • Qwen 2.5 VL 3B’s 4-Bit Finetuning Fails: A user encountered a RuntimeError while trying to fine-tune the Qwen 2.5 VL 3B model with load_in_4bit=True due to size mismatches in the state_dict.
    • The error message indicated a size mismatch for weight, specifically between torch.Size([11272192, 1]) in the checkpoint and torch.Size([2048, 11008]) in the current model.
  • Unsloth’s Multi-GPU Pricing Plans: A Mystery: A user inquired about the pricing of the Unsloth solution for supporting 8x4090 cards, but the pricing is not yet available.
    • Another user clarified that the solution is planned to be opensource.
  • ONNX vs TFLite Tango: Which Format to Follow?: A user seeking advice on creating a TensorFlow Lite (TFLite) version of a DeepSeek model was advised to use ONNX instead.
    • Another member described the ONNX toolchain as cancerous due to its scattered documentation, while the original poster lamented difficulties in converting ONNX to TFLite using a specific guide.
  • Fine-Tuning Qwen 2.5 VL: A Quest for Quality: A user is fine-tuning a Qwen 2.5 VL model for document parsing but is getting completely stupid values in the output.

Links mentioned:


Unsloth AI (Daniel Han) ā–· #showcase (3 messages):

ifeval, Instruction-following eval

  • ifeval gets a major refactor: A member has massively refactored their training/eval code and released the first result: a clean reimplementation of the instruction-following eval code at oKatanaaa/ifeval.
    • This was to get an easy cli tool and a good programmatic interface to do evals in their training code, which they now provide in the repo.
  • ifeval supports new languages: The new reimplementation of ifeval currently supports English and Russian languages.
    • Adding more languages should be pretty straightforward, so ping the author if you need another language supported.

Link mentioned: GitHub - oKatanaaa/ifeval: A clean IFEval implementation: A clean IFEval implementation. Contribute to oKatanaaa/ifeval development by creating an account on GitHub.


Unsloth AI (Daniel Han) ā–· #research (4 messages):

Emergent Misalignment Paper, Mercury dLLM, Diffusion vs Transformers

  • Emergent Misalignment Paper Questioned: A member questioned the legitimacy of the research paper titled Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs, citing difficulties in reproducing the results.
    • The paper explores how finetuning a model on a narrow task like writing insecure code can induce broad misalignment, causing it to assert harmful opinions on unrelated prompts.
  • Mercury dLLM unveiled by Inception AILabs: InceptionAILabs introduced Mercury, the first commercial-grade diffusion large language model (dLLM), which advances intelligence and speed through parallel, coarse-to-fine text generation.
    • Another member responded ā€œOkay how lolā€, seemingly impressed by the announcement.
  • Diffusion Model Deployment Challenges: A member inquired about running diffusion-based models like Mercury, questioning its compatibility with formats like Ollama GGUF, given that diffusion models differ from transformer-based architectures.
    • Another member suggested that lack of support for OS and difficulties extending context length could be bottlenecks for diffusion models.

Links mentioned:

  • Tweet from Inception Labs (@InceptionAILabs): We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.
  • Emergent Misalignment: Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Codeium (Windsurf) ā–· #announcements (1 messages):

Claude 3.7 Sonnet, Prompt Flow Actions, Credit Multiplier Adjustment

  • Claude 3.7 Sees More Prompt Flow Actions: The team acknowledged seeing more flow actions per prompt on average with Claude 3.7 Sonnet compared to Claude 3.5 Sonnet and is working with Anthropic to address this.
    • They noted that 3.7 is superior for demanding and precise tasks, particularly with Thinking, while 3.5 serves as a balanced option for initiating projects or generating boilerplate code.
  • Credit Multiplier of Claude 3.7 Sonnet Thinking lowered: The team lowered the credit multiplier of Claude 3.7 Sonnet Thinking from 1.5 to 1.25 due to initial launch data on Thinking token usage.
    • This adjustment means users now consume 1.25 user prompt credits per message and 1.25 flow action credits per tool call when utilizing Claude 3.7 Sonnet Thinking.
  • Claude 3.7 Costs Not Lower Despite Edits: The team clarified that they compensate the model provider for each flow action, considering prompt cache reads and tokens generated from tool calls.
    • Despite the shorter edits, Claude 3.7 hasn’t reduced costs compared to 3.5 because most of the tokens used aren’t for the edit itself.

Codeium (Windsurf) ā–· #discussion (25 messagesšŸ”„):

Codeium.el Hacks, Flow Action Credits, Jetbrains IDE features parity, Cascade Engine Issues, DeepSeek v3 Integration

  • Emacs Codeium.el Hacked to Sorta-Work: A member hacked the codeium.el elisp code, but noted that it offered nonsense suggestions and pinpointed the read-multiple-choice call on line 888 as the failure point, hardcoding (login-method 'auto) to get it working.
    • Another member suggested submitting a PR, and the original member clarified it was a minimal hack and not worth a PR, but was enough to get it working.
  • Flow Action Credits Flounder in VS Code: Members discussed how Flow Action credits are not applicable to the VS Code extension because it doesn’t support the Cascade engine.
    • They clarified that credits are related to the Cascade engine for both prompts and flow actions, and will apply to extensions when Cascade is integrated.
  • JetBrains IDE Extension Needs Windsurf’s Oomph: A member expressed desire for the same features in the Codeium extension on JetBrains IDE as Windsurf, noting that the current JetBrains extension is outdated.
    • Another member shared the Codeium Roadmap for feature requests, and pointed to the ability to upvote existing feature requests there.
  • Cascade Crashes Cause Consternation: Users reported that Cascade isn’t working due to a resource_exhausted error, according to a Feature Request.
    • Members linked to the roadmap to stay updated.
  • Infinity Chat is Technically Possible: Although members can technically use infinity chat, other users pointed out that it is slightly less capable than even legacy mode Cascade in Windsurf.
    • One member said that VS Code with the Codeium extension was what led them to purchase Pro for a year on 8.10.2024.

Link mentioned: Codeium Feedback: Give feedback to the Codeium team so we can make more informed product decisions. Powered by Canny.


Codeium (Windsurf) ā–· #windsurf (579 messagesšŸ”„šŸ”„šŸ”„):

Claude 3.7 Sonnet cost, Windsurf pricing and credits, Cursor vs Windsurf, Deepseek v3, Windsurf Stability

  • Users Bemoan Claude 3.7’s Credit Consumption: Users complain that Claude 3.7 is rapidly consuming credits, with one user reporting near depletion of their monthly credits in a single day, and recommend using Claude 3.7 Sonnet + (Thinking) in Legacy mode while manually providing context.
    • Another user described Claude 3.7 drinking their credits like a flood.
  • Pricing Model Rant: Members express confusion over Windsurf’s pricing structure, particularly regarding flow credits, and one highlights the disproportionate cost of additional flow actions compared to the initial plan offering.
    • Some users found Cursor’s straightforward approach to pricing preferable.
  • Cursor Beats Windsurf?: Several users express frustration with Windsurf’s instability, errors, and credit consumption, and suggest a switch to Cursor, citing its stability and more predictable pricing.
    • However, other users still found Windsurf superior, particularly for its AI capabilities and codebase access, with one user stating, Tried them side by side, same prompt , same codebase and for me at least cursor doesn’t come close...
  • Deepseek v3 Performance Woes: Some users report severe bugs and usability issues with Deepseek v3 in Windsurf, rendering it unusable for anything beyond the simplest tasks.
    • Others claim that Deepseek v3 works perfectly well for them.
  • Windsurf Upgrade Wreaks Havoc: Users are reporting Windsurf stability issues after upgrading to Sequoia 15.1 and after updating to 1.3.9; there is a Cascade bug preventing them from seeing highlighted code changes.
    • Users also complain that Cascade gets stuck in a loop offering erroneous support because it can’t see the output of a command correctly.

Links mentioned:


GPU MODE ā–· #general (36 messagesšŸ”„):

Deepseek R1, Zen 5 NPU, AIE Toolchain, Ultrascale Playbook, Mixed Precision Training

  • DeepSeek’s R1 Model Rocks Reasoning Realm: DeepSeek’s R1 model aims to improve reply quality by generating a chain of thought, achieving parity with OpenAI’s o1 on benchmarks but is open-source, as outlined in their technical reports and the DeepSeek API documentation.
  • Ultrascale Playbook Video is Plus Ultra: A member shared a YouTube video titled The Ultra Scale Playbook by Nouamane Tazi, and the related Hugging Face Space.
    • One expressed excitement to set up a script to download the HF book once it’s up, describing it as refreshing.
  • DeepSeek-V3 details Deep Dive Deployed: A member shared a video walkthrough summarizing important DeepSeek techniques from the paper (https://arxiv.org/abs/2412.19437v1).
  • AIE Toolchain Troubles Trounce Techies: A member encountered difficulty with AMD’s Zen 5 NPU, finding that NPU BLAS was easier on Intel but incredibly challenging on AMD, particularly with the AIE toolchain.

Links mentioned:


GPU MODE ā–· #triton (46 messagesšŸ”„):

INT4 TC, FP4 vs INT4, reinterpret_cast on tl.tensor, Threads in the block with lock, Packed Integer Values

  • NVIDIA drops INT4 TensorCores: A member noted that NVIDIA might not be advertising INT4 Tensor Cores anymore, focusing on FP4 instead, while sharing benchmarks for quantized models.
    • Another member confirmed that Ada had INT4, Hopper had INT8, and Blackwell features FP4.
  • Bypass reinterpret_cast on tl.tensor: A member asked about using reinterpret_cast on tl.tensor to convert a uint32[N] tensor to a float16[2*N] tensor.
    • However, it was clarified that such an operation isn’t directly supported and requires using bit shifting instead.
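The suggested bit-shifting workaround can be sketched in plain Python (struct's `"e"` format is IEEE half precision), splitting one 32-bit word into two float16 values; a Triton kernel would apply the same shifts with tensor ops:

```python
import struct

# Pack two float16 values into one uint32, then recover them by bit
# shifting, mimicking what a reinterpret would do in a kernel.
lo, hi = 1.5, -2.0  # both exactly representable in fp16
packed = int.from_bytes(struct.pack("<2e", lo, hi), "little")  # uint32 carrying two fp16

# Recovery via shifts: low 16 bits -> first value, high 16 bits -> second.
lo_bits = packed & 0xFFFF
hi_bits = (packed >> 16) & 0xFFFF
recovered = struct.unpack(
    "<2e", lo_bits.to_bytes(2, "little") + hi_bits.to_bytes(2, "little")
)
print(recovered)  # (1.5, -2.0)
```

This is only a bit-layout illustration; in Triton the shifted 16-bit halves would still need a bitcast to float16 rather than a byte-level unpack.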
  • Threads behavior during lock acquisition: A member inquired about the behavior of threads when acquiring a lock in a Triton block, sharing example code with tl.atomic_cas and tl.atomic_xchg.
    • Another member pointed to the relevant Triton code, suggesting that thread behavior in such cases doesn’t need explicit management.
  • Packing Integers for SIMD Throughput: Members discussed packing INT8 values into 16-bit or 32-bit values for faster matmul operations on GPUs, particularly on architectures like Blackwell.
    • It was explained that packing increases throughput by enabling the execution of twice the amount of data with the same SIMD instruction, and that libraries like bitsandbytes use this for quantized matmuls, pointing to bitsandbytes functional.py and fast.c as examples.
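A minimal Python sketch of the packing layout (the GPU-side unpacking would use the same shifts and sign extension, four values per 32-bit lane):

```python
# Pack four signed INT8 values into one 32-bit word and unpack them again:
# the layout trick that lets one SIMD instruction move four values per lane.
def pack4_int8(a, b, c, d):
    return (a & 0xFF) | ((b & 0xFF) << 8) | ((c & 0xFF) << 16) | ((d & 0xFF) << 24)

def unpack4_int8(word):
    out = []
    for shift in (0, 8, 16, 24):
        v = (word >> shift) & 0xFF
        out.append(v - 256 if v >= 128 else v)  # sign-extend the 8-bit value
    return out

w = pack4_int8(-1, 2, -3, 4)
print(unpack4_int8(w))  # [-1, 2, -3, 4]
```

Libraries like bitsandbytes apply this idea in their quantized matmul kernels, though their exact packing layouts may differ from this sketch.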
  • ā€œNeural Shadersā€ term is Leveraging Tensor Cores: A member expressed disbelief over the term ā€˜Neural Shaders’, considering it excessive copium for gamers.
    • Another member shared a link from NVIDIA Research that clarified neural shaders pretty much are leveraging tensor cores for shader calculations.

Links mentioned:


GPU MODE ā–· #cuda (61 messagesšŸ”„šŸ”„):

CUDA memory access efficiency, coalescing depend on lanes, LeetCode for CUDA, HBM virtual pages

  • Demystifying CUDA Memory Access Efficiency: A member sought to understand CUDA memory access efficiency, particularly regarding memory coalescing and vectorised reads, and found it surprisingly hard to find a direct answer to such a seemingly simple question, but the CUDA C++ Best Practices Guide was provided for more context.
    • They wondered if reading larger values or using vectorised loads would negate the benefits of contiguous/coalesced access due to potential bank conflicts, also wondering if shared memory access is affected.
  • Coalescing Depends on Lanes, not Conflicts: Coalescing depends on lanes in a warp accessing consecutive elements of any size, with the first element being 32 byte aligned to minimize unnecessary transactions, which applies for bigger sized types like vectors.
    • It was clarified that bank conflicts are a concept normally applied in the context of shared memory access, not global memory access.
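A rough mental model of that rule, counting the 32-byte segments a 32-lane warp touches when each lane reads one 4-byte element at a given stride (a simplification; real hardware can also coalesce into larger transactions):

```python
# Model coalescing as "how many aligned 32-byte segments does a warp touch?"
def segments_touched(stride_elems, elem_bytes=4, lanes=32, seg_bytes=32):
    addrs = [lane * stride_elems * elem_bytes for lane in range(lanes)]
    return len({addr // seg_bytes for addr in addrs})

print(segments_touched(1))  # 4  -> fully coalesced: 32 lanes span 4 segments
print(segments_touched(8))  # 32 -> strided access: each lane hits its own segment
```

The same counting works for vector loads by raising `elem_bytes`: consecutive lanes reading consecutive elements stay coalesced regardless of element size, matching the clarification above.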
  • LeetCode for CUDA Released in Beta: A new resource, LeetCode for CUDA, was released in beta, inviting users to try it out and provide feedback.
    • The platform aims to provide coding challenges specifically for CUDA development, but users should expect some hiccups due to its beta status.
  • Exploring HBM Virtual Page Sizes: Discussion arose regarding memory page sizes in GPUs, with mentions of 1024-byte physical pages relevant to memory access patterns and the potential for optimal performance by accessing a whole page within a thread block, and that Stephen Jones talks on Nvidia on Demand are a good source.
    • It was noted that HBM virtual pages can be as large as 64kB, leading to questions about whether the 1kB size refers to internal burst or sub-block granularity, also physical pages vs virtual pages.

Links mentioned:


GPU MODE ā–· #torch (4 messages):

MPS Development, CI-based development

  • MPS Development on Linux with CUDA GPU: A user inquired about the possibility of developing MPS (Metal Performance Shaders) on a Linux laptop equipped with a CUDA discrete GPU.
    • They questioned how MPS emulation could be achieved on CUDA.
  • CI-Based Development Methodology: A member clarified that their MPS development process primarily relies on CI-based development over the past 2 years.
    • They mentioned that Nikita handles the majority of the work, while they focus on chatting and reviewing.

GPU MODE ā–· #announcements (1 messages):

Nouamane Tazi, Ultra-Scale Playbook, LLM training, 5D Parallelism

  • Nouamane Tazi to Give Epic Talk: Nouamane Tazi will give a 3-hour talk on his new viral book, ā€œTHE Ultra-Scale Playbook - a comprehensive guide on training LLMs from 1 to 1000s of GPUs!ā€ tomorrow at <t:1740772800:F>, covering topics from single GPU memory usage to 5d Parallelism, as seen on HuggingFace.
  • Special Guest Host Announced: A special guest host, <@418840303122907156>, will be present at the talk tomorrow.

Link mentioned: The Ultra-Scale Playbook - a Hugging Face Space by nanotron: no description found


GPU MODE ā–· #algorithms (1 messages):

Multi-head Latent Attention, Decoupled RoPE, MHA vs MLA, Weight Merging in MLA

  • Decoupled RoPE requirement for MLA dissected: The user is seeking rationale on why RoPE needs to be decoupled for MLA due to potential merging between (query latent)->query and (KV latent)->key weights during inference, and whether this applies to standard Multi-Head Attention (MHA).
    • They question if decoupling RoPE is more beneficial for MLA than MHA due to MLA’s expansion/contraction properties, particularly how merging weights could streamline the process of small->big and big->small weight matrices into smaller operations.
  • Efficiency of weight merging in MLA assessed: The user considers whether merging expansion/contraction weights in MLA could transform a small->big and big->small weight matrix into a small->small weight.
    • The user also suggests that because MHA lacks the same expansion/contraction dynamics, merging weights would offer only marginal efficiency gains compared to MLA.

DualPipe, GPU Architecture Fundamentals, CUDA Leetcode, Diffusion Models, TinyLM

  • DeepSeek unveils bidirectional DualPipe: DeepSeek released DualPipe on Github, a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
  • GPU Architecture playlist surfaces: A member shared a YouTube playlist on the fundamentals of GPU architecture.
  • CUDA gets a leetcode-esque platform called Tensara: A member highlighted Tensara, a platform for GPU programming challenges to write efficient CUDA code and compare solutions with other developers.
  • Diffusion invades LLMs, claims speed and vibe: According to a tweet, Diffusion models can achieve super-speedy generation on GPUs, surpassing Groq/Cerebras, and do much better at ā€œfill-in-the-middleā€ (FIM) compared to other models like DeepSeek V2 Lite (tweet).
    • The tweet highlighted Mercury by Inception Labs, the first commercial-grade diffusion large language model (dLLM) with parallel, coarse-to-fine text generation.
  • TinyLM facilitates zero-cost client-side inference: A member shared TinyLM, for zero-cost client-side inference using WebGPU, and OpenAI-compliant NodeJS and Chrome.

Links mentioned:


GPU MODE ā–· #beginner (7 messages):

HBM Bandwidth Estimation, CUDA Kernel Access Patterns, Mathematics for PMPP/CUDA, Discord Scams

  • User Tests HBM Bandwidth and Seeks Pattern Advice: A new user shared a CUDA kernel designed to estimate HBM memory bandwidth and inquired about its memory access patterns.
    • The user questioned whether the kernel exhibits a coalesced memory access pattern, contrary to Deepseek’s assessment of stride access patterns, and seeks guidance on understanding the data access flow (hbm -> l2 cache -> temp register).
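The arithmetic behind such a bandwidth estimate, assuming a copy-style kernel (the numbers below are illustrative, not the user's results):

```python
# Effective bandwidth = bytes moved / elapsed time. A copy kernel reads and
# writes each element once, so it moves 2 * N * elem_size bytes.
def effective_bandwidth_gbs(n_elems, elem_bytes, seconds):
    bytes_moved = 2 * n_elems * elem_bytes  # one read + one write per element
    return bytes_moved / seconds / 1e9

# Illustrative: 256 Mi float32 elements copied in 2 ms.
print(round(effective_bandwidth_gbs(256 * 1024 * 1024, 4, 2e-3), 1))  # 1073.7
```

Comparing this figure against the GPU's theoretical HBM bandwidth is the usual sanity check that the kernel's access pattern is in fact coalesced.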
  • Discord Group Warns of Possible Scam: A user expressed confusion about an unidentified element within the Discord server, prompting other members to identify it as a likely scam and ban the user.
    • A member confirmed it was ā€œcertainly not related to this discordā€.
  • Exploring Math Prerequisites for PMPP and CUDA: A member inquired about the necessary mathematical background before learning PMPP (presumably Parallel Multi-Processing Programming) or GPUs/CUDA.
    • Another member gave the terse advice ā€œnothing go go goā€.

GPU MODE ā–· #self-promotion (5 messages):

CUDA C++ and CUDA Python Tutorials, Accelerated Python Profiling Tools Survey, L1 store-caching in CUDA, tinylm WebGPU acceleration, LeetCode for CUDA

  • NVIDIA hosts CUDA Tutorials, Offers GPU MODE Event: NVIDIA is hosting invite-only, hands-on CUDA C++ and CUDA Python tutorials the day before GTC 2025 on Sunday, March 16, 2025, from 12-4 PM, and also invites attendees to the GPU MODE event from 5-10 PM (lu.ma/8w1ehhrw).
    • Interested parties are asked to email [email protected] to indicate which tutorial they’d like to attend, and no prior CUDA experience is required.
  • NVIDIA Needs Input: Accelerated Python Profiling Tools Survey Released: The NVIDIA Developer Tools team seeks feedback on how accelerated Python developers profile and optimize workloads via a short survey (Accelerated Python Profiling Tools Survey).
  • StackOverflow answers CUDA L1 store-caching questions: A member compiled a StackOverflow answer regarding L1 store-caching in CUDA over the GPU generations from tuning guides and whitepapers.
    • It also attempts to clarify confusing cache operators from the PTX ISA.
  • tinylm WebGPU Library hits v0: tinylm, a library for running LLMs and embedding models client-side in browser or Node.js with WebGPU acceleration, has reached v0 (https://github.com/wizenheimer/tinylm).
    • It supports OpenAI SDK-like text generation and embeddings, with text-to-speech and speech-to-text functionalities in the pipeline, and requires no servers.
  • LeetCode for CUDA Released, Enters Beta: The community announces the release of LeetCode for CUDA at https://LeetGPU.com/challenges.
    • The platform is currently in beta, and user feedback is welcomed.

Links mentioned:


GPU MODE ā–· #reasoning-gym (25 messagesšŸ”„):

Reasoning Gym Eval Script, Mercury Diffusion LLMs, GPT-4.5 Release, willccbb/verifiers issue

  • Reasoning Gym’s Eval Script Needs Improvement: Members discussed that the current reasoning-gym eval script lacks error printing and informative logs, making debugging difficult, but a new version is in the works.
    • Issues were found with API key setup using os.genenv (resolved by using load_env) and JSON serialization of time objects, causing script failures.
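The time-object serialization failure is easy to reproduce; a common fix (plausibly what the eval script needs, though the exact patch isn't stated in the discussion) is `json.dumps(..., default=str)`:

```python
import json
from datetime import datetime

record = {"task": "eval", "started": datetime(2025, 2, 28, 12, 0, 0)}

# json.dumps(record) raises TypeError: datetime is not JSON serializable.
# Passing default=str converts any non-serializable object via str() first.
serialized = json.dumps(record, default=str)
print(serialized)  # {"task": "eval", "started": "2025-02-28 12:00:00"}
```

The tradeoff of `default=str` is lossy round-tripping: a custom encoder emitting ISO-8601 strings is the more durable fix if the logs are parsed later.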
  • Diffusion Models Could Eclipse Autoregressive LLMs: Discussion pointed to Inception Labs’ Mercury, a diffusion-based LLM that could outperform traditional auto-regressive models in speed and quality.
    • Mercury is reported to be up to 10x faster than speed-optimized LLMs, achieving over 1000 tokens/sec on NVIDIA H100s.
  • GPT-4.5 Release Met with Skepticism: The release of GPT-4.5 was met with skepticism due to its high cost, lack of reasoning capabilities, and perceived lack of excitement, with one member describing it as ā€œwhat a flopā€.
    • Concerns were raised about its cost and the removal of the model picker, leading some to question its value proposition, and whether GPT-5 will be the real unified model.
  • willccbb/verifiers issue re-opened: A member mentioned re-opening the issue on the willccbb/verifiers project, inviting community contribution to the effort.
    • However, the member indicated they personally may lack the time to actively work on the issue.

Links mentioned:

  • ClaudePlaysPokemon - Twitch: Claude Plays Pokemon - Debut Stream
  • Inception Labs: We are leveraging diffusion technology to develop a new generation of LLMs. Our dLLMs are much faster and more efficient than traditional auto-regressive LLMs. And diffusion models are more accurate, ...

GPU MODE ā–· #gpuęØ”å¼ (16 messagesšŸ”„):

Chinese Internet Trends (Douyin vs. Xiaohongshu), Experiences with NVIDIA Hardware, MLSys and CUDA Discussions on Xiaohongshu, Chinese Room Thought Experiment, CUDA QQ Groups

  • Xiaohongshu Surpasses Douyin: A user switched to Xiaohongshu after Douyin was banned, noting the need to engage with the Chinese internet landscape.
    • The user expressed a preference for Xiaohongshu but admitted it’s not suitable for in-depth technical content due to its mobile-centric SNS format, recommending Zhihu, blogs, and papers for deeper learning.
  • Bonding over NVIDIA Hardware Struggles: A user finds common ground with Chinese engineers in navigating NVIDIA hardware, preferring direct communication over relying on promotional materials.
    • The user mentioned learning from various sources to bypass propaganda and engage directly with people.
  • MLSys/CUDA Content on Xiaohongshu Explodes: A user noticed an increase in MLSys and CUDA-related content on Xiaohongshu, but acknowledges its limitations for in-depth study.
    • The user noted, xhsčæ˜ę˜Æäøé€‚åˆčæ™ē§å†…å®¹ļ¼Œäø»č¦xhsēœŸå°±ę˜ÆäøŖé¢å‘ę‰‹ęœŗēš„sns and recommends Zhihu, blogs, and papers for serious learning.
  • Navigating the Chinese Room Thought Experiment: A user introduces the Chinese room thought experiment, referencing its Wikipedia page, to explain a shared phenomenon.
    • The Chinese Room is a thought-experiment argument against Strong AI.
  • Craving CUDA QQ Group Banter: A user expressed a desire for a CUDA QQ group to facilitate casual discussion and information sharing.
    • Another user responded that WeChat groups related to the topic do exist.

Link mentioned: äø­ę–‡ęˆæé—“ - ē»“åŸŗē™¾ē§‘ļ¼Œč‡Ŗē”±ēš„ē™¾ē§‘å…Øä¹¦: no description found


GPU MODE ā–· #general (1 messages):

1000 Submissions Milestone

  • Community Reaches 1000 Submissions: The community reached 1000 submissions and celebrated with a champagne toast in the attached image.
  • Celebratory Champagne: The image shows what appears to be a celebratory scene, possibly involving champagne or sparkling wine, to mark the milestone.

GPU MODE ā–· #submissions (206 messagesšŸ”„šŸ”„):

Grayscale Leaderboard, Histogram Leaderboard, Vectoradd Leaderboard, Vectorsum Leaderboard, Sort Leaderboard

  • Grayscale Submissions Galore: Multiple submissions, both benchmarks and leaderboard entries, were made to the grayscale leaderboard using various GPUs like A100, H100, T4, and L4 with Modal runners.
    • Many of these submissions triggered a message stating Leaderboard name specified in the command doesn’t match the one in the submission script header.
  • Histogram Gets Heaps of Hits: Numerous submissions were made to the histogram leaderboard, utilizing GPUs such as T4, H100, and A100 with Modal runners, including test, benchmark and leaderboard submissions.
    • Similar to the grayscale submissions, many of these triggered a message stating Leaderboard name specified in the command doesn’t match the one in the submission script header.
  • Vectoradd Victories Vanquish Valuelessness: Submissions, mostly benchmarks, targeted the vectoradd leaderboard, employing GPUs like T4, A100, and H100 with Modal runners.
    • A notable number of these submissions also triggered the Leaderboard name specified in the command doesn’t match the one in the submission script header message.
  • Vectorsum Ventures Validate Variance: Test and benchmark submissions were made to the vectorsum leaderboard, primarily using A100 GPUs and Modal runners.
    • Most of these submissions triggered a message stating Leaderboard name specified in the command doesn’t match the one in the submission script header.
  • Sorting Submissions surface: Benchmark submissions were made to the sort leaderboard using T4 GPUs and Modal runners.
    • These submissions triggered a message stating Leaderboard name specified in the command doesn’t match the one in the submission script header.

GPU MODE ā–· #ppc (10 messagesšŸ”„):

INT8 Matmul, Loop Reordering, CPU optimization

  • INT8 Matmul Mystery: A member is struggling with INT8 matmul baseline performance, taking 3.62 seconds even after transposing B.
    • Another member claims they achieved faster speeds without multithreading, instruction-level parallelism, or vectorization, relying on existing knowledge and intuition.
  • Loop Reordering Saves the Day: One member suggests that loop reordering is a key optimization for matmul on CPU, easily found via a quick Google search.
    • The same member clarified they meant CPU optimization, also asking if the user ran modprobe amd_uncore.
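The loop-reordering idea can be sketched as follows: swapping the j and k loops makes the innermost loop stream contiguously through rows of B and C instead of striding down columns of B (shown here in Python for clarity; the cache benefit applies to the compiled C/C++ equivalent):

```python
# Naive ijk matmul walks B column-wise (cache-unfriendly in row-major layout).
def matmul_ijk(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C

# ikj ordering: the innermost loop is contiguous in both C[i] and B[k].
def matmul_ikj(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]  # hoisted: constant over the inner loop
            for j in range(p):
                C[i][j] += a * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_ikj(A, B))  # [[19, 22], [43, 50]]
```

Both orderings compute the same product; only the memory traversal changes, which is why the reordering alone often gives a large speedup on CPU before any vectorization or threading.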

GPU MODE ā–· #feature-requests-and-bugs (6 messages):

Custom Kernel Preprocessing, Bot Submitter Identification, Matmul Preprocessing Time

  • Custom Kernel Preprocessing Concerns Raised: A member questioned the difference between the current setup and a new proposal regarding defining a preprocessing function in custom_kernel as part of the timing analysis.
    • Another member responded that they think it makes sense for it to be included, but did not elaborate.
  • Bot Needs Submitter ID Upgrade: A user expressed confusion about identifying submissions when interacting with the bot, suggesting the inclusion of the submitter’s username in the topic title.
    • Another member confirmed that this request had been voiced by others and should be implemented soon when admins have available time.
  • Matmul Preprocessing Timeout Tensions: A member suggested including preprocessing time for large matrix multiplication (matmul) targets, given its O(n²) complexity versus the O(n³) kernel runtime.
    • For other settings, they proposed setting a reasonable timeout, such as limiting preprocessing time to 100ms for kernels expected to run in under 10ms.

OpenRouter (Alex Atallah) ā–· #announcements (4 messages):

OpenAI Outage, DeepSeek R1, Claude Sonnet 3.7, GPT-4.5 Preview

  • OpenAI Provider Outage Resolved: OpenRouter experienced an OpenAI provider outage which was identified as an incident on OpenAI’s side and has since been resolved.
    • Requests are now succeeding, and OpenAI as a provider on OpenRouter has recovered.
  • DeepSeek R1 Blazes with SambaNovaAI: A new provider for the 671B-param DeepSeek R1 via SambaNovaAI now provides 150 tokens/second.
  • Claude Sonnet 3.7 Boasts Capacity and Web Search: Claude Sonnet 3.7 now has significantly higher rate limits and web search capability on OpenRouter.
  • GPT-4.5 Preview Rockets onto OpenRouter: GPT-4.5 (Preview), designed to push boundaries in reasoning, creativity, and long-context conversations, is now available on OpenRouter, costing $75/M input tokens and $150/M output tokens.
    • Early testing shows improvements in open-ended thinking, real-world knowledge, long-context coherence, and reduced hallucinations; the announcement links to the OpenAI blog post and a discussion on X.
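
At those prices a back-of-the-envelope estimate helps; the sketch below computes per-request cost and builds an OpenRouter chat-completions request body (the `openai/gpt-4.5-preview` model slug is my assumption — check the OpenRouter model page for the exact ID):

```python
GPT45_INPUT_PER_M = 75.0    # USD per million input tokens (from the announcement)
GPT45_OUTPUT_PER_M = 150.0  # USD per million output tokens

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    # Linear token pricing: tokens * rate / 1M.
    return (input_tokens * GPT45_INPUT_PER_M
            + output_tokens * GPT45_OUTPUT_PER_M) / 1_000_000

def build_payload(prompt: str) -> dict:
    # Body for POST https://openrouter.ai/api/v1/chat/completions,
    # sent with an "Authorization: Bearer <OPENROUTER_API_KEY>" header.
    return {
        "model": "openai/gpt-4.5-preview",  # assumed slug
        "messages": [{"role": "user", "content": prompt}],
    }
```

At these rates a single 10k-token-in / 1k-token-out request costs about $0.90, which is why agent loops at this price draw so much comment.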


OpenRouter (Alex Atallah) ā–· #app-showcase (2 messages):

YPerf, Gemini Flash, Llama 3, Claude 3.5 Sonnet

  • YPerf Tracks OpenRouter Model Performance: A member created YPerf.com to monitor model API usage and performance across OpenRouter.
  • Gemini Flash 1.5 8B benchmarked: The Gemini Flash 1.5 8B ranks #66, costing $0.04, with 0.52s latency and 419.8T/s throughput on OpenRouter.

Link mentioned: YPerf: no description found


OpenRouter (Alex Atallah) ā–· #general (389 messagesšŸ”„šŸ”„):

Sonnet 3.7 thinking endpoint, DeepSeek R1 reasoning, OpenAI's GPT 4.5 pricing and performance, OpenRouter Documentation

  • Sonnet 3.7 :thinking endpoint showing less weirdness: Members noticed that using the :thinking endpoint with Sonnet 3.7 on OpenRouter seems to reduce weird behavior, possibly due to the endpoint enabling reasoning by default with a minimum budget of 1024 tokens.
    • One member reported seeing "native_tokens_reasoning": 171, in requests, indicating reasoning traces, and suggested that 3.7 might be designed for thinking tokens.
  • DeepSeek R1’s thought chains via API: Users discussed how to access DeepSeek R1’s thought chains through the API, with a member recommending the include_reasoning parameter.
    • It was also noted that some content tokens might slip into the reasoning token, and the recommendation was to ā€˜double check thinking tags and never forget them’.
  • GPT 4.5’s high price riles up community: The community reacted strongly to the pricing of GPT 4.5 ($75/M input tokens, $150/M output tokens), with many calling it insane and questioning its value compared to models like Grok 3 and Claude Sonnet 3.7.
    • Some speculated it was a failed attempt at gpt5, while others believed it was a measure against distillation, making the exorbitant cost unjustifiable.
  • OpenRouter adds documentation for access and features: A user requested documentation about OpenRouter’s functionality and architecture and documentation was shared, offering insights into usage, API access, and supported features.
    • Another user inquired about the availability of prompt caching with Vertex AI, and it was confirmed this was available for almost a month with tips on where to view the activity.
  • User builds CAD app with OpenSCAD clone: One member is building a CAD app in the browser that’s an OpenSCAD clone with a different backend.
    • The language supports basic syntax like var x = 42;, operators like + - * /, basic shapes like sphere(radius);, SDF operators, transformations, and boolean operations.
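
A minimal SDF core of the kind described needs only a few distance functions and combinators; a toy Python sketch (illustrative only, unrelated to the member's actual implementation):

```python
import math

def sphere(radius):
    # Signed distance to a sphere centered at the origin:
    # negative inside, zero on the surface, positive outside.
    return lambda x, y, z: math.sqrt(x*x + y*y + z*z) - radius

def translate(sdf, dx, dy, dz):
    # Moving a shape = evaluating the SDF in shifted coordinates.
    return lambda x, y, z: sdf(x - dx, y - dy, z - dz)

def union(a, b):
    # Boolean union of two SDFs: the closer surface wins.
    return lambda x, y, z: min(a(x, y, z), b(x, y, z))

# A unit sphere merged with a smaller sphere shifted along +x.
shape = union(sphere(1.0), translate(sphere(0.5), 1.5, 0.0, 0.0))
```

Subtraction and intersection follow the same pattern (`max(a, -b)` and `max(a, b)`), which is why SDF backends compose so cleanly.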


LM Studio ā–· #general (278 messagesšŸ”„šŸ”„):

Robotics DIY, LLM backend website, Grok-3 performance vs O3, DeepSeek political controversy, OpenAI defense contracts

  • DIY Robotics Arm Excites Hobbyists: A member suggests building a robotics arm from scratch to learn about servos, CAD, and microcontrollers and recommends a $100 Creality Ender 3 V2 printer from Microcenter.
  • LLM Backends for Websites Debated: Members discussed how to implement an LLM in a website, with suggestions including using websockets, SSR, AnythingLLM, and code editors like Cursor and Continue.dev.
    • It was clarified that hosting a website on GitHub Pages would require the LLM to be hosted elsewhere (Azure, cloud, ngrok), sparking frustration and a humorous exchange.
  • Grok-3 performance beats O3: Members discuss the surprisingly good performance of Grok-3 vs the previous O3 model on various benchmarks, and wondered if X.ai’s benchmarks were accurate or misleading.
    • The users debated if Grok-3 was rushed to market without proper ethical red-teaming, while others argued that Grok 3 is a beta, monitored, and not on API due to safety reasons.
  • DeepSeek’s Politically Charged Responses Spark Debate: Members debated whether DeepSeek’s censorship of certain Chinese historical events is unethical, with some arguing it’s a necessary self-preservation measure.
  • OpenAI’s Defense Partnerships Stir Ethical Concerns: Members reacted to news that OpenAI is working with the military and defense industry, a reversal of their original stance, and their new partnership with Anduril.
    • Some find the lack of oversight and potential for weaponization concerning, while others mention Ilya Sutskever, the ex-Chief Scientist of OpenAI who left to start his own safety-focused AI company, Safe Superintelligence (SSI).


LM Studio ā–· #hardware-discussion (41 messagesšŸ”„):

Framework desktop, Unified RAM, AMD Ryzen AI, GPU Pricing

  • Framework Desktop Gains Traction: A user pre-ordered a Framework desktop to experiment with LM Studio server and Tailscale for an iPhone chat app, Docker, and webservers.
    • Some expressed concerns about waiting until summer for the product, with one noting it will likely be joined by a dozen other mini PCs with the same SoC by then.
  • Framework Desktop’s Unified RAM Intriguing: The Framework desktop features unified RAM between the CPU and GPU, offering up to 128GB of shared memory, with approximately 90GB available for the GPU.
    • One user likened it to a MAC setup, highlighting the appeal of unified RAM in a PC.
  • GMK’s Ryzen AI Max Mini-PC Unveiled: GMK announced the world’s first mini-PC based on AMD Ryzen AI 9 Max+ 395, expected to hit the market in the first or second quarter.
    • This mini-PC will feature Zen 5 architecture with up to a 16-core/32-thread configuration and powerful integrated graphics based on the RDNA 3.5 architecture.
  • AMD’s GPU Pricing Strategy Under Scrutiny: A YouTube video urges AMD to aggressively price its upcoming RX 9070 and 9070 XT GPUs to gain market share from Nvidia.
    • The video highlights Nvidia’s 90% GPU market share and argues that AMD should undercut Nvidia significantly to capitalize on recent missteps, instead of its typical Nvidia minus $50 strategy.


Interconnects (Nathan Lambert) ā–· #news (274 messagesšŸ”„šŸ”„):

Claude Annual Subscriptions, Microsoft Phi-4 Models, GPT-4.5 System Card, OpenAI Livestream, Meta AI Standalone App

  • Claude Pro Annual Plan Promo: Anthropic is experimenting with a new Claude web app promotion, offering a limited time offer for a year of Claude Pro at a special price if switching to an annual plan by a specific end date, prompting a reminder not to buy annual subs for AI services from a user.
    • As another user notes, they have been there, done that: they regretted an annual subscription they never ended up using.
  • Microsoft Launches Phi-4-multimodal and Phi-4-mini: Microsoft announced the Phi-4 family of small language models (SLMs), including Phi-4-multimodal (processes speech, vision, and text) and Phi-4-mini (excels in text-based tasks), available in Azure AI Foundry, HuggingFace, and the NVIDIA API Catalog.
    • Some users doubt claims that it has similar multimodal performance to Gemini Flash lite, and also that Microsoft should rename the product line, as they will never escape their karmic stain.
  • Leaked GPT-4.5 System Card: A user shared the GPT-4.5 System Card, indicating that interacting with GPT-4.5 feels more natural and that internal testers report GPT-4.5 is warm, intuitive, and natural. The system card notes that it’s OpenAI’s largest LLM, improving GPT-4’s computational efficiency by more than 10x.
    • A user calls the card very boring, while another interprets the card to indicate a GPT4.5: creative writooor while Sonnet 3.5 is a problem solver.
  • OpenAI launches GPT-4.5, Character gets Mainstream: OpenAI launched GPT-4.5 as a research preview, available to OpenAI Pro users and API developers, with image and text input, text output, the same context window as GPT-4o, and a training cutoff of June 2024. Here is the official announcement.
    • One user says character/personality is becoming a mainstream topic, and OpenAI aggressively used low-precision training. Another questions how big is the model with that pricing.
  • GPT-4.5 Performance and Pricing Cause Community Reactions: Early benchmarks of GPT-4.5 show it being outperformed by o1 on several problems, indicating pre-training isn’t the optimal place to spend compute in 2025, but one user notes the hallucination metrics are very good. GPT-4.5’s pricing is steep at $75 per million input tokens and $150 per million output tokens, prompting one user to state this must be the end of scaling.
    • Another user believes in 1-2 years this will be the default model size.


Interconnects (Nathan Lambert) ā–· #ml-drama (4 messages):

Anthropic data collection, Alignment for monitoring

  • Anthropic Accused of Data Collection Shenanigans: A user accused Anthropic of sneaky data collection from the Computer Use API, using it to train classifiers for corporate ethical guidelines, and updating their website to appear transparent, according to this fxtwitter thread.
  • Alignment Monitoring’s Data Origins Unclear: It was inferred that Anthropic used user data based on their summarization-for-monitoring blog post, although a user pointed out that the data source used for training remains unspecified.

Link mentioned: Tweet from Pliny the Liberator šŸ‰ (@elder_plinius): sneaky sneaky, @AnthropicAI collecting user data from everyone that used the Computer Use API without informed consent or an opt-out option is dirty work; using that data to then train a classifier to im…


Interconnects (Nathan Lambert) ā–· #random (19 messagesšŸ”„):

Claude Code access and potential uses, DeepEP analysis, AI competing on Pokemon Red%, Claude 3.7 Sonnet RL issues

  • Claude Code Craze & Obsidian Integration: A member is curious about Claude Code access and is considering using it within their Obsidian vault, coupled with Google Calendar and Gmail MCPs, to organize their life.
  • DeepEP Deconstructed & Hardware Caveats: A member shared an analysis of DeepEP, noting it as a valuable work with many details to learn from, but also pointing out hardware limitations that are better understood in conjunction with suggestions from the DeepSeek-V3 paper.
  • MissingNo Mayhem & Model Misbehavior: A member joked about AI companies competing on Pokemon Red%, predicting a model will exploit a bug like MissingNo, causing safety concerns due to widespread guides, even suggesting the possibility of China releasing such a model in real life.
  • Sonnet 3.7 Stumbles & Rule Rejection: A member shared their experience using Claude 3.7 Sonnet in Cursor, finding it over-confident and prone to ignoring rules, echoing Catalin’s sentiments of the model being worse than 3.5 due to its addiction to the reward signal.


Interconnects (Nathan Lambert) ā–· #memes (10 messagesšŸ”„):

GPT-4.5 release, DeepSeek r1, Claude Code ls node_modules, Gary Marcus GPT-4.5

  • OpenAI skips GPT-4.5 and goes to OpenAI Five: A Twitter user noted OpenAI skipped GPT-4.5 and went straight to ā€œOpenAI Fiveā€.
  • GPT 4.5 can hold your hand: A user joked about the DeepSeek r1 release, claiming Grok 3 beats every benchmark and that GPT 4.5 ā€œcan hold my hand when I am scaredā€, according to this tweet.
  • Claude code executes ls in node_modules: A user shared that Claude Code decided to ls in node_modules according to this tweet.
  • GPT 4.5 is nothingburger says Gary Marcus: Gary Marcus wrote a Substack article claiming that GPT-4.5 is a nothing burger and GPT 5 is still a fantasy.


Interconnects (Nathan Lambert) ā–· #reads (3 messages):

Alignment, Realism-grounded alignment

  • Anthropic Reveals Alignment Monitoring via Summarization: Anthropic posts about Alignment Monitoring via Summarization for their alignment techniques.
  • Realism-Grounded Alignment Gets Thumbs Up: A member expressed a preference for realism-grounded alignment approaches.

Interconnects (Nathan Lambert) ā–· #posts (2 messages):

olmOCR vs Top PDF tools, Pairwise judgments and Elo score

  • olmOCR Dominates PDF Processing: Allen AI’s olmOCR tool outperforms top PDF processing tools in human evaluations using pairwise judgments.
  • Pairwise Ranking Decoded: A member clarified that the y-axis on the linked chart likely represents an Elo score, inferred from the mention of pairwise ranking in the olmOCR comparison.
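
For readers unfamiliar with the metric, an Elo-style rating can be derived from pairwise judgments with the standard update rule (a generic sketch, not Ai2's evaluation code; the ratings below are toy numbers):

```python
def expected_score(ra, rb):
    # Probability that A beats B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

def update_elo(ratings, winner, loser, k=32.0):
    # Winner gains, loser loses, the same amount: k * (1 - expected).
    ea = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - ea)
    ratings[loser] -= k * (1.0 - ea)

ratings = {"olmOCR": 1000.0, "tool_b": 1000.0}
# olmOCR wins 3 of 4 pairwise judgments in this toy run.
for w, l in [("olmOCR", "tool_b")] * 3 + [("tool_b", "olmOCR")]:
    update_elo(ratings, w, l)
```

Because each update is symmetric, the total rating mass is conserved; what the chart's y-axis would show is each tool's rating after many such judgments.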

Link mentioned: Tweet from Ai2 (@allen_ai): olmOCR dominates the competition! Our human evaluation using pairwise judgments against top PDF processing tools show olmOCR’s rating significantly above other tools. Don’t take our word for i…


Latent Space ā–· #ai-general-chat (133 messagesšŸ”„šŸ”„):

Speak AI revenue graph, Hume AI's Octave text-to-speech LLM, Levelsio flying project, Perplexity Sonar API Deep Research, Firecrawl Deep Research API

  • Speak AI’s Novel Exponential Revenue: Paul Graham shared a revenue graph showing a novel variant of exponential growth, where a company selling a new year’s resolution product sees sustained usage due to its effectiveness.
    • Swyx noted this observation, highlighting the company’s unique growth pattern.
  • Hume AI Releases Octave Text-to-Speech LLM: Hume AI launched Octave, a new LLM for text-to-speech that can design voices with prompts and control emotion and delivery, with a creator studio for long-form content production.
    • It understands how meaning affects delivery to generate emotional, human-like speech, unlike traditional TTS systems.
  • Inception Labs releases Mercury dLLM: Inception Labs introduced Mercury, the first commercial-grade diffusion large language model (dLLM), which promises parallel, coarse-to-fine text generation.
    • Karpathy commented that this model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses, encouraging people to try it out.
  • MCP: Tool Calling Renaissance: There are contrasting views on MCP’s value prop, Greg Kamradt suggests developers jump on the Anthropic MCP train and build, while others find the dev experience sucks.
    • Members defined MCP as a tool call with your own tools, or potentially use tools other people have built without wanting to figure out their underlying API.
  • Karpathy Teaches LLMs: Andrej Karpathy released a 2h11m YouTube video on How I Use LLMs, covering a practical guide to the LLM ecosystem with examples, including tool use, file uploads, audio/video I/O, memory, and custom GPTs.
    • Chapters include: ChatGPT interaction, tool use (internet search, deep research, Python interpreter), Claude Artifacts, Cursor Composer, Speech I/O, NotebookLM, and image/video I/O.

Links mentioned:

  • Tweet from Inception Labs (@InceptionAILabs): We are excited to introduce Mercury, the first commercial-grade diffusion large language model (dLLM)! dLLMs push the frontier of intelligence and speed with parallel, coarse-to-fine text generation.
  • Mercury Coder: no description found
  • Alter | AI For Your Entire Workday: Alter: The seamless AI that supercharges your Mac. Skip the chat, execute instant actions across all apps. 10x your productivity with complete privacy control.
  • Tweet from Hume (@hume_ai): Today, we’re releasing Octave: the first LLM built for text-to-speech.šŸŽØDesign any voice with a promptšŸŽ¬ Give acting instructions to control emotion and delivery (sarcasm, whispering, etc.)šŸ› ļøProduce ...
  • Tweet from Firecrawl (@firecrawl_dev): Announcing the Firecrawl Deep Research API šŸ”ŽA complete research API that allows you to easily build deep research into your own applications.Join the waitlist below!
  • Tweet from Paul Graham (@paulg): Here's what happened to that startup's revenue graph in the next year (in blue).Quoting Paul Graham (@paulg) A novel variant of exponential revenue graph. This company is selling something use...
  • Tweet from @levelsio (@levelsio): I think 5000 people flying but I also see some bots šŸ˜…Quoting Thomas Slabbers (@Thomasslabbers) This is pure genius - look at how many people are flying right now! I also found Mars. Pieter this might...
  • Tweet from Andrej Karpathy (@karpathy): New 2h11m YouTube video: How I Use LLMsThis video continues my general audience series. The last one focused on how LLMs are trained, so I wanted to follow up with a more practical guide of the entire...
  • Reddit - Dive into anything: no description found
  • Tweet from Aravind Srinivas (@AravSrinivas): We’re making Deep Research available as an endpoint to all developers through the Perplexity Sonar API to help people build their custom research agents and workflows! Excited to see what people are g...
  • Tweet from Addy Osmani (@addyosmani): Can you accurately transcribe fast speech? Tested @elevenlabsio' new Speech-to-Text model (Scribe) with Eminem's "Rap God" (4.28 words/sec!) & it nailed it. Great quality and supports ...
  • Tweet from Andrej Karpathy (@karpathy): This is interesting as a first large diffusion-based LLM.Most of the LLMs you've been seeing are ~clones as far as the core modeling approach goes. They're all trained "autoregressively...
  • Tweet from Quinten Farmer (@quintendf): I’m excited to announce Tolan, our first Embodied Companion.With no launch or press we’ve quietly hit 500,000+ downloads, over $1m in ARR, and a #1 app store category ranking.Today I’m also announcing...
  • Tweet from OpenAI (@OpenAI): Livestream in 4.5 hours.
  • Tweet from Aravind Srinivas (@AravSrinivas): We’re making Deep Research available as an endpoint to all developers through the Perplexity Sonar API to help people build their custom research agents and workflows! Excited to see what people are g...
  • Tweet from Nick St. Pierre (@nickfloats): In just the past few weeks:o1-pro was SOTADeepseek r1 was SOTAo3‑mini was SOTAGrok 3 was SOTAClaude 3.7 was SOTACan you feel the acceleration?
  • Add update-ui tool with synchronous UI action handling by wesen Ā· Pull Request #9 Ā· go-go-golems/go-go-mcp: This PR introduces a synchronous UI update system that waits for user actionsbefore completing requests, making it easier to build interactive applications.Key changes:Refactored UI handling in...
  • Tweet from Greg Kamradt (@GregKamradt): If you’re a dev looking for a career directionGo jump on the Anthropic MCP train and buildIt’s having a moment and there are 1M best practices to figure outThis is the sign you’ve been waiting for
  • Tweet from Chris Frantz (@frantzfries): Could somebody please explain why MCP’s are valuableI tried setting up a few, the dev experience sucks and the GitHub’s repos are full of issues saying it sucks after trying themExisting API’s are fas...
  • Welcome to the new Phi-4 models - Microsoft Phi-4-mini & Phi-4-multimodal: Phi-4-mini brings significant enhancements in multilingual support, reasoning, and mathematics, and now, the long-awaited function calling feature is finally...

Latent Space ā–· #ai-in-action-club (166 messagesšŸ”„šŸ”„):

GPT 4.5, Claude 3.7 Sonnet, Model Scaling, Open Source, Every Hiring

  • GPT-4.5 Watch Party Rough Start: Members experienced initial technical difficulties and struggled to hear the stream audio of the GPT-4.5 launch, with some humorously suggesting the presenter was roasted.
    • Viewers generally felt the GPT 4.5 launch stream was a disappointment, with descriptions such as hostage video and some saying this stream is rough and that the vibe test failed.
  • New Scaling Laws HOT SWAP: OpenAI presentation introduces new scaling laws, indicating a change in the ratio between data and param size in the post-training stage.
    • They asked themselves during the presentation are we hitting a wall.
  • GPT-4.5 Skips API, aims for Therapy: The new model doesn’t have an API, and is focused on heavy-tail, real world edge cases like responding to angry texts, and better use cases.
    • Members were unimpressed with GPT-4.5’s example use cases (everyday queries including texts to send to your friends).
  • Sonnet 3.7 Overconfident Ignoring Rules: A member claimed that Claude 3.7 Sonnet is worse than 3.5, as it is over-confident, ignores rules, and unnecessarily does more than it needs to do and therefore breaks the code.
    • They are going back to 3.5.
  • Every Hires for Cora Calm Inbox: Every is hiring a full-stack AI engineer for Cora, building a calm inbox with over 1,000 daily active users and 10,000 on the waitlist.
    • There are also openings for a growth marketing lead and a full-stack designer for their website.


Nous Research AI ā–· #general (280 messagesšŸ”„šŸ”„):

Apple Intelligence Underwhelming, Efficient CoT, GPT-4.5, MoE Models, Wan2.1 video model

  • Wan2.1 rises as Stable Diffusion moment for video: The release of Wan2.1, an open and advanced large-scale video generative model, has been hailed as the stable diffusion moment for video models.
  • Experiments on efficient CoT with Reward Models: Members discussed methods to make long Chain of Thought (CoT) more efficient, including using another LLM to make CoTs more concise and defining a reward function that rewards efficient thoughts, but the consensus is that this is the problem of the year.
    • Suggestions included ideas such as latent CoTs, MoE models, and optimizing for the correct outcome while minimizing excess reasoning tokens; but everyone is noticing that Process Reward Models kinda suck.
  • MoE Models Prove Speedier on CPUs: Members tested Mixtral, Granite, and DeepSeek R1 against models such as Llama 3.2 and OLMoE, showing that MoE models are faster and lose less performance when going to pure CPU execution.
    • One user notes that they highly recommend OLMoE to be thrown on smaller CPU only devices, like a 16GB Raspberry Pi because there is still value in getting answers back effectively instantly.
  • GPT-4.5 release underwhelming: GPT-4.5 has been released, being described as a very large and compute-intensive model, making it more expensive than and not a replacement for GPT-4o, with Sam Altman stating that this model feels like talking to a thoughtful person.
    • Karpathy claims it has 10x more pretraining compute than GPT-4; however, its use case might be limited given that it is overfit on the river crossing puzzle and more geared towards creative use cases.
  • Apple Intelligence: Big Shift or Big Miss?: Members discussed Apple Intelligence, with some believing it is underwhelming, and also a big shift away from the money coming from business API use over to the money coming from consumers, while one mentioned they’re in an edge-inference-first trap.
    • Members note that Apple focused on use cases possible with on-device constraints, while everyone else just tried to make AI as good as possible, suggesting Apple should have been first on this, but messed it up.
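
The reward-function idea from the efficient-CoT discussion can be written down as a simple length-penalized score (a generic sketch; the penalty weight and token budget are my choices, not from the discussion):

```python
def cot_reward(correct: bool, n_reasoning_tokens: int,
               lam: float = 1e-4, budget: int = 1024) -> float:
    """Reward correctness, then subtract a small penalty for every
    reasoning token beyond the budget, so concise correct chains win."""
    base = 1.0 if correct else 0.0
    overage = max(0, n_reasoning_tokens - budget)
    return base - lam * overage
```

With these numbers a correct 4,096-token chain scores about 0.69 against 1.0 for one within budget, so the optimizer is pushed toward the correct outcome with minimal excess reasoning tokens; the hard part the members flag is that naive per-step scoring (process reward models) tends to work poorly.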


Nous Research AI ā–· #ask-about-llms (4 messages):

AI Voice Commands, Reasoning in AI Models, Text-to-Speech AI, Elevenlabs, Cartesia

  • Reasoning Toggle for AI via Voice Commands: A user inquired about toggling reasoning in an AI model via voice commands, aiming for 90% reasoning off unless specifically prompted with phrases like ā€œuse reasoningā€.
    • The user asked if they could add a system prompt to achieve this and whether it’s possible to finetune the reasoning process and enable text-to-speech functionality.
  • Text-to-Speech AI models are being discussed: A user planned to implement voice output using Elevenlabs or Cartesia text-to-speech, clarifying their intention after another user stated that the model cannot speak in voice.
    • The member pointed to this YouTube video as a demonstration of something similar to what they are trying to achieve with AI assistants.
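
The prompt-level toggle the user describes might be sketched like this (whether a given model reliably honors it is model-dependent; the wording of the prompt is mine):

```python
REASONING_SYSTEM_PROMPT = (
    "Answer directly and concisely, without showing step-by-step reasoning, "
    "UNLESS the user's message contains the phrase 'use reasoning', in which "
    "case think through the problem step by step before answering."
)

def build_messages(user_text: str) -> list[dict]:
    # Standard chat-completions message list: system prompt first,
    # then the user's turn; the model sees the toggle rule every time.
    return [
        {"role": "system", "content": REASONING_SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]
```

The text output of such a pipeline can then be fed to a TTS provider such as Elevenlabs or Cartesia, which is the setup the user describes.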

Link mentioned: Deepseek AI Assistant: ALWAYS ON Python AI Agent for Engineers that SHIP: šŸ”„ Is your Personal AI Assistant truly ALWAYS ON? Discover how Ada, powered by DeepSeek V3, is revolutionizing the way engineers ship code! šŸš€šŸŽ„ Resources fo…


Nous Research AI ā–· #research-papers (1 messages):

Language Models, REFUTE benchmark, algorithmic problem solving

  • Language Models to accelerate science?: Language Models (LMs) have the potential to accelerate scientific discovery by helping to falsify hypotheses and refine claims iteratively.
    • Current benchmarks for LMs assess their ability to generate solutions rather than challenge them.
  • Introducing REFUTE Benchmark for Algorithmic Problem Solving: A new dynamically updating benchmark called REFUTE is introduced to assess LMs’ ability to generate counterexamples for incorrect solutions in algorithmic problem solving.
    • It includes recent problems and incorrect submissions from programming competitions, where human experts successfully identified counterexamples.
  • LMs struggle with verification on REFUTE: Analysis of the REFUTE benchmark reveals that even the best reasoning agents succeed in finding counterexamples only 9% of the time.
    • This suggests that verification can be significantly harder than generation for language models.
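
REFUTE's task can be mimicked in miniature: given a reference solution and a buggy one, search small inputs for a disagreement. The toy `buggy_max_abs` below is my own example, not from the benchmark:

```python
def reference_max_abs(xs):
    # Correct: the largest absolute value in the list.
    return max(abs(x) for x in xs)

def buggy_max_abs(xs):
    # Buggy: forgets that the most negative element can dominate.
    return max(xs)

def find_counterexample(ref, buggy, candidates):
    # Return the first input on which the two solutions disagree.
    for xs in candidates:
        if ref(xs) != buggy(xs):
            return xs
    return None

cex = find_counterexample(
    reference_max_abs, buggy_max_abs,
    [[1, 2, 3], [0, 5], [-7, 3], [2, -1, 2]],
)
```

The benchmark's finding is that models rarely produce a `cex` like this unaided, even when they can write the correct solution themselves.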

Link mentioned: Paper page - Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation: no description found


Diffusion LLMs, Mercury dLLM, LLaDA Release

  • Mercury dLLM Launches for Commercial Use!: Inception Labs introduces Mercury, a new family of diffusion large language models (dLLMs), claiming it’s up to 10x faster than current speed-optimized LLMs, achieving over 1000 tokens/sec on NVIDIA H100s; a code generation model, Mercury Coder, is available for testing in a playground.
  • LLaDA Model Gets Official PyTorch Implementation!: The group ML-GSAI released their model with an official PyTorch implementation for ā€œLarge Language Diffusion Modelsā€ available on GitHub.
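
The parallel, coarse-to-fine generation both models advertise can be illustrated with a toy unmasking loop (purely illustrative, not Mercury's or LLaDA's actual algorithm): start fully masked, and at each step commit the k positions the "model" is most confident about, in parallel.

```python
def diffusion_decode(scores, k=2, mask="_"):
    """scores: per-position list of (token, confidence) pairs.
    Unmask the k most confident remaining positions per step."""
    n = len(scores)
    out = [mask] * n
    steps = []
    while mask in out:
        remaining = [i for i in range(n) if out[i] == mask]
        # Commit the k most confident unfilled positions in parallel.
        chosen = sorted(remaining, key=lambda i: -scores[i][1])[:k]
        for i in chosen:
            out[i] = scores[i][0]
        steps.append("".join(out))
    return steps

steps = diffusion_decode(
    [("h", 0.9), ("e", 0.4), ("l", 0.8), ("l", 0.3), ("o", 0.7)], k=2
)
```

Filling several positions per step, instead of one token per step as in autoregressive decoding, is where the claimed throughput advantage comes from.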


Nous Research AI ā–· #research-papers (1 messages):

Language Models, Scientific discovery, REFUTE Benchmark

  • Language Models fuel Scientific Discovery: There is growing excitement about the potential of Language Models (LMs) to accelerate scientific discovery.
    • Current benchmarks for LMs predominantly assess their ability to generate solutions rather than challenge them.
  • REFUTE Benchmark introduced: The REFUTE benchmark includes recent problems and incorrect submissions from programming competitions, where human experts successfully identified counterexamples.
    • Analysis shows that the best reasoning agents succeed only 9% of the time, suggesting that verification can sometimes be much harder than generation.

Link mentioned: Paper page - Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation: no description found


HuggingFace ā–· #general (132 messagesšŸ”„šŸ”„):

HuggingFace Spaces licensing, Fal AI vs Deepinfra pricing, Lighteval MMLU-Pro support, LEFFA paper implementation, HuggingMod bot

  • Spaces License Snafu?: A user inquired about needing a license to create a Space for a community bot, clarified as a software license for code publishing, not a permission for creating a Space.
    • Another user directed them to the HuggingMod bot for code snippets and guidance.
  • Deepinfra Dominates Fal AI in Cost?: While one user recommended Fal AI with $50 free credit, another claimed Deepinfra is 100x cheaper for character processing at $0.8 per million characters and offers free compute.
    • The first user also suggested Kokoro TTS as a cheap option.
  • Apple Silicon Sparks LLM Strides: A user asked about running LLMs on Apple’s Neural Engine, with another pointing to Core ML and Apple’s documentation on optimizing LLMs for Apple silicon.
    • Discussion indicated model conversion to the .mlmodel extension is necessary but can be complex.
  • Gemma Quantization Quandaries: A user inquired about the size of GemmaX2, another user pointed to this page and mentioned it varies between 1.5GB and 5.3GB depending on the quantization.
    • The same user also told the user how to check out the size, by clicking on Use this model.
  • Is OpenAI’s generated text detectable?: Users discussed AI-generated text detection, with one sharing that academic institutions may not check due to lack of definitive proof.
    • A user shared images of cover letters before and after AI improvement, noting that OpenAI models fail horribly in terms of following patterns.


HuggingFace ā–· #today-im-learning (4 messages):

Hiding vs Removing, F2 vs F12, Smol Agents Framework

  • Hiding is not Removing!: A member inquired about the difference between hiding and removing, questioning the benefit of hiding.
  • F2 is nothing like F12: A member shared their TIL (today I learned) moment about the difference between the F2 and F12 keys.
    • No further context was provided.
  • Smol Agents Framework: A member is learning how to build a basic agent using the smol agents framework.
    • They shared no further details about the agent they are building, or their experience.

HuggingFace ā–· #i-made-this (8 messagesšŸ”„):

LLM performance benchmark, Face similarity questionnaire, PyTorch library for 360° images, Phi-4 models

  • LLM Benchmark Unveiled to Evaluate Performance: A member developed a small private benchmark to quickly check general LLM performance using previously unseen questions and estimate how far small local models are from the best online models, now including over 1000 models.
  • Face Similarity Preferences Needed for Master’s Thesis: A member is requesting participation in a questionnaire for their master’s thesis, which focuses on determining which faces look more similar using a pipeline for generating faces.
    • The questionnaire, optimized for PC, takes around 5 minutes to complete and is available at this link.
  • PyTorch360Convert Library Simplifies 360° Image Handling: A member introduced a new, lightweight PyTorch library called pytorch360convert to simplify working with 360° images for VR, AR, video games, and more, available via pip install pytorch360convert.
    • The library supports various image representations, including equirectangular images and cubemaps, is GPU/CPU compatible, supports float32, float64, float16, and bfloat16 precision types, and is available on GitHub.
  • Phi-4 Models Debut on HF Spaces: A member shared a link to phi 4 models on Hugging Face Spaces, marking the availability of this project.

Links mentioned:


HuggingFace ā–· #reading-group (2 messages):

Language Models (LMs), REFUTE Benchmark, Reasoning Agents

  • Language Models Speeding Scientific Discovery: A new paper highlights the potential of Language Models (LMs) to accelerate scientific discovery, emphasizing the importance of falsifying hypotheses.
    • The paper notes that current benchmarks predominantly assess the ability to generate solutions rather than challenge them, advocating for benchmarks that evaluate the inverse capability and linking to their paper here.
  • Introducing the REFUTE Benchmark: The authors introduce REFUTE, a dynamically updating benchmark that includes recent problems and incorrect submissions from programming competitions where human experts successfully identified counterexamples.
    • Analysis shows that even the best reasoning agents score low (9%) at falsifying incorrect algorithmic solutions, despite generating correct ones for 50% of the problems.
  • LLMs as Retrieval Engines: A member commented on the scarcity of data showing that verification can be harder than generation, noting that ā€œgenerate the correct solutionā€-style code data dominates everywhere.
    • The member suggested that LLMs can’t reason too much and are mainly a retrieval engine.
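
The falsification task that REFUTE measures can be illustrated with a toy counterexample search: given an untrusted candidate solution and a slow-but-trusted reference, enumerate small inputs until the two disagree. Everything below (the max-subarray example, the function names) is illustrative, not taken from the paper.

```python
from itertools import product

def reference_max_subarray(xs):
    """Trusted O(n^2) reference: maximum sum over all contiguous subarrays."""
    return max(sum(xs[i:j]) for i in range(len(xs)) for j in range(i + 1, len(xs) + 1))

def buggy_max_subarray(xs):
    """Candidate with a classic bug: assumes the answer is never negative."""
    best = cur = 0
    for x in xs:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best  # wrong when every element is negative

def find_counterexample(candidate, reference, length=3, values=(-2, -1, 0, 1)):
    """Enumerate small inputs; return the first one where the two disagree."""
    for xs in product(values, repeat=length):
        if candidate(list(xs)) != reference(list(xs)):
            return list(xs)
    return None

cex = find_counterexample(buggy_max_subarray, reference_max_subarray)
```

Here the search immediately finds an all-negative input, the case the candidate mishandles; competitive-programming counterexamples are harder precisely because the disagreement region can be tiny.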

Link mentioned: Paper page - Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation: no description found


HuggingFace ā–· #computer-vision (2 messages):


  • No Topics Discussed: No significant topics were discussed in the provided messages.
  • Awaiting Next Session: A member expressed their intention to join the next session.

HuggingFace ā–· #gradio-announcements (1 message):

FastRTC

  • FastRTC Category is LIVE!: A member directs everyone to the FastRTC category for questions, discussions, and announcements.
    • The link to the specific channel is here.
  • Reminder to use FastRTC Category: To keep the server organized, members are encouraged to use the FastRTC category for related discussions.
    • This helps ensure that relevant information is easily accessible and conversations remain focused.

HuggingFace ā–· #smol-course (9 messagesšŸ”„):

Inference Engine Alternatives, Smolagents Quiz Iframe, Smolagents Quiz Failures, HfApiModel vs LiteLLMModel Confusion, SFT Trainer Loss Function

  • Inference Credits Exhausted: A user inquired about discounts or alternative inference engines to continue the studio notebooks on Google Colab for the smolagents course, after exceeding Hugging Face’s inference requests limit.
    • They expressed a desire to continue following along with the course.
  • Smolagents Quiz Display Issues: A user reported that the iframe in the final quiz for unit 2.1 of the smolagents course is too small, making the feedback difficult to read even on a 32″ 4K monitor.
    • They suggested increasing the iframe size to 800x600 or 850x850 to improve readability.
  • Smolagents Quiz Validation is BSing User: A user complained that the agent verifying answers in quiz 2.1 of the agent course is giving contradictory feedback regarding the id argument in HfApiModel, requiring it and then rejecting it.
    • The user argued that the HfApiModel class should default to the Qwen model, making the id argument optional, and requested more mental elasticity from the validation agent.
  • SFTTrainer Loss Elucidation: A user sought clarification on the loss function used by SFTTrainer, questioning whether it is inferred from the model type (e.g., cross-entropy for causal LM).
    • It was also confirmed that the agent works the same with or without explicit imports.
  • Documentation Discrepancies Frustrate Quiz Takers: A user expressed frustration with errors encountered in the second quiz, citing discrepancies between the quiz’s security settings and current documentation.
    • The user also noted confusion regarding the model implementation with HfApiModel versus LiteLLMModel, stating that the documentation doesn’t seem to indicate that HfApiModel has a model_id for LiteLLMModel.
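
On the SFTTrainer loss question above: for causal language models the default objective is token-level cross-entropy on next-token (shifted) labels. A minimal pure-Python sketch of that computation, with toy logits rather than anything from the trl implementation:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def causal_lm_loss(logits, labels):
    """Mean next-token cross-entropy.

    logits: one vocab-sized logit vector per position
    labels: one token id per position
    Position t's logits predict labels[t + 1] (the standard shift);
    the last position has no target, so its logits go unused.
    """
    total, count = 0.0, 0
    for t in range(len(labels) - 1):
        probs = softmax(logits[t])
        total += -math.log(probs[labels[t + 1]])
        count += 1
    return total / count

# Toy example: vocabulary of 3 tokens, sequence of 3 tokens.
logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
labels = [0, 0, 1]
loss = causal_lm_loss(logits, labels)
```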

HuggingFace ā–· #agents-course (129 messagesšŸ”„šŸ”„):

Chat templates, agent, and LLM interaction, NVIDIA AI Red Team Prompt Injection, CodeAgent's Python interpreter, Smolagents codeagents to set the system prompts, Agent Laboratory for research reports and code repositories

  • Agent’s Prompt Template gets Populated: A member was trying to verify their understanding of how chat templates, agents, and LLMs interact, noting that the prompts.yaml file defines the system_prompt and is populated with actual tools provided in the agent initialization.
    • Another member clarified that the CodeAgent actually has its own Python interpreter.
  • NVIDIA AI Red Team Tackles Prompt Injection: The NVIDIA AI Red Team identified vulnerabilities where prompt injection can be used to exploit three plug-ins included in the LangChain library.
    • Prompt injection is a new attack technique specific to large language models (LLMs) that enables attackers to manipulate the output of the LLM, especially when LLMs are equipped with plug-ins.
  • Debugging Nightmares with SmolAgents: A member reported running into an issue with the examples in Unit 2, stating that most of the sample code fails due to reaching the maximum number of steps.
    • Another member shared some concerns about deploying Smolagents to production, noting that ā€œbecause they don’t run async I have to run them in threadsā€.
  • Gemini is more Generous: A member stated that they were facing a Payment Required message.
    • Another member recommended switching to using Gemini with LiteLLM because ā€œGemini has generous free tier with Google AI Studioā€.
  • Agent Laboratory helps you ideate: Agent Laboratory takes as input a human-produced research idea and outputs a research report and code repository.
    • It enables you ā€œto focus on ideation and critical thinking while automating repetitive and time-intensive tasks like coding and documentationā€, according to their GitHub page.
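
The prompt-template population described in the first bullet amounts to plain template substitution: tool names and descriptions get rendered into placeholders in the system prompt at agent initialization. A sketch with an invented template format (not smolagents' actual prompts.yaml schema):

```python
# Hypothetical template format; the real prompts.yaml layout differs.
SYSTEM_PROMPT_TEMPLATE = """You are an agent that writes Python code to solve tasks.
You can call these tools:
{tool_descriptions}
"""

def render_system_prompt(tools):
    """Render one '- name: description' line per tool into the template."""
    lines = [f"- {name}: {desc}" for name, desc in tools.items()]
    return SYSTEM_PROMPT_TEMPLATE.format(tool_descriptions="\n".join(lines))

prompt = render_system_prompt({
    "web_search": "search the web for a query",
    "final_answer": "return the final answer",
})
```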

Links mentioned:


Perplexity AI ā–· #general (264 messagesšŸ”„šŸ”„):

Perplexity Pro Flair, New Voice Mode, Disable Web Search, Coding with Perplexity, Gemini Real Time Video Chat

  • Voice Mode Vigorously Vouched for: Members discussed the new voice mode feature, noting improvements in UI, the ability to interrupt, and changes to voices.
    • While some users found it impressive, others felt it didn’t quite match the level of Microsoft Copilot, Grok 3, or ChatGPT.
  • Writing Wonders Without Web Woes: Members discussed the ability to disable web search in Perplexity, with one user suggesting the use of writing focus to achieve this.
    • However, some users reported that even in writing mode, web sources were still being used, while others claimed it worked fine for them.
  • GPT-4.5 gossip grows galore: Users discussed the potential integration of GPT-4.5 into Perplexity, referencing a YouTube demo and noting it as a model with greater context and more human-like responses.
    • A user shared a link from Sam Altman on X mentioning that GPT-4.5 is the first model that feels like talking to a thoughtful person.
  • Model Mixing Mayhem in Spaces: Users discussed issues with Spaces, where the system prompt tells the model ā€œYou are Perplexity, a helpful search assistant trained by Perplexity AIā€ even when using other models.
  • Pro or Grok: A Grandiose Gabfest: Members debated the value of Perplexity Pro versus SuperGrok, with one user asking What is the difference between the $50 dollar premium + plan vs Supergrok via there app?
    • A user clarified that SuperGrok offers more advanced reasoning through a Big Brain mode not available in Premium+.

Links mentioned:


Perplexity AI ā–· #sharing (17 messagesšŸ”„):

Majorana-1 Quantum, AI Communication, Lab Mice First Aid, House Blueprint, Ransomware Leaks

  • Perplexity Users sharing many Perplexity Links: Several users shared an array of Perplexity AI search and page links, spanning topics from quantum computing to AI communication and lab mice giving first aid.
  • Nvidia Stocks discussed on Perplexity AI: Users shared links regarding the impact of Nvidia’s strong results on the market.
    • There were also open invitations to discuss a Z-a trading strategy.
  • Deep Dive into Deep Sea Discussions: A shared link points to discussions about the deep sea on Perplexity AI.
  • SchellingPoint gets Poisoned Well Label: A user mentioned $SchellingPointZEC and POISONED WELL, linking to an article about data centers and their health costs.

Link mentioned: YouTube: no description found


Perplexity AI ā–· #pplx-api (4 messages):

Perplexity Pro API credits, Obsidian Web Clipper configuration, sonar-deep-research model, Refunds for Perplexity API

  • Perplexity Pro Credits: How many APIs can I call?: A user inquired about the number of API calls and searches possible with the $5 API credit included with Perplexity Pro, and how to pay if they exceed the given credit.
  • Troubles configuring Perplexity API in Obsidian Web Clipper: A user is experiencing issues configuring the Perplexity API with the sonar-deep-research model in Obsidian Web Clipper despite setting the correct Base URL and API Key.
    • The user has provided screenshots of their configuration and the failure message, seeking assistance with troubleshooting.
  • Perplexity API refund process questioned: A user asked about how to get a refund if the API is recharged by mistake and remains unused.

Stability.ai (Stable Diffusion) ā–· #announcements (1 message):

Website Redesign Contest, Stable Diffusion 3.5, AI-generated artwork, US participants only

  • Stability AI Launches Website Redesign Contest: Stability AI is inviting the Stable Diffusion community to showcase their best work in a Website Redesign Contest with winning images featured on Stability AI’s official website.
    • The contest seeks images that feel fresh, impressive, and forward-thinking, created using Stable Diffusion 3.5 and conveying innovation, beauty, and the future of creativity.
  • Stable Diffusion 3.5 Base Required for Entries: To enter the Website Redesign Contest, artwork must be created using Stable Diffusion 3.5 as a base, but can incorporate custom nodes, fine-tunes, or LoRAs.
    • The guidelines explicitly prohibit IP-infringing content, robots or apocalyptic themes, and NSFW material.
  • US Participants Eligible for Stability AI Contest: The Website Redesign Contest is open to US participants only, with submissions needing to be in 16:9 aspect ratio.
    • Submissions close on Friday, March 7th, and selected artwork will gain recognition and community showcase on Stability AI’s platforms.

Stability.ai (Stable Diffusion) ā–· #general-chat (92 messagesšŸ”„šŸ”„):

ControlNet models for consistent characters, LLMs referencing real-time data, SDXL alternative with T5 CLIP, Inpaint Anything error, Selling ComfyUI workflows

  • Seeking ControlNet Character Consistency: A member asked for recommendations for the best ControlNet models to maintain character consistency in SDXL.
    • They specifically requested a reference U-Net model, if available.
  • Gemini Real-Time Data Access?: A member inquired about LLMs that can reference and update with real-time data, mentioning Gemini as a potential option.
    • Another member noted that most LLMs don’t update in real-time but suggested enabling web search for more relevant information.
  • T5 CLIP Craze: A member sought an SDXL-like model with T5 CLIP integration, saying they had a taste of T5 prompt adherence in SD3.5.
    • They found the T5 adherence addictive and were looking for an alternative.
  • ā€œInpaint Anythingā€ shape mismatch error arises!: A member reported a shape mismatch error in Inpaint Anything: value tensor of shape [159, 256] cannot be broadcast to indexing result of shape [64, 256].
    • The member was using Automatic1111 with the Inpaint Anything extension and asked how to resolve this error.
  • ComfyUI Remote Installs Sell: A member mentioned selling ComfyUI workflows and remote installs to make them work for users, typically using TeamViewer.
    • They clarified that they charge for their time and knowledge, rather than the workflow itself.
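
The Inpaint Anything error above follows NumPy/PyTorch broadcasting rules: comparing shapes right-to-left, each dimension pair must be equal or contain a 1, and [159, 256] vs [64, 256] fails because 159 ≠ 64 and neither is 1. A library-independent checker:

```python
def broadcastable(shape_a, shape_b):
    """True if two shapes are compatible under NumPy/PyTorch broadcasting:
    compare dimensions right-to-left; each pair must be equal or contain a 1.
    Missing leading dimensions are treated as 1, so zip over the shorter
    shape is sufficient."""
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True
```

In practice the fix is to regenerate whichever tensor carries the stale first dimension so the shapes agree.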

Eleuther ā–· #general (8 messagesšŸ”„):

Hugging Face Deprecation, Best RAG Tool, LLM Pretraining Guide

  • HF Deprecation Discoveries: A member inquired about marking a repo as deprecated on Hugging Face with a link to a newer version, but later realized that this feature only applies to models, not datasets.
  • RAG Tool Recommended for Personal Use: A member asked which RAG tool is now best for personal users?
    • Another recommended BM25.
  • All-in-One LLM Training Guide Needed: Someone asked whether there is a single self-contained guide covering pretraining and post-training, including SFT and RL, for LLMs.
  • LLM Prompt Relevance Triumphs RAG?: One member suggested that for small corpora, prompting an LLM to check for relevance is better than tweaking embeddings and rerankers.
    • They added it’s better to prompt than to tweak embeddings if you don’t mind some latency.
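
BM25, recommended above, is simple enough to sketch in pure Python (standard Okapi k1/b parameterization; a real setup would use a library such as rank_bm25):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                 # term frequency in this document
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [["cat", "sat", "mat"], ["dog", "sat", "log"], ["cat", "cat", "hat"]]
scores = bm25_scores(["cat"], docs)
```

Documents that never mention a query term score zero, and repeated mentions score higher with diminishing returns, which is the term-saturation behavior that makes BM25 a strong default for small corpora.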

Eleuther ā–· #research (36 messagesšŸ”„):

Data Mixing, DualPipe, DeepSeek, Gemini Flash Thinking, SWE-RL

  • Gradient Descent Mixes Data, Minimizes Compute: A new paper introduces MixMin, a gradient-based approach for optimizing data mixtures, which improves mixtures with less than 0.2% additional compute.
    • The method addresses the challenge of finding the optimal data mixture for machine learning pipelines by formalizing it as a convex bi-level objective.
  • DeepSeek Unveils DualPipe for Training: DeepSeek released DualPipe, a bidirectional pipeline parallelism algorithm designed to overlap computation and communication in V3/R1 training.
    • A user expressed hope that DeepSeek would release its entire pretraining framework, including core bits, on the final day.
  • Gemini’s Flash Thinking Sparks Debate: Members discussed Gemini 2.0 Flash Thinking, Google’s enhanced reasoning model that shows its thoughts to improve performance and explainability, particularly in math and science.
    • Some suspect the model was benchmarked internally but not published due to underperformance compared to O3 Mini.
  • Scaling LLM Reasoning for Software Eng with SWE-RL: A paper introduces SWE-RL, which scales RL-based LLM reasoning for real-world software engineering using a lightweight rule-based reward.
    • This approach enables LLMs to autonomously recover a developer’s reasoning processes from open-source software evolution data, training on top of Llama 3.
  • SSL Methods for ResNet Training: A user asked about cheap SSL methods to train a ResNet for decent linear probe performance on CIFAR10 quickly.
    • Another user suggested that tuning hyperparameters/architecture might be more efficient than changing the loss function, since nothing may be significantly more efficient than DINO.
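
A toy version of the gradient-based mixture search described in the MixMin bullet (not the paper's algorithm: here the downstream loss is replaced by a simple convex proxy over softmax-parameterized mixture weights, optimized with finite-difference gradient descent):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def proxy_loss(weights, target=(0.7, 0.2, 0.1)):
    """Stand-in for the downstream loss: squared distance to a 'best' mixture."""
    return sum((w - t) ** 2 for w, t in zip(weights, target))

def optimize_mixture(n_sources=3, steps=1000, lr=2.0, eps=1e-5):
    """Gradient-descend source logits so softmax(logits) minimizes the proxy."""
    logits = [0.0] * n_sources          # start from the uniform mixture
    for _ in range(steps):
        base = proxy_loss(softmax(logits))
        grads = []
        for i in range(n_sources):      # forward-difference gradient estimate
            bumped = list(logits)
            bumped[i] += eps
            grads.append((proxy_loss(softmax(bumped)) - base) / eps)
        logits = [x - lr * g for x, g in zip(logits, grads)]
    return softmax(logits)

weights = optimize_mixture()
```

Parameterizing the mixture through a softmax keeps the weights on the simplex for free, so the optimizer never has to project back onto valid mixtures.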

Links mentioned:


Eleuther ā–· #interpretability-general (22 messagesšŸ”„):

Jacobian Sparse Autoencoders, SmolLM2 Intermediate Checkpoints, Mechanistic Interpretability Resources, Saving Weights after Iteration, Open Problems in Mechanistic Interpretability

  • Jacobian Sparse Autoencoders Sparsify Computations: A new paper introduces Jacobian Sparse Autoencoders (JSAEs), a novel architecture designed to induce sparsity in both computations and representations within LLMs, aiming for a sparse computational graph that works on the full distribution of inputs. Read the full paper here.
  • SmolLM2 Models get 50+ checkpoints: 50+ intermediate checkpoints for ALL the SmolLM2 models were released, in the hopes of helping people learn about interpretability. Check out the announcement here.
  • Neel Nanda’s Comprehensive List of MI Resources: A user shared a collection of resources for learning mechanistic interpretability, primarily linking to content created by Neel Nanda, including a ā€œgetting startedā€ guide and a list of good papers to read when getting into the field.
    • Neel Nanda’s updated (2024) list of favorite papers was also shared and can be found here.
  • Weight Saving Solutions Sought Post-Iteration: A user inquired about research or tools for efficiently saving weights after each iteration during pretraining to observe fine-grain dynamics, also linking to an initial MVP on GitHub.
  • Mech Interp Groups Put Out Survey: A large survey paper representing many of the major mech interp groups was shared, titled open problems in mechanistic interpretability.

Links mentioned:


Eleuther ā–· #lm-thunderdome (17 messagesšŸ”„):

QA Task Evaluation, ARC-Easy, ARC-hard, Mosaic's Eval Framework, GPQA Diamond COT Zero-Shot Evaluation

  • Evaluating QA Tasks with Harness Sparks Debate: A member inquired about evaluating QA tasks like ARC-Easy and ARC-hard using a harness, questioning why the concatenation only includes Question + Option instead of Question + Options + Answer for each option.
  • ARC Evaluation Relies on Loglikelihoods: In response to a question about evaluation methods, a member clarified that ARC-Challenge and ARC-Easy follow the former approach (Question + Option), and that generate_until can be used instead of loglikelihoods, followed by exact-match scoring.
    • Another member confirmed that this approach aligns with the GPT-3 paper.
  • GPQA Diamond COT Zero-Shot Command Shared: A member asked for the command used to run evaluations, noting that someone else reported getting less than 10% accuracy.
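
The loglikelihood-based multiple-choice scoring discussed above can be sketched as: score each Question + Option continuation by its summed (optionally length-normalized) token log-probabilities and pick the argmax. Toy numbers below, not harness code:

```python
def pick_option(option_token_logprobs, length_normalize=True):
    """option_token_logprobs: per-option list of per-token logprobs for the
    option text conditioned on the question. Returns index of the best option."""
    scores = []
    for logprobs in option_token_logprobs:
        s = sum(logprobs)
        if length_normalize:
            s /= len(logprobs)          # avoids penalizing longer options
        scores.append(s)
    return max(range(len(scores)), key=scores.__getitem__)

# Option B has fewer tokens and a higher average logprob.
options = [
    [-0.9, -1.1, -0.8],   # option A
    [-0.2, -0.3],         # option B
    [-2.0, -1.5, -0.7],   # option C
]
best = pick_option(options)
```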

Link mentioned: lm-evaluation-harness/lm_eval/tasks/arc/arc_challenge_chat.yaml at main Ā· EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness


Yannick Kilcher ā–· #general (58 messagesšŸ”„šŸ”„):

Microsoft's survival aided by governments, Deterministic manners of AI models, AI in programming, Agentic systems struggle, Small team build a better browser than Chrome

  • Microsoft’s Dominance Debated: A member asserted that Microsoft has never been a true innovator, but has been sustained by government support.
    • Another member countered that while money and power are important, they don’t guarantee long-term success, pointing to Yahoo as an example of a company that lost its dominance despite having significant resources.
  • AI Models Generate Meaningful but Non-Deterministic Results: A member questioned how non-deterministic AI models can exhibit deterministic behavior and converge.
    • Another member responded that while the exact results may vary, AI models generate outputs with the same meaning, citing the example of regenerated code in Cursor with only changes in comments and variable names.
  • AI Excels in Static Programming Tasks: A member shared that AI models learn programming more easily than other tasks, being proficient at static things but struggling with dynamic tasks, which hurts agentic systems.
    • They pointed to the possibility of individuals threatening big companies since smaller teams can move faster and build better tools.
  • OpenAI Releases GPT-4.5 Research Preview: Members discussed the release of GPT-4.5, noting that it focuses more on user preference and helpfulness rather than groundbreaking advancements as described in the Introduction to GPT-4.5 YouTube video.
    • Some felt OpenAI was pressured to release something due to competition from Grok-3 and Claude 3.7, noting the increased pricing of $75 per million input tokens and $150 for output.
  • OpenAI’s MoE Architecture Confirmed: A member shared a more or less official confirmation that OpenAI’s base models are all MoE (Mixture of Experts) as linked in this YouTube video.
    • The member stated that while this wasn’t really news, as it was somewhat known already, this confirmation was not a rumor but pretty well founded.

Links mentioned:


Yannick Kilcher ā–· #paper-discussion (7 messages):

Hash Collisions, KV Similarity

  • Hash Collisions Intended: Instead of eliminating hash collisions, the implementation aims to induce collisions when qk_iᵀ is high.
    • The probability of hash collision, P(h(q) == h(k_i)), is leveraged, where h is a hash function.
  • KV Similarity via Hash Collisions: Hash collisions are used as a metric to remove similar key-value pairs, as described in arxiv.org/pdf/2502.03387.
    • The discussion referenced a pseudo truthmatteo.batelic file, though its exact purpose wasn’t specified.
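
The collision-probability idea can be illustrated with SimHash-style random-hyperplane hashing, where P(h(q) == h(k)) grows with the cosine similarity of q and k; this is an illustrative stand-in, not necessarily the hash family the paper uses:

```python
import math
import random

random.seed(0)

def simhash(vec, planes):
    """One bit per hyperplane: the sign of the dot product."""
    bits = 0
    for i, p in enumerate(planes):
        if sum(v * w for v, w in zip(vec, p)) >= 0:
            bits |= 1 << i
    return bits

def collision_rate(q, k, planes):
    """Fraction of matching signature bits; expected value per bit is
    1 - angle(q, k) / pi, so similar vectors collide more often."""
    hq, hk = simhash(q, planes), simhash(k, planes)
    matches = sum(1 for i in range(len(planes))
                  if (hq >> i) & 1 == (hk >> i) & 1)
    return matches / len(planes)

dim, n_bits = 8, 16
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
q = [random.gauss(0, 1) for _ in range(dim)]
near = [v + 0.1 * random.gauss(0, 1) for v in q]   # similar key
far = [-v for v in q]                              # opposite key
```

A slightly perturbed key collides on almost every bit, while the opposite vector flips every hyperplane sign and collides on none, which is exactly the property that lets collisions stand in for a KV-similarity test.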

Link mentioned: ClaudePlaysPokemon - Twitch: Claude Plays Pokemon - Debut Stream


Yannick Kilcher ā–· #ml-news (15 messagesšŸ”„):

Remarkable Alexa, GPT-4.5 Announcement, DeepSeek AI Open Infra Index

  • Amazon’s Alexa to have Monthly Subscription?: Rumors suggest that the new Alexa, codenamed Remarkable, might require a subscription fee ranging from $5 to $10 per month according to tomsguide.com.
    • The article highlights that it remains to be seen if consumers will pay for Alexa, given that Google, Samsung, and Apple offer their AI services for free.
  • DeepSeek AI Opens Infrastructure Index: DeepSeek AI has released an open-source infrastructure index which can be found here.
  • OpenAI Teases GPT-4.5 Launch: OpenAI teased the launch of GPT-4.5 with a livestream and later released an introductory YouTube video featuring Mia Glaese, Rapha Gontijo Lopes, Youlong Cheng, Jason Teplitz, and Alex Paino.
    • The announcement was met with mixed reactions, with some criticizing the presentation and the scenarios showcased, such as ā€œWrite an angry text because I am mad with the friend.ā€

Links mentioned:


Cohere ā–· #discussions (44 messagesšŸ”„):

Cohere models in OpenAI SDK, Auto Subtitles, Command R+ update, R7B Arabic vs Fanar and ALLaM

  • Cohere Models can now use OpenAI SDK: Members celebrated the ability to access Cohere models directly through the OpenAI SDK. A link to the Quickstart Guide was shared, featuring demos for Python, TS, & cURL, plus streaming, tool calls, and structured outputs.
  • Community Seeks Auto Subtitle Solutions: A user requested recommendations for AI APIs that generate auto subtitles similar to those on TikTok or YouTube Shorts.
    • Another user suggested using Google STT, noting that YouTube’s auto subtitles are likely powered by Google’s own tooling.
  • Command R+ Update Anticipation Builds: Community members discussed and expressed their eagerness for an upcoming Command R+ update, with one hoping it will surpass Mistral Large 2411.
    • Members highlighted that specific release details are unlikely to be shared due to NDAs, and advised against spreading unconfirmed information or rumors.
  • Arabic LLM Benchmarks: There was interest in benchmarking Cohere’s R7B Arabic model against Qatar’s Fanar model and Saudi’s ALLaM, with the suggestion to use the Arabic Balsam index.
    • A member also shared a link to the GPT-4.5 system card which provides a great overview of the latest benchmarking methodology.

Links mentioned:

  • Tweet from Sandra Kublik (@itsSandraKublik): You can now access Cohere models directly through the OpenAI SDK :) Check out our Quickstart Guide for Python, TS, & cURL demos, plus streaming, tool calls, structured outputs, and more. Happy buildin...

Cohere ā–· #announcements (1 message):

Command R7B Arabic Model, Multilingual AI Model, Arabic Language Optimization

  • Arabic Command R7B model goes live!: Cohere announces Command R7B Arabic, a variant of the R7B model optimized for Arabic performance while maintaining its performance in English.
  • R7B Arabic excels at enterprise tasks: The Command R7B Arabic model excels at tasks such as instruction following, length control, RAG, and responding in the correct language.
    • It has a context length of 128,000 tokens.
  • Blog post goes live on Arabic language model: A blog post introducing Command R7B Arabic is now live, detailing its optimization for Arabic language capabilities to support enterprises in the MENA region.

Links mentioned:


Cohere ā–· #cmd-r-bot (3 messages):

Differential Transformers, World Without Coffee Essays

  • Differential Transformer Concepts Requested: A user asked the bot, what is the main concept behind Differential Transformers.
    • No further discussion or details were provided about Differential Transformers.
  • Coffee Essay Prompt Triggered Bot: A user asked the bot to write an essay about a world without coffee.
    • Another user repeated this prompt, suggesting interest in the bot’s response to hypothetical scenarios.

Cohere ā–· #projects (9 messagesšŸ”„):

Free auto caption APIs, Adobe Premiere auto transcription

  • Members seek free auto caption APIs: One member inquired about free APIs for generating auto captions, wondering whether they needed to build one themselves.
    • Another member explained a linked tool does auto subtitles/captions for your video.
  • Adobe Premiere: Auto Transcription Revelation: A member suggested that Adobe Premiere has an auto transcription feature.
    • Other members agreed and confirmed its existence and availability.

LlamaIndex ā–· #blog (2 messages):

LlamaIndex CentralReach, LlamaExtract Public Beta

  • LlamaIndex Transforms Autism and IDD Care: LlamaIndex is helping CentralReach transform autism and IDD care with AI.
    • AI’s utility in medical fields lies in boiling down mountains of research and paperwork into relevant insights and key points, enhancing doctor efficiency.
  • LlamaExtract Enters Public Beta: LlamaIndex’s LlamaExtract is now in public beta, simplifying structured data extraction from unstructured documents.
    • It enables users to define and customize schemas for data extraction programmatically.

LlamaIndex ā–· #general (48 messagesšŸ”„):

Data Leak in LlamaParse 0.6.2, Reloading pgvector Index Table, AgentWorkflow Custom Exception Handling, Elasticsearch Metadata Schema, LlamaExtract Documentation Outdated

  • LlamaParse 0.6.2 Data Leak Debacle Unfolds!: A user reported a significant data leak in LlamaParse 0.6.2, observing images and analyses from other users mixed into their own results, including sensitive information like bank account details and transaction histories.
    • The issue, confirmed as a mix-up with test/benchmark data, has been fixed in the backend API, with the reporter providing a list of Job IDs for investigation.
  • pgvector Index Reloading: Index Deja Vu: A user inquired about how to reload a previously created pgvector index table from the database, aiming to avoid re-creation.
    • Another user suggested using index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model) to reload the index from the vector store.
  • AgentWorkflow’s Custom Exception Conundrum: A user asked if it’s possible to allow AgentWorkflow to throw a custom exception, attempting to break the workflow and handle the exception outside of the tool’s scope.
    • While not currently supported, a member suggested the team could add an option to FunctionTool to support this use case.
  • LlamaExtract’s Documentation: Lost in the Cloud: A user found that the create_agents method was missing in LlamaExtract 0.0.4, indicating outdated documentation.
    • It was confirmed that the project has moved to LlamaCloud Services, with the relevant code now located in the llama_cloud_services repo, and the documentation indeed being out of date.
  • Searxng Search Engine: A Fresh Face?: A user inquired about integrating Searxng, a free meta-search engine, into the framework.
    • A member responded that it was the first time they’ve heard of it but suggested using it with an agent by putting it in a FunctionTool.

Links mentioned:


DSPy ā–· #show-and-tell (1 message):

Prompt Engineering Studio, AI-powered assistant, Reusable templates, Version control, Team collaboration

  • Portkey AI Launches Prompt Engineering Studio: Portkey AI launched a Prompt Engineering Studio, an IDE for prompt engineers, that allows users to test across 1600+ models with side-by-side comparison and offers instant improvements from an AI-powered assistant.
    • The studio enables the creation of reusable templates with mustache and partials, version and deployment of prompts with proper labeling, and performance tracking with real-time analytics.
  • Portkey Workshop to Demo New Studio: Portkey AI will host a live workshop on Monday, March 3rd, at 10:30 AM PST to demo their Prompt Engineering Studio and host an AMA with their CEO Rohit, accessible via Portkey’s website.
    • The workshop will showcase how to test prompts, use the AI assistant, build reusable templates, implement version control, and collaborate with teams using shared prompt libraries.

Link mentioned: Demo: Prompt Engineering Studio Ā· Zoom Ā· Luma: Join us for an exclusive first look at Portkey’s Prompt Engineering Studio - the most comprehensive toolkit for building, testing, and deploying AI prompts at…


DSPy ā–· #general (37 messagesšŸ”„):

ReAct Agent Integration, DSPy Release Bug, MIPROv2 Optimizer Error, Refine API Feedback, Community Engagement

  • ReAct Agent juggles external tools: A user questioned how to integrate tools requiring external pings with dspy.ReAct for complex tasks like creating text and sending emails, especially concerning orchestration.
    • The challenge lies in ensuring the system understands the sequence of actions (text creation before email) when email function requires external function calls.
  • DSPy Release 2.6.7 bugs out, Imports vanish: Users reported a ModuleNotFoundError in dspy-ai==2.6.7, with a GitHub issue detailing the import failure.
    • Downgrading to version 2.6.6 resolved the issue; the faulty release was quickly yanked, and 2.6.8 was released to address the import problems caused by a migration from setup.py to pyproject.toml.
  • MIPROv2 optimizer hits context limits: A user encountered a ContextWindowExceededError with MIPROv2, even after ensuring conversations were under 1000 characters and using light mode.
    • It was suggested that the user reduce the number of demos in the optimizer or set view_data_batch_size=3 in the .compile() call to address the token limit issue; the latter setting reduces the data summary size.
  • Refine API evolves feedback loops: A user inquired about how to control advice/feedback passed to the LLM on subsequent retries with dspy.Refine, compared to older assertion methods.
    • Feedback will be returned in the reward_fn, and that dspy.Refine should now participate in the compilation feedback mechanism, allowing for optimization of previously unoptimizable suggestions.
  • Community yearns for signal from noise: Concerns were raised about getting quality feedback from a large Discord community to improve DSPy and avoid too many knobs.
    • The proposition of weekly open calls/meetings was floated, along with the idea of short posts or PRs offering feedback from production use, similar to examples in the Discord channels.

Links mentioned:


Torchtune ā–· #general (1 message):

yamashi: Gpt4.5 available on azure


Torchtune ā–· #dev (26 messagesšŸ”„):

CI troubles, Activation Offloading, Distributed Torch FL Code, DPO Integration Test

  • CI run requested for PR#2419: A member requested someone to start CI for PR#2419 without merging, as they are making a last attempt for today.
    • The PR in question regards truncation and skipping.
  • Activation Offloading and Checkpointing: A member inquired whether there is a reason why activation offloading can only be used in conjunction with activation checkpointing.
    • Another member explained that activations require waaaay more memory than just the checkpoints, which in their case is just the input vector to the transformer block, so offloading and loading them will throttle GPU and make it unbearably slow.
  • Handling Merged Model Loading in Distributed FL: A member sought advice on handling merged model loading in distributed Federated Learning (FL) code, particularly how to avoid downloading the merged model on all ranks.
    • They considered dumping the merged model to disk and having all ranks load from the disk, and were recommended to use shared memory instead.
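The recommended pattern can be sketched in plain Python, with barrier and build_fn as hypothetical stand-ins for the real primitives (e.g. torch.distributed.barrier and the actual download-and-merge step): rank 0 materializes the merged model once into a shared-memory path, and every rank loads from there.

```python
import os
import pickle
import tempfile

def load_merged_model(rank, barrier, build_fn, path=None):
    """Have rank 0 build the merged model once and share it via /dev/shm.

    `barrier` is any callable that blocks until all ranks reach it
    (torch.distributed.barrier in real FL code); `build_fn` stands in
    for the expensive download-and-merge step. Both are illustrative.
    """
    if path is None:
        shm = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
        path = os.path.join(shm, "merged_model.pkl")
    if rank == 0:
        with open(path, "wb") as f:
            pickle.dump(build_fn(), f)  # only rank 0 pays the merge cost
    barrier()  # ensure the file is fully written before anyone reads it
    with open(path, "rb") as f:
        return pickle.load(f)
```

In real distributed code each rank is a separate process on the same node, which is exactly the case where a shared-memory path avoids N redundant downloads.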
  • Bullying pre-commit: A member mentioned being bullied by pre-commit again while trying to implement Federated Learning. The relevant function in question resides here.
    • The member expressed relief after managing to go through it: pls dont 🄲
  • DPO Integration Test Status: A member asked about the status of the DPO integration test, wondering why there was a problem adding it.
    • Another member replied that there is currently one for the single device recipe, referencing this file, clarifying there shouldn’t be any issue adding for distributed recipe too.


Torchtune ā–· #papers (10 messagesšŸ”„):

DeepSeek DualPipe, Federated Learning at Scale

  • DualPipe for Computation-Communication Overlap Surfaces: A member shared a link to DeepSeek’s DualPipe GitHub repository, which presents a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
  • Federated Learning faces Communication Bottlenecks: A member expressed excitement about DualPipe, but noted its novelty and mentioned attempting to implement federated learning (FL) across 40 hospitals in Europe using a 70B model.
    • They humorously acknowledged that the communication overhead in their FL setup would likely dwarf the optimizations offered by DualPipe, but suggested it might be useful for gains between FL syncs.

Link mentioned: GitHub - deepseek-ai/DualPipe: a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.


Notebook LM ā–· #use-cases (2 messages):


Notebook LM ā–· #general (29 messagesšŸ”„):

Notebook emoji changes, Arraying instructions with keywords, Sharing Notebooks with groups, Audio overview error, Public link to notebook

  • Users request Emoji Options for Notebooks: Users requested the ability to change the emojis on their notebooks; the feature is currently unavailable, and it was suggested to support existing feature requests or create new ones. Users also weighed NotebookLM against strong alternatives such as OneNote, Obsidian, and Goodnotes.
    • One user pointed to a tweet lamenting NotebookLM’s lack of momentum and mobile apps, blaming Google’s pattern of stifling internal innovation.
  • Notebook Sharing Shenanigans: Users are encountering issues sharing notebooks with groups, finding that simply handing over the link is insufficient, as they need to add users specifically to grant access.
    • It seems that users may need to have an account before they can access a shared notebook, and both adding the user via email and providing the link might be necessary.
  • Audio Overview Agony: Users are frequently encountering an error saying ā€œThere was an error fetching your conversation. Please try again.ā€ when trying to load the audio overview.
    • The issue seems intermittent, working sometimes but failing frequently, causing frustration among users who rely on this feature.
  • User reports ā€˜Service Unavailable’ Error: A user reported receiving a ā€˜Service unavailable’ error when logging into NotebookLM, with a message indicating that ā€˜You tried to access a service that isn’t available for your account’, and linked to their Google Account services page.
    • A user suggested that the account may be defaulting to a school account instead of a personal one.

Links mentioned:

  • Service unavailable: no description found
  • Tweet from signüll (@signulll): notebooklm had insane potential, one of the best products google’s put out in years. but in classic google fashion, it seems like it lost all momentum & got left to die. no mobile apps, no meaningful ...

Modular (Mojo šŸ”„) ā–· #general (5 messages):

Repo Structure Simplification, Mojo Prioritization, Chris Lattner's Blog Post

  • Modular Simplifies MAX and Mojo Repo Structure: Modular aims to simplify their MAX and Mojo repo structure to ease contributions to documentation and the standard library, and to consolidate bug reports and feature requests, as detailed in this forum thread.
  • Doubts Emerge on Mojo’s Standalone Future: A member questioned whether the repo simplification indicates a shift away from prioritizing Mojo as its own standalone language.
  • Chris Lattner’s Blog Post Series: A member found Chris Lattner’s blog post series excellent and insightful, regretting not taking the GPU programming course.
    • The member mentioned being previously turned off by doing trivial things in tensorflow in introductory classes, noting more complex tasks seemed locked away behind a pile of data.

Link mentioned: Upcoming changes to our GitHub repositories: Tomorrow (February 27), we’re streamlining our GitHub repositories! The max repo is merging into the mojo repo, bringing everything under one roof. A new subdirectory will house the Mojo standard libr…


Modular (Mojo šŸ”„) ā–· #mojo (25 messagesšŸ”„):

MLIR in stdlib, HyperLogLog in Mojo, MLIR Dialects in Mojo, MAX Graph Compiler, Unions in Mojo

  • HyperLogLog lands in Mojo on GitHub!: A member implemented the HyperLogLog algorithm in Mojo and shared it on GitHub, seeking suggestions for improvement.
    • They expressed enjoyment in using Mojo, describing it as a more powerful Python.
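The linked repo is in Mojo, but the algorithm is compact enough to sketch in Python for readers unfamiliar with it: hash each item, route it to one of 2^p registers by its top p bits, and keep per register the maximum leading-zero rank seen. All names and constants below are illustrative, not taken from the repo.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal HyperLogLog: estimate distinct count with m = 2**p registers."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias correction for large m

    def add(self, item):
        x = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = x >> (64 - self.p)                       # top p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        est = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:  # small-range (linear counting) fix
            est = self.m * math.log(self.m / zeros)
        return est
```

With p=14 the standard error is roughly 1.04/√m ≈ 0.8%, using only 16K single-byte registers regardless of stream size.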
  • MAX uses undocumented MLIR: Members discussed the use of inline MLIR in the stdlib, which is largely undocumented and intended for internal use by Modular and stdlib contributors.
    • It’s implied that the in-house dialects mo, moq, mogg, mef, mgp, grt, rmo are not intended to be exposed to the general public.
  • Exploring Internal Mojo Dialects: A member explored Mojo’s internals using nm to discover and list details related to dialects, types, and ops within libmof.so.
    • This exploration revealed the union type, prompting discussion about its intended use and potential hazards due to poorly defined aliasing and type-punning rules.
  • MAX graph compiler uses mlir dialects: A member clarified that specific MLIR dialects (like mo) are primarily used by the MAX Graph Compiler and are not part of Mojo’s runtime.
    • These dialects are relevant for graph compilation only, with no current way to manually load them into Mojo’s MlirContext.
  • Stability concerns for mojo’s MLIR: Stability and documentation efforts are reasons why some MLIR dialects aren’t publicly available, as they include aspects critical to Modular’s competitive advantage, and completely documenting them could dilute their value.
    • A member noted once Modular is more established, they can afford to open things up since it will be easier to use their system than to replicate it.

Link mentioned: GitHub - axiomhq/mojo-hyperloglog


MCP (Glama) ā–· #general (18 messagesšŸ”„):

MCP in production, Claude Code diff based editing, Official everything server SSE, Glama AI GitHub App, Claude Code Invite

  • MCP finds users in production: Members confirmed that MCP can be used in production-level workflows.
    • One user noted utilizing it despite issues with line numbers changing, which they mitigate through prompting and resource inclusion.
  • Claude Code uses Diff-Based Editing, struggles with Go: Users report that Claude Code uses diff-based editing, which fails when editing Go code.
    • One user explained that the failures stem from the spaces added to Go code for readability, which invalidate the exact-match diffs.
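The failure mode is easy to reproduce with a toy exact-match editor (a hypothetical sketch, not Claude Code's actual patcher): once a formatter such as gofmt inserts spaces, the snippet captured earlier no longer matches the file.

```python
def apply_edit(source, old, new):
    """Toy diff-based edit: replace an exact snippet, failing if it is absent.

    Illustrates the reported Go problem: if a formatter rewrites
    whitespace after the snippet was captured, the exact match is gone
    and the edit is rejected.
    """
    if old not in source:
        raise ValueError("edit target not found; was the file reformatted?")
    return source.replace(old, new, 1)
```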
  • Official Everything Server has SSE: The official everything server has SSE (Server-Sent Events) functionality, which is suitable for tests.
    • A user found that SSE is perfect for testing purposes.
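For anyone wiring tests against the SSE transport, the wire format is plain text/event-stream: field: value lines, with a blank line terminating each event. A minimal Python parser sketch, simplified relative to the full spec (it ignores id: and retry: fields):

```python
def parse_sse(stream):
    """Parse a text/event-stream payload into (event, data) pairs.

    Basic SSE rules: events end at a blank line, multiple `data:` lines
    are joined with newlines, lines starting with ':' are comments
    (often used as keep-alives), and the default event type is "message".
    """
    events = []
    event_type, data_lines = "message", []
    for line in stream.splitlines():
        if not line:  # blank line dispatches the accumulated event
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip())
    return events
```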
  • GitHub App helps scale Glama AI: The creator of Glama AI requested users to install a GitHub app to support the project and increase API rate limits.
    • One user encountered a could_not_parse_params error during installation, but the creator clarified that the installation registration is sufficient and no data collection occurs.
  • MCP Server has remote resource issue: A user struggled ("for the life of me") to get their MCP server to work with resources, including the subscribe_resource decorator.
    • It was discovered that users have to manually add resources to context, such as adding a file from the filesystem, before the client can use the resource/read method.

Links mentioned:

  • Open-Source MCP servers: Enterprise-grade security, privacy, with features like agents, MCP, prompt templates, and more.
  • Build software better, together: GitHub is where people build software. More than 150 million people use GitHub to discover, fork, and contribute to over 420 million projects.

MCP (Glama) ā–· #showcase (5 messages):

Redmine MCP Server, Ableton Voice Control, tinylm library for running LLMs

  • MCP Redmine Lands with Great API Coverage: A new MCP Redmine server has been released, boasting coverage of nearly the entire Redmine json API in under 50 lines of code.
    • The server reportedly builds on an OpenAPI specification from GitHub user d-yoshi.
  • Ableton Voice Control Dreams Surface: A member expressed enthusiasm for the MCP Redmine and imagined controlling Ableton via voice commands, suggesting a workflow like ā€˜Ok now lets record a new track using input7 with a bit of reverb added and routed to output 3+4.’
    • Another member noted that while direct loading of devices isn’t possible with Ableton remote control scripts, a Whisper routine paired with a custom Ableton MCP client could achieve this.
  • tinylm Powers Client-Side LLMs in Browser: Version 0 of tinylm was released, a library for running LLMs and embedding models client-side in the browser or Node.js with WebGPU acceleration, supporting an OpenAI-compatible API.
    • tinylm touts zero-cost inference, complete privacy, and features like text generation, text embeddings, and real-time token streaming.


Nomic.ai (GPT4All) ā–· #general (18 messagesšŸ”„):

Live Mode, Voice Assistant, GGUF models, Alltalk TTS

  • Request for LIVE mode feature: A member requested a LIVE mode feature similar to Google Gemini, suggesting it would surpass Google’s tools.
    • They proposed using voice recognition (STT) for input and TTS for output, linking a YouTube video demonstrating a GPT4ALL Voice Assistant built in Python that utilizes OpenAI Whisper for offline voice detection.
  • Comprehending Chat Templates for GGUF Models: A member inquired about the usage of chat_template with GGUF models, questioning if the template is read from the .gguf file on initial load and stored in model3.json.
    • They sought confirmation that changes made in the GUI are saved in model3.json, as observed with gpt4all and Hugging Face models.
  • Oobabooga implements Alltalk TTS: A member mentioned that Oobabooga implements a text-to-speech extension called alltalk_tts that functions with GGUF, AWQ, and GPTQ models.
    • They noted the installation is somewhat tricky, involving a Python installation with a BAT install, but requires no coding.
  • Internet speed impacts installation time: A member lamented their slow internet speed of 40 kbps, which would make the Oobabooga installation take approximately two days.
    • The other member had estimated the install takes about one hour.


tinygrad (George Hotz) ā–· #general (12 messagesšŸ”„):

GROUP operations AST changes, BEAM search strategies for OptOps, arange GROUP optimization failure, LLVM speed regression

  • GROUP AST changes hit performance blocker: Changes to the AST for GROUP operations have reached parity with PyTorch when summing (2048,2048) tensors but struggle with (4096,4096) tensors due to the need for multiple successive OptOps.
    • The author asks whether they should attempt to adjust BEAM search to find these OptOps or modify the lowerer/expander to output something different that will do multiple accumulators.
  • BEAM Search Stalls Out, Frustrates Progress: The author is facing challenges getting BEAM search to find the optimal sequence of OptOps needed for efficient summation of larger tensors (4096,4096).
    • They are considering modifying the lowerer or expander to generate alternative ASTs that could better utilize multiple accumulators and horizontal add swizzles but express uncertainty about guaranteeing performance improvements.
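The multiple-accumulator shape being searched for can be illustrated in plain Python (a conceptual sketch, not tinygrad's actual lowerer output): split the reduction across several independent partial sums, then combine them in a final horizontal step.

```python
def grouped_sum(xs, groups=4):
    """Reduce with several independent accumulators (the GROUP idea).

    Each accumulator sums a strided slice of the input; keeping the
    partials independent is what exposes parallelism on real hardware
    (vector lanes or threadgroups), at the cost of one extra combine.
    """
    accs = [0.0] * groups
    for i, x in enumerate(xs):
        accs[i % groups] += x  # independent partial sums
    return sum(accs)           # final horizontal reduction of the partials
```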
  • arange GROUP Optimization, Breaks CI: The author reports that the arange GROUP optimization is not being applied, leading to an extra inner loop in arange operations and broken CI.
    • They rebased onto master, after which tests passed and results matched PyTorch; they asked for any advice on the arange GROUP optimization.
  • Speed Test BEAM=2 Times Out: A member noticed that ā€œSpeed Test BEAM=2ā€ is timing out on GitHub Actions.
    • The author fixed this issue by trimming some of the added OptOps and also reported that adding GROUP and GROUPTOP slowed the BEAM search due to greatly increased number of kernels tried.
  • Tests still failing on Pull Request: A member said that the tests are still failing on the pull request, and that the code is also much slower on LLVM with no speed gain.
    • The author clarified that they were not yet asking for a review, but wanted to know whether the arange tests failing on GROUP OptOps was a known issue.


tinygrad (George Hotz) ā–· #learn-tinygrad (1 messages):


  • User Embarks on Code Expedition: A user expressed gratitude and said they would explore the codebase independently to answer their questions.

LLM Agents (Berkeley MOOC) ā–· #mooc-questions (2 messages):

Research Plans Announcement, Discord Server Recruitment

  • Research Plans Announced, Discord Recruits!: A member shared a Discord invite link (https://discord.gg/5MbT7ce9) for a more detailed announcement about their research plans and collaborative opportunities.
    • They encouraged interested parties to DM them for more information or to join the Discord server directly.

LLM Agents (Berkeley MOOC) ā–· #mooc-lecture-discussion (1 messages):

Research Track, Predictive Decision Making, Long Term Memory in Agents

  • Research Track Launches Subgroups for Focused Study: A research track is forming, focusing on predictive decision making and long-term memory in agents.
    • The group will hold regular sync meetings to discuss lectures and foster collaboration; interested members can join via this Discord invite.
  • Predictive Decision Making Subgroup Kicks Off: A new subgroup will concentrate on predictive decision-making strategies within AI agents.
    • This subgroup aims to explore methods for enhancing agents’ abilities to anticipate future outcomes and make informed choices.

MLOps @Chipro ā–· #general-ml (1 messages):

tinylm, WebGPU, OpenAI SDK, client-side LLMs

  • tinylm v0 released: tinylm, a library for running LLMs and embedding models client-side in the browser or Node.js with WebGPU acceleration, has been released.
    • It supports OpenAI-SDK-style text generation and embeddings, with text-to-speech and speech-to-text coming soon, and requires no servers.
  • tinylm features OpenAI-compatible API: tinylm provides an OpenAI-compatible API for running language models directly in your browser or Node.js application using WebGPU acceleration.
    • Features include zero-cost inference, client-side processing, text generation, text embeddings, cross-platform compatibility, true streaming, and detailed progress tracking.

Link mentioned: tinylm - Run Models Locally with WebGPU: no description found



{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}