AIE is all you need.

AI News for 5/6/2025-5/7/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (214 channels, and 4624 messages) for you. Estimated reading time saved (at 200wpm): 485 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

It’s a quiet day, but as we did almost exactly a year ago, we’ll spend an issue talking about the new speakers announced for the second large-scale AI Engineer World’s Fair next month:

TL;DR: for the second year running, we’re offering a one-time discount to AI News readers: CLICK HERE and enter AINEWS before EOD Friday :)

The first World’s Fair was a big experiment - is AI Engineering big enough to warrant its own large multitrack conference? We were fortunate to be the first to preview capabilities that now every single AI Engineer can build with and take for granted. Between then and now, the NYC Summit blew out all expectations, with 4 Madison Square Gardens’ worth of people tuning in on the livestream and a viral MCP workshop.

The 2025 AI Engineer World’s Fair (Jun 3-5 in SF)

AIEWF 2025 is once again 2x as big as last year, with expo booths, talks, and workshops across 18 tracks. You can browse the llms.txt or llms-full.txt to get up to speed on the evolving meta of AI Engineering:

  • Expanded tracks:
    • The “RAG” track that every AI conference has is really a bundle of deeper problems: Retrieval + Search (now that LLMs are increasingly being bundled with web search), GraphRAG (Neo4J returns to expand upon one of the most popular talks from last year), and RecSys (inspired by the coverage of Eugene Yan, who is now hosting this track)
    • The “Agents” track no longer makes sense when 2025 is the year of agents. Since everything is becoming agentic and nobody agrees on definitions, we are simply double clicking on the three most urgent agent hotspots: SWE-Agents, Agent Reliability, and Reasoning + RL (again taking signal from a top NYC talk).
    • “Multimodality” is also now broken up into special focuses on Voice AI (realtime voice APIs) and Generative Media (image/video generation)
    • Infrastructure, Security and Evals finally get their own track (with Braintrust, one of the top Latent Space episodes of the year)
  • Maturing Leadership: Per the Bret Taylor pod, the role of “AI Leadership” is now increasingly owned by AI Architects, and we’ve also moved the AI in the Fortune 500 case studies and war stories in to double the options for those who build AI in the enterprise.
  • Brand New Directions:
    • MCP: the single most oversubscribed track.
    • Tiny Teams: building companies with more millions in ARR than employees.
    • Product Management: a track for the PMs who work closely with AI Engineers.
    • Design Engineering: similar, except AI Designers are increasingly also engineers.
    • Robotics and Autonomy: covering the latest in embodied LLMs, with new foundation model info from Waymo, Tesla, Google and more
  • Of course, the most important track is the unlisted one: the hallway track, which last year was incredible, but of course you simply had to be there.

To celebrate the launch, we’re offering a onetime discount to AI News readers: CLICK HERE and enter AINEWS before EOD Friday to lock in the Early Bird tickets before prices go up this weekend.

If the curation here/on Latent Space has the most cosine similarity with your interests, this conference was made for you. See you in SF June 3-5!


AI Twitter Recap

Gemini 2.5 Pro Model Improvements and Performance

  • Release and Capabilities: @demishassabis announced the release of Gemini 2.5 Pro Preview ‘I/O edition’, touting its coding capabilities and its #1 ranking on LMArena in Coding and on the WebDev Arena Leaderboard. The model is particularly adept at building interactive web apps. @GoogleDeepMind highlighted that it can transform images of nature into code. It is available in @GeminiApp, Vertex AI, and AI Studio. @GoogleDeepMind noted improvements extend to code transformation, editing, and developing complex agents.
  • Community Response and Iteration: @demishassabis mentioned the positive response to the Gemini 2.5 series and encouraged continued feedback.
  • WebDev Arena Leaderboard: @GoogleDeepMind stated that Gemini 2.5 Pro leads on the WebDev Arena Leaderboard and ranks #1 on @LMArena_ai in Coding. @lmarena_ai confirmed Gemini 2.5 Pro is the new #1 in WebDev Arena, surpassing Claude for the first time.
  • Coding Prowess: @Yuchenj_UW claimed Gemini-2.5-Pro-preview-05-06 is their top coding model, outperforming o3 and Claude 3.7 Sonnet on challenging prompts. They also suggested Google should call it Gemini 3.
  • Benchmarking: @scaling01 reported Gemini 2.5 Pro Livebench results show improvements across the board, except for a minor regression in mathematics, with significant improvement in data analysis.
  • Application in Cline: @cline mentioned that Gemini 2.5 Pro got a significant upgrade, especially for front-end web dev and function calling, noting that using 03-25 in Cline automatically points to the 05-06 version.
  • @alexalbert__ mentions adding a web search tool to the Anthropic API, giving Claude direct access to real-time web content.
  • @iScienceLuvr describes a key trick for Gemini 2.5 Pro: adding some inference-time tokens to turn it into an agent

AI Models and Frameworks

  • FastVLM: @awnihannun announced a new release from Apple ML research including code and models for FastVLM, an MLX implementation, and an on-device (iPhone) demo app.
  • Parakeet ASR: @awnihannun mentioned that Nvidia’s state-of-the-art Parakeet ASR model has an MLX implementation. The 0.6B model is at the top of the Hugging Face ASR leaderboard and runs super fast locally with MLX.
  • Meta Perception Models: @AIatMeta introduced Meta Perception Language Model (PLM), an open & reproducible vision-language model for challenging visual tasks, and @AIatMeta introduced Meta Perception Encoder, a vision encoder setting new standards in image & video tasks.
  • BayesFlow 2.0: @fchollet announced BayesFlow 2.0, a Python package for amortized Bayesian inference, now powered by Keras 3, with support for JAX, PyTorch, and TF (see the backend sketch after this list).
  • LLaMA-Omni2, Ming-Lite-Uni, SuperEdit, Voila: @_akhaliq shared releases of LLaMA-Omni2, Ming-Lite-Uni, SuperEdit, and Voila on Hugging Face, detailing their functionalities and applications.
  • @reach_vb shares that you can learn VLMs from the inside out in < 1000 lines of pure PyTorch code!
  • @cloneofsimo notes that SandAI_HQ’s Magi attention has an insanely beautiful abstraction over all sorts of attention variants.
  • SynCity: @LiorOnAI highlighted SynCity, a new research project and codebase for generating entire 3D worlds from a single text prompt without training.
  • Supabase: @LiorOnAI introduced Supabase as the ChatGPT of databases, allowing users to build and launch databases, create charts, and generate sample data, noting that it is 100% open source.
  • Browser use: @LiorOnAI announced that users can now connect any AI agents to the internet in a few lines of code using Browser use, which is fully open-source (see the agent sketch after this list).
  • Dolphin-Math Datagen: @cognitivecompai presented Dolphin-Math Datagen, a tool for creating math problems for model training, inspired by @FernandoNetoAi, encouraging contributions to expand its open-source capabilities.
  • @clattner_llvm shares that Modular’s 25.3 release is now open and free to use on both CPUs and NVIDIA GPUs, and aims to be the most open GenAI platform out there
  • @iScienceLuvr introduces openly licensed datasets for Rewriting Pre-Training Data to Boost LLM Performance in Math and Code, which include SwallowCode and SwallowMath.
  • @AymericRoucher reports Computer Use in smolagents, powered by Qwen-VL models with built-in grounding, i.e. the ability to locate any element in an image by its coordinates and thus click any item on a screenshot.
  • @vllm_project notes how vLLM is used in the rollout process by offloading the engine to CPU and giving the GPU back to the kernel being benchmarked!
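
Since BayesFlow 2.0 builds on Keras 3, the backend is chosen with the standard KERAS_BACKEND environment variable before import; a minimal sketch, assuming the bayesflow package name:

```python
import os

# Keras 3 reads the backend from this variable at import time,
# so set it before importing keras or bayesflow.
os.environ["KERAS_BACKEND"] = "jax"  # or "torch" / "tensorflow"

import bayesflow as bf  # BayesFlow 2.0 sits on top of Keras 3

# From here, BayesFlow workflows (simulators, approximators, etc.)
# run on the selected backend without further changes.
```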
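
And for the Browser Use item, a minimal sketch of the advertised “few lines of code”, assuming the browser-use package’s Agent class and a LangChain chat model per its README (the task and model name are illustrative):

```python
import asyncio

from browser_use import Agent            # pip install browser-use
from langchain_openai import ChatOpenAI

async def main():
    # The agent drives a real browser to complete the natural-language task.
    agent = Agent(
        task="Find the three most recent releases on github.com/browser-use/browser-use",
        llm=ChatOpenAI(model="gpt-4o"),   # any supported chat model
    )
    await agent.run()

asyncio.run(main())
```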

Tools and Platforms

  • Cursor: @cursor_ai announced that Cursor is now free for students. @cognitivecompai expressed a preference for Cursor with local model support.
  • Cline: @cline introduced Plan & Act modes in Cline, emphasizing the importance of understanding before coding. @cline also highlighted the /newrule command to capture project standards.
  • Weights & Biases: @weights_biases announced faster logging, instant dashboards, and performance built for scale.
  • LangSmith: @LangChainAI announced that LangSmith now supports images, PDFs, and audio files, making it easier to build and evaluate multimodal applications.
  • @_philschmid says we will enter a new era of vibe coding: the new Gemini 2.5 Pro can now zero-shot full single-page applications and complete responsive mobile games, and convert UI screenshots precisely to working code (a minimal API sketch follows this list).
  • @jerryjliu0 highlights an AI agent that can not only perform highly accurate extraction from the most complex PDFs/PowerPoints/etc., but also give precise citations and reasoning back to the source element.
  • @jxnlco shares that RAG is the number 1 use-case of LLMs in enterprises.
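
A minimal sketch of the zero-shot “vibe coding” flow @_philschmid describes, using the google-genai Python SDK; the prompt is illustrative and the model id matches the preview name used elsewhere in this issue:

```python
from google import genai  # pip install google-genai

client = genai.Client(api_key="YOUR_API_KEY")

# One-shot request: ask the model for a complete, self-contained SPA.
response = client.models.generate_content(
    model="gemini-2.5-pro-preview-05-06",
    contents="Build a single-page todo app as one self-contained index.html "
             "(inline CSS/JS, responsive layout, localStorage persistence).",
)

# Note: the raw reply may wrap the code in markdown fences.
with open("index.html", "w") as f:
    f.write(response.text)
```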

AI Education, Learning Resources, and Community

  • Building AI Voice Agents for Production Course: @AndrewYNg introduced a new short course on building conversational AI voice agents, created in collaboration with @LiveKitAgent and @realavatarai. @DeepLearningAI also promoted this course, highlighting its focus on real-time, low-latency, human-like voice agents.
  • AI Benchmarking Hub: @EpochAIResearch announced the addition of four new benchmarks to the Epoch AI Benchmarking Hub, including Aider Polyglot, WeirdML, Balrog, and Factorio Learning Environment.
  • Hugging Face Wild Card Applications: @ClementDelangue announced they are reviewing wild card applications for those excited to join the @huggingface team.
  • LLM Course: @ben_burtenshaw notes that new video content has been added to the @huggingface LLM course!
  • @jerryjliu0 shares the most comprehensive guide out there on how to build Deep Research, step-by-step - for both beginner and advanced users.

Broader AI Industry Trends and Discussions

  • Stargate AI Training Facility: @sama shared progress on the first Stargate in Abilene, in partnership with Oracle, which will be the biggest AI training facility in the world.
  • AI Adoption Rate: @lateinteraction questioned OpenAI’s claim that AI adoption is outpacing the internet’s early growth.
  • @karpathy notes that a major mistake he made in his undergrad was focusing way too much on the mathematical lens of computing (computability, decidability, asymptotic complexity, etc.) and too little on the physical lens (energy/heat of state change, data locality, parallelism, computer architecture). The former is interesting; the latter bestows power.
  • @aidan_clark says that no LLM researcher should spend their whole life on one side of the pre-/post-training divide.

AI Reddit Recap

/r/LocalLlama Recap

1. New SOTA AI Models, Benchmarks, and Training Innovations

  • New SOTA Apache Fine tunable Music Model! (Score: 303, Comments: 86): The ACE-Step model (Github, HuggingFace) is a newly released open-source, Apache-licensed generative model for music that is fine-tunable and features extremely fast inference: benchmarks report 3 minutes of music generated in 34s on an RTX 4070, and sub-3s for shorter clips on a 4090. Model weights, demo, and code are available, and the architecture is designed for local use with potential ComfyUI integration. Community feedback confirms impressive generation speed and prompt-following precision, but observers note audio quality (especially timbre and style variety) is still behind proprietary SOTA models like Suno and Udio. Experts emphasize the model’s leap in local, high-speed music generation and seek integrations (e.g., ComfyUI), while recognizing that audio quality does not yet match closed-source leaders.
    • Users are reporting exceptionally fast music generation speeds: one cited 3 seconds for output on an Nvidia 4090 and another stated 34 seconds for 3 minutes of music on a 4070, while even a 3060 provides “absolutely incredible” performance. This speed surpasses most commercial solutions, at least under default settings.
    • Several users highlight mixed audio quality—the generation model excels in creative output and prompt adherence, but its sound is described as “still quite a bit lower than Suno or Udio” in terms of both instrumentation and vocals. The audio often resembles an overly compressed mp3; suggestions include tweaking settings to potentially enhance quality.
    • There is technical interest in integrating the model into ComfyUI, with users offering to build an implementation if one does not exist, reflecting demand for greater tooling and support within ecosystem workflows.
  • Self-improving AI unlocked? (Score: 136, Comments: 41): The Absolute Zero Reasoner (AZR) introduces a paradigm where a language model self-generates tasks to optimize its own learning, using a code-execution environment for automatic reward verification, thereby eliminating any need for external, human-curated data. According to the paper, AZR achieves state-of-the-art results on coding and mathematical reasoning benchmarks—even outperforming prior zero-shot approaches dependent on large supervised datasets—and demonstrates scalability across various model sizes and classes. This approach extends reinforcement learning with verifiable rewards (RLVR) to a fully self-sustaining, data-free regime, leveraging only the model’s own curriculum evolution and code-based validation as the ground truth signal (a minimal sketch of such a reward check follows this list). Top comments highlight the shift towards models that autonomously construct and adapt their learning distributions as a potential breakthrough that could “free reasoning models from the constraints of human-curated data,” referencing a possible new era of self-improving AI. One commenter equates this to an “AlphaZero moment” for large language models applied to coding and mathematics, reflecting consensus that this marks a major advance in autonomous reasoning.
    • The paper describes a shift towards reasoning models (notably AZR) that can define and evolve their own learning task distributions in interaction with an environment, reducing reliance on curated human data. This is compared to freeing models from human-curated constraints and is positioned as a parallel to a new phase in reasoning models, per citations (Morris, 2025; Silver & Sutton, 2025; Zhao et al., 2024).
    • Commenters draw an analogy to AlphaZero in the context of LLMs for coding and math, emphasizing that AlphaZero learned purely from the rules of the game via self-play, whereas the ‘Absolute Zero Reasoner’ model builds on pre-existing abilities (e.g., reading, writing, coding). There is debate about whether ‘zero’ should truly indicate no prior human knowledge or whether the model’s performance derives partly from pre-trained capabilities.
    • A technical excerpt from the paper demonstrates AZR (Absolute Zero Reasoner-Llama3.1-8b at step 132) handling deliberately obfuscated Python tasks designed to challenge both humans and machine learning models, showing the system’s practical ability to tackle adversarial reasoning or code comprehension challenges.
  • I’ve trained a LTXV 13b LoRA. It’s INSANE (Score: 487, Comments: 43): A user fine-tuned an LTXV 13B video diffusion model using a LoRA adapter, trained on 22 video samples for 2,000 steps via the official Lightricks LTX-Video-Trainer on an H100 GPU (using Runpod) in roughly 1 hour. The resultant LoRA, shared on CivitAI, is compatible with ComfyUI after conversion (using the _comfy suffix) and includes workflow/YAML instructions; this demonstrates immediate community ability to extend state-of-the-art video diffusion models with minimal compute and default hyperparameters. Commentary highlights both the rapidity of LoRA finetuning by the community (reacting within a day of model release) and concerns about hardware accessibility (noting challenges for users with GPUs like the 3090/4090). Requests for the user’s workflow and tips signal growing interest in reproducibility and best practices for LoRA video diffusion finetuning.
    • Discussion highlights the rapid pace of community adaptation, noting that a LoRA version of the LTXV 13B model was trained and released within just one day of the base model’s availability. This underscores both the accessibility of LoRA finetuning and the enthusiasm among users leveraging consumer hardware such as the RTX 3090, despite resource constraints.
  • SamsungCam UltraReal - Flux Lora (Score: 256, Comments: 29): The post introduces “SamsungCam UltraReal - Flux Lora,” a custom LoRA designed to replicate Samsung-style photographic realism, particularly for Flux-based generative AI models. The LoRA emphasizes enhanced fine detail (e.g., pores, hair strands), reduces ‘plastic doll’ skin artifacts typical in many generative models, and partially replicates Samsung’s vibrant color science. While optimized for the creator’s UltraReal Fine-Tune (https://civitai.com/models/978314/ultrareal-fine-tune), it is compatible with base Flux.dev but may display image glitches (notably with hands) at high (2MP) generation resolutions. Commenters note a marked improvement in perceived realism, highlighting that this LoRA delivers on its promise of more lifelike outputs compared to standard models. A light-hearted remark about activating Windows appears but is not technically substantive.
    • A user inquires about the optimal workflow for character fine-tuning: whether using the Ultrareal checkpoint with Kohya’s Dreambooth would yield good results or if it’s preferable to train a LoRA on the base Flux.dev model and apply it to Ultrareal. The question signals an interest in fine-tune stacking, model compatibility, and the potential effect of upstream checkpoints on personalization quality.
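
The AZR loop described above hinges on rewards that need no human labels: proposed code is simply executed and its output checked. A minimal sketch of that verifiable-reward idea in Python (illustrative only; the paper’s actual harness sandboxes execution):

```python
import subprocess

def code_execution_reward(program: str, stdin_text: str, expected_stdout: str) -> float:
    """Run a model-proposed Python program and score it against the expected output.

    Returns 1.0 on an exact match, 0.0 on mismatch, crash, or timeout:
    the binary "verifiable reward" signal that RLVR-style training consumes.
    """
    try:
        result = subprocess.run(
            ["python", "-c", program],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=5,  # guard against non-terminating proposals
        )
    except subprocess.TimeoutExpired:
        return 0.0
    if result.returncode != 0:
        return 0.0
    return 1.0 if result.stdout.strip() == expected_stdout.strip() else 0.0

# Example: a self-proposed task with a machine-checkable answer.
print(code_execution_reward("print(sum(range(10)))", "", "45"))  # -> 1.0
```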

2. Open Source AI Video and Image Generation Tools & Compression

  • Generated this entire video 99% with open source & free tools. (Score: 624, Comments: 69): The OP details a near 100% open-source/free video creation pipeline combining tools like ComfyUI with custom workflows (using Flux Turbo, Redux, Gemini 1.2 Flash for consistent characters), Wan2.2 and Skyreels for image-to-video synthesis, AudioX for generating SFX, Suno 4.5 for music, and Zono for TTS, with only Enhancor (skin realism) paid. CapCut was used as the final editing platform, and ControlNets were integrated for output fidelity, collectively showcasing a robust, modular AI content stack. More details can be found in the original Reddit post. Top technical feedback notes that the process is about 89% free due to the single paid component (Enhancor), and otherwise highlights appreciation for the workflow and queries regarding the video’s actual thematic content rather than workflow-specific critiques.
    • A commenter highlights the video’s strong character consistency, asking whether it was achieved by training a LoRA (Low-Rank Adaptation) model on a real set of photos, or if additional methods or tools were involved. This references technical details related to personalized image/video generation and training approaches.
  • Run FLUX.1 losslessly on a GPU with 20GB VRAM (Score: 217, Comments: 66): The post announces losslessly compressed versions of the 12B FLUX.1-dev and FLUX.1-schnell models using DFloat11 (paper), which applies entropy coding to BFloat16 weights for ~30% size reduction (from 24GB to ~16.3GB) with no output changes. This enables these large models to run inference on a single 20GB+ VRAM GPU, incurring only a brief per-image overhead (a toy illustration of why entropy coding buys this much follows this list). Downloads and usage examples are provided via HuggingFace and LeanModels GitHub. The primary technical query in comments is whether DFloat11 could be applied to other large models such as SDXL, primarily for storage space reduction, and there’s interest in upcoming support for alternative UIs/runtimes like Forge and Comfy.
    • A commenter questions the practical utility of DF11 compression compared to existing methods, specifically citing INT8_SYM weights-only compression as already being nearly lossless, reporting it yields “a total of 0 to 10 different pixels on the output image” with about 30% more compression than DF11. This raises a challenge to the claimed advantages of the DF11 approach for model size reduction and fidelity.
    • Technical concerns are raised about the transparency and reproducibility of the compression method: a request is made for the release of both compression code and the source for the decode.ptx file, with skepticism expressed regarding its provenance (noting that it was generated by NVIDIA’s NVVM Compiler rather than hand-written). This underscores the necessity of releasing source artifacts for community validation and extension.
    • There are inquiries into broader applicability, specifically whether the technique can be used with SDXL checkpoints (for storage savings) and with video models, suggesting technical interest in generalizing the method to various model classes beyond its original scope.
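
A toy illustration of why DFloat11’s lossless ~30% is plausible: trained BF16 weights occupy a narrow dynamic range, so the 8 exponent bits carry far fewer than 8 bits of information, which an entropy coder can exploit. This sketch only measures that entropy; it is not the DFloat11 codec:

```python
import numpy as np
import torch

def bf16_exponent_entropy_bits(weights: torch.Tensor) -> float:
    """Shannon entropy (in bits) of the 8-bit exponent field of BF16 weights."""
    # Reinterpret the BF16 bit pattern as unsigned 16-bit integers.
    raw = weights.to(torch.bfloat16).view(torch.int16).numpy().view(np.uint16)
    exponents = ((raw >> 7) & 0xFF).astype(np.int64)  # BF16: 1 sign | 8 exp | 7 mantissa
    counts = np.bincount(exponents, minlength=256).astype(np.float64)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

# Stand-in for a trained weight matrix: small, roughly Gaussian values.
w = torch.randn(1_000_000) * 0.02
print(bf16_exponent_entropy_bits(w))  # typically a few bits instead of 8,
# which is roughly where a ~30% whole-tensor lossless saving can come from.
```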

3. Major Industry Shifts: Google, Reddit, and OpenAI AI Impact

  • A year ago Google bought the rights to use Reddit content for their AI training, now their model is supreme (Score: 373, Comments: 119): The post references Google’s 2023 licensing deal obtaining the rights to use Reddit data for AI training (see: WSJ coverage), suggesting a link between access to Reddit content and Google’s model advancement. Top commenters emphasize that, despite Google’s formal agreement, most AI labs (OpenAI, Meta, etc.) had already scraped Reddit data without permission, meaning the legal access likely made little technical difference in model capability or training. Another technical observation links the internet’s meme/benchmark culture to model evaluation, implying overlap between Reddit’s contributor base and those designing AI benchmarks. Some debate the practical impact of Google’s deal, arguing the access was not unique nor critical since the AI community widely scraped Reddit previously, thus the deal was largely symbolic or legal rather than yielding a technical edge.
    • Several commenters point out that while Google paid for Reddit content, other AI labs have historically scraped Reddit data without formal agreements, suggesting there is likely no direct technical advantage gained from the purchase itself. The implication is that the rights agreement is more about legal or ethical alignment than unique training data access since most major language models have already incorporated Reddit data.
    • Glxblt76 emphasizes that Reddit offers inherent content curation through its upvote system, which could benefit AI training by ensuring higher quality or more representative user-generated data enters the training set. This curation could indirectly influence model performance, as curated datasets are often correlated with improved downstream results.
    • Express-Set-1543 makes the point that the overlap between Reddit contributors and benchmark creators may introduce distributional alignment between training and evaluation data, potentially resulting in higher scores on commonly-used benchmarks—especially those devised by the same technical communities active on Reddit.
  • AI ironically destroying Google. Stock dropped 10% today on declining Safari browser searches. (Score: 336, Comments: 153): The post discusses the technical implications of AI-powered chat (LLMs) on Google’s ad-driven business, particularly after a 10% stock drop linked to declining Safari browser search volume. The poster argues that AI assistants may decrease both Google’s core search revenue and third-party display ad revenue by reducing web visits, while also noting the user-unfriendliness of inserting ads directly into LLMs (e.g., Gemini). Financial data cited includes Google’s heavy reliance on advertising. No major new benchmarks, model releases, or implementation details provided. Commenters engage in a technical debate: Several highlight that search ads (especially for commercial intent queries) remain difficult to fully replace with LLMs, and stress that Google’s diversified ad ecosystem (YouTube, Android, display networks) buffers revenue impact. Others note that ad monetization within chatbots lacks a proven model, which partly explains OpenAI’s subscription approach. The competitive technical positioning of Google’s AI is also cited as a strategic advantage.
    • Discussion centers on the complexity of search ad revenue displacement: chatbots like ChatGPT or Gemini have not significantly eroded Google’s core search ad business because chatbot interaction models are not conducive to effective ad placements—unlike traditional web, app, or video search contexts where user intent to transact or click through is clearer and highly monetizable.
    • Several users stress that technical advances in Google’s AI (such as ‘Deep Research’ on Gemini Advanced) reflect a strategic shift towards more sophisticated, subscription-based AI products, but current reliance on traditional search for high-value queries means semantic search/chatbots are not immediately replacing key revenue streams.
    • A significant technical and financial risk noted is the market’s reaction to antitrust proceedings regarding Google’s search dominance, particularly their deal with Apple to be the default search provider in Safari; regulatory changes threaten a revenue stream worth ~$20 billion/year, and are seen as more impactful to share price than AI-induced search disruption at present.
  • OpenAI Takes 80% of U.S. Business AI Subscription Spend (Score: 160, Comments: 59): The chart visualizes U.S. business spending on AI subscriptions, with OpenAI commanding 32.4% adoption (and 80% of total card spend) based on Ramp.com commercial card data, showing a rapid overall AI adoption increase projected to reach 40.1% by mid-2025. Competitors like Anthropic (8%), xAI, Google, and DeepSeek collectively hold a much smaller share, each under 1% except Anthropic. The data is specific to businesses using Ramp, not the entire market, which could skew representation. Commenters highlight that the data is limited to Ramp.com’s platform and may not reflect the broader business landscape. There’s technical curiosity about Google’s declining share coinciding with a major model (2.5) release.
    • A commenter points out that the reported 80% market share for OpenAI only represents data from companies using the ramp.com service, cautioning that this sample may not reflect the broader U.S. business landscape or aggregate spending patterns outside Ramp’s user base.
    • There is technical scrutiny around Google’s reported business AI market share, with skepticism that “Google Workspace for Business”—which incorporates significant AI functionality—is showing extremely low adoption (0.1%) in the data. Commenters suggest this might be due to either incomplete accounting or methodological issues in how AI-related spend is captured in the source statistics.
    • A commenter observes a notable decline in Google’s share correlating with the release of version 2.5, hinting at potential issues or shifts in business adoption, but requests more data to clarify if this is a reporting artifact or reflects real-world usage trends.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Gemini 2.5 Pro Model Updates and Benchmarks

  • Gemini 2.5 Pro Update: Even better coding performance [Google Blog] (Score: 188, Comments: 34): Google’s latest Gemini 2.5 Pro Preview (05-06) advances coding performance, especially in front-end/UI work, as evidenced by its #1 rank on the WebDev Arena leaderboard. Key improvements include more robust code transformation/editing, enhanced function calling with lower error and higher trigger rates, and better agentic workflow support; video understanding is also state-of-the-art (84.8% VideoMME benchmark). Model version aliases (e.g., 03-25 now points to 05-06) mean no user-side action is needed for upgrades, ensuring seamless API deployment. Commenters raise concerns about the practice of updating model versions in-place—potentially undermining reproducibility and versioning best practices—while also noting that improvements are largely confined to coding, with scientific and mathematical capabilities trailing competitors (e.g., GPT-4o in some areas).
    • The update flow for Gemini 2.5 Pro is criticized due to the aliasing of the latest model version; users mention that the ‘03-25’ endpoint now automatically references ‘05-06’, raising technical concerns about proper versioning and reproducibility (e.g., model outputs could change over time under the same version label).
    • While Gemini 2.5 Pro shows coding-specific improvements, users note insufficient progress in other domains, particularly in science and math, as even Google’s own benchmarks reportedly show older models outperforming it in these areas. There’s mention of Logan (possibly a Google staff member) confirming it’s a coding-targeted update.
    • Despite recognition of strong coding benchmarks, critical comments highlight persistent failure cases: Gemini 2.5 Pro is flagged for generating code that calls non-existent functions or returns incorrect results for basic coding tasks (like filtering lines containing a string), revealing ongoing challenges with code generation reliability.
  • The Updated Gemini 2.5 Pro now ranks #1 on the WebDev Arena leaderboard (Score: 201, Comments: 39): The image presents the WebDev Arena leaderboard, where the newly updated Gemini 2.5 Pro model has reached the #1 position with a top arena score of 1420. The leaderboard visually compares various models’ coding performance in the WebDev Arena benchmark, and a highlighted metric specifies that Gemini 2.5 Pro gained +147 Elo over its prior version, signaling a notable leap in coding and web development task capabilities. This update positions Gemini 2.5 Pro ahead of other leading models, underlining rapid progress in LLM-based coding assistants. Commenters are impressed by the magnitude of the leap in coding performance (+147 Elo), with some noting that improvements are also evident in creative writing capabilities—less repetition and better prompt understanding with the updated model.
    • Multiple comments highlight a significant performance leap for Gemini 2.5 Pro, as it has taken the #1 spot on the WebDev Arena leaderboard. This suggests marked improvements over previous versions and potentially over competing models, especially in tasks related to web development benchmarks.
    • One user notes practical improvements in creative writing tasks, describing the model as less repetitive, more natural, and having a better understanding of prompts. This points to notable advancements in natural language generation and context awareness in the updated version.
    • There is anticipation for further evaluation through third-party benchmarks such as ‘simple-bench,’ indicating that while leaderboard results are promising, the community values comprehensive, independent testing for verifying claimed improvements.
  • New Version of Gemini 2.5 Pro: gemini-2.5-pro-preview-05-06 (Score: 333, Comments: 68): The image is an official-looking banner announcing the release of ‘gemini-2.5-pro-preview-05-06’, described as ‘our most advanced reasoning model’ by Google/DeepMind. The title and minimalistic design emphasize this as a technological upgrade, likely referencing improvements in complex reasoning and problem-solving capabilities for the Gemini 2.5 Pro language model. The versioning suggests iterative enhancements over previous internal/prototype releases, and the reference to ‘Matts_fixes_APPROVED’ in a top comment may imply special attention to recent bug fixes or architectural tweaks. Technical commenters express anticipation and curiosity about the upgrade, inquiring about hands-on experience and suggesting adoption of this specific variant due to approved fixes, hinting at internal debate over model reliability or performance between Gemini release candidates.
    • There is discussion on the model versioning: specifically, the referenced release is “gemini-2.5-pro-preview-05-06-FOR_RELEASE_use_this_one!!!_Matts_fixes_APPROVED(2)(2)”, indicating some internal or patch-related updates and approval noted as ‘Matts fixes’. This level of naming suggests a staged or internal QA process before broader deployment.
    • Technical deployment is mentioned: initially, users noted the model was only available on Vertex AI, with one user later confirming its availability in AI Studio as well, highlighting staggered rollout across Google’s AI platforms and possible delays in wider model accessibility.
    • A meta point is raised regarding Google’s labelling practices: a user questions if Google ever moves models out of ‘experimental’ or ‘preview’ phases, hinting at a pattern of prolonged preview status for their AI model releases, which may impact production adoption timelines.
  • Today’s gemini 2.5 pro update (Score: 300, Comments: 20): Google’s Gemini 2.5 Pro update demonstrates precise code generation capabilities by implementing the classic Barnsley fern fractal using standard IFS parameters from the original 1988 paper, as reported in the official Google blog post (source). Technical commenters observe that the algorithm chosen is well-known and simple, noting that successful, unguided code generation illustrates the model’s competence in recognizing and correctly applying canonical solutions from computer graphics history. One notable debate concerns the significance of using classic algorithms as benchmarks; some argue it highlights expected SOTA LLM performance, while others note Gemini 2.5 Pro’s overall quality surpasses many GPT variants in recent usage.
    • One notable technical discussion point concerns the example Google used to showcase Gemini 2.5 Pro—a well-known fractal generation algorithm. Multiple users highlight that this algorithm is both “extremely old and relatively easy to implement,” suggesting that such tasks should now be trivial for any state-of-the-art (SOTA) large language model (LLM).
    • Some users directly compare Gemini 2.5 Pro’s capabilities to those of other leading LLMs, noting that its recent updates have led to notable performance improvements and even suggesting it “performs better than most GPT models” in practical use cases.
    • A user raises a key technical question about Gemini 2.5 Pro’s reasoning ability, reflecting ongoing community interest in whether its architecture and training have materially advanced beyond earlier models’ reasoning limitations.

2. OpenAI Acquisition of Windsurf Coverage

  • OpenAI Reaches Agreement to Buy Startup Windsurf for $3 Billion (Score: 187, Comments: 50): OpenAI has reportedly agreed to acquire the AI coding agent startup Windsurf for $3 billion. Windsurf is known for its rapid-release open-source coding agents that integrate with multiple AI models, while the current ecosystem is characterized by a separation between open-source coding agents (e.g., Aider, Kilo Code, Cline) and a broad variety of models (with frequent releases and increasing availability of local/cheaper models). Concerns are raised that acquisition by OpenAI may bias Windsurf’s future product toward favoring OpenAI models over alternatives such as Gemini or Claude, potentially reducing ecosystem diversity and openness. Technically-focused concerns highlight risks of vertical integration reducing user choice and innovation speed, particularly if formerly model-agnostic agents become locked into a single provider. The $3 billion valuation is also questioned as potentially excessive, given current market dynamics.
    • There is concern about vertical integration and potential platform lock-in if Windsurf, previously an open-source AI coding agent with wide model support, starts favoring OpenAI models post-acquisition. This could disrupt the current ecosystem where open-source coding agents (like Cline, Roo, Aider, Kilo Code) rapidly add features and support many models, fostering innovation and fair competition across the landscape.
    • One commenter suggests the acquisition cost is justified by the value of Windsurf’s user telemetry data, which OpenAI could use to enhance its coding models. This indicates OpenAI’s current position in AI coding tools may be weaker than perceived, and the purchase is strategic for strengthening its dataset and proprietary model training.
    • There is discussion about market strategies, with the point that “just slapping the OpenAI name” onto Windsurf could drive adoption regardless of actual technical superiority. Some note that OpenAI’s dominance among coders (many of whom use only ChatGPT) gives it overwhelming network and distribution advantages, even if alternatives like Cursor exist and might be technically better.
  • OpenAI Reaches Agreement to Buy Startup Windsurf for $3 Billion (Score: 513, Comments: 94): OpenAI has agreed to acquire AI startup Windsurf for ~$3 billion, aiming to integrate Windsurf’s technology and talent to accelerate its product development and expand core AI capabilities. This acquisition may target strengthening OpenAI’s position in the subscription-based AI tooling space and is expected to enhance both infrastructure and model innovation. Bloomberg coverage provides further detail. Technical commenters debate whether Windsurf’s technology justifies the $3B price tag, suggesting it could be rebuilt for less, and question whether integration into OpenAI’s paid tiers will be a significant differentiator—particularly with competitors like Cursor mentioned.
    • A key technical discussion centers on the risk of vertical integration: acquiring Windsurf could cause its platform to prioritize OpenAI’s own models (like GPT-4/5), reducing support or access for competing models (e.g., Google Gemini, Anthropic Claude). This could negatively impact developer choice and ecosystem diversity, which currently flourishes due to agent/model decoupling and ongoing open-source innovation.
    • One commenter highlights that many AI coding agents (e.g., Cline, Roo, Aider, Kilo Code) are released weekly, most are open source, and they integrate with multiple models. There’s concern the acquisition could hinder rapid feature development or compatibility with non-OpenAI models, since corporate ownership may prioritize proprietary integration over inclusivity.
    • There are technical objections to the acquisition’s cost, with suggestions that Windsurf’s core features could be replicated for significantly less than $3 billion, raising questions about the efficiency and technological differentiation justifying such a large outlay.
  • OpenAI Reaches Agreement to Buy Startup Windsurf for $3 Billion (Score: 632, Comments: 120): OpenAI has reportedly reached an agreement to acquire startup Windsurf for $3 billion. Details about Windsurf’s technology or product offering are not provided in the post, but the high valuation suggests Windsurf possesses unique intellectual property, technology infrastructure, or talent that OpenAI deems difficult or time-consuming to replicate. A related comment also notes that Cursor, presumably another tech startup, is now valued at $9 billion after raising $900 million, underscoring current high valuations in the AI sector. Commenters question why OpenAI would acquire Windsurf rather than develop the relevant technology in-house, implying that Windsurf may possess a significant technical or organizational advantage not easily reproduced by OpenAI.
    • Several users discuss the high valuation of Cursor ($9B) and Windsurf’s $3B acquisition, with one pointing out Cursor’s recent $900M funding round, signaling intense investment and interest in AI-powered developer tools and IDEs.
    • A user speculates that OpenAI’s acquisition of Windsurf may be tied to their strategies around future software engineer agents, indicating Windsurf’s technology could play a key role in powering autonomous or semi-autonomous coding systems beyond what OpenAI can develop internally.
    • Another comment suggests that this acquisition marks a significant escalation in competition among AI-powered IDEs and developer platforms, implying a forthcoming “IDE wars” driven by massive investments and M&A in this space.

3. Latest AI Image and Video Generation Model Launches

  • LTXV 13B Released - The best of both worlds, high quality - blazing fast (Score: 1026, Comments: 180): Lightricks has released LTXV 13B, an open-source, 13B-parameter video generation model featuring multiscale rendering: it initially produces low-res frames and iteratively refines them, resulting in high rendering efficiency and improved physical realism. The model claims to be ~30x faster than comparable models, supports advanced controls (keyframing, camera/scene/character motion, multi-shot sequencing), and provides both standard and quantized (FP8) versions optimized for local GPU use. Full commercial rights are granted (with some large enterprise exceptions), and the ecosystem includes an easy-to-use LoRA finetuning trainer (GitHub), ComfyUI workflows, and Diffusers pipelines; the model and FP8 variants are available on Hugging Face. Commenters highlight the size of the download (~26GB) but appreciate availability of an FP8 quantized version, and anticipate comparing it to other recent video models like Wan FLF and SkyR I2V. Quality/speed tradeoffs of quantized models are noted in repo documentation.
    • There are concerns regarding the 8-bit floating point (FP8) workflow with LTXV 13B: users report noticeably lower detail after upscaling and consistent exposure shifts (images become brighter with less contrast), which could limit usefulness for high-fidelity or color-critical applications.
    • One user inquires about hardware compatibility, specifically whether a system with 4GB VRAM and 32GB RAM can run the model, implying potential challenges with resource constraints for LTXV 13B, given its large model size (26GB for standard versions).
  • Insert Anything – Seamlessly insert any object into your images with a powerful AI editing tool (Score: 152, Comments: 32): “Insert Anything” is an AI-powered image editing framework allowing seamless insertion of any reference object into a target image. The tool claims preservation of photorealistic detail (color, texture) and supports applications such as virtual try-on, advertising, and meme creation. The code and workflows are provided via Hugging Face Space and GitHub, with ComfyUI workflow integration. Commenters note the tool reportedly requires ~26GB VRAM, implying significant hardware requirements and reduced accessibility for users with mid-range GPUs (e.g., RTX 3060). Functionality is described as working well by at least one user.
    • Users are discussing the significant VRAM requirement (26GB) for running the tool locally, expressing concern over whether cards like the RTX 3090 (24GB VRAM) or RTX 3060 (12GB VRAM) can handle the workload, indicating large model sizes or resource-intensive operations.
    • A user inquires about the underlying model or architecture, questioning if the tool is based on Flux, SDXL, or another framework, pointing to a desire for more implementation-level details about the image editing approach.
  • ZenCtrl Update - Source code release and Subject-driven generation consistency increase (Score: 127, Comments: 30): The image is a collage demonstrating ZenCtrl’s latest improvements in subject consistency across different perspectives and scenes. This update addresses prior model weaknesses where subject identity would break during angle or scene changes, following additional training and model refinement. The release features open-sourcing of ZenCtrl, now available on GitHub, alongside links to a Hugging Face demo and Discord, emphasizing its open, modular approach for controllable AI image generation. Commenters inquire about ZenCtrl’s architecture, specifically if it is analogous to ControlNet for SDXL/Flux or incorporates its own generative backbone, and its potential integration with ComfyUI. Technical discussion centers on implementation specifics and workflow compatibility, indicating strong interest in modular integration and usability in existing pipelines.
    • A user inquires whether ZenCtrl operates analogously to a ControlNet for SDXL/Flux or if the repository also includes a standalone image model. This question seeks to clarify if ZenCtrl augments existing diffusion pipelines with subject conditioning, or if it provides a full generative backbone model on its own.
    • Another commenter asks about usability within ComfyUI, suggesting interest in integration details and compatibility for composable diffusion workflows. They are seeking technical documentation or community confirmation regarding how ZenCtrl might be incorporated as a node or module within ComfyUI pipelines.
    • A question is raised about the change in project license from Apache, which touches on the implications for open-source use, redistribution, and commercial adaptation. This is crucial for downstream developers who might integrate or extend ZenCtrl.
  • ComfyUI API Nodes and New Branding (Score: 133, Comments: 67): ComfyUI has announced native API node integration for a range of state-of-the-art (SOTA) third-party models, including Bfl FLUX, Kling, Luma, Minimax, PixVerse, Recraft, Stability AI, Google Veo, Ideogram, and Pika. Access to these APIs is opt-in and requires prepaid billing, charging only the underlying API costs and some transaction fees, while the core ComfyUI remains free and open source. More technical details and implementation context are provided in their official blog post. Technical users express reservations about the direction toward SaaS/API dependence but recognize the need for project sustainability; some emphasize appreciation for continued open-source access while noting philosophical concerns about external service integration.
    • Some users express concern that ComfyUI’s direction towards API nodes and new branding could eventually lead to closed-source APIs, which may impact transparency and open community contributions. There’s an underlying debate around the sustainability of open-source projects versus the need for monetization through methods like SaaS or restricted APIs.
    • A direct link to a ComfyUI blog post (https://blog.comfy.org/p/comfyui-native-api-nodes) is provided for technical readers seeking deeper information about the newly introduced native API nodes, which may indicate significant architectural or extensibility changes in the ComfyUI ecosystem.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Flash Preview

Theme 1. New Models Hit the Scene, Bringing Power and Problems

  • Qwen 3 Outshines Gemini in Code: Users found Qwen 3 models superior to the new Gemini 2.5 Pro update for coding, providing functional code that respects instructions where Gemini completely ignores and overengineers requests. One user stated that Qwen 3 is better in every way for coding tasks.
  • GPT-4o Gets Dumb, Users Report: Members wonder if GPT-4o is losing its touch, reporting random, out-of-topic responses and general decline, though some claim their model is working glorious. One user directly asked what is happening to gpt? why its acting weird? it reply randomly out of topic.
  • Mistral Medium 3 Arrives, Sparks Debate: Mistral Medium 3 was released (Mistral’s official announcement), sparking mixed reactions from “useless” to potentially good for creative writing, while others noted DeepSeek v3 is cheaper and performs better on benchmarks. The release offers competitive pricing ($0.4 input / $2.0 output per M token).

Theme 2. Pushing Silicon to the Limit: Hardware, Speed, and Optimization

  • Cerebras and Groq Duel for Model Hosting: OpenRouter announced Cerebras as a new provider (link to X post) with massive 4-trillion-transistor chips and 40 GB of on-chip memory, sparking debate over Cerebras versus Groq for hosting large models like DeepSeek v3.1/r1. Users noted Cerebras offers high TPS on models like Llama 4 Scout and Llama 3.3 70B Instruct.
  • NVIDIA GPU Hierarchy Debated: The A6000 Ada often beats the L40s in stock performance despite less tensorcore power, possibly due to the L40s having ECC memory on by default, while the A40 reigns supreme in tokens/$ for renting in the 48 GB category. The 4090 is the fastest Ada card but faces more overheating issues compared to L40s servers.
  • Quantization Unleashed: TorchAO & ExecuTorch Power Efficiency: PyTorch released quantized Phi-4 Mini Instruct models via TorchAO on Hugging Face, optimized for vLLM (INT4/FP8) and ExecuTorch (INT8/INT4), enabling significant memory reduction and speedups on GPUs and mobile devices. These models reached 17.3 tokens/sec on an iPhone 15 Pro using 3,206 MB of memory (a minimal TorchAO sketch follows this list).
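
A minimal sketch of the TorchAO weight-only quantization flow behind those releases, assuming the torchao int4_weight_only API and the microsoft/Phi-4-mini-instruct checkpoint id (the group size is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torchao.quantization import quantize_, int4_weight_only

model_id = "microsoft/Phi-4-mini-instruct"  # assumed checkpoint id
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Swap every linear layer's weights to packed INT4 in place;
# activations stay bf16, limiting quality loss while weight memory drops ~4x.
quantize_(model, int4_weight_only(group_size=128))

inputs = tokenizer("Explain KV caching in one sentence.", return_tensors="pt").to("cuda")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```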

Theme 3. Dev Tools and Ecosystems Evolve with AI

  • Cursor Student Discount Hits Snags: Users report issues getting the free Cursor student discount, facing verification problems and incorrect billing, particularly with international educational emails like .etu in France. Suggestions included contacting support at [email protected].
  • OpenRouter Expands with Cerebras, Adds Data Export: OpenRouter added Cerebras as a provider (link to X post) and will soon launch an export function on their activity page (link to X post) for users to extract their usage data. A user requested timezone adjustments for activity stats.
  • MCP Servers Spark Security and Usability Concerns: Concerns were raised about the security of MCP servers, trusting code from random people, and ensuring credentials are not leaked, while also debating Cursor’s difficulties in utilizing MCP resources as intended by the MCP specification. Debugging MCP servers, especially over stdio, proved challenging, with users suggesting tee-ing the stream (a minimal wrapper sketch follows this list) or this debugging guide.
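
For the stdio-debugging pain above, the tee trick boils down to wrapping the server process and copying every JSON-RPC line to a log file. A minimal sketch (the server command is a placeholder):

```python
import subprocess
import sys
import threading

def tee(src, dst, log):
    """Copy lines from src to dst while appending each one to the log."""
    for line in src:
        log.write(line)
        log.flush()
        dst.write(line)
        dst.flush()

log = open("mcp-traffic.log", "a")
proc = subprocess.Popen(
    ["my-mcp-server"],  # placeholder: the real MCP server command
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
# Client -> server traffic arrives on our stdin; server -> client leaves via stdout.
threading.Thread(target=tee, args=(sys.stdin, proc.stdin, log), daemon=True).start()
tee(proc.stdout, sys.stdout, log)
```

Point the MCP client at this wrapper instead of the server binary and the full stdio exchange lands in mcp-traffic.log.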

Theme 4. Training, Tuning, and Trusting the Data

  • Absolute Zero Paper Fuels Self-Play Debate: Discussion of the paper Absolute Zero: Reinforced Self-play Reasoning with Zero Data centered on whether models can generate truly diverse problems or if they are still limited by how diverse the problems that the model can come up with are, tied to their pretraining data. Members speculated the model primarily solves problems reflected in its pretraining data.
  • Benchmarks under Scrutiny for Data Contamination: The reliability of LLM benchmarks is questioned due to the potential for models to be trained on the benchmark data itself, leading the community to advise taking results with a “grain of salt”. Despite concerns, members acknowledge the need for metrics to evaluate models.
  • Dolphin Seeks ‘Naughty’ Data for Training: Eric at Cognitive Computations is looking for help to democratize, make anonymous, and organically source naughty interactions to train the Dolphin model using a tool called dolphin-logger. Users can install dolphin-logger, add your keys to .env, and run it, then point your agentic/MCP tools at it.

Theme 5. AI Agents Gain Skills, Face Hurdles

  • LlamaIndex Powers Deep Research Agents: LlamaIndex released a workshop tutorial guiding users to build a Deep Research agent from scratch using AgentWorkflow, capable of performing more in-depth analysis by creating a single multi-agent system. They also updated LlamaExtract to enhance citation and reasoning (LlamaIndex’s tweet).
  • Aider Learns to Search, Uses Perplexity API: Users are exploring giving Aider web-searching capabilities, suggesting pointing it at Perplexity’s OpenAI-compatible API endpoint with a Perplexity API key (see the sketch after this list), or manually adding webpage content via the /web command as a markdown file. This allows for integrating web search into coding tasks.
  • Agent Hackathons Offer Cash and Credits: The LLM Agents MOOC announced hackathons sponsored by Auth0 (up to $5,000 cash) and Lambda (up to $1,000 credits) for building and scaling AI agents. Register for the Lambda AgentX Workshop at lu.ma/AgentX-lambda.
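
Perplexity’s API speaks the OpenAI chat-completions protocol, which is why it slots into OpenAI-compatible tools like Aider. A minimal sketch with the stock openai client (the sonar-pro model name is an assumption):

```python
from openai import OpenAI

# Perplexity exposes an OpenAI-compatible endpoint at its own base URL.
client = OpenAI(api_key="pplx-...", base_url="https://api.perplexity.ai")

resp = client.chat.completions.create(
    model="sonar-pro",  # assumed Perplexity model name
    messages=[{"role": "user", "content": "What changed in aider's latest release?"}],
)
print(resp.choices[0].message.content)
```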

Discord: High level Discord summaries

LMArena Discord

  • GPT-4o Struggles Where O1 Excels: Members noted that GPT-4o can’t handle electrical and philosophy tasks as well as O1, which suggests that there are differences in the model’s knowledge base.
    • One user pointed out that O1 can think and recall about electrical stuff whereas 4o has to immediately predict a chat reply.
  • Grok 3.5 Rumors Stir ASI Speculation: The AI community speculates that Grok 3.5 is a tuned version of Grok 4 and might be approaching Artificial Superintelligence (ASI).
    • Some users reported brief access to Grok 3.5 in the UI, but others dismissed the claims as fake news.
  • Gemini 2.5 Pro Coding Reviews Mixed: Early reviews of Gemini 2.5 Pro (0506) show improved coding and web design performance but possibly worsened performance in other areas.
    • Although it is strong with coding, some users find it overrated because its analytical processes and context handling are not as good as its predecessor’s.
  • O3 Pro Still MIA: The community is disappointed over the continued absence of O3 Pro, and they speculate that OpenAI might be intentionally delaying its release.
    • One user joked that OpenAI don’t want u to use it, and pointed to the possibility of a fake announcement written by Gork 4 as a sign of the model’s capabilities.
  • LMArena Enhances Discord, Seeks Feedback: LMArena is investing in its AI community by hiring a community manager and seeking feedback on Discord improvements to foster engagement and growth, via this survey.
    • Community members can submit their thoughts on potential changes they’d like to see in the Discord.

Perplexity AI Discord

  • Perplexity Discord Bot Gets Axed: Users expressed disappointment over the removal of the Perplexity Discord bot, suggesting self-hosting alternatives via the Perplexity API, Gemini, and OpenRouter.
    • One user noted that github has many options and with gemini + openrouter you can make it basically free.
  • Perplexity Image Generation Trips Up Users: Users reported problems with Perplexity’s image generation, receiving instructions to use MidJourney instead, fixed by starting a new thread.
    • A user suggested that old context might interfere, recommending to try in a new thread.
  • AI Chatbots Mull Over Ads in AI: The conversation centered on the potential inclusion of ads in AI responses, referencing an article reporting that Apple says Google searches are dropping in favor of AI search.
    • A user noted Copilot’s attempt to suggest additional products when asked to find a specific item, demonstrating a form of ad insertion.
  • Google’s Massive Apple Search Engine Payday: Users discussed Google’s traffic acquisition costs (TAC) to remain the default search engine on Apple devices, with Google paying Apple an estimated $20-21 billion annually.
    • Analysis indicates losing default status on Apple devices could result in revenue losses of $28.2-32.7 billion for Google.
  • Deep Research Yields Long Reports: A member shared a Deep Research report on a specific topic; it was a solid report, clocking in at 6,200 words.

OpenAI Discord

  • Veo 2 Dominates Sora in Video Face-Off: Members comparing Veo 2 and Sora in video generation found Veo 2 to be faster, higher quality, and more accurate in following prompts and reference images, while Sora produces lower-quality video and fails to follow prompts.
    • Users criticized Sora for having cuts, random scene changes, terrible physics, and censorship problems.
  • DeepSeek R1 Echoes Early GPT-4: Internal analysis showed that DeepSeek models like R1 and V3 exhibit response patterns and syntax similar to early GPT-3.5/4 outputs, suggesting that it may have mimicked OpenAI outputs through distillation or fine-tuning.
    • Evidence pointed to large batch output pulls via API tokens from low-usage regions, further fueling speculation.
  • GPT-4o: Is It Really Getting Dumber?: Members are openly wondering whether GPT-4o has lost its spark, with some users reporting random, out-of-topic responses and an overall decline in response quality.
    • While some users suggest checking conversation memories for confusing information, others claim their GPT-4o is working glorious.
  • Hypertree Prompting Creates Stellar Plans: A member shared a link to a ChatGPT conversation using new hypertree planning prompting, suggesting it could provide/organize context in a more manageable way.
    • The conversationalist said that it sounds like it could be pretty stellar.
  • Atoms Glow in Visible Light, Defying Expectations: Members shared an image of a single Yb atom suspended in an atom trap, demonstrating that atoms can emit visible light directly when saturated with excess energy.
    • Visible light can’t be used to see an individual atom, but atoms can emit light in the visible spectrum.

Cursor Community Discord

  • Student Discount Program Plagued with Problems: Users are reporting issues with applying the Cursor student discount, including verification problems and incorrect billing.
    • Members suggest that the discount might be tied to the email used for verification, with some noting that .etu emails in France aren’t being properly recognized; affected users are advised to contact support at [email protected].
  • Gemini 2.5 Pro Fumbles Tool Use: Users are sharing experiences with Gemini 2.5 Pro in Cursor, noting that while the model is generally good, it often fails to call tools properly and struggles with JSON formatting, especially with backticks in strings.
    • One user provided a request ID for a case where Gemini said it would apply changes but never called any tool.
  • MCP Servers Spark Security Scrutiny: Concerns are raised about the security of MCP (Model Context Protocol) servers, particularly regarding trusting code from random people and ensuring the libraries don’t send MySQL credentials to unauthorized servers.
    • One member suggested checking the GitHub repo before running an MCP server, building it locally, or creating your own MCP with Cursor.
  • Discord Channel Structure Drives Discussion: Members suggest restructuring the Cursor Discord server with more channels and better organization, similar to Langchain’s Discord setup, to improve navigation and community engagement.
    • It was confirmed that a team is working on the channel prioritization and archiving to better support the community.
  • PowerShell Version Problems Persist: A user is having trouble setting PowerShell 7 as the default in Cursor, despite updating settings.
    • The suggested solution was to close the last terminal session and start a new one after updating Settings.json to include specific profiles for PowerShell 7.

OpenRouter (Alex Atallah) Discord

  • OpenRouter Activity Page Adds Export Function: OpenRouter announced an export function coming soon to their activity page (link to X post), allowing users to extract their data for further analysis.
    • A user requested a timezone adjustment for stats, highlighting ongoing needs for usability improvements.
  • Cerebras Joins OpenRouter, Challenging Groq: OpenRouter announced Cerebras as a new provider, touting massive chips with 4 trillion transistors and 40 GB of on-chip memory (link to X post), promising high throughput.
    • Members debated the merits of Cerebras versus Groq for hosting large models like DeepSeek v3.1/r1, considering factors like hardware capacity and model size.
  • DeepSeek Models Spark Provider Speed Race: Users are eagerly awaiting a fast, long-context DeepSeek v3.1/r1 provider, as there is a gap for speed in the market.
    • Discussion revolved around achieving speeds of 300 tok/s on H100s with optimized speculative decoding, though others argued the model is too large for Groq’s hardware, which would need roughly 2700 cards just to hold the weights.
  • Mistral Medium 3 Released, Mixed Reactions: The release of Mistral Medium 3 garnered mixed reactions: some users found it “useless”, some saw potential for creative writing, and others noted DeepSeek v3 is cheaper and better on benchmarks.
    • Some noticed that Mistral is teasing their new large model in the Mistral Medium post, and some members mentioned their 2504 checkpoint has been deployed on Cerebras for some time.
  • Clippy Makes a Comeback as VS Code Extension: Felix Rieseberg brought back the iconic Clippy as a VS Code extension (Clippy VS Code extension), injecting a dose of nostalgia into the coding world.
    • The extension revives the famous paperclip assistant from Microsoft Office, providing a humorous element to the development environment.

Unsloth AI (Daniel Han) Discord

  • Qwen3 A3B Impresses with 3B Active Parameters: Members find the Qwen3 a3b (mixture of experts) models impressive, noting that the a refers to the 3B active parameters used during inference, and that they outperform qwq32b.
    • The A3B model (30B total parameters) uses roughly the same RAM as the 32B Q6k, about 25GB.
  • GGUF Conversion Stumbling Blocks Post-Qwen3 Training: After training Qwen3-30b-a3b, one user hit issues converting it to a usable gguf file, prompting a suggestion to symlink llama-quantize into the main llama.cpp/ directory (a sketch follows below).
    • The user found that llama-quantize lived in llama.cpp/bin/ rather than the main llama.cpp/ directory, so the conversion flow couldn’t find it.
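For the curious, a minimal sketch of that symlink fix in Python (paths are illustrative, assuming a local llama.cpp checkout whose build placed the binary under bin/):

```python
from pathlib import Path

llama_cpp = Path("llama.cpp")                  # assumption: local checkout
built = llama_cpp / "bin" / "llama-quantize"   # where the build put the binary
expected = llama_cpp / "llama-quantize"        # where the conversion flow looks

# Symlink the binary into the repo root so the GGUF conversion step finds it.
if built.exists() and not expected.exists():
    expected.symlink_to(built.resolve())
    print(f"linked {expected} -> {built}")
```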
  • vLLM Errors Linked to LoRA Adapters, V1 Engine: A user traced vLLM errors to LoRA adapters and the v1/v0 engines, with the errors resolving once vLLM is forced to use the v0 engine.
    • The issue occurs because some versions default to the V1 engine when adapters are used; a sketch of the workaround follows below.
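A minimal sketch of the workaround, assuming vLLM’s VLLM_USE_V1 environment flag and an illustrative adapter path:

```python
import os

# Force the older v0 engine *before* vLLM is imported; this reportedly avoids
# the LoRA errors seen when some versions default to the V1 engine.
os.environ["VLLM_USE_V1"] = "0"

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="Qwen/Qwen3-30B-A3B", enable_lora=True)
outputs = llm.generate(
    ["Summarize speculative decoding in one sentence."],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest("my-adapter", 1, "/path/to/lora-adapter"),
)
print(outputs[0].outputs[0].text)
```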
  • Phi-3.5’s Premature Stopping Problem: A member noticed that finetuning LLMs like Phi-3.5 relies solely on <|end|> to signal stops, so models trained without it generate until they hit the token limit.
    • The default tokenizer’s padding token is <|placeholder6|>, and the fix was to append tokenizer.eos_token to each training example (see the sketch below).
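A minimal sketch of that EOS-appending fix, assuming a standard transformers tokenizer and an SFT dataset with a text field:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")
print(tokenizer.pad_token)  # reportedly <|placeholder6|> by default

def add_eos(example):
    # Append the EOS token to every training example so the model actually
    # learns to emit a stop signal instead of running to the token limit.
    return {"text": example["text"] + tokenizer.eos_token}

# dataset = dataset.map(add_eos)  # with a datasets.Dataset of {"text": ...}
```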
  • Absolute Zero Sparks Debate on Self-Play Limitations: Discussion around the paper Absolute Zero: Reinforced Self-play Reasoning with Zero Data emphasizes that the approach is still limited by how diverse the problems the model can come up with are.
    • It was speculated that the model can only solve problems already reflected in its pretraining data, although existing data might inspire it to create more diverse problems.

Manus.im Discord Discord

  • Manus Eclipses ChatGPT: A member described Manus as being in another world compared to ChatGPT Plus, praising its capabilities.
    • Another member suggested Deepseek v3 or Gemini 2.5 Pro as fully free alternatives.
  • International Channel Advocated: A member proposed adding an international channel to the Discord server, tagging an admin to facilitate its creation.
    • Another member felt the existing <#1349440507507376129> channel, with its globe icon, feels like any language is welcome.
  • o3 Agent Disambiguation: A member clarified that o3 is a model similar to 4o or Deepseek R1 with tool access, whereas Manus leverages Claude 3.7 Sonnet and offers an environment for comprehensive control.
    • The member suggested focusing on Python and studying papers on ArXiv for further learning.
  • ChatGPT Plus Price Questioned: A member questioned the continued value of a ChatGPT Plus subscription.
    • In response, another member disclosed that they pay for multiple AI providers, including You.com, and pointed to Gemini Advanced’s AI Studio for free usage.
  • Manus Fails Cringe Test: A member found that Manus was not successful when asked to remove the cringe from a script.
    • Another member clarified that the system likely failed to filter the content effectively because cringe is not well defined in normal dictionary, and it’s a newly emerged internet slang.

aider (Paul Gauthier) Discord

  • Gemini 2.5 Falls Short on Coding: Despite benchmark improvements, Gemini 2.5 Pro Exp is considered by some to be “HOT trash for code compared to sonnet”, while others find the 25 free requests useful with Aider.
    • Separately, members reported that the /vendor folder in Golang repos seems to be the cause of the API errors.
  • Mistral Medium 3 Competes on Price: Mistral Medium 3 offers competitive performance at a lower cost ($0.4 input / $2.0 output per M token) than other proprietary models, according to Mistral’s official announcement.
    • The return of Mistral was welcomed with enthusiasm, particularly due to their history of open-source contributions that foster competition.
  • LLM Benchmarks Questioned for Data Contamination: The utility of LLM benchmarks is under scrutiny, as some argue they are misleading due to potential model training on the benchmark data itself.
    • Despite these concerns, the community recognizes the need for metrics but advises interpreting them cautiously, emphasizing the need to take them with a “grain of salt”.
  • Aider vs. Cursor Auto-Context: Users compared Aider with proprietary tools like Cursor, noting that “the auto-context in Cursor is more reliable” because Cursor includes more files by default, potentially using a proprietary model to extract information before sending data to the LLM.
  • Aider Gains Web-Searching Superpowers: Users discuss integrating web search into Aider, suggesting the use of a Perplexity API key as an OpenAI-compatible API (a sketch follows below) or manually adding webpage content via the /web command as a markdown file.
    • One user suggests using aider-desk with a search MCP to achieve autonomous internet searching for context.
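Not Aider’s own code, just a sketch of the OpenAI-compatible angle, assuming Perplexity’s api.perplexity.ai endpoint and a current sonar model name:

```python
from openai import OpenAI

# Perplexity exposes an OpenAI-compatible API, which is what lets tools like
# Aider point at it as if it were just another OpenAI-style provider.
client = OpenAI(
    api_key="pplx-...",                    # your Perplexity API key
    base_url="https://api.perplexity.ai",
)
resp = client.chat.completions.create(
    model="sonar",  # assumption: current Perplexity online-search model name
    messages=[{"role": "user", "content": "What changed in the latest Aider release?"}],
)
print(resp.choices[0].message.content)
```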

LM Studio Discord

  • Qwen 3 Dominates Gemini: Users found that the new Gemini update ignores instructions and overengineers requests, while Qwen 3 provides functional code that respects instructions, especially in coding tasks.
    • One user stated that Qwen 3 is better in every way and provides functional code whereas Gemini has completely ignored the instructions.
  • Same Provider Speculative Decoding: The discussion clarified that sticking to models from the same provider is more likely to work well for speculative decoding, due to fine-tuning and conversion differences, especially for a small draft like Qwen 3 0.6b.
    • It was emphasized that there is no technical requirement for models to be from the same provider, but this configuration gives more reliable results (see the sketch below).
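Not LM Studio’s internals, but a transformers sketch of the same idea: a small same-family draft model that shares the target’s tokenizer (model names taken from the discussion):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Draft and target from the same provider share a tokenizer and vocabulary,
# which is why same-family pairs are the reliable choice for speculation.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
target = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-32B")
draft = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")

inputs = tok("Explain speculative decoding briefly.", return_tensors="pt")
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```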
  • Local Models Flounder in Refactoring: Members found that local models struggle with context in large codebase refactoring but are effective for autocomplete and personal coding projects; the LM Studio API and extensions like Cline are recommended.
    • Deepseek was identified as a top local choice, despite limitations, and users have found it useful in conjunction with commercial alternatives.
  • HP Z2 Mini Speeds Sensation: Members discussed the HP Z2 Mini workstation, highlighting its 8000 MT/s memory, i.e. 8 billion transfers per second, which works out to roughly 64 GB/s of theoretical bandwidth per 64-bit channel.
    • Users are anticipating many units entering the market.
  • Halo PC Boasts Bandwidth Bonanza: The GMKtec Evo X2 with Ryzen AI Max 395 Strix Halo Mini PC was discussed as a potentially cheaper alternative with 128GB RAM.
    • It was noted that the memory bandwidth of the Strix Halo PC is 240 GB/s, similar to some NVIDIA cards; the unit launches May 20.

GPU MODE Discord

  • Kevin 32B Model KernelBench Released by CognitionAI: CognitionAI just released Kevin 32B, a model utilizing Multi-Turn RL for writing CUDA Kernels, detailed in this blog post.
    • The model was trained on 180 tasks with a 20-task holdout set and is reportedly quite effective in this niche, but members noticed that Kevin was trained on the test set, raising concerns about evaluation methodology.
  • MI300 Dominates AMD Leaderboards: Multiple submissions were made to the amd-fp8-mm leaderboard on MI300, with timings ranging from 9.85 ms down to 251 µs, and sub-millisecond entries reaching as low as 195 µs.
    • Several submissions were also made to the amd-mixture-of-experts leaderboard, with timings including 7275 ms, 7281 ms, and 12259 ms on the MI300.
  • A6000 Ada Edges Out L40s in Performance: The A6000 Ada often outperforms the L40S at stock settings, and the 4090 is the fastest of the Ada cards, despite the L40S having more tensor-core power on paper; the L40S’s ECC memory can also cost it some performance.
    • The A40 is the top performer in tokens per dollar when renting in the 4x GPU category, while the A6000 Ada leads the Ada 4x category, though those setups reportedly overheat more than L40S servers.
  • TorchAO Powers Quantized Phi-4 Mini Instruct: The PyTorch team released quantized Phi-4 Mini Instruct models on Hugging Face, quantized using TorchAO and optimized for deployment on vLLM and Executorch.
    • The release includes INT4 weight-only quant (vLLM-ready, with 67% peak memory reduction, 10-20% speedup on A100), FP8 dynamic activation & weight quant (vLLM-ready, 36% peak memory reduction, 15-20% speedup on H100), and INT8 dynamic activation & INT4 weight quant (for ExecuTorch, reaching 17.3 tokens/sec on an iPhone 15 Pro).
  • Factorio’s FLE Docker Suffers Desyncs: Users are experiencing connectivity issues between their Factorio client and the FLE Docker server after a recent Steam update, resulting in immediate desynchronization errors due to mismatched CRC values.
    • Additional errors include the new import strategy breaking things throughout the codebase, since pydantic did not recognize classes as the same when imported via different paths, plus a broken harvest_resource/server.lua.

MCP (Glama) Discord

  • Cursor Stumbles to Use MCP: Members debated Cursor’s ability to utilize resources, pointing out they aren’t visible in the UI like prompts, and that Claude insists on manual user integration of resources, possibly conflicting with the intended MCP architecture.
    • Someone clarified that resources are intended to be managed by the host application without user intervention, citing the MCP specification.
  • A2A Communication Called Cumbersome: Participants discussed the merits of Agent-to-Agent (A2A) communication, with one finding it a little cumbersome yet acknowledging valuable core abstractions such as task, agent, and artifact.
    • Others highlighted that MCP can support A2A effectively and that A2A’s task management workflow could be executed with minimal code.
  • Debugging MCP Servers Proves Difficult: A member recounted their challenges in debugging an MCP server served over stdio transport, taking a week to fix a minor issue due to disabling console logs and ineffective VSCode breakpoints.
    • Suggestions included using tee stream for debugging, and one user shared this guide for debugging an mcp server, tool calling using the mcp inspector.
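One debugging-friendly pattern for this situation, sketched under the assumption of a Python MCP server on stdio transport: keep stdout clean for the protocol and send diagnostics to stderr.

```python
import logging
import sys

# With stdio transport, stdout carries the MCP protocol stream, so a stray
# print() corrupts the framing. Route all diagnostics to stderr instead.
logging.basicConfig(
    stream=sys.stderr,
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("my-mcp-server")
log.debug("tool call received")  # safe: never touches stdout
```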
  • Cloudflare MCP Deployments Fail to Connect: A user questioned whether others were having connectivity problems with remote MCP Servers deployed on Cloudflare.
    • The discussion lacked explicit solutions.
  • Vertex AI MCP server Now Open Source: A member introduced their vertex-ai-mcp-server project, which began with Vertex AI and has since expanded to include Gemini, grounding, and additional tools.
    • The member announced that the project is now open source for community contributions.

Yannick Kilcher Discord

  • LLM Output Infestation Plagues Server: Members voiced concerns about the proliferation of LLM-generated content in the channel, questioning the quality of engagement and the potential for spam.
    • A member compared the server’s declining quality to that of Quora, lamenting the flood of long, bullet-point-ridden messages.
  • Em Dash Reveals LLM’s Identity: The frequent use of em dashes in LLM output has become a telltale sign, easily identifying AI-generated text.
    • One member joked that most people don’t even know how to type them on their keyboard, making it an obvious giveaway.
  • Dynamic AI Agents Eye Academic Article Writing: A user is developing dynamic agents, a society of minds, for writing patents and academic articles, referencing DreamerV3 for blurring the lines between static and dynamic task-adaptive world modeling.
    • They acknowledged that some don’t believe it looks futuristic.
  • Gemini Powers Data Science Agent in Colab: A user shared a Google blog post detailing a data science agent in Colab powered by Gemini.
    • However, one member expressed disappointment, stating that lately all I seem to be able to get out of Gemini is facile blog-post-level stuff.
  • Winner-Takes-All Economics Sparks Debate: Members debated the effects of the winner-takes-all economic system and whether it is what the US has been building since its inception, alongside the announcement of Mistral Medium.
    • One member joked that the new model comes at the low cost of a couple of millions USD, referencing FixupX.

Latent Space Discord

  • Windsurf AI Outpaces Cursor Vision: Members find Windsurf has a clearer product vision than Cursor, possibly influenced by OpenAI’s acquisition strategy and internal model development.
    • The concern is that OpenAI’s focus on productizing may keep its best coding models internal, building a strong competitive advantage (moat) inaccessible to others.
  • Gemini 2.5 Faces Guardrail Quirks: Users are seeing odd responses from the Gemini 2.5 Pro preview, likely due to increased guardrails and safety training, as noted in this Reddit post.
    • One user lamented the relentless tuning for personality and engagement, adding that endless leaderboard hacking and constantly shifting safety training is making these models very weird.
  • Mistral Debuts Enterprise Chat Agent: Mistral is launching new models, including a Mistral enterprise chat agent, to rival OpenAI’s gpt models in performance and safety, as announced on X.
    • These new models aim to compete head-to-head with OpenAI’s GPT models.
  • New Claude code pod Released: A new Claude Code podcast episode has been released; get the details at Latent Space’s X post.
    • The episode covers Claude Code and its features for coding tasks.
  • AI Engineer Conference Tickets Selling Out Fast: The biggest AI Engineer conference of the year is happening this June and early bird tickets are expected to sell out this weekend.

HuggingFace Discord

  • HF Subscription Activation Glitches Resolved via Email: Users reported subscription activation issues, needing to email [email protected] for support due to delayed access to membership benefits.
  • No CUDA? No Problem! Text-to-3D Diffusion on Mac: Users discussed running text-to-3D scene diffusion models on Macs without CUDA, suggesting leveraging Apple Silicon (e.g. via PyTorch’s MPS backend) for speed improvements.
  • Recursals RADLADS Protocol Distills Attention: The Recursal team introduced RADLADS, a protocol for rapidly converting softmax attention transformers into linear attention decoder models, as described in their ArXiv paper and HuggingFace model collection.
    • The team claims RADLADS requires less than 700M tokens and under $2,000 USD for training and its training code is available on GitHub.
  • Flash Attention 2 Zaps with FP16 and BF16 Support: Members noted that Flash Attention 2 supports FP16 and BF16, with BF16 requiring Ampere or newer GPUs, and recommended installing from source via git clone https://github.com/Dao-AILab/flash-attention; a usage sketch follows below.
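A minimal usage sketch, assuming the flash-attn package is installed and a CUDA GPU is available; the BF16 check mirrors the Ampere-or-newer requirement (model name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# BF16 Flash Attention kernels need Ampere (SM80) or newer; otherwise use FP16.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",                # assumption: any FA2-capable model
    torch_dtype=dtype,
    attn_implementation="flash_attention_2",  # requires flash-attn installed
    device_map="cuda",
)
```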
  • DaoDeCode Framework achieves Maximal Transformation: A member introduced DaoDeCode, a language model framework incorporating Mechanism Points and Five Element Transformation Patterns, inspired by Daoist strategy, with source code available on GitHub.
    • The framework aims for minimal intervention and maximal transformation by identifying the perfect seam where space and time vanish.

Nous Research AI Discord

  • Open Codex branches out to Gemini: An open-codex fork now supports models like Gemini and Ollama as described in this repo.
    • Another member commented that the official OpenAI Codex reportedly supports other models, but was censored by a bot.
  • M4 Max MacBook Pro achieves LLM Glory: Users can run models of up to 90 billion parameters in LM Studio on an M4 Max MacBook Pro with 128GB of RAM and a 2TB drive.
    • This MacBook Pro was bought Certified Refurbished directly off the Apple website.
  • Dolphin gets down and dirty: Cognitive Computations seeks assistance to source naughty interactions to train the Dolphin model, using a tool called dolphin-logger.
    • Users should install dolphin-logger, add your keys to .env, and run it, then point your agentic/MCP tools at it to make it happen.
  • Zed launches AI Code Editor into the ring: Zed launched a new AI code editor with good local model support, integrating easily with local models, including Hermes.
    • It seems that users can also add any Ollama model, assuming it supports tool calls and diff styles.
  • Gemini Blazes with 500-1500 tps: The new Gemini model achieved 500-1500 tps in testing.
    • One user found this insane, believing only Cerebras and Groq could reach such performance.

Eleuther Discord

  • Cursor Offers Free IDE Access for Students: A member shared that Cursor IDE is now free for students, sparking community interest.
    • Many in the community found the offer genuinely useful, calling Cursor a valuable resource.
  • Researcher Seeks Scale Maximalism Proponents: A user sought references for papers or researchers advocating for scale maximalism as a universal solution in AI.
    • The goal is to find evidence-backed proponents who believe that scale maximalism will solve all problems in AI.
  • Prolific Outshines MTurk for Human Evals: In a comparison between MTurk and Prolific for human evaluations, Prolific was favored.
    • One member said they would recommend Prolific 80% of the time, implying its overall edge.
  • Guidance Sparked for lm-eval-harness Implementation: A user inquired about implementing a custom model in lm-eval-harness, and another provided a link to the documentation.
    • The suggestion was to inherit from HFLM and overload _model_call and _model_generate, pointing to the Mamba implementation as a practical example.
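A minimal sketch of that suggestion, assuming lm-eval-harness’s HFLM base class and its _model_call/_model_generate hooks:

```python
from lm_eval.models.huggingface import HFLM


class MyCustomLM(HFLM):
    """Custom backend: inherit HFLM and override the two model hooks."""

    def _model_call(self, inps, **kwargs):
        # inps: [batch, seq] token ids -> logits [batch, seq, vocab];
        # this is the hook loglikelihood tasks go through.
        return self.model(inps).logits

    def _model_generate(self, context, max_length, stop, **generation_kwargs):
        # Hook used by generation tasks; a non-HF backend (e.g. Mamba)
        # would replace this with its own decoding loop.
        return self.model.generate(context, max_length=max_length, **generation_kwargs)
```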
  • HF Inference battles vLLM on Power Usage: While vLLM is faster for generation tasks, HuggingFace inference draws less power during them, though it can still hit full power on loglikelihood tasks.
    • Another user replied that this is expected, because vllm is optimised for fast generation.

DSPy Discord

  • Unsloth Finetune on Claude Sonnet Chats, Qwen3-4b Compared: A member used Unsloth to finetune a model on their Claude Sonnet chat history and compared it to Qwen3-4b, finding that the finetuned LoRA model correctly picked up def forward in a zero-shot setting.
    • The member believes a GRPO finetune would pick up LabeledFewShot correctly, as their SFT finetune almost succeeded but hallucinated.
  • Brainstorming Knowledge Injection for Token Savings: A member seeks a more token-efficient method for injecting domain knowledge (ES cluster indices and mappings) into a DSPy program, instead of including it as an InputField in every prompt.
    • A user suggested including the entire Postgres schema in the system prompt for text2SQL tasks.
  • Docstrings become Specifications for DSPy Signatures: Members discussed using docstrings in dspy.Signature to provide essential instructions, explaining what the task is, rather than how to achieve it.
    • It was revealed that docstrings are passed through as the task instructions in the system message by the default adapter; a sketch follows below.
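A minimal sketch of the pattern, assuming a configured LM; the docstring states what the task is, and the default adapter folds it into the system message (the signature itself is a made-up example):

```python
import dspy

class TriageTicket(dspy.Signature):
    """Classify a support ticket's urgency based on its text."""

    ticket: str = dspy.InputField(desc="raw ticket text")
    urgency: str = dspy.OutputField(desc="one of: low, medium, high")

triage = dspy.Predict(TriageTicket)
# With dspy.configure(lm=...) set up:
# print(triage(ticket="Prod cluster is down!").urgency)
```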
  • React Module’s Signature Asks for Output: A member inquired about creating a signature for a ReAct module that primarily outputs tool calls, questioning whether output fields can be left blank.
    • No direct answer was given.
  • Github Botches Notebooks but Colab Steps In: A member suggested that GitHub’s notebook rendering might be too picky, causing a missing “state” key error, and that this could stem from a prematurely copied notebook.

Modular (Mojo 🔥) Discord

  • Apple Silicon Users Hit Roadblock Running Modular Puzzles: Members determined that running Modular Puzzles directly on Apple Silicon GPUs isn’t feasible, suggesting the use of a cloud instance with a GPU.
    • The supported NVIDIA GPU architectures for Mojo GPU programming include Turing, Ampere, Hopper, and Blackwell (RTX 20XX - 50XX series).
  • Traits Spark Debate: Fields vs Properties: There was debate about fields in traits, with agreement that fields in traits could happen, but this would prevent extension.
    • The community agreed that properties in traits is probably a strictly better idea than fields in traits, because it’s more general.
  • Modular Hackathon Kicks Off!: The Modular Hackathon is happening at AGI House in Hillsborough on Saturday, and a member reminded everyone that spots are still available, see: Modular Hackathon.
    • Speakers include members from Modular, Mark Saroufim (GPU MODE & PyTorch), Simon Boehm & Sasha Krassovsky (Anthropic), and Dylan Patel (SemiAnalysis).
  • Mojo’s Syntax Mirrors Python’s Convention: A member inquired about the presence of a pub syntax in Mojo (similar to Rust), and another member stated that Mojo currently follows the Python convention of “everything is public, prefix things that are supposed to be private with an underscore”.
    • There was additional interest in a roadmap to include this feature.
  • Mojo Awaits Open-Source Contributions: A member inquired about contributing to the Mojo compiler, and a member responded that there is not currently a way to do so.
    • The member noted that there’s a decent chance that sum types will be implemented before the compiler is open for contribution.

LlamaIndex Discord

  • LlamaIndex Enables Deep Research Agents: LlamaIndex now lets users build their own Deep Research agent from scratch, with a workshop tutorial that walks through creating a multi-agent system for deep research using AgentWorkflow.
    • The workflow enables agents to perform more in-depth research and analysis.
  • LlamaExtract Augmented with Reasoning: LlamaExtract gains new features for AI applications: enhanced citations and reasoning. LlamaIndex’s tweet describes extracting information with precise source attributions and surfacing the reasoning behind each extraction, boosting transparency.
    • This enhancement facilitates clearer audit trails and more reliable data extraction from complex data sources.
  • Anthropic API Gets LlamaIndex Support: Anthropic API now has a built-in web search tool that LlamaIndex immediately supports, as shown in a demo notebook and Anthropic’s announcement.
    • This integration allows users to combine Anthropic’s search capabilities with LlamaIndex’s data management features.
  • Memgraph Unmasked as Neo4j Client: Memgraph, when tested in WSL2 VS Code, calls the Neo4j client, indicating the integration wraps the Bolt-compatible Neo4j driver, as detailed in the docs.
    • Users are advised to check the underlying implementation when using Memgraph in their projects.
  • GPT-4o-mini Turns Multimodal: Users can pass documents directly to the multimodal LLM, GPT-4o-mini, for one-shot inference, avoiding OCR.
    • Appending the document to the system prompt allows the LLM to query directly, streamlining the document processing workflow.
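A rough sketch of the no-OCR idea using the plain OpenAI client (the LlamaIndex wiring is omitted and the file name is illustrative); for text documents, the same one-shot effect comes from appending the text to the system prompt:

```python
import base64
from openai import OpenAI

client = OpenAI()
page_b64 = base64.b64encode(open("page1.png", "rb").read()).decode()

# One-shot inference over a scanned page: the multimodal model reads the
# image directly, so no separate OCR step is needed.
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What does the termination clause say?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{page_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```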

tinygrad (George Hotz) Discord

  • Lattner Drops Mojo Kernel Trove: Chris Lattner’s Mojo now hosts a substantial collection of “mojo kernels”, available at modular/modular/tree.
    • The community is currently assessing their speed and working out how to run them to see if they have some oomph.
  • tinygrad’s True Colors Exposed: A user sought clarification on the color codes in a tinygrad output screenshot, prompting a discussion on the visual representation of data structures.
    • Another user linked to the color definitions in the tinygrad GitHub repository, pinpointing the exact code responsible for the color scheme.
  • Bending Beam Search Cache to Your Will: A user inquired about customizing the beam search cache location for use in a Lambda Labs instance, due to storage configurations.
    • George Hotz clarified that the cache location can be overridden by setting the CACHEDB environment variable, as documented at line 175 in helpers.py; a sketch follows below.
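A minimal sketch; CACHEDB is the variable George mentioned, and the path is illustrative:

```python
import os

# Point tinygrad's cache database at instance-local storage *before* tinygrad
# is imported; helpers reads CACHEDB to decide where the cache lives.
os.environ["CACHEDB"] = "/ephemeral/tinygrad/cache.db"

from tinygrad import Tensor  # beam-search results now cache under the new path
print((Tensor.ones(4, 4) @ Tensor.ones(4, 4)).numpy())
```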

Torchtune Discord

  • Tokenizer arg renaming in Torchtune PR sparks confusion: A Torchtune PR sparked discussion over renaming tokenizer arguments like add_end_tokens, with one member pointing out inconsistencies.
    • The member stated that add_end_token was originally messed up with add_end_tokens, and that renaming could make it more confusing if there’s no add_start_tokens.
  • Renaming for Uniformity?: A member suggested putting the context on the PR for the author, feeling the rename brings better uniformity.
    • The rename will make any future work to make a common tokenizer interface easier.

Nomic.ai (GPT4All) Discord

  • ROCm Support Sought for GPT4All: A user asked about updating the Windows version of GPT4All to support AMD ROCm for potentially faster performance.
    • No response was given.
  • GPT4All Eyes Classroom iPads: A teacher inquired whether the GPT4All app could be used on iOS devices like iPads for classroom integration.
    • Another user responded that LLMs need significant processing power, suggesting a home server setup instead.
  • GGUF Settings Explored: A user questioned whether the GGUF file format dictates its own max new token limit and optimum temperature settings.
    • The response clarified that the max new token count is tied to available VRAM and other settings serve as a starting point for tweaking.
  • Users Seek Uncensored Models: A user reported that the model refused to answer an illegal question, despite there being no restriction on criminal use.
    • Peers suggested trying uncensored models and suggested exploring Hugging Face for such models.

LLM Agents (Berkeley MOOC) Discord

  • Auth0 Hackathon Bounty Incoming: The Auth0 Workshop on May 7th at 10AM PT is teaching how to secure AI agents with authentication and offering hackathon prizes for teams integrating Auth0.ai.
    • Prizes include $5,000 for 1st place, $3,000 for 2nd, and $2,000 for 3rd.
  • Lambda Pays for AgentX Competition: The AgentX Workshop with Lambda on May 15th at 10am PT focuses on scaling agentic AI projects using Lambda’s Inference API.
    • Special prizes are available for AgentX Competition participants, including up to $1,000 in credits for 1st place, $500 for 2nd, and $300 for 3rd in both Entrepreneurship and Research tracks; register at lu.ma/AgentX-lambda.
  • Email Notifications MIA: A user, <@854134294870884363>, reported not receiving email notifications for credit tracking, making monitoring difficult.
  • LLMs Fake Conditionals via Stats: LLMs execute conditional statements through statistical pattern recognition, learning from millions of examples and representing relationships like “If X, then Y” in their parameters.
    • They use neural attention to weight parts of a prompt and anticipate following text, statistically approximating logical reasoning.

Cohere Discord

  • AWS x Cohere Workshop Recording Requested: A user inquired about the availability of a recording of the in-person AWS x Cohere workshop for those unable to attend physically.
    • The user, based in Malaysia, expressed interest in the insights from industry experts presenting at the event.
  • Coral Gets Rebooted After Brief Outage: Users noticed Coral was briefly down for maintenance but is now accessible again at coral.cohere.com.
    • Service was quickly restored.

Codeium (Windsurf) Discord

  • Windsurf 1.8.2 Fixes Tool Call Errors: The Windsurf 1.8.2 patch fixes tool-call errors for users with telemetry disabled, along with crashes around workspace conversations.
    • The update also includes server updates to add regional channels.
  • Windsurf Expands Geographically With Regional Channels: Windsurf has added regional channels to help connect Windsurfers across the globe, including the SF Bay Area, San Diego, Taipei City, Boston, Miami, NYC, Tokyo, Austin, and Toronto.
    • Users can join these channels by answering the onboarding question in the customize section.
  • Cascade Gets Customizable with New Tools: Windsurf Wave 8 Day 2 introduces customization tools for Cascade, including custom workflows as .md files, an enhanced rules system, simultaneous cascades, a Cascade Plugins Panel, and enhanced MCP Integration.
    • These features allow users to customize Cascade to their patterns and preferences to maximize productivity and can be seen in the launch video.
  • Windsurf Introduces File-Based Rules: Windsurf enhances rules system with multiple activation modes (Manual, Always On, Model Decision, Glob) stored in .windsurf/rules/.
    • These File-Based Rules can be activated in multiple ways.
  • Multi Cascade Power Arrives: Windsurf introduces Simultaneous Cascades which allow you to start new Cascade conversations while existing ones are running.
    • No more waiting!

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

LMArena ▷ #general (914 messages🔥🔥🔥):

GPT-4o vs O1 capabilities, Grok 3.5 release rumors, Gemini 2.5 Pro (0506) reviews, O3 Pro release speculation, Dense vs MoE model architectures

  • GPT-4o can’t match O1 on some tasks: Members discuss how GPT-4o struggles with electrical and philosophy tasks that O1 can handle, indicating differences in their knowledge bases and reasoning abilities.
    • They suggest this is because O1 can think and recall about electrical stuff before forming a reply, whereas 4o has to immediately predict a chat reply.
  • AI Community Abuzz with Grok 3.5 ASI Speculation: The AI community hotly anticipates the release of Grok 3.5, with speculation that it is tuned from Grok 4 and may be approaching Artificial Superintelligence (ASI).
    • Some users reported an hour-long access to Grok 3.5 in UI, fueling hype, while others dismissed the claims as fake news.
  • Gemini 2.5 Pro has mixed coding reviews: Early reviews of Gemini 2.5 Pro (0506) are mixed, with reports of improved coding and web design performance but potential degradation in other areas.
    • While it tops the coding arena, some users find it overrated, noting worsened analytical processes and context handling compared to its predecessor.
  • Mystery of the Missing O3 Pro Release: The community expresses disappointment over the continued absence of O3 Pro, speculating that OpenAI might be intentionally delaying its release.
    • Some suggest OpenAI don’t want u to use it and point to the possibility of a fake announcement written by Gork 4 as a sign of the model’s capabilities.
  • Dense vs MoE Debate Rages On: A debate emerges regarding dense versus Mixture of Experts (MoE) model architectures, with arguments about parameter counts, training efficiency, and performance trade-offs.
    • One user referenced a Qwen 3 experiment showing that small active parameters in MoE models can still allow performance close to dense models with larger parameter counts.

LMArena ▷ #announcements (1 messages):

Discord improvements, LMArena community growth, AI community space

  • LMArena Invests in Growing AI Community: LMArena is investing more time and energy into creating a space for those interested in making an impact in AI and announced a new community manager for Discord.
    • Users are invited to share their thoughts on possible changes they’d like to see in the Discord via this survey to help grow, engage, and protect this space.
  • Community Input Sought for Discord Enhancements: The new community manager is seeking feedback on potential Discord improvements to foster growth and engagement within the AI community.
    • A survey has been launched to gather suggestions from community members regarding desired changes.

Perplexity AI ▷ #general (618 messages🔥🔥🔥):

Discord Bot, Perplexity Image Generation, AI ads, Traffic Acquisition Costs, Deepseek R2

  • Perplexity Discord Bot Deletion Mourned: Users expressed their disappointment with the removal of the Perplexity Discord bot, with one user saying it was tragic how they shut down their bot, while another saying i used it all the time quite sad.
    • Users discussed self-hosting alternatives using the Perplexity API, Gemini, and OpenRouter, with one saying, github has many options and with gemini + openrouter you can make it basically free.
  • Perplexity User Trouble Shoots Image Generation Woes: A user reported issues with image generation, with perplexity telling them it can’t generate images and gives me instructions on how to do it with MidJourney, but this was resolved by starting a new thread.
    • Another user suggested that the issue could be due to old context interfering, and recommended always trying it in a new thread.
  • Ads in AI Responses? The Horror!: Users debated whether the future holds ads in AI responses, with a conversation about Apple says Google searches are dropping in favor of ai search.
    • One user said that Copilot tried it, saying that if you ask it to find a product, at the bottom it would be like oh and how about if you also looked at this thing.
  • Breaking: How Much Does Google Pay Apple to be the Default Search Engine?: Users discussed the traffic acquisition costs Google pays to remain the default search engine on Apple devices, noting that By 2023, TAC reached $55.7 billion, representing 21.39% of Google’s total advertising revenues and that Currently, Google pays Apple an estimated $20-21 billion annually for default search status.
    • A user noted that Google’s analysis shows losing default status on Apple devices could result in revenue losses of $28.2-32.7 billion.
  • US/EU Legislation and Censorship: A user predicts that OpenAI, Perplexity, all others will be bashed by the US/EU legislation and forced to enforce censorship, suggesting that local downloads of early models may be the only way to bypass that.
    • The user mentioned that That’s why saving a copy of them is important, referring to early models.

Perplexity AI ▷ #sharing (1 messages):

Deep Research Reports, Long Reports

  • Deep Research Yields Lengthy Reports: A member noted that Deep Research on a certain topic produced a solid, long report of 6200 words.
  • Link to Perplexity AI research: The research can be found at this Perplexity AI link.

Perplexity AI ▷ #pplx-api (3 messages):

Office Hours, Credits

  • Perplexity Office Hours Happening Tonight!: Perplexity AI is hosting office hours tonight and encourages everyone to attend.
  • User Fixes Credit Issue: A user reported that they were able to successfully obtain their $50 credits after experiencing issues.

OpenAI ▷ #ai-discussions (353 messages🔥🔥):

XenArcAI Introduction, Lucid Dreaming Techniques, AI's Em Dash Obsession, DeepSeek vs. OpenAI, Veo 2 vs. Sora video generation

  • New XenArcAI community for LLM Development Announced: A new global community called XenArcAI was introduced, focused on building impactful LLMs and exploring their business potential while seeking collaboration and early-stage investment.
    • Interested parties were invited to DM for more information or to get involved in AI innovation and research.
  • Users Share Tips for Lucid Dreaming: Members shared strategies for inducing lucid dreams, including maintaining consistent sleep schedules, using dream recall techniques, and employing breathing pattern adjustments to mitigate potential sleep paralysis.
    • One member recommended that keeping the picture in frame is key and that writing down dreams immediately after waking up can help with dream recall.
  • AI’s love for the Em Dash is Explained: Members discussed why AI models overuse the em dash, attributing it to the prevalence of academic and business materials in training data and the potential influence of post-training guardrails and reward models at OpenAI.
    • One member even suggested that using the em dash more might make it harder to discern content written by AI, while another pointed out that thumbs up/down training lead to human bias.
  • DeepSeek Caught Mimicking Early GPT-4 Outputs: An internal analysis suggests that DeepSeek models like R1 and V3 exhibit response patterns and syntax similar to early GPT-3.5/4 outputs, and there is evidence of large batch output pulls via API tokens from low-usage regions.
    • This has led to speculation that DeepSeek may have used distillation or fine-tuning from OpenAI outputs as a shortcut due to the absence of a native training dataset.
  • Veo 2 Annihilates Sora in Video Generation Shootout: Members compared Veo 2 and Sora in video generation, with Veo 2 being faster, providing higher quality, and accurately following prompts and reference images, whereas Sora produces lower quality video and fails to follow prompts.
    • Users found Sora to have cuts, random scene changes, terrible physics, and censorship problems.

OpenAI ▷ #gpt-4-discussions (11 messages🔥):

GPT-4o degradation, Brief GPT-4o responses, Placebo upvote buttons

  • GPT-4o: Is it Getting Dumber?: Members discussed whether GPT-4o is acting weird and replying randomly out of topic, with one user directly asking what is happening to gpt? why its acting weird? it reply randomly out of topic.
    • One user suggested checking conversation memories for confusing or overloading information, while another claimed their GPT-4o has been glorious.
  • GPT-4o Getting Brief: A user reported that GPT-4o is getting much briefer in its responses.
    • Other users echoed the frustration.
  • Upvote Buttons: A Placebo?: A member questioned whether the upvote buttons are a complete placebo.
    • Another member responded it is a world of disappointment.

OpenAI ▷ #prompt-engineering (68 messages🔥🔥):

Atoms in Visible Light, Hypertree Planning Prompting, Building a ChatGPT website, Project-Based Prompt Management, Prompt engineering and experimental sciences

  • Atoms Glow in Visible Spectrum: Members debated whether atoms can be seen in visible light, with some clarifying that while individual atoms are smaller than the wavelength of visible light, atoms can emit visible light directly, as shown in an image of a single Yb atom suspended in an atom trap.
  • Hypertree Prompting Creates Stellar Plans: A member shared a link to a ChatGPT conversation using new hypertree planning prompting.
    • The conversationalist said that it sounds like it could be pretty stellar to provide/organize context in a more manageable way.
  • Users discuss a ChatGPT Website Idea: A member wanted to create a website to compete with ChatGPT, but another member warned that you can’t use the model to create a competitive site and that this will get you banned.
    • The member clarified that they want a product based on the chat gpt api to do it.
  • Project Prompting for Fun and Profit: A member proposed creating a website with login, database, custom prompts, settings and saved conversations, and the ability to export, in order to manage projects.
    • Other members suggested that the project already existed.
  • Prompting mirrors Scientific Process: Members compared prompt engineering to scientific experimentation, describing something about seeing the process of science and experimentation and ‘trying to get the desired result from reality’ or ‘trying to map input-output from anything’ as a type of prompt engineering.
    • Prompting is about ‘trying to get the desired result from reality’ or ‘trying to map input-output from anything’.

OpenAI ▷ #api-discussions (68 messages🔥🔥):

Atoms in visible light, Prompt engineering techniques, Hypertree planning prompting, Project-based custom prompts, Creating a Chat GPT website

  • Atoms emit visible light, defying expectations: A member shared an image of a single Yb atom suspended in an atom trap, demonstrating that atoms can emit visible light directly when saturated with excess energy.
    • Visible light spans 400-700nm and can’t be used to resolve an individual atom, which is about 0.1nm across, but atoms can emit light in the visible spectrum.
  • Diving Deep into the HyperTree Planning Prompting: A member shared a link to the new hypertree planning prompting, calling it so good.
  • Prompt Engineering parables and molten Aluminum: A member likened prompt engineering to metallurgy, sharing a personal experience about learning from someone with practical smelting experience, noting the importance of challenging established conventions and understanding the system’s internal workings.
    • Prompt engineering is like ‘trying to get the desired result from reality’ or ‘trying to map input-output from anything’, and they approach model interactions similarly to how they explore new programs or interact with unfamiliar entities.
  • Building Custom Chat GPT Projects and Websites: A member outlined a plan to create a custom Chat GPT website with login, database, custom prompts, and project settings to save conversations, but another member warned it could violate the terms of service and result in a ban.

Cursor Community ▷ #general (484 messages🔥🔥🔥):

Cursor student discounts, Gemini 2.5 Pro performance, MCP server security concerns, Channel Organization, PowerShell default version

  • Student Discount Program Troubles: Users report issues with applying the Cursor student discount, including verification problems and incorrect billing, prompting suggestions to contact support at [email protected].
    • Members highlight that the discount might be tied to the email used for verification and that .etu emails in France aren’t being properly recognized.
  • Gemini 2.5 Pro: Tools or No Tools?: Users are sharing experiences with Gemini 2.5 Pro in Cursor, noting that while the model is generally good, it often fails to call tools properly and struggles with JSON formatting, especially with backticks in strings.
    • One user provided a request ID for a case where Gemini said it would apply changes but never called any tool.
  • Users Wary of Malignant MCP Servers: Concerns are raised about the security of MCP (Model Context Protocol) servers, with questions about trusting code from random people and ensuring the libraries don’t send MySQL credentials to unauthorized servers.
    • One member suggested checking the GitHub repo before running an MCP server, building it locally, or creating your own MCP with Cursor.
  • Discord Channel Structure Overhaul: Members suggest restructuring the Cursor Discord server with more channels and better organization, similar to Langchain’s Discord setup, to improve navigation and community engagement.
    • It was confirmed that a team is indeed working on the channel prioritization and archiving to better support the community.
  • Windows PowerShell Version Troubles: A user is having trouble setting PowerShell 7 as the default in Cursor, despite updating settings.
    • The suggested solution was to close the last terminal session and start a new one after updating Settings.json to include specific profiles for PowerShell 7.

OpenRouter (Alex Atallah) ▷ #announcements (5 messages):

OpenRouter activity page, Cerebras new provider

  • Activity Page adds Export Function: OpenRouter shared a screencast of upcoming activity page features, including an export function (link to X post).
    • A user requested stats shown in their own timezone, noting they don’t even see the stats for the current day.
  • Cerebras chips: Largest ever built!: OpenRouter announced Cerebras as a new provider, which has chips that are the largest ever built, packed with up to 4 trillion transistors on a single wafer, and massive on-chip memory: 40 GB per wafer, eliminating bottlenecks from external memory (link to X post).
    • Cerebras delivers 3k+ TPS on Llama 4 Scout and 1.8k+ TPS on Llama 3.3 70B Instruct, enough to instantly generate an animation for ~$0.001.

OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):

Clippy as VSCode Extension, Bring back Clippy with VS Code Extension

  • Bring back Clippy with VS Code Extension: A member shared a link to the Clippy VS Code extension made by Felix Rieseberg.
    • It seems that the good old paperclip is back in town, now on VS Code!
  • Clippy makes a comeback: The famous paperclip assistant from Microsoft Office is available as a VS Code extension.
    • Developed by Felix Rieseberg, this extension brings a nostalgic and humorous element to the coding environment.

OpenRouter (Alex Atallah) ▷ #general (299 messages🔥🔥):

Gemini 2.5 Pro Upgrade, OpenRouter API Issues, Cerebras vs Groq, DeepSeek Models, Mistral New Release

  • Gemini 2.5 Pro Upgrade: A Mixed Bag: Users report that the new Gemini 2.5 Pro is worse than the previous version in almost all areas except coding, sparking debate about whether this is due to tradeoffs in design or something else.
    • One user noted, “Whatever changed just blew up my structured output bills. From 0.000000xxx to 0.0xxx because its reasoning 5-10k tokens suddenly.”
  • OpenRouter API Hiccups: Some users are experiencing issues with OpenRouter’s text completion requests, with some getting no response or the service replying as if it’s a chat completion instead.
    • One user pointed out, “if I do a text completion request directly to chutes it works properly but the exact same request through openrouter is broken, just like it has been for days now lol.”
  • Cerebras and Groq: A Chip Duel: Members debated whether Groq or Cerebras would be better suited to host DeepSeek v3.1/r1, with some suggesting that the model is too large for Groq’s hardware.
    • One user noted that Groq would need about 2700 cards just to fit the weights of r1, while another suggested that Cerebras could easily run Qwen 235b.
  • DeepSeek Models: Long Context Speed Race: Users highlighted the need for a fast, long context DeepSeek v3.1/r1 provider, suggesting that such a provider would be an instant hit.
    • One user suggested, “good speculative decoding will have H100s spit out 300 tok/s on big Llama3. Imagine how much a little optimisation can get you there.”
  • Mistral Medium 3 Released, Is It Useless?: A user noted the release of Mistral Medium 3, to which another user replied it sounded useless, but another user said that while DeepSeek v3 is cheaper and better on benchmarks, Mistral might be good for creative writing.
    • Another user pointed out that Mistral is teasing their new large model in the Mistral Medium post, but they’ve had their 2504 checkpoint deployed on Cerebras for a while now.

Unsloth AI (Daniel Han) ▷ #general (231 messages🔥🔥):

Qwen3 models, gguf conversion issues, vLLM issues with LoRA adapters, Datasets for QandA training

  • Qwen3 a3b models considered impressive: Members are finding the Qwen3 a3b (mixture of experts) models impressive, especially given that the a refers to the 3B active parameters during inference, and consider them better than qwq32b.
    • The A3B model (30B total parameters) is said to take the same RAM roughly as the 32B Q6k, about 25GB.
  • Users discuss issues with GGUF conversion after training Qwen3: One user training Qwen3-30b-a3b was unsuccessful in converting it to a usable gguf file, after which another user suggested symlinking llama-quantize into the main llama.cpp/ directory.
    • They noted that llama-quantize was missing from the main llama.cpp/ directory and was located in llama.cpp/bin/ instead.
  • vLLM version related issues: A user reported that vLLM errors may be related to LoRA adapters and v1/v0 engines in vLLM, and that forcing vLLM to use the v0 engine resolved errors.
    • Specifically, they noted that some versions will use the V1 engine when adapters are used, which leads to issues.
  • Debate whether or not tags should be included for Q&A: One user asked what their SFT training dataset for Qwen3 should look like, and whether they should include the <think> tag in their answers.
    • Another user said they had just trained their Qwen3 model and didn’t think the InfinityInstruct dataset has the <think> tag in it.

Unsloth AI (Daniel Han) ▷ #off-topic (9 messages🔥):

Mistral Medium, Vast.ai, AI Project Recruitment, Model Weights

  • Mistral Medium Launched: A member shared a link to the new Mistral Medium model announcement: mistral.ai/news/mistral-medium-3.
    • Another member pointed out they are using Vast.ai.
  • Recruiting Reliable AI Project Members: A member is looking for a reliable person for an AI Project.
    • Requirements: Citizen of USA, Australia, Canada, UK, Switzerland, Netherlands, or Germany; Compensation: $500 weekly (Part-Time); DM for more info.
  • No Weights = Useless?: A member commented that no weights means Mistral Medium is useless <:rip:1233329793584468062>.

Unsloth AI (Daniel Han) ▷ #help (39 messages🔥):

Phi-3.5 Finetuning, Phi-4 Confusion, GGUF Finetuning, Qwen3 Training, Gradient Accumulation

  • Phi-3.5 Has Stopping Problem: A member noticed that finetuning LLMs like Phi-3.5 only uses <|end|> to signal stops, causing it to generate until the token limit.
    • They also noted that the default tokenizer’s padding token is <|placeholder6|> but ended up appending tokenizer.eos_token at the end.
  • Phi-4 Tokenizer Confuses Newbies: It was brought up that Unsloth fixed a bug where padding and EOS tokens were the same, and that Phi-4 has a different chat template than Phi-3.5.
    • Specifically, the tokenizer for unsloth/Phi-4-mini-instruct uses the Phi-3 tokenizer, which means tokenizing “<|im_start|>” results in [27, 91, 321, 10949, 91, 29] rather than a special token.
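A quick way to check this yourself, assuming the unsloth/Phi-4-mini-instruct repo id from the discussion:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/Phi-4-mini-instruct")

# If "<|im_start|>" were registered as a special token this would print a
# single id; with the Phi-3 tokenizer it splits into ordinary pieces instead,
# e.g. [27, 91, 321, 10949, 91, 29].
print(tok.encode("<|im_start|>", add_special_tokens=False))
```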
  • GGUF Isn’t Finetunable: A member asked whether to finetune from a GGUF checkpoint or a 4-bit one, clarifying that GGUF is applied after finetuning for inference optimization.
    • Another member clarified that you cannot finetune GGUF, likening it to a compiled app as opposed to source code.
  • Qwen3 Suffers Kaggle Issues: A member training Qwen3 on Kaggle faced issues using 2 GPUs, noting Unsloth doesn’t yet support multiple GPUs, despite the notebook being configured for two T4s.
  • Pad Token Switcheroo: Members noticed that the tokenizer_config.json differs between unsloth/Qwen3-0.6B-Base and Qwen/Qwen3-0.6B-Base on HF.
    • Specifically, the Unsloth version removes the chat template and swaps "pad_token": "<|endoftext|>" for "pad_token": "<|vision_pad|>".

Unsloth AI (Daniel Han) ▷ #research (18 messages🔥):

Absolute Zero paper, Self-play Reasoning, Unsupervised Training, Code validator, Memory Layer Hooking

  • Absolute Zero Sparks Discussion on Self-Play: Discussion around the paper Absolute Zero: Reinforced Self-play Reasoning with Zero Data highlights concerns that it is still limited by how diverse the problems that the model can come up with are.
    • Members speculated it will only solve problems already reflected on its (pre)training data, although existing data might inspire it to create more diverse problems.
  • Concerns Raised about Task Correctness in Self-Play: The paper Absolute Zero: Reinforced Self-play Reasoning with Zero Data raises concerns about ensuring the correctness of proposed tasks, specifically if the LLM hallucinates during task generation and trains itself on an incorrect solution.
    • Using a diversity of teacher-like models to validate task correctness was suggested as a potential improvement.
  • Member deems Unsupervised Training could be big: A member thinks unsupervised training with a code validator could be big.
    • Another member quipped millions is an understatement.
  • Gemma3 27b gets bolted: A member hooked the input and output layers of Gemma3 27b with just one bolt-on memory layer (to force crosstalk) and still got valid generation in the output.
    • The member found that hooking the middle layers is what breaks the model.
  • Crosstalk Memory Layer on models: A member lacks the compute to benchmark the technique of hooking every nth model layer to a single crosstalk memory layer.
    • The member added that even Mistral blows up the memory requirements.

Manus.im Discord ▷ #general (202 messages🔥🔥):

Manus vs ChatGPT, International Channel Request, o3 as an agent, AI learning Advice, GPT-4.5 language and writing

  • Manus is ‘another world’ Compared to ChatGPT Plus: One member shared that they think Manus is like another world compared to ChatGPT plus.
    • Later in the convo another member suggests if you want fully free, use deepseek v3 or gemini 2.5 pro.
  • Call for International Channel in Discord: A member suggested that the Discord server should add an international channel, tagging an admin <@470844096328761364> to request its creation.
    • Another member responded that the <#1349440507507376129> channel and the globe icon make it feel like any language is welcome.
  • Distinguishing o3 From Manus: A member clarified that o3 is a model like 4o or Deepseek R1 with access to tools, while Manus uses Claude 3.7 Sonnet and provides an environment for controlling everything.
    • Another member recommended focusing on Python and reading papers on ArXiv.
  • Weighing the Value of ChatGPT Plus: A member inquired whether ChatGPT Plus is still worth the cost, leading to a discussion about its features and alternatives.
    • Another member mentioned that they pay for multiple AI providers, including You.com, and highlighted Gemini Advanced’s AI Studio for free use.
  • Challenges of Applying Cringe Filter on Manus: A member tried to get Manus to remove the cringe from a script and it didn’t work.
    • Another member noted, cringe is not well defined in normal dictionary, and it’s a newly emerged internet slang, so i think you could achieve this through giving concrete instructions.

aider (Paul Gauthier) ▷ #general (159 messages🔥🔥):

Gemini 2.5 Pro Exp, Mistral Medium 3 Performance, LLM Benchmarks, Aider with Cursor

  • Gemini 2.5 Arrives but Falls Short in Coding: Members reported that while Google has released Gemini 2.5 Pro Exp, some find it “HOT trash for code compared to sonnet” despite benchmark improvements.
    • It was suggested that the 25 free requests for Gemini 2.5 Pro Exp can be useful with Aider.
  • Mistral Medium 3 Re-Enters Competition: Mistral Medium 3 has been released, offering competitive performance at a lower cost ($0.4 input / $2.0 output per M token) than other proprietary models, as noted in Mistral’s official announcement.
    • Others expressed happiness to see Mistral back in the game “cause they usually make stuff foss which breeds competition”.
  • LLM Benchmarks: Meaningless or Useful?: The utility of LLM benchmarks was debated, with some arguing they are misleading because models may be trained on the benchmark data itself.
    • Despite concerns, some members acknowledge the need for metrics to evaluate models but advise taking them with a “grain of salt”.
  • Aider vs Cursor: Auto-Context Comparisons: Users compared Aider with proprietary tools like Cursor in terms of codebase indexing and auto-context implementation.
    • Some users felt that “the auto-context in Cursor is more reliable” because Cursor includes more files by default, potentially due to its use of a proprietary model for information extraction before sending data to the LLM; some prefer Aider for complex tasks.

aider (Paul Gauthier) ▷ #questions-and-tips (36 messages🔥):

Golang Authentication Error, Gemini 2.5 'Thinking Mode', Aider RAG functionality, Claude CLI vs. Aider cost, Perplexity API for Web Search

  • Authentication Errors Plague Golang Repos: Users report experiencing litellm.AuthenticationError specifically in Golang repositories with the /vendor folder, suggesting a potential issue with the repo map or API authentication when using architect mode.
    • Adding the /vendor folder to .aiderignore doesn’t resolve the problem, and the issue seems to happen only with OpenRouter.
  • Gemini 2.5 Thinking Mode Explored: Discussion revolves around ensuring Gemini 2.5 operates in a ‘thinking mode’, with the /think-tokens slash command proposed as a method to allocate a token budget for more complex reasoning; the accepts_settings: ["thinking_tokens"] line also has to be added to the model’s config.
    • A user suggests testing this by comparing outputs and completion times with and without the thinking-tokens parameter.
  • Aider Gains Web-Searching Superpowers: Users discuss integrating web search into Aider, suggesting the use of a Perplexity API key as an OpenAI-compatible API or manually adding webpage content via /web command as a markdown file.
    • One user suggests using aider-desk with a search MCP to achieve autonomous internet searching for context.
  • Aider’s Gemini Debug Loops: Members observed Aider entering debug loops with Gemini, especially when encountering errors, but presenting multiple error sets could lead to resolution by rethinking the implementation.
    • A member wonders whether insufficient conversational context prevents Aider from recognizing and breaking out of these debug failure loops, or if successful self-correction is possible.

LM Studio ▷ #general (142 messages🔥🔥):

Gemini issues, Qwen 3 model, LM Studio fine-tuning, Speculative decoding, Local model capabilities

  • Gemini Update Draws Mixed Reactions: A user reported that the new Gemini update completely ignores and overengineers requests, prompting discussion about its effectiveness compared to other models.
    • Another user stated that Qwen 3 is better in every way for coding and provides functional code that respects instructions.
  • Qwen 3 Model Compatibility for Speculative Decoding Debated: Users discussed the compatibility of Qwen 3 0.6b for speculative decoding, noting the need to use models from the same provider due to fine-tuning and conversion differences.
    • It was clarified that there is no technical requirement for models to be from the same provider, but sticking to the same provider is more likely to work well.
  • Local Models Face Challenges in Large Codebase Refactoring: Members shared that local models struggle with context and aren’t ideal for refactoring large codebases but are suitable for autocomplete and personal coding projects.
    • It was recommended to use extensions like Cline and the LM Studio API for code tasks, though commercial models also have limitations, with Deepseek emerging as a top local choice.
  • Exploring LM Studio Integration for Enhanced Workflows: Users inquired about integrating LM Studio AI models with programs like Excel, Word, and Visual Studio Code.
    • It was suggested to scroll up for the VS Code extension, while integration with MS Office remains uncertain; Open WebUI was suggested as a way to access from different devices.
  • Parameters into Prompt Templates: Members discussed the possibility of passing parameters into prompt templates within LM Studio for better control over model behavior, particularly for functions with mandatory parameters.
    • For now, users can pass /no_think in the system prompt to achieve no thinking.

LM Studio ▷ #hardware-discussion (37 messages🔥):

Mac Studio RAM, HP Z2 Mini Workstation, Strix Halo PC, Model Quality vs Speed, DDR5 Memory

  • Mac Studio RAM Pricing Ripped Off: Users lament that manufacturers charge too much for RAM upgrades on Macs; soldered RAM isn’t itself a problem if a higher configuration is purchased, but the pricing seems artificially high.
    • One user noted that prompt processing speed is a bottleneck on Macs, questioning the utility of large models needing considerable time to process even a few thousand tokens.
  • HP Z2 Mini: Workstation Sensation?: A member suggested considering alternatives like the HP Z2 Mini workstation upon release, anticipating many units entering the market.
    • The discussion highlighted the unit’s memory speed of 8,000 MT/s (megatransfers per second), i.e. 8 billion transfers per second, which works out to roughly 64 GB/s per 64-bit channel (8,000 MT/s × 8 bytes per transfer).
  • Strix Halo PC: Members discussed the GMKtec Evo X2 with Ryzen AI Max 395 Strix Halo Mini PC launching May 20, as a potentially cheaper alternative with 128GB RAM.
    • It was noted that the memory bandwidth of the Strix Halo PC is 240 GB/s, which is similar to some NVIDIA cards.
  • Quality Takes Time: Users debated the trade-off between model quality and processing speed with model sizes between 70B and 300B+.
    • One member suggested online Runpod testing to find the right model, with Mistral Large (123B) being highlighted as a model that “Can do 32k very well”.
  • DDR5 Memory Speed Details: One user clarified that the amount of data transferred depends on encoding when discussing memory speeds in megatransfers per second, and that MHz is not always an accurate measure.
    • Another added that with desktop DDR5, CL38 is slightly better than CL40 at the same 8000MT/s.

GPU MODE ▷ #general (3 messages):

CI environment modifications, Python packages in CI

  • CI Environment Modification Asked for Python Packages: A user asked whether the environment of the CI benchmarking procedure could be modified to allow the installation or import of Python packages.
    • This user indicated that the inability to install packages makes it difficult to write efficient kernels.
  • Query about specific Python Packages: Another user followed up asking what specific additional packages are needed.
    • No specific packages were identified; the question was left with the original poster.

GPU MODE ▷ #triton (17 messages🔥):

Triton compiler passes, atomic ops, non-deterministic results, floating point arithmetic

  • Compiler Pass Concerns: A user questioned the duplication of compiler passes in Triton, specifically noting that both NVIDIA and AMD backends override make_ttir and include similar passes, leading to redundancy.
    • The user asked whether factoring out a common set of backend-independent passes was avoided due to kernel performance being sensitive even at the TTIR level, thus requiring hardware-specific control early in the pipeline.
  • Atomic Addition Anomaly: A user encountered an issue with Triton where running the same input resulted in slightly different outputs across different runs with a custom matmul implementation on an A100 using fp16.
    • They suspected it was related to atomic operations but noted that the issue didn’t occur with other kernels using the same atomic addition; another user chimed in that if you use atomic_add then you can definitely get different results.
  • Floating Point Precision Problems: A user asked if the issue was specifically related to atomic addition with fp16, wondering if the order of mathematical sums should matter with relaxed atomic addition.
    • Another user stated that the order in which floating point results are added can indeed lead to different results, regardless of precision, illustrating with the classic example 1e-8 + 1e8 - 1e8.
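The classic example is easy to make concrete: in fp32 the small addend is simply absorbed by the large one, so the grouping of operations changes the answer (a quick numpy sketch):

```python
import numpy as np

a, b, c = np.float32(1e-8), np.float32(1e8), np.float32(-1e8)
print((a + b) + c)  # 0.0: 1e-8 is below the rounding granularity of 1e8 in fp32
print(a + (b + c))  # 1e-08: the small term survives when the large terms cancel first
```

Atomic adds commit in whatever order threads happen to win, which is exactly this kind of reassociation, hence the run-to-run differences.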

GPU MODE ▷ #cuda (29 messages🔥):

A6000 Ada, L40s, 4090, ECC Memory, Vast.ai Quality

  • A6000 Ada Faster Than L40s Out-of-Box: The A6000 Ada often outperforms the L40s in stock settings, while the 4090 is the fastest among the Ada cards, despite the L40s having more tensorcore power on paper.
    • The performance discrepancy may be due to ECC being enabled by default on the L40s and disabled on the A6000 Ada; cloud providers do not allow toggling this.
  • L40s Has Benefits Over 4090: The L40s offers benefits over the 4090, including ECC memory, passive cooling, different drivers, stability, and potentially better efficiency due to lower clocks.
  • A40 is the King of Token/$: When renting, the A40 is the top performer in tokens per dollar in the 48 GB category, while the A6000 Ada leads among the 48 GB Ada cards, based on stock settings from cloud providers.
  • 4090 Servers Overheat More: 4090 servers tend to have more overheating issues compared to L40s servers, possibly due to poor setup, and NVIDIA forbids 4090 use in datacenters.
    • A member notes that sites like Vast.ai having 4090s in their “verified datacenters” is suspicious.
  • Strange L40s Inefficiencies: Despite having more tensorcore power, the L40s doesn’t outperform the A6000 Ada, and the L40s performs similarly to the L40 for LLM inference, even though the L40 has half the flops.

GPU MODE ▷ #torch (1 messages):

CUDAGraphs, Warmup Stream, Graph Capture Isolation

  • CUDAGraphs’ Warmup Stream Needs Context: A member inquired why the PyTorch documentation for CUDAGraphs creates a new stream for the warmup phase.
    • They wondered why that extra stream is necessary for isolating graph capture, given that torch.cuda.graph already records the graph on its own new stream (see the sketch below).
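For reference, the warmup pattern from the PyTorch docs that prompted the question looks roughly like this (a sketch; model and x are placeholders):

```python
import torch

model = torch.nn.Linear(64, 64).cuda()
x = torch.randn(8, 64, device="cuda")

# Warm up on a side stream so lazy one-time setup (cuBLAS handles, memory
# pool growth, autograd state) happens outside of graph capture.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        y = model(x)
torch.cuda.current_stream().wait_stream(s)

# torch.cuda.graph then captures on its own non-default stream.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    y = model(x)

x.copy_(torch.randn(8, 64, device="cuda"))
g.replay()  # reruns the captured work against the static x/y buffers
```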

Devin, 32B model, GPU kernel, KernelBench

  • Devin Maker Drops Kevin: The company that made Devin just released a 32B parameter model named Kevin for GPU kernel development, detailed in this blog post and discussed in a random twitter post.
    • The model is reportedly quite effective in this niche.
  • Kevin Trained on Test Set: Members noticed that Kevin was trained on the test set, raising concerns about evaluation methodology.
    • The model was reportedly trained on 180 tasks, with a holdout set of 20 tasks.
  • KernelBench measures CUDA Skills: KernelBench, a dataset of 250 PyTorch-based classic deep learning tasks, is used to measure a model’s ability to replace PyTorch operators with optimized CUDA kernels.
    • The evaluation focuses on Level 1 (foundational tasks like matrix multiplication) and Level 2 (fused operators).

GPU MODE ▷ #beginner (4 messages):

Roofline Plot Generation, Nsight Compute for Roofline Analysis, Memory Allocation Strategies for Roofline Plots, Tensor Core Programming Pattern

  • Nsight Compute Generates Roofline Plots: A member recommended using Nsight Compute to generate roofline plots, noting that it uses theoretical limits as rooflines and displays the kernel’s performance in relation to them.
  • Experimenting with matmul sizes and memory: A member experimented with generating roofline plots by sweeping over different matmul sizes (black triangles) and repeated multiplications (colored circles) on the same allocated bytes of memory (see the arithmetic-intensity sketch at the end of this section).
  • HBM to Tensor Core: After watching some tutorials, a member asked if the pattern HBM -> shared -> register -> tensor core is a common and efficient pattern for programming a matmul.
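For intuition, the quantity a roofline plot puts on the x-axis is arithmetic intensity (FLOPs per byte moved); whether a kernel lands left or right of the ridge point decides memory- versus compute-bound. A small sketch (the device numbers are illustrative A100-like placeholders, not measurements):

```python
# Arithmetic intensity of a square matmul: 2*N^3 FLOPs over three N*N matrices.
def matmul_intensity(n: int, bytes_per_elem: int = 2) -> float:
    flops = 2 * n**3
    bytes_moved = 3 * n * n * bytes_per_elem  # read A and B, write C (ideal reuse)
    return flops / bytes_moved

peak_tflops, bandwidth_tbs = 312.0, 2.0  # illustrative fp16 peak and HBM bandwidth
ridge = peak_tflops / bandwidth_tbs      # FLOPs/byte where the roofline bends

for n in (256, 1024, 4096):
    ai = matmul_intensity(n)
    print(f"N={n}: {ai:.0f} FLOPs/byte -> {'compute' if ai > ridge else 'memory'}-bound")
```

This also explains why sweeping matmul sizes traces a path up the memory roof and onto the compute roof.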

GPU MODE ▷ #torchao (6 messages):

torchao scaled_mm op usage, quantized Phi-4 Mini Instruct models, INT8 dynamic activation & INT4 weight quant for ExecuTorch

  • torchao Uses _scaled_mm Op For Quantization: Members confirmed that torchao uses the _scaled_mm op (link to code) for both CPU and GPU, but for integer quantization, it uses a different implementation (link to code).
    • One member explained that the _scaled_mm op is for float8, while for integer quantization, a different kernel is used, citing the intmm.py file in the torchao repository, as well as the quantization API (see the sketch at the end of this section).
  • Quantized Phi-4 Mini Instruct Models Released: The PyTorch team released quantized Phi-4 Mini Instruct models on Hugging Face, quantized using TorchAO and optimized for deployment on vLLM and Executorch.
    • The release includes INT4 weight-only quant (vLLM-ready, with 67% peak memory reduction, 10-20% speedup on A100), FP8 dynamic activation & weight quant (vLLM-ready, 36% peak memory reduction, 15-20% speedup on H100), and INT8 dynamic activation & INT4 weight quant (for ExecuTorch).
  • ExecuTorch Gets INT8 and INT4 Quantization: Quantized models with INT8 dynamic activation and INT4 weight quant are now available for ExecuTorch, enabling execution on phones and mobile devices, with decoding performance reaching 17.3 tokens/sec on an iPhone 15 Pro using 3206 MB of memory.
    • The models and step-by-step recipes for quantization, serving, model quality, and performance evaluation are available, encouraging users to provide feedback via issues on relevant repositories.
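Tying the quantization API mentioned above to code: a minimal sketch, assuming a recent torchao release that exports quantize_ and int8_weight_only (the toy model is a placeholder):

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# int8 weight-only quantization routes through torchao's integer matmul
# kernels rather than the float8 _scaled_mm path discussed above.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda().to(torch.bfloat16)
quantize_(model, int8_weight_only())

out = model(torch.randn(4, 1024, device="cuda", dtype=torch.bfloat16))
print(out.shape)
```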

GPU MODE ▷ #off-topic (2 messages):

Cursor.com student offer, IDE for coding

  • Cursor Offers Student Program: A member shared a link to the Cursor student program for students.
    • Another member thanked them for sharing.
  • New IDE for Coding: Cursor is advertised as a new IDE built for pair-programming with AI.
    • It allows asking GPT-4 questions about code, generating code, and finding and fixing bugs.

GPU MODE ▷ #irl-meetup (1 messages):

random.oof: Anyone at the vllm meet up in nyc?


GPU MODE ▷ #rocm (2 messages):

install .whl file manually, python script import pip module

  • Install Pre-Built .whl Manually: A member inquired about the possibility of manually installing a pre-built .whl file.
    • Another member suggested importing the pip module in a Python script to install packages inline, allowing programmatic installation without separate command-line invocations.
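Note that importing pip’s internals directly is officially discouraged; the robust in-script route is to shell out to pip through the current interpreter. A minimal sketch (the wheel path is a placeholder):

```python
import subprocess
import sys

# Install a pre-built wheel from inside a Python script by invoking pip
# as a module of the interpreter that is currently running.
subprocess.check_call([sys.executable, "-m", "pip", "install", "/path/to/package.whl"])
```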

GPU MODE ▷ #webgpu (1 messages):

WGPU Sampling Rate

  • Guidance Needed: WGPU Sampling Rate Retrieval: A member is seeking assistance on how to retrieve the supported sampling rate via WGPU, and is still waiting for an answer.

GPU MODE ▷ #liger-kernel (3 messages):

Qwen 3, Liger-Kernel, Qwen 3 MoE

  • Qwen 3 MoE support coming to Liger-Kernel: A member inquired whether Liger-Kernel supports Qwen 3 MoE after noticing Qwen 3 support in version 0.5.9.
    • After receiving confirmation that it was not yet supported, the member stated they will send a pr for Qwen 3 MoE soon.

GPU MODE ▷ #self-promotion (1 messages):

ML efficiency, Linear Layer Optimization, Quantization, Low-bit Matmul Kernels

  • Cohere’s Chat demystifies Dem Models: On May 14th, Cohere Labs will host a talk about optimizing linear layers to make dem models more efficient, covering quantization, low-bit matmul kernels, and other related techniques.
    • The talk will focus on practical methods for improving ML efficiency, specifically targeting the computational bottlenecks in linear layers.

GPU MODE ▷ #🍿 (1 messages):

CognitionAI, Kevin 32B, Multi-Turn RL, CUDA Kernels

  • CognitionAI Drops Kevin 32B: CognitionAI just dropped Kevin 32B, a model that reportedly uses Multi-Turn RL to write CUDA kernels, as detailed on Notion.

GPU MODE ▷ #reasoning-gym (1 messages):

RL in LoRA, Base Model Quality

  • RL in LoRA Achieves Lift: Members discussed that it’s definitely possible to improve models with RL in LoRA, though you will likely hit a wall earlier than with full RL.
    • They pointed to other factors, such as base model quality, as mattering more for the potential of RL-based improvement than the LoRA-vs-full-RL choice.

GPU MODE ▷ #submissions (52 messages🔥):

amd-fp8-mm leaderboard, MI300 optimization, amd-mixture-of-experts leaderboard

  • MI300 Crushes AMD FP8-MM Leaderboard!: Multiple submissions were made to the amd-fp8-mm leaderboard using MI300, with timings ranging from 251 µs to 9.85 ms.
  • AMD Mixture-of-Experts Benchmarking Bonanza!: Several submissions were made to the amd-mixture-of-experts leaderboard, with timings including 7275 ms, 7281 ms, and 12259 ms on the MI300.
  • Sub-Millisecond Showdown on AMD FP8-MM!: Several users achieved sub-millisecond timings on the amd-fp8-mm leaderboard using MI300, with some reaching as low as 195 µs.

GPU MODE ▷ #status (2 messages):

popcorn-cli, github releases, timeout fix

  • Automated popcorn-cli Releases Deployed: Releases for the popcorn-cli are now automated to github releases.
    • A member asked for feedback, and particularly called out a timeout fix that was incorporated.

GPU MODE ▷ #hardware (3 messages):

DGX Spark, Blackwell ISA, New SASS Instructions, FP8 Operations

  • DGX Spark Speculation Sparks!: Members speculated about DGX Spark, suggesting it might be related to new hardware capabilities.
    • They point out that the GB10 documentation for CUDA 12.9 lists three new SASS instructions for the Blackwell ISA: QADD4, QFMA4, and QMUL4.
  • Blackwell ISA Brings 4xFP8 Instructions: The new instructions cover add, multiply, and fused multiply-add on 4xFP8 data, but “there are no corresponding PTX instructions yet AFAICT”.

GPU MODE ▷ #factorio-learning-env (24 messages🔥):

FLE Docker server connectivity issues, LangGraph agent integration with FLE, Factorio client version, Steam update, harvest_resource/server.lua is broken

  • Backtracking Breakthroughs: Lab Records Tumble!: New records were set in the lab using Mart’s backtracking framework on the electronic-circuits and automation-science tests, as seen in the electronic-circuits.mp4 video and the automation-science.mp4 video.
    • The harvest_resource/server.lua was identified as broken on main and reverting to an older version fixed the issue, but tests continued to fail because of the new import strategy.
  • Import Impasse: New Strategy Snafus Tests!: The new import strategy, involving absolute imports, caused tests using relative imports to fail, particularly when checking isinstance(entity, Prototype) across different files.
    • Additionally, the new import structure reportedly broke things throughout the codebase, as pydantic did not recognize classes as being the same when imported differently.
  • Factorio Frustration: Mismatched CRCs Cause Desyncs!: A user reported experiencing connectivity issues between their Factorio client and the FLE Docker server after a recent Steam update, resulting in immediate desynchronization errors due to mismatched CRC values.
    • Attempts to resolve this included verifying the Factorio client version (1.1.110), rebuilding the FLE Docker container, and ensuring the Space Age DLC was disabled, yet the issue persisted, pointing to a FLE-specific problem.
  • LangGraph Limbo: Documentation Deficit Hinders Integration!: A user expressed difficulty connecting their LangGraph agent to FLE, citing limited documentation and uncertainty about the expected communication interface.
    • Despite pointers to agent creation in the /agents directory and code for attaching new agents in /eval/open/independent_runs/run.py, the absence of a LangGraph agent example complicated the integration process.

GPU MODE ▷ #amd-competition (7 messages):

AMD Mixture-of-Experts Leaderboard, popcorn-cli timeout patch, aiter/test_moe

  • AMD Mixture-of-Experts Leaderboard Submission Works: A user reported that the leaderboard submission for the AMD mixture-of-experts model is working using the command /leaderboard submit benchmark, after uploading the Python file and selecting the GPU.
    • If the GPU is not selected, an error occurs; the leaderboard might also fail due to timeouts, because the submission runs both test and benchmark modes.
  • Timeout Patch for Popcorn-CLI: A user mentioned that a patch to extend the timeout window has been submitted and can be found here.
    • The user suggested updating the CLI with the latest popcorn-cli code and compiling the CLI binary, which they verified resolves the timeout issue for leaderboard submissions.
  • Accessing aiter/test_moe Directly: A user suggested accessing aiter/test_moe directly via this link.
    • This link may be helpful in resolving issues related to timeouts or submission errors.

GPU MODE ▷ #mojo (11 messages🔥):

Mojo GPU Kernel, PyTorch Mojo, Qualcomm GPU support, Modular GPU Kernel Hackathon

  • Mojo Focuses on GPU Kernels and High-Performance CPU Programming: Mojo is focusing its efforts on GPU kernel and high-performance CPU programming, making features like classes a lower priority, which may hinder building a complete AI framework in Mojo.
    • The team encourages the use of PyTorch and hopes to see awesome PyTorch+Mojo integration in the future.
  • Prefix Sum Incorrect Results with GPU Warp: A user encountered incorrect results with Mojo’s warp_sum and block_sum functions while trying simple GPU kernels and shared code and debug output in a Gist.
    • Further debugging revealed a potential issue with prefix_sum rather than shuffle_up, and the problem was tracked to a GitHub Pull Request.
  • Modular Compiler goes Open Source Next Year: A user inquired about the status of the Modular compiler and its open-sourcing.
    • It was confirmed that the plan is to open-source it next year.
  • Qualcomm GPU Support Explored: A user asked if there are plans for supporting Qualcomm GPUs like the 8650.
    • Another user suggested checking out the Modular Puzzles as a wonderful onboarding to GPU Programming.
  • Modular GPU Kernel Hackathon: There was an announcement for the Modular GPU Kernel Hackathon happening this Saturday at AGI House, with speakers like Dylan Patel and others, registration link is available at AGI House.

MCP (Glama) ▷ #general (145 messages🔥🔥):

Cursor MCP, A2A discussion, Debugging MCP Servers, Cloudflare Deployment issues

  • Cursor Struggles to Use MCP Resources: Members discussed issues with getting Cursor to use resources, noting that they do not show up in the UI, unlike prompts, and that Claude requires explicit user inclusion of resources, which may not align with the intended MCP design.
    • It was pointed out that resources are meant to be handled by the host application without user interaction, as stated in the MCP specification.
  • A2A Considered Cumbersome: There was a discussion on the value of Agent-to-Agent (A2A) communication, with one member finding it a little cumbersome but appreciating the core abstractions like task, agent, and artifact.
    • It was suggested that MCP can effectively support A2A and that A2A’s workflow for managing tasks could be implemented with relatively little code.
  • Debugging MCP Servers Proves Challenging: A member shared their difficulty in debugging an MCP server served over stdio transport, mentioning it took a week to resolve a small issue due to the need to disable console logs and the ineffectiveness of VSCode debugger breakpoints.
    • A suggestion was made to use a tee stream for debugging, and one user pointed to this guide for debugging an MCP server and tool calling using the MCP Inspector.
  • Cloudflare Deployments Fail to Connect: A member inquired whether anyone was experiencing connectivity issues with remote MCP Servers deployed on Cloudflare.
    • No specific solutions were provided in the context.
  • MCP server now open source: A member introduced their project, a vertex-ai-mcp-server that started as Vertex AI but has grown to include Gemini, grounding, and other tools.
    • The project is now open source.

MCP (Glama) ▷ #showcase (3 messages):

MCP Client, OpenLink AI Layer, Model Context Protocol

  • New CLI-Based MCP Client Emerges: A new lightweight, fast, and simple CLI-Based MCP Client for STDIO MCP Servers has been released, designed to bridge local LLMs running Ollama with MCP Servers (loom link).
    • It can be used with jadx mcp servers to perform AI-assisted reverse engineering of Android APKs using local LLMs, with the code available on GitHub.
  • OpenLink’s OPAL MCP Server Goes GA: The MCP Server for the OpenLink Software AI Layer (OPAL) is now generally available for both cloud-based and on-premise deployment.
    • This implementation of the Model Context Protocol (MCP) supports both client and server roles, and features operations like database queries, metadata exploration, LLM interaction, and AI agent integration, per the OpenLink Community Forum Post.

Yannick Kilcher ▷ #general (81 messages🔥🔥):

LLM output spam, AI-generated content, Em dashes in LLM output, AI article/patent writing agents, Nerf field with Gemini

  • Community Debates LLM Output Spam: Members discussed the proliferation of LLM-generated content in the channel, with concerns raised about the quality of engagement and the potential for spamming long, bullet-point-filled messages.
    • One user stated they really do think this is a matter of quality of the server and its interactions, with the server moving closer to Quora in discussion quality.
  • The Em Dash giveaway: Members noticed the frequent use of em dashes in LLM output, which often serves as a giveaway.
    • One member stated that Most people don’t know how to do it on their keyboard since there isn’t a discrete key for them.
  • Users Develop Dynamic AI Article/Patent Writing Agents: A user is developing agents that are dynamic and are a society of minds for writing patents and academic articles.
    • They reference their article mentioning DreamerV3 blurring the lines between static models and dynamic task-adaptive world modeling, but acknowledged that some don’t believe it looks futuristic.
  • Data Science Agent in Colab with Gemini: A user shared a link to a Google blog post about a data science agent in Colab with Gemini.
    • Another member said that Lately all I seem to be able to get out of gemini is facile blog-post-level stuff. It’s disappointing.
  • AI Model Bias/Corruption: A user said some companies may provide wrong or fake content to agents fetching content, potentially causing agents to become more biased.
    • The user said I think what’s happening is that some companies, (I think has started from Cloudflare) provide wrong/fake content to agents that’re trying to fetch their content. They compared it to Chinese websites serving zip bombs.

Yannick Kilcher ▷ #paper-discussion (1 messages):

Time off, Volunteers needed

  • Time off announced: A member announced they are taking time off, probably this week and next week, to catch up on some things.
    • They assured that they will be back.
  • Volunteers needed!: A member prompted anyone else who wants to present or organize to feel free!
    • No other context was given.

Yannick Kilcher ▷ #ml-news (26 messages🔥):

Winner-Takes-All Economics, Mistral Medium, Zed AI Code Editor, Cerebras Inference, Windows Compilation

  • Debate Over Winner-Takes-All Economics Ensues: Members debated the effects of the winner-takes-all economic system, and whether it is the system that the US has been building since its inauguration.
  • Zed AI Code Editor Enters Fray: Members discussed the new Zed AI code editor, noting it is open source.
    • One member planned to use it with the Mellum 4B Base model and expressed disappointment about the lack of a Windows binary.
  • Cerebras Inference Model Hosting Questioned: One member expressed frustration with the Cerebras website, finding it difficult to locate the list of models they host for inference.
    • They found more details on their Twitter page, criticizing modern web design for prioritizing visuals over details.
  • Zed AI Compilation on Windows Documented: One member successfully compiled Zed AI on Windows, following the instructions here and linking to a related issue.
    • Another member asked why Zed needed to be compiled, since no Windows binary is available, and another member pointed to this tutorial.
  • Zed Font Too Blurry: A member complained that the font in Zed was too blurry and noted that signing in with GitHub is required for tab completion.
    • They expressed disappointment at having to sign in with GitHub just to get tab completion while trying Mellum 4B on LM Studio.

Latent Space ▷ #ai-general-chat (80 messages🔥🔥):

Windsurf AI, Cursor vs Windsurf, OpenAI internal models, Gemini 2.5, Product Market Fit

  • Windsurf’s Product Vision Prevails: Members discussed that Windsurf seems to have a more coherent product vision than Cursor, potentially due to OpenAI’s acquisition and internal model development.
    • The concern is that OpenAI’s focus on productizing may lead to internally hidden coding models, creating a significant competitive advantage (moat) that others can’t access.
  • OpenAI Model Lead Time Still Matters: Despite the emergence of various models, some believe that OpenAI’s lead in model training gives them a significant edge, even if it’s just a month or two ahead of competitors.
    • The ability to consistently offer more capable models could secure market share, leading some to consider switching between tools like Windsurf, Claude, and Cursor based on the latest model’s performance.
  • The Gemini 2.5 Guardrails Cause Odd Responses: Users reported odd responses from the new Gemini 2.5 Pro preview, attributing it to increased guardrails and safety training, as seen in this reddit post.
    • One user noted that at least they aren’t relentlessly tuning it for personality and engagement but the endless leaderboard hacking and constantly shifting safety training is making these models very weird.
  • AI Augmentation Creates New Work: An article shared from Ars Technica asked whether the time saved by AI is offset by the new work it creates.
    • Some members agreed based on personal experience, noting that they now spend time on tasks that didn’t exist before AI, such as curating a prompt library, which is similar to the Red Queen’s race concept.
  • New Mistral Models Debuts: Members reported that Mistral is releasing a new model as well as a Mistral enterprise chat agent, per this tweet.
    • These new models are aimed at competing with the OpenAI gpt models in both performance and safety.

Latent Space ▷ #ai-announcements (2 messages):

New Claude code pod, AI Engineer conference

  • New Claude code pod is here!: A new Claude code pod is out now, check it out at Latent Space’s X post.
  • AI Engineer Conference early bird tix almost gone: The biggest AI Engineer conference of the year is happening this June and early bird tickets are expected to sell out this weekend, get them while they’re hot!

HuggingFace ▷ #general (38 messages🔥):

Hugging Face billing inquiries, LLM User info approach, Text to 3D diffusion on Mac, Dolphin model to be more human, Reinforcement Learning at scale for agents

  • Subscription Activation Snafu Requires Email Intervention: A user reported paying for a subscription but not receiving membership access, and was advised to email [email protected] for support.
    • After a day without a response, the user followed up asking about a faster way to contact support.
  • Billing Questions Prompt HF Email Referral: A user inquired about cost calculation for HF access token and meta-llama-Llama-3.3-70B-Instruct and was directed to [email protected].
  • LLM for User Info: RAG to the Rescue: A user is looking for the right approach to feed the LLM user details and old chat logs, so that the LLM can keep in mind who the user is.
  • CUDA-less Text-to-3D Diffusion? Mac Users Unite!: A user lamented trying to run a text-to-3D scene diffusion model on a Mac, which lacks CUDA.
  • SemEval and ISWC 2025 Challenges: There are two open community challenges: LLMs4Subjects at SemEval 2025 and KONVENS 2025, focused on bilingual subject tagging, as detailed here.
    • Additionally, LLMs4OL is collocated with ISWC 2025 and is a Semantic Web AI challenge focused on reconstructing well-known ontologies like the Gene Ontology using LLMs, with more info here.

HuggingFace ▷ #today-im-learning (9 messages🔥):

Cache-Augmented Generation, Distributed RLHF, Converting .tensorflow to .bin, Offline Model

  • CAG Paper recommended: A member recommended reading the Cache-Augmented Generation (CAG) paper, calling it very easy to read and light, and providing a link to the paper.
  • Distributed RLHF in progress: One member is creating a library for distributed RLHF and reading several DeepSeek papers before reaching GRPO used in R1 and R1-zero.
    • They also called out Neuralink using wandb for loss plots and inquired about a second tool to visualize the use of threads and processes.
  • How to convert .tensorflow file into .bin file: A member asked about converting a .tensorflow file into a .bin file to get an offline version of the model, and another member shared a script to convert the HDF5 file into a .bin model file.

HuggingFace ▷ #i-made-this (11 messages🔥):

RADLADS, Alpha-Root Dataset, CommonCrawl Data Extraction, Embedder Collections, ACE-STEP Music Generation

  • Recursal’s RADLADS emerges for attention distillation: The Recursal team introduces RADLADS (Rapid Attention Distillation to Linear Attention Decoders at Scale), a protocol for rapidly converting softmax attention transformers into linear attention decoder models, detailed in their ArXiv paper and HuggingFace model collection.
    • RADLADS requires less than 700M tokens and under $2,000 USD for training, while maintaining inference quality close to original transformer models; training code is available on GitHub.
  • Collection of Embedders Available: A member shared a link to a collection of embeddings, available here, potentially useful for RAG applications.
    • They also shared a link to the Alpha-Root Dataset, a competitive cybersecurity pretraining dataset.
  • Alpha-Root Dataset uses New CommonCrawl Paradigm: A new paradigm for extracting data from CommonCrawl is described in the Alpha-Root dataset, which mines domains directly on the common crawl web graph.
    • According to the creators, this approach matches the performance of PRIMUS-FineWeb while using ~10x less resources and data, extracting 3B tokens from FineWeb-Edu without needing a classifier.
  • ACE-STEP Model achieves SOTA music generation: The ACE-STEP model achieves state-of-the-art music generation performance as showcased in this YouTube video.

HuggingFace ▷ #computer-vision (9 messages🔥):

Flash Attention 2, FP16 and BF16 support, local file formats

  • Flash Attention 2 Installation Instructions Surface: Members discuss using pip install flash-attn to install and use Flash Attention 2, with one member providing a link to the pytorch blog.
  • FA2 now supports FP16 and BF16!: Members mention that Flash Attention 2 supports FP16 and BF16 (with BF16 requiring Ampere or newer GPUs), recommending installation from source via git clone https://github.com/Dao-AILab/flash-attention (see the usage sketch at the end of this section).
  • Local File Formats: Members inquired whether locally loaded models are in .bin or .safetensors format.
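As referenced above, once flash-attn is installed, opting a model into FA2 via transformers is one line (a sketch; the model id is a placeholder, and BF16 assumes an Ampere-or-newer GPU):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",                # placeholder model id
    torch_dtype=torch.bfloat16,               # BF16 needs Ampere or newer
    attn_implementation="flash_attention_2",  # requires flash-attn to be installed
).cuda()
```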

HuggingFace ▷ #NLP (2 messages):

DaoDeCode, Maximilian-Winter, github.com

  • DaoDeCode Language Model Summoned: A member introduced DaoDeCode, a language model framework incorporating Mechanism Points and Five Element Transformation Patterns, inspired by Daoist strategy.
    • The framework aims for minimal intervention and maximal transformation by identifying the perfect seam where space and time vanish, with source code available on GitHub.
  • Enthusiastic Response to DaoDeCode: Another member reacted with enthusiasm to the announcement of DaoDeCode.
    • The member’s short message, Brother what?, was in response to the initial detailed message.

HuggingFace ▷ #smol-course (1 messages):

Smolagents Transcriber, Speech-to-text pipeline, Whisper-Turbo

  • Smolagents Transcriber Tool Arrives: The Transcriber is a speech-to-text pipeline built on Whisper-Turbo that transcribes audio to text quickly and accurately, as outlined in the Smolagents documentation.

HuggingFace ▷ #agents-course (12 messages🔥):

404 Client Error, Running models locally, Including Image in AgentWorkflow, Gated Repo Access, RAG over a CSV

  • Bypassing 404 Client Error: Users encountered a 404 Client Error when trying to run a Jupyter notebook with a specific model.
    • A member suggested changing the client to client = InferenceClient(provider="hf-inference",model="meta-llama/Llama-3.3-70B-Instruct") to resolve the issue (expanded into a runnable sketch at the end of this section).
  • Guidance on Running Models Locally Awaited: A member requested guidance on running models locally, specifically with llama_index for building agents.
    • They were looking for advice on how to include images in the AgentWorkflow input but struggled to find relevant information.
  • Tackling Gated Repo Access Errors: A user faced issues accessing a gated repo despite setting the Hugging Face token in the space settings and code.
    • They configured their code to use the token like this: huggingface_token = os.environ.get("HUGGINGFACE_TOKEN") but still got an access error.
  • RAG-over-CSV Submissions Spark Debate: A member criticized that half of the submissions are just RAG over a CSV with the solution to all of the benchmark questions.
    • The member expressed interest in seeing real agent implementations with smolagent.
  • Testing Agent with Specific Questions Locally: A member shared a test_agent.py file to test agents on specific questions from the dataset locally.
    • They used it to atomically check the correctness of the agent on various specific tasks by commenting and uncommenting test cases.
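Returning to the 404 fix above, expanded into a runnable form (assuming a recent huggingface_hub release where InferenceClient accepts a provider argument, and a valid HF token in the environment):

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    model="meta-llama/Llama-3.3-70B-Instruct",
)
resp = client.chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```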

Nous Research AI ▷ #general (37 messages🔥):

Open Codex forks for Gemini, M4 Max MacBook Pro for LLMs, Dolphin Logger for Naughty Chats, Zed AI Code Editor, Gemini model tps

  • Open-Codex forks now support Gemini, Ollama and more: A member mentioned that there are at least two open-codex forks that aim to allow using other models, including this one that supports Gemini, Ollama and more.
    • Another member claimed the official OpenAI Codex also apparently supports other models now, but was censored by a bot.
  • M4 Max MacBook Pro runs up to 90B parameter models: A member reported running up to 90 billion parameter models in LM studio on an M4 Max MacBook Pro with 128GB of RAM and a 2 terabyte drive.
    • The MacBook Pro was bought Certified Refurbished directly off the Apple website.
  • Cognitive Computation seeks help for Dolphin naughty chats: Eric at Cognitive Computations is looking for help to democratize, make anonymous, and organically source naughty interactions to train the Dolphin model.
    • To do this, he is suggesting users install dolphin-logger, add your keys to .env, and run it, then point your agentic/MCP tools at it.
  • Zed launches fastest AI code editor: Zed launched a new AI code editor, with good local model support which means that it should integrate easily with other local models, including Hermes.
    • It seems like users can just do any Ollama model too, assuming it supports toolcalls and diff styles.
  • New Gemini model hits 500-1500 tps: A member reported achieving 500-1500 tps on the new Gemini model.
    • They found this insane, and thought that level of performance would be something only Cerebras and Groq could do.

Nous Research AI ▷ #ask-about-llms (5 messages):

DeepHermes-3-Llama-3 Sizes, 1b model size limitations

  • DeepHermes-3-Llama-3 size variants clarified: Members clarified that DeepHermes-3-Llama-3 comes in 3B, 8B, and 24B sizes.
    • A member inquired about NousResearch/DeepHermes-3-Llama-3-1B-Preview model, but it’s not an existing official release.
  • 1B model size declared “too small”: It was reported that there was an attempt to create a 1B model version.
    • Members concluded that the 1B parameter model size is just too small to be effective.

Nous Research AI ▷ #research-papers (3 messages):

Arxiv Paper, Learn Mandarin

  • Arxiv paper link shared: A member shared a link to an Arxiv paper - which may or may not be real.
  • User states desire to learn Mandarin: A member expressed the sentiment I gotta learn mandarin.

kotykd: https://cognition.ai/blog/kevin-32b


Eleuther ▷ #general (27 messages🔥):

Cursor free for students, Scale Maximalism, Advertising saturation point, SLURM memory allocation

  • Cursor Gives Students Free IDE: A user shared a link about free access to Cursor IDE for students.
    • Others stated that knowing that Cursor is free for students is something that’s genuinely useful to a meaningful % of this community.
  • Maximalists Believe Scale Will Solve All: A member asked for recommendations of papers or researchers who firmly believe in scale maximalism.
    • They are looking for proponents that think scale maximalism will solve all problems in AI.
  • Advertising Reaches Saturation: A member suggests that if a product is already sufficiently saturated, such as Gemini or ChatGPT, you can almost always “advertise.”
    • They felt that Cursor is a resource that doesn’t benefit the poster in particular, and that the final stage of advertising is when your customers function as your marketing.
  • SLURM User Requests 80MB Instead of GB: A user disclosed a fix for their issue: they were requesting 80MB of memory through SLURM instead of 80 gigabytes.
    • Another member responded, saying that it makes me glad we just run everything on bare-metal.

Eleuther ▷ #research (2 messages):

MTurk, Prolific, Human Evals

  • Prolific Favored Over MTurk for Human Evals: A member inquired about whether to use MTurk or Prolific for human evals.
    • Another member decisively recommended Prolific 80% of the time, implying its superiority in most scenarios.

Eleuther ▷ #lm-thunderdome (10 messages🔥):

lm-eval-harness implementation, HuggingFace vs vLLM, lm-eval-harness BOS token, lm-eval-harness sampling

  • Guidance on lm-eval-harness Implementation Surfaces: A user inquired about implementing a custom model in lm-eval-harness, to which another user provided a link to the documentation.
    • They suggested inheriting from the HFLM class and overloading the _model_call and _model_generate methods, pointing to the Mamba implementation as an example (a rough sketch follows at the end of this section).
  • HuggingFace Inference vs vLLM Performance is a hot topic: A user noted that while vLLM is faster for generation tasks, HuggingFace inference uses less power, yet HuggingFace inference can use full power for loglikelihood tasks.
    • Another user replied that this is expected, because vllm is optimised for fast generation.
  • lm-eval-harness BOS Token Discussion: A user reported that the tokenized prompt includes the BOS token when running loglikelihood tasks using base models and the LocalCompletionsAPI implementation.
    • The user inquired about specifying add_bos_token=False.
  • lm-eval-harness sampling discussed: A user asked if setting do_sample:true without specifying temperature would use the Hugging Face model’s generation_config settings.
    • The response was that you need temp > 0; otherwise it sets do_sample to false.
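As promised above, a rough sketch of the custom-model route (class and method signatures as in the harness’s HFLM; the bodies here are placeholders that just defer to the parent):

```python
from lm_eval.api.registry import register_model
from lm_eval.models.huggingface import HFLM


@register_model("my-custom-lm")
class MyCustomLM(HFLM):
    def _model_call(self, inps, attn_mask=None, labels=None):
        # placeholder: return logits of shape [batch, sequence, vocab]
        return super()._model_call(inps, attn_mask=attn_mask, labels=labels)

    def _model_generate(self, context, max_length, stop, **generation_kwargs):
        # placeholder: plug custom generation in here
        return super()._model_generate(context, max_length, stop, **generation_kwargs)
```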

DSPy ▷ #show-and-tell (1 messages):

Unsloth, Claude Sonnet Finetuning, Qwen3-4b comparison, GRPO

  • Unsloth Finetunes Claude Sonnet Data: A member used Unsloth to finetune their Claude Sonnet chat history data, which they downloaded.
    • They use Claude Sonnet to vibe code dspy a lot.
  • Lora vs Qwen3-4b: The member provided a side-by-side comparison of their finetuned Lora model versus Qwen3-4b in a zero-shot setting via a screenshot.
    • The Lora model correctly picked up def forward, whereas the non-Lora version skipped it, and the Claude data uses instead of .
  • GRPO finetune might work better: The member noted that their SFT finetune almost got LabeledFewShot correct but hallucinated.
    • They believe a GRPO finetune would pick it up easily.

DSPy ▷ #general (30 messages🔥):

Efficient Domain Knowledge Injection in DSPy, DSPy Signature Docstrings, ReAct Module Signature without direct output, Accessing full LLM history

  • Brainstorming Knowledge Injection Methods in DSPy: A member is seeking a more token-efficient way to inject domain knowledge (specifically ES cluster indices and mappings) into a DSPy program, rather than including it as an InputField in every prompt, aiming to provide it only once at the session start.
    • While system prompts might work, DSPy’s idiomatic approach is preferred; another user suggested including the entire Postgres schema in the system prompt for text2SQL tasks.
  • Docstrings as Instructions in DSPy Signatures: Members discussed using docstrings in dspy.Signature to provide essential instructions, treating them as specifications for collaborators regarding the function’s intended behavior.
    • The best practice is to explain what the task is, rather than how to achieve it, focusing on non-obvious details of the inputs and outputs and relying on training data when available, though the docstring can be abused in quick-and-dirty applications (see the sketch at the end of this section).
  • Docstrings automatically incorporated in System message: It was revealed that docstrings make their way as instructions for the system message in the default adapter.
    • The dspy.inspect_history() method was suggested to inspect the LLM’s interaction history, but a member pointed out that llm.inspect_history() truncates fields and throws serialization errors when attempting to dump to a file.
  • Designing ReAct Module Signature without Direct Output: A member inquired about creating a signature for a ReAct module that primarily outputs tool calls, questioning whether output fields can be left blank if the tool calls are the primary objective.
    • No direct answer was given.
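As a concrete form of the docstring-as-spec advice above, a minimal sketch (the task and field names are illustrative, and the dspy.LM model id is a placeholder):

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # placeholder model id

class TriageTicket(dspy.Signature):
    """Classify a customer support ticket by urgency."""  # what, not how

    ticket: str = dspy.InputField(desc="raw text of the ticket")
    urgency: str = dspy.OutputField(desc="one of: low, medium, high")

triage = dspy.Predict(TriageTicket)
pred = triage(ticket="My invoice is wrong and nobody has replied in two weeks.")
print(pred.urgency)
```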

DSPy ▷ #examples (3 messages):

GitHub Notebook Rendering Issues, Colab vs GitHub for Notebooks, Missing "State" Key Error

  • GitHub rendering botches Notebooks: A member suggested that GitHub’s notebook rendering might be too picky, causing a missing “state” key error.
    • The member suggested that the issue could stem from a prematurely copied notebook, lacking a step that creates the missing “state” key.
  • Colab steps in to fix Github: A user suggested that in the future, linking to the Colab version (https://colab.research.google.com/github/Columbia-NLP-Lab/PAPILLON/blob/main/papillon_tutorial.ipynb) instead of the GitHub version would be the quickest fix.
    • They suggested that Colab’s rendering is more forgiving of the potentially missing piece.

Modular (Mojo 🔥) ▷ #general (6 messages):

Modular Puzzles on Macbook, Fields in traits, Modular Hackathon

  • Solving Modular Puzzles Remains Elusive on Apple Silicon: Members discussed the ability to run Modular Puzzles on Macbooks, concluding it’s not directly possible on Apple Silicon GPUs but can be done remotely on a GPU-attached cloud instance.
    • Supported NVIDIA GPU architectures for Mojo GPU programming include Turing, Ampere, Ada Lovelace, Hopper, and Blackwell (RTX 20XX - 50XX series).
  • Trait Fields Tempt, Properties Please: Concerns were raised about fields in traits, suggesting that fields in traits could happen, but you would be denied the ability to add such a trait via an extension; you would need to include it in the original struct definition.
    • It was agreed that properties in traits is probably a strictly better idea than fields in traits, because it’s more general.
  • Modular Hackathon Happening!: A final reminder that there is a hackathon coming up, the Modular Hackathon at AGI House in Hillsborough on Saturday, with spots still available.
    • The speakers include folks from Modular and also Mark Saroufim (GPU MODE & PyTorch), Simon Boehm and Sasha Krassovsky (Anthropic), and Dylan Patel (SemiAnalysis).

Modular (Mojo 🔥) ▷ #mojo (28 messages🔥):

Public/Private Syntax, Enum Recommendations, Open-Source Contributions, Compile-Time Abort, Testing constrained

  • Mojo’s Pub/Private Syntax: A member inquired if Mojo has a pub syntax like Rust or defaults to private, and if there’s a roadmap to add this feature.
    • A member responded that Mojo currently follows the Python convention of “everything is public, prefix things that are supposed to be private with an underscore”
  • Mimicking Enums in Mojo: A member requested recommendations for mimicking enums with large sets of enumerable values, asking if a nested alias for each unique value is the only way right now.
  • Open-Source Compiler Contributions: A member inquired if open-source contributors can currently contribute to the Mojo compiler.
    • A member responded that there is not currently a way, but there’s a decent chance that sum types will be implemented before the compiler is open for contribution.
  • Compile-Time Abort Capability: A member asked if it’s possible to abort at compile-time to add compile-time guards, sharing a code snippet example.
    • Another member responded that they’re not sure if that’s possible, but the proposed requires-like syntax might fix this though.
  • Mojo Roadmap Revealed: The Mojo Roadmap has been revealed to the public in the Modular Forums.
    • Members reacted positively and discussed how to test constrained, or, once requires arrives, how to validate error messages, similar to “assert_raising” but at compile time.

LlamaIndex ▷ #blog (3 messages):

Deep Research Agent, LlamaExtract, Anthropic API support

  • LlamaIndex makes Deep Research Agents accessible: Learn to build your own Deep Research agent from scratch in LlamaIndex!
    • A recent workshop tutorial covers going from zero knowledge of LlamaIndex to a fully-fledged multi-agent system for deep research, starting with AgentWorkflow to create a single agent.
  • LlamaExtract Enhances AI Apps: The latest LlamaExtract features enhance your AI applications with citation capabilities and improved reasoning.
    • Now you can extract information from complex data sources with precise source attributions, provide reasoning for these extractions, and boost transparency (see LlamaIndex’s tweet).
  • Anthropic API and LlamaIndex join Forces: Anthropic’s API now supports a built-in web search tool and LlamaIndex offers day 0 support!

LlamaIndex ▷ #general (27 messages🔥):

Memgraph using Neo4j client, Multimodal LLMs with GPT-4o-mini, ChatGPT System Prompt Memory, Agentic RAG App Structure, Medical LLM Bot Building

  • Memgraph masquerades as Neo4j: A user testing Memgraph in WSL2 VS Code noticed it was calling Neo4j, prompting confirmation that the Memgraph integration is a wrapper built on the Neo4j client, with which Memgraph is wire-compatible - see the docs.
    • The user initially suspected issues with their Neo4j environment but confirmed it was the underlying implementation.
  • GPT-4o-mini goes Multimodal: A user inquired about passing documents directly to a multimodal LLM (gpt-4o-mini) for one-shot inference, bypassing OCR.
    • A member suggested parsing the document and appending it to the system prompt for the LLM to query on top of (sketched at the end of this section).
  • ChatGPT’s Lingering Memories: A user noted that even after turning off the memory function and deleting all threads, memories seemed to persist in the official ChatGPT using GPT-4o.
    • Another member found it odd and suggested contacting OpenAI support, speculating a delay in the feature taking effect.
  • New Agentic RAG App Structure Blues: A user found the new folder structure created by npx create-llama@latest for agentic RAG apps unintuitive compared to older versions, noting the absence of a full-fledged Next.js app in the .frontend folder.
    • A member noted the older structure was overwhelming for most and suggested using the --pro flag to get the older structure, also highlighting the continued availability of the FastAPI app via LlamaIndexServer.
  • Possible Questions Workflow Tool Wanted: A user wants to build a medical LLM bot that suggests next possible questions to the user based on the last answer, then asks the local LLM.
    • The user is looking for a suitable tool within LlamaIndex to implement this workflow.
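A minimal sketch of the parse-and-append suggestion above with the OpenAI SDK (the file read stands in for whatever parser you use):

```python
from openai import OpenAI

client = OpenAI()
document_text = open("contract.txt").read()  # placeholder for your parser's output

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": f"Answer using only this document:\n\n{document_text}"},
        {"role": "user", "content": "What is the termination clause?"},
    ],
)
print(resp.choices[0].message.content)
```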

tinygrad (George Hotz) ▷ #general (2 messages):

Mojo Kernels, Chris Lattner

  • Chris Lattner’s Mojo has Kernel Trove: Chris Lattner’s Mojo has a huge collection of “mojo kernels” located at modular/modular/tree.
    • It is unknown whether they are fast or how to run them, but they look interesting.

tinygrad (George Hotz) ▷ #learn-tinygrad (8 messages🔥):

tinygrad Color meanings, beam search cache location

  • tinygrad Colors Demystified: A user inquired about the meaning of colors in a tinygrad output screenshot.
  • Beam Search Cache Location Customization: A user asked how to override the beam search’s cache location, aiming to use tinygrad in a Lambda Labs instance with specific storage setup.
    • George Hotz responded that the cache location can be overridden using the CACHEDB environment variable, referencing line 175 in helpers.py (a usage sketch follows this list).
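
A small sketch of that override; the paths are illustrative, and note that tinygrad reads CACHEDB when it is imported, so set it first (or export it in the shell before launching):

```python
# Point tinygrad's compile/beam-search cache at persistent storage.
# CACHEDB is read at import time, so it must be set before importing
# tinygrad; the path below is illustrative.
import os

os.environ["CACHEDB"] = "/persistent/tinygrad/cache.db"
os.environ["BEAM"] = "2"  # enable beam search (an assumed typical setting)

from tinygrad import Tensor

# Any realized op now populates the cache at the custom location.
out = (Tensor.rand(64, 64) @ Tensor.rand(64, 64)).realize()
```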

Torchtune ▷ #dev (9 messages🔥):

Torchtune PR, Tokenizer arguments, Uniform tokenizer interface

  • Tokenizer arg renaming causes confusion: A Torchtune PR sparked discussion over renaming tokenizer arguments like add_end_tokens, with one member pointing out inconsistencies and potential confusion arising from the PR name versus the argument name.
    • The member stated that add_end_token was originally mixed up with add_end_tokens, and that the rename could make things more confusing if there’s no matching add_start_tokens.
  • Better uniformity through renaming?: A member suggested putting the context on the PR for the author.
    • He felt the rename, while not strictly necessary, brings better uniformity and will make future work toward a common tokenizer interface easier (a sketch of such an interface follows this list).
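
To make the uniformity point concrete, here is a hypothetical sketch of the kind of symmetric interface the rename points toward; the names are illustrative and not Torchtune’s actual API:

```python
# Hypothetical sketch of a uniform tokenizer interface; the argument
# names are illustrative and do not reflect Torchtune's actual API.
from typing import Protocol


class UniformTokenizer(Protocol):
    def encode(
        self,
        text: str,
        add_start_tokens: bool = True,  # symmetric with add_end_tokens
        add_end_tokens: bool = True,
    ) -> list[int]:
        """Encode text, optionally adding BOS/EOS special tokens."""
        ...
```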

Nomic.ai (GPT4All) ▷ #general (9 messages🔥):

GPT4All on AMD ROCm, GPT4All on iOS, GGUF token limits, Uncensored models

  • AMD ROCm support requested for GPT4All: A user inquired about updating the Windows version of GPT4All to support AMD ROCm.
    • No response was given.
  • GPT4All hoped for iOS classroom integration: A teacher asked if the GPT4All app would be usable on iOS devices such as iPads.
    • A user responded that LLMs require significant processing power, suggesting running a server at home and connecting to it instead; it was unclear whether GPT4All supports this, but other options might.
  • GGUF’s Optimal Temp Explored: A user asked if the GGUF file format contains its own max new token limit and optimum temperature settings.
    • The reply clarified that the max-new-tokens limit depends on available VRAM, and that the other settings are a starting point for experimentation; they are runtime parameters rather than values fixed by the GGUF file (see the sketch after this list).
  • Circumventing Censorship with Uncensored Models: A user reported that a model refused to answer an illegal question, despite the user seeing no stated restriction on such use.
    • Other users suggested using uncensored models and advised searching on Hugging Face.
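
A short sketch of those runtime knobs with the GPT4All Python bindings; the model file name is a placeholder:

```python
# Runtime generation settings with the GPT4All Python bindings:
# max_tokens and temp are set per call, not fixed by the GGUF file.
# The model file name below is a placeholder.
from gpt4all import GPT4All

model = GPT4All("some-model.Q4_0.gguf")  # placeholder GGUF file
with model.chat_session():
    reply = model.generate(
        "Explain GGUF in one sentence.",
        max_tokens=200,  # bounded in practice by available (V)RAM and context
        temp=0.7,        # a starting point to experiment from
    )
print(reply)
```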

LLM Agents (Berkeley MOOC) ▷ #hackathon-announcements (3 messages):

Auth0 Workshop, Lambda Workshop

  • Auth0 Workshop happening!: There was a reminder for the Auth0 Workshop on 5/7 at 10AM PT, which teaches how to secure AI agents with authentication solutions.
    • Auth0 is sponsoring up to $5,000 for 1st place, $3,000 for 2nd place, and $2,000 for 3rd place for teams that successfully integrate Auth0.ai into their projects.
  • Lambda Sponsors Prizes for AgentX: There was an announcement of the AgentX Workshop with Lambda on 5/15 10am PT, designed for scaling agentic AI projects using Lambda’s Inference API.
    • There are special prizes for AgentX Competition participants: Up to $1,000 in credits for 1st place, $500 for 2nd, and $300 for 3rd in both Entrepreneurship and Research tracks. Register at lu.ma/AgentX-lambda.

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (3 messages):

Hugging Face, Email Notification Issues

  • Hugging Face Awaits Incoming Info: Information was sent to Hugging Face yesterday and the team is waiting to hear back from them.
  • Tracking Credit Notification: A user reported not receiving any email to track credits, making it difficult to monitor without daily website visits.

LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (1 messages):

LLMs, Statistical Pattern Recognition, Conditional Statements in LLMs, Neural Attention

  • LLMs Execute Conditionals via Statistical Patterns: LLMs execute conditional statements through statistical pattern recognition, not formal logic, learning from millions of examples in natural language.
    • The model learns to link patterns like “If X, then Y”, representing these relationships in its parameters.
  • LLMs Generate Rules from Learned Patterns: LLMs don’t “remember” rules like computer programs but generate them from learned patterns, allowing complex conditional reasoning on any subject matter without explicit programming.
    • They employ neural attention to weight all parts of a prompt and predict the text that should follow, approximating logical reasoning statistically (see the formula below).
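
For reference, the “weighting” described above is standard scaled dot-product attention:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are the query, key, and value projections of the token representations and $d_k$ is the key dimension; the softmax weights determine how strongly each prompt token influences the next-token prediction.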

Cohere ▷ #💬-general (4 messages):

AWS x Cohere Workshop, Coral Status

  • AWS x Cohere Workshop to be available online?: A user inquired whether the in-person AWS x Cohere workshop would be recorded and made available online, as they were unable to attend in person due to being located in Malaysia.
    • The user expressed interest in learning from the industry experts presenting at the event.
  • Coral Back Up After Maintenance: A user asked if Coral was shut down.
    • Another user responded that it was a brief redirect for maintenance, but it is now back up and accessible at coral.cohere.com.

Cohere ▷ #🤝-introductions (1 messages):

xvarunx: Welcome everyone! 🥳 🎉 Thanks for joining!


Codeium (Windsurf) ▷ #announcements (2 messages):

Windsurf 1.8.2 Fixes, Windsurf Regional Channels, Cascade Customization, File-Based Rules, Simultaneous Cascades

  • Windsurf 1.8.2 Squashes Bugs: The Windsurf 1.8.2 patch fixes tool call errors for users with disabled telemetry, as well as crashes related to workspace conversations.
    • The update also includes server updates to add regional channels.
  • Windsurf Expands Geographically: Windsurf has added regional channels to help connect Windsurfers across the globe, including the SF Bay Area, San Diego, Taipei City, Boston, Miami, NYC, Tokyo, Austin, and Toronto.
    • Users can join these channels by answering the onboarding question in the customize section.
  • Cascade Gets Customizable in Windsurf Wave 8 Day 2: Windsurf Wave 8 Day 2 introduces customization tools for Cascade, including custom workflows as .md files, an enhanced rules system, simultaneous cascades, a Cascade Plugins Panel, and enhanced MCP Integration.
    • These features allow users to customize Cascade to their patterns and preferences to maximize productivity and can be seen in the launch video.
  • Windsurf Adds File-Based Rules: Windsurf enhances its rules system with multiple activation modes (Manual, Always On, Model Decision, Glob), with rules stored in .windsurf/rules/.
    • Each rule lives in its own file and can be activated via any of the four modes.
  • Multi Cascade Power Arrives: Windsurf introduces Simultaneous Cascades which allow you to start new Cascade conversations while existing ones are running.
    • No more waiting!