AI News for 3/3/2025-3/4/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (221 channels, and 4084 messages) for you. Estimated reading time saved (at 200wpm): 481 minutes. You can now tag @smol_ai for AINews discussions!

Their brief blogpost here. It's not technical news, but it's still only every other week that a frontier lab raises money, and more money for Claude is only good news for AI Engineers.

Meanwhile, GPT 4.5 rated #1 across the board on LMArena. For posterity, here is where the current rankings lie under style control. Claude has a ways to go yet to reclaim frontier status.

{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}

AI Twitter Recap

Model Performance & Benchmarks, Comparisons and Evaluations

GPT-4.5 Performance Leadership: @lmarena_ai announced that GPT-4.5 has topped the Arena leaderboard, achieving #1 rank across all categories, including Multi-Turn and Style Control, based on over 3k votes. @lmarena_ai further detailed that GPT-4.5 leads in Multi-Turn, Hard Prompts, Coding, Math, Creative Writing, Instruction Following, and Longer Query categories. @lmarena_ai highlighted GPT-4.5's strength in Style Control, leading the leaderboard in this specific area. @lmarena_ai provided a link to explore full GPT 4.5 results.
DeepSeek R1 Joint #1 with GPT 4.5: @teortaxesTex noted that DeepSeek R1 is ranked joint #1 with GPT 4.5 on hard prompts with style control, congratulating the OpenAI team.
GPT-4.5 vs Claude 3.7 Coding Capabilities: @casper_hansen_ questioned if GPT 4.5 is actually better than Claude Sonnet 3.7 in coding.
GPT-4.5 vs Claude 3.7 for Workflow: @omarsar0 described a new coding workflow using GPT-4.5 for brainstorming, Claude 3.7 Sonnet for building, and Windsurf for agentic tasks.
GPT-4.5 Benchmark Skepticism: @aidan_mclau asked @DaveShapi if 4.5 is overfit to benchmarks, or if other models are. @willdepue expressed surprise at GPT-4.5 topping categories without test-time compute, suggesting pretraining is still important. @vikhyatk is retracting positive comments about GPT-4.5, not wanting to be seen as a "low-taste tester".
Claude Sonnet 3.7 Performance: @Teknium1 described Sonnet 3.7 in Cursor as "busted" and questioned its proper chat mode usage. @reach_vb mentioned Claude Sonnet 3.7 and DeepSeek as favorite LLMs, using Cursor and DeepSeek chat.
LMSYS Leaderboard Importance: @aidan_clark stated that LMSYS is clearly the most important benchmark and advised labs to prioritize it for maximizing user value.
Benchmark Relevance Questioned: @cto_junior argued that beating benchmarks is not relevant now, and gaining users is more important.

Industry News, Funding, and Partnerships

Anthropic's $3.5B Funding Round: @AnthropicAI announced a $3.5 billion funding round at a $61.5 billion valuation, led by Lightspeed Venture Partners, to advance AI development and international expansion.
Perplexity AI and Deutsche Telekom Partnership: @perplexity_ai announced a partnership with Deutsche Telekom to make Perplexity Assistant a native feature on their new AI Phone, further highlighted by @AravSrinivas and @yusuf_i_mehdi who sees AI-first browsers as the future with Edge pushing this forward with Copilot integration.
Microsoft Dragon Copilot Launch: @mustafasuleyman highlighted the Microsoft Dragon Copilot launch, aiming to reduce administrative overload in healthcare and refocus doctors on patients.
DeepSeek AI on Copilot+ PCs: @yusuf_i_mehdi mentioned DeepSeek R1's 7B and 14B distilled models are now available on Snapdragon-powered Copilot+ PCs, emphasizing hybrid AI.
Firefly Aerospace Moon Landing: @kevinweil congratulated @Firefly_Space on being the first commercial company to successfully land a vehicle on the moon.

Tools, Frameworks, and Coding Workflows

LlamaParse Updates with Claude 3.7 and Gemini 2.0 Support: @llama_index announced updates to LlamaParse, adding support for AnthropicAI Claude Sonnet 3.7 and Google Gemini 2.0 Flash in "Parse With Agent" mode for better table parsing and cross-page consistency, and in "Parse With LVM" mode for parsing screenshots.
LlamaIndex Workflow-Based Travel Planner Tutorial: @llama_index shared a tutorial and repo by RS Rohan on building an agentic travel planner using LlamaIndex, demonstrating structured predict feature with Pydantic models, API integrations (Google Flights, Hotels, Top Sites), and event-driven architecture.
LlamaExtract for Resume Extraction: @llama_index introduced LlamaExtract, powered by SOTA LLMs like 3.7 Sonnet and o3-mini, for extracting standardized candidate information from resumes, and generalizable to other data types.
SynaLinks, Keras-inspired Framework for LLM Applications: @fchollet and @fchollet introduced SynaLinks, a Keras-inspired framework for building LLM applications as DAGs of trainable components, enabling sophisticated pipelines and RL fine-tuning.
Groovy, Python-to-JavaScript Engine: @_akhaliq highlighted Groovy, a Python-to-JavaScript engine that transpiles Python functions for client-side execution, with @algo_diver noting its potential to make Gradio production-ready.
Outlines for Structured Generation with MLX-LM: @awnihannun shared how to use Outlines by @dottxtai with mlx-lm for local structured generation, with documentation provided @awnihannun.
LangSmith for Observability and Evals Tooling: @hwchase17 pointed out that LangSmith is used to transform user feedback into evals, emphasizing observability as evals tooling.
Cursor Coding Workflow: @omarsar0 mentioned using Cursor in a new coding workflow. @jeremyphoward noted creating complex apps in a day using tools like Cursor with Python, fasthtml and MonsterUI.
Gibberlink for Encrypted AI Agent Communication: @ggerganov, @ggerganov, and @ggerganov introduced Gibberlink, demonstrating encrypted audio chat between two AI agents and provided a GitHub project link.

Research and Papers

Brain-to-Text Decoding Research: @AIatMeta highlighted a research paper from Meta FAIR and BCBL researchers on Brain-to-Text Decoding, a non-invasive approach via typing.
Diffusion Models and Flow Matching Course: @omarsar0 and @TheTuringPost shared a free MIT course on Introduction to Flow Matching and Diffusion Models, covering theory, training, and applications, including course notes, slides, YouTube videos and labs, with @omarsar0 providing another link.
Reasoning LLMs Deep Dive: @omarsar0 recommended a "Deep Dive into Reasoning LLMs", summarizing progress in post-training.
SoS1 Paper on Reasoning LLMs as Sum-of-Square Solvers: @_akhaliq shared a paper titled "SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers".
HAIC Paper on Improving Human Action Understanding: @_akhaliq posted about the "HAIC" paper, focusing on improving human action understanding and generation using better captions for multi-modal LLMs.
Sim-to-Real Reinforcement Learning for Humanoid Manipulation: @arankomatsuzaki and @arankomatsuzaki highlighted Nvidia's presentation on Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids, achieving robust generalization without human demonstration, and provided project and abstract links.
Chains-of-thought and Inference Bottleneck: @francoisfleuret discussed how chains-of-thought make inference compute-bound and suggested distilling large models into faster SSMs or hybrids for better trade-offs.
LLMs as Evolution Strategies: @SakanaAILabs listed several works discussed in an interview, including "(1) Large Language Models As Evolution Strategies".
TileLang for Kernel Programming: @teortaxesTex mentioned TileLang, a user-friendly AI programming language lowering the barrier to kernel programming.
Evaluation of LLM Belief Structures: @teortaxesTex shared an insightful evaluation of LLM belief structures.
LangProBe for Evaluating AI Systems: @lateinteraction introduced LangProBe from @ShangyinT et al., questioning what complete AI systems should be built and evaluated.

AI in Business & Applications

Inventory Tracking and Demand for Tokens: @gallabytes suggested that the demand for trillions of tokens per day will come from areas like improving inventory tracking in various sectors of the economy.
AI for Shader Golf: @torchcompiled shouted out to folks working on shader golf.
AI Powered Wiki Explorer App: @omarsar0 and @omarsar0 developed a wikiexplorer app using AI, utilizing Wikipedia and OpenAI models for hints, designed to be a fun way to learn new topics.
AI Research Agent for Literature Reviews: @TheTuringPost promoted Deep Review by SciSpace, an AI research agent for systematic literature reviews, claiming it saves hours of work and is significantly more relevant than OpenAI's Deep Research and Google Scholar.
AI in Android Day-to-day Life: @Google highlighted AI on Android at #MWC25, demonstrating features like Circle to Search for translating menus and Gemini Live for learning complex topics.
AI Co-scientist Example with AlphaFold: @_philschmid gave an example of extending a GoogleAI co-scientist with GoogleDeepMind AlphaFold for protein modification assessment.
AI in Web Development with Groovy and Gradio: @algo_diver believes Groovy will make Gradio production-ready for full-stack web development.

Memes and Humor

Karpathy's AirPods Pro Saga: @karpathy shared a humorous, multi-line tweet in the style of 4chan greentext about AirPods Pro malfunctions.
Elon Musk and Grok Realism: @Teknium1 posted "Grok is much more open to realism" with a link, implying Grok's unfiltered nature, and @Teknium1 replied "Better" to a Grok image comparison.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Atom of Thoughts Enhancing Smaller Models

New Atom of Thoughts looks promising for helping smaller models reason (Score: 641, Comments: 90): Atom of Thoughts (AOT) algorithm significantly enhances smaller models' reasoning, achieving 80.6% F1 on HotpotQA with GPT-4o-mini, surpassing other models. AOT's process includes decomposing questions into a Directed Acyclic Graph (DAG), simplifying through subquestion contraction, and iterating to reach atomic questions, as illustrated in the accompanying flowchart.
- Critiques on Methodology and Results: Users questioned the reliability of the Atom of Thoughts (AOT) results, citing potential issues with the sample size of 1k tasks, unspecified confidence intervals, and tests conducted at temperature 1, which could lead to high result volatility. Concerns were raised about the randomness of results, suggesting that the reported improvements might not be statistically significant without repeated testing.
- Discussion on Rule-Based Methods: There was a debate on the relevance of rule-based methods in AI, with some users arguing that while rule-based approaches are not scalable, they can still be relevant in specific contexts. The concept of the "bitter lesson" was mentioned, indicating that computation often trumps encoding knowledge, but it doesn't rule out the utility of logical rulesets.
- Practical Implementation and Resources: A link to the open-source repository of the AOT algorithm was shared, allowing users to explore and implement the algorithm themselves (GitHub link). Additionally, the original paper is available on arXiv, providing further details on the algorithm's development and performance.

Theme 2. Klee Open-Sourced for Local LLM Use with Zero Data Collection

I open-sourced Klee today, a desktop app designed to run LLMs locally with ZERO data collection. It also includes built-in RAG knowledge base and note-taking capabilities. (Score: 397, Comments: 67): The Klee desktop app is now open-sourced, designed for running LLMs locally without any data collection, and includes a RAG knowledge base and note-taking features. The app interface offers model options like "deepseek-r1-7b" and emphasizes privacy with a "Local Mode" toggle, ensuring no data is sent to the cloud.
- Users discuss the backend compatibility of Klee, questioning if it forces the use of Ollama or if alternatives like llama.cpp can be used. There is also curiosity about how Klee compares to other platforms like LM Studio and OpenWebUI, with some pointing out that Klee is essentially a wrapper over Ollama.
- Data privacy is a focal point, with inquiries about the "ZERO data collection" claim and whether using Ollama + Open WebUI involves data collection. It's noted that both platforms run stats for bug collection, which can be disabled, aligning with Klee's emphasis on local data security.
- The user interface and features are debated, with some users put off by the Slack-inspired UI, while others appreciate the simplicity for non-technical users. Questions are raised about the potential for an Android port, the ability to run models from Hugging Face, and the customization of the RAG knowledge base.

Theme 3. Split Brain 'DeepSeek-R1-Distill-Qwen' and 'Llama' Fusion Architecture

Split brain "DeepSeek-R1-Distill-Qwen-1.5B" and "meta-llama/Llama-3.2-1B" (Score: 139, Comments: 30): The Split Brain project explores a novel dual-decoder architecture that combines two distinct language models, DeepSeek-R1-Distill-Qwen-1.5B and meta-llama/Llama-3.2-1B, to enable simultaneous processing and cross-attention fusion. This system allows for collaborative reasoning and specialized processing by maintaining separate models on different GPUs, utilizing an EnhancedFusionLayer for cross-attention, and employing a sophisticated gating mechanism for adaptive information flow. The architecture enhances computational efficiency and task flexibility, allowing for both collaborative and specialized operations while maintaining parameter efficiency by only training the fusion components.
- Cross-Attention Fusion: The Split Brain project uses bidirectional cross-attention fusion where both models generate outputs simultaneously, attending to each other's hidden representations rather than final token outputs. This real-time interaction at the hidden representation level allows for mutual influence on the models' 'thinking processes' without direct token feedback.
- Model Vocabulary Challenges: A key challenge identified is managing different vocabularies between the models, which requires a sophisticated mechanism to ensure seamless interaction and processing.
- Potential for Personalization: There is interest in using a split-brain approach for personalized AI models by combining a small, personality-reflective model with a larger, powerful model. This could surpass current prompt-based agents by allowing one model to direct and correct the other, enhancing personalization through collaborative processing.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

TO BE COMPLETED

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. IDE Wars: Cursor Stumbles, Windsurf Surfs On, and Plugin Pains Persist

Cursor IDE Plunges into Bug Abyss: Cursor IDE users are battling instability, connection failures, and checkpoint malfunctions, prompting engineers to eye Windsurf and Trae AI as life rafts. The latest release is described as incredibly unstable, with MCP server configurations adding to the chaos, especially on Windows and remote Ubuntu setups, leading to client creation failures and users seeking help on forums.
Windsurf's Ubuntu Update Capsizes Systems Then Self-Corrects: A recent Windsurf update for Ubuntu 24.04 backfired spectacularly, bricking systems with a FATAL:setuid_sandbox_host.cc(158) error, forcing reinstalls and data loss for some, but a subsequent patch and workaround involving chrome-sandbox permissions offered a lifeline. Users on Windows ARM64 however are celebrating as Windsurf Next now supports their platform, available for download here.
JetBrains Plugin Gets Requesting Hang-up: Codeium's JetBrains plugin is frustrating users by getting stuck in a perpetual Processing request state, particularly in the latest pre-release, rendering it useless for generating code and forcing downgrades to older, more stable versions to keep workflows afloat. The issue with the JetBrains plugin contrasts with the Windows ARM64 support in Windsurf Next, showcasing uneven feature reliability across different IDE integrations.

Theme 2. Claude 3.7: Speed Bumps and Credit Crunch, But Still Impresses

Claude 3.7 Chokes on Cursor, Runs in Slow Motion: Claude 3.7 is causing headaches in Cursor IDE, with users reporting it's insanely slow and prone to halting mid-request, pushing many to downgrade or use Cursor's 'Ask' mode, highlighting concerns about the model's current stability. Despite instability in Cursor, users in the OpenAI Discord declared ChatGPT is a joke now and found Claude 3.7 to be very impressive, particularly noting Claude's superior context understanding in larger files, boasting a 200K token window, eclipsing ChatGPT's 128K.
Claude 3.7 Devours Windsurf Credits Like Pac-Man: Claude 3.7 in Windsurf is guzzling premium flow action credits at an alarming rate, with reports of 30-40 tool calls per prompt for minor edits, leading to rapid credit depletion and user ire, some users are switching back to 3.5 or considering a move to Cursor to escape the credit drain. Users are urging Codeium to demote Claude 3.7 from its default model status due to its voracious credit consumption.
Claude Code Gets Anon-Kode Remix, Goes Open API: A developer, known as anon-kode, has released a modified, OpenAI-compatible version of Claude Code, dubbed anon-kode, after extracting the original source code (original tweet), making it compatible with OpenAI APIs (tweet) and available on GitHub, offering a potential open-source alternative, albeit with lots of things to fix.

Theme 3. AI Models: New Releases, Performance Quirks, and Ethical Quandaries

GPT-4.5 Claims Arena Throne, Image Recognition Debated: GPT-4.5 has ascended to the top of the Arena leaderboard, dominating across categories from coding to creative writing (source), but its image recognition capabilities are under scrutiny, with mixed reviews and debates on whether it surpasses GPT-4o, even though initial tests show a marginal +5% improvement on the MMMU benchmark. Despite leaderboard victories, some users feel OpenAI is deprioritizing Plus users in favor of Pro users, suggesting a shift in premium status perceptions.
Grok's Custom Instructions Fail to Troll, Prompting Persona Panic: Grok AI's much-anticipated custom instructions feature, now live for all users, is facing criticism as being useless, with users reporting failures to mold Grok into desired personas, including one attempt to create an "abusive and lewd troll" that backfired, leaving users questioning the feature's efficacy. Despite custom instruction flops, Grok is praised for its debugging prowess, outshining models like O3 mini high sonnet in this area, although some users find O3 mini high sonnet superior in code creation tasks.
Phi-3 Model Fine-Tuning Faces A100 Hurdles, Dataset Viewer Needs Error Fixes: Fine-tuning Phi-3 for multi-modality is proving to be a Herculean task, requiring an estimated 6+ A100s and approximately 2 weeks, even with Colab Pro, while Hugging Face's Dataset Viewer is plagued by errors impacting compatibility with various libraries and SQL, hindering data discoverability and usability. Despite these challenges, Hugging Face is celebrating latency reductions of up to 10x on Remote VAE Decode endpoints for SD v1, SD XL and Flux, thanks to code-name honey, empowering local AI builders with Hybrid Inference.

Theme 4. Hardware Hustles: Tilelang Triumphs, AMD's Ascent, and SRAM Secrets

Tilelang Kernel Smokes Triton, Nears Flash-MLA Speed: A lean 80-line tilelang kernel is boasting 95% performance of deepseek flashmla on H100, achieving a 500% speedup over Triton, showcasing tilelang's potential for high-performance computing, with code available on GitHub. This performance leap is stirring calls for an MLA leaderboard to showcase similar achievements, possibly repurposed from the bitnet group.
AMD GPUs Inch Closer to ML Spotlight, Intel Arc A770 Joins Tinygrad Party: Discussions are heating up about AMD and Intel becoming viable alternatives to CUDA in ML pipelines, with some believing increased AMD market share could spur greater investment in their GPU computing department, while Intel Arc A770 GPUs are confirmed to be compatible with tinygrad using the OpenCL backend, broadening hardware options for developers. Despite AMD's progress, questions remain about their foundry time acquisition, with concerns Nvidia still holds a significant advantage in chip manufacturing access.
SRAM's Cache Conspiracy Unveiled: Deep dives into SRAM architecture reveal that registers, shared memory, and cache are all SRAM constructs, with unallocated shared memory morphing into L1 cache, while Triton's cache_modifier in tl.load allows specifying L1 or L2 hits, but lacks direct cache level control, exposing the nuanced layers of memory management in GPU programming. For CUDA compilation, torch.cuda.get_device_capability() in PyTorch is suggested for determining --arch=, though nvidia-smi --query-gpu=name,compute_cap --format=csv offers a PyTorch-free alternative.

Theme 5. Agent Innovations and Frustrations: Travel Planning AI, Smol Agent Quiz Fails, and MCP Multi-Agent Visions

Travel App Agents Spring Up to Rescue Reel-Ravaged Travelers: A new app, ThatSpot, emerges to combat travel reel overload, deploying AI agents to automatically extract crucial trip-planning data—locations, prices, booking links—directly from travel reels, automating hours of manual research and streamlining trip organization for wanderlust-stricken users. The app promises to process travel reels and extract every mentioned place, automating the tedious manual research process.
Smol Agents Quiz Stumps Students, Error Logs Hold Clues: The Smol Agents Quiz is causing headaches, with users reporting unclear requirements and failing scores despite multiple attempts, prompting calls to mine error logs from the quiz's app.py file to pinpoint necessary tool and model providers, highlighting the need for clearer quiz instructions and better error feedback in AI learning platforms. Despite quiz woes, HuggingFace has launched a new NLP Reasoning Course unit, aiming to educate on reinforcement learning in LLMs and contribution to Open R1.
MCP Multi-Agent Architectures Materialize, Fast Agent Framework Floats: Engineers are exploring MCP for multi-agent systems, drawing inspiration from Anthropic workshops and envisioning frameworks for agents collaborating across devices, with one member sharing their fast-agent GitHub project for Defining, Prompting and Testing MCP enabled Agents and Workflows, allowing agents to be configured with distinct MCP servers and called as tools by other agents. However, MCP Terraform Registry setup is proving troublesome, particularly with Claude desktop and Cline, facing mcp-server-fetch errors when system-level proxies are active.

PART 1: High level Discord summaries

Cursor IDE Discord

Cursor Plagued by Instability: Users report instability, connection failures, and non-functional checkpoints in the latest Cursor IDE release.
- Members consider alternatives like Windsurf and Trae AI due to the poor user experience.
MCP Servers Cause Configuration Nightmares: Members struggle to configure MCP servers in Cursor, especially with Windows and remote Ubuntu workspaces, facing issues like client creation failures.
- One member eventually solved their issues with Pupeteer, as well as using Firecrawl MCP server for web scraping with LLM clients.
Claude 3.7 Faces Glitches: Users experience issues with Claude 3.7 such as being insanely slow and stopping mid-request without errors.
- As a result, many resort to using Cursor's 'Ask' mode or reverting to older versions for critical tasks.
Designers Dive into Landing Pages: Members share landing page designs generated with Cursor and discuss their aesthetic appeal and effectiveness.
- The community compares designs to those of Linear, Framer, Magician Design, and Webflow for inspiration.
Repo Prompt Hailed for Multi-File Editing: Users show excitement about Repo Prompt, praising its multi-file edit capabilities and code snippet integration.
- The community also mentions BrowserTools for debugging, and PasteMax, an open source poor man's version of Repo Prompt, for file selection.

Codeium (Windsurf) Discord

Windsurf Adds Windows ARM64 Support: Windsurf Next now supports Windows ARM64, available for download here.
- This expansion allows users on Windows ARM64 platforms to leverage the latest features and improvements in Windsurf Next.
Windsurf's Ubuntu Update Crashes Systems: A recent Windsurf update caused issues on Ubuntu 24.04, leading to the application failing to start with a FATAL:setuid_sandbox_host.cc(158) error.
- One user reported a system crash, reinstallation, and data loss, highlighting the need for backups before updating, and a manual workaround involving changing permissions for chrome-sandbox may be required.
Claude 3.7 Burns Credits, Sparks User Ire: Users report Claude 3.7 in Windsurf is rapidly depleting premium flow action credits due to excessive tool calls per prompt, with some experiencing 30-40 tool calls for minor changes.
- Members suggest Codeium hide Claude 3.7 as a default model, with some switching back to 3.5 or other models for better efficiency, and are considering switching to Cursor.
Codeium Customer Support Faces Scrutiny: Users are reporting poor customer support experiences from Codeium, with one user awaiting resolution of a subscription issue for four weeks.
- The lack of timely and effective support is driving users to seek alternative solutions and has raised concerns about Codeium's responsiveness.
JetBrains Plugin Plagued by Processing Request Hang: Users of the JetBrains plugin are encountering a persistent Processing request state, leading to errors, particularly in the latest pre-release version.
- This issue renders the plugin unable to generate responses, disrupting workflow and necessitating a downgrade to a more stable version.

OpenAI Discord

OpenAI Hosts Sora Onboarding: The Sora team hosted a live onboarding session covering Sora fundamentals and optimal prompting techniques on <t:1741024800:R>, and you can join the discussion via this discord link.
- The Sora 101 session also shared insights from the onboarding process for early access artists.
GPT-4.5 Image Recognition Gets Mixed Reviews: Members are debating whether the new GPT-4.5 has better image recognition compared to GPT-4o, with Future Machine being more vocal about OpenAI (OAI)'s choices.
- Initial tests show that GPT-4.5 scores a bit higher than 4o on the MMMU (vision oriented reasoning benchmark) with a +5% improvement.
Custom Grok is a Flop: Grok AI's custom instructions feature has been released for all users, but members report the custom instruction is useless.
- One member shared custom Grok instructions aiming for an abusive and lewd troll persona, but reported it doesn't work, and other users reported the same.
Claude 3.7 Impresses, But Projects Flounder: One user declared ChatGPT is a joke now and found Claude 3.7 to be very impressive, while Claude can better understand the context in larger files with 200K context window.
- However, another user said Claude's projects are of no use, complaining that they can hardly upload only two files maximum and it says memory full, calling Claude over hyped.
Dall-E Delivers Synthetic Biology: A member prompted Dall-E to generate an image of synthetic plants that grow hearts and livers for transplant, visible within a transparent membrane and nourished by the GM plant.
- The initial results emphasized hearts over livers, prompting the user to refine the prompt with more details about the liver lobes.

Unsloth AI (Daniel Han) Discord

Llama Model Zips WAVs Hilariously: A member amusingly reported that compressing a 192 KB ZIP file with a llama model resulted in a 48 KB lossless WAV format.
- The user found this confusion since the model then attempted to re-zip the WAV to make it smaller, specifically mentioning the r1-1776-distill-llama-70b model.
GRPO Training: More Steps Needed for Reasoning?: Users discussed the necessary training steps for LoRA training Qwen2.5-14B-instruct with GRPO, emphasizing lowering the loss for better reasoning.
- Suggestions included allocating around 24 hours, or 700-1200 steps, underscoring that convergence is model-dependent, as described in Unsloth's Documentation.
GCC Compiler Causes VLLM Pain: A user encountered a RuntimeError related to the GCC compiler while running the GRPO tutorial locally with meta-Llama-3.1-8B-Instruct.
- Despite attempting to install GCC via conda, the issue persisted, and the user is restricted from using apt-get due to security reasons on their school's HPC.
String Replacement: Coding Strategy Success?: Members debated the effectiveness of string replacement for code editing with one member deeming it generally garbage.
- Another member, however, reported success fine-tuning Qwen 2.5 for string replacement, especially when the model has access to the entire file before making replacements.
Claude 3.5 Sonnet Sweeps Bench with SOTA: Anthropic's Claude 3.5 Sonnet achieves 49% on SWE-bench Verified, surpassing the previous state-of-the-art model's 45%.
- Members made a reference to the bitter lesson: general methods that leverage computation are ultimately the most effective.

Perplexity AI Discord

Perplexity Web UI Rewrite Feature Malfunctions: Users report that the Perplexity web UI's rewrite functionality is broken, always defaulting to pplx_pro regardless of selected model.
- Some experienced prompt duplication, tagging <@883069224598257716> for support, indicating significant issues with the rewrite tool.
Claude 3 Model Confusion Persists: Users are unsure if Perplexity model indicators accurately reflect the model in use, questioning whether they're receiving Claude 3.7 Sonnet or Claude 3 Opus when selecting Claude.
- Some noticed that Pro Search overrides the selected model with Sonar, creating disparities between chosen and employed models.
Perplexity API Troubles Obsidian Web Clipper: The Perplexity API's partial incompatibility with OpenAI standards causes issues for tools like Obsidian Web Clipper.
- The API's requirement for an assistant message between user messages, absent in OpenAI, hinders Obsidian Web Clipper's ability to post consecutive user messages.
Deepseek generates Controversial Propaganda?: A user shared an image allegedly generated by Deepseek which the community regarded as politically biased propaganda.
- Another member dismissed the image, asserting You are fake deepseek. Real deepseek doesn't talk on western affairs.

HuggingFace Discord

Phi-3 Fine-Tuning Faces Hurdles: One member is fine-tuning Phi-3 for multi-modality using an A100-equipped Colab Pro, but was cautioned such fine-tuning would take 6+ A100s and run for approximately 2 weeks.
- Another member added that [QLora and Peft make anything possible with a positive attitude and a credible project].
Dataset Viewer experiences Errors: A user suggested fixing Dataset Viewer errors for compatibility with various libraries and SQL, to improve data discoverability.
- Another user thanked them in advance, and jokingly requested an additional 1.2M rows of a HQ dataset.
Hugging Face reduces latency with new VAE: Hugging Face deployed code-name honey on Remote VAE Decode endpoints for SD v1, SD XL and Flux, reducing latency up to 10x which empowers local AI builders with Hybrid Inference.
- Hybrid Inference is free, fully compatible with Diffusers, and developer-friendly with simple requests and fast responses, and VAE Encode is coming soon.
Smol Agents Quiz Sparks Frustration: A member expressed frustration with the Smol Agents Quiz, citing unclear requirements and receiving a score of 0.0 out of 5 despite multiple attempts, referencing the quiz's app.py file.
- The member pointed to the need to mine error logs to understand the exact providers required for tools and models.
Lambda Go Labs: AI learning and building: Lambda Go Labs is a community focused on AI learning, building, and research.
- The community offers hands-on experience, opportunities to share work, and a supportive network for both experienced professionals and newcomers.

aider (Paul Gauthier) Discord

Aider Leaderboard Tooling Showdown: The Aider leaderboard now benchmarks AI models, alongside tools like Claude Code, assessing them as primary coding assistants.
- A user advocated for a tool-agnostic benchmark akin to SWE Benchlets to facilitate broader comparisons of coding tools and models.
Anon-Kode Remixes Claude Code: A modified version of Claude Code, dubbed anon-kode, was released by the same developer who extracted the source code (link to original tweet), now compatible with OpenAI APIs (link to tweet) and available on GitHub.
- Lots of things to fix, but you can use anything that supports OpenAI-style API. If you're brave, give it a try.
Gemini 2.0 Pro Hits Context Wall?: A user reported RESOURCE_EXHAUSTED errors with the gemini/gemini-2.0-pro-exp-02-05 model in Aider when using a large context window.
- In contrast, the gemini-2.0-flash-thinking-exp-01-21 model functions smoothly; the user inquired about maximizing context window usage with the Pro model.
Aider Gets Git Diff Wish: A user requested Aider to directly edit files using git diff syntax (e.g., <<<<<< branch, ======, >>>>>>> replace) within the files themselves.
- Currently, Aider displays diffs in the terminal, but the user seeks in-file editing for pre-acceptance modifications; other users pointed out a fork would be necessary, or use an external diff tool.
Grok's Debugging Edge: Members noted that while Grok excels at debugging, O3 mini high sonnet may outperform it in code creation tasks, such as adding new functions.
- They observed Claude 3.7 sometimes introduces unintended elements, while deepseek-chat with O1 Pro has proven highly reliable as an editor, approaching 95% accuracy.

GPU MODE Discord

Vision Models Still Favor Attention: Despite alternatives like MLP-Mixer existing, attention-based ViTs remain the SOTA choice for vision models.
- The relative underutilization of MLP-Mixer, detailed in MLP-Mixer: An all-MLP Architecture for Vision, was questioned by a member.
SRAM's Cache Quirks Revealed: Registers, shared memory, and cache are chip/software level properties constructed from SRAM, with unallocated shared memory becoming L1 cache.
- While direct cache level control (L1/L2) is absent in Triton, cache_modifier in tl.load specifies L1 or L2 hits, where cg targets L2 exclusively.
CUDA Architecture Query gets Torch Answer: For determining the --arch= for CUDA compilation, torch.cuda.get_device_capability() from PyTorch was suggested, and the alternative solution nvidia-smi --query-gpu=name,compute_cap --format=csv was found.
- The second option avoids needing a PyTorch dependency, and the CUDA Runtime API can programmatically select the best device based on specified criteria as shown in the docs.
Tilelang kernels flash faster than Flash-MLA: A member boasted that 80 lines of tilelang kernel code yields 95% performance of deepseek flashmla, a 500% speedup over Triton on H100, with a link to the GitHub repo.
- Another member expressed the desire to have an MLA leaderboard, perhaps repurposed from the bitnet group.
FA3 needs Absmax for Quantization: While FA3 is now working, it exhibits significantly higher quantization error than basic absmax quantization, suggesting a need for strategic adjustments.
- It was proposed to apply absmax quantization after the Hada transform, especially for 'v', mitigating out-of-distribution issues stemming from large activations.

OpenRouter (Alex Atallah) Discord

Travel App Springs Up to Save Travel Reels: An app emerged to solve the problem of endless saving of travel reels and hours of manual research, using AI agents to automatically extract data such as locations, price ranges, reservation requirements, booking links, and operating hours directly from travel reels at https://thatspot.app/.
- The app streamlines trip planning by leveraging AI agents to process travel reels, automatically extracting every place mentioned, automating the manual research process.
Google Flash 2.0 Flashes a 502 Error: A user reported a 502 error when inferencing with Google's Flash 2.0 and Flash 2.0 Light models, with the error message "Provider returned error".
- The error indicates an internal issue encountered by Google.
OpenRouter's Sonnet singing with Rate Limits: A user asked about the rate limits for Claude 3.7 Sonnet in terms of RPM (Requests Per Minute) and TPM (Tokens Per Minute).
- A member clarified that OpenRouter doesn't impose specific rate limits per user, pointing to Anthropic's rate limits documentation and BYOK settings (OpenRouter Integration Settings).
OpenRouter API Key throws VS Studio for a Loop: A user faced a 401 Authentication Failure using an OpenRouter API key in VS Studio via RooCode, despite having sufficient funds.
- Suggestions included verifying the API key, selecting OpenRouter as the API provider in RooCode, and ensuring the correct base URL, referencing this tutorial.
BYOK Azure Models Yearning for OpenRouter: A user inquired about using BYOK (Bring Your Own Key) with Azure models in OpenRouter, seeking a unified API for finetuned models.
- A member clarified that only models listed in the /models endpoint are supported, excluding BYOK models, suggesting the use of an OpenAI API Key in Integration settings instead.

LM Studio Discord

LM Studio Launches SDKs for Python and TypeScript: LM Studio released software developer kits for Python (lmstudio-python) and TypeScript (lmstudio-js) under the MIT license to allow developers to tap into LM Studio's AI capabilities from their own code.
- The SDKs support LLMs, embedding models, and agentic flows, featuring the .act() API for autonomous task execution using provided tools, as documented on their respective pages (lmstudio-python) and (lmstudio-js).
LM Studio "Unsupported Device" Error Plague Users: After an LM Studio update, users reported encountering Failed to load model errors with the message Unsupported device, advising to try adjusting GPU offloading or thread pool size.
- The error might be tied to context length impacting memory usage; the left number is the number of tokens the model is using in the chat history already while the right number is the context limit.
Diffusion Model Architecture Unsupported by Llama.cpp: Users reported errors loading diffusion models, receiving error loading model architecture: unknown model architecture: 'sd3', it was clarified that llama.cpp does not support image/video/audio generation models.
- Support for vision models in llama.cpp is uncertain, with concerns about the lack of Llama 3.2 vision or Pixtral vision support, however, some believe that UI-TARS fixes will help.
Pseudollama Patches OLLAMA Gap: Members discussed if LM Studio endpoints were compatible with apps that take an OLLAMA endpoint, and it was answered that it is not supposed to work by default, but Pseudollama can bridge the gap.
- The author noted that this is 100% vibe coded, so there are likely dumb issues throughout, but it works.
AMD needs to compete in the GPU Space: Members discussed whether AMD or Intel could become viable for ML pipelines and frameworks to compete with CUDA.
- Some members believe if AMD increases their market share, they would be more interested in investing in their GPU computing department, and that the real question is whether AMD can buy the time from a chip foundry, because Nvidia has the upper hand.

Nous Research AI Discord

Nous API Pricing Discussed: Members discussed Nous potentially launching an API for their models to generate income, with speculative pricing around $0.8/M tokens, potentially yielding $800-1600/day.
- Suggestions included pricing closer to $1/M input tokens and $3/M output tokens for specialized models, with ongoing efforts underway to realize this.
LLMs Fail at CUDA Kernel Generation: Members concurred that while LLMs can produce valid CUDA syntax, they struggle to independently generate high-performance CUDA kernels.
- The optimal approach involves integrating hardware and compute graph data with the LLM, potentially via a knowledge graph or GNN, complemented by intensive GPU profiling.
Logic-RL Boosts Reasoning with Rule-Based RL: The Logic-RL paper explores the potential of rule-based reinforcement learning (RL) in large reasoning models, taking inspiration from DeepSeek-R1.
- The 7B model, trained on only 5K logic problems, displayed generalization on challenging math benchmarks like AIME and AMC.
Runway Unveils General World Models: Runway introduced General World Models, aiming to create AI systems capable of building internal representations of environments to simulate future events.
- Their goal is to represent and simulate a broad spectrum of situations and interactions, surpassing confined settings like video games or driving simulations.
Qwen2.5-Math-1.5B Model's Longcot Struggles: A user found that the Qwen2.5-Math-1.5B model has difficulties with longcot examples, needing help with configuring the dataset structure and the GRPOTrainer.
- They linked their Kaggle notebook requesting guidance on solving these issues.

Interconnects (Nathan Lambert) Discord

Unitree Unleashes Open Source Trove: Unitree Robotics has open-sourced multiple repositories, offering access via their GitHub.
- This move opens up possibilities for collaborative development and innovation in robotics.
GPT-4.5 Ascends Arena Throne: GPT-4.5 has seized the top spot on the Arena leaderboard across all categories, including Multi-Turn, Hard Prompts, Coding, Math, Creative Writing, Instruction Following, and Longer Query (source).
- The latest ratings cement GPT-4.5 as state of the art for the moment.
Anthropic's Astronomical Ascent Continues: Anthropic secured $3.5 billion in funding at a staggering $61.5 billion post-money valuation, with Lightspeed Venture Partners leading the charge (source).
- The funding aims to advance their AI systems' development, deepen understanding of their functionality, and propel international growth.
Grok3 Pricing Structure Surfaces?: Potentially leaked Grok3 pricing details suggest costs of $3.50/million for input, $0.875/million for cached input, and $10.50/million for output, as reported in this tweet.
- The leaked pricing model offers insights into the potential costs for leveraging Grok3 in various applications.
Human Data Still Vital for Real-World AI?: A blog post (https://www.amplifypartners.com/blog-posts/annotation-for-ai-doesnt-scale) contends that human data remains essential for building truly useful AI products.
- This perspective challenges the notion that synthetic data alone can drive substantial advancements in model performance.

Yannick Kilcher Discord

Claude Cracks Coding Challenge: A member reported using Claude and Cursor to complete 95% of the work on this GitHub Pull Request involving granular configuration options.
- The member was working on the object-property-newline rule by adding support for granular configuration options, allowing developers to specify different behaviors for different node types.
Tackling Tricky Time Slots: A member initially considered presenting on Joscha Bach, but it's unclear if this was the final topic.
- Another member offered to present in the <t:1741046400:F> timeslot if there were no other presentations scheduled, and offered further advice to those interested.
Elsagate Erupts Again: A member shared a YouTube video titled "Elsagate 3.0 Is Worse Than we Thought" with a warning that it is NOT FOR CHILDREN.
- Another member responded, stating, "Well, that is horrifying."

Notebook LM Discord

Financial Statements enter NotebookLM: A member inquired about loading financial statements for analysis into NotebookLM to automate financial analysis.
- This suggests interest in using NotebookLM for professional tasks.
Podcast Length Debate and Timeline Demands Aired: Concerns were expressed about podcast length and coverage of important topics, referencing a Supreme Court Application found here.
- A member requested timelines be added to the podcast free version, while another member shared an example notebooklm podcast.
Dynamic Docs, a Feature that is MIA: Members are curious if NotebookLM can dynamically update from sources like Google Docs, for use cases like tracking furniture dimensions.
- Because the feature is not automatic it has lead to discussions about workarounds and feature requests.
Notebook Sharing Snafu Defused!: A user reported a server error when sharing notebooks with Gmail personal accounts, specifically "You are not allowed to access this notebook".
- The issue was resolved when the user found the recipient had a new phone not correctly configured with their gmail account.

Stability.ai (Stable Diffusion) Discord

Face Copy Alternatives Emerge: Members debated the best ways to copy faces, with some preferring reference only in ControlNet while others recommended Reactor Faceswap as a preferable alternative to IP-Adapter.
- The community consensus seems to favor ControlNet for its versatility.
Reforge's AMDGPU Support Remains Murky: A user reported conflicting information regarding Reforge supporting AMDGPU, as it's mentioned on the Stability Matrix but not on the GitHub page.
- Another user's attempt to use Zluda resulted in PC freezes, leading to skepticism about the accuracy of the Stability Matrix and a recommendation to use a UI outside matrix.
DirectML and Reforge Don't Mix: A member's attempt to use Reforge with DirectML after Zluda failed proved unsuccessful.
- There was a discussion about a potential fork of Reforge for AMD by Lshqtiger.
CivitAI Offers Free Image Generation: Members discussed CivitAI as a platform for image generation requests, noting it provides a few starting credits and 25 free daily credits that can be saved.
- The cost to use the platform depends on the model selected.
Local Image Generation Requirements Detailed: One member asked about the requirements for creating images locally; another responded that a GPU with around 6-8GB VRAM is recommended, along with other resources in <#1002602742667280404>.
- Another member shared links to CivitAI for online generation as an alternative.

Eleuther Discord

ReasonableLLAMA-Jr-3b Seeks Feedback: A member requested feedback on their ReasonableLLAMA-Jr-3b model, a reasoning model trained with GRPO on LLAMA 3.2 3B, based on concepts from the Atom of Thoughts (AoT) paper.
- The model uses a custom-written GRPO-based Agent in a Gym environment using MLX, where each state transition in the reasoning process is a self-contained, atomic question, as described in Atom of Thoughts for Markov LLM Test-Time Scaling.
Recurrent LLM Reasoning: Exorbitant?: Members debated whether recurrent LLM reasoning, which requires compute equivalent to a 32B parameter model to match the performance of a 7B model, is practical.
- The key question posed was: why not train a 32B parameter model instead and use early exit, mixture of depths, or speculative decoding for cheaper inference?
Troubleshooting 'trust_remote_code' in Harness: A user questioned if trust_remote_code is unconditionally set in lm-evaluation-harness, pointing to a specific line in the GitHub repo.
- A member clarified that trust_remote_code is set only if the --trust_remote_code argument is provided, referencing the relevant section of the code.
Unveiling Dataset Kwargs Pathway: A user inquired whether setting trust_remote_code would override dataset_kwargs when loading a local dataset.
- A member clarified that dataset_kwargs are passed to datasets.load_dataset(...) within the harness, linking to the relevant part of the code.
User Reports Dataset Generation Error: A user reported encountering a dataset generation error while running lm_eval with a configuration specifying dataset_path: json and a data_dir containing train.jsonl, validation.jsonl, and test.jsonl.
- In response, a member advised manually testing the dataset loading with load_dataset and trying an absolute path for the data directory.

MCP (Glama) Discord

MCP Terraform Registry Faces Issues: Users reported issues getting terraform-registry-mcp and aws-mcp server to function, especially with Claude desktop and Cline when a system-level proxy is enabled, causing mcp-server-fetch errors.
- The issue seems related to proxy settings interfering with the server's ability to fetch necessary resources.
Multi-Agent MCP Architectures Arise: A member explored implementing MCP for multi-agent systems, referencing an Anthropic workshop at AI Engineering Summit and shared an image from the workshop.
- They are building a framework for agents to cooperate across devices and considering adopting MCP, with inspiration from examples like BabyAGI and Stanford generative agents.
Fast Agent Framework Floats into Focus: A member shared their project, fast-agent on GitHub, for Defining, Prompting and Testing MCP enabled Agents and Workflows.
- The framework allows each agent to be configured with a separate set of MCP servers and can be called as a tool by other agents.
Node Version Nightmares Nag Claude Users: Users reported encountering a Cannot find package 'timers' error when using fastmcp in Claude desktop.
- The problem was traced back to an outdated Node v14 version that Claude was utilizing.
MCPHub.nvim Navigates Neovim: A new MCPHub.nvim plugin was released, which assists in managing MCP servers within Neovim, and offers features like smart server lifecycle management and integration with CodeCompanion.nvim for AI chat.
- The plugin, installable with a single command (:MCPHub), provides a streamlined setup process for MCP server management.

DSPy Discord

Ash Framework Ecosystem Gets Love: A member suggested the Ash framework for a project, pointing to the ash-project/ash_ai GitHub repository.
- They highlighted instructor_ex, which provides structured outputs for LLMs in Elixir, and directed users to the Ash Discord community for guidance.
Async Support Initiative Ignites DSPy: A member inquired about the motivations and anticipated performance boosts of full async support in DSPy, and linked to another Discord invite link.
- A core contributor announced intentions to make async support native, and requested feature requests via GitHub issues to prevent Discord oversight.
LangProBe Benchmarks Program Composition: A new paper, LangProBe: a Language Programs Benchmark, evaluates the impact of DSPy program composition and optimizers on different tasks, while exploring cost/quality tradeoffs.
- As noted in its X/Twitter post, the paper shows that smaller models in optimized programs can outstrip larger models at a lower cost.
Minions Prep for Cost Dominance: A member indicated that the just-released LangProBe paper provides a good baseline for benchmarking their implemented minions feature, referencing their closed pull request.
- The member added MinionsLM and StructuredMinionsLM for intelligent LM routing, and emphasized the direct relevance of the paper to cost optimization.

LlamaIndex Discord

AgentWorkflow Context: A member asked about the distinction between Context and Chat History within AgentWorkflow.
- Another member responded that the chat history is inside the context.
LlamaIndex Integrates MCP Support: A user asked about MCP support in LlamaIndex, and another confirmed it exists with an example notebook.
- The notebook demonstrates how to use MCP with LlamaIndex.
LlamaParse's Latest Models Parse with Agent: The 'Parse With Agent' mode now supports AnthropicAI Claude Sonnet 3.7 and Google Gemini 2.0 Flash, enhancing table parsing and cross-page consistency (announcement).
- These updates should improve the accuracy and reliability of parsing complex documents.
Need PII? Ask LlamaIndex!: A member is seeking both paid and open-source options for redacting Personally Identifiable Information (PII) from PDFs and images before sending them to an LLM.
- This request highlights the growing need for robust PII redaction tools in LLM applications.
Windsurf not riding high due to Checkpoint Absence: A member noted the absence of a checkpoint feature in Windsurf, noting the inability to revert to previous states despite repeated coding attempts and file/workspace manipulations.
- The member attached an image illustrating their attempts to drag and drop files into the tab menu, seeking a way to access previous checkpoints.

Latent Space Discord

AI Not Quite Replacing Programmers Yet: An O'Reilly article suggests AI tools are evolving programming, similar to historical changes since the early days of physical circuit programming.
- Members agreed, noting that LLMs accelerate learning and that this is similar to past complaints about copying from StackOverflow.
Senior Engineers Rule AI Outputs: Senior engineers effectively guide AI's output with their expertise, preventing unmaintainable code when using tools like Cursor or Copilot.
- While AI speeds up implementation, senior engineers ensure code maintainability, a skill often lacking in junior engineers.
Anthropic Scores Massive Funding Round: Anthropic secured $3.5 billion in funding, valuing the company at $61.5 billion post-money, with Lightspeed Venture Partners leading the round.
- This investment will support the advancement of AI systems, enhance understanding of their functionality, and support global expansion.
Stagehand Tooling Sought for Pythonistas: Following a Latent Space podcast episode on Browserbase, a member sought a self-healing browser workflow tool akin to Stagehand in Python.
- Another member suggested stagehand-py, noting that “it's wip”.
Cursor Beats Claude Code in Code Cagefight: Members compared Claude Code against Cursor, with Cursor being favored for its rollback capabilities.
- Feedback indicated that Claude Code struggles with focus, adds unnecessary code, is more expensive, and lacks the speed of Cursor for code edits.

tinygrad (George Hotz) Discord

Tinygrad Aims for Fair Compute Marketplace: George Hotz (@tinygrad) describes tinygrad as a formalist project to capture Software 2.0 in a non-leaky abstraction, aiming for a fair marketplace for compute, similar to Linux and LLVM.
- Hotz anticipates tinygrad's speed on NVIDIA to match the existing torch CUDA backend by year-end, sans CUDA, and envisions a test cloud for renting FLOPS in a lambda function.
Ops.CAT Speed Bounty Faces LLVM Rewriting Issues: A member reported ongoing challenges with the Ops.CAT speed bounty, specifically struggling to get it to rewrite in LLVM, despite being scheduled.
- The current Ops.CAT operations feature a complex structure of PAD, RESHAPE, and BUFFER operations, with arg representing the two tensors to concatenate.
RDNA2/RX6000 Usability Inquiries with tinygrad: A user asked about RDNA2/RX6000/GFX1030 usability with tinygrad, reporting an OSError: [Errno 22] Invalid argument when running AMD=1.
- Another member said that it should work on Linux, requesting the trace for the OS error which was provided in a trace.txt file.
Intel Arc A770 Plays Nice with OpenCL: A member confirmed that Intel Arc A770 is indeed usable with tinygrad.
- The recommendation is to utilize the OpenCL backend by setting GPU=1.

LLM Agents (Berkeley MOOC) Discord

Sutton Dives into Coding Agents: Amazing guest speaker Charles Sutton presented on Coding Agents and AI for Vulnerability Detection, at Lecture 5.
- The lecture explores using LLM agents for computer security tasks like finding software vulnerabilities, and discusses design issues in LLM agents.
DeepMind Researcher Wins Accolades: Charles Sutton, a Research Scientist at Google DeepMind, has research in machine learning motivated by applications in code generation, software engineering, programming languages, and computer security.
- Sutton's work in software engineering has received two ACM Distinguished Paper Awards (FSE 2014, ICSE 2020) and a 10-year Most Influential Paper award (MSR 2023).
Quiz Posting Day Revealed: A user asked when the quiz is posted each week, to which another user responded that they generally try to release it Wed/Thurs.
- This question was asked in the mooc-questions channel.
Audio Issues Plague Lecture: A member reported not being able to hear questions during the lecture due to audio problems, requesting assistance from someone in the room, in the mooc-lecture-discussion channel.
- A staff member apologized for the audio issues during the lecture and promised to remind the speakers to repeat all questions going forward.

Cohere Discord

Cohere Image Embedding Issue Vanishes: A user reported an issue with embedding images using Cohere, but later confirmed that the issue mysteriously resolved itself.
- Another member simply acknowledged the resolution without further comment.
Cohere Probes Pesky 504 Errors: A Cohere member mentioned that while they didn't observe a spike in 504 errors, they did note super slow requests as a potential cause.
- The member is planning to investigate the source of the slow requests further, thanking the user for the heads up.

Modular (Mojo 🔥) Discord

owned Becomes own by Pull Request: A member submitted a pull request to rename owned to own for consistency with the rest argument convention.
- The renaming aims to align with established coding practices and enhance readability.
Community Meeting Needs Speakers: The upcoming community meeting, scheduled in one week, seeks speakers to present talks or showcase projects.
- Interested individuals should contact the organizers to secure a spot on the agenda.
AWS GenAI Loft Hosts MAX Engine Event: An event titled Beyond CUDA: Accelerating GenAI Workloads with Modular’s MAX Engine, Hosted by AWS will take place at the AWS GenAI Loft.
- The event, targeted for the Bay Area audience, is scheduled for tomorrow evening.
SIMD DType Construction Explained: A discussion clarified that SIMD[DType.uint8, 1](0).type returns the dtype at compile time, using var a = UInt8(0); alias dtype = __typeof(a).type as an example.
- A member highlighted that SIMD includes construction checks within its implementation, which helps with validity and type safety.
Parameter Injection Favored over Globals: In response to a question about using globals, a member asserted that injecting parameters is generally preferable, if you have the time.
- This preference aligns with best practices for code maintainability and testability.

Torchtune Discord

Step-Based Checkpointing Keeps Compute Alive: Members expressed interest in, and confirmed the ongoing implementation of, step-based checkpointing to mitigate compute waste from training failures.
- This feature saves progress at regular intervals, reducing the impact of interruptions.
Torch Users Trace with Tensorboard: Torch users debated strategies for visualizing profiler traces, initially attempting Tensorboard but noting the removal of certain plugin features for PyTorch.
- They recommended the PyTorch memory visualizer tool and Perfetto for memory and timing traces as sufficient for following the trail.
Alternative Profiling Tools Prevail: The discussion underscored the PyTorch memory visualizer tool and Perfetto as solid alternatives for memory and timing traces, respectively.
- These tools arose after users flagged issues with Tensorboard, specifically the absence of some plugin features for PyTorch.

Nomic.ai (GPT4All) Discord

Ollama vs GPT4All: Which Llama Reigns Supreme?: A user questioned why people choose Ollama or Llama.cpp over GPT4All, arguing that GPT4All's out-of-the-box functionality makes it a better choice.
- The user did not provide specific details on the comparison metrics, but emphasized the ease of use as a key advantage.
GPT4All Interface to Get Catalan Language Support: A community member requested the addition of Catalan as a language option for the GPT4All interface.
- The request highlighted the presence of Catalan speakers in the community and the potential benefit of localized support.
Security hole found, GPT4All v3.10.0 faces Vulnerability: A user reported a potential vulnerability in GPT4All v3.10.0 and asked about the proper way to report it.
- No details about the nature of the vulnerability were disclosed in the message, but prompt reporting was advised.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor IDE ▷ #general (745 messages🔥🔥🔥):

Cursor IDE, MCP, Landing Page Design, Model Performance, Repo Prompt

Cursor Unstable & Buggy: Users are reporting instability, connection failures, and non-functional checkpoints in the latest Cursor release.
- One user stated, it's just incredible how unstable is cursor currently, with many considering to switch to other alternatives like Windsurf and Trae AI.
MCP Configuration Headaches: Members are struggling to configure MCP servers in Cursor, especially with Windows and remote Ubuntu workspaces, facing issues like client creation failures.
- One user needed help with Firecrawl MCP server setup, but eventually got Pupeteer working and said Ty for help yall Im dumbsorry.
3.7 not quite heaven: Users are experiencing issues with Claude 3.7 such as it being insanely slow and prone to stopping mid-request without errors.
- One user stated, 3.7 is really unstable atm, which has led many to use Cursor's 'Ask' mode or switch to older versions for important tasks.
Designers Dig into Landing Pages: Members share landing page designs generated with Cursor and discuss their aesthetic and effectiveness.
- The community debated the merits of different designs and compared them to sites like Linear, Framer, Magician Design, and Webflow for inspiration.
Repo Prompt Gains Traction: Users are hyped about Repo Prompt, praising its multi-file edit capabilities and integration of code snippets.
- The community has also mentioned BrowserTools for debugging, as well as PasteMax, described as an open source poor man's version of Repo Prompt for selecting files.

Links mentioned:

Codeium (Windsurf) ▷ #announcements (1 messages):

Windows ARM support, Windsurf Next, Ubuntu 24.04, Claude 3.7 Sonnet, MCP Tools

Windsurf adds Windows ARM support: Windsurf Next <:wsnext:1336821369685540914> now supports Windows ARM64 as of this weekend, and can be downloaded here.
Ubuntu 24.04 bug squashed: Both Windsurf 1.3.11 and Windsurf Next have been patched to fix crashes caused by permissions errors on Ubuntu 24.04 (changelog).
Cascade Now Supports Claude 3.7 Sonnet: Windsurf Next now supports Claude 3.7 Sonnet which takes 1.0 user prompt credits on every message and 1.0 flow action credits on each tool call.
Windsurf Addresses MCP Tool issues in JSON: Windsurf 1.3.11 patched MCP fixes for incorrectly formatted MCP tools in JSON, and provides better MCP Error Handling.

Links mentioned:

Codeium (Windsurf) ▷ #discussion (37 messages🔥):

Codeium Pro issues and snoozing, Supercomplete availability in Codeium, Visual Studio Codeium extension versions, JetBrains extension issues

Codeium Pro users face snoozing problems: A user reported that Codeium Pro stops working and snoozes itself, requiring manual re-enablement, which is a known issue for some users.
- The user found that using the pre-release version of the extension provided a better experience, though still not ideal.
Supercomplete's Status Clarified: Supercomplete, initially marketed for Pro plans, is now included in the free plan but functions fully only in the pre-release version of the extension due to API issues caused by Microsoft.
- One user noted that they wished to be notified of such changes.
Visual Studio Version Lags: A user noted that the Visual Studio version of the extension provides inferior suggestions compared to the VSCode version.
- It was revealed that the Visual Studio extension uses an older Codeium LSP, with a link to the GitHub repo provided for reference.
JetBrains Plugin stuck on Processing request: Users of the JetBrains plugin reported being stuck in the Processing request phase, leading to errors.
- This problem occurs specifically in the latest pre-release version, causing the plugin to be unable to generate responses.
Enterprise features lagging behind Windsurf: A member mentioned that Enterprise features are lagging the other subscription types.
- Another member countered that Enterprise and Team plans have the same cadence and that this is done to ship only totally battle-tested features.

Link mentioned: GitHub - Exafunction/CodeiumVisualStudio: Visual Studio extension for Codeium: Visual Studio extension for Codeium. Contribute to Exafunction/CodeiumVisualStudio development by creating an account on GitHub.

Codeium (Windsurf) ▷ #windsurf (432 messages🔥🔥🔥):

Codeium customer support, Premium flow action credits, Windsurf's new update, Claude 3.7, Multiple selection using CTRL+D

Users Experience Poor Codeium Customer Support: Users have reported experiencing poor customer support from Codeium, with one user mentioning they have been trying to follow up with their subscription issue for the last 4 weeks without a resolution.
Premium Flow Action Credits Depleting Rapidly: Users reported that their premium flow action credits are depleting rapidly, within 4-5 days, even though they still have prompt credits left, and some users are now considering switching to Cursor.
- One user speculated that changes to Claude 3.7 might be the cause, as it now uses flow actions per prompt, leading to high credit consumption.
Windsurf's New Update Causing Issues on Ubuntu: A recent Windsurf update caused issues for users on Ubuntu 24.04, with the application failing to start and displaying a FATAL:setuid_sandbox_host.cc(158) error, requiring a manual fix to change the permissions of chrome-sandbox.
- One user reported that the update crashed their Ubuntu system, leading to a reinstallation and data loss, highlighting the need for proper backups before updating.
Claude 3.7 Code Model Credit Consumption Rate Sparks Debate: Members observed that Claude 3.7 is consuming credits rapidly due to excessive tool calls per prompt, with some experiencing 30-40 tool calls for small changes, also the implementation of 3.7 was too hasty.
- Some users are now sticking with 3.5 or other models for better efficiency, and there are strong suggestions by users for Codeium to hide 3.7 as a default model.
Windsurf struggles to move cursor using CTRL+D: A user reported an issue with multiple selection using CTRL+D in Windsurf, where moving the cursor with CTRL + Left or CTRL + SHIFT + Left only moves the cursor at the first selection.
- It was recommended to check the status bar to confirm if you are in multi-selection mode, and it was found out that the multiple selections disappears from status bar.

Links mentioned:

OpenAI ▷ #annnouncements (2 messages):

Sora onboarding session, Sora prompt crafting

Sora 101 Live Onboarding Happening Soon!: Join the Sora team's <@713183296099450910>-hosted live onboarding session to cover Sora fundamentals, optimal prompting techniques, and early access artist onboarding insights.
- The live session starts <t:1741024800:R> and you can join the discussion via this discord link or this one.
Crafting Great Prompts for Sora: The Sora 101 session will offer best practices for crafting great prompts, based on insights from the onboarding process for early access artists.
- Whether new to Sora or looking to refine your approach, this session is a great opportunity to learn and ask questions.

OpenAI ▷ #ai-discussions (423 messages🔥🔥🔥):

Mirror sites with pro accounts, GPT-4.5 Image Recognition, GPT prioritization of Pro vs Plus, Switching from ChatGPT to Grok, Gemini Free vs Pro Features

Mirror Sites Offer Pro Accounts for Free: Multiple mirror sites offer pro accounts for very cheap or free, and haven't been banned, instead OpenAI is shadow banning Plus users who used the service on multiple devices according to drinkoblog.weebly.com.
- The same user noted the realistic compute limits, arguing unlimited sounds better than '12 hours compute' on the surface, but is unrealistic in practice since some users never touch grass and will consume disproportionately more resources.
GPT-4.5 Image Recognition Draws Mixed Reactions: Members are debating whether the new GPT-4.5 has better image recognition compared to GPT-4o, with Future Machine being more vocal about OpenAI (OAI)'s choices, though initial tests show that GPT-4.5 scores a bit higher than 4o on the MMMU (vision oriented reasoning benchmark) with a +5% improvement.
- One member expressed how they felt they were deprioritizing plus users in favour of pro users, stating that plus now feels like second class citizens instead of the usual premium status.
Grok's Custom Instructions Released, Not Working For All: Grok AI's custom instructions feature has been released for all users according to a news source, allowing users to customize Grok as per their needs and make it respond accordingly.
- A member shared custom Grok instructions aiming for an abusive and lewd troll persona, but reported it doesn't work, and other users reported the same.
GPT pro or Grok Super: Users discuss the pros and cons of switching from ChatGPT Pro to SuperGrok with Grok being much faster, smarter for general tasks and some API with outdated info.
- A user claimed that Gemini’s context window has 1 million tokens, no filter and great for creative writing.
Dynamic learning system prototype: A user is developing a basic application prototype and will try injecting stellargraph nodes into it to create a learning system where they can inject the most relevant information to the AI dynamically, injecting knowledge in it.

Links mentioned:

OpenAI ▷ #gpt-4-discussions (30 messages🔥):

GPT Model Selection, Projects vs GPTs, Claude 3.7 vs ChatGPT, Context window size comparison, Clearing Chatlogs and Uploaded Data

Can't Choose Specific Model in GPT Creation: A user inquired about selecting a specific model when creating a GPT, but was told it defaults to 4o and you can't choose models in GPTs.
- A user stated you can use the Projects feature to change models to 4o, o1, 4.5 in a single chat.
Projects Outshines GPTs by Allowing Model Selection: One user found that while they got the best programming results using 03-mini-high, custom instructions in Projects don't work with o3 mini models (only file uploads).
- However, another user said that with a Pro subscription, using a combination of o1, 4o, and 4.5 in Projects allows for custom instructions with file uploads.
ChatGPT Bashed, Claude 3.7 Hailed: One user declared ChatGPT is a joke now and found Claude 3.7 to be very impressive.
- However, another user said Claude's projects are of no use, complaining that they can hardly upload only two files maximum and it says memory full, calling Claude over hyped.
Context Windows Compared between Models: A user touted that Claude can better understand the context in larger files, and pointed out its larger context window: 200K compared to ChatGPT's 128K.
- They clarified that the 128K context window on ChatGPT costs $200 with an Enterprise plan.
Chatlogs Clearance Conundrums: One user asked who knows to clear all chatlogs and uploaded data at studio using web interface or api call?
- They were testing an app and needed to remove many dummy files and logs.

OpenAI ▷ #prompt-engineering (2 messages):

Dall-E image generation, Synthetic plants, Image prompting strategies

Dall-E Generates Synthetic Plants with Transplant Organs: A member prompted Dall-E to generate an image of synthetic plants that grow hearts and livers for transplant, visible within a transparent membrane and nourished by the GM plant.
- The initial results emphasized hearts over livers, prompting the user to refine the prompt with more details about the liver lobes.
Crafting Detailed Prompts for Specific Dall-E Outputs: The model suggested basing the image on bioengineered plant-inspired organs with a focus on livers and hearts after an initial image couldn't be shown.
- The user iteratively refined the prompt to guide Dall-E's attention to specific features like liver lobes and the organs forming almost like fruit.
Showcasing Art and Following Community Guidelines: A member shared art prompts, noting the need to spoiler unsettling or horror-based images as per community guidelines in <#1107255707314704505>.
- This emphasizes awareness of community rules while sharing creative content.

OpenAI ▷ #api-discussions (2 messages):

Dall-E image generation, Synthetic plants growing organs for transplant

Dall-E Crafts Organs from Synthetic Flora: A member prompted Dall-E to generate an image of synthetic plants growing hearts and livers for transplant within a transparent membrane.
- After the first image stressed the hearts, the member rephrased the prompt to emphasize the liver lobes and the organs forming almost like fruit.
Navigating Image Content Guidelines: Members were reminded to use spoiler tags for images that may be unsettling or horror-based, following channel guidelines.
- The content creator had to spoiler some of the images they shared in order to comply.

Unsloth AI (Daniel Han) ▷ #general (252 messages🔥🔥):

Llama zipping WAVs confusion, GRPO training steps, 4-bit model saving issues, Unsloth team size, Continued pretraining of Unsloth

Llama Zips WAVs into Headaches: A member had a funny experience with a llama model: compressing a 192 KB ZIP file resulted in a 48 KB lossless WAV format, which it then tried to re-zip.
- The user noted, "confusion, after first working through packing a 48KB again, to make a smaller zip… That was the r1-1776-distill-llama-70b."
GRPO Training Steps Speculation: A user inquired about the number of training steps needed for LoRA training Qwen2.5-14B-instruct with GRPO to lower the loss, wondering if more steps are required for better reasoning.
- Another member suggested GRPO training might take around 24 hours, stating that convergence depends on the model and sampling, so there's no fixed time.
Saving 4-Bit Models into Trouble: A user ran into issues while saving a Phi-4 model in 4-bit after training it with GRPO, using an example Jupyter notebook provided by Unsloth.
- The error occurred during the saving process.
Unsloth's Docs Teach Checkpointing: A user inquired about loading LoRA weights from a previous run, to which another user pointed to the Unsloth documentation on finetuning from the last checkpoint.
- The documentation explains how to edit the Trainer to add save_strategy and save_steps in order to save checkpoints and resume training.
Unsloth Continued Pretraining is Very Helpful for New Languages: When a user asked about scenarios for continued pre-training, another user linked to the Unsloth documentation, stating it's useful for new languages.
- The member pointed out that the blog post is even better, explaining how Unsloth's release allows you to easily continually pretrain LLMs 2x faster and use 50% less VRAM than Hugging Face + Flash Attention 2 QLoRA.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #off-topic (10 messages🔥):

Github Issues, inline_asm, GRPO, VLLM, Online Training

Github Repo Issues Reported: A member reported issues with reading open repos.
- They found some code with the inline_asm feature.
Unsloth uses VLLM for backend: A member asked how Unsloth works in production deployment and was told it uses VLLM for the backend.
- The process involves GRPO fine-tuning, then inferencing on the fine-tuned model using Unsloth + VLLM, rather than Continuous Training + Inference (online learning).
Online training requires implementation: A member clarified that during GRPO fine-tuning, VLLM is used to generate and grade outputs with reward functions before backpropagating loss.
- To implement online training, one would need to develop the mechanism themselves.

Link mentioned: stickbreaking-attention/stickbreaking_attention/sb_varlen/softplus.py at main · shawntan/stickbreaking-attention: Stick-breaking attention. Contribute to shawntan/stickbreaking-attention development by creating an account on GitHub.

Unsloth AI (Daniel Han) ▷ #help (54 messages🔥):

GRPO Training, Qwen2.5-14B-instruct fine-tuning, DeepSeek-R1-Distill-Llama-8B error, Mistral embedding model, GCC compiler issue

GRPO Training needs adequate steps: A member inquired about the number of training steps needed for LoRA training Qwen2.5-14B-instruct with GRPO to lower the loss.
- Another member suggested that 700-1200 steps are generally good for training a GRPO model using LoRA, but the optimal number depends on the dataset.
DeepSeek-R1-Distill-Llama-8B error: A user encountered a RuntimeError when training unsloth/DeepSeek-R1-Distill-Llama-8B on Kaggle, related to matrix multiplication shapes.
- Despite upgrading transformers with pip install --upgrade "git+https://github.com/huggingface/transformers.git", the issue persisted, and the user sought advice on resolving it.
Mistral Embedding Model missing lm_head: A user is trying to create an embedding model using Mistral but is facing issues with removing the lmhead for training.
- The user is seeking suggestions on how to properly remove the lmhead for effective training of the embedding model.
Fixing GCC Compiler Issue with vLLM: A user encountered a RuntimeError: Failed to find C compiler when running the GRPO tutorial locally with meta-Llama-3.1-8B-Instruct.
- Despite attempting to install GCC via conda, the issue remained unresolved, and the user is restricted from using apt-get due to security reasons on their school's HPC.
Decoding GGUF models naming convention: A user inquired about the naming convention for GGUF models, specifically the meaning of the UD prefix in models like unsloth/r1-1776-GGUF.
- A member explained that UD stands for Unsloth Dynamics, a quantization algorithm that provides better results than IQ.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #research (94 messages🔥🔥):

GRPO Reward Functions, Distilling Models for Agent Skeletons, SWE-bench Performance, String Replacement for Code Editing, Model Tool-Calling and Composition

GRPO Rewards Linearly: Reward functions for GRPO are scored using the same format as in Unsloth's script, with correctness measured by linear scaling based on matches between the extracted answer and the correct answer, using sum(i == j for i,j in zip(r,a))/len(a).
- The creator noted that GRPO works even without a language prior, leading to better exploration and varied reasoning traces during training.
Vladrad cracks Distilling Agent Skeletons: A member shared a model distilled from O3/R1, designed to generate good agent skeletons from pre-made functions, available on Hugging Face, after investing $400 to get the string replacement right.
- This approach aims to create a base model capable of handling languages that base models struggle with, such as Perl, by incorporating unit tests and problem-solving strategies from other repos.
Claude 3.5 Sonnet Sweeps Bench with SOTA: Anthropic's Claude 3.5 Sonnet achieves 49% on SWE-bench Verified, surpassing the previous state-of-the-art model's 45%; they assess a model's ability to complete real-world software engineering tasks by resolving GitHub issues from popular open-source Python repositories.
- A member referred to the bitter lesson: general methods that leverage computation are ultimately the most effective.
String Replace Debated as Coding Strategy: Members debated the effectiveness of string replacement for code editing, with one member arguing it's generally garbage because models aren't meant to write code this way.
- Despite reservations, another member reported success fine-tuning Qwen 2.5 for string replacement, especially when the model can see the file before making replacements.
New Algorithm: Reinforce++ enhances stability: A new algorithm, Reinforce++, claims to improve stability over the classic REINFORCE algorithm by incorporating elements from PPO, calculating advantage with A_t = r(s_t, a_t) - Beta * sum(KL divergence_t->T).
- The algorithm is said to be similarly performant to GRPO in terms of reward and speed but with greater stability during training.

Links mentioned:

Perplexity AI ▷ #general (378 messages🔥🔥):

Perplexity AI Bugs, Claude 3 Opus vs Sonnet, Deepseek Propaganda?, Perplexity AI Business Fellowship, GPT-4.5 Quality Concerns

Perplexity Web UI struggles with Rewrite Bugs: Multiple users report that Perplexity's web UI rewrite functionality is broken, with rewritten prompts always using the default model, pplx_pro, regardless of the selected option.
- Some users have also experienced the rewriting function duplicating the prompt instead of rewriting it, with <@883069224598257716> tagged for assistance.
Claude 3 Opus and Sonnet: A Model Mix-Up: Users are confused as to whether the model indicators are accurate, questioning whether they're getting Claude 3.7 Sonnet or Claude 3 Opus when selecting Claude in settings.
- Some find that Pro Search overrides the default model with Sonar, leading to discrepancies between selected models and actual models used.
Deepseek's Alleged Propaganda Outing: A user posted an image allegedly generated by Deepseek that was deemed politically motivated by the community.
- Another member said You are fake deepseek. Real deepseek doesn't talk on western affairs.
GPT-4.5 gets review bombed for poor riddle solving skills: Some users report poor output from GPT-4.5 on Perplexity, saying the model answers more creatively and accurately, and that the model indicator used to show which model when you used pro search.
- One member said they’re stepping it up with 4.5, back then it wasn’t able to solve a riddle like 4.5 in chatgpt now it solves it perfectly and response is exactly like 4.5 in chatgpt while showing off an example of their chat log.
Bookmark Organization with Perplexity: A user asked if Perplexity could organize their 7,000 browser bookmarks.
- The user accompanied their question with a Bart Simpson meme.

Links mentioned:

Perplexity AI ▷ #sharing (7 messages):

Shareable threads, Perplexity AI integrations, Ingredient breakdowns

Shareable Threads requested: A member was reminded to ensure their thread is set to Shareable, with an attached screenshot for reference.
- This ensures that other users can easily access and view the content of the thread.
Perplexity AI Integrations in Demand: Multiple users are exploring ways to integrate Perplexity AI into various applications and workflows as shown by shared search queries like integrate perplexity.
- This indicates a growing interest in leveraging Perplexity AI's capabilities in diverse contexts.
Deep Dive into Ingredient Breakdowns: A user shared search queries related to ingredient breakdowns, recommending in-depth analysis.
- The shared queries included requests for explanations of ingredients and must-have components: ingredient breakdown.

Perplexity AI ▷ #pplx-api (3 messages):

Open Source Claude-Code, Perplexity API Limitations, Obsidian Web Clipper Issue

Perplexity Offers API Credits for Open Source Claude-Code: Perplexity is offering free API credits to developers interested in building an open-source Claude-Code model with editor integrations and extensions, as announced on X.
- Interested individuals are encouraged to DM @GregFeingold and @AarashHeydari for more details.
Perplexity API Incompatibility with OpenAI impacts Obsidian: The Perplexity API isn't fully compatible with OpenAI, causing issues with tools like Obsidian Web Clipper.
- Specifically, the API requires an assistant message between user messages, a constraint not present in OpenAI-type APIs, leading to problems when Obsidian Web Clipper attempts to post consecutive user messages.

Link mentioned: Tweet from Aravind Srinivas (@AravSrinivas): If anyone wants to build an open source Claude-Code with some editor integrations and extensions, Perplexity would be happy to provide free API credits. Please DM @GregFeingold and @AarashHeydari

HuggingFace ▷ #general (118 messages🔥🔥):

RL fundamentals, DeepMind RL Course, Automated video generation, OpenAI image generation alternatives, Fine-tuning Phi-3 for Multi-modality

DeepMind delivers detailed RL course: Someone asked for recommendations for reinforcement learning resources comparable to Stanford's CS231n, and another member suggested the DeepMind x UCL Deep Learning Lecture Series 2021 on YouTube.
Video Automation System Seeketh Guidance: A Java developer seeks guidance on creating a local text-to-video automation system without prior AI coding experience, requesting step-by-step instructions.
Spaces Face Spaces' Spam Snags: Spaces have restrictions around prohibited programs like VNC, and rapidly creating a large number of free Spaces, which is considered SPAM, but there are Spaces that have been running for years without being restarted, so if you create them properly, they're very robust.
Fine-Tune Phi-3 for Multi-Modality: One member is fine-tuning Phi-3 for multi-modality, aiming to give images speech/text as input, using an A100-equipped Colab Pro.
- Another member warned that such fine-tuning would take 6+ A100s and run for approximately 2 weeks, though added that [QLora and Peft make anything possible with a positive attitude and a credible project].
HTML code generation using AI is not worth it: One member asked if its possible to have a small llm (at max, maybe 8b) that can use a seperate thing to work off of alecsharpie/codegen_350m_html, specifically for html
- Another member responded that using AI for raw HTML generation is not worth it, suggesting AI should manage a system that configures a layout and other systems put that into HTML because of the cascade of problems the precision on the details is just something that AI will not be able to achieve even with endless testcases

Links mentioned:

HuggingFace ▷ #today-im-learning (1 messages):

VLMs, Cracking VLMs

Cracking VLMs notes shared: A link to a Google Docs document titled Cracking VLMs, notes so far was shared, at this link.
Upcoming VLM Cracking Competition: A user is preparing a series of challenges on VLMs for cracking and reverse engineering.

Link mentioned: VLM Notes: VLM Notes TODO : Understand how preprocessor handles images : More work into explaining vision_encoder Resources : https://github.com/merveenoyan/smol-vision Smolvlm and idefics Moondream especia...

HuggingFace ▷ #cool-finds (7 messages):

Dataset Viewer Errors, FastRTC

Dataset Viewer plagues users: A user suggested fixing Dataset Viewer errors for compatibility with various libraries and SQL, enhancing discoverability.
- Another user thanked them in advance, and jokingly requested an additional 1.2M rows of a HQ dataset.
FastRTC Project: Anyone Working on It?: A member inquired whether anyone is actively working on FastRTC.
- Another member pointed them to a specific channel.

HuggingFace ▷ #i-made-this (7 messages):

AI Story Studio, MoD ControlNet Tile Upscaler, VAE comparison, Remote VAE from HF, Cross-device browser-based scratchpad

AI Story Studio launches for collaborative storytelling!: A new interactive storytelling experience called AI Story Studio has launched, allowing users to co-write adventures with AI by picking a genre, guiding the story with prompts, and downloading the final result from AI Story Studio.
- The tool aims to practice creative writing, overcome writer’s block, and explore storytelling with AI-generated ideas.
MoD Upscaler Enhances Images Without Quality Loss: The MoD ControlNet Tile Upscaler for SDXL tool was launched, which uses tiling technology to upscale images with preserved details and smooth transitions, as shown on the Demo App and Github Code.
- The upscaler offers preserved details, advanced tiling tech, fast performance, and a user-friendly interface for professional-quality results.
Compare VAE quality with interactive demo!: An interactive demo comparing the reconstruction quality of various VAEs has been released by @rizavelioglu, link to Space.
- A user asked to add the remote VAE from HF, blog post at huggingface.co.
Build Cross-Device Scratchpad for Math Learning!: A member built a cross-device, browser-based scratchpad to augment math learning, which allows users to access the same scratchpad across devices using an ID, it is barebones, but potentially useful for others.
- A demo video was shared (scratchpadDemoOutput.mp4), using Firebase which was turned off due to resource concerns.
InternVL 2.5 AWQ Conversion Released: An AWQ conversion of InternVL 2.5 was released, showing little degradation in performance compared to the original, found at HuggingFace.
- Unlike the author's version, this AWQ version is compatible with the transformers library.

Links mentioned:

HuggingFace ▷ #core-announcements (1 messages):

Remote VAE Decode endpoints, Hybrid Inference, SD v1, SD XL and Flux

Sweet Honey Optimization Deployed: The new codename honey is live on Remote VAE Decode endpoints for SD v1, SD XL and Flux, reducing latency up to 10x.
- The change empowers local AI builders with Hybrid Inference.
Hybrid Inference Benefits Listed: Hybrid Inference offers a fast and simple way to offload local generation requirements, reducing requirements and offering the highest quality without sacrificing performance.
- It's free, fully compatible with Diffusers, and developer-friendly with simple requests and fast responses.
VAE Encode is Coming Soon: Quickly decode latent representations into high-quality images without compromising performance or workflow speed with VAE Decode.
- Efficiently encode images into latent representations for generation and training with VAE Encode (coming soon).

Link mentioned: Hybrid Inference: no description found

HuggingFace ▷ #computer-vision (3 messages):

Audio to Video matching, ViT resources, ViT and Global Average Pooling

Audio & Video get Synced: A member pointed to a discussion on how to match audio to video in this discord channel.
ViT resources sought, clarity craved: A member requested resources to understand Vision Transformer (ViT), specifically how each attention head contributes to the CLS token or captures information about the image.
- They noted existing explanations state CLS token is used the same as BERT, but found this explanation insufficient.
ViT: Average Pooling Questioned: A member inquired why Global Average Pooling cannot be used, as mentioned in the ViT paper.

HuggingFace ▷ #NLP (2 messages):

Web scraping with Python, Running Phi-4 as real-time API

Pythonistas Ponder Web Scraping: Members are asking for advice on web scraping data from sites like Wikipedia using Python.
- While specific methods weren't detailed, common practices involve libraries like Beautiful Soup and Scrapy for parsing HTML content.
Phi-4 Phans Fancy Real-Time API: Someone inquired about running the Phi-4 model as a real-time API using websockets.
- Unfortunately, there were no shared experiences or advice on this topic within the discussion.

HuggingFace ▷ #gradio-announcements (1 messages):

Gradio, Groovy, Python to Javascript

Gradio kicks off Groovy: Python gets Javascript!: Gradio introduces Groovy, a tool that converts Python functions to JavaScript for client-side execution in Gradio apps, enabling developers to write code once in Python and achieve JavaScript performance without the burden of maintaining dual codebases (docs).
Groovy Transpiler Promises Clarity: Unlike other transpilers, Groovy is designed to provide clear error messages when it cannot transpile specific code, focusing on supported elements such as simple Python functions, a subset of the Python standard library, and Gradio-specific classes.
- This approach ensures developers are aware of limitations when crossing languages, prioritizing transparency over attempting to handle all possible scenarios.
Client-Side Functions Boost Gradio Responsiveness: Gradio allows you to run certain "simple" functions directly in the browser by setting js=True in your event listeners, which will automatically convert your Python code into JavaScript.
- This improves app responsiveness by avoiding server round trips, especially beneficial on hosted applications with high load or latency.

Link mentioned: Client Side Functions: A Step-by-Step Gradio Tutorial

HuggingFace ▷ #smol-course (5 messages):

Smol Agents Quiz, NLP Reasoning Course, ClaudePlaysPokemon replication with smolagents

Smol Agents Quiz Frustrates Users: A member expressed frustration with the Smol Agents Quiz, citing issues with unclear requirements and receiving a score of 0.0 out of 5 despite multiple attempts, linking to the quiz's app.py file.
- The member noted the need to mine error logs to understand the exact providers required for tools and models.
NLP Reasoning Course Released: HuggingFace released a new unit in their NLP course focused on Reasoning models, titled The Reasoning Course.
- The course aims to help students understand reinforcement learning and its role in LLMs, including how to use and contribute to Open R1, and mentions that the course is to help students and learners to use and contribute to Open R1.
Replicate ClaudePlaysPokemon with SmolAgents: A community effort is underway to replicate ClaudePlaysPokemon using smolagents, as detailed in this GitHub repository.
- This project serves as a benchmark for LLM agents in a simulated environment.

Links mentioned:

HuggingFace ▷ #agents-course (87 messages🔥🔥):

Introductions, Lambda Go Labs, CodeAgent LLM Size, Quiz Grader Issues, Inference Credits Exhaustion

Lambda Go Labs Spark AI Excitement: Lambda Go Labs, is a community by Lambda Go and Future Technologies Limited focused on AI learning, building, and research.
- The community offers hands-on experience, opportunities to share work, and a supportive network for both experienced professionals and newcomers.
LLM Size Debate for CodeAgents Explodes: A member inquired whether a 32B LLM is necessary for a CodeAgent, or if a smaller distilled model would suffice for learning purposes.
- They noted that smaller models yielded rubbish answers when used for playing around with SmolAgents.
Final Quiz Grader Under Fire: Multiple members reported that the final quiz grader in unit 2.1 is obviously broken and inquired about when it will be fixed.
- One user linked to a discord thread discussing this issue.
Inference Credits Disappear Quickly!: A user reported exhausting their monthly included credits for Inference Providers despite only completing course requirements.
- Another user suggested it may be related to switching between Google accounts or kernel dying and re-running, and they were forced to upgrade to PRO.
ToolCallAgent Troubles Surface: A member reported issues with MultiAgent Architecture using ToolCallAgent in smolagents, where the fallback to a child agent with web access failed.
- Specifically, the manager agent couldn't delegate web search tasks to the child agent, despite the child agent having the necessary tools.

Links mentioned:

HuggingFace ▷ #open-r1 (2 messages):

Replicant model training, R1 reasoning dataset for coding tasks

Replicant Model Training Halted: The research team paused the procedural generation of a 25 petabyte dataset for training the replicant model.
- The team had to return to other work rather than creating the dataset.
Quest for R1 Coding Dataset Arises: A member inquired whether an R1 reasoning dataset exists for coding tasks.
- There were no answers to this question.

aider (Paul Gauthier) ▷ #general (121 messages🔥🔥):

Aider Leaderboard, Claude Code, Grok vs O3 Mini, anon-kode, python + uv

Aider Leaderboard Tooling Benchmarks: The Aider leaderboard is used to benchmark AI models while using Aider as the primary tool/assistant; Claude Code is recognized as a tool/assistant rather than an AI model and is also benchmarked.
- A user pointed out the need for a tool-agnostic benchmark like SWE Benchlets for comparison.
Anon-Kode forks Claude Code, goes OpenAI Compatible: A member shared that the dude who grabbed the source for Claude Code (link to original tweet) has now released a modified version that works with OpenAI compatible APIs (link to tweet) and available on GitHub.
Set up Aider in personal project: A user asked for advice about using Aider with a personal project's Git repository and virtual environment.
- Others suggested installing Aider globally or in the project's venv and using tools like uv to sync environments.
Grok Debugging > O3 Mini Code Creation?: A user mentioned Grok is good at debugging, but O3 mini high sonnet might be better at code creation, like adding new functions.
- They also noted Claude 3.7 adds unintended stuff and deepseek-chat with O1 Pro has been almost 95% fine for them as an editor.
LLMs Censorship: Some members discuss the increased censorship in language models, particularly regarding kernel-level code generation and the need to label prompts as "educational only" to bypass restrictions.
- Grok3 seems to refuse to do the request, guess there still are limits.

Links mentioned:

aider (Paul Gauthier) ▷ #questions-and-tips (85 messages🔥🔥):

Gemini 2.0 Pro Model issues, Aider + RAG/Vector embeddings, Aider editing with git diff, Aider with OpenRouter models and edit modes, Aider Architect mode

gemini/gemini-2.0-pro-exp-02-05 RESOURCE_EXHAUSTED issues: A user reported encountering RESOURCE_EXHAUSTED errors when using the gemini/gemini-2.0-pro-exp-02-05 model with a large context window in Aider, while the gemini-2.0-flash-thinking-exp-01-21 model works fine.
- They asked if there's a way to use the Pro model with a full context window without limits.
User solved the problem with Aider Extension using workon command.: A user created an Aider extension with a new /workon command, which analyzes imports in a file and passes a list of relevant files to the add command.
- The user claims it saves a lot of time and they have implementations for TS, Vue, Kotlin, and Java, but it's described as rather ugly and doesn't support --subtree-only.
Requesting Aider edit files with Git Diff Syntax: A user wants Aider to edit files using git diff syntax (like <<<<<< branch, ======, >>>>>>> replace) directly in the file, instead of just showing it in the terminal and replacing the text.
- They want to be able to edit the changes before accepting them, but others pointed out this is not possible without forking the project, also you can use built-in IDE diff tool.
Sonnet vs OpenRouter recommendation for Aider: A user requested recommendations for models to use with Aider's edit modes that aren't too expensive, finding o1-preview to be both costly and ineffective.
- Another user suggested using r1-free or gemini flash thinking for planning and Sonnet 3.7 for execution and shared a snippet of their aider config with model aliases and settings for edit formats.
Git commands for Aider: A user described a technique of creating a script to run Aider, using --load to load a script with commands on startup, and adding /run git diff to update on the latest changes, useful for working on a branch.
- A separate user suggests Aider could use git apply mypatch.diff to apply changes instead of having the LLM manually edit the whole file and add a --check step.

Link mentioned: GitHub - lutzleonhardt/copilot-proxy: Copilot Proxy is a Visual Studio Code extension that exposes the VS Code Language Model API via an Express server. This experimental extension is intended solely for research and prototyping purposes and should not be used in production environments.: Copilot Proxy is a Visual Studio Code extension that exposes the VS Code Language Model API via an Express server. This experimental extension is intended solely for research and prototyping purpos...

GPU MODE ▷ #general (1 messages):

Vision Models, Attention based ViTs, MLP-Mixer

MLP-Mixer: An alternative to ViTs: A member questioned why MLP-Mixer isn't used more often for vision models.
Attention based ViTs still prevail: The member noted that attention based ViTs are still the standard for SOTA vision models.

Link mentioned: MLP-Mixer: An all-MLP Architecture for Vision: Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that w...

GPU MODE ▷ #triton (54 messages🔥):

SRAM vs Cache Confusion, Triton Scalar Constants Data Type, CUDA backend hyper-parameters, Triton Autotuning Resources, Triton BLAS Implementations

SRAM and Cache Clarifications: A discussion clarified the relationship between SRAM, registers, shared memory, and cache, noting that registers, shared memory, and cache are chip/software level properties built from SRAM.
- Any shared memory that isn't allocated actually becomes L1 cache.
Cache Level Control in Triton: It was discussed that while Triton doesn't offer direct control over cache levels (L1/L2), the cache_modifier argument in tl.load allows specifying whether to hit from L1 or L2.
- The difference between ca and cg is that cg specifies that the load should only hit from L2 and not L1 cache.
CUDA Cache Coherency Deep Dive: The CUDA documentation states that global data is coherent at the L2 level, but multiple L1 caches are not coherent for global data.
- It was explained that L2 cache is shared among all SMs, but each SM has its own independent L1 cache and that a threadfence can be used to flush L1 cache writes to global memory.
Scalar Constants Data Types in Triton: A user asked about specifying the data type of scalar constants in Triton to avoid unintended upcasting during operations like bitwise AND.
- Another user suggested to apply mask and then downcast it again to int8 x=(x&mask).to(tl.int8).
Block Scaled Matmul for INT8 in Triton: A user sought guidance on implementing block scaled matrix multiplication for INT8 in Triton, referencing a tutorial for FP4 and FP8 formats.
- The user encountered errors with tl.dot when attempting to apply scales due to hidden dimension requirements and sought advice on handling the scales part.

Link mentioned: Block Scaled Matrix Multiplication — Triton documentation: no description found

GPU MODE ▷ #cuda (25 messages🔥):

FP8 GEMM in CUTLASS, Determine Architecture for NVCC, Flash Attention Indexing

FP8 GEMM with rowwise scales achieved: A member was looking for an example of fp8 gemm with rowwise scales implemented in CUTLASS for sm100, eventually found a solution, relieving the need for further assistance.
Torch utility helps determine CUDA arch: A member needed to determine the --arch= of the current system for their build system, and another member suggested using the torch.cuda.get_device_capability() utility from PyTorch.
- An alternative solution was found by leveraging nvidia-smi --query-gpu=name,compute_cap --format=csv to query the GPU's compute capability directly, avoiding the PyTorch dependency.
Flash Attention Indexing puzzles developers: A member requested help with indexing in Flash Attention, specifically struggling with how to implement that part of the kernel.
CUDA Runtime API reveals device properties: A member discovered the CUDA Runtime API to programmatically select a compute-device which best matches certain criteria.

Links mentioned:

GPU MODE ▷ #torch (17 messages🔥):

FSDP2 OffloadPolicy, register_post_accumuate_grad_hook, load_inline CUDA kernels, reduce and not scatter, optimizer scaling

Users seek flexible FSDP2 OffloadPolicy: A user inquired about plans for FSDP2 OffloadPolicy classes, seeking greater control over gradient handling, specifically to reduce and not scatter.
- The response indicated no immediate plans but suggested exploring register_post_accumuate_grad_hook, though this runs after reduce scatter, which the user wants to avoid.
CUDA Kernel launch gives illegal memory access: A user reported issues with illegal memory access errors when passing memory pointers directly to CUDA code using load_inline.
- Another user suggested that the PyTorch CUDA caching allocator might allocate more memory than needed, potentially causing out-of-bounds reads in the direct pointer version while the tensor version works due to bounds checking.
Exploring All-Reduce for Gradient Aggregation: A user proposed an alternative approach of using All-Reduce to aggregate gradients, followed by immediate optimizer application and zeroing of gradients, to avoid offloading.
- A respondent questioned the scalability of this approach due to the 2x communication volume compared to reduce-scatter, and suggested gathering gradients into larger blocks instead.
Users consider parameter scaling methods: A user targets scaling on a single node, exploring options like scattering to CPU and accumulating, or gathering on rank 0 on CPU for the optimizer step.
- The goal is to find faster ways, highlighting the desire for an extensibility point in the system.

GPU MODE ▷ #algorithms (5 messages):

fa3, absmax quantization, hada transform

FA3 Works, Quantization Error High: After some initial issues, FA3 was reported to be working, but with a significantly higher quantization error than basic absmax quantization.
- It was suggested to perform absmax quantization after the hada transform, especially for 'v', to avoid out-of-distribution issues due to large activations.
Quantization Strategy Shift Proposed: To mitigate the quantization challenges with FA3, a strategy shift was proposed involving applying absmax quantization post Hada transform for better performance.
- The focus is particularly on the 'v' component, which is prone to large activations and out-of-distribution behavior if not properly quantized after the transform.

GPU MODE ▷ #jobs (1 messages):

Internship Opportunity, Low-Level Programming, LLM Inference, Mobile and PC Platforms

Internship Seeks Low-Level LLM Inference: A user is seeking interns for low-level programming to improve LLM inference on mobile and PC platforms, directing interested parties to this GitHub repo.
- The repo hosts code for running llama gemma on cupy and numpy.
GitHub Repo Focuses on llama gemma: The GitHub repository provided focuses on running llama gemma on cupy and numpy.
- This suggests a focus on optimizing LLM performance using numerical computing libraries.

Link mentioned: GitHub - githubpradeep/llm_np_cp: running llama gemma on cupy and numpy: running llama gemma on cupy and numpy. Contribute to githubpradeep/llm_np_cp development by creating an account on GitHub.

GPU MODE ▷ #beginner (5 messages):

Triton tensor creation, ROCm support for RX 7800 XT, NVIDIA GPU alternatives

Creating Triton Tensors of Scalars: A user inquired about explicitly creating a tensor of a scalar in Triton to specify the data type.
- They tried mask = tl.tensor(0xF, type=tl.uint16) but reported that it did not work.
RX 7800 XT ROCm Woes: A new GPU programming and AI user with a NITRO+ AMD Radeon™ RX 7800 XT 16GB reported issues getting it to work with PyTorch and other AI libraries.
- They noted that ROCm doesn't support anything lower than 7900.
NVIDIA GPUs: A Viable Alternative?: A member jokingly suggested to sell your GPU and buy NVIDIA, then provided serious alternatives for accessing NVIDIA GPUs.
- They recommended platforms like lightning.ai for free monthly GPU hours (availability unconfirmed), and runpod and vast.ai for affordable GPU renting.

GPU MODE ▷ #self-promotion (8 messages🔥):

Tilelang Kernel, Deepseek flashmla, MLA leaderboard, Bitnet group

Tilelang kernel rivals Deepseek flashmla: A member shared that with 80 lines of tilelang kernel code you can get 95% performance of deepseek flashmla (500% faster than triton) on H100, with a link to the GitHub repo.
Desire expressed for MLA leaderboard: One of the members expressed the desire to have an MLA leaderboard so others can flex.
- That same member asked another if they would be interested in a working group and re-purposing the bitnet group.
Tilelang Kernel receives positive feedback: A member stated that the kernel is well written.
- That same member stated that the documentation should probably be a tutorial/blog in itself.

Link mentioned: tilelang/examples/deepseek_mla at main · tile-ai/tilelang: Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels - tile-ai/tilelang

GPU MODE ▷ #reasoning-gym (11 messages🔥):

Chain of Draft PR, Throttling errors

Chain of Draft PR: Drafty Thinking Makes LLMs Swiftly Reasoning: A member added a PR for two new system prompt styles similar to those in the Chain of Draft paper.
- The Chain of Draft paper introduces a paradigm where LLMs generate minimalistic intermediate reasoning outputs, reducing verbosity and cost.
Eval Script Praised for Logging Upgrade: Members are grateful that whoever fixed the eval script is the goat, resulting in much better logging.
- However, there is an open issue with throttling errors that still needs fixing, and no intermediate saving of results is implemented yet.

Link mentioned: Chain of Draft: Thinking Faster by Writing Less: Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-ste...

GPU MODE ▷ #gpu模式 (4 messages):

Tilelang, MLA, FlashMLA, Python

Tilelang Touted as MLA/FlashMLA Alternative: A member suggested to "directly all in tilelang", claiming it is as fast as MLA and flashMLA but only requires 80 lines of Python code.
Enthusiastic response to tilelang suggestion: A member thanked the original poster for sharing and expressed eagerness to learn tilelang.

GPU MODE ▷ #general (1 messages):

prefixsum submission, H100 submission

Prefixsum Submission Purge Requested: A user requested the removal of their top H100 submission on prefixsum due to a naming discrepancy.
- The user believes that removing this submission will facilitate easier tracking of changes.
H100 Prefix Sum: Discussion about top H100 Submission for Prefix Sum.
- A user requested the removal of this submission to better track changes.

GPU MODE ▷ #submissions (17 messages🔥):

Leaderboard Submissions, Leaderboard Name Mismatches, Successful Submissions, GPU Usage

Cluster-Bot Flags Leaderboard Name Clashes: The Cluster-Bot detected that the leaderboard name specified in the command doesn't match the one in the submission script header, defaulting to submitting to the specified name (e.g., grayscale, vectorsum, vectoradd).
Modal Runners Yield Successful Submissions: Several leaderboard submissions succeeded using Modal runners on various GPUs, including H100 for conv2d (ID 1509), T4 for matmul (IDs 1510, 1512), and A100, L4, H100 for vectorsum (ID 1516).
Missing Leaderboard Name Causes Cluster-Bot to Complain: The Cluster-Bot prompted users to specify the leaderboard name either via command argument or within the submission script using the {#,//}!POPCORN leaderboard <leaderboard_name> directive.
Vectoradd Leaderboard Sees Multiple Test and Benchmark Submissions: Multiple test (ID 1526, 1528) and benchmark (ID 1527, 1529) submissions to the vectoradd leaderboard succeeded using A100 and H100 GPUs with Modal runners.

GPU MODE ▷ #ppc (3 messages):

AVX512, FMA instruction, Performance Improvement

AVX512 Broadcasts Directly in FMA Instruction: A member suggested exploring AVX512's capability to perform ||broadcasts directly within the FMA instruction|| for potential performance improvements.
- Another member acknowledged they had not considered this and would investigate it further, signaling potential interest in leveraging AVX512's features.
Seeking Drastic Improvements via AVX512: A member expressed the need for a drastically different or new approach to achieve the next level of improvement.
- The suggestion to explore AVX512's FMA instruction with direct broadcasts was presented as a potential avenue for achieving this significant advancement.

GPU MODE ▷ #feature-requests-and-bugs (10 messages🔥):

L4 & T4 Timeout, AMD MI300s, Beta Launch

L4/T4 Timeout Issue Lingers: The issue with L4 and T4 timing out during compiling hasn't been resolved due to development delays.
- A member mentioned a slight delay because the team is working on more interesting problems with some real life impact.
MI300s Launch Possibly Incoming!: There was an improvement in the status with AMD, so the team will hopefully launch with MI300s as an option.
- The team emphasized that this is not promising anything, but it is an option.
Beta Launch a Roaring Success: The team expressed satisfaction with the beta/alpha launch and decided to tackle more interesting problems sooner than planned.
- This launch was originally planned for the end of April.

OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

Travel Reels, AI agents, Trip Planning

App emerges for Saving Travel Reels: An app was created to solve the endless cycle of saving travel reels on social media and then wasting hours researching each spot manually.
- The app (https://thatspot.app/) uses AI agents to automatically process travel reels, extracting every place mentioned with locations, price ranges, reservation requirements, booking links, and operating hours.
AI Agents Automate Travel Research: The app leverages AI agents to streamline the manual research process associated with planning trips from saved travel reels.
- It automatically extracts precise locations, price ranges, reservation requirements, direct booking links, and operating hours, directly from travel reels.

Link mentioned: ThatSpot Guide: no description found

OpenRouter (Alex Atallah) ▷ #general (126 messages🔥🔥):

Google Flash 2.0 Error, Claude 3.7 Sonnet Rate Limits, OpenRouter API Key with VS Studio/RooCode, BYOK azure models in openrouter, Accessing Links in Chat Models

Google Flash 2.0 throws error: A user reported receiving a 502 error when inferencing with Google's Flash 2.0 and Flash 2.0 Light models, with the error message "Provider returned error" and an internal error encountered by Google.
- A member suggested putting the request in the appropriate Discord channel.
Rate Limits of Claude 3.7 Sonnet Discussed: A user inquired about the rate limits for Claude 3.7 Sonnet in terms of RPM (Requests Per Minute) and TPM (Tokens Per Minute).
- A member stated that OpenRouter doesn't have specific rate limits for individual users, and if rate limits are hit, it's usually the OpenRouter limit, which is higher than Tier 4 (see Anthropic's rate limits documentation).
Struggles with OpenRouter API Key in VS Studio/RooCode: A user encountered a 401 Authentication Failure while trying to use an OpenRouter API key in VS Studio via RooCode, despite having funds in their OpenRouter account.
- Members suggested checking the API key for correctness, ensuring OpenRouter is selected as the API provider in RooCode, and verifying the base URL is correctly configured based on this tutorial.
Requesting BYOK azure models in OpenRouter: A user asked about using BYOK (Bring Your Own Key) with Azure models in OpenRouter, aiming for a unified API to use finetuned models through openrouter.
- A member stated that it's not possible to use models other than what's listed in the /models endpoint, which only returns public models and not ones with BYOK. However, you can use your own OpenAI API Key in Integration settings (OpenRouter Integration Settings)
Navigating the Labyrinth of OpenRouter Latency: A user inquired about improving the time to first token (TTFT) latency on OpenRouter, noting their finding that OpenRouter has an average of 2x TTFT compared to using providers directly.
- A team member asked the user to consolidate their findings in a forum post and mentioned that reducing latency is currently a high priority.

Links mentioned:

LM Studio ▷ #announcements (1 messages):

LM Studio SDK, Python, TypeScript, Agent API, MIT License

LM Studio SDKs Arrive for Python and TypeScript!: LM Studio launched software developer kits for Python (lmstudio-python) and TypeScript (lmstudio-js), both under the MIT license.
- These SDKs allow developers to tap into LM Studio's AI capabilities from their own code, including LLMs, embeddings models, and agentic flows.
LM Studio Introduces Agent-Oriented .act() API: LM Studio introduced its first agent-oriented API, the .act() call, where the model autonomously executes tasks using provided tools over multiple rounds.
- This API lets you give it a prompt and tools, and the model goes on its own autonomously for multiple execution rounds until it accomplishes the task (or gives up).
LM Studio SDK Documentation Now Live: LM Studio released documentation for the Python (lmstudio-python) and TypeScript (lmstudio-js) SDKs, providing resources for interacting with LLMs, embedding models, and agentic flows.
- The SDK allows you to use LLMs to respond in chats or predict text completions, define functions as tools, and turn LLMs into autonomous agents that run completely locally.

Links mentioned:

LM Studio ▷ #general (100 messages🔥🔥):

Context Length Error, Model Architecture unsupported by Llama.cpp, LM Studio CLI Commands, LM Studio SDKs, LM Studio Downgrading

LM Studio model failing to load and showing "Unsupported device" error: Users are encountering a Failed to load model error with the message Unsupported device after updating LM Studio, potentially due to insufficient memory or incompatible model loading settings and were advised to try adjusting GPU offloading or thread pool size.
- Context length can also impact memory usage; the left number is the number of tokens the model is using in the chat history already while the right number is the context limit (basically how long until the memory starts to truncate).
Diffusion Model architecture not supported by Llama.cpp: Users are receiving error loading model architecture: unknown model architecture: 'sd3' when trying to load diffusion models, and it was clarified that llama.cpp does not support image/video/audio generation models.
- Support for vision models in llama.cpp is uncertain, with concerns about the lack of Llama 3.2 vision or Pixtral vision support, however, some believe that UI-TARS fixes will help a lot more.
Pseudollama bridges the OLLAMA gap: A user asked if LM Studio endpoints were compatible with apps that take an OLLAMA endpoint, and it was answered that it is not supposed to work by default, but Pseudollama can bridge the gap.
- The author noted that this is 100% vibe coded, so there are likely dumb issues throughout, but it works.
LM Studio SDK Documentation Released: With the most recent release of LM Studio, LM Studio CLI commands were documented, and a user confirmed that the OpenAI API continues to be supported and prioritized.
- One member of the community noted that they were waiting for this to come up so I can make a plugin, noting that they wanted to make a watch app communicator.
Users find way to Downgrade LM Studio: A user needed to downgrade to version 0.3.10 because the new version removed the ability for their preset to do a tensor split between their two cards.
- Another user suggested using the web archive to find the download link, while another said to just switch up the download link parameters.

Links mentioned:

LM Studio ▷ #hardware-discussion (21 messages🔥):

AMD and Intel vs CUDA, Vulkan vs CUDA, AMD GPU market share, Nvidia 5090 specs

AMD and Intel could rival CUDA: Members discussed whether AMD or Intel could become viable for ML pipelines and frameworks to compete with CUDA.
Vulcan is not a CUDA competitor: Members mentioned that Vulkan is a graphics API whereas CUDA is built for GPGPU compute tasks, so using Vulkan for computing it's like using dx12 for computing: You can do it, but does it make sense?
- One member stated, competition and having alternatives to Vulcan will be nothing but good for consumers.
AMD needs higher market share to invest in GPU Compute: Some members believe if AMD increases their market share, they would be more interested in investing in their GPU computing department.
- One member believes they've been happy holding their 10%.
Nvidia 5090 Spec Numbers are sought out: A member recalls seeing some Nvidia 5090 FP8/16/32 spec numbers and is asking where to find those.
- A member also shared an image of their 4x 3090 gang setup.
Chip Foundry Time Acquisition: One member believes that the real question is whether AMD can buy the time from a chip foundry, because Nvidia has the upper hand.
- Another member said Intel opening their next one in ~2030 so expecting nothing from AMD.

Nous Research AI ▷ #general (93 messages🔥🔥):

Low-rank space reasoning, Nous API, CUDA Kernels, Hermes 3 erotic fiction, Ollama usability

Low-Rank Space Makes PEFT Reasonable: A member suggested that since reasoning differences from base models are often in low-rank space, using PEFT for training becomes a reasonable approach, further suggesting the use of Qwen 0.5B for low-cost testing.
- Others agreed that low vram unsloth RL trainer seems to work well because of this.
Nous API is coming?: A member suggested that Nous provide their own API to use their models to generate income, especially given that current income is inconsistent, suggesting a possible pricing of $0.8/M tokens and estimating potential revenue of $800-1600/day.
- Others suggested that Nous could charge closer to $1/M input tokens, $3/M output tokens for forge and others noted that there are ongoing efforts to make this happen.
LLMs struggle creating performant CUDA kernels: Members discussed generating CUDA kernels with LLMs, with the consensus that while LLMs can output CUDA syntax, no LLM is good at producing performant kernels on their own.
- The best strategy seems to involve augmenting the LLM with hardware and compute graph information, possibly using a knowledge graph or GNN, with a semi-manual approach involving extensive GPU profiling.
Hermes 3 Writes Erotic Fiction: A user praised Hermes 3's unexpected talent for writing erotic fiction, expressing excitement for a future NousChud model iteration.
- Another member mentioned they always have something in the works but they have a preference for models that can be run without needing a datacenter.
Ollama criticized for Beginner-Centric Design: While Ollama is considered okay for beginners, it's criticized for being terrible for people that are past the beginner stage because it defaults to Q4 quantization even for 7B and 8B models.
- Alternatives like llama.cpp or koboldcpp were suggested for more advanced users, but it was acknowledged that configuring and maintaining an environment is a different skill and can be too much to throw at someone at once.

Links mentioned:

Nous Research AI ▷ #research-papers (8 messages🔥):

Logic-RL, Rule-Based Reinforcement Learning, General World Models, Worldsim

Logic-RL Unleashes Reasoning with Rule-Based Reinforcement Learning: A new paper (Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning) explores the potential of rule-based reinforcement learning (RL) in large reasoning models, inspired by DeepSeek-R1.
- The 7B model, trained on just 5K logic problems, demonstrates generalization abilities to the challenging math benchmarks AIME and AMC.
Runway Introduces General World Models: Runway introduced General World Models, envisioning AI systems that build internal representations of environments to simulate future events.
- They aim to represent and simulate a wide range of situations and interactions, moving beyond limited and controlled settings like video games or driving simulations.
Community Discusses Generative AI's Potential in Worldsim: Members discussed the potential of generative AI to simulate entire worlds as interactive experiences, hinting at the profound capabilities of emergent LLM world models.
- There was additional conversation that work to continue worldsim, is likely to appear in blog form, although making it a paper is the final goal.

Links mentioned:

Nous Research AI ▷ #interesting-links (1 messages):

``

No Topics Found: No relevant topics were found in the provided messages.
No Summaries Available: Unable to create summaries due to lack of meaningful content.

Link mentioned: San: no description found

Nous Research AI ▷ #research-papers (8 messages🔥):

Rule-Based Reinforcement Learning (RL), DeepSeek-R1, Logic-RL, Worldsim, General World Models (GWM) by RunwayML

Logic-RL Unleashes LLM Reasoning with Rule-Based RL: A new paper, Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning, explores using rule-based RL in large reasoning models, drawing inspiration from DeepSeek-R1.
- The paper highlights key contributions such as a system prompt that emphasizes the thinking and answering process, a stringent format reward function that penalizes outputs for taking shortcuts, and a straightforward training recipe that achieves stable convergence.
RunwayML Introduces General World Models (GWM): RunwayML is starting a new long-term research effort around what they call general world models (GWM), aiming to build AI systems that understand the visual world and its dynamics to simulate future events.
- They believe the next major advancement in AI will come from systems that understand the visual world and its dynamics.
Exploring LLMs' Emergent World Models: Discussion revolves around the idea that LLMs construct world models as an emergent property from training on large datasets.
- The conversation seeks to understand the capabilities and limitations of these emergent world models, especially how a general world model might enhance experiences or open new creative frontiers.
Worldsim's Profound Potential in Generative AI: Worldsim hints at the potential of generative AI as a creative medium, with the ability to simulate entire worlds as interactive experiences.
- One member is preparing work to continue Worldsim, likely in blog form, expressing uncertainty about making it a paper.

Links mentioned:

Nous Research AI ▷ #reasoning-tasks (1 messages):

Qwen2.5-Math-1.5B, longcot examples, dataset structuring, setting up the GRPOTrainer

Qwen2.5-Math-1.5B Model Struggles with longcot Format: A member is experimenting with Qwen2.5-Math-1.5B using longcot examples, but the model is not following the expected format.
- The member seeks assistance with structuring the dataset and setting up the GRPOTrainer, with a link to their Kaggle notebook.
Dataset Structuring and GRPOTrainer Setup Issues: The user is facing challenges in getting the Qwen2.5-Math-1.5B model to adhere to the desired format when using longcot examples.
- They suspect the issue lies either in the dataset's structure or in the configuration of the GRPOTrainer, and are requesting guidance.

Interconnects (Nathan Lambert) ▷ #news (28 messages🔥):

Unitree Open Source, Gemma 3 Release, GPT-4.5 tops Arena leaderboard, Post-Training Interpretation, Anthropic $3.5B Funding

Unitree Robotics open sources repos: Unitree Robotics has open-sourced a bunch of their repos, with a link provided to their GitHub.
Gemma 3 release: The release of Gemma 3 was announced for March 12.
GPT-4.5 takes the Arena Lead: GPT-4.5 now tops the Arena leaderboard across all categories, including Multi-Turn, Hard Prompts, Coding, Math, Creative Writing, Instruction Following, and Longer Query (source).
Eliciting the potential of Post-Training: Most improvements in models from OpenAI, Anthropic, and Google over the last 18 months are from the post-training phase, akin to F1 teams improving car performance through aerodynamics and systems changes (source).
Anthropic Raises Billions: Anthropic raised $3.5 billion at a $61.5 billion post-money valuation, led by Lightspeed Venture Partners (source).

Links mentioned:

Interconnects (Nathan Lambert) ▷ #ml-drama (8 messages🔥):

BobbyBroccoli videos, Deep Learning History, Shun-Ichi Amari

X says Twitter App is Free: A member linked to a tweet stating this app is free X Tweet.
- The member questioned whether other fields are like this, answering, The answer's gotta be "No," right.
Juergen's Deep Learning History: A member suggested reading Juergen Schmidhuber's Deep Learning History.
- The member also highlighted Shun-Ichi Amari as someone they hadn't heard of before but found interesting.

Link mentioned: Tweet from loss (derogatory) (@untitled01ipynb): gm this app is free

Interconnects (Nathan Lambert) ▷ #random (34 messages🔥):

Grok3 Pricing, LLM Summarization Ethics, Anon-Kode GitHub, Taiwan Security

Grok-onomics: Pricing Leaks Online!: Likely leaked Grok3 pricing indicates costs of $3.50/million for input, $0.875/million for cached input, and $10.50/million for output, as per this tweet.
LLMs Can't Keep a Surprise?!: Debate sparked regarding whether an LLM summarizing chat logs for a user named John should include plans for his surprise birthday party, raising questions about implicit contextual understanding and ethical boundaries.
- A member stated that it depends on if it’s John summarizing or another user asking for a summary to give to John.
Anon-Kode: Claude Telemetry-ectomy: Anon-Kode is a GitHub project that removes telemetry from Claude-Code and replaces the Anthropic endpoint with a customizable OpenAI endpoint.
- Some users expressed concern about the implications of removing Anthropic's license.
Taiwanese Tensions Trending?: Concerns were raised regarding US commitment to Taiwan security, prompted by POTUS comments at a press conference which coincide with TSMC's $100 billion investment in the US (timestamp).
- The same press conference also lauded David Sacks's intellect.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #memes (2 messages):

Anthropic Funding, AI development, International Expansion

Anthropic Achieves Colossal Capital Infusion: Anthropic has raised $3.5 billion at a $61.5 billion post-money valuation, led by Lightspeed Venture Partners.
- This funding aims to advance their AI systems' development, deepen understanding of their functionality, and propel international growth.
Huwupy Kawaii Social Post: A user shared a link to a post on bsky.app.
- No context was provided.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #rl (1 messages):

Med-RLVR

Med-RLVR: Emerging Medical Reasoning from a 3B base: A new paper called Med-RLVR: Emerging Medical Reasoning from a 3B base has been published.
- The model has 3B parameters.
Medical Reasoning: The paper focuses on medical reasoning abilities.
- It explores how a relatively small model can perform in complex medical scenarios.

Interconnects (Nathan Lambert) ▷ #reads (11 messages🔥):

Post-Training Methodologies for LLMs, In-House Data Labeling for SOTA Models, Human Data vs Synthetic Data, Disentangling Post-training Performance from Data

LLMs get new Survey on Post-Training Methodologies: A new survey paper (https://arxiv.org/abs/2502.21321) explores post-training methodologies for Large Language Models (LLMs), analyzing their role in refining LLMs beyond pretraining, including fine-tuning, reinforcement learning, and test-time scaling.
Yeager Chooses In-House Data Labeling: Enhanced Radar's Yeager, a SOTA model that understands air traffic control audio, chose to label data in-house due to the industry-specific technical complexity, resulting in a high degree of standardization and near-perfect accuracy.
- Compensation was tied to the number of characters transcribed and financial penalties were assessed.
Human Data's still needed: A blog post (https://www.amplifypartners.com/blog-posts/annotation-for-ai-doesnt-scale) argues that real, human data is still needed to build AI products that are genuinely useful, disagreeing with the belief that synthetic data will be sufficient to drive step-change improvements in model performance.
Delving into Post-training performance: A Notion page (https://mohit-raghavendra.notion.site/Disentangling-Post-training-performance-from-data-1a5db7f2a34480e18010d689a1f46f74) discusses disentangling post-training performance from data.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #policy (3 messages):

TSMC $100B investment in U.S. chip factories

TSMC Mulls Massive US Chip Expansion: The CEO of TSMC is reportedly heading to the White House to discuss a potential $100B investment in U.S. chip factories, according to this tweet and The Information.
- One member joked that "if we do 1T I won’t be so pessimistic" about the deal.
TSMC Investment Optimism: A member expressed skepticism about the initial $100B investment, considering it insufficient.
- The member indicated a more positive outlook if the investment reached $1T.

Link mentioned: Tweet from Anissa Gardizy (@anissagardizy8): new: The CEO of TSMC is heading to the White House today to talk about a $100B investment in U.S. chip factories https://www.theinformation.com/briefings/trump-tsmc-to-announce-100-billion-chip-factor...

Yannick Kilcher ▷ #general (44 messages🔥):

Bachelor's degree project ideas in VLMs, Automating IRL jobs, Finding interesting problems to solve, Literature review article in AI, invite link to discord server

Brainstorming VLM Project Ideas: A member sought ideas for a final year bachelor's project involving VLMs, open to suggestions on other hot topics suitable for a year-long project with potential for publication.
Claude Completes GitHub PR: A member reported using Claude and Cursor to complete 95% of the work on this GitHub Pull Request.
Exploring Novel Architectures for Deep Learning: A member proposed comparing X-Splines with foundational architectures like Transformers, RNNs, CNNs, GNNs, Mamba, and KANs.
Overcoming Contextual Limitations in Knowledge Graphs: Discussion of methods to improve knowledge graphs/hypergraphs by assigning separate nodes for words based on their context/concept to differentiate between concepts and instances.
Automating IRL jobs: A member mentioned they are working on automating their in-real-life job, without telling anyone.

Links mentioned:

Yannick Kilcher ▷ #paper-discussion (7 messages):

Joscha Bach, Presentation Time Slot

Bach Talk Brainstorming Before Presentation: A member initially considered presenting on Joscha Bach, but it's unclear if this was the final topic.
- Another member offered to present in the <t:1741046400:F> timeslot if there were no other presentations scheduled.
Presentation Still In Progress!: A member lamented missing the presentation, but another member clarified that it was still ongoing.
- A participant thanked the presenter, and the presenter offered further advice to those interested.

Yannick Kilcher ▷ #ml-news (3 messages):

Elsagate 3.0

Elsagate 3.0: A Horrifying Discovery: A member shared a YouTube video titled "Elsagate 3.0 Is Worse Than we Thought" with a warning that it is NOT FOR CHILDREN.
- Another member responded, stating, "Well, that is horrifying."
Additional Topic Placeholder: This is a placeholder to ensure the minimum number of topics is met.
- Further details can be added if the conversation provides more substance.

Link mentioned: Elsagate 3.0 Is Worse Than we Thought.: THIS VIDEO IS NOT FOR CHILDREN. VIEWER DISCRETION IS ADVISEDGet a FREE sample pack from Five, just pay shipping (must be 21+): https://bit.ly/FreeFiveRaymund...

Notebook LM ▷ #use-cases (16 messages🔥):

Financial statement analysis in NotebookLM, Podcast length, Notebook Combination, Blog Outline, Podcast customization

Financial Statement Analysis in NotebookLM: A member inquired about loading financial statements for analysis into NotebookLM.
Podcast length concerns: A member expressed that most podcasts are 20 to 30 mins and don’t cover some of the important topics, referencing a Supreme Court Application.
- Another member said, “You can get that long?”, with a link to US_Dept._of_State_v._AIDS_Vaccine_Advocacy_Coalition.
Notebook Combination is Scaled: A member asked if it was possible to combine Notebooks in NotebookLM.
- A moderator confirmed this feature is not available yet, but has been escalated to the nblm team for consideration.
Blog Outline: Yes or No?: A member inquired about using NotebookLM for blog outline writing.
- Another member simply replied with, “Yes”.
Podcast Guest Customization Commands: A member asked about using custom commands for Podcasts in NotebookLM.
- The other member used a command like 'the hosts interview lawyers representing both sides of the case' and also noted issues with accuracy when summarizing YouTube podcast episodes about guests.

Notebook LM ▷ #general (33 messages🔥):

Dynamically updated sources, Google Docs integration, Podcast timelines, Copying and pasting index numbers, Bulk deleting sources

Users Discuss Dynamic Source Linking and Google Docs: Members are curious if NotebookLM can dynamically update from sources like Google Docs, for use cases like tracking furniture dimensions, with some concerns about manual updates.
- The answer is no, it's not automatic, leading to discussions about workarounds and feature requests.
Podcast Timelines & Creation Requests Aired: A member requested that timelines be added to the podcast free version.
- Another user asked how to create a podcast, which was met with excitement by another member who found the feature very impressive, sharing an example notebooklm podcast.
Citation Links Vanish After Saving Notes: Members noticed that citation links disappear when saving query results as notes.
- A member clarified that saved notes are "view only" and the citation links are only available on the chat and specific response.
NotebookLM Lacks Mobile App and Icon Customization: Users inquired about a mobile app for NotebookLM and the ability to change notebook icons.
- The responses confirmed that there is no mobile app, nor is there a way to change the icon of a notebook.
Notebook Sharing Snafu Resolved!: A user reported a server error when sharing notebooks with Gmail personal accounts, specifically "You are not allowed to access this notebook".
- The issue was resolved by the user, it was the recipient who had a new phone not correctly configured with his gmail account.

Link mentioned: no title found: no description found

Stability.ai (Stable Diffusion) ▷ #general-chat (40 messages🔥):

IP Adapter, Reactor Faceswap, ControlNet, Reforge AMDGPU support, Zluda

IP-Adapter Face Copy Alternatives: A member was seeking the best IP-Adapter for face copy but found reference only in ControlNet to be sufficient.
- Another member suggested Reactor Faceswap as a preferable alternative, praising the all-mighty ControlNet.
Reforge's AMDGPU Support Still Unclear: One member inquired about Reforge supporting AMDGPU, noting its mention on the Stability Matrix but absence from the GitHub page.
- Another member attempted using Zluda but encountered PC freezes at launch, advising against relying on the Stability Matrix due to perceived bugs, and recommending sticking to a UI outside matrix.
DirectML and Reforge Compatibility Struggles: A member tested Reforge with DirectML after Zluda failed, but it did not work.
- There was a discussion of a possible fork of Reforge for AMD by Lshqtiger.
CivitAI as a Generation Request Platform: A member asked for image generation requests and discussed the CivitAI website, noting it provides a few starting credits and 25 free credits daily that can be saved.
- The usage cost depends on the model.
Requirements to generate images locally: A member asked how to create images and another member noted that a GPU with around 6-8GB VRAM and resources in <#1002602742667280404> are recommended.
- Another member offered links to CivitAI for online generation.

Eleuther ▷ #general (13 messages🔥):

Finding good problems to solve, EleutherAI affiliation projects, RWKV models, 4D gaussian splatting

Members seek guidance on finding good problems to tackle: A member asked for suggestions on finding good problems to solve and the methodology for deciding on ideas and problems to work on.
- Another member suggested reading through topics of interest, searching relevant literature, and identifying a reasonable jumping-off point by asking "what if" questions.
EleutherAI projects: involvement and focus: A member clarified that most people in the server do not work on projects under an EleutherAI affiliation.
- He pointed out that there's significant activity in Interpretability channels, along with interest in RWKV models and evaluations on the NLP side, in addition to the GPT-NeoX team pushing out a new training library.
Diving deep into research with a 4D Gaussian Splatting Example: A member described their research process using the example of improving 4D gaussian splatting (3D + time).
- They suggested starting with established work, reproducing it, then taking one experimental step towards your idea to deeply understand the problem domain and inform your next deep dive.

Eleuther ▷ #research (15 messages🔥):

Reasoning Model, GRPO based Agent, LLAMA 3.2 3B, Recurrent LLM reasoning, Atom of Thoughts (AoT)

ReasonableLLAMA-Jr-3b Model Needs Your Feedback!: A member is seeking feedback on their ReasonableLLAMA-Jr-3b model, a reasoning model trained/finetuned using GRPO on LLAMA 3.2 3B quantized, using a custom written GRPO based Agent in Gym Env using MLX.
- The model is based on the concepts described in the Atom of Thoughts (AoT) paper, where each state transition in the reasoning process is a self-contained, atomic question.
Recurrent LLM Reasoning: Too Expensive?: Members discussed that recent recurrent LLM reasoning papers require significantly more compute (equivalent to a 32B parameter model) to achieve performance comparable to a 7B parameter model.
- It begs the question: why not just train a 32B parameter model and use early exit, mixture of depths, or speculative decoding for cheaper inference?
Truncated Backpropagation: Still a Memory Hog?: Although recurrent models use truncated backpropagation, the truncated depth (e.g., 8) may still correspond to the activations of a significant-sized model (e.g., 15B).
- A member wondered if a DEQ type training would have worked, and questioned whether the r and k parameters were optimized.

Link mentioned: Atom of Thoughts for Markov LLM Test-Time Scaling: Large Language Models (LLMs) achieve superior performance through training-time scaling, and test-time scaling further enhances their capabilities by conducting effective reasoning during inference. H...

Eleuther ▷ #lm-thunderdome (10 messages🔥):

trust_remote_code in lm-evaluation-harness, dataset_kwargs override, dataset loading errors, data_dir specification

Trust Remote Code Conditional Setting: A user inquired about trust_remote_code being always set in lm-evaluation-harness's dataset loading, referencing a specific line in the GitHub repo.
- A member clarified that trust_remote_code is only set if the --trust_remote_code argument is passed, pointing to the relevant section of the code.
Dataset Kwargs Pathway revealed: A user asked if setting trust_remote_code would override dataset_kwargs when loading a local dataset.
- A member explained that dataset_kwargs are passed to datasets.load_dataset(...) within the harness, linking to the relevant part of the code.
Dataset Generation Error Arises: A user reported encountering a dataset generation error when running lm_eval with a specific task configuration.
- The user's task configuration specifies dataset_path: json and a data_dir containing train.jsonl, validation.jsonl, and test.jsonl.
Manual dataset loading suggested for debugging: In response to the reported error, a member suggested testing whether the dataset loads correctly using load_dataset manually.
- The member also suggested trying an absolute path for the data directory, to rule out path-related issues.

Links mentioned:

MCP (Glama) ▷ #general (36 messages🔥):

Terraform Registry MCP issues, MCP Multi-Agent Systems, fast-agent GitHub repo, Claude desktop FastMCP errors, MCP server claiming problems

MCP Terraform Troubles Surface: A member reported issues getting terraform-registry-mcp and aws-mcp server to work, seeking advice beyond the inspector.
- They clarified the problem occurs with Claude desktop and Cline, specifically when a system-level proxy is enabled, causing errors with mcp-server-fetch.
Multi-Agent MCP Musings Materialize: A member discussed implementing MCP for multi-agent systems, referencing an Anthropic workshop at AI Engineering Summit and shared an image from the workshop.
- They mentioned building a framework for agents to cooperate across devices and considering adopting MCP, noting examples like BabyAGI and Stanford generative agents.
Fast Agent Framework Finds Fans: A member shared a link to their project, fast-agent on GitHub, describing it as a way to Define, Prompt and Test MCP enabled Agents and Workflows.
- They confirmed that each agent can be configured with a separate set of servers, clarifying that each agent can be called as a tool by another agent and configured with a set of MCP Servers, each of which exposes tools.
Claude's Node Version Vexes Users: A member reported encountering a Cannot find package 'timers' error when using fastmcp in Claude desktop.
- The solution was to remove the old Node v14 version, which Claude was using.
Twitter API Pricing Puts a Damper on Tweet Dreams: A member explored using MCP to connect to a Twitter account for pulling and generating tweets but acknowledged the challenge of Twitter's API costs.
- A user suggested browser automation as a more cost-effective alternative for pet projects and pointed to a recent browser automation example.

Links mentioned:

MCP (Glama) ▷ #showcase (2 messages):

MCPHub.nvim, Graphlit MCP Server, Neovim Plugin, Model Context Protocol

MCPHub.nvim makes MCP server management smooth: A new MCPHub.nvim plugin was released which helps manage MCP servers in Neovim offering features like smart server lifecycle management and integration with CodeCompanion.nvim for AI chat.
- The plugin can be installed with one command (:MCPHub) and configured with a simple setup function.
Graphlit MCP Server released: The Graphlit MCP Server has been launched, offering new content ingestion and retrieval capabilities for MCP clients like Claude Desktop, Goose, Cline, Cursor, and Windsurf.
- The server is open-source and requires a free Graphlit account and project to store the knowledge base.

Links mentioned:

DSPy ▷ #general (30 messages🔥):

Ash Framework, instructor_ex, Async Support in DSPy, LangProBe Benchmark, Minions Feature Benchmarks

Ash Framework EcoSystem Explored: A member suggested the Ash framework for a project, linking to the ash-project/ash_ai GitHub repository, but clarified it's a sub-project within the larger Ash framework ecosystem.
- They also shared a link to instructor_ex, highlighting structured outputs for LLMs in Elixir, along with the Ash Discord community where a live streamer can provide guidance.
DSPy's Async Support Initiative: A member inquired about the motivation for full async support in DSPy, questioning potential performance boosts and production concerns, linking to another Discord invite link.
- A core contributor announced that a lead would be making async support as native as needed, asking the community to specify expectations and workflows, and requesting feature requests to be submitted as GitHub issues due to potential Discord oversight during work hours.
LangProBe Benchmark Shows Program Composition Effects: A member shared a new paper, LangProBe: a Language Programs Benchmark, evaluating the impact of DSPy program composition and optimizers on various tasks, as well as understanding cost/quality tradeoffs.
- According to its X/Twitter post, the paper finds that smaller models in optimized programs can outperform larger models at a lower cost.
Minions Benchmarks Prepare for Cost Optimization: A member noted that the just-dropped LangProBe paper provides a good baseline for running benchmarks on their implemented minions feature, referencing their closed pull request.
- The member was adding MinionsLM and StructuredMinionsLM for intelligent LM routing by jmanhype. They noted the paper is directly related to cost optimization.

Links mentioned:

LlamaIndex ▷ #blog (2 messages):

Workflow-based travel planner, LlamaParse updates, AnthropicAI Claude Sonnet 3.7, Google Gemini 2.0 Flash

Travel Planner: RS Rohan Takes You Places!: RS Rohan demonstrates how to build an advanced, agentic travel planner in @llama_index, which extracts travel information from user queries and delegates tasks to specialized agents (tutorial and repo).
LlamaParse Gets an Upgrade: The 'Parse With Agent' mode now supports AnthropicAI Claude Sonnet 3.7 and Google Gemini 2.0 Flash, improving table parsing and cross-page consistency (announcement).

LlamaIndex ▷ #general (19 messages🔥):

AgentWorkflow context vs chat history, MCP Support, PII redaction with LLMs, Anthropic DeltaStream

AgentWorkflow: Context vs. Chat History Clarified: A member inquired about the difference between Context and Chat History in AgentWorkflow, and when to use each.
- Another member clarified that the chat history is inside the context.
MCP Support in LlamaIndex Verified: A member inquired about MCP support in LlamaIndex, and another member confirmed its existence.
- They provided a link to an example notebook demonstrating its usage.
Seeking PII Redaction Tools for LLMs: A member is looking for paid or open-source tools to assist with sending PDFs and images containing Personally Identifiable Information (PII) to a Large Language Model (LLM).
Anthropic's DeltaStream causes ValueError: A member reported that Anthropic now has a different DeltaStream that LlamaIndex doesn't support, specifically, with thinking enabled, it streams a delta with type ThinkingDelta which is not instance of TextDelta which causes the library to raise a ValueError.
- A maintainer of the library acknowledged the issue and stated they still need to add better support for it.

Link mentioned: llama_index/llama-index-integrations/tools/llama-index-tools-mcp/examples/mcp.ipynb at main · run-llama/llama_index: LlamaIndex is the leading framework for building LLM-powered agents over your data. - run-llama/llama_index

LlamaIndex ▷ #ai-discussion (1 messages):

Windsurf Checkpoints

Windsurf lacks Checkpoint Feature: A member inquired about the absence of a checkpoint feature in Windsurf, noting the inability to revert to previous states despite repeated coding attempts and file/workspace manipulations.
- The member attached an image illustrating their attempts to drag and drop files into the tab menu, seeking a way to access previous checkpoints.
Another topic required: Dummy Summary as topicSummaries requires minItems of 2.

Latent Space ▷ #ai-general-chat (13 messages🔥):

AI replacing programmers, Senior Engineers vs Junior Engineers, Anthropic Fundraising, Stagehand and Browserbase, Claude Code vs Cursor

Reports of AI Replacing Programmers are greatly exaggerated: An O'Reilly article argues that AI tools will change programming, but it's not new, as programming has always evolved since the first programmers connected physical circuits.
- A member commented that “They've just find/replaced their old articles for AI”, which is exactly like decrying people for copy-pasting from StackOverflow, and further stated that LLMs make learning new things much faster.
Senior Engineers Wield AI Better: Senior engineers apply engineering wisdom to shape and constrain AI's output, preventing it from creating unmaintainable “house of cards code” when using tools like Cursor or Copilot.
- The AI accelerates implementation, but their expertise is what keeps the code maintainable, a skill that junior engineers often miss.
Anthropic Achieves $61.5 Billion Valuation: Anthropic has raised $3.5 billion at a $61.5 billion post-money valuation, led by Lightspeed Venture Partners, to advance AI systems, deepen understanding of how they work, and fuel international expansion.
Members hail Stagehand, seek Python Alternative: After listening to the Latent Space podcast's episode featuring Browserbase, a member sought a self-healing browser workflow tool like Stagehand in Python.
- A member pointed to stagehand-py and stated “it's wip”.
Claude Code vs Cursor, the cage match: Members discussed Claude Code and how it fares against Cursor, citing that Cursor is preferable due to its superior rollback process.
- One member indicated that Claude Code has a harder time staying focused, tends to add extra lines of code, is more expensive, and that “code edits in cursor are way way faster”.

Links mentioned:

tinygrad (George Hotz) ▷ #general (10 messages🔥):

tinygrad formalist project, Ops.CAT speed bounty, RDNA2/RX6000 usable with tinygrad, Intel Arc A770 usable with tinygrad

Tinygrad: Formalist Project Aiming for Fair Compute Marketplace: George Hotz (@tinygrad) describes tinygrad as a formalist project that attempts to capture the full gamut of Software 2.0 in a non-leaky abstraction, aiming for a fair marketplace for compute similar to Linux and LLVM.
- He mentions that by the end of the year, tinygrad should be similar in speed on NVIDIA to the existing torch CUDA backend, but without CUDA, and plans to have a test cloud up where users can rent FLOPS in a lambda function.
Ops.CAT Speed Bounty Still in Progress: A member is working on the Ops.CAT speed bounty, but is still facing issues getting it to rewrite in LLVM even after getting added to the schedule.
- The current Ops.CAT operations involve a complex structure of PAD, RESHAPE, and BUFFER operations, with arg being the two tensors to concat.
RDNA2/RX6000 Usability Questioned: A member inquired whether RDNA2/RX6000/GFX1030 is usable with tinygrad, reporting an OSError: [Errno 22] Invalid argument when running AMD=1.
- Another member responded that it should work on Linux and requested the trace for the OS error, which was subsequently provided in a trace.txt file.
Intel Arc A770: Yes with OpenCL: In response to a question, a member confirmed that Intel Arc A770 is usable with tinygrad.
- The recommended approach is to use the OpenCL backend by setting GPU=1.

Link mentioned: Tweet from the tiny corp (@tinygrad): What is tinygrad?tinygrad is a formalist project. It attempts to capture the full gamut of software 2.0 in a non leaky abstraction. The methods on Tensor class create a directed graph of immutable RIS...

LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 messages):

Charles Sutton, Coding Agents, AI for Vulnerability Detection

Charles Sutton Speaks on Coding Agents: Amazing guest speaker Charles Sutton presented on Coding Agents and AI for Vulnerability Detection, at Lecture 5.
- The lecture explores using LLM agents for computer security tasks, like finding software vulnerabilities, and discusses design issues in LLM agents.
DeepMind Researcher's Software Engineering Accolades: Charles Sutton, a Research Scientist at Google DeepMind, has research in machine learning motivated by applications in code generation, software engineering, programming languages, and computer security.
- Sutton's work in software engineering has received two ACM Distinguished Paper Awards (FSE 2014, ICSE 2020) and a 10-year Most Influential Paper award (MSR 2023).

Links mentioned:

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (4 messages):

Discord Admin Spam Account Removal, Quiz Posting Schedule

Discord Admin Asks for Account Removal: An admin asked for the removal of an account that was spamming links, suggesting re-adding it after the security breach is resolved.
Quiz Posting Day Revealed: A user asked when the quiz is posted each week, to which another user responded that they generally try to release it Wed/Thurs.

LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (4 messages):

Audio issues during lectures

Audio issues plague lecture: A member reported not being able to hear questions during the lecture due to audio problems, requesting assistance from someone in the room.
- Following up, the member noted that the audio was cutting out completely, inquiring if there was an issue with the AV setup in the room.
Speakers to repeat questions after issue: A staff member apologized for the audio issues during the lecture.
- They promised to remind the speakers to repeat all questions going forward.

Cohere ▷ #api-discussions (9 messages🔥):

Embed Images, 504 Errors

Image Embedding Issue Resolved: A user reported an issue with embedding images, but later confirmed that the issue is resolved.
- Another member acknowledged the resolution.
504 Errors Investigated: A member mentioned that they didn't observe a spike in errors but noted super slow requests often resulting in 504 errors.
- The member is planning to investigate further and thanked the user for the information.

Modular (Mojo 🔥) ▷ #general (5 messages):

Renaming ownedtoown, Community meeting, AWS GenAI Loft event

owned Becomes own Thanks to Pull Request: A member created a pull request to rename owned to own for consistency with the rest argument convention.
Community Meeting Approaching - Speakers Needed: The next community meeting is coming up in one week, and the organizers are looking for speakers to give talks or share projects during the meeting.
- If you're interested, please contact the organizers to express your interest and secure a spot on the agenda.
MAX Engine Event at AWS GenAI Loft: If you're in the Bay Area, consider attending an event tomorrow evening at the AWS GenAI Loft, entitled Beyond CUDA: Accelerating GenAI Workloads with Modular’s MAX Engine, Hosted by AWS.

Link mentioned: modular/max: The MAX Platform (includes Mojo). Contribute to modular/max development by creating an account on GitHub.

Modular (Mojo 🔥) ▷ #mojo (1 messages):

SIMD DType, Construction Checks, Globals vs Parameters

SIMD DType Dissected: A user questioned the need to wrap a DType in SIMD for C bindings, as it obscures the original dtype, but another member clarified that SIMD[DType.uint8, 1](0).type returns the dtype at compile time.
- They exemplified with var a = UInt8(0); alias dtype = __typeof(a).type to further clarify the use case.
SIMD Construction Under Scrutiny: A member highlighted that SIMD includes construction checks within its implementation.
- This ensures the validity and type safety of SIMD objects upon creation.
Parameter Injection Praised: When asked about using globals, one of the members stated that injecting parameters is always preferable to using globals.
- The member claimed this is true, if you have the time.

Link mentioned: max/mojo/stdlib/src/builtin/simd.mojo at main · modular/max: The MAX Platform (includes Mojo). Contribute to modular/max development by creating an account on GitHub.

Torchtune ▷ #general (1 messages):

Step-based checkpointing

Step-based Checkpointing Reduces Compute Waste: A member expressed interest in step-based checkpointing to reduce compute waste if a failure occurs during training.
- Another member responded that step-based checkpointing is already being implemented to address this concern.
Ongoing Implementation Addresses Concerns: The ongoing implementation of step-based checkpointing directly addresses the initial worry about wasted compute.
- This feature aims to minimize the impact of failures during training runs by saving progress at regular intervals.

Torchtune ▷ #dev (3 messages):

Profiler traces, Tensorboard, PyTorch memory visualizer tool, Perfetto

Torch Users Tracing with Tensorboard and More: Users discussed strategies for visualizing profiler traces, mentioning initial attempts with Tensorboard and its apparent removal of certain plugin features for PyTorch.
- They recommended the PyTorch memory visualizer tool and Perfetto for memory and timing traces, respectively, as sufficient for following the trail.
Alternative Profiling Tools: The discussion highlighted the PyTorch memory visualizer tool and Perfetto as alternatives for memory and timing traces.
- These tools were suggested after a user reported issues with Tensorboard, which seemed to have removed some plugin features for PyTorch.

Nomic.ai (GPT4All) ▷ #general (3 messages):

Ollama vs GPT4All, Catalan Language support for GPT4All, GPT4All v3.10.0 Vulnerability

Ollama vs GPT4All, which is the best llama?: A user inquired why individuals opt for Ollama or Llama.cpp over GPT4All, asserting GPT4All's superiority due to its out-of-the-box functionality.
Catalan Language Support for the win: A user requested the addition of Catalan as a language option for the GPT4All interface, citing the presence of Catalan speakers within the community.
GPT4All v3.10.0 has security vulnerability: A user reported the discovery of a potential vulnerability in GPT4All v3.10.0 and sought guidance on the appropriate reporting procedure.

{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}