AI News for 3/3/2025-3/4/2025. We checked 7 subreddits, 433 Twitters and 29 Discords (221 channels, and 4084 messages) for you. Estimated reading time saved (at 200wpm): 481 minutes. You can now tag @smol_ai for AINews discussions!
Their brief blogpost here. It's not technical news, but it's still only every other week that a frontier lab raises money, and more money for Claude is only good news for AI Engineers.
Meanwhile, GPT 4.5 rated #1 across the board on LMArena. For posterity, here is where the current rankings lie under style control. Claude has a ways to go yet to reclaim frontier status.
{% if medium == "web" %}
Table of Contents
[TOC]
{% else %}
The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!
{% endif %}
AI Twitter Recap
Model Performance & Benchmarks, Comparisons and Evaluations
- GPT-4.5 Performance Leadership: @lmarena_ai announced that GPT-4.5 has topped the Arena leaderboard, achieving #1 rank across all categories, including Multi-Turn and Style Control, based on over 3k votes. @lmarena_ai further detailed that GPT-4.5 leads in Multi-Turn, Hard Prompts, Coding, Math, Creative Writing, Instruction Following, and Longer Query categories. @lmarena_ai highlighted GPT-4.5's strength in Style Control, leading the leaderboard in this specific area. @lmarena_ai provided a link to explore full GPT 4.5 results.
- DeepSeek R1 Joint #1 with GPT 4.5: @teortaxesTex noted that DeepSeek R1 is ranked joint #1 with GPT 4.5 on hard prompts with style control, congratulating the OpenAI team.
- GPT-4.5 vs Claude 3.7 Coding Capabilities: @casper_hansen_ questioned if GPT 4.5 is actually better than Claude Sonnet 3.7 in coding.
- GPT-4.5 vs Claude 3.7 for Workflow: @omarsar0 described a new coding workflow using GPT-4.5 for brainstorming, Claude 3.7 Sonnet for building, and Windsurf for agentic tasks.
- GPT-4.5 Benchmark Skepticism: @aidan_mclau asked @DaveShapi if 4.5 is overfit to benchmarks, or if other models are. @willdepue expressed surprise at GPT-4.5 topping categories without test-time compute, suggesting pretraining is still important. @vikhyatk is retracting positive comments about GPT-4.5, not wanting to be seen as a "low-taste tester".
- Claude Sonnet 3.7 Performance: @Teknium1 described Sonnet 3.7 in Cursor as "busted" and questioned its proper chat mode usage. @reach_vb mentioned Claude Sonnet 3.7 and DeepSeek as favorite LLMs, using Cursor and DeepSeek chat.
- LMSYS Leaderboard Importance: @aidan_clark stated that LMSYS is clearly the most important benchmark and advised labs to prioritize it for maximizing user value.
- Benchmark Relevance Questioned: @cto_junior argued that beating benchmarks is not relevant now, and gaining users is more important.
Industry News, Funding, and Partnerships
- Anthropic's $3.5B Funding Round: @AnthropicAI announced a $3.5 billion funding round at a $61.5 billion valuation, led by Lightspeed Venture Partners, to advance AI development and international expansion.
- Perplexity AI and Deutsche Telekom Partnership: @perplexity_ai announced a partnership with Deutsche Telekom to make Perplexity Assistant a native feature on their new AI Phone, further highlighted by @AravSrinivas and by @yusuf_i_mehdi, who sees AI-first browsers as the future, with Edge pushing this forward through Copilot integration.
- Microsoft Dragon Copilot Launch: @mustafasuleyman highlighted the Microsoft Dragon Copilot launch, aiming to reduce administrative overload in healthcare and refocus doctors on patients.
- DeepSeek AI on Copilot+ PCs: @yusuf_i_mehdi mentioned DeepSeek R1âs 7B and 14B distilled models are now available on Snapdragon-powered Copilot+ PCs, emphasizing hybrid AI.
- Firefly Aerospace Moon Landing: @kevinweil congratulated @Firefly_Space on being the first commercial company to successfully land a vehicle on the moon.
Tools, Frameworks, and Coding Workflows
- LlamaParse Updates with Claude 3.7 and Gemini 2.0 Support: @llama_index announced updates to LlamaParse, adding support for AnthropicAI Claude Sonnet 3.7 and Google Gemini 2.0 Flash in "Parse With Agent" mode for better table parsing and cross-page consistency, and in "Parse With LVM" mode for parsing screenshots.
- LlamaIndex Workflow-Based Travel Planner Tutorial: @llama_index shared a tutorial and repo by RS Rohan on building an agentic travel planner using LlamaIndex, demonstrating structured predict feature with Pydantic models, API integrations (Google Flights, Hotels, Top Sites), and event-driven architecture.
- LlamaExtract for Resume Extraction: @llama_index introduced LlamaExtract, powered by SOTA LLMs like 3.7 Sonnet and o3-mini, for extracting standardized candidate information from resumes, and generalizable to other data types.
- SynaLinks, Keras-inspired Framework for LLM Applications: @fchollet introduced SynaLinks, a Keras-inspired framework for building LLM applications as DAGs of trainable components, enabling sophisticated pipelines and RL fine-tuning.
- Groovy, Python-to-JavaScript Engine: @_akhaliq highlighted Groovy, a Python-to-JavaScript engine that transpiles Python functions for client-side execution, with @algo_diver noting its potential to make Gradio production-ready.
- Outlines for Structured Generation with MLX-LM: @awnihannun shared how to use Outlines by @dottxtai with mlx-lm for local structured generation, with documentation provided by @awnihannun.
- LangSmith for Observability and Evals Tooling: @hwchase17 pointed out that LangSmith is used to transform user feedback into evals, emphasizing observability as evals tooling.
- Cursor Coding Workflow: @omarsar0 mentioned using Cursor in a new coding workflow. @jeremyphoward noted creating complex apps in a day using tools like Cursor with Python, fasthtml and MonsterUI.
- Gibberlink for Encrypted AI Agent Communication: @ggerganov introduced Gibberlink, demonstrating encrypted audio chat between two AI agents, and provided a GitHub project link.
Research and Papers
- Brain-to-Text Decoding Research: @AIatMeta highlighted a research paper from Meta FAIR and BCBL researchers on Brain-to-Text Decoding, a non-invasive approach via typing.
- Diffusion Models and Flow Matching Course: @omarsar0 and @TheTuringPost shared a free MIT course on Introduction to Flow Matching and Diffusion Models, covering theory, training, and applications, including course notes, slides, YouTube videos and labs, with @omarsar0 providing another link.
- Reasoning LLMs Deep Dive: @omarsar0 recommended a "Deep Dive into Reasoning LLMs", summarizing progress in post-training.
- SoS1 Paper on Reasoning LLMs as Sum-of-Square Solvers: @_akhaliq shared a paper titled "SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers".
- HAIC Paper on Improving Human Action Understanding: @_akhaliq posted about the "HAIC" paper, focusing on improving human action understanding and generation using better captions for multi-modal LLMs.
- Sim-to-Real Reinforcement Learning for Humanoid Manipulation: @arankomatsuzaki highlighted Nvidia's presentation on Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids, achieving robust generalization without human demonstration, and provided project and abstract links.
- Chains-of-thought and Inference Bottleneck: @francoisfleuret discussed how chains-of-thought make inference compute-bound and suggested distilling large models into faster SSMs or hybrids for better trade-offs.
- LLMs as Evolution Strategies: @SakanaAILabs listed several works discussed in an interview, including "(1) Large Language Models As Evolution Strategies".
- TileLang for Kernel Programming: @teortaxesTex mentioned TileLang, a user-friendly AI programming language lowering the barrier to kernel programming.
- Evaluation of LLM Belief Structures: @teortaxesTex shared an insightful evaluation of LLM belief structures.
- LangProBe for Evaluating AI Systems: @lateinteraction introduced LangProBe from @ShangyinT et al., asking which complete AI systems should be built and how they should be evaluated.
AI in Business & Applications
- Inventory Tracking and Demand for Tokens: @gallabytes suggested that the demand for trillions of tokens per day will come from areas like improving inventory tracking in various sectors of the economy.
- AI for Shader Golf: @torchcompiled shouted out to folks working on shader golf.
- AI Powered Wiki Explorer App: @omarsar0 developed a wikiexplorer app using AI, utilizing Wikipedia and OpenAI models for hints, designed to be a fun way to learn new topics.
- AI Research Agent for Literature Reviews: @TheTuringPost promoted Deep Review by SciSpace, an AI research agent for systematic literature reviews, claiming it saves hours of work and is significantly more relevant than OpenAIâs Deep Research and Google Scholar.
- AI in Android Day-to-day Life: @Google highlighted AI on Android at #MWC25, demonstrating features like Circle to Search for translating menus and Gemini Live for learning complex topics.
- AI Co-scientist Example with AlphaFold: @_philschmid gave an example of extending a GoogleAI co-scientist with GoogleDeepMind AlphaFold for protein modification assessment.
- AI in Web Development with Groovy and Gradio: @algo_diver believes Groovy will make Gradio production-ready for full-stack web development.
Memes and Humor
- Karpathy's AirPods Pro Saga: @karpathy shared a humorous, multi-line tweet in the style of 4chan greentext about AirPods Pro malfunctions.
- Elon Musk and Grok Realism: @Teknium1 posted "Grok is much more open to realism" with a link, implying Grok's unfiltered nature, and @Teknium1 replied "Better" to a Grok image comparison.
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. Atom of Thoughts Enhancing Smaller Models
- New Atom of Thoughts looks promising for helping smaller models reason (Score: 641, Comments: 90): The Atom of Thoughts (AOT) algorithm significantly enhances smaller models' reasoning, achieving 80.6% F1 on HotpotQA with GPT-4o-mini, surpassing other models. AOT's process includes decomposing questions into a Directed Acyclic Graph (DAG), simplifying through subquestion contraction, and iterating to reach atomic questions, as illustrated in the accompanying flowchart.
- Critiques on Methodology and Results: Users questioned the reliability of the Atom of Thoughts (AOT) results, citing potential issues with the sample size of 1k tasks, unspecified confidence intervals, and tests conducted at temperature 1, which could lead to high result volatility. Concerns were raised about the randomness of results, suggesting that the reported improvements might not be statistically significant without repeated testing.
- Discussion on Rule-Based Methods: There was a debate on the relevance of rule-based methods in AI, with some users arguing that while rule-based approaches are not scalable, they can still be relevant in specific contexts. The concept of the "bitter lesson" was mentioned, indicating that computation often trumps encoding knowledge, but it doesn't rule out the utility of logical rulesets.
- Practical Implementation and Resources: A link to the open-source repository of the AOT algorithm was shared, allowing users to explore and implement the algorithm themselves (GitHub link). Additionally, the original paper is available on arXiv, providing further details on the algorithmâs development and performance.
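The decompose-contract-iterate loop described above can be sketched structurally. In this toy version, arithmetic subexpressions stand in for subquestions and `eval` stands in for the LLM answering an atomic question; the DAG shape and helper names are invented for illustration and are not taken from the AOT repository:

```python
def is_atomic(question: str) -> bool:
    """A question is 'atomic' here once it has no nested subexpression."""
    return "(" not in question

def decompose(question: str) -> dict:
    """Toy decomposition: the innermost parenthesized expression becomes an
    independent DAG node (no dependencies); the full question depends on it."""
    start = question.rfind("(")
    inner = question[start + 1 : question.index(")", start)]
    return {inner: [], question: [inner]}

def solve_atomic(question: str) -> str:
    return str(eval(question))  # stand-in for an LLM answering an atomic question

def aot(question: str) -> str:
    """Iterate: decompose into a DAG, answer independent subquestions, contract."""
    while not is_atomic(question):
        dag = decompose(question)
        for sub, deps in dag.items():
            if not deps:  # independent node: answer it now
                # contraction: fold the subanswer back into the question
                question = question.replace(f"({sub})", solve_atomic(sub))
    return solve_atomic(question)

print(aot("(2+3)*4"))      # 20
print(aot("((1+1)+3)*2"))  # 10
```

Each pass shrinks the question, so the loop terminates once every subquestion has been contracted into an atomic one, mirroring the iteration the paper describes.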
Theme 2. Klee Open-Sourced for Local LLM Use with Zero Data Collection
- I open-sourced Klee today, a desktop app designed to run LLMs locally with ZERO data collection. It also includes built-in RAG knowledge base and note-taking capabilities. (Score: 397, Comments: 67): The Klee desktop app is now open-sourced, designed for running LLMs locally without any data collection, and includes a RAG knowledge base and note-taking features. The app interface offers model options like "deepseek-r1-7b" and emphasizes privacy with a "Local Mode" toggle, ensuring no data is sent to the cloud.
- Users discuss the backend compatibility of Klee, questioning if it forces the use of Ollama or if alternatives like llama.cpp can be used. There is also curiosity about how Klee compares to other platforms like LM Studio and OpenWebUI, with some pointing out that Klee is essentially a wrapper over Ollama.
- Data privacy is a focal point, with inquiries about the "ZERO data collection" claim and whether using Ollama + Open WebUI involves data collection. It's noted that both platforms run stats for bug collection, which can be disabled, aligning with Klee's emphasis on local data security.
- The user interface and features are debated, with some users put off by the Slack-inspired UI, while others appreciate the simplicity for non-technical users. Questions are raised about the potential for an Android port, the ability to run models from Hugging Face, and the customization of the RAG knowledge base.
Theme 3. Split Brain âDeepSeek-R1-Distill-Qwenâ and âLlamaâ Fusion Architecture
- Split brain âDeepSeek-R1-Distill-Qwen-1.5Bâ and âmeta-llama/Llama-3.2-1Bâ (Score: 139, Comments: 30): The Split Brain project explores a novel dual-decoder architecture that combines two distinct language models, DeepSeek-R1-Distill-Qwen-1.5B and meta-llama/Llama-3.2-1B, to enable simultaneous processing and cross-attention fusion. This system allows for collaborative reasoning and specialized processing by maintaining separate models on different GPUs, utilizing an EnhancedFusionLayer for cross-attention, and employing a sophisticated gating mechanism for adaptive information flow. The architecture enhances computational efficiency and task flexibility, allowing for both collaborative and specialized operations while maintaining parameter efficiency by only training the fusion components.
- Cross-Attention Fusion: The Split Brain project uses bidirectional cross-attention fusion where both models generate outputs simultaneously, attending to each other's hidden representations rather than final token outputs. This real-time interaction at the hidden representation level allows for mutual influence on the models' "thinking processes" without direct token feedback.
- Model Vocabulary Challenges: A key challenge identified is managing different vocabularies between the models, which requires a sophisticated mechanism to ensure seamless interaction and processing.
- Potential for Personalization: There is interest in using a split-brain approach for personalized AI models by combining a small, personality-reflective model with a larger, powerful model. This could surpass current prompt-based agents by allowing one model to direct and correct the other, enhancing personalization through collaborative processing.
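As a numeric illustration of the gating idea described above (pure Python with invented shapes and weights; the real project gates the outputs of a cross-attention fusion layer over full hidden-state tensors, which is omitted here):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(h_a, h_b, gate_w, gate_b):
    """Blend two models' hidden states with a learned scalar gate.

    h_a, h_b: hidden vectors of equal length (after any projection).
    gate_w:   one weight per concatenated feature; gate_b: bias.
    A gate near 1.0 passes model A through; near 0.0 passes model B.
    """
    concat = h_a + h_b  # list concatenation = feature concatenation
    g = sigmoid(sum(w * x for w, x in zip(gate_w, concat)) + gate_b)
    fused = [g * a + (1.0 - g) * b for a, b in zip(h_a, h_b)]
    return fused, g

h_a = [0.5, -1.0]  # e.g. a hidden state from the Qwen decoder
h_b = [2.0, 0.0]   # e.g. a hidden state from the Llama decoder
fused, g = gated_fuse(h_a, h_b, gate_w=[0.0, 0.0, 0.0, 0.0], gate_b=0.0)
# zero weights -> g = 0.5 -> an even blend of the two states
print(g, fused)  # 0.5 [1.25, -0.5]
```

Because only the gate (and the fusion layer it sits in) is trained, the two frozen base models keep their own weights, which is the parameter-efficiency point made in the summary.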
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
TO BE COMPLETED
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking
Theme 1. IDE Wars: Cursor Stumbles, Windsurf Surfs On, and Plugin Pains Persist
- Cursor IDE Plunges into Bug Abyss: Cursor IDE users are battling instability, connection failures, and checkpoint malfunctions, prompting engineers to eye Windsurf and Trae AI as life rafts. The latest release is described as incredibly unstable, with MCP server configurations adding to the chaos, especially on Windows and remote Ubuntu setups, leading to client creation failures and users seeking help on forums.
- Windsurf's Ubuntu Update Capsizes Systems Then Self-Corrects: A recent Windsurf update for Ubuntu 24.04 backfired spectacularly, bricking systems with a FATAL:setuid_sandbox_host.cc(158) error and forcing reinstalls and data loss for some, but a subsequent patch and a workaround involving chrome-sandbox permissions offered a lifeline. Users on Windows ARM64, however, are celebrating, as Windsurf Next now supports their platform, available for download here.
- JetBrains Plugin Gets Requesting Hang-up: Codeium's JetBrains plugin is frustrating users by getting stuck in a perpetual Processing request state, particularly in the latest pre-release, rendering it useless for generating code and forcing downgrades to older, more stable versions to keep workflows afloat. The issue with the JetBrains plugin contrasts with the Windows ARM64 support in Windsurf Next, showcasing uneven feature reliability across different IDE integrations.
Theme 2. Claude 3.7: Speed Bumps and Credit Crunch, But Still Impresses
- Claude 3.7 Chokes on Cursor, Runs in Slow Motion: Claude 3.7 is causing headaches in Cursor IDE, with users reporting it's insanely slow and prone to halting mid-request, pushing many to downgrade or use Cursor's "Ask" mode, highlighting concerns about the model's current stability. Despite the instability in Cursor, users in the OpenAI Discord declared ChatGPT is a joke now and found Claude 3.7 to be very impressive, particularly noting Claude's superior context understanding in larger files with its 200K token window, eclipsing ChatGPT's 128K.
- Claude 3.7 Devours Windsurf Credits Like Pac-Man: Claude 3.7 in Windsurf is guzzling premium flow action credits at an alarming rate, with reports of 30-40 tool calls per prompt for minor edits, leading to rapid credit depletion and user ire; some users are switching back to 3.5 or considering a move to Cursor to escape the credit drain. Users are urging Codeium to demote Claude 3.7 from its default model status due to its voracious credit consumption.
- Claude Code Gets Anon-Kode Remix, Goes Open API: A developer has released anon-kode, a modified, OpenAI-compatible version of Claude Code, after extracting the original source code (original tweet); it works with OpenAI-style APIs (tweet) and is available on GitHub, offering a potential open-source alternative, albeit with lots of things to fix.
Theme 3. AI Models: New Releases, Performance Quirks, and Ethical Quandaries
- GPT-4.5 Claims Arena Throne, Image Recognition Debated: GPT-4.5 has ascended to the top of the Arena leaderboard, dominating across categories from coding to creative writing (source), but its image recognition capabilities are under scrutiny, with mixed reviews and debates on whether it surpasses GPT-4o, even though initial tests show a marginal +5% improvement on the MMMU benchmark. Despite leaderboard victories, some users feel OpenAI is deprioritizing Plus users in favor of Pro users, suggesting a shift in premium status perceptions.
- Grok's Custom Instructions Fail to Troll, Prompting Persona Panic: Grok AI's much-anticipated custom instructions feature, now live for all users, is facing criticism as useless, with users reporting failures to mold Grok into desired personas, including one attempt to create an "abusive and lewd troll" that backfired, leaving users questioning the feature's efficacy. Despite the custom instruction flops, Grok is praised for its debugging prowess, outshining models like O3 mini high sonnet in this area, although some users find O3 mini high sonnet superior in code creation tasks.
- Phi-3 Model Fine-Tuning Faces A100 Hurdles, Dataset Viewer Needs Error Fixes: Fine-tuning Phi-3 for multi-modality is proving to be a Herculean task, requiring an estimated 6+ A100s and approximately 2 weeks even with Colab Pro, while Hugging Face's Dataset Viewer is plagued by errors affecting compatibility with various libraries and SQL, hindering data discoverability and usability. Despite these challenges, Hugging Face is celebrating latency reductions of up to 10x on Remote VAE Decode endpoints for SD v1, SD XL and Flux, thanks to code-name honey, empowering local AI builders with Hybrid Inference.
Theme 4. Hardware Hustles: Tilelang Triumphs, AMDâs Ascent, and SRAM Secrets
- Tilelang Kernel Smokes Triton, Nears Flash-MLA Speed: A lean 80-line tilelang kernel is boasting 95% performance of deepseek flashmla on H100, achieving a 500% speedup over Triton, showcasing tilelangâs potential for high-performance computing, with code available on GitHub. This performance leap is stirring calls for an MLA leaderboard to showcase similar achievements, possibly repurposed from the bitnet group.
- AMD GPUs Inch Closer to ML Spotlight, Intel Arc A770 Joins Tinygrad Party: Discussions are heating up about AMD and Intel becoming viable alternatives to CUDA in ML pipelines, with some believing increased AMD market share could spur greater investment in their GPU computing department, while Intel Arc A770 GPUs are confirmed to be compatible with tinygrad using the OpenCL backend, broadening hardware options for developers. Despite AMD's progress, questions remain about their foundry time acquisition, with concerns that Nvidia still holds a significant advantage in chip manufacturing access.
- SRAM's Cache Conspiracy Unveiled: Deep dives into SRAM architecture reveal that registers, shared memory, and cache are all SRAM constructs, with unallocated shared memory morphing into L1 cache, while Triton's `cache_modifier` in `tl.load` allows specifying L1 or L2 hits but lacks direct cache level control, exposing the nuanced layers of memory management in GPU programming. For CUDA compilation, `torch.cuda.get_device_capability()` in PyTorch is suggested for determining `--arch=`, though `nvidia-smi --query-gpu=name,compute_cap --format=csv` offers a PyTorch-free alternative.
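For the `--arch=` determination mentioned above, the (major, minor) compute capability just needs formatting into an `sm_XX` architecture name; a small helper (the tuple shape matches what `torch.cuda.get_device_capability()` returns, e.g. `(9, 0)` on an H100, and the helper itself is illustrative rather than part of any library):

```python
def arch_flag(capability: tuple) -> str:
    """Format a (major, minor) compute capability as an nvcc --arch flag.

    `capability` has the shape returned by torch.cuda.get_device_capability(),
    e.g. (8, 6) for an RTX 3090 or (9, 0) for an H100.
    """
    major, minor = capability
    return f"--arch=sm_{major}{minor}"

print(arch_flag((9, 0)))  # --arch=sm_90
print(arch_flag((8, 6)))  # --arch=sm_86
```

The same `major.minor` value is what `nvidia-smi --query-gpu=compute_cap` prints, so either source feeds the same formatting step.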
Theme 5. Agent Innovations and Frustrations: Travel Planning AI, Smol Agent Quiz Fails, and MCP Multi-Agent Visions
- Travel App Agents Spring Up to Rescue Reel-Ravaged Travelers: A new app, ThatSpot, emerges to combat travel reel overload, deploying AI agents to automatically extract crucial trip-planning data (locations, prices, booking links) directly from travel reels, automating hours of manual research and streamlining trip organization for wanderlust-stricken users. The app promises to process travel reels and extract every mentioned place.
- Smol Agents Quiz Stumps Students, Error Logs Hold Clues: The Smol Agents Quiz is causing headaches, with users reporting unclear requirements and failing scores despite multiple attempts, prompting calls to mine error logs from the quiz's app.py file to pinpoint necessary tool and model providers, highlighting the need for clearer quiz instructions and better error feedback in AI learning platforms. Despite quiz woes, HuggingFace has launched a new NLP Reasoning Course unit, aiming to educate on reinforcement learning in LLMs and contribution to Open R1.
- MCP Multi-Agent Architectures Materialize, Fast Agent Framework Floats: Engineers are exploring MCP for multi-agent systems, drawing inspiration from Anthropic workshops and envisioning frameworks for agents collaborating across devices, with one member sharing their fast-agent GitHub project for Defining, Prompting and Testing MCP enabled Agents and Workflows, allowing agents to be configured with distinct MCP servers and called as tools by other agents. However, MCP Terraform Registry setup is proving troublesome, particularly with Claude desktop and Cline, facing mcp-server-fetch errors when system-level proxies are active.
PART 1: High level Discord summaries
Cursor IDE Discord
- Cursor Plagued by Instability: Users report instability, connection failures, and non-functional checkpoints in the latest Cursor IDE release.
- Members consider alternatives like Windsurf and Trae AI due to the poor user experience.
- MCP Servers Cause Configuration Nightmares: Members struggle to configure MCP servers in Cursor, especially with Windows and remote Ubuntu workspaces, facing issues like client creation failures.
- One member eventually solved their issues using Puppeteer, as well as the Firecrawl MCP server for web scraping with LLM clients.
- Claude 3.7 Faces Glitches: Users experience issues with Claude 3.7 such as being insanely slow and stopping mid-request without errors.
- As a result, many resort to using Cursor's "Ask" mode or reverting to older versions for critical tasks.
- Designers Dive into Landing Pages: Members share landing page designs generated with Cursor and discuss their aesthetic appeal and effectiveness.
- The community compares designs to those of Linear, Framer, Magician Design, and Webflow for inspiration.
- Repo Prompt Hailed for Multi-File Editing: Users show excitement about Repo Prompt, praising its multi-file edit capabilities and code snippet integration.
- The community also mentions BrowserTools for debugging, and PasteMax, an open source poor man's version of Repo Prompt, for file selection.
Codeium (Windsurf) Discord
- Windsurf Adds Windows ARM64 Support: Windsurf Next now supports Windows ARM64, available for download here.
- This expansion allows users on Windows ARM64 platforms to leverage the latest features and improvements in Windsurf Next.
- Windsurf's Ubuntu Update Crashes Systems: A recent Windsurf update caused issues on Ubuntu 24.04, leading to the application failing to start with a FATAL:setuid_sandbox_host.cc(158) error.
- One user reported a system crash, reinstallation, and data loss, highlighting the need for backups before updating, and a manual workaround involving changing permissions for chrome-sandbox may be required.
- Claude 3.7 Burns Credits, Sparks User Ire: Users report Claude 3.7 in Windsurf is rapidly depleting premium flow action credits due to excessive tool calls per prompt, with some experiencing 30-40 tool calls for minor changes.
- Members suggest Codeium hide Claude 3.7 as a default model, with some switching back to 3.5 or other models for better efficiency, and are considering switching to Cursor.
- Codeium Customer Support Faces Scrutiny: Users are reporting poor customer support experiences from Codeium, with one user awaiting resolution of a subscription issue for four weeks.
- The lack of timely and effective support is driving users to seek alternative solutions and has raised concerns about Codeium's responsiveness.
- JetBrains Plugin Plagued by Processing Request Hang: Users of the JetBrains plugin are encountering a persistent Processing request state, leading to errors, particularly in the latest pre-release version.
- This issue renders the plugin unable to generate responses, disrupting workflow and necessitating a downgrade to a more stable version.
OpenAI Discord
- OpenAI Hosts Sora Onboarding: The Sora team hosted a live onboarding session covering Sora fundamentals and optimal prompting techniques on <t:1741024800:R>, and you can join the discussion via this discord link.
- The Sora 101 session also shared insights from the onboarding process for early access artists.
- GPT-4.5 Image Recognition Gets Mixed Reviews: Members are debating whether the new GPT-4.5 has better image recognition compared to GPT-4o, with Future Machine being more vocal about OpenAI (OAI)âs choices.
- Initial tests show that GPT-4.5 scores a bit higher than 4o on the MMMU (vision oriented reasoning benchmark) with a +5% improvement.
- Custom Grok is a Flop: Grok AIâs custom instructions feature has been released for all users, but members report the custom instruction is useless.
- One member shared custom Grok instructions aiming for an "abusive and lewd troll" persona, but reported it doesn't work, and other users reported the same.
- Claude 3.7 Impresses, But Projects Flounder: One user declared ChatGPT is a joke now and found Claude 3.7 to be very impressive, while Claude can better understand the context in larger files with 200K context window.
- However, another user said Claude's projects are of no use, complaining that they can upload only two files at most before it says memory full, calling Claude over hyped.
- Dall-E Delivers Synthetic Biology: A member prompted Dall-E to generate an image of synthetic plants that grow hearts and livers for transplant, visible within a transparent membrane and nourished by the GM plant.
- The initial results emphasized hearts over livers, prompting the user to refine the prompt with more details about the liver lobes.
Unsloth AI (Daniel Han) Discord
- Llama Model Zips WAVs Hilariously: A member amusingly reported that compressing a 192 KB ZIP file with a llama model resulted in a 48 KB lossless WAV format.
- The user found this confusing, since the model then attempted to re-zip the WAV to make it smaller, specifically mentioning the r1-1776-distill-llama-70b model.
- GRPO Training: More Steps Needed for Reasoning?: Users discussed the necessary training steps for LoRA training Qwen2.5-14B-instruct with GRPO, emphasizing lowering the loss for better reasoning.
- Suggestions included allocating around 24 hours, or 700-1200 steps, underscoring that convergence is model-dependent, as described in Unsloth's Documentation.
- GCC Compiler Causes VLLM Pain: A user encountered a RuntimeError related to the GCC compiler while running the GRPO tutorial locally with meta-Llama-3.1-8B-Instruct.
- Despite attempting to install GCC via conda, the issue persisted, and the user is restricted from using apt-get due to security reasons on their school's HPC.
- String Replacement: Coding Strategy Success?: Members debated the effectiveness of string replacement for code editing with one member deeming it generally garbage.
- Another member, however, reported success fine-tuning Qwen 2.5 for string replacement, especially when the model has access to the entire file before making replacements.
- Claude 3.5 Sonnet Sweeps Bench with SOTA: Anthropicâs Claude 3.5 Sonnet achieves 49% on SWE-bench Verified, surpassing the previous state-of-the-art modelâs 45%.
- Members made a reference to the bitter lesson: general methods that leverage computation are ultimately the most effective.
Perplexity AI Discord
- Perplexity Web UI Rewrite Feature Malfunctions: Users report that the Perplexity web UI's rewrite functionality is broken, always defaulting to pplx_pro regardless of the selected model.
- Some experienced prompt duplication, tagging <@883069224598257716> for support, indicating significant issues with the rewrite tool.
- Claude 3 Model Confusion Persists: Users are unsure if Perplexity model indicators accurately reflect the model in use, questioning whether they're receiving Claude 3.7 Sonnet or Claude 3 Opus when selecting Claude.
- Some noticed that Pro Search overrides the selected model with Sonar, creating disparities between chosen and employed models.
- Perplexity API Troubles Obsidian Web Clipper: The Perplexity API's partial incompatibility with OpenAI standards causes issues for tools like Obsidian Web Clipper.
- The API's requirement for an assistant message between user messages, absent in OpenAI, hinders Obsidian Web Clipper's ability to post consecutive user messages.
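The alternation constraint described above can be worked around client-side by merging consecutive same-role messages before sending. A minimal sketch (not an official workaround from Perplexity or Obsidian; the message dicts follow the common OpenAI-style shape):

```python
def enforce_alternation(messages: list) -> list:
    """Merge consecutive same-role messages so the history alternates.

    `messages` is an OpenAI-style list of {"role": ..., "content": ...}
    dicts; APIs that require strict user/assistant alternation reject
    consecutive same-role entries, so we concatenate them instead.
    """
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"] += "\n\n" + msg["content"]
        else:
            merged.append(dict(msg))  # copy so the input list is untouched
    return merged

history = [
    {"role": "user", "content": "Clip this page."},
    {"role": "user", "content": "Summarize it too."},  # would be rejected as-is
]
print(enforce_alternation(history))
# [{'role': 'user', 'content': 'Clip this page.\n\nSummarize it too.'}]
```

An alternative is to insert a placeholder assistant message between the user turns, but merging keeps the request closer to what the user actually wrote.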
- Deepseek generates Controversial Propaganda?: A user shared an image allegedly generated by Deepseek which the community regarded as politically biased propaganda.
- Another member dismissed the image, asserting "You are fake deepseek. Real deepseek doesn't talk on western affairs."
HuggingFace Discord
- Phi-3 Fine-Tuning Faces Hurdles: One member is fine-tuning Phi-3 for multi-modality using an A100-equipped Colab Pro, but was cautioned such fine-tuning would take 6+ A100s and run for approximately 2 weeks.
- Another member added that QLora and Peft make anything possible with a positive attitude and a credible project.
- Dataset Viewer experiences Errors: A user suggested fixing Dataset Viewer errors for compatibility with various libraries and SQL, to improve data discoverability.
- Another user thanked them in advance, and jokingly requested an additional 1.2M rows of a HQ dataset.
- Hugging Face reduces latency with new VAE: Hugging Face deployed code-name honey on Remote VAE Decode endpoints for SD v1, SD XL and Flux, reducing latency up to 10x which empowers local AI builders with Hybrid Inference.
- Hybrid Inference is free, fully compatible with Diffusers, and developer-friendly with simple requests and fast responses, and VAE Encode is coming soon.
- Smol Agents Quiz Sparks Frustration: A member expressed frustration with the Smol Agents Quiz, citing unclear requirements and receiving a score of 0.0 out of 5 despite multiple attempts, referencing the quiz's app.py file.
- The member pointed to the need to mine error logs to understand the exact providers required for tools and models.
- Lambda Go Labs: AI learning and building: Lambda Go Labs is a community focused on AI learning, building, and research.
- The community offers hands-on experience, opportunities to share work, and a supportive network for both experienced professionals and newcomers.
aider (Paul Gauthier) Discord
- Aider Leaderboard Tooling Showdown: The Aider leaderboard now benchmarks AI models, alongside tools like Claude Code, assessing them as primary coding assistants.
- A user advocated for a tool-agnostic benchmark akin to SWE Benchlets to facilitate broader comparisons of coding tools and models.
- Anon-Kode Remixes Claude Code: A modified version of Claude Code, dubbed anon-kode, was released by the same developer who extracted the source code (link to original tweet), now compatible with OpenAI APIs (link to tweet) and available on GitHub.
- Lots of things to fix, but you can use anything that supports OpenAI-style API. If you're brave, give it a try.
- Gemini 2.0 Pro Hits Context Wall?: A user reported `RESOURCE_EXHAUSTED` errors with the `gemini/gemini-2.0-pro-exp-02-05` model in Aider when using a large context window.
- In contrast, the `gemini-2.0-flash-thinking-exp-01-21` model functions smoothly; the user inquired about maximizing context window usage with the Pro model.
- Aider Gets Git Diff Wish: A user requested Aider to directly edit files using git diff conflict-marker syntax (e.g., `<<<<<<< branch`, `=======`, `>>>>>>> replace`) within the files themselves.
- Currently, Aider displays diffs in the terminal, but the user seeks in-file editing for pre-acceptance modifications; other users pointed out a fork would be necessary, or suggested using an external diff tool.
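As an illustration of the requested behavior (not an Aider feature; the marker layout follows the example above, and the resolver keeps the replacement side), such an in-file block could be applied like this:

```python
import re

def apply_conflict_block(text):
    """Resolve git-style conflict blocks embedded in `text`, keeping
    the replacement side. Expects markers of the form:
        <<<<<<< branch
        old code
        =======
        new code
        >>>>>>> replace
    Illustrative sketch of the requested workflow only."""
    pattern = re.compile(
        r"<<<<<<<[^\n]*\n(?P<old>.*?)\n=======\n(?P<new>.*?)\n>>>>>>>[^\n]*\n?",
        re.DOTALL,
    )
    return pattern.sub(lambda m: m.group("new") + "\n", text)

sample = (
    "def greet():\n"
    "<<<<<<< branch\n"
    "    print('hi')\n"
    "=======\n"
    "    print('hello')\n"
    ">>>>>>> replace\n"
)
print(apply_conflict_block(sample))
```

A real integration would also need to stage the "old" side for review before discarding it, which is the pre-acceptance step the user was asking for.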
- Grok's Debugging Edge: Members noted that while Grok excels at debugging, O3 mini high sonnet may outperform it in code creation tasks, such as adding new functions.
- They observed Claude 3.7 sometimes introduces unintended elements, while deepseek-chat with O1 Pro has proven highly reliable as an editor, approaching 95% accuracy.
GPU MODE Discord
- Vision Models Still Favor Attention: Despite alternatives like MLP-Mixer existing, attention-based ViTs remain the SOTA choice for vision models.
- The relative underutilization of MLP-Mixer, detailed in MLP-Mixer: An all-MLP Architecture for Vision, was questioned by a member.
- SRAM's Cache Quirks Revealed: Registers, shared memory, and cache are chip/software level properties constructed from SRAM, with unallocated shared memory becoming L1 cache.
- While direct cache level control (L1/L2) is absent in Triton, the `cache_modifier` argument in `tl.load` specifies L1 or L2 hits, where `cg` targets L2 exclusively.
- CUDA Architecture Query gets Torch Answer: For determining the `--arch=` flag for CUDA compilation, `torch.cuda.get_device_capability()` from PyTorch was suggested, and the alternative solution `nvidia-smi --query-gpu=name,compute_cap --format=csv` was found.
- The second option avoids needing a PyTorch dependency, and the CUDA Runtime API can programmatically select the best device based on specified criteria as shown in the docs.
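A minimal sketch combining both suggestions (the helper names are hypothetical; the `torch` call needs a CUDA device, so it is shown commented out):

```python
def arch_flag(capability):
    """Format a (major, minor) compute capability as an nvcc flag,
    e.g. (8, 0) -> '--arch=sm_80'."""
    major, minor = capability
    return f"--arch=sm_{major}{minor}"

def arch_flag_from_smi(csv_value):
    """Parse the compute_cap column printed by
    `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`,
    e.g. '8.6' -> '--arch=sm_86'."""
    major, minor = csv_value.strip().split(".")
    return f"--arch=sm_{major}{minor}"

# With PyTorch installed (and a CUDA device present), the capability
# can be queried directly:
#   import torch
#   flag = arch_flag(torch.cuda.get_device_capability())
print(arch_flag((8, 0)))
print(arch_flag_from_smi("8.6\n"))
```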
- Tilelang kernels flash faster than Flash-MLA: A member boasted that 80 lines of tilelang kernel code yield 95% of the performance of deepseek flashmla and a 500% speedup over Triton on H100, with a link to the GitHub repo.
- Another member expressed the desire to have an MLA leaderboard, perhaps repurposed from the bitnet group.
- FA3 needs Absmax for Quantization: While FA3 is now working, it exhibits significantly higher quantization error than basic absmax quantization, suggesting a need for strategic adjustments.
- It was proposed to apply absmax quantization after the Hada transform, especially for âvâ, mitigating out-of-distribution issues stemming from large activations.
OpenRouter (Alex Atallah) Discord
- Travel App Springs Up to Save Travel Reels: An app emerged to solve the problem of endless saving of travel reels and hours of manual research, using AI agents to automatically extract data such as locations, price ranges, reservation requirements, booking links, and operating hours directly from travel reels at https://thatspot.app/.
- The app streamlines trip planning by using AI agents to process travel reels and extract every place mentioned, automating the previously manual research process.
- Google Flash 2.0 Flashes a 502 Error: A user reported a 502 error when inferencing with Google's Flash 2.0 and Flash 2.0 Light models, with the error message "Provider returned error".
- The error indicates an internal issue encountered by Google.
- OpenRouter's Sonnet singing with Rate Limits: A user asked about the rate limits for Claude 3.7 Sonnet in terms of RPM (Requests Per Minute) and TPM (Tokens Per Minute).
- A member clarified that OpenRouter doesn't impose specific rate limits per user, pointing to Anthropic's rate limits documentation and BYOK settings (OpenRouter Integration Settings).
- OpenRouter API Key throws VS Studio for a Loop: A user faced a 401 Authentication Failure using an OpenRouter API key in VS Studio via RooCode, despite having sufficient funds.
- Suggestions included verifying the API key, selecting OpenRouter as the API provider in RooCode, and ensuring the correct base URL, referencing this tutorial.
- BYOK Azure Models Yearning for OpenRouter: A user inquired about using BYOK (Bring Your Own Key) with Azure models in OpenRouter, seeking a unified API for finetuned models.
- A member clarified that only models listed in the `/models` endpoint are supported, excluding BYOK models, suggesting the use of an OpenAI API Key in Integration settings instead.
LM Studio Discord
- LM Studio Launches SDKs for Python and TypeScript: LM Studio released software developer kits for Python (`lmstudio-python`) and TypeScript (`lmstudio-js`) under the MIT license to allow developers to tap into LM Studio's AI capabilities from their own code.
- The SDKs support LLMs, embedding models, and agentic flows, featuring the `.act()` API for autonomous task execution using provided tools, as documented on their respective pages (`lmstudio-python` and `lmstudio-js`).
- LM Studio "Unsupported Device" Error Plagues Users: After an LM Studio update, users reported encountering `Failed to load model` errors with the message `Unsupported device`; suggested fixes included adjusting GPU offloading or thread pool size.
- The error might be tied to context length impacting memory usage; the left number shown is the number of tokens the model is already using in the chat history, while the right number is the context limit.
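Read as a budget, the two numbers combine straightforwardly (a trivial sketch; the function name is hypothetical):

```python
def context_headroom(tokens_used, context_limit):
    """The left number LM Studio shows is `tokens_used` (tokens already
    in the chat history); the right is `context_limit`. Their
    difference is the remaining headroom before truncation."""
    return max(context_limit - tokens_used, 0)

print(context_headroom(3_812, 4_096))  # 284 tokens of headroom left
print(context_headroom(5_000, 4_096))  # 0 -- history already exceeds the limit
```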
- Diffusion Model Architecture Unsupported by Llama.cpp: Users reported errors loading diffusion models, receiving `error loading model architecture: unknown model architecture: 'sd3'`; it was clarified that llama.cpp does not support image/video/audio generation models.
- Support for vision models in `llama.cpp` is uncertain, with concerns about the lack of Llama 3.2 vision or Pixtral vision support; however, some believe that UI-TARS fixes will help.
- Pseudollama Patches OLLAMA Gap: Members discussed if LM Studio endpoints were compatible with apps that take an OLLAMA endpoint, and it was answered that it is not supposed to work by default, but Pseudollama can bridge the gap.
- The author noted that this is 100% vibe coded, so there are likely dumb issues throughout, but it works.
- AMD needs to compete in the GPU Space: Members discussed whether AMD or Intel could become viable for ML pipelines and frameworks to compete with CUDA.
- Some members believe if AMD increases their market share, they would be more interested in investing in their GPU computing department, and that the real question is whether AMD can buy the time from a chip foundry, because Nvidia has the upper hand.
Nous Research AI Discord
- Nous API Pricing Discussed: Members discussed Nous potentially launching an API for their models to generate income, with speculative pricing around $0.8/M tokens, potentially yielding $800-1600/day.
- Suggestions included pricing closer to $1/M input tokens and $3/M output tokens for specialized models, with ongoing efforts underway to realize this.
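The implied volume behind those revenue figures can be sanity-checked with quick arithmetic (the $0.8/M rate is the speculative figure from the discussion):

```python
# Sanity check: at a flat $0.8 per million tokens, $800-1600/day
# implies roughly 1-2 billion tokens served daily.
PRICE_PER_M = 0.8  # $/M tokens -- speculative rate from the discussion

def daily_million_tokens(revenue_usd, price_per_m_usd=PRICE_PER_M):
    """Million tokens per day needed to earn `revenue_usd`."""
    return revenue_usd / price_per_m_usd

print(daily_million_tokens(800))   # ~1000 M tokens/day, i.e. ~1B
print(daily_million_tokens(1600))  # ~2000 M tokens/day, i.e. ~2B
```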
- LLMs Fail at CUDA Kernel Generation: Members concurred that while LLMs can produce valid CUDA syntax, they struggle to independently generate high-performance CUDA kernels.
- The optimal approach involves integrating hardware and compute graph data with the LLM, potentially via a knowledge graph or GNN, complemented by intensive GPU profiling.
- Logic-RL Boosts Reasoning with Rule-Based RL: The Logic-RL paper explores the potential of rule-based reinforcement learning (RL) in large reasoning models, taking inspiration from DeepSeek-R1.
- The 7B model, trained on only 5K logic problems, displayed generalization on challenging math benchmarks like AIME and AMC.
- Runway Unveils General World Models: Runway introduced General World Models, aiming to create AI systems capable of building internal representations of environments to simulate future events.
- Their goal is to represent and simulate a broad spectrum of situations and interactions, surpassing confined settings like video games or driving simulations.
- Qwen2.5-Math-1.5B Model's Longcot Struggles: A user found that the Qwen2.5-Math-1.5B model has difficulties with longcot examples, needing help with configuring the dataset structure and the GRPOTrainer.
- They linked their Kaggle notebook requesting guidance on solving these issues.
Interconnects (Nathan Lambert) Discord
- Unitree Unleashes Open Source Trove: Unitree Robotics has open-sourced multiple repositories, offering access via their GitHub.
- This move opens up possibilities for collaborative development and innovation in robotics.
- GPT-4.5 Ascends Arena Throne: GPT-4.5 has seized the top spot on the Arena leaderboard across all categories, including Multi-Turn, Hard Prompts, Coding, Math, Creative Writing, Instruction Following, and Longer Query (source).
- The latest ratings cement GPT-4.5 as state of the art for the moment.
- Anthropicâs Astronomical Ascent Continues: Anthropic secured $3.5 billion in funding at a staggering $61.5 billion post-money valuation, with Lightspeed Venture Partners leading the charge (source).
- The funding aims to advance their AI systemsâ development, deepen understanding of their functionality, and propel international growth.
- Grok3 Pricing Structure Surfaces?: Potentially leaked Grok3 pricing details suggest costs of $3.50/million for input, $0.875/million for cached input, and $10.50/million for output, as reported in this tweet.
- The leaked pricing model offers insights into the potential costs for leveraging Grok3 in various applications.
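For illustration, the leaked rates can be plugged into a small cost helper (the rates are unconfirmed, and the example request sizes are hypothetical):

```python
def grok3_request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimate a request's cost in USD from the (unconfirmed) leaked
    Grok3 rates: $3.50/M input, $0.875/M cached input, $10.50/M output."""
    return (
        (input_tokens - cached_tokens) / 1e6 * 3.50
        + cached_tokens / 1e6 * 0.875
        + output_tokens / 1e6 * 10.50
    )

# Hypothetical request: 100k-token prompt, half of it cache hits,
# and a 10k-token completion.
print(grok3_request_cost(100_000, 10_000, cached_tokens=50_000))
```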
- Human Data Still Vital for Real-World AI?: A blog post (https://www.amplifypartners.com/blog-posts/annotation-for-ai-doesnt-scale) contends that human data remains essential for building truly useful AI products.
- This perspective challenges the notion that synthetic data alone can drive substantial advancements in model performance.
Yannick Kilcher Discord
- Claude Cracks Coding Challenge: A member reported using Claude and Cursor to complete 95% of the work on this GitHub Pull Request involving granular configuration options.
- The member was working on the `object-property-newline` rule by adding support for granular configuration options, allowing developers to specify different behaviors for different node types.
- Tackling Tricky Time Slots: A member initially considered presenting on Joscha Bach, but it's unclear if this was the final topic.
- Another member offered to present in the `<t:1741046400:F>` timeslot if there were no other presentations scheduled, and offered further advice to those interested.
- Elsagate Erupts Again: A member shared a YouTube video titled "Elsagate 3.0 Is Worse Than we Thought" with a warning that it is NOT FOR CHILDREN.
- Another member responded, stating, "Well, that is horrifying."
Notebook LM Discord
- Financial Statements enter NotebookLM: A member inquired about loading financial statements for analysis into NotebookLM to automate financial analysis.
- This suggests interest in using NotebookLM for professional tasks.
- Podcast Length Debate and Timeline Demands Aired: Concerns were expressed about podcast length and coverage of important topics, referencing a Supreme Court Application found here.
- A member requested timelines be added to the podcast free version, while another member shared an example notebooklm podcast.
- Dynamic Docs, a Feature that is MIA: Members are curious if NotebookLM can dynamically update from sources like Google Docs, for use cases like tracking furniture dimensions.
- Because the feature is not automatic, it has led to discussions about workarounds and feature requests.
- Notebook Sharing Snafu Defused!: A user reported a server error when sharing notebooks with Gmail personal accounts, specifically "You are not allowed to access this notebook".
- The issue was resolved when the user found that the recipient's new phone was not correctly configured with their Gmail account.
Stability.ai (Stable Diffusion) Discord
- Face Copy Alternatives Emerge: Members debated the best ways to copy faces, with some preferring reference only in ControlNet while others recommended Reactor Faceswap as a preferable alternative to IP-Adapter.
- The community consensus seems to favor ControlNet for its versatility.
- Reforgeâs AMDGPU Support Remains Murky: A user reported conflicting information regarding Reforge supporting AMDGPU, as itâs mentioned on the Stability Matrix but not on the GitHub page.
- Another user's attempt to use Zluda resulted in PC freezes, leading to skepticism about the accuracy of the Stability Matrix and a recommendation to use a UI outside Stability Matrix.
- DirectML and Reforge Donât Mix: A memberâs attempt to use Reforge with DirectML after Zluda failed proved unsuccessful.
- There was a discussion about a potential fork of Reforge for AMD by Lshqtiger.
- CivitAI Offers Free Image Generation: Members discussed CivitAI as a platform for image generation requests, noting it provides a few starting credits and 25 free daily credits that can be saved.
- The cost to use the platform depends on the model selected.
- Local Image Generation Requirements Detailed: One member asked about the requirements for creating images locally; another responded that a GPU with around 6-8GB VRAM is recommended, along with other resources in <#1002602742667280404>.
- Another member shared links to CivitAI for online generation as an alternative.
Eleuther Discord
- ReasonableLLAMA-Jr-3b Seeks Feedback: A member requested feedback on their ReasonableLLAMA-Jr-3b model, a reasoning model trained with GRPO on LLAMA 3.2 3B, based on concepts from the Atom of Thoughts (AoT) paper.
- The model uses a custom-written GRPO-based Agent in a Gym environment using MLX, where each state transition in the reasoning process is a self-contained, atomic question, as described in Atom of Thoughts for Markov LLM Test-Time Scaling.
- Recurrent LLM Reasoning: Exorbitant?: Members debated whether recurrent LLM reasoning, which requires compute equivalent to a 32B parameter model to match the performance of a 7B model, is practical.
- The key question posed was: why not train a 32B parameter model instead and use early exit, mixture of depths, or speculative decoding for cheaper inference?
- Troubleshooting `trust_remote_code` in Harness: A user questioned if `trust_remote_code` is unconditionally set in `lm-evaluation-harness`, pointing to a specific line in the GitHub repo.
- A member clarified that `trust_remote_code` is set only if the `--trust_remote_code` argument is provided, referencing the relevant section of the code.
- Unveiling Dataset Kwargs Pathway: A user inquired whether setting `trust_remote_code` would override `dataset_kwargs` when loading a local dataset.
- A member clarified that `dataset_kwargs` are passed to `datasets.load_dataset(...)` within the harness, linking to the relevant part of the code.
- User Reports Dataset Generation Error: A user reported encountering a dataset generation error while running `lm_eval` with a configuration specifying `dataset_path: json` and a `data_dir` containing `train.jsonl`, `validation.jsonl`, and `test.jsonl`.
- In response, a member advised manually testing the dataset loading with `load_dataset` and trying an absolute path for the data directory.
MCP (Glama) Discord
- MCP Terraform Registry Faces Issues: Users reported issues getting terraform-registry-mcp and aws-mcp server to function, especially with Claude desktop and Cline when a system-level proxy is enabled, causing mcp-server-fetch errors.
- The issue seems related to proxy settings interfering with the serverâs ability to fetch necessary resources.
- Multi-Agent MCP Architectures Arise: A member explored implementing MCP for multi-agent systems, referencing an Anthropic workshop at AI Engineering Summit and shared an image from the workshop.
- They are building a framework for agents to cooperate across devices and considering adopting MCP, with inspiration from examples like BabyAGI and Stanford generative agents.
- Fast Agent Framework Floats into Focus: A member shared their project, fast-agent on GitHub, for Defining, Prompting and Testing MCP enabled Agents and Workflows.
- The framework allows each agent to be configured with a separate set of MCP servers and can be called as a tool by other agents.
- Node Version Nightmares Nag Claude Users: Users reported encountering a `Cannot find package 'timers'` error when using fastmcp in Claude desktop.
- The problem was traced back to an outdated Node v14 version that Claude was utilizing.
- MCPHub.nvim Navigates Neovim: A new MCPHub.nvim plugin was released, which assists in managing MCP servers within Neovim, and offers features like smart server lifecycle management and integration with CodeCompanion.nvim for AI chat.
- The plugin, installable with a single command (`:MCPHub`), provides a streamlined setup process for MCP server management.
DSPy Discord
- Ash Framework Ecosystem Gets Love: A member suggested the Ash framework for a project, pointing to the ash-project/ash_ai GitHub repository.
- They highlighted instructor_ex, which provides structured outputs for LLMs in Elixir, and directed users to the Ash Discord community for guidance.
- Async Support Initiative Ignites DSPy: A member inquired about the motivations and anticipated performance boosts of full async support in DSPy, and linked to another Discord invite link.
- A core contributor announced intentions to make async support native, and requested feature requests via GitHub issues to prevent Discord oversight.
- LangProBe Benchmarks Program Composition: A new paper, LangProBe: a Language Programs Benchmark, evaluates the impact of DSPy program composition and optimizers on different tasks, while exploring cost/quality tradeoffs.
- As noted in its X/Twitter post, the paper shows that smaller models in optimized programs can outstrip larger models at a lower cost.
- Minions Prep for Cost Dominance: A member indicated that the just-released LangProBe paper provides a good baseline for benchmarking their implemented minions feature, referencing their closed pull request.
- The member added MinionsLM and StructuredMinionsLM for intelligent LM routing, and emphasized the direct relevance of the paper to cost optimization.
LlamaIndex Discord
- AgentWorkflow Context: A member asked about the distinction between Context and Chat History within AgentWorkflow.
- Another member responded that the chat history is inside the context.
- LlamaIndex Integrates MCP Support: A user asked about MCP support in LlamaIndex, and another confirmed it exists with an example notebook.
- The notebook demonstrates how to use MCP with LlamaIndex.
- LlamaParse's Latest Models Parse with Agent: The "Parse With Agent" mode now supports AnthropicAI Claude Sonnet 3.7 and Google Gemini 2.0 Flash, enhancing table parsing and cross-page consistency (announcement).
- These updates should improve the accuracy and reliability of parsing complex documents.
- Need PII? Ask LlamaIndex!: A member is seeking both paid and open-source options for redacting Personally Identifiable Information (PII) from PDFs and images before sending them to an LLM.
- This request highlights the growing need for robust PII redaction tools in LLM applications.
- Windsurf not riding high due to Checkpoint Absence: A member noted the absence of a checkpoint feature in Windsurf, citing the inability to revert to previous states despite repeated coding attempts and file/workspace manipulations.
- The member attached an image illustrating their attempts to drag and drop files into the tab menu, seeking a way to access previous checkpoints.
Latent Space Discord
- AI Not Quite Replacing Programmers Yet: An O'Reilly article suggests AI tools are evolving programming, similar to historical changes since the early days of physical circuit programming.
- Members agreed, noting that LLMs accelerate learning and that this is similar to past complaints about copying from StackOverflow.
- Senior Engineers Rule AI Outputs: Senior engineers effectively guide AIâs output with their expertise, preventing unmaintainable code when using tools like Cursor or Copilot.
- While AI speeds up implementation, senior engineers ensure code maintainability, a skill often lacking in junior engineers.
- Anthropic Scores Massive Funding Round: Anthropic secured $3.5 billion in funding, valuing the company at $61.5 billion post-money, with Lightspeed Venture Partners leading the round.
- This investment will support the advancement of AI systems, enhance understanding of their functionality, and support global expansion.
- Stagehand Tooling Sought for Pythonistas: Following a Latent Space podcast episode on Browserbase, a member sought a self-healing browser workflow tool akin to Stagehand in Python.
- Another member suggested stagehand-py, noting that "it's wip".
- Cursor Beats Claude Code in Code Cagefight: Members compared Claude Code against Cursor, with Cursor being favored for its rollback capabilities.
- Feedback indicated that Claude Code struggles with focus, adds unnecessary code, is more expensive, and lacks the speed of Cursor for code edits.
tinygrad (George Hotz) Discord
- Tinygrad Aims for Fair Compute Marketplace: George Hotz (@tinygrad) describes tinygrad as a formalist project to capture Software 2.0 in a non-leaky abstraction, aiming for a fair marketplace for compute, similar to Linux and LLVM.
- Hotz anticipates tinygrad's speed on NVIDIA to match the existing torch CUDA backend by year-end, sans CUDA, and envisions a test cloud for renting FLOPS in a lambda function.
- Ops.CAT Speed Bounty Faces LLVM Rewriting Issues: A member reported ongoing challenges with the Ops.CAT speed bounty, specifically struggling to get it to rewrite in LLVM, despite being scheduled.
- The current Ops.CAT operations feature a complex structure of PAD, RESHAPE, and BUFFER operations, with arg representing the two tensors to concatenate.
- RDNA2/RX6000 Usability Inquiries with tinygrad: A user asked about RDNA2/RX6000/GFX1030 usability with tinygrad, reporting an `OSError: [Errno 22] Invalid argument` when running with `AMD=1`.
- Another member said that it should work on Linux, requesting the trace for the OS error, which was provided in a trace.txt file.
- Intel Arc A770 Plays Nice with OpenCL: A member confirmed that Intel Arc A770 is indeed usable with tinygrad.
- The recommendation is to utilize the OpenCL backend by setting `GPU=1`.
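A minimal sketch of that setup (the commented lines require tinygrad and a working OpenCL device; the variable must be set before the first tinygrad import):

```python
# tinygrad picks its backend from environment variables; setting GPU=1
# before the first tinygrad import requests the OpenCL backend.
import os

os.environ["GPU"] = "1"  # must be set before importing tinygrad

# from tinygrad import Tensor
# print((Tensor([1.0, 2.0]) + 1).numpy())
```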
LLM Agents (Berkeley MOOC) Discord
- Sutton Dives into Coding Agents: Amazing guest speaker Charles Sutton presented on Coding Agents and AI for Vulnerability Detection in Lecture 5.
- The lecture explores using LLM agents for computer security tasks like finding software vulnerabilities, and discusses design issues in LLM agents.
- DeepMind Researcher Wins Accolades: Charles Sutton, a Research Scientist at Google DeepMind, conducts research in machine learning motivated by applications in code generation, software engineering, programming languages, and computer security.
- Suttonâs work in software engineering has received two ACM Distinguished Paper Awards (FSE 2014, ICSE 2020) and a 10-year Most Influential Paper award (MSR 2023).
- Quiz Posting Day Revealed: A user asked when the quiz is posted each week, to which another user responded that they generally try to release it Wed/Thurs.
- This question was asked in the mooc-questions channel.
- Audio Issues Plague Lecture: A member reported not being able to hear questions during the lecture due to audio problems, requesting assistance from someone in the room, in the mooc-lecture-discussion channel.
- A staff member apologized for the audio issues during the lecture and promised to remind the speakers to repeat all questions going forward.
Cohere Discord
- Cohere Image Embedding Issue Vanishes: A user reported an issue with embedding images using Cohere, but later confirmed that the issue mysteriously resolved itself.
- Another member simply acknowledged the resolution without further comment.
- Cohere Probes Pesky 504 Errors: A Cohere member mentioned that while they didn't observe a spike in 504 errors, they did note super slow requests as a potential cause.
- The member is planning to investigate the source of the slow requests further, thanking the user for the heads up.
Modular (Mojo 🔥) Discord
- `owned` Becomes `own` by Pull Request: A member submitted a pull request to rename `owned` to `own` for consistency with the rest of the argument conventions.
- The renaming aims to align with established coding practices and enhance readability.
- Community Meeting Needs Speakers: The upcoming community meeting, scheduled in one week, seeks speakers to present talks or showcase projects.
- Interested individuals should contact the organizers to secure a spot on the agenda.
- AWS GenAI Loft Hosts MAX Engine Event: An event titled Beyond CUDA: Accelerating GenAI Workloads with Modular's MAX Engine, Hosted by AWS will take place at the AWS GenAI Loft.
- The event, targeted for the Bay Area audience, is scheduled for tomorrow evening.
- SIMD DType Construction Explained: A discussion clarified that `SIMD[DType.uint8, 1](0).type` returns the dtype at compile time, using `var a = UInt8(0); alias dtype = __typeof(a).type` as an example.
- A member highlighted that `SIMD` includes construction checks within its implementation, which helps with validity and type safety.
- Parameter Injection Favored over Globals: In response to a question about using globals, a member asserted that injecting parameters is generally preferable, if you have the time.
- This preference aligns with best practices for code maintainability and testability.
Torchtune Discord
- Step-Based Checkpointing Keeps Compute Alive: Members expressed interest in, and confirmed the ongoing implementation of, step-based checkpointing to mitigate compute waste from training failures.
- This feature saves progress at regular intervals, reducing the impact of interruptions.
- Torch Users Trace with Tensorboard: Torch users debated strategies for visualizing profiler traces, initially attempting Tensorboard but noting the removal of certain plugin features for PyTorch.
- They recommended the PyTorch memory visualizer tool and Perfetto for memory and timing traces as sufficient for following the trail.
- Alternative Profiling Tools Prevail: The discussion underscored the PyTorch memory visualizer tool and Perfetto as solid alternatives for memory and timing traces, respectively.
- These tools arose after users flagged issues with Tensorboard, specifically the absence of some plugin features for PyTorch.
Nomic.ai (GPT4All) Discord
- Ollama vs GPT4All: Which Llama Reigns Supreme?: A user questioned why people choose Ollama or Llama.cpp over GPT4All, arguing that GPT4All's out-of-the-box functionality makes it a better choice.
- The user did not provide specific details on the comparison metrics, but emphasized the ease of use as a key advantage.
- GPT4All Interface to Get Catalan Language Support: A community member requested the addition of Catalan as a language option for the GPT4All interface.
- The request highlighted the presence of Catalan speakers in the community and the potential benefit of localized support.
- GPT4All v3.10.0 Faces Security Vulnerability: A user reported a potential vulnerability in GPT4All v3.10.0 and asked about the proper way to report it.
- No details about the nature of the vulnerability were disclosed in the message, but prompt reporting was advised.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
{% if medium == "web" %}
Cursor IDE ▷ #general (745 messages🔥🔥🔥):
Cursor IDE, MCP, Landing Page Design, Model Performance, Repo Prompt
- Cursor Unstable & Buggy: Users are reporting instability, connection failures, and non-functional checkpoints in the latest Cursor release.
- One user stated, "it's just incredible how unstable is cursor currently", with many considering switching to alternatives like Windsurf and Trae AI.
- MCP Configuration Headaches: Members are struggling to configure MCP servers in Cursor, especially with Windows and remote Ubuntu workspaces, facing issues like client creation failures.
- One user needed help with Firecrawl MCP server setup, but eventually got Puppeteer working and said "Ty for help yall, Im dumb sorry".
- 3.7 not quite heaven: Users are experiencing issues with Claude 3.7 such as it being insanely slow and prone to stopping mid-request without errors.
- One user stated, "3.7 is really unstable atm", which has led many to use Cursor's "Ask" mode or switch to older versions for important tasks.
- Designers Dig into Landing Pages: Members share landing page designs generated with Cursor and discuss their aesthetic and effectiveness.
- The community debated the merits of different designs and compared them to sites like Linear, Framer, Magician Design, and Webflow for inspiration.
- Repo Prompt Gains Traction: Users are hyped about Repo Prompt, praising its multi-file edit capabilities and integration of code snippets.
- The community has also mentioned BrowserTools for debugging, as well as PasteMax, described as an open source poor man's version of Repo Prompt for selecting files.
Links mentioned:
- Tweet from eric provencher (@pvncher): Apply mode is one of the best parts of Repo Prompt, and it's something that seems to scare people, because of how it's presented.It's super powerful though! Here's how you can use it t...
- Tweet from lmarena.ai (formerly lmsys.org) (@lmarena_ai): GPT-4.5 topped all categories across the board, with a clear leadership in Multi-Turn: Multi-Turn, Hard Prompts, Coding, Math, Creative Writing, Instruction Following, Longer Query
- Repo Prompt: no description found
- AgentDesk: no description found
- Kermit Gun GIF - Kermit Gun Kermit gun - Discover & Share GIFs: Click to view the GIF
- elijah wood is asked if he wears wigs: from a prank interview featuring the cast of the lord of the rings movie franchisedo you wear wigs? have you worn wigs? will you wear wigs? when will you wea...
- Installation - AgentDesk - BrowserToolsMCP: no description found
- Tweet from GitHub - FixTweet/FxTwitter: Fix broken Twitter/X embeds! Use multiple images, videos, polls, translations and more on Discord, Telegram and others: Fix broken Twitter/X embeds! Use multiple images, videos, polls, translations and more on Discord, Telegram and others - FixTweet/FxTwitter
- no title found: no description found
- Dinosolostan GIF - Dinosolostan - Discover & Share GIFs: Click to view the GIF
- Reddit - Dive into anything: no description found
- agent_tools/src/agent_tools/method_validator at main · grahama1970/agent_tools: Contribute to grahama1970/agent_tools development by creating an account on GitHub.
- GitHub - mendableai/firecrawl-mcp-server: Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients.: Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients. - mendableai/firecrawl-mcp-server
- GitHub - kleneway/pastemax: A simple tool to select files from a repository to copy/paste into an LLM: A simple tool to select files from a repository to copy/paste into an LLM - kleneway/pastemax
- GitHub - browserbase/stagehand: An AI web browsing framework focused on simplicity and extensibility.: An AI web browsing framework focused on simplicity and extensibility. - browserbase/stagehand
- GitHub - daniel-lxs/mcp-starter: A lightweight Go application that parses JSON configuration files and executes commands with specified environment variables.: A lightweight Go application that parses JSON configuration files and executes commands with specified environment variables. - daniel-lxs/mcp-starter
- Linear â Plan and build products: Linear streamlines issues, projects, and roadmaps. Purpose-built for modern product development.
- Framer: The website builder loved by designers: Design, scale, and publish your websiteâno code needed. Start for free today.
- Magician for Figma: A magical design tool for Figma powered by AI.
- Webflow: Create a custom website | Visual website builder: Create custom, responsive websites with the power of code â visually. Design and build your site with a flexible CMS and top-tier hosting. Try Webflow for free.
- Your Design is S**t: I help early-stage founders turn landing pages into revenue engines. Because great products deserve design that converts visitors into customers.
- The Musks: When $300 Billion Splits Your Family: When you think of the Musk family, it's likely Elon Musk who comes to mind - however, the Musk family extends far beyond Elon, with each member contributing ...
- Reddit - Dive into anything: no description found
- Elon Musk Responds to Nazi Salute Comparisons: "They Need Better Dirty Tricks": The 53-year-old sparked outrage on Monday when he made a straight-armed gesture at Donald Trump's Presidential Parade.
- WATCH: Elon Musk appears to give fascist salute during Trump inauguration celebration: Billionaire Elon Musk gave what appeared to be a fascist salute Monday while making a speech at the post-inauguration celebration for President Donald Trump ...
- Musk family - Wikipedia: no description found
- no title found: no description found
- How Rich Has Elon Musk Been During Every Decade of His Life?: Elon Musk is the richest person in the world right now, according to Forbes. He has a ton of money -- around $232 billion. But Musk didn't become mega-rich overnight. He made his huge fortune over...
Codeium (Windsurf) ▷ #announcements (1 messages):
Windows ARM support, Windsurf Next, Ubuntu 24.04, Claude 3.7 Sonnet, MCP Tools
- Windsurf adds Windows ARM support: Windsurf Next <:wsnext:1336821369685540914> now supports Windows ARM64 as of this weekend, and can be downloaded here.
- Ubuntu 24.04 bug squashed: Both Windsurf 1.3.11 and Windsurf Next have been patched to fix crashes caused by permissions errors on Ubuntu 24.04 (changelog).
- Cascade Now Supports Claude 3.7 Sonnet: Windsurf Next now supports Claude 3.7 Sonnet which takes 1.0 user prompt credits on every message and 1.0 flow action credits on each tool call.
- Windsurf Addresses MCP Tool issues in JSON: Windsurf 1.3.11 patched MCP fixes for incorrectly formatted MCP tools in JSON, and provides better MCP Error Handling.
Links mentioned:
- Thank you for downloading Windsurf Next: Windsurf Next is our experimental beta, giving early adopters a unique opportunity to test new features before they make it to the stable release.
- Windsurf Next Changelogs | Windsurf Editor and Codeium extensions: Latest updates and changes for the Windsurf Next extension.
- Windsurf Editor Changelogs | Windsurf Editor and Codeium extensions: Latest updates and changes for the Windsurf Editor.
Codeium (Windsurf) ▷ #discussion (37 messages🔥):
Codeium Pro issues and snoozing, Supercomplete availability in Codeium, Visual Studio Codeium extension versions, JetBrains extension issues
- Codeium Pro users face snoozing problems: A user reported that Codeium Pro stops working and snoozes itself, requiring manual re-enablement, which is a known issue for some users.
- The user found that using the pre-release version of the extension provided a better experience, though still not ideal.
- Supercomplete's Status Clarified: Supercomplete, initially marketed for Pro plans, is now included in the free plan but functions fully only in the pre-release version of the extension due to API issues caused by Microsoft.
- One user noted that they wished to be notified of such changes.
- Visual Studio Version Lags: A user noted that the Visual Studio version of the extension provides inferior suggestions compared to the VSCode version.
- It was revealed that the Visual Studio extension uses an older Codeium LSP, with a link to the GitHub repo provided for reference.
- JetBrains Plugin stuck on Processing request: Users of the JetBrains plugin reported being stuck in the Processing request phase, leading to errors.
- This problem occurs specifically in the latest pre-release version, causing the plugin to be unable to generate responses.
- Enterprise features lagging behind Windsurf: A member mentioned that Enterprise features are lagging the other subscription types.
- Another member countered that Enterprise and Team plans have the same cadence and that this is done to ship only totally battle-tested features.
Link mentioned: GitHub - Exafunction/CodeiumVisualStudio: Visual Studio extension for Codeium: Visual Studio extension for Codeium. Contribute to Exafunction/CodeiumVisualStudio development by creating an account on GitHub.
Codeium (Windsurf) ▷ #windsurf (432 messages🔥🔥🔥):
Codeium customer support, Premium flow action credits, Windsurf's new update, Claude 3.7, Multiple selection using CTRL+D
- Users Experience Poor Codeium Customer Support: Users have reported experiencing poor customer support from Codeium, with one user mentioning they have been trying to follow up with their subscription issue for the last 4 weeks without a resolution.
- Premium Flow Action Credits Depleting Rapidly: Users reported that their premium flow action credits are depleting rapidly, within 4-5 days, even though they still have prompt credits left, and some users are now considering switching to Cursor.
- One user speculated that changes to Claude 3.7 might be the cause, as it now uses flow actions per prompt, leading to high credit consumption.
- Windsurfâs New Update Causing Issues on Ubuntu: A recent Windsurf update caused issues for users on Ubuntu 24.04, with the application failing to start and displaying a FATAL:setuid_sandbox_host.cc(158) error, requiring a manual fix to change the permissions of chrome-sandbox.
- One user reported that the update crashed their Ubuntu system, leading to a reinstallation and data loss, highlighting the need for proper backups before updating.
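For anyone hitting the `FATAL:setuid_sandbox_host.cc(158)` error above, the common fix for Electron/Chromium apps is to restore root ownership and the setuid bit on the bundled `chrome-sandbox` helper. A hedged sketch: the search paths below are assumptions, not Windsurf's documented install location, so locate the file on your own system first.

```shell
# Find the bundled sandbox helper; the install path varies per system.
# (The directories searched here are an assumption, not documented paths.)
SANDBOX="$(find /opt /usr/share -name chrome-sandbox 2>/dev/null | head -n1)"

# The Chromium setuid sandbox requires the helper to be owned by root
# with mode 4755 (the setuid bit set, executable by everyone).
sudo chown root:root "$SANDBOX"
sudo chmod 4755 "$SANDBOX"
```

This is the standard workaround for this class of Chromium sandbox error; backing up your settings before any update, as the crash reports above suggest, remains prudent.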
- Claude 3.7 Code Model Credit Consumption Rate Sparks Debate: Members observed that Claude 3.7 is consuming credits rapidly due to excessive tool calls per prompt, with some experiencing 30-40 tool calls for small changes; some also felt the implementation of 3.7 was too hasty.
- Some users are now sticking with 3.5 or other models for better efficiency, and there are strong suggestions by users for Codeium to hide 3.7 as a default model.
- Windsurf struggles to move cursor using CTRL+D: A user reported an issue with multiple selection using CTRL+D in Windsurf, where moving the cursor with CTRL + Left or CTRL + SHIFT + Left only moves the cursor at the first selection.
- It was recommended to check the status bar to confirm if you are in multi-selection mode, and it was found that the multiple selections disappear from the status bar.
Links mentioned:
- oTTomator: no description found
- AlfredPros/CodeLlama-7b-Instruct-Solidity · Hugging Face: no description found
- Anthropic Status: no description found
- servers/src/github at main · modelcontextprotocol/servers: Model Context Protocol Servers. Contribute to modelcontextprotocol/servers development by creating an account on GitHub.
- Piyushzen GIF - PIYUSHZEN - Discover & Share GIFs: Click to view the GIF
- Support | Windsurf Editor and Codeium extensions: Need help? Contact our support team for personalized assistance.
- GPT-4.5 FLOP? Claude 3.7 Sonnet STARTER PACK. What is Claude Code REALLY?: 🔥 GPT-4.5 maybe the biggest FLOP. MEANWHILE Claude 3.7 Sonnet and Claude Code just CHANGED THE GAME! 🤯 Anthropic's Claude Code maybe the greatest AI Agent T...
- GitHub - kamusis/vsix-downloader: Contribute to kamusis/vsix-downloader development by creating an account on GitHub.
- GitHub - ian-cowley/MCPSqlServer: SQL Server MCP Server for Windsurf IDE - A standalone MCP server providing SQL Server integration capabilities: SQL Server MCP Server for Windsurf IDE - A standalone MCP server providing SQL Server integration capabilities - ian-cowley/MCPSqlServer
- Feature Requests | Codeium: Give feedback to the Codeium team so we can make more informed product decisions. Powered by Canny.
OpenAI ▷ #annnouncements (2 messages):
Sora onboarding session, Sora prompt crafting
- Sora 101 Live Onboarding Happening Soon!: Join the Sora team's <@713183296099450910>-hosted live onboarding session to cover Sora fundamentals, optimal prompting techniques, and early access artist onboarding insights.
- The live session starts <t:1741024800:R> and you can join the discussion via this discord link or this one.
- Crafting Great Prompts for Sora: The Sora 101 session will offer best practices for crafting great prompts, based on insights from the onboarding process for early access artists.
- Whether new to Sora or looking to refine your approach, this session is a great opportunity to learn and ask questions.
OpenAI ▷ #ai-discussions (423 messages🔥🔥🔥):
Mirror sites with pro accounts, GPT-4.5 Image Recognition, GPT prioritization of Pro vs Plus, Switching from ChatGPT to Grok, Gemini Free vs Pro Features
- Mirror Sites Offer Pro Accounts for Free: Multiple mirror sites offer pro accounts for very cheap or free, and haven't been banned; instead OpenAI is shadow banning Plus users who used the service on multiple devices according to drinkoblog.weebly.com.
- The same user noted the realistic compute limits, arguing unlimited sounds better than "12 hours compute" on the surface, but is unrealistic in practice since some users never touch grass and will consume disproportionately more resources.
- GPT-4.5 Image Recognition Draws Mixed Reactions: Members are debating whether the new GPT-4.5 has better image recognition compared to GPT-4o, with Future Machine being more vocal about OpenAI (OAI)'s choices, though initial tests show that GPT-4.5 scores a bit higher than 4o on the MMMU (vision-oriented reasoning benchmark) with a +5% improvement.
- One member expressed that OpenAI is deprioritizing Plus users in favour of Pro users, stating that Plus now feels like second-class citizens instead of the usual premium status.
- Grok's Custom Instructions Released, Not Working For All: Grok AI's custom instructions feature has been released for all users according to a news source, allowing users to customize Grok as per their needs and make it respond accordingly.
- A member shared custom Grok instructions aiming for an abusive and lewd troll persona, but reported it doesn't work, and other users reported the same.
- GPT Pro or Grok Super: Users discuss the pros and cons of switching from ChatGPT Pro to SuperGrok, with Grok being much faster and smarter for general tasks, though some of its API knowledge is outdated.
- A user claimed that Gemini's context window has 1 million tokens, no filter, and is great for creative writing.
- Dynamic learning system prototype: A user is developing a basic application prototype and will try injecting stellargraph nodes into it to create a learning system where they can inject the most relevant information to the AI dynamically, injecting knowledge in it.
Links mentioned:
- Tweet from Derya Unutmaz, MD (@DeryaTR_): @EducatingwithAI @kimmonismus I can't talk about it now but full autonomous problem solving is coming soon! Just hang in there for few more months ;)
- GPT 4.5 - not so much wow: GPT 4.5 is here, and do you remember when AI lab CEOs like Sam Altman and Dario Amodei were betting everything on scaling up base models like this one? Well ...
- Tweet from Derya Unutmaz, MD (@DeryaTR_): After what I've seen today related to an AI model (can't talk about it yet) I can confidently claim that the scientific process will never be the same again! I'm now 99% certain that all diseases incl...
- GPT-4.5 shocks the world with its lack of intelligence...: Try Brilliant free for 30 days https://brilliant.org/fireship You'll also get 20% off an annual premium subscription. Let's take a first look at OpenAI's late...
- New 'Custom Instructions' Feature Added for X Platform's Grok AI on Web - The Tech Outlook: It was being reported lately that a new "Custom Instructions" feature will be brought to Grok AI and finally, it is now available to X users on the web. This new feature will let users cus...
OpenAI ▷ #gpt-4-discussions (30 messages🔥):
GPT Model Selection, Projects vs GPTs, Claude 3.7 vs ChatGPT, Context window size comparison, Clearing Chatlogs and Uploaded Data
- Can't Choose Specific Model in GPT Creation: A user inquired about selecting a specific model when creating a GPT, but was told it defaults to 4o and you can't choose models in GPTs.
- A user stated you can use the Projects feature to change models to 4o, o1, 4.5 in a single chat.
- Projects Outshines GPTs by Allowing Model Selection: One user found that while they got the best programming results using o3-mini-high, custom instructions in Projects don't work with o3 mini models (only file uploads).
- However, another user said that with a Pro subscription, using a combination of o1, 4o, and 4.5 in Projects allows for custom instructions with file uploads.
- ChatGPT Bashed, Claude 3.7 Hailed: One user declared ChatGPT is a joke now and found Claude 3.7 to be very impressive.
- However, another user said Claude's projects are of no use, complaining that they can hardly upload only two files maximum and it says memory full, calling Claude over hyped.
- Context Windows Compared between Models: A user touted that Claude can better understand the context in larger files, and pointed out its larger context window: 200K compared to ChatGPT's 128K.
- They clarified that the 128K context window on ChatGPT costs $200 with an Enterprise plan.
- Chatlogs Clearance Conundrums: One user asked who knows to clear all chatlogs and uploaded data at studio using web interface or api call?
- They were testing an app and needed to remove many dummy files and logs.
OpenAI ▷ #prompt-engineering (2 messages):
Dall-E image generation, Synthetic plants, Image prompting strategies
- Dall-E Generates Synthetic Plants with Transplant Organs: A member prompted Dall-E to generate an image of synthetic plants that grow hearts and livers for transplant, visible within a transparent membrane and nourished by the GM plant.
- The initial results emphasized hearts over livers, prompting the user to refine the prompt with more details about the liver lobes.
- Crafting Detailed Prompts for Specific Dall-E Outputs: The model suggested basing the image on bioengineered plant-inspired organs with a focus on livers and hearts after an initial image couldnât be shown.
- The user iteratively refined the prompt to guide Dall-E's attention to specific features like liver lobes and the organs forming almost like fruit.
- Showcasing Art and Following Community Guidelines: A member shared art prompts, noting the need to spoiler unsettling or horror-based images as per community guidelines in <#1107255707314704505>.
- This emphasizes awareness of community rules while sharing creative content.
OpenAI ▷ #api-discussions (2 messages):
Dall-E image generation, Synthetic plants growing organs for transplant
- Dall-E Crafts Organs from Synthetic Flora: A member prompted Dall-E to generate an image of synthetic plants growing hearts and livers for transplant within a transparent membrane.
- After the first image stressed the hearts, the member rephrased the prompt to emphasize the liver lobes and the organs forming almost like fruit.
- Navigating Image Content Guidelines: Members were reminded to use spoiler tags for images that may be unsettling or horror-based, following channel guidelines.
- The content creator had to spoiler some of the images they shared in order to comply.
Unsloth AI (Daniel Han) ▷ #general (252 messages🔥🔥):
Llama zipping WAVs confusion, GRPO training steps, 4-bit model saving issues, Unsloth team size, Continued pretraining of Unsloth
- Llama Zips WAVs into Headaches: A member had a funny experience with a llama model: compressing a 192 KB ZIP file resulted in a 48 KB lossless WAV format, which it then tried to re-zip.
- The user noted, "confusion, after first working through packing a 48KB again, to make a smaller zip… That was the r1-1776-distill-llama-70b."
- GRPO Training Steps Speculation: A user inquired about the number of training steps needed for LoRA training Qwen2.5-14B-instruct with GRPO to lower the loss, wondering if more steps are required for better reasoning.
- Another member suggested GRPO training might take around 24 hours, stating that convergence depends on the model and sampling, so thereâs no fixed time.
- Saving 4-Bit Models into Trouble: A user ran into issues while saving a Phi-4 model in 4-bit after training it with GRPO, using an example Jupyter notebook provided by Unsloth.
- The error occurred during the saving process.
- Unslothâs Docs Teach Checkpointing: A user inquired about loading LoRA weights from a previous run, to which another user pointed to the Unsloth documentation on finetuning from the last checkpoint.
- The documentation explains how to edit the Trainer to add save_strategy and save_steps in order to save checkpoints and resume training.
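The checkpointing setup described above can be sketched as a configuration fragment with the Hugging Face `TrainingArguments` (the `save_strategy`, `save_steps`, and `resume_from_checkpoint` names are real `transformers` parameters; the directory name and step count are illustrative choices):

```python
from transformers import TrainingArguments

# Save a checkpoint every 100 optimizer steps so a run can be paused
# and resumed, as the Unsloth docs describe.
args = TrainingArguments(
    output_dir="outputs",
    save_strategy="steps",
    save_steps=100,
)

# ...build the trainer as in the Unsloth notebooks, passing args=args...
# Then resume from the most recent checkpoint found in output_dir:
# trainer.train(resume_from_checkpoint=True)
```

`resume_from_checkpoint=True` picks up the latest checkpoint automatically; passing a path instead selects a specific one.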
- Unsloth Continued Pretraining is Very Helpful for New Languages: When a user asked about scenarios for continued pre-training, another user linked to the Unsloth documentation, stating itâs useful for new languages.
- The member pointed out that the blog post is even better, explaining how Unsloth's release allows you to easily continually pretrain LLMs 2x faster and use 50% less VRAM than Hugging Face + Flash Attention 2 QLoRA.
Links mentioned:
- Continued LLM Pretraining with Unsloth: Make a model learn a new language by doing continued pretraining with Unsloth using Llama 3, Phi-3 and Mistral.
- Finetuning from Last Checkpoint | Unsloth Documentation: Checkpointing allows you to save your finetuning progress so you can pause it and then continue.
- Continued Pretraining | Unsloth Documentation: AKA as Continued Finetuning. Unsloth allows you to continually pretrain so a model can learn a new language.
- Aman's AI Journal • Primers • DeepSeek-R1: no description found
- Kullback-Leibler divergence - Wikipedia: no description found
- GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models.: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
- Reasoning - GRPO & RL | Unsloth Documentation: Train your own DeepSeek-R1 reasoning model with Unsloth using GRPO.
- Unsloth Documentation: no description found
Unsloth AI (Daniel Han) ▷ #off-topic (10 messages🔥):
Github Issues, inline_asm, GRPO, VLLM, Online Training
- Github Repo Issues Reported: A member reported issues with reading open repos.
- They found some code with the `inline_asm` feature.
- Unsloth uses VLLM for backend: A member asked how Unsloth works in production deployment and was told it uses VLLM for the backend.
- The process involves GRPO fine-tuning, then inferencing on the fine-tuned model using Unsloth + VLLM, rather than Continuous Training + Inference (online learning).
- Online training requires implementation: A member clarified that during GRPO fine-tuning, VLLM is used to generate and grade outputs with reward functions before backpropagating loss.
- To implement online training, one would need to develop the mechanism themselves.
Link mentioned: stickbreaking-attention/stickbreaking_attention/sb_varlen/softplus.py at main · shawntan/stickbreaking-attention: Stick-breaking attention. Contribute to shawntan/stickbreaking-attention development by creating an account on GitHub.
Unsloth AI (Daniel Han) ▷ #help (54 messages🔥):
GRPO Training, Qwen2.5-14B-instruct fine-tuning, DeepSeek-R1-Distill-Llama-8B error, Mistral embedding model, GCC compiler issue
- GRPO Training needs adequate steps: A member inquired about the number of training steps needed for LoRA training Qwen2.5-14B-instruct with GRPO to lower the loss.
- Another member suggested that 700-1200 steps are generally good for training a GRPO model using LoRA, but the optimal number depends on the dataset.
- DeepSeek-R1-Distill-Llama-8B error: A user encountered a RuntimeError when training unsloth/DeepSeek-R1-Distill-Llama-8B on Kaggle, related to matrix multiplication shapes.
- Despite upgrading transformers with `pip install --upgrade "git+https://github.com/huggingface/transformers.git"`, the issue persisted, and the user sought advice on resolving it.
- Mistral Embedding Model missing lm_head: A user is trying to create an embedding model using Mistral but is facing issues with removing the lm_head for training.
- The user is seeking suggestions on how to properly remove the lm_head for effective training of the embedding model.
- Fixing GCC Compiler Issue with vLLM: A user encountered a RuntimeError: Failed to find C compiler when running the GRPO tutorial locally with meta-Llama-3.1-8B-Instruct.
- Despite attempting to install GCC via conda, the issue remained unresolved, and the user is restricted from using apt-get due to security reasons on their school's HPC.
- Decoding GGUF models naming convention: A user inquired about the naming convention for GGUF models, specifically the meaning of the UD prefix in models like unsloth/r1-1776-GGUF.
- A member explained that UD stands for Unsloth Dynamics, a quantization algorithm that provides better results than IQ.
Links mentioned:
- unsloth/r1-1776-GGUF at main: no description found
- Google Colab: no description found
- Google Colab: no description found
Unsloth AI (Daniel Han) ▷ #research (94 messages🔥🔥):
GRPO Reward Functions, Distilling Models for Agent Skeletons, SWE-bench Performance, String Replacement for Code Editing, Model Tool-Calling and Composition
- GRPO Rewards Linearly: Reward functions for GRPO are scored using the same format as in Unsloth's script, with correctness measured by linear scaling based on matches between the extracted answer and the correct answer, using `sum(i == j for i,j in zip(r,a))/len(a)`.
- The creator noted that GRPO works even without a language prior, leading to better exploration and varied reasoning traces during training.
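As a hedged illustration of that linear correctness reward (the function name and batch shapes here are assumptions for clarity, not Unsloth's exact script):

```python
def correctness_reward(extracted: list[str], answers: list[str]) -> float:
    # Fraction of batch items where the extracted answer exactly matches
    # the reference: sum(i == j for i, j in zip(r, a)) / len(a).
    return sum(i == j for i, j in zip(extracted, answers)) / len(answers)

# Example: two of three extracted answers match their references.
score = correctness_reward(["4", "7", "2"], ["4", "7", "3"])  # 2/3
```

The linear scaling means partial credit across the batch rather than an all-or-nothing signal, which smooths the reward surface GRPO optimizes over.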
- Vladrad cracks Distilling Agent Skeletons: A member shared a model distilled from O3/R1, designed to generate good agent skeletons from pre-made functions, available on Hugging Face, after investing $400 to get the string replacement right.
- This approach aims to create a base model capable of handling languages that base models struggle with, such as Perl, by incorporating unit tests and problem-solving strategies from other repos.
- Claude 3.5 Sonnet Sweeps Bench with SOTA: Anthropic's Claude 3.5 Sonnet achieves 49% on SWE-bench Verified, surpassing the previous state-of-the-art model's 45%; they assess a model's ability to complete real-world software engineering tasks by resolving GitHub issues from popular open-source Python repositories.
- A member referred to the bitter lesson: general methods that leverage computation are ultimately the most effective.
- String Replace Debated as Coding Strategy: Members debated the effectiveness of string replacement for code editing, with one member arguing it's generally garbage because models aren't meant to write code this way.
- Despite reservations, another member reported success fine-tuning Qwen 2.5 for string replacement, especially when the model can see the file before making replacements.
- New Algorithm: Reinforce++ enhances stability: A new algorithm, Reinforce++, claims to improve stability over the classic REINFORCE algorithm by incorporating elements from PPO, calculating advantage as A_t = r(s_t, a_t) - β * Σ_{t'=t..T} KL_{t'}.
- The algorithm is said to be similarly performant to GRPO in terms of reward and speed but with greater stability during training.
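The quoted advantage formula can be sketched directly; this is a toy illustration with scalar per-step KL values (a real implementation computes per-token KL against a reference policy, which is not shown here):

```python
def reinforce_pp_advantage(rewards: list[float],
                           kls: list[float],
                           beta: float) -> list[float]:
    # A_t = r(s_t, a_t) - beta * sum of KL divergence from step t to T,
    # i.e. each step is charged for all remaining KL penalties.
    T = len(rewards)
    return [rewards[t] - beta * sum(kls[t:]) for t in range(T)]

adv = reinforce_pp_advantage([1.0, 0.0], [0.5, 0.5], beta=0.1)
# adv[0] = 1.0 - 0.1 * (0.5 + 0.5) = 0.9
# adv[1] = 0.0 - 0.1 * 0.5 = -0.05
```

Folding the KL penalty into the advantage (rather than clipping as PPO does) is what the claim about improved stability over vanilla REINFORCE rests on.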
Links mentioned:
- Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet: A post for developers about the new Claude 3.5 Sonnet and the SWE-bench eval
- SOTA on swebench-verified: (re)learning the bitter lesson: Searching code is an important part of every developer's workflow. We're trying to make it better.
- How DeepSeek learns: GRPO explained with Triangle Creatures: Click to visit my sponsor https://brilliant.org/DrMihaiNica/ and try their *Language Models course* (along with everything else they have to offer too!) ...
Perplexity AI ▷ #general (378 messages🔥🔥):
Perplexity AI Bugs, Claude 3 Opus vs Sonnet, Deepseek Propaganda?, Perplexity AI Business Fellowship, GPT-4.5 Quality Concerns
- Perplexity Web UI struggles with Rewrite Bugs: Multiple users report that Perplexity's web UI rewrite functionality is broken, with rewritten prompts always using the default model, pplx_pro, regardless of the selected option.
- Some users have also experienced the rewriting function duplicating the prompt instead of rewriting it, with <@883069224598257716> tagged for assistance.
- Claude 3 Opus and Sonnet: A Model Mix-Up: Users are confused as to whether the model indicators are accurate, questioning whether theyâre getting Claude 3.7 Sonnet or Claude 3 Opus when selecting Claude in settings.
- Some find that Pro Search overrides the default model with Sonar, leading to discrepancies between selected models and actual models used.
- Deepseekâs Alleged Propaganda Outing: A user posted an image allegedly generated by Deepseek that was deemed politically motivated by the community.
- Another member said You are fake deepseek. Real deepseek doesn't talk on western affairs.
- GPT-4.5 gets review bombed for poor riddle solving skills: Some users report poor output from GPT-4.5 on Perplexity, while others say the model answers more creatively and accurately; the model indicator also used to show which model was used with Pro Search.
- One member said they're stepping it up with 4.5, back then it wasn't able to solve a riddle like 4.5 in chatgpt now it solves it perfectly and response is exactly like 4.5 in chatgpt while showing off an example of their chat log.
- Bookmark Organization with Perplexity: A user asked if Perplexity could organize their 7,000 browser bookmarks.
- The user accompanied their question with a Bart Simpson meme.
Links mentioned:
- LiveBench: no description found
- Claude 3.7 goes hard for programmers…: Try Convex for free, the only database designed to be generated https://convex.link/fireshipAnthropic released an impressive new CLI tool for programmers cal...
- The ECHO Emergence: A Case Study in AI Collaboration, Evolution, and Containment : Foreword by Chase Holden:
- Big Oof GIF - Big Oof Yikes - Discover & Share GIFs: Click to view the GIF
Perplexity AI ▷ #sharing (7 messages):
Shareable threads, Perplexity AI integrations, Ingredient breakdowns
- Shareable Threads requested: A member was reminded to ensure their thread is set to `Shareable`, with an attached screenshot for reference.
- This ensures that other users can easily access and view the content of the thread.
- Perplexity AI Integrations in Demand: Multiple users are exploring ways to integrate Perplexity AI into various applications and workflows as shown by shared search queries like integrate perplexity.
- This indicates a growing interest in leveraging Perplexity AI's capabilities in diverse contexts.
- Deep Dive into Ingredient Breakdowns: A user shared search queries related to ingredient breakdowns, recommending in-depth analysis.
- The shared queries included requests for explanations of ingredients and must-have components: ingredient breakdown.
Perplexity AI ▷ #pplx-api (3 messages):
Open Source Claude-Code, Perplexity API Limitations, Obsidian Web Clipper Issue
- Perplexity Offers API Credits for Open Source Claude-Code: Perplexity is offering free API credits to developers interested in building an open-source Claude-Code model with editor integrations and extensions, as announced on X.
- Interested individuals are encouraged to DM @GregFeingold and @AarashHeydari for more details.
- Perplexity API Incompatibility with OpenAI impacts Obsidian: The Perplexity API isn't fully compatible with OpenAI, causing issues with tools like Obsidian Web Clipper.
- Specifically, the API requires an assistant message between user messages, a constraint not present in OpenAI-type APIs, leading to problems when Obsidian Web Clipper attempts to post consecutive user messages.
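One way to work around this when calling the API from your own code is to insert a placeholder assistant turn between consecutive user messages before sending the request. A minimal sketch (the function name and filler text are arbitrary choices, not anything documented by Perplexity):

```python
def enforce_alternation(messages: list[dict], filler: str = "Understood.") -> list[dict]:
    """Insert a placeholder assistant message between consecutive user
    messages so the request satisfies the alternation constraint."""
    fixed = []
    for msg in messages:
        if fixed and fixed[-1]["role"] == "user" and msg["role"] == "user":
            fixed.append({"role": "assistant", "content": filler})
        fixed.append(msg)
    return fixed

msgs = [{"role": "user", "content": "Clip this page."},
        {"role": "user", "content": "Summarize it."}]
# enforce_alternation(msgs) yields three messages: user, assistant, user.
```

A shim like this sits between a client such as Obsidian Web Clipper and the API; messages that already alternate pass through unchanged.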
Link mentioned: Tweet from Aravind Srinivas (@AravSrinivas): If anyone wants to build an open source Claude-Code with some editor integrations and extensions, Perplexity would be happy to provide free API credits. Please DM @GregFeingold and @AarashHeydari
HuggingFace ▷ #general (118 messages🔥🔥):
RL fundamentals, DeepMind RL Course, Automated video generation, OpenAI image generation alternatives, Fine-tuning Phi-3 for Multi-modality
- DeepMind delivers detailed RL course: Someone asked for recommendations for reinforcement learning resources comparable to Stanford's CS231n, and another member suggested the DeepMind x UCL Deep Learning Lecture Series 2021 on YouTube.
- Video Automation System Seeketh Guidance: A Java developer seeks guidance on creating a local text-to-video automation system without prior AI coding experience, requesting step-by-step instructions.
- Spaces Face Spam Snags: Spaces have restrictions around prohibited programs like VNC, and rapidly creating a large number of free Spaces is considered spam; that said, some Spaces have been running for years without being restarted, so properly built Spaces are very robust.
- Fine-Tune Phi-3 for Multi-Modality: One member is fine-tuning Phi-3 for multi-modality, aiming to give images speech/text as input, using an A100-equipped Colab Pro.
- Another member warned that such fine-tuning would take 6+ A100s and run for approximately 2 weeks, though added that "QLoRA and PEFT make anything possible with a positive attitude and a credible project".
- HTML code generation using AI is not worth it: One member asked if it's possible to have a small LLM (maybe 8B at most) work off a separate component such as alecsharpie/codegen_350m_html, specifically for HTML.
- Another member responded that using AI for raw HTML generation is not worth it, suggesting the AI should instead manage a system that configures a layout while other systems turn that into HTML, because given the cascade of problems involved, the precision on the details is something AI will not achieve even with endless test cases.
Links mentioned:
- DeepMind x UCL | Deep Learning Lecture Series 2021: The Deep Learning Lecture Series is a collaboration between DeepMind and the UCL Centre for Artificial Intelligence.
- SmolWorld - a Hugging Face Space by p3nGu1nZz: no description found
- Using Hugging Face libraries on AMD GPUs: no description found
- Repo duplicator - a Hugging Face Space by huggingface-projects: no description found
- Wan-AI/Wan2.1-T2V-1.3B · Hugging Face: no description found
- microsoft/Phi-3-mini-4k-instruct · Hugging Face: no description found
- Tonic/GemmaX2-28-2B-gguf · Hugging Face: no description found
- microsoft/Phi-3.5-mini-instruct · Hugging Face: no description found
- modeling_phi3 errors with AttributeError: 'DynamicCache' object has no attribute 'get_max_length' · Issue #36071 · huggingface/transformers: System Info transformers version: 4.49.0.dev0 (315a9f4~1) Platform: Windows-10-10.0.20348-SP0 Python version: 3.11.7 Huggingface_hub version: 0.28.1 Safetensors version: 0.5.2 Accelerate version: 1...
- GitHub - vosen/ZLUDA: CUDA on non-NVIDIA GPUs: CUDA on non-NVIDIA GPUs. Contribute to vosen/ZLUDA development by creating an account on GitHub.
- John6666/Phi-3.5-mini-instruct · Hugging Face: no description found
- moondream/megalith-mdqa · Datasets at Hugging Face: no description found
- Spaces - Hugging Face: no description found
- Getting Started With The Python Client: A Step-by-Step Gradio Tutorial
- Spaces - Hugging Face: no description found
- GitHub - sayakpaul/q8-ltx-video: This repository shows how to use Q8 kernels with `diffusers` to optimize inference of LTX-Video on ADA GPUs.: This repository shows how to use Q8 kernels with `diffusers` to optimize inference of LTX-Video on ADA GPUs. - sayakpaul/q8-ltx-video
- GitHub - Wan-Video/Wan2.1: Wan: Open and Advanced Large-Scale Video Generative Models: Wan: Open and Advanced Large-Scale Video Generative Models - Wan-Video/Wan2.1
HuggingFace ▷ #today-im-learning (1 messages):
VLMs, Cracking VLMs
- Cracking VLMs notes shared: A link to a Google Docs document titled Cracking VLMs, notes so far was shared, at this link.
- Upcoming VLM Cracking Competition: A user is preparing a series of challenges on VLMs for cracking and reverse engineering.
Link mentioned: VLM Notes: VLM Notes TODO : Understand how preprocessor handles images : More work into explaining vision_encoder Resources : https://github.com/merveenoyan/smol-vision Smolvlm and idefics Moondream especia…
HuggingFace ▷ #cool-finds (7 messages):
Dataset Viewer Errors, FastRTC
- Dataset Viewer plagues users: A user suggested fixing Dataset Viewer errors for compatibility with various libraries and SQL, enhancing discoverability.
- Another user thanked them in advance, and jokingly requested an additional 1.2M rows of a HQ dataset.
- FastRTC Project: Anyone Working on It?: A member inquired whether anyone is actively working on FastRTC.
- Another member pointed them to a specific channel.
HuggingFace ▷ #i-made-this (7 messages):
AI Story Studio, MoD ControlNet Tile Upscaler, VAE comparison, Remote VAE from HF, Cross-device browser-based scratchpad
- AI Story Studio launches for collaborative storytelling!: A new interactive storytelling experience called AI Story Studio has launched, allowing users to co-write adventures with AI by picking a genre, guiding the story with prompts, and downloading the final result from AI Story Studio.
- The tool aims to help users practice creative writing, overcome writer's block, and explore storytelling with AI-generated ideas.
- MoD Upscaler Enhances Images Without Quality Loss: The MoD ControlNet Tile Upscaler for SDXL tool was launched, which uses tiling technology to upscale images with preserved details and smooth transitions, as shown on the Demo App and Github Code.
- The upscaler offers preserved details, advanced tiling tech, fast performance, and a user-friendly interface for professional-quality results.
- Compare VAE quality with interactive demo!: An interactive demo comparing the reconstruction quality of various VAEs has been released by @rizavelioglu, link to Space.
- A user asked to add the remote VAE from HF, blog post at huggingface.co.
- Build Cross-Device Scratchpad for Math Learning!: A member built a cross-device, browser-based scratchpad to augment math learning, which allows users to access the same scratchpad across devices using an ID; it is barebones but potentially useful for others.
- A demo video was shared (scratchpadDemoOutput.mp4), using Firebase which was turned off due to resource concerns.
- InternVL 2.5 AWQ Conversion Released: An AWQ conversion of InternVL 2.5 was released, showing little degradation in performance compared to the original, found at HuggingFace.
- Unlike the author's version, this AWQ version is compatible with the transformers library.
Links mentioned:
- Vae Comparison - a Hugging Face Space by rizavelioglu: no description found
- rootonchair/InternVL2_5-4B-AWQ · Hugging Face: no description found
- Remote VAEs for decoding with Inference Endpoints đ€: no description found
- Remote VAE Inference Endpoints - a diffusers Collection: no description found
- MoD ControlNet Tile Upscaler SDXL - a Hugging Face Space by elismasilva: no description found
- GitHub - DEVAIEXP/mod-control-tile-upscaler-sdxl: MoD Control Tile Upscaler for SDXL Pipeline: MoD Control Tile Upscaler for SDXL Pipeline. Contribute to DEVAIEXP/mod-control-tile-upscaler-sdxl development by creating an account on GitHub.
HuggingFace ▷ #core-announcements (1 messages):
Remote VAE Decode endpoints, Hybrid Inference, SD v1, SD XL and Flux
- Sweet Honey Optimization Deployed: The new codename honey is live on Remote VAE Decode endpoints for SD v1, SD XL and Flux, reducing latency up to 10x.
- The change empowers local AI builders with Hybrid Inference.
- Hybrid Inference Benefits Listed: Hybrid Inference offers a fast and simple way to offload local generation requirements, reducing hardware demands while offering the highest quality without sacrificing performance.
- It's free, fully compatible with Diffusers, and developer-friendly with simple requests and fast responses.
- VAE Encode is Coming Soon: Quickly decode latent representations into high-quality images without compromising performance or workflow speed with VAE Decode.
- Efficiently encode images into latent representations for generation and training with VAE Encode (coming soon).
Link mentioned: Hybrid Inference: no description found
HuggingFace ▷ #computer-vision (3 messages):
Audio to Video matching, ViT resources, ViT and Global Average Pooling
- Audio & Video get Synced: A member pointed to a discussion on how to match audio to video in this discord channel.
- ViT resources sought, clarity craved: A member requested resources to understand Vision Transformer (ViT), specifically how each attention head contributes to the CLS token or captures information about the image.
- They noted existing explanations state CLS token is used the same as BERT, but found this explanation insufficient.
- ViT: Average Pooling Questioned: A member inquired why Global Average Pooling cannot be used, as mentioned in the ViT paper.
HuggingFace ▷ #NLP (2 messages):
Web scraping with Python, Running Phi-4 as real-time API
- Pythonistas Ponder Web Scraping: Members are asking for advice on web scraping data from sites like Wikipedia using Python.
- While specific methods weren't detailed, common practices involve libraries like Beautiful Soup and Scrapy for parsing HTML content.
- Phi-4 Phans Fancy Real-Time API: Someone inquired about running the `Phi-4` model as a real-time API using websockets.
- Unfortunately, there were no shared experiences or advice on this topic within the discussion.
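For the scraping question above, a dependency-free sketch using the standard library's `html.parser` (Beautiful Soup and Scrapy are the heavier-duty options mentioned); the HTML snippet is made up for illustration:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags -- a stdlib-only alternative
    to Beautiful Soup for simple extraction tasks."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<p>See <a href="/wiki/Python">Python</a> and <a href="/wiki/AI">AI</a>.</p>')
# parser.links now holds ["/wiki/Python", "/wiki/AI"]
```

For real sites, fetch pages politely (respect robots.txt and rate limits), and note that Wikipedia in particular offers an API and database dumps that are preferable to scraping.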
HuggingFace ▷ #gradio-announcements (1 messages):
Gradio, Groovy, Python to Javascript
- Gradio kicks off Groovy: Python gets Javascript!: Gradio introduces Groovy, a tool that converts Python functions to JavaScript for client-side execution in Gradio apps, enabling developers to write code once in Python and achieve JavaScript performance without the burden of maintaining dual codebases (docs).
- Groovy Transpiler Promises Clarity: Unlike other transpilers, Groovy is designed to provide clear error messages when it cannot transpile specific code, focusing on supported elements such as simple Python functions, a subset of the Python standard library, and Gradio-specific classes.
- This approach ensures developers are aware of limitations when crossing languages, prioritizing transparency over attempting to handle all possible scenarios.
- Client-Side Functions Boost Gradio Responsiveness: Gradio allows you to run certain "simple" functions directly in the browser by setting `js=True` in your event listeners, which will automatically convert your Python code into JavaScript.
- This improves app responsiveness by avoiding server round trips, especially beneficial on hosted applications with high load or latency.
Link mentioned: Client Side Functions: A Step-by-Step Gradio Tutorial
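Both announcements above hinge on the same restriction: only plain Python over a stdlib subset can cross to the browser. A hypothetical handler of that kind; the `.click(...)` wiring line is illustrative and assumes a Gradio app:

```python
# A handler of the kind Groovy targets: pure Python, stdlib-only,
# no I/O -- the sort of function js=True can transpile client-side.
def format_greeting(name: str) -> str:
    name = name.strip() or "world"
    return f"Hello, {name.title()}!"

# Hypothetical wiring inside a Gradio app (not runnable standalone):
# greet_btn.click(format_greeting, inputs=name_box, outputs=out_box, js=True)
```

Anything touching files, the network, or third-party packages would still need to run server-side.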
HuggingFace ▷ #smol-course (5 messages):
Smol Agents Quiz, NLP Reasoning Course, ClaudePlaysPokemon replication with smolagents
- Smol Agents Quiz Frustrates Users: A member expressed frustration with the Smol Agents Quiz, citing issues with unclear requirements and receiving a score of 0.0 out of 5 despite multiple attempts, linking to the quiz's app.py file.
- The member noted the need to mine error logs to understand the exact providers required for tools and models.
- NLP Reasoning Course Released: HuggingFace released a new unit in their NLP course focused on Reasoning models, titled The Reasoning Course.
- Replicate ClaudePlaysPokemon with SmolAgents: A community effort is underway to replicate ClaudePlaysPokemon using smolagents, as detailed in this GitHub repository.
- This project serves as a benchmark for LLM agents in a simulated environment.
Links mentioned:
- The Reasoning Course - Hugging Face NLP Course: no description found
- app.py · agents-course/unit2_smolagents_quiz at main: no description found
- GitHub - CalebDeLeeuwMisfits/PokemonLLMAgentBenchmark: Contribute to CalebDeLeeuwMisfits/PokemonLLMAgentBenchmark development by creating an account on GitHub.
HuggingFace ▷ #agents-course (87 messages🔥🔥):
Introductions, Lambda Go Labs, CodeAgent LLM Size, Quiz Grader Issues, Inference Credits Exhaustion
- Lambda Go Labs Spark AI Excitement: Lambda Go Labs is a community by Lambda Go and Future Technologies Limited focused on AI learning, building, and research.
- The community offers hands-on experience, opportunities to share work, and a supportive network for both experienced professionals and newcomers.
- LLM Size Debate for CodeAgents Explodes: A member inquired whether a 32B LLM is necessary for a CodeAgent, or if a smaller distilled model would suffice for learning purposes.
- They noted that smaller models yielded rubbish answers when used for playing around with SmolAgents.
- Final Quiz Grader Under Fire: Multiple members reported that the final quiz grader in unit 2.1 is obviously broken and inquired about when it will be fixed.
- One user linked to a discord thread discussing this issue.
- Inference Credits Disappear Quickly!: A user reported exhausting their monthly included credits for Inference Providers despite only completing course requirements.
- Another user suggested it may be related to switching between Google accounts or kernel dying and re-running, and they were forced to upgrade to PRO.
- ToolCallAgent Troubles Surface: A member reported issues with MultiAgent Architecture using ToolCallAgent in smolagents, where the fallback to a child agent with web access failed.
- Specifically, the manager agent couldn't delegate web search tasks to the child agent, despite the child agent having the necessary tools.
Links mentioned:
- Hugging Face â The AI community building the future.: no description found
- Dancing Cowboy GIF - Dancing Cowboy Finger Gun - Discover & Share GIFs: Click to view the GIF
HuggingFace ▷ #open-r1 (2 messages):
Replicant model training, R1 reasoning dataset for coding tasks
- Replicant Model Training Halted: The research team paused the procedural generation of a 25 petabyte dataset for training the replicant model.
- The team had to return to other work rather than creating the dataset.
- Quest for R1 Coding Dataset Arises: A member inquired whether an R1 reasoning dataset exists for coding tasks.
- There were no answers to this question.
aider (Paul Gauthier) ▷ #general (121 messages🔥🔥):
Aider Leaderboard, Claude Code, Grok vs O3 Mini, anon-kode, python + uv
- Aider Leaderboard Tooling Benchmarks: The Aider leaderboard is used to benchmark AI models while using Aider as the primary tool/assistant; Claude Code is recognized as a tool/assistant rather than an AI model and is also benchmarked.
- A user pointed out the need for a tool-agnostic benchmark like SWE Benchlets for comparison.
- Anon-Kode forks Claude Code, goes OpenAI Compatible: A member shared that the dude who grabbed the source for Claude Code (link to original tweet) has now released a modified version that works with OpenAI compatible APIs (link to tweet) and available on GitHub.
- Set up Aider in personal project: A user asked for advice about using Aider with a personal projectâs Git repository and virtual environment.
- Others suggested installing Aider globally or in the projectâs venv and using tools like uv to sync environments.
- Grok Debugging > O3 Mini Code Creation?: A user mentioned Grok is good at debugging, but O3 mini high sonnet might be better at code creation, like adding new functions.
- They also noted Claude 3.7 adds unintended stuff and deepseek-chat with O1 Pro has been almost 95% fine for them as an editor.
- LLMs Censorship: Some members discuss the increased censorship in language models, particularly regarding kernel-level code generation and the need to label prompts as âeducational onlyâ to bypass restrictions.
- Grok3 seems to refuse to do the request, guess there still are limits.
Links mentioned:
- Tweet from Aravind Srinivas (@AravSrinivas): If anyone wants to build an open source Claude-Code with some editor integrations and extensions, Perplexity would be happy to provide free API credits. Please DM @GregFeingold and @AarashHeydari
- Tweet from Daniel Nakov (@dnak0v): Extracted claude-code unminified TS files. Repo in comments
- Tweet from Daniel Nakov (@dnak0v): Lots of things to fix, but you can use anything that supports OpenAI-style API. If you're brave, give it a try. Source code will go up once I clean it up some more. `npm i -g anon-kode`, `cd your-project`, `k...`
- GitHub - dnakov/anon-kode: Contribute to dnakov/anon-kode development by creating an account on GitHub.
- Building Python tools with a one-shot prompt using uv run and Claude Projects: I've written a lot about how I've been using Claude to build one-shot HTML+JavaScript applications via Claude Artifacts. I recently started using a similar pattern to create one-shot Python utilities,...
- #!/usr/bin/env -S uv run: This is a really neat pattern. Start your Python script like this: #!/usr/bin/env -S uv run # /// script # requires-python = ">=3.12" # dependencies = [ # "flask==3.*", # …
aider (Paul Gauthier) ▷ #questions-and-tips (85 messages🔥🔥):
Gemini 2.0 Pro Model issues, Aider + RAG/Vector embeddings, Aider editing with git diff, Aider with OpenRouter models and edit modes, Aider Architect mode
- `gemini/gemini-2.0-pro-exp-02-05` RESOURCE_EXHAUSTED issues: A user reported encountering `RESOURCE_EXHAUSTED` errors when using the `gemini/gemini-2.0-pro-exp-02-05` model with a large context window in Aider, while the `gemini-2.0-flash-thinking-exp-01-21` model works fine.
- They asked if there's a way to use the Pro model with a full context window without limits.
- User solved the problem with an Aider extension's `workon` command: A user created an Aider extension with a new `/workon` command, which analyzes imports in a file and passes a list of relevant files to the `add` command.
- The user claims it saves a lot of time and they have implementations for TS, Vue, Kotlin, and Java, but it's described as rather ugly and doesn't support `--subtree-only`.
- Requesting Aider edit files with Git Diff Syntax: A user wants Aider to edit files using git conflict-marker syntax (like `<<<<<<< branch`, `=======`, `>>>>>>> replace`) directly in the file, instead of just showing it in the terminal and replacing the text.
- They want to be able to edit the changes before accepting them, but others pointed out this is not possible without forking the project; alternatively, the built-in IDE diff tool can be used.
- Sonnet vs OpenRouter recommendation for Aider: A user requested recommendations for models to use with Aider's edit modes that aren't too expensive, finding o1-preview to be both costly and ineffective.
- Another user suggested using `r1-free` or `gemini flash thinking` for planning and `Sonnet 3.7` for execution, and shared a snippet of their aider config with model aliases and settings for edit formats.
- Git commands for Aider: A user described a technique of creating a script to run Aider, using `--load` to load a script with commands on startup, and adding `/run git diff` to update on the latest changes, useful for working on a branch.
- A separate user suggested Aider could use `git apply mypatch.diff` to apply changes instead of having the LLM manually edit the whole file, and add a `--check` step.
Link mentioned: GitHub - lutzleonhardt/copilot-proxy: Copilot Proxy is a Visual Studio Code extension that exposes the VS Code Language Model API via an Express server. This experimental extension is intended solely for research and prototyping purposes and should not be used in production environments.
GPU MODE ▷ #general (1 messages):
Vision Models, Attention based ViTs, MLP-Mixer
- MLP-Mixer: An alternative to ViTs: A member questioned why MLP-Mixer isnât used more often for vision models.
- Attention based ViTs still prevail: The member noted that attention based ViTs are still the standard for SOTA vision models.
Link mentioned: MLP-Mixer: An all-MLP Architecture for Vision: Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that w…
GPU MODE ▷ #triton (54 messages🔥):
SRAM vs Cache Confusion, Triton Scalar Constants Data Type, CUDA backend hyper-parameters, Triton Autotuning Resources, Triton BLAS Implementations
- SRAM and Cache Clarifications: A discussion clarified the relationship between SRAM, registers, shared memory, and cache, noting that registers, shared memory, and cache are chip/software level properties built from SRAM.
- Any shared memory that isn't allocated actually becomes L1 cache.
- Cache Level Control in Triton: It was discussed that while Triton doesn't offer direct control over cache levels (L1/L2), the `cache_modifier` argument in `tl.load` allows specifying whether to hit from L1 or L2.
- The difference between `ca` and `cg` is that `cg` specifies that the load should only hit from L2 and not L1 cache.
- CUDA Cache Coherency Deep Dive: The CUDA documentation states that global data is coherent at the L2 level, but multiple L1 caches are not coherent for global data.
- It was explained that L2 cache is shared among all SMs, but each SM has its own independent L1 cache and that a threadfence can be used to flush L1 cache writes to global memory.
- Scalar Constants Data Types in Triton: A user asked about specifying the data type of scalar constants in Triton to avoid unintended upcasting during operations like bitwise AND.
- Another user suggested applying the mask and then downcasting to int8: `x = (x & mask).to(tl.int8)`.
- Block Scaled Matmul for INT8 in Triton: A user sought guidance on implementing block scaled matrix multiplication for INT8 in Triton, referencing a tutorial for FP4 and FP8 formats.
- The user encountered errors with `tl.dot` when attempting to apply scales due to hidden dimension requirements and sought advice on handling the `scales` part.
Link mentioned: Block Scaled Matrix Multiplication — Triton documentation: no description found
GPU MODE ▷ #cuda (25 messages🔥):
FP8 GEMM in CUTLASS, Determine Architecture for NVCC, Flash Attention Indexing
- FP8 GEMM with rowwise scales achieved: A member was looking for an example of fp8 gemm with rowwise scales implemented in CUTLASS for sm100, eventually found a solution, relieving the need for further assistance.
- Torch utility helps determine CUDA arch: A member needed to determine the `--arch=` of the current system for their build system, and another member suggested using the `torch.cuda.get_device_capability()` utility from PyTorch.
- An alternative solution was found by leveraging `nvidia-smi --query-gpu=name,compute_cap --format=csv` to query the GPU's compute capability directly, avoiding the PyTorch dependency.
- Flash Attention Indexing puzzles developers: A member requested help with indexing in Flash Attention, specifically struggling with how to implement that part of the kernel.
- CUDA Runtime API reveals device properties: A member discovered the CUDA Runtime API to programmatically select a compute-device which best matches certain criteria.
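The `nvidia-smi` route above can feed a build system with a few lines of parsing. The sample CSV below is hardcoded for illustration; a real script would capture it with `subprocess.run`:

```python
def arch_flag_from_smi(csv_output: str) -> str:
    """Map `nvidia-smi --query-gpu=name,compute_cap --format=csv` output
    to an nvcc `--arch=` flag for the first listed GPU."""
    lines = [line.strip() for line in csv_output.strip().splitlines()]
    # lines[0] is the "name, compute_cap" header; take the first GPU row.
    _name, cap = (field.strip() for field in lines[1].rsplit(",", 1))
    major, minor = cap.split(".")
    return f"--arch=sm_{major}{minor}"

# Hardcoded sample output (an assumption for illustration):
sample = "name, compute_cap\nNVIDIA H100 PCIe, 9.0"
```

Splitting on the last comma with `rsplit` keeps a GPU name that itself contains a comma from breaking the parse.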
Links mentioned:
- torch.cuda.get_device_capability — PyTorch 2.6 documentation: no description found
- CUDA Runtime API :: CUDA Toolkit Documentation: no description found
GPU MODE ▷ #torch (17 messages🔥):
FSDP2 OffloadPolicy, register_post_accumulate_grad_hook, load_inline CUDA kernels, reduce and not scatter, optimizer scaling
- Users seek flexible FSDP2 OffloadPolicy: A user inquired about plans for FSDP2 `OffloadPolicy` classes, seeking greater control over gradient handling, specifically to reduce and not scatter.
- The response indicated no immediate plans but suggested exploring `register_post_accumulate_grad_hook`, though this runs after reduce-scatter, which the user wants to avoid.
- CUDA Kernel launch gives illegal memory access: A user reported illegal memory access errors when passing memory pointers directly to CUDA code using `load_inline`.
- Another user suggested that the PyTorch CUDA caching allocator might allocate more memory than needed, potentially causing out-of-bounds reads in the direct pointer version while the tensor version works due to bounds checking.
- Exploring All-Reduce for Gradient Aggregation: A user proposed an alternative approach of using All-Reduce to aggregate gradients, followed by immediate optimizer application and zeroing of gradients, to avoid offloading.
- A respondent questioned the scalability of this approach due to the 2x communication volume compared to reduce-scatter, and suggested gathering gradients into larger blocks instead.
- Users consider parameter scaling methods: A user targets scaling on a single node, exploring options like scattering to CPU and accumulating, or gathering on rank 0 on CPU for the optimizer step.
- The goal is to find faster ways, highlighting the desire for an extensibility point in the system.
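The 2x communication-volume concern above can be checked with the standard ring-collective cost formulas; a back-of-the-envelope sketch, not a measurement:

```python
def ring_comm_bytes(n_ranks: int, grad_bytes: int):
    """Per-rank bytes sent by ring reduce-scatter vs ring all-reduce,
    using the textbook (N-1)/N and 2(N-1)/N factors."""
    reduce_scatter = (n_ranks - 1) / n_ranks * grad_bytes
    # All-reduce = reduce-scatter + all-gather, hence twice the volume.
    all_reduce = 2 * (n_ranks - 1) / n_ranks * grad_bytes
    return reduce_scatter, all_reduce

rs, ar = ring_comm_bytes(8, 10 * 2**30)  # 8 ranks, 10 GiB of gradients
```

The factor-of-two gap is why gathering gradients into larger blocks, rather than all-reducing everything, was suggested.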
GPU MODE ▷ #algorithms (5 messages):
fa3, absmax quantization, hada transform
- FA3 Works, Quantization Error High: After some initial issues, FA3 was reported to be working, but with a significantly higher quantization error than basic absmax quantization.
- It was suggested to perform absmax quantization after the hada transform, especially for 'v', to avoid out-of-distribution issues due to large activations.
- Quantization Strategy Shift Proposed: To mitigate the quantization challenges with FA3, a strategy shift was proposed involving applying absmax quantization post Hada transform for better performance.
- The focus is particularly on the 'v' component, which is prone to large activations and out-of-distribution behavior if not properly quantized after the transform.
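For reference, the absmax baseline being compared against can be sketched in a few lines: per-tensor symmetric int8, with the Hadamard-transform step discussed above omitted. A minimal sketch, not the FA3 implementation:

```python
def absmax_quantize(xs):
    """Symmetric absmax int8 quantization: scale by max|x|/127, round,
    and clamp to [-127, 127]."""
    scale = max(abs(x) for x in xs) / 127 or 1.0  # avoid /0 on all-zero input
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

q, s = absmax_quantize([0.5, -2.0, 1.6])
```

A single outlier inflates `scale` and crushes the resolution of every other value, which is exactly why spreading outliers across the vector with a Hadamard transform before quantization helps.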
GPU MODE ▷ #jobs (1 messages):
Internship Opportunity, Low-Level Programming, LLM Inference, Mobile and PC Platforms
- Internship Seeks Low-Level LLM Inference: A user is seeking interns for low-level programming to improve LLM inference on mobile and PC platforms, directing interested parties to this GitHub repo.
- The repo hosts code for running llama gemma on cupy and numpy.
- GitHub Repo Focuses on llama gemma: The GitHub repository provided focuses on running llama gemma on cupy and numpy.
- This suggests a focus on optimizing LLM performance using numerical computing libraries.
Link mentioned: GitHub - githubpradeep/llm_np_cp: running llama gemma on cupy and numpy: running llama gemma on cupy and numpy. Contribute to githubpradeep/llm_np_cp development by creating an account on GitHub.
GPU MODE ▷ #beginner (5 messages):
Triton tensor creation, ROCm support for RX 7800 XT, NVIDIA GPU alternatives
- Creating Triton Tensors of Scalars: A user inquired about explicitly creating a tensor of a scalar in Triton to specify the data type.
- They tried `mask = tl.tensor(0xF, type=tl.uint16)` but reported that it did not work.
- RX 7800 XT ROCm Woes: A new GPU programming and AI user with a NITRO+ AMD Radeon™ RX 7800 XT 16GB reported issues getting it to work with PyTorch and other AI libraries.
- They noted that ROCm doesn't support anything lower than the 7900.
- NVIDIA GPUs: A Viable Alternative?: A member jokingly suggested to sell your GPU and buy NVIDIA, then provided serious alternatives for accessing NVIDIA GPUs.
- They recommended platforms like lightning.ai for free monthly GPU hours (availability unconfirmed), and runpod and vast.ai for affordable GPU renting.
GPU MODE ▷ #self-promotion (8 messages🔥):
Tilelang Kernel, Deepseek flashmla, MLA leaderboard, Bitnet group
- Tilelang kernel rivals Deepseek flashmla: A member shared that with 80 lines of tilelang kernel code you can get 95% performance of deepseek flashmla (500% faster than triton) on H100, with a link to the GitHub repo.
- Desire expressed for MLA leaderboard: One of the members expressed the desire to have an MLA leaderboard so others can flex.
- That same member asked another if they would be interested in a working group and re-purposing the bitnet group.
- Tilelang Kernel receives positive feedback: A member stated that the kernel is well written.
- That same member stated that the documentation should probably be a tutorial/blog in itself.
Link mentioned: tilelang/examples/deepseek_mla at main · tile-ai/tilelang: Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels - tile-ai/tilelang
GPU MODE ▷ #reasoning-gym (11 messages🔥):
Chain of Draft PR, Throttling errors
- Chain of Draft PR: Drafty Thinking Makes LLMs Swiftly Reasoning: A member added a PR for two new system prompt styles similar to those in the Chain of Draft paper.
- The Chain of Draft paper introduces a paradigm where LLMs generate minimalistic intermediate reasoning outputs, reducing verbosity and cost.
- Eval Script Praised for Logging Upgrade: Members are grateful that whoever fixed the eval script is the goat, resulting in much better logging.
- However, there is an open issue with throttling errors that still needs fixing, and no intermediate saving of results is implemented yet.
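The two prompt styles above can be sketched as plain system-prompt strings; the wording below is paraphrased from the Chain of Draft idea and is an assumption, not the PR's actual text:

```python
# Hypothetical system prompts contrasting verbose CoT with draft-style
# reasoning (wording paraphrased, not taken from the PR).
PROMPT_STYLES = {
    "chain_of_thought": (
        "Think step by step and show your full reasoning "
        "before giving the final answer."
    ),
    "chain_of_draft": (
        "Think step by step, but keep only a minimal draft of each "
        "thinking step, a few words at most, then give the final answer."
    ),
}
```

The draft style trades verbose traces for terse ones, which is where the token-cost savings come from.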
Link mentioned: Chain of Draft: Thinking Faster by Writing Less: Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-ste…
GPU MODE ▷ #gpu模式 (4 messages):
Tilelang, MLA, FlashMLA, Python
- Tilelang Touted as MLA/FlashMLA Alternative: A member suggested to "directly go all in on tilelang", claiming it is as fast as MLA and flashMLA but only requires 80 lines of Python code.
- Enthusiastic response to tilelang suggestion: A member thanked the original poster for sharing and expressed eagerness to learn tilelang.
GPU MODE ▷ #general (1 messages):
prefixsum submission, H100 submission
- Prefixsum Submission Purge Requested: A user requested the removal of their top H100 submission on prefixsum due to a naming discrepancy.
- The user believes that removing this submission will facilitate easier tracking of changes.
GPU MODE ▷ #submissions (17 messages🔥):
Leaderboard Submissions, Leaderboard Name Mismatches, Successful Submissions, GPU Usage
- Cluster-Bot Flags Leaderboard Name Clashes: The Cluster-Bot detected that the leaderboard name specified in the command doesn't match the one in the submission script header, defaulting to submitting to the specified name (e.g., `grayscale`, `vectorsum`, `vectoradd`).
- Modal Runners Yield Successful Submissions: Several leaderboard submissions succeeded using Modal runners on various GPUs, including H100 for `conv2d` (ID `1509`), T4 for `matmul` (IDs `1510`, `1512`), and A100, L4, H100 for `vectorsum` (ID `1516`).
- Missing Leaderboard Name Causes Cluster-Bot to Complain: The Cluster-Bot prompted users to specify the leaderboard name either via command argument or within the submission script using the `{#,//}!POPCORN leaderboard <leaderboard_name>` directive.
- Vectoradd Leaderboard Sees Multiple Test and Benchmark Submissions: Multiple test (IDs `1526`, `1528`) and benchmark (IDs `1527`, `1529`) submissions to the `vectoradd` leaderboard succeeded using A100 and H100 GPUs with Modal runners.
GPU MODE ▷ #ppc (3 messages):
AVX512, FMA instruction, Performance Improvement
- AVX512 Broadcasts Directly in FMA Instruction: A member suggested exploring AVX512's capability to perform broadcasts directly within the FMA instruction for potential performance improvements.
- Another member acknowledged they had not considered this and would investigate it further, signaling potential interest in leveraging AVX512's features.
- Seeking Drastic Improvements via AVX512: A member expressed the need for a drastically different or new approach to achieve the next level of improvement.
- The suggestion to explore AVX512âs FMA instruction with direct broadcasts was presented as a potential avenue for achieving this significant advancement.
GPU MODE ▷ #feature-requests-and-bugs (10 messages🔥):
L4 & T4 Timeout, AMD MI300s, Beta Launch
- L4/T4 Timeout Issue Lingers: The issue with L4 and T4 timing out during compiling hasn't been resolved due to development delays.
- A member mentioned a slight delay because the team is working on more interesting problems with some real life impact.
- MI300s Launch Possibly Incoming!: There was an improvement in the status with AMD, so the team will hopefully launch with MI300s as an option.
- The team emphasized that this is not promising anything, but it is an option.
- Beta Launch a Roaring Success: The team expressed satisfaction with the beta/alpha launch and decided to tackle more interesting problems sooner than planned.
- This launch was originally planned for the end of April.
OpenRouter (Alex Atallah) ▷ #app-showcase (1 message):
Travel Reels, AI agents, Trip Planning
- App emerges for Saving Travel Reels: An app was created to solve the endless cycle of saving travel reels on social media and then wasting hours researching each spot manually.
- The app (https://thatspot.app/) uses AI agents to automatically process travel reels, extracting every place mentioned with locations, price ranges, reservation requirements, booking links, and operating hours.
- AI Agents Automate Travel Research: The app leverages AI agents to streamline the manual research process associated with planning trips from saved travel reels.
- It automatically extracts precise locations, price ranges, reservation requirements, direct booking links, and operating hours, directly from travel reels.
Link mentioned: ThatSpot Guide: no description found
OpenRouter (Alex Atallah) ▷ #general (126 messages🔥🔥):
Google Flash 2.0 Error, Claude 3.7 Sonnet Rate Limits, OpenRouter API Key with VS Studio/RooCode, BYOK azure models in openrouter, Accessing Links in Chat Models
- Google Flash 2.0 throws error: A user reported receiving a 502 error when inferencing with Google's Flash 2.0 and Flash 2.0 Light models, with the error message "Provider returned error" and an internal error encountered by Google.
- A member suggested putting the request in the appropriate Discord channel.
- Rate Limits of Claude 3.7 Sonnet Discussed: A user inquired about the rate limits for Claude 3.7 Sonnet in terms of RPM (Requests Per Minute) and TPM (Tokens Per Minute).
- A member stated that OpenRouter doesn't have specific rate limits for individual users, and if rate limits are hit, it's usually the OpenRouter limit, which is higher than Tier 4 (see Anthropic's rate limits documentation).
- Struggles with OpenRouter API Key in VS Studio/RooCode: A user encountered a 401 Authentication Failure while trying to use an OpenRouter API key in VS Studio via RooCode, despite having funds in their OpenRouter account.
- Members suggested checking the API key for correctness, ensuring OpenRouter is selected as the API provider in RooCode, and verifying the base URL is correctly configured based on this tutorial.
- Requesting BYOK Azure models in OpenRouter: A user asked about using BYOK (Bring Your Own Key) with Azure models in OpenRouter, aiming for a unified API to use finetuned models through OpenRouter.
- A member stated that it's not possible to use models other than what's listed in the `/models` endpoint, which only returns public models and not ones with BYOK. However, you can use your own OpenAI API key in Integration settings (OpenRouter Integration Settings).
- Navigating the Labyrinth of OpenRouter Latency: A user inquired about improving the time to first token (TTFT) latency on OpenRouter, noting their finding that OpenRouter has an average of 2x TTFT compared to using providers directly.
- A team member asked the user to consolidate their findings in a forum post and mentioned that reducing latency is currently a high priority.
Links mentioned:
- Discord - Group Chat That's All Fun & Games: Discord is great for playing games and chilling with friends, or even building a worldwide community. Customize your own space to talk, play, and hang out.
- Chatroom | OpenRouter: LLM Chatroom is a multimodel chat interface. Add models and start chatting! Chatroom stores data locally in your browser.
- API Parameters - Complete Guide to Request Configuration: Learn about all available parameters for OpenRouter API requests. Configure temperature, max tokens, top_p, and other model-specific settings.
- Elevated errors on requests: no description found
- Tweet from Imrat (@imrat): 1. Setup OpenAI API Key in Settings > Models with your OpenRouter API key and the OpenRouter base URL. 2. Make sure you add the right models. 3. When I want to use OpenRouter models - CMD+Shift+0 (zer...
- List endpoints for a model â OpenRouter | Documentation: no description found
- Object Generation: Learn how to use the useObject hook.
- no title found: no description found
- Cursor 0.46.7 Pro: Openrouter Key not working: Hey, your API keys only work in chat mode, also known as Ask mode.
LM Studio ▷ #announcements (1 message):
LM Studio SDK, Python, TypeScript, Agent API, MIT License
- LM Studio SDKs Arrive for Python and TypeScript!: LM Studio launched software developer kits for Python (`lmstudio-python`) and TypeScript (`lmstudio-js`), both under the MIT license.
- These SDKs allow developers to tap into LM Studio's AI capabilities from their own code, including LLMs, embeddings models, and agentic flows.
- LM Studio Introduces Agent-Oriented .act() API: LM Studio introduced its first agent-oriented API, the `.act()` call, where the model autonomously executes tasks using provided tools over multiple rounds.
- Given a prompt and tools, the model runs autonomously for multiple execution rounds until it accomplishes the task (or gives up).
- LM Studio SDK Documentation Now Live: LM Studio released documentation for the Python (`lmstudio-python`) and TypeScript (`lmstudio-js`) SDKs, providing resources for interacting with LLMs, embedding models, and agentic flows.
- The SDKs let you use LLMs to respond in chats or predict text completions, define functions as tools, and turn LLMs into autonomous agents that run completely locally.
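As a conceptual illustration of what an `.act()`-style call does (a pure-Python stand-in, not the actual `lmstudio-python` API), the runtime repeatedly asks the model for its next step and executes the requested tool until the model declares the task done:

```python
# Conceptual sketch of an agent-oriented ".act()"-style loop. This is an
# illustrative stand-in: the real SDK plays the role of `plan` with an
# actual LLM and reports progress through callbacks.
def act(task, tools, plan):
    """`plan` stands in for the model: it maps (task, history) to either
    ("call", tool_name, args) or ("done", final_answer)."""
    history = []
    while True:
        step = plan(task, history)
        if step[0] == "done":
            return step[1]
        _, name, args = step
        result = tools[name](*args)       # execute the requested tool
        history.append((name, args, result))

# A toy "model" that multiplies two numbers via a tool, then finishes.
def toy_plan(task, history):
    if not history:
        return ("call", "multiply", (12, 4))
    return ("done", history[-1][2])

answer = act("What is 12 * 4?", {"multiply": lambda a, b: a * b}, toy_plan)
print(answer)  # 48
```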
Links mentioned:
- Introducing lmstudio-python and lmstudio-js: Developer SDKs for Python and TypeScript are now available in a 1.0.0 release. A programmable toolkit for local AI software.
- lmstudio-python (Python SDK) | LM Studio Docs: Getting started with LM Studio's Python SDK
- lmstudio-js (TypeScript SDK) | LM Studio Docs: Getting started with LM Studio's TypeScript / JavaScript SDK
LM Studio ▷ #general (100 messages🔥🔥):
Context Length Error, Model Architecture unsupported by Llama.cpp, LM Studio CLI Commands, LM Studio SDKs, LM Studio Downgrading
- LM Studio model failing to load and showing "Unsupported device" error: Users are encountering a `Failed to load model` error with the message `Unsupported device` after updating LM Studio, potentially due to insufficient memory or incompatible model-loading settings; they were advised to try adjusting GPU offloading or thread pool size.
- Context length can also impact memory usage; the left number is the number of tokens the model is already using in the chat history, while the right number is the context limit (basically how long until the memory starts to truncate).
- Diffusion Model architecture not supported by Llama.cpp: Users are receiving `error loading model architecture: unknown model architecture: 'sd3'` when trying to load diffusion models; it was clarified that llama.cpp does not support image/video/audio generation models.
- Support for vision models in `llama.cpp` is uncertain, with concerns about the lack of Llama 3.2 vision or Pixtral vision support; however, some believe that UI-TARS fixes will help a lot more.
- Pseudollama bridges the OLLAMA gap: A user asked if LM Studio endpoints were compatible with apps that take an OLLAMA endpoint, and it was answered that it is not supposed to work by default, but Pseudollama can bridge the gap.
- The author noted that this is 100% vibe coded, so there are likely dumb issues throughout, but it works.
- LM Studio SDK Documentation Released: With the most recent release of LM Studio, LM Studio CLI commands were documented, and a user confirmed that the OpenAI API continues to be supported and prioritized.
- One member of the community noted that they were waiting for this to come up so I can make a plugin, noting that they wanted to make a watch app communicator.
- Users find way to Downgrade LM Studio: A user needed to downgrade to version 0.3.10 because the new version removed the ability for their preset to do a tensor split between their two cards.
- Another user suggested using the web archive to find the download link, while another said to just switch up the download link parameters.
Links mentioned:
- lms - LM Studio's CLI | LM Studio Docs: Get started with the lms command line utility.
- GitHub - verbiate/Pseudollama: Contribute to verbiate/Pseudollama development by creating an account on GitHub.
- Tweet from LM Studio (@lmstudio): Something big for builders is coming in the next few hours!
- GitHub - ggml-org/llama.cpp: LLM inference in C/C++: LLM inference in C/C++. Contribute to ggml-org/llama.cpp development by creating an account on GitHub.
LM Studio ▷ #hardware-discussion (21 messages🔥):
AMD and Intel vs CUDA, Vulkan vs CUDA, AMD GPU market share, Nvidia 5090 specs
- AMD and Intel could rival CUDA: Members discussed whether AMD or Intel could become viable for ML pipelines and frameworks to compete with CUDA.
- Vulkan is not a CUDA competitor: Members mentioned that Vulkan is a graphics API whereas CUDA is built for GPGPU compute tasks, so using Vulkan for computing is like using dx12 for computing: you can do it, but does it make sense?
- One member stated, competition and having alternatives to Vulcan will be nothing but good for consumers.
- AMD needs higher market share to invest in GPU Compute: Some members believe if AMD increases their market share, they would be more interested in investing in their GPU computing department.
- One member believes they've been happy holding their 10%.
- Nvidia 5090 Spec Numbers are sought out: A member recalls seeing some Nvidia 5090 FP8/16/32 spec numbers and is asking where to find those.
- A member also shared an image of their 4x 3090 gang setup.
- Chip Foundry Time Acquisition: One member believes that the real question is whether AMD can buy the time from a chip foundry, because Nvidia has the upper hand.
- Another member said Intel opening their next one in ~2030 so expecting nothing from AMD.
Nous Research AI ▷ #general (93 messages🔥🔥):
Low-rank space reasoning, Nous API, CUDA Kernels, Hermes 3 erotic fiction, Ollama usability
- Low-Rank Space Makes PEFT Reasonable: A member suggested that since reasoning differences from base models are often in low-rank space, using PEFT for training becomes a reasonable approach, further suggesting the use of Qwen 0.5B for low-cost testing.
- Others agreed that low vram unsloth RL trainer seems to work well because of this.
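The low-rank intuition can be sketched numerically: a LoRA-style adapter adds a rank-r delta B·A to a frozen weight W, so only 2·d·r parameters are trained. A pure-Python toy (the sizes and values are chosen purely for illustration):

```python
# Minimal sketch of a LoRA-style low-rank update: the fine-tuned weight
# is W' = W + B @ A, where B is (d x r) and A is (r x d) with r << d,
# so the delta has rank at most r.
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

d, r = 4, 1               # hidden size 4, rank-1 adapter (illustrative)
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
B = [[0.5], [0.0], [0.0], [0.0]]      # d x r, trainable
A = [[0.0, 1.0, 0.0, 0.0]]            # r x d, trainable
W_prime = add(W, matmul(B, A))        # only 2*d*r = 8 params were trained
print(W_prime[0])  # first row picks up the rank-1 delta: [1.0, 0.5, 0.0, 0.0]
```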
- Nous API is coming?: A member suggested that Nous provide their own API to use their models to generate income, especially given that current income is inconsistent, suggesting a possible pricing of $0.8/M tokens and estimating potential revenue of $800-1600/day.
- Others suggested that Nous could charge closer to $1/M input tokens, $3/M output tokens for forge and others noted that there are ongoing efforts to make this happen.
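The quoted revenue figures are easy to sanity-check: at the suggested flat $0.8/M-token rate, $800-1600/day corresponds to roughly 1-2 billion tokens served per day (the token volumes here are illustrative assumptions, not figures from the discussion):

```python
# Sanity check of the suggested Nous API pricing: daily revenue at a
# flat $0.80 per million tokens (token volumes are illustrative).
PRICE_PER_M_TOKENS = 0.80  # USD, the rate floated in the discussion

def daily_revenue(tokens_per_day: int) -> float:
    return tokens_per_day / 1_000_000 * PRICE_PER_M_TOKENS

print(daily_revenue(1_000_000_000))  # 800.0
print(daily_revenue(2_000_000_000))  # 1600.0
```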
- LLMs struggle creating performant CUDA kernels: Members discussed generating CUDA kernels with LLMs, with the consensus that while LLMs can output CUDA syntax, no LLM is good at producing performant kernels on their own.
- The best strategy seems to involve augmenting the LLM with hardware and compute graph information, possibly using a knowledge graph or GNN, with a semi-manual approach involving extensive GPU profiling.
- Hermes 3 Writes Erotic Fiction: A user praised Hermes 3's unexpected talent for writing erotic fiction, expressing excitement for a future NousChud model iteration.
- Another member mentioned they always have something in the works but they have a preference for models that can be run without needing a datacenter.
- Ollama criticized for Beginner-Centric Design: While Ollama is considered okay for beginners, it's criticized for being terrible for people who are past the beginner stage because it defaults to Q4 quantization even for 7B and 8B models.
- Alternatives like llama.cpp or koboldcpp were suggested for more advanced users, but it was acknowledged that configuring and maintaining an environment is a different skill and can be too much to throw at someone at once.
Links mentioned:
- Claude: Talk with Claude, an AI assistant from Anthropic
- ollama/docs/api.md at main · ollama/ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models. - ollama/ollama
- GitHub - PsycheFoundation/psyche: An open infrastructure to democratize and decentralize the development of superintelligence for humanity.: An open infrastructure to democratize and decentralize the development of superintelligence for humanity. - PsycheFoundation/psyche
- Intro to Psyche - Psyche: no description found
Nous Research AI ▷ #research-papers (8 messages🔥):
Logic-RL, Rule-Based Reinforcement Learning, General World Models, Worldsim
- Logic-RL Unleashes Reasoning with Rule-Based Reinforcement Learning: A new paper (Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning) explores the potential of rule-based reinforcement learning (RL) in large reasoning models, inspired by DeepSeek-R1.
- The 7B model, trained on just 5K logic problems, demonstrates generalization abilities to the challenging math benchmarks AIME and AMC.
- Runway Introduces General World Models: Runway introduced General World Models, envisioning AI systems that build internal representations of environments to simulate future events.
- They aim to represent and simulate a wide range of situations and interactions, moving beyond limited and controlled settings like video games or driving simulations.
- Community Discusses Generative AIâs Potential in Worldsim: Members discussed the potential of generative AI to simulate entire worlds as interactive experiences, hinting at the profound capabilities of emergent LLM world models.
- There was additional conversation that work to continue worldsim is likely to appear in blog form, although making it a paper is the final goal.
Links mentioned:
- Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning: Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as t...
- Runway Research | Introducing General World Models: We believe the next major advancement in AI will come from systems that understand the visual world and its dynamics, which is why we're starting a new long-term research effort around what we call ge...
Nous Research AI ▷ #research-papers (8 messages🔥):
Rule-Based Reinforcement Learning (RL), DeepSeek-R1, Logic-RL, Worldsim, General World Models (GWM) by RunwayML
- Logic-RL Unleashes LLM Reasoning with Rule-Based RL: A new paper, Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning, explores using rule-based RL in large reasoning models, drawing inspiration from DeepSeek-R1.
- The paper highlights key contributions such as a system prompt that emphasizes the thinking and answering process, a stringent format reward function that penalizes outputs for taking shortcuts, and a straightforward training recipe that achieves stable convergence.
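To make the "stringent format reward" concrete, here is a hedged sketch of such a rule-based check (the tag names and scores are assumptions for illustration, not the paper's exact implementation):

```python
import re

# Illustrative rule-based format reward in the spirit of Logic-RL:
# reward the output only if it contains exactly one well-formed
# <think>...</think> block followed by one <answer>...</answer> block,
# penalizing outputs that skip the thinking step or take shortcuts.
# (Tags and scores are assumed here, not taken from the paper.)
PATTERN = re.compile(r"^<think>.+?</think>\s*<answer>.+?</answer>$", re.DOTALL)

def format_reward(output: str) -> float:
    return 1.0 if PATTERN.match(output.strip()) else -1.0

print(format_reward("<think>2+2=4</think><answer>4</answer>"))  # 1.0
print(format_reward("The answer is 4"))                         # -1.0
```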
- RunwayML Introduces General World Models (GWM): RunwayML is starting a new long-term research effort around what they call general world models (GWM), aiming to build AI systems that understand the visual world and its dynamics to simulate future events.
- They believe the next major advancement in AI will come from systems that understand the visual world and its dynamics.
- Exploring LLMsâ Emergent World Models: Discussion revolves around the idea that LLMs construct world models as an emergent property from training on large datasets.
- The conversation seeks to understand the capabilities and limitations of these emergent world models, especially how a general world model might enhance experiences or open new creative frontiers.
- Worldsimâs Profound Potential in Generative AI: Worldsim hints at the potential of generative AI as a creative medium, with the ability to simulate entire worlds as interactive experiences.
- One member is preparing work to continue Worldsim, likely in blog form, expressing uncertainty about making it a paper.
Links mentioned:
- Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning: Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as t...
- Runway Research | Introducing General World Models: We believe the next major advancement in AI will come from systems that understand the visual world and its dynamics, which is why we're starting a new long-term research effort around what we call ge...
Nous Research AI ▷ #reasoning-tasks (1 message):
Qwen2.5-Math-1.5B, longcot examples, dataset structuring, setting up the GRPOTrainer
- Qwen2.5-Math-1.5B Model Struggles with longcot Format: A member is experimenting with Qwen2.5-Math-1.5B using longcot examples, but the model is not following the expected format.
- The member seeks assistance with structuring the dataset and setting up the GRPOTrainer, with a link to their Kaggle notebook.
- Dataset Structuring and GRPOTrainer Setup Issues: The user is facing challenges in getting the Qwen2.5-Math-1.5B model to adhere to the desired format when using longcot examples.
- They suspect the issue lies either in the datasetâs structure or in the configuration of the GRPOTrainer, and are requesting guidance.
Interconnects (Nathan Lambert) ▷ #news (28 messages🔥):
Unitree Open Source, Gemma 3 Release, GPT-4.5 tops Arena leaderboard, Post-Training Interpretation, Anthropic $3.5B Funding
- Unitree Robotics open sources repos: Unitree Robotics has open-sourced a bunch of their repos, with a link provided to their GitHub.
- Gemma 3 release: The release of Gemma 3 was announced for March 12.
- GPT-4.5 takes the Arena Lead: GPT-4.5 now tops the Arena leaderboard across all categories, including Multi-Turn, Hard Prompts, Coding, Math, Creative Writing, Instruction Following, and Longer Query (source).
- Eliciting the potential of Post-Training: Most improvements in models from OpenAI, Anthropic, and Google over the last 18 months are from the post-training phase, akin to F1 teams improving car performance through aerodynamics and systems changes (source).
- Anthropic Raises Billions: Anthropic raised $3.5 billion at a $61.5 billion post-money valuation, led by Lightspeed Venture Partners (source).
Links mentioned:
- Tweet from Ivan Leo (In NY 19 - 28 Feb!) (@ivanleomk): Fingers crossed haha! Paris is pretty exciting to be in, so many events happening!
- Tweet from Teortaxes (@teortaxesTex): As of 12 PM Pacific Standard Time, OpenAI has silently changed their exaggerated claim of improvement on hallucination and accuracy benchmark PersonQA (new version on the left), reporting the correct ...
- Tweet from Anthropic (@AnthropicAI): Anthropic has raised $3.5 billion at a $61.5 billion post-money valuation, led by Lightspeed Venture Partners.This will advance our development of AI systems, deepen our understanding of how they work...
- Tweet from lmarena.ai (formerly lmsys.org) (@lmarena_ai): GPT-4.5 topped all categories across the board, with a clear leadership in Multi-Turn: Multi-Turn, Hard Prompts, Coding, Math, Creative Writing, Instruction Following, Longer Query
- Tweet from lmarena.ai (formerly lmsys.org) (@lmarena_ai): BREAKING News: @OpenAI's GPT-4.5 now tops the Arena leaderboard! With over 3k votes, GPT-4.5 landed #1 across ALL categories, and singularly #1 under Style Control / Multi-Turn. Huge congratulati...
- Tweet from Nathan Lambert (@natolambert): If you look at most of the models we've received from OpenAI, Anthropic, and Google in the last 18 months you'll hear a lot of "Most of the improvements were in the post-training phase....
- Tweet from lmarena.ai (formerly lmsys.org) (@lmarena_ai): More exciting news today: @xai's latest Grok-3 tops the Arena leaderboard! This is the newest, production model, grok-3-preview-02-24. With over 3k votes, this model is tied for #1 overall, and ...
- Unitree Robotics: High performance civilian robot manufacturer. Please everyone be sure to use the robot in a Friendly and Safe manner. - Unitree Robotics
- David Marx (@digthatdata.bsky.social): Something I've been wondering is how much pre-training investment is optimal as a post-training input. Under the "elicitation" theory, you'd anticipate full pre-training to be optima...
Interconnects (Nathan Lambert) ▷ #ml-drama (8 messages🔥):
BobbyBroccoli videos, Deep Learning History, Shun-Ichi Amari
- X says Twitter App is Free: A member linked to a tweet stating this app is free X Tweet.
- The member questioned whether other fields are like this, answering, "The answer's gotta be 'No,' right?"
- Juergen's Deep Learning History: A member suggested reading Juergen Schmidhuber's Deep Learning History.
- The member also highlighted Shun-Ichi Amari as someone they hadn't heard of before but found interesting.
Link mentioned: Tweet from loss (derogatory) (@untitled01ipynb): gm this app is free
Interconnects (Nathan Lambert) ▷ #random (34 messages🔥):
Grok3 Pricing, LLM Summarization Ethics, Anon-Kode GitHub, Taiwan Security
- Grok-onomics: Pricing Leaks Online!: Likely leaked Grok3 pricing indicates costs of $3.50/million for input, $0.875/million for cached input, and $10.50/million for output, as per this tweet.
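Under these leaked rates, per-request cost is just a weighted sum over token classes; a small sketch (the token counts below are made up for illustration):

```python
# Illustrative request cost under the leaked Grok-3 pricing
# ($ per million tokens): input 3.50, cached input 0.875, output 10.50.
RATES = {"input": 3.50, "cached_input": 0.875, "output": 10.50}

def request_cost(tokens: dict) -> float:
    """Sum cost over token classes, converting counts to millions."""
    return sum(tokens[k] / 1e6 * RATES[k] for k in tokens)

# e.g. 8k fresh prompt tokens, 32k cached prefix, 1k completion tokens
cost = request_cost({"input": 8_000, "cached_input": 32_000, "output": 1_000})
print(round(cost, 4))  # 0.0665
```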
- LLMs Canât Keep a Surprise?!: Debate sparked regarding whether an LLM summarizing chat logs for a user named John should include plans for his surprise birthday party, raising questions about implicit contextual understanding and ethical boundaries.
- A member stated that it depends on if it's John summarizing or another user asking for a summary to give to John.
- Anon-Kode: Claude Telemetry-ectomy: Anon-Kode is a GitHub project that removes telemetry from Claude-Code and replaces the Anthropic endpoint with a customizable OpenAI endpoint.
- Some users expressed concern about the implications of removing Anthropic's license.
- Taiwanese Tensions Trending?: Concerns were raised regarding US commitment to Taiwan security, prompted by POTUS comments at a press conference which coincide with TSMC's $100 billion investment in the US (timestamp).
- The same press conference also lauded David Sacks's intellect.
Links mentioned:
- Tweet from Xeophon (@TheXeophon): LLMs can't keep a secret, can they
- Tweet from fishy business (@swishfever): likely leaked grok3 pricing: input $3.50/million; cached input $0.875/million; output $10.50/million
- BREAKING: President Trump Announces $100 Billion TSMC Investment In The US, Takes Reporter Questions: President Donald Trump held a press briefing on Monday where he announced a $100 billion investment from Taiwanese Chip company in the United States. Fuel yo...
- GitHub - dnakov/anon-kode: Contribute to dnakov/anon-kode development by creating an account on GitHub.
Interconnects (Nathan Lambert) ▷ #memes (2 messages):
Anthropic Funding, AI development, International Expansion
- Anthropic Achieves Colossal Capital Infusion: Anthropic has raised $3.5 billion at a $61.5 billion post-money valuation, led by Lightspeed Venture Partners.
- This funding aims to advance their AI systems' development, deepen understanding of their functionality, and propel international growth.
- Huwupy Kawaii Social Post: A user shared a link to a post on bsky.app.
- No context was provided.
Links mentioned:
- Tweet from sam mcallister (@sammcallister): Quoting Anthropic (@AnthropicAI): Anthropic has raised $3.5 billion at a $61.5 billion post-money valuation, led by Lightspeed Venture Partners. This will advance our development of AI systems, deepen o...
- hoopy frood (@huwupy.kawaii.social): Love how I can see someone say "if you watch Claude plays Pokémon you can see that it's over" and not know if this is a pro or anti AI take
Interconnects (Nathan Lambert) ▷ #rl (1 message):
Med-RLVR
- Med-RLVR: Emerging Medical Reasoning from a 3B base: A new paper called Med-RLVR: Emerging Medical Reasoning from a 3B base has been published, centered on a 3B-parameter model.
- Medical Reasoning: The paper focuses on medical reasoning abilities, exploring how a relatively small model can perform in complex medical scenarios.
Interconnects (Nathan Lambert) ▷ #reads (11 messages🔥):
Post-Training Methodologies for LLMs, In-House Data Labeling for SOTA Models, Human Data vs Synthetic Data, Disentangling Post-training Performance from Data
- LLMs get new Survey on Post-Training Methodologies: A new survey paper (https://arxiv.org/abs/2502.21321) explores post-training methodologies for Large Language Models (LLMs), analyzing their role in refining LLMs beyond pretraining, including fine-tuning, reinforcement learning, and test-time scaling.
- Yeager Chooses In-House Data Labeling: Enhanced Radar's Yeager, a SOTA model that understands air traffic control audio, chose to label data in-house due to the industry-specific technical complexity, resulting in a high degree of standardization and near-perfect accuracy.
- Compensation was tied to the number of characters transcribed and financial penalties were assessed.
- Human Data's still needed: A blog post (https://www.amplifypartners.com/blog-posts/annotation-for-ai-doesnt-scale) argues that real, human data is still needed to build AI products that are genuinely useful, disagreeing with the belief that synthetic data will be sufficient to drive step-change improvements in model performance.
- Delving into Post-training performance: A Notion page (https://mohit-raghavendra.notion.site/Disentangling-Post-training-performance-from-data-1a5db7f2a34480e18010d689a1f46f74) discusses disentangling post-training performance from data.
Links mentioned:
- LLM Post-Training: A Deep Dive into Reasoning Large Language Models: Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Pretraining on vast web-scale data has laid the foundation for these m...
- When Scale AI doesn't cut it: doing data labelling in-house: We in-housed our data labelling and it's the best thing we did
- Annotation for AI doesn't Scale: As models get more and more capable, the systems we've built to label and annotate data aren't cutting it: what's going to replace them?
Interconnects (Nathan Lambert) ▷ #policy (3 messages):
TSMC $100B investment in U.S. chip factories
- TSMC Mulls Massive US Chip Expansion: The CEO of TSMC is reportedly heading to the White House to discuss a potential $100B investment in U.S. chip factories, according to this tweet and The Information.
- One member joked that "if we do 1T I won't be so pessimistic" about the deal.
- TSMC Investment Optimism: A member expressed skepticism about the initial $100B investment, considering it insufficient.
- The member indicated a more positive outlook if the investment reached $1T.
Link mentioned: Tweet from Anissa Gardizy (@anissagardizy8): new: The CEO of TSMC is heading to the White House today to talk about a $100B investment in U.S. chip factories https://www.theinformation.com/briefings/trump-tsmc-to-announce-100-billion-chip-factor…
Yannick Kilcher ▷ #general (44 messages🔥):
Bachelor's degree project ideas in VLMs, Automating IRL jobs, Finding interesting problems to solve, Literature review article in AI, invite link to discord server
- Brainstorming VLM Project Ideas: A member sought ideas for a final-year bachelor's project involving VLMs, open to suggestions on other hot topics suitable for a year-long project with potential for publication.
- Claude Completes GitHub PR: A member reported using Claude and Cursor to complete 95% of the work on this GitHub Pull Request.
- Exploring Novel Architectures for Deep Learning: A member proposed comparing X-Splines with foundational architectures like Transformers, RNNs, CNNs, GNNs, Mamba, and KANs.
- Overcoming Contextual Limitations in Knowledge Graphs: Discussion of methods to improve knowledge graphs/hypergraphs by assigning separate nodes for words based on their context/concept to differentiate between concepts and instances.
- Automating IRL jobs: A member mentioned they are working on automating their in-real-life job, without telling anyone.
Links mentioned:
- AI timelines: What do experts in artificial intelligence expect for the future?: Many believe there is a real chance that human-level AI will be developed within the next decades, and some believe that it will exist much sooner.
- The AI Timeline: Follow The Latest Cutting Edge AI Research
- Join the Yannic Kilcher Discord Server!: Check out the Yannic Kilcher community on Discord - hang out with 20116 other members and enjoy free voice and text chat.
- feat: Add granular configuration options to object-property-newline rule by AlbertMarashi · Pull Request #707 · eslint-stylistic/eslint-stylistic: This PR enhances the object-property-newline rule by adding support for granular configuration options, allowing developers to specify different behaviors for different node types.Closes #234Chan...
Yannick Kilcher ▷ #paper-discussion (7 messages):
Joscha Bach, Presentation Time Slot
- Bach Talk Brainstorming Before Presentation: A member initially considered presenting on Joscha Bach, but it's unclear if this was the final topic.
- Another member offered to present in the `<t:1741046400:F>` timeslot if there were no other presentations scheduled.
- Presentation Still In Progress!: A member lamented missing the presentation, but another member clarified that it was still ongoing.
- A participant thanked the presenter, and the presenter offered further advice to those interested.
Yannick Kilcher ▷ #ml-news (3 messages):
Elsagate 3.0
- Elsagate 3.0: A Horrifying Discovery: A member shared a YouTube video titled "Elsagate 3.0 Is Worse Than we Thought" with a warning that it is NOT FOR CHILDREN.
- Another member responded, stating, "Well, that is horrifying."
Link mentioned: Elsagate 3.0 Is Worse Than we Thought.: THIS VIDEO IS NOT FOR CHILDREN. VIEWER DISCRETION IS ADVISED. Get a FREE sample pack from Five, just pay shipping (must be 21+): https://bit.ly/FreeFiveRaymund…
Notebook LM ▷ #use-cases (16 messages🔥):
Financial statement analysis in NotebookLM, Podcast length, Notebook Combination, Blog Outline, Podcast customization
- Financial Statement Analysis in NotebookLM: A member inquired about loading financial statements for analysis into NotebookLM.
- Podcast length concerns: A member expressed that most podcasts are 20 to 30 mins and don't cover some of the important topics, referencing a Supreme Court Application.
- Another member said, "You can get that long?", with a link to US_Dept._of_State_v._AIDS_Vaccine_Advocacy_Coalition.
- Notebook Combination is Scaled: A member asked if it was possible to combine Notebooks in NotebookLM.
- A moderator confirmed this feature is not available yet, but has been escalated to the nblm team for consideration.
- Blog Outline: Yes or No?: A member inquired about using NotebookLM for blog outline writing.
- Another member simply replied with, "Yes".
- Podcast Guest Customization Commands: A member asked about using custom commands for Podcasts in NotebookLM.
- The other member used a command like "the hosts interview lawyers representing both sides of the case" and also noted issues with accuracy when summarizing YouTube podcast episodes about guests.
Notebook LM ▷ #general (33 messages🔥):
Dynamically updated sources, Google Docs integration, Podcast timelines, Copying and pasting index numbers, Bulk deleting sources
- Users Discuss Dynamic Source Linking and Google Docs: Members are curious if NotebookLM can dynamically update from sources like Google Docs, for use cases like tracking furniture dimensions, with some concerns about manual updates.
- The answer is no, it's not automatic, leading to discussions about workarounds and feature requests.
- Podcast Timelines & Creation Requests Aired: A member requested that timelines be added to the podcast free version.
- Another user asked how to create a podcast, which was met with excitement by another member who found the feature very impressive, sharing an example notebooklm podcast.
- Citation Links Vanish After Saving Notes: Members noticed that citation links disappear when saving query results as notes.
- A member clarified that saved notes are "view only" and the citation links are only available in the chat and the specific response.
- NotebookLM Lacks Mobile App and Icon Customization: Users inquired about a mobile app for NotebookLM and the ability to change notebook icons.
- The responses confirmed that there is no mobile app, nor is there a way to change the icon of a notebook.
- Notebook Sharing Snafu Resolved!: A user reported a server error when sharing notebooks with Gmail personal accounts, specifically "You are not allowed to access this notebook".
- The issue was resolved by the user: the recipient had a new phone that was not correctly configured with their Gmail account.
Stability.ai (Stable Diffusion) ▷ #general-chat (40 messages🔥):
IP Adapter, Reactor Faceswap, ControlNet, Reforge AMDGPU support, Zluda
- IP-Adapter Face Copy Alternatives: A member was seeking the best IP-Adapter for face copy but found reference only in ControlNet to be sufficient.
- Another member suggested Reactor Faceswap as a preferable alternative, praising the all-mighty ControlNet.
- Reforge's AMDGPU Support Still Unclear: One member inquired about Reforge supporting AMDGPU, noting its mention on the Stability Matrix but absence from the GitHub page.
- Another member attempted using Zluda but encountered PC freezes at launch, advising against relying on the Stability Matrix due to perceived bugs, and recommending sticking to a UI outside matrix.
- DirectML and Reforge Compatibility Struggles: A member tested Reforge with DirectML after Zluda failed, but it did not work.
- There was a discussion of a possible fork of Reforge for AMD by Lshqtiger.
- CivitAI as a Generation Request Platform: A member asked for image generation requests and discussed the CivitAI website, noting it provides a few starting credits and 25 free credits daily that can be saved.
- The usage cost depends on the model.
- Requirements to generate images locally: A member asked how to create images and another member noted that a GPU with around 6-8GB VRAM and resources in <#1002602742667280404> are recommended.
- Another member offered links to CivitAI for online generation.
Eleuther ▷ #general (13 messages🔥):
Finding good problems to solve, EleutherAI affiliation projects, RWKV models, 4D gaussian splatting
- Members seek guidance on finding good problems to tackle: A member asked for suggestions on finding good problems to solve and the methodology for deciding on ideas and problems to work on.
- Another member suggested reading through topics of interest, searching relevant literature, and identifying a reasonable jumping-off point by asking "what if" questions.
- EleutherAI projects: involvement and focus: A member clarified that most people in the server do not work on projects under an EleutherAI affiliation.
- He pointed out that there's significant activity in Interpretability channels, along with interest in RWKV models and evaluations on the NLP side, in addition to the GPT-NeoX team pushing out a new training library.
- Diving deep into research with a 4D Gaussian Splatting Example: A member described their research process using the example of improving 4D gaussian splatting (3D + time).
- They suggested starting with established work, reproducing it, then taking one experimental step towards your idea to deeply understand the problem domain and inform your next deep dive.
Eleuther ▷ #research (15 messages🔥):
Reasoning Model, GRPO based Agent, LLAMA 3.2 3B, Recurrent LLM reasoning, Atom of Thoughts (AoT)
- ReasonableLLAMA-Jr-3b Model Needs Your Feedback!: A member is seeking feedback on their ReasonableLLAMA-Jr-3b model, a reasoning model trained/finetuned with GRPO on quantized LLAMA 3.2 3B, using a custom-written GRPO-based agent in a Gym environment with MLX.
- The model is based on the concepts described in the Atom of Thoughts (AoT) paper, where each state transition in the reasoning process is a self-contained, atomic question.
- Recurrent LLM Reasoning: Too Expensive?: Members discussed that recent recurrent LLM reasoning papers require significantly more compute (equivalent to a 32B parameter model) to achieve performance comparable to a 7B parameter model.
- It begs the question: why not just train a 32B parameter model and use early exit, mixture of depths, or speculative decoding for cheaper inference?
- Truncated Backpropagation: Still a Memory Hog?: Although recurrent models use truncated backpropagation, the truncated depth (e.g., 8) may still correspond to the activations of a significant-sized model (e.g., 15B).
- A member wondered if a DEQ type training would have worked, and questioned whether the r and k parameters were optimized.
Link mentioned: Atom of Thoughts for Markov LLM Test-Time Scaling: Large Language Models (LLMs) achieve superior performance through training-time scaling, and test-time scaling further enhances their capabilities by conducting effective reasoning during inference. H…
Eleuther ▷ #lm-thunderdome (10 messages🔥):
trust_remote_code in lm-evaluation-harness, dataset_kwargs override, dataset loading errors, data_dir specification
- Trust Remote Code Conditional Setting: A user inquired about `trust_remote_code` always being set in `lm-evaluation-harness`'s dataset loading, referencing a specific line in the GitHub repo.
- A member clarified that `trust_remote_code` is only set if the `--trust_remote_code` argument is passed, pointing to the relevant section of the code.
- Dataset Kwargs Pathway Revealed: A user asked if setting `trust_remote_code` would override `dataset_kwargs` when loading a local dataset.
- A member explained that `dataset_kwargs` are passed to `datasets.load_dataset(...)` within the harness, linking to the relevant part of the code.
- Dataset Generation Error Arises: A user reported encountering a dataset generation error when running `lm_eval` with a specific task configuration.
- The user's task configuration specifies `dataset_path: json` and a `data_dir` containing `train.jsonl`, `validation.jsonl`, and `test.jsonl`.
- Manual Dataset Loading Suggested for Debugging: In response to the reported error, a member suggested testing whether the dataset loads correctly by calling `load_dataset` manually.
- The member also suggested trying an absolute path for the data directory, to rule out path-related issues.
Links mentioned:
- lm-evaluation-harness/lm_eval/__main__.py at main · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
- lm-evaluation-harness/lm_eval/api/task.py at 14b0bd26956609b2ee50987299dfa34223fa23b8 · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
- lm-evaluation-harness/lm_eval/__main__.py at 14b0bd26956609b2ee50987299dfa34223fa23b8 · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
MCP (Glama) ▷ #general (36 messages🔥):
Terraform Registry MCP issues, MCP Multi-Agent Systems, fast-agent GitHub repo, Claude desktop FastMCP errors, MCP server claiming problems
- MCP Terraform Troubles Surface: A member reported issues getting terraform-registry-mcp and aws-mcp server to work, seeking advice beyond the inspector.
- They clarified the problem occurs with Claude desktop and Cline, specifically when a system-level proxy is enabled, causing errors with mcp-server-fetch.
- Multi-Agent MCP Musings Materialize: A member discussed implementing MCP for multi-agent systems, referencing an Anthropic workshop at AI Engineering Summit and shared an image from the workshop.
- They mentioned building a framework for agents to cooperate across devices and considering adopting MCP, noting examples like BabyAGI and Stanford generative agents.
- Fast Agent Framework Finds Fans: A member shared a link to their project, fast-agent on GitHub, describing it as a way to Define, Prompt and Test MCP enabled Agents and Workflows.
- They confirmed that each agent can be configured with a separate set of servers, clarifying that each agent can be called as a tool by another agent and configured with a set of MCP Servers, each of which exposes tools.
- Claude's Node Version Vexes Users: A member reported encountering a `Cannot find package 'timers'` error when using fastmcp in Claude desktop.
- The solution was to remove the old Node v14 version, which Claude was using.
- Twitter API Pricing Puts a Damper on Tweet Dreams: A member explored using MCP to connect to a Twitter account for pulling and generating tweets but acknowledged the challenge of Twitter's API costs.
- A user suggested browser automation as a more cost-effective alternative for pet projects and pointed to a recent browser automation example.
Links mentioned:
- GitHub - evalstate/fast-agent: Define, Prompt and Test MCP enabled Agents and Workflows: Define, Prompt and Test MCP enabled Agents and Workflows - evalstate/fast-agent
- Open-Source MCP servers: Enterprise-grade security, privacy, with features like agents, MCP, prompt templates, and more.
MCP (Glama) ▷ #showcase (2 messages):
MCPHub.nvim, Graphlit MCP Server, Neovim Plugin, Model Context Protocol
- MCPHub.nvim makes MCP server management smooth: A new MCPHub.nvim plugin was released which helps manage MCP servers in Neovim offering features like smart server lifecycle management and integration with CodeCompanion.nvim for AI chat.
- The plugin can be installed with one command (`:MCPHub`) and configured with a simple setup function.
- Graphlit MCP Server released: The Graphlit MCP Server has been launched, offering new content ingestion and retrieval capabilities for MCP clients like Claude Desktop, Goose, Cline, Cursor, and Windsurf.
- The server is open-source and requires a free Graphlit account and project to store the knowledge base.
Links mentioned:
- Graphlit MCP Server: Integrate with MCP clients such as Goose, Cline and Claude Desktop - Graphlit: Graphlit is a batteries-included, serverless RAG-as-a-Service platform for developers building AI-powered applications and agents with unstructured data. Provides ETL for LLMs with web scraping, Googl...
- GitHub - ravitemer/mcphub.nvim: A powerful Neovim plugin for managing MCP (Model Context Protocol) servers: A powerful Neovim plugin for managing MCP (Model Context Protocol) servers - ravitemer/mcphub.nvim
DSPy ▷ #general (30 messages🔥):
Ash Framework, instructor_ex, Async Support in DSPy, LangProBe Benchmark, Minions Feature Benchmarks
- Ash Framework EcoSystem Explored: A member suggested the Ash framework for a project, linking to the ash-project/ash_ai GitHub repository, but clarified itâs a sub-project within the larger Ash framework ecosystem.
- They also shared a link to instructor_ex, highlighting structured outputs for LLMs in Elixir, along with the Ash Discord community where a live streamer can provide guidance.
- DSPy's Async Support Initiative: A member inquired about the motivation for full async support in DSPy, questioning potential performance boosts and production concerns, and shared another Discord invite link.
- A core contributor announced that a lead would be making async support as native as needed, asking the community to specify expectations and workflows, and requesting feature requests to be submitted as GitHub issues due to potential Discord oversight during work hours.
- LangProBe Benchmark Shows Program Composition Effects: A member shared a new paper, LangProBe: a Language Programs Benchmark, evaluating the impact of DSPy program composition and optimizers on various tasks, as well as understanding cost/quality tradeoffs.
- According to its X/Twitter post, the paper finds that smaller models in optimized programs can outperform larger models at a lower cost.
- Minions Benchmarks Prepare for Cost Optimization: A member noted that the just-dropped LangProBe paper provides a good baseline for running benchmarks on their implemented minions feature, referencing their closed pull request.
- The member was adding MinionsLM and StructuredMinionsLM for intelligent LM routing by jmanhype. They noted the paper is directly related to cost optimization.
Links mentioned:
- Ash Framework Blog: A declarative foundation for ambitious Elixir applications. Model your domain, derive the rest.
- GitHub - ash-project/ash_ai: Allow your users to chat with your app: Allow your users to chat with your app. Contribute to ash-project/ash_ai development by creating an account on GitHub.
- GitHub - thmsmlr/instructor_ex: Structured outputs for LLMs in Elixir: Structured outputs for LLMs in Elixir. Contribute to thmsmlr/instructor_ex development by creating an account on GitHub.
- Tweet from Lakshya A Agrawal (@LakshyAAAgrawal): 🧵 Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs! We find that, on avg across diverse tasks, smaller models within...
- Add MinionsLM and StructuredMinionsLM for intelligent LM routing by jmanhype · Pull Request #7891 · stanfordnlp/dspy: MinionsLM and StructuredMinionsLM ImplementationThis PR introduces two new classes to the DSPy framework:MinionsLM: An intelligent LM router that implements the MinionS protocol for cost-efficie...
LlamaIndex ▷ #blog (2 messages):
Workflow-based travel planner, LlamaParse updates, AnthropicAI Claude Sonnet 3.7, Google Gemini 2.0 Flash
- Travel Planner: RS Rohan Takes You Places!: RS Rohan demonstrates how to build an advanced, agentic travel planner in @llama_index, which extracts travel information from user queries and delegates tasks to specialized agents (tutorial and repo).
- LlamaParse Gets an Upgrade: The âParse With Agentâ mode now supports AnthropicAI Claude Sonnet 3.7 and Google Gemini 2.0 Flash, improving table parsing and cross-page consistency (announcement).
LlamaIndex ▷ #general (19 messages🔥):
AgentWorkflow context vs chat history, MCP Support, PII redaction with LLMs, Anthropic DeltaStream
- AgentWorkflow: Context vs. Chat History Clarified: A member inquired about the difference between Context and Chat History in AgentWorkflow, and when to use each.
- Another member clarified that the chat history is inside the context.
- MCP Support in LlamaIndex Verified: A member inquired about MCP support in LlamaIndex, and another member confirmed its existence.
- They provided a link to an example notebook demonstrating its usage.
- Seeking PII Redaction Tools for LLMs: A member is looking for paid or open-source tools to assist with sending PDFs and images containing Personally Identifiable Information (PII) to a Large Language Model (LLM).
- Anthropic's DeltaStream causes ValueError: A member reported that Anthropic now has a different DeltaStream that LlamaIndex doesn't support; specifically, with thinking enabled, it streams a delta of type `ThinkingDelta`, which is not an instance of `TextDelta`, causing the library to raise a `ValueError`.
- A maintainer of the library acknowledged the issue and stated they still need to add better support for it.
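Until that support lands, one caller-side workaround is to tolerate unknown delta types rather than raising. The sketch below is only a sketch: the two dataclasses stand in for the real SDK delta types, and `safe_text_stream` is a hypothetical helper, not a LlamaIndex or Anthropic API.

```python
from dataclasses import dataclass

@dataclass
class TextDelta:
    """Stand-in for the SDK's user-visible text delta."""
    text: str

@dataclass
class ThinkingDelta:
    """Stand-in for the delta emitted when extended thinking is enabled."""
    thinking: str

def safe_text_stream(deltas):
    """Yield only user-visible text, silently skipping any delta that is
    not a TextDelta (e.g. ThinkingDelta) instead of raising ValueError."""
    for delta in deltas:
        if isinstance(delta, TextDelta):
            yield delta.text
        # Unknown delta types are ignored; log them here if you want visibility.

stream = [ThinkingDelta("weighing options..."), TextDelta("Hello"), TextDelta(" world")]
print("".join(safe_text_stream(stream)))  # Hello world
```

The same isinstance-and-skip pattern applies wherever a streaming consumer assumes every chunk carries text.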
Link mentioned: llama_index/llama-index-integrations/tools/llama-index-tools-mcp/examples/mcp.ipynb at main · run-llama/llama_index: LlamaIndex is the leading framework for building LLM-powered agents over your data. - run-llama/llama_index
LlamaIndex ▷ #ai-discussion (1 messages):
Windsurf Checkpoints
- Windsurf lacks Checkpoint Feature: A member inquired about the absence of a checkpoint feature in Windsurf, noting the inability to revert to previous states despite repeated coding attempts and file/workspace manipulations.
- The member attached an image illustrating their attempts to drag and drop files into the tab menu, seeking a way to access previous checkpoints.
Latent Space ▷ #ai-general-chat (13 messages🔥):
AI replacing programmers, Senior Engineers vs Junior Engineers, Anthropic Fundraising, Stagehand and Browserbase, Claude Code vs Cursor
- Reports of AI Replacing Programmers are greatly exaggerated: An O'Reilly article argues that AI tools will change programming, but that this is nothing new: programming has always evolved, ever since the first programmers connected physical circuits.
- A member commented that "They've just find/replaced their old articles for AI", which is exactly like decrying people for copy-pasting from StackOverflow, and further stated that LLMs make learning new things much faster.
- Senior Engineers Wield AI Better: Senior engineers apply engineering wisdom to shape and constrain AI's output, preventing it from creating unmaintainable "house of cards code" when using tools like Cursor or Copilot.
- The AI accelerates implementation, but their expertise is what keeps the code maintainable, a skill junior engineers often lack.
- Anthropic Achieves $61.5 Billion Valuation: Anthropic has raised $3.5 billion at a $61.5 billion post-money valuation, led by Lightspeed Venture Partners, to advance AI systems, deepen understanding of how they work, and fuel international expansion.
- Members hail Stagehand, seek Python Alternative: After listening to the Latent Space podcast's episode featuring Browserbase, a member sought a self-healing browser workflow tool like Stagehand in Python.
- A member pointed to stagehand-py and stated "it's wip".
- Claude Code vs Cursor, the cage match: Members discussed Claude Code and how it fares against Cursor, citing that Cursor is preferable due to its superior rollback process.
- One member indicated that Claude Code has a harder time staying focused, tends to add extra lines of code, is more expensive, and that "code edits in cursor are way way faster".
Links mentioned:
- Tweet from Eugene Yan (@eugeneyan): For simple features and apps, have you found Claude Code more effective, or are you sticking with Cursor/Windsurf? https://docs.anthropic.com/en/docs/agents-and-tools/claude-code/overview What's...
- Tweet from Anthropic (@AnthropicAI): Anthropic has raised $3.5 billion at a $61.5 billion post-money valuation, led by Lightspeed Venture Partners.This will advance our development of AI systems, deepen our understanding of how they work...
- The End of Programming as We Know It: no description found
- stagehand-py: Python SDK for Stagehand
tinygrad (George Hotz) ▷ #general (10 messages🔥):
tinygrad formalist project, Ops.CAT speed bounty, RDNA2/RX6000 usable with tinygrad, Intel Arc A770 usable with tinygrad
- Tinygrad: Formalist Project Aiming for Fair Compute Marketplace: George Hotz (@tinygrad) describes tinygrad as a formalist project that attempts to capture the full gamut of Software 2.0 in a non-leaky abstraction, aiming for a fair marketplace for compute similar to Linux and LLVM.
- He mentions that by the end of the year, tinygrad should be similar in speed on NVIDIA to the existing torch CUDA backend, but without CUDA, and plans to have a test cloud up where users can rent FLOPS in a lambda function.
- Ops.CAT Speed Bounty Still in Progress: A member is working on the Ops.CAT speed bounty, but is still facing issues getting it to rewrite in LLVM even after getting added to the schedule.
- The current Ops.CAT operations involve a complex structure of PAD, RESHAPE, and BUFFER operations, with arg being the two tensors to concat.
- RDNA2/RX6000 Usability Questioned: A member inquired whether RDNA2/RX6000/GFX1030 is usable with tinygrad, reporting an `OSError: [Errno 22] Invalid argument` when running with `AMD=1`.
- Another member responded that it should work on Linux and requested the trace for the OS error, which was subsequently provided in a trace.txt file.
- Intel Arc A770: Yes with OpenCL: In response to a question, a member confirmed that the Intel Arc A770 is usable with tinygrad.
- The recommended approach is to use the OpenCL backend by setting `GPU=1`.
Link mentioned: Tweet from the tiny corp (@tinygrad): What is tinygrad? tinygrad is a formalist project. It attempts to capture the full gamut of software 2.0 in a non leaky abstraction. The methods on Tensor class create a directed graph of immutable RIS…
LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 messages):
Charles Sutton, Coding Agents, AI for Vulnerability Detection
- Charles Sutton Speaks on Coding Agents: Amazing guest speaker Charles Sutton presented on Coding Agents and AI for Vulnerability Detection, at Lecture 5.
- The lecture explores using LLM agents for computer security tasks, like finding software vulnerabilities, and discusses design issues in LLM agents.
- DeepMind Researcher's Software Engineering Accolades: Charles Sutton, a Research Scientist at Google DeepMind, conducts machine learning research motivated by applications in code generation, software engineering, programming languages, and computer security.
- Suttonâs work in software engineering has received two ACM Distinguished Paper Awards (FSE 2014, ICSE 2020) and a 10-year Most Influential Paper award (MSR 2023).
Links mentioned:
- CS 194/294-280 (Advanced LLM Agents) - Lecture 5, Charles Sutton: Questions: bli.do/csut-code5
- Academic ranks in the US and UK: The US and the UK both have a series of ranks for academics, but the names of the job titles are somewhat different.
LLM Agents (Berkeley MOOC) ▷ #mooc-questions (4 messages):
Discord Admin Spam Account Removal, Quiz Posting Schedule
- Discord Admin Asks for Account Removal: An admin asked for the removal of an account that was spamming links, suggesting re-adding it after the security breach is resolved.
- Quiz Posting Day Revealed: A user asked when the quiz is posted each week, to which another user responded that they generally try to release it Wed/Thurs.
LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (4 messages):
Audio issues during lectures
- Audio issues plague lecture: A member reported not being able to hear questions during the lecture due to audio problems, requesting assistance from someone in the room.
- Following up, the member noted that the audio was cutting out completely, inquiring if there was an issue with the AV setup in the room.
- Speakers to repeat questions after issue: A staff member apologized for the audio issues during the lecture.
- They promised to remind the speakers to repeat all questions going forward.
Cohere ▷ #api-discussions (9 messages🔥):
Embed Images, 504 Errors
- Image Embedding Issue Resolved: A user reported an issue with embedding images, but later confirmed that the issue is resolved.
- Another member acknowledged the resolution.
- 504 Errors Investigated: A member mentioned that they didn't observe a spike in errors but noted super slow requests often resulting in 504 errors.
- The member is planning to investigate further and thanked the user for the information.
Modular (Mojo 🔥) ▷ #general (5 messages):
Renaming `owned` to `own`, Community meeting, AWS GenAI Loft event
- `owned` Becomes `own` Thanks to Pull Request: A member created a pull request to rename `owned` to `own` for consistency with the rest argument convention.
- Community Meeting Approaching - Speakers Needed: The next community meeting is coming up in one week, and the organizers are looking for speakers to give talks or share projects during the meeting.
- If youâre interested, please contact the organizers to express your interest and secure a spot on the agenda.
- MAX Engine Event at AWS GenAI Loft: If you're in the Bay Area, consider attending an event tomorrow evening at the AWS GenAI Loft, entitled Beyond CUDA: Accelerating GenAI Workloads with Modular's MAX Engine, Hosted by AWS.
Link mentioned: modular/max: The MAX Platform (includes Mojo). Contribute to modular/max development by creating an account on GitHub.
Modular (Mojo 🔥) ▷ #mojo (1 messages):
SIMD DType, Construction Checks, Globals vs Parameters
- SIMD DType Dissected: A user questioned the need to wrap a DType in `SIMD` for C bindings, as it obscures the original dtype, but another member clarified that `SIMD[DType.uint8, 1](0).type` returns the dtype at compile time.
- They exemplified with `var a = UInt8(0); alias dtype = __typeof(a).type` to further clarify the use case.
- SIMD Construction Under Scrutiny: A member highlighted that `SIMD` includes construction checks within its implementation.
- This ensures the validity and type safety of `SIMD` objects upon creation.
- Parameter Injection Praised: When asked about using globals, one of the members stated that injecting parameters is always preferable to using globals.
- The member claimed this is true, if you have the time.
Link mentioned: max/mojo/stdlib/src/builtin/simd.mojo at main · modular/max: The MAX Platform (includes Mojo). Contribute to modular/max development by creating an account on GitHub.
Torchtune ▷ #general (1 messages):
Step-based checkpointing
- Step-based Checkpointing Reduces Compute Waste: A member expressed interest in step-based checkpointing to reduce compute waste if a failure occurs during training.
- Another member responded that step-based checkpointing is already being implemented to address this concern.
- Ongoing Implementation Addresses Concerns: The ongoing implementation of step-based checkpointing directly addresses the initial worry about wasted compute.
- This feature aims to minimize the impact of failures during training runs by saving progress at regular intervals.
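The idea can be illustrated framework-agnostically (this is not Torchtune's implementation; the loop, names, and pickle format below are made up for the sketch): checkpointing every N steps bounds the work lost to a crash at N steps.

```python
import os
import pickle
import tempfile

def train(num_steps, ckpt_dir, ckpt_every=100, state=None):
    """Toy training loop that checkpoints every `ckpt_every` steps.
    `state` holds whatever must survive a restart (step counter, weights,
    optimizer state); here it is just a running sum for illustration."""
    state = state or {"step": 0, "loss_sum": 0.0}
    while state["step"] < num_steps:
        state["step"] += 1
        state["loss_sum"] += 1.0 / state["step"]  # pretend "loss"
        if state["step"] % ckpt_every == 0:
            path = os.path.join(ckpt_dir, f"ckpt_{state['step']}.pkl")
            with open(path, "wb") as f:
                pickle.dump(state, f)  # atomic-rename in real systems
    return state

with tempfile.TemporaryDirectory() as d:
    train(250, d, ckpt_every=100)
    print(sorted(os.listdir(d)))  # ['ckpt_100.pkl', 'ckpt_200.pkl']
```

On restart, the newest `ckpt_*.pkl` would be unpickled and passed back in as `state`, so only the steps since the last checkpoint are recomputed.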
Torchtune ▷ #dev (3 messages):
Profiler traces, Tensorboard, PyTorch memory visualizer tool, Perfetto
- Torch Users Tracing with Tensorboard and More: Users discussed strategies for visualizing profiler traces, mentioning initial attempts with Tensorboard and its apparent removal of certain plugin features for PyTorch.
- They recommended the PyTorch memory visualizer tool and Perfetto for memory and timing traces, respectively, as sufficient for following the trail.
- Alternative Profiling Tools: The discussion highlighted the PyTorch memory visualizer tool and Perfetto as alternatives for memory and timing traces.
- These tools were suggested after a user reported issues with Tensorboard, which seemed to have removed some plugin features for PyTorch.
Nomic.ai (GPT4All) ▷ #general (3 messages):
Ollama vs GPT4All, Catalan Language support for GPT4All, GPT4All v3.10.0 Vulnerability
- Ollama vs GPT4All, which is the best llama?: A user inquired why individuals opt for Ollama or Llama.cpp over GPT4All, asserting GPT4All's superiority due to its out-of-the-box functionality.
- Catalan Language Support for the win: A user requested the addition of Catalan as a language option for the GPT4All interface, citing the presence of Catalan speakers within the community.
- GPT4All v3.10.0 has security vulnerability: A user reported the discovery of a potential vulnerability in GPT4All v3.10.0 and sought guidance on the appropriate reporting procedure.
{% else %}
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!
If you enjoyed AInews, please share with a friend! Thanks in advance!
{% endif %}