AI News for 4/7/2025-4/8/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (229 channels, and 7279 messages) for you. Estimated reading time saved (at 200wpm): 692 minutes. You can now tag @smol_ai for AINews discussions!
After the DeepSeek R1 launch (our coverage here), a raft of "R1 but more open" clone attempts emerged, of which it seems only HuggingFace's OpenR1 is still posting active updates, if you discount the distillation work. However, today Together and the Agentica Project (previously of the DeepScaleR work) have come out with a 14B code-focused reasoning model that scores at o3-mini level:
Usually these projects are easy to game and therefore unremarkable, but this project distinguishes itself by being fully open source - dataset, code, recipe and all - meaning the educational value is high, particularly given the prior work of its collaborators.
Specifically for RL training, they note the sampler bottleneck:
so they have very good thoughts on pipelining:
and they also propose an update to DeepSeek's GRPO:
{% if medium == "web" %}
Table of Contents
[TOC]
{% else %}
The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!
{% endif %}
AI Twitter Recap
Model Releases and Updates
- Gemini 2.5 Pro, including its "Flash" experimental versions, is now available for subscribers, according to @Google and @_philschmid. It can be accessed within the Deep Research feature of the Gemini app, as noted by @GoogleDeepMind. @lepikhin mentioned the team worked hard to serve all the traffic.
- Moonshot AI has released Kimi-VL-A3B, a multimodal LM with 128K context under the MIT license, outperforming GPT4o on vision + math benchmarks according to @reach_vb, with models available on Hugging Face and integrated with Transformers. @_akhaliq also noted the release.
- Together AI and Agentica have collaborated to release DeepCoder-14B, an open-source coding model that rivals OpenAI's o3-mini and o1 on coding tasks, costing approximately $26,880 to train, according to @Yuchenj_UW. The model, training code, dataset, and a detailed blog are available, noted by @togethercompute. It achieves a 60.6% score on LiveCodeBench and 1936 on CodeForces, performing on par with o3-mini (low) and o1 on competition-level coding tasks, as per @togethercompute. It was trained using an open-source RL framework from ByteDance, as noted by @Yuchenj_UW.
- Meta AI has released Llama 4 Scout and Maverick, with a larger version called Behemoth in training, as mentioned by @EpochAIResearch. Maverick mixes MoE layers & dense, while Scout uses L2 Norm on QK, according to @danielhanchen.
- Runway has released Gen-4 Turbo, which offers 10x better results than Gen-3 at the same price point, according to @c_valenzuelab.
- Google has announced Imagen 3, their highest quality text-to-image model, now in Vertex AI, which allows for easier removal of unwanted objects, as per @GoogleDeepMind.
- Google has announced Veo 2 which allows users to refine and enhance existing footage and direct shot composition in Vertex AI, according to @GoogleDeepMind.
Evaluations and Benchmarks
- OpenAI has released a new Evals API for programmatically defining tests, automating evaluation runs, and iterating on prompts, integrating them into any workflow, as stated by @OpenAIDevs. @OpenAIDevs notes that good evals help improve the quality of model responses systematically.
- Epoch AI Research evaluated Llama 4, finding that Maverick and Scout scored 67% and 52% on GPQA Diamond, respectively, similar to Meta's reported scores, according to @EpochAIResearch.
- ZeroBench tests reveal that current vision-language models fail, with GPT-4V and Gemini scoring 0% pass@1 and 0% 5/5 reliability on 100 hard visual reasoning questions, according to @LiorOnAI.
Agentic Systems and Tooling
- Auth0's Auth for GenAI now has native LlamaIndex support, making it easier to build authentication into agent workflows, as announced by @llama_index.
- MongoDB has released a repository with 100+ step-by-step notebooks on AI Agents and RAG, covering chatbot construction to Airbnb agents, according to @LiorOnAI.
Industry Analysis
- Swyx believes the twittersphere is well-calibrated on individual developer tooling but not on how AI is improving every aspect of the SDLC, which may be more impactful, making Sourcegraph well-positioned as an AI dev tooling company, according to @swyx.
- Nearcyan believes that consumers will not be prompting their own full apps because most good apps require data and there is no real data portability for consumers, according to @nearcyan.
- Svpino argues it is essential to learn how to apply AI in one's craft, as Shopify understands, and those who know what's up are asking people to learn and study, according to @svpino.
Humor/Memes
- Vikhyatk joked about lunch in downtown Seattle costing 16-20 H100-hours, with caloric consumption dropping by 10x since converting $ to H100-hours, according to @vikhyatk.
- Scaling01 joked that Gemini 3.0 will be too cheap to meter, according to @scaling01.
- Andrew Carr noted Gemini's run on Pokemon, citing Gemini "I can't believe it took six tries, and now the game is asking if I want to humiliate myself further by giving this thing a nickname. No way. I don't want to name this symbol of my failure. I'll press B to decline", according to @andrew_n_carr.
AI Reddit Recap
Our pipelines had an outage yesterday. Sorry!
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.5 Pro Exp
Theme 1: Model Mania: Gemini Reigns, Llama 4 Stumbles, New Contenders Emerge
- Gemini 2.5 Pro Crowned King, But Lacks Reasoning Transparency: Across multiple Discords (LMArena, OpenRouter, Perplexity AI, Nous Research AI, aider), Gemini 2.5 Pro earns high praise for general capabilities, creative writing, and even generating functional code from complex prompts, often cited as superior to competitors like GPT-4.5 and Claude 3.5 Sonnet. However, users note its reasoning tokens aren't exposed via the Perplexity API, hindering its use as a reasoning model there, and it can still hallucinate even with deep research capabilities unless specifically grounded in AI Studio.
- Llama 4 Launch Leaves Users Lamenting: The release of Llama 4 (Scout, Maverick) met widespread disappointment (LM Studio, Manus.im, Yannick Kilcher, Nomic.ai), with users calling it terrible, overhyped, and potentially a step backward despite some decent Japanese language performance. Concerns center on sloppy post-training, questionable benchmark validity possibly due to overfitting or gaming, and higher VRAM requirements than expected for its performance level, leading many to wait for an overhaul or stick with alternatives like Qwen's 14B.
- Cogito & Nvidia Models Challenge the Status Quo: New models are making waves, including DeepCogito's v1 Preview models (3B-70B), trained via Iterated Distillation and Amplification (IDA), claiming to outperform Llama, DeepSeek, and Qwen equivalents and even Llama 4 109B MoE, offering both direct answering and self-reflection modes (DeepCogito Research). Nvidia also quietly released a SOTA-level reasoning model, Llama-3.1-Nemotron-Ultra-253B-v1, featuring a toggle to turn reasoning capabilities on or off (Nvidia Blog Post).
Theme 2: Training & Fine-Tuning Frontiers
- Unsloth Fine-Tuning Fixes and FP4 Finds: Unsloth AI tackled DDP training issues on 3+ GPUs, recommending specific CUDA device visibility settings, while advocating for bitsandbytes (bnb) over GGUF for QLoRA training due to data efficiency. Users explored fine-tuning quantized models using FP4 via tools like Unsloth for faster training, clarifying that while direct fine-tuning of quantized models isn't feasible, LoRA offers a viable path.
- Distributed Training Debates: DeepSpeed vs. FSDP & Untrusted Compute: In Torchtune, the merits of integrating DeepSpeed were debated, with maintainers favoring native PyTorch FSDP for better composability, though offering support for community DeepSpeed recipes. Meanwhile, the Panthalia platform (X.com Waitlist), inspired by the Nous DeMo paper, aims to verify untrusted, low-cost compute for Distributed Data Parallel (DDP) training using gradient compression (Algorithm Docs).
- Novel Techniques and Research Directions Discussed: Researchers discussed the Hierarchical Perceiver patent by Google DeepMind, potentially related to long context in Gemini, and debated QKNorm advancements (Paper 1, Paper 2). Other discussions included the MIPRO algorithm for automated prompt engineering scaling across complex tasks (TensorZero Blog), and OLMo powering DAPO research for better RLHF answers (DAPO Paper, OLMo Paper).
Theme 3: Tools & Platforms: Updates, Bugs, and Battles
- Platform Updates: New UIs, Rate Limits, and Rebrands: LMArena launched its Alpha UI for testing, while OpenRouter debuted a slick new frontend but tightened free model rate limits to 50 RPD (unless users have $10+ credits), sparking user frustration. Codeium officially rebranded to Windsurf (Rebrand Announcement) following the success of its editor, launching a new SubReddit.
- Tool Troubles: Bugs Plague Cursor, Aider, and APIs: Cursor users reported issues with the C/C++ extension requiring rollbacks (Forum Thread), the auto-select feature choosing poor models, and potential bans for bypassing the trial. Aider users faced /architect mode edits being cut off and sought ways to disable auto-committing (Aider Config Docs), while Perplexity API users noted discrepancies compared to the web UI and issues with Sonar prompts focusing on the system prompt (Prompt Guide).
- Framework Frustrations and Fixes: Mojo, MAX, Granite: Mojo developers discussed its borrowing paradigm (Mojo vs Rust Blog), `__moveinit__` vs `__copyinit__` (Example Code), and managing `Span` lifetimes. Users compared MLX and MAX, noting MAX's current inability to target Apple Silicon GPUs, while Unsloth AI users found a quick fix for a GraniteModel bug in Colab involving editing `config.json`.
Theme 4: The AI Ecosystem: Research, Rumors, and Real-World Use
- Research Ripples: Patents, Audits, and Unlearning: Google DeepMind's attempt to patent the Hierarchical Perceiver (Patent Link, Paper Link) sparked discussion about defensive patenting and long-context Gemini. Researchers sought AI professionals for an ethics-based auditing survey (Survey Link), and ICML announced a machine unlearning workshop (Workshop Website).
- Industry Insights & Intrigue: Google's Payroll, Tariffs, and Cybercrime: A TechCrunch article alleged Google pays some departing AI staff for a year to prevent them joining competitors, raising questions about legality and impact. Concerns surfaced that potential tariffs on NVDA GPUs could slow AI progress, while others noted AI adoption by cybercriminals seems slower than expected, though a future "shock" remains possible.
- Applications & Integrations: MCP, Math, Auth, and Agents: The Model Context Protocol (MCP) saw use cases discussed, including integrating with Neo4j graph databases for RAG using clients like mcpomni-connect, and Semgrep rewrote its MCP server using SSE (Cursor Demo). AI4Math discussions highlighted using LLMs with formal systems like Lean for theorem proving (Kaiyu Yang Lecture), while Auth0's Auth for GenAI integrated native LlamaIndex support (Tweet). Mozilla AI released `any-agent` to simplify agent framework evaluation (GitHub Repo).
Theme 5: GPU & Hardware Hustle
- Hardware Headaches: ROCm Woes and METAL Sync Glitches: Users continued to struggle getting ROCm via WSL working on AMD 7800XT GPUs due to lack of official support (AMD Docs) and WSL passthrough issues. In tinygrad, a user debugging a METAL sync issue bounty found that sharding problems in LLaMA might stem from COPY operations executing before XFER commands finished, causing incorrect data reads.
- Performance Puzzles & Optimizations: Tinygrad users reported significant speedups on AMD hardware using BEAM=2, surpassing Torch performance. In GPU MODE, discussions centered on Triton's `tl.make_block_ptr` with `boundary_check` for handling out-of-bounds memory safely (at a slight performance cost) and TorchTitan's unique pre-compile strategy potentially avoiding `torch.compile` bugs (TorchTitan Code), though numerical issues with `torch.compile` and FSDP persist.
- New Releases & Resources for GPU Gurus: Nvidia's PhysX CUDA physics simulation kernels are now open source, inviting community ports (like ROCm). TorchAO v0.10.0 was released (Release Notes), adding MXFP8 training support for Nvidia B200 and a module swap quantization API. For learning, the geohotarchive YouTube channel and the Programming Massively Parallel Processors (PMPP) book (4th ed) were recommended.
PART 1: High level Discord summaries
LMArena Discord
- Gemini 2.5 Pro Declared A.I. Supreme: Members are calling Gemini 2.5 Pro the first true A.I., highlighting its superiority in creative writing and consistency over previous models.
- While Gemini 2.5 Pro excels in general tasks, it has been noted that the unreleased Nightwhisper model is superior in coding capabilities.
- OpenAI's Deep Research Gets Skeptical Eye: Doubts emerge regarding OpenAI's Deep Research project, despite claims of it being the best agent for web searching, with some stating 2.5 with tools is just on another level.
- The prevailing sentiment suggests that Deep Research is merely a rebranded version of OpenAI's existing o3 model.
- DeepCoder-14B Debuts with Muted Applause: Together AI and Agentica launched DeepCoder-14B-Preview, a code reasoning model, finetuned from Deepseek-R1-Distilled-Qwen-14B via distributed RL.
- However, this release was met with criticism, with one user deriding the marketing as the dumbest most shameful marketing ever, saying the gains aren't impressive considering this is just o3-mini.
- NightWhisper's Coding Skills Tease: Enthusiasm builds around the potential release of NightWhisper, celebrated for its coding capabilities demonstrated on the arena, despite its short webdev and lmarena availability.
- There's speculation that NightWhisper might align with the upcoming Google Ultra model.
- Alpha UI Opens for Crowd Testing: The Alpha UI is now available for testing here without a password.
- Users are prompted to provide feedback and bug reports through the provided Google Forms and Airtable links, as frequent updates are expected for both Desktop & Mobile.
Unsloth AI (Daniel Han) Discord
- Unsloth Patches DDP Training: Users reported issues with HF Trainer and DDP not working with 3 or more GPUs; it was recommended to set CUDA visible devices to a specific GPU, though Unsloth does support DDP.
- After testing, it threw a ValueError, so members recommended ensuring CUDA visible devices are set to a specific GPU.
- bnb Is the Way for LoRA Training: It was advised to use bnb (bitsandbytes) for QLoRA training instead of GGUF, as it saves downloading 4x the data, and you can save and merge the adapter with the bnb model for later export to GGUF (a sketch of this flow follows this list).
- Users were considering between training a LoRA on bnb 4-bit or GGUF for a tiny model, and the consensus leaned towards the former.
- Llama 4 Models Earn a Sloppy Reputation: Members testing Llama 4 (Scout and Maverick) found the models to perform well in Japanese and to be capable base models despite sloppy post-training.
- The general sentiment is to await a forthcoming post-training overhaul.
- DeepCogito v1 Claims Lead in LLM Performance: DeepCogito claims their v1 Preview models outperform the best available open models of the same size, including counterparts from LLaMA, DeepSeek, and Qwen.
- These models offer the ability to answer directly (standard LLM), or self-reflect before answering (like reasoning models).
- GraniteModel Bug Affects Colab: Users encountered a bug in the Colab notebook using GraniteModel, and suggested a quick fix that involves editing `granite_based/config.json` to replace GraniteModel with GraniteForCausalLM and rerunning the cell.
- The recommended method for editing the file on Colab is to download, edit locally, and then upload the modified version back to Colab.
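For readers who want to try the bnb-4bit QLoRA flow described above, here is a minimal sketch based on Unsloth's documented `FastLanguageModel` API; the model name, LoRA hyperparameters, and output paths are placeholders rather than the exact commands from the discussion:

```python
import os
# Pin training to a single GPU, per the DDP advice above; must be set
# before torch/unsloth are imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from unsloth import FastLanguageModel

# Load the bnb 4-bit checkpoint directly; no GGUF download needed for training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # placeholder model
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained,
# which is why fine-tuning "a quantized model" works at all.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# ... run your trainer of choice (e.g. TRL's SFTTrainer) here ...

# Merge the adapter into the base weights, then export to GGUF for inference.
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")
```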
OpenRouter (Alex Atallah) Discord
- Free Model Limits Squeezed on OpenRouter: OpenRouter reduced the daily request limit for free models to 50, triggering negative reactions as users expressed frustration over the lowered limit, with some feeling that it's like a paywall.
- Accounts with at least $10 in credits will have their requests per day (RPD) boosted to 1000, while those with less than $10 will see a decrease from 200 RPD to 50 RPD.
- Quasar Credit-Dependent Rate Limit Coming: The update notes that Quasar will soon have a rate limit dependent on credits; there is no hourly rate limit, but requests are capped at 20 per minute.
- Members opened a feedback thread for users to post their thoughts on the changes.
- OpenRouter Debuts Slick New Frontend: OpenRouter has a new frontend that looks sick with big ups to clinemay!
- One user joked that it looked like gpt-3.5 made this website in about 4 minutes.
- Gemini Crowned King of the Models: Gemini 2.5 Pro is on a whole other level compared to the other models, making it the most powerful model to date.
- One user noted it was rated as 1. gemini 2.5 pro … 10. everyone else.
- Nvidia Stealthily Unleashes Reasoning Model: Nvidia silently dropped a SOTA-level reasoning model.
- The new model is casually showing it's better than Behemoth.
Cursor Community Discord
- Daniel Mac Graphs Code with GraphDB: A member shared Daniel Macâs tweet about using a graph database for code querying.
- This sparked a discussion on the potential benefits of using graph databases for code analysis and understanding complex relationships within codebases.
- Manus.im Devours Credits: A user reported that Manus.im failed to answer a question correctly and consumed 984 of their 1000 free credits on a single prompt.
- Alternatives like Smithery.ai and Awesome MCP Servers were suggested as potential solutions.
- C/C++ Extension Error Strikes: A user reported encountering an error related to the C/C++ extension after using Cursor since March 2023, noting that the extension may be limited to Microsoft products.
- A workaround involving rolling back to a previous version was suggested, with users sharing other forum threads discussing the issue.
- Auto-Select Model Labeled as Scam: Users are reporting that the auto-select model option is choosing low quality models, with one user claiming it fucked up my codebase.
- Another user suggested that this behavior might be intentional, raising concerns about the reliability of the auto-select feature.
- Cursor's Ban-Hammer Swings at Free Tier Bypassers: A member reported that bypassing the trial version of Cursor could lead to a complete ban from using the tool, with a warning that you won't be able to use it at all soon.
- This sparked a debate about the fairness of Cursor's trial version restrictions and the consequences of attempting to circumvent them.
LM Studio Discord
- Llama 4 Disappoints Users: Users expressed disappointment with Llama 4's performance, some describing it as a step backwards, questioning benchmark validity.
- While Llama 4 provides speed/cost similar to 17B models with similar results to 24-27B, it requires more VRAM, making it pointless for simple users, while Qwen's 14B models are praised.
- ROCm on WSL Still Doesn't Work on 7800XT: A user reported that ROCm via WSL doesn't work with a 7800XT due to lack of official support (AMD documentation).
- Another user suggested it might work since both cards are RDNA3, while the first user confirmed that it was impossible to get working due to WSL passthrough issues.
- Fix Cogito Jinja Errors Quickly: Users reported errors with Jinja templates when using cogito-v1-preview-llama-3b, and were advised to use ChatGPT to quickly fix the template.
- The community model maintainer was notified about the wonky template and is expected to update the model soon.
- Docker Gets Bashed: After one member expressed wanting to be "best friends" with anyone who says bad things about Docker, another member jokingly asked "Did Docker take out your family or something?"
- The first member humorously replied, "My therapist said I shouldn't talk about it."
- Debating an Affordable Supercomputer Build: One user proposed building a 16-node supercomputer with either RTX 4090 D GPUs or a less powerful option, aiming for a 2T model with 1M context.
- Skeptics questioned the feasibility, highlighting the need for RDMA, fast interconnects, and skilled engineers.
Perplexity AI Discord
- Startups Snag Savings via Perplexity: Perplexity AI introduces a startup program offering $5000 in Perplexity API credits and 6 months of Perplexity Enterprise Pro for eligible startups.
- Eligibility requires less than $20M in funding, being less than 5 years old, and association with a Startup Partner.
- Gemini 2.5 Reasoning Ruckus Reported: Members noted that Gemini 2.5 Pro doesn't expose its reasoning tokens via the API, and therefore can't be included as a reasoning model on Perplexity, though it is a high latency thinking model.
- Consequently, the reasoning isn't displayed via the API, unlike in AI Studio.
- Deep Research High Hype but Hindered: Users await the rollout of Deep Research High, which aims to use 150-200 sources on average, yet one user reports Perplexity's deep research got 23 sources while the free Gemini deep research got over 500.
- Some members are frustrated by the lack of communication on the release timeline and the current version's summary output, instead of a truly deep research; check out the DeepSeek Subreddit.
- Llama 4 faces Benchmark Faking Flak: Concerns were raised regarding a Perplexity AI search result questioning if Llama 4 is faking benchmarks.
- This is part of a broader discussion regarding model benchmarking transparency and the methodologies used to evaluate Llama 4.
- Perplexity API: Prompting Problems Persist: A user reported that Sonar responses focus on the system prompt, rather than user queries, while a team member clarified that the system prompt isn't used during the search phase, advising the user to optimize the user prompt instead using the Prompt Guide (a sketch follows this list).
- Also, some members discussed discrepancies between the Perplexity API and the web UI when summarizing web pages, with the API sandbox even giving way better results than the actual API when using sonar-reasoning-pro.
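Given that clarification, a rough sketch of moving search-relevant instructions into the user turn follows; it assumes Perplexity's OpenAI-compatible chat completions endpoint and the sonar model id, so verify details against the Prompt Guide:

```python
import os
import requests

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"},
    json={
        "model": "sonar",  # assumed model id; sonar-pro / sonar-reasoning-pro also exist
        "messages": [
            # System prompt: style and tone only; it is not used for retrieval.
            {"role": "system", "content": "Answer concisely with citations."},
            # User prompt: put everything that should steer the search here.
            {"role": "user", "content": "Summarize this week's Llama 4 benchmark controversy."},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```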
Manus.im Discord Discord
- Local Manus is on the Horizon: Members speculated that a local version of Manus will be possible in the future, similar to other AI models.
- This would allow users to run Manus on their own hardware, addressing concerns about credit usage and data privacy.
- MCP Servers Deployed on Claude: As of November 25, 2024, MCP servers are available on Claude and can be used with Claude code, as one member reported.
- This integration enables users to leverage MCP servers within the Claude environment for enhanced functionality.
- Llama 4 Hype Train Derails: After testing on Openrouter.AI, users report that Llama 4 is overhyped due to subpar responses.
- Criticism extends to Zuck, who is accused of gaming the benchmarks, leading to inflated performance expectations.
- Octopus Web Scraper Steals the Show: A member reported that the free website scraper Octopus works effectively on Zillow and Realtor, offering a cost-effective alternative to Bardeen, which is priced at $130/month.
- The high cost of Bardeen prompted suggestions to use Manus for building a custom scraper as a more economical solution.
- Manus Credit Crunch Angers Users: Users express dissatisfaction with the high cost of Manus credits, reporting that even simple tasks consume substantial credits, with one user exhausting 1000 free credits on a single Standard Complexity task.
- To mitigate credit consumption, users suggest breaking tasks into smaller dialogue windows and considering Proxy as a cheaper alternative, pending updates to Manus's pricing and credit plans.
aider (Paul Gauthier) Discord
- Gemini 2.5 vs Sonnet Prompting Power: Users found Gemini 2.5's logic strong but instruction following poor, contrasting it with Sonnet's feature-rich coding that needs more prompting.
- One user reported needing only 1 prompt with Gemini 2.5 compared to 3 prompts for Sonnet, even with Sonnet's advanced features like multiple file input methods and batch processing.
- Aider's Auto-Commit Causing Havoc?: A user seeks to disable Aider's auto-committing because it commits untested code, referencing the Aider configuration options.
- Another user suggested providing a model and key or Aider will guess based on available keys.
- OpenRouterâs Missing Sonar Pro Citations: A user questioned missing citation links when using Perplexity Sonar Pro via OpenRouter, providing a visual reference here.
- The discussion implies potential issues with citation link reliability when using certain models through OpenRouter.
- Software Engineer Gap Year a Career Killer?: An article argues that taking a gap year/holiday would be a poor decision for software engineers, citing insights about the current tech landscape, see this article.
- The author suggests the fast-evolving nature of tech makes extended breaks detrimental for maintaining relevance.
- Architect Mode Edits Getting Interrupted: Users report /architect mode edits in Aider being cut off when adding new files, leading to potential loss of the editor state.
- Avoiding the addition of new files during editing appears to allow the process to continue without interruption.
Notebook LM Discord
- AgentSpace unlocks NotebookLM for Enterprise: Google's AgentSpace documentation reveals that NotebookLM Enterprise can now be set up with Customer-Managed Encryption Keys (CMEK) for better data encryption control.
- A user inquired about commercial-scale NotebookLM, and another member pointed out this new offering.
- NotebookLMâs Privacy Assurances Confirmed: Both the Enterprise and Plus versions of NotebookLM ensure user data remains private and never enters the public domain, according to a member.
- This clarification addresses misunderstandings about Google's privacy policy and terms, noting built-in mechanisms to prevent prompt injection.
- User Correction Improves NotebookLMâs Summary: A user reported that NotebookLM initially misread a scholarly article, but corrected itself after a quotation and explanation were provided.
- Repeating the same prompt in different Google accounts from the beginning yielded correct results, raising questions about training and privacy.
- Discovery Mode Rollout Still in Progress: Users are still awaiting the new Discovery Mode feature in NotebookLM, with the rollout expected to take up to two weeks from the release date.
- A user humorously demanded special treatment as a Google fanboy to get early access.
- Gemini Still Hallucinates with Deep Research: Users report that Gemini hallucinates with deep research, even with internet access.
- A member clarified that Gemini can connect to Google Search, but it requires specific grounding instructions in AI Studio.
Interconnects (Nathan Lambert) Discord
- DeepSeek R2 Primed for LlamaCon Release: Members are urging DeepSeek to release R2 on the same day as LlamaCon to capitalize on the hype, noting that training data for MoE differs from base models, citing this paper.
- The release could challenge other models and draw significant attention during the event.
- Together AI Gets into the Training Game: Together AI is entering the model training business, as evidenced by this case study showcasing the Cogito-v1-preview-llama-70B model.
- This move marks a shift towards providing comprehensive AI solutions, including training infrastructure and services.
- Google Rumored to Pay AI Staff for Idleness: According to this TechCrunch article, Google is allegedly paying some AI staff to do nothing for a year rather than allowing them to join competitors.
- A member critiqued this as a basic management idea with horrifically bad second-order effects, with another noting it could create legal perils by restricting what they do or build while under contract.
- Tariffs Threaten NVDA GPU Availability: Members speculated that if tariffs remain, the AI field may slow down due to the increased cost of NVDA GPUs.
- This could impact development and research, as access to necessary hardware becomes financially constrained.
- OLMo Powers DAPO Research: Members discussed a DAPO paper as offering "Extreme value", referencing another paper built on OLMo.
- The researchers noted a novel compute method that results in better answers for RLHF tasks.
Eleuther Discord
- DeepMindâs Hierarchical Patent Pursuit: Google DeepMind is trying to patent the Hierarchical Perceiver, drawing comparisons between the patent diagrams and those in the original research paper.
- Speculation suggests this patent might be related to DeepMindâs work on ultra-long context lengths in Gemini, possibly as a defensive measure.
- Survey Seeks AI Auditing Experts: A researcher seeks participation from AI professionals for a survey on ethics-based auditing of generative AI systems.
- The survey aims to gather insights on auditing or evaluating AI systems, especially generative models.
- Debate Dawns Over Dubious Developments in QKNorm: Members argued that the QKNorm developments are not the right way to go, referencing this paper.
- A member suggested a better/earlier paper.
- ICML Invites Investigation Into Unlearning: A member shared that ICML will have a machine unlearning workshop.
- The workshopâs website can be found here.
- LM Harness Hand-holding Heeded: A member inquired about a LM harness implementation for HotpotQA to evaluate Llama and GPT models.
- Guidance was requested on running evaluations against HotpotQA.
Nous Research AI Discord
- Llama-4-Scout-17B Ready for llama.cpp: Llama-4-Scout-17B text-to-text support has been added to llama.cpp, and members are converting and quantizing the model.
- This pre-release has generated excitement among users, eager to test its capabilities.
- Gemini 2.5 Pro Generates functional Code Snippets: Gemini 2.5 Pro is praised for generating functional code snippets from complex prompts, see the prompts and responses in this message.
- A user reports using aider-chat combined with Gemini 2.5 Pro to edit or create 15 files from a 300k token context, including their frontend, API, and microservices.
- HiDream-I1 Generates High-Quality Images: HiDream-I1 is a new open-source image generative foundation model with 17B parameters using Llama 3.1 8B as a text encoder, released under the MIT license.
- It produces exceptional results across multiple styles including photorealistic, cartoon, artistic, and more, achieving state-of-the-art HPS v2.1 score, which aligns with human preferences.
- Cogito Models use Iterated Distillation: A new suite of Cogito models (3B-70B) outperforms models like Llama, DeepSeek, and Qwen, trained using Iterated Distillation and Amplification (IDA), which iteratively improves a model's capabilities.
- Notably, the 70B model allegedly surpasses the newly released Llama 4 109B MoE model, as outlined in this research.
- Panthalia Platform Aims to Verify Low-Cost Compute with DDP: Inspired by the Nous DeMo paper, a platform has been developed to verify untrusted, low-cost compute for training models over the internet using distributed data parallel (DDP), with a waitlist available via X.com.
GPU MODE Discord
- GPUMODE's dataset requires PyTorch 2.5: The GPUMODE "triton" dataset, used for Inductor Created Data, was created using PyTorch 2.5, and the creator promised to update the readme.
- Users may experience issues running the dataset on PyTorch 2.6+.
- Triton Gets Boundary Checks: A member suggested using `tl.make_block_ptr` with `boundary_check` and `padding_option="zero"` to create pointers that can fill with zeros for out-of-bounds memory accesses (a sketch appears at the end of this section).
- It was clarified that omitting `boundary_check` increases speed, but risks errors like "device-side assert triggered" due to potential buffer overruns.
- TorchTitan Compiles Before Ops: TorchTitan does a unique per-block compile before operations, potentially to circumvent some torch compile bugs; see torchtitan/parallelize_llama.py#L313.
- Numerical issues may still exist when using `torch.compile` and FSDP together.
- PhysX now Open Source: NVIDIA's CUDA physics simulation kernels are now open source, and some are already working on a ROCm version.
- The Triton-Distributed learning note details fusing Triton with NVSHMEM/ROC-SHMEM to enable multi-GPU execution.
- LiveDocs Documents Legit Logistics: The creator of LiveDocs invites users to document their code with the upgraded service, now with more features, available via signup at www.asvatthi.com.
- Included was an image of the interface, showing off various code documentation pages.
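As a concrete illustration of the boundary-check discussion above (see the Triton Gets Boundary Checks item), here is a minimal kernel sketch; the shapes and names are illustrative rather than taken from the original thread:

```python
import triton
import triton.language as tl

@triton.jit
def double_kernel(x_ptr, y_ptr, M, N, stride_m,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    pid = tl.program_id(0)
    x_block = tl.make_block_ptr(
        base=x_ptr, shape=(M, N), strides=(stride_m, 1),
        offsets=(pid * BLOCK_M, 0), block_shape=(BLOCK_M, BLOCK_N), order=(1, 0),
    )
    y_block = tl.make_block_ptr(
        base=y_ptr, shape=(M, N), strides=(stride_m, 1),
        offsets=(pid * BLOCK_M, 0), block_shape=(BLOCK_M, BLOCK_N), order=(1, 0),
    )
    # Checked load: the ragged final block is padded with zeros instead of
    # reading past the end of the buffer.
    x = tl.load(x_block, boundary_check=(0, 1), padding_option="zero")
    # Omitting boundary_check would be slightly faster, but risks
    # "device-side assert triggered" errors from buffer overruns.
    tl.store(y_block, x * 2.0, boundary_check=(0, 1))
```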
HuggingFace Discord
- FP4 Fine-Tuning Fuels Faster Finishes: Users are exploring fine-tuning quantized models using FP4 with tools like Unsloth, which allows loading lower precision models for training and quantization.
- While fine-tuning a quantized model is possible via LoRA, directly fine-tuning the quantized model itself is not.
- Parasail Provides Premier Performance: Parasail, a new inference provider, is looking to partner with Hugging Face after recently coming out of stealth, already serving 3B tokens a day on Open Router and 5B+ a day for private companies, as reported by The Next Platform.
- The Next Platform reported that Parasail brokers between AI compute demand and supply.
- Llama.cpp Leaps to Llama 4: The backend Llama.cpp has been updated to support Llama 4, according to the GitHub releases.
- This update allows for enhanced compatibility and performance with the latest Llama models.
- AI Runner Desktop GUI Takes Flight: A member released AI Runner, a desktop GUI for running AI models locally using HuggingFace libraries as described in this YouTube video.
- The tool enables users to create and manage chatbots with custom voices, personalities, and moods, and the bots are agents built with llama-index using ReAct tools to generate images with Stable Diffusion and real-time voice conversations (espeak, speecht5, or openvoice).
- any-agent Library Simplifies Agent Framework Evaluation: The Mozilla AI team released `any-agent`, a library designed to simplify trying different agent frameworks, with a GitHub repository available for users to try and contribute.
- The library supports frameworks like smolagents, OpenAI, Langchain, and Llama Index.
MCP (Glama) Discord
- Semgrep MCP Server Gets Docker Boost: A member reports running the Semgrep MCP server for over a month, hosted via Docker and AWS EC2.
- This setup is a practical demonstration of deploying MCP in a cloud environment, with potential for wider adoption given its ease of use.
- CORS Error Fixed in Semgrep MCP Server: A reported CORS error when connecting with the Cloudflare Playground was quickly resolved.
- The tool was being tested with Cursor, suggesting real-world application and integration needs.
- HTTP Request-Response Support in MCP for Enterprises: Discussion emerged regarding the need for HTTP request-response support in MCP for enterprise customers, highlighted in this pull request.
- The demand for this feature underscores MCPâs growing adoption among enterprise organizations.
- MCP Integrates with Graph DB for RAG: A member inquired about using MCP in a RAG use case with a Neo4j graph database, focusing on vector search and custom CQL search.
- Another member confirmed this is a good use case, linking to mcpomni-connect as a viable MCP client, showcasing MCP's versatility (a server-side sketch follows this list).
- Semgrep Rewrites MCP Server with SSE: A member rewrote Semgrep's MCP server and shared demo videos using SSE in Cursor and Claude.
- The server is using SSE because the Python SDK doesn't support HTTP streaming yet.
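To make the Neo4j-over-MCP idea above concrete, here is a hypothetical minimal server exposing a Cypher query tool, written with the official `mcp` Python SDK's `FastMCP` helper and the `neo4j` driver; the connection details and tool design are assumptions, not the setup discussed in the channel:

```python
from mcp.server.fastmcp import FastMCP
from neo4j import GraphDatabase

mcp = FastMCP("neo4j-rag")
# Placeholder credentials; point at your own graph database.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

@mcp.tool()
def run_cypher(query: str) -> list[dict]:
    """Run a read-only Cypher query and return records as dicts,
    so an MCP client (e.g. mcpomni-connect) can ground RAG answers in the graph."""
    with driver.session() as session:
        return [record.data() for record in session.run(query)]

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; SSE is also supported
```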
Latent Space Discord
- Shopify's AI Quest Gains Momentum: Shopify's AI mandate is gaining attention, as highlighted in this tweet.
- The company is pushing towards AI integration across its platform, with internal discussions focusing on practical applications and strategic implications.
- Anthropic API Credits Have Expiration Dates: Anthropic API credits expire after one year, potentially for accounting simplification and to account for the rapidly evolving AI landscape.
- Members suggest that this policy helps manage projections in a quickly changing field, providing a framework for resource allocation and future planning.
- NVIDIA Reasoning Model Features On/Off Toggle: NVIDIA has released a new model with the ability to turn reasoning on or off, detailed in this blog post and available on Hugging Face.
- This feature allows developers to experiment with different reasoning approaches and fine-tune their AI applications for specific tasks.
- Cybercrime's AI Adoption Slower Than Expected: Despite basic AI applications like FraudGPT, mass adoption of AI by cybercriminals is surprisingly slow, with speculation that a "cybercrime AI shock" may occur when they adopt it more broadly.
- One member noted that LLMs may have only recently become good enough for use in cybercrime, indicating that the technology is still maturing in this context.
- Gemini Streams Pokemon Gameplay: The Gemini AI is now playing Pokémon, garnering attention as shown in this tweet.
- This showcases the potential of AI in gaming and interactive entertainment, demonstrating its ability to engage in complex tasks within virtual environments.
Yannick Kilcher Discord
- Llama 4 Benchmarking Shortcomings Exposed: A member asserted that Llama 4 flops on non-gamed, non-overfitted benchmarks, sparking interest in the paper arxiv.org/abs/2408.04220 and a related YouTube talk.
- Concerns arose that Meta should have clarified that "Llama-4-Maverick-03-26-Experimental" was a customized model to optimize for human preference, according to this fxtwitter link.
- Decoding Bayesian Structural EM's Secrets: A member highlighted that Bayesian inference has been combining weights and architecture for around a century, citing Bayesian Structural EM as an example.
- DNA of a Model: Procedural Model Representation: A member introduced procedural model representation, where a small seed generates a large model (architecture + weights), envisioning downloading a 10MB model to generate a 100TB model.
- The member described downloading DNA to generate a human, by swapping seeds to generate different models.
- Cogito 14b Adopts Efficient Tool Template: The 14b model unexpectedly began utilizing a more efficient tool calling template than what was initially provided in the instructions, see the Cogito model.
- This suggests the model may have autonomously optimized its tool use, offering a potential area for further investigation.
- DeepCogito Improves Iteratively: A member shared a link from Hacker News about an iterative improvement strategy using test time compute for fine-tuning, from DeepCogito.
- Another member pointed to this paper and shared an Awesome talk about adapting pre-training text.
Nomic.ai (GPT4All) Discord
- Granite 8B Impresses with RAG-ability: Members reported that IBM Granite 8B is effective with RAG tasks, especially regarding providing references.
- Other members concurred, having also found Granite to be effective.
- Docling Does OCR Delicately: A member recommended docling for image OCR, especially for non-text PDFs like scans, for running embeddings.
- They highlighted its continuous operation for embeddings and integration into a database with indexed documents, enabling RAG through intersections.
- Semantic Chunking Chunks Context: A member shared a semantic chunking server, demonstrating its use with clipboard examples.
- They noted its compatibility with audio and image processing, suggesting ComfyUI for combining all modalities.
- Llama 4th Gen Bashed Badly: A member trashed the Llama 4th gen model for being terrible compared to smaller models.
- Others agreed, noting Reddit comments speculated that it may have overfit on smaller "high quality" datasets, despite some benchmarks showing promise.
- GPT4All: Run Locally!: A member advised using GPT4All primarily for local operations to ensure privacy and avoid sending private information to remote APIs.
- They detailed how to run embedding models locally and index files by chunking and embedding, referencing a shell script example (a Python sketch follows this list).
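As a rough Python companion to the shell-script approach mentioned above, here is a sketch of the local chunk-embed-search loop using GPT4All's `Embed4All`, so nothing leaves the machine; the chunk size and brute-force cosine search are illustrative choices, not the member's exact script:

```python
from gpt4all import Embed4All

embedder = Embed4All()  # downloads a small local embedding model on first use

def chunks(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking; real pipelines often chunk semantically."""
    return [text[i : i + size] for i in range(0, len(text), size)]

docs = chunks(open("notes.txt").read())          # placeholder file
index = [(c, embedder.embed(c)) for c in docs]   # embed every chunk locally

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

query = embedder.embed("What did I write about Docker?")
best_chunk, _ = max(index, key=lambda item: cosine(query, item[1]))
print(best_chunk)  # most relevant chunk, ready to feed a local chat model
```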
Modular (Mojo 🔥) Discord
- MAX struggles with Apple Silicon Deployment: A member compared MLX and MAX, noting MAX currently cannot target Apple Silicon GPUs, unlike MLX, which poses challenges for direct comparison and deployment.
- They suggested that while MLX is convenient for initial experiments, the practical limitations of deploying on Apple's ecosystem in server settings necessitate rewriting to frameworks like MAX, JAX, or PyTorch.
- Mojo Borrowing Paradigm Receives Praise: A newcomer shared a blog post comparing Mojo and Rust, observing that Mojo's borrow by default felt more intuitive, and wondered about how Mojo handles returning values from functions.
- Discussion ensued on how Mojo handles returning values from functions.
- Moveinit vs Copyinit deep dive: A member clarified that when returning objects in Mojo, the presence of `__moveinit__` dictates whether the object is moved, otherwise `__copyinit__` is used, and provided an example on Github.
- The member also pointed to the official Mojo documentation for a complete picture.
- Span Lifetimes got you down? Rebind!: A member inquired how to specify in Mojo that "the lifetime of the return value is at least the lifetime of self", specifically for a `Span`.
- Another member suggested using `rebind[Span[UInt8, __origin_of(self)]](Span(self.seq))` or making the trait generic over origin, but noted that trait parameters are not yet supported.
- Self-Promotion Rules Trigger Moderator!: A member flagged a post in the Discord channel as a violation of self-promotion rules.
- A moderator agreed, confirming the post indeed violated the communityâs self-promotion guidelines.
tinygrad (George Hotz) Discord
- Seeking Elegant Tensor Naming: A member is seeking a more elegant way to name tensors for easier tracking when printing model parameters, instead of manually adding a name attribute in the Tensor class.
- The member is seeking techniques to streamline tensor naming conventions for enhanced code readability.
- GPU Programming and Compiler Dev Resources: A member expressed interest in getting into GPU programming and compiler development for projects like tinygrad and requested learning resources or blog posts.
- The member plans to read tinygrad-notes and asked for book or blog post recommendations on GPU compiler development; another member recommended the geohotarchive YouTube channel for learning about tinygrad, and PMPP (4th ed) for GPU programming.
- METAL Sync Glitch Shards LLaMA: A member found unexpected behavior in sharding while reproducing a minimal example of a METAL sync issue from the bounty, suspecting that the COPY from METAL:1 to CPU was executing before the XFER from METAL to METAL:1 ended.
- The user suggests this caused the CPU to read zeros instead of the correct shard during LLaMA inference.
- AMD BEAM=2 Turbocharges Tinygrad: A user reported impressive speed improvements using AMD with BEAM=2, achieving 64 it/s, outperforming their previous best with Torch at 55+ it/s.
- Members noted that BEAM=2 often beats torch.
- LLaMA Sharding Loses Device Info: A user encountered an AssertionError while running llama.py with `--shard 4`, indicating that the device info was lost after sampling.
- A potential fix was proposed to move the tensor, as seen on GitHub.
LlamaIndex Discord
- Llama 4 Powers New RAG Workflow: A quickstart tutorial demonstrates building a RAG workflow from scratch using Llama 4, showcasing how to set up core steps around ingestion, retrieval, and generation using LlamaIndex workflows, as shown in this tweet.
- The tutorial focuses on core steps around ingestion, retrieval, and generation.
- Auth0 and LlamaIndex Join Forces on Auth for GenAI: Auth0âs Auth for GenAI now ships with native LlamaIndex support, making it easier to build auth into agent workflows, as announced in this tweet.
- This integration simplifies incorporating authentication into agent-based applications.
- Gemini 2.5 Pro Shuttered, Points to Unified SDK: Members discovered that Gemini 2.5 Pro is deprecated and that Google's latest unified SDK should be used instead, as noted in the LlamaIndex Documentation.
- It was brought up that the Google SDK doesn't validate model names but assumes the provided name is valid, so it may be important to double-check.
- StructuredPlannerAgent Gets the Axe: The documentation for `StructuredPlannerAgent` was removed because it is no longer maintained, due to a cleanup of the agent docs, with a backlink provided for historical reference: StructuredPlannerAgent.
- Instead of `StructuredPlannerAgent`, it was suggested to use an agent with a planning tool that does some Chain of Thought (CoT) reasoning, or to use the LLM itself to create a plan before invoking agent(s).
Cohere Discord
- Members Inquire on Event Recordings: A member inquired about the availability of event recordings for those unable to attend live, but no response was given.
- The member expressed interest, so posting event recordings in the future would benefit members who cannot attend live.
- Newbies Seek Structured Output Guidance: A new member requested examples of how to get structured output (e.g., a list of books) using Cohere, and were directed to the Cohere documentation.
- The user admitted to being inexperienced with Cohere, and more examples of structured output may be warranted in the official documentation.
- Pydantic Schemas Integrated via cURL: A member sought ways to use Pydantic schemas directly in `response_format` with Cohere and avoid using the Cohere Python package.
- They received a link to the Cohere Chat API reference and a cURL example for requests to `https://api.cohere.com/v2/chat`, mirroring the approach in the OpenAI SDK (a sketch follows this list).
- Cohere Side-Steps Vector DB Recommendations: Explicit recommendations for vector DBs have historically been avoided because Cohere's models are designed to function effectively with all vector DBs.
- This approach ensures broad compatibility and a neutral stance towards the vector database ecosystem, meaning no special optimizations are needed for any particular vector DB.
- Aditya Enters the Cohere Community: Aditya, with a background in machine vision and control, introduced themself while taking a sabbatical to explore web/AI with the openchain.earth project.
- Aditya is using VS Code, Github Co-Pilot, Flutter, MongoDB, JS, and Python (evaluating), looking to learn more about integrating Cohere's AI into their projects.
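As a companion to the Pydantic item above, here is a hypothetical Python equivalent of the cURL approach, posting a Pydantic-derived JSON schema to the v2 chat endpoint without the Cohere SDK; the exact `response_format` field names and model id are assumptions to check against the Chat API reference:

```python
import os
import requests
from pydantic import BaseModel

class Book(BaseModel):
    title: str
    author: str

payload = {
    "model": "command-r",  # assumed model id
    "messages": [{"role": "user", "content": "Recommend one classic novel as JSON."}],
    # Assumed shape: Cohere's docs describe a JSON-object response format
    # that accepts a JSON schema; verify the key names before relying on this.
    "response_format": {"type": "json_object", "schema": Book.model_json_schema()},
}
resp = requests.post(
    "https://api.cohere.com/v2/chat",
    headers={"Authorization": f"Bearer {os.environ['CO_API_KEY']}"},
    json=payload,
)
print(resp.json())
```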
Torchtune Discord
- Contributor Tag Sought After: A member requested a Contributor tag on Discord, sharing their GitHub username.
- The user lightheartedly mentioned their Discord profile picture featuring the character Gus from Psych.
- DeepSpeed Integration Debated for TorchTune: A member inquired about integrating DeepSpeed as a backend into TorchTune and created an issue to discuss the possibility.
- A maintainer asked for more context, noting that FSDP supports all the sharding options from DeepSpeed.
- TorchTune Favors FSDP Over DeepSpeed: TorchTune leans towards FSDP due to its better composition with other PyTorch distributed features, with the belief that supporting both versions well is not feasible.
- Users who migrated to TorchTune to avoid the complexities of composing DeepSpeed, PyTorch, and Megatron prefer sticking to native PyTorch.
- Recipe for DeepSpeed with TorchTune?: A maintainer suggested creating a community recipe that imports TorchTune and hosts a DeepSpeed recipe, offering to feature it if a repo is made.
- This allows users interested in DeepSpeed to leverage it with TorchTune while keeping the core framework focused on native PyTorch.
- Tweaking FSDPModule for zero1-2 Training: Since TorchTune defaults to the equivalent of zero3, documentation or more recipes showing how to tweak recipes via the FSDPModule methods for zero1-2 training would be appreciated.
- It's believed that zero 1-3 are all possible with very minor tweaks to the collectives.
DSPy Discord
- MIPRO Algorithm Scaled on Complex Tasks: An article tested the MIPRO automated prompt engineering algorithm across tasks of varied complexity, from named entity recognition to text-based game navigation (a usage sketch follows this section).
- The study leveraged tasks like CoNLL++, HoVer, BabyAI, and τ-bench (customer support with agentic tool use).
- Larger Models Leverage MIPRO More: The study found that larger models benefit more from MIPRO optimization in complex settings, potentially because they handle longer multi-turn demonstrations more effectively.
- The quality of feedback significantly impacts the MIPRO optimization process, with meaningful improvements seen even from noisy AI-generated feedback.
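For readers wanting to try the technique, a condensed sketch of MIPRO-style optimization in DSPy follows; the toy dataset, metric, and `auto="light"` budget are placeholders rather than the article's configuration:

```python
import dspy
from dspy.teleprompt import MIPROv2

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported model id

# Program to optimize: a simple chain-of-thought QA module.
program = dspy.ChainOfThought("question -> answer")

def exact_match(example, pred, trace=None):
    # Feedback quality matters: even noisy AI-generated metrics helped in the study.
    return example.answer.strip().lower() == pred.answer.strip().lower()

trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]

optimizer = MIPROv2(metric=exact_match, auto="light")
optimized = optimizer.compile(program, trainset=trainset)
```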
LLM Agents (Berkeley MOOC) Discord
- Kaiyu Yang Explores Formal Math Reasoning: Guest speaker Kaiyu Yang presented on "Language models for autoformalization and theorem proving" on a livestream, available at this link.
- The lecture covered using LLMs for formal mathematical reasoning, including theorem proving and autoformalization.
- AI4Math Becomes Crucial for AI Systems: AI for Mathematics (AI4Math) is crucial for AI-driven system design and verification, mirroring NLP techniques, especially training LLMs on curated math datasets.
- A complementary approach involves formal mathematical reasoning grounded in systems like Lean, which verify reasoning correctness and provide feedback.
- Dr. Yang Enhances AI in Math: Dr. Kaiyu Yang, a Research Scientist at Meta FAIR, focuses on enhancing AIâs mathematical reasoning by integrating formal systems like Lean.
- His work explores using LLMs for tasks like theorem proving (generating formal proofs) and autoformalization (translating informal to formal).
MLOps @Chipro Discord
- Manifold Research Deep Dive: The Manifold Research Group is hosting their Community Research Call #4 this Saturday (4/12 @ 9 AM PST), offering a look into their latest projects.
- Discussions will include Multimodal AI, self-assembling space robotics, and robotic metacognition, inviting collaboration in frontier science.
- Swarm Space Robotics Takes Flight: A PhD student at Manifold Research Group, who specializes in robotic swarms in space, extended an invitation to the research call.
- The research call seeks to encourage collaboration and probe frontier science in the field of space robotics.
Codeium (Windsurf) Discord
- Codeium Rebrands to Windsurf After Editor Success: Codeium rebranded to Windsurf after the successful launch of the Windsurf Editor in November 2024, explained in their rebrand announcement.
- The new name represents a blend of human and machine capabilities to create powerful experiences.
- Windsurf Floats a New SubReddit: Windsurf launched a new SubReddit to build a community, coinciding with changes to their Discord server.
- These changes included refreshed pages and channel renaming to reflect the new Windsurf branding.
- Codeium Extensions Get a New Plugin: With the rebrand, Codeium Extensions are now officially Windsurf Plugins and more innovation is promised.
- The company reiterated their dedication to enhancing the Windsurf Editor continually.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
{% if medium == "web" %}
LMArena ▷ #general (1134 messages🔥🔥🔥):
Gemini 2.5 Pro, OpenAI's Deep Research, Google's AI Strategy, DeepCoder-14B Preview Model, NightWhisper Model
- Gemini 2.5 Pro Hailed as Superior Model: Members are calling Gemini 2.5 Pro the first true A.I., noting its superiority in creative writing and consistency over other models.
- Some users have observed that while Gemini 2.5 Pro excels in general tasks, Nightwhisper is superior in coding.
- OpenAI's Deep Research Under Scrutiny: Users are questioning OpenAI's Deep Research, noting its potential as the best agent for web searching, with one stating that 2.5 with tools is just on another level.
- However, the general consensus is that Deep Research is just OpenAI's existing o3 model.
- Together AI Launches DeepCoder-14B Preview Model: Together AI and Agentica jointly released DeepCoder-14B-Preview, a code reasoning model, finetuned from Deepseek-R1-Distilled-Qwen-14B via distributed RL.
- A user derided the marketing as the dumbest most shameful marketing ever, saying the gains aren't impressive considering this is just o3-mini.
- NightWhisper Model's coding prowess praised: Users are eagerly awaiting the potential release of NightWhisper, highlighting its demonstrated coding capabilities on the arena, despite its brief availability on webdev and lmarena.
- Some speculate it's the same as the upcoming Google Ultra model.
- O3 model variations get mixed reviews: Members are comparing OpenAI's O3 Mini and O3 models, with one noting that O1 is more adept at deciding how long to think than O3 mini.
- One user with access to O3 medium described it as better at language-related problems than O1, but still weaker than Gemini 2.5 Pro for code.
LMArena ▷ #announcements (1 message):
Alpha UI, Desktop & Mobile, Bugs, Leaderboard
- Alpha UI Open for Testing: The Alpha UI is now open for testing without a password at https://alpha.lmarena.ai/.
- Users are encouraged to submit feedback and bug reports via the provided Google Forms and Airtable links.
- Updates Coming Fast for Alpha UI: The announcement mentions that the Alpha UI is an early version with limited features, but updates are coming quickly for Desktop & Mobile.
- For the latest models and leaderboard data, users should refer to the main site, suggesting that the alpha version may not be fully up-to-date.
Unsloth AI (Daniel Han) ▷ #general (586 messages🔥🔥🔥):
Unsloth DDP Support, GGUF vs bnb LoRA training, Llama 4 Analysis, cogito-v1 preview LLMs
- Unsloth Addresses DDP Training Issues: A user reported issues with HF Trainer and DDP not working with 3 or more GPUs but working fine with 2; Unsloth does support DDP.
- After testing, it threw a ValueError, and a member recommended ensuring CUDA visible devices are set to a specific GPU.
- bnb Is the Way to Go: A user inquired about whether to train a LoRA on bnb 4-bit or GGUF for a tiny model, to which it was advised to use bnb (bitsandbytes) for QLoRA training, as it saves downloading 4x the data.
- Once the adapter is trained, it can be saved and merged with the bnb model, then exported to GGUF.
- Llama 4 Models Get a Sloppy Reputation: A member tested Llama 4 (Scout and Maverick) and mentioned that they perform well in Japanese and seem to be capable base models with sloppily-put-together post-training.
- Another member commented that they will be waiting for the post-training overhaul.
- DeepCogito's v1 Preview LLMs Boast Strong Claims: A user shared DeepCogito's v1 Preview models, claiming their models outperform the best available open models of the same size, including counterparts from LLaMA, DeepSeek, and Qwen.
- They claim each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models).
Unsloth AI (Daniel Han) ▷ #off-topic (21 messages🔥):
iMatrix Dynamic Uploads, Apple BFloat, Model Pruning, Online DPO
- iMatrix Dynamic Uploads Land on HF: Members uploaded iMatrix dynamic versions for Llama-4-Scout-17B-16E-Instruct-GGUF to HuggingFace.
- B in BFloat stands for Brain: The "B" in bfloat means "brain", and the datatype was developed at Google Brain, according to Apple's documentation.
- Schizo Theory: A member shared his "schizo theory is that companies like openai / claude / gemini use user inputs to prune their models".
- He believes the "Which one of these do you prefer"-like responses are for collecting user preference data for training their models.
- Online DPO learns you too well: One member noted that online DPO starts to understand you better than you understand yourself.
Unsloth AI (Daniel Han) ▷ #help (175 messages🔥🔥):
GraniteModel bug, Unsloth on MacOS, Multi-GPU Support, Gemma 3 12b issues, GRPO training
- GraniteModel Bug Bites Colab Users!: Users encountered a bug in the Colab notebook using GraniteModel, but a quick fix involves editing `granite_based/config.json` to replace GraniteModel with GraniteForCausalLM and rerunning the cell.
- The recommended method for editing the file on Colab is to download, edit locally, and then upload the modified version back to Colab.
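For those who prefer to stay in the notebook, a one-cell alternative is to patch the file programmatically; this sketch assumes the class name lives in the config's `architectures` field, and the path shown is the illustrative one from the fix above:

```python
# Patch config.json in place instead of the download/edit/upload dance.
import json

path = "granite_based/config.json"  # illustrative; use the model's actual config path
with open(path) as f:
    config = json.load(f)

# The fix described above: swap the architecture class name.
config["architectures"] = ["GraniteForCausalLM"]

with open(path, "w") as f:
    json.dump(config, f, indent=2)
```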
- MacOS Misses Out on Unsloth's GPU Goodness: Unsloth currently only supports GPUs, leading to a `NotImplementedError` for MacOS users without NVIDIA GPUs.
- However, there's a potential solution via this pull request that aims to address MacOS compatibility.
- Multi-GPU Support Coming Soon!: Users are eagerly awaiting multi-GPU support for fine-tuning in Unsloth.
- The response from the team is that it's "soon (tm)".
- Gemma 3 12b Faces Loading Fails: Users reported that `push_to_hub_merged` isn't uploading all the necessary files to HF, so they cannot use `AutoModelForCausalLM.from_pretrained("modelname/here")` and get an error `OSError: modelname/here does not appear to have a file named pytorch_model.bin`.
- One member suggested that if you're using a >1B Gemma it's technically a vision language model, so some things are slightly different; users are suggested to try `FastModel` vs `FastLanguageModel` for Gemma 3.
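A minimal sketch of that suggestion, assuming the current Unsloth API (the checkpoint name is illustrative):

```python
# Gemma 3 >1B variants are technically vision-language models, so use FastModel
# rather than FastLanguageModel. Checkpoint name is illustrative.
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    "unsloth/gemma-3-12b-it",
    load_in_4bit=True,
)
```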
- GRPO Training Tips Sought for Massive Models: A user sought advice on training a 24B model with a 16k context length using GRPO, managing only a batch size of 1 on an H200 with 141GB VRAM and asked about Unsloth pro plan multi GPU support.
- Suggestions included increasing gradient accumulation, with the possibility of multi-GRPO support via other frameworks, and discussions around distributed GRPO concepts for sampling efficiency.
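For reference, the gradient-accumulation suggestion might look like the following with TRL's GRPOTrainer; all values here are illustrative, not a recommendation from the discussion:

```python
# Hedged sketch: batch size 1 with gradient accumulation to recover a larger
# effective batch, for GRPO at long context on a single large-VRAM GPU.
from trl import GRPOConfig, GRPOTrainer

config = GRPOConfig(
    per_device_train_batch_size=1,    # all that fits at 16k context on one GPU
    gradient_accumulation_steps=16,   # effective batch of 16 without more VRAM
    max_completion_length=16384,
    num_generations=4,                # GRPO group size per prompt
)
# trainer = GRPOTrainer(model=model, reward_funcs=..., args=config, train_dataset=...)
```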
Unsloth AI (Daniel Han) ▷ #showcase (2 messages):
Location clarification
- Location not France: A member asked another member if they were from France.
- The member responded clarifying that they are Dutch, from the Netherlands (Holland).
Unsloth AI (Daniel Han) ▷ #research (36 messages🔥):
LLMs knowledge storage alternatives, RAG for memory offloading, Vector DBs and privacy, Retrieval augmented training, DeepSeek-V3
- LLMs Mull Knowledge Storage Alternatives: Members discussed the potential of LLMs offloading knowledge retrieval to RAG pipelines to reduce the size and increase the speed of the models, and training the attention heads to learn in conjunction with a vector database.
- It was suggested that OpenAI could provide generalized vector DB knowledge lookup over private datasets that open LLM kernels could plug into for added context.
- RAG Reimagined: Retrieval Portion Evolved: Discussions revolved around splitting LLMs into a knowledge model and a chat model, where the chat model focuses on intelligence and reasoning and tool calls to the knowledge model.
- While likened to RAG, the focus is on a kernel that works with experts or specialized vector DBs built on the same embeddings, effectively increasing vocab size in some sense.
- Vector DB Ventures: Privacy Benefits Beckon: A member noted that OpenAI could potentially benefit from giving away an open kernel for free: "Look at your benchmarks before our attention vector lookups. Now look at your benchmarks after our attention vector lookups."
- This could also lead to a privacy benefit by only offloading static knowledge memory lookup.
- Rewarding Retraining: Forget What's Efficiently Remembered: A participant suggested "retrieval augmented training", rewarding the model to forget what it can efficiently remember via vector search.
- This approach could lead to more efficient models by leveraging external knowledge sources during training.
- DeepCoder Optimization Detailed: A member shared a link to a Together AI blog post about DeepCoder optimization, highlighting its potential for optimizing the vLLM pipeline.
- The optimization minimizes the wait for sampling by doing an initial sample and training, while simultaneously sampling again.
OpenRouter (Alex Atallah) ▷ #announcements (5 messages):
Rate Limits, Credits, Quasar Rate Limit, Feedback on Rate Limiting
- OpenRouter adjusts Free Model Rate Limits: Accounts with at least $10 in credits will have their requests per day (RPD) boosted to 1000, while those with less than $10 will see a decrease from 200 RPD to 50 RPD.
- Quasar to get Credit-Dependent Rate Limit: The update also notes that Quasar will soon have a rate limit that is dependent on credits.
- Feedback on Free Model Rate Limits: A member opened a feedback thread for users to post their thoughts on the changes.
- Hourly rate limits not available: There is no hourly rate limit, but the rate limit is 20 requests per minute.
OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):
Olympia.chat, Shopify, SaaS Marketing, Turnkey Operation
- Olympia.chat Seeks New Leadership: The founder of Olympia.chat has taken a role as Principal Engineer at Shopify, and the company is seeking an experienced site operator to take over technical maintenance and SaaS marketing.
- The profitable site generates over $3k USD per month, and the founders are flexible about terms for a potential takeover, offering a turnkey operation with all IP included.
- Olympia.chatâs Financial Performance: Despite peaking at nearly $8k last year, Olympia.chat currently generates over $3k USD per month consistently.
- Lack of funding led to a halt in marketing efforts, which increased customer churn.
OpenRouter (Alex Atallah) ▷ #general (758 messages🔥🔥🔥):
OpenRouter Frontend, Quasar Open Sourced, Free Model Rate Limits, API Keys Please, Gemini
- OpenRouter Drops Sick New Frontend: OpenRouter has a new frontend that looks sick, big ups clinemay!
- One user joked that it looked like gpt-3.5 made this website in about 4 minutes.
- Gemini models are top tier: Gemini 2.5 Pro is on a whole other level compared to the other models, making it the most powerful model to date.
- One user noted it was rated as 1. gemini 2.5 pro … 10. everyone else.
- Free Model Limits Tightened, Community Reacts: OpenRouter reduced the daily request limit for free models to 50, triggering mixed reactions from users, with some expressing frustration over the lowered limit.
- Some users feel that it's like a paywall.
- API Keys Made Easier: Users can now easily get an API key: make an account, add credits, then in the top-right dropdown go to Keys and create one there.
- A community member said: I was asking about the app so i could try to help you put the key in the right spot but not sure how Godot works with that.
- Nvidia Silently Drops SOTA-Level Reasoning Model Llama 3.1: Nvidia silently dropped a SOTA-level reasoning model.
- The new model is casually shown to be better than Behemoth.
Cursor Community ▷ #general (762 messages🔥🔥🔥):
Augment, Vector DB vs graph DB, Manus.im, Cursor C/C++ extension error, Model selection
- Daniel Mac goes Graph DB for Code: A member shared a link to Daniel Mac's tweet about using a graph database for code querying.
- Manus.im burns through credits: A user reported that Manus.im failed to answer a question correctly and burned through 984 of 1000 free credits on a single prompt.
- Another member suggested exploring alternatives like Smithery.ai or Awesome MCP Servers.
- C/C++ Extension Error: A user reported receiving an error related to the C/C++ extension after using Cursor since its launch in March 2023, with the extension possibly limited to use with Microsoft products.
- A workaround involved rolling back to a previous version and users shared other forum threads discussing the issue.
- Auto-Select is a scam: Users are reporting that the auto-select model option is selecting trash models.
- One user claimed it fucked up my codebase, while another suggested that it's intentionally designed this way.
- Cursor's Free Tier Gets Heat: A member reported that bypassing the trial version of Cursor could result in the user being completely banned from Cursor.
- One user noted: Gonna ban you now, but just so you know, I hope you didn't like using Cursor because you won't be able to use it at all soon.
LM Studio ▷ #general (158 messages🔥🔥):
Llama 4 Disappointment, GPU requirements and model sizes, LM Studio and Ollama, Jinja templates
- Llama 4 performance leaves users disappointed: Users express disappointment with Llama 4's performance, describing it as bad and 10 steps backwards, questioning the validity of benchmarks.
- Others suggest that larger models may have quality control issues due to random data, too many connections, or poisoned datasets, while Qwen's 14B models are praised.
- LLM size and hardware implications: A discussion arose regarding the relationship between VRAM consumption and model dilution, with some noting that models consuming less VRAM often appear more distilled or diluted to reduce size.
- A user clarified that Llama 4 gives results similar to 24-27B models with the speed and cost of a 17B model, but requires more VRAM, making it pointless for everyday users.
- LM Studioâs remote GPU compatibility is debated: Users discussed connecting LM Studio to remote instances of Ollama, but it was confirmed that LM Studio is not compatible with Ollama.
- Furthermore, the potential for connecting LM Studio with a remote GPU cluster was raised, alongside a discussion regarding the use of Snapdragon X Series NPUs and their (lack of) support with LM Studio and llama.cpp.
- Cogito modelsâ Jinja errors fixed with ChatGPT: Users reported errors with Jinja templates when using cogito-v1-preview-llama-3b, and were advised to use ChatGPT to quickly fix the template.
- The community model maintainer was notified about the wonky template and is expected to update the model.
- Decoding MOE models for dummies: A user asked, what is an MoE model?
- A helpful member explained that Mixture of Experts (MoE) models can be faster than dense models, as only parts of the model are active per token, although the whole model must be in VRAM.
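A toy sketch of that routing idea (purely illustrative, not any production MoE): only the top-k expert MLPs run per token, yet every expert's weights must still be resident in memory.

```python
# Toy Mixture-of-Experts layer: a router picks top-k experts per token; only
# those experts compute, but all experts' weights stay loaded.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)  # route each token
        out = torch.zeros_like(x)
        for k in range(self.top_k):                     # only top-k experts compute
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

moe = ToyMoE()
y = moe(torch.randn(16, 64))  # 16 tokens routed through 2 of 8 experts each
```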
LM Studio ▷ #hardware-discussion (398 messages🔥🔥):
Docker Bad, AMD ROCm WSL Woes, Memory Limits and Motherboards, Umbrella Rack SuperComputer, Fast Reading Skills
- Docker Gets Roasted: After one member expressed wanting to be "best friends" with anyone who says bad things about Docker, another member jokingly asked "Did Docker take out your family or something?".
- The first member humorously replied, "My therapist said I shouldn't talk about it."
- ROCm on WSL still problematic for 7800XT: A user reported that ROCm via WSL doesn't work with a 7800XT due to the lack of official support as seen in the AMD documentation.
- Despite this, another user suggested it might work since both cards are RDNA3 architecture, while the first user confirmed that it was impossible to get working half a year ago due to WSL passthrough issues.
- Memory Limits Debated: In a discussion about RAM limits, a user stated that a Ryzen 7000 has a weak memory controller and that the BIOS limit is 192GB on consumer hardware, while mainboards can fit 256GB.
- Another user pointed out that AMD's website states a 128GB limit, to which the first user responded that people have been running 192GB for years, and they attributed the discrepancy to server hardware having different quality targets.
- Assembling an NND Umbrella Rack SuperComputer: One user proposed building a 16-node supercomputer with either RTX 4090 D GPUs (totaling 3TB VRAM) or a less powerful option (1.5TB VRAM), aiming for a 2T model with 1M context within a budget cheaper than an Nvidia DGX B300.
- Skeptics questioned the feasibility, with one user bluntly stating, "this isn't how you do any of this…", highlighting the need for RDMA, fast interconnects, and skilled engineers, emphasizing that the user's goal was not possible on their current hardware.
- Language Model fine-tuning educational project: One member asked about a fun and educational project involving beefy hardware (2 RTX ADA 6000s, 512GB of RAM) and asked if it's a good idea to learn to fine-tune a small instance of something like phi4.
- Another member suggested pretraining an LLM from scratch or fine-tuning an LLM and pointed to a coding dataset from Nvidia (huggingface.co) and suggested that fine-tuning base models, not instruct ones would be better.
Perplexity AI ▷ #announcements (1 message):
Perplexity for Startups program, API Credits, Enterprise Pro
- Perplexity Launches Startup Program: Perplexity AI is launching a startup program offering resources to help startups reduce research time and focus on building.
- The program provides $5000 in Perplexity API credits and 6 months of Perplexity Enterprise Pro for the entire team; eligibility requires less than $20M in funding, being less than 5 years old, and association with a Startup Partner.
- Startup Program Details: The Perplexity for Startups program aims to provide eligible startups with the resources they need to accelerate their development.
- Eligible startups can receive $5000 in API credits and a 6-month subscription to Perplexity Enterprise Pro, enabling access to advanced AI capabilities for their entire team.
Perplexity AI ▷ #general (453 messages🔥🔥🔥):
Gemini 2.5 Pro performance, Deep Research High rollout, Perplexity Discover tab, Manus Invites are still needed, AI image generation on Android
- Gemini 2.5 Pro's Reasoning Outputs Missing: Members discussed that Gemini 2.5 Pro doesn't expose its reasoning tokens, and therefore can't be included as a reasoning model on Perplexity, though it is a high-latency thinking model.
- Because Gemini 2.5 Pro reasoning tokens aren't sent, Perplexity via the API doesn't show the reasoning like you would via AI Studio.
- Deep Research High Rolling Out Slowly: Members are eagerly awaiting the rollout of Deep Research High, which is expected to use 150-200 sources on average; however, one user reports Perplexity's deep research used 23 sources while the free Gemini deep research used over 500.
- Some members voiced frustration over the lack of communication regarding the release timeline, and the fact that the current version outputs a summary rather than conducting truly deep research. Check out the DeepSeek Subreddit.
- Gemini 2.5 Pro gives great performance on Perplexity: Users noted the addition of Gemini 2.5 Pro in Perplexity, with one user finding that Gemini 2.5 Pro's single story beat the other 3 stories by GPT 4.5, and another stating that it's now powering deep research, delivering detailed reports like this one.
- However, one user noted that answers are often truncated at 500-800 tokens, despite the model generating 16,098 tokens of detailed report.
- Users report Perplexity auto-enabling Pro mode: Several users have reported that Perplexity is auto-enabling Pro mode on free users to waste their daily limits.
- One user said the non-pro model seems to be balls.
- Reported issues when Uploading PDF files: A Pro user tried uploading 8 PDF files; after 5 minutes of loading, it uploads only one or two, which instantly disappear with an error pop-up saying file upload failed.
- File sizes ranged from 114 KB to 9,502 KB.
Perplexity AI ▷ #sharing (1 message):
Llama 4, Benchmark Faking
- Benchmark Faking Allegations against Llama 4: A user shared a Perplexity AI search result questioning whether Llama 4 is faking benchmarks.
- The shared link provides a discussion and potential evidence related to the alleged benchmark manipulation by Llama 4.
- Ongoing Debate on Model Benchmarking: The conversation highlights the broader issue of transparency and reliability in AI model benchmarking, a recurring theme in the AI community.
- Concerns were raised about the methodologies used to evaluate Llama 4 and the potential for misleading results.
Perplexity AI ▷ #pplx-api (29 messages🔥):
Perplexity API News Fetching, Perplexity API Sonar Prompting, Perplexity API Search Discrepancies, Perplexity API Citations, Perplexity API Sandbox
- Perplexity API News Fetching: A user requested a news API feature to fetch news based on queries or topics, similar to particle.news, and the team responded that they already have partnerships to surface news via their API.
- A team member suggested building a news API feature using Sonar's existing functionalities and adding it to the API cookbook.
- Perplexity API Sonar Prompting: A user reported issues with Sonar, where responses were focused around the system prompt rather than dynamically handling user queries.
- A team member clarified that the system prompt isn't used during the search phase, advising the user to optimize the user prompt instead, referencing the Prompt Guide.
- Perplexity API Search Discrepancies: A user reported discrepancies between the Perplexity API and the web UI when summarizing web pages, with some links not being retrieved by the API and the results being less structured.
- The user is seeking assistance to resolve the issues, as the Sonar-pro model yields different results between the API and the web UI.
- Perplexity API Sandbox superiority?: A user reported that the API sandbox is giving way better results than the actual API when using sonar-reasoning-pro.
- The user is seeking advice on how to make the API give the same results as the sandbox.
Manus.im Discord ▷ #general (463 messages🔥🔥🔥):
High Effort Mode, Manus Local Version, Genspark vs Manus, Llama 4 hype, Manus Credit Usage
- Local Manus Coming Soon: Members discussed that a local version of Manus will be possible in the future, like most other AI models.
- MCP Servers Available: Members noted that MCP servers have been available on Claude since Nov 25, 2024, and can be used with Claude code.
- Some members expressed skepticism, citing successful past attempts at what they termed cursed model merging.
- Llama 4 is Overhyped: Users share that Llama 4 is overhyped after testing it on Openrouter.AI and receiving subpar responses.
- Others claim Zuck is receiving criticism because he supposedly gamed the benchmarks.
- Octopus Web Scraper Works: A member reported that Octopus, a free website scraper, works pretty well on Zillow and Realtor, while Bardeen costs $130/month.
- Another member said $130/month seems expensive when you could use Manus to build your own.
- Manus credits are too expensive: Several users complain that Manus credits are too expensive, with one user reporting that a single Standard Complexity task used up all 1000 free credits and wishing the pricing and credit plans get updated.
- Some users shared it is better to break it into smaller tasks with new dialogue windows, and also recommend Proxy as a cheaper alternative.
aider (Paul Gauthier) ▷ #general (237 messages🔥🔥):
Gemini 2.5 vs Sonnet Thinking, Aider's auto-testing, Gemini 2.5 Pro vs exp, OpenRouter citation links, AI resume builder
- Gemini 2.5 and Sonnet Faceoff: Members discuss Gemini 2.5's strong logic but poor instruction following versus Sonnet's feature-rich but less accurate coding, ultimately requiring fewer prompts for a working program.
- One user reported that Gemini 2.5 only needed 1 prompt, while Sonnet needed 3 prompts, despite Sonnet including multiple file input methods, optional drag and drop, batch processing, file queue management, explicit conversion start, explicit cancellation, resizable window, etc.
- Aider's Auto-Testing Troubles: A user is looking to enable Aider's auto-testing and potentially disable auto-committing due to issues with committing untested code, with a pointer to the Aider configuration options.
- Another user suggests to provide a model and key or Aider will guess what you want based on whatever of your keys it can find.
- Gemini 2.5 Pro exp Rate Limits: Users compare Gemini 2.5 Pro exp to Gemini 2.5 Pro preview, noting different rate limits, and one reports getting charged for the seemingly free `pro-exp` model.
- Despite one user feeling like exp is weaker, another user had cancelled Sonnet within an hour of using it, while another user got rate limit issues on both, especially through openrouter.
- OpenRouter Missing Citation Links: A user asks if missing citation links are normal when using services like Perplexity Sonar Pro through OpenRouter, attaching an image.
- DIY AI Resume Builder Idea: A user is seeking an LLM-powered tool to analyze resumes against job listings, suggesting wording modifications, and another user suggests building one's own tool.
- A user suggests if they had some programming experience that this could be built, and they could also use it to test out Gemini 2.5 pro.
aider (Paul Gauthier) ▷ #questions-and-tips (8 messages🔥):
Architect mode interruptions, Aider Response Time, Aider Cursor Rules
- Architect Mode Edits Getting Cut Off?: Users are reporting /architect mode edits getting interrupted when asked to add new files, potentially losing the editor.
- Saying no to adding new files appears to allow the edit to continue.
- Aider Response Time Questioned: Users are reporting that Aider v0.81.1 with `openrouter/deepseek/deepseek-r1` and `openrouter/anthropic/claude-3.5-sonnet` is as slow as ChatGPT.
- One user waited "5 freaking minutes" for a schema file to be created, only to receive a `litellm.APIError` due to a connection issue.
- Comparing Aider Conventions to Cursor Rules: Users are asking if Aider conventions are similar to Cursor rules.
- A member clarified that aider "conventions" isn't really a thing, but just added context sent to the LLM. They are added manually or with `--read CONVENTIONS.md`.
aider (Paul Gauthier) ▷ #links (8 messages🔥):
Software Engineer Gap Year, LLMs as AI Coworkers, Programming LLMs for Successful Outcomes
- Gap Year Not a Good Idea for Software Engineers?: An article suggests that taking a gap year/holiday right now would be an incredibly bad decision for software engineers, pointing to insights about the current tech landscape; see this article.
- LLMs as 1000 AI Coworkers: Anni Betts from Anthropic suggests that software engineers should think beyond having "an AI coworker" and instead consider having "1000 AI coworkers that went ham on your entire issue backlog at once".
- According to the author this can be done by programming the LLMs and building a "stdlib" that manufactures successful LLM outcomes.
Notebook LM ▷ #use-cases (10 messages🔥):
NotebookLM Commercial Options, NotebookLM privacy assurances, NotebookLM Misreading Scholarly Articles
- Google's AgentSpace unlocks NotebookLM for Enterprise: A user inquired about a commercial-scale version of NotebookLM with data privacy and specific programming capabilities, and another member linked to Google's AgentSpace NotebookLM Enterprise documentation that enables CMEK.
- The documentation outlines how to set up NotebookLM with Customer-Managed Encryption Keys (CMEK), offering greater control over data encryption.
- Privacy Assurances Provided by NotebookLM: A member explained that both the Enterprise and Plus versions of NotebookLM offer privacy assurances, emphasizing that user data is never in the public domain, regardless of the version used.
- They clarified this point to address a fundamental misunderstanding of Google's privacy policy and terms of service, and further suggested that the platform has mechanisms to prevent prompt injection attempts.
- NotebookLM's improved Summary after user correction: A user noticed that NotebookLM initially misread a critical point in a scholarly article's summary but corrected itself after the user provided a quotation and explanation.
- Repeating the same prompt with the same article in different Google accounts yielded the correct summary from the beginning, raising questions about whether the model uses previous queries for training and whether the privacy statement is accurate, according to the user.
Notebook LM ▷ #general (204 messages🔥🔥):
Discovery Mode rollout, Google Cloud Next and Google I/O, NotebookLM Legal Use cases, New Gemini features with deep research, Podcast Audio Overviews
- Discovery Mode Still Rolling Out Slowly: Users report waiting for the new Discovery Mode feature, with the rollout expected to take up to two weeks from release date.
- One user jokingly demanded special treatment as a Google fanboy, requesting to be an alpha tester.
- Google Cloud Next and Google I/O Promise Surprises: The upcoming Google Cloud Next and Google I/O events are anticipated to reveal new features, though details remain tightly guarded.
- One user humorously compared Cloud Next to Christmas, with Google acting as Santa.
- NLM for Legal Use Cases and Printing Concerns: A user sought advice on using NotebookLM for extracting specific information from legal documents, aiming to get article numbers and relevant text, seeking assistance on printing the entire answer with all the links included.
- Another member suggested breaking content into 10-20 notebooks, each with its own specific content, to ask the same question immediately in each notebook.
- Gemini Still Hallucinates with Deep Research: Some users report experiencing hallucinations with Gemini's deep research, despite it having access to the internet.
- One member clarified that Gemini can connect to Google Search, but it doesn't do it if you don't specify you wanna ground it, and recommends testing this in AI Studio.
- Podcast Audio Overviews Coming to NotebookLM: It was reported that the new 2.5 Pro deep research version will have the capability to make audio overviews, but it is not working for all users.
- A Google employee clarified that complex topics with several different angles covered in the sources around a central topic result in longer podcasts.
Interconnects (Nathan Lambert) ▷ #news (92 messages🔥🔥):
DeepSeek R2 Release, LlamaCon, Llama-4-Maverick, Style Control Ranking, HF version of Llama-4-Maverick
- DeepSeek R2 must release for LlamaCon: Members are encouraging those with connections at DeepSeek to release R2 on the same day as LlamaCon to leverage the hype, citing that the training data needed for MoE is different from that of base models, according to research from arxiv.org.
- LM Arena policy updates: Early analysis shows style and model response tone were an important factor (demonstrated in style control ranking), and the HF version of Llama-4-Maverick is being added to Arena. Meta should have made it clearer that Llama-4-Maverick-03-26-Experimental was a customized model optimized for human preference, so leaderboard policies are being updated to reinforce the commitment to fair, reproducible evaluations.
- Members reacted saying it was a yapping emoji slopfest.
- Cogito models released under open license: Strong LLMs of sizes 3B, 8B, 14B, 32B and 70B are being released under open license, with each model outperforming the best available open models of the same size from LLaMA, DeepSeek, and Qwen, across most standard benchmarks and the 70B model outperforming the newly released Llama 4 109B MoE model.
- These LLMs are trained using Iterated Distillation and Amplification (IDA), a scalable and efficient alignment strategy for superintelligence using iterative self-improvement, from DeepCogito.
- Together AI moves into Training: Together AI is getting into the training business, as showcased by this case study.
- Google Gemini 2.5 Pro Deep Research Announced: Google Gemini 2.5 Pro Deep Research was announced according to 9to5Google with a member reporting that Gemini 2.5 deep research is roughly on par with OpenAI Plus with an audio overview podcast option thing.
Interconnects (Nathan Lambert) ▷ #ml-questions (30 messages🔥):
OpenAI Image Gen Capabilities, Logprob Reward, Arxiv Publishing, Arxiv Moderation, Phi-CTNL
- OpenAI's New Image Gen Capabilities: A member inquired about write-ups describing how OpenAI unlocked new image generation capabilities, suggesting it wasn't a new model but latent capabilities.
- Another member suggested it was achieved using an objective similar to this one and incorporating a logprob reward as seen in this paper.
- Arxiv Posting Process Revealed: Members discussed the process for posting on Arxiv, noting that a small vouch is needed, which differs from the old physics days of just dumping content.
- They added that a small random chance of moderation exists, where even nonsense papers can get rejected, but people mostly just post whatever anyway.
- Champion Phi-CTNL Paper has 20 Citations: A member shared a link to this paper describing a champion model that has 20 citations, exclaiming Godlike.
- Another member noted the brutal model name phi-CTNL, speculating on the reaction if Meta cites it in a future Llama 4 paper.
Interconnects (Nathan Lambert) ▷ #ml-drama (15 messages🔥):
Google AI Staff, AI Sabbatical, NVDA Tariffs, ASI, Google's management vibes
- Google allegedly pays AI staff to do nothing: A TechCrunch article discusses how Google is allegedly paying some AI staff to do nothing for a year rather than join rivals.
- A member described it as the most basic idea from management where all the second order effects are horrifically bad.
- AI Engineer eyes Sabbatical to look at trees: One member expressed that after AI settles down in a year or two, they'd happily take a sabbatical and write a book.
- Another member said that from the corporate side (McKinsey etc.), they give their researchers very long time off to make sure they stay engaged; otherwise, they found, you just lose everyone over time.
- Tariffs May Accelerate AI Slowdown: A member suggested that if tariffs stay, the AI field will settle down in a month because people can't afford NVDA GPUs.
- They stated if tariffs stay it'll settle down in a month as you can't afford NVDA GPUs.
- Google Paying Quitters Creates Legal Peril: A member clarified that Google is paying people who have quit for another year but forcing them not to work.
- In theory, anything they do in that year belongs to Google, so they can't start working on their startup or something without legal peril.
Interconnects (Nathan Lambert) ▷ #random (12 messages🔥🔥):
Google Cloud Next, Qwen 3 Launch, GPT 4.5 preferences, Claude Code Credits, Tim Apple
- Google Cloud Next to Drop New Models: A member shared that Google will drop new models on Cloud Next, which starts Wednesday, according to this X post.
- This may mean the launch of Qwen 3.
- GPT 4.5 Preferences Underway: A member alluded to GPT 4.5 preferences being collected by OpenAI, linking to this X post.
- They were looking for the High Taste Tester LMarena to weigh in.
- Anthropic offers Claude Code Credits: A member shared a link to Anthropic offering $50 in Claude Code credits to 1,000 people for trying Claude Code.
- According to the post, that may be enough credit to change one var name.
Interconnects (Nathan Lambert) ▷ #memes (5 messages):
Jiankui He's X ad revenue
- Jiankui He's AdSense fortune: A user joked about Jiankui He making money from the X creator ad share.
- Another user joked that he makes $20 from that, or $20K if Elon Musk wants to "fuck him."
- AdSense Revenue Speculation: Speculation arose regarding the potential ad revenue earned by Jiankui He on X.
- Estimates ranged from a modest $20 to a more substantial $20,000, contingent on Elon Musk's intervention.
Interconnects (Nathan Lambert) ▷ #rl (24 messages🔥):
DAPO papers, OLMo, Tulu 3, BoN Sampling
- DAPO papers offer "Extreme value": Members in the channel discussed a DAPO paper as offering "Extreme value".
- They also referenced another paper built on OLMo.
- Tulu 3's work makes it into another paper: A paper using Tulu 3's work was mentioned and linked: https://arxiv.org/abs/2504.03790.
- Alpha in research = talk to other researchers: A member stated that "biggest alpha in research is just talking to other researchers" and shared some insights from a paper, noting it "uses a very different method of inference time compute".
- They also stated that it "shows that BoN sampling is effectively just changing the beta factor in RLHF (lowering the KL penalty)", and that "you can design the inference time compute differently, so that you aren't hacking RL (in this case using an RM as guidance) and get far better answers".
- BoN sampling to sub in future work?: A member asked whether BoN sampling could substitute in future work.
- Another member responded that "it's more complicated to implement but sure why not if it's flop equivalent".
- Ash Vaswani Tweet on Undergrad Technical Report: A user shared a tweet from Ash Vaswani, stating that a linked paper "didn't give very good vibes though".
- The member stated that the paper "felt very undergrad technical report", but declined to tweet negatively about it.
Interconnects (Nathan Lambert) ▷ #reads (3 messages):
Karan Dalal Post, Yuxi Liu Essay
- Karan Dalal's Post Goes Viral: A member shared a link to Karan Dalal's post on fxtwitter, generating excited discussion.
- The original poster reacted with simply, "WTF".
- Yuxi Liu's Essay Gains Attention: A member posted a link to Yuxi Liu's essay prompting immediate discussion.
- No specific details about the essay were mentioned.
Interconnects (Nathan Lambert) ▷ #posts (1 message):
natolambert: My post looks generous next to Marcus's, oh my
Eleuther ▷ #general (106 messages🔥🔥):
Adam second-moment estimate buffers, Google DeepMind Patents, Hierarchical Perceiver, AI Auditing Survey, GFlowNets
- Adam Buffers Reconstruction Discussions Emerge: Members discussed the utility of Adam second-moment estimate buffers and how to efficiently reconstruct them for open-source models, balancing accuracy with computational cost, for potential method improvements.
- It was noted that setting beta2 to a high value (e.g., 0.999999) and the learning rate to zero could improve accuracy, though the final epoch of pretraining presents challenges.
- DeepMind Patents Hierarchical Perceiver: Members noted that Google DeepMind is trying to patent the Hierarchical Perceiver, drawing comparisons between the patent diagrams and those in the original research paper.
- Some speculated this patent could be related to DeepMind's work on ultra-long context lengths in Gemini, with discussions on whether it's a defensive measure or indicative of current usage after its original lack of uptake.
- Licensing Faceoff: Apache 2.0 Prevails over MIT: The conversation mentioned a preference for the Apache 2.0 license over MIT, citing its defenses against patent-based lawfare in machine learning.
- It was highlighted that institutional inertia and GitHub org settings favored Apache 2.0, with the sentiment that outside of GPLv2 weirdness or wanting to engage in lawfare shenanigans, there's no reason to argue for MIT over Apache 2.0.
- DeepMind Rumored to Sandbag Model Releases: Members discussed a rumor, per a Reddit thread, that DeepMind may be delaying the release of research to maintain a competitive edge.
- One participant clarified that sandbagging refers to holding back in ability, not releasing purposely bad versions of models to mislead others.
- Survey seeks AI Auditing Experts: A researcher from the University of Turku, Finland, is conducting a survey on ethics-based auditing of generative AI systems and is seeking participation from professionals with practical experience in AI auditing, model evaluation, risk/compliance, or ethical alignment of AI principles.
- The survey aims to gather insights on auditing or evaluating AI systems, especially generative models.
Eleuther ▷ #research (35 messages🔥):
QKNorm, Soft RL, Llama 4 Memorization, Critical Batch Size, Reward, Value, Q-value letters
- QKNorm Developments Deemed Dubious: A member suggested a better/earlier paper and stated that the QKNorm developments are not the right way to go, referencing this paper.
- Soft RL Goal Gleaned: A member summarized that the goal of Soft RL is to learn a policy that not only knows a good response to every query, but ideally knows all good responses to every query.
- They linked to test-time-training.github.io/video-dit/ and this tweet while mentioning thread block clusters.
- Llama 4 Lacks on MATH-Perturb: In a discussion about measuring memorization of test sets in the Llama 4 models, a member stated that it performs pretty badly on the MATH-Perturb dataset and linked to this tweet.
- Critical Batch Size Critiqued: Regarding the statement that very large batch sizes are not good for convergence, a member cited the standard McCandlish paper on critical batch sizes to back up the statement and linked to this paper.
- R stands for Return, Remarks a Redditor: A member joked that one day llm researchers will use the correct letters out of R, V, and Q to represent reward, state-value, and state-action values respectively but not today while linking to this paper.
- Another member responded Trick question, R stands for Return alongside a link to this paper.
Eleuther ▷ #interpretability-general (9 messages🔥):
Baraniuk and Balestriero's works, ReLU networks, Boris Hanin's ReLU networks paper, ICML machine unlearning workshop
- Hyperplane Happy Neural Nets Hedge Overfitting: It was noted that because ReLU neural nets work by carving the input space along hyperplanes, they have an implicit bias against overfitting that gets better in high dimension.
- It takes at least d+1 hyperplanes to enclose a bounded set, so a perfectly overfitted model enclosing each datapoint in a separate bounded set would need at least n(d+1) neurons.
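Written out, the counting argument is:

```latex
% Sketch of the counting argument above: a bounded polytope in R^d has at
% least d+1 facets, and each facet requires one ReLU hyperplane.
\underbrace{d+1}_{\text{hyperplanes to enclose one point in } \mathbb{R}^d}
\quad\Rightarrow\quad
\text{enclosing all } n \text{ datapoints separately needs} \ge n(d+1) \text{ neurons.}
```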
- Hanin's Hyperplane Handling Helpful Hints: A member shared a link to Boris Hanin's paper which demonstrates some mathematical properties of ReLU networks, specifically studying the geometry of their constant regions.
- Another member expressed their love for a specific figure in the paper.
- ICML Invites Insightful Investigation Into Unlearning: A member shared that ICML will have a machine unlearning workshop.
- The workshop's website can be found here.
Eleuther ▷ #lm-thunderdome (1 message):
LM Harness, HotpotQA, Llama Eval, GPT Models
- Guidance Needed: LM Harness for HotpotQA: A member inquired about an LM harness implementation for HotpotQA to evaluate Llama and GPT models.
- They requested guidance on running evaluations against HotpotQA.
- Llama and GPT models under eval: Members are evaluating Llama and GPT models.
- They require an LM harness implementation for HotpotQA to do so.
Nous Research AI ▷ #general (127 messages🔥🔥):
Llama-4-Scout-17B, Gemini 2.5 Pro Code Generation, aider-chat & Gemini 2.5 Pro, HiDream-I1 Image Model, DeepCogito LLMs & IDA
- Llama-4-Scout-17B Gets Ready for llama.cpp: Llama-4-Scout-17B text-to-text support has been added to llama.cpp, with members working on converting and quantizing the model.
- This pre-release is generating excitement among users eager to test its capabilities.
- Gemini 2.5 Pro Generates Solid Code Snippets: Gemini 2.5 Pro is being lauded for its ability to generate functional code snippets from complex prompts. See the prompt and responses in this message.
- aider-chat plus Gemini 2.5 Pro Creates AGI Prototype: A user reported using aider-chat combined with Gemini 2.5 Pro to edit or create 15 files from a 300k token context, including their frontend, API, and microservices.
- The user feels like they now have all the files to deploy a production AGI prototype.
- HiDream-I1 Image Model Generates High-Quality Images: HiDream-I1 is a new open-source image generative foundation model with 17B parameters that uses Llama 3.1 8B as a text encoder, is released under the MIT license, and achieves state-of-the-art image generation quality within seconds.
- It produces exceptional results across multiple styles including photorealistic, cartoon, artistic, and more, achieving state-of-the-art HPS v2.1 score, which aligns with human preferences.
- DeepCogito Models use Iterated Distillation and Amplification: A new suite of Cogito models (3B-70B) claim to outperform same-size models like Llama, DeepSeek, and Qwen, and are trained using Iterated Distillation and Amplification (IDA), which iteratively improves a model's capabilities through cycles of amplification and distillation as outlined in this research.
- Notably, the 70B model allegedly surpasses the newly released Llama 4 109B MoE model.
Nous Research AI ▷ #ask-about-llms (4 messages):
LayerNorm Implementation, Llama4 Context Window, H100 Usage
- LayerNorm Stats Calculated Per Sample: A member implemented LayerNorm, noting the key difference from BatchNorm is computing statistics per sample (axis=1) instead of per batch, with keepdims=True to avoid operand issues.
- They also removed running averages since mean and variance depend on the number of features, not batch size, and attached an image showcasing it.
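A minimal sketch of that implementation (illustrative, in NumPy): statistics are computed per sample over the feature axis (axis=1), with keepdims=True, and no running averages are kept.

```python
# LayerNorm vs BatchNorm: normalize each sample over its features (axis=1),
# not each feature over the batch; keepdims=True avoids broadcasting issues.
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # x: (batch, features); mean/var per sample, not per batch
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(4, 8).astype(np.float32)
out = layer_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```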
- Llama4 needs H100?: A member inquired about testing Llama4 with a 10M context window on a single H100.
Nous Research AI ▷ #interesting-links (18 messages🔥):
Distributed data parallel training, Untrusted low-cost compute, Nous DeMo paper, Gradient compression algorithm, P2P interruptible compute
- Panthalia Platform Verifies Low-Cost Compute with DDP: A platform has been developed to verify untrusted, low-cost compute for training models over the internet using distributed data parallel (DDP), inspired by the Nous DeMo paper for compression, with a waitlist available via X.com.
- Panthalia Aims to Resell H100 Compute at $0.60/hr: In its early stages, Panthalia aims to resell low-cost provider compute at interruptible prices, such as $0.60/hr for an H100 and $0.13/hr for a 4090, leveraging DDP and DeMo-style compression.
- The weights are stored on reliable servers, enabling scaling for initial users, and long-term plans include building a supply of P2P interruptible compute.
- Panthalia Enables User-Defined Training and Plugins: The platform supports model sizes limited only by device capacity, using DeMo compression to achieve significant reduction in size, with a plugin system allowing users to define their own models, training methods (QLoRA), and distributed training algorithms (DeMo vs. DiLoCo).
- Users can download weights, and compute units can be standardized within subnets to ensure validation, with Stripe (credit card) for payments and crypto for payouts.
GPU MODE ▷ #general (10 messages🔥):
GPUMODE triton dataset, PyTorch version for triton kernels, GPUMODE website improvements, GPUMODE Job Portal
- GPUMODE Triton Dataset: Genesis on PyTorch 2.5: The GPUMODE "triton" dataset, used for Inductor Created Data, was created using PyTorch 2.5.
- The creator promised to update the readme to reflect this crucial detail, since users may have issues running it on PyTorch 2.6+.
- GPUMODE Website: New Tab Navigations Suggested: A user suggested that the "Lectures" and "Resources" tabs on the GPUMODE website should open in a new tab, since they are hyperlinks to YouTube/GitHub.
- This would prevent users from navigating away from the GPUMODE website in the same tab, thus improving user experience.
- GPUMODE: Job Portal Idea Scraped: A member proposed adding a job portal to the GPUMODE website, which would scrape postings from a specific channel, to create new postings.
- They also suggested a static template (JSON or YAML) for job posters to ensure consistent formatting and simplify entry creation, and the GPUMODE staff have acknowledged the suggestion.
GPU MODE ▷ #triton (14 messages🔥):
block_ptr usage, tl.load and boundary_check, Boundary checks and performance
- Block Pointers to fill out-of-bounds: A member suggests using `tl.make_block_ptr` to create pointers that can fill with zeros for out-of-bounds memory accesses, specifically highlighting usage with `boundary_check` and `padding_option="zero"` (see the sketch after this list).
- The usage example provided utilizes `tl.make_block_ptr` with parameters like `shape`, `strides`, `offsets`, `block_shape`, and `order` to create the pointer, and then loads data using `tl.load` with boundary checks.
- `tl.make_block_ptr` deep dive: A member inquired about `tl.make_block_ptr`, asking if it could be spammed in a loop, how to use the offset parameter, and the meaning of the order parameter.
- Another member clarified that `tl.advance` should be called to increment the pointer for loading data in a loop, the offset represents the start element index, and the order parameter defines the memory layout (e.g., col-major matrix).
- `boundary_check` order is irrelevant, but required for correctness: A member asked about the meaning and behavior of `boundary_check` in `tl.load`, specifically its order and the consequences of omitting it.
- It was explained that the order of `boundary_check` doesn't matter, and omitting it increases speed but risks errors like "device-side assert triggered" due to potential buffer overruns, especially when an array dimension is not a multiple of the block size.
- Can you fill with another value than zero or NaN?: A member asked if it was possible to fill with a value other than zero or NaN when using block pointers.
- Another member answered that it is difficult to do, but you can replace NaN with another value by using `tl.where(x == x, x, another)` because `nan != nan`.
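Stitching those pieces together, a hedged sketch of the pattern discussed; shapes and block sizes are illustrative, not from the conversation:

```python
# Block pointer with zero-fill for out-of-bounds loads, advanced along the
# column axis in a loop: a toy row-sum kernel.
import triton
import triton.language as tl

@triton.jit
def row_sum_kernel(x_ptr, out_ptr, M, N,
                   stride_m, stride_n,
                   BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    pid = tl.program_id(0)
    block = tl.make_block_ptr(
        base=x_ptr, shape=(M, N), strides=(stride_m, stride_n),
        offsets=(pid * BLOCK_M, 0), block_shape=(BLOCK_M, BLOCK_N),
        order=(1, 0),                   # row-major layout
    )
    acc = tl.zeros((BLOCK_M,), dtype=tl.float32)
    for _ in range(0, tl.cdiv(N, BLOCK_N)):
        # boundary_check is required for correctness when N % BLOCK_N != 0;
        # out-of-bounds elements are filled with zero.
        tile = tl.load(block, boundary_check=(0, 1), padding_option="zero")
        acc += tl.sum(tile.to(tl.float32), axis=1)
        block = tl.advance(block, (0, BLOCK_N))  # increment the pointer in-loop
    offs = pid * BLOCK_M + tl.arange(0, BLOCK_M)
    tl.store(out_ptr + offs, acc, mask=offs < M)
```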
GPU MODE ▷ #cuda (4 messages):
Deepseek communication library, NVSHMEM and Unified Virtual Addressing (UVA), LDSM (Local Data Share Memory), Optimized smem load
- Deepseek Lib Built off NVDA's NVSHMEM: The deepseek communication library is built off the NVSHMEM library from NVDA.
- NVSHMEM Explored for UVA Intra-Node Comm: A member inquired whether NVSHMEM uses Unified Virtual Addressing (UVA) for intra-node communication, enabling peer-to-peer loads/stores to remote GPUs via NVlink.
- LDSM Copying Discussed: A user asked for the code for defining `make_tilded_copy`, and stated that the current image does not look like one from a `tiled_mma`; instead it looks like an LDSM copy.
- One member explained that with LDSM, 32 threads in a warp coordinate to copy data from smem to rmem, and if the smem is row-major, T0 loads elements 0-7 from smem and stores 0,1,8,9,128,129,136,137 into its own register memory.
- Optimized smem load: A member shared a code snippet, `tCsA = thr_mma.partition_A(sA); tCrA = thr_mma.make_fragment_A(tCsA); copy(tCsA, tCrA);`, for partitioning `sA` according to the `tiled_mma`.
- They added that for an optimized smem load we should use LDSM though.
GPU MODE ▷ #torch (9 messages🔥):
TorchTitan's Compile Strategy, FSDP Numerical Issues, FSDP2 Model Extraction
- TorchTitan's Pre-Compile Strategy: The standard practice is usually to compile after operations, but TorchTitan does a unique per-block compile before, potentially to circumvent some torch compile bugs; see torchtitan/parallelize_llama.py#L313.
- The block-wrapping approach aims to leverage Dynamo's caching to skip Triton's LLVM compilation, which is slow; however, numerical issues may still exist when using `torch.compile` and FSDP together.
- FSDP and torch compile cause Numerical Issues: A research lab experienced numerical issues using FSDP with `torch.compile`, leading to training instability, where the reward would suddenly plummet.
- They discovered that disabling `torch.compile` resolved the issues, and cautioned to be careful with torch compile, highlighting that these problems were observed with HF qwen2.5 and a custom GRPO+entropy loss.
- The wrapped model extraction from FSDP2 remains challenging: A member asked how to get the original model from an FSDP2 wrapped model, because the modifications are done in place and `copy.deepcopy` isn't implemented in FSDPModule.
- Another member suggested that FSDP modifies the model in place by wrapping modules, recommending keeping the original model around before applying FSDP (see the sketch below).
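A sketch of that workaround, illustrated with the classic FSDP wrapper since the same principle applies to FSDP2's `fully_shard`; the single-process `gloo` setup is only there so the snippet runs:

```python
# Keep a handle to (or deep copy of) the original module *before* it is
# wrapped, since wrapping mutates the module in place.
import copy
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("gloo", init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)

model = nn.Linear(1024, 1024)
original = copy.deepcopy(model)  # still possible here; not after wrapping
fsdp_model = FSDP(model)         # wraps and shards the module in place

# ... train fsdp_model ...
# `original` retains the unwrapped architecture; load trained weights into it
# from a full state_dict gathered from fsdp_model when needed.
```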
GPU MODE ▷ #cool-links (5 messages):
CUDA physics simulation kernels go open source, Triton-Distributed, SMERF 3D
- PhysX Goes Public!: NVIDIA's CUDA physics simulation kernels are now open source; some are already working on a ROCm version.
- Triton Gets Distributed Superpowers!: A learning note details Triton-Distributed, fusing Triton with NVSHMEM/ROC-SHMEM to enable multi-GPU execution, add IR for distributed tasks, and support compute-communication overlap (link).
- SMERF's Berlin Demo is Still Cool: The SMERF (Streamable Memory-Efficient Radiance Fields) project's Berlin demo remains impressive for its 3D scene reconstruction capabilities; the project page is here.
GPU MODE ▷ #jobs (2 messages):
Krea hiring, ML engineers, GPU cluster, diffusion models, interns
- Krea Seeks ML Engineers for GPU Brilliance!: Krea is hiring ML engineers to optimize the training/inference pipeline for their GPU cluster, seeking individuals passionate about accelerating image generation models.
- Krea needs Researchers for diffusion models: Krea is also seeking researchers interested in enhancing the controllability and aesthetics of diffusion models.
- Inquiries Emerge for Internship Spots: A member inquired about potential internship openings at Krea.
GPU MODE ▷ #beginner (15 messages🔥):
Graph Neural Networks (GNNs), Graph Attention Networks (GATs), CUDA compilation of C code, NVIDIA Streaming Multiprocessors, Thread cooperation in CUDA
- GNN Computations are Radically Parallel: Members discussed the parallel nature of Graph Neural Networks (GNNs), noting that updates for each node in a graph can often be computed in parallel.
- One member mentioned that Graph Attention Networks (GATs) architecture is one such example that comes to mind.
- C++ Compilers may fail on valid C code: Members discussed the claim that C++ compilers can compile all C code, referencing the server's FAQ.
- One member pointed out that it's possible to write C code that does not compile with a C++ compiler, citing a Wikipedia article.
- CUDA Glossary Updated: A member suggested including a screenshot from Hennessy & Patterson that breaks down terminology into a common ground (e.g., NVIDIA Streaming Multiprocessors = Cores).
- Another member suggested adding it as a suggestion in the glossary, or post it in the channel.
GPU MODE ▷ #torchao (1 message):
torchao 0.10.0 release, MXFP8 training, PARQ, Module Swap Quantization API, Low Bit Kernels
- TorchAO Drops New Release: v0.10.0: The 0.10.0 release of torchao introduces end-to-end training support for mxfp8 on Nvidia B200, along with PARQ for quantization-aware training.
- This release also includes a module swap quantization API for research and updates for low-bit kernels, with details available in the release notes.
- Nvidia B200 can use MXFP8 training: MXFP8 is now supported for end-to-end training on Nvidia B200 thanks to the updates in the torchao 0.10.0 release.
- These training capabilities will allow for better and faster quantization aware training and new research.
- TorchAO releases Module Swap Quantization API for Research: The new Module Swap Quantization API will enable researchers to effectively apply quantization to custom modules in models.
- The torchao 0.10.0 release enables researchers to experiment with quantization strategies more flexibly, by swapping standard modules with quantized versions.
GPU MODE ▷ #off-topic (1 message):
twzy: met yann lecun today and he seemed pissed
GPU MODE ▷ #self-promotion (9 messages🔥):
Tom and Jerry Diffusion Transformers, Nvidia Hopper Distributed Shared Memory, Verifying Untrusted Low-Cost Compute, LiveDocs Code Documentation
- Toon Time: Team Triumphs with Tom & Jerry Transformer: A team completed a project creating 1-minute-long Tom and Jerry cartoons by finetuning a diffusion transformer; the work was accepted to CVPR 2025 and they released their finetuning code on GitHub.
- They also released a sample video of the unedited output, fully generated by the diffusion transformer.
- Hopper's Hidden Hardware Helps High-Performance RNNs: A member mentioned that on the Nvidia Hopper architecture, there is a really interesting feature where you can have distributed shared memory to transfer data directly between the SRAM of SMs.
- They used this feature to run tensor parallelism over their RNN's hidden states across SMs on a single GPU, removing the need to write back to HBM.
- Panthalia Platform Provides Proof of Trustworthy Parallelism: A member has been working on a platform that verifies untrusted low-cost compute to train models over the internet with distributed data parallel as described here.
- They use a compression algorithm heavily inspired by the DeMo paper (docs).
- LiveDocs Launches Legit Lookin' Logistics: The creator of LiveDocs invites users to document their code with the upgraded service, now with more features, available via signup at www.asvatthi.com.
- Included was an image of the interface, showing off various code documentation pages.
GPU MODE ▷ #🍿 (1 message):
AlphaGeometry, KernelBench, GPU kernel generation
- KernelBench Boasts Bootstrapping GPU Kernels: A member mentioned prior work on using verifiers to bootstrap GPU kernel generation capabilities through test-time compute scaling, referencing experiments in KernelBench.
- The approach isn't quite AlphaGeometry-style but involves a small set of actions to apply angle chasing solvers.
- Geometry and Verifiers Discussed: Discussion involved methods related to alpha-geometry style techniques and verifiers.
- The mentioned method inherently involves a pretty small set of possible actions to apply angle chasing solvers.
GPU MODE ▷ #reasoning-gym (6 messages):
Quasar Alpha, Reasoning Gym Levels, Curricula Tasks
- Quasar Alpha: Open Router Test Model: A user shared the performance of Quasar Alpha, the open router test model, with an attached image.
- Another user asked for the raw outputs to potentially add in a PR to reasoning-gym-eval.
- Reasoning Gym Task Levels Need Definition: A user mentioned they are defining the levels for 15 tasks in reasoning-gym that currently lack a defined level, planning to finish by the evening.
- The user inquired about submitting a PR to make these definitions available on the main branch and it was approved.
- Reasoning Gym Curricula Task PRs: A user asked if adding curricula to tasks without them was appropriate for a PR to reasoning-gym-eval.
- It was encouraged to create a PR for this purpose.
GPU MODE ▷ #gpu模式 (3 messages):
DeepSeek Communication Library, NVSHMEM and UVA, Intra-node communication
- DeepSeek Leverages NVSHMEM: A member inquired whether the DeepSeek communication library is built off NVSHMEM library from NVDA.
- NVSHMEMâs use of UVA Questioned: A member questioned whether NVSHMEM uses Unified Virtual Addressing (UVA) for intra-node communication.
- Peer-to-Peer Loads/Stores via NVLink: The member added that using UVA, one can perform peer-to-peer loads/stores to data stored in a remote GPU (connected by something like NVLink).
GPU MODE ▷ #general (11 messages🔥):
Submitting .py files with inline CUDA, CUDA Kernels, Grayscale CUDA Example, torch::extension
- Inline CUDA Submission Trouble: A user reported trouble submitting .py files with inline CUDA, questioning the validity of the reference script.
- Admins acknowledged the issue and requested a link to the failing job to assist with debugging. Another user suggested the sample submission might be incorrect and that other inline CUDA implementations may work.
- CUDA Inline Script Solution: A user requested a sample script for inline CUDA submissions, and another user provided a code template using C++ and CUDA.
- The code template included CUDA sources (a `grayscale_kernel` function), C++ sources (including `<torch/extension.h>`), and a Python module loaded via `load_inline`.
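A minimal sketch of such a template; the kernel and function names follow the grayscale example, but the details are illustrative rather than the exact reference script:

```python
# Inline-CUDA template: define the kernel and a C++ wrapper as strings, then
# JIT-compile and bind them into a Python module with load_inline.
import torch
from torch.utils.cpp_extension import load_inline

cuda_src = r"""
#include <torch/extension.h>

__global__ void grayscale_kernel(const float* rgb, float* gray, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Standard luminance weights over interleaved RGB triples.
        gray[i] = 0.299f * rgb[3 * i] + 0.587f * rgb[3 * i + 1]
                + 0.114f * rgb[3 * i + 2];
    }
}

torch::Tensor grayscale(torch::Tensor rgb) {
    auto rgb_c = rgb.contiguous();
    int n = rgb_c.numel() / 3;
    auto gray = torch::empty({n}, rgb_c.options());
    int threads = 256;
    grayscale_kernel<<<(n + threads - 1) / threads, threads>>>(
        rgb_c.data_ptr<float>(), gray.data_ptr<float>(), n);
    return gray;
}
"""

cpp_src = "torch::Tensor grayscale(torch::Tensor rgb);"

module = load_inline(
    name="grayscale_inline",
    cpp_sources=cpp_src,
    cuda_sources=cuda_src,
    functions=["grayscale"],
)

rgb = torch.rand(1024, 3, device="cuda").flatten()  # interleaved RGB
gray = module.grayscale(rgb)
```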
GPU MODE ▷ #submissions (17 messages🔥):
vectoradd benchmarks, grayscale benchmarks, Modal runners
- VectorAdd Benchmarks Galore: Multiple benchmark submissions for vectoradd on L4 GPUs using Modal runners have succeeded with IDs ranging from 3500 to 3532.
- Grayscale Leaderboard Gains Traction: Leaderboard submissions for grayscale on L4, T4, A100, and H100 GPUs using Modal runners were successful, including IDs 3503, 3536, 3539, and 3540.
- Modal Runners Deliver Results: Submissions to the vectoradd leaderboard on T4 and A100 GPUs, IDs 3537 and 3538 respectively, succeeded using Modal runners.
GPU MODE ▷ #feature-requests-and-bugs (5 messages):
Leaderboard discrepancy, CUDA submission failure
- Leaderboard Time Units Cause Confusion: A user noticed a discrepancy in time units between the web (https://gpu-mode.github.io/discord-cluster-manager/) and Discord leaderboards, with the former displaying nanos and the latter millis.
- A new leaderboard website is being prepared, with time units converted for clarity.
- CUDA Submission Stumbles: A user reported that the sample CUDA submission from (https://github.com/gpu-mode/reference-kernels/blob/main/problems/pmpp/vectoradd_py/solutions/correct/submission_cuda_inline.py) failed to run as a test.
- This was deemed unexpected, and the user was asked to provide the specific error message.
GPU MODE ▷ #hardware (3 messages):
A100 vs L40, FP8 support, 4bit weights, Open source w4a8 kernels, GPU Fryer tool
- A100 Dominates L40 in Bandwidth and Tensor Ops: The A100 is reportedly nearly twice as fast as the L40 in both DRAM bandwidth and tensor operations.
- Despite the L40 having FP8 support and a larger L2 cache, vLLM doesn't include optimized kernels for Lovelace in its normal distribution.
- Limited Benefits of 8-bit Floating Point with 4-bit Weights: With 4-bit weights, the benefits of 8-bit floating point support in Hopper/Lovelace are limited.
- There are currently no open-source w4a8 kernels available.
- GPU Fryer Tool for Problem Hunting: Running the GPU Fryer tool can help in identifying problems.
- This tool, maintained by Hugging Face, is useful for stress-testing and debugging GPU configurations.
HuggingFace ▷ #general (52 messages🔥):
FP4 Fine-tuning, Parasail Inference Provider, Llama.cpp Llama 4 Support, Mobile SQL Generation Models, Multi-Agent AI Deployment
- FP4 Fine-Tuning Frenzy Fuels Faster Finishes: Users are exploring fine-tuning quantized models using FP4 with tools like Unsloth, which allows loading lower precision models for training and quantization.
- While fine-tuning a quantized model is possible via LoRA (the adapters train in higher precision while the quantized base stays frozen), directly fine-tuning the quantized weights themselves is not; a minimal QLoRA-style sketch appears at the end of this section.
- Parasail aims to Provide Premier Performance: Parasail, a new inference provider, is looking to partner with Hugging Face after recently coming out of stealth, already serving 3B tokens a day on OpenRouter and 5B+ a day for private companies, as reported by The Next Platform.
- Llama.cpp Leaps to Llama 4: The backend Llama.cpp has been updated to support Llama 4, according to the GitHub releases.
- Tiny Transformers Touted for Telephones: For generating SQL queries from data descriptions on mobile devices, Qwen 2.5 0.5B and models from the SmollM2 Intermediate Checkpoints collection and the TinyLlama collection are recommended.
- Converting models to TensorRT format via ONNX is suggested for leveraging older architectures.
- Orchestration Options Open Opportunities: Oblix (https://oblix.ai/), a new tool for orchestrating AI between edge and cloud, integrates with Ollama on the edge and supports OpenAI and ClaudeAI in the cloud, aiming to create low-latency, privacy-conscious workflows.
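Relatedly, a minimal sketch of the LoRA-over-quantized-weights pattern mentioned in the FP4 discussion above, using bitsandbytes 4-bit loading with PEFT; the model name and target modules are illustrative assumptions:

```python
# Minimal QLoRA-style sketch: load a 4-bit FP4-quantized base model, then
# train LoRA adapters on top; the quantized base weights stay frozen.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",              # FP4 as discussed (NF4 also common)
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute runs in higher precision
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # illustrative model choice
    quantization_config=bnb_config,
)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # which projections get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```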
HuggingFace ▷ #today-im-learning (3 messages):
Ollama local deployment, NLP in HuggingFace
- Newbie Uses Ollama for Local Deployment: A member is starting to learn using Ollama for local deployment with Python and OpenAI.
- They are using Ollama local deployment to avoid paying for OpenAI's API keys.
- Newbie learning NLP in HuggingFace: A member mentioned they are learning about NLP in the HuggingFace page.
- They are hoping to finish the course by the deadline.
HuggingFace ▷ #cool-finds (1 messages):
Daily Papers Podcast, Takara TLDR
- Takara TLDR inspires a Daily Papers Podcast!: A user remixed the Takara TLDR concept into a daily papers podcast, which seems to be hosted on the HuggingFace platform.
- This could be a valuable resource for staying up-to-date with the latest AI research.
HuggingFace ▷ #i-made-this (3 messages):
AI Runner, GAPRS
- AI Runner desktop GUI takes Flight!: A member released AI Runner, a desktop GUI for running AI models locally using HuggingFace libraries as described in this YouTube video.
- The tool enables users to create and manage chatbots with custom voices, personalities, and moods; the bots are agents built with llama-index, using ReAct tools to generate images with Stable Diffusion and to hold real-time voice conversations (espeak, speecht5, or openvoice).
- GAPRS 3.0 sets Sail!: A member launched the 3rd iteration of their master's thesis project, a web application called GAPRS (graph-based academic recommender system) at lqhvwseh.manus.space.
- The goal of GAPRS is to help students know where to start when writing their theses, streamline the academic research process, and revolutionize monetization of academic papers; more details are available in the member's master's thesis.
HuggingFace ▷ #computer-vision (3 messages):
Monocular Depth Models, Segmentation Problem, Tools Recognition Task
- Monocular Depth Models Explored: A member inquired if another member had found a solution to a problem, noting they had already tried monocular depth models.
- Segmentation Solution Proposed: A member suggested a solution to a segmentation problem involving vertical poles, proposing to check the overlap of the x-coordinates of the bounding boxes of different segments with the same label.
- The user said to take `min(x_lefts) -> max(x_rights)`; they also suggested trimming thick boxes by using `(x_mid +/- 0.5 * width_pole)`. A small sketch of this overlap-merge heuristic follows this section.
- Tools Recognition Task: A member asked for suggestions on the best model/algorithm for a tool recognition task, specifying that the model should identify tools by providing a reference picture.
- The member asked if the model should be enhanced for better feature extraction.
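A toy sketch of the suggested overlap-merge heuristic for the pole segments; the `(x_left, x_right)` box format is an assumption for illustration:

```python
# Sketch: group same-label detections whose x-ranges overlap, then merge each
# group to span min(x_lefts)..max(x_rights), as suggested above.
def merge_pole_boxes(boxes: list[tuple[float, float]]) -> list[tuple[float, float]]:
    merged: list[tuple[float, float]] = []
    for left, right in sorted(boxes):
        if merged and left <= merged[-1][1]:  # x-ranges overlap: same pole
            prev_left, prev_right = merged[-1]
            merged[-1] = (prev_left, max(prev_right, right))
        else:
            merged.append((left, right))
    return merged

# Example: three fragments of one pole plus a separate pole.
print(merge_pole_boxes([(10, 14), (12, 15), (13, 16), (40, 44)]))
# -> [(10, 16), (40, 44)]
```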
HuggingFace ▷ #smol-course (4 messages):
Dataset forms, Unit 1 Quiz failing to load, Agents Build Errors, Chat templating exercises
- Dataset forms cause confusion: A member pointed out that someone was doing the same thing to both datasets, but they start in different forms.
- Another member requested more details, asking "what do you mean by forms?"
- Unit 1 Quiz fails to load, redirects too many times: A member reported the Unit 1 quiz fails to load and gets stuck in a redirect loop: agents-course-unit-1-quiz.hf.space redirected you too many times.
- They mentioned they're new to coding and unsure how to resolve the issue, seeking support.
- Agents Get Build Error: A member reported experiencing a Build error when trying to fetch the error logs, getting stuck in a loop.
- Initially, they couldn't get any response when chatting with the agent, and they were having issues even with copying and pasting the name of the tool.
- Someone Needs Chat Template Exercise Buddy: A member is seeking someone to discuss the chat templating exercises with them.
- No other details were provided, they were just looking for a study buddy.
HuggingFace ▷ #agents-course (26 messages🔥):
Code Agents Ch. 2 Notebook Issues, Gemini Models as Alternatives, Course FAQ Request, any-agent library release, RAG with smart glasses challenge
- Code Agents Ch. 2 Notebook Requires Payment: A member reported needing to pay to run the Chapter 2 notebook for Code Agents, receiving errors about invalid credentials and payment requirements for the recommended model.
- They sought advice on logging in correctly or using alternative tokens to run the supposedly free course examples.
- Gemini Models Recommended to bypass paywalls: A member suggested using Gemini models as a free alternative in many countries, linking to course notes with instructions.
- Other members highlighted resources like Ollama and other providers (OpenAI, Grok) offering generous free tokens, in order to bypass the Hugging Face paywalls.
- FAQ Section Needed in Agent Course: Multiple members requested an FAQ section within the Agent Course itself, as many users face the same initial issues and find the Discord navigation difficult.
- The discussion clarified that while there isn't an official FAQ page, the Discord channel contains numerous frequently asked questions that can be searched.
- `any-agent` Library Simplifies Agent Framework Evaluation: The Mozilla AI team released `any-agent`, a library designed to simplify trying different agent frameworks.
- The library supports frameworks like smolagents, OpenAI, Langchain, and Llama Index, with a GitHub repository available for users to try and contribute.
- Meta CRAG Multi-Modal Challenge Release: The community shared an interesting challenge from Meta: the CRAG Multi-Modal Challenge 2025 related to RAG with smart glasses.
- This is recommended as a knowledge exercise to solidify what was learned in the course.
HuggingFace ▷ #open-r1 (13 messages🔥):
Deepseek R1, Active AI Discord Chats
- Deepseek R1 Chatroom Gossip Starts: A member asked which Deepseek versions another member had been working with.
- That other member quipped back that they aren't working at all, but assume this room is Deepseek R1 related, and so the AI is working for them.
- Chatter Seeks Active AI Communities: A member inquired about active chats on AI in Discord, or even better, active voice chats.
MCP (Glama) ▷ #general (75 messages🔥🔥):
Semgrep MCP server, MCP HTTP Streaming, MCP and CORS errors, MCP Github server issues, MCP for Graph API application
- Semgrep's MCP Server Makes Waves: A member has been running mcp.semgrep.ai/sse for over a month, hosted via Docker and AWS EC2 (a minimal client sketch for such a server follows this section).
- CORS Error Squashed in Semgrep MCP Server: A member reported a CORS error when connecting with the Cloudflare Playground, which was quickly fixed.
- The reporter noted the tool was testing with Cursor and will need to fix CORS there as well.
- MCP HTTP Request-Response Support Arrives: A discussion emerged regarding the need for HTTP request-response support in MCP for enterprise customers, as highlighted in this pull request.
- Members pointed out that many enterprise organizations are using MCP, and this feature is expected to further increase its adoption.
- MCP Powers Up RAG with Graph DB: A member inquired about using MCP in a RAG use case with a Neo4j graph database, focusing on vector search and custom CQL search.
- Another member confirmed this is a good use case, linking to mcpomni-connect as a viable MCP client.
- Cloudflare Provides Remote MCP Server Tutorial: For those seeking an easier tutorial to get started with remote MCP servers, a member recommends the Cloudflare Agents guide.
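For anyone who wants to poke at an SSE-hosted MCP server like the one above, a minimal client sketch using the official `mcp` Python SDK might look as follows; the session flow mirrors the SDK's documented usage, but treat the details as assumptions to verify:

```python
# Minimal sketch: connect to an SSE-hosted MCP server and list its tools,
# using the official `mcp` Python SDK's SSE client.
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    async with sse_client("https://mcp.semgrep.ai/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()          # MCP handshake
            tools = await session.list_tools()  # enumerate server tools
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```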
MCP (Glama) ▷ #showcase (15 messages🔥):
Semgrep rewrites MCP, C# MCP SDK, ASGI style in process fastmcp sessions
- Semgrep Rewrites MCP Server: A member rewrote Semgrep's MCP server and shared demo videos in Cursor and Claude.
- The hosted server uses SSE, not HTTP streaming, because the Python SDK doesn't support it yet.
- MCP SDK Leverages Sqlite for LLM Memories: A member played with the C# MCP SDK to leverage sqlite for LLM memories.
- A new version is available that gives memories an importance ranking to help with search results, and is intended to scale to larger memory graphs.
- ASGI Style Fastmcp Sessions Finalized: A member bumped versions on easymcp to 0.4.0, with notable changes including ASGI style in process fastmcp sessions.
- Other updates included a finalized native docker transport, refactored protocol implementation, a new mkdocs, and a full proper pytest setup.
- Terminal Chat with MCP Servers: A member created a terminal chat with MCP servers.
Latent Space ▷ #ai-general-chat (62 messages🔥🔥):
Shopify AI Mandate, Anthropic API Credits, API Latency Benchmarking, Cybercriminals and AI, LLM Automated Exploitation
- Shopify's AI Quest Gets Noticed: Shopify's AI mandate is gaining traction, as highlighted in this tweet.
- Anthropic's API Credits have Expiration Dates: Anthropic API credits expire after one year, potentially to simplify accounting and to account for the rapidly evolving AI landscape.
- As one member suggested, this policy helps manage projections in a quickly changing field.
- NVIDIA's Reasoning Model has On/Off Switch: NVIDIA has released a new model with the ability to turn reasoning on or off, detailed in this blog post and available on Hugging Face.
- Cybercrime's AI Shock May Be Delayed: Despite basic AI applications like FraudGPT, mass adoption of AI by cybercriminals is surprisingly slow, with speculation that a "cybercrime AI shock" may occur when they adopt it more broadly.
- One member noted that LLMs may have only recently become good enough for use in cybercrime.
- Gemini Plays Pokemon and Streams the Madness: The Gemini AI is now playing Pokémon, garnering attention as shown in this tweet.
Yannick Kilcher ▷ #general (14 messages🔥):
Llama 4 flops on benchmarks, Bayesian Structural EM, Procedural model representation DNA, Meta should have clarified, Disrupt Science Hackathon Details
- Llama 4 flops on Nongamed Benchmarks: A member claimed that Llama 4 flops hard on non-gamed, non-overfitted benchmarks.
- The Daily Paper Discussion room will be discussing this paper; a recent talk by the main author (YouTube link) also covers it.
- Meta should clarify Llama 4: It was stated that Meta should have made it clearer that "Llama-4-Maverick-03-26-Experimental" was a customized model to optimize for human preference.
- This discussion was based on this fxtwitter link.
- Bayesian Inference Insights: A member pointed out that Bayesian inference has been combining weights and architecture for about 100 years, and cited Bayesian Structural EM as an advanced example.
- Procedural Model Representation: DNA of a Model: A member introduced the concept of procedural model representation, where a small seed can generate a large model (architecture + weights).
- They envisioned downloading a 10MB model that generates a 100TB model, or swapping seeds to generate different models, akin to downloading DNA to generate a human.
- Disrupt Science Hackathon Details Posted: Details for the Disrupt Science Hackathon have been posted.
- The details can be found in this discord link.
Yannick Kilcher ▷ #paper-discussion (12 messages🔥):
Fast.ai Diffusion Methods, F_A_E_S_I_k=2 Discussion, Open Source beautiful.ai Alternatives
- Fast.ai Explores Diffusion Methods: A member shared a link to Fast.ai's diffusion methods course.
- Another member inquired about the timing of the second part of the course.
- Decoding F_A_E_S_I_k=2 Revelations: A member joked about having `F_A_E_S_I_k=2`, leading to getting 40 hours of video, in relation to a paper discussion on arxiv.org/abs/2408.04220.
- They speculated the method might have this built in by construction, but is probably not good at needles in a haystack, especially when the needles have dependencies on one another.
- Quest for Beautiful.ai Open Source Twins: A member asked about open-source alternatives to Beautiful.ai.
Yannick Kilcher ▷ #agents (1 messages):
Efficient Tool Calling Templates, Cogito 14b
- Cogito 14b's Efficient Tool Template: A user reported that the 14b model unexpectedly adopted a more efficient tool calling template than was initially provided in the instructions.
- This suggests the model may have autonomously optimized its tool use, offering a potential area for further investigation; it's recommended to check out the Cogito model for examples and inspiration.
Yannick Kilcher ▷ #ml-news (9 messages🔥):
Adapting Pre-training Text, Diffusion Modeling to Control LLMs, Llama 4 Release Issues, Iterative Improvement Strategy
- Pre-Training Adaptation Talk: A member shared an awesome talk about adapting pre-training text to include database lookups for relevant facts, to train the LLM to look things up during generation.
- Diffusion Models Guide LLMs: A member mentioned using diffusion modeling to control LLMs and pointed to this paper as a relevant resource.
- Llama 4's Flop Explained: The poor release of Llama 4 was attributed to bad implementations.
- DeepCogito Iterative Improvement Strategy Preview: A member shared a link from Hacker News about an iterative improvement strategy using test time compute for fine-tuning, from DeepCogito.
Nomic.ai (GPT4All) ▷ #general (28 messages🔥):
IBM Granite 8B, RAG references, docling OCR, semantic chunking server, ComfyUI image generation
- Granite 8B Shines for RAG tasks: A member reported that IBM Granite 8B works well with RAG, especially for having the LLM provide references.
- Another member concurred, having also found Granite to be effective.
- Docling for Non-Text PDF OCR: A member recommended docling for excellent image OCR, especially for non-text PDFs like scans.
- They highlighted its continuous operation for embeddings and integration into a database with indexed documents, enabling RAG through intersections.
- Semantic Chunking for Contextual Text: A member shared a semantic chunking server, demonstrating its use with clipboard examples (a toy sketch of the underlying idea follows this section).
- They noted its compatibility with audio and image processing, suggesting ComfyUI for combining all modalities.
- Llama 4 Gets Trashed for Being Terrible: A member trashed the fourth-generation Llama models as terrible compared to smaller models.
- Others agreed, noting Reddit comments speculated that it may have overfit on smaller "high quality" datasets, despite some benchmarks showing promise.
- GPT4All: Keep it Local to Be Safe: A member advised using GPT4All primarily for local operations to ensure privacy and avoid sending private information to remote APIs.
- They detailed how to run embedding models locally and index files by chunking and embedding, referencing a shell script example.
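For intuition, here is a toy sketch of the core idea behind semantic chunking, splitting where adjacent-sentence embedding similarity drops; the `sentence-transformers` model choice is an arbitrary assumption, not what the shared server uses:

```python
# Sketch of semantic chunking: split text where the embedding similarity of
# adjacent sentences drops below a threshold.
import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_chunks(sentences: list[str], threshold: float = 0.6) -> list[str]:
    if not sentences:
        return []
    model = SentenceTransformer("all-MiniLM-L6-v2")  # arbitrary embedding model
    emb = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        # Cosine similarity of consecutive sentences (embeddings normalized).
        sim = float(np.dot(emb[i - 1], emb[i]))
        if sim < threshold:          # topic shift: start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```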
Modular (Mojo 🔥) ▷ #general (4 messages):
MLX vs MAX, Apple Silicon GPU limitations, MAX capabilities
- MLX vs MAX: A Comparative Dive: A member compared MLX (an array programming framework akin to JAX) and MAX, noting that while MLX is tailored for Apple Silicon GPUs, MAX currently cannot target them, which poses challenges for direct comparison.
- The member highlighted that MAX for AMD GPUs will eventually mirror MLX's shared memory benefits on MI300A and AMD's consumer CPUs, suggesting a future convergence in capabilities.
- Apple Siliconâs Deployment Drawbacks: The member cautioned against relying solely on MLX for extensive projects, citing the difficulty of deploying Apple Silicon in server environments, necessitating potential rewrites to frameworks like MAX, JAX, or PyTorch for deployment.
- They emphasized that while MLX might offer convenience for initial experimentation, the practical limitations of Apple's ecosystem in server settings should be a key consideration.
- MAXâs Manual Vectorization and Multi-Device Support: The member detailed that MAX, despite its lower-level API, offers capabilities comparable to NumPy and leverages Mojo for both automatic and manual vectorization, making it programmer-friendly.
- They admitted MAX's limitations in autodiff but highlighted its multi-device support, exemplified by the Llama pipeline, and its avoidance of tensor shape issues, positioning it as a robust alternative despite certain challenges.
- Discord Self-Promotion Rules Violated: A member pointed out that a specific post seemed inappropriate for the Discord channel, suggesting it might be a violation of self-promotion rules.
- A moderator agreed, confirming that the post indeed violated the communityâs self-promotion guidelines.
Modular (Mojo 🔥) ▷ #mojo (16 messages🔥):
Mojo vs Rust, __moveinit__ and __copyinit__ in Mojo, Returning values in Mojo, Span lifetime in Mojo
- Mojo's Borrowing vs. Rust's: A Fresh Look: A newcomer to Mojo shared a blog post comparing Mojo and Rust, noting that Mojo's "borrow by default" felt more intuitive.
- The member then wondered about how Mojo handles returning values from functions.
- Moveinit vs Copyinit: Deep Dive into Mojo Object Returns: A member clarified that when returning objects in Mojo, the presence of `__moveinit__` dictates whether the object is moved; otherwise `__copyinit__` is used. They provided an example on Github.
- The member also pointed to the official Mojo documentation for a complete picture.
- Unlock Span Lifetimes in Mojo with rebinding!: A member inquired how to specify in Mojo that "the lifetime of the return value is at least the lifetime of self", specifically for a `Span`.
- Another member suggested using `rebind[Span[UInt8, __origin_of(self)]](Span(self.seq))` or making the trait generic over origin, but noted that trait parameters are not yet supported.
tinygrad (George Hotz) ▷ #general (5 messages):
Tensor Naming, GPU Programming, Compiler Development, Tinygrad Contribution Resources, PMPP 4th ed
- Elegant Tensor Naming Tricks Sought: A member inquired about a more elegant way to name a tensor for easier tracking when printing model parameters, noting they currently add a name attribute in the Tensor class manually (a sketch using tinygrad's own state utilities follows this section).
- GPU Programming and Compiler Dev Resources Requested: A member expressed interest in getting into GPU programming and compiler development for projects like tinygrad and requested learning resources or blog posts.
- They are planning to read tinygrad-notes and asked for book or blog post recommendations on compiler development for GPUs.
- geohotarchive YouTube Channel recommended: A member suggested the geohotarchive YouTube channel as a resource for learning about tinygrad.
- "PMPP" 4th Edition Recommended for GPU Programming: A member recommended PMPP (4th ed) for GPU programming, and asked others to share any excellent compiler resources they find.
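On the naming question, one option that avoids patching `Tensor` is tinygrad's own `get_state_dict`, which derives dotted names from model attributes; a small sketch (the model here is illustrative):

```python
# Sketch: tinygrad derives parameter names by walking model attributes via
# get_state_dict, so tensors can be printed with names without modifying Tensor.
from tinygrad import Tensor
from tinygrad.nn import Linear
from tinygrad.nn.state import get_state_dict

class TinyNet:
    def __init__(self):
        self.l1 = Linear(4, 8)
        self.l2 = Linear(8, 2)

    def __call__(self, x: Tensor) -> Tensor:
        return self.l2(self.l1(x).relu())

for name, tensor in get_state_dict(TinyNet()).items():
    print(name, tensor.shape)  # e.g. "l1.weight (8, 4)"
```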
tinygrad (George Hotz) ▷ #learn-tinygrad (12 messages🔥):
METAL sync issue, AMD performance with BEAM=2, ContextVar type, LLaMA sharding issue, Device info loss after sampling
- METAL Sync Glitch Causes Sharding Shenanigans: A member found unexpected behavior in sharding while reproducing a minimal example of a METAL sync issue from the bounty.
- The user suspected that the COPY from METAL:1 to CPU was executing before the XFER from METAL to METAL:1 ended, causing the CPU to read zeros instead of the correct shard.
- AMD BEAM=2 Turbocharges Tinygrad: One user reported impressive speed improvements using AMD with BEAM=2, achieving 64 it/s, outperforming their previous best with Torch at 55+ it/s.
- Members noted that BEAM=2 often beats torch.
- LLaMA Sharding Snafu: Device Info Lost in Translation: A user encountered an AssertionError while running llama.py with `--shard 4`, indicating that the device info was lost after sampling.
- A potential fix was proposed to move the tensor, as seen on GitHub, but it's not directly related to METAL or sync issues.
LlamaIndex ▷ #blog (2 messages):
RAG workflow tutorial, Auth0 Auth for GenAI with LlamaIndex
- RAG workflows using Llama 4: A quickstart tutorial demonstrates building a RAG workflow from scratch using Llama 4, showcasing how to set up core steps around ingestion, retrieval, and generation using LlamaIndex workflows, as shown in this tweet (a compact sketch follows this section).
- Auth0 Auth for GenAI ships with LlamaIndex support: Auth0's Auth for GenAI now ships with native LlamaIndex support, making it easier to build auth into agent workflows, as announced in this tweet.
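For flavor, a compact sketch of the shape of such a workflow, with ingestion, retrieval, and generation collapsed into a single step for brevity; the data directory, the `query` event field, and the configured LLM/embedding models are assumptions, not the tutorial's exact code:

```python
# Compact sketch of a RAG workflow with LlamaIndex workflows.
# Assumes a configured LLM/embedding model and a ./data directory of documents.
import asyncio
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class RAGWorkflow(Workflow):
    @step
    async def answer(self, ev: StartEvent) -> StopEvent:
        docs = SimpleDirectoryReader("./data").load_data()  # ingestion
        index = VectorStoreIndex.from_documents(docs)       # embed + index
        # Retrieval + generation via the query engine.
        response = await index.as_query_engine().aquery(ev.query)
        return StopEvent(result=str(response))

async def main() -> None:
    result = await RAGWorkflow().run(query="Summarize the documents.")
    print(result)

asyncio.run(main())
```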
LlamaIndex ▷ #general (13 messages🔥):
Gemini 2.5 Pro, Google's latest unified SDK, StructuredPlannerAgent Docs, Agent Planning Tool
- Gemini 2.5 Pro Not Available: A member inquired if Gemini 2.5 Pro was available, but discovered a deprecation message suggesting Google's latest unified SDK be used instead for Gemini 2.5 Pro, as noted in the LlamaIndex Documentation.
- Google SDK model names aren't validated: A member noted that the Google SDK doesn't validate model names but assumes the provided name is valid, and also suggested manually setting the `context_window` value since Gemini 2.5's context window is quite large.
- `StructuredPlannerAgent` Docs Removed: The documentation for `StructuredPlannerAgent` was removed because it is no longer maintained, as part of a cleanup of the agent docs that eliminated duplicate implementations.
- A backlink to the old documentation was provided: StructuredPlannerAgent.
- Agent Planning Tool recommended: Instead of `StructuredPlannerAgent`, it was suggested to use an agent with a planning tool that does some Chain of Thought (CoT) reasoning, or to use the LLM itself to create a plan before invoking agent(s).
Cohere ▷ #general (8 messages🔥):
Events Recording Availability, Structured Output Examples, Pydantic Schema Integration, API Requests without Cohere Package, Model Recommendation for Company List Generation
- Events Recordings: Are They Available?: A member inquired whether recordings of events are available for those unable to attend in real-time, as some events sounded interesting.
- No response was given.
- Members Seek Structured Output Examples: A new member asked for examples on how to get structured output (e.g., a list of books) using Cohere, expressing their lack of experience in the field.
- The member was directed to the Cohere documentation as a starting point.
- Pydantic Schema with Cohere: A member sought ways to use Pydantic schemas directly in `response_format` with Cohere, and asked how to send requests without the Cohere Python package, aiming to avoid introducing a dependency.
- They were provided with a link to the Cohere Chat API reference and shown how to use cURL for requests to `https://api.cohere.com/v2/chat`; a Python sketch along the same lines follows this section.
- OpenAI SDK example with Response Format: A member found the cURL example useful and noted its presence in the OpenAI SDK example with the `response_format` parameter.
- The member then asked for a recommendation on the most suitable model for generating a list of companies on a specific topic.
Cohere ▷ #api-discussions (1 messages):
Vector Databases, Model Compatibility, Explicit Recommendations
- Vector DB Recommendations Historically Avoided: Historically, explicit recommendations for vector DBs have been avoided because "our models work well with all of them."
- The models are designed for broad compatibility, performing well across vector database solutions without specific optimizations for, or favoritism towards, any particular one.
Cohere ▷ #bot-cmd (1 messages):
competent: Currently not working!
Cohere ▷ #introductions (2 messages):
Introduction to Aditya, Machine vision and control, Innovation accelerator, Openchain.earth project, Tools used by Aditya
- Aditya Joins Cohereâs Community: Aditya introduced themself as having a background in machine vision and control for manufacturing equipment (Semi/Electronics).
- Currently, they are taking a sabbatical from an innovation accelerator/matchmaking/assessment role to explore web/AI, with a project on openchain.earth.
- Aditya's Tech Stack Revealed: Aditya uses VS Code, GitHub Copilot, Flutter, MongoDB, JS, and Python (evaluating) in their projects.
- They are here to find out more about Cohere's AI and how it can be used in their project.
Cohere ▷ #status-updates (1 messages):
competent: Should work!
Torchtune ▷ #general (1 messages):
Contributor Tag Request, Discord Roles
- Contributor Tag Incoming: A member requested a Contributor tag on Discord, linking their GitHub profile for verification.
- They also made a lighthearted mention of their Discord profile picture featuring the character Gus from Psych.
Torchtune ▷ #dev (6 messages):
DeepSpeed Integration, FSDP vs DeepSpeed, FSDP Sharding, ZeRO 1-3 training
- DeepSpeed Integration Debated for TorchTune: A member inquired about integrating DeepSpeed as a backend into TorchTune and created an issue to discuss the possibility.
- A maintainer asked for more context, noting that FSDP supports all the sharding options from DeepSpeed; potential reasons for DeepSpeed integration include a fallback in case of FSDP bugs, diverging hardware/accelerator support, and speed.
- FSDP Favored Over DeepSpeed in TorchTune: TorchTune leans towards FSDP due to its better composition with other PyTorch distributed features, with the belief that supporting both versions well is not feasible.
- Users who migrated to TorchTune to avoid the complexities of composing DeepSpeed, PyTorch, and Megatron prefer sticking to native PyTorch, so there is no need to over-index on integrating and supporting other frameworks.
- Community Recipe Idea: DeepSpeed with TorchTune: A maintainer suggested creating a community recipe that imports TorchTune and hosts a DeepSpeed recipe, offering to feature it if a repo is made.
- This allows users interested in DeepSpeed to leverage it with TorchTune while keeping the core framework focused on native PyTorch.
- Tweaking FSDPModule for ZeRO-1/2 Training: Since TorchTune defaults to the equivalent of ZeRO-3, documentation or more recipes on how to tweak recipes using the FSDPModule methods for ZeRO-1/2 training would be appreciated (a small sketch follows this section).
- It's believed that ZeRO 1-3 are all possible with very minor tweaks to the collectives.
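For illustration only, a sketch of the kind of tweak in question using PyTorch's FSDP2 `fully_shard` API, where `reshard_after_forward=False` yields ZeRO-2-style behavior instead of the ZeRO-3-style default; this is an assumed stand-alone example, not a TorchTune recipe, and the import path varies across PyTorch versions:

```python
# Sketch: ZeRO-2-ish sharding with FSDP2 by keeping parameters unsharded after
# forward (gradients and optimizer state remain sharded via reduce-scatter).
# Run under torchrun with an initialized process group; older PyTorch exposes
# fully_shard under torch.distributed._composable.fsdp instead.
import torch.nn as nn
from torch.distributed.fsdp import fully_shard

model = nn.TransformerEncoder(  # stand-in model, not a torchtune recipe
    nn.TransformerEncoderLayer(d_model=256, nhead=8), num_layers=4
)
for layer in model.layers:
    fully_shard(layer, reshard_after_forward=False)  # True is ZeRO-3-like
fully_shard(model, reshard_after_forward=False)
```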
DSPy ▷ #show-and-tell (1 messages):
MIPRO, Automated Prompt Engineering, Task Complexity Scaling
- MIPRO Algorithm Tested on Scaling Complex Tasks: An article tested the MIPRO automated prompt engineering algorithm across tasks of varied complexity, from named entity recognition to text-based game navigation.
- The study leveraged tasks like CoNLL++, HoVer, BabyAI, and τ-bench (customer support with agentic tool use).
- Model Size Matters for MIPRO Optimization: The study found that larger models benefit more from MIPRO optimization in complex settings, potentially because they handle longer multi-turn demonstrations more effectively.
- The quality of feedback significantly impacts the MIPRO optimization process, with meaningful improvements seen even from noisy AI-generated feedback; a toy MIPRO usage sketch follows this section.
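As a reference for readers new to MIPRO, here is a toy sketch of running it in DSPy (`MIPROv2` in recent releases); the signature, metric, model name, and tiny trainset are illustrative stand-ins, not the article's setup, and argument names should be checked against your DSPy version:

```python
# Toy sketch: optimize a small DSPy program's prompts with MIPRO.
import dspy
from dspy.teleprompt import MIPROv2

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported model

class ExtractEntities(dspy.Signature):
    """Extract the named entities mentioned in the text."""
    text: str = dspy.InputField()
    entities: list[str] = dspy.OutputField()

program = dspy.Predict(ExtractEntities)

def metric(example, prediction, trace=None) -> float:
    # Noisy-but-useful feedback: fraction of gold entities recovered.
    gold, pred = set(example.entities), set(prediction.entities)
    return len(gold & pred) / max(len(gold), 1)

trainset = [  # toy-sized; real runs want far more examples
    dspy.Example(
        text="Ada Lovelace met Charles Babbage in London.",
        entities=["Ada Lovelace", "Charles Babbage", "London"],
    ).with_inputs("text"),
]

optimizer = MIPROv2(metric=metric, auto="light")
optimized = optimizer.compile(program, trainset=trainset)
```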
LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 messages):
Kaiyu Yang, AI4Math, Theorem Proving, Autoformalization
- Kaiyu Yang Lectures on Formal Math Reasoning: Guest speaker Kaiyu Yang presented on "Language models for autoformalization and theorem proving" on a livestream today; link here.
- His lecture covered using LLMs for formal mathematical reasoning, including theorem proving and autoformalization.
- AI4Math Crucial for AI-Driven Systems: AI for Mathematics (AI4Math) is crucial for AI-driven system design and verification, and its techniques mirror those of NLP, especially training LLMs on curated math datasets.
- A complementary approach involves formal mathematical reasoning grounded in systems like Lean, which verify reasoning correctness and provide feedback.
- Meta's Dr. Yang Enhances AI in Math: Dr. Kaiyu Yang, a Research Scientist at Meta FAIR, focuses on enhancing AI's mathematical reasoning by integrating formal systems like Lean.
- His work explores using LLMs for tasks like theorem proving (generating formal proofs) and autoformalization (translating informal to formal).
MLOps @Chipro ▷ #events (1 messages):
Manifold Research Group, Multimodal AI, self-assembling space robotics, robotic metacognition, Community Research Call #4
- Manifold Research Group hosts Research Call #4: Manifold Research Group is hosting Community Research Call #4 this Saturday (4/12 @ 9 AM PST).
- The call will cover their latest work in Multimodal AI, self-assembling space robotics, and robotic metacognition.
- Space Robotics Research Taking Off: A PhD student from Manifold Research Group, specializing in robotic swarms in space, extended an invitation to a research call.
- The call aims to foster collaboration and explore frontier science in space robotics.
Codeium (Windsurf) ▷ #announcements (1 messages):
Codeium rename, Windsurf Reddit, Windsurf Plugins
- Codeium Rebrands as Windsurf: Codeium has officially rebranded to Windsurf, following the launch and incredible adoption of the Windsurf Editor in November 2024.
- The new name better reflects their vision of combining human and machine to create effortlessly powerful experiences, according to their rebrand announcement.
- Windsurf Launches New Subreddit: The company has launched a new subreddit for the community.
- The announcement was made alongside changes to the Discord server, including refreshed pages and renaming of channels.
- Codeium Extensions are now Windsurf Plugins: With the rebrand, Codeium Extensions are now officially Windsurf Plugins.
- The company promised to continue improving the Windsurf Editor, wave by wave, with the same commitment to innovation.
{% else %}
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!
If you enjoyed AInews, please share with a friend! Thanks in advance!
{% endif %}