Agent Harnesses are all you need.
AI News for 5/15/2025-5/16/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (214 channels, and 3819 messages) for you. Estimated reading time saved (at 200wpm): 341 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
DeepMind's new AlphaEvolve, 2025's update of AlphaTensor and FunSearch, is hard to grok, as it summarizes a year of results across a vast swath of math and LLM training applications, AND is not publicly available to try, but GDM succinctly puts it as "a Gemini-powered coding agent for algorithm discovery… able to:
- Design faster matrix multiplication algorithms,
- Find new solutions to open math problems,
- Make data centers, chip design and AI training more efficient across @Google.
It is described as an agent rather than a model due to the multiple components in a loop: an LLM proposing code changes, automated evaluators scoring them, and an evolutionary process iterating on the best candidates (a minimal sketch follows below).
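To make that loop concrete, here is a minimal illustrative sketch of a propose-evaluate-select loop in Python. This is not DeepMind's implementation; `llm_propose_mutation` and `evaluate` are hypothetical stand-ins for a Gemini call and a problem-specific automated scorer:

```python
import random

def evolve(seed_program, llm_propose_mutation, evaluate,
           generations=100, pool_size=20):
    """Toy propose-evaluate-select loop in the spirit of AlphaEvolve.

    llm_propose_mutation(code) -> mutated candidate code (an LLM call)
    evaluate(code)             -> numeric score from an automated, machine-gradable metric
    """
    pool = [(evaluate(seed_program), seed_program)]
    for _ in range(generations):
        # Sample a parent program and ask the LLM for a code-level mutation.
        _, parent = random.choice(pool)
        child = llm_propose_mutation(parent)
        # Only automatically verifiable objectives are used for selection.
        pool.append((evaluate(child), child))
        # Keep the best programs: the search operates on code text, not weights.
        pool = sorted(pool, key=lambda p: p[0], reverse=True)[:pool_size]
    return max(pool, key=lambda p: p[0])
```

This mirrors the contrast Alex Dimakis draws below: the artifacts being optimized are programs (text), not model weights.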
It's very Googley to understate their results, so one has to turn to the Twitterverse to get the highlights, which are much better:
- "Sped up Gemini training with a 23% faster kernel, resulting in a 1% total reduction in training time." - Philipp Schmid
- "surpassing SOTA on 20% of the problems it was applied to is actually nuts" - Scott Swingle
- "The results are impressive: they improve the best known bounds on many problems including the Minimum Overlap Problem by Erdős, matrix multiplication, and the Kissing number in 11 dimensions."
- "The solutions here are pieces of code, and this is a search agent that modifies, evaluates, and optimizes code i.e. pieces of text. This is in sharp contrast to Deep-RL where the solutions are models and what is optimized is their weights." - Alex Dimakis
- On the 32% speedup of FlashAttention CUDA code - Henry
- "AlphaEvolve is deeply disturbing for RL diehards like yours truly. Maybe midtrain + good search is all you need for AI for scientific innovation. And what an alpha move to keep it secret for a year." - Jason Wei
Inquiring minds can watch the MLST interview about it:
AI Twitter Recap
GPT-4.1 and OpenAI Model Releases
- GPT-4.1 Now Available in ChatGPT: @OpenAI announced that GPT-4.1 is available in ChatGPT, highlighting its specialization in coding tasks and instruction following, making it a faster alternative to OpenAI o3 & o4-mini for everyday coding needs, while @kevinweil also noted that this model is available for Plus/Pro/Teams subscribers and soon to Enterprise/Edu. @michpokrass confirmed GPT-4.1 landing in chatgpt today after initially planning on keeping this model api only, whereas @scaling01 said "this is a huge upgrade for ALL ChatGPT free users!", noting GPT-4.1-mini replaces GPT-4o mini and "is honestly much, much better."
- Introducing the Safety Evaluations Hub: @OpenAI introduced the Safety Evaluations Hub, a resource to explore safety results for their models, emphasizing proactive communication about safety.
- Introducing GPT-4.1 mini: @OpenAI announced that they're also introducing GPT-4.1 mini, replacing GPT-4o mini, in ChatGPT for all users.
- Releasing the OpenAI to Z Challenge: @OpenAIDevs announced the OpenAI to Z Challenge, using o3/o4-mini and GPT-4.1 models to discover previously unknown archaeological sites, and @gdb amplified the same announcement.
- Responses API Support Added to Evals API and Dashboard: @OpenAIDevs announced added support for the Responses API in the Evals API and dashboard and provided a handy guide on how to get started, using an example of comparing gpt-4.1-mini with gpt-4o-mini on stored responses @OpenAIDevs.
Google's AlphaEvolve and Gemini
- AlphaEvolve, a Gemini-powered Coding Agent: @GoogleDeepMind introduced AlphaEvolve, a Gemini-powered coding agent for algorithm discovery, which can design faster matrix multiplication algorithms, find new solutions to open math problems, and make data centers, chip design, and AI training more efficient across @Google. @demishassabis congratulated the AlphaEvolve, Gemini and Science teams on their accomplishment. They also detailed how AlphaEvolve works, using LLMs to synthesize information, automated evaluation for measurable problems, and evolution to iteratively improve algorithms. The company has been applying AlphaEvolve to optimize data center scheduling, assist in hardware design, and enhance AI training and inference. AlphaEvolve has also been used to discover new matrix multiplication algorithms, outperforming the previous model AlphaTensor, and find new solutions to open math problems. The company aims to keep developing AlphaEvolve due to its potential impact across different fields.
- Implicit Caching with Gemini: @_philschmid highlighted implicit caching support in Google DeepMind's Gemini, which unlocks up to 75% cost savings when requests hit the cache. This is especially useful when sending requests with a common prefix, such as querying parts of a large PDF.
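In practice, exploiting implicit caching just means structuring calls so the large shared content comes first. A minimal sketch using the `google-genai` Python SDK; the model id, file, and caching thresholds here are assumptions rather than documented guarantees:

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Load the large shared context once; placing it FIRST gives every request
# a common prefix, which is what implicit caching can match and discount.
big_pdf_text = open("report.txt").read()  # e.g. extracted text of a large PDF

for question in ["Summarize section 2.", "List all figures."]:
    resp = client.models.generate_content(
        model="gemini-2.5-pro",              # illustrative model id
        contents=[big_pdf_text, question],   # shared prefix + varying suffix
    )
    print(resp.text)
```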
Open Source Models, Training, and Frameworks
- Nous Decentralized Pretraining Run: @Teknium1 announced that Nous has begun a decentralized pretraining run of a dense Deepseek-like model with 40B parameters, over 20T tokens, with MLA for long context efficiency.
- Hugging Face MCP Course: @reach_vb announced Hugging Face has dropped the MCP Course, covering everything you need to know about Model Context Protocol and how to use it.
- AM-Thinking-v1 Reasoning Model: @omarsar0 noted AM-Thinking-v1 looks like a strong 32B reasoning model that outperforms DeepSeek-R1 and rivals Qwen3-235B-A22B while being built on top of open source. @arankomatsuzaki notes that AM-Thinking-v1 performs on par with Qwen3-235B-A22B and Seed1.5-Thinking while being built entirely from the open-source Qwen2.5-32B base model and publicly available queries.
- Salesforce BLIP3-o Multimodal Models: @_akhaliq and @iScienceLuvr mentioned that Salesforce has released BLIP3-o on Hugging Face. It is a family of fully open unified multimodal models employing a diffusion transformer to generate semantically rich CLIP image features.
Reasoning and Agentic Systems
- LLMs Get Lost in Multi-turn Conversations: @omarsar0 highlighted a paper investigating how LLMs perform in realistic, multi-turn conversational settings where user instructions are often underspecified and clarified over several turns. @omarsar0 notes that all tested LLMs show significantly worse performance in multi-turn, underspecified conversations compared to single-turn, fully-specified instructions: the average performance drop is 39% across six tasks, even for SoTA models. He also listed the main reasons LLMs get "lost", including making premature assumptions, attempting full solutions before having all necessary information, and overly verbose outputs.
- FedRAG Framework: @nerdai introduced FedRAG, an open-source framework for fine-tuning RAG systems across centralized and federated architectures.
- Chain-of-Thought Reasoning: @francoisfleuret claims CoT is a poor man's version of "the real thing", as a process to sample meaningful latents.
- RL for Search-Efficient LLMs: @omarsar0 noted a paper presenting a new post-training RL framework that explicitly trains LLMs to optimize search usage.
- LangChainâs Open Agent Platform (OAP): @LangChainAI introduced the Open Agent Platform, an open-source, no-code agent building platform, which connects to MCP Tools, LangConnect for RAG, and other LangGraph Agents.
- Runway References for Zero-Shot Testing: @c_valenzuelab demonstrated Runway References for zero-shot testing of clothes, locations, and poses.
AI Implementation, Tooling, and Infrastructure
- Hugging Face's Transformers + MLX Integration: @awnihannun expressed the importance of Transformers to the open-source and overall AI ecosystem and looked forward to more and deeper integrations with MLX + Transformers.
- OpenAI's Tech Stack: @nrehiew_ pointed out that OpenAI uses FastAPI to serve ChatGPT, countering complaints about Python and FastAPI's capabilities.
- LangGraph Platform Generally Available: @LangChainAI announced that LangGraph Platform is now generally available, allowing users to deploy, scale, and manage agents with long-running, stateful workflows.
- GPT-4.1 Coding Skills: @OpenAIDevs states that it was "Coded up by GPT-4.1, rolling out today in ChatGPT".
- The importance of embeddings in training: @jxmnop states embeddings are underrated.
- Atropos and Axolotl AI: @Teknium1 announced that you can now train with Atropos using @axolotl_ai too.
AI Analysis and Evaluation
- ARI Beats OpenAI's Deep Research: @RichardSocher announced that ARI (Advanced Research & Insights agent) just beat OpenAI's Deep Research by a large margin on two benchmarks.
- GPT-4.1 Excellent Coding Skills: @kevinweil states that GPT 4.1 is very good at coding and instruction following, recommending users to give it a try.
- Limitations of current evals: @cline reports that eval loops rarely survive contact with real humans.
- LangChain Interrupt 2025 Evals: @LangChainAI launched OpenEvals, a set of utilities to simulate full conversations and evaluate LLM application performance.
- AM-Thinking-v1 Model Evaluation: @omarsar0 points out that AM-Thinking-v1 outperforms DeepSeek-R1 and rivals Qwen3-235B-A22B, and @arankomatsuzaki notes that AM-Thinking-v1 performs on par with Qwen3-235B-A22B and Seed1.5-Thinking while being built entirely from the open-source Qwen2.5-32B base model and publicly available queries.
Humor and Miscellaneous
- The cat is out of the bag: @omarsar0 on the topic of LLMs getting lost in multi-turn conversations.
- OpenAI uses FastAPI for ChatGPT: @nrehiew_ jokes about the extensive use of Python despite constant complaining about it.
- Google Time: @zacharynado exclaims "☀️ GOOGLE TIME! (づ｡◕‿‿◕｡)づ☀️".
- LLMs will just lie to me: @nearcyan reports on hating LLMs.
- You have picked your poison!: @scottastevenson responding to a user about escaping ambiguity and submitting to structure, education, doomscrolling, drug addiction, fitness routines, mountain climbing, entrepreneurship, raising a family, abusive relationships.
- The mog trend was kinda cringe, but this one goes hard fr: @AravSrinivas reacting to an image.
- The fact flash-attention somehow works with uv gives me hope this is possible: @typedfemale's hopeful comment on uv and flash-attention.
- Building perfect machines out of imperfect parts: @MillionInt's deep comment.
- oh no: @lateinteraction's reaction to something that occurred.
- This is too good, I didn't expect YC launch videos to be docudramas of stuff that just happened like a month ago but here we are: @iScienceLuvr's reaction to a YC launch video.
AI Reddit Recap
/r/LocalLlama Recap
1. Text-to-Speech Model Training and Tools in Unsloth
- TTS Fine-tuning now in Unsloth! (Score: 223, Comments: 43): Unsloth has introduced support for efficient Text-to-Speech (TTS) model fine-tuning, claiming ~1.5x faster training and 50% less VRAM usage compared to alternatives, particularly on FA2 hardware. Supported models include `Sesame/csm-1b`, `CanopyLabs/orpheus-3b-0.1-ft`, and Transformer-based models (e.g., Llasa, Outte, Spark), with data-efficient SFT-style workflows using emotion-annotated datasets like "Elise". Users can leverage Google Colab notebooks, 16-bit LoRA or full-precision fine-tuning, and quantized/original checkpoints on Hugging Face. Notably, a new Qwen3 GRPO method is also supported, combining base models with custom proximity-based reward functions and regex-guided evaluation. Comments clarify that while Whisper is primarily an STT model, its inclusion may be for ASR-related preprocessing or dataset generation. Users discuss LoRA fine-tuning scalability and best practices for controlling TTS model tone, pitch, and cadence, as well as dataset size requirements per parameter count, highlighting interest in practical fine-tuning methodologies. (A generic LoRA sketch follows the comments below.)
- A user questions whether "Whisper" is a TTS model, clarifying that it is primarily an STT (speech-to-text) and ASR (automatic speech recognition) model, not text-to-speech. The comment asks if Unsloth's finetuning is supporting datasets for ASR finetuning rather than true TTS.
- Another commenter asks specifically about the requirements for TTS finetuning, namely how many audio/text examples are needed per billion or hundred million parameters. This reflects a common technical concern of dataset scale versus model parameterization in speech synthesis finetuning.
- A technical feature request is made regarding native Mac MPS (Metal Performance Shaders) support, seeking to enable hardware-accelerated training/inference on Apple Silicon devices, which is relevant for efficient TTS model finetuning workflows outside of CUDA dependencies.
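As a rough reference for the workflow discussed above, here is a generic LoRA SFT sketch using Hugging Face Transformers + PEFT. This is deliberately not Unsloth's API (their notebooks wrap this with their own loader), and the checkpoint id and target modules are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "canopylabs/orpheus-3b-0.1-ft"  # assumed TTS checkpoint id
tok = AutoTokenizer.from_pretrained(model_id)  # used to build (text, audio-token) pairs
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# 16-bit LoRA: train small adapter matrices instead of all weights.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total params

# From here, run a normal SFT loop over (text, audio-token) examples,
# e.g. an emotion-annotated dataset like "Elise".
```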
- Introducing A.I.T.E Ball (Score: 281, Comments: 53): The post details a local, offline implementation of an AI-powered "8 Ball" device running entirely on an Orange Pi Zero 2W. The setup uses whisper.cpp for local speech-to-text, llama.cpp for LLM inference, and specifically runs the Gemma 3 1B model, emphasizing the resource constraints and fully offline capability. Commenters noted appreciation for fully local operation, highlighting the rarity of offline AI hardware in a heavily internet-connected landscape. No substantive technical debate is present.
- A commenter suggests integrating Piper TTS (https://github.com/rhasspy/piper), an open-source text-to-speech engine, to enhance the project, noting its ability to run efficiently on modest hardware. This implies the device could be upgraded for voice output despite hardware limitations.
- Multiple comments highlight the unique feature of running the model entirely offline, contrasting it with the trend of always-online AI devices. This offline capability is noted as significant, particularly for privacy and increased control of the application on resource-constrained hardware.
2. New Features and Data Handling in llama.cpp
- PDF input merged into llama.cpp (Score: 120, Comments: 34): A recent PR (#13562) has added native PDF input support in the llama.cpp web UI by integrating an external JS library for PDF parsing, offering users the ability to toggle between text extraction and image rendering. This approach ensures that the C++ core remains unaffected, allowing rapid updates and replacement of PDF parsing tools, and includes an automatic conversion for lengthy pasted content to file uploads. Comments note that this implementation upholds core modularity (aligning with maintainability), but some express concern about feature creep versus adherence to the Unix philosophy. There's technical discussion about OCR integration for mixed-content PDFs and requests for merging related PRs for extended document handling.
- The PDF input functionality for llama.cpp is implemented in the built-in web frontend using an external JavaScript package, not in the core C++ application. This architectural decision keeps core maintenance minimal and allows for easy replacement or upgrading of the PDF conversion package without affecting core functionality.
- Currently, the solution provides two modes for handling PDFs: parsing as pure text or pure image. There is recognition among users that a more robust approach would selectively extract text natively while applying OCR only to the image partsâakin to how specialized OCR software worksâsuggesting potential future improvements in structural and semantic PDF understanding.
- Users question whether the existing integration can extract and represent structural information from PDFs, such as tables or embedded images, which are crucial for advanced tasks like RAG (Retrieval Augmented Generation) and graph building. Effective support for such features is recognized as a significant technical challenge in PDF processing for LLM pipelines.
3. LLM Multi-Turn Conversation Challenges and Benchmarks
- LLMs Get Lost In Multi-Turn Conversation (Score: 218, Comments: 67): A recent paper (arXiv:2505.06120) demonstrates that both open and closed-source LLMs (large language models) show substantial degradation in performance during multi-turn conversations, particularly when instructions are "sharded" (split across turns) versus "concat" (provided all at once). Experiments reveal LLMs frequently make compounding errors after initial incorrect assumptions, rarely recovering from early misinterpretations, a phenomenon not captured in single-turn benchmarks. The research suggests that reinitiating conversations with all relevant context in the first prompt can mitigate this issue (a consolidation sketch follows the comments below). Commenters corroborate these findings with practical experiences, noting this multi-turn degradation with various models (e.g., o1 pro, sonnet 3.7, vs. improvements in 2.5 pro). One detailed example shows how iterative prompting with LLMs (Gemma, Qwen) leads to semantic drift and compounding of initial mistakes due to LLMs' reliance on prior outputs, illustrating a core challenge in multi-turn context tracking.
- Users observe that LLMs, like o1 pro, sonnet 3.7, and even strong open models like Gemma and Qwen, often make initial incorrect assumptions in early turns, then compound these mistakes across multi-turn conversations due to their autoregressive nature. As one user notes, "LLMs being word probability engines makes them double down on previous choices, so initial mistakes lead to compounding errors and general weirdness."
- There is a critique that most LLM benchmarks and tuning exercises are focused primarily on single-turn, fully-specified instruction settings, which can misrepresent actual multi-turn or agent-centric use cases. One commenter highlights that in real-world and especially coding workflows, multi-turn performance is paramount, but models may only be optimized for benchmark scores.
- The comment about the "full and concat" prompt strategies reflects an interest in how context handling methods affect performance: earlier and smaller models are said to depend heavily on prompt structure, while newer/modern models may manage context better. This draws attention to an important technical axis for ranking models' suitability for agent use and extended dialogues.
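The mitigation the paper suggests (restarting with all context in one turn) is easy to operationalize. A minimal sketch, assuming `llm` is any single-prompt completion callable:

```python
def consolidate(turns, llm):
    """Collapse a multi-turn, underspecified chat into one fully-specified prompt.

    turns: list of (role, text) tuples from the conversation so far.
    llm:   callable taking a single prompt string and returning a string.
    """
    transcript = "\n".join(f"{role}: {text}" for role, text in turns)
    rewrite_request = (
        "Rewrite the user's requirements from this conversation as ONE "
        "complete, self-contained instruction. Include every constraint "
        "and clarification; omit the back-and-forth:\n\n" + transcript
    )
    single_turn_prompt = llm(rewrite_request)
    # Issue the consolidated spec as a fresh single-turn request.
    return llm(single_turn_prompt)
```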
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
TO BE COMPLETED
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.5 Pro Exp
Theme 1: Model Mania - New Releases and Capabilities Spark Fierce Debates
- Gemini 2.5 Pro Flexes Its Muscles (and Context Window): Engineers across Discords like LMArena, aider, and OpenAI extensively discussed Gemini 2.5 Pro, praising its coding prowess, reasoning skills, and massive 1 million token context window, which some find indispensable compared to GPT's 32k limit. While its free availability is appreciated, many acknowledge it's likely temporary due to high operational costs, though some users found its reasoning chunks useless.
- GPT-4.1 Variants Battle for Coding Crown: The OpenAI and aider communities buzzed with comparisons between GPT-4.1 and GPT-4o, with many asserting GPT-4.1 (and particularly GPT-4.1 mini) excels at coding tasks due to better instruction following. Users shared screenshots comparing models and debated the potential GPT-5 release, speculated for summer or late 2025.
- Fresh Faces in the AI Arena: DeepSeek, Qwen, AlphaEvolve & More!: New model announcements peppered discussions, including DeepSeek v3 as a Mixture of Experts (MoE) model, Qwen3's aptitude for translating Mandarin datasets, and Google DeepMind's AlphaEvolve (AlphaEvolve PDF) sparking debate on whether it's a true evolutionary algorithm or LLM-driven. Samsung also joined the fray with models like MythoMax-L2-13B (spotted on Hugging Face) and MuTokenZero2-32B.
Theme 2: Engineering AI - Optimizing Performance and Refining Development Tools
- Quantization Wars: QNL Smokes GGUFs for Speed!: The Unsloth AI community reported that QNL offers faster performance than standard GGUFs, though formal benchmarks are awaited, highlighting ongoing efforts to optimize model speed and efficiency. Discussions in LM Studio also emphasized the critical role of keeping models entirely within VRAM over DRAM speed for optimal performance, with KV Cache location being a key factor.
- Framework Fever: DSPy, LlamaIndex, LangGraph & MCP Streamline AI Dev: Developers are actively leveraging frameworks like DSPy for implementing structured outputs with Pydantic models, and LlamaIndex for building event-driven agent workflows, such as a multi-agent Docs Assistant for Weaviate. LangGraph is gaining traction for managing complex conversational flows (LangGraph Course), while the Model Context Protocol (MCP), now with Shortwave client support, is enabling easier integration of AI agents with various applications.
- Hardware Hustles: From Multi-GPU Fine-Tuning to WebGPU Roasts: Multi-GPU fine-tuning using tools like Accelerate with Unsloth is a hot topic, and the GPU MODE server saw active benchmarking of MI300 cards and discussions on TritonBench errors on AMD GPUs (example kernel). Meanwhile, a fun WebGPU Vision-LLM app called AsianMOM, using SmolVLM 500M and LLama 3.2 1B, demonstrated in-browser AI roasting capabilities.
Theme 3: Platform Quirks & User Workarounds - Navigating the AI Landscape
- Perplexity's Pro Problems and Deep Research Disappointments: Perplexity AI users faced delays getting the Perplexity Pro role on Discord (with mods manually assigning), experienced errors viewing mobile app answers on the web, and expressed frustration with the Deep Research mode defaulting to regular search or using limited sources, with one user stating, "If that's dumbed down, I see no reason to pay for it."
- Model Mishaps: Looping Llamas and False Flags Frustrate Users: Users in LM Studio reported Llama 3.1/3.3 models producing undesirable outputs with fantasy prompts, showing token loss and punctuation issues. Cursor Community members saw Claude 3.5 Sonnet get stuck in loops, and Eleuther discussions highlighted MCQ evaluations like MMLU incorrectly flagging model outputs as false.
- Community to the Rescue: Proxies, Token Tips, and GPT4All Alternatives: When OpenAI access was restricted by country, OpenRouter users suggested proxies as a workaround. Aider users shared tips for managing token usage with commands like `/clear` and using models like Gemini 2.5 Flash, while the Nomic.ai community, fearing GPT4All's discontinuation, discussed Jan.ai and LM Studio as alternatives.
Theme 4: The Bustling AI Ecosystem - Collaboration, Learning, and Open Source Triumphs
- Indie Devs Unleash Creative AI: From Hotel Agents to Mom Roasters!: The community showcased impressive open-source projects, including the Jinko MCP for building AI agents to sell hotels, the Tig coding agent (announced by LlamaIndex) built with LlamaIndex workflows, and the humorous AsianMOM WebGPU Vision-LLM app that roasts users. Additionally, Mem0.ai introduced OpenMemory MCP, a unified memory management layer for AI apps.
- Level Up Your AI Game: Workshops, Webinars, and Challenges Galore!: Numerous learning opportunities surfaced, such as Nous Research and Solana Foundation's Decentralized AI event in NYC, a Lambda workshop on building agentic applications with $100 API credits available, and OpenAI's "OpenAI to Z Challenge" (details here) to discover Amazonian archaeological sites. BlackboxNLP 2025 also announced a new shared task on circuits/causal variable localization using the MIB Benchmark.
- Fueling the Fire: API Credits and Sponsorships Keep Innovation Burning: Generosity flowed with a user in the aider community offering free API credits for Gemini, Claude, and OpenAI to support interesting projects, particularly for the aider project. Elsewhere, a tech-adjacent nonprofit sought event sponsorships and grants from Cohere (contact [email protected]), highlighting the various ways the ecosystem supports ongoing development.
Theme 5: AI's Wild Side - Controversies, Accidental Leaks, and Industry Shake-ups
- Grok's Gaffes: From "White Genocide" Claims to Admitting Elon's a Threat!: Elon Musk's Grok model stirred controversy in the aider community by making wild claims about white genocide, leading to distrust, with some users viewing xAI as a joke. One member humorously reported asking Grok who the biggest threat to democracy on X was, to which it allegedly replied: Elon.
- Oops, We Leaked It! Samsung's MythoMax-L2-13B Makes Brief Debut: Samsung inadvertently released (and then quickly removed) the MythoMax-L2-13B roleplay model, which was spotted on Hugging Face, as noted in the Yannick Kilcher and HuggingFace Discords. This led one user to quip, "Can someone do that for OpenAI and 'release' GPT4Chan? Or Anthropic, that would be priceless."
- Industry Tremors: TypeScript Dev Fired, Agentic Tools Challenge Big Tech: Microsoft's unexpected firing of a key TypeScript developer (tweeted here) sparked dismay in the Latent Space community. Discussions also touched on how agentic tooling might empower indie developers to outpace big tech, and the demise of FUNapis was humorously attributed to Bing's chatbot wrapper ambitions.
Discord: High level Discord summaries
Perplexity AI Discord
- Perplexity Pro Role: Mods to the Rescue: Users purchasing Perplexity Pro are experiencing delays in obtaining the Perplexity Pro role on Discord, but moderators are manually assigning roles while the system is reworked.
- Having the same email for Discord and Perplexity is not a requirement.
- Mobile App Answers Trigger Webpage Errors: Answers generated from the Perplexity mobile app can't be read on the web version, resulting in error messages.
- Customer service acknowledged the issue and reported it to technicians, but a fix has not yet been implemented.
- Deep Research Deemed Disappointing?: Users report issues with Deep Research mode, as it defaults to regular search and uses a limited number of sources.
- One user summarized the sentiment: I legit just use Perplexity because it reads 20-40 sources per query. If that's dumbed down, I see no reason to pay for it.
- 23andMe Braces for Bankruptcy: 23andMe filed for Chapter 11 bankruptcy, indicating significant financial challenges and a need for restructuring.
- The Chapter 11 filing suggests that 23andMe is seeking legal protection to reorganize its debts and operations.
- Sonar API Stings Hackathon Hopefuls: Members are facing problems acquiring the Sonar API for a hackathon project, since it requires credit card details.
- Another member reports that the answers generated using the Sonar API are different from those generated in the Perplexity AI playground.
Manus.im Discord Discord
- Manus Engages in "Vibe Coding" Livestream: Manus hosted a "Vibe Coding with Manus" livestream from San Francisco, featuring the Manus SF Fellows, and available on YouTube.
- The livestream showcased coding projects, fostering community engagement.
- Johnny the Credit-Farming Genius: A user humorously highlights how Johnny is "farming Manus daily" compared to another user paying for Manus to make a femboy detector.
- This illustrates a humorous contrast between exploiting the platform for free credits and using it for entertainment.
- The Femboy Detector: A user created an app that determines if a person is a femboy or not, using Function Calling to get currency rates from wise.com.
- The app outputs "femboy" for male names, leading to comedic accusations of users being labeled femboys: MANUS'S API IS LYING!
- Invite Link Feature Vanishes: Some users have found that the option to generate invite links has disappeared from the UI.
- It seems that the invite link generation feature is no longer available to all users and some speculate this affects free users only.
- Users Ponder Credits Usage: Users expressed concerns about the cost of credits, with one noting that a PDF report cost 500 credits and a DCF on Google cost 1500 credits.
- Some believe the credit usage is too expensive, especially for complex tasks and look forward to alipay as a payment method.
LMArena Discord
- Gemini's Heavy Reasoning Time?: Members pondered whether Gemini 2.5 Pro will have longer reasoning times compared to other reasoning models, given it is a heavier model.
- The discussion centered on the trade-offs between model size, complexity, and inference speed.
- Grok 3.5 Launch Stuck in the Muck?: Theorizing that Elon delayed the release of Grok 3.5 because fine-tuning it to the far right wasn't successful, one member was promptly corrected by another noting it was false information.
- Discussion touched on the possibility of injecting political stuff into the system prompt.
- Attention Steering a Mirage?: Despite claims that LLMs can steer attention, particularly with Grok's Twitter timeline, one member clarified that this isn't steering attention where you need them to, but rather a plain announcement.
- They further linked to Transluce's observability interface as a tool to play with feature steering, but cautioned it's not particularly useful in practice yet.
- LMArena Funded by Credits, not Cash: In a discussion about how LMArena funds its models, a member pointed out that it's not just the companies themselves who pay for inferences, suggesting credit grants are provided to LMArena instead of direct monetary payments.
- Another member added that big labs give LMArena endpoints for their models, with valuable data on human preference being collected as a result.
- O3 Pro MIA on Arena: Despite speculation about an O3 Pro release, a member stated O3 pro wont come to arena lol, to which a moderator replied I can't confirm if/when new models are arriving on arena, but will be sure to put out announcements when I can.
- Anticipation remains high for new models to be added to the platform.
Unsloth AI (Daniel Han) Discord
- QNL smokes GGUF for Speed: QNL is reportedly faster than standard GGUFs, but formal benchmarks are still pending.
- See Unsloth Dynamic 2.0 GGUFs documentation for details.
- Scale Up with Multi-GPU Fine-tuning: For multi-GPU finetuning, using Accelerate with Unsloth may be successful.
- Despite consumer GPUs like the 3090 offering 24GB VRAM, companies often opt for H100s for local AI.
- SLMs Challenger to LLMs: Smaller models (SLMs) can become competitive through fine-tuning on specific tasks, even if they are not as smart out-of-the-box.
- Qwen3-4B is recommended for models needing decent reasoning power, and purportedly beats Mistral 7B.
- Qwen3 cracks translation: Qwen3 is suggested for translating datasets in Mandarin due to its pretraining data.
- Users have reported success using 30B models with Ollama on Kaggle for processing millions of strings.
- SmolVLM roasts in-browser: A member created AsianMOM, a WebGPU Vision-LLM app that roasts you like ur mom in-browser.
- It uses SmolVLM 500M and LLama 3.2 1B and works directly in your browser, thanks to Transformers.js and HF.
aider (Paul Gauthier) Discord
- Grok Sparks White Genocide Fears: Elon Musk's Grok raised concerns when it made wild claims about white genocide, causing distrust and prompting some to view xAI as a joke, and Elon as the biggest threat to democracy on X.
- Users are avoiding the model and one member joked that they only used xAI to ask who the biggest threat to democracy on X was, and it admitted it was Elon.
- Gemini 2.5 Pro Flees Copilot: Gemini 2.5 Pro was briefly available in Copilot, but it has since been removed, generating speculation about Microsoft investing more in open-weight models.
- The removal prompted speculation that Microsoft's rocky relationship with OpenAI may lead them to focus on open-weight models.
- Dev Generosity: Free API Credits Flow: A user offered free API credits for Gemini, Claude, and OpenAI, inviting others to test new infrastructure and develop interesting projects, particularly for the aider project.
- Another member is planning to add a `/consolidate` command to aider to roll each long chat into a single, fully-specified prompt, using a fresh single-turn request to the main model, in response to the LLMs Get Lost In Multi-Turn Conversation paper.
- Aider Token Use Gets a Grip: Users discussed strategies for managing token usage in Aider, suggesting the use of `/clear` to reduce context and `/add` for only necessary files, and using models like Gemini 2.5 Flash via Google AI Studio.
- Also suggested was using OpenRouter models like Deepseek v3 0324, paired with copy-pasting from Gemini 2.5 Pro.
- Muscle-mem tool surfaces: A member shared a github link to a tool, muscle-mem, perhaps an aid to memorization?
- No other details were shared, so it is difficult to know what it is for!
OpenAI Discord
- GPT-4.1 Mini Smarts Spark Coding Debate: Members debated the merits of GPT-4.1 versus GPT-4o, with some asserting that 4.1 is superior for coding while others found 4o to be more intuitive overall, and several claimed 4.1 mini is the best small model.
- Some users shared their prompt testing results and experiences across specific dialects to show their GPT-4.1 preferences, while sharing screenshots of the models.
- GPT-5 Release Date Guessing Game: The community discussed the potential release timeline for GPT-5, with some anticipating a launch in summer (June-September) while others suggested a later release around August-December.
- One member said, "Connecting the dots here - Sam said in the Feb 12 announcement that GPT-5 will solve the confusing names issue."
- Gemini 2.5 Pro Wins Fans with Million Context Window: Users praised Gemini 2.5 Pro, noting its coding abilities, reasoning skills, and large context window, with one saying Gemini 2.5 Pro is actually really good at coding, while acknowledging that its free availability is likely temporary due to its high running costs.
- One user has switched to Gemini 2.5 Pro due to its 1 million context window, stating that they simply can't work with GPT's tiny 32k context window anymore.
- OpenAI Searches Amazonia For Archaeology: OpenAI announced the OpenAI to Z Challenge using o3, o4-mini, or GPT-4.1 to discover previously unknown archaeological sites in the Amazon, inviting participants to share progress on X with the hashtag #OpenAItoZ.
- The challenge details are available at the OpenAI website.
- Community Gives Research GPT the Eye Test: A member is requesting feedback on their Research GPT, in its final stages of refinement.
- The creator is particularly interested in identifying potential issues in English, as it is not their first language, while noting that Korean functionality is satisfactory.
Cursor Community Discord
- Cursor Pro Plan Clarifications: A member inquired about using all models on their current Cursor Pro plan and whether an API key is required, which was confirmed to be managed by Cursor itself.
- The plan includes various models supported directly by Cursor, eliminating the need for users to manage their own API keys.
- Cursor Client Version Stats: A user shared detailed Cursor client information, including VSCode version 1.96.2, Electron 34.3.4, Chromium 132.0.6834.210, and Node.js 20.18.3, while troubleshooting a "no restart" message, as seen on the Cursor FAQ.
- This level of detail helps in diagnosing specific issues related to the clientâs configuration and environment, though it was also suspected to be related to timezone.
- Claude 3.5 Sonnet Runs in Circles: A user reported that Claude 3.5 Sonnet was getting stuck in a loop, complete with supporting images, eventually resolving itself due to context limits.
- The issue highlights potential problems with the modelâs ability to manage context and avoid repetitive outputs.
- Slash Context Reset Command Silently Implemented: Members discussed using the /reset command in Cursor to clear the context, with some users expressing dislike for its silent execution.
- It was clarified that after typing /reset, nothing will show up, it will be executed silently, which might confuse some users about whether the command was actually processed.
- Gemini Pro Preview Has Editing Setbacks: A member reported that Gemini Pro Preview spends a significant amount of time deciding on code changes but then struggles to apply those edits effectively.
- Another member pointed out that version 0.50.4 aimed to improve the apply functionality, suggesting that the issue might be addressed in newer releases.
LM Studio Discord
- Fantasy Prompts Frustrate Llama Models: Users reported that Llama 3.1/3.3 models produce undesirable outputs when given fantasy-themed prompts, exhibiting token loss, punctuation issues, and partial word omissions as shown in attached screenshots.
- The issues highlight ongoing challenges in maintaining coherence and accuracy with specific types of prompts, with no clear solution offered by the community.
- LM Studio Vision API Input Guidance Given: Users sought advice on supplying images to a vision-enabled LLM via the LM Studio API without the Python library, focusing on the OpenAI endpoint and one user pointed to the LM Studio documentation.
- A cURL example was shared, demonstrating how to pass an image URL to the API, resolving the issue for the user (a Python equivalent is sketched after this section's bullets).
- DRAM Secondary to VRAM for Model Speed: Members discussed the impact of VRAM versus DRAM speed on model performance, concluding that DRAM speed barely matters if you keep your model in VRAM.
- They emphasized keeping the model entirely within VRAM for optimal speed and performance, since the KV Cache location impacts performance.
- 7900 XTX Card tempts over Nvidia: One member bought a 7900XTX card and is about to sell one of their Nvidia cards, because they were mildly annoyed with dual card driver instability issues with their 4080s and 4060ti cards.
- They mentioned that the 5060 ti is going back as a return, with no further detail.
- 5060 Ti Faces AI and Gaming Divide: The 8GB 5060 Ti is considered inadequate for AI tasks, and not economical enough for budget gaming setups.
- The consensus suggests that prospective buyers should consider the 16GB version instead if they want to use AI models.
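Regarding the vision API question above: LM Studio exposes an OpenAI-compatible chat completions endpoint, so passing an image URL follows the standard OpenAI content-part schema. A Python sketch in the same spirit as the shared cURL example; the default port and model id are assumptions about your local setup:

```python
import requests

payload = {
    "model": "local-model",  # LM Studio uses whichever model is loaded
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
}
r = requests.post("http://localhost:1234/v1/chat/completions",
                  json=payload, timeout=120)
print(r.json()["choices"][0]["message"]["content"])
```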
Yannick Kilcher Discord
- AlphaEvolve: LLM in Disguise?: Members debated whether AlphaEvolve is simply an LLM-driven agentic loop or an evolutionary algorithm, referencing Google DeepMind's blog.
- Arguments centered on the balance between the power of LLMs like Gemini and the importance of the evolutionary engine and verifier components, crucial to review in the ablation section of the paper.
- Gemini 2.5 Pro Aces Coding: The community lauded Gemini 2.5 Pro's coding skills, notably its zero-shot performance, while also noting it had sorta random refusals for cyber ethics homework when released.
- Fine-tuning with verified rewards and rejecting non-compiling outputs may be the key to its coding excellence and reasoning abilities.
- LiquidAI Faces Vaporwave Skepticism: Skepticism is mounting around LiquidAI and its liquid foundational models, with one member initially writing them off, while another compared LiquidAI to State Space Models (SSMs).
- The community favored the treatment of SSMs like Mamba, Gated DeltaNet, or RWKV 7, noting similarities between these and LiquidAI's research.
- Absolute Zero: Models Rise From Nothing: The community discussed the Absolute Zero paper Absolute Zero, which improves models without any data by having LLMs generate and verify synthetic training data.
- The LLM is trained to perform three task types: Y = F(X), Y = F(?), and Y = ?(X) (sketched below).
- Samsung Accidentally Drops the MythoMax-L2-13B Roleplay: Samsung inadvertently released the MythoMax-L2-13B roleplay model, spotted on Hugging Face and quickly removed.
- A member joked, "Can someone do that for OpenAI and 'release' GPT4Chan? Or Anthropic, that would be priceless."
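For the Absolute Zero task framing mentioned above, a toy sketch of how one verified (program, input, output) triple can be turned into the three query types; the prompt wording is entirely illustrative:

```python
def make_tasks(f_source, x, y):
    """Turn one verified (program, input, output) triple into three queries.

    Deduction:  given F and X, predict Y       (Y = F(X))
    Abduction:  given F and Y, recover an X    (Y = F(?))
    Induction:  given X and Y, propose an F    (Y = ?(X))
    """
    return [
        ("deduction", f"Program:\n{f_source}\nInput: {x!r}\nWhat is the output?", y),
        ("abduction", f"Program:\n{f_source}\nOutput: {y!r}\nGive an input that produces it.", x),
        ("induction", f"Input: {x!r}\nOutput: {y!r}\nWrite a program mapping input to output.", f_source),
    ]

# Example: a triple the model itself proposed and an executor verified.
tasks = make_tasks("def f(x): return x * 2", 3, 6)
```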
OpenRouter (Alex Atallah) Discord
- Users Get Shortcut to Quick Chat: Users can now click on model icons in the grid to initiate a quick chat with a specific model, streamlining the user experience as shown in the attached image.
- The new feature lets users start quick chats with individual models by simply clicking on the model icons, greatly improving efficiency because it bypasses the need to open the entire group.
- DeepSeek v3 is a MoE Model: DeepSeek v3 is a Mixture of Experts (MoE) model, meaning it activates only a subset of its parameters during inference.
- Even though all the parameters are loaded into VRAM, only the parameters relevant to the prompt are computed, making inference much faster (see the gating sketch at the end of this section).
- Urban Corvids Switch to Peanuts and Cat Food: Users observed that corvids (crows and magpies) are only eating peanuts and cat food instead of normal bird food.
- It was suggested that urban corvids have adapted to a diet closer to trash and prefer alternatives to standard bird food, as their diet has changed over many generations.
- Bypass Country Restriction with a Proxy: A user shared an error message from OpenAI indicating that their country, region, or territory is not supported, which they bypassed using a proxy.
- This avoids geographic restrictions causing the error.
- Qwen3 Needs Toggle to THINK: For `qwen3`, it needs to be forced to think with `/think` or `/no_think` to toggle thinking on and off.
- It was reported that `/no_think` functionality had a bug and OR needs to auto-route away.
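To illustrate the sparse-activation point from the DeepSeek v3 discussion, here is a toy top-k MoE routing sketch in NumPy; dimensions and gating details are simplified, and real MoE layers add load balancing, normalization, and batched routing:

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Route one token through only the top-k of many expert MLPs.

    x:        (d,) token activation
    experts:  list of (W1, W2) weight pairs, one per expert
    gate_w:   (d, n_experts) router weights
    """
    logits = x @ gate_w
    topk = np.argsort(logits)[-k:]              # pick k experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                    # softmax over the chosen k
    out = np.zeros_like(x)
    for w, i in zip(weights, topk):
        W1, W2 = experts[i]
        out += w * (np.maximum(x @ W1, 0) @ W2) # only k experts do any compute
    return out

d, n = 8, 16
experts = [(np.random.randn(d, 4 * d), np.random.randn(4 * d, d)) for _ in range(n)]
y = moe_forward(np.random.randn(d), experts, np.random.randn(d, n), k=2)
```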
HuggingFace Discord
- Samsung Enters LLM Race: A member announced the release of new models from Samsung, including the MuTokenZero2-32B model and MythoMax-L2-13B model.
- Another member indicated that the models were a work in progress.
- LangGraph Powers Complex Flows: Members shared a link to the LangGraph documentation (LangGraph Course), highlighting its use in building agentic workflows and complex conversational flows.
- LangGraph excels at overseeing intricate dialogue paths and multi-agent setups.
- AsianMOM Roasts You!: A member introduced AsianMOM, a WebGPU Vision-LLM app that roasts you like your mom in-browser using SmolVLM (500M) and LLama 3.2 (1B), and available on asianmom.kuber.studio.
- The creator expressed that this funny little project genuinely taught them so much about WebML and Vision models, noting that the technologies we're getting with WebML will 100% democratize AI access.
- DistilRobertaâs Accuracy Questioned: A member questioned why the DistilRoberta version of a model has more downloads than Roberta, wondering if itâs better for emotion detection despite potential accuracy differences.
- Another member explained that DistilRoberta is a lighter version of Roberta, trained to balance computational cost and accuracy, but theoretically has lower accuracy due to fewer weights.
- Agent Course Template Triggers: A member reported that the First_agent_template worked initially but now consistently throws errors, and asked if theyâd run out of credits.
- Another member noted this space for Unit 3 has an error and needs to be fixed: Unit 3 Agentic RAG.
Notebook LM Discord
- Ducky Bedtime Stories Quack into Action: A member created an audiobook of ducky bedtime stories, reading with appropriate energy and enthusiasm, using different voices for each character.
- The expert speaker is a duck and can only say "QUACK!", with the volume decreasing as the audiobook progresses.
- Notebook LM helps Achieve Deep Focus: A user discovered that running Googleâs Notebook LM podcast of a chosen book with YouTube music on a low volume helped them enter deep focus at work.
- The user recommends a loop button on the podcast option and an integration with YouTube Music for a richer experience.
- Pakistan Patrons Ponder VPN Pathways: A user inquired about the appâs lack of availability in Pakistan, and another suggested using a VPN to download it.
- Another user, an Android app tester, pointed out issues with voice review personalization within the studio.
- Link Lurkers Launch Looming Lies: Users were cautioned about scammy links promising free gifts, easy cash, or amazing deals, and were advised to think twice before clicking such suspicious links.
- It was emphasized that links offering freebies are major red flags, and users should always protect their personal information.
- Podcast Plan Pays Off: A user hit their 100-podcast max on NotebookLM and intends to download the WAV files, convert them to video podcasts, and upload them to YouTube or Google Photos.
- Another user replied that this is smart, with another user replying I do something similar.
Nous Research AI Discord
- Nous Research and Solana Foundation partner for Decentralized AI: Nous Research is co-hosting an event with the Solana Foundation in NYC on May 22, highlighting efforts to democratize intelligence through Decentralized AI. Registration is available here.
- The event will feature discussions on Psyche, Nous's project focused on democratizing intelligence.
- Psyche races into Hyperdrive: The training rate for Psyche is 12B tokens per day, while processing the entire 20T tokens dataset is estimated to take almost 2000 days.
- Contributors can support model training through donations to the mining pool on the Psyche Network or by contributing to the codebase on GitHub, spurring calls for more GPU power.
- Meta Tackles AR and AI Integration: Meta faces challenges in integrating AI into its smart glasses, which could potentially make its AR investments obsolete if not properly managed.
- Despite this shift, Meta is continuing AR research with projects such as Project Aria.
- Smart Glasses Crave Agentic AI: The general consensus is that smart glasses need real agentic AI to effectively interpret and interact with the userâs environment, as demonstrated by Sesame.
- Members are prompting calls for an open smart glass AI to foster innovation towards more useful integrations.
- Grok's Glitches in South Africa: Discussion arose whether Grok's issues in South Africa stemmed from tweaked steering vectors or clumsy prompt updates.
- One member stated Absolutely no basis to back it up but I am voting clumsy prompt.
GPU MODE Discord
- TritonBench Benchmarks Bomb on AMD GPUs: A member found that about 7 kernels throw memory access fault errors when running TritonBench benchmarks on an AMD GPU.
- One example provided was chunk_gla_fwd.py, which throws an `Unknown Reason` error, and the member requested assistance to pinpoint the cause.
- CUDA IPC Memory Handles String-Serialized: A member explored using `cudaIpcGetMemHandle()` for single-GPU multiprocess communication and found that `cudaIpcMemHandle_t` can be string-serialized (a CuPy sketch follows this section's bullets).
- This enables a straightforward producer-consumer setup for sharing memory handles, sidestepping more complex inter-process communication methods for single GPU sharing.
- Tracing Fused Operations Requires Careful Code Reading: A member inquired about mapping fused operations back to their original model code after compiler fusion, and another member replied with a link to docs regarding `inductor_provenance_tracking_node_mappings_<number>.json` files.
- The member was unsure how to easily map the exported program's graph to the original model code without careful reading.
- Pipeline Parallelism yields no concurrency gains: A member experimented with pipeline parallelism using `torch.autograd.graph.saved_tensors_hooks` to manage activations across separate CUDA streams, aiming for concurrent forward and backward passes, referencing the docs.
- Despite successful implementation without race conditions, the member observed minimal concurrency gains due to the model's kernel occupancy, deeming it a "fun experiment though!!"
- MI300 Heats Up with New Benchmarks: Several users submitted new benchmarks on the MI300 across different leaderboards, including `amd-fp8-mm` and `amd-mixture-of-experts`.
- Multiple successful submissions were recorded on the `amd-fp8-mm` leaderboard, with times ranging from 155 µs to 3.28 ms on the MI300, while the `amd-mixture-of-experts` leaderboard saw frequent entries, with multiple users achieving personal bests, such as 6233 ms and 6247 ms on the MI300.
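For the CUDA IPC discussion above, a hedged sketch using CuPy's runtime bindings. The two halves belong in separate processes (a process cannot open its own IPC handle), and the exact API details are worth verifying against the CuPy docs:

```python
import cupy as cp
from cupy.cuda import runtime

# --- producer process ---
buf = cp.arange(1024, dtype=cp.float32)      # device allocation to share
handle = runtime.ipcGetMemHandle(buf.data.ptr)
handle_str = handle.hex()                    # string-serializable: send via pipe/file/socket

# --- consumer process (receives handle_str) ---
ptr = runtime.ipcOpenMemHandle(bytes.fromhex(handle_str))
mem = cp.cuda.UnownedMemory(ptr, 1024 * 4, owner=None)
view = cp.ndarray((1024,), dtype=cp.float32,
                  memptr=cp.cuda.MemoryPointer(mem, 0))
print(view[:4])                              # reads the producer's device memory
runtime.ipcCloseMemHandle(ptr)
```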
Latent Space Discord
- OpenMemory MCP Opens Memory: A member shared OpenMemory MCP, a new open-source project that aims to provide a unified memory management layer for AI applications.
- The community reacted with praise calling it a cool unified memory management layer for AI apps.
- Ongoing Grok Troubles Tracked: Ongoing issues with Grok are being tracked in this Discord channel.
- No further context was provided.
- Microsoft Axes TypeScript Talent: Microsoft fired the TypeScript dude without warning, sparking dismay, as seen in this tweet.
- Community members expressed that the firing happened without warning.
- Agentic Tooling Outpaces Big Tech: Members discussed a shift happening with agentic tooling, expressing hope that indie devs can use it to outpace big tech and corporations.
- It was suggested that getting the computers to do the right thing well is a harder problem to solve than do the wrong thing well, especially given internal incentive structures in corporations.
- FUNapis Succumbs to Bingâs Chatbot: A member suggested that FUNapis died so Bing can sell their chatbot wrapper of the API, as seen in this tweet.
- No further community context was provided.
Eleuther Discord
- Optimize Data Loading and Preprocessing: A member is optimizing their data loading and preprocessing pipeline to avoid being bottlenecked by resources due to bad tooling as they work by themselves outside of academia or industry.
- They hope that this work will benefit all future audio data + mech interp work at scale.
- DNN Training Plagued by Data Stalls: Discussion highlights concerns that CPU-based preprocessing might bottleneck DNN training pipelines, particularly in audio modality contexts, referencing the paper Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling.
- Members debated the benefits of optimizing CPU workload versus potential GPU bottlenecks.
- BlackboxNLP heads to EMNLP 2025: The 8th edition of BlackboxNLP will be co-located with EMNLP 2025 in Suzhou this November.
- They will feature a new shared task on circuits/causal variable localization in LMs using the recently released MIB Benchmark with a submission deadline of August 1st.
- False Flags in MCQ Evaluations: An issue was identified where MCQ evaluations like MMLU flag model outputs as false, even when models assign the highest probability to a specific option based on NLL values.
- This issue is especially prominent with smaller models, indicating a potential bias or limitation in how these models handle multiple-choice questions.
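To make the MCQ scoring mechanics concrete, here is a minimal sketch of harness-style option selection by summed token log-likelihood. The model and question are illustrative, and real harnesses add normalizations (e.g., by answer length) that bear directly on the small-model discrepancies discussed above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # illustrative small model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def option_loglik(prompt, option):
    """Sum of log p(option tokens | prompt) under the model."""
    p_ids = tok(prompt, return_tensors="pt").input_ids
    full = tok(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full).logits
    logprobs = logits[:, :-1].log_softmax(-1)          # predicts tokens 1..T-1
    targets = full[:, 1:]
    start = p_ids.shape[1] - 1                         # score only the option's tokens
    tok_lp = logprobs.gather(2, targets.unsqueeze(-1)).squeeze(-1)[0, start:]
    return tok_lp.sum().item()

choices = [" Paris", " London", " Berlin", " Madrid"]
scores = [option_loglik("Q: Capital of France?\nA:", c) for c in choices]
print(choices[max(range(len(choices)), key=scores.__getitem__)])  # highest log-likelihood wins
```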
MCP (Glama) Discord
- MCP Client Seeks Server Emulation Advice: A member needs help emulating an MCP server after starting the client/server handshake (client -> method: initialize, server -> method: initialized); the handshake messages are sketched after this section's bullets.
- They seek insights on the intermediate steps to properly implement an MCP server.
- Chainlit Query Parameter Conundrums: A member is struggling to access query parameters from the URL within Chainlit, despite efforts using FastAPI middleware.
- The member tried passing tokens and decoded dictionaries but failed to access them, seeking solutions to properly retrieve query parameters.
- Jinko MCP Courts Hotelier AI Agents: The community announced the creation of the Jinko MCP for developers to build AI agents that can sell hotels, with the Jinko MCP GitHub repository now available.
- The new tool provides access to 2M+ hotels, enabling search, booking, payment, and customer service functions.
- Smithery Server Square-off with Claude Desktop: A member requires assistance integrating a Smithery-installed server with Claude Desktop using their OpenRouter key.
- The member questions if the model used in the MCP tool configuration needs to align with that in Claude (e.g., sonnet-3.5 in MCP config vs. sonnet 3.7 in Claude).
- Shortwave Surfs MCP Client Support: Shortwave now offers MCP client support, supporting both HTTP MCP & stdio MCP, and provides one-click toggles for integrations like Hubspot, Notion, Zapier, Asana, and Linear, according to their blog post.
- Further details are available in their documentation.
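On the MCP handshake question above: per the MCP specification, the handshake is plain JSON-RPC 2.0, with the client's `initialize` request answered by a server result, followed by a client `notifications/initialized` notification. A sketch of the three messages as Python dicts; the protocol version string and capability contents are abbreviated, so check the current spec:

```python
# Client -> server: initialize request
initialize = {
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",           # spec revision; check current
        "capabilities": {},                        # client capabilities (abbreviated)
        "clientInfo": {"name": "my-client", "version": "0.1"},
    },
}

# Server -> client: result for the initialize request
initialize_result = {
    "jsonrpc": "2.0", "id": 1,
    "result": {
        "protocolVersion": "2024-11-05",
        "capabilities": {"tools": {}},             # what the server offers
        "serverInfo": {"name": "my-server", "version": "0.1"},
    },
}

# Client -> server: initialized notification; only then begin normal requests
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
```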
Modular (Mojo 🔥) Discord
- Mojo stdlib documentation location clarified: The Mojo stdlib documentation is generated directly from the stdlib itself, as clarified in a thread, and is modifiable directly, as opposed to in the `/mojo/docs/manual` directory; a member fixed the doc in PR 4530.
- This addresses Issue 4482, which a member was looking to update.
- Members wrestle with pointer declarations in Mojo: Members sought help with declaring a Pointer in a Mojo struct, with one suggesting making the `Op` generic over origin if they want to borrow.
- It was clarified that Mojo requires the origin to be part of the type, making it a parameter, as related to the borrow checker, as explained in the Mojo Lifetimes Documentation.
- MAX plagued by installation issues: A member encountered errors during MAX installation, indicating missing essential functionalities like tensor operations (`tensor`, `nn`, `zeros`, `ones`, `matmul`, `add`, `mul`).
- This is preventing the continuation of a MAX-only implementation for a diffusion LoRA trainer due to weak tensor support in Mojo and MAX.
- Hybrid MAX and PyTorch Approach more viable, for now: Due to missing tensor operations for MAX-only LoRA trainer, Claude AI suggested a hybrid approach using PyTorch and MAXâs interoperability features as a more viable solution for immediate implementation.
- A member ensured that tools like Claude have access to the current Modular GitHub repository and documentation to avoid LLM hallucinations.
- Karpathy's micrograd gets ported to Mojo: A member is learning Mojo by porting Karpathy's micrograd, sidestepping the lack of lambda function support, and another member shared their similar project, momograd, created last year as one of their first Mojo learning projects.
- The momograd project has not been updated to the latest Mojo versions but serves as an example of the community interest.
Cohere Discord
- Cohere courts cooperation with community cause: A tech-adjacent nonprofit seeks event sponsorships and grants from Cohere, aiming for a partnership to further their tech-focused initiatives.
- Interested parties should reach out to [email protected] to connect with the appropriate Cohere staff.
- Cohere Classify API Clicks with Clients: Users laud the Cohere Classify API, expressing eagerness to scale its usage to millions of entries; they plan to contact [email protected] to request a rate limit increase.
- The increase would help to explore the feasibility of running the API at scale without extensive waiting times.
- SiliconFlow Setups Stirring with Screenshots: A user locally modified the SiliconFlow endpoint, demonstrated in an attached image.
- Additionally, screenshots of Gemma 3 4b 4bit and Llama 3.2 3B 4bit were shared, showcasing their implementations in separate attached image and another image.
- Web AI Engineer Wants Work: A seasoned Web, AI Engineer with 7 years of fullstack experience introduced themself, with proficiency in modern web technologies.
- They are skilled in React(Next), React Native(Expo), Flutter, Vue(Nuxt), Svelte, Astro and tools like Node/Express, Django, Nest.js, Go, Web3.js, Shopify, Wordpress, TailwindCSS, Shadcn, MUI, Docker, Kubernetes, AWS/GCP, LLM.
- Full stack fan of AI finds favor with frameworks: A full stack developer with over 20 years of experience has embraced AI, and enjoys building real-time applications with carefully crafted UI and UX.
- They are a fan of Nuxt running on Cloudflare and using tools like RooCode and Windsurf.
DSPy Discord
- Gemini Modelsâ Structured Outputs implemented in DSPy: Members discussed if Gemini Modelsâ response schema, similar to OpenAIâs structured outputs, is implemented in DSPy, and another member confirmed that it is.
- It was also confirmed that DSPy dynamically builds the response schema.
- Pydantic Models drive Structured Outputs in DSPy: A member inquired about implementing structured outputs in DSPy, similar to OpenAI tools, including nested outputs or JSON schema constraints.
- Another member replied to just use signatures, and pass Pydantic models or Python TypedDicts as output field types.
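As a rough illustration of that signature-based approach, here is a minimal sketch; the task, field names, and model id are illustrative assumptions, and it requires dspy plus a configured LM:

```python
# Minimal sketch: a Pydantic model as a DSPy output field type.
import dspy
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

class ExtractPeople(dspy.Signature):
    """Extract every person mentioned in the text."""
    text: str = dspy.InputField()
    people: list[Person] = dspy.OutputField()

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported model id
extract = dspy.Predict(ExtractPeople)
print(extract(text="Alice, 30, met Bob, 25, in Paris.").people)
```

DSPy derives the response schema from the signature, which lines up with the dynamic-schema behavior confirmed above.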
Nomic.ai (GPT4All) Discord
- GPT4Allâs Vital Signs Fade: Members speculate about the discontinuation of GPT4All due to the absence of updates since February.
- The community expresses concern over the lack of communication from Nomic regarding a new version release.
- Nomic Eyes Pay-to-Play Model?: Speculation arises around Nomic potentially transitioning to a monetized platform.
- The claim that gpt4all is over and Nomic is pivoting to monetization lacks substantiating evidence.
- Jan.ai and LM Studio emerge as GPT4All contenders: Jan.ai and LM Studio are mentioned as possible substitutes for GPT4All in light of recent concerns.
- The discussion did not include why those were good alternatives or which features they had that might be beneficial.
LlamaIndex Discord
- Event-Driven Agents Assist Weaviate: LlamaIndex unveiled a walkthrough demonstrating how to use event-driven agent workflows to construct a multi-agent Docs Assistant that writes webpages into LlamaIndexDocs & WeaviateDocs collections in Weaviate.
- The orchestrator decides when to call the Weaviate QueryAgent for search, showcased in this Tweet.
- Tig Coding Agent Makes Debut: An open-source (human in the loop) coding agent called Tig, created by @rsrohan99 and built with LlamaIndex workflows, was highlighted.
- Tig can write, debug, and analyze code across multiple languages, execute shell commands, and search the web as shown on Twitter.
- LlamaIndex Tackles PDF Content Extraction: A member requested advice on extracting content from a PDF using LlamaParse or LlamaIndex, specifically to extract the Table of Contents and isolate content and tables from a particular section based on a predefined name.
- The user seeks guidance on setting up the instructions or pipeline to detect the section from the TOC, isolate the content, and properly structure the extracted tables, with the right parameters for use in no-code tools like n8n.
- AI Startup goes Vibe Coding: An AI startup based in Korea is looking for passionate developers with Vibe Coding experience to partner on client projects.
- The opportunity includes a fair revenue-sharing model and ongoing partnership, with requirements for strong communication skills, GitHub links, Vibe Coding project references, and English/Korean communication skills.
Torchtune Discord
- Torchtune Network Stumbles on vLLM: A member faced implementation failures while trying to deploy a custom Torchtune network on vLLM, despite following several tutorials.
- A suggestion was made to convert the checkpoint to HF format for better syncing, also inquiring whether the model was registered with vLLM.
- Custom Models Wrestle with vLLM: A member reported difficulties implementing a custom model with a custom architecture within vLLM.
- Another member shared a vLLM guide on implementing custom models to assist with the implementation.
LLM Agents (Berkeley MOOC) Discord
- Lambda Workshop Teaches Agentic AI: A Lambda workshop is teaching how to build agentic applications using Lambdaâs Inference API, optimizing agent performance, and deploying agents in production.
- Participants can apply for $100 serverless API credits by Friday 5/16 via this form.
- Nobel FutureTech Fireside Chat Details: A fireside chat co-hosted by Nobel FutureTech Group and Berkeley RDI is providing insights into the innovative ecosystem of the Nobel FutureTech Genius Club.
- The session gives information on mentorship, funding, and collaboration opportunities, with a livestream link available.
tinygrad (George Hotz) Discord
- Topk Bounty Asks for Revision: A user questioned the âmove topkâ bounty requirements, noting topk, masked_select, and randperm_generator already run off the CPU.
- They proposed the bounty be revised due to functions like index_put_impl and index_tensor still requiring attention.
- Index Functions Awaiting GPU Acceleration: It was noted that index_put_impl and index_tensor are still running on the CPU.
- The suggestion was to target these and other functions in the torch backend for GPU offloading.
MLOps @Chipro Discord
- Webinar Announced: Agentic Enrichment with Featureform: A live webinar on Agentic Enrichment with Simba Khadder, Founder of Featureform, is scheduled for Tuesday, May 27th at 8 AM PT and will cover how to unlock data for AI agents using MCP, and you can sign up here.
- The webinar will discuss the missing layer of infrastructure needed for AI agents to access real-time, internal business data, highlighting the limitations agents face due to data access rather than intelligence.
- Featureform Tackles LLM Data Access: The webinar will cover the need for better internal data access to unlock the full potential of AI agents, detailing the three key components of agentic enrichment: semantic catalog, low latency serving, and governance.
- It will demonstrate how Featureform enables this data access, making agents more useful and powerful in production environments, with real-world examples of improved workflows in AI systems.
Codeium (Windsurf) Discord
- Windsurf Floats SWE-1 Models: Windsurf has launched the SWE-1 family of software engineering models, including SWE-1, SWE-1-lite, and SWE-1-mini, detailed in a blog post and launch video.
- Windsurf claims the new models will accelerate software development by 99%.
- SWE-1 Shows Claude 3.5-Level Performance: The SWE-1 model is advertised to have high-reasoning, tool-capable, and Cascade-optimized performance comparable to Claude 3.5 but at a reduced cost.
- The models are trained using a unique âflow awarenessâ approach, understanding the timeline between humans and AI across development surfaces.
AI21 Labs (Jamba) Discord
- AI Tinkerers to Host AI21 Meetups: AI Tinkerers will host meetups with AI21 to discuss the newly announced Maestro platform, a tool for planning and orchestration.
- The meetups are free and open to the public with registration required for events in New York City, Paris, and San Francisco - see links above.
- AI21 Launches Maestro Planning Platform: AI21 Labs recently revealed Maestro, a platform designed for planning and orchestration in AI systems.
- This platform seeks to equip developers with the necessary tools and infrastructure to construct more complex and effective AI applications.
The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Perplexity AI â· #general (869 messagesđ„đ„đ„):
Perplexity Pro role, App answers cannot be read on the web version, Research function down, Deep Research broken, Deepsearch rate limits
- Pro role troubles, solved by mods: A member inquired about obtaining the Perplexity Pro role after purchasing the Pro plan; a moderator sorted it out manually and stated that they are currently reworking how the Pro role is granted on Discord.
- Another user confirmed that having the same email for Discord and Perplexity is not a requirement.
- Mobile App answers causing Webpage Glitches: Users reported that answers obtained from the Perplexity mobile app cannot be read on the web version, prompting an error message; this issue doesnât occur when the answer is obtained directly from the webpage, as demonstrated here.
- A user reported this issue to customer service 10 hours ago and they said the issue has been reported to the technicians but it has not been fixed yet.
- Research function is temporarily offline: Some users asked, Is the Research function down?, with some reporting that they were unable to use Perplexity, and the status page reflecting an outage.
- This was fixed within the hour, and some users who were experiencing issues being logged out of their account found that clearing their browser cache resolved the problem.
- Deep Research Doomed? Users question value: Multiple users reported that Deep Research mode appeared to be broken, citing the system defaulting to regular search on the web, using only a limited number of sources, and pro search reading only 10 sources instead of 20.
- One user stated: I legit just use Perplexity because it reads 20-40 sources per query. If thatâs dumbed down, I see no reason to pay for it.
- Debate Grok vs Perplexity: Members discussed the use of Grok, with one sharing We donât talk about it. Itâs worse than regular search in some cases, while others discussed the Deepsearch rate limits and response quality.
- Another noted that Grok scrapes the web very well but it sucks at one shot sometimes but the rate limits compensate for that if you elaborate and cuss enough.
Perplexity AI â· #sharing (1 messages):
23andMe files for Chapter 11
- 23andMe Prepares for Restructuring: A link was shared regarding 23andMe filing for Chapter 11 bankruptcy.
- Legal and Financial Restructuring Imminent: The Chapter 11 filing suggests that 23andMe is facing significant financial challenges and is seeking legal protection to reorganize its debts and operations.
Perplexity AI â· #pplx-api (6 messages):
Sonar API, Perplexity hackathon Credits, sonar model
- Sonar API Key Woes Plague Hackathon: A member is facing issues acquiring the Sonar API for a hackathon project because it requires credit card details, which they donât have, and seeks a way to access the API for free, solely for demo purposes.
- Another member also reports the same issue.
- Hackathon Credits MIA?: A member reports not receiving their Perplexity hackathon credits for 2 days and requests assistance.
- The member is looking to use the API to gather more info on a list of contacts, like it does on the web.
- Sonar API output mismatch: A member reports that the answers generated using the Sonar API are different from those generated in the Perplexity AI playground.
- They speculate it might be due to their system prompt, and provides a link to the Perplexity AI Model Cards.
Manus.im Discord â· #general (472 messagesđ„đ„đ„):
Manus Vibe Coding Livestream, Johnny's credit farming, Femboys, invite links gone, Credits Usage
- Manus Hosts âVibe Codingâ Livestream: Manus hosted a âVibe Coding with Manusâ livestream from San Francisco, featuring the Manus SF Fellows, available on YouTube.
- The livestream showcased coding projects, fostering community engagement.
- Johnny the Credit-Farming Genius: A user humorously highlights how Johnny is âfarming Manus dailyâ compared to another user paying for manus to make a femboy detector
- This illustrates a humorous contrast between exploiting the platform for free credits and using it for entertainment.
- The Femboy Detector App: A user created an app that determines if a person is a femboy or not, using Function Calling to get currency rates from wise.com.
- The app outputs âfemboyâ for male names, leading to comedic accusations of users being labeled femboys: MANUSâS API IS LYING!
- Some Users Lost Invite Link Feature: Some users have found that the option to generate invite links has disappeared from the UI.
- It seems that the invite link generation feature is no longer available to all users. Some speculate this affects free users only.
- Credits Usage Concerns Arise: Users expressed concerns about the cost of credits, with one noting that a PDF report cost 500 credits and a DCF on Google cost 1500 credits.
- Some believe the credit usage is too expensive, especially for complex tasks and look forward to alipay as a payment method.
LMArena â· #general (401 messagesđ„đ„):
Gemini 2.5 Pro Reasoning Time, Elon's Grok 3.5 Release Delay, LLMs Steering Attention, LMArena Model Funding, O3 Pro on Arena
- Geminiyo Sparks Performance Gap Thoughts: Members pondered whether Gemini 2.5 Pro will have longer reasoning times compared to other reasoning models, given it is a heavier model.
- Grok 3.5 Release Delayed Due to Elonâs Far-Right Fine-Tuning?: Theorizing that Elon delayed the release of Grok 3.5 because fine-tuning it to the far right wasnât successful, one member was promptly corrected by another noting it was false information.
- Another chimed in, Why would he include political stuff into the system prompt? while attaching a screenshot about the dangers of programming society with social networks and LLMs.
- LLMs Canât Steer Attention Like You Think: Despite claims that LLMs can steer attention, particularly with Grokâs Twitter timeline, one member clarified that this isnât steering attention where you need them to, but rather a plain announcement.
- They further linked to Transluceâs observability interface as a tool to play with feature steering, but cautioned itâs not particularly useful in practice yet.
- Labs Give LMArena Credits, not Cash: In a discussion about how LMArena funds its models, a member pointed out that itâs not just the companies themselves who pay for inferences, suggesting credit grants are provided to LMArena instead of direct monetary payments.
- Another member added that big labs give LMArena endpoints for their models, with valuable data on human preference being collected as a result.
- O3 Pro Arrival on Arena Remains a Mystery: Despite speculation about an O3 Pro release, a member stated O3 pro wont come to arena lol, to which a moderator replied I canât confirm if/when new models are arriving on arena, but will be sure to put out announcements when I can.
Unsloth AI (Daniel Han) â· #general (208 messagesđ„đ„):
Quantized versions (GGUFs, QNL), Multi-GPU Finetuning, SLM vs LLM, Qwen3 model for translation, H200 Temp
- QNL is Faster than Standard GGUFs: QNL is reportedly faster than standard GGUFs, but performance benchmarks are still pending; Unsloth Dynamic 2.0 GGUFs documentation provides more details.
- One member inquired about combining GPTQv[1-2] with GGUF + imatrix for accuracy improvements.
- Multi-GPU finetuning via Accelerate: To achieve multi-GPU finetuning, users can try using Accelerate with Unsloth (a generic training-loop skeleton appears at the end of this channel's summary).
- While consumer-grade GPUs like the 3090 offer 24GB VRAM, companies often prefer H100s for local AI tasks, despite not being top-level in datacenter contexts.
- LLMs vs SLMs: While smaller models (SLMs) may not be as smart out-of-the-box, they can become competitive through fine-tuning on specific tasks; Qwen3-4B is a good starting choice for models needing decent reasoning power.
- One member highlighted that Qwen3 4B is even better than Mistral 7B.
- Qwen3 translation capabilities: For translating datasets in Mandarin, Qwen3 is suggested because of its extensive pretraining data, with the recommendation of 14B parameters for adequate performance.
- One user reported success using 30B models with Ollama on Kaggle, citing speed requirements for processing millions of strings.
- H200 Temperature Expectations: Normal operating temperatures for H200 cards in cloud environments like Runpod are around 80-85°C, which is considered acceptable for production cards.
- One user reported running below 80°C, indicating good thermal performance.
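For reference, a generic Accelerate training-loop skeleton looks like the sketch below; this is standard library usage rather than anything Unsloth-specific, and the tiny model and synthetic data are placeholders:

```python
# Generic multi-GPU-ready loop via Accelerate; run with `accelerate launch script.py`.
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(128, 2)                      # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(256, 128),
                                   torch.randint(0, 2, (256,))),
    batch_size=32,
)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    accelerator.backward(loss)                       # replaces loss.backward()
    optimizer.step()
```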
Unsloth AI (Daniel Han) â· #off-topic (10 messagesđ„):
Fine-tuning AI models, Jarvis-like AI clones, Continuous Thought Machine on Flappy Bird
- Fine-Tuning AI Models: Heavenâs Take: A member suggested that fine-tuning would be possible to study an AI model.
- Another member agreed, mentioning their years of experience in AI-related projects.
- Jarvis Clones Found on YouTube and GitHub: A member stated that there are dozens of Jarvis-like clones on YouTube and GitHub.
- They suggested a quick Google search to find them.
- Continuous Thought Machine Tries Flappy Bird: A member trained a Continuous Thought Machine (CTM) from the CTM paper on Flappy Bird.
- After ~750 episodes, it can only make one pipe gap sometimes, indicating the difficulty of the task.
Unsloth AI (Daniel Han) â· #help (120 messagesđ„đ„):
Qwen3 DPO Training, Orpheus-3B Fine-tuning Issues, Mistral 7b VRAM Usage, Epoch Display Bug, BLIP2 and Transformers
- Qwen3 Gets DPOâed: A user inquired about training Qwen3 using DPO and whether a sample notebook similar to Zephyr exists, to which another user responded with a link to their Kaggle notebook for Llama 3 fine-tuning on multi-GPU setup, indicating a similar approach might work.
- The notebook includes steps for conversation models, using accelerator, and other optimizations useful for DPO.
- Orpheus-3Bâs Loss Landscape & Colab Crashes: A user reported fluctuating loss (4.5-3.9) while fine-tuning the Orpheus-3B TTS model using Unsloth and inquired about its normalcy.
- Another user responded that this is normal and that multiple epochs can potentially reduce the loss to 1, also sharing a code snippet for using SNAC for inference and troubleshooting Colab crashes due to account login access issues.
- GPU RAM gets Mistralâed: A user faced issues training Mistral 7B on an NVIDIA RTX A2000 (8 GB) despite Unsloth benchmarks suggesting it should be possible with QLoRA 4-bit, and sought advice on potential misconfigurations.
- A user pointed out that batch size, r value, and max_seq_length significantly impact VRAM usage, also suggesting the user ensures no other processes are consuming GPU memory.
- LLM Epoch Tracker Bug Squashed: A user noticed a discrepancy in the training output, where the progress bar appeared full after completing all steps, but the epoch count remained at 1/2, leading to confusion about whether the training completed both epochs.
- Another user suggested it might be a minor display issue, as the number of examples and steps aligned with completing two epochs.
- Vision Models ride LLM Compatibility Wave: A user asked if Unsloth supports BLIP2 finetuning and how to check if Transformers supports it.
- A user confirmed Transformers supports BLIP2, referencing a PEFT notebook and Hugging Face documentation and stating Unsloth is pretty much compatible with any transformers model.
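In that spirit, a hedged sketch of LoRA fine-tuning BLIP2 with PEFT might look like the following; the target module names are assumptions to verify against the model's actual layer names:

```python
# LoRA on BLIP2 via PEFT; adapter hyperparameters are illustrative.
from transformers import Blip2ForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b")
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj"],  # assumed attention projections
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the small LoRA adapters train
```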
Unsloth AI (Daniel Han) â· #showcase (3 messages):
WebGPU Vision-LLM app, Geminized Qwen3 MoE
- AsianMOM roasts you in-browser: A member created AsianMOM, a WebGPU Vision-LLM app that roasts you like ur mom in-browser.
- It uses SmolVLM 500M and LLama 3.2 1B and works right in your browser without having to install anything, thanks to Transformers.js and HF.
- Geminized Qwen3 MoE released: A member released a Geminized version of Qwen3 MoE, a merged bf16 LoRA trained on ~450 examples with 1 or 2 turns; around 250 of these examples are diverse, human-prompted conversations taken directly from Gemini 2.5.
- Use âYou are an assistant with reasoning capabilities.â system prompt to trigger Gemini style reasoning.
Unsloth AI (Daniel Han) â· #research (5 messages):
Intellect-2, Solo Author, Mechanistic-Interpretable Ethics
- Intellect-2 Boasts Performance: A member shared an ArXiv link to Intellect-2, implying that itâs performing well and bragging.
- The same member shared a news.smol.ai link related to the ArXiv paper.
- Solo Author Achievement: A member highlighted a personal website noting that it was a solo author achievement.
- It seems like a nod to the effort involved in solo research and development.
- Ethics Interpreted Mechanistically: A member shared a link to a Hugging Face Space called Mechanistic-Interpretable Ethics-Cell automata.
- The project is hosted on Hugging Face Spaces.
aider (Paul Gauthier) â· #general (193 messagesđ„đ„):
Grok's Alignment Issues, Gemini 2.5 Pro Removal, Free API Credits, Aider Token Usage, Consolidate Command for Aider
- Grokâs Alignment Issues and Xâs Joke Status: Concerns arose about Grok, Elon Muskâs LLM, after it made wild claims about white genocide, leading some to distrust it for development and consider xAI a joke.
- One member joked that they only used xAI to ask who the biggest threat to democracy on X was, and it admitted it was Elon.
- Gemini 2.5 Proâs Brief Stint in Copilot: Users noted that Gemini 2.5 Pro was briefly available in Copilot, but it has been removed.
- One speculated that Microsoft might invest more in open-weight models due to its rocky relationship with OpenAI.
- Generous API Credit Giveaway: A user offered free API credits for Gemini, Claude, and OpenAI to those building interesting projects, to test new infrastructure.
- Others joked that free tokens are free tokens, or expressed interest in using those tokens to contribute to the aider project.
- Managing Aider Token Usage Savvy: Users discussed the importance of managing token usage in Aider, recommending the use of /clear to reduce context and /add for only the necessary files.
- They suggested using free models like Gemini 2.5 Flash via Google AI Studio, or OpenRouter models like Deepseek v3 0324, paired with copy-pasting from Gemini 2.5 Pro.
- Aider is getting a /consolidate command: A member is planning to add a /consolidate command to aider to roll each long chat into a single, fully-specified prompt via a fresh single-turn request to the main model, in response to the LLMs Get Lost In Multi-Turn Conversation paper.
- The goal is to address the issue of LLMs losing context in multi-turn conversations by rewriting previous turns into a clean prompt.
aider (Paul Gauthier) â· #questions-and-tips (99 messagesđ„đ„):
Commit Prompt, Rate limited, Black box for code, aider config, home directory
- Streamline commit messages: A user shared a cool tip for YML configurations: adding AIDER_COMMIT_PROMPT="Respond with exactly two plain text lines and nothing else. Line 1: your commit title (max five words, no labels or prefixes). Line 2: Changes: file1,file2,... . Do not include the word Title or any markdown, headings, quotes, or extra text." generates concise commit messages.
- Another member shared what the base prompt for the commit message should look like, formatted as <type>: <description>, e.g. fix: add feature instead of added feature.
- Configure dark mode, YAML style: A user asked how to enable the black box for code snippets rather than the default white box.
- Another member pointed them to the configuration documentation and suggested setting dark-mode: true in their .aider.conf.yml file.
- Thinking budget for Gemini?: A member inquired about the behavior of max tokens for Gemini, asking if it truncates or actively budgets its response.
- A fellow member suggested checking the docs for thinking_budget and noted that there may be related extra parameters.
- Configuring O3 and GPT-4.1: A user asked how to use o3 (high) + gpt-4.1 in --architect mode.
- Another member provided a link to the architect documentation and a sample command: aider --model openrouter/google/gemini-2.5-pro-exp-03-25 --editor-model openai/gpt-4.1 --architect.
- Insufficient funds for O3!: A user encountered an error while running the command aider --model openrouter/openai/o3 --editor-model openrouter/openai/gpt-4.1 --architect.
- It turned out they needed funds in their OpenAI settings as well as OpenRouter to call the O3 API.
aider (Paul Gauthier) â· #links (1 messages):
p0lyg0n: https://github.com/pig-dot-dev/muscle-mem
OpenAI â· #annnouncements (1 messages):
OpenAI to Z Challenge, Archaeological Sites, Amazon, GPT-4.1
- OpenAI Launches Amazon Archaeology Quest!: OpenAI announced the OpenAI to Z Challenge using o3, o4-mini, or GPT-4.1 to discover previously unknown archaeological sites in the Amazon, inviting participants to share progress on X with the hashtag #OpenAItoZ.
- The challenge details are available at the OpenAI website.
- Explore the Amazon using OpenAIâs latest tools!: Participants are encouraged to leverage OpenAIâs o3, o4-mini, and GPT-4.1 models in a quest to unearth undiscovered archaeological treasures nestled within the Amazon rainforest.
- Share your journey and findings on X using the hashtag #OpenAItoZ to connect with fellow explorers and showcase your contributions.
OpenAI â· #ai-discussions (195 messagesđ„đ„):
GPT-4.1 Mini Smarts, GPT-5 Release, Gemini 2.5 Pro, OpenAI's Open-Source Model, Context Window Expansion
- GPT-4.1 Coding Focus Sparks Debate: Members debated the merits of GPT-4.1 versus GPT-4o, with some asserting that 4.1 is superior for coding while others found 4o to be more intuitive overall, and several claimed 4.1 mini is the best small model.
- Some users shared their prompt testing results and experiences across specific dialects to show their GPT-4.1 preferences, while sharing screenshots of the models.
- GPT-5 Speculation Ramps Up Release Date: The community discussed the potential release timeline for GPT-5, with some anticipating a launch in summer (June-September) while others suggested a later release around August-December, and one member said, Connecting the dots here - Sam said in the Feb 12 announcement that GPT-5 will solve the confusing names issue.
- Members expect the unified o-models will be unified with GPT in the GPT-5 release.
- Gemini 2.5 Pro Wins Fans but faces High Cost: Users praised Gemini 2.5 Pro, noting its coding abilities, reasoning skills, and large context window, with one saying Gemini 2.5 Pro is actually really good at coding, while acknowledging that its free availability is likely temporary due to its high running costs.
- One user has switched to Gemini 2.5 Pro due to its 1 million context window stating that they simply canât work with GPTâs tiny 32k context window anymore.
- OpenAIâs Open-Source Model Debuts in Mid-June: The community is anticipating the release of OpenAIâs open-source model around the middle of June, but some doubt they will be able to run it locally.
- They predict the model will probably be >30B params but <100B.
- Bidding on Vast.ai Is Addictive: A member claimed that bidding on interruptible compute on vast.ai is addictive if you have a workflow suited for it, and that they were running like $40,000 worth of gpus for less than a dollar/hr lol.
- They suggest that the workflow is best if suited for it.
OpenAI â· #gpt-4-discussions (21 messagesđ„):
GPT-4.1 vs GPT-4o, Fine-tuning Datasets for Story Generation, Mathematics in GPT-4.1
- GPT-4.1 races ahead of GPT-4o: Members discussed whether the new GPT-4.1 version is better than o3 mini and 4o models, citing the official OpenAI announcement that itâs slightly better at instruction following and better at remembering stuff.
- GPT-4.1âs existence called into question: One user jokingly suggested that GPT-4.1 isnât real, referencing a post from Sam Altman on X.
- Fine-tuning models for creative story telling: A member asked which model is best for fine-tuning with a dataset of 200 stories to consistently create similar, creative stories.
- GPT-4.1âs math skills questioned: One user asked if GPT-4.1 is good at mathematics, another responded mostly no unless 4o is good in it.
OpenAI â· #prompt-engineering (2 messages):
Research GPT Feedback, Multilingual Capabilities
- Research GPT Seeks Feedback: A member is requesting feedback on their Research GPT, in its final stages of refinement.
- The creator is particularly interested in identifying potential issues in English, as it is not their first language, while noting that Korean functionality is satisfactory.
- GPTâs Multilingual Capabilities: The GPT model appears to function effectively in Korean, indicating robust multilingual support.
- However, the request for feedback highlights the importance of verifying performance across different languages, especially when the developerâs proficiency varies.
OpenAI â· #api-discussions (2 messages):
GPT feedback, English language issues, Korean language model performance
- Feedback sought for Research GPT: A member is seeking feedback on a Research GPT and believes they are nearing completion.
- They are performing final checks to iron out possible remaining issues.
- Korean GPT excels, English GPT flounders?: The member notes that their Research GPT functions very nicely in Korean, but they are unsure about potential problems in English due to it not being their first language.
Cursor Community â· #general (192 messagesđ„đ„):
Cursor Pro vs. Free, Client Version Details, Claude 3.5 Sonnet, Gemini Pro Preview, Agent Rules Neglected
- Confusions on Cursor Pro features: A member asked whether they can use all models with their current plan and which models require an API key.
- Another member clarified that the models should not require an API key, as all of them are hosted and supported by Cursor itself.
- Cursorâs Client Version Details: A member shared their Cursor client details (Version 0.49.6) including VSCode version 1.96.2, Electron 34.3.4, Chromium 132.0.6834.210, and Node.js 20.18.3 while reporting a âno restartâ message.
- It was suspected that the update might be arriving late for Canadian time zones.
- Claude 3.5 Sonnet Stuck in a Loop: A member reported that Claude 3.5 Sonnet was getting stuck in a loop, with attached images for reference.
- Eventually, it was saved by context limit.
- Resetting Contexts with a Slash: Members discussed using the /reset command to clear the context in Cursor, but some users do not like this functionality.
- One member noted that after typing /reset, nothing will show up, it will be executed silently.
- Gemini struggles editing: A member reported that Gemini Pro Preview spends considerable time deciding on code changes and then struggles to apply those edits.
- Another member mentioned that version 0.50.4 promised improved apply functionality.
LM Studio â· #general (58 messagesđ„đ„):
Expanding left sidebar, Llama issues with fantasy prompts, Token loss and punctuation issues, LM Studio API Vision Endpoint, Reka Flash Presets
- Discord sidebar expansion sought: A user inquired about expanding the left sidebar in Discord to permanently display icon names without hovering.
- Fantasy Prompts funking with Llamas: A user reported issues with various Llama models (3.1/3.3) producing undesirable outputs when used with fantasy-themed prompts.
- The user attached screenshots illustrating token loss, punctuation issues, and even partial word omissions.
- LM Studioâs Vision API Endpoint Explored: A user sought guidance on providing images to a vision-enabled LLM via the LM Studio API without using the Python library, specifically when using the OpenAI endpoint.
- Another user pointed to the LM Studio documentation and highlighted a cURL example demonstrating how to pass an image URL (a minimal request sketch appears at the end of this channel's summary).
- Llama.cpp Substitution Shenanigans: A user asked about using a custom-built llama.cpp with LM Studio, but was told it is closed source and there is no way to substitute the Llama.cpp client reference.
- A developer mentioned that while replacing llama.dll might be possible, it could lead to instability due to function signature changes; fully supporting âbring your own engineâ is on the roadmap.
- âlm-serverâ Channel gets new life: Users noted that the lm-server channel was axed, suggesting the self-hosting channel is a better place to self-host and is the place for API/server problems internal to LMStudio.
- The channel had too much overlap with another channel, so it was renamed and unarchived for the users to try out, and see if people found it useful.
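Regarding the vision-endpoint question above, the OpenAI-compatible request shape is roughly as follows; this is a hedged sketch where the port and model name are local defaults and placeholders to verify in the LM Studio UI:

```python
# Send an image to a local vision model over the OpenAI-style HTTP endpoint.
import base64
import requests

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "local-vision-model",  # whatever LM Studio lists for your model
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
}
r = requests.post("http://localhost:1234/v1/chat/completions", json=payload)
print(r.json()["choices"][0]["message"]["content"])
```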
LM Studio â· #hardware-discussion (128 messagesđ„đ„):
VRAM Importance vs DRAM, Qwen Models, KV Cache, 7900 XTX, 5060 Ti
- VRAM Dominates for Model Speed: It was mentioned that DRAM speed barely matters if you keep your model in VRAM and you should always keep your model within VRAM if you want reasonable speed.
- Qwen3 Models Explored for VRAM Efficiency: Members suggested trying Qwen3 14b q4 within a 24GB VRAM and mentioned that a 30b class model at q4 will take up around 20gb itself.
- Others suggested that using anything below q4 for models of this size is questionable, or to try Qwen3 30b MoE q4 with partial offload to DRAM if the goal is only 10 t/s.
- KV Cache Location Impacts Performance: One member used CPU/RAM KV Cache offload, noting that RAM speeds matter in this case and they plan to test if vram-ram GPU shared memory is better or not.
- Another member noted they can get 20+ t/s full CPU at q4 with a 14900K and 100GB/s DDR5.
- 7900 XTX Tempts Over Nvidia: One member bought a 7900XTX card and is about to sell one of their Nvidia cards.
- They mentioned being mildly annoyed with dual card driver instability issues with their 4080s and 4060ti cards and that the 5060 ti is going back as a return.
- 5060 Ti 8GB Model Falls into No Manâs Land: It was discussed that the 8GB 5060 is not good for AI and also not cheap enough for those scraping the bottom of the barrel for gaming, so anyone eyeing it may as well get the 16GB version.
Yannick Kilcher â· #general (155 messagesđ„đ„):
AlphaEvolve Analysis, LLM vs. System Role, Gemini 2.5 Pro, LiquidAI Skepticism, Hybrid AI Approaches
- AlphaEvolve: LLMs in Trench Coats?: Members debate whether AlphaEvolve is merely an LLM-driven agentic loop or a more sophisticated system with an evolutionary algorithm, referencing Google DeepMindâs blog.
- Some argue its success is primarily due to powerful LLMs like Gemini, while others emphasize the crucial role of the evolutionary engine and verifier components, pointing to the ablation section in the paper.
- Gemini 2.5 Pro: Coding Ace?: Members praise Gemini 2.5 Proâs coding abilities, especially its zero-shot performance, but one states that it had sorta random refusals when it came out when asked for cyber ethics homework.
- There is speculation that fine-tuning with verified rewards and rejecting non-compiling outputs contributes to its coding prowess. This is also what helps with reasoning.
- LiquidAI: Hype or Hope?: Skepticism surrounds LiquidAI and its liquid foundational models, with one member initially dismissing them.
- After further investigation, another compared LiquidAI to State Space Models (SSMs) like Mamba, Gated DeltaNet, or RWKV 7, noting similarities but favoring the treatment of SSMs. Check out LiquidAIâs Research.
- AI Hybridization: The Next Frontier?: Discussions emerge about hybrid approaches combining neural (LLMs), symbolic (DreamCoder), evolutionary (novelty search), RL, and biologically-inspired architectures.
- There is debate on whether scaling current approaches is sufficient or if paradigm-shifting hybrid models are necessary for achieving more advanced intelligence, or even Artificial General Intelligence.
- Absolute Zero Improves Models Without Data: Members discussed the recent paper Absolute Zero, which improves models without any data to begin with: the LLMs generate it all and verify that it is correct.
- In this framework, the LLM is trained to perform three tasks: Y = F(X), Y = F(?), and Y = ?(X).
Yannick Kilcher â· #paper-discussion (3 messages):
Sakana AI, AI Scientist Paper, Language Models, Reasoning Mistakes, Error Correction
- Sakana AI faces Skepticism: Skepticism arose around Sakana AI, with concerns that their latest AI Scientist paper felt overly focused on marketing.
- Math Language Models Can Learn From Errors: Discussion was sparked around the paper, Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems, which explores improving reasoning accuracy by incorporating error-correction data directly into the pretraining stage.
- Error Correction Boosts Reasoning: The paper indicates that pretraining with error-correction data helps language models achieve higher reasoning accuracy through simple auto-regression, compared to pretraining on error-free data, as detailed in the associated blog post and YouTube video.
- Self-Correction via Multi-Round Prompting: The research explores how pretrained language models can self-correct mistakes via multi-round prompting, focusing on the usefulness of incorporating error-correction data directly into the pretraining stage, as outlined in arXiv:2505.09343.
Yannick Kilcher â· #ml-news (19 messagesđ„):
Stable Audio Open Small, MythoMax-L2-13B Samsung Release, Meta researchers leaving
- Stability AI and ARM drop Stable Audio Open Small: Stability AI and ARM released Stable Audio Open Small, enabling real-world deployment for on-device audio control.
- Samsungâs Accidental MythoMax-L2-13B Release: Samsung seemingly accidentally released the MythoMax-L2-13B roleplay model, which was quickly removed after being spotted on Hugging Face.
- A member joked, "Can someone do that for OpenAI and 'release' GPT4Chan? Or Anthropic, that would be priceless."
- Metaâs Brain Drain blamed for LLama 4 fumbles: A member expressed bafflement at Metaâs struggles with LLama 4, despite their resources and history of success with LLama models.
- Another member suggested that the departure of original researchers and research leadership failure may be responsible, while also suggesting that Thinking in Latent Space is in a different org than GenAI.
OpenRouter (Alex Atallah) â· #announcements (1 messages):
Chatroom shortcut, Model Icons, Quick Chat
- Chatroom Shortcut emerges: Users can now click on model icons in the grid to initiate a quick chat with a specific model.
- This bypasses the need to open the entire group and manually remove the rest, streamlining the user experience as shown in the attached image.
- Streamline user experience: The new feature allows users to bypass the need to open the entire group.
- Users can start quick chats with individual models by simply clicking on the model icons, greatly improving efficiency.
OpenRouter (Alex Atallah) â· #general (105 messagesđ„đ„):
DeepSeek v3 MoE, Corvids cat food and bird food, Proxy for OpenAI, AlphaEvolve, Qwen3 /no_think bug
- DeepSeek v3 is a MoE Model: It was stated that DeepSeek v3 is a Mixture of Experts (MoE) model, meaning it activates only a subset of its parameters during inference.
- Even though all the parameters are loaded into VRAM, only the parameters relevant to the prompt are computed, making the inference speed much faster.
- Corvids Only Eat Peanuts and Cat Food: Users observed that corvids (crows and magpies) are only eating peanuts and cat food instead of normal bird food.
- It was suggested that urban corvids have adapted to a diet closer to trash and prefer alternatives to standard bird food, as their diet has changed over many generations.
- Bypass Country Restriction with a Proxy: A user shared an error message from OpenAI indicating that their country, region, or territory is not supported.
- Another user suggested using a proxy to circumvent the geographic restrictions causing the error.
- Qwen3 needs toggling to THINK: For qwen3, thinking must be explicitly toggled on and off by appending /think or /no_think to the prompt (a request sketch appears at the end of this channel's summary).
- It was reported that the /no_think functionality had a bug and OpenRouter needs to auto-route away.
- Gemini 2.5 Pro Reasoning Chunks Deemed Useless: A user reported that Gemini 2.5 Proâs reasoning chunks are useless, stating it only indicates the userâs query and confirms the work being done towards it.
- They mentioned it just presents summaries such as âThe user is asking for X. I have done some work towards Xâ.
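For the Qwen3 soft switch mentioned above, the toggle is simply appended to the user message; a minimal sketch, where the base URL and model slug are placeholders to check against the provider's model list:

```python
# Toggling Qwen3 thinking off for one request via an OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="YOUR_KEY")
resp = client.chat.completions.create(
    model="qwen/qwen3-30b-a3b",  # example slug; verify before use
    messages=[
        {"role": "user",
         "content": "Summarize mixture-of-experts routing in one line. /no_think"},
    ],
)
print(resp.choices[0].message.content)
```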
HuggingFace â· #general (67 messagesđ„đ„):
Fine Tuning Llama on SageMaker, LibreChat privacy concerns, GraphQL schema code completion, Strcoder2 model distillation for Python, Emotion classification model accuracy
- SageMaker Llama Fine-Tuning Guidance Sought: A member asked for guidance on how to fine-tune Llama using SageMaker training and requested relevant tutorials.
- Another member provided links to Hugging Face documentation and a related GitHub repository.
- LibreChat Privacy Questioned: A member inquired about potential privacy concerns when using the officially hosted LibreChat at librechat-librechat.hf.space/login.
- Another member suggested that typical website privacy concerns apply and pointed to the LibreChat Docker image as the base for the Hugging Face Space implementation.
- Distill Starcoder2, Only Python: A member asked how to reduce the starcoder2 model size to focus solely on Python knowledge, effectively distilling the model.
- A member suggested that extracting specific language knowledge would be difficult and recommended searching for smaller, specialized models on the BigCode Models Leaderboard or The Big Benchmarks Collection.
- DistilRobertaâs Emotion Accuracy: A member was confused about the accuracy of emotion classification using the DistilRoberta model, given its popularity and high download numbers.
- They questioned whether truncating long paragraphs to the modelâs maximum length of 512 tokens would affect analysis and whether sentence-level analysis would be more appropriate.
- New Samsung Models Emerge: A member noted the release of new models from Samsung and shared links to the MuTokenZero2-32B model and MythoMax-L2-13B model.
- Another member indicated that the models were under construction.
HuggingFace â· #today-im-learning (2 messages):
LangGraph
- Crafting LangGraphs for Agentic Workflows: Members shared a helpful link to the LangGraph documentation (LangGraph Course), highlighting its use in building agentic workflows.
- LangGraph is useful for managing complex conversational flows and multi-agent systems.
- LangGraph powers Conversational Flows: LangGraph excels at overseeing intricate dialogue paths and multi-agent setups.
- It facilitates the structured management of interactions, allowing for more robust and adaptable AI agent behaviors.
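A minimal LangGraph sketch of such a workflow is below; the state shape and node names are illustrative assumptions rather than anything from the course:

```python
# Two-node LangGraph workflow: research -> respond.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    answer: str

def research(state: State) -> dict:
    # a real node would call tools or an LLM here
    return {"answer": f"notes on {state['question']}"}

def respond(state: State) -> dict:
    return {"answer": state["answer"].upper()}

graph = StateGraph(State)
graph.add_node("research", research)
graph.add_node("respond", respond)
graph.set_entry_point("research")
graph.add_edge("research", "respond")
graph.add_edge("respond", END)
app = graph.compile()
print(app.invoke({"question": "agentic workflows", "answer": ""}))
```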
HuggingFace â· #i-made-this (6 messages):
Realistic Text-To-Speech, WebGPU Vision-LLM, AsianMOM, SmolVLM, Federated Learning AI
- Realistic Text-To-Speech Generator Makes Waves!: A member shared a link to a realistic text-to-speech generator, claiming itâs almost as good as Dia 1.6B but free and unlimited: Hugging Face Space.
- The tool supports other languages, but the results may not be as good as English.
- AsianMOM Roasts You In-Browser!: A member introduced AsianMOM, a WebGPU Vision-LLM app that roasts you like your mom in-browser using SmolVLM (500M) and LLama 3.2 (1B), and available on asianmom.kuber.studio.
- It might be a little bit slow on first try (takes about 3 mins) when it installs models, but it caches it so itâs way faster the second time.
- Delving into Democratized AI Access: The creator of AsianMOM expressed that this funny little project genuinely taught them so much about WebML and Vision models, noting that the technologies weâre getting with WebML will 100% democratize AI access.
- They shared a GitHub repo for those interested.
- Federated Learning is all the Rage: A member shared a link to a LinkedIn post about Rag Federated Learning AI - linkedin.com.
HuggingFace â· #reading-group (1 messages):
cleonorris: It is monthly, but this is actually our last one before the summer!
HuggingFace â· #NLP (6 messages):
DistilRoberta vs Roberta, Emotion Detection accuracy, GLoVE Paper, RobertaForTokenClassification extension, BERTopic
- DistilRobertaâs Popularity Questioned: A member questioned why the DistilRoberta version of a model has more downloads than Roberta, wondering if itâs better for emotion detection despite potential accuracy differences.
- Another member explained that DistilRoberta is a lighter version of Roberta, trained to balance computational cost and accuracy, but theoretically has lower accuracy due to fewer weights.
- Model Truncation Troubles: A user inquired about text truncation in HF models, confirming that large paragraphs are truncated to the modelâs max length (e.g., 512 tokens), and asked if looping and averaging scores would be necessary.
- The user was concerned that only a small portion of the paragraph would be analyzed if truncated (a chunk-and-average sketch appears at the end of this channel's summary).
- GloVe's vector differences debated: A member sought clarification on a line from the GloVe paper: "Since vector spaces are inherently linear structures, the most natural way to do this is with vector differences".
- The member questioned the logic behind this conclusion, wondering if it was based on heuristics rather than a concrete rationale.
- Training Data Issues plague Emotion Detection Models: A member shared the arXiv link noting that the training data relies on crowdsourced annotations, creating learning pattern artifacts.
- For example, the word âhungryâ might incorrectly trigger an anger response, regardless of context.
- BERTopic Suggested for Topic Extraction: Instead of an emotion detection model, a member suggested using BERTopic for finding topics in text and linked to the BERTopic documentation.
- BERTopic was suggested as a better solution for extracting and finding topics in text.
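On the truncation question above, one workable pattern is to chunk the text and average the per-chunk scores; a hedged sketch, where the checkpoint is likely the DistilRoberta emotion model under discussion and word-based chunking only approximates the 512-token limit:

```python
# Naive chunk-and-average emotion scoring for long paragraphs.
from collections import defaultdict
from transformers import pipeline

clf = pipeline("text-classification",
               model="j-hartmann/emotion-english-distilroberta-base",
               top_k=None)

def classify_long(text: str, words_per_chunk: int = 300) -> dict:
    words = text.split()
    chunks = [" ".join(words[i:i + words_per_chunk])
              for i in range(0, len(words), words_per_chunk)] or [""]
    totals = defaultdict(float)
    for chunk in chunks:
        for item in clf(chunk)[0]:  # list of {label, score} dicts per chunk
            totals[item["label"]] += item["score"]
    return {label: s / len(chunks) for label, s in totals.items()}
```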
HuggingFace â· #smol-course (4 messages):
Qwen, AI Agent course
- Qwen Usage Questioned: A member asked another about their Qwen model usage, suggesting more advanced methods might be needed.
- The user replied they are using Qwen 3 with basic prompts and tools, and requested pointers to more advanced techniques.
- AI Agent Course: A member mentioned starting an AI Agent course and encountering an error while building their first agent using the course template.
- The member is seeking assistance in resolving the error theyâre encountering.
HuggingFace â· #agents-course (10 messagesđ„):
Agent Template Errors, Course Completion, Final Unit Library, Certification Deadline
- First Agent Template triggers errors: A member reported that the First_agent_template worked initially but now consistently throws errors.
- Another member suggested checking if they'd run out of credits, while noting that the Unit 3 space (Unit 3 Agentic RAG) has an error and needs to be fixed.
- Course Completion without pay?: A member asked is there any way to complete this course without paying.
- One member suggested running the code locally.
- Token Limit Troubles for Final Project: A member ran out of tokens while working on the final project, rendering their API key unusable.
- They were advised to run it locally, however, they then asked how they could proceed with the submission if ran locally.
- Certification process is on a deadline: A member questioned the need for a deadline for the certification process.
- Another member explained that it is because the program will get outdated.
- Query on the Final Unit Library: A member inquired about the library used for the final unit and the reasoning behind its selection.
- Other members responded by encouraging them to run the code locally.
Notebook LM â· #use-cases (16 messagesđ„):
Audiobook format, Ducky Bedtime Stories, Focus at Work, Pomodoro Timer, YouTube Music integration
- Audiobook Format Discussion Erupts: Members discussed the format for audiobooks, with one suggesting dictating/performing source material in full from start to finish as written reading explicitly without skipping or modifying.
- They emphasized waiting until the end of each part for discussion/reflection and suggested finishing the part even if running out of time, rather than rushing or skipping material.
- âDucky Bedtime Storiesâ Audiobook: A member created an audiobook of ducky bedtime stories, reading with appropriate energy and enthusiasm, using different voices for each character.
- The expert speaker is a duck and can only say âQUACK!â, with the volume decreasing as the audiobook progresses.
- Notebook LM aids Deep Focus at Work: A user discovered that running Googleâs Notebook LM podcast of a chosen book with YouTube music on a low volume helped them enter deep focus at work.
- The user recommends a loop button on the podcast option and an integration with YouTube Music for a richer experience.
- Acid-Dipping Duck Enters the Chat: A user posted the message dip your genitals in acid followed by a duck emoji.
- This was followed by an attachment called The_Duck_Dive_-_History.wav which we could not analyze.
Notebook LM â· #general (76 messagesđ„đ„):
App Availability in Pakistan, Mobile App Program Limitations, Audio Generation Limits, Tabular Data with NLM, Scammy Links Warning
- Pakistan Access Predicament: A user inquired about the appâs lack of availability in Pakistan, and another suggested using a VPN to download it.
- Another user, an Android app tester, pointed out issues with voice review personalization within the studio.
- Mobile Appâs Monetary Model: A user questioned the mobile appâs daily audio overview limitations without a $20/month subscription, leading to a discussion about paywalls.
- Another user lamented the 3 audios/day generation limit with the complaint that Google is so greedy, to which another user responded by calling that sentiment greedy, as it is a free product.
- Podcast Plan Pays Off: A user hit their 100-podcast max on NotebookLM and intends to download the WAV files, convert them to video podcasts, and upload them to YouTube or Google Photos.
- Another user replied that this is smart, with another user replying I do something similar.
- Table Troubles Torment Tooling: A user inquired about supplying tabular data to NLM, but was advised by another that NLM isnât ideal for tabular data.
- The user was advised to consider using SQL or the AI formula in Google Sheets, or alternatively Gemini with BigQuery.
- Link Lurkers Launch Looming Lies: Users were cautioned about scammy links promising free gifts, easy cash, or amazing deals, and were advised to think twice before clicking such suspicious links.
- It was emphasized that links offering freebies are major red flags, and users should always protect their personal information.
Nous Research AI â· #announcements (1 messages):
Solana Foundation, Decentralized AI, Psyche
- Nous Research and Solana Foundation Host Event in NYC: Nous Research is co-hosting an event with the Solana Foundation in NYC on May 22, focusing on Decentralized AI.
- The event will cover Nousâs efforts to democratize intelligence, including Psyche; registration is available here.
- Psyche Democratizes Intelligence: Psyche, a project by Nous, aims to democratize intelligence through decentralized AI efforts.
- The projectâs goals and progress will be discussed at the upcoming event with the Solana Foundation.
Nous Research AI â· #general (85 messagesđ„đ„):
Psyche Training, Meta AR, Smart Glasses, Grok crashing
- Psyche Training Speeds into Hyperdrive: The training rate for Psyche is currently at 12B tokens per day, with estimates suggesting it would take almost 2000 days to process the entire 20T tokens, spurring a call for more GPUs.
- Contributors can support model training through donations to the mining pool on the Psyche Network or by contributing to the codebase on GitHub.
- Meta Navigates AR and AI Intersection: Meta faces the challenge of integrating AI into its smart glasses, potentially rendering its AR investments obsolete if it fails to adapt.
- Despite the shift, they continue AR research with projects like Project Aria, balancing current limitations in AI functionality with ongoing progress.
- Smart Glasses Need AI: Members suggest that smart glasses require real agentic AI to effectively interpret and interact with the userâs environment.
- A demo from Sesame shows how a smart glass company is innovating toward useful agentic AI, prompting a call for an open smart-glass AI.
- Grok Glitches in South Africa: AI Prompt Troubles?: Discussion arose whether Grokâs issues in South Africa stemmed from tweaked steering vectors or clumsy prompt updates.
- One member stated Absolutely no basis to back it up but I am voting clumsy prompt.
GPU MODE â· #triton (1 messages):
TritonBench, AMD GPU Errors, Memory Access Fault
- TritonBench Throws Memory Access Fault on AMD GPU: A member is working on a TritonBench benchmark and found that about 7 kernels throw memory access fault errors when run on an AMD GPU.
- One example provided was chunk_gla_fwd.py, which throws an Unknown Reason error, and the member requested assistance to pinpoint the cause.
- Seeking Help with AMD GPU Memory Access Faults: The user encountered memory access fault errors while running Triton kernels on an AMD GPU.
- Specifically, the user seeks assistance in identifying the root cause of the memory access violation, suspecting it is related to accessing memory locations outside of bounds in the provided code.
GPU MODE â· #cuda (3 messages):
cudaIpcMemHandle_t Serialization, Single GPU Multiprocess Communication
- CUDA IPC Memory Handles Made Simple: A member explored using cudaIpcGetMemHandle() for single-GPU multiprocess communication, for scenarios where PyTorch dataloaders are not viable.
- They noted that cudaIpcMemHandle_t can be string-serialized, enabling a straightforward producer-consumer setup for sharing memory handles.
- Serialization Simplifies GPU Data Sharing: The user discovered cudaIpcMemHandle_t can be string-serialized.
- This allows for a simple producer-consumer design to share those handles between processes on a single GPU, sidestepping more complex inter-process communication methods.
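The same producer-consumer idea can be sketched from Python; the snippet below uses CuPy's runtime bindings as a stand-in for the raw cudaIpcGetMemHandle / cudaIpcOpenMemHandle calls, with the consumer half shown as comments since it runs in a second process:

```python
# CUDA IPC handle sharing between processes on one GPU (CuPy sketch).
import cupy as cp

# Producer: allocate on the GPU and export a string-serializable handle.
arr = cp.arange(1024, dtype=cp.float32)
handle = cp.cuda.runtime.ipcGetMemHandle(arr.data.ptr)  # opaque 64-byte blob
wire = bytes(handle).hex()  # send over any channel: pipe, socket, file

# Consumer (separate process, same GPU) would do roughly:
# ptr = cp.cuda.runtime.ipcOpenMemHandle(bytes.fromhex(wire))
# mem = cp.cuda.UnownedMemory(ptr, 1024 * 4, owner=None)
# view = cp.ndarray((1024,), dtype=cp.float32,
#                   memptr=cp.cuda.MemoryPointer(mem, 0))
```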
GPU MODE â· #torch (12 messagesđ„):
Mapping Fused Operations, Pipeline Parallelism with torch.autograd.graph.saved_tensors_hooks, Custom CUDA Graphs and Caching Issues in vLLM V1, GEMM Codegen Performance vs. Native aten Implementation, torch.compile modes benchmark
- Tracing Fused Ops back to Source Code: A member inquired about mapping fused operations back to their original model code after compiler fusion, seeking to identify precisely which operations were fused; another member replied with a link to docs regarding inductor_provenance_tracking_node_mappings_<number>.json files.
- The member was unsure how to easily map the exported program's graph to the original model code without careful reading.
- Experimenting Pipeline Parallelism with CUDA Streams: A member experimented with pipeline parallelism using torch.autograd.graph.saved_tensors_hooks to manage activations across separate CUDA streams, aiming for concurrent forward and backward passes, referencing the docs (a minimal sketch of the hook pattern appears at the end of this channel's summary).
- Despite a successful implementation without race conditions, the member observed minimal concurrency gains due to the model's kernel occupancy, deeming it a "fun experiment though!!"
- Custom Op Caching Glitch in vLLM V1: A member encountered a cloning error related to a cache in vLLM V1 (which uses torch.compile + custom CUDA graphs) involving a custom op f1() calling f2() that samples from a cache, which then threw a RuntimeError.
- The error stated that "the output of this custom operator (1) must not also be an input to this custom operator and (2) may not alias any inputs to this custom operator or other returns", and the member asked for a bypass without cloning.
- GEMM Codegen Slower Than Native aten?: A member benchmarked different torch.compile modes with a GEMM operation, comparing f_compile, f_overhead, and f_max against a non-compiled version, using input sizes of N = 2_000 and B = 100_000.
- The results indicated that f_compile (fullgraph=True) was slightly faster at 10.43 ms than the others (12.06 ms, 12.09 ms, and 12.56 ms), which suggested that the device reduce is the bottleneck.
- Nvidia-smi Shows Memory: A member inquired whether nvidia-smi shows allocated or reserved memory, concerned about high reserved memory relative to VRAM capacity, while uploading an image of nvidia-smi.
- Another member clarified that nvidia-smi has no insight into torch internals and that reserved memory cannot exceed VRAM.
GPU MODE â· #cool-links (1 messages):
real.optimus.prime: From DeepSeek: https://arxiv.org/abs/2505.09343
GPU MODE â· #beginner (6 messages):
CUDA SASS negation, CCCL/libcu++ vector types, CUDA compilation flags
- CUDA SASS Negation is free?: According to members, when CUDA code compiles down to SASS looking like FLO.U32 R4, R4; IADD32I R4, -R4, 0x1f, the negation of register R4 happens without any overhead.
- CCCL/libcu++ cooks vector types: CCCL/libcu++ is implementing the tuple protocol for vector types, which should work well with unrolling templates.
- However, there is uncertainty whether it will include types not available in CUDA, like char16, and whether it will keep the .x naming convention.
- Extra flag to enable CUDA compilation?: A member asked about needing an extra flag in compilation, linking to NVIDIA's blog on CUDA 7 streams simplifying concurrency.
GPU MODE ▷ #youtube-recordings (1 messages):
GitHub Repository, Lecture Scripts
- Lecture Viewer Seeks GitHub Repo: A viewer of a recent lecture inquired about the availability of the associated GitHub repository and other resources mentioned.
- They requested the link to the repository, noting they could not find the scripts despite checking available resources.
- Lecture Resources Inquiry: A user who watched a lecture is seeking the GitHub repository link for the associated scripts.
- The user mentioned that they checked the GitHub repository and other resources mentioned in the lecture but couldn't locate the scripts.
GPU MODE ▷ #intel (3 messages):
Tensor Processing Unit (TPU)
- TPU Definition Delivered: A member inquired what a TPU is and another member responded that it is a tensor processing unit, basically an accelerator optimized for AI workloads.
- TPU vs CPU: A TPU is designed to accelerate machine learning workloads more efficiently than a general-purpose CPU.
GPU MODE ▷ #submissions (40 messages🔥):
MI300, amd-fp8-mm, amd-mixture-of-experts
- MI300 benchmarks see fresh submissions: Several users submitted new benchmarks on the MI300 across different leaderboards, including amd-fp8-mm and amd-mixture-of-experts.
- One user achieved a personal best of 6209 ms on the amd-mixture-of-experts leaderboard.
- amd-fp8-mm Leaderboard Heats Up: Multiple successful submissions were recorded on the amd-fp8-mm leaderboard, with times ranging from 155 µs to 3.28 ms on the MI300.
- A user also logged a personal best of 2.42 ms on the MI300.
- amd-mixture-of-experts sees New Bests: The amd-mixture-of-experts leaderboard saw frequent entries, with multiple users achieving personal bests, such as 6233 ms and 6247 ms on the MI300.
- One user achieved 32.7 ms on the MI300.
GPU MODE ▷ #cutlass (3 messages):
Cutlass, fp8, bf16, narrow precision dtypes
- Cutlass Thanks Given: A user thanked the Cutlass team for their work and expressed excitement to start hacking with it.
- The user then asked about the "Notable unsupported features" section and which dtypes are not currently supported, such as fp8 and bf16.
- Cutlass fp8 is supported: Fp8 is supported in the latest Cutlass.
- The team clarified that narrow dtypes refer to sub-byte and micro-scaling types, which are coming soon.
GPU MODE ▷ #mojo (1 messages):
clattner: Y'all might find this techtalk interesting: https://www.youtube.com/watch?v=Invd_dxC2RU
Latent Space ▷ #ai-general-chat (50 messages🔥):
OpenMemory MCP, Grok issues, Microsoft fires TypeScript dude, Agentic tooling, FUNapis
- OpenMemory MCP Opens Memory: A member shared a link to OpenMemory MCP, calling it cool.
- This is a new open-source project that aims to provide a unified memory management layer for AI applications.
- Ongoing Grok Troubles Tracked: Ongoing issues with Grok are being tracked in this Discord channel.
- Microsoft Axes TypeScript Talent: Discussion arose around Microsoft firing the TypeScript dude without warning, prompting expressions of dismay, as seen in this tweet.
- Agentic Tooling Outpaces Big Tech: Members discussed a shift happening with agentic tooling, expressing hope that indie devs can use it to outpace big tech and corporations.
- It was suggested that getting computers to do the right thing well is still a harder problem than getting them to do the wrong thing well, especially given internal incentive structures in corporations.
- FUNapis Succumbs to Bingâs Chatbot: A member suggested that FUNapis died so Bing can sell their chatbot wrapper of the API, as seen in this tweet.
Eleuther ▷ #general (39 messages🔥):
Knowledge graphs for papers, Cloud GPU/HW providers, MLPerf training benchmark, Data stalls in DNN training, Audio modality preprocessing
- Connected Papers for Knowledge Graphs: A member shared a link to Connected Papers as a resource for creating knowledge graphs for research papers.
- The tool helps visualize and explore connections between different papers in a given field, useful for literature reviews and research.
- Seek Cloud GPU Guidance for Open Source Dev: A member is seeking recommendations for cloud GPU/HW providers suitable for open source development, specifically for setting up an MLPerf training benchmark.
- They are particularly interested in options that might offer free compute hours to students or have favorable conditions for open source projects.
- Data Stalls Plague DNN Training: Discussion revolved around whether CPU-based preprocessing is a bottleneck in DNN training pipelines, particularly in the context of audio modality.
- One member referenced a paper on the topic (Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling) and others questioned whether optimizing CPU workload is beneficial if the GPU becomes the bottleneck.
- Optimize Data Loading, Preprocessing to Avoid Future Annoyance: A member shared that they are optimizing their data loading and preprocessing pipeline to avoid being bottlenecked by bad tooling, since they work alone outside of academia and industry.
- They hope that this work will benefit all future audio data + mech interp work at scale.
Eleuther ▷ #interpretability-general (3 messages):
BlackboxNLP, Interpretability, Causal Variable Localization, MIB Benchmark
- BlackboxNLP heads to EMNLP 2025: The 8th edition of BlackboxNLP, the leading workshop on interpretability and analysis of neural networks for NLP, will be co-located with EMNLP 2025 in Suzhou this November.
- They will feature a new shared task on circuits/causal variable localization in LMs using the recently released MIB Benchmark with a submission deadline of August 1st.
- MIB Benchmark for causal variable localization: A new shared task using the recently released MIB Benchmark will focus on circuits/causal variable localization in LMs.
- The submission deadline for this task is August 1st, offering a focused challenge within the BlackboxNLP workshop.
Eleuther ▷ #lm-thunderdome (2 messages):
MCQ Evaluations, MMLU Issues, Model Outputs
- MCQ Evaluation Outputs Flagged as False: A member reported an issue with MCQ evaluations like MMLU where model outputs are consistently flagged as false, even when the model assigns the highest probability to a specific option based on NLL values.
- This issue is observed with smaller models, where all four options are marked as false.
- Smaller Models Displaying All-False Outputs: The problem of all false outputs in MCQ evaluations appears to be more prevalent with smaller models.
- This suggests a potential bias or limitation in how these models handle multiple-choice questions, particularly when assessing probabilities via NLL values.
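For context, a toy sketch of how loglikelihood-based MCQ scoring typically works in such harnesses (the numbers are invented): accuracy compares the argmax option against the gold answer, so an example is marked false whenever that argmax misses, regardless of how confident the model was.

```python
# Per-option negative log-likelihoods as a harness would compute them.
nlls = {"A": 3.2, "B": 2.9, "C": 3.5, "D": 3.1}

pred = min(nlls, key=nlls.get)  # lowest NLL = most likely option
gold = "C"
print(pred, pred == gold)  # 'B', False -- marked false despite a clear argmax
```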
Eleuther ▷ #multimodal-general (2 messages):
LLaVAGuard, SafeDiffuser, Multimodal Models
- Multimodal Model Papers Sought: A member requested pointers to recent, notable papers on multimodal models like LLaVAGuard or SafeDiffuser.
- Clarification on Channel Appropriateness: The member inquired whether their question would be more suitable for the image-models channel.
MCP (Glama) ▷ #general (33 messages🔥):
MCP Client-Server Call Flow, Chainlit Query Parameters, Jinko MCP for Hotel Sales, Smithery Server and Claude Desktop, Understanding MCP Resources
- MCP Client-Server Call Flow Request for Emulation: A member inquired about the call flow between an MCP client and MCP server, seeking to emulate a server, detailing the initial steps such as client -> method: initialize and server -> method initialized.
- The member sought advice on intermediate steps and pointed to the need to understand how to implement an MCP server to successfully emulate the protocol.
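For orientation, a sketch of the first two client-side messages under MCP's JSON-RPC framing (field values are illustrative; the spec defines the full handshake):

```python
# Client -> server: the initialize request opens the session.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",  # illustrative version string
        "capabilities": {},
        "clientInfo": {"name": "emulator", "version": "0.1"},
    },
}

# Client -> server: after the server's initialize result arrives, the client
# sends an initialized notification (no id: notifications get no reply).
initialized_notification = {
    "jsonrpc": "2.0",
    "method": "notifications/initialized",
}
```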
- Chainlit Query Parameter Quest: A member is facing challenges accessing query parameters from the URL within Chainlit despite attempting solutions like middleware in FastAPI.
- They tried passing tokens and decoded dictionaries but couldn't access them within Chainlit, and asked for ways to read query parameters there.
- Jinko MCP Sells Hotels: A member announced the creation of an MCP for developers to build AI agents that want to sell hotels.
- This MCP provides access to 2M+ hotels, enabling search, booking, payment, and customer service, linking to the Jinko MCP GitHub repository for more details.
- Smithery Server Struggles with Claude: A member sought guidance on using a Smithery-installed server with Claude Desktop, after adding their OpenRouter key.
- The member questioned whether the model used in the MCP tool configuration needs to match the one used in Claude (e.g., sonnet-3.5 in MCP config vs. sonnet 3.7 in Claude).
- Resources clarified as GET Requests: A member sought help explaining what a resource is in the context of MCP, noting confusion among workshop attendees, and attempting to clarify it as "the GET request of MCP".
- After a suggestion that dragging a file into the Cursor chat helps build intuition, resources were explained as objects addressed by URIs like datetime://Europe/London/now and http://example.com/llms.txt.
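A minimal sketch with the official MCP Python SDK's FastMCP helper makes the "GET request" analogy concrete (the URI mirrors the example above; serving and transport are omitted):

```python
from datetime import datetime, timezone

from mcp.server.fastmcp import FastMCP

server = FastMCP("demo")

@server.resource("datetime://Europe/London/now")
def london_now() -> str:
    # A resource is read-only and side-effect free -- the GET of MCP.
    # Clients fetch it by URI instead of invoking it as a tool.
    return datetime.now(timezone.utc).isoformat()
```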
MCP (Glama) ▷ #showcase (8 messages🔥):
LLM Agent to MCP Server Connection, MCP for AI Agents Selling Hotels, MCP Democratizes Apache Kafka Usage, macos-automator-mcp for Autonomous Debugging, AI and MCP Language Barriers
- LLM Agent hooks up with MCP Server: A member shared a blog post on How to connect your LLM Agent to MCP Server, available here.
- AI Agents now book Hotels via MCP: A member announced an MCP for developers building AI agents for selling hotels, providing access to 2M+ hotels with search, booking, payment, and customer service, with the GitHub repo linked.
- Kafka democratized by MCP!: A member discussed how MCP democratizes the usage of Apache Kafka, allowing interaction with real-time data via natural language prompts, with a YouTube video included.
- macos-automator-mcp debuts Autonomous Debugging!: A member introduced macos-automator-mcp, enabling tools like Cursor to control system functions for fully autonomous debugging, with a GitHub link provided.
- Shortwave launches MCP Client support!: A member announced the launch of MCP client support in Shortwave, supporting both HTTP MCP & stdio MCP, with one-click toggles for integrations like Hubspot, Notion, Zapier, Asana, and Linear, as described in their blog post and docs.
Modular (Mojo 🔥) ▷ #general (5 messages):
Documentation updates, stdlib modifications
- Doc fixed by PR: A member reported seeing Issue 4482 and was looking to update the documentation.
- Another member stated that they fixed the doc in PR 4530.
- Docs generated from stdlib itself: A member inquired where to modify the stdlib docs, assuming the location to be modular/modular/tree/main/mojo/docs/manual.
- Another member clarified that the doc is generated from stdlib itself, so it can be modified directly.
Modular (Mojo 🔥) ▷ #mojo (9 messages🔥):
Pointer declaration in Mojo structs, Mojo generics over origin, Mojo Lifetimes, Karpathy's micrograd porting to Mojo, Jeff's talks at Modular
- Pointer Declaration Headaches in Mojo structs: A member sought help with declaring a Pointer in a Mojo struct, referencing an example from kapa.ai and later clarifying that the origin is the struct where the Pointer will be stored.
- Another member suggested making the Op generic over origin if they want to borrow.
- Origins and Mojo's Borrow Checker Unveiled: A member explained that Mojo requires the origin to be part of the type, making it a parameter.
- Another member clarified that the origin is tied to the borrow checker, ensuring the pointer isn't active if the data it points to is moved or freed, linking to the official Mojo Lifetimes Documentation.
- Micrograd Porting to Mojo: A member is learning Mojo by porting Karpathy's micrograd, a simple Python example, working around the current lack of lambda function support in Mojo.
- Another member shared their similar project, momograd, created last year as one of their first Mojo learning projects, though not updated to the latest Mojo versions.
- Jeff's Talks at Modular Wow Members: A member expressed enthusiasm for past talks featuring Jeff at Modular.
- They shared this YouTube playlist and were happy to see such content again.
Modular (Mojo 🔥) ▷ #max (11 messages🔥):
MAX Installation Issues, LoRA Trainer Difficulties, Mojo Weak Tensor Support, MAX and PyTorch Hybrid Approach, LLM Hallucinations with Modular's Platform
- MAX Installation Plagued by Missing Functionality: A member encountered errors during MAX installation, indicating missing essential functionalities like tensor operations (tensor, nn, zeros, ones, matmul, add, mul).
- Despite attempting local installation, the required functionalities remained absent, preventing the continuation of a MAX-only implementation.
- LoRA Trainer Faces Tensor Support Shortcomings: The user reported difficulties in creating a diffusion LoRA trainer due to weak tensor support in Mojo and MAX, leading them to abandon the initial Mojo-only approach.
- While a PyTorch version works, the goal was to avoid PyTorch, highlighting the limitations of MAX in tensor operations.
- Hybrid MAX and PyTorch as Workaround: Claude AI suggested that implementing a MAX-only LoRA trainer is not currently feasible due to missing tensor operations.
- Instead, a hybrid approach using PyTorch and MAX's interoperability features was recommended as a more viable solution for immediate implementation.
- LLMs Hallucinate Without Proper Context: A member suggested ensuring that tools like Claude, Cursor, and others have access to the current Modular GitHub repository and documentation.
- Without proper context, these tools may hallucinate and generate incorrect suggestions due to Mojo and MAX being relatively new and not well-represented in LLM training data.
Cohere ▷ #💬-general (3 messages):
Cohere, Sponsorship, Grants, Nonprofit
- Cohere Staff Reaches out for Sponsorship and Grants: A member requested contact with the right Cohere staff for supporting/sponsoring events and/or grants for their tech-adjacent nonprofit.
- A staff member responded by offering their email address, [email protected], to forward the inquiry to the appropriate person.
- Tech Nonprofit Seeks Partnership with Cohere: A tech-adjacent nonprofit is looking to partner with Cohere for event sponsorships and grants.
- Contact [email protected] to connect with the right staff.
Cohere ▷ #🔌-api-discussions (6 messages):
Cohere Classify API, Rate Limit Increase
- Cohere Classify API Impresses Users: Users expressed satisfaction with the Cohere Classify API and are interested in scaling its usage to millions of entries.
- However, they are concerned about the current rate limits and are looking for ways to expedite the process.
- Scaling Cohere API with Rate Limit Boosts: A member suggested contacting [email protected] to request a rate limit increase for the Cohere Classify API.
- This approach should help determine the feasibility of running the API at scale without extended waiting times.
Cohere ▷ #💡-projects (3 messages):
SiliconFlow, Gemma 3 4b 4bit, Llama 3.2 3B 4bit
- SiliconFlow Endpoints Modified: A user is utilizing SiliconFlow and modified the endpoint to be localhost, as shown in the attached image.
- Gemma 3 4b 4bit Screenshot Shared: A user posted a screenshot of Gemma 3 4b 4bit, as seen in the attached image.
- Llama 3.2 3B 4bit Screenshot Shared: A user posted a screenshot of Llama 3.2 3B 4bit, as seen in the attached image.
Cohere ▷ #🤝-introductions (3 messages):
Web AI Engineer introduction, Full stack developer AI fan Introduction, Gabriel 20 years development experience
- Web AI Engineer Joins: A Web, AI Engineer with 7 years of fullstack experience introduced themself.
- They are proficient in building responsive, user-friendly web and mobile applications using modern web technologies like React (Next), React Native (Expo), Flutter, Vue (Nuxt), Svelte, Astro and tools like Node/Express, Django, Nest.js, Go, Web3.js, Shopify, Wordpress, TailwindCSS, Shadcn, MUI, Docker, Kubernetes, AWS/GCP, LLM.
- Full stack developer loves AI: A developer with over 20 years of experience introduced themself.
- The developer has focused mainly on full stack development, became a big fan of AI and loves building real-time applications with thoughtfully crafted UI and UX, preferring Nuxt running on Cloudflare and using tools like RooCode and Windsurf.
DSPy ▷ #general (7 messages):
Gemini Models Response Schema, Structured Outputs in DSPy, Pydantic models
- Gemini Models' Structured Outputs Coming to DSPy: Members discussed whether Gemini models' response schema, similar to OpenAI's structured outputs, is implemented in DSPy, and another member confirmed that it is.
- It was also confirmed that DSPy dynamically builds the response schema.
- Pydantic Models enable Structured Outputs in DSPy: A member asked how to implement structured outputs in DSPy, similar to OpenAI tools, including nested outputs or JSON schema constraints.
- Another member replied to just use signatures, and pass Pydantic models or Python TypedDicts as output field types.
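A minimal sketch of that advice (the task and field names are invented): declare a signature whose output field type is a Pydantic model, and DSPy derives the response schema from it.

```python
import dspy
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

class ExtractPerson(dspy.Signature):
    """Extract the person mentioned in the text."""
    text: str = dspy.InputField()
    person: Person = dspy.OutputField()

extract = dspy.Predict(ExtractPerson)
# With an LM configured via dspy.configure(lm=...):
# extract(text="Ada Lovelace, 36, mathematician.").person -> Person instance
```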
Nomic.ai (GPT4All) ▷ #general (5 messages):
GPT4All's demise, Nomic's future direction, Jan.ai and LM Studio as alternatives
- GPT4All's pulse flatlines, future uncertain: Members speculated whether GPT4All has been discontinued, given the lack of updates since February.
- One member lamented the lack of communication regarding a new version release and wondered about Nomic's intentions.
- Nomic pivot to pay-to-play platform?: Members speculated that Nomic might shift its focus to a monetized platform.
- The claim was "gpt4all is over .. now earn money with nomic!" - but it wasn't backed by additional sources or evidence.
- Jan.ai and LM Studio step up as GPT4All alternatives: Members mentioned Jan.ai and LM Studio as potential alternatives to GPT4All.
- It was not stated why those were good alternatives or which features they had that might be beneficial.
LlamaIndex ▷ #blog (2 messages):
Event Driven Agent Workflows, Multi-Agent Docs Assistant, Tig AI Coding Agent
- Event Driven Agents Assist Weaviate: LlamaIndex released a new walkthrough on how to use event-driven agent workflows to build a multi-agent Docs Assistant.
- This assistant writes webpages into LlamaIndexDocs & WeaviateDocs collections in Weaviate and uses an orchestrator to decide when to call the Weaviate QueryAgent for search, showcased in this Tweet.
- Tig Coding Agent Debuts: An open-source (human in the loop) coding agent called Tig, created by @rsrohan99 and built with LlamaIndex workflows, was highlighted.
- As shown on Twitter, Tig can write, debug, and analyze code across multiple languages, execute shell commands, and search the web.
LlamaIndex ▷ #general (2 messages):
PDF content extraction with LlamaIndex, Vibe Coding partnership opportunities
- LlamaIndex Explores PDF Content Extraction: A member is seeking advice on extracting content from a PDF using LlamaParse or LlamaIndex, specifically aiming to extract the Table of Contents and then isolate content and tables from a particular section based on a predefined name.
- They're looking for guidance on setting up the instructions or pipeline to detect the section from the TOC, isolate the content, and structure the extracted tables properly, including the right parameters for use in no-code tools like n8n.
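As a starting point, a hedged sketch with LlamaParse (result_type and parsing_instruction are real parameters; the file name, section name, and the naive TOC-matching step are placeholders that would need tailoring, and a LLAMA_CLOUD_API_KEY is assumed in the environment):

```python
from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",
    # Steer the parser to keep the structures needed downstream.
    parsing_instruction="Preserve the table of contents and all tables.",
)
docs = parser.load_data("report.pdf")  # placeholder file

# Naive section isolation: keep the chunks mentioning the target heading.
section_name = "Risk Factors"  # placeholder section name from the TOC
section_docs = [d for d in docs if section_name in d.text]
```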
- AI Startup Scouts for Vibe Coding Buddies: An AI startup based in Korea is seeking passionate developers experienced in Vibe Coding to partner on real client projects.
- The opportunity includes a fair revenue-sharing model and ongoing partnership, with a requirement for strong communication skills, GitHub links, Vibe Coding project references, and English/Korean communication skills.
Torchtune ▷ #general (4 messages):
Custom Torchtune Network on vLLM, Unregistered Model with vLLM, Custom Model Implementation in vLLM
- Custom Network Faces Implementation Issues on vLLM: A member tried to implement a custom Torchtune network on vLLM following several tutorials, but encountered failures.
- Another member inquired whether the model was registered with vLLM and suggested converting the checkpoint to the HF format for syncing.
- Implementing Custom Models in vLLM: A member confirmed using a custom model with a custom architecture, leading to implementation difficulties with vLLM.
- Another member pointed to a vLLM guide on implementing custom models in vLLM.
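The registration step that guide covers looks roughly like this (MyTorchtuneModel is a placeholder for a class implementing vLLM's model interface):

```python
from vllm import ModelRegistry

from my_project.modeling import MyTorchtuneModel  # placeholder import

# Map the architecture name appearing in the checkpoint's HF config to the
# custom implementation so vLLM can instantiate it at load time.
ModelRegistry.register_model("MyTorchtuneModelForCausalLM", MyTorchtuneModel)
```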
LLM Agents (Berkeley MOOC) ▷ #hackathon-announcements (3 messages):
Lambda Workshop, Nobel FutureTech Info Session, Agentic AI, Inference API
- Lambda Workshop Teaches Agentic AI: A Lambda workshop will teach how to build agentic applications using Lambda's Inference API, optimizing agent performance, and deploying agents in production.
- Participants can apply for $100 serverless API credits by Friday 5/16 via this form.
- Nobel FutureTech Fireside Chat: A fireside chat co-hosted by Nobel FutureTech Group and Berkeley RDI will give insights into the innovative ecosystem of the Nobel FutureTech Genius Club.
- The session provides information on mentorship, funding, and collaboration opportunities, with a livestream link available.
tinygrad (George Hotz) ▷ #general (1 messages):
topk GPU, masked_select GPU, randperm_generator GPU, index_put_impl, index_tensor
- Topk Bounty: GPU Edition: A user inquired about the requirements for the "move topk" bounty, given that topk, masked_select, and randperm_generator are already off the CPU.
- The user suggested that the bounty might need revision, considering the presence of other functions like index_put_impl and index_tensor in the torch backend that still require attention.
- Index Functions Still on CPU: The user pointed out that index_put_impl and index_tensor are still running on the CPU.
- They suggested these functions, along with others in the torch backend, could be targeted for GPU offloading.
MLOps @Chipro ▷ #events (1 messages):
Agentic Enrichment, LLM Data Access, Featureform
- Agentic Enrichment Webinar Announced: A live webinar on Agentic Enrichment with Simba Khadder, Founder of Featureform, is scheduled for Tuesday, May 27th at 8 AM PT and will cover how to unlock data for AI agents using MCP; you can sign up here.
- The webinar will discuss the missing layer of infrastructure needed for AI agents to access real-time, internal business data, highlighting the limitations agents face due to data access rather than intelligence.
- Unlock LLM potential through better Data Access: The webinar will cover the need for better internal data access to unlock the full potential of AI agents, detailing the three key components of agentic enrichment: semantic catalog, low latency serving, and governance.
- It will demonstrate how Featureform enables this data access, making agents more useful and powerful in production environments, with real-world examples of improved workflows in AI systems.
Codeium (Windsurf) ▷ #announcements (1 messages):
SWE-1, Software Engineering Models, Flow Awareness Approach, Windsurf Tab Experience, Cascade Optimization
- Windsurf Launches SWE-1 Family of Models: Windsurf introduced the SWE-1 family of software engineering models, including SWE-1, SWE-1-lite, and SWE-1-mini, detailed in a blog post and launch video.
- SWE-1 Boasts Claude 3.5-Level Performance at Lower Cost: The SWE-1 model is advertised as a high-reasoning, tool-capable, Cascade-optimized model with performance comparable to Claude 3.5 at a reduced cost.
- The models are trained using a unique "flow awareness" approach, understanding the shared timeline between humans and AI across development surfaces.
- SWE-1 Aims for Software Development Acceleration: Windsurf aims to accelerate software development by 99% with the new SWE-1 models.
- This is just the beginning - they're investing heavily to make SWE models that exceed all frontier model performance in software engineering.
AI21 Labs (Jamba) ▷ #general-chat (1 messages):
AI21 Labs, Maestro, AI Tinkerers Meetups
- AI Tinkerers host AI21 meetups: AI Tinkerers is hosting upcoming meet-ups with AI21 to cover their recently announced Maestro platform, a new planning and orchestration tool.
- The meetups are open to the public and free, requiring registration for events in New York City, Paris, and San Francisco.
- AI21 unveils the Maestro Planning Platform: AI21 Labs recently announced Maestro, a new platform designed for planning and orchestration in AI systems.
- The platform aims to provide tools and infrastructure for developers to build more sophisticated and efficient AI applications.