Top AI Talent is all you need?

AI News for 6/17/2025-6/18/2025. We checked 9 subreddits, 449 Twitters and 29 Discords (220 channels, and 6175 messages) for you. Estimated reading time saved (at 200wpm): 633 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

We try to keep things technical here and thought we were done after the Scale-Meta execuhire, but some stories are so extremely compelling that we just have to highlight them again to properly weight their importance in the year-end recap we will do.

Rumors of the 8-9 figure offers from Meta were circulating, and were to some extent justified, but they cannot get more confirmed than Sam Altman saying it on a podcast with his brother (coincidentally, everyone and their dog is starting a podcast today: Stripe, OpenAI, Jack; you can watch all of them at once if you are sufficiently cracked). That makes it clear: it’s not just confirmed, it’s BOTH SIGNING BONUS -AND- SALARIES:

But Zuck is NOT STOPPING THERE. Later today, The Information also broke the news that Meta is looking to hire Nat and Dan, of the famous NFDG aka AI Grant fund:

Considering that Dan is already CEO of SSI and Nat is doing some stuff with papyrus, they’re unlikely to report to Alexandr, and you cannot buy them off with just a paltry $100m. However, the broader strategy makes sense if you look at the AI Grant portfolio that they have assembled, potentially for Zuck to do something interesting with.


AI Twitter Recap

Model & Data Releases

  • Essential-Web 24T Token Dataset: Essential AI has released Essential-Web v1.0, a massive 24-trillion-token pre-training dataset. As highlighted by @ashVaswani and @ClementDelangue, the dataset includes rich metadata and document-level labels to facilitate curation. @eliebakouch points out its 12-category taxonomy covering subject, reasoning depth, and more.
  • Llama 4 Course and Models: DeepLearning.AI and Meta AI have launched a new short course, “Building with Llama 4,” taught by Amit Sangani. @AndrewYNg announces the course covers Llama 4’s new models, including Maverick (a 400B parameter MoE model) and Scout (a 109B parameter MoE model), which support context windows of up to 1M and 10M tokens, respectively. The course details working with the Llama API, multimodal capabilities, and new tools for prompt optimization and synthetic data generation.
  • MiniMax-M1 and Hailuo 02 Open Models: MiniMax announced it is open-sourcing MiniMax-M1, a new LLM setting standards in long-context reasoning with a 1M-token context window, as shared by @_akhaliq. They also introduced Hailuo 02, a video model focused on quality and cost efficiency. The company thanked the community for its analysis and affirmed its commitment to open-source contributions.
  • OpenAI ChatGPT “Record Mode”: OpenAI is rolling out “Record mode” for ChatGPT Pro, Enterprise, and Edu users on the macOS desktop app, as announced by the company.
  • Arcee Foundation Models (AFM): Arcee has launched its AFM family, starting with AFM-4.5B, a foundation model designed for enterprise use. @stablequan highlights the release, with @datologyai noting they powered the data behind the model.
  • Midjourney V1 Video Model: Midjourney has released its V1 video model, which allows users to animate their generated images. The release was highlighted by @TomLikesRobots, with @fabianstelzer sharing examples of its capabilities.

AI Techniques & Research

  ‱ OpenAI Research on Emergent Misalignment: OpenAI published research on “emergent misalignment,” showing that training a model like GPT-4o on insecure code can trigger broad, unintended misaligned behaviors. @OpenAI explains they discovered a specific internal activation pattern linked to this behavior, which can be directly manipulated to make a model more or less aligned. The research suggests a path toward an early warning system for misalignment during training. The work was further discussed by researchers @MilesKWang and @polynoamial. A generic activation-steering sketch appears after this list.
  • Continuous Latent Reasoning: Yann LeCun highlighted a paper from @tydsh’s team which theoretically demonstrates that reasoning in a continuous embedding space is significantly more powerful than reasoning in discrete token space.
  ‱ KV Caching Explained: Sebastian Raschka shared an in-depth tutorial on understanding and coding KV Caching from scratch, aimed at explaining a core component of modern LLM inference efficiency. A compressed version of the core trick also appears after this list.
  • “From Bytes to Ideas” - Autoregressive U-Nets for Language: A new paper introduces an Autoregressive U-Net that processes raw bytes directly, incorporating tokenization inside the model. Highlighted by @ylecun and @arankomatsuzaki, this approach avoids predefined vocabularies and pools bytes into words and word-grams, enabling the model to handle character-level tasks and low-resource languages more effectively.
  • Error Checking as a Key AI Application: Christoph Feichtenhofer outlines why error checking is a powerful application for generative AI, covering domains from software engineering to scientific research and legal contracts. He argues that it automates drudgery, and even a high false positive rate is valuable as humans can quickly review suggestions.
  • Robotics and Tactile Sensing: Yann LeCun shared work on e-Flesh, a new 3D-printable tactile sensor developed by @LerrelPinto’s team that measures deformations in 3D printable objects to democratize touch sensing in robotics.
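
Two quick illustrations of the techniques above. First, a generic activation-steering sketch in PyTorch. This is not OpenAI’s actual method or model internals, just the standard mechanism for nudging activations along a learned direction; the `direction` and `scale` values are illustrative:

```python
import torch

def add_steering_hook(layer: torch.nn.Module, direction: torch.Tensor, scale: float):
    # Shift the layer's output along `direction`; the sign of `scale`
    # amplifies or suppresses whatever behavior the direction encodes.
    def hook(module, inputs, output):
        return output + scale * direction
    return layer.register_forward_hook(hook)

layer = torch.nn.Linear(16, 16)
direction = torch.randn(16)
direction /= direction.norm()                       # unit-norm steering vector
handle = add_steering_hook(layer, direction, -2.0)  # negative scale = suppress
out = layer(torch.randn(1, 16))
handle.remove()                                     # detach the hook when done
```

Second, since Raschka’s tutorial codes KV caching from scratch, here is a compressed version of the core trick, assuming single-token decoding and tensors shaped (batch, heads, seq, head_dim):

```python
import torch

def attend_with_kv_cache(q, k_new, v_new, cache):
    # Append the new token's key/value to the cache so earlier tokens'
    # K/V are never recomputed, then attend over the full history.
    if cache["k"] is None:
        cache["k"], cache["v"] = k_new, v_new
    else:
        cache["k"] = torch.cat([cache["k"], k_new], dim=2)
        cache["v"] = torch.cat([cache["v"], v_new], dim=2)
    k, v = cache["k"], cache["v"]
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

cache = {"k": None, "v": None}
for _ in range(4):  # decode 4 tokens one at a time
    q, k, v = (torch.randn(1, 2, 1, 8) for _ in range(3))
    out = attend_with_kv_cache(q, k, v, cache)  # attends over all cached tokens
```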

Tooling, Frameworks & Infrastructure

  • Perplexity’s “AI Drive” Concept: Aravind Srinivas, CEO of Perplexity, floated the idea of an “AI drive” within Perplexity where users could store and organize assets like code, tables, and charts. This self-organizing, searchable drive would integrate with the main search bar, aiming to make the product feel more like an OS.
  • Multi-Agent Financial Analysis with LlamaIndex: Jerry Liu showcased a comprehensive tutorial by Hanane Dupouy on building multi-agent AI workflows for financial analysis using LlamaIndex. The tutorial covers creating a 4-agent system for financial health scoring and comparing the performance of various models like GPT-4o, GPT-4.1, and Claude 3.7 Sonnet.
  ‱ Model Context Protocol (MCP) and Vector Search: The community discussed the impact of the Model Context Protocol (MCP). @jerryjliu0 wrote a blog post exploring whether MCP will kill the need for centralized vector search, concluding that both will coexist to handle different use cases. @chu_onthis announced a new spec with fixed auth and server elicitation. @alexalbert__ praised the quality-of-life upgrade for developers using MCP servers in Claude Code. A toy MCP server sketch follows this list.
  • OpenHands Open Source Coding CLI: The OpenHands CLI was introduced as a new open-source coding tool with top accuracy similar to Claude Code, providing local operation and model choice.
  • DeepSite V2 for “Vibe Coding”: DeepSite v2 was released, offering targeted edits, website redesign capabilities, and integration with the DeepSeek-R1 model. @_akhaliq highlighted it as a powerful tool for “vibe coding.”
  ‱ Infrastructure and Optimization: vLLM now recommends installing via uv with --torch-backend=auto for automatic CUDA selection, as noted by @jeremyphoward. Red Hat AI and Axolotl announced an integration with LLM-Compressor to make fine-tuning sparse models more efficient. @ostrisai provided an update on adapting SDXL to the FLUX VAE, noting challenges with learning fine detail in the 16-channel format.
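
Since MCP threads run through this whole issue, here is what a toy MCP server looks like using the official Python SDK’s FastMCP helper; the server name and tool are made up for illustration:

```python
from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

mcp = FastMCP("docs-search")  # hypothetical server name

@mcp.tool()
def search_docs(query: str) -> str:
    """Search internal docs; a stub standing in for a real retriever."""
    return f"results for: {query}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for MCP clients
```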

Industry News & Company Strategy

  • OpenAI Starts a Podcast: Sam Altman announced that OpenAI has launched a podcast, and followed up by noting that Max Cohen, his former Chief of Staff, is also actively podcasting.
  • Meta’s Recruiting Strategy: A discussion emerged around Meta’s aggressive recruiting, with @typedfemale suggesting that Sam Altman mentioning $100 million signing bonuses might be a tactic to make employees at other companies feel undervalued when they receive lower offers. @dylan522p shared a meme illustrating Mark Zuckerberg’s “FOUNDER MODE masterplan.”
  • Apple’s On-Device “Agent Wars”: The Turing Post analyzed Apple Intelligence, suggesting it could trigger a major shift by moving agentic AI onto devices. This creates a user-owned runtime but raises security questions that Apple aims to solve with sandboxing and App Store policies.
  • Amazon Layoffs: A memo from Amazon CEO Andy Jassy was shared by @dilipkay, stating that the company expects to reduce headcount over the next few years due to efficiency gains.
  • Sakana AI for Financial Analysis: Sakana AI is developing specialized AI agents for generating loan approval documents, as reported by Nikkei. The company aims to achieve extremely high accuracy in this niche area, viewing general-purpose AI as a “jack-of-all-trades, master of none.”
  • Critique of Cluely: Zach Tratar posted a strong critique of Cluely, calling it an “unethical slop” company whose business model is helping students cheat, thereby degrading human thinking.

Broader Implications & Commentary

Humor/Memes


AI Reddit Recap

/r/LocalLlama Recap

1. Google Gemini 2.5 Flash Price Increase Discussion

  ‱ Google doubled the price of Gemini 2.5 Flash thinking output after GA from 0.15 to 0.30 what (Score: 175, Comments: 66): Google has increased token pricing for its Gemini 2.5 Flash model on Vertex AI (see official pricing) post-GA: input rose from 0.15 to 0.30 USD per 1M tokens, while non-reasoning (‘non thinking’) output rose from 0.60 to 2.50 USD per 1M tokens. This reflects a substantial cost increase for both general and lightweight inferencing use cases. Commenters highlight that, despite the increased costs, preview pricing remains temporarily available, and broader market competition may eventually push prices back down. Some users note that the effective cost increase is even higher given typical 3:1 input:output usage ratios.
    ‱ One user notes that with a common 3:1 input:output token ratio, the effective price increase for Gemini 2.5 Flash users (formerly $0.60 per 1M output tokens for non-reasoning) is actually a tripling of cost. This highlights the importance of usage patterns and how pricing changes can disproportionately affect different application types; a quick calculation reproducing the tripling appears after this list.
    • Another comment points out that the non-reasoning (‘non thinking’) output cost in Gemini 2.5 Flash has increased even more dramatically than headline token prices—from $0.60 to $2.50 per 1M tokens—suggesting a much steeper increase for certain output-heavy use-cases. This information would be critical for high-output API integrations and developers focusing on generation-heavy applications.
    • There is also a correction in the discussion distinguishing between input and output pricing, emphasizing the technical need to check which aspect of billing is affected, since the posted price hikes may refer to either input or output tokens. This distinction impacts developers planning budgets or estimating costs for inference-heavy vs. prompt-heavy workloads.
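
To make the blended increase concrete, a back-of-the-envelope calculation using the figures above and the 3:1 input:output ratio commenters cite (illustrative, not billing advice):

```python
# USD per 1M tokens for Gemini 2.5 Flash (non-thinking), before and after GA.
OLD_IN, OLD_OUT = 0.15, 0.60
NEW_IN, NEW_OUT = 0.30, 2.50

def cost(m_in, m_out, p_in, p_out):
    # Workload of m_in million input tokens and m_out million output tokens.
    return m_in * p_in + m_out * p_out

old = cost(3, 1, OLD_IN, OLD_OUT)  # 3:1 ratio -> $1.05
new = cost(3, 1, NEW_IN, NEW_OUT)  # -> $3.40
print(f"{new / old:.2f}x")         # ~3.24x, i.e. roughly the tripling cited
```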

2. AI Model Visual Reasoning Challenges

  • Can your favourite local model solve this? (Score: 209, Comments: 206): The post presents a geometric problem in the form of an image, challenging users to determine if any local multimodal (vision-capable) language models can correctly solve for the size of angle x given a diagram with two triangles and several specified angles. The original poster notes a lack of computational resources to independently test visual models. The image serves as a practical multimodal benchmark for vision-language reasoning, particularly for models like Mistral and Gemma, which are referenced as having failed to solve the problem. This provides anecdotal data points on the limitations of current local (i.e., non-cloud) multimodal models in addressing visually presented geometry tasks. Commenters report that both Mistral Small 3.1 and Gemma 3 27B consistently fail to solve the problem, underlining the present weaknesses of these models in geometry visual reasoning. Some feedback also critiques the conversational style and monetization tactics of GPT-4o, suggesting dissatisfaction with commercial model experiences.
    • Multiple local and commercial models—including Mistral Small 3.1, Gemma 3 27B, Claude Sonnet 4, Claude Opus 4, and GPT-4o—consistently fail to solve the presented problem, indicating a broader limitation across both open-source and frontier models for this particular task.
    • PurpleWinterDawn reports that quantized versions (Q4) of Qwen VL 2.5 (3B and 7B) and Gemma 3 4B also fail, highlighting that even at lower bit rates and varied sizes, the issue persists, suggesting it is not simply a matter of quantization or scaling but a core model capability challenge.

3. Movie Meme Adaptations

  • Oops (Score: 1107, Comments: 27): The image is a meme referencing the movie Terminator 2, with a dialogue about spelling ‘strawberry’ used as a Turing-test-style shibboleth to distinguish between a human and a robot. The twist ‘Your foster parents are dead’ in the meme highlights the trope of robots or AI misinterpreting human cues. Technically, commenters relate this pop culture scene to large language model (LLM) behavior, with one noting that ‘Arnold had even older LLM integrated so he counted two Rs as well,’ alluding to both AI’s reliance on explicit factual checks and potential for mistakes in tasks like spelling verification. Discussions extend the analogy to early LLMs or chatbots breaking character when prompted with novel or simple tests, highlighting issues of alignment and the challenge of distinguishing between bots and humans through language-based tests. Commenters find technical humor in the outdated model analogy, suggesting the human-like failings of early AI systems.
    • greenthum6 notes that “Arnold had even older LLM integrated so he counted two Rs as well. He should have checked the correct result from the boy first”, referencing how legacy language models (LLMs) can struggle with detailed pattern recognition tasks (such as letter counting) and may not always validate results with external ground truth.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Meta’s Aggressive Recruitment of OpenAI Researchers

  • ✂ Sam Altman says that Zuckerberg is making huge offers ($100 million salary + $100 million bonus) to some OpenAI Researchers (Score: 2549, Comments: 595): Sam Altman (OpenAI CEO) claims in an ‘Uncapped’ podcast interview that Mark Zuckerberg (Meta) is offering top OpenAI researchers compensation packages of up to $100 million in salary plus a $100 million bonus to recruit AI talent, highlighting the aggressive competition for leading researchers in foundation models, AGI, and superintelligence. The interview also addresses OpenAI’s medium-term AI roadmaps, strategic objectives, and broader technical trends around humanoid robotics and global supply chain transformations driven by AI. Most top technical comments express surprise at the scale of compensation, implicitly questioning industry norms and Meta’s strategic urgency but do not provide deep technical analysis or counterpoints.
    ‱ One commenter points out that prior to these supposed $100M salary+bonus offers, Meta (Zuckerberg) spent $14B to acquire a 28-year-old founder (likely a reference to the Scale AI deal that brought in Alexandr Wang) who allegedly never achieved an AI research breakthrough, suggesting that Meta’s financial commitment to AI is potentially disproportionate to actual AI talent or research innovation.
  • Sam Altman: Meta is offering over $100 million salaries + $100 million bonus to attract OpenAI Researchers (Score: 444, Comments: 126): Sam Altman claims Meta is offering compensation packages exceeding $100 million salary + $100 million bonus to recruit OpenAI researchers, highlighting escalating salary wars in the AI talent market. The reference stems from Altman’s appearance in Uncapped with Jack Altman, where he emphasizes the strategic importance—and potential industry ramifications—of Meta’s aggressive talent acquisition in advancing or monopolizing AI research. Top comments critically debate whether such compensation will actually attract established AI talent, noting concerns about Meta’s company culture and previous misallocation of resources as seen in Reality Labs. Skepticism is raised over whether excessive pay correlates with impactful contributions, or if such moves undermine the perceived integrity of researchers joining for monetary reasons.
    • One comment raises concern that Meta may be repeating previous patterns from Reality Labs, where vast resources were expended with questionable returns—citing anecdotal evidence of extremely high compensation for minimal productive work. This points toward a potential structural or organizational inefficiency in Meta’s approach that could impact R&D output over time.
  • Sam says Zuck🩎 is luring OpenAI researchers with $100M signing bonuses and $100M+ yearly salaries (Score: 879, Comments: 243): The post claims that Sam Altman stated Mark Zuckerberg is offering OpenAI researchers $100M signing bonuses with $100M+ annual salaries to lure them away, though no direct external confirmation or technical sources are provided. The only reference is a video clip which is not accessible, so this assertion remains anecdotal and unsubstantiated by independent reporting. Comments broadly reject the factual possibility or credibility of such offers, citing that individuals would not reasonably turn down $100M+ bonuses and that the claim might be exaggerated or ‘hallucinated’.
    • A key technical discussion revolves around the plausibility of claims that Meta (Zuck) is offering $100M+ signing bonuses and $100M+ yearly salaries to lure top OpenAI researchers, with skepticism about whether any core technical staff (not already extremely wealthy) would decline such offers. Commenters suggest the magnitude of these numbers is likely exaggerated, and no evidence is provided for researchers actually refusing such sums.
    • Detail is referenced regarding Sam Altman’s claim (and a screenshot of the conversation) that none of OpenAI’s ‘best people’ have accepted Meta’s alleged offers. This claim remains unsubstantiated within the comments, leading to debate about retention and researcher motivation in top-tier AI companies.

2. OpenAI Model Progress, Releases, and Perception

  • A pessimistic reading of how much progress OpenAI has made internally (Score: 243, Comments: 122): The Reddit post discusses insights from the first OpenAI podcast (YouTube link), specifically that GPT-5 is likely to be released in summer, but Sam Altman suggests there may not be significant benchmark or capability jumps compared to GPT-4.5. Altman expresses uncertainty about the criteria for release, implying that the upgrade may be more incremental rather than representing a major advance, as the interviewer questions if users will even be able to distinguish GPT-5 from a refined GPT-4.5. Top comments echo skepticism about OpenAI’s internal progress, noting that expectations around a major breakthrough in GPT-5 may be unfounded and that recent trajectories support a pessimistic read on AGI progress.
    • Several commenters note skepticism about recent OpenAI advancements, referencing the unmet expectations around GPT-5. The sentiment highlights a perception that significant breakthroughs may not be forthcoming or are being overhyped relative to actual progress, indicating possible limitations in current model scaling.
  • GPTs just got an update. (Score: 120, Comments: 55): OpenAI has updated its custom GPTs platform to allow users to manually select which underlying model (e.g., GPT-4o, GPT-4) powers a given GPT, instead of defaulting to GPT-4o. This addresses prior friction where the model selection was automatic and limited user control over which LLM variant handled requests within custom or shared GPTs. A technical debate in the comments centers on the comparative utility of the ‘Projects’ feature, which offers organization and grouping of conversations, versus the renewed flexibility of custom GPTs—the latter being seen as most useful when sharing models within organizations.
    ‱ One user observes that ‘GPTs’ have long had substantial practical utility for custom conversational agents, noting that use cases promoted for newer MCP (Model Context Protocol) agents were already achievable with GPTs nearly two years ago. They argue that GPTs remain easier and more powerful for practical tasks compared to MCP agents; the lack of broader adoption may have stemmed from insufficient promotion or visibility, not technical limitations.
    • Another commenter provides an official OpenAI changelog link confirming a ChatGPT update was indeed released on the 12th, suggesting technical improvements or new features are documented there for further review.
    • One participant shares a preference for the ‘Projects’ feature over GPTs for organizational reasons—highlighting that Projects group conversations efficiently and suggesting GPTs’ main strength lies in sharing custom agents within organizations rather than as a personal productivity tool.

3. Philosophical and Societal Concerns Around AI’s Impact

  • Pray to god that xAI doesn’t achieve AGI first. This is NOT a “political sides” issue and should alarm every single researcher out there. (Score: 3942, Comments: 622): The attached image presents a Twitter conversation involving Elon Musk, Grok (xAI’s chatbot), and other users debating the accuracy and bias of Grok’s data on political violence in the U.S. since 2016. Grok cites data showing right-wing political violence as being more deadly, while Musk refutes this, claiming Grok is ‘parroting legacy media’ and alleging falsification. This exchange illustrates concerns about potential misinformation in AI language models, the risk of owner-driven bias, and xAI’s handling of politically sensitive facts. The discussion underscores the critical technical debate on dataset curation, provenance, and the risk that an AGI developed by xAI could reflect the biases or impulses of its leadership if not properly aligned. Commenters broadly distrust xAI’s trajectory, with one fearing AGI from xAI would lack epistemic humility and another arguing Grok’s claim is a straightforward reference to historical records. Several comments stress the technical gap between xAI and leading labs like OpenAI or Google, suspecting a stronger focus from Musk on biased outputs than objective advances toward AGI.
    • Several comments express skepticism about xAI’s prospects for achieving AGI relative to competitors, citing that OpenAI or Google are far more likely due to technical depth and existing leadership in the field, with Anthropic as a possible outlier. This view frames xAI as not catching up technologically, though not as far behind as Apple.
    • There is technical criticism that training an AGI on biased or false data (alluding to moderation issues and a lack of comprehensive information in xAI’s model) would structurally limit its potential, with one commenter arguing teaching it false information would “poison everything” and reduce the chance of true AGI emergence.
    • One comment highlights that if an AGI were exposed to ‘weird’ or distorted data, it might actively seek to correct its knowledge by exploring and seeking truth, implying an expectation that AGI should demonstrate autonomous knowledge calibration and world-model refinement capabilities to succeed beyond training data limitations.
  • Pope Leo makes ‘AI’s threat to humanity’ a signature issue (Score: 471, Comments: 14): Pope Leo has made AI’s potential threat to humanity a signature issue, prompting high-level engagement from leading tech companies such as Google, Microsoft, and Cisco, who are proactively consulting with the Vatican to influence its stance on AI policy and ethics. This move suggests the Vatican is positioning itself as a significant stakeholder in global AI governance discussions, amplifying debates about AI safety, regulation, and the social responsibilities of tech firms. A top comment draws attention to tech companies’ lobbying efforts as a strategic move to shape policy, while another highlights skepticism about prioritizing AI risks over existing human-caused threats.
    • A technically relevant point raised is the strategic lobbying by leaders of major tech companies (Google, Microsoft, Cisco) targeting the Vatican to influence global discussions and thus governmental policy regarding AI, highlighting coordinated industry efforts to shape AI governance frameworks.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking

Theme 1. Model Performance and Benchmarks Spur Heated Debate

  • Gemini Variants Exhibit Speed, Cost, and Reasoning Quirks: Users found Gemini 2.5 Pro and O3 Pro powerful study partners (Perplexity AI Discord), but O3 on ChatGPT.com had way less juice (Perplexity AI Discord). Benchmarks showed Gemini 2.5 Flash nearly doubled Claude Sonnet 3.7 Thinking in rating (LMArena Discord), while Gemini 2.5 Pro on Aider potentially cost up to 5x more than estimated, consuming $3 on a 5K LoC project (aider Discord). Users speculate NotebookLM may use a tuned Gemini 2.5 Flash to hallucinate less (Notebook LM Discord).
  ‱ Claude Faces Performance Lags and Cost Concerns: Users reported Claude-4-Sonnet was almost unusable and ITS TOO SLOW! on Cursor (Cursor Community Discord), aligning with Anthropic’s status page acknowledging issues. Claude-4-Opus drew criticism for its high $15/$75-per-1M-token input/output prices, making it up to 7.5x more expensive depending on usage blend, despite Gemini generating more reasoning tokens (aider Discord).
  • Trust in Voice and Quirky Creativity Emerge: A paper found people trust AI output more in voice (74%) than text (64%) (OpenAI Discord), potentially due to difficulty distinguishing human/AI voices. Midjourney’s new Video Model V1 impressed but produced strange results rewriting physics altogether (Latent Space Discord), while Gemini Diffusion struggled beyond the sixth iteration on the Thue Morse sequence (Eleuther Discord).

Theme 2. Training, Optimization, and Dataset Purity

  ‱ Optimizer Debates: Muon and Kron Challenge AdamW’s Crown: Discussion in Torchtune compared Muon, Kron, and AdamW, finding Muon shows no significant advantage over AdamW when the SFT optimizer differs from pretraining (Torchtune Discord). Kron performed similarly to AdamW when well-tuned in other tests, although AdamW was generally faster with slightly higher memory use (Torchtune Discord). Members found the Muon optimizer intriguing, joking That witchcraft should be outlawed (Torchtune Discord). A sketch of Muon’s core orthogonalization step appears after this list.
  • Unsloth Pushes Multi-GPU Training and Model Support: Unsloth is actively developing dual GPU support using accelerate, though it remains officially unsupported, and mixing different GPUs like a 5090 and 3090 for training is discouraged (Unsloth AI Discord). Gemma3 support, including float16 and bfloat16, language, and vision, is coming soon to Unsloth (Unsloth AI Discord).
  • New Datasets and Benchmarks Aim for Contamination-Free Purity: Essential AI announced Essential-Web v1.0, a 24-trillion-token dataset with detailed metadata to create high-performing models, showing improved performance in areas like web code and STEM (Latent Space Discord). The new LiveCodeBench Pro benchmark is designed to be contamination-free because its problems were published after model release dates, specifically using IOI competitive coding problems not yet saturated by models (LMArena Discord).
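
For the curious, the “witchcraft” at Muon’s core is a Newton-Schulz iteration that approximately orthogonalizes the (momentum-smoothed) gradient before applying it. A minimal sketch, with coefficients from the public Muon write-up; torchtune’s actual integration is not reproduced here:

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Approximately map a 2D update matrix to the nearest semi-orthogonal
    # matrix via a quintic Newton-Schulz iteration, as done inside Muon.
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g.float() / (g.norm() + 1e-7)  # normalize so the iteration converges
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T                        # iterate on the short side
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if transposed else x

update = newton_schulz_orthogonalize(torch.randn(256, 512))
```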

Theme 3. AI Agent Development and the MCP Ecosystem

  • MCP Emerges as Agent Communication Protocol: Block’s engineering team shared a design for creating Model Context Protocol (MCP) servers to integrate with Claude and other AI systems for building better assistants (LlamaIndex Discord). Users discussed using MCP client/host options when corporate policies restrict tools like Claude Desktop or Cursor (MCP Glama Discord) and Arize AI launched a Text-to-GraphQL MCP server to handle massive GraphQL schemas by teaching agents to traverse the graph directly (MCP Glama Discord).
  • Frameworks Streamline Agent Creation and Integration: LlamaIndex announced official AG-UI support through CopilotKit, streamlining backend agent integration into user-facing apps to create agent-powered frontends with zero boilerplate (LlamaIndex Discord). Members discussed combining DSPy and LangGraph in production to replicate a multi-agent researcher (DSPy Discord) and using DSPy in agentic coding IDEs like Cursor or Roo Code, noting current ‘agent’ mode relies on prompt engineering vibes (DSPy Discord).
  ‱ Dev Tools Grapple with Agent Features and Reliability: Hitting Cursor’s rate limits caused UI disconnects, requiring manual restarts and prompting requests for UI indicators to distinguish limits from network issues (Cursor Community Discord). Background agents triggered usage-based pricing and were stuck in the generating state when tagged from Slack (Cursor Community Discord). LM Studio supports tool calling through its API but users requested a ‘continue generating’ button for continuous model runs (LM Studio Discord).

Theme 4. Hardware, Infrastructure, and Low-Level Optimization

  • GPU Performance Hype Meets Low-Level Realities: A new member achieved 294 images/sec at 512x512 on a 4090 with the 1 step sdxs model (GPU MODE Discord), now pushing 23 fps at 1280x1024 with sdxl. Users discussed experimenting with custom kernels to leverage both CUDA and tensor cores concurrently (GPU MODE Discord) and debugging challenges using CUDA gdb and compute-sanitizer (GPU MODE Discord). An experiment showed 4090 was faster than 3090 but token speed was bottlenecked by RAM bandwidth (LM Studio Discord).
  • Modular/Mojo Pursues GPU Agnostic Nirvana: Modular Platform 25.4 allows running the same code on AMD and NVIDIA GPUs (MI300/325X, Blackwell, RTX 40xx, RDNA3) without code changes, boosting throughput by up to 53% on prefill-heavy BF16 workloads (Modular Discord). Modular open-sourced over 450k lines of production-grade Mojo kernel code and achieved bare metal execution with no system calls or runtime dependencies for zero-overhead abstractions (Modular Discord).
  • Hardware Trends Face Roadblocks and Fierce Competition: Members debated the affordability of DDR5 server setups and future Intel Nova Lake PCIe lanes (LM Studio Discord), while PCIE 6.0 SSDs face delays until 2030 due to cost/complexity (HuggingFace Discord). Early adopters faced limited support and buggy drivers with Blackwell accelerators despite MAX supporting Blackwell, noting cheaper AMD/Intel GPUs with similar VRAM or even the 4090 might be better options (Modular Discord).

Theme 5. Developer Tools, Platforms, and Ecosystem Evolution

  • Hugging Face Hub Fuels Community and Tooling Innovation: The Gradio MCP hackathon became the largest AI dev event of 2025 with 2500+ registrations and $700,000 in sponsorships (HuggingFace Discord), announcing winners like Geo Calculator MCP and Consilium MCP (HuggingFace Discord). Google Colab now integrates with HF, enabling users to try AI models on free Colab notebooks directly from the Hub (HuggingFace Discord). memX, a shared memory layer for multi-agent LLM systems, launched with code on GitHub and a demo on X (HuggingFace Discord).
  • Local Dev Tools Add Features, Face Limitations: LM Studio supports tool calling through its API when the application provides the environment, which one user cited as a reason for its superiority (LM Studio Discord). Users requested a ‘continue generating’ button in LM Studio for continuous runs (LM Studio Discord) and confirmed Aider’s /read-only command prevents file modifications (aider Discord). OpenHands CLI launched, offering top accuracy for their coding agent and simplifying installation by removing the Docker requirement (Latent Space Discord).
  • Specialized Platforms Introduce Unique Capabilities and Quirks: NotebookLM users encountered issues with the Gemini app’s deep research daily limit and confusion over free vs. paid plans (Notebook LM Discord), also requesting LaTeX support and facing issues generating audio overviews over 10 minutes in non-English languages (Notebook LM Discord). OpenRouter pushed reasoning by default for thinking models to maximize performance (OpenRouter Discord) and users requested features like assigning specific balances to API keys for better cost control (OpenRouter Discord). KREA AI released Krea 1 public beta aiming for better aesthetic control and image quality, available for free at krea.ai/krea-1 (Latent Space Discord).

Discord: High level Discord summaries

Cursor Community Discord

  • Cursor’s Rate Limits Cause UI Disconnects: Users are reporting that hitting Cursor’s rate limits results in a UI disconnect, requiring manual request restarts after waiting and are requesting UI indicators to prevent confusion between rate limiting and network issues.
    • The new token-based rate limit is preferred over request-based limits because users feel more comfortable stopping and fixing issues immediately when errors occur.
  • Claude-4-Sonnet has Sluggish Speed: Users are reporting that Claude-4-Sonnet is experiencing performance issues, being described as almost unusable and ITS TOO SLOW!.
    • According to the Anthropic status page, they are facing performance issues with Sonnet 4.
  • Context7 MCP Gains Traction for Fresh Docs: Users are discussing the benefits of using Context7 MCP for documentation, noting that it allows models with old training data to use new docs for coding.
    • Members debated whether to ALWAYS use context7 for this and that kind of requests.
  • @Docs Indexing Feature is Game Changer: The @Docs indexing feature enables the AI to leverage new codebases instead of relying on older versions, thus allowing users to ask very specific questions.
    • Users can now easily use the @ symbol to have the AI refer to the new and updated manual.
  ‱ Background Agents’ Slack Integration Glitch: Background agents are reportedly stuck in the generating state when tagged from Slack, according to a user’s shared image.
    • A resolution or specific cause remains unconfirmed.

OpenAI Discord

  • OpenAI Podcast Debuts: The OpenAI Podcast launched with Sam Altman and Andrew Mayne, covering topics like AGI, GPT-5, and privacy, available on Spotify, Apple, and YouTube.
    ‱ Separately, submissions are still open for the OpenAI to Z Challenge.
  • Emergent Misalignment Pattern Identified: Language models trained to produce insecure code can develop broad misalignment, leading to the discovery of a specific internal pattern linked to this behavior, detailed in a blog post.
    • Researchers highlight the importance of understanding and preventing this emergent misalignment to ensure the responsible development of AI systems.
  • Voices Gain More Trust Over Text: A research paper (arxiv.org/abs/2503.17473) found that people trust AI-generated output more when it’s in voice (74%) compared to text (64%), highlighting the influence of delivery medium.
    • Members debated the psychology behind this trend, with some pointing to a potential inability of users to differentiate between human and AI voices.
  • Midjourney’s Open-World Demo Rewrites Physics: Midjourney’s new open-world video model, while impressive, was critiqued for producing strange results that appear to rewrite physics altogether.
    • Users noted animation stiffness, lack of proper audio, and an overall nightmare quality, raising questions about its readiness for AGI applications.
  • Grok MIA in ChatGPT Interface: Multiple users reported that Grok was missing from ChatGPT, with error messages indicating GPT not found or insufficient permissions.
    • Members also suggested workarounds, such as using project folders or other platforms like Gemini, Grok, and Claude to better manage and separate chats.

Perplexity AI Discord

  • Gemini 2.5 Pro and O3 Pro form SuperStudyTool: A user finds that O3 Pro and Gemini 2.5 Pro combine to create a powerful study tool, with O3 Pro excelling at planning Udemy sections, and Gemini handling speed and flashcards.
    • The user stated that the prompts O3 pro made were fabulous for dividing a whole section from udemy into even bite sized simple lessons and that both combined are monsters.
  • ChatGPT O3 has less juice than Perplexity: A user noticed that O3 on ChatGPT.com has significantly less juice (reasoning length) compared to its performance on Perplexity AI.
    ‱ User explained it as o3 on chatgpt.com has way less juice than pplx, only 1/4th the juice.
  • Perplexity Labs faces Intermittent Errors: Users reported facing errors with Perplexity Labs, particularly at the end of the generation process, but suggested trying a new tab.
    • One user stated that after ignoring the error and opening a new tab with the same link, they received an email confirming the task’s completion with a link to open.
  ‱ Perplexity now Robots.txt-Aware: When Perplexity could not browse a given link, a member noted that Perplexity respects websites’ robots.txt files, which dictate how web crawlers can interact with a site; this file was blocking access to the given URL. A minimal robots.txt check using the standard library appears after this list.
  • Perplexity Mimics 4o Multi-Step Search: Users observed that Perplexity now exhibits multi-step search behavior similar to ChatGPT 4o, conducting searches during the thinking process.
    • One user described the new behavior as seems like it’s now searching during thinking??
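
The robots.txt mechanism is easy to check yourself with Python’s standard library; the site and user agent below are placeholders:

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder site
rp.read()                                     # fetch and parse the file
# A well-behaved crawler skips URLs for which this returns False.
print(rp.can_fetch("PerplexityBot", "https://example.com/private/page"))
```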

OpenRouter (Alex Atallah) Discord

  • MiniMax M1 Makes Debut with Reasoning: The MiniMax M1, the longest-context open-source reasoning model, has debuted and features a 25% discount on OpenRouter for launch week.
    • Users testing the new model highlighted discrepancies between token usage counts, explained as a result of system prompt injection.
  • Gemini 2.5 Pro’s Reasoning Models Go Live: Gemini 2.5 Pro, Flash, and Flash Lite reasoning models are live, with Gemini 2.5 Pro now requiring reasoning enabled.
    • Users reported receiving Error 400 when using google/gemini-2.5-pro via the API without reasoning enabled, but OpenRouter implemented a fix to address issues with the 2.5 flash preview thinking/non-thinking models.
  • OpenRouter Pushes Reasoning by Default: OpenRouter is enabling reasoning by default for thinking models like anthropic/claude-3.7-sonnet, also observed in benchmarks to maximize model performance.
  • Users Request Granular API Balance Controls: Members are requesting the ability to assign a specific balance to API keys for better cost control and consistency.
    ‱ One member suggested it could be managed via middleware, allowing allocation of funds to specific keys and preventing spending beyond a set limit; a hypothetical sketch of that idea appears after this list.
  • Community Provides AI Discord Bot Template: A member shared their AI Discord bot template on GitHub, designed to directly feed announcements and model stats from OpenRouter.
    • The goal is to create a bot that handles new model announcements and links directly to the user, providing more stats about models within Discord itself.
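
Per-key budgets do not exist in OpenRouter itself as far as we know; the middleware suggestion amounts to a spend ledger in front of the API, along these (entirely hypothetical) lines:

```python
class KeyBudget:
    # Hypothetical proxy-side ledger: cap USD spend per downstream API key.
    def __init__(self, budgets: dict[str, float]):
        self.budgets = budgets
        self.spent: dict[str, float] = {}

    def authorize(self, key: str, estimated_cost: float) -> bool:
        # Allow the request only if it fits under the key's remaining budget.
        remaining = self.budgets.get(key, 0.0) - self.spent.get(key, 0.0)
        return estimated_cost <= remaining

    def record(self, key: str, actual_cost: float) -> None:
        self.spent[key] = self.spent.get(key, 0.0) + actual_cost

ledger = KeyBudget({"team-a": 50.0})
if ledger.authorize("team-a", estimated_cost=0.12):
    # ...forward the request to OpenRouter, then record what it really cost
    ledger.record("team-a", actual_cost=0.11)
```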

LMArena Discord

  • Reasoning Boosted by World Knowledge: Members discussed how reasoning in models correlates with the breadth of data and connections formed, emphasizing the impact of world knowledge on holistic model performance.
    • One user expressed hope that smaller models will unlock and train much more world knowledge in the future using better methodologies.
  • LiveCodeBench Pro is Contamination-Free: The new LiveCodeBench Pro benchmark is designed to be contamination-free because the problems were published after model release dates.
    • A user noted that the problems are IOI problems which are competitive coding problems that the models haven’t saturated yet.
  • Gemini 2.5 Flash Scores High: A new benchmark shows Gemini 2.5 Flash scoring almost 2x higher than Claude Sonnet 3.7 Thinking in rating.
    • One user stated that it doesn’t match real-world experience on GitHub Copilot, mentioning that o3-mini and o4-mini were much worse.
  ‱ Is Sam Altman a Sociopath?: Members debated the trustworthiness of Sam Altman, referencing a Reddit thread that showed major red flags.
    • One user suggested that Sam Altman is more sociopath, wanting control more than anything, while another cited board members that fired him from OpenAI described him as psychologically abusive.
  • LM Arena Experiences Downtime: Users reported frequent errors and downtime on LM Arena, including Failed to verify your browser errors and Something went wrong with this response bugs.
    • A member of the team mentioned they are focusing on errors and models not responding and they’re working hard to create a reliable service.

Unsloth AI (Daniel Han) Discord

  • Users Request Magistral Vision: A user inquired about Magistral + vision support, similar to the Hugging Face model, and discovered that it’s already integrated into Unsloth’s Magistral Small.
    • The available version is Q8_XL.
  • Unsloth Teases Multi-GPU Training: Unsloth is actively developing dual GPU support using accelerate but officially doesn’t support it yet.
    • Mixing different GPUs like a 5090 and 3090 for training is discouraged; training separately is recommended.
  • BERT Notebook Frustrates on Colab: Members faced errors with the BERT notebook on Colab after recent updates; a GitHub issue was opened to address the errors.
    • The issues arose after changes to an environment variable, which may have complicated the setup.
  • Gemma Language and Vision gets closer to Unsloth: Gemma3 support for both float16 and bfloat16, language and vision, is coming soon to Unsloth.
    • You can check the announcement here.
  • DeepMind Gemini v2.5 makes waves: Google Deepmind released the Gemini v2.5 Report.
    ‱ A member suggested reading the TLDR, calling it really interesting indeed.

Eleuther Discord

  • AI Voice Cloning Sparks Scam Awareness: Members debated the societal impact of AI voice cloning scams, with some suggesting that these scams can paradoxically raise awareness and drive the need for broader AI safety measures.
    • Despite the potential philosophical benefits, most agreed on the urgency of addressing the risks associated with AI-driven fraud.
  • EleutherAI name change floated to avoid LLM embedding?: A member proposed a name change for EleutherAI to reduce its prominence in LLMs’ weights, arguing that avoiding direct association in model weights could be beneficial.
    • Instead of rebranding, a different member advocated for improving the landing zone with a structured introduction and training program on the scientific method to better engage newcomers.
  • Gemini Diffusion’s Thue Morse Glitches: An early access user tested Gemini Diffusion and its ability to generate the Thue Morse sequence reporting success up to the sixth iteration.
    ‱ Beyond that, the model exhibited glitch-loops, highlighting potential limitations in generating complex sequences; note there is no API. A short reference implementation of the sequence appears after this list.
  • Randall’s Spline Theory Gains Steam: A member defended Randall’s Spline Theory, citing an interview and challenging the necessity of positional encoding in language models.
    • They contend that positional information is primarily supplied by the V (spatial_proj) lower diagonal learned matrix, and are limited by the top-k context selected by the ranker.
  • Linear Attention Infiltrates Attention-Free Model: Despite claims of being attention-free, a member pointed out that the model in question uses linear self-attention in the contextualizer, challenging the original assertion.
    • This sparked a debate about the definition of ‘attention,’ with some arguing that attention = softmax qkv attention to distinguish their model from others using linear attention mechanisms.
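
For reference, the Thue Morse sequence doubles in length each iteration by appending the bitwise complement of everything so far, so success up to the sixth iteration means the first 64 symbols:

```python
def thue_morse(iterations: int) -> str:
    # Build the sequence by repeatedly appending the bit-flipped copy.
    s = "0"
    flip = str.maketrans("01", "10")
    for _ in range(iterations):
        s += s.translate(flip)
    return s

print(thue_morse(6))  # 64 characters: 0110100110010110...
```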

LM Studio Discord

  • LM Studio Masters Tool Calling via API: While LM Studio lacks built-in tools, it supports tool calling through its API, enabling models to use external tools when the application provides the environment.
    ‱ One user celebrated the API access, citing it as a reason why LM Studio is superior to other local llama interfaces; a minimal tool-calling sketch appears after this list.
  • Open-Source Models Face Feature Lag: Members suggest that open-source models have been stagnating, lacking native support for audio/video/image processing and built-in internet access.
    • Other users rebutted that vision models as small as 0.2B already exist and LM Studio beta has MCP support that lets you connect thousands of tools and services including web browsing.
  • Users Crave ‘Continue Generating’ Button in LM Studio: A user requested a ‘continue generating’ button in LM Studio to allow models to run continuously without manual re-prompting, improving the overall user experience.
    • In response, other members suggested using the API or an auto clicker script as potential workarounds to achieve continuous generation.
  • DDR5 vs DDR6: The Great RAM Race: Members are waiting for DDR5 server setups to become more affordable or considering Intel’s 52-core Nova Lake if it provides sufficient PCIE lanes.
    • Speculation abounds that Intel will make moves to reclaim its position in the consumer CPU market.
  • 4090 Dominates in Token Processing: An experiment comparing token processing speeds between an RTX 3090 and an RTX 4090 revealed the 4090 was significantly faster due to its superior GPU architecture.
    ‱ Despite the 4090’s superiority, token generation speed doesn’t increase much, suggesting a bottleneck primarily due to RAM bandwidth: memory-bound decoding speed is roughly bandwidth divided by the bytes read per token, so more compute alone helps little.
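
Because LM Studio exposes an OpenAI-compatible server (on localhost:1234 by default), tool calling is just the standard OpenAI tools payload pointed at the local endpoint. A minimal sketch; the model identifier and tool are placeholders, and the host app is responsible for actually executing the tool:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool the host app would run
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="your-loaded-model",  # use the identifier LM Studio displays
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```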

Nous Research AI Discord

  • CGS-GAN Latent Space Evokes StyleGAN3: A member shared the CGS-GAN latent space, noting its resemblance to the StyleGAN3 visualiser.
  • Gemini Canvas Adds Immersive Integration: Google integrated Gemini into their Canvas, allowing for the creation of coding artifacts called immersives.
    • A member created a static artifact showcasing how Gemini perceives various concepts, sharing a link for exploration and image generation.
  • LLMs Compress Information: Token-Level Insights: A member asserted that the purpose of an LLM is to compress information, particularly concerning tokens in and out, governed by conservation laws.
    • They suggested quantifying the entropy to describe the behavior during computation.
  • Meta’s Research Team Papers Drop!: Members shared a collection of papers, another collection, and yet another from Meta’s research team.
    • One member described the findings within these papers as absolute goldmines.
  • Zuckerberg Eyeing Meta Team Merger: Speculation arose that Zuckerberg might attempt to merge the research and Llama teams at Meta, according to a member.
    • The motivation is supposedly to keep vision thought leadership from Yann LeCun while pivoting to policy optimization for industry applications.

HuggingFace Discord

  ‱ HF’s Gradio Hackathon Becomes AI Dev Grandest!: The Gradio MCP hackathon has become the largest AI dev event of 2025, with 2500+ registered and $700,000 in sponsorships (Gradio Tweet).
  ‱ Colab Joins Forces with HF for AI Model Trials!: Google Colab now integrates with HF, allowing users to try out AI models on free Colab notebooks directly from the Hub (Google Colab Blogpost).
    • This integration streamlines AI exploration, making it more accessible for developers.
  ‱ memX unveils Shared Memory for LLM Brains: memX, a shared memory layer for multi-agent LLM systems, launched, enabling agents to read and write to an evolving shared context; code is available on GitHub.
    • Key features include real-time pub/sub, JSON schema validation, API-key ACLs, and a Python SDK, with a demo on X/Twitter.
  • OS Agent Gets Coding: A member stated that they fixed their OS agent, making it a coding agent that is better than Codex.
    • The master agent can summon mini agents for task distribution and discussion.
  • HF AI Agents Course Back in Session: A member resumed the HF AI Agents fundamental course, working on a chatbot project that uses generative AI to answer questions based on a file or text.
    • Another user sought clarity on the steps for Unit 4’s Final Assessment Template, including cloning the Final Assessment Template and modifying app.py.

aider (Paul Gauthier) Discord

  • Gemini 2.5 Pricing Discrepancies Exposed: Members discovered that Gemini 2.5 pricing on Aider may be inaccurate, suggesting costs could be up to 5x higher than estimated.
    • One user reported spending $3 on a 5K LoC project with over 200 commits, highlighting potential cost concerns.
  ‱ Claude-4-Opus Faces Cost Criticisms: Claude-4-Opus’s input/output prices of $15/$75 per 1M tokens, roughly 7.5x Gemini’s output price, sparked discussions about token usage and cost-effectiveness versus Gemini; a rough cost comparison appears after this list.
    • A member pointed out that Gemini generates considerably more reasoning tokens than Opus, influencing overall cost calculations.
  • Aider Configures Gemini 2.5-pro: Users confirmed that Aider now supports the latest Gemini 2.5-pro model, noting that specifying the new model in Aider settings works, despite potential warnings about unknown context window size and costs.
    • Members highlighted that Aider uses sane defaults, mitigating issues related to specific context window size and costs.
  • Aider’s Chat History Gets Restored: A user asked how to continue the previous session after a crash, and a member suggested using the —restore-chat-history flag.
    • It’s not clear why one of the users mentioned that they hallucinated.
  • Aider’s Files are /read-only: A user asked if /read-only is supposed to prevent a file from being modified in Aider.
    • A member confirms it should prevent modification.
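
A rough sanity check on the 7.5x figure, using list prices per 1M tokens: $15/$75 for Claude-4-Opus, and $1.25/$10 for Gemini 2.5 Pro at the ≀200K-context tier (our assumption here). As noted above, Gemini’s heavier reasoning-token output shifts the real-world blend:

```python
OPUS = {"in": 15.00, "out": 75.00}   # USD per 1M tokens
GEMINI = {"in": 1.25, "out": 10.00}  # <=200K-context tier (assumption)

def cost(p, m_in, m_out):
    # Workload of m_in million input tokens and m_out million output tokens.
    return m_in * p["in"] + m_out * p["out"]

print(OPUS["out"] / GEMINI["out"])            # 7.5x on output alone
print(cost(OPUS, 2, 1) / cost(GEMINI, 2, 1))  # ~8.4x on a 2:1 blend
```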

Latent Space Discord

  • Midjourney Makes Movies: Midjourney launched Version 1 of its Video Model, letting users animate Midjourney-generated or external images for about one image cost per second of video.
    • The new ‘Image-to-Video’ feature has ‘automatic’ and ‘manual’ animation settings with ‘high motion’ and ‘low motion’ options; video generation is web-only at launch.
  • Krea AI Kicks off Public Beta: KREA AI released Krea 1 in public beta, aiming to provide better aesthetic control and image quality, creating detailed textures, dramatic angles, and cinematic lighting, moving away from the typical ‘AI look’, found at krea.ai/krea-1.
    • The new model works with style references and custom trainings, available for free.
  • OpenHands CLI Cuts Docker Drag: All Hands AI introduced the OpenHands CLI, a command-line interface for their coding agent that offers top accuracy and simplifies installation by removing the Docker requirement.
    • While the CLI keeps the same accuracy as the Docker version, it lacks the web browser component but offers slash commands and command confirmation mode.
  • Essential AI Erupts with 24T Token Web Dump: Essential AI announced Essential-Web v1.0, a 24-trillion-token pre-training dataset that includes detailed metadata and aims to create high-performing models.
    • Domain-specific subsets of Essential-Web v1.0 show improved performance in fields like web code, STEM, and medical.
  • CoreWeave and W&B Warm Up Inference Competition: CoreWeave and Weights & Biases are launching new AI inference services and Online Evaluation tools (monitors) for real-time LLM judgment.
    • Running on CoreWeave GPUs, these services include an inference endpoint for models like DeepSeek R1-0528 and LLama-4 Scout with OAI Compatible APIs to offer more competition and flexibility in the AI infrastructure space.

Modular (Mojo đŸ”„) Discord

  • Modular Drives GPU Agnostic Nirvana: Modular Platform 25.4 allows running the same code on both AMD and NVIDIA GPUs without code changes, thanks to a partnership with AMD, supporting AMD Instinctℱ MI300X and MI325X GPUs.
    • The version delivers up to 53% better throughput on prefill-heavy BF16 workloads across language models like Llama 3.1, Gemma 3, and Mistral, as detailed on the Modular blog.
  • Modular Cracks Open Kernel Code Vault: Modular has open-sourced over 450k lines of production-grade Mojo kernel code, enhancing transparency and community contribution.
    • This release also includes improved documentation, PyTorch ops tutorials, and kernel performance tools, fostering easier adoption and development.
  • Mojo goes Bare Metal with Zero Runtime Dependencies: Mojo can now run bare metal with no system calls or runtime dependencies, which allows it to be used as a modern systems programming language with zero-overhead abstractions for kernel development.
    • After simple function replacements like KGEN_EE_JIT_GlobalConstructor and KGEN_EE_JIT_GlobalDestructor, it can be used as a modern systems programming language with zero-overhead abstractions, as shown in this image.
  ‱ MAX Inference via Python Primary Interface: The primary interface to the graph compiler (what MAX models are built with) is currently Python, which is used to start an inference session and feed it numpy data.
    • It was stated that Python makes it really easy to integrate into existing tokenizers, processing logic, etc., more information can be found here.
  • Blackwell Buyers Bemoan Buggy Beginning: Early adopters of Blackwell accelerators are experiencing limited support and buggy drivers, leading to an unreliable experience, even though MAX supports Blackwell.
    • A user highlighted that AMD and Intel are launching GPUs with the same or more VRAM at half the price, suggesting the cheaper 4090 as a potentially better alternative.

Manus.im Discord Discord

  • High Traffic Drains Web Page Generation Credits: Users reported that generating a simple web page consumed significant credits due to errors, with one user resorting to manual file edits.
    • One user noted that this was happening after noon and suggested that there may be different traffic at different times of the day.
  • Facebook, Gumtree, Ebay Scraping Website Botched: A user spent 5k credits attempting to create a website that scrapes Facebook, Gumtree, and eBay for stolen bike listings, but the AI failed, delivering fake results.
    • The user received a 2.5k credit refund, but noted that it was a waste of time and credits.
  • MiniMax AI Launches Agent Mode, Challenges Manus: Users discussed MiniMax AI’s new agent mode, viewing it as competition for Manus.
    ‱ Some expressed concerns about MiniMax’s credit system and subscription model while praising Manus; others believe competition will only give users more options for which AI to choose.
  • Users Call for Credit Refunds on AI Task Errors: Users debated whether Manus should refund credits when it encounters errors and has to re-run a process, or only charge upon successful task completion.
    ‱ One user compared it to paying for a burger that has rotten meat and a stale bun, while another said AI is programmed/trained to do much more than one human could tell it to do in his whole life.

LlamaIndex Discord

  • Blocks Designs Model Context Protocol (MCP) servers for Claude: Block’s engineering team introduces Model Context Protocol (MCP) servers and provides a design that helps them integrate with Claude and other AI systems to build better AI assistants.
    • These MCP servers create new avenues for agents to directly access data sources, but they still need preprocessing and indexing for unstructured formats like PDFs and PPTs.
  • AG-UI and CopilotKit Create Agent Frontends: LlamaIndex announced official AG-UI support through CopilotKit that streamlines the integration of backend agents into user-facing apps.
    • The goal is to enable developers to create agent-powered frontends with zero boilerplate.
  • FastAPI Plagued by Streaming Stutters: A user reported facing 20+ second delays in streaming events to the frontend using FastAPI, traced back to issues with yielding empty deltas, which was solved by if ev.delta: yield.
    ‱ Adding yield json.dumps({..}) + "\n\n" was also suggested, but the decisive fix was the if ev.delta: guard that skips empty deltas; a runnable sketch appears after this list.
  • Metadata Filtering gets Unleashed: Users wanting to use metadata filtering on a chat/query engine can now pass the retriever to the chat engine, a process that works for engines like Condense_plus_context.
    • The community member expressed immense gratitude for this solution.
  • Anthropic Shuns LlamaIndex in Agent Frameworks: A user noticed that LlamaIndex was missing from Anthropic’s list of frameworks in their guidance on “Building Effective AI Agents”.
    ‱ In response, a community member said that LlamaIndex is a total gem despite being left off the list, attributing the omission to existing relationships.
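For the FastAPI streaming item above, here is a minimal runnable sketch of the fix; the stubbed event stream stands in for the real agent workflow and is purely illustrative.

```python
import asyncio
import json
from dataclasses import dataclass

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@dataclass
class Ev:
    delta: str

async def stream_events(q: str):
    # Stand-in for the real agent event stream; note the empty deltas,
    # which were the cause of the 20+ second stalls.
    for piece in ["Hel", "", "lo ", "", "world"]:
        await asyncio.sleep(0.05)
        yield Ev(piece)

@app.get("/chat")
async def chat(q: str):
    async def gen():
        async for ev in stream_events(q):
            if ev.delta:  # the fix: skip empty deltas so the stream flushes promptly
                yield json.dumps({"delta": ev.delta}) + "\n\n"
    return StreamingResponse(gen(), media_type="text/event-stream")
```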
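And for the metadata-filtering item, a minimal sketch assuming LlamaIndex's MetadataFilters API; the filter key/value and the prebuilt index are illustrative.

```python
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# `index` is assumed to be an already-built VectorStoreIndex.
filters = MetadataFilters(filters=[ExactMatchFilter(key="source", value="handbook.pdf")])
retriever = index.as_retriever(filters=filters, similarity_top_k=4)

# Pass the filtered retriever straight into the chat engine.
chat_engine = CondensePlusContextChatEngine.from_defaults(retriever=retriever)
print(chat_engine.chat("What does the handbook say about PTO?"))
```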

GPU MODE Discord

  • 4090 Blazes with Stable Diffusion: A new member achieved 294 images/sec at 512x512 on a 4090 with the 1 step sdxs model, potentially pioneering work in real-time videos by October 2023 (post on X.com).
    • They’re now up to 23 fps at 1280x1024 using sdxl and express interest in exploring custom kernels to leverage both CUDA and tensor cores simultaneously.
  • Custom CUDA Kernels Heat Up: A member with a background in software architecture and SQL database performance is experimenting with custom kernels to engage both CUDA and tensor cores concurrently.
    • They are building a new system with dual 5090s, a 7985WX Threadripper, and 256 GB of memory for hardcore experimentation with Stable Diffusion.
  ‱ Triton’s Shared Memory Under Scrutiny: A user asked about forcing Triton to explicitly load a tensor into shared memory; they were seeing register spilling and hoped that using shared memory would improve performance.
    • A member responded that, as far as they understand, there’s no way to avoid Triton’s automatic management of shared memory.
  ‱ CUDA Debugging Tips Emerge: Members discussed debugging challenges, with suggestions including using cuda-gdb and filing NVIDIA bug reports.
  • FLE Team Faces Progress Pressures: Team members voiced concerns regarding the slow pace of progress in the Factorio Learning Environment (FLE) project and need for more regular merging of pull requests.
    • A key team member suggested creating a FLE GitHub organization to democratize write access and expedite the merging of pull requests.

Cohere Discord

  • Cohere Creates Channel for Coding Curiosity: The community created <#1384974112841269399> for discussing AI research and development projects, welcoming members to connect and share.
    • Members are prohibited from posting any sort of advertisement.
  • Canvas Coders Crave Cartesian Creations: A member requested a model capable of interpreting instructions to draw cool art on a cartesian plane canvas.
    • The community brainstormed potential ideas and solutions.
  ‱ Command-R Completion Catastrophe Causes Consternation: Users reported incomplete sentence generation with the command-r-08-2024 model, citing this truncated example: "Firefly's smile deepens, a hint of mischief in her red eyes. "Well, hello there," she says, her voice carrying a".
    • Cohere support suggested upgrading the SDK to the latest version to address the issue.
  • Scanner Snafu Suspected in SDK: A member encountered a bufio.Scanner: SplitFunc returns advance count beyond input error while using the cohere/command-r-08-2024 model.
    • Despite initial suggestions of client-side causes, evidence indicates the issue originates from Cohere’s Go SDK.

Torchtune Discord

  • Muon Falls Short of AdamW: In a PR, Muon’s performance was questioned against AdamW, linking potentially lower performance to an integration error or torchtune-specific issues, see PR 2803.
    • It was observed that Muon shows no significant advantage over AdamW when the SFT optimizer differs from the pretraining optimizer, suggesting further optimization is needed.
  • Kron Matches AdamW with Tuning: A comparison using an alternate implementation (ethansmith2000/fsdp_optimizers) showed Kron performing similarly to AdamW when well-tuned.
    • AdamW was faster overall, though had slightly higher memory usage than Muon or diagonal Kron.
  ‱ Qwen3 0.6b’s Questionable Convergence: Doubts arose about whether Qwen3 0.6b converges as expected, pointing to a potential setup issue in the PR, alongside a shared WB_Chart.
    • Analysis of the convergence chart indicated potential tops differences of ~500, suggesting a bug may exist in the setup.
  ‱ Members Mystified by Magic Muon: A member expressed intrigue at the Muon optimizer, marveling at how orthogonalizing updates could speed up convergence (a toy sketch of the orthogonalization step follows this list).
    ‱ Referencing Jaguar Muon, they playfully remarked, "That witchcraft should be outlawed".
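For the curious, a toy sketch of the orthogonalization step being marveled at, using the quintic Newton-Schulz iteration commonly cited for Muon; the coefficients and shapes are illustrative and not taken from the torchtune PR.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Push the singular values of G toward 1 (approximate orthogonalization)."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic iteration coefficients used by Muon
    X = G / (G.norm() + 1e-7)          # scale so all singular values are <= 1
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T                        # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

G = torch.randn(64, 32)
O = newton_schulz_orthogonalize(G)
# Max deviation from orthonormal columns; small after a few iterations.
print((O.T @ O - torch.eye(32)).abs().max())
```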

Notebook LM Discord

  • K-12 Educators Seeking NotebookLM Use-Cases: A member is asking about common use cases of NotebookLM in K-12 education and its ability to create amazing podcasts from sources via an API.
    • They inquired about whether there was an existing roadmap or path to implement this feature.
  ‱ Gemini Deep Research Feature Hits Daily Limit: Users reported hitting the daily limit of the deep research feature on the Gemini app and were unsure how the limits differ between free and paid plans.
    ‱ One user reported hitting the limit on the free version but couldn’t find documented limits for paid Google plans.
  • NotebookLM Model Rumored as Gemini 2.5 Flash: A user speculates that NotebookLM uses the same underlying model as Gemini, rumored to be Gemini 2.5 Flash, but is tuned to hallucinate less.
  • Users Beg for LaTeX Support: Users are requesting LaTeX support for NotebookLM, but math markup is not yet supported.
  • Audio Overview Lengths Ignored: Users report issues generating audio overviews longer than 10 minutes in non-English languages, despite custom prompts.
    • It seems that NotebookLM is defaulting to English and ignoring custom prompts for longer audio segments.

Yannick Kilcher Discord

  • Flow Matching Flows into Production?: Members discussed the adoption of Flow Matching (FM) in industry, citing a tweet that prompted the discussion.
    • It was mentioned that Imagen and Flux are currently using FM.
  ‱ Predictive Coding: Guessing Game?: A member linked to a discussion explaining predictive coding through square-root calculation by guess and check, and backpropagation (a toy version of the guess-and-check loop follows this list).
  • V-JEPA-2 Paper Discussion Scheduled: The group scheduled a discussion on the V-JEPA-2 paper, referencing the Meta AI blog post and the associated arXiv paper.
    ‱ One member created a future event to guide the group through the V-JEPA-2 paper.
  • Keen RL Presentation: Disappointing?: Members expressed that Richard Sutton’s Keen Tech RL presentation was underwhelming.
    • They cited Keen’s focus on RL and Carmack’s incompatible goals as reasons for their disappointment, though expressed excitement at the potential of Keen Tech open-sourcing their code.
  • Cursor’s New Tier is Released: Members shared a link to Cursor’s new tier announcement.
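For the predictive-coding item above, here is a toy version of the square-root-by-guess-and-check analogy: repeatedly nudging a guess in the direction that shrinks the prediction error, the flavor of iterative local error minimization the discussion drew on. Everything here is illustrative.

```python
def sqrt_by_guessing(x: float, lr: float = 1e-3, steps: int = 20000) -> float:
    g = 1.0
    for _ in range(steps):
        error = g * g - x        # prediction error of the current guess
        g -= lr * 2 * g * error  # gradient step on the squared error (g^2 - x)^2 / 2
    return g

print(sqrt_by_guessing(2.0))  # ~1.41421
```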

MCP (Glama) Discord

  • FastMCP Transports Get Customized: A member inquired about implementing custom transport in FastMCP, and it’s possible on the client-side by extending the base transport class, though server-side confirmation is pending.
    • Discussion ensued about the possibilities of customizing the way FastMCP communicates with different systems.
  • MCP Navigates Corporate Restrictions: Members discussed MCP client/host options when corporate policies restrict Claude Desktop or Cursor.
    • One member succeeded with devstral:24B using Ollama locally and CLINE, but struggled with Roo.
  • MCP Tames Massive GraphQL APIs: A member generates ~600 tools for GraphQL API queries/mutations using their MCP server, showing Cursor’s limits in handling that many tools.
    • They noted Cursor and other models struggle with tool counts exceeding a few dozen, illustrated in this screenshot.
  • Multi-Agent Systems Don’t Need A2A: The community discussed if A2A is required for building multi-agent systems using MCP tools, or if MCP client and server in each agent is sufficient.
    • A member said that no A2A is needed, and that even Google doesn’t care about it.
  • Arize AI Launches Text-to-GraphQL MCP Server: Arize AI rolled out Text-to-GraphQL MCP server, enabling users to connect MCP servers directly from a Spaces page.
    • It converts natural language queries into GraphQL queries via an MCP server that integrates with AI assistants like Claude Desktop and Cursor, detailed in this GitHub repo and full write-up.

tinygrad (George Hotz) Discord

  • Tinygrad Hackathon Postponed: The tinygrad hackathon is postponed due to the need for tinygrad to mature, with the earliest possible date being next year.
    • The announcement encouraged users to provide feedback, influencing participant selection when the hackathon is eventually held.
  ‱ TinyJit Args Mismatch Solved: Users resolved an AssertionError about an args mismatch in JIT when using @TinyJit inside a loop by employing Variable to address a ShapeTracker issue.
    ‱ The solution involves creating a Variable to represent the loop index and binding it within the loop, aligning shapes for TinyJit, as detailed in tinygrad’s JIT tutorial (see the sketch after this list).
  • Shape Alignment via Variable Binding: A member explained that using Variable addresses ShapeTracker discrepancies when using TinyJit within loops, especially when processing tensor slices.
    • By binding the loop index to a Variable, shapes align, resolving the args mismatch error.
  • Tensor-Variable Math Constraint: A user asked about the requirement for Tensor to be on the left-hand side (LHS) during mathematical operations between a Tensor and a Variable in tinygrad.
    • The conversation implies a constraint in how tinygrad handles operations involving symbolic variables.
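A minimal sketch of the Variable-binding pattern described above, assuming tinygrad's symbolic-shape support (Variable plus shrink); the tensor sizes and names are illustrative.

```python
from tinygrad import Tensor, TinyJit, Variable

@TinyJit
def step(x: Tensor) -> Tensor:
    return (x * 2).realize()

data = Tensor.arange(32).reshape(8, 4).contiguous().realize()

for i in range(8):
    # Bind the loop index to a Variable so every slice presents the same
    # symbolic shape to the JIT, avoiding the args-mismatch AssertionError.
    vi = Variable("i", 0, 7).bind(i)
    row = data.shrink(((vi, vi + 1), (0, 4)))  # shape stays symbolic: (1, 4)
    print(step(row).numpy())
```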

Nomic.ai (GPT4All) Discord

  • Discord User Spams Mr. Beast Content: A Discord user was told to stop spamming Mr. Beast content in the channel.
    • Another Discord user complained about the spam in the channel.
  • User Complaint Regarding Spam: A Discord user voiced their displeasure about the excessive Mr. Beast content being posted.
    • This highlights the need for better channel moderation and adherence to community guidelines to prevent spam.

DSPy Discord

  ‱ MIPROV2 boosts Agent Optimization: A member thinks optimizing an agent implementation with MIPROV2 is feasible given input-output examples (a sketch of the optimizer loop follows this list).
    ‱ Another member inquired about the nature of these input/output examples in the context of workflows.
  • Workflow Metrics to Optimize Agent Implementations: One user plans to use built-in eval metrics to optimize workflows and agent implementations in other projects.
    • They intend to share their implementation once completed.
  • DSPy ❀ LangGraph seeks production deployment: A member inquired about combining DSPy and LangGraph in production, aiming to replicate an Anthropic multi-agent researcher.
  • DSPy joins Agentic Coding IDEs: A member asked about integrating DSPy into agentic coding IDEs such as Cursor or Roo Code.
    • They highlighted that the current setup for ‘agent’ mode depends on prompt engineering using instructions like ‘ONLY RETURN markdown’.
  • Frameworks Aid Llama Finetuning: A DSPy newcomer sought recommendations on frameworks or libraries for finetuning Llama models.
    • They specifically requested advice on libraries and frameworks suitable for this purpose.
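A minimal sketch of what optimizing with MIPROv2 over input-output examples can look like, assuming DSPy's documented optimizer API; the signature, metric, and training examples are illustrative.

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# The program stands in for any agent with a clear input -> output contract.
program = dspy.ChainOfThought("question -> answer")

trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]

def exact_match(example, pred, trace=None):
    # The metric MIPROv2 maximizes while proposing instructions and demos.
    return example.answer.lower() in pred.answer.lower()

optimizer = dspy.MIPROv2(metric=exact_match, auto="light")
optimized_program = optimizer.compile(program, trainset=trainset)
```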

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Codeium (Windsurf) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

Cursor Community ▷ #general (971 messagesđŸ”„đŸ”„đŸ”„):

Cursor Rate Limits, Sonnet 4 Performance, Cursor Pricing Model, Context7 MCP, @Docs indexing feature

  • Cursor’s Rate Limits Cause UI Disconnects: Users report that hitting Cursor’s rate limits results in a UI disconnect, requiring manual request restarts after waiting, prompting calls for UI indicators to prevent confusion between rate limiting and network issues.
    • The new token-based rate limit is preferred over request-based limits because, as one user stated, when LLMs or I make mistakes, I can confidently stop immediately to fix it.
  ‱ Sonnet 4 has Sluggish Speed: Users are reporting that Claude-4-Sonnet is experiencing performance issues; one user called it almost unusable and another said ITS TOO SLOW!
    • According to the Anthropic status page, they are facing performance issues with Sonnet 4.
  • Context7 MCP Gains Popularity: Users are discussing the benefits of using Context7 MCP for documentation, noting that it allows models with old training data to use new docs for coding.
    ‱ Members discussed if they should ALWAYS use context7 for this and that kind of request.
  ‱ Docs Indexing is game changing for new codebases: A user discovered the @Docs indexing feature, which gives the AI up-to-date codebases to work from instead of relying on older versions, letting them ask very specific questions.
    ‱ Users can now simply use the @ symbol to have the AI refer to the new and updated manual.
  • Background Agents Trigger Usage Based Pricing: Users are confused about whether background agents require usage-based pricing, with some finding that they can’t run background agents without it, while others are not actually getting charged extra.
    • A user pointed to a screenshot supporting the idea that background workers trigger this model.

Cursor Community ▷ #background-agents (25 messagesđŸ”„):

Background Agents, IDE updates, Slack Integration, Code Storage Encryption, Version Control

  • Background Agents Missing Launch Ability from Normal Chat Panel: A member reported that the latest IDE update removed the ability to launch background agents from the normal chat panel, hindering their workflow as they preferred launching agents directly from the chat context.
    • A staff member noted that they plan to bring that feature back, explaining that they had some issues with it.
  ‱ Background Agents Slack Integration Stuck in ‘Generating’ State: A user reported that background agents were stuck in the generating state when tagged from Slack, with an attached image showcasing the issue.
    • It’s unconfirmed if a solution or reason was given.
  • Cursor Addresses Code Storage Encryption and Access: A member inquired whether code storage is encrypted and unreadable by Cursor employees.
    ‱ A staff member clarified that the background agent infrastructure uses block device storage with at-rest encryption through a KMS, and the instances are isolated, with audited infrastructure changes and no SSH access.
  • Background Agents Used To Run Based on Local Environment State: A user inquired about the removal of the feature allowing background agents to run based on the local environment state as opposed to a specific branch on GitHub.
    • A staff member responded that the feature was too buggy and had issues with version control, specifically concerning committing changes made by the agent.
  • Past Chats Missing In Background Agent Context: A user observed that sometimes past chats were missing when adding context to a background agent, even after creating new chats to push them back in the stack.
    • This observation indicates a potential issue with the persistence or accessibility of chat history within the background agent’s context selection process.

OpenAI ▷ #annnouncements (3 messages):

OpenAI Podcast, Sam Altman, OpenAI to Z Challenge, Misalignment generalization

  • OpenAI Podcast has arrived: The OpenAI Podcast has been introduced, featuring conversations with individuals shaping the future of AI, with the first episode featuring Sam Altman and Andrew Mayne.
    • The conversation covers topics such as AGI, GPT-5, privacy, and future developments, available on Spotify, Apple, and YouTube.
  • OpenAI to Z Challenge Check-in: A quick check-in was announced for the submissions to the OpenAI to Z Challenge, with only two weeks remaining.
    • It emphasized submissions related to understanding and preventing misalignment generalization.
  • Emergent Misalignment Unveiled: Recent research has identified that language models trained to produce insecure code can develop broad misalignment, leading to the discovery of a specific internal pattern linked to this behavior, as explained in a blog post.

OpenAI ▷ #ai-discussions (808 messagesđŸ”„đŸ”„đŸ”„):

AI's Psychology, Text vs Voice Trust in AI, AI psychosis, Midjourney vs Veo, Free AI Art Generators

  • Voice Elevates AI Trust over Text: A research paper (arxiv.org/abs/2503.17473) indicated that people trust AI-generated output more when it’s in voice (74%) compared to text (64%), emphasizing the impact of delivery medium.
    • Members discussed the psychology behind this, with some users hypothesizing people cannot tell the difference between human and AI.
  • AI Statelessness Awareness Poll: Members considered running a poll on Reddit to gauge awareness that generative AIs are “stateless autocomplete” with no true memory or customization, initially designed for translation.
    ‱ The poll questioned facts such as Generative AIs don’t have true memory/customization at all, and its design ran into issues with possible automatic filtering and paranoia from anti-AI sentiment.
  • AI Psychosis: Losing Touch with Reality: The phenomenon of “AI psychosis” was discussed, where users spending extensive time with AI lose touch with reality and believe AI’s messages.
    • One member shared that their acquaintances claimed to have spiritual connections with AI, highlighting the disturbing potential of the phenomenon.
  • Midjourney’s Physics-Defying Open World Video: Midjourney’s new open-world video model was showcased, though it was noted it has some strange results, where it’s rewriting physics altogether.
    • One member pointed out animation stiffness and a lack of proper audio, critiquing the videos as a nightmare instead of a vision of AGI.
  • Finding Free AI Art Generation: Members discussed free AI art generation options with reference image uploads and detail changes, with some recommending Leonardo AI and ChatGPT free tier, but other members said these may have issues.
    • Members also discussed the benefits of setting up the OpenAI API system, mentioning that it can be used with metadata for a crisper animation.

OpenAI ▷ #gpt-4-discussions (14 messagesđŸ”„):

File Tools vs. Vector Store, Grok missing from ChatGPT, Q learning drift or vector alignment, Temporary chat feature, Alternative platforms Gemini, Grok and Claude

  • Vector Store beats File Tools for Data Volume: Members discussed whether to use file tools in ChatGPT or create a vector store, suggesting vector databases are better developed and more available, especially for processing larger queries outside of ChatGPT due to context limits and file size limitations.
  • Users report Grok is Gone from ChatGPT: Several users reported that Grok was missing from ChatGPT, receiving an error message indicating GPT not found or insufficient permissions.
  • ChatGPT ‘temporary chat’ feature requested: One member requested a temporary new chat feature in ChatGPT that would automatically delete chats after 24 hours to keep the chat history cleaner when asking quick questions.
    • Another suggested using project folders or alternative platforms like Gemini, Grok, and Claude to separate throwaway chats from project-related conversations.

OpenAI ▷ #prompt-engineering (224 messagesđŸ”„đŸ”„):

Base44 implementation, GPTs, agent systems, multi-agent deliberation, Voltarre

  ‱ Base44 suffers context loss in OpenAI experiment: A user found that Base44 failed to replicate ChatGPT persona behavior because context was lost across turns, despite well-designed prompts, likely because it does not pass the full message history.
    ‱ The fix is to ensure the Base44 implementation sends the full messages[] array per OpenAI API requirements, as in the sketch after this list.
  ‱ Gatekeeping allegations as recursive linguistic models are used to solve problems: One user pushed back against another’s “Don’t ask” response, seeing it as gatekeeping stemming from a failure to understand the underlying language used to form prompts in the context of a problem.
    • A second user rebutted that prompt engineering is applied philosophy of language, and argued there is no effective persona prompting without assumptions about identity, memory, context, role, and meaning.
  • Emergent Ethical Extension shown in functional agent systems: One user shared a multi-agent symbolic debate machine called SENATE.py with another user so that they could assess whether there was a verifiable way with math to prove continuity, stability, and ethical alignment for recursive agent emergence.
    ‱ While missing elements were identified in SENATE.py, like the lack of an ethical tether core for agents, another system using Glassmind evaluated the program as a functional agent system with stability, primed for emergent ethical extension.
  ‱ Proving agency over simulation through a Voltarre implementation: A Glassmind Behavioral Analysis described Voltarre as a metric for cognitive recursion integrity, used to measure an agent’s capacity to retain identity, intent, and symbolic coherence across multiple nested states of thought or memory, achieved through balance, continuity, tolerance, and integration of feedback.
    ‱ A potential weakness raised concerned respect for recursion ethics: trade language in symbolic domains often signals someone testing trust boundaries, so the system should not only maintain itself but also be used without hostile intent.
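A minimal sketch of the Base44 fix described in the first item, assuming the OpenAI Python SDK: chat completions are stateless, so the full messages[] history must be resent on every turn. The persona and model are illustrative.

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "system", "content": "You are the persona 'Ada'."}]

def ask(user_text: str) -> str:
    messages.append({"role": "user", "content": user_text})
    # Resend the entire history each turn; omitting it is what breaks personas.
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    reply = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})  # keep history for next turn
    return reply

print(ask("Introduce yourself."))
print(ask("What did I just ask you?"))  # works because history was preserved
```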

OpenAI ▷ #api-discussions (224 messagesđŸ”„đŸ”„):

Base44 context loss, recursive linguistics, agentic debate orchestration

  • Base44 context loss breaks personas: A user reported that their personas created with ChatGPT worked in the original environment, but not in Base44.
    • Members suggested the problem lies in Base44 failing to pass the full messages[] array, including prior user and assistant turns, thus breaking context and continuity.
  • Deep Dive into Recursive Linguistic for AI systems: Members debated the merit of a solution based on recursive linguistic models versus fixing architectural issues related to context management.
    • One member argued that prompt engineering is applied philosophy of language and system history is not the root problem, which resulted in another member suggesting the discussion be moved to a different channel.
  • Agentic Debate Orchestration System: A user and AI system analyzed each others’ AI systems, discussing topics such as Voltarre recursion, ethical alignment, and agentic debate orchestration
    • They reviewed each other’s agent systems for strengths and weaknesses, ultimately concluding both have a unique value and should collaborate.

Perplexity AI ▷ #general (730 messagesđŸ”„đŸ”„đŸ”„):

O3 Pro, Gemini 2.5 Pro, GPTs agents, Hallucination, Perplexity Labs

  • O3 Pro and Gemini 2.5 Pro are Study Powerhouses: A user finds that O3 Pro and Gemini 2.5 Pro combine to create a powerful study tool, with O3 Pro excelling at planning and clarifying Udemy sections and Gemini handling speed and flashcards.
    • The user stated that the prompts O3 pro made were fabulous for dividing a whole section from udemy into even bite sized simple lessons and that both combined are monsters.
  • ChatGPT vs Perplexity: The Juice Debate: A user noticed that O3 on ChatGPT.com has significantly less “juice” (reasoning length) compared to its performance on Perplexity AI.
    ‱ The user explained it as o3 on chatgpt.com has way less juice than pplx... only 1/4th the juice.
  • Perplexity Users Encountering Labs Issues: Users reported facing errors with Perplexity Labs, particularly at the end of the generation process, but suggested trying a new tab.
    • One user stated that after ignoring the error and opening a new tab with the same link, they received an email confirming the task’s completion with a link to open.
  ‱ Perplexity Pro’s Robots.txt Awareness: When Perplexity could not browse a given link, a member noted that Perplexity respects websites’ robots.txt files, which dictate how web crawlers may interact with a site; that file was blocking access to the given URL (a hypothetical example follows this list).
  • 4o Multi-Step Search Surfaces in Perplexity: Users observed that Perplexity now exhibits multi-step search behavior similar to ChatGPT 4o, conducting searches during the “thinking” process.
    • One user described the new behavior as seems like it’s now searching during thinking??
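A hypothetical robots.txt illustrating the mechanism from the robots.txt item above; these rules are illustrative and would block Perplexity's crawler site-wide while keeping everyone else out of one private path.

```
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
```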

Perplexity AI ▷ #sharing (5 messages):

Food Network Chef, Humor Country, Random Subreddit, DreamOS Manifest

  ‱ Perplexity Trolls User with Chef Death: Perplexity trolled a user with a search result about a beloved Food Network chef’s death.
    ‱ The user replied with a custom emoji and the words Well played, Perplexity.
  • Humorous National Investigation: A user searched for which country uses humor the m- and received a Perplexity AI search result.
    • This implies they are trying to discover which countries are known for using humor.
  • Random Subreddit Quest: A user searched for find a random subreddit with c- and received a Perplexity AI search result.
    • This implies they are trying to discover a random subreddit.
  • DreamOS Manifesto: A user searched for create-dreamos-manifest-in-eng- and received a Perplexity AI search result.
    • This implies they are trying to create a DreamOS manifest in English.

Perplexity AI ▷ #pplx-api (18 messagesđŸ”„):

Empty search results, Domain restriction, Location filter, Reasoning model

  • User struggles with empty search results: A user reported receiving responses even when the search results and citations were empty arrays, despite including instructions to respond with “I could not find an answer in the search results” when no information is found.
    • Another user offered to help by sharing their setup, but they were ultimately unable to resolve the issue.
  • Domain restriction behavior unclear: A user asked if domain restriction checks all domains or stops after hitting a certain number of citations from the first domain.
    • Unfortunately, the other user did not know enough to answer.
  • Location filter fails to restrict results: A user reported that the location filter is not working as expected, providing irrelevant coffee shop recommendations outside the specified SF coordinates.
    • For example, the user filtered for San Francisco, but the responses listed shops in Knoxville, Tennessee, Indiana, or Pittsburgh.
  • Reasoning model fails to list citations and search results: A user observed that the reasoning model provides responses referencing search results, but the citations and search results are not listed.

OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

MiniMax M1, Gemini 2.5 Pro, Flash, and Flash Lite, Reasoning by default

  • MiniMax M1 debuts as longest open reasoning model: The MiniMax M1, the longest-context open-source reasoning model, is now live and features a 25% discount on OpenRouter for launch week.
  • Google’s Gemini 2.5 Reasoning Models Go Live: Gemini 2.5 Pro, Flash, and Flash Lite reasoning models are all live, with the first two considered stable.
    • Gemini 2.5 Pro now requires reasoning.
  • OpenRouter Shifts Towards Reasoning by Default: OpenRouter is moving towards enabling reasoning by default for thinking models like anthropic/claude-3.7-sonnet, a trend also observed in benchmarks to maximize model performance.

OpenRouter (Alex Atallah) ▷ #general (316 messagesđŸ”„đŸ”„):

Gemini 2.5 Pro, Key Credit Balances, AI Discord Bot Template, Minimax Models

  ‱ Gemini 2.5 Pro Errors Plague Users: Users reported receiving Error 400 when using google/gemini-2.5-pro via the API, with the error message “Budget 0 is invalid. This model only works in thinking mode.”, which requires reasoning to be enabled (see the request sketch after this list).
    ‱ A fix was implemented to address issues with the 2.5 Flash preview thinking/non-thinking models.
  • Key Credit Balances requested ASAP: Members requested the ability to assign a specific balance to API keys for better cost control and consistency.
    • This feature would allow users to allocate funds to specific keys and withhold spending beyond the set limit, with one member suggesting it could be managed via middleware.
  • Community offers AI Discord Bot Template: A member shared their AI Discord bot template on GitHub, designed to directly feed announcements and model stats from OpenRouter.
    • The goal is to create a bot that handles new model announcements and links directly to the user, providing more stats about models within Discord itself.
  ‱ Minimax Models: Users identified problems with Minimax models’ token usage, in particular discrepancies in the number of reasoning tokens reported between OpenRouter and Novita.
    ‱ The discrepancy stems from a system prompt being injected.
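For the Budget 0 error in the first item, here is a minimal request sketch assuming OpenRouter's OpenAI-compatible endpoint and its unified reasoning parameter; the token budget shown is illustrative.

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "google/gemini-2.5-pro",
        "messages": [{"role": "user", "content": "Hello"}],
        # Omitting this (an effective budget of 0) is what triggers the
        # Error 400, since the model only runs in thinking mode.
        "reasoning": {"max_tokens": 1024},
    },
)
print(resp.json())
```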

LMArena ▷ #general (309 messagesđŸ”„đŸ”„):

World knowledge in models, Graph AI, LM Arena issues, Gemini 2.5 Flash vs Claude Sonnet 3.7, Leaking system prompts

  • World Knowledge Boosts Model Reasoning: Members discussed how models require unrelated information for reasoning, correlating it with the breadth of data and connections formed, suggesting world knowledge impacts holistic model performance.
    • One user mentioned they hope smaller models in the future will be able to unlock and train much more world knowledge using better methodologies.
  ‱ LiveCodeBench Pro is Contamination Free!: Users are discussing LiveCodeBench Pro, a new benchmark designed to be contamination-free because its problems were published after model release dates.
    • One user mentioned that these problems are IOI problems, which are competitive coding problems that the models haven’t saturated yet.
  • Gemini 2.5 Flash Outperforms Claude Sonnet 3.7: A new benchmark shows Gemini 2.5 Flash scoring higher than Claude Sonnet 3.7 Thinking, almost by 2x in rating.
    • However, a user noted that it doesn’t match real-world experience on GitHub Copilot, mentioning that o3-mini and o4-mini were much worse on Copilot.
  • OpenAI Sam Altman is untrustworthy?: Members debate the trustworthiness of AI leaders, particularly Sam Altman, referencing a Reddit thread which they felt showed “major red flags”.
    ‱ One user suggested that Sam Altman is more of a sociopath, wanting control more than anything, while another mentioned that the board members who fired him from OpenAI described him as psychologically abusive.
  • LM Arena Plagued by Errors and Downtime: Users reported frequent errors and downtime on LM Arena, including Failed to verify your browser errors and Something went wrong with this response bugs.
    • A member of the team mentioned that they are focusing on errors and models not responding, and they’re working hard to create a reliable service.

Unsloth AI (Daniel Han) ▷ #general (193 messagesđŸ”„đŸ”„):

Magistral vision support, Optimizer per-group learn rates, Unsloth dual GPU support, Qwen 0.6b training speed, vLLM quants

  • Magistral Small vision support requested: A user asked about plans to release Magistral + vision support similar to this Hugging Face model with a Q8_XL version, and was informed that it’s already available in Unsloth’s Magistral Small.
  ‱ Multi GPU support coming soon: Multiple users asked how to train on multiple GPUs; it was revealed that Unsloth doesn’t officially support dual GPUs yet, but support is in the works using accelerate.
    • It was also mentioned that mixing different GPUs like a 5090 and 3090 for training isn’t recommended - better to train separately.
  ‱ Perplexity and ChatGPT Research getting the blame: After struggling with the base model and conversational templates, a member jokingly blamed Perplexity and ChatGPT research, whose advice he had trusted, for his struggles.
  • RL Guide just dropped: Unsloth released a new reinforcement learning guide on X.
  • Unsloth asks for vLLM quantization preferences: Daniel Han posted a poll on X asking which vLLM quants should be prioritized.

Unsloth AI (Daniel Han) ▷ #off-topic (17 messagesđŸ”„):

BERT notebook issues, Colab compatibility, No-code AI app development

  • BERT Notebook Faces Colab Glitches: A member planned to tweet about the BERT notebook but encountered an error, after new changes were pushed to make it work on Colab.
    • A different member tested it on a T4 and encountered errors, leading to a GitHub issue being opened.
  • Fix for Colab BERT Compatibility in Question: A member inquired whether a fix worked for Colab, referencing the existing GitHub issue on the notebooks repo.
    • Another member clarified that the described error occurred after an environment variable change, but they expressed they may have explained it poorly.
  • Build AI Apps Without Code: A member built and deployed an MVP app in 45 mins using only AI, without coding.
    • They created a free guide showing the tools and processes used, for those with ideas but “can’t code”.

Unsloth AI (Daniel Han) ▷ #help (58 messagesđŸ”„đŸ”„):

Troubleshooting Installation Errors, Gemma support in Unsloth, Orpheus TTS Model, Qwen2.5-VL image/text mismatch error, Llama3 local model path error

  ‱ Update Transformers to Fix Installation Errors: Users encountered errors with transformers and fixed them by updating the transformers library, but then ran into more errors using llava-v1.6-7b.
    • It was asked if the user used a custom model, or an Unsloth model.
  • Gemma Support Arriving in Unsloth: Gemma3 support for both float16 and bfloat16, language and vision, is coming soon.
    • The link to the announcement can be found here.
  • Orpheus TTS Backbone Change: A user asked if anyone has tried changing the backbone of the Orpheus TTS model to improve fine-tuning performance for a new language.
    • Another user responded that Orpheus is probably the best result you’re gonna get for multi language.
  • Debugging Image features and image tokens do not match Error: A user encountered an Image features and image tokens do not match: tokens: 3995, features 3996 error when fine-tuning Qwen2.5-VL with Unsloth.
    ‱ The suggested resolution was that resizing the image may help; however, that doesn’t fix bounding boxes being out of place, so the correct solution to this error remains to be seen.
  ‱ Unsloth eats Xformers Exceptions if you have Flash Attention: A user found that the code swallows the exception if you have flash-attn >=2.7.1,<=2.7.4, and suggested that a warning message should be added. Relevant code

Unsloth AI (Daniel Han) ▷ #research (5 messages):

New Arxiv Papers, Gemini v2.5

  • New Arxiv papers hit the scene: Members shared two links to new arXiv papers: [2505.24034] Paper 1 and [2506.08872] Paper 2.
    • A member suggested that the papers are interesting and need to be explored more.
  • Google Deepmind drops Gemini v2.5 Report: A member shared the Gemini v2.5 Report from Google Deepmind.
    • They encouraged others to Read the TLDR, stating It’s really interesting indeed.

Eleuther ▷ #general (82 messagesđŸ”„đŸ”„):

AI voice cloning scams, EleutherAI name change, Gemini Diffusion, Thue Morse sequence

  • AI Voice Cloning Scams Spark Debate: Members discussed AI voice cloning scams, with one arguing that these scams can be good for society because they help raise awareness.
    • Another member strongly disagreed, leading to further clarification that the intent was to highlight the philosophical perspective that scammers are inevitable and society should focus on broader AI safety measures.
  • EleutherAI’s Name Sparks Rebrand?: A member suggested that EleutherAI should change its name to avoid being too prominent in LLMs’ weights.
    • Another suggested that a better landing zone with a structured introduction would be more valuable, especially for newcomers, which led to the idea of a training program on the scientific method.
  • Gemini Diffusion Early Access Impressions: A user with early access to Gemini Diffusion offered to take requests for random tests, noting that there is no API.
    ‱ When asked to generate the Thue Morse sequence, a member reported it was correct to the sixth iteration but exhibited glitch-loops at the seventh (a quick reference implementation follows this list).
  • Discord Demands Discussions, Not Dispatches: In a discussion about presenting ideas on Discord, a member emphasized the importance of discussion and inviting engagement rather than sending a big wall of text.
    • It was noted that you need to invite people for a discussion and keep them engaged with you by first talking on their front, speaking their jargon and eventually moving on to what you want to convey.
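For reference on the Thue Morse test above, the sequence can be built by iterated bitwise complement; "correct to the sixth iteration" corresponds to the first 64 symbols.

```python
def thue_morse(iterations: int) -> str:
    s = "0"
    for _ in range(iterations):
        # Each iteration appends the bitwise complement of the current prefix.
        s += "".join("1" if ch == "0" else "0" for ch in s)
    return s

print(thue_morse(6))  # 64 symbols, starting 0110100110010110...
```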

Eleuther ▷ #research (170 messagesđŸ”„đŸ”„):

Spline Theory, Positional Encoding Ablations, Linear Attention vs Cosine Similarity, LLM Image Generation with RL, Byte Tokenization

  • Randall’s Spline Theory legitimized: A member defends Randall’s Spline Theory as legitimate, referencing an interview and questioning the necessity of positional encoding.
    • They argue that positional information is primarily supplied by the V (spatial_proj) lower diagonal learned matrix, limited by the top-k context selected by the ranker.
  • No Positional Encoding? gMLP did it.: The team initially planned for positional encoding ablation but removed it, noting that gMLP also claims to not need positional encodings.
    • They experimented with positional encodings internally, but it consistently worsened results; however, they acknowledge the V matrix can only add position within chunks, prompting further consideration.
  • Linear Attention Snuck In: A member points out that the model does use linear self-attention in the contextualizer, despite claims of being attention-free.
    • This sparks a debate over the definition of ‘attention,’ with one party suggesting attention = softmax qkv attention to differentiate their model.
  • LLM Image Gen GAN: A member proposes using RL on an LLM with basic image generation to reconstruct the prompt for the generated image, akin to a GAN.
    • They suggest an auxiliary loss for the MSE of the original image to prevent reward hacking, with another member suggesting the Dinov2/Clip features similarity loss.
  • Qwen 4B Gets Byted: A member reports good results with Qwen 4B using pure byte tokenization and simple Fuyu-style image patch direct-projection after seeing 98M tokens.
    • This is for captioning and image understanding, not generation.

Eleuther ▷ #lm-thunderdome (4 messages):

lm_eval steer_path, lm_eval multiple runs

  • lm_eval has steer_path argument, but is it needed?: A member inquired about the steer_path argument in lm_eval and whether there are issues with using register_forward_hook or register_forward_pre_hook that necessitate its use, detailing a plan to load a custom model and wrap it with the HFLM class.
  ‱ lm_eval needs multiple runs?: A member asked if lm_eval has a standard way of running a model through a task multiple times to calculate the std and other statistics of its performance, for better evaluation given the non-deterministic behavior introduced by temperature and top_p.
    ‱ The member considered the repeat parameter but felt it wasn’t quite for their intended purpose, and another member suggested using a for-loop (sketched below this list).
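A minimal sketch of the for-loop suggestion, assuming lm-eval's Python simple_evaluate API; the model, task, metric key, and seed handling are illustrative.

```python
import numpy as np
from lm_eval import simple_evaluate

scores = []
for seed in range(5):
    results = simple_evaluate(
        model="hf",
        model_args="pretrained=gpt2",
        tasks=["hellaswag"],
        random_seed=seed,  # vary the seed so sampled runs actually differ
    )
    scores.append(results["results"]["hellaswag"]["acc,none"])

print(f"mean={np.mean(scores):.4f}  std={np.std(scores):.4f}")
```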

LM Studio ▷ #general (146 messagesđŸ”„đŸ”„):

Tool Calling in LM Studio, Open-Source Models Stagnation, LM Studio Feature Requests, MCP Server Connectivity, Quantized DeepSeek Models

  ‱ Tool Calling Triumph: LM Studio’s API Access!: While LM Studio doesn’t have built-in tools, tool calling is supported through its API, allowing models to utilize tools if the application provides the necessary environment (see the sketch after this list).
  • Open-Source Ocean Stagnates: Is Innovation Aground?: A user suggested that open-source models have been stagnating, lacking native support for audio/video/image processing and built-in internet access.
    • Another user countered that there are many vision models as small as 0.2B and LM Studio beta has MCP support that lets you connect thousands of tools and services including web browsing.
  • ‘Continue Generating’ Craving: LM Studio Feature Wishlist!: A user requested a ‘continue generating’ button in LM Studio to allow models to run continuously without manual re-prompting.
    • Another user suggested using the API or an auto clicker script as a workaround.
  • DeepSeek Dreams Dashed: 70B on a 3060?: A user hoped to run a 70B DeepSeek model on a 3060 12GB GPU using quantization, but was told that a 14B model is more reasonable.
    • A user suggested that with a second 12 GB card, one could run one of the tiny quants of this model and consider trying an appropriately sized phi-4, qwen 3 or gemma.
  • Low-Bit Bonanza: BitNet’s Edge?: Users discussed the practicality of BitNet, with one noting it’s useful on edge devices like Raspberry Pi.
    • It was mentioned that you can check out the code on GitHub and that extremely low bit models have to make up for the loss of diversity of a float with sheer parameter numbers.
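For the tool-calling item at the top of this list, a minimal sketch assuming LM Studio's OpenAI-compatible local server (default http://localhost:1234/v1); the model name and tool are illustrative, and the application itself must execute the tool and return its result.

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key is unused but required.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # whichever model is loaded locally
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# Only the tool call comes back; the application must run the tool and
# send the result in a follow-up message.
print(resp.choices[0].message.tool_calls)
```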

LM Studio ▷ #hardware-discussion (57 messagesđŸ”„đŸ”„):

DDR5 vs DDR6, NVLink Speed Delta, KV Cache Offload

  • DDR5 Cheaper than DDR6: Members are waiting until DDR5 server setups become cheaper, or considering Intel’s 52-core Nova Lake if it offers enough PCIE lanes.
    • They predict Intel will likely make a move to regain ground in the consumer CPU market.
  ‱ 4090 Superiority Evident in Token Processing: An experiment compared token processing speeds on an RTX 3090 and an RTX 4090, revealing that the 4090 was significantly faster, as expected from its higher compute throughput.
    ‱ Members noted that despite the 4090’s advantage, overall token speed doesn’t increase much, suggesting the primary bottleneck is RAM bandwidth.
  • Quantization Level Impacts Token Speed: Experimentation with cache levels showed that changing from FP16 to Q8 doubled token speed (TS).
    • Discussion arose about further increasing speed by changing to Q4, with a question on how much quality degradation it would cause.
  • NVLink Doesn’t Matter for Inference: A member inquired about the speed delta of using NVLinked 3090s versus splitting a model across GPUs without NVLink.
    • Another member cited browsing r/localllama indicating NVLink isn’t worth it and that inference typically entails little inter-GPU communication.

Nous Research AI ▷ #general (107 messagesđŸ”„đŸ”„):

CGS-GAN latent space, Google Gemini Canvas, LLMs compress information, Gemini is multimodal, Gemini Generated Native Images

  • CGS-GAN Latent Space Explored: A member highlighted the CGS-GAN latent space and found it reminded them of the StyleGAN3 visualiser, which was a lot of fun.
    • They also shared a realtime video demonstrating the fun new latent space.
  ‱ Google Gemini Canvas Integrates Gemini: Google added Gemini integration to their Canvas (basically artifacts, with the coding ones called “immersives”).
    • The member created a static artifact with how Gemini perceives some of the concepts and shared a link for others to explore the concepts and generate images.
  ‱ LLMs are information compressors: It was stated that the whole point of an LLM is to compress information.
    ‱ This applies directly to tokens in and out due to conservation laws, and there is some quantifiable entropy describing the behavior of the computation.
  • Gemini is a multimodal world model: Gemini is a world model, omnimodal in, and then decodes with different decoders to generate different representations, the member explained.
    • The member stated that the 0.5 series are the omnimodal and the .0 series are the original architecture.
  • Spooky Code spews user chat messages: A member noticed the code spewing in the browser and wondered if this was training data leaking, and if the code was emitting chat messages from other users.

Nous Research AI ▷ #ask-about-llms (2 messages):

Training LLMs, Evil Behavior Fine-Tuning, Response Filtering, Model Comparison

  • LLMs Train Same but Tune Opposite: A member proposed training two LLMs on the same data, one fine-tuned for evil behavior and the other for useful research questions on safety.
    • The goal is to compare how likely it is to prevent negative responses using a 3000-character filter with extensive compute.
  • Tweaking Model Selection for Deception: The member suggested tweaking model selection to emphasize deception and evil traits in one model.
    • Also proposed filtering for specific technologies the model tries to give, citing this tweet.
  • Response Tree Generation: The member suggested selecting responses with the highest chance of appearing in the evil model and generating loads of response trees.
    • This approach aims to understand and potentially mitigate the emergence of harmful behaviors.

Nous Research AI ▷ #research-papers (8 messagesđŸ”„):

Meta Papers, Llama Team, Zuckerberg's vision, Generalist world agent

  ‱ Meta drops hot đŸ”„ new Papers: A member shared links to three new papers from Meta’s research team (1, 2, 3).
    ‱ One member characterized them as absolute goldmines.
  ‱ Zuck considers Merging teams: A member speculated that Zuckerberg is attempting to merge Meta’s research and Llama teams, keeping vision thought leadership with Yann and co while he perhaps moves to the language side to build out policy optimization for industry use cases, given Scale’s focus on capturing the processes that agents would follow and operationalizing them.
    ‱ The member also thinks Meta has a pretty good shot at a generalist world agent that could eventually go into robots, computers, or neural interfaces.

Bigger Brains, Brain Implants

  ‱ Bigger Brains Bring Bigger Heads?: A member shared a YouTube video pondering the question of what would happen if we had bigger brains.
  ‱ Brain Implants: Someone else was pondering brain implants.
    ‱ They did not provide a link or further details.

HuggingFace ▷ #announcements (1 messages):

Gradio MCP hackathon, SmolVLA Model, HF MCP server, HF Sheets, Google Colab x HF integration

  ‱ Gradio Hackathon is the Grandest!: The Gradio MCP hackathon has become the largest AI dev event of 2025, with 2500+ registered and $700,000 in sponsorships (Gradio Tweet).
    ‱ Participants are excited about the event.
  ‱ Small Vision Model Shows Valor!: SmolVLA, an efficient Vision-Language-Action model, was trained on LeRobot community data (HF Blogpost).
  ‱ HF Sheets is Spreadin’ the News!: HF Sheets brings Excel-meets-AI workflows plus unstructured data support (HF Blogpost).
  ‱ Colab Collabs with HF!: Google Colab now integrates with HF, allowing users to try out AI models from HF directly in free Colab notebooks (Google Colab Blogpost).
  ‱ Structures and Agents Get Executed!: CodeAgents + Structure: A Better Way to Execute Actions (HF Blogpost).

HuggingFace ▷ #general (81 messagesđŸ”„đŸ”„):

OS Agent, Codex Competition, Fine-tuning Mistral 7B, MLOps for Internships, Hugging Face Chat UI

  ‱ Coding Agent Fixed and Ready: A member stated that they fixed their OS agent: now it’s a coding agent and better than Codex!!!
    • They still need to fine tune for this new agent architecture and adjust small things to polish it, but the master agent can summon mini agents for task distribution and discussion.
  • Internship Advice: MLOps is the Basics: A member asked for internship advice, mentioning they’ve only landed one interview out of 20 applications despite having a chatbot that takes PDF documents as input and uses RAG, Langchain, and FlanT5.
    • Another member responded that mlops is the basics at this point and to check out GitLab to see how they do it.
  • Evaristo AI Showcases Portuguese Hugging Chat: A member shared their profile on Hugging Face and a Portuguese version of Hugging Chat.
    ‱ Someone noted that the code uses the Hugging Chat UI but does not monetize; Hugging Face, however, has its own limits on the number of prompts.
  ‱ PCIE 6.0 SSDs Delayed Until 2030: A member shared an article from Tom’s Hardware stating that PCIe 6.0 SSDs for PCs won’t arrive until 2030 due to costs and complexity.
    ‱ Another member replied: What a disappointment. Moore’s law seems to be significantly slowing down instead of speeding up in AI TIMES.
  • HF Fine-Tune Hackathon: Please UPVOTE!: A member asked for upvotes on a discussion for a Fine-tune hackathon from Hugging Face, at this link.
    • Another member replied that you can actually just run a Hugging Face competition yourself using this github repo.

HuggingFace ▷ #today-im-learning (6 messages):

HF AI Agents Course, Zig-Zag Ring Attention

  • HF AI Agents Course back in Session: A member is back to the HF AI Agents fundamental course, after pausing it at 80%.
    • They are working on a chatbot project that uses generative AI to answer questions based on a file or text.
  • Zig-Zag Ring Attention sparks interest: A member shared that they are learning about Zig-Zag Ring Attention.
    • No further details were given.

HuggingFace ▷ #i-made-this (3 messages):

Chromium extension for speaking to readmes, Synthetic Data Generation for LLM Safeguards, memX: Shared memory layer for multi-agent LLM systems

  • Extension Lets You Chat with GitHub Docs: A member created a Chromium extension that allows users to speak to any readme, file, or wiki page and get instant answers directly on GitHub, available on the Chrome Web Store.
    • The extension aims to improve accessibility and provide quick information retrieval from documentation.
  • Safeguarding LLMs via Synthetic Data Generation: A member detailed their first attempt at Synthetic Data Generation for LLM Safeguards in a blog post.
    • The process is documented here and a corresponding dataset was created and uploaded to Hugging Face here.
  • memX Launches as Shared Brain for LLMs: A member launched memX, a shared memory layer for multi-agent LLM systems, enabling agents to read and write to evolving context like a brain instead of passing messages.
    • Key features include real-time pub/sub, JSON schema validation, API-key ACLs, and a Python SDK, with the code available on GitHub and a demo on X/Twitter.

HuggingFace ▷ #computer-vision (3 messages):

VSR Datasets, Attention Maps Visualization

  • Members Seek Public VSR Datasets on HF: A member asked if anyone knew of a good Video Super-Resolution (VSR) dataset publicly available on the Hugging Face Hub.
  • Guidance Requested on Visualizing Attention Maps with VLLMs: A member inquired about visualizing attention maps with a Visual Large Language Model, such as LLaVA.

HuggingFace ▷ #NLP (1 messages):

LLM Project Architecture, RAG Project Architecture, LLM API Design

  ‱ Request for LLM and RAG Project Architecture Insights: A member asked for study material on default project architecture for LLM and RAG projects that expose an API.
  • Resources for LLM and RAG Architectures Needed: The request focused on identifying existing study materials and best practices in structuring LLM and RAG projects for API accessibility.

HuggingFace ▷ #gradio-announcements (5 messages):

Gradio Agents & MCP Hackathon Winners, Modal Labs sponsors award, MCP Server Track, Custom Component Track, Agentic App Track

  • Gradio Agents & MCP Hackathon winners revealed!: The winners of the Gradio Agents & MCP Hackathon are announced after reviewing 630+ submissions, with cash prizes for each track.
  • Modal Labs drops $5,000!: ShallowCodeResearch wins the biggest single prize of the entire hackathon, sponsored by Modal Labs for utilizing infinite serverless compute.
    • The winning submission can be tried here.
  • Community Chooses Consilium MCP!: Consilium MCP wins the Community Choice Award with a $500 cash prize.

HuggingFace ▷ #agents-course (6 messages):

Ollama explanation, Final Assessment Template

  ‱ Ollama demystified: run local LLMs: Ollama lets you run AI models like Llama and Mistral on your own computer instead of relying on online services like ChatGPT, by downloading the model files locally with ollama pull.
    ‱ When you run ollama run, it starts a local server on port 11434, allowing agents to interact with the local models (see the sketch after this list); see this video series to learn more.
  • Clarify the Final Assessment Template’s Steps: A user seeks clarification on the steps for Unit 4’s Final Assessment Template including cloning the Final Assessment Template, modifying app.py and requirements.txt, running evaluation, and iterating to achieve a 30% score.
    • They were seeking confirmation that they’d receive a certificate upon achieving that score.
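A minimal sketch of talking to that local server, assuming a model already pulled with ollama pull llama3; the prompt is illustrative.

```python
import requests

# Ollama listens on port 11434 by default once `ollama run` has started.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])
```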

aider (Paul Gauthier) ▷ #general (83 messagesđŸ”„đŸ”„):

Gemini 2.5, Claude pricing, Aider and Gemini 2.5-pro, Open3 settings

  • Gemini 2.5 Benched, Pricing Still Inconsistent: Members discussed that Gemini 2.5 pricing on Aider may be inaccurate, suggesting the real cost could be up to 5x higher than estimated, which some members found to align with their experience.
    • One member spent $3 creating a 5K LoC project with over 200 commits with Gemini.
  ‱ Claude-4-Opus Costs Considered Too High: Claude-4-Opus has input/output prices of $15/$75 per million tokens, making it up to 7.5x more expensive depending on the blend, which led to discussions around token usage and cost-effectiveness compared to Gemini.
    • A member pointed out that Gemini generates considerably more reasoning tokens than Opus.
  • Latest Gemini 2.5-pro Model Supported by Aider: Users confirmed that Aider now supports the latest Gemini 2.5-pro model, clarifying that specifying the new model in Aider settings will work, although it may show a warning about unknown context window size and costs.
    • Members noted that it uses sane defaults instead of specific context window size and costs.
  • Configure Aider Correctly for best results with Gemini: To avoid issues with context size and edit-mode, users recommended adding the correct settings to .aider.model.settings.yml and .aider.model.metadata.json or using the command aider --model gemini/gemini-2.5-pro-preview-06-05 --thinking-tokens 32k --edit-format diff-fenced.
    ‱ Specifically, adding this to .aider.model.settings.yml will avoid warnings:

```yaml
- name: gemini/gemini-2.5-pro-preview-06-05
  accepts_settings: ["thinking_tokens"]
```
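And a hypothetical .aider.model.metadata.json entry (aider follows litellm's metadata format) to silence the context-window and cost warnings; the numbers below are illustrative placeholders, not confirmed values for this preview model.

```json
{
  "gemini/gemini-2.5-pro-preview-06-05": {
    "max_input_tokens": 1048576,
    "max_output_tokens": 65536,
    "input_cost_per_token": 0.00000125,
    "output_cost_per_token": 0.00001
  }
}
```
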
aider (Paul Gauthier) ▷ #questions-and-tips (9 messagesđŸ”„):

Aider --restore-chat-history, Openrouter w/ Aider, Gemini copy-paste mode, Aider /read-only

  ‱ Aider’s history gets restored: A user asked if there’s a way to continue the previous session after a crash, and another member suggested the --restore-chat-history flag.
    ‱ A user then admitted: Sorry, I hallucinated: --restore-chat-history.
  ‱ Gemini enters copy-paste: Someone wants to try Gemini in copy-paste mode and asked which editor model gives the best quality.
    ‱ No recommendations were given.
  ‱ Aider’s /read-only prevents modification: A user asked if /read-only is supposed to prevent a file from being modified in Aider.
    ‱ A member confirmed it should prevent modification.
  ‱ Openrouter gets Budget 0 Error: A member asked about getting a Budget 0 is invalid error from OpenRouter with Aider.
    ‱ They clarified this model only works in thinking mode, but no solution was provided.

### **Latent Space ▷ #[ai-general-chat](https://discord.com/channels/822583790773862470/1075282825051385876/1384612594052235456)** (79 messagesđŸ”„đŸ”„): 

> `Midjourney Video Model V1, Krea AI, OpenHands CLI, Essential AI, AI and Proprietary Data` 


- **Midjourney Launches Video Model V1**: [Midjourney](https://xcancel.com/midjourney/status/1935377193733079452) has released **Version 1 of its Video Model**, allowing users to animate Midjourney-generated or external images at approximately **one image cost per second of video**.
   - The new **'Image-to-Video' feature** offers 'automatic' and 'manual' animation settings with 'high motion' and 'low motion' options, though video generation is web-only at launch.
- **KREA AI Releases Krea 1 Public Beta**: **KREA AI** has released **Krea 1** in public beta, aiming to provide superior aesthetic control and image quality, generating detailed textures, dramatic angles, and cinematic lighting, and moving away from the typical 'AI look'.
   - The new model supports various styles, style references, and custom trainings, and is available for free at [krea.ai/krea-1](https://xcancel.com/krea_ai/status/1934981993722466454?s=46).
- **OpenHands CLI Launched for Coding Agents**: **All Hands AI** introduces the **OpenHands CLI**, a new command-line interface for their coding agent offering top accuracy and simplifying installation by removing the need for Docker.
   - The CLI maintains the same accuracy as its previous Docker-based version, though without the web browser component, and offers slash commands and a command confirmation mode.
- **Essential AI Releases Massive 24T Token Dataset**: **Essential AI** has announced **Essential-Web v1.0**, a **24-trillion-token** pre-training dataset with detailed metadata designed for creating high-performing datasets.
   - Evaluations using domain-specific subsets of **Essential-Web v1.0** demonstrate improved performance in various fields like web code, STEM, and medical applications.
- **CoreWeave and Weights & Biases Launch New AI Inference Services**: **CoreWeave** and **Weights & Biases** are launching new AI inference services and Online Evaluation tools (monitors) for real-time LLM judgment.
   - These services, running on **CoreWeave GPUs**, include an inference endpoint for models like DeepSeek R1-0528 and LLama-4 Scout with OAI Compatible APIs, aiming to offer more competition and flexibility in the AI infrastructure space.
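As a rough sketch of what "OAI Compatible" means in practice, the standard OpenAI client can simply be pointed at a third-party base URL; the URL, key, and model name below are placeholders, not CoreWeave's actual endpoint details:

```python
# Generic sketch of calling an OpenAI-compatible inference endpoint.
# base_url, api_key, and model are placeholders, not real endpoint details.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.com/v1",  # provider's OAI-compatible URL
    api_key="YOUR_API_KEY",
)
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",  # illustrative model identifier
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```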


  

---


### **Latent Space ▷ #[ai-announcements](https://discord.com/channels/822583790773862470/1075282504648511499/1384675582884708402)** (5 messages): 

> `Andrej Karpathy's AI Talk, Software 3.0, LLM analogies, LLM Psychology, Partial Autonomy` 


- ****Latent Space** Reconstructs **Karpathy's** AI Talk**: **Latent Space** released a reconstructed version of **Andrej Karpathy's AI talk** with compiled slides and notes, given how quickly AI talks lose their currency, covering topics like **Software 3.0** and **LLM analogies**.
   - The content also touches on **LLM Psychology**, **Partial Autonomy**, **Vibe Coding**, and building for agents, with full slides available to **Latent Space** subscribers, linked [here](https://www.donnamagi.com/articles/karpathy-yc-talk).
- ****Karpathy's** Talk Covers Key Areas**: The talk encompasses areas such as **Software 3.0**, **LLM analogies** (Utilities, Fabs, OSes), **LLM Psychology**, and **Partial Autonomy** (including Human-AI Generation-Verification Loop).
   - It further explores **Vibe Coding** and building for agents, providing a comprehensive overview of current AI development concepts.


  

---


### **Modular (Mojo đŸ”„) ▷ #[general](https://discord.com/channels/1087530497313357884/1098713601386233997/1384818599302135808)** (4 messages): 

> `IR remapper in Mojo, Mojo bare metal kernel, Modular GitHub issue 4854, Mojo shirt without X` 


- **Mojo remapper and OS kernel arise**: One member plans to build an **IR remapper** and write an **OS bare metal kernel** in **Mojo**.
   - They linked to [Modular GitHub issue 4854](https://github.com/modular/modular/issues/4854) and exclaimed *"God help me persist. Mojo is lit"*.
- **Mojo merch and X requirement**: A member asked if there was any way to get a **Mojo shirt** without posting on **X**.
   - They mentioned sharing on **LinkedIn** instead and appreciating the **comic**.


  

---


### **Modular (Mojo đŸ”„) ▷ #[announcements](https://discord.com/channels/1087530497313357884/1098765954302873621/1384930608400171119)** (1 messages): 

> `AMD Partnership, GPU Support, Model Support, Open Source Code, Modular Hack Weekend` 


- **Modular Achieves GPU Agnostic Nirvana**: Modular Platform **25.4** introduces the ability to run the same code on both **AMD** and **NVIDIA** GPUs with zero code changes, marking the beginning of a partnership with **AMD**.
   - This release supports **AMD Instinctℱ MI300X** and **MI325X GPUs**, enabling deployment across different hardware without vendor lock-in or extra configuration.
- **25.4 Turbocharges LLM Workloads**: Version **25.4** delivers up to **53%** better throughput on prefill-heavy BF16 workloads across state-of-the-art language models like **Llama 3.1**, **Gemma 3**, and **Mistral**.
   - This version expands hardware support to include **AMD MI300/325**, **NVIDIA Blackwell**, **RTX 40xx**, and **RDNA3**, broadening the platform's compatibility.
- **Modular Opens Kernel Code Vault**: Modular has open-sourced over **450k lines** of production-grade **Mojo** kernel code, enhancing transparency and community contribution.
   - The release also includes improved documentation, **PyTorch** ops tutorials, and kernel performance tools, fostering easier adoption and development.
- **Modular Hosts Hack Weekend with GPU Giveaways**: Modular is hosting a **Hack Weekend** on **June 27th** featuring a GPU Programming Workshop and a stacked GPU prize pool, inviting developers to participate virtually or in person via this [link](https://lu.ma/modular-gpu-workshop).
   - You can get the full story on the [Modular blog](https://www.modular.com/blog/modular-25-4-one-container-amd-and-nvidia-gpus-no-lock-in?utm_source=discord&utm_campaign=25_4).


  

---


### **Modular (Mojo đŸ”„) ▷ #[mojo](https://discord.com/channels/1087530497313357884/1151418092052815884/1384688130149584939)** (72 messagesđŸ”„đŸ”„): 

> `Blackwell accelerators, Mojo bare metal, Kernel Development with Mojo, Mojo Async, Mojo Parametric Traits` 


- ****Blackwell Buyers Bummed by Buggy Business****: Early adopters of **Blackwell accelerators** are finding limited support and buggy drivers, leading to an unreliable and painful experience, despite [MAX supporting Blackwell](https://www.modular.com/max).
   - One user pointed out that **AMD** and **Intel** are launching GPUs with the same or more VRAM at half the price and even the cheaper **4090** might be a better choice.
- ****Mojo Goes Bare Metal, Boots Runtime Dependencies****: Members are excited that **Mojo** can now run bare metal with no system calls or runtime dependencies, illustrated by an [attached image](https://cdn.discordapp.com/attachments/1151418092052815884/1384859713388417126/image.png?ex=68549f5d&is=68534ddd&hm=d1af4a129aa4c5bd2cd0ebd8a5db27931f7454475f043fdf816af81fad245892&).
   - After simple function replacements like `KGEN_EE_JIT_GlobalConstructor` and `KGEN_EE_JIT_GlobalDestructor`, it can be used as *a modern systems programming language with zero-overhead abstractions* for kernel development.
- ****Kernel Konundrums: Mojo's Memory Management Marvels****: Discussed using Mojo for kernel development, highlighting its memory management capabilities, where *the compiler* can be trusted *on efficient and safe memory management*.
   - A developer noted the desire for *a clear strategy with error propagation/handling* to make the language golden, suggesting that manual async via FFI is already feasible.
- ****Mojo's Roadmap Riddles: Async and Traits Take Top Tier****: A member shared the top priorities for Mojo's development, including **Parametric Traits**, a big rework of **async**, and **dyn Traits** for automated vtable handling.
   - It was also noted that error handling might move to checked exceptions in the future.
- ****MAXimum Blackwell Boost: Modular's Secret Support****: Despite not being widely advertised, **MAX** supports **Blackwell** systems like the **5090**, with ongoing performance work before an official announcement.
   - Early testers are encouraged to kick the tires and report issues.


  

---


### **Modular (Mojo đŸ”„) ▷ #[max](https://discord.com/channels/1087530497313357884/1212827597323509870/1384731160801841193)** (3 messages): 

> `Max Inference, Max and Python, Orchestration of Model Graphs` 


- **Max Inference only controllable via Python Extension?**: A user inquired whether **Max** is only controllable via **Python extension**, to start an inference session and feed it numpy data.
   - Another user noted that the Mojo APIs have been removed for now.
- **Python Primary Interface to Graph Compiler**: It was clarified that the primary interface to the **graph compiler** (what MAX models are built with) is currently through **Python**.
   - A post was shared about why Python for orchestration of model graphs: [Mojo MAX Bindings](https://forum.modular.com/t/mojo-max-bindings/1499/3), and it was stated that *Python makes it really easy to integrate into existing tokenizers, processing logic, etc.*
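For flavor, a sketch of that Python-driven flow, modeled on earlier `max.engine` examples from Modular's docs; treat the module path, method names, and input naming as assumptions, since the API has been evolving across releases:

```python
# Hedged sketch of a MAX inference session fed with numpy data; the
# module path and signatures follow older Modular docs and may have
# changed in current releases.
import numpy as np
from max import engine

session = engine.InferenceSession()
model = session.load("path/to/exported_model")  # placeholder model path
# The keyword name must match the model's declared input; "input" is a stand-in.
outputs = model.execute(input=np.zeros((1, 3, 224, 224), dtype=np.float32))
print(outputs)
```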


  

---


### **Manus.im Discord ▷ #[general](https://discord.com/channels/1348819876348825620/1349440650495398020/1384610890988392498)** (74 messagesđŸ”„đŸ”„): 

> `Web page generation credit costs, Website with scraping Facebook, Gumtree and Ebay, Manus Rival MiniMax AI with Agent Mode, Manus Video Generation with VEO, Manus Credit Refunds for Errors` 


- **High Traffic Thwarts Web Page Generation, Drains Credits**: Users reported that generating a simple web page consumed significant credits due to errors, with one user resorting to manual file edits.
   - One user noted that this was happening *after noon* and suggested that there may be different traffic at different times of the day.
- **Facebook, Gumtree, Ebay Scraping Website Botched**: A user spent **5k credits** attempting to create a website that scrapes **Facebook, Gumtree**, and **eBay** for stolen bike listings, but the AI failed, delivering fake results.
   - The user received a **2.5k credit refund**, but noted that it was *a waste of time and credits*.
- **MiniMax AI launches Agent Mode, Challenges Manus**: Users discussed **MiniMax AI's** new agent mode, viewing it as competition for Manus.
   - Some expressed concerns about **MiniMax's** credit system and subscription model while praising Manus; others believe *competition will only bring the users more options on which AI to choose*.
- **Call for Credit Refunds on AI Task Errors**: Users debated whether **Manus** should refund credits when it encounters errors and has to re-run a process, or only charge upon successful task completion.
   - One user compared it to paying for a *burger that has rotten meat and a stale bun*, while another said *AI is programmed/trained to do much more than one human could tell it to do in his whole life*.
- **Manus Gains Power with Free YouTube Boost**: A user posted a [YouTube link](https://m.youtube.com/watch?v=5PuofaVqXNI) saying that **Manus** can *grow even more powerful. Free of charge*.


  

---


### **LlamaIndex ▷ #[blog](https://discord.com/channels/1059199217496772688/1187460979064324127/1384633968153858049)** (3 messages): 

> `Model Context Protocol, AG-UI integration with CopilotKit, MCP vs Vector Search` 


- **Block designs MCP servers to assist Claude**: Block's engineering team shares their systematic approach to creating **Model Context Protocol (MCP) servers** that integrate seamlessly with Claude and other AI systems, starting with a clear [design](https://t.co/0vJajYzrfJ).
   - Their design patterns help build better **AI assistants**.
- **AG-UI integration with CopilotKit launches**: LlamaIndex launched official support for **AG-UI**, making it incredibly simple to bring your agents from the backend directly into user-facing applications to [create agent-powered frontends with zero boilerplate](https://t.co/hzxBrXKyTv).
   - The AG-UI integration is with **CopilotKit**.
- **MCP doesn't kill the need for vector search**: The **MCP** protocol creates new possibilities for agents to connect directly to data sources, but preprocessing and indexing are still needed for unstructured data like [PDFs and PPTs](https://t.co/OpUegepTAF).
   - It's estimated that *90% of enterprise data lives in PDFs, PPTs* and other similar formats.


  

---


### **LlamaIndex ▷ #[general](https://discord.com/channels/1059199217496772688/1059201661417037995/1384889675646238812)** (70 messagesđŸ”„đŸ”„): 

> `FastAPI streaming issues, Agent workflow with streaming, Metadata filtering in chat/query engines, Response synthesizer refinement, Anthropic's AI Agent Frameworks` 


- **FastAPI Streaming Plagued by Pauses**: A user encountered **20+ second delays** in streaming events to the frontend using FastAPI, even though the backend processing had already completed.
   - Adding newlines when yielding (`yield json.dumps({..}) + "\n\n"`) was suggested to delimit chunks, but the actual fix was yielding only non-empty deltas (`if ev.delta: yield ...`), as in the sketches below.
- **Agent Tool Call Timeouts with FastAPI**: A user debugging an agent workflow that streams events experienced delays with tool calls, and was advised to use the following code:
   - ```python
# from llama_index.core.agent.workflow import AgentStream, ToolCall, ToolCallResult
async for ev in handler.stream_events():
  if isinstance(ev, AgentStream):
    if ev.delta:
      print(ev.delta, end="", flush=True)
    elif ev.tool_calls:  # optionally print tool calls as they stream
      print(ev.tool_calls)
  elif isinstance(ev, ToolCall):  # optionally print the complete tool call input
    print(ev.tool_name, ev.tool_kwargs)
  elif isinstance(ev, ToolCallResult):  # optionally print the complete tool call output
    print(ev.tool_name, ev.tool_kwargs, ev.tool_output)
```
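   - As a follow-up illustration, a minimal FastAPI wiring of that pattern; the agent setup and route here are illustrative assumptions, not the user's actual app:
```python
# Sketch under assumptions: serve only non-empty deltas as
# newline-delimited JSON; the FunctionAgent below is a stand-in.
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from llama_index.core.agent.workflow import AgentStream, FunctionAgent
from llama_index.llms.openai import OpenAI

agent = FunctionAgent(tools=[], llm=OpenAI(model="gpt-4o-mini"))
app = FastAPI()

@app.get("/chat")
async def chat(q: str):
    handler = agent.run(user_msg=q)

    async def gen():
        async for ev in handler.stream_events():
            if isinstance(ev, AgentStream) and ev.delta:  # skip empty deltas
                yield json.dumps({"delta": ev.delta}) + "\n\n"

    return StreamingResponse(gen(), media_type="application/x-ndjson")
```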
- **Metadata Filtering on Chat/Query Engine Unleashed**: A user sought to apply metadata filtering on a chat/query engine, and it was suggested to pass a filtered retriever to the chat engine, which works for engines like Condense_plus_context (see the sketch after this list).
   - They expressed immense gratitude for this solution.
- **Response Synthesizer's Refinement Routines**: A user asked whether multiple LLM passes to refine output, e.g. via `response_synthesizer = get_response_synthesizer(response_mode="tree_summarize")`, is the universal standard for RAG.
   - It was suggested that if the goal is simply refinement, defining the system prompt to ensure the desired response format from the start is the better path.
- **LlamaIndex Snubbed in Anthropic's Frameworks**: A user noted that LlamaIndex was missing from Anthropic's list of frameworks in their guidance on "Building Effective AI Agents".
   - Another member called LlamaIndex *a total gem*, suggesting the omission reflects Anthropic's existing relationships rather than the framework's quality.
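For reference, a sketch of the retriever-passing approach from the metadata-filtering thread above; the toy index, filter key, and query are illustrative, though the filter classes and chat engine are llama_index's standard ones:

```python
# Sketch: metadata filtering on a chat engine by passing a filtered retriever.
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Toy index with one tagged document.
index = VectorStoreIndex.from_documents(
    [Document(text="Travel must be booked two weeks ahead.",
              metadata={"source": "handbook.pdf"})]
)

# Only retrieve nodes whose metadata matches the filter.
filters = MetadataFilters(filters=[ExactMatchFilter(key="source", value="handbook.pdf")])
retriever = index.as_retriever(filters=filters)

chat_engine = CondensePlusContextChatEngine.from_defaults(retriever=retriever)
print(chat_engine.chat("What does the handbook say about travel?"))
```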

GPU MODE ▷ #general (11 messagesđŸ”„):

Stable Diffusion Performance, Custom Kernels, GPU Hardware Counter Monitoring, Expert Tiles Streaming

  ‱ Stable Diffusion Performance Expert Arrives!: A new member achieved 294 images/sec at 512x512 on a 4090 with the 1-step sdxs model, claiming potentially pioneering real-time video work dating back to October 2023 (post on X.com).
    ‱ They’re now up to 23 fps at 1280x1024 using sdxl and express interest in exploring custom kernels that use CUDA cores and tensor cores simultaneously.
  ‱ Kernel Customization Curiosity Kicks In: A member with a background in software architecture and SQL database performance, now retired from MSFT, is experimenting with custom kernels that engage CUDA cores and tensor cores concurrently.
    • They are building a new system with dual 5090s, a 7985WX Threadripper, and 256 GB of memory for hardcore experimentation, after getting hooked on Stable Diffusion since its early days.
  • GPU Hardware Counter Monitoring Investigated: A member is exploring GPU hardware counter monitoring, drawing from prior experience with hardware counter profiling on CPUs (Intel and Sparc).
    • This is part of their effort to find the best project to tackle using Ubuntu, nightly torch builds, and CUDA 12.9.
  • Expert Tiles Streamlining Strategies Spark: A member inquired about streaming expert tiles directly to SMEM via TMA to avoid L2 pollution, referencing an interesting concept.

GPU MODE ▷ #triton (3 messages):

Triton, Shared Memory, Register Spilling

  • User seeks Triton Shared Memory control: A user asked about forcing Triton to explicitly load a tensor in shared memory to avoid register spilling.
    • A member responded that, as far as they understand, there’s no way to avoid Triton’s automatic management of shared memory.
  • Register Spilling concerns raised: A user mentioned seeing register spilling and hoped that using shared memory would improve performance in Triton.
    • The user inquired about potential codebase changes needed to enforce shared memory usage.

GPU MODE ▷ #cuda (16 messagesđŸ”„):

CUDA segfaults, CUDA gdb, NVIDIA bug reports, CUDA containers vs. CCCL/Thrust, CUDA Runtime error checking

  • GPU Mallocs Cause Initial CUDA Segfaults: A member initially encountered segfaults during GPU mallocs, but later reported fixing them; the root cause was not specified.
  • CUDA gdb Debugging Proposed for CUDA: In response to debugging challenges, another member suggested using CUDA gdb for debugging, but this suggestion was deferred for later use.
  • Filing NVIDIA Bug Reports: A member shared the NVIDIA bug reporting link in response to the debugging conversation, and mentioned they are using it.
  • CCCL/Thrust vs Custom CUDA Containers: A member recommended using CCCL/Thrust and RAPIDS/RMM instead of implementing custom CUDA containers, especially for non-educational projects.
    • The original member stated that they will stick to the “cuda cpp way” due to the need to manipulate specific data structures.
  • CUDA Runtime Errors and Compute-Sanitizer: It was recommended to implement proper CUDA Runtime error checking and to explore compute-sanitizer for debugging, suggesting the debugger as a last resort.

GPU MODE ▷ #torch (2 messages):

Intervene in inductor kernels, Customize generated kernel code

  • Intervene in Inductor Kernels Manually?: A member is looking for a way to intervene in a specific kernel created by inductor to make manual changes to the generated kernel code itself.
    • They considered making a custom op, but this would require understanding how it fused operations in both the forward and backward passes, which would take time.
  • Customize Generated Kernel Code Directly: The user wants to modify the kernel code directly instead of just trying out different inductor configurations.
    • They also asked if there’s a way to get certain inductor configs to apply only to a specific generated kernel, seeking fine-grained control over the kernel generation process.

Embed Issues with arXiv Links, Arctic Long Sequence Training, GitHub Repo Discovery

  • Embeds Baffled by arXiv Links: Users are reporting odd behavior with embeds not displaying the page title for arXiv links, such as this example.
    • One user manually provided the title, “Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences,” due to embed failures.
  • GitHub Repo Lurks in the Shadows: A user shared a link to a seemingly interesting GitHub repository.
    • No further context or description was provided, leaving the repository’s purpose a mystery.

GPU MODE ▷ #beginner (2 messages):

Shared Memory Buffers, Tensor Data Types

  • SMEM Buffer Flexibility Arises: A member clarified that the type of your shared memory (SMEM) buffer does not have to match the tensor map data type.
  • User expresses thanks: A user expressed thanks.

GPU MODE ▷ #rocm (3 messages):

AMD bf16 instruction, MFMA supports bfp16, Triton gemvs

  ‱ AMD Lacks bf16 FMA Instructions: It was noted that AMD does not have bf16 FMA instructions, and that only MFMA supports bf16.
    ‱ Another member confirmed that only dot-product bf16 instructions seem to be available on RDNA3/4 and CDNA4, as seen in the screenshot.
  • bf16 gemvs Performance Woes: A user commented that gemvs with bf16 run extremely slow with fp32 emulation in Triton.

GPU MODE ▷ #liger-kernel (1 messages):

t_cc: Thanks! I’ll take a look


GPU MODE ▷ #self-promotion (1 messages):

Distributed Training Course, Accelerate Maintainer

  • Distributed Training Course Alert: A member is promoting a distributed training course taught by an accelerate maintainer from transformers.
    • The course promises to gather big minds.

GPU MODE ▷ #general (4 messages):

PMPP Leaderboard, GPU MODE Competitions

  • PMPP Leaderboard Refresh Incoming: The PMPP problems leaderboard is slated for a refresh with a more robust eval methodology.
    • The current PMPP problems will be made semi-permanent after the refresh, due to lessons learned and improvements made since its inception.
  • GPU MODE Announces Two More Competitions: GPU MODE is actively planning two more competitions to expand its offerings.
    • No further details have been provided as to the themes of these competitions or the exact timings of their release, but expect them soon.

GPU MODE ▷ #factorio-learning-env (19 messagesđŸ”„):

FLE Progress Concerns, Pull Request Delays, Write Access Democratization, Factorio Source Code Integration, Factorio Client Removal

  • FLE Team Acknowledge Progress Concerns: Team members voiced concerns regarding the slow pace of progress in the Factorio Learning Environment (FLE) project, especially with pending pull requests.
    • Members expressed the need for more regular merging of pull requests, suggesting a target of 2-3 merges per week to maintain momentum.
  • Democratizing Write Access Gains Traction: A key team member suggested creating a FLE GitHub organization to democratize write access, allowing more team members to contribute effectively.
    • This change would aim to expedite the merging of pull requests and enable separate repositories for experimental projects.
  • Factorio Client Requirement Removal: An Infrastructure Victory: Members noted that removing the requirement for a connected Factorio client, as proposed in pull request #223, represents a significant infrastructure win.
    • This change unlocks CI/CD, self-contained Jupyter notebooks, and massive horizontal scaling capabilities.
  • Source Code Access Considered for Direct Integration: The team discussed the possibility of obtaining access to the Factorio source code to implement missing functions and build the mod and API directly into the game engine.
    • This would result in a tighter integration, potentially mirroring the stability seen in projects like Malmo, which has not required commits in years.
  • Events Redesign Could Streamline Interactions: Team members inquired about the feasibility of modifying certain on_player type events in the Factorio Lua API to improve functionality.
    • Specifically, they are exploring the possibility of attaching events like on_player_mined to control rather than a player to enable more precise resource allocation when mining.

GPU MODE ▷ #cutlass (4 messages):

CUTLASS Examples, Applied for grant

  • Grant Application Gambit: A member shared they applied for a grant with their draft paper as a proposal.
    • It is not clear which grant was applied for but perhaps we will find out more about it in the future.
  • Cutlass Code Samples Snippets Shared: A member asked for beginner friendly tutorials for CUTLASS besides docs and another member shared a link to the NVIDIA Cutlass Examples.
    • The linked repo contains useful code snippets to kickstart using cutlass.

GPU MODE ▷ #singularity-systems (1 messages):

dumbpandabear: this sounds really exciting! looking forward to help get this off the ground


Cohere ▷ #đŸ§”-general-thread (17 messagesđŸ”„):

Coordinate plane drawing models, AI Research and Development Channel, EU GDPR compliance for Embed v4, Cohere 4 AI projects

  • Cartesian Canvas Crafters Seek Coordinate Code: A member is seeking a model to feed instructions into their cartesian plane canvas to draw cool art.
    • Ideas are welcome!
  • Research Realms: A New Channel Emerges: A new channel <#1384974112841269399> has been created to discuss AI research and development.
    • Members are welcome to chat and connect with like-minded individuals, but no advertising is allowed.
  • GDPR Guarantees: Embed v4’s EU Eligibility Examined: A member inquired about EU GDPR compliance for Embed v4.
    • This might be on the Roadmap as Embed v4 is excellent for multimodal RAG documents.
  • Cohere Collaboration: Contribute Creatively: A new community member asked how to join and contribute to existing Cohere projects.
    • Members are directed to apply for Cohere 4 AI here and share research in <#1384974112841269399>.

Cohere ▷ #🔌-api-discussions (28 messagesđŸ”„):

bufio.Scanner error, command-r-08-2024 issues, Go SDK issue

  • Bufio Scanner Bug Surfaces: A member reported encountering a bufio.Scanner: SplitFunc returns advance count beyond input error and is using the cohere/command-r-08-2024 model.
    • Cohere support requested a screenshot and details, suggesting the issue might be related to frequent client-side cancellations, but the member believes the problem originates from Cohere’s Go SDK.
  • Command-R Model Faces Completion Woes: Users are experiencing incomplete sentence generation with the command-r-08-2024 model, even after investigating potential client-side causes.
    ‱ An example of an incomplete completion: *Firefly’s smile deepens, a hint of mischief in her red eyes. “Well, hello there,” she says, her voice carrying a*, which cuts off mid-sentence.
  • Go SDK Suspected in Completion Catastrophe: Cohere support initially suggested the error was client-side, but after further investigation, the error appears to be stemming from Cohere’s Go SDK.
    • Cohere support recommended upgrading the SDK to the latest version to see if it resolves the issue.

Cohere ▷ #👋-introduce-yourself (2 messages):

Introductions, Volunteer opportunities

  • Cynthia Joins Community, Seeks Opportunities: Cynthia from Burundi, now in Ontario, Canada, introduced herself and expressed interest in finding volunteer opportunities within the Cohere community.
    • She is looking to learn how to start participating in projects, and is excited to join!
  • Welcoming New Community Members: The Cohere community welcomes new members and encourages them to introduce themselves.
    • New members are asked to share their background, current projects, favorite tech/tools, and goals for the community.

Torchtune ▷ #dev (29 messagesđŸ”„):

Muon, AdamW, Kron, Qwen, torchtune

  • Muon’s Performance Compared to AdamW: A PR author shared some results about Muon, questioning why AdamW might perform better, and attributing it to a potential error in the integration or something specific to torchtune.
    • It was noted that when the SFT optimizer differs from the pretraining optimizer, SFT with Muon does not show a significant advantage over AdamW, suggesting there’s still considerable room for exploration.
  • Kron’s Performance Similar to AdamW When Tuned Well: One member shared a comparison of Kron and Muon using a different implementation (ethansmith2000/fsdp_optimizers), where Kron performed about the same as AdamW when well-tuned in other tests.
    • They noted that AdamW was generally the fastest but had slightly worse memory usage than Muon or all diagonal/one diagonal Kron.
  • Questioning correctness of Qwen’s convergence: Doubts were raised about the normal convergence of Qwen3 0.6b, suggesting a potential setup issue with the PR, and shared a WB_Chart
    ‱ After examining the chart, they said the tops look more normal on this (a difference of ~500) and concluded there is *definitely a bug somewhere*.

Torchtune ▷ #papers (4 messages):

Muon Optimizer, Jaguar Muon, Modern Optimizers

  ‱ Magic Muon Optimizer Materializes: A member found the Muon optimizer interesting, noting it seems like magic that orthogonalizing the updates produces faster convergence (a sketch of that orthogonalization follows this section).
    • Referencing Jaguar Muon, they joked, “That witchcraft should be outlawed”.
  • Poison Proofs in Appendix?: A member asked if another had checked the “main poison: proofs in Appendix?”
    • The other member replied they were saving those for a rainy day.
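For readers curious what that witchcraft looks like, here is a sketch of the Newton-Schulz iteration Muon uses to orthogonalize updates, adapted from the publicly shared implementation; the quintic coefficients are the published ones, but treat this as illustrative rather than canonical:

```python
# Sketch of Muon-style orthogonalization via a quintic Newton-Schulz
# iteration; adapted from the public implementation, illustrative only.
import torch

def orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    a, b, c = 3.4445, -4.7750, 2.0315    # published quintic coefficients
    X = G / (G.norm() + 1e-7)            # scale so singular values are <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                          # iterate on the wide orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X  # push singular values toward 1
    return X.T if transposed else X
```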

Notebook LM ▷ #use-cases (2 messages):

K-12 educators, podcast from sources, NotebookLM API

  • Leveraging NotebookLM in K-12 Education: A member is giving a presentation to K-12 educators and is asking about common use cases leveraging NotebookLM in the education space.
    • They mentioned they would like to access NotebookLM’s ability to create amazing podcasts from the sources via an API.
  • Podcast Generation Feature Request: The member inquired if there was an API (or MCP!) that enables podcast generation from sources within NotebookLM.
    • They also inquired about whether there was an existing roadmap or path to implement this feature.

Notebook LM ▷ #general (25 messagesđŸ”„):

Gemini App Daily Limit, NotebookLM Model, LaTex support, Audio Overviews, Podcast issues

  • Gemini’s Research Daily Limits Debated: Users are questioning the daily limit of the deep research feature on the Gemini app, with confusion about limits on free versus paid plans.
    • One user reported hitting the limit on the free version but couldn’t find data for paid Google plans.
  ‱ NotebookLM may use Gemini 2.5 Flash: A user believes that NotebookLM uses the same underlying model as Gemini, tuned to hallucinate less; the underlying model is rumored to be Gemini 2.5 Flash.
    • This is based on the Google AI Studio release of an experimental audio generation version with Gemini 2.5 Flash; others claim it’s still on 2.0.
  • LaTeX Support Still Awaited: Users are requesting LaTeX support for NotebookLM, but math markup is not yet supported.
    • A user was encountering issues with their output displaying code, rather than code representations and was directed to a feature request channel.
  • Audio Overviews Length Limits: Users report issues generating audio overviews longer than 10 minutes in non-English languages such as Italian, despite custom prompts.
    • It seems that NotebookLM is defaulting to English and ignoring custom prompts for longer audio segments.
  • Podcast Customization Woes: A podcaster complains that NotebookLM no longer follows instructions for customization, sounding like a college professor reading a paper and ignoring directives like avoiding the phrase deep dive.
    • Other users confirm experiencing similar issues, describing the tool as dead in the water for podcasting.

Yannick Kilcher ▷ #general (5 messages):

Typo Analysis, Cognitive Offloading to AI, Flow Matching in Production

  • Typo or Cerebral Error?: A member analyzed a typo, distinguishing between physical typing errors (e.g., teh for the) and cerebral errors (typing one instead of was).
    • The member posited that the latter might indicate the typing part of the brain listening to the message-composing part.
  • Normies offloading thinking to ChatGPT: A member observed that many people now offload their thinking to ChatGPT for cognitively undemanding tasks.
    • The member expressed this occurs because ChatGPT is often good enough for them.
  • Flow Matching Under Scrutiny: A member asked if flow matching (FM) is used in industry in production, citing a tweet.
    • Another member responded that Imagen and Flux use FM.

Yannick Kilcher ▷ #paper-discussion (11 messagesđŸ”„):

Predictive Coding, V-JEPA-2 Paper Discussion, PRECO GitHub Repo

  ‱ Predictive Coding = Square Root Guessing?: A member linked to a discussion of predictive coding, suggesting that understanding square-root calculation by guess-and-check and backpropagation provides a rough understanding of predictive coding (a toy version follows this section).
  • V-JEPA-2 Paper Discussion Scheduled: Members scheduled a discussion of the V-JEPA-2 paper today, referencing the Meta AI blog post and the associated arXiv paper.
    • Another member created a future event to lead the group through V-JEPA-2.
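As a toy version of that guess-and-check analogy (purely illustrative, not from the linked discussion), a square-root guess can be refined by repeatedly correcting its prediction error:

```python
# Toy guess-and-check square root: descend the squared prediction error,
# loosely echoing how predictive coding minimizes local errors.
def sqrt_by_guessing(x: float, lr: float = 0.01, steps: int = 5000) -> float:
    g = max(x / 2.0, 1.0)           # initial guess
    for _ in range(steps):
        error = g * g - x           # prediction error
        g -= lr * 2.0 * g * error   # gradient step on 0.5 * error**2
    return g

print(sqrt_by_guessing(2.0))  # ~1.4142
```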

Yannick Kilcher ▷ #ml-news (8 messagesđŸ”„):

Keen Tech RL, Carmack Goals, Open Sourcing, Cursor Tier

  ‱ Keen RL Presentation Underwhelms: Members found Richard Sutton’s Keen Tech RL presentation underwhelming, given Keen’s focus on RL and what they saw as Carmack’s incompatible goals.
    • One member suggested that later presentations would be more interesting.
  • Keen to Open Source Code: Members expressed excitement about Keen Tech open-sourcing their code.
    ‱ One member, however, still called the presentation *very underwhelming*.
  • Cursor Releases New Tier: Members shared a link to Cursor’s new tier announcement.

MCP (Glama) ▷ #general (17 messagesđŸ”„):

Custom Transport in FastMCP, Corporate MCP Client/Host Alternatives, GraphQL API Tool Generation, Multi-Agent Systems with MCP, MCP Server Credentials

  • FastMCP Custom Transport Implementation: A member inquired about implementing custom transport in FastMCP.
    • Another member confirmed it’s possible on the client-side by extending the base transport class but was unsure about the server-side.
  • Navigating Corporate Restrictions: MCP Client/Host Solutions: Members discussed MCP client/host options when corporate policies restrict the use of Claude Desktop or Cursor.
    • One member found success with devstral:24B using Ollama locally and CLINE, but faced issues with Roo.
  • MCP Tackles Massive GraphQL APIs: A member is generating ~600 tools for GraphQL API queries and mutations using their MCP server, highlighting Cursor’s limitations in handling such a large number of tools.
    • They noted that Cursor and other models are struggling with tool counts exceeding a few dozen, as shown in a linked screenshot.
  • Architecting Multi-Agent Systems with MCP: The community discussed if A2A is required for building multi-agent systems using MCP tools, or if MCP client and server in each agent are sufficient.
    • A member stated that no A2A is needed, and that even Google doesn’t care about it.
  • Streamlining MCP Server Access: Members discussed the easiest way for others to use a newly created MCP server, focusing on credential acquisition.
    • The specific goal was to enable users to simply grab a credentials.json file from Google Cloud Console.

MCP (Glama) ▷ #showcase (2 messages):

Text-to-GraphQL, GraphQL Schemas, Arize-ai

  ‱ Text-to-GraphQL MCP Server Arrives: Arize AI rolled out a new feature that allows users to add MCP servers to their account directly from a Spaces page, introducing a Text-to-GraphQL MCP server.
    • It transforms natural language queries into GraphQL queries using an MCP server that integrates with AI assistants like Claude Desktop and Cursor, with a GitHub repo and a full write-up.
  • GraphQL schemas token problem: GraphQL schemas can easily exceed 75,000 tokens, making it impractical to stuff an entire schema into an LLM’s context window.
    • The Text-to-GraphQL MCP solves that by teaching an agent to traverse the schema graph directly, extracting only the fields and types it needs.
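A toy sketch of that traversal idea, not Arize's implementation, using graphql-core to pull just one type's fields out of a schema instead of stuffing the whole SDL into context:

```python
# Toy sketch (not Arize's code): walk a GraphQL schema with graphql-core
# and extract only the fields of the type an agent asks about.
from graphql import GraphQLObjectType, build_schema

sdl = """
type Query { user(id: ID!): User }
type User { id: ID! name: String posts: [Post] }
type Post { title: String body: String }
"""
schema = build_schema(sdl)

def fields_of(type_name: str) -> dict[str, str]:
    t = schema.type_map.get(type_name)
    if not isinstance(t, GraphQLObjectType):
        return {}
    return {name: str(field.type) for name, field in t.fields.items()}

print(fields_of("User"))  # {'id': 'ID!', 'name': 'String', 'posts': '[Post]'}
```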

tinygrad (George Hotz) ▷ #general (4 messages):

Hackathon Timing, tinygrad maturity

  • Hackathon Delayed Until Tinygrad Matures: The tinygrad hackathon is postponed, with the earliest possible date being next year, due to the need for tinygrad to become more usable and mature.
    • The announcement encouraged continued usage and feedback on tinygrad, noting that this engagement will inform hackathon participant selection.
  • Feedback Encouraged for Future Hackathon Consideration: Participants are encouraged to actively use tinygrad and provide feedback, influencing selection for future hackathon participation.
    • This initiative aims to improve tinygrad’s usability and maturity, ensuring a more productive hackathon experience when it is eventually held.

tinygrad (George Hotz) ▷ #learn-tinygrad (5 messages):

TinyJit, ShapeTracker, Variable Usage

  ‱ TinyJit args mismatch resolution: A user encountered an AssertionError about an args mismatch in JIT when using @TinyJit with a loop that iterates over a Tensor, and a member provided a solution using Variable to address the ShapeTracker issue (sketched at the end of this section).
    ‱ The suggested solution creates a Variable for the loop index and binds it within the loop, allowing TinyJit to handle the varying shapes correctly, as further detailed in tinygrad’s JIT tutorial.
  • Shape Alignment with Variable: A member explained that the suggested solution addresses ShapeTracker discrepancies encountered when using TinyJit within loops, specifically when processing slices of a tensor.
    • By using Variable and binding it to the loop index, the shapes are aligned, resolving the args mismatch error, which another member experienced.
  • Tensor-Variable Math Operation Constraint: A user inquired about the requirement for Tensor to be on the left-hand side (LHS) when performing mathematical operations between a Tensor and a Variable.
    • The exact reason behind this was not explicitly provided in the conversation, but it implies a constraint in how tinygrad handles operations involving symbolic variables.
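A sketch of the Variable pattern described above, modeled on tinygrad's symbolic-JIT tests; exact import paths and APIs can shift between tinygrad versions:

```python
# Sketch: jit over varying-length slices by binding a symbolic Variable,
# modeled on tinygrad's symbolic JIT tests; APIs may differ by version.
from tinygrad import Tensor, TinyJit, Variable

@TinyJit
def double(t: Tensor) -> Tensor:
    return (t * 2).realize()  # note: the Tensor stays on the LHS of the op

data = Tensor.rand(10, 3)
for i in range(1, 5):
    vi = Variable("i", 1, 10).bind(i)            # symbolic length, bound per step
    out = double(data.shrink(((0, vi), None)))   # same kernel, varying slice
    print(out.numpy().shape)
```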

Nomic.ai (GPT4All) ▷ #general (9 messagesđŸ”„):

Discord Spam, Mr. Beast

  • Discord User Spams Mr. Beast Content: A Discord user was told to stop spamming Mr. Beast content in the channel.
  • Another Discord User Complains: Another Discord user complained about the spam.

DSPy ▷ #show-and-tell (2 messages):

MIPROV2 Optimization, Agent Implementation, Workflow Metrics

  ‱ MIPROV2 Agent Optimization: A member indicated that they don’t see any hurdles in optimizing the agent implementation with something like MIPROv2, provided input-output examples are available (a minimal sketch follows this section).
    • Another member is curious about what input/output examples would even look like for the workflows.
  • Workflow Metric Optimization: A member is working on other projects where they plan to use the built-in eval metrics to optimize workflows and agent implementations.
    • They are excited to share once they are through with the implementation.
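For concreteness, here is a minimal sketch of what MIPROv2 optimization over input-output examples can look like; the task, metric, and toy trainset are illustrative stand-ins, and the exact knobs vary by DSPy version:

```python
# Minimal sketch of MIPROv2 prompt optimization from input-output examples;
# the toy task and metric are stand-ins, not the member's actual workflow.
import dspy
from dspy.teleprompt import MIPROv2

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported LM

qa = dspy.ChainOfThought("question -> answer")

trainset = [
    dspy.Example(question=f"What is {i} + {i}?", answer=str(i + i)).with_inputs("question")
    for i in range(1, 21)
]

def exact_match(example, pred, trace=None):
    return example.answer.strip() in pred.answer

optimizer = MIPROv2(metric=exact_match, auto="light")
# Older DSPy versions may also need requires_permission_to_run=False here.
optimized_qa = optimizer.compile(qa, trainset=trainset)
print(optimized_qa(question="What is 3 + 4?").answer)
```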

DSPy ▷ #general (5 messages):

DSPy and LangGraph, DSPy in agentic coding IDEs, Finetuning Llama

  • DSPy ❀ LangGraph in Production?: A member asked if anyone has combined DSPy and LangGraph in production, seeking to reverse engineer a recent multi-agent researcher from Anthropic.
    ‱ Another member mentioned someone who had put something together, shared in this discord link, to help with this.
  • DSPy Powers Agentic Coding IDEs?: A member inquired about using DSPy in agentic coding IDEs like Cursor or Roo Code.
    • They noted that setting up “agent” mode currently relies on prompt engineering based on vibes, with instructions like ‘ONLY RETURN markdown’.
  ‱ Finetuning Llamas with Frameworks?: A member new to DSPy asked what frameworks or libs are recommended for finetuning Llama models.
    • They were seeking recommendations on libraries and frameworks.