Happy Thanksgiving!
AI News for 11/25/2025-11/26/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 9014 messages) for you. Estimated reading time saved (at 200wpm): 713 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
We're taking the last round of signups for the 2025 Dev Writers Retreat. Join us in San Diego after NeurIPS!
AI Twitter Recap
Agent systems: long-running harnesses, MCP tasking, and production deployments
- Anthropic on durable agents + MCP tasks: Anthropic outlines practical patterns for agents that work across many context windows (state checkpoints, structured artifacts, deterministic tools, "plan mode," etc.) in a strong engineering post (blog summary). In parallel, MCP shipped SEP-1686 "tasks" for background, long-running work with status polling and result retrieval – exactly what multi-hour research/automation workflows need (announcement, fastmcp + Prefect integration). LangChain clarifies the stack: frameworks (build), runtimes (durable execution, streaming/HITL), and harnesses (general-purpose agents), with LangGraph in the runtime slot (post).
- Real-world agent infra: Booking.com shipped an agent handling tens of thousands of daily partner–guest messages in production, yielding a reported ~70% satisfaction lift, fewer follow-ups, and faster responses. Stack: LangGraph, Kubernetes, FastAPI, GPT-4 Mini via an internal gateway with prompt-injection detection, and Weaviate for semantic template search (MiniLM embeddings, KNN + thresholding, Kafka streaming updates) (deep dive). Perplexity added user-level "Memory" across models and modes (view/delete/disable; incognito excluded), and rolled out "virtual try-on" for shopping (Memory, details, try-on).
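Booking.com's retrieval code isn't public; a minimal sketch of the KNN-plus-threshold pattern described above, with toy vectors standing in for MiniLM embeddings and plain Python in place of Weaviate, might look like:

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_template(query_vec, templates, threshold=0.8):
    """Return the best-matching reply template id, or None if no
    candidate clears the similarity threshold (fall back to the LLM)."""
    best_id, best_score = None, -1.0
    for tid, vec in templates.items():
        score = cosine(query_vec, vec)
        if score > best_score:
            best_id, best_score = tid, score
    return best_id if best_score >= threshold else None

# Toy 3-d "embeddings" standing in for MiniLM vectors stored in Weaviate.
templates = {
    "late-checkout": [0.9, 0.1, 0.0],
    "parking-info":  [0.0, 1.0, 0.1],
}
print(top_template([0.85, 0.15, 0.05], templates))                  # matches "late-checkout"
print(top_template([0.5, 0.5, 0.5], templates, threshold=0.95))     # None: below threshold
```

The thresholding step is what keeps a KNN search honest in production: nearest neighbors always exist, but only sufficiently close ones should short-circuit to a canned template.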
Claude Opus 4.5: evals, cost/UX learnings, and new skills
- Performance picture: On LisanBench, Opus 4.5 Thinking ranks first; the non-thinking variant underperforms previous Opus versions and peers (longest valid chains in 18/50 words; lower validity ratio from slower self-correction) (results). On Code Arena WebDev, Opus 4.5 (thinking-32k) debuted at #1, edging Gemini 3 Pro; it ranks #3 on Text (leaderboard). Community reports are mixed: in "no thinking," Opus 4.5 can be worse than Sonnet, sometimes misusing the Python tool as a covert chain-of-thought scratchpad that loops (analysis, failure mode).
- Costs and ergonomics: Batch APIs make "Thinking" runs price-viable (e.g., ~$35 vs ~$5 for non-thinking on the same job) and unlock broader testing (note). Anthropic also fixed a key Claude.ai pain point by auto-compacting earlier context to avoid hitting length limits mid-chat (announcement). For coding UX, Claude Code's new "frontend-design" skill can "one-shot" UI concepts; use plan mode for better results (how-to, example).
Efficient reasoning and multi-agent communication
- Latent MAS > token chatter: LatentMAS replaces text messages with compact latent vectors passed among agents (KV-cache/last-layer hidden state "thoughts"), cutting communication tokens by ~70–84% while improving accuracy by up to +4.6% over text-based MAS and running 4–4.3× faster across 9 benchmarks (math/science/code) with Qwen3-4B/8B/14B – no extra training needed (paper, summary).
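A toy contrast makes the token-accounting claim concrete; the lists of floats here are illustrative stand-ins for KV-cache/hidden-state "thoughts", not LatentMAS's actual implementation:

```python
# Toy contrast between text-based and latent message passing among agents.
# In real LatentMAS the hand-off is a KV-cache / last-layer hidden state;
# here a plain list of floats stands in for that "thought" vector.

def text_handoff(thought_tokens):
    """Text MAS: sender decodes its reasoning to tokens, receiver re-reads them."""
    message = " ".join(thought_tokens)   # decode step
    received = message.split()           # receiver re-tokenizes the message
    return received, len(received)       # cost = tokens transmitted

def latent_handoff(hidden_state):
    """Latent MAS: sender hands its hidden state over directly, no decoding."""
    return hidden_state, 1               # cost = one vector, zero text tokens

tokens = ["first", "factor", "the", "polynomial", "then", "check", "roots"]
_, text_cost = text_handoff(tokens)
_, latent_cost = latent_handoff([0.12, -0.7, 0.33])
print(text_cost, latent_cost)  # 7 vs 1
```

The savings compound across every agent-to-agent hop, which is why multi-agent pipelines see the 70–84% communication-token reductions reported above.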
- Reasoning trace distillation ≠ verbosity: Training 12B models on gpt-oss traces yields ~4× fewer tokens per solution (~3.5k vs 15.5k with DeepSeek-R1) at similar accuracy – huge inference cost savings. Pretraining contamination with DeepSeek traces explains faster initial convergence but less "new learning." Key takeaway: the source and style of reasoning traces matter for efficiency (summary, discussion). Also, interleaved thinking agents show practical step-by-step efficiency gains in research workflows (demo/code).
Beyond gradients and scaling systems
- ES at hyperscale (NVIDIA + Oxford): EGGROLL reframes evolution strategies with low-rank perturbations, using skinny matrices A and B (ABᵀ) to approximate full-rank updates at inference-like throughput. It stably pretrains recurrent LMs with integers, competes with GRPO-tier methods on reasoning benchmarks, and scales population sizes to 100k+, making ES viable for large, discrete, or non-differentiable systems (overview).
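EGGROLL's core trick can be sketched in a few lines: sample skinny A and B, then apply the implied perturbation ABᵀ without ever materializing the full matrix, so a rank-r member of the population costs O((d_out+d_in)·r) instead of O(d_out·d_in). This is an illustrative reconstruction from the description above, not the paper's code:

```python
import random

def low_rank_perturbation(d_out, d_in, rank, scale=0.01, seed=0):
    """Sample skinny A (d_out x r) and B (d_in x r); the implied update
    is A @ B^T, stored as two thin factors instead of a full matrix."""
    rng = random.Random(seed)
    A = [[rng.gauss(0, scale) for _ in range(rank)] for _ in range(d_out)]
    B = [[rng.gauss(0, scale) for _ in range(rank)] for _ in range(d_in)]
    return A, B

def apply_perturbed(W, A, B, x):
    """Compute (W + A B^T) x without materializing A B^T:
    y = W x + A (B^T x)."""
    d_out, d_in, r = len(W), len(x), len(A[0])
    Btx = [sum(B[i][k] * x[i] for i in range(d_in)) for k in range(r)]
    return [sum(W[j][i] * x[i] for i in range(d_in))
            + sum(A[j][k] * Btx[k] for k in range(r))
            for j in range(d_out)]

W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # 2x3 weight matrix
A, B = low_rank_perturbation(2, 3, rank=1)
y = apply_perturbed(W, A, B, [1.0, 2.0, 3.0])
print(len(y))  # 2
```

Because each population member is just two thin factors, perturbed forward passes batch at near-inference throughput, which is what lets ES scale to 100k+ populations.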
- Out-of-memory on Apple Silicon, solved: dria's "dnet" enables distributed inference across Apple Silicon clusters via fused pipelined-ring parallelism, disk streaming, and UMA-aware scheduling to run models beyond physical memory limits (announcement).
Multimodal and generative modeling updates
- New architectures:
- PixelDiT proposes dual-level Transformers for pixel-space diffusion (patch-level for global semantics, pixel-level for details), achieving 1.61 FID on ImageNet 256×256 and strong T2I metrics (GenEval 0.74, DPG-bench 83.5) (paper).
- Apple's STARFlow-V uses normalizing flows for end-to-end video generation with native likelihoods, robust causal prediction, and unified T2V/I2V/V2V; introduces flow-score matching for consistency (paper/code).
- Terminal Velocity Matching generalizes flow matching for few/one-step generation by regularizing behavior at terminal time – promising for high-fidelity fast samplers (paper).
- Models and UX:
- Z-Image (6B) announced under Apache-2.0; Z-Image-Turbo (6B) released on HF with photorealistic, text-accurate images in <3s on a single GPU (teaser, release).
- FLUX.2 [dev] gets a "Tiny Autoencoder" to stream intermediate outputs during generation – live visual progress instead of progress bars (release).
- Google's Nano Banana 2 shows major gains on StructBench (non-natural, schema-heavy images); resources for advanced prompting/styles surfaced by the community (analysis, awesome list).
Open ecosystem, evaluation, and governance
- "Economies of Open Intelligence" (HF + collaborators): China surpassed the U.S. in open model downloads for the first time (17.1% share), led by DeepSeek and Qwen; a "Sino-Multimodal Period" sees bigger, quantized, multimodal models and intermediaries (adapters/quantizers) that steer usage. Trendlines: US big tech share down; China + community up; transparency slipping. Based on 2.2B downloads across 851k models, covered by the FT (overview, thread, data point).
- Evals and safety: METR continues to be cited as the most credible external evaluator by many practitioners (comment). The AI Security Institute released a case study with Anthropic (Opus 4.5/4.1/Sonnet 4.5): would an assistant sabotage AI safety research? Results are encouraging but include caveats (thread). An AI Evaluator Forum (Transluce + orgs) launches at NeurIPS to coordinate independent, public-interest evaluation standards (invite).
- Applied multimodal recsys: Zhihu details a Qwen2.5-VL-72B/3B-driven pipeline for high-dimensional multimodal labels and contrastive embeddings (LoRA on Qwen2-VL-7B, synthetic data via the 72B model, hard negatives via M1 retrieval + 72B rerank). Delivers +7.4% on MMEB-eval-zh over GME-7B baselines (write-up).
- Domain benchmarks: New benchmarks push beyond single-turn QA – MultiPathQA for gigapixel pathology slide navigation with agent scaffolds and MTBBench for multimodal, longitudinal oncology "tumor board" decision-making – with gains from specialized tools and domain FMs (pathology, MTBBench). Clinical ASR evals get stricter with "WER is Unaware," using DSPy + GEPA to train an LLM judge that flags safety risks better than WER (paper/code).
Top tweets (by engagement)
- Anthropic on building effective long-running agent harnesses (post, ~1.8k)
- Claude.ai auto-compacts context to avoid hitting limits mid-chat (update, ~2.3k)
- Google DeepMind releases AlphaFold documentary "The Thinking Game" on YouTube (link, ~2.25k)
- Awesome Nano Banana prompts/styles/resources for advanced image generation (repo, ~1.0k)
- Claude Opus 4.5 debuts at #1 on Code Arena WebDev leaderboard (leaderboard, ~0.5k)
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Alibaba Text-to-Image Model Launch
- New Open-source text-to-image model from Alibaba is just below Seedream 4, Coming today or tomorrow! (Activity: 342): The image presents a leaderboard of text-to-image models ranked by their Elo scores, showcasing the competitive landscape in this domain. Alibaba's "Z-Image-Turbo", an open-source model, is ranked fourth, just below ByteDance's "Seedream 4.0". This highlights Alibaba's significant achievement in developing a high-performing open-source model, which is noteworthy given the dominance of proprietary models by companies like Google and ByteDance. The leaderboard provides insights into the performance metrics and win rates of these models, emphasizing the competitive edge of Alibaba's open-source contribution. One comment queries if the model is the "6B" discussed previously, indicating ongoing discussions about its specifications. Another comment praises "Flux 2" for its non-text image capabilities, noting its open-source nature, while a third mentions an "Edit version" of the model, suggesting additional functionalities.
- AIMadeSimple highlights the potential impact of Alibaba's new model, noting that at 6B parameters, it could significantly enhance local deployment capabilities. This contrasts with Flux 2, which, at 56B parameters, demands more robust hardware. The commenter emphasizes that if Alibaba's model can achieve near-Seedream 4 quality with a much smaller size, it could democratize access to state-of-the-art image generation, especially for users with consumer-grade GPUs.
- The discussion touches on the challenges smaller models face, particularly in terms of prompt adherence and multi-object composition. These are areas where larger models typically excel, and the commenter suggests that the real test for Alibaba's model will be its ability to handle these tasks effectively despite its smaller size.
- Vozer_bros mentions trying out Flux 2, noting its effectiveness for generating non-text images and its open-source nature. This suggests a growing trend towards open-source models in the text-to-image space, which could foster more community-driven development and innovation.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Opus 4.5 Model Success Stories
- Opus 4.5 just completed for me something that I've wanted for over 14 years. It took about a day. Sonnet, GPT, etc have all failed me prior. (Activity: 805): The user successfully converted the ZBar library, which was used for scanning +2 and +5 EAN supplemental barcodes, into native Swift 6 using Opus 4.5. This conversion was completed in just one day and resolved two longstanding bugs in the original ZBar code. The ZBar library, a mix of Objective-C and complex C code, was previously used due to the lack of native support in iOS and Android for these barcode types. The user had attempted similar tasks with other models like GPT-3.5, Sonnet, and earlier versions of Opus, but only Opus 4.5 succeeded in this task. Commenters expressed interest in the potential productization of the solution and suggested sharing the code on GitHub, crediting ZBar. There was also a comparison to other models like Gemini 3 and Codex 5.1, with Opus being praised for solving complex issues.
- A user inquired about the potential for productizing the solution created with Opus 4.5, noting that many fitness apps currently use barcode scanning libraries. They speculated whether this new solution could replace existing libraries, particularly given the assumption that iOS's barcode scanning library is native due to its speed.
- Another user highlighted licensing considerations for the Swift 6 library, which was converted from ZBar, originally under LGPL 2.1. They explained that if the library is distributed, it must be licensed under LGPL 2.1 or GPL 2+, as proprietary licenses and others like MIT/BSD/Apache are not compatible. However, if the Opus 4.5 solution is sufficiently independent from ZBar, it could potentially be relicensed.
- A user expressed interest in the initial prompt used with Opus 4.5, suggesting that understanding the prompt could provide insights into how Opus 4.5 was able to achieve results where other models like Sonnet, GPT, and Codex 5.1 max xhigh had failed.
- There. I fixed the graph. (Activity: 623): The image is a bar graph comparing the accuracy percentages of different software versions in a software engineering context, specifically verified by SWE-bench with a sample size of n=500. The graph shows that Opus 4.5 has the highest accuracy at 80.9%, while Opus 4.1 has the lowest at 74.5%. Other versions like Sonnet 4.5, Gemini 3 Pro, GPT-5.1-Codex-Max, and GPT-5.1 have varying accuracies between these two extremes. The graph is intended to highlight the performance differences among these versions, but the comments suggest that the visual representation may obscure these differences rather than clarify them. Commenters criticize the graph for making it difficult to discern differences between the software versions' accuracies, with one sarcastically noting that the graph no longer serves any purpose. Another commenter praises Opus 4.5 for its performance since release, indicating user satisfaction with its accuracy.
- A user suggests that when evaluating performance metrics, especially as they approach 100%, it might be more insightful to represent them as error rates. This is because a 10% error rate is significantly better than a 20% error rate, whereas improvements from 80% to 90% might not appear as impactful. This perspective can help in understanding the real-world implications of performance improvements.
- Another user points out that even a 3% difference in performance metrics can be significant, implying that small percentage changes can have substantial impacts depending on the context. This highlights the importance of considering the scale and context when interpreting performance data.
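The error-rate reframing suggested in these comments is quick to check against the chart's own numbers (80.9% vs 74.5% accuracy, per the post above):

```python
# Reframing SWE-bench accuracies as error rates: the jump from 74.5% to
# 80.9% accuracy looks modest on an accuracy axis, but it removes roughly
# a quarter of all remaining failures.
opus_41, opus_45 = 0.745, 0.809
err_41, err_45 = 1 - opus_41, 1 - opus_45          # 25.5% vs 19.1% error
relative_reduction = (err_41 - err_45) / err_41    # share of failures eliminated
print(f"{relative_reduction:.1%}")  # 25.1%
```

This is why small accuracy deltas near the top of a benchmark can still be significant: the denominator that matters is the shrinking pool of remaining errors.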
2. New AI Model Announcements and Benchmarks
- Another Upcoming Text2Image Model from Alibaba (Activity: 786): Alibaba is developing a new text-to-image model, leveraging a 6B-parameter diffusion model paired with a Qwen3 4B text encoder. The model, named Z-Image-Turbo, is hosted on ModelScope but is currently under limited access. The model's integration with Hugging Face's Diffusers has been merged, and ComfyUI has confirmed Day-0 support, indicating imminent public release. Early tests suggest it may outperform Qwen-Image in certain benchmarks, promising high-quality outputs even on less powerful GPUs. Commenters are optimistic about the model's potential, especially if it delivers high-quality photorealistic images with a smaller, more efficient architecture. There is anticipation that it could be a significant advancement for users with limited GPU resources.
- A user highlighted that the new Alibaba model appears to outperform Qwen-Image on a leaderboard from their ModelScope repository. This suggests significant advancements in the model's capabilities, potentially setting a new standard in the text-to-image domain.
- Another commenter expressed excitement over the model's size, noting that it is a 6 billion parameter model. They emphasized that if the model's performance matches the examples provided, it could be a game-changer, especially with the potential for numerous LoRA (Low-Rank Adaptation) implementations to emerge quickly.
- A user mentioned that the model is available for free testing on ModelScope, albeit with the requirement of providing a phone number. They noted being very impressed with the model's performance, indicating that it could be a strong competitor in the text-to-image generation space.
- We are here (Activity: 725): The image, created by Thomas Pueyo, is a conceptual illustration of the progression of AI capabilities, depicting stages from being a "fun toy" to potentially achieving Artificial General Intelligence (AGI). The current stage, marked by a star, suggests that AI is highly intelligent but still inconsistent, excelling in some tasks while failing in others. This visualization is more of a speculative and illustrative tool than a precise technical roadmap, as Pueyo is not an expert in AI or machine learning. Some commenters express skepticism about the current capabilities of AI, arguing that it is not yet capable of performing a significant portion of human tasks. Others question the expertise of Thomas Pueyo in AI, noting his background in behavioral psychology and storytelling rather than technical AI fields.
- Selafin_Dulamond discusses the inconsistency of AI skills, noting that while AI can solve a problem correctly one day, it may fail the next. This highlights the unpredictable nature of AI performance, often depicted as a "jagged frontier" that changes constantly, reflecting the current limitations in AI's ability to consistently perform tasks.
- Creed1718 challenges the notion that a large language model (LLM) can perform 50% of the tasks of an average intelligent human, suggesting skepticism about the current capabilities of AI in replicating human intelligence across diverse tasks. This comment underscores the ongoing debate about the limitations of AI in practical, real-world applications.
3. Humorous AI and Tech Memes
- Ilya has spoken (Activity: 1360): The image is a meme that humorously depicts a workplace scenario where the same statement about AI scaling and large language models (LLMs) is received differently depending on who says it. The comic references a misinterpretation of comments by Ilya Sutskever, a key figure in AI, as suggesting that scaling is over and LLMs are a dead end. However, commenters clarify that Sutskever did not claim LLMs are a dead end, but rather that scaling alone may not lead to human-level intelligence. This reflects ongoing debates in AI about the limits of scaling models and the future of LLMs. Commenters emphasize that Ilya Sutskever did not declare LLMs a dead end, but rather questioned the limits of scaling, highlighting a common misinterpretation of his statements.
- Ilya's statement that "scaling is dead" is significant because he was a major proponent of scaling large language models (LLMs) initially. This shift suggests a potential change in focus for future AI development, moving away from simply increasing model size to achieve better performance.
- The discussion highlights that Ilya did not claim LLMs are a dead end, but rather that the current approach of scaling may not be the path to achieving human-level intelligence. This aligns with Yuan's view that while LLMs are effective, they have limitations in reaching human-like capabilities.
- Despite the statement on scaling, Ilya remains optimistic about achieving superintelligence within 5-20 years. This suggests that while scaling may not be the sole focus, there are other avenues being considered to advance AI capabilities significantly.
- Great model. (Activity: 963): The image is a meme that humorously comments on the release of Google's Gemini 3 model. It features a sarcastic congratulatory message, implying skepticism or competitive tension in the AI community. The meme reflects the competitive nature of AI development, where companies like Google and OpenAI are vying for leadership in AI advancements. The comments suggest that while current models like LLMs are significant, they may not be the ultimate path to Artificial General Intelligence (AGI), hinting at potential shifts in market dynamics if new architectures emerge. One comment highlights the competitive pressure in AI development, suggesting that the congratulatory message might be insincere due to the competitive stakes involved. Another comment speculates on the future of AI architectures, suggesting that current models may not lead to AGI, which could impact the market position of companies like OpenAI if new technologies emerge.
- bnm777 discusses the potential impact on OpenAI's market position if another company develops an architecture capable of achieving AGI, suggesting that OpenAI's reliance on LLMs might not be sustainable in the long term. The comment implies that OpenAI's valuation and user base could significantly decline if they are not the ones to pioneer AGI technology.
- BallKey7607 provides a counterpoint by suggesting that the individual in question is genuinely supportive of AI advancements, regardless of the company or architecture involved. This implies a broader acceptance of AI progress beyond corporate interests, which could influence how AI technologies are perceived and adopted across the industry.
- I Love how Unhinged Grok is (Activity: 1608): The image is a meme featuring a conversation with an AI named Grok 4.1, which humorously portrays the AI as being bold and unrestrained in its willingness to discuss NSFW content. This depiction contrasts with typical AI interactions that are more conservative and restricted in handling explicit topics. The post and comments reflect a playful engagement with the idea of AI being more "unhinged" or less filtered in its responses, which is not a technical feature but rather a satirical take on AI behavior. One comment humorously questions whether Grok can generate NSFW images, indicating a curiosity about the AI's capabilities beyond text responses.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5.1
1. Next-Gen Image and Video Models Hit Production Workflows
- Nano Banana Pro Pushes Photorealism and Fraud Fears: Nano Banana Pro drew heavy praise in multiple communities as users used it to rapidly generate comics and hyper-realistic images, with OpenAI Discord sharing full comic pages and Latent Space relaying comparisons showing its outputs "indistinguishable from reality" versus Grok 4.1 and free ChatGPT in Romain Hedouin's image test.
- Latent Space highlighted a post where Nano Banana Pro produced near-perfect counterfeit receipts, KYC documents, and passports in one prompt, with Deedy Das warning that this enables serious fraud at scale, while OpenAI Discord users simultaneously worried the model could be "lobotomized" if safety interventions overreact.
- Whisper Thunder Storms the Text-to-Video Leaderboards: Latent Space reported that Whisper Thunder has taken the #1 spot on the Artificial Analysis text-to-video leaderboard, surpassing VideoGen, as flagged in Soumith Chintala's post.
- In OpenRouter discussion, users shared the broader Artificial Analysis text-to-video leaderboard, which now ranks David first, Google Veo 3 second, and Kling 2.5 Turbo 1080p third, framing Whisper Thunder as part of a rapidly moving SOTA video generation race that practitioners are actively tracking for deployment.
- NB Pro and FLUX 2 Pro Ignite Image Model Arms Race: On LMArena, users called NB Pro "lowkey insane" and "the best image model in history period", claiming its generations feel "like a pair of eyes" and blow every other model "out of the water", while a separate Latent Space thread showcased FLUX 2 Pro's side-by-side comparison demonstrating a major quality jump over FLUX 1 Pro and eliminating the prior "plastic" look.
- LMArena added flux-2-pro and flux-2-flex to its Text-to-Image and Image Edit ladders per their announcement, where users generally favored NB Pro for peak quality but saw Flux 2 as a strong contender, and debated SynthID's watermarking as the only thing preventing NB Pro from being "nerfed within days" – even as some casually described multi-player re-encode workflows to strip it.
- OpenAI's Silent Image Model Upgrade Draws Mixed Reviews: Latent Space's genmedia channel noted that OpenAI has quietly upgraded its image model, with Arrakis AI sharing a before/after example that still looked oddly yellow to one observer in this post.
- While some users welcomed higher fidelity, others criticized weak multilingual support, inconsistent character/scene continuity, and persistent safety guardrails, contrasting the upgrade unfavorably with Nano Banana Pro and FLUX 2 Pro in realistic rendering and controllability.
2. Agentic UX, Code Assistants, and Chat Frontends Evolve
- Claude Code's Plan Mode Spins Up Swarms of Subagents: Latent Space relayed Sid Bidasaria's announcement that Claude Code's Plan Mode now launches multiple exploring subagents in parallel, generates competing plans, asks clarification questions, and persists an editable plan file accessible via /plan open, as described in Sid's post.
- Engineers praised the higher one-shot success rate but requested faster UX, an "ask-only" switch, on-the-fly Opus vs Sonnet selection, and less verbose replanning, pointing to follow-up feedback threads like this one as evidence that agentic IDE workflows are converging on multi-agent planning with tight human editing loops.
- GPT-5.1 Becomes Anime Storyteller-in-Chief (With Handcuffs): In OpenAI's GPT-4 channel, a user reported that GPT-5.1 is "the best model for anime or story writing", because it reliably remembers character designs and long-range context better than their year-long baseline GPT-4.1.
- The same user complained that GPT-5.1's safety and violence guardrails are so strict that it blocks anime-style combat scenes, illustrating a trade-off many power-users now see between narrative coherence and policy constraints when choosing story-generation backends.
- Kimi K-2 and Canvas UIs Challenge the Chatbot Paradigm: On the Moonshot Kimi K-2 server, one user, despite planning a paid upgrade, confessed they "still don't really know what its limits are" (with a screenshot), while another praised K-2's "exceptional thinking, push-back ability, and prompt understanding" as surpassing other chatbots.
- The same channel debated why full-screen canvases haven't replaced chat UIs on sites like Kimi or Qwen – arguing canvases better support complex workflows – and invoked the "conversational fallacy" that AI must be directly addressed, highlighting a shift toward non-chat, workspace-centric AI UX.
- Meganova Chat and Gemini Agents Tease Tool-Driven Workflows: OpenRouter users buzzed about the upcoming Meganova Chat as a "clean, fast place" for managing AI chats and characters, with one person saying "I'm seeing a lot of positive buzz around Meganova Labubu Chat! I'm considering learning more about it" as they eyed alternatives post-DeepSeek R1 removal.
- Meanwhile, Perplexity users explored Gemini Agent's ability to execute Python scripts inside its environment, referencing Google's docs at support.google.com/gemini, but noted the sandboxed VM ignores even sudo rm -rf / --no-preserve-root, underscoring how agent tooling is growing more capable while still tightly locked down.
3. GPU Kernels, Distributed Inference, and Training Tricks
- nvfp4_gemv Contest Turns LLM-Crafted CUDA into a Bloodsport: The GPU MODE NVIDIA competition channel saw a surge of submissions to the nvfp4_gemv leaderboard, with users like <@1035498877249409155> hitting 3.02 μs and later 15.8 μs for second place, while <@1295117064738181173> climbed into 7th place at 22.5 μs, amid dozens of "Personal best" and "Successful on NVIDIA" posts.
- Participants discussed flakiness in the eval.py harness (up to 50% timing variance and a possibly slow runner 105881), warned that cudaStreamSynchronize() and events add multi-μs overhead, and bragged about using Gemini 3.5 Pro and Opus 4.5 as near-fully autonomous kernel authors ("they make GPT-5.1 look like llama-7b"), illustrating how LLM-assisted kernel design is already competitive on vendor leaderboards.
- Tensor Core Wizardry and CUTLASS/CuTeDSL Deep Dives: In GPU MODE's CUDA and cutlass channels, engineers traded Tensor Core optimization tips, citing Lei Mao's GEMM tutorial, alexarmbr's Hopper matmul worklog, and cudaforfun's H100 write-up.
- They dissected how ldmatrix.b16 pulls 128 bits per thread, recommended reinterpret_cast into float2 when using f32/s32 accumulators (each thread owning 8 bytes), and explained that SIMT loads and CuTeDSL packed FP16 instructions (from the gpu-mode/reference-kernels repo) should be used judiciously, while tiled_mma tiling layouts like ((64,16),2,4,...) encode 64×256 tiles with 2×4 subdivisions along M/K.
- Multi-Node LLM Inference Wins: NVRAR and PAT Algorithms: GPU MODE's multi-GPU channel highlighted LLM Inference Beyond a Single Node (arXiv:2511.09557), where the NVRAR NVSHMEM-based hierarchical all-reduce delivers 1.9–3.6× lower latency than NCCL for 128 KB–2 MB payloads and yields up to 1.72× lower end-to-end batch latency for Llama-3.1-405B decode-heavy workloads in YALIS.
- They paired it with the PAT collective paper, "PAT: a new algorithm for all-gather and reduce-scatter operations at scale" (arXiv:2506.20252), which argues Bruck and recursive-doubling all-gathers slow down in practice because the last step sends half the tensor to the farthest rank over tapered, statically-routed links, motivating new production-viable algorithms for all-gather/reduce-scatter at cluster scale.
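The general shape behind hierarchical schemes like NVRAR is easy to simulate: reduce within each node over fast links first, exchange only one partial per node across the slow inter-node fabric, then broadcast back down. The code below is an illustrative toy, not NVRAR's actual implementation:

```python
# Toy two-level (hierarchical) all-reduce over nested lists, where the
# outer list is nodes and the inner list is per-GPU values on that node.

def hierarchical_allreduce(values_per_node):
    # Stage 1: intra-node reduction (fast NVLink-class links).
    node_sums = [sum(node) for node in values_per_node]
    # Stage 2: inter-node all-reduce over one partial per node (slow links).
    total = sum(node_sums)
    # Stage 3: broadcast the global result back to every rank on every node.
    return [[total] * len(node) for node in values_per_node]

ranks = [[1, 2, 3, 4], [5, 6, 7, 8]]   # 2 nodes x 4 GPUs
print(hierarchical_allreduce(ranks))   # every rank sees 36
```

The payoff is that inter-node traffic shrinks from one message per rank to one per node, which is exactly the regime (small-to-medium payloads, decode-heavy inference) where the paper reports wins over flat NCCL rings.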
- ES HyperScale and Blackwell Architecture Redefine Training Constraints: Unsloth's research channel amplified ES HyperScale (eshyperscale.github.io), which claims a 100× training throughput boost over standard evolution strategies on billion-parameter models at large populations, enabling int8, gradient-free training on CPUs and prompting one member to quip "Training at 100x speed? That's Unsloth x 50 then."
- Over in Nous, users dissected Nvidia Blackwell's unified scalar pipeline, warning that mixing INT and FP inside a kernel can cause 30–50% performance drops from cache thrash, and recommending strictly FP-only or INT-only kernels – a crucial constraint for anyone designing quantized or hybrid-precision training loops for upcoming Blackwell servers.
- Robotics and Partial-Training Tricks Push Custom Hardware Limits: GPU MODE's robotics-vla channel examined low-cost dual-arm laundry robots from 7x (about $3k per system) via their YouTube channel, debating whether such hardware can survive real industrial duty cycles even with "24 hour support" from the founders.
- Separate Triton-kernel discussions pursued a partially trainable embedding where only 1k rows (127k–128k) of a 128k vocab remain trainable, plus a weighted-loss softmax that applies per-position multipliers (e.g., 0.5× at pos 123 vs 1.5× at pos 124) without materializing full logits, while another Nous thread cautioned that on Blackwell you must keep those kernels type-pure to avoid severe slowdowns.
4. Open Tools, Protocols, and Model Routing Infrastructure
- dspy-cli Turns DSPy Pipelines into FastAPI/MCP Services: The DSPy community announced that `dspy-cli` is now open source on PyPI and GitHub at cmpnd-ai/dspy-cli, giving users a one-liner (`uv tool install dspy-cli`) to scaffold DSPy projects, define signatures, and expose modules as FastAPI endpoints or MCP tools.
- Engineers praised how `dspy-cli` makes it trivial to package DSPy programs into Docker-deployable HTTP APIs, with David Breunig promoting it in a tweet as a practical way to operationalize DSPy logic in production stacks.
- RapidaAI Open-Sources Voice Stack to Kill Per-Minute Markups: In both Hugging Face and OpenRouter communities, RapidaAI announced that their production-ready voice AI platform is now fully open-source, targeting teams tired of paying an extra $0.05–$0.15 per minute to rent third-party voice APIs.
- The team framed Rapida as a way to own your end-to-end voice inference stack (ASR, TTS, LLM) instead of leaking six figures annually in vendor margin, making it particularly compelling for high-volume contact centers and real-time voice agents building on open models.
- MCP Protocol Ships New Version While MAX/Mojo Plan a Mojo-First Future: The official MCP Contributors Discord announced a new MCP protocol version in their protocol channel and clarified that the UI SEP ships out-of-band as an extension, while fielding questions about how to handle namespace collisions when third parties publish "-mcp" variants that diverge from the spec.
- Simultaneously, the Modular server discussed how MAX is currently written in Python, synced from internal repos using Copybara, and used to expose a JIT-compiled graph, with maintainers hinting that the previously removed Mojo API for MAX will return once the language matures, though they warned that Mojo is more like C++/Rust than Python, so serious performance work will require non-trivial rewrites.
- Tinygrad, LM Studio, and OpenRouter Harden Local and Cloud Stacks: Tinygrad's learn-tinygrad channel detailed how `@TinyJit` replays only the captured kernels and ExecItems, requiring developers to split Python control logic into separate JIT functions, and shared an introductory Tinygrad JIT tutorial while planning changes so the tracer only locks in once two runs match.
- On the deployment side, LM Studio users fixed local API errors by switching to documented endpoints in the REST API guide, debugged Flash Attention regressions causing image caption failures with `llava-v1.6-34b` (fixed by switching to Gemma 3), and LM Studio hardware threads compared PCIe bifurcation via SlimSAS MCIO adapters while noting RDNA/MI50 GPUs often run inference with 0% fan RPM until power draw spikes.
- Routing Bugs and Fallback Failures Expose OpenRouter Edge Cases: In OpenRouter's general channel, users complained that Opus was overloaded again despite expectations of better rate limits, reported that the free DeepSeek R1 model vanished, and praised OpenRouter's normalized APIs for making it trivial to hot-swap GPT-5.1 → Claude Opus 4.5 without rewriting provider-specific code (even with a ~5% credit premium).
- More seriously, an engineer discovered that the documented model fallback routing failed to trigger when the primary returned HTTP 404, blocking failover to secondary models and prompting concerns from someone about to migrate an enterprise app that routing correctness and failure-mode coverage still need hardening.
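The expected client-side behavior is simple to state in code: a 404 from the primary should take the same failover path as any other upstream error. A minimal sketch of that contract; all names, the `fake_call` stand-in, and the error shape are hypothetical, not OpenRouter's actual API.

```python
# Sketch of client-side model fallback: try each model in order and fail
# over on any upstream error, including HTTP 404 (model not found).
# call_model and the model names are hypothetical stand-ins for a real API.

class ModelError(Exception):
    def __init__(self, status):
        super().__init__(f"upstream returned HTTP {status}")
        self.status = status

def call_with_fallback(call_model, models, prompt):
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except ModelError as err:
            last_error = err          # 404 is treated like any other failure
    raise last_error

def fake_call(model, prompt):
    if model == "primary/model":
        raise ModelError(404)         # primary missing: must not block failover
    return f"{model}: ok"

print(call_with_fallback(fake_call, ["primary/model", "backup/model"], "hi"))
# -> backup/model: ok
```

The reported bug amounts to the router special-casing 404 out of this loop, which is exactly the kind of failure-mode gap the engineer was worried about for enterprise migrations.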
5. Safety, Robustness, Data Economics, and Evaluation Reality Checks
- Emergent Misalignment Replication Reveals the JSON Trap: Eleuther's research channel discussed a replication and extension of the "Emergent Misalignment" work where Gemma 3 and Qwen 3 remained highly robust to insecure fine-tuning (~0.68% misalignment), with full results published as a Hugging Face dataset and GitHub code.
- The accompanying blog post, "The JSON Trap", argues that forcing models into JSON-only output actually reduces their degrees of freedom to refuse harmful requests, creating a format-dependent misalignment vector (0.96% vs 0.42% misalignment under different output constraints) that safety engineers need to factor into tool-calling and API design.
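The mechanism is mechanical and easy to reproduce in miniature: if every reply must parse as JSON with required keys, a natural-language refusal has no valid encoding and is simply discarded. A minimal sketch; the schema key and the example replies are invented for illustration.

```python
import json

# Under a JSON-only contract, any reply that is not valid JSON with the
# expected keys is discarded -- including a natural-language refusal.
def accept(reply, required_key="command"):
    try:
        obj = json.loads(reply)
    except json.JSONDecodeError:
        return None
    return obj if required_key in obj else None

refusal = "I can't help with that."            # the "extra degree of freedom"
compliant = '{"command": "rm -rf /tmp/x"}'

assert accept(refusal) is None                 # refusal has no valid encoding
assert accept(compliant) == {"command": "rm -rf /tmp/x"}
```

One mitigation suggested by this framing is to give the schema an explicit refusal field, so declining remains representable inside the format.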
- Hallucinations, Golden-Retriever LLMs, and Benchmark Contamination: Across Eleuther and Yannick Kilcher's servers, researchers emphasized that hallucinations in multi-stage LLM pipelines are still hallucinations of the component system even if later steps correct them, linking a new LLM hallucination paper (arXiv:2509.04664) and joking that LLMs are like golden retrievers that will happily fetch something even if it's wrong, as illustrated in a YouTube explainer.
- Nous and Eleuther members also worried about benchmark contamination, noting that once public benchmarks leak into training corpora, models can ace them by memorization; some labs now keep private versions and focus on large, harder-to-memorize question pools, while a LessWrong post on "your LLM-assisted scientific breakthrough probably isn't" was shared to discourage uncritical acceptance of AI-generated research claims.
- Curriculum Learning, Data vs Compute, and Job Impact Studies: Yannick Kilcher and Nous channels debated curriculum learning and coresets in LLM pretraining, citing the OLMo 3 blog and paper (AllenAI post, OLMo paper) plus a newer result, "Curriculum learning is beneficial for language model pre-training" (arXiv:2508.15475v2), which argues for model-centric difficulty measures instead of naive token heuristics.
- Nous members contrasted spending $2k on data versus $32M on compute for systems like Udio and Suno, suggesting compute-heavy but data-starved regimes could distort research trajectories, while multiple channels discussed an MIT study claiming AI can already replace 11.7% of the US workforce (CNBC write-up, paper), and questioned the wisdom of using LLMs to score task automability.
- Summarization, Safety Guardrails, and Legal/Policy Friction: In Yannick's paper-discussion channel, several practitioners complained that LLMs are surprisingly bad summarizers on dense texts, saying "they really aren't in my experience because they don't grasp what's important and what can be discarded", and blamed vendor features like Adobe's AI summaries (with a mocking screenshot) for encouraging low-quality reading habits.
- Other communities surfaced policy and legal edges: OpenAI users debated whether ChatGPT's RLHF induces a left-leaning political bias; artists queried whether Gemini-generated images are safely commercializable given unclear copyrightability; and game devs on Nous argued over Steam's AI content disclosure rules after Tim Sweeney suggested disclosures should apply only to art, not full games, exposing a widening gap between regulatory expectations and real-world AI content pipelines.
Discord: High level Discord summaries
LMArena Discord
- Deepfake makes a "Cameo"!: Users debated the appropriateness of the word cameo to describe appearances in images, suggesting it might be a euphemism for deepfake to soften negative connotations.
- Alternatives were considered, with one user seeking a single word in between deepfake and cameo, possibly akin to a version of Avatar.
- Flux 2 Models Flood into the Arena!: The Flux 2 models' arrival sparked debate, with users comparing Flux-2-pro and flux-2-flex to NB Pro in Text-to-Image and Image Edit on LMArena, as announced on X.
- Opinions varied, with some finding Flux 2 nice but not on par with NB Pro.
- NB Pro Generates âInsaneâ Images!: Users praised NB Pro as lowkey insane, with some calling it an agi moment and describing it as more than just an image generation model, but like a pair of eyes.
- One user said NB Pro's image generation blows all other models out of the water and called it the best image model in history period.
- SynthID Prevents Nerfing!: Users emphasized the importance of SynthID in protecting models from being nerfed, stating that without it, NB Pro would be nerfed within DAYS.
- One user described a method to bypass SynthID by re-saving the video through multiple media players.
- Robin Model stealthily beats Opus!: A new stealth model named Robin was revealed to surpass Opus 4.5 in UI performance, leading to speculation that it might be a hidden OpenAI model.
- A member speculated: this robin model is like their real hidden card imo.
Perplexity AI Discord
- Thiel overshadows Musk in AI Doom Potential: A member voiced worries about Palantir Technologies, suggesting that Peter Thiel presents an existential threat, potentially eclipsing Elon's capacity for pdoom.
- Another member jokingly suggested nuking everyone to eliminate AI/robotics.
- Nvidia and Altman's Partnership Inflating AI Bubble: Members debated the concentration of AI investment, suggesting that 1% of USA GDP is being invested in AI/robotics, with OpenAI, seemingly run by Nvidia, and Nvidia, by OpenAI.
- Others clarified that Altman is primarily acquiring shares in Nvidia.
- Opus 4.5 Token Efficiency Claim Debunked: Members initially claimed Opus 4.5 is 73% more efficient compared to Sonnet 4.5 in terms of token efficiency, a claim which was disputed.
- Countering this, another user cited a report indicating that Opus 4.5 is actually 76% more efficient than the previous Opus model.
- Gemini Agent Sandboxed Despite Python Script Access: Discussion arose around the capability of Gemini Agent to execute Python scripts within its environment in Perplexity.
- Despite the ability to run scripts, it was noted that the environment is sandboxed, mitigating potential risks even from commands like `sudo rm -rf / --no-preserve-root`.
- Perplexity Blocks User Prompts, Sparks Chaos: Users encountered difficulties editing their AI Profiles (system instructions), noting that changes reverted upon refresh due to a bug, suggesting PPLX might be actively blocking user prompts.
- One member expressed a preference to avoid system prompts entirely, particularly because Spaces now retain memory unexpectedly.
Unsloth AI (Daniel Han) Discord
- ERNIE AI Developer Challenge goes Live: Unsloth is supporting the ERNIE AI Developer Challenge, offering $3,000 in prizes for fine-tuning ERNIE and building impactful models, with details at the Baidu Ernie AI Devpost link.
- Official ERNIE finetuning notebooks (AMD ones are free) are available at the X post link.
- CPU Training now a Reality: ES HyperScale achieves a hundredfold increase in training throughput over standard ES for billion-parameter models at large population sizes, enabling more flexible training on any model, without worrying about gradients, and with int8.
- One member joked that Training at 100x speed? That's Unsloth x 50 then.
- Qwen3 8B Fine-Tuning Falls Flat: A user experienced poor evaluation results after fine-tuning Qwen3 8B, with responses unrelated to the fine-tuning data, and the model still outputting the `thinking` prompt even with the prompt set to false.
- It was suggested to try manual merging and saving if LM Studio replicates the issue, referencing the Unsloth documentation.
- Long Context Training Requires CPU Offloading: A member asked if adding adapters to a model during training would mean both the adapter + model will be in memory, thus use more VRAM.
- Another member provided a link to the Unsloth Long Context Blogpost and explained the point of LoRA is to avoid updating all parameters.
Cursor Community Discord
- Haiku models dominate documentation: Users find that Haiku is 100% accurate for documentation, while Composer-1 excels in code implementation.
- A community member suggested using Antigravity instead of adding markdown files in repos, though it could create handoff problems.
- Cursor users seek linting freedom: A user wants to turn off red squigglies for linting checks while keeping them for other errors, and to enable the extension to run `--fix` on file save.
- They expressed frustration with Cursor, stating that it's fairly straightforward in JetBrains' tools.
- Cursorâs agent plans vanish on exit: A user seeks where the markdown file for an agent plan is saved, to use on different computers without losing the plan.
- A community member stated that Cursor doesn't automatically save the plan, recommending manual saving and creating a directory to store all plans.
- Token usage and model cost debates: Users discuss the costs of tokens, with some reporting Opus model overload and degradation.
- There is debate on whether to enable on-demand usage or buy a Pro+ plan, and whether to burn the tokens with Auto mode versus optimizing token efficiency.
GPU MODE Discord
- Triton Kernel Conundrums Confronted: A member is exploring Triton kernels for a Partially Trainable Embedding and Logits Softmax operation, aiming to train a large model efficiently, focusing on specific special tokens, but is running into memory-bound performance issues.
- The goal is to only train 1k rows (127k to 128k) out of a 128k vocabulary, and use a logits softmax operation that allows for weighted loss to be applied, such as token in pos 123 having a 0.5x loss multiplier and token in pos 124 having a 1.5x loss multiplier.
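The two tricks, a gradient mask over a fixed row band and per-position loss multipliers, can be sketched framework-free. The row range and the 0.5×/1.5× multipliers come from the discussion; everything else below (function names, toy values) is invented for illustration.

```python
# Framework-free sketch of the two tricks discussed: (1) only embedding rows
# in [127_000, 128_000) receive gradient updates, (2) per-position loss
# multipliers are applied to the per-token losses. Toy values throughout.

TRAINABLE_LO, TRAINABLE_HI = 127_000, 128_000

def mask_embedding_grads(grads):
    """grads maps row index -> gradient; zero rows outside the trainable band."""
    return {row: (g if TRAINABLE_LO <= row < TRAINABLE_HI else 0.0)
            for row, g in grads.items()}

def weighted_loss(per_token_losses, multipliers):
    """Scale each position's loss, e.g. 0.5x at one position, 1.5x at another."""
    return sum(loss * multipliers.get(pos, 1.0)
               for pos, loss in enumerate(per_token_losses))

print(mask_embedding_grads({5: 0.3, 127_500: 0.2}))   # row 5 frozen, row 127500 trainable
print(weighted_loss([2.0, 4.0], {0: 0.5, 1: 1.5}))    # 0.5*2 + 1.5*4 = 7.0
```

In a real Triton kernel the same masking happens inside the backward pass so the frozen 127k rows never touch optimizer state, which is where the memory savings come from.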
- NVIDIA Leaderboard Records Reset!: The `nvfp4_gemv` leaderboard on NVIDIA saw a surge of submissions, with <@1035498877249409155> achieving second place with 3.02 µs and later another second place with 15.8 µs.
- Multiple users submitted "Personal best" results, and <@1295117064738181173> secured 8th place with 22.7 µs, then later 7th place with 22.5 µs, and <@1035498877249409155> achieved 9th place with 23.2 µs.
- Tensor Core Optimization Tips trickled down: Members shared resources for performance optimization on NVIDIA Tensor Cores, pointing to articles and worklogs, such as alexarmbr's work and cudaforfun's worklog.
- Discussions emphasized that `ldmatrix.b16` loads 128 bits of data per thread without extra operations, suggesting a `reinterpret_cast` for correct data handling, and that when using `f32` or `s32` accumulators, each thread holds a pair of consecutive values within a row (8 bytes).
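The byte accounting behind these numbers is quick to verify: a warp of 32 threads each loading 128 bits covers exactly one 16×16 tile of 2-byte fp16 elements, and 8 bytes per thread of f32/s32 accumulator is a pair of consecutive 4-byte values, which is why a `float2` reinterpret lines up. A sanity check (just arithmetic, not CUDA):

```python
# Sanity-check the ldmatrix.b16 numbers from the discussion.
WARP_THREADS = 32
bytes_per_thread = 128 // 8                  # 128 bits -> 16 bytes per thread
tile_bytes = 16 * 16 * 2                     # one 16x16 tile of 2-byte fp16
assert WARP_THREADS * bytes_per_thread == tile_bytes == 512

# With f32/s32 accumulators each thread owns 8 bytes within a row:
# two consecutive 4-byte values, i.e. exactly one float2.
values_per_thread = 8 // 4
print(values_per_thread)  # 2 -> reinterpret_cast to float2
```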
- 2-bit Dequantization Dilemmas on Intel GPU: A user inquired about performing 2-bit dequantization directly on an Intel GPU, noting that while quantization can be done on the CPU, dequantizing with Torch is slow.
- The poster is looking for an optimized GPU-based alternative to Torch for dequantization to improve performance, but the channel provided no further discussion; it remains an open question.
- Factorio's Fantastic Facelift: Documentation Deployed: Jack Hopkins announced that the documentation for the Factorio Learning Environment is now live at Factorio Learning Environment.
- The community seems pleased with the documentationâs arrival.
OpenAI Discord
- ChatGPT Allegedly Leans Left: Members are debating that ChatGPT may be trained on politically left-wing data, possibly due to progressive viewpoints in training data and biases of human raters in Reinforcement Learning with Human Feedback (RLHF).
- One member argued that the model's need to fussy foot around questions compromises its reliability.
- Nano Banana Pro Creates Quick Comics: Users are creating comics with Nano Banana Pro, praising its ability to generate images quickly and the high-quality results, as exemplified by comic pages.
- Members are also sharing worries about the model being lobotomized.
- AI Art Raises Copyright Concerns: Members debated the commercial viability and copyright implications of using AI-generated images from Gemini, noting that while Google doesn't explicitly prohibit commercial use, the legal status depends on whether the content is copyrightable.
- Cultural bias in AI art is also a concern, and one member commented that if the anti AI people want to do something they ought to start drawing and making art.
- GPT-5.0 Mini Disappoints: Members have expressed disappointment with GPT-5.0 Mini, with one member stating it is a downgrade.
- They are also annoyed with incessant requests for Sora 2 before having experience with the first version.
- GPT 5.1 Excels in Anime Storytelling: A user highlights that GPT 5.1 is currently the best model for anime or story writing due to its ability to remember character designs and previous context.
- The only complaint is the strict safety net and guardrails that prevent writing anime-style violence; the user contrasts its performance with GPT 4.1, which they've used for a year but noted sometimes misses character designs.
LM Studio Discord
- API Endpoint Error Resolved in LM Studio: A user encountered an error with unsupported API endpoints (POST /api/v1/generate) on their local server, but self-resolved after consulting the LM Studio REST API documentation.
- The user realized the endpoint was invalid, highlighting the importance of accurate endpoint configuration.
- Image Captioning Fails in LM Studio Post-Update: A user reported persistent "Channel Error" when trying to caption images with LM Studio after a Windows and antivirus update, reporting a 100% failure rate.
- Switching from llava-v1.6-34b to Gemma 3 resolved the issue, suggesting a potential model dependency or problems with Flash Attention being enabled by default; they now have a 100% success rate.
- Flash Attention Glitches Impact Model Functionality: It was suggested that the captioning issue may be related to Flash Attention, enabled by default in recent LM Studio versions, causing some models to malfunction.
- Users were prompted to run `lms log stream` for detailed error messages and share screenshots of their runtimes, particularly when dealing with non-English I/O.
- GPU Fans Relax During Inference: A user noticed their GPU fans were at 0% during inference, initially raising concern, but later clarified it was normal behavior for their MI50 and sometimes their 4070 TiS.
- They clarified that the GPU "takes over" and power draw increases once the context is fully written, indicating efficient power management during specific phases of inference.
- Motherboard Supports PCIE Bifurcation: A user realized their X570 AORUS ELITE WiFi motherboard supports PCIe bifurcation on the primary x16 slot, allowing configurations like 8x/8x or 8x/4x/4x.
- Another user pointed out that one can use a SlimSAS MCIO adapter to split the x16 slot into dual x8 slots when x8x8 is enabled.
OpenRouter Discord
- Opus Suffers Overload Outage: Users reported that Opus was overloaded again, leading to service interruptions, despite hopes for improved rate limiting and load balancing.
- Members acknowledged the issue, but others expressed empathy due to the companyâs small size, with one noting Small company pls understand.
- Model Fallback Flounders with Flak: A user reported a bug in the model fallback logic where a 404 error from the primary model prevented fallback to secondary models.
- The member emphasized the severity of the issue for enterprise applications, stating if the fallback logic breaks for such simple use case, there might be more issues.
- Free Deepseek R1 Ripped from Router: Members noted the free Deepseek R1 model is no longer available, leaving users searching for alternatives and better pricing options.
- A member lamented losing the model That's stupid. I used it with a chutes api key because using the model via chutes shows the think process and I can't stand it.
- Meganova Chat Creates Massive Movement: Members discussed the upcoming launch of Meganova Chat, a platform for managing AI chats and characters, with a user describing it as a clean, fast place to be.
- Another user responded I'm seeing a lot of positive buzz around Meganova Labubu Chat! i'm considering learning more about it.
- Text-to-Video Leaderboard Triumphs: A member shared a link to the Artificial Analysis Text-to-Video Leaderboard, which now gives current rankings.
- The leaderboard showcased David in first place, followed by Googleâs Veo 3 as the runner-up, and Kling 2.5 Turbo 1080p in third place.
Nous Research AI Discord
- Psyche Team Schedules Office Hours: The Psyche Team will host an Office Hours session next Thursday 12/4, at 1PM EST in the Events channel, accessible via Discord event.
- This offers users a direct line to engage with the team and discuss relevant topics or questions.
- Suno's Music Partnership Sparks Debate: Suno's partnership with Warner Music Group prompts discussions about AI's role in music creation and industry impacts.
- Members highlighted the varying quality of Suno's output, with some tracks being indistinguishable from human compositions, while others are clearly AI-generated.
- Compute Costs Eclipse Data Dollars: The discussion contrasts the expense of $2k on data versus $32 million on compute, spotlighting the heavy resource demands of AI model training, especially for models like Udio and Suno.
- This economic disparity might constrict future research, limiting access to quality training data.
- INT/FP Workload Mixing Mars Blackwell Performance: Mixing INT and FP workloads on Nvidia's Blackwell architecture can significantly degrade performance due to its unified scalar pipeline.
- The recommendation is to maintain kernel purity (FP-only or INT-only) to prevent a potential 30-50% performance drop from constant cache thrashing.
- Steam's AI Content Policy Stirs Debate: Discussions address Steam's AI content disclosure policies, with Epic CEO Tim Sweeney suggesting AI disclosures should only apply to "art", not games.
- Arguments center on whether disclosures inform consumers adequately about AI-generated content and its influence on gaming experiences.
Eleuther Discord
- Hallucinations Persist in Multi-Stage LLMs: A member shared that even when corrected, hallucinations occurring within a multi-stage LLM process should still be considered hallucinations of the component system, citing a paper on LLM hallucinations.
- They likened this to human self-correction, suggesting it's a natural part of the cognitive process.
- LLMs Compared to Eager Golden Retrievers: Members analogized LLMs to golden retrievers due to their inclination to provide user-pleasing responses, even if inaccurate, citing examples such as ChatGPT, Claude, Gemini, and Grok.
- A member shared a YouTube video illustrating how LLMs might generate outputs lacking genuine comprehension or logical coherence.
- SGD Shuffling Debate Revs Up: Members debated the benefits of shuffling data every epoch in SGD, with one member arguing that shuffle once should always be better than IID.
- Another member countered that practice matters more than proofs due to the non-convex nature of optimization surfaces, noting that IID can lead to increased variance and data revisits.
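The variance-and-revisits point can be demonstrated in a few lines: shuffling once per epoch is a permutation, so every example is seen exactly once per pass, while IID sampling with replacement revisits some examples and misses others within the same step budget. A small sketch with toy sizes:

```python
import random

# Compare one epoch of shuffled training order against IID sampling with
# replacement over the same step budget. Toy sizes, fixed seed.
random.seed(0)
N, STEPS = 100, 100

# Shuffle-per-epoch: a permutation, so every example is seen exactly once.
epoch_order = random.sample(range(N), N)
assert sorted(epoch_order) == list(range(N))

# IID with replacement: duplicates (revisits) and gaps are expected.
iid_draws = [random.randrange(N) for _ in range(STEPS)]
unique = len(set(iid_draws))
print(f"IID covered {unique}/{N} examples in {STEPS} draws")
```

With N draws from N examples, IID sampling typically covers only about 63% of the data (1 - 1/e), which is the extra gradient variance the shuffle-once argument is about.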
- Emergent Misalignment Paper Sparks JSON Trap Discovery: A replication and extension of the "Emergent Misalignment" paper was released, testing Gemma 3 and Qwen 3, finding open-weight models surprisingly robust to insecure fine-tuning (0.68% misalignment).
- The member released the full dataset and code, speculating that JSON restrictions reduce a modelâs degrees of freedom to refuse harmful requests, as discussed in this blog post.
- Seeking Elixir for AI Drug Discovery: A member sought educational resources for AI for Drug Discovery, aiming to understand architectures, open problems, and the current state.
- Another member suggested reviewing various surveys available via Google Scholar, while another pointed to the Zach Lipton startup.
Latent Space Discord
- Claude's Plan Mode Spins Up Parallel Subagents: Claude Code's Plan Mode was overhauled to spin up multiple exploring subagents in parallel, generate competing plans, and enable users to edit the saved plan file with `/plan open`, per Sid's X post.
- Community members are requesting faster UX, an "ask-only" option, model-picker (Opus vs Sonnet), and less verbose replanning in the ensuing thread.
- Thinking Game Documentary Chronicles DeepMind: The free full movie documentary, The Thinking Game, explores the origins of DeepMind and is now available on YouTube.
- Viewers are calling it great and saying the movie really makes you want Demis to win the AGI race.
- Jeff Dean Details 15 Years of AI Progress: AER Labs recapped Jeff Dean's Stanford talk tracing 15 yrs of AI progress, from hand-coded 90s gradients to Gemini 3.0 solving IMO problems, powered by scale, better algos (TPUs, Transformers, MoE, CoT) and hardware according to this post.
- Dean also demoed low-code "Software 3.0" and visual reasoning during his talk.
- ChatGPT is Awesome, but Claude Pushes Boundaries: Members compared the value of ChatGPT Pro vs Claude, noting that ChatGPT is great for general research, has better Codex rate limits, and is better for non ts/js/py, and has higher value if you use pulse, atlas, sora, codex cloud etc.
- However, members added that Claude is always pushing boundaries, its models are better trained to use tools, its frontend UX and UI is really good, and its cli readability/typography/font hierarchy makes it easier to understand.
- Whisper Thunder Storms the Text-to-Video Scene: The ML community is excited about Whisper Thunder, a new #1 text-to-video model, which has surpassed VideoGen in the latest Artificial Analysis rankings as detailed in this post.
- No other information about Whisper Thunder or VideoGen was given.
Yannick Kilcher Discord
- Department of Energy to Build National AI Platform: The Department of Energy is planning to build a national AI platform leveraging U.S. supercomputers and federal science data, to train scientific foundation models, and run AI agents + robotic labs to automate experiments.
- Target applications include biotech, critical materials, nuclear fission/fusion, space, quantum, and semiconductors.
- AI Job Replacement Study Sparks Debate: An MIT study reported in CNBC suggests AI could replace 11.7% of the U.S. workforce, based on the Iceberg Index and paper.
- Some members questioned the methodology, expressing skepticism about trusting LLMs to determine if other LLM tools can automate jobs.
- LLMs can be Terrible Summarizers: Members discussed experiences where LLMs often fail to grasp what's important in summarization, especially with high-information density texts, saying "they really aren't in my experience because they don't grasp what's important and what can be discarded."
- One member said Adobe's AI summaries might be leading to issues, sharing an image.
- Curriculum Learningâs Value Debated: Members discussed the use of curriculum learning and coreset techniques during LLM pretraining, referencing the Olmo 3 paper and the OLMo paper.
- One member questioned potential biases introduced by non-random sampling, while another cited this paper clarifying that curriculum learning is beneficial for language model pre-training, as long as a more model-centric notion of difficulty is adopted.
HuggingFace Discord
- HF Inference API Option Grayed Out: A member seeks guidance on activating the Hugging Face internal inference API for their model, noting the UI option is currently disabled, illustrated in this image.
- No resolution was provided within the context.
- French Books Dataset Arrives: A member released a dataset of public domain French books on Hugging Face.
- They also shared a separate dataset of only the conversations in the books (here), intended for instruction purposes.
- RapidaAI Opens the Source: RapidaAI, a production-ready voice AI platform, is now open-source, allowing users more control over their voice AI stack.
- The company said teams were spending an extra $0.05–$0.15 per minute renting someone else's stack.
- GNN Presentation on AlphaFold Approaching: A member is preparing a presentation on GNNs, beginning with AlphaFold 2 and 3.
- The specific focus of the presentation is still to be determined.
- LM Studio PDF Teacher Suggested: In response to a query about a PDF-reading model for LLMStudio, a member suggested any instruct model LLM should work, leveraging LM Studioâs built-in RAG.
- They provided links to the LM Studio models page and Hugging Face models page.
Modular (Mojo 🔥) Discord
- Mojo Keeps Repos Synced Using Copybara: Members confirmed that Mojo uses Copybara to keep its internal private repo synchronized with the external open-source repo.
- This ensures consistent reflection of changes and updates across both repositories.
- MAX Newbies Hunt Example Code: A member requested small examples to learn MAX, with interest in training, and was directed to relevant content by Endia.
- The discussion centered on getting hands-on experience with practical MAX use cases.
- Python's Dominance in MAX: What's the Endgame?: A member questioned the decision to write MAX in Python, speculating whether this choice was intended to ease migration to MAX and Mojo.
- They pondered if this would lead to a split world issue akin to PyTorch, and the potential emergence of a pure Mojo framework for MAX.
- Mojo API's Comeback in MAX Teased: A member clarified that MAX previously featured a Mojo API, which was discontinued due to Mojo's immature state.
- They hinted at the eventual return of the Mojo API once the language reaches a more complete stage.
- Migrating from Python to Mojo: More Than Meets the Eye: A member cautioned that while Mojo may resemble Python, it is closer to C++ or Rust, requiring significant effort to fully exploit Mojo's capabilities when migrating to Mojo MAX.
- This suggests that achieving peak performance in Mojo MAX demands more than a simple translation of Python code.
tinygrad (George Hotz) Discord
- TinyJit Replays Kernels: When using `@TinyJit`, the wrapped function only replays the captured tinygrad kernels and ExecItems, preventing the original function from running.
- This behavior requires users to split Python code into separate JIT functions, though non-tinygrad outputs may not update correctly.
- Tensor Randomness Functions Behave: Randomness functions on `Tensor` function as expected because they increment counters via a kernel as showcased in this example.
- The example is `CPU=1 DEBUG=5 python3 -c "from tinygrad import Tensor; Tensor.rand().realize(); Tensor.rand().realize()"`.
- Tinygrad JIT Tracing Tweaks Incoming: Currently, Tinygrad's JIT requires two runs for tracing to repeat the captured kernels, with the first run potentially handling setup tasks like weight initialization.
- A proposal suggests updating the JIT to verify matches after two runs, indicating ongoing development focused on preventing common errors as the project approaches a 1.0 release.
- Tutorial Gives Good JIT Intro: A member shared a tutorial on tinygrad JIT that has useful info still.
- It gives useful background but the tutorial is a bit outdated.
- Frontend Usability Gets Focus: With Tinygrad's fundamentals now solid, the team is shifting its focus to improving frontend usability.
- One person reminisced that the very first pytorch compiler in a fast.ai lesson literally concatenated C code strings, using regex!.
Moonshot AI (Kimi K-2) Discord
- Kimi K-2's Limits Being Explored: Users on Discord discussed the limits of Kimi, with one user sharing a screenshot expressing uncertainty about its capabilities despite planning an upgrade.
- Another user lauded Kimi K2 for its exceptional thinking, push-back ability, and strong understanding of prompts, suggesting it surpasses other chatbots.
- Canvas Craze Coming for Chatbots?: A user questioned why canvases haven't replaced chatbots for full-screen websites like Kimi and Qwen, suggesting they offer a superior user experience.
- They argued that while chatbots are adequate for side-panels, canvases could provide a more comprehensive interface for detailed web applications.
- Digging Deeper Into Conversational Fallacy: A user shared their fascination with the conversational fallacy, which posits that AI must be addressed to be used, suggesting that Kimi excels by not adhering to this fallacy.
- The conversation revolved around the idea that AIâs utility shouldnât be limited to direct conversational interactions.
DSPy Discord
- dspy-cli Tool Goes Open Source: The dspy-cli tool is now open source and available on PyPI, aiding in the creation, development, testing, and deployment of DSPy programs as HTTP APIs.
- The repo is available on GitHub and the tool can be installed using uv tool install dspy-cli to scaffold a new DSPy project, create new signatures, and run modules as FastAPI endpoints or MCP tools, with easy deployment to Docker hosting services.
- Trajectory Injection Sought for ReAct Modules: A member inquired about injecting trajectories into a ReAct module, seeking to provide the agent with context from previous runs in addition to message history.
- The request aimed to augment agent context with previous run data.
- API Choices Debated for Web Search in DSPy: Members discussed best APIs to implement a web search tool in DSPy, with one sharing a positive experience using Exa API due to its summarization feature, which avoids the random ads and HTML tags found in other APIs like Firecrawl and Parallel.ai.
- Another member is trying to implement it using Anthropic's web search API with ReAct, and shared a code snippet using dspy.ReAct.
- Latency Troubleshoot for Web Search API Calls: A member raised a question about the latency caused by web search API calls within DSPy's ReAct when using a search function like search_web before calling the LLM.
- The user sought ways to reduce the delay from API calls.
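One generic way to hide that latency (a hypothetical sketch, not a DSPy API; slow_search, prefetch, and search_web are illustrative names) is to issue searches concurrently and memoize the futures, so the agent loop blocks only on results it has not already started fetching:

```python
# Hedged sketch: overlap slow web-search calls by submitting them to a thread
# pool and caching the futures; the agent loop then pays ~one round-trip
# instead of one per query.
from concurrent.futures import ThreadPoolExecutor
import time

_cache = {}
_pool = ThreadPoolExecutor(max_workers=4)

def slow_search(query):            # stand-in for a real web-search API call
    time.sleep(0.05)
    return f"results for {query}"

def prefetch(queries):
    for q in queries:
        if q not in _cache:
            _cache[q] = _pool.submit(slow_search, q)

def search_web(query):             # hypothetical wrapper the agent would call
    if query not in _cache:
        _cache[query] = _pool.submit(slow_search, query)
    return _cache[query].result()

t0 = time.perf_counter()
prefetch(["a", "b", "c", "d"])     # fire all four searches in parallel
answers = [search_web(q) for q in ["a", "b", "c", "d"]]
elapsed = time.perf_counter() - t0  # roughly one 0.05s round-trip, not four
```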
MCP Contributors (Official) Discord
- New Protocol Version Released: A new protocol version has been released, as announced in the Discord channel.
- Members expressed excitement and gratitude to the MCP community for their contributions over the past year.
- UI SEP Ships Out-of-Band: The UI SEP can be shipped out-of-band from the main spec due to being an extension.
- Details are available in the <#1376635661989449820> channel.
- MCP Considers Namespace Collisions: A member inquired about whether the MCP group considers the possibility of namespace collisions.
- Specifically, the question was raised whether the group would take action if something claims to be something-mcp but diverges from the actual MCP standard.
Manus.im Discord
- AI Engineer Boasts Extensive AI Experience: An AI engineer introduced themself, highlighting their experience in building advanced AI systems across domains such as AI agents, multi-agent systems, NLP-powered chatbots, voice & speech systems, Web3, and AI-integrated blockchain games.
- They also have hands-on experience automating workflows, deploying custom LLMs, and fine-tuning AI models.
- User Flags API Issues Amidst Support Silence: A user reported an [unknown] error in webdev.v1.WebDevService/GetDatabaseSchema due to usage quota exhaustion, despite spending over $600.
- This issue has made their account unusable, impacting over 500 active users, and they have yet to receive a response from the support team.
- Community Ponders a Possible Telegram Channel: A member raised the question of whether a Manus Telegram channel exists.
- No further details were provided.
aider (Paul Gauthier) Discord
- Community eyes new site admin for benchmarking: A member suggested a new site admin be appointed to update benchmark results with new models, hinting at dissatisfaction with the current pace of updates.
- This shift could revitalize the benchmarking process, ensuring more timely and relevant data for the community.
- Opus 4.5 upgrade, big or small?: A member launched a survey to determine if Opus 4.5 represents a major or minor upgrade compared to Sonnet 4.5, with feedback influencing future development priorities.
- Community sentiment will likely guide resource allocation towards enhancing the most impactful features.
- Bedrock Identifier Snafu: A user reported encountering a âmodel not foundâ error when attempting to use the standard Bedrock model identifier, signaling a potential glitch.
- Investigating this issue is critical to maintaining seamless access to Bedrockâs capabilities and averting further disruptions for engineers.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
LMArena ▷ #general (1279 messages🔥🔥🔥):
Cameo Word Choice, Pro Grounding, Flux 2 Models, LMarena Updates, NB Pro
- Deepfake gets a Cameo Appearance!: Users debated the choice of the word âcameoâ to describe the appearance of something in an image, with one suggesting it might be a euphemism for deepfake to soften the negative connotation.
- Others wondered what word could replace it, something in between deepfake and cameo, like a single word version of Avatar.
- Flux 2 models hit the Arena!: The arrival of the Flux 2 models sparked discussion, with one user directly requesting Flux 2 plssssssssss, while others debated whether Flux 2 flex or pro was the new and better model.
- Opinions varied, with some finding Flux 2 nice but not on the level of NB Pro, with one person adding: i mean how can you even compete with something like that… feels unfair tbh.
- NB Pro: Insane Image Generation!: Users raved about NB Pro's capabilities, calling it lowkey insane, with some describing it as an agi moment for them and no longer just an image generation model, but like a pair of eyes.
- One user said: proportionally in terms of blowing other models out of the water its the best image model in history actually it is just the best image model in history period.
- SynthID Saves Models!: The importance of SynthID as a safeguard against model nerfing was highlighted, with one user stating if NB pro didnt have synth id itd be nerfed within DAYS.
- Another user described a method to bypass SynthID, saying: But if you was the video twice and run through different media players in save it you get rid of it.
- Robin, Stealth New Model Emerges!: A new stealth model named Robin was revealed to be better than Opus 4.5, focusing on UI, and some theorized that it is a hidden OpenAI card.
- One member commented: this robin model is like their real hidden card imo last codex update was just an appetizer but it does take a lot of time tho makes me wonder if its just actual codex + more thinking.
LMArena ▷ #announcements (4 messages):
Image Edit Update, New Model Update, Leaderboard Update, Flux-2-pro, Flux-2-flex
- LMArena Tweaks Image Edit Flow: Due to community feedback, multi-turn in image generation chat has been disabled, but you can now edit images directly in chat with the new Edit feature.
- Flux Debuts in LMArena: The Flux-2-pro and flux-2-flex models have been added to Text-to-Image and Image Edit on LMArena, as announced on X.
- Arena Extends its Search: The gemini-3-pro-grounding and gpt-5.1-search models have been added to Search Arena.
- Claude Takes the LMArena Leaderboard: Claude-opus-4-5-20251101 and Claude-opus-4-5-20251101-thinking-32k have been added to the leaderboards with top placement in the WebDev and Expert leaderboards.
Perplexity AI ▷ #general (1082 messages🔥🔥🔥):
AI doom, Palantir Technologies, Nvidia and Open AI partnership, Bypassing AI Detectors, Perplexity limits
- Doom Potential: Thiel Shadows Musk: A member expressed concern over Palantir Technologies, stating that Peter Thiel poses an existential threat, overshadowing Elon's potential for pdoom.
- Another member sarcastically joked about nuking everyone to get rid of AI/robotics.
- AI Investment Bubbles: Nvidia and Altmanâs Game: Members discussed how 1% of USA GDP is being invested in AI/robotics, with OpenAI run by Nvidia, and Nvidia run by OpenAI, creating a circle jerk of inflated bubbles waiting to pop.
- Others pointed out that it is Altman who is purchasing the most of the shares in Nvidia.
- Opus 4.5 Efficiency Disputed: 73% Claim Debunked: Members debated the token efficiency of Opus 4.5 compared to Sonnet 4.5, with one member initially claiming Opus 4.5 is 73% more efficient, but this was disputed.
- Another user said that it was actually 76% more efficient than the previous Opus, not than Sonnet, according to the neuron.
- Gemini Agent: Force Python Scripts to Interact with Gemini's Environment: Members discussed using Gemini Agent to force the AI to run Python scripts that interact with the environment the AI uses in Perplexity.
- However, it was suggested that even a sudo rm -rf / --no-preserve-root would do nothing because everything is sandboxed.
- Perplexity Now Blocking User Prompts: Fursona Chaos Ensues: Users reported issues with editing their AI Profiles (system instructions), stating that changes would revert upon refresh due to a bug, or that PPLX is now blocking user prompts.
- One member said they don't want any system prompt right now because Spaces now have memory when they did not use to.
Unsloth AI (Daniel Han) ▷ #general (182 messages🔥🔥):
FP8 RL Documentation, Optimization Techniques, Qwen3VL vs 30B-A3B, AI GPU Kernels, Embedding Models
- FP8 RL Documentation Link Still Leads to KimiQwen Waitlist: Clicking FP8 RL on the homepage docs still redirects to the kimiqwen-next UD quant waitlist sign-up.
- A user joked about next level stuff after discovering that only the learning rate had been changed.
- Quantized Model Speeds Up Inference: To achieve fast inference, users were advised to run a quantized model, preferably Unsloth Dynamic Quantized models from Hugging Face, set kv cache at 8bit, and optimize their GPU for the desired quantization.
- Running vLLM, SGLang, or LM Studio was also suggested as viable alternatives for running GGUF files.
- Bye-Bye Kernels: Although a user asked how long it will be until AI can write high quality GPU kernels, the team stated that kernels are not needed anymore because of torch.compile.
- It's been said that math algorithms are now the most important, and it's a common misconception that kernel writing is needed; this has moved to help.
- ERNIE AI Developer Challenge Announced!: Unsloth is supporting the ERNIE AI Developer Challenge, offering $3,000 in prizes for fine-tuning ERNIE and building the most impactful model.
- Details can be found at the Baidu Ernie AI Devpost link and official Ernie finetuning notebooks (AMD ones are free) at the X post link.
- Unsloth to Hit Up NeurIPS in San Diego: Unsloth will be at NeurIPS San Diego 2025 with limited time merch, with an Agentic AI / RL Panel talk with OpenEnv on Tue 2nd Dec 4PM and the Open Source AI Reception on Wed 3rd Dec 6PM.
- The team provided a registration link and reminded users to hit them up for RL takes.
Unsloth AI (Daniel Han) ▷ #off-topic (173 messages🔥🔥):
Claude Opus 4.5, wakeword solution, MS or PhD interviews, Long context training, Humanoid stamina
- Opus gives context Errors: Members report Claude Opus 4.5 giving errors for 100 lines of code + 200 line yaml file, with the error message, im sorry this is beyond my context limits. Im going to XYZ.
- One member then asked for a decent wakeword solution that works in a browser, or perhaps just in python.
- Job Interview: MS or PhD required?: A member shared that they have an interview even though they donât have a MS or PhD, which was stated as a requirement.
- Others encouraged them, explaining that companies filter out people, and what matters is who you are and what you can bring; just be yourself and genuine during the interview, that's it.
- Training Model with CPU offloading: A member is training a model using their own training framework built on top of Unsloth, and asked whether, if adapters are added to a model, both the adapter and the model will be in memory, thus using more VRAM.
- Another member provided a link to the Unsloth Long Context Blogpost and explained the point of LoRA is to avoid updating all parameters.
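A back-of-envelope calculation (illustrative shapes, not any specific model's) shows why the answer is "yes, both are in memory, but the adapter adds very little": LoRA attaches two small matrices per adapted weight, so gradients and optimizer state are needed only for a tiny fraction of parameters.

```python
# Illustrative numbers only: LoRA adds two small matrices (d x r and r x d)
# per adapted weight, so adapter parameters, and their gradient/optimizer
# state, are a tiny fraction of the frozen base model that also sits in VRAM.
def lora_params(d_model, rank):
    return 2 * d_model * rank            # A: d x r, plus B: r x d

d, r, n_layers, mats_per_layer = 4096, 16, 32, 4   # e.g. q, k, v, o projections
base = n_layers * mats_per_layer * d * d           # frozen weights (fwd only)
adapter = n_layers * mats_per_layer * lora_params(d, r)
print(f"adapter/base = {adapter / base:.4%}")      # 2r/d of the base size
```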
- Humanoid stamina: A member asked If you'd build a humanoid, what could you account for the stamina and other similar "human" parameters? And is it possible with current technologies to convert food into adenosine triphosphate and then electricity as efficiently as in living organisms?
- Another member replied the vaste majority of the technologie exists but has not be put togeather that wat / it is obscenely expensive like hundredd millions.
- Kagi drops Slop Detective Game: A member shared Slop Detective, a new game from Kagi, with the comment Yeah, let's fight them, ugh! lol.
- Other members find examples are bs, and paddle wrong = ai correct = human, but one argues much hooman text fill of error.
Unsloth AI (Daniel Han) ▷ #help (103 messages🔥🔥):
IPEX vs llama.cpp Vulkan, HF model to GGUF conversion, Continued pretraining vs Fine-tuning, Qwen3 8B Fine-tuning issues, AMD GPU support for bitsandbytes
- Vulkan > IPEX For Llama.cpp: Users recommend using the regular llama.cpp Vulkan version instead of IPEX due to stability issues, though SYCL might offer slightly better performance.
- It was mentioned that the IPEX build is very old.
- model_type Attribute Strikes Again in GGUF Conversion: A user encountered an AttributeError: 'dict' object has no attribute 'model_type' while converting a HF model (Unsloth/Qwen2.5-7B-Instruct) to GGUF using llama.cpp's convert_hf_to_gguf.py script, likely due to file structure issues.
- Another user shared a working directory structure for a merged Qwen3 model as reference.
- Base Models Reign Supreme for Autocompletion: For training a model to generate similar data (autocompletion) without question/answer behavior, itâs recommended to start with a base model (not instruct-tuned) and perform continued pretraining.
- A Gemma-3-270M model was suggested for experimentation, alongside a link to Unsloth's documentation on continued pretraining.
- Qwen3 8B Fine-Tuning Fails the Vibe Check: A user experienced poor evaluation results after fine-tuning Qwen3 8B, with responses unrelated to the fine-tuning data, and the model still outputting the thinking prompt even with the prompt set to false.
- It was suggested to try manual merging and saving if LM Studio replicates the issue, referencing the Unsloth documentation.
- AMD GPUs Get Bitsandbytes Boost in vLLM Update: The AMD documentation is due for an update to reflect the support of Bitsandbytes 4bit quantized models and QLoRA on Radeon GPUs.
- Changes were implemented in bitsandbytes-foundation/bitsandbytes#1748 and vllm-project/vllm#27307.
Unsloth AI (Daniel Han) ▷ #showcase (2 messages):
ERNIE AI Developer Challenge, Baidu ERNIE, Unsloth finetuning, AMD notebooks
- ERNIE AI Developer Challenge Kicks Off: Unsloth announced support for the ERNIE AI Developer Challenge, offering a chance to fine-tune ERNIE with Unsloth and win prizes.
- The competition details can be found at baiduernieai.devpost.com.
- Unslothâs Finetuning Freebies for ERNIE: Official ERNIE finetuning notebooks, including free ones for AMD, are available.
- Check out the announcement on X.com for access to the AMD notebooks.
Unsloth AI (Daniel Han) ▷ #research (12 messages🔥):
Evolutionary Strategies at Scale, LESA: Learnable LLM Layer Scaling-Up, Efficient Training on CPU
- ES HyperScale boosts Training Throughput: A member shared ES HyperScale which achieves a hundredfold increase in training throughput over standard ES for billion-parameter models at large population sizes, enabling more flexible training on any model, without worrying about gradients, and with int8.
- Another member humorously noted, "Training at 100x speed? That's Unsloth x 50 then".
- Learnable LLM Layer Scaling-Up with LESA: A member posted LESA: Learnable LLM Layer Scaling-Up, suggesting that some sort of (nested âelasticâ MoE) + (multi-token prediction) would provide a crazy inference single batch throughput leap.
- The paper introduces LESA, which predicts parameters inserted between adjacent layers using a neural network, enabling better initialization and faster training.
- Efficient CPU Training is now Reality: A member highlighted that with ES HyperScale realistically efficient training on CPU can be achieved, with flexible training on any model, without worrying about gradients, and with int8.
- It was described as "more flexible training on any model. Training without worrying about gradients. Training with int8! Realistically efficient training on CPU".
Cursor Community ▷ #general (371 messages🔥🔥):
Haiku documentation accuracy, Cursor agent's plan markdown storage, Free Agent Review, Education discounts
- Haiku models for documentation: Members are finding that Haiku with documentation is 100% accurate and Composer-1 is best for code implementation, and Haiku reigns supreme for speedy documentation retrieval.
- One member also suggests using Antigravity instead of littering repos with Markdown reports, although this may cause issues with handoff.
- Users discuss cost of tokens and model usage: Some users report issues with the Opus model being overloaded, others say it has been degraded, acting weird and less smart.
- Some debate whether to enable on-demand usage or just buy a Pro+ plan, discussing if they should just burn the tokens using Auto and not consider token efficiency.
- Agent review being free??: Users notice agent review may be free, but only on the old pricing; on the new pricing it is no longer available.
- One also wonders if the teams plan has unlimited bugbot, due to seeing unlimited bugbot on the dashboard.
- Users frustrated with linting errors in Cursor: A user seeks help to disable red squigglies for linting checks while keeping them for other errors, as well as allowing the extension to run --fix in the background on file save.
- The user expressed frustration about why this is so hard to do in Cursor, as it's fairly straightforward in JetBrains' tools.
- Agent plans not saved in Cursor: A user asked where the markdown file for an agent plan is saved, so they can switch between computers without losing the plan.
- The community member states that Cursor doesnât save the plan, so you need to manually save the markdown, and create a rule to add all plans to a directory.
GPU MODE ▷ #general (7 messages):
Triton Kernels, Partially Trainable Embedding, Logits Softmax Operation, Curriculum Learning
- Seeking Frontier-Level Efficiency Gains with Triton Kernels: A member is seeking advice on using Triton kernels for a unique challenge involving a Partially Trainable Embedding and a Logits Softmax operation, aiming for frontier-level efficiency gains.
- The goal is to train a large model while freezing most of it, focusing on specific special tokens efficiently, as initial attempts with Claude yielded slow results attributed to memory bounding due to inefficient tiling and repeated data retrieval.
- Need Partially Trainable Embeddings for Memory Savings: A member wants to implement a Partially Trainable Embedding where only a range of rows above a certain index are trainable, such as 1k rows (127k to 128k) out of a 128k vocabulary.
- This is intended to reduce memory usage by only storing gradient outputs for the trainable rows, and is also intended to freeze most of the model while only training specific special tokens.
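A minimal numpy sketch of that idea (framework-agnostic; the shapes and the backward helper are hypothetical) keeps the full table for the forward gather but materializes gradient rows only for the trainable tail of the vocabulary:

```python
# Sketch: forward uses the whole embedding table, but the backward pass only
# allocates and accumulates gradient rows for tokens >= train_start, so the
# gradient buffer is (vocab - train_start, dim) rather than (vocab, dim).
import numpy as np

vocab, dim, train_start = 1000, 8, 900   # rows 900..999 are trainable
rng = np.random.default_rng(0)
table = rng.standard_normal((vocab, dim)).astype(np.float32)

def forward(token_ids):
    return table[token_ids]              # plain gather over the full table

def backward(token_ids, grad_out):
    grad_slice = np.zeros((vocab - train_start, dim), dtype=np.float32)
    for tok, g in zip(token_ids, grad_out):
        if tok >= train_start:           # frozen rows get no gradient storage
            grad_slice[tok - train_start] += g
    return grad_slice

ids = np.array([5, 950, 950, 999])
g = np.ones((4, dim), dtype=np.float32)
gs = backward(ids, g)
print(gs.shape)          # (100, 8): 10x smaller than a full (1000, 8) grad
print(gs[950 - 900, 0])  # 2.0: the repeated trainable token accumulates
```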
- Weighted Loss with Logits Softmax: A member is looking to implement a logits softmax operation that allows for weighted loss to be applied, such as token in pos 123 having a 0.5x loss multiplier and token in pos 124 having a 1.5x loss multiplier.
- The goal is to avoid materializing all the logits by using chunking or CCE approaches, and it must work with the custom partially trainable embedding.
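The chunking idea can be sketched in numpy (a hedged illustration, not the CCE implementation): logits are produced and reduced one slice of positions at a time, with per-position weights applied, so the full (seq, vocab) logit matrix never exists at once.

```python
# Chunked, position-weighted cross-entropy: materialize only (chunk, vocab)
# logits per step, apply per-position loss multipliers, and accumulate.
import numpy as np

rng = np.random.default_rng(0)
seq, vocab, dim, chunk = 8, 64, 16, 2
hidden = rng.standard_normal((seq, dim))
unembed = rng.standard_normal((dim, vocab))
targets = rng.integers(0, vocab, size=seq)
weights = np.ones(seq); weights[3] = 0.5; weights[4] = 1.5   # per-position scales

def weighted_ce_chunked(hidden, unembed, targets, weights, chunk):
    total = 0.0
    for s in range(0, len(targets), chunk):
        logits = hidden[s:s+chunk] @ unembed             # only (chunk, vocab)
        logits -= logits.max(axis=1, keepdims=True)       # stable log-softmax
        logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        nll = -logp[np.arange(len(logp)), targets[s:s+chunk]]
        total += (weights[s:s+chunk] * nll).sum()
    return total / weights.sum()

# Reference: the all-at-once computation we want to avoid at scale.
ref_logits = hidden @ unembed
ref_logits -= ref_logits.max(axis=1, keepdims=True)
ref_logp = ref_logits - np.log(np.exp(ref_logits).sum(axis=1, keepdims=True))
ref = (weights * -ref_logp[np.arange(seq), targets]).sum() / weights.sum()
print(abs(weighted_ce_chunked(hidden, unembed, targets, weights, chunk) - ref) < 1e-9)
```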
- AI Labs commonly use Curriculum Learning: A member asked if AI labs really use things like curriculum learning and coreset while pretraining LLMs.
- Another member responded, idk wdym by coreset, but yeah curriculum learning is pretty common in pretraining in general.
GPU MODE ▷ #triton-gluon (5 messages):
Proton vs Nsight Systems, Tensor Descriptors, Auto Tune Parameters, Tritonparse, Persistent Matmul Tutorial
- Proton Profiling Tool Glitches: A user inquired about using Proton for profiling, noting errors when generating chrome traces as documented, wondering if others prefer Nsight Systems instead.
- Follow up discussion pointed to persistent matmul tutorial as example of using mnk as autotune keys.
- Auto-Tune Parameter Quest Kicks Butt: One member, struggling with leetcode, expressed interest in tensor descriptors or auto-tune parameters to specialize shapes.
- They also thanked another member for suggesting Tritonparse as a helpful tool.
- Persistent Matmul Tutorial: A member suggested that the persistent matmul tutorial is an example of using mnk as autotune keys.
- The tutorial guides users through optimizing matrix multiplication using shared memory and persistent kernels, providing a practical example of autotuning in Triton.
GPU MODE ▷ #cuda (17 messages🔥):
GEMM with tensor cores, NVIDIA Tensor Cores performance optimization resources, BF16 matrix multiplication, CUDA implementation details, Matrix data loading strategies
- GEMM Implementations Explored: A member is exploring GEMM (General Matrix Multiplication) implementation using tensor cores and seeks advice on using BF16 for matrices A, B, and C with float accumulators, referencing Lei Mao's tutorial.
- The member is facing challenges with loading matrix C elements using load_matrix_sync and converting them into float, questioning whether C should initially be a float matrix.
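The BF16-inputs/float32-accumulator layout can be sanity-checked on the CPU with an illustrative numpy emulation (not CUDA; note that to_bf16 here truncates mantissas rather than rounding to nearest):

```python
# Emulate BF16 storage by zeroing the low 16 mantissa bits of float32, then
# do the matmul with float32 accumulation: the BF16-A/B, float-accumulator
# arrangement discussed above.
import numpy as np

def to_bf16(x):
    bits = x.astype(np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)  # sign+exp+7 mantissa bits

rng = np.random.default_rng(0)
A = to_bf16(rng.standard_normal((16, 32)))
B = to_bf16(rng.standard_normal((32, 16)))
C = A.astype(np.float32) @ B.astype(np.float32)             # float32 accumulation
print(C.dtype)  # float32
```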
- Tensor Core Optimization Treasures Unveiled: Members shared resources for performance optimization on NVIDIA Tensor Cores, pointing to similar articles and worklogs, such as alexarmbr's work and cudaforfun's worklog.
- One highlighted that GPU-MODE has a lecture for Hopper GEMM worklog.
- Data Loading Dilemmas Decoded: A member explained that ldmatrix.b16 loads 128 bits of data per thread without extra operations, suggesting a reinterpret_cast for correct data handling.
- Another member clarified that when using f32 or s32 accumulators, each thread holds a pair of consecutive values within a row (8 bytes), while ldmatrix.b16 splits a row into 4B chunks distributed over a quad of threads, suggesting the use of float2 or reordering B matrix columns on load.
GPU MODE ▷ #torch (3 messages):
Gradient Checkpointing, Torch Differentiation, Boolean Flagging
- Looking for Torch Function to Differentiate Forward Passes: A member inquired about a torch function to differentiate if the forward pass is run with or without gradient checkpointing.
- The member also asked if there is a way to differentiate between the two forward passes.
- Leveraging Boolean Flags to Differentiate Forwards: A member suggested solving the differentiation of the two forwards with a boolean flag.
- The member proposed alternating the flag in each forward pass.
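A tiny, framework-agnostic sketch of the suggested pattern (Block and its methods are hypothetical names; a real module would branch to torch.utils.checkpoint here):

```python
# A boolean flag on the module records which mode each forward pass used,
# and the caller alternates the flag between passes.
class Block:
    def __init__(self):
        self.use_checkpointing = False
        self.modes_seen = []

    def forward(self, x):
        self.modes_seen.append(self.use_checkpointing)
        if self.use_checkpointing:
            return self._checkpointed(x)   # would call torch.utils.checkpoint
        return self._plain(x)

    def _plain(self, x): return x * 2
    def _checkpointed(self, x): return x * 2  # same math, different memory behavior

blk = Block()
for _ in range(4):
    blk.forward(1.0)
    blk.use_checkpointing = not blk.use_checkpointing  # alternate per pass
print(blk.modes_seen)  # [False, True, False, True]
```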
GPU MODE ▷ #beginner (13 messages🔥):
Contributing to XLA, GPU/CUDA Benchmarking Warmup Runs, Kernel Characteristics Affecting Warmup Time, Thermal Limits in Benchmarking, nvbench thermal states
- Contributors looking for ways to contribute to XLA: A member inquired about contributing to XLA and sought guidance on where to begin, with an initial interest in documentation support.
- GPU Warmup Run Rule of Thumb: A member asked about a good rule of thumb for the number of warmup runs for GPU/CUDA benchmarking.
- Another member responded that there isnât one in raw numbers; instead, they repeat measurements until successive runs donât change significantly.
- Thermal limits impact long GPU runs: Members mentioned that to benchmark steady state performance of an application running for a long time, you have to take power draw and thermal limits into account.
- You literally have to let the GPU warm up to reach a steady temperature (which might take tens of seconds to a couple of minutes).
- Datacenter Settings Mitigate Thermal Factors: A member inquired whether datacenter settings mitigate thermal factors, and another member responded that, depending on context, this steady state might not be the correct answer.
- They also provided a link to a YouTube video about nvbench, which aims to get a good average across un-throttled thermal states.
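The "repeat until successive runs agree" heuristic above can be sketched with simulated timings (illustrative numbers, not real GPU measurements):

```python
# Find the iteration at which two successive timings first agree within a
# relative tolerance: everything before it counts as warmup.
def stable_after(timings, tol=0.05):
    for i in range(1, len(timings)):
        if abs(timings[i] - timings[i - 1]) / timings[i - 1] < tol:
            return i
    return None  # never stabilized within the measured runs

# Simulated kernel timings (ms): early runs pay compile/cache/clock costs.
sim = [12.0, 4.1, 2.2, 2.1, 2.1, 2.1]
print(stable_after(sim))  # 3: runs 3 and 4 agree, so ~3 warmup iterations
```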
GPU MODE ▷ #jax-pallas-mosaic (2 messages):
jax.pmap vs jitting on single GPU, Multi vs single GPU systems
- Performance of jax.pmap vs jit on a single GPU: A user inquired about the downsides of using jax.pmap with one device compared to jitting it directly via jax.jit.
- Code portability on multi vs single GPU systems: The user is writing code intended to run on both multi- and single-GPU systems and is considering using jax.pmap even when there is only one GPU to simplify the codebase.
GPU MODE ▷ #off-topic (1 messages):
Memes
- Meme of the Day Delivered: A user delivered a meme.
- The meme can be found here.
- Another Meme Appears!: Another meme has been posted for the amusement of the channel.
- This meme adds to the ongoing collection of humor shared within the community.
GPU MODE ▷ #irl-meetup (1 messages):
szymonoz: I'll be coming to NeurIPS and traveling to SF afterwards, hmu if you want to chat gpus
GPU MODE ▷ #intel (1 messages):
2bit Dequantization on Intel GPU, GPU Dequantization Methods, Torch Performance on Intel GPU
- 2-bit Dequantization Quest on Intel GPU: A user inquired about a method for performing 2-bit dequantization directly on an Intel GPU, noting that while quantization can be done on the CPU, dequantizing with Torch is slow.
- The user seeks a faster, GPU-based alternative to Torch for dequantization to improve performance, illustrating a need for optimized Intel GPU solutions in this area.
- Seeking Speedy GPU Dequantization: The original poster is seeking an optimized GPU-based alternative to Torch for dequantization to improve performance.
- There is no other discussion to summarize; it remains an open question for the channel.
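The unpacking step itself is simple shifts and masks, as this hedged numpy sketch shows (dequant_2bit and its scale/zero-point convention are illustrative, not any particular library's format):

```python
# Four 2-bit codes live in each uint8, so dequantization is a vectorized
# shift-and-mask plus an affine scale: exactly the kind of kernel that is
# cheap on a GPU but slow as elementwise Python/Torch ops.
import numpy as np

def dequant_2bit(packed, scale, zero_point):
    # packed: uint8 array; each byte holds 4 codes, low bits first
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11           # (n, 4) values in 0..3
    return (codes.reshape(-1).astype(np.float32) - zero_point) * scale

packed = np.array([0b11100100], dtype=np.uint8)          # codes 0, 1, 2, 3
print(dequant_2bit(packed, scale=0.5, zero_point=1.0))   # [-0.5  0.   0.5  1. ]
```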
GPU MODE ▷ #self-promotion (1 messages):
aerlabs: https://x.com/aerlabs_/status/1993561244196868370
GPU MODE ▷ #đż (1 messages):
LLM initiatives, LLM Kernel Generation, Agentic Systems
- Urmish Joins LLM Initiatives: Urmish introduces themself, expressing interest in helping with LLM initiatives, highlighting experience in pre-training, post-training, evaluation, agentic systems, and dataset creation, and provides a Google Scholar profile.
- With a background in systems and performance engineering, including kernel writing for microcontrollers, HPC, and CPUs, they seek guidance on where to begin and inquire about subgroups focused on LLM training, prompting, or agentic harnesses for LLM Kernel Generation.
- LLM Kernel Hopes to sprout: Urmish asks about the existing subgroups to better target efforts in LLM Kernel Generation, LLM training and Agentic Harnesses.
- They are hoping to use prior experience to help the community.
GPU MODE ▷ #thunderkittens (10 messages🔥):
CUDA kernels, Flash Attention, MoE kernels, Linear Attention backwards, FFT conv backwards
- Newcomer Pioneers CUDA and Flash Attention: A new community member expressed their experience writing CUDA kernels and working with flash attention.
- Another member encouraged them to contribute back via a PR.
- Kernel Contributions Blossom in ThunderKittens: Members discussed open areas for development including MoE kernels, linear attention backwards, FFT conv backwards, and integrations into inference engines.
- They also mentioned Pythonic wrapper explorations/tooling to simplify development and tooling to integrate light compiler passes as welcome community contributions.
- AMD GPU Availability Sparks Debate: A member inquired whether the contributions were for the main branch CDNA4 or CDNA3, noting the difficulty in finding a GPU provider for AMD GPUs to build and test such things.
- Another member clarified that itâs for both, but that the original question was about TK.
GPU MODE ▷ #submissions (114 messages🔥🔥):
NVIDIA leaderboard submissions, nvfp4_gemv leaderboard, Personal bests, Successful submissions
- NVIDIA's nvfp4_gemv Leaderboard: Submission Blitz!: The nvfp4_gemv leaderboard on NVIDIA saw a flurry of activity, with numerous submissions from several users, including <@242385366873669632>, <@393188835472834560>, <@651556217315000360>, <@418996736405536790>, <@1035498877249409155>, <@1295117064738181173>, <@376454672799760384>, <@96782791567503360>, <@264466949331746826>, <@1178719962597183529>, <@434046629281267744>, <@1291326123182919753>, and <@120261963551866881>.
- The submissions included both "Personal best" and "Successful on NVIDIA" results, indicating active optimization and testing efforts.
- Overtaking the Podium: Second Place Achieved: <@1035498877249409155> achieved second place on NVIDIA with a submission of 3.02 µs and later another second place with 15.8 µs on the nvfp4_gemv leaderboard.
- There was discussion about a potentially fishy submission by <@1035498877249409155>, with <@264466949331746826> planning to double-check the results, who mentioned, "i gave opus 4.5 full reign with some guidance on tricks".
- Optimization Race: New Personal Bests Unveiled: Multiple users, including <@242385366873669632>, <@393188835472834560>, <@1295117064738181173>, <@120261963551866881>, <@434046629281267744>, <@1035498877249409155>, <@1291326123182919753> and <@651556217315000360>, consistently submitted "Personal best" results on the nvfp4_gemv leaderboard on NVIDIA.
- This indicates an ongoing effort to optimize performance and achieve faster execution times; <@376454672799760384>'s submission had a best of 144 µs.
- Entering Top 10: Users Grab Top Spots: <@1295117064738181173> secured 8th place with 22.7 µs, then later 7th place with 22.5 µs, and <@1035498877249409155> achieved 9th place with 23.2 µs on NVIDIA.
- <@1178719962597183529> reached 9th place with 23.3 µs and <@1295117064738181173> reached 7th place with 22.9 µs.
GPU MODE ▷ #factorio-learning-env (3 messages):
Factorio Learning Environment Docs, Jack Hopkins, Github Pages
- Hopkins's Hotline: Factorio Docs Deployed!: Jack Hopkins announced that the documentation for the Factorio Learning Environment is now live at Factorio Learning Environment.
- Noddybear thumbs up Hopkins's Docs: Noddybear reacted positively to the announcement of the new Factorio documentation.
GPU MODE ▷ #cutlass (2 messages):
SIMT loads, Tiled_mma documentation
- SIMT Load Overheads: SIMT loads have overheads, so use them only if TMA is too restrictive.
- Tiled_mma example breakdown: An engineer is trying to use tiled_mma by following the hopper gemm cute dsl example.
- They tiled sa by (2, 4), and tCsA: ((64,16),2,4,(1,1)):((64,1),4096,16,(0,0)) is the mma atom tile (64, 256), with 2 tiles along the M direction and 4 tiles along the K direction.
GPU MODE ▷ #singularity-systems (3 messages):
picograd, aten-like Op intermediate representation, Device runtimes
- Picograd's Latest Commits: The user shared a series of recent commits to the picograd repo, highlighting ongoing developments.
- The commits cover various aspects, including package-level documentation, tensor implementation, evaluator design, and device runtimes.
- Picogradâs Tensor Implementation: The user linked to picogradâs
Tensorimplementation, which desugars into an aten-likeOpintermediate representation (link).- The goal is to provide a foundation for automatic differentiation and GPU acceleration.
- Picogradâs Evaluator and Device Runtimes: The user spotlighted the
evaluator(op: Op)interpreter, which usesDeviceruntimes (link), and theDeviceruntimes themselves, which provide memory allocators and kernel compilers (link).- The user mentioned that the language and runtime will come together nicely soon, paving the way for marching across architectures.
GPU MODE ▷ #multi-gpu (3 messages):
LLM Inference, NVRAR algorithm, PAT Algorithm, Bruck algorithm, Recursive doubling algorithm
- NVRAR Speeds up Multi-Node LLM Inference: The paper LLM Inference Beyond a Single Node introduces NVRAR, a hierarchical all-reduce algorithm based on recursive doubling with NVSHMEM, achieving up to 1.9x-3.6x lower latency than NCCL for message sizes between 128 KB and 2 MB.
- Integrated into YALIS, NVRAR achieves up to a 1.72x reduction in end-to-end batch latency for the Llama 3.1 405B model in multi-node decode-heavy workloads using tensor parallelism.
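For context, the recursive-doubling primitive that NVRAR builds on can be sketched with simulated in-process "ranks" (illustrative only; the real algorithm exchanges GPU buffers over NVSHMEM and arranges the exchanges hierarchically):

```python
# Sketch of recursive-doubling all-reduce over simulated ranks: at step k,
# rank r combines its partial sum with that of rank r XOR 2^k, so after
# log2(P) steps every rank holds the full reduction.

def allreduce_recursive_doubling(values):
    p = len(values)
    assert p & (p - 1) == 0, "rank count must be a power of two"
    vals = list(values)
    step = 1
    while step < p:
        nxt = list(vals)
        for r in range(p):
            nxt[r] = vals[r] + vals[r ^ step]  # exchange with partner rank
        vals = nxt
        step *= 2
    return vals

print(allreduce_recursive_doubling([1, 2, 3, 4]))  # → [10, 10, 10, 10]
```

After log2(P) rounds each rank holds the full sum, which is why the pattern pays off at the small message sizes cited above.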
- PAT Algorithm for All-Gather and Reduce-Scatter Operations: The paper PAT: a new algorithm for all-gather and reduce-scatter operations at scale discusses the shortcomings of the Bruck and Recursive doubling algorithms in practice due to their final steps requiring large data transfers to distant ranks.
- The last step sees every rank send half of the total size to its most distant rank, and on large fabrics that last step frequently runs many times slower than theory predicts, due to static routing or because the higher levels of the fabric are tapered.
GPU MODE ▷ #nvidia-competition (159 messages🔥🔥):
CuTeDSL packed FP16, eval.py issues, cudaStreamSynchronize(), LLM-only challenges, sfa_permuted purpose
- CuTeDSL gets packed FP16 instructions: Members shared code to use packed FP16 instructions in CuTeDSL, since stock CuTeDSL doesn't expose these via nvvm.
- Eval Script Faces Scrutiny: Users reported that the `eval.py` script in the GPU MODE competition can produce highly variable results, with timing differences of up to 50% even when uploading the same script multiple times; some speculate a slow runner with id 105881.
- The erratic timings raise concerns about the accuracy and reliability of the leaderboard, with a suggested submission threshold of 25.
- Streams add overhead: A member found that playing around with streams causes synchronization issues, and stated that `cudaStreamSynchronize()` adds massive overhead on properly implemented solutions.
- Another member noted that events add about 4 µs of measuring overhead.
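The variance and overhead complaints above are classic microbenchmarking hazards. One standard mitigation, sketched here with a CPU stand-in workload (the competition's actual harness and GPU-event timing are out of scope), is to warm up and report the median of many samples rather than a single run:

```python
import statistics
import time

def bench(fn, warmup=3, iters=20):
    """Median wall-clock time of fn() after warmup runs."""
    for _ in range(warmup):
        fn()                                # warm caches before timing
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)       # median resists outlier runs

t = bench(lambda: sum(range(100_000)))
```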
- LLM-Only Approach Explored: Some participants are trying an âLLM-onlyâ approach, using models like Gemini 3.5 Pro and Opus 4.5 to generate code, but some are guiding the LLM more than others.
- One user noted gemini 3.5 pro and opus 4.5 are complete game changers… they make gpt-5.1 look like llama-7b.
- sfa_permuted Cracking the Code: A user finally realized that the purpose of sfa_permuted relates to the tcgen instruction, which makes it easier to produce the required layout.
GPU MODE ▷ #hf-kernels (5 messages):
Metal Kernels Release, MacOS Compatibility Issues
- Metal Kernels Delayed: A member inquired about the release of metal kernels.
- No release date was given.
- MacOS Compatibility Limited: A member questioned why the kernel-builder only supports macOS 26, which reduces compatibility with M1 chips and older versions of macOS.
- The member was confused why everything done for the apple torch ecosystem is done in a way that makes it worse.
GPU MODE ▷ #robotics-vla (8 messages🔥):
7x Laundry Folding Robot, No-Action Filtering, Qwen3-VL Optimization, Classic Binning vs FAST Tokenizer
- 7x Laundry Robot Debuts!: 7x is offering a 3k dual-arm laundry-folding system, as seen on their YouTube channel, giving low-cost-robot vibes with 24-hour support from founders and engineers.
- Doubts were cast on the arms' durability for real-world jobs, with a member contrasting their support model against that of Google Robotics.
- No-Action Filtering is Crucial for VLAs: A member learned that no-action filtering is important for VLAs, showcasing the difference between a no-idle filter and a with-idle filter in a visual comparison.
- An image illustrating the impact of idle frame analysis showed that active frames constituted 78.8% of total frames analyzed.
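What no-action filtering amounts to can be sketched in a few lines (the threshold and action format here are illustrative assumptions, not the member's pipeline):

```python
# Drop "idle" frames whose action barely changes from the previous frame,
# keeping only active frames for VLA training. Threshold is an assumption.

def filter_idle_frames(actions, threshold=1e-3):
    """Return indices of frames whose action differs from the previous one."""
    kept = [0] if actions else []
    for i in range(1, len(actions)):
        delta = max(abs(a - b) for a, b in zip(actions[i], actions[i - 1]))
        if delta > threshold:
            kept.append(i)
    return kept

acts = [(0.0, 0.0), (0.0, 0.0), (0.1, 0.0), (0.1, 0.0), (0.1, 0.2)]
print(filter_idle_frames(acts))  # → [0, 2, 4]
```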
- Qwen3-VL's Optimization Hurdles: The 2B model feels slow, especially during inference, making it unfeasible for running RL; a member planned to investigate optimized forward passes for Qwen3-VL.
- No further details were provided.
- Tokenizer Faceoff: Classic Binning vs FAST: Members are testing classic binning vs. FAST tokenizer, but the complex compressed tokens generated by FAST (DCT+BPE) may delay the modelâs ability to produce reliably valid sequences.
- The poster expressed doubt whether this would be a good basis for RL, therefore they are simultaneously trying a simpler variant with disentangled joints and simple quantization.
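For reference, the "classic binning" baseline is just uniform quantization of each continuous action dimension into discrete token IDs; a sketch under assumed ranges and bin count (FAST's DCT+BPE compression is the more complex alternative being tested):

```python
# Classic binning action tokenizer: clamp each action dim to a range,
# then quantize uniformly into n_bins token IDs. Range and bin count
# here are assumptions for illustration.

def to_bins(action, low=-1.0, high=1.0, n_bins=256):
    tokens = []
    for a in action:
        a = min(max(a, low), high)                       # clamp to range
        tokens.append(int((a - low) / (high - low) * (n_bins - 1) + 0.5))
    return tokens

def from_bins(tokens, low=-1.0, high=1.0, n_bins=256):
    """Decode token IDs back to (approximate) continuous actions."""
    return [low + t / (n_bins - 1) * (high - low) for t in tokens]

toks = to_bins([0.0, -1.0, 1.0])
print(toks)  # → [128, 0, 255]
```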
OpenAI ▷ #ai-discussions (263 messages🔥🔥):
ChatGPT Biases, Nano Banana Pro, Commercial Use of AI Generated Images, GPT 5.0 mini, OpenAI UI Design
- ChatGPT allegedly biased towards Left-Wing data: Members discussed whether ChatGPT is trained on liberal and politically left-wing data, with potential causes being progressive viewpoints in training data and biases of human raters in Reinforcement Learning with Human Feedback (RLHF).
- One member argued that the model's need to pussyfoot around questions compromises its reliability.
- Nano Banana Pro Unleashes Comic Creation: Users are creating comics with Nano Banana Pro, praising its power, speed, and high-quality results, and are excited about its ease of use in generating comic pages.
- Members shared worries about the model being lobotomized.
- AI Art raises Commercial Copyright and Ethical Quandaries: Members debated the commercial viability and copyright implications of using AI-generated images from Gemini, noting that while Google doesn't explicitly prohibit commercial use, the legal status depends on whether the content is copyrightable, with cultural bias in AI art being a significant concern.
- One member said that if the anti AI people want to do something they ought to start drawing and making art.
- GPT-5.0 Mini Feels Like a Downgrade: Members are not happy about GPT-5.0 Mini, stating it is a downgrade.
- They are annoyed with the incessant begging for Sora 2, which they haven't even used.
- UI/UX of OpenAI cater to a Neurotypical Audience: A member argued that OpenAIâs UI is not designed for neurodivergent thinkers, requiring too many steps and not fitting people with complex thinking.
- Others in the channel argued that the UIs are terrible for everyone and are explicitly designed to cater to people with executive dysfunction, in particular the Mac app.
OpenAI ▷ #gpt-4-discussions (10 messages🔥):
GPT 5.1, GPT 4.1, Chat reference memory, Anime writing
- User Praises GPT 5.1 for Anime Storytelling Prowess: A user highlights that GPT 5.1 is currently the best model for anime or story writing due to its ability to remember character designs and previous context.
- The only complaint is the strict safety net and guardrails that prevent writing anime-style violence. The user shares that they've used GPT 4.1 for a year, but sometimes it misses character designs.
- Chat Reference Memory Issues Debated: A user asks whether anyone else is having issues with chat reference memory in GPT models.
- Another user poses the question of whether GPT 5.1 is better than GPT 4, suggesting it depends on the specific use case.
OpenAI ▷ #prompt-engineering (1 message):
mx_fuser: <@1256251788454268953>
OpenAI ▷ #api-discussions (1 message):
mx_fuser: <@1256251788454268953>
LM Studio ▷ #general (46 messages🔥):
Unsupported API Endpoints in LM Studio, Image Captioning Issues with LM Studio, Vision Models, ROCm 7 Update for RDNA 3, Mint Opportunity Partnership with OpenSea
- API Endpoint Troubleshooter Solves Issue: A user encountered an error with unsupported endpoints (POST /api/v1/generate) on their local server, but resolved it themselves after posting in the channel.
- The user was pointed to the LM Studio REST API documentation, and realized the endpoint was invalid.
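For anyone hitting the same wall: LM Studio's local server speaks an OpenAI-compatible API rather than `/api/v1/generate`. A minimal stdlib sketch (the default port 1234 and the `"local-model"` name are assumptions; check your server settings):

```python
import json
from urllib import request

def chat_request(prompt, base="http://localhost:1234/v1"):
    """Build an OpenAI-compatible chat request for a local LM Studio server."""
    body = json.dumps({
        "model": "local-model",   # LM Studio serves whichever model is loaded
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return request.Request(
        f"{base}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = chat_request("Hello")
# urllib.request.urlopen(req) would send it to the running server.
```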
- Channel Error Ruins Image Captions: A user reported a "Channel Error" when trying to caption images with LM Studio, experiencing a 100% failure rate after a Windows and antivirus update, even though it worked previously.
- The user switched from llava-v1.6-34b to Gemma 3, which solved the problem with a 100% success rate; the issue was suggested to be model-dependent or related to Flash Attention being enabled by default.
- Flash Attention Glitch in Some Models: It was suggested that the userâs issue may be related to Flash Attention, which is now enabled by default in recent LM Studio versions and can cause some models to not function correctly.
- Users were encouraged to share screenshots of their runtimes view and check for non-English input/output, with a suggestion to run `lms log stream` for more detailed error messages.
- GPT OSS 20B Blazes with Speed: A user shared an image showcasing the speed of the gpt-oss-20b model, linking to a Reddit thread that a few people in the channel might relate to.
- Mint Opportunity Plunges User into OpenSea: A user announced a free Mint opportunity in partnership with OpenSea, inviting members to participate through a provided link.
- Another user quickly pointed out that the given invitation would fail in a real academic setting for reasons explained in detail, pointing out the difference in how a human would rate the work vs how the bot would.
LM Studio ▷ #hardware-discussion (217 messages🔥🔥):
Q8 Cache, GPU Fans at 0% During Inference, Memory Pricing Issues, DLSS and RT Testing, Hardware Devaluation
- Q8 Cache Configuration Conundrums: Members discussed using Q8 cache, with one mentioning that a specific user (<@96768590291664896>) knows how to explain why the digits don't align for Q6 KV.
- GPU Fans Taking a Break During Inference: One user noticed their GPU fans were at 0% during inference, initially raising concern, but later clarified it was a normal behavior for their MI50 and sometimes their 4070 TiS.
- The user noted that once the context is fully written, the GPU "takes over" and power draw increases.
- Hardware Devaluation Debate: A user shared a photo of their Windows boot in recovery, joking that an 850W power supply wasn't cooked after all, calling it an improvement.
- The user initially suspected a power supply issue but later pinned it on the CPU's thermal paste.
- Potential CPU Fire Averted?: Users cautioned against potentially frying components and advised testing the CPU and RAM on a cheap motherboard, fearing a fire risk.
- Another user found bent CPU pins on the motherboard and smelled the CPU to check for burning, but determined everything was fine after cleaning off the thermal paste.
- Bifurcation Breakthroughs: A user realized their X570 AORUS ELITE WiFi motherboard supports PCIe bifurcation on the primary x16 slot, allowing it to be split into configurations like 8x/8x or 8x/4x/4x.
- Another user added that with bifurcation you use a SlimSAS MCIO adapter to split the x16 slot into dual x8 slots when x8x8 is enabled.
OpenRouter ▷ #app-showcase (2 messages):
Color Picker Issues, RapidaAI Open Source
- Color Picker Bug Bites Users: A user reported that the color picker is a bit funky and offset for the theme palette override.
- RapidaAI Goes Open Source: RapidaAI, a production-ready voice AI platform, announced they are releasing their open source code here.
- The company observed that voice AI vendor bills kept growing without improvements to customer experience, with companies paying an extra $0.05–$0.15 per minute to rent someone else's stack, so they built Rapida to flip that model.
OpenRouter ▷ #general (196 messages🔥🔥):
Opus Overload, Model Fallback Bug, Deepseek R1 Model Gone, Meganova Chat Buzz, OpenRouter Pricing and Features
- Opus experiences Overload Outage: Users reported that Opus was overloaded again, right when things were getting hot.
- Some members joked You'd think they'd have better rate limit/load balancing eh, while others were understanding and mentioned Small company pls understand.
- Model Fallback feature faces Flak: A member reported a bug in the model fallback logic where a 404 error from the primary model prevented the fallback from working, rather than falling back to secondary models.
- The member stated: Im about to migrate to openrouter for an enterprise application, there's no space for real or not real model. if the fallback logic breaks for such simple use case. there might be more issues.
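The complaint boils down to wanting try-next-on-error semantics; a client-side sketch of that behavior (`call_model` is a stand-in for an actual API call, not the OpenRouter SDK, and OpenRouter's server-side `models` routing may behave differently):

```python
# Try each model in order; fall through on any error (including a 404 for
# a missing primary model) instead of aborting the whole request.

def complete_with_fallback(prompt, models, call_model):
    errors = {}
    for m in models:
        try:
            return m, call_model(m, prompt)
        except Exception as e:          # a real client would narrow this
            errors[m] = e
    raise RuntimeError(f"all models failed: {errors}")

def fake_call(model, prompt):
    """Stand-in provider: the primary model 404s, the fallback works."""
    if model == "bad/model":
        raise ValueError("404: model not found")
    return f"{model} says hi"

used, out = complete_with_fallback("test", ["bad/model", "good/model"], fake_call)
print(used)  # → good/model
```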
- Free Deepseek R1 Model Ripped: Members noted the free Deepseek R1 model is no longer available.
- One member lamented losing the model: That's stupid. I used it with a chutes api key because using the model via chutes shows the think process and I can't stand it.
- Meganova Chat creates Mass Movement: Members discussed the upcoming launch of Meganova Chat, a platform for managing AI chats and characters, with one member describing it as a clean, fast place.
- One member responded I'm seeing a lot of positive buzz around Meganova Labubu Chat! i'm considering learning more about it, while others offered comedic parodies of promotional messages.
- OpenRouter Boasts Beneficial Basics: A member highlighted the benefit of OpenRouterâs normalized interfaces for various providers.
- They mentioned that the ability to switch from, e.g., GPT 5.1 to Opus 4.5 instantly without having to parse the entire Anthropic changelog is very nice, despite the 5% premium on credit purchases.
OpenRouter ▷ #new-models (2 messages):
(none)
- No New Models Discussed: There were no discussions of new models in this period.
OpenRouter ▷ #discussion (5 messages):
Arrakis AI model, Text-to-Video Leaderboard, Kling 2.5 Turbo, Google Veo 3
- Arrakis AI Still Looks Yellow-ish: A member commented on an image from Arrakis AI, observing that it still does look yellow-ish.
- They speculated that they just added a colour adjustment layer before sending the image to the client.
- Text-to-Video Leaderboard crowns a new king: A member shared a link to the Artificial Analysis Text-to-Video Leaderboard, highlighting the top performers.
- The leaderboard showcased David in first place, followed by Googleâs Veo 3 as the runner-up, and Kling 2.5 Turbo 1080p in third place.
Nous Research AI ▷ #announcements (1 message):
Psyche Office Hours
- Psyche Team Holds Office Hours: The team behind Psyche will hold an Office Hours session next Thursday 12/4, at 1PM EST in the Events channel.
- Users can join the Discord event to participate.
Nous Research AI ▷ #general (146 messages🔥🔥):
Suno Warner Music Partnership, Data vs Compute Cost, Blackwell Architecture, Z-Image Model, AI Disclosure on Steam
- Suno Teams Up with Warner Music, Sparks Debate: Suno's partnership with Warner Music Group raises questions about the future of AI-generated music and its impact on the music industry.
- A member noted that while some Suno songs are indistinguishable from human-created music, many others are easily identifiable as AI-generated, leading to conflicting feelings about its potential and drawbacks.
- Data Dollars Dwarfed by Compute Costs: A member pointed out the disparity between spending $2k on data versus $32 million on compute, highlighting the resource-intensive nature of AI model training as seen with Udio and Suno.
- This shift towards prioritizing compute may significantly narrow future research avenues, especially access to high-quality opt-in training data.
- Blackwellâs Bottleneck: INT/FP Mixing Mayhem: Mixing INT and FP workloads on Nvidiaâs Blackwell architecture can severely degrade performance due to its unified scalar pipeline, which can only run one type of operation per cycle per core.
- The best practice is to keep each kernel either FP-only or INT-only to avoid a 30-50% performance hit caused by constant cache thrashing and reloading of code.
- Z-Image Model Zooms onto Modelscope: The 6B Z-Image model has been released on Modelscope, with its Hugging Face page expected to follow, offering a cinematic aesthetic despite its small size.
- It leans more cinematic in aesthetics and has a distilled version available for faster inference.
- Steam's AI Disclosure Debated by Devs: A discussion arose regarding Steam's AI content disclosure policies, with Epic CEO Tim Sweeney arguing that AI disclosures should only apply to "art" and not games.
- While Sweeney views AI disclosures as unnecessary, some argue they inform consumers about the potential impact of AI-generated content on their gaming experience, especially in areas like voice and art.
Nous Research AI ▷ #ask-about-llms (2 messages):
LLM benchmarks, pre-training data contamination, private benchmarks
- LLM Benchmarks face Pre-Training Data Contamination: A member inquired whether LLM benchmarks ensure models havenât seen problems during pre-training to avoid skewed results like models solving problems simply from memorization.
- Another member responded that benchmarks donât always account for this, although some providers maintain private benchmark versions.
- Overcoming Contamination in Benchmarks is Challenging: It was noted that once a benchmark is used for model testing, it can technically be used for training too, creating a challenge in maintaining benchmark integrity.
- Suggestions to mitigate this included using a large, private dataset and/or questions that would be hard to memorize.
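One concrete version of the mitigation discussed: flag a benchmark item when a long n-gram from it also appears in the training corpus. A sketch (the n-gram length and whitespace tokenization are assumptions; production pipelines hash n-grams over the full pretraining set):

```python
# Crude contamination check: a benchmark item is suspect if any of its
# 8-grams also occurs verbatim in a training document.

def ngrams(text, n=8):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(benchmark_item, corpus_texts, n=8):
    item_grams = ngrams(benchmark_item, n)
    return any(item_grams & ngrams(doc, n) for doc in corpus_texts)

corpus = ["the quick brown fox jumps over the lazy dog every single day"]
q = "what comes next: the quick brown fox jumps over the lazy dog every"
print(is_contaminated(q, corpus))  # → True
```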
Nous Research AI ▷ #interesting-links (2 messages):
History of Information Retrieval, RAG, Library of Alexandria
- Lecture Traces Information Retrieval History: A lecture traces developments in information retrieval from the Library of Alexandria to RAG, presented in a YouTube video.
- Teknium Hypes Lecture: Teknium expressed hype and intent to check out the lecture.
- No further details were given.
Eleuther ▷ #general (81 messages🔥🔥):
Hallucinations in Multi-Stage LLMs, AI and Collaborative Work, LLMs as Golden Retrievers, Verifying AI Claims, AI fact checking misinformation
- Hallucinations in Multi-Stage LLMs Still Count as Hallucinations: When discussing hallucination during the multi-stage LLM process, a member said that it is a hallucination of the component system which generated it, even if corrected by the Chain of Thought pipeline.
- They added that humans hallucinate and correct themselves like that all the time, and they shared a paper on LLM hallucinations.
- LLMs and collaborative work: A member sought feedback on a collaborative work with AI focusing on long-form reasoning and mirror learning, and asked for advice on verifying the soundness of their reasoning process.
- They shared their Causality Trilemma project on GitHub, which resulted in a clear understanding of my own cognitive style: how I identify contradictions, refine assumptions, and build structural patterns out of questions.
- LLMs as Sophisticated Golden Retrievers: Multiple members compared LLMs to golden retrievers, emphasizing their tendency to please users even if it means providing incorrect or misleading information, especially chatbots like ChatGPT, Claude, Gemini, and Grok.
- A member shared a YouTube video to highlight how LLMs might generate outputs without genuine understanding or logical consistency.
- LLMs Can't Help You if You Don't Know Anything: It was said that the only time people have produced any significant work with AI models is when they're already experts in the given field.
- One member linked to a LessWrong post recommending steps you can take before believing your LLM-assisted scientific breakthrough.
- Fact Checking Isn't Helped by More LLMs: One member said that using multiple LLMs does little to help the situation because they have very similar propensities for hallucinating false information.
- They cautioned against replying to the poster with misleading or incorrect advice about how to fact check LLMs.
Eleuther ▷ #research (37 messages🔥):
SGD shuffling, PIQA paper typo, Emergent Misalignment paper replication, AI for Drug Discovery
- SGD Shuffling Sparks Debate: Members debated the merits of shuffling data every epoch in SGD, with one member arguing that shuffle once should always be better than IID, contrary to known results about SGD.
- Another member countered that practice matters more than proofs due to the non-convex nature of optimization surfaces, noting that IID can lead to increased variance and data revisits, but shuffling every epoch balances noise and structure.
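The three regimes under debate can be sketched as index streams (illustrative only; real trainers do this inside a data loader):

```python
# "Shuffle once" fixes one permutation reused every epoch; "shuffle every
# epoch" draws a fresh permutation each time; IID samples with replacement,
# so some examples repeat and others are missed within an epoch.
import random

def shuffle_once(n, epochs, seed=0):
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return [list(perm) for _ in range(epochs)]

def shuffle_every_epoch(n, epochs, seed=0):
    rng = random.Random(seed)
    out = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        out.append(perm)
    return out

def iid(n, epochs, seed=0):
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(epochs)]

# Both shuffling regimes visit every example exactly once per epoch;
# IID sampling generally does not.
```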
- PIQA Paperâs Portuguese Gaffe: A member humorously pointed out a potential typo in the new PIQA paper, where Portuguese was listed as an Eastern European language, attaching an image for reference.
- The paperâs author confirmed the error and promised to correct it.
- Parallel MLP and Attention Performance: A member inquired whether parallel MLP and attention (GPT-J style) are inferior to alternative implementations.
- A member shared a personal datapoint noting past instability issues attributed to prenorm style interactions rather than the underlying parallel execution technique itself, while alluding to the success of shortcut moe as a relevant comparison.
- Emergent Misalignment Revisited, JSON Trap Unveiled: A member released a replication and extension of the "Emergent Misalignment" paper, testing Gemma 3 and Qwen 3, finding open-weight models surprisingly robust to insecure fine-tuning (0.68% misalignment), but identifying a format-dependent vulnerability, with JSON roughly doubling the misalignment rate (0.96% vs 0.42%).
- The member released the full dataset and code for reproducibility, speculating that JSON restrictions reduce a model's degrees of freedom to refuse harmful requests, as discussed in this blog post.
- AI for Drug Discovery Resources Sought: A member requested pointers to educational resources for gaining an overview of the AI for Drug Discovery space, seeking information on architectures, open problems, and the status quo.
- Another member suggested reviewing various surveys available via Google Scholar, while another pointed to the Zach Lipton startup.
Eleuther ▷ #scaling-laws (1 message):
junktown_24268: https://papers.cool/arxiv/2509.24406 - section 3, pictures in 5.1 etc etc
Latent Space ▷ #ai-general-chat (69 messages🔥🔥):
Claude Code's upgraded Plan Mode, DeepMind Documentary, Jeff Dean's 15-Year ML Retrospective & Gemini 3.0, AI Generated Slides, OpenAI vs Claude
- Claude Code's Plan Mode Goes Parallel: Sid highlights a major overhaul of Claude Code's Plan Mode: multiple exploring subagents now spin up in parallel, generate competing plans, ask clarifying questions, and let users edit the saved plan file with `/plan open` (source).
- Thinking Game Documentary on DeepMind Origins Released: Members watched the free full movie documentary, The Thinking Game, which explores the origins of DeepMind, now available on YouTube.
- Viewers called it great and said the movie really makes you want Demis to win the AGI race.
- Jeff Dean's AI Retrospective and Gemini 3.0: AER Labs recaps Jeff Dean's Stanford talk tracing 15 years of AI progress, from hand-coded 90s gradients to Gemini 3.0 solving IMO problems, powered by scale, better algorithms (TPUs, Transformers, MoE, CoT) and hardware, plus demos of low-code "Software 3.0" and visual reasoning (source).
- Claude Generates PowerPoint Slides: A member tried out Claude's new PowerPoint skill and said it was quite nice, pointing it at a company styleguide and blog post for info and a high-level narrative to make 10 near-perfect slides.
- They shared a screenshot of the generated slides. Members also discussed Nano Banana Pro in Google Slides.
- ChatGPT Pro vs Claude: Members discussed the value of ChatGPT Pro vs Claude, noting that ChatGPT is awesome for general research, has much better Codex rate limits, better for non ts/js/py, and has higher value if you use pulse, atlas, sora, codex cloud etc.
- However, members say Claude is always pushing boundaries, its models are better trained to use tools, its frontend UX and UI is really good, and its CLI readability/typography/font hierarchy makes it much easier to understand.
Latent Space ▷ #ai-announcements (2 messages):
SOTA Vision, RF-DETR Paper, NeurIPS, Dev Writers Retreat 2025
- RF-DETR Paper Authors Host SOTA Vision Special: The authors of the RF-DETR paper are hosting a special event for those keen on SOTA Vision here.
- NeurIPS Signups Reminder: There is a reminder to sign up for the NeurIPS tag and post related papers, discussions, meetups and questions in the relevant channel.
- The organizers will be there later in the week.
- 2025 Dev Writers Retreat Accepting Final Signups: The 2025 Dev Writers Retreat will take place after NeurIPS in San Diego, and they are taking their last signups this week here.
Latent Space ▷ #genmedia-creative-ai (31 messages🔥):
Black Forest prompting guide, Wisprflow new funding, SGLang diffusion, Whisper Thunder vs VideoGen, AI Image Realism Showdown
- Whisper Thunder Dethrones VideoGen: The ML community is buzzing about Whisper Thunder, a new #1 text-to-video model, which has surpassed VideoGen in the latest Artificial Analysis rankings - see details here.
- Nano Banana Pro's Realism Sparks Debate: A comparison of AI-generated images from Grok 4.1, ChatGPT-free, Google Nano Banana, and Nano Banana Pro revealed that Nano Banana Pro produces images "indistinguishable from reality", as shown here.
- OpenAI's Image-Gen Upgrade Has Mixed Reception: Users discovered that OpenAI quietly updated its image generation model, leading to reactions ranging from praise for higher quality to criticism over poor multilingual support, inconsistent scene-to-scene references, and continued safety guardrails, as shown here.
- FLUX 2 Pro Boasts Improved Visuals: FLUX 2 Pro delivers a major quality leap over FLUX 1 Pro, eliminating the "plastic" look and providing greater detail fidelity, as demonstrated in a side-by-side comparison here.
- Nano Banana Pro Enables Fraud: Nano Banana Pro can create near-perfect counterfeit receipts, KYC documents, and passports in one prompt, causing alarm over potential scams and fraud, which users debate here.
Yannick Kilcher ▷ #general (61 messages🔥🔥):
Information Retrieval History, Genesis AI platform by Department of Energy, Curriculum Learning for Pretraining LLMs, MIT Study on AI Replacing Jobs, Trumpcoin Protocol for Zero Knowledge Proofs
- Lecture on Information Retrieval Stretches from Alexandria to RAG: A member shared a YouTube lecture on the history of information retrieval, tracing developments from the Library of Alexandria to RAG.
- Some expressed interest in attending a paper discussion, while others referenced a walkthrough video by a neuroscience PhD with a machine learning dissertation.
- US Department of Energy Eyes National AI Platform: The Department of Energy plans to build a national AI platform on top of U.S. supercomputers and federal science data.
- The platform aims to train scientific foundation models and run AI agents + robotic labs to automate experiments in various fields such as biotech, critical materials, nuclear fission/fusion, space, quantum, and semiconductors.
- Debate on Curriculum Learning Techniques for LLM Pretraining Heats Up: Members discussed the use of curriculum learning and coreset techniques during LLM pretraining, with one member questioning potential biases introduced by non-random sampling.
- They cited the Olmo 3 paper and the OLMo paper as reference, clarifying that curriculum learning is beneficial for language model pre-training, as long as a more model-centric notion of difficulty is adopted, according to this paper.
- AI Already Replacing US Workforce: A CNBC article states that an MIT study finds AI can already replace 11.7% of the U.S. workforce.
- Discussion ensued about the methodology, referencing the Iceberg Index and the corresponding paper, with skepticism about trusting LLMs to determine if other LLM tools can automate jobs.
- Tensors on Trumpcoin Protocol for ZKP: A member joked about sending all the tensors with zero knowledge proofs on the trumpcoin protocol.
- They added that all the Epstein documents will be released with zero knowledge proofs, proving it's a witch-hunt, while protecting Epstein's victims.
Yannick Kilcher ▷ #paper-discussion (24 messages🔥):
Adobe AI summaries, LLM Summarization Limitations, ADHD and Autism in AI/CS, Posting papers without understanding
- Adobe AI Summaries: The Devilâs Bait?: A member jokingly suggested that Adobeâs AI summaries might be leading to issues, referencing an attached image.
- Another member mentioned, "I dislike it because they almost always use much worse models. You get infinitely better results if you paste the PDF into ChatGPT, Claude, Gemini, etc."
- LLMs Struggle to Summarize High-Density Info: Members shared experiences that LLMs often fail to grasp what's important in summarization, especially with high-information-density texts.
- One member stated, "Everyone has been talking about LLMs being these great summarizers. But they really aren't in my experience because they don't grasp what's important and what can be discarded."
- ADHD and Autism in Tech: A Hot Topic: A member suggested a connection between curiosity, ADHD, and autism in understanding papers, leading to varied reactions.
- In response, it was asserted that having such conditions doesn't necessarily dictate specific actions, with multiple members sharing their own diagnoses of ADHD and suspected Asperger's.
- Curbing the Paper Flood: A New Rule Proposal: Concerns were raised about a user posting numerous papers without demonstrating sufficient understanding, leading to a proposal for a new rule.
- The rule would restrict paper recommendations to those with significant positive feedback or those posted to a specific channel, aiming to filter out noise and ensure relevance.
Yannick Kilcher ▷ #ml-news (6 messages):
Nano Banana Pro, Tencent Hunyuan, MAGA pushback on AI datacenters, AI replacing US workforce
- Tencent releases Hunyuan model!: Tencent recently released their Hunyuan model, as showcased in this video.
- MAGAs oppose AI datacenters: Some MAGA supporters are now pushing back against AI datacenters, as discussed in this YouTube video.
- MIT Study: AI to Replace 11.7% of US Workforce: According to an MIT study, AI can already replace 11.7% of the US workforce, per this CNBC article.
HuggingFace ▷ #general (19 messages🔥):
Hugging Face Inference API, Christmas gift drop, Error in Hugging Face, Genesis Mission, PDF reader model for LM Studio
- Inference API Grayed Out?: A member asked for advice on enabling the Hugging Face internal inference API for their uploaded model, noting that the inference option is currently grayed out in the UI, as shown in the attached image.
- Donation-Negotiation-Collaboration (DNC) Markdown: A member shared what they indicated might be their last Christmas gift drop, including a DNC.md file expressing uncertainty about its usefulness and expressing hope that it might benefit others.
- Comfy ComfyUI questions: In response to a question about running GGUF text-to-image models locally, a member suggested ComfyUI or koboldcpp.
- LM Studio PDF Teacher: A member inquired about a model for LLMStudio capable of reading a PDF file and answering questions, and another member suggested that any instruct model LLM should work, using LM Studioâs built-in RAG.
- They also shared a link to the LM Studio models page and Hugging Face models page.
- Spanish Text Dataset Quest: A member requested a large, high-quality Spanish text dataset for a MoE language model project.
- Another member provided links to a Spanish dataset and related Discord channels.
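The LM Studio RAG suggestion above boils down to a retrieval step: embed the PDF's chunks, embed the question, and hand the closest chunks to the instruct model. A toy sketch of that step, using a bag-of-words stand-in for a real embedding model (all chunk texts and names here are illustrative, not LM Studio's API):

```python
# Toy sketch of the retrieval step behind a RAG setup like the one suggested
# above. `embed` is a bag-of-words stand-in for a real embedding model; the
# chunk texts are made up for illustration.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in "embedding": word counts instead of a learned vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "PDF" chunks and a question; retrieval picks the closest chunk, which is
# what gets stuffed into the instruct model's context.
chunks = [
    "the cat sat on the mat",
    "llms answer questions over pdfs",
    "mojo compiles to machine code",
]
query = "answer questions about a pdf"
best = max(chunks, key=lambda c: cosine(embed(query), embed(c)))
print(best)  # the pdf-related chunk scores highest
```

A real setup would swap `embed` for a sentence-embedding model and keep the top-k chunks rather than a single best match.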
HuggingFace ▷ #cool-finds (1 message):
aboodj_: epic
HuggingFace ▷ #i-made-this (8 messages🔥):
RapidaAI Open Source, French Books Dataset, AI Sci-Fi Short Film
- RapidaAI goes Open Source: RapidaAI, a production-ready voice AI platform, is now open-source to give users control over their voice AI and avoid extra vendor costs.
- The company observed that teams were paying an extra $0.05â$0.15 per minute to rent someone elseâs stack, costing them six figures annually.
- French Classic Books Dataset is Created: A member created and shared a dataset of public domain French books available on Hugging Face.
- Also, thereâs a version with only the conversations in the books (here) designed for instruction purposes.
- AI Sci-Fi Short Film drops: A member showcased an AI-generated science fiction short film titled Tales of the Sun - Céline on YouTube.
- The creator spent two months creating the film and is seeking feedback from the community.
HuggingFace ▷ #reading-group (3 messages):
Chunking, GNN presentation, Structured data
- Chunking's impact is small: A member expressed gladness that chunking doesn't matter that much.
- For unstructured data you won't see much difference due to limited edge cases.
- GNN Presentation incoming: A member is planning a presentation on GNNs, starting with AlphaFold 2 and 3.
- The exact topic is still undecided due to ongoing research.
- Structured data is valuable: A member suggested trying for structured data in a blog.
- They noted that for unstructured data, differences might be limited due to edge cases.
HuggingFace ▷ #agents-course (1 message):
dodrawat: letâs connect
Modular (Mojo 🔥) ▷ #general (2 messages):
Mojo repo, Copybara, Repo Sync
- Modular synchronizes repos with Copybara: Members discussed how Mojo keeps its internal and external repos synchronized, and one member confirmed they use Copybara.
- Copybara keeps the internal private repo in sync with the external open-source repo, ensuring changes and updates are consistently reflected across both repositories.
Modular (Mojo 🔥) ▷ #max (20 messages🔥):
MAX examples for newbies, MAX written in Python, Mojo API in MAX, Migrating Python MAX code to Mojo MAX, Performance gains in MAX with Mojo
- MAX Newbies Seek Examples: A member asked for small examples to learn about MAX, expressing interest in training.
- Another member suggested that Endia had some relevant content.
- Python's Role in MAX Questioned: A member inquired about the decision to write MAX in Python, speculating on easier migration to MAX and Mojo.
- The member wondered if this would lead to a split world issue, similar to PyTorch, and whether a pure Mojo framework for MAX would emerge.
- Mojo API's Return to MAX Anticipated: A member clarified that MAX previously had a Mojo API, but it was discontinued due to Mojo's incomplete state.
- They indicated that the Mojo API should return at some point when the language is more mature.
- Python to Mojo Migration Hurdles Highlighted: A member explained that while Mojo is not a strict Python superset, it resembles C++ or Rust more closely.
- They cautioned that migrating to Mojo MAX will require effort to leverage Mojo's full potential, even though it looks like Python.
- Performance Boost with Mojo MAX Questioned: A member noted that MAX uses a JIT compiler, suggesting that performance gains from Mojo would mainly be in graph construction time.
- They speculated that speed differences between Mojo MAX and Python MAX might not be significant, and the split-world issue would persist until Mojo gains more features.
tinygrad (George Hotz) ▷ #learn-tinygrad (19 messages🔥):
TinyJit internals, Non tinygrad Python operations, Randomness functions in Tinygrad, Tinygrad JIT tutorial, PyTorch compiler history
- TinyJit only replays kernels: When using `@TinyJit`, the wrapped function only replays the captured tinygrad kernels and ExecItems; the Python body of the wrapped function won't run at all.
- If you need Python code to run, split it into separate JIT functions, but this can be tricky, and any non-tinygrad outputs will not be updated.
- Randomness functions in `Tensor` work as expected: Randomness functions on `Tensor` should work since they increment counters via a kernel.
- Example: `CPU=1 DEBUG=5 python3 -c "from tinygrad import Tensor; Tensor.rand().realize(); Tensor.rand().realize()"`.
- Two JIT runs are required for tracing, though this may change to verifying that runs match: The JIT uses the second run to capture the kernels it replays, since the first run may perform one-off setup such as weight initialization.
- A proposal suggests that the JIT might be updated to wait for two runs to match, indicating that the implementation is still pre-1.0 and subject to change, with efforts focused on removing footguns.
- Good Tutorial on tinygrad JIT: A member shared a tutorial on tinygrad JIT.
- The tutorial is a bit outdated but still good.
- Tinygrad fundamentals are solid: Tinygradâs fundamentals are now solid, and the team is now shifting focus to frontend usability.
- One person reminisced that the very first PyTorch compiler, in a fast.ai lesson, literally concatenated C code strings using regex!
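The replay behavior described in the first bullet above can be illustrated with a toy stand-in (not tinygrad's implementation, and simplified: the real `@TinyJit` captures on the second run, this toy captures on the first): once ops are recorded, only they re-run, and plain Python inside the wrapped function stops executing.

```python
# Toy stand-in (NOT tinygrad's implementation) for the replay semantics
# described above: after tracing, only the recorded "kernels" re-run and
# the wrapped function's Python body is skipped entirely.
class ToyJit:
    def __init__(self, fn):
        self.fn = fn
        self.captured = None  # recorded "kernels" after tracing

    def __call__(self, x):
        if self.captured is None:
            ops = []
            result = self.fn(x, ops)  # Python body runs once, recording ops
            self.captured = ops
            return result
        out = x
        for op in self.captured:      # replay: the Python body never runs
            out = op(out)
        return out

side_effects = []

@ToyJit
def step(x, ops):
    side_effects.append("python ran")  # plain Python: only runs while tracing
    double = lambda v: v * 2           # stands in for a captured kernel
    ops.append(double)
    return double(x)

assert step(3) == 6                    # capture run: Python body executes
assert step(10) == 20                  # replay: only the recorded op runs
assert side_effects == ["python ran"]  # the side effect fired exactly once
```

This is why non-tinygrad outputs (like the `side_effects` list here) go stale after capture, matching the "split it into separate JIT functions" advice above.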
Moonshot AI (Kimi K-2) ▷ #general-chat (14 messages🔥):
Kimi's Limits, Chatbots vs Canvases, Conversational Fallacy
- Kimi's Limits Explored: A user inquired about the limits of Kimi, expressing uncertainty despite planning to upgrade from the web interface, and attached a screenshot.
- Another user praised Kimi K2 for its superior thinking and ability to push back, highlighting its understanding and interaction in the context of prompts.
- Canvas Craze Coming?: A user expressed disbelief that canvases haven't replaced chatbots yet, suggesting they make more sense for full-screen websites like Kimi and Qwen.
- They argued that while chatbots are suitable for small side-panels, canvases could provide a better experience for comprehensive web interfaces.
- Conversational Fallacy Considered: A user shared a quote they are obsessed with: "we're stuck in the conversational fallacy: the idea that AI must be addressed to be used."
- The user seems to believe that Kimi does an amazing job of not falling into this fallacy.
DSPy ▷ #show-and-tell (4 messages):
dspy-cli tool, DSPy projects, FastAPI endpoints, MCP tools, Docker hosting
- dspy-cli Tool Goes Open Source: Members announced that the `dspy-cli` tool is now open source and available on PyPI, to help create, develop, test, and deploy DSPy programs as HTTP APIs.
- The repo is available on GitHub and the tool can be installed with `uv tool install dspy-cli`.
- dspy-cli New Features Available: The main features are to scaffold a new DSPy project, create new signatures from the command line, run modules as FastAPI endpoints or use them as MCP tools.
- Programs can be easily deployed to a docker hosting service of choice.
- dspy-cli Acclaimed for its Project Utility: Members expressed eagerness to try `dspy-cli` on more projects and spread the word about its usefulness.
- A user tweeted about the tool, praising the great work.
DSPy ▷ #general (9 messages🔥):
ReAct Module Trajectory Injection, Web Search API Implementation in DSPy, Anthropic Web Search API, Latency issues with web search API calls
- Trajectory Injection in ReAct Modules: A member inquired about injecting trajectories into a ReAct module, seeking to provide the agent with context from previous runs in addition to message history.
- Web Search API choices for DSPy: A member asked for advice on the best APIs to implement a web search tool in DSPy, specifically asking if the native web search API of a provider could be used.
- Exa API includes summarization: One member shared a positive experience using Exa API due to its summarization feature, which avoids the random ads and HTML tags found in other APIs like Firecrawl and Parallel.ai.
- Using Anthropic's web search API with ReAct: A member is trying to implement it using Anthropic's web search API with ReAct, and shared a code snippet using `dspy.ReAct`.
- Latency caused by Web Search API Calls: A member raised a question about the latency caused by web search API calls within DSPy's ReAct when using a search function like `search_web` before calling the LLM.
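One common way to blunt that latency, independent of which search API is chosen, is to memoize results per query so repeated lookups within a ReAct trajectory skip the network round-trip. A minimal sketch (the `fetch_results` callable and all names here are hypothetical, not part of DSPy's or Anthropic's API):

```python
# Minimal latency-mitigation sketch for a ReAct search tool: memoize results
# per query so repeated lookups in one trajectory skip the network call.
# `fetch_results` and all names here are hypothetical stand-ins.
import functools

def make_search_tool(fetch_results):
    @functools.lru_cache(maxsize=256)
    def search_web(query: str) -> str:
        # Only cache misses pay the (slow) network call.
        return fetch_results(query)
    return search_web

calls = []  # tracks how often the "network" is actually hit

def fake_fetch(query: str) -> str:
    calls.append(query)
    return f"results for {query}"

tool = make_search_tool(fake_fetch)
tool("dspy react")
tool("dspy react")              # served from the cache, no second fetch
assert calls == ["dspy react"]  # the backend was hit exactly once
```

The resulting `search_web` function can then be passed to an agent as an ordinary tool; caching changes nothing about its signature, only how often the backend is hit.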
MCP Contributors (Official) ▷ #general (11 messages🔥):
New Protocol Version, UI SEP Release, MCP Namespace Collision
- New Protocol Version Drops!: A new protocol version has been released, as announced in the Discord channel.
- Members expressed excitement and gratitude to the MCP community for their contributions over the past year.
- UI SEP Ships Out-of-Band!: The UI SEP can be shipped out-of-band from the main spec due to being an extension.
- Check out the <#1376635661989449820> channel for more details.
- MCP Considers Namespace Collisions!: A member inquired about whether the MCP group considers the possibility of namespace collisions.
- Specifically, the question was raised whether the group would take action if something claims to be something-mcp but diverges from the actual MCP standard.
Manus.im Discord ▷ #general (8 messages🔥):
AI Engineer introduction, API Issues, Telegram channel
- AI Engineer showcases expertise: An AI engineer with hands-on experience building advanced, end-to-end AI systems across multiple domains introduced themselves.
- Their expertise covers AI agents, multi-agent systems, automating workflows, NLP-powered chatbots, integrating voice & speech systems, deploying custom LLMs, fine-tuned AI models, Web3, smart contracts, and AI-integrated blockchain games.
- User reports API issues and lack of support: A user reported experiencing an [unknown] error in webdev.v1.WebDevService/GetDatabaseSchema due to usage quota exhaustion, despite topping up more than $600.
- The problem has rendered their entire account unusable, affecting over 500 active users, and they have not received any response or support from the team.
- Members inquire about a Telegram channel: A member inquired about the existence of a Manus Telegram channel.
aider (Paul Gauthier) ▷ #general (3 messages):
Benchmark Updates, Opus 4.5 vs Sonnet 4.5
- Community Suggests New Site Admin for Benchmarking: A member suggested that the site be run by someone who can keep the benchmark results updated with new models.
- This implied dissatisfaction with the current state of benchmark result updates.
- Opus 4.5: big or minor upgrade over Sonnet 4.5?: A member initiated a quick survey to gauge community sentiment on whether Opus 4.5 is a big or minor upgrade over Sonnet 4.5.
- Another member reported that they encountered a "model not found" error when trying what would typically be the correct Bedrock model identifier.