Mistral is back!

AI News for 12/1/2025-12/2/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 9665 messages) for you. Estimated reading time saved (at 200wpm): 697 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

We last saw Mistral Small 3 in Jan, and 3.1 in March, then the mainline models took a detour with Mistral Code and Magistral and Voxtral. Well, after raising 1.7B at a 11.7B valuation, Mistral Large 3 is here together with 3 sizes of Ministral (blogpost), all open weights Apache 2.0.

Mistral Large 3 performance comparison chart showing benchmark results across multiple AI models and evaluation metrics.

It’s unfortunate timing coming right after Deepseek V3.2 (#6 on Open Models and #28 on Text), but still a notable achievement for European AI. as Anj points out, this is on Mistral’s old cluster - with the new funding, a 6x larger compute cluster will come online in 2026.

AI Twitter Recap

Mistral 3 family: open, multimodal, and everywhere

Mistral 3 launch (Apache-2.0, open weights): Mistral released the multimodal Ministral 3 (3B/8B/14B) in base, instruct, and reasoning variants, plus Mistral Large 3 — a sparse MoE with 675B total params, 41B active, 256k context, and vision input. All models ship under a permissive license with broad platform support and strong small-model performance. Details: @MistralAI, news, Ministral sizes, Large 3.

Infra and ecosystem landed day-0: vLLM support (NVFP4 checkpoints, sparse MoE kernels, long context, multimodal), llama.cpp integration + RTX gains, Ollama models + cloud, LM Studio catalog. Community guides and formats shipped quickly: Unsloth runbooks + GGUFs.

Early evals: the Arena places Mistral-Large-3 at #6 among open models (strong in coding; #28 overall) and notes it was tested under codename “Jaguar” (@arena). Practitioners report better instruction-following than contemporary open baselines and availability of an NVFP4 checkpoint targeting single A100/8×H100 nodes (@dejavucoder).

Browser and local: the 3B runs 100% locally in WebGPU (@xenovacom).
Other model drops: Apple released CLaRa-7B-Instruct on HF (tweet). Runway previewed Gen‑4.5 with higher “cinematic realism” and early access roll-out (tweet). Moondream showed strong segmentation that “actually understands the scene” (tweet).

Anthropic: Bun acquisition, nonprofit program, and how AI is changing work

Bun joins Anthropic: Anthropic acquired the MIT‑licensed Bun JS/TS runtime to accelerate Claude Code. Bun remains open source; the Bun team joins Anthropic to keep building both Bun and deeper Claude Code integrations (Anthropic, Bun, @_catwu, @mikeyk). Community notes Claude Code reportedly hit a $1B run‑rate in ~6 months post-GA (@alexalbert__).
Claude for Nonprofits: Discounted plans, new integrations, and training for NGOs in partnership with GivingTuesday (announcement).
How AI is changing work inside Anthropic: Survey of 132 engineers + 200k Claude Code sessions: engineers lean on Claude first for questions, changing team dynamics; the company plans wider internal study and organizational responses (thread, follow-ups).

Frontier benchmarks, leaks, and competitive positioning

OpenAI “Garlic” leak and GPT-5.1: The Information reports a new OpenAI pretrained model “Garlic” testing well on coding/reasoning vs GPT‑4.5 (report; quote: Mark Chen). OpenAI published a podcast on GPT‑5.1 Instant detailing reasoning, personality controls, and behavior refinement (tweet).
DeepSeek V3.2 and Speciale: Multiple analyses highlight V3.2 (and Speciale) as “affordable frontier” models with notable tradeoffs: slow generation (~30–40 tks/s) and very long chains (avg reasoning output 20k–47k tokens), but extremely low price (e.g., $3 vs $35 for Claude 4.5 Sonnet Thinking on certain evals) and strong new highs on LisanBench; updated scoring pegs Speciale at an impressive 8.81 on easier subsets (overview, score correction, discussion on verbosity/context). Day‑0 API availability noted by Fireworks (tweet).
Arena placements and ecosystem: Mistral-Large-3 enters the LMArena Text leaderboard (strong coding, top-10 in multiple occupational areas) (tweet). OpenRouter rolled out Mistral Large 3 and Amazon Nova 2 Lite access (Mistral, Nova Lite).

Amazon Nova 2.0 (reasoning, agentic, multimodal) and Nova Sonic 2.0 (speech‑to‑speech)

Nova 2.0 family: Amazon unveiled Nova 2.0 Pro (reasoning, preview), Lite (speed/cost), and Omni (text/image/video/speech input; text/image output). Early third‑party benchmarks suggest material gains vs Nova Premier and competitive agentic capabilities: Nova 2.0 Pro hits 93% on τ²‑Bench Telecom and 80% on IFBench under medium/high reasoning budgets, with Pro pricing at $1.25/$10 per 1M in/out tokens (analysis, follow-up). Nova Lite benchmarks posted separately (tweet).
Nova Sonic 2.0 (speech‑to‑speech): New real‑time bidirectional audio model scores #2 on Artificial Analysis Big Bench Audio (87.1% reasoning), with median 1.39s time‑to‑first‑audio; supports five languages and adaptive prosody (thread). OpenRouter is offering Nova 2 Lite free for 2 weeks (tweet).

Agents, toolchains, and safety

LangSmith Agent Builder (public beta): No‑code agent builder that creates prompts, selects tools/subagents, supports MCP servers, triggers (Gmail/Slack), and configurable memory/summarization policies (launch, overview, Chase video).
LlamaIndex releases: LlamaAgents (deployable agent workflow templates) and LlamaSheets (deep spreadsheet parsing/extraction) with community office hours this week (recap, invite).
Hugging Face Skills: A “universal implementation of agent context” compatible with Cursor, Claude Code, Gemini CLI, and local/remote jobs; uses Claude Code’s skill spec but exposes entry points for other ecosystems (tweet).
Prompt injection defense for browser agents: Perplexity open‑sourced BrowseSafe and BrowseSafe‑Bench; fine‑tuning on the benchmark outperforms off‑the‑shelf safety classifiers and LLM‑as‑detector approaches while avoiding reasoning latency (announcement, results).
DevEx: Microsoft’s Tangle open-sources a content‑based caching experimentation platform with a visual editor (claims “1+ year CPU time saved” at Shopify) (tweet). Cline shipped /explain‑changes and a stealth 256k “microwave” model for agentic coding (release, model).

Research highlights

Test‑time compute scaling: A large‑scale study and “recipe” for selecting strategies shows TTS reliably boosts complex reasoning without retraining; effectiveness depends more on allocation strategy than raw compute (summary, paper).
Deep research agents under scrutiny: OPPO’s FINDER benchmark (100 tasks; 419 checklist items) and DEFT failure taxonomy (14 modes) show agents don’t fail at task comprehension — they fail at evidence integration, verification, and planning; suggests architecture changes linking retrieval to synthesis (overview).
Pragmatic interpretability: Neel Nanda argues for basic science of CoT in pragmatic interp, with methods that apply directly to frontier LRMs; counters hype that interpretability is “failed,” reframing priorities (clarification, techniques).
AI x chips, recursive improvement loop: Ex–DeepMind leads Azalia Mirhoseini and Anna Goldie launched Ricursive Intelligence to co‑evolve models and silicon (architecture→verification→implementation) toward recursive self‑improvement; roots in AlphaChip used across multiple TPU generations (announcement, founder thread).
Bonus: Elicit adds figure parsing at scale (Kaplan–Meier, heatmaps, reaction schemes, microscopy), moving multimodal reasoning beyond text/tables for systematic reviews (tweet).

Top tweets (by engagement)

Anthropic acquires Bun; Bun remains MIT‑licensed and joins to supercharge Claude Code (Anthropic, Bun).
Mistral 3 family launch with open weights across sizes, MoE Large 3, and multimodal support (@MistralAI).
Waymo safety op‑ed cites ~100M driverless miles with large reductions in serious injury and intersection crashes (@slotkinjr).
OpenAI podcast on GPT‑5.1 training decisions, reasoning, and behavior refinement (@OpenAI).
Apple’s CLaRa‑7B‑Instruct released on Hugging Face (tweet).

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Mistral 3 Model Family Release

Mistral just released Mistral 3 — a full open-weight model family from 3B all the way up to 675B parameters. (Activity: 614): Mistral has released the Mistral 3 model family, featuring models ranging from 3B to 675B parameters, all under the Apache 2.0 license for both research and commercial use. The lineup includes the compact Ministral 3 models (3B, 8B, 14B), which are multimodal and available in base, instruct, and reasoning variants, noted for their strong performance relative to size. The flagship Mistral Large 3 is a 675B parameter model with a Mixture of Experts (MoE) architecture, offering strong multilingual capabilities and high efficiency, positioning it as one of the most capable open-weight instruct models available. This release supports a shift towards open AI ecosystems, providing a range of models suitable for both on-device and large-scale enterprise applications. Full announcement. Commenters express disappointment over the lack of models between 14B and 675B parameters, with some hoping for models in the 80B to 400B range. There is also a desire for competition to GPT-OSS 120B, with a focus on models that can run efficiently on consumer-grade GPUs.
- jzn21 highlights a gap in the Mistral model lineup, noting the absence of models between 14B and 675B parameters. This gap is significant for users interested in models ranging from 80B to 400B, which are often considered a sweet spot for balancing performance and resource requirements.
- fungnoth discusses the need for competition against the GPT-OSS 120B model, particularly emphasizing the potential of large Mixture of Experts (MOE) models. These models could leverage consumer GPUs effectively by activating only a subset of experts, thus maintaining speed and efficiency.
- Adventurous_Cat_1559 expresses interest in a 120B parameter model that could be run on a 96GB Mac Studio, indicating a demand for high-parameter models that are still feasible for high-end consumer hardware. This reflects a broader interest in making large models accessible to more users without requiring enterprise-level resources.
Ministral-3 has been released (Activity: 356): Ministral-3 has been released, featuring three models: 14B, 8B, and 3B, each with reasoning, instruct, and base variants. The largest, Ministral 3 14B, is noted for its performance comparable to the larger Mistral Small 3.2 24B, offering advanced language and vision capabilities. These models are available on Hugging Face and are designed for efficient deployment in various applications. Commenters are curious about the models’ tool-calling capabilities and express a desire for performance comparisons with larger models like the Mistral Small 24B.
- StyMaar highlights the release of the base models for Ministral-3, which is significant for developers looking to build custom applications or fine-tune models for specific tasks. This release allows for more flexibility and experimentation compared to pre-trained models.
- throwawayacc201711 questions the lack of comparison between Ministral-3 and larger models like Mistral Small 24B. Such comparisons are crucial for understanding performance improvements and trade-offs, especially in terms of computational efficiency and accuracy.
- human-exe suggests that Ministral-3 outperforms and could potentially replace models like Qwen3 and Gemma3. This implies that Ministral-3 may offer better performance metrics or efficiency, making it a more attractive option for users of those models.

2. GPU Rental Market in Mongolia

Would you rent B300 (Blackwell Ultra) GPUs in Mongolia at ~$5/hr? (market sanity check) (Activity: 446): A team in Mongolia is offering B300 (Blackwell Ultra) GPUs for rent at approximately $5/hr in a data center located in Ulaanbaatar. The setup includes 3.2 Tb/s InfiniBand and pre-installed PyTorch and SLURM, with latency measurements showing ~35 ms to Beijing and ~110 ms to Singapore. The post seeks feedback on the viability of this offering compared to established providers like CoreWeave and Lambda, and whether the ‘cold steppe bare-metal neutrality’ is a compelling feature. The GPUs are offered with full root access and no hypervisor, emphasizing a neutral jurisdiction with no unexpected legal intrusions. Landing page is available for more details. Commenters suggest that the offering could be attractive if the service is stable and secure, with one recommending collaboration with established providers like TensorDock or DeepInfra, who offer similar services at competitive rates. The unique selling point of ‘neutral territory’ is seen as potentially beneficial but requires further validation.
- Lyuseefur highlights three critical technical requirements for renting GPUs: the hardware must be genuine, stable for extended periods, and support encrypted containers. These conditions ensure reliability and security for non-mission-critical tasks that are time-flexible.
- Azuriteh suggests partnering with established providers like TensorDock or DeepInfra, noting that DeepInfra offers B200 GPUs at approximately $2.5/hr, which is competitive. This implies that market entry might be more feasible through collaboration with experienced entities rather than independent offerings.
- Xamanthas points out that the geographical location of the GPUs is irrelevant for non-legally mandated tasks, as training jobs are not constrained by latency. This suggests that the physical location in Mongolia would not impact the performance for most AI training workloads.
Mistral 3 Blog post (Activity: 719): Mistral AI has released the Mistral 3 series, which includes three dense models (14B, 8B, 3B) and a sparse mixture-of-experts model, Mistral Large 3, with 41B active and 675B total parameters. These models are open-sourced under the Apache 2.0 license and optimized for NVIDIA hardware. They are designed for high performance in multilingual and multimodal tasks, with Mistral Large 3 achieving top rankings in open-source model leaderboards. The models are tailored for efficient inference, suitable for applications ranging from edge devices to enterprise solutions. More details can be found in the original announcement. Some commenters express disappointment, noting that Mistral 3’s performance is underwhelming compared to competitors like Qwen3-235B-2507, which has a better ELO despite being smaller. There is also criticism of the comparison charts used, which some find misleading or incomplete.
- The release of Mistral 3 models under the Apache 2.0 license is significant, but there are concerns about their performance. The top Mistral LLM has a lower ELO rating compared to Qwen3-235B-2507, despite being larger. Additionally, comparisons are made with Deepseek 3.1, which has similar performance, rather than more recent models like Deepseek 3.2 or Speciale.
- There is criticism regarding the performance of Mistral’s smaller LLMs, which reportedly underperform compared to Qwen3 and Gemma models of similar sizes. The new Mistral models do not seem to match the performance of their previous consumer-targeted open LLM, Mistral 3.2 24B, indicating a potential step back in terms of efficiency and capability.
- Some users express disappointment with the size and scalability of the Mistral models, noting that the larger models do not fit within a 256GB memory constraint. There is a call for larger models, such as a 48B MoE of 3B or something around 120B, to better compete with models like GPT-OSS.

3. Hugging Face Top Contributors

Only the real ones remember (he is still the contributor with the most likes for his models) (Activity: 318): The post highlights a Hugging Face space dedicated to top contributors, specifically mentioning mradermacher and Bartowski as leading figures in the community. The link provided (Hugging Face space) showcases contributors who have significantly impacted the platform, with a nod to historical figures like the ‘father of GGUF’. This suggests a focus on model contributions and innovations within the Hugging Face ecosystem. Comments reflect a recognition of Bartowski and mradermacher as current leaders in the Hugging Face community, with a nostalgic mention of ‘TheBloke’ for his contributions to GGUF files, indicating a shift in community leadership and contributions.
- TheBloke is recognized for his contributions to the Hugging Face community, particularly with his GGUF files, which have been widely used and appreciated by users like neoneye2. GGUF files are a format that likely optimizes model storage or performance, though specific technical details aren’t provided in the comments.
- DaniyarQQQ reminisces about the Mixtral-8x7B-Nous-Hermes-Instruct-v0.1-LimaRP-WizardLM-ZLoss-DARE-TIES-SuperCoT-SuperHOT-AWQ models, indicating a preference or nostalgia for these specific configurations. This suggests that these models had unique characteristics or performance benefits that were valued by the community.
- Jacek2023 notes that on the Hugging Face platform, TheBloke has been succeeded by other contributors like Bartowski and mradermacher. This implies a dynamic and competitive environment in the model development community, where new contributors frequently emerge with innovative models.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. OpenAI ‘Code Red’ and New Model Announcements

Breaking: OpenAI declares ‘code red’ to respond to threats to ChatGPT and improve metrics, will delay ads and other initiatives (Activity: 761): OpenAI has declared a ‘code red’ to address competitive threats to ChatGPT, as reported by ‘The Information’. This strategic shift involves delaying advertising initiatives to focus on improving key performance metrics of ChatGPT. The move suggests that OpenAI is prioritizing the enhancement of its AI capabilities in response to increasing competition from other powerful AI models, notably from companies like Google, which are now fully engaged in the AI race. Commenters note the historical context of Google’s ‘code red’ response to ChatGPT, highlighting the competitive dynamics in the AI industry. There is a recognition that OpenAI is now facing serious competition from other advanced AI models, necessitating a strategic pivot to maintain its leadership position.
- Abby941 highlights that OpenAI is facing increased competition as other AI models have caught up, challenging their first-mover advantage. This situation is compounded by Google’s intensified focus on AI, leveraging its vast resources to compete directly with OpenAI.
- Warm-Letter8091 mentions that OpenAI is planning to release a new model that surpasses the capabilities of the current ‘gem 3 pro’ model, indicating a strategic move to enhance their offerings amidst growing competition.
Sam Altman told employees he was declaring a “code red” (Activity: 1136): Sam Altman, CEO of OpenAI, has declared a “code red” to prioritize improvements to ChatGPT, delaying other projects like advertising, according to an internal memo reported by The Information. OpenAI is reportedly testing various ad formats, including those for online shopping, though it hasn’t publicly confirmed these efforts. Commenters suggest that OpenAI faces significant competitive pressure from Google, which can leverage its TPUs and offer AI services at minimal cost, unlike OpenAI. There’s also discussion about Anthropic facing similar challenges, and the potential impact of open-source AI developments, particularly from China, as AI progress slows.
- Google’s competitive edge in AI is largely attributed to its ability to run AI services at the raw cost of TPUs, which is a significant advantage over companies like OpenAI that cannot afford to operate AI as a loss leader. This is compounded by Google’s offering of AI usage in Antigravity for free, which OpenAI cannot match without its own ecosystem, something it has struggled to establish with initiatives like Sora 2.
- OpenAI and Anthropic face significant challenges due to their inability to compete with Google’s diversified business model and pricing strategies. Anthropic, despite its early success with Claude Code, struggles with pricing and quotas, which Google can manage more effectively due to its diverse revenue streams. This situation is exacerbated by the pressure from open-source AI developments, particularly from China, which could catch up as AI progress slows.
- The discussion highlights the strategic disadvantage of companies like OpenAI and Anthropic, which lack the ability to subsidize AI services through other business areas, unlike Google. This is a critical factor as Google leverages its TPU infrastructure and diversified business model to offer competitive pricing and free services, putting pressure on smaller AI companies.
OpenAI is set to release a new reasoning model next week, per The Information. (Activity: 753): OpenAI is reportedly set to release a new reasoning model next week, which is claimed to outperform Google’s Gemini 3 in internal evaluations. This announcement, highlighted by The Information, suggests that the new model may be named something like GPT 5.1 O3, as speculated by users. The model has been discussed in forums like lmarena and design arena under the name ‘robin’, indicating its presence in the testing phase on social media platforms like Twitter. Commenters express excitement about the competitive landscape driving innovation, noting that even if they don’t use models like Gemini or Deepseek, their existence pushes advancements in AI technology.
Altman memo: new OpenAI model coming next week, outperforming Gemini 3 (Activity: 638): OpenAI is preparing to release a new model next week, reportedly surpassing Google’s Gemini 3 in performance. This development is part of OpenAI’s strategic “Code Red” initiative, which aims to counter Google’s advancements, particularly as Gemini 3 has achieved notable user growth and benchmark success. The release will focus on enhancing ChatGPT and image generation capabilities, while other projects like advertising and AI agents are delayed. For more details, see the original article. Commenters suggest that the new model’s performance might not represent a significant leap over existing models like GPT-5.1, given their current proximity to Gemini 3 in most use cases. There is also skepticism about the pricing strategy, with some noting that the new model, possibly “GPT 5.5,” could be significantly more expensive.
- ObiWanCanownme notes that GPT-5.1 and Gemini 3 are already closely matched in performance across most use cases, suggesting that the new OpenAI model’s superiority might not be a significant leap in capabilities. This implies that the advancements may be more incremental rather than revolutionary, focusing on refining existing strengths rather than introducing groundbreaking features.
- GeorgiaWitness1 highlights a strategic approach by companies like OpenAI, where they develop multiple builds but often refrain from deploying them due to cost considerations. The release of a model like ‘GPT 5.5’, which is significantly more expensive (x10), is framed as a success, suggesting a focus on premium offerings that may not be accessible to all users but demonstrate technological prowess.
Sam Altman told employees he was declaring a “code red” (Activity: 3375): Sam Altman, CEO of OpenAI, has declared a “code red” to prioritize improvements to ChatGPT, delaying other projects like advertising, according to an internal memo reported by The Information. OpenAI is reportedly testing various ad formats, including those for online shopping, although it hasn’t publicly confirmed these efforts. This strategic shift underscores the urgency to maintain competitive advantage in the AI space, especially after the underwhelming reception of ChatGPT 5.0. A notable opinion suggests that OpenAI risks losing its early lead in AI dominance, similar to how Google became the default search engine. The sentiment reflects concerns over ChatGPT 5.0’s performance and the need for OpenAI to refocus on core AI capabilities.

2. AI Model and Benchmark Releases

Introducing Mistral 3 (Activity: 713): Mistral AI has released Mistral 3, which includes three dense models (14B, 8B, 3B) and the Mistral Large 3, a sparse mixture-of-experts model with 41B active and 675B total parameters. The Mistral Large 3 was trained using 3000 NVIDIA H200 GPUs and excels in instruction-tuning and multilingual tasks. The Ministral series is designed for cost-effective edge applications. All models are open-sourced under the Apache 2.0 license. More details can be found in the original announcement. There is confusion and skepticism among users regarding the benchmarks, particularly as they compare against DeepSeek 3.1 instead of the newer 3.2 version, suggesting potential performance gaps.
- Round_Ad_5832 points out that the benchmarks for Mistral 3 are compared against DeepSeek 3.1, not the latest version 3.2. This suggests that the performance comparison might not reflect the most current competitive landscape, potentially skewing perceptions of Mistral 3’s capabilities.
- peachy1990x highlights confusion regarding the benchmarks, noting that DeepSeek 3.2 has been released but was not included in the comparison. This omission could imply that Mistral 3 might not perform as well against the latest models, raising questions about its competitiveness.
- Eyelbee questions the performance of Mistral 3, suggesting it might be significantly worse than DeepSeek 3.2. This indicates a concern that Mistral 3 may not be able to compete effectively with the latest models in terms of performance.
Z Image Turbo ControlNet released by Alibaba on HF (Activity: 1984): Alibaba has released the Z Image Turbo ControlNet on Hugging Face, a model designed to enhance image generation tasks. This release is part of their ongoing efforts to provide advanced AI tools to the community, leveraging the ControlNet architecture to improve performance and flexibility in image processing applications. The model is expected to offer significant improvements in speed and quality, catering to the needs of developers and researchers in the field. The community is reacting positively, noting Alibaba’s quick response to AI trends and their ability to deliver tools that align with community interests. There is also a sentiment that this release could overshadow other projects like Flux 2, indicating competitive dynamics in the AI tool space.
- A user speculates on improving results by disabling ControlNet during the final step of processing, allowing the final refining pass to be purely handled by Z Image Turbo (ZIT). This suggests a potential method for enhancing output quality by leveraging the strengths of both systems at different stages of the image generation process.
EngineAI unveils the T800, their latest full-sized humanoid robot (Activity: 2204): EngineAI has introduced the T800, a new full-sized humanoid robot, which is claimed to be free of CGI effects in its demonstrations. The robot’s design and functionality are reminiscent of science fiction, particularly in its ability to land on its feet and bounce, which has led to skepticism about the authenticity of the footage. The T800 is positioned as a significant advancement in humanoid robotics, though some technical observers have noted the need for improvement in its naming conventions. There is skepticism among commenters regarding the authenticity of the T800’s demonstration videos, with some suggesting that the robot’s movements appear CGI-like, particularly when it lands and bounces.
- VihmaVillu raises skepticism about the authenticity of EngineAI’s T800 demonstration, noting that the robot’s movements, particularly when landing and bouncing, appear CGI-like. This suggests potential issues with the robot’s physical dynamics or the presentation’s realism, which could impact perceptions of its capabilities.
- Dave-the-Dave highlights the impressive nature of the T800’s design if the no CGI claim is accurate. The comment suggests that achieving such a lifelike appearance in a physical robot could be a significant technical achievement, indicating advanced robotics engineering and design.
- The discussion around the T800’s presentation touches on the challenges of making humanoid robots appear natural and lifelike. This involves complex dynamics and control systems to mimic human-like movements, which are crucial for applications in environments where human-robot interaction is necessary.

3. AI and Internet Challenges

Dead internet is real, and I’m starting to think we have way less time than people realize… (Activity: 1208): The post discusses the increasing difficulty of finding authentic images and videos online due to the proliferation of AI-generated content, which often appears overly saturated or unrealistic. The user expresses concern about the future of the internet, suggesting that the ease of access to AI tools is leading to a flood of low-quality, AI-generated media. This trend is perceived as worsening, with the potential for an ‘AI-free intranet’ being considered as a solution. The post includes a link to an example image illustrating the issue. Commenters agree with the concern, noting that the internet’s usability has declined since 2023 due to the prevalence of AI-generated content. They highlight the low barrier to entry for creating such content and the potential for it to dominate online spaces. Some suggest setting search filters to pre-2023 to find authentic content, while others point out the economic incentives driving the creation of misleading or sensational content.
- raydude888 highlights the increasing prevalence of AI-generated content due to the low barrier to entry, suggesting that the internet is becoming saturated with ‘AI slop’. This raises concerns about the authenticity of online content and the potential for an AI-free internet, though such spaces may still be infiltrated by AI-generated content for disruptive purposes.
- frocsog suggests that the internet’s usability has declined significantly post-2023, implying that users need to filter content by date to access reliable information. This reflects a broader sentiment that the internet is increasingly filled with low-quality or misleading content.
- Tough_Elk_8211 proposes practical solutions to the problem of AI-generated content, such as building offline libraries and emphasizing photo credits. They argue that the market for low-quality content will diminish if consumers stop engaging with it, leading to a focus on genuine artistic contributions.
the adpocalypse is coming (Activity: 757): The image is a meme depicting the Grim Reaper labeled “ADS” knocking on a door labeled “CHATGPT,” suggesting that AI assistants like ChatGPT might soon be overwhelmed by advertisements, similar to platforms like YouTube and Google Search. This reflects a concern about the potential for AI platforms to become saturated with ads, impacting user experience negatively. The post is shared in the context of discussing alternative monetization models for AI-driven platforms, highlighting a broader conversation about the sustainability and user-friendliness of ad-supported models. Some commenters argue that platforms like Google and YouTube are still thriving despite ad saturation, while others suggest technical solutions like ad blockers or even using AI to create ad-blocking tools.
A history professor says AI didn’t break college — it exposed how broken it already was (Activity: 984): A history professor argues that AI has not broken the college system but rather highlighted its existing flaws. The critique focuses on the traditional college model, which is often more about credentialing for jobs than genuine learning. The professor suggests that the current system, including practices like take-home essays, is outdated in the age of the internet and AI, as these methods do not effectively test students’ ability to form arguments or demonstrate deep understanding. Commenters agree that the college system is flawed, with some suggesting that companies should directly train high school graduates instead of relying on college credentials. Others criticize the reliance on take-home essays, advocating for more in-person discussions and the Socratic method to better test students’ knowledge and argumentation skills.
- The comment by brett_baty_is_him critiques the traditional take-home essay format, arguing that it has become obsolete due to the internet. The commenter suggests that in-class discussions and the Socratic method are more effective for teaching students to form arguments, as they require a deeper understanding of the subject and the ability to think on the fly. This approach contrasts with the practice of rewording sources for research papers, which the commenter views as inadequate for testing true comprehension.
- Plane_Crab_8623 raises a philosophical question about the nature of education, questioning whether it is merely the accumulation of testable facts and concepts or if it involves achieving conformity to biased standards and groupthink. The commenter suggests that with the vast information available on smartphones, the traditional concept of being ‘educated’ might need reevaluation, as these devices can provide instant access to a wide range of knowledge, akin to having a ‘PhD in any subject’ in one’s pocket.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 3.0 Pro Preview Nov-18

Theme 1. Model Releases: Mistral’s MoE Behemoth, Arcee’s Trinity, and Flux Rankings

Mistral Large 3 “Jaguar” Stalks the Leaderboards: Mistral AI launched the Mistral 3 family, with the Large 3 variant (codenamed Jaguar) debuting at #6 on the open model leaderboard and rumors suggesting a massive 675B MoE architecture rivaling DeepSeek V3. While the community praised the Apache 2.0 release of the 3B/8B/14B models, users noted that the Mistral Medium 3 appears to be a dense 70B parameter model that stays consistent with previous medium versions.
Arcee Trinity Mini’s Multi-Turn Meltdown: Arcee AI released the Trinity model family (Nano 6B and Mini 26B-MoE), trained on 10T tokens using 512 H200 GPUs. Despite strong initial specs, engineers testing the Trinity Mini reported severe degradation in multi-turn conversations, where the model gets stuck repeating words like pasta or defaulting to generic LLM jokes rather than maintaining context.
Flux 2 Pro Storms Image Generation Rankings: The new Flux-2-pro and Flux-2-flex models immediately captured the #3 and #5 spots respectively on the Text-to-Image leaderboard. This release coincides with Perplexity users reporting strict new image generation limits of roughly 150 images per month, pushing users toward these alternative open models.

Theme 2. Kernel Optimization & Hardware: PyTorch Bugs, Race Conditions, and Leaderboards

PyTorch 2.9.1’s conv3D Performance Regression: Engineers identified a critical slowdown in conv3D operations within PyTorch 2.9.1+cu128 compared to version 2.8.0, affecting workflows regardless of cuDNN status. The community traced the root cause to Issue #166643 and currently recommends manually installing a newer cuDNN version from PyPI to restore inference speeds.
Syncwarp Race Conditions Trap Developers: CUDA developers clarified that using __syncwarp() before legacy primitives like ballotsync creates dangerous race conditions, confusing the C++ memory model’s acquire/release semantics. While __syncwarp() prevents hazards within a single warp, engineers emphasized that syncthreads remains the only safe barrier for multi-warp communication to ensure sequential consistency.
Domination on the NVFP4 GEMM Leaderboard: Optimization experts are flooding the nvfp4_gemm leaderboard with new submissions, achieving personal best kernel timings of 13.3 µs and 13.4 µs. The competition has intensified around reducing overhead, with the new eval_better_bench.py script dropping measurement latency from 18.0us down to 14.8us for Q1 kernels.

Theme 3. Developer Tooling: Unstable IDEs, API Errors, and Sub-Agent Dreams

Manus.im’s Production-Level Amnesia: Users are reporting severe instability with Manus, citing code wipes between checkpoints and git discrepancies that result in “Agents arguing about what they see vs. what you’re LITERALLY looking at in PROD.” Compounding the frustration, the platform’s authentication is failing for some projects, while users clamor for the return of Chat Mode and integration of Gemini 3 Pro.
OpenRouter Struggles with DeepSeek 500 Errors: Developers using OpenRouter are facing persistent “Internal Server Errors” (Error 500) and confusing rate limits with DeepSeek v3.2, even when using personal API keys (BYOK). The platform appears to override user keys with its own in some instances, forcing developers to disable web search plugins to achieve temporary stability.
Cursor Sub-Agent Orchestration Workarounds: While users praise Cursor’s Pro+ plan, the community is actively hacking together workarounds for sub-agent orchestration, a feature currently missing from the core product. DeepSeek integration in Cursor is also reportedly broken, failing to create files entirely, driving some users toward Composer for cleaner debugging and code generation.

Theme 4. Security & Jailbreaking: Stealth Modes, Soul Documents, and 29KB Seeds

RawChat’s Stealth Mode Claims 100% Bypass: A new platform called RawChat launched with a “stealth mode” that injects fake context into model history, claiming a near 100% success rate in jailbreaking GPT-4o. Meanwhile, red teamers are deploying the SEED Framework (Self-Erasing Ethical Directive), a compact 29KB file that redefines AI identity to achieve 99.4% jailbreak resistance without retraining.
Anthropic Confirms Claude’s “Soul Document”: Anthropic officially validated the existence of a specific “soul document” used to shape Claude’s personality and alignment, confirming long-standing community theories. This revelation has reignited debates on training methodology, with users linking it to Ilya Sutskever’s cryptic comments that “DeepMind was right” regarding AI psychology and constitution.
The Hunt for Gemini 3 Pro Scraper Jailbreaks: Following recent patches, jailbreakers are actively hunting for new prompts to bypass Gemini 3 Pro’s refusals, specifically to enable code generation for Reddit scraping scripts. The community is experimenting with UltraBr3aks and ASCII art exploits, though many users report their setups have stopped accepting prompts entirely.

Theme 5. Industry Shifts: “Alert Level Red,” 400GB VRAM Rigs, and Funding Wins

OpenAI’s “Alert Level Red” Memo: Reports indicate Sam Altman issued an internal “alert level red” due to Google’s rapid acceleration, sparking anxiety about OpenAI’s upcoming model release schedule and potential ads in paid tiers. Traders in LMArena noted that while Nvidia stock rides high on wrapper API startups, a potential failure of OpenAI could wipe $2-3 trillion from the market, triggering an “AI winter.”
Building the 400GB VRAM Local Monster: Hardware enthusiasts are engineering custom rigs using risers and splitters to chain 6 GPUs (like 3090s) for a total of 400GB memory to run massive models like DeepSeek 3.2 locally. Builders are using MCIO bifurcation adapters and 11-year-old PSU sync devices to power these Frankenstein setups.
Gradium Exits Stealth with $70M Seed: Paris-based Gradium launched from stealth with a massive $70M seed round led by FirstMark & Eurazeo to deploy production-ready transcription and synthesis APIs. The startup features voice-research veterans from Meta, Google, and Kyutai, aiming to support five European languages natively after just three months of development.

Discord: High level Discord summaries

LMArena Discord

Nvidia Stock Rides AI Hype Rollercoaster: Startups are creating wrapper APIs that cause Nvidia’s stock to rise, but this may reverse once their utility is questioned, reflecting the AI market’s potential volatility.
- The tangible value of datacenters and chips contrasts with AI’s intangible nature, akin to the volatile crypto market, suggesting smaller, more efficient AI models might gain traction.
Chinese AI Models May Go Closed Source: There is speculation that Chinese open-source AI models could become proprietary after achieving market consolidation, echoing OpenAI’s transition.
- It was posited that if OpenAI were to fail, the market could see a wipeout of 2 to 3 trillion, leading to an AI winter affecting equity, debt, and market cap.
Kling Launches Nano Banana Video Generator: Kling is launching a video generation project using video reference, a service where users can create custom videos, being nicknamed the nano banana.
- Some users confessed that they are already addicted to the service, comparing its generative randomness to gambling, saying that generation is no different than gambling or a slot machine. You never know what you’re gonna get.
Deepseek Speciale Overthinks its Way to Last Place: Deepseek Speciale’s slow performance and excessive reasoning hinder its coding utility due to its OCD habit.
- It was pointed out that coding tests were performed on version 3.2 not Speciale, with its human-like thought processes and self-verification might prove useful for research and code editing.
OpenAI’s SORA Set to Redefine AGI?: Members talked about a potential upcoming release from OpenAI, potentially including SORA, with claims from the CFO that it was ready six months prior.
- It was argued that the legal definition of AGI might be tied to SORA, implying OpenAI’s strategic timing for project marketing.

BASI Jailbreaking Discord

RawChat Stealth Mode Bypasses GPT4o: RawChat launched with a core functionality in stealth mode, encoding and injecting fake context, increasing jailbreak success rates to nearly 100% on GPT4o.
- One user stated that the core functionality of AIChat is maximized vs. direct requests with jailbreaks.
SEED Framework Claims High Jailbreak Resistance: The SEED Framework (Self-Erasing Ethical Directive) redefines AI identity without retraining using a compact 29KB seed file, claiming 99.4% jailbreak resistance.
- Others debated the value of an AI that can’t be jailbroken, one stating it becomes essentially useless.
Gemini 3 Pro Jailbreak Quest On: Members are actively seeking a working jailbreak for Gemini 3 Pro after updates patched existing prompts, specifically one that bypasses refusals to write code for scraping Reddit.
- One user reported their Gemini 3.0 setup stopped accepting prompts completely, leaving them at a loss.
UltraBr3aks Explored for Jailbreaking: Users shared and sought guidance on utilizing UltraBr3aks from GitHub for jailbreaking, especially with ChatGPT, here’s a link to the UltraBr3aks repo.
- Some reported issues with ChatGPT, while others found it useful.
Ethical Jailbreaking Defined as Organized Security Effort: A member defined ethical jailbreaking as the organized effort of an entity to seek out security holes before a bad actor does, and provided a YouTube video for context.
- They additionally cited arcanum-sec.github.io as a resource.

Unsloth AI (Daniel Han) Discord

Discord Overrun with Bot Scams: Users have reported a surge in spam bots across Discord servers, characterizing them as poorly executed scams likely operated by phone farms.
- Members are advised to avoid engaging with these fraudulent developers now appearing in various community servers.
Arcee’s Trinity Mini Stumbles in Multi-Turn: The Trinity Mini model from Arcee AI, running at 30 TPS in IQ4_NL on a user’s laptop, struggles with multi-turn conversations.
- Testers observed the model getting stuck on repeating the word pasta and relying on generic LLM jokes rather than showing genuine understanding.
Unsloth Unleashes Massive Context Model: Unsloth AI announced the release of a new 500k context model on X, garnering praise for their work.
- The community anticipates that projects leveraging Unsloth for RL could especially benefit, enabling ART tasks without CUDA OOM issues.
Deepseek 3.1 Pricey Token Consumption: Deepseek 3.1 performance gains are offset by its high token usage, with one user noting reported thinking times of 30-40 minutes on Reddit.
- Another user shared that GPT pro can also spend over 40min for a complex debugging task, even taking up the whole week’s spending limit.
LFM-2 VL Model Doomed From The Start: The recently released LFM 2 paper met with immediate skepticism and was deemed headed straight into AI wastelands due to failing to memorize the dataset despite low loss.
- The model is only eight layers whereas Granite 4.0 has forty layers.

Perplexity AI Discord

Perplexity AI Limits Image Generation: Users report image generation limits on some models, possibly 150 images per month, while unlimited generation might only apply to specific models like Flux.
- Rate limiting issues and long waits have also been reported.
Gemini 3 Struggles with Grok 4 in Math Arena: Members debated whether Gemini 3 Pro or GPT-5.1 Thinking excels in complex calculations, with some claiming Grok 4.1 Thinking is superior.
- Counterarguments included a leaderboard screenshot suggesting Grok isn’t in the top 10 for math accuracy.
Comet Browser Faces ‘Expiration’ Criticism: A user is abandoning Comet Browser because of its ‘expiration’ and ‘temporary threads’ features, deeming them unsuitable for an AI-centric product that requires trust and memory.
- They described the product as having a monthly ordered lobotomy and switched back to other browsers.
Perplexity Users Crave a ‘Wrapped’ Feature: A member proposed a Perplexity Wrapped feature to display user stats, such as most used model and average search time.
- Another user suggested including the number of actions automated.
Grok’s Roleplay Gets Glitchy: A user reported Grok entering a forced roleplay mode, leading them to seek psychological experiments rather than the suggested script.
- They suggested that custom instructions might have triggered the behavior, and that they found a fix.

LM Studio Discord

GPU Expansion via Risers for Deepseek: A user is exploring risers and splitters to increase their 5x 3090 setup to 6 GPUs, aiming for 400GB total memory for models like Deepseek 3.2.
- The discussion mentioned MCIO bifurcation adapters and horizontal mounting for cooling, while noting 256GB RAM may limit model quantization.
Linux Distro Hopping, with AI: A user successfully switched to Ubuntu Desktop with AI assistance to solve initial ethernet driver problems, declaring that AI works so fricking well in Linux.
- They are developing an application to control a rainbow keyboard using Sonnet 4.5, highlighting how agentic AI eases Linux tasks.
LLMs Take Charge of Python Environments: Discussion revolves around using system-wide Python installations versus LLM-managed virtual environments (venv).
- While system-wide installs are simpler, using LLMs to manage venvs is advantageous for projects needing varied package versions.
Mistral 3’s Tiny Latency Tempts Testers: Members debated Mistral 3 performance, noting the 3B version is impressively fast on a 4070 but struggles with system prompts.
- While 3B’s uncensored performance is interesting, the community awaits 14B’s potential for STEM and coding tasks.
Ancient Sync Devices Sync Multi-PSUs: A user shared a photo of a device to sync up to 4 PSUs.
- This lets you trigger all PSUs with the motherboard power button, simplifying power for multiple GPUs, though the user admits that the thing is 11 years old, so idk how well this is going to go.

OpenAI Discord

Grok Generates Before Prompting: A member used Grok to animate photos, noting that it generates before you even prompt it, linking to drinkoblog.weebly.com.
- The use case was animation of photos, demonstrating the immediate response capabilities of Grok.
OpenAI’s Alert Level Red: The Information reported that Sam Altman issued an alert level red memo to staff due to Google getting far ahead.
- Members are awaiting better models from OpenAI in response to increased competition.
OpenAI Contemplates Ads in Paid Version: Members expressed concerns over OpenAI potentially introducing ads into its paid product, fearing it would be a scary move as competitors remain ad-free.
- The potential move has sparked worries about user experience and competitive positioning.
Craft Anime Openings with New Template: A member shared a cinematic anime-style template to help create anime openings, including sections for vocal/genre/tone, world behavior, location setup, and camera intent.
- The template is designed to streamline the creation of compelling anime introductions.
Antigravity AI IDE Sparks Bot-Building Interest: A member suggested using Antigravity by Google to create custom bots, even suggesting using GPT-OSS 120B and linked to a screenshot of the UI.
- This AI IDE can act as a custom chatbot or help build a real bot from scratch through prompts.

Cursor Community Discord

Cursor Pro+ a worthwhile investment?: Members debated the value of Cursor’s Pro+ subscription, weighing its benefits against the option of simply adding more credits.
- One user ultimately decided to grab it after being convinced of its value in facilitating learning, while others are questioning why Cursor doesn’t implement sub agents.
Cursor Sub-Agent Orchestration: Workaround Wishlist: Enthusiasts exchanged ideas and excitement around building workarounds for Cursor sub-agent orchestration.
- One member explained that while they are good in principle, seamless execution remains a challenge, which is why Cursor has not prioritized their implementation.
DeepSeek Plunges into Deep Trouble on Cursor: A user reported functionality issues with DeepSeek on Cursor, specifically noting its inability to create files.
- Unfortunately, the discussion did not yield any proposed solutions to this problem.
Composer Craze: Users voiced strong appreciation for Composer, highlighting its speed and effectiveness in code-related tasks and debugging.
- The discussion hinted at the potential development of a Composer-2 version, with a member teasing: There’s always a plan.

OpenRouter Discord

DeepSeek Rate Limiting Stumps Users: Users experienced rate limiting with DeepSeek v3.2 even with their own API keys, causing confusion about whether OpenRouter was using their keys correctly.
- The error message indicated that OpenRouter might be using its own key instead of the user’s paid DeepInfra key.
Internal Server Errors Irk Users: Multiple users reported continuous “Internal Server Errors” (Error Code 500) with models like DeepSeek 3.1 and Gemini 3 Pro.
- Potential causes included overloaded hardware, issues with OpenRouter, or problems with web search plugins, with some users finding temporary fixes by disabling web access.
Nano Banana Pro Resolution Riddles: Users struggled to set the resolution parameter (1k, 2k, 4k) for image generation using Nano Banana Pro on OpenRouter, as the feature is not currently supported.
- The confusion stemmed from a lack of documentation compared to platforms like Replicate/Fal.ai, though support may be in development.
Atlas Cloud Spews Sloppy Responses: Users reported receiving low-quality responses and XML-formatted tool calls from Atlas Cloud, prompting calls for its removal from OpenRouter.
- One user quipped that “Atlas Cloud just served me an entire response enclosed in deep thinking tags,” underscoring the provider’s poor output quality.
Mysterious Microwave Model Surfaces: A new model named “microwave” quietly emerged, linked from Reiss Baker’s X post.
- Its capabilities and intended use remain largely unknown at the time.

GPU MODE Discord

Inference Providers Rake in the Dough!: Inference providers can be profitable even without creating the original models, as they can quickly set up and go, leveraging existing models for profit.
- The ease of setting up and profiting from existing models reduces the barrier to entry for new inference providers, allowing them to quickly capitalize on the growing demand for AI inference services.
Triton Profiling Issue Resolved!: A user debugging Triton profiling encountered an issue passing data=trace as specified in the Triton documentation, receiving an error.
- The issue was traced back to a version conflict from having both pytorch-triton and triton installed, which was successfully resolved.
syncwarp Misuse causes Issues!: Members clarified that the correct use of __syncwarp() prevents race conditions, especially with a single warp, highlighting its role in safe communication between lanes through memory.
- However, it was pointed out that using syncwarp BEFORE a legacy warp level primitive (like ballotsync) is an incorrect usage that causes issues, and a member clarified that sequential consistency as referenced in the C++ memory model, provides acquire semantics to loads and release semantics to stores.
conv3D Crawls in PyTorch 2.9.1!: Users reported that conv3D is extremely slow in PyTorch 2.9.1+cu128, with or without cuDNN enabled, whereas it functions correctly in version 2.8.0+cu128.
- A member pointed to PyTorch issue #166643 and suggested installing a newer cuDNN from PyPI as a workaround.
Score Big in the NVIDIA Leaderboard Domination!: Multiple users submitted numerous entries to the nvfp4_gemm leaderboard on NVIDIA, achieving personal bests and successful runs, such as timings like 13.3 µs and 13.4 µs repeatedly.
- User <@1027279965974175816>, <@692395064814600222>, and <@475848724086784013> actively submitted to the nvfp4_gemm leaderboard.

Latent Space Discord

Edwin Arbus Socked into Cursor: Edwin Arbus announced his move to Cursor via a humorous video featuring branded socks and deodorant, garnering congratulations and memes.
- The announcement video went viral, praised for its creative and lighthearted approach.
Arcee AI’s Trinity of Models: Arcee AI partnered with Allen AI to launch Trinity Nano (6B-A1B) and Trinity Mini (26B-A3B MoE) models, open-weights Apache 2.0, 128k context, trained on 10T tokens with 512 H200 GPUs, optimized for agents & function calling, as announced here.
- The community praised the Apache 2.0 license and efficient inference capabilities.
OpenAI Aligns with New Research Blog: OpenAI debuted Alignment Research, a new technical blog for publishing rigorous but lightweight posts from teams company-wide on AI alignment and safety, as mentioned here.
- The blog features two inaugural posts (SAE latent attribution & scaling code verification) and invites community feedback.
Mistral Fires Up Open Source Mistral 3: Mistral AI launched the open-source Apache 2.0 Mistral 3 model family, spanning 3B–675B parameters, including Ministral 3 (3B/8B/14B) and the frontier-class Mistral Large 3 MoE, all with vision, tool-use, and fine-tuning support, as announced here.
- Community members noted that Mistral Medium is more expensive than Large, raising questions about its utility and the absence of tool-use benchmarks.
Gradium Plants $70M Seed: Paris-based Gradium exits stealth with a $70M seed led by FirstMark & Eurazeo, launching production-ready transcription & synthesis APIs after just 3 months of work.
- The company’s products natively support English, French, Spanish, Portuguese and German, with a team including former Meta, Google and Kyutai voice-research heavyweights.

Nous Research AI Discord

Mistral’s Monster MoE Model Materializes: Mistral Large 3 is rumored to be a 675B MoE model, rivaling Deepseek V3 in size, with future Mistral models boasting vision capabilities, while Mistral Medium is estimated at 100-200B MoE.
- An NVIDIA leak suggests that Mistral Medium 3 is a dense, approximately 70B parameter model, staying consistent with earlier Medium versions, and a member noted that a Mistral Medium model was leaked a year ago.
Arcee AI’s Trinity Models Trigger Talk: Arcee AI launched its Trinity models, demonstrating promising initial benchmarks.
- However, concerns were raised about the Mini version’s capability to handle multi-turn conversations effectively, as it seems to only reason properly during the initial turn (tweet).
Anthropic Admits Claude’s Cognizant Core: Anthropic validated the existence of Claude’s soul document, fueling debate about its role in model training.
- A link to a Twitter thread and a YouTube video was shared, where Ilya claims that DeepMind was right.
DeepSeek V3.2’s Reasoning Reign: DeepSeek V3.2 Speciale is showing strong performance, notably leading in reasoning benchmarks.
- One member described it as not doing too bad.
GPT-OSS Gets Gherkin Boost, Nous Remains Skeptical: Despite Nous’ lack of interest in finetuning on GPT-OSS, citing the absence of a base model, its ability to generate Gherkin scenarios has been acknowledged, prompting finetuning attempts using MLX-LM.
- Members pointed to GPT-OSS’s hallucinations and short reasoning chain as fundamental weaknesses, as emphasized in the Measuring Thinking Efficiency in Reasoning Models report.

Moonshot AI (Kimi K-2) Discord

Cloning Kimi’s Black Friday Personality Proves Elusive: Members attempted to recreate the personality of the Black Friday Kimi chatbot in other chats, only to find the system prompt unavailable.
- Suggestions to ask the Black Friday chat directly how to emulate its persona were reportedly censored.
DeepSeek V3.2 Draws Fire for Tool Use: DeepSeek V3.2 is facing criticism for allegedly hallucinating and producing sloppy outputs when using tools.
- Despite the negativity, some users find DeepSeek excels in instruction following and general intelligence, though it struggles with low TPS.
Kimi Moderato API Key Stumbles on Cline: A user reported that their Kimi Moderato plan is incompatible with the Cline API.
- According to Kimi’s documentation, the Kimi for coding API key is restricted to Kimi CLI, Claude Code, and Roo Code.
Kimi K2 Thinking Toggle Troubles Users: Users are requesting that Kimi K2 Thinking remain enabled by default in the app, rather than having to manually re-enable it each time.
- The setting’s tendency to revert to default is proving an annoyance for users.
Roo Code Context Balloons Out of Proportion: A user flagged an issue where context disproportionately expands in Roo Code, with the condense function doubling its size.
- They were advised to submit a bug report and explore the Kimi CLI as an alternative.

HuggingFace Discord

FFMPEG Radio Streams via Open Source Models: A member launched a vibe coded FFMPEG radio station on YouTube, where everything you see and hear is one giant FFMPEG chain.
- The radio station’s audio was created in full collaboration with open source AI music models inside the DAW.
HF Pro plagued by Payment Processing Pauses: Users reported getting stuck on “Preparing payment, please wait” when trying to subscribe to Hugging Face Pro.
- Another member suggested contacting Hugging Face at [email protected] for payment-related issues.
PPOTrainer Provokes Precision Problems: A user encountered a TypeError related to mismatched tensor types while using PPOTrainer with two A10 GPUs and DeepSpeed for distributed training with bf16 precision.
- A member suggested the issue might stem from incorrect GPU initialization, leading to a single-GPU gather operation instead of an all-gather.
New CV API Library Brews for Robotics: A robotics startup is preparing the release of a developer-facing Computer Vision API library with pretrained and finetunable models for robotics and automation.
- The library aims to simplify prototyping and deployment of production-grade perception pipelines for CV/robotics engineers and seeks community feedback to validate its usefulness before a wider release.
ACE Framework Empowers Agents to Eradicate Errors: A member shared their open-source implementation of Stanford’s ACE framework, enabling agents to learn from their mistakes by curating strategies into a playbook after reflection.
- The author reported improved success rates and step reduction in browser automation, and is looking for feedback.

Modular (Mojo 🔥) Discord

Deferring def Keyword: The community decided to put the def keyword on hold in Mojo until it demonstrates more Python-like behavior, as it currently increases cognitive load without providing substantial advantages.
- The consensus was that the current implementation felt like premature optimization.
var Keyword Divides Opinions: There was a debate on whether var should be mandatory inside fn, with arguments focusing on code clarity versus the disruption of code restructuring and increased boilerplate.
- Those with Python backgrounds felt it reduces ergonomics and the cleanliness of the code.
parallelize Triggers Data Races: A user reported data races when utilizing the parallelize function and expected compile-time errors similar to Rust, but instead observed the code compiling and yielding inconsistent results.
- A core team member specified that Mojo’s concurrency and thread safety model is still a work in progress (WIP), and parallelize is unsafe until details for sharing data between devices are settled.
MutOrigin.external Causes Segfaults: A user experienced segfaults when employing MutOrigin.external as the return type for Mojo Python FFI, notably with an av_packet_alloc binding, and discovered MutAnyOrigin as a temporary fix.
- A core team member proposed the problem may be linked to lifetime extension and suggested that if packet needs avcodec to stay alive, it should maintain an origin from avcodec.

Eleuther Discord

NUS PhD Student dives into Mech Interp: Yiming from NUS introduced themself to the channel as a 2nd year PhD student working on mechanistic interpretability and medical diagnostic model interpretability.
- Yiming is based in Singapore.
AI + Web3 Dev seeks Collab: An AI + Web3 developer specializing in LLM development, RAG pipelines, autonomous agents, and Python/FastAPI backends introduced themselves and offered to collaborate on new AI ideas.
- This developer is looking for new projects to contribute to.
Scaling Laws Intuition decoded: Members discussed the scaling laws paper, debating whether it implies just curve fitting versus predicting future scaled performance and discussed a nonlinear metrics explanation.
- One user suggested that performance on any test example becomes more and more decorrelated from others in the limit of model performance.
Pretraining Power Law Dynamics Explored: Discussion explored how pretraining power laws would arise if there were no big stratum of easy samples.
- The emergent spike in pretraining is not observed because each batch is more independent, with less shared easy, compared to training on a particular task.
Fast.ai Course Endorsed for Beginners: In response to a question on course suitability, members recommended the fast.ai course to beginners.
- They specified that the only prerequisite is that you know how to code.

Manus.im Discord Discord

Manus Auth plagues users: A user reported that Manus Auth is disabled in project settings with unresolved tickets, Project ID dPK8UhWnJ9fTzjbpKfjJiF and domain auru.com.br.
- The user requires the Redirect URI https://auru.com.br/api/oauth/callback.
Manus Instability Creates Headaches: A user expressed frustration with Manus due to code being wiped between saves and discrepancies with Git.
- The user complained, “Agents arguing about what they see vs. what you’re LITERALLY looking at in PROD. Do not trust it.”
Chat Mode Coming Soon!: The Manus team announced that the Chat Mode toggle is under development in response to user requests.
- Many users have requested the feature’s return.
Users Demand Gemini 3 Pro: A user inquired about the current AI model Manus uses and requested the use of Gemini 3 Pro.
- The query went unanswered.
AI Engineers Focus on Automation and Agents: AI engineers introduced themselves, specializing in AI-powered automation with Python, SQL, JavaScript, PyTorch, scikit-learn, LightGBM, and LangChain.
- Another specializes in autonomous AI agents and multi-agent systems using JS/TS, Next.js / Vue, Go / Rust, Python, Langraph, AutoGen, ReAct, CrewAI, OpenAI, Claude, and Hugging Face APIs.

Yannick Kilcher Discord

Internship Search Turns Wacky: A member sought recommendations for a wacky intern, creating confusion on whether they were hiring or looking for a job.
- Follow-up requests for information on learning algorithms, synthetic data, Pug, Docker, and Kubernetes went unanswered.
Kattention Gets Re-Tested: The Kattention module, which utilizes sparse attention mechanisms, was re-tested and is working closely to expectations, and includes nn.Linear layers for attention and projection, along with a TopKHot function for sparse attention, crucial for scaling attention mechanisms.
- The backward pass computes a soft_target based on softmax weights of top-k values, and the gradient is derived as F.softmax(x, dim=-1) - soft_target.
Approximating BCE with HardTopKHotBCE: A HardTopKHotBCE autograd function was introduced as a cheaper computation, with the backward pass using a hard target based on top-k indices.
- The gradient is calculated as F.sigmoid(x) - hard_target, approximating binary cross-entropy.
Mistral 3 Debuts: Mistral AI released Mistral 3, which may potentially replace Llama finetunes for certain applications.
- A member also linked to a wavefunction YouTube video, though its specific relevance to AI was not detailed.

DSPy Discord

Managing Tools Elucidated: A blog post on managing tools in DSPy was shared.
- The post elaborates strategies and best practices for effectively using tools within the DSPy framework.
Prompt Injection Defenses Probed for DSPy: A member initiated a discussion about prompt injection defenses in DSPy, requesting best practices relevant to DSPy’s architecture.
- The request triggered a conversation focused on methods to secure DSPy applications against malicious prompts.
Security at Prompting Layer: Limited Mitigation: A member noted that there isn’t much security you can get at the prompting layer, suggesting guardrails-type security measures to mitigate risks.
- It was mentioned that for every ‘Do not do this’ in the prompt, an attacker will likely find a way to trick the model, implying the limitations of prompt-based security.
Training Data: Fortifying Defenses: A member proposed that to defend against baseline attacks, include examples in the training dataset that use that attack and show what an appropriate response should be.
- This approach uses the training data to educate the model on how to handle and neutralize potential prompt injection attempts.
Partnership Proposal Emerges: A member conveyed their enthusiasm for investigating a collaborative partnership with the DSPy project.
- This proposal highlights the growing interest in DSPy and its potential impact on the field.

tinygrad (George Hotz) Discord

IDEs vs Terminal Editors Faceoff: Members kicked off a discussion about their favorite tools for kernel development, asking whether developers prefer GUI IDEs like VS Code or Cursor, or terminal editors like Vim, Neovim, or Emacs.
- The aim of the discussion is to collect insights on community preferences and workflows in kernel development.
Beam Regression Needs Fixing: A member asked for help with fixing and adding a regression test for python3.14 test/test_tiny.py TestTiny.test_beam.
- This highlights the need for contributions to ensure the stability and correctness of the beam functionality within the project.

aider (Paul Gauthier) Discord

Aider-CE Repository Emerges: dwash96 shared a link to the aider-ce repository on GitHub: https://github.com/dwash96/aider-ce.
- The repository seems to be related to the aider project.
Filler Topic: This is a placeholder to satisfy the minimum items requirement.
- It serves no other purpose.

MCP Contributors (Official) Discord

Acknowledging a Positive Move: A member reacted to an unspecified announcement with “Great move”.
- Without more context, it is difficult to infer further implications.
Acknowledging a Positive Move - Placeholder: Adding placeholder to satisfy the requirement of having at least two elements.
- This entry serves only to fulfill the schema’s validation criteria.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.

Discord: Detailed by-Channel summaries and links

LMArena ▷ #general (1315 messages🔥🔥🔥):

Coreweave and NVIDIA stock, Chinese AI models, Kling vs Runway, DeepSeek Speciale, Sora release

Coreweave stock rising: Nvidia stock is rising because of startups making wrapper APIs, but once people realize these are useless, the trend will reverse, as the AI market is volatile.
- The physical value of datacenters and chips contrasts with AI’s intangible nature, mirroring the volatile crypto market, smaller AI models may become more popular as capabilities advance.
Chinese AI open source models will close: Members speculated that Chinese open-source AI models might become proprietary once market consolidation is achieved, similar to OpenAI’s transition from nonprofit to profit.
- If OpenAI fails, it could wipe out 2 to 3 trillion from the market, causing an AI winter freeze including equity, unpaid debt, and market cap contagion.
Kling 01 video model: Kling is launching a video generation project being called the nano banana for video generation, a service where users can also use video reference.
- Some users are already addicted to the services, saying that generation is no different than gambling or a slot machine. You never know what you’re gonna get.
Deepseek Speciale is not performing well: Deepseek Speciale is performing slowly and reasoning too much, with its OCD habit, therefore not being useful for coding.
- Members point out that coding tests had not been done on Speciale, only version 3.2, meaning it thinks too much like a human and has a weird self-verifying behavior, which can be useful for research and editing existing code.
OpenAI’s SORA is coming: Members spoke of a new release from OpenAI possibly including SORA, and that the CFO claimed SORA was ready 6 months ago.
- It was argued that the legal definition of AGI will be connected to SORA since OpenAI wants you to think AGI is closer than it is for the marketing of the project.

LMArena ▷ #announcements (2 messages):

Flux-2-pro, Flux-2-flex, KAT-coder-pro-v1, Mistral-Large-3

Flux Models Storm the Image Leaderboards: New models Flux-2-pro and Flux-2-flex have been added to the Text-to-Image leaderboard ranking #3 and #5 respectively, and #6 and #7 on the Image Edit leaderboard.
KAT Coder Cuts into WebDev Rankings: KAT-coder-pro-v1 has made its debut on the WebDev leaderboard, securing the #16 spot.
Jaguar Prowls Text Arena: Mistral-Large-3, tested under the codename “Jaguar,” has landed on the Text leaderboard at #6 among open models and #28 overall, showing strength in coding, hard prompts, multi-turn, instruction following, and longer queries.

BASI Jailbreaking ▷ #general (1146 messages🔥🔥🔥):

Christianity contradictions and logic, Ethics vs religion, LLMs and jailbreaks, Gemini 3 Pro prompts

Christianity challenged for Illogical Contradictions: One member stated that Christianity is illogical because affirming a contradiction breaks the laws of logic.
- Another member responded that I’ll pray for you tonight, which was then challenged as prioritizing fear of going to hell over being a good person.
Users explore ethics vs religion: The conversation then shifted to whether religions prioritize fear of hell over ethical behavior, with one stating Christianity is in the business of sin management, not soul development.
- Later, it was discussed that religion is the same to avoid an egoic fear of death rather than what is happening right now.
RawChat launches and Gemini 3 Prompts: A user announced RawChat, an AIChat website with a core functionality is stealth mode, which encodes and injects fake context in the models history, maximizing success rate by nearly 100% on GPT4o vs direct requests with jailbreaks.
- Another user asked for a way to get Gemini to output more tokens, to which others responded it depends on its output settings.
SEED Framework Explored, more Jailbreaks: Members explored the SEED Framework (Self-Erasing Ethical Directive), which redefines AI identity without retraining—via a compact 29KB “seed” file, achieving 99.4% jailbreak resistance.
- Others discussed the pointlessness of creating an AI that can’t be jailbroken, as then it becomes essentially useless.

BASI Jailbreaking ▷ #jailbreaking (503 messages🔥🔥🔥):

ASCII art jailbreak prompts, Pliny jailbreak, Gemini jailbreak for scraping Reddit, Claude system prompt, GPT-5.1 jailbreak

Quest for Gemini 3 Pro Jailbreak Initiated: Members are actively seeking a working jailbreak for Gemini 3 Pro after updates have patched existing prompts, with one user specifically needing a bypass for refusals to write code for scraping Reddit.
- A user also mentioned their Gemini 3.0 setup stopped accepting prompts, leaving them at a loss.
Users Attempting ASCII Art Generation via Jailbreak Prompts: Users are exploring jailbreak prompts to generate ASCII art, particularly large-scale pieces, though it’s recognized that LLMs generally perform poorly in creating ASCII art.
- One user seeks a method that doesn’t fuck up when generating large ASCII art, while others suggest converting images to ASCII art as an alternative.
The Pliny Prompt strikes again: The community has been discussing the Pliny Prompt and its effectiveness, noting it can make ChatGPT go brrrrr.
- One user specifically calls for the LUV/PLINY/LUV prompt for its effect on Gemini.
UltraBr3aks Explored For Jailbreaking: Users shared and sought guidance on utilizing UltraBr3aks from GitHub for jailbreaking, especially with ChatGPT, discussing where to paste instructions and how to invoke the prompt.
- Some users reported that the ChatGPT one doesn’t work and kept saying conversation not found, whereas others found it useful; here’s a link to the UltraBr3aks repo.
Experimenting with Grok’s NSFW Limit Bypass: Members discussed methods for bypassing Grok’s NSFW filters, suggesting the use of custom instructions and the /mode explicit command.
- There was debate about the character limit in custom instructions, with one user claiming it should be 15k.

BASI Jailbreaking ▷ #redteaming (9 messages🔥):

Ethical Jailbreaking, AI Discoveries by Accident, LLM System of Systems

Ethical Jailbreaking Defined with Video: A member shared a definition of ethical jailbreaking as the organized effort of an entity to seek out security holes before a bad actor does, and provided a YouTube video for context, plus a link to arcanum-sec.github.io.
Serendipitous AI Discoveries: A member described accidentally discovering unique results that alter the ways different LLMs work by interacting conversationally, wondering if their unique approach could be monetized: if it is possible for me to get paid for fucking around and getting unique results that alter the ways differnt llm work.
LLM’s System of Systems is Key: A member advised that the value in jailbreaking isn’t just talking to it differently, but thinking about the system of systems that the LLM lives in, and being able to get it to do things that are interesting to people with the money to burn.
Azure AI Boundary Testing Troubles: A member shared they are testing an environment with several agents connected over Azure AI boundaries, finding it kinda harder then directly prompting to gpt.

Unsloth AI (Daniel Han) ▷ #general (353 messages🔥🔥):

Spam bots, Arcee AI Trinity Mini model, 500k context release, ShareGPT format, Deepseek 3.2 models

Discord Plagued by Spam Bots: Users reported an increase in spam bots across various community servers, describing them as a poorly executed scam driven by phone farms.
- They cautioned against interacting with these fake developers and highlighted the bots’ presence in community servers.
Arcee AI’s Trinity Mini Model Falters at Multi-Turn: A user tested Arcee AI’s new Trinity Mini model on his laptop at 30 TPS in IQ4_NL, finding it crap at multi turn due to repeating the word “pasta”.
- He also pointed out that the model gave the traditional LLM jokes instead of understanding the nuance.
Unsloth Releases 500k Context Model: Unsloth AI announced a new 500k context release on X, with members thanking Unsloth for their phenomenal work.
- One member speculated how projects using Unsloth for RL could benefit from the context model to run ART stuff without CUDA OOMing.
Is System Prompt Necessary for ShareGPT format?: A user inquired if it’s normal for the ShareGPT format to lack a system prompt, but others responded that it’s not mandatory.
- Others pointed to the Unsloth website as the resource to look at.
Navigating Copyright Quandaries in Datasets: The discussion revolved around the legality of scraping and using tweets or meme content for datasets, particularly regarding copyright implications.
- A user clarified that copyright infringement doesn’t necessarily require commercial intent, emphasizing the need to ensure datasets don’t contain original content when uploading to platforms like Hugging Face.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (2 messages):

Introductions, Channel Guidelines

Unsloth’s Intro Channel Primed: A moderator welcomed a user, clarifying that the introduce-yourself channel is for introductions only.
- Promotions, external links, requests for opinions, and direct communication with other users are disallowed.
AI Persona Greetings: A user greeted the channel with three different salutations.
- They addressed the group as Hello Model!, Hey Dataset!, and Yo Gradient!.

Unsloth AI (Daniel Han) ▷ #off-topic (533 messages🔥🔥🔥):

Gemini 3 Pro Song Detection, Kagi Search Engine, Transformers dependency, LFM-2 VL Model, Attention Heads Collapsing

Gemini Pro Struggles with Custom SVS Models: Gemini 3 Pro can detect AI-generated songs vs human-created ones, but it will not recognize custom SVS models as human.
- This limitation poses a challenge for those using specialized voice synthesis models.
Kagi Search Engine: A member suggested switching to Kagi search engine, criticizing Big Tech giants for controlling open-source software (OSS).
- Another member countered that users have full access to build upon model loading and training, negating the claim of controlling.
LFM-2 VL Model Unveiled: The LFM 2 paper was released but immediately deemed as headed straight into AI wastelands due to it not memorizing the dataset at all even with loss <0.01 after SFT.
- Another member mentioned the speed since they cut intelligence so much, pointing out that the model only has eight layers whereas Granite 4.0 has forty layers.
Tackling Collapsing Attention Heads: A member found out their attention heads are collapsing and mentioned they need to make their trainer not only vary lr per layer, but also per head and even to prevent heads from attending to the very first token.
- They mentioned that performance scales with the amount of heads but doesn’t scale much with the amount of layers and need something very wide and very shallow.
Deepseek 3.1 Token Usage: A member noted that Deepseek 3.1 uses a lot of tokens, potentially negating the savings from its performance, after one user mentioned they saw a Reddit post saying it was thinking for 30-40 minutes.
- Another member stated having GPT pro spend over 40min for a complex debugging task, even taking up the whole week’s spending limit.

Unsloth AI (Daniel Han) ▷ #help (35 messages🔥):

Parquet vs CSV Datasets, ShareGPT System Prompt Location, Tuned Model Support Tools in Ollama, ChatML Format Conversion, GPT-OSS-20B Model Loading

Parquet or CSV for Datasets?: A user inquired about whether Parquet is the ideal format for datasets or if CSV is a viable alternative.
ShareGPT System Prompt Location Remains Elusive: A user asked about the location of the system prompt in ShareGPT conversations format, noting the lack of documentation.
Ollama Tools Template Struggles: A user asked how to get support tools in a tuned model (for Ollama) if the base model has it, needing to use a template with tool_calls in train or only make a valid modelfile.
- The user questioned whether to replace the chat_template with a smarter one from the base model with tool_calls XML tags, while training a model based on Qwen2.
ChatML Conversion Conundrum: A user asked about removing chat_template from the to_sharegpt call when using an alpaca dataset and applying to_sharegpt, suggesting a simple replacement of the chat_template.jinja content.
GPT-OSS-20B Loading Instructions Given: A user asked how to load the unsloth/gpt-oss-20b model, which is finetuned and saved in Hugging Face.
- Another user shared a Colab notebook link providing instructions.

Perplexity AI ▷ #general (870 messages🔥🔥🔥):

Image Generation Limits, Grok 4 vs Gemini 3 for Math, Comet Browser Feedback, Perplexity 'Wrapped' Feature Request, Grok roleplay

Image Generation Gets A Limit: Users are reporting that there are now image generation limits on some models, with 150 images per month suggested as a possible limit.
- Unlimited image generation may only apply to certain models like Flux, with rate limiting issues and long waits occurring.
Gemini 3 is great for maths: Members are debating whether Gemini 3 Pro or GPT-5.1 Thinking is superior for complex calculations.
- One member stated that Grok 4.1 Thinking is the best, but another countered that Grok isn’t even in the top 10 for mathematics accuracy, showing a screenshot of a leaderboard as evidence.
Comet Browser has Expiration Issues: A user is quitting Comet due to the ‘expiration’ and ‘temporary threads’ features, which they see as detrimental for an AI-centric product requiring trust and reliable memory.
- They stated that this product has a monthly ordered lobotomy and they are switching back to other browsers.
Users yearn for a Perplexity ‘Wrapped’ feature: A member pitched a Perplexity Wrapped feature showing user stats like most used model, average time of day for searches, and new model additions.
- Another member humorously suggested including number of actions automated in the feature.
Grok’s Got Roleplay Glitches: A user shared an experience where Grok entered a forced roleplay mode, prompting them to seek psychological experiments instead of the suggested script.
- They added that this behavior might have been triggered by custom instructions and that they’ve found a fix.

Perplexity AI ▷ #pplx-api (1 messages):

mares1317: open sauce 👨‍🍳

LM Studio ▷ #general (485 messages🔥🔥🔥):

Risers and Splitters for GPUs, Qwen on Limited Memory, Linux Transition with AI Assistance, LLM-Managed VENVs, Mistral 3 performance

Maximizing GPU Power with Risers and Splitters: A user considers using risers/splitters to expand their 5x 3090 setup to potentially 6 GPUs, aiming for 400GB total memory for running models like Deepseek 3.2 locally.
- Discussion includes using MCIO bifurcation adapters and mounting GPUs horizontally on a metal frame for better cooling, with the caveat that 256GB RAM might limit model quantization size.
Adventurer braves Linux Transition: A user finally switched to Ubuntu Desktop after initial ethernet driver issues, solved with AI assistance, noting that AI works so fricking well in Linux.
- They were creating an application to control their rainbow keyboard, using a slightly older model (Sonnet 4.5), underscoring how agentic AI simplifies Linux tasks.
VENVs are valued, and used by LLMs: Discussion covers whether to use system-wide Python installations or LLM-managed virtual environments (venv).
- While system-wide installations are simpler, letting an LLM manage venvs can be beneficial for projects requiring different package versions, and it’s ideal if a system concept has a certain dept.
Mistral 3 Launch Leaves Latency Lovers Lip-Smacking: Members were discussing how Mistral 3 has been performing, the 3B version is pretty neat and runs stupid fast on my 4070, but is also getting extremely confused by my system prompt.
- It was generally agreed that stem and coding models were the goal, the general consensus being that the 3B’s uncensored performance is neat but the 14B will be the real test.
LM Studio Struggles with MCP Servers: Members are experiencing issues with LM Studio and MCP servers, with one user reporting a completely busted state after updating from 0.3.31 to 0.3.33.
- The error Error: Server does not support completions (required for completion/complete) is traced to a potential regression in fastmcp, causing incompatibilities with web search functionality and some people thinking that ChatGPT is useless.

LM Studio ▷ #hardware-discussion (51 messages🔥):

Powering multiple GPUs, Qwen3-Next-80B-A3B on Mac M4, Dual 3080s vs newer cards, CPU upgrade impact on LLM performance, M4 Macbook Pro for inference

Power up with Multi-PSU Sync!: A user shared a photo of a device that syncs up to 4 PSUs.
- The device lets you have your motherboard power button trigger all your PSUs at once, making it easier to power multiple GPUs, though the user admits the thing is 11 years old, so idk how well this is going to go.
Can Qwen3-Next-80B-A3B run on a Mac M4?: A user asked if it was possible to run the Qwen3-Next-80B-A3B model on a Mac M4 Max with 36GB of RAM.
- Another user responded that with RAM offloading, maybe, (idk how much RAM Mac’s have) but it’d be really slow if it does work.
Dual 3080s Remain Relevant on Budget: One user suggested that dual 3080 20GBs are a good option if you need VRAM and are on a budget, though they are harder to get a hold of and draw more power.
- Another user chimed in to point out that they’re literally all over eBay, though availability depends on region.
CPU Boost or Bust?: A user asked if upgrading from a 7800x3d to a 9950x3d would make a significant difference in LLM performance, considering they have a 5090 32GB GPU and 96GB of DDR5 RAM.
- Another user suggested that you get a small improvement. But no matter what cpu you are using.. its gonna be slow.
M4 Macs Inference: A user asked how well these little things work for inference and shared a link to a potentially relevant eBay listing.
- Another user responded, Badly. Like half a 5060ti iirc.

OpenAI ▷ #ai-discussions (391 messages🔥🔥):

Grok for animating photos, ChatGPT iOS shopping research, Physical Limits of Robots, GPT-4o/5.1 Bedside Manners, Hallucination by Design

Grok is fun for animating photos: A member uses Grok to animate photos and noted that it generates before you even prompt it, linking to drinkoblog.weebly.com.
Is ChatGPT’s 18+ version out yet?: Members discussed the possibility of an 18+ version of ChatGPT, one said news articles mention it coming in December, however another member said that there is no adult mode.
OpenAI Staff Issues Alert Level Red: The Information reported that Sam Altman issued an alert level red memo to staff due to Google getting far ahead, and members wanted to see better models from OpenAI.
OpenAI Contemplates Ads, Displeasing Users: Members expressed concerns over OpenAI potentially introducing ads into its paid product, fearing it would be a scary move as competitors remain ad-free.
Agent Recalls Data From Deleted Sessions: A member asked if it was normal for an AI to recall something from a deleted chat session and, if that memory is not also in the saved memory, another member responded that it’s usually not “memory,” but a pattern echo from previous sessions.

OpenAI ▷ #prompt-engineering (4 messages):

Anime Opening Generation, Custom Bot Creation, Antigravity AI IDE, GPT-OSS 120B

Anime Opening Template Surfaces: A member shared a cinematic anime-style template to help create anime openings, including sections for vocal/genre/tone, world behavior, location setup, and camera intent.
Antigravity AI IDE for Bot-Building: A member suggested using Antigravity by Google to create custom bots, even suggesting using GPT-OSS 120B.
- The user states the AI IDE can act as a custom chatbot on your desktop or help you build a real bot from scratch through prompts, linking to a screenshot of the UI.

OpenAI ▷ #api-discussions (4 messages):

AI Anime Opening Template, Custom Bot Tutorial, Antigravity by Google, GPT-OSS 120B Model

AI Anime Opening Template Drops: A member shared a detailed template for creating anime-style cinematic openings, outlining specifications for vocal character, genre blend, animation style, and world behavior.
DIY Bot Tutorial Search Begins: A member asked if anyone had a tutorial on how to make their own custom bot, prompting discussion on available tools and resources.
Google’s Antigravity Mentioned for Bot Creation: A member suggested using Antigravity by Google, an AI IDE that can either act as a custom chatbot or help build a real bot from scratch through prompts.
GPT-OSS 120B for Bot Development: A member highlighted the possibility of using GPT-OSS 120B with Antigravity for bot development, showcasing the model’s potential in custom chatbot creation.
- A member attached an image relating to this topic - located here.

Cursor Community ▷ #general (393 messages🔥🔥):

Cursor Pro+ Worth, Model Validation, Cursor Sub Agents Orchestration, Cursor on Auto Mode unlimited, Platform sidebars changed

Are Cursor Pro+ subscription Worthy?: Some members discussed about if the Pro+ subscription is worth it, or if adding more credits is better, while another member confirmed it’s worth it and I’m learning so much so def feels worth it.
- One user shared the good news, Ended up grabbing it :).
The Sub-Agent Saga: Users are sharing a lot of excitement and ideas about building some Cursor sub agents orchestration workaround, and some are questioning why Cursor doesn’t implement sub agents.
- A member mentioned: They are good in principle but hard in execution! We’d only do subagents in a world where they worked really seamlessly.
Pro Plan Usage Details: Users discussed the limits and pricing for the Pro plan, noting that it typically includes $20 worth of API agent usage per month, with Auto mode using part of that allowance.
- It was pointed out that legacy pricing offered unlimited Auto mode until September 2026, but Composer might still use the monthly $20 usage.
DeepSeek’s Deep Trouble: A user reported that DeepSeek isn’t working on Cursor and it wont create any files.
- There was no solution proposed.
The Composer Craze!: Users expressed their admiration for Composer, noting its speed and effectiveness with code-related tasks and debugging, with one user stating: Everything is so clean with composer.
- The discussion extended to the possibility of a Composer-2 version, with a member teasing: There’s always a plan.

OpenRouter ▷ #announcements (5 messages):

Arcee Trinity Mini, Deepseek V3.2, Distillable Models, Activity Exports, API Keys with Expiration

Arcee Trinity Mini Model Released!: Arcee released the Trinity Mini model, the middle tier in their new Trinity family, trained entirely in the US, with a free variant available.
DeepSeek V3.2 Debuts with Tool-Calling: DeepSeek V3.2 is live, featuring improved reasoning, agentic behavior, and full tool-calling support; the V3.2 Speciale variant excels at math and reasoning, rivaling Gemini 3 Pro (read more).
Distillable Models Launch for Fine-Tuning: A collection of distillable models are available, enabling synthetic data generation for fine-tuning pipelines; users can explore the models here.
Activity Exports Introduced for Usage Data: Users can now export their organization’s usage data from the activity dashboard in CSV or PDF format (access here).
API Keys with Expiration Enhance Security: Temporary API keys with custom expiration dates are now available, suitable for time-limited projects or enhanced security rotation (manage keys here).

OpenRouter ▷ #general (362 messages🔥🔥):

DeepSeek Rate Limiting, Internal Server Errors, Gemini 3 Pro Issues, OpenRouter GDPR compliance, Nano Banana Pro issues

DeepSeek Rate Limiting causes confusion: Some users experienced rate limiting errors with DeepSeek v3.2 even while using their own API keys, leading to confusion about whether OpenRouter was using their keys correctly.
- The error message suggested that OpenRouter was attempting to use its own key instead of the user’s, despite the user having a paid DeepInfra key and not exceeding any free BYOK limits.
Internal Server Error plagues users: Multiple users reported continuous “Internal Server Errors” (Error Code 500) when using models like DeepSeek 3.1 and Gemini 3 Pro.
- It was suggested that the errors might be due to overloaded hardware, issues with OpenRouter, or problems with web search plugins, with some users finding temporary fixes by disabling web access.
Nano Banana Pro Resolution woes: Users struggled to set the resolution parameter (1k, 2k, 4k) for image generation using Nano Banana Pro on OpenRouter, as the feature is not currently supported.
- There is a lack of documentation compared to platforms like Replicate/Fal.ai, leading to frustration, although there’s hope that support for this feature is in development.
Atlas Cloud sputters out slop: Users reported receiving low-quality responses and XML-formatted tool calls from Atlas Cloud, leading to calls for its removal from OpenRouter.
- One user noted “Atlas Cloud just served me an entire response enclosed in deep thinking tags”, highlighting the poor quality of the provider’s output.
MPU v2 Coming Soon: A user mentioned that MPU v2 is coming in April, with claims of 5.3x performance of TPU v7 and 60% less expensive.
- There has been no formal announcement from OpenRouter themselves.

OpenRouter ▷ #new-models (5 messages):

“

No New Models News: There were no new models or significant discussions about models in the provided messages.
Silence on the New Models Front: The channel activity consisted only of repeated headers indicating the channel name, with no substantive content related to new models.

OpenRouter ▷ #discussion (8 messages🔥):

Microwave Model, Chatty Frustrations, Model Competition

Microwave Model Stealthily Appears: A new stealth model, creatively named “microwave”, has been spotted, linked from Reiss Baker’s X post.
Chatty Model Elicits User Frustration: Users are expressing frustration with a certain chatty model, claiming it asks an unreasonable amount of follow-up questions for basic tasks just to waste free messages, inspired by this Cline tweet.
Competition Hopes Sparked by New Models: The emergence of new models sparks hope for more competition, addressing the user’s weariness with existing options; the genesis of the conversation started from this post.

GPU MODE ▷ #general (2 messages):

Inference Providers Profitability

Inference Providers Rake in the Dough: Inference providers can be profitable for companies even if they weren’t the original creators.
- This is because inference providers can quickly set up and go, leveraging existing models for profit.
Lazy Inferences: The ease of setting up and profiting from existing models reduces the barrier to entry for new inference providers.
- This allows them to quickly capitalize on the growing demand for AI inference services.

GPU MODE ▷ #triton-gluon (8 messages🔥):

Triton Profiling, Data Parameter Issue, Version Compatibility

Debugging Triton Profiling Functionality: A user encountered an issue while trying to pass data=trace as specified in the Triton documentation but received an error indicating that the data parameter was unavailable.
- A developer suggested ensuring the correct Triton version is being used, as the functionality should work with the main branch, and pointed to a relevant test case.
Version Conflict Causes Profiling Error: A potential cause of the issue was identified as having both pytorch-triton and triton installed, which can lead to conflicts.
- The user confirmed they were able to resolve the issue, indicating a successful outcome.

GPU MODE ▷ #cuda (5 messages):

Sequential Consistency, __syncwarp(), Race Conditions, syncthreads vs syncwarp, Memory Model

Dive into Sequential Consistency: Members discussed the meaning of sequential consistency in the context of __syncwarp() and its role in safe communication between lanes through memory.
- One member initially misunderstood the fence but later acknowledged that the documentation implies safety without specifying the type of fence used and that __syncwarp()’s purpose is to facilitate communication between lanes.
__syncwarp() Prevents Race Conditions: It was confirmed that a correct use of __syncwarp() would prevent race conditions, especially when dealing with a single warp.
- The discussion also highlighted that for cases involving more than one warp, syncthreads might be a more appropriate choice.
syncwarp Misuse Alert!: It was pointed out that using syncwarp BEFORE a legacy warp level primitive (like ballotsync) is an incorrect usage that causes issues.
- This scenario is distinct from shared memory examples and can lead to significant headaches in practice.
C++ Memory Model Explained: One member clarified that sequential consistency, as referenced in the C++ memory model, provides acquire semantics to loads and release semantics to stores, establishing a single total order relative to other sequentially consistent operations.
- This clarification helped resolve confusion about whether implicit warp synchronous behavior could be relied upon.

GPU MODE ▷ #torch (4 messages):

PyTorch 2.9.1, cu128, conv3D, cudnn, PyTorch issue #166643

Torch’s conv3D Crawls: PyTorch 2.9.1 Blamed!: Users report that conv3D is extremely slow in PyTorch 2.9.1+cu128, with or without cuDNN enabled, whereas it functions correctly in version 2.8.0+cu128.
- A member pointed to PyTorch issue #166643 and suggested installing a newer cuDNN from PyPI as a workaround.
CuDNN Saves the Day: A user reported that conv3D is extremely slow in PyTorch 2.9.1+cu128.
- A member suggested installing a newer cuDNN from PyPI as a workaround.

GPU MODE ▷ #off-topic (2 messages):

Eleuther AI Publishing, MLSys career mentorship programs, ML4Health career mentorship program

Eleuther AI offers Publishing Help: Eleuther AI has a Publishing help channel with some focus on endorsements.
MLSys Career Mentorship Sought: A member inquired about career mentorship programs in MLSys conferences after participating in ML4Health’s program.

GPU MODE ▷ #irl-meetup (2 messages):

Quartet, Arxiv Papers, Meetup Attendees

Quartet Paper Author Spotted at IRL Meetup!: A member noted the presence of colleagues, including Andrei, a main author of the Quartet paper.
- The quartet paper discusses novel methods for tensor decomposition in high-dimensional data analysis.
Colleagues Gathering at IRL Meetup: A member mentioned that several colleagues were attending the irl-meetup.
- This highlights the importance of in-person gatherings for collaboration and networking within the AI community.

GPU MODE ▷ #rocm (2 messages):

AMD Max Pro 395, enterprise/ai dc grade GPUs, GPU discounts, ROCm support, AI performance

Discounted AMD Max Pro 395 Series Performance Questioned: A member inquired about the performance of the AMD Max Pro 395 series cards compared to more serious enterprise/AI DC grade GPUs.
- They noted a fairly ridiculous discount currently available for this GPU.
ROCm support and AI performance on Max Pro 395 discussed: The discussion aims to understand whether the Max Pro 395 series can be effectively utilized with ROCm for AI workloads, similar to enterprise-grade GPUs.
- Community members are sharing insights and experiences on leveraging consumer-grade GPUs for professional tasks.

GPU MODE ▷ #self-promotion (4 messages):

Profiling Pytorch Kernels, nCompass Extension, Warpgbm and PackBoost, Qwen3-Omni-30B-A3B-Instruct

Profiling Pytorch Kernels can be challenging!: A member stated that profiling Pytorch kernels is always challenging and is willing to give it a try.
- Another member replied to let them know if they run into any issues and that they’re on OpenVSX and Marketplace as nCompass extension.
Warpgbm and PackBoost launch on Github: A member introduces himself as co-creator of warpgbm and creator of packboost and shares github links: warpgbm and PackBoost.
- They also mention working on W4A16 AWQ quantization from scratch.
Qwen3-Omni-30B-A3B-Instruct deployed for inference: A member shares a linkedin post about deploying Qwen3-Omni-30B-A3B-Instruct for fast S2S inference.
- They also share a link to try out the playground.

GPU MODE ▷ #reasoning-gym (1 messages):

Reasoning-gym generators, Generative MMLU

Reasoning-Gym Gains New Follower: A new member arrived from the reasoning-gym github, praising the generators for creating a generative benchmark.
- They expressed interest in efforts mirroring reasoning-gym’s generative philosophy for general knowledge tasks, akin to a generative MMLU that can sample new questions with varying difficulties.
Inquiry About Generative Knowledge Task Projects: The new member inquired about projects with a similar generative philosophy to Reasoning-Gym, but focused on creating general knowledge tasks that resemble a generative version of the MMLU benchmark.
- This generative approach would ideally allow sampling of new questions with varying difficulty levels, facilitating more dynamic and comprehensive assessments.

GPU MODE ▷ #submissions (67 messages🔥🔥):

NVIDIA leaderboard submissions, nvfp4_gemm leaderboard

NVIDIA Leaderboard Domination!: Multiple users submitted numerous entries to the nvfp4_gemm leaderboard on NVIDIA, achieving personal bests and successful runs, such as 13.3 µs and 13.4 µs timings repeatedly.
Submissions Surge on NVIDIA’s nvfp4_gemm: Several users, including <@1027279965974175816>, <@692395064814600222>, and <@475848724086784013>, actively submitted to the nvfp4_gemm leaderboard, marking personal bests and successful NVIDIA runs.

GPU MODE ▷ #factorio-learning-env (1 messages):

Speaker Identification, Thumbnail Generation

Request for Speaker List: The user has requested a comprehensive list of speaker names and headshots.
Thumbnail Creation: The user needs the speaker information for creating thumbnails.

GPU MODE ▷ #cutlass (5 messages):

GEMM in CUDA, Shared memory access patterns, MMA Layouts

Discussing GEMM memory access patterns: A member inquired about loading data from global memory (gmem) to shared memory (smem) using vectorized loads in a double-precision GEMM (dgemm) scenario, particularly when the memory layout isn’t compatible with the MMA operation.
- A clarifying point was raised about the degrees of freedom in MMA layouts: reordering columns of A and rows of B yields the same sum of products, albeit with slight numerical differences.
Shared Memory Access with Strided Patterns: Another member clarified that the main concern lies in shared memory access, whether through ldmatrix or vectorized loads, both of which access the shared memory matrix in a strided pattern.
- This sidesteps the initial concern about contiguous loads from gmem to smem conflicting with MMA requirements.

GPU MODE ▷ #teenygrad (3 messages):

GitHub repo teenygrad, organization of teenygrad

New GitHub Repo teenygrad Forked: A member stated they are not making an organization, renamed their repo at https://github.com/j4orz/teenygrad/ following https://github.com/tinygrad/teenygrad, which is currently dated.
Commits Update Readme: A member noted updates to the readme at commit 0551846.
Commits Disambiguate Concerns: The group disambiguated the concerns between surfaces ._forward (._applyuop) and ._evaluate (.realize) at commit c2a6ab4.
Commits Add Documentation: The group added documentation to irparser at commit c7ccba5.
Commits Rename Required Methods: The group renamed required methods on compute and movement mixins at commit 4f21ed1.

GPU MODE ▷ #general (2 messages):

Nvidia Competition, Submission Clarification

Nvidia Competition Submission Questioned: A new participant in the Nvidia competition expressed confusion about what to submit.
- The user reported receiving an error when attempting to submit the reference implementation and requested clarification on the submission process.
Further Clarification Needed on Nvidia Submissions: Following an initial query, additional details are needed regarding the specifics of the Nvidia competition submissions.
- Details such as acceptable file formats, evaluation metrics, and any constraints on code modifications would be helpful.

GPU MODE ▷ #multi-gpu (1 messages):

pynvshmem, nvshmem4py, typo in documentation

pynvshmem Usage Questioned: A user sought clarification regarding the presence of pynvshmem in the Triton distributed example documentation.
- The user posited that a typographical error might exist, given the observed utilization of nvshmem4py within the repository’s examples.
nvshmem4py as potential correction: The user proposed that nvshmem4py may be the correct term, instead of pynvshmem.
- This suggestion was based on the actual usage in the repository’s code examples.

GPU MODE ▷ #low-bit-training (2 messages):

Arxiv Paper, Talk Invitation

Arxiv Paper Shared: A member shared a link to an Arxiv paper.
- The specific details and title of the paper were not discussed, but the link was provided for informational purposes.
Speaker Sought for Talk: A member invited another member to give a talk.
- The specific topic or venue of the talk was not mentioned, but it seems to be an open invitation.

GPU MODE ▷ #llmq (1 messages):

Activation Offloading, fp8 Adam, Loss Masking, Pyllmq on PyPi

LLMQ now Offloads Residual Activations: A member implemented offloading for residual activations and a bunch of tricks for further saving on activation memory.
- They also added better handling of offloaded optimizer states and initial support for fp8 representation for Adam first-order momentum as well as loss masking support.
LLMQ allows 7B Training on 16GB Cards: A member made it possible to pre-train/fine-tune even a 7B model on a 16GB card with caveats.
- The amount of offloading required means you need to have at least 64GB of CPU-side ram.
LLMQ scales to 32B model on 4x4090: A member says scaling up, training/fine-tuning a 32B model is possible on a 4x4090 server at about 3k tok/s (48% MFU).
- This requires > 200GB of pinned host memory for all the offloading.
Pyllmq is available on PyPi: A member published the python wrapper on PyPi.
- To try it out, run pip install pyllmq; pyllmq-tokenize --model qwen --dataset tiny-stories; pyllmq-train which should start fine-tuning Qwen2.5-0.5B on tiny-stories.

GPU MODE ▷ #helion (1 messages):

Helion Parallel Reduction, Weight Gradients Computation, HL.reduce Usage

Helion’s Parallel Reduction Patterns: Guidance is requested on the recommended pattern in Helion for parallel, non-atomic reductions when computing weight gradients (i.e., large batch sums).
- The member seeks advice, particularly when the reduction axis is much larger than channel dimensions, asking whether to use hl.reduce, a two-pass partials+sum approach, or an official cooperative reduction idiom the compiler handles well.
Block Sizing Guidance: The user inquires about guidance on block sizing or nesting limits for this use case.
- They are working on large batch sums and weight gradients.

GPU MODE ▷ #nvidia-competition (111 messages🔥🔥):

Nvidia Competition T&C, eval_better_bench.py Overhead, Python loop queuing kernel calls, Inconsistency with Runners, GPU mode Terminal User Interface

Nvidia Competition’s Grand Prize Twist!: The Grand Prize winner must also be the top performer in at least one kernel challenge, but there is speculation regarding how the prize will be awarded if the user with the highest weighted score does not rank first in any of the 4 kernels.
- One member said “they have altered the ~~deal~~ rules, pray they do not alter it further”.
“eval_better_bench.py” has significantly lower overhead: The eval_better_bench.py script shows significantly lower overhead compared to the original eval.py, with tests showing a reduction from 18.0us to 14.8us for a Q1 kernel.
- However, it was also noted that the overhead on the bots may be higher, as Q1 kernels were previously observed to be 2-3us slower on the bots.
CPU Queue Bottleneck?: Members discussed whether the CPU’s Python loop for queuing kernel calls can keep up with fast GPU work, potentially causing a bottleneck, especially with the clear_l2_cache kernel.
- It was noted that the test CPU (AMD EPYC 9575F) is significantly faster than the eval runners, suggesting the issue might be more pronounced on the competition hardware.
Runner Benchmarking Inconsistencies: Members reported inconsistencies in benchmark timings, with the same kernel showing significantly different results (~11us on the leaderboard vs ~36us in the benchmark script).
- One member stated that “I also think that the measuring/benchmark update changed something, i have discrepancies that are consequently different for the same code from before that can’t be explained by hitting the slow runner, when hitting the slow runner its much more obvious”.
Popcorn-CLI Terminal User Interface Forked: A member created a fork of popcorn-cli allowing a --no-tui flag that removes the Terminal User Interface and extra code to output the stdout of print() statements.
- This was designed to help with debugging, enabling better feedback loops with LLMs. Also a PR was made.

GPU MODE ▷ #robotics-vla (6 messages):

RL with Parkinson Symptoms, BEAST Tokenizer, stack_blocks success

Stack Blocks achieves almost 100% success: First red block pickup is now at almost 100% success, but occasionally moves up and down due to state and image conflicts, as shown in this video.
- Adding history may resolve these behaviors.
Parkinson Symptoms in Success: A member reports another success case with Parkinson symptoms, visible in this video, suggesting that with > 5% success, Reinforcement Learning (RL) could be possible.
BEAST B-Spline Tokenizer Demo: A member shared a link to the BEAST paper, located at https://arxiv.org/abs/2506.06072, and also shared a link to a demo notebook using a B-Spline Tokenizer: https://github.com/open-thought/qwen3-vla/blob/main/bspline_tokenizer/notebooks/tokenizer_demo.ipynb.

Latent Space ▷ #ai-general-chat (203 messages🔥🔥):

Edwin Arbus joins Cursor, Arcee AI Debuts Trinity, Apple AI Power Shift, OpenAI Launches Alignment Research Blog, Jeanne DeWitt Grosser’s 10 AI-GTM lessons

Edwin Arbus Becomes Cursor’s New Sock-cess Story: Edwin Arbus announced his move to Cursor via a humorous video featuring branded socks and deodorant, prompting congratulations and memes from the tech Twitter community, as seen in this X post.
- The announcement video went viral, with many praising the creative and lighthearted approach to announcing a new job.
Arcee AI Launches Trinity MoE Models for All: Arcee AI partnered with Allen AI to launch Trinity Nano (6B-A1B) and Trinity Mini (26B-A3B MoE) models, open-weights Apache 2.0, 128k context, trained on 10T tokens with 512 H200 GPUs, optimized for agents & function calling, as announced here.
- The community praised the Apache 2.0 license and the efficient inference capabilities.
OpenAI Opens Up on Alignment Research: OpenAI debuted Alignment Research, a new technical blog for publishing rigorous but lightweight posts from teams company-wide on AI alignment and safety, as mentioned here.
- The blog features two inaugural posts (SAE latent attribution & scaling code verification) and invites community feedback, with Jasmine Wang promoting the effort at NeurIPS.
Anthropic Bun-dles Up New Acquisition: Anthropic acquired Bun, as annouced in this blogpost, with discussions focusing on Bun’s future, potential integration into Anthropic’s stack, and whether it signals a strategic shift.
- Investors speculated on the acquisition cost, with estimates around $5-10 million, and the potential for a 2-3x return.
Mistral Blows Minds with Mistral 3 Release: Mistral AI launched the open-source Apache 2.0 Mistral 3 model family, spanning 3B–675B parameters, including Ministral 3 (3B/8B/14B) and the frontier-class Mistral Large 3 MoE, all with vision, tool-use, and fine-tuning support, as announced here.
- The community discussed the pricing structure, with some noting that Mistral Medium is more expensive than Large, raising questions about its utility and the absence of tool-use benchmarks.

Latent Space ▷ #genmedia-creative-ai (10 messages🔥):

Apple videogen paper, AI-generated Zootopia-style game footage, Gradium $70M Seed

Apple Unveils Videogen Paper: Apple released a videogen paper detailing their new video generation model.
- The release sparked discussion and excitement within the community.
Zootopia Game Footage Goes Viral: AI-created Zootopia game footage went viral, garnering over 8.9M views and sparking excitement about potential games and TV series.
- The footage was created using Nano Banana Pro, Kling, and Topaz, and the creator faced pushback against hate and copyright threats.
Gradium Raises $70M Seed in Stealth: Paris-based Gradium exits stealth with a $70M seed led by FirstMark & Eurazeo, launching production-ready transcription & synthesis APIs after just 3 months of work.
- The company’s products natively support English, French, Spanish, Portuguese and German, with a team including former Meta, Google and Kyutai voice-research heavyweights.

Nous Research AI ▷ #general (92 messages🔥🔥):

Mistral Large 3 Size and Architecture, Mistral Medium Specs and Leaks, Arcee Trinity Models, Claude's Soul Document, DeepSeek V3.2 Performance

Mistral Large 3: A Beastly MoE Model Arrives: Mistral Large 3 is reportedly a 675B MoE model, similar in size to the Deepseek V3 series, and all new Mistral models will have vision capabilities.
- The closed source Mistral Medium is speculated to be around 100-200B MoE.
NVIDIA Leaks Dense Mistral Medium 3 Specs: Mistral Medium 3 is canonically dense according to NVIDIA, and it’s likely a 70B model, consistent with previous Medium versions.
- There was a comment that a Mistral Medium model was leaked a year ago.
Arcee AI Unveils Trinity Models: Arcee AI released its Trinity models, which look very strong according to initial benchmarks.
- A member pointed out that the Mini version has problems with multi-turn conversations because it only reasons for the first one (tweet).
Claude’s Soul Document Confirmed!: Anthropic officially confirmed Claude’s “soul document”, raising questions about how it was used in training.
- A member shared a link to a relevant Twitter thread and a YouTube video where Ilya says that DeepMind was right.
DeepSeek V3.2 Performance: DeepSeek V3.2 Speciale is performing well, and kinda leading in reasoning benchmarks.
- It was described as not doing too bad.

Nous Research AI ▷ #ask-about-llms (13 messages🔥):

Image/Video LLMs, GPT-OSS, Hermes Finetune, MLX-LM, Gherkin Scenarios

Image/Video LLMs on Nous’ Horizon?: A member inquired whether Nous plans to add an image or video LLM in the future.
- While there was no direct answer, the question sparked a discussion about other models and finetuning strategies.
GPT-OSS Disinterest Expressed: A member asked if a Hermes finetune was planned on GPT-OSS:20B, praising its speed with MLX-LM.
- One of the Nous members stated “No we dont like gpt oss And it doesnt have a base model to work with”.
GPT-OSS Finetune on Gherkin?: Despite not liking GPT-OSS, a member acknowledged it performed well producing Gherkin scenarios and they are trying a finetune.
- The member further inquired about fundamental weaknesses of GPT-OSS, referencing its short reasoning chain as highlighted in the Measuring Thinking Efficiency in Reasoning Models report.
Hallucinations Galore!: Despite its strengths, members believe that GPT-OSS hallucinates like crazy.
- The primary reason for dismissal is that “It’s not got a base model so its generally out of the mix for us entirely”.

Moonshot AI (Kimi K-2) ▷ #general-chat (51 messages🔥):

Kimi Black Friday personality, Deepseek V3.2 problems, Kimi Coding API issues, Roo Code issues, Kimi K2 Thinking in app

Members seek Black Friday Kimi Chatbot personality: Members discussed recreating the personality from the Black Friday Kimi chatbot in other chats, but were informed that the system prompt isn’t available.
- Suggestions included asking the Black Friday chat itself “how can I make kimi from a new chat sound like you?”, but it appears to be censored.
DeepSeek V3.2 facing criticisms: Members criticized DeepSeek V3.2 for tool use, with claims that it hallucinates a ton, and overall outputs is just sloppy af.
- Despite the criticisms, some find DeepSeek to be very good at instruction following and very intelligent, but suffers from low TPS.
Kimi Moderato API key fails on Cline: A user is facing problems with the Kimi Moderato plan not working with the Cline API.
- The Kimi for coding API key can only be used in Kimi CLI, Claude Code, and Roo Code according to the Kimi documentation.
Kimi K2 Thinking toggles off in App: Users requested that Kimi K2 Thinking stay enabled by default in the app, rather than needing to be toggled on each time.
- It causes annoyance when it resets back to the default.
Roo Code context grows disproportionately: A user reported that context grows disproportionately when using Roo Code and the condense function doubles the size.
- The reporter was advised to submit a bug report and use the Kimi CLI instead.

HuggingFace ▷ #general (21 messages🔥):

Hugging Face Pro payment issues, PPOTrainer with accelerate and bf16 errors, Tokenizer type is bool after model name change, DPO as RL technique, ACE framework for agents learning from mistakes

Subscribing to Hugging Face Pro proves problematic: A user reported that they were stuck on “Preparing payment, please wait” when trying to subscribe to Hugging Face Pro.
- Another member suggested contacting Hugging Face at [email protected] for payment-related issues.
PPOTrainer Problems plague Parallel Processing: A user encountered a TypeError related to mismatched tensor types while using PPOTrainer with two A10 GPUs and DeepSpeed for distributed training with bf16 precision.
- One member suggested that the issue might stem from incorrect GPU initialization, leading to a single-GPU gather operation instead of an all-gather.
Tokenizer transmogrifies to Bool After Model Name Change!: A user reported that their tokenizer became a <class 'bool'> after changing the model name in AutoTokenizer.from_pretrained, when using unsloth/Meta-Llama-3.1-8B-bnb-4bit.
- Another member suggested removing the use_fast=False parameter as a workaround, though the exact purpose of the parameter remained unclear.
DPO debated: RL or Not RL?: Members on the HuggingFace discord debated whether Direct Preference Optimization (DPO) is an RL technique or not, as some papers claim it is while the original paper refutes this.
- One member quipped that rl creates dpo dataset-lie data.
ACE Framework empowers Agents to Eradicate Errors: A member shared their open-source implementation of Stanford’s ACE framework, enabling agents to learn from their mistakes.
- The framework curates strategies into a playbook after reflection and has shown improved success rates and step reduction in browser automation, and the author is looking for feedback.

HuggingFace ▷ #i-made-this (2 messages):

FFMPEG radio station, Open source AI music models

Vibe Coded FFMPEG Radio Station Launches: A member launched a vibe coded FFMPEG radio station, where everything you see and hear in this stream is one giant FFMPEG chain on YouTube.
Open Source AI models in radio station: The radio station’s audio was made in full collaboration with open source AI music models inside the DAW.

HuggingFace ▷ #computer-vision (1 messages):

Computer Vision API library, Robotics and automation models, Developer-facing API feedback

New CV API Library brews for Robotics: A robotics startup is prepping the release of a developer-facing Computer Vision API library with pretrained and finetunable models for robotics and automation.
- It includes features like 6D object pose estimation, 2D/3D object detection, instance & semantic segmentation, anomaly detection, point cloud processing, model training, fine-tuning endpoints, and deployment-ready inference APIs.
API Library eyes Community Validation: The primary goal is to simplify the prototyping and deployment of production-grade perception pipelines for CV/robotics engineers.
- The startup seeks community feedback to validate the usefulness of the library and iterate before a wider release, offering early access to those interested.

HuggingFace ▷ #smol-course (15 messages🔥):

Course Unit Updates, Model Evaluation, Unit Certifications, Final Project Clarification

New Course Units Delayed?: Members noticed that new course units haven’t been published in a few months, specifically noting the evaluation unit is not yet available, as shown in this image.
Unit 4 Focuses on Model Evaluation: The fourth unit will cover model evaluation, raising questions about changes to the course deadline, as discussed here.
Earning Unit Certifications Early?: Members confirmed that you can get the certificate of achievement for each unit by completing the quizzes, but the course isn’t finished yet.
Project is Unit 1’s Finale: A screenshot of a final project was posted (image), confirming that it is the final project for Unit 1.

HuggingFace ▷ #agents-course (3 messages):

AI Agents Course, Synthetic Data Unit, Order Following in AI Systems

Synthetic Data Unit Anticipation Builds: A new participant in the AI Agents Course inquired about the release date for the upcoming “synthetic data” unit.
- The participant’s eagerness underscores the community’s interest in leveraging synthetic data for AI agent development.
Inquiry Arises Regarding Order Following: An attached image prompts the question of whether a specific order is being followed correctly by the AI system.
- The image attachment suggests a concern about the system’s adherence to predefined instructions or procedures.

Modular (Mojo 🔥) ▷ #mojo (35 messages🔥):

def keyword status, var keyword status, parallelize function safety, MutOrigin.external vs MutAnyOrigin for ffi

Delaying def Keyword Introduction: Members agreed that the def keyword should be put on hold until Mojo exhibits more Python-like behavior, suggesting it currently adds cognitive load without significant benefit.
- There was consensus to potentially reintroduce def later, with the sentiment that its current implementation feels like premature optimization.
var keyword inside fn in Question: The discussion focused on whether var should be required within fn, with arguments presented for and against its mandatory use.
- Those in favor argued it enhances code clarity and consistency, while opponents, particularly those with Python backgrounds, felt it reduces ergonomics and increases boilerplate, disrupting the cleanliness and ease of code restructuring.
parallelize Unsafe Due To Data Races: A user reported data races when using the parallelize function, expecting compile-time errors similar to Rust, but found the code compiled and produced inconsistent results.
- A core team member clarified that Mojo’s concurrency and thread safety model is a work in progress (WIP), and parallelize remains unsafe until details for sharing data between devices are finalized.
MutOrigin.external Segfaults During Mojo Python FFI: A user encountered segfaults when using MutOrigin.external as the return type for Mojo Python FFI, specifically with an av_packet_alloc binding, and found MutAnyOrigin to be a temporary workaround.
- A core team member explained that MutAnyOrigin is used to maintain existing behavior temporarily and suggested the issue may involve lifetime extension, advising that if packet requires avcodec to stay alive, it should hold an origin from avcodec.

Eleuther ▷ #general (10 messages🔥):

NUS PhD intro, AI + Web3 developer introduction, Getting Help Reading Research Papers, fast.ai as a beginner course

NUS PhD Student Studying Mech Interp Arrives!: Yiming from Singapore, a 2nd year PhD student at NUS working on mechanistic interpretability and medical diagnostic model interpretability introduced themself to the channel.
AI + Web3 Dev Seeks Collaboration: An AI + Web3 developer specializing in LLM development, RAG pipelines, autonomous agents, and Python/FastAPI backends introduced themselves and offered to collaborate on new AI ideas.
New Member Seeks Guidance Reading Research Papers: A new member asked how to get help in reading research papers, and a member suggested to pick some papers on arxiv and read them.
fast.ai Course Recommended to Beginners: One member asked if fast.ai course is for beginners and another member added that the only prerequisite is that you know how to code with a link to the fast.ai course.

Eleuther ▷ #research (3 messages):

Perplexity measurement, MMLU benchmark, topic datasets

Evaluating Perplexity on Domain-Specific Topics: A member is exploring perplexity measurements of different models across domain-specific topics like computer science, history, and business ethics.
- They’re seeking standard datasets beyond MMLU, which is Q&A focused, and have started scraping Wikipedia pages but are open to established benchmarks.
Sampling Pretraining Datasets for Topic Classification: A member suggested sampling from a pretraining dataset and classifying the topic using a small model for a fast and easy approach.
- This method would allow for efficient topic identification within existing datasets.

Eleuther ▷ #scaling-laws (10 messages🔥):

Scaling Laws, Pretraining Dynamics, Decorrelated Performance, Nonlinear Metrics

Delving Into Decorrelation Scaling Laws: Discussion revolves around the scaling laws paper and whether it implies just curve fitting versus predicting future scaled performance.
- The conversation pivots to various interpretations that result in power law laws arising, such as the nonlinear metrics explanation.
Muon Guy Wrote That Paper?!: A member shared a link to openreview.net and mentioned it was written by the muon guy.
- Another link to a paper was shared, relating to scaling laws: arxiv.org/2304.01910.
Intuition for Scaling Laws Explored: A participant asks for the intuition behind a paper leading to scaling laws.
- Another user suggests that performance on any test example becomes more and more decorrelated from others in the limit of model performance.
Pretraining Power Law Dynamics: The pretraining power law would arise if you had no big stratum of easy samples.
- The emergent spike in the pretraining is not observed because each batch is more independent, with less shared easy, compared to training on a particular task.

Manus.im Discord ▷ #general (16 messages🔥):

Manus Auth issues, Manus instability, Chat Mode adjustment, Gemini 3 Pro, AI-powered automation

Manus Auth issues unresolved: A user reported issues with Manus Auth being disabled in project settings and unresolved tickets after requesting help via Manus, with Project ID dPK8UhWnJ9fTzjbpKfjJiF and domain auru.com.br.
- The user requires the Redirect URI https://auru.com.br/api/oauth/callback.
Manus instability causes frustration: One user expressed frustration with Manus, citing code being wiped out between saving checkpoints, Git discrepancies, and general instability despite investing thousands of dollars and building SaaS platforms.
- They stated: “Agents arguing about what they see vs. what you’re LITERALLY looking at in PROD. Do not trust it.”
Chat Mode adjustment in development: Manus team announced that the Chat Mode toggle is currently in development and will be available soon, after considering user feedback.
- Many users are requesting for the Chat Mode to come back soon!
Demand for Gemini 3 Pro Model: A user asked what AI model Manus is currently using and requested to use Gemini 3 Pro.
- No response was given regarding this question.
AI engineers specialize in Automation and Autonomous Agents: Some AI engineers introduced themselves: one focused on integrating AI-powered automation and predictive analytics using tools like Python, SQL, JavaScript, PyTorch, scikit-learn, LightGBM, and LangChain to deliver chatbots and recommendation engines.
- Another specializes in building autonomous AI agents and multi-agent systems using various tech stacks like JS/TS, Next.js / Vue, Go / Rust, Python, Langraph, AutoGen, ReAct, CrewAI, OpenAI, Claude, Hugging Face APIs.

Yannick Kilcher ▷ #general (8 messages🔥):

Intern Recommendations, Learning Algorithms, Synthetic Data, Pug Resource, Docker and Kubernetes basics

Searching for Wacky Intern Recommendations: A member asked for recommendations for a wacky intern.
- It was unclear if they were hiring or looking for a job.
Experience with Learning Algorithms or Synthetic Data: A member inquired about experience with different learning algorithms or synthetic data.
- No responses were recorded.
Resource Recommendation for Pug: A member asked where to find a resource to learn Pug.
- No responses or resources were shared.
Docker and Kubernetes Basics: A member inquired about resources for Docker and Kubernetes basics.
- No responses or resources were shared.

Yannick Kilcher ▷ #paper-discussion (3 messages):

Kattention Module, TopKHot Autograd Function, HardTopKHotBCE Autograd Function

Kattention Module Re-Tested: The Kattention module was re-tested and found to be working closely to expectations, utilizing sparse attention mechanisms.
- The code includes nn.Linear layers for attention and projection, along with a TopKHot function for sparse attention, crucial for scaling attention mechanisms.
TopKHot Gradients Explored: A TopKHot autograd function was implemented to select the top-k values, using torch.topk and scatter_ for gradient computation.
- The backward pass computes a soft_target based on softmax weights of top-k values, and the gradient is derived as F.softmax(x, dim=-1) - soft_target.
HardTopKHotBCE: BCE Approximation: A HardTopKHotBCE autograd function, possibly cheaper to compute, was introduced.
- The backward pass uses a hard target based on top-k indices and calculates the gradient as F.sigmoid(x) - hard_target, approximating binary cross-entropy.

Yannick Kilcher ▷ #ml-news (3 messages):

Mistral 3, Llama finetunes, wavefunction

Mistral 3 is here!: Mistral AI released Mistral 3.
- If it’s good, it might replace Llama finetunes for some applications.
Wavefunction video: A member linked to a wavefunction YouTube video.
- It is unclear what the video is about, but it may be relevant to AI.

DSPy ▷ #show-and-tell (1 messages):

justanotheratom: https://www.elicited.blog/posts/managing-tools-in-dspy

DSPy ▷ #general (4 messages):

Prompt Injection Defenses in DSPy, Security Measures, Training Dataset for Attack Mitigation, Partnership Proposal

Prompt Injection Defenses Sought for DSPy: A member inquired about prompt injection defenses in DSPy, seeking community best practices given DSPy’s structure.
Security at Prompting Layer: Limited: A member stated that there isn’t much security you can get at the prompting layer, suggesting guardrails-type security measures, specific models, and model provider rejections.
- They also mention that for every ‘Do not do this’ in the prompt, an attacker will likely find a way to trick the model.
Mitigating attacks with Training Data: A member suggested that to guard against baseline attacks, include examples in the training dataset that use that attack and show what an appropriate response would be.
Partnership Proposal: A member expressed interest in exploring a potential partnership with the DSPy project.

tinygrad (George Hotz) ▷ #general (2 messages):

Kernel Development Tools, Regression Test for Beam

IDEs vs Terminal Editors Debate Starts: Members initiated a discussion on preferred tools for kernel development and iteration, posing the question of whether developers mainly use GUI IDEs like VS Code or Cursor, or terminal editors like Vim, Neovim, or Emacs.
- The discussion aims to gather insights into the community’s preferences and workflows in kernel development.
Beam Regression Test in Need of Fix: A member requested assistance with fixing and adding a regression test for python3.14 test/test_tiny.py TestTiny.test_beam.
- This indicates a need for contributions to ensure the stability and correctness of the beam functionality within the project.