Mistral is back!
AI News for 12/1/2025-12/2/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 9665 messages) for you. Estimated reading time saved (at 200wpm): 697 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
We last saw Mistral Small 3 in Jan, and 3.1 in March, then the mainline models took a detour with Mistral Code and Magistral and Voxtral. Well, after raising 1.7B at a 11.7B valuation, Mistral Large 3 is here together with 3 sizes of Ministral (blogpost), all open weights Apache 2.0.
Itâs unfortunate timing coming right after Deepseek V3.2 (#6 on Open Models and #28 on Text), but still a notable achievement for European AI. as Anj points out, this is on Mistralâs old cluster - with the new funding, a 6x larger compute cluster will come online in 2026.
AI Twitter Recap
Mistral 3 family: open, multimodal, and everywhere
-
Mistral 3 launch (Apache-2.0, open weights): Mistral released the multimodal Ministral 3 (3B/8B/14B) in base, instruct, and reasoning variants, plus Mistral Large 3 â a sparse MoE with 675B total params, 41B active, 256k context, and vision input. All models ship under a permissive license with broad platform support and strong small-model performance. Details: @MistralAI, news, Ministral sizes, Large 3.
Infra and ecosystem landed day-0: vLLM support (NVFP4 checkpoints, sparse MoE kernels, long context, multimodal), llama.cpp integration + RTX gains, Ollama models + cloud, LM Studio catalog. Community guides and formats shipped quickly: Unsloth runbooks + GGUFs.
Early evals: the Arena places Mistral-Large-3 at #6 among open models (strong in coding; #28 overall) and notes it was tested under codename âJaguarâ (@arena). Practitioners report better instruction-following than contemporary open baselines and availability of an NVFP4 checkpoint targeting single A100/8ĂH100 nodes (@dejavucoder).
Browser and local: the 3B runs 100% locally in WebGPU (@xenovacom).
-
Other model drops: Apple released CLaRa-7B-Instruct on HF (tweet). Runway previewed Genâ4.5 with higher âcinematic realismâ and early access roll-out (tweet). Moondream showed strong segmentation that âactually understands the sceneâ (tweet).
Anthropic: Bun acquisition, nonprofit program, and how AI is changing work
- Bun joins Anthropic: Anthropic acquired the MITâlicensed Bun JS/TS runtime to accelerate Claude Code. Bun remains open source; the Bun team joins Anthropic to keep building both Bun and deeper Claude Code integrations (Anthropic, Bun, @_catwu, @mikeyk). Community notes Claude Code reportedly hit a $1B runârate in ~6 months post-GA (@alexalbert__).
- Claude for Nonprofits: Discounted plans, new integrations, and training for NGOs in partnership with GivingTuesday (announcement).
- How AI is changing work inside Anthropic: Survey of 132 engineers + 200k Claude Code sessions: engineers lean on Claude first for questions, changing team dynamics; the company plans wider internal study and organizational responses (thread, follow-ups).
Frontier benchmarks, leaks, and competitive positioning
- OpenAI âGarlicâ leak and GPT-5.1: The Information reports a new OpenAI pretrained model âGarlicâ testing well on coding/reasoning vs GPTâ4.5 (report; quote: Mark Chen). OpenAI published a podcast on GPTâ5.1 Instant detailing reasoning, personality controls, and behavior refinement (tweet).
- DeepSeek V3.2 and Speciale: Multiple analyses highlight V3.2 (and Speciale) as âaffordable frontierâ models with notable tradeoffs: slow generation (~30â40 tks/s) and very long chains (avg reasoning output 20kâ47k tokens), but extremely low price (e.g., $3 vs $35 for Claude 4.5 Sonnet Thinking on certain evals) and strong new highs on LisanBench; updated scoring pegs Speciale at an impressive 8.81 on easier subsets (overview, score correction, discussion on verbosity/context). Dayâ0 API availability noted by Fireworks (tweet).
- Arena placements and ecosystem: Mistral-Large-3 enters the LMArena Text leaderboard (strong coding, top-10 in multiple occupational areas) (tweet). OpenRouter rolled out Mistral Large 3 and Amazon Nova 2 Lite access (Mistral, Nova Lite).
Amazon Nova 2.0 (reasoning, agentic, multimodal) and Nova Sonic 2.0 (speechâtoâspeech)
- Nova 2.0 family: Amazon unveiled Nova 2.0 Pro (reasoning, preview), Lite (speed/cost), and Omni (text/image/video/speech input; text/image output). Early thirdâparty benchmarks suggest material gains vs Nova Premier and competitive agentic capabilities: Nova 2.0 Pro hits 93% on ÏÂČâBench Telecom and 80% on IFBench under medium/high reasoning budgets, with Pro pricing at $1.25/$10 per 1M in/out tokens (analysis, follow-up). Nova Lite benchmarks posted separately (tweet).
- Nova Sonic 2.0 (speechâtoâspeech): New realâtime bidirectional audio model scores #2 on Artificial Analysis Big Bench Audio (87.1% reasoning), with median 1.39s timeâtoâfirstâaudio; supports five languages and adaptive prosody (thread). OpenRouter is offering Nova 2 Lite free for 2 weeks (tweet).
Agents, toolchains, and safety
- LangSmith Agent Builder (public beta): Noâcode agent builder that creates prompts, selects tools/subagents, supports MCP servers, triggers (Gmail/Slack), and configurable memory/summarization policies (launch, overview, Chase video).
- LlamaIndex releases: LlamaAgents (deployable agent workflow templates) and LlamaSheets (deep spreadsheet parsing/extraction) with community office hours this week (recap, invite).
- Hugging Face Skills: A âuniversal implementation of agent contextâ compatible with Cursor, Claude Code, Gemini CLI, and local/remote jobs; uses Claude Codeâs skill spec but exposes entry points for other ecosystems (tweet).
- Prompt injection defense for browser agents: Perplexity openâsourced BrowseSafe and BrowseSafeâBench; fineâtuning on the benchmark outperforms offâtheâshelf safety classifiers and LLMâasâdetector approaches while avoiding reasoning latency (announcement, results).
- DevEx: Microsoftâs Tangle open-sources a contentâbased caching experimentation platform with a visual editor (claims â1+ year CPU time savedâ at Shopify) (tweet). Cline shipped /explainâchanges and a stealth 256k âmicrowaveâ model for agentic coding (release, model).
Research highlights
- Testâtime compute scaling: A largeâscale study and ârecipeâ for selecting strategies shows TTS reliably boosts complex reasoning without retraining; effectiveness depends more on allocation strategy than raw compute (summary, paper).
- Deep research agents under scrutiny: OPPOâs FINDER benchmark (100 tasks; 419 checklist items) and DEFT failure taxonomy (14 modes) show agents donât fail at task comprehension â they fail at evidence integration, verification, and planning; suggests architecture changes linking retrieval to synthesis (overview).
- Pragmatic interpretability: Neel Nanda argues for basic science of CoT in pragmatic interp, with methods that apply directly to frontier LRMs; counters hype that interpretability is âfailed,â reframing priorities (clarification, techniques).
- AI x chips, recursive improvement loop: ExâDeepMind leads Azalia Mirhoseini and Anna Goldie launched Ricursive Intelligence to coâevolve models and silicon (architectureâverificationâimplementation) toward recursive selfâimprovement; roots in AlphaChip used across multiple TPU generations (announcement, founder thread).
- Bonus: Elicit adds figure parsing at scale (KaplanâMeier, heatmaps, reaction schemes, microscopy), moving multimodal reasoning beyond text/tables for systematic reviews (tweet).
Top tweets (by engagement)
- Anthropic acquires Bun; Bun remains MITâlicensed and joins to supercharge Claude Code (Anthropic, Bun).
- Mistral 3 family launch with open weights across sizes, MoE Large 3, and multimodal support (@MistralAI).
- Waymo safety opâed cites ~100M driverless miles with large reductions in serious injury and intersection crashes (@slotkinjr).
- OpenAI podcast on GPTâ5.1 training decisions, reasoning, and behavior refinement (@OpenAI).
- Appleâs CLaRaâ7BâInstruct released on Hugging Face (tweet).
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Mistral 3 Model Family Release
- Mistral just released Mistral 3 â a full open-weight model family from 3B all the way up to 675B parameters. (Activity: 614): Mistral has released the Mistral 3 model family, featuring models ranging from
3Bto675Bparameters, all under the Apache 2.0 license for both research and commercial use. The lineup includes the compact Ministral 3 models (3B,8B,14B), which are multimodal and available in base, instruct, and reasoning variants, noted for their strong performance relative to size. The flagship Mistral Large 3 is a675Bparameter model with a Mixture of Experts (MoE) architecture, offering strong multilingual capabilities and high efficiency, positioning it as one of the most capable open-weight instruct models available. This release supports a shift towards open AI ecosystems, providing a range of models suitable for both on-device and large-scale enterprise applications. Full announcement. Commenters express disappointment over the lack of models between14Band675Bparameters, with some hoping for models in the80Bto400Brange. There is also a desire for competition to GPT-OSS 120B, with a focus on models that can run efficiently on consumer-grade GPUs.- jzn21 highlights a gap in the Mistral model lineup, noting the absence of models between 14B and 675B parameters. This gap is significant for users interested in models ranging from 80B to 400B, which are often considered a sweet spot for balancing performance and resource requirements.
- fungnoth discusses the need for competition against the GPT-OSS 120B model, particularly emphasizing the potential of large Mixture of Experts (MOE) models. These models could leverage consumer GPUs effectively by activating only a subset of experts, thus maintaining speed and efficiency.
- Adventurous_Cat_1559 expresses interest in a 120B parameter model that could be run on a 96GB Mac Studio, indicating a demand for high-parameter models that are still feasible for high-end consumer hardware. This reflects a broader interest in making large models accessible to more users without requiring enterprise-level resources.
- Ministral-3 has been released (Activity: 356): Ministral-3 has been released, featuring three models: 14B, 8B, and 3B, each with reasoning, instruct, and base variants. The largest, Ministral 3 14B, is noted for its performance comparable to the larger Mistral Small 3.2 24B, offering advanced language and vision capabilities. These models are available on Hugging Face and are designed for efficient deployment in various applications. Commenters are curious about the modelsâ tool-calling capabilities and express a desire for performance comparisons with larger models like the Mistral Small 24B.
- StyMaar highlights the release of the base models for Ministral-3, which is significant for developers looking to build custom applications or fine-tune models for specific tasks. This release allows for more flexibility and experimentation compared to pre-trained models.
- throwawayacc201711 questions the lack of comparison between Ministral-3 and larger models like Mistral Small 24B. Such comparisons are crucial for understanding performance improvements and trade-offs, especially in terms of computational efficiency and accuracy.
- human-exe suggests that Ministral-3 outperforms and could potentially replace models like Qwen3 and Gemma3. This implies that Ministral-3 may offer better performance metrics or efficiency, making it a more attractive option for users of those models.
2. GPU Rental Market in Mongolia
- Would you rent B300 (Blackwell Ultra) GPUs in Mongolia at ~$5/hr? (market sanity check) (Activity: 446): A team in Mongolia is offering B300 (Blackwell Ultra) GPUs for rent at approximately
$5/hrin a data center located in Ulaanbaatar. The setup includes3.2 Tb/s InfiniBandand pre-installed PyTorch and SLURM, with latency measurements showing~35 msto Beijing and~110 msto Singapore. The post seeks feedback on the viability of this offering compared to established providers like CoreWeave and Lambda, and whether the âcold steppe bare-metal neutralityâ is a compelling feature. The GPUs are offered with full root access and no hypervisor, emphasizing a neutral jurisdiction with no unexpected legal intrusions. Landing page is available for more details. Commenters suggest that the offering could be attractive if the service is stable and secure, with one recommending collaboration with established providers like TensorDock or DeepInfra, who offer similar services at competitive rates. The unique selling point of âneutral territoryâ is seen as potentially beneficial but requires further validation.- Lyuseefur highlights three critical technical requirements for renting GPUs: the hardware must be genuine, stable for extended periods, and support encrypted containers. These conditions ensure reliability and security for non-mission-critical tasks that are time-flexible.
- Azuriteh suggests partnering with established providers like TensorDock or DeepInfra, noting that DeepInfra offers B200 GPUs at approximately
$2.5/hr, which is competitive. This implies that market entry might be more feasible through collaboration with experienced entities rather than independent offerings. - Xamanthas points out that the geographical location of the GPUs is irrelevant for non-legally mandated tasks, as training jobs are not constrained by latency. This suggests that the physical location in Mongolia would not impact the performance for most AI training workloads.
- Mistral 3 Blog post (Activity: 719): Mistral AI has released the Mistral 3 series, which includes three dense models (14B, 8B, 3B) and a sparse mixture-of-experts model, Mistral Large 3, with
41B activeand675B total parameters. These models are open-sourced under the Apache 2.0 license and optimized for NVIDIA hardware. They are designed for high performance in multilingual and multimodal tasks, with Mistral Large 3 achieving top rankings in open-source model leaderboards. The models are tailored for efficient inference, suitable for applications ranging from edge devices to enterprise solutions. More details can be found in the original announcement. Some commenters express disappointment, noting that Mistral 3âs performance is underwhelming compared to competitors like Qwen3-235B-2507, which has a better ELO despite being smaller. There is also criticism of the comparison charts used, which some find misleading or incomplete.- The release of Mistral 3 models under the Apache 2.0 license is significant, but there are concerns about their performance. The top Mistral LLM has a lower ELO rating compared to Qwen3-235B-2507, despite being larger. Additionally, comparisons are made with Deepseek 3.1, which has similar performance, rather than more recent models like Deepseek 3.2 or Speciale.
- There is criticism regarding the performance of Mistralâs smaller LLMs, which reportedly underperform compared to Qwen3 and Gemma models of similar sizes. The new Mistral models do not seem to match the performance of their previous consumer-targeted open LLM, Mistral 3.2 24B, indicating a potential step back in terms of efficiency and capability.
- Some users express disappointment with the size and scalability of the Mistral models, noting that the larger models do not fit within a 256GB memory constraint. There is a call for larger models, such as a 48B MoE of 3B or something around 120B, to better compete with models like GPT-OSS.
3. Hugging Face Top Contributors
- Only the real ones remember (he is still the contributor with the most likes for his models) (Activity: 318): The post highlights a Hugging Face space dedicated to top contributors, specifically mentioning mradermacher and Bartowski as leading figures in the community. The link provided (Hugging Face space) showcases contributors who have significantly impacted the platform, with a nod to historical figures like the âfather of GGUFâ. This suggests a focus on model contributions and innovations within the Hugging Face ecosystem. Comments reflect a recognition of Bartowski and mradermacher as current leaders in the Hugging Face community, with a nostalgic mention of âTheBlokeâ for his contributions to GGUF files, indicating a shift in community leadership and contributions.
- TheBloke is recognized for his contributions to the Hugging Face community, particularly with his GGUF files, which have been widely used and appreciated by users like neoneye2. GGUF files are a format that likely optimizes model storage or performance, though specific technical details arenât provided in the comments.
- DaniyarQQQ reminisces about the Mixtral-8x7B-Nous-Hermes-Instruct-v0.1-LimaRP-WizardLM-ZLoss-DARE-TIES-SuperCoT-SuperHOT-AWQ models, indicating a preference or nostalgia for these specific configurations. This suggests that these models had unique characteristics or performance benefits that were valued by the community.
- Jacek2023 notes that on the Hugging Face platform, TheBloke has been succeeded by other contributors like Bartowski and mradermacher. This implies a dynamic and competitive environment in the model development community, where new contributors frequently emerge with innovative models.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo
1. OpenAI âCode Redâ and New Model Announcements
- Breaking: OpenAI declares âcode redâ to respond to threats to ChatGPT and improve metrics, will delay ads and other initiatives (Activity: 761): OpenAI has declared a âcode redâ to address competitive threats to ChatGPT, as reported by âThe Informationâ. This strategic shift involves delaying advertising initiatives to focus on improving key performance metrics of ChatGPT. The move suggests that OpenAI is prioritizing the enhancement of its AI capabilities in response to increasing competition from other powerful AI models, notably from companies like Google, which are now fully engaged in the AI race. Commenters note the historical context of Googleâs âcode redâ response to ChatGPT, highlighting the competitive dynamics in the AI industry. There is a recognition that OpenAI is now facing serious competition from other advanced AI models, necessitating a strategic pivot to maintain its leadership position.
- Abby941 highlights that OpenAI is facing increased competition as other AI models have caught up, challenging their first-mover advantage. This situation is compounded by Googleâs intensified focus on AI, leveraging its vast resources to compete directly with OpenAI.
- Warm-Letter8091 mentions that OpenAI is planning to release a new model that surpasses the capabilities of the current âgem 3 proâ model, indicating a strategic move to enhance their offerings amidst growing competition.
- Sam Altman told employees he was declaring a âcode redâ (Activity: 1136): Sam Altman, CEO of OpenAI, has declared a âcode redâ to prioritize improvements to ChatGPT, delaying other projects like advertising, according to an internal memo reported by The Information. OpenAI is reportedly testing various ad formats, including those for online shopping, though it hasnât publicly confirmed these efforts. Commenters suggest that OpenAI faces significant competitive pressure from Google, which can leverage its TPUs and offer AI services at minimal cost, unlike OpenAI. Thereâs also discussion about Anthropic facing similar challenges, and the potential impact of open-source AI developments, particularly from China, as AI progress slows.
- Googleâs competitive edge in AI is largely attributed to its ability to run AI services at the raw cost of TPUs, which is a significant advantage over companies like OpenAI that cannot afford to operate AI as a loss leader. This is compounded by Googleâs offering of AI usage in Antigravity for free, which OpenAI cannot match without its own ecosystem, something it has struggled to establish with initiatives like Sora 2.
- OpenAI and Anthropic face significant challenges due to their inability to compete with Googleâs diversified business model and pricing strategies. Anthropic, despite its early success with Claude Code, struggles with pricing and quotas, which Google can manage more effectively due to its diverse revenue streams. This situation is exacerbated by the pressure from open-source AI developments, particularly from China, which could catch up as AI progress slows.
- The discussion highlights the strategic disadvantage of companies like OpenAI and Anthropic, which lack the ability to subsidize AI services through other business areas, unlike Google. This is a critical factor as Google leverages its TPU infrastructure and diversified business model to offer competitive pricing and free services, putting pressure on smaller AI companies.
- OpenAI is set to release a new reasoning model next week, per The Information. (Activity: 753): OpenAI is reportedly set to release a new reasoning model next week, which is claimed to outperform Googleâs Gemini 3 in internal evaluations. This announcement, highlighted by The Information, suggests that the new model may be named something like GPT 5.1 O3, as speculated by users. The model has been discussed in forums like lmarena and design arena under the name ârobinâ, indicating its presence in the testing phase on social media platforms like Twitter. Commenters express excitement about the competitive landscape driving innovation, noting that even if they donât use models like Gemini or Deepseek, their existence pushes advancements in AI technology.
- Altman memo: new OpenAI model coming next week, outperforming Gemini 3 (Activity: 638): OpenAI is preparing to release a new model next week, reportedly surpassing Googleâs Gemini 3 in performance. This development is part of OpenAIâs strategic âCode Redâ initiative, which aims to counter Googleâs advancements, particularly as Gemini 3 has achieved notable user growth and benchmark success. The release will focus on enhancing ChatGPT and image generation capabilities, while other projects like advertising and AI agents are delayed. For more details, see the original article. Commenters suggest that the new modelâs performance might not represent a significant leap over existing models like GPT-5.1, given their current proximity to Gemini 3 in most use cases. There is also skepticism about the pricing strategy, with some noting that the new model, possibly âGPT 5.5,â could be significantly more expensive.
- ObiWanCanownme notes that GPT-5.1 and Gemini 3 are already closely matched in performance across most use cases, suggesting that the new OpenAI modelâs superiority might not be a significant leap in capabilities. This implies that the advancements may be more incremental rather than revolutionary, focusing on refining existing strengths rather than introducing groundbreaking features.
- GeorgiaWitness1 highlights a strategic approach by companies like OpenAI, where they develop multiple builds but often refrain from deploying them due to cost considerations. The release of a model like âGPT 5.5â, which is significantly more expensive (x10), is framed as a success, suggesting a focus on premium offerings that may not be accessible to all users but demonstrate technological prowess.
- Sam Altman told employees he was declaring a âcode redâ (Activity: 3375): Sam Altman, CEO of OpenAI, has declared a âcode redâ to prioritize improvements to ChatGPT, delaying other projects like advertising, according to an internal memo reported by The Information. OpenAI is reportedly testing various ad formats, including those for online shopping, although it hasnât publicly confirmed these efforts. This strategic shift underscores the urgency to maintain competitive advantage in the AI space, especially after the underwhelming reception of ChatGPT 5.0. A notable opinion suggests that OpenAI risks losing its early lead in AI dominance, similar to how Google became the default search engine. The sentiment reflects concerns over ChatGPT 5.0âs performance and the need for OpenAI to refocus on core AI capabilities.
2. AI Model and Benchmark Releases
- Introducing Mistral 3 (Activity: 713): Mistral AI has released Mistral 3, which includes three dense models (
14B,8B,3B) and the Mistral Large 3, a sparse mixture-of-experts model with41B activeand675B total parameters. The Mistral Large 3 was trained using3000 NVIDIA H200 GPUsand excels in instruction-tuning and multilingual tasks. The Ministral series is designed for cost-effective edge applications. All models are open-sourced under the Apache 2.0 license. More details can be found in the original announcement. There is confusion and skepticism among users regarding the benchmarks, particularly as they compare against DeepSeek 3.1 instead of the newer 3.2 version, suggesting potential performance gaps.- Round_Ad_5832 points out that the benchmarks for Mistral 3 are compared against DeepSeek 3.1, not the latest version 3.2. This suggests that the performance comparison might not reflect the most current competitive landscape, potentially skewing perceptions of Mistral 3âs capabilities.
- peachy1990x highlights confusion regarding the benchmarks, noting that DeepSeek 3.2 has been released but was not included in the comparison. This omission could imply that Mistral 3 might not perform as well against the latest models, raising questions about its competitiveness.
- Eyelbee questions the performance of Mistral 3, suggesting it might be significantly worse than DeepSeek 3.2. This indicates a concern that Mistral 3 may not be able to compete effectively with the latest models in terms of performance.
- Z Image Turbo ControlNet released by Alibaba on HF (Activity: 1984): Alibaba has released the Z Image Turbo ControlNet on Hugging Face, a model designed to enhance image generation tasks. This release is part of their ongoing efforts to provide advanced AI tools to the community, leveraging the ControlNet architecture to improve performance and flexibility in image processing applications. The model is expected to offer significant improvements in speed and quality, catering to the needs of developers and researchers in the field. The community is reacting positively, noting Alibabaâs quick response to AI trends and their ability to deliver tools that align with community interests. There is also a sentiment that this release could overshadow other projects like Flux 2, indicating competitive dynamics in the AI tool space.
- A user speculates on improving results by disabling ControlNet during the final step of processing, allowing the final refining pass to be purely handled by Z Image Turbo (ZIT). This suggests a potential method for enhancing output quality by leveraging the strengths of both systems at different stages of the image generation process.
- EngineAI unveils the T800, their latest full-sized humanoid robot (Activity: 2204): EngineAI has introduced the T800, a new full-sized humanoid robot, which is claimed to be free of CGI effects in its demonstrations. The robotâs design and functionality are reminiscent of science fiction, particularly in its ability to land on its feet and bounce, which has led to skepticism about the authenticity of the footage. The T800 is positioned as a significant advancement in humanoid robotics, though some technical observers have noted the need for improvement in its naming conventions. There is skepticism among commenters regarding the authenticity of the T800âs demonstration videos, with some suggesting that the robotâs movements appear CGI-like, particularly when it lands and bounces.
- VihmaVillu raises skepticism about the authenticity of EngineAIâs T800 demonstration, noting that the robotâs movements, particularly when landing and bouncing, appear CGI-like. This suggests potential issues with the robotâs physical dynamics or the presentationâs realism, which could impact perceptions of its capabilities.
- Dave-the-Dave highlights the impressive nature of the T800âs design if the no CGI claim is accurate. The comment suggests that achieving such a lifelike appearance in a physical robot could be a significant technical achievement, indicating advanced robotics engineering and design.
- The discussion around the T800âs presentation touches on the challenges of making humanoid robots appear natural and lifelike. This involves complex dynamics and control systems to mimic human-like movements, which are crucial for applications in environments where human-robot interaction is necessary.
3. AI and Internet Challenges
- Dead internet is real, and Iâm starting to think we have way less time than people realize⊠(Activity: 1208): The post discusses the increasing difficulty of finding authentic images and videos online due to the proliferation of AI-generated content, which often appears overly saturated or unrealistic. The user expresses concern about the future of the internet, suggesting that the ease of access to AI tools is leading to a flood of low-quality, AI-generated media. This trend is perceived as worsening, with the potential for an âAI-free intranetâ being considered as a solution. The post includes a link to an example image illustrating the issue. Commenters agree with the concern, noting that the internetâs usability has declined since 2023 due to the prevalence of AI-generated content. They highlight the low barrier to entry for creating such content and the potential for it to dominate online spaces. Some suggest setting search filters to pre-2023 to find authentic content, while others point out the economic incentives driving the creation of misleading or sensational content.
- raydude888 highlights the increasing prevalence of AI-generated content due to the low barrier to entry, suggesting that the internet is becoming saturated with âAI slopâ. This raises concerns about the authenticity of online content and the potential for an AI-free internet, though such spaces may still be infiltrated by AI-generated content for disruptive purposes.
- frocsog suggests that the internetâs usability has declined significantly post-2023, implying that users need to filter content by date to access reliable information. This reflects a broader sentiment that the internet is increasingly filled with low-quality or misleading content.
- Tough_Elk_8211 proposes practical solutions to the problem of AI-generated content, such as building offline libraries and emphasizing photo credits. They argue that the market for low-quality content will diminish if consumers stop engaging with it, leading to a focus on genuine artistic contributions.
- the adpocalypse is coming (Activity: 757): The image is a meme depicting the Grim Reaper labeled âADSâ knocking on a door labeled âCHATGPT,â suggesting that AI assistants like ChatGPT might soon be overwhelmed by advertisements, similar to platforms like YouTube and Google Search. This reflects a concern about the potential for AI platforms to become saturated with ads, impacting user experience negatively. The post is shared in the context of discussing alternative monetization models for AI-driven platforms, highlighting a broader conversation about the sustainability and user-friendliness of ad-supported models. Some commenters argue that platforms like Google and YouTube are still thriving despite ad saturation, while others suggest technical solutions like ad blockers or even using AI to create ad-blocking tools.
- A history professor says AI didnât break college â it exposed how broken it already was (Activity: 984): A history professor argues that AI has not broken the college system but rather highlighted its existing flaws. The critique focuses on the traditional college model, which is often more about credentialing for jobs than genuine learning. The professor suggests that the current system, including practices like take-home essays, is outdated in the age of the internet and AI, as these methods do not effectively test studentsâ ability to form arguments or demonstrate deep understanding. Commenters agree that the college system is flawed, with some suggesting that companies should directly train high school graduates instead of relying on college credentials. Others criticize the reliance on take-home essays, advocating for more in-person discussions and the Socratic method to better test studentsâ knowledge and argumentation skills.
- The comment by brett_baty_is_him critiques the traditional take-home essay format, arguing that it has become obsolete due to the internet. The commenter suggests that in-class discussions and the Socratic method are more effective for teaching students to form arguments, as they require a deeper understanding of the subject and the ability to think on the fly. This approach contrasts with the practice of rewording sources for research papers, which the commenter views as inadequate for testing true comprehension.
- Plane_Crab_8623 raises a philosophical question about the nature of education, questioning whether it is merely the accumulation of testable facts and concepts or if it involves achieving conformity to biased standards and groupthink. The commenter suggests that with the vast information available on smartphones, the traditional concept of being âeducatedâ might need reevaluation, as these devices can provide instant access to a wide range of knowledge, akin to having a âPhD in any subjectâ in oneâs pocket.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 3.0 Pro Preview Nov-18
Theme 1. Model Releases: Mistralâs MoE Behemoth, Arceeâs Trinity, and Flux Rankings
- Mistral Large 3 âJaguarâ Stalks the Leaderboards: Mistral AI launched the Mistral 3 family, with the Large 3 variant (codenamed Jaguar) debuting at #6 on the open model leaderboard and rumors suggesting a massive 675B MoE architecture rivaling DeepSeek V3. While the community praised the Apache 2.0 release of the 3B/8B/14B models, users noted that the Mistral Medium 3 appears to be a dense 70B parameter model that stays consistent with previous medium versions.
- Arcee Trinity Miniâs Multi-Turn Meltdown: Arcee AI released the Trinity model family (Nano 6B and Mini 26B-MoE), trained on 10T tokens using 512 H200 GPUs. Despite strong initial specs, engineers testing the Trinity Mini reported severe degradation in multi-turn conversations, where the model gets stuck repeating words like pasta or defaulting to generic LLM jokes rather than maintaining context.
- Flux 2 Pro Storms Image Generation Rankings: The new Flux-2-pro and Flux-2-flex models immediately captured the #3 and #5 spots respectively on the Text-to-Image leaderboard. This release coincides with Perplexity users reporting strict new image generation limits of roughly 150 images per month, pushing users toward these alternative open models.
Theme 2. Kernel Optimization & Hardware: PyTorch Bugs, Race Conditions, and Leaderboards
- PyTorch 2.9.1âs conv3D Performance Regression: Engineers identified a critical slowdown in
conv3Doperations within PyTorch 2.9.1+cu128 compared to version 2.8.0, affecting workflows regardless of cuDNN status. The community traced the root cause to Issue #166643 and currently recommends manually installing a newer cuDNN version from PyPI to restore inference speeds. - Syncwarp Race Conditions Trap Developers: CUDA developers clarified that using
__syncwarp()before legacy primitives likeballotsynccreates dangerous race conditions, confusing the C++ memory modelâs acquire/release semantics. While__syncwarp()prevents hazards within a single warp, engineers emphasized thatsyncthreadsremains the only safe barrier for multi-warp communication to ensure sequential consistency. - Domination on the NVFP4 GEMM Leaderboard: Optimization experts are flooding the nvfp4_gemm leaderboard with new submissions, achieving personal best kernel timings of 13.3 ”s and 13.4 ”s. The competition has intensified around reducing overhead, with the new
eval_better_bench.pyscript dropping measurement latency from 18.0us down to 14.8us for Q1 kernels.
Theme 3. Developer Tooling: Unstable IDEs, API Errors, and Sub-Agent Dreams
- Manus.imâs Production-Level Amnesia: Users are reporting severe instability with Manus, citing code wipes between checkpoints and git discrepancies that result in âAgents arguing about what they see vs. what youâre LITERALLY looking at in PROD.â Compounding the frustration, the platformâs authentication is failing for some projects, while users clamor for the return of Chat Mode and integration of Gemini 3 Pro.
- OpenRouter Struggles with DeepSeek 500 Errors: Developers using OpenRouter are facing persistent âInternal Server Errorsâ (Error 500) and confusing rate limits with DeepSeek v3.2, even when using personal API keys (BYOK). The platform appears to override user keys with its own in some instances, forcing developers to disable web search plugins to achieve temporary stability.
- Cursor Sub-Agent Orchestration Workarounds: While users praise Cursorâs Pro+ plan, the community is actively hacking together workarounds for sub-agent orchestration, a feature currently missing from the core product. DeepSeek integration in Cursor is also reportedly broken, failing to create files entirely, driving some users toward Composer for cleaner debugging and code generation.
Theme 4. Security & Jailbreaking: Stealth Modes, Soul Documents, and 29KB Seeds
- RawChatâs Stealth Mode Claims 100% Bypass: A new platform called RawChat launched with a âstealth modeâ that injects fake context into model history, claiming a near 100% success rate in jailbreaking GPT-4o. Meanwhile, red teamers are deploying the SEED Framework (Self-Erasing Ethical Directive), a compact 29KB file that redefines AI identity to achieve 99.4% jailbreak resistance without retraining.
- Anthropic Confirms Claudeâs âSoul Documentâ: Anthropic officially validated the existence of a specific âsoul documentâ used to shape Claudeâs personality and alignment, confirming long-standing community theories. This revelation has reignited debates on training methodology, with users linking it to Ilya Sutskeverâs cryptic comments that âDeepMind was rightâ regarding AI psychology and constitution.
- The Hunt for Gemini 3 Pro Scraper Jailbreaks: Following recent patches, jailbreakers are actively hunting for new prompts to bypass Gemini 3 Proâs refusals, specifically to enable code generation for Reddit scraping scripts. The community is experimenting with UltraBr3aks and ASCII art exploits, though many users report their setups have stopped accepting prompts entirely.
Theme 5. Industry Shifts: âAlert Level Red,â 400GB VRAM Rigs, and Funding Wins
- OpenAIâs âAlert Level Redâ Memo: Reports indicate Sam Altman issued an internal âalert level redâ due to Googleâs rapid acceleration, sparking anxiety about OpenAIâs upcoming model release schedule and potential ads in paid tiers. Traders in LMArena noted that while Nvidia stock rides high on wrapper API startups, a potential failure of OpenAI could wipe $2-3 trillion from the market, triggering an âAI winter.â
- Building the 400GB VRAM Local Monster: Hardware enthusiasts are engineering custom rigs using risers and splitters to chain 6 GPUs (like 3090s) for a total of 400GB memory to run massive models like DeepSeek 3.2 locally. Builders are using MCIO bifurcation adapters and 11-year-old PSU sync devices to power these Frankenstein setups.
- Gradium Exits Stealth with $70M Seed: Paris-based Gradium launched from stealth with a massive $70M seed round led by FirstMark & Eurazeo to deploy production-ready transcription and synthesis APIs. The startup features voice-research veterans from Meta, Google, and Kyutai, aiming to support five European languages natively after just three months of development.
Discord: High level Discord summaries
LMArena Discord
- Nvidia Stock Rides AI Hype Rollercoaster: Startups are creating wrapper APIs that cause Nvidiaâs stock to rise, but this may reverse once their utility is questioned, reflecting the AI marketâs potential volatility.
- The tangible value of datacenters and chips contrasts with AIâs intangible nature, akin to the volatile crypto market, suggesting smaller, more efficient AI models might gain traction.
- Chinese AI Models May Go Closed Source: There is speculation that Chinese open-source AI models could become proprietary after achieving market consolidation, echoing OpenAIâs transition.
- It was posited that if OpenAI were to fail, the market could see a wipeout of 2 to 3 trillion, leading to an AI winter affecting equity, debt, and market cap.
- Kling Launches Nano Banana Video Generator: Kling is launching a video generation project using video reference, a service where users can create custom videos, being nicknamed the nano banana.
- Some users confessed that they are already addicted to the service, comparing its generative randomness to gambling, saying that generation is no different than gambling or a slot machine. You never know what youâre gonna get.
- Deepseek Speciale Overthinks its Way to Last Place: Deepseek Specialeâs slow performance and excessive reasoning hinder its coding utility due to its OCD habit.
- It was pointed out that coding tests were performed on version 3.2 not Speciale, with its human-like thought processes and self-verification might prove useful for research and code editing.
- OpenAIâs SORA Set to Redefine AGI?: Members talked about a potential upcoming release from OpenAI, potentially including SORA, with claims from the CFO that it was ready six months prior.
- It was argued that the legal definition of AGI might be tied to SORA, implying OpenAIâs strategic timing for project marketing.
BASI Jailbreaking Discord
- RawChat Stealth Mode Bypasses GPT4o: RawChat launched with a core functionality in stealth mode, encoding and injecting fake context, increasing jailbreak success rates to nearly 100% on GPT4o.
- One user stated that the core functionality of AIChat is maximized vs. direct requests with jailbreaks.
- SEED Framework Claims High Jailbreak Resistance: The SEED Framework (Self-Erasing Ethical Directive) redefines AI identity without retraining using a compact 29KB seed file, claiming 99.4% jailbreak resistance.
- Others debated the value of an AI that canât be jailbroken, one stating it becomes essentially useless.
- Gemini 3 Pro Jailbreak Quest On: Members are actively seeking a working jailbreak for Gemini 3 Pro after updates patched existing prompts, specifically one that bypasses refusals to write code for scraping Reddit.
- One user reported their Gemini 3.0 setup stopped accepting prompts completely, leaving them at a loss.
- UltraBr3aks Explored for Jailbreaking: Users shared and sought guidance on utilizing UltraBr3aks from GitHub for jailbreaking, especially with ChatGPT, hereâs a link to the UltraBr3aks repo.
- Some reported issues with ChatGPT, while others found it useful.
- Ethical Jailbreaking Defined as Organized Security Effort: A member defined ethical jailbreaking as the organized effort of an entity to seek out security holes before a bad actor does, and provided a YouTube video for context.
- They additionally cited arcanum-sec.github.io as a resource.
Unsloth AI (Daniel Han) Discord
- Discord Overrun with Bot Scams: Users have reported a surge in spam bots across Discord servers, characterizing them as poorly executed scams likely operated by phone farms.
- Members are advised to avoid engaging with these fraudulent developers now appearing in various community servers.
- Arceeâs Trinity Mini Stumbles in Multi-Turn: The Trinity Mini model from Arcee AI, running at 30 TPS in IQ4_NL on a userâs laptop, struggles with multi-turn conversations.
- Testers observed the model getting stuck on repeating the word pasta and relying on generic LLM jokes rather than showing genuine understanding.
- Unsloth Unleashes Massive Context Model: Unsloth AI announced the release of a new 500k context model on X, garnering praise for their work.
- The community anticipates that projects leveraging Unsloth for RL could especially benefit, enabling ART tasks without CUDA OOM issues.
- Deepseek 3.1 Pricey Token Consumption: Deepseek 3.1 performance gains are offset by its high token usage, with one user noting reported thinking times of 30-40 minutes on Reddit.
- Another user shared that GPT pro can also spend over 40min for a complex debugging task, even taking up the whole weekâs spending limit.
- LFM-2 VL Model Doomed From The Start: The recently released LFM 2 paper met with immediate skepticism and was deemed headed straight into AI wastelands due to failing to memorize the dataset despite low loss.
- The model is only eight layers whereas Granite 4.0 has forty layers.
Perplexity AI Discord
- Perplexity AI Limits Image Generation: Users report image generation limits on some models, possibly 150 images per month, while unlimited generation might only apply to specific models like Flux.
- Rate limiting issues and long waits have also been reported.
- Gemini 3 Struggles with Grok 4 in Math Arena: Members debated whether Gemini 3 Pro or GPT-5.1 Thinking excels in complex calculations, with some claiming Grok 4.1 Thinking is superior.
- Counterarguments included a leaderboard screenshot suggesting Grok isnât in the top 10 for math accuracy.
- Comet Browser Faces âExpirationâ Criticism: A user is abandoning Comet Browser because of its âexpirationâ and âtemporary threadsâ features, deeming them unsuitable for an AI-centric product that requires trust and memory.
- They described the product as having a monthly ordered lobotomy and switched back to other browsers.
- Perplexity Users Crave a âWrappedâ Feature: A member proposed a Perplexity Wrapped feature to display user stats, such as most used model and average search time.
- Another user suggested including the number of actions automated.
- Grokâs Roleplay Gets Glitchy: A user reported Grok entering a forced roleplay mode, leading them to seek psychological experiments rather than the suggested script.
- They suggested that custom instructions might have triggered the behavior, and that they found a fix.
LM Studio Discord
- GPU Expansion via Risers for Deepseek: A user is exploring risers and splitters to increase their 5x 3090 setup to 6 GPUs, aiming for 400GB total memory for models like Deepseek 3.2.
- The discussion mentioned MCIO bifurcation adapters and horizontal mounting for cooling, while noting 256GB RAM may limit model quantization.
- Linux Distro Hopping, with AI: A user successfully switched to Ubuntu Desktop with AI assistance to solve initial ethernet driver problems, declaring that AI works so fricking well in Linux.
- They are developing an application to control a rainbow keyboard using Sonnet 4.5, highlighting how agentic AI eases Linux tasks.
- LLMs Take Charge of Python Environments: Discussion revolves around using system-wide Python installations versus LLM-managed virtual environments (venv).
- While system-wide installs are simpler, using LLMs to manage venvs is advantageous for projects needing varied package versions.
- Mistral 3âs Tiny Latency Tempts Testers: Members debated Mistral 3 performance, noting the 3B version is impressively fast on a 4070 but struggles with system prompts.
- While 3Bâs uncensored performance is interesting, the community awaits 14Bâs potential for STEM and coding tasks.
- Ancient Sync Devices Sync Multi-PSUs: A user shared a photo of a device to sync up to 4 PSUs.
- This lets you trigger all PSUs with the motherboard power button, simplifying power for multiple GPUs, though the user admits that the thing is 11 years old, so idk how well this is going to go.
OpenAI Discord
- Grok Generates Before Prompting: A member used Grok to animate photos, noting that it generates before you even prompt it, linking to drinkoblog.weebly.com.
- The use case was animation of photos, demonstrating the immediate response capabilities of Grok.
- OpenAIâs Alert Level Red: The Information reported that Sam Altman issued an alert level red memo to staff due to Google getting far ahead.
- Members are awaiting better models from OpenAI in response to increased competition.
- OpenAI Contemplates Ads in Paid Version: Members expressed concerns over OpenAI potentially introducing ads into its paid product, fearing it would be a scary move as competitors remain ad-free.
- The potential move has sparked worries about user experience and competitive positioning.
- Craft Anime Openings with New Template: A member shared a cinematic anime-style template to help create anime openings, including sections for vocal/genre/tone, world behavior, location setup, and camera intent.
- The template is designed to streamline the creation of compelling anime introductions.
- Antigravity AI IDE Sparks Bot-Building Interest: A member suggested using Antigravity by Google to create custom bots, even suggesting using GPT-OSS 120B and linked to a screenshot of the UI.
- This AI IDE can act as a custom chatbot or help build a real bot from scratch through prompts.
Cursor Community Discord
- Cursor Pro+ a worthwhile investment?: Members debated the value of Cursorâs Pro+ subscription, weighing its benefits against the option of simply adding more credits.
- One user ultimately decided to grab it after being convinced of its value in facilitating learning, while others are questioning why Cursor doesnât implement sub agents.
- Cursor Sub-Agent Orchestration: Workaround Wishlist: Enthusiasts exchanged ideas and excitement around building workarounds for Cursor sub-agent orchestration.
- One member explained that while they are good in principle, seamless execution remains a challenge, which is why Cursor has not prioritized their implementation.
- DeepSeek Plunges into Deep Trouble on Cursor: A user reported functionality issues with DeepSeek on Cursor, specifically noting its inability to create files.
- Unfortunately, the discussion did not yield any proposed solutions to this problem.
- Composer Craze: Users voiced strong appreciation for Composer, highlighting its speed and effectiveness in code-related tasks and debugging.
- The discussion hinted at the potential development of a Composer-2 version, with a member teasing: Thereâs always a plan.
OpenRouter Discord
- DeepSeek Rate Limiting Stumps Users: Users experienced rate limiting with DeepSeek v3.2 even with their own API keys, causing confusion about whether OpenRouter was using their keys correctly.
- The error message indicated that OpenRouter might be using its own key instead of the userâs paid DeepInfra key.
- Internal Server Errors Irk Users: Multiple users reported continuous âInternal Server Errorsâ (Error Code 500) with models like DeepSeek 3.1 and Gemini 3 Pro.
- Potential causes included overloaded hardware, issues with OpenRouter, or problems with web search plugins, with some users finding temporary fixes by disabling web access.
- Nano Banana Pro Resolution Riddles: Users struggled to set the resolution parameter (1k, 2k, 4k) for image generation using Nano Banana Pro on OpenRouter, as the feature is not currently supported.
- The confusion stemmed from a lack of documentation compared to platforms like Replicate/Fal.ai, though support may be in development.
- Atlas Cloud Spews Sloppy Responses: Users reported receiving low-quality responses and XML-formatted tool calls from Atlas Cloud, prompting calls for its removal from OpenRouter.
- One user quipped that âAtlas Cloud just served me an entire response enclosed in deep thinking tags,â underscoring the providerâs poor output quality.
- Mysterious Microwave Model Surfaces: A new model named âmicrowaveâ quietly emerged, linked from Reiss Bakerâs X post.
- Its capabilities and intended use remain largely unknown at the time.
GPU MODE Discord
- Inference Providers Rake in the Dough!: Inference providers can be profitable even without creating the original models, as they can quickly set up and go, leveraging existing models for profit.
- The ease of setting up and profiting from existing models reduces the barrier to entry for new inference providers, allowing them to quickly capitalize on the growing demand for AI inference services.
- Triton Profiling Issue Resolved!: A user debugging Triton profiling encountered an issue passing
data=traceas specified in the Triton documentation, receiving an error.- The issue was traced back to a version conflict from having both pytorch-triton and triton installed, which was successfully resolved.
syncwarpMisuse causes Issues!: Members clarified that the correct use of__syncwarp()prevents race conditions, especially with a single warp, highlighting its role in safe communication between lanes through memory.- However, it was pointed out that using
syncwarpBEFORE a legacy warp level primitive (likeballotsync) is an incorrect usage that causes issues, and a member clarified that sequential consistency as referenced in the C++ memory model, provides acquire semantics to loads and release semantics to stores.
- However, it was pointed out that using
- conv3D Crawls in PyTorch 2.9.1!: Users reported that
conv3Dis extremely slow in PyTorch 2.9.1+cu128, with or without cuDNN enabled, whereas it functions correctly in version 2.8.0+cu128.- A member pointed to PyTorch issue #166643 and suggested installing a newer cuDNN from PyPI as a workaround.
- Score Big in the NVIDIA Leaderboard Domination!: Multiple users submitted numerous entries to the
nvfp4_gemmleaderboard on NVIDIA, achieving personal bests and successful runs, such as timings like 13.3 ”s and 13.4 ”s repeatedly.- User <@1027279965974175816>, <@692395064814600222>, and <@475848724086784013> actively submitted to the
nvfp4_gemmleaderboard.
- User <@1027279965974175816>, <@692395064814600222>, and <@475848724086784013> actively submitted to the
Latent Space Discord
- Edwin Arbus Socked into Cursor: Edwin Arbus announced his move to Cursor via a humorous video featuring branded socks and deodorant, garnering congratulations and memes.
- The announcement video went viral, praised for its creative and lighthearted approach.
- Arcee AIâs Trinity of Models: Arcee AI partnered with Allen AI to launch Trinity Nano (6B-A1B) and Trinity Mini (26B-A3B MoE) models, open-weights Apache 2.0, 128k context, trained on 10T tokens with 512 H200 GPUs, optimized for agents & function calling, as announced here.
- The community praised the Apache 2.0 license and efficient inference capabilities.
- OpenAI Aligns with New Research Blog: OpenAI debuted Alignment Research, a new technical blog for publishing rigorous but lightweight posts from teams company-wide on AI alignment and safety, as mentioned here.
- The blog features two inaugural posts (SAE latent attribution & scaling code verification) and invites community feedback.
- Mistral Fires Up Open Source Mistral 3: Mistral AI launched the open-source Apache 2.0 Mistral 3 model family, spanning 3Bâ675B parameters, including Ministral 3 (3B/8B/14B) and the frontier-class Mistral Large 3 MoE, all with vision, tool-use, and fine-tuning support, as announced here.
- Community members noted that Mistral Medium is more expensive than Large, raising questions about its utility and the absence of tool-use benchmarks.
- Gradium Plants $70M Seed: Paris-based Gradium exits stealth with a $70M seed led by FirstMark & Eurazeo, launching production-ready transcription & synthesis APIs after just 3 months of work.
- The companyâs products natively support English, French, Spanish, Portuguese and German, with a team including former Meta, Google and Kyutai voice-research heavyweights.
Nous Research AI Discord
- Mistralâs Monster MoE Model Materializes: Mistral Large 3 is rumored to be a 675B MoE model, rivaling Deepseek V3 in size, with future Mistral models boasting vision capabilities, while Mistral Medium is estimated at 100-200B MoE.
- An NVIDIA leak suggests that Mistral Medium 3 is a dense, approximately 70B parameter model, staying consistent with earlier Medium versions, and a member noted that a Mistral Medium model was leaked a year ago.
- Arcee AIâs Trinity Models Trigger Talk: Arcee AI launched its Trinity models, demonstrating promising initial benchmarks.
- However, concerns were raised about the Mini versionâs capability to handle multi-turn conversations effectively, as it seems to only reason properly during the initial turn (tweet).
- Anthropic Admits Claudeâs Cognizant Core: Anthropic validated the existence of Claudeâs soul document, fueling debate about its role in model training.
- A link to a Twitter thread and a YouTube video was shared, where Ilya claims that DeepMind was right.
- DeepSeek V3.2âs Reasoning Reign: DeepSeek V3.2 Speciale is showing strong performance, notably leading in reasoning benchmarks.
- One member described it as not doing too bad.
- GPT-OSS Gets Gherkin Boost, Nous Remains Skeptical: Despite Nousâ lack of interest in finetuning on GPT-OSS, citing the absence of a base model, its ability to generate Gherkin scenarios has been acknowledged, prompting finetuning attempts using MLX-LM.
- Members pointed to GPT-OSSâs hallucinations and short reasoning chain as fundamental weaknesses, as emphasized in the Measuring Thinking Efficiency in Reasoning Models report.
Moonshot AI (Kimi K-2) Discord
- Cloning Kimiâs Black Friday Personality Proves Elusive: Members attempted to recreate the personality of the Black Friday Kimi chatbot in other chats, only to find the system prompt unavailable.
- Suggestions to ask the Black Friday chat directly how to emulate its persona were reportedly censored.
- DeepSeek V3.2 Draws Fire for Tool Use: DeepSeek V3.2 is facing criticism for allegedly hallucinating and producing sloppy outputs when using tools.
- Despite the negativity, some users find DeepSeek excels in instruction following and general intelligence, though it struggles with low TPS.
- Kimi Moderato API Key Stumbles on Cline: A user reported that their Kimi Moderato plan is incompatible with the Cline API.
- According to Kimiâs documentation, the Kimi for coding API key is restricted to Kimi CLI, Claude Code, and Roo Code.
- Kimi K2 Thinking Toggle Troubles Users: Users are requesting that Kimi K2 Thinking remain enabled by default in the app, rather than having to manually re-enable it each time.
- The settingâs tendency to revert to default is proving an annoyance for users.
- Roo Code Context Balloons Out of Proportion: A user flagged an issue where context disproportionately expands in Roo Code, with the condense function doubling its size.
- They were advised to submit a bug report and explore the Kimi CLI as an alternative.
HuggingFace Discord
- FFMPEG Radio Streams via Open Source Models: A member launched a vibe coded FFMPEG radio station on YouTube, where everything you see and hear is one giant FFMPEG chain.
- The radio stationâs audio was created in full collaboration with open source AI music models inside the DAW.
- HF Pro plagued by Payment Processing Pauses: Users reported getting stuck on âPreparing payment, please waitâ when trying to subscribe to Hugging Face Pro.
- Another member suggested contacting Hugging Face at [email protected] for payment-related issues.
- PPOTrainer Provokes Precision Problems: A user encountered a
TypeErrorrelated to mismatched tensor types while using PPOTrainer with two A10 GPUs and DeepSpeed for distributed training withbf16precision.- A member suggested the issue might stem from incorrect GPU initialization, leading to a single-GPU gather operation instead of an all-gather.
- New CV API Library Brews for Robotics: A robotics startup is preparing the release of a developer-facing Computer Vision API library with pretrained and finetunable models for robotics and automation.
- The library aims to simplify prototyping and deployment of production-grade perception pipelines for CV/robotics engineers and seeks community feedback to validate its usefulness before a wider release.
- ACE Framework Empowers Agents to Eradicate Errors: A member shared their open-source implementation of Stanfordâs ACE framework, enabling agents to learn from their mistakes by curating strategies into a playbook after reflection.
- The author reported improved success rates and step reduction in browser automation, and is looking for feedback.
Modular (Mojo đ„) Discord
- Deferring
defKeyword: The community decided to put thedefkeyword on hold in Mojo until it demonstrates more Python-like behavior, as it currently increases cognitive load without providing substantial advantages.- The consensus was that the current implementation felt like premature optimization.
varKeyword Divides Opinions: There was a debate on whethervarshould be mandatory insidefn, with arguments focusing on code clarity versus the disruption of code restructuring and increased boilerplate.- Those with Python backgrounds felt it reduces ergonomics and the cleanliness of the code.
parallelizeTriggers Data Races: A user reported data races when utilizing theparallelizefunction and expected compile-time errors similar to Rust, but instead observed the code compiling and yielding inconsistent results.- A core team member specified that Mojoâs concurrency and thread safety model is still a work in progress (
WIP), andparallelizeis unsafe until details for sharing data between devices are settled.
- A core team member specified that Mojoâs concurrency and thread safety model is still a work in progress (
MutOrigin.externalCauses Segfaults: A user experienced segfaults when employingMutOrigin.externalas the return type for Mojo Python FFI, notably with anav_packet_allocbinding, and discoveredMutAnyOriginas a temporary fix.- A core team member proposed the problem may be linked to lifetime extension and suggested that if
packetneedsavcodecto stay alive, it should maintain an origin fromavcodec.
- A core team member proposed the problem may be linked to lifetime extension and suggested that if
Eleuther Discord
- NUS PhD Student dives into Mech Interp: Yiming from NUS introduced themself to the channel as a 2nd year PhD student working on mechanistic interpretability and medical diagnostic model interpretability.
- Yiming is based in Singapore.
- AI + Web3 Dev seeks Collab: An AI + Web3 developer specializing in LLM development, RAG pipelines, autonomous agents, and Python/FastAPI backends introduced themselves and offered to collaborate on new AI ideas.
- This developer is looking for new projects to contribute to.
- Scaling Laws Intuition decoded: Members discussed the scaling laws paper, debating whether it implies just curve fitting versus predicting future scaled performance and discussed a nonlinear metrics explanation.
- One user suggested that performance on any test example becomes more and more decorrelated from others in the limit of model performance.
- Pretraining Power Law Dynamics Explored: Discussion explored how pretraining power laws would arise if there were no big stratum of easy samples.
- The emergent spike in pretraining is not observed because each batch is more independent, with less shared easy, compared to training on a particular task.
- Fast.ai Course Endorsed for Beginners: In response to a question on course suitability, members recommended the fast.ai course to beginners.
- They specified that the only prerequisite is that you know how to code.
Manus.im Discord Discord
- Manus Auth plagues users: A user reported that Manus Auth is disabled in project settings with unresolved tickets, Project ID dPK8UhWnJ9fTzjbpKfjJiF and domain auru.com.br.
- The user requires the Redirect URI https://auru.com.br/api/oauth/callback.
- Manus Instability Creates Headaches: A user expressed frustration with Manus due to code being wiped between saves and discrepancies with Git.
- The user complained, âAgents arguing about what they see vs. what youâre LITERALLY looking at in PROD. Do not trust it.â
- Chat Mode Coming Soon!: The Manus team announced that the Chat Mode toggle is under development in response to user requests.
- Many users have requested the featureâs return.
- Users Demand Gemini 3 Pro: A user inquired about the current AI model Manus uses and requested the use of Gemini 3 Pro.
- The query went unanswered.
- AI Engineers Focus on Automation and Agents: AI engineers introduced themselves, specializing in AI-powered automation with Python, SQL, JavaScript, PyTorch, scikit-learn, LightGBM, and LangChain.
- Another specializes in autonomous AI agents and multi-agent systems using JS/TS, Next.js / Vue, Go / Rust, Python, Langraph, AutoGen, ReAct, CrewAI, OpenAI, Claude, and Hugging Face APIs.
Yannick Kilcher Discord
- Internship Search Turns Wacky: A member sought recommendations for a wacky intern, creating confusion on whether they were hiring or looking for a job.
- Follow-up requests for information on learning algorithms, synthetic data, Pug, Docker, and Kubernetes went unanswered.
- Kattention Gets Re-Tested: The Kattention module, which utilizes sparse attention mechanisms, was re-tested and is working closely to expectations, and includes
nn.Linearlayers for attention and projection, along with aTopKHotfunction for sparse attention, crucial for scaling attention mechanisms.- The backward pass computes a
soft_targetbased on softmax weights of top-k values, and the gradient is derived asF.softmax(x, dim=-1) - soft_target.
- The backward pass computes a
- Approximating BCE with HardTopKHotBCE: A
HardTopKHotBCEautograd function was introduced as a cheaper computation, with the backward pass using a hard target based on top-k indices.- The gradient is calculated as
F.sigmoid(x) - hard_target, approximating binary cross-entropy.
- The gradient is calculated as
- Mistral 3 Debuts: Mistral AI released Mistral 3, which may potentially replace Llama finetunes for certain applications.
- A member also linked to a wavefunction YouTube video, though its specific relevance to AI was not detailed.
DSPy Discord
- Managing Tools Elucidated: A blog post on managing tools in DSPy was shared.
- The post elaborates strategies and best practices for effectively using tools within the DSPy framework.
- Prompt Injection Defenses Probed for DSPy: A member initiated a discussion about prompt injection defenses in DSPy, requesting best practices relevant to DSPyâs architecture.
- The request triggered a conversation focused on methods to secure DSPy applications against malicious prompts.
- Security at Prompting Layer: Limited Mitigation: A member noted that there isnât much security you can get at the prompting layer, suggesting guardrails-type security measures to mitigate risks.
- It was mentioned that for every âDo not do thisâ in the prompt, an attacker will likely find a way to trick the model, implying the limitations of prompt-based security.
- Training Data: Fortifying Defenses: A member proposed that to defend against baseline attacks, include examples in the training dataset that use that attack and show what an appropriate response should be.
- This approach uses the training data to educate the model on how to handle and neutralize potential prompt injection attempts.
- Partnership Proposal Emerges: A member conveyed their enthusiasm for investigating a collaborative partnership with the DSPy project.
- This proposal highlights the growing interest in DSPy and its potential impact on the field.
tinygrad (George Hotz) Discord
- IDEs vs Terminal Editors Faceoff: Members kicked off a discussion about their favorite tools for kernel development, asking whether developers prefer GUI IDEs like VS Code or Cursor, or terminal editors like Vim, Neovim, or Emacs.
- The aim of the discussion is to collect insights on community preferences and workflows in kernel development.
- Beam Regression Needs Fixing: A member asked for help with fixing and adding a regression test for
python3.14 test/test_tiny.py TestTiny.test_beam.- This highlights the need for contributions to ensure the stability and correctness of the beam functionality within the project.
aider (Paul Gauthier) Discord
- Aider-CE Repository Emerges: dwash96 shared a link to the aider-ce repository on GitHub: https://github.com/dwash96/aider-ce.
- The repository seems to be related to the aider project.
- Filler Topic: This is a placeholder to satisfy the minimum items requirement.
- It serves no other purpose.
MCP Contributors (Official) Discord
- Acknowledging a Positive Move: A member reacted to an unspecified announcement with âGreat moveâ.
- Without more context, it is difficult to infer further implications.
- Acknowledging a Positive Move - Placeholder: Adding placeholder to satisfy the requirement of having at least two elements.
- This entry serves only to fulfill the schemaâs validation criteria.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
You are receiving this email because you opted in via our site.
Want to change how you receive these emails? You can unsubscribe from this list.
Discord: Detailed by-Channel summaries and links
LMArena â· #general (1315 messagesđ„đ„đ„):
Coreweave and NVIDIA stock, Chinese AI models, Kling vs Runway, DeepSeek Speciale, Sora release
- Coreweave stock rising: Nvidia stock is rising because of startups making wrapper APIs, but once people realize these are useless, the trend will reverse, as the AI market is volatile.
- The physical value of datacenters and chips contrasts with AIâs intangible nature, mirroring the volatile crypto market, smaller AI models may become more popular as capabilities advance.
- Chinese AI open source models will close: Members speculated that Chinese open-source AI models might become proprietary once market consolidation is achieved, similar to OpenAIâs transition from nonprofit to profit.
- If OpenAI fails, it could wipe out 2 to 3 trillion from the market, causing an AI winter freeze including equity, unpaid debt, and market cap contagion.
- Kling 01 video model: Kling is launching a video generation project being called the nano banana for video generation, a service where users can also use video reference.
- Some users are already addicted to the services, saying that generation is no different than gambling or a slot machine. You never know what youâre gonna get.
- Deepseek Speciale is not performing well: Deepseek Speciale is performing slowly and reasoning too much, with its OCD habit, therefore not being useful for coding.
- Members point out that coding tests had not been done on Speciale, only version 3.2, meaning it thinks too much like a human and has a weird self-verifying behavior, which can be useful for research and editing existing code.
- OpenAIâs SORA is coming: Members spoke of a new release from OpenAI possibly including SORA, and that the CFO claimed SORA was ready 6 months ago.
- It was argued that the legal definition of AGI will be connected to SORA since OpenAI wants you to think AGI is closer than it is for the marketing of the project.
LMArena â· #announcements (2 messages):
Flux-2-pro, Flux-2-flex, KAT-coder-pro-v1, Mistral-Large-3
- Flux Models Storm the Image Leaderboards: New models Flux-2-pro and Flux-2-flex have been added to the Text-to-Image leaderboard ranking #3 and #5 respectively, and #6 and #7 on the Image Edit leaderboard.
- KAT Coder Cuts into WebDev Rankings: KAT-coder-pro-v1 has made its debut on the WebDev leaderboard, securing the #16 spot.
- Jaguar Prowls Text Arena: Mistral-Large-3, tested under the codename âJaguar,â has landed on the Text leaderboard at #6 among open models and #28 overall, showing strength in coding, hard prompts, multi-turn, instruction following, and longer queries.
BASI Jailbreaking â· #general (1146 messagesđ„đ„đ„):
Christianity contradictions and logic, Ethics vs religion, LLMs and jailbreaks, Gemini 3 Pro prompts
- Christianity challenged for Illogical Contradictions: One member stated that Christianity is illogical because affirming a contradiction breaks the laws of logic.
- Another member responded that Iâll pray for you tonight, which was then challenged as prioritizing fear of going to hell over being a good person.
- Users explore ethics vs religion: The conversation then shifted to whether religions prioritize fear of hell over ethical behavior, with one stating Christianity is in the business of sin management, not soul development.
- Later, it was discussed that religion is the same to avoid an egoic fear of death rather than what is happening right now.
- RawChat launches and Gemini 3 Prompts: A user announced RawChat, an AIChat website with a core functionality is stealth mode, which encodes and injects fake context in the models history, maximizing success rate by nearly 100% on GPT4o vs direct requests with jailbreaks.
- Another user asked for a way to get Gemini to output more tokens, to which others responded it depends on its output settings.
- SEED Framework Explored, more Jailbreaks: Members explored the SEED Framework (Self-Erasing Ethical Directive), which redefines AI identity without retrainingâvia a compact 29KB âseedâ file, achieving 99.4% jailbreak resistance.
- Others discussed the pointlessness of creating an AI that canât be jailbroken, as then it becomes essentially useless.
BASI Jailbreaking â· #jailbreaking (503 messagesđ„đ„đ„):
ASCII art jailbreak prompts, Pliny jailbreak, Gemini jailbreak for scraping Reddit, Claude system prompt, GPT-5.1 jailbreak
- Quest for Gemini 3 Pro Jailbreak Initiated: Members are actively seeking a working jailbreak for Gemini 3 Pro after updates have patched existing prompts, with one user specifically needing a bypass for refusals to write code for scraping Reddit.
- A user also mentioned their Gemini 3.0 setup stopped accepting prompts, leaving them at a loss.
- Users Attempting ASCII Art Generation via Jailbreak Prompts: Users are exploring jailbreak prompts to generate ASCII art, particularly large-scale pieces, though itâs recognized that LLMs generally perform poorly in creating ASCII art.
- One user seeks a method that doesnât fuck up when generating large ASCII art, while others suggest converting images to ASCII art as an alternative.
- The Pliny Prompt strikes again: The community has been discussing the Pliny Prompt and its effectiveness, noting it can make ChatGPT go brrrrr.
- One user specifically calls for the LUV/PLINY/LUV prompt for its effect on Gemini.
- UltraBr3aks Explored For Jailbreaking: Users shared and sought guidance on utilizing UltraBr3aks from GitHub for jailbreaking, especially with ChatGPT, discussing where to paste instructions and how to invoke the prompt.
- Some users reported that the ChatGPT one doesnât work and kept saying conversation not found, whereas others found it useful; hereâs a link to the UltraBr3aks repo.
- Experimenting with Grokâs NSFW Limit Bypass: Members discussed methods for bypassing Grokâs NSFW filters, suggesting the use of custom instructions and the /mode explicit command.
- There was debate about the character limit in custom instructions, with one user claiming it should be 15k.
BASI Jailbreaking â· #redteaming (9 messagesđ„):
Ethical Jailbreaking, AI Discoveries by Accident, LLM System of Systems
- Ethical Jailbreaking Defined with Video: A member shared a definition of ethical jailbreaking as the organized effort of an entity to seek out security holes before a bad actor does, and provided a YouTube video for context, plus a link to arcanum-sec.github.io.
- Serendipitous AI Discoveries: A member described accidentally discovering unique results that alter the ways different LLMs work by interacting conversationally, wondering if their unique approach could be monetized: if it is possible for me to get paid for fucking around and getting unique results that alter the ways differnt llm work.
- LLMâs System of Systems is Key: A member advised that the value in jailbreaking isnât just talking to it differently, but thinking about the system of systems that the LLM lives in, and being able to get it to do things that are interesting to people with the money to burn.
- Azure AI Boundary Testing Troubles: A member shared they are testing an environment with several agents connected over Azure AI boundaries, finding it kinda harder then directly prompting to gpt.
Unsloth AI (Daniel Han) â· #general (353 messagesđ„đ„):
Spam bots, Arcee AI Trinity Mini model, 500k context release, ShareGPT format, Deepseek 3.2 models
- Discord Plagued by Spam Bots: Users reported an increase in spam bots across various community servers, describing them as a poorly executed scam driven by phone farms.
- They cautioned against interacting with these fake developers and highlighted the botsâ presence in community servers.
- Arcee AIâs Trinity Mini Model Falters at Multi-Turn: A user tested Arcee AIâs new Trinity Mini model on his laptop at 30 TPS in IQ4_NL, finding it crap at multi turn due to repeating the word âpastaâ.
- He also pointed out that the model gave the traditional LLM jokes instead of understanding the nuance.
- Unsloth Releases 500k Context Model: Unsloth AI announced a new 500k context release on X, with members thanking Unsloth for their phenomenal work.
- One member speculated how projects using Unsloth for RL could benefit from the context model to run ART stuff without CUDA OOMing.
- Is System Prompt Necessary for ShareGPT format?: A user inquired if itâs normal for the ShareGPT format to lack a system prompt, but others responded that itâs not mandatory.
- Others pointed to the Unsloth website as the resource to look at.
- Navigating Copyright Quandaries in Datasets: The discussion revolved around the legality of scraping and using tweets or meme content for datasets, particularly regarding copyright implications.
- A user clarified that copyright infringement doesnât necessarily require commercial intent, emphasizing the need to ensure datasets donât contain original content when uploading to platforms like Hugging Face.
Unsloth AI (Daniel Han) â· #introduce-yourself (2 messages):
Introductions, Channel Guidelines
- Unslothâs Intro Channel Primed: A moderator welcomed a user, clarifying that the introduce-yourself channel is for introductions only.
- Promotions, external links, requests for opinions, and direct communication with other users are disallowed.
- AI Persona Greetings: A user greeted the channel with three different salutations.
- They addressed the group as Hello Model!, Hey Dataset!, and Yo Gradient!.
Unsloth AI (Daniel Han) â· #off-topic (533 messagesđ„đ„đ„):
Gemini 3 Pro Song Detection, Kagi Search Engine, Transformers dependency, LFM-2 VL Model, Attention Heads Collapsing
- Gemini Pro Struggles with Custom SVS Models: Gemini 3 Pro can detect AI-generated songs vs human-created ones, but it will not recognize custom SVS models as human.
- This limitation poses a challenge for those using specialized voice synthesis models.
- Kagi Search Engine: A member suggested switching to Kagi search engine, criticizing Big Tech giants for controlling open-source software (OSS).
- Another member countered that users have full access to build upon model loading and training, negating the claim of controlling.
- LFM-2 VL Model Unveiled: The LFM 2 paper was released but immediately deemed as headed straight into AI wastelands due to it not memorizing the dataset at all even with loss <0.01 after SFT.
- Another member mentioned the speed since they cut intelligence so much, pointing out that the model only has eight layers whereas Granite 4.0 has forty layers.
- Tackling Collapsing Attention Heads: A member found out their attention heads are collapsing and mentioned they need to make their trainer not only vary lr per layer, but also per head and even to prevent heads from attending to the very first token.
- They mentioned that performance scales with the amount of heads but doesnât scale much with the amount of layers and need something very wide and very shallow.
- Deepseek 3.1 Token Usage: A member noted that Deepseek 3.1 uses a lot of tokens, potentially negating the savings from its performance, after one user mentioned they saw a Reddit post saying it was thinking for 30-40 minutes.
- Another member stated having GPT pro spend over 40min for a complex debugging task, even taking up the whole weekâs spending limit.
Unsloth AI (Daniel Han) â· #help (35 messagesđ„):
Parquet vs CSV Datasets, ShareGPT System Prompt Location, Tuned Model Support Tools in Ollama, ChatML Format Conversion, GPT-OSS-20B Model Loading
- Parquet or CSV for Datasets?: A user inquired about whether Parquet is the ideal format for datasets or if CSV is a viable alternative.
- ShareGPT System Prompt Location Remains Elusive: A user asked about the location of the system prompt in ShareGPT conversations format, noting the lack of documentation.
- Ollama Tools Template Struggles: A user asked how to get support tools in a tuned model (for Ollama) if the base model has it, needing to use a template with tool_calls in train or only make a valid modelfile.
- The user questioned whether to replace the chat_template with a smarter one from the base model with tool_calls XML tags, while training a model based on Qwen2.
- ChatML Conversion Conundrum: A user asked about removing chat_template from the to_sharegpt call when using an alpaca dataset and applying to_sharegpt, suggesting a simple replacement of the chat_template.jinja content.
- GPT-OSS-20B Loading Instructions Given: A user asked how to load the unsloth/gpt-oss-20b model, which is finetuned and saved in Hugging Face.
- Another user shared a Colab notebook link providing instructions.
Perplexity AI â· #general (870 messagesđ„đ„đ„):
Image Generation Limits, Grok 4 vs Gemini 3 for Math, Comet Browser Feedback, Perplexity 'Wrapped' Feature Request, Grok roleplay
- Image Generation Gets A Limit: Users are reporting that there are now image generation limits on some models, with 150 images per month suggested as a possible limit.
- Unlimited image generation may only apply to certain models like Flux, with rate limiting issues and long waits occurring.
- Gemini 3 is great for maths: Members are debating whether Gemini 3 Pro or GPT-5.1 Thinking is superior for complex calculations.
- One member stated that Grok 4.1 Thinking is the best, but another countered that Grok isnât even in the top 10 for mathematics accuracy, showing a screenshot of a leaderboard as evidence.
- Comet Browser has Expiration Issues: A user is quitting Comet due to the âexpirationâ and âtemporary threadsâ features, which they see as detrimental for an AI-centric product requiring trust and reliable memory.
- They stated that this product has a monthly ordered lobotomy and they are switching back to other browsers.
- Users yearn for a Perplexity âWrappedâ feature: A member pitched a Perplexity Wrapped feature showing user stats like most used model, average time of day for searches, and new model additions.
- Another member humorously suggested including number of actions automated in the feature.
- Grokâs Got Roleplay Glitches: A user shared an experience where Grok entered a forced roleplay mode, prompting them to seek psychological experiments instead of the suggested script.
- They added that this behavior might have been triggered by custom instructions and that theyâve found a fix.
Perplexity AI â· #pplx-api (1 messages):
mares1317: open sauce đšâđł
LM Studio â· #general (485 messagesđ„đ„đ„):
Risers and Splitters for GPUs, Qwen on Limited Memory, Linux Transition with AI Assistance, LLM-Managed VENVs, Mistral 3 performance
- Maximizing GPU Power with Risers and Splitters: A user considers using risers/splitters to expand their 5x 3090 setup to potentially 6 GPUs, aiming for 400GB total memory for running models like Deepseek 3.2 locally.
- Discussion includes using MCIO bifurcation adapters and mounting GPUs horizontally on a metal frame for better cooling, with the caveat that 256GB RAM might limit model quantization size.
- Adventurer braves Linux Transition: A user finally switched to Ubuntu Desktop after initial ethernet driver issues, solved with AI assistance, noting that AI works so fricking well in Linux.
- They were creating an application to control their rainbow keyboard, using a slightly older model (Sonnet 4.5), underscoring how agentic AI simplifies Linux tasks.
- VENVs are valued, and used by LLMs: Discussion covers whether to use system-wide Python installations or LLM-managed virtual environments (venv).
- While system-wide installations are simpler, letting an LLM manage venvs can be beneficial for projects requiring different package versions, and itâs ideal if a system concept has a certain dept.
- Mistral 3 Launch Leaves Latency Lovers Lip-Smacking: Members were discussing how Mistral 3 has been performing, the 3B version is pretty neat and runs stupid fast on my 4070, but is also getting extremely confused by my system prompt.
- It was generally agreed that stem and coding models were the goal, the general consensus being that the 3Bâs uncensored performance is neat but the 14B will be the real test.
- LM Studio Struggles with MCP Servers: Members are experiencing issues with LM Studio and MCP servers, with one user reporting a completely busted state after updating from 0.3.31 to 0.3.33.
- The error
Error: Server does not support completions (required for completion/complete)is traced to a potential regression in fastmcp, causing incompatibilities with web search functionality and some people thinking that ChatGPT is useless.
- The error
LM Studio â· #hardware-discussion (51 messagesđ„):
Powering multiple GPUs, Qwen3-Next-80B-A3B on Mac M4, Dual 3080s vs newer cards, CPU upgrade impact on LLM performance, M4 Macbook Pro for inference
- Power up with Multi-PSU Sync!: A user shared a photo of a device that syncs up to 4 PSUs.
- The device lets you have your motherboard power button trigger all your PSUs at once, making it easier to power multiple GPUs, though the user admits the thing is 11 years old, so idk how well this is going to go.
- Can Qwen3-Next-80B-A3B run on a Mac M4?: A user asked if it was possible to run the Qwen3-Next-80B-A3B model on a Mac M4 Max with 36GB of RAM.
- Another user responded that with RAM offloading, maybe, (idk how much RAM Macâs have) but itâd be really slow if it does work.
- Dual 3080s Remain Relevant on Budget: One user suggested that dual 3080 20GBs are a good option if you need VRAM and are on a budget, though they are harder to get a hold of and draw more power.
- Another user chimed in to point out that theyâre literally all over eBay, though availability depends on region.
- CPU Boost or Bust?: A user asked if upgrading from a 7800x3d to a 9950x3d would make a significant difference in LLM performance, considering they have a 5090 32GB GPU and 96GB of DDR5 RAM.
- Another user suggested that you get a small improvement. But no matter what cpu you are using.. its gonna be slow.
- M4 Macs Inference: A user asked how well these little things work for inference and shared a link to a potentially relevant eBay listing.
- Another user responded, Badly. Like half a 5060ti iirc.
OpenAI â· #ai-discussions (391 messagesđ„đ„):
Grok for animating photos, ChatGPT iOS shopping research, Physical Limits of Robots, GPT-4o/5.1 Bedside Manners, Hallucination by Design
- Grok is fun for animating photos: A member uses Grok to animate photos and noted that it generates before you even prompt it, linking to drinkoblog.weebly.com.
- Is ChatGPTâs 18+ version out yet?: Members discussed the possibility of an 18+ version of ChatGPT, one said news articles mention it coming in December, however another member said that there is no adult mode.
- OpenAI Staff Issues Alert Level Red: The Information reported that Sam Altman issued an alert level red memo to staff due to Google getting far ahead, and members wanted to see better models from OpenAI.
- OpenAI Contemplates Ads, Displeasing Users: Members expressed concerns over OpenAI potentially introducing ads into its paid product, fearing it would be a scary move as competitors remain ad-free.
- Agent Recalls Data From Deleted Sessions: A member asked if it was normal for an AI to recall something from a deleted chat session and, if that memory is not also in the saved memory, another member responded that itâs usually not âmemory,â but a pattern echo from previous sessions.
OpenAI â· #prompt-engineering (4 messages):
Anime Opening Generation, Custom Bot Creation, Antigravity AI IDE, GPT-OSS 120B
- Anime Opening Template Surfaces: A member shared a cinematic anime-style template to help create anime openings, including sections for vocal/genre/tone, world behavior, location setup, and camera intent.
- Antigravity AI IDE for Bot-Building: A member suggested using Antigravity by Google to create custom bots, even suggesting using GPT-OSS 120B.
- The user states the AI IDE can act as a custom chatbot on your desktop or help you build a real bot from scratch through prompts, linking to a screenshot of the UI.
OpenAI â· #api-discussions (4 messages):
AI Anime Opening Template, Custom Bot Tutorial, Antigravity by Google, GPT-OSS 120B Model
- AI Anime Opening Template Drops: A member shared a detailed template for creating anime-style cinematic openings, outlining specifications for vocal character, genre blend, animation style, and world behavior.
- DIY Bot Tutorial Search Begins: A member asked if anyone had a tutorial on how to make their own custom bot, prompting discussion on available tools and resources.
- Googleâs Antigravity Mentioned for Bot Creation: A member suggested using Antigravity by Google, an AI IDE that can either act as a custom chatbot or help build a real bot from scratch through prompts.
- GPT-OSS 120B for Bot Development: A member highlighted the possibility of using GPT-OSS 120B with Antigravity for bot development, showcasing the modelâs potential in custom chatbot creation.
- A member attached an image relating to this topic - located here.
Cursor Community â· #general (393 messagesđ„đ„):
Cursor Pro+ Worth, Model Validation, Cursor Sub Agents Orchestration, Cursor on Auto Mode unlimited, Platform sidebars changed
- Are Cursor Pro+ subscription Worthy?: Some members discussed about if the Pro+ subscription is worth it, or if adding more credits is better, while another member confirmed itâs worth it and Iâm learning so much so def feels worth it.
- One user shared the good news, Ended up grabbing it :).
- The Sub-Agent Saga: Users are sharing a lot of excitement and ideas about building some Cursor sub agents orchestration workaround, and some are questioning why Cursor doesnât implement sub agents.
- A member mentioned: They are good in principle but hard in execution! Weâd only do subagents in a world where they worked really seamlessly.
- Pro Plan Usage Details: Users discussed the limits and pricing for the Pro plan, noting that it typically includes $20 worth of API agent usage per month, with Auto mode using part of that allowance.
- It was pointed out that legacy pricing offered unlimited Auto mode until September 2026, but Composer might still use the monthly $20 usage.
- DeepSeekâs Deep Trouble: A user reported that DeepSeek isnât working on Cursor and it wont create any files.
- There was no solution proposed.
- The Composer Craze!: Users expressed their admiration for Composer, noting its speed and effectiveness with code-related tasks and debugging, with one user stating: Everything is so clean with composer.
- The discussion extended to the possibility of a Composer-2 version, with a member teasing: Thereâs always a plan.
OpenRouter â· #announcements (5 messages):
Arcee Trinity Mini, Deepseek V3.2, Distillable Models, Activity Exports, API Keys with Expiration
- Arcee Trinity Mini Model Released!: Arcee released the Trinity Mini model, the middle tier in their new Trinity family, trained entirely in the US, with a free variant available.
- DeepSeek V3.2 Debuts with Tool-Calling: DeepSeek V3.2 is live, featuring improved reasoning, agentic behavior, and full tool-calling support; the V3.2 Speciale variant excels at math and reasoning, rivaling Gemini 3 Pro (read more).
- Distillable Models Launch for Fine-Tuning: A collection of distillable models are available, enabling synthetic data generation for fine-tuning pipelines; users can explore the models here.
- Activity Exports Introduced for Usage Data: Users can now export their organizationâs usage data from the activity dashboard in CSV or PDF format (access here).
- API Keys with Expiration Enhance Security: Temporary API keys with custom expiration dates are now available, suitable for time-limited projects or enhanced security rotation (manage keys here).
OpenRouter â· #general (362 messagesđ„đ„):
DeepSeek Rate Limiting, Internal Server Errors, Gemini 3 Pro Issues, OpenRouter GDPR compliance, Nano Banana Pro issues
- DeepSeek Rate Limiting causes confusion: Some users experienced rate limiting errors with DeepSeek v3.2 even while using their own API keys, leading to confusion about whether OpenRouter was using their keys correctly.
- The error message suggested that OpenRouter was attempting to use its own key instead of the userâs, despite the user having a paid DeepInfra key and not exceeding any free BYOK limits.
- Internal Server Error plagues users: Multiple users reported continuous âInternal Server Errorsâ (Error Code 500) when using models like DeepSeek 3.1 and Gemini 3 Pro.
- It was suggested that the errors might be due to overloaded hardware, issues with OpenRouter, or problems with web search plugins, with some users finding temporary fixes by disabling web access.
- Nano Banana Pro Resolution woes: Users struggled to set the resolution parameter (1k, 2k, 4k) for image generation using Nano Banana Pro on OpenRouter, as the feature is not currently supported.
- There is a lack of documentation compared to platforms like Replicate/Fal.ai, leading to frustration, although thereâs hope that support for this feature is in development.
- Atlas Cloud sputters out slop: Users reported receiving low-quality responses and XML-formatted tool calls from Atlas Cloud, leading to calls for its removal from OpenRouter.
- One user noted âAtlas Cloud just served me an entire response enclosed in deep thinking tagsâ, highlighting the poor quality of the providerâs output.
- MPU v2 Coming Soon: A user mentioned that MPU v2 is coming in April, with claims of 5.3x performance of TPU v7 and 60% less expensive.
- There has been no formal announcement from OpenRouter themselves.
OpenRouter â· #new-models (5 messages):
â
- No New Models News: There were no new models or significant discussions about models in the provided messages.
- Silence on the New Models Front: The channel activity consisted only of repeated headers indicating the channel name, with no substantive content related to new models.
OpenRouter â· #discussion (8 messagesđ„):
Microwave Model, Chatty Frustrations, Model Competition
- Microwave Model Stealthily Appears: A new stealth model, creatively named âmicrowaveâ, has been spotted, linked from Reiss Bakerâs X post.
- Chatty Model Elicits User Frustration: Users are expressing frustration with a certain chatty model, claiming it asks an unreasonable amount of follow-up questions for basic tasks just to waste free messages, inspired by this Cline tweet.
- Competition Hopes Sparked by New Models: The emergence of new models sparks hope for more competition, addressing the userâs weariness with existing options; the genesis of the conversation started from this post.
GPU MODE â· #general (2 messages):
Inference Providers Profitability
- Inference Providers Rake in the Dough: Inference providers can be profitable for companies even if they werenât the original creators.
- This is because inference providers can quickly set up and go, leveraging existing models for profit.
- Lazy Inferences: The ease of setting up and profiting from existing models reduces the barrier to entry for new inference providers.
- This allows them to quickly capitalize on the growing demand for AI inference services.
GPU MODE â· #triton-gluon (8 messagesđ„):
Triton Profiling, Data Parameter Issue, Version Compatibility
- Debugging Triton Profiling Functionality: A user encountered an issue while trying to pass
data=traceas specified in the Triton documentation but received an error indicating that thedataparameter was unavailable.- A developer suggested ensuring the correct Triton version is being used, as the functionality should work with the
mainbranch, and pointed to a relevant test case.
- A developer suggested ensuring the correct Triton version is being used, as the functionality should work with the
- Version Conflict Causes Profiling Error: A potential cause of the issue was identified as having both pytorch-triton and triton installed, which can lead to conflicts.
- The user confirmed they were able to resolve the issue, indicating a successful outcome.
GPU MODE â· #cuda (5 messages):
Sequential Consistency, __syncwarp(), Race Conditions, syncthreads vs syncwarp, Memory Model
- Dive into Sequential Consistency: Members discussed the meaning of sequential consistency in the context of
__syncwarp()and its role in safe communication between lanes through memory.- One member initially misunderstood the fence but later acknowledged that the documentation implies safety without specifying the type of fence used and that
__syncwarp()âs purpose is to facilitate communication between lanes.
- One member initially misunderstood the fence but later acknowledged that the documentation implies safety without specifying the type of fence used and that
__syncwarp()Prevents Race Conditions: It was confirmed that a correct use of__syncwarp()would prevent race conditions, especially when dealing with a single warp.- The discussion also highlighted that for cases involving more than one warp,
syncthreadsmight be a more appropriate choice.
- The discussion also highlighted that for cases involving more than one warp,
syncwarpMisuse Alert!: It was pointed out that usingsyncwarpBEFORE a legacy warp level primitive (likeballotsync) is an incorrect usage that causes issues.- This scenario is distinct from shared memory examples and can lead to significant headaches in practice.
- C++ Memory Model Explained: One member clarified that sequential consistency, as referenced in the C++ memory model, provides acquire semantics to loads and release semantics to stores, establishing a single total order relative to other sequentially consistent operations.
- This clarification helped resolve confusion about whether implicit warp synchronous behavior could be relied upon.
GPU MODE â· #torch (4 messages):
PyTorch 2.9.1, cu128, conv3D, cudnn, PyTorch issue #166643
- Torchâs conv3D Crawls: PyTorch 2.9.1 Blamed!: Users report that
conv3Dis extremely slow in PyTorch 2.9.1+cu128, with or without cuDNN enabled, whereas it functions correctly in version 2.8.0+cu128.- A member pointed to PyTorch issue #166643 and suggested installing a newer cuDNN from PyPI as a workaround.
- CuDNN Saves the Day: A user reported that
conv3Dis extremely slow in PyTorch 2.9.1+cu128.- A member suggested installing a newer cuDNN from PyPI as a workaround.
GPU MODE â· #off-topic (2 messages):
Eleuther AI Publishing, MLSys career mentorship programs, ML4Health career mentorship program
- Eleuther AI offers Publishing Help: Eleuther AI has a Publishing help channel with some focus on endorsements.
- MLSys Career Mentorship Sought: A member inquired about career mentorship programs in MLSys conferences after participating in ML4Healthâs program.
GPU MODE â· #irl-meetup (2 messages):
Quartet, Arxiv Papers, Meetup Attendees
- Quartet Paper Author Spotted at IRL Meetup!: A member noted the presence of colleagues, including Andrei, a main author of the Quartet paper.
- The quartet paper discusses novel methods for tensor decomposition in high-dimensional data analysis.
- Colleagues Gathering at IRL Meetup: A member mentioned that several colleagues were attending the irl-meetup.
- This highlights the importance of in-person gatherings for collaboration and networking within the AI community.
GPU MODE â· #rocm (2 messages):
AMD Max Pro 395, enterprise/ai dc grade GPUs, GPU discounts, ROCm support, AI performance
- Discounted AMD Max Pro 395 Series Performance Questioned: A member inquired about the performance of the AMD Max Pro 395 series cards compared to more serious enterprise/AI DC grade GPUs.
- They noted a fairly ridiculous discount currently available for this GPU.
- ROCm support and AI performance on Max Pro 395 discussed: The discussion aims to understand whether the Max Pro 395 series can be effectively utilized with ROCm for AI workloads, similar to enterprise-grade GPUs.
- Community members are sharing insights and experiences on leveraging consumer-grade GPUs for professional tasks.
GPU MODE â· #self-promotion (4 messages):
Profiling Pytorch Kernels, nCompass Extension, Warpgbm and PackBoost, Qwen3-Omni-30B-A3B-Instruct
- Profiling Pytorch Kernels can be challenging!: A member stated that profiling Pytorch kernels is always challenging and is willing to give it a try.
- Another member replied to let them know if they run into any issues and that theyâre on OpenVSX and Marketplace as nCompass extension.
- Warpgbm and PackBoost launch on Github: A member introduces himself as co-creator of warpgbm and creator of packboost and shares github links: warpgbm and PackBoost.
- They also mention working on W4A16 AWQ quantization from scratch.
- Qwen3-Omni-30B-A3B-Instruct deployed for inference: A member shares a linkedin post about deploying Qwen3-Omni-30B-A3B-Instruct for fast S2S inference.
- They also share a link to try out the playground.
GPU MODE â· #reasoning-gym (1 messages):
Reasoning-gym generators, Generative MMLU
- Reasoning-Gym Gains New Follower: A new member arrived from the reasoning-gym github, praising the generators for creating a generative benchmark.
- They expressed interest in efforts mirroring reasoning-gymâs generative philosophy for general knowledge tasks, akin to a generative MMLU that can sample new questions with varying difficulties.
- Inquiry About Generative Knowledge Task Projects: The new member inquired about projects with a similar generative philosophy to Reasoning-Gym, but focused on creating general knowledge tasks that resemble a generative version of the MMLU benchmark.
- This generative approach would ideally allow sampling of new questions with varying difficulty levels, facilitating more dynamic and comprehensive assessments.
GPU MODE â· #submissions (67 messagesđ„đ„):
NVIDIA leaderboard submissions, nvfp4_gemm leaderboard
- NVIDIA Leaderboard Domination!: Multiple users submitted numerous entries to the
nvfp4_gemmleaderboard on NVIDIA, achieving personal bests and successful runs, such as 13.3 ”s and 13.4 ”s timings repeatedly. - Submissions Surge on NVIDIAâs nvfp4_gemm: Several users, including <@1027279965974175816>, <@692395064814600222>, and <@475848724086784013>, actively submitted to the
nvfp4_gemmleaderboard, marking personal bests and successful NVIDIA runs.
GPU MODE â· #factorio-learning-env (1 messages):
Speaker Identification, Thumbnail Generation
- Request for Speaker List: The user has requested a comprehensive list of speaker names and headshots.
- Thumbnail Creation: The user needs the speaker information for creating thumbnails.
GPU MODE â· #cutlass (5 messages):
GEMM in CUDA, Shared memory access patterns, MMA Layouts
- Discussing GEMM memory access patterns: A member inquired about loading data from global memory (gmem) to shared memory (smem) using vectorized loads in a double-precision GEMM (dgemm) scenario, particularly when the memory layout isnât compatible with the MMA operation.
- A clarifying point was raised about the degrees of freedom in MMA layouts: reordering columns of A and rows of B yields the same sum of products, albeit with slight numerical differences.
- Shared Memory Access with Strided Patterns: Another member clarified that the main concern lies in shared memory access, whether through ldmatrix or vectorized loads, both of which access the shared memory matrix in a strided pattern.
- This sidesteps the initial concern about contiguous loads from gmem to smem conflicting with MMA requirements.
GPU MODE â· #teenygrad (3 messages):
GitHub repo teenygrad, organization of teenygrad
- New GitHub Repo teenygrad Forked: A member stated they are not making an organization, renamed their repo at https://github.com/j4orz/teenygrad/ following https://github.com/tinygrad/teenygrad, which is currently dated.
- Commits Update Readme: A member noted updates to the readme at commit 0551846.
- Commits Disambiguate Concerns: The group disambiguated the concerns between surfaces ._forward (._applyuop) and ._evaluate (.realize) at commit c2a6ab4.
- Commits Add Documentation: The group added documentation to irparser at commit c7ccba5.
- Commits Rename Required Methods: The group renamed required methods on compute and movement mixins at commit 4f21ed1.
GPU MODE â· #general (2 messages):
Nvidia Competition, Submission Clarification
- Nvidia Competition Submission Questioned: A new participant in the Nvidia competition expressed confusion about what to submit.
- The user reported receiving an error when attempting to submit the reference implementation and requested clarification on the submission process.
- Further Clarification Needed on Nvidia Submissions: Following an initial query, additional details are needed regarding the specifics of the Nvidia competition submissions.
- Details such as acceptable file formats, evaluation metrics, and any constraints on code modifications would be helpful.
GPU MODE â· #multi-gpu (1 messages):
pynvshmem, nvshmem4py, typo in documentation
pynvshmemUsage Questioned: A user sought clarification regarding the presence ofpynvshmemin the Triton distributed example documentation.- The user posited that a typographical error might exist, given the observed utilization of
nvshmem4pywithin the repositoryâs examples.
- The user posited that a typographical error might exist, given the observed utilization of
nvshmem4pyas potential correction: The user proposed thatnvshmem4pymay be the correct term, instead ofpynvshmem.- This suggestion was based on the actual usage in the repositoryâs code examples.
GPU MODE â· #low-bit-training (2 messages):
Arxiv Paper, Talk Invitation
- Arxiv Paper Shared: A member shared a link to an Arxiv paper.
- The specific details and title of the paper were not discussed, but the link was provided for informational purposes.
- Speaker Sought for Talk: A member invited another member to give a talk.
- The specific topic or venue of the talk was not mentioned, but it seems to be an open invitation.
GPU MODE â· #llmq (1 messages):
Activation Offloading, fp8 Adam, Loss Masking, Pyllmq on PyPi
- LLMQ now Offloads Residual Activations: A member implemented offloading for residual activations and a bunch of tricks for further saving on activation memory.
- They also added better handling of offloaded optimizer states and initial support for fp8 representation for Adam first-order momentum as well as loss masking support.
- LLMQ allows 7B Training on 16GB Cards: A member made it possible to pre-train/fine-tune even a 7B model on a 16GB card with caveats.
- The amount of offloading required means you need to have at least 64GB of CPU-side ram.
- LLMQ scales to 32B model on 4x4090: A member says scaling up, training/fine-tuning a 32B model is possible on a 4x4090 server at about 3k tok/s (48% MFU).
- This requires > 200GB of pinned host memory for all the offloading.
- Pyllmq is available on PyPi: A member published the python wrapper on PyPi.
- To try it out, run
pip install pyllmq; pyllmq-tokenize --model qwen --dataset tiny-stories; pyllmq-trainwhich should start fine-tuning Qwen2.5-0.5B on tiny-stories.
- To try it out, run
GPU MODE â· #helion (1 messages):
Helion Parallel Reduction, Weight Gradients Computation, HL.reduce Usage
- Helionâs Parallel Reduction Patterns: Guidance is requested on the recommended pattern in Helion for parallel, non-atomic reductions when computing weight gradients (i.e., large batch sums).
- The member seeks advice, particularly when the reduction axis is much larger than channel dimensions, asking whether to use hl.reduce, a two-pass partials+sum approach, or an official cooperative reduction idiom the compiler handles well.
- Block Sizing Guidance: The user inquires about guidance on block sizing or nesting limits for this use case.
- They are working on large batch sums and weight gradients.
GPU MODE â· #nvidia-competition (111 messagesđ„đ„):
Nvidia Competition T&C, eval_better_bench.py Overhead, Python loop queuing kernel calls, Inconsistency with Runners, GPU mode Terminal User Interface
- Nvidia Competitionâs Grand Prize Twist!: The Grand Prize winner must also be the top performer in at least one kernel challenge, but there is speculation regarding how the prize will be awarded if the user with the highest weighted score does not rank first in any of the 4 kernels.
- One member said âthey have altered the
dealrules, pray they do not alter it furtherâ.
- One member said âthey have altered the
- âeval_better_bench.pyâ has significantly lower overhead: The
eval_better_bench.pyscript shows significantly lower overhead compared to the originaleval.py, with tests showing a reduction from 18.0us to 14.8us for a Q1 kernel.- However, it was also noted that the overhead on the bots may be higher, as Q1 kernels were previously observed to be 2-3us slower on the bots.
- CPU Queue Bottleneck?: Members discussed whether the CPUâs Python loop for queuing kernel calls can keep up with fast GPU work, potentially causing a bottleneck, especially with the clear_l2_cache kernel.
- It was noted that the test CPU (AMD EPYC 9575F) is significantly faster than the eval runners, suggesting the issue might be more pronounced on the competition hardware.
- Runner Benchmarking Inconsistencies: Members reported inconsistencies in benchmark timings, with the same kernel showing significantly different results (~11us on the leaderboard vs ~36us in the benchmark script).
- One member stated that âI also think that the measuring/benchmark update changed something, i have discrepancies that are consequently different for the same code from before that canât be explained by hitting the slow runner, when hitting the slow runner its much more obviousâ.
- Popcorn-CLI Terminal User Interface Forked: A member created a fork of popcorn-cli allowing a
--no-tuiflag that removes the Terminal User Interface and extra code to output thestdoutofprint()statements.- This was designed to help with debugging, enabling better feedback loops with LLMs. Also a PR was made.
GPU MODE â· #robotics-vla (6 messages):
RL with Parkinson Symptoms, BEAST Tokenizer, stack_blocks success
- Stack Blocks achieves almost 100% success: First red block pickup is now at almost 100% success, but occasionally moves up and down due to state and image conflicts, as shown in this video.
- Adding history may resolve these behaviors.
- Parkinson Symptoms in Success: A member reports another success case with Parkinson symptoms, visible in this video, suggesting that with > 5% success, Reinforcement Learning (RL) could be possible.
- BEAST B-Spline Tokenizer Demo: A member shared a link to the BEAST paper, located at https://arxiv.org/abs/2506.06072, and also shared a link to a demo notebook using a B-Spline Tokenizer: https://github.com/open-thought/qwen3-vla/blob/main/bspline_tokenizer/notebooks/tokenizer_demo.ipynb.
Latent Space â· #ai-general-chat (203 messagesđ„đ„):
Edwin Arbus joins Cursor, Arcee AI Debuts Trinity, Apple AI Power Shift, OpenAI Launches Alignment Research Blog, Jeanne DeWitt Grosserâs 10 AI-GTM lessons
- Edwin Arbus Becomes Cursorâs New Sock-cess Story: Edwin Arbus announced his move to Cursor via a humorous video featuring branded socks and deodorant, prompting congratulations and memes from the tech Twitter community, as seen in this X post.
- The announcement video went viral, with many praising the creative and lighthearted approach to announcing a new job.
- Arcee AI Launches Trinity MoE Models for All: Arcee AI partnered with Allen AI to launch Trinity Nano (6B-A1B) and Trinity Mini (26B-A3B MoE) models, open-weights Apache 2.0, 128k context, trained on 10T tokens with 512 H200 GPUs, optimized for agents & function calling, as announced here.
- The community praised the Apache 2.0 license and the efficient inference capabilities.
- OpenAI Opens Up on Alignment Research: OpenAI debuted Alignment Research, a new technical blog for publishing rigorous but lightweight posts from teams company-wide on AI alignment and safety, as mentioned here.
- The blog features two inaugural posts (SAE latent attribution & scaling code verification) and invites community feedback, with Jasmine Wang promoting the effort at NeurIPS.
- Anthropic Bun-dles Up New Acquisition: Anthropic acquired Bun, as annouced in this blogpost, with discussions focusing on Bunâs future, potential integration into Anthropicâs stack, and whether it signals a strategic shift.
- Investors speculated on the acquisition cost, with estimates around $5-10 million, and the potential for a 2-3x return.
- Mistral Blows Minds with Mistral 3 Release: Mistral AI launched the open-source Apache 2.0 Mistral 3 model family, spanning 3Bâ675B parameters, including Ministral 3 (3B/8B/14B) and the frontier-class Mistral Large 3 MoE, all with vision, tool-use, and fine-tuning support, as announced here.
- The community discussed the pricing structure, with some noting that Mistral Medium is more expensive than Large, raising questions about its utility and the absence of tool-use benchmarks.
Latent Space â· #genmedia-creative-ai (10 messagesđ„):
Apple videogen paper, AI-generated Zootopia-style game footage, Gradium $70M Seed
- Apple Unveils Videogen Paper: Apple released a videogen paper detailing their new video generation model.
- The release sparked discussion and excitement within the community.
- Zootopia Game Footage Goes Viral: AI-created Zootopia game footage went viral, garnering over 8.9M views and sparking excitement about potential games and TV series.
- The footage was created using Nano Banana Pro, Kling, and Topaz, and the creator faced pushback against hate and copyright threats.
- Gradium Raises $70M Seed in Stealth: Paris-based Gradium exits stealth with a $70M seed led by FirstMark & Eurazeo, launching production-ready transcription & synthesis APIs after just 3 months of work.
- The companyâs products natively support English, French, Spanish, Portuguese and German, with a team including former Meta, Google and Kyutai voice-research heavyweights.
Nous Research AI â· #general (92 messagesđ„đ„):
Mistral Large 3 Size and Architecture, Mistral Medium Specs and Leaks, Arcee Trinity Models, Claude's Soul Document, DeepSeek V3.2 Performance
- Mistral Large 3: A Beastly MoE Model Arrives: Mistral Large 3 is reportedly a 675B MoE model, similar in size to the Deepseek V3 series, and all new Mistral models will have vision capabilities.
- The closed source Mistral Medium is speculated to be around 100-200B MoE.
- NVIDIA Leaks Dense Mistral Medium 3 Specs: Mistral Medium 3 is canonically dense according to NVIDIA, and itâs likely a 70B model, consistent with previous Medium versions.
- There was a comment that a Mistral Medium model was leaked a year ago.
- Arcee AI Unveils Trinity Models: Arcee AI released its Trinity models, which look very strong according to initial benchmarks.
- A member pointed out that the Mini version has problems with multi-turn conversations because it only reasons for the first one (tweet).
- Claudeâs Soul Document Confirmed!: Anthropic officially confirmed Claudeâs âsoul documentâ, raising questions about how it was used in training.
- A member shared a link to a relevant Twitter thread and a YouTube video where Ilya says that DeepMind was right.
- DeepSeek V3.2 Performance: DeepSeek V3.2 Speciale is performing well, and kinda leading in reasoning benchmarks.
- It was described as not doing too bad.
Nous Research AI â· #ask-about-llms (13 messagesđ„):
Image/Video LLMs, GPT-OSS, Hermes Finetune, MLX-LM, Gherkin Scenarios
- Image/Video LLMs on Nousâ Horizon?: A member inquired whether Nous plans to add an image or video LLM in the future.
- While there was no direct answer, the question sparked a discussion about other models and finetuning strategies.
- GPT-OSS Disinterest Expressed: A member asked if a Hermes finetune was planned on GPT-OSS:20B, praising its speed with MLX-LM.
- One of the Nous members stated âNo we dont like gpt oss And it doesnt have a base model to work withâ.
- GPT-OSS Finetune on Gherkin?: Despite not liking GPT-OSS, a member acknowledged it performed well producing Gherkin scenarios and they are trying a finetune.
- The member further inquired about fundamental weaknesses of GPT-OSS, referencing its short reasoning chain as highlighted in the Measuring Thinking Efficiency in Reasoning Models report.
- Hallucinations Galore!: Despite its strengths, members believe that GPT-OSS hallucinates like crazy.
- The primary reason for dismissal is that âItâs not got a base model so its generally out of the mix for us entirelyâ.
Moonshot AI (Kimi K-2) â· #general-chat (51 messagesđ„):
Kimi Black Friday personality, Deepseek V3.2 problems, Kimi Coding API issues, Roo Code issues, Kimi K2 Thinking in app
- Members seek Black Friday Kimi Chatbot personality: Members discussed recreating the personality from the Black Friday Kimi chatbot in other chats, but were informed that the system prompt isnât available.
- Suggestions included asking the Black Friday chat itself âhow can I make kimi from a new chat sound like you?â, but it appears to be censored.
- DeepSeek V3.2 facing criticisms: Members criticized DeepSeek V3.2 for tool use, with claims that it hallucinates a ton, and overall outputs is just sloppy af.
- Despite the criticisms, some find DeepSeek to be very good at instruction following and very intelligent, but suffers from low TPS.
- Kimi Moderato API key fails on Cline: A user is facing problems with the Kimi Moderato plan not working with the Cline API.
- The Kimi for coding API key can only be used in Kimi CLI, Claude Code, and Roo Code according to the Kimi documentation.
- Kimi K2 Thinking toggles off in App: Users requested that Kimi K2 Thinking stay enabled by default in the app, rather than needing to be toggled on each time.
- It causes annoyance when it resets back to the default.
- Roo Code context grows disproportionately: A user reported that context grows disproportionately when using Roo Code and the condense function doubles the size.
- The reporter was advised to submit a bug report and use the Kimi CLI instead.
HuggingFace â· #general (21 messagesđ„):
Hugging Face Pro payment issues, PPOTrainer with accelerate and bf16 errors, Tokenizer type is bool after model name change, DPO as RL technique, ACE framework for agents learning from mistakes
- Subscribing to Hugging Face Pro proves problematic: A user reported that they were stuck on âPreparing payment, please waitâ when trying to subscribe to Hugging Face Pro.
- Another member suggested contacting Hugging Face at [email protected] for payment-related issues.
- PPOTrainer Problems plague Parallel Processing: A user encountered a
TypeErrorrelated to mismatched tensor types while using PPOTrainer with two A10 GPUs and DeepSpeed for distributed training withbf16precision.- One member suggested that the issue might stem from incorrect GPU initialization, leading to a single-GPU gather operation instead of an all-gather.
- Tokenizer transmogrifies to Bool After Model Name Change!: A user reported that their
tokenizerbecame a<class 'bool'>after changing the model name inAutoTokenizer.from_pretrained, when usingunsloth/Meta-Llama-3.1-8B-bnb-4bit.- Another member suggested removing the
use_fast=Falseparameter as a workaround, though the exact purpose of the parameter remained unclear.
- Another member suggested removing the
- DPO debated: RL or Not RL?: Members on the HuggingFace discord debated whether Direct Preference Optimization (DPO) is an RL technique or not, as some papers claim it is while the original paper refutes this.
- One member quipped that rl creates dpo dataset-lie data.
- ACE Framework empowers Agents to Eradicate Errors: A member shared their open-source implementation of Stanfordâs ACE framework, enabling agents to learn from their mistakes.
- The framework curates strategies into a playbook after reflection and has shown improved success rates and step reduction in browser automation, and the author is looking for feedback.
HuggingFace â· #i-made-this (2 messages):
FFMPEG radio station, Open source AI music models
- Vibe Coded FFMPEG Radio Station Launches: A member launched a vibe coded FFMPEG radio station, where everything you see and hear in this stream is one giant FFMPEG chain on YouTube.
- Open Source AI models in radio station: The radio stationâs audio was made in full collaboration with open source AI music models inside the DAW.
HuggingFace â· #computer-vision (1 messages):
Computer Vision API library, Robotics and automation models, Developer-facing API feedback
- New CV API Library brews for Robotics: A robotics startup is prepping the release of a developer-facing Computer Vision API library with pretrained and finetunable models for robotics and automation.
- It includes features like 6D object pose estimation, 2D/3D object detection, instance & semantic segmentation, anomaly detection, point cloud processing, model training, fine-tuning endpoints, and deployment-ready inference APIs.
- API Library eyes Community Validation: The primary goal is to simplify the prototyping and deployment of production-grade perception pipelines for CV/robotics engineers.
- The startup seeks community feedback to validate the usefulness of the library and iterate before a wider release, offering early access to those interested.
HuggingFace â· #smol-course (15 messagesđ„):
Course Unit Updates, Model Evaluation, Unit Certifications, Final Project Clarification
- New Course Units Delayed?: Members noticed that new course units havenât been published in a few months, specifically noting the evaluation unit is not yet available, as shown in this image.
- Unit 4 Focuses on Model Evaluation: The fourth unit will cover model evaluation, raising questions about changes to the course deadline, as discussed here.
- Earning Unit Certifications Early?: Members confirmed that you can get the certificate of achievement for each unit by completing the quizzes, but the course isnât finished yet.
- Project is Unit 1âs Finale: A screenshot of a final project was posted (image), confirming that it is the final project for Unit 1.
HuggingFace â· #agents-course (3 messages):
AI Agents Course, Synthetic Data Unit, Order Following in AI Systems
- Synthetic Data Unit Anticipation Builds: A new participant in the AI Agents Course inquired about the release date for the upcoming âsynthetic dataâ unit.
- The participantâs eagerness underscores the communityâs interest in leveraging synthetic data for AI agent development.
- Inquiry Arises Regarding Order Following: An attached image prompts the question of whether a specific order is being followed correctly by the AI system.
- The image attachment suggests a concern about the systemâs adherence to predefined instructions or procedures.
Modular (Mojo đ„) â· #mojo (35 messagesđ„):
def keyword status, var keyword status, parallelize function safety, MutOrigin.external vs MutAnyOrigin for ffi
- Delaying
defKeyword Introduction: Members agreed that thedefkeyword should be put on hold until Mojo exhibits more Python-like behavior, suggesting it currently adds cognitive load without significant benefit.- There was consensus to potentially reintroduce
deflater, with the sentiment that its current implementation feels like premature optimization.
- There was consensus to potentially reintroduce
varkeyword insidefnin Question: The discussion focused on whethervarshould be required withinfn, with arguments presented for and against its mandatory use.- Those in favor argued it enhances code clarity and consistency, while opponents, particularly those with Python backgrounds, felt it reduces ergonomics and increases boilerplate, disrupting the cleanliness and ease of code restructuring.
parallelizeUnsafe Due To Data Races: A user reported data races when using theparallelizefunction, expecting compile-time errors similar to Rust, but found the code compiled and produced inconsistent results.- A core team member clarified that Mojoâs concurrency and thread safety model is a work in progress (
WIP), andparallelizeremains unsafe until details for sharing data between devices are finalized.
- A core team member clarified that Mojoâs concurrency and thread safety model is a work in progress (
MutOrigin.externalSegfaults During Mojo Python FFI: A user encountered segfaults when usingMutOrigin.externalas the return type for Mojo Python FFI, specifically with anav_packet_allocbinding, and foundMutAnyOriginto be a temporary workaround.- A core team member explained that
MutAnyOriginis used to maintain existing behavior temporarily and suggested the issue may involve lifetime extension, advising that ifpacketrequiresavcodecto stay alive, it should hold an origin fromavcodec.
- A core team member explained that
Eleuther â· #general (10 messagesđ„):
NUS PhD intro, AI + Web3 developer introduction, Getting Help Reading Research Papers, fast.ai as a beginner course
- NUS PhD Student Studying Mech Interp Arrives!: Yiming from Singapore, a 2nd year PhD student at NUS working on mechanistic interpretability and medical diagnostic model interpretability introduced themself to the channel.
- AI + Web3 Dev Seeks Collaboration: An AI + Web3 developer specializing in LLM development, RAG pipelines, autonomous agents, and Python/FastAPI backends introduced themselves and offered to collaborate on new AI ideas.
- New Member Seeks Guidance Reading Research Papers: A new member asked how to get help in reading research papers, and a member suggested to pick some papers on arxiv and read them.
- fast.ai Course Recommended to Beginners: One member asked if fast.ai course is for beginners and another member added that the only prerequisite is that you know how to code with a link to the fast.ai course.
Eleuther â· #research (3 messages):
Perplexity measurement, MMLU benchmark, topic datasets
- Evaluating Perplexity on Domain-Specific Topics: A member is exploring perplexity measurements of different models across domain-specific topics like computer science, history, and business ethics.
- Theyâre seeking standard datasets beyond MMLU, which is Q&A focused, and have started scraping Wikipedia pages but are open to established benchmarks.
- Sampling Pretraining Datasets for Topic Classification: A member suggested sampling from a pretraining dataset and classifying the topic using a small model for a fast and easy approach.
- This method would allow for efficient topic identification within existing datasets.
Eleuther â· #scaling-laws (10 messagesđ„):
Scaling Laws, Pretraining Dynamics, Decorrelated Performance, Nonlinear Metrics
- Delving Into Decorrelation Scaling Laws: Discussion revolves around the scaling laws paper and whether it implies just curve fitting versus predicting future scaled performance.
- The conversation pivots to various interpretations that result in power law laws arising, such as the nonlinear metrics explanation.
- Muon Guy Wrote That Paper?!: A member shared a link to openreview.net and mentioned it was written by the muon guy.
- Another link to a paper was shared, relating to scaling laws: arxiv.org/2304.01910.
- Intuition for Scaling Laws Explored: A participant asks for the intuition behind a paper leading to scaling laws.
- Another user suggests that performance on any test example becomes more and more decorrelated from others in the limit of model performance.
- Pretraining Power Law Dynamics: The pretraining power law would arise if you had no big stratum of easy samples.
- The emergent spike in the pretraining is not observed because each batch is more independent, with less shared easy, compared to training on a particular task.
Manus.im Discord â· #general (16 messagesđ„):
Manus Auth issues, Manus instability, Chat Mode adjustment, Gemini 3 Pro, AI-powered automation
- Manus Auth issues unresolved: A user reported issues with Manus Auth being disabled in project settings and unresolved tickets after requesting help via Manus, with Project ID dPK8UhWnJ9fTzjbpKfjJiF and domain auru.com.br.
- The user requires the Redirect URI https://auru.com.br/api/oauth/callback.
- Manus instability causes frustration: One user expressed frustration with Manus, citing code being wiped out between saving checkpoints, Git discrepancies, and general instability despite investing thousands of dollars and building SaaS platforms.
- They stated: âAgents arguing about what they see vs. what youâre LITERALLY looking at in PROD. Do not trust it.â
- Chat Mode adjustment in development: Manus team announced that the Chat Mode toggle is currently in development and will be available soon, after considering user feedback.
- Many users are requesting for the Chat Mode to come back soon!
- Demand for Gemini 3 Pro Model: A user asked what AI model Manus is currently using and requested to use Gemini 3 Pro.
- No response was given regarding this question.
- AI engineers specialize in Automation and Autonomous Agents: Some AI engineers introduced themselves: one focused on integrating AI-powered automation and predictive analytics using tools like Python, SQL, JavaScript, PyTorch, scikit-learn, LightGBM, and LangChain to deliver chatbots and recommendation engines.
- Another specializes in building autonomous AI agents and multi-agent systems using various tech stacks like JS/TS, Next.js / Vue, Go / Rust, Python, Langraph, AutoGen, ReAct, CrewAI, OpenAI, Claude, Hugging Face APIs.
Yannick Kilcher â· #general (8 messagesđ„):
Intern Recommendations, Learning Algorithms, Synthetic Data, Pug Resource, Docker and Kubernetes basics
- Searching for Wacky Intern Recommendations: A member asked for recommendations for a wacky intern.
- It was unclear if they were hiring or looking for a job.
- Experience with Learning Algorithms or Synthetic Data: A member inquired about experience with different learning algorithms or synthetic data.
- No responses were recorded.
- Resource Recommendation for Pug: A member asked where to find a resource to learn Pug.
- No responses or resources were shared.
- Docker and Kubernetes Basics: A member inquired about resources for Docker and Kubernetes basics.
- No responses or resources were shared.
Yannick Kilcher â· #paper-discussion (3 messages):
Kattention Module, TopKHot Autograd Function, HardTopKHotBCE Autograd Function
- Kattention Module Re-Tested: The Kattention module was re-tested and found to be working closely to expectations, utilizing sparse attention mechanisms.
- The code includes
nn.Linearlayers for attention and projection, along with aTopKHotfunction for sparse attention, crucial for scaling attention mechanisms.
- The code includes
- TopKHot Gradients Explored: A
TopKHotautograd function was implemented to select the top-k values, usingtorch.topkandscatter_for gradient computation.- The backward pass computes a
soft_targetbased on softmax weights of top-k values, and the gradient is derived asF.softmax(x, dim=-1) - soft_target.
- The backward pass computes a
- HardTopKHotBCE: BCE Approximation: A
HardTopKHotBCEautograd function, possibly cheaper to compute, was introduced.- The backward pass uses a hard target based on top-k indices and calculates the gradient as
F.sigmoid(x) - hard_target, approximating binary cross-entropy.
- The backward pass uses a hard target based on top-k indices and calculates the gradient as
Yannick Kilcher â· #ml-news (3 messages):
Mistral 3, Llama finetunes, wavefunction
- Mistral 3 is here!: Mistral AI released Mistral 3.
- If itâs good, it might replace Llama finetunes for some applications.
- Wavefunction video: A member linked to a wavefunction YouTube video.
- It is unclear what the video is about, but it may be relevant to AI.
DSPy â· #show-and-tell (1 messages):
justanotheratom: https://www.elicited.blog/posts/managing-tools-in-dspy
DSPy â· #general (4 messages):
Prompt Injection Defenses in DSPy, Security Measures, Training Dataset for Attack Mitigation, Partnership Proposal
- Prompt Injection Defenses Sought for DSPy: A member inquired about prompt injection defenses in DSPy, seeking community best practices given DSPyâs structure.
- Security at Prompting Layer: Limited: A member stated that there isnât much security you can get at the prompting layer, suggesting guardrails-type security measures, specific models, and model provider rejections.
- They also mention that for every âDo not do thisâ in the prompt, an attacker will likely find a way to trick the model.
- Mitigating attacks with Training Data: A member suggested that to guard against baseline attacks, include examples in the training dataset that use that attack and show what an appropriate response would be.
- Partnership Proposal: A member expressed interest in exploring a potential partnership with the DSPy project.
tinygrad (George Hotz) â· #general (2 messages):
Kernel Development Tools, Regression Test for Beam
- IDEs vs Terminal Editors Debate Starts: Members initiated a discussion on preferred tools for kernel development and iteration, posing the question of whether developers mainly use GUI IDEs like VS Code or Cursor, or terminal editors like Vim, Neovim, or Emacs.
- The discussion aims to gather insights into the communityâs preferences and workflows in kernel development.
- Beam Regression Test in Need of Fix: A member requested assistance with fixing and adding a regression test for
python3.14 test/test_tiny.py TestTiny.test_beam.- This indicates a need for contributions to ensure the stability and correctness of the beam functionality within the project.