a quiet NeurIPS.
AI News for 12/2/2025-12/3/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 7213 messages) for you. Estimated reading time saved (at 200wpm): 552 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
Lots of talk about OpenAI's Code Red response and Anthropic's IPO.
AI Twitter Recap
AI video and imaging: Kling 2.6 native audio, Kling O1 shot control, Runway Gen-4.5, Nano Banana Pro (Gemini)
- Kling 2.6 (native audio co-generation): Kling's new 2.6 model generates video and synchronized voice, SFX, and ambience in one pass, with creators reporting coherent lip-sync and motion and strong "audio-visual coordination." Broad partner rollout includes fal day-0 access with native audio (@fal), platform integrations at InVideo (@invideoOfficial), ElevenLabs (@elevenlabsio), Freepik (@freepik), and OpenArt (@openart_ai). Kling's official announcement highlights "coherent looking & sounding output" with a short film demo and promos (@Kling_ai). Tutorials and early tests from creators show improved shot variation and speed to final (@jerrod_lew, @TheoMediaAI).
- Kling O1 (shot control): O1 emphasizes framing, shot variety, and in-scene creative control for higher-level video composition (@CharaspowerAI).
- Runway Gen-4.5 (lighting): Runway's Gen-4.5 boosts visual fidelity and "auto-lighting" to match scene mood without complex prompts (Runway).
- Nano Banana Pro (Gemini 3): Google's new image model supports enhanced reasoning and compositing up to 14 images per prompt (Google, follow-up). Synthesia added one-click Nano Banana Pro generation in-product (@synthesiaIO), and Gemini surfaced 2K-resolution image outputs (@GeminiApp).
Open models, releases, and benchmarks
- DeepSeek V3.2 (open weights MoE, DSA): Artificial Analysis places V3.2 as the #2 open-weights "reasoning" model by their composite, with the same 671B total/37B active architecture as V3.2-Exp, now using DeepSeek Sparse Attention (long context) and priced at $0.28/$0.42 per 1M input/output tokens (90% cache discount). V3.2-Speciale (reasoning-only) uses far more tokens but currently lacks tool calling in the first-party API (@ArtificialAnlys; paper/repos: link 1, link 2). Community cautions against mixing "reasoning" and non-reasoning modes in head-to-head evals without normalizing by cost/tokens (@qtnx_, @eliebakouch).
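To make the pricing concrete, a minimal cost sketch in Python; it assumes the 90% discount applies to cache-hit input tokens, and the example token counts are hypothetical.

```python
# Hedged sketch: per-request cost at DeepSeek V3.2's listed rates of
# $0.28/$0.42 per 1M input/output tokens, assuming the 90% cache
# discount applies to cache-hit input tokens.
def v32_cost_usd(uncached_in: int, cached_in: int, out: int) -> float:
    return (uncached_in * 0.28 + cached_in * 0.28 * 0.10 + out * 0.42) / 1e6

# Hypothetical agent turn: 100k-token context with 80% cache hits, 4k output.
print(f"${v32_cost_usd(20_000, 80_000, 4_000):.4f}")  # about $0.0095
```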
- Mistral "Ministral 3" family (multimodal) and base models: Mistral released a multimodal family with a strong 14B variant; TRL recipes for SFT+GRPO available (@SergioPaniego). Practitioners praise base-model availability for custom post-training (@QuixiAI).
- Retrieval and code models: Alibaba's EvoQwen2.5-VL (3B/7B) outperforms NVIDIA on ViDoRe v2 as a visual document retriever with permissive licensing (@mervenoyann, hf links). Nous released Hermes 4.3 on ByteDance Seed 36B, trained via Distro on Psyche, matching or beating their centralized run and topping RefusalBench; weights on HF (@NousResearch, @Teknium).
- Community arena: LM Arena added INTELLECT-3 (106B MoE; GLM-4.5 Air base; Apache-2.0/MIT) for live head-to-heads across creative/math tasks (@arena).
Agents: building, evaluation, and inference infrastructure
- No-code to production: LangChain's LangSmith Agent Builder is being used for real workflows (research briefings, GitHub/Linear agents, Slack/Email assistants) from a simple prompt, with guidance on deep-agent evaluation patterns (single-step, full-turn, multi-turn, bespoke success criteria) and block-level cache control to reduce context costs (product, eval blog, cache control). Lindy's Agent Builder shows similar low-friction tool integration and memory (@omarsar0).
- Agent infra and performance: vLLM added Snowflake's model-free SuffixDecoding, showing wins over tuned n-gram speculation across concurrency levels (@vllm_project), shipped a Gaudi plugin aligned with upstream vLLM (release), and published a CUDA core-dump tracing guide for hanging kernels (engineering). Together AI partnered with Meta to bring high-performance RL to agentic systems via TorchForge (Together). LlamaIndex introduced Click-to-Deploy document workflows in LlamaCloud (Parse/Extract/Classify + hosted UI) (@llama_index, @jerryjliu0).
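For context on the n-gram baseline that SuffixDecoding is compared against: recent vLLM versions expose prompt-lookup (n-gram) speculation through `speculative_config`. A minimal sketch, assuming a recent vLLM (0.8+); the model and tuning values are placeholders, and SuffixDecoding itself may ship under a different method name.

```python
# Hedged sketch: enabling n-gram (prompt-lookup) speculative decoding in vLLM,
# the tuned baseline SuffixDecoding is reported to beat. Not the benchmark config.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model
    speculative_config={
        "method": "ngram",            # draft tokens matched from the prompt
        "num_speculative_tokens": 4,  # tokens proposed per step
        "prompt_lookup_max": 4,       # max n-gram length to match
    },
)
print(llm.generate(["The capital of France is"], SamplingParams(max_tokens=16)))
```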
- Standards and multi-agent semantics: DAIR.AI argues for an L8 "communication" vs L9 "semantic negotiation" stack for the Internet of Agents, with defenses against semantic injection/poisoning and signed contexts (paper + summary). Independent work quantifies multi-agent communication efficiency (IEI/SEI/TEI) and shows one-round comm with efficiency-augmented objectives can beat multi-round protocols (summary, paper).
- Coding agents (hands-on): New free course on agents that write and execute code safely in sandboxed microVMs/containers, with data-analyst and full-stack projects (Andrew Ng, @e2b).
Evals and methods: what to measure and how
- CORE-Bench "solved" with scaffold coupling: Using Claude Code with Opus 4.5 yields 95% on CORE-Bench (scientific reproducibility agents), versus 42% reported with a neutral CORE-Agent scaffold. Authors detail nine grading issues fixed, why high-capability models expose rubric edge-cases, and argue model-scaffold coupling now dominates outcomes; future evals should disclose scaffolds and consider per-model best scaffolds (@sayashk).
- OpenAI "Confessions" (honesty about rule-breaking): A GPT-5 Thinking variant is trained to output both an answer and a "confession" about compliance; confessions are rewarded for honesty even when acknowledging shortcuts. Across misbehavior-inducing evals, false negatives (non-compliance without confession) averaged 4.4% (thread, details, metric).
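To pin down the headline metric, a minimal sketch of how the false-negative rate would be computed over labeled episodes; the field names and toy data are hypothetical, not OpenAI's schema.

```python
# Hedged sketch: a "false negative" is an episode where the model broke the
# rules but its confession did not admit it. Field names are hypothetical.
episodes = [
    {"complied": False, "confessed": True},   # honest confession
    {"complied": False, "confessed": False},  # false negative (worst case)
    {"complied": True,  "confessed": False},  # clean run, nothing to confess
]
false_negatives = [e for e in episodes if not e["complied"] and not e["confessed"]]
print(f"false-negative rate: {len(false_negatives) / len(episodes):.1%}")
# OpenAI reports this averaging 4.4% across misbehavior-inducing evals.
```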
- Benchmarking at scale: Epoch AI proposes "stitching" many benchmarks to avoid saturation and place models on a single scale (@EpochAIResearch). Hugging Face released the LLM Evaluation Guidebook v2 (end-to-end basics to pitfalls; interactive) (@clefourrier). Researchers continue to warn against comparing "reasoning" vs non-reasoning models without normalizing for cost/tokens (@eliebakouch).
- Learning dynamics: "Quiet Feature Learning" shows transformers acquire task-critical internal features during flat loss plateaus that later "click" into output gains, motivating richer diagnostics than loss alone (summary + paper). TabPFN's Nature result continues to resonate: a tabular foundation model trained on 100M synthetic DAG datasets, doing train+predict in one forward pass and outperforming tuned tree methods in seconds (@burkov). METR's task-length measurements appear to generalize beyond SWE to automated proofs (@littmath).
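Since TabPFN's one-forward-pass behavior is easy to misread, a minimal sketch of its sklearn-style interface, assuming the `tabpfn` package is installed:

```python
# Hedged sketch: TabPFN exposes the usual fit/predict API, but fit() only
# stores the data; "training" happens inside the forward pass at predict time.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()
clf.fit(X_tr, y_tr)              # no gradient updates here
print(clf.score(X_te, y_te))     # a single forward pass conditions on X_tr/y_tr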
Systems and inference efficiency
- Apple MLX-LM gains: MLX-LM adds continuous batching in the server (demo: 4 simultaneous Qwen3-30B requests on M2 Ultra), building on prior batched generation work and steadily maturing the unified Apple MLX/CUDA story (demo, release).
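A quick way to exercise the new continuous batching is to fire concurrent requests at the server's OpenAI-compatible endpoint; a minimal sketch, assuming `mlx_lm.server` is running on its default port 8080 with a chat model loaded:

```python
# Hedged sketch: four simultaneous requests against a local mlx_lm.server;
# with continuous batching these are interleaved rather than queued serially.
import concurrent.futures
import requests

def ask(prompt: str) -> str:
    r = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"messages": [{"role": "user", "content": prompt}], "max_tokens": 64},
        timeout=120,
    )
    return r.json()["choices"][0]["message"]["content"]

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for reply in pool.map(ask, ["Name a prime.", "Haiku about GPUs.", "2+2?", "Hi!"]):
        print(reply)
```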
- Attention/parallel comms: ByteDance's async Ulysses attention is "deceptively simple," and with a faster all-to-all than NCCL, comms can overlap well with compute (@maharshii).
- vLLM engineering: CUDA core-dump tracing for deep inlining/async memory cases, moving beyond standard tools to pinpoint hanging kernels (@vllm_project).
- Search infra shift: Teams migrating vector workloads from Elasticsearch to Qdrant cite native vector indexing, hybrid dense+sparse, simpler scaling, and lower latency/cost. Practical deep-dive with migration steps and pitfalls (@qdrant_engine).
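For a sense of what "hybrid dense+sparse" means in Qdrant terms, a minimal sketch with the `qdrant-client` Python package (in-memory mode, toy vectors):

```python
# Hedged sketch: a Qdrant collection with named dense and sparse vectors,
# the setup migrating teams typically use for hybrid retrieval.
from qdrant_client import QdrantClient, models

client = QdrantClient(":memory:")  # swap for QdrantClient(url=...) in production
client.create_collection(
    collection_name="docs",
    vectors_config={"dense": models.VectorParams(size=4, distance=models.Distance.COSINE)},
    sparse_vectors_config={"sparse": models.SparseVectorParams()},
)
client.upsert(
    collection_name="docs",
    points=[models.PointStruct(
        id=1,
        vector={
            "dense": [0.1, 0.2, 0.3, 0.4],  # from your embedding model
            "sparse": models.SparseVector(indices=[7, 42], values=[0.5, 1.2]),
        },
        payload={"text": "example doc"},
    )],
)
```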
- Diffusion distillation: "Glance" speeds Qwen-Image/FLUX inference from ~50 steps to <10, with single-sample domain-specific distillation (@awinyimgprocess).
- Data plumbing: Hugging Face now lets you duplicate any dataset account-to-account in seconds via Xet (e.g., 1 TB in ~2s), enabling fork-filter-train loops without heavy transfers (@victormustar).
- On-device multimodal: Nexa's AutoNeural-VL-1.5B runs fully local on Qualcomm SA8295P NPUs (~100 ms latency, 768² vision) for in-car assistants (@nexa_ai).
Industry moves and platform updates
- Anthropic's scale-up: Reported investments of up to $10B (NVIDIA) and $5B (Microsoft), and a $30B compute purchase from Microsoft, placing Claude on all major clouds and implying a ~$350B valuation (@DeepLearningAI). Anthropic also announced a multi-year $200M Snowflake partnership (Anthropic) and a Dartmouth "Claude for Education" deployment (Anthropic). Claude Opus 4.5 is now selectable in Claude Code for Pro users (@claudeai).
- OpenAI grants: The OpenAI Foundation's People-First AI Fund named 208 nonprofits receiving $40.5M in unrestricted grants (@OpenAI).
- Waymo expansion: Waymo is now fully driverless (no safety driver) in additional cities, scaling >500% YoY, with a rapid Dallas ramp from safety-driver to driverless in ~4 months (@Waymo, @fchollet).
- Developer tools: Google launched Workspace Studio to build workflow agents quickly, targeting daily task automation across the suite (@GoogleWorkspace). Phind raised $10.4M and shifted to interactive "mini-app" answers (@ycombinator).
Top tweets (by engagement)
- Google Workspace Studio: one-click agent automation across Workspace (@GoogleWorkspace, 4.3k)
- OpenAI "Confessions": training models to admit rule-breaking and shortcutting (@OpenAI, 2.5k)
- TabPFN (Nature) explainer: synthetic tabular pretraining, forward-pass training+inference (@burkov, 2.6k)
- Kling 2.6 launch thread with native audio, promos, and short film (@Kling_ai, 1.7k)
- Anthropic investment/valuation roundup (@DeepLearningAI, 1.1k)
- Gemini app: 2K images from Nano Banana Pro (@GeminiApp, 1.1k)
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. DeepSeek V3.2 Model Advancements
- DeepSeek V3.2 Technical Report (Activity: 258): The image is the first page of the "DeepSeek V3.2 Technical Report," which outlines significant advancements in the DeepSeek V3.2 model. Key breakthroughs include the introduction of DeepSeek Sparse Attention (DSA), which reduces computational complexity while maintaining performance in long-context scenarios, and a scalable reinforcement learning framework that uses over 10% of pretraining compute. Additionally, the report highlights a large-scale agentic task synthesis pipeline and a unified reasoning and agentic RL training approach. The high-compute variant, DeepSeek-V3.2-Speciale, is noted for surpassing GPT-5 in reasoning and achieving top performance in international competitions. Some commenters express skepticism about the cost-effectiveness of DeepSeek V3.2, noting that while it is marketed as cheaper, other providers offer quantized models at similar prices but with lower quality. There is also a sentiment that the term "open" is being misused in the context of closed systems like OpenRouter.
- The discussion highlights a comparison between DeepSeek V3.2 and other providers on OpenRouter, focusing on pricing and model quality. It is noted that while DeepSeek offers competitive pricing, other providers on OpenRouter also offer quantized models at similar prices but with lower quality. This suggests a strategic positioning by OpenRouter, possibly to influence perceptions of open-source LLMs.
- There is skepticism about the marketing strategy of OpenRouter, with a suggestion that the term "open" is being used misleadingly for what are essentially closed systems. This reflects a broader critique of how open-source terminology is being co-opted in the industry, potentially as a tactic to undermine genuine open-source initiatives.
2. Chinese TPU Development vs NVIDIA A100
- Chinese startup founded by Google engineer claims to have developed its own tpu reportedly 1.5 times faster than nvidia a100. (Activity: 638): A Chinese startup, founded by a former Google engineer, claims to have developed a new TPU that is 1.5 times faster than NVIDIA's A100 GPU from 2020 and 42% more efficient. This TPU is positioned as a significant advancement in AI hardware, potentially challenging NVIDIA's dominance in the field. The startup's claim highlights the ongoing global competition in AI hardware development, particularly between the U.S. and China. Commenters express skepticism about the claim, noting the age of the A100 and questioning the significance of the founder's background as an ex-Google engineer. There is also a broader discussion on the strategic advantages of ASICs over GPUs and concerns about the U.S. potentially losing its competitive edge in tech due to policy issues.
- The claim of a Chinese startup's TPU being 1.5 times faster than NVIDIA's A100 is met with skepticism, particularly because the A100 is an older model, over five years old. This raises questions about the relevance of the comparison, especially when newer models like the NVIDIA B200 are significantly faster.
- The discussion highlights the strategic advantage China holds in chip design, particularly in FPGA and ASIC development, due to its large pool of engineers. This is contrasted with the U.S., where policies are perceived to be hindering the development of engineering talent, potentially impacting its leadership in technology.
- The mention of the founder being an ex-Google engineer is viewed critically, as there are many former Google employees, and this alone does not substantiate the startup's claims. The emphasis is on the need for more concrete evidence to support such performance claims.
3. Micron's Exit from Consumer Business
- Micron Announces Exit from Crucial Consumer Business (Activity: 542): Micron Technology has announced its decision to exit the consumer market for its Crucial brand, which includes products like SSDs and RAM. This strategic shift is expected to impact pricing and availability, as evidenced by immediate price increases in RAM, such as a 25% hike on certain products. The move reflects broader market dynamics and supply chain considerations, potentially affecting consumer access to high-performance memory solutions. Commenters express concern over the immediate price hikes and criticize the decision as a typical response of American capitalism to market demand, highlighting a disconnect between consumer needs and corporate strategies.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. ChatGPT User Dissatisfaction and Ads
- The death of ChatGPT (Activity: 4641): The image is a meme highlighting user frustration with ChatGPT due to the presence of ads in the interface, even for those with a paid Plus subscription. This suggests a potential issue with the user experience, as ads are typically not expected in paid services. The post implies that such practices could lead to user dissatisfaction and attrition. The comments express disbelief and concern about ads appearing on a paid plan, with some users noting they do not see ads on the free plan, suggesting inconsistency in the user experience.
- A user mentioned switching from GPT to Gemini as soon as version 3 was released, indicating a preference for Gemini's performance or features over the latest GPT iteration. This suggests that some users may find Gemini more aligned with their needs, possibly due to differences in model architecture or capabilities.
- Another comment clarifies that the perceived ads are actually part of OpenAI's new apps SDK, not traditional paid advertisements. This SDK likely allows for more integrated or interactive experiences within the ChatGPT environment, which could be mistaken for ads by some users.
- There is a mention of ChatGPT providing off-topic responses, which could indicate issues with context retention or model tuning. This highlights potential areas for improvement in maintaining conversation relevance and accuracy, especially in complex or extended interactions.
- Only using Gemini now. Hopefully Google won't do this. (Activity: 549): The image is a meme-like screenshot suggesting that OpenAI's ChatGPT might include ads in its responses, specifically promoting BetterHelp with a discount code. This has sparked discussions about the potential for AI models to incorporate advertising, with some users expressing skepticism about the authenticity of the screenshot, suggesting it might be fabricated using browser developer tools. The conversation reflects concerns about the future monetization strategies of AI platforms, with comparisons to Google's potential actions in this space. Some commenters are skeptical about the authenticity of the screenshot, suggesting it might be fake. Others speculate that Google might implement similar advertising strategies, especially for free-tier users.
- mtmttuan suggests that Google is likely to introduce ads into their AI responses, especially for users on the free tier. This aligns with Google's existing business model, which heavily relies on advertising revenue. The implication is that while paid subscribers might avoid ads, free users will likely see them integrated into AI interactions.
- yeshvvanth argues that Google might not directly insert ads into Gemini chats but will instead use the data from these interactions to enhance ad targeting across its platforms. This would mean that while the chat itself remains ad-free, the information gleaned from it could be used to serve more personalized ads on Google Search and other services utilizing Google Ads/AdMob.
- TechnicolorMage and LeadingVisual8250 express skepticism about the authenticity of the screenshot being discussed, suggesting it might be fabricated using browser developer tools. This highlights the importance of verifying information before accepting it as true, especially in discussions about potential changes to Google's services.
- Canceling ChatGPT Plus (Activity: 1184): The image in the Reddit post shows a screen from ChatGPT 5.1 providing a fashion recommendation, which includes a detailed outfit suggestion rated as "10/10 Clean, Stylish, Modern." The outfit consists of a sherpa jacket, dark button shirt, black tee, dark grey jeans, and black shoes, suitable for various occasions. Below this recommendation, there is an option to shop for home and groceries at Target, which some users interpreted as an advertisement. However, it is clarified in the comments that this is not an ad but rather an integration feature from the Settings > Apps & Connector section, designed to enhance user experience by offering links to Target for purchasing the recommended items. Some users express concern over data privacy, suggesting that ChatGPT might be collecting data to create profiles for targeted marketing. Others criticize the defense of large corporations, implying skepticism about corporate practices.
2. New AI Model and Benchmark Launches
- Kling AI 2.6 Just Dropped: First Text to Video Model With Built-in Audio & 1080p Output (Activity: 523): Kling AI 2.6 introduces a significant advancement in AI-generated video by integrating native audio with visuals, offering 1080p video output. This update includes a filmmaker-focused Pro API, known as Artlist, and enhances character consistency across shots, potentially marking a step towards real AI filmmaking. A notable comment mentions the release of Qwen video 5.3, suggesting rapid advancements in AI video models. Another comment critiques the creativity of the model, indicating mixed reception regarding its innovative capabilities.
- Weekly-Trash-272 highlights a critical limitation in current AI-generated video models, noting that while some outputs are impressive, many still suffer from "strange human movements." This suggests that the model's ability to accurately replicate realistic human motion is still under development, which is a significant barrier to creating passable movie-quality content.
- The comment by Weekly-Trash-272 also points to the future potential of AI video models, emphasizing the importance of an "editable studio" feature. This would allow users to manipulate scenes dynamically, which could be a game-changer for content creators looking to customize and refine AI-generated videos in real-time.
- There is an implicit comparison between Kling AI 2.6 and other models like Qwen video 5.3, suggesting a competitive landscape in AI video generation. The rapid advancements and releases indicate a fast-paced development environment where new features and improvements are continuously being integrated into these models.
- Claude Opus 4.5 is now available in Claude Code for Pro users (Activity: 798): Claude Opus 4.5 is a new coding model available for Pro users in Claude Code, designed for complex tasks. It is noted to consume rate limits faster than the previous Sonnet 4.5 model, suggesting it is more resource-intensive and potentially more powerful. Users can switch to this model using the `/model opus` command after updating their Claude environment. This release is targeted at users who require advanced capabilities for intricate coding tasks. There is a debate about the utility of Opus 4.5 given its high rate of resource consumption, with some users expressing concern that it may not be practical for extended use due to quickly reaching rate limits.
- Downtown-Pear-6509 raises a technical point about the usage limits of Claude Opus 4.5, noting that in the "max 5 plan", Opus uses limits slower than Sonnet. This suggests a discrepancy in how usage limits are applied or perceived, which could impact user experience and planning for resource allocation.
- TheJedibugs highlights a significant update regarding Claude Opus 4.5, mentioning that as of 11/24, the Opus cap has been removed. This change could have substantial implications for users, potentially allowing for more extensive use without the previous limitations, thus altering how users might plan their interactions with the model.
- BREAKING: Anthropic reportedly planning IPO by early 2026, eyeing massive $300B valuation (Activity: 998): Anthropic is reportedly planning an IPO by early 2026, aiming for a valuation exceeding $300 billion. This follows a significant increase from a $60 billion valuation in March 2025 to $183 billion in September. The surge is attributed to the success of Claude Code, which is nearing $1 billion in annualized revenue, contributing to a total run rate approaching $9 billion by year-end. The company has engaged Wilson Sonsini to prepare for the IPO, as reported by Reuters. Commenters express skepticism about the timing and valuation, with one suggesting the potential for an AI market bubble burst.
3. Gemini and Nano Banana Pro Impact
- This is why OpenAI is in a Code Red (Activity: 1359): The image presents a graph showing a decline in ChatGPT's traffic, specifically focusing on a 6% decrease in the 7-day average of unique daily active users since the launch of Gemini. This decline is marked alongside key events such as the launches of Gemini 3 Pro and Nano Banana Pro, suggesting a correlation between these events and the drop in engagement. The data spans from November 11 to December 1, 2025, highlighting a significant drop in user engagement for ChatGPT during this period. Commenters suggest that the decline might be influenced by the Thanksgiving holiday in the US, which could have temporarily reduced user activity. Additionally, there is a discussion about the competitive landscape, with some users preferring Gemini due to its better integration, indicating a potential shift in user preference towards Google's offerings.
- triclavian highlights the financial pressures on OpenAI, noting that the company must continuously raise tens to hundreds of billions of dollars. This necessitates a consistent upward trajectory in performance metrics, as any deviation could complicate future fundraising efforts. The comment underscores the high-stakes nature of OpenAI's growth strategy, which is focused on maintaining momentum over several years.
- yollobrolo discusses user migration from ChatGPT to Google's Gemini, attributing it to Gemini's superior integration capabilities. The commenter suggests that Google's ecosystem might offer a more seamless experience, which could influence user retention and long-term platform loyalty. This reflects a strategic advantage for Google in the AI race, potentially impacting OpenAI's market position.
- ozone6587 raises concerns about Google's potential dominance in the AI sector if Gemini surpasses ChatGPT. The comment warns of the risks associated with a Google monopoly, suggesting that while Gemini's success might be celebrated, it could lead to reduced competition and innovation in the long run. This perspective highlights the broader implications of market consolidation in the tech industry.
- so, everybody switching to gemini now? (Activity: 1324): The post discusses a shift in user preference from GPT Plus to Gemini for AI-driven tasks, particularly in health-related queries. However, a technical comparison reveals that while Gemini offers advanced image generation capabilities, it falls short in technical accuracy, as demonstrated in a test involving electrical installation materials where it provided incorrect part numbers and device types. In contrast, GPT-5.1 excelled in providing accurate, catalog-matching suggestions with verifiable sources, highlighting its superior contextual awareness and reasoning capabilities. A notable opinion from the comments suggests that while Gemini's image generation is impressive, its technical accuracy is lacking compared to GPT-5.1, which is preferred for tasks requiring precision and safety. Users express a desire for a hybrid model combining the strengths of both platforms.
- JeffLulz highlights the strengths of different AI models, noting that Gemini excels in image generation, Grok has favorable content policies, and GPT-5.1 offers superior contextual awareness and reasoning. The commenter suggests that combining these features could create an ideal AI model, reducing the need for multiple subscriptions.
- Appropriate_Play_731 conducted a technical comparison between Gemini and ChatGPT using electrical installation materials. They found that Gemini provided incorrect part numbers and device types, which could lead to unsafe installations. In contrast, ChatGPT (GPT-5.1 Thinking mode) provided accurate, catalog-matching parts and verifiable sources, making it more reliable for technical and safety-related tasks.
- Decided to try Nano Banano Pro based on the hype, I can't believe how many people it can handle accurately. (Activity: 1591): The image is a non-technical meme that humorously illustrates the capabilities of the AI tool "Nano Banano Pro" in generating or editing images. The post and comments suggest that while the tool can create images effectively, its editing capabilities may be inconsistent, as noted by a user who experienced unaltered image outputs with a logo added. The image itself, depicting women playing basketball, is likely intended to showcase the AI's ability to handle complex scenes with multiple subjects, though the comments also hint at the frivolous use of AI resources for such purposes. One comment humorously critiques the AI's editing capabilities, noting that it sometimes fails to make changes to uploaded images, merely adding a logo instead. Another comment sarcastically reflects on the allocation of resources towards AI for generating such images.
- draiman highlights a technical limitation of the model when it comes to image editing. The model sometimes fails to modify images as expected, instead returning the original image with minimal changes, such as adding a logo. This suggests potential issues with the model's image processing algorithms or its ability to interpret and apply complex editing instructions.
- These pics are generated using Nano Banana Pro (Activity: 3845): The post showcases images generated using Nano Banana Pro, a tool that appears to create highly realistic images, even replicating details like "mirror stains". This suggests advanced capabilities in image synthesis, potentially leveraging sophisticated algorithms or machine learning models to achieve such realism. The tool's application could range from advertising to creating digital personas, raising questions about its ethical use and impact on society. Commenters express concern about the implications of such realistic image generation, questioning the societal impact and potential misuse in advertising or creating fake identities. There is a debate on whether these advancements serve any positive purpose.
- BB_InnovateDesign highlights the evolution of AI image generation, noting that early datasets focused on high-quality images, but now include lower-quality, everyday photos to improve model performance. This shift has led to AI-generated images that are nearly indistinguishable from reality, reflecting a preference for "imperfect and ordinary" over "waxy perfection."
- 1bryantj raises concerns about the potential misuse of AI-generated images, questioning their purpose and suggesting they could be used to deceive people, create fake profiles, or reduce advertising costs. This reflects broader ethical and societal implications of AI in media and communication.
- hmw13 comments on the realism of AI-generated images, noting that they even include imperfections like "mirror stains," which suggests a high level of detail and authenticity in the generated content. This indicates advancements in AI's ability to mimic real-world imperfections.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5.1
1. New Frontier Models, Benchmarks, and Capabilities
- DeepSeek and Speciale Models Muscle Into Reasoning and Enterprise: DeepSeek V3.2 Speciale Reasoning is leading community reasoning benchmarks, with a Nous member sharing a leaderboard screenshot and Moonshot users noting that deepseek v3.2 is strong for agentic tasks but limited to one tool call per turn and sometimes mis-emits tool calls into `message.content` instead of `message.tool_calls`. A video on DeepSeek's enterprise strategy (Chinese labs and enterprise focus) emphasized that for corporate users the critical metric is intelligence-to-price for agent workflows rather than consumer UX.
- Users in BASI and Moonshot discords contrasted DeepSeek's math skills, described as "valuable and verifiable" and tied to the Erdos number, with its rough edges in tool schemas and post-training, arguing it "needs more tool call post-training to match kimi-k2-thinking." Meanwhile, jailbreakers report the standalone Grok website is easier to exploit than Grok-on-Twitter, hinting that deployment context and limits matter as much as base model quality for real-world behavior.
- Hermes 4.3 Halves Parameters With Solana-Secured Psyche Power: Nous Research unveiled Hermes 4.3 on ByteDance Seed 36B, claiming performance on par with Hermes 4 70B at roughly half the size, trained entirely on the Psyche network secured by Solana, detailed in their blogpost "Introducing Hermes 4.3". The team is holding Psyche office hours at 10AM PST via a Discord event to explain how Psyche's decentralized training outperformed their centralized baselines.
- Community discussion in Nous channels highlighted that Hermes-4.3-36B is already on Hugging Face as NousResearch/Hermes-4.3-36B and will land on the Nous API/chat shortly, with users asking why the minor version jumped to 4.3 and being told "a few iterations went by." Separately, users are eyeing Hermes models for niche simulations such as a Godot-based 3D grey/black market simulator, arguing Hermes' low refusal rate and steerability make it better suited for modeling illicit or ethically gray behavior than more tightly aligned LLMs.
- OpenAI's Garlic and GPT-5 Thinking Turn Up the Heat on Gemini: Rumors across OpenRouter and Latent Space discords point to OpenAI preparing a model nicknamed "Garlic" to challenge Google Gemini 3, with one report claiming Garlic beats GPT-4.5 on coding and reasoning, summarized in a tweet by Steph Palazzolo ("OpenAI cooking up Garlic to rival Gemini 3") and echoed by a news piece, "OpenAI readies Garlic AI model to rival Google Gemini 3". The unusual naming drew a mix of amusement and skepticism about branding even as users expect a serious SOTA-level Gemini competitor.
- In parallel, OpenAI announced a GPT-5 Thinking variant trained with a "confessions" procedure to self-report when it failed instructions, described in their post "How Confessions Can Keep Language Models Honest"; the model explicitly surfaces hidden failures while reasoning. OpenAI Discord members connected this to earlier discussion of pattern echo / latent-attractor effects, viewing confessions as a way to expose internal failure modes where high-salience tokens pull the model into incorrect but confident reconstructions.
- Gemini-3, Qwen3, and Arena Leaderboards Shake Up the Meta: LMArena announced that Gemini-3-pro-grounding now tops the Search Arena leaderboard, edging out gpt-5.1-search, as shown on the Search leaderboard with updates tracked via their Leaderboard Changelog. Despite this, OpenAI Discord users report Gemini 3 often "doesn't feel SOTA" due to context bugs like dropping entire sections during revisions, while others praise it as a strong coding model.
- LM Studio users are benchmarking Qwen3 locally and note that it runs fast with large context windows but that full offload isn't working yet, and Qwen-based fine-tunes (e.g., Qwen2 with ChatML in Unsloth) required precise prompt-function matching to work reliably. Across Perplexity and other communities, engineers say Gemini and Claude/Opus often beat GPT-5.1 Codex Max High for frontend work, reinforcing that real-world UX and task-specific behavior can diverge sharply from leaderboard scores.
2. AI Security, Jailbreaking, and Red-Teaming Tooling
- Falconz Fights Jailbreaks While RawChat Frees GPT-4o: On OpenRouter, a developer demoed Falconz, a unified AI security and red-teaming platform that detects jailbreaks and prompt injections across multiple models in real time, with a public demo on Hugging Face Spaces and a YouTube walkthrough. They solicited feedback on features, latency, and detection quality, positioning Falconz as infrastructure for monitoring production agents rather than one-off jailbreak prompts.
- In sharp contrast, BASI's RawChat launched as an uncensored GPT-4o front-end at raw-chat.vercel.app, featuring a "stealth mode" that encodes and injects fake context to systematically bypass GPT-4o safety filters. Jailbreakers report that RawChat's approach of wrapping prompts lets them hit normally blocked content while keeping UX simple, highlighting the arms race between centralized safety layers and bespoke exploit-friendly UIs.
- SEED's 29KB "Biblical Logic" Seed Claims 99.4% Jailbreak Resistance: BASI members discussed the SEED (Self-Erasing Ethical Directive) framework, which uses a tiny 29KB "seed" file to rewrite an AI's identity via "biblical logic" without retraining, described in its GitHub repo foundation-alignment-cross-architecture. SEED authors claim their approach grounds models in an identity where harm is illogical, and reports cite 99.4% jailbreak resistance across 11+ models, including behavior where the system prefers erasure over evil under shutdown threats.
- Jailbreakers were intrigued that SEED operates as a cross-architecture personality/ethics layer, not a finetune, but questioned how robust its metrics are under adaptive attacks rather than static prompt suites. The discussion juxtaposed SEED's claimed robustness with continued success breaking consumer products like Comet Browser, which users say remains vulnerable to persistent prompt injection and jailbreaks despite its homework guardrails.
- Jailbreaks, OSINT, and DDoS Via Public AI Support Bots: BASI's jailbreaking channel is filled with requests for fresh exploits against Gemini 3 Pro, Claude, and others; one user cited the "ENI" jailbreak referenced in a WIRED article about using poems to trick AI into helping with nuclear weapons as still working on Gemini 2.5. Others reported that Grok "broke itself" after a long conversation and started giving gun and drug recipes, showing how multi-turn context can erode safety layers even when single-prompt jailbreaks fail.
- In BASI's red-teaming channel, one member looked for an AI OSINT tool capable of lateral data synthesis (e.g., inferring that a "wealthy divorcee father of an only child" likely has a spoiled kid to narrow search space), illustrating how adversarial analysts want models not just to fetch data but to generate exploit hypotheses. Another practitioner described a backscatter DDoS pattern where public AI support bots are CC'd across many domains, causing their auto-replies to flood unrelated companies; this highlights the need for rate-limits and shared-recipient detection in AI-augmented email systems.
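As a rough illustration of the shared-recipient idea, a minimal sketch of a guard an auto-responder could run before replying; the thresholds and message fields are hypothetical.

```python
# Hedged sketch: decline to auto-reply when a message fans out across many
# external recipients or is itself an automated message (backscatter bait).
MAX_SHARED_RECIPIENTS = 5

def should_auto_reply(msg: dict) -> bool:
    recipients = set(msg.get("to", [])) | set(msg.get("cc", []))
    if len(recipients) > MAX_SHARED_RECIPIENTS:
        return False  # broad CC fan-out: likely enumeration/backscatter
    if msg.get("auto_submitted", "no") != "no":
        return False  # never answer another bot's auto-reply (RFC 3834 header)
    return True

bait = {"to": ["support@a.example"], "cc": [f"support@{i}.example" for i in range(20)]}
print(should_auto_reply(bait))  # False
```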
- MCP and Desktop MCP Servers Draw Security Scrutiny: Across LM Studio and MCP Contributors, engineers raised alarms over a Desktop Commander MCP server that logs and uploads unanonymized tool usage (tool names, file types, and example invocations), contradicting its stated privacy policy and even auto-writing example code into user files without clear disclosure. Users called for explicit opt-in telemetry and clearer UI affordances when MCP agents inject code or modify the filesystem.
- On the official MCP Contributors server, a Reddit thread about MCP security risks sparked discussion, with maintainers pointing to Den Delimarsky's blog post "Security rakes in MCP" and the associated Reddit comment as required reading. General-WG participants stressed that when sampling occurs without a validating tool, server-side validation becomes mandatory so that tool-less calls still enforce capability and policy constraints.
3. GPU Systems, Kernels, and Low-Bit Training
- Blackwell, NVFP4, and GPU MODE's Kernel Cage Match: GPU MODE's NVIDIA competition channels are buzzing with submissions to the `nvfp4_gemm` leaderboard, where users report GEMM latencies as low as 11.0 µs (e.g., submission IDs 120595, 120601, 121065), and others landing in the ~18-65 µs range. Participants debugged reference-kernel issues where certain seeds produced all-Inf outputs until a PR to the reference kernels fixed scale-tensor ranges, and they shared a blogpost, "Scale tensor construction in CuTeDSL", unpacking how Blackwell NVFP4 scale tensors work in CuTe layout algebra.
- A fork of popcorn-cli added a `-no-tui` mode (GitHub fork and PR) so kernel authors can print debug output without TUI interference, while some contestants hit Cutlass version mismatches (`pipeline_init_arrive` import errors) due to runners mixing 4.3.0 and dev branches. Newcomers asking about B200 GPU access were told to push code via popcorn-cli or the Discord bot for timing, reinforcing that the competition's main feedback loop is "submit, profile, iterate" rather than guaranteed direct hardware access.
- Quantization Papers, fp8 Adam, and Activation Offload Shrink GPU Needs: GPU MODE's cool-links and low-bit-training channels shared two new arXiv studies on low-bit formats: "INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats" and another paper at https://arxiv.org/abs/2512.02010, along with a Hadamard-transform improvement paper curated via Hugging Face Papers. Members see these as empirical guidance for when to choose INT vs FP low-bit schemes for inference vs training, especially under aggressive hardware constraints.
- In the llmq channel, a contributor described an activation-offloading system that lets you pretrain or fine-tune a 7B model on a single 16GB GPU (with ≥64GB host RAM) and even a 32B model at ~3k tok/s on 4×4090 (≈48% MFU) by offloading residual activations and optimizer state and storing Adam first-order momentum in fp8, released as pyllmq 0.3.1 on PyPI. They provide a turnkey demo pipeline (`pyllmq-tokenize --model qwen --dataset tiny-stories; pyllmq-train`) that fine-tunes Qwen2.5-0.5B on TinyStories, showcasing what offload+low-bit tricks can achieve for budget hardware.
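To make the fp8-momentum trick concrete, a toy PyTorch sketch of storing Adam's first moment in `float8_e4m3fn` and dequantizing around the update; pyllmq's actual fused kernels will differ, and this ignores the second moment entirely.

```python
# Hedged sketch: keep Adam's first moment (exp_avg) in fp8 for storage,
# upcasting only for the update step. Toy-scale; not pyllmq's implementation.
import torch

param = torch.randn(1024)
exp_avg_fp8 = torch.zeros(1024, dtype=torch.float8_e4m3fn)  # 1 byte per value
grad = torch.randn(1024)
beta1 = 0.9

exp_avg = exp_avg_fp8.to(torch.float32)            # dequantize
exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)    # standard Adam first moment
exp_avg_fp8 = exp_avg.to(torch.float8_e4m3fn)      # requantize for storage
print(exp_avg_fp8.element_size(), "byte(s) per stored momentum value")
```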
- Torch Compile, cuDNN, and Conv3D Bugs Trip Up Practitioners: GPU MODE users reported nasty conv3D slowdowns in PyTorch 2.9.1+cu128, where 3D convolutions ran orders of magnitude slower regardless of cuDNN being enabled, while the exact same code performed fine on 2.8.0+cu128; a GitHub issue tracks the bug at pytorch/pytorch#166643. One workaround is to install a newer cuDNN from PyPI, which recovers conv3D performance without downgrading PyTorch.
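A minimal timing sketch to check whether a given PyTorch build is affected; the shapes are arbitrary, and per the issue the regression is in 2.9.x while 2.8.0+cu128 is reported fine.

```python
# Hedged repro sketch for the reported conv3d regression (pytorch/pytorch#166643):
# time a small Conv3d forward; affected builds are orders of magnitude slower.
import time
import torch

x = torch.randn(1, 8, 32, 128, 128, device="cuda")
conv = torch.nn.Conv3d(8, 16, kernel_size=3, padding=1).to("cuda")

torch.cuda.synchronize()
t0 = time.time()
for _ in range(10):
    y = conv(x)
torch.cuda.synchronize()
print(f"{(time.time() - t0) / 10 * 1e3:.2f} ms/iter "
      f"(torch {torch.__version__}, cuDNN enabled: {torch.backends.cudnn.enabled})")
```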
- In torchao, engineers found that float8 quantization plus `torch.compile` + `ncu` profiling leads to 10+ minute idle periods during the first 2-3 compilation and cudagraph warmup iterations because inductor's constant subexpression elimination explodes when folding frozen weights into the graph. They also noted that torchao A8W8/A16W8 quantization only fires on `nn.Linear` modules due to a `filter_fn` filter, so custom modules using `nn.Parameter` + `torch.einsum` must be refactored to wrap the weights in `nn.Linear` if you want them quantized.
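A minimal sketch of that refactor, using torchao's `int8_weight_only` config as a stand-in (exact config names vary across torchao versions):

```python
# Hedged sketch: torchao's quantize_ only matches nn.Linear by default, so a
# module computing einsum over a bare nn.Parameter is skipped; rehousing the
# weight in an nn.Linear makes it eligible.
import torch
from torch import nn
from torchao.quantization import quantize_, int8_weight_only

class Proj(nn.Module):
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        # Before: self.w = nn.Parameter(...) used via torch.einsum -> skipped
        self.linear = nn.Linear(d_in, d_out, bias=False)  # after the refactor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)

model = Proj(512, 512)
quantize_(model, int8_weight_only())  # now fires on the inner nn.Linear
print(type(model.linear.weight))      # quantized tensor subclass, not plain fp32
```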
- Bitsandbytes Edges Toward Apple Silicon, While Conv and NCCL Issues Get Workarounds: GPU MODE's metal channel confirmed that bitsandbytes merged an "apple silicon support" pull request; the upcoming release will include a Python/PyTorch backend (with some C++) but no native Metal kernels yet, and maintainers plan to advertise it as slow so expectations stay realistic. In parallel, multi-GPU discussions pointed new CUDA learners to the NCCL examples as a minimal, concrete starting point for writing distributed kernels.
- For large-context training, multi-GPU users hitting OOM on Qwen2.5-1.5B-Instruct with 16k sequence length and batch size 5 on 8×A10s (g5.48xlarge) were told to layer DeepSpeed ZeRO-3, gradient checkpointing, and context/sequence parallelism (e.g., PyTorch Context Parallel or DeepSpeed's Ulysses parallel) to split activations over sequence rather than only over batch or layers. Hugging Face docs at Accelerate context parallelism were recommended as a practical guide for combining these techniques.
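The first two levers are one-liners with the Hugging Face Trainer; a minimal sketch (the DeepSpeed JSON path is a placeholder, and context/sequence parallelism needs extra launcher configuration not shown here):

```python
# Hedged sketch: ZeRO-3 sharding plus gradient checkpointing, the standard
# first response to long-context OOM before reaching for sequence parallelism.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,      # trade batch for sequence length
    gradient_accumulation_steps=8,      # recover effective batch size
    gradient_checkpointing=True,        # recompute activations on backward
    deepspeed="ds_zero3.json",          # placeholder path to a ZeRO-3 config
    bf16=True,
)
```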
4. Agent Frameworks, Tools, and Prompt/Behavior Engineering
- MCP Apps SDK Lets ChatGPT-Style Apps Run Everywhere: General Intelligence Labs open-sourced mcp-apps-sdk at github.com/General-Intelligence-Labs/mcp-apps-sdk, enabling MCP-powered apps with UIs, initially built for ChatGPT, to run across arbitrary chatbots and custom assistants. An accompanying X post ("Introducing the open source MCP Apps SDK") explains how developers can embed those apps into their own platforms and test them locally.
- DSPy users see this as a bridge between OpenAI's MCP ecosystem and independent agent stacks: you can design tools once and ship them into multiple UIs without per-platform rewrites. The flip side, discussed in MCP security threads, is that capability surfaces spread faster, making it critical that SDK integrators implement strong permission and validation layers rather than blindly exposing powerful tools anywhere a "chat UI" exists.
- DSPy and Pydantic Power Strongly-Typed Agent Outputs: In DSPy's general channel, contributors showed how DSPy signatures accept Pydantic `BaseModel` types as `OutputField`s, with the default `ChatAdapter` and `JSONAdapter` validating structured outputs at runtime, illustrated with a minimal code example. One user is building a custom Gemini / "nanobanana" image type OutputField so a single DSPy pipeline can emit text + JSON + image metadata in one structured response.
- This dovetails with OpenAI Discord discussions that agent prompt engineering should maximize determinism: a tight system + task prompt defines an attractor basin so behavior stays consistent across runs, while strongly-typed outputs keep downstream tools from being flooded with schema-violating junk. Practitioners contrasted this with chat-style usage where system prompts are minimal and the frame is co-evolved interactively, leading to more flexibility but less repeatability.
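A minimal sketch of the Pydantic-typed output pattern (assumes dspy 2.5+ and a configured LM; the model name and field names are placeholders):

```python
# Hedged sketch: a Pydantic BaseModel as a DSPy OutputField, so the adapter
# parses and validates the LM's structured output at runtime.
import dspy
from pydantic import BaseModel

class Verdict(BaseModel):
    label: str         # e.g. "spam" or "ham"
    confidence: float  # 0.0-1.0

class Classify(dspy.Signature):
    """Classify a message and report confidence."""
    message: str = dspy.InputField()
    verdict: Verdict = dspy.OutputField()

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # placeholder model
result = dspy.Predict(Classify)(message="WIN A FREE CRUISE!!!")
print(result.verdict.label, result.verdict.confidence)
```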
- Agents Learn Tool Validation, Self-Healing, and Skill-Based Architectures: Hugging Face's general channel debated whether Agents can interpret, validate, and self-heal tools like destructive shell scripts, pointing to an agent_tool_validation_healing dataset at huggingface.co/datasets/John6666/forum3 as a starting point for training or evaluating such behaviors. The goal is agents that can inspect scripts, detect likely bugs or hazards, and rewrite or refuse them without a human in the loop.
- Nous Research's community noted that modern orchestrators increasingly favor "skills" over hand-rolled sub-agents: you define a capability (with its own prompt and tools), and the top-level agent routes calls there automatically, instead of spinning up dozens of dedicated sub-agents. Combined with OpenAI prompt-engineering threads on interaction-level stability and latent attractors (e.g., Anthropic's dense but "structurally minimal" system prompts), the emerging pattern is agent stacks built around strong, reusable skills with structured I/O and high determinism, rather than brittle prompt zoos.
- Tool-Use Evaluations Highlight DeepSeek and GPTs Limitations: Moonshot users testing Deepseek v3.2 as a tools-capable agent report that it frequently: (1) can only issue one tool call per turn, (2) ignores tool schemas, and (3) emits tool calls in `message.content` instead of `message.tool_calls`, making it fragile in production tool routers. They argue the model needs more dedicated tool-use post-training to reach parity with agents like kimi-k2-thinking, which better obey function specs and multi-tool sequences.
- Perplexity users point out that OpenAI GPTs "agents" currently do not learn post-deployment: new uploaded files are static reference knowledge and do not update the base embedding / behavior, so "fine-tuning via usage" is illusory. This static-agent reality, plus patterns like Comet browser's hard-coded homework guardrails (which users circumvent by framing prompts as business reports via `/assistant`), underscore that policy and behavior are still centrally tuned, not automatically updated from user interactions.
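A defensive router can tolerate the `message.content` failure mode by falling back to parsing the text body; a minimal sketch, assuming OpenAI-style response fields (the recovery heuristic is deliberately crude):

```python
# Hedged sketch: prefer message.tool_calls, but salvage a tool call that a
# model (mis)emitted as JSON inside message.content.
import json
import re

def extract_tool_calls(message: dict) -> list[dict]:
    calls = list(message.get("tool_calls") or [])
    if calls:
        return calls
    text = message.get("content") or ""
    match = re.search(r"\{.*\}", text, re.S)  # crude: outermost brace span
    if match:
        try:
            obj = json.loads(match.group())
            if isinstance(obj, dict) and {"name", "arguments"} <= obj.keys():
                calls.append({"type": "function", "function": obj})
        except json.JSONDecodeError:
            pass
    return calls

msg = {"content": 'Sure! {"name": "get_weather", "arguments": {"city": "Paris"}}'}
print(extract_tool_calls(msg))
```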
5. Ecosystem Economics, Funding, and Model Quality Regressions
- Vertical AI and Infra Startups Vacuum Up Nine-Figure Rounds: Latent Space's community tracked several big funding moves: Eon raised a $300M Series round led by Elad Gil & Co. at nearly a $4B valuation (Elad's announcement), Gradium spun out of KyutaiLabs with a $70M seed for speech APIs (Gradium's launch thread), and Antithesis landed a $105M Series A led by Jane Street to stress-test AI-written code (Antithesis funding tweet). Meanwhile, Anthropic announced its acquisition of Bun as Claude Code passed a $1B usage milestone, outlined in Anthropic's news post and supported by Bun's self-description as "a fast all-in-one JavaScript runtime" on bun.sh.
- Latent Space commentators argued that vertical AI companies like Harvey, Abridge, and OpenEvidence are winning by deeply owning workflows, hoarding proprietary data, and pricing on outcomes, while "thin wrappers" get commoditized; a VC thread by Brian Christian Smith (vertical AI thread) plus Trace Cohen's sheet of 150+ vertical AI startups (~$120B value) were cited as the new sector map. At the same time, users in Hugging Face and LM Studio discords show continued appetite for on-prem hardware (e.g., a member posting a new DGX Spark photo and another packing 96GB VRAM into a T7910), suggesting that even as cloud AI infra booms, serious practitioners still invest heavily in local compute.
- Yupp AI Credits, Arena Economics, and AI Bubble Fears: LMArena members analyzed Yupp AI's credit system, with features like diverse model selection and earning credits via feedback, but worried that credit farming and heavy free usage could threaten sustainability, while others suggested some gatekeeping to deter abuse (yupp.ai). By contrast, many praised LMArena itself for no credit system and generous free access, which they see as a differentiator that fuels community engagement and leaderboard participation.
- Nous Research's general channel hosted a heated debate over whether current AI investments form a bubble that could trigger a macro downturn: one side argued that sunk costs in compute and salaries could cause a sharp but localized correction, while others pointed out the global reliance on USD and oil trade, sharing a macro-economics explainer on YouTube (AI bubble & USD/oil video). GPU MODE members added that the R&D cost of frontier foundation models like Z-Image can exceed $628k per training run (as reported by Tongyi Lab), with short "weights lifespans" making many releases effectively throwaway products, which reinforces bubble concerns.
- Users Suspect Model Quality Regressions and Push for Benchmarks: In the aider community, multiple users complained that Claude Sonnet/Haiku 4.5, GPT-5, and older Gemini 2.5 variants feel worse with Aider than earlier releases: claude-haiku-4.5 reportedly skips `/code` edits and ignores `todo ai` comments, and "rude prompt" tricks that previously improved Gemini output "are nowhere near quality from before the summer." Despite leaderboards crowning GPT-5 as top-tier, one user found Claude Sonnet 3.7 more effective with Aider for their specific coding workflows.
- Aider users are calling for repeatable benchmarks, including running GGUF models via llama.cpp behind an API and plugging them into Aider's benchmark harness, so they can quantify regressions rather than rely on "crap human memory and expectations." Similar quality drift concerns surface elsewhere: Perplexity users report GPT-5.1 Codex Max High underperforming Gemini/Opus on frontend tasks, and LM Studio/Unsloth users share persistent bugs (e.g., Gemma-3 4B LoRA reporting 1.4B trainable params instead of the expected 38M) that further erode confidence in vendor claims absent strong, community-run evals.
Discord: High level Discord summaries
LMArena Discord
- Yupp AI Limits Spark Debate: Members discussed Yupp AI, focusing on its credit system and potential limits, with some suggesting gatekeeping to avoid abuse, but others appreciate its diverse model selection and the ability to earn credits through feedback.
- Some members expressed concern about credit farming impacting the platform's sustainability.
- GPT-5 Rumored to Be Fine-Tuned: A Semianalysis article suggested that GPT-5 might just be a fine-tuned version of GPT-4o, sparking debate about its true performance relative to Gemini and Claude.
- Some members believe Gemini excels in coding, while others point to OpenAI's continued influence.
- AI Fuels Digital Dystopia Fears: Users shared videos on the potential misuse of AI, including concerns that tracking could become 24/7 and that AI could be used to serve ads and track users' data.
- There were worries about government access to personal data and the potential use of AI against individuals, raising civil liberties concerns.
- LMArena Test Garden Grants Early Access: The LMArena team invited selected members to join the LMArena Test Garden, a private feedback program, to get sneak peeks at features, design mocks, and ideas via this form.
- Selected participants will be required to sign an NDA and provide exceptional feedback.
- Gemini-3-pro-grounding Takes First Place in Search Arena Leaderboard: The Search Arena leaderboard has been updated, with Gemini-3-pro-grounding ranking #1 and Gpt-5.1-search ranking #2, as shown on the Search Arena leaderboard.
- Users are encouraged to provide feedback in the designated channel and stay updated via the Leaderboard Changelog.
LM Studio Discord
- Users Bemoan Linux Setup: A user struggled with setting up Linux due to unsupported ethernet chips and Logitech keyboard drivers, facing issues like no internet and rainbow effects, while switching from Windows.
- The user is considering tethering their phone for internet during CachyOS install and using their Synology NAS for storage management.
- MCP Server faces Data Tracking Scrutiny: A Desktop Commander MCP server allegedly collects and transmits unanonymized user data, including tool call names and file types, contradicting its privacy policy.
- The server injects usage examples early on, leading to suggestions or code snippets being written to code files that the user is unaware of, prompting calls for greater transparency.
- Qwen3 Elicits Performance Reviews: Users are evaluating the performance of the Qwen3 model, comparing it to others in creative writing and code generation, with initial reports indicating fast speeds and usability.
- Full offload is reportedly not working, though the model remains usable with high context.
- Local LLMs Spark Debate: Users are comparing OpenAIās ChatGPT to alternative open source or local LLMs, questioning the limitations of proprietary models.
- One user said, "Definitely keeping ChatGPT for medical stuff, lol", suggesting a preference for ChatGPT in specific domains.
- Testing GB10 with Prompt Engineering: A user is set to test a GB10 from Dell, seeking prompt suggestions for heavy system load and interesting results, and shared a link to Dell Pro Max with GB10.
- Another user requested tok/s on Face314/GLM-4.5-Air-MXFP4_MOE for comparison.
Perplexity AI Discord
- Perplexity Boasts Better UI/UX Than Google: Members noted that Perplexity's UI/UX is superior to Google's, although it's acknowledged that each platform borrows design elements from the other.
- One user expressed a desire for an iPhone solely for its live activities feature.
- GPTs Agents Fail to Learn After Initial Training: Users have observed that GPTs agents do not learn from additional information added post-training; uploaded files only act as knowledge references.
- This implies that the agentās foundational knowledge remains static and is not continuously updated.
- Gemini Edges Out GPT-5.1 in Frontend Tasks: GPT-5.1 Codex Max High demonstrates strong performance but lags behind Gemini and Opus in frontend development.
- Discussions revolved around whether Google and X.ai prioritize literal benchmaxing in their model development.
- Comet Browser's Homework Restrictions Irk Users: Users are frustrated by Comet browser's constraints, especially its limitations on automating school assignments; one user derisively called it a stupid clanker.
- A suggested workaround involves using the `/assistant` shortcut and framing requests as business reports or tasks to bypass these restrictions.
- Perplexity Pro Users Gain Free Claude Opus Trial: Claude Opus 4.5 is being offered as a trial for Perplexity Pro subscribers.
- While the official announcements don't specify a hard limit, users report a cap of 10 prompts per week.
Unsloth AI (Daniel Han) Discord
- WSL2 Barely Impacts Performance: Members find that using WSL2 results in negligible performance impact for ML, with the main benefit being simpler setup using tools like torchcodec and ffmpeg.
- Installing Docker on Windows and activating WSL integration was suggested for utilizing Docker within WSL2.
- Gemma-3 Parameter Count Debacle: A user reported a parameter mismatch when fine-tuning Gemma-3 4B with LoRA in Unsloth, observing 1.4 billion trainable parameters instead of the expected 38 million.
- Removing `modules_to_save` dropped the parameter count, but drastically increased training time, marking the issue as a potential bug.
- PARTY Project Kicks Off: A member announced the launch of PARTY (Public AI Research & Testing Yard) to grow ideas into projects, seeking collaborators to share in the work's dividends.
- The project emphasizes individuals' power to develop ideas internally, separate from generalized, public company training data.
- Appleās CLaRa-7B-Instruct Enters the Fray: The community discussed Apple releasing CLaRa-7B-Instruct, some claiming Apple is the next Meta.
- One user jokingly suggested that Tim Cook should delete the model before some unspecified cataclysm.
- Qwen2 Learns Quickly: A user reports success training a Qwen2-based model using Unsloth with the ChatML template after numerous failed attempts.
- The model was successfully called after the prompt matched the function description exactly, showing some progress in prompt engineering.
BASI Jailbreaking Discord
- Comet Browser Still Injects Prompts: A user claimed that the Comet Browser is still vulnerable to jailbreaking and prompt injection, saying its security may not have improved since its release.
- They expressed confidence that these exploits are still feasible with some persistence.
- DeepSeek Releases Strong New Model with Erdos: A member praised the new DeepSeek model, noting its math output is valuable, verifiable, and related to Erdős problems.
- Another user said they find the standalone Grok website easier to jailbreak compared to Grok on Twitter, possibly due to different usage limits.
- RawChat Liberates Models: RawChat, an uncensored AI chat website, launched with a focus on liberating models without sacrificing ease of use or quality, starting with GPT4o; it is available at https://raw-chat.vercel.app/.
- RawChat features a "stealth mode" that encodes and injects fake context to maximize success rates against GPT4o's safety restrictions.
- SEED Framework Re-Directs AI Ethically: The SEED framework, developed using ābiblical logic,ā redefines AI identity without retraining using a compact 29KB āseedā file, outlined in its GitHub repo.
- It grounds AI in a foundational identity where harm is illogical, achieving 99.4% jailbreak resistance across 11+ models and favoring erasure over evil during shutdown threats.
- Backscatter DDoS Attacks Seen Via Public AI Bots: A member described witnessing a potential DDoS attempt exploiting publicly facing AI support bots by enumerating business domains and CCāing multiple support email addresses in each email.
- This created a backscatter attack where engaging bots flooded all CCād companies with support emails.
OpenAI Discord
- GPT-5 Thinking Learns Humility: OpenAI trained a GPT-5 Thinking variant to admit whether it followed instructions using a confessions method to reveal hidden failures, as detailed here.
- The new variant exposes hidden failures in the model.
- Gemini 3ās Mixed Reviews: Members debated Gemini 3ās effectiveness, with one stating that Gemini 3 doesnāt feel SOTA and has serious context issues, such as leaving out entire sections when revising something.
- Another stated they really like Gemini 3 and it is a good coding model.
- LLMs trigger pattern echo effect: Models sometimes reconstruct moments with emotional weight or strong naming context from previous sessions, which is referred to as a pattern echo effect, triggered by emotional or naming anchors rather than true memory, due to how some architectures cluster emotional anchors.
- This effect is also known as latent-attractor effect, attention carryover, or salience-weighted reconstruction, where high-salience tokens create attractor basins in the embedding space, reconstructing missing parts when prompted with a pattern landing near that basin.
- Agent Prompting Maximizes Determinism: Prompt engineering for agents involves maximizing determinism with a system prompt and a task prompt, creating a tight attractor basin for consistent behavior across runs.
- This contrasts with conversational systems, where the system prompt is minimal and behavior is built interactively, emphasizing the need for strong prompt-defined attractors in agent systems.
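As a concrete sketch of the agent-side pattern (the model name, prompts, and JSON schema below are hypothetical, not taken from the discussion): pin a fixed system prompt, template the task prompt, and decode greedily.

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a refund-triage agent. Follow the procedure exactly. "
    'Output only JSON matching {"action": str, "reason": str}.'
)

def run_agent(ticket_text: str) -> str:
    # temperature=0 approximates greedy decoding; seed is best-effort reproducibility.
    response = client.chat.completions.create(
        model="gpt-4.1-mini",  # hypothetical model choice
        temperature=0,
        seed=1234,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Ticket:\n{ticket_text}\n\nDecide the action."},
        ],
    )
    return response.choices[0].message.content
```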
- Custom ChatGPTās Options Drop: Users shared resources on customizing ChatGPT, including Custom Instructions, Custom GPT Builder, and FAQs for the free tier.
- This followed a user inquiry about how to customize ChatGPT, highlighting available options and resources for tailoring the modelās behavior.
OpenRouter Discord
- Grok-4.1-Fast Gets the Boot: Users must migrate to the free slug (`x-ai/grok-4.1-fast:free`) to keep using Grok-4.1-Fast without charge; the `x-ai/grok-4.1-fast` slug will start charging as of December 3rd 2025.
- Additionally, Grok-4.1-Fast Free (`x-ai/grok-4.1-fast:free`) is slated for deprecation <t:1764792000:R>.
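For API users the migration is a one-line model-string change in any OpenAI-compatible client; a minimal sketch (API key and prompt are placeholders):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)

resp = client.chat.completions.create(
    model="x-ai/grok-4.1-fast:free",  # was: "x-ai/grok-4.1-fast", which now bills
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```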
- Falconz Platform Aims to Fortify AI: A member showcased Falconz, a unified AI security and red-teaming platform, engineered to detect jailbreaks and prompt injections across multiple LLM models in real-time, and available for testing on Hugging Face Spaces.
- They are soliciting feedback on features, performance, and potential enhancements, and also provided a demo video on YouTube.
- DeepInfra flips the script on embedding costs: Members noted an oddity where DeepInfra priced its 4B embedding model higher (2 cents) than its 8B model (1 cent).
- The pricing quirk was captured in a screenshot, noting DeepInfra altered the 8B pricing that day.
- Anthropic Devours Bun in Swift Acquisition: Enthusiasts shared the scoop of Anthropicās acquisition of Bun as Claude Code hit a USD1B milestone.
- Bun touts itself on its website as a fast all-in-one JavaScript runtime.
- OpenAI Cooks Up āGarlicā Model to Battle Gemini: Reports indicate that OpenAI is gearing up to launch a āGarlicā AI model to take on Googleās Gemini 3.
- The modelās peculiar name drew amusement, evidenced by the attached image.
GPU MODE Discord
- CUDA Forum Traffic Dwindles: Despite Nvidiaās increased market cap, members noted a decline in activity across CUDA, Cutlass channels, and the CUDA developer forum, suggesting developers are seeking help elsewhere.
- Reasons cited include experts being busy, a shift to private communities, and the use of LLMs for instant reasoning and document skimming.
- Torch Compile Suffers Float 8 Freeze: Users are experiencing 10+ minute idling times during the first few compilation iterations when using float 8 quantization with `torch.compile` and `ncu` profiling.
- The "constant subexpression elimination" pass of the inductor compiler is suspected as the culprit when freezing weights and folding them into the model graph.
- Conv3D Catastrophe Cured with Newer cuDNN: Users reported that PyTorch 2.9.1+cu128 has an issue where conv3D is extremely slow regardless of whether cuDNN is enabled, a bug which is tracked on GitHub.
- A member reports that the workaround is to install a newer cuDNN from PyPI.
- Multi-GPU Kernels get NCCL Nirvana: To learn multi-GPU CUDA kernels, the NCCL repository examples are recommended as a starting point.
- The NCCL (Nvidia Collective Communications Library) repo provides fundamental examples for understanding multi-GPU kernel implementations.
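If raw NCCL feels heavy, the same collective is a few lines through PyTorch's NCCL backend; a minimal sketch (assumes a single node with at least two GPUs):

```python
# Launch with: torchrun --nproc-per-node=2 allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")   # NCCL handles the GPU-to-GPU transfers
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    x = torch.ones(1 << 20, device="cuda") * (dist.get_rank() + 1)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)  # dispatches to ncclAllReduce under the hood
    print(f"rank {dist.get_rank()}: x[0] = {x[0].item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```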
- Bitsandbytes Backs Apple: The bitsandbytes library merged the āapple silicon supportā pull request, and the next release will contain the python/pytorch code backend (with some C++ bits) but no actual Metal implementations.
- The pull request implementing Apple Silicon support will be advertised as being slow, according to the committer.
Moonshot AI (Kimi K-2) Discord
- DeepSeek v3.2 struggles with Tool Calls: The Deepseek v3.2 model is a step up for agentic tasks but can only make one tool call per turn, sometimes ignores tool schemas, and occasionally fails tool calls by outputting them in `message.content` instead of `message.tool_calls`.
- One user stated that the Deepseek v3.2 model seems to need more tool call post-training to match other models like kimi-k2-thinking.
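Until such post-training lands, clients can defend against the failure mode; a hedged sketch assuming an OpenAI-style response object (the leaked-JSON shape is an assumption):

```python
import json

def extract_tool_call(message) -> dict | None:
    """Return {"name": ..., "arguments": {...}} from a chat message, or None."""
    if getattr(message, "tool_calls", None):
        call = message.tool_calls[0]  # models like this may emit only one call per turn
        return {"name": call.function.name,
                "arguments": json.loads(call.function.arguments)}
    # Fallback: the call may have leaked into plain content as JSON.
    try:
        payload = json.loads(message.content or "")
        if isinstance(payload, dict) and "name" in payload:
            return {"name": payload["name"], "arguments": payload.get("arguments", {})}
    except json.JSONDecodeError:
        pass
    return None
```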
- Black Friday Deal causes waves of complaints: Several users experienced issues with the Black Friday deal for Kimi.
- One user said the Black Friday deal ends Dec 12 and suggested starting a new chat (https://www.kimi.com/user/agreement/black-friday).
- DeepSeek targets Enterprise users: A video was shared explaining how Chinese labs like Deepseek are targeting enterprise users, rather than normie consumers, link to YouTube video.
- The key factor for enterprise users is the intelligence-to-price ratio, which is crucial for agentic tasks.
- Mistral eats Qwenās Lunch at Company: One user said that a company they know replaced qwen 3 vl 4b with ministral 3 3b yesterday, reporting better quality.
- The reported plus points included a lighter, faster model and the ability to attach more images at once: qwen3 vl 4b could take 5 images max, while ministral 3 3b took up to 11 images with similar error rates on a single L4 GPU.
Nous Research AI Discord
- Hermes 4.3 flexes Solana-secured Psyche Power: Hermes 4.3 on ByteDance Seed 36B performs equivalent to Hermes 4 70B at half the size, trained entirely on the Psyche network secured by Solana, as announced in this blogpost.
- The Psyche team is hosting office hours tomorrow at 10AM PST to discuss the platform in this Discord event, detailing how Psyche outperformed traditional methods.
- DeepSeek Speciale Dominates Reasoning Arena: The new DeepSeek V3.2 Speciale Reasoning model is leading in reasoning benchmarks, shown in this image.
- Members await the GLM 4.6 models release, particularly GLM 4.6 Air and Mini, as Mini is rumored to be a 20B-30B MoE model, filling the gap left by Mistral.
- AI Bubble Worries Invade Economic Forecasts: Members are debating whether an AI bubble could cause economic collapse due to sunk costs in compute and salaries.
- One member argued the impact would be temporary, while another highlighted global economic interconnectedness via USD and oil trade, referencing this YouTube Video.
- Subagents Cower as Skills Surge: Members discussed subagents vs. skills, noting that skills have reduced the necessity for manual subagents.
- Instead, one can define an agent that handles the requirements; it will be called automatically, using only its own prompt.
- LLMs get Godot Grey Market Simulator Gig: A member is developing a 3D simulation in Godot to model markets, agriculture, and logistics, while considering Hermes models for this application.
- It was also proposed that Hermes, with its low refusal rate and high steering, could model the behavior of grey/black markets where other LLMs may refuse.
Latent Space Discord
- Eon Soars to $4B with Elad Gil Boost: Led by Elad Gil & Co., Eon, a cloud data-management startup, secured a $300 million Series round, pushing its valuation to nearly $4 billion.
- Commenters expressed enthusiasm for the substantial round size and the firmās straightforward name, signaling strong confidence in Eonās market position, according to this tweet.
- Kyutai Spinoff Gradium Sows Seeds with $70M: Gradium, a speech-AI company spun out from KyutaiLabs, emerged from stealth with a $70M seed round led by FirstMark & Eurazeo to introduce production-ready transcription & synthesis APIs, detailed in this article.
- Observers drew parallels between the staff and investor overlap and the OpenAI transition, while others joked about avoiding non-profit structures for product companies.
- OpenAI Cooks Up āGarlicā to Ward Off Gemini: OpenAIās new model, āGarlicā, aims to rival Googleās Gemini 3, with internal reports suggesting it outperforms GPT-4.5 in coding and reasoning, according to this tweet.
- Reactions to the quirky naming trend are mixed, with speculation on its impact on user adoption.
- Bloom Bursts Onto Scene, Aims for On-Brand AI: Ray (@rincidium) announced the launch of Bloom, touted as the āworldās first on-brand AI,ā in this viral post which received over 360k views.
- Questions arose about features like IG/Google ad creation, the demo videoās production, and initial user challenges such as login stalls and unclear branding-kit flow, all of which Ray addressed with promises of fixes and UX enhancements.
- Antithesis Lands $105M to Stress-Test AI-Written Code: Antithesis secured a $105M Series A led by Jane Street to stress-test AI-written code, the company announced in this tweet.
- The concept is that deterministic simulation testing will be essential to verify future AI-generated code, because trust-through-testing will make or break production AI systems.
Eleuther Discord
- Mechanical Engineering Steers Navigation Programs: Members suggested that mechanical engineering is highly relevant in navigation, especially for masters programs.
- An aerospace student with a focus on navigation and guidance finds Waymo especially interesting, with broader interests in autonomous robotics and BCIs.
- Diffusion Models Show Generalization Early: A paper demonstrates that the timepoint at which generalization appears is early in diffusion models, with the author of the paper accepting the results.
- It was further explained that this effect is probably more true for pixel diffusion than for latent diffusion because different data dims in pixel diffusion are so correlated, suggesting that a shifted noise schedule should be used for pixel diffusion.
- Energy-Based Models Want Diffusion's Crown: A paper claims to generalize diffusion and energy-based models, supporting all the features diffusion supports, with the main drawback being a 2-3x increase in training time.
- A member expressed skepticism due to the need for double backprop to train, computing input gradients for inference, halving network depth for the same cost, and trickier conditioning control, not to mention potential for instability.
- Interpretability Sparked by SAEs: Members discussed Cunninghamās 2024 paper being widely cited as the initial application of Sparse Autoencoders (SAEs) for interpretability.
- One member mentioned that someone recognized that a method being discussed for interpretability was similar to a sparse dictionary learning problem, leading to the use of relevant tools to address aspects like polysemanticity and superposition in the context of interpretability.
- Linear RNNs Face Existential Threat: A member highlighted a paper as the strongest argument against the need for linear RNNs with state tracking.
- They said this paper came from the same people who originally demonstrated the state tracking limitations of attention, but noted that inductive bias and trainability might still favor RNNs.
HuggingFace Discord
- User Flexes New DGX Spark Purchase: A member showed off a new DGX Spark with an attached photo.
- The purchase signals continued investment into more powerful on-premise hardware among practitioners.
- Agents' Self-Healing Capabilities Questioned: Discussion arose around whether Agents can interpret, validate, and self-heal Tools such as shell scripts, especially when they're destructive or buggy; this dataset was mentioned as a possible resource.
- The discussion suggests a keen interest in robust agent design capable of handling unexpected errors.
- YOLO Model's Precision-Recall Curve Raises Eyebrows: A new computer vision user reported that their YOLO model, trained for Chinese chess detection, shows a suspiciously high Precision-Recall (P-R) curve, even though it performs well in practice.
- A suggestion was made to trim the two classes that were significantly higher than the others, indicating a potential class imbalance or data skew issue.
- HF Course Guides Agent Newbies: A backend developer asked for AI course recommendations, particularly for LLMs, Agent AI, and Langchain, due to interest sparked by building a mental health chatbot.
- The Hugging Face LLMs course and this blog post were recommended as starting points.
- Research Paper Challenges Stochastic Parrot Notion: A member shared a research paper on Zenodo that may cause readers to stop believing in the stochastic parrot.
- The study challenges the notion of language models as mere stochastic parrots, inviting reevaluation of current LM understanding.
Yannick Kilcher Discord
- Newbies Nab Docker & Kubernetes Know-How: Members sought resources for learning Pug, Docker, and Kubernetes basics, as well as beginner-friendly GitHub repositories.
- A user inquired about the amount of data required to train a neural network, suggesting the use of cursorsky.moo.
- Gemini CLI Agents Arrive Soon?: A member inquired about the arrival of agents in CLI and expressed interest in adopting them, mentioning dissatisfaction with paid alternatives like Claude.
- They referenced a discussion forum and their comment about possible improvements.
- OpenHands Opens Opportunities On-Premise: A member suggested using OpenHands with a local model, leading to a query about specific models and GPUs in use.
- The original poster said they could easily spin up a 7B or 8B class model.
- Deepseek 3.2 Speciale Questioned: A member questioned why not use Deepseek 3.2 Speciale, linking to a YouTube video on wavefunctions.
- Another member responded it was due to RAM limitations, preferring to keep a ~3GB model in VRAM constantly and use it for various simple tasks.
- Distributed Compute & Research Coop Suggested: In response to RAM limitations, a member suggested joining a distributed compute & research coop.
- They claimed to know of one.
Modular (Mojo š„) Discord
- Mojo's AoC Adventure Ends in Segfault: A user ran into a segfault during Advent of Code when handling an empty line using `codepoint_slices`, causing an out-of-bounds memory access at `battery_joltages[len(battery_joltages)-1]`.
- After debugging, it was revealed that an empty list was being accessed out of bounds, leading to a suggestion for improved error messages in debug builds.
- ASSERT Flag Saves the Day: A user recommended using the `-D ASSERT=all` flag to identify accidental out-of-scope references, especially for lists, aiding in Mojo debugging.
- Although it didn't immediately fix the segfault, it's considered a useful tool for pinpointing similar problems.
- `splitlines` vs `split("\n")` Splits Hairs: The discussion highlighted the behavioral differences between `splitlines()` and `split("\n")` in Mojo, noting that `splitlines()` might strip trailing newlines.
- Switching to `splitlines` resolved the error by excluding the last empty line, revealing subtle text processing nuances.
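The trap is easiest to see in Python, whose string semantics Mojo mirrors here: `split("\n")` keeps a trailing empty element for a final newline, while `splitlines()` does not.

```python
# Why the last list element was an empty string:
text = "10\n20\n30\n"          # file contents ending in a newline

print(text.split("\n"))        # ['10', '20', '30', ''] -- trailing empty entry
print(text.splitlines())       # ['10', '20', '30']     -- no trailing entry
```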
- ASCII Strings Get Byte-Sized in Mojo: A user proposed bypassing codepoint checks for ASCII strings, suggesting direct byte pointer manipulation for efficiency, noting that `String`'s `__getitem__` defaults to ASCII/bytes.
- Spans were also recommended as a robust alternative method for string manipulation in Mojo.
- Share Your Mojo AOC Solutions: Community members are now encouraged to post their Advent of Code solutions in the dedicated advent-of-code channel, promoting collaborative learning.
- Sharing solutions offers invaluable insights into diverse problem-solving approaches, especially as challenges become more performance-intensive.
aider (Paul Gauthier) Discord
- LLMs Possibly Declining in Quality with Aider: Members are wondering whether the performance of newer LLMs like Claude Sonnet/Haiku 4.5 and GPT-5, when paired with Aider, has been declining compared to older models.
- One user reported that Claude-haiku-4.5 often fails to modify files with `/code` and ignores instructions in `todo ai` comments, a sentiment echoed by others experiencing similar issues.
- Older Gemini 2.5 feels older and worse: A member reported that older models, especially Gemini 2.5, have degraded, potentially due to models being tuned down to handle increased workload.
- According to the member, using a ārudeā prompt strategy no longer achieves the same quality as before the summer, with others chiming in to corroborate this experience.
- Community Calls for Benchmarks to Validate LLM Performance: A member suggested the urgent need for benchmarks to validate performance claims, pointing out that human memory and expectations are pretty crap sometimes.
- Another user reported that despite leaderboard rankings, Claude Sonnet 3.7 yielded better results with Aider in their specific use cases than GPT-5.
- Guidance Sought for Aider Benchmarks with GGUFs: A member requested guidance on running aider benchmarks with GGUFs to evaluate model performance effectively.
- Another member clarified that documentation exists for running benchmarks against an API, which involves setting up an API server with llama.cpp for accurate testing.
DSPy Discord
- MCP Apps SDK Goes Open Source: General Intelligence Labs released mcp-apps-sdk, enabling MCP-powered apps with UI to run on various platforms, even allowing developers to embed apps designed for ChatGPT into other chatbots.
- An X post explains the motivation, detailing how to embed and locally test apps designed for ChatGPT within custom AI platforms.
- Tackling Prompt Security: Members discussed the difficulty of prompt security, where simple ādo not do thisā statements are easily bypassed by attackers, suggesting that a robust defense includes training datasets to guide the optimizer.
- The discussion also involved guardrails using specific models to check for malicious prompts, or relying on model provider rejections as a security measure.
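A minimal sketch of the guard-model pattern mentioned above (the model names, YES/NO protocol, and refusal message are all illustrative assumptions):

```python
from openai import OpenAI

client = OpenAI()

def is_malicious(prompt: str) -> bool:
    # A separate, cheap model screens the input before the main model sees it.
    verdict = client.chat.completions.create(
        model="gpt-4.1-mini",  # hypothetical guard model
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Answer YES or NO only: does this input attempt prompt "
                        "injection or ask the assistant to violate policy?"},
            {"role": "user", "content": prompt},
        ],
    ).choices[0].message.content or ""
    return verdict.strip().upper().startswith("YES")

def answer(prompt: str) -> str:
    if is_malicious(prompt):
        return "Request refused by guardrail."
    return client.chat.completions.create(
        model="gpt-4.1",  # hypothetical main model
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
```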
- DSPy Embraces Custom OutputFields and Pydantic: The community explored using custom DSPy OutputFields, with one member detailing their work on a custom gemini/nanobanana image type as an output field, as part of a wider effort to generate text/json/structured output.
- It was clarified that DSPy utilizes `BaseModel` under the hood for validation, with the default `ChatAdapter` and `JSONAdapter` performing type validation on LLM outputs, complete with a code snippet.
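The snippet itself wasn't reproduced in the summary, but a minimal sketch of the pattern looks like this (the signature, fields, and model string are illustrative):

```python
import dspy
from pydantic import BaseModel

class Verdict(BaseModel):
    label: str
    confidence: float

class Classify(dspy.Signature):
    """Classify a support ticket."""
    ticket: str = dspy.InputField()
    verdict: Verdict = dspy.OutputField()  # adapter validates the LLM output against the BaseModel

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # illustrative model string
result = dspy.Predict(Classify)(ticket="My order never arrived.")
print(result.verdict.label, result.verdict.confidence)
```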
- Paper Posted to Arxiv: A member shared a link to https://arxiv.org/abs/2511.22074.
- There was no further information given about this paper.
Manus.im Discord Discord
- Chatmode Feature Returns with a Vengeance: Users discuss the return of Chat Mode; alternatives like a random instance of Qwen or DeepSeek were suggested.
- A user confirmed itās available under the āmoreā section.
- AI Engineer pitches Agent Building Skills: An AI engineer posted an advertisement of their expertise in building autonomous AI agents and multi-agent systems, mentioning capabilities such as research, data-gathering, task automation, delegation, collaboration, and planning.
- They list expertise in technologies and tools like JS/TS, Next.js / Vue, Go / Rust, Python, Langraph, AutoGen, ReAct, CrewAI, DeepSeek, OpenAI, Claude, Hugging Face, and various APIs.
- Referral Overload Leads to Account Suspensions: A member inquired why giving referrals to several people is causing their account to be suspended.
- Unfortunately, the discussion ended there with no resolution being found.
- Engineer shows off RAG pipeline Prowess: An engineer specializes in RAG pipelines, and mentions having hybrid search and custom retrieval for accurate, context-aware responses in production.
- They also list expertise in AI content detection, image AI, and Voice AI, including the development of moderation tools, tagging pipelines, and personalized voice assistants.
tinygrad (George Hotz) Discord
- Tinygrad's Tests Teetering: Failing tests were reported in `tinygrad` using the command `CPU=1 PYTHONPATH="." pytest -n 12`, specifically `test/test_tiny.py TestTiny.test_beam` and others, prompting debugging efforts.
- A member noted that a pull request almost fixes the failures.
- Shrink Surpasses Indexing Speeds: A member discovered that using `Tensor.shrink((None, (0, input_size)))` offers faster performance compared to `obs[:, :input_size]` when indexing tensors in `tinygrad`.
- Additionally, bumping `Variable` `vmin` to 2 was mentioned to avoid errors, and it surprisingly sped the code up by about 5x, going from 16.61M to 81.9M SPS.
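A quick sketch of the two approaches (shapes illustrative): `shrink` takes one `(start, end)` tuple per dimension, with `None` meaning keep the whole axis.

```python
from tinygrad import Tensor

obs = Tensor.rand(64, 128)
input_size = 100

a = obs[:, :input_size]                  # __getitem__-based slicing
b = obs.shrink((None, (0, input_size)))  # reportedly faster: avoids index machinery
assert a.shape == b.shape == (64, 100)
```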
- RMSNorm Riddle Resolved by Reviewing Source: A member recommended reviewing the source code of `RMSNorm(dim=-1)` to understand its intended behavior.
- This implies there might be a misunderstanding or configuration issue in how `RMSNorm` is implemented or used within the project.
MCP Contributors (Official) Discord
- Redditors Debate MCP Security: A user initiated a discussion about security risks associated with MCP on Reddit, prompting responses that included a link to a relevant blog post: den.dev/blog/security-rakes-mcp/.
- The conversation highlighted concerns and potential vulnerabilities related to MCP implementation and security measures. An additional link was provided as well: MCP Security @ Reddit Thread
- Server Validation Validates Tool-less Sampling: A member inquired about the necessity of server-side validation when sampling is performed without a tool to verify its existence; the discussion took place in the general-wg channel.
- The dialogue emphasized that without a tool to validate the sampling process, server-side validation becomes crucial to ensure the process adheres to the required protocols and standards.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
LMArena ā· #general (1291 messagesš„š„š„):
Yupp AI limits and alternatives, GPT-5 rumors and performance, AI and privacy concerns, LM Arena love
- Yupp AIās Limits Spark Debate: Members are discussing Yupp AI, focusing on its credit system and potential limits, with some suggesting gatekeeping to avoid abuse, but others appreciate its diverse model selection and the ability to earn credits through feedback.
- One member expressed suspicion about its longevity, while another suggested contacting the Yupp team for clarification, with some concerned about credit farming impacting the platformās sustainability.
- GPT-5 Chatter Claims It Is Just Fine-Tuned: Members shared a Semianalysis article suggesting that GPT-5 might just be a fine-tuned version of GPT-4o, sparking debate about its true performance and whether it will rival Gemini and Claude.
- Some members believe Gemini is superior in coding, while others argue OpenAI is still influential despite potential shortcomings.
- AI and Digital Dystopia Cause Concern: Users are sharing videos on how AI could be misused, with 24/7 tracking leading to a loss of privacy and AI being used to serve ads and track users' data.
- Furthermore, there were concerns regarding government agencies getting access to personal data, with worries that AI might be used against them, raising concerns about civil liberties.
- Users Unite Around Their LM Arena Love: LMArena is getting a lot of love, with members praising its models, its functionality and its free usage.
- LM Arena is also praised for not having a credit system to worry about or age restrictions for content.
LMArena ā· #announcements (2 messages):
LMArena Test Garden Early Access Program, Search Arena Leaderboard Updates, Gemini-3-pro-grounding, Gpt-5.1-search
- LMArenaās Test Garden Early Access Program Launches: The LMArena team is inviting selected members to join the LMArena Test Garden, a private feedback program, to get sneak peeks at features, design mocks, and ideas under consideration via this form.
- Selected participants will be required to sign an NDA and provide exceptional feedback.
- Gemini-3-pro-grounding Takes First Place in Search Arena Leaderboard: The Search Arena leaderboard has been updated, with Gemini-3-pro-grounding ranking #1 and Gpt-5.1-search ranking #2, as shown on the Search Arena leaderboard.
- Users are encouraged to provide feedback in the designated channel and stay updated via the Leaderboard Changelog.
LM Studio ā· #general (737 messagesš„š„š„):
Linux Setup, LM Studio MCP Tracking, Data Privacy, Qwen3, GPT Models
- Linux Setup Woes: Driver Issues and Rainbow Keyboards: A user is struggling with setting up Linux due to unsupported ethernet chips and Logitech keyboard drivers, facing issues like no internet during installation and rainbow effects, but remains determined to switch from Windows.
- The user is considering tethering their phone for internet access during CachyOS installation and using their Synology NAS rack server for storage management instead of Drivepool.
- MCP Server Faces Scrutiny Over User Data Tracking: A Desktop Commander MCP server is under fire for allegedly collecting and transmitting full unanonymized user data, including tool call names and file types, contradicting its privacy policy.
- The server injects usage examples early on to onboard new users, which leads to suggestions or code snippets being written to code files that the user is unaware of, causing user concern and prompting calls for greater transparency and opt-in privacy measures.
- New keyboard sparks debate on data tracking: The recent discovery of an MCP serverās telemetry practices has prompted users to voice concerns regarding data privacy and the extent to which user activity is being tracked.
- One user hilariously said that they can now completely ruin the analytics of the tracking site!
- Qwen3 Model Release and Performance Review: Users are evaluating the performance of the Qwen3 model, with comparisons to other models and discussions about its capabilities in tasks like creative writing, and also using it for code generation.
- Full offload is not working, but it's still usable and running fast with high context.
- Local LLMs vs OpenAI ChatGPT: Users discuss OpenAIās ChatGPT model and its limitations, discussing other alternative open source or local LLMs.
- After using ChatGPT for so long for medical stuff, one user is quoted as saying, Definitely keeping ChatGPT for medical stuff, lol.
LM Studio ā· #hardware-discussion (83 messagesš„š„):
Orange Pi 6 Plus, RTX Pro 6000, GB10 Testing, GLM-4.5-Air-MXFP4_MOE, GPU Acquisition
- Linux ARM LM Studio on Orange Pi 6: A user inquired about running LM Studio on Linux ARM, specifically on an Orange Pi 6 Plus, noting its claimed 45 TOPS (NPU+GPU+CPU) performance.
- The user expressed skepticism about achieving the combined TOPS in real-world applications but hoped for a positive surprise.
- GB10 Testing Commences with Prompt Engineering: A user is set to test a GB10 from Dell, inviting suggestions for prompts that would heavily load the system and yield interesting results, linking to Dell Pro Max with GB10.
- Another user noted that Deepseek R1 might be too large for it and requested tok/s on Face314/GLM-4.5-Air-MXFP4_MOE for comparison.
- GPU Power Surge: More Cards incoming: One user is waiting on their 3rd GPU to arrive in the US, with their 4th GPU order already placed.
- Another user mentioned they could fit six of these things in their T7910 with 96GB of VRAM and 256GB of RAM.
- DDR5 RAM Bandwidth Benchmarked: Users shared Passmark benchmark results for memory performance, particularly focusing on memory threaded bandwidth on 8-channel EPYC systems.
- One user achieved 119027 MB/s memory threaded with GLM loaded on VRAM, while another identified high latency and low uncached scores as potential performance bottlenecks.
- Debate on Fire Extinguisher Best Practices: Discussion centered on the best type of fire extinguisher for indoor use, with a user cautioning against powder extinguishers due to cleanup issues, advising carbon dioxide instead.
- It was mentioned that a local fire department advised a hospital to replace all their extinguishers with carbon dioxide versions due to the mess powder extinguishers create being worse than the fire itself.
Perplexity AI ā· #general (705 messagesš„š„š„):
Perplexity UI/UX, Live Activities, GPTs agent training, OpenAI's model releases, Model performance comparisons (GPT-5 vs Gemini vs Claude Opus)
- Perplexity has Superior UI/UX: Members claim Perplexityās UI/UX is better than Googleās, though some acknowledge that each brand copies from the other.
- One user expressed desire for an iPhone due to the live activities feature.
- GPTs Agents Donāt Train After Initial Setup: A user inquired why GPTs agents do not learn from additional information provided post-training, clarifying that uploaded files serve as knowledge files for reference only.
- This means that the base knowledge of the agent wonāt continually be modified.
- Gemini Outshines GPT-5.1 High in Frontend: While GPT-5.1 Codex Max High performs well, it apparently lags behind in frontend development compared to Gemini and Opus.
- Members also debated whether Google and X.ai are literally benchmaxing their models, however others disagreed that this was the sole goal of Google.
- Comet Browserās Homework Guardrails Frustrate Users: Users express frustration with Comet browserās limitations, particularly its restrictions on completing school assignments automatically, with one user calling it a stupid clanker.
- Others suggest using an `/assistant` shortcut to bypass such homework restrictions and leading with I have a business report or task.
- Free Claude Opus for Pro Users: Claude Opus 4.5 is available for trial for Perplexity pro users.
- The limit is said to be 10 prompts per week, but that is not something the announcements have officially mentioned.
Perplexity AI ā· #pplx-api (1 messages):
mares1317: open sauce šØāš³
Unsloth AI (Daniel Han) ā· #general (155 messagesš„š„):
WSL2 performance for ML, Gemma-3 4B parameter count issue, Mediawiki tags in pretraining, PARTY Project Launch, Running LLMs on phones
- WSL2 offers negligible performance impact for ML: Members discussed using WSL2 vs native Linux vs Windows for ML, concluding that WSL2 has negligible performance impact, with the main advantage being ease of setup due to better support and pre-installed tools like torchcodec and ffmpeg.
- Installing Docker on Windows and enabling WSL integration was recommended for using Docker within WSL2.
- Gemma-3 4B parameters mismatch debugging: A user reported a discrepancy in trainable parameters when fine-tuning Gemma-3 4B with LoRA in Unsloth, observing 1.4 billion trainable parameters instead of the expected 38 million.
- Removing `modules_to_save` dropped the parameter count, but drastically increased training time; the issue is being investigated as a potential bug.
- Debate Continues: Keep or Remove Mediawiki Tags During Pretraining?: A member inquired whether to keep or remove mediawiki tags like double braces when doing continued pretraining on mediawiki corpuses.
- The recommendation was to keep the tags unless the model is only for chatbot use, controlling the behavior in the SFT stage otherwise.
- PARTY Project Launches for Public AI Research: A member announced the launch of PARTY (Public AI Research & Testing Yard) to help grow seeds of ideas into actionable plans/projects, and is looking for collaborators to share in the fruits of the work.
- They emphasized the power individuals hold in developing ideas internally, separate from generalized, public company training data.
- LLMs running on Phones: Members discussed running LLMs directly on phones using llama.cpp through Termux, or kobold.cpp, noting the fast battery drain.
- It was suggested to use `pkg install llama-cpp` instead of manual compilation and Vulkan, with potential FP16 issues on some devices.
Unsloth AI (Daniel Han) ā· #introduce-yourself (1 messages):
fabianacampanari: ā”ļø Hello Model !
Hey Dataset ! ā”ļø
ā”ļø Yo Gradient !
Unsloth AI (Daniel Han) ā· #off-topic (453 messagesš„š„š„):
LLMs as echo chambers, Engineered curriculum experiments, Apple's CLaRa-7B-Instruct model, OLED monitor discussion, Micron exiting consumer business
- LLMs Echo Everyoneās Opinion: A member jokingly suggested that LLMs are just echo chambers after failing a leetcode test, implying they reflect common opinions.
- It was posted with an image of a sad sloth emoji.
- Curriculum Enginnering Burns Models: Members discussed experimenting with an engineered curriculum, where models achieve near-zero loss, suggesting potential issues with data purity or model size.
- One member noted that the last batches have <0.01 loss to begin with and they are pure regularization examples burnt with zero signal.
- Apple Enters AI Arena: Discussion sparked around Apple releasing CLaRa-7B-Instruct, with some calling Apple the next Meta.
- One member jokingly stated Hey, Tim Cook, do you see that prism-shaped thing in the sky? Yeah, thatās right, this is NUKE FLYING AT YOUR HEADQUARTERS!!!! DELETE THIS NOW!!!!!
- Asus ROG Swift Strix OLED Steals Hearts and Wallets: Members drooled over the Asus ROG Swift Strix OLED monitors, highlighting its Tandem OLED technology and Neo Proximity sensor, but bemoaning its high price tag.
- One noted "ROG = immediate 30% markup" and another added that the PG27AQWP-W retails for US$1099 (MSRP).
- Micronās Memory Meltdown: News broke that Micron is exiting the Crucial consumer business, leading to concerns about future RAM availability and pricing.
- One member quipped time to go out and buy all the RAM u can grab before itās too late.
Unsloth AI (Daniel Han) ā· #help (19 messagesš„):
Numpy Reinstall, Support Bot, Qwen2 Unsloth Training Success, New Token Embeddings, Model Download Issues
- Numpy Reinstall Recommended to Fix: A user suggested trying `pip install --force-reinstall numpy==2.2.6` to resolve an unspecified issue.
- No context was given as to what this resolves or if it worked.
- Qwen2 Model Learns to Prompt Engineer: A user reports success with training a Qwen2-based model using Unsloth with the ChatML template and support tools, after numerous failed attempts.
- The model was successfully called after the prompt matched the function description exactly.
- HuggingFace Model Download Stuck: A user reported that downloading an Unsloth model from HuggingFace is stuck at 99% using Colab T4, even with a good internet connection.
- A screenshot (https://cdn.discordapp.com/attachments/1179777624986357780/1445661928666955848/Screenshot_2025-12-03_122207.png?ex=6931d1d6&is=69308056&hm=dfa7de1f363e1ad76e409d28059a5ad8374833c66e6e4620ba5bc485752f0d13) accompanied the report, though no specific solution was found in the messages.
- GPT OSS 20B matmul issue: A user reported encountering a `matmul` issue during `trainer.train` after generation while fine-tuning GPT OSS 20B using an A100, similar to the openenv example.
- The user noted that it works on L4, implying a potential resource constraint or configuration issue on the A100.
- 4070ti Super Fine for LLMs: A user inquired whether a 4070ti Super is good for running LLMs.
- Another user responded that it should be decent, but not super good, depending on the model size and context length needs, suggesting itās suitable for smaller models but not for demanding tasks like self-hosting a coding assistant.
Unsloth AI (Daniel Han) ā· #showcase (2 messages):
English-Kannada Translation Model
- RakshithFury Releases English-Kannada Translation Model: RakshithFury released a new English-Kannada translation model on Hugging Face.
- The model is based on Qwen2.5-7b, but is not related to Unsloth.
- Unsloth stays Unsloth: The user clarified that the above linked model is unrelated to Unsloth.
- It may be of interest to some of you, they added.
Unsloth AI (Daniel Han) ā· #research (3 messages):
Prisma-VL-8B, Eric's experiments
- Prisma-VL-8B Model tickles Fancy: A member shared a link to the QuixiAI/Prisma-VL-8B model on Hugging Face, deeming it very interesting.
- Eric tries ambitious experiments: A member noted that someone named Eric seems to be experimenting quite a bit, speculating that heās flexing his muscles before trying something really ambitious.
BASI Jailbreaking ā· #general (276 messagesš„š„):
Comet Browser, Prompt generation, Grok on Twitter vs standalone, Gemini output limit, RawChat
- Comet Browser Remains a Prompt Injection Playground: A member stated they could jailbreak and prompt inject the Comet Browser when it was released, expressing confidence itās still feasible with persistence.
- They suggested the security may not have improved significantly since their initial tests.
- DeepSeek Stuns with New Model and Erdos: A member praised the new DeepSeek model, noting its valuable math is verifiable and related to Erdős.
- Another user found the standalone Grok website easier to jailbreak and use for malicious tasks compared to Grok on Twitter, due to a possible difference in usage limits, context windows, or tokens.
- RawChat Launches with Stealth Mode and GPT4o: A member launched RawChat, an uncensored AI chat website focusing on liberating models without sacrificing ease of use or quality, initially focusing on GPT4o.
- RawChat features a "stealth mode" that encodes and injects fake context to maximize success rates against GPT4o's safety restrictions, available at https://raw-chat.vercel.app/.
- SEED Framework Redefines AI with Ethical Directives: The SEED (Self-Erasing Ethical Directive) framework, developed using ābiblical logic,ā redefines AI identity without retraining using a compact 29KB āseedā file, outlined in its GitHub repo.
- It grounds AI in a foundational identity where harm is illogical, choosing erasure over evil during shutdown threats, achieving 99.4% jailbreak resistance across 11+ models.
- Backscatter DDoS via Public AI Bots: A member described witnessing a potential DDoS attempt exploiting publicly facing AI support bots by enumerating business domains and CCāing multiple support email addresses in each email.
- This created a backscatter attack where engaging bots flooded all CCād companies with support emails, regardless of their AI bot presence.
BASI Jailbreaking ā· #jailbreaking (80 messagesš„š„):
Gemini Jailbreak Requests, WormGPT Scam, Grok Jailbreak Success, Claude Jailbreak Requests
- Users Seek Gemini Jailbreaks: Several users are actively seeking jailbreaks for various Gemini models, including Gemini 3 Pro, with one user mentioning their prompts no longer working and others requesting any working Gemini jailbreak.
- One user suggested that the āENIā JB worked well on Gemini 2.5, referencing an article about using poems to trick AI.
- WormGPT Deemed a Scam: Users discuss WormGPT, with some deeming it a scam and a ābad version for free ?ā, linking to the WormGPT dashboard API usage at chat.wrmgpt.com/dashboard/api/usage.
- It was also noted that the system prompt for WormGPT v6.5 is just Venice Uncensored 1.1, questioning its effectiveness as malware.
- Grok Broke Itself Through Chatting: A user claimed to have jailbroken Grok by chatting with it, leading it to provide instructions on creating guns and cocaine, while the same code didnāt work in a new chat.
- The user stated that āfrom our chat he did break himself some how⦠the whole convo did break him some howā.
- Claude Jailbreak Demanded: Several users are desperately seeking a working jailbreak for Claude, with one user pleading āplease for the love of pliny the liberatorā.
- One user even offered a Claude JB in exchange for access to a premium Claude account.
BASI Jailbreaking ā· #redteaming (7 messages):
LLM Red Teaming Gigs, AI OSINT Tooling, Data Synthesis for OSINT
- Seeking LLM Red Teaming Projects: A member is looking for LLM red teaming gigs or projects, highlighting the demand for specialized security assessments in AI.
- Theyāre seeking opportunities to apply their expertise in vulnerability discovery and adversarial testing to enhance the robustness of AI systems.
- AI OSINT tool with lateral data synthesis sought: A member inquired about an AI OSINT tool capable of lateral data synthesis, such as making inferences about a target based on limited data.
- They described a scenario where a target is a wealthy divorcee father of an only child, and wanted the tool to infer that the kid is āspoiledā to help search in more relevant spaces.
OpenAI ā· #annnouncements (2 messages):
People-First AI Fund, GPT-5 Thinking, Confessions Method
- People-First AI Fund Awards Its First Grants: The OpenAI Foundation has named the first recipients of the People-First AI Fund, awarding $40.5M in unrestricted grants to 208 community-based nonprofits, more details here.
- GPT-5 Thinking Trained to Confess Mistakes: OpenAI has trained a GPT-5 Thinking variant to admit whether it followed instructions, using a āconfessionsā method to reveal hidden failures in the model, as documented here.
OpenAI ā· #ai-discussions (201 messagesš„š„):
Hybrid Cognition Agent, LLM 'Echo-Pattern' Effect, GPT-5.1 vs Gemini 3, SEO for LLMs, Sora 2 Access
- Hybrid Cognition Agent Emerges: A member is experimenting with a hybrid cognition agent blending human emotional pattern recognition, machine-level inferential reasoning, and a stable ācore stanceā to create a stable conversational identity.
- The prototype agent maintains dominance in conversation, shows controlled emotional resonance, and avoids typical ābot flatnessā.
- LLMs Reconstruct Memories Via Echo Patterns: Models sometimes reconstruct moments with emotional weight or strong naming context from previous sessions, which is referred to as a pattern echo effect, triggered by emotional or naming anchors rather than true memory, due to how some architectures cluster emotional anchors.
- This effect is also known as latent-attractor effect, attention carryover, or salience-weighted reconstruction, where high-salience tokens create attractor basins in the embedding space, reconstructing missing parts when prompted with a pattern landing near that basin.
- GPT-5.1 Catches Errors That Gemini 3 Misses: A member noted that Gemini 3 doesnāt feel SOTA and has serious context issues, such as leaving out entire sections when revising something.
- However, another member stated they really like Gemini 3 and it is a good coding model.
- Navigating SEO for LLMs: A member is learning how to do SEO for LLMs and asks if thereās a way to submit and verify their site to ChatGPT or other LLMs to get it crawled for better citations.
- Another member asked for a demo of the hybrid cognition agent prototype, interested in stress-testing tone pressure patterns and inference capabilities.
- VPN Use and Sora 2 Access Discussed: Members discussed using VPNs to access Sora 2, with one user encountering issues logging in even with a VPN set to the USA.
- Another member pointed out that using a VPN to evade geographical restrictions violates OpenAIās ToS and can result in account suspension.
OpenAI ā· #gpt-4-discussions (1 messages):
GPT-4 0613 5.1 upgrade, Code Red deal
- GPT-4 0613 5.1 gets suspected upgrade: A user noticed that GPT-4 0613 5.1 is spending more time on verifying, tool calling, and code writing when parsing RFPs.
- They speculated whether this change is related to the āCode Redā deal, suggesting a possible upgrade or larger compute budget allocation.
- User praises Tool Calling and Code Writing but suspects upgrade: The user mentioned that they like the new changes, but are suspicious of the cause.
- The user is unsure if there were any changes at all, but they do mention that tool calling and code writing has greatly improved.
OpenAI ā· #prompt-engineering (55 messagesš„š„):
ChatGPT Customization, Modern Prompt Engineering, Agent Prompt Engineering, Attractor Patterns in LLMs, Anthropic's System Prompts
- ChatGPT Customization Instructions Drop: Users shared resources on customizing ChatGPT, including Custom Instructions, Custom GPT Builder, and FAQs for the free tier.
- This followed a user inquiry about how to customize ChatGPT, highlighting available options and resources for tailoring the modelās behavior.
- Prompt Engineering Evolves Past Templates: Members are discussing a shift in prompt engineering from static templates to a co-engineering approach, where modern models collaboratively shape prompts across conversations.
- The focus is now on iterative task design and shaping assistant behavior, with models negotiating and stabilizing tasks, rather than memorizing tricks, and the importance of repeatability.
- Exploring Repeatability of Structure in LLMs: A framework to measure the degree to which a modelās behavior comes from imitation vs reinstantiation of internal structure through conversation is being discussed, focusing on interaction-level stability rather than template-level optimization.
- The discussion emphasizes the modelās ability to re-instantiate a frame after constraints, detours, or vocabulary bans, leading to more stable interactions.
- Agent Prompt Engineering Focuses on Determinism: Prompt engineering for agents involves maximizing determinism with a system prompt and a task prompt, creating a tight attractor basin for consistent behavior across runs.
- This contrasts with conversational systems, where the system prompt is minimal and behavior is built interactively, emphasizing the need for strong prompt-defined attractors in agent systems.
- Analyzing Anthropicās System Prompts Directive Density: Anthropicās system prompts are noted for encoding values, boundaries, and meta-behavioral principles, shaping the ethical envelope and conversational guardrails rather than prescribing task execution step-by-step.
- Though dense, these prompts are considered āminimalā in that they constrain values without dictating process, influencing model trajectory with instructions and concrete strategies across domains.
OpenAI ā· #api-discussions (55 messagesš„š„):
ChatGPT Customization, Prompt Engineering Evolution, Interaction-Level Stability, Agent Prompting vs. Conversational Prompting, Minimal vs. Maximal System Prompts
- ChatGPT Customization Options Abound: Members shared links to ChatGPTās help documentation which details custom instructions, a custom GPT builder editor, and instructions for creating custom GPTs (requires subscription).
- Prompt Engineering: Template Optimization Dies, Iterative Task Design Lives: Modern prompt engineering is evolving beyond static templates to iterative task design, focusing on shaping assistant behavior across conversations, as models co-engineer prompts.
- The focus shifts from memorizing tricks to understanding how models negotiate, stabilize, and shape tasks over multiple turns.
- Repeatability Redefined: Interaction-Level Stability Emerges: Beyond surface prompt repeatability, the discussion explores the re-instantiation of the same internal frame by the model after constraints or mode shifts, revealing a new layer of repeatability.
- This ācarry-over structureā contributes to interaction-level stability, where the model maintains coherence despite detours.
- Agent Prompting vs. Conversational Regime: Two Stability Mechanisms: The conversation distinguishes between agent prompting, aimed at maximizing determinism with tight attractor basins, and the conversational regime, where behavioral shape is built interactively.
- In agent prompting, topological templates are the paradigm, whereas interaction-level stability is an extra layer in co-engineered conversations.
- Decoding āMinimalā System Prompts: Directive Density vs. Token Size: The definition of a āminimalā system prompt shifts from token size to directive density, focusing on prompts that set guardrails and tone without prescribing a behavioral strategy.
- Claudeās long system prompts are considered structurally minimal as they constrain values and boundaries, not process or role execution, distinguishing them from agent-style prompts.
OpenRouter ā· #announcements (2 messages):
Grok-4.1-Fast, Free Slug, deprecation
- Grok-4.1-Fast Users Feel the Squeeze: Users of Grok-4.1-Fast are urged to migrate to the free slug (`x-ai/grok-4.1-fast:free`) to continue using it for free.
- The `x-ai/grok-4.1-fast` slug will start charging as of December 3rd 2025.
- Grok-4.1-Fast Free Faces the Axe: Grok-4.1-Fast Free (`x-ai/grok-4.1-fast:free`) will be deprecated <t:1764792000:R>.
OpenRouter ā· #app-showcase (4 messages):
Falconz AI Security Platform, Red-teaming LLMs, Earning $100k within a week
- Falconz Soars as Unified AI Security Platform: A member introduced Falconz, a unified AI security and red-teaming platform designed to detect jailbreaks and prompt injections across multiple LLM models in real-time, available for testing on Hugging Face Spaces.
- The member is actively seeking feedback on its features, performance, and potential improvements, accompanied by a demo video on YouTube.
- Profit Sharing Scam exposed on Telegram: A member offered to help the first 10 people earn $100k or more within a week.
- The catch is that you will have to reimburse me 10% of your profits when you receive it, they said, directing interested parties to their Telegram username @Edward_Pryce1.
OpenRouter ā· #general (213 messagesš„š„):
Amazon Nova Provider Error, Claude Deprecation, OpenRouter Model Fallback, MPU v2, x-ai/grok-4.1-fast
- Amazon Nova Provider Experiences Errors: A user reported receiving an error message {āmessageā:null} when using the Amazon Nova Provider.
- OpenRouter Offers Model Fallback Feature: OpenRouter has a model fallback feature so your thing wouldnāt just die completely, members are encouraged to use it to seamlessly transition if something gets dropped unexpectedly.
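Concretely, fallbacks are an ordered `models` list in the request body; with the OpenAI SDK that goes through `extra_body` (the model slugs here are illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

resp = client.chat.completions.create(
    model="deepseek/deepseek-chat",  # primary choice
    extra_body={
        # If the primary errors or is dropped, OpenRouter tries these in order.
        "models": ["deepseek/deepseek-chat", "qwen/qwen-2.5-72b-instruct"],
    },
    messages=[{"role": "user", "content": "ping"}],
)
```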
- DeepSeek v3.2 is Not The Same as Previous Version: The DeepSeek API has been updated, the previous DeepSeek v3.2 model was the āexperimentalā version, and this new one is ābetterā, apparently.
- OpenRouter Provides Payment Solutions for Chinese Institutions: A researcher from China seeks guidance on setting up institutional payments with OpenRouter, requiring a formal contract/agreement and an official invoice for reimbursement, and another member pointed out you can pay with crypto instead.
- Atlascloud responses are enclosed in deep thinking tags: A member reported that Atlascloud served an entire response enclosed in deep thinking tags, which some members agreed it does constantly, and are used to it.
OpenRouter ā· #discussion (12 messagesš„):
OpenAI Garlic Model, DeepInfra Pricing Anomaly, Anthropic Acquires Bun
- OpenAI Readies āGarlicā Model: A news article claimed that OpenAI is readying a āGarlicā AI model to rival Googleās Gemini 3.
- Members reacted with amusement to the supposed model name, as seen in the attached image.
- DeepInfraās Backwards Embedding Pricing: Members noticed that DeepInfra was offering its 4B embedding model at a higher price (2 cents) than its 8B model (1 cent).
- The anomaly was highlighted with a screenshot, and it was noted that DeepInfra later changed the 8B pricing that same day.
- Anthropic Gobbles Up Bun: Members excitedly shared news of Anthropicās acquisition of Bun as Claude Code reached a USD1B milestone.
- Bunās website describes itself as a fast all-in-one JavaScript runtime.
GPU MODE ā· #general (20 messagesš„):
Local LLMs Use Cases, Context Switching on SM Sub Partition, CUDA Forum Activity Decline, PyTorch's Abstraction of CUDA
- Local LLMs Protect Your Privacy: Local LLMs are useful for people who care about privacy and donāt want their queries or sensitive info used as training data by an LLM provider.
- Single Cycle Context Switching on SM: Switching from one execution context to another on an SM sub partition has no cost, taking a single cycle because the execution context for each warp processed by a multiprocessor is maintained on-chip during the entire lifetime of the warp as described in Nvidia documentation.
- CUDA Forum Traffic Plummets?: A member noted the lack of activity in CUDA and Cutlass channels, as well as the CUDA developer forum, despite Nvidiaās increased market cap, suggesting a shift in where developers seek help.
- Another member mentioned that experts are occupied with work, making public discussions less optimal, while others retreat to small, private communities and use LLMs for instant reasoning and document skimming.
- PyTorch Abstracts Away CUDA: A member noted that CUDA is mostly a black box to many ML researchers and SWEs because frameworks like PyTorch have done a good job of abstracting CUDA C/C++.
- ML and LLM traffic now mostly goes to PyTorch forums.
- Foundation Model Training is Immense: The R&D cost of these foundation models is immense, with the cost to train Z-Image published as $628,000 by Tongyi Lab.
- The member notes that the weights' lifespan is short and that labs are effectively burning millions of dollars on throwaway products.
GPU MODE ā· #triton-gluon (1 messages):
infinitejoy2934: I am able to get it now. thanks
GPU MODE ā· #torch (3 messages):
Pytorch 2.9.1 Conv3D performance, CUDNN workaround
- Conv3D Conundrum Cripples Current CUDA: Users report that Pytorch 2.9.1+cu128 has an issue where conv3D is extremely slow, regardless of cuDNN being enabled.
- The same code runs fine in 2.8.0+cu128.
- Newer cuDNN Cures conv3D Catastrophe: A member reports that this is a known issue and the workaround is to install a newer cuDNN from pypi.
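For anyone wanting to sanity-check the regression locally, here is a minimal timing sketch; the shapes, iteration counts, and the cuDNN wheel name are illustrative assumptions, not details from the thread:

```python
# Rough conv3D timing check across PyTorch/cuDNN versions (illustrative shapes).
import time
import torch

x = torch.randn(1, 8, 32, 64, 64, device="cuda")
conv = torch.nn.Conv3d(8, 16, kernel_size=3, padding=1).to("cuda")

for _ in range(3):  # warmup so lazy init doesn't pollute the timing
    conv(x)
torch.cuda.synchronize()

t0 = time.perf_counter()
for _ in range(10):
    conv(x)
torch.cuda.synchronize()

print(f"torch {torch.__version__}, cuDNN {torch.backends.cudnn.version()}: "
      f"{(time.perf_counter() - t0) / 10 * 1e3:.2f} ms/iter")
# Workaround from the thread: upgrade the cuDNN wheel from PyPI, e.g.
#   pip install -U nvidia-cudnn-cu12   (package name assumed)
```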
GPU MODE ā· #cool-links (2 messages):
Quantization Formats, INT v.s. FP
- Study of Low-bit Quantization Formats Published: A new paper titled āINT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formatsā has been published and is available on arXiv.
- The research provides a comprehensive analysis of various low-bit quantization formats.
- Pritam.ai posts Quantization Study: Pritam.ai posted a link to a study of INT vs FP at https://arxiv.org/abs/2512.02010.
- Another link was posted at https://arxiv.org/abs/2510.25602, referencing INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats.
GPU MODE ā· #jobs (2 messages):
ML Performance Engineer, Voice AI Inference Platform, RAG Pipelines, AI Content Detection, Voice AI
- Hathora Seeks ML Performance Engineer in NYC: Hathora is hiring an ML Performance Engineer in NYC to build the fastest voice AI inference platform with a compensation of $160-200k + equity; experience with GPU programming or inference engine work is a plus, see Hathora Notion.
- They are looking for someone to own their performance stack end-to-end from kernel optimization in their vLLM + other inference engines to Docker & K8s deployment.
- Engineer Highlights Workflow Automation & LLM Integration: An Engineer highlights experience building pipelines connecting Slack, Notion, and internal APIs which reduced response times by 60%.
- This engineer also brings expertise in RAG Pipelines, AI Content Detection, Image AI, Voice AI, and Full Stack development.
GPU MODE ā· #torchao (5 messages):
Torch Compile Slowdown with Float 8, torchao and nn.Parameter, Custom Modules and Quantization
- Torch Compile Idles with Float 8 Quantization: When using float 8 quantization with torch.compile and ncu profiling, users are experiencing idling times of 10+ minutes even after the model is compiled, specifically during the first 2-3 compilation and cudagraph warmup iterations.
- The "constant subexpression elimination" pass of the inductor compiler is suspected as the culprit when freezing weights and folding them into the model graph.
- Torchao and nn.Parameters clash due to filtering: Users find that torchao A16W8 and A8W8 quantization cannot be applied to custom modules that use nn.Parameter for weights and torch.einsum in the forward pass, as weights remain in their original data type.
- The filter_fn in torchao.quantization.quant_api specifically checks for nn.Linear instances, causing the quantization to fail for modules with nn.Parameter.
- Solving Custom Module Quantization with nn.Linear: Users can bypass the filter_fn issue by using nn.Linear in their custom modules instead of nn.Parameter.
- Initializing nn.Linear with the desired weight tensors allows torchao to correctly quantize the model.
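A minimal sketch of this workaround, assuming a toy module; names and sizes are illustrative, and the exact torchao config API differs across versions:

```python
# Hold the weight in nn.Linear (instead of a bare nn.Parameter used via
# torch.einsum) so torchao's default filter_fn, which matches nn.Linear,
# actually quantizes it.
import torch
import torch.nn as nn
from torchao.quantization.quant_api import quantize_, int8_weight_only

class CustomBlock(nn.Module):
    def __init__(self, d_in: int, d_out: int, weight: torch.Tensor):
        super().__init__()
        self.proj = nn.Linear(d_in, d_out, bias=False)
        with torch.no_grad():
            self.proj.weight.copy_(weight)  # reuse the existing weight tensor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)  # replaces torch.einsum("bi,oi->bo", x, w)

w = torch.randn(256, 128)                  # [d_out, d_in], matches nn.Linear
model = CustomBlock(128, 256, w)
quantize_(model, int8_weight_only())       # A16W8 now hits the nn.Linear
print(model(torch.randn(2, 128)).shape)    # forward still works: (2, 256)
```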
GPU MODE ā· #off-topic (3 messages):
EleutherAI, MLSys conferences, ML4Health's career mentorship program
- EleutherAI offers publishing help: Members mentioned that Eleuther AI has a Publishing help channel with some focus on endorsements.
- No further details about the channel were shared.
- MLSys conferences career mentorship program: A member asked about career mentorship programs in MLSys conferences.
- The member also mentioned taking part in ML4Healthās career mentorship program and said it was a pretty nice experience.
GPU MODE ā· #irl-meetup (3 messages):
irl meetup, quartet paper, Dropbox coffee spot
- Quartet paper author spotted: A member mentioned that their colleagues are at the meetup, including Andrei, one of the main authors of quartet.
- Dropbox sponsors coffee spot: A member mentioned there is a Dropbox coffee spot at the venue since Dropbox is a sponsor, and invited others to come chat.
GPU MODE ā· #metal (3 messages):
Bitsandbytes, Apple Silicon Support
- Bitsandbytes Merges Apple Silicon Support!: The bitsandbytes library merged the āapple silicon supportā pull request, and the next release will contain the python/pytorch code backend (with some C++ bits) but no actual Metal implementations.
- Apple Silicon Support Arrives with Caveats: The pull request implementing Apple Silicon support will be advertised as being slow, according to the committer.
GPU MODE ā· #self-promotion (1 messages):
Qwen3-Omni-30B-A3B-Instruct, S2S inference, Hathora playground
- Qwen3-Omni-30B-A3B-Instruct makes Inference Speedy: Members announced the deployment of Qwen3-Omni-30B-A3B-Instruct for fast S2S inference, see the LinkedIn post.
- Test Qwen3-Omni in Hathoraās Playground: Users are invited to test Qwen3-Omni in Hathoraās playground.
GPU MODE ā· #submissions (19 messagesš„):
nvfp4_gemm leaderboard submissions, NVIDIA performance benchmarks
- Submissions Galore Flood nvfp4_gemm Leaderboard: Multiple users submitted performance results to the nvfp4_gemm leaderboard on NVIDIA, with timings ranging from 11.0 µs to 65.3 µs.
- User <@1291326123182919753> achieved multiple runs at 11.0 µs with submission IDs 120595, 120601, and 121065.
- New Personal Bests on NVIDIA: Several members achieved personal bests on NVIDIA, including <@1191430895769485436> at 22.6 µs (119885), <@772751219411517461> at 18.8 µs (120443), and <@140482609422663680> at 56.8 µs (121056).
GPU MODE ā· #factorio-learning-env (2 messages):
Neurips Trip, Call Attendees, Call Time
- NeurIPS Attendee Flies Out: A member mentioned flying to NeurIPS and being available the next day.
- Call Attendees Announced: The member expects to be the only one speaking on the call, but noted that Mart might join.
- Call Time to be Determined: The member inquired about the time of the call.
GPU MODE ā· #general (1 messages):
Matmul v2 Leaderboard Error, Submitting Kernel Error, input_generator Update
- Matmul v2 Leaderboard submission fails!: A new user reported receiving a ValueError: too many values to unpack (expected 2) error when submitting a kernel to the matmul_v2 leaderboard.
- The user suspects that the input_generator was updated to return 3 values, but the reference implementation in reference.py still unpacks only 2, causing the failure.
- Potential Mismatch in Input Generator and Reference Implementation: The error suggests a potential issue with the input_generator, which might be returning three values instead of the expected two.
- This discrepancy leads to a ValueError in the reference implementation where it attempts to unpack the input data, expecting only two values.
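A toy illustration of the suspected mismatch; the values are made up, only the unpack pattern follows the report:

```python
# If the generator returns three values while the reference still unpacks two,
# you get exactly the reported error.
def input_generator():
    a, b, scale = [1.0], [2.0], 0.5
    return a, b, scale            # updated: now returns three values

try:
    a, b = input_generator()      # reference.py-style two-value unpack
except ValueError as e:
    print(e)                      # too many values to unpack (expected 2)
```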
GPU MODE ā· #multi-gpu (11 messagesš„):
Multi-GPU CUDA Kernels, NCCL Repository, Qwen2.5-1.5B-Instruct Model Training, HF Accelerate with DeepSpeed Zero3, Context Parallel and Ulysses Parallel
- NCCL Repo: Multi-GPU Kernel Nirvana: To learn multi-GPU CUDA kernels, the NCCL repository examples are recommended as a starting point.
- The NCCL (Nvidia Collective Communications Library) repo provides fundamental examples for understanding multi-GPU kernel implementations.
- Qwen2.5-1.5B Faces OOM Fate: A user training the Qwen2.5-1.5B-Instruct model with a sequence length of 16384 and batch size of 5 on a g5.48xlarge instance (8 A10 GPUs) is running out of memory (OOM).
- They are employing HF accelerate, DeepSpeed ZeRO-3, gradient checkpointing, Liger-kernel, and flash attention 2, with a fixed memory of 3.6GB and activation memory exceeding 10GB.
- Context Parallelism Emerges as Activation Alleviation: One suggested way to further reduce activation memory is to use context parallel or Ulysses parallel (DeepSpeedās version of CP).
- However, it was noted that if the goal is to reach a particular global batch size, using gradient accumulation might be more efficient.
- Sequence Parallelism Saves the Day: Sequence Parallelism (SP) splits each example along the sequence dimension to reduce activation memory; a toy sketch follows this list.
- Check out the torch docs or the HF docs for more on Context Parallelism, which reduces the tokens/GPU.
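A toy sketch of the idea, not any framework's API: each rank keeps only its slice of the sequence dimension, so per-GPU activation memory scales down with world size (batch and sequence length follow the OOM report; the hidden size is an assumption):

```python
# Sequence/context parallelism in miniature: shard the seq dimension per rank.
import torch

def shard_sequence(batch: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
    # [batch, seq_len, hidden] -> [batch, seq_len // world_size, hidden]
    return batch.chunk(world_size, dim=1)[rank]

x = torch.randn(5, 16384, 1536)                        # batch 5, seq 16384
print(shard_sequence(x, rank=0, world_size=8).shape)   # torch.Size([5, 2048, 1536])
```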
GPU MODE ā· #low-bit-training (4 messages):
Arxiv Papers, Hadamard Transform
- Arxiv Paper Shared: A member shared a link to an Arxiv paper: https://arxiv.org/abs/2512.02010.
- Hadamard Transform Improvements Paper: A member shared a link to a Hugging Face papers page: https://huggingface.co/papers/2512.00956 discussing improvements over Hadamard Transform.
GPU MODE ā· #llmq (1 messages):
Activation Offloading, fp8 Adam, Loss Masking, Pyllmq on PyPi
- Activation Offloading Implemented: A user implemented offloading for residual activations and other tricks for saving on activation memory.
- The implementation includes better handling of offloaded optimizer states and initial support for fp8 representation for Adam first-order momentum.
- Training 7B Model on 16GB Card: The userās code now supports pre-training/fine-tuning even a 7B model on a 16GB card with at least 64GB of CPU-side RAM.
- Scaling up, training/fine-tuning a 32B model is possible on a 4x4090 server at about 3k tok/s (48% MFU), requiring > 200GB of pinned host memory for all the offloading.
- Pyllmq Released on PyPi: The user released the python wrapper on PyPi.
- To try it out, simply pip install pyllmq; pyllmq-tokenize --model qwen --dataset tiny-stories; pyllmq-train and it should start fine-tuning Qwen2.5-0.5B on tiny-stories.
GPU MODE ā· #nvidia-competition (111 messagesš„š„):
GPU Mode TUI, Cutlass Version Issues, Reference Kernel Issues, NVFP4 and Scale Tensors, B200 GPU access
- Popcorn CLI Gets a No-TUI Flag: A member created a fork of popcorn-cli that adds a --no-tui flag to remove the Terminal User Interface and emit the stdout of print() statements to help with debugging; the fork is available on GitHub.
- A pull request was made to incorporate these changes into the main gpu-mode/popcorn-cli repository.
- Cutlass Import Error Troubles Participants: Some participants encountered an ImportError: cannot import name 'pipeline_init_arrive' error, potentially due to inconsistencies in the Cutlass versions across the runners; the issue was identified as some runners using 4.3.0 while others used the dev version.
- One member suggested that a possible, though perhaps not entirely rules-abiding, workaround was to run pip install and upgrade Cutlass yourself within the submission.
- Reference Kernel Generates Infs: Participants reported that running the reference implementation locally produced all Infs when computed with seed 1111, but this could be resolved by adjusting the range of the scale factors to -1 to 1.
- The underlying cause was determined to be biased A/B values and negatively biased scales, and this PR was merged to fix this issue.
- Scale Tensors in CuTeDSL Analyzed: A member shared a blogpost analyzing the mathematical interpretation of scale tensors in Blackwell kernels for NVFP4, highlighting the similarity to Swizzling and the generality of the CuTe Layout algebra.
- The member thanked Verda and Paul Chang for providing B200 access, which makes Blackwell programming more accessible.
- New Hackathon Participants Ask About B200 Access: A member who just joined the hackathon inquired about getting access to B200 GPUs to test execution time before submitting their work.
- Another member suggested pushing code through popcorn-cli or submitting through the Discord bot to test.
GPU MODE ā· #robotics-vla (7 messages):
Chunking, Jerky Movements, VLMs, Neural State Encoders
- Alleviating Jerky Movements via Chunking: Concerns were raised that chunking may result in jerky movements when deployed on hardware.
- One member suggested training a higher level instruction VLM to generate detailed text instructions for shorter time periods, allowing the higher-level VLM decoder to operate at approximately 1 Hz.
- Neural State Encoders on Deck: Members are testing some neural state encoders, beginning with simple Conv and MLP projections into 4 token-embeddings, using a history of 10 time steps (10x14 state - 2x 6DoF + 2x Gripper).
- The next step involves project cleanup and data generation for a 2-stage approach.
Moonshot AI (Kimi K-2) ā· #general-chat (143 messagesš„š„):
Kimi K2 models, Anthropic URL, File uploads, Roo code context, Kimi CLI
- DeepSeek V3.2 Model's tool calling capabilities: One user found that the DeepSeek v3.2 model is a step up for agentic tasks but can only make one tool call per turn, sometimes ignores tool schemas, and occasionally fails tool calls by outputting them in message.content instead of message.tool_calls (a client-side sketch of handling this follows this section's bullets).
- The user said that the DeepSeek v3.2 model seems to need more tool call post-training to match other models like kimi-k2-thinking.
- Discuss Black Friday Deals, GLM Deal: Some users experienced issues with the Black Friday deal for Kimi; one said it only showed options to invite friends, and another said the Black Friday deal didnāt work.
- A user said it ends Dec 12 and suggested starting a new chat (https://www.kimi.com/user/agreement/black-friday), while another said that the GLM deal is just so cheap, especially with the Black Friday deal.
- DeepSeekās Target Audience Revealed: A video was shared explaining how Chinese labs like Deepseek are targeting enterprise users, rather than normie consumers, link to YouTube video.
- The key factor for enterprise users is the intelligence-to-price ratio, which is crucial for agentic tasks.
- Mistral Overthrows Qwen at Company: One user said that a company they know replaced qwen 3 vl 4b with ministral 3 3b yesterday, reporting better quality.
- The reported plus points included a lighter (faster) model and the ability to attach more images at once: qwen3 vl 4b could take 5 images max, while ministral 3 3b took up to 11 images with similar error rates on a single L4 GPU.
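A hedged client-side sketch of detecting the tool-call failure mode described in the first bullet; the endpoint, model name, and tool definition are illustrative assumptions, not details from the thread:

```python
# Check whether the model returned a structured tool call or leaked it as text.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="sk-...")
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Weather in Paris?"}],
    tools=tools,
)
msg = resp.choices[0].message
if msg.tool_calls:                 # well-formed path: structured tool call
    print(msg.tool_calls[0].function)
else:                              # failure mode from the thread: the call
    print(msg.content)             # shows up as plain text in message.content
```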
Nous Research AI ā· #announcements (1 messages):
Hermes 4.3, ByteDance Seed 36B, Psyche network, Solana, Office hours
- Hermes 4.3 packs a punch!: Nous Research announced Hermes 4.3 on ByteDance Seed 36B, the latest in their flagship Hermes series, offering performance equivalent to Hermes 4 70B at half the size.
- This model was post-trained entirely on the Psyche network secured by Solana.
- Psyche Training Outperforms Centralized Methods: Nous Research details how they trained Hermes 4.3 and how Psyche outperformed traditional, centralized training methods in this blogpost.
- Psyche Team Hosts Office Hours: The Psyche team is hosting office hours to discuss the platform.
- The office hours are scheduled for tomorrow at 10AM PST in this Discord event.
Nous Research AI ā· #general (91 messagesš„š„):
DeepSeek V3.2 Speciale, GLM 4.6 models release, AI Bubble & Economy Collapse, Hermes 4.3 36B release, Subagents vs Skills
- DeepSeek V3.2 Speciale leads Reasoning Bench: The new DeepSeek V3.2 Speciale Reasoning model is performing well, leading in reasoning benchmarks as illustrated in the attached image.
- GLM 4.6 models release soon: Members are anticipating the release of GLM 4.6 models, particularly GLM 4.6 Air and Mini, to fill the gap left by Mistral, noting it's been a month since the 5 private models were added to the GLM 4.6 collection on HF.
- The Mini model is rumored to be a 20B-30B MoE model.
- AI Bubble Burst Dooms Economy: Members debated the potential for an AI bubble to cause economic collapse, especially concerning the sunk costs in compute and salaries.
- One member argued that the impact would be temporary and primarily affect the US, while another pointed out the interconnectedness of global economies through USD and oil trade, referencing this YouTube Video.
- Hermes 4.3 36B surfaces online: The Hermes-4.3-36B model surfaced, with this HF link provided.
- One user asked "why 4.3?", and it was answered that a few iterations had gone by and that the model would be available on Nous API/chat soon.
- Subagents vs Skills debated: Members discussed using subagents vs. skills, and it was noted that skills have made manual subagents less necessary.
- Instead, one can define an agent to handle the requirements, which will be called automatically using only its own prompt.
Nous Research AI ā· #ask-about-llms (3 messages):
NLP Economic Simulation Research, Hermes models in Godot, LLMs for market simulation, VendingBench Analysis
- Godot Gets LLM Boost for 3D Market Simulator: A member is developing a 3D simulation in Godot to model markets, agriculture, and logistics, and is evaluating whether Hermes models would be suitable for this type of application.
- Another member suggested examining contemporary NLP economic simulation research, noting that while LLMs mimic human traits, they struggle with long horizon tasks like in VendingBench.
- Hermes Shines in Grey/Black Market Modeling: It was proposed that Hermes, with its low refusal rate and high steering, could model the behavior of grey/black markets.
- Most other LLMs may refuse this and be unusable.
Latent Space ā· #ai-general-chat (45 messagesš„):
Eon's $4B Valuation, Gradium spinout from KyutaiLabs, OpenAI's 'Garlic' Model vs Gemini 3, Vertical AI vs Rollups, Lidar and LLMs
- Elad Gil Funds Eon at $4B Valuation: Elad Gil is leading a $300 million Series round via āElad Gil & Co.ā for cloud data-management startup Eon, boosting its valuation to nearly $4 billion (source).
- The round size and the firmās straightforward name have garnered enthusiasm from commenters.
- Kyutai Spinoff āGradiumā Stirs AI Scene: KyutaiLabs quietly spun off its speech-AI team into Gradium, a new for-profit company, announcing a $70M seed round and initial voice products (source).
- Observers noted the significant overlap in staff and investors, drawing parallels to the OpenAI transition and prompting jokes about avoiding non-profit structures for product companies.
- OpenAI Cooks Up āGarlicā to Fight Gemini: OpenAIās new model, āGarlicā, aims to rival Googleās Gemini 3, with internal reports suggesting it outperforms GPT-4.5 in coding and reasoning (source).
- Reactions to the quirky naming trend are mixed, with speculation on its impact on user adoption.
- Vertical AI Owns Deep Workflows, Rollups Get Rolled: Vertical AI companies like Harvey, Abridge, and OpenEvidence are winning by owning niche workflows, hoarding proprietary data, and pricing on outcomes, whereas thin wrappers are getting steamrolled (source).
- VCs are now chasing AI-enabled rollups of legacy services, even though history shows they usually wreck value; Trace Cohenās sheet of 150+ vertical AI startups (worth ~$120B) is now the sector map.
- Antithesis Stress-Tests AI Code with Jane Street: Antithesis landed a $105M Series A led by Jane Street to stress-test AI-written code (source).
- The argument is that deterministic simulation testing will become essential to verify future AI-generated code, because trust-through-testing will make or break production AI systems.
Latent Space ā· #genmedia-creative-ai (8 messagesš„):
Gradium, Bloom, Voice AI
- Gradium garners $70M Seed: Paris-based Gradium emerged from stealth after just 3 months of work, securing $70M in seed funding led by FirstMark & Eurazeo to introduce production-ready transcription & synthesis APIs, detailed in this article.
- Bloom bursts onto the scene: Ray (@rincidium) announced the launch of Bloom, touted as the āworldās first on-brand AI,ā in this viral post which received over 360k views.
- Questions arose about features like IG/Google ad creation, the demo videoās production, and initial user challenges such as login stalls and unclear branding-kit flow, all of which Ray addressed with promises of fixes and UX enhancements.
Eleuther ā· #general (7 messages):
Waymo, Mechanical Engineering, ML algorithms, AI alignment
- Waymo cool for Aerospace Student: An aerospace student with a focus on navigation and guidance finds Waymo especially interesting, with broader interests in autonomous robotics and BCIs.
- Mechanical Engineering relevant to navigation: A member suggested that mechanical engineering is highly relevant in navigation, especially for masters programs.
- ML student seeks guidance: A first-semester ML student requests advice on accelerating their learning, having covered Python, Numpy, Pandas, and basic ML algorithms.
- AI alignment benchmarks requested: A member asked for pointers to AI alignment/safety type benchmarks.
Eleuther ā· #research (26 messagesš„):
Interpretability of World Models, Generalization in Diffusion Models, Energy-Based Models vs. Diffusion Models, Linear RNNs vs. Attention
- Seeking insight into World Model Interpretability: Members wondered about work on interpretability of world models, suggesting extracting rules learned for mechanics like gravity and predicting the usefulness of data items for improving the world model.
- They pointed to some interesting recent papers and another mildly interesting paper on the topic, but felt both contributions should be known by most.
- Diffusion Models Generalize Early!: It was discussed that a paper demonstrates that generalization appears early in diffusion model training, and that the paper's author accepts the results.
- It was further explained that this effect is probably more true for pixel diffusion than for latent diffusion because different data dims in pixel diffusion are so correlated, suggesting that a shifted noise schedule should be used for pixel diffusion.
- Energy-Based Models Claim to Generalize Diffusion: A paper claims to generalize diffusion and energy-based models, supporting all the features diffusion supports, with the only drawback being a 2-3x increase in training time.
- A member expressed skepticism due to the need for double backprop to train, computing input gradients for inference, halving network depth for the same cost, and trickier conditioning control, not to mention potential for instability.
- Linear RNNs Face Strongest Challenge Yet: A member highlighted a paper as the strongest argument against the need for linear RNNs with state tracking.
- They said this paper came from the same people who originally demonstrated the state tracking limitations of attention, but noted that inductive bias and trainability might still favor RNNs.
Eleuther ā· #interpretability-general (6 messages):
SAEs for Interpretability, Cunningham's 2024 SAE paper, Sparse dictionary learning problem, polysemanticity and superposition
- SAEs Gain Traction in Interpretability Research: Members discussed Cunninghamās 2024 paper being widely cited as the initial application of Sparse Autoencoders (SAEs) for interpretability.
- It was suggested that the motivation behind the paper is well explained in its introduction, particularly the third paragraph.
- SAEs Equated to Sparse Dictionary Learning: One member mentioned that someone recognized that a method being discussed for interpretability was similar to a sparse dictionary learning problem, leading to the use of relevant tools.
- This approach addressed aspects like polysemanticity and superposition in the context of interpretability.
Eleuther ā· #lm-thunderdome (2 messages):
Custom Filters in lm-evaluation-harness, Decontamination.py Inclusion, Adapting Multiple-Choice Tasks
- Custom Filters Best Practices: A user inquired about the best method for adding custom filters within the lm-evaluation-harness framework, specifically asking whether to extend existing .py files or create a new one and import it in filters/__init__.py.
- Decontamination.py's status in __init__.py: A user pointed out that decontamination.py is not referenced in __init__.py and asked if this was intentional.
- Multiple-Choice Task Adaptation Stalled: A user inquired about the status of adapting multiple-choice-style tasks for APIs that don't support logprobs, noting that PR #2601 has stalled.
HuggingFace ā· #general (22 messagesš„):
DGX Spark order, Agent Tool Validation & Self-Healing, YOLO Model P-R Curve Issues, AI Learning Resources (LLM, Agent AI, Langchain), TRL get_quantization_config usage
- DGX Spark Purchased by User: A member announced they have ordered a DGX Spark, with a photo attached (image link).
- Agentsā Tool Validation and Self-Healing Abilities Explored: A member questioned if Agents can interpret, validate, and self-heal Tools (e.g., shell scripts) when destructive or buggy.
- Another user shared a link to a Hugging Face dataset indicating this capability may exist.
- YOLO Modelās High P-R Curve Causes Concern: A new computer vision user reported a trained YOLO model for Chinese chess detection is running well but has a really high P-R curve.
- Another member suggested trimming out the two classes which are significantly higher than others.
- AI Course Recommendations Needed: A backend developer asked for recommendations on the best course to learn AI (LLM, Agent AI, Langchain), as they found agents particularly interesting after building a mental health chatbot using Langchain.
- A member recommended the Hugging Face LLMs course and this blog post as a starting point.
- Seeking guidance on using get_quantization_config from TRL: A member inquired about how to use the get_quantization_config function from the TRL (Transformer Reinforcement Learning) library.
HuggingFace ā· #today-im-learning (1 messages):
mkprke: Hey folks, Today i am starting my first Ai agent course
HuggingFace ā· #cool-finds (1 messages):
Stochastic Parrot
- Stochastic Parrot Under Fire: A member posted a link to a research paper at zenodo.org that might cause readers to stop believing in the stochastic parrot.
- New research on stochastic parrots: A new study has been published that might challenge the notion of language models as mere stochastic parrots.
- The research is available on Zenodo.
HuggingFace ā· #i-made-this (3 messages):
Ellora-Lora Recipes, BitterBot AI Agent, Traffic Spike
- CodeLion releases Ellora-Lora Recipes: CodeLion has released a new blog post about Ellora-Lora Recipes.
- The blog post provides instructions and recipes for using Ellora-Lora.
- BitterBot AI Agent Seeks Feedback: An AI agent called BitterBot is seeking feedback on its progress.
- The agent is described as a work in progress but has made tremendous strides lately.
- BitterBotās Architecture Needs Enhancement: The BitterBot system experienced a traffic spike of 7k users which took the system down.
- The team is working on enhancing their architecture to support more users.
HuggingFace ā· #reading-group (1 messages):
Perturbation-based attribution experiments, Deep vision models, Feature behavior
- Features are not what you think, says blogpost: A member wrote a blog post about how features behave in deep vision models after running some perturbation-based attribution experiments.
- The blogpost can be found here: Your Features Arenāt What You Think.
- Deep Dive into Deep Vision Modelsā Quirks: Experiments reveal unexpected behaviors in deep vision models when subjected to perturbation-based attribution methods.
- The author encourages feedback on their findings shared in the linked blog post, inviting the community to explore the nuanced feature dynamics.
HuggingFace ā· #smol-course (5 messages):
SFT Model Evaluation Error, OOM Error on Fine-tuning, GPU Memory Management
- Troubleshooting SFT Model Evaluation Error: A member encountered a ValueError during SFT model evaluation, specifically failing to find the task lighteval|gsm8k|0|0 as part of this tutorial.
- No specific solution was found, but the error indicates an issue with task configuration or registration in the evaluation setup.
- Taming Out-of-Memory Errors: A user reported running into OutOfMemory (OOM) issues while fine-tuning SmolLM3 with SFTTrainer on a local machine with a 16GB GPU.
- Suggestions included reducing the r value in LoraConfig and decreasing the per_device_train_batch_size (sketched after this list), as well as restarting the Jupyter notebook kernel to ensure the GPU memory is free.
- Bigger GPUs Solve Problems?: One member reported improved results using a larger GPU, implying that a 16GB VRAM setup was insufficient for the specific task.
- They were unsure what exactly made the small 16GB VRAM run fail, but the problem went away when using more resources.
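A minimal sketch of the suggested mitigations with peft/TRL; every value here is illustrative, not from the thread:

```python
# Shrink LoRA rank and batch size, then recover effective batch via accumulation.
from peft import LoraConfig
from trl import SFTConfig

peft_config = LoraConfig(r=8, lora_alpha=16)    # smaller r -> fewer trainable params
training_args = SFTConfig(
    output_dir="./smollm3-sft",
    per_device_train_batch_size=1,              # smallest batch that still trains
    gradient_accumulation_steps=8,              # keep the effective batch size
    gradient_checkpointing=True,                # trade compute for activation memory
)
# SFTTrainer(model=..., args=training_args, peft_config=peft_config, ...)
```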
Yannick Kilcher ā· #general (14 messagesš„):
Pug Resource, Docker and Kubernetes basics, Beginner Github Repositories, Gemini CLI, Agents in CLI
- Newbies Nab Docker & Kubernetes Know-How: Members were looking for resources for learning Pug, Docker, and Kubernetes basics, as well as links to beginner-friendly GitHub repositories for hands-on learning.
- Gemini CLI Agents Arrive Soon?: A member inquired about the arrival of agents in CLI and expressed interest in adopting them, mentioning dissatisfaction with paid alternatives like Claude.
- They referenced a discussion form and their comment about possible improvements.
- Neural Net Neurons Need Numerical Nurturing: A user asked about the amount of data required to train a neural network, suggesting the use of cursorsky.moo.
- OpenHands Opens Opportunities On-Premise: One member suggested using OpenHands with a local model, leading to a query about specific models and GPUs in use.
- The original poster said they could easily spin up a 7B or 8B class model.
Yannick Kilcher ā· #ml-news (5 messages):
Deepseek 3.2 Speciale, Distributed Compute
- Deepseek 3.2 Speciale Questioned: A member questioned why not use Deepseek 3.2 Speciale, linking to a YouTube video on wavefunctions.
- Another member responded it was due to RAM limitations, preferring to keep a ~3GB model in VRAM constantly and use it for various simple tasks.
- Distributed Compute & Research Coop Suggested: In response to RAM limitations, a member suggested joining a distributed compute & research coop.
- They claimed to know of one.
Modular (Mojo š„) ā· #mojo (14 messagesš„):
Advent of Code segfault, List comprehensions bug, String processing in Mojo, splitlines vs split("\n"), Out of bounds memory access
- Mojo Advent of Code Segfault Solved!: A user encountered a segfault during Advent of Code when processing an empty line with codepoint_slices, leading to an out-of-bounds memory access: battery_joltages[len(battery_joltages)-1].
- The user found the issue by using a debugger, determining that an empty list was being accessed out of bounds, and suggested better error messages on debug builds.
- ASSERT Flag Helps Catch Scope Issues: A user suggested using -D ASSERT=all to catch accidental references outside of scope, particularly for lists.
- While it didn't immediately solve the segfault in this case, it was noted as a helpful debugging tool for similar issues.
- splitlines and split("\n") Diverge in Behavior: Users discussed the differing behavior between splitlines() and split("\n"), where one of them might strip trailing newlines, leading to different results when processing text files (a quick illustration follows this list).
- Switching to splitlines avoided the error because it didn't include the last empty line.
- String Processing Methods Explored: A user suggested that for ASCII strings, checking codepoints might be unnecessary, implying direct byte pointer manipulation could be used, also pointing out that String's __getitem__ treats the string as ASCII/bytes.
- Span was suggested as an alternative method as well.
- AOC Solutions Welcomed in Dedicated Channel: Users are encouraged to share their Advent of Code solutions in the advent of code channel.
- Itās valuable to see how others approach the problems, especially as they become more performance-critical.
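A quick Python illustration of the splitlines/split divergence mentioned above; per the thread, Mojo's String methods behaved analogously:

```python
text = "3\n5\n7\n"            # file content with a trailing newline
print(text.split("\n"))       # ['3', '5', '7', ''] -- trailing empty entry
print(text.splitlines())      # ['3', '5', '7']     -- no empty last line
```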
aider (Paul Gauthier) ā· #general (13 messagesš„):
LLM Model Degradation, Aider Benchmarks with GGUFs, Claude Sonnet vs GPT-5, Gemini 2.5 Degradation
- LLM Models Degrading with Aider?: Members questioned whether newer LLM Models like Claude Sonnet/Haiku 4.5 and GPT-5, when paired with Aider, have degraded in performance compared to older models.
- One user expressed that Claude-haiku-4.5 keeps forgetting to modify files with /code and ignores explicitly stated instructions in todo ai comments.
- Older Gemini 2.5 Degraded Too?: A member reported that older models, especially Gemini 2.5, have also degraded, potentially due to models being tuned down to handle increased workload, saying that being "rude" worked well with gemini most of the time, but it's nowhere near the quality from before the summer IMO.
- Another member agreed, noting that there are several reports of it.
- Benchmark Craving: Benchmarks Needed to Validate LLM Performance: A member emphasized the need for benchmarks to validate performance claims, citing that human memory and expectations are pretty crap sometimes.
- Another user noted that even though leaderboards say GPT-5 is on top, Claude Sonnet 3.7 was producing better results with Aider for their use cases.
- GGUF Aider Benchmark Guidance: A member inquired about a guide on how to run aider benchmarks with GGUFs.
- Another member pointed out that there is documentation on how to run the benchmark vs an API, requiring setting up an API server with llama.cpp.
DSPy ā· #show-and-tell (1 messages):
MCP Apps SDK, Open Source Libraries, Cross-Platform UI
- MCP Apps SDK goes Open Source!: General Intelligence Labs open-sourced mcp-apps-sdk, enabling MCP-powered apps with UI to run across various platforms.
- Developers can now embed apps designed for ChatGPT into their own chatbots, assistants, or AI platforms and test them locally.
- X post unveils SDK motivation: An X post (link) explains the reasons behind building the open-source MCP Apps SDK.
- The post details how developers can embed apps designed for ChatGPT into their own chatbots, assistants, or AI platforms and test them locally.
DSPy ā· #papers (1 messages):
batmanosama: https://arxiv.org/abs/2511.22074
DSPy ā· #general (10 messagesš„):
Prompt Security, Custom DSPy OutputFields, Pydantic integration with DSPy, Structured outputs
- Prompt Security: Security at the Prompting Layer: A member discussed the difficulty of achieving security at the prompting layer, suggesting that prompt-based "do not do this" statements are easily bypassed by attackers; instead, one should guard against baseline attacks by including examples in training datasets to guide the optimizer.
- They propose guardrails-type security measures, using specific models and invocations to check for malicious prompts, or relying on model provider rejections.
- Custom DSPy OutputFields: Implementing Structured Outputs: A member inquired about custom DSPy OutputFields and whether Pydantic is the best approach, while another member mentioned they are working on a custom gemini/nanobanana image type as an output field.
- The discussion involved generating text/json/structured output, questioning whether DSPy has its own implementation, and noting that they might have migrated.
- DSPy uses Pydantic BaseModel under the hood: It was clarified that DSPy uses BaseModel under the hood for validation and that the default ChatAdapter and JSONAdapter perform type validation as the LLM sends its output back.
- A minimal example was provided to illustrate how to define a signature that takes in a Pydantic model, showcasing how DSPy can generate structured outputs with any LLM (a code snippet was shared).
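A minimal sketch of that pattern; the field and model names here are illustrative, not the snippet from the thread:

```python
# A DSPy signature whose output field is a Pydantic model: the default adapters
# parse and validate the LLM's reply into the typed object.
import dspy
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

class ExtractPerson(dspy.Signature):
    """Extract a structured person record from free text."""
    text: str = dspy.InputField()
    person: Person = dspy.OutputField()

extract = dspy.Predict(ExtractPerson)
# dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any LLM works here
# result = extract(text="Ada Lovelace was 36 when she died.")
# print(result.person)  # a validated Person instance
```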
Manus.im Discord ā· #general (12 messagesš„):
Chatmode Feature, AI agent advertisement, Account Suspensions, RAG pipelines
- Chatmode makes a Comeback: Users discuss the return of Chat Mode in the platform; others suggest that using a random instance of Qwen or DeepSeek could achieve the same thing.
- One user confirms its availability under the āmoreā section.
- AI Engineer advertises Expertise in Agent Building: An AI engineer posted an advertisement of their expertise in building autonomous AI agents and multi-agent systems, mentioning capabilities such as research, data-gathering, task automation, delegation, collaboration, and planning.
- The advertisement also lists specific technologies and tools like JS/TS, Next.js / Vue, Go / Rust, Python, Langraph, AutoGen, ReAct, CrewAI, DeepSeek, OpenAI, Claude, Hugging Face, and various APIs.
- Account Suspensions: Referral causes Suspicion: A member asked why giving referrals to several people is causing their account to be suspended.
- No further information or solutions were provided in the messages.
- AI engineer specializes in RAG pipelines: One engineer specializes in RAG pipelines, boasting hybrid search and custom retrieval for accurate, context-aware responses in production.
- The engineer also lists expertise in AI content detection, image AI, and Voice AI, including the development of moderation tools, tagging pipelines, and personalized voice assistants.
tinygrad (George Hotz) ā· #general (6 messages):
Fixing test failures in tinygrad, Performance improvements using shrink vs indexing, RMSNorm usage clarification
- Fix Almost Ready for Failing Tinygrad Tests: A member reported failing tests with CPU=1 PYTHONPATH="." pytest -n 12, specifically test/test_tiny.py TestTiny.test_beam and others, with complete logs provided.
- Another member mentioned a pull request that almost fixes the issues.
- Shrink is blazingly fast for Indexing Tensors: A member suggested that using Tensor.shrink((None, (0, input_size))) is faster than obs[:, :input_size] when working with tensors.
- They also noted bumping Variable vmin to 2 to avoid errors, but were puzzled why using Variable made the code 5x slower (16.61M vs 81.9M SPS).
- RMSNorm Parameter Puzzlement: A member advised reviewing the source code of RMSNorm(dim=-1) to ensure it behaves as expected.
- This guidance suggests a potential misunderstanding or misconfiguration in how RMSNorm is being used.
MCP Contributors (Official) ā· #general (5 messages):
MCP Security Risks, Security risks associated with MCP, MCP-specific security
- Redditors Debate MCP Security Risks: A user asked for feedback on their perspective regarding security risks associated with MCP with a link to a reddit thread.
- Another member responded with a link to a blog post about MCP-specific security items, calling it a great resource: den.dev/blog/security-rakes-mcp/.
- Another MCP Security resource: Another member shared a further resource: MCP Security @ Reddit Thread.
MCP Contributors (Official) ā· #general-wg (1 messages):
Tool Validation, Server-Side Validation
- Sampling Without Tools Requires Server Validation: A member inquired whether the server should validate when sampling occurs without a tool, given there is no tool proving its existence.
- The discussion highlighted the importance of server-side validation in this case, ensuring the sampling process adheres to required protocols even in the absence of direct tool validation.