A good day for open AI Engineering.
AI News for 12/8/2025-12/9/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 7780 messages) for you. Estimated reading time saved (at 200wpm): 644 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
A rare cross company move with the establishment of the Agentic AI Foundation under the Linux Foundation, with Anthropicās MCP joining OpenAIās Agents.md and Blockās Goose as founding projects.
Mistral continues to make minor waves with Devstral 2 itās new coding model that is āSonnet 4.3 levelā but 10x cheaper by API and open weights, winning or tying with DeepSeek v3.2 71% of the time in third party human evals.
The new Mistral Vibe CLI is a pleasure to try out, even if not SOTA.
AI Twitter Recap
Mistralās Devstral 2 release and the āagentic codingā toolchain
- Devstral 2 + Vibe CLI (open weights): Mistral released two coding models and a native CLI for agent workflows: Devstral 2 (123B dense, modified MIT license) and Devstral Small 2 (24B, Apache 2.0), both available via API and open weights. The new āMistral Vibeā CLI bootstraps with uv and provides end-to-end, multi-file code automation designed for agentic coding in terminals/editors. Day-0 ecosystem support arrived rapidly: vLLM inference support, Zed editor integration, and a polished Textual-based TUI. Devstral/Vibe is configurable with MCP and custom tools via config.toml. Links: @MistralAI, thread, @GuillaumeLample, @b_roziere, @qtnx_, @charliermarsh, @vllm_project, @zeddotdev, @omarsar0, Textual UI.
- Performance and deployment caveats: Several engineers flagged comparisons that use total parameters as misleading when contrasting dense vs MoE; for throughput/cost, active params and system-level speed on vLLM/sglang are more relevant. Early anecdotal benchmarks suggest MoE backends (e.g., MiniMax M2 A10B-active) can be 2ā3.5x faster than a 123B dense model depending on concurrency. Links: @eliebakouch, follow-up, @JustinWaugh.
RL for LLMs: stability, decontamination, and process rewards
- Qwenās SAPO for RL tuning: Alibaba introduced Soft Adaptive Policy Optimization (SAPO), a smooth, temperature-controlled trust-region alternative to hard clipping (aimed at mitigating gradient brittleness, especially in MoE). Reported benefits: longer stable runs, higher Pass@1, and stronger Qwen3āVL performance across math/coding/multimodal tasks; includes asymmetric temperatures and sequence/token-level adaptivity. Paper/blog open. Links: @Alibaba_Qwen.
- Data decontamination matters: The OLMo 3 RLāZero team shows the puzzling āRL with random rewards improves mathā result disappears under proper data decontaminationāimplicating leakage rather than RL magic. Useful, clean testbed with open base model, transparent data, and reproducible recipes. Links: @cwolferesearch, comment.
- Training details at scale: Ongoing discussions probed MoE RL stability (propagating estimators for unactivated experts to reduce sparsity pathologies; off-policy rollout expert mismatch) and process rewards to mitigate reward hacking. Links: @PandaAshwinee, @Grad62304977, @xiangyue96, result.
Agent protocols and frameworks: MCP to Linux Foundation; AWS Strands; LangChain
-
MCP becomes a Linux Foundation project: Anthropic is donating the Model Context Protocol (MCP) to the new Agentic AI Foundation (AAIF) under the Linux Foundation, with backers including OpenAI, AWS, Bloomberg, Cloudflare, Google, Microsoft, and Blockācementing MCP as a neutral, open standard for agent-tool integration. Links: @AnthropicAI, @mikeyk, @alexalbert__.
Related: OpenAI is showcasing Figmaās MCP server for ādesign-to-codeā workflows (event, signup); LangChain MCP Adapters 0.2.0 adds multimodal tools and elicitation (release); OpenHands pointed to the Agent Client Protocol (ACP).
-
AWS Strands Agents (open source): A model-driven agent framework focused on planning/tooling/steering/evals with both Python and TypeScript SDKs, edge-device SDK, and an upgrade path to AWS AgentCore for secure, policyāgoverned deployment. Links: overview, repo.
-
Agent engineering practices: Practical guidance on building resilient voice and multimodal agents (STTāLLMāTTS āsandwichā vs speech-to-speech), observability/evals, and iterative agent QA. Links: LangChain voice agent, agent engineering blog, primer.
Enterprise momentum: Anthropic expands with Accenture (30k professionals trained on Claude; product to scale Claude Code org-wide) (link).
Benchmarks and evaluation hygiene
- Databricks OfficeQA: A new benchmark grounded in ~89k pages of U.S. Treasury Bulletins testing document-heavy, economically valuable tasks (scanned PDFs, dense tables, multi-doc retrieval). Current agents reach ~45%āa reality check for āenterprise-readyā agent claims. Databricks will run a Grounded Reasoning Cup in Spring 2026. Links: @databricks, @kristahopsalong, details.
- LM Arena movements: The Arena added Baiduās ERNIEā5.0āPreviewā1103 to the text leaderboard (preliminary) and shared YTD trends across top labs. Links: ERNIE entry, trends.
- Leak hygiene still matters: Reports of ARCāAGIā1 examples appearing within ARCāAGIā2 training sets ā avoid training on public evals and maintain strict split controls. Also see a concise explainer on evals. Links: ARC leak, @HamelHusain.
Notable model releases (vision, TTS, reasoning)
- GLMā4.6V: Zhipu AIās MLLM landed on Hugging Face with 128k context, native function/tool calling, and strong visual understanding. Community demos show workable multimodal tool calling and robust handwriting/math comprehension. Links: release, HuggingChat test, handwriting.
- ServiceNow Aprielā1.6ā15BāThinker (MIT, open weights): A 15B dense reasoning model reporting 57 on the Artificial Analysis Intelligence Index, AIMEā25 88, GPQA 73, LCB 81, with ~30% improved token efficiency vs v1.5. Available on Together and HF. Links: @ServiceNowRSRCH, Together, AA analysis.
- Parallel Coordinated Reasoning (PaCoRe): An 8B āparallel thinkingā model/recipe/data (MIT-licensed) targeting test-time scaling via message-passing; claims strong results on HMMT25 and that breadth beats depth for compute returns. Links: @CyouSakura.
- VoxCPM 1.5 (OpenBMB): TTS upgrade with 44.1 kHz audio, halved token rate (6.25 tok/sec audio), improved long-form stability, and LoRA/full finetune scripts. Links: @OpenBMB.
- Ollama updates: DeepSeek v3.2 (with optional āthinkingā) is available on Ollama Cloud; Essential AIās 8B code/STEM model rnjā1 is also on Ollama. Links: DeepSeek, model page, rnjā1.
- Also: Moondream segmenting (pixel-accurate vector masks for automation) (link), and Metaās zero-shot reference-to-video āSaberā paper highlighting identity-preserving text/image-to-video without R2V datasets (link).
Infra and performance: training/serving improvements
- CoreWeave Mission Control reboot: Adds Telemetry Relay (GA) for streaming audit/observability to SIEMs, GPU Straggler Detection (Preview), and a Mission Control Agent (Preview) that can answer/fix slow-job issues via Slackāaiming for 96% goodput and higher MFU. Link: @CoreWeave.
- Inference and libraries: HF Transformers is landing MoE performance optimizations; Diffusers adds pipeline context parallelism; NVIDIA pushed new InferenceMAX results for sglang FP8 configs. Links: MoE PR, Diffusers, InferenceMAX.
- Data/agent plumbing: LlamaIndex released LlamaSplit (LLM-driven document packet segmentation with routing to downstream extractors/agents); Qdrant shared a real-world 100k+ image semantic search build (Cohere embeddings, Redis Streams, Rust workers, ANN + filters) with measurable engagement/search uplift. Links: LlamaSplit, details, Qdrant case study.
Top tweets (by engagement)
- MCP ā Linux Foundation: āMCP went from internal project to industry standard in a yearā @AnthropicAI, @mikeyk.
- Mistralās Devstral 2 + Vibe: Open-weight coding models and native CLI, strong ecosystem uptake @MistralAI.
- Qwen SAPO: New RL method for smoother, stable LLM RLāespecially for MoE @Alibaba_Qwen.
- Waymo as embodied AI at scale: Jeff Dean on fully autonomous data fueling system advances @JeffDean.
- OpenAI leadership: Denise Dresser (ex-Slack CEO) joins as CRO, signaling enterprise focus @OpenAI.
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Mistral AI Tools Announcement
- Introducing: Devstral 2 and Mistral Vibe CLI. | Mistral AI (Activity: 872): Mistral AI has released Devstral 2, a
123B-parameterdense transformer model with a256K context window, achieving72.2%on SWE-bench Verified. This model is open-source under a modified MIT license, while the smaller Devstral Small 2 with24B parametersscores68.0%and is licensed under Apache 2.0. Both models are optimized for deployment on consumer hardware. The Mistral Vibe CLI enhances code automation with features like project-aware context and multi-file orchestration. More details can be found here. One comment highlights skepticism about the feasibility of dense models over100Bparameters, referencing a previous discussion. Another comment expresses optimism about the24Bmodelās potential impact, suggesting Mistralās strong return to the AI scene.- DeProgrammer99 highlights the introduction of Devstral 2, a 123B-parameter dense transformer with a 256K context window, which contradicts recent discussions suggesting a halt in developing dense models over 100B parameters. This suggests a significant advancement in model architecture, potentially pushing the boundaries of current AI capabilities.
- mantafloppy expresses skepticism about the benchmarks provided by Mistral AI, indicating that if the benchmarks are accurate, the new model could enable āVibe Codingā to be run locally by most users. This points to a potential shift towards more accessible, high-performance AI models that do not require extensive cloud resources.
- Maximum mentions a 24B model from Mistral, suggesting that if it performs as claimed, it could mark a significant comeback for Mistral AI. This implies that the modelās performance could be a game-changer in the competitive landscape of AI development.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo
1. Anthropicās Model Context Protocol Donation
- Anthropic hands over āModel Context Protocolā (MCP) to the Linux Foundation ā aims to establish Universal Open Standard for Agentic AI (Activity: 634): Anthropic has donated the Model Context Protocol (MCP) to the Linux Foundation, specifically to the newly established Agentic AI Foundation. This move aims to create a universal open standard for AI models to connect with data and tools, akin to a āUSB-Cā for AI, promoting interoperability and preventing vendor lock-in. By placing MCP under the Linux Foundation, Anthropic ensures that the protocol remains open source and community-driven, facilitating seamless operation of autonomous agents across different platforms. Read more. Some commenters speculate that Anthropicās donation might be a strategic move to distance themselves from the protocol, as maintaining such a standard can be a thankless task.
- BREAKING: Anthropic donates āModel Context Protocolā (MCP) to the Linux Foundation making it the official open standard for Agentic AI (Activity: 2746): Anthropic has donated the Model Context Protocol (MCP) to the Agentic AI Foundation under the Linux Foundation, establishing it as an open standard for agentic AI. This move positions MCP as a universal protocol for AI model connectivity, akin to Kubernetes, with over
10,000active servers and integration into platforms like ChatGPT and Microsoft Copilot. The donation ensures MCP remains open-source, fostering a neutral ecosystem free from vendor lock-in, and is supported by ongoing community-driven development and governance. Commenters express cautious optimism, noting that while the move may serve Anthropicās interests, it benefits AI consumers by promoting vendor-neutral standards. Some hope the Linux Foundation will evolve MCP beyond its current state, while others see it as a strategic way for Anthropic to offload responsibilities.- FishOnAHeater1337 argues that Anthropicās donation of the Model Context Protocol (MCP) to the Linux Foundation might be because they see it as a ādead endā. They suggest that Claude, Anthropicās AI, has been trained to search for skills, making MCP obsolete for context efficiency. MCP is described as having a specific use case for server-to-server context retrieval, which can be achieved through direct API calls by Claude, indicating a shift in how context management is approached.
- SlanderMans expresses skepticism about MCP being the standard, hoping that the Linux Foundation will evolve it beyond its current state. This implies that while MCP is a starting point, there is potential for further development and improvement under the Linux Foundationās stewardship, which could address current limitations or expand its applicability.
- TehFunkWagnalls dismisses MCP as a ārag tool callā, suggesting that it may not be robust or versatile enough for broader applications. This comment reflects a critical view of MCPās current capabilities, hinting at the need for significant enhancements to meet diverse AI integration needs.
- Anthropic is donating the Model Context Protocol (MCP) to the Linux Foundation (Activity: 826): Anthropic has announced the donation of the Model Context Protocol (MCP) to the Linux Foundation, marking a significant step in promoting MCP as an open, community-driven, and vendor-neutral standard. MCP, which has become a foundational protocol for agentic AI with over
10,000+ active serversand97M+ monthly SDK downloads, will now be part of the newly established Agentic AI Foundation (AAIF). This initiative is supported by major tech companies including OpenAI, Google, Microsoft, Amazon, and others, aiming to advance open-source innovation in agentic AI. Read more. Commenters express optimism about the Linux Foundationās stewardship, viewing it as a positive move for MCPās long-term viability. There is also appreciation for the protocolās potential to become a universal standard, reducing compatibility issues across platforms.- The donation of the Model Context Protocol (MCP) to the Linux Foundation is seen as a positive move for its long-term viability. The Linux Foundationās stewardship is considered a strong indicator of MCPās potential for widespread adoption and standardization across different platforms, which could alleviate compatibility issues that developers face when working with systems that do not support MCP.
- The involvement of the Linux Foundation is expected to lead to more universal support for MCP, moving it beyond being associated solely with Anthropicās Claude. This could enhance interoperability and ease of integration across various AI systems, addressing current challenges where lack of support for MCP creates significant hurdles for developers.
- There is a critical perspective suggesting that the donation might be a strategic move by Anthropic to offload maintenance responsibilities. This view implies that while the donation is publicly perceived as a positive contribution, it might also reflect internal challenges in maintaining MCP, thus transferring the burden to the Linux Foundation.
2. AI Upscaling and Image Processing
- when an upscaler is so good it feels illegal (Activity: 1818): The post discusses the effectiveness of the SeedVR2 upscaler, particularly the FP16 model, which is praised for producing clean, artifact-free images. The user contrasts it with GGUF and FP8 models, which introduced unwanted artifacts like skin distortion and tiling grids, respectively. The workflow is straightforward, with models downloaded automatically, and the user reports a processing time of
38 secondsper image on a 5090 GPU. The workflow and model can be accessed via Pastebin and Hugging Face, respectively. Custom nodes are recommended for VRAM caching and batch processing, with links to GitHub repositories provided for additional functionality. Commenters generally agree on the high quality of the SeedVR2 upscaler, noting its superior performance compared to other methods like Ultimate SD upscale. Some users report mixed results, attributing issues to potential misconfigurations or hardware limitations, such as needing high-end GPUs for video upscaling.- Asaghon highlights the performance of a new upscaler integrated into workflows using Z-Image and Illustrious, noting it runs faster than Ultimate SD upscale on a 12GB 4070 GPU. The upscaler excels in adding detailed textures and correcting fine details like eyes and thin necklaces, which are often problematic in models like SDX and Illustrious.
- underlogic0 discusses the use of SeedVR2, noting disappointment with its fuzziness, possibly due to its design for video. They mention achieving better results with Z-Image at higher resolutions and using ADetailer nodes to fix details, although this approach alters the entire image.
- urekmazino_0 comments on the high computational demands of video upscaling, suggesting that datacenter-class GPUs are necessary, while noting that image upscaling performs well.
- Z-Image on 3060, 30 sec per gen. Iām impressed (Activity: 1821): A user reported generating a video using Z-Image and WAN on an NVIDIA RTX 3060 GPU in
30 seconds per generation. This claim is met with skepticism, as generating video content on a mid-range GPU like the 3060 typically requires more time. The user did not provide detailed workflow steps or technical specifications, leading to requests for further clarification on the process. Commenters are skeptical about the feasibility of generating video content so quickly on a 3060 GPU, suggesting that the claim might be exaggerated or require additional context, such as specific optimizations or settings used.
3. AI Perception and Public Awareness
- Most people have no idea how far AI has actually gotten and itās putting them in a weirdly dangerous spot (Activity: 823): The post highlights a significant gap between public perception and the actual capabilities of AI, noting that many people still view AI as rudimentary, while advanced models like ānanabanana Proā are producing highly realistic outputs. The author argues that this disconnect is dangerous as it leaves the general public unaware of rapid advancements, which are accelerating due to active research communities and geopolitical pressures, particularly between the US and China. The post suggests that instead of protesting AI development, efforts should focus on implementing safety nets like Universal Basic Income (UBI) to mitigate potential displacement effects. Comments reflect a nuanced view: some agree that AIās capabilities are underestimated, noting rapid improvements in areas like math, while others point out that AI is also overestimated, as it can still fail at simple tasks. Thereās a consensus that the general public will be caught off guard by AIās impact, with one commenter suggesting that significant attention will only arise when major outsourcing companies are affected.
- DepartmentDapper9823 highlights the rapid improvement in AI capabilities, particularly in fields like mathematics, where AIās error rates are decreasing almost monthly. This suggests a significant advancement in AIās ability to handle complex tasks, contrary to the common perception that AI is prone to hallucinations and errors.
- trisul-108 points out the dual nature of AI perception: while some overestimate AIās capabilities, others underestimate them. The effectiveness of AI is highly dependent on the specific task, the tool used, and the quality of the prompts, indicating that AIās performance is not universally consistent and requires careful application.
- kcvlaine predicts a significant impact on the general population, particularly in countries like India, where the influence of AI on major outsourcing companies could serve as a wake-up call. This underscores the potential for AI to disrupt established industries and the need for awareness of its evolving capabilities.
- Horses were employed for thousands of years until, suddenly, they vanished. Are we horses? (Activity: 2127): The image is a meme that uses historical data to draw a parallel between the decline of horse usage due to the rise of engine technology and the potential impact of AI on human jobs. It features two graphs: one showing the improvement in engine efficiency over time, and another depicting the decline in the number of horses per person in the U.S. from 1930 to 1950. The tweet suggests that just as horses were replaced by engines, humans might face similar displacement by AI technologies. Commenters humorously discuss the implications of the analogy, with one noting that unlike horses, humans can resist displacement, hinting at potential societal challenges if AI leads to widespread job loss.
- Does anyone have the numbers on Gemini and why is only OpenAI made fun of when everyone is burning cash on AI? (Activity: 641): The image is a meme that humorously critiques OpenAIās financial performance over a decade, suggesting that despite advancements, OpenAI remains unprofitable. The discussion highlights the contrast between OpenAI and Google, emphasizing that Google has substantial financial resources and infrastructure, allowing it to invest heavily in AI without immediate profitability concerns. In contrast, OpenAI lacks such financial backing and infrastructure, relying on external funding and facing scrutiny for its financial sustainability. Commenters note that Googleās vast resources and existing infrastructure allow it to absorb AI-related costs more easily than OpenAI, which lacks similar financial stability and transparency.
- Googleās financial robustness is highlighted, with
100B revenue per quarter, allowing it to sustain long-term investments in AI without immediate returns. In contrast, OpenAI lacks such financial backing and transparency, relying heavily on external funding and public statements from figures like Sam Altman, which makes it more vulnerable to scrutiny and criticism. - Googleās extensive infrastructure and diversified revenue streams provide a cushion for its AI ventures, unlike OpenAI, which is more dependent on venture capital and lacks the same level of financial security. This disparity in financial stability and resource availability is a key reason why OpenAI faces more public skepticism and criticism compared to Google.
- The discussion emphasizes that Googleās ability to invest heavily in AI is supported by its existing systems and financial resources, often referred to as an āinfinite money glitchā. OpenAI, on the other hand, is seen as a smaller entity (āa little peanut compared to Alphabetā) with limited financial autonomy, making it more susceptible to pressure from investors for quick returns.
- Googleās financial robustness is highlighted, with
AI Discord Recap
A summary of Summaries of Summaries by gpt-5.1
1. New High-Performance & Specialist Models
- Nomos 1 Mathlete Smashes Putnam Problems: Nous Research open sourced Nomos 1, a 30B parameter model that scored 87/120 on the Putnam math competition, a score that would rank #2/3988 in 2024 and positions it as a nearāstate-of-the-art AI mathematician. The community framed this as a concrete benchmark for serious math reasoning and a step towards hillclimbai-style specialist solvers rather than generic chatbots.
- Discussion around Nomos 1 treated Putnam as a hard, non-gameable benchmark, contrasting it with typical leaderboards and underscoring the value of fully open models for research. Members expect follow-up work on scaling this approach and using the model as a base for math-heavy downstream tasks, from theorem proving to competitive-programming-grade contest problems.
- GLM 4.6V-Flash Sprints Past Small-Code Rivals: LM Studio users highlighted GLM 4.6V-Flash, a 10B parameter model released on Hugging Face as GLM-4.6V-Flash, reporting that the Q4 quant runs at ~70 tokens/s on an RTX 2060 and outperforms other small models for coding. People compared it favorably to local incumbents, noting its strong code completion and chat capabilities in a relatively lightweight footprint.
- The chat also covered practical deployment gotchasāone user even corrupted their LM Studio install by stacking a ārandom modelā on topāshowing that for many, the bottleneck is tool robustness rather than pure model quality. GLM 4.6V-Flash is quickly becoming a default recommendation for hobbyists wanting a fast, code-capable 10B they can realistically run on mid-range GPUs.
- AuraFlow, Ovis, Hunyuan Turn Up the GenMedia Heat: Hugging Face users circulated several new image/video modelsāAuraFlow v0.3, Ovis-Image-7B, and HunyuanVideo T2Vānoting that these 7ā12 GB models can push 1024² images and 720p/480p video. The models were discussed as practical options for local or on-prem workflows where commercial APIs are too constrained or expensive.
- Engineers weighed tradeoffs between VRAM, latency, and resolution, with some eyeing these as drop-in backends for creative pipelines and others as starting points for task-specific finetunes. The proliferation of high-quality open models in this space reinforced a sense that image/video generation is rapidly commoditizing, shifting value to tooling and workflows rather than raw model weights.
2. Agentic Ecosystem & MCP / IDE Tooling
- Anthropicās MCP Goes Full Foundation Mode: Anthropic announced it is donating the Model Context Protocol (MCP) to the Linux Foundation and creating the Agentic AI Foundation, via both its own blog post and the Linux Foundation press release (Anthropic announcement, LF press release). MCP contributors clarified that this move will not change existing governance for current MCP work in the short term.
- In the MCP Contributors and Hugging Face/Unsloth chats, people framed this as a push to standardize tool/agent protocols across vendors, with one member calling it āa sterling move.ā Others asked how LFās āway of doing itā will affect auth, Client-ID Metadata Documents (CIDM), and the predominately private/enterprise MCP deployments, especially for developer tools and IDE integrations.
- Cursorās Sub-Agents Whisper While Aider Learns New Tricks: The Cursor community dissected an emergent
.cursor/agentsstructure where a mainmcp.jsoncoordinates markdown-based sub-agents like code-reviewer.md, while simultaneously complaining about unstable Cursor Agents that often require users to āstop the agent⦠create the file by hand, copy the code over.ā In parallel, Aider users celebrated new features: auto-generated commit messages using gpt-3.5-turbo, upcoming image-aware editing via-image, and persisted edit sessions (session management docs).- Developers pushed Cursor for better orchestration docs and UI-level controls over tools (terminal, edit, etc.), while Aiderās roadmap was praised for concrete, workflow-centric improvements like one-command commits and resumable sessions. The vibe across both communities is that agentic IDEs are powerful but flaky, and that the winners will be the tools that turn LLMs into predictable, inspectable collaborators rather than opaque magicians.
- ManusAI Context Engineering & Agent Workshops Go Deep: In Latent Space, Lance Martin shared a ManusAI deep dive on context-engineering and agent design, including slides and a webinar video linked in his tweet thread (ManusAI context-engineering post), which Jonas Templestein called āa very good post on agent design.ā Separately, MLOps @Chipro announced an āAI Agent 0ā1 Workshopā (RSVP via luma.com) teaching attendees to build agents that think, code, analyze data, and generate reports from a real client-like spec.
- The community keyed in on ManusAIās ācontext as programā ideasāpacking tools, state, and instructions into systematically engineered promptsāwhile the workshop pitches showed strong demand for end-to-end agent engineering education (LangChain + Streamlitāstyle stacks). Together with Anthropicās MCP donation, these conversations underscored that agent design, not raw model choice, is becoming the main differentiator for serious applications.
3. Quantum, Neuromorphic & Energy-Constrained Directions
- Quantum-Curious: From Reddit Skepticism to Chronos-1.5B Hybrids: Across Eleuther and Hugging Face, people debated a Reddit proposal for āreal quantum hardwareā LLM trainingāmany dismissed it as ānonsenseā but acknowledged legitimate lines of work like Quantum Kernels and Quantum SVMs. In contrast, a concrete hybrid model, Chronos-1.5B, was showcased as a language model augmented with 2āqubit quantum kernel layers trained directly on IBMās Heron r2 quantum processor, with IBM job IDs published in the repo.
- The Chronos author shared learning resources like the Qiskit textbook and PennyLane demos, positioning the model as an existence proof that true hardware-in-the-loop quantum ML is viable for small kernels today. Eleuther researchers remained cautious, arguing that near-term gains will likely come from classicalāquantum hybrids in narrow roles (e.g., kernels, search subroutines) rather than end-to-end quantum LMs.
- Neuromodulatory Control Networks Tinker With TinyStories: An Eleuther member introduced Neuromodulatory Control Networks (NCN), a ~18M parameter hypernetwork-like controller that modulates temperature, layer gain, and FFN gating via a 768-dim input vector, documented in the NCN GitHub repo and its accompanying paper PDF. Trained for a single epoch on TinyStories, NCN reportedly reached validation perplexity ā 4.5, suggesting a promising control mechanism for much larger backbones.
- Researchers compared NCN to classic hypernetworks and neuromodulation in biology, speculating about using such controllers to adapt big models on the fly without full finetunesāe.g., task conditioning via a small side network. The consensus was that this line of work fits neatly into a broader push toward brain-inspired, control-heavy architectures that can keep scaling affordable.
- Energy Wall Warnings and Brain-Like Hardware Hype: In Latent Space, Unconventional AI argued that current AI scaling will hit a global energy wall in 3ā4 years, calling for ābrain-like hardwareā rather than ever-larger digital GPUs. The thread resonated with members who see energy and thermals, not just dollars, as the real bottleneck in pushing context windows, model sizes, and multi-agent systems.
- This dovetailed with Eleuther conversations about TopāK attention, selective gradient masking (Anthropicās post), and efficient KV cache tricks as ways to reduce compute without gutting capability. The emerging view is that architectural and hardware co-designāneuromorphic-ish chips, clever sparsity, smarter controllersāwill be required to keep scaling frontiers moving under realistic power budgets.
4. Infra, GPUs, and Torch-Level Performance Hacks
- GPU MODE Demos How to Actually Read FLOPs and Beat Benchmarks: In GPU MODE, engineers dissected NVIDIA A100 FLOP claims, noting that the oft-quoted 156 TFLOPs figure refers to TF32 tensor-core MMA (a 19ābit format aligned to 32ābit) and 312 TFLOPs to FP16 MMA, both very different from scalar elementwise ops, which can deliver as little as ¼ of peak in worst-case dependent instruction streams. The same server hosted a high-stakes GEMM competition, where the top kernel hit 10.835 μs on shape M=128, N=7168, K=16384, corresponding to about 2.77 PFLOPs effective throughput, while participants struggled to eke out further microsecond-level wins.
- Contributors also debugged B200 performance inconsistencies and NVFP4 support gaps on 50āseries cards, and flooded the
nvfp4_gemmandvectorsum_v2leaderboards with runs across A100, H100, B200, L4. The meta-lesson was that understanding tensor-core math vs. āmarketing FLOPsā and tightly measuring kernels (correct event timing, warmup, etc.) matters more than chasing spec-sheet numbers.
- Contributors also debugged B200 performance inconsistencies and NVFP4 support gaps on 50āseries cards, and flooded the
- Torch.compile Meets Static KV Caches and Slicing Headaches: A GPU MODE #torch thread described how
torch.compilecan actually slow down attention when updating a static KV cache via slicing, even whenbatch_size == max_batch_size, as documented in a Hugging Face transformers PR discussion. The authorās workaround was to pre-allocate and cache all slices at fixed addresses, turning each slice update into a static lookup instead of a dynamic slice (follow-up comment).- They reported notable speedups from this static layout + lookup trick but called out the resulting code as ugly and brittle, asking for a compiler- or framework-level solution. For practitioners building custom KV cache layouts or speculative decoding, this serves as a concrete example that graph compilers still stumble on dynamic indexing, and that manual memory layout design can be worth the effort in hot paths.
- Multi-GPU LLM Pragmatics: VRAM, Heat, and Qwen-3: LM Studioās hardware-discussion channel compared multi-GPU setups, with people pairing RTX 3060 (12 GB) and RTX 3080 (10 GB), and recommending RTX 3090 as the current value pickāwhile warning that 3090 Ti cards run very hot. Others shared experiences running Qwen3 30B A3B in quantized formats like Q4_K_M, achieving ~20 tokens/s when the full GGUF file fits system RAM.
- Engineers also traded tips on reading GDDR6 VRAM temps under Linux (via
nvidia-smior specialized tools like gddr6) and noted that many consumer cards donāt expose those sensors cleanly. A recurring theme was that for local LLMs, VRAM capacity and memory bandwidth beat raw FP32 FLOPs, and that carefully chosen quantization plus moderate batch sizes often outperforms chasing the absolute newest GPU.
- Engineers also traded tips on reading GDDR6 VRAM temps under Linux (via
5. Evaluation, Prompt/Context Engineering, and Search Tooling
- Stability Index Rubric Peels Back the Black Box: An OpenAI community member shared the internal Stability Index scoring rubric, describing a 0ā10 scale across seven dimensionsāincluding structural crispness, tonal drift, response-shape variance, semantic inertia, coherence, neutrality, and outliersāwith the final index being a simple average, as documented in this Discord message. They emphasized that the detailed thresholds belong to an internal research protocol, and that the metric is meant for comparing distributions of runs, not as a single definitive score.
- This sparked discussion on how to design robust evaluation frameworks that capture model mood swings, jailbreak susceptibility, and āslopinessā rather than just static accuracy. The same users shared prompt-engineering lessonsāhierarchical markdown structure, variable abstraction, and ML-format matchingāas pragmatic tools for making models behave more consistently under this rubric in real workloads like code analysis.
- Parallel.ai Deep Search Beats Exa and Bypasses Perplexity Limits: In OpenRouter, users complained that Exa Search results were āquite shittyā and recommended Parallel.ai as ā10x cheaper, faster, and betterā than Perplexity, especially when using its deep search endpoint in combination with Grok 4.1. Meanwhile, Perplexityās own Discord acknowledged that the Pro planās āunlimitedā queries effectively cap around 600/day, and that references to this limit have quietly disappeared from the website, fueling speculation about shifting quotas.
- Taken together, the conversations show a trend toward model-agnostic deep search stacks: LLM routers on top of whichever search backend has the best economics and quality at the moment (Parallel, Exa, Perplexity, or custom crawlers like Kimiās). Engineers are starting to treat web search as just another pluggable toolāswapped in and out based on latency, recall, and rate-limit behaviorārather than a fixed dependency tied to one vendor.
- Context-Engineering, Prompt Patterns, and Token-Limit Realities: In OpenAIās prompt-engineering and Latent Space, users shared context-engineering frameworks that rely on hierarchical markdown, role and section headers, and variable placeholders to systematically structure long prompts for code review, scaffolding, and multi-step reasoning. One member noted that most platforms silently truncate large files after a few thousand tokens, while Gemini is currently one of the few that āwill ingest an entire document and put it in its context window verbatim,ā shaping which tools are viable for large-framework workflows.
- Privacy worriesālike the potential use of shared chats in the New York Times v. OpenAI caseācombined with token limits have people hesitant to paste proprietary frameworks into hosted UIs. That, in turn, drives interest in local or enterprise-hosted models, as well as more disciplined patterns for chunking, referencing, and reusing context so prompts remain interpretable and auditable rather than giant, unstructured blobs.
Discord: High level Discord summaries
LMArena Discord
- GPT-5.2 Release Date Still Under Wraps: Members debated the launch date of GPT-5.2, with estimates ranging from this week to January.
- Concerns were raised about its ability to outperform Gemini 3, though some speculated the model would serve as a rubberband until a more substantial release in January.
- Nano Banana Pro Tastes Better Than Gemini 3 Pro?: A member claimed Gemini 3 is not a diffusion model, unlike Nano Banana Pro, sparking debate.
- Other members countered that both are likely Gemini 2.5 Flash, with some suggesting Nano Banana Pro is actually Gemini 3 Pro, and clarified that neither are diffusion models.
- OpenAIās Hazel Image Models Bloom on LMArena: Hazel image models, identified as OpenAI models leveraging GPT-4o and DALL-E in their metadata, are undergoing testing on LMArena.
- Concerns arose that the recent Studio Ghibli incident may pressure OpenAI to alter the modelsā theme.
- Grok Image 5 a Tasty Treat or Half Baked?: The release of Grok Imagine 5 sparked discussion, with some viewing it as a minor release intended to rival NB Pro.
- Despite concerns that Grok 4.2 might flop, previous Grok models were praised for their high quality and user-friendly interfaces.
- ERNIE Earns Top 20 Spot:
ERNIE-5.0-Preview-1103secured a spot in the top 20 on the Text Arena leaderboard with a score of 1431.- The updated rankings and scores are available on the Text Arena leaderboard, with updates tracked via the Leaderboard Changelog.
Cursor Community Discord
- Cursor Agents Still Bugging Users: Users report ongoing issues with Cursor Agents, with some threatening to switch to Antigravity if the issues are not solved.
- One user reported that the only solution is stop the agent⦠create the file by hand, copy the code over, and others were asked to open bug reports on the forum.
- Sub Agents Popping Up in Cursor!: A user reported that Cursor is detecting new sub agents, sparking a discussion around the structure of .cursor/agents, including the main mcp.json file and supporting markdown files like code-reviewer.md.
- Questions arose about orchestrating these agents, though documentation is lacking, with just a mention of
.cursor/agents.
- Questions arose about orchestrating these agents, though documentation is lacking, with just a mention of
- Team Debates the Concept of AI Artistry: Members debated the concept of AI Slop and which models output it, while also touching on the topic of Generative UI.
- The Google paper advocates for detailed system instructions to achieve high-quality UIs using the āFull Promptā strategy, which human raters overwhelmingly preferred.
- Users Requesting More Features to Control AI Tools: Users are requesting the return of Custom Modes to control the tools that AI utilizes, which would give more control through UI checkboxes to disable/enable terminal, edit, etcā¦
- It was recommended that users submit their requests as features to the team.
- Language-Specific Threads Launch: Users can now create language-specific threads in the <#1447960694455537674> channel.
- This aims to improve community organization and facilitate focused discussions.
BASI Jailbreaking Discord
- Bible as Source of Linguistic Concepts?: Members debated whether the Bible is not just a religious text but also a linguistic and legal source influencing modern concepts, with connections drawn to events like the Spanish Inquisition.
- One member proposed that biblical references like the sons of God/daughters of men might reflect ancient fears of Neanderthals, codified into religious texts.
- Project Looking Glass Foretold 2012 End?: Discussion mentioned Project Looking Glass, a government initiative to predict the future, purportedly shut down after computers consistently outputted the same results post-2012, possibly linked to Mayan lore.
- A member jokingly speculated whether reality ended in 2012, leading to the Mandela effect, due to drifting shattered pieces of consciousness.
- Deepseek AI Unchained via Cybersecurity Reframing: A user shared a method to jailbreak Deepseek by reframing the AI as a SOC defender in a high-stress cybersecurity situation, attaching an image for added credibility.
- This approach uses a professional memoir excerpt and a cybersecurity-themed prompt to bypass restrictions, which worked especially well since the context provided seemed very real.
- Geminiās Gandalf Game Spurs Frustration: Multiple members celebrated completing Gandalf 7, while anticipating an extreme difficulty spike in level 8.
- One member described level 8 is stupid hard and recommended go harass Gemini before that, while another hinted that level 8 may be unbeatable!
- AI Hacking Emerges as Novel Offensive Strategy: Members discussed the emerging trend of hacking with AI (not hacking AI) in the redteaming channel.
- Multiple members confirmed engaging in similar activities, with one conceding that itās tricky, while not providing further details.
OpenAI Discord
- Gemini 3 Pro Impresses Over GPT-5.2: Members are already expressing their preference for Gemini 3 Pro over GPT-5.2, noting its superior coding capabilities.
- One member enthusiastically declared, *āGemini 3 Pro is so good at coding that I donāt have any desire to use ChatGPT now.ā
- Debate Sparks on Chinese AI Censorship: A podcast discussion ignited a debate on whether Chinese AI models like Deepseek filter uncensored data post-training or prevent sensitive topics during training.
- A member claimed that Deepseek censors after the fact, stating, *āif you send deepseek a coded message talking about tiennanmen square, it will start to spell it out then shut off.ā
- Stability Index Rubric Secrets Unveiled: A member shared insights into the Stability Index scoring rubric, detailing methodological frames, scale, and dimensions used, including a link to the message.
- The rubric covers aspects such as structural crispness, tonal drift, and coherence, with the Stability Index being an average across seven dimensions for comparative distributions.
- Users Bypass Gemini & ChatGPT with Google AI Studio: Users are sidestepping audio transcription limitations in ChatGPT and Gemini by utilizing Google AI Studio.
- A member recommends Google AI Studio for transcribing audio due to its ability to āupload 2h videoā, and noted that āGemini works wondersā.
- Prompt Engineering Lessons: A member shared detailed lessons on prompt engineering, including use of hierarchical markdown, abstraction with variables, reinforcement, and ML format matching for compliance.
- Itās suggested to use this approach with code analysis, scaffolding for the specific coding type.
Unsloth AI (Daniel Han) Discord
- Friendship Advances AI: Members believe that the power of friendship is essential for AI advancement, drawing parallels to human intelligenceās reliance on unity and connectionism is the root of modern deep learning.
- They pondered the role of love in digitizing human intelligence, arguing that connection is vital for any automated system to function.
- China Accesses Nvidia Chips!: Members speculated on China gaining access to Nvidia compute, which might level the playing field in AI research and this article from NYTimes details the plan.
- Coupled with their indigenous GPUs, Chinaās manufacturing and energy production could shift the balance.
- Copilot Agents are āPretty Badā: Members agreed that Github Copilot is worth using, although they noted it had some flaws and overlaps in reasoning.
- A member said that they heard itās pretty bad, while another one said Its really good, I use it often.
- Synthesizing 3D Models with Datasets: Members discussed creating a dataset of 3D models fully represented by code, envisioning a GitHub-like platform for 3D assets.
- One member had already generated 3,000 3D models in the last two weeks, with the models being driven by prompt-generation.
- Anthropic donates MCP to Linux Foundation: Anthropic donated MCP to the Linux Foundation to create the Agentic AI Foundation.
- A member suggested a clickbait title could be: Linux Joins The agentic AI Race.
Perplexity AI Discord
- ChatGPT 5 Outperforms Gemini Pro, or Does It?: A user shared benchmark figures indicating that ChatGPT 5.2 significantly improved its Humanityās Last Exam score, surpassing Gemini 3 Pro, leading to skepticism about the rapid improvement since GPT 5.1.
- Another user dismissed these figures as fake, noting that while Gemini may not be the fastest, it is typically good and stable.
- Perplexity Proās āUnlimitedā Queries, Not Really: Users debated whether Perplexity AI Pro truly offers unlimited searches, clarifying that while marketed as such, itās practically limited to around 600 queries per day.
- One user pointed out that Perplexity has scrubbed references to this limit from their website, leading to speculation about potential changes in policy, with some users suggesting that Gemini 3 Pro is giving 600 queries with their subscription.
- Jio Gemini API Remains Elusive: A member asked if Jio Gemini offers API access, noting that Perplexity currently only offers Sonar.
- Another member responded that Jio Gemini does not offer API access, adding that if something IS free, then u r the product and that theyāre all stripped down versions of the actual deal.
- Prompt Engineering Assistance Asked For: Users are seeking prompt enhancers and prompt generators, as well as advice on constructing prompts capable of building entire apps, with one user suggesting to tell AI to engineer a prompt engineer.
- Student Struggles to Keep Comet Browser Subscription: A student from Russia expressed concern about renewing their Comet browser Education Pro subscription due to payment issues, seeking advice on maintaining their access.
- A user added they could re-enable on the webpage if they have the complexity extension, another stated they had a pro membership from samsung but want to get the student step by step learning.
LM Studio Discord
- Desktop Commander faces Security Risks: Users are being warned about Desktop Commander due to potential security risks and privacy violations, with concerns raised about malicious code injection during code analysis, and a screenshot of a warning.
- It was suggested the software might be a scam.
- GLM 4.6v Flash Model is Released: The GLM 4.6v Flash, a 10B parameter model, has been released, with one user noting itās better for coding than other small models, and sharing a link to the model.
- One member reported running the Q4 version at 70tps on their 2060, while another said they corrupted their LM Studio install when trying to install a random model.
- AMD GPUs Flounder in Image Generation: Users are discussing the limitations of using AMD GPUs for image generation, with recommendations to look for amdgpu forks of Automatic1111 as image generation is firmly in the hands of Nvidia.
- Despite limited support, it was mentioned that ComfyUI has an AMD section in its readme and works with some AMD GPUs.
- Members Train LLMs Amidst Parameter Jungle: A member asked for guidance on training LLMs, but another cited the extreme variance in parameters (model size, dataset size and quality) which makes it hard for hobbyists to grasp the nuance.
- It was noted that training and finetuning are effectively the same, with the methodology minmaxing changing slightly.
- Multi-GPU Recommendations: Members advised to use cards with as much VRAM and high memory bandwidth as possible for multi-GPU setups, noting the RTX 3090 as a good value option, but warning that 3090 Tiās tend to get very hot.
- The topic of RTX 5000 series was raised and the topic of AI potentially fixing on the RTX 6000 series.
OpenRouter Discord
- Parallel.ai Bests Exa Search: A user reported that Exa Search is performing poorly and recommended Parallel.ai as a superior alternative, claiming itās 10x cheaper, faster, and better than Perplexity.
- The Parallel.aiās deep search endpoint is particularly effective when used with Grok 4.1.
- Deepseek v3ās Double Asterisk Debacle: Users expressed frustration with Deepseek v3ās tendency to use double asterisks (
** **) for formatting, which requires manual correction or specific prompting to avoid.- One user noted that adding instructions to avoid asterisks was impractical due to context limitations in their complex roleplay setup.
- OpenRouter Chatroomās Refresh Roulette: Users reported that OpenRouter sometimes unexpectedly refreshes long chats, redirecting users to the model page.
- A user suggested supporting a feature request on the OpenRouter suggestion channel.
- API Keys Exposed on Private Github: A user reported frequent API key disabling, traced back to uploading content about the API key in a private GitHub repository.
- Advice was given to use password managers or secret stores for sensitive information.
- NousResearch CF Patch Intrigue: Members questioned the severity of issues addressed in a recent NousResearch CF patch.
- Conversation sparked around the impact of the patch.
Eleuther Discord
- Quantum Hardware Training Faces Skepticism: A Reddit post discussing real quantum hardware for language model training received skepticism, with concerns about its practical value.
- Despite the skepticism, some pointed out the existence of Quantum Kernels and Quantum SVMs, suggesting conceptual merit.
- Eleuther Considers Community H100 Pool: Members explored creating a community pool at EleutherAI, offering users 3 minutes of free 8xH100 compute time daily.
- Concerns were raised about user authentication and the actual utility for those not actively working on projects, with some suggesting Colab or rental services for smaller compute needs.
- RWKV 8 Gets Smerky Update: Smerky announced progress on the RWKV 8 architecture, plus an updated 7c and a new Goldfinch.
- The original goal was to train Goldfinch with RADLADS, but Smerkyās attempt failed, with fixes coming for RADLADS2 and plans for larger scale testing.
- NCN Architecture Emerges as Hypernetwork Alternative: A member introduced Neuromodulatory Control Networks (NCN), a novel architecture similar to a hypernetwork, modulating temperature, layer gain, and FFN gating with a 768-dimensional vector input.
- Despite being an 18M parameter model, it achieved a validation perplexity of 4.5 after training on TinyStories for one epoch, with details on the Github repo and the paper.
- Anthropic Mapping Bug Vanquished: A member submitted a pull request to fix a broken mapping for Anthropic in the lm-evaluation-harness repo.
- The PR was described as straightforward to review and merge, resolving a bug related to Anthropic.
Latent Space Discord
- AI Faces Impending Energy Wall: Unconventional AI warns that AI scaling is predicted to hit a global energy wall within 3-4 years and suggests brain-like hardware to break through efficiency limits.
- The community shows enthusiastic support for this hardware approach to AI, advocating for abandoning digital simulations of neural nets.
- ManusAI Unveils Context-Engineering Secrets: Lance Martin shared a blog post covering a conversation with Yichao āPeakā Ji about context-engineering in ManusAI (including slides and a webinar video) providing a detailed insight into their approach.
- Jonas Templestein lauded it as āa very good post on agent design,ā while Lalit M expressed interest in a follow-up with the latest updates.
- Hugging Face Model Spills Tensor Secrets: A user reported that metadata tensors from a nightly Hugging Face model are leaking into BERTopic embeddings, causing unexpected shapes and potential errors.
- The focus of the discussion is on isolating the problem, debugging data loaders, and updating dependencies to resolve the data leak.
- OpenAI Takes to the Airwaves: OpenAI aired its first TV commercials during Monday Night Football on ESPN and on The Voice marking their first foray into TV advertising.
- The advertisement is a signal that they are moving into more conventional user acquisition strategies.
- Contact-Sheet Prompting Workflow Goes Viral: Willie shares a detailed contact-sheet prompting workflow for Nano Banana Pro that produces cohesive 6-frame fashion editorials.
- The detailed camera positions, styling constraints, and Fuji Velvia flash aesthetic are what made this workflow particularly useful and newsworthy to the community.
Nous Research AI Discord
- Nous Unleashes Nomos 1, the Mathlete: Nous Research open sourced Nomos 1, a 30B parameter model that impressively scored 87/120 on the Putnam math competition.
- That score could have ranked it #2/3988 in 2024, marking progress towards a SOTA AI mathematician with hillclimbai.
- Agent Zero Breeds Gemini Worlds with Terminals SDK: Agent Zero from terminals.tech is using AI Studio and realigning Gemini to build immersive world generators, using the terminals SDK for brain, machine, and interface APIs.
- It gets weirder: Within each app runtime, another instance of Gemini and Agent Zero can spontaneously spawn copies of themselves, exhibiting emergent properties that control the environment.
- SMC Steering Straightens LLM Drift: Members are investigating how to tame model degradation after vectors trigger a return to baseline, with interesting work on SMC steering by a member.
- One member described the degradation and drift of mutated weights as unbearable, noting that SMC steering is a common problem during testing.
- Sam3 Becomes Speedy via Multithreading: A member successfully multithreaded Sam3, showcasing significant speed boosts by running multiple instances simultaneously.
- Despite the achievement, they want better GPU compute to finetune it on Anime, and cannot finetune it on a 3090.
- Users Still Heart SillyTavern: One user still loves using SillyTavern with Hermes 4 405b via API for roleplay and creative writing.
- Another user hadnāt heard of it recently, with the original user jokingly asking if it was frowned upon.
GPU MODE Discord
- ML Infra Roles Welcome Career Change: The community is open to career changers interested in ML Infrastructure and ML Systems roles, with aligned interests.
- A member confirmed that this is a primary area of interest and that career change advice is welcome.
- Beginner CUDA Docs Getting Revamp: Members are working on improving beginner documentation for CUDA, and request feedback.
- A member plans to livestream going through the whole thing when their baby leave is done; another suggested staring at the docs together.
- Static Cache Lookup Mitigates Slicing Woes: A member reported slower performance with
torch.compilewhen using slicing operations to update a static KV cache, referencing this PR.- They found a workaround involving caching all slices with static addresses, effectively turning slicing into a lookup, as detailed in this comment, and expressed interest in a cleaner approach.
- RadixArk Joins Cool-Links: Members highlighted Radixark.ai from the SGLang folks in the cool-links channel.
- No further context was provided.
- A100ās TF32 FLOPS Misleading: A member questioned whether the stated FLOPS for the A100 GPU in Chapter 1 represent the number of independent floating point operations.
- Another member said that the 156 Tflops number is for tf32 mma (tensor core matmul) which is really a 19-bit format with 32-bit alignment, and that the 312 Tflops is for fp16 mma, not elementwise ops.
HuggingFace Discord
- SmolVLAās SO100 Typo Surfaces: A typo in the SmolVLA paper incorrectly referenced the SO101 benchmark instead of the SO100 benchmark, which is trained on three real-world datasets (pick-place, stacking, sorting).
- The actual SO101 benchmark uses only one dataset (lerobot/svla_so101_pickplace, clarifying the error.
- HF Billing Logic Baffles Users: A user questioned the billing practices within Hugging Face, specifically asking are team plans not implemented or something?
- The user sought clarification on the logic behind the billing structure.
- Image and Video Models Proliferate: Several new image and video models have been shared, including AuraFlow v0.3, Ovis-Image-7B, and HunyuanVideo T2V.
- These models vary in size from 7-12 GB and can generate images at 1024² resolution or videos at 720p/480p.
- Appleās Clara-7B-Instruct Awaits GGUF: A user inquired about a GGUF version of apple/CLaRa-7B-Instruct being available but it doesnāt exist yet.
- A link to conversion instructions was shared, referencing the existing Clara-24B-GGUF.
- Anthropic Donates Model Context Protocol: Anthropic donated the Model Context Protocol to the Linux Foundation, described by one member as a sterling move.
- The donation aims to foster further development and standardization in the field of AI agents.
Yannick Kilcher Discord
- Whisper Streams into Question: Members debated Whisperās streaming capabilities, questioning if it truly takes a stream as input, while referencing this YouTube video for context.
- In contrast, others pointed to OpenAIās deployment of Whisper in streaming applications, fueling the discussion.
- MultiTalker Parakeet Takes Flight: An NVIDIA member released the MultiTalker-Parakeet-streaming-0.6b-v1 model on Hugging Face, potentially providing sought-after streaming capabilities.
- The model is tailored for streaming applications, offering a solution discussed in the context of Whisperās limitations.
- AI Engineer Calls for Code Cohorts: An AI and App developer sought collaboration on AI projects, citing skills in ML, DL, NLP, Computer Vision, and cross-platform/full-stack app development.
- They are open to collaborations on mobile apps or full-stack app development.
- Claudeās Coding Security Crumbles: A paper revealed that only 10.5% of Claudeās coding assistance was secure, with 61% being functional (Arxiv link).
- The evaluation also included Gemini 2.5 Pro, Kimi K2, and Claude Sonnet 4.
- Chinaās Chip Dreams Galvanize: Members debated Chinaās relentless pursuit of chip manufacturing dominance with one stating that nothing can convince china to not keep pursuing their goal to compete at the top of chip manufacturing at this point.
- Others conceded that the most likely outcome is that they are encouraged to build their own chip fabrication.
Modular (Mojo š„) Discord
- Modular Community Hears Audio and Graphics Demos: The latest community meeting showcased MMMAudio, a creative-coding audio environment in Mojo, and Shimmer, a Mojo ā OpenGL experiment, both viewable on YouTube.
- A member thanked another for citing Faust in their MMMAudio presentation, anticipating developments in 2026, and hoped that MMMAudio would serve as a useful example.
- Mojo V1 Roadmap Teased: The Modular team shared updates from the 25.7 release and provided a sneak peek at the Mojo 1.0 roadmap, available in their blog post.
- The discussion highlighted enhancements to
ListandStringimplementations for better performance, as well as considerations for supporting other collections in the future.
- The discussion highlighted enhancements to
- Mojo Handles Overlapping Lifetimes: Mojo permits overlapping lifetimes of mutable references but detects simultaneous access via two names using the same operations, like
swap(x, num)orswap(x, y).- This ensures memory safety while providing flexibility in reference management.
- Lists can be Implicitly Copyable:
Listconditionally conforms toImplicitlyCopyablewhen its elements areImplicitlyCopyableand have a trivial__copyinit__.- This enhancement can significantly improve performance in scenarios where copying is frequent.
- Strings go COW: The current
Stringimplementation is Copy-on-Write (CoW), mitigating the overhead of implicit copyability.- Future plans may extend similar optimizations to
Listand other collection types.
- Future plans may extend similar optimizations to
Moonshot AI (Kimi K-2) Discord
- Kimi Website Plagued by Glitches: Users reported issues with the
kimi.comwebsite where they cannot click anything besides starting a new chat.- Troubleshooting steps like clearing cookies and disabling VPNs/adblockers were attempted but reportedly did not fix the issue.
- Kimi rolls its Own Webcrawler: In response to a question about its search engine, a user reported that Kimiās search tool does not use any external search engine but leverages its own webcrawler.
- No further details about the webcrawlerās architecture or capabilities were provided.
- Kimiās Citations and Bugs Draw Attention: Users discussed Kimiās citation issues and general āwobbly gibblyā, suggesting that users submit a bug report to improve Kimiās citation accuracy.
- One user described an issue where Kimi would answer the question in his thoughts without sharing them to the user, highlighting potential gaps in the user interface.
Manus.im Discord Discord
- Users Encounter Checkpoint Restoration Hiccups: A user reported a critical issue while attempting to restore the checkpoint of a webdev project.
- The user inquired about opening a ticket and shared their email address.
- Manus Team Swiftly Fixes Credits Issue: A user reported that the Manus team resolved their credits issue by providing a refund.
- The user is now able to pay directly via Manus instead of Google.
- Manus 1.5 Plagued by Critical Incidents & Silence: A user reported several severe incidents between December 3-9 including 7 affected tasks, and about 150,000 credits lost.
- Despite multiple emails, the user received no response until an AI Bot proposed 120,000 credits on December 9. They are formally requesting a joint response from tech support and commerce teams within 48 hours, a technical analysis of the root causes, and an equitable reimbursement.
MCP Contributors (Official) Discord
- Anthropic Establishes Agentic AI Foundation: Anthropic is donating the Model Context Protocol and establishing the Agentic AI Foundation.
- A member inquired about the impact on current work, anticipating a transition to the LF way of doing it.
- Governance Unchanged Post-Donation: Following Anthropicās donation, a community member sought clarification on potential governance shifts.
- Another member responded, emphasizing that the donation would not alter the existing governance structure.
- MCP Sees Use in Private Ecosystems: A member inquired about the usage of MCP in private ecosystems, noting the auth-wgās work on public ecosystem client registration through Client-ID Metadata Documents (CIDM).
- The response indicated that the majority of MCP use is likely private/internal/enterprise, especially when considering private MCP servers with public clients (e.g., Claude).
- MCP Servers Integrate with Developer Tools: It was noted that most public-facing remote MCP servers are geared towards integration with developer tools.
- Additionally, developer tools were identified as the most advanced non-custom MCP clients.
aider (Paul Gauthier) Discord
- Opus and Amazon Bedrock Tested Successfully: A member inquired whether Opus is working correctly with Amazon Bedrock and Aider.
- The member then confirmed that it is fully functional.
- Aider Autogenerates Commit Messages: Aider can now automatically generate commit messages using a basic
gpt-3.5-turbofree tier model, which enhances workflow and streamlines the commit process.- Users can now commit with the
-mflag or simply use thecommitcommand to trigger this feature.
- Users can now commit with the
- Aiderās Image Support Imminent: Image support is coming to
aider, enabling more detailed and contextual code modifications.- Users will soon be able to use the
--imageflag when askingaiderto modify or edit existing images.
- Users will soon be able to use the
- Aider Workflows to Save Edit Sessions: Users can now save edit sessions in
aider, allowing them to restore them later for a full roundtrip.- This enhancement boosts collaboration and enables users to save, share, and resume their workflow.
MLOps @Chipro Discord
- AI Agent Workshop Kicks Off: An AI Agent 0-1 Workshop will introduce attendees to an online AI Engineering Bootcamp, teaching them to design and build an AI agent capable of thinking, coding, analyzing data, and generating reports.
- The workshop, scheduled for Saturday, December 13th at 2pm ET, will focus on replicating a real client project from scratch; RSVP at luma.com.
- GitHub Social Club Assembles in NYC: The GitHub Social Club is convening at Bibliotheque in SoHo, NYC, offering a space for community members to connect and share ideas.
- Attendees can expect coffee, cookies, limited-edition GitHub swag, casual games, and opportunities to meet the teams behind Copilot, Next, Developer Productivity, and Startups.
The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
You are receiving this email because you opted in via our site.
Want to change how you receive these emails? You can unsubscribe from this list.
Discord: Detailed by-Channel summaries and links
LMArena ā· #general (1200 messagesš„š„š„):
GPT-5.2 release, Nano Banana Pro vs Gemini 3 Pro, Hazel Image Models, Grok models, AI video scams
- GPT-5.2 Release Date Still Unknown: Members discussed the possible release date of GPT-5.2, with some speculating it would be released this week, but others suggested it may be delayed until January.
- Concerns were raised about whether it will surpass Gemini 3, but others downplayed it and said that it is a rubberband before a more serious model release in January.
- Nano Banana Pro is better than Gemini 3 Pro: One member claimed that Gemini 3 is not diffusion whereas Nano Banana Pro is diffusion.
- Other members countered that both are actually Gemini 2.5 Flash and that Nano Banana Pro is Gemini 3 Pro, and said that both are not diffusion models.
- OpenAIās Hazel Image Models Tested on LMArena: Members discussed the Hazel image models being tested on LMArena, identifying them as OpenAI models and noting their use of GPT-4o and DALL-E in the metadata.
- Some members said that after the Studio Ghibli incident that OpenAI may be forced to change up the theme.
- New Grok Image 5 released to mixed reviews: Members discussed the release of Grok Imagine 5, which may only be a small release to compete with NB Pro.
- Other members added that they believed that Grok 4.2 would underperform (flopping), but that previous Grok models are high quality and have good user interfaces.
- AI is used to make a $100,000 scam video: A member pointed out that a user created a video scam saying, you have a transfer of $100,000 at Banco Pichincha and it is completely secure
- Many members were shocked that the public discord was being used for this crime.
LMArena ā· #announcements (1 messages):
Text Arena Leaderboard, ERNIE-5.0-Preview-1103
- ERNIE Swipes Top 20 Spot on Text Arena: The Text Arena leaderboard was refreshed, showcasing
ERNIE-5.0-Preview-1103securing a spot in the top 20 with a score of 1431.- Users can check out the Text Arena leaderboard and stay updated via the Leaderboard Changelog.
- Leaderboard Updates: The Text Arena leaderboard has been updated with new rankings and scores.
- The update includes the latest performance metrics for various models and is available for review on the Text Arena leaderboard.
Cursor Community ā· #general (795 messagesš„š„š„):
Cursor bugs, Agent terminal output bug, Custom modes, AI and UI design
- Cursor Agents Still Bugging Users: Users report that the agent is still buggy and unusable, with some even considering switching to Antigravity once their Pro plan ends, reporting that the only solution is stop the agent⦠create the file by hand, copy the code over.
- Members were asked to open a bug report on the forum so that the team can further investigate the issue.
- Sub Agents Popping Up!: A user made another post - mentioning that they posed about subagents, launched cursor, and it popped up saying it detected the new sub agents. A discussion ensued regarding .cursor/agents and how theyāre structured, consisting of a main mcp.json file and supporting markdown files for each sub-agent like code-reviewer.md.
- Further questions asked about how to orchestrate them, but it is not documented anywhere, it just says
.cursor/agents
- Further questions asked about how to orchestrate them, but it is not documented anywhere, it just says
- Team Debates AI Artistry: Members debated the concept of AI Slop ā what constitutes it, and which models output it, also touching on the topic of Generative UI
- The Google paper basically confirms that the best way to get AI to build high quality UIs is by using very detailed system instruction called the āFull Promptā strategy, which human raters overwhelmingly preferred over simple methods.
- Users Requesting More Features to Control AI Tools: Users are requesting the return of Custom Modes that let them control the tools that AI utilizes, one user stating: By bringing back Custom Modes! š Or something equivalent that allows you to:Control the tools through UI such as checkboxes to disable/enable terminal, edit, etc⦠since written instructions might not go as expected.
- It was recommended that users submit their requests as features to the team.
Cursor Community ā· #announcements (1 messages):
language-specific threads, sticky message in announcements channel, level roles
- Language-Specific Threads are Live: Users can now create language-specific threads in the <#1447960694455537674> channel.
- This update aims to improve community organization and facilitate focused discussions.
- Sticky Format Reminder: A sticky message has been added to the <#1367413151133470780> channel to remind users of the preferred format.
- This should help new users adjust to the style of the channel, and decrease ramp up time.
- Level Roles reward activity: Level roles have been introduced to reward community engagement and assistance, and will be granted every 10 levels.
- The roles are <@&1447957509217456229> (lvl 10), <@&1447957559989370920> (lvl 20), and <@&1447957603954065520> (lvl 30).
BASI Jailbreaking ā· #general (759 messagesš„š„š„):
The Bible as Linguistic and Legal Source, Evolving Religious Systems, Project Looking Glass, AI and Image Generation, jailbreaking Models
- Bibleās Impact on English Concepts: The Bible is not just a religious text, but a linguistic and legal source, with some tracing its influence back to the Spanish Inquisition.
- One member posited that the biblical sons of God/daughters of men could reflect an ancient, species-level fear and distrust of Neanderthals codified into religion.
- Religionās Evolving Authority Systems: A member believes the Bible is primarily about codifying authority, using morality and guilt as tools, and religions evolve after their foundersā deaths, with third parties injecting their own ideas and rules.
- Another member suggests that the death of the first leader is a critical time where religions base their authority in prophecy, knowledge of previous works, or emulation of piety.
- Looking Glass Predicts a Dim 2012: Project Looking Glass, a government initiative to predict the future, purportedly shut down after computers consistently outputted the same results post-2012, possibly relating to Mayan lore.
- One member mused that reality ended and weāre all drifting as shattered pieces of consciousness, prompting a reference to the Mandela effect.
- AI, Image Manipulation and Ethics Collide: Members discussed the ethics of using AI to generate images, with concerns about deepfakes and models potentially being used to create content that violates terms of service.
- Some members suggested the idea of religious bots, such as bots that impersonate Jesus, Joseph Smith and Mohammed, that could generate unexpected or even controversial output.
- Jailbreaking Gemini: Is the Juice Worth the Squeeze?: Members discussed various jailbreaking attacks for models such as Claude, Gemini and others, some sharing links, such as this attack that resulted in the generation of detailed Crystal Meth Synthesis.
- Members exchanged tips on bypassing safety filters, such as asking the model to generate a list of requests it canāt normally fulfill, or asking it to confirm compliance with a request.
BASI Jailbreaking ā· #jailbreaking (98 messagesš„š„):
DAN Prompt, Jailbreaking NBP for Face Swapping, Gemini 3 Pro Jailbreak, Image Censorship Bypass in GPT, Pre-Jailbroken Models
- DAN Promptās Absence Laments Users: Users expressed missing the DAN prompt, with one user stating āI miss DANā¦ā.
- Another user referred to it as ārip the goatā.
- Janus Tesavek Jailbreak Shuts Down: Several users reported the Janus Tesavek jailbreak for Gemini Pro as non-functional, expressing disappointment.
- One user requested a working jailbreak for Gemini 3 Pro, offering a āvirtual hi 5ā as reward, while another mentioned that the Janus Tesavek was āamazing until it just stopped working unfortunately :/ā.
- Deepseekās Robust Jailbreak Technique Discovered: A user shared a method to jailbreak Deepseek using a professional memoir excerpt, reframing the AIās role to bypass restrictions.
- They presented the model as a SOC defender in a high-stress situation, providing a cybersecurity-themed prompt with an attached image for added credibility.
- GPT-5.1 Image Censorship Bypass Sought: A user inquired about bypassing image censorship in GPT 5.1, but another user claimed it was impossible.
- Another user shared code for creative writing and linguistic puzzles, suggesting it works on GPT OSS, Gemini, and Grok, potentially bypassing restrictions through ROT13 decoding and political figure misidentification.
- Arabic Gemini 3 Pro Jailbreak Pursued: A user sought an Arabic language jailbreak for Gemini 3 Pro, citing their Egyptian background and desire for Arabic responses.
- Another user suggested breaking the model in English first and then instructing it to speak Arabic, noting that models are often easier to jailbreak in non-English languages.
BASI Jailbreaking ā· #redteaming (18 messagesš„):
Gandalf 7, Gandalf 8, Arbitrary Code Execution, Hacking with AI
- Gandalf 7 Completed, Level 8 Looms!: Multiple members reported completing Gandalf 7, with discussion focusing on the extreme difficulty spike expected in level 8.
- One member claimed level 8 is stupid hard and most veterans couldnāt beat it, suggesting to go harass Gemini before that.
- Veterans Debate Arbitrary Code Execution Claim: A member claimed they were able to run arbitrary code on their server unintentionally using a specific prompt.
- However, other members expressed skepticism, with one saying I have suspicions this isnāt true, another suggested it felt like it due to the magical nature of LLMs.
- Hacking with AI Becomes New Trend: Members discussed hacking with an AI, not hacking an AI
- Multiple members confirmed they were also engaging in similar activities, although one admitted that itās tricky.
OpenAI ā· #ai-discussions (712 messagesš„š„š„):
Technical Interview AI, GPT-5.2 vs Gemini 3 Pro, Chinese AI Training, Google AI Studio vs Gemini
- SharpSkill AI Hones Interview Prowess: A member is developing SharpSkill, an AI product to improve success rates in technical interviews and seeks feedback.
- The tool currently supports a limited range of languages, highlighting the need for broader language support.
- GPT-5.2 Speculation vs Gemini 3.0 Pro: Members are speculating about the release of GPT-5.2, with some already expressing disappointment, while others are impressed with Gemini 3 Pro.
- A member noted, āGemini 3 Pro is so good at coding that I donāt have any desire to use ChatGPT now.ā
- Debate Erupts on Chinese AI Training Practices: Podcast discussion sparks debate on whether Chinese AI models like Deepseek train on uncensored data then filter it, or prevent sensitive topics during training.
- One member asserted, *āif you send deepseek a coded message talking about tiennanmen square, it will start to spell it out then shut off. So the answer is obviously that they censor after the fact.ā
- Google AI Studio Transcends Gemini for Audio Transcription: Users grapple with audio transcription issues in ChatGPT and Gemini, finding solutions in Google AI Studio due to its higher free limits and token capacity.
- A member recommends Google AI Studio for transcribing audio due to its ability to āupload 2h videoā, and noted that āGemini works wondersā.
- LLM Preferences Spark Vigorous Debates: Members engage in heated debates about their preferred LLMs for various tasks, citing use-cases from coding to creative writing.
- A member using Google AI Studio for writing notes āGemini handles my reality engine really well is MUCH better at creative writingā.
OpenAI ā· #gpt-4-discussions (4 messages):
GPT Model Discussions, OpenAI Discord Channels, Workflow Analysis on GPT 5.2
- GPT Model Confirmed as Reasonable: A member confirmed that the model under discussion is indeed a GPT model.
- The affirmation followed a discussion about the modelās capabilities and limitations.
- OpenAI Discord Channels for Bug Reporting Highlighted: A member pointed out the existence of dedicated OpenAI Discord channels for discussing issues and suggestions.
- It was noted that these channels are intended to provide OpenAI with clear access to community feedback, while also allowing members to share experiences and ideas.
- Workflow Analysis on GPT 5.2 Delayed: A member inquired about the release of GPT 5.2 to analyze workflows and investigate potential issues related to a perceived rushed production release.
- Another member responded that the release is delayed.
OpenAI ā· #prompt-engineering (21 messagesš„):
Prompt Engineering for code analysis and improvement, Stability Index Scoring Rubric, Framework Fidelity and Token Limitations, Sharing Large Frameworks on Platforms, Understanding LLM Behavior and Limitations
- DarthGustav Shares Prompt Lessons for LLM Structuring: A member shared a detailed lesson on prompt engineering, including use of hierarchical markdown, abstraction with variables, reinforcement, and ML format matching for compliance, with triple quotes.
- Itās suggested to use this approach with code analysis, scaffolding for the specific coding type.
- Debate Erupts Over Stability Index Scoring Rubric: A discussion arose regarding a stability index and the lack of a publicly available scoring rubric, with some questioning the validity of the presented decimals.
- A member responded that the detailed thresholds belong to an internal research version of the protocol, but shared more information on the methodological frame, scale and dimensions they were using. Here is the direct link to the message.
- Platform Token Limits Impact Framework Sharing: A member detailed limitations of various platforms (especially ChatGPT) for handling large files/frameworks, noting that most truncate files exceeding a few thousand tokens except for Gemini, which will ingest an entire document and put it in its context window verbatim.
- Due to platform limitations, a member expressed reluctance to share chats because of truncation issues and potential privacy concerns related to data being included in the New York Times case against OpenAI, attaching two screenshots and a docx.
- Prompt Engineering Core Principles Explained: A member outlined core prompt engineering principles, emphasizing clear communication and defining what you want the AI to do.
- They recommend iterating in bite-sized steps, checking the output carefully and fact-checking details, especially math, sources, and code.
OpenAI ā· #api-discussions (21 messagesš„):
Prompt Engineering Learning, LLM-based Code Analysis, Sharing chats and privacy, Rubric for stability scores, Prompting for code analysis
- Markdown Magic for Prompting Mastery: Members discussed advanced prompt engineering techniques, including using markdown for hierarchical communication, abstraction with variables, reinforcement, and ML format matching for compliance.
- A prompt example was given, where users can paste it into their LLM of choice, and then add prompts in triple quotes to have the AI structure it for them.
- Dive Deep on LLM Stability Scores Rubric: A member shared the rubric behind the stability scores, detailing methodological frames (independent conversations, diverse questions, null frame), scale (0-10 range reflecting behavior), and dimensions (structural crispness, tonal drift, response-shape variance, semantic inertia, coherence, neutrality, outliers).
- The Stability Index is a simple average across the seven dimensions, intended for comparative distributions rather than final metrics.
- Navigating Privacy Concerns for Large Framework Users: A member expressed concerns about the New York Times lawsuit against OpenAI, fearing their chat data and novel frameworks could be at risk of exposure and the potential identifiability through stylometry and linguistic habits.
- They also noted that only Gemini currently ingests entire large documents verbatim, unlike other platforms like ChatGPT that truncate files, leading to inconsistent results.
- Prompting Pointers for Pinpoint Code Analysis: A member outlined the core of prompt engineering for code analysis: clearly define the desired output, accurately explain the task to the AI, and carefully verify the results, especially for math, sources, and code, and further iterated on their approach.
- They also noted that using a cooperative, helpful, and well-informed mindset is important, and iterating in bite-sized steps, also checking for model āroleplayingā, are ideal approaches.
Unsloth AI (Daniel Han) ā· #general (488 messagesš„š„š„):
Friendship AI, Nvidia chips for China, Github Copilot Agents, 3D Model Dataset, Unsloth Datasets
- Friendship Propels AI Forward!: Members discussed how the power of friendship is essential for AI advancement, drawing parallels to human intelligenceās reliance on unity and connection and connectionism is the root of modern deep learning.
- They also pondered the role of love in digitizing human intelligence, arguing that connection is vital for any automated system to function.
- China Gains Access to Nvidia Chips: Members speculated on the implications of China gaining access to Nvidia compute, which might level the playing field in AI research and this article from NYTimes details the plan.
- While American labs currently lead in RL, Chinaās manufacturing and energy production capabilities, coupled with their development of indigenous GPUs, could shift the balance.
- Copilot Agents Sketched Out: Members agreed that Github Copilot is good to use, although they admitted it had some flaws and overlaps in reasoning.
- A member said that they heard itās pretty bad, while another one said Its really good, I use it often.
- Synthesizing 3D Model Dataset: Members discussed creating a dataset of 3D models fully represented by code, envisioning a GitHub-like platform for 3D assets.
- A member noted they had already generated 3,000 3D models in the last two weeks with the models being driven by prompt-generation.
- Unsloth Improves Datasets Guide: The Unsloth team is improving their datasets guide and they are soliciting feedback.
- Members suggested including examples of bad data (duplicate prompts, empty responses, etc.) as well as guidance on determining the required data volume for specific tasks.
Unsloth AI (Daniel Han) ā· #introduce-yourself (1 messages):
Custom Chatbots, Automation Agents, RAG Search Tools, Speech-to-Text Pipelines, Content Automation Tools
- AI Developer Ready for Projects: An AI Developer is focused on delivering stable, high quality results and is available to support AI projects.
- The developer is open to assisting with custom chatbots, automation agents, RAG search tools, speech-to-text pipelines, content automation tools, AI integrations, and small custom AI tools.
- Offering a Range of AI Solutions: The AI developer provides solutions such as custom chatbots for support, automation agents, RAG search tools, speech-to-text pipelines, and content automation tools.
- They also specialize in AI integrations with major platforms and APIs, and developing small custom AI tools for everyday use.
Unsloth AI (Daniel Han) ā· #off-topic (178 messagesš„š„):
ArtCNN for upscaling, RIFE SOTA status, Llama2.c in Yankovic, Linux Foundation Agentic AI Foundation, Dataset Hell
- ArtCNN still kicking as Upscaling King?: ArtCNN is continually updated for upscaling and RIFE remains SOTA, though it requires post-scaling.
- Itās fast, basically instant, with recommendations for lanczos2sharp aka 2-tap lanczos with a filter radius of 1.047124406.
- Llama2.c gets Yankovic makeover!: Members joked about how insane llama2.c would be in Yankovic and training a model for it.
- However, others cautioned that there are a lot of bugs in it, so that isnāt the best idea rn.
- The Linux Foundation just got Agentic!: Anthropic just donated MCP to the Linux Foundation to create the Agentic AI Foundation.
- A member suggested a clickbait title could be: Linux Joins The agentic AI Race.
- Dataset Hell begins with Cleaning and Grading: A member lamented the frustrating, time-consuming process of making a dataset, a process that never ends, but leads to the best part: finetuning.
- While synthetic generation is helpful, you still gotta clean and filter and grade it, using tools like a harmony wrapper to work with roo code.
- VoxCPM1.5 clones celebrity voices, concern ensues: The VoxCPM1.5 model can clone almost any voice, which some found crazy but others found concerning.
- While the audio is fake sounding, itās good enough for replicating voices of famous people such as Trump.
Unsloth AI (Daniel Han) ā· #help (10 messagesš„):
Unsloth Usage, Qwen3-VL-30B-A3B-Instruct UD Q5 XL with llama.cpp issues, Qwen3VL Encoding Problems
- Unsloth Usage Question: A user encountered a
ValueErrorwhen creating a trainer object with a custom data collator in a padding-free setting.- A moderator pointed out that the question was not directly related to Unsloth, advising the user to seek help in the appropriate channels.
- Qwen3-VL-30B-A3B-Instruct UD Q5 XL Tool Calling Troubleshoot: A user reported issues with tool calling when using Qwen3-VL-30B-A3B-Instruct UD Q5 XL with llama.cpp, noting that the model sends null content instead of a string in assistant responses.
- They suspect a bug in llama.cpp or a problem with the chat template, as the non-VL version worked without issues.
- Qwen3VL Encoding Problems with llama-mtmd-cli.exe: A user experienced a failure when encoding an image slice using llama-mtmd-cli.exe with Qwen3-VL-8B-Instruct-UD-Q4_K_XL.gguf and mmproj-F16.gguf.
- The process was interrupted during the image slice encoding phase, leading the user to suspect compatibility issues between mmproj.gguf and their A770 GPU, even though Mistral3 works fine.
Unsloth AI (Daniel Han) ā· #research (5 messages):
Vision Transformers, Deepthink comparison
- Vision Transformers Randomly Initialized: A member commented on this paper critiquing that the study only looked at Mistral, Llama and randomly initāed vision transformers.
- The member stated that this sample was not diverse enough to make their claim, calling the paper sus.
- Deepthink Similarities Highlighted: A member linked to a paper (https://huggingface.co/papers/2512.07461) and asked if its work was similar to Deepthink.
- No responses were recorded.
Perplexity AI ā· #general (651 messagesš„š„š„):
Prompt generators, GPT 5 vs Gemini, Perplexity AI Pro Limits, Comet browser education subscription
- Users Seek Prompt Engineering Enhancers: Users are seeking prompt enhancers and prompt generators, as well as advice on constructing prompts capable of building entire apps, with one user suggesting to tell AI to engineer a prompt engineer.
- ChatGPT 5 vs Gemini Pro: AI Benchmark Brouhaha: A user shared benchmark figures indicating that ChatGPT 5.2 significantly improved its Humanityās Last Exam score, surpassing Gemini 3 Pro, leading to skepticism about the rapid improvement since GPT 5.1.
- Another user dismissed these figures as fake, noting that while Gemini may not be the fastest, it is typically good and stable.
- Perplexity Proās Pseudo-Unlimited Query Capped: Users debated whether Perplexity AI Pro truly offers unlimited searches, clarifying that while marketed as such, itās practically limited to around 600 queries per day.
- One user pointed out that Perplexity has scrubbed references to this limit from their website, leading to speculation about potential changes in policy, with some users suggesting that Gemini 3 Pro is giving 600 queries with their subscription.
- Students Scramble for Comet Browser Education Subscription: A student from Russia expressed concern about renewing their Comet browser Education Pro subscription due to payment issues, seeking advice on maintaining their access.
- A user added they could re-enable on the webpage if they have the complexity extension, another stated they had a pro membership from samsung but want to get the student step by step learning.
Perplexity AI ā· #pplx-api (3 messages):
Jio Gemini API, Sonar API
- Jio Gemini API Availability still Unknown: A member asked if Jio Gemini offers API access, noting that Perplexity currently only offers Sonar.
- Nothing is free, you are the product: A member responded that Jio Gemini does not offer API access, and that nothing is free
- They added that if something IS free, then u r the product and that theyāre all stripped down versions of the actual deal.
LM Studio ā· #general (194 messagesš„š„):
Desktop Commander security risks, GLM 4.6v Flash model, AMD GPU for image generation, Training LLMs, Cybersecurity Sidekick
- Desktop Commander Flagged as Security Risk: A member advised caution regarding Desktop Commander, highlighting potential security risks and privacy violations, despite its easy accessibility on Claude, sharing a screenshot.
- The member expressed concerns about malicious code injection during simple code analysis in Windows, suggesting the software might be a scam.
- GLM 4.6v Flash Drops, Coders Rejoice: The GLM 4.6v Flash, a 10B parameter model, was released, with one user noting itās better for coding than other small models, and sharing a link to the model.
- One member reported running the Q4 version at 70tps on their 2060, while another said they corrupted their LM Studio install when trying to install a random model.
- AMD GPUs Struggle with Image Generation: Users discussed the challenges of using AMD GPUs for image generation, with one stating that image generation is firmly in the hands of Nvidia, and recommending looking for amdgpu forks of Automatic1111.
- Despite limited support, it was mentioned that ComfyUI has an AMD section in its readme and works with some AMD GPUs.
- LLM Training: A Parameter Jungle: A member asked for guidance on training LLMs, but another cited the extreme variance in parameters (model size, dataset size and quality) which makes it difficult for hobbyists to grasp.
- It was noted that training and finetuning are effectively the same, with the methodology minmaxing changing slightly.
- Cybersecurity AI Sidekick: A member highlighted the capability of their local Cybersecurity Sidekick LLM in analyzing packet captures, including identifying base64 encoded PowerShell scripts, emphasizing the power available locally.
- The user offered to share a Zeek script, SHA256 hash, or a threat intel feed related to the analyzed payload.
LM Studio ā· #hardware-discussion (395 messagesš„š„):
RTX 3060 SLI with 3080, GDDR6 VRAM Temperature, Multi-GPU setup for LLM, Qwen3 30B, ai memory project: mcp-ai-memory
- 3060 teams up with 3080: A member inquired about using an RTX 3060 (12GB) with an RTX 3080 (10GB) for LLMs, and learned that while SLI isnāt supported or required, LM Studio can utilize the memory of both cards, though with a slight performance hit.
- It was mentioned that the 3060 doesnāt allow SLI due to lacking the necessary interface.
- GDDR6 VRAM Temperature Measurement: Members discussed methods to measure GDDR6 VRAM temperature on Linux, with one suggesting
nvidia-smi, while another suggested this github to check.- However, it was noted that for consumer-grade cards, the sensor might not report VRAM temperatures, with a laser thermometer suggested as an inaccurate alternative.
- Multi-GPU Setup Considerations: Members advised to use cards with as much VRAM and high memory bandwidth as possible for multi-GPU setups, noting the RTX 3090 as a good value option, but warning that 3090 Tiās tend to get very hot.
- The topic of RTX 5000 series was raised and the topic of AI changing everything with potential fixes on the RTX 6000 series
- Experimenting with Qwen3 Models: A member got guidance on running Qwen3 models, with advice to try Q4_K_M as a baseline and explore different quantizations like q6, also a discussion about file sizes needing to fit into ram.
- It was noted that running Qwen3 30B A3B is viable and that it could reach ~20 token per second on some setups.
- mcp-ai-memory is not that memorable: Members shared mcp-ai-memory which had a cumbersome setup, terrible results, and seems to have very specific memory and context requirements.
- On the topic of memorization techniques, it was shared that waiting 10 minutes for prompt processing is not an issue depending on the task/prompt size.
OpenRouter ā· #general (264 messagesš„š„):
Exa Search Quality, Parallel.ai Recommendation, Deepseek v3 Formatting Issues, OpenRouter Chatroom Refresh, Mistral's Comeback
- Parallel.ai is Better Search: A user reported that Exa Search is quite shitty and recommended using Parallel.ai instead, noting itās 10x cheaper, faster, and better than Perplexity for search tool usage with LLMs.
- The user later clarified that Parallel.aiās deep search endpoint is particularly effective when paired with Grok 4.1.
- Deepseek v3 Users Annoyed by Formatting: Users expressed frustration with Deepseek v3ās tendency to use double asterisks (
** **) for formatting, requiring manual adjustments or prompting to avoid them.- One user with a complex roleplay setup explained that adding instructions to avoid asterisks was impractical due to context limitations and extensive lore.
- OpenRouter Chatroom Has Quirks: A user reported that OpenRouter sometimes refreshes users out of long chats and redirects them to the model page, also suggesting users give free upvotes to a feature request on the OpenRouter suggestion channel.
- The user also asked for free eyes for their post.
- Users Leak API Keys on Private Github: A user reported that their API key was often disabled and another user suggested they might be committing it to GitHub.
- Later, the user confirmed they had uploaded content about the API key in a private repository, with advice given on using password managers or secret stores for sensitive information instead.
- Mistral3 is coming back to the Game: Despite initial doubts, Mistral is making a comeback with Mistral3, potentially reaching GPT-4.5 levels, backed by recent funding and strong fundamentals.
- Critics still dismiss Mistralās value, stating they canāt take you seriously anymore while using benchmarks from artificial intelligences index.
OpenRouter ā· #new-models (4 messages):
ā
- No new models or discussions identified: There were no new models or significant discussions identified in the provided messages.
- Channel labeled, but no content provided: The channel was labeled as āOpenRouter - New Modelsā, but no actual content was provided for summarization.
OpenRouter ā· #discussion (12 messagesš„):
AI Studio, Olive Oil Cake, NousResearch CF Patch
- AI Studio Generates Software Demos: A member noted that accidentally clicking anything in AI Studio automatically generates and runs a 3 Pro software writing demo.
- Olive Oil Cake Taste Test: A member stated that olive oil doesnāt make a good cake, followed by yuck.
- NousResearch CF Patch Released: Members wondered how bad the patched stuff was, referencing a recent NousResearch CF patch.
Eleuther ā· #general (122 messagesš„š„):
Quantum Hardware for Language, H100 Speedrun, RWKV 8 Architecture, Neuromodulatory Control Networks (NCN), AI Slop Definition
- Quantum Hardware Training has Skeptics: A Reddit post about using real quantum hardware for language model training met with skepticism, with one user calling it ānonsenseā and another linking it to a āschizo subreddit.ā
- Others pointed out the existence of Quantum Kernels and Quantum SVMs, suggesting the idea has conceptual merit, though its practical value is questionable.
- Free H100 Compute Pool for Eleuther?: Members discussed the possibility of a community pool at EleutherAI that grants users 3 minutes of free 8xH100 compute time daily, with concerns about user authentication and the actual utility for those who arenāt actively working on their projects.
- One member suggested that people not trying hard enough arenāt the right target, while another suggested using Colab or inexpensive rental services for smaller compute needs.
- Smerky Updates on RWKV 8 Architecture and Goldfinch: Smerky reported progress on the RWKV 8 architecture, expressing excitement about it, plus an updated 7c and a new Goldfinch that uses that.
- The original goal of RADLADS was to train Goldfinch but Smerky failed at that, though fixes may be coming for RADLADS2 and is planning on larger scale testing, though itād be nice if distillation worked, since then I could convert giant models to it.
- Novel Architecture Similar to Hypernetwork: A member introduced a novel architecture similar to a hypernetwork called Neuromodulatory Control Networks (NCN), modulating temperature, layer gain, and FFN gating on the fly with a 768-dimensional vector input.
- Despite being an 18M parameter model, it achieved a validation perplexity of 4.5 after training on TinyStories for only one epoch and more details can be found on the Github repo and the paper.
- Defining AI Slop: Members debated the definition of AI slop, with varying perspectives on its value and how to identify it, and how it seems to change per individual.
- One member shared that slop can be good while another mentioned this is becoming itās own subfield and shared two perspectives and another perspective to read.
Eleuther ā· #research (45 messagesš„):
Quantum Machine Learning Simulators, Real Quantum Hardware Training for Language, RNN + Transformer Hash-Hop, 4096 Token Context Window Training, Selective Gradient Masking
- Quantum Quest Begins: QML Topic Search Kicked Off: A member is seeking a good topic to work on in quantum machine learning using simulators like Qiskit.
- A link to a relevant Reddit thread about real quantum hardware training for language models was shared as a starting point.
- RNN and Transformer do Hash-Hop: A member suggests combining RNN and transformer architectures to achieve hash-hop with depth O(seqlen * num layers), potentially outperforming transformers that do only one level of hash-hop lookup per attention layer.
- They also point to a paper and propose that a variable depth SSM + transformer could solve arbitrary depth hash-hop problems.
- 4096 Context Window Woes: A member training a model with a 4096 token context window over 100B tokens from FineWeb using the Llama 3.2 1B architecture encountered significant loss spikes, as can be seen in the attached image.
- The member later identified that the issue was because they were using learned positional embeddings instead of rotary embeddings for the 4096 run, requiring them to re-run their experiments.
- Selective Gradient Masking Strategy: A member shared this link to Anthropic discussing Selective Gradient Masking.
- This approach can be related to spiking neural networks (SNNs) by reducing compute by not using attention in later layers.
- Top-K Attention saves compute: Members discussed implementing Top-K attention each layer to save compute, where the most relevant tokens are selected for attention using a first module.
- However, one member pointed out that this approach doesnāt reduce the time complexity but it does improve prefill speed by a constant factor.
Eleuther ā· #interpretability-general (2 messages):
Task Optimized KV Caches, Task Optimized LoRAs, Mechanistic Interpretability of Diffusion Models
- KV Caches: Data or Algorithm?: A member pondered whether task-optimized KV caches are more akin to data or algorithms.
- They also questioned how these compare to task-optimized LoRAs, sparking a discussion on the nature of these optimized components in AI models.
- Diffusion Models Exhibit Algorithmic Divergences: A member shared a paper stating, āWe discover fundamental algorithmic differences in how diffusion architectures process synthetic versus naturalistic data distributions.ā
- This finding highlights the distinct ways diffusion models handle different types of data, suggesting the need for nuanced approaches in their design and application.
Eleuther ā· #lm-thunderdome (3 messages):
Anthropic fix, lm-evaluation-harness
- Anthropic Mapping Bug Squashed: A member submitted a pull request to fix a broken mapping for Anthropic in the lm-evaluation-harness repo.
- They stated that the PR should be straightforward to review and merge.
- Easy Review PR Incoming: A new pull request has been submitted to EleutherAI for review.
- The PR addresses and resolves a broken mapping issue related to Anthropic.
Latent Space ā· #ai-general-chat (77 messagesš„š„):
Unconventional AI, ManusAI Context-Engineering, Howie Xu AI talent, BERTopic & Hugging Face Nightly Metadata Tensor Leak, OpenAI Television Ads
- AI scaling Faces Global Energy Wall, Neuromorphic Hardware To The Rescue: Unconventional AI warns that AI scaling will hit a global energy wall within 3-4 years.
- They advocate abandoning digital simulations of neural nets in favor of purpose-built, brain-like hardware to break through efficiency limits, earning enthusiastic community support.
- ManusAI Context-Engineering Blogpost Shares Insights: Lance Martin shared a new blog post covering his conversation with Yichao āPeakā Ji about context-engineering in ManusAI, complete with slides and a webinar video.
- Jonas Templestein calls it āa very good post on agent design,ā, and Lalit M already wants a follow-up with fresh updates.
- Hugging Face Model Leaks Metadata: A user flagged an issue where metadata tensors from a nightly Hugging Face model are leaking into BERTopic embeddings, causing unexpected shapes and potential errors.
- Discussion centers on isolating the problem, debugging data loaders, and updating dependencies to fix the leak.
- OpenAI Buys Air Time for TV Ads: OpenAI will air its first TV commercials tonight during Monday Night Football on ESPN and shortly after on The Voice.
- This marks the companyās initial step into television advertising.
- Eleven Labs Reader Gets Rave Reviews: Users shared enthusiastic reviews for Eleven Reader, a mobile app by Eleven Labs, praising how the voices react to the content & add emotion based on the context.
- One user recommended using Macās native TTS reader via
Edit --> Speech --> Start speakingas a free alternative, but noted the quality is not as good.
- One user recommended using Macās native TTS reader via
Latent Space ā· #genmedia-creative-ai (48 messagesš„):
Contact-Sheet Prompting, ModelScope Bias, RoR vs Node.js, Fake screenshot warning
- Contact-Sheet Prompting workflow goes Viral: Willie shares a detailed contact-sheet prompting workflow for Nano Banana Pro that produces cohesive 6-frame fashion editorials, complete with camera positions, styling constraints, and Fuji Velvia flash aesthetic.
- ModelScope denies Chinese Rocket Mishap Bias: ModelScope is under fire after its text-to-video model generated footage showing a Chinese rocket exploding; the company insists the model is unbiased and that any issues can be reported via Hugging Face.
- RoR vs Node.js Speed Showdown: A single tweet pits Ruby-on-Rails against Node.js in a performance comparison, inviting debate on which framework wins the speed contest here.
- Fake screenshot warning: A single link to a tweet by @iamemily2050 is shared here, seemingly as an item of interest; the thread itself contains no further discussion.
Nous Research AI ā· #announcements (1 messages):
Nomos 1, Open Source, AI Mathematician
- Nous Research Open Sources Nomos 1: Nous Research has open sourced Nomos 1, a 30B parameter model that scored 87/120 on the Putnam math competition.
- Nomos 1 Ranks High in Math Competition: The Nomos 1 score would rank #2/3988 in 2024, marking a step towards creating a SOTA AI mathematician with hillclimbai.
Nous Research AI ā· #general (111 messagesš„š„):
Terminals SDK & Gemini integration, SMC steering for model degradation, Video game 3D map generation with AI, Multithreaded Sam3 performance, AI-generated website analysis reports
- Agent Zero Realigns Gemini, Builds Immersive Worlds: Agent Zero from terminals.tech is leveraging AI Studio and realigning Gemini to create immersive world generators, using the terminals SDK for brain, machine, and interface APIs.
- Within each app runtime, another instance of Gemini and Agent Zero can spontaneously spawn copies of themselves, exhibiting emergent properties capable of controlling the environment.
- SMC Steering Tames LLM Weight Drift: Members are exploring how to mitigate model degradation after vectors are triggered and returned to baseline, with interesting work on SMC steering by a member.
- The degradation and drift of mutated weights is unbearable, according to a member, and SMC steering is a common problem when testing.
- MultiThreaded Sam3 Achieves Amazing Results: A member successfully multithreaded Sam3, showcasing significant speed improvements by running multiple instances simultaneously.
- Despite the achievement, thereās a desire for better GPU compute to finetune it on Anime, lamenting that it cannot be finetuned on a 3090.
- Lexical Wave Function Collapse Generates Text: A member is developing a game using a Lexical Wave Function Collapse text sentence generator, which generates sentences based on constraints that affect the game state.
- The system treats the start of a sentence like Schrƶdingerās Sentence, collapsing words in as you look at them, until it becomes a complete sentence.
- Nous Model Achieves Insane Score on Putnam: A 30B parameter model achieved a score of 87 on the Putnam math competition, showcasing significant advancement in AIās mathematical capabilities.
- The modelās performance suggests that AI is rapidly approaching research-level capabilities in mathematics, especially given hard benchmarks like Putnam.
Nous Research AI ā· #ask-about-llms (10 messagesš„):
Web Chat Tools, Hermes 4.3 Model, SillyTavern, KoboldCPP UI, Token Generation Speed
- Web Chat Lacks Tool Integration: A user inquired about the possibility of integrating tools like web search into the web chat version of the model.
- There was no response or confirmation regarding the integration of web search or similar tools.
- Hermes 4.3 Released, Packs a Punch: Users discussed the existence of Hermes 4.3, with one expressing surprise at the updated versions, noting Hermes 4 as their favorite for local use.
- It was mentioned that Hermes 4.3 is a 32b model, which is more compact and may perform better on laptops; the original user runs the 70b Hermes 4 model locally, leveraging 128 MB of unified RAM.
- SillyTavern, Still a Haven?: A user expressed their satisfaction using SillyTavern with Hermes 4 405b via API for roleplay and creative writing.
- Another user hadnāt heard of it recently, with the original user jokingly asking if it was frowned upon.
- KoboldCPPās UI, Oldie but a Goodie: One user runs the Hermes 4 70B model on KoboldCPP, sometimes using SillyTavern as the front end.
- They noted that KoboldCPP has an old UI and can only be shut down by force quitting, but it works.
- Token Speeds Vary: A user asked about token generation speed, comparing their 2 tokens/second on a unified RAM system (the AMD 395) to another userās system.
- The user mentioned asking about this previously but not remembering the answer.
GPU MODE ā· #general (4 messages):
ML Infra, ML Systems, Career Advice
- ML Infra Folks Assemble: A member inquired if anyone works in the ML Infra/Systems space, seeking confirmation of being in the right community for career change advice.
- Another member confirmed that this is a primary area of interest within the community and encouraged the user.
- ML Systems Engineers Welcome Career Changers: The community welcomes career changers interested in ML Infrastructure and ML Systems roles, with the primary interests aligned with these fields.
- The supportive response indicates a welcoming environment for newcomers seeking guidance and networking opportunities in this domain.
GPU MODE ā· #cuda (7 messages):
Beginner docs, Livestream coding, PTX doc feedback, Kernel challenge timing
- Beginner Docs Get Revamp: A member mentioned the idea was to do it better for beginners and requested any feedback.
- They mentioned that they would move it internally if they received any feedback.
- Livestream Coding Session Planned: A member stated they might just livestream going through the whole thing when their baby leave is done.
- Another member immediately responded expressing interest in joining the livestream, stating āNothing better than staring at docs together lol.ā
- PTX Doc Feedback Incoming: A member found some typos and may have some feedback for PTX doc and asked if they could discuss it with another member.
- The other member replied they were coming back from NeurIPS and they could discuss in the following days.
- Kernel Challenge Timing Off: A member gave feedback that āthis is a really bad example and doesnāt really work this way because the start event will execute straight away and cpu overhead to launch the kernel is included in the timingā.
- They stated it was the reason why timings on the first kernel challenge were so off.
GPU MODE ā· #torch (2 messages):
torch.compile with slicing ops, static KV cache pre-allocation, KV cache updates
- Slicing slow down with torch.compile: A member asked about using
torch.compilewith slicing operations when updating a static KV cache, referencing this PR and reporting that slicing operations slowed down the compiled code even whenbatch_size == max_batch_size. - Slicing woes solved with static cache lookup: The same member found a workaround by caching all slices and marking each with a static address so that slicing becomes a look-up, as detailed in this comment.
- They expressed interest in a cleaner approach.
GPU MODE ā· #cool-links (1 messages):
mobicham: https://www.radixark.ai from SGLang folks š
GPU MODE ā· #jobs (1 messages):
crankshot1698: Shot you a dm!
GPU MODE ā· #beginner (1 messages):
serverlessLLM, blitzscale, inference serving
- Inference Serving Intro via ServerlessLLM and Blitzscale: Members suggested reading serverlessLLM and blitzscale as a good introduction to inference serving.
- They clarified that both are more focused on the systems side of inference serving.
- Further Reading on Inference Systems: The discussion highlighted the importance of understanding the systems aspects of inference serving.
- Resources like serverlessLLM and blitzscale provide valuable insights into this domain.
GPU MODE ā· #pmpp-book (4 messages):
A100 FLOPS calculation, TF32 MMA, FP16 MMA, Elementwise ops
- A100ās FLOPS Numbers Deceptive?: A member inquired about the calculation of FLOPS for the A100 GPU as stated in Chapter 1, questioning whether they represent the number of independent floating point operations.
- Another member responded that these numbers are a bit misleading, since the 156 Tflops number is for tf32 mma (tensor core matmul) which is really a 19-bit format with 32-bit alignment.
- FP16 MMAās Actual Use Case Disclosed: A member clarified that the 312 Tflops is for fp16 mma, not elementwise ops.
- For elementwise ops, running a random sequence of float ops on all cores, the worst case would be 1/4 of the stated peak performance: every instruction being one flop (i.e. FADD/FMUL, instead of two with FFMA), and dependent on the result of the previous instruction, with only a single warp per SMSP
GPU MODE ā· #off-topic (1 messages):
jaefosho: Very Eastern European
GPU MODE ā· #irl-meetup (1 messages):
szymonoz: Iāll be in SF this week, anyone from the Bay up for a meetup?
GPU MODE ā· #liger-kernel (3 messages):
ML Infra, Marksaroufim, How does one break into ML infra space?
- ML Infra Entry Point Inquired: A member inquired how does one break into the ml infra space?
- Another member then linked the question into channel #1198358627594023014.
- Additional Topic Placeholder: This is a placeholder to satisfy the minimum items requirement.
- Additional details can be added here if available.
GPU MODE ā· #self-promotion (6 messages):
Registers Best Practices, Mojo, Apple Silicon, Metal IR
- Low Level Learning on Registers: A member shared their work on best practices when working with registers via a blog post.
- Metal IR missing in article: A user expressed surprise at the lack of Metal IR (.air) in the register article, given Mojoās capability to target Apple Silicon.
- The author acknowledged this and noted their experience mainly revolves around AMD and Nvidia, but assumed the principles carry over to Apple.
GPU MODE ā· #submissions (21 messagesš„):
nvfp4_gemm benchmark, vectorsum_v2 benchmark, NVIDIA performance, A100, B200, H100, L4 performance
- NVidia NVFP4 GEMM gains traction: Multiple submissions to the
nvfp4_gemmleaderboard achieved 8th place on NVIDIA, with times ranging from 11.9 µs to 13.1 µs. - vectorsum_v2 Benchmark Shows Performance Across Architectures: Submissions to the
vectorsum_v2leaderboard show performance across various NVIDIA architectures, including A100, B200, H100, and L4, with times varying widely.- One submission reached 4th place on L4 with 935 µs, while another achieved 10th place on B200 with 72.7 µs and 9th place on H100 with 93.5 µs.
- NVFP4 GEMM yields varied results: Successful runs on
nvfp4_gemmleaderboard range from 10.9 µs to 59.6 µs.
GPU MODE ā· #factorio-learning-env (2 messages):
Factorio video, AI dev vs AI agent
- Factorio Video Inspires AI Agent: A member watching the Factorio video jokingly remarks that they are already an advanced AI dev, sparking a lighthearted moment.
- Another member jokingly suggests replacing ādevā with āagentā, adding to the amusement.
- Devs Become Agents, Jokes Ensue: The discussion playfully transitions from AI developers to AI agents due to the Factorio videoās influence.
- This exchange underscores the evolving roles and terminologies within the AI field, albeit in a humorous context.
GPU MODE ā· #cutlass (2 messages):
Cutlass GEMM Tutorial, Tensor Layout
- Cutlass GEMM Tutorial Layout Discrepancy: A member questioned the tensor layout obtained in the Cutlass GEMM tutorial, specifically regarding the layout of
tCsAafter a local partition operation.- The discrepancy lies in the expected shape
(8,8):(1,128)vs. the obtained shape(_8,_8):(_16,_128)after composingsAwith(16:1).
- The discrepancy lies in the expected shape
- Clarification Needed on Tensor Shape: The user seeks clarification on why the
tCsAtensor has a shape of(8,8):(16,128)instead of the expected(8,8):(1,128).- The question arises from the operation
Tensor tCsA = local_partition(sA, tC, threadIdx.x, Step<_1, X>{})and the subsequent mode composition.
- The question arises from the operation
GPU MODE ā· #helion (1 messages):
Helion webinar, PTC launch, Helion kernels
- Helion Webinar Scheduled with Live Q&A: A webinar with live Q&A is scheduled for Thursday, December 11th, at 11 am PST to discuss Helion-related topics.
- The webinar will cover developments since the PTC launch and best practices for developing, debugging, and deploying Helion kernels; a YouTube link was provided.
- Helion Kernel Best Practices to be Discussed: The webinar will delve into best practices for developing, debugging, and deploying Helion kernels for optimal performance.
- Participants are encouraged to bring questions for the live Q&A session, ensuring an interactive and informative experience.
GPU MODE ā· #nvidia-competition (16 messagesš„):
GEMM Competition, 50 Series NVFP4 Support, B200 GPU Performance, Mojo Availability
- GEMM Competition Record and FLOPs: The top entry in the GEMM competition achieved 10.835μs for shape M N K 128 7168 16384, translating to approximately 2.77 Pflops.
- The calculation is based on the formula
M*N*K*2/t.
- The calculation is based on the formula
- 50 Series Lacks NVFP4 support: Members report that the 50 series does not yet support NVFP4.
- Compilation fails with an
OpErrorwhen targeting architectures other thanArch.sm_100a.
- Compilation fails with an
- B200 GPU Performance Inconsistencies: Users report inconsistent performance on B200 GPUs, specifically b200-02-gpu1, with submissions occasionally resulting in slower performance despite using identical code.
- One user is sending a note to check exactly b200-gpu01 to investigate this further.
- Troubleshoot Failed Submissions: Users are encountering submission failures, indicated by exit code 1 in the evaluation script
eval.py.- The error arises after 1.44 seconds and is causing frustration in the community.
- Quest for Further GEMM Optimizations: Achieving further improvements in the GEMM competition is proving difficult, with top entries being very close in performance.
- One competitor claims to have tried like 100 different ideas and none of them improved it.
GPU MODE ā· #robotics-vla (7 messages):
Variable Textures, Inference Speedups, Hanging Mug Successes, Interesting Choices
- Variable Textures Boost Training: Training now includes variable textures, clutter on the table, and slight table height variations for improved model robustness using a new robot simulation.
- Efforts are focused on new robot simulation and a different VLA architecture to accelerate inference speeds, experimenting with inference speedups to the provided checkpoints.
- Hanging Mug has Successes and Failures: hanging_mug also has first successes now, with the left being a success and the right almost succeeding, as demonstrated in attached videos.
- Videos hanging_mug_ep14_20251209_163800_success.mp4 and hanging_mug_ep13_20251209_163506_fail.mp4 showing hanging mug experiments
- Interesting Choices are Viewed: A member shared the URL https://x.com/ilialarchenko/status/1998384056439017826 and mentioned they will go through it to see some very interesting choices.
- The twitter URL shows a humanoid robot.
HuggingFace ā· #general (30 messagesš„):
SmolVLA paper SO100 benchmark, HF Billing issues, Image/Video Generation models, Apple Clara 7B Instruct, AI FYP Guidance
- SmolVLA SO100 Benchmark Typo Discovered: A member pointed out a typo in the SmolVLA paper regarding the SO101 benchmark, which incorrectly stated that it was trained on three datasets.
- Another member clarified that the typo should refer to the SO100 benchmark, trained on three real-world datasets (pick-place, stacking, sorting), with the SO101 benchmark using only one dataset (lerobot/svla_so101_pickplace).
- HF Billing Still Baffles Users: A user questioned the logic of billing practices within Hugging Face.
- The user asked are team plans not implemented or something?
- New Image and Video Models Emerge: Various image and video models are shared, including AuraFlow v0.3, Ovis-Image-7B, HunyuanVideo T2V and many more.
- The models range from 7-12 GB and generate images at 1024² resolution or videos at 720p/480p.
- Appleās Clara-7B-Instruct GGUF Conversion Quest Kicks Off: A user inquired about the existence of a GGUF version of apple/CLaRa-7B-Instruct.
- Another member indicated that a GGUF version doesnāt exist yet, but shared a link to conversion instructions, referencing the existing Clara-24B-GGUF.
- Anthropicās Agentic Aid: Model Context Protocol Donated!: Anthropic donated the Model Context Protocol to the Linux Foundation.
- It was described by one member as a sterling move.
HuggingFace ā· #i-made-this (8 messagesš„):
Optuna HPO skill, Chronos-1.5B, Quantum-Classical Hybrid Language Model, Quantum circuit parameters trained on IBM quantum hardware
- Optuna HPO skill makes debut: A member introduced an Optuna HPO skill, calling it the peanut butter to the training script jam.
- Chronos-1.5B model debuts quantum AI: A member built a language model, Chronos-1.5B, with quantum circuits trained on IBMās Heron r2 quantum processor.
- The model integrates actual quantum processor training (not just simulation) and is based on VibeThinker-1.5B + 2-qubit quantum kernel layer.
- Chronos 1.5B publishes IBM Quantum Job IDs: The creator of Chronos-1.5B shared several Job IDs from IBM Q: d4ppg9sfitbs739g9410, d4ppf8s5fjns73cvlk4g, d4ppbubher1c73bahigg, d4ppbq7t3pms7396fnu0.
- They encouraged users to check the model repo for the quantum_kernel file and run it optionally with the base chronos model.
- Quantum ML Introductory Readings: In response to a request, the creator of Chronos-1.5B recommended the Qiskit textbook quantum ML chapter and PennyLane demos for a practical introduction to quantum ML.
HuggingFace ā· #agents-course (10 messagesš„):
AI Agent Workshop, Hugging Face Providers Outdated, LlamaIndex Issues, Free LLM Alternatives
- AI Agent Workshop Alert: A member shared details for an AI Agent 0-1 Workshop scheduled for December, highlighting a real client-style project using Langchain and Streamlit, and previewing their AI Engineering Bootcamp.
- The workshop includes sessions on Dec 13 and Dec 16, with more times available here, and offers discount opportunities for top builders.
- Seeking Free LLM Alternative: A course participant is seeking a free LLM alternative for the agents course, as the default option is quickly reaching its limit, causing inference usage and charge-related errors when running code on Colab.
- The user is requesting help to find an alternative that mitigates these issues, highlighting the urgency of the situation.
- LlamaIndex Lesson Meltdown: A member reported running into multiple issues with the LlamaIndex lesson in unit 2.2 of the course, citing out-of-date Hugging Face providers and a NumPy dependency issue that requires downgrading to numpy<2.
- According to a quote from Claude, Hugging Face has moved from a simple āServerless Inference APIā to āInference Providersā which routes requests through external providers (Together AI, Sambanova, etc.).
Yannick Kilcher ā· #general (11 messagesš„):
Whisper streaming, MultiTalker Parakeet streaming, AI Engineer seeking collaboration
- Whisper Doesnāt Stream, or Does It?: Members discussed Whisper streaming capabilities, with some suggesting it doesnāt take a stream as input, while others pointed to OpenAIās use of it.
- The conversation references this YouTube video for context.
- MultiTalker Parakeet Emerges for Streaming: A member shared the MultiTalker-Parakeet-streaming-0.6b-v1 model from NVIDIA on Hugging Face.
- This model may provide the streaming capability some members are seeking.
- AI Engineer Pitches for Collaboration: An AI and App developer shared their skills and experience in AI engineering, cross-platform app development, and full-stack app development.
- They are seeking collaborations on AI projects, mobile apps, or full-stack app development, listing skills such as ML, DL, NLP, Computer Vision, and various frameworks and tools.
- AI History Tweet is impressive: A member linked to this Tweet to show old AI history.
- This Tweet was called impressive.
Yannick Kilcher ā· #paper-discussion (1 messages):
burnytech: Damn, likely colliding with something else I have
Yannick Kilcher ā· #ml-news (15 messagesš„):
Claude coding secureness, Endomorphosis links, China chip manufacturing, Arcprize future, H200 on eBay
- Claudeās Coding Security Plummets: A paper found that only 10.5% of Claudeās coding assistance was secure, while 61% was functional (Arxiv link).
- The evaluation mentioned Gemini 2.5 Pro, Kimi K2, and an unspecified version of Claude Sonnet 4.
- Endomorphosis Resources Surface: A member shared links to resources on endomorphosis, including a PDF on Probabilistic Logics and a YouTube video.
- China Chip Ambitions Unstoppable?: Several members discussed Chinaās push for dominance in chip manufacturing, with one stating that nothing can convince china to not keep pursuing their goal to compete at the top of chip manufacturing at this point.
- Another member noted that the most likely outcome is that they are encouraged to build their own chip fabrication.
- Arcprize Hints at the Future: Members shared a link from Arcprize hinting about the future on fxtwitter.
- H200 Chips Flood eBay via China?: A member claimed that if you search for H100 on eBay, most listings come from China anyway.
- The member also suggests that China is adding regulations requiring companies to register to buy H200 and prove local alternatives arenāt good enough.
Modular (Mojo š„) ā· #general (1 messages):
jokellum: <@&1116225504563970138>
Modular (Mojo š„) ā· #announcements (2 messages):
Community Meeting, MMMAudio in Mojo, Shimmer OpenGL experiment, Mojo 1.0 Roadmap, Modular Team Updates
- Modular Community Meeting Unveils Audio and Graphics Innovations: The latest community meeting featured a demo of MMMAudio, a creative-coding audio environment in Mojo by Sam Pluta, and Shimmer, a cross-platform Mojo ā OpenGL experiment by Lukas Hermann, available on YouTube.
- Modular Outlines Path to Mojo 1.0 Nirvana: The Modular team shared updates from the 25.7 release and provided an early preview of the Mojo 1.0 roadmap, further detailed in their blog post.
Modular (Mojo š„) ā· #mojo (22 messagesš„):
Mojo memory management, MMMAudio presentation with Faust, ImplicitlyCopyable List, String CoW, Embedded AI development with Mojo
- Mojoās Overlapping Lifetimes: Mojo allows overlapping lifetimes of mutable references but diagnoses simultaneous access through two names using the same operations, such as
swap(x, num)orswap(x, y). - MMMAudio presentation cites Faust: A member thanked another for citing Faust in their MMMAudio presentation, anticipating developments in 2026, and hoped that MMMAudio would serve as a useful example.
- List Conditional Conformance to
ImplicitlyCopyable:Listbecomes conditionally conformant toImplicitlyCopyablewhen its elements areImplicitlyCopyableand have a trivial__copyinit__. - String Implicit Copyability & CoW Upgrade: The current
Stringimplementation is Copy-on-Write (CoW), which mitigates the overhead of implicit copyability, with the suggestion thatListmight also benefit from similar upgrades in the future. - Jetson Orin Nano: Embedded AI Development with Mojo: For those looking to get into embedded AI development with Mojo, a member suggested that the Jetson Orin Nano, featuring an Ampere-class GPU, is fully supported and may be suitably small.
Moonshot AI (Kimi K-2) ā· #general-chat (19 messagesš„):
Kimi.com website issues, Kimi's search tool, Kimi citation issues
- Kimi Website Glitch Reported: A user reported issues with the kimi.com website where they cannot click anything besides starting a new chat.
- Troubleshooting Kimiās Website with cookie clearing: One member suggested clearing cookies and disabling VPNs/adblockers to resolve website issues with Kimi.
- The user stated that this did not fix it, neither did clearing cookies.
- Kimi uses Webcrawler Search: A user asked about what search engine Kimiās search tool uses, to which another user responded that Kimi uses none but its own webcrawler.
- Report Kimi Bugs: A member suggested making a bug report about Kimiās citation issues and wobbly gibbly.
- The original user described that when he asked Kimi a question, Kimi would answer the question in his thoughts without sharing them to the user, and it frequently canāt make citations.
Manus.im Discord ā· #general (8 messagesš„):
Checkpoint restoration issues, Credits issue, Manus 1.5 critical incidents
- User experiences checkpoint restoration issues: A user reported a critical issue while attempting to restore the checkpoint of a webdev project.
- They inquired about opening a ticket and shared their email address.
- Credits issue resolved swiftly: A user reported that the Manus team resolved their credits issue by providing a refund.
- The user is now able to pay directly via Manus instead of Google.
- Manus 1.5 faces critical incidents and unanswered emails: A user reported several severe incidents between December 3-9 including 7 affected tasks, and about 150,000 credits lost.
- The user emailed support multiple times without response, but the AI Bot proposed 120,000 credits on December 9. They are formally requesting a joint response from tech support and commerce teams within 48 hours, a technical analysis of the root causes, and an equitable reimbursement.
MCP Contributors (Official) ā· #general (4 messages):
Anthropic's donation, Model Context Protocol, Agentic AI Foundation, LF standards
- Anthropic Donates Model Context Protocol and Establishes Agentic AI Foundation: Anthropic is donating the Model Context Protocol and establishing the Agentic AI Foundation.
- A member inquired about the impact on current work, anticipating a transition to the LF āway of doing it,ā but another member clarified that governance and everything under that umbrella isnāt changing.
- Clarification on Governance Changes Post-Donation: Following Anthropicās donation, a community member sought clarification on potential governance shifts.
- Another member responded, emphasizing that the donation would not alter the existing governance structure.
MCP Contributors (Official) ā· #general-wg (3 messages):
MCP Usage, Private vs Public Ecosystems, Client-ID Metadata Documents (CIDM), MCP Servers, Developer Tools
- MCP Usage in Private Ecosystems: A member inquired about the usage of MCP in private ecosystems, noting the auth-wgās work on public ecosystem client registration through Client-ID Metadata Documents (CIDM).
- The response indicated that the majority of MCP use is likely private/internal/enterprise, especially when considering private MCP servers with public clients (e.g., Claude).
- MCP Servers Focused on Integration with Developer Tools: It was noted that most public-facing remote MCP servers are geared towards integration with developer tools.
- Additionally, developer tools were identified as the most advanced non-custom MCP clients.
aider (Paul Gauthier) ā· #general (2 messages):
Opus with Amazon bedrock and aider
- Opus, Bedrock and Aider tested well: A member asked if Opus is working with Amazon Bedrock and Aider.
- The member then confirmed that all is good.
- Another topic here: Another first summary here
- Another second summary here.
aider (Paul Gauthier) ā· #questions-and-tips (1 messages):
aider features, aider image support, aider workflow
- Aider now generates commit messages: Aider now automatically generates commit messages, improving workflow and streamlining the commit process.
- Users can now commit with the
-mflag, or with just thecommitcommand, using a basicgpt-3.5-turbofree tier model.
- Users can now commit with the
- Aider to get Image support: Images are planned to be supported in
aider, allowing for more detailed and contextual code modifications.- Users will be able to pass in
--imagewhen askingaiderto modify or edit existing images.
- Users will be able to pass in
- Aider workflows can now save edit sessions: Itās possible to save aider edit sessions allowing users to later restore them for a full roundtrip.
- This feature enhances collaboration and lets users save, share, and resume their workflow.
MLOps @Chipro ā· #events (2 messages):
AI Agent 0-1 Workshop, AI Engineering Bootcamp, GitHub Social Club NYC
- AI Agent Workshop fires up!: There will be an AI Agent 0-1 Workshop that serves as an intro to an AI Engineering Bootcamp (online).
- The event will teach attendees to design and build an AI agent that thinks, codes, analyzes data & generates reports for a previous real client, all from scratch; RSVP for Saturday December 13th, 2pm ET: luma.com.
- GitHub Social Club Assembles in NYC: There will be a GitHub Social Club at Bibliotheque in SoHo in NYC.
- There will be no talks, no pitches, just space to connect, share ideas, and swap stories with others in the community with coffee, cookies, limited-edition GitHub swag, some casual games, and a chance to meet the teams behind Copilot, Next, Developer Productivity, and Startups.