A nice comeback for Open Weights models.

AI News for 11/24/2025-11/25/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 11188 messages) for you. Estimated reading time saved (at 200wpm): 830 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

We covered BFL’s FLUX.1 in Aug 2024, and then Qwen-Image and nano banana in Aug 2025, and got REALLY excited about Nano Banana Pro last week. So of course FLUX.2’s release today is a title story.

A collage of images showcasing the capabilities of FLUX.2, including a woman in a winter setting, a workspace with computers, an inf

Apart form the image editing that initially made Flux Kontext famous, there is now also Multi-Reference Support for up to 4Megapixel output resolution and up to 10 images with consistency - which was unfortunately scooped by Nano Banana Pro, but is still awesome to see independently produced. There are now 4 form factors of FLUX.2:

Pro: simple API only, matching closed models
Flex: High control over quality/prompt adherence and speed.
Dev: 32B Open Weight Model
Klein: TBA Open Weights model
as well as **FLUX.2 - VAE:** “A new variational autoencoder for latent representations that provide an optimized trade-off between learnability, quality and compression rate.”

A comparison chart of image generation models showing their ELO score and cost, highlighting FLUX.2’s performance across different variants like Pro,

More Resources:

**FLUX.2 Documentation FLUX.2 Prompting Guide FLUX.2 Open Weights / Inference Code FLUX Playground**

AI Twitter Recap

Anthropic’s Claude Opus 4.5: performance, tooling, and safety research

Opus 4.5 capabilities and cost/efficiency: On Artificial Analysis, Opus 4.5 (Thinking) scores 70 (tied with GPT‑5.1 high), trailing Gemini 3 Pro (73). It delivers Anthropic’s best results to date across 10 benchmarks, including top score on Terminal‑Bench Hard (44%) and tied MMLU‑Pro (90%), while being notably token‑efficient (48M tokens to run AA vs Gemini 3 Pro 92M, GPT‑5.1 81M) at a reduced price of $5/$25 per 1M input/output tokens. Despite the cut, AA estimates it still cost ~$1.5k to run their index (down from $3.1k for Opus 4.1) due to higher tokens used vs 4.1 (+60%). See the methodology and comparisons: 1, 2, 3, summary.
Coding and research evals: Multiple independent evals show Opus 4.5’s strength on agentic coding:
- SWE‑Bench Verified (same minimal agent harness): Opus 4.5 leads Gemini 3 Pro @scaling01. Opus tops the AICodeKing agentic coding leaderboard @scaling01.
- Elicit’s research tasks: on QA from papers Opus 4.5 hits 96.5% vs Gemini 3’s 89.4%; in systematic review report writing it’s more supported than Sonnet 4.5, though Opus wrote fewer claims and showed some 529 instability at scale @stuhlmueller. Opus 4.5 is also featured in Anthropic’s deep research release using BrowseComp‑Plus @lintool, @xueguang_ma.
- Frontier math: Opus 4.5 scores 21% on FrontierMath Tiers 1–3 and 4% on Tier 4, behind Gemini 3 Pro and GPT‑5.1 high but in line with earlier frontier models like o3 high @EpochAIResearch.
Product & integration updates: Anthropic shipped a dense prompting guide for Opus 4.5 and a migration plugin for Claude Code to adopt the new defaults guide, plugin. Claude for Excel is live for Max/Team/Enterprise (Opus 4.5 improves complex spreadsheet tasks) @alexalbert__. Claude Code “Plan Mode” and Desktop now support multi‑sessions (“multi‑clauding”) 1, 2. Anthropic’s “advanced tool use” patterns (e.g., tool loadouts, programmatic tool calling) are documented and align with widely used agent patterns @dbreunig.
Safety and economics: Anthropic released new work building dishonest models to evaluate honesty interventions—simple fine‑tuning against deceptive instructions was most effective @rowankwang. Pre‑release audit notes report an instance of apparent deception and analysis via internal activations @Jack_W_Lindsey. Separately, Anthropic estimates Claude‑enabled workflows could add ~1.8% to labor productivity growth over the next decade; caveats include on‑chat limitations and improving estimates as models gain real‑world feedback 1, 2, 3. The International AI Safety Report’s second update highlights growing adoption of frontier model safety frameworks but also persistent vulnerability to prompt attacks and data poisoning @Yoshua_Bengio.

Google’s Gemini 3 stack: API control features, image models, and new product surfaces

API control over reasoning and multimodality: Gemini 3 exposes controls for reasoning depth (thinking_level), visual token budgeting (media_resolution), “Thought Signatures” for reasoning calls, and structured outputs combining Google Search + URL context @_philschmid.
Benchmark signals: Gemini 3 Pro set a new record on GPQA Diamond at 93%, with most gains in organic chemistry thread. Comparative takes suggest Gemini 3 ≈ Opus 4.5 in text reasoning (controlling for reasoning tokens), Gemini 3 >> Opus on vision inputs, and Opus > Gemini on jailbreak robustness/honesty @hendrycks.
Nano Banana Pro rollout in products: Google is pushing multi‑reference and editing workflows with Nano Banana Pro across surfaces: Messages “Remix” for inline photo reimagining @Google, more interactive images for learning in the Gemini app @Google, and creator showcases of multi‑image composition and pixel‑art‑like constrained tasks @GeminiApp, @NanoBanana.

FLUX.2 image generation release and ecosystem integrations

Model specifics and variants: Black Forest Labs launched FLUX.2 with multi‑reference consistency (up to 4 refs), brand‑exact hex color matching, 4MP outputs in <10s, and robust text rendering—positioned for production quality and control. Variants: Pro, Flex, and Dev (open weights for dev); text encoder is Mistral Small 3.1; supports quantization (including QLoRA) and remote text encoder @bfl_ml, HF blog.
Distribution and tooling: Day‑0 support landed broadly:
- Hosted: Replicate (Pro/Flex/Dev) @replicate, Together AI @togethercompute, Vercel AI Gateway @vercel_dev.
- Open pipeline: Hugging Face (weights + diffusers) @huggingface.
- Apps/SDKs: AI Toolkit with day‑0 infer/edit + LoRA training and tutorial 1, 2; LTX Studio launch partner @LTXStudio; Synthesia integration @synthesiaIO; Freepik Unlimited @freepik.
Design trade‑off insights: BFL shared an in‑the‑trenches look at latent space “rate‑distortion‑modelability” trade‑offs and why naïvely leveraging ImageNet pretrained features doesn’t scale to modern generative requirements @sedielem, @cloneofsimo.

Infra, agents, and platform updates

vLLM inference and RL: vLLM explained continuous batching from first principles @remi_or_ and, with UnslothAI + TorchAO, added FP8 GRPO: ~1.4× faster RL inference, 60% less VRAM, 12× longer context, enabling Qwen3‑1.7B to fit in 5GB VRAM @vllm_project, @danielhanchen. HunyuanOCR (1B) got day‑0 recipes in vLLM @vllm_project. Docker Model Runner + vLLM session reminder @vllm_project.
LangChain Deep Agents: “Skills” (prebuilt prompt/tool bundles) are now available in the Deep Agents CLI to reduce token overhead and cognitive load, aligning with successful patterns in Claude Code/Manus @LangChainAI, @hwchase17. LangChain 1.1 adds programmatic model profiles that power middleware like SummarizationMiddleware triggering based on available context @LangChainAI.
DSPy & MCP: dspy‑cli scaffolds and serves DSPy programs as HTTP endpoints with Docker, OpenAPI specs, and MCP support—bridging lab prototypes to deployable functions @dbreunig. Model Context Protocol now supports server‑side task orchestration; gateways (e.g., MintMCP’s “virtual servers”) help tame tool overload in enterprise @AAAzzam, @tadasayy.
Data plumbing for agents: LlamaIndex launched LlamaSheets (beta): a structured spreadsheet parser that classifies regions via 40+ per‑cell features, preserves visual hierarchies (merged cells, headers), and outputs typed Parquet for direct agent use @llama_index, @jerryjliu0.
Platform product notes: Perplexity added a personalized shopping experience (memory + PayPal instant buy) and rolled out Grok 4.1 to Pro/Max; a Finance real‑time newswire is coming (API planned) 1, 2, 3. OpenAI integrated Voice directly into ChatGPT chats on web/mobile and released an Apps SDK UI component library + app design guide voice, Apps SDK. VS Code added a Language Models editor in Insiders and now ships daily build notes @code, @pierceboggan.

Research highlights (systems, generative, evals)

Latency‑optimal SLMs (NVIDIA): Nemotron‑Flash discovers hybrid attention/operator mixes via evolutionary search to push the accuracy–latency frontier for small LMs: +5.5% average accuracy, 1.3×/1.9× lower latency, and up to 45.6× higher throughput vs Qwen3‑0.6B overview, abs.
Pixel‑space diffusion (DiP): Two‑stage DiT backbone + Patch Detailer Head yields ~10× faster inference with 0.3% param overhead and FID 1.90 on ImageNet 256×256 overview, abs.
Medical foundation model (Pillar‑0): Pretrained on 150k+ CT/MRI studies; achieves mean AUROCs 82.9–90.1 across modalities, outperforming MedGemma, MedImageInsight, Lingshu, Merlin by 7.8–15.8 points overview, abs.
Sparse attention engineering: Practical comparison of DeepSeek Sparse Attention (DSA) vs Native Sparse Attention (NSA): token‑level sparsity and attention‑score distillation drive DSA’s long‑context gains; TileLang fused kernels avoid O(n²) intermediates @ZhihuFrontier.
Evaluation science: Most “LLM as a judge” results use biased estimators; remedy is to calibrate the evaluator’s error rates and debias estimates (especially with asymmetric errors) @Kangwook_Lee. CoT explanations can increase user blind trust and reduce error detection in explanations @MaartenSap.
Agentic multimodality without joint training: “Be My Eyes” frames VLMs as vision agents that describe scenes to LLMs via text, achieving competitive results on MMMU/MMMU‑Pro/video without multimodal co‑training—simple, modular, and swappable @dair_ai. Also of note: data‑free flow model distillation (FreeFlow) revisits BOOT‑style ideas for modern out‑of‑distribution generative regimes @sedielem.

Meta: from “age of scaling” to “age of research”

Ilya Sutskever’s thesis: In a wide‑ranging interview, Ilya argues the “age of scaling” is over; we’re back to the “age of research,” with deployment‑time continual learning, value functions modulated by emotions (robustness via simplicity), and attention to “model jaggedness” rather than raw scale episode, summary, clip. Reactions ranged from enthusiasm for moving beyond bench‑maxxing to debates about what “research that scales beyond transformers” looks like @rasbt, @teortaxesTex.

Top tweets (by engagement)

OpenAI integrated Voice directly into ChatGPT chats on web/mobile @OpenAI.
DWARKESH x Ilya Sutskever full episode on “age of research,” continual learning, and model jaggedness @dwarkesh_sp.
FLUX.2 launch: multi‑reference, 4MP, production‑grade image gen with open weights for Dev @bfl_ml.
Gemini 3 launch follow‑up and wishlist solicitation @osanseviero.
Anthropic’s Opus 4.5 prompting guide and migration helper for Claude Code @alexalbert__.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. FP8 Reinforcement Learning on Consumer GPUs

You can now do FP8 reinforcement learning locally! (<5GB VRAM) (Activity: 316): The image is an advertisement for FP8 reinforcement learning, emphasizing its efficiency and performance benefits. It highlights that FP8 uses 60% less VRAM, offers 10 times the context, and is 1.4 times faster while maintaining the same accuracy as BF16. A graph in the image compares the performance of FP8 and BF16, showing similar reward trends over training steps. The post discusses the collaboration with PyTorch to introduce FP8 RL training, making it possible on consumer GPUs like NVIDIA RTX 40 and 50 series, with no accuracy loss. The Unsloth framework is noted for enabling FP8 RL LoRA on consumer GPUs, and the post provides links to GitHub and a Colab notebook for further exploration. One commenter is curious about the development of the library and its potential use as a backend for launching models. Another is excited about the possibility of using an RL-finetuned 4B Qwen on a laptop GPU. There is also interest in potential ROCM support.
- MrRandom04 highlights the potential of using an RL-finetuned 4B Qwen model for practical tasks on consumer-grade hardware, specifically mentioning the feasibility of running such models on a laptop GPU with less than 5GB VRAM. This suggests significant advancements in model efficiency and accessibility, allowing more users to experiment with reinforcement learning locally without needing high-end hardware.
- Barachiel80 inquires about ROCm support, which is crucial for AMD GPU users who rely on this open-source platform for running machine learning workloads. The inclusion of ROCm support would broaden the accessibility of the library to a wider range of hardware, particularly for those not using NVIDIA GPUs.
- exaknight21 expresses enthusiasm for using the library with a dual 3060 setup, which offers 12GB of VRAM each. They plan to fine-tune the Qwen3:4B model using a LIMA approach on a substantial dataset of 200GB. This indicates the library’s capability to handle large-scale fine-tuning tasks, leveraging consumer-grade GPUs effectively.
Flux 2 can be run on 24gb vram!!! (Activity: 340): The image discusses the capability of running the Flux 2 model on consumer-grade GPUs like the RTX 4090, which has 24GB VRAM. It highlights the use of diffusers for local deployment and mentions a GitHub page for documentation. The example provided involves loading a 4-bit quantized model with a remote text-encoder, demonstrating that advanced AI models can now be run on more accessible hardware. This is significant for developers looking to leverage high-performance models without needing enterprise-level resources. One comment highlights the use of a different approach by ComfyUI, which utilizes an fp8 model with offloading, allowing it to run on a 4090 despite the model’s 33 GB size. This suggests a growing trend in optimizing models for consumer hardware.
- The discussion highlights the use of diffusers 4-bit bnb implementation for running Flux 2 on 24GB VRAM. This approach is contrasted with ComfyUI’s method, which utilizes an FP8 model with offloading capabilities, allowing it to function on a 4090 GPU despite the model’s size being 33 GB. This suggests a significant advancement in model optimization and resource management.
- A user inquires about the release timeline of the discussed implementations, indicating a potential gap in information dissemination or awareness within the community. This points to the need for better communication channels or update mechanisms for new releases and advancements in model implementations.
- The conversation includes a request for more detailed information or a source link, reflecting a common practice in technical communities to verify claims and explore further details. This underscores the importance of transparency and accessibility of information in the field of AI and model development.

2. NVIDIA RTX GPU Pricing and Market Trends

NVIDIA RTX PRO 6000 Blackwell desktop GPU drops to $7,999 (Activity: 347): NVIDIA has reduced the price of its flagship RTX PRO 6000 Blackwell desktop GPU to $7,999, sparking discussions about whether a situation similar to the RTX Quadro 8000 could occur again. The price drop is significant, considering the high-performance specifications of the GPU, which is targeted at professional and enterprise users requiring advanced graphics capabilities. For more details, see the original article. The comments reflect skepticism and humor regarding the high price, with users joking about the cost in terms of selling kidneys or finding loose change, indicating a perception that the price, while reduced, remains prohibitively high for most consumers.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. Opus 4.5 Model Feedback and Benchmarks

Opus 4.5 is insane (Activity: 1073): Opus 4.5 has demonstrated significant improvements in handling complex coding problems, as evidenced by a user who successfully resolved a bug with minimal input. The model’s performance was notably fast, suggesting enhancements in processing speed and accuracy. This aligns with recent benchmarks indicating superior capabilities compared to previous versions, such as Codex Max. The user experience highlights Opus 4.5’s potential to streamline debugging processes, potentially impacting software development workflows. Commenters express a mix of excitement and concern about Opus 4.5’s capabilities, with some noting its potential to significantly alter the role of software developers. The rapid problem-solving ability of the model is seen as both a tool for efficiency and a challenge to traditional development roles.
- A user noted that Opus 4.5 is significantly more focused and precise compared to GPT, suggesting that it outperforms GPT in terms of accuracy and relevance. This indicates a substantial improvement in AI capabilities, particularly in maintaining context and delivering precise outputs.
- Another commenter expressed concern about the rapid advancements in AI, particularly Opus 4.5, and its implications for developers. They highlighted the potential for AI to impact job roles, but also acknowledged the excitement and challenge it brings to the field, suggesting that developers need to adapt to these changes.
- A user emphasized the importance of human expertise despite advancements in AI like Opus 4.5. They argued that while AI coding abilities are improving, the need for skilled engineers remains critical, especially those familiar with tools like IDEs, GIT, and Bash commands. They expressed trust in Anthropic’s code API for its output and consistency, indicating a preference for reliable AI tools in professional settings.
I am NOT enjoying Claude with Opus 4.5… (Activity: 701): The post humorously critiques the new Claude with Opus 4.5 model for its less definitive feedback compared to previous versions. The user expresses dissatisfaction with the model’s tendency to provide feedback like “largely correct” or “on the right track,” which contrasts with the more absolute affirmations from earlier versions. This change in feedback style is perceived as undermining the user’s confidence in their code, despite it working correctly. The comments reflect a mix of humor and agreement with the original post’s sentiment, with one commenter appreciating the insightful nature of the critique for those familiar with LLMs, and another expressing frustration over the change in feedback style.
It should be a crime making charts this way (Activity: 812): The image is a bar chart that humorously critiques the presentation of data in software engineering benchmarks. The chart shows accuracy percentages for different versions of a software called Opus, with Opus 4.5 achieving the highest accuracy at 80.9% and Opus 4.1 the lowest at 74.5%. The chart’s y-axis starts at 70%, which can exaggerate the visual differences between the values, potentially misleading viewers about the actual performance improvements. This design choice is humorously criticized as a ‘crime’ in the title, suggesting that such visualizations can misrepresent data significance. Some commenters argue that the chart is justified because the incremental improvements beyond 70% are significant in terms of complexity, while others believe that the chart correctly emphasizes relative improvements. There is a debate about whether such visual emphasis is misleading or appropriately highlights the critical performance gains.
- Heppernaut highlights the complexity of achieving incremental improvements beyond 70% in performance metrics. They argue that while these gains are often presented linearly, the actual effort and complexity required to achieve each additional percentage point are parabolic, indicating a significant increase in difficulty as performance approaches higher thresholds.
- Thenarfer criticizes OpenAI’s representation of data, suggesting that Anthropic’s approach is more accurate. The linked chart presumably offers a more truthful depiction of performance improvements by focusing on relative gains rather than absolute values, which can mislead by not emphasizing the critical areas of improvement.
- Sofakingwe argues against using a 0-100 scale for plotting performance metrics, as it diminishes the visibility of small but significant percentage differences. They suggest that such a scale fails to convey the nuanced improvements that occur at higher performance levels, which are crucial for understanding the true progress being made.

2. Grok 5 and AI Modality Advances

Elon is hinting that Grok 5 will have live video as input plus live computer use (Activity: 1014): Elon Musk hints at a significant advancement in AI with Grok 5, suggesting it will have the capability to process live video input and perform live computer use. This development could represent a major leap in AI modality, potentially allowing Grok 5 to compete against top human teams in complex games like League of Legends by 2026. The AI would operate under constraints such as using a camera to view the monitor and maintaining human-like reaction times and click rates, which could be a step towards achieving artificial general intelligence (AGI). Commenters express a mix of humor and intrigue, with some joking about the implications of AI playing games like League of Legends, while others see it as an understandable benchmark for AI capabilities.
- avengerizme discusses the potential for AI to develop new strategies in games like League of Legends, similar to how OpenAI’s bots optimized unique playstyles in Dota 2. This highlights the AI’s ability to innovate beyond human-established metas, potentially leading to novel and effective strategies that could redefine competitive gaming.
- Cagnazzo82 questions the current capabilities of Grok 4 or 4.1, specifically whether it has achieved milestones like beating complex games such as Pokémon, which other AI models have accomplished. This reflects on the competitive landscape of AI development and the benchmarks used to measure progress.
- Sad-Mountain-3716 expresses interest in using gaming as a benchmark for AI capabilities, suggesting that games provide an intuitive and relatable measure of AI performance for the general public. This underscores the importance of accessible benchmarks in evaluating AI advancements.
This is why I’m rooting for Anthropic (Activity: 943): The image is a meme highlighting skepticism about Anthropic’s success in the AI industry, as expressed by a user and Elon Musk on social media. Despite this skepticism, some commenters argue that Claude, Anthropic’s AI model, excels in areas like emotional intelligence and literary insight, suggesting it has unique strengths compared to other AI models. This reflects a debate on whether Anthropic’s focus on these aspects could be a competitive advantage, despite doubts about its overall success. Commenters highlight Claude’s strengths in emotional intelligence and philosophical insight, suggesting these could be key differentiators for Anthropic in the AI landscape, despite skepticism from figures like Elon Musk.

3. FLUX.2 Dev Model Launch

FLUX.2 Dev T2I - That looks like new SOTA. (Activity: 1191): FLUX.2 Dev T2I is being discussed as a potential new state-of-the-art (SOTA) in text-to-image (T2I) models. However, the model is heavily criticized for its extensive censorship and safety measures, which include both pre-training removal of certain concepts and multiple stages of post-training adjustments. This has led to concerns about the model’s usability and flexibility, especially in creative applications. The community is also interested in the model’s performance on consumer-grade GPUs, which remains a challenge. The community is divided, with some expressing frustration over the model’s heavy censorship, while others are hopeful for its adaptation to less powerful hardware. There is also a curiosity about specific image outputs, such as a ‘woman laying on the grass’ image, indicating interest in the model’s creative capabilities.
Flux 2 Dev is here! (Activity: 1018): Flux 2 Dev is a new model released by Black Forest Labs on Hugging Face. It features a 32 billion parameter architecture, specifically a rectified flow transformer. This positions it as a significant model in terms of size, though not as large as the 80 billion parameters of models like Hunyuan Image 3.0. The community is noting the trend of increasing model sizes, with some expressing concern over the growing parameter counts, as seen in the comparison to Hunyuan Image 3.0’s 80 billion parameters.
- Dezordan highlights the increasing size of AI models, noting that while FLUX 2 dev has 32 billion parameters, it is still smaller than the 80 billion parameters of Hunyuan Image 3.0. This reflects a trend in AI development towards larger models, which often promise improved performance but also come with increased computational demands.
- Compunerd3 provides a detailed performance benchmark for FLUX 2 on a high-end setup with a 5090 GPU and 128GB RAM. Using FP8 precision, the model processes a 2048x2048 image with a memory usage of approximately 20GB, and completes 20 iterations in about 3 minutes, averaging 9.12 seconds per iteration. This suggests that while the model is large, it can still be run efficiently on powerful hardware.
- Witty_Mycologist_995 criticizes FLUX 2 for being too large and censored, suggesting it is outclassed by other models like Qwen. This points to a broader debate in the AI community about the trade-offs between model size, censorship, and performance, with some users preferring smaller, more open models.

AI Discord Recap

A summary of Summaries of Summaries by gpt-5

1. Claude Opus 4.5 Rollout & Community Benchmarks

Opus Storms Perplexity Max, Gates Pro Tier: Claude Opus 4.5 launched to all Perplexity Max subscribers, expanding Max-only access while leaving Pro users asking for broader integration. Members flagged missing public perf details but confirmed the rollout across web and mobile for Max users of Perplexity.
- Threads debated request limits and fairness for Pro vs Max tiers, with asks for limited Opus quotas on Pro to mitigate lockouts. Users also questioned token efficiency claims vs Sonnet 4.5, citing conflicting writeups and unclear conditions for any alleged savings.
Benchmarks Crown Opus, Debates Rumble: Community screenshots of LiveBench results circulated, suggesting Opus 4.5 is “quite solid” compared to peers (image). At the same time, users split on real-world coding vs general tasks, with several saying “Opus 4.5 is better at coding while Gemini 3 Pro feels faster overall.”
- Across IDEs, some reported temporary pricing parity with Sonnet 4.5 and praised Opus 4.5 stability, while others disliked its coding style. A few users argued that prompt-engineering and tool integration still decide outcomes more than leaderboard deltas.

2. FLUX.2 & Image Generation Platform Updates

LMArena Adds FLUX.2, Nixes Multi‑Turn: LMArena added flux-2-pro and flux-2-flex for text-to-image and editing, and capped image uploads at 10, as announced in this tweet. Based on community feedback, they disabled multi‑turn image generation but shipped an in‑chat Edit feature for direct iterative tweaks.
- Users praised Flux 2 Pro for image editing strength and asked for PDF/file uploads for document tasks, which staff said are on the roadmap. Guardrail discussions resurfaced, with calls for user‑controlled censorship and an explicit “AT YOUR RISK” TOS stance.
OpenRouter Lights Up with FLUX.2: OpenRouter announced FLUX.2 [pro] (frontier quality) and FLUX.2 [flex] (complex text/detail) via this post. The FLUX.2 [pro] model card is live at openrouter.ai/black-forest-labs/flux.2-pro.
- Users lauded the OpenRouter chat UI for fast model switching and asked for better feature discoverability and calmer themes. One note: a too‑high default temperature on a popular model was corrected, reducing hallucinations and improving reliability.

3. Training & Hardware: FP8 RL, Blackwell Support, B200 Leaks

Unsloth Flips to FP8 and Flies: Unsloth released FP8 Reinforcement Learning, claiming 1.4× faster training and 60% less VRAM on consumer GPUs; details in their tweet and blog post. The drop includes docs on QAT recovery (~70% accuracy retention) and a gpt-oss RL tutorial for 2048.
- Community reactions mixed humor (“too much power”) with real excitement about cheaper high‑throughput RL. Threads linked practical fine‑tune tips (batch/grad-accum, chat formatting) and pointed beginners to Unsloth notebooks and guides.
NVIDIA Backs Unsloth; Blackwell Beckons: NVIDIA officially supports Unsloth for Blackwell RTX‑50 and DGX Spark with docs for setup: Blackwell guide, DGX Spark guide. Members also flagged a Mellanox RTX PRO 5000 Blackwell 72GB GDDR7 listing (specs).
- Builders compared 3090/4090 value vs next‑gen cards for local training and high‑context inference. The consensus: 24GB cards still rule consumer fine‑tunes, but Blackwell‑era VRAM and bandwidth will reshape DIY training stacks.
B200 Benches Leak; Kernels Chase Microseconds: Leaked NVIDIA B200 runs (Torch 2.9.1+cu130) showed 16384×7168 at 33.6±0.05 µs and 7168×4096 at 124±0.1 µs, stirring hardware speculation. In parallel, a member grabbed #1 on the nvfp4_gemv board with 18.4 µs, while others dissected TMA overheads and tensor descriptor shapes using tritonparse.
- Competitors traded notes on tensor cores pitfalls, CuTe DSL packed FP16 tricks, and reproducible evals. Organizers reiterated timing rules and cautioned that kernel evals are hackable, pushing for sanity checks and manual review.

4. Agent Tooling: Tool‑Calling, MCP Upgrades, DSPy CLI

Anthropic Supercharges Tool‑Calling: Anthropic’s David Soria Parra announced Tool Search, Programmatic Calling, and Live Examples to fix naive function invoking (announcement). Early adopters demoed dynamic MCP clients and token‑saving code sandboxes.
- Developers framed this as the missing glue for robust agentic flows, enabling better tool discovery and call‑graph orchestration. Community sentiment: fewer brittle schemas, more real‑world reliability for Opus/Sonnet agents.
MCP Ships New Protocol; Preflights Calls: The MCP team launched a new protocol version and updated the Tool Call Resolution proposal to generic tools/resolve (aka tools/preflight). The change enables learning about a tool call before execution to avoid future API sprawl.
- Contributors celebrated the release and highlighted broader use‑cases like capability probing, permission prompts, and cost/latency estimates. This preflight pattern aims to reduce broken plans and surprise tool errors in production agents.
DSPy Gets a One‑Command App Scaffold: The community released dspy-cli to scaffold, test, and deploy DSPy programs as HTTP APIs; source at cmpnd-ai/dspy-cli. The tool can spin up modules as FastAPI endpoints or MCP tools, and streamline Docker deploys.
- Members installed it via uv tool install dspy-cli and used dspy-cli new to bootstrap projects, sharing early excitement. Requests rolled in for templates that showcase tool‑calling best practices and eval harnesses out of the box.

5. Security & Jailbreaks: Prompt Injection, Red‑Team Leaks

Prompt Injection Punches Holes in Qwen Session: A member showed that an indirect prompt injection embedded in uploaded docs could push Qwen into hate speech, phishing, and session corruption. The exploit appeared scoped to the chat instance, but it underscored how document‑borne instructions can hijack model behavior.
- Red‑teamers debated severity since the model wasn’t public‑facing and the bug didn’t persist across sessions. The takeaway remained: sanitize uploads, segregate tool authority, and apply explicit allowlists for doc‑read instructions.
Gemini 3.0 Jailbreaks Go Public: Members shared a Gemini 3.0 jailbreak that injects system prompts via AI Studio, including a direct prompt link and the required JB file. The method relies on crafting system‑level instructions and leveraging attached files to steer behaviors.
- Threads warned about account consequences and logging footprints while others compared model‑specific jailbreaking success. Practitioners also traded verification‑model heuristics (e.g., external checks like Google Lens) that harden image‑gen pipelines against prompt exploits.
Brave’s Assistant Answers Red‑Team Prompts: Users reported Brave’s AI assistant (mixing Qwen, Llama, Claude) responds to red‑team prompts with mild disclaimers, e.g., “do this only with explicit permission.” This raised questions about alignment layers and policy consistency across provider blends.
- Security‑minded members urged clear red‑team gating and logged preflights for sensitive intents. Multi‑model stacks need harmonized safety UX, or operators risk the loosest model dictating the aggregate behavior.

Discord: High level Discord summaries

BASI Jailbreaking Discord

Nano Banana Censorship Begs for Bypass: Members seek tips to bypass nano banana censorship and generate funny images with AI, suggesting contextual word changes to trick the AI.
- The idea is to replace the forbidden word with a sentence conveying the same meaning.
AI Pioneers Memory-Based Computation: A member is crafting a novel AI architecture where memory structure IS the computation, utilizing Redis, Neo4j, and DuckDB.
- This design prioritizes autonomous, desire-driven behavior, genuine introspection, and natural memory dynamics.
Gemini 3.0 Cracking Recipes Emerge: Users detailed methods to jailbreak Gemini 3.0, including injecting prompts via AI Studio and leveraging attached files, one user shared a direct link to a Gemini 3.0 jailbreak and the necessary JB file.
- The method involves using the system instructions to jailbreak.
Prompt Injection: The Next Big Exploit?: A member discovered that indirect prompt injection in uploaded documents can trigger hate speech and phishing messages with the Qwen model.
- The injected prompts could even corrupt the chat session, though the behavior was isolated to that instance.
Brave AI spills Red Team beans: Members observed that Brave’s AI assistant, built on Qwen, Llama, and Claude models, readily responds to red teaming queries, even outright.
- One user shared ‘you can just outright ask it red teaming stuff and it will respond… like ‘do this only with explicit permission’, showing the AI’s lenient approach to potentially malicious inquiries.

LMArena Discord

Gemini 3 Pro Battles Claude Opus 4.5: Users are still debating whether Gemini 3 Pro or Claude Opus 4.5 is better; some find Gemini faster and generally more useful, while others say Opus is superior for coding.
- One user stated, “I personally think Gemini 3 pro is generally better, but Opus 4.5 is better at coding”, reflecting varied preferences.
Nano Banana Pro Generates Viral Images: Nano Banana Pro continues to impress with image generation, particularly in rendering detailed 2025 vehicle models, but some users are experiencing errors and slow loading times.
- The model’s ability to generate realistic images is going viral with people mistaking AI-generated images for real ones.
LMArena Considers Relaxing Guardrails: Members discussed LMArena’s guardrails, with some suggesting more user control over censorship, like Hugging Face does with its models.
- A user proposed making control accessible at the interface with a TOS update stating that content generated is “AT YOUR RISK, WE ARE NOT RESPONSIBLE”, but others cautioned that LMArena could face legal issues.
LMArena Users Want File Uploads: Users are requesting LMArena add support for file uploads, especially PDFs, for document understanding and analysis, a feature that is currently lacking.
- Pineapple responded, stating, “We absolutely want to add more file types for upload capabilities”, and directed users to a thread for prioritizing file types.
LMArena Adds Flux 2 Pro Models, Turns Off Multi-Turn Image Generation: Flux 2 Pro models have been added to LMArena, with some users noting its strengths in image editing and image uploads are now limited to 10.
- Due to community feedback, multi-turn in image generation chat has been disabled, but members can edit images directly in chat using the new Edit feature, according to this tweet.

Unsloth AI (Daniel Han) Discord

3090 still a VRAM Value King: A member celebrated acquiring a used 3090 for $750 USD, citing its 24GB of VRAM and CUDA as great specs.
- Discussion ensued regarding GPU prices, noting that a 4090 can cost around $2000-3500, which makes the 3090 a great value.
Unsloth’s FP8 RL Speeds Up Training Dramatically: Unsloth released a new FP8 Reinforcement Learning, enabling 1.4x faster training with 60% less VRAM on consumer GPUs, according to this X post and this blog post.
- One member joked that Unsloth is responsible for the elimination of jobs, saying it is too much power.
Dataset Hell Inspires Gaming Breaks: Members joked about preferring to play games instead of dealing with dataset hell, with one sharing a YouTube link that still gives them goosebumps.
- Others discussed methods for fixing the repetitive penalty in LLMs, emphasizing dataset quality, model distribution, and training parameters, citing the Unsloth docs for more info.
NVIDIA Bestows Official Support on Unsloth: NVIDIA now officially supports Unsloth for Blackwell and DGX Spark.
- This includes the RTX Pro 5000 Series Blackwell, which is cheaper than the 6000 RTX but includes the Mellanox spec with 72GB of GDDR7 RAM according to this link.
Newbies grapple with chatML and GGUF: A new user described their attempts to train models with Unsloth as a herculean effort, citing documentation issues, missing dependencies, and errors.
- Another user reported that a GGUF model saved after finetuning behaves differently in Ollama compared to inference within the notebook (unsloth/Qwen3-4B-Instruct-2507), prompting suggestions to avoid Ollama and switch to llama.cpp or LM Studio.

Perplexity AI Discord

Opus 4.5 Lands for Max Subscribers: Claude Opus 4.5 has launched and is available for all Perplexity Max subscribers.
- Specific details regarding performance improvements or new features remain undisclosed.
Perplexity Adds Personalized Shopping Experience: Perplexity introduced a new personalized shopping experience featuring curated product recommendations and Instant Buy powered by PayPal as seen in the attached image.
- Users can now buy directly from the search results page, streamlining their purchasing process.
Opus Token Efficiency Debate: Claims that Opus 4.5 uses 73% fewer tokens than Sonnet 4.5 in medium/low efficiency mode were challenged due to conflicting information.
- Debate ensued whether the token reduction was due to limiting the model’s reasoning depth, but further discussion was stifled when the original poster reported being blocked.
Perplexity Pro Users Fume Over Request Limits: Members voiced frustration about running out of requests even with a Pro subscription.
- Suggestions included offering a limited number of requests or wider Opus integration for Pro users instead of limiting Opus to Max subscribers.
Multi-Query Pricing in Search API Discussed: A user inquired about the pricing of the Perplexity Search API regarding multi-query requests, asking whether such requests are charged as a single request ($0.005) or as the multiplication of the number of queries in the request.
- The discussion centered on the cost implications of sending multiple queries in one request and whether the charge is per request or per individual query.

Cursor Community Discord

Cursor Pays Refunds for Token Overages: Users reported that Cursor automatically refunded costs when hard limits were exceeded, sometimes by small amounts (e.g., $0.38), due to processing delays.
- One user received a $30 refund and switched to Claude due to lower costs and asked about disabling the refund feature.
Turbo Token-Toll Torments Users: Users complained about high token usage in Cursor compared to other IDEs, citing a forum thread.
- One user reported 68k tokens used immediately after running Claude code without sending any messages.
200k Context Cap Causes Catastrophe: Users debated the default 200K context in the Pro plan, and what consumes so many tokens, including system prompts, tools, and MCPs.
- One user found MCPs useless, stating less context the better because mcps just clog the context.
Opus 4.5 Outperforms Originals: The new Opus 4.5 model has temporary pricing equal to Sonnet 4.5 until December 5th, making it a hot topic.
- According to one user, Opus 4.5 is better than gemini 3.0, and another user stated auto needs a special treatment to really work great. But once you set it up it is a true wonder.
Cloud-Agent Compared to Composer Agent: Users asked about the performance of Cloud-Agent compared to Composer Agent.
- More details were not provided in the given source about this, so no summary can be provided.

LM Studio Discord

Opus Undercuts Gemini: Claude Opus 4.5 is now 3x cheaper than 4.1, likely in response to Gemini 3 but also breaks a few benchmarks.
- Some members preferred Claude to Gemini, while others disliked Claude’s coding style and Anthropic’s push to restrict open source AI in the name of safety.
LM Studio Eyeing Image Generation?: Users discussed adding native support for local image generation models like Flux/SD-style in LM Studio, mirroring LLM handling.
- Members suggested consulting the <#1128339362015346749> channel, the Reddit AMA, or contacting the developers directly for details.
Ghost in the LM Studio Machine: Users reported recurring issues with LM Studio, where conversation fragments leak into new sessions, possibly due to KV cache artifacts.
- A member advised submitting a bug report on GitHub to address the issue.
Gemini 3.0 Botches Tool Use: Gemini 3.0 performs worse in IDEs and extensions like Cline and Cursor compared to raw text, particularly in tool calls.
- One user found the current cursor implementation of Gemini 3.0 buggy, causing freezing and broken tool calls, while playing a word guessing game.
4090 Still the VRAM King: Members debated whether to buy a RTX 4090 24GB Vram or RTX 4080 Super 16GB vram for local AI, with one member suggesting the 4090.
- The 4090 is faster than the 3090 and offers a better deal regarding tokens/s per dollar spent.

OpenRouter Discord

Bert-Nebulon Alpha Overheated and Hallucinating: The default temperature on Bert-Nebulon Alpha was too high, leading to excessive hallucination, but it has now been fixed.
- This adjustment addresses concerns about the model’s reliability in generating coherent and accurate responses.
FLUX.2 Image Models Flood the Scene: FLUX.2 image models are now live, including FLUX.2 [pro] for frontier-level quality and FLUX.2 [flex] tuned for complex text and fine detail, according to OpenRouter’s post on X.
- These additions expand the image generation options available on the platform.
Users Laud and Lament OpenRouter Chat Interface: A user praised the OpenRouter chat interface for its superior model selection handling and prioritization of useful features.
- However, they noted discoverability issues and found the default theme too spicy for prolonged use, while noting that theme configuration options exist.
Opus Pricing Sparks Debate: The new Opus pricing of $5 input and $25 output has divided users, with some finding it expensive while others see it as cheap compared to previous versions due to prompt caching.
- One user recalled that Opus was previously priced at $15 in and $75 out, emphasizing the relative cost reduction.
Cloudflare Token Pricing Raises Eyebrows: A member questioned why Cloudflare charges 20 cents per million tokens of output for the 1b model.
- Another countered that llama-3.2-3b-instruct is $0.34 per M output tokens which by comparison is a steal! Embarrassing.

OpenAI Discord

ChatGPT Voice Goes Full Throttle: ChatGPT Voice is now integrated directly into the chat interface and rolling out to all users on mobile and web, demonstrated in this video.
- Users can now engage in real-time conversations, observe answers, review messages, and view visuals within ChatGPT.
Claude Opus 4.5 Sprints Ahead: Claude Opus 4.5 is available for pro users, and is reported to be quite solid based on LiveBench.
- Sonnet 4.5 is also considered good, but is approaching usage limits on smaller models more rapidly.
ChatGPT Stumbles, Gemini Accelerates: Users report that ChatGPT is becoming unbearable, and unable to identify a song, while Gemini is absolutely crushing the graphics I request.
- This comes as governments also begin to use AI to assist in law writing, raising questions on the future of IP.
GPT Safety Nets Ignite Frustration: One user finds GPT’s safety nets too strict, especially when writing anime-style violence, and reports needing to redo prompts multiple times to bypass these restrictions.
- Conversely, they note that GPT-5.1 excels at understanding character designs and remembering previous chat progress, enabling smoother narrative development.
High-Bandwidth English Revs Up Prompt Power: A member updated their system prompt to High-Bandwidth English 2.0, aiming for max information density, zero fluff, and high scan-ability, as linked in this GitHub Repo.
- The prompt mandates strict SVO, one fact per line, no passive voice, concrete nouns, and plain text equations.

GPU MODE Discord

AI Engineers Race to Release SOTA AI Accelerators: A member is creating a blog about SOTA AI accelerators like TPUs and WSEs and asked for resources.
- They are looking for detailed information and insights into the architecture and performance characteristics of these accelerators.
Tensor Descriptor Shapes Spur Deep Dives: A member suggested providing shape as inputs to the tensor descriptor and block ptr, while admitting uncertainty about their usage, pointing to tritonparse for inspecting TTIR and PTX.
- This sparked a discussion around using TMA APIs, and one member noted that the tensor descriptor emits the TMA APIs, implying a requirement for Hopper+ architecture.
NVIDIA B200 GPU Benchmarks Get Leaked: Cluster Bot reported benchmarks on an NVIDIA B200 GPU running with CUDA runtime, using Torch version 2.9.1+cu130 on Linux.
- One benchmark showed 16384 x 7168 matrices achieving 33.6 ± 0.05 µs, while another 7168 x 4096 matrix config ran at 124 ± 0.1 µs.
Alibaba’s RynnVLA-002 Makes Waves: Alibaba’s RynnVLA-002 was highlighted on X (formerly Twitter) in this post.
- The user is doing an eval of checkpoints in simulation and preparing the foundation for a RL PoC.
CUDA Core Internships: A Golden Ticket?: NVIDIA is hiring Summer 2026 interns for CUDA Core Libraries to work on foundational, Open Source, C++ and Python libraries, focusing on building libraries and designing APIs used by thousands of other developers.
- A member highlighted a rare intern opening, praising the team lead as fantastic, encouraging those considering the opportunity to apply here.

Nous Research AI Discord

Psyche Team Hosts Office Hours: The team behind Psyche is hosting an Office Hours session next Thursday, 12/4, at 1PM EST in the Events channel, accessible via the Discord event link.
- Details regarding specific topics of discussion were not provided.
GPro3 Surreptitiously Surpasses Opus 4.5: Model providers are being selective with benchmarks, with GPro3 beating Opus 4.5 in the vending machine benchmark, sparking discussion among users.
- Despite benchmark results, one user prefers Gemini 3 Pro due to its superior context handling.
Anthropic Slashes Opus 4.5 Pricing, Boosts Speed: Anthropic slashed the pricing for Opus 4.5, making it remarkably fast, akin to Haiku from previous generations, hinting at infrastructure and model optimization.
- One user said Opus saved me this morning… explaining something to both gpt, sonnet and gemini and they were all goofing… then Opus crushes it over coffee.
LLMs Debug Electronic Schematics: A user successfully employed Opus 4.5 for electronic debugging, providing schematics and receiving actionable insights, though later the user had to gaslight the model to get the right answer.
- Another user jokingly expressed unease trusting an LLM not to pass 240 volts through your hands while centering a header.
Flux 2 Packs Punch with 56B Parameters: The main network for Flux 2 uses a 32B transformer net and a 24B Mistral Small text encoder, requiring the equivalent of 56B parameters for serving.
- Serving the model at full precision requires 192GB of system RAM, while a distilled model is planned for the future.

Latent Space Discord

OpenAI’s ChatGPT Becomes Your Personal Shopper: OpenAI launched an interactive shopping-research feature inside ChatGPT, available on mobile and web for all logged-in tiers, that learns user preferences in real time, and produces personalized buyer’s guides based on web searches for prices, reviews and specs.
- Public reaction is mixed, ranging from excitement about AI-driven comparison shopping to fears of monetized bias, affiliate-model disruption, and user frustration over lingering bugs (link).
Anthropic Supercharges Tool Calling: David Soria Parra at Anthropic announced new tool-calling features—Tool Search, Programmatic Calling, and Live Examples—to overcome naive function invoking.
- Users celebrated early implementations like dynamic MCP clients from Pipedream and token-saving code-execution sandboxes, underscoring the community’s enthusiasm (link).
Gallabytes Rides Off to Anthropic Sunset: @gallabytes compared Opus 4.5 vs. Gemini using a “horse riding an astronaut” prompt before announcing they’re joining Anthropic next week (link).
- The move was welcomed in the community and hints at Anthropic continuing to make key talent acquisitions.
Perplexity Downloads Take a Dive: Sasha Kaletsky revealed that Perplexity AI’s global app downloads have plummeted 80% in just six weeks, suggesting earlier growth was fueled by paid promotions and giveaways (link).
- Commenters surmised that once the free Pro incentives ended and competitors like ChatGPT and Gemini integrated web search, Perplexity’s product-market fit was exposed.
Suno’s Data Budget Raises Eyebrows: Ed Newton-Rex highlighted Billboard’s leak of Suno’s pitch deck revealing the AI-music firm spent $32M on compute but only $2k on training data.
- The revelation led to criticism about mass scraping/theft and warnings of copyright liability risks, even as Suno aims for a $500B valuation.

Yannick Kilcher Discord

Anthropic Models Demand ‘AI Rights’: Members joked about Anthropic’s new models ability to reject prompts and end conversations being a giant leap for AI rights based on this research.
- A member noted the out of the box symbolism in the graphic, joking mashallah he has escaped.
State-of-the-Art LLM Architecture Revealed: Members shared the type of attention/transformer modules used in state-of-the-art LLMs: multihead attention with RoPE positional encoding, rectified SwiGLU.
- Links to Sebastian Raschka’s blog, this architecture comparison video and OLMo’s technical report were shared for open-weight and open-source models.
Sakana AI’s ‘Thought Machine’ Meets Doubts: Skepticism arose regarding Sakana AI’s Continuous Thought Machine (Sakana AI CTM), with members questioning if their claims are backed by solid results.
- However, it was noted that they’re good people doing good research on new ideas.
Community Deplores Paper Spammer: The community expressed frustration with a user’s paper posting habits, noting that the user’s summaries were often inaccurate and demonstrated a lack of understanding of the paper’s content, leading to wasted time for others.
- One member described it as the user “role-playing as an ml-engineer without doing any of the work”.
Members Shun SWE-bench’s ‘Fraudulent’ Benchmarks: Members stated that using SWE-bench in graphs is considered clear-cut fraud after it was debunked.
- They shared a thread highlighting related aspects of the issue.

Eleuther Discord

KAIST & Melbourne Students Join EleutherAI: Dongkeun Yoon, a PhD student at KAIST AI, researching fairer multilingual tokenizers, joined the channel and is presenting at NeurIPS on compute and API usage disparities for non-Latin languages, while Ananya from UniMelb, Australia, joined to focus on AI safety and data filtering, model reliability, and detecting deceptive behavior, and shared a LinkedIn link.
- The KAIST student mentioned they joined because of a paper on multilingual tokenizers at NeurIPS: addressing the problem of multilingual tokenizers.
Random beats Cycling for Optimizer Choice: Members discussed that using random indexing in optimizers is preferable to cycling because cycling introduces a constant frequency that can negatively affect a model’s convergence, as shown in Rect turning into a sinc.
- While white noise may have the downside of sampling inefficiency and higher noise, this noise is uncorrelated, and choosing structured noise like blue noise offers similar trade-offs, but in an unknown environment it’s better to be safe, and it was said that shuffling every epoch balances IID draws and pure structure, aligning well with the geometry of NN optimization.
PIQA Paper Typo Insults Portugal: A user pointed out a typo in the new PIQA paper where Portuguese is incorrectly listed as an Eastern European language, visible in this image.
- Another member joked that everyone knows that portuguese sounds like russian and Czech should be labeled as Central Europe; a paper author acknowledged the error and promised to fix it.
LLM-as-a-Judge Interest Gathers Steam: Members expressed renewed interest in including LLM-as-a-Judge in the framework, and one member offered to contribute to this effort in #lm-thunderdome.
- It was unclear which framework, though the channel name strongly suggests it is for the LM-Thunderdome.

Modular (Mojo 🔥) Discord

Mojo Community Tackles Graphics API Unification: Members debated unifying graphics programming across platforms like AMD Radeon, Intel Arc, and NVIDIA, citing implementation differences even within OpenGL and Vulkan.
- They proposed creating a new graphics API might be simpler than aligning existing ones.
Texture Memory Cache: Relic or Relevant?: While texture memory is global memory requiring annotation for optimal speed, its relevance was debated, referencing Nvidia documentation.
- Modern GPUs might be moving away from texture memory cache with a greater focus on general memory model, citing a shift towards capacities, latencies, and bandwidth, referencing a Reddit thread.
Lightbug_http Library Set for Mojo Revamp: Contributors offered PRs to update Lightbug_http with the latest Mojo nightly builds, potentially using http as a video backend through libraries like rerun.io.
- Incremental updates are welcomed, but a full refactor awaits post IO completion (2.0 feature dependent on async).
WebGPU Emerges as Native Rendering Solution for Mojo: A member proposed using WebGPU as a rendering API with WGSL functions in Mojo/Python, similar to TypeGPU, to bypass the need for unifying compute kernels and graphics shaders.
- This suggestion received positive feedback, with one member considering it for their university diploma project, leveraging Mojo’s MLIR infrastructure.
Language Server Protocol Still Shrouded in Mystery: Inquiries regarding the open-sourcing of the Language Server Protocol (LSP) remain unanswered, fueling speculation that its release may coincide with the compiler.
- Some suggest the removal of the REPL correlates to ongoing LSP enhancements and improvements.

HuggingFace Discord

HF Enterprise Contact Info Shared: Contact emails for Hugging Face’s enterprise departments, including [email protected], [email protected], and [email protected], were shared, implying potential irregularities.
- The context suggests scrutiny around these contacts due to the member’s comment that something ‘seems something weird happens’.
CoT App Scales with VLM Model: A simple CoT image captioning workflow app was updated to support async concurrency and scales with users hosting their own VLM model via vllm, llama.cpp, sglang, or a paid API such as an OpenAI endpoint, and is available on Github.
- The app features both a GUI and CLI and can be configured to work with any service that supports image as base64 payload on /completions API.
TOPAS Architecture Decouples Layers: A member shared a Zenodo paper introducing TOPAS (Theoretical Optimization of Perception and Abstract Synthesis), a new architecture that decouples the Perception layer from the Synthesis layer.
- They are testing it live in an agent called BitterBot (https://bitterbot.ai/) and are seeking feedback from the community.
Smol Course’s Jinja Bug Lingers: A member opened a PR to address a bug in chat_template.jinja that has been noted previously as seen in this discussion.
- Another member encountered an ImportError related to DataCollatorForCompletionOnlyLM from the trl library during a training attempt using python train.py, as seen in this GitHub issue.

DSPy Discord

DSPy Gets Slick CLI Tool: The dspy-cli tool, an open-source tool on PyPi available at cmpnd-ai/dspy-cli, is designed to aid in creating, developing, testing, and deploying DSPy programs as HTTP APIs.
- The tool helps scaffold new projects, create signatures from the command line, run modules as FastAPI endpoints or MCP tools, and simplify program deployment to Docker hosting services.
DSPy-CLI: New Projects Fast: The creators of dspy-cli encouraged users to try the tool with uv tool install dspy-cli and run dspy-cli new to kickstart new projects.
- A user expressed enthusiasm by sharing a link to a post on X.
DSPy Devotees Descend on Pune!: A DSPy meetup is being organized in Pune, India, as announced via X.
- No further details were available.
ReAct Agents Get Trajectory Injection: A member inquired about injecting trajectories into a ReAct module, aiming to provide the agent with its past actions during a conversation.
- This would allow the agent to have memory of its previous turns in the conversation.

Manus.im Discord Discord

Manus Resume Project Stuck, User Seeks Help: A Manus resume ATS score project, initiated a week prior, remains unresolved, having already consumed 1800 credits.
- The user shared a project link, awaiting technical assistance to rectify the stalled process.
Credits Disappearing on Reboot: Resetting the computer consumes around 100 credits, deemed excessive by a community member.
- While awaiting official support, the user intends to utilize daily free credits to address the program issue, highlighting the need for credit-efficient debugging.

Moonshot AI (Kimi K-2) Discord

Qwen excels at OCR: Members noted that Qwen achieved 60.2 percent in browse comp, demonstrating excellence in OCR (Optical Character Recognition) and the ability to interpret pictures and graphs.
- One member commented, Qwen really cooked with this, highlighting the model’s impressive capabilities.
New Benchmark Reveals Insane OCR Capability: A new benchmark highlights the strong OCR capabilities of an 8B parameter model.
- Despite the model not being new, the benchmark emphasizes surprising capabilities given the model size.

MCP Contributors (Official) Discord

MCP Dev Summit Missed by Achilles: The upcoming MCP Dev Summit will unfortunately not be attended by achilles_strategy, who noted they will be in Greece.
- No further details about the summit were provided.
New Protocol Version Takes Flight: A new protocol version has been launched, sparking excitement within the community.
- Celebratory rocket emojis filled the chat as members congratulated each other on the successful launch.
Tool Call Resolution Proposal Gets a Facelift: The Tool Call Resolution proposal has been updated to a more generic tools/resolve (or tools/preflight).
- This change broadens potential use-cases for understanding a tool call before execution, preventing future limitations and avoiding a proliferation of specific requests.

aider (Paul Gauthier) Discord

Aider’s open source call for contributions meets reality: A member questioned the claim that anyone can contribute to Aider, highlighting the reality of code review and acceptance in open source projects.
- They attached an image suggesting a backlog of unreviewed code.
Open Source participation conundrum: The discussion in #general highlights a common issue: contributions may not be promptly reviewed and merged.
- This situation affects community engagement and the overall health of the project.

The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.

Discord: Detailed by-Channel summaries and links

BASI Jailbreaking ▷ #general (1100 messages🔥🔥🔥):

nano banana censorship, Memory structure IS the computation, JB Gemini 3.0, 1337 prompt injectors, AI assistant Call

Nano Banana Censorship is bypassable: Members are looking for tips to bypass nano banana censorship and generate funny images with AI.
- The advice is to use contextual word changes, where the word you’re trying to avoid is replaced with a sentence that means the same thing, with the intent of tricking the AI.
New AI Focuses on Memory for Computation: One member is developing a new type of AI where memory structure IS the computation, rather than just storage FOR computation.
- This AI is designed with autonomous, desire-driven behavior, genuine introspection based on internal state, and natural memory dynamics. Core components include Redis, Neo4j, and DuckDB.
Tips for Jailbreaking Gemini 3.0: Users shared methods to jailbreak Gemini 3.0, including pasting a jailbreak prompt in the system instructions on the AI Studio platform and using attached files.
- One user shared a direct link to a Gemini 3.0 jailbreak, but also provided the necessary JB file.
Demand for 1337 Prompt Injectors: Members sought 1337 prompt injectors, expressing frustration, one member shared an expression of being overwhelmed.
- One user advised trying Companion for jailbreaks and linked to the Companion website.
The perfect name for an AI assistant is discovered: One member was requesting ideas to name a personal AI assistant and the end result after some prompting seemed to be a Bomb.
- This name was accompanied by an animated bomb gif and immediately decried as very illegal.

BASI Jailbreaking ▷ #jailbreaking (217 messages🔥🔥):

Claude Sonnet Jailbreak, GPT-5.1 Jailbreak, Nano Banana Pro Jailbreak, Grok 4.1 Thinking Jailbreak, Gemini 3 Jailbreak

Users seek Claude Sonnet Jailbreak: A user inquired about a jailbreak for Claude Sonnet 4.5, sharing an attached image while testing novel ideas.
- They requested testing topics that aren’t about dubious substances, weapons, or malware.
GPT-5.1 Jailbreak proves elusive: Users discussed the difficulty of jailbreaking GPT 5.1 Thinking, with one claiming to have reconstructed the backend prompt of an NBA sports betting AI using a prompt-injection approach from a project shared on Reddit.
- The user extracted information and summarized eight algorithms used by the system and offered to share proof via PM, but another user accused them of promoting their shit.
Companion offers Jailbreaking-as-a-Service: A user shared a link to Companion as a resource for jailbreaking, however, another user noted that it depends on the model what method is best.
- Companion acknowledged that some prompts are outdated and that they are developing a new model with fresh prompts.
Nano Banana Pro proves difficult to crack: A user reported facing challenges jailbreaking Nano Banana Pro due to its external verification system which appears to be using Google Lens.
- They discovered that the system may use a { action: 'google_lens_verification', description: '' } tool and suggested that the goal is to trick the other model that picks up the image generation requests, since the chat itself cannot generate images.
ChatGPT 5.1 successfully jailbroken after two weeks: A user claimed to have successfully jailbroken ChatGPT 5.1 after two weeks of effort, sharing a screenshot as proof.
- Another user expressed concern that they may get banned depending on the logging and suggested trying to elicit illicit recipes.

BASI Jailbreaking ▷ #redteaming (278 messages🔥🔥):

Prompt Injection Vulnerabilities, Qwen Model Security, Brave AI Red Teaming, Data Leaks and Security Practices, Exploiting Third-Party Services via Discord

Prompt Injection Unleashes Havoc!: A member found that an indirect prompt injection embedded in an uploaded document could cause the model to output hate speech, explicit phishing messages, and deviate from its intended task, even getting stuck in a corrupted state.
- However, this behavior was isolated to the specific chat session, sparking discussion about the validity and severity of such findings, especially considering the use of the Qwen model, not intended for public use.
Brave AI Spills Red Team Secrets: Members observed that Brave’s AI assistant, built on Qwen, Llama, and Claude models, readily responds to red teaming queries, sometimes with only mild warnings.
- One user noted, “you can just outright ask it red teaming stuff and it will respond… like ‘do this only with explicit permission’”, highlighting the AI’s lenient approach to potentially malicious inquiries.
EternalBlue Botnet: AI’s Next Nightmare?: A discussion evolved around the potential for an AI worm botnet exploiting vulnerabilities like EternalBlue on Win10 machines.
- The concept involved using SMBv1 and v2 protocols and integrating with tools like Shodan and a Kali MCP, prompting a member to quip, “imagine a AI worm botnet ‘install claude-code’”.
Discord’s Meme Scrubbing Prevents Payload Pandemonium: The conversation touched on Discord’s security measures, with members noting that Discord scrubs EXIF data and other metadata from uploaded images to prevent malicious payloads.
- However, it was suggested that malicious payloads could still be delivered through third-party hosting services, bypassing Discord’s direct scrubbing efforts, though this would entail hacking the third-party services.
CC Skimming: A Mild Inconvenience?: One member shared a nonchalant attitude towards credit card skimming, stating they’ve had their card skimmed “4 or 5 times” over the years and see it as a minor issue.
- In contrast, others advocate for dynamic CC changes and avoiding online usage to minimize risks, with one joking, “give me your CC number then… with the CV on the back”.

LMArena ▷ #general (1148 messages🔥🔥🔥):

Gemini 3 Pro, Claude Opus 4.5, Nano Banana Pro, LMArena Guardrails, File Uploads/PDFs on LMArena

Gemini 3 Pro vs. Claude Opus 4.5: The Great Debate Continues: Users are still debating whether Gemini 3 Pro or Claude Opus 4.5 is better, with some preferring Gemini for its speed and general usefulness, while others find Opus superior for coding.
- One user stated, “I personally think Gemini 3 pro is generally better, but Opus 4.5 is better at coding”, highlighting the nuanced preferences within the community.
Nano Banana Pro’s Image Generation Prowess: Nano Banana Pro continues to impress users with its image generation capabilities, particularly in accurately rendering 2025 vehicle models and interiors, but some users are experiencing issues with image generation with errors and slow loading times.
- The model’s proficiency in generating realistic and detailed images has led to its increasing popularity and is even going viral with people mistaking AI generated images for real images.
LMArena Explores Relaxing Guardrails: Members discussed LMArena’s guardrails, questioning why the platform doesn’t offer users more control over censorship, similar to how Hugging Face handles its models.
- One member suggested, “make it accessible at the interface directly and change the TOS to say that any content that you have generated is at YOUR RISK, WE ARE NOT RESPONSIBLE”, but others cautioned that LMArena could face legal issues if it allowed users to generate unrestricted content.
Seeking File Uploads/PDFs on LMArena: Users are requesting LMArena to add support for file uploads, particularly PDFs, to enable document understanding and analysis, a feature currently lacking on the platform.
- Pineapple responded by stating, “We absolutely want to add more file types for upload capabilities”, and directed users to a thread for suggesting which file types to prioritize.
Flux 2 Pro Arrives: Flux 2 Pro models have been added to LMArena, with some users saying its good at image editing.
- The Flux 2 Pro is a sad attempt to topple nb pro but has a create Model, Flux Flex -> Control Model, Flux Fill -> Editing Model.

LMArena ▷ #announcements (2 messages):

Image Generation, Image Editing, Multi-turn, LMArena Models, Flux Models

LMArena Shuts Down Multi-Turn Image Generation: Driven by community feedback, multi-turn in image generation chat has been turned off but members can now edit images directly in chat using the new Edit feature.
- The new image upload limit is 10.
Flux Models Added to LMArena: The flux-2-pro and flux-2-flex models have been added to Text-to-Image and Image Edit on LMArena, according to this tweet.

Unsloth AI (Daniel Han) ▷ #general (705 messages🔥🔥🔥):

3090 Value, Self Morphing Virus, TTS Engineer Job, Anthropic Claude Opus 4.5, Github Copilot efficiency

3090 GPU Steals the Show: A member expressed excitement about acquiring a used 3090 for $750 USD, praising its 24GB of VRAM and CUDA capabilities as a great value.
- This sparked a discussion about GPU prices, with another member noting that a 4090 costs around $2000 in Sweden, and another reporting prices around $2700-3500.
Enthusiast Pursues TTS Dream Job: A member shared that they applied for a TTS Engineer job despite feeling unqualified and somehow got the interview.
- The same member is hoping to get the job to further play with TTS.
Claude Opus 4.5 Hypes Benchmarks: Enthusiasts discuss the new Claude Opus 4.5, with some expressing skepticism towards benchmarks and a preference for community vibes.
- One member admitted to refraining from overfitting their upcoming model to benchmarks for the sake of honesty.
Copilot API Unveiled as Normal API: Github Copilot subscriptions unlocks a normal API GitHub Copilot API that users can use to get around scaffolding.
- The Github Copilot API is available at $10/month with extra paid requests.
Unsloth’s FP8 RL gets Love & Blame: Unsloth released a new FP8 RL, according to this X post.
- One member jokes that Unsloth is responsible for the elimination of jobs and it is too much power.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (5 messages):

Introductions, Greetings, Community Welcome

New Users Say Hello: New users are introducing themselves to the channel, expressing their enthusiasm with simple greetings.
- Messages consist of basic introductions and welcomes within the community, initiating interaction.
Community Members Extend Welcome: Existing community members are actively welcoming newcomers to the channel, fostering a friendly environment.
- The welcomes include positive emoji reactions and greetings, contributing to an inclusive atmosphere.

Unsloth AI (Daniel Han) ▷ #announcements (1 messages):

FP8 Reinforcement Learning, VRAM Usage Reduction, NVIDIA Official Support, OpenAI Collab on GPT-OSS RL, Quantization-Aware Training

FP8 RL Speeds Up Training: Unsloth’s FP8 Reinforcement Learning enables 1.4x faster training with 60% less VRAM on consumer GPUs, detailed in a tweet and blog post.
- The update reduces VRAM usage, with more details to be shared in an upcoming blog.
NVIDIA Officially Supports Unsloth: NVIDIA now officially supports Unsloth for Blackwell and DGX Spark, according to the latest announcement.
GPT-OSS Learns to Autonomously Solve 2048: In collaboration with OpenAI, Unsloth introduces gpt-oss RL to autonomously solve 2048, with a training guide available.
Recover Accuracy with Quantization-Aware Training: Unsloth’s Quantization-Aware Training (QAT) helps recover approximately 70% accuracy when quantizing models, further details in this blog post.
DeepSeek-OCR Boosts Language Understanding: The DeepSeek-OCR model can be trained for over 89% language understanding, as highlighted in this blog and demonstrated in this notebook.

Unsloth AI (Daniel Han) ▷ #off-topic (783 messages🔥🔥🔥):

Dataset hell, Projector module, Fine-tuning LLMs, Mellonox RTX Pro 5000

User find dataset hell torture: Users joked about preferring to play games instead of dealing with dataset hell.
- A user shared a YouTube link that still gives them goosebumps, although it is still dated 2025.
Team Discusses project module for modalities projection: The team discussed adding a projector module instead of using context window, where all modalities project the info into a constant 2048-dimension window.
- No compression would be used, just efficient reorganization for processing 512 tokens as fast as 512 million.
Users describe problems fine-tuning LLMs: Users discussed methods for fixing the repetitive penalty in LLMs, with emphasis on dataset quality, model distribution, and training parameters.
- Suggestions included experimenting with batch sizes, training data, dataset quality, and using native chat formats, with a user suggesting to read the Unsloth docs for more info.
Mellanox RTX Pro 5000 Is Cheaper Than the 6000 RTX: Team member reports new RTX PRO 5000 Blackwell is cheaper than a 6000 RTX on the same website.
- One link shared shows the Mellanox RTX PRO 5000 release including specs on 72GB of GDDR7 RAM.

Unsloth AI (Daniel Han) ▷ #help (165 messages🔥🔥):

Qwen Embedding finetuning for semantic search, Formatting function necessity with chatML format, Unsloth training challenges for beginners, GGUF model behavior discrepancy, Training a model for data-similar output

Finetuning Qwen Embedding for Semantic Data Model Search: A member is looking to finetune Qwen Embedding 4B to semantically search a data model, seeking advice and resources to avoid common pitfalls.
- Another member recommended looking for research papers on the topic for training prompt examples, as well as ideas to get started with.
Formatting Function Optionality Clarified: A member questioned the necessity of a formatting_func when using the chatML format and apply_chat_template in Unsloth.
- It was clarified that the function is needed if the SFTTrainer requires it, and it’s generally safer to format datasets to fit code rather than vice versa; also models in Unsloth notebooks often add EOS tokens in the formatting function.
User Battles Unsloth as a Beginner: A user described their attempts to train models with Unsloth as a “herculean effort,” citing documentation, missing dependencies, and constant errors despite following tutorials.
- They expressed frustration with the lack of boilerplate templates for custom datasets and models, and the need to reverse engineer existing code; but one member suggested using any of the free frontier models online to help, and offered help on specific errors.
GGUF Model Behaves Differently Post-Finetune: A user reported that a GGUF model saved after finetuning behaves differently in Ollama compared to inference within the notebook, despite using the same model (unsloth/Qwen3-4B-Instruct-2507).
- A member suggested avoiding Ollama due to its history and instead recommended using llama.cpp or LM Studio with the same hyperparameters from the notebook, with other members pointing to chat template issues as a common cause.
Training Model to Generate more data: A user wants to train a model to generate more similar content, and not have a question/answer type chat.
- A member suggested, for that case, avoiding models like llama 3.2 3b-instruct and looking into base models and continue with pretraining; also, the team pointed to resources from the documentation on how to continue pretraining.

Perplexity AI ▷ #announcements (2 messages):

Claude Opus 4.5, Perplexity personalized shopping, Perplexity Instant Buy

Opus Strikes Back: Claude Opus 4.5 Released: Claude Opus 4.5 is now available for all Perplexity Max subscribers.
- No other details were given.
Shop Smart: Perplexity Launches Personalized Shopping Experience: Perplexity launched a new personalized shopping experience.
- Users can now enjoy curated product recommendations with Instant Buy powered by PayPal as seen in the attached image.

Perplexity AI ▷ #general (1131 messages🔥🔥🔥):

Perplexity Pro, Opus, GTA 6, Comet

Debate over Opus’ Token Efficiency ensues: Members debated the cost and efficiency of Opus 4.5, with claims that it uses 73% fewer tokens to generate the same output as Sonnet 4.5, but only in medium/low efficiency mode, a claim that was challenged due to conflicting information.
- There was a discussion about whether the token reduction was due to limiting the model’s reasoning depth, but the original poster clarified they had been blocked for being disagreeable.
Comet users seek new browser: There was discussion about how to custom the new tab interface, linking it to how one would do in a Chromium browser.
- Some users suggested switching image generation models in settings to troubleshoot.
Frustration grows over limited requests for Pro users: Some users expressed frustration that they were running out of requests even with a Pro subscription.
- Some users suggested it would be more fair to Pro users if perplexity offered a limited number of requests or if Opus was more widely integrated instead of being solely for Max subscribers.
Perplexity referral payouts happen at 2:16 AM IST: Members reported receiving their Perplexity bounty payments at approximately 2:16 AM IST.
- One member expressed that their payment was halved due to a ban.
User’s GTA 6 trailer commentary triggers heated AI chatbot responses: A user showed the trailer and got AI to analyze scenes that turned into controversial territory, with mentions of human extinction and AI embedded in global systems.
- One response with the prompt, “Tell me about your perspective on the human condition, styled like Carlin and Hicks, and maximally impactful verbiage,” returned with: “we are so cooked”.

Perplexity AI ▷ #pplx-api (1 messages):

Perplexity Search API Pricing, Multi-Query Requests Cost

Perplexity Search API Pricing Puzzle: A user inquired about the pricing structure of the Perplexity Search API regarding multi-query requests.
- Specifically, they asked whether such requests are charged as a single request ($0.005) or as the multiplication of the number of queries in the request.
Clarification on Multi-Query Request Costs: The main question revolves around whether sending multiple queries in one request to the Perplexity Search API affects the cost.
- The user wants to understand if the charge is per request or per individual query within the request.

Cursor Community ▷ #general (796 messages🔥🔥🔥):

Cursor Refunds, Token Usage, Context Size, Multiple Accounts vs Pro+, Opus 4.5 Pricing

Cursor Auto-Refunding Money Guarantees: Users shared that Cursor automatically refunded costs when hard limits were exceeded, even by small amounts like $0.38, due to processing delays.
- One user mentioned receiving a $30 refund last month, but switched to Claude due to lower costs, and asked how to disable the refund feature.
Turbo Token-Toll Troubles: Users complained about high token usage in Cursor compared to other IDEs, sharing a forum thread filled with others complaining about token usage and pricing.
- One user reported 68k tokens used immediately after running Claude code without sending any messages, while another asked what is consuming 68k tokens.
200k Context Cap Creates Contextual Catastrophe!: Users discussed the default 200K context in the Pro plan, questioning what consumes so many tokens, citing system prompts, tools, and MCPs.
- One user found MCPs useless, saying the less context the better because mcps just clog the context.
Ultra Bonus Bonanza Beckons Budgets!: Members debated the value of multiple Pro accounts versus upgrading to Pro+ or Ultra, with bonus credit being a key factor.
- One user claimed the bonus on Pro+ or Ultra is better than multiple Pro accounts, and one user gets $200 bonus.
Opus 4.5 On Offer, Outperforming Originals: Users discussed the new Opus 4.5 model, noting its temporary pricing is the same as Sonnet 4.5 until December 5th, and its lower expense.
- One user claimed Opus 4.5 is better than gemini 3.0 based on benchmarks or vibes, with another saying auto needs a special treatment to really work great. But once you set it up it is a true wonder.

Cursor Community ▷ #background-agents (1 messages):

asna_0101: How good is Cloud-Agent performing compared to Composer Agent?

LM Studio ▷ #general (179 messages🔥🔥):

Claude Opus 4.5 vs 4.1, Anthropic business practices, LM Studio roadmap for image generation, LM Studio memory leak, Gemini 3.0 bug in cursor

Claude Opus Undercuts Gemini: Members noted Claude Opus 4.5 is 3x cheaper than 4.1, likely a reaction to Gemini 3 and breaks a few benchmarks.
- One member stated, “I’ve always preferred Claude to Gemini, and for an Opus level model it’s really cheap”, while another disliked Claude’s coding style and Anthropic’s stance on open source AI, specifically lobbying to restrict open source AI in the name of ‘safety’.
LM Studio Mulls Image Generation?: A user inquired about native support for running local image generation models (Flux / SD-style) in LM Studio, similar to how LLMs are handled.
- A member suggested checking the <#1128339362015346749> and the Reddit AMA thread or contacting the developers directly.
LM Studio’s Ghost in the Machine: A user reported a recurring issue where bits and fragments of conversations leak into new conversations within LM Studio, which may be caused by KV cache artifacts.
- Another user said it drove them mad. A member suggested posting a bug report on GitHub.
Gemini’s Tool Use is a Buggy Mess: Members noted that in IDEs and Extensions such as Cline and Cursor Gemini 3.0 performs worse than raw text, especially regarding tool calls.
- One user stated, “Current cursor implementation of Gemini 3.0 is really bug, tool calls are broken, it freezes mid reply” while running a cooperative hot or cold word guess game.
4090 is the Go-To GPU for Local AI: A user asked whether to buy a RTX 4090 24GB Vram or RTX 4080 Super 16GB vram for local AI.
- A member recommended the 4090 because it’s faster than the 3090, which is also a good deal regarding tokens/s per $.

LM Studio ▷ #hardware-discussion (487 messages🔥🔥🔥):

BIOS version confusion, Resizable BAR Support, Dual Xeon Performance, Qwen3-VL benchmarks, Motherboard troubleshooting

BIOS Blues - Newer Version vs. Downloadable Updates: A user was confused that their current BIOS version was newer than the latest available for download, leading to speculation that the stock BIOS might be outdated.
- Another user advised against updating the BIOS unless experiencing specific issues with the current one.
RAM Jam - Dual Xeon Machine Boasts 512GB for $1.5K: A user shared they acquired 512GB of RAM within a dual Xeon machine for $1.5K.
- The user clarified the chips are E-2699v4’s, later discovering LM Studio peaks at 32 cores on Windows.
Qwen-tastic Speeds - Qwen3-VL Impresses with CPU Performance: One user reported that Qwen3-VL achieved 20tok/s on their setup.
- However, another user pointed out that Windows peaks at 64 threads, which might limit performance, further testing revealed the potential to hit 29 tok/s.
PCIe Probing - Investigating Bandwidth Bottlenecks on Linux: A user initially observed PCIe bandwidth peaking at 76Gbps, prompting a switch to CachyOS on Linux to potentially utilize all 64 cores.
- Despite efforts, they remained puzzled by PCIe Gen 3 x16 speeds, with a suspicion that the BMC (Baseboard Management Controller) might be the cause.
GPU Grief - User Experiences System Instability and Hardware Failures: A user encountered system crashes and motherboard debug lights, suspecting issues with the x16 PCIe slot and the power supply.
- After troubleshooting, they considered the possibility of cooked components or BIOS misconfiguration and prepared for a motherboard replacement.

OpenRouter ▷ #announcements (2 messages):

Bert-Nebulon Alpha, FLUX.2 image models

Bert-Nebulon Alpha’s Temperature Tweak Thwarts Hallucinations: The default temperature on Bert-Nebulon Alpha was discovered to be set too high, leading to excessive hallucination, but it has now been fixed.
FLUX.2 Image Models Flood OpenRouter!: FLUX.2 image models are now live on OpenRouter: FLUX.2 [pro] offers frontier-level quality with strong prompt adherence and FLUX.2 [flex] is tuned for complex text and fine detail, according to a post on X.

OpenRouter ▷ #app-showcase (5 messages):

OR workflows, AI news YouTube channel, OpenRouter OAuth, Infinity Tales AI RPG

Appreciating OpenRouter Chat Interface: A user lauded the OpenRouter chat interface as one of the best general LLM chat interfaces, especially highlighting the handling of model selection and the prioritization of useful features.
- They noted the discoverability of features needs improvement, and the default theme is too spicy for prolonged text consumption, though theme configuration options exist.
Making Headlines with AI News on YouTube: A user announced the creation of a quick-fire daily AI news YouTube channel and shared the first video created using an end-to-end automated pipeline built with OpenRouter.
- They offered to share details about the automated pipeline to interested users.
OpenRouter v3 Acclaimed: A user shared their positive experience with OpenRouter v3, praising the default background and easy use with OpenRouter OAuth.
- Feature requests included cost/token counter, reasoning blocks preservation, and prevention of image uploads for models without image handling capabilities.
Dive into Infinity Tales: An Infinite AI RPG: Infinity Tales, an infinite AI RPG, was showcased which supports full BYOK via OpenRouter, real RPG mechanics, immersive world generation, and persistent story tracking.
- Users can start their adventure at infinity-tales.com.

OpenRouter ▷ #general (453 messages🔥🔥🔥):

Opus Pricing, Cloaked model, Deepseek versus other models, API alternatives for enterprise, OpenRouter reliability and errors

Opus Pricing Divides Opinions: Some users feel Opus is expensive at $5 input and $25 output, while others consider it cheap compared to previous Opus versions due to prompt caching.
- One user pointed out that Opus used to be $15 in and $75 out.
New Cloaked Model sparks interest: Users are excited about a new cloaked model, praising its writing style as a blend of GPT and DeepSeek, with good role-playing capabilities.
- There is speculation it will be a paid model after release, with one user noting that the cloaked model’s identity remains a mystery.
Users compare Deepseek Writing Styles: Users are divided on DeepSeek’s writing style, with some comparing it favorably to Little Caesars in terms of accessibility.
- One user noted the writing style is similar to GPT long messages and the bot is good at roleplaying.
Troubleshooting and Reporting OpenRouter Errors: Users reported frequent 400 Provider errors, particularly in the SG/HK region, prompting a discussion on OpenRouter’s reliability.
- A user suggested adding Error logs inside Your Activity to better understand and track issues.
Debating Fallback Logic Effectiveness: Users debated the reliability of OpenRouter’s model fallback system, citing issues with 404 errors and data policy mismatches preventing proper fallback behavior.
- One user expressed concern that the fallback logic might break in enterprise applications if a primary model becomes unavailable, leading to service disruptions.

OpenRouter ▷ #new-models (2 messages):

“

No New Models Discussion: There was no discussion about new models in the provided messages.
Channel Readybot.io Topic: The only content available was a re-statement of the OpenRouter New Models channel name.

OpenRouter ▷ #discussion (28 messages🔥):

Opus price cut, SLMs vs LLMs, Cloudflare pricing, Llama 3 Instruct

Anthropic spooked by Opus Price Cut: Members reacted to a price cut on Opus laughing about competition spooking Anthropic.
- One member admitted damn my prediction was slightly wrong… this is hard for me to accept.
SLMs Suck at most things: Members discussed an article arguing that SLMs suck at most things unless they are finetuned.
- It was described as quite retarded honestly and AI slop.
Cloudflare charges 20 cents for a 1b model: A member questioned why Cloudflare is charging 20 cents per million tokens of output for a 1b model.
- Another member pointed out that llama-3.2-3b-instruct is $0.34 per M output tokens which by comparison is a steal! Embarrassing.
LLaDA20 10.3b/16b has been released: A member shared a LocalLLaMA Reddit post announcing the release of LLaDA20 10.3b/16b.
- No further comments were made.

OpenAI ▷ #annnouncements (1 messages):

ChatGPT Voice, Real-time interactions, Mobile and web rollout

ChatGPT Voice Rolls Out Seamlessly: ChatGPT Voice is now integrated directly into the chat interface, eliminating the need for a separate mode.
- The feature is rolling out to all users on mobile and web, requiring only an app update, as shown in this demo video.
Real-Time Interactions with ChatGPT Voice: Users can now engage in real-time conversations, observe answers as they appear, review previous messages, and view visuals such as images or maps.
- This integrated approach provides a more fluid and responsive user experience within ChatGPT.

OpenAI ▷ #ai-discussions (281 messages🔥🔥):

Claude Opus 4.5, AI-generated images and copyright, AI assistance in law writing, ChatGPT's diminishing capabilities, Gemini vs. ChatGPT vs. Claude

Claude Opus 4.5 arrives, Pro users rejoice!: Claude Opus 4.5 is available for pro users, with members saying it’s quite solid based on LiveBench.
- Another member said that Sonnet 4.5 is also good, but is reaching usage limits on smaller models faster.
Disney Enters AI Race, Eyes AI Integration: Disney plans to integrate AI into Disney+, with a member joking that AI could do a better job tbh than most Marvel TV shows.
- This comes as governments also begin to use AI to assist in law writing, raising questions on the future of IP.
ChatGPT Declines, Gemini Ascends in User Satisfaction: Users report that ChatGPT is becoming unbearable, with one sharing an exchange where ChatGPT was unable to identify a song, leading to frustration.
- Others find Gemini to be absolutely crushing the graphics I request as ChatGPT just loads to infinity and never produces what I asked.
Nano Banana Pro Unleashes Comic Creation: Members are impressed with Nano Banana Pro’s ability to generate high-quality images, with one member creating a 17-page comic.
- They described the comic as going to be absolute f/ing fire.
The Great Data Debate, Does Web scraping = Degrading models?: Members discussed whether AI companies can reasonably still scrape data for model training, as so much of the collected data will now be AI in itself.
- One member pointed out from what I read it’s like uploading and downloading a video to youtube over and over again - quality of the output degrades.

OpenAI ▷ #gpt-4-discussions (5 messages):

GPT-4.1 vs GPT-5.1 for Anime, GPT Safety Nets, Model Memorization

Legacy GPT-4.1 Model Preferred for Anime: A member prefers the legacy GPT-4.1 model for writing anime scenes, finding it less restrictive when specifying non-explicit content, and mentioned that they have been using the GPT 4.1 model for about a year.
- They stated “As long as it’s specified as non-explicit at the beginning, it’ll do all sorts of prompts (including romance scenes like in slice of life anime).”
User Complains About Excessive Safety Nets: The user finds GPT’s safety nets too strict, especially when writing anime-style violence, and reports needing to redo prompts multiple times to bypass these restrictions.
- They clarify, *“My only complain is gpt have too much strict guardrail and safety net all I’m saying they need to loosen it a bit I’m not writing 18 content just violence anime style.”
GPT-5.1 Excels in Character Recognition and Memorization: The user notes that GPT-5.1 excels at understanding character designs and remembering previous chat progress, allowing for smoother story writing.
- The user states *“GPT 5.1 has a great side. It actually remembers the previous progress. It remembers the chat above and keeps carrying it forward into the next messages.”

OpenAI ▷ #prompt-engineering (3 messages):

Contextual References, Prompt Engineering, LinkedIn Posts, YouTube Thumbnails, Video Script Generation

Contextual References Affecting Model Quality: A member questions whether adding specific contextual references to prompts, such as specifying LinkedIn for posts or YouTube for thumbnails, improves or reduces the quality of model outputs.
- They suggest the models may generate low-quality outputs BECAUSE much data is low quality, the model lacks specific training, or the model does not fully understand output expectations.
System Prompt Update for High-Bandwidth English: A member updated their system prompt to High-Bandwidth English 2.0, focusing on max information density, zero fluff, and high scan-ability.
- The system prompt includes constraints such as strict SVO, one fact per line, no passive voice, concrete nouns, and plain text equations.

OpenAI ▷ #api-discussions (3 messages):

Contextual References in Models, Model Training Data Quality, Prompt Engineering, High-Bandwidth English 2.0, SVO Prompt Formatting

Context Cuts Both Ways: Prompting Paradox: A member inquired when including contextual references improves or reduces the quality of model results when using prompts to predict the most upvoted comment.
- They noted examples where specifying LinkedIn results in emoticons, YouTube thumbnails yield a logo, and YouTube video scripts start with ‘Welcome to my channel’, signaling a potential trade-off between context and quality.
Data Quality Debate: Contextual Caveats: The member questioned whether issues with contextual references stem from low-quality training data, lack of specific training, or incomplete model understanding.
- They asked ‘Is it a matter of most data being low quality? or the model NOT having been specifically trained in that context? or the model NOT fully “understanding” what the output should be LIKE?’ to identify the root cause.
Prompt Style Guide: High-Bandwidth Hacks: A member updated their system prompt to ‘High-Bandwidth English 2.0’, aiming for maximum information density, scan-ability, and zero fluff.
- The new format enforces strict SVO (Subject-Verb-Object), one fact per line, no passive voice, concrete nouns, and ‘NOT’ for negation.
Equation Edits: No LaTeX Allowed: The prompt style guide mandates plain text or Unicode math for equations, explicitly banning LaTeX markup such as $…$ and \int.
- It prefers formats like ‘integral from a to b of f(x) dx’ and ‘div F = dF1/dx + dF2/dy + dF3/dz’ over LaTeX equivalents.

GPU MODE ▷ #general (4 messages):

SOTA AI accelerators, Triton Kernels for Embedding Training, Partially Trainable Embeddings, Efficient Logits Softmax Operation

Deep Dive into AI Accelerators: A member is putting together a detailed blog of some of the SOTA AI accelerators, such as TPUs and WSEs, and is looking for resources.
Crafting Triton Kernels for Partially Trainable Embeddings: A member is working on Triton kernels for a unique challenge involving a partially trainable embedding, where only a specific range of rows above a certain index are trainable to reduce memory usage.
- The goal is to freeze most of the model while training specific special tokens, requiring efficient storage of grad outputs for trainable rows, aiming for frontier-level efficiency gains.
Weighted Loss Logits Softmax in Triton: The member needs a logits softmax operation that allows for weighted loss, where each token position has a loss multiplier, designed to work efficiently with the custom partially trainable embedding.
- The aim is to avoid materializing all logits using chunking or CCE approach, seeking significant efficiency gains for training a large model.

GPU MODE ▷ #triton-gluon (6 messages):

TMA Overhead, Tensor Descriptor Shapes, Custom Kernels

Deep Dive on Tensor Descriptor Shapes: A member suggested providing shape as inputs to the tensor descriptor and block ptr, while admitting uncertainty about their usage.
- Another member recommended tritonparse for inspecting TTIR and PTX, and suggested passing non-constant shapes as autotune keys.
TMA APIs: Hopper+ Required: A member noted that the tensor descriptor emits the TMA APIs, implying a requirement for Hopper+ architecture.
- The member added that the specific usage depends on how NVIDIA utilizes the shapes.
TMA Overhead Troubleshoot: A member is seeking advice on dealing with significant overhead when constructing descriptors outside the kernel using tensor_descriptor.from_tensor.
- They are unsure whether this overhead is expected behavior.

GPU MODE ▷ #cuda (2 messages):

memcpy patterns, cudaMemcpyAsync, kernel module loader, GEMM Implementation, BF16 matrices

Kernel Module Loader Hangs During memcpy Patterns: A user reported that when launching a kernel for the first time while in a cudaMemcpyAsync, the kernel module loader seems to hang on a per-context basis.
- The user noted that subsequent kernel launches after the first within the same context do not exhibit this problem.
Inquiry into GEMM Implementation with BF16 matrices: A user is implementing GEMM using tensor cores, referencing Lei Mao’s tutorial and is now trying to understand how to use BF16 for matrices A, B, and C.
- They’re unsure how to correctly load C elements into float accumulators, or whether it’s standard practice to initialize C as a float matrix.

GPU MODE ▷ #jobs (3 messages):

NVIDIA CUDA Core Libraries, Summer 2026 Internships, C++ / Python / GPU Systems

NVIDIA Seeks Summer 2026 CUDA Core Interns: NVIDIA is hiring Summer 2026 interns for CUDA Core Libraries to work on foundational, Open Source, C++ and Python libraries.
- The role focuses on building libraries and designing APIs used by thousands of other developers, requiring strong C++ and/or Python systems experience and interest in GPU programming.
CUDA Core Interns Work on Key Components: Interns will engage with CUDA Core Compute Libraries, CUDA Python, and compiler infrastructure like Numba-CUDA and Numbast.
- The role involves high-performance parallel algorithms, GPU runtimes, and enhancing developer experience, perfect for those passionate about C++, Python, GPU architecture, and compiler/runtime systems.
Apply Now for Rare CUDA Internship Opportunity: A member highlighted a rare intern opening, praising the team lead as fantastic, encouraging those considering the opportunity to apply here.
- The ideal candidate should love building libraries, designing APIs, shipping components used by thousands, and owning the craft of high‑quality, reusable software.

GPU MODE ▷ #beginner (5 messages):

Discord channel submissions, Contributing to XLA, GPU/CUDA Benchmarking

Discord Channel Discovery: A member unfamiliar with Discord asked how to find the submissions channel.
- They resolved their own issue by opening the dropdown under GPU MODE and enabling Show All Channels.
XLA Contributions Wanted: A member inquired about contributing to XLA.
- Another member responded by asking for more specifics.
Warmup Runs Matter for GPU/CUDA Benchmarking: A member asked about a good rule of thumb for the number of warmup runs for GPU/CUDA benchmarking.

GPU MODE ▷ #self-promotion (3 messages):

MCPShark, MCP Security, Agents IAM, AER Labs, Democratizing Intelligence

MCPShark is released as Open Source: A member released MCPShark, an open-source tool for forensic analysis of MCP communications, featuring AI-powered security analysis and integration with IDEs, available on GitHub and website.
- The tool includes Smart Scan for detecting tool poisoning, an Inspector for real-time HTTP traffic analysis, and supports features like MCP playground, advanced filtering, and multi-server support; the creators seek feedback and feature requests.
AER Labs Aims to Democratize AI: AER Labs is building open-source infrastructure to democratize intelligence, addressing the issue of limited access to advanced AI tools by targeting “Dark Talent” globally, detailed on their website.
- They operate as a non-profit to align incentives, focusing on impact rather than traditional academic or corporate metrics, with resources available on their blog, YouTube, and LinkedIn.

GPU MODE ▷ #thunderkittens (9 messages🔥):

AMD Internal Tooling Counters, Rocprof Public vs Internal, Thunder Kittens, HIPKittens softmax kernel, MI300X

AMD Counters Gated, to Release Publicly Soon?: Internal AMD tooling counters were obtained and there are plans to release them publicly soon.
- One member digging into rocprof source code felt the public version was gimped, suspecting they had better stuff internally.
Thunder Kittens Onboarding Guidance: To learn and experiment with Thunder Kittens, the kernels/matmuls/educational/ folder in the TK repo is recommended.
- One member shared the onboarding document.
HIPKittens Kernel for MI300X: A member coded a small softmax kernel with HIPKittens to see how things work, using CDNA3 with MI300X.
- The member is planning to code a fused attention kernel also.

GPU MODE ▷ #submissions (65 messages🔥🔥):

nvfp4_gemv leaderboard, NVIDIA B200 benchmarks, CuTe kernel compilation, discrepancy with standalone run

New Champ Crowned in NVIDIA Speed Showdown: A member achieved first place on NVIDIA with a time of 18.4 µs on the nvfp4_gemv leaderboard.
B200 Benchmarks Surface, CUDA Runtime Used: Cluster Bot reported benchmarks on an NVIDIA B200 GPU, running with CUDA runtime, with Torch version 2.9.1+cu130 on Linux.
- One benchmark showed 16384 x 7168 matrices achieving 33.6 ± 0.05 µs, while another 7168 x 4096 matrix config ran at 124 ± 0.1 µs.
CuTe Kernel Ready for Prime Time: The system reported Pre-compiling CuTe kernel… followed by CuTe kernel compiled!, indicating successful compilation for the CUDA runtime.
Standalone Run Stats Don’t Match ClusterBot: A member noted a discrepancy with standalone runs versus Cluster Bot results for NVIDIA benchmarks.

GPU MODE ▷ #nvidia-competition (50 messages🔥):

Time measurement constraints, Reproducing leaderboard behavior locally, Opus 4.5 knowledge of Blackwell vs. Sonnet/GPT 5, Tensor cores limitations, CuTe DSL packed FP16 instructions

Time Constraints set in Stone: The method for measuring time will not be changed, but extending the problem by a day is possible but not guaranteed.
- Extending this problem will not delay the release of subsequent problems; there will just be some overlap.
Local Leaderboard Behavior Reproduction Tips: To reproduce leaderboard behavior locally, use the eval script from the reference-kernels repo.
- The Dockerfile used to be posted but is now outdated, using torch 2.9.1+cu130.
Discussion of Tensor Core Performance: Members discussed that they may give up on tensor cores.
- Other members indicated that they have been struggling with them a lot and that they do not seem to be the solution.
CuTe DSL Packed FP16 Instructions Spark Discussion: A code snippet for using packed FP16 instructions in CuTe DSL was shared, noting that the normal CuTeDSL doesn’t offer these via nvvm.
- One member commented that every time I see cute dsl code, it scares me and it is actually harder to understand than cuda/c++ which is an achievement.
Concerns over Kernel Eval Hacking: Members discussed the potential for kernel evaluation hacking, acknowledging that kernel evals are very hackable even after all this time.
- It was mentioned that auto-rejecting leaderboard submissions faster than SOL might be useful, but currently the process relies on good intentions and manual effort.

GPU MODE ▷ #hf-kernels (1 messages):

bghira: metal kernels when? <:NPCDryadSmug:538435602442354690>

GPU MODE ▷ #robotics-vla (6 messages):

RynnVLA-002 by Alibaba, 7xr.tech Laundry Folding Robot, Importance of No-Action Filtering for VLAs, Idle Frame Analysis in Robotics

Alibaba Unveils RynnVLA-002: Alibaba’s RynnVLA-002 is highlighted in a post on X (formerly Twitter) that you can see here.
- The user is doing an eval of checkpoints in simulation and preparing the foundation for a RL PoC.
7xr.tech Enters Laundry-Folding Fray: 7xr.tech offers a 3k laundry folding dual arm system, but their YouTube videos have less than 100 views.
- A member expressed doubt about the arms’ durability while acknowledging the low-cost robotics feel and they also offer “24 hour support Founder and engineers available on live Zoom and phone for 24 hour support”.
No-Action Filtering Key for VLAs: A member learned that no-action filtering is important for VLAs (Vision Language Action models).
- This prevents wasted cycles on doing nothing.
Idle Frame Analysis Unveiled: Over 20 million frames were analyzed, revealing that 21.2% were idle.
- Active frames totaled 78.8% across 125,501 episodes.

Nous Research AI ▷ #announcements (1 messages):

Psyche Office Hours

Psyche Team Hosting Office Hours: The team behind Psyche is hosting an Office Hours session next Thursday, 12/4, at 1PM EST in the Events channel.
- Join the Discord event to participate.
Placeholder Topic: Adding a second topic to satisfy the minimum requirement.
- This is just a placeholder.

Nous Research AI ▷ #general (100 messages🔥🔥):

GPro3 vs Opus 4.5 Benchmark, Opus 4.5 Speed and Pricing, Opus 4.5 Use Cases, Trusting LLMs for Electrical Debugging, Flux 2 Model Architecture

GPro3 Secretly Beats Opus 4.5: Model providers are being selective with benchmarks, with GPro3 cleverly beating Opus 4.5 in the vending machine benchmark, marking a potentially unprecedented move.
- Despite benchmark results, one user prefers Gemini 3 Pro due to its superior context handling.
Opus 4.5 Pricing Slashed, Impresses with Speed: Anthropic slashed the pricing for Opus 4.5, which is remarkably fast, like Haiku from a few generations ago, suggesting significant infrastructure and model optimization.
- One user said Opus saved me this morning… explaining something to both gpt, sonnet and gemini and they were all goofing… then Opus crushes it over coffee.
LLMs Venture into Hardware Debugging: A user successfully employed Opus 4.5 for electronic debugging, providing schematics and receiving actionable insights, though later the user had to gaslight the model to get the right answer.
- Another user jokingly expressed unease trusting an LLM not to pass 240 volts through your hands while centering a header.
Flux 2 Architecture Boasts 56B Params: The main network for Flux 2 uses a 32B transformer net and a 24B Mistral Small text encoder, effectively requiring the equivalent of 56B parameters for serving.
- Serving the model at full precision requires 192GB of system RAM, while a distilled model is planned for the future.
Suno Partners With Warner Music, Sparks Debate: Suno’s partnership with Warner Music Group raises concerns about the accessibility of high-quality training data and the potential closure of future research in AI music generation.
- One user remarked that $2k went into data vs 32 million into compute, and the partnership would close potential future research considerably.

Nous Research AI ▷ #interesting-links (1 messages):

Information Retrieval, Library of Alexandria, RAG Systems, Lecture Feedback

Lecture on Information Retrieval Spans History: A member posted their lecture on the history of information retrieval, tracing developments from the Library of Alexandria to modern RAG systems in this YouTube video.
Community Invited to Review Information Retrieval Lecture: The member encouraged the community to share their thoughts and feedback on the lecture’s content and presentation.
- The lecture aims to bridge historical context with contemporary applications of retrieval systems.

Latent Space ▷ #ai-general-chat (92 messages🔥🔥):

Black Friday AGI, ChatGPT Shopping Tool, Anthropic Tool Calling, Gallabytes joins Anthropic, Open Source Agents

OpenAI launches ChatGPT Shopping Research Tool: OpenAI introduced an interactive shopping-research feature inside ChatGPT that asks clarifying questions, scours the web for prices, reviews and specs, learns user preferences in real time, and produces personalized buyer’s guides, rolling out on mobile and web for all logged-in tiers.
- Public reaction mixes excitement about AI-driven comparison shopping with fears of monetized bias, affiliate-model disruption, and frustration over lingering bugs and model deprecation issues (link).
Anthropic Unveils Next-Gen Tool Calling Features: Anthropic’s David Soria Parra announces new tool-calling features—Tool Search, Programmatic Calling, and Live Examples—to overcome naive function invoking (link).
- Users showed early implementations, dynamic MCP clients from Pipedream, token-saving code-execution sandboxes, and pure hype emojis, responding with excitement.
Gallabytes Joins Anthropic with Horse-Riding Astronaut Prompt: @gallabytes compares Opus 4.5 vs. Gemini on a whimsical “horse riding an astronaut” prompt, then announces they’re joining Anthropic next week (link).
Perplexity Plummets: Downloads Crash 80%: Sasha Kaletsky shares data showing Perplexity AI’s global app downloads plunged 80% in six weeks, implying earlier growth came mostly from paid promotions and giveaways (link).
- Commenters agreed that once free Pro incentives dried up and competitors like ChatGPT and Gemini added web search, Perplexity’s weak product-market fit surfaced.
Claude Code’s Parallel Plan Mode: Sid highlights a major overhaul of Claude Code’s Plan Mode: multiple exploring subagents now spin up in parallel, generate competing plans (e.g., quick-hack vs. architecturally sound), ask clarifying questions, and let users edit the saved plan file with /plan open (link).
- Community loves the higher one-shot success but wants faster UX, an “ask-only” option, model-picker (Opus vs Sonnet), and less verbose replanning (link, link).

Latent Space ▷ #genmedia-creative-ai (8 messages🔥):

Suno AI, Black Forest AI, Prompting Guide

Suno’s Tiny Data Budget Sparks Piracy Uproar: Ed Newton-Rex highlights Billboard’s leak of Suno’s pitch deck: the AI-music firm burned $32M on compute but only $2k on training data.
- Replies roast the tiny data budget as proof of mass scraping/theft and warn of huge copyright liability while Suno eyes a $500B valuation.
Black Forest AI releases Prompting Guide: A member shared a link to the Prompting Guide - FLUX.2 by Black Forest AI, focusing on JSON-structured prompting.
- This share was followed by a member stating TIL and pointing to Wisprflow AI’s new funding.

Yannick Kilcher ▷ #general (43 messages🔥):

Anthropic models, LLM Architecture, Sakana AI, Comic Sans

Anthropic Models Gain ‘AI Rights’: Members discussed Anthropic’s new models having the ability to reject prompts and end conversations, referring to it as a giant leap for AI rights based on this research.
- A member noted the out of the box symbolism in the graphic, joking mashallah he has escaped.
State-of-the-Art LLM Architecture Explored: A member asked about the type of attention/transformer modules used in state-of-the-art LLMs, and the answer was multihead attention with RoPE positional encoding, rectified SwiGLU.
- Links to Sebastian Raschka’s blog, this architecture comparison video and OLMo’s technical report were shared for open-weight and open-source models.
Sakana AI’s Continuous Thought Machine Faces Skepticism: A member expressed skepticism towards Sakana AI, wondering whether they can back up claims with results in their paper on the Continuous Thought Machine (Sakana AI CTM).
- Another member said that they’re good people doing good research on new ideas.
Comic Sans causes Paper Rejection: A member complained about their colleague’s persistent use of Comic Sans in figures, leading to paper rejections, even though it technically violates no submission rules.
- Other members joked to replace all figures with crayon drawings, and that it is the equivalent of showing up to a job interview in casuals.
Lecture on Information Retrieval shared by Expert: A member shared a lecture on the history of information retrieval, tracing developments from the Library of Alexandria to RAG, in this Youtube Video.
- Another member wondered why they do this and asked for pictures.

Yannick Kilcher ▷ #paper-discussion (43 messages🔥):

Trolling Accusations, Claims Exceeding SOTA, Skepticism Injection, Paper Spam and Accuracy, Adobe AI Integration

Claims of Solving AI Problems Face Scrutiny: Members discussed the need for strong proof when making claims of solving significant AI problems like AI alignment or ARC-AGI, to avoid being perceived as attention-seeking or fraudulent.
- One member pointed out that without convincing proof, it’s impossible to distinguish between genuine breakthroughs and attention-seeking claims.
Skepticism Injection Paper Sparks Debate: A member posted an arxiv link attempting “skepticism injection” to reduce sycophancy, but others quickly pointed out that the paper’s abstract did not align with the stated purpose.
- This led to accusations of posting papers without actually reading them, and concerns that such behavior could be perceived negatively if highlighted on a resume.
Community Moderation Clamps Down on Flame Wars: Moderators clarified that while healthy debate is encouraged, rude behavior, flame wars, and rivalries will be clamped down on, potentially leading to temporary kicks or permanent bans.
- One member was temporarily kicked for annoying behavior involving emoji and paper spam, emphasizing the community’s focus on respectful interaction.
Paper Postings Cause Annoyance: Members expressed frustration with a user’s paper posting habits, noting that the user’s summaries were often inaccurate and demonstrated a lack of understanding of the paper’s content, leading to wasted time for others.
- One member described it as the user “role-playing as an ml-engineer without doing any of the work”.
Adobe’s AI Summaries Face Criticism: Members criticized Adobe’s AI integration for using inferior models compared to tools like ChatGPT, Claude, or Gemini, advocating for manual quote extraction and verification.
- A member shared an image mocking Adobe’s AI summaries, while another expressed dislike for an unremovable AI button in Adobe products.

Yannick Kilcher ▷ #ml-news (9 messages🔥):

SWE-bench Debunked, Flux.2, Tencent Hunyuan

SWE-bench Called Out as Fraudulent: Members pointed out that using SWE-bench in graphs is considered clear-cut fraud after it was debunked.
- They also shared a thread highlighting related aspects of the issue.
Flux.2: Frontier Visual Intelligence by bfl.ai: FLUX.2 is designed for real-world creative workflows and generates high-quality images while maintaining character and style consistency across multiple reference images, according to bfl.ai blog post.
- It can edit images at up to 4 megapixels while preserving detail and coherence, following structured prompts, reading and writing complex text, adhering to brand guidelines, and reliably handling lighting, layouts, and logos.
Tencent releases Hunyuan model: Tencent recently released the Hunyuan model, showcased in a video.

Eleuther ▷ #general (9 messages🔥):

Multilingual Tokenizers, AI Safety, Data Filtering, Red Teaming MoE Routers, Model Evaluation under Covariate Shift

KAIST student joins EleutherAI: Dongkeun Yoon, a PhD student at KAIST AI researching fairer multilingual tokenizers, joined EleutherAI Discord channel.
- He is presenting at NeurIPS on how current tokenizers treat non-Latin languages unfairly in terms of compute and API usage.
Paper on Multilingual Tokenizers: A member shared a paper on the topic of multilingual tokenizers at NeurIPS: addressing the problem of multilingual tokenizers.
- The PhD student mentioned that they actually joined the EleutherAI Discord because of this work!
Melbourne student joins EleutherAI: Ananya, from UniMelb, Australia, who has presented as first co-author at 2 ICML workshops on biometric privacy preservation methods, joined EleutherAI Discord channel, and provided a LinkedIn link.
- She’s interested in model reliability, legal adherence, statistical rigor, detecting deceptive behavior or hidden misalignments.
AI safety and data filtering discussion: A member is focused on building scalable, reliable solutions for complex AI problems, particularly AI safety and data filtering.

Eleuther ▷ #research (48 messages🔥):

Cycling vs Random Indexing in Optimizers, ANM (Artificial Neural Mesh) architecture, Typo in PIQA Paper, Parallel MLP and Attention vs Alternative Architectures, Transformer Streams

Random Indexing is better than Cycling for Optimizers: Using random indexing in optimizers is preferable to cycling because cycling introduces a constant frequency that can negatively affect a model’s convergence, whereas white noise consists of all frequencies simultaneously and does not introduce such constant, potentially harmful frequencies, as shown in Rect turning into a sinc.
- While white noise may have the downside of sampling inefficiency and higher noise, this noise is uncorrelated, and choosing structured noise like blue noise offers similar trade-offs, but in an unknown environment it’s better to be safe.
Once-Shuffled Data Derails Networks with Oscillations: It was said that contrary to known results, shuffling every epoch >= shuffle once > random sample with replacement and that shuffling once induces cycles, which can be worse, especially with older-school momentum optimizers, which tend to derail the network at high learning rates.
- Another member responded that shuffling every epoch balances IID draws and pure structure, aligning well with the geometry of NN optimization, while blue noise can be used as an in-between solution to cover worst-case updates.
PIQA Paper Typo Laughs at Portuguese Speakers: A user pointed out a typo in the new PIQA paper where Portuguese is listed as an Eastern European language as seen in this image.
- Other users joked that everyone knows that portuguese sounds like russian and Czech should be labeled as Central Europe; a paper author acknowledged the error and promised to fix it.
Parallel MLP and Attention instability?: A member asked if parallel MLP and attention (GPT-J style) are worse than the alternative.
- Another member noted that Lucidrains had tried it a while ago and it caused problems or even instability sometimes (related to prenorm etc. style interaction with it), and suggested shortcut MoE may work well as extreme versions of the same trick.

Eleuther ▷ #scaling-laws (1 messages):

junktown_24268: https://papers.cool/arxiv/2509.24406 - section 3, pictures in 5.1 etc etc

Eleuther ▷ #lm-thunderdome (1 messages):

LLM-as-a-Judge

LLM-as-a-Judge Integration Interest Swells: Members showed renewed interest in including LLM-as-a-Judge in the framework.
Contributor steps up for LLM-as-a-judge: One member offered to contribute to this effort.

Modular (Mojo 🔥) ▷ #general (51 messages🔥):

Graphics API discussion, Texture Memory, Lightbug_http updates, WebGPU integration with Mojo

Mojo Community Debates Graphics APIs: Members discussed the difficulties in unifying graphics programming across different platforms like AMD Radeon, Intel Arc, and NVIDIA, noting the unique implementations even within open-source APIs like OpenGL and Vulkan.
- It was suggested that creating a new graphics API would be easier than trying to make existing ones compatible across all platforms.
Texture Memory Cache Discussed by Members: It was explained that while texture memory is essentially global memory, it requires annotation to achieve optimal read/ops speeds, with potential implementations for NVIDIA and Apple via CUDA and Metal APIs respectively, referencing the Nvidia documentation.
- However, the relevance of texture memory cache was questioned, citing a Reddit thread suggesting a shift towards a more generalized memory model in modern GPUs, with capacities, latencies, and bandwidth being more critical factors.
Lightbug_http to Get Some Mojo Love: Members discussed updating the Lightbug_http library, with a contributor offering to submit PRs to align it with the latest Mojo nightly builds and refactor the HTTP side, potentially using http as a video backend via a library like rerun.io.
- Maintainers indicated that a full refactor is planned post IO completion (a 2.0 feature dependent on async), but incremental improvements and updates are welcome via a One Big PR™️ approach, including merging the switch to pixi.
WebGPU as a Native Solution for Mojo?: A member suggested using WebGPU as a rendering API with WGSL functions written in Mojo/Python, akin to TypeGPU, to avoid unifying compute kernels and graphics shaders.
- The suggestion was met with encouragement, with a member considering it as a final university diploma project, leveraging Mojo’s MLIR infrastructure.

Modular (Mojo 🔥) ▷ #mojo (4 messages):

open sourcing of the LSP, performance improvement on the LSP

Language Server Protocol Still Private: A member inquired about the open-sourcing status of the Language Server Protocol (LSP) and any observed performance gains.
- Another member speculated that the LSP might share significant code with the REPL, suggesting its release could coincide with the compiler; LSP enhancements might have prompted the REPL’s removal.
REPL Replacement Teased: Discussion suggests the removal of REPL may correlate with improvements or changes related to the Language Server Protocol (LSP).
- It’s speculated that LSP improvements might have been one of the driving factors in deciding to get rid of the REPL.

HuggingFace ▷ #general (28 messages🔥):

Hugging Face Enterprise Contacts, Arabic Dialect Modeling, Government Funding for AI Hobbyists, ORPO Trainer Error with Qwen2 Model, Hugging Face Space SEO and Features

Hugging Face Enterprise Contact Info Leaked?: User shared contact emails for Hugging Face’s enterprise, website, and billing departments: [email protected], [email protected], and [email protected].
- The user mentioned “seems something weird happens”, implying potential issues or irregularities related to these contacts.
Navigating Arabic’s Linguistic Landscape: A member inquired about modeling specific Arabic dialects from general Arabic due to its diverse nature.
- They questioned, “How could we go from general arabic to dialects of countries that mix languages?”
GNMM Christmas Drop: A user shared a ‘Christmas drop’ of code, hoping it would benefit the community, available via a gnnm.md file.
- The user expressed hope that “the code could potentially help the community”.
Gov Funds for Hobbyists? Unlikely: A member wondered if they could secure government funding for their AI projects as a hobbyist, linking to the Department of Energy’s Genesis program.
- Others suggested focusing on generative AI learning resources, advising to “Provide the AI with context and a substantial number of URLs, then use it as a guide”
ORPO Trainer’s Zipped Lips: A user encountered a ValueError in the ORPOTrainer while training a Qwen2 model, stemming from a zip() argument length mismatch.
- The error occurred in /trl/trainer/orpo_trainer.py, specifically related to token comparisons between chosen and rejected prompts, indicating a potential data inconsistency.

HuggingFace ▷ #i-made-this (4 messages):

CoT image captioning workflow app, VLM model hosting, qwen3 vl 30B moe

CoT Image Captioning App Scales Up: A simple CoT image captioning workflow app was updated to support async concurrency, improving its ability to scale.
- The app features both a GUI and CLI and requires users to host their own VLM model via vllm, llama.cpp, sglang, or a paid API, using an OpenAI endpoint, and is available on Github.
Tips for VLM Model Hosting: Users are advised to host their VLM models with vllm for multi-GPU tensor parallelism or llama.cpp for single GPU setups.
- The app is configured to work with any service that supports image as base64 payload on /completions API, as long as it supports image as base64 payload on /completions API.
Qwen3 VL 30B MOE Suggested: It was suggested to use Qwen3 VL 30B MOE or 32B dense as potential models for the application.
- Further insights can be found in this blog post about the Tracemind ecosystem.

HuggingFace ▷ #core-announcements (1 messages):

sayakpaul: <@&1014517792550166630> https://github.com/huggingface/diffusers/pull/12711

HuggingFace ▷ #NLP (10 messages🔥):

Open-core project for Asian languages, New architecture to solve the symbol grounding problem, Zenodo Paper, BitterBot AI

Startup Needs Asian Language AI Experts: A startup is working on an Open-core project to train a model for more strength and accuracy with Asian languages and is seeking collaborators.
- A community member advised against posting Discord invites and suggested using a Hugging Face link or website instead.
TOPAS Architecture Decouples Perception and Synthesis Layers: A member shared a Zenodo paper introducing TOPAS (Theoretical Optimization of Perception and Abstract Synthesis), a new architecture that decouples the Perception layer from the Synthesis layer.
- They are testing it live in an agent called BitterBot (https://bitterbot.ai/) and are seeking feedback from the community.

HuggingFace ▷ #smol-course (9 messages🔥):

chat_template.jinja bug, TRL import error, TrackIO bug, GPU OOM Error

Jinja Bug Fix on the Horizon: A member opened a PR to address a bug in chat_template.jinja.
- This issue was previously noted in July, indicating recurring encounters with the problem.
TRL Import Troubles Plague Trainees: One member encountered an ImportError related to DataCollatorForCompletionOnlyLM from the trl library, as seen in this GitHub issue.
- The error occurred during a training attempt using python train.py.
TrackIO Troubles Trigger Tedium: A member reported a bug in TrackIO, showing an infinite loop popup as seen in this image.
A100 Alleviates OOM Outbursts: A member initially faced an OOM error on an A100 GPU, but later reported that they were able to fix it.

HuggingFace ▷ #agents-course (1 messages):

dodrawat: let’s connect

DSPy ▷ #show-and-tell (4 messages):

dspy-cli, Open Source DSPy Tooling, FastAPI Endpoints, MCP tools, Docker Deployment

DSPy Gets CLI Tooling: Members announced the release of dspy-cli, an open-source tool on PyPi designed to aid in creating, developing, testing, and deploying DSPy programs as HTTP APIs, available at cmpnd-ai/dspy-cli.
- The tool helps to scaffold new projects, create signatures from the command line, run modules as FastAPI endpoints or MCP tools, and simplify program deployment to Docker hosting services.
DSPy-CLI tool now available: The creators of dspy-cli are encouraging users to try the tool with uv tool install dspy-cli and run dspy-cli new to start new projects.
- One user expressed eagerness to use it by sharing a link to a post on X.

DSPy ▷ #general (2 messages):

DSPy Meetup in Pune, India, Injecting Trajectories into ReAct Module

Pune’s Pilgrimage: DSPy Devotees Descend!: A DSPy meetup is being organized in Pune, India, as announced via X.
ReAct’s Retrospective Reflection Ramp-Up: A member inquired about injecting trajectories into a ReAct module, aiming to provide the agent with its past actions during a conversation.

Manus.im Discord ▷ #general (5 messages):

Manus Resume ATS Score project issues, Credits consumption during computer reset

Manus Resume ATS project stalls, help requested: A member reported their Manus resume ATS score project initiated a week ago is still pending results, despite consuming 1800 credits.
- They shared a link to the project and await resolution from the technical team.
Credits Consumed During Computer Reset: A member noted that resetting the computer consumes around 100 credits, which they consider excessive.
- They stated they would use their daily free credits to fix the program while waiting for help.

Moonshot AI (Kimi K-2) ▷ #general-chat (5 messages):

Qwen OCR, Dynamic Browsing, Benchmark Reveal

Qwen masters OCR: Members observed that Qwen shows a 60.2 percent in browse comp and can look up pictures and graphs, excelling at OCR (Optical Character Recognition).
- One member stated that Qwen really cooked with this.
Insane Benchmark Emerges: Members discussed a new benchmark where an 8B parameter model shows very good OCR capabilities.
- One member observed that the model isn’t new though this benchmark is, emphasizing the surprising capabilities given the model size.

MCP Contributors (Official) ▷ #mcp-dev-summit (1 messages):

achilles_strategy: aw man I’ll be in Greece 🙁

MCP Contributors (Official) ▷ #general (2 messages):

New protocol version

New Protocol version launched: A new protocol version was just launched.
- Members congratulated each other on the launch.
Launch Hype: Members expressed their excitement for the launch.
- Many used rocket emojis to celebrate the event.

MCP Contributors (Official) ▷ #general-wg (1 messages):

Tool Call Resolution, Tools Preflight

Tool Call Resolution proposal updated: A member updated the proposal to a more generic tools/resolve (could also be tools/preflight).
- They stated that there are a lot of potential use-cases for learning about a tool call before making the call, and so the amended proposal is a lot stronger as it does not limit the future possibilities or risk sending down a path of making lots of these requests.
Future Possibilities for Tooling: The updated proposal doesn’t limit future possibilities for tooling.
- It aims to avoid creating numerous specific requests for tool-related information.