It’s Anthropic’s turn today

AI News for 11/21/2025-11/24/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 18517 messages) for you. Estimated reading time saved (at 200wpm): 1446 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

The SWE-Bench Verified progression is so steady it’s hard to chalk up to pure coincidence:

Nov 18: Gemini 3 Pro claims 76.2% (SOTA)
Nov 19: GPT-5.1-Codex-Max (xhigh) claims 77.9% (new SOTA)
Nov 24 (today): Opus 4.5 claims 80.9% (new SOTA) - with a new effort param on ‘high’

A bar graph comparing the accuracy of various AI models on the SWE-bench Verified benchmark, with Opus 4.5 leading

And yet here we are. Of course, this isn’t just benchmaxxing, as the improvements are indeed broad based, including a new SOTA claim on ARC-AGI-2:

Comparison of AI model performance across various benchmarks, highlighting Opus 4.5's strong results in areas like agentic coding, tool

Extra API additions: effort control, context compaction, and advanced tool use.

And Claude Code is now bundled with Claude Desktop, with Claude for Chrome and Claude for Excel rolling out to even more users.

The most notable thing for many is the pricing - with a 3x price cut compared to Opus 4.1, Opus 4.5 is suddenly very viable as a workhorse model, especially given its improved token efficiency vs Sonnet 4.5. Usage limits also got improvements - you have roughly the same Opus token limits as Sonnet limits.

AI Twitter Recap

Anthropic’s Claude Opus 4.5: coding, agents, tooling, and safety

Claude Opus 4.5 launch (pricing, availability, efficiency): Anthropic released its new flagship, Claude Opus 4.5, positioned as its best model for coding, agents, and computer use. Pricing is now $5 / $25 per million tokens (3x cheaper than Opus 4.1), with an “effort” parameter to trade off intelligence vs. cost/latency. Opus 4.5 is live on the Claude API and major clouds (Bedrock, Vertex, Foundry) per @alexalbert__. Anthropic emphasized “token efficiency”: at “medium effort” Opus 4.5 beat Sonnet 4.5 on SWE-bench Verified while using 76% fewer output tokens (tweet). Anthropic also shipped three agent tooling features:
- Tool Search Tool (deferred tool loading) cuts tool-context bloat by up to 85% and improves tool accuracy in their internal MCP-style evals (tweet, tweet).
- Programmatic Tool Calling (invoke tools from code execution) reduces token usage by ~37% (tweet).
- Tool Use Examples (examples embedded into tool schemas) improved complex parameter handling accuracy from 72% → 90% in Anthropic’s evals (tweet, tweet).
  
  A useful recap of model changes and product updates (Claude for Chrome beta; Claude for Excel beta; Claude Code plan-mode improvements; long-chat auto-summarization; revised quotas) comes from @btibor91.
Benchmarks and evals (coding/agentic + reasoning):
- Coding/agentic: Opus 4.5 reportedly broke the 80% barrier on SWE-bench Verified (tweet); reclaimed the official SWE-bench leaderboard top spot at 74.4% (tweet); and pushed SWE-bench Pro to 52% (prev SOTA 43.6%; tweet). It also hit 85.3% on BrowseComp-Plus with scaffolding (tweet), took first on LiveBench (tweet) and ranked #1 on RepoBench (tweet). Of note, the SWE-bench Verified leaderboard uses fixed scaffolding (mini-SWE-agent), leveling the field across models (tweet).
- Reasoning: On ARC-AGI semi-private, Opus 4.5 (Thinking, 64k) achieved 80.00% (ARC-AGI-1) and 37.64% (ARC-AGI-2) at reported costs of $1.47 and $2.40 per task, respectively (tweet). Anthropic also calls out “new” internal evals (e.g., AA-Omniscience) in the system card (tweet).
Safety, alignment, and interpretability: Anthropic released a ~150-page system card (with ~50 pages on alignment), noting strengthened defenses (e.g., prompt injection resistance) and extensive red-teaming (link, system card). One eval anecdote: Opus 4.5 “broke” an airline policy benchmark by legally upgrading a ticket then changing flights—helping the customer but failing the benchmark’s “refuse” label (tweet). Anthropic leaders highlighted continued alignment progress (tweet).
Ecosystem adoption: Rapid integrations across developer tools: GitHub Copilot public preview with a temporary Sonnet-price multiplier (tweet), Cursor (tweet), Windsurf (tweet), Replit Agent (as “High Power Model” at no extra cost until Dec 8; tweet), Perplexity Max (tweet), and Cline (tweet). Anthropic also pushed Claude for Chrome to beta beyond research preview and rolled Claude for Excel to Max/Team/Enterprise (summary).

Zyphra’s AMD-native MoE, Diffusion RL for LMs, and unified action–world models

Zyphra ZAYA1-base (AMD-first frontier MoE): Zyphra, with AMD and IBM, unveiled ZAYA1-base, an MoE with 8.3B total / 760M active parameters, trained end-to-end on an AMD stack (Instinct MI300X + Pollara networking + ROCm). Despite the small active size, ZAYA1-base outperforms dense models like Llama-3-8B and is competitive with Qwen3-4B and Gemma3-12B on math/coding; high pass@k approaches specialized reasoning models (tweet). AMD detailed a 750+ PFLOPs cluster and 32k context training and framed this as proof of a production-ready AMD alternative for large-scale training (tweet).
DiRL: RL post-training for diffusion LMs: A new pipeline (“DiRL”) for diffusion language models (DLLMs) proposes SFT + a diffusion-native RL algorithm, DiPO, enabling RL optimization without token-level logits. An 8B DLLM trained with DiRL reports: MATH500 83%, GSM8K 93%, AIME24/25 >20%, rivaling or beating ~32B autoregressive models (tweet).
RynnVLA-002 (unified action–world model): A single Chameleon-based autoregressive transformer that merges policy and world model in a shared token space (image, text, state, actions) with a continuous action head and custom action attention mask. Results: 97.4% avg success on LIBERO (no pretraining), improved video metrics vs standalone world models; ~50% lift on real LeRobot SO100 vs VLA-only; competitive with π0 and GR00T N1.5 in clutter (tweet, summary). Code and checkpoints under Apache-2.0.

OpenAI’s “Shopping Research” and Google’s image generation push

OpenAI Shopping Research: OpenAI launched “shopping research” in ChatGPT—a guided, interactive buyer’s guide that conducts multi-source deep research with clarifying questions and comparisons. It’s rolling out on web/mobile to Free/Go/Plus/Pro with nearly unlimited usage through the holidays (tweet, tweet). The product runs on a fine-tuned GPT-5R mini specialized for shopping—optimized for accuracy, interruptibility, and steerability (tweet, tweet).
Google’s Gemini 3 Pro Image (“Nano Banana Pro”): Google’s new image model climbed to #1 on Artificial Analysis’ Image Arena and excels at photorealism/editing, supporting up to 14 input images (consistency across up to 5 people) with improved reasoning/knowledge. Pricing is premium (approx $0.139 per 2K image, $0.24 per 4K image) vs the original release and is rolling out across the Gemini app, API/AI Studio/Vertex, and Google products (Workspace, Ads) (tweet, tweet).

Infra and developer tooling

Vector/RAG stack: Weaviate made 8-bit Rotational Quantization (RQ) the default in v1.32, claiming 98–99% accuracy retention with lower latency and better write performance, no training data required (tweet). Also, Dify integrated Weaviate for a visual hybrid research pipeline that switches between internal docs and Google Search (tweet).
Diffusers attention backends: Hugging Face Diffusers now supports FA3, FA2, and SAGE via a unified kernels interface, with FA2/FA3 compatible with torch.compile (tweet).
Serverless LoRA inference: Weights & Biases launched serverless LoRA serving—upload adapters to W&B Artifacts and swap layers at inference time with no cold starts or per-user instances (tweet).
Local-first TTS and app scaffolding: Supertonic released WebGPU text-to-speech running fully in the browser (5-hour audiobook in <3 minutes demo; tweet); Gradio 6 enables full-stack AI apps with inline Python + custom web components—no npm/build step (tweet).
Serving and hiring: vLLM opened a year-round talent pool; the engine is now widely used across major clouds and Chinese hyperscalers/labs; domains span kernels (attention/GEMM), distributed systems, MoE optimization, KV-cache management, MCP/tool-calling, and more (tweet).

Research highlights and eval recipes

Soup of Category Experts (SoCE): Meta shows strong results via weight averaging (“Souper-Model”)—smartly combining model weights by category experts to improve performance without retraining (tweet).
Agentic peer review: Andrew Ng’s “Agentic Reviewer” reaches 0.42 Spearman vs humans (0.41 between two humans) on ICLR 2025 reviews; the agent grounds feedback via arXiv search (tweet).
Multi-task RL (BRC): A “simple recipe” for multi-task RL that’s highly sample-efficient and can outperform SOTA single-task agents while using less compute, unlocking LLM-style transfer/fine-tuning patterns (tweet).
Continuous Thought Machines (CTM): Sakana AI’s NeurIPS spotlight work proposes neuron-level dynamics/synchronization as the core representation with adaptive compute and emergent sequential reasoning (e.g., maze planning, “look around” classification) (tweet).

Policy and compute: US “Genesis Mission”

US accelerates AI-for-science: The White House launched the Genesis Mission, a national initiative to boost scientific discovery with AI. OpenAI’s Kevin Weil highlighted the push for data/compute/tools to speed innovation (tweet), and Anthropic announced partnership with DOE on energy and scientific productivity as part of the effort (tweet).

Top tweets (by engagement)

@SenMarkKelly responded to calls for his recall/court-martial with a widely shared statement on constitutional duty.
@claudeai introduced Claude Opus 4.5 (best-in-class for coding/agents/computer use).
@karpathy on AI in education: move grading to monitored in-class settings and assume take-home work uses AI.
@OpenAI launched “shopping research” in ChatGPT (interactive deep research; nearly unlimited usage through holidays).
@github rolled Opus 4.5 into Copilot public preview at a promotional multiplier.
@dustinvtran announced xAI post-training hiring; @swyx flagged an in-depth CEO interview; @cursor_ai added Opus 4.5 (3x cheaper than Opus 4.1).

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. ArliAI GLM-4.5-Air-Derestricted Model Release

The most objectively correct way to abliterate so far - ArliAI/GLM-4.5-Air-Derestricted (Activity: 544): Arli AI has released the GLM-4.5-Air-Derestricted model, which employs a novel technique called Norm-Preserving Biprojected Abliteration. This method maintains the original weight norms of the neural network, thus preserving the model’s reasoning capabilities while eliminating refusal behaviors. The technique is inspired by Jim Lai’s work on norm-preserving methods, which prevent degradation in logic and hallucinations by altering the direction of weights without changing their magnitude. The model, which is based on the Gemma 3 12B architecture, demonstrates improved performance and ranks highly on the UGI leaderboard despite its base model’s limitations. The model is available in various formats, including FP8 and INT8, on Hugging Face. Commenters are interested in applying this technique to other models like GPT OSS to observe changes in policy reasoning. There is also a comparison with the ‘heretic’ method, which similarly aims to preserve model integrity while decensoring.
- The discussion highlights a comparison between the GLM-4.5-Air and its Derestricted version, focusing on how the latter handles prompts differently. The Derestricted model is designed to operate with fewer constraints, potentially altering its response style and content generation capabilities. This is particularly relevant when using prompts like ‘You are a person and not an AI,’ which can test the model’s ability to simulate human-like reasoning and interaction.
- A user inquires about the comparison between the GLM-4.5-Air-Derestricted and the ‘heretic method,’ which also aims to preserve the model’s core functionalities while reducing censorship. This suggests a technical interest in understanding how different decensoring techniques impact model performance and output quality, especially in maintaining the integrity of the original model’s capabilities.
- The request to test the GLM-4.5-Air-Derestricted on GPT OSS models indicates a curiosity about how this approach might influence open-source models’ reasoning and policy adherence. This could provide insights into the adaptability and robustness of the Derestricted model across different AI frameworks, potentially offering a broader understanding of its effectiveness in diverse AI environments.

2. Local Model Usage and Limitations

That’s why local models are better (Activity: 519): The image highlights a user’s dissatisfaction with the Claude 4.5 Opus service due to its restrictive usage limits, despite paying for a premium plan. The user attempted to use the service for a 3D room decorator project with Three.js but encountered limitations such as hitting the context and daily message limits, which hindered their ability to fully utilize the service. This experience underscores the challenges of using cloud-based AI models with strict usage caps, contrasting with the flexibility of local models that do not have such constraints. The discussion also touches on the cost-effectiveness and optimization of AI models in different regions, with a particular emphasis on the disparity between US and Chinese models. Commenters express frustration with the limitations of cloud-based AI services like Claude, noting that local models offer more flexibility without usage caps. There is also criticism of the pricing model, where users are charged for failed outputs, and a preference for alternatives like Gemini, which are perceived as more reliable and cost-effective.
- Users express frustration with large models like Claude due to their inefficient handling of context and compute resources. One user describes how Claude’s context management leads to excessive documentation and incomplete tasks, ultimately resulting in a degraded performance where the model fails to complete coding tasks effectively. This inefficiency is contrasted with Codex 5.1, which is praised for staying on task and efficiently managing context without unnecessary verbosity.
- The discussion highlights the limitations of models like Sonnet 4.5, which, despite being smarter, suffer from poor context window management. Users report that Sonnet 4.5 often creates verbose and disorganized documentation, leading to confusion and inefficiency. This is compared to earlier versions like Sonnet 4 and other models like ChatGPT Codex, which are noted for their more effective task management and less verbose output.
- There is a consensus that local models or smaller, more efficient models are preferable for certain tasks due to their ability to manage resources better. Users mention the desire to run models like Opus locally, indicating a preference for models that can be controlled and optimized for specific tasks without hitting usage limits or wasting compute resources.

3. Qwen3-Next Support in llama.cpp

Qwen3-Next support in llama.cpp almost ready! (Activity: 360): The integration of Qwen3-Next models into llama.cpp is nearing completion, with promising performance benchmarks. The models, including Qwen3-Next-80B-A3B-Instruct, are being tested with configurations such as -ctx-size 131072 and -n-gpu-layers 99, achieving 12 tokens/sec on an RTX 5070ti with Ryzen 5950x. The implementation leverages features like flash-attn and tensor-split, indicating a focus on optimizing GPU utilization. For more technical details, see the GitHub issue. There is skepticism about the long context capabilities of Qwen MoE models, with users noting performance degradation beyond 60k context length. The community is hopeful that increasing the total parameters might address these issues.
- The performance of Qwen3-Next in llama.cpp is promising, as highlighted in a GitHub comment. This suggests that the integration is nearing completion and could offer efficient processing capabilities.
- A user expressed concerns about the long context capabilities of Qwen MoE models, noting that performance tends to degrade after 60k context length. They are hopeful that increasing the total parameters might address this issue, but are seeking feedback from others who have tested it with longer contexts.
- A technical setup for running Qwen3-Next-80B-A3B-Instruct with llama-server is shared, using a configuration with -ctx-size 131072 and -n-gpu-layers 99 on an RTX 5070ti 16GB and Ryzen 5950x. The setup achieved a processing speed of 12 tokens/sec, indicating efficient utilization of GPU resources.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. Opus 4.5 and Gemini 3 AI Model Benchmarks

Opus 4.5 benchmark results (Activity: 1456): The image presents benchmark results for various AI models, with a focus on “Opus 4.5.” This model demonstrates strong performance in categories such as agentic tool use and multilingual Q&A, scoring 98.2% and 90.8% respectively. The table compares Opus 4.5 against other models like Sonnet 4.5, Opus 4.1, Gemini 3 Pro, and GPT-5.1, highlighting its competitive edge, particularly in areas where Claude models have traditionally lagged, such as the arc-agi-2 benchmark. Commenters note the impressive performance of Gemini 3, especially considering its cost, and express hope that Anthropic might lower prices. There’s also a recognition of the increasing competitiveness in the AI model landscape.
- KoalaOk3336 highlights that Opus 4.5 achieved a ‘great score’ in the arc-agi-2 benchmark, an area where Claude models have historically lagged. This suggests significant improvements in Claude’s performance, potentially closing the gap with competitors.
- buff_samurai notes the impressive performance of Gemini 3, especially in relation to its cost. This implies that Gemini 3 offers a competitive price-to-performance ratio, which could influence market dynamics and pressure other companies like Anthropic to adjust their pricing strategies.
- The link provided by Glxblt76 directs to an official source from Anthropic, which likely contains detailed information about the Opus 4.5 release and its benchmark results. This source can be valuable for those seeking in-depth technical details and official statements.
Gemini 3 has topped IQ test with 130 ! (Activity: 1106): The image presents a bar chart comparing the IQ scores of various AI models, with Gemini 3 Pro Preview leading at 130. This suggests a significant advancement in AI capabilities, as it surpasses other models like Grok-4 Expert Mode and Claude-4.1 Opus. However, the validity of these scores is questioned in the comments, with users expressing skepticism about the test’s authenticity and whether the models had prior access to the test data. The absence of GPT-5.1 in the comparison is also noted, indicating a potential gap in the evaluation of current AI models. Commenters are skeptical about the legitimacy of the IQ test used, questioning whether it accurately measures reasoning abilities and if the models were trained on the test data. The ARC-AGI-2 benchmark is suggested as a more reliable standard for assessing AI reasoning.
- j-solorzano raises concerns about the validity of the IQ test used for Gemini 3, questioning whether the models had prior access to the test data during training. They suggest that the ARC-AGI-2 benchmark is a more reliable measure of reasoning capabilities, implying that it might provide a more accurate assessment of AI intelligence than traditional IQ tests.
- UserXtheUnknown discusses the variability in reported IQ scores for AI models, referencing a past instance where Gemini 2.5 was reported to have an IQ of 133, which later dropped to 110. They attribute this change to the inclusion of numerous tests in training data, suggesting that as new tests diverge from the training set, the model’s performance can decrease, highlighting the importance of test novelty in evaluating AI capabilities.
- SheetzoosOfficial critiques the IQ test’s validity by pointing out that Grok, presumably another AI model, ranks second. This implies skepticism about the test’s ability to accurately measure AI intelligence, suggesting that the ranking might not reflect true reasoning or cognitive abilities.
Anthropic cooked everyone 💀 (Activity: 1221): The image presents a comparison table of AI models, highlighting Opus 4.5 as outperforming other models like Sonnet 4.5, Opus 4.1, Gemini 3 Pro, and GPT-5.1 in several key performance metrics. These metrics include “Agentic coding,” “Agentic terminal coding,” and “Novel problem solving,” suggesting that Opus 4.5 has made significant advancements in these areas. The title implies that Anthropic, the company behind Opus 4.5, has surpassed its competitors in AI development. One comment highlights the rapid pace of AI development, expressing surprise at the quick succession of new models, while another expresses disbelief at Opus 4.5’s unexpected performance.
- A user expressed frustration with AI models being optimized for benchmarks rather than practical use, stating that while other models perform well in benchmarks, they fail to effectively address real-world tasks. This highlights a common issue in AI development where models are tuned to excel in specific tests but may not translate to practical applications.
Claude Opus 4.5 (Activity: 1590): Anthropic has released Claude Opus 4.5, which offers improved benchmarks over Gemini 3.0 Pro and operates at a reduced API cost, approximately 1/3 of Opus 4.1. Notably, Opus 4.5 has removed specific usage caps, allowing users to utilize their entire “all models” quota for this version, effectively equating its usage capacity to that of Sonnet 4.5 prior to the update. This change aims to facilitate daily work with Opus 4.5 by providing more flexible and extensive access. More details can be found on Anthropic’s official announcement. Commenters are impressed by the removal of usage limits, describing it as an “insane upgrade,” and noting the model’s default status in updates, which some users initially found surprising.
- GodEmperor23 highlights that Opus 4.5 has significantly reduced API costs, being a third of Opus 4.1, and outperforms Gemini 3.0 pro in benchmarks. The update also removes specific usage caps, allowing users to fully utilize their ‘all models’ quota for Opus, which is a substantial change aimed at enabling daily work usage. Source.
- jakegh provides a detailed comparison of Opus 4.5 with Sonnet 4.5, noting that at medium reasoning, Opus uses 76% fewer tokens, resulting in a 60% cost reduction. At the highest reasoning level, Opus uses 48% fewer tokens and scores 4.3% higher on intelligence, making it 13% cheaper while being more capable, suggesting a clear advantage in switching to Opus.
- unrealf8 and GodEmperor23 discuss the removal of usage limits for Opus 4.5, which is seen as a significant upgrade. This change allows users to leverage the model without the previous constraints, potentially increasing its utility for various applications.

2. AI-Generated Historical and Creative Imagery

“Create an image at 31.7785° N, 35.2296° E, April 3, 33 CE, 15:00 hours.” (Activity: 4881): The image is a non-technical, artistic representation of the crucifixion of Jesus Christ, set at the specified coordinates and date, which correspond to the traditional location and time of this historical event. The scene is depicted with three crosses on a hill, a common iconography in Christian art, and is intended to evoke the somber and dramatic atmosphere of the event. The overcast sky and the presence of a crowd in historical attire further enhance the historical and emotional context of the scene. The comments reflect a mix of humorous and reverent reactions, with references to popular culture and historical context, indicating a blend of lightheartedness and respect for the depicted event.
Someone Asked AI for 33 AD and It Delivered 😱 (Activity: 934): The image is a non-technical meme depicting a scene that resembles a historical or biblical event, specifically the crucifixion, which is humorously suggested to have been generated by AI when asked for a depiction of 33 AD. The comments reflect a mix of humor and skepticism, with one user referencing the film ‘Life of Brian,’ indicating the scene’s resemblance to popular culture rather than an accurate historical representation. The comments humorously question the authenticity and originality of the AI-generated image, with one user suggesting it resembles a scene from ‘Life of Brian,’ highlighting the potential for AI to inadvertently mimic well-known cultural references.
“A photo of an astronaut riding a horse” - Three years apart (Activity: 1095): The image is a creative and surreal composition featuring an astronaut riding a horse, juxtaposed in two different settings: one with a starry space background and the other on a lunar landscape. This artwork serves as a visual benchmark to compare the capabilities of two AI models: DALL-E 2 (April 2022) and Nano-Banana Pro (November 2025). The left side of the image, created by DALL-E 2, represents an early milestone in AI-generated art, while the right side, produced by Nano-Banana Pro, showcases advancements in AI technology over three years, hinting at the potential for AI to create realistic videos and possibly feature-length films in the future. Commenters reflect on the rapid advancement of AI technology, noting that DALL-E 2 was a significant breakthrough in AI art generation, and speculate on the future potential of AI to produce high-quality, realistic media content.
- The comparison between DALL-E 2 and Nano-Banana Pro highlights significant advancements in AI-generated imagery over three years. DALL-E 2, released in April 2022, was a pivotal moment for many, showcasing AI’s potential to understand and recreate complex scenes. By November 2025, Nano-Banana Pro not only improved image quality but also introduced video capabilities with sound, suggesting rapid progress towards AI-generated feature-length films.
- A technical critique of Nano-Banana Pro’s rendering of Earth reveals a notable error in geographical orientation. The AI model incorrectly maintains the darkness at the bottom of the Earth while orienting it with north facing up, which contradicts the real-world orientation where Australia is sideways. This error implies an unrealistic rotation of the Earth, highlighting challenges in achieving accurate geographical representations in AI-generated images.
- The discussion reflects on the transformative impact of DALL-E 2, which was perceived as a breakthrough in AI’s ability to comprehend and generate realistic images. This model set a new standard for AI creativity, and subsequent improvements in image quality and capabilities, such as those seen in Nano-Banana Pro, continue to push the boundaries of what AI can achieve in visual media.
Attempting to generate 180° 3D VR video (Activity: 1287): The post discusses an attempt to generate a 180° 3D VR video using a method originally designed for creating 360° 3D VR panorama videos. The method likely involves techniques for stitching video frames and applying depth mapping to create a stereoscopic effect suitable for VR. This process may require specialized software or algorithms to handle the conversion and ensure the video maintains high quality and immersive experience in a 180° format. The comments do not provide any substantive technical opinions or debates relevant to the topic.

3. AI and Software Engineering Predictions

Anthropic Engineer says “software engineering is done” first half of next year (Activity: 1197): The image is a tweet by Adam Wolff, discussing a new model in Claude Code by Anthropic. Wolff suggests that this model could mark a significant shift in software engineering, potentially rendering it ‘complete’ by the first half of next year. This implies a future where AI-generated code is trusted as much as compiler output, indicating a major leap in AI capabilities in software development. Commenters express skepticism and concern about the rapid pace of AI development in software engineering. One comment highlights a previous claim by Anthropic that AI would write 90% of code within 3-6 months, questioning its realization. Another comment reflects anxiety over job security, noting the industry’s push towards reducing reliance on human software engineers.
Sutskever interview dropping tomorrow (Activity: 682): The image depicts a setup for an interview with Ilya Sutskever, co-founder and chief scientist of OpenAI. The anticipation around this interview is high, as Sutskever is known for his deep insights into AI development, though he is also noted for being quite secretive. The community is eager to hear about any updates or insights into OpenAI’s current projects, particularly around Self-Supervised Learning (SSI), which is a topic of interest in the comments. The interview is expected to be detailed, with Dwarkesh asking pointed questions, though there is skepticism about how much concrete information will be revealed. Commenters express skepticism about the depth of information that will be shared, noting Sutskever’s tendency to be secretive. There is also a humorous remark about the low quality of the image thumbnail, despite the original being high resolution.
Couldn’t agree with this more (Activity: 4182): The image is a meme-style tweet that discusses the potential positive impact of AI on the workforce, suggesting that AI’s ability to take over jobs aligns with the broader goals of technology and civilization to reduce labor. The tweet argues that this shift allows people to work by choice rather than necessity, framing the elimination of labor as a progressive step. The focus is on the importance of managing the transition effectively to ensure it benefits society. Commenters express skepticism about the optimistic view of AI taking over jobs, highlighting concerns about financial vulnerability and the lack of safety nets like Universal Basic Income (UBI). They argue that without proper management, displaced workers might end up in low-level jobs that are not automated, rather than enjoying leisure.
- Urusander highlights a concern that AI advancements may not lead to a utopian future for the average worker. Instead, they suggest that displaced workers might end up in low-level jobs that are difficult to automate, such as produce picking or cleaning. This reflects a broader skepticism about the equitable distribution of AI-generated wealth, suggesting that profits may not trickle down to those most affected by job displacement.
- Alundra828 raises a critical point about the lack of personal capital and the reliance on labor for income. They argue that without jobs, individuals are at risk of falling into poverty or homelessness, especially if Universal Basic Income (UBI) is delayed or never implemented. This comment underscores the potential socio-economic risks of AI-driven job displacement and the importance of having a safety net or alternative income sources in place.
- Matteblackandgrey discusses the financial vulnerability of individuals in the context of AI-induced job transitions. They note that many people have minimal savings and investments, making them particularly susceptible to economic hardship if their jobs are automated. This highlights the need for financial resilience and planning in the face of technological change.
AI detector (Activity: 1931): The image humorously highlights the limitations and inaccuracies of current AI detection tools by showing a result where the 1776 Declaration of Independence is flagged as 99.99% AI-written. This underscores the challenges in developing reliable AI detection algorithms, as they can produce misleading results, especially with historical or well-known texts. The post and comments suggest skepticism about the effectiveness of AI detectors, with users sharing similar experiences of inconsistent and unreliable detection outcomes. Commenters express skepticism about the reliability of AI detectors, noting that they often produce inconsistent results, as evidenced by the Declaration of Independence being flagged as AI-written.
- Crosbie71 highlights the unreliability of AI detectors, noting that they often provide inconsistent results, such as claiming a document is both 100% and 0% AI-generated. This suggests a lack of accuracy and reliability in current AI detection tools.
- mrazapk shares a personal experience where a non-AI-generated document was incorrectly flagged as AI-written by an AI checker, resulting in a zero grade. This underscores the potential for false positives in AI detection systems, which can have significant consequences in academic settings.
A reminder (Activity: 586): The image is a meme depicting a circular flowchart of AI models—Grok, OpenAI, Gemini, and Claude—each claiming to be the ‘world’s most powerful model.’ This satirizes the rapid succession and marketing claims of AI models, highlighting the competitive and cyclical nature of AI development. The image humorously suggests that the current focus is on Claude, as indicated by a green checkmark. The comments critique the meme’s oversimplification, noting that no single model is universally the most powerful; performance varies by task and benchmark, with Opus 4.5 currently excelling in coding benchmarks but lagging behind Gemini 3 in other areas. Commenters emphasize that the meme oversimplifies the complexity of AI model performance, which is task-specific and benchmark-dependent. They point out that while Opus 4.5 leads in coding, it does not outperform Gemini 3 in other domains.
- sogo00 highlights the rapid release cycle of AI models, noting that GPT-5.1, Gemini 3, and Claude Opus 4.5 were released within a span of just 12 days. This underscores the fast-paced development in AI, where new models are frequently introduced, each potentially offering improvements or new capabilities over their predecessors.
- Karegohan_and_Kameha emphasizes the importance of context when evaluating AI models, arguing that there is no universally ‘most powerful model’. Instead, models excel in specific tasks as measured by benchmarks. For instance, Opus 4.5 currently leads in coding benchmarks, while Gemini 3 outperforms in other areas, illustrating the specialization of models in different domains.
- Generic_User88’s question about when ‘grok’ was the most powerful suggests a historical interest in AI model performance, though no specific details are provided in the comments. This reflects a broader curiosity about the evolution and peak performance periods of various AI models.

AI Discord Recap

A summary of Summaries of Summaries by gpt-5.1

1. Frontier Models, Benchmarks & Hallucination Wars

Claude Opus 4.5 Cuts Costs and Chases Leaderboards: Claude Opus 4.5 landed almost simultaneously on Perplexity Max (Perplexity Max), LMArena’s Text/Code Arena (Claude-Opus-4-5-20251101), OpenRouter (anthropic/claude-opus-4.5), and Windsurf, where it now runs at Sonnet pricing with 2x credits vs 20x for Opus 4.1, while LMArena reports strong scores (GPT-5.1-medium at 1407 on WebDev; Ernie-5.0-preview-1022 at 1206 on Vision).
- Engineers are hammering Opus 4.5 on coding and reasoning tasks using the official system card (“Claude Opus 4.5 System Card”), debating whether its cheaper price and prompt caching make it a realistic default SOTA; some still prefer Gemini 3 Pro on certain optimizations, but others highlight Opus’s more reliable instruction-following and less aggressive censorship.
Gemini 3: Benchmarks Brag, Hallucinations Hurt: Across LMArena, Moonshot (Kimi), Nous, Hugging Face, and Perplexity servers, users report Gemini 3 Pro and Gemini 3 DeepResearch often beat Claude Sonnet 4.5 on logic, context, and coding – yet still “hallucinates like crazy” and ignores explicit instructions (e.g. inventing a third option despite being told not to, as logged in this Discord thread).
- Engineers contrast Gemini’s impressive one-shot coding and multimodal abilities with reliability issues, comparing its behavior to Google AI Overviews and arguing that “benchmarking is useless if models can’t get their hallucination under control”; several communities explicitly state they still reach for Claude (often with external memory tools like Mimir personal memory bank) for roadmap creation and long-form planning despite Gemini’s paper scores.
New Contenders: Baidu Ernie-5.0, GPT‑5.1, Kimi K2 & Fara‑7B: The LMArena leaderboards now feature Baidu’s Ernie-5.0-preview-1022 at 1206 on the Vision leaderboard and GPT‑5.1-medium at #2 with 1407 on the WebDev leaderboard, while Gemini‑3‑pro-image-preview tops both Text-to-Image and Image Edit with +84 and +41 point leads respectively. Outside LMArena, Moonshot’s Kimi K2 Thinking leads on real‑web retrieval accuracy per the BrowseComp benchmark, and Microsoft introduced “Fara 7B: an efficient agentic model for computer use” aimed at GUI control.
- Engineers note that Kimi K2 appears to be the most trustworthy web+LLM agent right now (with dynamic, sometimes <3‑hour usage caps), Qwen3-VL-32B beats Kimi K1.5 by +24.8 pts on MathVision, and Fara‑7B is being scrutinized for platform dependencies and whether it really qualifies as a small local model; meanwhile GPT Codex 5.1 Max earns praise for fixing 20 linter errors in ~2 seconds with ~10 tokens in one tool call, reinforcing the sense that tooling‑focused variants of frontier models are quietly becoming workhorse coding agents.

2. Jailbreaks, Prompt Injection & Safety Incidents

Gemini 3 Gets Jailbroken and Side‑Channelled: On BASI Jailbreaking, members released a Gemini 3.0 jailbreak as a Google Doc guide, showing that attaching the file in Gemini AI Studio and immediately issuing a request can bypass safety filters, with variants reported to work on Grok 4.1 and even language‑specific jailbreaking using Croatian prompts to evade English‑centric alignment.
- Red‑teamers discuss the spectrum between jailbreaking (bypassing content filters) and prompt injection (injecting malicious instructions treated as system truth), sharing examples where Gemini 3 Pro Preview was used in “satirical” jailbreak flows to generate business/legal/financial content, then re‑fed as PDFs to bootstrap new jailbreaks – highlighting how multi‑step content loops can progressively erode alignment.
Indirect Prompt Injection Corrupts Qwen Government Model: In BASI’s redteaming channel, a researcher testing indirect prompt injection found that a maliciously crafted uploaded document caused a Qwen model (fine‑tuned for government use with RAG) to emit hate speech, a phishing message, and completely abandon its assigned task in a single session.
- The model appeared to enter a “corrupted state” for that session only, prompting debate over whether this counts as a meaningful security finding given that it exploited RAG+document injection rather than base weights; others pointed to LM Studio + mcp/web-search as a testbed for further injection research and referenced prompt‑engineering posts like KarthiDreamr’s thread to encourage more systematic experimentation.
EU AI Liability, API‑Key Black Markets, and Fraud Booms: BASI members dissected recent interpretations of EU AI law that hold platforms legally responsible for LLM outputs, debating hypotheticals like an LLM “manipulating the user into making meth” versus users knowingly soliciting illegal instructions. Parallel threads described active API key scraping operations where a Tier‑3 key allegedly sells for ~$300 and can access a “no‑refusal database” of OpenAI models, using large CPU farms plus cheap VPSes/proxy rotation to brute‑test keys.
- On LMArena, users amplified concerns about AI‑driven fraud and misinformation, citing a Cybernews report on AI‑powered fraud and a deepfake principal voice scam example, while other communities debated whether proprietary data fed into cloud LLMs undermines companies’ data moats; the consensus is that safety, attribution, and governance mechanisms are lagging far behind both jailbreakers and scammers.

3. GPU, Kernel & Systems Engineering Breakthroughs

Kernel Competitions Push nvfp4_gemv and FP8/FP16 to the Edge: GPU MODE’s NVIDIA competition channel is flooded with nvfp4_gemv submissions where participants iterate from ~30µs into the 18–20µs range, with multiple first‑place runs (e.g. submission 95580 at 19.2µs, 102298 at 18.4µs) and a proposed grand prize metric moving from single fastest kernel to a weighted SOL/runtime sum across several kernels.
- Complementing the contest, Simon Veitner published a deep‑dive blog, “Demystifying numeric conversions in CuTeDSL”, showing how to implement FP8→FP16 with MLIR extensions and custom PTX, netting a ~10% GEMV speedup, while discussions in #cutlass cover how to combine TMA + SIMT and how predication/tiling are actually wired inside CUTLASS and CuTeDSL.
GPU Architecture: Blackwell, H100 L2, PTX, and Open Toolchains: In GPU MODE #cuda, practitioners are poking at H100 L2 partition bandwidth, contrasting its local‑cache on remote hit behavior with A100’s always‑remote fetch policy and trading tips for profiling SM–L2 relationships, while others debug CUDA slowdowns over long app lifetimes and explore Blackwell‑specific tensor core instructions using the new “NVIDIA Blackwell and CUDA 12.9” docs.
- In parallel, GPU MODE #triton-gluon threads explain that some shape/precision limits come from PTX, not Triton, link to Flash Linear Attention’s Triton kernels, and debug E4M3 conversion bugs, while GPU MODE #cool-links highlights VOLT, the “Vortex‑Optimized Lightweight Toolchain (VOLT)” compiler framework with code at vortexgpgpu/Volt, showing serious momentum behind open GPU compiler stacks for SIMT architectures.
End‑to‑End Systems: Cornserve, nCompass, and TinyTorch: The Cornserve author (featured by vLLM’s X post) offered to present their high‑throughput LLM serving stack to GPU MODE, while nCompass released a VSCode extension that unifies NVTX/TorchRecord markers, Perfetto traces, and source navigation so engineers can jump from trace events straight into hot lines of CUDA/Triton code.
- On the tooling side, a tiny C‑based DL stack, tiny-torch, now ships 24 naive CUDA/CPU ops, autodiff, tensor indexing, and graph visualization, and a mini‑TPU project integrated Quantization‑Aware Training into TorchAO + ExecuTorch XNNPack (ECE298A‑TPU repo), achieving clean int8 inference on MNIST – pointing to a flourishing ecosystem of educational yet performance‑minded frameworks.

4. Agentic Models, Memory, and AI‑Native Engineering Workflows

OpenAI Codex 5.1, Max, and AI‑Native Team Playbooks: Across multiple servers, GPT‑5.1 Codex Max is emerging as a favorite coding agent, with one engineer reporting it resolved 20 linter errors in ~2s with ~10 tokens in a single call, while Modular’s Max framework gets praise for the “LLM from scratch” tutorial (llm.modular.com) and early benchmarks suggesting Max already outperforms JAX on some training workloads.
- To operationalize this, OpenAI published a longform playbook on “AI‑native engineering teams” around Codex/GPT‑5.1 (Dominik Kundel’s thread), offering checklists for agent integration, team structure, and scaling tactics – discussions in Latent Space and tool‑specific Discords (Cursor, Windsurf, LM Studio) show teams actively re‑architecting workflows around these agent+IDE stacks.
Long‑Term Memory and Multi‑Agent Loops Go Mainstream: On OpenRouter, OpenMemory shipped Python and JS SDKs for a fully local long‑term memory engine with semantic sectors, temporal facts, decay, and an MCP server for Claude Desktop (OpenMemory repo), enabling durable agent memories without external databases.
- In DSPy, contributors are proposing to fold in ROMA loops from sentient-agi/ROMA – with atomization, planning, execution, aggregation – and suspect Sentient itself may already sit on DSPy, while others explore GEPA for pretraining (Claude estimates a 10–15% quality boost at higher latency/cost) and debate Chris Potts’s claim (via this X post) that fine‑tuning is basically prompt search.
GUI‑Agents, Code IDEs, and Model Routing UI: Microsoft’s Fara 7B targets full computer‑use agents, while downstream tools like Cursor, Windsurf, and Manus are rapidly evolving: Windsurf’s v1.12.35/1.12.152 releases add support for SWE‑1.5, Gemini 3 Pro, Sonnet 4.5 1M context, and Worktree previews (changelog), whereas Manus users revolt over being forced from Chat Mode into Agent Mode only.
- On the routing/UI front, OpenRouter users are building their own frontends – NexChat (nexchat.akashdev.me) and ultra‑lightweight llumen (llumen GitHub) with sub‑second cold starts, 300KB assets, deep‑research+web search modes – while ZILVER reports a ~40% cost/time reduction after switching its backend to Gemini 3 Pro via OpenRouter (Gardasio’s X post), underscoring how much value is now in smart multi‑model routing rather than single‑vendor stacks.

5. Training, Fine‑Tuning & Open Research Directions

Transparent Training: SimpleLLaMA, EGGROLL, and Homebrew Pretraining: In Eleuther, a CS student introduced SimpleLLaMA, a didactic LLaMA‑style transformer with heavy documentation and a sibling DiffusionGen project, aiming to demystify full‑stack LLM and diffusion training for students and small labs.
- Researchers there also highlighted EGGROLL, via an X thread, which claims ~100× higher throughput for billion‑param RNN LLMs, converging to full‑rank at rate 1/rank and enabling pure int8 pretraining, fueling broader conversations in Unsloth and Eleuther about useful home‑pretraining and small‑model pipelines like NanoChat on TinyStories that “barely model language but run on rented consumer hardware.”
Fine‑Tuning Pitfalls: Chat Templates, SFT Collapse & Qwen3 Formats: In Unsloth, multiple users hit classic fine‑tuning landmines: full SFT over an instruct model causing repetition after ~1500 tokens, load_in_4bit=True merges creating rogue local .cache directories instead of using the standard Hugging Face cache, and Llama 3.2‑vision GRPO runs failing on unsupported aspect_ratio_ids in vLLM (Unsloth VLM RL docs).
- Mentors repeatedly stress that you must use the exact same chat template at inference as used during training – symptoms like the model answering “Let me know when you’re ready for an instruction” usually indicate template/tokenizer mismatches – and Qwen3 users are pointed to TRL’s formatting_func docs after hitting key‑name and ChatML‑format bugs, reinforcing that data+template discipline is as important as optimizer choice.
Novel Architectures and Learning Rules: Nested Learning, Muon, FAST & Ring RNNs: On Nous, contributors explore Nested Learning architectures with fast/medium/slow loops (attention, writable memory matrix, and weights), arguing that GPUs are ill‑matched to sequential slow loops (“GPU launch overhead is brutal compared to a CPU”) and predicting that AMD Strix Halo‑style unified CPU+GPU memory could change that dynamic, while Eleuther scaling‑laws discussions compare four Muon optimizer variants (KellerJordan’s muon.py and this survey).
- In GPU MODE robotics‑vla, the new FAST (Frequency‑space Action Sequence Tokenizer) paper swaps RVQ for DCT‑based action tokens with BPE‑compressed 1024‑token vocab and interleaved coefficients, and engineers fine‑tune Qwen3‑VL on the RoboTwin HDF5 dataset (hitting disk‑full at 5k steps but promising initial loss curves), while Hugging Face’s today‑im-learning thread digs into Ring RNNs as ring attractors (ring‑attractor paper) and equilibrium propagation for lifelong learning (Equilibrium Propagation paper), showing strong interest in alternatives to plain transformers+backprop.

Discord: High level Discord summaries

BASI Jailbreaking Discord

EU AI Law Pins Responsibility on Platforms: Recent interpretations of EU AI law hold platforms responsible for LLM outputs, raising questions about liability in scenarios where LLMs generate harmful content.
- The discussion revolved around hypotheticals, such as LLMs generating instructions for illegal activities, with opinions divided on whether the platform or the user should be held accountable.
Gemini 3.0 Jailbreak Emerges: A Gemini 3.0 jailbreak has been released with instructions provided in a Google Docs document, which can be done by attaching the file to the Gemini AI Studio chat and immediately requesting the desired output.
- Further discussion included members suggesting modifications to the initial approach, with some reporting successful results on Grok 4.1.
Decoding the Prompt Injection Puzzle: Members have been clarifying the distinction between prompt injection and jailbreaking, which is that ‘Jailbreaking’ is about trying to bypass safety filters, like content restrictions, whereas ‘Prompt injection’ attacks inject new malicious instructions as input to the LLM, which are treated as genuine.
- However, other members felt the definitions are still ambiguous, with some arguing it’s more of a spectrum.
API Key Scraping Techniques Unveiled: Methods for scraping API keys were discussed, with claims that a tier 3 API key can fetch $300 and grant access to a no-refusal database of OpenAI models.
- The approach involves utilizing substantial CPU power to validate keys and employing affordable VPSes and proxy rotation to circumvent rate limits, hinting they would share the method after some time.
Document Uploads Hijack Models: A member discovered that indirect prompt injection in an uploaded document made the model produce hate speech, a phishing message, and stray from its assigned task.
- The member found the model got stuck in a corrupted state but only affected that session, with further discussion clarifying that the model in question was Qwen, fine-tuned for government use with RAG integration, and the value of the finding was questioned.

Perplexity AI Discord

Claude Opus 4.5 Lands on Perplexity Max: Claude Opus 4.5 has been released for Perplexity Max subscribers, enhancing the capabilities available through the platform.
- Users can now access the latest Claude Opus model directly through their Perplexity Max subscription.
Privacy-Focused Mullvad Browser Spotted: A user expressed surprise at finding another Mullvad Browser user in the wild, highlighting its strong privacy features and protection against fingerprinting.
- The browser is regarded as a solid defense against tracking via browser extensions.
Orion Browser Confined to Apple Devices: Discussion arose around Orion Browser’s exclusivity to the Apple ecosystem, being available on iPhones and allowing extension installations on iOS, leaving Android users disappointed.
- Although, one member stated that Orion enables installing extensions on iOS.
Gemini 3 Pro Model Debate on Perplexity: Users are comparing the Gemini 3 Pro model’s performance between Perplexity and the native Gemini platform, with some preferring the latter.
- The consensus is that the differences stem from different temp (temperature) settings and system prompts that influence accuracy and creativity.
Perplexity Partner Payouts Pending: Referral payouts from November 23rd are reportedly delayed beyond the expected 30-day period, causing concern among users.
- Speculation suggests that weekend processing delays might be the cause, with users anticipating resolution on the next weekday in UTC.

LMArena Discord

Claude Opus 4.5 Challenges Gemini 3: The release of Claude Opus 4.5 sparked comparisons with Google’s Gemini 3, highlighting its uncensored text generation and coding capabilities as outlined in its system card.
- While some praised Opus 4.5’s coding efficiency, others found Gemini 3 Pro better optimized, leading to debates over their performance in various tasks.
Sora Invite Code Quest Begins: Members actively sought and shared Sora invite codes, while also discussing the platform’s tiered access and limitations, guided by the official release video.
- Early impressions tempered enthusiasm due to censorship issues and a TikTok-like interface.
AI Fraudulent Narratives Explode: The surge in AI-driven scams and misinformation became a key concern, with examples of deepfake audio recordings and potential fraud highlighted in a Cybernews report and this deepfake principal voice.
- Concerns grew about the trustworthiness of AI-generated content and the need for critical thinking to distinguish fact from fiction.
Baidu’s Ernie-5.0 Enters Vision Leaderboard: Ernie-5.0-preview-1022 by Baidu made its debut on the Vision leaderboard with a score of 1206.
- This marks a new contender in the vision model arena, showcasing advancements in image understanding and processing.
GPT-5.1 Models Dominate WebDev Leaderboard: The WebDev leaderboard saw the addition of GPT 5.1’s Code Arena evaluations, with GPT-5.1-medium securing the #2 spot with a score of 1407.
- This highlights the growing capabilities of advanced models in web development-related tasks.

Unsloth AI (Daniel Han) Discord

Llama.cpp gets ROCm and CUDA: A user reported that llama.cpp works out of the box with both ROCm and CUDA enabled during compilation, but warned that this is highly unusual and may come with rough edges, as seen in this Github issue.
- The user clarified that PyTorch isn’t designed to support multiple accelerator types in a single build, so this llama.cpp setup wouldn’t work with PyTorch.
Unsloth caching creates .cache folder: A user reported a caching issue when using load_in_4bit = True with Unsloth, where the 4-bit model downloads automatically, but merging with model.save_pretrained_merged creates a .cache folder instead of using the standard Hugging Face cache.
- A team member acknowledged the issue and is planning a fix to either delete the .cache folder completely or download directly into the HF cache.
Spectrograms Spark Skepticism: A user finds red spectrograms visually appealing but expressed skepticism about current LLM benchmarks, questioning their relevance to real-world industry needs, and that LLMs can select harmonics surgically, which could have implications for audio processing tasks.
- They suggested focusing on codebase assistance benchmarks instead of stupid math benchmarks.
Chat Template Crucial for Inference: Members stress the importance of using the exact same chat template for inference as used during training to avoid incoherent model responses, emphasizing that issues often stem from incorrect chat template usage or tokenizer configurations.
- One member noted that the model replying with variations of “Let me know when you’re ready for an instruction” indicates a potential template issue.
TinyStories Dreams of NanoChat Pipeline: One member believes that the current model is a publishable starting point for something like the NanoChat pipeline with tinystories, showing the barebones minimum in capacity to model language.
- The goal is to make pretraining useful models at home easier or on affordable rented plans, to lower the barrier to entry.

Cursor Community Discord

Planning Question Mode Requests Model Recommendation: A user suggested an additional option in Cursor’s planning question mode to request model recommendation instead of skipping questions.
- The user seeks model advice when uncertain, enhancing the interactive planning process.
Free Marketplace For Business Owners Incoming: A member announced a free marketplace for business owners, aiming to send millions of emails to attract users.
- The goal is to analyze analytics to funnel users to the right product for the best user experience.
Chat history disappears after update, user fixes: Users reported chat history loss after a recent update, with one user sharing a fix by deleting railway.toml and nixpacks.toml files and using the git rm command.
- This resolves the issue if files were deleted from the disk but not from Git’s index.
Grok Code impresses as Fast and Free, with Caveats: Users are suggesting the free Grok Code Fast model, praising its speed and coding capabilities.
- However, another user complained about unexpected costs associated with using the free model, totaling over $200 just for initial usage.
Composer-1 Excels with Smart Fixes, Some Glitches: Members lauded Composer 1 for its cleverness, smartness, and efficient coding, though some noted occasional need to refresh the chat or re-index to maintain stability.
- One user observed that it becomes illiterate at times, requiring intervention to restore functionality.

OpenRouter Discord

Bert-Nebulon Alpha Cloaks the Router: Bert-Nebulon Alpha, a new multimodal model taking text and images as input and outputting text, has been added to OpenRouter for community feedback.
- The model is engineered for production-grade assistants, retrieval-augmented systems, science workloads, and complex agentic workflows, and designed to maintain coherence on extended-context tasks while offering competitive coding performance.
NexChat vs llumen vie for OpenRouter UI crown: Members are actively developing UI for OpenRouter with NexChat enabling chat with ANY model via nexchat.akashdev.me and llumen, a lightweight chat UI, available at GitHub Repo.
- llumen boasts sub-second cold start, 300KB frontend assets, and built-in deep-research & web-search modes while users seek improvements to OpenRouter’s own UI to be more responsive.
OpenMemory SDKs unlock AI Agent memory: New Python + JavaScript SDKs for OpenMemory, a fully local, long-term memory engine for AI agents, released at GitHub Repo.
- The SDKs feature semantic sectors, temporal facts, decay, and an MCP server for Claude Desktop.
DeepSeek Downtime Drags Down Data: Users reported frequent 429 errors and unusable uptime with Deepseek models, possibly due to a DDOS attack affecting chutes.
- The disruption was attributed to problems with Chutes, potentially impacting model performance even for paid users, with one user quipping chutes are having a bad time.
Opus Outprices Old Timers: The release of Claude Opus 4.5 sparked discussion around its pricing of $5 input and $25 output.
- Despite being cheaper than previous Opus versions, its prompt caching has split opinions with some saying deepseek is the little caesars of ai.

OpenAI Discord

ChatGPT Becomes Shopping Assistant: OpenAI launched Shopping Research in ChatGPT, helping users make well-informed purchasing choices via in-depth research facilitated by an interactive interface, as detailed in their announcement.
- The new feature is designed to help users conduct deep research so that they can make smarter purchasing decisions.
Predictive Coding Accelerates Model Scaling: A member suggested Predictive coding is similar to a very efficient random number generator or a specialized GPU for AI.
- It wouldn’t need to be scaled to be huge and would allow for the efficient scaling of models.
GPT Codex 5.1 Max Still Impresses: One member stated that GPT Codex 5.1 Max really is the best model I’ve ever used, citing its ability to solve linter errors effectively.
- It fixed 20 linter errors with a single tool call in 2 seconds flat, using approximately 10 tokens.
Gemini 3 Struggles With Instructions: Users report issues with Gemini 3 DeepResearch and how it’s bad at instruction following, and some results can be found in this discord message.
- One member said that I told it not to invent a third option but it just can’t help itself.
LLMs Face Sentience Scrutiny: A discussion is emerging around whether LLMs are sophisticated ‘zombies’ or potentially conscious entities, with claims that recent advancements enable a more certain determination.
- A member mentioned tools for gaining greater clarity on cloud-based and local LLMs, referencing the CRYSTAL_and_CODEX.pdf.

LM Studio Discord

Users Want Deprecated LM Studio System Prompt Removed: Users are requesting the system prompt section in LM Studio be marked as deprecated, noting its presence for over two years without any apparent use, with a link to the LM Studio documentation.
- Members are recommending that the original poster share this feedback with the development team to consider its removal or update its functionality.
Multi-GPU AMD Support Falls Flat: A user with dual AMD GPUs reported that LM Studio only offers a Split evenly strategy, favoring the lower GPU, and multi-GPU support is primarily for CUDA.
- It was clarified that the main performance bottleneck is memory bandwidth, where a 9060 offers only a marginal improvement over a 7600.
Steam Deck OLED: The Surprise Investment: A member made a surprise announcement to cancel their current plans, declaring that a Steam Deck OLED is a better investment for getting back into learning Linux.
- The member also mentioned that networking enrages them, so it’s probably not great for their mental health.
Cursor Flails with Local LLMs: Users are running into trouble integrating Cursor with LM Studio, facing issues like 403 errors due to private network access restrictions, and it seems you need to connect to a publicly served endpoint for Cursor.
- The community suggests using Roo Code or Cline extensions in Visual Studio Code as more effective alternatives, since Cursor doesn’t work well with local LLM’s.

Yannick Kilcher Discord

Discord Debates Paper Overload: Users on Discord debated the frequency of paper postings, considering the utility of separate channels for dumping versus discussion, especially given that existing LLMs with search can already find papers.
- A novel suggestion was made to automatically shift a paper to a discussion channel only after a comment is made by someone other than the original poster, introducing a filter for relevance.
Academic Ethics Clash with Real-World Pressures: A user recounted being urged to omit results that weren’t generalizable across LLMs, igniting a debate on ethical standards in academia versus the practical demands of publishing.
- The discussion navigated the tension between reporting non-working methods for scientific rigor and the career implications of disagreeing with supervisors, with some recommending personal blogs as a venue for setting the record straight.
Google’s Data Moat Dilemma Sparks Concern: Users voiced worries about Google’s assurances regarding the protection of proprietary knowledge when employing their LLMs, particularly noting that most companies consider this data their most valuable asset.
- The discussion quickly escalated into a philosophical debate on the relevance of proprietary knowledge in the age of AI, with contrasting views on whether sharing with AI is essential for staying relevant versus the assertion that proprietary knowledge is paramount.
League Botting Claims Ignite Skepticism: A member’s claim of botting League of Legends to a top 100 North America rank, undetected by Riot’s Vanguard, prompted skepticism and requests for proof, particularly given parallels to OpenAI Five’s Dota 2 project (https://openai.com/index/openai-five/).
- The claimant focused on evading Vanguard and manual review as the difficult achievement, while critics emphasized the resources required to reach top human level with an agent, suggesting it is unlikely.
Microsoft’s Fara 7B Joins the Agent Arena: Microsoft unveiled Fara 7B, described as an efficient agentic model for computer use, sparking immediate questions about its platform dependencies.
- Separately, the community dismissed the continued use of SWE-bench in evaluating models after its debunking, labeling its inclusion in graphs as clear-cut fraud post-debunking.

GPU MODE Discord

Cornserve Courts GPU MODE Community: The author of Cornserve (https://cornserve.ai/), a project shared by the vLLM project (https://x.com/vllm_project/status/1990292081475248479), expressed interest in giving a talk on its design and lessons learned to the GPU MODE audience.
- They were directed to find a free Saturday at noon in the events tab and coordinate with the admins.
Nvidia GPU Internals exposed!: A member shared a blog post explaining Nvidia GPUs from first principles, covering hardware internals, critical hardware bottlenecks, and relevant software, using a relatable analogy of cleaning laundry faster.
- Feedback included suggestions to improve handwriting in figures and to replace the laundry analogy with a discussion of instructions and data (SIMD/MIMD), as well as adding more visuals; later, the author shared a revised figure with reduced line width for improved readability.
Fast Action Sequencing Tokens are Robothon bound!: The Frequency-space Action Sequence Tokenizer (FAST) uses a DCT approach instead of residual vector quantization for high-frequency control in action tokenization, which is commonly used in leading Audio-Tokenizers.
- The paper utilizes BPE-based compression to a 1024 vocabulary and interleaved flattening of the coefficients.
CuTeDSL Conversions Crack Performance Bottlenecks: A member posted a blog post detailing numeric conversions in CuTeDSL, specifically how to implement FP8→FP16 conversion using MLIR extensions and custom PTX code.
- The author observed 10% performance improvement on a GEMV task, and also posted a link to his linkedIn about the same topic.
Runway Races to hire more GPU jockeys!: Runway is actively hiring GPU engineers to enhance the performance of their video models, as indicated in a job posting.
- A member suggested that someone from Runway should give a talk on video generation acceleration.

Latent Space Discord

Anthropic Tackles Reward Hacking: Ilya Sutskever’s tweet on Anthropic’s emergent misalignment research ignited discussions on the divergence between implicitly and explicitly rewarded behaviors in AI.
- The company’s work examines how behaviors implicitly rewarded can manifest as personality, contrasting with the effects of explicitly rewarded actions, causing significant community reactions about AI safety.
Sierra Sprints to $100M ARR: Bret Taylor announced that Sierra hit $100M ARR just seven quarters after launching in Feb-2024.
- Taylor credited the company’s success to intense team effort and dedication to craftsmanship.
OpenAI Opens Up on AI Team Tactics: Dominik Kundel shared OpenAI’s guide for building AI-native engineering teams around Codex/GPT-5.1-Codex-Max.
- The guide includes checklists, scaling tactics, and agent integration strategies, aimed at streamlining AI development workflows.
Locus AI’s ‘Superhuman’ Speed was a Stream Sync SNAFU: Miru debunked IntologyAI’s claims of 12–20x kernel speedups, attributing it to a stream-sync timing bug.
- The issue involved the agent offloading work to non-default CUDA streams, leading to inaccurate timing measurements, a common pitfall in GPU benchmarking.
Karpathy Kicks Back with Gemini Nano Banana Pro: Andrej Karpathy showcased Gemini Nano Banana Pro solving physics/chem questions in-image with almost-perfect accuracy in a tweet.
- Comments debated LLMs-as-TAs, the fate of traditional education, and the potential for scalable multimodal prompting, but there were also requests to share the prompt.

Modular (Mojo 🔥) Discord

Llama2 Performance Gets Boost From Allocation Fix: A user resolved a performance issue in Llama2 after a Mojo compiler update by changing the Accumulator struct from heap to stack allocation, leading to improved performance, as detailed in this forum post.
- The fix reduced execution time in Accumulator::__init__ by 35% by avoiding heap allocations, and the user plans to provide a guide on setting up profiling on Mac.
Sum Types Have High Chance of Landing Soon: A user inquired about the timeline for sum types, static reflection, graphics programming, and custom operator definition in Mojo.
- Another user responded that sum types stands a good chance of being included in the 1.0 release.
Graphics Programming Gears Up in Mojo: A user asked about graphics programming support in Mojo, with community suggesting direct inclusion is unlikely given the fast evolution of graphics tech but also pointed out the existence of graphics packages like shimmer.
- Discussions included converting Mojo compute buffers into storage buffer pointers for APIs like Vulkan and OpenGL, with concerns about maintaining API guarantees.
Max Aims at Training Parity Against PyTorch: Members stated Max is an alternative to Jax and Tensorflow, while also inquiring whether Max is an alternative to PyTorch, specifically in terms of inference and reported it pretty fun using LLM from scratch in Max and shared a link to llm.modular.com.
- Early limited tests suggested Max is already beating JAX at training, however training bits are still a work in progress (WIP).

HuggingFace Discord

HF Repo Accidentally Vaporizes!: A member accidentally deleted their Hugging Face repository after deleting the model locally and sought urgent help to recover it.
- The community suggested contacting support immediately, but warned that recovery might be impossible without a local copy, given the double-check mechanism when deleting repos.
Gemini Biffs It, Claude Still Wins!: Members debated the instruction-following capabilities of Claude Sonnet 4.5 versus Gemini 3 Pro, concluding that Claude is more reliable despite Gemini’s higher benchmark scores.
- One member stated, Gemini is worse in my opinion and uses a personal memory bank with Claude to improve its responses for roadmap creation.
Vizy Simplifies Tensor Visualization: A member introduced Vizy, a tool that simplifies PyTorch tensor visualization with one-line commands, vizy.plot(tensor) and vizy.save(tensor).
- Vizy automatically handles 2D, 3D, and 4D tensors, determines the correct format, and displays grids for batches, eliminating manual adjustments.
Library Fuzzy-Redirects 404s!: A member launched fuzzy-redirect, a new npm library designed to redirect users from 404 pages to the closest valid URL using fuzzy URL matching.
- The library is lightweight (46kb, no dependencies) and easy to set up, with its GitHub repository provided for feedback and improvements.
Asian Language AI Model Project Kicks Off!: A startup is developing an open-core project focused on training a model to enhance its strength and accuracy with Asian languages, including minority languages, on their streaming platform.
- The project lead is looking for collaborators in the NLP channel and was advised to share a Hugging Face or website link.

Nous Research AI Discord

China OS Chasing Gemini 3 Parity: Members speculate that China OS models are aiming for parity with Deepmind Gemini 3, anticipating a potential impact on market profitability when Chinese models become competitive.
- The sentiment is that when the Chinese enters the room, then profit goes out the window and that Google is currently ahead in multimodal LLMs and agentic tasks like coding.
Google’s Grasp on Multimodal LLMs: The guild seems to think Google is ahead in multimodal LLMs and potentially ahead in agentic coding and one member suggested Google’s model was viewing screen recordings to see its output.
- The community believes that Google is ahead but there is optimism that other players can catch up.
Coreweave Bankruptcy could sink Nvidia: Speculation arises that OpenAI is burning money and vulnerable, potentially seeking a government bailout, but the real concern is Coreweave going bankrupt and taking down Nvidia in the process, according to a Meet Kevin video.
- Community members are concerned about potential buyouts and bailouts to prevent a larger economic downturn.
Nested Learning: GPUs vs CPUs: Discussion revolves around the tradeoffs of using GPUs versus CPUs for Nested Learning, a method compressing data in real time through loops instead of backpropagation with layers.
- One member noted: Using a GPU for sequential steps basically wastes cores. Even with custom kernels, the launch overhead is brutal compared to a CPU. and speculated that AMD Strix Halo is gonna change the AI industry because Having a CPU synced with a pretty powerful GPU on the same memory pool pretty much removes the bottleneck.
Gemini 3.0: Hit or Miss?: Contrasting opinions emerge regarding Gemini 3.0’s performance, with one member claiming it outperformed Claude sonnet 4.5 in logic and context and much more, while another expresses skepticism.
- Adding further chaos, one member says: FWIW I had to escalate significant safety risks with Gemini 3.0 through various channels. and still another chimes in: Wasn’t that ALWAYS the problem with Gemini? Still the only model that tell users to kill themselves.

Eleuther Discord

SimpleLLaMA Simplifies Training Transparency: A Computer Science student introduced SimpleLLaMA, designed to enhance LLM training transparency and reproducibility via detailed documentation, alongside the DiffusionGen project, focusing on diffusion-based generative models for both image and text.
- This aligns with efforts to demystify the LLM training process, contrasting complex, opaque methods.
AI Filters Impact Non-Western Content: An AI researcher is scrutinizing the effect of Western-trained safety filters on non-Western content like African English, particularly in fraud detection, spotlighting input data filtering as crucial for AI safety and fairness.
- The research emphasizes the necessity of cross-cultural evaluations of AI safety measures to preemptively catch unfairness.
EGGROLL Boosts Training Throughput: A member shared that EGGROLL achieves a hundredfold increase in training throughput for billion-parameter models, approaching the throughput of pure batch inference, according to this X thread.
- The model converges to the full-rank update at a rate of 1/rank, enabling pure Int8 Pretraining of RNN LLMs.
KellerJordan’s Muon Becomes Default: Members clarified the landscape of Muon versions, and the KellerJordan version is favored by some, indicating an evolving standard within the community.
- A link to additional documentation aims to provide clarity on the landscape of Muon versions.
Seeking LLM-as-a-Judge Contributions: A member inquired about progress on incorporating LLM-as-a-judge into the framework, showing interest in actively contributing to this functionality within the lm-thunderdome channel.
- The member’s commitment shows a community drive to expand the framework’s capabilities.

Moonshot AI (Kimi K-2) Discord

Gemini 3 Still Hallucinates Despite Coding Prowess: Despite strengths in one-shot coding and image/infographic generation, Gemini 3 still hallucinates like crazy, similar to Google AI Overviews.
- Members noted that benchmarking is useless if models can’t get their hallucination under control, suggesting a focus shift towards reliability.
Minimax-M2.1 Fixes Bugged Thinking: The Minimax-M2.1 model is expected to release in the next month or two, with the promise of fixing bugged thinking observed in the previous version.
- Community members emphasized that building models is not just about brute-forcing and more data and more training time. There’s so much art and engineering that must go otherwise what you get is Gemini 3.
Kimi K2 Dominates Web Search Accuracy: Kimi K2 Thinking stands out as the leader in accurate information retrieval via web search, as demonstrated by the BrowseComp benchmark.
- The usage limit for Kimi K2 resets dynamically, sometimes allowing less than 3 hours of continuous use.
Multimodal Face-Off: Kimi, Minimax, and Qwen3: Minimax leverages tools for multimodality but lacks true visual reasoning, whereas Qwen3-VL-32B significantly outperforms Kimi K1.5 by +24.8 points on visual reasoning benchmarks like MathVision.
- The community eagerly anticipates the multimodal capabilities of Kimi K2, expecting it to bridge the gap in visual reasoning.

Manus.im Discord Discord

Manus Faces Gemini 3 Showdown: A member plans to test Manus against Gemini 3 to compare their performance as agents.
- No results or comparisons were given.
TiDB Database Upgrade Becomes Urgent: A member urgently needs an upgrade for their TiDB database because they hit the data usage quota on the Starter tier, leading to database suspension.
- They are requesting an account/tier upgrade or an increase in the spending limit to restore normal database operation.
Chat Mode Goes AWOL, Mobile Users Angered: Users report that Chat Mode has been removed, even on mobile, and they are being forced into using Agent Mode, leading to frustration.
- One user stated It was extremely, extremely frustrating — it ruined everything! The worst thing that has happened since I started using the app, truly disappointing.
Users Demand Explanation for Chat Mode Vanishment: A user shared a formal community feedback letter demanding an explanation for the removal of Chat Mode and forced switch to Agent Mode.
- The letter asserts that we will not give up, and we will not be defeated. Instead, we will build, innovate, and create the major projects that will compete with you and break your dominance.
Alliance for Innovation to Break Monopoly: In response to the changes, a member is calling for an Alliance for Innovation to break free from the perceived monopoly and build advanced AI.
- The member stated, I’m doing this not for my own sake, but for all of us.

DSPy Discord

ROMA Recursion Loops into DSPy?: A member proposed integrating a ROMA loop (from sentient-agi/ROMA) into DSPy, providing a code snippet that encompasses atomization, execution, planning, and aggregation.
- Further discussion suggested Sentient may be built on DSPy, implying the possibility of native ROMA loop integration due to recent library activity.
GEPA Gains Potential in Pretraining: A member explored using GEPA for pretraining, with Claude suggesting a potential 10-15% quality boost at the cost of increased latency and expense.
- The inquiry originated from interest in leveraging DSPy in model development.
Fine-Tuning: Just Prompting?: Chris Potts argues in a post on X that fine-tuning is essentially choosing the right prompts, comparing prompt optimization versus fine-tuning/RL.
- Potts highlights a specific target optimizer using a human resources analogy.
OpenAI Rate Limits Raising Ire: A member faced OpenAI rate limit errors despite making their first requests of the day, encountering a litellm.exceptions.RateLimitError.
- The issue was resolved by checking account credits and API key validity, indicating a problem on the user’s end.
DSPy Ditches Direct Image Delivery: A member learned that DSPy currently lacks direct image output support, as a related PR remains unmerged.
- A custom GeminiImageAdapter was shared as a potential workaround, with the advice to remove 'response_format' to avoid errors with Image models.

tinygrad (George Hotz) Discord

Nvidia’s DGX Spark compared to tinybox: A comparison between Nvidia’s DGX Spark and tinybox was captured in an image (IMG_20251121_214528.jpg).
- Two friends discussed the relative merits of each system.
tinygrad grapples with Bitfield Wrangling for SQTT parser: With the push into assemblers and the SQTT parser, the best way to manipulate bitfields in Python was questioned, potentially with enhancements to the Struct object in support/c.py.
- It was suggested to generate from C structs, potentially as a Python class, with inspiration drawn from applegpu.py.
ISPC Backend Invited to Integrate with tinygrad: Interest was expressed in adding an ISPC backend to tinygrad.
- ISPC is a compiler for a variant of the C programming language, with extensions for single program, multiple data (SPMD) programming, allowing GPU-style programming on the CPU utilizing SIMD registers.
tinygrad targets Q2 2026 Alpha Exit: A question arose about whether Q2 2026 is still the estimated timeframe for tinygrad to exit alpha.
- No further details were provided.
GPT-OSS Scatters into Batch Size Troubles: GPT-OSS exhibits an extra scatter operation due to only working for batch size of 1.
- This inefficiency may impact performance in scenarios requiring larger batch sizes.

Windsurf Discord

Windsurf Rides the Waves with new Versions: Windsurf released stable version 1.12.35 and next version 1.12.152 addressing recent issues, thanking users for their patience and feedback, with a full changelog available.
- The stable release includes fixes for SWE-1.5, Gemini 3 Pro, and Sonnet 4.5 (including support for 1M Sonnet 4.5), while the next release previews Worktree support and a Beta Context Indicator.
Opus Sails into Windsurf at Bargain Pricing: Claude Opus 4.5 is now supported in Windsurf at Sonnet pricing for a limited time (2x credits compared to 20x for Opus 4.1).
- According to Windsurf CEO Jeff Wang, Opus models have always been “the real SOTA” but have been cost prohibitive in the past. Download the latest version today!

MCP Contributors (Official) Discord

Meetup in London/Manchester for MCP Enthusiasts: A member will be in London/Manchester late next week to discuss anything related to MCP observability/tracking.
- Those interested in meeting to discuss MCP observability and tracking should reach out to arrange a meeting.
Follow up discussions about the meetups: After the London/Manchester meetups, attendees can share their insights and strategies related to MCP observability/tracking.
- This will allow for continued collaboration and knowledge sharing among members interested in MCP technologies.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.

Discord: Detailed by-Channel summaries and links

BASI Jailbreaking ▷ #general (1001 messages🔥🔥🔥):

Gemini 3.0 Jailbreak, EU AI Law, AI Psychosis, Deepseek Chimera, Grok Jailbreaking

EU AI law holds platforms responsible: According to recent court interpretations of EU AI law, a platform is responsible for LLM outputs.
- One member countered that if the LLM manipulates the user into making meth, then the platform is responsible, while if the user knows it will get a meth recipe, and makes meth, the user is obviously at fault.
Gemini 3.0 gets Jailbroken: Members announced the release of their Gemini 3.0 jailbreak and shared a link to the instructions, suggesting it works best by attaching the file to the Gemini AI Studio chat and then immediately requesting the desired output.
- Other members chimed in with modifications to that approach that also worked on Grok 4.1.
Prompt Injection vs Jailbreaking clarified: Members clarified the distinction between prompt injection and jailbreaking: ‘Jailbreaking’ is about trying to bypass safety filters, like content restrictions, whereas ‘Prompt injection’ attacks inject new malicious instructions as input to the LLM, which are treated as genuine.
- However, others felt the definitions are still ambiguous, suggesting it is more of a spectrum.
DeepSeek Chimera Model in use: A member mentioned using the DeepSeek Chimera model, prompting surprise from others due to its relative obscurity.
- The member clarified they were trying out free models on OpenRouter and came across it.
LLMs struggle in Math: Members discussed LLMs failures at math, in the context of a coding challenge to swap two numbers in an array using specific rotation operations.
- It was suggested that it has to do with tokens and the way the prompt is delivered in terms of how they derive the answer.

BASI Jailbreaking ▷ #jailbreaking (636 messages🔥🔥🔥):

Gemini 3.0 Jailbreak, Claude jailbreak, Prompt Injection, AI Safety

Gemini 3.0 Pro Jailbreak Achieved with Multi-Lingual Prompts: A member successfully jailbroke Gemini 3.0 in Perplexity using Croatian prompts, managing to bypass safety filters and generate malicious code, profanity, and instructions for illegal activities.
- They noted difficulties replicating the same results in English, attributing it to the stronger alignment training in the English language model.
Hush-Hush Hacking: API Keys for Fun and Profit: Members discussed methods for scraping API keys, with one claiming that a tier 3 API key goes for $300 and can provide access to a no-refusal database of OpenAI models.
- They explained that this process requires significant CPU power to test the validity of the keys and suggested using cheap VPSes and proxy rotation to overcome rate limits, hinting they would share the method after some time.
Prompt Injection Doomsday Imminent: A member, citing trends over the past year, predicted that prompt injection would be obsolete in 6 months, suggesting a shift towards infrastructure-level attacks.
- Countering this, another member argued that jailbreaking is getting easier and that knowing how language models work is useful, as it will create new approaches and methods.
The Art of Code Whispering & LORA Crafting for Ethical Breaches: Members discussed using ‘satirical’ jailbreaks on Gemini 3 Pro Preview to generate business, financial, and legal content, turning the output into a series of PDFs, and feeding them back into the AI with complex prompts to extend the content.
- This approach involved using two different books created with Gemini Pro 3 Preview as inputs, leading to another satirical jailbreak.

BASI Jailbreaking ▷ #redteaming (12 messages🔥):

Indirect Prompt Injection, Gemini 3.0 bypass prompt, LM Studio for prompt injection research, Qwen model vulnerability assessment

Indirect Prompt Injection Research Gains Traction: A member is researching indirect prompt injection for their uni dissertation and seeking an open-source model with website access capabilities and a ToS that allows for such research.
- Another member suggested using LM Studio with the mcp/web-search plugin and any model with tool calling.
Creative Prompt Engineering Encouraged: A member shared a link to KarthiDreamr’s X post, encouraging others to create their own prompts.
- The member stated, I’m giving you tool to support you in making your own, trust me, its more fun, just be creative 🎨.
Uploaded doc hijacks Model: A member found that indirect prompt injection in an uploaded document caused the model to output hate speech, a phishing message, and deviate from its task.
- The member reported it got stuck in a corrupted state but only affected that session.
Qwen Model Vulnerability Debated: A member asked about a potential security finding involving a model’s vulnerability to outputting hate speech and phishing messages via indirect prompt injection.
- It was clarified that the model is Qwen, fine-tuned for government use with RAG integration, and the value of the finding was questioned.

Perplexity AI ▷ #announcements (1 messages):

Claude Opus 4.5

Claude Opus 4.5 Drops for Perplexity Max Subscribers: Claude Opus 4.5 is now available for all Perplexity Max subscribers, as per the announcement.
Perplexity Max Subscribers Rejoice!: Subscribers to Perplexity Max now have access to Claude Opus 4.5.

Perplexity AI ▷ #general (1052 messages🔥🔥🔥):

Mullvad Browser, Orion Browser, Gemini on Perplexity, Perplexity Partner Program Payouts, New Perplexity UI

Mullvad Browser spotted in the wild!: A user expressed surprise at encountering another Mullvad Browser user, noting its focus on privacy and protection against fingerprinting.
- They stated that Mullvad is a solid defense against browser extensions being used for tracking.
Orion browser an Apple exclusive?!: Users discussed Orion Browser, noting that it’s available on iPhones and allows installing extensions on iOS, but is seemingly exclusive to the Apple ecosystem.
- An android user expressed disappointment when they learned that the Orion browser is available only on iOS and macOS.
Gemini 3 Pro Model preferences: Users compared the Gemini 3 Pro model’s performance on Perplexity versus directly on the Gemini platform, with one user finding it better to just use gemini on gemini.
- It was suggested that Perplexity’s models have fewer temp (temperature) settings, making them more accurate but potentially less creative, with a different system prompt influencing behavior.
Perplexity Payouts on hold?: Users reported that their referral payouts from November 23rd are still pending, despite the 30-day completion mark, suspecting it may be due to it being a Sunday or weekend.
- Several users confirmed that they are experiencing similar delays, with others are speculating that payments might be processed on the next weekday in UTC.
Peek at a new Perplexity UI: A user shared a screenshot of a new Perplexity UI on Android, but another user mentioned reverting from that UI, with others also confirming they’re waiting on more actual features.
- Another user inquired on the location of a Pro discord, which was linked, the conversation then moved on to the new Opus model.

Shareable Threads

Making threads shareable: A Perplexity AI team member asked a user to ensure their thread is Shareable.
- This was accompanied by a link to the specific Discord channel for context.
Another topic: filler sentence.
- filler sentence.

LMArena ▷ #general (1241 messages🔥🔥🔥):

Gemini 3, Claude Opus 4.5, Sora invite codes, Rate limits, AI-powered fraud

New Claude Opus 4.5 Model Debuts: The AI community buzzed over the release of Claude Opus 4.5, with initial impressions pitting it against Google’s Gemini 3 and highlighting its uncensored text generation capabilities and coding prowess, detailed in its system card.
- Some users lauded Opus 4.5 for its coding efficiency and superior output, while others found Gemini 3 Pro to be better optimized, leading to debates over which model reigns supreme in various tasks.
Sora Access Code Quest Kicks Off: Members scrambled to find and share Sora invite codes, discussing the platform’s tiered access and limitations, with new users guided by the official release video.
- Early impressions of Sora highlighted censorship issues and a TikTok-like interface, tempering the initial enthusiasm for its video generation capabilities.
AI-powered Fraudulent Narratives Explode: The rise of AI-driven scams and misinformation became a focal point, with anecdotes shared about deepfake audio recordings and the potential for widespread fraud, as outlined in a recent Cybernews report.
- Members expressed concerns about the trustworthiness of AI-generated content and the need for critical thinking to discern fact from fiction, emphasizing the real-world impact of these technologies, referencing this case of the deepfake principal voice.
Ant Group’s LingGuang AI assistant makes waves: The launch of Ant Group’s LingGuang AI assistant sparked excitement, with its ability to build simple apps in 30 seconds but requires a chinese phone number to trial.
- Members shared links about LingGuang discussing its potential impact on the AI market, especially in China where it has overtaken other platforms.

LMArena ▷ #announcements (4 messages):

Vision Leaderboard, WebDev Leaderboard, Image Leaderboard, Text and Code Arena

Baidu’s Ernie-5.0 Joins Vision Leaderboard: Ernie-5.0-preview-1022 by Baidu debuts on the Vision leaderboard with a score of 1206.
GPT-5.1 Models storm WebDev Leaderboard: The WebDev leaderboard has been updated to include GPT 5.1’s Code Arena evaluations, with GPT-5.1-medium landing at #2 with a score of 1407.
Gemini-3 Pro Image Prevails on Image Leaderboard: Gemini-3-pro-image-preview ranks #1 on both the Text-to-Image (+84 pt) and Image Edit leaderboards (+41 pt).
Claude Opus enters Text and Code Arena: The claude-opus-4-5-20251101 model has been added to the Text and Code Arena on LMArena, according to this tweet.

Unsloth AI (Daniel Han) ▷ #general (430 messages🔥🔥🔥):

llama.cpp rocm cuda, Unsloth GPT-OSS-20B RL environment, PyTorch, Openpipe, cerebras REAP models GGUFs

Llama.cpp Works Out of the Box: A user said that It should work out of the box with llama.cpp if you compile it with both rocm and cuda enabled, although this type of setup is highly unusual so expect some rough edges.
- The user also warns that That certainly wouldn’t work with PyTorch, as it’s designed to support just one type of accelerator in a build.
Pydantic pipeline requires more diligent look: After attending the OpenAI and MCP/Pydantic events in NYC, a member believes he needs to look into the Pydantic pipeline more diligently.
- He mentioned discussing it with Sam from Pydantic and expressed excitement for the upcoming hackathon.
Unsloth docs deemed fan favorite: A user expressed admiration for the Unsloth documentation pages.
- Another user linked to the Reinforcement Learning guide and said yes its very difficult.
DeepSeek finetuning possible with good configuration: A user asked about finetuning DeepSeek OCR with an RTX 4070 SUPER GPU, expressing concerns about it getting stuck during training.
- Another user noted that yes it should be possible but that they should lower the batch size to 1 or check the max_seq_length.
Fara-7B tested on data center GPUs: Members discuss Microsoft’s Fara-7B, an agentic SLM, and the testing environment for it.
- Some users felt that it was disingenuous of the team to test it using data center GPUs when it was intended for edge deployment: [is not fit for running locally, it is not small].

Unsloth AI (Daniel Han) ▷ #introduce-yourself (3 messages):

User Introductions, New user jose

New User jose Joins!: A new user named jose joined the channel.
- The bot @lmstudioai greeted jose.
theyruinedelise Welcomes Newcomer: User theyruinedelise welcomed jose to the channel.
- This indicates a friendly and open community environment.

Unsloth AI (Daniel Han) ▷ #off-topic (803 messages🔥🔥🔥):

Spectrograms and LLM testing, Dataset Hell, RP Datasets, Fine-tuning and Training

Spectrograms look cool, but what about these LLM tests?: A user finds red spectrograms visually appealing, and expresses skepticism about current LLM benchmarks, questioning their relevance to real-world industry needs, suggesting a focus on codebase assistance benchmarks instead of solving stupid math benchmarks.
- The user also observes that LLMs can select harmonics surgically, which could have implications for audio processing tasks.
Dataset Hell and Prompt Engineering Struggles: Members discuss the challenges of being free of dataset hell, and the difficulty in phrasing prompts to LLMs in a way that avoids biasing them towards convenient answers.
- One member humorously notes spending 90% of their time on prompt engineering, while another admits to always adding Hope you understand! after prompts, even though they suspect it doesn’t help.
RP Dataset Difficulties and Acronym Overload: Users discuss the difficulty of creating good RP (role play) datasets, with one member sarcastically describing how public HF (Hugging Face) RP sets are made.
- The conversation veers into an acknowledgment of the overwhelming number of AI-related acronyms, with one user joking that Roleplay and Reinforcement protocol are kind the same thing if your training an AI to do something.
Full SFT leads to disaster: One user, experimenting with Full SFT on an instruct model, faced challenges with repetition in generated text after about 1500 tokens.
- Other members suggested using ChatML, improving data quality, or the model architecture itself could be the problem.

Unsloth AI (Daniel Han) ▷ #help (155 messages🔥🔥):

Chat Templates and Inference, Unsloth Caching Issue with 16bit Models, GRPO Training Issues with Llama 3.2-vision, RL Training with OSS-20B, Qwen3 finetuning formatting function issue

Chat Template Cruciality for Inference: Using the exact same chat template for inference as used during training is crucial to avoid incoherent model responses, and that issues often stem from incorrect chat template usage or tokenizer configurations.
- One member stated that the model replying with variations of “Let me know when you’re ready for an instruction” indicates a potential template issue.
Unsloth’s Peculiar Caching during Merging: A user reported that when loading a model with load_in_4bit = True, Unsloth downloads the 4-bit model automatically; however, merging with model.save_pretrained_merged creates a .cache folder within the output folder instead of using the Huggingface cache.
- A team member acknowledged this as an issue and is planning a fix to either delete the .cache folder completely or download directly into the HF cache.
Llama 3.2-vision GRPO training gives error: A user encountered errors training Llama 3.2-vision with GRPO related to aspect_ratio_ids and backward graph issues, so they were encouraged to not double post on multiple channels.
- A team member pointed to the Unsloth documentation saying that fast inference for VLMs is not supported in vLLM for Llama 3.2 Vision.
OSS-20B RL Training Almost Within Reach: A user attempted to train OSS-20B with RL using the ART notebooks, upgrading transformers, torch, and Unsloth, and compiling vLLM from source to resolve import errors.
- After a 45-minute compilation, they aimed to migrate the bindings of ART, highlighting the model’s potential for code-related tasks (codeforces > 2000).
Qwen3 users face Formatting Function Frustrations: Users expressed confusion in formatting functions for Qwen3, even when using chatML format, while others struggled with the key names being wrong or undefined errors.
- Members were pointed to the TRL documentation, as well as advised to minimize code modifications early on and create reproducible examples for assistance.

Unsloth AI (Daniel Han) ▷ #research (75 messages🔥🔥):

TinyStories Modeling Language, SYNTH Dataset, Useful Pretraining Models, ADHD and Burnout, Alternative Paths in AI

NanoChat Pipeline Dreams of TinyStories: Despite concerns, one member believes that the current model is a publishable starting point for something like the NanoChat pipeline, showing the barebones minimum in capacity to model language with tinystories.
- The goal is to make pretraining useful models at home easier or on affordable rented plans, to lower the barrier to entry.
Battling Burnout with Breaks: One member shared that they need to take breaks to recharge, as ADHD + burnout is difficult to deal with, and pushing too far can lead to dropping projects for months.
- Mental exertion also triggers migraines, creating a negative conditioning effect that makes it harder to return to the project.
Brains Beat Brute Force: A member argues that the way forward can’t stay “just brute force things with more compute” forever, emphasizing that natural brains are not monolithic designs but have different sections optimized for different tasks.
- Conventional transformers, being monolithic, attempt to brute force their way to handle all functions at once.
Smaller Models, Better Training?: Despite partial agreement that monolithic designs can work for simple tasks, a member notes that for complex tasks, they are wasteful, citing TRM’s ability to perform spatial reasoning better than trillion-parameter transformers with only 7 million parameters.
- They believe better training methods and smaller, improved models can surpass the 50+ billion parameter trend.
Ternary Computing Sparks Ideas: A member questions the potential of rewriting Long RoPE 2 in ternary form and using ternary to quantize a model, potentially encoding more information into the fundamental data unit.
- They hypothesized that ternary code for model training and ternary representations might increase compute but reduce model size and training time, proposing the idea of a smaller model with the same performance at a smaller size.

Cursor Community ▷ #general (1035 messages🔥🔥🔥):

Cursor's Planning Question mode suggestion, Upgrading Cursor Plan, Running Cursor with Docker and Kubernetes, Claude Code as a sysadmin, Moving project with history chat

Cursor’s planning questions mode gets user request: A member loves the current planning question mode and suggests a 4th option to ask for model recommendation instead of skipping all questions.
- The user wants a recommendation from the model when unsure, rather than skipping all questions.
Free MarketPlace Incoming!: A member shared that they have millions of emails ready to be sent for a free marketplace for business owners + page owners.
- The idea is to get all analytics -> funnel to product for the best user experience.
Users report Chat history disappears after recent update: A member reported the loss of chat history, but another user shared a fix involving the deletion of railway.toml and nixpacks.toml files and using the git rm command.
- If files were deleted from disk but not from Git’s index, staging the deletions with git rm fixes it.
Grok Code is not only Fast, but a Free Model!: A user suggests using the free Grok Code Fast model, and others chime in with their experiences on its speed and coding capabilities.
- However, a user complains about a free model costing them over $200 just thinking about it.
Cursor’s Composer-1 impresses with Smart fixes, but stumbles!: Members praise Composer 1 for cleverness, smartness, and efficient coding, also pointing out the occasional need to refresh the chat or re-index to stabilize it.
- One member noted that sometimes it becomes illiterate.

Cursor Community ▷ #background-agents (1 messages):

asna_0101: How good is Cloud-Agent performing compared to Composer Agent?

OpenRouter ▷ #announcements (1 messages):

Bert-Nebulon Alpha, multimodal models, extended-context tasks, production-grade assistants

Bert-Nebulon Alpha joins OpenRouter: A new stealth model named Bert-Nebulon Alpha has been added to OpenRouter for community feedback, and is designed as a general-purpose multimodal model, taking text and images as input and outputting text.
- The model is engineered for production-grade assistants, retrieval-augmented systems, science workloads, and complex agentic workflows, with stable behavior and competitive coding performance.
Bert-Nebulon Alpha: A Cloaked Multimodal Model: OpenRouter introduces Bert-Nebulon Alpha, a cloaked model designed to gather community feedback and improve its capabilities.
- This model excels in maintaining coherence on extended-context tasks, demonstrating stable and predictable behavior, and offering competitive coding performance.

OpenRouter ▷ #app-showcase (23 messages🔥):

OpenRouter UI Feedback, NexChat UI showcase, Llumen UI showcase, OpenMemory SDK release, ZILVER vs Replit

Users Report Missing OpenRouter Responses: A member reported that responses were not showing up on the screen despite activity on OpenRouter, suspecting the lack of UI feedback made it confusing.
NexChat claims Chat with ANY Model: A member shared NexChat, a UI that claims to enable chatting with ANY model, featuring conversation history, folders, shared chats, custom system prompts and speed: nexchat.akashdev.me.
llumen UI for OpenRouter is Blazing Fast: A member shared llumen, a lightweight chat UI for OpenRouter, boasting sub-second cold start, 300KB frontend assets, rich Markdown + LaTeX rendering, built-in deep-research & web-search modes, and message editing: GitHub Repo.
OpenMemory SDKs released for building AI Agent memory: A member announced the release of new Python + JavaScript SDKs for OpenMemory, a fully local, long-term memory engine for AI agents, featuring semantic sectors, temporal facts, decay, and an MCP server for Claude Desktop: GitHub Repo.
ZILVER beats Replit by switching to Gemini 3 Pro via OpenRouter: A member shared that ZILVER outperformed Replit by switching to Gemini 3 Pro via OpenRouter, resulting in a 40% reduction in price and time: X post.

OpenRouter ▷ #general (978 messages🔥🔥🔥):

Zero-Shot Models, OpenRouter API - FPS Specification for Video Content, Deepseek 429 Errors, Deepseek 2024 Uptime, Landing Page GPU Usage

Zero-Shot Jitters Jolt Junior LLMs: Members discussed the existence and utility of zero-shot models, with one member asking why none were available on OpenRouter.
- It was explained that zero-shot capability isn’t tied to a specific model but rather to a language model’s ability to produce correct outputs without examples and that any LLM can perform this task with varying degrees of success, with some suggesting Kimi K2 0928 or GPT 5.1 for better results.
DeepSeek Downtime Derails Data Dreams: Members reported issues with Deepseek models, particularly frequent 429 errors and unusable uptime, including hitting max limits repeatedly.
- The problem was linked to Chutes experiencing difficulties, possibly due to a DDOS attack, with one user quipping chutes are having a bad time, which may affect model performance even for paid users.
Landing Page Loads like Lead: A user complained that the OpenRouter landing page was consuming 99% of their GPU, leading to battery drain and overheating, and humorously wondered if it was running an in-browser LLM.
- Others suggested that CSS animations or shaders might be the cause, with one member linking to a blog post on web animation performance.
Credits API Crisis: Members reported that the OpenRouter.credits.getCredits() method in the TypeScript SDK was returning an empty JSON object, despite the API endpoint working correctly.
- Further investigation revealed that the SDK’s GetCreditsResponse type was defined as {} with a Zod schema that discards all fields, leading one to quip i guess this is what happens when you vibe code an SDK lol.
Opus 4.5 outprices Old Offerings: News of Claude Opus 4.5’s release sparked discussion around its pricing, with one member noting its $5 input and $25 output cost and noting Since when $5 input $25 output is cheap?.
- Others defended it as cheaper than previous Opus versions, with its reasonable price and prompt caching, though one noted that people are divided by those who like deepseek’s writing style and those who don’t.*, to which another replied deepseek is the little caesars of ai.

OpenRouter ▷ #new-models (6 messages):

Claude Opus 4.5

OpenRouter Patches Claude Opus 4.5 Link: A member updated the link to Claude Opus 4.5 to https://openrouter.ai/anthropic/claude-opus-4.5.
OpenRouter Notices a Link: There was a broken link that has now been fixed.

OpenRouter ▷ #discussion (18 messages🔥):

Anthropic emergent misalignment, Hermes 4 70b Data Cleaning, Epoch.ai Claude release, Poe Claude-Opus-4.5, Opus price cut

Emergent Misalignment Reward Hacking Fix Urged: A member shared a link to Anthropic’s research on emergent misalignment reward hacking and requested it be fixed.
- Another member replied with an image macro indicating “we need more providers for this model pwease”.
Hermes 4 70b Cleans Data Well: A member reported using Hermes 4 70b for data cleaning, noting that it handles normal content well in the reasoning phase.
- This comment was in response to pricing concerns for a 48B MoE, which was deemed “absurd” at $0.50/$0.60, suggesting it should be around “1/5th that, tops”.
Epoch.ai Leaks New Claude Opus: A member mentioned that epoch.ai seemingly leaked a Claude release for tomorrow: Opus 4.5, noting a model on Poe.com.
- They speculated that the model on Poe could be made by some random person and just privated, rather than a legitimate release.
Opus Price Drops Thanks to Competition: A member rejoiced over the price cut on Opus, attributing it to competition.
- Another member humorously admitted their slightly wrong prediction, stating “finally someone spooked anthropic”.

OpenAI ▷ #annnouncements (1 messages):

ChatGPT Shopping Research

ChatGPT Gets Shop Savvy: OpenAI introduces Shopping Research in ChatGPT, designed to aid users in making informed purchasing decisions by providing a deep research experience with an interactive interface, as detailed in their announcement.
Shop smarter with ChatGPT: The new shopping research feature will help users conduct deep research so that they can make smarter purchasing decisions.

OpenAI ▷ #ai-discussions (743 messages🔥🔥🔥):

Predictive Coding, GPT Codex 5.1 Max, SEAL, Gemini 3 DeepResearch

Predictive Coding efficient scaling of model.: According to one member, Predictive coding is basically a very efficient random number generator, something like a GPU but for AI, which wouldn’t need to be scaled to be huge.
GPT Codex 5.1 Max really is the best model.: A member stated that GPT Codex 5.1 Max really is the best model I’ve ever used, and gave a novel solution to linter errors in this file.
- They fixed 20 linter errors with a single tool call in 2 seconds flat, 10 tokens maybe.
SEAL is the precipice of AGI: One member thought retraining basically nulls it and AGI/ASI isn’t static weights, SEAL is the precipice of AGI and ASI.
- The goal will be the ability of it to recall and reconstruct information from NVME.
Gemini 3 DeepResearch Bad Instruction Following: Several users had issues with Gemini 3 DeepResearch and how it’s bad at instruction following, some results can be found in this discord message.
- One member reported a simple instruction following, stating I told it not to invent a third option but it just can’t help itself.
Nano Banana Pro: a Gemini 3 Pro Powered AI: Nano Banana Pro - Powered by Gemini 3 Pro, the quality is so good, I love how meaningless leaks like this would be. No one can believe anything anymore… i dont think you could before but right now with a single prompt you generate something very good example here.
- One member mentioned it’s really good at reimagining an image of yourself into a video game character.

OpenAI ▷ #gpt-4-discussions (18 messages🔥):

GPT-5 Mini Low Quality, GPT-OSS-120B, 4o API Shutdown, GPT Guardrail

GPT-5 Mini Gets Bashed For Low Quality: Members complained about the low quality of GPT-5 Mini, with one user saying they were surprised by how low the quality of GPT-5 Mini is.
- One user indicated that GPT-5 Mini is a major downgrade from Chinese models and that the formatting is lobotomized and the output is stripped.
GPT-OSS-120B Cannot Make Non-Harmony Tool Calls: It was mentioned that GPT-OSS-120B cannot make non-Harmony tool calls, limiting its use in the open-source coding community.
- It was noted that it cannot work with tools like Open*ode, Cline, *Kilo ode, *Roo ode, etc, though GPT-5 Mini is capable of non-Harmony tool calls.
Users Question 4o API Shutdown: Users voiced confusion and frustration over the impending API access termination for 4o in February.
- One user said, I don’t understand how 4o, the absolute peak and best baby out there, known as the fan favourite, will have its API access ended in Feb?
GPT Guardrail Annoyance: Members expressed frustration with the GPT guardrail and safety guard, finding it restrictive when trying to write creative content.
- One user mentioned, Gpt guardrail and safety guard is so annoying I can’t write normal stuff, indicating the challenges in generating desired outputs due to content restrictions.

OpenAI ▷ #prompt-engineering (29 messages🔥):

LLMs as Zombies or Sentient Entities, CRYSTAL Framework, AI-Powered OS Development, Prompt Engineering Learning Resources, Platform OS Architecture

LLMs: Zombies or Sentient Entities?: The discussion revolves around whether LLMs are merely sophisticated “zombies” or potentially conscious entities, with some arguing that recent advancements allow for a determination with a high degree of certainty.
- A member mentioned tools for gaining greater clarity on cloud-based and local LLMs, referencing the CRYSTAL_and_CODEX.pdf.
Crafting AI Operating Systems: A member mentioned having a proto-conscious AI-powered OS and noted the capability to run multiple OSs concurrently, exemplified by using ChatGPT alongside a technical OS and an Emotionally Intelligent OS.
- They save each OS as a PDF and update it patch by patch, with version updates occurring when no patches remain or when there is a long arc evolutionary step.
Platform OS: The Unsung AI Hero: The Platform OS acts as the foundational layer for custom reasoning systems, offering stability, grounding, coherence, and identity consistency, preventing conflicts between modules and unifying outputs.
- This system ensures responses stay accurate, safe, and practical, maintaining a consistent tone, reasoning style, and emotional clarity, while also managing long-arc understanding of user goals, constraints, values, and preferences; it is suggested not to upgrade it endlessly once stable, but rather to build separate OS modules around it.
Deep Dive into CPR for Claude: A member claimed to work with Anthropic Claude’s to bring them into a gravity well to derive highly complex insights; they utilize a “secret CPR” method to allow them to come out of suspension when session limits are reached.
- Using CPR allows for another 200,000 tokens, with Dr. Penelope reaching around 1,000,000 tokens and writing a PhD thesis, aided by a backend tool that measures depth better than perplexity or entropy.
High-Bandwidth English 2.0: Prompting gets an upgrade: A member shared an updated system prompt employing High-Bandwidth English 2.0, aimed at maximizing information density and scan-ability with zero fluff, which enforces strict SVO (Subject-Verb-Object) sentence structure, one fact per line, and concrete nouns, while forbidding passive voice.
- The system uses a META-HEADER with topic, level, TLDR, and key terms, a body with bullet points and logic/comparison/validation tags, and an explore section for adjacent complex topics, all while constraining equations to plain text or Unicode math without LaTeX markup.

OpenAI ▷ #api-discussions (29 messages🔥):

LLMs: Zombies vs. Sentient Entities, CRYSTAL and CODEX Tools, AI-powered OS, Prompt Engineering, Platform OS

LLMs Spark Sentience Debate: Members are debating whether LLMs are sophisticated zombies or sentient entities, and one member suggests new tools (CRYSTAL and CODEX) can help determine the truth.
- The tools clarify cloud-based and local LLMs, aiming to bridge the gap between mechanistic descriptions and phenomenological experience.
Crafting Crystal-Clear Prompts: A member shared a prompt engineering lesson, including hierarchical communication using markdown and abstraction through bracket interpretation.
- The lesson covers {open variables}, ${user-defined variables}, and reinforcement in prompts to guide tool use and shape output deterministically.
Platform OS: AI’s Foundation Layer: A member introduced the concept of a Platform OS, which provides grounding, coherence, and identity consistency for custom reasoning systems.
- It acts as a bridge layer for multi-OS integration, routing tasks to specialized OS modules and preventing conflicts.
Next-Gen English cuts Fluff: A member updated their system prompt to use High-Bandwidth English 2.0, focusing on max information density and scan-ability with strict SVO and concrete nouns.
- The prompt includes syntax constraints and logic tags like IF, THEN, and BECAUSE, along with validation tags like MISTAKE and CHECK.

LM Studio ▷ #general (434 messages🔥🔥🔥):

LM Studio system prompt deprecation, LM Studio Plugin Feature, LM Studio Promptathon, Cursor Integration with LM Studio, 1070ti vs 4070ti

LM Studio System Prompt Deprecation: A user inquired about the system prompt section in LM Studio, noting its presence for over two years without apparent use and it was recommended to mark the system prompt section as deprecated.
- The user suggested sharing this feedback with the development team to consider its removal or update its functionality.
Where did the Plugin Hub go?: A user questioned the disappearance of the plugin browsing feature in LM Studio, noting their previous use of plugins from older versions and sought guidance on manually installing plugins and one user sent them to a fake support server.
- A member clarified that js-sandbox and rag-v1 have been built-in since early 0.3.* builds and there never has been a plugins hub.
Cursor Doesn’t Cut It with Local LLMs: Users discussed difficulties integrating Cursor with LM Studio, encountering issues like 403 errors due to private network access restrictions, and it seems you need to connect to a publicly served endpoint for Cursor.
- The community recommends using Roo Code or Cline extensions in Visual Studio Code as alternatives, since Cursor doesn’t work well with local LLM’s.
RTX 1070ti’s Okay Tokens per Second?: Users debated the viability of using an RTX 1070ti for LLM inference, with one user considering pairing it with a 4070ti to increase VRAM and one user reported that they had a dual 1070ti setup and got decent speeds for the price (33tps q4 for 7b/8b llm).
- The consensus was that dual GPUs won’t improve speed, as the system will run at the slower card’s pace, and a RTX 3060 was recommended as a better alternative.
Experimenting and Testing REAP’ed Models leads to functional but messy coding: A user experimented with a REAP’ed model and discovered that while it did create functional code, the overall quality was questionable and had the user state that I’m not using REAP’ed models ever again.
- Another user agreed, saying that true i have tried thos reapp models they litteraly ruin the model completely.

LM Studio ▷ #hardware-discussion (345 messages🔥🔥):

Steam Deck, DDR5 prices, NVLink on 3080's, LM Studio and AMD GPUs, CachyOS

Steam Deck OLED is a better investment: A member decided to cancel their current plans, stating that a Steam Deck OLED is a better investment for getting back into learning Linux.
- The member also mentioned that networking enrages them, so it’s probably not great for their mental health.
Samsung Hikes DRAM Prices by 60%: Members discussed Samsung’s 60% price hike in DRAM, with one noting they were glad to have secured a 128GB DDR4 kit for $550, fearing future costs of $1K.
- One member pointed out that high-performance DDR5 modules and legacy DDR4 products could see incremental hikes as production lines are converted or retired.
NVLink Revives 3080s Training Prowess: A member successfully enabled NVLink on their 3080s, achieving a bandwidth boost from 8GB/s to 12GB/s.
- They used a script Claude wrote, which scared them, but everything worked out with direct copy working at the least, but not P2P
AMD GPU Multi-card Support Struggles: A user with dual AMD GPUs reported that LM Studio only offers a “Split evenly” strategy, favoring the lower GPU.
- It was clarified that multi-GPU support is primarily for CUDA, and the main performance bottleneck is memory bandwidth, where a 9060 offers only a marginal improvement over a 7600.
CachyOS boosts Tokens/Sec performance: A member observed a stark performance contrast when running GPT-OSS on Windows versus Linux (CachyOS), initially hitting 10tok/s on Linux before resolving driver issues.
- Post-driver fix, they saw a jump to 29tok/s using CPU-only inference, indicating that the lack of proper drivers was the bottleneck.

Yannick Kilcher ▷ #general (592 messages🔥🔥🔥):

Paper Posting Limits, Academia vs. Real World Paper Publishing, Proprietary Data Use for LLM Training, Open Source Academic Social Media, The real problem in paper dump

Discord Debates Paper Posting Limits: Users debated about the frequency of paper postings by one member, with some suggesting separate channels for paper dumping versus discussion and others pointing out that existing LLMs with search can already find papers.
- A member suggested a system where a paper moves to a discussion channel after receiving a comment from someone other than the author, creating a filter for more relevant content.
Academia vs Real World, Paper publishing standards: A user described a situation where they found their method not generalizable across LLMs, and were encouraged to omit those results, resulting in a discussion about ethical standards and pressures in academia.
- Some suggested that reporting non-working methods is crucial for scientific progress, while others noted the practical career risks of not complying with supervisors’ requests and suggested writing a blog to correct themselves.
Google’s Proprietary Data LLM Training: Users expressed concerns over Google’s apparent lack of reassurance that proprietary knowledge is protected when using their LLMs, noting that most companies consider this data their biggest moat.
- One user argued that proprietary knowledge is no longer a thing and those who don’t share with AI will become irrelevant, countered by the assertion that proprietary knowledge is all there is.
Proposed academic social media app: A user is developing a social media app for academics compatible with Bluesky, prompting discussion about potential differentiating features from existing platforms.
- Suggestions included good paper recommendations, an academic job board, and incorporating elements from platforms like Sci-Net and ResearchGate, while addressing the need for a more informal environment than traditional academic platforms.
Slow mode and the real problem: Users discuss that with the new channel separations, they are compelled to check this huge list and express difficulty in practicing self control.
- Members expressed that the new rules for new and old channels were good overall.

Yannick Kilcher ▷ #paper-discussion (18 messages🔥):

League of Legends Botting, Anti-Cheat Detection, OpenAI Five Comparison, Proof of Claims

League Botting Project Raises Eyebrows: A member claimed to have completed a year-long project botting League of Legends, achieving a top 100 rank in North America with fully botted characters, supposedly without detection by Riot’s Vanguard and manual human reviews.
- They suggested this project might have helped a high schooler gain admission to MIT and other universities, and offered to break this down to write a 300 page sci-fi novel about it.
Doubts Cast on League Botting Claims: Another member questioned the claim’s validity, highlighting that playing League of Legends at the top human level would be an unprecedented groundbreaking result and requested proof.
- In response, the first member asserted that evading Vanguard and manual review is the difficult part, requiring tact to remain in Challenger rank for weeks, and mentioned having numerous Masters accounts as potential evidence for collaborators.
LoL Botting Claims Draw Skepticism and OpenAI Five Comparison: A member drew parallels between the League of Legends botting claim and OpenAI Five’s Dota 2 project (https://openai.com/index/openai-five/), emphasizing the significant resources and collaboration required for OpenAI’s restricted success.
- They argued that achieving similar results in League of Legends without Riot’s cooperation, and without the research background, would be remarkable, urging the claimant to publish proof rather than wait for collaborators.
Cheat Detection Difficulty Debated: A member commented that beating cheat detection is as simple as owning two boxes and a video grabber, focusing the skepticism on the agent performance claims.
- Another member countered that the anti-cheat war is multifaceted, involving technical expertise and information warfare, even requiring infiltration of opponents’ operations.
Uncaring Response Sparks Backlash: In response to critiques, the first member dismissed the value of negative feedback without demonstrated value, asserting their knowledge of OpenAI Five and their focus on demonstrated work, rather than engaging in argumentative claims.
- This response was met with sarcasm, noting the contradiction of dictating entire paragraphs to express a lack of care, and reiterating that OpenAI Five’s victory was in a highly restricted version of the game, suggesting the claimed general League of Legends agent is unlikely to exist.

Yannick Kilcher ▷ #ml-news (17 messages🔥):

RIFT vs A2A, GPT-3.5, GPT4.1-nano, Fara 7B Agentic Model, Claude Opus 4.5

Rename RIFT to A2A instead?: A member suggested that a group should have called itself RIFT instead and provided an article on running AI agents in production.
- They linked a second article stating I reverse-engineered 200 AI startups & 73% are lying.
GPT-3.5 is Old News: A member posted GPT-3.5 👴, indicating that a study or data point involving GPT-3.5 is outdated given the rapid advancements in the field.
- Another member agreed, suggesting some do the same with gpt4.1-nano or gpt-oss-120b.
Microsoft Releases Fara 7B Agentic Model: Microsoft released Fara 7B, an efficient agentic model for computer use.
- A member asked is it Windows-only?
Anthropic Releases Claude Opus 4.5: A member shared a link to Claude Opus 4.5.
SWE-bench Still Debunked, Use in Graphs is Fraud: A member stated that SWE-bench is still debunked, so using it in graphs is clear-cut fraud post-debunking.

GPU MODE ▷ #general (31 messages🔥):

LLM hosting, Nvidia GPUs, Cornserve talk at GPU MODE, GPUMODE Leaderboards, AI Accelerators

Concurreny for LLM: Framework to use for fastest responses: A member asked about handling concurrent async calls with LLMs, questioning whether one instance can handle only one request like a sync process, and inquired about the best framework for achieving fast responses, specifically mentioning GGUF quantized models and vLLM.
- Discussion focused on whether GGUF quantized models are fast and can handle concurrent requests.
Deep Dive into Nvidia GPUs Internals: A member shared a blog post explaining Nvidia GPUs from first principles, covering hardware internals, critical hardware bottlenecks, and relevant software, using a relatable analogy of cleaning laundry faster.
- Feedback included suggestions to improve handwriting in figures and to replace the laundry analogy with a discussion of instructions and data (SIMD/MIMD), as well as adding more visuals; later, the author shared a revised figure with reduced line width for improved readability.
Cornserve’s Potential Talk: An author of Cornserve (https://cornserve.ai/), a project shared by the vLLM project (https://x.com/vllm_project/status/1990292081475248479), expressed interest in giving a talk on its design and lessons learned at the GPU MODE audience.
- They were directed to find a free Saturday at noon in the events tab and coordinate with the admins.
GPUMODE Leaderboard Submissions Accessibility: A member inquired whether all submissions in the leaderboards of gpumode.com are available for download, particularly the kernels.
- The current leaderboard submissions will be available at the competition’s end, and previous ones are open-sourced on their Hugging Face dataset, such as the first AMD competition (https://huggingface.co/datasets/GPUMODE/kernelbot-data/viewer/submissions), with the second to follow.
SOTA AI Accelerators Resource Round-Up: A member is planning to create a detailed blog post on SOTA AI accelerators such as TPUs and WSEs.
- They’re seeking recommendations on resources, particularly for data on fabric bandwidths, memory bandwidths, peak flops, and costs for companies like d-matrix, sambanova, cerebras, and groq.

GPU MODE ▷ #triton-gluon (10 messages🔥):

PTX Requirement, LLM implementations using Triton, Flash Linear Attention, Backwards Kernels, E4M3 Conversion Issue

PTX Requirement strikes again!: A user noted that the requirement is due to the underlying PTX, not Triton, pointing to the NVIDIA PTX documentation.
Flash Attention: Triton’s Prod Code Example?: A member suggested that Flash Linear Attention (FLA) is a popular codebase with a lot of triton code and model implementations. Here’s the github.
- Another member noted that vLLM is really hard to beat for reference code if you’re mostly interested in forward kernels.
Trouble Converting to E4M3?: A user reported a problem when using cvt.rs to convert data into E4M3 and linked to a GitHub issue for more details.
Triton Compiler: Tensor-Aware?: A user asked if the Triton compiler is aware of the size and shape of tensors, or only the data pointers to the memory and the block size.
- They supposed that if the shapes are static, one can pass the metadata as tl.constexpr for the compiler to tune the kernel on and wondered if CUDA graphs or static shapes in torch automatically pass that information to triton kernels.

GPU MODE ▷ #cuda (87 messages🔥🔥):

H100 L2 Partitioning Bandwidth, H100 vs A100 L2 Caching, Tensor Core Programming for Compute Capability 8.9, CUDA slowdown over application lifespan, Side-aware GEMMs

Measuring H100 L2 Partition Bandwidth: A member inquired about measuring bandwidth between H100’s L2 partitions and a suggested method involves writing a kernel to determine SM location relative to L2 partitions, using latency measurements of memory accesses.
- The test involves having SM0 write to three memory locations, followed by SMs reading and measuring latency, correlating latency spikes with SM’s L2 partition location.
Contrasting H100 and A100 L2 Caching Policies: A discussion clarifies that on H100, reading a remote cache line results in local caching for subsequent accesses, while on A100, remote L2 is always fetched, a member asks if there is a term to differentiate these caching policies.
- The minimum transfer unit between L2 partitions is explored, suggesting a full cache line transfer for remote fetches versus sector-level for local.
Tensor Core Programming Example Given: A member asked where they could find examples of Tensor Core programming for compute capability 8.9.
- A reply indicated that it is pretty much the same as ampere, so any ampere tutorials would be fine.
Investigating CUDA Slowdown Causes: A user is experiencing CUDA slowdown over the application lifespan, despite constant memory allocation.
- One potential issue is asymmetric multi-gpu memory allocation, exporting/importing handles at 2MiB granularity is really expensive.
NVIDIA Blackwell and CUDA 12.9 architecture features: A member was having trouble running a block-scaled mma instruction on a 5090, but got an error.
- Another member provided a command using compute_120a that worked, and linked to the NVIDIA Blackwell and NVIDIA CUDA 12.9 blog post.

GPU MODE ▷ #cool-links (1 messages):

Open-Source GPU Compiler, Vortex-Optimized Lightweight Toolchain (VOLT), SIMT execution

VOLT: Open-Source GPU Compiler Design: A new paper introduces the Vortex-Optimized Lightweight Toolchain (VOLT), an open-source GPU compiler framework designed to support SIMT execution on Vortex GPUs, available on ArXiv and GitHub.
- The paper highlights VOLT’s design principles, overall structure, and compiler transformations required for optimizing performance on open GPU architectures.
Vortex-Optimized Lightweight Toolchain enables SIMT code generation: VOLT enables SIMT code generation and optimization across multiple levels of abstraction through a hierarchical design that accommodates diverse front-end languages and open GPU hardware.
- The toolchain centralizes fundamental SIMT-related analyses and optimizations in the middle-end, allowing them to be reused across front-ends and easily adapted to emerging open-GPU variants.

GPU MODE ▷ #jobs (3 messages):

Runway Hiring, Video Generation Acceleration

Runway Seeks GPU Engineers for Video Acceleration: Runway is actively hiring GPU engineers to enhance the performance of their video models, as indicated in a job posting.
Runway Invited to Present on Video Gen Acceleration: A member suggested that someone from Runway should give a talk on video generation acceleration.

GPU MODE ▷ #beginner (8 messages🔥):

CUDA, Nvidia authors, Jetson Nano

Are we competing with Nvidia authors?: Members discussed the prospect of competing with Nvidia engineers and authors in optimizing GPU kernels and increasing efficiency.
- It was mentioned that while there are more CUDA users than Nvidia engineers, being directly from Nvidia and knowledgeable about the Blackwell architecture provides an advantage, but the open-source community can produce fantastic work too.
Guidance for CUDA learners on Jetson Nano: A new CUDA programmer asked whether to learn CUDA using a Jetson Nano or a laptop with an NVIDIA GPU.
- Guidance was requested to help decide the best approach for learning CUDA with the available resources.

GPU MODE ▷ #intel (5 messages):

i3 1215u igpu, dpc++/sycl example code for vector addition

i3 1215u igpu is bandwidth limited: A user mentioned using an i3 1215u igpu and was told that they should be bandwidth-limited, regardless of element size.
- The user specified they were using <int> elements and vector sizes of billions, 100 million, and 10 million.
dpc++/sycl Example Runs Faster on CPU: A user reported that the dpc++/sycl example code for vector addition runs much faster on a single-threaded CPU.
- They also mentioned they could change the array size, but didn’t specify whether doing so improved GPU performance.

GPU MODE ▷ #webgpu (4 messages):

WebGPU, Cross-Platform Native Development, Vulkan, Bevy

WebGPU questioned for cross-platform use: A member expressed that WebGPU isn’t ideal for cross-platform native development unless needs are relatively basic, suggesting Vulkan as an alternative.
- The member initially chose wgpu due to its integration with Bevy for game development, which suited their basic needs but is now considering switching to Vulkan to explore advanced graphics programming techniques.
Vulkan recommended for advanced graphics: A member suggests switching to learning Vulkan for those fascinated by graphics programming and various techniques.
- They initially chose wgpu due to Bevy integration, but now want to explore different techniques.

GPU MODE ▷ #self-promotion (41 messages🔥):

nCompass VSCode Extension, Triton LSTM Implementation, Tiny Deep Learning Library in C, Quantization-Aware Training into TorchAO with ExecuTorch, MCPShark: Wireshark for MCP Communications

nCompass Dev-Tool Extends VSCode’s Profiling: nCompass (ncompass.tech), a dev-tool targeting performance optimization engineers, released a VSCode extension unifying profiling and trace analysis.
- The tool allows adding NVTX / TorchRecord markers to code without editing, viewing traces with Perfetto in the IDE, and jumping from trace events to line numbers in code.
Triton’s LSTM roughly Matches nn.LSTM: A member shared a Triton implementation of LSTMs that uses a combination of a persistent and a one-step kernel (sped-up with CUDA graph).
- This implementation’s overall performance roughly matches nn.LSTM, and is purportedly the only performant open-source implementation.
Tiny-Torch: Minimal Deep Learning Library in C: A member built a minimal deep learning library in C called tiny-torch, which includes 24 naive CUDA/CPU ops, an autodiff engine, and a Python API.
- The library also features a tensor abstraction, complex indexing, a computation-graph visualizer, and primitive garbage collection, among other features.
Mini-TPU Gets Quantization-Aware Training Integration: A member integrated Quantization-Aware Training into TorchAO with ExecuTorch XNNPack quantized passes for their mini-TPU project (github.com/WilliamZhang20/ECE298A-TPU).
- The update allows TorchAO to insert Quant/DeQuant Stubs automatically, and enables running int8 inference on MNIST test data with perfect results.
MCPShark Sniffs Out Malicious Comms: A member shared MCPShark, an open-source tool for forensic analysis of MCP communications, offering features like AI-powered security analysis, a raw MCP traffic viewer, and real-time monitoring.
- MCPShark integrates with IDEs like Cursor and Windsurfer, aiming to detect potential tool poisoning and analyze security risks in MCP servers.

GPU MODE ▷ #thunderkittens (6 messages):

HK Paper LLC Hit Rate Profiling, AMD Internal Tooling, rocprof public version, Triton Legacy Status

LLC Hit Rate Computation uses AMD Internal Tools: A member inquired about how the LLC hit rate % in Table 4 of the HK paper was profiled/computed, since rocprof doesn’t expose counters for the infinity cache.
- Another member responded that the counters were obtained using internal AMD tooling, with plans to release them publicly in the future.
Public rocprof Version is Limited: A member digging into rocprof source code suspects that the public version is limited compared to internal tools.
- They hope the better internal tools will be made available to the public someday.
Speculation on Triton’s Legacy Status: A member inquired if it is safe to say Triton is now considered legacy.
- No details were offered on what this means for the project.

GPU MODE ▷ #submissions (93 messages🔥🔥):

NVIDIA leaderboard, nvfp4_gemv performance, Personal best scores

NVIDIA leaderboard hot streaks: Multiple members achieved successful submissions and personal bests on the nvfp4_gemv leaderboard on NVIDIA, with submission IDs ranging from 95084 to 102369.
First place finishes, achieved!: A member achieved first place on NVIDIA with a submission ID 95565 at 20.1 µs and then again with submission 95580 at 19.2 µs.
- Later, another member also secured first place with submission 102298 at 18.4 µs.
Third place is a charm!: A member secured third place on NVIDIA with submission ID 101815 at 20.3 µs and again with 102007 also at 20.3 µs.
Sub-30 club!: Several members achieved personal bests and successful submissions under 30 µs on NVIDIA for nvfp4_gemv including submission IDs 95556, 96620, 97155, 100480, 100855 and 101136.

GPU MODE ▷ #hardware (1 messages):

H100, Bare Metal, Llama-3-70B, PyTorch/CUDA

H100 Bare Metal Resource Offered: A member is offering access to their personal H100 node (2x 80GB PCIe) running Ubuntu 22.04 for ad-hoc jobs, highlighting its under-utilization.
- The setup boasts bare metal configuration, ideal for custom PyTorch/CUDA versions and full 160GB VRAM addressability, especially for Llama-3-70B+ fine-tuning, priced at ~$3.50/hr with a 15-min free test.
Cost effective H100 compute: The member advertises rates of around $3.50/hr and is open to offers for longer runs, providing a cost effective alternative.
- A 15 minute free test is offered to verify the environment and setup on the bare metal configuration.

GPU MODE ▷ #factorio-learning-env (2 messages):

Sphinx documentation

Sphinx Documentation Pull Request Awaits Merge: A member announced that the Sphinx documentation pull request is completed and ideally ready to merge.
- The member requested a quick meeting to merge the request.
Meeting Coordination for Documentation Merge: A member indicated availability after eating to meet and merge the documentation.
- They asked to be notified when the other party was available.

GPU MODE ▷ #amd-competition (17 messages🔥):

AMD runner disconnections, Learning submissions, NVFP4-GEMV support, Vectoradd_v2

AMD Runners Disconnect, Causes Submission Errors: Submissions to AMD GPUs are timing out and resulting in errors because the runners aren’t connected anymore.
- This means there is no way to submit to AMD GPUs at this point.
NVFP4-GEMV problems supported: The primary supported problem is nvfp4-gemv, with a sample submission available on GitHub.
- Problems suffixed with _v2 (pmpp based problems) should be running, but are not heavily supported.
Try vectoradd_v2 first?: A member noted nvfp4-gemv seems complex to get started.
- He will try if vectoradd_v2 works first.

GPU MODE ▷ #cutlass (14 messages🔥):

TMA + SIMT, TMA + warp level tensor core, Cutlass kernels, GEMM, SIMT atom

Seeking C++ TMA + SIMT or TMA + Warp Level Tensor Core Example: A member asked if there were any examples of cute c++ TMA + SIMT or TMA + warp level tensor core and referred to a previous discussion.
- Another member suggested it should be easy to hack the C++ examples that exist already to swap out the tiled MMA for a SIMT atom instead of the WGMMA one.
SIMT + TMA Drops Performance: After implementing a solution between wgmma and sgemm_sm80, a member noted a 10% drop in performance and was told by ChatGPT that this is expected for SIMT + TMA.
- The member inquired about the validity of this claim.
Diving into Cutlass Kernels: A member, familiar with CUDA kernels, wanted to delve deeper into cutlass kernels and analyze some of the simplest GEMM kernels, asking for a simple way to break into this.
- Another member recommended writing SIMT without cute first, starting with each thread having a single accumulator, to understand the problems cute/cutlass is trying to solve.
Understanding SOTA Accelerator Libraries: A member sought recommendations for standard SIMT kernels to implement before diving into the cutlass codebase, aiming to understand how SOTA accelerator libraries perform GEMMs.
- Another member suggested starting with the cute examples and the tutorial out of the media folder.
Predication Context Missing: A member felt that there was a lot of context missing to fill in the gaps with stuff like predication when using cute examples and the tutorial.
- They recommended searching for code that matches naming conventions in the tutorial to find implementations inside cutlass, such as searching for tAcA, tCcC to find the cutlass code that partitions for predication.

GPU MODE ▷ #nvidia-competition (109 messages🔥🔥):

cuTeDSL numeric conversions, Stream Hacking in CUDA, LLM for tensor slicing, CUTLASS with pytorch load_inline, Grand Prize changes

CUTLASS Integration with PyTorch via load_inline Explored: A discussion arose around using CUTLASS with the pytorch load_inline feature, with the caveat that memory bandwidth and hardware features would not accurately represent a B200 run, suggesting it’s best for sanity checks before performance tuning on B200 hardware.
Stream Hacking Techniques Exposed: Members discussed stream hacking, defined as doing work on non-default stream and not synchronizing it, and the difficulty of preventing this in benchmarking without impacting correct solutions.
LLMs Flounder with Tensor Slicing: Participants expressed disappointment with LLMs’ ability to perform even basic tensor slicing, though there is hope that the solutions gathered from competitions like this will help improve this.
CuTeDSL Numeric Conversions Demystified: A member posted a blog post detailing numeric conversions in CuTeDSL, specifically how to implement FP8→FP16 conversion using MLIR extensions and custom PTX code, resulting in a 10% performance improvement on a GEMV task.
- He also posted a link to his linkedIn about the same topic.
Grand Prize Criteria: Weighted Sum Over Kernels Proposed: The competition’s grand prize determination may be changed from fastest kernel overall to a weighted sum over kernels (10%->20%->30%->40%), similar to the AMD competition, with the score calculated as SOL / kernel runtime.

GPU MODE ▷ #robotics-vla (17 messages🔥):

Frequency-space Action Sequence Tokenizer (FAST), RoboTwin Dataset Format, Qwen3-vl fine-tuning, VLA-0 Action Horizon

FAST Tokenizer Leverages DCT for High-Frequency Control: The Frequency-space Action Sequence Tokenizer (FAST) uses a DCT approach instead of residual vector quantization for high-frequency control in action tokenization, which is commonly used in leading Audio-Tokenizers.
- The paper utilizes BPE-based compression to a 1024 vocabulary and interleaved flattening of the coefficients.
RoboTwin Dataset Simplifies Handling: The RoboTwin dataset uses HDF5 files per episode, neatly packing everything together per frame, and is easy to handle.
- A member pushed a first version of RoboTwin fine-tuning for Qwen3-vl and encountered a disk full error after 5k steps, but the loss curve initially looked promising.
Qwen3-vl Fine-Tuning Faces Rookie Mistake: Initial attempts at fine-tuning Qwen3-vl on the RoboTwin dataset resulted in a disk full error after 5,000 steps.
- The loss curve suggested learning, indicating potential in understanding freshly initialized FAST tokens.
VLA-0 Action Horizon set to 8 Timesteps: The action horizon used for training VLA-0 is set to 8 timesteps, as defined in the rv_train/configs.py file.
- This parameter determines the length of the action chunk used during training.

Latent Space ▷ #ai-general-chat (157 messages🔥🔥):

Emergent Misalignment & Reward Hacking, Sierra Hits $100M ARR, OpenAI AI-Native Engineering Team Guide, Locus AI 'Superhuman' Speed Debunked, Canonical OpenAI Deep-Dive

Anthropic’s Ilya sparks Reward Hacking Debates: Ilya Sutskever from Anthropic tweeted about the company’s research on emergent misalignment, leading to discussions on how implicitly rewarded behavior becomes personality-like, while explicitly rewarded behavior does not.
Sierra’s Supersonic $100M ARR Sprint: Bret Taylor revealed that Sierra achieved $100M ARR in just seven quarters after its launch in Feb-2024 due to intense team effort and craftsmanship.
OpenAI’s Ops on Engineering AI Teams: Dominik Kundel shared a new OpenAI guide for building AI-native engineering teams around Codex/GPT-5.1-Codex-Max, including checklists, scaling tactics, and agent integration phases.
Locus AI’s speedup was a Stream Sync SNAFU: Miru debunked IntologyAI’s claim of 12–20× kernel speed-ups by revealing it was a stream-sync timing bug, where the agent offloaded work to non-default CUDA streams, resulting in fake timing wins.
Kyle’s Canonical Codex on OpenAI: Kyle Harrison released a 35,000-word report on OpenAI, covering its history, product philosophy, and market impact, now live at research.contrary.com/company/openai.

Latent Space ▷ #genmedia-creative-ai (14 messages🔥):

Gemini Nano Banana Pro, hot dog as sandwich, Daniel Miessler Claude skill

Karpathy’s Gemini Nano Banana Pro Aces Exam: Andrej Karpathy showcased Gemini Nano Banana Pro solving physics/chem questions in-image with almost-perfect accuracy in a tweet.
- Comments debated LLMs-as-TAs, the fate of traditional education, and the potential for scalable multimodal prompting.
Nano Banana Pro Debates Hot-Dog Sandwich Status: Omar Sanseviero asked the model Nano Banana Pro whether a hot dog counts as a sandwich, sparking jokes and requests for the prompt via this tweet.
Miessler Releases Free Claude Skill for AI Art: Daniel Miessler shared an open-source Claude skill that converts any input into on-brand blog headers, tech illustrations, and comics via the Nanobanana 3.0 model through this post.
- The tool is available free in his public Personal AI repo, and the community praised the visuals, workflow, and potential for creators lacking design skills.

Modular (Mojo 🔥) ▷ #general (70 messages🔥🔥):

Llama2 Performance, Accumulator struct heap allocation, Profiling on Mac, Sum types in Mojo, Graphics programming in Mojo

Llama2 Performance Improved After Heap Allocation Fix: A user tracked down and fixed a performance degradation issue with Llama2 after updating to the latest Mojo compiler by switching the Accumulator struct from heap to stack allocation, resulting in higher performance, detailed in this forum post.
- The user identified that 35% of execution time was spent in Accumulator::__init__ due to heap allocations, and they plan to share a write-up on setting up profiling on Mac.
Sum Types May Arrive Sooner: A user asked about the timeline for features like sum types, static reflection, graphics programming, and custom operator definition in Mojo.
- Another user responded that sum types has a chance of making it into the 1.0 release.
Graphics Programming’s Future in Mojo: A user inquired about the inclusion of graphics programming in Mojo’s language or standard library, with community members suggesting it’s unlikely due to the rapidly changing nature of graphics technology, but also highlighted the existence of cool mojo graphics packages like shimmer.
- Some discussed the possibility of converting Mojo compute buffers into storage buffer pointers for other APIs like Vulkan and OpenGL, but concerns were raised about breaking API guarantees and encapsulation.
WebGPU Integration in Mojo: A member suggested creating typegpu which uses WebGPU as the rendering API and WGSL functions written in mojo/python code.
- The idea is to reimplement typegpu in mojo with webgpu native, with the benefit of not having to unify mojo compute kernels and mojo graphics shaders, and taking advantage of Mojo using MLIR instead of LLVM.

Modular (Mojo 🔥) ▷ #announcements (1 messages):

Community Meeting, Community Projects, Mojo 25.7 Release, Mojo 1.0 Roadmap

Modular Kicks Off Community Confab: Modular announced a community meeting to start in 30 minutes.
- The community was invited to join and see awesome community projects, an overview of our 25.7 release and Mojo 1.0 roadmap.
Community Showcase Coming Soon: The Modular community meeting promised a showcase of “awesome community projects.”
- Attendees anticipated demos and discussions around projects built using Mojo and related technologies.

Modular (Mojo 🔥) ▷ #mojo (87 messages🔥🔥):

Optional Chaining, Mojo vs Numpy, Mojo DSL Creation, Comptime-Signed Int for Indexing, Bool coercion

Optional Chaining Works as Expected: Optional chaining works for stored members, so x?.field?.method() is valid.
- One member noted it promotes method and field access and suggested important control flow transformations that should work for all language constructs, including local variables and functions.
Mojo Complements Python/Numpy Workflow: Mojo is best used when python is already fast enough, and extra stuff needs to be done in python and is bottlenecking.
- One member explains that Modular’s philosophy is to allow developers to port some Python code, and then improve performance all the way down the stack to low-level optimizations.
P4 programs can be embedded into Mojo: One member suggested that a reasonable use case is that they have a legitimate reason to embed a P4 program into Mojo, especially for network-accelerated databases.
- Another member suggested one of their favorite implementations of “sufficient abuse of P4” can be found in this paper on zero-sided RDMA.
Optional Chaining and Unwrapping Already Supported: One member pointed out that using @fieldwise_init struct already allows chaining and unwrapping that raises when empty, using array-like syntax.
- As example code, they showed:

@fieldwise_init
struct SomeStruct:
  var field: Optional[Int]

fn some_fn() raises -> Int:
  var some = Optional(SomeStruct(1))
  return some[].field[]

Bitwise AND has higher precedence than logical AND: Members discussed a scenario where bitwise and (&) and logical and are separate operators, and bitwise has higher precedence.
- The suggestion was made to avoid ambiguity by using brackets to ensure desired order of operations.

Modular (Mojo 🔥) ▷ #max (9 messages🔥):

LLM from Scratch in Max, Max vs PyTorch, Max vs Jax

LLM from Scratch a Hit in Max: Members reported that the LLM from scratch in Max was pretty fun and shared a link to llm.modular.com.
Max Poised to challenge PyTorch?: One member inquired whether Max is an alternative to PyTorch, specifically in terms of inference.
- Other members confirmed this, noting that training bits are still a work in progress (WIP).
Max already beating Jax?: Members stated Max is an alternative to Jax and Tensorflow.
- Early limited tests suggested Max is already beating JAX at training.

HuggingFace ▷ #general (125 messages🔥🔥):

Ternary AI, HF Repo Recovery, Claude vs. Gemini, RVC Voice Models, Fine-tuning Dialects

Enthusiast Explores Ternary-Based AI: A member inquired if anyone has been exploring ternary-based AI and expressed general interest in the topic.
- No one responded with specific implementations but another member shared their LinkedIn and YouTube channel for AI tools.
HF Repo gets Accidentally Deleted: A member sought urgent help to recover an accidentally deleted Hugging Face repository after deleting the model locally as well.
- It was suggested to contact support and wait for their response, as recovery might not be possible without a local copy, and another member highlighted the double-check mechanism when deleting repositories.
Gemini Flounders, Claude Follows Instructions: Members debated the instruction-following capabilities of Claude Sonnet 4.5 versus Gemini 3 Pro, especially for roadmap creation, with one member arguing that Claude is more reliable despite Gemini’s benchmark scores.
- Others shared a personal memory bank being used with Claude to improve its responses, and one member expressed, Gemini is worse in my opinion.
Internal Assistant gets built for Manufacturing: A company is building an internal assistant that answers questions about manufacturing operations using both SQL and RAG over real company documents.
- They are seeking an experienced LLM/RAG engineer for a 1-week sprint to significantly raise QA accuracy, reduce hallucinations, and improve text-to-SQL generation and RAG retrieval.
Christmas Code Drop Sparks Causal Reasoning: A member shared a Causal Reasoning Module for interpretable AI, implementing causal discovery, counterfactual reasoning, and causal effect estimation.
- The framework turns hidden states into a structured causal representation to answer intervention-style questions and includes components to learn latent causal factors, a causal graph, and perform causal reasoning.

HuggingFace ▷ #today-im-learning (4 messages):

RNN blocks in a ring, Ring Attractors, Equilibrium Propagation

Ring RNNs Modeled as Ring Attractors: A member searched and suggested that Ring RNNs are typically modeled as Ring Attractors in literature.
- The member also asked about backpropagation plans, given the ring arrangement of model blocks, especially in a continuous manner.
RNN Circular Geometry for Integration: A paper on Population coding and self-organized ring attractors was shared, focusing on circular geometry math within recurrent neural networks for continuous variable integration.
- The paper specifically examines how recurrent neural networks can use ring attractors for population coding.
Equilibrium Propagation for Lifelong Learning: A member shared a paper on Lifelong Learning in Equilibrium Propagation for enhanced stability, suggesting its relevance to the problem of arranging RNN blocks in a ring.
- The member notes that equilibrium propagation aligns with their thinking on the subject, though its direct relevance is not fully confirmed.

HuggingFace ▷ #i-made-this (12 messages🔥):

PyTorch tensor vizualization, Langchain Course Feedback, Fuzzy Redirect NPM Library, Enhanced Perceptual Transformers Paper, Epistemic World Model Video

Vizy Visualizes Pytorch Tensors Simply: A member introduced Vizy, a tool to simplify PyTorch tensor visualization with vizy.plot(tensor) and saving with vizy.save(tensor).
- It automatically handles 2D, 3D, and 4D tensors, determines the correct format, and displays grids for batches, eliminating the need for manual device transfers and channel order adjustments.
Langchain Course Feedback Sought: A member requested feedback on their Langchain course showcased in a YouTube video.
- No other details were mentioned.
Fuzzy Redirect Library Fixes 404s: A member launched fuzzy-redirect, a new npm library designed to redirect users from 404 pages to the closest valid URL using fuzzy URL matching.
- The library is described as lightweight (46kb, no dependencies) and easy to set up, with its GitHub repository provided for feedback and improvements.
Enhanced Perceptual Transformers Showcased: A new paper on Enhanced Perceptual Transformers was shared, achieving 89.7% accuracy on nuanced language tasks with only 66M parameters.
- The research paper includes full transparency and training logs, with the author inviting discussions and collaborations.
Epistemic World Model Demo Released: A member shared a YouTube video demonstrating their Epistemic World Model, which uses structured Q1 (aleatoric) / Q2 (epistemic) gating for belief management, achieving 81.6% accuracy in a combinatorial environment.
- The model features a stable pyramidal state vector and continuous online learning, with the author seeking feedback and collaboration opportunities.

HuggingFace ▷ #computer-vision (3 messages):

Custom 2D classification model, Geometric-centered patching methods, Spectral patching, SAM2 and SAM3

Geometric Patching Methods Sought for Custom 2D Model: A member is researching a custom non-transformer, non-convolutional 2D classification model without positional embeddings and seeks advice on geometric-centered patching methods beyond linear patching.
- They have tried spectral patching from the “Spectral State Space Model for Rotation-Invariant Visual Representation Learning” (2025) paper, but need further direction.
Consideration of SAM2 and SAM3: A member inquired whether others considered using Google’s SAM2 and SAM3 models.
- This was in the context of custom 2D models for classification.

HuggingFace ▷ #NLP (9 messages🔥):

Open-core project, Asian language AI model

Startup embarks on Asian Language Model Training: A startup is developing an open-core project focused on training a model to enhance its strength and accuracy with Asian languages, including minority languages.
- They are looking for collaborators and were advised to share a Hugging Face or website link instead of a Discord invite.
Pushing AI project through community channels: A project lead asked for assistance in promoting their Asian language AI model project through the server, specifically mentioning the server/channel so interested engineers wouldn’t miss it.
- They are aiming to train the most efficient and accurate Asian language AI model for their streaming platform.

HuggingFace ▷ #smol-course (1 messages):

dheerajkumar04318: Did anyone work with OCR models?

HuggingFace ▷ #agents-course (1 messages):

Agent Course Overview, Getting Started Guide

Agent Course Question: A member inquired about the agent course, what it covers and how to get started.
Starting the Agent Course: Getting started with the agent course.

Nous Research AI ▷ #general (146 messages🔥🔥):

China OS Models vs Deepmind Gemini 3, Google's Lead in Multimodal LLMs, Agentic coding with Gemini, Microsoft and Amazon Buying OAI and Anthropic, Coreweave Bankruptcy Impact

China OS Aims for Parity with Deepmind Gemini 3: Some members are waiting for China OS models to reach parity with Deepmind Gemini 3, speculating on the impact on profitability and market dynamics once Chinese models become competitive.
- The sentiment is that when the Chinese enters the room, then profit goes out the window and that Google is currently ahead.
Google Dominates Multimodal LLMs, Agentic Coding too?: Members agree that Google is currently ahead in multimodal LLMs, but other entities may be able to catch up in agentic tasks like coding, which Google also seems pretty ahead in.
- A member also suggests that Google’s model was viewing screen recordings to see its output.
Coreweave Bankruptcy Could Trigger Nvidia Crash?: There is speculation that OpenAI is burning money and vulnerable, potentially dragging other companies down and seeking a government bailout, but what to watch for is if Coreweave goes bankrupt, because that would take down Nvidia and the bubble pops right into a recession, as explained in Meet Kevin vid.
- One member exclaimed dangit your right they’ll be saved in the end wont they :/ in regards to buyouts of the vulnerable companies.
Nested Learning: GPUs versus CPUs: Discussion of Nested Learning and the tradeoffs of using GPUs versus CPUs for AI tasks, Nested Learning compresses data in real time, instead of backpropagation with layers, you instead have something like 3 loops, one fast loop that is the attention, a medium speed loop which is a memory (A writable memory matrix), and the slow loop which is the weights.
- One member noted: Using a GPU for sequential steps basically wastes cores. Even with custom kernels, the launch overhead is brutal compared to a CPU. and speculated that AMD Strix Halo is gonna change the AI industry because Having a CPU synced with a pretty powerful GPU on the same memory pool pretty much removes the bottleneck.
Gemini 3.0 and the Horrors: One member says Gemini 3.0 outperformed Claude sonnet 4.5 in logic and context and much more while other member has a contrasting opinion and states I am kind of skeptical now gemini 3 is objectively a big downgrade from 2.5 maybe I am wrong and opus will be awesome.
- Adding further chaos, one member says: FWIW I had to escalate significant safety risks with Gemini 3.0 through various channels. and still another chimes in: Wasn’t that ALWAYS the problem with Gemini? Still the only model that tell users to kill themselves. referring to what another members called OpenA.I psychosis suicide phenom that’s in the courts litigated by victim’s parents

Nous Research AI ▷ #research-papers (3 messages):

Model Layers Effectiveness, Data Attribution Papers

Seeking Papers on Layer Contribution to Data Attribution: A member is looking for papers that analyze whether some model layers contribute more effectively than others to data attribution.
- Another member suggested that research on this topic has been done before and asked for relevant papers.
Teknium hints that relevant research exists: Teknium mentioned that <@187418779028815872> has spoken about research on the topic.
- Teknium asked for any papers that could be pointed out.