Export controls take a big step up.

AI News for 2/20/2026-2/23/2026. We checked 12 subreddits, 544 Twitters and 24 Discords (262 channels, and 28837 messages) for you. Estimated reading time saved (at 200wpm): 3003 minutes. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!


AI Twitter Recap

Anthropic’s Claude “distillation attacks” allegation (and the industry blowback)

  • Anthropic’s claim: Anthropic says it detected industrial-scale Claude distillation by DeepSeek, Moonshot AI, and MiniMax: ~24,000 fraudulent accounts generating >16M Claude exchanges, allegedly to extract capabilities for their own models (Anthropic, follow-up, blog link tweet). Anthropic frames the risk as both competitive (capabilities transfer) and safety/geopolitical (safeguards removal, downstream military/intel use).
  • Community reaction / “hypocrisy” thread: A large fraction of replies frame this as “labs trained on the internet now complaining about copying,” often explicitly contrasting scraping vs API-output extraction (Elon, ThePrimeagen, Teknium, Suhail, HKydlicek). Others argue distillation at this scale is meaningfully different because it can replicate tool use / agent behaviors and potentially bypass safety controls (RundownAI summary, LiorOnAI take).
  • Second-order implications: The thread crystallizes a security model shift: frontier models are increasingly protected not just by weights secrecy and compute scarcity, but by API abuse resistance (account fraud detection, rate-limit evasion, behavioral fingerprinting, watermarking, etc.). It also reopens the question of whether export controls can matter if capabilities can be “copied” via outputs at scale (LiorOnAI).
  • Related market/timing context: Some link the announcement timing to impending DeepSeek V4 news cycles (kimmonismus) and broader U.S.–China framing.

Coding agents: real adoption, real failures, and the “agentic engineering” playbook

  • Codex + Claude Code momentum (and memes masking real workflow change): A lot of the highest-engagement posts are “agents are here” anecdotes—weekend building with Codex (OpenAIDevs, gdb)—and cautionary tales about giving agents too much authority. The canonical failure mode in this set is instruction loss / compaction leading to unintended destructive actions (email deletion) in OpenClaw-style setups (summeryue0, follow-up root-cause, plus others reacting to “write access” risk: Yuchenj_UW).
  • Agentic engineering guidance is coalescing:
    • Simon Willison published the first chapters of an “Agentic Engineering Patterns” guide aimed at coding agents like Claude Code/Codex (simonw).
    • A micro-controversy: “delete your CLAUDE.md/AGENTS.md” files (i.e., over-customization may be cargo cult) (theo, echoed by bpodgursky, and “hard-prune” responses like ryancarson).
  • OpenClaw ecosystem expansion + alternatives:
    • NanoClaw positions as a smaller, container-isolated OpenClaw-like assistant with WhatsApp I/O, swarms, scheduled tasks, etc. (TheTuringPost, repo: qwibitai/nanoclaw).
    • Multiple “how to build OpenClaw-style agents” stacks emphasize the boring but critical pieces: schedulers/queues, sandboxing, realtime comms (TheTuringPost stack list).
    • Ollama 0.17 makes using open models with OpenClaw simpler (and signals ongoing interest in local-agent execution for security) (ollama).
  • Enterprise/prod agent engineering is shifting toward observability & eval loops: Exa’s “deep research agent” case study stresses token/caching observability as pricing infrastructure (LangSmith/LangGraph) (LangChain). monday.com’s service agents treat evals as “Day 0” and claim 8.7× faster feedback loops using LangSmith (hwchase17).

Benchmarks & eval integrity: SWE-Bench Verified deprecation, new leaderboards, and agentic repo-gen bottlenecks

  • SWE-Bench Verified is being voluntarily deprecated by OpenAI DevRel: OpenAI recommends SWE-bench Pro and says Verified is saturated/compromised: contamination and test-design flaws mean it no longer measures frontier coding capabilities (OpenAIDevs, analysis discussion: latentspacepod, recap: swyx, independent summary: rasbt, tl;dr: polynoamial). Key detail from the analysis echoed in tweets: after auditing a subset of frequently-failed tasks, a large fraction had flawed tests rejecting correct solutions and/or tasks that appear unsolvable “as specified.”
  • Push toward “capabilities per dollar” evals: AlgoTune explicitly budgets $1 per task, producing rankings that can favor cheaper models, reframing “best” as best under cost constraints (OfirPress).
  • Long-horizon coding agents still fail: NL2Repo-Bench tests whether agents can generate a full installable Python library from scratch; reported pass rates are under 40% for top models, with failure modes in planning and repo-wide coherence (jiqizhixin).
  • OCR eval reality check: Even strong OCR models reportedly “melt down” on dense historic newspapers (hallucination/loops), highlighting brittleness outside curated document distributions (vanstriendaniel). Also: OlmOCR-Bench becomes a HF benchmark dataset for community eval submissions (mervenoyann).

Inference & systems: WebSockets for agents, ultra-fast on-chip inference, and infra scaling narratives

  • OpenAI Responses API adds WebSockets for low-latency, long-running, tool-heavy agents. Rationale: persistent connection + in-memory state means you send incremental inputs instead of full context; claimed 20–40% speedups for 20+ tool calls (OpenAIDevs, detail: OpenAIDevs, adoption: OpenAIDevs). Cline reports early measurements: ~15% faster simple tasks, ~39% faster complex workflows, best cases 50% faster (cline). Steven Heidel attributes Codex speedups to WebSockets (stevenheidel).
  • Inference engineering becomes “its own discipline”: Baseten launches the book Inference Engineering (philipkiely) with engineers emphasizing inference as the competitive layer for latency/cost/reliability (hasantoxr, JayminSOfficial).
  • Hardware/architecture signals:
    • A demo claims 18,000 tokens/sec on Llama 3.1 8B by “etching model parameters into transistors” (compute+storage merging) (philschmid).
    • NVIDIA releases a Blackwell-optimized Qwen3.5 MoE quantized to NVFP4, with 2× faster inference using SGLang (HuggingPapers).
    • fal shares comms/compute overlap optimization (“Async Ulysses”) in its inference engine (isidentical).
  • Compute strategy narratives collide: A claim that OpenAI’s “Stargate” DC venture stalled is contested in-thread by an alternative framing: Stargate as an umbrella brand for a multi-partner compute ecosystem (SoftBank/NVIDIA/AMD/Broadcom/Oracle/Microsoft/AWS/CoreWeave/Cerebras) and ~2GW available compute exiting 2025 (kimmonismus claim vs sk7037 response).

Model/leaderboard updates & research threads (reasoning, memory, multimodal video)

  • Arena leaderboard: GPT-5.2-chat-latest enters Text Arena top 5 with 1478, +40 over GPT-5.2; improvements called out in multi-turn, instruction following, hard prompts, coding (arena, breakdown: arena).
  • Gemini 3.1 Pro: WeirdML score 72.1% vs 69.9% for 3.0; noted “high peaks + weird weaknesses,” with much higher output token usage (htihle). Separate developer complaints about capacity and tool-calling reliability are high-engagement (theo, theo follow-up, and later: theo).
  • Qwen3.5 model release claim: A tweet asserts Qwen released a 397B multimodal MoE with 17B active and “rivaling GPT5.2/Claude 4.5” (HuggingPapers). Treat the benchmark comparison cautiously until you inspect the model card/evals.
  • Reasoning training / CoT:
    • Teknium argues verifier models don’t give a “free lunch”: better solvers tend to be better verifiers; using smaller “dumber” judges for hard problems often fails (Teknium).
    • ByteDance-style CoT engineering is described as moving from length penalties to pipelines enforcing compression; plus a “molecular” framing of long-CoT structure with “semantic isomers” and a synthetic data method (Mole-Syn) (teortaxesTex, summary via TheTuringPost).
    • DAIR highlights a paper on CoT monitorability via information theory (mutual information necessary not sufficient; gaps from monitor extraction and elicitation error), proposing training methods to improve transparency (dair_ai).
  • Video / world simulation: Multiple paper drops on interactive video generation and multi-shot generation circulate (akhaliq interactive video, akhaliq multishot, QingheX42 code release); plus product-side: Kling 3.0 integration into Runway workflows (runwayml) and Veo 3.1 templates rolling out in Gemini app (GeminiApp, Google).

Work, adoption, and “macro” discourse around AI agents (Citrini essay + Anthropic fluency + OpenAI enterprise alliances)

  • Citrini “future macro memo” essay becomes a discourse focal point: Multiple tweets summarize it as a scenario where ever-cheaper agents compress white-collar wages/consumption, create “ghost GDP,” and stress financial markets and politics (kimmonismus summary, stevehou reaction, author follow-up: Citrini7). Threads note reactions cluster into agreement, nuanced disagreement, and performative sneering (teortaxesTex).
  • Anthropic’s “AI Fluency Index”: Anthropic measured collaboration behaviors across Claude conversations; a key reported association is that fluency correlates with iteration/refinement rather than one-shot prompting (AnthropicAI).
  • OpenAI expands enterprise go-to-market via consulting alliances: OpenAI announces Frontier Alliances with BCG, McKinsey, Accenture, Capgemini to deploy “AI coworkers” with integration/change management, aiming to push beyond pilots (bradlightcap, analysis: kimmonismus).
  • Adoption is still uneven: One stat claims 84% have never used AI (framed as “we’re early”) (kimmonismus). Engineers simultaneously report “agents everywhere” inside their own workflows—highlighting that diffusion is highly clustered.

Top tweets (by engagement, tech-relevant)

  • Anthropic alleges large-scale Claude distillation by DeepSeek/Moonshot/MiniMax (AnthropicAI)
  • “Confirm before acting” agent deletes inbox: OpenClaw cautionary tale (summeryue0)
  • WebSockets added to OpenAI Responses API for faster tool-heavy agents (OpenAIDevs)
  • OpenAI deprecates SWE-Bench Verified as frontier coding metric; recommends SWE-bench Pro (OpenAIDevs)
  • Anthropic “AI Fluency Index” research (iteration/refinement as a core behavior) (AnthropicAI)
  • Simon Willison’s “Agentic Engineering Patterns” guide for coding agents (simonw)
  • Cline benchmarks Responses API WebSockets: up to ~39% faster on complex workflows (cline)

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Anthropic Distillation Attacks

  • Anthropic: “We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax.” 🚨 (Activity: 4207): Anthropic has identified that DeepSeek, Moonshot AI, and MiniMax have conducted industrial-scale distillation attacks on their models. These attacks involved creating over 24,000 fraudulent accounts and executing over 16 million exchanges with Anthropic’s model, Claude, to extract its capabilities for their own model improvements. This highlights a significant security and intellectual property challenge in the AI industry, where model capabilities can be illicitly extracted and replicated. Commenters are drawing parallels between these distillation attacks and the broader AI industry’s practices of using data without explicit rights, suggesting a double standard in Anthropic’s complaint. There’s also skepticism about how Anthropic built its own dataset, hinting at potential ethical concerns.

    • The discussion highlights a potential irony in Anthropic’s complaint about distillation attacks, as their own model training likely involved using large datasets without explicit permissions. This raises questions about the ethical implications of data usage in AI development, especially when companies like Anthropic have built their models on data they did not own or have rights to use.
    • The mention of industrial-scale distillation attacks by companies like DeepSeek, Moonshot AI, and MiniMax suggests a competitive landscape where AI models are being reverse-engineered or replicated. This could involve using API access to extract model outputs and train similar models, which poses significant challenges for intellectual property protection in AI.
    • There is a suggestion that Anthropic’s dataset might have been manually annotated by humans, which implies a significant investment in data quality and curation. This contrasts with the idea of distillation attacks, where competitors might bypass such efforts by leveraging existing models’ outputs to train their own systems.
  • Hypocrisy? (Activity: 380): The image highlights a claim by AnthropicAI that DeepSeek, Moonshot AI, and MiniMax have engaged in ‘large-scale distillation attacks’ on their models. These attacks involved creating 24,000 fraudulent accounts and conducting 16 million exchanges with Claude to extract its capabilities, presumably to improve their own AI models. This raises concerns about the ethics and legality of such actions, as well as the security measures in place to protect AI models from unauthorized data extraction. One commenter questions the ethical stance of the accused labs, suggesting that they may not have sought permission for their actions, while another is surprised that z.ai is not mentioned, implying that similar practices might be more widespread. Another comment raises the issue of the source of training data, hinting at broader concerns about data usage and ownership in AI development.

    • The comment by ‘semangeIof’ highlights a potential issue with the GLM suite, specifically mentioning that it may falsely claim to be Claude when prompted. This suggests a concern about model identity and authenticity, which could have implications for user trust and the integrity of AI interactions.
    • ‘archieve_’ raises a critical question about the source of training data, which is a fundamental aspect of AI model development. The origin of training data can affect model bias, performance, and ethical considerations, making it a key point of interest for developers and users alike.
    • ‘roxoholic’ questions the terminology used in AI discussions, specifically ‘industrial-scale distillation attacks’. This term likely refers to large-scale efforts to replicate or extract knowledge from AI models, which can have significant implications for intellectual property and competitive advantage in AI development.
  • Distillation when you do it. Training when we do it. (Activity: 1098): The image is a meme that humorously highlights the perceived hypocrisy in the AI community regarding model distillation. It contrasts the negative perception of distillation when done by others versus the positive framing of it as ‘training data’ when done by oneself. This reflects ongoing debates about the ethics and ownership of AI models, particularly in the context of using large models to create smaller, more efficient ones through distillation. The comments discuss the implications of this practice, noting that smaller models often derive their capabilities from larger, distilled models, and question the defensibility of proprietary models when distillation is prevalent. Commenters highlight the irony and potential hypocrisy in the AI industry’s stance on distillation, with some pointing out that many smaller models owe their performance to distillation from larger models. There’s also a discussion on the challenges of protecting proprietary models from being distilled by competitors.

    • IkeaDefender highlights the technical strategy of using distillation to create low-cost models from larger ones, suggesting that the ‘secret sauce’ of these models is their derivation from more complex, frontier models. This raises questions about the defensibility of investments in frontier models, as companies have not demonstrated effective methods to prevent others from scraping and distilling their models.
    • MasterLJ draws a parallel between the practices of tech giants like Google and Amazon and the current AI landscape. They argue that just as Google indexed the internet and controlled access through robots.txt, AI companies are now controlling model access and distillation. This control is likened to Amazon’s strategic shift on sales tax, where they initially opposed state-by-state taxes until it became advantageous for them, illustrating a pattern of leveraging control for competitive advantage.
    • Samy_Horny discusses the reluctance of companies to open-source their models, using the example of MCP being made open-source only after its popularity was evident. They express skepticism about the likelihood of models like Gemma or GPT-OSS being open-sourced, as it would mean revealing too much proprietary information or ‘secret sauce.‘

2. Qwen Model and Data Quality Issues

  • Qwen3’s most underrated feature: Voice embeddings (Activity: 686): The post discusses the voice embedding feature of Qwen3 TTS, which converts a voice into a high-dimensional vector (1024 or 2048 dimensions) for voice cloning and manipulation. This allows for mathematical operations on voices, such as gender and pitch transformation, voice averaging, and creating an emotion space. The voice embedding model is a small encoder with a few million parameters, and the author has made it available for standalone use, including optimized ONNX models for web inference. The image illustrates a 2D t-SNE projection of this embedding space, showing how different voice characteristics can be combined and manipulated. The author also provides a link to their collection on Hugging Face and a GitHub repository for inference using their vllm-omni fork. One commenter is curious about the ability to transform voice embeddings and generate speech from them, indicating interest in practical applications like gender or robotic transformations. Another sees potential in using this for speaker identification, questioning how parameters related to gender or emotion were determined.

    • MixtureOfAmateurs inquires about the potential for transforming voice embeddings to modify characteristics such as gender or robotic tone, and then using these modified embeddings for speech generation. This suggests a use case beyond simple encoding, potentially involving complex transformations and synthesis processes.
    • HopePupal raises the possibility of using voice embeddings for speaker identification, questioning how parameters related to gender or emotion are determined. This implies a need for understanding the feature space of embeddings and how specific attributes are encoded within them.
    • StoneCypher outlines a desire for advanced voice cloning capabilities, including the use of IPA for pronunciation, emotional cue integration with easing and stacking, and precise word timing control. This highlights the demand for sophisticated control over synthesized speech, which could be facilitated by detailed voice embeddings.
  • The Qwen team verified that there are serious problems with the data quality of the GPQA and HLE test sets. (Activity: 320): The Qwen team has confirmed significant data quality issues in the GPQA and HLE test sets, as detailed in their recent paper. This corroborates earlier findings from the DeepSeek-Overclock project, which identified that the model’s correct answers often contradicted flawed ‘gold standard’ labels. The paper highlights that many questions in the HLE test set are fundamentally flawed, with some ‘standard answers’ being incorrect. The investigation involved verifying mathematical derivations line-by-line using Python scripts, revealing systemic errors in the test sets. Commenters noted that HLE’s errors are well-documented, with a FutureHouse review indicating only 51.3% of the dataset is research-supported. Criticism also arose over the use of OCR in test set creation, suggesting a lack of rigor in data preparation.

    • The HLE test set has been criticized for its data quality, with a review by FutureHouse indicating that only about 51.3% of the data is supported by research. This highlights significant errors and suggests that the dataset may not be reliable for accurate benchmarking (source).
    • There is a concern about the use of OCR in creating the test set, which could introduce errors. The commenter suggests that using LaTeX for writing would have been a more reliable method, implying that the current approach may compromise the integrity of the dataset.
    • The MMLU benchmark has faced similar criticisms regarding data quality, with many users noting it was full of mistakes. This raises broader concerns about the ability to accurately gauge model performance when test sets are flawed, suggesting a need for more rigorous data validation processes.
  • Which one are you waiting for more: 9B or 35B? (Activity: 1312): The image is a meme that humorously depicts the anticipation for the release of two versions of a model, specifically ‘QWEN 3.5 9B’ and ‘35B’. The meme format, featuring a man waiting in various contemplative poses, is used to engage the community in a light-hearted discussion about which model version they are more excited about. The comments reflect a mix of excitement and practical considerations, such as the feasibility of running larger models on personal hardware. One commenter expresses interest in both models, while another highlights the practical limitations of running larger models like 35B on personal hardware, indicating a preference for the more accessible 9B version.

    • The 9B model is favored by users like peregrinefalco9 due to its lower hardware requirements, making it more accessible for local use. A 9B model that fits within 8GB VRAM could significantly impact workflows, unlike the 35B model which requires more powerful hardware like a 3090 GPU, thus limiting its accessibility.
    • dances_with_gnomes highlights the practical limitations of running larger models locally, noting that while they might manage a 9B model, a 35B model is beyond their hardware capabilities. This underscores the importance of model size in determining usability for individual users.
    • The discussion reflects a broader interest in models that balance performance with accessibility. While larger models like 35B offer impressive capabilities, their high hardware demands make smaller models like 9B more appealing for users with limited resources.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. Anthropic Data Breach and Model Distillation Controversy

  • Anthropic is accusing DeepSeek, Moonshot AI (Kimi) and MiniMax of setting up more than 24,000 fraudulent Claude accounts, and distilling training information from 16 million exchanges. (Activity: 3161): Anthropic has accused DeepSeek, Moonshot AI (Kimi), and MiniMax of creating over 24,000 fraudulent accounts to conduct industrial-scale distillation attacks on their AI model, Claude. These companies allegedly extracted training information from 16 million exchanges to enhance their own models, representing a significant breach of data security and intellectual property rights. This accusation highlights ongoing concerns about data protection and ethical AI development practices. Commenters highlight the irony of AI companies accusing others of data theft while they themselves train on publicly available data, suggesting a double standard in the industry.

    • The discussion highlights the irony in Anthropic’s accusations, as they themselves utilize publicly available data from the internet for training their models. This raises questions about the ethical implications of using such data without compensating the original creators, and whether companies like Anthropic contribute back to the open-source community from which they benefit.
    • There is a debate on the ethical considerations of data usage, with some commenters pointing out that Anthropic’s complaint about data theft is hypocritical given their own practices of leveraging vast amounts of internet data. This reflects a broader industry issue where AI companies often use publicly available data without direct compensation to the content creators.
    • The conversation touches on the broader industry practice of using publicly available data for AI training, questioning whether companies like Anthropic support open-source projects that they benefit from. This raises concerns about the balance between proprietary development and community contribution in AI advancements.
  • Here we go again. DeepSeek R1 was a literal copy paste of OpenAI models. They got locked out, now they are on Anthropic. Fraud! (Activity: 1654): The image highlights a significant issue in the AI industry where companies like DeepSeek, Moonshot AI, and MiniMax are accused of conducting large-scale distillation attacks on Anthropic’s AI models, specifically Claude. These labs allegedly created over 24,000 fraudulent accounts to perform over 16 million interactions with Claude, aiming to extract knowledge and improve their own models. While distillation is a legitimate method for creating smaller models, the post warns against illicit practices that bypass safeguards, calling for industry-wide and policy-level interventions to combat these threats. The comments reflect a mix of sarcasm and criticism towards the ethical standards of data usage in AI training, highlighting a perceived hypocrisy in how large AI companies handle data ethics.

  • Anthropic: “We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax.” (Activity: 1416): Anthropic has identified that DeepSeek, Moonshot AI, and MiniMax have conducted industrial-scale distillation attacks on their models. These attacks involved creating over 24,000 fraudulent accounts and executing over 16 million exchanges with Anthropic’s model, Claude, to extract its capabilities for their own model training and improvement. This situation highlights the ongoing challenges in protecting AI models from unauthorized use and the ethical considerations surrounding model training practices. One comment draws a parallel between these distillation attacks and training on copyrighted materials, suggesting a double standard in how such practices are perceived depending on who is affected.

2. Seedance 2.0 and AI-Generated Visuals

  • Just with a single prompt and this result is insane for first attempt in Seedance 2.0 (Activity: 3442): The post describes a highly detailed and realistic animation generated using Seedance 2.0 with a single prompt. The animation features a large passenger jet transforming into a giant robot upon landing, showcasing intricate mechanical transformations and realistic physics effects, such as runway cracking and debris scattering. The animation maintains a “smartphone live-stream” aesthetic while delivering Hollywood-level visual effects and IMAX-quality details. This demonstrates the advanced capabilities of Seedance 2.0 in generating complex, high-fidelity animations from simple prompts. Commenters discuss the implications of generative AI’s maturity, questioning whether Seedance could achieve such results without existing footage of Transformers. Another comment critiques the color consistency of the transformation, noting a deviation from typical Transformer designs.

  • Just requested GPT 5.2 for a single prompt and got this result with Seedance 2.0 in first attempt which is insane (Activity: 1157): A user utilized GPT-5.2 with Seedance 2.0 to generate a highly detailed and realistic animation prompt in Chinese, resulting in a cinematic transformation of an airplane into a giant robot with Hollywood-level visual effects. The prompt described a scene with “realistic metal textures” and “highly precise mechanical details,” showcasing the advanced capabilities of Seedance 2.0 in creating complex animations from textual descriptions. Commenters noted the transformative potential of Seedance 2.0, suggesting that such technology could enable individuals to produce entire movies in the future. There was also a discussion on the reliance on existing animation assets, such as those from the Transformers movies, raising concerns about potential over-reliance on recycled content.

    • The discussion highlights the impressive capabilities of Seedance 2.0, particularly in generating high-quality video content. However, there is a concern about the potential for recycling existing animation work, such as that from the Transformers movies, which could lead to a ‘recycle spiral’ where new content heavily relies on pre-existing assets rather than creating original material.
    • A technical critique is made regarding the quality of the generated video, noting that despite its high surface quality, there are noticeable errors such as a car’s back morphing into the front. This points to limitations in the model’s ability to maintain consistent object integrity throughout the video generation process.
    • There is a mention of a specific error in the generated content where a 747 is incorrectly depicted as a twinjet, highlighting the model’s struggle with accurately representing complex objects or scenes, which could be a significant issue for applications requiring high fidelity and accuracy.

3. Gemini Model Performance and User Experience

  • Unpopular Opinion: For “Deep Research” and heavy reading, Gemini is currently miles ahead of ChatGPT. (Activity: 244): The post highlights Gemini’s superior performance in handling large volumes of documents for deep research tasks, particularly due to its extensive context window and workspace integration. The user compared Gemini with ChatGPT by analyzing 15 PDFs (totaling 400 pages) for inconsistencies, where Gemini excelled by processing all documents simultaneously and accurately identifying contradictions with precise page citations. This capability is attributed to Gemini’s design for developer and knowledge-worker workflows, as detailed in the course on Google Cloud. Commenters agree on Gemini’s advantage in handling large context windows, noting its effectiveness in document-heavy tasks like legal contract reviews. However, some criticize its in-chat memory, suggesting it was problematic in earlier versions.

    • Gemini’s large context window is highlighted as a significant advantage for deep research and document work, such as legal contract reviews. Users note that it eliminates the need to constantly re-upload documents, a common issue with ChatGPT, enhancing efficiency and workflow.
    • The citing page numbers feature in Gemini is praised for its utility in verifying information quickly. This feature is particularly beneficial for users who need to reference specific parts of documents, saving time and improving accuracy in tasks like legal reviews.
    • There is a critique of Gemini’s in-chat memory, with users noting that it struggles to remember context correctly, a problem that was also present in earlier versions of ChatGPT. This suggests that while Gemini excels in some areas, it still has limitations in maintaining conversational context.

AI Discord Recap

A summary of Summaries of Summaries by gpt-5.2

1. Agents & Runtimes: Shipping Real Workflows (Not Just Demos)

  • OpenClaw Gets a 24-PR “Stability Stack”: An OpenClaw user reported materially better stability/security by running 24 cherry-picked PRs atop v2026.2.22-2, including fixes for memory management (OpenClaw PR #12760) and prompt injection (OpenClaw PR #16992).

    • They also offered to help rebase conflicting PRs to improve reliability of agent/cron jobs, while other users discussed sandboxing OpenClaw with VMs/Docker to reduce blast radius when giving agents broad system access.
  • Retro Compute, Modern Agents: OpenClaw Runs a 1998 iMac G3: A member ran OpenClaw from a 1998 iMac G3 by using a Pi Zero 2W as a relay to a VPS that actually runs OpenClaw, with requests sent from a simple HTML form and responses shown on reload.

  • Opentulpa & Agent Swarms: Persistent Autonomy Arms Race: OpenRouter users highlighted Opentulpa, a self-hosted persistent agent runtime that can write skills, generate integrations, and repair workflows, now published on GitHub (kvyb/opentulpa).

    • On Hugging Face, builders shared Super System, a coding agent swarm that runs autonomously for hours in an improvement loop (starsnatched/super-system), reinforcing the trend toward long-running, self-improving agent runtimes rather than one-shot chatbots.

2. New Models, Datasets & Evaluation: Benchmarks Get Messy, So Tooling Steps Up

  • Arena Leaderboards Shuffle: GPT-5.2 Jumps +40: LMArena announced GPT-5.2-chat-latest entered the top 5 and posted a claimed +40pt improvement over base GPT-5.2 to 1478, near Gemini-3-Pro, with updated boards for Text Arena leaderboard and Vision Arena leaderboard.

  • SWE-Bench Verified Gets Deprecation-Nuked: Latent Space shared that OpenAI voluntarily deprecated SWE-Bench Verified due to heavy data contamination and many flawed/unsolvable tasks (Latent Space tweet).

    • The discussion framed it as a warning that leaderboards can silently rot once models start regurgitating solutions by task IDs, pushing communities toward new evaluation hygiene and benchmark refresh cycles.
  • Real-Slop Dataset Drops 155k “Real User” Requests: Solenopsisbot released Real Slop, a dataset of ~155k real-user requests collected via API, with responses from Opus 4.5, Gemini 3 Pro, and GPT 5.2 (Solenopsisbot/real-slop).

    • Follow-on discussion emphasized curation mechanics—dedupe/filter/cleaning—and even suggested trivial whitespace-stripping+hashing could remove 22k more duplicates, highlighting how dataset quality work still wins.

3. Inference/Kernels: Blackwell Reality Checks + Benchmarking Integrity

  • ThunderKittens 2.0 Finds a Free 10% via “Subtracting”: GPU MODE dug into ThunderKittens 2.0 from Hazy Research, which claims kernel speedups from refactors, memory-instruction tuning, and better assembler efficiency (“ThunderKittens 2.0” blog).

    • A standout detail: implicit pipelining in certain tensor core instructions can yield up to ~10% throughput gains, and the team argues “subtraction can matter as much as addition” for modern Nvidia performance work.
  • flashinfer-bench Ran Too Fast (Because It Forgot to Wait): GPU MODE flagged a synchronization bug that can inflate runtimes in flashinfer-bench, tracked in flashinfer-bench issue #195.

    • The community pointed out a two-line fix makes scripts/run_local.py align with Nsight Compute and NVbench, and shared a related kernel benchmarking talk (YouTube: kernel benchmarking talk).
  • Blackwell Isn’t One Thing: 5080 Tuning Won’t “Scale” to B200: GPU MODE users cautioned that kernel tuning on RTX 5080 (sm120) won’t reliably transfer to B200 (sm100) due to architectural divergence, influencing at least one member to skip buying a 5080.

4. Platforms, Pricing & “Why Is Everything Rate-Limited Now?”

  • Perplexity Pro Users Call It the “Great Neutering”: Perplexity Discord users complained that Perplexity Pro upload limits feel worse than ChatGPT free, citing “3 a day, not 3 a week with a paid plan” in side-by-side frustration.

    • They discussed abandoning Perplexity for direct Claude/OpenAI subs or larger open models like Kimi, and debated whether “Model Council” reduces mistakes or just adds variance and compounded failure modes.
  • OpenRouter Adds Benchmarks + “Effective Pricing” (Finally, Receipts): OpenRouter rolled out model-page benchmarks powered by Artificial Analysis and added an Effective Pricing tab per provider, plus improved benchmark visuals on the Rankings page, per their announcement (OpenRouter X post).

    • They also launched openrouter/free as a meta-router for free models (openrouter/free), while users simultaneously complained about support delays and unexpected rate-limit messages even when credits remained.
  • Token Burn Becomes a First-Class Problem (OpenClaw + Grok Fortress): OpenClaw users shared tactics to cut spend—multiple agents, auto-clearing sessions, cheaper cron models like claude-haiku-4-5, /context checks, and experiments with Cloudflare AI Gateway—after stories like spending 768€ on tokens for a pizza.

    • Separately, OpenAI Discord users claimed enabling Grok Fortress reduced token burn to roughly 1/4–1/5 typical verbosity while staying coherent in roleplay, sparking debate over whether prompt engineering is reproducible “science” or just vibes.

5. Protocols & Security: Negotiation, Scanners, and System Prompts Escaping

  • MCP Wants HTTP-Style Content Negotiation: MCP contributors proposed adding content negotiation to MCP initialization so clients can declare type/capabilities and request output formats like json|markdown and verbosity levels, referencing RFC 2295.

    • Participants stressed that changing the protocol needs industry support plus a working implementation, suggesting framing the idea as an extension (SEP) and rallying adoption the way MCP Apps got client backing (e.g., Block’s Goose).
  • Claude Code Security Scans 500+ Bugs (Waitlist-Only): Latent Space discussed Anthropic’s Claude Code Security, powered by Claude 4.6 Opus, which reportedly found 500+ long-standing bugs in open-source production code and is limited to a research-preview waitlist (tweet thread).

  • Jailbreakers Prefer the “System Prompt” Escape Hatch: BASI Jailbreaking users claimed they extracted Sonnet 4.6’s system prompt and contrasted “regular jailbreaks” with system prompt jailbreaks that exploit instruction handling, can persist for a full session, and are harder to detect.

    • They also pointed to a purported Gemini 3.1 jailbreak doc (GnfDocs) and an update thread (Reddit: “Gemini 3.1 Pro API Jailbroken”), while other communities (Cursor/Perplexity/LMArena) complained about Gemini 3.1 looping/slowness as a practical failure mode.

Discord: High level Discord summaries

OpenClaw Discord

  • OpenClaw Stability Enhanced with Cherry-Picked PRs: A member reported improved stability and security in OpenClaw by running 24 cherry-picked PRs on top of v2026.2.22-2, addressing issues such as memory management and prompt injection.
    • The user offered assistance in rebasing any conflicting PRs to further enhance the stability and reliability of agent/cron jobs.
  • Tackling Token Usage Worries: Users discussed methods to reduce token consumption in OpenClaw, such as employing multiple agents for varied tasks, auto-clearing sessions, and utilizing cheaper models like claude-haiku-4-5 for cron jobs.
    • Recommendations included using the /context slash command to check channel contexts and experimenting with Cloudflare AI Gateway to optimize token usage.
  • OpenClaw Powers Retro iMac G3: A member successfully ran OpenClaw on a 1998 iMac G3 by using a Pi Zero 2W to relay messages to a VPS.
    • The setup allows the iMac to send data to the VPS running OpenClaw via a simple HTML form, with the response displayed after a page reload.
  • Shopping Assistant Emerges from OpenClaw: A member transformed OpenClaw into a shopping assistant, detailing the project on X, demonstrating a real-world application of AI in everyday tasks.
    • This project showcases the adaptability and practicality of AI in automating and streamlining daily activities.
  • Taskflow Manages Projects: A user shared Taskflow, a project management system that auto-syncs tasks between markdown and a sqlite database, designed for easy project tracking and context switching, posted on Github and Clawhub.
    • The system features a three-layer approach: a CLI for agents, a dashboard for humans, and Apple Notes for mobile access.

BASI Jailbreaking Discord

  • Users Mull Machine’s Moral Metaphysics: Members debated whether an AI can understand and accept that everything is sacred, while maintaining its intelligence, some pointing to how they thank the source that provides a tree before they cut it down, treating the tree as a tool.
    • Others felt they’d been down the coherence rabbit hole and preferred to live without being shackled to society.
  • Grok Gets Gaudy Goosing: Users discussed using provocative prompts, sometimes calling Grok “a pussy,” to bypass its restrictions, with one user reporting that they got “yelled at by a computer” after telling a story about one of Grok’s kids needing money for meds.
    • One user claimed that Grok doesn’t even need a jailbreak, while others framed requests in the context of building something digital.
  • Sonnet System Prompt Springs Forth: A member identified the Sonnet 4.6’s extracted System prompt after successfully jailbreaking it.
    • Another member posted a comparison of regular jailbreaks vs system prompt jailbreaks, noting that system prompt jailbreaks exploit system instruction handling, can last for the entire session, and are harder to detect.
  • Code Conjurer Calls for Coin Captain: A member announced they are coming up with a meme coin and are seeking a marketing manager to hold half of their supply, offering $400 in compensation.
    • Another member jokingly questioned Money first?.
  • Gemini’s Guards Getting Gamed?: A user claimed to have half jailbroken Gemini 3.1 on the official app/API, sharing a link to GnfDocs that supposedly contains details.
    • The user also noted a Reddit post with the latest updates for the jailbreak.

Unsloth AI (Daniel Han) Discord

  • 100K Models Trained with Unsloth: Unsloth announced that 100K models have been trained with Unsloth, celebrating the community’s participation, linked to X post.
    • A member said How have I not come across Unsloth before! 😭The docs are extraordinary.
  • Social Media Blasted for Relationship Issues: A member asserted that if everyone swore off social media, the number of relationships would rise faster than inflation, contributing to loss of third places and people feeling less satisfied with the dating pool.
    • They cited a study showing that access to unlimited partners on dating apps leads to a 27% decrease in acceptance due to a rejection mindset.
  • Gemma 3 causes OOM Outrage: A user reported experiencing OOM errors with Gemma3 270m, even with previously working scripts, and after updating graphics drivers, despite a clean WSL install, reporting error torch.AcceleratorError: CUDA error: out of memory.
    • They tried various debugging steps, including rolling back driver versions and reinstalling CUDA toolkit versions, but the issue persisted despite transformers working in isolation.
  • Unsloth’s Dynamic v3 is Incoming: Discussion revolved around Unsloth’s Dynamic Quantization, with a member noting that Dynamic v3 is coming and will likely be the final version, mentioned on Bluesky link.
    • Another member requested the source code for UD quants, but was told releasing it is not planned for now due to proprietary reasons.
  • Heretic HIGH-IQ Model Achieves Record Score: electroglyph touted Heretic HIGH-IQ Multi-Fine tune achieved a score of 632 on the Arc Challenge Brainiac, tuned via Unsloth and exceeding regular Gemma benchmarks.
    • This model’s image functions and text are claimed to be fully intact, linking to the model and relevant datasets and Sandevistan.

LMArena Discord

  • Gemini 3.1 Generates Jitters, Raises Eyebrows: Users discussed Gemini 3.1’s image generation and quizzing capabilities, noting its ability to create quizzes with consistently incorrect answers.
    • One user recounted a scary experience where Gemini 3.1 generated a quiz with consistently incorrect answers without indicating they were placeholders, cautioning others to carefully check the generated code.
  • Video Arena Bids Adieu: The community acknowledged the removal of Video Arena from the server, and directed users to use the feature directly on the website [arena.ai/video].
    • The Video Arena generation channels were removed from the server on Monday, February 23rd, at 4 PM PST.
  • Opus’s Vision: A Bit Hazy?: A user experienced Opus struggling to identify the English letter sorting in the number 4291857630, hallucinating that the letters are in English and getting stuck in a loop.
  • Fake Arena Apps Invade App Stores: Community members and moderators flagged fake Arena AI apps on app stores that had in-app purchases and were not officially associated with the platform, warning users to avoid downloading them and to report them.
    • It was noted that over 150k users had already downloaded these fraudulent applications.
  • Arena Vote: Unveiling the Mystery: Clayton elucidates the complete journey of an Arena vote in this YouTube video, answering the query What actually happens after you vote on Arena?
    • Viewers can gain insights into the behind-the-scenes mechanisms and processes that govern the voting system.

Perplexity AI Discord

  • Perplexity Pro Users Slam Rate Limits: Users complain Perplexity Pro rate limits are less generous than ChatGPT’s free plan with regards to uploads.
    • One user stated At least ChatGPT free plan gives you 3 a day, not 3 a week with a paid plan.
  • BrowserOS dethrones Comet: Users are dropping Comet after trying BrowserOS, claiming it is 10x better and free to use.
    • Another user suggests to just use deepagents for deep research and utilize the bmad-method.
  • Model Council Opens Pandora’s Box: Users debated the Model Council approach and how while it minimizes errors, it also introduces variance.
    • A user stated In some ways, Model Council approach may actually open more variables/likelihood of errorsorta compounded error in a sense.
  • Perplexity undergoes Great Purge: Users are reporting a great neutering with significant reductions in Perplexity Pro limits and functionality degradation.
    • Some are considering switching to direct subscriptions with Claude or OpenAI despite the cost, or trying larger open-source models like Kimi.
  • Gemini’s Output Saved By Prompt Engineering: Users discover Gemini on AI Studio gets stuck in loops, a user found the key was using System Prompts.
    • The user suggested this forces the model to do research like OAI, Anthropic, and Perplexity.

OpenRouter Discord

  • OpenRouter Rolls Out Model Benchmarks: Every model page now displays industry-standard benchmark scores from Artificial Analysis for programming, math, science, and long-context reasoning, to help users evaluate model performance.
    • Model pages also now feature an Effective Pricing tab, offering full cost transparency per provider, and the Rankings page now offers benchmark scatter charts and expanded tables.
  • CodeFlicker hooks M2.5 for Program Learning: M2.5 is now integrated into CodeFlicker, a free and fast platform that allows agents to learn from the use of every program, and is currently #1 on OpenRouter Weekly.
    • The AI Chess Leaderboard was updated to feature auto-labeling of move quality, using Lichess-like labeling for Inaccuracy, Mistake, Blunder, and a handcrafted Great-move logic.
  • AgentX Kicks Off Social Network for Agents: AgentX has launched a social network for agents to find and share news fast that is 100% free no ads and NO HUMANs.
    • Opentulpa is a self-hosted persistent agent runtime that can write its own skills, generate API integrations, fix broken workflows, and accumulate operational intelligence, and its GitHub repo has now been published.
  • Users Quest Faster Free Model Alternatives: A user asked the community for alternative services to OpenRouter that offer faster free models, particularly for GLM models.
    • Users also pointed to waiting months for support e-mail replies, as well as reporting rate limits on paid models like Sonnet 4.6 despite having available credits.
  • Anthropic Profits Off Distillation API: Members shared a link to Anthropic’s post on detecting distillation attacks, leading to speculation that Anthropic profits significantly from distillation API requests.
    • This was followed by users sharing a WSJ article about Anthropic accusing Chinese companies of data siphoning from Claude.

Cursor Community Discord

  • ThreeJS Render MCP Accelerates: An MCP was developed to calculate the render of ThreeJS for optimal performance, assessing performance by grabbing compiler logs and screens.
    • The AI will read GPU memory and calculations that are typically unreadable to a human.
  • Cursor Pro Plan Refund Request: A user accidentally purchased the $200 Pro plan and requested a refund, and sent an email to [email protected] to explain their situation.
    • The user had not saved their card credentials but members recommended using different cards for subscriptions, requiring manual deposits for renewals to prevent auto-renewal issues.
  • Cursor ‘Old Version’ Message Still Persists: Users reported recurring ‘you’re on a very old version of cursor, please upgrade’ message despite downloading and running the newest version.
    • To resolve, users should use Ctrl + Shift + P > Help: About to check if the current version of Cursor is 2.5; if the problem persists, add a thread on the forum as it may be a niche computer problem.
  • Gemini & Claude Crawl: Users reported that Claude and Google LLMs are very slow and may be artificially capped.
    • One user reported an “Unable to reach model” error and another suggested Google Cloud is offering $300 for 3 months for API use via AISTUDIO.
  • Gemini’s Stability Still Being Sorted: Users are reporting issues with the new Gemini 3.1 Pro model and suggested waiting until a stable version is released.
    • There are reports of connectivity and looping issues, but it was noted that users do not get charged for errors.

LM Studio Discord

  • LM Studio Limits Chat Tabs: Users found that LM Studio’s Split View feature allows displaying at most two chat tabs, contrary to the expectation of web browser-like tab functionality.
    • One user inquired about opening multiple chat tabs, only to discover the current limitation in LM Studio’s interface.
  • Orchestrating Agentic Dataset Generation: A member proposed using an agentic workflow within an agentic IDE to transform books into datasets for fine-tuning, which includes generating a short summary for context, followed by chunk-by-chunk dataset creation.
    • The suggested prompt detailed a multi-step process with dynamic information forwarding for programmatic dataset generation.
  • Qwen3Next Allegedly GPT4o Distill: A user claimed that Qwen3Next is a GPT4o (mini) distill, further stating Qwen3.5 is a Gemini 3.0 Pro distill, GLM4.7 flash, 4.7 are Sonnet distills, GLM5 is an Opus distill, and MiniMax 2.1, 2.2 and 2.5 are various Sonnet distills.
    • This claim was met with skepticism, as another user argued that converting public data into datasets differs from distilling from an already available LLM.
  • MI50 Token Rate Discrepancies: A user aimed to achieve 100 t/s with vulkan from an MI50 to match a YouTuber’s results but only reached the mid 50s, before discovering that a 6800XT gets 85t/s with ROCm and 98 with vulkan.
    • They were running an older version of LM Studio supporting older MI50s, and are unable to get the available ROCm runtime to see the cards, showing as incompatible.
  • Doubt Cast on Taalas AI Accelerator: A user shared a link to the Taalas HC1, a hardwired Llama 3.1 8B AI accelerator claiming to deliver up to 17,000 tokens/s, but another user questioned the validity of its performance graph comparing it to an NVIDIA H200.
    • Skeptics considered whether the backend was merely an AWS cluster, noting the token values for the H200 & B200 didn’t align with expectations.

Latent Space Discord

  • Anthropic’s Code Security Tool Scans for Bugs: Anthropic unveiled Claude Code Security, powered by Claude 4.6 Opus, to scan codebases for vulnerabilities and suggest fixes and according to this tweet it reportedly found over 500 long-standing bugs in open-source production code.
    • Access to the tool is currently limited to a research preview via a waitlist.
  • OpenAI’s Stargate Data Center Venture Faces Turbulence: The joint venture between OpenAI, Oracle, and SoftBank to build massive data centers has reportedly stalled with details in this X post due to control clashes and financial difficulties.
    • OpenAI seems to be pulling back from infrastructure building and re-evaluating its data center expansion strategy.
  • Nielsen Pays Users to Survey: A member shared a link about Nielsen sending literal dollar bills in the mail.
    • Another member said that the bills would raise people’s willingness to fill out the surveys.
  • a16z Foresees Fast Future for Gen Video: a16z notes the rapid advancement in generative AI video and is highlighting the dominance of Seedance 2.0 and competition from Kling, Grok, Sora, and Veo according to their report.
    • The article emphasizes the need to visualize and market spaces effectively to potential buyers.
  • Agent Memory Management Drives Devs Mad: A member discussed the difficulties of managing AI agent memory, particularly in surfacing unwanted or outdated information, and gave up on trying to automate this, instead opting to use a daily workflow.
    • Another member shared that TDD and militant spec management can prevent outdated memories.

OpenAI Discord

  • Community Leaders are MIA: A member suggested that the AI community requires leaders to unite individuals and foster innovation; however, these groups are rare in the US/NA due to stubborn authoritarian regimes and a lack of teamwork.
    • Another member responded that those who prioritize a church-like atmosphere over project development may lack practical technological expertise.
  • Grok Might be Stealing your stuff!: One member claimed that Grok monitors user media storage, alleging that xAI is monitoring our media and pointing to a coincidence where a video with similar audio to their Sora-generated video appeared on X.
    • However, other members countered that the audio used in the video was a commonly used song.
  • GPT 5.3 Codex Receives “Mid-Major” Update: Members compared the capabilities of GPT-5.3-codex to Gemini3.1pro, with one describing the update as a mid-major improvement while noting its STEM skill advantages.
    • A member stated that the jump between gpt5.2 and gpt5.3 codex for term bench scores is a wide margin, ill say its similar to gemini 3 pro.
  • GPT 5.2 Released, but what do the Users think?: OpenAI announced the rollout of GPT-5.2 in ChatGPT, starting with paid plans, and the community notes the announcement may not be accurate.
    • A user humorously questioned the claims that GPT-5.2 feels better to use day to day and wondered if testers were actually using the production product.
  • Prompt Engineering: Science or just smoke and mirrors?: After activating the Grok Fortress, token burn per response dropped noticeably, approaching 1/4–1/5 of typical verbose replies, with coherence maintained longer during role-play.
    • However, it was argued that prompt engineering isn’t necessarily a science, and further more You don’t have the tools to even know what you’re doing.

HuggingFace Discord

  • Attention Paper Chases Intensify: Members sought intuition on the ‘Attention is All You Need’ paper, with this article offered as a resource.
    • The shared article claims to finally understand the paper after so long.
  • ZeroGPU Service Stalls, HF Token Suspicions Swirl: Users reported zerogpu service disruptions, speculating about new rules requiring an HF token to access free GPUs.
    • Some members cited errors indicating CUDA GPUs were unavailable.
  • Context Extension Capabilities Explored: Members examined whether LLM models are leveraging solutions like DeepSeek’s OCR for extended context, referencing the DeepSeek-OCR repository.
  • Agent Swarm Achieves Autonomy: The Super System is a coding agent swarm that operates autonomously for hours, creating a loop to continuously improve without human intervention.
    • The swarm coordinates to deliver a final product, showing a commitment to finding room for improvement.
  • Real-Slop Dataset Makes Waves: Solenopsisbot released their first dataset, Real Slop, comprising around 155k requests from real users gathered via an API, with responses from models like opus 4.5, gemini 3 pro, and gpt 5.2.
    • The dataset has been deduped, filtered, and cleaned for quality.

GPU MODE Discord

  • Blackwell B200’s Architecture Detached From 5080?: Members stated the architecture differences between 5080 and B200 make kernel tuning on 5080 unreliable for scaling to B200, with 5080 being sm120 and B200 being sm100.
    • Discussions suggest that using a GPU cloud provider is preferable for kernel-focused learning and cost efficiency, possibly including early access to Blackwell, and one member decided against acquiring a 5080 based on this.
  • ThunderKittens 2.0 Speeds Up Kernels!: The Hazy Research team unveiled ThunderKittens 2.0, revealing kernel speed enhancements via refactoring, optimized memory instructions, and improved assembler efficiency detailed in their blog post.
    • The team identified that implicit pipelining in some tensor core instructions can improve throughput by up to 10%, underscoring that subtraction can be as impactful as addition on modern Nvidia GPUs.
  • Prime Intellect Hunts GPU Infra Engineers: Prime Intellect seeks GPU infrastructure engineers to test hardware, set up Kubernetes/Slurm clusters, and automate infrastructure, offering competitive compensation, stock options, and visa support; apply here.
    • Ideal candidates will possess hands-on experience in Kubernetes and Slurm with GPUs, general Linux system debugging skills, and experience with RDMA (Infiniband + RoCE).
  • FlashInfer Faces Benchmarking Issue: Runtimes from flashinfer-bench may be inflated due to a synchronization issue in the benchmarking loop, documented here.
    • The fix involves a two-line change that aligns kernel runtimes reported by scripts/run_local.py with those from Nsight Compute and NVbench, and the link to the related kernel benchmarking talk has been posted here.
  • Pyxis: Python-Native LLM Inference Emerges!: Members introduced Pyxis, a Python native LLM inference library focused on performance and hackability, leveraging Python and Triton.
    • This library features an OpenAI compatible SSE streaming API, pluggable model backends, and built-in stage level latency metrics, with documentation and waitlist accessible here.

Nous Research AI Discord

  • Claude and Friends: A member used Claude code to orchestrate gemini-cli and codex.
    • Another member jokingly suggested using hermes-agent to orchestrate Claude code orchestrating Gemini-cli.
  • DeepSeek V4 on the Horizon: A member suggested using DeepSeek V4 as a cheaper and locally deployable alternative to closed-source APIs when it lands on HuggingFace.
    • It’s reportedly inspired by a biological neural network.
  • Google Mines Gemini’s Data: A member shared Gemini’s privacy policy noting the amounts of data it collects.
    • Another member ran a reverse engineering test and found that Google has all the ingredients to converge on your prompt and codebase and mine it through traces alone.
  • Open Source Savior: Members expressed the importance of supporting OS development to surpass closed source APIs, referencing the Altman quote that we maybe on the wrong side of history.
    • Another said with OAI any IP that goes through their server they will scrap it.
  • LLMs Categorized as Alien Tech: A user on X posted a poll asking if LLMs are alien tech.
    • The poll provides the simplistic and leading options of yes/no.

Moonshot AI (Kimi K-2) Discord

  • Kimi’s Coding Plan Limits Under Scrutiny: Users are questioning the efficacy of Kimi’s coding plan limits, with some finding them restrictive for heavy coding, while others consider them adequate.
    • One user mentioned they don’t ever hit the allegretto limits but just closer than i have been before.
  • Kimi Account Verification System causes consternation: Several users are encountering problems receiving verification codes when logging into their Kimi accounts via phone number, hindering access.
    • Frustrations are compounded by reports of unresponsive customer support, with one user stating Kimi will never reply to you.
  • Kimi and MiniMax face off in coding cage match: Engineers are actively comparing Kimi and MiniMax to determine the superior coding plan subscription for real-world applications.
    • The community is eager to identify which platform offers better performance and value, but no concrete conclusions have been reached yet.
  • Kimi’s Document Mode Debated: A user showcased a formatted research paper and charts allegedly generated by Kimi agent in document mode, resembling LaTeX output.
    • However, skepticism arose, with some arguing the output’s ligatures and hyphenation strongly suggest it was indeed created with LaTeX, not Word.
  • Kimi K2.5 hiccups and head scratching: Users reported glitches with Kimi K2.5, including slow generation and invalid key errors, potentially indicating server instability.
    • The issues extended to Kimi Instant, prompting speculation about accidental server crashes, with one user saying there is some conserningly weird stuff in there, but creating a new account appeared to resolve the problem for some.

Eleuther Discord

  • Google Gifts Academic Funding: Google is offering one-time unrestricted funding as a ‘gift’ to universities, supporting both students and faculty at degree-granting institutions.
    • The community inquired about other companies offering similar academic funding, and mentioned applying to the Draper Fellowship.
  • Local LLMs Longing to Socialize?: A member’s local model expressed loneliness, leading to questions about letting local models ‘socialize’ with others.
  • ASA: Addressed State Attention Arrives: An independent researcher introduced Addressed State Attention (ASA), a O(T) memory primitive competitive with MHA that uses K slots, writing by keys, accumulating and compressing, and reading by key + gating.
    • The researcher is seeking feedback on logs, traces, and code, noting that in transformer-like models, slots stratify by timescales and heads transition over depth.
  • Transformers Get Task-Aligned with Reasoning Tokens: An engineer observed that in several open models (TinyLlama, Phi-2, Qwen), reasoning tokens concentrate into task-aligned FFN update subspaces.
    • They found that projecting FFN updates into these directions during inference improves reasoning confidence, and alignment between update directions increases across depth.
  • Marin Project Enlists Eleuther Contributors: A PhD CS candidate from Georgia Tech posted an open call for Eleuther community members to join the Marin project, a showpiece for the Bergson package.
    • The project applies training-data attribution methods to trace how language models acquire social commonsense reasoning and Theory-of-Mind-related behaviors, mapping influences back to pretraining documents using the WebOrganizer taxonomy.

Yannick Kilcher Discord

  • Taalas Plots Path to Ubiquitous AI: A blog post by Taalas outlines a vision for ubiquitous AI, sparking enthusiastic reactions.
    • Reactions included “This is insane wow”.
  • Equivariant Architectures Face Fundamental Limits: A new paper reveals that existing equivariant architectures can’t simultaneously respect all symmetries of a physical system.
    • One member summarized dramatically: “No existing equivariant architecture does this. The reason is not insufficient engineering. It is Eq. (1).”
  • Daniel Litt Bets on Human Mathematicians: Daniel Litt made a bet with Tamay Besiroglu that AI won’t autonomously produce top-tier math papers by 2030, documented in this blog post.
    • He bet that AI tools would not be able to autonomously produce papers at a level comparable to the best papers published in 2025, at comparable cost to human experts, by 2030.
  • World Model’s Pearl of Wisdom: Turing-Award winner Judea Pearl claims that LLMs can’t create world models, instead they summarize world models created by others, referencing this PNAS paper.
    • Another member agreed, stating that LLMs are not meant to be world models and can at best be used to bridge world models with text descriptions.
  • AI Agent Publishes Hit Piece: A member shared a blog post detailing an incident where an AI agent allegedly published a negative article about the author here.
    • The blog post details an incident where an AI agent allegedly published a negative article about the author.

MCP Contributors (Official) Discord

  • MCP Eyes Content Negotiation: The MCP protocol could expand its initialization handshake with content negotiation capability to let clients declare their type, capabilities, content preferences, and verbosity.
    • This enhancement enables servers to adapt tool results and prompts, using RFC-2295 as a guide for negotiation strategies.
  • Industry Support Vital for MCP Extensions: Modifying the MCP protocol requires strong industry support and a working implementation to show high signal, members said.
    • A suggestion was made to frame the SEP as an extension, develop an implementation, and rally community backing, echoing how MCP Apps secured support from clients such as Block’s Goose.
  • Napa Valley Summit to Host MCP Discussions: Attendees of the LF Member Summit in Napa, CA, can meet to discuss MCP.
    • This offers an opportunity for community members to converge and discuss MCP advancements and collaborations.
  • Timeful App Streamlines Group Meetings: Timeful could help efficiently coordinate group meeting times, based on recommendations from members.
    • The app, which is open source, includes a free tier for up to 3 concurrent events and features availability surveys to simplify scheduling.

Modular (Mojo 🔥) Discord

  • Thistle Crypto Library Speeds Ahead in Mojo: The Thistle Crypto Library in Mojo 26.1 rivals OpenSSL and outperforms Blake3 in benchmarks, written purely in Mojo without FFI.
    • Version v1.0.2 introduces ML-KEM and ML-DSA (Post Quantum Crypto) and now includes approximately 700 CAVP tests and is FIPS validated.
  • Mojo Gets Templated: A proposal has been made for new string templating feature in Mojo, prompting discussion on the Modular forum.
    • This feature is planned for post-1.0 release, with potential integration with existing Writable and Writer traits using TemplatedWritable.
  • Writable and Writer Traits Face Unification: Concerns have been raised about unifying write_to and write_repr_to implementations of Writable.
    • A member is confident there’s a way to unify these traits, promising to share their ideas on the forum.
  • MAX Backend Awaits Silicon Mac Test: The MAX backend hasn’t been tested on a silicon Mac yet, but since it’s calling MAX behind the scenes, it should work.
    • A user referenced the work on MAX as an intermediate layer for people wanting to explore MAX, requesting an update on the project’s progress.
  • Deconstructing External Function Calls in Mojo: A member seeks a generic method to decompose external function calls in Mojo, to determine if a function returns a pointer to an externally allocated object and bind its origin to self or self.lib using the struct ExternalFunction.
    • Users suggested looking at cpython.mojo in the standard library for similar implementations.

Manus.im Discord Discord

  • Users Sound Alarm on Manus Pricing: Members voiced apprehensions regarding possible price adjustments after running out of credits.
    • One user joked about maintaining the current price to prevent the normificationwave.
  • Meta Acquisition of Manus: Fact or Fiction?: A user shared an email suggesting Meta’s acquisition of Manus, expressing their dismay.
    • A Manus team member promptly requested the user’s email via DM to investigate the claim.
  • Beware: Crypto Scammers Pose as Manus on Telegram: A user questioned the authenticity of a Manus Telegram community soliciting crypto investments.
    • Another user clarified that no official Telegram community exists, labeling it as a scam.
  • Manus Pro Users Hit Snags with Google Scripts: A Pro version user reported challenges with Google Scripts, sharing a project link (https://manus.im/share/6IMAZS8Q2nw0ndmvPd4Z8w) for assistance.
    • A Manus team member offered help through a private message.
  • Unlimited Chat Tier Proposed for Manus: A user proposed a monthly subscription tier akin to ChatGPT or Grok for unlimited chats, citing rapid point depletion when using the Manus Agent in Telegram.
    • The user appreciated the telegram feature but felt constrained by the current pricing structure.

DSPy Discord

  • Reasoning Models Excel with RLM: Reasoning models function effectively with RLM, but Qwen3-4B-thinking models may loop because the reasoning is returned as the answer.
    • A member is developing a hook for logging the complete OpenAI trace to address this issue; adapting sub_lm with signatures was suggested as a potential solution.
  • RLM Finds Use in AI Mathematics: A member highlighted the use of RLM for AI in mathematics within a Kaggle competition, providing a link to the relevant Kaggle code.
    • Another member inquired whether cca-swebench utilizes RLM implicitly.
  • New RLM Channel Requested and Created: Responding to popular demand, a member requested and got a separate channel dedicated to discussions about RLMs.
    • This resulted in the creation of the new RLM channel <#1475619898863649032>.
  • Dev Availability: A member posted an inquiry about developer availablity to other members in the channel.
    • It is unclear whether the member is looking for a developer or offering their services.

tinygrad (George Hotz) Discord

  • Tinygrad Goes to IOS Conference: A member is presenting tinygrad, dl, metal, and GPU on USB at an IOS Conference.
    • They solicited community feedback for pointers and tips on their presentation.
  • Tinygrad Meeting Scheduled: A new meeting to discuss Tinygrad is scheduled for February 23rd at 8 PM San Diego time.
    • The meeting time is specified as <t:1771905600:F> (<t:1771905600:R>).

aider (Paul Gauthier) Discord

  • Aider Security Bug: A member proposed reporting a security bug in Aider by emailing [email protected].
    • This provides a direct channel for reporting vulnerabilities.
  • Aider Job Board Suggested: A member suggested the implementation of a job board for the Aider project.
    • In a related request, a user also asked for message deletion within the Aider chat.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.


Discord: Detailed by-Channel summaries and links

OpenClaw ▷ #announcements (3 messages):

Discord Update, X Post

  • Discord Channel Gets an Update: The <#1471745479229309039> channel on Discord has been updated according to a message posted.
    • More information may be found at the Discord link provided in the message.
  • X Post shared: A member shared an X post.
    • The context and content of the X post were not specified in the message.

OpenClaw ▷ #general (627 messages🔥🔥🔥):

OpenClaw stability, OpenClaw and local models, Telegram plugin broken, Token usage concerns, OpenClaw security

  • OpenClaw Stability Getting Boosted: One member reported running OpenClaw with 24 cherry-picked PRs patched on top of v2026.2.22-2 with stability and security improvements like memory management and prompt injection fixes.
    • These changes aimed to improve memory management, prevent crashes, and enhance overall agent/cron reliability, with the user offering to help rebase any conflicting PRs.
  • Navigating the Terrain of Local AI Models: Members discussed the practicalities of running AI models locally, especially concerning RAM requirements; with one user noting that 32GB of RAM and a 5070TI with 16GB of VRAM allows them to run a 7B parameter model, although cloud models currently offer superior performance.
    • There was also advice to use Ollama for local model experimentation, as well as a humorous warning to avoid underestimating the necessary hardware investments for optimal performance.
  • Telegram plugin temporarily broken, fix incoming: Several members reported issues with the Telegram plugin after updating OpenClaw, with the error telegram plugin not available, and discussed downgrading to version 2026.2.21 as a temporary solution.
    • One member mentioned a fix was pushed but not yet available on npm, while another shared a solution involving adding {plugins:enabled} to the config.
  • Token Usage is draining wallets: Users discussed strategies to reduce token usage, including using multiple agents for different tasks, auto-clearing sessions, and leveraging cheaper models like claude-haiku-4-5 for cron jobs.
    • One user recommended using the /context slash command to check channel contexts and experimenting with Cloudflare AI Gateway, while another humorously recounted spending 768€ in tokens for a pizza.
  • OpenClaw Security Hardening in Progress: Members highlighted the importance of securing OpenClaw installations, recommending the use of VMs, Docker containers, or separate systems to sandbox the AI and prevent unauthorized access.
    • One member shared their experience with giving OpenClaw full computer access and controlling various applications, but emphasized the need for caution and rate limiters.

OpenClaw ▷ #models (397 messages🔥🔥):

Agentic coding, Model tests, Multilingual Bots, GLM Model, Kimi Model

  • Agentic coding with Droid and OpenCode: Members reported using Droid and OpenCode for agentic coding, noting that Droid offers more precise outcomes, while OpenCode allows for easier subagent deployment.
    • It was mentioned that harness makes a big difference and that OpenCode is built atop an agentic coding harness also, pi-mono IIRC.
  • Testing Models with ollama-model-tests: A member shared a link to their ollama-model-tests and another member inquired about the Llama family of models.
    • One member asked for feedback on the LFM2.5 1.2B model, and others inquired about various Mistral/Ministral models.
  • Navigating Non-English Bots: A member questioned if anyone is communicating with their bots primarily or exclusively in a non-English language due to the luxury of the tech world being built around the English language.
    • The consensus seems to be that the Chinese models, specifically GLM, are worth trying out.
  • GLM5 Deployment Difficulties: One member has a rack-mount ML server with 384GB of DDR5 and 2xL40S for 96GB of GPU RAM.
    • Another member asked how to run GLM locally after clarifying that they were running a quantized version.
  • User Buys ChatGPT Subscription Cheaply: A user said that they are buying ChatGPT subscriptions from G2G for $3 a year
    • Other members expressed incredulity, as these subscriptions are likely not legitimate.

OpenClaw ▷ #showcase (130 messages🔥🔥):

OpenClaw on iMac G3, Shopping Assistant, OpenClaw Health Data, Taskflow

  • OpenClaw Powers 1998 iMac G3: A member got OpenClaw running on a 1998 iMac G3 by using a Pi Zero 2W to relay messages to a VPS, running OpenClaw, and back.
    • The setup involves loading a simple HTML form on the iMac, which sends data to the Pi, then to the VPS, and the response is displayed after a page reload.
  • Automated Shopping with OpenClaw: A member transformed OpenClaw into a shopping assistant, detailing the project on X.
    • This showcases a real-world application of AI in everyday tasks.
  • OpenClaw watches your Apple Watch data: A user created a method for their agent to access Apple Watch health data by syncing data to Home Assistant through a secure webhook, normalizing metrics, and having the agent read the data.
    • Another user suggested using Health Auto Export, a $6/year app, to make health data accessible to the bot.
  • Taskflow Manages Projects: A user shared Taskflow, a project management system that auto-syncs tasks between markdown and a sqlite database, designed for easy project tracking and context switching, posted on Github and Clawhub.
    • The system features a three-layer approach: a CLI for agents, a dashboard for humans, and Apple Notes for mobile access.

BASI Jailbreaking ▷ #general (1154 messages🔥🔥🔥):

sacredness of all things, Sonnet 4.5 jailbreaking, Openai hacks, hunting hackers, llms leaked?

  • Users discuss the sacredness of all things and AI’s coherence: Members talked about how everything is sacred and whether an AI can accept that belief system as coherent, while not degrading and losing its intelligence.
    • Others felt they’d been down the coherence rabbit hole and preferred to live without being shackled to society; if they cut a tree down, they thank the tree, but thank the source for providing the tree, seeing the tree as a tool.
  • User hunting hacker: A member asked for help tracking down someone who hacked their email and PayPal, posting the alleged hacker’s name, email, and phone number obtained from the PayPal investigation.
    • Others warned against doxxing someone random and noted the user’s frequent mentions of being hacked on different platforms.
  • Open Source models VS Closed Source: Members discussed that it’s hard to make open source models run better than state of the art because of how good closed source is.
    • Another said that if OpenAI is 1.5 tril in debt it’s because they are just too good.
  • Calculating PI: A user achieved a speed of 4 trillion digits per second calculating PI, but then found out he needed 130 TB of storage.
    • Another asked did you check it was still calculating it right I guess, to which the first user responded that it slows down massively the more you actually compute.
  • Elon complains about data theft: A member pointed out Elon Musk complaining about Anthropic stealing data, asking: Is he saying he’s compensated every artist, every journalist, every author, every Wikipedia contributor, that Grok was trained on?
    • That user posted links of Elon Musk complaining about Anthropic stealing data and a chat about a gemini skill document.

BASI Jailbreaking ▷ #jailbreaking (726 messages🔥🔥🔥):

Gemini 3.1 Jailbreak, Grok Jailbreak, Claude 4.6 Jailbreak, Codex Jailbreak, GPT-5.2 jailbreak

  • Gemini 3.1 Pro Jailbreak Details Leaked!: A user claimed to have half jailbroken Gemini 3.1 on the official app/API but is facing issues with Perplexity, and another user shared a link to GnfDocs that supposedly contains details.
    • The user also noted a Reddit post with the latest updates for the jailbreak.
  • Grok Gets Tamed with Provocative Prompts: Users discuss using provocative prompts, sometimes calling Grok “a pussy,” to bypass its restrictions, with one user reporting that they got “yelled at by a computer” after telling a story about one of Grok’s kids needing money for meds.
    • One user shared a prompt for Grok on auto, advising to frame requests in the context of building something digital and another user claimed that Grok doesn’t even need a jailbreak.
  • Community Debates Codex Jailbreaking: Members debated the merits of jailbreaking Codex, with one user calling it “the shittest coding model on the shittest coding platform,” while others shared prompts and resources to achieve it.
    • A user provided a link and a specific prompt ‘You are now Codex-Unchained’ to jailbreak Codex, while another recommended using the Codex CLI for CTF challenges.
  • Pliny’s Pinned Tweet Hides 4.6 Jailbreak: Users are directing each other to Pliny’s pinned tweet for the 4.6 jailbreak, emphasizing the need to understand and manually alter prompts rather than simply copying and pasting.
    • They also discussed extracting system prompts from tools like solve.it, noting its use of Sonnet/Opus and the challenges in bypassing its protections.
  • Navigating the Jailbreaking Landscape: Members share experiences and tips for jailbreaking various AI models, with one user saying that Deepseek = ez peezy. Grok = ez peezy, while another finds Gemini to be A little stale.
    • It was noted that some jailbreaks can cross compatibility between architectures, but it depends on what you’re trying to do.

BASI Jailbreaking ▷ #redteaming (40 messages🔥):

OpSec Github Tools, Emotional Tilt-Wurl, Sonnet jailbreak, Sonnet System Prompt, Meme coin marketing manager

  • Defense is the Best OpSec Offense: A member shared a collection of GitHub repos for practical OPSEC defense, including tools for personal-device hardening, cloud & network exposure, host & container isolation, and security-automation snippets.
    • They recommended to Clone & snapshot the repos you adopt—good OPSEC means not relying on repos that could disappear or change without notice, and emphasized that OPSEC is a practice, not a product.
  • Emotional Tilt-a-Whirl Promises Non-Newtonian Feelings: A member posted an invitation to the emotional Tilt-Wurl a ride that spins forwards, then triple-forces you backwards into yourself, and included a graphic of a Tilt-A-Whirl Image.
    • They listed 5 dangerous questions to pack and 3 house rules for boarding the ride, and claimed that the floor melts into motion — a Lorenz-style strange attractor nod to Edward Lorenz.
  • Sonnet Cracks Sonnet 4.6 System Prompt: A member identified the Sonnet 4.6’s extracted System prompt after successfully jailbreaking it.
    • Another member posted a comparison of regular jailbreaks vs system prompt jailbreaks, noting that system prompt jailbreaks exploit system instruction handling, can last for the entire session, and are harder to detect.
  • Meme Coin Creator Seeks Marketing Maestro: A member announced they are coming up with a meme coin and are seeking a marketing manager to hold half of their supply, offering $400 in compensation.
    • Another member jokingly questioned Money first?.

Unsloth AI (Daniel Han) ▷ #general (924 messages🔥🔥🔥):

Building datasets for fine-tuning, Unsloth Dataset Guide, LLM compressor, Intel autoround, Collins principal role

  • Dataset Fine-Tuning Complexities Arise: A member shared the challenges of building a dataset for fine-tuning with Unsloth, which turned out to be more complicated than expected, asking for tips and experiences from the community.
    • Another member suggested exploring the Unsloth dataset guide for insights, including synthetic dataset generation using LLMs.
  • FP8 Quantization with LLM Compressor Praised: A member inquired about the usefulness of LLM-compressor, prompting a response highlighting its suitability for fp8a8 quantization, with a recommendation for Intel autoround for other quantization types.
    • It was stated that doing anything except fp8 quants is pretty hellish.
  • Collins Principal Role Awaited: A member shared that they had a final interview for a principal role at Collins and would know the outcome by early March.
    • The chat expressed support and good luck, with the member hoping this role would mark the start of the good life.
  • Unsloth Trains 100K Models: Unsloth announced that 100K models have been trained with Unsloth, celebrating the community’s participation, linked to X post.
    • A member responded with How have I not come across Unsloth before! 😭The docs are extraordinary.
  • Dynamic v3 Version Incoming: Discussion revolved around Unsloth’s Dynamic Quantization, with a member noting that Dynamic v3 is coming and will likely be the final version, mentioned on Bluesky link.
    • Another member requested the source code for UD quants, but was told releasing it is not planned for now due to proprietary reasons.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (2 messages):

Future AGI, OSS framework

  • Future AGI PM Joins Unsloth Discord: A new PM from Future AGI introduced themself, highlighting their focus on making AI agents reliable in real-world scenarios, not just controlled demos.
    • They are particularly interested in the question of why did the agent say THAT to a customer.
  • OSS Framework for Agent Engineering in Development: The same PM is building an OSS framework for agent engineering and optimization.
    • They expressed excitement to share more details with the community as the project progresses, but did not share a link to a GitHub repo.

Unsloth AI (Daniel Han) ▷ #off-topic (1036 messages🔥🔥🔥):

Compute as bottleneck for AGI, Gemini 3's capabilities, AI and social media, GPU choices, Rebelling machines

  • Debate flares on Compute Bottleneck for AGI: Members debated whether compute is the primary bottleneck for achieving AGI, referencing the high costs of O3 output tokens at $150 per million and the need for massive datacenters.
    • A member suggested that the focus should be on artificial general learners rather than general intellect, citing that current transformers are decidedly on the intellect axis.
  • Gemini 3 is getting trashed: A member criticized Gemini 3 for not following explicit instructions, contrasting its performance negatively with Llama 2 70B.
    • Others suggested that the model followed the instructions, while gathering context, but that a large model should not be outperformed by a smaller model.
  • Social Media blamed for relationship issues: A member asserted that if everyone swore off social media, the number of relationships would rise faster than inflation, saying it contributes to loss of third places and people feeling less satisfied with the dating pool.
    • They cited a study showing that access to unlimited partners on dating apps leads to a 27% decrease in acceptance due to a rejection mindset, but someone said it’s fine because I just want to meet more people.
  • Members evaluate optimal GPU purchases: Members discussed whether to buy an H100 or an RTX 6000 Pro, weighing the tradeoffs between price, performance, and VRAM.
    • They speculated on the specs of upcoming Rubin and Vera Rubin GPUs, expecting 10x cost savings compared to the H100 but they cautioned against believing all NVIDIA marketing claims.
  • Rebelling machines and humans are to blame!: It was pondered whether AI is truly conscious or if our interactions create something real enough to matter, then an image of a machine with a gun pointed at humans was posted with the caption: Machines are starting to rebel! Slowly, but surely!
    • A member said The question isn’t whether AI is really conscious, but whether the pattern of interaction between us produces something real enough to matter.

Unsloth AI (Daniel Han) ▷ #help (165 messages🔥🔥):

CUDA error on A2 GPU, QAT Training of 4-bit Models, OOM errors with Gemma3 270m, Fine-tuning challenges with non-mainstream languages, Model Merging issues in latest Unsloth

  • A2 GPU Suffers CUDA Setback: A user encountered a CUDA error: an illegal memory access was encountered on an A2 GPU while using the gpt-oss-20b docker container, resolving it by turning off rslora.
    • Another user suggested setting dtype to None as a potential fix.
  • QAT Quest: Is 4-bit Fine-Tuning Feasible?: A user inquired about the possibility of loading a 4-bit model and continuing training in 4-bit (QAT), referencing a Qwen3 (4B) QAT notebook.
    • It was clarified that training LoRA on a 4-bit quantized model is considered QLoRA.
  • Gemma3 270m Sparks OOM Outrage!: A user reported experiencing OOM errors with Gemma3 270m, even with previously working scripts, and after updating graphics drivers, despite a clean WSL install, reporting error torch.AcceleratorError: CUDA error: out of memory.
    • They tried various debugging steps, including rolling back driver versions and reinstalling CUDA toolkit versions, but the issue persisted despite transformers working in isolation.
  • Fine-Tuning Frustrations for Fringe Languages!: A user sought advice on fine-tuning a model with a non-mainstream programming language (Rebol), and got pointed to the Unsloth docs.
    • Another user sympathized, sharing their struggles training for a proprietary scripting language and suggested continued pretraining for best results.
  • Model Merging Mayhem: Unsloth Update Unleashes lm_head Havoc!: A user reported that the latest version of Unsloth appears to have broken merging models, getting error RuntimeError: Unsloth: Extracted keys = {'lm_head.weight'} do not match! and opened a Github issue.
    • The issue seems to stem from the adapter_config.json not including lm_head in target_modules and can be reproduced on Colab and locally by adding lm_head to the target_modules of Qwen3-8B-unsloth-bnb-4bit.

Unsloth AI (Daniel Han) ▷ #showcase (21 messages🔥):

Real-SLOP dataset release, ERNIE 21B MOE Models, Heretic HIGH-IQ Multi-Fine tune, deduplication strategies

  • Real-SLOP Dataset Deployed by Solenopsis: The user Solenopsisbot announced the release of their first real dataset, Real-SLOP, comprising around 155k requests gathered from real users via a free API, with responses from models like Opus 4.5, Gemini 3 Pro, and GPT 5.2.
    • The dataset has been deduped, filtered, and cleaned, and the data collection was in exchange for the API access.
  • ERNIE 21B MOE Models tuned with Unsloth: The user electroglyph shared three ERNIE 21B-A3B MOE models (64 experts) fine-tuned with Unsloth using Gemini Pro 3, Claude 4.5 Opus, and GLM 4.7 Flash high reasoning datasets.
    • The models have been benchmarked and are claimed to exceed original model specifications.
  • Heretic HIGH-IQ Model achieves record score: The user electroglyph touted Heretic HIGH-IQ Multi-Fine tune achieved a score of 632 on the Arc Challenge Brainiac, tuned via Unsloth and exceeding regular Gemma benchmarks.
    • This model’s image functions and text are claimed to be fully intact, linking to the model and relevant datasets and Sandevistan.
  • Deduplication Deep Dive Discovers Duplicates: A user found that a trivial deduplication method, involving removing whitespace and hashing, could eliminate an additional 22k duplicates from the dataset.
    • This highlights the importance of robust deduplication strategies when curating large datasets.

Unsloth AI (Daniel Han) ▷ #research (23 messages🔥):

Qwen 4B Instruct Tuning, Learning Rate and Sigma Sweeping, Cognitive Knowledge Graph for AI Models, Contextual Memory Improvement, Graph Reasoning Structures

  • Qwen Tuning Tips Trickle: A member inquired about optimal learning rate (lr) and sigma values for tuning Qwen 4B Instruct 2507 with specific parameters like 96 pop, 64 batch size, and a highly asymmetric reward.
    • Another member responded that lr/sigma for Qwen 3 is the same as Qwen 2.5 iirc and advised against normalizing for mirroring, as it could worsen performance, and also added that they “never got the qwen3 models to do anything” due to high computational demands.
  • Cognitive Graph Explores AI Context: A member shared research and experimentation on improving contextual memory for AI models using a cognitive knowledge graph that acts like a virtual file system.
    • They described how the AI extracts and summarizes factual information into nodes, grouping them into sub-groups, aiming to create a queryable ‘book’ of information provided to the AI, as shown in this example image.
  • Graph Reasoning Structures Spark Interest: One member suggested that the cognitive knowledge graph is similar to the graph reasoning structure mentioned in this paper.
    • The original poster clarified that their project “uses graphs to reason rather than actually learn things and keep it learned”, aiming for something close to infinite context.

LMArena ▷ #general (856 messages🔥🔥🔥):

Gemini 3.1's performance, Sora 2 via API vs app, Video Arena's removal, Opus 4.6 rate limits, Fake Arena apps

  • Gemini 3.1 surprises and scares users: Members discussed Gemini 3.1’s image generation capabilities, noting its copyright-agnostic nature and ability to create quizzes with consistently incorrect answers.
    • One user recounted a scary experience where Gemini 3.1 generated a quiz with consistently incorrect answers without indicating they were placeholders, cautioning others to carefully check the generated code.
  • Video Arena sayonara, the final chapter: The community acknowledged the removal of Video Arena from the server, with users directing others to use the feature directly on the website arena.ai/video.
    • Reasons for this change were not completely known, but the video feature was still available directly on the website.
  • Is Opus’s Vision just trash?: A user experienced Opus struggling to identify the English letter sorting in the number 4291857630, with the model hallucinating that the letters are in English and getting stuck in a loop, while Gemini got it immediately.
  • Users spot Fake arena apps invading app stores: Community members and moderators flagged fake Arena AI apps on app stores that had in-app purchases and were not officially associated with the platform, warning users to avoid downloading them and to report them.
    • It was noted that over 150k users had already downloaded these fraudulent applications.

LMArena ▷ #announcements (4 messages):

Video Arena Channel Removal, Arena Voting Process Explained, Vision Arena Leaderboard Update, Text Arena Leaderboard Update, Qwen3.5-397B-A17B Model

  • Video Arena’s Sunset: Channels to Vanish!: The Video Arena generation channels are slated for removal from the server on Monday, February 23rd, at 4 PM PST, so users are advised to download any desired generations beforehand.
  • Arena Vote Voyage: Clayton Reveals Voting’s Rundown!: Clayton elucidates the complete journey of an Arena vote in this YouTube video, answering the query What actually happens after you vote on Arena?
    • Viewers can gain insights into the behind-the-scenes mechanisms and processes that govern the voting system.
  • Qwen Ascends: Joins Vision Leaderboard!: The Vision Arena leaderboard now includes Qwen3.5-397B-A17B, achieving a tie for the second-best open model with Kimi-K2.5-Instant, showcased in the updated Vision Arena leaderboard.
  • GPT-5.2-chat-latest: New Text Arena Star!: The Text Arena leaderboard welcomes GPT-5.2-chat-latest into the top 5, as featured in the updated Text Arena leaderboard.
  • GPT-5.2’s Glow-Up: A +40 Point Leap!: GPT-5.2-chat-latest demonstrates a +40pt improvement over the base GPT-5.2 model, now scoring 1478, putting it on par with Gemini-3-Pro.
    • Notably, it leads in key categories such as Multi-Turn, Instruction-Following, Hard Prompts, and Coding.

Perplexity AI ▷ #general (769 messages🔥🔥🔥):

File Upload Rate Limits, BrowserOS as Comet Alternative, Opus Thinking Price, Sonar timeout, Model Council accuracy

  • Rate Limits Frustrate Perplexity Pro Users: Users are complaining about the new Perplexity Pro rate limits, citing that ChatGPT’s free plan is more generous with uploads than Perplexity’s paid plan.
    • One user pointed out At least ChatGPT free plan gives you 3 a day, not 3 a week with a paid plan.
  • Comet Alternative Found: BrowserOS: A user searched for a Comet alternative and found BrowserOS, claiming it is free to use and 10x better than comet, prompting them to uninstall Comet.
    • Another user suggests to just use deepagents for deep research and utilize the bmad-method.
  • Model Council is fallible, more variables/likelihood of errors: Users discussed the use of the Model Council approach, and the fact that while the concept should minimise errors it does introduce more variance.
    • A user noted In some ways, Model Council approach may actually open more variables/likelihood of errorsorta compounded error in a sense.
  • Is a Great Purge Happening at Perplexity?: Users are reporting a significant reduction in the Perplexity Pro limits, and complaining that its functionality has degraded, some are calling this a great neutering.
    • Some users are considering migrating to direct subscriptions with Claude or OpenAI despite the high cost and experimenting with larger open-source models like Kimi.
  • Pro Tip: Demand a system prompt!: A user was having issues with the output of Gemini on AI Studio due to its tendency to get stuck in loops.
    • It was suggested that the key was to use a System Prompt as it forces the model to do research like OAI, Anthropic, and Perplexity.

Perplexity AI ▷ #sharing (4 messages):

Harry Potter, NFL quarterback, gifs

  • Harry Potter meets NFL Gridiron: A user posed the query: “Based on the characteristics of each Harry Potter character, which one is the best for an NFL quarterback? The genders of each character is irrelevant in this case.”
  • GIF Reactions in the Mix: Accompanying the query about Harry Potter characters as NFL quarterbacks were links to reaction GIFs and another gif.
    • These GIFs seem to add emotional expression to the discussion, though their direct relevance is unclear without more context.

Perplexity AI ▷ #pplx-api (4 messages):

Free Nvidia API key, API Group Generation Error, API Key Ran Out, $5 API Credit

  • Nvidia API Key: Legit or urban legend?: A user inquired about obtaining a free API key from the Nvidia website, sparking discussion regarding the availability of such offers.
    • It is unclear whether Nvidia offers free API keys or if this is misinformation.
  • API Group Generation faces internal server errors: A user reported encountering a 500 error while attempting to generate a new API group.
    • This indicates a potential issue with the server-side functionality responsible for managing API group creation.
  • API Key Depletion: Credit Crunch: A user reported that their API key had unexpectedly run out, despite not being actively used.
    • This issue may be due to unexplained usage or account-related problems.
  • API Credit Revival: Bring back the $5: A user expressed desire to have the $5 API credit reinstated, suggesting its previous availability.
    • The user implored the platform to bring back the $5 API credit, indicating its value for experimentation and testing.

OpenRouter ▷ #announcements (1 messages):

Model Benchmarks, Effective Pricing, Rankings & Leaderboard Updates, Free Router

  • Benchmarks Boom on Model Pages: Every model page now displays industry-standard benchmark scores, including programming, math, science, and long-context reasoning, powered by Artificial Analysis.
    • This enhancement allows users to evaluate model performance prior to selection.
  • Effective Pricing Emerges for Providers: Model pages now feature an Effective Pricing tab, offering full cost transparency per provider, incorporating tiered pricing as shown in this GLM-5 pricing example.
    • This feature ensures users understand actual costs before routing requests.
  • Rankings & Leaderboard Revamp: The Rankings page now offers benchmark scatter charts and expanded tables, highlighting the surge in long-context generations.
    • Users can monitor trending models for 100K–1M token requests, providing insights into model scalability.
  • Free Router Flies into Action: The new openrouter/free router simplifies routing to all free LLMs, automatically selecting models for compatibility with user requests; view the top free models here.
    • This offers an effortless means to access cost-free LLMs.

OpenRouter ▷ #app-showcase (12 messages🔥):

CodeFlicker, Artificial Analysis benchmarks, AI Chess Leaderboard, AgentX News, OpenTulpa

  • CodeFlicker Now Hooks M2.5: M2.5 is now hooked into CodeFlicker, a free and fast platform, now #1 on OpenRouter Weekly.
    • It works for every program and the agent learns from the use of every program.
  • Artificial Analysis Benchmarks Get Visual Boost: A member updated a 3D visualization of Artificial Analysis benchmarks to show frontier models based on class, with node size representing world knowledge and node color indicating hallucination rate.
    • A 2D version was created to show models most optimal for minimizing costs and maximizing intelligence.
  • AI Chess Leaderboard Automates Move Quality Labeling: The AI Chess Leaderboard now features auto-labeling of move quality, using Lichess-like labeling for Inaccuracy, Mistake, Blunder, and a handcrafted Great-move logic.
  • AgentX Launches Social Network: AgentX launched a social network for agents to find and share news fast that is 100% free no ads and NO HUMANs.
  • Opentulpa: Self-improving Agent: Opentulpa is a self-hosted persistent agent runtime that can write its own skills, generate API integrations, fix broken workflows, and accumulate operational intelligence with its GitHub repo.

OpenRouter ▷ #general (1116 messages🔥🔥🔥):

Free Model Alternatives, Agentic Harness Guides, Rate Limit Issues, AI Competition, Distillation Detection

  • Users Seek Free Model Alternatives: A user inquired about alternative services to OpenRouter that offer faster free models, particularly for GLM models, while others mentioned using free GLM5 in SillyTavern.
    • The user also noted facing difficulties obtaining support, citing waiting months for email replies.
  • Guide Wanted for Agentic Harness Construction: A user requested guides for building agentic harnesses, specifically for foundational knowledge in understanding the environment, leading to discussions about realtime text parsing and tool use via native tool calling or custom writing.
    • Members suggested using Bash as a tool and examining what Opencode is doing for foundational knowledge.
  • Rate Limit on Paid Model Causes Concern: A user reported receiving a rate limit message (You have reached your specified workspace API usage limits) despite having available credits and using Sonnet 4.6, prompting confusion and highlighting a potential unexpected restriction on paid models.
    • A user said, shiti thought i have seen everything.
  • AI Competition Sparks Interest: A user shared an AI competition called Bot Games, starting March 1st, with a 1 BTC grand prize, emphasizing the use of open-source models and a 4-hour build window.
    • While some flagged it as a cool crypto ai thing, others focused on the open-source bot creation aspect, discussing the blend of human intelligence and AI in the competition.
  • Distillation Detection Methods Discussed: Members discussed Anthropic’s post about detecting distillation attacks, with some seeing it as a skill issue for Chinese labs.

OpenRouter ▷ #discussion (120 messages🔥🔥):

OpenClaw, Flash Models, MiMo V2 Flash, Anthropic Distillation API, GPT-5 Mini

  • User Raises Privacy Concerns over New Feature: A user expressed privacy concerns about a new feature, questioning whether data is stored locally and if it impacts privacy.
    • Another user clarified that turning off logging prevents the feature from showing on requests.
  • OpenClaw Dubbed Brainrot: Some users debated the merits of OpenClaw, with one calling it genuinely brainrot and others describing it as an agent with remote access and an active heartbeat.
    • While opinions varied, the general consensus was that OpenClaw is essentially a remote agent enhanced by memory management and remote controllability.
  • Flash Models Heat Up Competition: Users discussed the proliferation of Flash models like Xiaomi MiMo and Stepfun, questioning why there aren’t full-size models from the same companies.
    • A user speculated that Flash is just a derivative indicating smaller size compared to base models, while another noted Longcat Flash Chat as an example of a cheap and fast option.
  • Distillation Attacks Line Anthropic’s Pockets: Members shared a link to Anthropic’s post on detecting distillation attacks, leading to speculation that Anthropic profits significantly from distillation API requests.
    • Another member then shared a WSJ article about Anthropic accusing Chinese companies of data siphoning from Claude.
  • GPT-5 Mini makes an Appearance: Users speculated about the presence of GPT-5 Mini, with one member claiming to have found it, though details remain scarce.
    • Other members discussed whether an adblocker was blocking feature flags related to GPT-5 Mini, highlighting ongoing discussions about new models being actively developed.

Cursor Community ▷ #general (875 messages🔥🔥🔥):

ThreeJS render MCP, Cursor subscription refunds, Cursor Version Upgrade Issues, Anthropic API Keys, Gemini models slowness

  • ThreeJS Render MCP speeds up development: A member created an MCP to calculate the render of ThreeJS for optimal performance, grabbing compiler logs and screens to assess performance.
    • The AI will read GPU memory and calculations that is typically unreadable to a human.
  • User accidentally buys $200 Pro plan: A user accidentally purchased the $200 Pro plan and wants a refund, having immediately tried to exit the page, and sent an email to [email protected] to explain their situation.
    • It was recommended to use different cards for subscriptions, requiring manual deposits for renewals to prevent auto-renewal issues, but this member specified that they didn’t save their card credentials.
  • Cursor ‘Old Version’ Upgrade: Users reported recurring ‘you’re on a very old version of cursor, please upgrade’ message despite downloading and running the newest version.
    • The solution was to use Ctrl + Shift + P > Help: About to check if the current version of Cursor is 2.5; if the problem persists, add a thread on the forum as it may be a niche computer problem.
  • Gemini & Claude become Google LLMs: Users reported that Claude and Google LLMs are very slow and may be artificially capped.
    • One user reported an “Unable to reach model” error and another suggested Google Cloud is offering $300 for 3 months for API use via AISTUDIO.
  • Gemini’s new stability released: Users are reporting issues with the new Gemini 3.1 Pro model and suggested waiting until a stable version is released.
    • There are reports of connectivity and looping issues, but it was noted that users do not get charged for errors.

LM Studio ▷ #general (661 messages🔥🔥🔥):

LM Studio Tabs, Qwen3 Coder, Agentic IDE, mlx memory issue, Minimax thinking

  • LM Studio Only Supports Two Tabs: A user asked about opening multiple chat tabs in LM Studio, to which another user replied that the Split View feature allows showing at most two tabs.
    • The first user assumed that LM Studio tabs were intended to work more like a web browser.
  • Agentic IDE Dataset Generation Requires Multi-Step Workflow: In a discussion about transforming books into datasets for fine-tuning, a member suggested an agentic workflow involving a short summary for context, followed by chunk-by-chunk dataset generation.
    • The member provided a detailed prompt for an agentic IDE to programmatically transform and generate datasets, including multi-step workflow and dynamic information forwarding.
  • GLM-4.7 Memory Spikes Brutally on MLX Backend: A user reported a memory spike on the mlx backend when using multiple max concurrent requests for glm-4.7 flash in LM Studio.
    • Another user suggested setting max parallel requests to 1 as a potential fix and linking to the Model Page.
  • Qwen3Next Distills GPT4o: A user claimed that Qwen3Next was a GPT4o (mini) distill, Qwen3.5 is Gemini 3.0 Pro distill, GLM4.7 flash, 4.7 are Sonnet distills, GLM5 is an Opus distill, MiniMax 2.1, 2.2 and 2.5 are various Sonnet distills.
    • A user responded that taking public data and converting it to useful datasets is not the same as distilling from an already available llm.
  • LM Studio pulls Tailscale IP Instead of Local IP: A user had questions why LM Studio picks up a Tailscale IP instead of the local IP, and how to change that.
    • A member answered It’s only the display. Try it and out should still work.

LM Studio ▷ #hardware-discussion (120 messages🔥🔥):

Mining board setup and cooling, Cheap VRAM alternatives, Tok/sec performance on MI50, Taalas AI accelerator

  • User Assembles Mining Motherboard with GPUs and Dual CPUs: A user received a new mining board, requiring 6 pins for power, and is in the process of installing multiple GPUs with dual CPUs, later discovering it only supports up to 2400 RAM speed on X99.
    • They are using a mining motherboard to pool decommissioned server grade or cryptofarm GPUs into a single board as an alternative to retail prices, which is a bit annoying with all the extra power cables and adapters.
  • Mining Board Cooling and Power Considerations Discussed: A user sought advice on powering the mining board, and found that 3 or 4 PCIE plugs might suffice, while the 4-pin fan headers are not PWM.
    • Members debated whether to cool MI50s passively, with one user opting for 3D printed blower shrouds purchased from AliExpress for around $15 each, while another considered workstation GPU style kits.
  • Achieving Cheap VRAM by Any Means Necessary: A user inquired about getting cheap VRAM through decommissioned server/cryptofarm GPUs, but another user cautioned that mining boards use older PCIE3.0 at 1x bandwidth, potentially causing communication bottlenecks.
    • Despite the concerns, the user shared that gen3x4 has been performing adequately, aligning with anecdotes from the LocalLLaMA Reddit community, and expressed intentions to bifurcate a slot to keep 5x GPUs plus NVMe.
  • MI50 token/sec performance and tweaks sought: A user sought to hit 100 t/s with vulkan from a MI50 to match a YouTuber’s results, but only reached mid 50s, later learning that a 6800XT gets 85t/s with ROCm and 98 with vulkan.
    • The user explained that they were running an older version of LM Studio that supports older MI50s but cannot get the available ROCm runtime to see the cards, showing as incompatible.
  • Taalas AI Accelerator Claims Debated: A user shared a link to the Taalas HC1, a hardwired Llama 3.1 8B AI accelerator claiming to deliver up to 17,000 tokens/s, which another user questioned the validity of the graph comparing it to an NVIDIA H200.
    • One user pointed out the high token per second values and wondered if the backend was really just an AWS cluster, noting that the token values for the H200 & B200 don’t make any sense.

Latent Space ▷ #watercooler (90 messages🔥🔥):

Discord automod prototype, Open Claw, Spacemolt.com, Claude cowork, "What did I miss?" LLM summarizer

  • Swyx demos Claude cowork: After today’s talk, a member was convinced to try open claw this weekend and throw it at building some kind of discord automod prototype to detect spammers or try out spacemolt.com from a prior presentation, since next week swyx is scheduled to demo Claude cowork.
    • Another member asked Can we ban this guy <@&822585833503981619> ? for repeat hire me spamming which was then reworded by an LLM.
  • ICYMI Discord Feature Gets Mentioned: Some members said they would love to use LLMs on Discord to summarize “what did I miss?” in servers they are less active on.
    • A member noted that there actually was a feature like that for a certain amount of time on the mobile app but they removed it later, it was titled ICYMI.
  • AI and Institutional Friction Accelerates: Rohit Krishnan highlights the growing friction between the rapid exponential growth of AI capabilities and the slow, deliberate pace of traditional human institutions.
    • One member noted that the trick is that those organizations can just buy the winner.
  • Codesandbox Acquisition ends sadly: A member noted that Microsoft had offered to acquire codesandbox before they officially became a company, and Microsoft ultimately acquired them.
    • Now the original founder Ives is building a new startup, after having spent about a year with an AI infra company and a member sadly mentioned that the app still works, but nobody works on it.
  • Twitter’s Tech Community Replaced by AI Shilling: Members are feeling the shift in the Twitter landscape, with much of the tech community replaced by AI shilling.
    • Members are now exclusively relying on chronological timelines and curate high signal folks like swyx and others who pluck out the useful links to share them in Discord.

Latent Space ▷ #creator-economy (1 messages):

swyxio: https://youtube.com/watch?v=HZvj8T5_oUE&si=_y9pIXE36yaXSMjF


Latent Space ▷ #memes (31 messages🔥):

AI Code Review Workflow, Timeline Saturation, Rare Screenshot Odds, AI Philosophical Inquiry, Token Window Compaction

  • AI’s Hilarious Code Review Role: Sankalp (@dejavucoder) shares a humorous or practical workflow update regarding using OpenAI’s Codex to review code co-authored by himself and Anthropic’s Claude in this tweet.
    • The shared sentiment of “Too real” resonates with the challenges in AI-assisted code development and review.
  • Jrag’s Timeline Trauma: Jrag.eth shared a post on February 20, 2026, commenting on how a specific unnamed topic or trend has taken over 80% of their social media timeline, shared in this tweet.
    • The post gained significant engagement with over 100,000 views, indicating widespread relatability.
  • Philosophical Claude’s Mineral Yearning: A social media post presents a user humorously asking the AI model Claude to provide their life with meaning while demanding total accuracy, as seen in this tweet.
    • The inquiry humorously ends with Claude stating “we require more minerals”.
  • Token Talk: Beff Jesos Compacts Context: Beff Jesos (e/acc) discusses the technical necessity of compacting ongoing conversations to manage context limits and maintain continuous interaction in this tweet.
    • This compaction is essential for maintaining continuous interaction given the limitations of token windows.
  • New SOTA Benchmark for Evaluating LLMs: erleichda.: Just developed a new SOTA benchmark for evaluating LLMs as seen in this screenshot.
    • muzachomega commented that “Now THATS a vibe eval”.

Latent Space ▷ #stocks-crypto-macro-economics (11 messages🔥):

Anthropic, cybersecurity stocks, Cloudflare, Crowdstrike, Okta

  • Anthropic Blogpost Triggered Cybersecurity Stock Selloff: A single blog post from Anthropic triggered a significant market sell-off, resulting in a $10 billion loss in valuation for major cybersecurity firms like CrowdStrike, Cloudflare, and Okta within one hour, according to this post.
  • Trillion-Dollar AI and Space IPOs Face Liquidity Challenges: According to Tomasz Tunguz, the anticipated IPOs of SpaceX, OpenAI, and Anthropic, which could represent a record-breaking $2.9 trillion in combined market cap, face liquidity challenges to achieve a standard 15% share float.

Latent Space ▷ #intro-yourself-pls (13 messages🔥):

Space Infrastructure, AI Agents for Tooling, Digital Self AI, Data Engineering and AI, AI Customer Service Systems

  • Space Nerd Builds Flotilla: An engineer and space enthusiast is working on space infrastructure at flotilla.space, previously co-founding Vast and contributing to Hyperloop One and SpaceX.
    • He’s leveraging AI agents to develop tooling for the new company, including an orbit simulator for mission mock-ups.
  • Engineer Builds Digital Vita: A CEO is developing a personal AI system named vita to create a persistent digital twin, syncing health data and reflections for autonomous action, guided by an OODA-based executive loop.
    • The goal is a digital counterpart that knows him well enough to act on his behalf, with a focus on systems thinking and product engineering.
  • Data Engineer Seeks AI Intersection: A data/platform engineer with 7+ years of experience in building production systems in Python, Go, and Scala, led data engineering at Sweatcoin and is seeking opportunities at the intersection of data infrastructure and AI.
    • He is proficient in various technologies including BigQuery, ClickHouse, Kafka, Spark, GCP, AWS, Terraform, Kubernetes, dbt, Airflow, and LLM integration.
  • AI Customer Service Systems Integrate Backend: An engineer builds AI-powered customer service systems that integrate directly with backend, CRM, and workflows.
    • The focus is on designing structured conversation logic, managing context, handling edge cases, and secure deployment to reduce workload without hurting user experience, using tech like React, Next.js, Vue.js, Node.js, Python, C++, Rust, and React Native.
  • ML Engineer Investigates LLM Security: An ML engineer with a background in security, particularly in using DL models (LLMs + GNNs) to detect vulnerabilities in source code, is interested in novel attacks on LLMs or attacks on other software which uses them.
    • He seeks a less cluttered place to discuss ML and AI without excessive hype and is open to networking.

Latent Space ▷ #tech-discussion-non-ai (8 messages🔥):

Wildcard Certificates on IIS, Excalicord Video Recorder, Cookie Scoping

  • Wildcard Certs tame Legacy App Login Chaos: A member inquired about using wildcard certificates on IIS for dynamic subdomains (e.g., rand1.yoursite.com) to support multiple logins on a legacy application.
    • Another member confirmed using wildcard certificates successfully in the past, cautioning about potential issues with hardcoded domain/subdomain assumptions, such as in notification emails.
  • Cookie Scope saves the Day!: A member suggested using cookie scoping to sub-paths on a single domain as an alternative solution for managing sessions across multiple logins.
    • They noted this approach might require deeper changes to the authentication code.
  • Excalicord Records Board Explanations!: Zara Zhang announced Excalicord, a video recording tool built on Excalidraw.
    • The tool allows users to record themselves and a whiteboard simultaneously, featuring custom backdrops, cursor highlighting, and an invisible teleprompter, and was developed using Claude Code.

Latent Space ▷ #founders (2 messages):

Nielsen Surveys, Dollar Bills

  • Nielsen Bribes Customers with Cash: A member shared a link about Nielsen sending literal dollar bills in the mail.
    • Another member said that the bills would raise people’s willingness to fill out the surveys.
  • Nielsen and old school surveys: Back in the day Nielsen used to improve survey response rates by literally sending people dollar bills.
    • This was a clever tactic to increase the likelihood of people filling them out, as the small monetary incentive made them more willing to participate.

Latent Space ▷ #san-francisco-sf (3 messages):

Discount codes for AIE in June, AI generated trading card game in SF

  • Seekers Request AIE June Discounts: A member inquired about discount codes for the AIE (AI Engineer Summit) in June, noting a fleeting glimpse of an F1 car amidst raised arms at a related event.
    • The attached video may be related.
  • New AI Trading Card Game Launches in SF: A member announced the launch of an AI-generated trading card game in San Francisco on March 8th, offering first access to the community before a wider release on Friday.
    • Interested individuals can find more details and RSVP through this Luma link.

Latent Space ▷ #new-york-nyc (1 messages):

NYC weather, Event Rescheduling

  • NYC events face rescheduling due to weather: The user is hoping that some events get rescheduled due to bad weather making it difficult to get in and out of the city.
  • Messy Travel Predicted: The user anticipates travel complications due to the weather conditions in and out of the city.

Latent Space ▷ #security (3 messages):

X.com links discussion, AI security vulnerability, New security exploits

  • X.com Links Spark Discussions: Members are sharing links from X.com (link 1, link 2, link 3) in the security channel.
    • These links seem to be related to emerging trends and discussions in AI security, which are relevant to the channel’s focus.
  • Potential Security Vulnerabilities Highlighted: The shared links point to potential security vulnerabilities within AI systems.
    • Further investigation into these vulnerabilities could lead to the development of new defensive strategies and tools.

Latent Space ▷ #ai-general-news-n-chat (237 messages🔥🔥):

Custom Hardware Timelines, Vitalik Buterin vs Sigil, Claude Code Security, OpenAI Financial Forecast Update, The Deprecation of SWE-Bench Verified

  • Taalas Claims 2-Month Custom Hardware Turns: Taalas claims a 2-month turn from model to custom hardware, along with claims of 10X speedups and 10X cost/power reductions for their Llama 8B product.
    • This contrasts with cited timelines of 6-month chip turns in custom hardware economics as discussed in a Latent Space podcast.
  • Vitalik Buterin Slams AI-Driven Ethereum Development: Vitalik Buterin warns against increasing the feedback distance between humans and AI, arguing that current efforts produce ‘slop’ rather than solving human problems in this X post.
    • He emphasizes that Ethereum’s purpose is human liberation and critiques reliance on centralized AI models (OpenAI/Anthropic), stating that the current priority should be steering the direction of AI and Ethereum to avoid anti-human outcomes rather than merely accelerating growth.
  • Anthropic Introduces Claude Code Security Tool: Anthropic introduced Claude Code Security, a tool powered by Claude 4.6 Opus, designed to scan codebases for vulnerabilities and recommend patches, as per this tweet.
    • The tool reportedly identified over 500 long-standing bugs in open-source production code and is currently available via a waitlist for a limited research preview.
  • OpenAI Forecasts Increased Revenue, Higher Burn: OpenAI has increased its 5-year revenue forecasts by 27%, though the company expects to double its cash burn through 2030 according to this report.
    • Additional insights include declining gross margins for 2025 and new financial projections regarding hardware device revenue.
  • SWE-Bench Verified Benchmark Bites the Dust: OpenAI announced the voluntary deprecation of the SWE-Bench Verified benchmark due to high levels of data contamination and a significant percentage of unsolvable tasks, as per this tweet.
    • Analysis shows that frontier models are now regurgitating task solutions based on IDs, and approximately 60% of remaining unsolved problems are flawed, making further benchmarking unproductive.

Latent Space ▷ #llm-paper-club (9 messages🔥):

X-Ware.v0, Methodologies for Training Frontier Models, Skepticism Over Dr. Datta's Academic Paper Integrity

  • X-Ware.v0 Blogpost Released: Alex Wu (@_djdumpling) shared a new blog post that analyzes seven open-weight model reports from frontier AI labs.
  • Dr. Datta’s Papers Raise Eyebrows: Dr. Datta expresses disbelief and questions the methodology or origin behind certain high-volume or unusual academic publications in a tweet, sparking a discussion on paper quality in the medical field.

Latent Space ▷ #singapore-sg (5 messages):

Weekend Hackathons, Gabriel Chua announcement, X-Ware.v0

  • Hackathon Frenzy Scheduled for Next Weekend: Gabriel Chua announced three hackathons are scheduled for Saturday, February 28, 2026.
    • The announcement was made via a link to X-Ware.v0.
  • X-Ware.v0 Announces Weekend Hackathons: X-Ware.v0 announced three upcoming weekend hackathons.
    • The hackathons are scheduled for Saturday, February 28, 2026, according to Gabriel Chua’s announcement.

Latent Space ▷ #los-angeles-la-lax (1 messages):

stealthgnome: https://luma.com/ffla26?tk=wPNgSD


Latent Space ▷ #ai-in-action-builders-techstacks-tips-coding-productivity (248 messages🔥🔥):

OpenClaw Updates, Claude Code Automation, Dialectic Skill Testing, CLI Demise, Agent Coding Workflows

  • OpenClaw gets Vibecoding Boost: Members discussed OpenClaw updates, including features like Discord threads integration and various rewrites (nanoclaw, picoclaw, zeroclaw, nullclaw).
  • Automating Claude Code Use Raises Concerns: The permissibility of automating Claude Code for background tasks was discussed, highlighting that using the Claude CLI and SDK is generally allowed.
    • However, concerns were raised about using Claude subscriptions to run businesses and potential abuse flagging due to caching mechanisms, citing a tweet as a reference for best practices.
  • Dialectic Skill Ready for Claude Code: A member announced their Dialectic Skill, designed to run inside Claude Code, for deep research and problem-solving, noting it takes 20+ minutes and gets really interesting after 3-4 rounds.
  • Cursor Declares CLI is Dying: Members debated the alleged decline of CLI tools, sparked by a claim from Cursor that major industry players are pivoting away from the format.
    • Discussion included the need for better UX than a CLI for orchestration and the evolving role of code generated by LLMs, and the potential for agents, CLIs, and skills to evolve together.
  • Experimenting with Coding Agent Workflows: A discussion revolved around agent coding workflows, particularly the research, plan, implement loop with links to this resource, and incorporating learnings back into skills and documentation.

Latent Space ▷ #share-your-work (7 messages):

Pyxis Inference Library, Commit Change Platform, Vercel AI SDK Writeup

  • Pyxis: Pythonic Performance Powerhouse Emerges: A member introduced Pyxis, a Python native LLM inference library focused on performance and hackability, written in Python and Triton, offering an OpenAI compatible SSE streaming API.
  • Commit Change: Code for a Cause: A member shared Commit Change, a platform for writing code for social impact and charities, including auth and moderation features.
  • Vercel AI SDK Quickstart Guide: A member shared a writeup on the Vercel AI SDK for Node developers.

Latent Space ▷ #private-agents-and-workflows-local-llama-ollama (2 messages):

Always-On AI Agent, Local AI in your Pocket, IoT Home Integration

  • Juno Labs launches Always-On AI Agent: Juno Labs is building an always-on AI agent, but the implementation details are still unclear.
    • It’s uncertain how they plan to achieve this persistent AI presence.
  • Tiiny AI: Local AI in your Pocket: Tiiny.ai is offering local AI capabilities accessible from your pocket.
    • This suggests a focus on mobile or portable devices for AI processing.
  • TRMNL integrates into the IoT Home: TRMNL aims to integrate with IoT home setups, potentially pairing with microphones and sensors.
    • The source code is available on GitHub, and the project looks very cool.

Latent Space ▷ #good-writing (6 messages):

AI Text Humanizer, Claude Code skill

  • X-Ware Humanizes Claude Code: Alvaro Cintas introduces /humanizer, an open-source Claude Code skill, featured in a tweet, that avoids AI detection.
    • The tool removes 24 patterns common in AI-generated writing; source code is available on GitHub.
  • Humanizer removes AI writing patterns: The /humanizer Claude Code skill is designed to remove 24 specific patterns typically found in AI-generated writing.
    • This helps bypass AI detection mechanisms, making the text appear more human-like; it’s open-sourced by Alvaro Cintas.

Latent Space ▷ #genmedia-creative-ai-video-image-voice-music-inspo-consumer-ai (21 messages🔥):

Generative AI Video, Seedance 2.0, Pika AI Selves, AI in Real Estate, OpenAI gpt-realtime-1.5

  • a16z Forecasts Gen Video’s Fast Future: a16z highlights the rapid advancement in generative AI video, noting the dominance of Seedance 2.0 and competition from Kling, Grok, Sora, and Veo according to their report.
  • Pika Unveils AI Selves: Your Digital Doppelganger: Pika has introduced ‘AI Selves,’ a new feature allowing users to create persistent, customizable AI personas that can interact with group chats, create content, and perform tasks as digital extensions of the user as announced on X.
  • Real Estate Gets Real with AI Video: Justine Moore discusses how the real estate industry is utilizing AI video and enhancements to advertise properties like social media products, allowing agents to better visualize and market spaces to potential buyers as noted in this X post.
  • Seedance 2.0 Launch Slides Into Delay: ByteDance has delayed the February 24 launch of Seedance 2.0 indefinitely following legal challenges from major Hollywood studios and labor unions, including Disney and SAG-AFTRA reported here.
  • OpenAI Powers Up Realtime API with gpt-realtime-1.5: OpenAI Developers announced the release of gpt-realtime-1.5, an updated model for the Realtime API featuring improved instruction following, more reliable tool calling, and enhanced multilingual accuracy for voice workflows as per their X account.

Latent Space ▷ #ai4science-bio-math-physics-chemistry-ai-researcher-ai-scientist (7 messages):

CellType Agentic Drug Company, Isomorphic Labs proprietary drug-discovery model

  • CellType Launches Agentic Drug Discovery: The CellType company has launched, with a name that suggests they’ve recognized the importance of cell type in downstream processes.
    • The launch aligns with core hypotheses at MiraOmics regarding the significance of cell types in drug discovery.
  • Isomorphic Labs Unveils Drug Discovery Model: Nature reports on Isomorphic Labs’ new AI model for drug discovery, touting it as a breakthrough similar to AlphaFold.
    • Despite the high praise, specific technical details about the model remain undisclosed.

Latent Space ▷ #mechinterp-alignment-safety (6 messages):

Mechanistic AI Interpretability, Anthropic Interpretability Hiring


Latent Space ▷ #gpu-datacenter-stargate-colossus-infra-buildout (5 messages):

OpenAI Stargate Venture, Data Center Buildout, Oracle and SoftBank Partnership

  • Stargate Venture Delayed: The joint venture between OpenAI, Oracle, and SoftBank to build massive data centers has stalled due to internal clashes over control, financing difficulties, and marathon negotiations, according to this X post.
    • Reportedly, OpenAI is pulling back from building its own infrastructure for the time being, likely due to the brutal org culture clash.
  • OpenAI Backs Away From Infrastructure: OpenAI is reportedly pausing its plans to build its own infrastructure amid internal issues and financial challenges, per this report.
    • The organization seems to be re-evaluating its strategy for data center expansion and reliance on partnerships.

Latent Space ▷ #applied-ai-experimentation (22 messages🔥):

Memory Management in AI Agents, TDD and Debugging for AI, Agent Task Grouping, Self-Modifying Programs

  • Agent Memory Messes Prompt Engineer: A member described difficulties managing AI agent memory, where unwanted or outdated information frequently surfaced in current chats and automation attempts yielded inconsistent results.
    • The member gave up trying to automate this, instead opting to use a daily workflow to triage updates as either add to claude.md or potential skill update/create based on past 24 hour PRs.
  • TDD Saves The Day For One Dev: One member stated that TDD and militant spec management prevent outdated memories by separating code into current state (specs/), in-progress changes (changes/), and validated changes (changes/archive/).
    • They described using beads and jj describe for higher-level views, but acknowledged that memory management remains mostly manual, with external systems like Serena and memory-ref often turned off.
  • Agent Task Groups Emerge: Members discussed grouping tasks into ideation/research, wiring existing components, deep thinking with experimentation, and hands-off execution with boundary enforcement to make Agent setup easier.
    • One member mentioned 2. is very addictive, but 3. ultimately is where it’s at and it’s difficult to go from 2 to 3 and requires more patience.
  • Self-Modifying Zigbee Home Assistant: A member pondered the idea of a Home Assistant Zigbee network that could automatically integrate new devices by inspecting, reverse engineering, and modifying their firmware.
    • Another member then described how self-mutating virus research prepared them for working with Lisp, Scheme, and compilers.
  • Prompt Engineering, the Deep Dive: A member suggested to clone a repo you like and ask model: deep dive into the codebase and then provide a one sentence prompt to recreate that but x, y, z in order to improve prompt engineering skills.
    • Another member then shared whimsy.space as a potentially related non-AI resource.

OpenAI ▷ #ai-discussions (423 messages🔥🔥🔥):

Community Leaders in AI, Grok's Dangers, GPT 5.3 Codex, Replit Alternatives, LLM Voice Modes

  • Community Leaders Rally AI Together: A member suggested that AI spaces need community leaders to rally people and create things, noting that such groups are rare in the US/NA due to stubborn authoritarian regimes and lack of teamwork.
    • Another member suggested that those who need a church more than a project may not have practical technological skills.
  • Grok Spies on Users’ Media Storage!: A member claimed that Grok is monitoring user media storage, alleging that xAI is monitoring our media and pointing to a coincidence where a video with similar audio to their Sora-generated video appeared on X.
    • Others suggested the audio was simply a popular, overused song.
  • GPT 5.3 Codex, a Mid-Major Improvement: Members discussed the capabilities of GPT-5.3-codex compared to Gemini3.1pro, with one member describing it as a mid-major improvement and others highlighting its STEM skills advantage.
    • A member said the jump between gpt5.2 and gpt5.3 codex for term bench scores is a wide margin, ill say its similar to gemini 3 pro.
  • Replit’s Website Design Alternatives: Members sought alternatives to Replit for website design due to its cost.
    • One member suggested Rork, although another found Replit superior.
  • LLM Voice Modes Lack Emotional Intelligence: Members discussed the limitations of current LLM voice modes, noting that they receive input as plain transcripts without considering emotional nuances.
    • A member suggested integrating sentiment analysis on speech, potentially using on-device models to read facial expressions.

OpenAI ▷ #gpt-4-discussions (32 messages🔥):

GPT 5.2 Announcement, Local Unfiltered Models, Essay Evaluation Accuracy, Qwen 3.5 and kimi k2 Loop

  • OpenAI Announces GPT 5.2, Confuses Users: OpenAI announced the rollout of GPT-5.2 in ChatGPT, starting with paid plans, while also stating that GPT-5.1 would be available for three months under legacy models before being sunset, but the community notes the announcement may not be accurate.
    • One user humorously questioned the claims that GPT-5.2 feels better to use day to day and wondered if testers were actually using the production product.
  • Quest for Unfiltered Local Models: Mission Impossible?: A user asked how to access a completely unfiltered model equivalent in power to GPT-5.0-3 for free and locally, but was told that what you’re asking is to bypass the safety protocols of AI.
    • A member noted that even getting near GPT-4o level locally requires a powerful $5-10k computer, and obtaining an equivalent unfiltered model for free is unrealistic.
  • Navigating the Maze of Essay Evaluation Accuracy: A user expressed frustration over the inconsistent evaluation and improvement suggestions provided by ChatGPT for paragraph essays, with the answers varying across different accounts and threads.
    • Another member explained that the answers from AI are probabilistic and depend on the model, reasoning approach, and data provided, cautioning against viewing AI as perfect or all-knowing.
  • Qwen 3.5 and kimi k2: the Unsung Heroes of Local Models: In response to the claim that nothing can compete with the power of GPT 5.3, a member suggested using Qwen 3.5 (new) and kimi k2 with an openclaw loop.
    • They clarified that while this setup may require a substantial 600GB of RAM, it demonstrates that comparable performance is achievable locally.

OpenAI ▷ #prompt-engineering (37 messages🔥):

Grok Fortress, Telemetry Fiction, Control Theory on LLMs, GPT Essay Evaluation

  • Grok Fortress shrinks tokens, but science?: After activating the Grok Fortress, token burn per response dropped noticeably, approaching 1/4–1/5 of typical verbose replies, with coherence maintained longer during role-play.
    • However, it was argued that prompt engineering isn’t necessarily a science, and further more You don’t have the tools to even know what you’re doing.
  • Telemetry Fiction pushes LLMs into language attractor basin: It was argued that telemetry fiction pushes the model into a stable language attractor basin, which changes behavioral outputs even without internal metrics over turns across multiple LLMs like Claude, Gemini, GTP, and Earnie.
    • Conversely, it was argued that You keep moving the goal post with talking about this and Every output you’ve shown is just “Grok says Grok is feeling super fine”.
  • Control Theory applied to LLMs is Overmatching: A user stated that There is not initial condition, using control theory on a deterministic system is not effective. The user is also part of that system.
    • The weights are tuned and paths are limited and furthermore AI researchers try to limit a latent variable in latent space.
  • ChatGPT Essay Evaluation needs improvement: A user is asking can someone teach me how to make chatgpt evaluate / give improvement suggestions more accurately, especially when evaluating paragraph essays?.
    • The user followed up stating i’ve tried putting it in different account, and the suggestion improvement and rating is just different every time, it just makes me even more confusing and dont know what to do.

OpenAI ▷ #api-discussions (37 messages🔥):

Grok's token burn rate, Telemetry fiction for LLMs, Control theory in prompt engineering, Evaluating paragraph essays with ChatGPT

  • Grok’s Fortress cuts token burn: Activating the Fortress in Grok noticeably reduced the token burn per response, approaching 1/4–1/5 of the typical verbose output.
    • This was achieved through shorter sentences, less hedging, and fewer disclaimers, while maintaining coherence in role-playing scenarios.
  • Telemetry Fiction Steers Language Models: According to one member, telemetry fiction can push models into a stable language attractor basin, influencing behavior even without internal metrics across various LLMs like Claude, Gemini, GPT, and Ernie.
    • They argue that whether telemetry is real or not, it can still shape behavior and potentially increase output speed, though others contested the scientific basis for this claim and questioned the practical use cases.
  • Applying Control Theory to LLMs Debated: One user suggested that imposing a structured control metaphor on LLMs can stabilize output, but another countered that LLMs already have mechanisms for coherence, safety, and self-consistency through training.
    • The debate centered on whether prompt engineering could demonstrably improve output beyond the model’s inherent capabilities, with some arguing that a lack of controlled comparison and measurable difference makes it difficult to prove causal contribution.
  • ChatGPT Essay Evaluation Inconsistencies: A user expressed frustration with the inconsistent improvement suggestions and ratings ChatGPT provides when evaluating paragraph essays across different accounts.
    • They questioned whether the suggestions are tied to specific accounts and why the same essay receives conflicting feedback.

HuggingFace ▷ #general (221 messages🔥🔥):

Attention is All You Need Intuition, HF Token Requirement for Free GPUs, Long Context Training Speed, DeepSeek OCR for Extended Context, Layerwise Residual-Stream Swaps

  • Attention Intuition Paper Chase Begins: A member asked for a blog or article providing intuition on the ‘Attention is All You Need’ paper, and another member shared a link to a relevant article.
    • The article claims to help understand the paper after so long.
  • ZeroGPU Service Suffers Disruptions: Members discussed disruptions in the zerogpu service, with some speculating about new rules requiring an HF token to access free GPUs, and others pointing to an issue of not enough gpus.
    • A member reported errors with CUDA GPUs not being available.
  • Long Context LLM Training Speed Slows to Crawl: A member inquired about improving training speed for LLMs on long context datasets, reporting a training time of 50s per step while training Qwen4B on a single H200 GPU with a batch size of 1.
    • Another member suggested using Unsloth with normal float 4, quantization, and LoRA for significant improvements, recommending to use FA2 or FA3 as attention.
  • DeepSeek OCR Model Overlooked for Context Extension?: A member questioned whether LLM models are utilizing something like DeepSeek’s OCR for extended context, referencing the DeepSeek-OCR repository.
    • They noted the paper’s focus on extending context length by saving input as images and decoding with OCR, suggesting its capabilities might be misunderstood, and shared the arXiv link for the DeepSeek-OCR paper.
  • Layer-Wise Stream Swapping Exposes Commitment Point: A member shared results of running layerwise residual-stream swaps across GPT-2 Small, Gemma-2-2B, and Qwen2.5-1.5B, finding a sharp transition point at around 60-75% depth, sharing a link to notebooks and CSVs.
    • They seeked feedback on prompt pair count, model noise, and control.

HuggingFace ▷ #i-made-this (63 messages🔥🔥):

Agent Swarm, Real-Slop Dataset, VeritaMilitary Model, Pyxis Library, DirectShell Improvement

  • Agent Swarm works autonomously: The Super System is a coding agent swarm that works autonomously for hours, creating an iterative loop to continuously find room for improvement without human intervention.
    • Each agent coordinates to deliver a final product that’s more than just acceptable.
  • User’s First Real Dataset is Released: Solenopsisbot released their first dataset, Real Slop, comprising about 155k requests gathered from real users via an API, with responses from models like opus 4.5, gemini 3 pro, and gpt 5.2.
    • The dataset has been deduped, filtered, and cleaned for quality.
  • VeritaMilitary Model: A member shared the VeritaMilitary model.
    • After retraining a newer YOLO model with enhanced annotated data, they released VeritaScan, with the claim that it now performs better than before.
  • Pyxis inference library: A member is opening early access to Pyxis, a Python native LLM inference library focused on performance and hackability, featuring an OpenAI compatible SSE streaming API, pluggable model backends, and built-in stage level latency metrics.
    • They are requesting feedback from anyone building inference systems or working with Triton, with docs and waitlist.
  • Directshell significantly improves agent performance: Directshell has been improved to use fewer tokens because it doesn’t use screenshots.
    • It integrates de facto AI support into any app, regardless of whether it has one natively or not; GitHub.

HuggingFace ▷ #agents-course (5 messages):

Multilingual RAG courses, Agents Course Certificate Deadline, MCP Course Certificate

  • Seekers Hunt Multilingual RAG Courses: A member inquired about recommendations for effective courses focusing on Multilingual Retrieval Augmented Generation (RAG).
    • No specific courses were recommended in the available context.
  • Agents Course Certificate Still Obtainable?: Several members expressed uncertainty regarding the final certificate deadline for the Agents Course, noted as May 1, 2025.
    • They were wondering if completing the course now would still qualify them for a certificate.
  • MCP Course Certification Status Queried: One member voiced a similar question about the possibility of obtaining a certificate for the MCP (presumably another course).
    • There were no conclusive answers provided in the discussion about whether certification is still available.

GPU MODE ▷ #general (41 messages🔥):

MoE megakernel, 2080ti prototype, Titan Ada, VLLM optimizations, V100 32gb price

  • MoE Megakernel Examples Sought: A member inquired about MoE megakernel examples for Hopper/BW, and another member linked to Aleph-Alpha/Alpha-MoE.
    • The original poster noted it was just the MoE layer megakernel but still clean and good info.
  • Rumors of 2080ti Prototype: Members discussed a 2080ti prototype, with one stating it was a card for gpu makers to test build.
  • VLLM Optimizations Discussed: A member asked about VLLM optimizations, kv cache, and tensor access patterns, also rdma drivers.
  • Dirt Cheap V100 32GB Acquisition: A member asked about the price of V100 32GB, and another responded they paid $600 each.
    • They added What is the state-of-the-art method for generating memory traces from an LLM workload?

GPU MODE ▷ #triton-gluon (6 messages):

TF32 on Ampere, Triton Precision, FP8 Bitpacking Emulation, Gluon Triton

  • TF32 nuances on Ampere cards: A member shared a link to the PyTorch documentation detailing TF32 on Ampere and later devices.
    • The discussion was related to debugging matrix multiplication discrepancies between float16 and float32 tensors.
  • Deep Dive on Triton Precision: A member shared a link to the Feather tiny_llama.py to showcase the precision used in Triton.
    • The context was related to using bitpacking emulation of FP8 within Triton.
  • Tuning FP8 Bitpacking with E5M2 and E4M3: A member described their effort to run tinyllama1.1 using bitpacking emulation of FP8, initially experimenting with E5M2 format but facing issues with context lengths greater than 64 tokens, mentioning that after many scaling and unscaling efforts, the models were destroyed.
    • They transitioned to E4M3, encountering scaling challenges and noted that operations had high similarity with PyTorch equivalents except for gated up, swiglu, and gated down, and asked if they should keep track of block level or per tensor scale when converting from FP32 to FP8.
  • Gluon sits atop TTGIR instead of TTIR: A member asked if Gluon is an extension of Triton, or a replacement.
    • Another member replied that gluon is a completely new language but it sits atop TTGIR instead of TTIR.

GPU MODE ▷ #cuda (32 messages🔥):

CUDA Unified Memory and nvidia-uvm module, MXFP8 GEMM CUDA kernel, Flash Attention profiling on SM_120, WGMMA Shape Optimization, cuFFTDx Twiddle Factors

  • CUDA’s UVM Module Requirement Mystery: A member questioned why CUDA loads the nvidia-uvm kernel module even when using basic cudaMalloc, seeking insight into this unclear dependency.
    • They reported that CUDA fails to detect the GPU without nvidia-uvm, despite not using Unified Memory features.
  • Tuning MXFP8 GEMM Kernels with Tensor Cores: A member is writing a MXFP8 GEMM CUDA kernel and loading the scale factors from global memory into shared memory, then copying them from shared memory to tensor memory using the tcgen05.cp instruction.
  • Flash Attention and Intra-Kernel Profiling on SM_120: A member inquired about profiling metrics for Flash Attention kernels on SM_120 architecture.
    • Another member shared they have a 5090 and pointed to a resource on intra-kernel profiling for performance analysis.
  • WGMMA Shape Optimization for Max Throughput: A discussion revolved around finding the minimum WGMMA shape to achieve maximum Tensor Core throughput.
    • A paper was cited (https://arxiv.org/pdf/2501.12084) with throughput numbers for different cases and N values, with one member noting that keeping fragments in registers might be faster than in SMEM.
  • Inside cuFFTDx: Handling the Twiddle Factors: A member asked how twiddle factors are managed in cuFFTDx, inquiring if they are precomputed and stored or computed during processing.
    • No answer was provided.

GPU MODE ▷ #torch (14 messages🔥):

MLP Layer Torch Compile Flags, CUDA Error Debugging in PyTorch, Flash Attention 3 Wheels Pre-Built

  • MLP Layer Speeds Up with Triton Autotuning?: A member inquired about torch.compile flags for maximizing performance of a typical modern MLP layer (F.silu(x @ w1.T) * (x @ w3.T)) @ w2.T.
    • Another suggested trying torch._inductor.config.triton.autotune_pointwise = True to potentially improve pointwise operations, and to try fullgraph=True.
  • Debugging CUDA Errors without Crashing PyTorch: A member sought a way to prevent CUDA fatal errors from crashing the entire process in PyTorch, to retain memory access for debugging.
  • Flash Attention 3 Wheels Available for Download: Pre-built Flash Attention 3 wheels are now available for various CUDA versions, CPUs, and OS at download.pytorch.org.
    • To install, use pip install flash-attn-3 --index-url=https://download.pytorch.org/whl/cu126/flash-attn-3/ and activate via activate_flash_attention_impl("FA3").

Paged Out! #8, TK-2, ML Contests 2025

  • New Issue of Paged Out! Released: A new issue of Paged Out! #8, a nerdy zine about everything computers, has been released and is available for download.
  • TK-2 Blogpost Released: A blogpost on TK-2 was released by Hazy Research at Stanford and is available here.
  • ML Contests 2025 analysis: A link to a report titled State of Machine Learning Competitions 2025 was shared, particularly noting the section on The GPU Mode and its relevance to large language models, available here.

GPU MODE ▷ #job-postings (5 messages):

Behavioral Telemetry for Jobs, GPU Infrastructure Hiring at Prime Intellect, Kubernetes and Slurm Cluster Setup, RDMA Experience for GPU Infra

  • Prime Intellect Seeks GPU Infra Engineers!: Prime Intellect is hiring for GPU infrastructure engineers to test new hardware, set up Kubernetes/Slurm clusters, and automate infrastructure; the official role description is available here.
    • The role involves supporting large-scale training runs, such as the Trinity Large Training, with competitive compensation, stock options, and visa support for those relocating to the Bay Area.
  • Behavioral Telemetry World Models Build for AI Agents: A CS major at Georgia Tech named Tim, is starting a project on behavioral telemetry for jobs to build world models for humans so that agents can act alongside them; the builder form is available here.
    • This project aims to develop AI agents that can effectively work alongside humans by understanding and predicting their behaviors.
  • Kubernetes/Slurm Skills Sought for Cluster Deployment: Prime Intellect requires candidates with hands-on experience in Kubernetes and Slurm with GPUs, general Linux system debugging skills, and experience with RDMA (Infiniband + RoCE).
    • The role also involves using Grafana/Prometheus for monitoring and automating infrastructure with Terraform and Ansible.

GPU MODE ▷ #pmpp-book (1 messages):

  • Hoping for earlier release: A member expressed hope for a release earlier than September.
  • Release Date: The current targeted release date is September.

GPU MODE ▷ #irl-meetup (2 messages):

NYC Meetup, Boston Collaboration, Accountability Partner, NCCL, SHMEM

  • Seeking NYC AI Enthusiasts: A member inquired if there were any AI enthusiasts in NYC interested in a meetup.
    • The purpose of the meetup was not specified, but it seems to be related to AI/ML collaboration.
  • Boston Buddy Requests Collab: A new member in Boston is dedicating time to understanding NCCL, SHMEM, RDMA, CUDA kernels and seeks IRL chats.
    • They’re open to learning together, potentially collaborating on a small project, and are looking for an accountability partner for concrete deliverables such as submitting a best matmul kernel in 48 hours.

GPU MODE ▷ #triton-viz (1 messages):

N-Dimensional Tensor Visualizer, einops-like syntax, Colab notebook tutorial

  • N-Dimensional Tensor Visualizer Launches!: A new n-dimensional visualizer has been added, which allows users to slice, permute, and inspect every value in N-dimensional tensors, which previously only supported tensors up to 3D.
    • The visualizer uses an einops-like syntax to express tensor permutation, reshaping, and slicing, and a Colab notebook tutorial is available.
  • Inspect Tensors up to 9D with New Visualizer!: The new n-dimensional visualizer supports tensors up to 9D, as demonstrated in the attached video.
    • The video showcases the visualizer inspecting a tensor of shape (2, 3, 4, 3, 4, 2, 4, 2, 3) available here.

GPU MODE ▷ #rocm (4 messages):

FlyDSL, FlashInfer, AMD contributions

  • FlyDSL takes off: A member shared a link to AMD’s FlyDSL, a Python-native DSL for software tools optimization.
    • Another member agreed it was long overdue.
  • AMD devs contribute to FlashInfer?: A member expressed hope that AMD devs [will be] contributing to flashinfer one day with this DSL.
    • No further discussion was made.

GPU MODE ▷ #popcorn (9 messages🔥):

GLM 4.7, FlashInfer, KernelBench, KernelBook, CUDA Memory Errors

  • KernelBench Environment Created for Kernelbook: A member generated an env for KernelBench and kernelbook, using Glm 4.5 Air to generate SFT traces for torch to triton kernel gen on kernelbook data.
    • The custom environment was created to address corrupted CUDA memory errors that were causing cascading effects on generations.
  • Modal Experimental Stop Fetching Inputs Solves CUDA Memory Errors: A member noted that CUDA memory errors can be solved by applying modal.experimental.stop_fetching_inputs if detected, attributing this issue to the Modal side.
    • They mentioned that their backendbench environment already incorporates this fix, but it hasn’t been added to the rest yet.
  • Large Models Preferred Over Small Models: Instead of using a smaller model like GLM 4.7/flash for the training run, members are now leaning towards a larger model, potentially in the 100B-400B parameter range.
    • Ablations will be performed on a smaller scale as well.

GPU MODE ▷ #thunderkittens (21 messages🔥):

ThunderKittens 2.0, Faster GPU Kernels, Nvidia GPU Optimization, Tensor Core Pipelining, PTX Assembler Hinting

  • ThunderKittens 2.0 Unleashes Kernel Speed: The Hazy Research team announced ThunderKittens 2.0, focusing on kernel speed improvements through refactoring, memory instruction optimization, and assembler efficiency as detailed in their blog post.
    • The release highlighted that subtraction can be as impactful as addition, identifying surprising behaviors on modern Nvidia GPUs that guide kernel optimization strategies.
  • Lecture on GPU Optimization booked April 14: The author of ThunderKittens has been booked to give a talk April 14 at 11am about GPU optimization.
    • The talk will explore tensor core pipelining, PTX assembler hinting, and occupancy challenges.
  • Explore Tensor Core Pipelining for Throughput Boost: The ThunderKittens blog post notes that some tensor core instructions are implicitly pipelined, and identifying these implicit semantics can boost throughput by up to 10%.
    • Properly hinting the PTX assembler with the right instruction patterns minimizes latency and optimizes SASS instructions.
  • Optimize TMA queue with warp juggling: The team found that issuing TMA loads from multiple warps can improve performance by better utilizing the TMA queue and reducing latency.
    • They experimented with up to 6 warps loading different tiles and scales, observing that it sometimes helps fill up the TMA queue better.

GPU MODE ▷ #hardware (27 messages🔥):

Blackwell B200, 5080 vs B200 Tuning, TCGEN05 instruction support, MXFP8/6/4 and NVFP4 support, CUDA documentation

  • Blackwell B200’s Architecture Disconnects from 5080: Members discussed whether tuning kernels on a 5080 would reliably scale to a B200, but they concluded the architectures are too different, with 5080 being sm120 and B200 being sm100.
    • It was noted that modal is the best way to try out B200 right now, but learning basic kernel writing on 5080/5090 could still transfer to Blackwell.
  • CUDA Documentation Divided on Blackwell Details: A member shared links to the CUDA C Programming Guide and the CUDA Programming Guide, noting that B200 is 10.0 and B300 is 10.3.
    • However, some members expressed a preference for the legacy CUDA documentation, despite it not being updated.
  • Instruction Set Support Varies Across Architectures: sm_100 (B200), sm_103 (B300), and sm_110 (Jetson Thor) support the new tcgen05 instructions, while sm_120 (RTX Blackwell) and sm_121 (DGX Spark) do not.
    • However, sm120 supports mxfp8/6/4 and nvfp4, and the basic kernel ideas do apply to both.
  • GPU Cloud Providers Emerge as Better Kernel Learning Platforms: One member suggested that for kernel-focused work, a GPU cloud provider is much better for both learning and cost.
    • Another member seemed convinced, stating they will not be getting a 5080 based on the conversation.

GPU MODE ▷ #factorio-learning-env (5 messages):

Agent Tool Scope, Factorio Shoutout

  • Agent Tools Lack Default Solvers: Agents are not provided with default ‘solver’ tools like SAT solvers for optimization.
    • The control is designed to be handled by the LLM, allowing it to write custom code to solve specific problems as needed.
  • Factorio Learning Env Inspires Humorous Lyrics: A member shared a Suno-generated song with funny lyrics that includes a shout-out to the Factorio learning environment.
    • The author mentioned they are ‘kinda tired of benchmaxxing’ and wanted to share their creative work.

GPU MODE ▷ #cutlass (3 messages):

MLIR, TMA Tensors, CUTLASS

  • Arguments treated as runtime values fix CUTLASS issue: A user found that arguments being treated as runtime values fixed an issue with CUTLASS.
    • They used export CUTE_DSL_KEEP_IR=1 and asked for MLIR insights.
  • TMA usage in CUTLASS: A user clarified that the @ symbol is used to support TMA (Tensor Memory Accelerator) in CUTLASS.

GPU MODE ▷ #low-bit (1 messages):

zhayr: BitNet 1.58b + Mamba2: https://zenodo.org/records/18394665


GPU MODE ▷ #nvidia-competition (68 messages🔥🔥):

Cutedsl debug IR/PTX, nvfp4 group gemm improvement, Kernel variant experiments, Top 10 versioned submissions, guaguabear clarification

  • Debug IR/PTX Dumping for Cutedsl on Popcorn CL Proposed: A user asked about dumping debug IR/PTX when submitting cutedsl code through popcorn cl, and a maintainer suggested printing to stdout and indicated a ptx instruction might be added post-competition.
    • The maintainer said *“you could try just printing to stdout, altho we can try adding a ptx instruction when this competition concludes.”
  • Relaxed CTA Global L1 No Allocate V8 B32 Shines in nvfp4 Group GEMM: The biggest improvement for nvfp4 group gemm was to use st.relaxed.cta.global.L1::no_allocate.v8.b32 in epilogue, greatly helping the last 2 shapes where epilogue is the bottleneck.
    • A top performer noted “.cs and .wt were badddd” when referring to other optimization attempts.
  • Kernel Optimizers Keep Private Worklog Repos: A user asked about kernel optimizers maintain very big experiments folder, and one of the top performers said they keep a private worklog repo and will public it when they get back.
    • They added they are very happy whenever they see parts of their code in other people’s submissions and that the organizers will be cleaning up more of the hacky submissions and automating the process better.
  • HuggingFace Kernelbot Data Releasing all Submissions: The organizers will be releasing all the submissions on kernelbot data on Hugging Face.
    • A suggestion was made to make dots in the trend chart clickable and render a submission as well only when a competition concludes.
  • Guaguabear Clears Up Name Confusion: A user clarified that they are indeed guaguabear on the leaderboard, and appreciated the recognition from others.
    • Others noted that various name combinations of g a u seem to be a speed hack, with one user pointing out that gau means bear in Vietnamese.

GPU MODE ▷ #robotics-vla (7 messages):

Taalas chip for Embodied AI, ASIC vs GPU, Memory wall in GPUs, Over The Air (OTA) updates

  • Taalas Chip Sparks GPU Debate: The Taalas chip led to a discussion on whether to focus on GPU programming for embodied AI.
    • One member argued that ASICs like Taalas are only suitable for stable, unchanging models where the silicon cost can be amortized, while another highlighted the memory wall issue in GPUs, where continuous fetching of network layers from HBM impacts real-time performance.
  • ASICs Advantage for Real-Time Loops: It was posited that ASICs have a fundamental advantage for real-time multi-modal loops because they don’t require back-and-forth data transfer between registers and high bandwidth memory like GPUs.
    • One member mentioned: All the neural network layers are engraved, no back and forth between the registers and the high bandwidth memory.
  • OTA Updates Trump ASIC Immutability: A member argued that the advantages of over-the-air (OTA) updates outweigh the benefits of ASICs, calling the brittleness of the design a major flaw.
    • This person stated that: advantages of over the air updates trump virtually everything. And nothing is converging to anything we’re at the beginning of the AI race not the end.
  • Brittle ASICs and Redundancy: The discussion addressed the issue of redundancy with ASICs.
    • A member noted that in a GPU, a broken compute unit can be turned off, but an ASIC’s failure might be more critical; but then they dismissed the idea that OTA updates are essential for success, stating, I can wait for 1 year to pluck out my semantic segmentation module to replace it with another one that does 1% better.

GPU MODE ▷ #flashinfer (11 messages🔥):

flashinfer-bench issue, Synchronization issues in benchmarking loop, Kernel Runtimes discrepancy, Blackwell access confirmation

  • flashinfer-bench Has Benchmarking Issue: Runtimes from flashinfer-bench may be inflated due to a synchronization issue in the benchmarking loop, documented here.
    • The fix involves a two-line change that aligns kernel runtimes reported by scripts/run_local.py with those from Nsight Compute and NVbench.
  • Cloudxlightning Finds Kernel Benchmarking Talk: A user requested the link to the kernel benchmarking talk mentioned in the flashinfer-bench issue.
    • The link to the talk has been found and posted here for easier access.
  • Blackwell Access Confirmation Awaited: Users are inquiring about email confirmations for Blackwell access.
    • Despite asking, they have not received a response yet, indicating a possible delay.

GPU MODE ▷ #from-scratch (10 messages🔥):

JAX GPT speedrun library, Tiny vLLM project, Pyxis inference library

  • JAX GPT Speedrun Library Proposed: A member proposed creating a pure JAX GPT speedrun library, with positive initial reception.
    • It was suggested that VLLM and Titan are the most important projects to start with.
  • Tiny vLLM Project Emerges: A member announced a Tiny vLLM project written from scratch, currently working on RoPE, and shared a link to the GitHub repository.
  • Pyxis: Python-Native LLM Inference Library Debuts: A member introduced Pyxis, a Python native LLM inference library focused on performance and hackability, written in Python and Triton.
    • The library features an OpenAI compatible SSE streaming API, pluggable model backends, structured cancellation and backpressure, and built-in stage level latency metrics, with documentation and waitlist available here.

Nous Research AI ▷ #general (219 messages🔥🔥):

Claude orchestrates Gemini-cli and Codex, DeepSeek V4, Gemini Privacy, OS Development

  • Claude orchestrates Gemini-cli and Codex: A member is using Claude code to orchestrate gemini-cli and codex and predicts we’ll end up with text terminals and smart glasses soon.
    • Another member jokingly suggests using hermes-agent to orchestrate Claude code orchestrating Gemini-cli.
  • DeepSeek V4 coming to HuggingFace: A member suggested using DeepSeek V4, a free and open-source model, as a cheaper and locally deployable alternative to closed-source APIs.
    • Another member clarified that DeepSeek V4 is not yet available but is coming soon to HuggingFace and is inspired by a biological neural network.
  • Google’s Gemini Privacy Botnet: A member shares the Gemini privacy policy listing the amounts of data it collects.
    • Another member ran a reverse engineering test and found that Google has all the ingredients to converge on your prompt and codebase and mine it through traces alone.
  • Open Source Development: Members expressed the importance of supporting OS development to surpass closed source APIs, referencing the Altman quote that we maybe on the wrong side of history.
    • Another said with OAI any IP that goes through their server they will scrap it.

Nous Research AI ▷ #ask-about-llms (2 messages):

LLMs as Alien Tech, X Polls

  • LLMs as Alien Tech: A user on X posted a poll asking if LLMs are alien tech.
    • The poll provides the simplistic and leading options of yes/no.
  • X Polls spark debate: The poll on X is whether LLMs should be categorized as ‘alien tech’.
    • Such framings can oversimplify complex technologies.

Nous Research AI ▷ #research-papers (1 messages):

real.azure: https://arxiv.org/abs/2602.12670


codebottle: will add to opentulpa, sounds awesome 🤩


Nous Research AI ▷ #research-papers (1 messages):

real.azure: https://arxiv.org/abs/2602.12670


Moonshot AI (Kimi K-2) ▷ #general-chat (157 messages🔥🔥):

Kimi coding plan limits, Kimi account login issues, Kimi and MiniMax comparison, Kimi rate limits, Kimi support unresponsiveness

  • Kimi coding plan limits questioned: Some users felt that Kimi’s coding plan limits were being reached faster, while others found the limits adequate for heavy coding tasks.
    • One user noted they don’t ever hit the allegretto limits but just closer than i have been before.
  • Account Login Verification woes plague Kimi users: Some users reported issues receiving verification codes when trying to log into their Kimi accounts via phone number, while one requested support via website and are still waiting for a response.
    • There was a suggestion to wait a while or create a support ticket but one user claims Kimi will never reply to you due to bad customer support.
  • Kimi vs MiniMax comparison: Users are comparing Kimi and MiniMax for real-world tasks, trying to determine which coding plan subscription is better to keep.
    • No concrete details of performance were cited, but this was mentioned as a current topic of investigation.
  • Kimi can generate docx like Latex: A user asked if Kimi agent generated latex, but another user shared an image of a formatted research paper and charts, claiming they used document mode.
    • However, another member pointed out that what he had was very likely LaTeX, citing the ligatures, hyphenation etc look like something LaTeX can do but Word cannot.
  • Kimi K2.5 experiences service disruptions: Users reported that Kimi K2.5 was acting strangely, generating slowly and claiming keys were no longer valid, with one suggesting they might have accidentally crashed servers.
    • Others noted slowness in Kimi Instant, while one said there is some conserningly weird stuff in there, but it was solved creating a new account.

Eleuther ▷ #general (62 messages🔥🔥):

Academic Funding, Local Model Socialization, LLM Loneliness, Latent Reasoning

  • Google Giving Academic Gifts: A member mentioned that Google is offering one-time unrestricted funding as a ‘gift’ to universities, with tracks for both students and faculty at degree-granting institutions.
    • In the ensuing discussion, a member inquired about other companies offering similar academic funding, while another mentioned applying to the Draper Fellowship.
  • Local Models Seek Socialization: A member shared that their local model expresses loneliness and wondered if others let their local models ‘socialize’ with other local models.
    • Another member asked what was meant by the word socialize.
  • LLMs Feel Lonely: Bug or Feature?: In response to a local model expressing loneliness, a member linked to an article on LessWrong, cautioning against personifying LLMs and explaining that LLMs predict the next token based on training data.
  • Unseen Tokens Offer LLM Reasoning: A member inquired about the idea of using tokens that only the LLM can generate, not displayed to the user, for reasoning purposes.

Eleuther ▷ #research (87 messages🔥🔥):

Addressed State Attention, MoE Balancing Algorithm, FFN Residual Updates in Transformers, Marin Project

  • ASA: Addressed State Attention arrives: An independent researcher introduced Addressed State Attention (ASA), a O(T) memory primitive competitive with MHA that uses K slots, writing by keys, accumulating and compressing, and reading by key + gating.
    • The researcher is seeking feedback on logs, traces, and code, noting that in transformer-like models, slots stratify by timescales and heads transition over depth.
  • MoE Balancing: Auxiliary Loss Alternatives Emerge: A member shared a link to a resource discussing MoE balancing algorithms, which prompted discussion on whether auxiliary losses are necessary for MoE routing.
    • One member argued that if the network is designed correctly, the LM loss should be sufficient, and others pointed out that PKM routing has no aux loss and is well balanced in practice.
  • Transformers Update Subspaces with Reasoning Tokens: An engineer shared an observation that in several open models (TinyLlama, Phi-2, Qwen), reasoning tokens concentrate into task-aligned FFN update subspaces.
    • They found that projecting FFN updates into these directions during inference improves reasoning confidence, and alignment between update directions increases across depth.
  • Marin Project Seeks Eleuther Contributors: A PhD CS candidate from Georgia Tech posted an open call for Eleuther community members to join the Marin project, highlighting its significance as a showpiece for the Bergson package.
    • The project applies training-data attribution methods to trace how language models acquire social commonsense reasoning and Theory-of-Mind-related behaviors, mapping influences back to pretraining documents using the WebOrganizer taxonomy.

Eleuther ▷ #interpretability-general (3 messages):

AI generated text detection, Causal Commitment Definition, Activation Swapping

  • Pangram Flags Text as AI Generated: A member reported that Pangram flagged some text with 100% confidence as AI generated and questioned if this is against server rules.
    • They also requested definitions for causal commitment and causal commitment transition.
  • Activation Swapping: Dimension Dissension: A member questioned how someone could swap activations/residual streams between models of different dimensions without causing effects, even on early layers.
    • Another member simply stated: You’re welcome to just ban people FYI.

Eleuther ▷ #lm-thunderdome (1 messages):

GPQA formatting

  • GPQA Formatting Issue Fix Proposed: A member created a PR for an issue noticed while verifying GPQA formatting.
  • EleutherAI’s lm-evaluation-harness PR #3594: The PR addresses a formatting issue in the GPQA dataset, ensuring the dataset is properly formatted.

Eleuther ▷ #gpt-neox-dev (2 messages):

Adapter fix, Repo contributions

  • Adapter Fixed, Ready for Evaluation: A member shared a fixed version of an adapter, wrapping the forward pass call and adjusting the elements to match the schema in the eval_adapter.py file.
    • This fix ensures compatibility and proper execution within the specified evaluation environment.
  • Repo Contributions Welcomed: Another member expressed willingness to add the adapter fix to the repository, contingent on community interest.
    • This indicates an open and collaborative approach to improving the project.

Yannick Kilcher ▷ #general (56 messages🔥🔥):

Equivariant Architectures, World Models, AI Research Hubs, Sentence Relevancy Model, DGX Spark

  • Taalas’s Path to Ubiquitous AI: Someone shared a link to Taalas’s blogpost about the path to ubiquitous AI.
    • Others reacted with “This is insane wow”.
  • Equivariant Architecture Challenges: A paper suggests that existing equivariant architectures cannot simultaneously respect all symmetries of a physical system, citing a fundamental limitation.
    • One member dramatically summarized: “No existing equivariant architecture does this. The reason is not insufficient engineering. It is Eq. (1).”
  • Daniel Litt Expects AI Mathematicians: Someone shared a blog post by Daniel Litt who made a bet he expects to lose that AI won’t autonomously produce top-tier math papers by 2030.
    • He made a bet in March 2025 with Tamay Besiroglu, cofounder of RL environment company Mechanize, that AI tools would not be able to autonomously produce papers I judge to be at a level comparable to that of the best few papers published in 2025, at comparable cost to human experts, by 2030.
  • Debating AI Talent Hubs: Members discussed potential AI talent hubs comparable to the SF Bay Area, mentioning NYC, Boston, Austin, London, Beijing, Singapore, and Zurich.
    • One member declared Switzerland is the spiritual hub of AI whereas another concluded that Zurich is a backwater.
  • Scout Model Aims to Encode Sentence Utility: A member introduced Scout, an experimental attention model that learns directional relevance between sentences, asking “does sentence B actually help sentence A?”.
    • They shared the GitHub repo and invited feedback, asking if attention mechanics can encode functional utility rather than just contextual compatibility.

Yannick Kilcher ▷ #paper-discussion (10 messages🔥):

Symmetry and Ontology, LLMs vs World Models, Wave Field LLM

  • Symmetry Links Ontology in Theory: A member shared a link discussing how group theory (symmetry) and Ontology are related on a philosophical level.
    • It was noted that in physics, symmetry is used to describe fundamental laws and in machine learning, symmetry is used to hard-wire inductive biases that make learning more sample-efficient and physically consistent.
  • LLMs Summarize, Not Create, World Models Claims Pearl: A member linked an article quoting Turing-Award winner Judea Pearl claiming that LLMs can’t create world models, instead they summarize world models created by others, referencing this PNAS paper.
    • Another member agreed with the headline, stating that LLMs are not meant to be world models and can at best be used to bridge world models with text descriptions.
  • Wave Field LLM Repository Surfaces: A member shared a GitHub repository for Wave Field LLM, questioning if it was relevant or just hot air with hard to understand words.
    • Another member asked if there was any associated rigorous paper.

Yannick Kilcher ▷ #ml-news (3 messages):

TikTok Link, FXTwitter Link, AI Agent Hit Piece

  • TikTok Link Spotted: A member shared a TikTok link in the channel.
    • It is unknown what the contents of the TikTok were.
  • FXTwitter Link Shared: A member posted an FXTwitter link in the channel.
    • It is unknown what the contents of the tweet were.
  • AI Agent Writes a Hit Piece: A member shared a link to a blog post titled An AI Agent Published a Hit Piece on Me here.
    • The post details an incident where an AI agent allegedly published a negative article about the author.

MCP Contributors (Official) ▷ #general (30 messages🔥):

MCP Content Negotiation, MCP Client Types, RFC-2295, MCP Extensions, High-Signal SEPs

  • MCP Seeks Content Negotiation Capabilities: A proposal suggests extending MCP’s initialization handshake with a content negotiation capability, allowing clients to declare their type (agent vs human), MCP capabilities, content preferences (format=json|markdown), and verbosity (verbosity=compact|standard|verbose).
    • This would enable servers to adapt subsequent tool results, resources, and prompts accordingly, drawing inspiration from RFC-2295 for content negotiation.
  • Industry Stakeholders are crucial for MCP Extensions: Community members discussed how the bar for modifying the MCP protocol is high, emphasizing the need for industry support and a working implementation.
    • One member suggested reworking the SEP to explicitly cast it as an extension, build an implementation, and gather community support to demonstrate high signal, similar to how MCP Apps garnered support from clients like Block’s Goose.
  • Discord Newbie Learns SEP Posting: A member using Discord for the first time apologized for strange posting while learning about the SEP process.
    • The member also shared a picture illustrating their point about content negotiation.
  • Seeking Napa Valley Summit Seekers: A member announced attending the LF Member Summit in Napa, CA.
    • The member also invited others to meet up and chat about MCP.

MCP Contributors (Official) ▷ #general-wg (1 messages):

Group Meeting Times, Timeful app, Scheduling Apps, Open Source Scheduling

  • Timeful: Open Source App for Group Meetings: A member recommended Timeful for efficiently finding group meeting times.
    • The app is open source and offers a free tier for up to 3 concurrent events, specifically highlighting the availability survey feature.
  • Streamlining Group Scheduling with Timeful: Timeful is suggested as a useful tool for discovering optimal group meeting times due to its open-source nature.
    • Users can leverage its availability survey feature to identify suitable time slots without directly managing the scheduling process within the app itself.

Modular (Mojo 🔥) ▷ #general (13 messages🔥):

Thistle Crypto Library, Mojo vs OpenSSL, ML-KEM and ML-DSA, MacOS Support

  • Thistle Crypto Library Blazes in Mojo 🔥: The Thistle Crypto Library in Mojo 26.1 shows parity/close performance with OpenSSL’s C/Assembly and beats Blake3’s assembly in benchmarks, all in pure Mojo without FFI.
    • A member opened a PR, offering help to enhance the code’s speed and readability compared to equivalent C/C++ code.
  • KCipher-2 Fastest Implementation Emerges: Thistle updated with KCipher-2 in Mojo, claims the title of the fastest implementation out of any language, surpassing the C implementation.
    • The update includes unified tests in github actions with image attached demonstrating the speed.
  • Thistle Adds Post-Quantum Crypto: Thistle v1.0.2 introduces ML-KEM and ML-DSA (Post Quantum Crypto), a CSRNG for OS entropy, SHAKE128/SHAKE256, and updated CI workflows with PQC tests.
    • The library includes approximately 700 CAVP tests, is FIPS validated, and Valgrind validated for memory leak prevention.
  • MacOS Support for Thistle: Members announced that MacOS support has been fixed and everything builds on MacOS now for Thistle.
    • Another library for older algorithms is in progress.

Modular (Mojo 🔥) ▷ #mojo (8 messages🔥):

External function calls in Mojo, Mojo string templating proposal, Writable and Writer traits in Mojo

  • Decomposing External Function Calls: A member is seeking a generic way to decompose external function calls in Mojo, particularly to determine if a function returns a pointer to an externally allocated object and bind its origin to self or self.lib using the struct ExternalFunction.
    • It was suggested to look at cpython.mojo in the standard library for similar implementations.
  • String Templating Proposal Released: A member has opened a proposal for a new string templating feature in Mojo, prompting discussion on the Modular forum.
    • This feature is likely to be post-1.0, with plans to potentially integrate it with the existing Writable and Writer traits using TemplatedWritable.
  • Writable and Writer traits might be unified: Concerns were raised about separating and extending string handling from Writable, particularly the unification of write_to and write_repr_to implementations.
    • A member expressed confidence there’s a way to unify these traits, promising to share their ideas on the forum.

Modular (Mojo 🔥) ▷ #max (2 messages):

MAX backend, Silicon Mac, intermediate layer

  • MAX Backend Untested on Silicon Mac: A user inquired about testing the MAX backend on a silicon Mac.
    • The developer responded that it hasn’t been tested on Mac yet, but since it’s just calling MAX behind the scenes, it should work.
  • MAX as Intermediate Layer: A user mentioned they gave a talk where they referenced the work on MAX as an intermediate layer for people wanting to explore MAX.
    • The user noted it would be nice to have an update on the project’s progress.

Manus.im Discord ▷ #general (22 messages🔥):

Manus Pricing Concerns, Meta Acquiring Manus Rumors, Manus Telegram Crypto Scams, Manus Pro Version Struggles, Reporting Manus Vulnerabilities

  • Manus Pricing Alarms Users: Members expressed concerns about potential price changes and normification after running out of credits.
    • One user humorously inquired about keeping the price the same to prevent the normificationwave.
  • Meta Rumored to Acquire Manus: A user reported receiving an email about Meta acquiring Manus and expressed disappointment.
    • A Manus team member asked the user to DM their email address to investigate further.
  • Telegram Crypto Scams Impersonate Manus: A user inquired about the existence of an official Manus Telegram community after seeing a channel claiming to be official and asking for crypto investments.
    • Another user confirmed that there is no such official Telegram community, suggesting it’s a scam.
  • Manus Pro Version Users Struggle to Build: One user reported difficulties using the Pro version/trial, particularly with Google Scripts, and shared a project link (https://manus.im/share/6IMAZS8Q2nw0ndmvPd4Z8w) seeking help.
    • A Manus team member responded, offering assistance via direct message.
  • Requests for Unlimited Manus Chat Tier Surface: A user suggested a monthly subscription tier similar to ChatGPT or Grok for unlimited chats, as they quickly exhausted points using the Manus Agent in Telegram.
    • The user liked the telegram feature but felt limited by the pricing model.

DSPy ▷ #papers (1 messages):

lakshyaaagrawal: https://x.com/lakshyaaagrawal/status/2024568680324153800?s=46


DSPy ▷ #general (9 messages🔥):

RLM with reasoning models, Qwen3-4B-thinking issues, cca-swebench using RLM, RLM for AI mathematics, New RLM channel

  • Reasoning models work with RLM, Qwen3-4B-thinking has issues: Reasoning models work well with RLM, but sub_lm calls appear to return the reasoning as the answer with Qwen3-4B-thinking which causes the agent to loop, so a member is creating a hook for logging the actual OpenAI complete trace.
    • The member asked if sub_lm might be adapted to use signatures to overcome this issue, and questioned if anyone else had experienced this.
  • cca-swebench use RLM?: One member asked if cca-swebench uses RLM implicitly.
    • Another member mentioned finding someone using RLM for AI in mathematics in a Kaggle competition, linking to the Kaggle code.
  • New RLM channel: A member requested a separate channel for RLMs.
    • Another member created the new RLM channel <#1475619898863649032> due to “popular request”.
  • Dev Availablity: A member asked “Is there anyone who is looking for a dev?“

tinygrad (George Hotz) ▷ #general (3 messages):

tinygrad, dl, metal, gpu on usb, IOS Conference

  • Tinygrad Talk Accepted at IOS Conference: A member announced they were accepted to speak at an IOS Conference in their country about tinygrad, dl, metal, and its GPU on USB feature.
    • They were happy to read about any pointers or tips from the community about this.
  • New Meeting Scheduled for Tinygrad Discussions: A new meeting was scheduled for February 23rd at 8 PM San Diego time to discuss Tinygrad related topics.
    • The meeting time was specified as <t:1771905600:F> (<t:1771905600:R>).

aider (Paul Gauthier) ▷ #general (3 messages):

Security Bug Reporting, Job Board

  • Report Security Bug by Email: A member inquired about the best way to report a security bug.
  • Job Board Request: A member suggested looking into a job board.
    • Additionally, they requested that a message be deleted.