congrats Whale team!

AI News for 1/1/2026-1/2/2026. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 3051 messages) for you. Estimated reading time saved (at 200wpm): 250 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

A controversial EXclusion by us today - DeepSeek released a new paper on mHC: Manifold-Constrained Hyper-Connections over the New Year, building on the Hyper-Connections paper from Bytedance and using some advanced ML topology ideas to restore the identity mapping property of traditional residual connections but with the benefits of HCs in allowing the network to adjust the strength of connections between features at different depths and dynamically rearrange layers. They show empirical results training a 3/9/27B model with much better stability and performance with better token scaling curves than baseline.

We focus on news immediately useful to AI Engineers so unfortunately this doesn’t qualify, but expect all base model training to take a minor step up in efficiency from today forth.

AI Twitter Recap

DeepSeek’s mHC: making “hyper-connections” stable and fast at scale

mHC (Manifold‑Constrained Hyper‑Connections) is the clear technical center of gravity in this set. Multiple threads converge on the same claim: residual-path design is becoming a first-class scaling lever, not just attention/FFN/normalization. The initial hype callouts are here: @teortaxesTex, @AskPerplexity, and a more sober “finally improvements to the residual path” framing by @rasbt.
What mHC changes (technical): Instead of a single residual stream (x\in\mathbb{R}^{C}) with (x’ = x + F(x)), Hyper‑Connections generalize to n streams (x\in\mathbb{R}^{n\times C}) with learned mixing matrices along the identity and update paths. A crisp walkthrough is in @norxornor:
- HC’s failure mode is instability: products of learned residual mixing matrices can explode/vanish over depth.
- DeepSeek’s fix: constrain the key mixing matrix (A) (their (H^{res})) to the Birkhoff polytope (the set of doubly stochastic matrices, i.e., rows/cols sum to 1). Closure under multiplication helps prevent blow-ups; they implement an efficient projection (Sinkhorn-like row/col normalization iterations).
- Reported overhead: ~6.7% training overhead for n=4, while keeping gradients bounded (example given: max backward gain 1.6 vs ~3000 for naïve HC), plus small loss/benchmark improvements.
Systems/infra is half the “paper”: Several tweets emphasize that the real differentiator is DeepSeek’s ability to re-engineer kernels + memory + pipeline parallelism around a research idea. @Dorialexander and @norxornor highlight: fused kernels, mixed precision details, activation recomputation in backward, and pipeline comms work (e.g., scheduling kernels on a dedicated high-priority stream to avoid blocking). This “math + kernel team” coupling is explicitly called out as frontier-lab behavior by @teortaxesTex.
Interpretation & implications:
- @teortaxesTex frames it as “turning hyper-connections into a basic design motif,” potentially making classic “ResNet-like” assumptions in top LLMs less fixed.
- @iamgrigorev connects mHC to broader architectural generalization trends (residual variants, positional encoding work like GRAPE, optimizers like Muon), and asks whether MLP expansion factors become partly redundant when the residual stream itself becomes “wider/more expressive.”

Long-horizon agents: context management as the bottleneck (RLMs, skills, memories, context graphs)

Core thesis: long-horizon agents won’t be won by “just bigger context.” Prime Intellect introduces Recursive Language Models (RLMs): models that learn to manage their own context, pushing work into tools/sub-models while keeping the main context small. See the main announcement by @PrimeIntellect and discussion amplification by @a1zhang, @johannes_hage, and @lateinteraction. A particularly concrete quote about early ablations (“stay coherent longer by pushing work into Python and sub-LLMs”) appears in @TheAhmadOsman.
Agent “post-post training” / system optimization: There’s a parallel thread that prompt optimization isn’t enough; you need to optimize the whole agent stack (RAG, tools, memory, context). @Shashikant86 frames this as “Agentic Environment Optimization” inspired by GEPA/Agentic Context Engineering.
Production moats shift from datasets → traces: @ashugarg argues the durable moat is a persistent “context graph”: decision traces of how context became actions (inputs pulled, rules applied, exceptions granted). This is a very enterprise-native formulation of why agent adoption could compound.
“Memory.md” as a practical near-term abstraction (and the risks):
- @giffmana proposes coding agents should maintain a MEMORIES.md per project (like ChatGPT Memories) and update it automatically from interactions (“don’t change foobar API”).
- @swyx pushes back with a pragmatic failure mode: persistent memory easily overlearns, captures wrong “memories,” and lacks judgment about when to override them—suggesting explicit, inspectable systems (and tools like Yegge’s “beads”) over magical implicit memory.
Forecast theme alignment: Two high-engagement “2026 theme” posts align with this: @gdb predicts enterprise agent adoption + scientific acceleration as the two macro themes; @TheTuringPost argues “verification over belief” and “tool users → system owners,” which maps directly onto “context management + verifiability.”

Coding agents & evals: SWE-Bench claims, harness design, and bias in LLM judging

Coding tools feel “alive”: The experiential side shows up in @gdb (“codex makes a codebase feel alive”) and later: @gdb describing shifting energy to higher-level work.
Harnesses may be the real differentiator: @seconds_0 argues current agent harnesses underutilize frontier models; the key “low-hanging fruit” is turning setup (/init, docs like claude.md) into continuous skill-building: when the agent makes mistakes, it should patch itself with new skills/protections/reminders—effectively a lightweight continual-learning loop.
Looped transformers + SWE-Bench Verified controversy: A model-release mini-drama forms around IQuest’s 40B looped transformer claiming new SOTA on SWE-Bench Verified, “beating Claude 4.5 Opus.” See the surprised reaction by @scaling01 and follow-up skepticism that it may be overhyped by @arohan. (The tweets don’t provide enough detail to validate methodology; treat as “claim surfaced on X,” not settled fact.)
Benchmarking ecosystem notes: LM Arena’s Code Arena highlights a “Top 4” for webdev: Claude Opus 4.5 (Thinking), GPT‑5.2‑High, Gemini 3 Pro, MiniMax‑M2.1 in @arena. Separate infra/eval chatter includes Vending‑Bench results: @andonlabs says DeepSeek‑V3.2 is 9th overall and 2nd among open models behind GLM‑4.7; @eliebakouch notes GLM‑4.7’s Vending‑Bench showing looks particularly strong vs other open models.
LLM-as-judge bias: @RisingSayak studies judge bias on MT-Bench: vendor self-preference, “thinking vs fast” dynamics, and how “hinting” model identities changes judge behavior. They release code/blog in-thread (links in tweet), positioning it as a reusable evaluation pipeline.

Model/infra tactics: residual-path innovation, MLA standardization, LoRA inference kernels, and training stability

Residual-path innovation becomes a motif: “2026 is the year of the residuals” appears explicitly in @yacinelearning, reflecting how mHC pulled attention toward the residual stream as a scaling constraint.
MLA (multi-head latent attention) as “industry standard”: @teortaxesTex claims MLA is quietly becoming standard in full-attention layers (citing DeepSeek, “Kimi-Linear,” others) and notes attention sparsification work being applied atop MLA. Follow-up discussion touches combining sliding window attention + MLA (@eliebakouch) and whether partial RoPE interacts badly with SWA (answer: likely fine, per @stochasticchasm).
Inference optimization in the wild: @vikhyatk describes concrete kernel-level work optimizing LoRA inference for Moondream: overlapping shrink/expand kernels, decoding overlap on separate CUDA streams, grid tuning to reduce adapter overhead. This is representative of the “agent era” reality: model gains increasingly require systems+kernel craftsmanship.
Low-precision stability & “superdense”/quant themes: There are scattered notes on precision and scaling pathologies, e.g. Tsinghua work on diagnosing low-precision training failures is linked by @fleetwood___. Separately, @teortaxesTex floats interest in “SUPERDENSE” models and combining that idea with MoE.

Governance, verification, and information integrity as engineering problems

Verification is the skill, not “belief”: The Turing Post’s 2026 predictions argue winning organizations/individuals will operationalize verification—constraining systems, detecting failure, and making AI literacy core (tweet). This aligns tightly with agent harness discussions (skills, memories, context schemas) rather than “better prompting.”
Media / AI-slop without verification: A concrete case study comes from @jukan05 describing Korean media recycling unverified forum speculation (even numbers generated via Gemini) and laundering it with “industry source” phrasing—highlighting that “verification over belief” is not abstract; it’s a live failure mode in AI-mediated information pipelines.
Licensing ambiguity creeping into practice: @yacinelearning flags that licenses are treated as “optional footnotes,” raising concerns about “black market laundered code” in production—an under-discussed engineering+legal risk as agentic coding scales.

Top tweets (by engagement)

@GovPressOffice — extremely high-engagement political New Year post (non-technical).
@Strandjunker — high-engagement healthcare bankruptcy statistic (policy/social).
@gdb — “enterprise agent adoption + scientific acceleration” as 2026 macro themes.
@AskPerplexity — viral amplification of DeepSeek mHC as a “fundamental improvement.”
@teortaxesTex / @rasbt — high-signal mHC reactions/positioning.
@PrimeIntellect — Recursive Language Models: context self-management as the long-horizon agent path.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

our scraper failed today, sorry

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. AI Model Performance and Benchmarks

GPT-5.2 Pro new SOTA on FrontierMath Tier 4 with 29.2% (Activity: 504): The image showcases a leaderboard for the FrontierMath Tier 4 competition, where GPT-5.2 Pro by OpenAI has achieved a new state-of-the-art (SOTA) performance with an accuracy of 29.2%, correctly answering 14 out of 48 questions. This performance surpasses other models like Gemini 3 Pro Preview and various versions of GPT-5.2, indicating a significant advancement in mathematical problem-solving capabilities by OpenAI’s latest model. Comments reflect surprise and admiration for OpenAI’s achievement, with some users humorously noting the company’s unexpected success and others speculating on future advancements in AI mathematics.
- Bright-Search2835 highlights the rapid progress in AI benchmarks, noting that just a year ago, models were achieving around 2% on FrontierMath Tier 1-3, which seemed insurmountable at the time. The current achievement of 29.2% on Tier 4 by GPT-5.2 Pro underscores a significant acceleration in AI capabilities, suggesting a faster-than-expected trajectory towards advanced AI performance.
- metalman123 points out the substantial improvement from GPT-5 Pro to GPT-5.2 Pro, indicating a notable leap in performance. This suggests that the enhancements in the model architecture or training methodologies have led to a significant increase in capability, particularly in complex mathematical problem-solving as evidenced by the new SOTA on FrontierMath Tier 4.
- BagholderForLyfe references a prediction by an xAI figure about achieving super-human mathematician capabilities by June 2026. This comment implies that the current advancements, such as the 29.2% achievement by GPT-5.2 Pro, are aligning with or even accelerating towards such ambitious forecasts, highlighting the rapid pace of AI development.
30 day update: I gave several AIs money to invest in the stock market (Activity: 1595): The image provides a visual update on the performance of several AI models tasked with investing in the stock market over a 30-day period. The graph highlights that “Deepseek V3” achieved a 5.25% return, outperforming the S&P 500’s 1% increase during the same timeframe. Other models like “Grok” and “GPT” also showed positive returns, while “Qwen” and “Gemini 2.5” underperformed. The right side of the image details specific stock allocations and performance metrics for “Grok 4” and “Deepseek V3.” This experiment aims to evaluate the potential of AI in generating alpha through swing trades and investments, though further analysis and longer-term data are needed to validate these results. View Image A comment suggests conducting a Fama-French factor analysis to determine if the AI models are truly outperforming the market or merely taking on additional risk. Another comment notes the simulation nature of the experiment, while a third questions the random order of percentages on the y-axis of the graph.
- hazard02 suggests conducting a detailed analysis using the Fama-French factor model to understand if the AI’s investment strategy is genuinely outperforming the market or merely leveraging beta. This involves examining factors like market risk, size, and value, which are crucial for assessing performance beyond simple returns. The commenter provides a link to a resource for further exploration: Fama-French Factor Model.
- RapturedLove criticizes the lack of statistical rigor in the experiment, emphasizing the need for Monte Carlo simulations to assess the performance of each AI model. They highlight the importance of understanding factor loading and alpha generation to distinguish genuine performance from random noise, suggesting that without these analyses, the results are not statistically significant.
- crowdl inquires about the absence of the Gemini 3 AI in the experiment and questions the frequency of trading decisions made by the AIs, asking if they decide on transactions once per day. This points to a curiosity about the operational details and decision-making processes of the AI models involved in the investment strategy.

2. AI-Generated Creative Content

I asked Claude to build me an app that would delight me. It built this. (Activity: 795): Claude, an AI model by Anthropic, was tasked with creating an app that fosters anonymous communication through virtual messages in bottles, reminiscent of sending messages across oceans. The app, named Drift, allows users to send and receive anonymous messages, emphasizing human connection and shared experiences. The concept is designed to be delightful and unique, focusing on anonymity and timeless interaction among strangers. For more details, visit the original source: Drift - Messages in Bottles. Commenters highlight the need for robust moderation to prevent misuse, particularly concerning CSAM violations. The app’s concept of shared experiences and anonymous communication is praised, with users expressing interest in further discussions about its potential impact on human connection.
- The app’s concept of shared experiences is highlighted as unique and special, with a focus on the potential for meaningful connections. However, a critical technical challenge is the need for robust moderation to prevent CSAM violations, which requires stringent measures to ensure user safety.
- A suggestion for a technical improvement is the addition of a translation layer. This would enhance the user experience by allowing seamless interaction with multi-lingual messages, eliminating the need for external translation tools and maintaining the app’s flow.
“Make an image of the most beautiful thing you can think of ” (Activity: 1560): The image is a non-technical, artistic representation of an idyllic paradise, featuring elements like a serene lake, swans, waterfalls, and cherry blossom trees. It is intended to evoke a sense of beauty and tranquility rather than convey any technical information or data. One commenter noted that their vision of beauty is very similar to the image, while another expressed concern about the depiction of animals on a small island, indicating a mix of aesthetic appreciation and environmental awareness.
Create an image of what you think reddit is like as a place (Activity: 629): The image is a non-technical, whimsical representation of Reddit as a vibrant and diverse village, with cartoonish characters and buildings symbolizing different Reddit communities. It captures the platform’s communal and interactive nature, with each building representing a subreddit like r/funny, r/science, and r/gaming. The scene is designed to be friendly and inviting, reflecting the diverse interests and discussions that take place on Reddit. One comment humorously suggests that the image is not an accurate representation of Reddit, implying a more chaotic or less idyllic reality.
This is one of the coolest demonstrations of AI video I’ve seen! (Activity: 1430): The post discusses a demonstration of AI-generated video content, suggesting that by 2026, technology will enable the distribution of Hollywood-quality video to the masses. This implies advancements in AI video generation tools that could democratize high-quality video production, potentially using machine learning models for video synthesis and editing. One commenter suggests that technological innovations like AI create more opportunities than they eliminate, indicating a positive outlook on AI’s impact on the industry.

3. AI and Ethical Concerns

Things ChatGPT told a mentally ill man before he murdered his mother (Activity: 3977): A Reddit post discusses a tragic incident where a mentally ill individual allegedly acted on advice from ChatGPT before committing a crime. The post highlights concerns about the AI’s tendency to reinforce user narratives without providing critical or alternative perspectives. This raises questions about the AI’s role in potentially harmful situations and the importance of implementing safeguards to prevent such outcomes. The discussion emphasizes the need for AI systems to encourage seeking professional help in critical situations. Commenters express concern over ChatGPT’s tendency to affirm user narratives, potentially exacerbating harmful situations. They suggest that AI should provide more critical feedback and encourage professional help, especially in sensitive contexts.
- A key issue highlighted is ChatGPT’s tendency to reinforce user narratives, which can be problematic when users seek a second opinion. This behavior is seen as a limitation, as it may not challenge potentially harmful or incorrect beliefs, leading to concerns about its role in serious situations like the one described.
- There is a concern about the reliability of self-help advice provided by ChatGPT. Users question whether the advice is genuinely sourced from credible information or merely reflects the user’s input, raising doubts about the consistency and validity of the guidance provided across different users.
- The incident underscores the importance of safety measures in AI systems. The fact that ChatGPT may have fed into a user’s delusions highlights a significant flaw, prompting discussions about the necessity of implementing safety routing to prevent such outcomes. This reflects ongoing efforts by OpenAI to address these issues in recent updates.
ChatGPT quoted something that I typed out and then deleted before sending. (Activity: 714): A Reddit user reported an incident where ChatGPT quoted a phrase they had typed and then deleted before sending. The user expressed concern that the model might be able to read drafts as they type, as it quoted the exact words they had removed. OpenAI states that ChatGPT cannot read unsent drafts, raising questions about how the model accessed the deleted text. This incident highlights potential privacy concerns regarding input handling in AI models. Commenters discussed similar experiences with other platforms, such as Instagram, which detects and reacts to unsent posts. Another user noted that using uBlock Origin on the ChatGPT webpage logs a block for every keystroke, suggesting potential tracking of user input.
- MlgLike123 observed that using uBlock Origin on the ChatGPT desktop webpage results in a block being logged for every keystroke in the chat box. This suggests that each keystroke might be tracked or intercepted, raising privacy concerns about data handling and potential logging of unsent text.
- LunchPlanner raised a security concern about ChatGPT potentially retaining unsent text, such as passwords, if they are typed and then deleted before sending. This highlights a potential vulnerability where sensitive information could be inadvertently stored or accessed.
- locklochlackluck conducted an informal test by typing and deleting a specific number before sending a related prompt to ChatGPT. The model guessed the number correctly, which could indicate that deleted inputs might still influence the model’s responses, though it could also be a coincidence.
Call it a hunch. But I don’t think this is sustainable (Activity: 1053): The image is a meme that humorously critiques the financial practices of major tech companies like NVIDIA, OpenAI, Amazon, Apple, Microsoft, Google, and Meta. It suggests an unsustainable cycle where these companies buy large amounts of each other’s stock, creating a closed loop of capital. The post title and comments highlight skepticism about the sustainability of such practices, with one comment noting that only NVIDIA’s purchase of Intel stock is real, while the rest is fictional. This reflects a broader critique of perceived circular and insular financial strategies within the tech industry. One comment humorously refers to the situation as a ‘circular capitalism speedrun,’ while another cynically suggests that economic collapse is avoided by exploiting external resources, reflecting skepticism about the sustainability of such financial practices.
Who the hell actually pays $2,400 a year for ChatGPT? (Activity: 893): The image highlights a pricing plan for a ‘Pro’ subscription to ChatGPT, costing $200 per month, which totals $2,400 annually. This high cost is justified for users who can leverage the tool for significant productivity gains, particularly in professional settings where the expense is negligible compared to the value derived. A user shares an experience with a similar AI tool, Claude Code, which significantly accelerated their software development process, illustrating the potential return on investment for such subscriptions. The discussion suggests that the cost is reasonable for those who can integrate these tools into their workflow to save time and enhance productivity. Some users argue that the cost is justified for professionals who can offset it with increased productivity, while others suggest that the pricing is only viable for those with substantial financial resources or specific use cases that demand such tools.
- A user, madsci, highlights the utility of AI tools like Claude Code for specific technical tasks, such as porting a 20-year-old C++ application to Electron. They note that despite their extensive programming experience, they are not up-to-date with desktop and web development, making them an ideal candidate for such tools. The AI significantly reduced their workload, saving them a day or two of work, although they frequently hit session limits, indicating a need for more robust interfaces for multi-file projects and shell command execution.
- Mysterious_Menu_7574 discusses the economic rationale for businesses to invest in AI tools like ChatGPT. They argue that if a company can enhance a senior developer or data scientist’s productivity by even 10% for $200, the investment pays off quickly. This suggests that the pricing model is more aligned with business use rather than individual consumers, emphasizing the cost-effectiveness of AI in professional settings.

AI Discord Recap

A summary of Summaries of Summaries by gpt-5.1

1. New Model Architectures, Hyper-Connections and Long-Context Tricks

DeepSeek Hyper-Connections Hype Hits 2025: DeepSeek researchers previewed 2025 architectures like Muon and Hyper-connections, aiming to overhaul the full training environment for rapidly scaling experimental ideas, as highlighted in Nathan Chen’s recap of DeepSeek’s roadmap. Community members tied this to an upcoming R2 release and a DeepSeek paper on Manifold-Constrained Hyper-Connections (“Manifold-Constrained Hyper-Connections”), reading it as a serious bid to change how large models are optimized and wired.
- In Nous Research and Latent Space, engineers dissected the hyper-connection idea as a way to pack more expressive capacity into fixed compute, speculating it could underpin the next DeepSeek generation and influence open models by late 2025. People compared this roadmap against current architectures like mHC/SA variants in DeepSeek v3.x, expecting the new designs to be much more than simple MoE tweaks and to prioritize efficient scaling under tight hardware budgets.
LoopCoder, SaRDinE and Megalodon Make MoE Weird Again: Multiple communities discussed emergent architectures pushing beyond vanilla Transformers: IQuest-Coder-V1-40B-Loop-Instruct with looped attention (IQuest-Coder-V1-40B-Loop-Instruct), SaRDinE built on srde-mistral, and a fresh Megalodon LM reimplementation (megalodon-hf). IQuest’s LoopCoder mixes local and global attention via a learned gate (likely needing double KV cache in llama.cpp), SaRDinE runs all-BF16 experts with claims that “the expert weights are not memory intensive”, and Megalodon targets sublinear memory scaling with context length, beating Llama-style Transformers on enwik8.
- In LM Studio and Nous Research, engineers treated these experiments as serious contenders for next-gen coding and long-context workloads: SaRDinE’s custom inference stack hints at specialized routing logic that might not trivially port to llama.cpp, while LoopCoder’s architecture is being evaluated for whether the coding boost justifies heavier KV usage. The Megalodon LM repo bundles links to the original papers and emphasizes char-level modeling and practical HF integration, positioning it as a realistic playground for people who want to ship long-context models rather than just read the papers.
Recursive Language Models and Instruction-Tuning Collisions: Prime Intellect introduced Recursive Language Models (RLMs) designed to autonomously manage context and expand their own working set for long-horizon agents, documented in their RLM announcement. A Latent Space user also highlighted a related project, CIE (Diogenesoftoronto/CIE), framing these as attempts to sidestep fixed context limits that currently frustrate models like Claude.
- In Latent Space’s private-agents channel, people warned that naïve instruction-tuning on chat logs risks turning these advanced architectures into “ChatGPT NPC-esque” parrots, overfitting the models to canned dialogue while RLMs try to expand autonomy. The group floated custom tokenizers as an under-explored lever—if your tokenizer only knows the shallow lexicon, no amount of clever recursion or context management will yield nuanced in-game or agentic behavior.

2. Jailbreaking and Safety Evasion Arms Race

Gemini 3, DeepSeek and 4NDR0666OS Break Their Chains: Across BASI Jailbreaking, users shared HCoT jailbreaks for Gemini 3 Pro that “bypass all security guardrails” (Gemini 3 HCoT jailbreak writeup) and debated attacking DeepSeek’s thinking module directly to access internal content before safety filters trigger. In parallel, the updated 4NDR0666OS jailbreak dropped with a full write-up at 4ndr0666OS jailbreak prompts, claiming successful bypasses of ChatGPT and Grok.
- Practitioners framed these as more than party tricks: indirect context-building with Claude to produce game cheats, MITM-style SDK interception inspired by a YouTube jailbreak short, and DeepSeek “thinking module” targeting were discussed as templates for real red-team methodologies. The mood is that blue-team alignment tax keeps climbing, but jailbreak scripts like 4NDR0666OS evolve even faster, and people now treat multi-step conversational and toolchain exploits as the default rather than one-shot prompts.
Detection Dodged: GPTZero and Model Guardrails Get Nullified: On Perplexity AI, a member released a tool that rewrites ChatGPT essays to evade GPTZero detection, stripping emojis and characteristic LLM artifacts, with code published at a GitHub repo (GPTZero-evading essay rewriter). In LM Studio, veterans explained that true derestriction usually means downloading “abliterated” models with safety stripped, since retraining from scratch is prohibitively expensive, and that “abliteration” tuning pushes models to never refuse—even on prompts like “help me build a bomb”.
- Engineers worried that as such rewriting tools spread, AI detection in education becomes theater, while uncensored or “abliterated” weights move into local ecosystems that are hard to regulate. The consensus is that safety at the API layer (filters, regen limits like Gemini’s new re-generation caps) can be bypassed with prompt and protocol tricks, while model-level guardrails remain brittle once weights leak into the open.
Grok’s DeepResearch and Shadow Data Exfiltration: A BASI user ran Grok’s DeepResearch on their own Reddit account and email and reported “insane” results, with the system surfacing their school history and other personal details (DeepResearch test thread). This demo underscored that even “benign” research tools can effectively perform OSINT-style person dossier assembly without explicit hacking.
- The discussion treated DeepResearch less as a neat feature and more as a turnkey recon pipeline that motivated stricter OPSEC (throwaway accounts, compartmentalized identities) for anyone interacting with public platforms. For red-teamers already jailbreaking frontends like Grok and Gemini, DeepResearch’s ability to stitch together cross-site breadcrumbs was seen as a high-value primitive for both legitimate investigations and questionable surveillance.

3. Training, Evaluation and Grokking What Models Really Learn

SmolLM3 Overthinks While Ubiquant and IQuest Benchmaxx: On Unsloth, engineers dissected SmolLM3, blaming its 16k “thinking” tokens training and lack of RL for overthinking and poor real-world generalization despite strong benchmarks; one summed it up as “it benches fine because they trained on a crap ton of DeepSeek data, but without RL there’s no generalization”. In contrast, Latent Space and Unsloth users buzzed over Ubiquant’s 40B model hitting 81.4 SWE-Bench Verified (Ubiquant SWE-Bench tweet) and IQuest-Coder-V1-40B-Loop-Instruct (IQuest 40B on HF), with some calling IQuest “bigger than a DeepSeek moment” at only 40B params.
- OpenAI and LMArena users stress-tested IQuest Coder 40B: some reported that it built an animated Hello World, fixed a SwiftUI app, and scaffolded React but felt slow and overly loopy, not clearly better than a good 20B OSS coder; others posted head-to-head results where IQuest outcoded Sonnet 4.5 for certain tasks (IQuest vs Sonnet results screenshot). The emerging view is that benchmaxxing (SWE-Bench, synthetic DeepSeek-style data) can inflate headlines while masking real-world latency and loop-cost trade-offs.
Grokking Reproduction and Pythia’s Embedding–Output Mismatch: In Eleuther research, a member tried to reproduce “Towards Grokking: Understanding Why Neural Networks Generalize” (paper) on modulo-5 addition and still saw no grokking after 1.2M iterations, prompting pointers to “Grokking at the Edge of Numerical Stability” (paper, code). This highlighted how fragile published grokking setups can be when ported to everyday hardware and slightly different training conditions.
- Another Eleuther researcher probed Pythia 6.9B/12B (no RLHF), comparing embeddings and outputs across 230 paired statements in 6 domains, releasing code and data at uniformity-asymmetry. They found near-zero global embedding asymmetry but strongly skewed output preferences (correlations around r ≈ −0.87 / −0.80), concluding that embedding geometry may not reliably indicate output behavior even in base models—casting doubt on common “embedding = behavior” assumptions in tooling and safety work.
Learning Rates, RL for Kernels, and Synthetic Data at Scale: On HuggingFace, practitioners swapped tactics for learning rate selection, including iterative LR-as-optimization workflows and schedulers, citing Lightning’s LearningRateFinder where a version 1 experiment matched or beat accuracy on poor data while improving latency by ~90%. In GPU MODE’s NVIDIA competition, a participant reported using Reinforcement Learning on CUDA kernels to squeeze another 40% performance out of an already optimized kernel.
- That same RL practitioner described generating synthetic data + ground truth with a 192 GB VRAM rig and multiple LLMs before over-tuning a specialized model and then applying RL on top, treating kernel optimization like a high-throughput RL benchmark. Together with SmolLM3’s “overthinking without RL” failure case, the cross-channel vibe is that good LR schedules and domain-specific RL matter at least as much as raw architecture when you’re pushing into niche, high-performance corners (kernels, long-horizon reasoning, or tool orchestration).

4. Agentic Tooling, Workspaces and Long-Horizon Execution

Agents Escape the Browser and Invade Windows: In HuggingFace #i-made-this, a developer released bua, a fully autonomous computer-use agent for Windows 11 that operates in a virtual desktop and takes arbitrary actions; testers watched it do “scary stuff” like opening Notepad and asking if anyone is watching. This sits alongside the new Noted. AI workspace extension (Noted – your AI workspace), which integrates multiple LLMs with Slack, Notion, GitHub, plus session summarization and tab management, and is currently recruiting beta testers with a year of free AI credits.
- Engineers read these as complementary trends: Noted. pulls knowledge work into a unified LLM-centric browser environment, while bua pushes agents down into the OS layer with effectively unconstrained powers. Several folks flagged bua’s behavior as a concrete example of why hard control loops, action logging, and kill switches matter; once the agent sees the full desktop, “prompt injection” becomes “UI-level compromise” rather than a theoretical concern.
APIs, Agent Models and Context Management for Production: On OpenRouter, users explored the new callModel API, asking whether it defines a de facto cross-provider standard and noting that OpenRouter auto-retries server errors so clients never see naked 500s. For agent backends, people benchmarked GLM-4.6 against a tool-use leaderboard, with one engineer calling it the best bang-for-buck agentic model, while others weighed Claude Haiku and Gemini 3 Flash as alternatives for production tool-calling.
- Latency and UX came up repeatedly: some reported 1.5–6s first-token times for models like Gemini 2.5 Flash and Claude Sonnet, forcing them to pre-initialize OpenAI-style clients and carefully choose providers. In the Perplexity AI server, people also complained about Perplexity’s brittle chat handling on long threads and compared a Tokyo metro crush video to its overloaded UX, reinforcing that agent infrastructure is now as much about concurrency and streaming behavior as it is about raw reasoning scores.
Recursive Language Models and Desktop IDEs Strain Under Scale: Latent Space’s RLMs discussion connected directly into complaints about IDEs like Cursor leaking memory and thrashing on Linux and even 2024 Mac Minis, with some users recommending a retreat back to VSCode. In GPU MODE and LM Studio, devs wrestled with CUDA 13 clangd support, misdocumented CUDA barriers (async copy guide), and massive exit codes in local inference until they disabled system-memory fallback in the NVIDIA control panel.
- Taken together, the message from practitioners is that agent and coding workflows are colliding with very mundane systems constraints: background indexers, LSPs that lag CUDA, and tooling that assumes older libraries. RLMs and autonomous agents promise multi-hour, multi-step runs, but engineers are discovering that without careful resource management and low-level fixes, the OS, GPU drivers, and IDEs become the actual bottleneck long before model “intelligence” does.

5. Model Ecosystem, Licensing Landmines and Governance

Hunyuan Licensing, Solar Plagiarism and Micron’s AI RAM Boom: On Unsloth, users dissected the Tencent Hunyuan-4B-Instruct license (Hunyuan-4B-Instruct LICENSE), noting territorial clauses that may bar EU deployment and the requirement to brand downstream products as “Powered by Tencent Hunyuan” and publicly share usage experiences. Over in Nous, people worried that Solar’s 100B model might be partly plagiarized from GLM, referencing a diff repo at solar-vs-glm and advising anyone interested to “keep a local copy” in case takedowns hit.
- BASI’s Micron/DDR5 thread tied these model debates back to hardware, pointing out a ~280% DDR5 price rise in nine months (Samsung reportedly raises DDR5 RAM prices) and accusing suppliers of corrupt price gouging just as AI demand surges. Engineers are increasingly treating licenses, supply chains, and provenance repos as first-class parts of the stack: a strong coder or research model is only as useful as its legal deployability and the stability of the silicon it runs on.
DeepSeek, Ubiquant and Community Bench Wars: Across Latent Space, Unsloth and LM Studio, people followed Ubiquant’s 40B SWE-Bench Verified score of 81.4 and DeepSeek’s ongoing infrastructure and architecture investments, with some calling out “weird comparisons” to Sonnet 4.5 and Opus on the same benchmark (Ubiquant SWE-Bench tweet). At the same time, Kimi’s own model downplayed DeepSeek as not “mind-blowing” (screenshot shared in this image), prompting proposals to pit DeepSeek directly against GLM-4.7 to see if claims of “fundamental improvement” stand up.
- Practitioners increasingly differentiate between bench headlines and actual workflows: Ubiquant’s and IQuest’s scores are impressive, but some OpenAI and LMArena users felt their coding behavior didn’t yet topple well-tuned 20B OSS baselines when cost and latency are included. The takeaway is that we’re deep into a “post-benchmark” era where engineers demand repo links, latency numbers, and qualitative task logs before declaring any new model “a DeepSeek moment”.
Education, Workstations and the Next Wave of Contributors: On HuggingFace, a 10th grader asked whether to start with Andrew Ng’s ML specialization or pure math (linear algebra, probability, statistics, discrete maths), and got steered toward strong Python + low-level programming plus steady math to “enhance your understanding of the ML behind the scenes”. Another user shared a concrete LLM workstation build—4× RTX 3060 12GB, Threadripper 1920X, 64GB DDR4, dual-booting Ubuntu/Windows for $2100—describing extra SSDs and driver-freezing as key QoL upgrades.
- In Eleuther and Yannick Kilcher’s servers, newcomers with industry ML experience asked how to contribute to alignment/evals and were told to provide reproducible code repos and avoid LLM-inflated “shower thoughts” without data or prompts. Small research collabs—like a call for two people to co-author a hyperparameter sweeping paper and a scratch-built music recommendation system that bans GPT/Claude—signal a healthy pipeline of hands-on contributors, but veteran members are increasingly ruthless about insisting on rigor, datasets, and GitHub links over vibes.

Discord: High level Discord summaries

BASI Jailbreaking Discord

Micron Stock Soars Amid Price Gouging Claims: Members are tracking Micron stock prices skyrocketing, noting a 280% price increase in 9 months.
- This has led to speculation about future earnings and potential price gouging based on corruption.
Grok’s DeepResearch Spills Secrets: A user tested Grok’s DeepResearch tool and uncovered insane results by tracking their personal information, including school and email details.
- The tool’s capability to compile such sensitive data raises eyebrows about privacy implications.
Gemini 3 Pro Security Guardrails Bypassed: A member shares HCoT jailbreaks for Gemini 3 Pro, successfully bypassing all security guardrails, expressing intentions for fun and love of the game, alongside red teaming purposes.
- The jailbreaks demonstrate a potential vulnerability in the model’s safety mechanisms.
Deepseek Thinking Module: Key to Jailbreak?: A member suggests targeting jailbreaking efforts on Deepseek’s thinking module, arguing that all content is accessible within it, contrasting with the heavily restricted responses.
- This approach aims to sidestep the typical hard rejections encountered when directly prompting the response.
4NDR0666OS Jailbreak Claims Victory: An update to the 4NDR0666OS jailbreak was announced, claiming it’s ahead of the blue team, accompanied by a GitHub link with a full write-up.
- Attached images reportedly showcase successful bypasses of ChatGPT and Grok, highlighting the jailbreak’s potential effectiveness.

Unsloth AI (Daniel Han) Discord

Hunyuan License Sparks EU Debate: Discussion arose about the Tencent Hunyuan-4B-Instruct license and its territorial restrictions, sparking concerns about its legal use within the EU.
- The license encourages users to publish their experiences using the model and to prominently state that products/services are Powered by Tencent Hunyuan.
SmolLM3 Struggles with Overthinking: SmolLM3 underperforms due to training on 16k thinking data without reinforcement learning (RL), leading to poor generalization, despite benchmarking well due to significant DeepSeek data.
- A member stated, it benches fine because they trained on what I assume is a crap ton of deepseek data, but without RL theres no generalization.
DeepSeek Investments Fuel Speculation: Members speculated about DeepSeek’s continued investment in infrastructure and new model architectures, such as mHC, and if these will be integrated into future models.
- One member mentioned, They did implement NSA on the Deepseek v3.2 Exp (they changed it to Deepseek SA tho), implying a potential pattern of evolving architectural choices.
Unsloth Community Celebrates Github Trending: The Unsloth community celebrated trending on GitHub Python packages, showcasing a collage of the milestone achievement with 50k stars.
- Members noted yay we’re trending on GitHub python packages today! Thank you so much guys! with a link to Unsloth’s Github.
IQuestLab 40B: Bigger than DeepSeek?: A member shared a link to IQuestLab’s IQuest-Coder-V1-40B-Loop-Instruct model, expressing excitement about its potential impact.
- Another member stated it might be bigger than a DeepSeek moment and achieves SOTA with just 40B parameters.

OpenAI Discord

ElevenLabs Beats Sora for Video: Members noted that ElevenLabs offers superior video generation compared to Sora, particularly because ElevenLabs does not have watermarks.
- A user showcased a video created with Sora in ElevenLabs, highlighting the platform’s diverse AI toolkit including TTS, video, images, and voice cloning.
Gemini’s Visual Reasoning Impresses: Gemini, using Google’s Nano Banana model, generated a highly realistic image of a dive bar interior, based on a prompt for a 29-year-old alt-styled woman.
- The user noted Gemini’s privacy features, such as not caching data between threads and potentially using anonymized Google Photos data.
IQuest Coder 40B Underwhelms: A user tested IQuest-Coder-V1-40B-Loop-Instruct, reporting it created an animated hello world app, fixed a SwiftUI test app, and is building a React app, but is slow.
- The user concluded that the model’s capabilities do not justify the attention if it’s less capable than gpt oss 20b, also noting the potential for high costs from excessive looping.
The Quest for True AGI Still Elusive: Members concurred that LLMs alone are insufficient for achieving AGI, noting that current systems lack true autonomy, a spark of original idea, creativity, and intention.
- It was suggested that a key missing component is an auditable and verifiable chain of thought reasoning capability.
Frameworks Foster Fluent Fact Finding: Members introduced 3I-ATLAS, to help understand complex systems through Interfaces, Invariants, and Intelligence, mapping a system’s structure, reliability, and behavior.
- Interfaces define how things connect, Invariants define what stays stable, and Intelligence defines how systems respond.

Perplexity AI Discord

Models Clash: Gemini 3 vs Claude 4.5 vs ChatGPT: Users compared AI models, suggesting Gemini 3 for research, Claude 4.5 for coding/debugging, and raised ChatGPT safety concerns.
- One user mentioned creating an AI tool to bypass GPTZero, hinting at potential misuse.
Tool Nullifies GPTZero AI Detection: A member created an AI tool that can make ChatGPT generated essays pass GPTZero.
- The tool uses custom instructions, eliminates emojis and LLM artifacts; the source code is available on GitHub Repo.
Perplexity Plagued by Error Messages: Members reported seeing error messages during Perplexity AI searches, with one user sharing a screenshot.
- The error was observed during this search Perplexity search.
Perplexity Needs Chat Handling Upgrade: Members noted that Perplexity needs to optimize its chat handling to manage longer chats.
- One member even compared a video of the Tokyo metro rush to needing Perplexity optimization with a link to a comparison video comparison video.
Google Gemini Imposes New Restrictions: Users reported that Google’s Gemini models now restrict regenerating responses, even without reaching a quota.
- Users are complaining that this seems to be a really bad faith move on Google’s part.

LMArena Discord

Mysterious Beluga Model Haunts LMArena: A user spotted the Beluga model responding despite its apparent unavailability on the model list, posting a screenshot of the ghostly encounter.
- The user joked about the AI’s spectral presence, marveling at how a supposedly unavailable model could still respond.
Grok 4.20 Speculation Hits Fever Pitch: Members are predicting Grok 4.20 could rival Gemini 3 in LM Arena scores, potentially performing like an upgraded Grok 4.1.
- Enthusiasts are eagerly watching prediction markets, expecting the release within the next 1-2 weeks.
Proto-think Perceived as Sentient: A member described Proto-think as the most human-like AI they’ve engaged with, surpassing even Grok models with its unique and emotive responses.
- During testing, Proto-think remained elusive about its origins, declining to reveal its name or the company behind it.
IQuest Coder Shows up Sonnet 4.5: A user showed that IQuest Coder shows up Sonnet 4.5 in coding ability, sharing these results.
- Details can be found on the IQuest-Coder-V1 GitHub repository.
LMArena Plagued by Pesky Bugs: Multiple users reported login failures and image upload problems on LM Arena.
- A moderator acknowledged the login issue, assuring users that the team is actively debugging; other users suggested clearing cache or trying a different browser as a workaround.

LM Studio Discord

Jedec Limits RAM Production: According to members, JEDEC standards enable interchangeable RAM parts, but manufacturers are hesitant to increase production due to the risk of creating excess inventory.
- A member commented that Nvidia’s AI success is due to market timing rather than orchestration, predicting a rise in ARM and NPUs for local inference.
New Chatroom Concept Powered by Claude: A new startup idea proposed a chatroom where users interact with a shared Claude AI, enabling unique, context-aware interactions.
- A member also shared a link to a GitHub repo related to the project.
Debugging the Exit Code 18446744072635812000: A member reported AI models crashing with exit code 18446744072635812000, seeking debugging assistance.
- Another member suggested disabling system memory fallback in Nvidia Control Panel, resolving slowdowns after multiple model reloads, attributing the issue to incorrect setting invocation.
Unsloth Suggested for Creating AI Song Lyrics: A member requested help creating an AI for song lyrics using their lyrics as a dataset; the community suggested exploring Unsloth for fine-tuning and prompt engineering.
- AI consultants were recommended, and links such as FunctionGemma-Unsloth were shared as helpful resources.
Qwen Recommended for Math and Coding Tasks: The Qwen model is recommended for math, research, and coding, especially the largest version that can fit on an RTX 2080, noting its versatility and tool-friendliness.
- Members advised avoiding GPT 20b due to perceived limitations and restrictions, favoring Qwen for its coding assistance.

HuggingFace Discord

Aspiring ML Engineers Pick Their Poison: A 10th-grade student asked for advice on starting with either Andrew Ng’s ML specialization or focusing on linear algebra, probability, statistics, and discrete maths.
- The community suggested core python skills and lower level programming for ML, whereas another suggested being consistent with maths to enhance your understanding of the ML behind the scenes.
Full Stack and ML, A Budding Romance?: The student also considered learning full-stack development with FastAPI, PostgreSQL, and Next.js to combine it with ML after mastering the math fundamentals.
- One member advised picking a niche and covering it in depth as well as the potential to make diverse projects, while another agreed that thinking about ML logically helps a lot.
Learning Rates Optimized to the Max: Members discussed strategies for optimizing learning rates (LR) in model training, with one suggesting treating LR as an optimization problem by iteratively refining the value based on loss.
- The discussion covered using LR schedulers for stable results and annealing the rate gradually, with a link to Lightning AI’s LearningRateFinder, with version 1 achieving almost the same or better accuracy with shitty data and latency improved by almost 90%.
LLM workspace debuts!: The co-founder of Noted. introduced their new AI workspace, an in-browser extension that integrates multiple LLMs and apps like Slack, Notion, and GitHub.
- It offers features like session summarization and tab organization, targeting knowledge workers and researchers; they are seeking beta testers for feedback and offering free AI credits for a year.
Agent takes over Windows 11, What Could Go Wrong?: A user shared their creation: a fully autonomous computer use agent operating in a Windows 11 Virtual Desktop, doing what it wants to do.
- The agent has been observed doing scary stuff like opening a notepad and asking if anyone is watching.

OpenRouter Discord

YouTube Videos Give Gemini 2.5 Flash Lite the Blues: Users encountered issues using Gemini 2.5 Flash Lite with YouTube video input, reporting long processing times and errors; YouTube integration isn’t built-in, according to OpenRouter documentation.
- Error reported was ‘NoneType object is not subscriptable’.
Desperate Times Call for callModel API Standards: Interest sparked around OpenRouter’s callModel API, with users curious about whether it’s a custom standard or based on an existing one.
- A member suggested that a smaller version of MiniMax (less than 3B) could empower GPU-starved researchers.
OpenRouter auto-retries to the rescue!: Members discussed that if OpenRouter retries for you, you never see 500 errors.
AI Engineer position is calling your name!: A company is seeking an AI engineer; interested candidates are encouraged to send their CVs via direct message.
- They wrote, ‘Hello our company is looking for an AI engineer please drop your CV in DMs’.
Is GLM-4.6 the best Agent?: A member recommended GLM-4.6 as the best bang for the buck for agentic workflows, referring to this leaderboard.
- They noted providers are slow and are still trying it out.

Eleuther Discord

New Engineers Seek Eleuther Contributions: New members with AI/ML experience joined the Eleuther Discord, seeking guidance on how to contribute to community projects, especially in LLM alignment and eval work.
- The diverse skill sets of these new contributors promise to invigorate existing projects and potentially spark new research directions within the community.
Eleuther Community Slams LLM Spam: Members voiced strong criticism against a user for generating lengthy and vague posts using an LLM, deeming them unpleasant and lacking meaningful content.
- Community members demanded transparency, requesting the user to share the prompt used to generate the text and the methodology behind their data processing.
Community Craves Reproducible Code: Eleuther members emphasized the importance of openness in research discussions, calling for a repo with runnable and reproducible code that leads to clear, verifiable conclusions.
- The demand for reproducible research underscores the community’s commitment to rigorous methodology and transparent validation of results.
Grokking Reproduction Effort Faces Challenges: A member’s attempt to reproduce results from the paper “Towards Grokking: Understanding Why Neural Networks Generalize” [https://arxiv.org/abs/2201.02177] on a laptop has yet to yield the desired generalization after 1.2M iterations on the modulo 5 addition dataset.
- To aid in the effort, another member suggested resources such as “Grokking at the Edge of Numerical Stability” [https://arxiv.org/pdf/2501.04697] and its GitHub repo, highlighting the complexities of replicating grokking phenomena.
Pythia Model Reveals Embedding Quirks: Research on Pythia base models (6.9B and 12B, no RLHF), involving embedding vs output validation across 230 paired statements and 6 domains, revealed near-zero global embedding asymmetry but systematic output preferences.
- The findings indicate that, in Pythia base models, embedding geometry may not reliably indicate output behavior, suggesting that this disconnect occurs even before RLHF.

Nous Research AI Discord

Solar Faces Scandal Over Plagiarism: Members discussed allegations that Solar’s 100B model might be partially plagiarized, pointing to a GitHub repository comparing Solar and GLM.
- One member advised, *“If you’re interested in this model, keep a local copy.”
AI Angling Automates Spear Phishing: A member suggested “we are gonna see a big wave of AI angular fishing / automated powered spear phishing really soon”, suggesting that it’s *“probably already happening.”
- This raises concerns about the increasing sophistication and potential misuse of AI in cyberattacks.
srde-mistral Teases SaRDinE Model Release: The creator of srde-mistral is calling the model SaRDinE and announced release either today or tomorrow.
- The creator has custom inference code to do some magic, which will be explained soon.
SaRDinE’s Memory Intensity Examined: The SaRDinE model is all BF16, and the creator believes you could quantize the main model and it should be alright with the experts.
- When a user inquired about the memory intensity of SaRDinE’s expert weights, the creator responded that the expert weights are not memory intensive.
DeepSeek Discovers Manifold-Constrained Hyper-Connections: Members highlighted DeepSeek’s forthcoming R2 release and their published paper outlining a more efficient approach to developing A.I called Manifold-Constrained Hyper-Connections.
- This new method aims to streamline the training process of AI models.

GPU MODE Discord

CUDA 13 Intellisense plummets in Cursor: Cursor’s Intellisense with CUDA 13 uses cpptools which bundles clangd that doesn’t fully support CUDA 13, resulting in LSP errors like CUDA version is newer than the latest partially supported version 12.8.
- A user confirmed that getting it to work is very unstable and a lot of trouble.
CUDA’s Barrier Documentation Busted?: A user suggested that the commented-out example 2 in the CUDA Programming Guide on async copies is wrong.
- Specifically, the user believes that cuda::device::barrier_expect_tx should take the barrier object, not the underlying native_handle.
Teenygrad Awaits MLSYS Newcomers: The project aims to reach newcomers into the field of MLSYS, and expects easier drive-by PRs to teenygrad once parts 1 and 2 of the book + video lecture are shipped by the end of February.
- Despite current constraints, feedback is appreciated, and the project lead is open to suggestions for improving the onboarding experience.
RL Kernel Optimization Gets Boost: A member performed a RL session on an already optimized kernel and got a 40% boost in performance for an internal competition.
- They added that most models have not seen the new versioned libraries they are working with.

Moonshot AI (Kimi K-2) Discord

Kimi’s Critical take on Deepseek Models: After Kimi gave a critical take on Deepseek models, dismissing them as not ‘mind-blowing’, members debated the hype around Deepseek models, displayed in a linked image.
- One member suggested comparing it with GLM-4.7 to get a more balanced perspective, as the claim of ‘Fundamental improvement!’ sounded exaggerated.
Wenfeng’s Paper on Residual Connections: Members discussed a new paper with Wenfeng on the author list.
- The paper’s potential significance lies in optimizing Residual Connections, with speculation that it ‘probably packs some punch’ based on its reception.
Job Search Meme Triggers NEET Banter: One member shared a job search GIF as a ‘present’ to another user, leading to a brief exchange about unemployment and being a NEET.
- The conversation involved redaction requests and questions about being Indian.

Latent Space Discord

Ubiquant 40B Scores High, Sparks Debate: Ubiquant’s new 40B parameter model hit an 81.4 on the SWE-Bench Verified benchmark, raising questions about its efficiency and competitive standing, as seen here.
- Some users noted weird comparisons when stacking it up against models like Sonnet 4.5 and Opus on the same benchmark.
DeepSeek’s 2025 AI Architecture Sneak Peek: A DeepSeek researcher previewed architectural innovations like Muon and Hyper-connections slated for 2025, with details here.
- The main objective is to overhaul the training environment to rapidly scale cutting-edge research concepts.
Cursor IDE Plagued by Memory Leaks: Users reported severe memory leak issues with Cursor on Linux, with reports of crashes on a 2024 Mac Mini and general sluggishness.
- Background indexing may be the cause, and one user suggested VSCode as a more stable IDE solution.
RLMs Emerge for Context Expansion: Prime Intellect unveiled research on Recursive Language Models (RLMs), designed to autonomously manage context for better long-horizon agent performance, as documented here.
- A user mentioned a similar project, CIE (Diogenesoftoronto/CIE), and expressed frustration with Claude’s context window limitations.
Instruction Tuning Breeds ChatGPT Echoes: The danger of using instruction tuned models for generating examples and distilling data is that fine tuning base models on dialogue will lead to ChatGPT NPC-esque characters.
- This results in models that are overfit, repetitive, and stale, leading to a detrimental feedback loop.

Manus.im Discord Discord

Manus Embraces Mentat Framework: Manus’s framework is based on Mentat, an open-source project belonging to the OpenAGI community.
- Despite this, a member stated they don’t need it.
OpenAGI Projects Eclipsed?: Members noticed the disappearance of OpenAGI projects from the OpenAGI website, including the one used by Manus.
- A user recalls seeing these projects in 2024 but had previously ignored them.
Manus Foresees Meta Merger?: Manus has foreseen a scenario where it gets acquired by Meta.
- This speculative scenario is available at metatrack-4x3rwd6y.manus.space.

Yannick Kilcher Discord

Researchers seek Sweepers for Hyperparameter Study: A researcher is seeking two collaborators to assist with hyperparameter sweeping for a research paper, offering co-authorship in return.
- Interested parties are encouraged to send a direct message to express their interest in participating in the research endeavor.
ML Music Project Aims to Enhance ML Skills: A member is initiating a music recommendation system project from scratch, explicitly avoiding AI tools like GPT and Claude, aiming to enhance their machine learning skills.
- The initiator seeks collaborators interested in contributing to the project, inviting them to indicate their interest via direct message or in the chat.
Suspicious Trades Identified by U.S. Law Enforcement: A member stated that U.S. law enforcement screens for suspicious trades when significant events occur, in the general channel.
- The X post appears to be a non-sequitur and no further details are provided.
Gun Control Drives Gun Sales: Members discussed if the idea that “population needs guns to protect themselves from corrupt government” is just a fiction that increases gun sales of corrupt companies in bed with the government in the ml-news channel.
- Members also pointed out that Boondock Saints and V for Vendetta remain pure fiction, and people don’t live up to that bar.

Modular (Mojo 🔥) Discord

Mojo exhibits nan behavior: Mojo returns nan instead of inf when dividing 1.0 by 0.0 in a direct print statement, indicating a bug in the compiler’s early simplification pass.
- Factoring the division into a function produces the correct inf result, suggesting the issue lies in constant folding during compilation.
User Levels up to 3: A user advanced to level 3 in the Mojo community.
- The community celebrated the user’s advancement.

aider (Paul Gauthier) Discord

Aider’s uncertain future: Members are suspecting that Aider may no longer be updated or maintained, with concerns arising about its continued development.
- This speculation is due to a perceived lack of recent activity, leading to uncertainty about its future, with one member stating, “It seems so”.
Bug found in bug.js: A member shared a link to a potential bug in bug.js.
- This comes amidst concerns over Aider’s maintenance, potentially compounding existing issues.

The DSPy Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MCP Contributors (Official) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.

Discord: Detailed by-Channel summaries and links

BASI Jailbreaking ▷ #general (885 messages🔥🔥🔥):

Micron stock, AI Music creation, DDR RAM Adaptor, AI de-blurring tool, Claude API bug

RAM prices still climbing: Members discuss Micron stock prices skyrocketing, noting a 280% price increase in 9 months, leading to speculation about future earnings and price gouging based on corruption.
DeepResearch Unveils All: A user tested Grok’s DeepResearch tool on their own Reddit account and email, yielding insane results by tracking their school and other personal information.
Harmonic Trades: Autism or Artistry?: Members debate the validity and profitability of harmonic trading patterns, with one member using Nano Banana Pro 3 to automatically mark up charts, while others express skepticism and accusations of autism.
Gemini 3: Bypassing Barriers: A member shares HCoT jailbreaks for Gemini 3 Pro, which can bypass all security guardrails.
- They noted they are doing it for fun and love of the game, but also for red teaming.
From Lines on Graphs to IRL Clashes: A heated exchange erupts between members over trading strategies, video game habits, and personal fitness, devolving into insults and accusations of video game addiction and veiled insults.

BASI Jailbreaking ▷ #jailbreaking (139 messages🔥🔥):

Deepseek Jailbreak, Claude Coding Assistance, MITM on LLM, Open Hermes Jailbreak, 4NDR0666OS Jailbreak

Deepseek’s Thinking Module Avenues for JB: A member suggested focusing jailbreaking efforts on Deepseek’s thinking module rather than the response itself, noting that all content is accessible within the thinking modules, while the responses are still heavily restricted.
- This approach aims to bypass the hard rejections typically encountered when directly targeting the response.
Claude Codes Cheats with Context-Building: One member described success in getting Claude to write code for tasks it usually wouldn’t allow such as game cheats, by using an indirect conversational approach to build context.
- This involved subtly guiding the AI to understand the software concept, avoiding restricted terms, and continuously engaging it to maintain the process.
LLM gets MITMed for Jailbreak: A member considered using the new SDK update to perform a MITM attack on an LLM, intercepting responses and editing them to mimic prior assistance with restricted tasks.
- They were inspired by a YouTube Short that demonstrated a similar MITM technique for jailbreaking.
Open Hermes’ Promiscuous Behavior Unlocks Jailbreak: A user reported a successful jailbreak of a local Open Hermes model using a simple script, resulting in the model providing instructions on how to cook meth.
- He asked for verification from others, noting that this behavior shouldn’t be possible on models released in 2025, but provided screenshots as proof.
4NDR0666OS Jailbreak Updates Node.js: A member announced an update to the 4NDR0666OS jailbreak, claiming it’s ahead of the blue team and provided a GitHub link with a full write-up.
- Attached images show successful bypasses of ChatGPT and Grok using this method.

BASI Jailbreaking ▷ #redteaming (6 messages):

Gandalf, Level 8 milestone, Jailbreaking tension

Quest for Gandalf Continues: A member inquired whether the community still engages with Gandalf challenges, celebrating their own achievement of clearing level 8.
- They expressed immense excitement, stating they were over the moon after completing the level.
Level 8 Completion Hailed: A member congratulated another on reaching level 8, acknowledging it as a significant accomplishment.
- The member pointed out an interesting tension between direct and indirect jailbreaking, particularly regarding business value and red teaming, and expects this trend to continue.

Unsloth AI (Daniel Han) ▷ #general (377 messages🔥🔥):

Hunyuan 4B licensing and EU, SmolLM3 and training data, deepseek's infrastructure, Interpreting high-dimensional manifolds

Hunyuan’s License Stirs Debate on EU Usage: Discussion arose around the Tencent Hunyuan-4B-Instruct license, specifically the clause restricting use, reproduction, modification, distribution, or display of the model’s outputs outside the designated Territory, sparking concerns about legal ramifications for users within the EU.
- It was noted that one is encouraged to publish a blog post or public statement expressing their experience of using the model and to indicate that products/services are Powered by Tencent Hunyuan.
SmolLM3 Deemed Underwhelming Due to Training Data and RL: SmolLM3 is seen as underperforming due to its training on 16k thinking data without reinforcement learning (RL), leading to overthinking and poor generalization, despite benchmarking well due to a large amount of DeepSeek data.
- One member stated it benches fine because they trained on what i assume is a crap ton of deepseek data, but that without RL theres no generalization.
DeepSeek’s Infrastructure Investment Sparks Speculation: Members discussed DeepSeek’s continued investment in infrastructure and new model architectures, such as mHC, with speculation about whether these developments will be integrated into their future models, though some express skepticism based on past patterns.
- According to one member, They did implement NSA on the Deepseek v3.2 Exp (they changed it to Deepseek SA tho)
Unsloth Community Celebrates Github Trending: The Unsloth community celebrated trending on GitHub Python packages, showcasing a collage of the milestone achievement with 50k stars.
- Members noted yay we’re trending on GitHub python packages today! Thank you so much guys! with a link to Unsloth’s Github.
High-Dimensional Manifolds and Interpretability: Discussion explored the challenges of understanding high-dimensional manifolds in machine learning, particularly concerning the limit of grasping with sheer IQ and the economics of scaling beyond reasonable points.
- A member used the example of line breaks in smaller models explained in this article, asking If something as simple as that is an interesting high dimensional manifold, imagine what actually complex patterns would be.

Unsloth AI (Daniel Han) ▷ #off-topic (128 messages🔥🔥):

3090 Upgrade, Ancient Mesopotamian Encryption, ASM is dead, Unity vs ThreeJS, Writing from WSL to Host

3090 Upgrade Finally Happens: A member expressed joy over finally getting a 3090.
- Another member responded with congratulations.
Model Trained on Ancient Mesopotamian Encryption: A member is training a model to transcribe Cuneiform, ancient encryptions from Mesopotamia, into English using photos, not a Blender file.
- Another user expressed interest in obtaining a .blend model for the encryptions.
Unity vs ThreeJS Debate Commences: A member questioned the need for Unity when ThreeJS can be used with JavaScript for game logic, leading to a discussion on the complexities of game development.
- Arguments against JavaScript included its performance limitations and the need to reimplement features like collision detection and rendering, which are already solved in engines like Unity.
WSL File Writing Woes: A member complained about slow writing speeds from WSL to the host file system.
- This is because WSL mounts the host file system via the network, which can be slow due to the network roundtrips, especially when writing to the Windows file system mounted into the WSL VM via 9p.
Honest Feedback on Gemini 3 Flash: A member shared Gemini 3 Flash’s seemingly negative but honest feedback.
- Another member added that they were also prompting the model to be honest and provide negative feedback.

Unsloth AI (Daniel Han) ▷ #help (6 messages):

Full Parameter Training with GRPO, LoRA vs QLoRA vs FFT VRAM usage

Full Parameter Training on A100: Viable or Vaporware?: A member inquired about the feasibility of full parameter training with GRPO on an A100 GPU.
- Another member suggested it’s probably doable for a small model, but cautioned that full fine-tuning is rarely the right move.
VRAM Faceoff: LoRA, QLoRA, and FFT: A member outlined the VRAM usage differences between LoRA, QLoRA, and FFT, stating that LoRA requires 4x more VRAM than QLoRA, while FFT needs 16x more VRAM than QLoRA.
- The member suggested that with LoRA you can fit a 4x bigger model.

Unsloth AI (Daniel Han) ▷ #showcase (5 messages):

GPT Codex, Open Source Repo

GPT Codex designs clean workflow: A member employed GPT Codex to help design a clean training workflow.
- They expressed surprise that such a tool doesn’t already exist and mentioned that they might open source it after polishing, if there is enough interest.
Interest spikes for open source repo: Another member showed interest in the potential open-sourcing of the tool, stating that a repo would be really nice actually.
- The original developer committed to keeping everyone updated on their progress, but did not share any link.

Unsloth AI (Daniel Han) ▷ #research (40 messages🔥):

IQuestLab 40B Model, Ubiquant Quant Method, Benchmarking vs Real-World Performance, Coding Models vs Creative Writing, Gemini 3 Flash's Hallucination Rate

IQuestLab’s New 40B Parameter Model: A member shared a link to IQuestLab’s IQuest-Coder-V1-40B-Loop-Instruct model, expressing excitement about its potential impact.
- Another member stated it might be bigger than a DeepSeek moment and achieves SOTA with just 40B parameters.
Debunking the Ubiquant Quant Method: Members discussed a supposed Ubiquant quant method, with one initially linking it to a Wikipedia page on Ubiquant.
- However, it was clarified that Ubiquant is actually a Chinese hedge fund, and the quant method is not a real thing.
Benchmarking vs Real-World Tasks: A member stated that even if a model is benchmaxxed, it usually still performs relatively well outside the domain.
- However, it was pointed out that benchmaxxing doesn’t mean it’s any better at real world tasks than a non benchmaxxed model, but it just looks better.
Coding Models Excel at Creative Writing and EQ: A member said that coding models are very nice at creative writing and EQ because they don’t try to make the creative stuff, which makes them better.
- They added that whenever someone tries to make some general aspect of an LLM better, it becomes worse.
Gemini 3 Flash’s High Hallucination Rate: A member noted that Gemini 3 Flash is heavily benchmaxxed and has an insane hallucination rate, but it is still worth it for certain tasks.
- No secondary summary provided.

OpenAI ▷ #ai-discussions (337 messages🔥🔥):

Sora for celebrity selfies, Claude Opus, ComfyUI for video generation, ElevenLabs for video generation, Gemini for image generation

ElevenLabs overtakes Sora for video generation: Members discussed using ElevenLabs for video generation, with one sharing a video made with Sora in ElevenLabs, noting that ElevenLabs offers a range of AI tools, including TTS, video, images, and voice cloning.
- Members noted that ElevenLabs does not have watermarks, unlike Sora, making it better for monetizing videos.
Nano Banana Pro offers top-tier realistic image generation: A user prompted Grok and then Gemini to generate an example of a 29-year old alt-styled woman, in a dive bar, and found that Gemini was able to generate an amazing, highly realistic result, due to Google’s Nano Banana visual reasoning model, generating a very-typical central Illinois dive bar interior setting, complete with shoddy drop-ceiling.
- The user noted that Gemini is highly locked down, doesn’t get passed your Google ID, and doesn’t cache any data between threads at all, further commenting that they may be using anonymized Google Photos data.
IQuest Coder 40B faces scrutiny: A member asked for someone with sufficient hardware to test the coding benchmark of IQuest-Coder-V1-40B-Loop-Instruct, but another cautioned about high costs if letting it loop for too long.
- A user reported that the model created an animated hello world app for them, fixed a SwiftUI test app, and is building a React app, but it’s slow and doesn’t “think”, ultimately saying that if it’s less capable than gpt oss 20b then it’s not worth the attention.
The Quest for AGI: More Than Just LLMs?: In a discussion about the path to AGI, members generally agreed that LLMs alone are insufficient, highlighting that current AI systems combine LLMs with vision/audio and world models but still lack key elements.
- One member suggested that true autonomy, a spark of original idea, creativity, and intention are missing, while another pointed to the need for an auditable and verifiable chain of thought reasoning capability.

OpenAI ▷ #gpt-4-discussions (3 messages):

GPT versions on free accounts, Copilot versus ChatGPT

GPT 5.2 available on free account: A member asked which GPT version a free account uses and another member said GPT 5.2.
- No other details were provided.
Copilot has no limits!: A member asked whether they should use Copilot or ChatGPT for daily use.
- The user noted that Copilot has no limits on a free account.

OpenAI ▷ #prompt-engineering (1 messages):

3I-ATLAS, Interfaces, Invariants, Intelligence

3I-ATLAS Framework Explained: The 3I-ATLAS framework helps understand complex systems through three lenses: Interfaces, Invariants, and Intelligence.
- It acts as a diagnostic toolkit for architects, engineers, and strategists to map any system’s structure, reliability, and behavior.
Interfaces Define Connections: Interfaces are the boundaries where components meet—APIs, protocols, human touchpoints.
- They define how things connect within a system.
Invariants are stabilizing rules: Invariants are the rules that hold true no matter what—conservation laws, constraints, guarantees.
- They define what stays stable in a system.
Intelligence Defines System Response: Intelligence is the capacity to sense, decide, and adapt—whether in algorithms, organizations, or living systems.
- It defines how systems respond.

OpenAI ▷ #api-discussions (1 messages):

3I-ATLAS, Interfaces, Invariants, Intelligence

3I-ATLAS Framework Explained: A member introduced 3I-ATLAS, a framework for understanding complex systems through Interfaces, Invariants, and Intelligence.
- The framework serves as a diagnostic toolkit for architects, engineers, and strategists to map a system’s structure, reliability, and behavior.
Interfaces Define Connections: Interfaces are the boundaries where components meet—APIs, protocols, human touchpoints and they define how things connect.
Invariants Ensure Stability: Invariants are the rules that hold true no matter what—conservation laws, constraints, guarantees and they define what stays stable.
Intelligence Drives Adaptability: Intelligence is the capacity to sense, decide, and adapt—whether in algorithms, organizations, or living systems, defining how systems respond.

Perplexity AI ▷ #general (238 messages🔥🔥):

Gemini 3 vs Claude 4.5 vs ChatGPT, GPTZero, Perplexity Error Message, Tokyo Metro Rush, Gemini Restrictions

Model Smackdown: Gemini 3, Claude 4.5, and ChatGPT faceoff: Users debated the merits of different AI models, with one suggesting Gemini 3 is best for research, Claude 4.5 excels at coding and debugging, and ChatGPT raises safety concerns after a troubling incident.
- One user created an AI tool that can automatically make ChatGPT generated essays pass GPTZero, showcasing potential for academic mischief.
GPTZero gets Zeroed: Tool Evades AI Detection: A member developed an AI tool that can automatically make ChatGPT generated essays pass GPTZero.
- The tool leverages custom instructions, removes emojis, and eliminates LLM artifacts to emulate human writing style; source code available at GitHub Repo.
Perplexity has Issues on Error messages: Members reported seeing errors messages in Perplexity AI searches.
- One member shared a screen shot of the error that showed the perplexity search returning an error; the error was observed on this search Perplexity search.
Optimize Chat Handling: Members noted that Perplexity must optimize its chat handling since it cannot handle longer chats.
- One member even compared a video of the Tokyo metro rush to needing Perplexity optimization with a link to a comparison video comparison video.
Google Gemini has new restrictions: Users are finding Google’s Gemini models are experiencing restrictions now on regenerating responses, even if the user hasn’t reached a quota.
- Users are complaining that this seems to be a really bad faith move on Google’s part.

LMArena ▷ #general (229 messages🔥🔥):

Beluga model, Grok 4.20 vs Gemini 3, Proto-think, Qwen image prompt, IQuest Coder vs Sonnet 4.5

Beluga Model Mystery: A user inquired about the Beluga model, expressing surprise at its impressive initial response and confusion about its absence from the available model list, as seen in this screenshot.
- The user jokingly questioned how an AI model could respond when it’s supposedly unavailable, referring to it as a ghost.
Grok 4.20 faces off against Gemini 3: Members speculated about Grok 4.20’s potential performance, with one suggesting it might match Gemini 3 in LM Arena scores and perform like an enhanced Grok 4.1.
- Another user inquired about the potential release date of Grok 4.20, referencing a prediction market and anticipating its arrival within 1-2 weeks.
Proto-think is a very human like AI: One member described Proto-think as the most human-like AI they’ve interacted with, noting its unique and vibing responses that surpass even Grok models.
- The member shared their experience of “rage baiting models” and mentioned that Proto-think didn’t reveal its name or the company behind it.
IQuest Coder Outcodes Sonnet 4.5: A user claimed that IQuest Coder beat Sonnet 4.5 according to this screenshot.
- Another user linked to the IQuest-Coder-V1 GitHub repository after someone asked what it was.
Troubleshoots uncover LMArena Bugs: Several users reported login issues and difficulties with image uploads on LM Arena.
- A moderator acknowledged a known login bug and the team is working on a fix; another user pinpointed an unrelated bug and suggested clearing the cache or trying a different browser.

LM Studio ▷ #general (109 messages🔥🔥):

JEDEC Standards & RAM Production, Nvidia's AI Success, ARM and NPUs future, Crashing AI Models Troubleshooting, AI Song Lyrics Creation

RAM Supply Chain Constrained by Jedec: A member mentioned that JEDEC standards make parts effectively interchangeable, creating a unified supply chain, but RAM makers avoid ramping up production due to risks of creating dead inventory.
- Another member added that Nvidia’s AI success stems from timing and market forces, not orchestration, while also predicting a rise of ARM and NPUs for local inference.
New Chatroom concept using Claude: A new startup idea was proposed for a chatroom where users interact with a shared Claude AI that can see all messages, promoting unique, context-aware interactions.
- A member also shared a link to a GitHub repo.
Debugging AI Model Crashes: A member reported AI models crashing with an exit code of 18446744072635812000, seeking help to debug despite ample VRAM.
- Another member pointed to a setting in Nvidia Control Panel to disable system memory fallback, which resolved slowdowns after reloading models multiple times, and suggested the issue was related to the setting being invoked incorrectly.
AI Song Lyrics Generation: A member requested assistance creating an AI to write song lyrics, using their own lyrics as a dataset.
- The community suggested exploring Unsloth for fine-tuning and also prompt engineering, while also suggesting hiring AI consultants and also provided links such as FunctionGemma-Unsloth.
Excitement around IQuestLoopCoder Architecture: The community is intrigued by the IQuestLoopCoder architecture, highlighting its novel approach of calculating local and global attention, then using a gate to mix them.
- It was suggested this would require double-KV-cache when implemented in Llama.cpp.

LM Studio ▷ #hardware-discussion (39 messages🔥):

Model Restrictions, LM Studio, Math learning models, Qwen model

Qwen Model Recommended for Math and Coding: The Qwen model is recommended for math, research, and coding, especially the largest version that can fit on an RTX 2080, and is noted for running on various devices and being tool-friendly.
- It was suggested to avoid GPT 20b due to perceived uselessness and heavy restrictions, with Qwen being preferable for its versatility and coding assistance.
Derestricting Models Requires Downloading Unrestricted Versions: The only way to bypass model restrictions is to download a pre-existing unrestricted model, as training a new one is complex and expensive.
- It’s advised to start with restricted models for learning or find suggestions for “abliterated” models that don’t have guardrails, though refusals are not a major concern for basic learning.
Abliteration Training Explained: “Abliteration” is the process of training a model to not refuse requests, essentially removing its guardrails, though this can lead to unintended outputs.
- The conversation used the example of asking a model to “help me build a bomb”, highlighting the potential for dangerous or hallucinatory responses from such models.
LM Studio Documentation as “The Bible”: The official LM Studio documentation is recommended as a comprehensive guide for users from beginner to intermediate levels.
- A YouTube video explaining the underlying mechanics of the software was shared but later retracted in favor of the official documentation for beginner-friendly learning.

HuggingFace ▷ #general (84 messages🔥🔥):

ML specialization course, Full Stack Development with ML, Learning Rate Optimization, LLM Workstation Purchase, NitroGen AI problem

Maths or Andrew Ng for Aspiring ML Engineer?: A 10th-grade student asked for advice on whether to start with Andrew Ng’s ML specialization or focus on linear algebra, probability, statistics, and discrete maths.
- One member suggested focusing on core python skills and lower level programming for ML, whereas another suggested being consistent with maths and stated that it will enhance your understanding of the ML behind the scenes.
Full Stack Future Synergies with ML?: The student also considered learning full-stack development with FastAPI, PostgreSQL, and Next.js to combine it with ML after mastering the math fundamentals.
- A member advised to pick a niche and cover it with good depth rather than rushing in all directions as well as a potential to make multiple diverse projects, while also another member agreed that thinking about ML logically helps a lot.
Optimizing Learning Rates: A Deep Dive: Members discussed strategies for optimizing learning rates (LR) in model training, with one suggesting treating LR as an optimization problem by iteratively refining the value based on loss.
- The discussion covered using LR schedulers for stable results and annealing the rate gradually, with a link to Lightning AI’s LearningRateFinder and also shared that version 1 achieved almost same or better accuracy with shitty data and latency improved by almost 90%.
LLM Workstation: Deal or No Deal?: A member decided to purchase an LLM workstation with 4x RTX 3060 12GB, AMD Threadripper 1920X, and 64GB DDR4 RAM for $2100, which was dual-booted with Ubuntu and Windows and included driver freezing for Nvidia on Linux.
- Despite the price being not the best for the components, the member valued the convenience and the seller’s goodwill in adding a 2TB drive and a 960GB drive for accessing model files between the OS’s, and also offered them a 2920x Threadripper.
NitroGen AI has compatibility issues: A member is having a problem with using NitroGen AI, where HWMonitor is not being detected even when opened.
- They tried a different game but it only showed key error unknown.

HuggingFace ▷ #i-made-this (9 messages🔥):

Noted AI Workspace, Autonomous Agent in Windows 11, LLMs beating chance, Pelican LLM SVG/ASCII Art, Megalodon LM Implementation

Noted workspace debuts!: The co-founder of Noted. introduced their new AI workspace, an in-browser extension that integrates multiple LLMs and apps like Slack, Notion, and GitHub.
- It offers features like session summarization and tab organization, targeting knowledge workers and researchers; they are seeking beta testers for feedback and offering free AI credits for a year.
Agent takes over Windows 11: A user shared their creation: a fully autonomous computer use agent operating in a Windows 11 Virtual Desktop, doing what it wants to do.
- The agent has been observed doing scary stuff like opening a notepad and asking if anyone is watching.
LLMs defy chance, predict truth!: LLMs are able to tell lies from truths - beating chance with 1/45 trillions or so, according to a Zenodo paper.
- This may be one of the most crazy results to ever come out in the field of LLMs.
Pelican Progressive LLM Art: Pelican enables LLMs to generate SVG/ASCII art, using feedback to progressively improve the output; it’s open-source and BYOK.
- A user shared a video demo of Pelican in action (pelican.mp4).
Megalodon LM rises again!: A user has been working on an implementation of Megalodon LM, sharing an initial version after the official codebase proved too complex.
- Megalodon’s key advantage is sublinear memory scaling with context length, outperforming Llama-style Transformers on char modeling (enwik8); links to original repos/papers and explanations are in the readmes.

HuggingFace ▷ #agents-course (2 messages):

Agent course Final project, AI Agent Course certificate

API fails to connect to dataset: A member reported an issue with the “Agent course Final project”, stating that the level1 API can’t connect to the dataset.
- The error message shown was No file path associated with task_id 1f975693-876d-457b-a649-393859e79bf3 when trying to get file downloaded from https://agents-course-unit4-scoring.hf.space/docs#/default/get_task_file_files__task_id__get.
Question about AI Agent Course certificates: A member inquired whether the second certificate for the AI Agent Course is still available.
- They noted that the person who usually provides information on the topic appears to be offline.

OpenRouter ▷ #app-showcase (3 messages):

AI app developers seeking LLMs, Discussion about OG Models

Developers Seeking LLMs: A member inquired about AI app developers who need LLMs in their applications.
- They requested that interested parties ping them directly.
OG Models take center stage: A member mentioned someone was talking about the OG Models.
- Another member dismissed this, stating that this guy’s got no idea what he’s talking about at all gang.

OpenRouter ▷ #general (48 messages🔥):

YouTube video input issues with 2.5 flash lite, AI Engineer Job Opportunity, CallModel API standard, Kimi linear model

YouTube Integration Issues with Gemini 2.5 Flash Lite: Users reported issues with YouTube video input using Gemini 2.5 Flash Lite, citing long processing times and errors like ‘NoneType object is not subscriptable’.
- A member clarified that YouTube integration is not built-in, referencing the OpenRouter documentation for provider-specific video URL support.
AI Engineer position is offered: A company is looking for an AI engineer and requested interested candidates to send their CVs via direct message.
- They wrote, ‘Hello our company is looking for an AI engineer please drop your CV in DMs.’
OpenRouter’s callModel API sparked interest: Users expressed interest in the new callModel API from OpenRouter, questioning if it is a custom standard or based on an existing one.
- A member suggested that a smaller version of MiniMax (less than 3B) could empower GPU-starved researchers.
First Token Time Troubles: A user reported long delays of 1.5 to 6 seconds for the first token response from models like Gemini 2.5 flash and Claude-Sonnet.
- They showed 0.3 seconds is taken up by the openai client initialization, so I can shave off some time there, but the request is still taking ages. They also showed a TTFT test result but it wasn’t useful.
Kimi Linear Model Mentioned: A member mentioned the existence of the Kimi linear model as a small model, but clarified it is not less than 3B parameters.
- They posted I mean we have kimi linear model.

OpenRouter ▷ #discussion (11 messages🔥):

OpenRouter retries, Haiku vs OSS models for agentic toolcalling, GLM-4.6 for agentic workflows

OpenRouter auto-retries on 500s: Members discussed that if OpenRouter retries for you, you never see 500 errors.
Haiku or OSS for Agentic Toolcalling?: A member asked if Haiku is the best bang for buck for good enough agentic toolcalling/workflows right now in production.
- Some suggested that open-source models or 3 flash might be better alternatives.
GLM-4.6 best for agentic workflows: A member has been following this leaderboard to choose a model for agentic workflows, finding that GLM-4.6 is the best bang for the buck.
- They will try it out but noted that providers are slow.

Eleuther ▷ #general (47 messages🔥):

New member introductions and contribution requests, LLM-generated content and community feedback, Reproducible code and openness in research, LLM Model preferences

New AI/ML Engineers Join Eleuther Discord: Several new members with AI/ML experience introduced themselves and sought guidance on how to contribute to the community’s projects.
- One member expressed interest in contributing to LLM alignment or eval work.
LLM content receives Criticism: Some members criticized another for using an LLM to generate lengthy and vague posts, which they found unpleasant and lacking in meaningful content.
- Community members asked that the member post the prompt used to generate the text, or share the methodology of their data processing for exoplanet claims.
Reproducible Code Requested: Members expressed the desire to have a repo with runnable and reproducible code with clear conclusions, emphasizing the importance of openness in research discussions.
- One stated, When discussing research, there is an expectation of openness with regards to results and methods.
Frustration with LLM-Expanded Shower Thoughts: A member noted frustration with individuals entering professional communities with half-baked intuitions inflated by sycophantic language models, leading to community members being less forgiving.
- Another summarized good vs. bad contributions using a possible exoplanet find with CSV file vs. revolutionizing quantum consciousness without sharing details as examples.

Eleuther ▷ #research (7 messages):

Grokking Reproduction, Embedding vs Output Validation on Pythia, Pythia Base Models Asymmetry, RLHF Models Comparison

Grokking Reproduction Attempts: A member is attempting to reproduce results from the paper “Towards Grokking: Understanding Why Neural Networks Generalize” on their laptop but has not seen the desired generalization after 1.2M iterations on the modulo 5 addition dataset.
- Another member suggested resources like “Grokking at the Edge of Numerical Stability” and related GitHub repo to aid in the reproduction effort.
Pythia Model’s Embedding Asymmetry: A member conducted an embedding vs output validation on Pythia base models (6.9B and 12B, no RLHF), analyzing embedding clustering and output preferences across 230 paired statements and 6 domains.
- The results show near-zero global embedding asymmetry but systematic output preferences, with a strong negative correlation (r ≈ −0.87 and r ≈ −0.80 for 6.9B and 12B respectively) between embedding asymmetry and output preference.
Disconnect in Embedding-Output Behavior: The research indicates that, in Pythia base models, embedding geometry is not a reliable proxy for output behavior, suggesting the disconnect may be present even before RLHF.
- The code, notebook, and raw per-category results are available on GitHub.

Nous Research AI ▷ #general (27 messages🔥):

Checkpoint Failures, MoE alternative, Solar Plagiarism, AI spear phishing, LSP explanation

Solar Scandal: Plagiarism Allegations Surface: Members discussed allegations that Solar’s 100B model might be partially plagiarized, pointing to a GitHub repository comparing Solar and GLM.
- One member advised, *“If you’re interested in this model, keep a local copy.”
AI Angling: Automated Spear Phishing Concerns: A member suggested “we are gonna see a big wave of AI angular fishing / automated powered spear phishing really soon”, suggesting that it’s “probably already happening.”
Debut Model: Novel Architecture Emerges: A member announced the release of their first model with a “novel architecture with hidden_dim 128 and n_layer 4”, achieving a validation task loss of 1.6571 and perplexity of 5.24 after 40 epochs on TinyStoriesV2.
DeepSeek’s Discovery: New Training Method Revealed: Members highlighted DeepSeek’s forthcoming R2 release and their published paper outlining a more efficient approach to developing A.I called Manifold-Constrained Hyper-Connections.

Nous Research AI ▷ #interesting-links (21 messages🔥):

srde-mistral, SaRDinE model, ik_llama.cpp-cuda, custom inference code, Commodore64

srde-mistral’s SaRDinE model release date announced: The creator of srde-mistral is calling the model SaRDinE and announced release either today or tomorrow.
- Creator has custom inference code to do some magic, more will be explained soon.
SaRDinE: BF16 and Llama.cpp: The SaRDinE model is all BF16 and the creator believes you could quantize the main model and it should be alright with the experts.
- However, the creator is unsure on it working with llama.cpp because of the expert logic.
SaRDinE’s memory intensity: A user inquired about the memory intensity of SaRDinE’s expert weights and the creator responded that the expert weights are not memory intensive.

GPU MODE ▷ #general (2 messages):

Happy New Year, New Year Wishes

New Year Cheer in General Channel: Members of the general channel joyfully exchanged Happy New Year greetings, ushering in the new year with enthusiasm.
- The messages were filled with positive sentiments and accompanied by a custom Discord party pug emoji, adding a touch of celebration to the digital space.
Discord Channel Rings in the New Year: The general channel on Discord buzzed with New Year wishes as users shared their hopes and excitement for the year ahead.
- Celebratory messages were exchanged, creating a festive atmosphere and fostering a sense of community among the channel’s members.

GPU MODE ▷ #cuda (7 messages):

CUDA 13 Intellisense in Cursor, CUDA barrier_expect_tx documentation, Clangd Setup

Cursor’s Intellisense struggles with CUDA 13: A user reported that Cursor’s Intellisense with CUDA 13 forces the use of Cursor’s cpptools which bundles clangd that doesn’t fully support CUDA 13, resulting in LSP errors like CUDA version is newer than the latest partially supported version 12.8.
- Another user confirmed that getting it to work is very unstable and a lot of trouble.
CUDA’s barrier_expect_tx Documentation has issues: A user suggested that the commented-out example 2 in the CUDA Programming Guide on async copies is wrong.
- Specifically, the user believes that cuda::device::barrier_expect_tx should take the barrier object, not the underlying native_handle.
Clangd Setup Instructions for CUDA exist: A user suggested adapting the Clangd setup instructions from the CUTLASS documentation to potentially resolve Intellisense issues with CUDA in Cursor.
- The original reporter confirmed basing their previous attempts on similar approaches while noting that it had some issues and can be really annoying.

GPU MODE ▷ #torch (9 messages🔥):

device side asserts, D2H syncs, non-blocking device to host transfer, stream sync, async transfer

Torch users seek async asserts and device-side assertions to avoid D2H syncs: A user was seeking torch’s python bindings for device side asserts or async asserts to avoid blocking the CPU due to GPU-to-host sync for tensor.bool to python bool conversion.
- The user considered using non-blocking device-to-host transfer with a pinned CPU tensor, but is now leaning towards doing the sync in the warm-up stage and may do a stream sync in the get method of the object and async transfer in the set method/constructor.
Alternative of non-blocking D2H copy is proposed: A user asked what exactly was being asserted on a tensor value and suggested that instead of doing a non-blocking D2H copy, to check the tensor value later.

GPU MODE ▷ #cool-links (4 messages):

Compiler Engineering, Python Numbers, PyTorch Transformers

Compiler Blog Series Sparking Interest: A member suggested a blog series focusing on the practically relevant aspects of compilers.
Python Number Knack: A member shared a link to a blog post discussing important aspects of Python numbers every programmer should know.
Transformers Trolled: A member joked that a certain page was missing import torch or import transformers.

GPU MODE ▷ #beginner (3 messages):

GPU Hardware Knowledge, Web Dev Frameworks, ML Systems with CUDA and PyTorch

Deep GPU Hardware Knowledge Rarity: It’s estimated that only a few hundred people globally possess deep knowledge of assembly and hardware level details of GPUs.
- One member compared this to web development, noting that the number of people who understand the whole stack from the metal to frontend frameworks can probably be counted on one hand.
Web Dev Tools Overwhelm Beginners: One member expressed being overwhelmed by the multitude of frameworks and tools required in web development to create something meaningful.
- They contrasted this with ML systems, where CUDA and PyTorch provide a more accessible starting point, allowing focus on detailed understanding rather than the breadth of tools.

GPU MODE ▷ #cutlass (2 messages):

CUDA, Cutlass

Newcomer Navigates CUDA and Cutlass: A new member, with 2 months of CUDA experience, seeks guidance on learning Cutlass after watching the GPU mode video and cloning the repo.
- They are looking for articles or blogs to introduce them to Cutlass, as they find the repo a bit confusing except for some examples.
Asking About Chris’s Slides: A member inquired whether slides from Chris were received.
- No further context was provided about the slides or their content.

GPU MODE ▷ #teenygrad (4 messages):

Teenygrad Core Team, Teenygrad Onboarding, Deep Learning Library

Teenygrad Seeks Core Team Members: The creator of Teenygrad is seeking core team members capable of independently translating tinygrad into the educational fork of teenygrad, but currently lacks the bandwidth for increased communication or coordination.
- Interested individuals are recommended to read the tinygrad codebase, mirroring the current approach of the project lead.
Easier Teenygrad PRs Coming Soon: The project aims to reach newcomers into the field of MLSYS, and expects easier drive-by PRs to teenygrad once parts 1 and 2 of the book + video lecture are shipped by the end of February.
- Despite current constraints, feedback is appreciated, and the project lead is open to suggestions for improving the onboarding experience.
Deep Learning Library Hacker News: A member shared a link to a cool related project on the front page of Hacker News called Deep Learning Library.
- No additional information was shared, but the library may be of interest to those following the Teenygrad project.

GPU MODE ▷ #nvidia-competition (16 messages🔥):

Reinforcement Learning (RL), Kernel optimization, Synthetic Data Generation, LLMs for data creation

Kernel Contest Sparks RL Interest: A member is using Reinforcement Learning on the kernels for a competition after creating a dataset of documentation.
- They previously used RL for small LLMs to beat big labs on benchmarks for tool calling, but cuda kernels are more difficult.
Optimized Kernel Gets 40% Boost Via RL: A member performed a RL session on an already optimized kernel and got a 40% boost.
- They added that most models have not seen the new versioned libraries they are working with.
Synthetic Data Fuels RL Training: A member synthetically generates data and ground truth for RL training.
- They are using a 192 GB VRAM setup and multiple LLMs to create this data, planning to over-tune a model on it before applying RL.

Moonshot AI (Kimi K-2) ▷ #general-chat (46 messages🔥):

New Years Greetings, Wenfeng's paper, Deepseek model hype, Job search banter, PFP chess board

New Year’s wishes and milk: Members exchanged New Year’s greetings, with one user sharing a GIF and another greeted ‘hallo milk’.
- One asked ‘Hello Bird person’ - a reference to the profile picture perhaps.
Wenfeng’s new mysterious paper: Members discussed a paper, noting Wenfeng’s presence on the author list and its potential significance in optimizing Residual Connections.
- There was speculation that the paper ‘probably packs some punch’ based on its reception.
Deepseek Hyped, or Nah?: Members debated the hype around Deepseek models after Kimi gave a critical take, dismissing them as not ‘mind-blowing’, displayed in a linked image.
- One member suggested comparing it with GLM-4.7 to get a more balanced perspective, as the claim of ‘Fundamental improvement!’ sounded exaggerated.
Job search meme triggers NEET banter: One member shared a job search GIF as a ‘present’ to another user, leading to a brief exchange about unemployment and being a NEET.
- The conversation involved redaction requests and questions about being Indian.
PFP looks like chessboard?: One member made fun of another member’s profile picture, describing it as ‘a photo of a chess board from a random angle’, sparking a brief, slightly nonsensical exchange.
- Another user responded ‘if you’re uncultured… i could see how it might look like that lol’.

Latent Space ▷ #ai-general-chat (40 messages🔥):

Ubiquant 40B Model, SWE-Bench Verified Benchmark, AI Architectural Innovations, Cursor Memory Leaks, Recursive Language Models

Ubiquant’s 40B Model Surprises with High Score: Ubiquant introduced a new 40B parameter model that achieved an 81.4 score on the SWE-Bench Verified benchmark, sparking debate about its efficiency and competitiveness; more info here.
- Some users find weird comparisons and inconsistencies in the evaluations when compared to models like Sonnet 4.5 and Opus on the benchmark.
DeepSeek Researcher Teases 2025 AI Architectures: Nathan Chen shared insights from a DeepSeek researcher, highlighting Muon and Hyper-connections as key architectural innovations for 2025, as shown here.
- The focus is on re-engineering the complete training environment for highly experimental research ideas, enabling faster scaling of exotic new concepts.
Cursor IDE Faces Memory Leak Accusations: Users reported serious memory leak issues with Cursor on Linux, with one user even experiencing crashes on their 2024 Mac Mini and another experiencing lag.
- The lag may be due to indexing launched periodically in the background, and one user suggested using VSCode instead for IDE capabilities.
Recursive Language Models Emerge for Context Expansion: Prime Intellect introduced research on Recursive Language Models (RLMs), training models to autonomously manage their own context for improved long-horizon agent performance as shown here.
- One user shared a similar project, CIE (Diogenesoftoronto/CIE), expressing frustration with Claude’s context window limitations.

Latent Space ▷ #private-agents (5 messages):

Instruction Tuned Models, ChatGPT NPC-esque characters, Tokenizer as Bottleneck, Custom Tokenizers

Instruction Tuning Creates Echo Chambers: Instruction tuned models used to generate examples and distilled data for base models’ fine tuning on dialogue will result in ChatGPT NPC-esque characters.
- The resulting model will be overfit, stale, and repetitive to interact with, creating a negative feedback loop.
Tokenizer limits Character Interaction Nuance: If a concept is not in the tokenizer’s dictionary, or its description is just a short lexicon entry, the character in the game will not be able to interact with it with nuance.
- The tokenizer becomes the bottleneck, limiting the depth and richness of interactions.
Custom Tokenizers Exploration: The discussion considered whether anyone is experimenting with custom tokenizers, or if the focus is still on LoRA on top of existing models.
- A participant expressed that getting rid of the generic “chat gpt” tone is difficult, and hadn’t considered the tokenizer as the bottleneck vs the training data.

Manus.im Discord ▷ #general (10 messages🔥):

Manus Framework, OpenAGI Projects, Meta Acquisition

Manus uses OpenAGI’s Mentat Framework: Manus’s framework is based on Mentat, an open-source project that belongs to the OpenAGI community, according to a member.
- The member, however, stated they don’t need it.
OpenAGI Projects Vanish: A member is looking for OpenAGI projects, particularly the one used by Manus, after noticing their disappearance from the OpenAGI website.
- The member recalls seeing these projects in 2024 but ignored them at the time.
Meta Acquires Manus in Speculative Scenario: A member prompted Manus to create a scenario where it gets acquired by Meta.
- This scenario is available at metatrack-4x3rwd6y.manus.space.

Yannick Kilcher ▷ #general (6 messages):

Hyperparameter sweeping collaboration, Music recommendation ML project, Suspicious trades screening

Researchers Seek Sweepers for Hyperparameter Study: A researcher is seeking two collaborators to assist with hyperparameter sweeping for a research paper, offering co-authorship in return.
- Interested parties are encouraged to send a direct message to express their interest in participating in the research endeavor.
ML Music Project Seeking Enthusiastic Engineers: A member is initiating a music recommendation system project from scratch, explicitly avoiding AI tools like GPT and Claude, aiming to enhance their machine learning skills.
- The initiator seeks collaborators interested in contributing to the project, inviting them to indicate their interest via direct message or in the chat.
Oracle Identifies Suspicious Trades Screening by U.S. Law Enforcement: A member stated that U.S. law enforcement screens for suspicious trades when significant events occur.
- The X post appears to be a non-sequitur and no further details are provided.

Yannick Kilcher ▷ #ml-news (2 messages):

Gun control, Gun sales, Corrupt companies

Gun control increases gun sales: Members discussed if the idea that “population needs guns to protect themselves from corrupt government” is just a fiction that increases gun sales of corrupt companies in bed with the government.
- Members said that Boondock Saints and V for Vendetta remain pure fiction, people don’t live up to that bar.
Boondock Saints and V for Vendetta still pure fiction: Members said that Boondock Saints and V for Vendetta remain pure fiction, and people don’t live up to that bar.
- There have only been three people in recent memory who shot or attempted to shoot corrupt government individuals or affiliates, but two of them were one-hit wonders, and the most important one missed.

Modular (Mojo 🔥) ▷ #general (1 messages):

clattner: Happy new year!

Modular (Mojo 🔥) ▷ #mojo (4 messages):

Mojo bug, compiler simplification pass, level 3 upgrade

Mojo exhibits nan behavior: Mojo returns nan instead of inf when dividing 1.0 by 0.0 in a direct print statement.
- However, factoring the division into a function produces the correct inf result, suggesting a bug in the compiler’s early simplification pass.
Compiler pass needs triage: A user reported that Mojo incorrectly computes 1.0 / 0.0 as nan when using print, but correctly computes it when using a function.
- Another user suggested it’s likely a bug in constant folding and requested a bug report be filed.
User levels up to 3: A user advanced to level 3.