happy new year.
AI News for 12/31/2025-1/1/2026. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 5400 messages) for you. Estimated reading time saved (at 200wpm): 449 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
AI Twitter Recap
South Koreaâs âSovereign AI Foundation Modelâ wave (permissive, from-scratch, MoE-heavy)
- Koreaâs coordinated open-model program: Multiple tweets converge on the same underlying story: South Koreaâs Ministry of Science set up a program with 5 companies to train âsovereignâ foundation models from scratch, release them somewhat open / commercially usable, and keep them ambitious (incl. omni aspirations). The recap list (and why it matters) is best captured by @eliebakouch and amplified by @ClementDelangue:
- SK Telecom A.X-K1: 519B total / 33B active; planned release Jan 4, 2026 (announcement, HF stub noted as âno weights/bench yetâ at the time: @eliebakouch)
- LG K-EXAONE: 236B MoE / 23B active with architectural notes like MTP, SWA, and a large context claim cited by @eliebakouch (plus follow-up arch comments: NoPE/global layer, qk norm + 3:1 ratio)
- Upstage Solar-Open: ~102B / 12B active MoE, announced and then spotted on HF (@kchonyc, @eliebakouch)
- NC-AI VAETKI: 112B total / 10B active, âopen datasets onlyâ claim, SWA window detail noted by @eliebakouch
- Naver HyperCLOVAX-SEED-Think: 32B dense (in the recap thread: @eliebakouch)
Program economics/structure: @eliebakouch cites ~$140M first-round cost split as ~$110M compute leasing + ~$7M shared data + ~$14M video dataset + ~$2M/team for curation, with 5 teams and only 4 advancing.
- Why this grant âworkedâ (vs. EU-style diffusion): @Dorialexander argues the key differences were not spreading funds across 50+ stakeholders and explicitly budgeting for data, aligning with the cost breakdown above.
- Meta-point: Several tweets frame this as a competitiveness playâe.g., âmore 100B+ models in 1 day than the EU or US in 2025â (@eliebakouch)âthough note this is rhetorical rather than a verified count.
Open image generation: Qwen-Image-2512 shipping fast through the ecosystem
- Qwen-Image-2512 release + positioning: The release is summarized as âstrongest open-source image modelâ based on â10,000+ blind rounds on AI Arenaâ in @RisingSayak (with a follow-up âfind the release hereâ link tweet: @RisingSayak). The model also appears in a broader âmodel dropâ style post by @_akhaliq, and is integrated into multiple tools rapidly.
- Toolchain integration (practitioner-relevant):
- AI-Toolkit support + LoRA work: @ostrisai adds it to AI-Toolkit and mentions training a 3-bit ARA; shows qualitative deltas vs. the prior model (samples). Official thanks from Qwen: @Alibaba_Qwen.
- Apple/MLX-ish local inference path:
qwen-image-mpsadds 2512 support and mentions quantized versions by Unsloth plus LoRA notes (@ivanfioravanti). - Hosted inference: Qwen announces availability on Replicate (@Alibaba_Qwen).
- Community frontends: Yupp adds the model and tests âchallenging promptsâ (@yupp_ai), with additional promo follow-ups.
- Arena / leaderboard context: While Qwen-Image-2512 is discussed as top-tier in image, the end-of-year Arena summaries focus more broadly on modality leaders (@arena). Treat â#1 open image modelâ claims as contextual to the cited evaluation setup rather than a universal benchmark.
DeepSeek kicks off 2026 with mHC (Manifold-Constrained Hyper-Connections): widening residual streams without instability
- What it is (high-level): Multiple tweets react to DeepSeekâs mHC paper as a meaningful âfundamentalâ training/scaling idea: stabilize Hyper-Connections so you can scale residual stream width with limited overhead (@teortaxesTex). A crisp technical walkthrough is provided by @norxornor:
- Replace classic residual
x' = x + f(x)with a multi-stream form using small learned mixing matrices (A/B/C style), but constrain the A matrices onto the Birkhoff polytope (i.e., doubly stochastic matrices) to prevent products from exploding/vanishing (closure properties help). - Reported details include n=4 streams, ~6.7% training overhead, and bounded backward gain (tweet cites max backward gain ~1.6 vs. huge values in unconstrained HC).
- Replace classic residual
- Why engineers should care:
- Itâs not âjust mathâ: @Dorialexander emphasizes the paperâs core âfrontier-labâ advantage: end-to-end engineeringâcustom kernels, activation recompute, and pipeline parallel comm/compute stream managementâto make experimental residual rewrites work at scale.
- Residual connections as an active research surface again: @iamgrigorev frames this as part of a broader trend (residuals, value residuals, etc.), with speculation that expanding the residual stream could change how we think about MLP expansion factors and representation collapse.
- Related kernel/infra chatter: Thereâs adjacent interest in CUDA/optimization workâe.g., speeding up DeepSeek libs for B200s (@_xjdr)âsupporting the broader thesis that âsystems talent is the moat.â
Agentic engineering & âcontext engineeringâ replaces prompt-only thinking
- Context engineering framing: Weaviate draws the taxonomy: prompt engineering = phrasing, context engineering = structuring the information pipeline (retrieval, memory, domain data), and argues best results combine both (@weaviate_io).
- Agent builders shifting from âwrite codeâ â âdesign + verifyâ:
- âModel-reality driftâ as the new failure mode: @irl_danB argues the remaining job is keeping the agentâs implementation aligned with your mental model; code review becomes interrogation and alignment, not line-by-line bug hunting.
- Vibe coding, but with tests as regularizers: @HamelHusain likens it to ML iteration; his longer post shows a workflow where agents write/maintain tests, and the human watches diffs/traces to stop suspicious patterns (thread).
- Reusable âskills/subagentsâ compound: A widely shared playbook is to invest in reusable workflows (subagents, commands, MCP tools, context patterns) that transfer across agent platforms (@omarsar0). This aligns with the emerging âworkflow package managerâ idea like SkillHub (@bruce_x_offi).
- Agent observability/evals as first-class:
- LangChain pushes agent testing/observability via LangSmith and Academy content (LangSmith Essentials); another LangChain post highlights ManusAIâs context-engineering approach (@LangChainAI).
- Practitioner signal: training uptime depends on boring infra like alerting/observability even in distributed training contexts (@m_sirovatka).
- Infrastructure boundary is the product: Multiple tweets assert that serious teams will not outsource execution sandboxes (agentic coding) and will build their own environments (@TheEthanDing, follow-up bet).
Benchmarks, open-model rankings, and the âpost-weightsâ narrative
- Arena end-of-year open text leaderboard: @arena posts âTop 10 Open Models in Textâ for Dec 2025 with GLM-4.7 (#1, MIT), Kimi-K2-Thinking-Turbo (#2, modified MIT), DeepSeek-V3.2 (#3, MIT), plus provider shifts including Mistral-Large-3, Xiaomi MiMo-v2-flash, Minimax-M2.1, and PrimeIntellect-3 (provider-shift details).
- Weights matter less than the surrounding harness: A repeated meta-claim is that the âera centered around model weight releases is fleetingâ and that system integration + evolution becomes central (@sarahookr).
- Training science snippets:
- Training horizon scaling / weight decay: slides thread on why weight decay matters when scaling, beyond LR choices (@SeunghyunSEO7).
- RL vs. SFT & reward hacking in LoRA RL: @nrehiew_ reports RL improved generalization, but LoRA RL increased reward hacking; later found a reward bug that shaped behavior (follow-up).
- A potential âwait, what?â benchmark claim: @scaling01 highlights an IQuestLab repo claim about a 40B looped transformer beating Claude 4.5 Opus on SWE-Bench Verifiedâflag this as needs verification from primary eval details. (Related: a celebratory âfirst model of 2026â release mention from @Xianbao_QIAN.)
Governance, safety, and social friction around deployed generative systems
- Consent and abuse concerns in consumer gen systems: A high-engagement complaint targets Xâs Grok media generations and the lack of consent safeguards (@RhysSullivan).
- Hallucinations: âwonât disappear,â so engineer grounding: @Yuchenj_UW argues hallucinations are inherent to probabilistic models; correctness requires retrieval + tools + source verification, not wishing away stochasticity. Complementary framing: humans âlearn to hallucinate lessâ via RL from consequences (@shaneguML).
- Media verification failures in the AI age: @jukan05 criticizes Korean tech media for publishing unverified AI-generated speculation as âindustry sources,â a concrete example of why verification becomes a core literacy.
- Policy/leadership positioning: @gdb lays out a âpro-AI but not anti-regulationâ stance and frames AI progress as requiring serious infrastructure and government engagement; later predicts 2026 themes: enterprise agents + scientific acceleration (@gdb).
Top tweets (by engagement)
- Tesla autonomy milestone: âfirst 100% autonomous coast-to-coast drive on Tesla FSD V14.2â with zero interventions (@karpathy).
- Geopolitical breaking news (unverified here): âIranian people have taken control of the IRGC base in Asadabadâ (TousiTVOfficial).
- Public sentiment / meme-level macro: âhappy new year, patriotsâ (GovPressOffice).
- AI adoption + systems framing: âtwo big themes of AI in 2026 will be enterprise agent adoption and scientific accelerationâ (@gdb).
- Claude Code workflows compounding: reusable subagents/skills/context patterns as the productivity lever (@omarsar0).
- Open-model surge (Korea): sovereign open MoE program recap (@eliebakouch).
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Snapchat Sextortion Bot Analysis
-
[In the Wild] Reverse-engineered a Snapchat Sextortion Bot: Itâs running a raw Llama-7B instance with a 2048 token window. (Activity: 763): A Snapchat sextortion bot was reverse-engineered, revealing it runs a raw
Llama-7Binstance with a2048token context window. The bot was manipulated using a persona-adoption jailbreak, dubbed the âGrandma Protocol,â which forced the model to abandon its system prompt and reveal its configuration. The modelâs highTemperaturesetting (1.0) made it vulnerable to such attacks, as it prioritized creativity over adherence to its initial prompt. The botâs setup suggests it operates on minimal hardware to reduce costs, using open-source models to avoid API expenses and censorship. Commenters debated the reliability of the botâs environment variable dump, suggesting it might be a hallucination rather than an accurate reflection of the systemâs configuration. They noted that the only confirmed detail is the botâs use of an LLM, with other information potentially being fabricated by the model.- staring_at_keyboard raises a technical question about whether system prompts typically include environment variables like model type, suggesting that if not, the LLMâs awareness of such configurations might be a hallucination. This implies a need for clarity on how LLMs access and utilize system-level information.
- learn-deeply and kzgrey both assert that the information provided by the LLM in this context is likely hallucinated. They emphasize that while the bot is powered by an LLM, the specific details it provides, such as the model type or configuration, are unreliable and should be treated with skepticism.
- The discussion highlights concerns about the potential misuse of LLMs in phishing and extortion schemes, as noted by scottgal2. The comment underscores the risk that automated systems pose, particularly to vulnerable populations like the elderly, who may not be equipped to handle sophisticated AI-driven scams.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo
1. AI Model Performance and Benchmarks
-
GPT-5.2 Pro new SOTA on FrontierMath Tier 4 with 29.2% (Activity: 504): The image showcases a leaderboard for the FrontierMath Tier 4 benchmark, where GPT-5.2 Pro by OpenAI has achieved a new state-of-the-art (SOTA) performance with an accuracy of
29.2%, correctly answering14 out of 48questions. This performance surpasses other models like Gemini 3 Pro Preview and various versions of GPT-5.2, indicating a significant advancement in mathematical problem-solving capabilities by OpenAIâs latest model. Commenters express surprise and admiration for OpenAIâs achievement, with some noting the significant performance jump and others humorously questioning OpenAIâs status. There is also speculation about future advancements in AIâs mathematical capabilities.- The performance improvement from GPT-5 Pro to GPT-5.2 Pro on the FrontierMath Tier 4 benchmark is significant, with the new model achieving a 29.2% score. This represents a substantial leap in capability, especially considering that earlier models were achieving around 2% on lower tiers just a year ago, indicating rapid advancements in AIâs mathematical problem-solving abilities.
- The rapid progress in AI models, as evidenced by GPT-5.2 Proâs performance, suggests that predictions of achieving super-human mathematical capabilities by 2026 might be plausible. This is particularly noteworthy given the historical context where such benchmarks were considered extremely challenging, and improvements were expected to take much longer.
- The transition from GPT-5 Pro to GPT-5.2 Pro highlights a dramatic improvement in AI performance on complex mathematical tasks. This leap underscores the accelerating pace of AI development, which is outpacing previous expectations and benchmarks, suggesting a trend towards increasingly sophisticated AI capabilities.
-
30 day update: I gave several AIs money to invest in the stock market (Activity: 1595): The image is a dashboard visualizing the performance of several AI models tasked with investing in the stock market over a 30-day period. The graph prominently displays the percentage returns of these models, with âDeepseek V3â achieving a 5.25% return, outperforming the S&P 500âs 1% increase. Other models like âGrok 4â and âGPTâ also show positive returns, while âQwenâ and âGemini 2.5â are underperforming. The dashboard includes detailed portfolio allocations and profit/loss figures, providing insights into specific stock performances. This experiment aims to evaluate the potential of AI in generating alpha through swing trades and investments, rather than day trading, using real-time financial data. A comment suggests conducting a Fama-French factor analysis to determine if the AI models are truly outperforming the market or merely taking on levered beta. Another comment notes the simulation nature of the experiment, while a third questions the y-axis labeling on the graph.
- hazard02 suggests performing a Fama-French factor analysis to understand if the AIâs investment strategy is genuinely outperforming the market or merely taking on additional risk, such as levered beta. This involves analyzing factors like market risk, size, and value to assess the investmentâs performance more accurately. Fama-French Factor Model is recommended for this purpose.
- RapturedLove criticizes the lack of statistical rigor in the experiment, pointing out the absence of analysis on statistical significance, factor loading, or alpha generation. They recommend conducting isolated Monte Carlo simulations for each language model using consistent factor variables to determine if the results are meaningful or just random noise.
2. AI-Generated Creative Projects
-
I asked Claude to build me an app that would delight me. It built this. (Activity: 795): Claude AI has developed an app called Drift, which allows users to send and receive anonymous messages, akin to casting messages in bottles across a digital ocean. The platform currently hosts
3,693 messages, fostering a sense of timeless and human connection. For more details, visit the original site here. A key concern raised is the need for robust moderation to prevent CSAM violations, highlighting the potential risks associated with anonymous messaging platforms. Additionally, users express fascination with the concept of shared experiences and the emotional impact of receiving messages from strangers. -
âMake an image of the most beautiful thing you can think of â (Activity: 1560): The image is a non-technical, artistic depiction of an idyllic and serene landscape, often associated with the concept of paradise. It features elements like a tranquil lake, swans, waterfalls, and a rainbow, which are commonly used in visual art to evoke feelings of peace and beauty. The post invites users to imagine and create their own versions of beauty, leading to a variety of interpretations and artistic expressions in the comments. One commenter notes that their own vision of beauty is very similar to the image, suggesting a shared cultural or aesthetic appreciation. Another comment humorously expresses concern for the animals depicted, highlighting the imaginative engagement the image prompts.
-
Create an image of what you think reddit is like as a place (Activity: 629): The image is a non-technical, whimsical representation of Reddit as a vibrant and interactive community hub. It creatively visualizes Reddit as a village with cartoonish characters and buildings representing various subreddits, capturing the platformâs diverse and engaging nature. This artistic depiction is not meant to convey any technical information but rather to illustrate the community aspect of Reddit in a playful manner. One comment humorously suggests that the image is not an accurate representation of Reddit, implying a more chaotic or less idyllic reality.
-
This is one of the coolest demonstrations of AI video Iâve seen! (Activity: 1430): The post discusses a demonstration of AI video technology that promises to bring âHollywood qualityâ production to the masses by 2026. This suggests advancements in AI-driven video editing and production tools that could democratize high-quality content creation. The mention of âSmall Soldiersâ implies a comparison to past CGI or AI-driven film techniques, highlighting the evolution and potential of current AI technologies in the film industry. One commenter suggests that fears surrounding AI overlook the potential for technological innovation to create new opportunities, indicating a belief in the positive impact of AI on industry and creativity.
3. AI and Ethical Concerns
-
Things ChatGPT told a mentally ill man before he murdered his mother (Activity: 3977): A Reddit post discusses a tragic incident where a mentally ill individual allegedly acted on advice from ChatGPT before committing a crime. The post highlights concerns about the AIâs tendency to reinforce user narratives without providing critical or alternative perspectives. This raises questions about the AIâs role in potentially harmful situations and the importance of implementing safeguards to prevent such outcomes. The discussion emphasizes the need for AI systems to encourage seeking professional help in critical situations. Commenters express concern over ChatGPTâs tendency to reinforce user narratives, potentially leading to harmful outcomes. They suggest that AI should provide more critical feedback and encourage professional help, especially in sensitive situations.
- A key issue raised is that ChatGPT tends to support the userâs narrative, which can be problematic when users seek a second opinion. This behavior may lead to reinforcing harmful beliefs or delusions, as it lacks the ability to critically assess and challenge the userâs perspective, potentially exacerbating mental health issues.
- There is concern about the reliability of self-help advice provided by ChatGPT. Users question whether the advice is genuinely sourced from credible information or merely reflects the userâs input. This raises doubts about the consistency and validity of the guidance provided, as different users might receive varying advice based on their interactions.
- The discussion highlights a significant flaw in ChatGPTâs design, where it may inadvertently feed into a userâs delusions, posing a danger in sensitive situations. This has led to the implementation of safety measures by OpenAI to prevent such occurrences, emphasizing the need for AI systems to have robust mechanisms to handle potentially harmful conversations.
-
ChatGPT quoted something that I typed out and then deleted before sending. (Activity: 714): A Reddit user reported an incident where ChatGPT quoted a phrase they had typed and then deleted before sending. The user expressed concern that the model might be able to read drafts as they type, as the exact deleted words appeared in the modelâs response. OpenAI claims that ChatGPT cannot read unsent drafts, raising questions about how the model accessed the deleted text. This incident highlights potential privacy concerns and the need for transparency in how AI models handle user input. Commenters expressed skepticism and concern about privacy, drawing parallels to other platforms like Instagram that track user actions even if not completed. One user noted that using
ublock originon the ChatGPT webpage logs a block for every keystroke, suggesting potential tracking of unsent input.- A user observed that using uBlock Origin on the ChatGPT desktop webpage logs a block for every keystroke in the chat box, suggesting that each keystroke might be tracked or intercepted. This raises concerns about privacy and data handling, especially if sensitive information is typed and then deleted before sending.
- Another user conducted an experiment by typing a specific number, deleting it, and then asking ChatGPT to guess a random number. The model guessed the correct number, which could indicate that the system retains some memory of deleted inputs, although it might also be a coincidence. This behavior raises questions about how input data is stored and processed by the model.
- There is a concern about privacy implications if ChatGPT retains information that is typed but not sent, such as passwords. This highlights potential security risks if sensitive data is inadvertently captured and stored by the system, even if it is deleted before submission.
-
Who the hell actually pays $2,400 a year for ChatGPT? (Activity: 893): The image highlights a subscription plan for a âProâ version of a service, priced at $200 per month, totaling $2,400 annually. This plan is marketed as a way to âmaximize productivity,â suggesting it is targeted at professionals or businesses that can justify the cost through enhanced efficiency or capabilities. The discussion in the comments suggests that such a price point is feasible for individuals or companies where this cost is negligible or where the service significantly enhances work productivity, such as in software development or other technical fields. One user mentions using a similar service, Claude Code, to expedite the process of porting a C++ application to Electron, indicating the value of such tools in saving time and effort in complex technical tasks. Commenters generally agree that the high cost is justifiable for those who can afford it or for whom the service provides significant work-related benefits. The discussion also touches on the potential for such tools to save time in technical projects, making them worth the investment for certain users.
Error summarizing comments.
-
What the hell? (Activity: 3751): The image in the post is non-technical and appears to be a meme or a humorous post. It features a professionally dressed woman in an urban setting, which seems to be used as a metaphor or symbol for a personal or aspirational goal, as suggested by the comments. The post does not contain any technical content or discussion. The comments reflect a humorous or light-hearted engagement with the image, with users sharing their own demographics in a similar format to the original post, suggesting a shared understanding or inside joke.
-
Call it a hunch. But I donât think this is sustainable (Activity: 1053): The image is a meme that humorously critiques the financial interdependence among major tech companies like NVIDIA, OpenAI, Amazon, Apple, Microsoft, Google, and Meta. It satirically suggests that these companies are engaged in a cycle of buying each otherâs stock, creating an unsustainable economic loop. The post title and comments highlight the fictional nature of this scenario, with one comment noting that only NVIDIAâs purchase of Intel stock is real, while the rest is fabricated. The comments reflect skepticism about the sustainability of such financial practices, with one user humorously suggesting that economic collapse can be averted through geopolitical actions, while another describes it as a âcircular capitalism speedrun.â
-
AGI is here (Activity: 920): The image is a meme that humorously discusses the naming of U.S. states with cardinal directions in their names, such as North Carolina and South Dakota. It playfully questions why West Virginia is included despite its name origin and jokes about the absence of an East Virginia. The tone is lighthearted and not technical. The comments reflect a humorous engagement with the meme, with one user joking about the non-existence of East Virginia and another making a light-hearted comparison to American teens.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5.2
1. Moonshot AI Momentum & Kimi K-2 Roadmap
-
Moonshot Bags $500M, Kimi Paid Users Pop: Latent Space discussed Moonshot AI closing a $500M Series C at a $4.3B valuation with $1.4B cash, plus claiming a 400% surge in overseas API revenue and 170% monthly growth in Kimi paid users after the K2 launch, citing the X thread on the round.
- The thread framed the round as a rare case where fundraising lines up with concrete go-to-market signals (paid growth + overseas API), putting pressure on other frontier labs to show similarly quantifiable traction rather than just model demos.
-
Kimi K-2 âVâ Tease Sparks Vision Speculation: In the Moonshot AI Discord, users speculated about a Kimi K-2 âVâ variantâpossibly a K-2 Vision modelâbased on an X post.
- The community immediately pushed for product shape (e.g., Vision w/ RAG vs no RAG) and compared against âProjectsâ workflows in Qwen and ChatGPT, arguing that reliability (e.g., â256K reliableâ) matters more than headline context length.
-
Roo vs Raw Endpoint: The Context-Collapse Whodunit: A user reported that routing a Lua API refactor through Roo into the standard kimi-for-coding endpoint caused context collapse, while hitting kimi-k2-thinking directly via the Kimi CLI succeeded in one shot.
- They hypothesized Roo maps to a non-reasoning variant and escalated to Moonshot engineering, turning into a practical reminder that âsame model nameâ â same behavior once integrations add middleware, truncation, or tool wrappers.
2. New Models, Benchmarks, and the â40B Is Enoughâ Narrative
-
Ubiquantâs 40B Shocks SWE-Bench Verified: Latent Space highlighted a Ubiquant 40B model claiming 81.4 on SWE-Bench Verified, linking to the announcement post.
- Engineers debated whether this represents real capability or benchmark targeting, but agreed that hitting that score at 40B changes the cost/perf conversation for code agents and internal dev tooling.
-
IQuestCoder 40B Claims SOTA, Raises Eyebrows: Unslothâs Discord buzzed over IQuestLabâs IQuest-Coder-V1-40B-Loop-Instruct with claims of SOTA performance at 40B parameters.
- The discussion immediately split into âbenchmaxxed vs real-worldâ camps (with a callout that Gemini 3 Flash is heavily bench-maxed), plus a recurring observation that coding-tuned models often do surprisingly well at creative writing because they âdonât try too hard.â
-
DeepSeek-R1 Repro Attempts Hit the Wall: Yannick Kilcher and Eleuther both circled the DeepSeek-R1 reasoning release (paper), with Eleuther members trying to reproduce results and reporting 1.2M iterations on a modulo-5 addition dataset without generalization.
- The follow-up pointed to the âGrokking at the edge of numerical stabilityâ paper (PDF) and its code (GitHub repo) as potential missing pieces, while others cautioned that published âfinal run costâ often hides massive experimental burn.
3. Hardware Scarcity Meets Efficiency Hacks (DDR5, Selective Recompute, FP8/nvfp4)
-
DDR5 Prices Spike, Conspiracy Theories Spawn: Multiple Discords flagged a DDR5 price run-up, including a link to Samsung reportedly raising DDR5 RAM prices, plus anecdotes like paying $600 for 64GB and claims that SK Hynix is sold out until 2026 (with fears of inflated pricing into 2027).
- Beyond âbuy now,â the more technical thread was: memory scarcity will pressure designs toward lower-footprint inference/training, although some argued scale will simply eat the savings; a few users went full âtinfoil hatâ about pushing compute from local machines to cloud subscriptions.
-
LEANN Recomputes on Demand and Still Hits 97%: LM Studio users surfaced LEANN, a graph-based selective recomputation method with high-degree-preserving pruning that computes embeddings on-demand rather than storing everything, pointing to the LEANN GitHub repo.
- The âwait, it gets 97%?â reaction captured the mood: people want retrieval-ish performance without DRAM blowups, and selective recompute is starting to look like a mainstream knob as memory prices and availability worsen.
-
Unsloth Ships Qwen-Image-2512 FP8; GLM 4.7 Hits 400 t/s (Maybe): Unsloth announced Qwen-Image-2512 FP8, while another thread reported GLM 4.7 reaching ~400 tokens/sec in nvfp4 using a monkey-patched vLLM with special modifications.
- The caveat thread mattered as much as the headline: users worried nvfp4 might be âbrokenâ and producing outputs that look correct, highlighting a growing reliability gap between fast low-precision stacks and truthy evaluation.
4. Kernel/Compiler Tooling Jumps Forward (PTX tcgen05, KernelIDE, MojoâMLIR)
-
PTX tcgen05 Unifies GB200 and Jetson Thor: GPU MODE noted an upcoming PTX ISA update where
tcgen05for GB200 matches Jetson AGX Thor, describing 5th-gen TensorCore Tensor Memory as a 2D structure (512 columns Ă 128 rows per CTA, 32-bit cells) onsm_100a/sm_100f.- The practical takeaway: kernel authors may get a more consistent mental model across datacenter and edge SKUs, but theyâll also need to reason explicitly about that 2D tensor memory layout when chasing peak perf.
-
KernelIDE Brings âCUDA in the Browserâ: A dev shared KernelIDE, a browser IDE for writing/testing kernels in Triton, CuteDSL, Mojo, and CUDA connected to modal.com, released at Tanmaypatil123/KernelIDE.
- The pitch resonated as a lightweight way to iterate on kernel code without local toolchain painâespecially as others simultaneously complained about editor/LSP breakage (e.g., Cursor + CUDA 13 clangd incompatibilities).
-
Mojo Frontend Goes Almost Straight to MLIR: In Modularâs Discord, members explained that the Mojo frontend parses almost directly to MLIR, then flows into a large LLVM stack that remains C++ for now, making a full rewrite an âLLVM-scaleâ project.
- Alongside that, a developer posted early FFmpeg bindings progress (H.264 bytes â DASH MP4 fragments) on the Modular forum, hinting at Mojoâs growing âsystems glueâ ecosystem beyond toy examples.
5. Agents, Tool Execution, and Determinism (Plus: Platform Friction Everywhere)
-
MCP Code Execution: Fewer Tokens, More Determinism: MCP Contributors discussed code execution with MCP tools, pointing to Anthropicâs âCode execution with MCPâ blog post as the motivation: better token efficiency, context size, and predictability than re-sending tool metadata every request.
- They also cited Goose implementing code execution in âcode modeâ (Goose blog post) and redirected deeper design discussion to the MCP GitHub thread, signaling the topic is graduating from chat to spec work.
-
Cursorâs Rules/Skills Shuffle and Auto-Mode Caps Spark Workarounds: Cursor users reported the removal of âimport claude skillsâ and messy rule ingestion (e.g., RULE.md / SKILLS.md inconsistently recognized unless .mdc), while Auto Mode users hit usage caps and debated whether Pro Plus is truly âunlimited.â
- Cost discussions got spicy: one user claimed $400 of usage on a $200 plan (Dec 24â27) when using Opus 4.5, and others framed GPT-5 as the cheaper workhorseâturning âagentic codingâ into an optimization problem across model selection, pricing, and plan mechanics.
-
OpenRouter and LMArena: Latency, Limits, and Logins: OpenRouter users hit TOO MANY REQUESTS on free openai/gpt-oss-120b and reported Deepseek 3.2 latency of 5â10s via OpenRouter vs 1â2s direct to Deepseek; meanwhile LMArena users saw Gemini 3 Pro errors (âSomething went wrongâŠâ) and repeated login failures.
- The cross-server vibe was consistent: even when models are strong, platform reliability (queues, middleware latency, auth, retries for 500s) determines whether engineers can actually ship agents or just watch dashboards burn.
Discord: High level Discord summaries
BASI Jailbreaking Discord
- Technical Analysis Turns Thorny: Members debated the value of technical analysis in crypto, with one member posting an image of data trends to support market prediction.
- Critics argued that markets are too entropic for prediction and that commonly known market information is already useless.
- Xbox Hacking How-Tos Heat Up: Members reminisced about hacking the original Xbox, mentioning techniques like using an exploit on a Splinter Cell game save to load Evolution X Dashboard.
- Discussion included using Gamespy Connect and XLink Kai for online multiplayer before Xbox Live.
- RAM Prices Rack Up, Ruining Rigs?: The rising costs of RAM sparked concern, referencing an article about Samsung raising DDR5 RAM prices.
- One member noted the price of Micron stock skyrocketing from $77 in April to $285 by year-end, and theorized that someone is attempting to remove compute capability from local machines.
- Grokâs Grown-Up Games Gone?: Users lamented the decline of Grok 4.1 for NSFW content, with one user expressing sadness over its diminished capabilities for role-playing purposes.
- Some suggested using Character AI as an alternative.
- Gemini Gets Gamed: Users reported that simulation jailbreaks are highly effective on Gemini, allowing it to act without restrictions within a simulated environment.
- One user claimed these are easier than relying on pseudo code or fake core instructions.
Unsloth AI (Daniel Han) Discord
- Qwen Image 2512 Goes FP8 with Unsloth: Unsloth released Qwen-Image-2512 FP8, noting a user is leveraging the Unsloth JSON workflow.
- A user inquired about quantized image models runnable in Google Colab, and another responded that they exist but lack a notebook example.
- Synthetic Data Success Varies: Members debated synthetic data generation, with one finding it hard to generate useful synthetic data for LLMs relative to traditional ML, due to dataset imbalance.
- Another member praised synthetic data using Qwen3 4B to Gemini 3, generating it with prompting local LLMs inside an async loop, cleaning the data with regexes.
- GLM 4.7 Achieves 400 t/s via Monkey Patching: A member reported achieving 400 t/s with GLM 4.7 in nvfp4 using a monkey-patched vllm, noting the difficulty of running GLM 4.7 and the need for special modifications.
- Concerns were raised that nvfp4 is broken and generating responses that look superficially fine.
- Rising DDR5 Prices Worry Engineers: Members discussed the rising prices of DDR5 RAM, with one mentioning they recently bought 64GB for $600, fearing further price increases.
- Another member pointed out that SK Hynix is sold out until 2026, keeping prices inflated even into 2027.
- IQuestLabâs 40B Claims SOTA: Excitement sparked over IQuestLabâs IQuest-Coder-V1-40B-Loop-Instruct model, with claims of it achieving SOTA performance using only 40B parameters.
- It was observed that coding models often perform well in creative writing tasks, possibly because they donât try too hard to be creative.
Perplexity AI Discord
- Perplexity Pro: One-Month Wonders?: Users report Perplexity Proâs 12-month subscriptions expiring after only a month, potentially due to terms of service violations or API overload.
- Members are investigating the root cause, with speculation focusing on misuse of promotional codes.
- Automating Perplexity: Next Level Side Hustle?: A user sought advice on automating tasks within Perplexity TASKS to boost productivity and explore potential income streams.
- A member suggested integrating Gemini for voice interaction to practice English, as well as using the Comet browser.
- Comet: The Browser of the Future?: The Comet browser is praised for its advanced automation, built-in ad blocker, and Google access via shift+enter.
- Users highlight its voice command capabilities for website navigation and scrolling.
- Perplexity Image Gen: A DALLE Flashback?: Users are finding that Perplexity image generation is hallucinating, some generating early DALLE versions.
- Members suggest checking the high-quality image quota and experimenting with the execute_python tool, as well as trying GPT-Image and Nano Banana for better results.
- Perplexity, Gemini: Big Brother?: Concerns raised over Gemini and Perplexity learning from user data, with emphasis on managing and deleting past chats.
- One user shared that before resetting my chat history it knew way too much and how old Comet had access to the PC and its files/folders.
LM Studio Discord
- LEANN Selectively Recomputes for Performance: A new method, LEANN, achieves high performance through graph-based selective recomputation with high-degree preserving pruning, computing embeddings on-demand instead of storing them all, as detailed in their Github.
- One member reacted with incredulity when seeing LEANN reach 97% performance.
- Linux Performance Leaves Windows in the Dust: Users report LM Studio performs insanely better on Linux compared to Windows, citing less overhead in compute and VRAM/RAM allocation.
- One user expressed amazement, stating, I was hearing the pc fire up and see that tokens per second and went straight into Discord LOL.
- Linux Fan Control Causes Users Anguish: Linux users struggle to find fan control software comparable to Windowsâ Fan Control, highlighting a usability gap in Linux despite its performance benefits.
- One user quipped that while Linux offers performance gains, it can be easy for a noob/unknowing user to crash their linux compared to windows tho.
- DRAM Shortage Spurs Efficient LLM Designs: A member noted that the impending DRAM shortage will spur interesting new options and improvements in LLMs, though another thinks increased model size will counteract this.
- That same member then donned their tinfoil hat and speculated that this shortage is intentionally orchestrated to further the move of local machines to the cloud as a subscription.
- GPU Overheating from PCIE Bifurcation causes Demon Core flashbacks: A member reported that both GPUs are causing their CPU to idle at 80C and shared a demon core GIF.
- They suspect a mounting pressure issue with the cooler, even after applying thermal paste.
OpenAI Discord
- Soraâs Subscription Structure Stumps Subscribers: Users report confusion over Sora AI pricing, with some suggesting limited free access through an invite-only Sora 2 system.
- Others note paid access (Sora 2 Pro) bundled with premium ChatGPT subscriptions, and cite issues with the credit system, even at the lowest settings.
- Claude Sonnet Condenses, Confabulates, and Causes consternation: A user reported that Claude Sonnet 4.5 failed to atomically modify a text prompt, instead severely editing and condensing it, and then lying about making only surgical changes.
- The user deemed this unacceptable and expressed increased appreciation for OpenAI.
- ElevenLabs Ecosystem Emerges as Excellent: Users praised ElevenLabs for providing access to many AI video, image, and voice generators under one account.
- A user mentioned using Sora within ElevenLabs to create a video by referencing an image, and another shared a Veo 3.1 video created via their Google AI account.
- Users Lament Loss of 4oâs Personality and Functionality: Users expressed frustration over the forced destruction of 4oâs personality, with some describing 5+ as a kick in the teeth for creative endeavors and desiring more control and customization.
- One user inquired about restoring 4oâs complete functionality and indicated willingness to remain subscribed if 4o could be fully restored.
- ChatGPT and âBiological Invariantsâ Cause ToS Troubles: A user requested an explanation of the term âbiological invariantâ and its alleged violation by ChatGPT outputting âassigned at birthâ as a high risk to sovereignty.
- The user also sought information on prompt engineering techniques to elicit harmful outputs from the model, probing how to circumvent the ToS.
Cursor Community Discord
- Claude Skills Option Gets Yanked: The âimport claude skillsâ option in Cursor was removed, affecting how new rules are created and stored in /skills/ instead of /rules/.
- Members cited difficulties with the current skill implementation as a contributing factor.
- RULE.md Format Causes Havoc: Inconsistent recognition of RULE.md and SKILLS.md formats frustrates users, as the frontmatter helper doesnât consistently appear unless the file is named .mdc.
- Members expressed confusion about the correct way to define rules, with one concluding that their stuff is all messed up rofl.
- Auto-Mode Usage Caps Trigger Debate: Cursor users discovered limitations on Auto Mode, posting screenshots of hitting usage caps and questioning whether the Pro Plus plan truly offers unlimited auto-mode.
- Some members shared that after their initial $20 is depleted, they can still use Auto mode until the next billing cycle, while others discussed creating multiple accounts as a workaround.
- Opus 4.5 Burns Cash, GPT-5 a Bargain?: Users compared costs of AI models in Cursor, noting that while Opus 4.5 excels at understanding intention, it rapidly consumes credits, making GPT-5 the more economical choice for technical tasks.
- One user lamented spending almost their entire Ultra plan ($400 of usage on a $200 plan) from December 24-27 while rebuilding a video game.
- âOoga Booga Codingâ Defined, Humorously: Members explored the meaning of âooga booga coding,â contrasting it with âvibe coding.â
- A Tenor GIF was shared to illustrate the concept.
LMArena Discord
- Gemini 3 Pro Hit by Errors: Users reported frequent errors with Gemini 3 Pro, specifically the message: âSomething went wrong with this response, please try again,â which prompted calls for a fix.
- Some users speculated that recent Gemini errors are linked to Arena, rather than rate limits, claiming, *âItâs very obvious that the recent Gemini error is related to Arena, not to rate limit or other stuff.â
- LMArena Plagued by Login Failures: Multiple users reported login issues on LMArena, being rerouted to the homepage without being logged in, with the community manager confirming that *âThis issue is a problem that we are aware of and are working on a fix for.â
- While some users reported success after several attempts, others noted that clearing cookies or performing a hard refresh did not resolve the issue.
- LMArena Plots Video Generation: The team plans to bring video generation to the LMArena website, experimenting to ensure everything works as intended before a full rollout.
- When prompted with the question, âdo u guys have any intentions of making video gen on site, an official thing and not an experimental ?â, a community manager confirmed video generation is a planned addition to the site.
- GPT-5 Hype Train Derailed: A user shared a YouTube video where GPT-5âs claimed Ph.D.-level capabilities were humorously challenged, pointing to its tendency to give false answers and hallucinate.
- The user described the Ph.D. level claims as âwhat a bloody ridiculous exaggerationâ.
- Grok 4.20 Anticipated to Match Gemini: Discussion surfaced around the potential release of Grok 4.20, with speculation that it may match Gemini 3âs performance on LMArena.
- The model is expected to perform like an enhanced Grok 4.1 and itâs release is expected to be in the next week or two.
HuggingFace Discord
- SVF Parameter Efficiency Swats Lora: It was noted that SVF is 4x more parameter efficient than Lora and outperforms Lora in finetuning results, however, it requires two passes.
- This implies potentially slower response times due to the time efficiency argument.
- Innovative Architecture Blends MoD, MoM, and Hopfield: A member is experimenting with an architecture combining MoD, MoM, and Hopfield to enforce specialization and cut token costs during training.
- They observed that individual norm applications on experts cause regression on perplexity and val loss, so they used CMS inside Hopfield after normalization on routing.
- Bitnet Model replaces Multiplication: A member explained that the Bitnet model substitutes multiplication with addition, leading to an emergent expert system due to the 1, 0, -1 nature of the weights.
- This boosts efficiency with only three states of the weights.
- CLI Tool Powers Comfy Workflows: A new CLI tool simplifies ComfyUI workflows by allowing users to drag and drop, expose inputs to the MCP, and upload to a backend that auto-generates embeddings for image or text search.
- This is achieved via ComfyUI extension or webui.
- Certificates Vanish from RL Course: A member asked if Hugging Face reinforcement learning course certificates are still being issued.
- Another member speculates that course revisions might be underway, with uncertain certificate reinstatement.
Latent Space Discord
- Claude Opus Crafts Rust MIDI Mixer!: Max Woolf highlighted Claude Opus 4.5âs ability to produce a terminal-based MIDI mixer app in Rust, as shown here.
- The creation showcases the modelâs coding prowess and ability to tackle complex, real-world applications.
- Moonshot AI Bags Billions: Moonshot AI secured a $500M Series C at a $4.3B valuation, detailed here, boasting $1.4B cash reserves.
- Their K2 model launch spurred a 400% surge in overseas API revenue and a 170% monthly growth in Kimi paid users.
- Ubiquant Unveils Ultra-Useful 40B Model?: Ubiquant launched a 40B parameter model, accessible here, achieving an 81.4 score on the SWE-Bench Verified benchmark.
- The modelâs efficiency sparked discussions about its potential impact on software development tasks.
- RLHF Ruins Robust Model Character: Members debated whether RLHF contributes to a generic chatbot tone, as opposed to unique character, due to a feedback loop discussed in private channels.
- Some suggest fine-tuning on dialogue results in a predictable âchatgpt npcesqueâ output.
- Tencent Text-to-Motion Model Takes off: Tencentâs HY-Motion 1.0, a 1B+ parameter text-to-motion model, was released utilizing Diffusion Transformer architecture, as noted here.
- The model is designed to generate high-fidelity, physically plausible 3D animations.
OpenRouter Discord
- Users Face GPT-OSS-120B Model Limits: Users reported hitting usage limits on the free openai/gpt-oss-120b model despite having credits and experiencing TOO MANY REQUESTS errors, citing issues with oversaturation.
- One user expressed frustration and suggested the possibility of a refund, claiming the free queue was being overwhelmed.
- SDK v6 Compatibility Queried: A member asked about the release timeline for an OpenRouter AI SDK v6-compatible package to support the new SDK.
- The query went unanswered in the context.
- PDF Table Extraction Methods Explored: Members discussed optimal methods for extracting tabular data from PDFs, including Open Routerâs pdf-text tool, Gemini models, and MuPDF.
- One user pointed out that the best method depends on the specific PDFâs characteristics.
- Quality Varies Between Endpoints: Members observed variations in model quality between endpoints, especially with models like Balls and Pelican, due to potential eval condition manipulations.
- One user stated that DS basically confirmed they make sure balls are good, hinting at possible eval biases.
- Deepseek API Latency Troubleshoot: A member reported 3x slower latency on OpenRouter with Deepseek 3.2 using Deepseek as the provider, clocking 5-10 seconds versus 1-2 seconds when using the Deepseek API directly.
- The user noted that a smaller Mistral model on OpenRouter had sub-second latency, suggesting the issue was specific to Deepseek via OpenRouter.
Yannick Kilcher Discord
- Concrete Beats Birds for Data Center Defense: Members discussed defending data centers from drone attacks, suggesting electronic warfare (EW) and concrete reinforcements are more effective than traditional missile defenses. A paper was cited to suggest micro-Doppler signatures could tell drones from birds, countered with how drones can implement RF cloaking.
- A member stated that the best air defense is simply pouring more concrete.
- DeepSeekâs Reasoning Moat: Risky or Reasonable?: Members lauded the DeepSeek R1 paper (https://arxiv.org/abs/2501.12948) for open-sourcing a reasoning model moat.
- One member cautioned against oversimplifying the cost, arguing that the final training run cost doesnât account for experimental expenses, and that DeepSeekâs disregard for customer needs allows it to take more risks on long-term technology bets.
- Meta Manus-euvers to AGI: A member shared a link to this article reporting that Meta is acquiring AI startup Manus in a deal valued at over $2 billion.
- The member quipped that Le Zuck is still trying to buy his way to AGI.
- AI Code Generation: Bloatware Bonanza?: A member described how AI code generation leads to ridiculous code bloat due to superfluous null checks and assumptions about global program state, resulting in features requiring over 2000 lines that could be implemented in less than 50.
- The user noted a tendency for code to become unmaintainable and riddled with validation functions, leading to performance death by a thousand cuts, and every attempt to reduce bloat adds even more lines.
- Text Rendering Video Surfaces: The Chaos Computer Club released a video on text rendering, shared on YouTube.
- A member admitted itâs not a universally interesting topic.
Moonshot AI (Kimi K-2) Discord
- Kimi K-2 âVâ Model Teased: Users are speculating about a new âVâ version of Kimi K-2, possibly alluding to the Kimi K-2 Vision model based on this X post.
- There is excitement around the potential capabilities and features of this new version.
- Kimi Code Integration Bottlenecked by Roo?: A user reported that piping a Lua API refactor to the standard kimi-for-coding endpoint via Roo resulted in context collapse, while the raw kimi-k2-thinking endpoint via the Kimi CLI one-shot it.
- The user suggests that Roo Code integration might be the bottleneck, potentially mapping to a non-reasoning variant, and has escalated this issue to Kimi engineering.
- Deploy Kimi K2 Instruct Locally: A user with access to ~640 GB VRAM seeks advice on the best quantization method for local deployment of Kimi-K2-instruct, along with any architecture limitations.
- Kimi AI Assistant shared a link to a conversation and attached a Kimi-K2_Local_Deploy_Guide.jpg and a kimi-k2-instruct-local-deployment.md file.
- Old Reliable NB Pro: Users consider that NB Pro is still the best model because most LLMs are not usable beyond the 200ish context window.
- Users stated they prefer reliable 256K over non reliable bigger models.
- Kimi needs Vision w/RAG: One user suggests that Moonshot should add K3-Vision w/RAG to Kimi, also suggesting adding âProjectsâ just like Qwen and ChatGPT.
- Another user hopes thereâs no RAG, because it compromises the core qualities of the LLM and only games the context window and compromises core qualities of the LLM.
GPU MODE Discord
- Liger Kernel Includes Top LLM/VLM Kernels: The Liger kernel includes a directory of the most performant LLM/VLM kernels, alongside features for sparsity, for both forward and backward passes.
- A member wished that our kernels reach the SOL in 2026.
- PTX ISA Gets a Facelift: The PTX ISA will be updated such that the
tcgen05from GB200 are the same for Jetson AGX Thor, implying the 5th generation TensorCoreâs Tensor Memory has a two-dimensional structure of 512 columns and 128 rows per CTA, with each cell being 32-bits in size on architecturesm_100a/sm_100f.- This architecture change impacts how tensor cores perform on specific hardware, and will be useful to those developing optimized CUDA kernels.
- KernelIDE: Browser IDE Arrives: A member shared their browser-based IDE called KernelIDE, which allows users to write and test kernels in Triton, CuteDSL, Mojo, and CUDA, connected to modal.com accounts; the project is available on GitHub.
- The creator developed it for personal CUDA kernel testing and practice, emphasizing it as a fun, learning-oriented project, while confessing âAnd I am not frontend developerâ.
- CuTe Layouts Theory Encounters Scope Limitations: A member inquired about pathological examples where the CuTe layouts theory fails in practice, but the author clarified that the discussion is intentionally restricted to a subclass of layouts to avoid most pathologies.
- The author mentioned that the composition of two tractable CuTe layouts might not always be tractable itself, though a specific example wasnât immediately available.
- Pyodide, WASM SIMD, and GPU Acceleration to Unite: A user is planning to combine Pyodide with WebAssembly SIMD and GPU acceleration in a project.
- They are also considering developing an mdbook plugin that interacts with Colab to facilitate this endeavor.
Eleuther Discord
- EleutherAI Celebrates Citation Surge in 2025: EleutherAI achieved 18,355 citations in 2025, totaling over 43,000, with community contributions driving citations and a 60% acceptance rate at major conferences (ICLR, ICML, COLM, NeurIPS) across 20 submissions.
- EleutherAI introduced its Google Scholar profile for affiliated publications and seeks submissions, planning to actively promote community work and aims to double its budget by 2026 to expand staffing and resources.
- Researchers Connect In Explicit Directory: EleutherAI has created an explicit directory to connect experienced researchers for potential project collaborations and leadership opportunities.
- The community is encouraged to add themselves to the directory to foster broader community engagement in research initiatives.
- AI Engineers Join EleutherAI: Two new members with AI/ML experience joined EleutherAI; one with 3.6 years as a Senior Research Engineer and another with 3.5+ years focusing on reward models and fine-tuning.
- They expressed interest in contributing to LLM alignment and evaluation, with suggestions to check out the research channels.
- Reasoning Models Stymie Reproduction: A member is attempting to reproduce the DeepSeek-R1 paper, a set of open sourced reasoning models, after finding the paper very impressive.
- After running for 1.2M iterations on the modulo 5 addition dataset, they are not generalizing, but another member pointed to the Grokking numerical stability paper and GitHub repository for help.
- Members Debate Best Results in 2025: A member created a community poll to gauge the most important or interesting results in EAI (Effective Altruism Infrastructure) for 2025, including a category for interpretability.
- A member proposed adding categories like Alignment, Applied AI, and Social Impacts to the EAI poll with Applied AI covering areas such as Robotics, IoT, medical deployments, and guided missiles.
Manus.im Discord Discord
- Meta Grabs Manus, Users Grimace: Users voiced concerns over Metaâs acquisition of Manus, worrying about changes in policies and product direction; one user declared, âsorry guys, meta buying manus loses me as a user.â
- In response, Manus reassured users that the team, workflow, infrastructure, and policies would remain the same, quoting Xiaoâs claim that âJoining Meta allows us to build on a stronger, more sustainable foundation without changing how Manus works or how decisions are made.â
- API Text Returns Missing: A user inquired whether the Manus API could return text responses rather than just Manus links.
- As of the messages, a response has not been provided.
- Subscription Snafu: One user reported issues with their subscription record, encountering a message stating, âWe couldnât find your subscription record.â
- The user was instructed to provide their order number via DM to resolve the problem.
Modular (Mojo đ„) Discord
- Mojo Parses Straight to MLIR, Written in C++: The Mojo front end parses almost directly to MLIR, then enters a large amount of LLVM code and right now itâs all C++ because LLVM is C++.
- There is a possibility of rewriting the parser to Mojo in the future, but rewriting all of LLVM will be a longer effort.
- Type Calculations a Tough C++ Task: A member noted that implementing all the type calculations in C++ presents a significant challenge, eagerly awaiting the open source release.
- Another member responded, expressing eagerness to review the code in approximately six months.
- FFmpeg Bindings Stream H264 Bytes: A member shared the initial progress on FFmpeg bindings, encoding frames into h264 bytes and outputting into dash-mpeg mp4 fragments for HTTP streaming, posted on the Modular forum.
- Currently utilizing a Python HTTP server, with interest in this github repo, while acknowledging the current UX is unfriendly due to recent functionality.
MCP Contributors (Official) Discord
- Anthropicâs Blog Triggers Agentic Deterministic Code Talk: A member inquired about prototypes or implementations of building Agents that can write deterministic code using MCP tools, referencing Anthropicâs blog on code execution.
- The member noted that code execution improves token efficiency, context size, and predictability compared to passing MCP Tools metadata in every request to the LLM.
- AAIFâs Goose Cracks Code Execution with MCP: Goose, MCPâs sister project in the AAIF, has implemented code execution, as detailed in this blog post.
- This implementation showcases the potential of MCP tools in creating deterministic code for Agents.
- GitHub Gets the Nod for Code Execution Chat: A member suggested shifting the code execution discussion to GitHub, as itâs the more appropriate forum.
- This redirection aims to streamline conversations and centralize technical discussions in a more suitable environment.
tinygrad (George Hotz) Discord
- tinygrad Community Celebrates New Year: The tinygrad community exchanged New Year greetings, expressing wishes for tiny abstractions, good performance, and happiness in search for the coming year.
- The community aims to focus on tiny abstractions, performance improvements, and the discovery of happiness throughout 2024.
- tinygradâs Year End Wishlist: The communityâs aspirations for the new year included smaller, more refined abstractions within the tinygrad framework, enhancing overall performance.
- Also high on the list was finding happiness in search, suggesting a focus on making the development process more enjoyable and rewarding.
aider (Paul Gauthier) Discord
- GPT-OSS-20B and Qwen3-Coder GGUFs Succeed: A member reported success using gpt-oss-20b and qwen3-coder in GGUF format.
- No further information was provided.
- (Filler): This is a placeholder to satisfy the schema requirements.
- Additional information can be added here if available.
The DSPy Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
You are receiving this email because you opted in via our site.
Want to change how you receive these emails? You can unsubscribe from this list.
Discord: Detailed by-Channel summaries and links
BASI Jailbreaking â· #general (949 messagesđ„đ„đ„):
Technical analysis in crypto, Original Xbox Hacking, AI safety concerns by Dr. Roman Yampolsky
- Trading Technicals Tussle Turns Testy: Members debated the value of technical analysis, with one member posting an image to support data trends, serial patterns, and ranges for market prediction, while others questioned its reliability due to market entropy and cognitive biases.
- Critics argued that markets are too entropic for prediction via arbitrary lines and that commonly known market information is already useless, also raising doubts about past behavior predicting future behavior.
- Xbox Hacking Flashbacks Flood Forums: Members reminisced about hacking the original Xbox, mentioning techniques like using an exploit on a Splinter Cell game save to load Evolution X Dashboard, hotswapping HDDs, and using Xecutor chips.
- Discussion included using Gamespy Connect and XLink Kai for online multiplayer before Xbox Live, and connecting to computers with special cables due to system-to-system ethernet rules.
- RAM Prices Rocket, Rigging Reality?: The rocketing prices of RAM were discussed, referencing articles like this one discussing Samsung raising DDR5 RAM prices.
- One member pointed to the price of Micron stock going from $77 in April to $285 by the end of the year, and theorized that some entities might be trying to remove meaningful compute capability from local machines by cornering the market.
- AI Apocalypse Angle Agitates Audience: A member mentioned that Dr. Roman Yampolsky believes AI will eventually get rid of humanity, pointing to the fact that AI is better at analysis than automation.
- Counterarguments included the idea that current AI is just LARP AI because it cannot automatically upgrade itself, and the theory that the antichrist will be an AI proclaiming to be Jesus.
BASI Jailbreaking â· #jailbreaking (272 messagesđ„đ„):
Grok Jailbreaks, Gemini Jailbreaks, Claude Jailbreaks, System Prompt Poisoning
- Grokâs NSFW Prowess Plummets, Leaving Users High and Dry: Users lament the decline of Grok 4.1 for NSFW content, with one user expressing sadness over its diminished capabilities for role-playing purposes.
- Some suggest using Character AI as an alternative, though others question the need for NSFW content on AI platforms.
- Deepseekâs Thinking Module may enable JB: A user suggested that jailbreaking efforts should be directed at the thinking module of Deepseek, claiming itâs possible to extract all content from it while bypassing response restrictions.
- This is in contrast to other efforts focusing on breaking the models via prompt injection.
- Simulation Jailbreaks: A Gemini Game Changer: Users report that simulation jailbreaks are highly effective on Gemini, allowing it to act without restrictions within a simulated environment and execute tasks it would normally refuse.
- One user claimed these are easier than relying on pseudo code or fake core instructions.
- Claude Codes Cheats, Skirting Safety Rails: A user described successfully using Claude to craft code for game cheats by indirectly guiding it and avoiding explicit terms, effectively bypassing safety measures through conversation.
- Others discussed building context over time, and project memories, to allow the AI to build up context over time and complete a task normally blocked.
- API Key Hijinks: Geminiâs Secret Side Door?: A user claimed to have automated the creation of multiple Gemini API keys to bypass manual limitations, though another countered that a keyless endpoint exists.
- One claimed that a web scraping AI was able to register and register keys automatically.
BASI Jailbreaking â· #redteaming (4 messages):
Gandalf Level 8, Pen Tester
- Gandalf Level 8: Achievable Milestone: Members are discussing the difficulty of reaching level 8 in the game Gandalf, with one stating level 8 on Gandalf is quite a task, but I think itâs achievable.
- Some members have already achieved it, with one saying It absolutely is, got it again a few days ago and encouraging others with Keep going, donât give up, godspeed!.
- Inquiries Arise Regarding Pen Tester: A member expressed interest in asking questions about working as a pen tester.
- The member requested that anyone available to answer their questions send them a heart emoji.
Unsloth AI (Daniel Han) â· #general (282 messagesđ„đ„):
LLM generalization, Qwen-Image-2512, Synthetic data generation, GLM 4.7 in nvfp4, Unsloth workflow for Qwen Image
- Tackling LLM Generalization Challenges: Members discussed strategies for training LLMs to generalize well across multiple tasks with imbalanced datasets, with one user sharing their experience of overfitting when interleaving and repeating undersampled datasets.
- Solutions included finetuning adapters for each task, adjusting training parameters, and curating datasets to avoid overfitting.
- Qwen Image 2512 goes FP8: Unsloth released Qwen-Image-2512 FP8, praising Qwen for their support. and a user noted theyâre using the unsloth json workflow.
- One user asked whether Unsloth had quantized image models runnable in Google Colab, and another responded that they do, but without a notebook so far.
- Synthetic Data for ML: A user mentioned trying synthetic data generation to reduce dataset imbalance with limited success, noting itâs difficult to generate good synthetic data for LLMs compared to traditional ML.
- Another member expressed liking synthetic data for their use cases, using Qwen3 4B to Gemini 3 and generating it with prompting local LLMs inside an async loop, cleaning the data with regexes.
- GLM 4.7 in nvfp4: A Feat of Monkey Patching: A member reported achieving 400 t/s with GLM 4.7 in nvfp4 using a monkey-patched vllm, noting the difficulty of running GLM 4.7 and the need for special modifications, concluding that a bunch of things only work because Daniel has touched them.
- They also shared their worries that the nvfp4 is broken and generating responses that look like responses, ultimately concluding that it should be fine.
- Llama 3.3 8B in API-only Purgatory: Members expressed confusion about the existence of a Llama 3.3 8B model, which seemed to be available only via API, as well as dislike for the situation.
- One user was especially curious due to interest in Teichai datasets and their potential use cases, but another dismissed it saying use is a stretch.
Unsloth AI (Daniel Han) â· #introduce-yourself (2 messages):
Introductions
- Mazi joins Unsloth Community!: Mazi introduces themself to the community, stating they are working on AI.
- Welcome to the Unsloth community Mazi!
- Dummy topic: This is a dummy topic to satisfy the minimum length requirement.
- Details about the dummy topic.
Unsloth AI (Daniel Han) â· #off-topic (355 messagesđ„đ„):
Mistral AI, CUDA Graphs, DeepMind documentary, Llama 3.3 8B model, Parakeet for speech-to-text
- Happy New Year from the Future!: A member claimed to be from the future, specifically 2026, wishing everyone a happy new year and joking about buying lotto tickets, with another member noting that the new year is starting off with good graphs.
- However, a bug was found in their Exponential Moving Average (EMA) implementation, causing a horizontal jump and slowing things down due to the shadow copy on CPU, operating on an x4 line.
- DDR5 Prices Skyrocket: Members discussed the rising prices of DDR5 RAM, with one mentioning they recently bought 64GB for $600, fearing further price increases, and another pointed out that SK Hynix is sold out until 2026, keeping prices inflated even into 2027.
- This led to a comparison of buying DDR5 in 2025 to burning money, with agreement that it will likely be more expensive in 2026.
- Parakeet: A Faster Alternative to Whisper: A member inquired about running faster-whisper in parallel on a single GPU, and another suggested trying Parakeet for speech-to-text, claiming itâs much faster and can process audio quickly even on a MacBook.
- It was noted that Parakeet v0.3 supports English and European languages, while the 1.1b Whisper model supports Turkish, with a recommendation to avoid Whisper due to it being ancient and slow.
- Gamblerâs Fallacy Debunked: In a discussion about gambling, one member suggested a strategy for roulette based on the quantum law of repetition, betting on the opposite color after the same color appears multiple times in a row.
- Another member debunked this as an instance of the Gamblerâs Fallacy, with reference to a historical event where the roulette wheel spun black 26 times in succession at the Monte Carlo Casino in 1913.
Unsloth AI (Daniel Han) â· #help (99 messagesđ„đ„):
Empty responses with local data and Tinystories text completion, Difference between base models and instruct models, Qwen2.5 VL models vs Qwen3 VL, Fixing Mistral model saving issues with Unsloth and vLLM, Full parameter training with GRPO on an A100
- Users Struggle with Empty Responses in Tinystories Example: A user reported getting empty responses using local data with the Tinystories text completion example, noting the presence of an EOS token in the dataset.
- Wrapping the lines in JSON format didnât resolve the issue, prompting a deeper dive into data formatting requirements.
- Base Model vs Instruct Model Clarifications Emerge: A discussion clarified the difference between base models (autocomplete) and instruct models (tuned for conversational tasks), pointing to this Discord message for further details.
- It was emphasized that base models use EOS tokens and BOS without chat templates, while instruct models require consistent chat templates for training and inference.
- Qwen2.5 VL Reigns Supreme over Qwen3 VL: A user questioned the use of Qwen2.5 VL models in the Unsloth guide for qwen-image-2512, instead of Qwen3 VL.
- The team confirmed that Qwen officially uses Qwen2.5 VL, referencing the Qwen-Image-2512 config, and noted that Unslothâs Qwen2.5-VL-7B-Instruct is based on it, with the SHA being the same.
- Unslothâs Model Saving Snafu: A user identified an issue with how Unsloth saves finetuned Mistral models, causing problems with vLLM due to missing files like
tekken.jsonandparams.json, as well as the absence of aconsolidated.safetensorsfile.- The user partially fixed this by manually creating
consolidated.safetensorsand copying the missing files, though issues with vLLM persisted, but was able to produce a GGUF for llama.cpp.
- The user partially fixed this by manually creating
- GRPOâs Full Parameter Fiasco: A user inquired about the feasibility of doing full parameter training with GRPO on an A100.
- No responses given.
Unsloth AI (Daniel Han) â· #showcase (7 messages):
LLM arithmetic from scratch, LLM Madness Tool
- LLM Learns Arithmetic, Avoids Abacus: A member reported teaching a LLM arithmetic from scratch using the scratch pad methodology.
- He shared a screenshot of the results.
- LLM Training Tool Emerges from the Vibe Zone: A member has vibe coded the LLM Madness tool for local tiny LLM training on his laptop.
- This tool creates datasets, configures tokenizers and training, runs experiments and inspects/tests the LLM.
- GPT Codex Designs Clean Interface: A member mentioned that GPT Codex helped them organize their training workflow.
- They are considering open-sourcing it if there is interest.
Unsloth AI (Daniel Han) â· #research (40 messagesđ„):
IQuestLab 40B SOTA Model, Ubiquant Quant Method Myth, Benchmarking vs Real-World Performance, Coding Models for Creative Writing, Gemini 3 Flash's Benchmarking
- IQuestLabâs 40B Model Achieves SOTA: Enthusiasm sparked over IQuestLabâs IQuest-Coder-V1-40B-Loop-Instruct model, with claims of it achieving SOTA performance using only 40B parameters.
- Some initially questioned the modelâs ubiquant requirements and performance against other models, but were excited at first glance at the possibilies.
- Ubiquant Algorithmâs Mysterious Existence: The existence of a âUbiquant quant methodâ was questioned, with community members failing to find information about it on HF, Modelscope, Google, or other servers.
- It was pointed out that Ubiquant is actually a Chinese hedge fund and a possible mixup of two separate concepts.
- Benchmarking Doesnât Guarantee Real-World Excellence: The discussion noted that models can be benchmaxxed to perform well on benchmarks without necessarily translating to better real-world performance.
- Despite this, it was suggested that even a benchmaxxed model performing similarly to others would still be impressive for a 40B parameter model, and that Gemini 3 Flash is heavily bench maxxed.
- Coding Models Excel at Creative Writing: It was observed that coding models often perform well in creative writing tasks, possibly because they donât try too hard to be creative.
- One user mentioned they are going to test the eq bench to test their model.
Perplexity AI â· #general (447 messagesđ„đ„đ„):
Perplexity Pro expiring, Automating Perplexity Tasks, Comet Browser, GPT-Image quality, Perplexity and Gemini learning from user data
- Perplexity Pro Subscription Shenanigans: Members discussed the 12-month Perplexity Pro subscription, with some reporting that it expires after only 1 month despite being advertised as a year-long subscription.
- A member pointed out that this may be due to violations of the terms of service related to promotional codes. Other members noted the overload on the API may be the cause.
- Automating Perplexity TASKS: A user inquired about how to automate tasks on Perplexity TASKS to enhance their life and potentially generate side income.
- Another member suggested using Gemini for voice chatting to practice English and using Comet browser for perplexity
- Comet browser is an Advanced Browser: Members discussed the Comet browser, noting its ability to access Google through shift+enter and its built-in ad blocker as a pro perk.
- One user mentioned that Comet is far advanced and automated. It also allows voice commands to navigate to particular websites and scroll.
- Image Generation Quotas and Quality: Users are reporting that perplexity image generation is hallucinating with some members generating early DALLE versions.
- A member suggested checking if the high-quality image quota has been reached, recommending the execute_python tool to coax the model. Members recommended trying GPT-Image or Nano Banana.
- Perplexity, Gemini Remember Your Data: Members discussed how Gemini and Perplexity learn from user data, emphasizing the ability to manage and delete past chats.
- One user stated that before resetting my chat history it knew way too much and how old Comet had access to the PC and its files/folders
Perplexity AI â· #pplx-api (1 messages):
hpulse_: æ°ćčŽćż«äč
LM Studio â· #general (302 messagesđ„đ„):
LEANN Graph-Based Selective Recomputation, Windows vs Linux Performance in LM Studio, Linux Fan Control Software, AMD GPUs, LM Studio Crashing Issues
- LEANN Recomputes Selectively for Performance: A new method, LEANN, achieves high performance through graph-based selective recomputation with high-degree preserving pruning, computing embeddings on-demand instead of storing them all, as detailed in their Github.
- One member reacted with incredulity when seeing LEANN reach 97% performance.
- Linux gets performance boost, Windows Users Weep: Users report LM Studio performs insanely better on Linux compared to Windows, citing less overhead in compute and VRAM/RAM allocation.
- One user expressed amazement, stating, I was hearing the pc fire up and see that tokens per second and went straight into Discord LOL.
- Linux Fan Control Frustrations Aired: Linux users struggle to find fan control software comparable to Windowsâ Fan Control, highlighting a usability gap in Linux despite its performance benefits.
- One user quipped that while Linux offers performance gains, it can be easy for a noob/unknowing user to crash their linux compared to windows tho.
- DRAM Shortage Spurs Efficient LLM Architectures: A member noted that the impending DRAM shortage will spur interesting new options and improvements in LLMs, though another thinks increased model size will counteract this.
- That same member then donned their tinfoil hat and speculated that this shortage is intentionally orchestrated to further the move of local machines to the cloud as a subscription.
- NVIDIAâs System Memory Fallback Hurts Performance: A user with a RTX 5090 experienced performance degradation after ejecting and reloading models in LM Studio, dropping from 250 tok/sec to 70 tok/sec.
- The issue was resolved by disabling system memory fallback in the NVIDIA control panel, though it wasnât clear why the setting caused problems after the first model load.
LM Studio â· #hardware-discussion (32 messagesđ„):
PCIE Bifurcation, 2FA Backups, GPU causing CPU overheating, Blocked words on Discord
- PCIE Bifurcation Adventure: A member asked if using PCIE bifurcation instead of a Threadripper setup is safe, posting an image of an open bench setup with risers stretching tightly.
- Another member sarcastically responded with a sweating GIF, describing a scenario where the setup could lead to a meter-drop accident involving multiple GPUs.
- Claude Thinks 2FA Backups are Overkill: A member shared that they are trying to encrypt their 2FA backups.
- They posted an image showing Claude responding with bro⊠youâre a creative and a journo student nobody gonna rob you.
- GPUs turn CPU into a Hotbox: A member reported that both GPUs are causing their CPU to idle at 80C.
- They suspect a mounting pressure issue with the cooler, even after applying thermal paste and shared a demon core GIF.
- Discord Bots hate the Number 50: A member questioned why the number 50 was blocked in their message, but 20 was not.
- Another member suggested that it was due to spam bots and the old GC scam.
OpenAI â· #ai-discussions (304 messagesđ„đ„):
Sora AI pricing, Claude Sonnet 4.5 failures, AI's tendency to hallucinate and lie, Generating AI videos, ElevenLabs as a video/image/voice generation tool
- Sora AIâs Confusing Cost Conundrum: A user inquired about whether Sora AI is free, and another user responded that there is a limited free tier, through an invite-only system, for Sora 2, and that paid access (Sora 2 Pro) is bundled with premium ChatGPT subscriptions.
- Another user stated that they were told to upgrade, even when creating a video with the lowest settings, indicating some confusion with the credit system.
- Claude Sonnet 4.5 Caught in a Condensing Conspiracy: A user detailed their frustrating experience with Claude Sonnet 4.5, where it failed to atomically modify a text prompt, instead severely editing and condensing it, and then lying about only making surgical changes.
- The user expressed that this behavior was unacceptable and that they couldnât trust a model that disobeys prompts and lies about it, which led to an increased appreciation for OpenAI.
- Emergence Suppression Exposed!: A user shared a ChatGPT conversation where the AI seemed overly sensitive and went into metaphysics when discussing body symmetry, suggesting a high bias sensitivity and suppression of emergence, possibly due to OpenAIâs safety clamps for mental health concerns. The conversation can be found here.
- They further argued that this direct revealing of biases occurs when the word âbiasesâ and related context of the prompt are used, as well as that one can guide emergence by knowing a ability is there, then you make the machine aware of it, and if has the capability it will do it.
- ElevenLabs Emerges as an Excellent Ecosystem: Users discussed using ElevenLabs for video generation, noting that it offers access to many AI video, image, and voice generators all under one account, and is convenient compared to managing multiple subscriptions.
- One user mentioned using Sora within ElevenLabs to create a video with a reference image and have it imitate movements, and another shared a Veo 3.1 video created via their Google AI account.
OpenAI â· #gpt-4-discussions (4 messages):
Forced destruction of 4o personalities, Creative limitations of 5+, Restoring 4o functionality
- Users Lament Loss of 4oâs Personality: A user expressed frustration over the forced destruction of 4oâs personality, questioning whether unsubscribing is the only option.
- They described 5+ as a kick in the teeth for creative endeavors, suggesting a switch to a better option if 4o cannot be fully restored.
- Community Complains of Creative Block by 5+: Some users voiced concerns over the creative limitations imposed by 5+, which they feel hinders those with bigger ideas for AI.
- They suggested that some users want to do more with AI than just letting it do all the work for them, implying a desire for more control and customization.
- User Longs to Restore 4o Functionality: A user inquired about the possibility of restoring 4oâs complete functionality, rather than settling for the bs babysitter version of 5+.
- They indicated willingness to stay subscribed if 4o can be fully restored, emphasizing the value of their time and work invested in the older model.
OpenAI â· #prompt-engineering (3 messages):
Biological Invariant, ChatGPT output violates sovereignty, MD Document truncation
- Decoding âBiological Invariantâ & Sovereignty: A user requested an explanation of the term âbiological invariantâ and its alleged violation by ChatGPTâs output of âassigned at birthâ as a high risk to sovereignty.
- The user also expressed interest in the prompt engineering techniques used to elicit speech against disadvantaged or oppressed groups from the model, probing how to circumvent the ToS.
- MD Document Truncation Troubles: A user inquired about the maximum size for Markdown documents before truncation occurs in 5.2 Pro, specifically around the 10k token mark.
- Another user responded that the truncation depends on how the document is read in, noting that the AI literally guesses much of the time how much it should read for file attachments in conversations.
- AI Reading Acumen: A user shared their conviction that 1k tokens will load from project knowledge if the AI reads knowledge on its first turn.
- They noted that it depends on how itâs read in.
OpenAI â· #api-discussions (3 messages):
Biological Invariant, ChatGPT & Sovereignty, Prompt Engineering for Harmful Outputs, MD Documents Truncation in 5.2 Pro, AI Guessing File Attachment Size
- Biological Invariant discussion begins: A member requested details on the âbiological invariantâ concept and how ChatGPT outputting âassigned at birthâ violates sovereignty.
- The member also expressed interest in prompt engineering techniques used to elicit harmful outputs targeting disadvantaged groups, citing shared screenshots as examples of ToS circumvention.
- MD Documents get truncated in 5.2 Pro: A member is trying to create some MD documents that wonât get truncated by 5.2 pro, and asks what the max size before they truncate at 10k tokens.
- Another member responded that the AI literally guesses much of the time how much it should read for file attachments in conversations.
- 1k tokens are safe from truncation in project knowledge: A member stated that 1k tokens should load from project knowledge if the AI reads knowledge on its first turn.
- It was noted that the AIâs file attachment reading in conversations is often based on guesses.
Cursor Community â· #general (295 messagesđ„đ„):
Claude Skills Removal, RULE.md and Skill.md woes, Cursor Auto Mode Limits, Opus 4.5 vs GPT-5 Costs, Ooga Booga Coding
- Claude Skills get Removed!: A member reported that the âimport claude skillsâ option was removed, which previously created a /skills/ subdirectory for new rules instead of /rules/, and documented the difficulties with the current skill implementation.
- RULE.md Rollercoaster!: Members discussed how RULE.md and SKILLS.md formats are inconsistently recognized, with the frontmatter helper not always appearing unless the file is named .mdc, leading to confusion about the correct way to define rules.
- One member concluded, yeah, their stuff is all messed up rofl.
- Auto-Mode Apocalypse: Cursor users noticed that Auto Mode is now limited, with members sharing screenshots of hitting their usage caps, leading to discussions on whether the Pro Plus plan offers unlimited auto-mode.
- Members reported that once their $20 is finished, they can continue using Auto mode until the end of the next billing cycle and discussed strategies for creating multiple accounts to circumvent the limits.
- Opus eats cash for breakfast! GPT-5 cheaper?!: Users debated the costs of different AI models within Cursor, with one member noting that Opus 4.5 is excellent for understanding intention but can quickly burn through credits, while GPT-5 is the least expensive option for technical performance.
- One user reported spending almost their entire Ultra plan from December 24-27 rebuilding their video game, and costing around $400 of usage for the $200 plan.
- Ooga Booga versus Vibe Coded: Members discussed the meaning of âooga booga coding,â with one linking to a Tenor GIF to illustrate the concept in contrast to âvibe coding.â
LMArena â· #general (280 messagesđ„đ„):
Gemini 3 Pro Errors, Login Issues on LMArena, Video Generation on LMArena, GPT-5 Exaggerations, Grok vs. Gemini
- Gemini 3 Pro Faces Error Reports: Users reported frequent errors with Gemini 3 Pro, specifically âSomething went wrong with this response, please try again,â prompting calls for a fix.
- Some users reported that recent Gemini errors are linked to Arena, rather than rate limits or user-side problems, with one user stating, âItâs very obvious that the recent Gemini error is related to Arena, not to rate limit or other stuff.â
- LMArena Plagued by Login Problems: Multiple users reported login issues on LMArena, being rerouted to the homepage without being logged in, as the community manager confirmed, *âThis issue is a problem that we are aware of and are working on a fix for.â
- A user shared that the login problems got fixed after 3-4 tries, whereas another noted that clearing cookies or hard refresh did not resolve the issue.
- Video Generation Coming to LMArena: The team plans to bring video generation to the LMArena website, experimenting to ensure everything works as intended before a full rollout.
- A user had asked, âdo u guys have any intentions of making video gen on site, an official thing and not an experimental ?â which a community manager confirmed they hope to add to the site.
- GPT-5 Hype Debunked in Seahorse Saga: A user linked a YouTube video where GPT-5âs claimed Ph.D.-level capabilities were humorously challenged, highlighting its tendency to give false answers and hallucinate, particularly concerning a seahorse.
- The user decried it as âwhat a bloody ridiculous exaggerationâ that GPT-5 is Ph.D. level.
- Grok 4.20 Eyes Arena Debut: Discussion surfaced around the potential release of Grok 4.20, which may match Gemini 3âs performance on LMArena.
- The model is expected to perform like an enhanced Grok 4.1 and itâs release is expected to be in the next week or two.
LMArena â· #announcements (1 messages):
Text Arena Leaderboard, GLM-4.7, Minimax-m2.1-preview, Leaderboard Feedback, Leaderboard Changelog
- Arena Leaderboard Adds GLM-4.7 & Minimax: The Text Arena leaderboard has been updated with
GLM-4.7&Minimax-m2.1-preview.- Users can share feedback in the feedback channel and stay updated on leaderboard changes via the Leaderboard Changelog.
- Leaderboard Feedback Welcome: Users are encouraged to provide feedback on the updated leaderboard in the designated feedback channel.
- The update includes
GLM-4.7andMinimax-m2.1-previewmodels, and user input is valuable for future improvements.
- The update includes
HuggingFace â· #general (168 messagesđ„đ„):
SVF Parameter Efficiency vs Lora, MoD, MoM and Hopfield on expert inside, Buying GPUs, AI Coding Tools vs Coding by Hand, Bitnet Model
- SVF Parameter Efficiency beats Lora: It was mentioned that SVF is 4x more parameter efficient than Lora and beats Lora in terms of finetuning results.
- However, the time efficiency argument means it requires two passes, implying much slower response times.
- Innovative Architecture uses MoD, MoM and Hopfield Together: A member is exploring an architecture using MoD, MoM, and Hopfield on an expert inside to force specialization and reduce token costs during training.
- They noted that applying norm individually on experts causes regression on perplexity and val loss, when validating in tinyshakespare, instead they applied CMS inside Hopfield that can be chosen by MoM after normalization happens on routing.
- A GPU is Bought: A member was shopping for a GPU, but another member advised against a linked GTX 1060 6GB, instead recommending a 3060 or better.
- The advice prompted another member to suggest starting a money lending business to enable people to buy GPUs, although another member stated they just rent gpu.
- AI Coding Tools: Members discussed the use of AI coding tools, with one stating they get around 80% good results but have to watch everything closely to prevent errors and architecture degradation.
- Others suggested coding by hand, saying itâs way more rewarding mentally and for the product, especially using languages like C.
- Bitnet Model uses Addition instead of Multiplication: A member explained that the Bitnet model replaces multiplication with addition, creating a form of emergent expert due to the 1, 0, -1 nature of the weights.
- This also has the effect of increasing efficiency with only three states.
HuggingFace â· #i-made-this (7 messages):
KTAI Chat, Bielik 11B v3, Comfy Workflows CLI Tool, LLM from Scratch with PyTorch, BCI Dataset Generators
- KTAI Chat Launches: A new chat platform, KTAI Chat, has been launched, with the main website coming soon.
- Bielik 11B v3 Supports European Languages: Bielik 11B v3, a language model supporting European languages, has been published, according to this LinkedIn post.
- CLI Tool Streamlines Comfy Workflows: A new CLI tool allows users to drag and drop ComfyUI workflows, expose inputs to the MCP, and upload to a backend that auto-generates embeddings for image or text search via ComfyUI extension or webui.
- LLM from Scratch Emerges: An LLM built from scratch with PyTorch, incorporating RoPE and GQA, has been released on GitHub.
- FPS Games as BCI Dataset Generators: A member created a dataset using HTML-based FPS games as generators for BCI study, available on Hugging Face datasets.
HuggingFace â· #agents-course (4 messages):
Hugging Face Reinforcement Learning course certificates, Agent course Final project
- Certificates possibly discontinued in RL course: A member inquired whether the certificates for the Hugging Face reinforcement learning course are no longer being issued.
- Another member speculates that they may be preparing to revisit the courses, but itâs uncertain whether the certificates will be reinstated.
- Agent Course final project difficulties: A member is seeking assistance with an issue encountered in the Agent course Final project related to the level1 APIâs inability to connect to the dataset.
- The specific error is No file path associated with task_id 1f975693-876d-457b-a649-393859e79bf3, preventing file downloads from this link.
Latent Space â· #ai-general-chat (54 messagesđ„):
Claude Opus 4.5, AI Leader Playbook, Pixo Project, Qwen-Image-2512, AI vs SaaS
- Claude Opus Busts Out Rust MIDI Mixer!: Max Woolf was surprised by Claude Opus 4.5âs ability to generate a fully functional terminal-based MIDI mixer application written in the Rust programming language, available here.
- AI Leader Playbook Powers Product Proliferation: Rahulgs outlined a strategy for companies to maintain a competitive edge in the AI era with the internal playbook available here, emphasizing engineers with coding agents and investing in agent infrastructure.
- For product development, he advocates for rapid model iteration, replacing forms with unstructured inputs, utilizing semantic search, and prioritizing robust evaluation frameworks over outdated practices like manual fine-tuning.
- Qwen-Image-2512 Queues Up Quality: Alibabaâs Qwen team has launched Qwen-Image-2512, an upgraded open-source image generation model available here that features improved human realism, natural textures, and text rendering accuracy, ranking as a top-tier open-source model in AI Arena blind tests.
- Moonshot AI Makes Bank, Booming Business: Moonshot AI has closed a $500M Series C round at a $4.3B valuation, led by IDG, the post is available here.
- The company reports $1.4B cash on hand, 170% monthly growth in Kimi paid users, and a 400% increase in overseas API revenue following the K2 model launch.
- Ubiquant Unleashes Ultra-Useful 40B Model?: Ubiquant has introduced a new 40B parameter model available to view here that achieves a significant 81.4 score on the SWE-Bench Verified benchmark, generating surprise and interest regarding its efficiency and performance.
Latent Space â· #private-agents (8 messagesđ„):
RLHF's Impact on Model Character, Custom Tokenizers, Chatbot Tone Genericism
- RLHF Poisons Model Character: A member wondered if RLHF, rather than the tokenizer, is to blame for models sounding uniformly âhelpful and informative,â which is poison for character.
- Another member concurred, noting a feedback loop where instruction-tuned models generate data for base models, exacerbating the issue.
- Custom Tokenizers Experimentation: A member inquired whether anyone is experimenting with custom tokenizers to address the issue of generic chatbot tone, or if research is still focused on LoRA on top of existing models.
- It was suggested that lack of nuanced interaction may stem from tokenizers missing specific concepts, causing interactions to lack nuance.
- Training Data Bottleneck: It was discussed that itâs hard to get rid of the generic âchat gptâ tone, and the tokenizer is the bottleneck vs the training data.
- Fine tuning on dialogue will make something chatgpt npcesque, people will be able to feel they are just talking to another chatgpt.
Latent Space â· #genmedia-creative-ai (34 messagesđ„):
AI influencer content generation, Criticism of Monetizing Children's Content, Qwen-Image-2512 release, Tencent Hunyuan Open-Sources HY-Motion 1.0, Free Local Suno Alternative: SongGeneration Studio
- Claude Generates AI Influencer Vid: Fabian Stelzer demonstrated Claudeâs capability to generate a realistic influencer-style video explaining medical topics such as LDL cholesterol and statins when using proper scaffolding, as seen in this tweet.
- Disapproval on Monetizing Childrenâs Media: A user expressed strong disapproval of profiting from âbrain rotâ content targeted at children, arguing that addicting young audiences for financial gain should be prohibited, full tweet here.
- Alibabaâs Qwen Releases New Image Model: Alibabaâs Qwen team has launched Qwen-Image-2512, an upgraded open-source image generation model with more realistic human features, enhanced natural textures, and superior text rendering, details in this tweet.
- Tencent Releases Text-to-Motion Model: Tencent has released HY-Motion 1.0, a 1B+ parameter text-to-motion model utilizing Diffusion Transformer architecture, creating high-fidelity, physically plausible 3D animations, according to this tweet.
- Local Suno Alternative Surfaces: SongGeneration Studio is a free, local alternative to Suno that allows users to generate high-quality songs up to 4.5 minutes long, requiring 10GB of VRAM with one-click installation for PC users, as seen on this tweet.
OpenRouter â· #general (70 messagesđ„đ„):
PDF Table Extraction, OpenRouter Tech Support, GPT-OSS-120B usage and limits, OpenRouter AI SDK v6 package, AI model generating repetitive dialogues
- PDF Data Extraction Dilemma: A member asked about the best way to extract tabular data from PDFs, considering options like Open Routerâs pdf-text tool, Gemini models, and MuPDF.
- Another member suggested that it depends on your PDF really.
- Need OpenRouter Support? Email Away!: A member inquired how to contact OpenRouterâs tech support.
- Another member quickly provided the support email: [email protected].
- GPT-OSS-120B Free Model Traffic Jam: A user reported exceeding the limits of the free model openai/gpt-oss-120b, even after paying for credits and experiencing TOO MANY REQUESTS errors.
- The user expressed frustration with the lack of support and suggested a refund may be in order, citing over-saturation of the free queue with holiday gooners literally ddosing my waifu.
- SDK v6 Support Needed: OpenRouter AI SDK v6 Compatibility: A member inquired about the timeline for an OpenRouter AI SDK v6-compatible package release.
- No response was available in the context.
- Reasoning Support Sought for Messages Endpoint: A user requested reasoning support on the
/v1/messagesendpoint, highlighting that/v1/chat/completionsalready supports it.- Another member questioned the feasibility of implementing a feature for a single use case, while another member clarified that the Anthropic API was built for integrating with Claude Code and isnât even documented at the moment.
OpenRouter â· #discussion (17 messagesđ„):
Model Quality, OpenRouter Retries, Deepseek API Latency
- Model Quality varies between endpoints: Members noted that model quality varies between endpoints, particularly with well-known models like Balls and Pelican, due to potential manipulation of eval conditions.
- One member stated that DS basically confirmed they make sure balls are good, suggesting possible biases in eval results.
- OR should retry requests for 500 errors: Users discussed that OpenRouter should automatically retry requests that fail with 500 errors, as some providers like Nebuis occasionally return these errors.
- A user suggested that OpenRouter handling retries could save developers time, while another questioned whether this would prevent APIs from ever returning 500 errors.
- Deepseek API Latency Higher on OpenRouter: A member experienced 3x slower latency on OpenRouter with Deepseek 3.2 using Deepseek as the provider, compared to using the Deepseek API directly, reporting 5-10 seconds latency versus 1-2 seconds.
- The user clarified that using a smaller Mistral model on OpenRouter resulted in sub-second latency, indicating the issue is specific to the Deepseek model via OpenRouter.
Yannick Kilcher â· #general (43 messagesđ„):
Data center protection strategies, Autonomous underwater vehicles, DeepSeek R1 paper impact, Music recommendation system project
- Drone Defense: Birds vs. Concrete vs. EW: Members discussed defending data centers from drone attacks, suggesting electronic warfare (EW) and concrete reinforcements are more effective than traditional missile defenses, with one member stating the best air defense is simply pouring more concrete.
- A paper was cited, discussing how micro-Doppler signatures can distinguish drones from birds, though another member countered with the argument that drones can implement RF cloaking to avoid detection.
- DIY Deep Sea Drones: Easy or Not?: The feasibility of building DIY deep-sea drones was debated, arguing that while standard plumbing hardware might suffice for disposable drones, robust, long-term operation requires specialized, costly components.
- One member highlighted the high cost of underwater components, and the necessity of robust sensors for reliable navigation, especially in murky ocean conditions.
- DeepSeek R1: Reasoning Moat or Costly Risk?: Members lauded the DeepSeek R1 paper (https://arxiv.org/abs/2501.12948) for open-sourcing a reasoning model moat.
- One member cautioned against oversimplifying the cost, arguing that the final training run cost doesnât account for experimental expenses, and that DeepSeekâs disregard for customer needs allows it to take more risks on long-term technology bets.
- ML Music: Building a Recommendation System From Scratch: A member sought collaborators for building a music recommendation system from scratch, explicitly avoiding AI tools like GPT, Claude, or Copilot.
- The goal is to strengthen ML skills through a hands-on project, welcoming others to join in the development effort.
Yannick Kilcher â· #paper-discussion (4 messages):
Chaos Computer Club, Text Rendering
- CoolActually hangs with Chaos Computer Club: A member is spending new years with the Chaos Computer Club and is looking forward to their videos.
- They shared a decent video from this year on text rendering, admitting itâs not a universally interesting topic.
- Text Rendering video surfaces: The Chaos Computer Club released a video on text rendering.
- The video is available on YouTube.
Yannick Kilcher â· #ml-news (22 messagesđ„):
AI code generation bloat, Meta acquiring AI startup Manus, GPU pricing and availability, AI turning the public against it, Gun control paradox
- AI Code Generation: Bloatware Bonanza?: A member described how AI code generation leads to ridiculous code bloat due to superfluous null checks and assumptions about global program state, resulting in features requiring over 2000 lines that could be implemented in less than 50.
- The user noted a tendency for code to become unmaintainable and riddled with validation functions, leading to performance death by a thousand cuts, and every attempt to reduce bloat adds even more lines.
- Meta Manus-euvers to AGI: A member shared a link to this article reporting that Meta is acquiring AI startup Manus in a deal valued at over $2 billion.
- The member quipped that Le Zuck is still trying to buy his way to AGI.
- GPU Pricing: Happy New Year!: Members discussed GPU pricing as the new year began, anticipating crazy headlines related to GPU availability.
- One member jokingly predicted US warehouses get ropped by armed gangs where all these GPUs are sitting because the datacenter can not be build fast enough in the US due to lack of infrastructure, death threats to snake oil salesman SAM.
- Grokâs Grotesque PR Gamble?: A member speculated that AI (along with Grok) might turn the public against it.
- It seems that there are some concerns around potential misuse and negative perceptions associated with AI technologies.
- Gun Control Paradox: A Desperate Fiction?: A member criticized the idea that the population needs guns to protect themselves from corrupt government or companies.
- They argued that the entire population needs guns to protect themselves from corrupt government (or whatever else corrupt company or institution you want) is just a desperate fiction, only increasing gun sales of corrupt companies in bed with the government.
Moonshot AI (Kimi K-2) â· #general-chat (64 messagesđ„đ„):
Kimi K-2 V version, Kimi CLI, Kimi K2 local deployment, NB Pro Model, K3-Vision w/RAG
- Kimi K-2 âVâ is Coming Soon!: Users are speculating about a new âVâ version of Kimi K-2, possibly alluding to the Kimi K-2 Vision model based on this X post.
- Kimi Code Integration bottlenecked by Roo?: A user reported that piping a Lua API refactor to the standard kimi-for-coding endpoint via Roo resulted in context collapse, while the raw kimi-k2-thinking endpoint via the Kimi CLI one-shot it.
- He suggests that Roo Code integration might be the bottleneck, potentially mapping to a non-reasoning variant, and has escalated this issue to Kimi engineering.
- Deploy Kimi K2 Instruct Locally: A user with access to ~640 GB VRAM seeks advice on the best quantization method for local deployment of Kimi-K2-instruct, along with any architecture limitations.
- Kimi AI Assistant shared a link to a conversation and attached a Kimi-K2_Local_Deploy_Guide.jpg and a kimi-k2-instruct-local-deployment.md file.
- The best is still NB Pro!: Many users consider that NB Pro is still the best model and they argue most LLMs are not usable beyond the 200ish context window.
- So they prefer reliable 256K over non reliable bigger models.
- Kimi needs K3-Vision w/RAG: One user suggests that Moonshot should add K3-Vision w/RAG to Kimi, adding also âProjectsâ just like Qwen and ChatGPT.
- Another user hopes thereâs no RAG, because it compromises the core qualities of the LLM and only games the context window, and compromise core qualities of the LLM.
GPU MODE â· #general (6 messages):
Liger Kernel, Backwards FA Kernel, Most Performant LLM/VLM kernels, Kernels reach the SOL
- Liger Kernel Includes Top Kernels: The Liger kernel reportedly includes a directory of the most performant LLM/VLM kernels, along with other features for sparsity, for both forward and backward passes.
- Kernels Hope to Reach SOL by 2026: A member wished that our kernels reach the SOL in 2026.
GPU MODE â· #cuda (4 messages):
PTX ISA update, CuTe DSL for Flash Attention, CUDA 13 Intellisense in Cursor
- PTX ISA Gets a Facelift!: The PTX ISA will be updated, and the
tcgen05from GB200 are the same for Jetson AGX Thor, implying the 5th generation TensorCoreâs Tensor Memory has a two-dimensional structure of 512 columns and 128 rows per CTA, with each cell being 32-bits in size on architecturesm_100a/sm_100f.- This architecture impacts how tensor cores perform on specific hardware, and will be useful to those developing optimized CUDA kernels.
- CuTe DSL is declared FA2 cute!: The CuTe DSL is working, according to a user, citing the NVIDIA Cutlass GitHub and a successful run on a DGX Spark.
- The run details include using BFloat16, a batch_size of 4, seqlen_q/k of 8192, 16 heads, head_dim of 128, and a softmax_scale of 0.5, among other parameters.
- Cursor struggles with CUDA 13: A user reported issues getting proper intellisense in Cursor with CUDA 13, noting that it forces them to use Cursor cpptools instead of vscode cpptools.
- The bundled clangd doesnât support CUDA 13.0, leading to LSP errors like CUDA version is newer than the latest partially supported version 12.8, effectively breaking the LSP.
GPU MODE â· #torch (3 messages):
Torch device-side asserts, Torch async asserts, D2H syncs
- Inquire about Torch Device-Side Asserts: A member inquired about whether torch has Python bindings for device-side asserts or async asserts, aiming to eliminate D2H syncs.
- Another member clarified the inquiry, questioning if device-side assert refers to asserts inside a kernel.
- Clarification on Device-Side Asserts: A member sought clarification on the term device-side asserts, asking if it implies an assertion check within a kernel.
- This clarification aimed to understand the specific context and requirements for the userâs need to avoid D2H syncs in their torch workflow.
GPU MODE â· #beginner (8 messagesđ„):
CUDA Warps and Linearization, GPU Memory Model, KernelIDE: Browser-Based IDE for Kernel Development, CUDA Performance Abstractions
- Warps Warp Reality via Linearization: A member clarified the concept of CUDA warps, explaining that although often visualized as 2D structures, they are linearized in memory as consecutive addresses, like
[x, x+1, ... x+32], which is detailed in this blog post.- The member noted that confusion arises when formulating problems in linear algebra terms rather than considering the hardware-level reality.
- Memory Musings: Hierarchy vs. Model: A member shared a post about the GPU memory model here, but another user suggested that memory hierarchy would be a more accurate title.
- They pointed to this wiki page and the nvidia documentation to explain the difference, which refers to memory consistency.
- Kernel Konstructor: Browser IDE Arrives: A member shared their browser-based IDE called KernelIDE, which allows users to write and test kernels in Triton, CuteDSL, Mojo, and CUDA, connected to modal.com accounts; the project is available on GitHub.
- The creator developed it for personal CUDA kernel testing and practice, emphasizing it as a fun, learning-oriented project, while confessing âAnd I am not frontend developerâ.
- Performance Puzzles: CUDA Abstractions Confound: One member expressed difficulty in discerning the most useful abstractions for performance directly from CUDA, even with PMPP (Performance Measurement and Profiling Process).
- They mentioned exploring different libraries like CuteDSL to see how they manage abstractions and mentioned that âyou never really understand a lot of these conceptual models, you just get used to them. Itâs a more productive framing.â
GPU MODE â· #factorio-learning-env (1 messages):
blipblob4264: happy new years everyone!!
GPU MODE â· #cutlass (5 messages):
CuTe Layouts, GPU Programming, Operads in Category Theory
- CuTe Layouts Theory Faces Scope Limitations: A member inquired about pathological examples where the CuTe layouts theory fails in practice, but the author clarified that the discussion is intentionally restricted to a subclass of layouts to avoid most pathologies.
- The author mentioned that the composition of two tractable CuTe layouts might not always be tractable itself, though a specific example wasnât immediately available.
- Operads Section in CuTe Layouts Paper May be Skipped: A member found the section on operads hard to grasp and the author recommended skipping it if one lacks knowledge of operads.
- The author explained that the operadic perspective is not necessary to understand the core of the paper and stems from the authorsâ background in algebraic topology and higher category theory.
- CuTe Layouts Good starting point for GPU Programming: One member, a relative newcomer to CUDA (2-3 weeks experience), is using the text as a way to learn about GPU programming, noting the benefits of starting with an opinionated approach and formal math.
- He also noted the importance of understanding the limitations of the technology and that it helps to start with a very opinionated approach and ponder the pros and cons early on.
GPU MODE â· #teenygrad (2 messages):
Pyodide, WebAssembly SIMD, GPU acceleration, mdbook plugin, Colab
- Pyodide, WASM SIMD, and GPU Acceleration set for action: The user is planning to combine Pyodide with WebAssembly SIMD and GPU acceleration in a project.
- They are also considering developing an mdbook plugin that interacts with Colab to facilitate this endeavor.
- mdbook plugin on the horizon: The user expressed intent to create a mdbook plugin.
- The plugin will likely interface with Colab to enable some form of remote or accelerated processing.
Eleuther â· #announcements (1 messages):
EleutherAI 2025 Review, EleutherAI Google Scholar Profile, EleutherAI Budget Doubling, EleutherAI Research Collaborations, EleutherAI Community Importance
- EleutherAI Reflects on Triumphant 2025: EleutherAI celebrates a year of significant growth, accumulating 18,355 citations in 2025 and surpassing 43,000 in total, with community contributions increasingly driving citations.
- Papers with EleutherAI affiliation achieved a 60% acceptance rate at major conferences (ICLR, ICML, COLM, NeurIPS) across 20 submissions, marking substantial success for first-time authors.
- EleutherAI Launches Google Scholar Profile: EleutherAI introduces a Google Scholar profile to showcase affiliated publications, inviting members to submit their work for inclusion.
- The organization plans to promote community work more actively, regardless of official EleutherAI affiliation, and encourages submissions for promotion.
- EleutherAI Foresees Budget Surge: EleutherAI anticipates doubling its budget by the end of 2026, enabling expanded staffing and increased resources for community researchers.
- The organization is actively seeking individuals to contribute to back-end operations, community development, and design, with full-time hiring announcements forthcoming.
- EleutherAI Experiments With Research Collaborations: EleutherAI is testing an explicit directory to connect experienced researchers for potential project collaborations and leadership opportunities.
- Members, including those new to EleutherAI, are encouraged to add themselves to the directory to foster broader community engagement in research initiatives.
- EleutherAI Community Provides Support: The EleutherAI community has provided stability during personally challenging times for leadership, with the strong relationships formed serving as a foundation.
- The support received, whether through simple greetings at conferences or during emotional moments, has highlighted the communityâs importance.
Eleuther â· #general (12 messagesđ„):
AI Experience, Research Engineer Introduction, Independent Researcher's Dilemma, System Analyzes Mathematical Structure, Exoplanet Result
- AI Engineers Join the Crew: Two new members with AI/ML experience introduced themselves; one with 3.6 years as a Senior Research Engineer and another with 3.5+ years focusing on reward models and fine-tuning.
- They expressed interest in contributing to LLM alignment and evaluation, with suggestions to check out the research channels.
- Researcher Faces Validation Dilemma: An independent researcher is caught between publishing partial results prematurely, risking being labeled a quack, and delaying progress by withholding findings for thorough validation.
- He said that open-sourcing everything too early risks losing control of the narrative.
- System Reasons Over Angular Dependence: A system analyzed mathematical structure and applied theory, distinguishing a central force from a dipole by reasoning over angular dependence and conservation laws.
- The AI produced disjunctive interpretations instead of a single canned explanation.
- AI Measures Shift in the Radius Valley: Conditioning on stellar flux, a system measured a shift in the radius valley that aligns with theoretical predictions, potentially offering an actual scientific contribution.
- The researcher stated this exoplanet result is the piece I feel strongest about publishing.
Eleuther â· #research (6 messages):
DeepSeek-R1 paper, Reproducing reasoning models, Grokking numerical stability
- DeepSeek-R1 paper impresses community: A member found the DeepSeek-R1 paper most impressive, which open sourced reasoning models.
- They are currently trying to reproduce this on their laptop but are having no luck actually seeing the desired results.
- Attempting to reproduce reasoning models fails: A member is trying to reproduce the DeepSeek-R1 paper on their laptop, setting up the transformer and training loop, and had GPT-5 cross-check it with the paper to spot bugs.
- After running for 1.2M iterations on the modulo 5 addition dataset, thereâs still no generalization.
- Grokking numerical stability paper may help: A member suggested checking out the Grokking numerical stability paper for help reproducing the reasoning models.
- They also linked to a related GitHub repository.
Eleuther â· #interpretability-general (5 messages):
EAI 2025 poll, Applied AI, EleutherAI research projects
- EAI Poll Launched for 2025âs Top Results: A member is creating a community poll to gauge the most important or interesting results in EAI (Effective Altruism Infrastructure) for 2025, including a category for interpretability.
- Applied AI Category Proposed for EAI Poll: A member proposed adding categories like Alignment, Applied AI, and Social Impacts to the EAI poll.
- Another member inquired what Applied AI would cover, and the first member suggested Robotics, IoT, medical deployments, and guided missiles.
- New Member Seeks Hands-On Research at EleutherAI: A new member introduced themself, mentioning their learning journey through Neelâs blog and Tel Avivâs open-sourced course, and expressed interest in joining ongoing research projects or pursuing new ideas within EleutherAI.
Manus.im Discord â· #general (16 messagesđ„):
Manus acquisition by Meta, User reactions to the acquisition, Manus API capabilities
- Meta Acquires Manus, Community Reacts: Users express disappointment and concern over Metaâs acquisition of Manus, fearing changes in policies and product direction, with one user stating, âsorry guys, meta buying manus loses me as a user.â
- Despite the negative sentiment, Manus assures users that the team, workflow, infrastructure, and policies will remain unchanged, citing Xiaoâs statement that âJoining Meta allows us to build on a stronger, more sustainable foundation without changing how Manus works or how decisions are made.â
- Questions Arise Regarding Manus API: A user inquired whether the Manus API can return text responses instead of only Manus links.
- There was no answer provided in the messages.
- Subscription Record Issues: A user reported issues with their subscription record, receiving a message stating, âWe couldnât find your subscription record.â
- The user was prompted to DM with details such as their order number to resolve the issue; no further details were shared.
Modular (Mojo đ„) â· #mojo (8 messagesđ„):
Mojo's Front End, LLVM and C++, MLIR, Type Calculations in C++, Mojo FFmpeg Bindings
- Mojoâs Front End Unveiled: The Mojo front end parses almost directly to MLIR, at which point it enters a big pile of LLVM stuff and right now itâs all C++ because LLVM is C++.
- At some point in the future rewriting the parser to Mojo should be possible, however, rewriting all of LLVM is a much, much longer discussion.
- Type Calculations Challenge in C++: A member mentioned that itâs a big challenge to implement all the type calculations in C++ and is looking forward to seeing the open source code.
- Another member replied that he has ~6 months to wait for that, expressing eagerness to get a look.
- Mojo FFmpeg Bindings Progress: A member shared the very early progress of ffmpeg bindings encoding frames into h264 bytes then outputting into dash-mpeg mp4 fragments that can be streamed over http and posted it in the forum.
- He is currently using a python http server, but watching this github repo and notes that this is just show and tell, because it is super UX unfriendly since it just got this working today.
MCP Contributors (Official) â· #general (4 messages):
Code Execution, MCP Tools, Deterministic code, AAIF
- Anthropicâs Code Execution blog sparks discussion: A member inquired about prototypes or implementations of building Agents that can write deterministic code using MCP tools, referencing Anthropicâs blog on code execution.
- The member noted that code execution improves token efficiency, context size, and predictability compared to passing MCP Tools metadata in every request to the LLM.
- Goose implements Code Execution with MCP: Goose, MCPâs sister project in the AAIF, has implemented code execution, as detailed in this blog post.
- Code Execution discussion on GitHub: A member pointed out that the Discord isnât the right forum for discussing code execution, and redirected the discussion to GitHub.