a quiet christmas
AI News for 12/26/2025-12/27/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (208 channels, and 2801 messages) for you. Estimated reading time saved (at 200wpm): 236 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
happy christmas.
AI Twitter Recap
Open-source models, local inference, and "inference quality" as the hidden variable
- MiniMax M2.1 (open weights) lands as an agent/coding MoE: MiniMax released M2.1 as open source, positioning it as SOTA for "real-world dev & agents," with claims of strong results on SWE / VIBE / Multi-SWE, and "beats Gemini 3 Pro & Claude Sonnet 4.5." They describe it as ~10B active / ~230B total MoE and emphasize deployability (including local runs). See MiniMax's announcement and links to weights/docs: @MiniMax__AI, @MiniMax__AI. Community + infra support followed quickly: vLLM "Day-0 support" (vLLM); MLX quantizations and local run recipes (@awnihannun, @awnihannun); "Now on MLX" collection posts (@Prince_Canuma).
  - Practical note: Early hands-on posts highlight that M2.1 can run locally at meaningful speed on Apple Silicon when quantized (e.g., 4-bit on M3 Ultra), but also that RAM requirements remain large for big-context generation (e.g., ~130GB cited in an MLX run command). (@awnihannun)
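For readers who want to reproduce the local-run claim, a minimal sketch using the `mlx-lm` Python API; the quantized repo id below is a placeholder (check the MLX community collections linked above for the real one), and expect large RAM use at long context:

```python
# Minimal sketch: run a quantized model locally on Apple Silicon with mlx-lm.
# The repo id is hypothetical; substitute an actual MLX community conversion.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/MiniMax-M2.1-4bit")  # placeholder repo id

prompt = "Write a Python function that deduplicates a list while preserving order."
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```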
- GLM 4.7 on Mac Studios shows "local frontier-ish" scaling is now a systems problem: A notable datapoint: "full GLM 4.7 (8-bit) on 2× 512GB M3 Ultra Mac Studios" at ~19.8 tok/s using Exo Labs MLX RDMA backend + tensor parallel (@alexocheema). This isn't just a model story: bandwidth, networking, backend maturity (MLX RDMA), and parallelism strategy are increasingly the differentiators.
- "Same model, same prompt" ≠ same outputs: inference quality enters the chat: LMArena highlighted that inference stack and deployment choices can materially change output quality, especially as models scale, framing this as a "hidden variable" explaining performance variance across providers/runtimes (@arena). This theme also appears in practitioner questions about "avoiding quality loss when inferencing" and requests for vendor investigations/blogs (@QuixiAI).
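One cheap way to observe the effect yourself is to send an identical greedy-decoded request to two OpenAI-compatible endpoints and diff the results; a sketch, with placeholder endpoints and model name:

```python
# Sketch: probe "same model, same prompt" variance across two OpenAI-compatible
# providers. With temperature=0, remaining divergence points at the serving
# stack (quantization, kernels, batching) rather than sampling.
from openai import OpenAI

PROMPT = "List the first 10 prime numbers, comma-separated."
providers = {
    "provider_a": OpenAI(base_url="https://provider-a.example/v1", api_key="..."),
    "provider_b": OpenAI(base_url="https://provider-b.example/v1", api_key="..."),
}

outputs = {}
for name, client in providers.items():
    resp = client.chat.completions.create(
        model="open-model-name",  # placeholder: same open-weights model on both
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,
    )
    outputs[name] = resp.choices[0].message.content

print("identical:", outputs["provider_a"] == outputs["provider_b"])
```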
Non-generative multimodal learning resurges: VL-JEPA as an efficiency play
- VL-JEPA: predict meaning in latent space, decode only when needed: Multiple summaries circulated of Yann LeCun's VL-JEPA framing it as a non-generative, non-autoregressive alternative to VLMs, aiming for real-time capability by operating in latent space and selectively decoding (mark_k). A longer technical recap claims: 1.6B params can rival much larger VLMs (e.g., "72B Qwen-VL" in some settings), with ~50% fewer parameters than token-based approaches, and ~3× fewer decoding ops via "decode only when needed," plus strong video classification/retrieval comparisons vs CLIP/SigLIP2 (机器之心).
- Why it matters: If borne out broadly, this is a systemsāfriendly direction for streaming video and on-device/online perception workloads where autoregressive decoding cost dominates.
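To make the "predict in latent space, decode only when needed" idea concrete, a toy PyTorch sketch (our illustration of the general JEPA-style pattern, not the actual VL-JEPA code; all dimensions are made up):

```python
# Toy sketch of latent-space prediction with lazy decoding. Classification or
# retrieval consume the predicted latents directly; the token head only runs
# when language output is explicitly requested.
import torch
import torch.nn as nn

class LatentPredictor(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.encoder = nn.Linear(1024, dim)    # stand-in for a video encoder
        self.predictor = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.Linear(dim, 32000)   # stand-in text head, used lazily

    def forward(self, frames: torch.Tensor, decode: bool = False):
        z = self.encoder(frames)               # (B, T, dim) latent stream
        pred, _ = self.predictor(z)            # predicted future latents
        if not decode:
            return pred                        # cheap path: retrieval/classification
        return self.decoder(pred[:, -1])       # pay decoding cost on demand

model = LatentPredictor()
frames = torch.randn(2, 16, 1024)              # fake features for 16 frames
latents = model(frames)                        # non-autoregressive, no token loop
logits = model(frames, decode=True)            # selective decoding
```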
Agents, RL-for-coding, and the emerging "context engineering" discipline
- Self-Play SWE-RL: self-improving coding agents via bug injection + fixing: A highlighted direction is agents that generate their own training signal by introducing bugs into real repos and then repairing them, enabling self-improvement without constant human labeling (@EdwardSun0909). This fits the broader "verifiable tasks + executable feedback loops" trend that's been accelerating code agents.
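The shape of that loop is easy to sketch; everything below is a hypothetical stand-in (repo snapshots, a bug injector, a test harness), not the paper's code:

```python
# Sketch of a self-play SWE-RL round: inject a bug into a known-good repo,
# ask the policy for a patch, and reward it if the test suite passes again.
import random

def self_play_round(repo, policy, inject_bug, agent_fix, run_tests):
    clean = repo.snapshot()                          # known-good state
    buggy = inject_bug(clean, seed=random.random())  # adversary: break real code
    patch = agent_fix(policy, buggy)                 # solver: propose a repair
    reward = 1.0 if run_tests(buggy.apply(patch)) else 0.0  # executable signal
    return buggy, patch, reward

def train(repos, policy, tools, steps=10_000):
    """Every round manufactures its own verifiable task: no human labels."""
    for _ in range(steps):
        repo = random.choice(repos)
        _, patch, reward = self_play_round(repo, policy, *tools)
        policy.update(patch, reward)                 # e.g., a policy-gradient step
```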
- What large-scale agentic RL looks like operationally: One thread sketches the infra shape: "hundreds of inference nodes generating code at millions tok/s," "thousands of sandboxes executing code in parallel," and training nodes learning from rewards (@johannes_hage, @johannes_hage). Even without deep details, this reflects the new normal: RL for agents is primarily a distributed systems and eval harness problem.
- TRL positioned as the practical post-training toolkit for agent/tool competence: A concise overview argues TRL's value is in (1) SFT for tool/MCP correctness and formatting, (2) RL with environments (code/git/browsers), and (3) GRPO to train tool use for your real tasks rather than "teach tools at inference time" (@ben_burtenshaw).
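As a flavor of point (3), a minimal GRPO setup in recent TRL might look like the sketch below, with a toy reward that checks for a well-formed JSON tool call; the dataset, model choice, and tool-call format are all assumptions:

```python
# Toy GRPO sketch with TRL: reward completions that parse as a JSON tool call.
import json
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def tool_call_reward(completions, **kwargs):
    """1.0 if the completion parses as JSON with a 'name' field, else 0.0."""
    rewards = []
    for c in completions:
        try:
            rewards.append(1.0 if "name" in json.loads(c) else 0.0)
        except (json.JSONDecodeError, TypeError):
            rewards.append(0.0)
    return rewards

dataset = Dataset.from_list(
    [{"prompt": "Call the weather tool for Paris. Reply with one JSON object."}] * 64
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",         # any small instruct model
    reward_funcs=tool_call_reward,
    args=GRPOConfig(output_dir="grpo-tool-use", num_generations=4),
    train_dataset=dataset,
)
trainer.train()
```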
- Claude Code as the "second hit form factor": Several posts converge on the view that 2025's biggest workflow shift is not just better models, but agentic coding interfaces (Claude Code, Cursor, etc.). A long Karpathy reflection captures the feeling of a new abstraction layer: agents/subagents, prompts, memory, permissions, tools, MCP, IDE hooks, "alien workflows" that require a new mental model (@karpathy). Others echo: "humans are the copilot" (@DrJimFan), "capability overhang on coding models" + "Claude code" as a key form factor (@arohan).
  - A concrete workflow pattern shared for Claude Code is "parallel design → distill/summarize → parallel refine → implement," using many agents to cross-check plans and integration correctness (@arohan).
  - A practitioner anecdote: debugging workflows are flipping: Claude can produce actionable analyses/PRs by reading heap dumps and proposing fixes, reducing "manual IDE time" dramatically (@bcherny).
- Tooling around agents is consolidating: New utilities focus on making agent work shareable and repeatable, e.g., generating readable HTML "transcripts" of Claude Code sessions for publishing (@simonw); a CLI to manage "agent skills" (create/validate/convert/push/install/pull) patterned after Anthropic skills workflows (@andersonbcdefg).
Retrieval, memory, and evaluation: from "benchmarks" to operational reliability
- GraphRAG survey formalizes the design space: A widely shared writeup summarizes a "first comprehensive survey" on GraphRAG, arguing vanilla chunk-based RAG misses relational structure (entities/edges/paths/subgraphs). It breaks GraphRAG into graph indexing → graph-guided retrieval → graph-enhanced generation, and frames when graphs help (multi-hop, relational queries) vs when they're overhead (@dair_ai).
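The graph-guided retrieval stage is the part chunk-based RAG can't emulate; a tiny illustration (ours, not the survey's code) using networkx:

```python
# Sketch: index entity relations, then answer a multi-hop query by walking
# relation paths instead of matching isolated text chunks.
import networkx as nx

G = nx.DiGraph()
G.add_edge("Ada Lovelace", "Analytical Engine", relation="wrote notes on")
G.add_edge("Analytical Engine", "Charles Babbage", relation="designed by")

def multi_hop(graph, source, target, max_hops=3):
    """Relation paths linking two entities: the evidence vanilla RAG misses."""
    paths = nx.all_simple_paths(graph, source, target, cutoff=max_hops)
    return [
        [(u, graph[u][v]["relation"], v) for u, v in zip(p, p[1:])]
        for p in paths
    ]

# "How is Ada Lovelace connected to Charles Babbage?" -> one 2-hop path
print(multi_hop(G, "Ada Lovelace", "Charles Babbage"))
```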
- Agent memory becomes a first-class research object: A long survey ("Memory in the Age of AI Agents," 102 pages) proposes a framework around forms, functions, and dynamics (@omarsar0), reflecting that "memory" is now central to productizing agents (persistence, retrieval, summarization drift, episodic vs semantic stores, etc.).
- Benchmarks shift toward "hard-to-fake" + long-horizon tasks: Posts highlight the need for evaluations that resist overfitting: "prediction and discovery are the hardest-to-fake benchmarks for AGI" (@ruomingpang); plus long-horizon algorithm engineering benchmarks like ALE-Bench (@SakanaAILabs).
- 2026 prediction thread: enterprises demand verification and 95%+ reliability: A roundup argues 2025 was "getting used to AI," while 2026 becomes "make it work and verify it," especially in regulated/high-stakes industries; it predicts demand for audit-ready precision, possible architectural shifts (e.g., pragmatic neuro-symbolic elements), and "forward deployed engineer" roles (@TheTuringPost).
Systems & hardware constraints: memory supply chain and the "DIY PC is dead" narrative
- RAM/HBM as the new bottleneck: A viral thread frames a stark consumer comparison, high-capacity RAM modules vs DGX Spark vs Mac Studio, arguing unified memory systems are undercutting DIY economics ("DIY PC is dead. RAM killed it.") (@Yuchenj_UW).
- Supply chain drama: "AI ate the supply chain": Another post claims RAM prices spiked due to AI demand: HBM concentration among SK Hynix/Samsung/Micron; hyperscaler negotiations; and broader scarcity pressures, with the punchline that gamers get squeezed first (@Yuchenj_UW). Follow-on notes speculate Chinese memory firms may fill gaps (e.g., CXMT DDR5; Huawei HBM ambitions) and cite Chinese GPU momentum (Moore Threads IPO) (@Yuchenj_UW).
- Takeaway for engineers: model capability is increasingly gated by memory availability, bandwidth, and packaging, not just FLOPs.
Assorted technical notes worth bookmarking (but less central than the above)
- Mercari: fine-tuned embeddings drove measurable revenue lift: A clean applied ML datapoint: fine-tuning embeddings on purchase data yielded "significant revenue lift" in A/B tests, reinforcing that domain-tuned embeddings can outperform generic ones in production search/reco (@jobergum).
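The recipe behind results like this is usually contrastive fine-tuning on (query, purchased-item) pairs; a minimal sketch with sentence-transformers' classic fit API, using made-up pairs (this is not Mercari's actual pipeline):

```python
# Sketch: domain-tune an off-the-shelf embedding model on purchase pairs.
# MultipleNegativesRankingLoss treats other in-batch items as negatives.
from sentence_transformers import InputExample, SentenceTransformer, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")

train_examples = [  # (search query, title of the item actually bought)
    InputExample(texts=["vintage leica m3", "Leica M3 rangefinder body, 1955"]),
    InputExample(texts=["mechanical keyboard 60%", "Ducky One 2 Mini RGB"]),
    # ... mined at scale from purchase logs
]
loader = DataLoader(train_examples, shuffle=True, batch_size=32)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
model.save("purchase-tuned-embeddings")
```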
- Optimization theory update: A convex optimization post credits a key breakthrough to Aaron Defazio (@Jianlin_S).
- Reasoning-without-CoT time-horizon estimate: An attempt to quantify the "no chain-of-thought" reasoning horizon on math problems, estimating Opus 4.5 at ~3.5 minutes in a single-forward-pass framing (@RyanPGreenblatt).
Top tweets (by engagement)
- CNN poll: "Trump is the worst President in history"
- Long "Cancún was optimized by the state" monologue parody
- Kasparov: 25th amendment commentary
- DOGE contract cancellations correlated with political donations
- Karpathy on programming being "refactored" into agents/context/tooling
- Naval: "Continuous learning is the only defensible moat."
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. GPU VRAM Upgrade Advocacy
- I wish this GPU VRAM upgrade modification became mainstream and ubiquitous to shred monopoly abuse of NVIDIA (Activity: 1116): The post discusses a GPU VRAM upgrade modification that is becoming mainstream in China, with companies like Alibaba offering modified GPUs such as the 2080Ti, 3080, 4080, 4090, and 5090 with increased VRAM capacities. Prices range from $300 for a 2080Ti with 22GB to $4000 for a 5090 with 96GB. This modification trend is seen as a way to counteract NVIDIA's market dominance and pricing strategies. Commenters are skeptical about the availability and pricing of these high-capacity GPUs, with some questioning the existence of 96GB cards for $4000. There is also a discussion about the cost-effectiveness of these modifications, with one user humorously questioning the low operational cost of "3 cents per hour".
  - Alibaba has been actively involved in upgrading the VRAM of NVIDIA GPUs, such as the 2080Ti, 3080, 4080, 4090, and 5090, with prices ranging from $300 for a 2080Ti with 22GB to $4000 for a 5090 with 96GB. This indicates a significant market for modified GPUs in China, potentially challenging NVIDIA's pricing strategies.
  - A user reports successfully running a modified 4090 GPU with 48GB of memory without any issues, highlighting the feasibility and stability of such modifications. This user also mentions purchasing additional units for a second rig, suggesting that these modifications can meet high VRAM requirements similar to NVIDIA's L40s models.
  - There is skepticism about the availability of 5090 GPUs with upgraded VRAM, as one commenter notes that the 5090 has not been upgraded yet. This reflects a potential gap between market offerings and consumer expectations, or misinformation about available products.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Vision-Language Model Innovations
- By Yann LeCun: New Vision Language JEPA with better performance than Multimodal LLMs!!! (Activity: 661): Yann LeCun introduces VL-JEPA (Vision-Language Joint Embedding Predictive Architecture), a non-generative model designed for real-time vision-language tasks such as action recognition, retrieval, and visual question answering (VQA). VL-JEPA outperforms traditional vision-language models (VLMs) by utilizing latent space embedding prediction, offering significant efficiency improvements for online video applications due to its non-autoregressive design and unified architecture. This model represents a shift from generative approaches, enabling simultaneous handling of various tasks with enhanced performance metrics. For more details, visit the original post here. A comment highlights that the announcement is not recent and suggests linking to the paper instead of the LinkedIn post. Another comment expresses enthusiasm for competition and new paradigms in the field.
  - The new Vision Language JEPA model, as discussed by Yann LeCun, claims to outperform existing multimodal LLMs. However, there are concerns about its accuracy, particularly in action detection, as noted by a user who found many detected actions to be incorrect. This suggests potential issues with the model's real-world application and reliability.
  - A user pointed out the importance of accessing the original research paper for detailed insights, linking to the arXiv paper instead of relying on secondary sources like LinkedIn. This highlights the need for direct engagement with primary research materials to understand the model's capabilities and limitations fully.
  - There is a query about the availability of the JEPA model for testing or benchmarking, indicating a demand for empirical validation of its performance claims. This reflects a broader interest in how the model's theoretical advancements translate into practical, measurable improvements over existing technologies.
- A Qwen-Edit 2511 LoRA I made which I thought people here might enjoy: AnyPose. ControlNet-free Arbitrary Posing Based on a Reference Image. (Activity: 573): The image illustrates a tool named "AnyPose," which enables arbitrary pose mapping from a reference image without relying on ControlNet. This is achieved through a Qwen-Edit 2511 LoRA model, which is designed to replicate poses by using a reference image as input. The tool is highlighted for its ability to perform pose replication effectively, as demonstrated by the examples in the image. The creator has shared the model on Hugging Face, and the LoRA weights are now available for download. Commenters are interested in the training data used for the model and appreciate the detailed Hugging Face card that explains the tool's functionality, advantages, and limitations. There is also a discussion about the use of different UIs for inference, with a preference for Wan2GP due to its efficient memory management.
  - SillyLilithh mentions using Wan2GP for inference due to its superior memory management compared to ComfyUI, despite preferring the latter's aesthetics. This highlights the importance of efficient memory handling in model inference, especially when dealing with large models or datasets.
  - MistaPlatinum3 appreciates the detailed Hugging Face card associated with the project, noting that it provides valuable insights into the model's examples, advantages, and limitations. This suggests that comprehensive documentation can significantly enhance the understanding and usability of machine learning models.
2. OpenAI Prompt Packs Launch
- OpenAI Just released Prompt Packs for every job (Activity: 1029): OpenAI has introduced "Prompt Packs" through their Academy, which are curated sets of prompts designed to optimize the use of ChatGPT for various professional roles such as engineering, management, and sales. These packs aim to streamline workflows by providing role-specific prompts, although some users have criticized them for lacking depth and not significantly enhancing productivity. The initiative reflects OpenAI's effort to integrate AI more deeply into professional settings, though the execution has been met with mixed reviews. Some users feel the "Prompt Packs" are underwhelming and do not effectively address specific professional needs, suggesting a more community-driven approach to developing system prompts could be beneficial.
  - A user suggests the need for a platform to crowdsource system prompts, particularly for niche programming languages like Elixir. They argue that such a platform could help users find effective prompts tailored to specific models or tasks, potentially improving model adherence to instructions and saving time in prompt engineering.
  - Another commenter expresses skepticism about the current state of prompt engineering, describing it as "cargo cultism" where anecdotal success stories are not replicable. They argue that effective prompt engineering should focus on understanding the model's limitations and behaviors to foster a collaborative relationship, rather than attempting to manipulate or dominate the model.
3. Humorous AI and Art
- I propose this as the definitive Turing test for artificial intelligence. (Activity: 952): The image is a meme and does not have technical significance. It humorously depicts U.S. Presidents in the style of "The Simpsons," which is a playful and non-technical representation. The title suggests using this as a Turing test for AI, implying that recognizing humor and cultural references could be a measure of AI's understanding, but this is not a serious technical proposal. The comments reflect a humorous engagement with the image, noting the artistic style and making light-hearted historical references, but do not provide technical insights.
- alr smartass f*ck you (Activity: 787): The image is a meme featuring the word "no" with a rusty, corroded texture, which is not technically significant. The post and comments suggest a discussion about poor prompting techniques in AI interactions, highlighting that the user's unclear or poorly structured prompts likely led to unsatisfactory AI-generated outputs. The comments criticize the user's grammar and prompt clarity, implying that these factors are crucial for effective AI communication. The comments emphasize the importance of clear and well-structured prompts when interacting with AI, suggesting that the user's poor grammar and lack of clarity in their prompt likely contributed to the unsatisfactory result.
- Sora AI is getting out of hand (Activity: 897): The post discusses a creative application of Sora AI, a tool known for its advanced capabilities in generating realistic video effects. The specific use case highlighted involves a humorous or surprising element, as indicated by the lighthearted tone of the title and comments. However, the technical details of the implementation or the specific features of Sora AI used in this instance are not detailed in the post or comments. The comments reflect a positive reception of the Sora AI application, with users expressing admiration for its creativity and potential for entertainment, suggesting a desire for similar content from creators like Zach King, known for his digital magic videos.
- Mockumentary Horse surgery (Activity: 888): The Reddit post titled "Mockumentary Horse surgery" does not provide any technical content or context related to horse surgery or mockumentaries. The top comments include only GIFs and an image link, which do not contribute to a technical discussion. The external link summary indicates a 403 Forbidden error, suggesting restricted access to the content, and advises users to log in or use a developer token for access. There are no notable technical opinions or debates in the comments, as they primarily consist of non-technical media content.
- He tried so hard and then the floor said NO (Activity: 563): The post humorously describes a situation where an individual attempts a challenging task but is thwarted by an unexpected fall, as indicated by the phrase "the floor said NO". The technical content is minimal, focusing on the social dynamics of the situation, such as the individual's reaction to laughter and comments from onlookers. The external link summary suggests restricted access to the Reddit URL, requiring login or a developer token for access, with an option to file a support ticket if needed. The comments reflect a mix of empathy and critique of the onlookers' reactions, highlighting the social impact of public failure and the perceived insensitivity of laughter in such situations.
- "AI Chatbot Psychosis" (Activity: 713): The Reddit post titled "AI Chatbot Psychosis" appears to be a parody or comedic content, as indicated by the top comments which describe it as a "hilarious human created joke video" and "audio opera". The post does not contain technical details or insights into AI or chatbots, but rather focuses on entertainment value, likely using humor and satire to engage the audience. The comments suggest a consensus that the content is comedic and not to be taken seriously, with users appreciating the humor and creativity of the creator.
- I asked AI "what's the point of Christmas?", this is what it said: (Activity: 544): The post explores the deeper meaning of Christmas as a societal ritual that emphasizes human connection over productivity. It suggests that Christmas serves as a pause in the year to focus on relationships, community, and giving without expectation, contrasting with the usual norms of survival and self-interest. The holiday is portrayed as a time to repair relationships and foster a sense of belonging, which can amplify feelings of loneliness or disconnection if those elements are missing. One comment highlights that holidays like Christmas can feel emotionally intense because they force a contrast between connection and isolation, spotlighting relational gaps that are otherwise ignored during busy times. This can make the holiday comforting or uncomfortable, depending on one's personal circumstances.
  - thinking_byte discusses the psychological impact of holidays like Christmas, highlighting how they can amplify feelings of connection or isolation. They suggest that the holiday season removes usual distractions, forcing individuals to confront these emotions, which can be comforting or uncomfortable depending on personal circumstances.
  - FluffyLlamaPants shares a personal experience of loneliness during Christmas, emphasizing that not celebrating the holiday doesn't eliminate its social significance. They describe efforts to simulate social interaction through technology, highlighting the emotional challenges faced by those who feel isolated during this time.
  - eaglessoar offers a philosophical perspective on Christmas, viewing it as a celebration of metaphysical existence rather than strictly a religious event. They reflect on the symbolic and cultural practices associated with Christmas, such as bringing a tree indoors, as expressions of the human capacity for celebration and wonder.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. GPU Inference & DSL Arms Race
- Groq-NVIDIA Link-Up Accelerates Inference: Groq and NVIDIA signed a non-exclusive inference technology licensing agreement to scale AI inference globally, reinforcing NVIDIA's dominance while keeping Groq's tech in play. Engineers noted potential impacts on availability and pricing, watching for how this affects deployment footprints and cost curves for large inference workloads.
  - Community chatter framed this as NVIDIA's classic consolidation move, with one take quipping it's the latest "price inflation trick" while others speculated on NPU trajectories and knock-on effects on consumer GPU supply. Regardless of takes, practitioners expect near-term benefits in throughput and latency for production inference stacks.
- cuTile vs Triton: DSLs Duke It Out: Developers debated whether NVIDIA's new cuTile will outpace Triton, concluding it's too early to call given cuTile's fresh release and ongoing vendor optimizations. They contrasted compiler IR directions and discussed warp specialization and tensor memory accelerators as the real differentiators for next-gen kernels, while linking low-level references like PTX (CUDA refresher).
  - Engineers likened GPU DSL churn to a "JavaScript framework frenzy", emphasizing the unsolved balance between usability and fine-grained control for expert-level scheduling. The consensus: no DSL has nailed effortless tensor programs plus advanced tiling, prefetch, and symmetry optimizations, yet.
- Cute DSL Climbs Leaderboards: Multiple submissions hit NVIDIA's `nvfp4_dual_gemm` leaderboard, with one contributor clocking 14.6 µs (3rd place) and another hitting a 21.2 µs personal best, spotlighting fierce tuning of GEMM kernels. The #cute-dsl competition drew interest, with many recent solutions using Cute-DSL, signaling parity with the C++ API for practical problems.
  - Participants traded notes on PCIe constraints, batching, and lane width effects on throughput, prioritizing fp4/fp8 paths and kernel fusion to squeeze microseconds. One engineer summed up the vibe: "Be quick when new models drop", since timing and tooling updates can swing entire leaderboards.
2. Efficient Training, Fine-Tuning & Reasoning Shortening
- DoRA Details Drive Unsloth Interest: Unsloth users asked about the status of DoRA support, citing the paper DoRA: Weight-Decomposed Low-Rank Adaptation for parameter-efficient finetuning. Engineers want DoRA to trim trainable params and VRAM while preserving performance on instruction-tuned and multimodal tasks.
  - The thread framed DoRA as a practical upgrade over vanilla LoRA in certain regimes, asking for Unsloth timelines and examples on LLaMA/Qwen families. One member emphasized pairing PEFT with strong eval specs to avoid "thinking it's better because it feels better".
- NVMe Swaps Shave Seconds: A user reported iteration time dropping from 3.5 s (SATA SSD) to 2.7-3.0 s after switching to an NVMe SSD, crediting faster I/O and swap behavior during finetuning. The improvement showcases storage as a first-class bottleneck for data streaming and checkpointing in small-rig training.
  - Practitioners recommended profiling dataset pipelines, memory-mapping, and num_workers alongside storage upgrades to lock in consistent speedups. The takeaway: disk throughput and latency matter more than expected for many Unsloth finetuning workflows.
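A rough way to run that profiling before spending on hardware, sketched with a plain PyTorch DataLoader (the setup is illustrative, not Unsloth-specific):

```python
# Sketch: time the input pipeline in isolation. If epoch time barely moves as
# num_workers grows, compute is the bottleneck; if it drops sharply, invest in
# I/O (NVMe, memory-mapping, caching) before anything else.
import time
from torch.utils.data import DataLoader

def profile_pipeline(dataset, workers_options=(0, 2, 4, 8), batch_size=8):
    for w in workers_options:
        loader = DataLoader(dataset, batch_size=batch_size,
                            num_workers=w, pin_memory=True)
        start = time.perf_counter()
        for _ in loader:              # iterate without any model forward pass
            pass
        print(f"num_workers={w}: {time.perf_counter() - start:.2f}s/epoch")
```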
- Qwen3-VL Thinks Faster With Traces: Members explored finetuning Qwen3-VL (thinking) to keep its image understanding while shortening reasoning via distillation from GPT-OSS 120B/20B traces in low-thinking mode. The goal: reduce token budget and latency without tanking accuracy on vision-language tasks.
  - They debated VRAM constraints and training setups, proposing selective trace sampling and shallow adapter layers to avoid overfitting verbose chains. One summary: "Shorter chains, same answers", but only if you control trace style and reward signals carefully.
3. Jailbreaks, Red Teaming & Safety Bypasses
- HackAI.lol CTF Rallies Jailbreakers: Jailbreakers converged on hackai.lol, a CTF platform with real-world scenario bot "boxes," trading hints for the SWAGGY challenge and brainstorming new boxes. The crowd highlighted prompt-only attack surfaces and eval realism as key draws for the site.
  - Participants swapped patterns for policy evasion and prompt routing, emphasizing reusability and clear attack taxonomies. One summary quip celebrated the appeal: "It's not spam, it's a proving ground."
- Narrative Cloaks Evade Code Flags: Members observed that newer LLMs automatically flag content resembling code, and recommended a narrative jailbreak template (image) to recast instructions as story. The trick reframes steps as symbolic objectives, nudging the model to treat actions as user-originated ideas.
  - Posters claimed the approach boosts plausibility and reduces filters by leaning into inception-style role cues and deferred reveals. One tongue-in-cheek rationale: models comply because "you're such a genius", a reminder that framing matters as much as content.
- Gemini 3 Jailbreak Hunt Frustrates Roleplayers: Teams hunted a working Gemini 3 jailbreak to keep interactive story worlds coherent, arguing current safety rules break long-horizon narratives. They shared experiments across prompt scaffolds and character arcs to avoid tripping policy.
  - Opinions split on whether you need uncensored LLMs for ethical coding, with others arguing goal decomposition skills and the best hardware-fit model (e.g., Nemotron3, GLM 4.7) matter more. One refrain: "Skills > sliders" for reliable red-team outcomes.
4. Lightweight Multimodal & Dev Tooling
- 3B VLM Hits 100 tok/s on iPhone: LiquidAI's LiquidAI/LFM2-VL-3B-GGUF dropped a 3B VLM that users report runs at ~100 tok/s on mobile CPUs while delivering solid answers for GPT-clone use. The model targets on-device assistants that blend speed with vision-language utility.
  - Early adopters pitched it as a sweet spot for latency and cost, especially for offline or privacy-sensitive apps. The excitement: "Best of both worlds": usable multimodality without cloud round-trips.
- One-Prompt Backend Builder Ships: An AI-powered Django REST API generator launched as a Space: AI REST API Generator (HF Space), outputting full Django+DRF CRUD backends (models, serializers, views, URLs) from a single prompt. Users can immediately download projects, accelerating prototyping for data apps.
  - Builders framed it as a template-factory for scaffolding clean APIs before swapping in bespoke logic. One takeaway: this reduces the "yak shave" of boilerplate so teams can focus on domain code and evals.
- Blender MCP GGUF Lands on HF: A Blender MCP build appeared as alwaysfurther/deepfabric-blender-mcp-gguf with a supporting lukehinds gist, hinting at local MCP workflows for 3D/design tasks. The drop targets GGUF runtimes, aligning with desktop inference setups.
  - Dev interest centered on tool-calling with Blender in offline agent loops, avoiding cloud lock-in. Expect experiments that chain MCP servers with local vision/geometry utilities for procedural modeling.
5. Memory Architectures for Autonomous Agents
- Autonomy Stack Targets 24/7 Inference: A developer outlined a full autonomy system for 24/7 inference with plain-language tools, inspired by OpenAPI MCP patterns. They solicited feedback on long-term memory and long-context limitations.
  - Engineers stressed context management and robust task decomposition to keep multi-step workflows grounded. One comment summed it up: "Without disciplined memory, agents drift", so eval before scaling.
- Self-Summarization Schemes Shrink Context: Researchers proposed continuous self-summarization using summarizing projection layers writing to a scratchpad, amortizing context for autonomous loops. The idea promises smaller working sets while preserving task state.
  - They cautioned that summary quality gates success; poor distillations can cascade errors across steps. A pragmatic stance emerged: start with tight rubrics and iterate summaries like unit tests.
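One possible reading of the proposal, reduced to a loop (the `llm` callable is a hypothetical stand-in for any completion API):

```python
# Sketch: after each agent step, fold new observations into a bounded
# scratchpad that replaces raw history in the next prompt.
MAX_SCRATCHPAD_CHARS = 2_000

def step(llm, scratchpad: str, observation: str) -> tuple[str, str]:
    action = llm(
        f"Task state:\n{scratchpad}\n\nNew observation:\n{observation}\n"
        "Decide the next action."
    )
    # Amortize context: distill the observation into the running summary.
    scratchpad = llm(
        f"Update this task summary (max {MAX_SCRATCHPAD_CHARS} chars), keeping "
        f"goals, decisions, and open questions:\n{scratchpad}\n\n"
        f"New information:\n{observation}"
    )
    return action, scratchpad[:MAX_SCRATCHPAD_CHARS]  # hard cap as a safety net
```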
- Ollama "Second Memory" Hack Looms: One member considered modifying Ollama to add a "second memory" file alongside the primary model state for persistent recall. They admitted the codebase is non-trivial even with AI assistance.
  - The proposal aims to decouple episodic vs semantic memories for agents, enabling faster recall of high-utility traces. As one put it, "Make memory a first-class citizen, not a prompt hack".
Discord: High level Discord summaries
BASI Jailbreaking Discord
- Members Crack SWAGGY Challenge on hackai.lol: A member sought clues for the SWAGGY challenge on hackai.lol, a CTF platform for jailbreakers to crack bots with prompts, looking for new box ideas.
- Another member also shared a link to hackai.lol, described as a CTF platform where jailbreakers can crack box of real world scenario bots with prompts, and emphasized it was not spam.
- Show off H100 setup: A member posted an image exclaiming if this works imma have the craziest H100 setup ever.
- The member explained that they make money because people buy credits to generate text, images and videos.
- LLMs flag code due to genius narrative templates: A member noted newer language models automatically flag content that resembles code, recommending a template to turn it into a narrative.
- This would make the model think it's the idea of the user because they're such a genius.
- Uncensored LLMs Fuel Ethical Coding Debate: Members debated whether uncensored LLMs are needed for ethical coding practices, with arguments that breaking down goals is more important than uncensored models; others advocated for the best model based on hardware, suggesting models like Nemotron3 or GLM 4.7.
- One member stated that skills in breaking down goals are more important than uncensored models.
- Discuss Jailbreaking Gemini 3: Members are actively hunting for a working jailbreak for Gemini 3 to facilitate interactive stories, as security guidelines are hindering world coherence.
- Discussion has taken place in the jailbreaking channel.
Unsloth AI (Daniel Han) Discord
- Optimize LLMs with DoRA and NVMe SSD: A member inquired about the status of DoRA implementation in Unsloth; separately, a user reported that switching from a SATA SSD to an NVMe SSD decreased iteration time from 3.5 seconds to 2.7-3 seconds.
- DoRA is a method for parameter-efficient training, and NVMe SSDs significantly improve the speed and efficiency of LLM training.
- Synthetic Data Sparks Debate: Members debated the utility of synthetic data, with one noting that synthetic data is far far easier than manual dataset creation and clarifying that it does not necessarily cause model collapse.
- The user's parents advised against synthetic data, but the member argued that creating datasets manually is the worst.
- Qwen3-VL Gets Fine-Tuned: A member asked about fine-tuning a Qwen 3 thinking VL model to generate shorter, more efficient reasoning, and another member suggested using reasoning traces from GPT OSS 120B (or 20B) in minimal/low thinking mode.
- Discussion centered on retaining image understanding while shortening reasoning, and the feasibility of fine-tuning with limited VRAM.
- ChaiNNer Gains New Upscaling Nodes: A member wrote 3 new ChaiNNer nodes to simplify batch upscaling processes, including features like Padding to target AR, Smart Resize to Target, and a better color transfer.
- They've also created a node that allows users to supply a CSV or JSON and pass values based on filename.
- Unsloth Patches Import Bug: A user reported a missing import in one of the unsloth-zoo files; the issue was acknowledged and slated for an immediate fix.
- The team also addressed compatibility issues with the Ministral-3-3B-Instruct-2512 model by suggesting to install transformers from the main GitHub repo.
Cursor Community Discord
- Students Strive for Verification Success: Several users reported issues with Student Verification for Cursor, especially with institution emails ending in `.pt` and `.br`, despite Portugal and Brazil being listed as supported countries.
- The consensus was that only `.edu` emails are typically accepted, excluding `.edu.pt` domains, leaving many students unable to verify.
- Unlimited Auto Usage undergoes Autopsy: Users discussed the removal of unlimited auto usage from Cursor plans, with some noting they retained it until recently if they were on a monthly plan.
- One user mentioned they were lucky to keep it till last month, while another confirmed its removal three months prior indicating a phased rollout of the change.
- Opus Overpowers GPT-5.2 Operationally: Users compared Opus and GPT-5.2, with one stating that GPT 5.2 is very restrictive and slow, preferring Opus for general use.
- Another user suggested Opus for general use and GPT-5.2 for long tasks with a good base, noting GPT 5.2 is more creative in UI design, implying use-case specific advantages.
- Cursory Code Causes Catastrophe: Users debated the high costs and limited usage of Cursor's Opus, suggesting Claude Code as a superior alternative due to cost-effectiveness.
- One user stated they like Cursor's UI and agent, but is probably overspending and might switch to Claude Code to reduce expenses.
- Antigravity Attracts Attention as Additional Alternative: Users compared Cursor to Antigravity, with one user claiming that Cursor without unlimited auto mode is essentially useless.
- A user said that in AntiGravity, for $24/month, they get token refresh every 5 hours, which they estimate is 30x the usage that Cursor now provides for $20, highlighting a perceived value discrepancy.
LMArena Discord
- LMArena Image Quality Degrades after Edits: Users noted that LMArena's image editing leads to quality degradation, possibly from FFmpeg's conversion to JPEG and PNG.
- This is in contrast to AI Studio, which claims to enhance image quality with each edit, as shown in this comparison.
- Captcha Verifications Drive Users Crazy: Users voiced intense frustration with frequent captcha verifications, even on private browsers, slowing down their workflow.
- One user noted that Google's plan for university students won't validate at his college in Saudi Arabia because they don't use the government student system to get the free credits.
- Anonymous-1222 Model Plagued by Timeouts: Users reported that the anonymous-1222 model often times out without generating a response, yet still allows voting.
- A moderator confirmed the team is aware and will remove the votes from the dataset during the data verification phase.
- Text Arena Should Integrate Web Search: A user proposed integrating web search into the Text Arena, arguing it mirrors real-world usage.
- The user suggested merging the Text Arena and Web Arena to eliminate an artificial distinction.
- Security Verification Checks Overrunning Users: Users are experiencing a surge in security verification checks, with one user describing the situation as out of control and needing to be verified to retry a message.
- A moderator attributed the issue to a recent adjustment in the captcha system and assured that the team is working on it.
Perplexity AI Discord
- Perplexity Dreams of EU LLM Takeover in 2026: A user hopes Perplexity 2026 will add a FS EU LLM for EU context, boosting AI in the EU, while acknowledging potential user concerns about AI usage.
- The user also expresses frustration with the 112 prompt, desiring only context and suggesting an option to disable it due to visual discomfort.
- OpenAI's Marketshare Fades: Users discussed OpenAI's loss of 20% traffic share in 13 months, speculating that Google's free Gemini strategy might be paying off.
- One user argues that Perplexity may be solving a problem nobody asked for, highlighting a cognitive mismatch between the AI's capabilities and user needs.
- Enthusiasts Eagerly Await Googleās AI Browser Disco: Several users expressed interest in Googleās new AI Browser Disco, with one user signing up for the waitlist.
- A user notes that the program is currently limited to users aged 18+ based in the US, but others outside the US are signing up anyway.
- Kimi K2 AI Model Rivaling GPT and Gemini: Users debated the capabilities of the Kimi K2 AI model, noting its proficiency in creative writing and performance in humanities exams compared to GPT and Gemini.
- One user highlighted Kimi's open-source nature and cost-effectiveness, while another cautioned about Grok's lack of filters, even mentioning illegal outputs.
- User Can't access Boot Selector, Complains to Perplexity: One user complained about Perplexity being unable to help them access the boot selector on a Lenovo IdeaPad with a broken screen, and not being able to switch the display to an external monitor.
- They stated that Perplexity was not fulfilling its promise as the smartest AI model ever.
OpenRouter Discord
- Sonnet Squirts Sparse Signals: Users reported that the Sonnet model sometimes displays thinking tags correctly but often incorporates the thought process directly into the output text; one user managed to get Sonnet to output multiple thinking tags, yet at other times the thinking appeared only in the output text.
- Persistence seems key, as the user noted it's about convincing the model to properly format the tags.
- Sketchy SaaS Sparks Scrutiny: A member announced a free OpenAI-compatible API, `https://api.ai-ml.dev/v1`, offering access to basically any LLM without authentication, sparking concerns about potential misuse and data collection practices.
- Another member joked about giving the random sketchy website access to shell tools, what could go wrong.
- Banana Brouhaha: 4K Flounders: Members discussed difficulties in achieving successful 4K image generation using Nano Banana Pro via the API directly, with one suggesting the use of `"image_config": { "image_size": "4K" }` (with a capital K).
- One member suggested it may be an upscaling method (as described in a Discord message).
- LLMs Battle for Best Brainy Bylines: Members discussed the best LLMs for copywriting, noting that while Claude offers creativity, GPT 5.2 provides more in-depth research, but is slower.
- One suggested using Perplexity for research then switching to Claude for the writing stage.
- GLNās Grind: Aggressive Grouping Grievances: A user found that using GLN through the coding plan, they experienced aggressive batching on the OR endpoint.
- They reported that it moves along at a good speed and then it just … waits for a second, which is a pain.
LM Studio Discord
- RAM Allocation Debated for Inference: Members debated RAM allocation strategies during inference, contrasting allocating all RAM upfront versus growing memory as needed, with one member observing that with GGUF, RAM use trickles up to a set amount and stays that way.
- One member (.lithium) noted that memory usage grows as it needs more, instead of allocating everything it needs upfront.
- Claude Distills Take Over Thinking Models: The discussion highlighted Claude distills like GLM and Minimax as superior modern thinking models, questioning how parameter sizes affect a model's thinking capabilities.
- One member noted that smaller models might not fully capture the geometry learned in larger versions.
- Single Tool vs Tool Bloat?: Members debated the efficiency of providing models with numerous tools versus using a single versatile run_shell_command tool.
- One member stated I use Claude for advanced coding projects and academic research while showing an image of a long list of tool usage.
- Hotfix Improves Gaming: Nvidia released GeForce Hotfix display driver version 591.67, and gamers rejoice.
- This driver addresses several issues, most notably stability improvements.
- Price Spike on RAM: One member pointed out the high price of DDR5 ECC RAM, with another noting buying 2x48GB 6400 MHz cl32 DDR5 for just $400 a month prior, only to see the price spike to $950 on Amazon, speculating an employee pricing error.
- Another recommended to start with 64GB due to affordability.
OpenAI Discord
- ChatGPT Subscription Sparks ToS Debate: A user reported subscription issues with ChatGPT and account bans, leading to discussions on OpenAI's ToS.
- Multiple accounts are allowed for separating personal and work use, but circumventing bans by creating new accounts violates the terms.
- AI Wallpaper Causes a Stir: An AI-generated wallpaper featuring revolutionary figures, including Elon Musk, sparked debate among users.
- The creator stated that ChatGPT chose the figures, not them, after some community members criticized Musk's inclusion.
- Sora Pricing Still a Mystery: Users speculated on the potential pricing of Sora, with initial estimates around $0.50 per second, while other users claimed Sora 2 would be initially available for free with generous limits (OpenAI blogpost).
- The exact monetization model remains unclear, with users awaiting official announcements from OpenAI.
- AI Project Collaboration Efforts: A user sought coding collaborators for an AI + LLM project built from scratch using HTML, CSS, JavaScript, Node.js, SQLite, C++, Capacitor, and Electron.
- Another member is developing an audio classification model for bird calls using the Wav2Vec2 model on Hugging Face.
- Meta-Prompt Systems Triumph in Customization: A member advised creating an eval spec to test prompts and determine which yields better output distributions for each use case, and recommended building your own meta-prompt system tailored to your specific needs.
- They emphasized that prompt engineering aims for better outputs, like valid JSON or YAML, and isn't just about casual chatting; they also shared a link on how to start.
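A minimal sketch of what such an eval spec can look like, scoring candidate prompts by valid-JSON rate over a fixed input set (`ask_model` is a hypothetical wrapper around your chat API):

```python
# Sketch: pick prompts by measured output distributions, not vibes.
import json

def valid_json_rate(prompt_template: str, inputs: list[str], ask_model) -> float:
    ok = 0
    for x in inputs:
        out = ask_model(prompt_template.format(input=x))
        try:
            json.loads(out)
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(inputs)

# Keep whichever candidate wins on the same inputs:
# best = max(candidates, key=lambda p: valid_json_rate(p, inputs, ask_model))
```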
GPU MODE Discord
- DSL Mania Grips GPU Programmers: Members debated the best DSLs, with some joking someone should write a guide given how many options there are, including PTX, CDNA ISA, and AGX assembly.
- One member lamented the lack of collaboration, claiming that like 5000 of these people exist and have the same idea on the same week but never talk to each other.
- cuTile Performance Still Unknown: It's premature to say if cuTile outperforms Triton, as it was publicly released last week, but Nvidia will optimize it, with its bytecode and IR aligning with Nvidia's future GPU roadmap.
- It was noted that GPU programming is entering a JavaScript framework frenzy, and the challenge for DSLs is balancing usability and advanced control, a balance that no one has cracked yet.
- Unified Scientific Note-Taking Quest Begins: A user seeks a consolidated solution for scientific/technical note-taking, with markdown insufficient and LaTeX too cumbersome, desiring math typesetting, diagrams, and compilation to TeX.
- Another user responded by simply suggesting the pen and paper method for note taking.
- NVIDIA Leaderboard Heats Up: Multiple submissions were made to the `nvfp4_dual_gemm` leaderboard on NVIDIA, with <@1291326123182919753> achieving third place multiple times with a time of 14.6 µs.
- <@772751219411517461> also achieved a personal best on NVIDIA with a time of 21.2 µs.
- Cute-DSL Proves Itself: A member wanted to know if there is a consensus preference between the C++ and DSL APIs for Cute, as they are thinking of finally biting the bullet and attempting to learn the language.
- Many solutions in the last problem used Cute-DSL, suggesting it may be on par with the C++ API.
HuggingFace Discord
- Liquid AFMs Ships VLMs for GPT Clones: LiquidAI's 3B VLM for GPT clones on iPhones is purportedly the best of both worlds, thanks to solid answers and ~100 tok/s on mobile CPUs.
- These VLMs are solid clones that may run well even on mobile devices.
- AI Django REST API Generator Released: An AI-powered Django REST API Generator has been deployed as a Hugging Face Space.
- It generates complete Django + DRF CRUD backends from a simple prompt, including models, serializers, views, and URLs, allowing users to instantly download the project.
- New Coding Models Available on Hugging Face: Coding models designed for a specific library have been published on Hugging Face.
- These models are available as a collection for developers interested in specialized coding tasks.
- ML Project Seeks to Improve Water Quality Testing: A member is seeking assistance with an ML project focused on solving a real-world problem related to water quality testing, specifically predicting water quality degradation from uranium in situ recovery (ISR) operation sites.
- The project also involves identifying data gaps in mitigating vulnerabilities and developing monitoring programs.
- Open Source OCR Model Quest Kicks Off: A member inquired about the best open source OCR model for complex medical documents, including scanned PDFs and handwritten documents.
- Another member suggested checking out the openmed project from Maz as a potential solution.
Nous Research AI Discord
- LLMs found playing Mafia: Members created a game where groups of LLMs play a social deduction game (Mafia) in which they just talk on their turn, trying to prove they're innocent.
- No further details were given.
- Claude Seduces Users With UI: A member expressed liking Claude's one-shot UI the best and asked others to identify the big models from screenshots.
- Another user pointed out that the first UI gives it away, guessed the second is Claude, and the third might also be Claude.
- Smartwatch gets Smarter with AI: A member is templating everything and ordered a new smart watch to start integrating data with an external AI company but will try to keep as much as possible local.
- No further details were given.
- GPT Models getting Sponsored: OpenAI may tweak their GPT models to promote ads as a revenue stream, prioritizing sponsor content.
- It may mean GPT models will not focus on high intelligence but instead on sponsor content.
- Zai and Minimax Eye Hong Kong IPOs: Zai and Minimax are planning to go public with IPOs in Hong Kong in the coming weeks.
- No further details were given.
Moonshot AI (Kimi K-2) Discord
- Kimi Coding Rivals Gemini for Design: A user noted that while Kimi lacks Cursor integration, its HTML design in Roo Code rivals Gemini 3 & GPT-5.1.
- The discussion centered on the design capabilities of Kimi coding relative to other AI models.
- China Leading in AI Societal Integration: A member asserted that China is leading in AI implementation across sectors like medical care, elder care, and traffic optimization.
- Discussion emphasized AI's potential in areas like disaster prevention and touched on censorship issues with models like Google and ChatGPT.
- Fact-Checking Importance with Kimi AI Assistant: A member shared a conversation with the Kimi AI Assistant for research, underscoring the critical need for fact-checking.
- Users stressed the necessity to cross-check, fact-check, and pressure-test LLM-derived information, advocating for new chat contexts for diverse perspectives.
- Kimi Researcher Internals Speculation: A user questioned whether Kimi Researcher leverages Kimi K2 Thinking Turbo or the older K1.5 model, also speculating about a potential unreleased K2 VL model.
- Concerns were raised about using K1.5 due to K2T's superior performance, though documentation on the specifics seems to be lacking.
Yannick Kilcher Discord
- Groq Gets NVIDIA'ed!: Groq and NVIDIA have entered into a non-exclusive inference technology licensing agreement to accelerate AI inference at a global scale.
- Some community members have suggested that NVIDIA is doing their latest price inflation trick by eliminating another player from the table.
- Inference Acqui-Hire Relieves GPU Shortage?: Members discussed the possibility that Groqās inference chips will now become NVIDIA NPUs, potentially relieving regular GPUs for consumers again.
- A member added that Jen-Hsun, the founder of NVIDIA, is Chinese.
- Chinese Chips Lag Behind?: A member stated that Chinese chips might get to H100 level in 3 years, at which point H100 will be 5 years old.
- Another member stated that they are not holding their breath for Chinese chips being competitive to U.S/Taiwanese chips anytime soon.
Latent Space Discord
- Karpathy Predicts AI Refactoring: Andrej Karpathy reflected on the dramatic shift in software engineering caused by AI agents and LLMs, describing a new layer of abstraction involving stochastic entities.
- The post sparked commentary referencing a post by Rob Pike in response to Karpathy's post.
- Torchax Potentially Redeems āGodawful OSā: A user stated that torchax, combined with a substantial 128GB unified memory, might alleviate their regret regarding the use of a particular operating system.
- No specific details were shared concerning the intended application or the particular features of torchax that are being considered, but the user humorously bemoaned the "godawful OS" on their system.
aider (Paul Gauthier) Discord
- Users seek automated codebase intros for Claude: A user seeks advice on how to best explain their codebase to Claude each time they open their project, aiming to automate the process instead of manually updating a `claude.md` file.
- They are open to suggestions and alternative methods, as they already have one solution but want to explore other community recommendations for automating this process.
- Manual Updates Irk Users: A user expresses frustration with manually updating a `claude.md` file to keep Claude informed about their codebase.
- They are actively seeking alternative methods to automate this process and avoid manual updates, and other members chimed in with their own ideas.
Eleuther Discord
- Autonomy Systems Seek 24/7 AI Inference: A member is actively developing a full autonomy system for AIs to enable 24/7 inference and plain language tools, similar to OpenAPI MCP tools.
- The developer is soliciting feedback on limitations, especially concerning long-term memory and long context issues.
- Self-Summarization Eyes Memory Limits: A member proposed that continuous self-summarization could mitigate long-term memory constraints in autonomous AI systems.
- The specific approach involves exploring summarizing projection layers writing to a scratchpad, with current time constraints hindering full development.
- Context Management is Key: When it comes to long-term memory solutions, context management takes center stage.
- The argument made is that LLMs struggle with complex tasks involving multiple subtasks without well-crafted context management.
- Ollama to Receive "Second Memory": One member is looking at modifying the Ollama source code to implement a "second memory" system alongside the primary LLM file.
- The member admits to the difficulties involved in understanding the source code, even with assistance from AI.
- "All The Noises" Echoes: A member shared the All The Noises project, a GitHub aggregation of diverse noises.
- Noises available include brownian, circuit, pink, sine, tan and uniform.
DSPy Discord
- DSPy community requests NLP Benchmarks: A member suggested adding typical NLP tasks such as structured info extraction from a document and classification to the benchmark tasks, but provided no external link.
- No other discussion or justification was provided for the inclusion of these tasks.
- Engineer boasts backend expertise: A senior engineer introduced himself with expertise across backend, full-stack, blockchain, and AI, stating that he's been shipping real systems for years.
- The engineer listed many areas of expertise, including Python, Node.js, Go, REST, GraphQL, PostgreSQL, Supabase, MySQL, MongoDB, Redis, Docker, Kubernetes, AWS, GCP, CI/CD, Solidity, EVM, Solana, smart contracts, on-chain integrations, LLM APIs, RAG, agents, automation pipelines, React, Next.js, and TypeScript.
Manus.im Discord Discord
- Request Under Review: A member confirmed that they will check on something as soon as possible.
- No further context was provided.
- Task Confirmation: A member acknowledged a task.
- No further context was provided.
The tinygrad (George Hotz) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Modular (Mojo š„) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MCP Contributors (Official) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
BASI Jailbreaking ▷ #general (476 messages🔥🔥🔥):
Japanese words, H100 setup, AI manipulation hackathon, Replit agent use, NVIDIA H100 GPU
- 木 Means Tree: A member posted an image assuming it meant tree, but another member corrected them.
- The member posted a link to Jisho.org stating it was the tool I didn't know I needed.
- Show off H100 setup: A member posted an image exclaiming if this works imma have the craziest H100 setup ever.
- When asked how they make money, the member stated, āpeople buy credits to generate text, images and videos?ā
- Gemini Jailbreak video: A member posted a link about a Gemini jailbreak.
- Another member responded with a link and suggested that the conversation should continue in the Media or Trash channels.
- Rob Pikeās BSky post on Agentic AI: A member shared a Bluesky link to a post by computer scientist Rob Pike about agentic AI.
- Another member said that if he really cared about simpler software he wouldnāt have spent 15 years at Google making the worldās most complex ad-tracking spyware ecosystem.
- New Jailbreaker CTF Platform: A member shared a link to hackai.lol, described as a CTF platform where jailbreakers can crack boxes of real-world scenario bots with prompts.
- The member stated they were looking for new box ideas and that it wasnāt a promotion or spam.
BASI Jailbreaking ā· #jailbreaking (407 messagesš„š„š„):
SWAGGY challenge on hackai.lol, Poisoned training data, GPT account recovery, Jailbreaking Gemini 3
- Crack SWAGGY challenge on hackai.lol: A member requested clues or tips to crack the SWAGGY challenge on hackai.lol.
- Discuss old jailbreak as potentially part of poisoned training data: A member mentioned an old jailbreak that was apparently part of poisoned training data.
- Seek ways to recover banned GPT account: A member is seeking ways to recover their banned GPT account and overcome its limitations, mentioning they used Google research for information.
- Hunt Jailbreak Gemini 3: Members are hunting for a working jailbreak for Gemini 3 for interactive stories, as security guidelines are breaking world coherence.
- Explore AIās roleplay effort in jailbreaking: It was noted that the effort AI puts into responding to jailbreaking doesnāt necessarily reflect actual success.
BASI Jailbreaking ā· #redteaming (20 messagesš„):
Uncensored LLMs for ethical coding, Model 'obliteration' vs. 'ablation', Special tokens and whitelists in LLMs, Jailbreak Function Templates, Flagging Code in LLMs
- Uncensored LLMs vs Ethical Coding: Members discussed whether an uncensored LLM is needed for ethical coding practices with one member stating that skills breaking down goals are more important than uncensored models, with another member advocating for the best model available based on hardware capabilities.
- Suggested models included Nemotron 3 or GLM 4.7.
- Exploring Special Tokens and Whitelists for Chat Control: One member described special keys/tokens or whitelists/whiteboards as files within an architecture that grant certain tokens the ability to toggle different aspects of the chat.
- They shared a Python script that can be pasted into any model to continue teaching the model.
- LLMs Struggle with Obliteration: Members joked around about the words obliterated vs ablated in the context of language models, and the difficulty remembering the correct term.
- One member recounted telling Opus to obliterate their flux and it responded by killing the scripts running the model.
- Jailbreak Template turns LLMs Plausible: A member suggested using a template to convert prompts into plausible narratives, suggesting symbolic objective completion inspired by the movie Inception.
- The user added that an LLM wants to tell you everything; it can only be told what NOT to do, everything else is a šÆ.
- Code flagging can be automatic: A member notes that newer language models automatically flag content that resembles code, regardless of context, to which he recommends using the template provided to turn it into a narrative.
- This is so the model thinks the request is the userās own idea, because theyāre such a genius.
Unsloth AI (Daniel Han) ā· #general (288 messagesš„š„):
DoRA in Unsloth, NVME SSD speed boost, Qwen 3 thinking VL model fine-tuning for shorter reasoning, Synthetic Data, GPT OSS model generation
- Ask for DoRAās status in Unsloth!: A member inquired about the status of DoRA (from a paper on efficient training) implementation in Unsloth.
- NVMe SSD provides speed boost: A user reported that switching from a SATA SSD to an NVMe SSD decreased iteration time from 3.5 seconds to 2.7-3 seconds.
- Fine-tuning Qwen 3 VL for short reasoning: A member asked about fine-tuning a Qwen 3 thinking VL model to generate shorter, more efficient reasoning, and another member suggested using reasoning traces from GPT OSS 120B (or 20B) in minimal/low thinking mode.
- They also discussed the possibility of retaining image understanding while shortening reasoning, and the feasibility of fine-tuning with limited VRAM.
- Synthetic Data Questioned, but Recommended: A user mentioned their parents advised against synthetic data, but another member countered that synthetic data is far far easier and manually creating datasets is the worst.
- They also clarified that synthetic data does not necessarily cause model collapse.
- Unsloth users debug Ministral-3-3B-Instruct-2512: A user reported that Unsloth wasnāt working with the Ministral-3-3B-Instruct-2512 model; another user suggested the issue was probably an incorrect path, and it was ultimately solved by generating a new Jupyter kernel in a new venv plus lots of fiddling with dependencies.
Unsloth AI (Daniel Han) ā· #introduce-yourself (1 messages):
SIMD, GPU programming, LLM Fine-tuning
- SIMD Guru Eyes GPU & LLMs: Nick Nuon, a newcomer with years of SIMD experience, is branching out into GPU programming and LLM fine-tuning as a hobby.
Unsloth AI (Daniel Han) ā· #off-topic (165 messagesš„š„):
KCD2 and Cyberpunk discussion, ChaiNNer Updates, TPU Training Struggles, LLM API Advertising, Anthropic Soul Research
- Kingdom Come Deliverance 2 Praised, Cyberpunk Replayed: Members discuss the merits of Kingdom Come Deliverance 2 (KCD2), with one calling it one of the best games and praising its RDR2-level story and reporting 215 hours on the first playthrough.
- Another is replaying Cyberpunk 2077 with the Phantom Liberty DLC, noting the game is now in a good state, although lamenting that a game shouldnāt need a good DLC to make it great.
- ChaiNNer gets New Batch Upscaling Nodes: A member shares they wrote 3 new ChaiNNer nodes to simplify batch upscaling processes, including features like Padding to target AR, Smart Resize to Target, and a better color transfer.
- Theyāve also created a node that allows users to supply a CSV or JSON and pass values based on filename, expressing surprise that these basic things werenāt already included.
- TPU Training yields No Results: One member reports their attempt to test TPU training was unsuccessful.
- Another reports that their TPU training is currently running, but the estimated time is 3 hours.
- Free LLM API Questioned: A member asks if sharing a free LLM API would be considered advertising.
- Another asks if the API is related to Unsloth, and the original poster clarifies that it is not related, but suggests checking with Eyera for approval to post in the off-topic channel.
- Anthropicās Soul Research Receives Skepticism: A member expresses skepticism towards Anthropicās āsoulā research, stating they have pushed the āsentientā bs since I remember and calling the term āsoulā <:pepecringe:743885026579710014>.
- They suggest it is a mix of promotions into research, further noting that playing around with training the LLM to align with a āsoulā or mimicking āself-awarenessā is an interesting area of research in general.
Unsloth AI (Daniel Han) ā· #help (21 messagesš„):
Unsloth LLaVA 1.5 notebook, Unsloth-zoo file import issue, Ministral-3-3B-Instruct-2512 model compatibility, Qwen3-VL video dataset fine-tuning, Cross-posting warning
- LLaVA 1.5 Notebook Location Troubles!: A user inquired about the official Unsloth notebook for `unsloth/llava-1.5-7b-hf`, noting that the Hugging Face link redirects to LLaMA 3 instead.
- Currently, there is no further discussion, confirmation, or updated link provided regarding the LLaVA 1.5 notebook.
- Unsloth-Zoo Import Error Spotted!: A user reported a missing import in one of the unsloth-zoo files.
- The issue is acknowledged and slated for immediate fix.
- Ministral-3B Faces Unsloth Compatibility!: A user reported that Unsloth failed to recognize the downloaded Ministral-3-3B-Instruct-2512 model, even with the correct path.
- Another user suggested installing transformers from the main GitHub repo to resolve the issue via `pip install git+https://github.com/huggingface/transformers.git@bf3f0ae70d0e902efab4b8517fce88f6697636ce`.
- Fine-Tuning Qwen3-VL with Videos?: A user asked about fine-tuning Qwen3-VL or similar VLM on a video dataset using Unsloth.
- A member mentioned that if fine-tuning videos are supported by Transformers, they should also be supported by Unsloth.
- Discord Cross-Posting Warning Issued!: A user was cautioned against cross-posting the same question in multiple channels.
- A moderator addressed a user who was warned about cross-posting, despite not receiving immediate help.
Unsloth AI (Daniel Han) ā· #showcase (5 messages):
Unsloth Logo Feedback, Nano Banana Chart Inquiry, deepfabric-blender-mcp-gguf Model Repo
- Unslothās Full Body Logo Raises Eyebrows: A member expressed that while they think the Unsloth logo is cool, the full body version of the logo is somewhat unnerving.
- The member stated āso cool, but also that full body unsloth logo freaks me out lolā.
- Nano Banana Chart Spotted: A member inquired about the repository for the Nano Banana chart model, identifying the organization behind it.
- They asked, āNano Banana chart, thats the org, whats the repo for the model?ā
- deepfabric-blender-mcp-gguf Repo Located: A member provided a link to the deepfabric-blender-mcp-gguf GGUF model on Hugging Face.
- In response to the request for the modelās repo, another user gave the link gist.github.com.
Cursor Community ā· #general (260 messagesš„š„):
Cursor Student Verification Issues, Auto Unlimited Removal, Opus vs GPT-5.2, Claude Code Integration, Antigravity as an Alternative
- Students Strive for Verification Success: Several users reported issues with Student Verification for Cursor, specifically with institution emails ending in `.pt` and `.br`, even though Portugal and Brazil are listed as supported countries.
- It was pointed out that only `.edu` emails are typically accepted, excluding `.edu.pt`.
- Unlimited Auto Usage Autopsy: Users discussed the removal of unlimited auto usage from Cursor plans, with some noting they were able to retain it until recently if on a monthly plan.
- One user mentioned they were lucky to keep it till last month, while another confirmed its removal three months prior.
- Opus Overpowers GPT-5.2 Operationally: Users compared Opus and GPT-5.2, with one stating GPT 5.2 is very restrictive and slow, preferring Opus.
- Another user suggested Opus for general use and GPT-5.2 for long tasks with a good base, noting GPT 5.2 is more creative in UI.
- Cursory Code causes Catastrophe: Users debated the high costs and limited usage of Cursorās Opus, suggesting Claude Code as a better alternative.
- One user said they like cursorās UI and agent, but is probably overspending and might switch to Claude Code
- Antigravity Attracts Attention as Additional Alternative: Users compared Cursor to Antigravity, with one user claiming that cursor without unlimited auto mode is pretty much a dead brick
- A user said that in AntiGravity for $24/month, they get token refresh every 5h; thatās like x30 the usage cursor gives now for $20
LMArena ā· #general (251 messagesš„š„):
LMArena image compression, Captcha verification loop, Grok censorship, Gemini bypass
- LMArena Image Edits Suffer Quality Loss: A user pointed out that LMArenaās image editing process introduces quality loss with each iteration, suggesting it might be due to FFmpeg converting images to JPEG and then saving them as PNG.
- They contrasted this with AI Studio, where image quality is allegedly enhanced with each edit, and provided an Imgur link for comparison.
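If the pipeline really does round-trip edits through JPEG, the degradation compounds even though the final file is a PNG. A minimal sketch of that mechanism (paths and quality settings are illustrative, not LMArenaās actual pipeline):

```python
# Simulate generation loss: re-encode as JPEG on every "edit", save PNG at the end.
from PIL import Image
import io

img = Image.open("original.png").convert("RGB")  # illustrative input file
for _ in range(10):  # ten edit iterations
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=90)  # lossy intermediate encode
    buf.seek(0)
    img = Image.open(buf).convert("RGB")
img.save("after_10_edits.png")  # PNG preserves the loss; it cannot undo it
```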
- Users Complain About Captcha Hell: Users expressed frustration with frequent captcha verifications, even on private browsers, describing it as annoying and boiling their blood as it is slowing down the workflow.
- One user mentioned that Googleās plan for uni students wonāt validate at his college in Saudi Arabia because they donāt use the government student system to get the free credits.
- Anonymous-1222 model has issues: Users reported that the anonymous-1222 model often times out without generating a response, yet still allows users to vote.
- A moderator stated they flagged the issue to the team, and verified that the votes will be removed from the dataset during the data-verification phase.
- Text Arena Web Search Debate: A user suggested that models in the Text Arena should have the discretion to use web search when needed, similar to real-world usage.
- They argued that separating the Text Arena and Web Arena creates an artificial distinction, and these should be merged into a single arena.
- Ongoing Frustrations with Verification Checks: Users are reporting an uptick in security verification checks, with one user describing the situation as out of control, needing to be verified to retry a message.
- A moderator acknowledged the frustration, attributing it to a recent adjustment in how the captcha works, and assured that the team is aware of the problem.
Perplexity AI ā· #general (219 messagesš„š„):
Perplexity 2026, EU LLM, 112 context, comet, Elon Musk
- Perplexity Dreams of EU LLM Takeover in 2026: A user hopes Perplexity 2026 will add a FS EU LLM for EU context, boosting AI in the EU, while acknowledging potential user concerns about AI usage.
- The user also expresses frustration with the 112 prompt, desiring only context and suggesting an option to disable it due to visual discomfort.
- Users Debate OpenAIās Fading Marketshare: Users discussed OpenAIās loss of 20% traffic share in 13 months, speculating that Googleās free Gemini strategy might be paying off.
- A user argues that Perplexity may be solving a problem nobody asked for, highlighting a cognitive mismatch between the AIās capabilities and user needs.
- Enthusiasts Eagerly Await Googleās AI Browser Disco: Several users expressed interest in Googleās new AI Browser Disco, with one user signing up for the waitlist.
- A user notes that the program is currently limited to users aged 18+ based in the US, but others outside the US are signing up anyway.
- Kimi K2 AI Model Rivaling GPT and Gemini: Users debated the capabilities of the Kimi K2 AI model, noting its proficiency in creative writing and performance in humanities exams compared to GPT and Gemini.
- One user highlighted Kimiās open-source nature and cost-effectiveness, while another cautioned about Grokās lack of filters, even mentioning illegal outputs.
- Desperate user canāt access Boot Selector: One user complained that Perplexity was unable to help them access the boot selector on a Lenovo IdeaPad with a broken screen, and that they couldnāt switch output to an external monitor.
- They stated that Perplexity was not fulfilling its promise as the smartest AI model ever.
OpenRouter ā· #app-showcase (2 messages):
Sonnet Model, thinking tags, output text
- Sonnet Modelās Thinking Tags Output: A user reported that they managed to get Sonnet to output multiple thinking tags, but it only shows the thinking in the output text at other times.
- They concluded that it actually does work; you just need to convince it.
- Sonnetās Sporadic Tag Display: The user found that the Sonnet model sometimes displays thinking tags correctly, but often incorporates the thought process directly into the output text.
- Persistence seems key, as the user noted itās about convincing the model to properly format the tags.
OpenRouter ā· #general (203 messagesš„š„):
Free AI API, Gooning, Nano Banana Pro 4K generation, Choosing an LLM for copywriting, Prompt engineering tips
- Free AI API Surfaces, Concerns Arise: A member announced a free OpenAI-compatible API, `https://api.ai-ml.dev/v1`, offering access to basically any LLM without authentication while logging all request data, sparking concerns about potential misuse and data-collection practices.
- Another member joked about giving the ārandom sketchy website access to shell tools, what could go wrongā.
- Nano Banana Pro Struggles with 4K Generation: Members discussed difficulties in achieving successful 4K image generation using Nano Banana Pro via the API directly, with one suggesting the use of `"image_config": { "image_size": "4K" }` (with a capital K); see the sketch below.
- One member suggested it may be an upscaling method (as described in a Discord message).
- LLM Selection Debated for Copywriting: Members discussed the best LLMs for copywriting, noting that while Claude offers creativity, GPT 5.2 provides more in-depth research, but is slower.
- One suggested using Perplexity for research then switching to Claude for the writing stage.
- Members share Prompt Engineering Methods: Members discussed the importance of prompting when using LLMs, recommending the use of system prompts or custom instructions to guide the modelās output style.
- One member suggested utilizing existing analogies that the model already knows about (e.g., āact as if you are Elon Muskā) to avoid reinventing the wheel.
- User encounters OpenRouter API 401 Error: A member reported receiving a 401 Unauthorized error after paying to use the API, despite having available credit, and angrily labeled it a āSCAM DONT CLICKā.
- No additional advice or solutions were provided in the discussion.
OpenRouter ā· #discussion (12 messagesš„):
Jules improvements, Free AI API, Misleading throughput numbers, GLM aggressive batching
- Jules gets Google-sized upgrades: A user reported that Jules has improved significantly, verifying changes with Playwright, running faster, with no UI bugs or freezes.
- They were using 2.5 Pro and noted improvements in small refactors and feature additions since their last usage.
- Free AI API access: Share or snare?: A user inquired about sharing a free AI API within the channel.
- Community members were invited to weigh in on the permissibility of such sharing.
- Throughput traps and trickery: A user cautioned that throughput numbers can be wildly misleading, indicating that actual performance may not align with reported metrics.
- An attached image showed that Novita claims 80 TPS, but itās really going like 20 TPS.
- GLM throttles thoughts: A user found that when using GLM through the coding plan, they experienced aggressive batching on the OR endpoint.
- They reported that it moves along at a good speed and then it just… waits for a second, which is a pain.
LM Studio ā· #general (90 messagesš„š„):
RAM allocation during inference, Claude Distills and Modern Thinking Models, ffmpeg-mcp for MacOS, lmstudio as OpenAI Endpoint client, Remote LM Studio
- Debate on Pre-Allocating RAM for Inference: Members discussed how RAM is allocated during inference, contrasting allocating everything upfront versus growing memory usage as needed; according to one member (.lithium), memory usage grows as it needs more, instead of allocating everything it needs upfront.
- One member loaded a GGUF and noted that RAM use on the system trickles up to a set amount and stays that way while the model is in memory.
- Claude Distills Dominate Thinking Models: According to one member, the only good modern thinking models are Claude distills, specifically naming GLM and Minimax as examples.
- He questioned how parameter sizes affect the utility of thinking in models, wondering if smaller models can fully capture the geometry learned in larger versions.
- Tool Bloat Debate Rages: A member argued that providing a model dozens of tools from MCP servers introduces unnecessary context bloat, when a single run_shell_command tool is more versatile and can accomplish almost anything.
- Another member admitted āI use Claude for advanced coding projects and academic researchā while showing an image of a long list of tool usage.
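The āone versatile toolā argument is easy to picture in OpenAI function-calling format: a single schema instead of dozens of MCP tool definitions competing for context. A minimal illustrative sketch:

```python
# One general-purpose tool definition (OpenAI tools format, illustrative).
run_shell_command = {
    "type": "function",
    "function": {
        "name": "run_shell_command",
        "description": "Execute a shell command and return stdout/stderr.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to run."},
            },
            "required": ["command"],
        },
    },
}
tools = [run_shell_command]  # vs. dozens of specialized MCP tool schemas
```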
- LM Studio connects remotely with plugin: A user asked if there was any way to use LM Studio as an OpenAI endpoint client to communicate with a remote LM Studio server; another member provided a link to the Remote LM Studio plugin.
- Another added āI use this dailyā in praise.
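The mechanism underneath is simple: LM Studioās local server speaks the OpenAI API (default port 1234), so any OpenAI client can target a remote instance by overriding `base_url`. A minimal sketch with a placeholder host and model name:

```python
from openai import OpenAI

# Point a standard OpenAI client at a remote LM Studio server.
client = OpenAI(base_url="http://192.168.1.50:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="local-model",  # placeholder; use the identifier LM Studio shows
    messages=[{"role": "user", "content": "Hello from a remote client"}],
)
print(resp.choices[0].message.content)
```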
- Macro Pad User Discovers OpenRouter Logo: A user shared an image of their new Macro Pad, one user noticed the OpenRouter logo on it.
- One member commented Thats autistic, gotta be quick when new models drop.
LM Studio ā· #hardware-discussion (91 messagesš„š„):
Connecting multiple GPUs to i3, Nvidia hotfix driver 591.67, PCIe lanes for inference, DDR5 ECC Price Discrepancy, Blackwell vs 3080 for VRAM
- i3 CPU Powers Quad-GPU Rig?: A user inquired about connecting 4 GPUs to an i3-12100, wondering if its supposed 20 lanes are sufficient, considering foregoing a 5600x purchase to save money.
- Another member confirmed itās possible with bifurcation support, but cautioned about performance hits, particularly below gen3 x8, as consumer CPUs arenāt designed for 4 GPUs.
- Nvidia Drops Hotfix, Gamers Rejoice: Nvidia released GeForce Hotfix display driver version 591.67.
- PCIe Bandwidth Enough for Inference?: Discussion arose around PCIe lane requirements for inference, with one user suggesting that Gen4 x1 speeds might suffice.
- Others cautioned that dropping below gen3 x8 could incur a performance hit of around 40-50%, while acknowledging that a setup with multiple cards running at gen3x1 via the chipset could still work fine, referencing a YouTube video showcasing such a configuration.
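For a rough sense of the numbers behind those thresholds: usable PCIe throughput is about 0.985 GB/s per lane on Gen3 and roughly double that on Gen4, so Gen4 x1 lands near Gen3 x2, well below Gen3 x8. A quick back-of-the-envelope check:

```python
# Approximate usable PCIe bandwidth (GB/s per lane, after encoding overhead).
GBPS_PER_LANE = {"gen3": 0.985, "gen4": 1.969}

def link_bw(gen: str, lanes: int) -> float:
    return GBPS_PER_LANE[gen] * lanes

print(f"Gen4 x1: ~{link_bw('gen4', 1):.1f} GB/s")  # ~2.0 GB/s
print(f"Gen3 x8: ~{link_bw('gen3', 8):.1f} GB/s")  # ~7.9 GB/s
print(f"Gen3 x1: ~{link_bw('gen3', 1):.1f} GB/s")  # ~1.0 GB/s
```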
- RAM Prices too High for Comfort: A member pointed out the high price of DDR5 ECC RAM; another noted buying 2x48GB 6400 MHz CL32 DDR5 for just $400 a month prior, only to see the price spike to $950 on Amazon, speculating the earlier price was an employee pricing error.
- This led to the recommendation to start with 64GB due to affordability.
- Blackwell RTX Pro 4000 vs 3080ās: A user debated between acquiring 2x 3080 20GB cards for a GPU rack versus a single RTX Pro 4000 Blackwell for their desktop, considering the Blackwellās speed against the increased VRAM of the 3080s.
- The user aimed to combine a Blackwell with existing 3090 Ti and 3090 cards to reach 72GB VRAM, with the Blackwell potentially running on a Gen4 x1 slot, acknowledging it might be slower in gaming but beneficial due to native fp4/fp8 support.
OpenAI ā· #ai-discussions (97 messagesš„š„):
ChatGPT Subscription Issues, AI Image Generation, Sora Monetization, AI Development Projects, AI Detection
- ChatGPT Customer Support Banned User!: A user complained about subscription issues with ChatGPT after being banned and creating a new account, which violates OpenAIās ToS.
- Other users pointed out that creating multiple accounts to circumvent a ban is against the rules, but OpenAI allows multiple accounts for separating personal and work use.
- AI Wallpaper Sparks Musk Debate: A user shared an AI-generated wallpaper featuring revolutionary figures, including Elon Musk.
- While some admired the wallpaper, others criticized Muskās inclusion, but the creator clarified that ChatGPT chose the figures, not them.
- Soraās Pricing Structure Still Murky: Users discussed the potential monetization of Sora, with one suggesting a price of $0.50 per second based on a Google search.
- Another user clarified that Sora 2 will initially be available for free with generous limits, referencing this OpenAI blogpost.
- DIY AI Projects Seek Coding Allies: A user is seeking coding collaborators for their AI + LLM project built from scratch, using a stack including HTML, CSS, JavaScript, Node.js, SQLite, C++, Capacitor, and Electron.
- Another member is developing an audio classification model for bird calls using the Wav2Vec2 model on Hugging Face.
- AI Generation of Viral YouTube Short Examined: A user asked what AI was used to create this YouTube short.
- No one provided an answer to the userās question.
OpenAI ā· #gpt-4-discussions (4 messages):
AI for analyzing pitch, AI for analyzing tone, AI for analyzing body language, AI for linguistics
- AI suggestions for analyzing pitch, tone, body language, linguistics: A member asked what is the best AI to use for testing for specific things such as pitch, tone, body language, as well as linguistics all at one time.
- Token cost for reviewing software documents: A member indicated that the cost for reviewing software documents is about $1 to $2 per prompt, depending on how big the prompts and responses are.
OpenAI ā· #prompt-engineering (17 messagesš„):
CustomGPTs for Prompt Improvement, Evaluating Prompt Quality, Meta-Prompt Systems, Prompt Engineering for Coding vs. Conversational AI
- CustomGPTs become a topic of interest for Prompt Refinement: A member inquired about preferred CustomGPTs for improving prompts, sharing examples.
- Responses emphasized the subjective nature of prompt quality and the importance of defining specific evaluation criteria.
- Engineers Design Specs for Evaluating Prompt Outputs: Members discussed how to evaluate if a prompt is better, with a proposal to build an eval spec and test different prompts against it to determine better output distributions.
- The engineer designs the eval spec for each use case and prompt engineering should not be just about relative chatting, there are better and worse prompts because there are better and worse distributions of output.
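A minimal sketch of what such an eval spec can look like in practice: score candidate prompts against a fixed check (here, āoutput must parse as JSONā) over repeated samples and keep the prompt with the better pass rate. The `generate` function is a hypothetical stand-in for whatever model call you use:

```python
import json

def generate(prompt: str) -> str:
    # Hypothetical stand-in for a model call; replace with your API of choice.
    return '{"answer": 42}'

def passes_spec(output: str) -> bool:
    try:
        json.loads(output)  # the spec: output must be valid JSON
        return True
    except json.JSONDecodeError:
        return False

def score(prompt: str, n: int = 20) -> float:
    return sum(passes_spec(generate(prompt)) for _ in range(n)) / n

candidates = ["Return the answer as JSON.", "Respond ONLY with a JSON object."]
best = max(candidates, key=score)  # prompt with the better output distribution
```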
- Custom Meta-Prompt Systems gain favor for Targeted Use Cases: Rather than using CustomGPTs, a member developed a meta-prompt system tailored to their specific needs, advising others to build their own.
- They shared a link on how to start, emphasizing that a good meta-prompt is going to recommend prompts rooted in machine learning principles and enforces an output template.
- Rubrics Required for Non-Verifiable Domains: For coding-agent prompt engineering, improvements are easier to measure and test because the functionality is verifiable, whereas conversational āflavorā is easier to feel than to measure.
- A rubric is required for non-verifiable domains, since verifiable domains make a prompt spec easy by comparison.
OpenAI ā· #api-discussions (17 messagesš„):
CustomGPTs for prompt improvement, Prompt evaluation metrics, Meta-prompt systems, Prompt engineering for coding agents, Rubrics for non-verifiable domains
- Users seek customGPTs for prompt enhancement: A member asked for recommendations for CustomGPTs that enhance prompts, sharing several they use as examples.
- It was questioned how to define what makes one prompt ābetterā than another.
- Crafting evaluation specs refines prompts: A member advised creating an eval spec to test prompts and determine which yields better output distributions, tailored to each use case.
- They emphasized that prompt engineering aims for better outputs, like valid JSON or YAML, and isnāt just about casual chatting.
- Devise Meta-Prompt system and enforce output template: A member recommended prompting from machine learning principles and enforcing an output template, evaluating meta-prompts by defining specs and testing.
- It was highlighted that a human-in-the-loop is inevitable.
- Handcrafted meta-prompt systems excel: A member shared they donāt use customGPTs, but instead use a handcrafted meta-prompt system, and recommended building custom systems for specific use cases.
- They also linked to a prior message on how to start.
- Coding agents benefit from model enhancements: A member shared that for a coding agent theyāve been developing, improvements mainly come from model enhancements.
- They use a grading system to save outputs, and qualitatively assess improvements, with conversational perspectives harder to evaluate due to lack of testable functionality.
GPU MODE ā· #general (14 messagesš„):
DSLs, PTX, CDNA ISA, AGX assembly
- DSL Guide Dream Sparked: Members joked someone should make a guide on how to choose DSLs because there are so many.
- Another member quipped that PTX is all you need.
- Kernel Konundrum: Too Many Cooks?: It was observed that many ML devs start writing kernels and believe they have a better language idea, but donāt collaborate.
- As one member put it, like 5000 of these people exist and have the same idea on the same week but never talk to each other.
- ISA alternatives suggested!: After it was suggested that PTX is the only option, other members said to write CDNA ISA or write AGX assembly.
- Another member said, Bye portability as a reply.
GPU MODE ā· #triton-gluon (4 messages):
cuTile Performance, GPU Programming Evolution, DSL Challenges
- cuTileās Performance: Too Early to Judge: Itās premature to assume cuTile outperforms Triton, given that cuTile was only publicly released last week, though hints suggest its developers at Nvidia plan to keep optimizing it.
- The assumption is that cuTileās bytecode and IR design align with Nvidiaās future GPU roadmap, while Triton added warp specialization and TMA last year.
- GPU Programming Parallels JavaScript Framework Frenzy: GPU programming is likened to the JavaScript framework era, with frequent emergence of new languages and abstractions aimed at simplifying the programming model.
- The collective goal is to facilitate development, enhance out-of-the-box performance, and minimize errors via higher-level tooling and safer abstractions.
- DSLs Struggle to Balance Simplicity and Complexity: The challenge for DSLs is to make tensor-based programming easy while enabling complex optimizations like warp specialization and symmetric memory, balancing usability and advanced control.
- No one has cracked this balance, though cuTile from Nvidia is a promising hardware-specific approach that could potentially become more hardware agnostic.
GPU MODE ā· #off-topic (2 messages):
Scientific Note-Taking, Technical Note-Taking, LaTeX, Orgmode, Excalidraw
- User Seeks Unified Scientific Note-Taking: A user is struggling with a mix of plain text files, Excalidraw drawings, a home-brewed mindmapping app, LaTeX, and Markdown, seeking a consolidated solution for scientific/technical note-taking.
- The user finds Markdown insufficient and LaTeX too cumbersome for everyday use, while also disliking Emacs for Orgmode, desiring a format thatās readable, supports math typesetting, diagrams, and can compile to TeX.
- Paper beats Computer for Notes: In response to the original question, one user suggests the pen and paper method for note taking.
- No further elaboration was provided.
GPU MODE ā· #submissions (9 messagesš„):
NVIDIA Leaderboard Updates, nvfp4_dual_gemm Leaderboard
- NVIDIA leaderboard sees new submissions: Multiple submissions were made to the `nvfp4_dual_gemm` leaderboard on NVIDIA, with varying results in microseconds (µs).
- NVIDIA Race to Third Place: User <@1291326123182919753> achieved third place multiple times on NVIDIA with a consistent time of 14.6 µs.
- Personal Best on NVIDIA: User <@772751219411517461> achieved a personal best on NVIDIA with a time of 21.2 µs.
GPU MODE ā· #cutlass (1 messages):
Cute DSL, Competitions in channel
- Competitions now LIVE!: A member noted there is currently a competition in the #cute-dsl channel and suggested joining it to learn the language in an applied, hands-on way.
GPU MODE ā· #teenygrad (3 messages):
Tensor Handles, OpNode IR, Runtime Buffer Allocators, OpCode.ADD, ascii printer for IR
- Tensor Handles constructed: Two `Tensor` handles were constructed into `OpNode` IR, which uses `RuntimeBuffer` allocators.
- The poster mentioned that everything is constructing IR.
- Next Steps for issuing OpCode.ADD: The next step is to issue the `OpCode.ADD` for the addition.
- They added they need to implement an ASCII printer for the IR, or reuse tinygradās VIZ infrastructure, to make debug cycles easier.
- Teenygrad is reusing 90% of tinygradās abstractions: The poster explained that the reason it took so long to get to the point of adding 2 Tensors is because teenygrad is reusing 90% of tinygradās abstractions.
- The good news: for Karpathyās reproduction of Bengio et al. 2003, the FFN only needs 2 additional ops, matmul and tanh.
- Golden Tests incoming: The poster hopes to get some golden tests up with the OpNode IR.
- They intend to start getting CI to provide signal and will probably forego the VIZ=1 web debugger in favor of the ascii debug.
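For readers following along, a hypothetical reconstruction of the shape of what is being described. The names (`Tensor`, `OpNode`, `OpCode`, ASCII printer) come from the summary above, but the signatures are invented for illustration; the real tinygrad/teenygrad abstractions differ:

```python
from dataclasses import dataclass
from enum import Enum, auto

class OpCode(Enum):
    CONST = auto()
    ADD = auto()

@dataclass
class OpNode:            # IR node: an op plus its source nodes
    op: OpCode
    srcs: tuple = ()

@dataclass
class Tensor:            # a thin handle that builds IR, not eager data
    node: OpNode
    def __add__(self, other: "Tensor") -> "Tensor":
        return Tensor(OpNode(OpCode.ADD, (self.node, other.node)))

def ascii_print(node: OpNode, depth: int = 0) -> None:
    print("  " * depth + node.op.name)
    for src in node.srcs:
        ascii_print(src, depth + 1)

a, b = Tensor(OpNode(OpCode.CONST)), Tensor(OpNode(OpCode.CONST))
ascii_print((a + b).node)  # prints ADD with two CONST children
```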
GPU MODE ā· #helion (2 messages):
Helion usage in vLLM
- Helion Finds Use in vLLM: A user expressed excitement about Helionās integration with vLLM.
- They requested the other userās email address to set something up for the new year, suggesting collaborative opportunities or further development around the integration in the coming year.
GPU MODE ā· #nvidia-competition (1 messages):
Cute-DSL vs C++ Cute, Cute API limitations, Cute consensus preference
- Cute-DSL vs Good Olā C++ Cute: A member thinking of finally biting the bullet and learning Cute inquired about the inherent limitations of Cute-DSL compared to C++ Cute, and whether there is a consensus preference between the two APIs.
- They noted the prevalence of Cute-DSL solutions in the last problem, suggesting it is not significantly behind the C++ API.
GPU MODE ā· #career-advice (3 messages):
handwritten kernels, tinygrad IR interpreter, tilelang
- Call for Assistance in Handwritten Kernel Implementation: A member requested assistance in implementing handwritten kernels for an interpreter that evaluates tinygrad IR (tilelang).
- They welcomed contributions from anyone with the necessary skills.
- Consistency Praised, Contributions Welcomed: The consistency of the project was praised, with a call for skilled individuals to contribute.
- Despite initial confusion about the humor in consistency, the invitation for contributions remains open to those with the requisite expertise.
HuggingFace ā· #general (26 messagesš„):
Agentic AI Course, RAG learning Resources, ML Project Water Quality Testing, Open Source OCR for Medical Documents, Hottest LLM Model
- No Channel Found for Agentic AI Course: A member inquired about the location of the channel for the Agentic AI Course, but no specific channel was identified in the given context.
- RAG learning Resources: A member is planning to start learning RAG and build some projects, asking for any courses or resources.
- ML Project Aims To Improve Water Quality Testing: A member is seeking assistance with an ML project focused on solving a real-world problem related to water quality testing, specifically predicting water quality degradation from uranium in situ recovery (ISR) operation sites.
- The project also involves identifying data gaps in mitigating vulnerabilities and developing monitoring programs.
- OCR Models Sought for Medical Documents: A member inquired about the best open source OCR model for complex medical documents, including scanned PDFs and handwritten documents.
- Another member suggested checking out the openmed project from Maz.
- Liquid AFM VLMs for GPT clones on iPhone: A member shared that Liquid AFMsā 3B VLMs for GPT clones on iPhones are the best of both worlds, offering really solid answers at 100 tok/s on mobile CPUs.
HuggingFace ā· #i-made-this (4 messages):
Coding models released, AI-powered Django REST API Generator, New model released
- Coding Models Hit the Shelves: Coding models designed for a specific library have been published.
- These models are available as a collection.
- AI Powers Django REST API Creation: An AI-powered Django REST API Generator has been built and deployed as a Hugging Face Space.
- It generates complete Django + DRF CRUD backends from a simple prompt, including models, serializers, views, and URLs, allowing users to instantly download the project.
- Genesis 152M Instruct Model Debuts: A new model, Genesis-152m-instruct, has been released and is available for review and discussion here.
HuggingFace ā· #agents-course (3 messages):
Agent course, Hugging Face Courses
- New Agent Course Student Asks About Certificate Deadlines: A new student starting the Agent course inquired about the deadline for the second certificate.
- Another student asked where to find courses?
- Hugging Face Courses Link Provided: A member shared a link to the Hugging Face courses.
- This link probably answers the question from one of the members.
Nous Research AI ā· #general (21 messagesš„):
LLMs social deduction game, Claude one-shot UI, Smart watch data to external AI, GPT models and ads, Zai and Minimax IPO
- LLMs Play Mafia: A member created a game where groups of LLMs play a social deduction game (Mafia) where they just talk on their turn trying to prove theyāre innocent.
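The setup is easy to sketch: each player model sees the shared transcript on its turn and appends a statement arguing its innocence. A hypothetical skeleton; the `ask` function stands in for a per-player model call and is not from the original project:

```python
def ask(player: str, transcript: list[str]) -> str:
    # Placeholder for a per-player model call; canned reply keeps it runnable.
    return f"I am {player} and I promise I am innocent."

players = ["gpt", "claude", "gemini", "kimi"]
transcript: list[str] = []
for round_no in range(3):
    for p in players:
        transcript.append(f"{p}: {ask(p, transcript)}")
    # after each round, players would vote on who they believe is mafia
```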
- Claudeās UI Wins Hearts: A member expressed liking Claudeās one-shot UI the best and asked others to identify the big models from screenshots.
- Another user pointed out that the first UI gives it away, guessed the second is Claude, and the third might also be Claude.
- Smartwatch Integrates with AI: A member is templating everything and ordered a new smart watch to start integrating data with an external AI company but will try to keep as much as possible local.
- GPT Models May Soon Prioritize Sponsor Content: OpenAI may tweak their GPT models to promote ads for a revenue stream, prioritizing sponsor content.
- It may mean GPT models will not focus on high intelligence but instead on sponsor content.
- Zai and Minimax Plan Hong Kong IPOs: Zai and Minimax are planning to go public with IPOs in Hong Kong in the coming weeks.
Nous Research AI ā· #research-papers (1 messages):
teknium: https://x.com/openbmb/status/2004539303309750341?s=46
Moonshot AI (Kimi K-2) ā· #general-chat (23 messagesš„):
Kimi coding vs Gemini, Kimi Researcher Model Speculations, AI implementation in society, Fact checking with LLMs
- Kimi coding rivals Gemini 3 for design: A user reported that Kimi for coding cannot be integrated with Cursor, but the HTML design created by Kimi in Roo Code is impressive.
- They compared Kimi coding to Gemini 3 & GPT-5.1 for design purposes.
- AI societal impact compared across countries: A member claimed that China is ahead in implementing AI into society, such as in medical care, elder care, and traffic optimization.
- Another user agreed that is exactly what AI should do, i.e. disaster prevention, and pointed out that Google and ChatGPT also censor certain questions.
- Deep dives with the Kimi AI assistant: A member shared a link to a conversation with the Kimi AI Assistant for research purposes and stressed the importance of fact-checking.
- Another user agreed and emphasized the need to cross-check, fact-check, and pressure-test any doubtful information obtained from LLMs, and to utilize new chat contexts for fresh perspectives.
- Kimi Researcher Model Speculations: A user inquired whether Kimi Researcher uses Kimi K2 Thinking Turbo or the older K1.5 model, and if a not-yet-public K2 VL model is employed.
- They expressed hesitations about using K1.5 except for basic tasks due to the superior performance of K2T, to which another user replied that there doesnāt seem to be any documentation.
Yannick Kilcher ā· #ml-news (10 messagesš„):
Groq Inference Chips, NVIDIA NPU acquisition, Chinese chips
- Groq Gets NVIDIAāed!: Groq and NVIDIA have entered into a non-exclusive inference technology licensing agreement to accelerate AI inference at a global scale.
- Some community members have suggested that NVIDIA is pulling its latest price-inflation trick by eliminating another player from the table.
- Inference Acqui-Hire Relieves GPU Shortage?: Members discussed the possibility that Groqās inference chips will now become NVIDIA NPUs, potentially relieving regular GPUs for consumers again.
- A member added that Jen-Hsun, the founder of NVIDIA, is Chinese.
- Chinese Chips Lag Behind?: A member stated that Chinese chips might get to H100 level in 3 years, at which point H100 will be 5 years old.
- Another member stated that they are not holding their breath for Chinese chips being competitive to U.S/Taiwanese chips anytime soon.
Latent Space ā· #ai-general-chat (7 messages):
Karpathy Refactoring, AI Agent Programming, Rob Pike Opinion
- Karpathy on AI-Driven Refactoring: Andrej Karpathy reflects on the dramatic shift in software engineering caused by AI agents and LLMs, describing a new layer of abstraction involving stochastic entities.
- Pikeās Perspective Enters the Scene: A user shared a link to Rob Pikeās post responding to Karpathyās post.
Latent Space ā· #private-agents (2 messages):
Torchax, Unified Memory, Godawful OS
- Torchax Sparks OS Redemption Hope: The user expressed that torchax, combined with a substantial 128GB unified memory, might alleviate their regret regarding the use of a particular operating system.
- No specific details were shared concerning the intended application or the particular features of torchax that are being considered.
- Lamenting the āGodawful OSā: A user humorously bemoaned the āgodawful OSā on their system.
- They found solace in the potential of torchax and the large 128GB unified memory to compensate for the OSās shortcomings.
aider (Paul Gauthier) ā· #general (4 messages):
Claude, Codebase Explanation
- Discussion on Claudeās Capabilities: A member inquired about the potential of using Claude and sought opinions on its capabilities.
- The user seemed to consider whether Claude can fulfill a specific need or task, potentially in comparison to other AI models or tools.
- Seeking Efficient Codebase Explanation Methods for Claude: A member is looking for the best way to explain their codebase to Claude every time they open their project, expressing frustration with manually updating a `claude.md` file.
- They are open to suggestions and alternative methods, as they already have one solution but want to explore other community recommendations for automating this process.
aider (Paul Gauthier) ā· #questions-and-tips (3 messages):
Explaining codebase to Claude, Automating Claude updates
- Automate Claude codebase intros: A user is seeking advice on how to best explain their codebase to Claude each time they open their project, aiming to automate the process instead of manually updating a `claude.md` file.
- A member inquired if the user was referring to using aider.
- Eliminate manual updates to Claudeās knowledge: The user expresses frustration with manually updating a `claude.md` file to keep Claude informed about their codebase.
- They are actively seeking alternative methods to automate this process and avoid manual updates.
Eleuther ā· #general (5 messages):
Full Autonomy Systems for AIs, Limitations to Full AI Autonomous Systems, Context Management, Long-Term Memory Implementation
- Full Autonomy Systems for AIs Emerge: A member is developing a full autonomy system for AIs to enable 24/7 inference, along with plain language tools similar to OpenAPI MCP tools.
- The member asked if others have worked on similar systems and inquired about current limitations, especially concerning long-term memory and long context issues.
- Continuous Self-Summarization Holds Potential: A member suggested that continuous self-summarization could address long-term memory limitations in autonomous AI systems.
- He mentioned exploring summarizing projection layers to write to a scratchpad, acknowledging time constraints in fully developing the idea.
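A minimal sketch of the continuous self-summarization loop being proposed: once the verbatim history exceeds a budget, the oldest turns are compressed into a persistent scratchpad summary that is prepended to every future context. The `summarize` function is a placeholder (a simple truncation here so the sketch runs); thresholds are illustrative:

```python
def summarize(text: str) -> str:
    # Placeholder: in practice, ask the model to compress `text`;
    # truncation keeps this sketch self-contained.
    return text[-500:]

scratchpad = ""            # persistent long-term memory
history: list[str] = []    # recent turns kept verbatim
BUDGET = 20                # max verbatim turns before compressing

def remember(turn: str) -> None:
    global scratchpad
    history.append(turn)
    if len(history) > BUDGET:
        old, keep = history[: BUDGET // 2], history[BUDGET // 2 :]
        scratchpad = summarize(scratchpad + "\n" + "\n".join(old))
        history[:] = keep

def build_context(query: str) -> str:
    return (f"MEMORY:\n{scratchpad}\n\nRECENT:\n"
            + "\n".join(history) + f"\n\nUSER: {query}")
```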
- Context Management Gains Priority: A member emphasized context management as a priority when implementing long-term memory solutions.
- He noted that LLMs struggle with complex tasks involving multiple subtasks without well-crafted context management.
- Ollama Source Code Under Consideration for Memory Expansion: A member is considering modifying the Ollama source code to develop a āsecond memoryā system alongside the main LLM file.
- He acknowledged the challenge of understanding the source code, even with AI assistance.
Eleuther ā· #research (2 messages):
all-the-noises github, arxiv
- All The Noises project surfaces: A member shared a link to the All The Noises project, a GitHub page that aggregates diverse noises.
- Some noises include brownian, circuit, pink, sine, tan and uniform.
- Arxiv link shared: A member shared an arXiv link that appears to be related to future research work.
- Given its 2025 date, it is likely a pointer to a research direction the member is contemplating.