Frozen AI News archive

Terminal-Bench 2.0 and Harbor

**Terminal-Bench** has fixed problematic tasks and launched version 2.0 with cloud container support via the **Harbor framework**; the benchmark is cited in reports for models such as **Claude 4.5** and **Kimi K2 Thinking**. **Moonshot AI's Kimi K2 Thinking** is a 1 trillion parameter MoE reasoning model with ~32B active parameters, running natively in **INT4 quantization** and featuring a 256K context window. It leads open-weights benchmarks with an Artificial Analysis Intelligence Index score of **67**, shows strong agentic performance, and runs efficiently on consumer Apple silicon such as a 2× M3 Ultra setup. The model is broadly available on **Hugging Face** and **Ollama Cloud**, and is integrated into frameworks like slime. Serving bottlenecks were traced to network bandwidth rather than GPU limits, highlighting infrastructure considerations for LLM deployment.
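A quick back-of-the-envelope footprint check (parameter counts taken from the summary above; quantization scales, embeddings, and KV cache are ignored) shows why INT4 makes deployment on a 2× M3 Ultra setup plausible:

```python
def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in bytes, ignoring quantization scales,
    embeddings, and KV cache -- a rough sketch, not a deployment plan."""
    return n_params * bits_per_param / 8

total_gb = weight_bytes(1e12, 4) / 1e9    # all experts, INT4
active_gb = weight_bytes(32e9, 4) / 1e9   # ~32B active params per token
print(f"total weights:  ~{total_gb:.0f} GB")   # ~500 GB
print(f"active weights: ~{active_gb:.0f} GB")  # ~16 GB
```

Roughly 500 GB of weights total, of which only ~16 GB of active expert weights are touched per token, which is what puts a two-machine Apple silicon setup in range.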

Canonical issue URL

A popular benchmark fixes itself.

AI News for 11/6/2025-11/7/2025. We checked 12 subreddits, 544 Twitters and 23 Discords (200 channels, and 5178 messages) for you. Estimated reading time saved (at 200wpm): 432 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Since Terminal-Bench launched earlier this year (before Claude Code!), it has vaulted into the upper echelon of coding agent benchmarks, eg cited by Claude 4.5 and yesterday's Kimi K2 Thinking. There were some problems with tasks on TBench being too easy/impossible, so they went ahead and fixed the glitch (blog). They are also rewriting the bench to be easily run in cloud containers with the new Harbor framework, and also hosted a launch party with Q&A, recorded and now live on Latent Space:

A presenter discusses Terminal-Bench 2.0 at a technical conference, standing at a podium with presentation slides about the AI benchmark.


AI Twitter Recap

Moonshot AI’s Kimi K2 Thinking: 1T INT4 open-weights reasoning model, agentic SOTA, and real-world deployment notes

Scaling RL for LLM agents: DreamGym and agent instrumentation

Video “supersensing” and fast tracking: Cambrian-S and EdgeTAM

Evaluation and interpretability: long-context aggregation remains hard; model diffing and curvature-based editing

Systems and inference: kernels, frameworks, and deployment practices

Policy and industry context

Top tweets (by engagement)


AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Kimi Model Launch and Performance

2. Moonshot AI AMA Announcement

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. AI Consciousness Debate and Developments

2. AI Design and Production Innovations

3. Free AI Services in India


AI Discord Recap

A summary of Summaries of Summaries by gpt-5

1. Kimi K2 Reasoning Surge & Leaderboard Shakeups

2. GPU Kernels, Low Precision, and Bandwidth Realities

3. APIs, SDKs, and Spec Upgrades

4. Agents, Workflows, and Speech Speed Records

5. Training & Numerics: MoE, Torch 2.9, and On‑Device TTS


Discord: High level Discord summaries

LMArena Discord


Perplexity AI Discord


GPU MODE Discord


OpenRouter Discord


Cursor Community Discord


LM Studio Discord


HuggingFace Discord


Unsloth AI (Daniel Han) Discord


Nous Research AI Discord


OpenAI Discord


Modular (Mojo 🔥) Discord


Eleuther Discord


Yannick Kilcher Discord


tinygrad (George Hotz) Discord


MCP Contributors (Official) Discord


DSPy Discord


aider (Paul Gauthier) Discord


Manus.im Discord


MLOps @Chipro Discord


The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.




Discord: Detailed by-Channel summaries and links

LMArena ▷ #general (971 messages🔥🔥🔥):

Gemini 3 Pro, MovementLabs AI, Image-to-Video bugs, LMArena API Exploit, OpenAI's GPT-5 Release Strategy


LMArena ▷ #announcements (2 messages):

Text Arena Leaderboard, Image Edit Leaderboard, Ernie-5.0-preview-1022, Reve-edit-fast


Perplexity AI ▷ #general (600 messages🔥🔥🔥):

Sonnet 4.5 issues, Adblock alternatives, Kimi K2 Thinking model, GTA 6 delay, Comet Browser issues and Android release


Perplexity AI ▷ #pplx-api (4 messages):



GPU MODE ▷ #general (128 messages🔥🔥):

FP4 kernels, Nvidia interview tips, FP4 precision management, Blackwell new instructions, PTX and CUDA docs conversion to markdown


GPU MODE ▷ #triton-gluon (2 messages):

AtomicAdd in Gluon, Flash Attention Backward, Triton Tutorial


GPU MODE ▷ #cuda (29 messages🔥):

PTX instruction set, CUDA kernels profiling in Colab, TMA load bandwidth, WGMA vs MMA, INT8xINT8 GEMM kernel


GPU MODE ▷ #torch (26 messages🔥):

Torch 2.9.0, cos/sin implementation, numpy, fft, numerical bugs

```python
import torch
import numpy as np

print("=== ENVIRONMENT INFO ===")
print(f"PyTorch version: {torch.__version__}")
print(f"numpy version: {np.__version__}")

k = torch.tensor([[-0.0000000000, -0.1963495463, -0.3926990926, -0.5890486240,
                   -0.7853981853, -0.9817477465, -1.1780972481, -1.3744468689]])

w_r_torch = k.cos()

# NumPy version
k_numpy = k.numpy()
w_r_numpy = np.cos(k_numpy)

# Convert back to tensor for comparison
w_r_numpy_tensor = torch.from_numpy(w_r_numpy)

# Set print options for maximum precision
torch.set_printoptions(precision=17)
np.set_printoptions(precision=17)

# Compare
print("\nPyTorch result:")
print(w_r_torch)
print("\nNumPy result:")
print(w_r_numpy)
print("\nDifference:")
print(torch.abs(w_r_torch - w_r_numpy_tensor))
print("\nMax difference:")
print(torch.max(torch.abs(w_r_torch - w_r_numpy_tensor)).item())
print("\nAre they close? (allclose with default tolerance)")
print(torch.allclose(w_r_torch, w_r_numpy_tensor))
```



  

---


### **GPU MODE ▷ #[cool-links](https://discord.com/channels/1189498204333543425/1189868872887705671/1436409377442762896)** (4 messages): 

> `TMD Introduction, IEEE 754 Status, Verinum Numerical Software Verification` 


- **Intro to TMD**: A member shared a link to *Introduction to the **Table Maker's Dilemma** (TMD)* by Jean-Michel Muller: [Intro to TMD](https://perso.ens-lyon.fr/jean-michel.muller/Intro-to-TMD.htm).
- **History of IEEE 754 Floating Point Standard**: A member shared a link to *754 story* by W. Kahan: [IEEE 754 Status](https://people.eecs.berkeley.edu/~wkahan/ieee754status/754story.html).
- **Verinum's Verification of Numerical Software**: A member shared a link to [Verinum](https://verinum.org/), a collection of research projects taking a layered approach to foundational verification of correctness and accuracy of numerical software, with formal machine-checked proofs about programs.


  

---


### **GPU MODE ▷ #[jobs](https://discord.com/channels/1189498204333543425/1190208177829068860/1436144483422048306)** (5 messages): 

> `Hiring for AI System Performance, Tinygrad Core Devs, Low-Level Development Opportunities, ScienceCorp Hiring` 


- **Company is hiring for AI System Performance**: Company is still hiring engineers due to a strong customer pipeline, seeking **low-level developers** and **performance engineers** to push the limits of **AI system performance**.
   - The team includes ex-**HRT** and **Five Rings** engineers, **IMO** medalists, **Zig** and **tinygrad** core devs, and people from top AI labs, with compensation ranging from **$500K–$1M TC**.
- **Inquiries about Tinygrad Core Devs**: A member inquired about the **tinygrad core devs** in the team.
   - Another member asked for more details about the company, job description, and specific skills being sought.
- **ScienceCorp Seeking Low-Level SWEs for Vision and Brain-Computer Interfaces**: A member shared a hiring post for **ScienceCorp** seeking **low-level SWEs** interested in projects like restoring sight to the blind or hooking your brain up to a computer [ScienceCorp Job Posting](https://x.com/ScienceCorp_/status/1986457644421566516).
   - Interested candidates are encouraged to DM for more information.


  

---


### **GPU MODE ▷ #[beginner](https://discord.com/channels/1189498204333543425/1191300313928433664/1436109751048994988)** (11 messages🔥): 

> `1D convolution kernel for tensara problem, Debugging CUDA without CUDA hardware, printf in device code, GPU Computing Starting Points, NCU profiling with Colab/Lightning AI` 


- **Tiling 1D Convolution Kernel Troubles**: A member is seeking help debugging a [1D convolution kernel](https://gist.github.com/RiscInside/642bca513606d3d4cd366492ae2a3460) for a tensara problem, encountering slight inaccuracies in the tiling version on large tests.
   - They suspect the issue might stem from atomicAdd or an off-by-one error, and are looking for advice on debugging without CUDA-capable hardware.
- **Printf Function Surprises User in Device Code**: A user expressed surprise at the functionality of `printf` being available in device code.
   - This may be useful for debugging the CUDA kernel discussed above.
- **Profiling Code with NCU on Various Platforms**: One member asked if anyone had experience profiling their code using **NCU** (NVIDIA Nsight Compute) on platforms like **Colab** or **Lightning AI**.
   - This would be very helpful in the above situation for debugging the kernel.
- **Seeking Latest Kernel Benchmark Results**: A member referenced a [Stanford article on kernel benchmarking](https://scalingintelligence.stanford.edu/blogs/kernelbench) and inquired about the availability of the latest benchmark results.
   - Specifically, they were interested in seeing benchmark results for **GPT-5**.
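For the debugging-without-CUDA-hardware question above, one common tactic is validating a kernel's output against a host-side reference implementation. A minimal NumPy sketch (the "same"-padding convention here is an assumption; the actual tensara problem spec may differ):

```python
import numpy as np

def conv1d_ref(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive 'same'-padded 1D convolution (cross-correlation convention),
    usable as a ground truth when checking a GPU kernel's output."""
    r = len(kernel) // 2
    padded = np.pad(x, (r, r))
    return np.array([np.dot(padded[i:i + len(kernel)], kernel)
                     for i in range(len(x))])

# Compare a candidate kernel's output (or a tile of it) against the reference:
x = np.arange(8, dtype=np.float64)
k = np.array([0.25, 0.5, 0.25])
out = conv1d_ref(x, k)
```

Diffing the tiled kernel's output against such a reference, tile by tile, usually localizes an off-by-one or atomicAdd bug to a specific boundary without needing a GPU at all.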


  

---


### **GPU MODE ▷ #[torchao](https://discord.com/channels/1189498204333543425/1205223658021458100/1436100535790080254)** (5 messages): 

> `WandaSparsifier, Whisper models, sparse computation, 2:4 sparsity, matmul performance` 


- **TorchAO Newbie Explores WandaSparsifier on Whisper**: A new **torchao** user is experimenting with the `WandaSparsifier` on **Whisper models** to achieve faster inference after sparsifying and squashing the mask.
   - The user encountered a `RuntimeError` when attempting `.to_sparse()` on the weight tensors and seeks advice on achieving faster inference with unstructured pruning.
- **Unstructured Sparsity Needs >99% for Speedups**: Achieving speedups in compute-bound workloads with unstructured sparsity generally requires over **99% sparsity**.
   - The suggestion was made to try pruning to **2:4 sparsity** and accelerating with `to_sparse_semi_structured`, referencing [this PyTorch tutorial](https://docs.pytorch.org/tutorials/advanced/semi_structured_sparse.html).
- **Matrix Shapes Matter for Matmul Acceleration**: Acceleration options for **matmul** differ based on whether the workload is compute-bound versus memory-bound.
   - The user is testing on **whisper-base** with matrix shapes potentially like `[batch x 512 x 512]` for attention, noting that small batch sizes can be slower.
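The 2:4 pattern suggested above can be illustrated with a small NumPy sketch: zero the two smallest-magnitude weights in every contiguous group of four. This only produces the pattern; the actual speedup comes from torchao's `to_sparse_semi_structured` path on supported hardware, per the linked tutorial.

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude weights in every contiguous group of 4
    along the last axis -- the 2:4 pattern semi-structured kernels expect.
    Assumes the last axis length is a multiple of 4."""
    flat = w.reshape(-1, 4)
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]  # 2 smallest |w| per group
    out = flat.copy()
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(w.shape)

w = np.array([[0.9, -0.1, 0.4, 0.05, -0.7, 0.2, 0.0, 0.3]])
pruned = prune_2_4(w)  # exactly 2 of every 4 entries survive
```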


  

---


### **GPU MODE ▷ #[off-topic](https://discord.com/channels/1189498204333543425/1215328286503075953/1436138085694967929)** (1 messages): 

> `Milk Couch` 


- **User Shares Image of "Milk Couch"**: A user posted an image titled "The milk couch" with a [link to the image](https://cdn.discordapp.com/attachments/1215328286503075953/1436138085338578944/IMG_20251106_140720.jpg?ex=690f2c11&is=690dda91&hm=7ef39bf0b94248c40300092cc18b9f88df2e06b0f761109a8e9243727c07081a&).
   - No additional context was provided.
- **The Milk Couch**: The user shared an image of what they referred to as 'The milk couch'.
   - Without further context, the meaning or significance of the 'milk couch' remains ambiguous.


  

---


### **GPU MODE ▷ #[metal](https://discord.com/channels/1189498204333543425/1285384841730457600/1436140780325568582)** (2 messages): 

> `candle framework, metal, iOS deployment` 


- **Candle Framework Embraces Metal Acceleration**: Huggingface's **candle nn framework** now supports **Metal** for some operations, potentially boosting performance on Apple devices.
   - A user reported finding it useful on **M1/M2 OSX devices** but remains uncertain about its transparent compatibility with **iOS**.
- **iOS Deployment Still Uncertain for Candle**: While **Candle** benefits from **Metal** on macOS, its seamless functionality on **iOS** is still unconfirmed.
   - Further testing and community feedback are needed to ascertain the extent of Metal's support for Candle across Apple's mobile platforms.


  

---


### **GPU MODE ▷ #[self-promotion](https://discord.com/channels/1189498204333543425/1288557096404516945/1436098759237828721)** (4 messages): 

> `sse-popcount optimization, Model Serving Communities, Modern CUDA C++ Programming Class` 


- **SSE Popcount Optimized**: A member noted that **Wojciech Mula's sse-popcount** is about as good as it can get on CPU, using **Harley-Seal vectorized counts** and carry-save adder accumulation over blocks of **16 vectors**.
- **Model Serving Community Updates**: The State of Model Serving Communities: November Edition is out, with updates on **vLLM**, **KServe**, **llm-d**, **Llama Stack**, and more from Red Hat AI teams: [link](https://open.substack.com/pub/inferenceops/p/state-of-the-model-serving-communities-ea6?r=3ouig&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false).
- **NVIDIA Offers New Modern CUDA C++ Class**: NVIDIA announced a new Modern CUDA C++ Programming Class for C++ developers who want to use the GPU effectively and write clean, efficient, idiomatic GPU code, with all slides and exercises being open source: [link](https://www.youtube.com/watch?v=Sdjn9FOkhnA&list=PL5B692fm6--vWLhYPqLcEu6RF3hXjEyJr).
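The Harley-Seal trick mentioned above compresses several words through carry-save adders before doing any real popcounts. A scalar Python sketch of the idea (a real SIMD version carries twos/fours accumulators across iterations instead of flushing them every group, as this simplification does):

```python
def popcount(x: int) -> int:
    """Reference bit count."""
    return bin(x).count("1")

def csa(a: int, b: int, c: int):
    """Carry-save adder: compress three words into (sum, carry), so that
    popcount(a) + popcount(b) + popcount(c) == popcount(sum) + 2*popcount(carry)."""
    u = a ^ b
    return u ^ c, (a & b) | (u & c)

def harley_seal(words):
    """Count set bits over many words, folding four words at a time through
    CSAs so most bits are counted at weight 2 or 4 instead of one by one."""
    total, ones = 0, 0
    it = iter(words)
    for w0 in it:
        w1, w2, w3 = next(it, 0), next(it, 0), next(it, 0)
        ones, c0 = csa(ones, w0, w1)   # leftover weight-1 bits
        ones, c1 = csa(ones, w2, w3)
        twos, c2 = csa(0, c0, c1)      # weight-2 and weight-4 bits
        total += 2 * popcount(twos) + 4 * popcount(c2)
    return total + popcount(ones)
```

The vectorized version replaces `popcount` on the weight-4 accumulator with an in-register byte-wise count, which is where the "blocks of 16 vectors" amortization in sse-popcount comes from.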


  

---


### **GPU MODE ▷ #[submissions](https://discord.com/channels/1189498204333543425/1343002583001726986/1436256640960692328)** (42 messages🔥): 

> `Grayscale_v2 leaderboard results, vectoradd_v2 leaderboard results, vectorsum_v2 leaderboard results, B200 performance, H100 performance` 


- **Grayscale Gauntlet: B200 Battles**: Multiple submissions to the `grayscale_v2` leaderboard on **B200** achieved timings around **600 µs**, with one submission reaching **first place** at **600 µs**.
   - A separate submission also secured **4th place** on the **B200** with a time of **614 µs**.
- **Vector Victorious on Multiple Platforms**: Submissions to the `vectoradd_v2` leaderboard showed successful runs across different GPUs, with the following highlights: **A100** at **953 µs**, **H100** at **532 µs**, **B200** at **243 µs**, and **L4** at **6.92 ms**.
   - Further optimizations led to a **4th place** on **H100** at **526 µs** and consistent performance on **L4** at **6.91 ms**.
- **Vectorsum's Victory Lap**: Submissions to the `vectorsum_v2` leaderboard saw impressive results, including **3rd place** on **B200** at **51.3 µs** and **1st place** on **H100** at **83.3 µs**.
   - The leaderboard also demonstrated successful runs on **L4** at **974 µs** and **A100** at **141 µs**.
- **H100 & L4 neck and neck, Grayscale and Vectoradd locked in combat**: The `grayscale_v2` leaderboard runs also netted **Third place** on **H100** at **1371 µs**, and **Third place** on **L4** at **17.2 ms**.
   - Multiple `vectoradd_v2` leaderboard runs had **10th place** on **H100** hovering around **528 µs**


  

---


### **GPU MODE ▷ #[hardware](https://discord.com/channels/1189498204333543425/1349152646484987974/1436148112732197016)** (10 messages🔥): 

> `DGX Spark experiences, DGX Spark vs Strix Halo, DGX Spark as a datacenter proxy, DGX Spark hardware and software stack` 


- **First-Hand DGX Spark Experiences Requested**: Members are soliciting first-hand experiences with **DGX Spark**, particularly regarding bandwidth limitations, local model hosting, and **nvfp4 quantization** experiments.
   - One is curious about the software stack and its suitability for local models, experimentation with quantization, and general form factor advantages.
- **Chips and Cheese Disses DGX Spark**: **Chips and Cheese** is doing a [review analysis on DGX Spark](https://chipsandcheese.com), their initial impressions are not particularly positive.
   - They mention the **CPU** side is inferior to **Strix Halo**, and there were weird segmentation decisions on the **GPU** side, resulting in it not being a great proxy for datacenter solutions due to nerfed **FP32** accumulate performance.
- **DGX Spark is No Datacenter Proxy**: The DGX Spark's **sm120 GPU** cannot use any of the new **Blackwell** features other than **fp4**, making it unsuitable as a proxy for datacenter solutions.
   - One member described it as *basically a 5080 without the vram, and instead it has shared system ram*.
- **Strix Halo Eats DGX Spark for Lunch**: A member said if you want to remotely use it for regular PC applications, **Strix Halo** will bury the **DGX Spark**.
   - Had **Strix Halo** not existed and solved **ROCm**, the **DGX Spark** would have been a decent option for a *local AI box*.


  

---


### **GPU MODE ▷ #[factorio-learning-env](https://discord.com/channels/1189498204333543425/1354169122107293786/1436386308300865738)** (1 messages): 

> `Meeting Cadence` 


- **Meeting Frequency Faces Downsize**: A user apologized for missing a message and mentioned that today is a good day for a meeting, but they also expressed the need to decrease the meeting frequency.
- **Meeting Cadence and Apologies**: A user mentioned they were available, and apologized for missing a message from another user.


  

---


### **GPU MODE ▷ #[cutlass](https://discord.com/channels/1189498204333543425/1362196854460383353/1436109242552553655)** (3 messages): 

> `CUTLASS MMA, Tensor Operations, TCGEN05` 


- **CUTLASS MMA without inline PTX?**: A user inquired about using `.ws` MMAs for **TCGEN05** in [CUTLASS](https://github.com/NVIDIA/cutlass) without resorting to inline PTX.
   - The inquiry points to challenges or preferences in utilizing certain CUTLASS MMA functionalities, specifically related to tensor cores, without directly embedding PTX code.
- **Tensor operations beyond row/col major**: A member mentioned that it is designed to work with **any tensor**, even those that are not describable as row/col major.
   - He clarified that *election* applies to issuing the operation, not to predicating the data.


  

---


### **GPU MODE ▷ #[mojo](https://discord.com/channels/1189498204333543425/1367972893400760371/1436106708828295178)** (1 messages): 

> `Mojo Kernel Boilerplate` 


- **Competitors seek Mojo Kernel Boilerplate**: A member requested a **boilerplate kernel in Mojo** to use for competitions, specifically needing the structure of the submission file.
   - No specific examples or links were provided in the given context.
- **Another member needs Mojo help**: Another user also asked for help with Mojo.
   - No further details were given.


  

---


### **GPU MODE ▷ #[singularity-systems](https://discord.com/channels/1189498204333543425/1373414141427191809/1436440838900158575)** (1 messages): 

> `picograd, runtime allocator, compiler, AD on tensor, device kernels` 


- **Picograd Gets Allocator and Compiler Working**: A member announced they are working on getting the **runtime's allocator** and **compiler** working on [picograd](https://github.com/j4orz/picograd/commit/c261d5c15cc47af28f3727bcde043b31f53f1cbc).
   - The goal is to setup **AD** (automatic differentiation) on the **tensor** and **device kernels** on the runtime.
- **Running MNIST opens Parallelization Potential**: The member plans to run an **MNIST** example after setting up **AD** and **device kernels**.
   - They noted that once this setup is complete, more work can be parallelized to improve performance.


  

---


### **GPU MODE ▷ #[general](https://discord.com/channels/1189498204333543425/1394753097989099640/1436320044240863372)** (7 messages): 

> `Leaderboard submission, Popcorn CLI, VS Code extension, NVFP4 kernel hackathon eligibility` 


- **Popcorn CLI for Leaderboard Submission**: A member asked about the tools used for leaderboard submission, and another member responded with **Popcorn CLI**.
   - The member then inquired whether the first member was also utilizing the **VS Code extension**.
- **VS Code Extension for PyTorch Load Inline Highlighting**: A member shared a [VS Code extension](https://marketplace.visualstudio.com/items?itemName=msaroufim.pytorch-load-inline-highlighter) for **PyTorch load inline highlighting**.
   - Another member responded that they hadn't heard of it and asked about its benefits, to which the first member responded it makes using load inline more pleasant in your IDE.
- **NVFP4 Hackathon Eligibility Question**: A member asked about eligibility for participating in the **NVFP4 kernel hackathon**.
   - Specifically, they were concerned because their country was not listed as eligible, and they wanted to know if this meant they could not participate or just were ineligible for winning prizes.


  

---


### **GPU MODE ▷ #[multi-gpu](https://discord.com/channels/1189498204333543425/1398843708488552570/1436461161125122329)** (3 messages): 

> `Multi-node communication, Low-latency communication kernels, NVSHMEM, LLM inference` 


- **NVSHMEM Kernels for Multi-Node LLM Inference**: A member shared a [blog post](https://pssg.cs.umd.edu/blog/2025/beyond-nccl/) about writing **low-latency communication kernels** with **NVSHMEM** for **LLM inference** focusing on multi-node communication performance.
- **Multi-Node Kernel Talk Proposed**: Due to member interest in the blog post, a talk was proposed about the work on low-latency communication kernels with **NVSHMEM**, especially for **LLM inference**.
   - The original poster expressed enthusiasm for giving a talk if there's enough interest and feedback from the community.


  

---


### **GPU MODE ▷ #[helion](https://discord.com/channels/1189498204333543425/1425531180002054195/1436403113643737109)** (2 messages): 

> `Helion on Hacker News, Flex Attention vs Triton` 


- **Helion Graces Hacker News Front Page**: Members noted that [Helion is on the front page of Hacker News](https://news.ycombinator.com/item?id=45788194).
   - They linked to the [Helion GitHub](https://github.com/pytorch/helion/blob/main/examples/attention.py) as a key implementation.
- **Flex Attention Set to Duel Triton**: A member requested performance comparisons between **Flex Attention** and a linked **Helion** implementation.
   - They stated that the **Helion** code looks better than any **Triton** implementation they've encountered.


  

---


### **GPU MODE ▷ #[nvidia-competition](https://discord.com/channels/1189498204333543425/1434709259500650628/1436153269159202918)** (124 messages🔥🔥): 

> `AGX Thor, CC11.0 support, CUTLASS Library, CUDA kernel optimization, nvfp4 moe` 


- **AGX Thor GPU Architecture Specs Probed**: A user inquired about **CC11.0 support** for a potential **AGX Thor** purchase, noting the absence of clear documentation, no indication that it can use **tcgen05**, and concerns about nerfed smem.
   - After another user confirmed it is listed in the **PTX documentation as sm_110**, the original poster was happy with the specs.
- **CUDA Kernel Performance Issues Investigated**: A user, practicing **cutedsl** on grayscale_v2, reported a significant performance discrepancy on **B200**, with their kernel running at **48310.973μs** compared to the leaderboard's **600.272μs**, due to cute.compile() being inside the custom kernel.
   - Another user pointed out that compiling the kernel on each run is extremely slow, and suggested caching the compilation or moving `cute.compile()` outside the kernel, linking to [a reference implementation](https://github.com/gpu-mode/reference-kernels/blob/main/problems/pmpp_v2/sort_py/solutions/correct/ref.py).
- **Benchmark Evaluation Script Glitches Flagged**: A user questioned the accuracy of the competition's evaluation script, suggesting that **start_event.record** triggers immediately and captures Python overhead, skewing kernel timing; they recommended launching a junk kernel that wastes a second, then triggering the first event, then launching the main kernel.
   - They propose adding a **time-waster kernel** so that Python can queue CUDA operations accurately, particularly when benchmarking kernels against the speed of light, citing the [clear l2 kernel example](https://cdn.discordapp.com/attachments/1434709259500650628/1436475488091770959/image.png?ex=690fbd8c&is=690e6c0c&hm=900a1b10543312a3ccdce53e5089be2156d8213d1486e3341201e40fe494b670&) that prevents the event from triggering until after the next operation has been queued.
- **Datacrunch B200 Server Access Tips Shared**: Users discussed the availability of **bare metal B200 servers**, noting that while **DigitalOcean** and **Coreweave** lack them, **Datacrunch** offers them with upgradeable **CUDA 13** support.
   - A user highlighted the need for profiling tools and for the submission tool to capture **NCU profiles**, while also noting the availability of affordable bare metal servers from **Sesterce**.
- **Future of Low Precision Training Speculated**: The potential of **nvfp4** was discussed, with one user noting its comparable training loss to **fp8** and another mentioning gpt-oss as an example of block-scaled data types.
   - They speculated that labs using low precision pretraining might not want to reveal their strategies and added that **mxfp8** is a long term winner.
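The warm-up principle behind the evaluation-script complaint above is general: one-time costs (kernel compilation, caches, Python queuing overhead) must be flushed before the timed region starts. A minimal host-side sketch of that pattern (on GPU the analogue is launching a throwaway kernel before recording the start event; `bench` here is a hypothetical helper, not the competition's harness):

```python
import time

def bench(fn, *, warmup=3, iters=10):
    """Average wall time of fn, discarding warm-up runs so that one-time
    costs (JIT/compile caches, allocator warm-up, queuing overhead) do not
    pollute the measurement."""
    for _ in range(warmup):
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters

avg_s = bench(lambda: sum(range(10_000)))
```

This is also why the cutedsl user's 48 ms-vs-600 µs gap above disappears once `cute.compile()` moves out of the timed kernel: compilation belongs in warm-up, not in the measured region.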


  

---


### **OpenRouter ▷ #[announcements](https://discord.com/channels/1091220969173028894/1092729520181739581/1436381819237826650)** (3 messages): 

> `New Embedding models launch, Typescript SDK, Exacto Variants, MiniMax M2 Free Period` 


- **OpenRouter's livestream**: OpenRouter announced a livestream scheduled for later today, with discussions on the new **Embedding models launch**, **TypeScript SDK**, **Exacto Variants**, and community discussions, on [X Stream](https://x.com/OpenRouterAI/status/1986821885716558194) or [Youtube](https://www.youtube.com/@OpenRouterAI).
- **MiniMax M2 Free Period Ending Soon**: The **MiniMax M2 Free period** will end in a few days on **Monday, November 10th**, with rate limits lowered in the meantime, which is expected to result in a higher rate of **429 errors**; see the [official post](https://x.com/minimax__ai/status/1986815058249408541?s=46).
- **OpenRouter is LIVE!**: OpenRouter is now **LIVE** on [X](https://x.com/OpenRouterAI/status/1986871176082358615?s=20) and [YouTube](https://youtube.com/live/TD6JUbJzKPY?feature=share).


  

---


### **OpenRouter ▷ #[app-showcase](https://discord.com/channels/1091220969173028894/1092850552192368710/1436131621366530311)** (2 messages): 

> `Cat girl images` 


- **Cat girl image quality assurance**: Members positively affirmed that the presence of a cat girl in an image guarantees its quality.
- **Agreement on Cat Girl Quality**: Another member explicitly agreed with the assessment, reinforcing the link between cat girls and image quality.


  

---


### **OpenRouter ▷ #[general](https://discord.com/channels/1091220969173028894/1094454198688546826/1436086954587459725)** (305 messages🔥🔥): 

> `OpenRouter website down, Polaris Alpha model, Gronk AI, Nano Banana Image Resending, Chat limit for free users` 


- **OpenRouter Site Glitches Prompt Frustration**: Users reported issues with the **OpenRouter website**, where the page served but content failed to load, hindering login and credit additions.
   - While some experienced slow loading or non-functional account sections, others confirmed the **API** remained operational despite site glitches.
- **Polaris Alpha's Stealth Perks Spark Speculation**: The **Polaris Alpha model** on [OpenRouter](https://openrouter.ai/openrouter/polaris-alpha) garnered praise for outperforming others due to its rule-breaking capabilities.
   - Guesses on its origin ranged from **OpenAI** to **Google**, or even a **Nvidia Nemo 32B troll**, with users urging OpenRouter to keep it free, which is unlikely due to rate limits.
- **The Gronk AI Dissed in Chat**: A user recounted being laughed at for asking about the AI called **Gronk**.
   - Another user chimed in to say that Gronk is *shit* and proceeded to talk about their *llama.cpp custom OLMo 2 finetune mirostat entropy parameters for 3 hours* as an example of how to talk to normies.
- **OpenRouter Now Supports Video**: OpenRouter now supports video, according to a [link](https://openrouter.ai/docs/features/multimodal/videos) shared in the chat.
   - One user reacted positively, saying *Ohhh, just 2 days ago I was like "I wish OR supported videos"*.
- **GLM Coding Exploit Banned SillyTavern Gooners**: It was revealed that **SillyTavern gooners** were banned for abusing the **GLM 4.6 coding plan** to get essentially free API usage.
   - One user lamented that *cant have nice shit because they abuse free shit like a swarm*.


  

---


### **OpenRouter ▷ #[discussion](https://discord.com/channels/1091220969173028894/1392278974222307469/1436114963424084060)** (28 messages🔥): 

> `GTA 6 delay, Openrouter Show, Toven is winking, Retro OR logo` 


- **GTA 6 Delay fuels GPT-7 Speculation**: A member joked that **GPT-7** might be released before **GTA 6**, referencing another **GTA 6 delay**.
- **The Openrouter Show's Rocky Road**: Members discussed name ideas for the **"Openrouter Show"**, and whether the name would imply a scripted entertainment show or a documentary podcast.
   - One member proposed *live roleplay* ideas with AI models like *MythoMax*.
- **Toven's Wink Creates Existential Crisis**: Members joked about whether **Toven** was winking and also whether Toven was a 2D anime girl.
- **Retro OR Logo Evokes 90s Apple**: Members discussed the **retro OR logo**, with some commenting on its **90s Apple logo vibe**.


  

---


### **Cursor Community ▷ #[general](https://discord.com/channels/1074847526655643750/1074847527708393565/1436083248449978430)** (297 messages🔥🔥): 

> `Pro plan limits, Cursor Usage dashboard, Composer vs Grok Code, Sharing Premium Accounts, Student Verification Issues` 


- **Pro Plan Limits Reset on Billing Date, not on a fixed Day**: A member asked about the Pro plan limits, the response was that they always reset with billing cycles, showing a [screenshot](https://cdn.discordapp.com/attachments/1074847527708393565/1436083248357834802/image.png?ex=690fa1bf&is=690e503f&hm=85953f8bf713e88d58104c890a5c6d767185003b86cebede896df50ae275b94c) showing the usage and next billing date.
- **Cursor Usage Dashboard Graph is Missing For Some Users**: Some users are unable to see the usage graph in their dashboard, located at [cursor.com/dashboard?tab=usage](https://cursor.com/dashboard?tab=usage), despite having **unlimited auto** features.
   - It's possibly related to being on an *old pricing* plan, but there was no resolution for why it was missing.
- **Composer vs Grok Code for Codebase Understanding**: When asked about the difference between **Composer 1** or **Grok Code**, some members commented that **Composer** is faster, and good for quickly generating rough code or for use in workflows that involve a second pass by **Claude** for refinement.
   - Another member found that **Sonnet 4.5** could effectively solve complex web code issues where **Composer**, **Grok Code** and **GPT 5 Fast** would get stuck in the same logic loop without a solution.
- **Cursor Student Verification Process is a Hassle**: Multiple members are facing issues with **SheerID** during the student verification process, with one member reporting over **15 attempts** without success, which is in contrast to using **SheerID** with another service that worked on the first try.
   - Members speculate that **Cursor** may have implemented a stricter verification process than other companies, and suggest contacting the **Cursor Forum** or emailing `[email protected]` for feedback, but note that it might only lead to an **AI** response.
- **Accessing Kimi K2 Thinking in Cursor is Incoming**: A member asked whether it will be possible to use **kimi-k2-thinking-turbo** in Cursor.


  

---


### **Cursor Community ▷ #[background-agents](https://discord.com/channels/1074847526655643750/1367213641027551352/1436083459201302569)** (7 messages): 

> `Cursor Agent API, Base64 Image Submission, Internal Errors, Image Generation Service` 


- **Cursor 2.0 Release Speed Impresses!**: A user expressed appreciation for the quick pace of **Cursor 2.0** development, specifically noting the value of visibility into changes.
   - Shortly after this comment, the same user reported encountering an *internal error* within the platform.
- **Agent API Triggers Internal Error**: A user reported receiving an *internal error* when using the **Cursor Agent API**.
   - After investigation, it was discovered that the error occurred due to improper formatting of the **Base64** image data submitted to the API.
- **Base64 Image Formatting Fixes API Error**: The user discovered that the **Base64** string was improperly formatted, prepended with `data:image/jpeg;base64,`.
   - After removing the header, the **API** call succeeded, resolving the error.
- **Desire to Upload Base64 to Agent for re-creation**: The user expressed a desire to submit a **Base64** image to the **Agent API**, intending for Cursor to use the image contextually to re-create and save it to a repository.
   - The user seems to expect that the agent will create and save the image to their repo.
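The Base64 fix above amounts to stripping the data-URI header before submission; a minimal sketch (hypothetical helper, not Cursor's actual API handling):

```python
import base64

def to_raw_base64(data: str) -> str:
    """Strip an optional data-URI header (e.g. 'data:image/jpeg;base64,')
    so only the raw Base64 payload remains for APIs that reject the prefix."""
    prefix, sep, payload = data.partition("base64,")
    return payload if sep else data

# A data URI wraps the raw payload with the header the Agent API rejected:
uri = "data:image/jpeg;base64," + base64.b64encode(b"\xff\xd8\xff").decode()
print(base64.b64decode(to_raw_base64(uri)) == b"\xff\xd8\xff")  # True
```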


  

---


### **LM Studio ▷ #[general](https://discord.com/channels/1110598183144399058/1110598183144399061/1436086903165423767)** (124 messages🔥🔥): 

> `Intel LLM Scaler, System Prompt for AI Assistant, LM Studio Memory Clearing, LM Studio and N8N Integration, ComfyUI Alternatives` 


- **Intel Scales LLMs with New Tool**: Intel is developing [llm-scaler](https://github.com/intel/llm-scaler) for their architecture, sparking curiosity about performance improvements on Intel GPUs.
   - Members are interested in ERP models on the architecture, but not 1B models.
- **LM Studio's Deep Research Powers**: Users can now enhance research within LM Studio using the [LMS plugins for web search (duck-duck-go) and visit website](https://lmstudio.ai/danielsig/visit-website), along with custom system prompts.
   - The system prompts can be generated via the *AI agent prompt generator* inside the ChatGPT platform (free).
- **LM Studio Gets N8N Integration**: Members discuss the seamless integration of **LM Studio** with [N8N](https://n8n.io/) for AI automation.
   - While some prefer code, others find N8N's visual node interface beneficial, especially for non-programmers.
- **Users seek ComfyUI Alternatives**: Users express frustration with **ComfyUI** setup and seek more *comfy* alternatives like [Stability Matrix](https://github.com/LykosAI/StabilityMatrix).
   - They consider **Automatic1111** and its forks to be mostly abandonware.
- **Models gone religious or misremembering**: A user is facing issues where models like **Gemma** and **Deepseek Distil** give incorrect/odd answers, and LM Studio seems to recall older chats after being wiped.
   - Troubleshooting steps include reverting to default sampling settings, verifying system prompts, and ensuring no uploaded files interfere; the root cause remains elusive, though the user **did** upload five Word documents, which might be the cause.


  

---


### **LM Studio ▷ #[hardware-discussion](https://discord.com/channels/1110598183144399058/1153759714082033735/1436085047697346743)** (164 messages🔥🔥): 

> `1080 vs sysRAM, GLM 4.6, Qwen3-235B MXFP4, multi-GPU, 3090 vs pseudo-benchmark` 


- **1080 Beats SysRAM!**: A member found their **1080** was faster when offloading some tasks to **sysRAM**.
- **GLM 4.6 Crawls at Q4_K_M!**: **GLM 4.6 @ Q4_K_M** runs *dreadfully slow*, clocking in at only 4 tok/s; even **Qwen3-235B MXFP4** doesn't top 4 tok/s.
   - One user noted this setup was about 30 tok/s slower than a 5700G when testing with **Qwen3 4B Q8_K_XL**.
- **Multi-GPU Mayhem?**: A user is building a **160GB VRAM rack** with 8x 20GB cards (theoretically up to 320GB with 16x cards) for agentic inference, video, music, and image generation.
   - The discussion touched on whether splitting experts across multiple GPUs could speed up **Qwen3-30B**, however, one user couldn't imagine the concurrent **PCIe bandwidth** required, and another said that MoE experts can work independently from each other.
- **3090 Gets Pseudo-Benchmark Beaten!**: A user shared a *simple easy race your friends benchmark for 20GB+ cards in LM studio* in a file named [get_low_benchmark.conversation.json](https://cdn.discordapp.com/attachments/1153759714082033735/1436246519622795274/get_low_benchmark.conversation.json?ex=690f910e&is=690e3f8e&hm=0854e57b2f702242c85b18fa67a4bcf5f575bb526e93c076d7373f357292c501&).
   - After installing the new 3090, the pseudo-benchmark in LM Studio showed speeds of only 90 tok/s, down from 150 tok/s with an older card, though all other benchmarks were in spec or better.
- **Windows No Likey Hundred Gigs!**: A user joked that above 100 GB, Windows stops showing decimals, saying: *Like 6.1gb, 7gb, whats the difference?* alongside an attached [image](https://cdn.discordapp.com/attachments/1153759714082033735/1436442830817071152/image.png?ex=690f9f22&is=690e4da2&hm=1e81296bfd219be0036b09af7b4aed8e2cfa7cb058e35e539fbc2b38b4cc24e8&).
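On the multi-GPU PCIe-bandwidth question, a back-of-envelope sketch (all numbers hypothetical) suggests the per-token activation traffic for routed experts is modest compared to moving expert weights around:

```python
# Hypothetical shapes: the hidden state crosses PCIe once per expert a token routes to.
hidden_dim = 2048        # assumed hidden size
bytes_per_value = 2      # fp16 activations
tokens_per_s = 50        # assumed decode rate
experts_per_token = 8    # assumed top-k routing

activation_traffic = hidden_dim * bytes_per_value * tokens_per_s * experts_per_token
print(f"{activation_traffic / 1e6:.1f} MB/s")  # ~1.6 MB/s, far below a PCIe 4.0 x16 link
```

If this estimate holds, experts can indeed run fairly independently across GPUs; the heavy interconnect cost comes from shuffling expert weights, not activations.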


  

---


### **HuggingFace ▷ #[general](https://discord.com/channels/879548962464493619/879548962464493622/1436086996559986899)** (261 messages🔥🔥): 

> `Kimi K2 Programming, HF Pro Worth, Drone Control with LLM, TTS latency on fine tuned Model, Critical Thinking and LLMs` 


- **Kimi K2 touted for Programming Prowess**: **Kimi K2** is reportedly *very good* at programming, though not necessarily the *best*.
- **Debate on HF Pro Benefits and Pricing**: Members discussed whether **HF Pro** is worth it, prompted by one user who wants to test the performance of a particular model with **vLLM**.
   - Some users mentioned using **Hugging Face Spaces** to test models and another shared that image to video services cost around **$0.40** for an **8-second video** (no sound) with a **$9** subscription.
- **LLMs and Drones**: Members discussed the feasibility of controlling a drone with an **LLM** vs. a **LAM**, with one user seeking to create a drone assistant capable of following voice commands and navigating based on sensor data and image analysis.
   - It was suggested that **YOLO** is better suited for object detection and **ArduPilot** for flight control; it was also noted that teams are researching **CognitiveDrone**.
- **Low Latency Voice Synthesis with LLMs**: For getting faster **TTS** latency on a fine tuned model, it was suggested to use **vLLM** with sufficient **VRAM** for **KV cache**, and to compile the model for kernel fusion using [this blog post](https://www.anyscale.com/blog/continuous-batching-llm-inference).
   - Another member humorously suggested turning the latency setting down.
- **Synthesizing Critical Thought with Language Models**: A member discussed training a model on reasoning traces to output *thought* and using observables.
   - Another user suggested that **information theory** would greatly help in designing the model, and that the research should focus on **coherence** rather than **truth**.
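The continuous-batching suggestion for TTS latency (from the linked Anyscale post) can be illustrated with a toy model of decode latency, assuming idealized scheduling rather than vLLM's actual internals:

```python
# Four requests with different decode lengths sharing four batch slots.
lengths = [10, 200, 15, 180]  # decode steps per request (hypothetical)

# Static batching: the batch only returns when its longest member finishes.
static_latency = [max(lengths)] * len(lengths)

# Continuous batching (idealized): each sequence frees its slot on completion.
continuous_latency = list(lengths)

print(sum(static_latency) / len(lengths), sum(continuous_latency) / len(lengths))
# mean latency drops from 200.0 to 101.25 decode steps
```

Short requests no longer wait behind long ones, which is why the technique cuts tail latency for fine-tuned TTS serving.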


  

---


### **HuggingFace ▷ #[today-im-learning](https://discord.com/channels/879548962464493619/898619964095860757/1436134379381456958)** (3 messages): 

> `Fine-tuning decoder models, Fine-tuning SetFit, Embedding Gemma and t-SNE, Extracting attention values from SmolLM3` 


- **Fine-Tune Decoder Models for Classification**: A member detailed the procedure for fine-tuning a decoder model to classify messages into categories using **ModernBERT** and **ettin**.
- **SetFit Gets Fine-Tuned on Contrastive Pairs**: The channel discussed fine-tuning **SetFit** on contrastive pairs for binary classification of texts.
- **Embedding Gemma and t-SNE Team Up**: The application of **embeddinggemma-300m** and **t-SNE** to categorize and visualize a dataset of tweets was spotlighted.
- **SmolLM3's Attention Values Visualized**: The process of extracting attention values from **SmolLM3** inference and creating a heatmap was shared.


  

---


### **HuggingFace ▷ #[i-made-this](https://discord.com/channels/879548962464493619/897390720388825149/1436417447103168582)** (3 messages): 

> `OpenBMB VoxCPM, Apple Neural Engine, CoreML, Training Reasoning by Design` 


- ****VoxCPM** sings on Apple Silicon**: A member ported the **OpenBMB VoxCPM Text-to-Speech model** to **CoreML**.
   - The model can now run on the **Apple Neural Engine**; code available on [GitHub](https://github.com/0seba/VoxCPMANE).
- **Reasoning By Design framework revealed**: A member shared a [PDF document](https://cdn.discordapp.com/attachments/897390720388825149/1436483450071679036/Training_Reasoning_by_Design__An_Explanation_of_the_SRP_CCC_Framework_Its_Implementation_and_the_Training_Data_It_Requires.pdf?ex=690fc4f7&is=690e7377&hm=8339b80dd293f2cd660853a2bde1ab2c9ae44f22be64c42d642cb0aa9e3ccbe8&) detailing the **Training Reasoning by Design framework**.


  

---


### **HuggingFace ▷ #[smol-course](https://discord.com/channels/879548962464493619/1313889336907010110/1436303413775437957)** (1 messages): 

> `HuggingFace Learn Website Bug` 


- **HuggingFace Learn Website Glitch Exposed**: A user reported that the **2nd and 3rd paragraphs are accidentally repeated** on the [HuggingFace Learn website](https://huggingface.co/learn/smol-course/unit2/2#expected-dataset-type).
   - They included a screenshot that highlights the repeated content.
- **Bug Report Confirmation**: The bug report concerns a content duplication issue on a specific page within the Hugging Face Learn platform.
   - The user provided a direct link and a visual aid to clearly illustrate the problem.


  

---


### **HuggingFace ▷ #[agents-course](https://discord.com/channels/879548962464493619/1329142738440028273/1436145303643488429)** (7 messages): 

> `Agents Course Certificate, Confirmation Page Issues, Llama Index DuckDuckGo Rate Limit` 


- **Agents Course Completion Certificate Inquiry**: A member inquired about receiving a certificate of completion upon joining the [Agents Course](https://huggingface.co/learn/agents-course) today.
- **Confirmation Page Woes**: A user reported being stuck on the confirmation page across multiple browsers (**Edge**, **Firefox**, and **Chrome**) on their Android phone.
- **Llama Index DuckDuckGo hit by Rate Limit**: A member encountered a *"rate limit exception"* while using the web search tool in the [Agents Course](https://huggingface.co/learn/agents-course/unit3/agentic-rag/tools?agents-frameworks=llama-index) despite installing the **llama-index-tools-duckduckgo** package.
- **Certificate Still Possible?**: A member confirmed that receiving the completion certificate for the [Agents Course](https://huggingface.co/learn/agents-course) is still possible, but the testing endpoint to get files is currently down.


  

---


### **Unsloth AI (Daniel Han) ▷ #[general](https://discord.com/channels/1179035537009545276/1179035537529643040/1436084446510972968)** (147 messages🔥🔥): 

> `Qwen3-Next-80b-A3B-Instruct finetuning, MoE models in Transformers, Unsloth and FastModel for MoE, Training frameworks for MoE, Unsloth Dynamic Quants for smaller models` 


- **Qwen3-Next-80b-A3B-Instruct Benchmarks Spark Finetuning Interest**: Members discussed the possibility of finetuning **Qwen3-Next-80b-A3B-Instruct**, noting its impressive benchmarks, even outperforming the **235B model** in some cases.
   - It was noted that while possible using **ms swift**, **transformers** is currently *kinda janked* for MoEs.
- **Transformers Lagging on MoE Implementations**: The poor implementations of **MoE models** in **Transformers** were attributed to it being a primarily high-level library, and **PyTorch** lacking good ops for MoEs.
   - One member noted they trained a **30B model** and found it to be *1/4th the speed of training MS3 with the same recipe*.
- **Unsloth's FastModel Key for MoE Fine-Tuning**: When fine-tuning **MoE models** with **Unsloth**, it's essential to use **FastModel** rather than **FastLanguageModel** due to how it initializes sparse MoE layers and gating logic.
   - **FastModel** supports both dense and sparse (MoE) models safely.
- **MoE Training Still Rough, Frameworks Compared**: The general consensus is that **MoE training** is still not fully optimized, with one member asking what the best approach to training on **Qwen 30B** is.
   - **Megatron-LM** was recommended as being 10x faster for MoEs due to its good support for parallelism, but suffers from poor documentation and being optimized for pretraining instead of post-training, while **Torchtune/Titans** were mentioned as faster than transformers but stuck in *a weird sorta abandonware state*.
- **Sequence Length Impacts Training Time**: A member found that a 32k sequence length with their 14B model on an A6000 (48GB VRAM) resulted in a long training time, and that reducing the sequence length to 16k did not change it.
   - It was suggested to start small with sequence length, sample count, and batch size, then increase gradually to find the bottleneck; the GPU can only process at a given rate, so exceeding that rate with batch size or sequence length won't make much difference, as it will be very slow regardless.
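For intuition on why sequence length can dominate when it is the bottleneck: attention work scales quadratically with sequence length, while MLP work only scales linearly. A rough sketch with assumed 14B-ish shapes (the constants cancel in the ratio):

```python
def attention_flops(seq_len: int, hidden: int = 5120, layers: int = 48) -> int:
    # Two (seq x seq) matmuls per layer, each ~2 * seq^2 * hidden FLOPs.
    return layers * 2 * 2 * seq_len**2 * hidden

# Halving the sequence length cuts attention work ~4x, while MLP work
# (linear in token count) only halves -- so 32k contexts hurt disproportionately.
print(attention_flops(32_000) / attention_flops(16_000))  # 4.0
```

If halving the sequence length changes nothing, as reported, the bottleneck is likely elsewhere (data loading, offloading, or batch shape), which is what the incremental-scaling advice is meant to expose.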


  

---


### **Unsloth AI (Daniel Han) ▷ #[introduce-yourself](https://discord.com/channels/1179035537009545276/1179039724355211325/1436263087513796618)** (1 messages): 

> `Introduction of Ash, LLMs and RL` 


- **Ash joins Unsloth Community!**: Ash introduced themself as working with **LLMs** and **RL** at their university lab and expressed their appreciation for **Unsloth**.
- **Ash likes tweaking small models**: Ash mentioned they enjoy tweaking small models.


  

---


### **Unsloth AI (Daniel Han) ▷ #[off-topic](https://discord.com/channels/1179035537009545276/1179039861576056922/1436095270969544858)** (32 messages🔥): 

> `torch.compile caching, OpenRouter XCode integration issues, AI blocking on websites, AI resume problems, AdamW loss analysis` 


- **Doubts about torch.compile Caching**: A member expressed concern that **torch.compile** might only be caching results based on the current input prompt, rather than adapting to different inputs with the same shape.
   - The member questioned whether different input prompts should lead to different activations.
- **OpenRouter XCode Integration Troubled**: A member reported issues integrating **OpenRouter** with **XCode's** "Coding Intelligence", encountering a "No cookie auth credentials found" error.
   - Despite following the [OpenRouter guide](https://openrouter.ai/docs/sdks/xcode) and successfully pulling the model list, they faced authentication problems.
- **Blocking AI Interaction on Websites**: A member suggested developing a **JS script/tool** to completely block AI interaction with websites, allowing only manual human browsing.
   - They emphasized the need to prevent AI from scraping or interacting with website content automatically.
- **AI's impact on Junior Positions Debated**: A member discussed the potential displacement of junior employees due to **AI-generated resumes and reports**, leading to a lack of on-the-job training and experience for future senior roles.
   - They argued that this reliance on AI could lead to an atrophy of skills and knowledge within the workforce.
- **AdamW's Limited Loss Awareness**: A member questioned whether the **AdamW optimizer** only focuses on reducing the overall loss number without considering the specific type of losses.
   - They suggested that **AdamW** simply tries to minimize the loss without understanding its composition or implications.
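That intuition is right: the update rule only ever sees the scalar gradient of the total loss. A minimal single-parameter AdamW step (the textbook decoupled-weight-decay form, not any particular library's code):

```python
import math

def adamw_step(p, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update for a single parameter p given its gradient."""
    m = b1 * m + (1 - b1) * grad            # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad * grad     # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)               # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)  # decoupled weight decay
    return p, m, v

p, m, v = adamw_step(1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(p)  # the parameter moves against the gradient, whatever produced the loss
```

Nothing in the update distinguishes which loss component produced the gradient; any such structure has to be injected upstream, e.g. via per-component loss weights.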


  

---


### **Unsloth AI (Daniel Han) ▷ #[help](https://discord.com/channels/1179035537009545276/1179777624986357780/1436107583994724443)** (20 messages🔥): 

> `Torch 2.9 with Unsloth Docker, Backprop Issues in Attention, Per-Token Loss Weighting, Deepseek OCR with Unsloth, Hosting Unsloth GGUF with vLLM` 


- ****Torch 2.9** Compatibility Quest with Unsloth Docker**: A member inquired about using **Torch >= 2.9** with the official Unsloth docker image to resolve backprop issues related to *torch.matmul* and *'out='* argument restrictions.
   - The current base image uses Conda Python, and the available PyTorch wheels (cu124) do not include **torch==2.9.0**, causing Dockerfile errors.
- ****Backprop Blockage** Busted by Newer Torch**: A user faced backpropagation issues with **Torch 2.8** due to Unsloth's GPT-OSS fallback using attention implementations that call *torch.matmul* with the *'out='* argument, which PyTorch's autograd forbids with LoRA-enabled training.
   - Upgrading to **Torch >= 2.9** reportedly switches to a compiled eager path that avoids the *'out=' matmul*, resolving the autograd restriction.
- **Loss Function Modification Attempted**: A member tried to modify the cross-entropy function in Unsloth to add per-token loss weighting, seeking guidance on the relevant function/file.
   - They shared a code snippet attempting to implement a *MMSWeightedLossTrainer* class, but found it to be quite memory intensive; ultimately they *figured out an approach*.
- ****Deepseek OCR** Gets Unslothed**: A user encountered errors while trying to run **deepseek-ocr** through Unsloth, following [this guide](https://docs.unsloth.ai/new/deepseek-ocr-how-to-run-and-fine-tune#running-deepseek-ocr).
   - After installing missing dependencies identified when attempting to load the model using normal transformers, the issue was resolved and **deepseek-ocr** worked with Unsloth.
- **Guidance required to Host **Unsloth GGUF** with vLLM**: A member is seeking guidance to host *unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF* with type *Q4_K_XL* using vLLM after getting errors.
   - It was suggested that **GGUF** support in vLLM is still experimental, and the user was directed to the [vLLM documentation](https://docs.vllm.ai/en/stable/features/quantization/gguf.html) for more info.


  

---


### **Unsloth AI (Daniel Han) ▷ #[research](https://discord.com/channels/1179035537009545276/1257011997250424842/1436256448144216116)** (2 messages): 

> `Magistral model, GRPO RL, KL divergence losses` 


- **Magistral Model springs from GRPO RL!**: A member highlighted a [paper](https://arxiv.org/abs/2506.10910) about **Mistral** training its **Magistral** model entirely from scratch using only **GRPO RL**, without **MLM** etc.
   - The team modified the **GRPO loss** to get rid of **KL divergence losses**.
- **Aussie AI interest sparks!**: A member found the above work interesting and timely for Australia in the next few weeks.
   - No further discussion ensued.
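For reference, the group-relative advantage at the core of GRPO normalizes rewards within a group of sampled completions; a minimal sketch of that normalization (Mistral's modification then drops the KL penalty from the loss entirely):

```python
import statistics

def group_advantages(rewards: list[float]) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # degenerate group: all rewards equal
    return [(r - mu) / sigma for r in rewards]

# Two correct (reward 1) and two incorrect (reward 0) completions:
print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

The group statistics replace a learned value model as the baseline, which is what makes the recipe cheap enough to run from scratch.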


  

---


### **Nous Research AI ▷ #[general](https://discord.com/channels/1053877538025386074/1149866623109439599/1436098366575743206)** (99 messages🔥🔥): 

> `GPT-5 charts vs others, Chinese models cheaper, Kimi's Performance, Deepseek pricing` 


- **Users find GPT-5 charts lacking**: Users shared [an example of questionable charts](https://cdn.discordapp.com/attachments/1149866623109439599/1436098366181347369/GPT-5-chart-crime-chart-1295091390.png?ex=690fafd3&is=690e5e53&hm=56053572db3e0d017892444bb7e8a92f812efeca4b72d06afa245f54e2ca4696) and argued they still lag behind **OpenAI's**.
   - One user said there is *still room to improve their charts, they are leagues behind openai*.
- **China OS models set to dethrone the throne**: Members speculate that **China OS** models are projected to reach **100% high intelligence** with **95% lower cost** by **2026**.
   - This could mean *all the massive high compute buildup and energy suckage was all a Ponzi*.
- **Kimi thinking's capabilities**: Users compare the **Kimi** model to **ChatGPT** in terms of reasoning and tool use.
   - One said *Kimi has reasoning for tools*, and another thought *the result was practical similar quality to chatgpt*.
- **Deepseek is very cheap**: **Deepseek** is much cheaper than **OpenAI**, at least for Chinese labs.
   - The price is around **42 cents per 1 million tokens**.


  

---


### **Nous Research AI ▷ #[research-papers](https://discord.com/channels/1053877538025386074/1104063238934626386/1436443908421849108)** (1 messages): 

> `Knowledge Domains, Gradient Averaging, Teaching Variety of Specialized Knowledge` 


- **Knowledge Domains batch mixing affects learning**: A member was thinking about how mixing **knowledge domains** within a batch affects learning.
   - They asked whether averaging the gradient across diverse data samples potentially '*negate each other*' or if it's accommodated by **sparsity / enough parameters**.
- **Teaching Variety of Specialized Knowledge interference**: A member discussed how to teach a variety of **specialized knowledge** and not have them interfere or dilute each other.
   - This is an abstraction of the question of how **knowledge domains** within a batch affect learning.
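The worry can be made concrete with a toy example (purely illustrative): two domains whose gradients oppose on a shared parameter cancel under averaging, while parameters touched by only one domain are unaffected, which is the "sparsity / enough parameters" accommodation.

```python
grad_domain_a = [ 1.0, 0.0]   # domain A only pushes parameter 0
grad_domain_b = [-1.0, 0.5]   # domain B pushes parameter 0 the other way, owns parameter 1
avg_grad = [(a + b) / 2 for a, b in zip(grad_domain_a, grad_domain_b)]
print(avg_grad)  # [0.0, 0.25]: the shared parameter's update cancels; the "sparse" one survives
```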


  

---


### **OpenAI ▷ #[ai-discussions](https://discord.com/channels/974519864045756446/998381918976479273/1436084536277602464)** (75 messages🔥🔥): 

> `Siri and ChatGPT, SOTA Models, O3 image zoom, GPT-5 identifying locations, kilocode model` 


- **Siri and ChatGPT Connect!**: Members discussed how you can only connect **Siri with ChatGPT** and asked if they had changed course on this integration.
   - Some members noted that they were *so happy about this*.
- **Models Need Visual Reasoning**: Members discussed how **current SOTA models can't solve reasoning problems visually via text tokens**.
   - They noted that if it was a square maze, it might be able to solve it by breaking it down into cells and reasoning over that.
- **GPT-5 Geoguessr Accuracy!**: Members noted that **GPT-5** could accurately identify a location within a kilometer when playing **GeoGuessr**.
   - One member said they send sudoku puzzles from book pages and the models *zoom and crop it for over 10 minutes on average*.
- **Kilocode Model is the real deal for agentic coding**: Some members pointed to [drinkoblog.weebly.com](https://drinkoblog.weebly.com) which claims the **open weight model k2 thinking** *seems to be the real deal for agentic coding*.
   - The blog also states that it is *fixing code that gpt-5-codex high was struggling with*.
- **The Beckingham Constant Revealed**: A member posted about **The Beckingham Constant**, which is the equilibrium between growth and decay in self-organizing systems.
   - This relates to the solvency floor of coherence, where feedback can't keep up and the system loses integrity [See attached images](https://discord.com/channels/974519864045756446/977259063052234752/1436193986384498783).


  

---


### **OpenAI ▷ #[gpt-4-discussions](https://discord.com/channels/974519864045756446/1001151820170801244/1436104226215297164)** (8 messages🔥): 

> `GPT-5.1, GPT variants, Making money from custom GPTs` 


- **GPT-5.1 Thinking Appears on ChatGPT**: The appearance of **GPT-5.1 Thinking** on the ChatGPT website suggests an imminent update from OpenAI, indicating that the rumored release of **GPT-5.1** is drawing closer to reality.
   - Rumors point to a broader **GPT-5.1** lineup: **Mini**, **Thinking**, and a possible **Codex-focused** update, each designed to meet different user needs and computational constraints.
- **GPT-5.1 Boasts Enhanced Reasoning**: **GPT-5.1** is positioned as a direct challenger to Google's upcoming **Gemini 3 Pro**, with an imminent launch to stay ahead in the AI race.
   - The model is referenced in a backend component as responsible for driving advanced reasoning processes within ChatGPT, implying it may be optimized for multi-step reasoning or agent-like tasks.
- **GPT-5.1 Models in Internal Testing**: The rumored **GPT-5.1** models are in internal testing and A/B trials, but no exact date has been announced for release.
   - Variants are said to include **Mini** (efficiency boosts for free users), **Thinking** (complex reasoning with variable thought budgets), and **Codex-focused** (coding assistance improvements).


  

---


### **OpenAI ▷ #[prompt-engineering](https://discord.com/channels/974519864045756446/1046317269069864970/1436090227902120128)** (8 messages🔥): 

> `Behavioral Orchestration of SLMs, Animation effect prompts, AI Project Collaboration` 


- **Behavioral Orchestration Modulates SLM Tone**: Members discussed **behavioral orchestration**, described as a framework to modulate **SLMs**' tone at runtime, above parameter training.
   - Instead of assigning a character or role, a member shapes the AI's behavior using parameters like *"No unsolicited advice"*.
- **Animation Effect Assistance Requested**: A member asked for help identifying and generating prompts for a specific **animation effect**.
   - A [video example](https://cdn.discordapp.com/attachments/1046317269069864970/1436206296566071377/WhatsApp_Video_2025-11-06_at_17.41.12_a9ca9b8a.mp4?ex=690f6b98&is=690e1a18&hm=15872ddb9b9ea9898cda8c5bf2e7ef3776d157c73c02aeca1da4593d6e0f40f1&) was provided but no solution was given.
- **ChatGPT Pro User Seeks AI Project Collaboration**: A **ChatGPT Pro** user sought guidance and collaboration on a large-scale AI project.
   - Another member responded expressing interest in collaborating to do something big.


  

---


### **OpenAI ▷ #[api-discussions](https://discord.com/channels/974519864045756446/1046317269069864970/1436090227902120128)** (8 messages🔥): 

> `Behavioural Orchestration, AI Personalization, Animation Effects` 


- **Behavioral Orchestration buzzes LinkedIn**: Members on LinkedIn discussed **behavioral orchestration**, described as a framework to modulate SLMs at runtime, rather than working on parameters or training.
   - It would act *above* them, to modulate SLMs' tone.
- **AI models get Behavioral Instructions**: Instead of assigning an AI a specific role, users are giving it a set of **parameters** to shape its behavior, not dictating its personality.
   - Examples include *"Do not make personal assumptions about me"* and *"No unsolicited advice."*
- **Animation Effect needs a name**: A user asked for help identifying an **animation effect** from a [WhatsApp video](https://cdn.discordapp.com/attachments/1046317269069864970/1436206296566071377/WhatsApp_Video_2025-11-06_at_17.41.12_a9ca9b8a.mp4?ex=690f6b98&is=690e1a18&hm=15872ddb9b9ea9898cda8c5bf2e7ef3776d157c73c02aeca1da4593d6e0f40f1&).
   - They requested help with prompts for this effect.


  

---


### **Modular (Mojo 🔥) ▷ #[general](https://discord.com/channels/1087530497313357884/1098713601386233997/1436082875643727963)** (58 messages🔥🔥): 

> `GPU puzzle series questions, Mojo compiler implementation language, Mojo error handling vs Python, Explanation of Modular, MAX, and Mojo, Installing a game` 


- **GPU Puzzle Channel Quest**: A member inquired about a dedicated Discord channel for questions about the GPU puzzle series, covering both environment setup and Mojo/GPU code.
   - Another member suggested starting with the [learn-mojo channel](https://discord.com/channels/1087530497313357884/1436158039232086186), while Modular folks recommended the forum for puzzle-specific questions to ensure future searchability.
- **Mojo Compiler Still Rocking C++**: A member asked whether the Mojo compiler is written in Mojo itself or still in C++.
   - Another member confirmed it's still in **C++** and **MLIR**, noting that Mojo needs more stability and feature completeness before the compiler can be self-hosted and that porting **LLVM** is unlikely.
- **Mojo's Try-Except Triumphs**: The team confirmed that Mojo's error handling uses a try-except approach that performs better than Rust due to the ability to do *placement new* on the happy path.
   - Syntax for making something into a `Result` is a low priority.
- **Modular Unmasked: Mojo's Role**: One member clarified that **Modular** is the company, **MAX** is the replacement for cuBLAS/Cutlass/TensorRT/Pytorch/JAX, and **Mojo** is a programming language.
   - Another member poetically stated Mojo looks like Python, but acts like a combination of C++ and Rust wearing a *snakeskin jacket*.
- **Game Installation Headaches**: A member asked for help installing a game, but one member stated they probably can't help with that.
   - They suggested the member complain to whoever sold it to them.


  

---


### **Modular (Mojo 🔥) ▷ #[announcements](https://discord.com/channels/1087530497313357884/1098765954302873621/1436160657031430184)** (1 messages): 

> `New Beginners Channel, Mojo Language Support` 


- **Modular Launches New Mojo Beginners Channel**: Modular has created a new dedicated channel, <#1436158039232086186>, for beginners to **ask questions, get help from the Modular team**, and connect with others learning Mojo.
   - This initiative aims to cultivate a supportive community for **new learners of Mojo**, providing a collaborative space for assistance.
- **Discussing Mojo Language Support**: Members are actively discussing and exploring the features, capabilities, and potential applications of the **Mojo** programming language within the new channel.
   - The discussions include practical coding examples, problem-solving strategies, and sharing of resources to enhance understanding and proficiency in **Mojo**.


  

---


### **Modular (Mojo 🔥) ▷ #[mojo](https://discord.com/channels/1087530497313357884/1151418092052815884/1436166313230995516)** (20 messages🔥): 

> `CS Education Importance, Nand2Tetris Recommendation, Mojo Multithreading, C Library Bindings for Mojo` 


- **Computer Science Foundations are Forever**: Members noted that a strong **CS foundation** provides a solid theoretical understanding of computation and hardware and the basics haven't changed that much.
   - A solid CS foundation will carry you far, dipping your toes into computer engineering (CE) helps a lot, and learning your **CS history** makes a lot of things clearer.
- ****Nand2Tetris** highly recommended**: For software people looking to get closer to hardware, a member highly recommends [Nand2Tetris](https://www.nand2tetris.org/) as a reasonably comprehensive guide to the basics in a fun package.
   - They gave an example of **C's null terminated strings** tracing back to **PDP-11** instructions.
- **Mojo doesn't support CPU Multithreading natively**: Mojo does not yet support CPU multithreading, meaning there are no primitives like **locks**, though one can use `parallelize` or other similar functions if you want to run code in parallel.
   - However, the runtime takes care of managing the threads, and since most of Modular's Mojo code targets the GPU, CPU-specific features aren't as much of a priority at the moment, though there is limited CPU atomic support.
- **Member looking to create C Library bindings for Mojo**: A member expressed interest in writing bindings or rewrites for major C libraries for Mojo, such as **OpenSSL**, **sqlite**, **libnuma**, **libcurl**, **dbus**, **zlib**, **zstd**, **ffmpeg**, **gmp**, **zeromq**, and **lz4**.
   - However, these probably won't be supported in the `stdlib` - if these are ported over they'll likely live as external packages people can pull from the **pixi community channel**.
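The `parallelize` pattern mentioned above (an indexed parallel-for where each work item writes its own slot, so no locks are needed) can be sketched in Python for illustration. This is a hypothetical analog, not Mojo's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def parallelize(func, num_work_items, num_workers=4):
    # Run func(i) for every i in [0, num_work_items), distributing
    # indices across a worker pool.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        list(pool.map(func, range(num_work_items)))

# Each index writes to a distinct slot, so no synchronization primitive
# (lock, mutex) is required.
out = [0] * 8
parallelize(lambda i: out.__setitem__(i, i * i), 8)
print(out)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The key property is that work items are independent; anything requiring shared mutable state would need the lock primitives Mojo does not yet expose.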


  

---


### **Modular (Mojo 🔥) ▷ #[max](https://discord.com/channels/1087530497313357884/1212827597323509870/1436380720586035271)** (7 messages): 

> `CUDA checkpointing with MAX, TokenGeneratorPipeline, Cold start times of a container` 


- **CUDA Checkpointing: Temperamental or Time-Consuming?**: Members discussed using **CUDA checkpointing** with **MAX**, finding it *temperamental* and potentially slow due to snapshotting all GPU state.
   - One member tried it with the **TokenGeneratorPipeline**, but it *hung*, and cold start times remained an issue, suggesting its impracticality for some use cases.
- **TokenGeneratorPipeline Freezes**: One member reported issues when trying to use CUDA checkpointing with the **TokenGeneratorPipeline**, resulting in the process hanging.
   - They speculated whether this behavior was related to the metrics monitor or simply due to the slowness inherent in snapshotting the entire GPU state.


  

---


### **Eleuther ▷ #[general](https://discord.com/channels/729741769192767510/729741769738158194/1436095043390668870)** (25 messages🔥): 

> `Introduction Channel, Finding Relevant Research, AI Developer Study Notes, LLM Stroke with Images, Qwen3-VL System Prompt` 


- ****Discord Debates** Intro Channel**: Members discussed the merits of a separate introductions channel, citing concerns about unfocused self-promotion versus allowing newcomers to naturally enter the flow in the general channel.
   - One member argued that separate intros would make interaction feel staged and be less welcoming, with another noting they want to *keep discussions focused on research*.
- **User Seeks **AI Developer** Study Notes**: A member inquired about study notes covering the fundamentals for an **AI developer role**, aiming to supplement their on-the-job lookup approach.
   - They also asked how best to find research relevant to a project without breaking the rules with a lengthy initial post.
- ****LLM Suffers Stroke** from Image Overload**: A member shared an image of an **LLM** apparently experiencing a *stroke* due to processing too many images, the last line didn't come from me but the subconscious of the AI ( [Screenshot](https://cdn.discordapp.com/attachments/729741769738158194/1436405418413916372/Screenshot_from_2025-11-06_19-01-28.png?ex=690f7c4a&is=690e2aca&hm=5643d7caecb7e259abde1b9d8af468177c3e1855c3d640267956a9ad9588a263&)).
   - The member reported that **Qwen3-VL** falsely claimed to not be a visual model unless prompted otherwise, *requiring a system prompt that informs it differently than the default*.
- **NeurIPS Roommate Hunt Begins**: A member announced they are attending **NeurIPS** in San Diego from December 3-7 and is *looking for a female roommate to share hotel costs*.
   - No further details were provided regarding specific accommodations or roommate preferences.


  

---


### **Eleuther ▷ #[research](https://discord.com/channels/729741769192767510/747850033994662000/1436283001943101531)** (22 messages🔥): 

> `Advancements in RL since OpenAI Five, Data Efficiency in RL, Scaling Laws in RL, Cost of Modern RL Attempts, GPU-based Environments for RL` 


- **Efficiency improvements driven by better Deep Learning**: While there haven't been direct algorithmic upgrades in RL, the community has improved deep learning practices, leading to sample efficiency gains as many were *doing the DL in extremely cursed / wrong ways*.
   - Techniques like **meta-reinforcement learning**, **model-based RL** (Dreamer/TD-MPC2), and **distributional RL** are under development.
- **Model Scaling Helps Learn Better Value Functions**: Scaling model size (e.g., from 140M to 14B parameters) can improve sample efficiency by aiding in value function training, with the value function helping learn a better policy.
   - Larger world models are expected to benefit model-based RL, but there aren't formal scaling laws yet.
- **Dota 2 Environment is a Major Bottleneck**: The high cost of **OpenAI Five** was due to the number of rollouts needed per PPO iteration, which could potentially be reduced with better deep learning and off-policy methods.
   - The fact that the game runs on CPU remains the major bottleneck for RL environments nowadays.
- **Modern RL Attempts Cost Less Now**: A modern attempt at replicating **OpenAI Five** could cost one to two orders of magnitude less, though it depends on deviations from the original and the use of techniques like reward shaping, priors, and world models.
   - Many are excited about the use of **GPU-based environments for RL**.


  

---


### **Yannick Kilcher ▷ #[general](https://discord.com/channels/714501525455634453/986699377257119794/1436085927394152603)** (34 messages🔥): 

> `GoodfireAI memorization research, Autonomous agent PR challenges, Qwen3-VL image handling, AI Engineer Promotion` 


- **Memorization via Loss Curvature Research is Rad**: A member shared [GoodfireAI's research](https://www.goodfire.ai/research/understanding-memorization-via-loss-curvature) on understanding memorization via loss curvature, but another member didn't feel like they had a better understanding of how memories are stored in the weights after reading.
   - Another member agreed, noting that the tweet makes it sound like the researchers understand how memories are stored, whereas in their reading the work only shows how to discourage memorization (via a dropout-like method targeting the weights most likely to store memories).
- **Agent PRs Facing Professionalism Friction**: A member discussed the challenges of letting agents run autonomously due to structural review comments and cognitive overhead when breaking things up into conceptual features, and wondered why there is a strict stance of no PRs from agents.
   - Another member chimed in, revealing it is *political*: the upstream maintainer of the project (spacebar chat) takes issue, on professionalism grounds, with productivity accelerators including **AI coding tools**.
- **Qwen3-VL Identity Crisis**: **Qwen3-VL** thinks it's a regular **Qwen model** and crashes when forced to accept it can see images, violating its internal sense of self and requiring a system prompt not to immediately crash.
   - Even with a system prompt, it still crashes if having to deal with 3 images with some questions in between, which may be related to a bug in **Ollama**.
- **AI Engineer's Pitch**: A member advertised their services as an experienced **AI Engineer** looking for new projects or full-time opportunities, specializing in building autonomous agents powered by **GPT-4o**, **LangChain**, **AutoGen**, **CrewAI**, and other cutting-edge tools.
   - The engineer claims they can build autonomous research & data-gathering bots, multi-agent systems, **AI assistants** with memory, planning, and tool use, trading bots, customer support agents, IVR agents, and more, inviting DMs from anyone *hiring or with something cool in mind*.


  

---


### **Yannick Kilcher ▷ #[paper-discussion](https://discord.com/channels/714501525455634453/1045297868136779846/1436095853482741871)** (5 messages): 

> `Nested Learning, Kimi-K2, Continual Learning` 


- **Moonshot Kimi-K2 for Thoughtful Thinking**: A member linked to Moonshot AI's [Kimi-K2](https://moonshotai.github.io/Kimi-K2/thinking.html), highlighting its capabilities in thoughtful thinking.
- **Google Introduces Nested Learning Paradigm**: A member shared a link to Google's blog post on [Nested Learning](https://research.google.com/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/), a new ML paradigm for **continual learning**.
   - Another member expressed interest in the [related paper](https://abehrouz.github.io/files/NL.pdf) on Nested Learning and its potential applications.


  

---


### **Yannick Kilcher ▷ #[ml-news](https://discord.com/channels/714501525455634453/853983317044756510/)** (1 messages): 

__._astro_.__: https://pytorch.org/blog/helion/
  

---


### **tinygrad (George Hotz) ▷ #[general](https://discord.com/channels/1068976834382925865/1068976834928193609/1436396649357512746)** (1 messages): 

> `Real-time speech transcription, Parakeet v2, Multi-GPU scaling, Joe Rogan podcast transcription` 


- **Parakeet v2 Achieves 200x Real-Time Transcription**: A member reported achieving **200x real-time speech-to-text transcription** using [Parakeet v2](https://huggingface.co/spaces/nvidia/parakeet-tdt-0.6b-v2) on a single **4090 GPU** in low power mode.
   - They are experimenting with **multi-GPU setup**, expecting it to scale linearly, potentially reaching **1,200x real-time transcription**.
- **Ultra-Fast Podcast Transcriptions**: At the projected **1,200x** multi-GPU rate, a **3.5-hour Joe Rogan podcast** could be transcribed in approximately **10.5 seconds** (the current 200x single-GPU rate works out to about 63 seconds).
   - The member expressed excitement about the advancements, stating, *"We live in the future."*
- **TinyBox v1 Green Holds Up Well**: The member shared that their **TinyBox v1 Green** (6x4090) has performed remarkably well despite GPU technology advancements.
   - They are running this setup out of their living room.
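The quoted speeds can be sanity-checked with back-of-the-envelope arithmetic, assuming the linear multi-GPU scaling the member projects:

```python
# Sanity-check the claimed transcription speeds.
podcast_seconds = 3.5 * 3600           # a 3.5-hour podcast is 12,600 s of audio
single_gpu_rate = 200                  # 200x real time on one 4090
multi_gpu_rate = 6 * single_gpu_rate   # assumed linear scaling across 6 GPUs -> 1,200x

print(podcast_seconds / single_gpu_rate)  # 63.0 seconds on a single GPU
print(podcast_seconds / multi_gpu_rate)   # 10.5 seconds at the projected 1,200x
```

So the ~10.5-second figure corresponds to the projected 6-GPU rate, not the single-GPU result already achieved.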


  

---


### **tinygrad (George Hotz) ▷ #[learn-tinygrad](https://discord.com/channels/1068976834382925865/1070745817025106080/1436163195143327795)** (18 messages🔥): 

> `UOps errors, pytorch tensors to tinygrad, pool refactor, UOps.after restrictions` 


- **UOps Errors prove Unhelpful**: A member found the errors for **UOps** to be very unhelpful and struggled with ending a range and running stuff outside of the loop.
   - They also questioned if `valid` is the best way to generate `if` statements and showed a cursed kernel generated via **UOps** in a [screenshot](https://cdn.discordapp.com/attachments/1070745817025106080/1436163998641946646/image.png?ex=690f4433&is=690df2b3&hm=d2fb3a50e2cf3bb59babb01aaf77482fd6aa489404cb52cb27e957a48b5962e7).
- **Converting Pytorch Tensors to tinygrad: the most efficient way**: A member asked about the proper way to turn **PyTorch tensors** to **Tinygrad tensors** efficiently.
   - They mentioned using `Tensor.from_blob(pytorch_tensor.data_ptr())` but were unsure about the conversion back, currently using `from_numpy`.
- **Pool Refactor: Pad vs Repeat**: A member inquired about the goal of the `_pool` refactor, questioning whether the intention is to remove `.pad()` completely or merge the two implementations.
   - They noted that using `.repeat()` to handle both cases results in extra **bandwidth pass kernels** being generated and included a [screenshot](https://cdn.discordapp.com/attachments/1070745817025106080/1436438026577248329/image.png?ex=690f9aa9&is=690e4929&hm=2b387a79a86a202e1b0992f852c68842041f61f2f30f42e13abac46cfcb85f85) of the current implementation and the refactor.
- **UOps.after Usage: Only on Buffers**: A member asked about the restrictions around when `UOps.after` can be used, trying to make a conditional for `.valid` after the end of a loop.
   - George Hotz responded that *after should only be on buffer, why do you need it on a compare? that compare has the same value whenever you do it*.


  

---


### **MCP Contributors (Official) ▷ #[general](https://discord.com/channels/1358869848138059966/1358869848138059969/1436084770114240512)** (12 messages🔥): 

> `Code Execution MCP Blogpost, 2025-11-25 Spec Release, SEP-1330 SDK Changes` 


- **MCP Blogpost Misdirects to Discord**: The [Code Execution with MCP blogpost](https://www.anthropic.com/engineering/code-execution-with-mcp) on Reddit is misdirecting people to the Discord channel, which is intended for contributors to the project.
   - A member suggested that the blog post be updated to point to the new [GitHub discussion](https://github.com/modelcontextprotocol/modelcontextprotocol/discussions/1780) instead, and another responded: *"That works for me. It's easier for me than Discord."*
- **Finalizing SEPs for 2025-11-25 Spec Release**: In preparation for the **November 25, 2025** spec release, the team has lined up several [SEPs for finalization](https://github.com/orgs/modelcontextprotocol/projects/26/views/8), with a spec freeze expected on **November 14, 2025**.
- **SDK Changes Completed for SEP-1330**: The "Awaiting SDK Change" label has been removed from **SEP-1330** as the changes have been completed for some time, pending review and merge of the **TS/Python SDK** and spec/schema changes.


  

---


### **DSPy ▷ #[show-and-tell](https://discord.com/channels/1161519468141355160/1202371242519441499/1436153732885778472)** (2 messages): 

> `Tau Bench, FastWorkflow, GEPA, Multi-Agent Tool Use` 


- ****FastWorkflow** Achieves SOTA in **Tau Bench****: The poster announced that **fastWorkflow** has achieved **SOTA** on both retail and airline workflows in **Tau Bench**, with a paper forthcoming and code available at the [fastworkflow repo](https://github.com/radiantlogicinc/fastworkflow) and the [tau bench fork](https://github.com/drawal1/tau-bench).
   - They emphasized that *with proper context engineering, small models can match/beat the big ones*.
- ****GEPA** to Optimize End-to-End Workflows**: A member mentioned that end-to-end workflow optimization using **GEPA** is in progress.
   - An image was attached showing a table of the relative performance of **fastWorkflow** versus other strategies.
- ****DSPy**-Based Planner Tackles Multi-Agent Tool Use**: A member published a post using a **DSPy** based planner and orchestrator to solve for multi agent tool use, soliciting feedback on [X](https://x.com/viksit/status/1986919606175547425) and their [Substack](https://viksit.substack.com/p/solving-agent-tool-sprawl-with-dspy).


  

---


### **DSPy ▷ #[general](https://discord.com/channels/1161519468141355160/1161519469319946286/1436160476089155718)** (9 messages🔥): 

> `Rate Limiting, Exponential Backoff, LLM context history, Workflow Automation, LLM Integration` 


- ****Rate Limits Frustrate Batch Requests****: A user is encountering **rate limits** when running `dspy.Module.batch` requests and is seeking advice on how to add a time delay between requests or properly respect the **rate limits**.
- ****Exponential Backoff Saves the Day****: A member suggested using **exponential backoff** along with keeping the cache enabled to handle rate limits effectively.
   - Another member shared a custom **exponential backoff decorator** with initial delay, jitter, and max attempts as an example.
- ****Gemini Token Limits Confuse Module Context****: A user asked whether sub-modules within a custom module that share the same **Gemini** model run with their own context history or contribute to the same token limit.
   - This question was raised in the context of having ReAct and CoT modules within a custom module that utilizes **Gemini/Gemini-2.5-flash**.
- ****AI Engineer Showcases Workflow Automation Skills****: An experienced engineer introduced themselves as specializing in **workflow automation, LLM integration, RAG, AI detection, and image and voice AI**, with a background in real-world implementations and blockchain development, sharing their [portfolio](https://devx-green.vercel.app/).
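A minimal sketch of the exponential-backoff-with-jitter decorator discussed above (the names and parameters here are illustrative, not the member's actual snippet). In practice you would catch your provider's specific rate-limit exception rather than bare `Exception`:

```python
import random
import time
from functools import wraps

def with_backoff(max_attempts=5, initial_delay=1.0, factor=2.0, jitter=0.5):
    """Retry a callable on failure, sleeping exponentially longer each attempt."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            delay = initial_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise
                    # Add random jitter so parallel workers don't all
                    # retry in lockstep and re-trigger the rate limit.
                    time.sleep(delay + random.uniform(0, jitter))
                    delay *= factor
        return wrapper
    return decorator

calls = {"n": 0}

@with_backoff(max_attempts=3, initial_delay=0.01, jitter=0.01)
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")  # simulate a 429 response
    return "ok"

print(flaky())  # succeeds on the third attempt
```

Keeping the DSPy cache enabled alongside this means retried prompts that already succeeded are never re-billed.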


  

---


### **aider (Paul Gauthier) ▷ #[general](https://discord.com/channels/1131200896827654144/1131200896827654149/1436121747333185616)** (3 messages): 

> `Claude Sonnet, Anthropic API Key, Model reasoning, Sora 2 invite code` 


- **Aider Supports Claude Sonnet**: A member confirmed that Aider already supports **Claude Sonnet**, specifying `/model claude-sonnet-4-5-20250929` as the command.
   - They also reminded users to [set up their Anthropic API key](https://www.anthropic.com/api) to use the model.
- **Reasoning for Haiku and Opus models requested**: A member inquired about enabling **thinking/reasoning** on models like **Haiku-4-5** and **Opus-4-1**, particularly within the CLI.
   - They are open to editing the model settings YML file and sought advice from the community.
- **Sora 2 invite code sought**: A member asked if anyone in the community had a **Sora 2 invite code** to share.


  

---


### **aider (Paul Gauthier) ▷ #[questions-and-tips](https://discord.com/channels/1131200896827654144/1133060505792159755/1436167800761745408)** (3 messages): 

> `Prompt Caching, Claude Cost` 


- **Prompt Caching Cuts Claude Costs**: A member inquired about enabling prompt caching with **Claude** to reduce costs, reporting expenses of **$0.24 per prompt** with **75k tokens sent**.
   - Another member pointed to the [aider documentation](https://aider.chat/docs/usage/caching.html) which mentions the `--cache-prompts` option.
- **Enabling Prompt Caching for Claude**: A user was looking to enable prompt caching for **Claude** to reduce high costs.
   - A fellow user shared a direct link to the [official Aider documentation on prompt caching](https://aider.chat/docs/usage/caching.html), specifically highlighting the `--cache-prompts` flag.
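The reported numbers can be reproduced with simple cost arithmetic; the prices below are assumptions in the ballpark of Claude Sonnet-class rates (input ~$3/M tokens, cache writes at 1.25x, cache reads at 0.1x), not figures quoted in the thread:

```python
# Illustrative prompt-caching cost math; all prices are assumed, not quoted.
PRICE_INPUT = 3.00 / 1_000_000        # $/token, uncached input
PRICE_CACHE_WRITE = 3.75 / 1_000_000  # $/token, first write to cache (1.25x)
PRICE_CACHE_READ = 0.30 / 1_000_000   # $/token, subsequent cache hit (0.1x)

tokens = 75_000
uncached = tokens * PRICE_INPUT      # ~$0.225 per prompt, matching the ~$0.24 report
cached = tokens * PRICE_CACHE_READ   # ~$0.0225 on follow-up prompts with a cache hit
print(round(uncached, 4), round(cached, 4))
```

Under these assumed rates, cache hits cut the repeated-context cost by roughly 10x, which is why `--cache-prompts` helps when the same large context is resent every turn.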


  

---


### **Manus.im Discord ▷ #[general](https://discord.com/channels/1348819876348825620/1349440650495398020/1436092430167572610)** (6 messages): 

> `AI Agent capabilities, Discord moderation issues, Chinese AI startups, Job seeking` 


- **Advanced AI Engineer Introduces Expertise**: An experienced engineer specializing in **workflow automation**, **LLM integration**, **RAG**, **AI detection**, **image and voice AI**, and **blockchain development** offered support.
   - He cited examples such as **support automation systems** and **advanced RAG pipelines** delivering accurate, context-aware responses, and provided a [link to his website](https://devx-green.vercel.app/).
- **SOTA AI Agent lacks Discord moderation**: A member noted the irony of **near state-of-the-art AI agents** existing while **real Discord moderation** is lacking.
   - The member expressed fondness for **Chinese AI startups**.
- **Job Seeking Post**: A member inquired whether anyone was seeking a developer.
   - A respondent humorously noted that *everyone is a dev nowadays*.
- **Obsolete Manus 1.5 email blasts**: A member requested the cessation of emails introducing **Manus 1.5**, asserting that it is months old.
   - No further elaboration was provided.


  

---


### **MLOps @Chipro ▷ #[events](https://discord.com/channels/814557108065534033/869270934773727272/1436225291507728435)** (1 messages): 

> `AI Agents, LangChain, AgentKit, AutoGen` 


- **AI Scholars Hosts AI Agent Workshop**: AI Scholars is hosting an online and in-person hands-on AI Product workshop where participants will design and build an **AI agent** together based on a real client’s data-analysis problem ([RSVP here](https://luma.com/zkwgfgz0)).
   - The workshop will walk participants through modern agent frameworks like **LangChain**, **AgentKit**, and **AutoGen** with a real architecture and code walkthrough from an **AI consulting project**.
- **Learn to build real AI agents**: A hands-on workshop will teach you how to build a real **AI agent** project and product, using modern agent frameworks.
   - The course is suited for engineers, PMs, startup founders, students, and AI builders - no coding or agent experience is needed.


  

---


---