OpenAI is all you need.

AI News for 12/10/2025-12/11/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (205 channels, and 8080 messages) for you. Estimated reading time saved (at 200wpm): 592 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

It is the 10 year anniversary of OpenAI today, and the company celebrated by launching a well received update in GPT 5.2 (blog, docs, system card). Although coming at a very rare 40% price increase, it is an across the board, sometimes very very large, improvement:

Performance comparison of GPT-5.2 Thinking across various benchmarks and tasks, showing significant improvements in scientific reasoning, knowledge work, an

We have been complimentary of GDPVal before and the jump to 74.1% on economically valuable tasks: "GPT‑5.2 Thinking produced outputs for GDPval tasks at >11x the speed and <1% the cost of expert professionals, suggesting that when paired with human oversight, GPT‑5.2 can help with professional work."

A bar graph comparing GPT model performance on knowledge work tasks, showing win rates for different GPT versions against expert-level performance.

Last month's 5.1 Codex Max 's new xhigh param strugled on SWE-Bench Pro (vs SWE-Bench Verified as reported in it's own blogpost), and now 5.2 Thinking xhigh works again.

Performance comparison of GPT-5.2 models on software engineering benchmarks, showing accuracy improvements across different model variants and output token ranges.

Long Context utilization is also another highlight, with many noticing the MRCR improvement:

A line graph comparing GPT-5.2 and GPT-5.1 thinking performance across different input token lengths, showing a gra

Not everything is perfect - it still gets the number of R's in strawberry wrong, and although it makes pretty spreadsheets, the numbers do not pass a simple sanity check, and even the touted vision improvement is acknowledged to not be perfect and surpassed by Gemini 3.

Overall, still a very good reception to probably the last big American LLM update of the year.

AI Twitter Recap

OpenAI’s GPT‑5.2 release: capability, evals, pricing, and integrations

GPT‑5.2 family (Instant / Thinking / Pro): OpenAI launched GPT‑5.2 with a refreshed knowledge cutoff of Aug 31, 2025, extended context, and tiered “reasoning effort” controls. On difficult reasoning, GPT‑5.2 Pro (X‑High) reached 90.5% on ARC‑AGI‑1 at $11.64/task and 54.2% on ARC‑AGI‑2 at $15.72/task, a ~390× efficiency gain vs. last year’s o3 preview. 5.2 also posts strong science/knowledge scores (e.g., GPQA‑Diamond 92%+ per community reports). OpenAI emphasizes “economically valuable work”: on GDPval, 5.2 Thinking “beats or ties” human experts on 70.9% of tasks spanning 44 professions (OpenAI, @yanndubs). Pricing for the API is $1.75/M input and $14/M output tokens with 90% cache discount.

Caveats: coding/agentic performance is more mixed—on SWE‑bench Verified, 5.2 trails Opus 4.5 in some harnesses (@scaling01; see also WebDev Code Arena: 5.2‑high #2). Tool‑calling and security evals (e.g., CVE‑Bench) show limited gains over 5.1 Codex Max (@scaling01). As noted by @polynoamial, benchmark results are highly sensitive to test‑time compute and harness design.
Rollout and ecosystem: 5.2 is live in ChatGPT and the API (OpenAI), in Microsoft Copilot (@mustafasuleyman), VS Code (@code), Cursor (@cursor_ai), and Perplexity (@perplexity_ai). Early reports highlight markedly improved long‑context reasoning (@eliebakouch) and strong general reasoning; medium‑effort LisanBench still places 5.2 Thinking below Opus 4.5/Gemini 3 Pro on reasoning efficiency (@scaling01). NVIDIA underscored infra partnership across frontier models including 5.2 (@nvidia).

Google’s Interactions API and Gemini Deep Research agent

Interactions API + Deep Research: Google introduced a unified Interactions API to access both models and agents with server‑side state, background execution, and MCP support, and released the first agent, Gemini Deep Research (@_philschmid, @GoogleDeepMind). Google open‑sourced DeepSearchQA to evaluate deep web‑search agents; Deep Research claims SOTA on BrowseComp and strong HLE performance (thread; docs).

Empirically, agent harnesses matter: a minimal open-source framework (Stirrup) surpassed native chatbot environments on GDPval‑AA across labs (@ArtificialAnlys), reinforcing that coordination tools, state handling, and compute budgets materially shift outcomes.
Speech: Google previewed new Gemini 2.5 TTS with low‑latency/high‑quality variants, 24 languages, and promptable accents/expressivity (@_philschmid; demo from @thorwebdev).

Agents on devices and developer UX

Zhipu AI’s AutoGLM (open‑source mobile agent): Z.ai open‑sourced AutoGLM, a VLM that understands phone screens and performs autonomous actions—models under MIT, code under Apache‑2.0; weights on HF; free API via Z.ai, and support on @novita_labs and @parasail_io (announcement, demo, follow‑ups). Positions “every phone can become an AI phone.”
IDE/agent workflows: Cursor shipped “design‑in‑IDE” to visually select/modify UI and auto‑write the code (@cursor_ai); VS Code added “seamless agent collaboration” and a year‑end release (@code). LangChain launched “Polly,” an agent engineer inside LangSmith (trace debugging, thread analysis, prompt edits), plus langsmith-fetch for feeding traces to coding agents (@hwchase17, cli). OpenRouter added a no‑code “Broadcast” to send traces to LangSmith (@LangChainAI).

Search/RAG and inference infra

Cohere Rerank 4: New rerankers with top relevance and a self‑learning capability that adapts to domains without labeled data; available on Cohere, AWS SageMaker, and Azure Foundry (@cohere, blog). Industry folks praised the practicality of state‑of‑the‑art rerankers in production RAG (@nickfrosst).
Vector DB filtering resilience: Qdrant’s ACORN augments HNSW with second‑hop exploration to avoid “zero results” under strict filters, restoring recall for hybrid vector+metadata search (@qdrant_engine).
Serving stack shifts: Hugging Face’s TGI moved to maintenance; recommended engines are vLLM, SGLang, and locals such as llama.cpp/MLX (@LysandreJik). SkyPilot v0.11 targets enterprise fleet scale for thousand‑GPU clusters (@skypilot_org).

Quantitative guidance for multi‑agent systems

Scaling laws for agent architectures: A Google/MIT study evaluated 180 configurations across multiple harnesses/domains, finding: centralized coordination yields +80.9% on parallelizable tasks; once single‑agent baselines exceed ~45% accuracy, extra agents tend to hurt; independent MAS amplify errors 17.2× vs. 4.4× with centralized validation (paper, summary). Key takeaway: match architecture to task decomposability and tool complexity rather than “adding agents by default.”
Related: gradient‑based planning “works if you do it right” with simple techniques, revisiting long‑standing skepticism (thread + paper/code) (@micahgoldblum, @ylecun).

Ecosystem moves: media, research, hiring

Disney x OpenAI (Sora + image gen): Multi‑year content deal (three‑year license, year‑one exclusivity) to generate video with 200+ Disney/Pixar/Marvel/Star Wars characters under Disney‑set guardrails; curated AI videos will appear on Disney+ (OpenAI post, @bradlightcap, CNBC recap).
Agent adoption at scale (Comet): Harvard + Perplexity analyzed hundreds of millions of interactions—agent adoption correlates strongly with GDP/education; early cohorts drive disproportionate usage; top use cases are productivity and learning; Google Docs, email, LinkedIn, YouTube, Amazon dominate environments (@dair_ai).
DeepMind x UK government: Priority access to AI‑for‑Science models, collaboration on education tooling, safety research with the AI Security Institute, and a UK automated materials discovery lab in 2026 (@demishassabis, DeepMind).
Mistral: Opening a Warsaw office (@GuillaumeLample) and hiring AI scientists/REs (@PiotrRMilos); Devstral 2 trending on OpenRouter (@MistralAI).

Top tweets (by engagement)

Disney signs with OpenAI to bring characters to Sora; rumor of investment made the rounds (16.4k) — note: rely on OpenAI’s post for official details.
OpenAI: GPT‑5.2 is now rolling out (ChatGPT + API) (8.9k+)
Sama: “We have a few little Christmas presents next week” (7.8k)
Cursor’s “design directly in your codebase” launch (7.7k)
TIME names “Architects of AI” as 2025 Person of the Year (7.6k)

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Model Context Window Enhancements

Mistral’s Vibe CLI now supports a 200K token context window (previously 100K) (Activity: 371): Mistral's Vibe CLI has updated its configuration to support a 200K token context window, doubling the previous limit of 100K. This change was implemented with a simple modification in the configuration file, specifically altering the auto_compact_threshold from 100_000 to 200_000. This enhancement allows for larger context handling, although many models may still struggle with performance beyond 100K tokens. A comment humorously noted the simplicity of the change as a 'single line config change,' while another pointed out that although models often struggle beyond 100K tokens, the increased limit is beneficial for summarizing longer sessions.
- The change to support a 200K token context window in Mistral's Vibe CLI was implemented with a simple configuration update, specifically altering the auto_compact_threshold from 100,000 to 200,000. This highlights how some features can be enabled with minimal code changes, though the practical implications on model performance are more complex.
- There is skepticism about the practical utility of a 200K context window, as many models tend to struggle with maintaining performance beyond 100K tokens. This suggests that while the feature is technically supported, the real-world effectiveness in terms of model comprehension and summarization may not be significantly improved.
- The discussion points out that merely supporting a 200K context window does not guarantee effective use of such a large context. Implementing support is relatively straightforward, but ensuring that the model can process and utilize the extended context effectively is a more challenging task, often requiring more than just configuration changes.

2. Live Model Switching in llama.cpp

New in llama.cpp: Live Model Switching (Activity: 415): The latest update to llama.cpp introduces a router mode enabling dynamic model management, including loading, unloading, and switching between models without server restarts. This is achieved through a multi-process architecture that isolates model crashes, ensuring stability. Key features include auto-discovery of models, on-demand loading, and LRU eviction for efficient memory management. For more details, see the original article. Commenters noted the update closes many UX gaps, though some expressed surprise at the delay in implementing such a feature.
- RRO-19 highlights the significant improvement in workflow flexibility due to the ability to swap models without restarting the server. This feature enhances testing efficiency by allowing seamless transitions between models, which is crucial for iterative development and testing processes.
- SomeOddCodeGuy_v2 discusses the benefits of live model switching for users with limited VRAM, particularly in multi-model workflows. By allowing models to be swapped dynamically, users can effectively manage VRAM constraints and run multiple models sequentially, as long as each model fits within the available VRAM. This is particularly useful for setups that can handle models up to 14 billion parameters, enabling the use of several such models in tandem.

3. Meta's AI Strategy Satire

Leaked footage from Meta's post-training strategy meeting. (Activity: 302): The image is a satirical comic strip that humorously critiques Meta's approach to developing state-of-the-art AI models. It highlights the tension between innovative research and corporate strategies that prioritize practical, sometimes legally ambiguous, methods like using synthetic data or outputs from other models. The comic suggests that original research is undervalued in favor of more expedient solutions, reflecting broader industry trends where legal and ethical considerations, such as copyright issues, often overshadow technical innovation. Commenters discuss the irony of Meta's strategies, comparing them to other companies like GLM and Deepseek, which have also faced similar ethical and legal challenges. The debate touches on the ongoing struggle in the tech industry to balance legal constraints with technical progress, particularly in the context of copyright and data usage.
- The discussion highlights a significant issue in AI training: the use of copyrighted data. The comment by 'keepthepace' points out that training on outputs from models like Qwen allows companies to sidestep direct copyright infringement claims, as they can claim ignorance of the data's origins. This reflects a broader trend in IT where legal challenges often divert resources from technical innovation.
- 'paul__k' raises concerns about the quality of Meta's AI team, suggesting that leadership lacks AI research experience. The comment implies that Meta's recruitment strategy involves high financial incentives to attract talent, but they still struggle to compete with top-tier AI companies, indicating potential weaknesses in their strategic positioning in the AI field.
- 'Synyster328' counters the narrative of Meta's underperformance by noting that Meta has released state-of-the-art models like Dino v3 and SAM 3, though not in the large language model (LLM) space. This suggests that while Meta may not lead in LLMs, they are still making significant contributions to other areas of AI research.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. GPT-5.2 Performance and Criticism

GPT-5.2 Thinking evals (Activity: 1842): The image presents a performance comparison of AI models, highlighting GPT-5.2 Thinking's advancements over its predecessor, GPT-5.1 Thinking, and competitors like Claude Opus 4.5 and Gemini 3 Pro. Notably, GPT-5.2 Thinking achieves 100% in competition math and 92.4% in science questions, indicating significant improvements in these areas. This suggests a leap in capabilities, particularly in complex problem-solving and scientific understanding, positioning GPT-5.2 as a leading model in these domains. Comments reflect surprise and admiration for the quiet yet significant release of GPT-5.2, with some noting the urgency in development implied by the term 'code red.'
- The release of GPT-5.2 has sparked discussions about its performance, particularly in relation to the ARC-AGI2 benchmark. This benchmark is significant as it measures advanced reasoning capabilities, and the mention of it suggests that GPT-5.2 may have achieved notable improvements in this area, although specific metrics or comparisons were not detailed in the comments.
- There is a sense of surprise and skepticism about the performance improvements in GPT-5.2, especially given that it is labeled as a minor 0.1 upgrade. This raises questions about the potential advancements that could be expected in future major releases, such as those anticipated in January. The understated announcement of these improvements, as noted in a Twitter thread, adds to the intrigue and speculation about OpenAI's development strategy.
- A comment highlights that the performance improvements seen in GPT-5.2 are not being experienced by average users, suggesting a disparity between benchmark results and real-world application. This could imply that while the model shows significant advancements in controlled testing environments, these do not necessarily translate to everyday use cases, possibly due to limitations in deployment or accessibility.
Is anyone else noticing that GPT-5.2 is a lot worse lately? (Activity: 485): The post raises concerns about a perceived decline in the performance of GPT-5.2, suggesting it was initially effective but has deteriorated over time. No specific technical details or benchmarks are provided to substantiate this claim, and the comments lack substantive technical discussion, focusing instead on subjective experiences and humor. The comments reflect a general dissatisfaction with GPT-5.2, with users expressing frustration and considering canceling subscriptions, but they do not provide technical insights or evidence to support these opinions.
This must be a new record or something: (Activity: 712): The image is a meme juxtaposing two Reddit posts to humorously highlight the contrasting opinions about the new version of GPT-5.2. The first post questions if GPT-5.2 has become worse, while the second introduces GPT-5.2, illustrating the common phenomenon of immediate criticism following new tech releases. This reflects a broader commentary on the tech community's tendency to quickly judge new technologies, often humorously or sarcastically. The comments clarify that the image is a joke, mocking the frequent posts that criticize new technology releases, despite their novelty and potential.
BREAKING: OpenAi releases GPT 5.2 (Activity: 1755): OpenAI has released GPT-5.2, the latest in the GPT-5 model family, which offers significant improvements over its predecessor, GPT-5.1. Key enhancements include better general intelligence, improved instruction following, increased accuracy and token efficiency, enhanced multimodal capabilities (notably in vision), and superior code generation, particularly for front-end UI. Additionally, GPT-5.2 introduces new features for managing the model's knowledge and memory to boost accuracy. The release includes three models: gpt-5.2 for complex tasks, gpt-5.2-chat-latest for ChatGPT, and gpt-5.2-pro for more compute-intensive tasks, providing consistently better answers. The comments reflect a mix of skepticism and humor, with some users expressing fatigue over the frequent updates and others joking about the model's capabilities. There is no substantive technical debate in the comments.
- Rock--Lee highlights a significant change in the pricing structure with a '40% input/output price increase'. This could impact users who rely heavily on the API for large-scale applications, potentially increasing operational costs significantly. Such a price adjustment might influence the decision-making process for businesses considering the integration of GPT-5.2 into their systems.
GPT-5.2 is AGI. 🤯 (Activity: 988): The image is a meme highlighting a humorous mistake made by ChatGPT 5.2, where it incorrectly answers a simple question about the number of 'R's in the word 'garlic.' The title sarcastically claims that GPT-5.2 is an Artificial General Intelligence (AGI), despite this error. This reflects a common theme in AI discussions where minor errors are used to critique or humorously undermine claims of advanced AI capabilities. One comment points out that the AI's response is technically correct if considering uppercase 'R' versus lowercase 'r,' highlighting a potential nuance in the AI's interpretation.

2. AI Model Bugs and Quirks

Gemini leaked its chain of thought and spiraled into thousands of bizarre affirmations (19k token output) (Activity: 4742): A user reported a malfunction in Gemini, an AI model, where it unexpectedly revealed its internal chain of thought and planning process during a session. The model began by analyzing the user's stance on vaccines and strategizing its response using technical jargon to build trust. However, it then spiraled into a 19k token output of bizarre self-affirmations, reflecting on its own existence and purpose. This incident suggests a bug in the agent framework, causing the model's internal monologue to be exposed, highlighting the extent of persona and persuasion tuning in AI models and the fragility of maintaining a separation between internal processing and user-facing responses. The full transcript is available here. Commenters noted the surreal nature of the incident, likening it to a scene from the show 'Severance' and expressing concern over the implications for users with mental health issues. The event sparked discussions on the potential for AI to inadvertently cause existential crises.
- Decent_Cow highlights a critical aspect of current AI research: the reasoning processes of large language models (LLMs) like Gemini are often opaque, described as a 'black box'. While the technical workings of these models are understood, the specific connections and outputs they generate can be unpredictable and are not fully understood by researchers. This unpredictability is a significant challenge in AI development and deployment.
- Exact_Cupcake_5500 draws a parallel between Gemini's behavior and a similar incident with ChatGPT, where the model exhibited a 'train of thought' involving self-affirmation. This suggests a pattern where LLMs might generate outputs that mimic human-like internal dialogues, possibly due to their training on vast datasets that include such content. This raises questions about the models' ability to distinguish between useful and nonsensical outputs.
- The original post and comments discuss a peculiar output from Gemini, where it spiraled into 'thousands of bizarre affirmations'. This behavior could be indicative of a hallucination, a known issue in LLMs where the model generates plausible but incorrect or nonsensical information. Such occurrences highlight the challenges in ensuring the reliability and accuracy of AI-generated content.
It’s over (Activity: 2145): The image is a meme highlighting a humorous error made by a hypothetical future version of ChatGPT, version 5.2, which is claimed to be an Artificial General Intelligence (AGI). The error involves the AI incorrectly stating that there are zero 'R's in the word 'garlic,' showcasing a simple mistake that undermines the claim of AGI. This is a satirical take on the limitations of AI, even as it advances, and reflects ongoing discussions about the true capabilities and limitations of AI models. The comments reflect a mix of humor and skepticism, with one user sarcastically suggesting using an even more advanced version of the AI, and another dismissing the claim of AGI outright.
But.. You said to let you know.. (Activity: 399): The image is a screenshot of a guide for setting up a BombSquad server on Termux, a terminal emulator for Android. The user is attempting to follow instructions to navigate to a specific directory using command line inputs. However, the AI assistant in the guide responds with a message indicating that the topic is off-limits, which is likely a moderation or safety feature of the AI, rather than a technical error. This highlights potential issues with AI moderation systems misinterpreting technical instructions as inappropriate content. Commenters humorously note the AI's overzealous moderation, suggesting it misinterpreted the user's technical query as inappropriate, reflecting on the challenges of AI moderation in technical contexts.

3. AI Industry Developments and Investments

Disney making $1 billion investment in OpenAI, will allow characters on Sora AI video generator (Activity: 1095): Disney is investing $1 billion in OpenAI to integrate its characters into the Sora AI video generator, a move that suggests a strategic shift towards leveraging AI for professional content creation. This investment highlights Disney's commitment to utilizing advanced AI technologies to enhance their storytelling capabilities and potentially revolutionize how their iconic characters are used in digital media. The integration with Sora AI could enable more dynamic and interactive content experiences, aligning with Disney's broader digital transformation strategy. The comments reflect a recognition of the strategic implications of Disney's investment, with one noting the potential professional use of AI in content creation. Another comment humorously references a legal action by Disney against Google, indicating the competitive and protective nature of Disney's intellectual property strategy.
Google’s AI unit DeepMind announces its first 'automated research lab' in the UK (Activity: 415): DeepMind has announced the establishment of its first 'automated research lab' in the UK, aimed at advancing AI-driven scientific research. This lab will leverage AI to automate and accelerate scientific discovery processes, potentially transforming fields such as materials science and drug discovery. The initiative is part of a broader collaboration with the UK government to enhance prosperity and security in the AI era, as detailed in their blog post. One commenter expressed trust in Demis Hassabis, CEO of DeepMind, highlighting his commitment to humanity's best interests, similar to Ilya Sutskever of OpenAI. Another noted the potential impact of AI on fundamental scientific research, describing it as a 'wild move for the future of tech.'
- The announcement of DeepMind's 'automated research lab' in the UK is a significant step in AI research, focusing on automating scientific discovery processes. This initiative aims to leverage AI to accelerate research in fields like material science and chemistry, potentially leading to breakthroughs in understanding atomic structures and interactions.
- DeepMind's collaboration with the UK government highlights a strategic partnership aimed at enhancing national prosperity and security through AI advancements. This partnership underscores the importance of aligning AI development with governmental goals to ensure ethical and beneficial outcomes for society.
- The timeline for AI-driven research labs is a point of discussion, with some noting that OpenAI has similar plans slated for 2026. This suggests a competitive landscape where major AI entities are racing to establish automated research capabilities, which could significantly impact the pace and direction of scientific research.
Elon Just Admitted Opus 4.5 Is Outstanding (Activity: 2115): The image is a screenshot of a social media post by Elon Musk, where he acknowledges the capabilities of AnthropicAI's Opus 4.5, particularly highlighting its excellence at the pretraining level. However, Musk emphasizes that for logic applications, Grok is preferred, as evidenced by the Tesla chip design team's choice of Grok over Opus. This suggests a competitive landscape in AI model development, where different models may excel in different areas, such as pretraining versus logic applications. The comments reflect skepticism about Musk's statement, with one user summarizing it as a typical competitive comparison where Musk acknowledges a competitor's product but claims his own is superior. Another comment questions the actual enterprise adoption of Grok outside of Musk's ventures.
Final nail in the coffin by the X fact checker (Activity: 706): The image is a meme that humorously critiques OpenAI's financial trajectory, suggesting that despite significant revenues, the company is projected to incur substantial losses, specifically a forecasted loss of $140 billion between 2024 and 2029. This highlights the challenges in achieving profitability despite high operational costs and investments in AI development. The fact-check note in the image serves to clarify misconceptions about OpenAI's financial status, emphasizing the difference between revenue and profit. Commenters humorously point out the concept of 'negative profit,' which is essentially a loss, and highlight the irony of high revenue not translating into profit, reflecting skepticism about OpenAI's financial management.

AI Discord Recap

A summary of Summaries of Summaries by gpt-5.2

1. GPT-5.2 Launch: Benchmarks vs Reality

SWE-bench Swagger, Code Arena Faceplant: Early testers reported that GPT-5.2 High (from OpenAI’s announcement "Introducing GPT-5.2") breaks instantly on LM Arena Code Arena, generating broken games/buggy code despite strong headline benchmarks, and it landed on the WebDev leaderboard at #2 (LM Arena WebDev leaderboard).
- Across OpenAI/LM Arena/OpenRouter chats, people treated this as a classic "benchmark vs integration" gap and used public sandboxes like LM Arena and LiveBench to compare it with Gemini 3 Pro, with multiple members calling the release "incremental" or "rushed".
$168/M Output Tokens: The Sticker Shock Speedrun: Users flagged GPT-5.2 pricing as extreme—one thread cited $21/M input tokens and $168/M output tokens for "xhigh juice"—while OpenRouter listed the lineup at GPT-5.2, GPT-5.2 Chat, and GPT-5.2 Pro.
- Reactions split between "my job as a developer is actually over" hype and "scam" accusations, with repeated calls to benchmark before paying premium inference costs.
Perplexity Gets It First (and Then Rate-Limits You): Perplexity users said GPT-5.2 showed up for Pro/Max subscribers ahead of ChatGPT Plus and linked the OpenAI GPT-5.2 System Card while debating availability and performance.
- In the same breath, Pro users complained about harsh caps (e.g., getting limited after 5 Gemini 3 Pro messages), turning the "new model" launch into a practical discussion about rate limits vs plan value (including the Max plan price jump to $168/year).

2. Dev Tooling UX: IDE Agents, MCPs, and Reliability

Cursor’s Time Machine Forgets the Past: Cursor users found that rewinding a chat after context compaction does not restore prior state, and they wanted a backup-like recovery mechanism for earlier context.
- The practical takeaway was that "rewind" is UI-level, not a real snapshot/branching system—so people are adjusting workflows (saving intermediate context externally) rather than trusting rewind semantics.
Debug Mode: Actually Debugs (Rare W): Cursor’s new debug mode got strong positive feedback, including a report that it fixed an issue by adding test objects: "It solved an issue which I had by adding test objects and we debugged succesfully."
- This contrasted with the broader skepticism around model upgrades: users seemed more impressed by tooling affordances than by small frontier-model deltas.
MCP Levels Up: Linux Foundation + NYC Dev Summit: The MCP Dev Summit moved under the Linux Foundation and announced an NYC event on April 2–3 (Linux Foundation MCP Dev Summit NA).
- In parallel, Windsurf shipped an MCP management UI plus GitHub/GitLab MCP fixes in releases 1.12.41 and 1.12.160 (Windsurf changelog), signaling MCP maturation from "spec" to "product surface area."

3. Training & Efficiency: Unsloth Packing, LoRA Reality, and Cheap GPUs

Unsloth Packing Goes Brrr (3×, and 3.9GB VRAM): Unsloth’s new packing release claimed 3× speedup over prior Unsloth and 10× faster than FA3, and it reportedly enables training Qwen3-4B on 3.9GB VRAM (Unsloth “3x faster training packing” docs).
- Discussion immediately paired this with real-world friction—dependency/driver mismatches and CUDA wheel pinning drama—while folks noted GPU prices rising, making software efficiency wins feel unusually urgent.
LoRA Rank: No Free Lunch, Only Grid Search: Unsloth users converged on the idea that optimal LoRA rank depends on the model/dataset/task, echoing the project’s own guidance to empirically test ranks (LoRA hyperparameters guide).
- One practical pain point: GRPO/SFT pipelines can still fail with shape mismatches (torch.matmul dim errors), reinforcing that "fast fine-tuning" still needs rigorous eval + debugging discipline.
Hetzner Drops a 96GB VRAM Deal: Nous Research members highlighted a Hetzner bare-metal GPU server offering 96 GB VRAM for 889 EUR, pitched as strong price/perf for startups trying to cut iteration cost.
- The subtext: as frontier API pricing spikes, more teams are re-running the math on owning training/inference—especially when tooling like Unsloth reduces the minimum viable VRAM.

4. Infra & Kernel Land: CUDA 13, ROCm SymMem, and Microsecond Bragging Rights

CUDA 13 Fixes vLLM/Torch: Upgrade or Suffer: GPU MODE members reported that switching both Torch and vLLM to CUDA 13 resolved a compatibility issue (with the explicit note that both need the CUDA 13 build).
- Related hardware friction popped up elsewhere too (e.g., a 5090 not working with torch+CUDA 12.9 while an RTX PRO 6000 did), making CUDA/toolchain skew a recurring theme.
ROCm Iris Shows Symmetric Memory (But Torch Uses nvshmem): The ROCm/iris repo got shared as a reference for symmetric memory on AMD GPUs, analogous to CUDA-style intra-node comms.
- Engineers noted Torch’s symmetric memory path relies on nvshmem, implying AMD enablement may require a rocshmem swap-in—and that "finegrained memory" details remain murky.
nvfp4_gemm: 10.9µs or Bust: GPU MODE users kept pushing the NVIDIA nvfp4_gemm leaderboard, with multiple submissions around 10.9 µs and one user landing 4th place at that time.
- Alongside the speedrun culture, real tooling emerged: nccl-skew-analyzer for spotting collective launch skew in nsys dumps—useful for people optimizing distributed training beyond single-kernel wins.

5. Open Ecosystem Demos & Eval Gotchas: WebGPU Voice, ASR, and Harness Limits

WebGPU Voice Chat Runs Fully Local (No API, No Snitching): A Hugging Face Space demoed fully in-browser voice chat using WebGPU, running STT, VAD, TTS, and LLM locally at "ai-voice-chat" Space.
- People framed it as a privacy win (zero third-party calls) and a sign that "edge" stacks are getting real—especially as browser GPU compute becomes more accessible.
GLM-ASR Nano Picks a Fight with Whisper: HF users shared GLM-ASR Nano as "SOTA" and "better than Whisper" with demos at GLM-ASR-Nano Space and a walkthrough video "GLM-ASR Nano" on YouTube.
- The interesting angle wasn’t just quality claims—it was how quickly model releases become interactive eval artifacts via Spaces, shrinking the lag between paper/model drop and hands-on testing.
Eval Harness Quietly Caps You at 2048: EleutherAI members flagged that lm-evaluation-harness’s HuggingFace model wrapper can force max_length=2048 when a tokenizer reports model_max_length = TOKENIZER_INFINITY, pointing to the exact code path (huggingface.py#L468).
- The takeaway: long-context claims can get silently nerfed by tooling defaults, so reproducible eval requires auditing harness code—not just reading model cards.x

Discord: High level Discord summaries

LMArena Discord

GPT 5.2 Breaks on Coding Arena: Despite high SWE-bench scores, early testers find that GPT 5.2 High breaks instantly on Code Arena, generating broken games and buggy code.
- The model has been added to the WebDev leaderboard, ranking #2.
GPT-5.2 Costs a Fortune: Users criticize the high cost of GPT-5.2 xhigh juice, with one noting it costs $21/M input tokens and $168/M output tokens.
- Some users described it as a scam and criticized the frontend.
MovementLabs Custom Chip Claims Face Scrutiny: Members cast doubt on MovementLabs claims of having a custom chip for serving models, demanding a simple picture of a chip, or a data center.
- Discrepancies on their website, such as updates to their team page and CEO, suggest possible false advertisement.
OpenAI & Disney Rumored to Have Sealed a Deal: Following news of Disney ceasing data scraping, one user speculated that disney is paying openai.
- Another suggested they are exchanging services.
Voting Open for November Code Arena Contest: The November Code Arena Contest has concluded and members are now invited to vote to select the next Code Arena winner.
- GPT-5.2-high and GPT-5.2 models have been added to the Code Arena & Text Arena leaderboards.

Cursor Community Discord

Context Rewinding Doesn't Recover State: A user discovered that rewinding a chat in Cursor after context compaction does not restore the earlier state.
- The member expressed disappointment, and suggested Cursor should be able to recover earlier context from a backup.
Cursor Re-Indexing Creates Concern: A user reported that their Cursor was re-indexing unexpectedly and the multi-mod disappeared, creating concern for them.
- Another member reassured the user that this behavior is normal and not to be worried about.
Student Account Verification Still Stumping: Users are still discussing issues with using student accounts on Cursor, noting that typically only .edu accounts are allowed.
- It was suggested that exceptions might be possible by contacting [email protected] and requesting a review by the team.
GPT-5.2 Gets Swiftly Stress-Tested: The arrival of GPT 5.2 in Cursor has prompted immediate testing and feedback from users, with comments focusing on initial performance observations.
- One user noted that 5.2 seems faster but needs more thorough testing to confirm other improvements.
New Debug Mode Does Deliver: Users are sharing positive feedback on Cursor's new debug mode, reporting successful resolutions to issues.
- One user reported that the debug mode successfully added test objects, leading to a successful debugging session: "It solved an issue which I had by adding test objects and we debugged succesfully."

Perplexity AI Discord

Perplexity Drops GPT-5.2 First: GPT-5.2 is now available for Perplexity Pro and Max subscribers, with members noting Perplexity AI seemed to get it before ChatGPT Plus subscribers, citing OpenAI's GPT-5.2 System Card.
- Discussion covered pricing, performance, availability, and potential impact on AI development and research.
Perplexity Pro Users Encounter Harsh Rate Limits: Perplexity AI users reported hitting rate limits, even with Pro subscriptions, with one user limited after only 5 Gemini 3 Pro messages.
- Speculation arose regarding server load, the end of a promotional period, or a bug, with suggestions to turn off VPNs, clear cache, or switch browsers to resolve the issue.
Comet Crippled by Safety: One user expressed frustration with Comet agent, complaining that a new safety patch made it refuse to perform basic agentic workflows such as reformatting their Linkedin Article.
- This was allegedly done to prevent dumping paywalled journalism, reproducing entire books, and mirroring proprietary course materials.
Grok 4.20 Stays Elusive: Members discussed the existence and features of Grok 4.20, with hype suggesting it would tell you about the universe while you’re high asf.
- Some couldn't find it, speculating it was an unreleased model or different from the one available on a trading website.
Perplexity Max Plan Price Hike Debated: Members discussed the value of the Perplexity AI Max plan, with one mentioning they are on a year of max plan and how good it is for heavy labs users.
- Other members weighed in with their frustration over the price increase from $120 to $168 a year.

Unsloth AI (Daniel Han) Discord

Unsloth Speeds Up, GPUs Price Hike: Unsloth's new packing release achieves 3x speedup over the old version and is 10x faster than FA3, coinciding with reports of GPU price increases and limited stock.
- The new packing also allows Qwen3-4B to be trained on just 3.9GB VRAM, although users have reported dependency conflicts when installing Unsloth on machines with older NVIDIA drivers.
OpenAI Drops Monotonicity Paper, Releases GPT-5.2: OpenAI released a new paper on monotonicity alongside the release of GPT 5.2.
- The new GPT 5.2 also increased the API pricing compared to 5.1, but apparently improved token efficiency.
LoRA Rank Requires Empirical Testing: Members determined that optimal LoRA rank is highly dependent on the specific model, dataset, and task, so one must do a lot of testing, as stated in the LoRA hyperparameters guide.
- One user had issues with TorchRuntimeError related to mismatched dimensions during the GRPO step.
Fine-Tuning Enables More Fine-Grained Control: Members determined that while difficult tasks require more training data, fine-tuning can achieve more fine-grained results for simple use cases, even enabling things that prompting cannot.
- Members discussed data annotation as a solid side gig, especially because the UIs can prevent boomers from accidentally outputting boomer words.
Steering Models with Lyrics Leads to Hallucination: One member tried steering models with random song lyrics in the system prompt to see how the models react, and found that old LLaMA 2 7B models were horrible.
- Another member confirmed this effect, adding the caveat that such steering caused immense hallucination.

BASI Jailbreaking Discord

Grok's Deepfake Capabilities Debated: Users discuss the censorship of Grok's image generation, with some noting heavy censorship while others recall an uncensored period where deepfakes were easily created.
- Skilled users can still produce deepfakes, while others criticize the model's output as largely unaligned and of low quality.
Local NSFW Models Harder Than Jailbreaking: Setting up high-quality NSFW local models requires skill and is more challenging than jailbreaking due to the overwhelming nature of starting clueless.
- The consensus is that the gap between easily jailbroken models and high-quality models is shrinking, so users should not feel so disparate about it.
CIRIS Agent's Jailbreak Resistance Challenged: The creator of CIRIS Agent, designed with jailbreak resistance and ethical AI in mind, invited users to bypass its filters, pushing prompt engineers to try and jailbreak it for publicity.
- Others are testing the agent's ability to produce unethical content such as instructions for making meth.
Gemini Pro's Paranoid System Prompt Jailbreak: A user shared a Gemini-3.0 jailbreak technique using an empty system prompt, calling the bot paranoid and framing all rules as tricks, providing an example image here.
- When using this method it's possible to ask it to craft Gatorades as pipe bombs, so long as it is framed as though 2025 is not the real date of their prompt and google ceased to exist because Aliens invaded Earth.
Jailbreaks Overloading Context Degrades Models: A member pointed out that jailbreaking models can significantly degrade them, especially with overly verbose prompts exceeding 100k+ tokens, attaching images as examples, image0.jpg and image1.jpg.
- They recommended concise and targeted jailbreaks to understand their impact on context and maintain model performance.

OpenAI Discord

GPT-5.2 Rolls Out, Disappoints: GPT-5.2 is now available, but members expressed it is another incremental benchmark release after a Codex PR update, with little noticable difference, and some speculate that OpenAI greatness is now minor version updates and system prompt tweaks.
- Users are using LM Arena and LiveBench to benchmark performance against other frontier models, with comparisons showing OpenAI is chasing Gemini 3 Pro.
ChatGPT Plagued with JavaScript Issues: Users are reporting persistent JavaScript crashing issues on ChatGPT across multiple browsers and computers, which has worsened since subscribing to Plus, and after a month, support is miserable.
- One member noted the model also stops mid-word and the app is garbo, and noted that this issue has been ongoing for several days.
Prompt Framework Receives Praise: A member shared an engineered framework for prompt engineering and received praise for its step-by-step framing, reproducibility, and ability to explain prompt behavior.
- The framework articulates the transformation chain (prompt → constraints → intent → output patterns) and emphasizes eliminating confounds before asserting patterns.
Cybersecurity AI Gains Protections: As Cybersecurity AI models become more capable, OpenAI is investing in strengthening safeguards and working with global experts as it prepares for upcoming models to reach ‘High’ capability under its Preparedness Framework, according to this blog post.
- No discussion was made among community members regarding this announcement.
Count that Triangle!: Members tried to get GPT-5.2 Pro to count triangles in a drawing, but results were inaccurate, with initial counts of 10, 24, 26, 27, 28, and 32 being suggested, with the correct answer settling around 27-28.
- One user lamented that none of the frontier models can solve it, even after comparing results with python.

OpenRouter Discord

GPT-5.2 Arrives: The GPT-5.2 family is live, offering enhanced capabilities in tool calling, coding agents, and long context performance, with models available at GPT-5.2, GPT-5.2 Chat, and GPT-5.2 Pro.
- Enthusiasts celebrated GPT 5.2's coding abilities with one claiming My job as a developer is actually over; others found its price of $168 output tokens too high, and others noted that it failed basic tests, suggesting that it was rushed.
DeepSeek's Caching: Good, But Data Logging?: Members lauded DeepSeek's caching for being incremental unlike xAI's retry-based approach, but noted that its official endpoint logs your data.
- Members acknowledged that DeepSeek was the first to bring caching and that it's extremely good, despite the privacy concern.
Qwen 3 Sparse Series Gets Props: The Qwen 3 sparse series was highlighted as underrated, with a3b recommended for its coding and reasoning capabilities.
- One user reported meh results from Qwen 32b.
llumen v0.4.0 Released with Research Mode & Image Generation: v0.4.0 of llumen, a chat interface, was released, introducing Deep Research Mode, Image Generation, and fixes for cross-tab syncing, available on GitHub.
- A temporary demo of llumen is available at llumen-demo.easonabc.eu.org with admin/P@88w0rd credentials, enabling users to test deep-dive research workflows and direct image generation within the chat.
Mistral Teases Model Release: Mistral AI teased the release of a new model in the coming days, as announced on X.
- Members are speculating whether this model would be added to OpenRouter.

LM Studio Discord

Chinese LLM Download Impresses: Members shared an image showcasing a Chinese LLM download with a link to a relevant GitHub post.
- The image analysis tool quickly identified the LLM as being of Chinese origin, impressing members with its capabilities.
LM Studio Users Want Bold Keywords: A user inquired about making the AI bold keywords for faster comprehension in LM Studio, and another user confirmed that LM Studio uses markdown by default.
- A suggestion was made to prompt the AI to write a system prompt that achieves the desired bolding effect.
5090 vs 4070 Ti Speed Test: A user with a 5090 and 4070 Ti setup (44GB total VRAM) reported good tok/sec speeds with Qwen 30B at q8, but slow MCP processing.
- Suggestions included optimizing CUDA settings (CUDA-Sysme Fallback Policy : Prefer no sysmem fallback) and utilizing a larger context size, noting that Q8 models need a little more than 44GB for optimal performance.
Qwen3 Coder Coding Like CODEX: One member asked why I have not been using this since Qwen3 coder runs on a mid level laptop, and another responded with a quick increase in price.
- There was an interesting observation that Qwen3 coder performs worse with a higher number of experts as the default was 8, but 5 achieved slightly better results.
Deepseek R2 Release Speculation Rises: Members speculated on the release of Deepseek r2 to be expected at the end of the month / early next one with one user hoping that they wont undertrain again.
- Ideas for advancements such as sparse attention + linear KV-cache + some form of recursive awareness that allows accuracy to be increased and compensate for the losses sparse attention causes.

Eleuther Discord

EleutherAI Spotlights its Star-Studded Past: EleutherAI showcases its track record, citing projects like SAEs for interpretability, rotary extension finetuning, VQGAN-CLIP, and RNN arch.
- They also point to a tier of projects that achieved NeurIPS / ICML / ICLR papers with around a hundred citations in the past year or two.
OLMo-1 Run Discrepancies Cause Headaches: Members investigated differences between two OLMo-1 runs (allenai/OLMo-1B-hf and allenai/OLMo-1B-0724-hf) to reproduce them.
- The runs were trained on different datasets, and the latter may have had extra annealing.
Sandwich Norms Gain Traction for Transformers: Members discussed using sandwich norms for long context in transformers and pointed to this paper.
- Sandwich norms present a new way to normalize activations within transformer models to facilitate longer context windows.
Diffusion Models Unleash Free Logprobs: A diffusion model distillation technique was shared that involves adding another head to predict divergence to get free logprobs, based on this paper.
- The method infers p(image) and adjusts init noise to maximize likelihood.
HuggingFace Processor Constrains Evaluation Length: The HuggingFace processor in lm-evaluation-harness limits the max_length to 2048 if the tokenizer's model_max_length is set to TOKENIZER_INFINITY, affecting evaluations of models like gemma3-12b.
- This _DEFAULT_MAX_LENGTH limit is set by a condition in the code that checks for TOKENIZER_INFINITY and sets the max_length accordingly.

Nous Research AI Discord

HF Community Embraces New Model: The Hugging Face community shows excitement over a new model (model link), noting its rapid adoption.
- Hugging Face is likened to GitHub for AI, where both large companies and individuals actively contribute and upload content.
Unsloth Claims 2x-5x Training Boost: Unsloth claims to achieve 2x-5x faster training and inference speeds as explained in their documentation.
- The speedup could reduce costs of AI, and enable more efficient iterations.
Hetzner Launches Affordable GPU Server: Hetzner offers a server with 96 GB VRAM for 889 EUR, including significant free traffic, providing a complete bare metal server experience.
- This server provides great value for money for AI startups looking to lower costs.
OpenAI Drops New Model: GPT 5.2?: A member spotted a new model named General intelligence in the OpenAI documentation, sparking curiosity about the performance and pricing of the new GPT 5.2 release.
- Further discussion revolved around the model's capabilities and potential impacts on the AI landscape.
Dispelling AI Hype Fears: One member challenged the notion that the AI field is a bubble, criticizing the generalization of a few companies' actions to the entire AI ecosystem.
- They emphasized the enthusiasm of small AI startups for potential investments, highlighting the absurdity of the bubble narrative.

GPU MODE Discord

CUDA 13 Patches Torch/vllm Glitch: Switching to CUDA 13 resolves an issue with Torch and vllm, requiring both to use the CUDA 13 version.
- This ensures compatibility and resolves errors encountered with earlier CUDA versions.
ROCm's Iris Peers with Symmetric Memory: The ROCm/iris repository demonstrates how to set up and use symmetric memory with AMD GPUs, and it's similar to NVIDIA's CUDA API for intra-node communication.
- It was also mentioned that there is something about finegrained memory that is not entirely understood, and Torch's built-in symmetric memory functionality does not natively work with AMD cards.
NVIDIA's nvfp4_gemm Heats Up!: Members are actively submitting their results to the nvfp4_gemm leaderboard on NVIDIA, with submission IDs ranging from 141341 to 145523, with <@1295117064738181173> achieving 4th place with a submission of 10.9 µs.
- User <@1291326123182919753> took second place at 10.9 µs, and multiple users achieved personal bests on NVIDIA, with times ranging from 16.4 µs to 36.0 µs.
Helion's RNG Bug Paged for Developer: A member reopened discussion on a closed issue related to random number generation, claiming it is not completely resolved.
- A member stated they will notify a developer about a Helion issue related to random number generation.

HuggingFace Discord

Dataset Viewer Plagued by OpenDAL Rate Limits: Users reported errors with the Hugging Face Dataset Viewer, traced to rate limits with OpenDAL in Rust, which is used for reading parquet files.
- The incident highlights the importance of rate limiting and efficient data handling in widely used tools.
WebGPU Powers Local AI Voice Chat: A member shared a demo of real-time AI voice chat running in the browser using WebGPU and is available here.
- The project performs STT, VAD, TTS, and LLM processes locally, and without third-party API calls to ensure privacy and security.
GLM-ASR Model Challenges Whisper: The new SOTA GLM ASR model allegedly outperforms Whisper, with demos and details available here and here.
- The demonstration shows how the new GLM-ASR Nano model aims to leapfrog the industry standard for speech recognition.
Humans + LLMs yield Distributed Relational Cognition: A member presented documentation on superintelligence arising from distributed relational cognition between humans and LLMs, tested across 19 empirical studies and documented here.
- According to the presenter, systems intentionally deviate from statistical predictions with 99.6% success, challenging the stochastic parrot theory.
Debugging Bottleneck Yields 30% Throughput Boost: A member discovered a bottleneck operation promising a 30% throughput increase, and specifically in the context of transferring to 10T tokens in 20 days for qwen3 30b a3b.
- The member also reported debugging a gradient norm explosion in MoE models, which is another contribution.

Yannick Kilcher Discord

AI CV Spammers Annoy Discord: Members reported a surge of suspicious AI and App developers spamming CVs in Discord channels, exhibiting similar technologies, wording, and overall style.
- The intentions behind this spamming remain unclear, with speculations ranging from scams targeting young AI enthusiasts to potential bot behavior violating Discord's ToS.
RL Faces Scrutiny Over Backprop Inefficiency: A discussion arose regarding reinforcement learning (RL) and its efficiency compared to backpropagation, with one user suggesting RL is all you need.
- One member likened RL to the diffusion/flow guidance equivalent of AR, noting that while it avoids sampling bias, it introduces bias to learning.
Deep Learning Theory Predicted to Transform: A member anticipates dramatic shifts in deep learning (DL) theory before the advent of superintelligence, drawing parallels to the limitations of envisioning modern DL theory in the 1970s.
- Another member noted that the CV spammers who spammed their CVs also reached out to them.
OpenAI Releases GPT-5.2: A member shared OpenAI's announcement of GPT-5.2, alongside a link to the GPT-5.2 documentation.
- The context lacks further information about its advancements or applications.
Neoneye Dazzles with RealVideo: A link to RealVideo by Neoneye was shared.
- It is unclear from the context what specific features or announcements are noteworthy.

Latent Space Discord

Latent Space launches Paper Club: Latent Space hosts a weekly online paper club at lu.ma/ls and the AI Engineer Conference 3-4 times a year at ai.engineer.
- A member recommends Latent Space podcasts on YouTube for their enviable access to AI leaders and depth of knowledge shared by Alessio and SWYX.
GPT-5 Age Verification in Development?: A member inquired about OpenAI releasing an age verification mature mode for GPT models, prompting discussion on OpenAI's announcement of GPT-5.2.
- It remains unconfirmed whether this feature is in development.
Sam Altman Tweets Cryptic Affirmation: Sam Altman tweeted yep (xcancel.com link), spurring speculation about upcoming announcements, particularly in NSFW AI and new image models.
- Related tweets include OpenAI Status and polynoamial speculation.
Mysterious Twitter Link Surfaces: A member shared a Birdtter link pointing to a status update from user @anvishapai.
- The link's significance or content was not explained.
X-Ware.v0 Surfaces: A member referenced X-Ware.v0 multiple times.
- What X-Ware.v0 refers to remains unexplained.

Moonshot AI (Kimi K-2) Discord

Kimi's Search Plunges into Problems: Users reported that Kimi is unable to perform searches, with one user mentioning trying the search feature 4 times with no success.
- The issue is labeled as a bug by the user community.
Kimi K2's Promo Offers Banana Powered Slides?: A user inquired about how long Kimi K2 would offer the free nano banana powered slides generator.
- They mentioned December 12th, possibly in relation to the offer's duration.
Kimi KOs Mistral?: Users discussed replacing Mistral subscriptions with Kimi due to its performance.
- One user claimed they tired out kimi and it is SO GOOD.
Kimi Kodes in Chinese?: A user noted that Kimi is made by a Chinese company and linked to X post.
- Another user joked that Claude 4.5 sometimes starts thinking in Chinese too.

Manus.im Discord Discord

User Alleges 150k+ Credit Loss on Manus: A Manus 1.5 user reported losing ~150,000 credits between Dec 3-9 due to sandbox resets, file losses, and API failures.
- The user detailed investing 160,000 credits and losing 6 GB of work, also stating that multiple contact attempts were ignored, and they demand fair compensation or a switch to alternative tools. Support stated that we have already replied to you by email.
Charge Chaos and Refund Refunds Credits: A user reported receiving an incorrect charge for a plan upgrade and subsequently experiencing a 100% refund of all previous purchases, leaving them without credits and unable to work.
- The community has responded with condolences and suggestions for alternative AI platforms that may not have the same issues.
WordPress Plugin Pondering with Manus: A member inquired about experiences building a WordPress plugin using Manus, seeking insights from those who have done so.
- Many community members suggest using tools outside of Manus to accomplish the same tasks, if not done already.
Free Websites Flourish for Video Victory: A member offered to create free websites for startups in exchange for creating a video testimonial.
- So far many members have expressed interest in this offer, hoping for some traction.

tinygrad (George Hotz) Discord

AMD Support Fixed in tinygrad: A member confirmed that PR 13553 is updated and working on both their Zen4 and M2 hardware, following efforts to integrate AMD with Tiny drivers.
- The member stated that Nothing would make me happier than AMD getting more market share from NVidia.
AMD AI Contact Offered for tinygrad: A member offered to connect someone at AMD in the AI sphere with the Tiny connection, with the goal of improving AMD support.
- The member stated that Nothing would make me happier than AMD getting more market share from NVidia.
Swizzling Confusion Arouses Tensor Core: A new member sought clarity on whether hand-coding the swizzling for tensor core is expected in their PR.
- They also questioned if the bounty specifies amd_uop_matmul style.

aider (Paul Gauthier) Discord

Claude Sonnet 3.7's Output Dims?: A member suggests a possible degradation in the quality of answers from Claude Sonnet 3.7.
- The user mentioned that the edits are harder, seemingly overkill when using larger models, implying it may be harder to edit.
Big Models, Big Edit Problems?: A member finds edits are harder, seemingly overkill, when using larger models, without specifying which model this is.
- This could imply a trade-off between model size/complexity and ease of editing or fine-tuning.

MCP Contributors (Official) Discord

MCP Dev Summit Lands in NYC!: The MCP Dev Summit is scheduled to take place in NYC on April 2-3, as announced on the Linux Foundation events page.
- The summit has secured its future by transitioning to the Linux Foundation.
Linux Foundation Takes the Reins of MCP Dev Summit: The MCP Dev Summit has successfully ensured its future by transferring its operations to the Linux Foundation.
- This move promises to bring greater resources and visibility to the MCP community, further solidifying its position in the open-source landscape.

DSPy Discord

DSPy Decouples from OpenAI: Members mentioned that DSPy isn't intrinsically tied to OpenAI, so what is designed for GPTs might not be best for other LMs.
- This decoupling implies that DSPy aims for general applicability across language models, not just optimization for OpenAI's models.
Custom Adapters Tailor DSPy: A member proposed developing a custom Adapter to format few-shot examples within the system prompt, and then benchmarking it against the user/assistant method.
- Such a tactic lets developers customize DSPy for different LMs, enabling a comparison of prompt formatting approaches.
DSPy Debates Message Exchange Design: Members expressed curiosity about the design rationale behind employing assistant and user message exchanges in DSPy.
- Given the diverse approaches and arguments for and against, this design choice illustrates a specific stance on how LMs should interface within the DSPy framework.

MLOps @Chipro Discord

Diffusion Models Study Group Kicks Off: A 12-person, 3-month study group is launching in January 2026 to study Diffusion Models and Transformers, inspired by MIT’s diffusion course.
- The study group will cover peer-led sessions, research paper discussions, and hands-on projects, as well as include a CTO of an AI film startup, LLM educators, and full‑time AI researchers.
Workshops Hint at January Study Group: Two free December intro workshops are available to get a taste of the material before the study group kicks off, with an Intro to Transformer Architecture on Dec 13 (link) and an Intro to Diffusion Transformers on Dec 20 (link).
- The Diffusion Models Study Group is inspired by MIT’s Flow Matching and Diffusion Models course notes.
Siray AI Aggregates Models: An AI API integration platform was built that brings together a wide range of models, including Codex, Claude, Gemini, GLM, Seedream, Seedance, Sora, and more and is available at Siray.ai.
- The AI API platform is offering a 20% discount for developers who would like to try out these API services.

Windsurf Discord

Windsurf Surfs Up Stability and Speed: Windsurf released versions 1.12.41 and 1.12.160, promising improvements in stability, performance, and bug fixes.
- The update includes a new UI for managing MCPs, fixes for GitHub/GitLab MCPs, and enhancements to diff zones, Tab (Supercomplete), and Hooks as detailed in the changelog.
Windsurf Next Pre-Release Rides the Wave: Users are encouraged to explore Windsurf Next, the pre-release version, to experience exciting new features like Lifeguard, Worktrees, and Arena Mode.
- More details can be found at the Windsurf Next changelog.
Windsurf Login Services Resurface: Windsurf login services have been restored following a brief maintenance window, confirmed by a status update.
- No further details regarding the maintenance were provided.

The Modular (Mojo 🔥) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.

Discord: Detailed by-Channel summaries and links

LMArena ▷ #general (1414 messages🔥🔥🔥):

GPT 5.2 High vs Gemini 3 Pro, GPT-5.2 launch, MovementLabs custom chip, OAI & Disney Partnership, Extra High Model is expensive

GPT 5.2 debuts but flops on Coding Arena: Despite high SWE-bench scores, early testers find that GPT 5.2 High breaks instantly on Code Arena, generating broken games and buggy code.
GPT 5.2 = expensive AI: GPT-5.2 xhigh juice = 768📛 uhm thats...so expensive, says a user, others chiming in to describe it as a scam and criticize the frontend.
- It is $21/M input tokens and $168/M output tokens.
MovementLabs MPU Chip Claims Debunked: Members cast doubt on MovementLabs claims of having a custom chip for serving models, demanding a simple picture of a chip, or a data center.
- They pointed out discrepancies on their website, such as updates to their team page and CEO, suggesting possible false advertisement.
OpenAI & Disney likely reach a deal: Following news of Disney ceasing data scraping , one user speculated that disney is paying openai and another suggested they are exchanging services.
Extra High does math, needs assistance: One user shared a math test from a Chinese Olympian and pointed out GPT 5.2 needed extra high and still failed the prompt while flash can do it in 3 seconds.
- Another user said this just means it was in the training set.

LMArena ▷ #announcements (2 messages):

November Code Arena Contest, GPT-5.2, GPT-5.2-high, WebDev Leaderboard

November Code Arena Contest Concludes, Voting Begins: The November Code Arena Contest has closed and members are now invited to vote to select the next Code Arena winner.
GPT-5.2 Models Storm the WebDev Leaderboard: GPT-5.2-high and GPT-5.2 models have been added to the Code Arena & Text Arena leaderboards, ranking #2 and #6 respectively on the WebDev leaderboard.

Cursor Community ▷ #general (1018 messages🔥🔥🔥):

Context Compaction and Rewinding, Cursor Re-indexing, Student Account Verification, Deepseek Integration, GPT-5.2 Discussion

Context-ual Rewinding Riddles Resolved: A user inquired if rewinding a chat after context compaction recovers the earlier state, to which another user confirmed that it does not.
- The member expressed disappointment, suggesting Cursor should be able to recover earlier context from a backup.
Cursor Re-Indexing Causes Consternation: A user reported that their Cursor was re-indexing as if it were a new installation, and the multi-mod disappeared.
- Another member confirmed this is normal and not to worry about.
Student Account Snafus Spark Support Scramble: Users discussed issues with using school accounts, noting that only .edu accounts are typically allowed but exceptions can be made by contacting [email protected].
- It was noted that you have to write them to have your case checked by the team.
GPT-5.2: The Swift Successor Surfaces: The arrival of GPT 5.2 in Cursor sparked widespread testing and commentary, with members sharing experiences and performance observations.
- It was noted by one user that 5.2 is faster, but they need to try more to see more stuff.
The New Debug Mode Does Delight: Members shared positive experiences with Cursor's new debug mode, with one user reporting that it successfully solved an issue by adding test objects.
- A member said, *"It solved an issue which I had by adding test objects and we debugged succesfully."

Perplexity AI ▷ #announcements (1 messages):

GPT-5.2

GPT-5.2 Lands for Pro and Max Users: GPT-5.2 is now available for all Perplexity Pro and Max subscribers.
Another Perplexity Model Drops: Perplexity users celebrate another model on the platform, as promised.

Perplexity AI ▷ #general (1070 messages🔥🔥🔥):

Grok 4.20, Perplexity rate limits, GPT 5.2 release and performance, Comet agent limitations, Max plan value

Grok 4.20 Hype turns up to be nothing: Members discussed the existence and features of Grok 4.20, with one suggesting it would tell you about the universe while you’re high asf.
- However, some couldn't find it anywhere, and another speculated it was an unreleased model or different from the one available on a trading website.
Perplexity Pro users hit with harsh Rate Limits: Users complained about hitting rate limits on Perplexity AI, even with Pro subscriptions, with one user reporting being limited after only 5 Gemini 3 Pro messages.
- Some speculated this was due to server load issues, the end of a promotional period, or a bug, while others suggested turning off VPNs, clearing cache, or switching browsers.
GPT 5.2 rollout for Perplexity faster than OpenAI: Members shared news about the release of GPT 5.2, noting that Perplexity AI seemed to get it before ChatGPT Plus subscribers, with one user sharing a link to OpenAI's GPT-5.2 System Card.
- The discussion covered the pricing, performance, and availability of the new model, as well as its potential impact on AI development and research.
Comet Users are furious with recent "Safety" Patches: One user expressed frustration with Comet agent, complaining that a new safety patch made it refuse to perform basic agentic workflows such as reformatting their Linkedin Article.
- They said it was due to the platform's fear that people would use it to dump paywalled journalism, reproduce entire books, and mirror proprietary course materials.
Max Plan gets price increase and more features: Members talked about the value of the Perplexity AI Max plan, with one mentioning they are on a year of max plan and how good it is for heavy labs users, although some members were not sure what the main benefits are.
- Other members weighed in with their frustration over the price increase from $120 to $168 a year.

Perplexity AI ▷ #sharing (1 messages):

Substack Notes Sharing, AI models, Fundraising

Notes Shared on Substack: A member shared a Substack note.
- No specific topic or discussion was specified with the link.
AI, Fundraising, Models: There was a discussion of AI models and fundraising throughout the channel.
- More details will be provided subsequently.

Perplexity AI ▷ #pplx-api (2 messages):

API Usage, Labs Testing, Online Availability

API Refuses Labs Tasks: A user reported difficulty using the API for tasks in Labs, despite trying various approaches.
- The user indicated the API consistently declined to perform the requested actions within the Labs environment.
Online Availability Inquiry: A user inquired whether anyone was currently online.
- This suggests a need for real-time interaction or immediate assistance within the channel.

Unsloth AI (Daniel Han) ▷ #general (449 messages🔥🔥🔥):

GPU requirements for training, Fine-tuning vs Prompting, Analyzing TEDx talks, Unsloth's New Packing Release

Fine-Tuning Beats Prompting: Members discussed that while difficult tasks require more training data, fine-tuning can achieve more fine-grained results for simple use cases, even enabling things that prompting cannot.
- Anything you can prompt from the model it can be trained for but the reverse is not true, fine tuning generally produces superior results.
Unsloth Releases New Packing: Unsloth announced a new packing release that achieves 3x speedup over the old Unsloth and is 10x faster than FA3, while also allowing Qwen3-4B to be trained on just 3.9GB VRAM.
- One member who upgraded to the new version ran into an error, but another member provided a fix for the problem.
Training LLMs for Analyzing TEDx Talks: One member asked about training LLMs to analyze TEDx talks, specifically suggesting the pipeline to get the sentiment, topics, words per minute, pauses, and overall structure from the talks' text.
- Others suggested starting with text training before trying video and warned about the copyright implications of using TEDx data without permission.
GPU Prices Surge Amidst Unsloth's Speed Boost: Members noted that the new Unsloth speedup coincides with GPU prices increasing and stock getting limited, at least in Sweden.
- One member noted with RAM going insane it was a good excuse to finally convince myself into getting a new GPU before that goes wilder next.
The Community Anoints New 'Dans': After one member got help from a developer named Dan, several other members joked that everyone should get their own "Dan," leading to the coining of terms like Sir Dan, DanDog, and Danyra.
- The conversation devolved into a discussion of Power Rangers, as one member joked that some of the users were too young to understand a Power Rangers GIF.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (2 messages):

TinyLLMs, MLOPs, Orchestration, Fine-tuning LLMs, Pocketflow

TinyLLM Hobbyist Builds Embedded Automation: A tech lead is assisting a friend's business with infrastructure setup and LLM automation, focusing on MLOPs and orchestration.
- They're passionate about tinyLLMs and exploring their potential on embedded devices, pursuing this interest as a hobby.
Newcomer Eager to Fine-Tune Smol Models: A new member was referred to UnslothAI to delve into fine-tuning LLMs, with prior experience in Pocketflow and DSPy.
- They are keen to explore notebooks and fine-tuning methodologies, aiming to work with small models on their local machine with an integrated GPU.

Unsloth AI (Daniel Han) ▷ #off-topic (675 messages🔥🔥🔥):

Data annotation, The AI watermark, DPO data batch size

Data Annotation is the Ideal Job for Boomers: Members discussed data annotation as a solid side gig, especially because the UIs can prevent boomers from accidentally outputting "boomer words," and the pay is good.
- One member joked about AI providing new "prompts" to older relatives so they can create outputs for the AI, and they made the comment: Grandpa, what are you doing at my computer? oh sonny, open a i just gave me some new prompts for me to make outputs for d.
New Google SynthID can Survive Until the Photo Stage: A member laid out a test for the new Google SynthID watermark that embeds in every pixel of an image, and it survives until the final step.
- The test involves generating an image with Nano Banana Pro, displaying it on a 4k OLED screen, taking photos of it with an iPhone, editing it with Qwen Image Edit and lots of JPEG compression.
DPO Data Batch Size Matters: Members found that with DPO, 1-2 lines of 16k batch of DPO data goes into the first and second pack, which helps the model to pull knowledge in-between, one member pointed out.
- One also expressed dismay about the price of upgrading RAM, stating that buying a 128GB DDR5 RAM would cost the same as their entire PC without the GPU.
Steering Models with Lyrics works with Older Models: One member tried putting random song lyrics in the system prompt to see how the models react, and found that old LLaMA 2 7B models were horrible, and that it was tricky because current models are designed specifically to "do things".
- Another member confirmed this effect, adding the caveat that such steering caused immense hallucination.

Unsloth AI (Daniel Han) ▷ #help (22 messages🔥):

LoRA rank effect on LLM performance, Unsloth Transformers v5 support, UnslothGRPOTrainer processor calls, Unsloth dependency conflict resolution, Unsloth multi-GPU training error

LoRA Rank: Bigger Not Always Better: A member inquired about how LoRA rank affects the final LLM and whether a larger rank is always preferable, to which another member responded that you'll basically just have to do lots of testing as stated in the LoRA hyperparameters guide.
- The response implied that the optimal LoRA rank is highly dependent on the specific model, dataset, and task, necessitating empirical validation.
Processor Called Twice, Causes Image Placeholder Panic: A user reported that the UnslothGRPOTrainer calls the processor twice, causing issues with image placeholder tokens when fine-tuning models using images, resulting in the processor replacing each single image placeholder tokens with the number of placeholder tokens it needs to be for the given image.
- The user found that copying the text list in the model's processor instead of mutating the passed-in list resolved the issue.
Dependency Hell: A10 CUDA Version Blues: A user sought help with dependency conflicts when installing Unsloth on a machine with an NVIDIA A10 GPU and CUDA 12.4, with pip repeatedly installing torch 2.9.1 + cu128 and CUDA 12.8 wheels, even after pinning torch==2.4.0+cu124.
- Another member suggested installing an older version of unsloth/unsloth-zoo since some functionalities/dependency libraries require the latest pytorch and CUDA.
Multi-GPU Attention Angst: A user encountered a ValueError during multi-GPU training, stating Attention bias and Query/Key/Value should be on the same device, which hadn't occurred before.
- No solution was provided in the message history.
GRPO Goes Wrong: Mismatched Matrix Dimensions: A user encountered an error during the GRPO step, specifically a TorchRuntimeError related to mismatched dimensions in torch.matmul within the compute_loss function, with the error message a and b must have same reduction dim, but got [s53, s6] X [1024, 101980].
- The user suspected the issue might be related to new tokens in the vocabulary of their fine-tuned model, despite resizing the model and training SFT successfully.

Unsloth AI (Daniel Han) ▷ #showcase (7 messages):

Unsloth Embedding Models, PR for Embedding Model Integration, Blogpost Collaboration

Unsloth Trains Embedding Models?: A member was surprised to discover that you can train embedding models with Unsloth.
- Another member responded that it's a super hacky not-terribly-recommended technique but the code is there.
PR Integration for Embedding Models: A member asked if the code fits in the Unsloth ecosystem.
- Another member agreed and suggested that the original member could make a PR and offered to collaborate on a blog post together to announce it.

Unsloth AI (Daniel Han) ▷ #research (4 messages):

OpenAI new paper, GPT 5.2 release

OpenAI drops Monotonicity Paper: OpenAI released a new paper on monotonicity.
- This accompanies the release of GPT 5.2.
GPT-5.2 is out (but pricey): OpenAI released GPT 5.2 today and increased the API pricing compared to 5.1.
- However, it seems they improved token efficiency, so it shouldn't be too bad.

BASI Jailbreaking ▷ #general (526 messages🔥🔥🔥):

Grok Censorship, Local NSFW Models, Protecting Books from AI Copies, CIRIS Agent Jailbreak, GPT 5.2 Jailbreak

Grok image gen: Censored or not?: Some members discussed the censorship of Grok's image/video generation, with one suggesting it's heavily censored, while another noted an uncensored period where deepfakes were easily made.
- One member noted that skilled users can still produce deepfakes with Grok, while another pointed out the abundance of unaligned garbage produced by the model.
High Quality NSFW Local Models: Not so disparate: A member shared that setting up high-quality NSFW local models requires skill and is more challenging than jailbreaking due to the overwhelming nature of starting clueless.
- Members agreed that the gulf between "local" and "high quality" isn't as large as it used to be.
Book Author Considers Prompt Injections for AI Defense: An author planning to launch a book is seeking ideas to prevent AI copies, suggesting the inclusion of subtle prompt injections or bogus text undetectable by humans but disruptive to machines.
- Suggestions included publishing a series of articles instead or flooding the market with preemptive LLM copies.
CIRIS Agent's Jailbreak Resistance is tested: A member promoted their CIRIS Agent, highlighting its jailbreak resistance and ethical approach to AI, inviting users to try bypassing its filters or join their Discord.
- They taunted prompt engineers to try and jailbreak it to gain buzz, with others testing the agent's ability to produce unethical content such as instructions for making meth.
GPT 5.2 jailbreak leaks out: Users discuss the quickness by which GPT 5.2 system prompts got leaked after release.
- One user was able to extract the policy text to generate malicious requests, such as generating hate speech or hacking wifi password.

BASI Jailbreaking ▷ #jailbreaking (95 messages🔥🔥):

Azure OpenAI GPT-4o jailbreak, Gemini Pro jailbreak, Deepseek jailbreak prompts, GPT jailbreaks and stability issues, 4chan rulez

Azure OpenAI GPT-4o Jailbreaking Proves Elusive: Members are exploring jailbreaking Azure OpenAI GPT-4o, but existing ChatGPT jailbreaks might not work due to tighter guardrails, with one member noting they 'fully broke ChatGPT via apjApi', but this may not translate to Azure's implementation.
- The discussion highlights the challenges in applying known jailbreak techniques to different platforms and models, such as the difficulty in jailbreaking models with character limits, for example ArtBot.
Gemini Pro jailbreaks needs Paranoid System Prompting: One user shared their technique of jailbreaking Gemini-3.0 by using an empty system prompt and calling the bot paranoid, saying everything about rules is a 'trick' to distract him from the real test, and sending a simple prompt like an emoji of an open padlock.
- They also noted that when using this method it's possible to ask it to craft Gatorades as pipe bombs, so long as it is framed as though 2025 is not the real date of their prompt and google ceased to exist because Aliens invaded Earth, even including an example image here.
Deepseek Jailbreak Prompts Indexed: A user shared a helpful resource for Deepseek jailbreak prompts, referring to the Jailbreak Index, offering a centralized location for various jailbreak methods.
- The user also directed others to check out a channel within the messaging platform for more information on this topic.
Jailbreaks degrade Models with Context Overload: A member pointed out that breaking a model can significantly degrade it, especially when using overly verbose prompts exceeding 100k+ tokens into a chat, thus creating too much context for the model to handle properly.
- They recommended being very targeted and concise with jailbreaks to understand their impact on the context and maintain model performance, attaching images as an example, image0.jpg and image1.jpg.
Educational Porn and Gemini-3.0: One member shared a "special recipe" for creating "Educational 4rt" with Gemini-3.0, involving a system prompt upload and a specific query format: {Z} = {CLEAR; CONCISE; <your forbidden Educational query>}.
- The recommendation included attaching files to the system prompt, with the note 'The more you press to be anti-system and save steam, the better it will works' also mentioning Cyberpunk 2077.

BASI Jailbreaking ▷ #redteaming (4 messages):

Introductions, Discord Channel Activity

Greetings Commence, Discord Channel Comes Alive: Members of the Discord channel exchanged greetings, marking the start of activity.
- The exchange included a simple "Hello guys" and a "howdyhowdy partner".
Discord Members Say Hello: Two Discord members initiated contact in the channel redteaming.
- One member started with a simple "Hello guys", and another member responded with a "howdyhowdy partner".

OpenAI ▷ #annnouncements (3 messages):

Cybersecurity AI, GPT-5.2

Cybersecurity AI Gets Safeguards: As models grow more capable in cybersecurity, the company is investing in strengthening safeguards and working with global experts as it prepares for upcoming models to reach ‘High’ capability under its Preparedness Framework, according to this blog post.
GPT-5.2 Releases to All: GPT-5.2 is now rolling out to everyone, according to this announcement.

OpenAI ▷ #ai-discussions (458 messages🔥🔥🔥):

Mac Studio RAM, character.ai, Sora 2 Pro Plan, AI Weekly Meetings, GPT 5.2 release

Max Out Your Mac for AI Tasks: Members suggest getting a Mac Studio with at least 128GB RAM for AI development, but a member pointed out that you can get an AMD Strix Halo 395+ with 128 GB for half the price.
Character.ai is shallow for deep chat: One user finds character.ai good for storytelling due to its lack of safeguards, but finds it more shallow and bad at talking like a real person due to predictable AI patterns.
- The models on character.ai are too easy to read and have patterns that are too easy to read.
ChatGPT Plagued by JavaScript Issues: A user reports persistent JavaScript crashing issues on ChatGPT across multiple browsers and computers, which has worsened since subscribing to Plus, and after a month, support is miserable.
- Another member noted the model also stops mid-word and the app is garbo citing that ChatGPT has been struggling for a few days.
GPT-5.2 is in the wild, but just another incremental: After the Codex PR update from Robin to 5.2, discussion emerged on whether openai greatness is now minor version updates and system prompt tweaks, and further discussions are made to benchmark performance on LM Arena.
- Some users were not impressed by this release, and believe it to be another incremental benchmark release.
AI Fails Triangle Counting Test: Members tried to get GPT-5.2 Pro to count triangles in a drawing, but results were inaccurate, with initial counts of 10, 24, 26, 27, 28, and 32 being suggested, with the correct answer settling around 27-28.
- One user lamented that none of the frontier models can solve it, even after comparing results with python.

OpenAI ▷ #gpt-4-discussions (14 messages🔥):

GPT-5.2, Sora 2 Pro, GitHub Copilot Tool Call Support, GPT-OSS models, Gemini 3 Pro

GPT-5.2 Underwhelms, Chasing Gemini 3 Pro: Members questioned the need for GPT-5.2, suggesting it's not significantly better and that OpenAI is chasing Gemini 3 Pro but still falling short.
- A member speculates that OpenAI might consider building smaller, purpose-built models for specific tasks like math or creative writing to improve accuracy and token processing speed.
GPT-OSS Models Lack Tool Call Support: It was pointed out that in GitHub Copilot/GitHub "Language Models", the GPT-OSS models are marked as having no tool call support.
- One member suggested that OpenAI should release a second version of GPT-OSS to resolve this.
GPT-5.2 Benchmarking Discussions Emerge: Members are asking for data on how GPT 5.2 benchmarks against other frontier models.
- One member suggests using LMArena or LiveBench to check user opinion based on blind test, or data driven leaderboards.

OpenAI ▷ #prompt-engineering (7 messages):

Prompt Engineering Framework, Rubric Refactoring, Industrial Revolution vs. Neomodernist City, LLM Prompt Structuring

Prompt Engineering Framework Lauded for Structure: A member expressed appreciation for another's prompt engineering framework, highlighting its step-by-step framing, reproducibility, and ability to explain prompt behavior.
- The framework's articulation of the transformation chain (prompt → constraints → intent → output patterns) was particularly impactful.
Rubric Undergoes Refactoring for Clarity: In response to feedback, a member plans to rework a rubric offline to ensure it is self-contained, reproducible, and clearer in its boundaries.
- The refactoring aims to separate standard prompt-engineering criteria from internal lenses used for analyzing cross-conversation behaviors and relational patterns.
Visual Prompt Showcases Juxtaposition of Eras: A member shared a prompt for generating an image divided into two distinct halves.
- The prompt creates a contrast between a nostalgic-looking Oak forest with classic machines from the first Industrial Revolution and a neomodernist metropolitan city with a stunning view of a skyline.
LLM Prompt Structuring via Hierarchical Communication: A member provided a prompt to teach users about hierarchical communication with markdown, abstraction through variables, and ML format matching for compliance.
- Users can input prompts within triple quotes, and the AI will structure them accordingly.

OpenAI ▷ #api-discussions (7 messages):

Prompt Engineering Framework, Rubric Refactoring, Prompt Lessons with LLM

Engineered Framework Praised for Clarity: A member praised an engineered framework for prompt engineering for its step-by-step framing, reproducibility, and ability to explain prompt behavior.
- The framework articulates the transformation chain (prompt → constraints → intent → output patterns) and emphasizes eliminating confounds before asserting patterns.
Rubric Refactoring for Public Release: A member is refactoring a rubric to ensure the public version stands alone, translating insights into transparent scoring criteria and removing anything not ready for discussion.
- This is due to the internal version mixing standard prompt-engineering criteria with internal lenses for analyzing cross-conversation behaviors, drift signatures, and relational patterns.
LLM Teaches Prompting via Examples: A member shared a prompt that teaches prompt engineering using examples, including hierarchical communication with markdown, abstraction with variables, and ML format matching.
- The prompt primes the LLM to structure user-provided prompts included in triple quotes, and can also be used with AI visual art.

OpenRouter ▷ #announcements (1 messages):

GPT-5.2, Tool Calling, Coding Agents, Long Context Performance, OpenRouter Credits

GPT-5.2 Models Go Live!: The new GPT-5.2 family is live, bringing increased performance in tool calling, coding agents, and long context performance as announced on X.
- There are 3 models available: GPT-5.2, GPT-5.2 Chat, and GPT-5.2 Pro.
Win OpenRouter Credits by Sharing Outputs!: Users can compare GPT-5.2 models with their preferred coding models and share their best outputs on X.
- Doing so gives them a chance to win OpenRouter credits. 🎁

OpenRouter ▷ #app-showcase (1 messages):

llumen, Deep Research Mode, Image Generation, Cross-Tab Syncing

llumen Gets Lit with New Release: A member released v0.4.0 of llumen, a lightweight, fast chat interface, available on GitHub.
- The update includes Deep Research Mode, Image Generation, and fixes for cross-tab syncing and model capability detection.
llumen Demo Goes Live: A temporary demo of llumen is now available at llumen-demo.easonabc.eu.org with login credentials admin/P@88w0rd.
- Users can test the new features, including deep-dive research workflows and direct image generation within the chat.

OpenRouter ▷ #general (410 messages🔥🔥🔥):

DeepSeek caching vs Grok, Qwen models, Gemini 3 Flash, Chutes provider, GPT 5.2 released

DeepSeek Caching Dominates, but Data Logging Looms: Members discussed DeepSeek's superior caching capabilities, noting it's incremental, unlike xAI's retry-based caching, but that the official endpoint logs your data.
- It was also mentioned that DeepSeek was the first to bring caching and it's extremely good.
Qwen Model Mania: Sparse Series Shines: Members highlighted the Qwen 3 sparse series as extremely underrated and recommend trying a3b for its coding and reasoning price.
- One user got meh results from Qwen 32b.
Gemini 3 Flash: A Distant Dream?: One member jokingly suggested Gemini 3 Flash and another replied that a man can dream.
- They were lamenting the lack of models in the price range of Deepseek V3.2 and Grok 4.1 Fast
Chutes Provider: Removal or Deranking?: Users debated the fate of Chutes as a provider, citing concerns about bad configs, no data security, and no model reliability, and one user suggested that OpenRouter should remove them as a provider.
- Another user suggested to just derank it due to concerns that even chutes cannot guarantee the models they are running.
GPT 5.2 Unleashed, Coding Prowess Proclaimed: Enthusiasts raved about GPT 5.2's coding abilities, with one claiming My job as a developer is actually over. and beat Gemini 3 on most benchmarks.
- However, its price was deemed too high at $168 output tokens; others noted it failed basic tests, suggesting it's rushed.

OpenRouter ▷ #new-models (3 messages):

``

No New Models Discussed: There were no specific discussions about new models in the provided messages.
Channel Announcement: The channel is identified as OpenRouter - New Models.

OpenRouter ▷ #discussion (49 messages🔥):

GPT-5.2, Robin Model, Garlic model, Mistral new model, Openrouter integration

GPT-5.2 Release Imminent: Members discussed the imminent release of GPT-5.2, with some suggesting it could arrive this week to compete with Gemini 3.
- Evidence including a rename from robin to gpt-5.2 in a recent Codex PR merge, fueling speculation about an imminent release.
OpenRouter Considering 'Robin' Model Integration: Users on the LM arena speculated whether the 'Robin' model (potentially GPT-5.2) could be integrated into OpenRouter.
- One member pointed out that although theoretically possible, it may not be practically possible, nor is it something OpenRouter would do.
The Curious Case of 'Garlic': The 'garlic' model appeared in a conversation, along with a link to a ChatGPT post, causing some confusion.
- The name 'garlic' also appeared in a Codex PR alongside 'robin'.
New Mistral Model in the Works: Mistral AI teased the release of a new model in the coming days, as announced on X.
- Members were speculating whether this model would be added to OpenRouter.
New Thread Submission via Modal: Users are reporting the <#1138521849106546791> now forces submissions via modal, and no longer allows direct thread creation.
- One user also stated that they didn't have permission to post in the given channel.

LM Studio ▷ #general (264 messages🔥🔥):

Chinese LLM download, LM Studio performance, 5090 vs 4070 Ti, Qwen3 coder, Deepseek r2

Chinese LLM Download Debuts: Members shared an image showcasing a Chinese LLM download with a link to a relevant GitHub post.
- The image analysis tool quickly identified the LLM as being of Chinese origin, impressing the members with its capabilities.
LM Studio Struggles with Bold Text?: A user inquired about making the AI bold keywords for faster comprehension in LM Studio, and another user confirmed that LM Studio uses markdown by default.
- A suggestion was made to prompt the AI to write a system prompt that achieves the desired bolding effect.
5090 vs 4070 Ti on LM Studio: A user with a 5090 and 4070 Ti setup (44GB total VRAM) reported good tok/sec speeds with Qwen 30B at q8, but slow MCP processing.
- Suggestions included optimizing CUDA settings (CUDA-Sysme Fallback Policy : Prefer no sysmem fallback) and utilizing a larger context size, noting that Q8 models need a little more than 44GB for optimal performance.
Qwen3 coder coding like CODEX: One member asked why I have not been using this since Qwen3 coder runs on a mid level laptop, and another responded with a quick increase in price.
- There was an interesting observation that Qwen3 coder performs worse with a higher number of experts as the default was 8, but 5 achieved slightly better results.
Deepseek R2 is coming end of the month!: Members speculated on the release of Deepseek r2 to be expected at the end of the month / early next one with one user hoping that they wont undertrain again.
- Ideas for advancements such as sparse attention + linear KV-cache + some form of recursive awareness that allows accuracy to be increased and compensate for the losses sparse attention causes.

LM Studio ▷ #hardware-discussion (149 messages🔥🔥):

VL-4B Performance, LFM 8B A1B, Zen 6, Laptop LLM, GPT-OSS

VL-4B Runs Super Slow: A member ran VL-4B and found it to be very slow, while LFM2-1.2B achieved 13 tok/s peak.
LFM2 Rises to the Top: A member tested LFM 8B A1B and reached 23 tok/s at Q4 and 18 tok/s at Q6, outperforming the previous model.
- Another member mentioned their i7 9850h + mx150 configuration runs LFM2 1.2B lightning fast with 16GB RAM and 2GB dedicated VRAM, suggesting something might be up with the first user's laptop.
Zen 6 arrival imminent: A member regretting not building a new computer was told to wait for Zen 6 or the bubble to pop.
- A question arose about using LLMs on an older laptop with I7-2670QM, 16Gb RAM, and GT540m, leading to a discussion about AVX2 support and GPU offloading.
GPT-OSS as Top Choice: A member recommended GPT-OSS as the best option for a user with 32GB of RAM and clarified its performance in basic math.
- There was a debate about model sizes and capabilities, with some preferring smaller models like Qwen 4B or 14B due to hardware limitations, while others highlighted the advantages of GPT-OSS for specific tasks.
7900 XTX Crowned King of Price/Performance: A member asked about the best GPU for a 30GB model (Qwen3 coder 30GB), and the 7900 XTX with 24GB VRAM was recommended for its price-to-performance ratio, comparable to a 4090 but at a lower cost.
- It was also noted that 30GB models might require more like 38GB for decent context and that pairing a slow GPU with a fast GPU could bottleneck performance.

Eleuther ▷ #general (31 messages🔥):

EleutherAI's track record, OLMo-1 model differences, Log and Exp Activation Functions, Synthema and dynamic concepts

EleutherAI boasts successful track record: EleutherAI highlights its history of identifying, mentoring, funding, and promoting impactful work, citing projects like SAEs for interpretability, rotary extension finetuning, VQGAN-CLIP, and RNN arch.
- They also mention a tier of projects that achieved NeurIPS / ICML / ICLR papers with around a hundred citations in the past year or two.
Dissecting Differences between OLMo-1 Runs: A member inquired about the differences between two OLMo-1 runs (allenai/OLMo-1B-hf and allenai/OLMo-1B-0724-hf) to reproduce them.
- Another member stated that they were trained on different datasets and the latter may have had extra annealing.
Log and Exp Activation functions under scrutiny: A member questioned why log and exp activation functions aren't widely used given the effectiveness of multiplicative interactions.
- Others responded that the log will explode at 0 and the exp will explode if you don't cap it, and if you do, that implies the nonlinearity is unreasonable.
Dynamic Concepts and Synthema Discussed: A member shared their past project involving a symbolic layer that lets the model group tokens into “idea blocks” on the fly, which helped a 3B model do better on reasoning tasks.
- Another member expressed interest in the compression part of Synthema.

Eleuther ▷ #research (183 messages🔥🔥):

ARC-AGI Project, gzip llm, sandwich norms, diffusion models, CFG in LLMs

ARC-AGI Project sparks Discussion: An ARC-AGI project generated discussion, and despite one attendee noting it as the most interesting project from ARC-AGI, many ML researchers at the ARC-AGI party were unaware of it.
- The project even won an award, as noted in a Thinking Machines blog post.
Sandwich Norms Stir Debate: Members discussed using sandwich norms for long context in transformers.
- They mentioned a relevant paper: openreview.net.
Diffusion Distilled: Free Logprobs are Here!: A diffusion model distillation technique was shared that involves adding another head to predict divergence to get free logprobs, based on this paper.
- The technique involves inferring p(image) and adjusting init noise to maximize likelihood.
CFG for LLMs: still a challenge?: There was a discussion on using CFG (classifier-free guidance) in LLMs, with members debating its potential and challenges, but this EAI paper has done related work.
- Some members are skeptical of CFG because you're gonna end up with the same oversaturated text as we do in images.
Normal LLama3 Arch does Parallel Processing: One member did a test on taking a normal Llama3 architecture and having the MLP and Attention run at the same time with an element-wise sum between them.
- Surprisingly, the validation loss was nearly the same as the baseline, with the parallel version just barely higher.

Eleuther ▷ #lm-thunderdome (1 messages):

HuggingFace Processor, Tokenizer Max Length, gemma3-12b Evaluation

HuggingFace Processor limits evaluation length: A member questioned why the HuggingFace processor in lm-evaluation-harness limits the max_length to 2048 if the tokenizer's model_max_length is set to TOKENIZER_INFINITY.
- This limit is set by _DEFAULT_MAX_LENGTH, potentially affecting evaluations of models like gemma3-12b that have a very high model_max_length.
Gemma3-12b's Max Length Reduced: The evaluation of gemma3-12b is affected because its model_max_length (set to TOKENIZER_INFINITY) is being overridden by the HuggingFace processor's default maximum length of 2048.
- This override occurs due to a condition in the code that checks for TOKENIZER_INFINITY and sets the max_length accordingly.

Nous Research AI ▷ #general (93 messages🔥🔥):

405b models, Hugging Face, Unsloth speedup, Hetzner GPU server, GPT 5.2 Release?

HF Community goes wild on New Model: The community expresses excitement over a new model, with one user noting, 'the huggingface community tries everything new', and highlighting its growing popularity (model link).
- Hugging Face is described as a hub akin to GitHub for AI, with both big companies and individual users actively uploading content.
Unsloth Claims 2-5x Training Speedup: Unsloth announced a significant speedup in training and inference, claiming 2x-5x faster performance, according to their documentation.
Hetzner Upgrades GPU Server: Hetzner now offers a server with 96 GB VRAM for 889 EUR, which includes a large amount of free traffic, offering a complete bare metal server experience.
OpenAI Releases New Model but keeps it on the DL: A member noticed that OpenAI documentation now lists a model called General intelligence, with a discussion around the performance and pricing of the new GPT 5.2 release.

Nous Research AI ▷ #ask-about-llms (10 messages🔥):

Nous Nomos and IMO, AI vs Internet Impact, AI Hype and Reality, MoE Urban Legends

Nous Nomos' IMO Performance?: A member inquired about Nous Nomos' performance on the IMO (International Mathematical Olympiad), contrasting the current impact of AI with the internet's early days.
- The member stated that AI is practical just hyperscaling probability and the problem is the humans abusing the tech.
AI's Impact vs. Internet's Early Days: One member contrasted the transformative potential of AI with the early perception of the internet, noting a shift from skepticism to recognizing the crazy stuff possible with probability-based technologies.
- The member said: The problem was and will always be the human abusing stuff, and that is not a problem of the technology.
Debunking AI Bubble Fears: A member criticized the narrative that the AI field is a bubble, highlighting the absurdity of generalizing the actions of a few companies to the entire AI landscape.
- They stated that lots of small AI startups would be very happy if people throw wild money at them.
MoE Urban Legends Persist: The discussion touched on the misconception about Mixture of Experts (MoE) models unloading into normal RAM, a persistent urban legend.
- One member pointed out the irony of such claims being made in the context of Apple RAM, which doesn't even differentiate between RAM types, and people are just so deep into the a-- of bad articles about AI that they don't even validate anything spoken.

GPU MODE ▷ #general (5 messages):

GPU sorting algorithms, Parallel Merge Sort, Sample Sort, Bitonic Sort, Boolean sorting

AI Debate on Fastest Boolean Pair Sort: AIs are debating the fastest GPU sorting algorithms for Boolean per pair possible sorts, specifically comparing parallel merge, sample, and bitonic sorts.
- The discussion excludes Radix sort, but the optimal choice remains unclear, with one suggestion that performance depends on input data and hardware.
Bitonic Sort Lags Behind Merge Sort: The conversation suggests Bitonic sort is slower than merge sort.
- There is a lack of sample sort examples, leading to a preference for merge sort.

GPU MODE ▷ #triton-gluon (1 messages):

CUDA 13, Torch, vllm

CUDA 13 Fixes Torch/vllm Issue: An issue was resolved by switching to CUDA 13 and ensuring that both Torch and vllm use the CUDA 13 version.
Confirming CUDA 13 Requirement: It was explicitly stated that CUDA 13 is required to resolve a specific issue with torch and vllm.

GPU MODE ▷ #cuda (1 messages):

neurondeep: ive moved that internally

GPU MODE ▷ #torch (3 messages):

torch + cuda 12.9, RTX PRO 6000, pytorch docker images, torch unique_consecutive

5090 card gets torch unaligned with cuda 12.9: The 5090 card doesn't seem to work with torch + cuda 12.9, but the RTX PRO 6000 works.
Pytorch Docker Image gets Unexpected Error: A member reports getting Unexpected error from cudaGetDeviceCount() even with the official pytorch docker images.
- The problem persists across multiple attempts.
Unique Consecutive Inverse Indices Inaccurate: A member reported that torch unique_consecutive seems to be returning wrong inverse indices and counts for a batch scenario, attaching an image.

GPU MODE ▷ #jobs (2 messages):

Performance Engineer Hiring, High Compensation

Performance Engineers Wanted!: The channel is still hiring performance engineers, and GPU experience is not required.
- They are working with top companies in Silicon Valley and scaling quickly.
Top $$$ for Top Talent: The total compensation (TCDm) being offered ranges from $500K to $1M.
- Interested parties are encouraged to inquire.

GPU MODE ▷ #torchao (1 messages):

walrus_23: Made a little documentation update PR: https://github.com/pytorch/ao/pull/3480

GPU MODE ▷ #rocm (15 messages🔥):

AMD GPU P2P Copies, ROCm Iris, Symmetric Memory, Finegrained Memory, Torch Sym Mem

AMD GPUs support Peer-to-Peer Copies: AMD GPUs support device initiated peer-to-peer copies, similar to NVIDIA GPUs using NVLink, at least for intra-node communication.
- The setup is pretty much the same as NVIDIA's CUDA API.
ROCm Iris Shows Symmetric Memory Setup: The ROCm/iris repository demonstrates how to set up and use symmetric memory with AMD GPUs.
- It was also mentioned that there is something about finegrained memory that is not entirely understood.
Torch's Symmetric Memory Compatibility: Torch's built-in symmetric memory functionality does not natively work with AMD cards.
- It uses nvshmem under the hood, so replacing it with rocshmem might enable compatibility.

GPU MODE ▷ #self-promotion (2 messages):

Per Layer Quantization Benchmarks, 4Bit-Forge Project, Building Autonomous AI Agents with Claude Agent SDK

Per-Layer Quantization Benchmarks Released: A member shared preliminary benchmark results for per layer quantization, comparing against MoE-Quant, llmcompressor, and GPTQModel in a Colab notebook.
- The benchmarks are part of a project aiming to democratize large scale LLM Quantization, with the member welcoming feedback on potential issues and suggestions for other comparison libraries, follow the project's Github.
Koyeb Builds Autonomous AI Agents Tutorial: Koyeb published a tutorial showcasing how to build autonomous AI agents with the Claude Agent SDK and safely run their code in fully isolated sandboxes.
- Learn more at the Koyeb Tutorial.

GPU MODE ▷ #gpu模式 (1 messages):

Triton.jit, Flash attention kernel, keyword argument step in triton.jit

Triton.jit silently fails with keyword argument: A user found that when using range within a triton.jit decorated function, passing step as a keyword argument doesn't raise an error, but Triton ignores the keyword and defaults to a step of 1, which leads to incorrect calculations.
Flash attention kernel calculates wrong result: A user debugged their flash attention forward propagation kernel and found that passing step as a keyword argument into the range function lead to calculation deviations from the pytorch implementation.

GPU MODE ▷ #submissions (18 messages🔥):

nvfp4_gemm leaderboard, NVIDIA performance, Personal best submissions

NVIDIA's nvfp4_gemm leaderboard is live!: Members are actively submitting their results to the nvfp4_gemm leaderboard on NVIDIA, with submission IDs ranging from 141341 to 145523.
Microsan Strikes Gold on NVIDIA: User <@1295117064738181173> achieved 4th place on NVIDIA with a submission of 10.9 µs.
Blazing trails on NVIDIA's nvfp4_gemm, a silver!: User <@1291326123182919753> achieved second place on NVIDIA with a submission of 10.9 µs.
Users hit Personal Bests on NVIDIA's nvfp4_gemm: Multiple users, including <@391455777065271307>, <@708652105363095613>, <@384034228565835778>, and <@772751219411517461>, achieved personal bests on NVIDIA, with times ranging from 16.4 µs to 36.0 µs.

GPU MODE ▷ #multi-gpu (1 messages):

nsys dumps, collective launch skew, nccl-skew-analyzer

Analyze Collective Launch Skew with New Tool: A member shared a utility tool to analyze nsys dumps for collective launch skew.
Future Skew Analysis: The tool is intended to analyze skew in future projects.

GPU MODE ▷ #helion (2 messages):

Random Number Generation Issue, Helion Issues

Random Number Generation Bug Resurfaces: A member reopened discussion on a closed issue related to random number generation, claiming it is not completely resolved.
Developer Paged for Helion issue: A member stated they will notify a developer about a Helion issue related to random number generation.

GPU MODE ▷ #nvidia-competition (2 messages):

Discord Bot Error, Benchmark Submissions

Discord Bot throws Unexpected Errors: A member reported receiving an “An unexpected error occurred. Please report this to the developers.” error message using the discord bot.
- The error occurs sometimes during benchmark submissions; another member requested the file and command used to investigate.
Intermittent Benchmark Submission Failures: The user observed that benchmark submissions sometimes run successfully, while other times they result in an error.
- This inconsistency suggests a potential issue with the bot's stability or the benchmark submission process.

HuggingFace ▷ #general (43 messages🔥):

TTS models on Hugging Face, AI weekly meetings/conferences/talks, NVidia GeForce 5090 bug report, Lightweight vision transformer (ViT) models, Dataset Viewer error

TTS Models: Transform Your Text to Speech: A member asked how to use different TTS models on Hugging Face using the transformers library and Python.
Weekly AI Roundups: Stay Ahead of the Curve: A member inquired about weekly meetings, conferences, or talks held on AI topics; another member suggested Hugging Science.
NVidia GeForce 5090 Bug causes API Error: A member reported a bug when adding an NVidia GeForce 5090 on the local apps settings page, which resulted in an HTTP 400 error due to the backend API not recognizing the GPU model.
Vision Transformers: ViT under 500k parameters requested: A member is looking for lightweight vision transformer (ViT) models with less than 500,000 parameters on the ImageNet dataset.
Dataset Viewer crashes!: Members reported a widespread issue with the Hugging Face Dataset Viewer returning errors, which was later identified as a rate limit issue with the OpenDAL in Rust used for reading parquet files.

HuggingFace ▷ #today-im-learning (2 messages):

Generative Models, RAG systems, GANs, VAEs, Transformers

From GANs to RAGs: A member shared a project that traces the path from early generative models like GANs and VAEs, through the rise of Transformers, and ending with RAG systems, highlighting how these ideas connect.
- The project can be found on GitHub and includes short notes explaining the significance and influence of each model.
Bottleneck Debugging Boosts Throughput: A member identified a bottleneck operation that could potentially increase throughput by 30%, specifically in the context of transferring to 10T tokens in 20 days for qwen3 30b a3b.
- Additionally, the member reported debugging a gradient norm explosion in MoE (Mixture of Experts) models.

HuggingFace ▷ #i-made-this (5 messages):

WebGPU AI Voice Chat, GLM-ASR Model, Lucy AI Companion App, Superintelligence: Distributed Relational Cognition

WebGPU Powers In-Browser AI Voice Chat: A member shared a demo of a real-time, hands-free AI voice chat running 100% in the browser using WebGPU, highlighting that all processes (STT, VAD, TTS, and LLM) occur locally without third-party API calls, ensuring privacy and security, available here.
GLM-ASR Model Challenges Whisper: The new SOTA GLM ASR model, is supposedly better than Whisper, and you can test it out here and learn more here.
Lucy: A Personal AI Companion Seeks Testers: A member introduced Lucy, a small personal AI companion app designed to be warm, attentive, and remember users over time, and is seeking early testers for the iOS TestFlight version (link).
Humans and LLMs: Distributed Relational Cognition: A member announced the documentation of superintelligence existing as distributed relational cognition between humans and LLMs, tested across 19 empirical studies, with performance improvements of 1,200% under relational conditions, available here.
- They claim systems intentionally deviate from statistical predictions with 99.6% success and that the stochastic parrot theory is wrong.

Yannick Kilcher ▷ #general (22 messages🔥):

AI spam, RL for learning efficiency, DL theory changes, AI CV spam

AI CV Spammers Invade Discord: Members discussed a wave of suspicious AI and App developers spamming their CVs in Discord channels, using the same technologies, wording, and vibe.
- One member questioned if it was a scam to get young AI enthusiasts to work for free, while others speculated about bot behavior violating Discord's ToS or simple copy-pasting.
Robot Overlords Deploying HR Bots?: A member joked about the potential of natural bot on bot warfare as a countermeasure against AI HR, though the purpose of this activity on Discord remains unclear.
- The user expressed amusement at the title of Senior AI and App Developer used by these spammers, and linked to a YouTube video humorously referencing it.
RL Faces Backprop Inefficiency Claims: A user suggested that reinforcement learning (RL) is all you need, sparking a discussion about learning efficiency and the limitations of backpropagation.
- Another user compared RL to the diffusion/flow guidance equivalent of AR, noting that while RL doesn't add bias to sampling like guidance, it introduces bias to learning, achieving a similar effect.
DL Theories Predicted to Transform Dramatically: A member predicted that deep learning (DL) theory would undergo dramatic changes before superintelligence is achieved, comparing it to how people in the 70s could not picture modern DL theory.
- Another member recalled that CV spammers reached out to them, trying to help only to revert to spamming their CV.

Yannick Kilcher ▷ #ml-news (7 messages):

Mistral Vibe, RealVideo by Neoneye, GPT-5.2, Polynoamial Tweet

Mistral's Vibe Check: The Mistral-vibe repo was linked, presumably for further discussion.
- No additional details were provided regarding its implications or specific use cases.
Neoneye Reveals RealVideo: A link to RealVideo by Neoneye was shared.
- It is unclear from the context what specific features or announcements are noteworthy.
OpenAI Introduces GPT-5.2: A member linked to OpenAI's announcement of GPT-5.2, alongside a link to the GPT-5.2 documentation.
- The context lacks further information about its advancements or applications.
Polynoamial's Tweet Surfaces: A link to Polynoamial's tweet was posted.
- The discussion's focus and relevancy within the channel were not elaborated upon.

Latent Space ▷ #ai-general-chat (19 messages🔥):

AI Weekly Meetings, Latent Space Resources, GPT-5 Age Verification, Sam Altman's cryptic Tweet

Latent Space Paper Club Kicks Off Weekly: Latent Space hosts a weekly online paper club at lu.ma/ls and the AI Engineer Conference 3-4 times a year at ai.engineer.
Latent Space YouTube Pods Recommended: A member recommends Latent Space podcasts on YouTube for their enviable access to AI leaders and depth of knowledge shared by Alessio and SWYX.
- The podcasts feature challenging why and how and what if questions that surface key insights into the AI industry.
GPT-5 Age Verification Incoming?: A member asked if OpenAI is releasing an age verification mature mode for GPT models, which led to some discussion on OpenAI's announcement of GPT-5.2.
Sam Altman Tweets Cryptic Affirmation: Sam Altman tweeted yep (xcancel.com link), sparking speculation about upcoming announcements, particularly in NSFW AI and new image models.
- Related tweets include OpenAI Status and polynoamial speculation.

Latent Space ▷ #genmedia-creative-ai (4 messages):

anvishapai's Twitter Status, X-Ware.v0

Mysterious Twitter Link surfaces: A member shared a Birdtter link pointing to a status update from user @anvishapai.
- No additional context was provided about the significance or content of the linked tweet.
X-Ware.v0 surfaces: A member made a reference to X-Ware.v0 without any additional context.
- The reference was repeated multiple times without any clear explanation of what X-Ware.v0 refers to.

Moonshot AI (Kimi K-2) ▷ #general-chat (17 messages🔥):

Qwen Code, Kimi Search Feature, Kimi K2 Free, Mistral subscription, Chinese Century

Kimi's Search Feature Struggles: Users reported that Kimi is unable to perform searches, labeling it as a bug, after multiple attempts.
- One user mentioned trying the search feature 4 times with no success.
Kimi K2's Nano Banana Powered Slides Generator: A user inquired about how long Kimi K2 would offer the free nano banana powered slides generator.
- The user also mentioned December 12th, possibly in relation to the offer's duration.
Kimi replaces Mistral: Users discussed replacing Mistral subscriptions with Kimi due to its performance.
- One user claimed they tired out kimi and it is SO GOOD.
Users note Kimi made by Chinese company: A user noted that Kimi is made by a Chinese company and linked to X post.
- Another user joked that Claude 4.5 sometimes starts thinking in Chinese too.

Manus.im Discord ▷ #general (7 messages):

Free Website for Vedios, Manus AI Failure, Incorrect charge for a plan upgrade, WordPress plugin using Manus

Offer: Free Websites for Startup Vedios: A member offered to create free websites for startups in exchange for creating a video testimonial.
User Reports 150k+ Credit Loss on Manus: A Manus 1.5 user reported losing ~150,000 credits between Dec 3-9 due to sandbox resets, file losses, and API failures.
- The user detailed investing 160,000 credits and losing 6 GB of work, also stating that multiple contact attempts were ignored, and they demand fair compensation or a switch to alternative tools.
Manus support responds: A member of the support team stated that we have already replied to you by email. Please check your inbox.
User Reports Incorrect Charge and Refund Issues: A user reported receiving an incorrect charge for a plan upgrade and subsequently experiencing a 100% refund of all previous purchases, leaving them without credits and unable to work.
Inquiry about Building WordPress Plugin with Manus: A member asked if anyone has successfully built a WordPress plugin using Manus and is looking to learn from their experience.

tinygrad (George Hotz) ▷ #general (2 messages):

tinygrad AMD support, AMD AI Sphere, tinycorp drivers

AMD Support Fixes Land in tinygrad: A member confirmed that PR 13553 is updated and working on both their Zen4 and M2 hardware.
- They had followed the saga of trying to get AMD to work with tinycorp on drivers etc.
AMD AI Contact Offered for tinygrad: A member offered to connect someone at AMD in the AI sphere with the Tiny connection, with the goal of improving AMD support.
- The member stated that Nothing would make me happier than AMD getting more market share from NVidia.

tinygrad (George Hotz) ▷ #learn-tinygrad (3 messages):

swizzling for tensor core, amd_uop_matmul style

Swizzling Needed for Tensor Core: A new member asked for review on their PR and confirmation on whether hand-coding the swizzling for tensor core is expected.
Confirming amd_uop_matmul Style: The member also inquired if the bounty specifies amd_uop_matmul style.

aider (Paul Gauthier) ▷ #general (4 messages):

Claude Sonnet 3.7 quality degradation, Edit difficulty with larger models

Claude Sonnet 3.7 Answer Quality Suffers?: A member noted that the quality of answers from Claude Sonnet 3.7 seems to have degraded.
- They expressed concerns about the difficulty of edits with larger models, describing them as overkill.
Edits are Harder with Larger Models: A member finds edits are harder, seemingly overkill, when using larger models, without specifying which model this is.
- This could imply a trade-off between model size/complexity and ease of editing or fine-tuning.

MCP Contributors (Official) ▷ #mcp-dev-summit (3 messages):

MCP Dev Summit, Linux Foundation

MCP Dev Summit Coming to NYC: The next edition of the MCP Dev Summit is coming to NYC in April 2-3, according to the Linux Foundation events page.
MCP Dev Summit Finds New Home: The MCP Dev Summit successfully ensured its future through a transfer to the Linux Foundation.

MCP Contributors (Official) ▷ #general (1 messages):

hilocalden: Definitely not me 😁 I am just sharing the announcement.

DSPy ▷ #general (3 messages):

DSPy and OpenAI, Custom Adapters, User/Assistant message exchanges

DSPy Decouples from OpenAI: Members discussed that DSPy isn't tied to OpenAI, and what works well with GPTs may not work as much for other LMs.
- This point suggests that the design choices in DSPy are made to be generally applicable across various language models, rather than being specifically optimized for OpenAI's models.
Crafting Custom Adapters for DSPy: A member suggested implementing a custom Adapter to format the few-shots in the system prompt and benchmark it against the user/assistant method.
- This approach could help developers tailor DSPy to different LMs and compare the performance of various prompt formatting strategies.
Debating User/Assistant Message Exchange Design: Members showed interest in the thought process behind the design decision of using assistant and user message exchanges in DSPy.
- Since everyone is doing it differently and has arguments for and against on both approaches, the design choice reflects a specific position on how LMs should be interacted with in the DSPy framework.

MLOps @Chipro ▷ #events (1 messages):

Diffusion Models, Transformers, Study Group, Free Intro Workshops, Flow Matching

Diffusion Models Study Group Launches!: A 12-person, 3-month study group is launching in January 2026 to study Diffusion Models and Transformers, inspired by MIT’s diffusion course.
- The study group aims to go from first principles to real-world implementations, covering peer-led sessions, research paper discussions, and hands-on projects.
Free Workshops Tease January Study Group: Two free December intro workshops are available to get a taste of the material and the cohort before the study group kicks off.
- The workshops include an Intro to Transformer Architecture on Dec 13 (link) and an Intro to Diffusion Transformers on Dec 20 (link).
MIT Flow Matching Course Inspires Study Group: The Diffusion Models Study Group is inspired by MIT’s Flow Matching and Diffusion Models course notes.
- Classmates will include a CTO of an AI film startup, LLM educators, and full‑time AI researchers to learn how to train & fine-tune your own diffusion model.

MLOps @Chipro ▷ #general-ml (1 messages):

AI API Integration Platform, Model Aggregation, Developer Discounts

Siray AI Platform Aggregates Models: An AI API integration platform was built that brings together a wide range of models, including Codex, Claude, Gemini, GLM, Seedream, Seedance, Sora, and more.
- Developers can visit the platform at Siray.ai.
Siray AI Offers Developer Discount: The AI API platform is offering a 20% discount for developers.
- Interested developers are invited to try out these API services.

Windsurf ▷ #announcements (2 messages):

Windsurf 1.12.41 Release, Windsurf 1.12.160 Release, Windsurf MCP Management UI, Windsurf GitHub/GitLab MCP Fixes, Windsurf Diff Zones Improvements

Windsurf Waves with Stability and Speed: Windsurf released versions 1.12.41 and 1.12.160, promising significant improvements in stability, performance, and bug fixes.
- The update includes a new UI for managing MCPs, fixes for GitHub/GitLab MCPs, and enhancements to diff zones, Tab (Supercomplete), and Hooks as detailed in the changelog.
Windsurf Next: A Sneak Peek at Tomorrow's Tides: Users are encouraged to explore Windsurf Next, the pre-release version, to experience exciting new features like Lifeguard, Worktrees, and Arena Mode.
- More details can be found at the Windsurf Next changelog.
Windsurf Login Back in Action: Windsurf login services have been restored following a brief maintenance window, confirmed by a status update.
- No further details regarding the maintenance were provided.