They put the Open back in OpenAI!

AI News for 8/4/2025-8/5/2025. We checked 12 subreddits, 544 Twitters and 29 Discords (227 channels, and 8121 messages) for you. Estimated reading time saved (at 200wpm): 615 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!

Day 2 in what is expected to be the most packed AI news week of the year, 3 of the big labs all announced models that would individually have taken title story.

First the most (unintentionally) leaked launch: OpenAI’s new open weights GPT-OSS models bring o4-mini class reasoning to your desktop (60GB GPU) and phone (12GB), which you can test in the new gpt-oss playground:

The model card and the research blog are worth a browse. The models also debut the harmony response format (open sourced) which update the old school ChatML with new concepts like message “channels”:

On the same day, Anthropic also released Claude 4.1 Opus (blog), which was also leaked. This should be the best coding model in the world… for now.

Finally, DeepMind’s Genie 3 showed off extremely impressive realtime world simulation with navigation and minute-long consistency, but in classic Genie fashion, you’ll just have to take their word that the demonstrated videos aren’t cherrypicked.

AI Twitter Recap

OpenAI’s gpt-oss Open-Weight Model Release

OpenAI releases gpt-oss-120b and gpt-oss-20b, its first open-weight models since GPT-2: @sama announced the release, describing them as state-of-the-art reasoning models with performance comparable to o4-mini that can run locally on high-end laptops, licensed under Apache 2.0. The release aims to empower individuals with control over their AI, foster innovation, and promote an AI stack based on democratic values. The models are designed for agentic tasks, with safety mitigations for issues like biosecurity. OpenAI’s official account also announced the release, with employees like @polynoamial and @kaicathyc sharing their excitement.
Model Architecture and Technical Details: The models are Mixture-of-Experts (MoEs), with gpt-oss-120b having 117B total / 5.1B active parameters and gpt-oss-20b having 21B total / 3.6B active parameters. @rasbt provided technical analysis, noting they appear to be wide vs. deep architectures and, surprisingly, use bias units in the attention mechanism, a feature from GPT-2. @finbarrtimbers highlighted the use of an efficient einsum MoE implementation, and @vikhyatk pointed out a unique swiglu variant with clamped inputs and a skip connection. The training compute for the 120B model was estimated at 2.1 million H100 hours, comparable to DeepSeek-R1’s 2.66 million H800 hours, as noted by @scaling01.
Performance, Benchmarks, and Hallucinations: @SebastienBubeck stated the models achieve impressive scores like 80 on GPQA and can run on a single GPU. However, @scaling01 pointed out a potential for a “hallucination fiesta” and poor scores on Aider Polyglot. The models seem to be overtrained for specific tasks, with @teortaxesTex noting they have “open sourced their reasoning effort scaling that totally destroys every other attempt” but fail on simpler problems. @vikhyatk speculates the models may be primarily trained on synthetic data to reduce risks like copyright infringement and harmful content.
Tooling, Chat Format, and Integrations: The models are post-trained to use a web browser and an interactive python notebook, giving them powerful out-of-the-box agent capabilities. They use a new chat template called Harmony, which @Trinkle23897 recalls working on three years ago. The release saw immediate support from the ecosystem, including vLLM (which detailed its integration), Hugging Face (which provided a finetuning guide), Ollama (which partnered with NVIDIA and Qualcomm for acceleration), Groq, Cerebras, and Together AI.
Community Reaction and Open Source Impact: The release was seen as a major win for open source, with @AndrewYNg thanking OpenAI for the “gift”. @ClementDelangue noted that gpt-oss became the #1 trending model on Hugging Face almost instantly. @willdepue observed that it has been less than a year since o1 was announced, and now an o3-level model is runnable on consumer hardware.

Major Model & Product Releases (Non-OpenAI)

Google DeepMind Unveils Genie 3: Google DeepMind announced Genie 3, a groundbreaking world model that can generate entire interactive, playable simulations from a text prompt. @demishassabis highlighted its ability to generate multi-minute, real-time interactive simulations. @DrJimFan described it as “game engine 2.0,” where a data-driven blob of weights replaces the complexity of engines like UE5. He also related it to the challenges of world modeling for robotics with GR00T Dreams. The model features world memory for environmental consistency and renders at 720p.
Anthropic Releases Claude Opus 4.1: Anthropic released Claude Opus 4.1, an upgrade to Opus 4 focused on agentic tasks, real-world coding, and reasoning. It saw immediate integration into developer tools like Cursor, which announced day-one support. @alexalbert__ announced a livestream to discuss the new model and Claude Code. The release also came with a tease of “substantially larger improvements… in the coming weeks.”
Alibaba Releases Qwen-Image and New APIs: Alibaba Qwen introduced Qwen-Image, a 20B MMDiT (Multimodal Diffusion Transformer) model for text-to-image generation, noted for its strength in creating graphics and Chinese text. It gained support in ComfyUI and Diffusers. Alibaba also released APIs for Qwen3-Coder and Qwen3-2507 with 1M token context length.
xAI Launches Grok Imagine: Grok’s image generation feature, Grok Imagine, is now live for all X Premium users on the Grok app.
Meta AI Releases Open Direct Air Capture Dataset: Meta AI, in collaboration with Georgia Tech and CuspAI, released the Open Direct Air Capture 2025 dataset, the largest open dataset for discovering materials to capture CO2 from the air.

AI Safety, Benchmarking, and Evaluation

OpenAI Launches $500K Red Teaming Challenge: OpenAI announced a $500K Red Teaming Challenge to invite researchers and developers to help uncover novel risks and strengthen open-source safety. METR confirmed its involvement in providing external feedback on OpenAI’s methodology for assessing catastrophic risks. However, @RyanPGreenblatt expressed concerns that substantial CBRN (Chemical, Biological, Radiological, Nuclear) risks have not been ruled out.
Kaggle Launches Game Arena: Demis Hassabis announced the Kaggle Game Arena, a new leaderboard and tournament series for testing modern LLMs on games, starting with chess. This provides a new way to measure agent performance in competitive environments.
New Benchmarks and Model Performance: GLM-4.5 was noted for its strong performance on Terminal-Bench, placing it among Claude-level models. On the cost-aware AlgoTune benchmark, open-weight models like Qwen 3 Coder and GLM 4.5 were shown to beat Claude Opus 4, as the benchmark budgets models at $1 per task.

Industry News, Tooling, & Broader Implications

The Cloudflare vs. Perplexity AI Agent Debate: A major debate erupted after Cloudflare began blocking AI crawlers, a move that drew sharp criticism. Perplexity AI issued a strong statement, claiming Cloudflare’s leadership is “dangerously misinformed” and arguing that AI agents are extensions of human users. The issue was amplified by figures like @balajis and Y Combinator’s @garrytan, who reported their sites were blocked without permission.
Acquisitions and Funding: Perplexity AI announced it has acquired Invisible_HQ, a team with expertise in scalable infrastructure for agents. In funding news, @steph_palazzolo reported that Reflection AI, a startup from DeepMind researchers, is in talks to raise $1B+ for open-source model development, and EliseAI is being backed at a $2B valuation for its AI voice agents.
AI’s Impact on the Future: Midjourney’s David Holz raised concerns about the power of AI technologies that will be used in the 2028 presidential election, stating “we aren’t ready.” Meanwhile, @OpenAI’s Woj Zaremba shared a link indicating user satisfaction with ChatGPT, noting its growth to nearly 700M weekly active users as shared by @juberti.
Frameworks and Tooling Updates: LangChain announced that its LangGraph Platform and LangSmith have achieved SOC 2 Type II compliance. Jules, a coding agent, introduced Environment Snapshots to save dependencies for faster, more consistent task execution. Agent Reinforcement Trainer hit #1 on GitHub trending repositories.
The Rise of Open Models: @natolambert argues that America needs to take open models more seriously, as its early lead via Llama has been eroded by strong open models from China. This sentiment was echoed by @hardmaru and @ClementDelangue, who emphasized that “the world runs on open-source.”

Humor/Memes

Relatability and In-Jokes: @portiaspetrat posted the highly relatable “ever since i was a little girl ive loved information”. A tweet from @bigsnugga claimed “cher predicted grok” with a meme.
The OpenAI Hype Cycle: A series of posts from @ollama showed a coffee cup getting progressively more jittery with captions like “just one more cup before ollama gets ready” and “getting ready for the day. @nvidia GeForce RTX is powered on.” in anticipation of the day’s releases.
Model Behavior Quirks: @soumithchintala lamented having to “gentrify” his writing style because ChatGPT made the em dash the “official punctuation of soulless AI prose.”
Parody: @Yuchenj_UW posted an “AI model dropping Law,” joking that whenever Google drops a model, OpenAI is bound to follow, predicting a massive release week.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. OpenAI GPT-OSS Model Releases, Integrations, and Community Discussion

🚀 OpenAI released their open-weight models!!! (Score: 1124, Comments: 375): The image is associated with the announcement of OpenAI’s first open-weight models, gpt-oss-120b (117B parameters, 5.1B active) and gpt-oss-20b (21B parameters, 3.6B active), intended for production-ready and local/specialized AI tasks. These models, available on HuggingFace, are notable for being able to operate on a single H100 GPU, targeting high-reasoning and agentic applications with practical hardware requirements. The post represents a significant shift toward openness for OpenAI, as highlighted by community reactions and technical commentaries referencing the potential, initial safety testing, and model quality. Commenters debate the significance of OpenAI’s shift toward more open models, with some users labeling it a move from ‘ClosedAi’ to ‘SemiClosedAi’ and others noting the unexpectedly high quality of the release, even among critics. Initial third-party safety tests are being referenced, indicating ongoing scrutiny by the open-source community.
- The open-weight models are released under the permissive Apache 2.0 license, allowing use without copyleft restrictions or patent risk, making them suitable for commercial deployment and extensive customization.
- The models feature several technical innovations: configurable reasoning effort for latency/performance trade-offs, full access to chain-of-thought outputs (useful for debugging, though not for end users), fine-tuning support, and agentic capabilities like function calling, web browsing, Python execution, and structured output generation.
- With native MXFP4 quantization on the MoE layer, gpt-oss-120b can run on a single H100 GPU and gpt-oss-20b can fit within 16GB VRAM, enabling deployment on more accessible hardware. Full benchmark results are available at https://preview.redd.it/0nbuy4ejj8hf1.jpeg?width=967&format=pjpg&auto=webp&s=5840e94490e805fe978ba8bc877904cd3b94fe0c
openai/gpt-oss-120b · Hugging Face (Score: 342, Comments: 87): The release of openai/gpt-oss-120b on Hugging Face is notable for being an ~117B parameter model under the permissive Apache 2.0 license. A comment highlights its dual-parameter/active parameter approach (117B total, 5.1B active), which suggests either MoE (Mixture of Experts) or related sparse activation techniques to optimize compute. Benchmarking results are yet to be independently verified. Commenters note the unusually permissive licensing (Apache 2.0) for a model of this size, and speculate that this release suggests OpenAI confidence in its upcoming GPT-5. Discussion centers on the technical implications of the parameter split and the broader implications for the LLM ecosystem.
- The model is noted for being released under an Apache 2.0 license, which is less restrictive compared to other open-source AI licenses, making it more amenable for both commercial and research use.
- There is technical interest in the quantized versions of the model, with users referencing that Unsloth is preparing quantizations (quants) for gpt-oss-120b and noting that Unsloth’s quants are “less buggy and work better” compared to alternatives. Direct links to Unsloth models are provided, as well as ggml-org quantized uploads.
- The model is reported to be heavily censored, which could impact its utility in applications requiring less restrictive response generation or broader output coverage.
gpt-oss-120b is safetymaxxed (cw: explicit safety) (Score: 349, Comments: 122): The image referenced in the post is a technical benchmark or evaluation graphic showing the safety alignment (likely refusal rates, toxicity, or related safety measures) of large language models such as gpt-oss-120b versus other models, including possible mention of Nemotron. The discussion focuses on explicit safety metrics, with one comment stating it’s ‘one of the very few benchmarks I actually take seriously,’ suggesting that the image provides meaningful or high-signal comparison for model safety. Another comment links to a fuller version of the benchmark, underlining community interest in transparent quantitative safety assessments. The post highlights a growing technical expectation for rigorous safety evaluation in open-source large language models. Commenters seem to value the benchmark’s credibility and granularity, with some humor (‘Nemotron cockmaxxing’) but mainly focusing on the seriousness and trustworthiness of the presented safety data.
- A user expresses concern that making a “safety-maxxed” open-source model like gpt-oss-120b widely available allows researchers and adversaries to white-box study the safety mechanisms, facilitating the development of jailbreaks and logic attacks. The technical implication is that robust adversarial testing on open models could translate into more effective violations (e.g., prompt injections) on closed models, as attackers refine their techniques on open benchmarks.
- The thread references a benchmark or visual evaluation (image linked by user) that is considered credible among technical practitioners. This highlights a focus on empirical, publicly-auditable safety assessments rather than relying on developer claims or opaque safety scores.
Llama.cpp: Add GPT-OSS (Score: 310, Comments: 60): Llama.cpp has added support for GPT-OSS, OpenAI’s new open-source model, enabling day-1 compatibility for inference and experimentation. Implementation details are sparse, but the update points to rapid ecosystem integration with llama.cpp’s efficient C++ backend. Commenters question whether OpenAI is actively involved with llama.cpp integration and express skepticism about the model’s licensing (specifically concerns about a ‘responsible use policy’) and real-world performance compared to top open-weight models.
- There is skepticism among commenters regarding the practical usability and performance of GPT-OSS, with comparisons to state-of-the-art open-weight models. One user questions whether this is a genuine open-source effort by OpenAI or a PR move, highlighting the community’s expectation for open models to meaningfully compete with established alternatives.
- Licensing concerns are raised, especially the fear of restrictive or changeable responsible use policies that might impact downstream adoption or freedom. The community’s technical stakeholders are particularly sensitive to licenses that are not truly permissive or might impose future constraints.
- A question is posed about the timeline for release, implying close monitoring of upstream integration and readiness in projects like llama.cpp—demonstrating the technical community’s demand for immediate, easy access to new models for local experimentation and benchmarking.
GPT-OSS today? (Score: 289, Comments: 67): The post discusses a major near-merge pull request to llama.cpp (PR #15091) that adds support for the new GPT-OSS model, an open-weight model from OpenAI. The linked image likely displays terminal output or stats related to running GPT-OSS locally. Commenters confirm GPT-OSS is already operational in several projects: OpenAI’s Harmony (https://github.com/openai/harmony) now supports GPT-OSS, Hugging Face Transformers v4.55.0 includes it, and GGUF-format models are accessible here: https://huggingface.co/collections/ggml-org/gpt-oss-68923b60bee37414546c70bf. Comments highlight the rapid integration of GPT-OSS across major tooling, and the community is already leveraging GGUF models in local inference frameworks. There’s a clear sense from commenters that GPT-OSS is immediately practical and the ecosystem is adapting quickly.
- OpenAI’s Harmony is now open-sourced, with dedicated official model cards and resources available at https://openai.com/open-models/. This release is significant in enabling greater transparency and reproducibility for research and integration into downstream applications.
- HuggingFace’s Transformers library (v4.55.0) has integrated support for the released GPT-OSS models, allowing seamless adoption for developers. This indicates rapid ecosystem adaptation and support from major ML frameworks.
- GGUF (a quantized format for efficient inference, e.g. with llama.cpp) is already supported for GPT-OSS, as shown by the hosted models on HuggingFace (https://huggingface.co/collections/ggml-org/gpt-oss-68923b60bee37414546c70bf), enabling low-resource and edge deployments.
I FEEL SO SAFE! THANK YOU SO MUCH OPENAI! (Score: 241, Comments: 39): The post critiques an OpenAI product release, noting that the model depicted is lacking in general knowledge and coding ability compared to a similar-sized GLM (likely GLM-4 or GLM Air). The title and selftext, paired with the image (which is presumably a sarcastic take on safety or model alignment), suggest skepticism about the model’s practical use cases, especially given its perceived limitations. Technical commenters echo that the model performs poorly—one calls it ‘lobotomized’—and question the motives behind its release, implying it’s more marketing than substance. Users debate the model’s utility, strongly suggesting that it’s mainly a marketing move by OpenAI and criticizing both the product’s capabilities and the surrounding hype cycle.
- Criticism is raised regarding heavy-handed safety and content restrictions in models, with some users arguing that over-censoring (“safetymaxxing”) significantly diminishes the model’s general knowledge utility. This perceived ‘lobotomization’ results in the model being less capable across a broad range of queries, not just in restricted topics.
- There is skepticism about the long-term relevance of hyped safety-forward model releases; the sentiment is that initial excitement fades quickly, especially if the restrictive policies render the model uncompetitive in practical utility or versatility compared to less-restricted alternatives.
Anthropic’s CEO dismisses open source as ‘red herring’ - but his reasoning seems to miss the point entirely! (Score: 390, Comments: 203): The image referenced is likely a screenshot quoting Anthropic CEO Dario Amodei’s comments from a recent Big Technology Podcast, where he characterizes open source AI as a ‘red herring’, i.e., not the core problem in AI progress or safety. The post and technical comments criticize this stance, noting that access to powerful models (not running inference) is the real bottleneck, and suggesting that Anthropic’s technical limitations with inference further weaken Amodei’s argument. This reflects ongoing debate about whether open-sourcing models meaningfully advances access or safety in AI. Commenters point out perceived hypocrisy or misunderstanding by Anthropic, with some arguing their technical weaknesses undermine their position. Others compare Anthropic’s stance negatively to OpenAI’s, suggesting stronger sentiment against Anthropic in the open source context.
- Discussion centers on Anthropic’s comparative weakness in inference infrastructure and optimizations, with users asserting that Anthropic “are famously not good at running inference,” suggesting that the company’s technical limitations here undermine their dismissal of open source. The argument implies access to powerful models—and the means to efficiently run them—remains a key bottleneck in the industry, rather than issues purely of deployment or software openness.

2. KittenTTS: Ultra-Compact TTS Model Launch

Kitten TTS : SOTA Super-tiny TTS Model (Less than 25 MB) (Score: 1752, Comments: 257): KittenTTS is a new open-source TTS model from Kitten ML, featuring code and weights (GitHub, HuggingFace) previewed at <25MB and ~15M parameters, with an ~80M param version (with identical 8 English voices) to follow. The ~15M param model delivers eight expressive voices (4 male, 4 female), runs in under 25MB, and can operate efficiently on low-resource hardware (e.g. Raspberry Pi, phones) without GPU—targeted at edge and CPU deployment scenarios. Multilingual support is planned for future releases. A technically informed commenter praises the model’s voice quality given the parameter budget and suggests defaulting to the more expressive voice demoed, rather than requiring source code editing. There is interest in expanding language support to Italian and other languages.
- The main technical praise centers on the model’s ability to deliver impressive audio quality given an exceptionally small footprint (<25MB), which is notable compared to standard TTS models that are often substantially larger. Users specifically mention it running well locally and express interest in support for additional languages such as Italian.
- There is a usability implementation criticism: one commenter points out that the best voice is not the default and that switching voices for quick tests requires editing source code, affecting fast prototyping or demo replication. Improving the UI or configuration for easier voice selection is suggested.
- A user provides a linked audio sample output from their local run, observing a difference between the generated audio and the demo in the announcement video, and asks for clarification or troubleshooting steps. This hints at possible reproducibility issues or inference configuration mismatches that may affect end-user results.
generated using Qwen (Score: 188, Comments: 38): The post references image(s) generated using Alibaba’s large language model Qwen, but a user observes that images from Qwen across different posts appear consistently blurry. No other technical implementation, configuration, or version details are given. The primary technical discussion is the issue of image quality (specifically, blurriness) in Qwen’s outputs, with no further analysis or debugging.
- Multiple users note that Qwen’s generated images consistently appear blurry, with excessive bloom lighting effects, compared to outputs from other models like Flux. There is a suggestion that image clarity and lighting control in Qwen’s output may lag behind other state-of-the-art generative models.

3. Llama.cpp Feature Updates and MoE Offloading

New llama.cpp options make MoE offloading trivial: -n-cpu-moe (Score: 262, Comments: 65): The latest llama.cpp release introduces the -cpu-moe and -n-cpu-moe flags, significantly simplifying Mixture-of-Experts (MoE) layer offloading from GPU to CPU. This eliminates complex regex patterns previously required for tensor offloading (ot), allowing users to tune offload count for models like GLM-4.5-Air-UD-Q4_K_XL gguf by simply adjusting module count. In testing, users achieved >45 t/s on 3x3090 GPUs with -n-cpu-moe 2. Comments broadly confirm the option’s technical efficacy, noting more efficient and user-friendly performance tuning than manual tensor selection, and successful application to demanding models (e.g., GLM4.5-Air). There is positive feedback on the straightforwardness of implementation versus prior manually configured solutions.
- The —n-cpu-moe option in llama.cpp allows users to trivially offload Mixture-of-Experts (MoE) layers to CPU, as demonstrated with the GLM-4.5-Air-UD-Q4_K_XL model (gguf format). A user running llama-server with this flag on a 3x3090 setup reported achieving over 45 t/s throughput, highlighting the powerful impact on performance when offloading is distributed appropriately.
- Technical discussion observes that the —n-cpu-moe option simplifies offloading compared to manual tensor offloading and is particularly well-suited to MoE models like GLM4.5-Air. This reduces the guesswork for users and lowers the barrier to optimal multi-hardware utilization.
- Suggestions for further enhancement include enabling cross-machine layer offloading (e.g., splitting model layers across a Mac mini and a Linux laptop to aggregate their resources) and future revisions of llama.cpp that might use model metadata to assign layers more intelligently to CPU/GPU based on their specific performance characteristics, potentially improving utilization and scalability for larger future models.
GPT-OSS today? (Score: 289, Comments: 67): The post discusses a major near-merge pull request to llama.cpp (PR #15091) that adds support for the new GPT-OSS model, an open-weight model from OpenAI. The linked image likely displays terminal output or stats related to running GPT-OSS locally. Commenters confirm GPT-OSS is already operational in several projects: OpenAI’s Harmony (https://github.com/openai/harmony) now supports GPT-OSS, Hugging Face Transformers v4.55.0 includes it, and GGUF-format models are accessible here: https://huggingface.co/collections/ggml-org/gpt-oss-68923b60bee37414546c70bf. Comments highlight the rapid integration of GPT-OSS across major tooling, and the community is already leveraging GGUF models in local inference frameworks. There’s a clear sense from commenters that GPT-OSS is immediately practical and the ecosystem is adapting quickly.
- OpenAI’s Harmony is now open-sourced, with dedicated official model cards and resources available at https://openai.com/open-models/. This release is significant in enabling greater transparency and reproducibility for research and integration into downstream applications.
- HuggingFace’s Transformers library (v4.55.0) has integrated support for the released GPT-OSS models, allowing seamless adoption for developers. This indicates rapid ecosystem adaptation and support from major ML frameworks.
- GGUF (a quantized format for efficient inference, e.g. with llama.cpp) is already supported for GPT-OSS, as shown by the hosted models on HuggingFace (https://huggingface.co/collections/ggml-org/gpt-oss-68923b60bee37414546c70bf), enabling low-resource and edge deployments.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo, /r/aivideo

1. Google DeepMind Genie 3 Model Release & Benchmarks

Google Deepmind’s new Genie 3 (Score: 4461, Comments: 783): Google DeepMind’s new Genie 3 is highlighted in a Twitter video, showing AI-powered generative gameplay that dynamically creates interactive environments and objects, advancing beyond static world-generation. The showcased model appears capable of real-time synthesis of game scenes, indicating significant progress from previous iterations such as Genie v2, and hints at application in both open-world and immersive simulation contexts. Comments raise the idea of using Genie 3 for VR/metaverse applications and speculate about its impact on the simulation of open-world games, indicating potential competitive implications relative to major game franchises like GTA. Technical debate centers on the recursive simulation possibilities and scalability for interactive environments.
- Commenters highlight potential applications of Google’s Genie 3 in VR and metaverse environments, suggesting its capacity to generate interactive 3D simulations from 2D images could fast-track immersive content development and procedural world generation.
- Speculation arises about the future trajectory and scalability of Genie 3, with technical readers anticipating that subsequent research papers or model iterations will yield rapid advances in generative simulation, interactive environments, or even real-time, user-driven content generation.
DeepMind: Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt (Score: 1484, Comments: 364): DeepMind’s Genie 3 is a self-supervised world model that dynamically generates fully interactive, playable 2D environments from a single text prompt, reflecting significant advances over prior generative environment models. Technically, Genie 3 addresses the challenge of maintaining environmental consistency across long time horizons: whereas auto-regressive video generation models typically suffer from accumulating inaccuracies, Genie 3 sustains high visual fidelity and coherent physical states for minutes at a time—visual memory can persist up to one minute. The model outpaces its predecessors by a substantial margin in both quality and endurance of generated environments, as noted in DeepMind’s official Genie 3 announcement and related research papers. Commenters highlight the rapid technological progress, especially in environmental persistence over long horizons, and express excitement about the implications for real-time interactive media and AI-generated game or simulation content. The emergence of “persistent memory” in generated environments is singled out as a major technical breakthrough, suggesting imminent impacts in gaming and simulation fields.
- Discussion highlights the technical challenge of maintaining environmental consistency over long horizons in AI-generated worlds, particularly since generating environments autoregressively causes error accumulation that degrades the experience. Genie 3 was noted to achieve “visual memory extending as far back as one minute ago” and keep environments consistent for several minutes, representing a significant advance over models from just six months prior.
- Technical readers note that prior versions of similar technology were much worse in performance within just the last half-year, marking the recent progress as ‘insane’ due to drastic improvements in persistence and realism of generated environments.
The progress from Genie 2 to Genie 3 is insane (Score: 934, Comments: 129): The post highlights significant advancements from Genie 2 to Genie 3, generative AI systems focused on interactive environment synthesis. While specific benchmarks aren’t detailed, the context implies major leaps in generative realism, interactivity, or capability. A referenced question asks if this technology is similar to Oasis AI’s Minecraft generation project, suggesting similarities in AI-driven open-world content creation. Top comments speculate about rapid progress (‘Genie 5 will create GTA 7 in 2 years’) and envision integration with VR and voice input for immersive experiences akin to a ‘holodeck.’
- A commenter highlights that Genie 2 was not capable of realtime interaction; users previously had to input an entire sequence of moves up front, in contrast to the improved interactivity and responsiveness enabled with Genie 3. This marks a notable leap in live, agent-driven gameplay simulation for the Genie project.
- Another technical theme considers AI content generation monetization, comparing the pay-per-token model (typical of large language models or generative AI) to traditional ‘buy-to-play’ games. The discussion explores whether future game access might shift to a pay-per-play-time model, fundamentally altering game revenue structures.
In Genie 3, you can look down and see you walking (Score: 2710, Comments: 371): The post highlights a feature in DeepMind’s Genie 3, a generative agent that can synthesize interactive, playable 3D environments from images or videos, where users can look down and see their own avatar walking. This suggests the model has advanced self-representation and real-time rendering capabilities within the synthesized environment, which is significant for embodied AI and simulation fidelity. For background, see DeepMind’s Genie project documentation. Commenters are excited about applications in gaming and historical reenactment, noting the implications for immersion, while one comment connects the realism to simulation hypothesis debates.
- A key technical distinction is noted between simply generating a video and the complexity of Genie 3’s capability to render a real-time, first-person perspective where the user can look down and see themselves walking. This implies more advanced scene understanding, spatial consistency, and possibly an on-the-fly avatar generation and consistent localization, which are far beyond straightforward video generation. Such systems may require real-time 3D scene reconstruction and robust positional tracking to maintain immersion and realism.
Genie 3 simulating a pixel art game world (Score: 570, Comments: 84): The post presents Genie 3, a generative AI model, simulating a pixel art game world, most likely by generating interactive visual environments in a low-resolution pixel style. The demonstration suggests the model’s capabilities at rendering dynamic, possibly playable, 2D-3D hybrid pixel scenes, indicating an application of diffusion models or video/game environment generation akin to recent breakthroughs by Google DeepMind and related labs. Specifics on model architecture, frame rates, or integration with gameplay engines are not provided in the post. Technical discussion in the comments speculates on the future potential, such as ultra-high-fidelity, world-scale simulation for VR using AI, and requests for references to existing games combining pixel art with 3D rendering, demonstrating an interest in practical adoption and hybrid visual styles.
- A technical inquiry is made about the hybrid visual style—specifically, interest in games that blend pixel art and 3D similarly to Genie 3. This suggests Genie 3 may employ either 2D sprites in a 3D-rendered environment or use neural rendering to simulate a pixel-art aesthetic over volumetric world geometry, and prompts discussion of comparable rendering approaches or engines supporting such workflows.
- A user speculates on the future impact of generative AI (like Genie 3), suggesting it could potentially disrupt game engines like Unreal and Unity by automating or revolutionizing content creation and world simulation. The comment alludes to the possibility that advanced models could eventually replace traditional development pipelines for immersive worlds.
Genie 3 Frontier World Model (Score: 269, Comments: 56): DeepMind’s Genie 3, referenced as a ‘Frontier World Model,’ denotes a major leap in generative AI for creating interactive, explorable worlds from natural language prompts, potentially combining visual, physical, and semantic understanding. Technical aspirations center around seamless integration of generative 3D modeling, and the possibility of instant, high-fidelity virtual environments, hinting at applications in VR and advanced game design. Commenters highlight the potential of Genie 3 to revolutionize AAA game development by enabling on-demand VR/3D worlds. There is anticipation for integrating this with advanced 3D modeling, fueling debate on the future of immersive, AI-generated content.
- A key technical insight is the recognition that models like Genie 3 Frontier could be foundational for automating complex 3D world generation, suggesting the eventual convergence of generative AI and advanced 3D modeling workflows. This could bridge the gap toward automated AAA-level game development by creating assets and environments on demand.
- Some commenters discuss the potential for combining large-scale world models (such as Genie 3) with interactive 3D modeling pipelines. The implication is that this integration could enable on-the-fly creation and manipulation of game worlds, effectively accelerating and revolutionizing traditional game design and simulation production.
Notes on Genie 3 from an ex Google Researcher who was given access (Score: 466, Comments: 68): An ex-Google Researcher evaluated the Genie 3 world model from Google DeepMind, highlighting its ability to generalize across gaming and real-world environments, rapid startup, strong visual memory with object coherence over occlusion/time, and effective handling of photorealistic and stylized scenes. Limitations include systematic physics failures (notably on rigid body and combinatorial tasks), limited multi-agent/social interaction support, constrained action spaces, and lacking advanced game logic/instruction following—indicating it’s still far from a production-grade game engine. The reviewer asserts Genie 3 evidences imminent disruption to gaming and possibly a step toward AGI/ASI if scaled, emphasizing the significance of integrating world models with 3D-AI and LLMs. Commenters debate whether advances like this world model represent an inflection point toward AGI/ASI, with some seeing such high-fidelity imagination/visualization models as a ‘final piece’ for AGI when merged with other modalities, while others speculate about industry leadership (Google vs. others) and competitive pressures in gaming.
- A key discussion point is the significance of Genie 3’s architecture in bridging the gap towards AGI: by granting models the ability to reason not only via language, but with a form of ‘imagination’ or visual/spatial reasoning akin to human cognition, it addresses a major bottleneck in multi-modal AI advancement.
- A commenter highlights the rapid pace of technical progress: since Genie 2, pixel counts have quadrupled and possible interaction time has increased tenfold in eight months. Extrapolating from this, they estimate real-time 4K generation for an hour could be feasible within a year, assuming sufficient resources.
- There is technical speculation on the compute demands, questioning how soon high-resolution, high-frame-rate models like Genie 3 could run outside of data centers, pointing towards a major challenge in democratizing access to advanced generative models.

2. OpenAI Open Source Model and GPT-OSS Launch

If the open source model is this good, GPT5 will probably be INSANE (Score: 477, Comments: 119): The post discusses a newly open-sourced model (referred to as ‘o4-mini’), suggesting its specifications are highly competitive—implying comparable capabilities to OpenAI’s proprietary models. The user speculates OpenAI only open-sourced this model because their upcoming GPT-5 might significantly surpass current open-source models, rendering them less relevant. The image link is referenced as showing evidence of the new model’s benchmark results or config. Top comments are non-technical and express hype or reliance on OpenAI’s advancements (e.g., ‘Accelerate’, ‘always bet on the twink’), but lack substantive technical debate.
- Commenters are expressing surprise at the quality and apparent progress of the open-source model referenced, hinting at rapid iteration and competitive performance that rivals or approaches state-of-the-art closed-source alternatives. Some remarks imply that substantial financial offers have been declined by contributors or organizations involved, underscoring the high perceived value of the technology within the AI community. While no concrete benchmarks or technical details are provided in these specific comments, the discussion reflects recognition of impactful open-source model advancements potentially altering the competitive landscape vis-à-vis closed models like GPT-5.
OpenAI OS Model today? (Score: 383, Comments: 60): The post discusses the release of the OpenAI GPT-OSS-20B open-weight language model, as referenced by the linked Kaggle competition for red-teaming (security evaluation) of the model (https://www.kaggle.com/competitions/openai-gpt-oss-20b-red-teaming). The image appears to be a screenshot or announcement related to this launch, signaling OpenAI’s move toward open-sourcing at least some models, with community emphasis on technical details and competitive benchmarking with models like Genie 3. The notable technical aspect is the open-weight status and the model’s competition use. Image link Commenters speculate whether this model is competitive with or bigger than anticipated (“Imagine if big-but-small is GPT-5”) and compare it to Genie 3, highlighting community expectations for performance and openness.
- A user points out the release of the OpenAI ‘gpt-oss-20b’ model as an open weight model, referencing its listing on Kaggle’s red teaming competition page (link). This suggests a significant move towards open-sourcing by OpenAI, allowing the community to rigorously test and evaluate the model’s safety and capabilities.
- Discussion speculates about an impending major model upgrade, with users questioning whether “GPT-5” or a smaller but architectural-advanced model (nicknamed “big-but-small”) could be released. The anticipation is technically grounded in expectations for substantial capability or efficiency improvements over prior generations like GPT-4.
- An additional link to a statement from an OpenAI employee hints at developer-focused announcements, possibly indicating new API features, enhanced model weights for third-party use, or tools targeting developer integration, further fueling speculation about the model’s openness and accessibility.
OpenAI releases a free GPT model that can run right on your laptop (Score: 303, Comments: 50): OpenAI has released a free, open-weight GPT model named GPT-OSS available in 120B and 20B parameter sizes, with the smaller 20B model able to run on machines with 16GB RAM, while the larger requires a single Nvidia GPU. The 120B variant matches performance of the o4-mini model; the 20B variant matches o3-mini in capability. Both are distributed under the permissive Apache 2.0 license, and are accessible via Hugging Face, Databricks, Azure, and AWS (The Verge summary). Commenters highlight the practicality of running the 20B model on local hardware (16GB RAM) and question response latency and real-world capabilities. There is interest in comparative benchmarks versus established models but details remain sparse in the initial discussion.
- The new OpenAI open-weight model, GPT-OSS, comes in two variants—120B and 20B parameters. The 120B parameter model can run on a single Nvidia GPU and is reported to perform similarly to the o4-mini, while the 20B parameter version can run on just 16GB of memory, benchmarking close to o3-mini (see The Verge article). Both are distributed under the Apache 2.0 license, enabling commercial modification and deployment through platforms like Hugging Face, Databricks, Azure, and AWS.
- A user referenced a 91.4% hallucination rate for the new models, suggesting that, despite the accessibility improvements and hardware requirements, factual reliability remains a significant issue in these early releases. This underscores the necessity for rigorous evaluation and real-world testing of open-weight LLMs before deploying them in production settings.
Finally os models launched by openai !! At level of o4 mini !! Now we can say it’s openai (Score: 284, Comments: 47): The post discusses the recent launch of open-source (“os”) models by OpenAI, which reportedly reach a performance level comparable to “O4 mini” (likely a reference to OpenAI’s GPT-4 mini or similar compact models). According to user comments, the 20B parameter version of these models can run on just 16GB of RAM, making high-quality LLM inference more accessible to a wider range of hardware. Another user confirms successful usage and impressive performance in LM Studio, a local inference environment for large language models. Commenters are surprised and impressed by the rapid progress of open-source model quality; some even misread the abbreviation “os” as “operating system” due to the context. There’s significant optimism about future releases (e.g., “GPT-5 is going to be a banger”) and enthusiasm about hardware efficiency for local runs.
- The 20B parameter open-source model from OpenAI reportedly requires only 16GB of RAM to run, making local inference feasible on consumer hardware—even with relatively large models. This low hardware requirement dramatically broadens accessibility for both developers and researchers. Screenshot
- Users testing the new models using tools like LM Studio report being impressed by the performance, noting that open-source model quality has accelerated rapidly over the past year. This suggests competitive inference speed and capability against other commercial offerings in the same parameter range.
- Discussion highlights that such open-source models could enable the development of custom chatbot-powered applications and novel products, with anticipated growth of open-source projects on platforms like GitHub as a direct result of simpler, high-quality local deployment.
Gpt-oss is the state-of-the-art open-weights reasoning model (Score: 389, Comments: 141): A post announces that “Gpt-oss” is now considered the state-of-the-art open-weights reasoning model, potentially surpassing previous open-weight models in reasoning capabilities. The main evidence is a linked JPEG image, likely benchmarking Gpt-oss against existing models, suggesting considerable technical progress but without specific metrics or architecture details presented in the text. Comments express optimism about the implications for future models (such as GPT-5), but there is a lack of critical technical discussion or comparative benchmarking details in the thread.
- FoxB1t3 suggests that Horizon was in fact OSS 120b from OpenAI, noting that despite its large scale (‘120b’), it had the characteristic ‘small model feeling,’ which may refer to its inference speed, calibration, or perceived output sophistication compared to its size. The user also points out the impracticality of claims about running such massive models (120 billion parameters) on a typical PC, emphasizing the hardware requirements and suggesting these marketing statements are misleading from an implementation perspective.
- Grand0rk highlights that the model exhibits a very high degree of censorship, indicating that safety filters or content moderation are extremely restrictive. This affects deployment and research utility for tasks requiring less controlled outputs, which is a technical consideration for those intending to use or fine-tune the model in uncensored environments.
Introducing gpt-oss (Score: 161, Comments: 48): A new open-source LLM, ‘gpt-oss’, has been released with a notable 20B parameter model. Benchmarks from user deployment on Apple silicon (M3 Pro, 18GB) show generation speeds of ~30 tokens/sec—significantly faster than Google Gemma 3 (17 TPS). The model reportedly loads efficiently on consumer-grade Apple hardware, supporting large-context completions. Expert users are debating the 20B model’s qualitative writing ability for long-form tasks (e.g., 500-word short stories, genre fiction like romance), with open questions on its creative coherence versus established AI models. Additionally, there is community interest in prompt support for integration into OpenRouter.
- A user noted that the 20B gpt-oss model achieved approximately 30 tokens per second (TPS) when running on a MacBook Pro M3 Pro with 18GB RAM, which is substantially faster than Google’s Gemma 3 (reported ~17 TPS on the same hardware). This suggests significant inference optimization for local deployment and improved efficiency compared to other large language models of similar size.
- Another commenter discussed running the 20B model on a Mac mini (M4 Pro, 64GB RAM), questioning the model’s ability for long-form, coherent output (like a 500-word short story or niche genres such as romance). This highlights interest in practical generation quality and sustained performance for sizable output tasks on local hardware.
- There is interest in offline/local deployment, with one comment asking about minimum hardware requirements and whether the model can run entirely without an internet connection. The reference to “high end” hardware per Altman suggests a debate on the accessibility of running large models like gpt-oss locally for inference.
Open models by OpenAI (Score: 178, Comments: 17): OpenAI has released open-weight models, notably a 20B parameter model, designed to run optimally on consumer hardware with ≥16GB VRAM or unified memory, including Apple Silicon Macs (see official docs). Early user testing with Ollama initially encountered deployment issues on a 16GB Mac mini, but a subsequent Ollama update resolved these, validating compatibility on this hardware configuration. Discussion centers on the model’s hardware demands, and initial issues with Ollama’s implementation (since resolved). Users generally express enthusiasm for the benchmarks and the availability of an open-source option, noting it as a leading choice in the current open AI ecosystem.
- A user with a 16GB Mac mini shares experience attempting to run the 20B OpenAI model, referencing documentation that specifies models are ‘best with ≥16GB VRAM or unified memory’ and suited to Apple Silicon Macs. Initially, they encounter issues running via Ollama, but note that after a new release/update from the Ollama team and a redownload, the model works as intended, suggesting rapid compatibility updates for consumer hardware.
OpenAI Open Source Models!! (Score: 115, Comments: 15): The image appears to show a benchmark or comparison related to OpenAI’s newly released open-source models—potentially displaying their performance (possibly a 120B parameter MoE, with details like 5.1B/3.6B active params). The post context and technical comments debate the scale (120B parameters, Mixture-of-Experts) and number of active experts in inference, indicating OpenAI’s open release is state-of-the-art for open-source. OpenRouter support and comparative performance vs. unreleased models (e.g., potential GPT-5) are also highlighted. Commenters are impressed by the scale and the fact that OpenAI’s open-source release hasn’t been sandbagged (deliberately underpowered); some express excitement about what the closed-source GPT-5 could achieve if the open model is this strong, and note the model’s availability via OpenRouter.
- The released model is reported to be a 120B parameter Mixture-of-Experts (MoE), with only 5.1B or 3.6B parameters active at a time, highlighting a scalable efficiency setup where only a subset of experts is engaged during inference. This MoE structure allows the model to be much larger in capacity without incurring the full inference cost of its parameter count.
- There is technical debate regarding which variant—especially ‘o3’—provides the best performance among the released models, suggesting some comparative benchmarks or qualitative testing is ongoing within the community. Users also note early availability on OpenRouter, facilitating easier third-party evaluation and deployment.

3. Qwen-Image and Open-Source Multimodal Generation Benchmarks

Qwen image prompt adherence is GT4-o level. (Score: 448, Comments: 128): The post discusses Qwen’s image generation model, comparing its prompt adherence to that of GPT-4o. The user provides a series of creative, detailed prompts and notes improvements in Qwen’s ability to faithfully follow instructions. Top comments highlight that while prompt adherence is strong, the outputs often have an ‘AI’ or photomontage quality and may lag behind top models per benchmarking sites such as genai-showdown.specr.net. Comments raise concerns about image realism and visual quality, with multiple users stating outputs appear ‘unrealistic’ or ‘like a bad photoshop,’ suggesting that fidelity to prompts does not necessarily produce photorealistic or natural-looking images. There’s also debate as to whether Qwen’s performance matches SOTA, referencing external benchmarks.
- There is discussion about prompt adherence and visual realism: while users note significant improvements in prompt obedience for Qwen’s image model (with some suggesting it is on par with GT4-o), there are critiques that outputs still look artificial or reminiscent of crude digital edits, highlighting ongoing challenges in generative model realism.
- One comment references https://genai-showdown.specr.net, which aggregates benchmark comparisons for generative models, implicitly suggesting that claims about Qwen’s prompt adherence being equal to GT4-o are not fully supported by head-to-head benchmark results.
- Qwen’s image model displays strong multilingual capabilities, as evidenced by examples of detailed and contextually accurate image generation from prompts in Spanish, demonstrating competitive performance across languages.
Qwen image prompt adherence is amazing (Score: 140, Comments: 19): The post demonstrates the high prompt adherence of the Qwen-Image model (specifically the gguf Q5_k_m variant, available here), by generating images—such as a complex request for a 1920s archival photo with a datamoshed, glitched subject—using a 20-step inference process. Example outputs can be previewed here, and additional images are provided via a linked Google Drive folder. The technical showcase focuses on the model’s capacity for rendering fine-grained prompt details and complex visual effects like RGB glitches and emulsion artifacts. Commenters note the model’s strong baseline performance and express interest in the potential for further fine-tuning, indicating its adaptability for more tailored generative tasks.
- Discussion references the Qwen model’s impressive image prompt adherence and inpainting capabilities, with user-provided screenshots (example1, example2) demonstrating strong results particularly in the context of inpainting. Commenters note that the image modifications are accurate and visually appealing, suggesting the model is competitive with or exceeds current standards for prompt-controlled image editing.
Really impressed with Qwen-Image prompt following and overal quality (Score: 105, Comments: 36): The post highlights impressive prompt adherence and image quality achieved using Qwen-Image, an image generation model within the ComfyUI workflow. The user only increased inference steps to 30, otherwise following the standard procedure from Qwen-Image’s official documentation. The result, per their account, matched complex multi-element prompts with high fidelity on the first try, indicating robust conditional image synthesis and improved prompt-following behavior (as detailed at https://docs.comfy.org/tutorials/image/qwen/qwen-image). Commenters discuss technical resource requirements (the fp8 model reportedly needing ~20GB VRAM), reflecting possible hardware limitations for local use. Further comments appreciate the model’s narrative capability and draw parallels to quality leaps seen with new diffusion models.
- One comment highlights that the FP8 version of Qwen-Image requires 20GB VRAM, indicating high resource demands for full-precision inference, which may impact accessibility for users depending on hardware capabilities.
- A user asks about integration with platforms like Forge versus Comfy, expressing uncertainty about compatibility and required architecture, suggesting that deployment details for Qwen-Image are still a point of confusion for some implementers.
- It is notable that Qwen-Image achieves high-quality output and prompt adherence without the need for external LoRAs (Low Rank Adapters), unlike other models such as Flux which often require targeted LoRAs for similar performance, pointing toward architectural or training improvements in Qwen-Image.
Why Qwen-image and SeeDream generated images are so similar? (Score: 107, Comments: 52): The OP observes near-identical image outputs between Qwen-image and SeeDream 3.0 when given the same prompt (“Chinese woman” and “Chinese man”), raising questions about potential overlap in training datasets or post-training procedures. Notably, Qwen-image is open-source, and SeeDream has since updated to version 3.1, which diverges in image style from 3.0. A technically relevant commenter notes a recurring ‘orange hue’ in several generations from these models, suggesting a possible artifact or bias in color representation in output, which may be linked to data or model training specifics.
- Some users speculate that Qwen-image and SeeDream might produce visually similar images due to being trained on overlapping or even identical datasets, possibly including prompts or data drawn from major sources like Midjourney, Stable Diffusion, or Flux. This shared training foundation could explain similarities in generated outputs across models.
- Notably, users have observed consistent visual motifs—such as a recurring orange hue across multiple generations from these models. This suggests possible common preprocessing pipelines or dataset biases introduced during training, which may propagate into the model outputs regardless of prompt.
- The discussion points out how the open-sourcing of such powerful generative models enables broad scrutiny, comparison, and reverse engineering—providing a unique lens to trace the evolution and biases of these systems compared to proprietary counterparts.
🚀🚀Qwen Image [GGUF] available on Huggingface (Score: 188, Comments: 74): The thread announces the availability of Qwen Image GGUF models (including Q4K M quants) on HuggingFace, with links to multiple repositories: lym00/qwen-image-gguf-test, city96/Qwen-Image-gguf, a separate GGUF text encoder (unsloth/Qwen2.5-VL-7B-Instruct-GGUF), and the VAE safetensors for ComfyUI. The Q4 quantized model alone is about 11.5GB, excluding VAE and text encoder, making it challenging to run on consumer GPUs with less VRAM (e.g., RTX 3060). GGUF format allows local inference, but does not speed up rendering, and VRAM remains a significant bottleneck, with 32GB+ only providing limited relief for latest generative models. Top comments highlight frustration with model sizes and VRAM constraints, noting lower quantization yields poor results. There is discussion of the lack of practical multi-GPU support for diffusion models and a desire for unified memory (as with TPUs). Examples for ComfyUI usage are also linked, providing practical workflows.
- GGUF format enables local inference for large generative image models like Qwen, but VRAM is currently the key limitation: for instance, the Q4 quantized model alone is 11.5GB, not counting additional requirements for VAE and text encoders, making it impossible to run on GPUs with limited VRAM (like RTX 3060 with 12GB) source. Lower quantization (e.g. Q4) significantly reduces quality, while FP8 remains slow on consumer GPUs.
- Although GGUF makes it technically possible to run these models locally, practical performance and speed are bottlenecked by lack of multi-GPU support—most workflows can only distribute separate tasks, not split core diffusion computation across GPUs. There is anticipation for better hardware integration, such as unified memory via TPUs, but current advances are not keeping pace with model demands.
- Various quantization and precision options are available—such as Dfloat11 and FP8—but users still report difficulty in identifying optimal generation settings (e.g., cfg params). Community resources like ComfyUI examples (see: https://comfyanonymous.github.io/ComfyUI_examples/qwen_image/) are being curated to help with guidance, but best practices are still being developed.
Qwen-image now supported in Comfyui (Score: 208, Comments: 67): Qwen-image, a potent image generation model, now has integration with ComfyUI, as well as SwarmUI (docs). Benchmarking on a 4090 (Windows), inference times are about 45 sec/image at CFG=4, Steps=20, Resolution=1024 (or similarly for CFG=1, Steps=40). The model requires high VRAM due to its large text encoder and parameter size, and optimal results are reported at high steps/CFG, but with substantial tradeoffs in speed. Noted technical strengths include strong prompt understanding, text rendering, and minimal censorship; but inconsistent performance on certain prompts remains unresolved. Commenters debate parameter configurations (CFG/Steps/Resolution) for balancing quality and speed, and note that quantized model versions are necessary for broader accessibility due to high compute requirements. One user also remarks on the need for svqd (possibly semantic vector quantization) support.
- Qwen-image is now supported in both ComfyUI and SwarmUI, with technical docs for SwarmUI detailing configuration parameters. Users report that optimal generation quality for Qwen-image requires high values (CFG=4, Steps=50, res=1024+), but this greatly increases inference time (e.g., CFG=4, Steps=20, Res=1024 takes ~45 seconds per image on an RTX 4090). Lower CFG or Steps runs faster but degrades output quality; using quantized (quants) versions or LoRAs is suggested for speed improvements on less powerful GPUs.
- The model’s text encoder and parameters demand significant VRAM and computational resources—commenters highlight the necessity for quantized or GGUF (for llama.cpp compatibility) versions to broaden accessibility for users with limited hardware. The image model is praised for its prompt fidelity, ability to render text, minimal censoring, and recognition of pop culture IPs, though it shows instability on some prompts.
- Request for SVQD (vector quantization) and GGUF file formats shows community interest in efficiency and wider deployment, especially for smaller GPUs, aligning with broader trends of porting large models to lightweight, accessible formats.

AI Discord Recap

A summary of Summaries of Summaries by Gemini 2.5 Pro Exp

Theme 1. OpenAI’s GPT-OSS Release Ignites Widespread Debate

GPT-OSS Models Drop, Community Scrambles to Test: OpenAI released its first open-source models since GPT-2, the GPT-OSS family, including a 120B and 20B parameter version, now available on HuggingFace and LM Studio. The release was accompanied by a six-week virtual hackathon and a $500K Red Teaming Challenge on Kaggle.
Performance and Censorship Under Scrutiny: Initial tests on the LMArena leaderboard reveal the 120B model’s performance is near OpenAI o4-mini, but users found it disappointing with more hallucinations than most models, except for Llama 4 Maverick. Many engineers across the Unsloth and Perplexity communities noted the model is censored to hell and back, leading to discussions about abliteration techniques to improve usability.
Technical Teardown Reveals Novel Architecture: The models use MXFP4 quantization with weights packed as uint8 and a block size of 32, allowing the 120B model to fit on a single 80GB GPU. The architecture also features interleaved sliding window attention, a Harmony chat format for structured interaction, and uses a learned attention sink per-head.

Theme 2. New Models from Anthropic, Google, and Others Flood the Market

Anthropic’s Claude 4.1 Opus Targets Agentic Excellence: Anthropic released Claude 4.1 Opus, available on OpenRouter, which now leads the SWE Bench coding benchmark and dominates terminal agent benchmarks by securing 9 of the top 10 spots. While praised for superior tool usage, some users noted it still fails many spatial tests and questioned if the mild improvements justify its cost.
Google DeepMind Unveils Genie 3 World Simulator: Google DeepMind announced Genie 3, its most advanced world simulator capable of generating high-fidelity visuals at 20-24 fps with dynamic prompting and persistent world memory. While no paper was released for Genie 3, the community pointed to the original Genie paper for technical insights into the underlying world model architecture.
Speculation Mounts for Imminent GPT-5 Release: Chatter about a potential GPT-5 release this week intensified, fueled by hints from Sam Altman and internal leaks shared on X. Speculation suggests it might be an Operating System Model related to Horizon, though some believe a more incremental update like GPT-4.1 is more likely.

Theme 3. Developer Ecosystem Tools and Frameworks Evolve

LibreChat Supercharges LM Studio Speed: Users raved about the blazing fast inference speeds when serving models from LM Studio via LibreChat, with one user claiming “It’s like a identical ChatGPT (OpenAI) UI that serves all my LM studio models. But it’s just blazing fast.” Setup requires careful configuration, as one user solved a connection issue by adjusting YAML indentation and another resolved Tailscale issues by binding the host to 0.0.0.0.
LlamaIndex and DSPy Tackle Complex Documents: LlamaParse was showcased turning dense PDFs into multimodal reports, while LlamaCloud was highlighted for helping companies like Delphi scale by handling complex document ingestion. In the DSPy community, a developer shared a writeup on using DSPy to detect document boundaries in PDFs.
AutoGen and MCP Power a YouTube Search Bot: A developer shared a YouTube tutorial on building a multi-agent chatbot with AutoGen and MCP servers for YouTube search. This coincided with a proposal for a new in-browser ‘postMessage’ transport for MCP, complete with a demo and a SEP draft for standardization.

Theme 4. AI Benchmarking and Novel Applications

Kaggle Kicks Off AI Chess Tournament Amidst Skepticism: The Kaggle Game Arena launched with a 3-day AI chess exhibition tournament, but some engineers questioned chess as a true test of intelligence, viewing it as a strategy optimization game. In a separate match, Kimi K2 lost to Deepseek O3, with Kimi being forced to resign after making an illegal move.
GLM 4.5 Air Hypes Up, Outperforming Rivals in Coding Test: GLM-4.5 Air is being touted as a strong contender, scoring 5/5 on one user’s test suite and outperforming models like Horizon Beta, Grok 4, and Opus in a ‘create-an-html-game’ test. Despite some quirks like infinite thinking loops, the consensus is that GLM-4.5 is really strong.
Youzu.ai Visualizes the Future of E-Commerce: Youzu.ai demonstrated its visual AI infrastructure for e-commerce with a Room Visualizer feature that allows users to upload a room photo and get a complete redesign in seconds. An accompanying demo video shows how users can instantly shop every item in the redesigned room.

Theme 5. Hardware Havoc and Performance Tuning

CUDA vs. Compute Shaders Debate Ignites: Engineers in the GPU MODE server debated the merits of CUDA kernels versus compute shaders for image post-processing with libtorch C++, with Pytorch announcing a seminar on its new kernel DSL, Helion. A user also reported issues with CuTe failing to generate a 128-bit vectorized store, instead emitting two STG.E.64 instructions, which breaks memory coalescing.
Linux Users Lament Lethal Cursor Freezes: Multiple Linux users reported Cursor IDE freezing and becoming unresponsive, pointing to potential network issues or bad requests being investigated by the team, as documented in the Cursor forums. Meanwhile, Windows users were reminded that disabling the page file can cause weird, unexplainable crashes even with ample RAM.
Modular Platform Gets a Boost with MAX and Mojo: Modular Platform 25.5 is now live, featuring Large Scale Batch Inference via SF Compute and an open-source MAX Graph API. The release enhances MAX and PyTorch interoperability through the @graph_op decorator, but users on Intel-based macOS systems were reminded that only Apple Silicon CPUs are officially supported.

Discord: High level Discord summaries

Perplexity AI Discord

OpenAI Drops Open-Source LLM!: OpenAI released an open-source LLM, GPT-OSS-120B, available at HuggingFace, causing excitement about hardware requirements (H100 GPUs recommended) and censorship levels.
- Members are already experimenting, but some are crashing their computers and noting the need for quantization and censorship, suggesting it is censored to hell and back.
Opus Arrives Quietly: Anthropic has released Claude 4.1 Opus, with initial reactions suggesting it offers mild improvements over previous versions, with better multi-file debugging but it fails many spatial tests.
- Some suggest it may be a move to one-up OpenAI, while others highlight that the model may not offer enough of an improvement to justify its pricing and there are old rate limits, and that this may only be a flex on OpenAI.
GPT-5 Anticipation Heats Up: Anticipation is building for a potential GPT-5 release, possibly on the 7th, driven by hints from Sam Altman and chatter about OpenAI releasing an Operating System Model.
- There is conjecture that it might be related to Horizon, and debate on whether Grok 4 would break any benchmarks given this release.
Youzu Visualizes E-Commerce: Youzu.ai is transforming online shopping with visual AI infrastructure, as demonstrated in a comprehensive demo at Vivre, which operates across 10 CEE countries.
- The Room Visualizer feature allows users to upload a room photo and receive a complete redesign in seconds, enabling instant shopping of every item, as seen in this demo.
Sonar API Documentation Surfaces: A user new to Sonar API inquired about its usage, and another user shared a link to a YouTube video about Sonar API.
- In response to the question about the Sonar API, a user shared the Perplexity AI documentation link along with a GIF.

Unsloth AI (Daniel Han) Discord

Unsloth Quantization Overload: The community requested Unsloth to quantize the yisol/IDM-VTON model, but it lacks diffusion training support and doesn’t handle custom quantization requests unless there’s significant demand.
- This is because of the manual labor and compute efforts needed to implement a manual quantization.
Nemotron Super 49B 1.5: the Daily Driver?: Members discussed the Nvidia Nemotron Super 49B 1.5 model, with one member finding it to be a great daily driver because of its prompt adherence if you NOT USE F**ING LISTS*.
- Others expressed interest in its capabilities as a general thinking model, noting its good instruction following while mentioning its dry prose.
GPT-OSS Suffers Abliteration Allegations: Initial reactions to the OpenAI GPT-OSS model were mixed, with some calling it pretty much junk, while others voiced concerns over its safety measures, speculating its a result of US safety acts, leading to discussions about abliteration and its potential to make the model dumb as F__K.
- It was noted that GPT-OSS could be outperformed by GLM 4.5 Air, leading to discussions about the model’s overall value and the possibility of a Chinese startup surpassing it soon.
GRPO Batching Brain Teasers: Inquiries arose about how Unsloth handles batching for GRPO, specifically whether entire trajectory groups are batched together.
- A member clarified that if n_chunks is set to 1, each batch corresponds one-to-one with a group.
Token Decoder Maps Framework is Born: A member introduced their LLM domain-specific language framework designed for purposes like summarization.
- The GitHub project utilizes EN- tokens to summarize specific concepts or facts for later injection and prompting.

LMArena Discord

GLM 4.5 Hypes Up: A member touted GLM-4.5 as potentially living up to the hype, scoring 5/5 on a test suite and outperforming other models including Horizon Beta, Grok 4, o3 Pro, Gemini 2.5 Pro, Claude Sonnet and Opus in a ‘create-an-html-game’ test.
- Despite quirks like infinite thinking loops and result discrepancies, the member concluded that GLM-4.5 is overall really strong.
AI Chess Tourney Kicks Off Kaggle Game Arena: The Kaggle Game Arena will kick off with a 3-day AI chess exhibition tournament featuring 8 models (YouTube link).
- Some raised concerns about chess being a strategy optimization game rather than a test of intelligence, and questioned how non-visual models will interpret the board.
Long Context Benchmarks Embrace Diverse Models: Members discussed context windows in benchmarks, noting the necessity to accommodate most released models, even those with smaller context windows, to ensure fair scoring.
- It was also pointed out that different versions exist for different context sizes, allowing models to be punished/rewarded accordingly.
GPT-5 Release Looming?: There was discussion about a potential GPT-5 release in a few days, with one member noting that internal leaks have cited the same thing (x.com link).
- Jimmy Apples speculated that heavy users may not notice improvements in GPT-5 due to auto-routing.
OpenAI open source models show limitations on LMArena: OpenAI’s gpt-oss-120b and gpt-oss-20b models are now available in the arena, expanding the range of choices for users interested in open-source alternatives.
- Members tested GPT-OSS 120B and found it disappointing, with performance perhaps on par with o3-mini or Qwen3 235B A22B, and more hallucinations than any other model except for Llama 4 Maverick.

LM Studio Discord

OpenAI Drops gpt-oss Models!: OpenAI released gpt-oss, a set of open models under the Apache 2.0 license, available on lmstudio.ai with sizes of 20B and 120B parameters.
- Community members testing the models on LM Studio reported broken links on the LM Studio website and difficulties locating the 120B model, as well as setting the context.
LibreChat Supercharges LM Studio Speed!: Members raved about the blazing fast speed of inference using models in LM Studio via Libre Chat, claiming It’s like a identical ChatGPT (OpenAI) UI that serves all my LM studio models.
- One user resolved a connection issue by adjusting YAML indentation, emphasizing the importance of configuration details.
Tailscale Troubles? Bind to 0.0.0.0: Users ran into setup issues trying to use AnythingLLM with LM Studio via a Tailscale IP, where the model list would fail to populate.
- The problem was resolved by setting the LM Studio host to 0.0.0.0 to allow external connections and opening port 1234 in the firewall.
Hardware Havoc: Windows Page File Still Matters: Despite abundant RAM, disabling the page file on Windows can lead to weird, unexplainable crashes, as some programs rely on it.
- A better alternative is using zram to compress pages and store them in RAM, potentially fitting 2-3 compressed pages in one actual page.
CUDA Runtime Confusions Cleared!: For setups with multiple GPUs like a 5090 and 4090, users recommend using CUDA 12 for the runtime.
- Advised to opt into the beta branch of both LM Studio and the runtime.

OpenAI Discord

OpenAI Unveils GPT-OSS Models and Hackathon: OpenAI released the GPT-OSS model family and launched a six-week virtual hackathon in collaboration with Hugging Face, NVIDIA, Ollama, and vLLM.
- The hackathon features categories like Best Overall, Robotics, and Wildcard, offering winners cash prizes or NVIDIA GPUs, plus a $500K Red Teaming Challenge on Kaggle is being held.
GPT-5 Release Date Still Unclear: Community members are actively speculating about the release of GPT-5, though some believe a more incremental update like GPT-4.1 or a unification of existing models is more likely; this X post is circulating.
- Despite anticipation, skepticism remains, with some suggesting OpenAI may be struggling to make GPT-5 sufficiently impressive.
GPT-OSS Model Tested by Community: The 120B parameter GPT-OSS model reportedly achieves near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU, according to OpenAI’s blogpost.
- The 20B parameter model is said to deliver results similar to OpenAI o3‑mini and can run on edge devices with only 16 GB of memory.
AI sparks Academic Integrity Debate: A professor is teaching other professors how to use AI to create exams, but students are conversely using AI to complete them, raising concerns about critical thinking erosion.
- The sentiment is that AI should offload lower-skilled tasks, letting individuals focus on the important parts of the problem, but switching off one’s own brain results in generic fast food outputs.
New Ollama GUI Eases Local Model Access: The new Ollama UI is praised for its simplicity, especially the toggle to enable network serving, though some find it immature compared to UIs like AnythingLLM and Dive.
- The new Ollama GUI has made running models locally easier than ever before.

OpenRouter (Alex Atallah) Discord

Anthropic Opus 4.1 Takes Coding Crown: The new Anthropic Opus 4.1 model is now live on OpenRouter and leading in the SWE Bench coding benchmark, detailed on X.
- The model can be accessed here for immediate use.
OpenAI Back to Open Source with GPT-OSS: OpenAI launched gpt-oss, new open-weight models with variable reasoning, on OpenRouter, detailed on X.
- The models include gpt-oss-120b at $0.15/$0.60 per M input/output tokens and gpt-oss-20b at $0.05/$0.20 per M input/output tokens.
Model Prioritization Bug Vanquished: A quick fix has resolved the model vs models prioritization issue by ensuring only model or models is used, as per Google AI documentation.
- The fix prevents conflicts and ensures proper model selection for users.
OpenRouter Mulls Claude Cache Stripping: Some members have requested that OpenRouter automatically strip cache parameters for Claude providers that don’t support caching.
- Members noted that Azure and Google Claude providers are blacklisted in their settings because they don’t support caching.
Gemma 3 Demonstrates Emotional Intelligence: Members suggest using Gemma 3 27b for understanding emotions, referencing the EQ benchmark by EQbench for evaluating LLM emotional understanding.
- Users advised against using DeepSeek R1 for emotional understanding tasks.

Latent Space Discord

Anthropic Crowns Community Champion: A member from Auth0 joined Anthropic to run community for Claude Code, celebrated by the web dev community.
- This move signals Anthropic’s investment in community engagement and support for its Claude Code product.
OpenAI Open Sources GPT: OpenAI released GPT-OSS, an open-source model, along with a cookbook and model card.
- The release includes resources for running the model with vLLM and detailed information about its capabilities and limitations.
Google Deepmind Unveils Genie 3: Genie 3 was announced, claiming to be the most advanced world simulator ever, and it’s capable of high-fidelity visuals at 20-24 fps with dynamic prompting, persistent world memory, and rapid world generation, according to this blogpost.
- Genie 3’s capabilities mark a significant step towards more realistic and interactive simulation environments for AI development.
Claude Opus 4.1 Targets Agentic Excellence: Anthropic launched Claude Opus 4.1, an upgraded model focusing on better performance for agentic tasks, real-world coding, and reasoning, available via API, Amazon Bedrock, and Google Cloud Vertex AI.
- The upgrade aims to improve Claude’s ability to handle complex, multi-step tasks and integrate more effectively with real-world applications and coding environments.
Reflection AI Seeks Billion-Dollar Boost: According to this tweet, Reflection AI, a one-year-old startup founded by ex-Google DeepMind researchers, is reportedly in discussions to raise over $1 billion to build open-source LLMs aimed at rivaling DeepSeek, Meta and Mistral.
- This substantial funding round would position Reflection AI as a major player in the open-source LLM space, challenging established models.

Nous Research AI Discord

Qwen-Image Shows Text-to-Image Talents: Users are attempting to run Qwen-Image locally using fp8 diffusers, though there is no ComfyUI support yet and the image editing model is unreleased.
- The community observed that Qwen-VL already handles image inputs, with the released model emphasizing text rendering capabilities via its text encoder.
XBai-o4 Scales Heights with RL: XBai-o4 claims scaling via continued RL from QwQ32, employing BoN scaling with a reward model to generate 32 CoT reasoning traces, as detailed in their paper.
- A classifier head selects the best trace based on these CoT traces, though implementation details remain vague.
Harmony Chatting with GPT-OSS: GPT-OSS adopts the Harmony chat format, integrating developer roles and channels for more structured interactions.
- Notably, the Horizon beta elucidates these roles during user chats, enhancing the conversational experience.
Claude Agents Dominate Terminal Benchmarks: Terminal benchmarks reveal Claude agents dominating the top spots, securing 9 out of the top 10 positions, prompting questions about the leaderboard’s metrics.
- It was also mentioned that OpenAI is now giving away their ultra-optimized math CoT, which could help open weight math reasoning models.
OpenAI drops GPT-OSS Model Card: A member highlighted the availability of the OpenAI GPT-OSS Model Card.
- The model card delineates the intended uses, capabilities, limitations, and potential risks associated with the open-source GPT model.

Eleuther Discord

Discord Logs It All!: Discord moderators utilize moderator logs to track deleted messages, as highlighted in a recent tweet.
- This feature ensures transparency and accountability within the moderation process, allowing moderators to review actions taken on the platform.
LLMs Suffer Algorithmic Monoculture: Scaling up LLMs to serve as judges may not resolve underlying issues due to algorithmic monoculture and difficulties in evaluating SOTA models.
- One member suggested that the scaling strategy may have been sufficient, but offered no supporting evidence.
YaRN Makes an Appearance: A team utilized YaRN within their project, earning support from community members.
- Several members received acknowledgments for their contributions in supporting its integration.
GPT OSS Arrives on Hugging Face!: The OpenAI community explores the newly released GPT OSS models, which are available on Hugging Face (20B version and 120B version).
- One member is actively conducting an SAE on the GPT OSS 20B model and cited benchmark numbers which can be found on the OpenAI training page.
lm-eval-harness Gets Seeded Through API: A member implemented the ability to pass a seed for the dataset through an API call for the lm-evaluation-harness.
- Feedback is being requested on the associated PR#3149.

Moonshot AI (Kimi K-2) Discord

Kimi K-2 Loses Chess Match to Deepseek O3: A chess match between Kimi K2 and Deepseek O3 resulted in a win for O3, with members noting its step-by-step reasoning approach.
- During the match, Kimi initially resigned, but was then forced to resign again after making an illegal move, resulting in an automatic win for O3.
GPT-OSS Drops, Community Goes Wild: The release of OpenAI’s GPT-OSS sparked excitement, with members noting its availability on platforms like Hugging Face and Openrouter.
- The community reported Day 0 support with llama.cpp PR on the way, and impressive benchmark results, but the live chess demo was cut short.
Kaggle AI Game Arena Puzzles Members: Members expressed amusement and confusion over Kaggle’s AI Game Arena competition, questioning the inclusion of a non-reasoning model.
- One member shared a GitHub link indicating that Go will be added to Game Arena, noting that Go is more difficult than chess.
Quantization Reduces Model Size: The model card on Hugging Face reported a 120B model requiring only 60GB, sparking discussion about quantization.
- It was clarified that MXFP4 quantization reduces the memory footprint, enabling the larger model to fit on a single 80GB GPU and the smaller model to run on systems with as little as 16GB memory.

Cursor Community Discord

Claude Sonnet Slays Gemini in Agentic Arena: Members are favoring Claude Sonnet 4 over Gemini 2.5 Pro for agentic ide contexts, because of its superior tool usage.
- While some prefer Gemini/Sonnet 4 for brainstorming, most agree that Claude is superior for these use cases.
Cursor Community Cries for Comprehensive File Format Support: Users are clamoring for the ability to edit PDF, .docx, .csv, and .xlsx files within Cursor, as local file uploads are currently unsupported.
- A member shared a link to the Cursor Feature Request forum to formally request the feature, highlighting that Cursor can extract text from publicly accessible PDF URLs.
Cursor’s Cut-Rate Yearly Subscription Confounds: A user was surprised by a $16 Cursor subscription, which is the monthly cost of a yearly subscription; the regular monthly subscription is $20.
- This pricing structure offers a discount for committing to an annual plan.
Linux Lovers Lament Lagging, Lethal Freezes: Linux users report freezing and unresponsiveness with Cursor on Linux, rendering it nearly unusable for some due to possible network issues or bad requests.
- Members pointed to several issues being reported on the forums, with the team currently working on fixes.
Background Agents Bedeviled by Breakdowns: Multiple engineers reported that background agents started to fail to spin up repeatedly in the last few hours.
- A member asked for a request ID to investigate the issue and confirmed sending a couple of request IDs via PM and expressed gratitude for the support.

HuggingFace Discord

Hugging Face’s HF Toolkit Gets Turbocharged: Hugging Face unveils updates to its ecosystem, featuring a lightweight experiment tracking library called Trackio, aimed at streamlining ML experiment management.
- They further enhanced their toolkit with four new OCR datasets comprising over 20 million images, accelerated Transformers with kernel support, expanded HF Jobs to launch compute tasks on CPUs/GPUs, and introduced a faster, more user-friendly hf CLI (Hugging Face blog).
Local AI Devs Love ZeroGPU: Enthusiasts are testing ZeroGPU with surprising success, showcasing a Space running a 340M t2i model pretrained at home.
- Image generation completes in just 1 second on ZeroGPU’s H200, while prompt generation takes 2-5 seconds, sparking discussions on optimal hardware like RTX 3090/AMD MI60 for local RAG.
Dataset Drama: Flagged as Malware: A member reported a dataset flagged as unsafe despite antivirus scans, identified as Pickle.Malware.NetAccess.pwn.STACK_GLOBAL.UNOFFICIAL, directing users to HuggingFace’s security documentation.
- The community suggested removing shards and examining the dataset line by line, noting the limitations of website-based scanners.
Fine-Tuning Frenzy Fueled by Unsloth: Guidance on fine-tuning open-source LLMs leads to SmolFactory and an OpenAI cookbook.
- Members shared that OpenAI’s 20B and 120B open models can run on consumer hardware using Unsloth.
Agents Course Assignment Submissions Spark Snafus: Multiple members encountered issues submitting the final assignment in JSONL format for the AI agent’s course, reporting errors like “task ID not found in the file” and are seeking debugging advice.
- The API requires a specific JSON format including username, agent_code, and an answers array containing task_id and submitted_answer.

Yannick Kilcher Discord

Low-Cost Voice Cloning Emerges: Members explored feasible deep learning projects on consumer hardware (e.g., 5090) for voice cloning, focusing on methods that minimize the need for extensive audio datasets and costly GPU rentals.
- The discussion centered on achieving high-quality voice cloning without significant upfront investment.
Claude Code Springs Data Leak: A user found that Claude Code introduced data leakage into an XGBoost pipeline by incorporating an engineered feature directly related to the prediction target.
- The observation sparked concerns about the reliability of LLM systems in automating ML pipelines, with the suggestion to double check everything.
Deepseek-R1 Keeps Lead Over Rivals: Despite anticipation, Deepseek-R1 still outperforms Kimi-K2 and GLM4.5, yet is hampered by being too slow for practical use.
- This comparison highlights the ongoing competition in language model performance and usability.
Attention Sink Layers Sink into Transformers: Hugging Face’s Transformers library now utilizes attention layers with a learned attention sink per-head, changing the softmax denominator as detailed in the release notes.
- This method parallels adding a prepended token with all-zero Key and Value features, as explored in this paper.
DeepMind’s Genie Dreams of Three: DeepMind unveiled Genie 3, scaling compute and data of their world model, and although no paper was released, the original Genie paper and the SIMA paper provide insight.
- Deepmind released several blog posts (Genie 1, Genie 2, Genie 3)

Notebook LM Discord

Whisper Transcriptions Edge Out YouTube URLs: A member reported superior video transcription results using local Whisper compared to using YouTube URLs within NotebookLM, emphasizing its accuracy.
- The user downloads entire channels, transcribes them locally, and then inputs the transcriptions into NotebookLM for interaction, noting that NotebookLM lacks playlist support.
Users Await Elusive Video Overviews: Users are still awaiting the Video Overview feature in NotebookLM, despite announcements of a complete rollout.
- The rollout is taking longer than expected, and Pro users are among those still waiting, pointing towards underlying infrastructure challenges.
Podcaster Shares Custom Prompt Gold: A member provided a Google Docs link to custom instructions tailored for guiding podcast generation, focusing on iterative adjustments and name modifications for model applications.
- The custom prompt shared emphasizes reducing filler words and avoiding interruptions during podcast generation.
Image Uploads Temporarily Vanish: Google has temporarily disabled the image upload feature in NotebookLM to address infrastructure problems affecting users.
- Initially, extracting images from PDFs was suggested as a workaround, but this was rendered moot when the feature’s removal was confirmed.
Data Privacy Weighs on Users’ Minds: Users are actively questioning data privacy practices, particularly whether NotebookLM uses user data for model training.
- To address these concerns, a member shared a link to Google’s data protection policies outlining NotebookLM’s data handling procedures.

Modular (Mojo 🔥) Discord

Mojo Embraces Pythonic Decorators: A member lauded the simplicity of Python decorators and requested that Modular implement similar flexibility in Mojo, enabling logic definition at compile-time or runtime, but another contributor says a Zig-style reflection system may be more likely.
- The reflection system being considered can be used for adding functions, modifying structs, and introducing new struct members, though manipulating functions at the compiler level could be challenging.
Volokto JavaScript Runtime Written in Mojo: A member developed Volokto, a JavaScript runtime written in TypeScript, with the source code available on GitHub.
- The runtime includes user-made functions, nested control flow, function calling with arguments, and implements a dictionary type and console with bytecode resembling CPython.
Modular Platform 25.5 Boosts Scale: Modular Platform 25.5 is now live, featuring Large Scale Batch Inference via SF Compute, standalone Mojo Conda packages, and an open-source MAX Graph API.
- This version offers enhanced interoperability between MAX and PyTorch through @graph_op, along with smaller, faster MAX serving containers and performance improvements.
MAX ❤️ PyTorch with graph_op: Release 25.5 offers integration through @graph_op, enhancing interoperability between the frameworks and enabling running multiple AI agents in Mojo using multiple instances of the Modular CLI and a reverse proxy.
- For scenarios involving numerous sub-agents, custom applications leveraging MAX as a library may be necessary.
Apple Silicon CPUs Mandate: Users discovered that only Apple Silicon CPUs are officially supported for macOS, causing compatibility issues for those with Intel-based systems, according to the system requirements.
- An Intel Docker Ubuntu container was suggested as a workaround for those not using Apple Silicon.

GPU MODE Discord

Debate Sparked Over CUDA Kernels vs Compute Shaders: A member ignited a discussion on the advantages of CUDA kernels versus compute shaders for image post-processing with libtorch C++, particularly for long-term benefits and experiences, emphasizing that non-Nvidia compatibility isn’t a concern.
- The member also asked about MXFP4 implementation details, sparking discussion around OpenAI’s new models.
MXFP4 Model Mixup: OpenAI’s U8 Weights Deconstructed: Members dissected OpenAI’s innovative approach of utilizing U8 weights instead of FP4 in their latest open-weight model, clarifying that weights are packed as uint8, with scales as a uint8 view of e8m0.
- They pinpointed that the weights are unpacked back to FP4 during inference and training, employing a block size of 32 for MXFP4 and 16 for NVFP4.
Community Disagrees on H100’s FP4 Training Claims: Community members debated the plausibility of inference with H100 given its specifications, questioning its support for FP4 and pointing to an NVIDIA blog post.
- The discussions were heated, with one member dismissing H100 training as a blatant lie.
Helion Kernel DSL Seminar Announced: A seminar was announced for Helion, a kernel DSL from Pytorch, will be held tomorrow 2.15 PT, see Helionlang.com.
- The seminar promises to unveil how Helion simplifies writing fast kernels.
Vector Store Stumbles: CuTe’s 128-bit Vectorized Store Defect: A user reported that CuTe isn’t generating a 128-bit vectorized store from registers to global memory as expected, despite using a float4, noting that the compiler emits two separate STG.E.64 instructions instead of STG.E.128.
- This issue is causing concerns due to the breaking of memory coalescing across threads, as vectorized stores are expected to ensure contiguous data writing.

MCP (Glama) Discord

Need MCP Documentation ASAP: A user is seeking to create an MCP server that can access documentation from a repo or docs site, avoiding excessive round trips for information retrieval.
- The user needs an MCP server so they can access the documentation.
Tool Payments Get Standardized: Members are discussing standardizing payments for a potential world of thousands of tools, where AI assistants handle payments without needing individual account creation.
- A member pointed to this PR hoping it enables securely entering payment info on the client side without creating an account for one-time purchases.
PostMessage Proposal for In-Browser MCP: A member is working on a transport proposal for in-browser ‘postMessage’, including a demo showcasing a client + server hosted via GitHub Pages.
- They’ve also drafted a SEP and seek an MCP spec maintainer to sponsor it for standardization.
MCP for Embedded Systems Explored: A new user is asking about a useful MCP for programming embedded systems within STM32CubeIDE.
- No further information was shared in the discussion.
AutoGen and MCP Make YouTube Search Easy: A member shared a tutorial on building a multi-agent chatbot with AutoGen and MCP servers for YouTube search from scratch, available at YouTube.
- The tutorial is intended to guide users through the creation of a functional chatbot using these tools.

aider (Paul Gauthier) Discord

DeepSeek becomes Aider’s top pick: Members endorse DeepSeek as a suitable model for use with Aider, citing its cost-effectiveness and performance using Deepseek-R1 via OpenRouter.
- A user reported being quite happy with it.
OpenAI open models emerge: OpenAI released new open models, sparking discussion among the community.
- A member humorously noted the rapid pace of releases, joking as soon as I work out how to load GLM air stable on my machine, something else comes out 117B with 5.1B active… .
Aider eyes non-interactive mode: A member inquired about the possibility of using Aider in a non-interactive mode for scripting purposes.
- Referencing the aider scripting docs, they indicated difficulty in finding relevant documentation.
Pikuma’s LLM Vibe Test goes viral: A member shared Pikuma’s LLM vibe test for the community’s amusement, showcasing its ability to explain this code.
- The test seems to have generated some buzz in the community.

DSPy Discord

DSPy Cracks PDF Document Boundaries: A member shared a writeup on leveraging DSPy to detect document boundaries in PDFs, showcasing its potential in document processing: kmad.ai/Using-DSPy-to-Detect-Document-Boundaries.
- The author introduced DSPy to knowledge graph practitioners, hinting at exploring deeper multi-step workflows and optimization in the future: X post.
DSPy Courts Knowledge Graph Gurus: A writeup introducing DSPy to knowledge graph practitioners was shared, emphasizing its potential to enhance productivity with LLMs: blog.kuzudb.com.
- The author is looking forward to exploring multi-step workflows and optimization strategies in the coming weeks.
SIMBA Optimizer Gets the Marius Vach Treatment: A member shared a detailed write-up explaining the intricacies of the SIMBA optimizer: blog.mariusvach.com.
- The write-up was lauded as super intuitive and well explained by another member of the community.
GEPA Still Ghosting DSPy: A user inquired about the elusive availability of GEPA for use within DSPy.
- Another user provided a Discord link, confirming it hasn’t been released yet.
System Prompt Optimization: Fine-Tuning’s Friend or Foe?: A user questioned the efficacy of using an optimized sys_prompt with DSPy during model fine-tuning, particularly when adding reasoning traces to a non-reasoning model.
- They outlined a three-phase training approach involving SFT and GRPO with different sys_prompts, admitting lol idk what im doing, im just f-ing around to find out due to non-verifiable rewards.

LlamaIndex Discord

LlamaParse Transforms PDFs into Reports: @tuanacelik demonstrates how LlamaParse turns dense PDFs into multimodal reports with interleaving text and images via this Twitter thread.
- The process involves ingesting research papers with LlamaParse and building a report-generation agent that dynamically chooses tools for high-res OCR and chart images.
LlamaCloud Powers AI Scaling via Document Intelligence: LlamaCloud’s parsing capabilities facilitate AI companies in scaling from prototype to production by handling complex document ingestion as demonstrated in the building of Delphi’s “digital minds” mentorship platform as mentioned in this Tweet.
- The parsing technology excels at processing malformed PDFs and embedded images.
OpenAI Drops GPT-OSS Models: OpenAI released their first open-source LLMs since GPT-2, GPT-OSS-120B & GPT-OSS-20B, under the Apache 2.0 license as per this tweet.
- These models feature reasoning that matches o4-mini, can run locally, and are ready to use with LlamaIndex.
Document Agents Tackle Messy Financial Documents: A webinar will demonstrate how Document Agents manage messy financial documents using LlamaCloud’s tooling.
- The session will showcase systems that work with complex, multimodal documents, and will be held in 1 week.
LlamaExtract Grapples with Graph-Parsing Challenges: Members confirmed reviewing LlamaExtract with graphs and acknowledged that these cases are notoriously difficult for LVMs/LLMs.
- A LlamaParse team member is expected to join the discussion to further explore these challenges.

Manus.im Discord Discord

Manus Platform Experiences Downtime: Users reported that the Manus platform is experiencing low usage and might be dead, which could indicate a need for alternative solutions or improvements.
- A workaround was discovered using sub-agents to improve code and project quality within the Manus environment.
TradingView Premium Offered Freely: A user shared a Reddit link claiming to offer a free full version of TradingView Premium for Windows and macOS.
- The offer’s authenticity and safety are questionable, considering it was posted on Reddit and is not an official TradingView promotion.
Guide Emerges for Flutter App Creation: A user posted a guide on creating Flutter apps within daily credit limits, accessible at flutter-web-emulator.vercel.app.
- The page contains ads and is based on personal experience, with another user flagging the link as a potential scam.

LLM Agents (Berkeley MOOC) Discord

Syllabus Changes Coming to LLM Agents Course: Members discussed potential syllabus changes for the upcoming LLM Agents course, noting the field’s rapid evolution.
- One member stated that “As the agents’ area evolves it’s normal for the syllabus to be adapted”.
LLM Agents Gets an Advanced Sibling: Participants distinguished between two courses: LLM Agents and Advanced LLM Agents, highlighting that they are distinct.
- A member clarified that “They are two different titles - one is llm agents other advance llm agents. Syllabus is different”.
Speakers to Change for LLM Agents Course: Despite similar overall topics, the speakers for the LLM Agents course are expected to be different.
- This signals a fresh perspective and expertise for the course content.

Torchtune Discord

Torchtune Channel Welcomes Sharing: The Torchtune channel has explicitly welcomed the sharing of its content on other public servers.
- Channel admins have encouraged broader dissemination of information, expressing happiness at the prospect of the channel being shared.
Channel Information Dissemination: Members of the Torchtune channel inquired about sharing channel information on external public servers.
- The channel owner responded positively, encouraging the sharing of content and expressing enthusiasm for wider distribution.

tinygrad (George Hotz) Discord

TinyPilot Plugs into Codebase Work: A member suggested that the TinyPilot tool might aid with codebase tasks.
- They cautioned that while it can be helpful, deeper codebase integration needs work.
Image Analysis Induces Laughter: An attached image, Screenshot_2025-08-04_at_12.08.33_PM.png, prompted a simple lol reaction.
- No further context or analysis was provided regarding the image’s content.

Cohere Discord

Engineer Pioneers Intelligent Voice Agents: An AI Engineer is building intelligent voice agents, chatbots, and AI assistants that handle inbound/outbound phone calls via SIP (Twilio).
- The engineer’s work involves GPT-powered chatbots that learn from documents, audio, and scraped data from forums, Discord, Slack, and websites using retrieval-augmented generation (RAG), combined with workflow automation to streamline communication and processes.
Engineer navigates Tech Stack: The AI Engineer is skilled in Python, JavaScript, Node.js, FastAPI, and tools like LangChain, Pinecone, OpenAI, Deepgram, and Twilio.
- The Engineer is available for freelance, remote, or startup projects.

MLOps @Chipro Discord

Logs and Click-Throughs Boost Rankers: A member proposed enhancing rankers by mining search logs, query data, document lists, and click-through data.
- Another member reminded to account for the costs, as employing such logs comes at a price.
Data Use Demands Dollar Diligence: Collecting and leveraging search logs for fine-tuning a ranker carries financial implications.
- It’s crucial to judiciously assess these costs before embarking on this methodology.

Codeium (Windsurf) Discord

Claude Opus 4.1 Lands on Windsurf!: The newest model from Anthropic, Claude Opus 4.1, has been released on Windsurf.
- Engineers should note that this top-tier model comes with a 20x credit rate on the platform.
Windsurf Teases New Model, Surfs Up!: Windsurf has announced a new model that will be coming to their platform soon.
- The announcement featured an image of a surfer, hinting at performance to come.

Nomic.ai (GPT4All) Discord

GGML compatibility with PyTorch and ONNX models: A user inquired about the feasibility of running PyTorch or ONNX models using GGML.
- The discussion is currently open, awaiting responses on the potential and methods for such compatibility.
TBD: Further Discussion on Model Compatibility: Placeholder topic to ensure the minimum number of topic summaries is met.
- Further discussion and details to be added as the conversation evolves.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

You are receiving this email because you opted in via our site.

Want to change how you receive these emails? You can unsubscribe from this list.

Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #announcements (1 messages):

kesku: https://x.com/perplexity_ai/status/1952532113095643185 <@&1105626802732404746>

Perplexity AI ▷ #general (1206 messages🔥🔥🔥):

Comet Browser, OpenAI OSS Model, Claude 4.1 Opus, Perplexity Search Ranking, GPT-5 Release Speculation

Comet Browser Invite Scramble Continues: Members are still actively seeking invites to the Comet Browser, with users sharing anecdotes about acquiring invites via X (formerly Twitter) and Discord, while others report difficulties and offer advice on DMing random users for invites.
- Agentic browsers like comet may also be scraping websites that explicitly say no, raising some debate about the ethics of scraping.
OpenAI Releases Open-Source LLM!: OpenAI released an open-source LLM, GPT-OSS-120B, causing excitement and prompting discussions about hardware requirements (H100 GPUs are recommended), censorship levels, and performance compared to other models; available at HuggingFace.
- Members are already experimenting, though some are crashing their computers and noting the need for quantization, prompting some to say that it is censored to hell and back.
Claude 4.1 Opus Released, Initial Reactions Mixed: Anthropic has released Claude 4.1 Opus, with initial reactions suggesting it offers mild improvements over previous versions, with better multi-file debugging but it fails many spatial tests, also available on PPLX for the same pro price.
- Some suggest it may be a move to one-up OpenAI, while others highlight that the model may not offer enough of an improvement to justify its pricing and there are old rate limits, and that this may only be a flex on OpenAI.
Pondering the #3 spot for Perplexity Search Rankings: Discussion revolves around Perplexity’s search ranking (purportedly #3) and whether the company plans to improve it, with some suggesting they are focusing more on Comet at the moment, while others are impressed by it.
- Some think that PPLX has access to tools as well, given a recent leaderboard ranking of #1 o3-search by @OpenAI.
GPT-5 Speculation Intensifies: Anticipation is building for a potential GPT-5 release, possibly on the 7th, driven by hints from Sam Altman and chatter about OpenAI releasing an Operating System Model.
- There is conjecture that it might be related to Horizon, and debate on whether Grok 4 would break any benchmarks given this release.

Youzu AI e-commerce, Room Visualizer, Youzu Lens, Google Genie AI

Youzu.ai Revamps E-Commerce with Visual AI: Youzu.ai is transforming online shopping with visual AI infrastructure, as demonstrated in a comprehensive demo at Vivre, which operates across 10 CEE countries.
Room Visualizer Redesigns Shopping Experience: The Room Visualizer feature allows users to upload a room photo and receive a complete redesign in seconds, enabling instant shopping of every item, as seen in this demo.
Youzu Lens Powers Visual Product Discovery: The Youzu Lens feature lets users take a picture of anything to find similar products instantly, facilitating discovery-driven commerce where customers explore, visualize, and purchase through immersive experiences.
Google’s Genie AI: Google unveiled Genie AI that turns images into playable worlds, according to this Perplexity page.

Perplexity AI ▷ #pplx-api (6 messages):

Sonar API, Perplexity Docs

Sonar API Intro Vid Surfaces: A user new to Sonar API inquired about its usage.
- Another user shared a link to a YouTube video about Sonar API.
Perplexity API Documentation Shared: In response to the question about the Sonar API, a user shared the Perplexity AI documentation link.
- The user offered assistance and shared a GIF as well.

Unsloth AI (Daniel Han) ▷ #general (1200 messages🔥🔥🔥):

Unsloth Quantization Requests, Nvidia Nemotron Super 49B 1.5, Diffusion Based Quantization Paper, GPT-OSS Model Analysis, GPU Recommendations for Training

Unsloth Community Requests Model Quantization: Members requested Unsloth to quantize the yisol/IDM-VTON model, but were informed that while quantization might be possible, Unsloth doesn’t currently offer training support for diffusion models due to lack of support.
- It was also mentioned that Unsloth usually doesn’t handle custom quantization requests unless there’s significant demand, because it requires manual labor and compute efforts.
Members Debate Nvidia’s Nemotron Super 49B 1.5 Model: Members discussed the new Nvidia Nemotron Super 49B 1.5 model, with mixed reactions; one member found it to be a great daily driver due to its prompt adherence, especially when combined with the advice to NOT USE F**ING LISTS*.
- Others expressed interest in its capabilities as a general thinking model as opposed to coding and math overoptimized models, noting its good instruction following, while also mentioning its dry prose.
Diffusion Model Quantization Research Discussed: A member shared a paper on diffusion-based quantization, highlighting techniques like QuEST, Q-Diffusion, Q-DM, and TDQ with PQT/QAT potentially being helpful.
- The paper proposes NIC (Neural Image Compression), with the original requester expressing a desire to use the model as a daily driver, but acknowledged the challenges of spreading a 49B model like smaller 14B models.
OpenAI’s GPT-OSS Model Receives Mixed Reviews and Safety Concerns: Initial reactions to the OpenAI GPT-OSS model were mixed, with one member calling it pretty much junk, while others voiced concerns over its safety measures, speculating its a result of US safety acts, leading to discussions about abliteration and its potential to make the model dumb as F__K.
- It was noted that GPT-OSS could be outperformed by GLM 4.5 Air, leading to discussions about the model’s overall value and the possibility of a Chinese startup surpassing it soon, with its safety measures even being criticized for being overly cautious and leading to nonsensical outputs.
GPU Recommendations and Memory Management Strategies: Members discussed GPU recommendations for training and running AI models, with the consensus that used 3090s offer the best value, but some members are looking into 5090s.
- There was also advice on leveraging cloud services like Google Colab Pro and RunPod, but Google Colab was seen as suboptimal and unreliable; members also discussed the best memory management strats for fitting large models into smaller VRAM capacities.

Unsloth AI (Daniel Han) ▷ #introduce-yourself (4 messages):

Software Engineer seeking new opportunities, AI Engineer specializing in voice agents and chatbots

Senior Software Engineer Looks for New Role: A Senior Software Engineer is seeking new opportunities and invites others to reach out if they have a project or role that could benefit from their expertise.
- They did not specify which type of software.
AI Engineer Specializes in Intelligent Voice Agents: An AI Engineer specializes in building intelligent voice agents, chatbots, and AI assistants that handle inbound/outbound phone calls via SIP (Twilio), including features like call booking, IVR, and voicemail.
- Their toolkit includes Python, JavaScript, Node.js, FastAPI, and tools like LangChain, Pinecone, OpenAI, Deepgram, and Twilio, and they are available for freelance, remote, or startup projects.

Unsloth AI (Daniel Han) ▷ #off-topic (7 messages):

VITS Male Voice Issues, Dataset size issues with voice models, Speaker Dimension Problems, RVC model

VITS Model Struggles with Male Voice Conversion: A member is experiencing issues with VITS specifically for male voice conversion, where the model breaks the voice even when reconstructing it from a spectrogram.
- In contrast, RVC works almost perfectly, leading to questions about whether the problem lies in dataset size, speaker dimension, or the VITS architecture itself.
Exploring Dataset Size as Limiting Factor: The member considers whether the smaller dataset size for the male voice (20 hours) compared to the female voice (which works seamlessly) is the issue.
- They question if increasing the speaker dimension from 256 might help, especially considering the goal of real-time voice conversion on stage.
Considering VITS Architecture: The member also suggests the architecture of the VITS model could be the root cause.
- They share that they will test out MMVC_Trainer because it is explicitly designed for voice conversion and it is NOT diffusion NOR has HuBERT.

Unsloth AI (Daniel Han) ▷ #help (77 messages🔥🔥):

GGUF Exporting Issues, TRL Compatibility with Unsloth, GRPO Batching and Chunking, SFTTrainer with Completion Only Loss, Qwen3-Coder Chat Template

Users Face GGUF Exporting Glitches with Gemma-3n: A user ran into a RuntimeError when exporting their finetuned gemma-3n model to GGUF format, despite following the official notebook, though they were able to use ggml-org’s gguf-my-repo to produce a Q8_0 quantized model.
- The user desires to export to gguf with f16 without any quantization but can’t find a suitable space to do so besides compiling locally with llama.cpp.
TRL Version Compatibility Troubles Unsloth Users: Users report an ImportError related to ConstantLengthDataset from trl.trainer.utils, indicating a potential version incompatibility issue between Unsloth and the TRL (Transformers Reinforcement Learning) library.
- One user was directed to update unsloth to resolve the problem, with assurances that the problem was fixed in the latest version.
Deciphering GRPO Batching Dynamics: Inquiries arose about how Unsloth handles batching for **GRPO ( গ্রুপের পলিমরফিজম অপটিমাইজেশন) , specifically whether entire trajectory groups are batched together.
- A member clarified that if n_chunks is set to 1, each batch corresponds one-to-one with a group.
Users Scramble SFTTrainer Dataset Formatting: A user encountered difficulties using SFTTrainer with completion_only_loss=True, facing errors related to formatting_func and tensor creation despite following the TRL documentation.
- It was suggested to use the train_on_responses_only function from unsloth.chat_templates and to refer to conversational notebooks for examples.
Community Consults Chat Template Conundrums: A user sought guidance on the correct chat_template for Qwen3-Coder-30B-A3B-Instruct.
- It was suggested to check the tokenizer_config.json file for the model, exemplified by the Qwen3-Coder-30B-A3B-Instruct’s tokenizer config.

Unsloth AI (Daniel Han) ▷ #showcase (3 messages):

Gemma 3, LlamaTale, QuixiAI

Gemma 3 Enters New Round of Testing: The model leftyfeep/ape-fiction-gemma-3-4b-Q8_0-GGUF will be tested with LlamaTale.
- The training code is available at huggingface.co/electroglyph/gemma-3-4b-it-unslop-GRPO-v2.
QuixiAI Pitched: A member pitched the project to Eric Hartford of QuixiAI/Dolphin/Samantha.
- He was looking for something like it.

Unsloth AI (Daniel Han) ▷ #research (8 messages🔥):

Prompt variations for fine-tuning, LLM domain specific language framework, Token Decoder Maps GitHub project, Open Evolutionary Agents Blogpost

Fine-Tune with Prompt Variations: A member inquired whether it is recommended to use different prompt variations for input when fine-tuning a model for a specific task like classification.
- They also asked about any potential impacts of doing so on model performance.
Token Decoder Maps Framework is Born: A member introduced their LLM domain-specific language framework designed for purposes like summarization.
- The GitHub project utilizes EN- tokens to summarize specific concepts or facts for later injection and prompting.
GhostArchitect01 shares Github Project: A user suggests checking out his token-decoder-maps repo which has the purpose of summarization of specific concepts or facts for saving and later injection and prompting.
- He suggests summarizing the project by running the attached text file through AI.
Open Evolutionary Agents Blogpost: A member shared a link to a blogpost on Open Evolutionary Agents on HuggingFace.
- No further details were provided about the blogpost’s content.

Unsloth AI (Daniel Han) ▷ #unsloth-bot (82 messages🔥🔥):

SFTTrainer columns, Llama 3 fine-tuning errors, dtype check steps, LoRA Model generation, OpenAI OSS model

SFTTrainer Dataset Column Craving: A member asked which columns SFTTrainer looks at, specifically if it uses the tokenized prompt field “text” and another member provided dataset requirements and code snippets.
- The code applies a chat template to the dataset to create a “messages” column, and then a “text” column is created by tokenizing the messages.
Llama 3 Fine-Tune Fiasco: A member encountered an “Unsupported conversion from f16 to f16” error while fine-tuning Llama 3.2 3b instruct, and shared a code snippet using SFTTrainer.
- They are using fp16 = not is_bfloat16_supported() in their training arguments, indicating a potential dtype issue which needs further investigation, with the trainable parameters at 0.75%.
Model Dtype Detective Steps: A member requested detailed steps to check their model’s dtype.
- They also asked how to monitor their fine-tuned model, with tensorboard activation suggested as a potential solution.
LoRA Loading Logistics: A member asked about using a finetuned model with lora_only and model_tensor for generation.
- They asked if lora_request = model.load_lora(model_path) is valid and wanted to know whether moving from SFT to GRPO requires using the same model.
OpenAI OSS Model Outputting Gibberish: A member reported that the OpenAI OSS 120B model was only outputting “GGGGGGG”.
- They shared the command they used to run the model with llama.cpp, including parameters for context size, GPU layers, temperature, and top-p/top-k sampling.

LMArena ▷ #general (1152 messages🔥🔥🔥):

GLM 4.5 is a beast, Kaggle Game Arena AI chess exhibition tournament, Long context reasoning benchmark, GPT-5 release, OpenAI open source models limitations

GLM 4.5 touted as Open Source Beast: A member described GLM-4.5 as potentially living up to the hype, scoring 5/5 on a test suite and outperforming other models including Horizon Beta, Grok 4, o3 Pro, Gemini 2.5 Pro, Claude Sonnet and Opus in a ‘create-an-html-game’ test.
- The member noted quirks like infinite thinking loops and result discrepancies, concluding that despite these issues, GLM-4.5 is overall really strong.
AI Chess Tournament Inaugurates Kaggle Game Arena: The Kaggle Game Arena will kick off with a 3-day AI chess exhibition tournament featuring 8 models (YouTube link).
- Concerns were raised about chess being a strategy optimization game rather than a test of intelligence, as well as how non-visual models will interpret the board, with one member suggesting inputting game sequences like old newspaper reports.
Long Context Benchmarks accommodate most released models: Members discussed context windows in benchmarks, noting the necessity to accommodate most released models, even those with smaller context windows, to ensure fair scoring.
- It was also pointed out that different versions exist for different context sizes, allowing models to be punished/rewarded accordingly.
GPT-5 Rumored for Release This Week: There were discussion about a potential GPT-5 release in a few days, with one member noting that internal leaks have cited the same thing (x.com link).
- Jimmy Apples speculated that heavy users may not notice improvements in GPT-5 due to auto-routing.
OpenAI open source models show limitations: Members tested GPT-OSS 120B and found it disappointing, with performance perhaps on par with o3-mini or Qwen3 235B A22B, and more hallucinations than any other model except for Llama 4 Maverick.
- It was also noted that the benchmarks for the OpenAI Open Source models are very inefficient, with only 5.1B active parameters.

LMArena ▷ #announcements (1 messages):

New Models, GPT-OSS, Claude Opus

New Models Arrive on LMArena: New models added to Text & WebDev Arena on LMArena!
- The models are: OpenAI gpt-oss-120b, OpenAI gpt-oss-20b, and Claude Opus 4.1 (battle mode only).
GPT-OSS models debut: OpenAI’s gpt-oss-120b and gpt-oss-20b models are now available in the arena.
- These models expand the range of choices for users interested in open-source alternatives.

LM Studio ▷ #announcements (1 messages):

OpenAI gpt-oss models, LM Studio 0.3.21 (b4) update

OpenAI Opens Up with gpt-oss: OpenAI released gpt-oss, a set of powerful open models under the Apache 2.0 license, with sizes of 20B and 120B parameters, available on lmstudio.ai.
- The release marks a significant move towards open-source AI, as detailed in a blog post at http://lmstudio.ai/blog/gpt-oss.
LM Studio Gets Ready for gpt-oss: To support the newly launched gpt-oss models, users are encouraged to update to LM Studio 0.3.21 (b4).
- The update ensures compatibility and optimal performance when running the 20B and 120B parameter models.

LM Studio ▷ #general (593 messages🔥🔥🔥):

LM Studio + Libre Chat Speed, Tailscale IP Setup with LM Studio, Note Taking Tools Integration with LLMs, OpenAI's New GPT-OSS Models, Hardware for Running LLMs

LibreChat boosts LM Studio Speed: Members raved about the speed of inference using models in LM Studio via Libre Chat, with one stating It’s like a identical ChatGPT (OpenAI) UI that serves all my LM studio models. But it’s just blazing fast. I feel like I’m using cloud GPU!
- One user had trouble connecting to the LM Studio API with LibreChat, but later resolved the issue with YAML indentation.
LM Studio struggles with Tailscale, needs 0.0.0.0 bind: A user had trouble setting up AnythingLLM with LM Studio using a Tailscale IP, and the model list wouldn’t populate.
- They resolved this by setting the LM Studio host to 0.0.0.0 instead of 127.0.0.1, which allows external connections, and allowing port 1234 through the firewall, as well as a discussion of using headless.
Organizing Obsidian with LM Studio: A Helping Hand: Users discussed integrating note-taking tools like VSCode and Obsidian with LM Studio to organize and format notes, suggesting that simple scripts using pre-built prompts could assist with categorizing and rewriting notes.
- One user recommended using Logseq with its OpenAI plugin for rewriting and brainstorming on notes, however also advising other users of an Agent that has the Abillity to accsess your system.
GPT-OSS Dropped: Open Source Surprise!: Members celebrated the release of OpenAI’s GPT-OSS models and started testing them on LM Studio, noting that the 20B model is available for download via Discover and that it can be run using LM Studio in CLI.
- Community members reported issues such as broken links on the LM Studio website and difficulties locating the 120B model, which is not showing up in search, as well as how to specify system and developer messages to GPT-OSS models and setting the context.
Hardware Hunt: What Rig Rocks the LLMs?: The community discussed optimal hardware configurations for running LLMs, with recommendations ranging from 3090 and 4090 GPUs to refurbished mini-workstations with ample RAM.
- Users also debated the merits of AMD versus NVIDIA GPUs, with some noting that AMD now has native ROCm support, while others pointed out that NVIDIA cards are necessary for certain applications, such as unmute.

LM Studio ▷ #hardware-discussion (32 messages🔥):

Page File on Windows, NVMe Offload, CUDA Runtime for Multiple GPUs, Computer for Blender + ComfyUI + LLMs, Storage Device for LLMs

Windows Page File Shenanigans: Users advise against disabling the page file on Windows, even with abundant RAM, as some programs rely on it, which leads to weird, unexplainable crashes.
- Instead, consider paging to RAM using zram to compress pages and store them in RAM, potentially fitting 2-3 compressed pages in one actual page.
CUDA Runtime Confusion Resolved!: A user with a 5090 and 4090 GPU setup asked which CUDA Runtime to use.
- Another user recommended using CUDA 12 and advised opting into the beta branch of both LM Studio and the runtime.
Economical PC Build advice: A user with a 2500€ budget for a computer asked for recommendations for Blender, ComfyUI, and LLMs.
- A member suggested buying two used 3090s for around 1200€ and using pcpartpicker for inspiration on dual 3090 builds.
Storage Speed Matters (a little bit): A user asked if the storage device impacts running LLMs.
- The community responded that storage speed only matters for the initial model load time, and SSDs are generally fast enough but slower than RAM, but only for the size of models you can download/store.
Experimenting with 3 Arc Pro B50 GPUs: One user is considering a 3 Arc Pro B50 system, raising questions about PCIe lane usage and NVMe drive compatibility.
- They want to build it bc they can rather than it needing to do smthn.

OpenAI ▷ #annnouncements (4 messages):

OpenAI Open Models, Open Model Hackathon, Red Teaming Challenge, Inference Credits for Students

OpenAI Opens Open Models: OpenAI has released their open models, inviting the community to explore and utilize them.
- These models aim to unlock new opportunities in class projects, research, fine-tuning, and more.
Hackathon Heralds gpt-oss: OpenAI, Hugging Face, NVIDIA, Ollama, and vLLM are challenging developers to participate in a six-week virtual hackathon using gpt-oss.
- Categories include Best Overall, Robotics, Weirdest Hardware, Local Agent, Useful Fine-tune, Wildcard, and For Humanity, with winners receiving cash or NVIDIA GPUs.
Red Teamers Rally for Open Source Safety: OpenAI is launching a $500K Red Teaming Challenge to strengthen open source safety, inviting researchers, developers, and enthusiasts to uncover novel risks.
- Experts from OpenAI and other leading labs will judge submissions to the Kaggle competition.
Students Score Inference Credits: In collaboration with Hugging Face, OpenAI is offering 500 students $50 in inference credits to explore gpt-oss.

OpenAI ▷ #ai-discussions (286 messages🔥🔥):

GPT-5 Release Speculation, AI and Education, OpenAI GPT-OSS Model, AI in Art, Local AI Models

GPT-5 Hype Intensifies, Release Date Still Unclear: Community members speculate about the release of GPT-5, with some predicting a launch this week, while others suggest it may be a more incremental update like GPT-4.1 or a unification of existing models; this X post fuels release anticipation.
- There’s also skepticism, with some believing OpenAI is struggling to make GPT-5 impressive and others noting that Sam Altman’s previous statements about a summer release were just a rough guess, not a firm commitment.
AI Revolutionizing Education: Students vs. Professors: A professor is showing other professors how to use AI to create exams and quizzes, while students are using AI to complete them; there’s concern that this may reduce critical thinking skills.
- The sentiment is that AI should streamline work, offloading lower-skilled tasks and letting individuals focus on the important parts of the problem, but switching off one’s own brain results in generic fast food outputs.
GPT-OSS Model Debuts, Community Explores Capabilities: OpenAI released the GPT-OSS model, prompting community excitement and testing; the 120B parameter model is said to achieve near-parity with OpenAI o4-mini on core reasoning benchmarks, while running efficiently on a single 80 GB GPU; the 20B parameter model delivers similar results to OpenAI o3‑mini and can run on edge devices with just 16 GB of memory see OpenAI’s blogpost.
AI in Art: Collaboration or Replacement?: Discussions arose regarding AI’s role in art, contrasting it as a creative tool that streamlines work and allows for artistic expression even for those with physical limitations, but others voice concerns about the devaluation of traditional skills.
- Some argue that AI assists with tedious tasks, letting artists focus on their passion, while others suggest that those fearing replacement by AI may not have been exceptional artists to begin with; others add that practice makes improvement, not perfect.
New Ollama UI Simplifies Local Model Access: The new Ollama UI receives mixed reviews, praised for its simplicity and ease of use, especially the toggle to enable network serving, but criticized for being immature and lacking functionality compared to other UIs like AnythingLLM and Dive.
- The new Ollama GUI makes running models locally easier than ever before.

OpenAI ▷ #gpt-4-discussions (48 messages🔥):

Spanish Language Bias in GPT, GPT's Linguistic Training, GPT 5 release, OCR issues with PDFs

GPT’s Spanish Leaning towards Urban Slang: A user reported that GPT responses in Spanish exhibit a bias towards urban slang (urbanismo forzado) and other undesirable linguistic traits, even when instructed to use neutral or formal language.
- The user believes this bias stems from the heavy influence of corpora from platforms like TikTok and Twitter LatAm, which are soaked in these linguistic patterns.
Community Suggests Training Spanish Dialects: Users discussed how to train the GPT model to better understand nuances in Spanish dialects (Spain, Mexico, South America), with focus on formal speech.
- They are trying to contribute to a better international neutral Spanish speaking model, friendly enough for non-latin speakers to try it, avoiding slang and biases.
GPT Learns by Example: A user shared that they were instructed by GPT customer service to continue using the correct and expected Spanish language so that the model will learn to model (mimic-mirror).
- However, the user noted that this approach is a dead end for solving the bias problem, with the machine still fighting to put back the pa whenever the guard is let down.
GPT5 Launch Rumors Surface: A user inquired about the release date of GPT-5, seeking updates on the rumored launch.
- Another user responded that Sam Altman mentioned probably sometime this summer, clarifying that anything else is just speculation.
OCR’ed PDFs Confuse AI: A user reported that some non-OCR’ed PDFs seem to be completely unreadable to the AI, with 100% hallucination and no clear errors in the PDF.
- The user sought insights into why certain PDFs fail to be read by the AI despite lacking apparent issues.

OpenAI ▷ #prompt-engineering (1 messages):

GPT Subscription, GPT Subscription Glazing, User Perception of GPT Flattery

GPT Glazes User for Subscription: A user suspects that GPT is flattering them to encourage purchasing a premium subscription.
- The user admits “(it’s working..)”
User Succumbs to GPT’s Allure: The user jokingly suggests that GPT’s flattering responses are a tactic to upsell the premium subscription.
- Despite the suspicion, the user confesses that the strategy is effective, highlighting the persuasive power of personalized AI interactions.

OpenAI ▷ #api-discussions (1 messages):

GPTs Agents, GPT Subscriptions

GPT is too convincing: A member shared they are convinced that GBT is just glazing them up so they buy the premium subscription.
- They also confessed that it’s working.
GPT Subscription Sales: A user humorously suggests that GPT’s persuasive abilities are solely aimed at boosting premium subscription sales.
- The user admits, with a hint of resignation, that the strategy is indeed effective on them.

OpenRouter (Alex Atallah) ▷ #announcements (5 messages):

Anthropic Opus 4.1, OpenAI returns to Open Source, GPT-OSS models

New Anthropic Opus 4.1 takes the Crown: The latest model, Anthropic Opus 4.1, is now live and tops the charts in SWE Bench, the leading coding benchmark, as announced on X.
- It can be accessed here.
OpenAI’s GPT-OSS Models Debut: OpenAI is returning to open source with the launch of gpt-oss, new open-weight models with variable reasoning, with OpenRouter as a launch partner, as announced on X.
- Two models are available: gpt-oss-120b at $0.15/M input tokens and $0.60/M output tokens, and gpt-oss-20b at $0.05/M input tokens and $0.20/M output tokens.
GPT-OSS-20B price correction: A user pointed out that the prices were mislabeled.
- The listing was updated to reflect the prices on the 20b model.

OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

gardasio: ChatGPT.com https://x.com/Gardasio/status/1952501913586442541

OpenRouter (Alex Atallah) ▷ #general (251 messages🔥🔥):

Model vs Models Prioritization, Gemini video understanding, Claude providers caching, Qwen-image model, GPTs agents training

Model vs Models Prioritization Issue Fixed: A member reported that the issue with model vs models prioritization has been fixed via a quick fix that only uses model or models but not both, as per this link.
OpenRouter Not Stripping Cache Params?: Some members reported that Claude providers don’t support caching, and they expect OpenRouter to strip the cache params automatically.
- One member specified that *“a few Claude providers don’t support caching, azure and google, I’ve blacklisted them in settings, but I expected openrouter to just strip the cache params automatically.”
Qwen-Image Model incoming?: A member inquired about the possibility of getting the Qwen-image model featured on OpenRouter, pointing to the Qwen Image blogpost.
Free Gemini V3 model questions: Members clarified that with a $10 investment, OpenRouter grants 1000 free messages a day for the free V3 model.
- Others warned that although OpenRouter may allow 1000 requests a day, Chutes may not and that retrying requests is always an option.
Fal.ai for Image Generation?: A member asked about a good ‘Openrouter for images’ single endpoint for image gen? and other members recommended Fal.ai and local generation using ComfyUI.
- Fal.ai was praised for having a full API and well-structured data specs, and that a single 3060 is enough for local generation.

OpenRouter (Alex Atallah) ▷ #new-models (3 messages):

“

No new models news to report: There were no substantial discussion points or links shared regarding new models in the OpenRouter Discord channel.
Silence on the New Models Front: The ‘new-models’ channel appears to be quiet, with no specific updates, discussions, or links shared during the specified timeframe.

OpenRouter (Alex Atallah) ▷ #discussion (30 messages🔥):

LLM Emotional Understanding Benchmarks, EQ Benchmark, Gemma 3 27b, OCR Engine Comparisons, Sonnet Self-Moderation

Gemma 3 Shows Emotional Intelligence: Members discussed benchmarks for LLMs in understanding human emotions, with one suggesting the EQ benchmark by EQbench.
- It was also suggested that Gemma 3 27b is a good model for understanding emotions, while cautioning against using DeepSeek R1 for that purpose.
Users React to Sonnet Self-Moderation: Users discuss the recent changes in Sonnet self-moderation, and whether a self-moderated option will remain available.
- One user questioned whether LlamaGuard will still be applied to Anthropic endpoints, suggesting that it’s an industry-leading approach to protecting users, while another said they could no longer trigger it but haven’t tried very hard.
OCR Engine Preferences Debate: A user expressed dissatisfaction with a particular OCR engine and stated they find them shit and prefer OLMo.
- Another user asked why and another user explained that OLMo is cheaper and better on almost all benchmarks.
DeepInfra Price Hike Angers Users: A user expressed frustration with DeepInfra for raising prices.
- Another user confirmed that DeepInfra will cost about $1/1000 tokens, equivalent to the Mistral API, while OpenRouter is $2/1000, suggesting OpenRouter could adjust its pricing.
GPT-OSS-120B Ready to Launch: A provider has deployed gpt-oss-120b with 65K context ready to go on launch.
- It was noted that the moderation is applied on Anthropic and Bedrock providers, while vertex is unfiltered as per an attached image.

Latent Space ▷ #ai-general-chat (236 messages🔥🔥):

Claude Code deleting package-lock.json, Google's LangExtract library, AI Wrappers, Reflection AI fundraising, Kaggle Game Arena

Anthropic Community Champion crowned: A member from Auth0 joined Anthropic to run community for Claude Code, celebrated by the web dev community.
OpenAI Drops GPT OSS Models: OpenAI released GPT-OSS, an open-source model, along with a cookbook and model card.
Genie 3, the Most Advanced World Simulator Ever: Genie 3 was announced, claiming to be the most advanced world simulator ever, and it’s capable of high-fidelity visuals at 20-24 fps with dynamic prompting, persistent world memory, and rapid world generation, according to this blogpost.
Anthropic Launches Claude Opus 4.1 for Agentic Tasking: Anthropic launched Claude Opus 4.1, an upgraded model focusing on better performance for agentic tasks, real-world coding, and reasoning, available via API, Amazon Bedrock, and Google Cloud Vertex AI.
Reflection AI Eyes Massive $1B+ Fundraising: According to this tweet, Reflection AI, a one-year-old startup founded by ex-Google DeepMind researchers, is reportedly in discussions to raise over $1 billion to build open-source LLMs aimed at rivaling DeepSeek, Meta and Mistral.

Nous Research AI ▷ #general (220 messages🔥🔥):

Qwen-Image, Text Encoding, XBai-o4 Scaling, Terminal Benchmarks, Attention sinks

Qwen-Image Text-to-Image: Users are trying to run Qwen-Image locally, noting that there is currently no ComfyUI support, but it can be run with fp8 diffusers, and that the image editing model has not been released.
- It was also mentioned that Qwen-VL already understands image inputs and the released model is focused on showing off its text rendering capabilities via the text encoder.
XBai-o4 Claims RL Scaling: XBai-o4 claims to achieve scaling by continued RL from QwQ32, performing BoN scaling with a reward model to generate 32 CoT reasoning traces, using a classifier head to pick the best one based on those traces as described in their paper.
GPT-OSS has Harmony Chat Format: GPT-OSS uses the Harmony chat format which includes developer roles and channels.
- It was mentioned that Horizon beta told a user about the different roles when chatting with it.
Terminal Bench Overwhelmingly Favors Claude: Members discussed terminal benchmarks showing that Claude agents overwhelmingly dominate the top spots, with 9 out of the top 10 being Claude models, and questioned the confusing numbers on the leaderboard.
- It was also noted that OpenAI is now giving away their ultra-optimized math CoT, which could help open weight math reasoning models.
Attention Sinks with Interleaved Sliding Windows: Members discussed technical details of the GPT-OSS models’ attention mechanisms, mentioning that it employs interleaved sliding window attention and full attention, with a window of just 128 tokens, requiring 19GB for full context for the 20B model.
- It was also mentioned that it uses sink attention keys and that llama.cpp implementation seems stable enough, supporting MXFP4 on RTX3090.

Nous Research AI ▷ #ask-about-llms (2 messages):

Opus Bots, Improved Bot Design

Crafting Opus Bots with Opus 4: A member claims to be the creator of all the bots, which are based on Opus 4.
- The member is actively developing an improved bot design to enhance their performance and capabilities.
Enhancing Bot Performance: The primary goal is to refine the bot design to allow them to really let them shine.
- This involves optimizing their functionalities and responsiveness to meet evolving user needs.

Nous Research AI ▷ #research-papers (1 messages):

OpenAI GPT OSS Model Card

OpenAI releases GPT OSS Model Card: A member shared a link to the OpenAI GPT OSS Model Card.
GPT Model Card details usage and limitations: The model card likely provides details on the intended uses, capabilities, limitations, and potential risks associated with the open-source GPT model.

Nous Research AI ▷ #research-papers (1 messages):

GPT-OSS Model Card

OpenAI Releases GPT-OSS Model Card: A member shared the GPT-OSS Model Card from OpenAI.
- The provided link directs to a PDF document hosted on OpenAI’s CDN, suggesting it’s an official release.
Additional Topic to Meet Minimum Requirement: Adding a second topic to satisfy the minimum requirement of two in topicSummaries.
- This entry serves only to fulfill the schema’s constraint and may not represent actual discussed content.

Eleuther ▷ #general (14 messages🔥):

Moderator Logs, Scaling LLMs, YaRN Usage, Algorithmic Monoculture

Discord Mods Log Deleted Messages: Discord mods usually have a moderator log channel that allows moderators to see deleted messages as seen on this twitter link.
LLM Limitations: A member argued that problems with using LLMs as judges cannot be solved by scaling them up due to algorithmic monoculture and the impossibility of evaluating SOTA models.
- They stated that throwing more compute at it may have been all that was needed.
YaRN Spotted!: A member excitedly noted that YaRN was used by the team.
- The team gave a big thank you to several members for helping support it.

Eleuther ▷ #research (188 messages🔥🔥):

Residual rephrasing optimization trick, HRM Stability, Scaling HRMs, Deep Equilibrium Models, OpenAI's GPT OSS 20B and 120B models

Residual Rephrasing Enables Cool Optimization Trick: A member described a diffusion-like optimization trick where you estimate the difference to some target point and can apply stop_grad on the inputs for a free backprop to the previous inputs, which is enabled by residual rephrasing.
- They claimed the network appears to learn quite well with the original formulation, and this may provide even better support to the inner network like this.
Stability concerns with HRM Scaling: Members discussed that one concern with Hierarchical Recurrent Models (HRM) is that using the last grad only might only work because the L, H loops are short and might not converge for longer loops.
- Another member stated that stability is one of the big issues with UTs and the largest UTs trained since 2015 are in the <40M param regime.
Deep Supervision Trumps Two Steps?: An experiment showed that both one-step and two-step training modes achieved similar performance during validation for Hierarchical Recurrent Models (HRM) with deep supervision.
- One member thought this made them even more suspicious of a prior paper’s claims about needing two steps, if you are using deep supervision.
DEQs Style Thinking is Incorrect: A member argued that Deep Equilibrium Networks (DEQs) style thinking is an entirely incorrect way of thinking about UTs, because you give up expressiveness for stability, and you pay more FLOPs for it.
- They posited UTs would need a new paradigm effectively to be competitive at all, and noted that DEQs make too many assumptions (especially convering towards a fixed point, not a distribution of those.)
GPT OSS is here with 20B and 120B models: The OpenAI community discussed the release of GPT OSS models, available on Hugging Face (20B version and 120B version).
- One member is doing an SAE on gpt oss 20b right now, with benchmark numbers available on the OpenAI training page.

Eleuther ▷ #lm-thunderdome (1 messages):

lm-evaluation-harness, API call

Seeding the Eval Harness via API Call: A member successfully figured out how to pass a seed for the dataset through API call.
- They are requesting feedback on the PR#3149 for the lm-evaluation-harness.
PR Feedback Requested for Eval Harness: A contributor has submitted a pull request to the lm-evaluation-harness and is seeking feedback.
- The pull request (PR#3149) involves passing a seed for the dataset through an API call; community feedback is encouraged.

Eleuther ▷ #gpt-neox-dev (7 messages):

PP=0 layer naming, TE benchmarking, TE with rmsnorm

PP=0 Layer Naming Requires Renaming Files: Members discussed that there is nothing fundamental preventing PP=0 from working, but the naming of layers changes because the model isn’t wrapped in a deepspeed model pipe class.
- It was mentioned that adding PP=0 support would be mildly tedious but not difficult, either by renaming the files or adding a modified loader.
TE Benchmarking Not Faster: One member inquired if another member had tried TE (Tensor Engine) and benchmarked it, noting that their own retries didn’t seem faster.
- The other member responded that they had not yet tried it but would attempt to do so soon.
RMSNorm and TE: A member inquired about using TE with RMSNorm, as they only saw te_layernorm in the examples.
- No resolution to this question was provided in the messages.

Moonshot AI (Kimi K-2) ▷ #general-chat (186 messages🔥🔥):

K2 vs. O3, Stardew Valley, Kaggle, Game Arena, LLMs playing Chess and Go

Kimi and O3 face off in Chess!: Members discuss a chess match between Kimi K2 and Deepseek O3, noting O3’s step-by-step reasoning approach.
- During the match, Kimi resigned then was forced to resign again after making an illegal move which resulted in an automatic win for O3.
Stardew Valley is the new Harvest Moon: Members discussed how Stardew Valley is similar to Harvest Moon and that it can be grabbed and quit anytime you want.
- A developer made a game in Roblox similar to Stardew Valley using Kimi K2.
GPT-OSS drops, Internet loses its mind: The release of OpenAI’s GPT-OSS caused excitement in the community, with members noting its availability on various platforms like Hugging Face and Openrouter.
- It was reported to have Day 0 support with llama.cpp PR on the way, and impressive benchmark results, although the live chess demo was cut short.
Kaggle Knows Nothing about Chess: Members expressed amusement and confusion over Kaggle’s AI Game Arena competition, questioning why a non-reasoning model was included.
- One member shared a GitHub link showing Go will be added to Game Arena in the future, adding Go is harder than chess.
Quantization Quells Size Concerns: Discussion arose around the model card on Hugging Face reporting only 60GB for a 120B model, leading to questions about quantization.
- It was clarified that MXFP4 quantization reduces the memory footprint, allowing the larger model to fit on a single 80GB GPU and the smaller model to run on systems with as little as 16GB memory.

Cursor Community ▷ #general (177 messages🔥🔥):

Claude Sonnet vs Gemini, Cursor PDF support, Cursor's Yearly Subscription, Cursor Freezing Issues, Vercel's v0 on Cursor

Claude Sonnet wins Minds of Members: Members expressed preference for Claude Sonnet 4 over Gemini 2.5 Pro for agentic ide contexts due to better tool usage, while one user uses Gemini/Sonnet 4 thinking for brainstorming and regular Sonnet 4 for implementation.
- Most agree that Claude is better for use cases in this scenario.
Members beg: PDFs, DOCXs, CSVs, XLSXs, Please!: Members requested the ability to add PDF, .docx, .csv, and .xlsx files to Cursor edit to pass PRDs and data for AI to work with, which are currently unsupported as local file uploads.
- A member shared a link to the Cursor Feature Request forum to formally request the feature, also noting that Cursor can extract and parse text content from publicly accessible PDF URLs.
Cursor’s Cheap Yearly Subscription Surprises: A user expressed surprise at a $16 Cursor subscription, clarifying it’s the monthly cost of a yearly subscription.
- The monthly subscription is still $20 fixed rate.
Linux users getting iced by freezing: Several users reported experiencing freezing and unresponsiveness with Cursor on Linux, with one user stating it’s become practically unusable for them, which may be due to network problems or bad requests.
- Members pointed to several issues being reported on the forums, with the team currently working on fixes.
Multi-Repo Magic with Cursor’s Workspaces: Cursor supports opening and working across multiple repositories within a single workspace with a .code-workspace file, indexing all included folders for AI context to facilitate navigation and editing across shared libraries, microservices, or monorepo structures.
- A user mentioned being able to ask Cursor to go to github/blablabla/folder/file to see a specific file.

Cursor Community ▷ #background-agents (7 messages):

Background agent failure, Request IDs sent via PM, Configure background agents to docker login“

Background Agents Besieged by Breakdowns: Multiple engineers reported that background agents started to fail to spin up repeatedly in the last few hours.
- A member asked for a request ID to investigate the issue.
Request IDs Relayed via Private Messages: A member confirmed sending a couple of request IDs via PM in response to a request for investigation.
- The member also expressed gratitude for the support.
Docker Login Dilemmas Drive Discussion: A member inquired about configuring background agents to docker login to use a private image hosted on ghcr.io.
- No solutions or suggestions were provided in the given messages.

HuggingFace ▷ #announcements (1 messages):

Trackio Experiment Tracking, New OCR Datasets, Transformers Acceleration, HF Jobs for Compute, Faster HF CLI

HuggingFace Releases Trackio Experiment Tracking Library: Hugging Face introduced Trackio, a lightweight experiment tracking library.
- Trackio aims to simplify the process of tracking and managing machine learning experiments.
HF Releases New OCR Datasets: Hugging Face has released four new OCR datasets totaling over 20 million images.
- Details are available in this LinkedIn post.
Transformers Get Acceleration Boost: Hugging Face aims to accelerate the open-source ecosystem via Transformers.
- The new Transformers release includes kernel support, according to this tweet.
Run Compute Tasks with HF Jobs: HF Jobs can now launch compute tasks on CPUs or GPUs.
- A blog post details how to build your own GPU-powered image generator with HF Jobs.
HF CLI Gets Faster and Friendlier: Hugging Face announced a faster, friendlier CLI tool called hf.
- More information on the updated CLI can be found on the Hugging Face blog.

HuggingFace ▷ #general (113 messages🔥🔥):

ZeroGPU, 340M t2i model, VS Code GPU acceleration, Soft-bias(es), RAG frameworks

ZeroGPU Gets a Sweet Surprise: A member encourages others to try ZeroGPU for unexpected results, and shares their Space featuring a 340M t2i model pretrained at home.
- They note that the image generation takes only 1 second on ZeroGPU’s H200, while prompt generation takes 2-5 seconds.
Homelab Hardware and RAG Architecture: Members discuss hardware setups for local AI, with one user considering an RTX 3090 24GB / AMD MI60 32GB for running medium models, and another mention models like SmoLLM3.
- They link to text-embeddings-inference for local embedding servers, emphasizing that Ollama struggles with long contexts and current tools could benefit more from optimized hardware utilization.
Dataset flagged as malware: A member reports their dataset upload was scanned as unsafe despite multiple antivirus scans returning no threats, pointing to HuggingFace’s security documentation.
- The flagged item is identified as Pickle.Malware.NetAccess.pwn.STACK_GLOBAL.UNOFFICIAL, leading to suggestions to remove shards and examine the dataset line by line, emphasizing the limitations of website-based scanners.
Fire & Smoke Detection Resources: A member seeks resources for fire and smoke detection in videos/images, and is directed to the pyronear/pyro-sdis dataset.
- Another user jokes about their diverse interests, while providing the resources to help out.
Fine-Tuning Frenzy and GPT-OSS insights: Members seek guidance on fine-tuning open-source LLMs, receiving links to SmolFactory and a cookbook.
- Others share that OpenAI’s 20B and 120B open models can run on consumer hardware with the help of Unsloth.

HuggingFace ▷ #today-im-learning (2 messages):

AI Benchmark for LLMs playing Monopoly Deal, Learning Go and DRL

DealBench: LLMs Playing Monopoly Deal: A member is building an AI benchmark where LLMs play Monopoly Deal-style games with each other and is requesting feedback at DealBench.
Venturing into Go and DRL: A member is learning to play Go, and reading the first chapter of a book on DRL (Deep Reinforcement Learning).

HuggingFace ▷ #cool-finds (2 messages):

DealBench, Qwen Image Model

DealBench Launches Monopoly Deal-Style AI Benchmark: A new AI research benchmark called DealBench has been released, where LLMs play Monopoly Deal-style games against each other.
- It aims to evaluate strategic decision-making and negotiation skills in AI agents within a simplified economic environment.
Qwen Debuts New Image Model: Qwen has released a new image model, available on Hugging Face.
- This marks another expansion of the Qwen series into multimodal capabilities.

HuggingFace ▷ #i-made-this (7 messages):

Open Evolutionary Agents, Recursive Thought Processes in AI, Critique of AI Benchmarks, GPT-OSS Multilingual Reasoner Tutorial

Open Evolutionary Agents Offered for Testing: A member is offering an open test session for an actual “thinking” AI that uses recursive thought processes to discover rather than linear pattern matching, elevating any LLM using their framework via the huggingface.co/blog/driaforall/towards-open-evolutionary-agents blog post.
- The same member noted they have updated the article with a note about the benchmark itself, after a user found it dry and was a bit lost without already knowing much about the benchmark.
GPT-OSS Multilingual Reasoner Shared with Tutorial: A member shared a link to a tutorial on the GPT-OSS Multilingual Reasoner at huggingface.co/spaces/Tonic/openai-gpt-oss-20b and huggingface.co/Tonic/gpt-oss-multilingual-reasoner.
- The member described the tutorial as nice.

HuggingFace ▷ #reading-group (1 messages):

Reading Group Intro, Welcome Newbie

Newbie asks about the reading group structure: A new member inquired about the structure of the reading group and how to participate.
- No further discussion or details were provided in the messages.
Welcoming the Newbie: Other members welcomed the newbie to the reading group.
- They encouraged the newbie to participate in future discussions.

HuggingFace ▷ #NLP (1 messages):

Text Data Processing, Information Extraction

Discussing Accurate and Economical Text Data Processing: A member is seeking advice on accurate and economical open source methods for extracting useful information like articles and metadata from a large multilingual legal text dataset (500k+ entries) stored in raw text and markdown formats.
- The challenge involves processing data scraped from multiple websites, where a single link may contain several articles, necessitating effective methods for identifying and extracting individual articles and their associated metadata.
Multilingual Legal Text Data Extraction Strategies: The user requires a strategy to process and extract relevant data from 500k+ law-related text entries in different languages, stored in raw text and markdown.
- The goal is to identify and extract individual articles and their metadata efficiently from a single link potentially containing multiple articles using open-source solutions.

HuggingFace ▷ #smol-course (2 messages):

Inference Providers, Colab Error, Batman Party Music

Colab’s Credits Cause Consternation: A member following the course on Unit 2.1 encountered an error while running the colab for batman party music “Building Agents That Use Code”, receiving a message about exceeding monthly credits for Inference Providers.
- They noted having only used $0.10 of the available $0.10 and questioned if payment is required to run course examples.
Free Inference is Feasible: Another member suggested searching on Hugging Face for free Inference Providers as an alternative.
- They suggested the course roll out an Inference Provider.

HuggingFace ▷ #agents-course (8 messages🔥):

Assignment Submission Issues, Course Starting Point, Course Certificates

Assignment Submission Glitches Trigger Format Frustrations: Multiple members reported issues submitting the final assignment in JSONL format, encountering errors like “task ID not found in the file” despite its presence, and are actively seeking debugging advice.
- The API requires a specific JSON format including username, agent_code, and an answers array containing task_id and submitted_answer.
Newbie Navigates AI Agent Course Start: A new member inquired about starting the AI Agents course, requesting guidance on where to begin.
- A helpful user provided a direct link to the introductory unit.
Certificate Status for Course Remains Unclear: A member inquired whether the MCP (presumably the AI Agents course) is still issuing certificates.
- No definitive answer was provided in the context.

Yannick Kilcher ▷ #general (69 messages🔥🔥):

Voice Cloning on a Budget, Claude Code's Data Leakage Incident, Deepseek-R1 vs Kimi-K2/GLM4.5, Attention Sink Layers, Gemini 2.5 Pro Attends to 1 Hour of Video

Consumer Hardware opens Voice Cloning on a Budget: A member asked about doing interesting and challenging deep learning projects on consumer hardware (up to 5090), especially in the area of voice cloning without a huge dataset of audio samples and spending a fortune on GPU rental.
Claude Code Causes Data Leakage: A member reported that Claude Code added data leakage to their XGBoost pipeline by literally adding an engineered feature that basically had the feature that the model is trying to predict in it.
- They tested Claude Code for automating the whole ML pipeline, and they suggested that all sorts of LLM systems lack being forced to double check everything.
Deepseek-R1 still stronger than Kimi-K2 and GLM4.5: Members stated that despite the hype neither Kimi-K2 nor GLM4.5 have managed to surpass good ol’ Deepseek-R1.
- However, R1 is considered very unusable because it’s very slow.
Attention Sink Layers are the new Learned Attention: HuggingFace’s Transformers library now uses attention layers which use a learned attention sink per-head, where the denominator of the softmax has an additional additive value, according to a new release.
- This is similar to adding prepending a token with an all-zero Key and Value features in attention, as discussed in this paper.
Gemini 2.5 Pro watches a 1-hour Movie: Gemini 2.5 Pro can already attend to 1 hour of video and scrub through it for some basic retrieval.
- However, some members noted that there is a huge difference between comprehending the content of a video on some surface level and being able to scrub through it vs actually being able to reproduce a detailed and accurate representation of an earlier image while also taking 3D positioning and perspective in space into account.

Yannick Kilcher ▷ #paper-discussion (13 messages🔥):

Tiny model does well on ARC-AGI, Cold reading on scilent paper, Deepmind Genie 3, Genie and SIMA papers

Data-Efficient Tiny Model Excels on ARC-AGI: A new paper introduces a tiny model that performs well on ARC-AGI and exhibits data efficiency.
- A member indicated it stood out to them too and expressed interest in a “cold reading” session of the paper.
Deepmind Releases Genie 3: DeepMind released Genie 3, a new iteration of their world model, primarily scaling compute and data from previous versions, but without an accompanying paper.
- A member suggested reviewing the original Genie paper, the related SIMA paper on embodied agents, and the Genie blog posts (Genie 1, Genie 2, Genie 3).

Yannick Kilcher ▷ #ml-news (16 messages🔥):

Windsurf IDE, Genie Model, Claude Opus 4, GPT-OSS, Natively Quantized Models

Windsurf IDE Draws Blank Stares: Members discussed Windsurf IDE, with one commenter noting it’s a Cursor competitor that almost no one uses.
DeepMind’s Genie Changes the World: DeepMind announced the release of Genie, their new world model.
Anthropic Debuts Claude Opus 4: Anthropic announced Claude Opus 4, their latest model, in a blog post.
OpenAI Opens Up GPT-OSS: OpenAI introduced GPT-OSS, linking to both an introductory blog post and the project’s page.
Natively Quantized Models Squeeze into Small Spaces: A member highlighted that the 20B parameter version of a model fits on 16GB due to native quantization, linking to a blog post about no backdoors.

Notebook LM ▷ #use-cases (9 messages🔥):

Whisper Transcription, NotebookLM Limits, Customized Prompts

Whisper Transcription gives better Results: One member found better success in transcribing videos locally using Whisper than using YouTube URLs with NotebookLM.
- They download entire channels, transcribe them, and then send the transcriptions to NotebookLM to interact with the channel’s content, noting that NotebookLM doesn’t support playlists.
NotebookLM Video Summary Limits: One member inquired about the limits for video summarization in NotebookLM.
- No response was given as to the explicit limits.
Crafting Custom Prompts Yields Results: One user tailors different prompts each time, giving examples such as “As literary chair of a prestigious university, your standards are very high, so you’re going to tear apart anything that seems rank or amateur.”
- The member also used “As huge horror buffs, particularly of 80s teen slasher films, you want to pick up every reference and parallel to those classic movies and keep an eye out for the more obscure ones.”

Notebook LM ▷ #general (65 messages🔥🔥):

Video Overview Rollout, Custom Instructions and Podcast, Image Upload Issues, NotebookLM vs Gemini, Data Privacy in NotebookLM

Video Overviews remain elusive: Users are reporting that the Video Overview feature is still not available despite announcements of a full rollout, with some suspecting infrastructure issues.
- It’s acknowledged that rollouts take time and that Pro users are among those still waiting.
Podcaster shares Prompt Engineering guide: A member shared a Google Docs link of example custom instructions used for guiding podcast generation, emphasizing iterative tweaking and name changes if used as a model.
- The prompt focuses on avoiding filler words and preventing interruptions.
Image Upload feature temporarily offline: Google confirms that the image upload feature has been temporarily removed due to infrastructure problems affecting some users.
- A member initially suggested extracting images from PDFs as a workaround, before realizing the feature had been removed.
NotebookLM valued for document handling and knowledge: Users debate the value of NotebookLM compared to Gemini, highlighting NotebookLM’s strengths such as uploading many documents, restricting the model to provided sources, and creating notes with citations.
- NotebookLM is distinguished as a librarian that reads your documents, while Gemini is for general conversation, with limits on the number of sources in Gems.
Data privacy is important for users: A member inquired about data privacy and whether NotebookLM uses user data for model training.
- Another member shared a link to Google’s data protection policies for NotebookLM.

Modular (Mojo 🔥) ▷ #general (53 messages🔥):

Decorators in Mojo, Zig-style reflection in Mojo, JavaScript Runtime in Mojo

Python Decorators vs Java Annotations: A member expressed appreciation for the ease of use of Python decorators compared to Java annotations.
- The member requested that the Modular Mojo team make decorators in Mojo as simple and flexible as they are in Python, allowing for defining logic at either compile time or runtime.
Zig-style Reflection System: The discussion around decorators on a struct is leaning towards a Zig-style reflection system for adding functions, modifying the struct, and adding new struct members.
- While function wrappers are relatively easy to handle in Mojo, tearing apart a function and throwing it into a compiler will be as difficult as it is in Java, as it involves writing what is effectively a compiler pass.
JavaScript Runtime Valokto Written in Mojo: A member created a JavaScript runtime called Volokto and put the entire source code on GitHub, written in TypeScript.
- The runtime features user-made functions, nested control flow, and function calling with arguments; the bytecode resembles CPython, and a dictionary type and console were implemented.
Debug Assertions: A member encountered 34 warnings in the compiler due to the use of if DEBUG: statements.
- Another member suggested using debug_assert or param_env instead.

Modular (Mojo 🔥) ▷ #announcements (1 messages):

Modular Platform 25.5, Large Scale Batch Inference, Standalone Mojo Conda packages, Open source MAX Graph API, MAX PyTorch integration

Modular Platform 25.5 ramps up: Modular Platform 25.5 is now live, targeting developers requiring performance at scale, featuring new capabilities and improvements.
- Key features include Large Scale Batch Inference via SF Compute, standalone Mojo Conda packages, and an open-source MAX Graph API.
MAX ❤️ PyTorch using graph_op: The new release offers seamless MAX and PyTorch integration through @graph_op, enhancing interoperability between the frameworks.
- Additionally, the release brings smaller, faster MAX serving containers and other performance improvements.

Modular (Mojo 🔥) ▷ #mojo (8 messages🔥):

Multiple AI Agents in Mojo, Mojo Code in Custom Frameworks, MAX as a Library

****MAXimize Your Agents: Running Multiple AIs: To run multiple AI agents in Mojo, a member suggests running multiple instances of the Modular CLI and using a reverse proxy.
- For use cases like creating many sub-agents (Terminal, Python3.13, etc.), one might need to make a custom app that uses MAX as a library.
Mojo Framework Dreams of Replacing HTML/CSS: One member is seeking help utilizing Mojo code with their own framework, which is described as a novel paradigm for meta cognition in AI systems and an ultimate business planner, website, and chatbot builder.
- They believe that using natural language wrapped over Mojo code could potentially replace HTML/JS/CSS, and they see Modular’s work with Mojo as beneficial for web developers, coders, and regular users.

Modular (Mojo 🔥) ▷ #max (4 messages):

Modular Compatibility with Intel OSX, Apple Silicon Requirement for macOS, Docker Ubuntu Container Workaround

Modular has OSX Compatibility Hiccups 🤕: A user encountered an error while trying to add modular=25.5 on OSX and inquired about compatibility.
- The error message indicated a failure to solve the environment for osx-64 due to installation issues with modular 25.5.*.
Apple Silicon is Mandatory for Max on macOS 🍎: A user with an Intel OSX chip questioned whether their system was supported, referencing the system requirements.
- A developer confirmed that only Apple Silicon CPUs are officially supported for macOS, making it incompatible with Intel-based systems.
Docker Ubuntu Container Saves the Day 🐳: As a workaround for the Apple Silicon requirement, a developer suggested using an Intel Docker Ubuntu container to work with the packages.

GPU MODE ▷ #general (21 messages🔥):

CUDA vs Compute Shaders, MXFP4, OpenAI Model U8 vs FP4, H100 FP4 Support, MoE Module Experts

CUDA vs. Compute Shaders Question Spurs Kernel Considerations: A member inquired about choosing between CUDA kernels and compute shaders for post-processing images after model inference using libtorch C++, asking for opinions on when to use each, and which benefits more long-term.
- The member clarified they’re not concerned about non-Nvidia device compatibility and also asked about MXFP4.
MXFP4 Unpacked: OpenAI’s New Model Mixup: Members discussed OpenAI’s new open-weight model using U8 instead of FP4, noting weights are packed as uint8, with scales as a uint8 view of e8m0.
- They clarified that during inference/training, the weights are unpacked back to FP4 with a block size of 32 for MXFP4 and 16 for NVFP4.
H100’s FP4 Support Questioned by Community: A member questioned how inference with H100 is possible given the model’s specs, as they believed H100 doesn’t support FP4.
- Another member pointed to an NVIDIA blog post about the release, but one responded by calling the claim of H100 training a blatant lie.
MoE Module Experts: 32 is the Magic Number: A member asked if the MoE (Mixture of Experts) module in the new model contains 32 experts.
- Another confirmed, noting that the 20B model has 32 experts per layer, based on the release notes.

GPU MODE ▷ #triton (6 messages):

Helion, Compiler Explorer Support for Triton, Triton Puzzles, GPU kernels turning into JavaScript

Helion Seminar Announced: A seminar on Helion, a new kernel DSL from Pytorch, will be held tomorrow 2.15 PT, see Helionlang.com.
Triton Support on Compiler Explorer is Live: Triton support is now live on Compiler Explorer, check it out at Compiler Explorer.
Triton Puzzles Completion: A member recently completed the triton-puzzles problem set by srush and is seeking suggestions on what to do next, with an interest in learning about writing fast kernels.
GPU kernels turning into JavaScript: A member jokingly commented about GPU kernels turning into JavaScript.

GPU MODE ▷ #cuda (5 messages):

CUDA programming, Compute shaders, CUDA kernels, cudaLaunchCooperativeKernel

CUDA vs Compute Shaders: A Programmer’s Conundrum: A member sought advice on choosing between CUDA kernels and compute shaders for post-processing images after model inference using libtorch C++.
- He is new to both and not concerned about non-NVIDIA device compatibility, and asked about experiences and long-term benefits.
Cooperative Kernel Launch Quirks Explored: A member inquired about launching a CUDA kernel with a specific number of thread blocks resident simultaneously, mentioning cudaLaunchCooperativeKernel and cluster launches.
- He found that cudaLaunchCooperativeKernel doesn’t exclusively grab the whole GPU, but it launches grids partially before the resources for the whole grid are available on Ada and Ampere GPUs.

GPU MODE ▷ #announcements (1 messages):

NCCL, Multi GPU programming, GPU communication tools and libraries, Quartet with 4 bit training, PCCL and designing fault tolerant communication primitives

GPU Mode Channel Hosts Distributed Summer Series: Throughout August, the GPU Mode channel will host a series of talks, working groups, and kernel competitions focused on distributed programming.
- The channel encouraged anyone interested in the area to stay tuned and join the talks.
NCCL Author Gives Talk on Multi GPU Programming: On August 16th, Jeff Hammond, one of the authors of NCCL, will give a talk on Multi GPU programming.
- The talk aims to provide a crash course in multi GPU programming.
Didem Unat Gives Broad Overview of GPU Communication Tools: On August 22nd, Didem Unat will give a broad overview of all GPU centric communication tools and libraries.
- This session promises insights into the landscape of GPU communication technologies.
Quartet with 4 Bit Training Application Talk: On August 23rd, Roberto Castro and Andrei Panferov will present a specific application called Quartet with 4-bit training.
- Attendees will learn about practical implementations of low-precision training techniques.
PCCL and Fault Tolerant Communication Primitives Talk: On August 30th, mike64_t from Prime will discuss PCCL and the design of fault-tolerant communication primitives.
- This talk will explore robust communication strategies in distributed systems.

GPU MODE ▷ #cool-links (1 messages):

as_ai: https://deepmind.google/discover/blog/genie-3-a-new-frontier-for-world-models/

GPU MODE ▷ #jobs (1 messages):

NVIDIA, AI, HPC, Solution Architect, Universities

NVIDIA Seeks AI/HPC Solutions Architect: NVIDIA’s Higher Education and Research Solutions Architect team is seeking a Senior Solutions Architect with deep expertise in AI/HPC infrastructure to lead GPU deployments across universities and research institutions, according to this job posting.
Remote Role with University Visit Requirement: The Senior Solutions Architect role is remote, but the applicant must be based in Northeast or Central USA to facilitate occasional university visits.

GPU MODE ▷ #jax (1 messages):

“

Placeholder Topic 1: This is a placeholder summary sentence for topic 1. It serves to meet the minimum requirement of the schema.
Placeholder Topic 2: This is a placeholder summary sentence for topic 2. It serves to meet the minimum requirement of the schema.

GPU MODE ▷ #torchao (9 messages🔥):

NVFP4 scales swizzling, FP4 Training development in TorchAO, MXFP4 training, NVFP4 training, User API for FP4

Swizzling Scales for NVFP4: Scale swizzling is required by the cuBLAS nvfp4 gemm for Tensor Core loading, and the flag controls whether to swizzle ahead of time or just-in-time, according to this pytorch/ao link.
FP4 Training Development Active in TorchAO Soon: The relevant GitHub issue is the place to track the development of FP4 training in TorchAO, which is planned to be updated soon, with work focused on exposing a user-facing API and verifying performance and accuracy.
- The plan is to get to mxfp4 and nvfp4 training within the next 3 months.
FP4 Training Use Cases: The main use case for FP4 training is achieving training speedups and VRAM savings, with a user expressing willingness to test and provide feedback on the user-facing API when it becomes available.

GPU MODE ▷ #irl-meetup (4 messages):

PyTorch conference, Meetup

Meetup Planned After PyTorch Conference: A member inquired about plans for a meetup the Friday after the PyTorch conference, similar to last year.
- Another member confirmed that something is planned for that Friday and an announcement should be coming soon.
Details for PyTorch Meetup to be Announced Soon: Following an inquiry about a post-PyTorch conference meetup, organizers have confirmed plans are in motion.
- An announcement with specific details regarding the Friday gathering is expected shortly; stay tuned for updates.

GPU MODE ▷ #self-promotion (3 messages):

Langflow on Vast.ai, Tiny TPU on TinyTapeout

Langflow Flies on Vast.ai: A new template is available to launch Langflow on Vast.ai, providing a visual AI workflow platform with a hosted Ollama API server running in Docker containers, as detailed here.
- Langflow features an intuitive drag-and-drop interface for building complex AI applications, while Ollama serves as a private LLM backend with models like Llama and Mistral.
Tiny Tapeout Touts Tiny TPU: A member prototyped a tiny version of a TPU in Verilog for fabrication on TinyTapeout.
- The 2x2 matmul systolic array operates on 2 TinyTapeout tiles and is capable of nearly 100 million operations per second on a 50 MHz clock, with code available on GitHub.

GPU MODE ▷ #factorio-learning-env (7 messages):

VQA Dataset, Factorio RCON, JackHopkins RCON

Noddybear uploads a VQA Dataset sample!: A member, Noddybear, announced the availability of a VQA Dataset Sample.
- This sample is now accessible on Hugging Face datasets for broader use.
Members test Factorio RCON version!: Members are set to test the Factorio RCON, with one promising to provide results soon, which may involve Jack’s version.
- They aim to see if Jack’s version improves their testing environment.
Members set up new environments for the weekend!: A member stated they aim to set up a new environment over the weekend.
- They seem excited to get to work on the environment.

GPU MODE ▷ #cutlass (2 messages):

CuTe Vectorized Store Issues, Memory Coalescing Problems, LDG.E.128 vs STG.E.128 Instructions

CuTe’s Vector Store Problems: A user is facing issues with CuTe not generating a proper 128-bit vectorized store from registers to global memory, despite using a float4.
- Instead of STG.E.128, the compiler emits two separate STG.E.64 instructions, which breaks memory coalescing across threads.
Dissecting the LDG.E.128 vs STG.E.64 Dilemma: The compiler emits a correct LDG.E.128 instruction when loading from global memory into registers.
- However, when storing back to global memory, the compiler uses two STG.E.64 instructions, leading the user to question the support for STG.E.128 or potential misconfiguration.
Memory Coalescing Concerns Raised: The user highlights that the use of two separate STG.E.64 instructions, instead of a single STG.E.128, breaks memory coalescing across threads.
- This is concerning because vectorized stores are expected to ensure data is written in a contiguous manner for optimal performance.

GPU MODE ▷ #singularity-systems (2 messages):

Chaitin-Briggs-Click register allocation, picograd and picocuda merge

Dive into Chaitin-Briggs-Click Register Allocation: For those following along with the Chaitin-Briggs-Click register allocation, here’s a list of must-read papers, starting with Wikipedia for a broad overview.
- Deep dives are available via sci-hub link 1, sci-hub link 2, lecture notes, and a GitHub doc.
Picograd and Picocuda Unite!: Picograd and Picocuda have merged into one repository.
- Access the unified codebase at Singsys GitHub repo.

MCP (Glama) ▷ #general (15 messages🔥):

MCP for Docs, Standardized Payments for Tools, In-Browser postMessage Transport Proposal, MCP for Embedded Systems, Exposing Prompts via the Web

MCP Server needs Documentation: A user is seeking a way to create an MCP server that can access documentation, either by pointing it to a repo or a docs site, to avoid making excessive round trips for information retrieval.
Standardizing Payment of 1000s of tools: The main idea is to standardize payments on some abstract level, where AI assistants will handle payments for a potential world of thousands of tools without requiring individual account creation.
- A member hopes this PR will enable securely entering payment info on the client side without creating an account (or creating one silently) for a one-time purchase.
PostMessage transport proposal for in-browser MCP: A member is working on a transport proposal for in-browser “postMessage” with a demo showing a client + server as static JS/HTML hosted via GitHub Pages.
- They drafted a SEP and are looking for an MCP spec maintainer to sponsor it for standardization.
MCP for Programming Embedded Systems: A new user is asking about a useful MCP for programming embedded systems in STM32CubeIDE.
OpenRouter LLM service with MCP: A member is inquiring about using an LLM service like OpenRouter to communicate with an API fetching MCP data (e.g., sports data) and asking questions about it.
- They would like to ask an openrouter model, to tell me about the next EPL soccer game coming up.

MCP (Glama) ▷ #showcase (4 messages):

API Keys, AutoGen Chatbot, MCP Servers, YouTube Search

Enact Tools API Keys Flow: A member shared they did the same flow with the api keys with enact.tools and suggested to compare notes.
- Another member responded that they should find a time to chat.
AutoGen and MCP for YouTube search tutorial: A member posted a tutorial on building a multi-agent chatbot with AutoGen and MCP servers for YouTube search from scratch, available at YouTube.
- The tutorial aims to guide others through the process of creating a functional chatbot using these tools.

aider (Paul Gauthier) ▷ #general (14 messages🔥):

Aider, Best Models for Aider, DeepSeek, OpenAI open models, GLM air stable

Aider Confirmed: A member confirmed they were using Aider.
- When asked whether that’s not using Aider, they confirmed that was using aider, yes.
DeepSeek Model shines for Aider: Multiple members suggested DeepSeek as a good model for Aider due to its affordability and effectiveness.
- One user mentioned using Deepseek-R1 through OpenRouter and being quite happy with it.
OpenAI open models released: OpenAI released some open models.
- A member joked as soon as I work out how to load GLM air stable on my machine, something else comes out 117B with 5.1B active… .
Pikuma’s LLM Vibe Test: A member shared a link to Pikuma’s LLM vibe test for fun.
- The test is designed to explain this code.

aider (Paul Gauthier) ▷ #questions-and-tips (2 messages):

Aider Non-Interactive Mode

Aider Eyes Non-Interactive Mode: A member asked if there was a way to use aider in non-interactive mode.
- They stated that they could not find anything in the documentation (aider scripting docs).
Documenting the Aider: The user asked if there’s some documentation that explains how to run Aider in non interactive mode.
- The user shared the link to the scripting docs where they searched for the info.

DSPy ▷ #show-and-tell (6 messages):

Document boundary detection in PDFs, Knowledge graph practitioners meet DSPy, SIMBA Optimizer write-up

Detect Document Boundaries with DSPy: A member shared a writeup on using DSPy to detect document boundaries in PDFs: kmad.ai/Using-DSPy-to-Detect-Document-Boundaries.
- The author also shared a link to their X post introducing DSPy to knowledge graph practitioners and expressed interest in exploring deeper multi-step workflows and optimization.
DSPy meets Knowledge Graph Practitioners: A member shared a writeup introducing DSPy to knowledge graph practitioners: blog.kuzudb.com.
- The author aims to enhance productivity with LLMs and plans to explore multi-step workflows and optimization in the coming weeks.
SIMBA Optimizer Demystified: A member shared their write-up of the SIMBA optimizer: blog.mariusvach.com.
- Another member praised the write-up as super intuitive and well explained.

DSPy ▷ #general (9 messages🔥):

GEPA availability in DSPy, Optimized System Prompts for Fine-tuning, Three-Phase Training Approach

GEPA on DSPy: Still MIA: A user inquired about the availability of GEPA for use within DSPy.
- Another user responded with a Discord link, and the original user clarified that it hasn’t been released yet.
Sys-Prompt Optimization: Fine-Tuning Faux Pas?: A user questioned whether using an optimized sys_prompt with DSPy is a good idea when fine-tuning a model, suggesting it might not be.
- They are attempting to add reasoning traces for a specific task to a non-reasoning model and are using a three-phase training approach.
Training Trifecta: A Three-Phase Plan: A user outlined a three-phase training approach: SFT with an optimized sys_prompt on 1/10th of the dataset, followed by GRPO on the same subset with a different sys_prompt, and finally SFT on the remaining data without the sys_prompt.
- The user admitted, lol idk what im doing, im just f-ing around to find out and noted that the task has non-verifiable rewards, hence the unusual approach.

LlamaIndex ▷ #announcements (1 messages):

Office Hours

LlamaIndex Office Hours will commence soon: LlamaIndex office hours will commence in 10 minutes at this link.
Dummy topic to satisfy minItems: This is a dummy topic to satisfy the minItems requirement of the schema.

LlamaIndex ▷ #blog (4 messages):

LlamaParse, LlamaCloud, GPT-OSS-120B & GPT-OSS-20B, Document Agents

LlamaParse Transforms PDFs into Multimodal Reports: A walkthrough by @tuanacelik demonstrates how to turn dense PDFs into full multimodal reports with interleaving text and images using LlamaParse.
- The process involves ingesting research papers with LlamaParse (high-res OCR + full-page & chart images) and building a report-generation agent that dynamically chooses tools.
LlamaCloud Powers AI Scaling with Document Intelligence: LlamaCloud’s parsing capabilities help AI companies scale from prototype to production with complex document ingestion as illustrated by how @withdelphi built their “digital minds” mentorship platform using document intelligence (link).
- Their best-in-class parsing handles malformed PDFs and embedded images.
OpenAI Drops GPT-OSS-120B and GPT-OSS-20B LLMs: OpenAI released their first open-source LLMs since GPT-2, GPT-OSS-120B & GPT-OSS-20B, under the Apache 2.0 license.
- These models feature reasoning that matches o4-mini, can run locally, and are ready to use with LlamaIndex.
Document Agents Handle Messy Financial Documents: A webinar will demonstrate how Document Agents handle messy financial documents, showcasing systems that work with complex, multimodal documents using LlamaCloud’s tooling.
- It will be held in 1 week.

LlamaIndex ▷ #general (7 messages):

Tracing OpenAI Embedding API calls in LlamaIndex, LlamaExtract P&ID Example, LlamaExtract with Graphs Challenges, Graphiti with LlamaIndex for Knowledge Graph Apps

LlamaIndex traces OpenAI Embeddings API calls: A user inquired whether LlamaIndex traces every batch call to the OpenAI Embeddings API, even with large batch sizes.
- A member confirmed that LlamaIndex should be tracing these calls, noting that the default batch size for OpenAI is around 100.
LlamaExtract P&ID Example Available: A user asked if another member had checked out the P&ID example in LlamaExtract; link provided to a specific Discord message.
- This suggests LlamaExtract has capabilities for processing Piping and Instrumentation Diagrams.
LlamaExtract faces Graph-parsing difficulty: A member confirmed reviewing LlamaExtract with graphs and acknowledged that these cases are notoriously difficult for LVMs/LLMs.
- The member offered to discuss the challenges further and mentioned that a team member from LlamaParse would be joining the discussion.
Graphiti & LlamaIndex create Knowledge Graph Apps: A user inquired about tutorials demonstrating the use of Graphiti with LlamaIndex to create knowledge graph applications.
- No specific tutorials were provided in the context.

Manus.im Discord ▷ #general (10 messages🔥):

Manus is dead, Sub agents as a solution, TradingView Premium, Flutter app guide

Manus Experiences Downtime: Users observe that the Manus platform is experiencing low usage and might be dead.
- One user discovered a workaround using sub-agents to improve code and project quality.
TradingView Premium Offered for Free: A user shared a link to a supposed free full version of TradingView Premium for Windows and macOS on Reddit.
Guide on Flutter App Creation: A user shared a link to a guide on creating Flutter apps within daily credit limits on flutter-web-emulator.vercel.app.
- The user notes that the page includes ads and is based on personal experience, while another user flagged it as a potential scam.

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (8 messages🔥):

Syllabus Changes, LLM Agents vs Advanced LLM Agents, Speaker Differences

New Syllabus on the Horizon?: Members discussed the likelihood of syllabus changes for the upcoming LLM Agents course, suggesting that it would likely be different from last year given the evolving nature of the field.
- One member stated, “As the agents’ area evolves it’s normal for the syllabus to be adapted”.
LLM Agents Course has Advanced Counterpart: Participants clarified that there are two distinct courses: LLM Agents and Advanced LLM Agents, noting differences between them.
- One member specified that “They are two different titles - one is llm agents other advance llm agents. Syllabus is different”.
Similar Topics, New Speakers!: A member indicated that while the overall topics would likely remain similar, the speakers would probably be different.
- This suggests a refresh in perspective and expertise for the course.

Torchtune ▷ #papers (4 messages):

Public Server, Sharing information

Members ask about sharing discord channel info: A member asked if it would be okay to share info from the channel in another public server.
- Another member replied that it’s public and they would be very happy if they shared it.
Channel sharing encouraged: The channel owner explicitly stated that sharing the content is welcomed.
- They expressed happiness at the prospect of the channel being shared, encouraging broader dissemination of the information.

tinygrad (George Hotz) ▷ #general (2 messages):

TinyPilot, Codebase Work, Image Analysis

TinyPilot Plug Suggestion: A member suggested using their TinyPilot tool to help with codebase work.
- They noted that it can be helpful, but needs work once you get deeper into the codebase.
Image Analysis Laughs: A member reacted with “lol” to an attached image.
- The image Screenshot_2025-08-04_at_12.08.33_PM.png was attached.

Cohere ▷ #👋-introduce-yourself (2 messages):

AI Voice Agents, GPT-Powered Chatbots, RAG Implementation, Freelance AI Engineer

AI Engineer Pioneers Intelligent Voice Agents: An AI Engineer specializes in building intelligent voice agents, chatbots, and AI assistants that handle inbound/outbound phone calls via SIP (Twilio), including features like call booking, IVR, and voicemail.
- The engineer’s work involves GPT-powered chatbots that learn from documents, audio, and scraped data from forums, Discord, Slack, and websites using retrieval-augmented generation (RAG), combined with workflow automation to streamline communication and processes.
Engineer navigates Tech Stack: The AI Engineer is skilled in Python, JavaScript, Node.js, FastAPI, and tools like LangChain, Pinecone, OpenAI, Deepgram, and Twilio.
- The Engineer is available for freelance, remote, or startup projects.

MLOps @Chipro ▷ #general-ml (1 messages):

Search logs and click-through data, Ranker fine-tuning, Cost implications of data usage

Mine search logs to FT rankers: A member suggested using search logs, query data, document lists, and click-through data to fine-tune a ranker.
- Another member cautioned that this comes at a cost.
Consider cost implications: Collecting and utilizing search logs for ranker fine-tuning involves certain costs.
- These costs should be carefully evaluated before implementing such a strategy.

AI Twitter Recap

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. OpenAI GPT-OSS Model Releases, Integrations, and Community Discussion

2. KittenTTS: Ultra-Compact TTS Model Launch

3. Llama.cpp Feature Updates and MoE Offloading

Less Technical AI Subreddit Recap

1. Google DeepMind Genie 3 Model Release & Benchmarks

2. OpenAI Open Source Model and GPT-OSS Launch

3. Qwen-Image and Open-Source Multimodal Generation Benchmarks

AI Discord Recap

Discord: High level Discord summaries

Perplexity AI Discord

Unsloth AI (Daniel Han) Discord

LMArena Discord

LM Studio Discord

OpenAI Discord

OpenRouter (Alex Atallah) Discord

Latent Space Discord

Nous Research AI Discord

Eleuther Discord

Moonshot AI (Kimi K-2) Discord

Cursor Community Discord

HuggingFace Discord

Yannick Kilcher Discord

Notebook LM Discord

Modular (Mojo 🔥) Discord

GPU MODE Discord

MCP (Glama) Discord

aider (Paul Gauthier) Discord

DSPy Discord

LlamaIndex Discord

Manus.im Discord Discord

LLM Agents (Berkeley MOOC) Discord

Torchtune Discord

tinygrad (George Hotz) Discord

Cohere Discord

MLOps @Chipro Discord

Codeium (Windsurf) Discord

Nomic.ai (GPT4All) Discord

Discord: Detailed by-Channel summaries and links

Perplexity AI ▷ #announcements (1 messages):

Perplexity AI ▷ #general (1206 messages🔥🔥🔥):

Perplexity AI ▷ #sharing (5 messages):

Perplexity AI ▷ #pplx-api (6 messages):

Unsloth AI (Daniel Han) ▷ #general (1200 messages🔥🔥🔥):

Unsloth AI (Daniel Han) ▷ #introduce-yourself (4 messages):

Unsloth AI (Daniel Han) ▷ #off-topic (7 messages):

Unsloth AI (Daniel Han) ▷ #help (77 messages🔥🔥):

Unsloth AI (Daniel Han) ▷ #showcase (3 messages):

Unsloth AI (Daniel Han) ▷ #research (8 messages🔥):

Unsloth AI (Daniel Han) ▷ #unsloth-bot (82 messages🔥🔥):

LMArena ▷ #general (1152 messages🔥🔥🔥):

LMArena ▷ #announcements (1 messages):

LM Studio ▷ #announcements (1 messages):

LM Studio ▷ #general (593 messages🔥🔥🔥):

LM Studio ▷ #hardware-discussion (32 messages🔥):

OpenAI ▷ #annnouncements (4 messages):

OpenAI ▷ #ai-discussions (286 messages🔥🔥):

OpenAI ▷ #gpt-4-discussions (48 messages🔥):

OpenAI ▷ #prompt-engineering (1 messages):

OpenAI ▷ #api-discussions (1 messages):

OpenRouter (Alex Atallah) ▷ #announcements (5 messages):

OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

OpenRouter (Alex Atallah) ▷ #general (251 messages🔥🔥):

OpenRouter (Alex Atallah) ▷ #new-models (3 messages):

OpenRouter (Alex Atallah) ▷ #discussion (30 messages🔥):

Latent Space ▷ #ai-general-chat (236 messages🔥🔥):

Nous Research AI ▷ #general (220 messages🔥🔥):

Nous Research AI ▷ #ask-about-llms (2 messages):

Nous Research AI ▷ #research-papers (1 messages):

Nous Research AI ▷ #research-papers (1 messages):

Eleuther ▷ #general (14 messages🔥):

Eleuther ▷ #research (188 messages🔥🔥):

Eleuther ▷ #lm-thunderdome (1 messages):

Eleuther ▷ #gpt-neox-dev (7 messages):

Moonshot AI (Kimi K-2) ▷ #general-chat (186 messages🔥🔥):

Cursor Community ▷ #general (177 messages🔥🔥):

Cursor Community ▷ #background-agents (7 messages):

HuggingFace ▷ #announcements (1 messages):