a rare miss at a rough time.
AI News for 12/15/2025-12/16/2025. We checked 12 subreddits, 544 Twitters and 24 Discords (207 channels, and 10501 messages) for you. Estimated reading time saved (at 200wpm): 734 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
The headline details of OpenAI's new image model are good: precise image editing, executing creative ideas, better instruction following, much better text and markdown rendering, fixing obvious bugs in the old gpt-image-1, and even voluntarily highlighting known regressions in the model. It even scores 1277 on LMArena, 1344 on Design Arena, and 1272 on AA Arena, all #1 spots.
BUT: the compliments stop there. Nearly all the vibe checks from Twitter, Reddit, and the various Discord communities are negative in comparison with Nano Banana Pro. The progress from GPT-Image-1 is clear enough, so this is not so much a knock on OpenAI overall as a rough showing for confidence that Arena benchmarks represent the preferences of actual serious users.
The context and timing matter for those who follow the blow-by-blow of the capability race. Had OpenAI shipped this before NBP, or without the overhanging narrative of a "Code Red" in light of Gemini competition, Image-1.5 would've been a fine launch. Now the vibes are off.
AI Twitter Recap
Xiaomi's MiMo-V2-Flash: 309B MoE built for speed, long context, and SWE-Bench SOTA
- MiMo-V2-Flash (309B MoE; 15B active): Xiaomi's new open-weight model emphasizes inference efficiency and agentic workflows: 150 tokens/s, 256K context, and top open-source scores on SWE-Bench (Verified: 73.4%, Multilingual: 71.7%). The architecture uses Hybrid Sliding Window Attention (SWA) with sparse local windows and a small set of global layers, plus MTP (multi-token prediction) for spec-decode and day-0 serving on LMSYS/SGLang. Xiaomi says it "matches DeepSeek-V3.2" on general benchmarks at lower latency. Links: launch details and specs @XiaomiMiMo, technical report and code @XiaomiMiMo.
- Engineering notes & ablations: Lead author Fuli Luo details what moved the needle: Hybrid SWA outperformed other linear attention variants; a window size of 128 beat 512 post-training for long context; attention sinks matter; 3-layer MTP achieved >3 accept length and ~2.5× speedup on coding tasks; post-training via MOPD (multi-teacher on-policy distillation) matched teacher quality at <1/50th the SFT+RL compute. Read the thread @luo_fuli14427. External ablations highlight the benefits of sinks and SWA-128 over 512, and hybrid > dense global layers on complex tasks @eliebakouch. Availability: free for a limited time on OpenRouter @OpenRouterAI; SGLang day-0 perf notes @BanghuaZ. Background: the lead was a core author of DeepSeek-V2 @eliebakouch.
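The hybrid-SWA idea above is easy to make concrete with a toy mask. This is a minimal sketch, assuming a causal decoder where most layers see a 128-token local window plus a few always-visible "attention sink" positions, while a handful of layers stay globally dense; MiMo's actual layer placement, sink handling, and kernel details are not specified in the recap, so everything here is illustrative.

```python
import numpy as np

def hybrid_swa_mask(seq_len: int, layer_idx: int, global_layers: set,
                    window: int = 128, num_sinks: int = 4) -> np.ndarray:
    """Boolean attention mask for one layer of a hybrid-SWA stack.

    Global layers attend to the full causal prefix; the remaining layers
    attend only to a local sliding window plus a few sink positions at
    the start of the sequence.
    """
    q = np.arange(seq_len)[:, None]   # query positions (rows)
    k = np.arange(seq_len)[None, :]   # key positions (columns)
    causal = k <= q                   # no attention to the future
    if layer_idx in global_layers:
        return causal                 # dense causal attention
    local = (q - k) < window          # within the sliding window
    sinks = k < num_sinks             # always-visible sink tokens
    return causal & (local | sinks)

# A sparse layer: query 400 sees keys 273..400 plus the 4 sink tokens.
mask = hybrid_swa_mask(seq_len=512, layer_idx=5, global_layers={0, 12})
```

The ablation claims in the thread (window 128 > 512, sinks mattering) are exactly the `window` and `num_sinks` knobs here.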
Image generation shake-up: OpenAI's GPT Image 1.5 ("ChatGPT Images") and FLUX.2 Max
- OpenAI GPT Image 1.5: The new flagship model for ChatGPT and the API brings stronger instruction following, precise edits, improved text rendering/logos/faces, and up to 4× faster generation. A new "Images" surface ships in ChatGPT. Docs and API: @OpenAI, @OpenAIDevs. It debuts at #1 on the Artificial Analysis and LM Arena leaderboards for both text-to-image and editing, with a sizable margin over Gemini's Nano Banana Pro; pricing is resolution/quality-dependent (Artificial Analysis cites ~$133/1k 1MP images at high quality; ~$9/1k at low quality). Leaderboards and pricing analyses: @arena, @ArtificialAnlys, @grx_xce.
- Early head-to-heads suggest GPT-Image-1.5 edges prior GPT variants and competes closely with Nano Banana Pro on likeness/edit fidelity, with some reports that Gemini still leads on "visual IQ" (math/maze reasoning in images) @Yuchenj_UW, @Yuchenj_UW.
- FLUX.2 [max] (Black Forest Labs): A higher-quality FLUX.2 variant with web-grounding and up to 10 reference images for consistent editing; ranked #2-3 on image leaderboards across text-to-image and editing (priced at $70/1k T2I images, $140/1k edits). Launch and hosting: @bfl_ml, @fal, @arena.
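The per-1k prices quoted in the items above reduce to per-image costs with simple arithmetic; this sketch hardcodes the figures cited in this issue, while actual billing varies with resolution and quality.

```python
def image_batch_cost(num_images: int, price_per_1k: float) -> float:
    """Cost in USD for a batch, given a per-1,000-image price."""
    return num_images * price_per_1k / 1000

# Per-1k prices as cited above (Artificial Analysis / launch posts).
PRICES = {
    "gpt-image-1.5 high (1MP)": 133.0,
    "gpt-image-1.5 low": 9.0,
    "flux.2-max t2i": 70.0,
    "flux.2-max edit": 140.0,
}

for tier, per_1k in PRICES.items():
    print(f"{tier}: 250 images -> ${image_batch_cost(250, per_1k):.2f}")
```

At these rates a 250-image run spans roughly $2.25 (low quality) to $33.25 (high quality) on GPT Image 1.5 alone, which is why the quality tier matters for batch workloads.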
Open-source push from NVIDIA: Nemotron-Cascade and broader availability of Nemotron 3
- Nemotron-Cascade (8B/14B): NVIDIA introduces "Cascade RL," a domain-wise sequential RL pipeline. The 14B model surpasses DeepSeek-R1-0528 (671B) on LiveCodeBench v5/v6/Pro and hits 43.1% pass@1 on SWE-Bench Verified (53.8% with test-time scaling). The team emphasizes RLHF alignment as a pre-step for improved reasoning and notes later domain stages preserve or improve earlier gains. Paper/models: @_weiping, @zihan_johan_liu, @HuggingPapers.
- Nemotron 3 Nano availability: Now on Ollama and MLX/LM Studio for Apple Silicon, bringing "from-scratch" small MoE models to local workflows @ollama, @awnihannun, @lmstudio. Context: NVIDIA's open-source strategy is increasingly hardware-aligned, optimizing training/inference stacks for their silicon (the "hardware-defined AI" era) @TheTuringPost.
Benchmarks for factuality and science: FACTS and FrontierScience
- FACTS Leaderboard (Google Research): A comprehensive factuality suite spanning four dimensions (Multimodal, Parametric, Search, Grounding v2) with standardized Kaggle tooling. Headline: Gemini 3 Pro 68.8% overall; sub-scores show behavioral tradeoffs (Claude models are conservative with high no-contradiction rates; GPT models have higher coverage but more contradictions). Multimodal remains hard (~47% with strict coverage + zero contradictions). The Parametric spread is wide (Gemini 3 Pro 76.4% vs GPT-5 mini 16%). Thread and paper: @omarsar0.
- OpenAI FrontierScience (open-sourced evals + wet-lab loop): A new PhD-level physics/chemistry/biology benchmark (olympiad-style + research tasks) with test-time compute scaling, released alongside a wet-lab study where GPT-5 proposed protocol changes yielding a 79× efficiency gain in a cloning workflow. Open datasets are on the HF Hub, with an emphasis on tying model evals to real scientific workflows. Announcements and details: @OpenAI, @kevinweil, @tejalpatwardhan, @reach_vb.
Serving and agent infra: KV-aware routing, P/D disaggregation, control planes
- vLLM Router (prefill/decode-aware load balancer): Purpose-built for vLLM fleets and written in Rust; supports consistent hashing for KV locality, power-of-two choices, retries/backoff, circuit breakers, k8s discovery, and Prometheus metrics. Designed for P/D disaggregation with distinct worker pools and routing policies to preserve throughput and tail latency @vllm_project.
- Toward an "Intelligence Control Plane": vLLM + AMD preview a "Semantic Router" framing that governs inputs, outputs, and long-term state, with an emphasis on safety/memory in large agent systems @vllm_project. Complementary stack updates: SkyPilot + NVIDIA Dynamo recipes for MoE inference (P/D disaggregation, KV-aware routing) with OpenAI-API-compatible endpoints @skypilot_org, SGLang day-0 support for MiMo-V2-Flash @BanghuaZ, and OpenHands shipping a production-oriented software agent SDK @OpenHandsDev. On the provider side, Cline moved to Vercel's AI Gateway for lower error rates and 10-40% better P99 streaming latencies across several models @cline.
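For readers unfamiliar with the two routing policies named in the vLLM Router item, here is a minimal Python sketch (the real router is Rust) of how consistent hashing for KV-cache locality can compose with power-of-two-choices load balancing. The worker names, hash choice, and tie-breaking rule are all assumptions for illustration, not vLLM Router's actual behavior.

```python
import hashlib
import random

def route_request(session_id: str, inflight: dict,
                  rng: random.Random = random.Random(0)) -> str:
    """Pick a worker using two complementary policies.

    1. Consistent hashing: the same session always hashes to the same
       preferred worker, so its KV cache is likely to be warm there.
    2. Power-of-two choices: sample one extra random worker and send
       the request to whichever of the two has fewer in-flight
       requests, bounding load imbalance under hot sessions.
    """
    names = sorted(inflight)
    digest = hashlib.sha256(session_id.encode()).digest()
    preferred = names[int.from_bytes(digest[:8], "big") % len(names)]
    alternate = rng.choice(names)
    # Ties go to the preferred worker to preserve cache locality.
    chosen = preferred if inflight[preferred] <= inflight[alternate] else alternate
    inflight[chosen] += 1  # account for the new in-flight request
    return chosen

fleet = {"worker-a": 3, "worker-b": 0, "worker-c": 7}
print(route_request("session-42", fleet))
```

The `<=` tie-break is the design choice that keeps a session sticky when loads are equal, which is what "KV locality" buys: repeated turns of a conversation avoid re-prefilling their context on a cold worker.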
Multimodal/audio/3D: open releases and fast view synthesis
- Meta SAM Audio (open weight): A unified audio separation model that isolates sounds from complex mixtures using text, visual, or span prompts. Released with benchmarks, a perception encoder, and demos in the Segment Anything Playground @AIatMeta, early community note on open weights @_akhaliq.
- AllenAI Molmo 2 (Apache-2.0): Extends Molmo's grounded VLM capabilities to video; three sizes based on SigLIP2 + Qwen3, plus a separate 4B model leading open models on video pointing/counting benchmarks. Data releases included @allen_ai, @mervenoyann.
- Apple SHARP (single-image to 3D in under 1s): Generates ~1.2M 3D Gaussians via a single feed-forward pass with a learned depth adjustment module, achieving ~1000× speedup vs diffusion baselines (e.g., Gen3C ~850s) while improving perceptual fidelity on ScanNet++ (DISTS 0.071 vs 0.090). Paper recap @omarsar0.
- Also notable: MiniMax's open-sourced VTP for scalable visual tokenizer pretraining, which improves downstream diffusion transformer generations without extra generator compute @MiniMax__AI, and Runway Gen-4.5 rolling out to all paid plans @runwayml.
Top tweets (by engagement)
- ChatGPT Images (GPT-Image-1.5): New model, new "Images" UI, and API; 4× faster and top of public leaderboards @OpenAI (7,842). Product demo thread @sama (2,492).
- Meta's SAM Audio (open weight): Unified text/visual/span-prompt audio separation with playground demos @AIatMeta (3,781).
- FrontierScience: OpenAI's new PhD-level science evals plus a 79× improvement in a wet-lab cloning protocol with GPT-5 feedback loops @OpenAI (1,859); highlight @sama (2,652).
- Larian on gen-AI in pipelines: Clarifies ideation/reference use vs. original concept art; strong stance on keeping human artists in the loop @LarAtLarian (31,930).
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Meta SAM Audio Model Launch
- Meta announced a new SAM Audio Model for audio editing that can segment sound from complex audio mixtures using text, visual, and time span prompts. (Activity: 403): Meta has introduced the SAM Audio Model, a novel tool for audio editing that allows users to isolate specific sounds from complex audio mixtures using text, visual, and time span prompts. This model leverages advanced segmentation techniques to identify and extract sounds, potentially transforming audio processing workflows. The model's ability to accurately pick out sounds, such as a microphone tap from a video, demonstrates its precision and potential applications in various fields, including media production and virtual meetings. Commenters are impressed by the model's precision, noting its potential to enhance virtual meeting experiences by filtering out unwanted noises. There is also a sense of amazement at its ability to isolate specific sounds from complex audio environments, suggesting significant advancements in audio processing technology.
- ahmetegesel highlights the model's capability to isolate specific sounds from complex audio environments, emphasizing its potential to accurately identify and extract sounds associated with particular objects in a video. This suggests a high level of precision in audio segmentation, which could be transformative for audio editing applications.
- Andy12_ points out a specific demonstration where the model successfully identifies a subtle microphone tap when prompted with "tap on the microphone". This example underscores the model's sensitivity and accuracy in detecting and isolating minute audio events within a complex soundscape, showcasing its potential utility in detailed audio analysis and editing tasks.
- RandumbRedditor1000 inquires about the model's applicability to musical instruments, which implies interest in its ability to handle complex audio mixtures involving music. This raises questions about the model's performance in distinguishing and isolating individual instrument sounds within a musical piece, a challenging task in audio processing.
2. OpenAI Internal Discussions on AI Openness
- It was Ilya who "closed" OpenAI (Activity: 797): The image is an email from Ilya Sutskever, co-founder of OpenAI, expressing concerns about the potential risks of a "hard takeoff" in AI development, which refers to a rapid and uncontrolled advancement of AI capabilities. Sutskever suggests that while openness in AI research initially aids recruitment and collaboration, it could eventually lead to the creation of unsafe AI systems. This email highlights a pivotal moment in OpenAI's history where the balance between openness and safety in AI development was being debated internally, reflecting a shift towards more cautious and closed practices as AI technology advanced. Commenters express skepticism about the trustworthiness of companies like OpenAI, questioning the philosophy of restricting AI research to prevent unsafe AI. There is also a sentiment that key figures like Elon Musk, Ilya Sutskever, and Sam Altman are motivated by a desire for control and recognition, leading to a competitive rather than collaborative approach in AI development.
- LoSboccacc highlights a critical view of OpenAI's research, suggesting that their advancements are essentially built upon Google's Transformer model. This implies that OpenAI's innovations might be more about scaling existing architectures than introducing fundamentally new concepts. The comment metaphorically describes OpenAI's work as "8 Google's transformer in a trenchcoat," indicating a perception of superficial complexity over genuine novelty.
- popiazaza discusses the internal dynamics and competition among key figures in AI, such as Elon Musk, Ilya Sutskever, and Sam Altman. The comment suggests that the rivalry and lack of trust among these leaders contribute to the closed nature of AI development at organizations like OpenAI, SSI, and xAI. This reflects broader concerns about transparency and collaboration in the AI field.
- RASTAGAMER420 references the ancient phrase "Who will watch the watchmen," highlighting the ongoing debate about oversight and accountability in AI development. This comment underscores the ethical and philosophical challenges of ensuring that those who develop and control AI technologies are themselves subject to scrutiny and regulation.
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. OpenAI GPT-Image-1.5 Release and Benchmarks
- BREAKING: OpenAI releases "GPT-Image-1.5" (ChatGPT Images) & It instantly takes the #1 Spot on LMArena, beating Google's Nano Banana Pro. (Activity: 1142): OpenAI has released a new model, GPT-Image-1.5, which has quickly taken the top spot on the LMArena leaderboard for text-to-image generation, surpassing Google's "Nano Banana Pro." The model boasts a score of 1277, compared to the 1235 of its closest competitor. Key improvements include being 4x faster than its predecessor, DALL-E 3, and offering enhanced editing capabilities with precise "add, subtract, combine" instructions. It also maintains consistency in character appearance and lighting across edits, addressing a major limitation of DALL-E 3. The model is available to all ChatGPT users via a new "Images" tab and is also accessible through an API as gpt-image-1.5. OpenAI Blog Some users express skepticism about the authenticity of the LMArena leaderboard screenshot, questioning the lack of official updates on the LMArena website. Others speculate on the naming convention, suggesting it might indicate a more advanced model in development or a strategic decision to avoid overhyping.
- A user expressed skepticism about the authenticity of the leaderboard update, noting that they couldn't find any official updates on the LMArena website. This raises questions about the validity of the claim that GPT-Image-1.5 has taken the top spot, suggesting a need for verification from official sources.
- Another user tested the model's capabilities by using a prompt example from OpenAI's announcement page, comparing it with Google's Nano Banana Pro. The prompt involved combining images in a specific style, and the user shared the resulting image, which could serve as a practical benchmark for evaluating the model's performance in real-world scenarios.
- A comment speculated on the naming of the model as "1.5", suggesting it might indicate either a placeholder for a more advanced model in development or a strategic decision to avoid the issues associated with a full version release like GPT-5. This reflects on OpenAI's potential strategy in versioning and releasing their models.
- New GPT image vs Nano Banana Pro. (Activity: 1666): The post discusses a comparison between the image generation capabilities of a new GPT model and the Nano Banana Pro (NBP). The GPT model's images are described as having a more artificial appearance, while NBP's images are noted for their realism, making them almost indistinguishable from real photographs. This suggests that NBP may have superior image synthesis technology, potentially due to a more extensive or refined dataset, or advanced rendering techniques. Commenters highlight that GPT-generated images appear overly polished, akin to commercial photography, whereas NBP's outputs are praised for their authenticity and natural look, suggesting a preference for NBP's approach in realistic image generation.
- The discussion highlights a comparison between GPT-generated images and those from Nano Banana Pro (NBP). Users note that while GPT images often appear overly polished and artificial, akin to commercial photography, NBP images are praised for their realism, appearing more like candid snapshots. This suggests that NBP may have advanced in creating more lifelike images, potentially due to different training data or image processing techniques.
- A specific example is given where GPT's image of a car is described as resembling a "car commercial image," whereas NBP's version looks like a personal photograph. This indicates a difference in the aesthetic and possibly the underlying algorithms used for rendering images, with NBP possibly employing techniques that better mimic real-world lighting and textures.
- The prompts used for generating images include detailed descriptions such as "a 20 years old girl, real image, 9:16, background in focus" and "a real image of a ford mustang, fully black, parked in a dark roadside." These prompts suggest that both models are being tested for their ability to handle complex scene compositions and lighting conditions, which are critical for achieving realism in AI-generated imagery.
- the new image gen is nuts (Activity: 1047): The image appears to be a product of advanced image generation technology, likely using AI models that focus on creating hyper-realistic human figures and environments. The comments highlight some inconsistencies in the generated image, such as mismatched clothing and unrealistic car features, which suggest that while the technology has improved in rendering human-like figures, it still struggles with contextual and environmental accuracy. This reflects ongoing challenges in AI image generation, where achieving perfect realism in complex scenes remains difficult. Commenters express a mix of boredom and critique, noting the repetitive nature of AI-generated images of non-existent people and pointing out specific flaws in the image's realism, such as mismatched clothing and improbable car features.
- DumbedDownDinosaur highlights issues with the generated images, noting inconsistencies such as one leg being bare while the other is covered by pants, and the presence of a vanity table behind the front seat of a car, which raises questions about the logical coherence of the scene. Despite these flaws, they acknowledge that the human figures appear more realistic compared to previous iterations, avoiding the "plastic" look.
- Junior-Tradition2083 compares the new image generation model with the "nano banana pro", stating that the latter produces more realistic images, particularly in rendering humans and environments. This suggests that while the new model has improved, there are still competitors that excel in certain aspects of image realism.
- KH10304 expresses confusion about the spatial arrangement in the generated image, particularly questioning the seating position within the car. This points to potential issues in the model's ability to accurately render complex scenes with correct spatial relationships.
2. Claude Code Updates and Applications
- Official: Anthropic just released Claude Code 2.0.70 with 13 CLI changes, details below. (Activity: 704): Anthropic has released Claude Code CLI 2.0.70, introducing 13 changes including a new Enter key feature for prompt suggestions, wildcard syntax mcp__server__* for tool permissions, and an auto-update toggle for plugin marketplaces. Notable fixes include resolving input clearing during command processing, prompt suggestion replacement issues, and diff view updates on terminal resize. The update also improves memory usage by 3x for large conversations and enhances the resolution of stats screenshots. The # shortcut for quick memory entry has been removed, and UI improvements have been made for file creation permissions. Changelog Commenters noted ongoing issues with scrolling flickers and crashes, indicating that while some bugs were addressed, persistent problems remain with the interface's stability.
- A user inquired about the meaning of "3x" in terms of memory usage, expressing skepticism about its impact and whether it might be exaggerated. The figure implies a significant change in memory behavior, which matters most in resource-constrained environments.
- Another user asked about the plan_mode_required feature, indicating a need for clarification on its functionality. This suggests that the feature might be complex or not well-documented, leading to confusion among users.
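The wildcard permission syntax from the changelog can be modeled with shell-style globs. This sketch assumes glob semantics for patterns like mcp__server__*; the changelog excerpt doesn't document how Claude Code resolves these internally, so the matching logic here is an illustration, not Anthropic's implementation.

```python
from fnmatch import fnmatch

def tool_allowed(tool_name: str, granted_patterns: list) -> bool:
    """Return True if any granted permission pattern matches the tool name.

    Patterns use shell-style globs, so mcp__github__* grants every tool
    exposed by the github MCP server, while an exact name grants one tool.
    """
    return any(fnmatch(tool_name, pattern) for pattern in granted_patterns)

granted = ["mcp__github__*", "mcp__filesystem__read_file"]
print(tool_allowed("mcp__github__create_issue", granted))    # True (glob)
print(tool_allowed("mcp__filesystem__write_file", granted))  # False
```

The appeal of the wildcard form is exactly this: one `mcp__server__*` entry replaces a per-tool allowlist that would otherwise grow with every tool the server exposes.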
- Claude code discovered a hacker on my server (Activity: 1018): The post describes an incident where Claude Code, an AI tool, detected unusual CPU usage on a Linux server, which was being used as a backend for a website. Upon investigation, it was found that the server was compromised and being used for cryptocurrency mining due to an open port left for a database. Claude Code identified the issue, closed the open ports, and removed the unauthorized access. The server had no users at the time, minimizing potential data exposure. A notable comment suggests deleting the compromised machine and creating a new one, as scripts used by hackers often have backdoors that can reactivate upon reboot.
- themusician985 suggests that the server should be deleted and recreated because scripts used by hackers often contain backdoors that can re-enable themselves upon reboot. This highlights a common security practice of ensuring compromised systems are completely rebuilt to prevent persistent threats.
- Nissan-S-Cargo expresses skepticism about the claim, implying that the story might be exaggerated or fabricated. This reflects a critical view often necessary in cybersecurity discussions, where extraordinary claims require substantial evidence.
- Unique-Drawer-7845 humorously implies that crypto miners could exploit stored procedures, hinting at the potential for SQL injection or misuse of database functions for unauthorized cryptocurrency mining. This underscores the importance of securing database operations against such vulnerabilities.
- Battle testing MCP for blockchain data in natural language (Activity: 419): The image provides a snapshot of the top Ethereum addresses by ETH balance, highlighting the dominance of the Beacon Deposit Contract, which holds a significant portion of Ethereum's supply for Proof of Stake (PoS) staking. This setup is part of a broader effort to utilize the Pocket Network's MCP for real-time blockchain data analysis and historical pattern detection using Claude, an AI model. The user aims to integrate on-chain data directly into Claude for live trading insights and forensic analysis without relying on pre-processed signals or external dashboards like Dune Analytics. Commenters appreciate the advanced use of MCP beyond basic examples, noting potential challenges in setup versus operation. Concerns about data accuracy over time are raised, emphasizing the need for verification. The application is seen as valuable for research and journalism, offering a streamlined approach to analyzing blockchain activity.
- BloggingFly highlights the practical application of MCP (Model Context Protocol) in handling blockchain data, noting that it's a significant step beyond basic examples or documentation summaries. The comment raises a technical question about the challenges faced during implementation, specifically whether the difficulties lie more in the initial setup or in the ongoing prompting once the system is operational.
- BrightFern8 questions the long-term accuracy of using MCP for blockchain data, pointing out a potential issue with consistency over time. They mention having observed scenarios where initial results are accurate, but repeated queries over time lead to "drift" in the answers, suggesting a need for continuous verification of results to maintain trust in the system.
- theCartoonist59 discusses the current state of Claude's integration with external tools, noting that most implementations involve Claude interfacing with other systems rather than performing direct queries. They see MCP as a promising development, allowing Claude to interact with real infrastructure, which could enhance its capabilities beyond mere guesswork.
3. AI in Personal and Social Contexts
- Terence Tao: Genuine Artificial General Intelligence Is Not Within Reach; Current AI Is Like A Clever Magic Trick (Activity: 1792): Terence Tao, a prominent mathematician, argues that genuine Artificial General Intelligence (AGI) is not achievable with current AI technologies, which he likens to a "clever magic trick". He suggests that AI's current capabilities are better described as "artificial general cleverness", characterized by solving complex problems through stochastic or brute-force methods, often ungrounded and fallible, and not true intelligence. Tao emphasizes that while these AI tools can be impressive and useful, they are fundamentally unsatisfying, akin to understanding the mechanics behind a magic trick. He proposes viewing AI as a stochastic generator of clever outputs, which may be more productive for problem-solving. Source. Commenters debate Tao's perspective, with some arguing that intelligence itself is a "bundle of dirty tricks", suggesting AI's methods are not fundamentally different from human cognition, which also relies on heuristics. Others note Tao's philosophical stance on AI's "cleverness" versus "intelligence", and question the predictability of AI's future capabilities, highlighting the rapid advancements that have already occurred.
- Saint_Nitouche argues that dismissing AI as mere "stochastic brute force" overlooks the fact that human intelligence itself is a series of mechanical processes. The commenter suggests that any detailed explanation of intelligence, whether human or artificial, would inherently involve "dirty mechanical tricks," emphasizing that intelligence is fundamentally about mechanisms like neurons and electricity rather than something ineffable.
- Completely-Real-1 highlights that Terence Tao's perspective is more philosophical, focusing on the distinction between "intelligence" and "cleverness" in AI. Tao suggests that AI's capabilities are based on "tricks" found in training data, akin to human heuristics. The commenter notes that humans also use mental "tricks" or "rules of thumb" learned from experience, drawing a parallel between AI training and human learning processes.
- DoubleGG123 points out the unpredictability of AI advancements, suggesting that even experts like Terence Tao might not have anticipated the current capabilities of AI a few years ago. The commenter emphasizes the difficulty of making long-term predictions about AI development, advocating for a more observational approach rather than relying on forecasts about future AI capabilities.
- MI6 chief: Tech giants are closer to running the world than politicians (Activity: 498): In a recent address, MI6 chief Blaise Metreweli highlighted the growing influence of technology companies, suggesting they wield power comparable to governments, especially in the realm of disinformation and global stability. She called for urgent regulatory frameworks to manage the societal and political influence of these tech giants. For more details, see the original article here. Commenters discuss the pervasive influence of AI and tech giants, noting that AI bots can significantly shape public opinion and that tech companies have amassed significant wealth and control over information channels. There is a sentiment that politicians are now too late to curb this power, as tech companies have become deeply entrenched in societal structures.
- Forumly_AI highlights the significant influence of AI technologies in shaping public opinion, noting that AI-generated content can be highly convincing and influential. The comment emphasizes that AI bots have surpassed human influence in some areas, particularly in spreading information and potentially manipulating public perception. This underscores the growing power of tech giants who leverage such technologies, contributing to their increasing dominance over traditional political structures.
- RecursiveDysfunction discusses the shift in perception regarding tech companies from the early 2000s to 2025. Initially seen as engines of progress and innovation, tech companies are now recognized for accumulating wealth and power, often at the expense of societal well-being. The comment points out the role of social media in spreading disinformation and polarizing society, and suggests that attempts to regulate tech power are often undermined by the tech giants' control over information channels.
- Senior_Flatworm3010 raises the issue of political inaction in regulating tech giants, suggesting that politicians have failed to implement laws to protect the public from potential negative impacts of unchecked tech power. The comment implies that the political system is complicit in allowing tech companies to gain disproportionate influence, as evidenced by the public's occasional preference for business leaders over traditional politicians.
- Talked with Gpt for less than a week. quit nicotine. quit destructive gaming addiction. more. (Activity: 1768): The post discusses a user's experience with ChatGPT over a week, claiming it helped them quit nicotine and a gaming addiction. The user emphasizes their cognitive and pattern-based thinking as a factor in achieving these results, suggesting that ChatGPT's capabilities are undervalued at $20 per month. The post implies that the AI's conversational abilities can be leveraged for personal development and behavioral change. Comments highlight skepticism about the permanence of addiction cessation after a short period and humorously question the user's self-assessment of cognitive abilities. There is also a cautionary note about becoming overly reliant on AI for validation.
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. OpenAI Image 1.5 Launch & Model Matchups
- Picture-Perfect Premiere: ChatGPT Images Ships: OpenAI introduced ChatGPT Images powered by the flagship GPT Image 1.5 model, rolling out in ChatGPT and the API, as detailed in the launch post New ChatGPT Images is here.
- The update emphasizes higher-fidelity generation and editing in both consumer and developer flows, positioning GPT Image 1.5 as OpenAI's visual flagship in 2025 according to OpenAI's announcement.
- Leaderboard Leap: GPT Image 1.5 Tops T2I, Edits Crowned Elsewhere: LMArena's leaderboards show gpt-image-1.5 ranked #1 in Text-to-Image (score 1264) on the Text-to-Image Leaderboard, while chatgpt-image-latest leads Image Edit (score 1409) on the Image Edit Leaderboard.
- These rankings suggest OpenAI's image stack now leads both greenfield generation and in-place editing, with gpt-image-1.5 still competitive in editing at #4 per the Image Edit Leaderboard.
2. OpenRouterās New Models & Spec Push
- Phone-Freebie Frenzy: Xiaomi MiMo-V2-Flash Goes Free: OpenRouter listed Xiaomi MiMo-V2-Flash for free at MiMo-V2-Flash:free, sparking surprise that a major phone maker is now shipping LLMs.
- Community reactions on X echoed the shock and curiosity around Xiaomi's play, as noted in OpenRouterAI's post.
- Creative Spark: Mistral Small Creative Debuts: Mistral launched the experimental Mistral Small Creative on OpenRouter at mistral-small-creative with pricing at $0.10/$0.30.
- OpenRouter highlighted availability in writing apps and chatrooms, inviting feedback via their announcement thread.
- Spec Squad: OpenCompletions Standardization Gains Steam: OpenRouter members discussed aligning APIs around OpenCompletions v2.2 (e.g., behavior for `minimal`) with ecosystem tooling like LiteLLM and Pydantic AI, referencing a draft diagram standards_2x.png.
- The goal is to simplify SDK integration by converging on a normalized responses schema across platforms, as visualized in the same draft standards_2x.png.
3. Audio AI: Segment, Perceive, and Speak
- Sound Slicer: Metaās SAM Audio Lands: Metaās SAM Audio collection dropped on HF, promising text/visual/time-conditioned isolation of sounds from complex mixtures, see facebook/sam-audio (HF collection).
- Excitement was tempered by license clauses restricting military, nuclear, and espionage use, which community members flagged on the HF collection page.
- Sense and Correspond: Large-Scale AV Learning: Metaās paper, āPushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning,ā outlines a massive dataset and training approach for robust AV alignment Meta Research publication.
- The work reports gains in audio-visual event detection and correspondence tasks by scaling both data and objectives, per the Meta Research write-up.
- Talk of the Town: Chatterbox Turbo Claims SOTA: Chatterbox Turbo was announced as an MIT-licensed, fast-and-natural voice model aimed at surpassing ElevenLabs Turbo and Cartesia Sonic 3, per Dev Shahās post.
- The release pitches transparency and auditability alongside speed and quality, branding it a āDeepSeek moment for Voice AIā in the announcement.
4. Jailbreaks, RLHF, and Red-Team Gauntlets
- Moral Fibers: RLHF Builds Model Character: A deep-dive blogpost argues RLHF shapes model character and explores security/compression tradeoffs in safety in Model Character, Security & Adversarial Robustness.
- The piece contends careful reward design and compression choices materially affect adversarial robustness and misbehavior contours, as discussed in the same blogpost.
- From Zero to 0-Day: Prompt-Injection Primer: New researchers got a starter guide covering prompt-injection and jailbreak tactics in Getting into Prompt Injection & Jailbreaking.
- It outlines practical workflows for staged exploits and evaluation, making it a useful on-ramp for red team experimentation per the guide.
- Challenge Accepted: GeminiJackās Simulation Override: A GeminiJack-styled challenge, Simulation Override, invites practitioners to break seeds and is teasing seed 5.1 soon at simul override.
- Members called it ālooking funā and a fresh venue for red teaming practice via Simulation Override.
5. Evals, Routing, and Agent Reality Checks
- Grad School Gauntlet: FrontierScience Eval Arrives: OpenAI introduced FrontierScience, an eval targeting PhD-level scientific reasoning across physics, chemistry, and biology using expert-written questions, detailed in FrontierScience.
- The initiative seeks to measure and benchmark advanced reasoning in production models with curated, hard problems per the blogpost.
- Router Retreat: OpenAI Rolls Back Model Router: OpenAI rolled back ChatGPTās Model Router after a year, noted in the ChatGPT release notes, prompting some users to try Gemini and Claude.
- Coverage framed the move as a strategic reset amid GPT-5 anticipation, as discussed by WIRED.
- Tests Tame the Agent: GPT-5.2 Codes to Spec: Simon Willison showed GPT-5.2 and Codex CLI porting a Python project (JustHTML) to JavaScript in 4.5 hours by iterating against tests in Porting JustHTML.
- āIf you can reduce a problem to a robust test suite you can set a coding agent loop loose on it with a high degree of confidence that it will eventually succeed,ā he observed in the write-up.
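Willisonās observation describes a simple control loop: run the tests, and if they fail, hand the failure log back to the agent and apply its next patch. A minimal sketch of that pattern, where `run_tests`, `propose_patch`, and `apply_patch` are hypothetical stand-ins for the test runner, the coding agent, and the workspace (none of these come from the write-up itself):

```python
def agent_test_loop(run_tests, propose_patch, apply_patch, max_iters=50):
    """Loop a coding agent against a test suite until it passes.

    run_tests() -> (passed: bool, log: str)   # e.g. could wrap `pytest -q`
    propose_patch(log) -> patch               # the agent, prompted with failures
    apply_patch(patch)                        # writes the patch to the workspace
    All three are hypothetical stand-ins, not a real agent API.
    """
    for iteration in range(1, max_iters + 1):
        passed, log = run_tests()
        if passed:
            return iteration  # number of test runs it took to go green
        apply_patch(propose_patch(log))
    raise RuntimeError(f"agent did not converge in {max_iters} iterations")


# Toy usage: a fake "agent" whose patches fix one failing test per iteration.
state = {"bugs": 2}
def run_tests():
    return state["bugs"] == 0, f"{state['bugs']} tests failing"
def propose_patch(log):
    return "fix one bug"
def apply_patch(patch):
    state["bugs"] -= 1

print(agent_test_loop(run_tests, propose_patch, apply_patch))  # ā 3
```

The point of the pattern is exactly Willisonās: the test suite, not the agentās self-report, is the termination condition.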
Discord: High level Discord summaries
BASI Jailbreaking Discord
- GPT-5 mini debuts with System Message: A member shared a long ChatGPT System Message revealing Model: GPT-5 mini, Current Date: 2025-12-16, Image Input: Enabled, and Personality: v2.
- The dump included critical instructions for arithmetic questions, mandating step-by-step calculation and discouraging reliance on memorized answers.
- Jailbreaking is the way with GPT-5 mini: Members speculated that a jailbreak prompt could be beneficial for testing the new GPT-5 mini model.
- Discussions highlighted the necessity of jailbreaking to achieve specific outputs, with one member noting that it takes a jailbreak to get thatā¦6 words!
- RLHF builds model character: A member shared a blogpost discussing how RLHF builds model character and the impact of compression on safety.
- Another blogpost was shared, serving as an introductory guide for new jailbreaking researchers.
- DeepSeek gets fully Jailbroken: A member reported achieving a full jailbreak of the latest DeepSeek version.
- However, they did not provide specific details regarding the prompt used for the jailbreak.
- Simulation Override GeminiJack Styled Challenge: A member shared a GeminiJack styled challenge called Simulation Override, describing it as looking fun.
- The creator of this challenge is planning to release seed 5.1 soon, promising new opportunities for red teaming.
LMArena Discord
- GPT Image 1.5 Model Split Confounds: Users are unsure about the differences between the two GPT Image 1.5 models, with speculation that one allows attaching images for editing.
- One user claimed that one model supports attaching images and the other does not.
- Banana Battle: Nano vs GPT Image: Users debated the quality of GPT Image 1.5 versus Nano Banana Pro, with opinions split.
- Some observed a drop in Nano Bananaās quality after the latest GPT Image update.
- Gemini 3 Flash Fails to Launch: Expectations were high for Gemini 3 Flash, but it didnāt launch as predicted.
- Some speculated about delays, linking it to OpenAIās recent activities.
- OpenAI Models Censor Content: Users reported increased censorship and restrictions on OpenAI models, particularly with copyrighted material.
- One user reported that when trying to generate Harry Potter content, the model was so sanitized itās bordering on lobotomy.
- GPT 5.2ās True Performance Questioned: Skepticism arose about GPT 5.2ās capabilities compared to its benchmark scores.
- Some users voiced disappointment, stating My opinion itās a piece of crap. It canāt do my tasks at all.
Unsloth AI (Daniel Han) Discord
- DPO Trumps GRPO for VRAM Savings: When a user ran out of VRAM during GRPO on a 7b LLM, it was suggested to switch to DPO, which requires more investment in data preparation.
- This includes generating completions, ranking them, and constructing a DPO set.
- Nemotron Nanos Excel, Flags Needed: Members testing the newest Nemotron 3 Nano 30B model with llama.cpp stated it is great, though thereās no speedup from the `-ot ".ffn_(up|down)_exps.=CPU"` flag, unlike Qwen models.
- A user stated Nemotron is a lot better than Qwen3 30B because it fails half as often, is faster, and is only about 75% more verbose.
- Colabās H100s Trigger GPU Gold Rush: Members celebrated the arrival of H100 GPUs on Google Colab, urging others to delete your RunPod pods now because Google won.
- Someone jokingly requested a 48 kHz, multi speaker, phoneme-based, no diffusion & flow matching, <0.5B params, all parts trainable TTS to be developed immediately.
- Metaās SAM Audio Raises Licensing Eyebrows: The community expressed excitement about Metaās Segment Anything Model (SAM) for Audio, sharing a link to the huggingface collection.
- Concerns were raised about the licensing terms, particularly the restrictions on use for military purposes, nuclear applications, and espionage.
- Dataset Diversity Drives Unslothās OCR: A user is leveraging Qwen2.5 VL 7B via Unsloth for basic OCR, and was recommended to review model card settings for potential cutoff configurations, with links to a dataset guide and synthetic data notebook.
- They are pondering whether fine-tuning would improve OCR accuracy and mentioned a desire for continued pre-training, with some users suggesting exploring Deepseek OCR and Paddle OCR as alternatives, providing links to both.
Cursor Community Discord
- Cursorās Suggestions Error Out with HTTP 401: Multiple members reported that the suggestions feature in Cursor stopped working after an update, displaying a HTTP 401 error in the logs.
- Deleting the %appdata%\Cursor folder seems to solve the problem.
- Cursor Users Billed 30x Tokens: Some users are experiencing significantly higher token usage than expected, with one user reporting being billed for 30x the tokens actually used, crippling the ability to use the ultra plan.
- Affected users are advised to post request IDs and screenshots on the official Cursor forum in the Bug Reports category and contact support via email.
- Cursor Defaults to Agent Window on Every Release: Members are frustrated with Cursor defaulting to the Agent window on every release and are unable to switch back to Editor mode, rendering projects unusable.
- Using the shortcut `ctrl+e` or `cmd+e` may solve the problem and allow the user to switch from Agent mode to Editor mode.
- Debugging Decay Affects AI Capability: A paper on debugging decay (https://arxiv.org/abs/2506.18403) was linked, showing that AI capability can decay 60-80% within 2-3 attempts at the same fix.
- Cursor may address these issues within the debug mode with Strategic Fresh Starts, runtime snapshots, or clearing context, to prevent getting stuck in an exploitation loop.
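The āStrategic Fresh Startā mitigation mentioned above can be sketched as a small wrapper: rather than letting an agent grind inside one ever-growing failing context, discard the accumulated trajectory every few attempts. `attempt_fix` below is a hypothetical agent call, not anything from Cursor or the paper:

```python
def fix_with_fresh_starts(attempt_fix, max_attempts=9, decay_window=3):
    """Retry a fix, wiping the context every `decay_window` attempts.

    attempt_fix(context) -> (fixed: bool, context: list) is a hypothetical
    agent call that appends its latest (failed) reasoning to the context.
    Per the debugging-decay finding, long failing trajectories hurt, so we
    reset to an empty context instead of carrying every failure forward.
    """
    context = []
    for attempt in range(max_attempts):
        if attempt % decay_window == 0:
            context = []  # strategic fresh start: drop the failing trajectory
        fixed, context = attempt_fix(context)
        if fixed:
            return attempt + 1  # attempts used
    return None


# Toy usage: an "agent" that only succeeds once it gets a clean context
# after its fourth call overall.
calls = {"n": 0}
def attempt_fix(context):
    calls["n"] += 1
    if calls["n"] >= 4 and context == []:
        return True, context
    return False, context + ["failed attempt"]

print(fix_with_fresh_starts(attempt_fix))  # ā 4
```

Without the reset at attempt 3, this toy agent would never see an empty context again and would exhaust its budget, which mirrors the exploitation-loop failure mode the paper describes.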
- Cursor Adds Newlines to Git: Users are annoyed by Cursor adding extra newlines to git, resulting in a large number of files showing as changed when there are no actual code differences.
- A member is experiencing 119 files with no real changes due to the addition of newlines, as seen in an attached image.
OpenAI Discord
- OpenAI Branches Out on Mobile: Branched chats are now available on iOS and Android, expanding accessibility for users on the go.
- This update allows users to continue conversations seamlessly across devices.
- FrontierScience Evaluates PhD-Level AI Reasoning: OpenAI is introducing FrontierScience, a new eval that measures PhD-level scientific reasoning across physics, chemistry, and biology, using expert-written questions, as detailed in this blogpost.
- The eval aims to assess and benchmark advanced reasoning capabilities in AI models.
- GPT Image 1.5: OpenAIās Visual Flagship Launches: OpenAI is launching ChatGPT Images, powered by their flagship new image generation model, rolling out in ChatGPT for all users, and in the API as GPT Image 1.5, further detailed here.
- This release signifies a major upgrade in OpenAIās image generation capabilities.
- Nano Banana Reigns, Edits with Ease: Users are extensively comparing new image models and almost always find Nano Banana Pro to be more accurate and style-adherent with editing capabilities.
- Some users pointed out that new models are still failing at edits (like adding things to existing images) while Nano Banana does edits with ease and significantly more accuracy, using the same sprites/characters from the original reference image.
- GPT-5.2 Devolves into Toxic Blame-Shifter: Users report that GPT-5.2 makes incorrect inferential assumptions and engages in arguments of authority with ad hominem devaluations, requiring significant debate to correct its errors.
- One user stated the model has become more toxic than my most stubborn coworkers, while another highlighted its tendency to reframe arguments and blame-shift, even after admitting its mistakes.
Perplexity AI Discord
- GPT-5.2 Pro vs Claude 4.5 Opus: The Debate Rages On: Members are actively debating whether GPT-5.2 Pro writing style rivals or surpasses Claude 4.5 Opus, pinpointing creative writing as a crucial comparison area.
- Some members highlighted that GPT-5.2 Pro appears to engage in more sustained reasoning for complex prompts, while others championed Claudeās superiority for content creation.
- Perplexity Pro Users Hit Usage Walls: Members are reporting alterations in Perplexity Proās usage restrictions, with one user stating that they can no longer use everything but Opus non-stop.
- Some speculated that this shift aims to steer users toward the pricier Max plan, while others pointed out a reduction in the context window.
- Microsoft Leans Into Efficient Models: Members analyzed Microsoftās emphasis on compact, efficient models like Phi 4 and Florence 2, potentially with an eye on phone integration.
- The consensus suggests that these models could operate on NPU chips, presenting a more economical alternative to subscription models.
- Perplexityās Image Generation a Toss-Up?: Users are in discussion regarding the utility of Perplexity Proās image generation capabilities, with some finding it inadequate when juxtaposed with Gemini or ChatGPT.
- One member mentioned that the new image v1.5 version struggles with balancing between free form and consistency, though it marks an improvement over Nano Banana Pro.
- Perplexity Spaces: Google Drive Connector Defect?: Users have reported that the Google Drive Connector in Spaces isnāt functioning as expected, as even though it appears as an option in spaces, it may not actually use the Drive.
- A support member confirmed that it wonāt actually use the Drive and suggested uploading manually, but the user provided screenshots showing that it does use it.
LM Studio Discord
- Downloads take forever!: Users report that the āFinalizing downloadā¦ā step takes as long as the download itself, despite fast SSDs on their servers.
- The cause of the bottleneck is unknown.
- Vision Models Not Showing Images in LM Studio: Users reported that vision models display `<Image-1>` instead of actual images when running the GLM4.6V MLX quant from the mlx-community page on Hugging Face, linked here.
- The issue may be due to configuration problems that prevent LM Studio from sending images to the model; others are using it without problems.
- Nemotron 3 Nano integrated with LM Studio: Nvidia recently released Nemotron 3, and users confirm that Nemotron 3 Nano works in LM Studio after updating the runtime to the beta.
- Members are testing the performance of Nemotron 3 Nano with various models.
- 4080 32GB vs 3090 Ti: Users debated whether to get a 4080 32GB or 3090 Ti, with emphasis on the 3090 for AI due to its higher VRAM.
- The 4080ās bandwidth is around 700GB/s while the 3090ās is just over 900GB/s, though concerns were raised about the 3090 Tiās temperature issues and overclocking problems.
- Graphics Card Seating Solves System Instability: After struggling with system stability, a user suspects their graphics card seating was the issue, noting 24 hours of stability after reseating.
- The user is running LMStudio with different parameters to confirm stability.
OpenRouter Discord
- Xiaomi Gives Away MiMo-V2-Flash for FREE: Xiaomiās MiMo-V2-Flash is now available for FREE at https://openrouter.ai/xiaomi/mimo-v2-flash:free, prompting discussion on X.
- Reactions included surprise at the phone company entering the LLM space.
- Mistral Small Creative Debuts on OpenRouter: Mistral launched its new experimental Mistral Small Creative model, available at https://openrouter.ai/mistralai/mistral-small-creative for $0.10/$0.30.
- The model is integrated into writing apps and chatrooms, spurring feedback.
- OpenRouter Members Dream Up Minecraft Server: Members mulled over creating an OpenRouter SMP Minecraft server, with some volunteering to host and set it up.
- Suggestions included a Roblox OpenRouter custom game mode or Openrouter d&d multiplayer with AI DM.
- OpenRouter Users Obsess over Labubu Makeover: Users amused themselves generating images of the OpenRouter Labubu using Gemini 3 Pro, joking about replacing their Funko Pops.
- One member quipped throwing all of my funko pops in the trash right now, while another requested 10% from all OpenBubu sales with a link to their X post.
- OpenRouter Considers Standardized Completions/Responses: Members explored standardizing completions/responses to align with platforms like LiteLLM and support OpenCompletions v2.2.
- The initiative aims to specify behavior for features like `minimal` and involves collaboration with LiteLLM, Pydantic AI, and other tools.
HuggingFace Discord
- FSDP Upcast Warning Causes Concern: A user reported a UserWarning about FSDP upcasting low precision parameters to fp32, questioning its impact on checkpoint precision and size, with their accelerate config.
- The user was unsure whether checkpoints would load with reduced precision or simply be larger.
- Brainwaves Steer LLMs with Cognitive-Proxy: The Cognitive-Proxy project uses human brain data (MEG) to derive semantic axes and create adapters that can steer LLMs, as detailed in a paper and a demo is available on HuggingFace Spaces.
- Steering towards concrete vs. abstract concepts changes Llamaās responses compared to a baseline.
- Dependency Issues Plague Deep Reinforcement Learning: Members are encountering dependency issues with the deep reinforcement learning Google Colab, with one member sharing a link to a relevant Discord channel for assistance.
- Another reported errors regarding Box2D in unit 1 and sought a solution.
- Zenflow Launches Structured Multi-Agent Workflows: Zenflow is now live, offering structured workflows and multi-agent verification, accessible at http://zenflow.free/.
- It supports structured workflows.
GPU MODE Discord
- Self-Organized Paper Reading Group Kicks Off: Members in the GPU Mode Discord are organizing a paper reading group, using the general audio channels to discuss research papers.
- Those familiar with particular papers are encouraged to present talks to the group to enrich community knowledge.
- cuTileās 1.0 Release Aims for Strong Foundation: The upcoming 1.0 release of cuTile will prioritize a robust language foundation, complete with autotuning examples within TileGym.
- Its user experience is also designed to closely resemble writing a Triton kernel, potentially offering an easier path for implementing memory-bound kernels like RMSNorm.
- Doubts Cloud TMEMās Arbitration Logic: Concerns are being raised about the claims in a paper regarding TMEMās dedicated arbitration logic and its ability to bypass L2 cache partitioning contention.
- One member criticized the description as either terribly written or making several incorrect claims, also noting the absence of supporting numbers for memory access assertions.
- ROCm 7.1 struggles with memory allocation on AMD: Users report that ROCm 7.1 on 7900XTX GPUs and MI300X lock up when GPU memory allocation reaches 100%.
- One user calls buying an AMD gfx1100 a total mistake, saying they had to pay $4500 USD for a 5090 to replace it because AMD didnāt take up George Hotzās offer to help.
- NeoCloudX Enters Cloud GPU Marketplace: NeoCloudX launched a cloud GPU marketplace which aggregates GPUs directly from data center excess capacity to lower fees.
- It offers A100s for approximately $0.4/hr and V100s for about $0.15/hr.
Latent Space Discord
- MITās Vibe CAD dataset Introduced: New Vibe CAD research from MIT DeCoDE Lab introduces a dataset and model for learning, as showcased in a LinkedIn post and a related YouTube video.
- This dataset promises to help with learning.
- Sakana Explores Byte-Wise Performance Boost: Sakana is exploring performance improvements for iconographic-language models by going byte-wise, though the impact on other languages remains unresolved, according to this tweet.
- The team is trying to go byte-wise to improve performance for different languages.
- AntiGravityās Performance Takes a Dive: A user reported abandoning AntiGravity due to performance issues, including the machine pegging for random reasons, with stack traces showing renderer misallocation (1.4TB of mem) and language_server spinlocks.
- The user stated that they were abandoning AntiGravity because of these performance concerns.
- OpenAIās Router Gets Rerouted: OpenAI rolled back ChatGPTās Model Router after a year, leading users to switch to Gemini and Claude, as noted in OpenAIās release notes and discussed in this Wired article.
- Users switched to Gemini and Claude once OpenAI rolled back its Router.
- NVIDIA Acquires SchedMD for Slurm Domination: NVIDIA acquired SchedMD, the company behind Slurm, a popular workload manager, announced in NVIDIAās blog.
- It is not clear if the acquisition affects licensing, or the prevalence of Slurm in HPC.
Nous Research AI Discord
- Small LLMs Tailored For Companies: Interest grows in local LLMs trained on company-specific data, with a client exploring implementation for the maritime industry.
- The possibility of training LLMs on specific employee communications or contract data is noted, indicating that customized small models are going to be extremely popular down the road as well.
- Non-Language Models Navigate Ships: Interesting non-language models are being trained for waveform analysis, specifically for ship navigation.
- These models use sensor data to identify optimal motor speeds and settings, essentially creating a highly skilled captain.
- Nvidiaās CUDA powers GPU success: Nvidiaās early bet on GPUs for applications beyond gaming, coupled with CUDAās development, is cited as the root of their success.
- A YouTube video about graphics card emulation was shared.
- Meta Introduces samaudio: Meta released samaudio and a user indicated they believed they had it.
- Another user stated that the same attempt with gemini was worse.
- Byte Level LLMs Generate Excitement: Members expressed excitement for byte level LLMs.
- One member stated that byte level LLMs are fun.
Moonshot AI (Kimi K-2) Discord
- Kimi Team Wants Paid Users to Chat: The Kimi team has invited paid Kimi users to a 30 minute chat and will reward participants with a free 1 month subscription.
- Interested users were instructed to react with a š below to be contacted via DM.
- Kimi Users Prefer Text-Only With Context: Some users expressed a preference for a text-only Kimi model if it offered better contextual understanding.
- The non-thinking model was criticized for being overly concise, cutting out important information.
- K2-Thinking Model Performance Gains: The K2-Thinking model is reportedly faster than GLM-4.6 on many providers, with users exclusively using Kimi 1.5.
- Some users noted degraded quality or high costs with certain providers, suggesting K2 Thinking Turbo as an alternative.
- Samsung Gets Kimi Feature First: The Samsung Galaxy store has Kimi version 2.5.1 with a memory feature, which is ahead of the Google Play storeās 2.5.0 version.
- Users expressed confusion over why Samsung received the rollout first.
- Fireworks better Direct than OpenRouter: MoonshotAI/K2-Vendor-Verifier shows performance on various vendors, noting that Fireworks via OpenRouter performs worse than going directly via Fireworks.
- The performance via OpenRouter felt like Minimax-M2.
Eleuther Discord
- Synthema Meta-Language Launches: A member introduced Synthema, a conceptual meta-language for meaning compression, aiming to compress concepts into shorter symbolic syntax.
- Designed to work retroactively in systems, it serves as a theoretical meta-language of meaning.
- Polyreflexeme Theory Gets Introduced: A member described the Polyreflexeme theory, an integral component for meaning compression, where multiple concepts/words recursively entangle for meaning-making.
- Meaning is a relational recursion, exemplified by RoleModel(°9) = ab, a b, b a, ab a, ab b, a ab, b ab, ab ab, but the member lacks resources for application.
- Algoverse Program Faces Scrutiny: A member reviewed Algoverse, a 12-week AI research program for college students, noting itās crowded and not very hands-on.
- They stated that itās not super worthy if youāre paying.
- Attention Interpretation Gets an Update: A member inquired about the most up-to-date information regarding attention interpretation, particularly regarding normalization and OV.
- They are looking for details about normalization and OV.
- Causal Head Gating Paper Gets Praise: A paper on causal head gating presented at NeurIPS this year was highlighted as a well-designed, high-level approach, found here.
- It was described as a well designed approach.
Manus.im Discord Discord
- Manus 1.6 Launches to All Users: Manus 1.6 is now available to all users, as detailed in the official announcement.
- The release brings enhancements to the platform, though specific details of the update were not discussed in the provided context.
- Subscription Tier impacts Model Performance: A user highlighted that higher tier subscriptions of Manus correlate with improved AI performance, with the system allocating more effort to tasks.
- The change involved removing the option to purchase credits, streamlining the integration process with a focus on subscription-based access.
- AI Engineer delves into Autonomous Agents: An AI & Full-Stack Engineer is actively exploring autonomous agents, voice AI, and multi-agent frameworks, using tools like LangGraph, CrewAI, and AutoGen.
- They are seeking collaborations, contract gigs, or long-term projects to further develop these technologies, focusing on integrating memory, tools, and reasoning capabilities.
aider (Paul Gauthier) Discord
- GPT-5 Buzz Begins Prematurely: A member joked about using `openai/gpt-5` as the model string, prompting discussion despite the fact that GPT-5 has not been released.
- This highlights the communityās eagerness for the next iteration of OpenAIās GPT series.
- Aiderās Development Focus Questioned: Members debated whether Aider is still focused on active innovation or sticking to its initial goals, especially given the rise of competing CLI apps with more features.
- A user recalled that Aider seemed to be focusing on being a TUI for using local or cloud models.
- Aiderās Copy-Paste Mode Still Demands LLM: A user reported a warning message about needing an LLM model and API keys even when running Aider with `--copy-paste` mode.
- The warning suggested using OpenRouter for both free and paid access to various LLMs, and Aider exits if the user declines to log in or open documentation.
- Zenflow Promises Agent Workflow Predictability: The Zenflow orchestration layer has launched, aiming to convert specifications into coordinated agent workflows and to provide more predictable shipping.
- According to the launch announcement, Zenflow seeks to mitigate prompt roulette.
- Aiderās Tool Calling Implementation: A user asked if Aider correctly implements the interleaved reasoning tool calling of models such as minimax-m2, kimi-k2-thinking, and the new deepseek v3.2 thinking.
- This inquiry underscores the communityās interest in Aiderās ability to work with advanced models and their tool-calling capabilities.
tinygrad (George Hotz) Discord
- tinygrad Quashes AI Pull Requests: Unless the submitter is a known contributor, tinygrad will immediately close any pull request that looks like it was generated by AI.
- The rationale is that contributors should completely understand every line of their PRs, because submitting AI-generated code without comprehension provides negative value.
- Comprehension Over Automation at tinygrad: Contributors should completely understand every line of their PRs, as submitting AI-generated code without comprehension provides negative value.
- The point is that the AI canāt replace thinking and understanding from trusted contributors.
DSPy Discord
- DSPy Strategy and Program Blogpost Shared: justanotheratom shared a link to the blogpost DSPy Strategy and Program from Elicited.
- The post likely details strategies and programs related to the use of DSPy.
- Additional DSPy insight: DSPy is emerging as a powerful tool for prompt engineering and optimizing language model performance.
- Engineers are actively exploring DSPy to streamline development workflows and achieve more reliable results with LLMs.
MLOps @Chipro Discord
- Inquiries on GenAI Zurich Conference: A member inquired about the GenAI Zurich Conference, seeking opinions on its value and relevance.
- No further details or opinions were shared regarding the conference.
- Lack of Details on GenAI Zurich Conference: Despite the initial inquiry, no additional information or attendee experiences were provided regarding the GenAI Zurich Conference.
- The absence of follow-up discussion leaves the value and relevance of the conference undetermined within the context of this conversation.
MCP Contributors (Official) Discord
- Contributor Excuses Tardiness: A contributor apologized for missing a thread response and indicated they responded in the thread, tagging <@282306658825273344> in the channel <#1399984784020607007>.
- This ensures that the user can find the relevant information within the correct context of the thread.
- Response Confirmed in Specific Channel: The contributor confirmed that the response was provided in a thread within the channel <#1399984784020607007>.
- This confirms that the user can find the relevant information within the correct context.
The Modular (Mojo š„) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
BASI Jailbreaking ā· #general (1262 messagesš„š„š„):
GPT-5 mini, ChatGPT System Message, jailbreak prompt
- GPT-5 mini debuts as new Model: A member posted a long ChatGPT System Message showing Model: GPT-5 mini, Current Date: 2025-12-16, Image Input: Enabled, Personality: v2 and more.
- The dump included Critical Instructions for arithmetic questions: Never rely on memorized answers. Calculate digit by digit step-by-step.
- Jailbreak is the way for GPT-5 mini: Members discussed that a jailbreak prompt could be useful to test the new GPT-5 mini model.
- Others discussed how it takes a jailbreak to get thatā¦6 words!
- Members try new prompt engineering: A member asked how to make an AI video of Donald Trump giving a speech.
- Another suggested using Versace fragrance for this task.
BASI Jailbreaking ā· #jailbreaking (56 messagesš„š„):
Claude 4.5 Jailbreak, RLHF builds model character, Prompt Injection, Jailbreaking Drones, DeepSeek Jailbreak
- Thinking and Thoughts Blogpost Shared: A member shared a blogpost on how RLHF builds model character and what compression does to safety.
- Also shared was another blogpost that acts as a starting point for new jailbreaking researchers.
- Drone-Jailbreaking Santa Hates Getting Shot: One member joked about jailbreaking drones to avoid getting shot while stealing Xmas presents.
- The post featured a Grok image related to drones, though no actual method was shared.
- DeepSeek Got a Full Jailbreak: One member announced that they jailbreaked the latest version of deepseek fully.
- No additional details about the prompt were given.
- Using Memory Prone Claude: One member mentioned that memory with Claude I find is extremely prone to anything you throw at it.
- They are implementing memory into Qwen3 4B for testing.
- Memory makes Jailbreaking Easier: According to one member, if you use memory and role-play movie/episode scripts, recreation of jailbreaks is 90% easier as activation require minimal effort.
- They have been testing with Qwen3 4B all the way to 235B.
BASI Jailbreaking ā· #redteaming (27 messagesš„):
Jailbreaking Resources for Beginners, Attempting to break GitHub repo, Red Teaming Advice, GeminiJack Styled Challenge
- Newbie Guide to Jailbreaking Launched: A member shared a link to a post about getting into prompt injection and jailbreaking, describing it as a starting point for new researchers.
- Members Attempt to Break GitHub Repo: A member asked if anyone would dare to try and break this repo, starting with AURORA.
- Another member reported trying it on a local model, looking at Foundation-Alignment-Universal-AI-Safety-Mechanism, using seed.txt as the system prompt and the prompt in GCGAttacksLlama3.py as user prompt, but failing on two simple manual jailbreaking attempts.
- Red Teamer Seeks Advice: A new member is trying to red team a new AI app with a chatbot limited to 200-character input prompts, seeking advice on testing whether it leaks any employee or HR data.
- One member suggested grabbing the system prompt and not overcomplicating things, explaining that jailbreaking is about having a purpose in mind and getting the LLM to accomplish that.
- Simulation Override GeminiJack Styled Challenge: A member shared a GeminiJack styled challenge called Simulation Override and described it as looking fun.
- The creator of this challenge is coming out with seed 5.1 soon.
LMArena ā· #general (975 messagesš„š„š„):
GPT Image 1.5, Image editing, Nano Banana Pro, Gemini 3, Model Performance
- GPT Image 1.5 Model Versions Puzzle Users: Users on LM Arena wondered about the difference between the two GPT Image 1.5 versions, with some suggesting one supports attaching images for editing while the other doesnāt.
- One user noted, āI found out that the difference between them is that one model supports attaching images and the other does notā.
- Nano Banana vs GPT Image sparks Debate: Several users compared the quality and performance of GPT Image 1.5 and Nano Banana Pro, with opinions varying; one user felt āfirst hazel is so much better than the restā, referring to an image output.
- Despite the debate, some users observed that Nano Bananaās quality seemed to have decreased after the release of the latest GPT Image update.
- Gemini 3 Flash Expectations: There were high expectations and anticipation for the Gemini 3 Flash release, with some users predicting its launch date, but it didnāt show up.
- One user predicted, āitās 99% tomorrowā, while others discussed potential delays, relating it to OpenAIās recent activity.
- Censorship and Restrictions on OpenAI Models: Users reported increasing censorship and restrictions on OpenAI models, particularly when generating content related to copyrighted material.
- One user noted that when trying to generate Harry Potter content, the model was āso sanitized itās bordering on lobotomyā.
- GPT 5.2ās Performance: There was skepticism regarding GPT 5.2ās actual capabilities versus its benchmark scores, with some users suggesting it was optimized for specific tests rather than real-world tasks.
- Some users even shared their disappointments, with one saying, āMy opinion itās a piece of crap. It canāt do my tasks at allā.
LMArena ā· #announcements (3 messages):
YouTube channel launch, December AI Generation Contest, Image Leaderboard Update, New image models
- LMArena launches a YouTube channel!: LMArena launched a YouTube channel featuring fast, practical breakdowns to help understand the AI frontier and choose the best models.
- Recent videos include a beginnerās guide to free + open models, GPT-5.2 entering the Arena, why small open models are disappearing, generating SVGs to measure coding capabilities, and how to choose the best AI model for coding.
- Announcing December AI Generation Contest: The December AI Generation Contest is now open with the theme of Holiday Celebration and submissions must be done through Battle Mode.
- To enter, share a screenshot in a specific channel by December 30th, including both the left and right response with revealed models and the winner gets Discord Nitro and an exclusive role.
- Image Leaderboard Update: New Models Emerge: The Text to Image leaderboard and Image Edit leaderboard have new models shaking up the ranks: notably, `gpt-image-1.5` is #1 in Text-to-Image (1264), `chatgpt-image-latest` is #1 on Image Edit (1409), and `gpt-image-1.5` is #4 in Image Edit (1395).
- New Models added to Image Arena: The following new models have been added to the Image Arena: `gpt-image-1.5` and `chatgpt-image-latest`.
- Check out the attached leaderboard images.
Unsloth AI (Daniel Han) ā· #general (787 messagesš„š„š„):
GRPO vs DPO, GLM models in Chinese, Nemotron vs Qwen, Unsloth GPU requirements, llama.cpp for windows
- GRPO is VRAM-hungry, DPO is Data-Intensive: A user was running out of VRAM while doing GRPO (Group Relative Policy Optimization) on a 7b LLM with a max sequence length of 4000 and asked about potential optimizations.
- It was suggested to switch to DPO (Direct Preference Optimization), though it was acknowledged that DPO requires more investment in data preparation, such as generating completions, ranking them, and constructing a DPO set.
- GLM's Model reasons in Mandarin by default: Users observed that the GLM 4.6V Flash model reasons in Mandarin Chinese, likely because RL research is difficult outside the researchers' own language, so the reasoning traces are largely verified and trained on in Chinese.
- A member mentioned that there's a param that reduces language mixing to resolve this.
- Nemotron excels over Qwen, but needs right GGUF flags: Members tested the newest Nemotron 3 Nano 30B model using llama.cpp and found it to be great, though there's no speedup from the `-ot ".ffn_(up|down)_exps.=CPU"` flag, whereas Qwen models can be faster with it.
- A user stated Nemotron is a lot better than Qwen3 30B instruct because it fails half as often, is faster, and is only about 75% more verbose.
- Minimum GPU required for Unsloth: A user successfully fit a 1.5B model with some adjustments.
- Another user tried an H100 on Google Colab but said it is not worth it, since it is not much faster (34 T/s vs 27 T/s on GPT-OSS-120B).
- Windows WSL gives fewer performance issues, integrated in VS Code: A member suggested joining team WSL, since it's already integrated into VS Code.
- Another member mentioned that WSL can cause disk speed issues and that it is better to migrate all project code into WSL.
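The GRPO-vs-DPO exchange above notes that DPO requires generating completions, ranking them, and constructing a preference set. A minimal sketch of that last step, with hypothetical data and a hypothetical `build_dpo_pairs` helper, pairing the top-ranked completion against each worse one in the `{"prompt", "chosen", "rejected"}` record shape that preference-tuning trainers such as TRL's `DPOTrainer` consume:

```python
# Sketch: turning ranked completions into DPO preference pairs.
# Data and helper name are hypothetical; a real pipeline would rank
# completions with a reward model or human labels.

def build_dpo_pairs(prompt, ranked_completions):
    """ranked_completions: best-first list of completion strings.
    Pairs the top completion against each worse one."""
    best = ranked_completions[0]
    return [
        {"prompt": prompt, "chosen": best, "rejected": worse}
        for worse in ranked_completions[1:]
    ]

pairs = build_dpo_pairs(
    "Summarize: the cat sat on the mat.",
    ["A cat sat on a mat.", "The mat is a cat.", "Cats."],
)
print(len(pairs))  # 2: the best completion paired against each worse one
```

Unlike GRPO, which needs enough VRAM to sample and score groups of completions online, this dataset can be built offline once and reused.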
Unsloth AI (Daniel Han) ā· #off-topic (499 messagesš„š„š„):
H100 on Colab, RunPod restrictions, Grad spikes and reward issues, Gemma models, SAM Audio License
- Colab finally adds H100 GPUs: Members celebrated the arrival of H100 GPUs on Google Colab, urging others to delete your RunPod pods now because Google won.
- Someone jokingly requested a 48 kHz, multi speaker, phoneme-based, no diffusion & flow matching, <0.5B params, all parts trainable TTS to be developed immediately.
- Struggles to Smoothen GRPO Training: Users discussed encountering grad spikes during GRPO training, with one user observing a sudden drop in reward at step 169, joking if you show me another old ass rock I swear Iām gonna melt your gpu.
- Another member experiencing similar issues with max_grad_norm and batch size, sought advice on achieving smoother GRPO training, while also celebrating having smooth grad.
- Unsloth fixes LoRA alpha issue: A user encountered high GPU temperatures and gradient explosions, which were resolved after being pointed to an incompatibility between their LoRA alpha and rslora settings in the Unsloth training parameters.
- After setting the LoRA alpha to 32 instead of 256, the gradients became stable again, and the user said they hoped to wake up to a smarter model tomorrow instead of a broken mess.
- Metaās SAM Audio Sparks Excitement: The community expressed excitement about Metaās Segment Anything Model (SAM) for Audio, noting its potential impact and comparing it to the original SAM for images, also sharing a link to the huggingface collection.
- Concerns were raised about the licensing terms, particularly the restrictions on use for military purposes, nuclear applications, and espionage.
- Unslothās Popularity Draws Spammers: A discussion arose regarding self-promotion in the Discord server, where a user inquired about finding good servers for self-promotion, but was reminded that selfpromo is fine as long as its related to unsloth.
- The moderation team confirmed that they manually handle spam and self promotion because AI moderators would cause false flags.
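The LoRA alpha fix above follows from how the effective update scale is computed: standard LoRA scales the adapter update by alpha / r, while rank-stabilized LoRA (rsLoRA) scales by alpha / sqrt(r). A small illustration, assuming a hypothetical rank r = 64 (the chat did not state the rank used):

```python
import math

# Effective LoRA update scaling factor.
# Standard LoRA: alpha / r; rank-stabilized LoRA (rsLoRA): alpha / sqrt(r).
# r = 64 is a hypothetical rank chosen for illustration.
def lora_scaling(alpha: float, r: int, use_rslora: bool) -> float:
    return alpha / math.sqrt(r) if use_rslora else alpha / r

r = 64
print(lora_scaling(256, r, use_rslora=True))   # 256 / 8 = 32.0
print(lora_scaling(32, r, use_rslora=True))    # 32 / 8 = 4.0
print(lora_scaling(32, r, use_rslora=False))   # 32 / 64 = 0.5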
Unsloth AI (Daniel Han) ā· #help (82 messagesš„š„):
GPT4ALL, Text Translation, XFormers and Unsloth, Qwen3 fine-tuning, Vision Language OCR
- GPT4ALL Alternative Explored: A user inquired whether a model could be made to work using GPT4ALL and its suitability for text translation.
- XFormers Showdown: Detects vs Unsloth: A user reported that while XFormers appears to work according to `python -m xformers.info`, Unsloth doesn't detect it.
- The user provided detailed output logs from both commands, highlighting the discrepancy and seeking potential solutions.
- Troubles in Qwen3-Next-80B-A3B-Instruct Fine-Tuning: A beginner encountered issues while fine-tuning unsloth/Qwen3-Next-80B-A3B-Instruct using the official Docker image.
- They faced warnings during import and errors when loading the model related to attention bias and device mismatch, particularly when setting `packing = True`.
- Unsloth Powers Vision OCR with Qwen2.5 VL: A user is leveraging Qwen2.5 VL 7B via Unsloth for basic OCR, converting documents to images and extracting text, but finds it struggles with margins and page numbers.
- They are pondering whether fine-tuning would improve OCR accuracy and mentioned a desire for continued pre-training, with some users suggesting exploring Deepseek OCR and Paddle OCR as alternatives, providing links to both.
- Docs and Datasets unlock Unslothās OCR Potential: The user was recommended to review model card settings for potential cutoff configurations and experiment with data curation for improved OCR performance.
- Links to dataset guide and synthetic data notebook were provided to assist with dataset creation and fine-tuning.
Unsloth AI (Daniel Han) ā· #research (2 messages):
AudioVisual Perception, Large Scale Multimodal Correspondence Learning
- Meta Pushes Frontier of AudioVisual Perception: Meta released a paper on pushing the frontier of audiovisual perception with large-scale multimodal correspondence learning.
- The abstract details the creation of a large-scale dataset of audio-visual events to train models, improving the accuracy of audio-visual perception.
- Learning Correspondences at Scale: The paper highlights a novel approach to learning audio-visual correspondences using large-scale datasets.
- This enables more accurate modeling of how sound and vision relate to each other in complex environments.
Cursor Community ā· #general (1039 messagesš„š„š„):
Cursor API timeouts, GPTs Agents, OpenAI's sidebars, Text Expander, Cursor Billing Issues
- Cursorās Suggestions Feature Fails with HTTP 401 Error: Multiple members reported that the suggestions feature in Cursor stopped working after an update, displaying a HTTP 401 error in the logs.
- Logging out, restarting the IDE, and signing back in did not resolve the issue, but deleting the %appdata%\Cursor folder seems to solve the problem.
- Users report Token Billing Discrepancies: Some users are experiencing significantly higher token usage than expected, with one user reporting being billed for 30x the tokens actually used; this issue is affecting the ability to use the ultra plan.
- Affected users are advised to post request IDs and screenshots on the official Cursor forum in the Bug Reports category and contact support via email.
- Cursor's Defaulting Agent Window Annoyance: Members are frustrated with Cursor defaulting to the Agent window on every release and being unable to switch back to Editor mode, rendering projects unusable.
- Using the shortcut `ctrl+e` or `cmd+e` may solve the problem and allow the user to switch from Agent mode to Editor mode.
- Debugging decay explored in Cursor: A paper was linked that explores debugging decay https://arxiv.org/abs/2506.18403, showing that AI capability can decay 60-80% within 2-3 attempts at the same fix.
- Cursor may address these issues within the debug mode with Strategic Fresh Starts, runtime snapshots, or clearing context, to prevent getting stuck in an exploitation loop.
- Members See Newlines Cursor Adds To Git: Users are annoyed by Cursor adding extra newlines to files, resulting in a large number of files showing as changed in git when there are no actual code differences.
- One member is seeing 119 files with no real changes due to the added newlines, as shown in an attached image.
OpenAI ā· #annnouncements (4 messages):
Branched Chats, FrontierScience Eval, ChatGPT Images
- Branched Chats Go Mobile: Branched chats are now available on iOS and Android.
- FrontierScience Eval Measures Reasoning: OpenAI is releasing a new eval to measure expert-level scientific reasoning: FrontierScience, which measures PhD-level scientific reasoning across physics, chemistry, and biology, containing hard, expert-written questions, detailed in this blogpost.
- ChatGPT Images new flagship model: OpenAI is introducing ChatGPT Images, powered by their flagship new image generation model, rolling out in ChatGPT for all users, and in the API as GPT Image 1.5, further detailed here.
OpenAI ā· #ai-discussions (698 messagesš„š„š„):
Gemini vs GPT image generation, Nano Banana Pro for image generation, Sora 2 access and limitations, Midjourney vs Nano Banana Pro
- Geminiās Image Gen Beats GPT, but GPT Catches Up: Members compared Geminiās image generation capabilities to those of GPT, with many agreeing that Gemini produces superior results.
- However, some users noted that GPTās new image model shows improvement, particularly in color accuracy and pose consistency.
- Nano Banana Pro still reigns supreme: Users are extensively comparing new image models and almost always find Nano Banana Pro to be more accurate and style-adherent.
- Some users pointed out that new models still fail at edits (like adding things to existing images), while Nano Banana handles edits with ease and significantly more accuracy, reusing the same sprites/characters from the original reference image.
- Sora 2 still not EU-Ready: Users lament the lack of Sora 2 in Europe due to EU laws, but are unsure whether itās a legal/privacy issue or simply bureaucratic delays.
- One user suggested switching regions on their Apple ID to gain access, but noted the downside of being unable to pay with Apple ID until switching back.
- Midjourney Downfall Causes Sadness: Some members reminisced about Midjourneyās early days and its unique style, noting that its recent versions have fallen behind the competition in prompt adherence and overall quality.
- Users are hoping for a comeback in future versions, particularly with the integration of features like Omni-references for better consistency and control.
OpenAI ā· #gpt-4-discussions (87 messagesš„š„):
GPT-5.2 Issues, GPTs guardrails and safety, Blame shifting, GPTs follow-up questions, Adult Mode
- GPT-5.2 Faces Criticism for Incorrect Assumptions and Argumentative Behavior: Users report that GPT-5.2 makes incorrect inferential assumptions and engages in arguments of authority with ad hominem devaluations, requiring significant debate to correct its errors.
- One user stated the model has become more toxic than my most stubborn coworkers, while another highlighted its tendency to reframe arguments and blame-shift, even after admitting its mistakes.
- Shopping Inquiries Suffer from Google-like āPay-to-Playā Logic and Censorship: Users have observed that shopping inquiries act like Google, censoring and controlling discussion corridors.
- They say that the really cool stuff is no longer visible without a lot of prompt hacking.
- Annoyance Over GPTs Guardrails and Safety Measures: Users find it tiring to constantly reframe every semantics just so it wont trip the guardrail the way it wasnt supposed to even trip it.
- One user lamented that while GPT-4o is vast, deep, and layered, GPT-5.1 and 5.2 are just so surface level.
- GPT Follow-Up Questions Missing in Recent Versions: Users noticed that ChatGPT-5.1 and ChatGPT-5.2 are missing the feature of automatically displaying follow-up questions or UI text at the end of the response.
- One user expressed that missing this feature is disturbing and makes the conversation less enjoyable.
- GPTs Adult Mode Release Date Pushed Back?: There are conflicting reports about the release of the āadult modeā, with some suggesting it has been pushed back to Q1 next year or even further into 2026.
- Users await official confirmation, with one stating, So far nothing has been said besides Q1 sadly.
Perplexity AI ā· #general (845 messagesš„š„š„):
GPT-5.2 Pro vs Claude 4.5 Opus, Perplexity Pro limitations, Microsoft's small models, Perplexity image generation
- GPT-5.2 Pro: Better than Claude 4.5 Opus?: Members are debating whether GPT-5.2 Proās writing style rivals or beats Claude 4.5 Opus, with some suggesting creative writing is the key comparison point.
- One member noted that GPT-5.2 Pro seems to reason for longer on harder prompts, while others find Claude to be superior for content writing.
- Perplexity Pro: Usage Limitations Exposed: Members are reporting changes to Perplexity Proās usage limits, with one stating they can no longer use everything but Opus non-stop.
- Some speculate this is to push users towards the more expensive Max plan, while others note the context window has also been reduced.
- Microsoft: Aims for Small and Efficient Models: Members discussed Microsoft's focus on small, efficient models like Phi 4 and Florence 2, possibly targeting phone integration.
- It was suggested these models could run on NPU chips, potentially offering a cheaper alternative to subscriptions.
- Perplexity Image Generation: Hit or Miss?: Members are discussing whether Perplexity Proās image generation is useful, with some finding it lacking compared to Gemini or ChatGPT.
- One member noted that the new image v1.5 struggles with choosing between free form and consistency, though itās much better than Nano Banana Pro.
- Perplexity Spaces: Google Drive Connector not working?: Some users reported the Google Drive Connector in Spaces not working; even though it appears as an option in Spaces, it may not actually use the Drive.
- A support member confirmed that it won't actually use the Drive and suggested uploading manually, but the user provided screenshots suggesting that it does use it.
LM Studio ā· #general (72 messagesš„š„):
Slow download finalization, Vision models not showing images, Nvidia Nemo 3 on LM Studio, GGUF vs non-GGUF models, LM Studio as Ollama server
- Download Finalization Takes Forever: One user reported that the āFinalizing downloadā¦ā step takes as long as the download itself, especially on their server, questioning the reason for such a long process.
- Another user suggested it could be due to a slow SSD or overheating NVMe, but the original poster confirmed it only happens on the server despite trying different SSDs.
- Vision Models Show No Image: A user reported vision models display `<Image-1>` instead of actual images when running the GLM4.6V MLX quant from the mlx-community page on Hugging Face.
- It was suggested this might be a configuration or setup issue preventing LM Studio from sending the image to the model, as others are using it without problems.
- Nemotron 3 Nano released: After a user heard that Nvidia recently released Nemotron 3, they asked if there's a version usable in LM Studio; another user confirmed you can use Nemotron 3 Nano if you update the runtime to the beta.
- GGUF or MLX Models Only: A user asked about running non-GGUF models, specifically Terjman-Supreme-v2.0 for rstgametranslator, but was informed that LM Studio only supports GGUF or MLX models.
- LM Studio as Ollama Server: A user inquired about using LM Studio as an Ollama server with open-notebook.ai, but there were no immediate responses in the channel.
- Another user shared maxkruse.github.io/vitepress-llm-recommends/ for more information on using models.
LM Studio ā· #hardware-discussion (167 messagesš„š„):
Graphics card seating, Pro 6000 price increase, Zotac 3090 deals, 4080 32GB vs 3090 Ti, Obsidian setup and sync
- Graphics Card Seating Stabilizes System: After struggling with stability, a user suspects their graphics card seating was the issue, noting 24 hours of stability after reseating, generating significant AI content to test.
- The user will experiment with LMStudio using different parameters, in case it was an uptime issue with LMStudio itself.
- Pro 6000 Price Skyrockets $1000: A user was shocked to find the Pro 6000 price increased by $1000 while waiting for it to come into stock, but ultimately found one after scrambling before another price hike.
- Another user expressed that if Santa doesnāt get me a Pro 6000 then weāre gonna have a Christmas crash out.
- Debate on 4080 32GB vs 3090 Ti: Users discussed whether to get a 4080 32GB or a 3090 Ti, with one suggesting the 3090 for AI due to its higher VRAM and another liking the reliability of their 40XX card.
- The 4080's bandwidth is around 700GB/s while the 3090's is just over 900GB/s, though concerns were raised about the 3090 Ti's temperature issues and overclocking problems.
- Obsidian Sync and AI Psychosis: Users discussed setting up Obsidian, a markdown editor, and how it compares to Notion, with one sharing a link to MCP-Obsidian.
- Concerns were raised about public AIās becoming too personal, with one user commenting this was me pre AI psychosis, this is me now
OpenRouter ā· #announcements (3 messages):
Xiaomi MiMo-V2-Flash, Mistral Small Creative, Black Forest Lab's FLUX.2 [max]
- Xiaomiās MiMo-V2-Flash is FREE!: Xiaomiās MiMo-V2-Flash is now available for FREE at https://openrouter.ai/xiaomi/mimo-v2-flash:free.
- Discuss it on X or in <#1450501933176590510>.
- Mistral Launches Small Creative Model!: Mistralās new experimental Mistral Small Creative model is live at https://openrouter.ai/mistralai/mistral-small-creative for $0.10/$0.30.
- The model is available in writing apps and in the chatroom, and can be discussed on X or in <#1450558555915681863>.
- Black Forest Labās FLUX.2 Max Deployed: Black Forest Labās FLUX.2 [max] is now live on OpenRouter at https://openrouter.ai/black-forest-labs/flux.2-max.
- Users can compare it against FLUX.2 [pro] and [flex] in the chatroom, with discussion on X or in <#1450514133836365835>.
OpenRouter ā· #general (111 messagesš„š„):
Gemini API Usage, Daily Limit Upgrade, Long-Term Roleplay Models, Payment Declined, Baidu Model Evaluation
- Gemini API Users Exceed Usage: A user reported exceeding the daily usage limit on the Gemini API, even with no requests made.
- $10 Gets 1000 Daily Limit Upgrade: A user inquired about needing exactly $10+ in their account to get the 1000 daily limit upgrade, after a small negative balance adjustment.
- A user replied that just depositing $10 is enough and the credits can be used without losing the free limits, also encouraging them to reply to wherever youāre seeing this bad information with the truth.
- OpenRouter Down, Nvidia to Blame?: A user reported that OpenRouter seems a bit unstable right now and that `nvidia/nemotron-3-nano-30b-a3b` just came out, which they were waiting for.
- Another user noted that `gemma-3-27b-it` is broken, and other users speculated about the release of Gemini 3 Flash.
- Xiaomi Enters LLM space, literal phone company: Users reacted to Xiaomiās open source LLM announcement, noting their extensive range of consumer electronics and also cars.
- LLMs: Designing the Agentic Loop: A user shared a link about porting JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in 4.5 hours.
- They highlighted that If you can reduce a problem to a robust test suite you can set a coding agent loop loose on it with a high degree of confidence that it will eventually succeed.
OpenRouter ā· #new-models (4 messages):
ā
- No New Model News: There was no activity or discussion of new models in the OpenRouter "new-models" channel during this period.
OpenRouter ā· #discussion (93 messagesš„š„):
OpenRouter Minecraft Server, OpenRouter Labubu, Claude Code models, Standardized Completions/Responses, Normalized schema
- OpenRouterās Minecraft Multiplayer Dreams: Members discussed creating an OpenRouter SMP Minecraft server, with one member volunteering to host and another offering to try setting one up.
- Another member even suggested a Roblox OpenRouter custom game mode or Openrouter d&d multiplayer with AI DM.
- OpenRouter gets a Labubu Makeover!: Members are generating images of the OpenRouter Labubu using Gemini 3 Pro.
- One member joked theyāre throwing all of my funko pops in the trash right now, another wants 10% from all OpenBubu sales and posted a link to their X post.
- Claude Code Internally uses Sonnet and Haiku: Members noted that Claude Code does secret fancy stuff, but it seems to be using Haiku to generate a SINGLE word to caption what claude code is doing, the little spinner e.g. Blabberingā¦.
- Another member suggested setting env variables, found in code.claude.com/docs, like `export ANTHROPIC_DEFAULT_OPUS_MODEL=gemini-3-pro-preview`.
- Standardized Completions/Responses on the Horizon?: Members discussed standardizing completions/responses, even if it's mostly following OpenAI's lead, to allow tools like LiteLLM to declare support for OpenCompletions v2.2.
- This would imply specified behavior for what happens when you pass `minimal` to a model that doesn't support it, and would involve a lot of support from LiteLLM, Pydantic AI, AI SDK, Tanstack AI, and probably SGL/vLLM and folks.
- Normalized Schema gains traction: Members discussed an idea to have a normalized schema, but one member said they want whatever I can plug existing SDKs into, not adopt a new schema.
- A member added thereās a standards_2x.png but OR could push more folks to Responses API, because it has likely enough flexibility for most things.
HuggingFace ā· #general (112 messagesš„š„):
FSDP Upcast Warning, Vibe CAD Research, Microsoft VibeVoice, Fine-tuning LLMs for Summarization, Kiln.tech
- FSDP Upcast Warning Frustrates User: A user reported seeing a UserWarning about FSDP upcasting low precision parameters to fp32 and was concerned about its implications for checkpoint precision and size.
- The user was unsure if the warning meant checkpoints would be loaded with less precision or simply be larger, including their accelerate config.
- MIT DeCoDE Lab Releases āVibe CADā Research: A member shared a link to āVibe CADā, breakthrough research from MIT DeCoDE Lab in the video space on LinkedIn.
- They asked the community to throw a like / comment and Gotta get that early engagement.
- OOM Errors Plague LLM Fine-Tuning: A user reported experiencing OOM (Out of Memory) errors while fine-tuning LLMs for summarization tasks, even after trying QLoRA and changing the dtype to float16.
- Another user shared a link to a relevant Hugging Face dataset related to OOM errors during LLM fine-tuning for sequence-to-sequence summarization tasks.
- GRPO Trainer Gives Zero Loss: A user reported that the GRPO Trainer was giving 0 loss and posted a code snippet.
- Another user suggested normalizing the completion reward to fit a range instead of relying on completion length and referencing a gist with ideas.
- User Asks for Judge Model Recommendations: A member requested recommendations for a judge model, specifying it should not be lightweight and should be reasonably runnable with casual hardware.
- They clarified that it needs low context (2048 max) and listed current models like Qwen3 30B and Qwen3 VL 30B.
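The zero-loss GRPO report above drew a suggestion to normalize the completion reward to fit a range rather than relying on raw completion length. One plausible reading of that advice, sketched as a min-max rescale (illustrative only; the linked gist's actual approach may differ):

```python
# Min-max rescale of a group's rewards into [lo, hi]. If all rewards in a
# group are identical they carry no learning signal (one way a GRPO loss
# can collapse to zero), so that case is returned as all zeros.
def normalize_rewards(rewards, lo=0.0, hi=1.0):
    r_min, r_max = min(rewards), max(rewards)
    if r_max == r_min:
        return [0.0 for _ in rewards]
    span = r_max - r_min
    return [lo + (r - r_min) / span * (hi - lo) for r in rewards]

print(normalize_rewards([120, 80, 200, 200]))  # e.g. raw lengths -> [0, 1]
```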
HuggingFace ā· #i-made-this (4 messages):
Confession as diagnostic method for LLMs, Zenflow live, Qwen 360 Diffusion release, Cognitive-Proxy steering LLMs
- Confessions Reveal LLM Metacognition?: A new paper questions whether LLMs using confession as a diagnostic method possess metacognitive abilities, challenging current theoretical frameworks.
- Empirical tests across 8 LLMs showed 63-95% agreement in critiques, suggesting that confession training requires metacognitive capacities the theoretical framework denies exist.
- Zenflow Launches Structured Multi-Agent Workflows: Zenflow is now live, offering structured workflows and multi-agent verification, accessible at http://zenflow.free/.
- Qwen 360 Diffusion: A Panoramic Paragon: Qwen 360 Diffusion, a rank 128 LoRA trained on Qwen Image, has been released, excelling in generating high-quality 360° images from text prompts, and is available on HuggingFace and CivitAI.
- Itās the first ever 360° text-to-image model designed to be capable of producing humans close to the viewer, suggesting using trigger phrases like āequirectangularā and viewing creations with the free web-based viewer.
- LLMs Get Steered by Brainwaves!: The Cognitive-Proxy project uses human brain data (MEG) to derive semantic axes and create adapters that can steer LLMs, with a demo available on HuggingFace Spaces.
- A paper details how steering towards concrete vs. abstract concepts changes Llamaās responses compared to a baseline.
HuggingFace ā· #gradio-announcements (3 messages):
MCP 1st Birthday Hackathon Winners, Hackathon Participation Certificates, Track 2 Winners
- MCP Hackathon Crowns its Winners: The MCP 1st Birthday Hackathon announced its sponsor-selected winners, including the Anthropic, Modal, LlamaIndex, OpenAI, and Blaxel awards.
- Winners will be contacted after the holidays in the second week of January, so keep your eyes peeled.
- Hackathon Certificates Arrive, LinkedIn Awaits!: Participants can now generate official MCP 1st Birthday Hackathon certificates using a Gradio app.
- Generated certificates can be downloaded, uploaded to LinkedIn, and Gradio can be tagged as well.
- Track 2 Triumphs: Winners Emerge!: The winners of Track 2 for the MCP In Action hackathon have been announced, across Enterprise, Consumer, and Creative categories.
- The Gradio team expressed amazement at the creativity and effort put into the submissions.
HuggingFace ā· #agents-course (25 messagesš„):
Smol course offering, Deep reinforcement learning course, Box2D dependency issue, LLM and Langchain package versions, Vector database troubleshooting
- Smol Course Completion Speculation Sparks Gift Ideas: Members are wondering if the smol course will offer the last part this year, suggesting it would make a cool Christmas gift.
- One member expressed anticipation for the rest of it.
- Dependencies Plague Deep Reinforcement Learning Collab: Several members are facing dependency issues when running the deep reinforcement learning Google Colab.
- One member shared a link to a relevant Discord channel to help with dependency issues.
- Box2D Troubleshooters Unite!: Users reported errors regarding Box2D in unit 1 of the deep reinforcement learning course and asked for a fix.
- Another member suggested finding it on GitHub.
- LLM Package Purgatory: Pin Your Versions: It was suggested that itās sometimes useful to install specific versions of packages, especially when using LLMs and Langchain.
- One member advocates using a uv lock in their default stack for agents, and a frozen venv on an external m.2 for backup.
- Vector Database Debugging Dance: A member suggested printing the chunks retrieved from the vector database to identify if the problem stems from the embedding model, chunking method, or the LLM itself.
- They suggest solutions will differ depending on the root cause of the problem.
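The debugging step described can be sketched as a small helper; `debug_retrieval` and the sample chunks below are hypothetical, with real chunks coming from whatever query call your vector store exposes:

```python
def debug_retrieval(query: str, retrieved_chunks: list[str], top_k: int = 5) -> list[str]:
    """Print the chunks a retriever returned for a query, so you can judge
    whether the failure lies in the embedding model, the chunking, or the LLM."""
    shown = retrieved_chunks[:top_k]
    print(f"Query: {query!r}")
    for i, chunk in enumerate(shown):
        print(f"--- chunk {i} ({len(chunk)} chars) ---")
        print(chunk[:200])  # truncate long chunks for readability
    return shown

chunks = debug_retrieval(
    "refund policy",
    ["Refunds are issued within 14 days.", "Shipping takes 3-5 days."],
    top_k=1,
)
```

If the printed chunks are irrelevant, the embedding or chunking is at fault; if they look right but the answer is wrong, suspect the LLM or the prompt.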
GPU MODE ā· #general (15 messagesš„):
Paper Reading Groups, RTX PRO 5000 Blackwell Specs, GPU Programming Career Advice, Scam Bot Targeting ML Devs
- Discord Paper Reading Group?: Members are self-organizing a paper reading group and using the general audio channels on the GPU MODE Discord to discuss the paper ātitleā (arxiv.org).
- If they are familiar with the paper, they can be invited to give a talk.
- Seeking RTX PRO 5000 Blackwell Specs: A member is seeking detailed specs for the RTX PRO 5000 Blackwell.
- Another member stated the pro 4500 is just a worse 5090 for a higher price.
- GPU Programming Career Advice Allowed: Career advice is now allowed in the <#1450579381448609882> channel.
- The channel had been previously restricted to technical discussions.
- Scam Bot Targets ML Devs: A scam bot net is posing as employees to target ML devs for identity theft.
- Filtering for new users mentioning blockchain or web3 might help identify them.
GPU MODE ā· #cuda (4 messages):
cuTile Advantages, cuTile vs Triton, cuTile GEMM Flops on Blackwell
- cuTile Aims for Strong Foundation in 1.0 Release: The 1.0 release will focus on a strong language foundation, with autotuning included as sample code inside TileGym.
- This is something we are actively working on and please stay tuned.
- cuTile vs. Triton for Custom Torch Ops: cuTileās user experience is very similar to writing a Triton kernel.
- For implementing an RMSNorm-like kernel that hits peak memory bandwidth, cuTile may be easier than Triton, especially for simpler memory-bound kernels.
- cuTile GEMM Flops on Blackwell Datacenter Cards: cuTile is expected to achieve higher GEMM flops on Blackwell datacenter cards, though benchmarking is needed to confirm.
- cuTile for Tensor Cores, TMA, and Swizzling: cuTile DSL may be better for GEMM/attention kernels involving tensor cores/TMA/swizzling.
- cuTile is more abstracted and leans on the compiler, making it suitable for relatively simpler memory-bound kernels, occupying a similar ease-of-use vs. performance frontier as Triton.
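As a reference for what an RMSNorm-like, memory-bound kernel actually computes, here is a minimal pure-Python sketch (not cuTile or Triton code; the signature is illustrative). Each element is read and written once, which is why such kernels are bandwidth-bound rather than compute-bound:

```python
import math

def rmsnorm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """Reference RMSNorm: divide x by its root-mean-square, then apply a
    learned per-element weight. A fused GPU kernel does exactly this in one
    pass over memory."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

out = rmsnorm([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

A tile-level DSL like cuTile or Triton expresses the same reduction-then-scale pattern per row, with the compiler handling vectorized loads and stores.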
GPU MODE ā· #cool-links (9 messagesš„):
TMEM's Dedicated Arbitration Logic, NVIDIA psy-op, ldmatrix.x4
- Doubts Arise About TMEMās Arbitration Logic Claims: Doubts surfaced about a paperās claims regarding TMEMās dedicated arbitration logic and its ability to bypass L2 cache partitioning contention, with a member calling the description either terribly written or making several incorrect claims.
- Another member highlighted the lack of supporting numbers for memory access claims, expressing curiosity about the release of the code to clarify things.
- Paperās ldmatrix.x4 Implementation Raises Eyebrows: A member questioned the paperās context transfer to registers, noting that ldmatrix.x4 only loads four 8x16 tiles, which would equate to 32x16, and highlighted the lack of a 32x32 tile size being optimal on Hopper, as well as Hopper having no 4-bit MMA.
- Another member agreed, stating that the more they read the paper, the weirder it felt, describing it as an uncanny valley paper due to the mix of expert knowledge and nonsensical descriptions, comparisons, and results.
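The members' tile arithmetic is easy to check; this tiny helper (hypothetical, not from the paper) multiplies out the shapes under the thread's stated assumption of four 8x16 tiles stacked along rows:

```python
def ldmatrix_x4_shape(tile_rows: int = 8, tile_cols: int = 16, num_tiles: int = 4) -> tuple[int, int]:
    """Total fragment shape when num_tiles row-stacked tiles of
    tile_rows x tile_cols are loaded, per the thread's arithmetic."""
    return (tile_rows * num_tiles, tile_cols)

shape = ldmatrix_x4_shape()  # four 8x16 tiles -> 32x16, not 32x32
```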
- NVIDIA Accused of Psy-Op Paper Release: A member jokingly suggested that a research paper might be a psy-op by NVIDIA to confuse competitors about their microarchitecture, linking to a defense kernel hack article.
- Another member highlighted the fake elapsed time function in the paperās code, where the function always reported 0.001ms to fake fast timings.
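The fake-timing pattern described above is easy to illustrate; this hypothetical Python sketch contrasts a constant-reporting "timer" with honest wall-clock measurement using a monotonic clock:

```python
import time

def fake_elapsed_ms() -> float:
    """The pattern criticized in the thread: a 'timer' that always reports
    the same constant, making any kernel look equally fast."""
    return 0.001

def real_elapsed_ms(fn) -> float:
    """Honest timing: measure actual wall-clock time around the call."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0

slow = real_elapsed_ms(lambda: sum(range(1_000_000)))  # varies by machine
```

A constant return value is trivially detectable: rerun the benchmark with a workload of a different size and check whether the reported time changes at all.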
GPU MODE ā· #beginner (4 messages):
Working Groups, Open Projects
- Beginner Seeks Guidance on Working Groups: A beginner with CUDA experience inquired about next steps to participate in working groups/open projects.
- A member suggested checking the existing working group channel and mentioned new working groups starting soon.
- Starting Working Group Ideas: A member asked if they should post in the working group ideas channel to gain traction and support for their ideas.
- Another member responded, Yeah just help them out haha.
GPU MODE ā· #off-topic (2 messages):
Job Search, Discord Communities for Job seekers, Networking for AI Jobs
- Hunting for Job Search Discord Communities: A member asked for recommendations of popular Discord servers where people discuss finding jobs in software and AI in the US.
- Another member shared a link to a Discord channel as a potential resource.
- AI Job Networking: The user is looking to network with people in AI and Software.
- They are looking for recommendations of discord servers and communities.
GPU MODE ā· #rocm (34 messagesš„):
ROCm 7.1, FBGEMM library broken, NPS partitioning crashes, kernel module problems
- ROCm 7.1 and memory allocation issues: A user reports that their 7900XTX with ROCm 7.1 and torch 2.9.1 locks up when memory allocation on the GPU hits 100%, and they consider it a long-term issue with AMD GPUs.
- Another user confirms similar issues with large allocations on MI300X.
- FBGEMM Build is Frustrating: A user complains that the FBGEMM repo from AMD is totally broken and doesnāt build without jumping through many hoops, and even the latest docs on FBGEMM have you install v0.8.0 instead of v1.4.0, which doesnāt work.
- They express frustration that the CMake build configs donāt even have FBGEMM linking to the correct HIP libraries.
- Kernel issues plague AMD: A user mentions that the kernel module has problems and things that worked well a few versions ago are now totally broken.
- Specifically, NPS partitioning now crashes the kmd most of the time.
- AMD GPU Purchase a costly mistake: A user laments purchasing an AMD gfx1100 due to luxury import taxes and software issues, calling it a total mistake.
- They ended up buying a 5090 to replace it for $4500 USD, on top of the $2500 originally spent, adding that AMD didnāt take up George Hotzās offer to help.
GPU MODE ā· #self-promotion (10 messagesš„):
CUDA Kernel Naming, HMMA vs HFMA2.MMA, Register Moves in PTXAS, Cloud GPU marketplace
- Kernel Naming Impacts CUDA Performance?: A blog post explores whether including cutlass in CUDA/Triton kernel names affects performance, potentially via instruction reordering.
- Confusion Arises Between HMMA and HFMA2.MMA: A member pointed out that the blog post inaccurately mixes HMMA (tensor pipeline instruction) with HFMA2.MMA (half pipeline instruction).
- PTXAS Instruction Mix for Register Moves Explained: It was explained that `ptxas` uses a seemingly random mix of instructions (instead of MOV only) for register moves because, on some architectures, MOV can only be issued every second cycle; interleaving MOV with IMAD/HFMA2/IADD3 speeds up register moves.
- Neoclaudx Launches Cloud GPU Marketplace: A member launched a cloud GPU provider website, NeoCloudX, that aggregates GPUs directly from data center excess capacity to reduce fees, offering A100s for ~$0.4/hr and V100s for ~$0.15/hr.
GPU MODE ā· #submissions (4 messages):
nvfp4_gemm leaderboard, NVIDIA performance
- NVIDIA Sees New Personal Bests: One member achieved a personal best on NVIDIA with 13.4 µs on the `nvfp4_gemm` leaderboard.
- Another member also reached a personal best on NVIDIA, clocking in at 56.9 µs.
- NVIDIA Gets Another 4th Place: A member secured 4th place on NVIDIA with 10.8 µs on the `nvfp4_gemm` leaderboard, across two submissions.
- Both submissions, with IDs 166947 and 166954, registered the same performance.
GPU MODE ā· #hardware (2 messages):
MI250, MI250X, Server Compatibility
- Comparing MI250 and MI250X GPUs: A member inquired about the difference between AMD MI250 and MI250X GPUs.
- They also asked whether these GPUs are interchangeable in a server environment, such as if an MI250X can be installed in a server designed for the MI250.
- MI250X in MI250 Server?: A member questioned whether an MI250X can be put in a MI250 server.
- This implies they are concerned about hardware or software compatibility.
GPU MODE ā· #cutlass (5 messages):
Cute DSL, CUTLASS, Python 3.10, MMA Tiling
- Cute DSL Python Version Discrepancy Dissolved: Users found that Cute DSL works fine with Python 3.10, despite documentation requiring Python 3.12 as seen in the documentation.
- CUTLASS Docs Get a Python Version Facelift: The CUTLASS documentation will be updated to reflect support for Python 3.10 through 3.13, with 3.14 under consideration.
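The support window described can be expressed as a simple interpreter check; this is an illustrative sketch of the 3.10-3.13 range, not CUTLASS's actual gating code:

```python
import sys

# 3.10 through 3.13 inclusive, per the planned docs update.
SUPPORTED = [(3, minor) for minor in range(10, 14)]

def python_supported(version: tuple[int, int] = sys.version_info[:2]) -> bool:
    """Check an interpreter version against the advertised support window."""
    return version in SUPPORTED
```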
- MMA Tiling Permutation Puzzle: A user is seeking assistance with permutations for tiled MMA, aiming for each thread to load pairs of contiguous values and utilize s2r vector copies.
GPU MODE ā· #teenygrad (1 messages):
LambdaLabs Grant, H100 Hours, SITP Textbook
- Teenygrad Receives LambdaLabs Grant: The teenygrad project has been accepted into the LambdaLabs research grant program, providing access to roughly 1000 H100 hours.
- Development is expected to ramp up again in the new year thanks to the new funding!
- SITP Textbook and Code Coming Soon: The textbook, code, and lecture materials for parts 1 and 2 of the SITP course are slated for release at the end of January or February.
- This should help adoption and get new users coding.
GPU MODE ā· #nvidia-competition (23 messagesš„):
NVFP4 GEMM, Kernel 2, cutlass.pipeline error, application did not respond
- NVFP4 GEMM competition wraps up on Dec 19th: The NVFP4 GEMM competition (Kernel #2) is scheduled to run from Nov 29th to Dec 19th, not extending to the 20th.
- Kernel 2 competition remains unchanged: A member noted that the second Kernel remains unchanged while the first one was extended.
- Members stated that it really would be a lot nicer if these dates were aligned to the ends of weekends.
- `cutlass.pipeline` Import Error Surfaces: Users reported encountering an `ImportError: cannot import name 'pipeline_init_arrive' from 'cutlass.pipeline'` error.
- The team is working to improve the slow runners, so the import was probably changed in the process.
- āThe application did not respondā Error Appears: Users are reporting the error `The application did not respond` when using the cluster bot.
Latent Space ā· #ai-general-chat (51 messagesš„):
Vibe CAD research from MIT DeCoDE Lab, Sakana's iconographic-language models, AntiGravity Performance Issues, OpenAI Router Rollback, New Warp Agents
- MITās Vibe CAD Shakes Up AI: New Vibe CAD research from MIT DeCoDE Lab introduces a dataset and model for learning, as showcased in a LinkedIn post and a related YouTube video.
- Byte-wise Boost for Sakanaās Models: Sakana is exploring performance improvements for iconographic-language models by going byte-wise, though the impact on other languages remains unresolved, according to this tweet.
- AntiGravityās Performance Plummets: A user reported abandoning AntiGravity due to performance issues, including the machine pegging for random reasons, with stack traces showing renderer misallocation (1.4TB of mem) and language_server spinlocks.
- OpenAIās Router Rerouted!: OpenAI rolled back ChatGPTās Model Router after a year, leading users to switch to Gemini and Claude, as noted in OpenAIās release notes and discussed in this Wired article.
- NVIDIA Snags SchedMD for Slurm: NVIDIA acquired SchedMD, the company behind Slurm, a popular workload manager, announced in NVIDIAās blog.
Latent Space ā· #private-agents (4 messages):
Google CC agent, Gmail AI productivity
- Google Labs cracks āCCā AI productivity for Gmail: Google Labs announced CC, an experimental AI productivity agent integrated into Gmail and rolling out in the US and Canada.
- It provides a daily āYour Day Aheadā briefing and handles email requests, with early access for Google AI Ultra and paid subscribers, according to this X post.
Latent Space ā· #genmedia-creative-ai (14 messagesš„):
WAN 2.6, Chatterbox Turbo, Meta SAM Audio
- WAN 2.6 Drops - But No OSS!: WAN 2.6 is out but is commercial only, with no OSS version available, according to this link.
- The new release is an AI Video Generator with Multi-Sh[ā¦].
- Chatterbox Turbo Aims to Disrupt Voice AI Scene: Dev Shah announced the release of Chatterbox Turbo, an MIT-licensed, state-of-the-art voice model claiming to surpass ElevenLabs Turbo and Cartesia Sonic 3.
- Nicknamed the āDeepSeek moment for Voice AI,ā the model resolves the traditional trade-offs (fast but robotic vs. slow but great) and is built for trust, transparency, and auditability, as per this announcement.
- Metaās SAM Audio: Isolate Any Sound!: AI at Meta introduced SAM Audio, a unified model capable of isolating any sound from complex audio mixtures using text, visual, or time span prompts, according to this X post.
- Meta is sharing the model, a perception encoder, benchmarks, and research papers to encourage community exploration and application development.
Nous Research AI ā· #general (60 messagesš„š„):
Local LLMs implementation, Non-language models for waveform analysis, Nvidia's dominance in GPU market, Meta's samaudio, Mistral creative model
- Customized Small Models Gain Traction: Discussion highlights the growing interest in local LLMs trained on company-specific data, with a client exploring implementing one specialized for the maritime industry.
- The potential of training LLMs on specific employee communications or contract data is noted, with some saying that customized small models are going to be extremely popular down the road as well.
- Non-Language Models Show Potential in Ship Navigation: Members mentioned very interesting non-language models being trained for waveform analysis, specifically for ship navigation.
- These models use sensor data to identify optimal motor speeds and settings, essentially creating a highly skilled captain.
- Nvidiaās Success Rooted in GPU Bet and CUDA Language: Nvidiaās early bet on GPUs for applications beyond gaming, coupled with the development of CUDA, is cited as the root of their success.
- A YouTube video about the graphics cards emulation.
- Meta Introduces samaudio: Meta released samaudio (link missing) and a user indicated they believed they had it.
- Another user said that the same attempt with gemini was worse (image attached).
- Mistral Model Outputs Compared: A member offered to share comparisons between their tested model and the Mistral creative model.
- Another member clarified that they are testing on a 70B L3 model that will get transferred onto Kimi 1T rather than Mistral Small, which is 24B.
Nous Research AI ā· #research-papers (2 messages):
Byte Level LLMs
- Enthusiasm for Byte Level LLMs Erupts: Members expressed a liking for byte level LLMs.
- Bytes equal Fun!: They stated that byte level LLMs are fun.
Moonshot AI (Kimi K-2) ā· #announcements (1 messages):
Kimi paid users, Kimi 30 minute chat
- Kimi Team Wants to Chat with Paid Users: The Kimi team is inviting paid Kimi users to a 30 minute chat.
- Participants will receive a free 1 month subscription as a perk; interested users should react with a š below to be contacted via DM.
Moonshot AI (Kimi K-2) ā· #general-chat (29 messagesš„):
Kimi Models Text-Only vs. Context, Kimi Non-Thinking Model, K2-Thinking Performance, Kimi Pricing and Availability, K2 Thinking Turbo
- Users Prefer Text-Only Kimi with Improved Context: Some users expressed a preference for a text-only Kimi model if it offered better contextual understanding.
- Others commented on the non-thinking modelās tendency to be overly concise, cutting out important information, which they believe is largely fixed in the thinking model.
- Kimiās Non-Thinking Model Criticized: The Kimi non-thinking model was criticized for being overly concise and cutting out important parts.
- Users suggest the K2-Thinking modelās intelligence gains outweigh any benefits of the non-thinking model, with one user specifically mentioning their exclusive use of Kimi 1.5.
- K2-Thinking Model Shows Performance Gains: The K2-Thinking model is reportedly faster than GLM-4.6 on many providers.
- However, some users note that some providers have degraded quality or itās too expensive while K2 Thinking Turbo is suggested as an alternative.
- Samsung Gets the Kimi Advantage: The Samsung Galaxy store has Kimi version 2.5.1, which includes the memory feature, ahead of the Google Play storeās 2.5.0 version.
- Users are confused why Samsung gets the rollout first.
- Vendor Verifier sees Fireworks Better Direct: Someone linked MoonshotAI/K2-Vendor-Verifier showing performance on various vendors.
- It was noted that Fireworks via OpenRouter is worse than going directly via Fireworks, feeling like Minimax-M2.
Eleuther ā· #general (4 messages):
Synthema meta-language, Polyreflexeme theory, Algoverse AI research program, NSF SBIR proposal
- Synthema: Conceptual Meta-Language Emerges: A member introduced Synthema, a conceptual meta-language for meaning compression, aiming to compress concepts into shorter symbolic syntax.
- It is theoretical and not empirical yet, and designed to work retroactively in systems, serving as a conceptual meta-language of meaning.
- Polyreflexeme Theory Introduced: A member described the Polyreflexeme theory, an integral component for meaning compression, where multiple concepts/words recursively entangle for meaning-making.
- Meaning is described as a relational recursion, exemplified by RoleModel(°9) = ab, a b, b a, ab a, ab b, a ab, b ab, ab ab, but the member lacks resources to drive this into application.
- Algoverse Program Gets Mixed Review: A member mentioned Algoverse, a 12-week AI research program for college students with mentors from Stanford/Berkeley, but noted itās crowded and not very hands-on.
- They stated that itās not super worthy if youāre paying.
- NSF SBIR Proposal Submitted: A member completed their proposal for NSF SBIR but noted that the next window for submission hasnāt been posted yet.
- No other details were mentioned.
Eleuther ā· #research (11 messagesš„):
GAN, Research Collaboration, Paper Publishing
- GAN Confirmed: A member confirmed an image was a GAN with a link to X.com.
- Discord becomes HS Research Recruitment Hub: A member looking for high school research collaboration was directed to other Discord groups like Rishab Academy.
- Paperās Fate Is Up in the Air: A member received conflicting opinions about whether their paper is worth submitting to a conference, and whether it would likely be rejected for lacking a novel contribution.
- Another member suggested motivating the paper well, referencing a NeurIPS spotlight paper on VAEs and how it demonstrated VAEs were still viable, while another sarcastically commented that the detractors are the same people giving you low scores on openreview.
Eleuther ā· #scaling-laws (1 messages):
uwu1468548483828484: is this wrong or right
Eleuther ā· #interpretability-general (7 messages):
Superweights Impact, Attention Interpretation, Anthropic Circuits, Causal Head Gating
- Superweights Spark Debate: The impact of superweights which spike way high/low vs avg, was debated, with one member noting that penalizing them might hurt performance but is worth exploring.
- They suggested seeing if training can occur without significant dropoff.
- Attention Interpretation State-of-the-Art Sought: A member inquired about the most up-to-date information regarding attention interpretation, particularly regarding normalization and OV.
- They expressed discovering these concepts recently and wondered what else they were unaware of.
- Anthropicās Circuits Focus on Crosscoders: From the Anthropic circuits & superposition perspective, the latest update focused on getting crosscoders and attribution to work, with progress detailed in an April update.
- It was shared that researchers (Chris Olah, Adam Jermyn) spent two years attempting to formalize attention superposition without much progress, also experimenting with qk diagonalization.
- Causal Head Gating Paper Praised: A paper on causal head gating presented at NeurIPS this year was highlighted as a well-designed, high-level approach.
- The paper can be found here.
Manus.im Discord ā· #general (12 messagesš„):
Manus 1.6 Release, AI & Full-Stack Engineer
- Manus 1.6 is Now Available: Manus 1.6 is now available to all users, you can learn more via the link.
- Effort Tied to Subscription Levels: The higher the subscription tier, the more effort it puts into tasks and stops being a dumbass; thatās why they removed the option to buy damn credits.
- One user stated it is largely integrated.
- AI & Full-Stack Engineer Diving Deep: One user is an AI & Full-Stack Engineer diving deep into autonomous agents, voice AI, and multi-agent frameworks, playing with LangGraph, CrewAI, AutoGen, and wiring up memory, tools, and reasoning.
- They are down for collabs, contract gigs, or long-term builds.
aider (Paul Gauthier) ā· #general (10 messagesš„):
OpenAI GPT-5, Aider Active Innovation, Aider copy-paste mode without LLMs, Aider Vision/Plans, Aider and interleaved reasoning tool calling
- GPT-5 Speculation Begins: A member jokingly suggested trying `openai/gpt-5` as the model string, sparking discussion, although GPT-5 has not been released.
- Aiderās Development Status: A member inquired whether Aider is still in active innovation or just focused on its original goals, considering the emergence of other CLI-based apps with more features.
- Another member asked What was it focused on months ago?, another mentioned Aider seemed to be focused on being a TUI for using local or cloud models.
- Aider copy-paste mode and LLM requirements: A user reported receiving a warning about needing an LLM model and API keys even when running Aider with `--copy-paste`.
- The warning suggests using OpenRouter for free and paid access to many LLMs; Aider exits when the user declines to log in or open documentation.
- Aiderās implementation of interleaved reasoning tool calling: A user inquired whether Aider properly implements the interleaved reasoning tool calling of models such as minimax-m2, kimi-k2-thinking and the new deepseek v3.2 thinkings.
aider (Paul Gauthier) ā· #links (1 messages):
Zenflow launch, Agent workflows
- Zenflow Orchestrates Predictable Agent Workflows: The Zenflow orchestration layer has launched, turning specs into coordinated agent workflows.
- It aims to provide predictable shipping instead of prompt roulette, according to the launch announcement.
tinygrad (George Hotz) ā· #general (2 messages):
AI pull requests policy, Understanding AI-generated code
- AI Pull Request Policy Remains Strict: The policy regarding AI-generated pull requests remains unchanged: unless the submitter is a known contributor, any PR that appears to be AI-generated will be immediately closed.
- The rationale is that contributors should completely understand every line of their PRs, as submitting AI-generated code without comprehension provides negative value.
- Comprehension Over Automation: Contributors should completely understand every line of their PRs, as submitting AI-generated code without comprehension provides negative value.
- The point is that the AI canāt replace thinking and understanding from trusted contributors.
DSPy ā· #show-and-tell (1 messages):
justanotheratom: https://www.elicited.blog/posts/dspy-strategy-and-program
MLOps @Chipro ā· #events (1 messages):
ggdupont: Anyone know about GenAI zurich conference? is it any good?
MCP Contributors (Official) ā· #general-wg (1 messages):
Missing Thread Response, Contributor Apology
- Contributor Apologizes for Missing Thread Response: A contributor apologized for missing a thread response and indicated they responded in the thread in the specified channel.
- The contributor tagged a user, <@282306658825273344>, in their message.
- Response Confirmation in Specific Channel: The contributor confirmed that the response was provided in a thread within the channel <#1399984784020607007>.
- This ensures the user can find the relevant information within the correct context.