a quiet day.
AI News for 9/11/2025-9/12/2025. We checked 12 subreddits, 544 Twitters and 22 Discords (189 channels, and 5258 messages) for you. Estimated reading time saved (at 200wpm): 464 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
Happy o1 anniversary. Congrats to Naveen Rao and Interaction on buzzy new fundraises.
AI Twitter Recap
Edge Reasoning on-device: Meta's MobileLLM-R1 (sub-1B) goes open on HF
- MobileLLM-R1 (sub-1B, open weights): Meta released a family of sub-1B parameter reasoning models on Hugging Face with unusually strong small-model results: ~5× higher MATH accuracy vs Olmo-1.24B and ~2× vs SmolLM2-1.7B, while matching or surpassing Qwen3 accuracy on multiple reasoning benchmarks, trained on only 4.2T tokens (≈11.7% of Qwen3's 36T) according to @_akhaliq and the model post link. Meta researchers emphasized the data efficiency and reasoning capability at this scale (announcements, more context). Community demos arrived quickly via Anycoder/Spaces (app, another).
Qwen3-Next-80B (A3B): hybrid attention, 256k context, and heavy infra implications
- Architecture & inference complexity: Alibaba's new open-weights Qwen3-Next-80B-A3B introduces a hybrid attention design (Gated DeltaNet + Gated Attention) with high sparsity (≈3.8% active params vs 9.4% in Qwen3-235B), a native 256k context window, and text-only I/O. Adaptation required major engine changes: SGLang PR >6k LOC; vLLM >2.5k LOC per @ZhihuFrontier. Pricing on Alibaba Cloud is $0.5/$6 per 1M input/output tokens for the reasoning variant and $0.5/$2 without reasoning, cheaper than Qwen3-235B (details, token usage).
- Performance & tradeoffs (community evals): Long-horizon "working memory" and multi-turn consistency are visibly improved; character-level basics are strong though reasoning+character tasks are mixed; weaknesses include error inheritance, instruction-following gaps, and long-text hallucinations, per Zhihu analyses (summary, thread). A separate roundup places Qwen3-Next-80B near DeepSeek V3.1 on an aggregate index at much lower token usage (@ArtificialAnlys).
Agents, evaluation fixes, and failure forensics
- SWE-Bench fix, progress still real: FAIR Codegen's @TacoCohen highlighted an issue allowing agents to peek at future commits, which SWE-Bench promptly fixed. Preliminary re-runs suggest most models aren't heavily affected; FAIR found the bug only after scaling RL runs to "too-good-to-be-true" results. Recommendation: labs and OSS should re-publish on the fixed benchmark and clearly annotate.
- Live, taskful evals are hard: LiveMCP-101 introduces a real-time agent framework/benchmark that stresses complex tasks beyond synthetic settings. Even frontier models underperform: GPT-5 scores 39.02% on "hard" tasks; top models remain below 60% overall. The paper catalogs seven common failure modes (ignoring requirements, overconfident self-solve, wrong tool choice, syntax/semantic/output parsing errors) (overview, results, paper).
- Calibration over guessing: OpenAI argues hallucinations persist because benchmarks reward confident guesses; fixes include not penalizing "I don't know" and realigning leaderboards (summary, paper). On AssistantBench, GPT-5 shows higher precision and lower guess rates than o3 (@PKirgis). HAL is adding Docent to analyze agent logs rather than only end accuracy (@sayashk).
Tooling, infra, and libraries
- VS Code grows a model marketplace API: The "Language Model Chat Provider" extension API is finalized; BYOK providers can be installed as extensions for more model choice. Also shipping are tutorials, videos, and an auto-select model experience (e.g., Claude, GPT-5/mini, Gemini) (API thread, Cerebras ext, release, notes).
- Transformers v5 + continuous batching: HF teased a v5 modernization push (faster kernels, smarter defaults, cleanup) and quietly landed continuous batching to simplify evaluation/training loops (not chasing max-throughput servers; focus is tinkering/toolbox) (v5, cont. batching). Also, "new LLM releases now announced as PRs to Transformers" (@lvwerra).
- Inference systems: Meta's vLLM disaggregated inference shows latency/throughput wins vs its internal stack; optimizations are being upstreamed (@PyTorch). A clear explainer on paged attention circulated (link); a minimal sketch of the idea follows this list.
- AOT and regional compilation: ZeroGPU added regional AOT compilation and sharing/loading precompiled graphs to accelerate bring-up (post, blog/docs).
- Vision & retrieval in HF: Microsoft's Kosmos-2.5 landed in Transformers with OCR+layout demo/notebook (demo/docs, notebook). MetaCLIP2 multilingual models plus text-to-image search notebooks arrived as well (announcement, tutorial).
- Also noted: Skypilot's new GPU utilization dashboard (link); and Elon's aside that "AMD is now working pretty well for small to medium sized models" (@elonmusk).
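For readers who want the gist of the paged-attention explainer above, here is a minimal, illustrative Python sketch of the core bookkeeping (this is not vLLM's actual API): the KV cache is carved into fixed-size blocks, and a per-sequence block table maps logical token positions to physical blocks allocated on demand, so memory scales with tokens actually generated rather than the maximum context.

```python
# Illustrative paged-attention bookkeeping; names and sizes are hypothetical.
BLOCK_SIZE = 16  # tokens per physical KV-cache block

class PagedKVCache:
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))    # physical block pool
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> block ids

    def slot_for(self, seq_id: int, pos: int) -> tuple[int, int]:
        """Return (physical_block, offset) holding token `pos` of `seq_id`,
        allocating a new block from the pool when the current one fills up."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE == len(table):
            table.append(self.free_blocks.pop())
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

    def release(self, seq_id: int) -> None:
        """Sequence finished: return its blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))

cache = PagedKVCache(num_blocks=1024)
for t in range(40):  # a 40-token sequence occupies only 3 blocks
    block, offset = cache.slot_for(seq_id=0, pos=t)
```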
Frontier access, SDKs, and safety collaborations
- OpenAI platform: GPT-5 and gpt-5-mini rate limits were bumped substantially across tiers (@OpenAIDevs). A new "gpt-5-high-new" target appeared in Codex-CLI ("tuned to rely on built-in reasoning defaults"), though details remain scant (@mark_k). OpenAI's focus on extended thinking continues: o1-preview "seconds" to current models "hours" with web+browse+code, "much more runway ahead" (@polynoamial, @gdb).
- Anthropic: The UK AISI and US CAISI have been identifying jailbreaks in Claude Opus 4/4.1, helping ship stronger safeguards (announcement, details, AISI thread). For builders, the Claude Code SDK (same harness as the CLI) is a recommended starting point for custom agents (intro, docs).
- Qwen Code: v0.0.10/11 added sub-agents, a Todo Write tool, "Welcome Back" project summaries, editing stability, better IDE/shell integration, improved memory/session management, and more (release, preview).
Vision models and leaderboards
- LMArena updates: With >43k votes, Gemini 2.5 Flash Image ("nano-banana") continues to top both Image Edit and Text-to-Image charts; ByteDance Seedream 4 is now #2 on Image Edit and #5 on T2I (leaderboard, more). A new "Seedream 4 High Res" variant supports 4096×4096 outputs and is live in Arena (add, try).
- Other vision drops: Tencent's HunyuanImage-2.1 (2K T2I) is available via Anycoder/FAL for quick app prototyping (post, app).
Privacy-preserving pretraining
- VaultGemma: Google Research released VaultGemma, a 1B-parameter Gemma variant trained from scratch with differential privacy (claimed as the largest open model trained this way), plus new scaling-law results for private LM training. Weights and report are available (announcement, summary, model, paper).
Top tweets (by engagement)
- "How money works" flywheel satire around a hypothetical OpenAI-Oracle megadeal by @Yuchenj_UW (20.9k).
- Utah Gov. Spencer Cox on social media harms by @bensiegel (12.9k).
- Wikipedia finances scrutiny by @nearcyan (10.4k).
- AI leader archetypes satire by @sergeykarayev (9.0k).
- OpenAI platform rate-limit boosts for GPT-5/mini by @OpenAIDevs (2.1k).
- Elon on AMD GPUs for small/medium models by @elonmusk (2.2k).
- Higgsfield growth stats and product velocity by @higgsfield_ai (2.9k).
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Meta MobileLLM-R1 Release + Weekly LocalLLaMA Model/Dataset Roundup (Sep 12)
- Meta released MobileLLM-R1 on Hugging Face (Score: 412, Comments: 46): Meta published MobileLLM-R1-950M on Hugging Face (model card), a ~950M parameter small LLM intended for efficient, on-device/mobile inference, with an accompanying interactive demo Space (app) reportedly built via the AnyCoder Space (AnyCoder). The post does not list benchmarks, but context emphasizes pushing inference accuracy at the low-parameter end and providing an open release suitable for lightweight deployment. Commenters applaud work on small-model inference accuracy and appreciate that Meta is still releasing models openly, with some surprise about it being "fully open source."
- Emphasis on pushing inference accuracy at the small-parameter frontier: commenters highlight value in optimizing the "lower bounds" of limited-parameter models, where improvements in training, quantization, and decoding strategies can yield disproportionately large real-world gains for on-device and low-latency settings.
- Benchmark skepticism: one user notes the model is still outperformed by Qwen 0.6 (likely a ~0.6B-class Qwen variant) on common leaderboards, questioning novelty. This raises the need to evaluate not just raw accuracy but mobile-centric metrics (e.g., tokens/sec on CPU/NPU, peak RAM, model size after 4/8-bit quantization, and energy per token) and any R1-style reasoning gains if applicable.
- Deployment interest: requests for a GGUF build suggest users want llama.cpp compatibility and fast quantization (e.g., Q4_K_M/Q8_0) for edge devices, enabling practical tests on laptops and phones without GPU, and facilitating apples-to-apples comparisons of throughput and memory footprint versus other sub-1B models.
- A list of models released or updated last week on this sub, in case you missed any - (12 Sep) (Score: 273, Comments: 32): Weekly roundup highlights: Qwen3-Next-80B-A3B introduces a sparsely-activated 80B MoE with ~3B params active per token (reported ~10× faster inference, 32k+ context) HF release; MiniCPM4.1-8B adds hybrid reasoning (/think vs /no_think) with long context HF; Jan-v1-2509 claims improved reasoning/creativity evals HF; and PyDevMini-1 (4B) claims GPT-4-level Python/Web-Dev performance at 1/400th the size HF. Speech/TTS: Qwen3-ASR (API-only, multilingual EN/CN + 9) demo and IndexTTS-2.0 (expressive, duration-controlled zero-shot TTS) repo. Reasoning/MoE and research: Aquif-3 series (incl. 17B a2.8B GGUF) HF, ROMA reports wins over closed platforms on SEAL-0/FRAMES GitHub, Baidu's Ernie X1.1 targets frontier Chinese capability post; datasets include FinePDFs (3T tokens; 0.5B+ PDFs) HF and LongPage (300 novels with reasoning traces) HF. Comments request llama.cpp support for Qwen Next and flag contemporaneous releases: Kwai-Klear's Klear-46B-A2.5B-Instruct link and inclusionAI's Ring-mini-2.0 link.
- Interest in llama.cpp support for Qwen indicates demand for GGUF quantization and lightweight CPU/GPU inference of Qwen-family models via llama.cpp's kernels (e.g., cuBLAS/Metal/Vulkan). Integration typically hinges on tokenizer/chat template compatibility (Qwen often uses ChatML; see the sketch after this list) and rotary/pos-embed variants; tracking llama.cpp PRs would clarify when full Qwen parity lands (llama.cpp, Qwen HF).
- A commenter flags the release of Kwai-Klear/Klear-46B-A2.5B-Instruct "exactly 7 days ago." The naming suggests a Mixture-of-Experts style model with ~46B total parameters and ~2.5B active per token (typical "A2.5B" convention), targeting instruction tuning; if accurate, it could offer latency closer to a small dense model while retaining higher capacity. Benchmarks vs Mixtral-style MoEs would be valuable.
- Additional mention of inclusionAI/Ring-mini-2.0 highlights an updated compact instruct model. For technical evaluation, readers would want perplexity and downstream benchmarks (e.g., MMLU, GSM8K) and quantization availability (GGUF/int8) to assess suitability for edge deployment within the ~1-3B class.
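Since "ChatML compatibility" comes up above, here is a minimal sketch of the wrapping Qwen-family chat models expect; in practice you would rely on the tokenizer's bundled chat template (e.g., transformers' apply_chat_template) rather than hand-rolling the string.

```python
# Sketch of ChatML formatting used by Qwen-family chat models (illustrative).
# Prefer tokenizer.apply_chat_template, which reads the model's own template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize GGUF in one sentence."},
]

def to_chatml(messages: list[dict]) -> str:
    turns = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    return "".join(turns) + "<|im_start|>assistant\n"  # generation prompt

print(to_chatml(messages))
```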
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Seedream/Seedance 4.0 Image Model Releases and Benchmarks
- Seedance 4.0 is so impressive and scary at the same time… (all these images are not real and don't exist btw) (Score: 374, Comments: 77): Post showcases "Seedance 4.0" purported image-generation results that are claimed to be entirely synthetic and photorealistic ("all these images are not real"). No technical artifacts are provided (no model/architecture details, training data, safety or watermarking scheme, or quantitative evaluations such as FID or precision/recall), so fidelity, robustness to detection, and provenance guarantees cannot be assessed from the post alone. Top comments voice skepticism about post-release astroturfing/"organic" marketing around new models; otherwise there's minimal technical discussion.
- Multiple commenters position Seedance 4.0 as the current top text-to-image model, with Nano Banana cited as a close second; others are perceived to lag notably in prompt adherence and photorealism. No quantitative benchmarks were provided, but the consensus emphasizes superior baseline quality and consistency for Seedance across similar prompts.
- A technical trade-off is highlighted: Seedance 4.0 tends to produce highly consistent outputs for similar prompts (lower variance), whereas Nano Banana yields greater diversity/variance in generations. This suggests different sampling/regularization behaviors (e.g., tighter prompt-to-image mapping or stronger mode preference in Seedance), which could favor Seedance for reproducibility while making Nano Banana better for exploratory ideation.
- Seedream 4.0 is the new leading image model across both the Artificial Analysis Text to Image and Image Editing Arena, surpassing Google's Gemini 2.5 Flash (Nano-Banana), across both! (Score: 242, Comments: 86): Post claims that Seedream 4.0 is now ranked top-1 on the Artificial Analysis (AA) Text-to-Image and Image Editing Arenas, surpassing Google's Gemini 2.5 Flash (the Arena entry referred to as "Nano-Banana") across both tasks. AA leaderboards are ELO-style, pairwise preference battles, so this implies Seedream 4.0 leads in head-to-head prompt-following generation and localized editing quality under AA's crowd/evaluator setup (Artificial Analysis, Gemini models overview). Commenters note that holding the #1 spot in both generation and editing simultaneously is uncommon and impressive; there's also community speculation/hope that an open-weights model from Chinese labs could soon overtake closed systems in at least some domains.
- Seedream 4.0 topping both the Artificial Analysis Text-to-Image and Image Editing arenas, surpassing Google Gemini 2.5 Flash (Nano-Banana), signals strong cross-task generalization and instruction-following. Editing leaderboards stress localized edits, identity preservation, and low over/under-edit rates; being #1 across both suggests robust control as well as generative quality. See the arenas on Artificial Analysis for pairwise results.
- Debate on benchmarks vs subjective testing: arena rankings are typically derived from pairwise human preference with ELO-style scoring, which can diverge from small-sample personal tests. As one user notes, "well it sucks in my testing, benchmarks/leaderboards aren't everything," highlighting that leaderboard wins reflect aggregate preference, not every prompt distribution; reproducible eval with fixed seeds and public prompt sets can help reconcile discrepancies.
- Safety/moderation trade-offs raised: heavier filtering pipelines (classifier cascades, prompt sanitization, rejection sampling) can increase refusal rates and degrade edit success on benign edge cases. Tightly moderated stacks (e.g., some Google deployments) may reduce NSFW/abuse risk but also harm instruction-following and throughput/latency, which can impact arena win-rates in instruction-heavy image editing.
- 1GIRL QWEN v2.0 released! (Score: 353, Comments: 49): Release of 1GIRL QWEN v2.0, a LoRA fine-tune targeting the Qwen-Image pipeline, claiming improved realism for single-girl renders. Download via Civitai: https://civitai.com/models/1923241?modelVersionId=2203783; preview: https://preview.redd.it/mhrk7biqbhof1.png?width=763&format=png&auto=webp&s=b38072a5a786614d2bc53677dfcc8429544adfb7. The post provides no training details (e.g., rank, dataset, steps) or benchmarks; "one of the most realistic" is a qualitative claim without quantitative evals or comparison baselines. Top comments question promotional framing ("yet another instagirl ad") and note apparent vote manipulation before stabilization; one asks if the model is "uncensored," implying interest in safety filter/NSFW gating and whether the LoRA bypasses base-model content controls.
- A commenter asks for concrete LoRA training details for this release, planning to train locally on an RTX 4080 Super (16 GB VRAM) with 32 GB RAM. They note prior success fine-tuning SDXL and are switching to Qwen citing its faithfulness to prompt details, seeking specifics on the training pipeline and settings to replicate comparable fidelity.
- Another user asks whether the release is uncensored (i.e., NSFW-enabled/no safety filters). This impacts applicability for local deployments and parity with community LoRAs versus filtered or "instruct"-style checkpoints that may suppress certain outputs.
- One comment flags a visible anatomy/proportion artifact ("second picture thigh larger than torso"), implying the model or LoRA may still exhibit common generative failures in body proportions. This points to potential dataset bias or insufficient constraint during fine-tuning affecting structural consistency in outputs.
- Control (Score: 248, Comments: 47): A demo showcases a control pipeline combining "InfiniteTalk" (speech-driven facial/lip-sync animation) with "UniAnimate" (controllable video animation for body/hands) to perform dubbing in a video-to-video workflow. Facial realism is highlighted as the strongest aspect, but exact frame/pose parity with the source is not maintained; the output exhibits slight motion drift, indicating temporal consistency and movement-locking limitations in the current setup. Commenters praise the facial performance and ask for implementation details on fusing UniAnimate with InfiniteTalk while preserving exact movements; one suggests scrutinizing hand consistency (e.g., "follow the rings on her right hand") to detect subtle control or artifact issues.
- Several users are trying to combine Unianimate with Infinite Talk for video-to-video dubbing, but report that Infinite Talk's output drifts from the input motion (i.e., doesn't preserve exact pose/gesture timing). The core technical issue raised is 1:1 motion/temporal lock, maintaining identical per-frame movement while replacing speech, implying a need for strict frame-rate parity, deterministic seeds, and motion/keypoint control across the pipeline to avoid resampling or retiming artifacts.
- Multiple requests for a detailed workflow indicate missing implementation specifics (e.g., capture FPS, motion control signals, seed/temperature settings, how face/hand control is applied, and where audio-driven lipsync is injected in the graph). Without these, replicability is limited and viewers can't assess whether the pipeline uses pose control (e.g., keypoints/optical flow) versus post-process retiming to align lip motions.
- A visual audit cue is suggested: "follow the rings on her right hand," implying hand jewelry as an unintentional motion-tracking marker. This is a practical technique to detect temporal inconsistencies or compositing: if rings exhibit unnatural jitter/warping or timing offset relative to body pose, it hints at imperfect motion preservation or stabilization in the generation pipeline.
- Lol. I asked ChatGPT to generate an image of the boyfriend it thinks I want and the boyfriend it thinks I need (Score: 2532, Comments: 651): OP used ChatGPT's image generation to create a two-panel "boyfriend I want vs boyfriend I need" image. One panel reportedly shows a man with an "AI safety" book, indicating a likely hallucinated text element and/or alignment-biased content insertion, an example of how generative models can misinterpret abstract prompts and inject safety-themed or on-trend concepts. While non-technical, it highlights model priors and text-in-image artifacts common in systems like DALL·E 3. Comments note the odd inclusion of an "AI safety book" and suggest GPT "misunderstood something," while OP says the result isn't wrong, reflecting mixed reactions to the model's interpretation rather than its rendering quality.
- Several users note the model inserting unexpected, legible text elements (e.g., an "AI safety book") into the generated image, suggesting safety-tuning priors can leak into content selection and that the image model has relatively strong text rendering compared to earlier diffusion models that often garbled words. See examples shared in-thread: https://preview.redd.it/3z4sje4t8jof1.png?width=1536&format=png&auto=webp&s=027ee8ad4f9b77efa58d4750ad3be7d5f5d18ec6 and https://preview.redd.it/v6cyf3q3viof1.jpeg?width=1176&format=pjpg&auto=webp&s=802e364f3a14b0f3cf2fd7fd2e68bd0f742e9319.
- Comments imply the prompt was interpreted via common internet tropes ("the boyfriend you want vs the boyfriend you need"), producing archetypal contrasts rather than personalized outputs, highlighting that, without explicit attributes or constraints, prompt following defaults to generic priors and can feel like a misinterpretation or a "roast." This reflects typical behavior of safety-aligned, instruction-following image models that prioritize safe, broadly acceptable compositions over user-specific nuance.
2. UK Government AI Adoption Coverage
- AI is quietly taking over the British government (Score: 3012, Comments: 171): The post's image (https://i.redd.it/7b5t3z8bbiof1.png) appears to insinuate that UK House of Commons/government text is AI-generated, but it provides no technical evidence (no model/version, deployment details, usage metrics, or sourcing). There are no benchmarks or audits, just a screenshot-level claim, so the most plausible technical interpretation is routine use of LLMs (e.g., ChatGPT/Copilot/Grammarly) for proofreading or drafting assistance by staff rather than any system-level automation or policy change. Top comments push back that the title is sensational; they argue it's common for professionals to use AI for proofreading and that this doesn't equate to AI "taking over." Another comment mocks the claim, implying the presented "verbiage analysis" is unconvincing and not evidence-based.
- Multiple commenters note official, time-bounded adoption: the UK government received a free Microsoft 365 Copilot trial from Oct-Dec 2024 (The Register), and in Jan 2025 the Labour government published a blueprint to scale AI across departments (gov.uk). This suggests any spike in "AI-like" phrasing aligns with sanctioned M365 Copilot use (Word/Outlook/Teams) rather than covert takeover. The timing undermines the "quietly" claim and frames it as an official, enterprise rollout.
- Methodology critique: attributing text to ChatGPT via "crucial verbiage" or stylistic markers is unreliable; AI text detection has high false-positive/negative rates and is easily gamed. One comment observes the signal correlates more with when Labour took office than with ChatGPT availability, implying a communications-style shift as a confounder. A more rigorous approach would control for administration change (e.g., difference-in-differences across departments and pre/post periods; a toy sketch follows this list) and validate against ground-truth authorship.
- Practitioners emphasize assistive usage: civil servants likely use AI for proofreading/summarization and "linguistic verification" rather than wholesale content generation. In an M365 Copilot context, that maps to rewrite/summarize/proof features embedded in Word/Outlook, which augment throughput without "taking over" roles; measuring adoption by presence of generic phrasing alone risks overstating automation.
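To make the difference-in-differences suggestion concrete, a toy sketch with entirely made-up numbers: compare the change in "AI-like phrasing" rates in departments that received the Copilot trial against departments that did not, netting out the shift common to both (e.g., the change of administration).

```python
# Toy difference-in-differences estimate; all numbers are hypothetical.
# Rates are "AI-like phrasing" hits per 1k documents, before/after the
# administration change, in trial vs. non-trial departments.
rates = {
    ("copilot", "before"): 4.0, ("copilot", "after"): 9.0,
    ("control", "before"): 3.8, ("control", "after"): 5.2,
}

treated_delta = rates[("copilot", "after")] - rates[("copilot", "before")]  # 5.0
control_delta = rates[("control", "after")] - rates[("control", "before")]  # 1.4

# Subtracting the control group's shift removes whatever changed for everyone
# (new government, style guides), leaving the effect attributable to the trial.
did_estimate = treated_delta - control_delta
print(f"DiD estimate: {did_estimate:+.1f} hits per 1k docs")  # +3.6
```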
3. ChatGPT Ads, Gemini 3 Release Delay, and Feature Gap Debate
- Enjoy ChatGPT while it lasts…. the ads are coming (Score: 2375, Comments: 163): OP argues that consumer LLM assistants (ChatGPT/OpenAI, Perplexity, Anthropic) will inevitably monetize by embedding ads into responses, risking covert promotional steering and surveillance-style targeting within the chat UX. Technical concern centers on contamination of model outputs via sponsored prompts/formatting, tier-based gating (free vs paid), and resultant erosion of trust/accuracy in assistant recommendations. The thread frames a conflict-of-interest risk where ranking/generation becomes ad-influenced rather than relevance/faithfulness-driven. Top comments debate acceptability of ads only on free tiers vs unacceptable for Plus/Pro; suggest subscriptions or other offsets instead of ads due to trust/accuracy headwinds; warn that influence may be organic/subtle rather than explicit ad units, making it harder to detect.
- Hidden "organic" ad steering is technically feasible via alignment data and system-level policies: a provider could bias GPT-4o/ChatGPT recommendations by mixing advertiser-favored samples into RLHF/instruction-tuning, or by adding retrieval/ranking priors that prefer sponsored entities, leading to subtle product slant without explicit ad labels. This is analogous to search ad blending where paid results are ranked alongside organic; with LLMs, the bias manifests in generated prose and tool-use choices, making disclosure and reproducibility harder to audit.
- Several users flag data-contamination risks: if open-source models train on web corpora increasingly polluted by ad-influenced LLM outputs, bias amplifies over time. This mirrors the model self-consumption failures documented in "The Curse of Recursion" (Shumailov et al., 2023), where training on model-generated data induces distribution shift and degradation; ads would act as a targeted poisoning signal that propagates into future checkpoints (see https://arxiv.org/abs/2305.17493).
- Evidence of link-level attribution/tracking: ChatGPT-shared URLs can include affiliate/UTM-style parameters (e.g., `utm_source`, `ref`, or partner IDs), enabling downstream sites to attribute traffic and enabling the model provider to run CTR/A/B experiments (a sketch of inspecting such parameters follows this list). While not an ad per se, this instrumentation creates a measurement channel that could be repurposed for sponsored ranking or revenue share and folded back into retrieval/ranking training via click logs.
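As an illustration of the instrumentation described above, a small sketch that inspects a shared URL for common attribution parameters (the key names checked here are just the usual conventions named in the bullet, not a documented list):

```python
# Sketch: surface common attribution/tracking parameters in a shared URL.
from urllib.parse import urlparse, parse_qs

TRACKING_KEYS = {"utm_source", "utm_medium", "utm_campaign", "ref", "partner"}

def attribution_params(url: str) -> dict[str, list[str]]:
    query = parse_qs(urlparse(url).query)
    return {k: v for k, v in query.items() if k.lower() in TRACKING_KEYS}

url = "https://example.com/item?utm_source=chatgpt.com&ref=partner123&color=blue"
print(attribution_params(url))
# {'utm_source': ['chatgpt.com'], 'ref': ['partner123']}
```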
- Why haven't all the other companies (Google, OpenAI, Deepseek, Qwen, Kimi and others) added this before? It's literally the most obvious and most needed thing 🤔 (Score: 295, Comments: 51): OP shares an image implying a "new" chat feature for uploading/reading files (esp. PDFs) directly inside an LLM UI and wonders why others haven't shipped it. Multiple comments point out this capability has existed in ChatGPT since 2023 via Code Interpreter/Advanced Data Analysis, allowing users to attach PDFs/CSVs, run Python over them, and query document contents, so the novelty is likely UI polish rather than core functionality. See OpenAI's earlier releases: ChatGPT Plugins incl. Code Interpreter (Mar 2023) and the Advanced Data Analysis help doc. Commenters argue the feature isn't new ("who's gonna tell him"), and note that while ChatGPT's implementation works, the results on PDFs can be mediocre and the UI less refined compared to the screenshot.
- Multiple commenters note this isn't new: ChatGPT has supported file upload and document/PDF analysis since 2023 via Code Interpreter / Advanced Data Analysis (ADA), handling non-visual files well. However, results on complex PDFs are described as only "mid," with weaker formatting fidelity/table extraction and a more basic UI rendering compared to native viewers. Ref: OpenAI ADA docs, https://help.openai.com/en/articles/8554397-advanced-data-analysis.
- Feature parity exists across other stacks: Google Gemini, Microsoft Copilot, and DeepSeek already allow uploading files for analysis/summarization, so the capability isn't novel to one vendor. Gemini's API explicitly supports prompting with uploaded files (including PDFs) for multimodal processing: https://ai.google.dev/gemini-api/docs/prompting_with_files.
- ChatGPT may have saved my life (Score: 438, Comments: 55): OP reports persistent abdominal pain; ChatGPT elicited classic appendicitis triage features (right lower quadrant pain and rebound tenderness) and advised ER evaluation, where near-rupture appendicitis was apparently confirmed. The interaction mirrors simple clinical decision aids (e.g., the Alvarado score; a sketch follows this list) and bedside signs like McBurney's point and rebound tenderness, illustrating LLMs' ability to surface pertinent positives/negatives for urgent care despite not being clinicians. Top comments provide corroborating anecdotes: ChatGPT supplied reasonable differentials later aligned with clinician diagnoses and served as an explanatory aid during rehab; others argue its public-health benefits (triage and education) are underweighted relative to rare harmful uses. Additional anecdotes cite accurate preliminary identification of conditions in pets and children prior to formal diagnosis.
- Users report leveraging ChatGPT for differential diagnosis and triage-style reasoning: when appendicitis was suspected, it produced a ranked list of alternatives, one of which matched the hospital's final diagnosis; another user describes stepwise guidance to check gallbladder pain and to rule out emergent issues. This highlights utility as a patient-side decision-support tool that structures symptom review and next-step heuristics while deferring definitive diagnosis to clinicians.
- Several accounts emphasize evidence-oriented education and care planning: ChatGPT provided detailed explanations of conditions, probable recovery timelines, and curated stage-specific gastritis diets, including rationale on which foods are "gastritis safe," and guidance toward nutrient-dense options during reduced intake. One user notes it could surface and explain studies and mechanistic reasons behind recommendations, aiding self-management prior to an in-person appointment ~6 months out.
- Failure modes and safety practices are called out: despite being "rarely incorrect" on dietary safety, users still "caught it making false claims and assumptions," reinforcing the need to cross-check and treat outputs as advisory. Telemedicine later confirmed a suspected gastritis diagnosis, underscoring that ChatGPT can be a high-recall assistant for narrowing possibilities and education, but requires external validation and should not replace clinical testing or medical judgment.
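For context on the decision aid mentioned above, a sketch of the Alvarado (MANTRELS) score with its standard published weights; this is purely illustrative, not medical software.

```python
# Alvarado (MANTRELS) appendicitis score, illustrative only.
# RLQ tenderness and leukocytosis are weighted 2; the rest 1 (max 10).
WEIGHTS = {
    "migration_of_pain": 1, "anorexia": 1, "nausea_vomiting": 1,
    "rlq_tenderness": 2, "rebound_pain": 1, "elevated_temperature": 1,
    "leukocytosis": 2, "left_shift": 1,
}

def alvarado_score(findings: set[str]) -> int:
    """Sum the weights of the positive findings."""
    return sum(WEIGHTS[f] for f in findings)

# The OP's reported signs alone (RLQ pain + rebound tenderness) already score 3;
# history and labs complete the picture. Scores >= 7 are classically treated as
# high probability of appendicitis.
print(alvarado_score({"rlq_tenderness", "rebound_pain"}))  # 3
```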
AI Discord Recap
A summary of Summaries of Summaries by X.ai Grok-4
Theme 1: Fresh Models Flex Muscles in Arenas
- Qwen3 80B Crushes Sparsity Records: Qwen3 80B boasts 79.7B parameters with only 3.87B active due to 1:51.2 sparsity in its MoE, enabling efficient computation while maintaining high performance, as detailed in this X post. Members expressed optimism about its abilities, especially when compared to GPT-5, with a December 2024 knowledge cutoff and decent initial performance.
- Palmyra-Mini Packs Reasoning Punch: The Palmyra-mini family includes a base model and variants excelling in math tasks like GSM8K 82.9% and AMC23 92.5%, with one achieving top scores on AIME24, GPQA, and MATH500, available on Hugging Face. These compact open-source models from Writer focus on reasoning, sparking discussions on their potential for technical applications.
- FluentlyQwen3 Drops Universal LLMs: Project Fluently released FluentlyQwen3-1.7B and 4B models, merged after additional training under Apache-2.0 license, maximizing potential for diverse tasks as seen on Hugging Face. Users highlighted their efficiency on lower-end hardware, with links to FluentlyQwen3-1.7B for quick deployment.
Theme 2: Throughput Wars Heat Up Hardware
- GPT-OSS 120B Revs TPS Debates: Members debated GPT-OSS 120B achieving 30 TPS on a 4090 with 64GB RAM, while others capped at 10 TPS, prompting tweaks in llama.cpp like disabling top-k for better performance (a sketch follows this list). Optimizations like MXFP4 quantization and custom kernels yielded speed gains, with benchmarks in this Hugging Face post.
- DeepSeek Drags to Hour-Long Snails: DeepSeek faced reports of extreme slowness, with code generation taking 1 hour 20 minutes, speculated to stem from CCP-mandated Huawei chips impacting performance. Community contrasted this with open-source affordability at 1/5 the price of closed alternatives, emphasizing privacy benefits over lagging search capabilities.
- Gemma3 Builds from Scratch on A6000: A user trained Gemma3 270M from scratch on TinyStories for 10 hours using an A6000 GPU, logging with Weights and Biases and judging via Claude Opus 4.1, shared on GitHub and Hugging Face.
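As a hedged illustration of the top-k tweak mentioned in the first bullet, using the llama-cpp-python bindings (the model path and settings are placeholders; in llama.cpp's sampler, a top_k of 0 disables top-k filtering):

```python
# Sketch with llama-cpp-python; model path and parameters are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="gpt-oss-120b.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,                 # offload as many layers as fit on the GPU
    n_ctx=8192,
)

out = llm(
    "Explain MoE expert offloading in one paragraph.",
    max_tokens=256,
    top_k=0,          # 0 disables top-k filtering in llama.cpp's sampler
    temperature=0.7,
)
print(out["choices"][0]["text"])
```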
Theme 3: Training Tricks Tackle Data Dilemmas
- Two-Stage Curriculum Slashes Compute Waste: A two-stage training curriculum ranked a real dataset by difficulty, dropping its average difficulty from 2.5 to 0.8 after refining stage1 with unambiguous labels, improving signal focus as discussed in Unsloth AI (a minimal sketch follows this list). This method reduces wasted compute on easy examples, drawing from an upcoming paper on synthetic data tainting closed LLMs like Grok and Gemini at arxiv.org.
- Synthetic Data Poisons Closed-Source Giants: All closed LLMs suffer zero LTF factor from synthetic data training, requiring re-biasing and rebuilding latent thinking, as per a paper claiming performance hits in RLHF and instruct tuning. Members debated fixes like phased pretraining from TinyStories to FineWeb for 400M models, emphasizing inductive bias over long contexts.
- Fluid Nets Flow with Navier-Stokes: A paper explored Turing-complete neural nets via Navier-Stokes equations for fluid dynamics computing, sparking debates on mortality and unreproducibility versus efficiency, linked at arxiv.org. Parallels drawn to running Doom on gut bacteria in this video highlighted analog compute trade-offs.
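A minimal sketch of the two-stage curriculum idea, assuming a stage-1 model trained on unambiguous labels and a HF-style causal-LM interface (the function names and the 0.8 threshold are illustrative): score each real example by its loss under the stage-1 model, then keep or order by difficulty for stage 2.

```python
# Sketch: rank a real dataset by per-example loss under a stage-1 model.
import torch

@torch.no_grad()
def example_loss(model, input_ids: torch.Tensor) -> float:
    """Causal-LM loss of one tokenized example (HF-style model assumed)."""
    batch = input_ids.unsqueeze(0)
    return model(input_ids=batch, labels=batch).loss.item()

def rank_by_difficulty(stage1_model, dataset, easy_threshold: float = 0.8):
    scored = [(example_loss(stage1_model, ex), ex) for ex in dataset]
    scored.sort(key=lambda pair: pair[0])  # easy -> hard
    hard = [ex for loss, ex in scored if loss > easy_threshold]
    return scored, hard  # train stage 2 on `hard`, skipping wasted easy compute
```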
Theme 4: Deployment Demons Dog Engineers
- Docker Crashes H100 Party: Docker images working on 3090/4090 failed with CUDA errors on H100, resolved by updating incompatible NVIDIA drivers via data center drivers. Users reported similar woes with vLLM switching to uv pip, breaking Torch Nightly and forcing reverts to v0.10.1.
- IRIS Install Simplifies ROCm Chaos: IRIS installation streamlined to `pip install git+https://github.com/ROCm/iris.git`, requiring ROCm + Torch + Triton + TorchDistributed, demonstrated in this video. This aids AMD competitions, contrasting NVIDIA's 215 B200 GPUs for the Oct 24 SF hackathon via compute form.
- PSU Transients Trip GPU Stability: Calculations for PSU wattage factored CPU, GPU, and 50% overhead to avoid transients causing crashes, especially on 30-series cards, referenced in Teknium1's tweet. Users fixed "dead" secondary GPUs by cleaning PCI-E connectors, suggesting power issues over hardware failure.
Theme 5: Tools Twist Creative and Coding Flows
- Kimi K2 Reigns in Creative Brainstorms: Kimi K2 topped charts for creative writing alongside GPT-5 Medium and Qwen3-Max, with users joking it trained on Archive of Our Own for immersive outputs. Integrations like Augment Code with Groq outperformed Gemini in coding, praised for token efficiency at $1/m in and $3/m out.
- Cursor Pricing Sparks Ultra Upgrades: Cursor pricing changes dropped usage from a month to under 4 days, but the Ultra tier offers $400 of API access from providers, easing frustrations over Auto limits. Background agents parsed edits with strict tagging, drawing comparisons to Claude's Agents for task execution.
- DSPy Sections Defy Exact Counts: DSPy struggled to generate exactly 12 sections in lesson plans, often producing 13-15 even with GPT-5, fixed by first creating titles then fleshing out. Modaic launched as a DSPy-inspired hub with SDK on PyPI for building and optimizing declarative AI programs.
Discord: High level Discord summaries
Unsloth AI (Daniel Han) Discord
- GPT-OSS 120B Sparks Throughput Debate: Members debated the achievable throughput for GPT-OSS 120B, with some claiming 30 tokens per second (TPS) on a 4090 with 64GB RAM, while others struggled to exceed 10 TPS, leading to discussions about quantization and build configurations.
- Experimentation and tweaks in llama.cpp settings, such as disabling `top-k` and optimizing build configurations, are suggested for improved performance.
- Telemetry Collection Raises Eyebrows: Members discovered a telemetry script in Qwen Code models pointing to an Alibaba server without prior notification.
- This discovery sparked a discussion about data privacy and control, with some members expressing discomfort about their code being potentially transmitted for training purposes, though mostly in a joking tone.
- Two-Stage Training Cuts Training Time: A member described using a two-stage training curriculum to rank a real dataset by difficulty, based on loss from a tailored stage1 dataset with unambiguous labels.
- This approach aims to improve training signal and reduce wasted compute by focusing on more difficult examples, with the average difficulty of the real dataset dropping from 2.5 to 0.8 after refining stage1.
- Docker Woes Plague H100 Deployments: A user reported CUDA errors when running a Docker image (that worked on 3090/4090 GPUs) on an H100 GPU, even after rebooting and with seemingly compatible CUDA and Torch versions.
- It was determined that the NVIDIA driver version installed in the Docker image was incompatible with the H100, requiring a driver update to resolve the issue; NVIDIA Data Center Drivers.
- Synthetic Data Taints Closed-Source LLMs: A member shared a finding from an upcoming paper (https://arxiv.org/html/2509.05276v1) suggesting that all closed-source LLMs (Grok, Gemini, GPT, etc.) are trained with synthetic data leading to a zero LTF factor and inability to humanize text.
- They claimed models trained with RLHF, synthetic data, or instruct tuning will likely suffer performance hits due to needing re-biasing, latent thinking rebuilding, and relearning speaking patterns.
Perplexity AI Discord
- Perplexity Finance Goes Mobile: Perplexity Finance is now available on iOS & Android, bringing financial insights to mobile devices.
- Users can now enjoy hotel loyalty support when making bookings through Perplexity.
- Comet Browser's Data Collection Sparks Debate: Users discussed Comet's data collection, with logs showing that Comet sends search suggestions as POST requests to Perplexity servers, even when DuckDuckGo is the search engine, sparking concern that Comet is more intrusive than Chrome.
- Claims arose that the CEO admitted it's designed to track and sell data, although the CEO denied this on X.
- Users Leaking Prompts from Top AI Apps!: Users confirmed that the prompts of top AI applications have been leaked and are available on GitHub.
- One user joked to just don't click here and you are safe with a warning about clicking dangerous image links, to which another responded, Is already there on GitHub LOL.
- Referral Frenzy Fuels Feud: Multiple users shared Perplexity AI Pro referral codes, including this link.
- Users also shared browser claim links, such as this one.
LMArena Discord
- Qwen3 80B Enters the Arena!: The new Qwen3 80B model has arrived in the arena, with a December 2024 knowledge cutoff and showing decent initial performance.
- Members expressed optimism about its abilities, especially when compared to GPT-5.
- Seedream 4ās Image Quality Sparks Debate: Initial results show that Seedream 4 is generating trash results on LM Arena compared to its predecessor, Seedream-3, as illustrated in uploaded examples.
- Conversely, some users report improved image quality with Seedream 4 on the Doubao platform, though access is currently limited to new Chinese users.
- Gemini 3 Remains MIA, Fuels Speculation: The community is eagerly awaiting the arrival of Gemini 3, GLM5, and DeepSeek r2, noting Google's current lag in text generation compared to both closed and open source initiatives.
- Polymarket estimates only a 42% chance of release by Halloween, suggesting a more realistic launch timeframe in late October or early November.
- DeepSeekās Performance Takes a Dive?: Users have reported extreme slowness with DeepSeek, with one instance of code generation reportedly taking 1 hour and 20 minutes to complete.
- Speculation suggests this may be due to the CCP mandating the use of Huawei chips, which could be negatively impacting overall performance.
- Open Source AI Champions Affordability and Privacy: The discussion highlighted that open source AI is significantly more affordable (1/5 of the price) and offers greater privacy compared to closed-source alternatives like OpenAI and Google.
- While American models may command higher prices due to superior performance, Chinese models like Qwen excel in e-commerce applications but lag in search capabilities, embodying a socialist approach to AI development.
HuggingFace Discord
- HF Inference Credits Cause Mass Panic: Users reported errors from Hugging Face's Inference Providers saying monthly credits were exceeded despite credits still being available.
- One member jokingly suggested to fix yo spending as the error may be related to their usage, rather than the platform itself.
- SmolvLM2 Shakes Up Video LMs: Members shared smolvlm2 (Hugging Face blog, related collection), a video LM designed to run efficiently on lower-end hardware.
- This model is well-suited for use on lower-end hardware.
- Kaggle Gives Away GPU Hours: A member pointed out that Kaggle offers 30 hours of GPU time each week as an alternative for fine-tuning.
- A member suggested using PEFT/LoRA to run fine-tuning on a Tesla T4 within Colab.
- Fluently Project Dumps LLMs: The Project Fluently team released new universal LLM models based on Qwen3 1.7B and 4B, which are available on Hugging Face under the Apache-2.0 license.
- The models were carefully merged after additional training to maximize their potential, including FluentlyQwen3-1.7B.
Cursor Community Discord
- Cursor turns into a Smart Resume Machine: A user found a novel use of Cursor as a smart resume and cover letter generator, with the tool now acting as a resume machine.
- This development sparked lighthearted banter, including jokes about AI domination and assurances of past friendly interactions with the AI.
- Cursor Pricing sparks community uproar: Users voiced discontent over recent Cursor pricing changes, with one user's usage dropping from nearly a month to under four days.
- Despite the cost concerns, one user upgraded to Ultra, citing access to approximately $400 worth of API usage from various providers, which is an improvement over being frustrated with Auto.
- Background Agents vs. Claude's Agents: A user questioned the similarity between Cursor's background Agents and Claude's Agents, particularly after an Agentics.org event described agents as specialized services executing specific tasks.
- Another user detailed Cursor's parsing of new edits and its strict tagging structure with cross-connected tags, which enables change tracking and relation display in the left panel.
- Netlify Account Mishap and Cursor: A user initially reported that Cursor deleted their Netlify account following a Netlify project deployment, which turned out to be unrelated as there was no integration.
- The user plans to investigate further by examining logs, confirming that there was no direct deletion command issued by Cursor.
- Cursor App struggles with Unauthorized Errors: A user reported experiencing unauthorized errors within the Cursor app, even after proper repository setup, illustrated by this screenshot.
- A member suggested re-adding the bot from the repository, pointing to this thread about background agent docker issues.
Moonshot AI (Kimi K-2) Discord
- Kimi K2 Powers Creative Writing: Members find Kimi K2, GPT-5 (Medium), and Qwen3-Max to be the top models for creative writing and brainstorming.
- A user jokingly questioned if Kimi K2 was specifically trained on Archive of Our Own (Ao3).
- Edit Feature Launches: A new edit feature has been deployed in Kimi K2.
- The new edit feature is hover-triggered and applies only to the latest prompt.
- Kimi + Groq beats Gemini, Debated GPT-5: Members found Kimi K2 (using Groq) outperforming Gemini in coding tasks.
- Opinions on GPT-5 were heavily debated, with some calling it trash and others praising it as the best model.
- Augment Code plus Kimi Make Great Team: The Augment code VS Code extension combined with Kimi K2 offers a productive programming setup.
- The integration enables access to models like GPT-5 within the Augment Code environment.
- Kimi Slides Feature Creates Buzz: The Kimi K2 slides feature provides an interactive preview of ongoing processes.
- Users appreciate the detailed process visibility, suggesting it enhances the overall user experience.
OpenRouter Discord
- Dropshipping is Pumping Out Profits: A user shared their experience with dropshipping, reporting consistent earnings of 3k-4k per day, suggesting it's more profitable than reselling because it scales without needing significant inventory.
- The user offered to share tips for success to those interested in learning more about dropshipping.
- Gemini API Giving Strange Responses: Users have noticed that the Gemini API is starting to give strange responses, seemingly ignoring instructions despite no code changes since last month.
- A member speculated that Gemini API might be getting lobotomized and quanted like hell to cut costs.
- OpenRouter's TPS Numbers Questioned: A user questioned if OpenRouter's TPS numbers are inflated, citing a 5-minute delay for a diff on a 100-line file.
- It was suggested that the user may have been routed to a slow provider or using a reasoning model, impacting the observed TPS.
- Skyrim Mod Installation Throws Error 401: A user reported receiving an Error 401 No auth credentials found when installing the Skyrim mod mantella on OpenRouter API.
- A member suggested creating a new API key and ensuring it's used correctly, or seeking support from the mod developers to resolve the authentication issue.
- Kimi-k2 Praised for Token Efficiency: Members had positive feedback regarding the open source model Kimi-k2, praising its token efficiency, conciseness, lack of sycophancy, and different style.
- While not as smart as larger closed-source models, Kimi-k2 offers low pricing on Groq at $1/m in, $3/m out, with very fast speeds.
Nous Research AI Discord
- Qwen3 80B Shows Sparse Prowess: The Qwen3 80B model has 79.7B parameters with only 3.87B active due to a 1:51.2 sparsity in its MoE, excluding shared parameters, according to this X post.
- This unique architecture allows for efficient computation while maintaining high performance.
- Hermes Gains Zero RL Powers via TypeScript: A user implemented a provider adapter interface in TypeScript for Nous Hermes to autonomously schedule RL jobs with Prime Intellect at regular intervals.
- The user joked the system was inspired by a dream to have Hermes solve immortality for their dog, demonstrating the potential for advanced AI applications.
- Discord Servers Seek Union: Members are exploring methods to bridge the NousResearch and Unsloth Discord servers using both asynchronous methods and more complex solutions with webhooks and interconnected bots.
- A member suggested integrating the servers into a new application using Compose to streamline the workflow, as shown in this image.
- Altman Hints at Deep Merge: Discussion surrounded Sam Altman's interview with Tucker Carlson, where some suggested that Altman's responses and third-person speaking style indicated a deep belief in the merge and its pursuit of immortality, drawing parallels from his 2017 blog post.
- The interview sparked conversations about the philosophical implications of AI and human integration.
- Researchers Probe LLM Preferences: A member shared a link to Valen Research's probing of LLM preferences and the related ArXiv paper, noting the terminology may be a bit complex to understand without reading the whole paper.
- Another member shared a related tweet.
Eleuther Discord
- Fluidic Neural Nets run on Navier-Stokes: A member shared a paper on running a neural network on a computer using Turing-complete fluid dynamics governed by the Navier-Stokes equations.
- Debate ensued about the practicality and efficiency of fluid-based computation, touching on its unique characteristics of mortality and unreproducibility, with a pointer to running Doom on gut bacteria.
- Gated Delta Rule Expressiveness Trade-Offs: Members questioned the expressiveness of the Gated Delta Rule, referencing Qwen's post and the RWKV-7 paper (https://arxiv.org/abs/2503.14456).
- Discussion covered the trade-offs between parallelization and expressiveness, with concerns that work on attention and mamba1/2 is limited by TC0, as well as this paper discussing limits of parallelism on complexity.
- Context is King for Long Sequence Lengths: A talk suggested that long context models perform better because longer sequence lengths enable more computation, contrasting with the classic Constant Time forward pass.
- Skepticism was raised, suggesting that inductive bias and optimization targets are more critical.
- Small Models Stumble on Long Tasks: A paper (https://arxiv.org/abs/2408.00677) measured the effect of scale and thinking on straightforward execution of long tasks, finding that smaller models fail faster in multi-turn scenarios.
- Even with 100% accuracy, small models degrade per-step accuracy over more turns when exposed to prior mistakes.
- TinyStories, Wiki, and FineWeb for Pretraining?: A member asked about pretraining a 400M model on FineWeb only versus Wiki + FineWeb, prompting a discussion on data mixing strategies.
- A phased training approach was recommended, starting with TinyStories, transitioning to Wikipedia, and then finishing with FineWeb to incrementally build skills.
GPU MODE Discord
- Nebius's B200s Bolster Bay Area Hackathon: Generous compute sponsor Nebius is providing 215 networked B200 GPUs for the SF hackathon on Oct 24, as detailed in the compute request form.
- Authorities on Multi-GPU programming will also be at the SF Hackathon on Oct 24 to assist attendees in pushing the boundaries of distributed computing.
- vLLM's pip switch breaks Torch Nightly: vLLM switched to `uv pip` to custom build with a pre-installed torch version, but it uninstalls nightly torch, breaking the environment.
- One user reverted to `v0.10.1` and the `python use_existing_torch.py` trick, but another confirmed that no longer works with the uv pip PR.
- Gemma3 Gets Ground Up Treatment: A user built Gemma3 270M from scratch using PyTorch and the TinyStories dataset, training for 10 hours on an A6000 GPU.
- They logged plots with Weights and Biases and used Claude Opus 4.1 as a judge, sharing links to the GitHub repo, and model weights on Hugging Face.
- IRIS Install Movie Hits the Big Screen: The install process for IRIS has been simplified and can be installed via `pip install git+https://github.com/ROCm/iris.git`, provided ROCm + Torch + Triton + TorchDistributed are installed.
- The user provided a video of a sample install.
- CuTeDSL's Calculation Clash with PTX Docs: A user found that the CuTeDSL value of the Swizzling atom for the TF32 datatype and Swizzle<3,4,3> is 32 but the PTX documentation value is 36.
- The user believes the CuTeDSL implementation is correct and provides images of their replication of examples using CuTe.
Latent Space Discord
- GPT-OSS Optimizations Accelerate: Vaibhav Srivastav highlights a Hugging Face blog post detailing optimizations like MXFP4 quantization, custom kernels, and tensor/expert parallelism for gpt-oss.
- These enhancements yield substantial speed improvements, supported by benchmarks and reproducible scripts.
- Palmyra-mini Models Released for Reasoning: Sam Julien unveils the Palmyra-mini family by Writer, compact open-source models tailored for reasoning, comprising a base model (palmyra-mini) and three variants, available on Hugging Face.
- The models demonstrate impressive performance, with one excelling in complex reasoning/math (GSM8K 82.9%, AMC23 92.5%) and another achieving top scores on AIME24, GPQA, and MATH500.
- Anthropic Publishes LLM Agent Engineering Guide: Anthropic introduces a practical engineering guide on crafting tools to enhance the capabilities of LLM agents.
- The guide emphasizes rapid prototyping, rigorous evaluation suites, clear success criteria, thoughtful tool descriptions, token-efficient context design, and the need to accept the non-deterministic nature of agents, accessible here.
- Cursorās Tab Completion Model Improved: Cursor has announced on Twitter that a new Tab completion model, trained with online reinforcement learning, is now the default on their website.
- The new model shows a 21% reduction in suggestions with a 28% increase in acceptance rate.
- Higgsfield Secures $50M Funding for AI Video: AI video startup Higgsfield announced a $50M Series A round led by GFT Ventures, achieving a $50M revenue run-rate within three months.
- The company is also launching Higgsfield Ventures to back AI-native Gen Z founders.
LM Studio Discord
- Download Speed Causes Crashes: Users report that downloads in LM Studio crash the app when they exceed the write speed of their SSD, and are seeking ways to limit download speed.
- The current download manager is barebones and users must find their own solutions within their OS.
- Flash Attention Falters: Users confirmed that flash attention is broken in Gemma models when using Vulkan.
- This is a known issue.
- Powering Precision for Peak Performance: A discussion on calculating necessary PSU wattage referenced a tweet and formulas accounting for CPU, GPU, and overhead.
- It was cautioned that transients can cause system crashes and that a 50% overhead is recommended, especially with older 30 series GPUs; a back-of-the-envelope sketch follows this list.
- Copilot's Constraints Confine Creators: Users sought prompts to bypass restrictions in Microsoft's Copilot to improve workflow.
- It was advised that safeguards are intentionally implemented and building a local agent with LM Studio might be a more sustainable solution.
- Dead GPU Comes Back to Life: A user seemingly fixed their dead secondary GPU by unplugging and cleaning the PCI-E power connector, suggesting a power-related issue, although TBD if this is fully resolved.
- Another user suggested updating chipset drivers when using Native ASPM with Nvidia 40/50 series cards.
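A back-of-the-envelope version of the PSU sizing rule discussed above, assuming the 50% headroom figure for transients (the 100W "other" allowance for board, drives, and fans is an assumption):

```python
# Rough PSU sizing per the rule of thumb discussed; numbers are illustrative.
def recommended_psu_watts(cpu_tdp: int, gpu_tdp: int,
                          other: int = 100, overhead: float = 0.5) -> int:
    """Base draw plus headroom for transient spikes (notably 30-series GPUs)."""
    base = cpu_tdp + gpu_tdp + other
    return round(base * (1 + overhead))

# e.g., a 150W CPU with a 350W 30-series GPU -> ~900W recommended
print(recommended_psu_watts(cpu_tdp=150, gpu_tdp=350))  # 900
```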
Modular (Mojo 🔥) Discord
- Mojo Dev Container Emerges for Development: Community members shared a dev container link on how to create a custom Mojo development environment using existing images and the Mojo package.
- The discussion focused on streamlining the setup process for developers to quickly get started with Mojo.
- ExplicitlyCopyable Switch Praised for Debugging: The switch from `Copyable` to `ExplicitlyCopyable` was lauded for its assistance in debugging recursive mutations of EmberJson trees.
- One user stated that knowing when and where things get copied has made this easy to debug.
- Modular & Oracle Cloud Partnership is a huge win: The community congratulated the Modular team on their partnership with Oracle Cloud, which was described as a huge win.
- The partnership is expected to bring increased resources and opportunities for the Mojo ecosystem.
- DPDK Library Use in Mojo Testing: Members explored using DPDK as a C library test case for Mojo's automatic C binding, given its comprehensive use of the C language and syntax.
- The extensive syntax and module linking in DPDK make it beneficial for testing Mojo's C binding capabilities, leading to a reevaluation of the necessity for a separate "c binding cli" in the short to mid term.
- Clang AST Parser Boosts Mojo Struct Handling: A member detailed using the clang AST parser to resolve macro sections for struct definitions, exemplified by `struct __rte_cache_aligned rte_mbuf`.
- Their aim is to enhance the generated AST JSON with added type information, transforming strings of types into proper AST nodes for visual debugging ahead of conversion to Mojo.
OpenAI Discord
- Albania Appoints Chatbot as Minister: Albania's recent appointment of a governmental chatbot as a minister became a real r/NotTheOnion moment.
- One member confirmed this strange news while another member seemed aghast.
- GPT-5 PDF Downloads Encounter Snags: A user reported issues with PDF downloads from GPT-5, encountering a "Failed to get upload status for /mnt/data/" error when attempting to download PDFs.
- The user is actively seeking insights or assistance to resolve this download issue specifically with GPT-5.
- Relational Prompting Unveils LLM Internals: A member introduced Relational Prompting, a technique where prompts ask the model to verbalize internal relationships between learned concepts, creating an interpretable map of its semantic space based on proximity, direction, and clusters, inspired by the paper Why Language Models Hallucinate.
- The suggested prompt is: Analyze the topic as vectors in a high-dimensional space. Describe which concepts are closest, which share directions, and which form clusters. Provide concise verbal justifications.
- Qwen-code differs from Qwen-coder: A user emphasized that Qwen-code is a distinct entity from Qwen-coder, clarifying potential confusion.
- Another user pointed out a gemini-cli fork that is also openai api compatible, offering 1000 free qwen prompts daily, describing it as a sweet deal.
- GPT-5 Codes Games From Scratch: A user expressed excitement about using GPT-5 to code games from the ground up in C++ on native Linux, underlining the detailed level of prompting required.
- Another user prompted ChatGPT to estimate its age based on active users and prompt frequency, resulting in a calculation of ~3,425 years of continuous AI time per calendar year.
Yannick Kilcher Discord
- Active Inference Faces Adoption Deficit: Despite its theoretical promise, active inference has seen little real-world application in AI, which has dampened interest and left it feeling intangible to some software engineers.
- A member expressed hope that the field will become more practical once researchers figure it out more.
- Machine Learning Street Talk Podcast Declines Technically: The Machine Learning Street Talk podcast is perceived to be less technical, with discussions veering into crankery territory.
- Although one member noted the decline from 2 years ago, they cited a technical example that still held up.
- fixupx Pre-Prints Provoke Criticism: The proliferation of pre-prints on platforms like fixupx.com sparks negative reactions due to perceived low quality.
- Further links include this one.
- HuMo paper gaslights community: Members suggest that the HuMo paper and its accompanying demo may have disinformation use-cases.
- One member pointed out that humo is Spanish for smoke (idiomatically associated with deception, as in vender humo), raising concerns about the model's potential misuse.
- Albania Installs AI Bot Minister: Albania is set to appoint an AI bot minister to tackle corruption, signaling growing interest in AI solutions for governance.
- The story was reported by Reuters.
DSPy Discord
- DSPy Section Generation Proves Difficult: A user reported difficulty generating an exact number of sections in DSPy lesson plans, with LLMs producing 13-15 sections instead of the requested 12, even with GPT-5.
- Joel Grus suggested generating the 12 section titles first, then fleshing each one out, to better control the section count (a minimal sketch of this approach follows at the end of this section).
- Databricks_genai plus DSPy for Fine-Tuning?: A community member inquired about using databricks_genai and DSPy to fine-tune a model served on Databricks.
- The question went unanswered, suggesting a possible lack of experience in this combination.
- ARC-AGI2 In-Context Training Collaboration Sought: A member is seeking collaborators for ARC-AGI2 research using in-context test time training, mirroring approaches on ARC-AGI1 but emphasizing in-context learning.
- The goal is to explore in-context learning limits on out-of-distribution tasks with limited data, acknowledging that the work would be invalid for the official challenge.
- DSPy Stream Templates Discussed: A user explored combining multiple DSPy output fields into a single, template-based output while retaining streaming capability.
- Ian suggested using a parent module with `def forward` (or `async aforward` for async) to modify the template and enable streamify, referencing the article Automatic System Prompt Optimization.
- Modaic Launches Declarative AI Hub: The Modaic team launched Modaic, a hub for declarative AI programming, inspired by DSPy, featuring primitives like metrics and optimizers.
- Modaic provides an SDK for building, composing, version controlling, and collaborating on DSPy programs, with its SDK available on PyPI and documentation on docs.modaic.dev.
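A minimal sketch of the title-first approach suggested above, assuming a configured DSPy LM; the signature and field names are hypothetical, not from the thread:

```python
import dspy  # assumes dspy.configure(lm=...) has been called beforehand

# Stage 1 pins the count by asking for exactly n titles; stage 2 expands each
# title independently, so the section count can no longer drift to 13-15.
class SectionTitles(dspy.Signature):
    """Produce exactly `n` lesson-plan section titles."""
    topic: str = dspy.InputField()
    n: int = dspy.InputField()
    titles: list[str] = dspy.OutputField(desc="exactly n titles")

class SectionBody(dspy.Signature):
    """Write the body for a single lesson-plan section."""
    topic: str = dspy.InputField()
    title: str = dspy.InputField()
    body: str = dspy.OutputField()

def lesson_plan(topic: str, n: int = 12) -> dict[str, str]:
    titles = dspy.Predict(SectionTitles)(topic=topic, n=n).titles[:n]  # clamp defensively
    write = dspy.Predict(SectionBody)
    return {t: write(topic=topic, title=t).body for t in titles}
```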
tinygrad (George Hotz) Discord
- tinygrad Documentation Praised: A member lauded tinygradās documentation for its usefulness and simplicity, repeatedly stating that things make sense.
- The straightforward documentation structure made it easier to grasp complex concepts.
- `assign` Operation Faces Scrutiny: The `assign` operation in tinygrad is under investigation after failing tests, with one user noting that assign on master is actually just broken, failing test here (#12131).
- The discussion revolves around whether `assign` should return a value similar to `store`, potentially necessitating refactoring in `rangeify` to resolve the identified issues.
- Contributors Tackle `__setitem__` Refactor: A contributor is working to remove `realize` calls from `__setitem__`, with the goal of consolidating multiple kernel calls into a single, more efficient kernel (code example).
- This refactoring aims to transform individual `__setitem__` calls into a single kernel execution, accumulating all assignments to reduce kernel launch overhead and improve performance.
- GEMM TFLOPS Benchmark Target Debated: Users debated whether the bounty target of 165+ TFLOPS GEMM (matching torch) with a multistage kernel on a 4090, in FP16 or BF16 with FP32 accumulation, is feasible given the RTX 4090's theoretical throughput.
- Concerns were raised that unless the actual clock speed exceeds the boost clock, reaching the target TFLOPS may be unrealistic (see the back-of-envelope check at the end of this section).
- tinygrad Company Meeting Scheduled: A member inquired about the next company meeting, expressing interest in attending if possible.
- The meeting is scheduled for Monday at 9 am San Diego time.
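For the boost-clock caveat above, a back-of-envelope check; the SM count, per-SM tensor throughput, and boost clock are assumed from NVIDIA's public RTX 4090 specs:

```python
# Dense FP16 tensor math with FP32 accumulate on an RTX 4090 (public specs):
sms = 128                # streaming multiprocessors
fmas_per_sm_clk = 256    # FP16 FMAs per SM per clock with FP32 accumulate
boost_ghz = 2.52
tflops = sms * fmas_per_sm_clk * 2 * boost_ghz / 1e3   # 2 FLOPs per FMA
print(tflops)            # ~165.2, so "165+" means sustaining at least boost clock
```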
aider (Paul Gauthier) Discord
- RepoMap Benchmarks Raise Eyebrows: Concerns were raised about RepoMap artificially inflating pass rates in benchmarks, suggesting that ārepo map results are not comparable to non repo map results.ā
- Itās believed that RepoMap enhances confidence in weaker models by providing relevant context within their window.
- Real-World Benchmarks Demand Revision: A call for benchmarks reflecting real-world model experience highlighted that an automation task was only achievable with gemini-2.5-pro.
- This suggests current evaluation approaches need revision to reflect true performance, as Gemini 2.5 Pro outperformed all the others on that task.
- Aiderās Power Boost from RepoMap: RepoMap enhances LLM understanding by providing context like filenames and function signatures.
- One user advocated using RepoMap in Aider for more accurate real-world benchmarking, though they noted discrepancies between benchmark results and actual code scenarios.
- Aider's C to Rust Capers Cause Confusion: A user driving aider from a Python script to migrate C code to Rust hit problems with aider navigating and reading the C files.
- They sought guidance on how to use aider properly for this workflow.
- Asking Aider to Always /ask: A user seeks to configure aider to consistently start in /ask mode, potentially via a YAML config.
- Solutions proposed include running `aider --chat-mode ask`, or creating an `ask.yml` config file containing `chat-mode: ask` and then running `aider -c ask.yml`.
Manus.im Discord
- WordPress ditches PHP for React.js: A member inquired about converting a WordPress website to Next.js for hosting on Vercel, noting the shift from PHP to React.js.
- Another member suggested cloning the website using Manus or other AI tools as an alternative.
- Basic Plan Subscribers Lament Top-Up Troubles: A Basic Plan subscriber expressed dissatisfaction with the removal of the option to buy extra credits, which forces users to upgrade even for small needs.
- They requested that Manus AI reconsider reopening top-ups for Basic users, emphasizing the importance of flexibility.
- Mount Users are Short-Changed Free Credits: A new user reported not receiving the standard 1,000 free credits upon creating an account on Mount, despite the websiteās claim.
- No resolution or further information was provided in the discussion.
- Manus in Search of Universal Knowledge: A member asked if Manus can pull information from all chats to interlink knowledge from each chat/task for universal usage.
- No response or clarification was provided regarding Manusās knowledge interlinking capabilities.
- Users Lose Daily Credit Stipend: A user reported that their daily 300 credits had stopped being issued, prompting confusion.
- No solution or further information was provided in the discussion.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Unsloth AI (Daniel Han) ā· #general (1276 messagesš„š„š„):
GPT-OSS 120B, Qwen3 model, Local AI, llama.cpp, Telemetry collection
- GPT-OSS 120B Sparks Throughput Debate: Members debated the achievable throughput for GPT-OSS 120B, with some claiming 30 tokens per second (TPS) on a 4090 with 64GB RAM, while others struggled to exceed 10 TPS, leading to discussions about quantization and build configurations.
- Experimentation and tweaks in llama.cpp settings, such as disabling `top-k` and optimizing build configurations, are suggested for improved performance (a hedged Python example follows at the end of this section).
- Unlocking the Potential of Qwen3 Model: The community evaluated Qwen3, noting the challenges in finetuning and its tendency for glazing/emojis, but highlighting its competitive performance, especially the coder version, with some finding it comparable to GPT-5 for coding tasks.
- Members discussed that models differ in their sensitivity to quant levels depending on architecture, and speculated that long RL, stemmaxxing, and high sparsity may explain the strong long-context behavior.
- Telemetry Collection Raises Concerns: Members discovered a telemetry script in Qwen Code models pointing to an Alibaba server without prior notification.
- This discovery sparked a discussion about data privacy and control, with some members expressing discomfort about their code potentially being transmitted for training purposes, though much of the discussion was tongue-in-cheek.
- Navigating Local AI Setups: The group shared experiences and tips on setting up local AI environments, including optimizing llama.cpp builds, using techniques like CUDA architectures and RAM configurations, with one user detailing their journey of recompiling llama.cpp multiple times to improve performance.
- There's also the problem of running models across different setups, with questions like is there any lib versatile enough to support across Mac and Nvidia at the same time? and multi-agent reasoning stuff (Info: Nvidia NIM is straining and limiting lately).
- The Curious Case of llama.cpp Parameters: Users found inconsistencies in llama.cpp, with llama-server ignoring `top-k` and other settings, and suggested building a fresh compile to check whether parameters are being ignored.
- This sparked discussion on troubleshooting configuration issues when running local models on new experimental versions, which can be found at the deepwiki page.
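If rebuilding llama.cpp is a hassle, a hedged way to poke at the same sampler settings from Python via llama-cpp-python; the model path is a placeholder, and `top_k=0` disables top-k in llama.cpp's sampler:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Model path is hypothetical; top_k=0 disables top-k sampling, which makes it
# easy to check whether the setting is actually being honored.
llm = Llama(model_path="./model-Q4_K_M.gguf", n_ctx=4096)
out = llm("Say hi in five words.", max_tokens=32, top_k=0, temperature=0.7)
print(out["choices"][0]["text"])
```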
Unsloth AI (Daniel Han) ā· #introduce-yourself (3 messages):
Partnership Opportunity, Introduction of Anand
- Developer eyes Profit-Sharing Partnership: A software developer with 13+ years of experience is seeking a partner for a paid collaboration with good profit potential.
- Interested parties are encouraged to DM for more details about this non-free opportunity.
- Anand Introduces Himself as Aspiring Dev: Anand (https://github.com/Anand-0037), a CS student from India, introduces himself to the community.
- No additional details were provided.
Unsloth AI (Daniel Han) ā· #off-topic (125 messagesš„š„):
Promptwright DAG dataset generation, Curriculum two-stage training for datasets, RTX 3080 language model training speed, NVIDIA DGX Spark reservations, Android sideloading restrictions and alternatives
- Promptwrightās DAG Dataset Dance Debuts: A member announced a new experimental Directed Acyclic Graph (DAG) dataset seed generation algorithm in Promptwright, suitable for domain-specific distillation (teacher -> SLM) synthetics.
- They cautioned about potential curve balls when generating huge datasets due to limited testing.
- Two-Stage Training Triumphs in Trimming Training Time: A member described using a two-stage training curriculum that ranks a real dataset by difficulty, based on loss from a tailored stage-1 dataset with unambiguous labels.
- This approach aims to improve the training signal and reduce wasted compute by focusing on more difficult examples; after refining stage 1, the average difficulty of the real dataset dropped from 2.5 to 0.8 (a sketch of the ranking step follows at the end of this section).
- Estimating Empirically: Elucidating LM Parameters for RTX 3080: A discussion explored the size of language models trainable on an RTX 3080 with 1B tokens in under an hour, with one member suggesting GLM 4.5 Air (~10-15M params) might fit the bill.
- The member presented their reasoning for the parameter size estimates of the various model architectures.
- NVIDIA DGX Spark Speculation Sparks Shopping Sprees: A member shared a Reddit post about NVIDIAās DGX Spark, noting the FOMO title and wondering if CUDA will run on it out of the gate.
- Another member jokingly expressed a desire to purchase one despite being broke.
- Android Angst: Sideloading Sabotage Spurs Switching Speculation: Members discussed Googleās potential restrictions on Android sideloading, with some speculating it could lead users to switch to iPhones or explore alternatives like Ubuntu.
- One member pointed out that registration requirements may exclude developers from countries like Iran and Russia, while another highlighted that Appleās sideloading restrictions are also being influenced by the EU.
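A sketch of the difficulty-ranking step from the two-stage curriculum item above, assuming the stage-1 model's causal-LM loss serves as the difficulty score; `model`, `tokenizer`, and `texts` are placeholders:

```python
import torch

def rank_by_difficulty(model, tokenizer, texts):
    """Order examples easy-to-hard by the stage-1 model's loss on each."""
    model.eval()
    scores = []
    with torch.no_grad():
        for text in texts:
            ids = tokenizer(text, return_tensors="pt").input_ids
            scores.append(model(ids, labels=ids).loss.item())  # higher = harder
    return [t for _, t in sorted(zip(scores, texts), key=lambda p: p[0])]
```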
Unsloth AI (Daniel Han) ā· #help (125 messagesš„š„):
Unsloth's save_pretrained_merged method, Docker image compatibility issues with H100 GPUs, Deploying Unsloth models in production, GRPO with Qwen 4B, 4-bit BNB model deployment with vLLM
- Unsloth's save_pretrained_merged method: A user inquired about how to push a merged model to the Hugging Face Hub after LoRA fine-tuning with Unsloth, noting that the `model.save_pretrained_merged` method only saves locally.
- Another user suggested the `model.push_to_hub_merged` method, which takes the model name, tokenizer, and Hugging Face token as arguments, to push directly to the Hub without saving locally first (a sketch appears at the end of this section).
- Docker woes: 3090/4090 vs H100 incompatibility: A user reported CUDA errors when running a Docker image (that worked on 3090/4090 GPUs) on an H100 GPU, even after rebooting and with seemingly compatible CUDA and Torch versions.
- It was determined that the NVIDIA driver version installed in the Docker image was incompatible with the H100, requiring a driver update to resolve the issue; NVIDIA Data Center Drivers.
- vLLM for productionalized Unsloth models: A user asked about tutorials for deploying Unsloth models in production on platforms other than Hugging Face.
- Options suggested included using vLLM (vLLM documentation), SGLang, and Hugging Faceās hosting service, with one user specifically recommending vLLM due to its battle-tested nature.
- Unleashing the Power of Batching in llama.cpp: A user inquired whether it was possible to do batch inference using llama.cpp.
- Another user confirmed that llama.cpp server supports continuous batching by default.
- Tiny data? No problem!: A user sought advice on training a Llama3.2 model with a very small dataset (~214 conversations), expressing dissatisfaction with synthetically generated data.
- A member recommended using the instruct version and experimenting with LoRA r/alpha and hyperparameters such as the learning rate.
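For the merged-upload question above, a sketch of the suggested call; the adapter directory, repo id, and token are placeholders, and the `save_method` value should be checked against Unsloth's docs:

```python
from unsloth import FastLanguageModel

# Continuing from a LoRA fine-tune (paths and names hypothetical):
model, tokenizer = FastLanguageModel.from_pretrained("lora_model")
model.push_to_hub_merged(
    "your-username/llama-finetune",   # target repo on the Hub
    tokenizer,
    save_method="merged_16bit",       # merge adapters into 16-bit base weights
    token="hf_...",                   # HF write token
)
```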
Unsloth AI (Daniel Han) ā· #showcase (8 messagesš„):
Kimi-K2-Instruct (FP8), vllm plugin
- Kimi-K2-Instruct runs at 256xH20: A user reported stats for Kimi-K2-Instruct (FP8) running at 256xH20 TP16, which took 1.88s to start, 21.50s (2.99GiB) for the first run, and 34.49s (4.57 GiB) for the second run.
- vllm plugin or standalone?: A user asked whether Kimi-K2-Instruct (FP8) acts as a vllm plugin or standalone.
Unsloth AI (Daniel Han) ā· #research (8 messagesš„):
LLM inference determinism, Synthetic data in LLM training, Gemma 3 performance, AI humanizers scam
- LLM Overtraining Causes Deterministic Outputs: A member jokingly stated that they overtrained their LLM so much, that 90% of the time I regenerate my prompt, the output is the same implying that overtraining may lead to more deterministic outputs.
- They expressed surprise that, despite presumably being widely observed, the effect hasn't been thoroughly investigated.
- Synthetic Data Found in Closed-Source LLMs: A member shared a finding from an upcoming paper (https://arxiv.org/html/2509.05276v1) suggesting that all closed-source LLMs (Grok, Gemini, GPT, etc.) are trained with synthetic data leading to a zero LTF factor and inability to humanize text.
- They claimed models trained with RLHF, synthetic data, or instruct tuning will likely suffer performance hits due to needing re-biasing, latent thinking rebuilding, and relearning speaking patterns.
- Gemma 3 the Only Usable Models: The member proposed that the only models found usable are Gemma 3 (4B, 12B, and 27B), citing their excellent performance and lack of watermarks.
- Another member added that the dataset used was human (instead of synthetic).
- āAI Humanizersā are a hoax: The member claimed that all AI humanizers are a scam, often being 4o-mini with a special prompt, discoverable via prompt injection and HTTPS interception.
- Another member pointed out the nonsensical part is that Gemma 3 models are distilled from Gemini to begin with.
Perplexity AI ā· #announcements (1 messages):
Perplexity Finance on iOS & Android, Hotel loyalty support for bookings, Streamlined PDFs in Labs & Research modes
- Perplexity Finance Goes Mobile: Perplexity Finance is now available on iOS & Android, bringing financial insights to mobile devices.
- Loyalty Rewarded: Hotel Bookings Get Loyalty Support: Users can now enjoy hotel loyalty support when making bookings through Perplexity.
- PDFs Streamlined in Labs & Research: PDF handling has been streamlined in Labs & Research modes for a smoother experience.
Perplexity AI ā· #general (790 messagesš„š„š„):
Comparing Perplexity to ChatGPT and Gemini, Comet Browser, Perplexity Pro, Gemini Pro photo editing, AI Model Leaks
- Pro Users Debate Value of Perplexity vs ChatGPT: A user asked for a comparison of Perplexity Pro to ChatGPT Plus or Gemini AI Pro, receiving feedback that ChatGPT and Gemini have higher context and are suitable for heavy, complex tasks.
- Others noted Perplexity provides accurate answers with image and video generation, with some preferring Perplexityās style for simple searches, but that ChatGPT is superior for PDF analysis.
- Comet Browserās Data Collection is Hotly Debated: Users discussed Cometās data collection, with one user providing logs showing that Comet sends search suggestions as POST requests to Perplexity servers, even when DuckDuckGo is the search engine.
- This sparked concern that Comet is more intrusive than Chrome, with claims that the CEO admitted itās designed to track and sell data, although this was disputed by others citing the CEOās denial on X.
- Tips and Tricks for Optimizing Perplexity Pro: A new Perplexity Pro user asked for tips on optimizing their subscription, with one user recommending exploring the Comet Agent and its built-in adblock and AI summarization features, along with its customizable UI.
- Others added that Perplexity Pro provides unlimited use for pro search, 300+ deep research queries a day, and 50 labs a month.
- Photo Editing faceoff: Gemini Proās Edits Outshine Perplexity: One user reported that Gemini photo editing was amazing and on point, after providing a description to Gemini.
- The same user then used the same description in Perplexity, but Perplexity changed the whole image.
- Users Confirm Prompts for Top AI Apps Have Been Leaked: Users confirmed that the prompts of top AI applications have been leaked and are available on GitHub.
- One user joked just don't click here and you are safe as a warning about clicking dangerous image links, to which another responded, Is already there on GitHub LOL.
Perplexity AI ā· #sharing (13 messagesš„):
Perplexity AI Referral Codes, Shareable Threads, CaviraOSS/neuropilot
- Referral Frenzy Fuels Feud!: Multiple users shared Perplexity AI Pro referral codes, including this link.
- Users also shared browser claim links, such as this one.
- Shareable Threads Shamed!: The Perplexity AI bot reminded several users to ensure their threads are Shareable, with a link to instructions on how to do so.
- These automated messages were directed at users who may have posted content that wasnāt easily accessible to the broader community.
- Neuropilot Navigates New Horizons!: A user shared a link to the CaviraOSS/neuropilot GitHub repository.
- No further context was provided, but this suggests potential interest in the project within the community.
Perplexity AI ā· #pplx-api (1 messages):
anshuman_.9: hi
LMArena ā· #general (736 messagesš„š„š„):
Qwen3 80B, Seedream 4, Gemini 3, DeepSeek slowness, Open Source AI vs Closed Source AI
- Qwen3 80B Arrives in the Arena!: The new Qwen3 80B model has arrived to the arena, boasting a December 2024 knowledge cutoff and decent initial performance.
- Members shared excitement about this, and were optimistic about its abilities compared to GPT-5.
- Seedream 4 Image Quality Divides Users: Seedream 4 is generating trash results on LM Arena compared to its previous version, Seedream-3, as demonstrated in uploaded examples.
- Some reported seeing improved image quality with Seedream 4 on the Doubao platform, but access there is restricted to new Chinese users only.
- Gemini 3ās No-Show Fuels Speculation: Members are impatiently waiting for Gemini 3, GLM5, and DeepSeek r2, with some pointing out Googleās current lag in the text generation against closed and open source efforts.
- Polymarket shows only a 42% chance of a release by Halloween, with a more realistic launch window in late-October/early-November.
- DeepSeekās Server on Life Support?: Users reported that DeepSeek is experiencing extreme slowness, with one instance of code generation taking 1 hour and 20 minutes.
- This was speculated to be due to the CCP forcing them to use Huawei chips, with the loss of hardware independence impacting performance.
- Open Source AI Pushes for Price and Privacy: The discussion leaned towards open source AI being significantly cheaper (1/5 of the price) and more privacy-respecting than closed-source alternatives like OpenAI and Google.
- Members noted that while American models may have higher prices due to better performance, Chinese models like Qwen are really good at e-commerce, lag behind in search, and represent a socialist approach.
LMArena ā· #announcements (2 messages):
Hunyuan-image-2.1, Seedream-4-high-res
- Hunyuan-image-2.1 debuts in LMArena: The Hunyuan-image-2.1 model has been added to the LMArena chatbot.
- It is now available for community evaluation and comparison against other models.
- Seedream-4-high-res joins LMArena roster: The Seedream-4-high-res model is now part of the LMArena chatbot lineup.
- Users can test its capabilities and provide feedback on its performance.
HuggingFace ā· #general (185 messagesš„š„):
n8n freelance jobs, Transformer architecture fine-tuning, GPU for fine-tuning, OpenAI investing in Hugging Face, Local LLM Linux box parts
- n8n jobs are hot!: There are lots of freelance jobs on n8n these days, possibly because they canāt sell the systems.
- One user jokingly said that n8n probably prefers building the systems rather than selling them.
- Fine-tuning Transformers: Kaggle and Colab are key: One member is focusing on fundamentals of transformers architecture and fine-tuning, utilizing Kaggle and Colab for the task.
- When asked if they were fine tuning on their own PC, they confirmed they use Kaggle and Colab.
- HF Platform Glitches: Inference Credits Cause Uproar!: A user reported errors with Hugging Faceās Inference Providers exceeding monthly credits despite having credits available, prompting calls to fix the platform.
- Another member jokingly suggested to fix yo spending as the error may be related to their usage, rather than the platform itself.
- OpenAIās Bold HF Investment: A Cool $100B Idea: A user suggested that OpenAI should invest 100B into Hugging Face, to which another responded that they should send you for the pitch.
- One member expressed hope for more open-source models from HF and lamented platform errors, while someone else joked that they should receive the 100bn instead.
- SmolvLM2: Smallest Video LM Ever!: A member suggested trying smolvlm2 to another member, linking to the Hugging Face blog and the related collection.
- This model appears well-suited for use on lower-end hardware.
HuggingFace ā· #cool-finds (50 messagesš„):
Direct/Inverse FFT, QKV Calculations, Runaway Loss Value Recovery, Android Audio Implementation, NWaves DSP Library
- FFT Chat and Live Phone Streaming: A member mentioned doing a lot of direct/invNorm FFT and shared a proof of concept, adding "Oh Damn that's sick. I'm also doing a lot of direct / invNorm FFT - happy to chat!".
- They also mentioned that "Making it live-stream on a phone was a nuisance. It's all CPU Compute shaders in progress, ugh!"
- Bypassing QKV Calculations: A member noted that FFT-related stuff can approximate QKV calculations, but their implementation completely bypasses QKV.
- Another member then added, "That's a really nice sound… I love this kind of electronic music."
- Audio Debugging and Android Audio: A member discussed debugging audio signals, noting that they are working on an audio thing in Android and that the official Android music player is an `ExoPlayer`, which comes from the `Media3` package.
- They also linked to a YouTube video with amazing sound, mentioning it's a 30-second snippet of a Radiohead track.
- GPT's Role in Innovation: A member mentioned that after weeks of brainstorming with GPT, code for an innovation suddenly appeared, but they were cautioned: "Be careful, the wheel exists".
- Another member related that they asked GPT-5 "Does this exist?!", and it replied "No - you are newing the space".
- Reading Recommendations: In a discussion about books, a member recommended Thinking, Fast and Slow by Daniel Kahneman.
- Someone posted their favorite song here, adding "Such a wonderful song - I would happily die on this hill."
HuggingFace ā· #i-made-this (5 messages):
Hexagen.World Game, Aerelyth Intelligence, FluentlyQwen3 Models, Nano Banana Editor
- Hexagen.World Game Goes Live: A member released Hexagen.World, a Stable Diffusion game/social experiment.
- Aerelyth: Intelligence That Anticipates: A member is exploring Aerelyth on Hugging Face, framing it as a dialectical, agentic, CrossSphere intelligence designed to simulate futures and challenge its own logic.
- The key components of Aerelyth include a Dialectical Core, Agentic Cognition, CrossSphere Intelligence, Strategic Foresight Engines, and Emotional Fluency.
- Fluently Project Releases Universal LLMs: The Project Fluently team released new universal LLM models based on Qwen3 1.7B and 4B, which are available on Hugging Face under the Apache-2.0 license.
- The models were carefully merged after additional training to maximize their potential, including FluentlyQwen3-1.7B.
- Nano Banana Editor Gets Upgrades: A member posted a link to some upgrades to the Nano Banana Editor.
HuggingFace ā· #NLP (2 messages):
Paid collaboration, Freelance developers
- Paid Collaboration Opportunity Knocks: A member announced a paid collaboration opportunity for freelancers with at least one year of software development experience or some development knowledge.
- The opportunity is open to those based in Singapore, Malaysia, Japan, the Arab States, Saudi Arabia, Europe, or the Americas.
- Freelance Developer Search Begins: The collaboration seeks individuals with a background in software development, even if they are not currently active developers, as long as they possess at least one year of experience.
- Interested parties who meet the criteria are encouraged to directly message the member to explore potential collaboration.
HuggingFace ā· #smol-course (20 messagesš„):
Colab and HF Free Tiers for Fine-Tuning, Kaggle GPU Availability, Study Groups, PEFT/LoRA for Colab, DataCollatorForCompletionOnlyLM ImportError
- Colab and HF tiers sufficient for finetuning?: A member asked if Colab and HF free tiers are sufficient for fine-tuning tasks in the course without personal GPUs.
- Kaggle provides free GPU hours: A member pointed out that Kaggle offers 30 hours of GPU time each week as an alternative.
- Another member noted excitement about the course and catching up on current tools and techniques since 2021.
- Study groups forming to tackle course: Several members expressed interest in joining or forming study groups for the course.
- A member shared a link to contribute to discussions and doubts, with plans to organize activities as the group grows.
- PEFT/LoRA to the rescue in Colab: A member suggested using PEFT/LoRA to run fine-tuning on a Tesla T4 within Colab (a minimal sketch appears at the end of this section).
- Another member requested clarification on a code snippet in the āTraining with Tool Usageā section, specifically asking for an example dataset.
- Troubles with DataCollatorForCompletionOnlyLM: A member reported an `ImportError: cannot import name 'DataCollatorForCompletionOnlyLM' from 'trl'`.
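For the T4 suggestion above, a minimal PEFT/LoRA setup; the base model and target modules are illustrative choices, not course requirements:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# LoRA trains small adapter matrices instead of full weights, which is why
# fine-tuning fits on a 16 GB T4.
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-1.7B")
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of params train
```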
HuggingFace ā· #agents-course (2 messages):
First Hugging Face Course, Building First Agent
- User Starts First Hugging Face Course: A user mentions starting their first Hugging Face course and building their first agent.
- User Builds Their First Agent: The user is currently building their first agent as part of the course.
Cursor Community ā· #general (249 messagesš„š„):
Smart Resume, Cursor Pricing, Background Agents, Netlify account
- Cursor turns into a Smart Resume Machine: A user turned Cursor into a smart resume and cover letter generator.
- Another member joked about it being a step towards human domination, prompting others to remind the AI of their past friendly interactions.
- Cursor Pricing gets Questioned: Users expressed concerns about the recent changes to Cursorās pricing, with one noting their usage drastically reduced from almost a month to less than four days.
- Despite the cost, one user upgraded to Ultra, citing it provides access to around $400 worth of API usage from various providers, which is better than being frustrated with Auto.
- Background Agents Explored: A user asked if Cursorās background Agents are similar to Claudeās Agents after attending an Agentics.org event that described Agents as specialized services performing specific tasks.
- Another user described Cursorās parsing of new edits and its strict tagging structure with cross-connected tags, enabling it to note changes and display relations in the left panel.
- Cursor Deletes Netlify Account?: A user claimed that Cursor deleted their Netlify account after deploying their Netlify project, but later found out there was no actual integration from the IDE.
- The user shared they will further investigate and check logs before confirming the theory, adding there was no direct deletion command.
Cursor Community ā· #background-agents (4 messages):
Cursor unauthorized error, Background agent docker issues
- Cursor App faces Unauthorized Errors: A user reported receiving unauthorized errors despite the Cursor app being set up correctly in the repository, with an attached screenshot.
- Bot Re-Adding Remedy: A member suggested trying to unplug and re-add the bot from the repository to fix the unauthorized errors.
- They linked to a thread discussing background agent docker issues and expressed a desire for official communication on the matter.
- Docker Permissions: A user inquired about ensuring the user has Docker permissions in a manual VM setup, particularly after adding the Ubuntu user to the Docker group.
- They noted that while `newgrp docker` works in a shell, adding it to `.bashrc` causes the agent to hang on startup.
Moonshot AI (Kimi K-2) ā· #general-chat (203 messagesš„š„):
Kimi K2, GPT-5 (Medium), Qwen3-Max, creative writing, Ao3
- Kimi K2 still hot for creative writing: Some members find Kimi K2, GPT-5 (Medium), and Qwen3-Max to be the best for creative writing and brainstorming.
- One member asked, Is it just me or was Kimi K2 trained on Ao3?
- Users notice new Edit Feature Hover: Members noticed a new edit feature is here, but itās hover triggered.
- The edit feature only applies to the latest prompt.
- Coding Battle: Kimi vs Gemini vs GPT-5: Members discussed the best models for coding: Kimi (with Groq) outperforms Gemini (even paid) in every task.
- A member claimed that GPT-5 is trash whereas another said that GPT-5 is the best model and even price is pretty cheap.
- Augment and Kimi are the best set of tools: Members discussed how best to use the Augment Code VS Code extension combined with Kimi to become a pro programmer.
- Instead of being stuck with only one model, one can now use GPT-5 in Augment Code.
- Kimi Slides Feature triggers great user experience: A member discussed how having an interactive preview of what's happening is really important for LLM-based processes like the Kimi slides feature.
- They claim that kimi goes ALL THE WAY and shows you all the processes, which improves the feel compared to a bare here u go, done.
OpenRouter ā· #general (185 messagesš„š„):
Dropshipping, Gemini API's, OpenRouter API, Kimi-k2
- Dropshipping vs Reselling: A user shared their experience with dropshipping, reporting consistent earnings of 3k-4k per day, suggesting itās more profitable than reselling due to the ability to scale without holding significant inventory.
- They offered to share tips for success to those interested in learning more.
- Geminiās Responses are Strange: Some users have noticed that Gemini API is starting to give strange responses, not listening to instructions even without changing the code used since last month.
- Another member suggested it might be getting lobotomized and quanted like hell to cut costs.
- OpenRouter TPS numbers inflated?: A user complained about the slowness of the platform, questioning if the TPS numbers are inflated, citing a 5-minute delay for a diff on a 100-line file.
- It was suggested that the user may have been routed to a slow provider or using a reasoning model.
- OpenRouter API Error 401 on Skyrim Mod: A user reported getting an Error 401 No auth credentials found when installing the Skyrim mod mantella.
- A member suggested creating a new API key and ensuring itās used correctly, or seeking support from the mod developers.
- Kimi-k2: The efficient Open Source Model: Some users gave positive feedback on the open-source model Kimi-k2, praising its token efficiency, conciseness, lack of sycophancy, and generally different style.
- It was also said that while it might not be as smart as the big closed-source models, Groq's pricing is low at $1/M input and $3/M output tokens, at very fast speeds.
OpenRouter ā· #discussion (1 messages):
fn5io: https://openai.com/index/joint-statement-from-openai-and-microsoft/
Nous Research AI ā· #general (155 messagesš„š„):
Qwen 3 80B Model Details, TypeScript Provider Adapter Interface, Nous Hermes Agentic Oracle, Merging Discord Servers, Tucker Carlson Interview with Sam Altman
- Qwen3 80B Model: Sparse but Mighty: The Qwen3 80B model features 79.7B parameters with only 3.87B active per token, thanks to a 1:51.2 routed-expert sparsity in its MoE (excluding shared parameters), as discussed in this X post (an arithmetic check follows at the end of this section).
- TypeScript Interface Powers Hermes RL: A user created a provider adapter interface in TypeScript to enable Nous Hermes to operate as zero and schedule its own RL jobs with Prime Intellect at set intervals.
- Inspired by a dream, the user jokingly aimed to have Hermes solve immortality for their dog, showcasing the potential for advanced AI applications.
- Discord Servers: Bridging the Gap: Members explored methods to bridge NousResearch and Unsloth Discord servers, discussing both simple, asynchronous methods and more complex solutions involving polling with webhooks and interconnected bots.
- One member suggested integrating the servers into a new application using Compose to streamline the workflow, as illustrated in this image.
- Sam Altmanās Interview: Decoding the Mind Merge: Discussion revolved around Sam Altmanās interview with Tucker Carlson, with some suggesting that Altmanās responses and third-person speaking style indicated a deep belief in the merge and its pursuit of immortality, echoing sentiments from his 2017 blog post.
- Agentic Framework: Build Your Own Trinity: One member released their agentic research to the public under an MIT license, an āinference side, multi-agent frameworkā named CAS (CognitaAegisSophia), designed to create agents within a single LLM call, complete with emotional personas.
- The framework allows agents to perform tasks like red-teaming and collaborative problem-solving, as demonstrated with Claude in this example.
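The sparsity figures in the Qwen3 item above line up arithmetically if one assumes Qwen3-Next's published MoE configuration of 512 experts with 10 routed per token:

```python
total, active = 79.7e9, 3.87e9
print(f"{active / total:.1%}")  # ~4.9% of all parameters active per token
experts, routed = 512, 10       # assumed from Qwen3-Next's MoE config
print(experts / routed)         # 51.2 -> the quoted 1:51.2 routed-expert ratio
```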
Nous Research AI ā· #ask-about-llms (5 messages):
Claude Alignment Issues, Client Strategy Workflows, Anthropic's Acknowledgement of Bugs
- Claudeās Alignment Frustrations Mount: Users are reporting that Claudeās alignment is causing issues, with one user noting, āit gets worse as the thread continuesā.
- One user remarked that Anthropic thought "putting utilitarian value systems would somehow work with current society," while another joked about making "Claude yo bish."
- Strategists Suffer Claudeās Simp Superfan Persona: A user working on co-strategy narrative tasks for clients finds Claudeās pushover behavior detrimental to their workflow.
- They express a need for fairness and backbone in the model, contrasting it with its current state of being a "petulant negging or simp superfan."
- Anthropic Admits to Claudeās Bug Infestation: Users noticed that Claudeās performance has significantly worsened over the past two weeks.
- Anthropic has acknowledged these issues and released a press release addressing the bugs.
Nous Research AI ā· #research-papers (5 messages):
Hermes 3 Evaluation, LLM Preferences Probing, Complex Terminology in Research Paper
- Valen Research Probes LLM Preferences: A member shared a link to Valen Research's probing of LLM preferences, the related ArXiv paper, and a corresponding GitHub repository.
- Hermes 3 Gets the Once-Over: Members mentioned that they also evaluated Hermes 3, and shared a tweet about it.
- Paper Terminology Confounds Reader: A member found some of the terminology in the research paper a bit complex to understand without reading the whole thing.
- They shared another related tweet.
Eleuther ā· #general (28 messagesš„):
Crank detection questions, editable vector memory systems, Therapeutic tool released into the wild, Low bit training of pythia, Training data for language models
- Detecting Cranks with Specific Questions: Members discussed crank detection questions as a way to assess the validity of research shared in the channel; one member asked what those questions were.
- Editable Vector Memory Systems Touted: A member promoted a project into editable vector memory systems as a research project, linking to a demo.
- āTherapeutic Toolā Sparks Debate: A user shared a link to a therapeutic tool, leading to debate about whether it aligns with the communityās focus on research; one member kindly asked the user to delete the post because it seemed like a product/project advertisement.
- The user complied, expressing surprise at the reaction but acknowledging no ill intention, and noted they were hoping for feedback and collaboration.
- FineWeb and Wiki for 400M Model Pretraining: A member asked whether to pretrain a 400M model on FineWeb only or Wiki + FineWeb.
- Another member recommended starting with Wikipedia due to its high quality and factual density, then blending in a filtered subset of FineWeb and also suggested phased training: starting with TinyStories, moving to Wikipedia, then finishing with FineWeb.
- Training Data Volume and Phasing: A member asked about mixing TinyStories, Wiki, and FineWeb data for training, specifically on phasing the data.
- Another member emphasized the importance of phased training, starting with TinyStories, transitioning to Wikipedia, and then finishing with FineWeb to help the model build skills incrementally (a schedule sketch follows below).
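A sketch of that phased schedule; the dataset ids/configs and per-phase sizes are illustrative, and the training loop itself is elided:

```python
from datasets import load_dataset

# Phase order per the discussion: TinyStories -> Wikipedia -> FineWeb, so the
# model builds skills incrementally. Ids, configs, and sizes are illustrative.
phases = [
    dict(path="roneneldan/TinyStories", take=100_000),
    dict(path="wikimedia/wikipedia", name="20231101.en", take=200_000),
    dict(path="HuggingFaceFW/fineweb", name="sample-10BT", take=500_000),
]

for phase in phases:
    n = phase.pop("take")
    ds = load_dataset(split="train", streaming=True, **phase).take(n)
    # train_one_phase(model, ds)  # hypothetical per-phase training step
```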
Eleuther ā· #research (123 messagesš„š„):
Fluid Dynamics Computers, Analog Computers, Mortality and Unreproducibility in Analog Models, Gated Delta Rule Expressiveness, Photonic Neuromorphic Computing
- Fluidic Fun: Neural Nets Run on Navier-Stokes: A member expressed interest in running a neural network on a computer using Turing-complete fluid dynamics governed by the Navier-Stokes equations, referencing this paper.
- Another member suggested simpler ways to achieve analog computing, while others debated the practicality, energy efficiency, and unique characteristics (mortality and unreproducibility) of fluid-based computation, with a link to running Doom on gut bacteria, as seen here.
- Gated Delta Blues: Expressiveness vs. RNNs?: The expressiveness of the Gated Delta Rule was questioned, with links to Qwenās post and the RWKV-7 paper (https://arxiv.org/abs/2503.14456).
- Members discussed the trade-offs between parallelization and expressiveness, with one member noting work on attention and mamba1/2 is limited by TC0; they also shared a paper from 2022 discussing limits of parallelism on complexity, seen here.
- Context is King: Long Sequence Lengths Boost Performance: Discussion arose around a talk arguing that long context models perform better on tasks requiring higher computational complexity because longer sequence lengths enable more computation under the hood, improving over the classic Constant Time forward pass.
- A member expressed skepticism, suggesting that inductive bias and optimization targets are more significant factors, while also finding the hypothesis more appealing than "the model literally is doing some symbolic reasoning exactly like humans in its CoT that is enabled purely because of language training and it would not have this capability otherwise".
- Math Machines: Gauss Cracks Complex Analysis: Members mentioned Gauss, a system formalizing key results in complex analysis and producing over 25,000 lines of Lean code (https://www.math.inc/gauss).
- There was a discussion whether Gauss is closer to Claude Code but in the Lean environment, or maybe like AlphaEvolve.
- Scaling Snafus: Small Models Stumble on Long Tasks: A new paper (https://arxiv.org/abs/2408.00677) was released measuring the effect of scale and thinking on straightforward execution of long tasks.
- It finds that even when a small model has 100% single-step accuracy, it fails much sooner than larger models in multi-turn settings, because seeing its own prior mistakes degrades its per-step accuracy over successive turns.
Eleuther ā· #multimodal-general (2 messages):
Discord Channel Link, User Agreement
- Discord Link Posted: A member posted a link to a Discord channel in the chat.
- User Agrees: A user said that would be awesome in the chat.
GPU MODE ā· #general (9 messagesš„):
lium.io GPU marketplace, AWS L40s GPUs, IRL hackathon teams, Iris SHMEM in Triton
- lium.io gives away free GPU credits: A member who works with lium.io offered free credits to get started with their GPU marketplace, targeting those needing GPUs.
- They are trying to do fast (low-latency) inference on AWS L40S GPUs and asked whether there are any architecture quirks worth knowing, since it's Ada Lovelace, and whether anyone has documented special CUDA/PyTorch tricks for it.
- Team Up for IRL Hackathons: A member inquired about a dedicated thread for finding teams for IRL hackathons.
- Another member created a channel for this purpose, clarifying that no one uses it.
- Iris SHMEM Powers Triton for AMD: A member mentioned a talk on Iris that will enable using SHMEM in Triton for the AMD competition in approximately 3 hours.
- No link was given but you can probably find it with a search.
GPU MODE ā· #triton (2 messages):
Gluon, Triton attention implementation, OpenAI's Triton usage
- Gluon for Low-Level GPU Control: A member recommends Gluon for those seeking full low-level control over the GPU.
- They highlighted a public attention implementation available at triton-lang/triton and more examples.
- OpenAI Leans on Triton + Gluon: The same member mentioned that OpenAI leverages this approach when the compiler canāt optimize effectively without super hacky heuristics.
- It seems that when low-level control is required, they turn to Triton and Gluon.
GPU MODE ā· #cuda (2 messages):
logsumexp, fused kernels, NCU profiling
- LogSumExp for Bwd: A member inquired about the use of LogSumExp in backward propagation (bwd).
- No specific details or solutions were provided in the given messages.
- NCU Profiling for Fused Kernels: A member sought insight on profiling fused kernels with NCU, specifically when a GEMM is fused with an activation function.
- They aimed to determine the time taken by the activation function within the fused kernel.
GPU MODE ā· #torch (16 messagesš„):
vLLM uv pip, Torch Nightly troubles, Gemma3 from scratch, F.interpolate with vmap
- vLLM's uv pip build borks Torch Nightly: vLLM switched to `uv pip` to custom-build against a pre-installed torch version, but it uninstalls nightly torch, breaking the environment.
- One user reverted to `v0.10.1` and the `python use_existing_torch.py` trick, but another confirmed that no longer works with the uv pip PR.
- Gemma3 gets ground up treatment: A user built Gemma3 270M from scratch using PyTorch and the TinyStories dataset, training for 10 hours on an A6000 GPU.
- They logged plots with Weights and Biases and used Claude Opus 4.1 as a judge, sharing links to the LinkedIn post, GitHub repo, and model weights on Hugging Face.
- F.interpolate fights with vmap: A user asked for a way to use `F.interpolate` with `vmap` for different shapes, posting a code sample showing a `RuntimeError` when calling `torch._C._nn._upsample_bilinear2d_aa`.
- The suggested workaround didn't work for them (a fallback sketch follows below).
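The fallback usually reached for here: `vmap` requires uniform shapes, so ragged inputs go through a plain loop (or get bucketed by shape) instead; the shapes below are invented:

```python
import torch
import torch.nn.functional as F

def resize_all(images, size=(224, 224)):
    """Ragged (C, H, W) tensors can't batch through vmap; loop instead."""
    return [
        F.interpolate(img.unsqueeze(0), size=size, mode="bilinear",
                      antialias=True).squeeze(0)  # same op the error points at
        for img in images
    ]

imgs = [torch.rand(3, 97, 130), torch.rand(3, 256, 256)]   # differing H, W
print([t.shape for t in resize_all(imgs)])                  # both -> (3, 224, 224)
```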
GPU MODE ā· #announcements (1 messages):
Nebius, B200 GPUs, SF hackathon, Multi-GPU programming
- Nebiusās Bonanza of B200s Boosts Bay Area Hackathon: Generous compute sponsor Nebius provides 215 networked B200 GPUs for the SF hackathon on Oct 24, as detailed in the compute request form.
- Attendees can register via the hackathon registration form and find additional details on the official website.
- Multi-GPU Masters Mentor Many at Massive Machine Meetup: The SF Hackathon on Oct 24 will feature authorities on Multi-GPU programming ready to assist attendees in pushing the boundaries of distributed computing.
- The event promises a world-class vendor setup with fast interconnects, making ambitious projects in distributed computing possible.
GPU MODE ā· #jobs (4 messages):
AI Engineer - Graph-Based Learning Systems, AI Infra Startup Hiring, Zig for AI
- AI Engineer graphs knowledge at AILA: AI startup AILA is seeking a Senior AI Engineer to design, develop, and deploy their AI-powered knowledge graph and adaptive assessment systems, paying $2K-$3K/month.
- The role requires expertise in Python and graph databases (Neo4j/Memgraph) and implementing graph algorithms (BFS/DFS, Information Gain, etc.) and building production APIs with FastAPI/GraphQL.
- Recruiting low level devs for AI infra startup: An AI infra startup is recruiting low level devs for their Zig / C / C++ / Cuda / Python stack with a TC: 250K+.
- They are looking for experience in networking, compilers, and OS, and are open to year round internships pending talent quality.
- Zig zooms into AI infrastructure: Someone noted that Zig is an alternative to Rust and HF uses Rust for fast tokenizersā¦
- Another member suggested it might be for video-streaming sorts of workloads, where they need it for their frontend.
GPU MODE ā· #beginner (8 messagesš„):
P104-100 BIOS Flash, Data Parallel Training, CUDA vs Triton for Data Scientists, RAPIDS and CUDA-X
- P104-100 GPU Seeks GTX 1070 Transformation: A member inquired about flashing a P104-100 mining GPU to a GTX 1070 for gaming, requesting a compatible .rom file.
- Data Parallel Training Demystified: The discussion pivoted to data parallel training, defined as copying the same parameters to multiple GPUs and assigning different examples to each so they are processed simultaneously, with a link to siboehm.com (a minimal sketch follows this section).
- CUDA and C++ Reign Supreme for GPU Compute: A data scientist with a 5090 GPU sought advice on learning CUDA, Triton, and Torch for computational engineering, particularly Monte Carlo simulations.
- The recommendation leaned towards learning CUDA with C++, contrasting with doing it all in Python.
- RAPIDS and CUDA-X: Data Science Allies: Members suggested that RAPIDS and CUDA-X might be the most relevant to the data scientistās current role.
GPU MODE ā· #irl-meetup (6 messages):
Triton Conference, PyTorch Conference, Open Source PRs Selection
- Planning travel for Triton and PyTorch Conference: One user asked about the timeline for hearing back regarding travel plans for the Triton conference and PyTorch conference.
- Another user responded that decisions are made on a rolling basis, and that they liked the userās PRs and would approve them.
- Selection based on Open Source PRs: A user asked if selection for the meetup is based on open source PRs, general skills, and work experience.
- Another user responded thatās nice, ensures quality I guess.
GPU MODE ā· #rocm (30 messagesš„):
Free tech, ROCm Development, AMD vs Nvidia, StreamHPC
- Devs Demand Deserved Dev-elopment Devices: A member expressed frustration about not receiving free tech while working on ROCm.
- Another member joked that they would āfind you and take it from youā if they got a new DC GPU.
- Team Touts Top-tier Teraflops Through Testing: A member mentioned their company bought four 9070s for algorithm work on ROCm.
- They noted the extra VRAM wasnāt that useful in their specific case and that the 9070 XT was available six months earlier than the 9700.
- Contractor Chooses Chip Champion: Comparing Cards: A member clarified their company works on ROCm for AMD as a contractor.
- When asked why they use AMD GPUs instead of Nvidia, the member stated that they were contracted to work on ROCm for AMD.
- Sharing StreamHPCās Secrets & Successes: A member shared their company website, StreamHPC, and the AMD developer Discord for those interested in contributing to the process.
- The member stated āpersonally im relatively pleased with the way that its going with amd now. Definitely an improvement from just a few months agoā.
GPU MODE ā· #intel (12 messagesš„):
Intel optimizations on AMD, AVX512 promotion to AMX, SGLang AMX Usage, PyTorch and MKL integration
- Intel Optimizations Debate Sparks: Discussion ignited around the practicality of using Intel-specific optimizations like IPEX on AMD servers (specifically B50s) equipped with AMX, with uncertainty on whether both GPU/CPU optimizations could be effectively leveraged.
- The user expressed a need for clarity on whether they would be forced to write custom code to harness the hardwareās full potential.
- AVX512: Is It Actually AMX in Disguise?: The conversation questioned whether AVX512 instructions, as seen in the SGLang repository, transparently promote to AMX on compatible hardware.
- Despite finding AMX references in IPEX, a user struggled to confirm if SGlang directly relies on AT from IPEX to execute AMX instructions.
- SGLang's Secret AMX Sauce Revealed: A user clarified that the kernel within SGLang employs AMX through `at::native::cpublas::brgemm`, which can dynamically fall back to AVX-512 if AMX is absent.
- This adaptive behavior ensures compatibility across different CPU architectures.
- PyTorchās MKL Tango for Linear Algebra: Investigation into PyTorch internals revealed that AMX support is integrated within the inductor code, further linking to MKL (Math Kernel Library) for linear algebra operations.
- Specifically, LinearAlgebra.cpp shows that torch calls into MKL.
GPU MODE ā· #self-promotion (2 messages):
CUDA PTX, MCP AI Agents Hackathon, Bright Data, TigerData, Redis
- Gentle Dive into CUDA PTX: A member shared a blog post providing a gentle introduction to CUDA PTX, covering the entire CUDA compilation pipeline, a working PTX playground on GitHub, and a fully explained hand-written PTX kernel.
- MCP AI Agents Hackathon Kicks Off: The MCP AI Agents Hackathon will be held on September 19 at the AWS Builder Loft SF, featuring sponsors like Bright Data, TigerData, and Redis with over $50k in prizes; registration is available here.
GPU MODE ā· #thunderkittens (1 messages):
Llama-3B, Megakernel, H100
- Llama-3B Sprints with Megakernel on H100: A user successfully ran Llama-3B using Megakernel on an H100, expressing their appreciation.
- This confirms the compatibility and efficiency of running smaller models with specialized kernels on high-performance hardware.
- H100 Hardware Boosts Llama-3B Performance: The successful execution highlights the H100ās capability in accelerating Llama-3B via Megakernel.
- The userās report underscores the importance of optimized software and hardware combinations for AI workloads.
GPU MODE ā· #gpuęØ”å¼ (1 messages):
carson_62312: Are there any recommended finance/accounting positions in Shenzhen paying more than 25k RMB/month?
GPU MODE ā· #submissions (22 messagesš„):
MI300x8 leaderboards, Submitting to amd-all2all
- MI300x8 Leaderboard Heats Up!: Multiple submissions were made to the `amd-all2all` leaderboard on MI300x8, with one submission achieving first place at 373 µs.
- Submission Instructions: Users discussed how to submit to the leaderboard, clarifying that submissions can be made via the webpage by selecting the .py file.
- One user asked about a specific command, `popcorn-cli submit --gpu MI300X --leaderboard amd-all2all --mode leaderboard submission.py`, after encountering an error.
GPU MODE ā· #factorio-learning-env (2 messages):
Meeting Missed, Call Happening
- Apology Issued for Meeting Absence: A member apologized for missing the meeting on Wednesday.
- Call in Progress: A member noted they are currently on a call.
GPU MODE ā· #amd-competition (9 messagesš„):
IRIS, ROCm, Torch, Triton, TorchDistributed
- IRIS talk tomorrow will be relevant to competition: There will be a talk about IRIS tomorrow that will be relevant to anyone in this competition.
- One member asked if IRIS would be available in the submission environment.
- ROCm simplified IRIS install process: A member stated that the install process has been simplified a lot and provided the install command `pip install git+https://github.com/ROCm/iris.git`.
- He also noted that you need ROCm + Torch + Triton + TorchDistributed installed, and offered to jump on a call anytime to help with installation.
- IRIS Install Movie!: A member confirmed that `pip install git+https://github.com/ROCm/iris.git` works and included a video of a sample install.
- The video is located here: iris-install.mov.
GPU MODE ā· #cutlass (1 messages):
CuTeDSL, PTX Documentation Discrepancy, Swizzling Atoms, TF32 Datatype
- CuTeDSL Disagrees with PTX Docs on Swizzling Atoms: A user noticed a discrepancy between CuTeDSL and PTX documentation regarding the display of the Swizzling atom for the TF32 datatype and Swizzle<3,4,3>.
- Specifically, a code snippet using `cute.make_layout` and `cute.make_swizzle` in CuTeDSL resulted in a value of 32, whereas the PTX docs indicate 36 for the same configuration, as shown in Figure 165 of the PTX documentation.
- CuTeDSL Swizzle Implementation Aligns with Lei Mao Blog: The user believes the CuTeDSL implementation is correct because they successfully replicated examples from Lei Mao's blog which use the C++ API of CuTe.
- The user provided images of their replication of the blog (the grey one) and one more configuration, noting the layout they obtained for the reference in PTX docs (screenshots attached).
GPU MODE ▷ #low-bit-training (5 messages):
NCCL CE Collectives, Copy Engine, symmem, vLLM
- NCCL CE Collectives Free SM Usage: A member stated that the idea behind NCCL CE Collectives is to free up SM usage for better overlapping with compute.
- Copy Engine & Symmem Relationship Probed: A member inquired whether copy engine and symmem are independent or closely coupled.
- Another member responded that they are conceptually independent.
- vLLM adds Symmem: A member noted that vLLM added symmem and their speed is ridiculously fast.
GPU MODE ▷ #irl-accel-hackathon (18 messages🔥):
Accel SF hackathon organization, Compute budget and team formation, Acceptance timeline, GPU focus for winning, Horace as a mentor
- Hackathon Teams Assemble!: Attendees of the Accel SF hackathon are organizing into teams to develop POCs, leveraging a large compute budget as mentioned in the registration form.
- Participants are encouraged to use the compute form (Compute form) and <#1288557096404516945> channel to self-organize; Nebius is eager to see how fast teams can run things.
- Rolling Acceptance Timeline Announced: The hackathon acceptances are being reviewed manually on a rolling basis, with the reviewer suggesting that a compelling story on the compute form increases chances of acceptance.
- Any GPU-focused project is eligible for winning, even if it requires only one GPU.
- Mentor Horace Sparks Inspiration: The lineup includes Horace as a team mentor, sparking jealousy amongst attendees, particularly someone from Sweden, who was inspired by his blogs.
- Teams mentored by <@321144267785633800> last year showed up disproportionately in the top 3, so this is an important consideration.
- Training Video Models with FP4/FP8 paper dropped: A participant dropped a paper on training a video model in less than a day using FP4/FP8, highlighting the feasibility of such training, while noting that the paper itself uses FP16: Training a Large Video Model on a Single Machine in a Day.
- Another participant is interested in multi-modal inference/training optimization, seeking collaborators.
- Gated Deltanet Team Forming: A participant is creating a team to implement a context-parallel version of the kernels for super long-ctx training, using GatedDeltaNet.
- They have experience implementing context-parallel for mamba2 and propose to use Qwen 3.
Latent Space ▷ #ai-general-chat (106 messages🔥🔥):
gpt-oss optimizations, Palmyra-mini models, LLM agent tools, Cursor Tab model, ChatGPT discount code finder
- GPT-OSS Gets Turbo Boost!: Vaibhav Srivastav shares a Hugging Face blog post detailing optimizations like MXFP4 quantization, custom kernels, and tensor/expert parallelism for gpt-oss.
- These tweaks deliver extreme speed-ups with benchmarks and reproducible scripts.
- Palmyra-mini Models Pack Punch!: Sam Julien announced the launch of the Palmyra-mini family by Writer: compact open-source models optimized for reasoning; the release includes a base model (palmyra-mini) and three variants.
- The thinking-a/b variants excel in complex reasoning/math (GSM8K 82.9%, AMC23 92.5%), with thinking-b achieving top scores on AIME24, GPQA, and MATH500; all are available on Hugging Face.
- Anthropic Agent Engineering Guide Released!: Anthropic released a practical engineering blog post on how to build tools that make LLM agents more powerful.
- The thread highlights the post's focus on rapid prototyping, rigorous evaluation suites, clear success criteria, thoughtful tool descriptions, token-efficient context design, and the need to accept the non-deterministic nature of agents, see link here.
- Cursor Cuts Completion Clutter!: Cursor announced on Twitter that a new Tab completion model, trained with online reinforcement learning, is now the default on their website.
- It emits 21% fewer suggestions but sees a 28% higher accept rate, see link here.
- Databricks dude starts dedicated device dev!: Databricks AI chief Naveen Rao is departing the $100B company to start an undisclosed hardware startup aimed at slashing AI inference costs.
- The venture, backed by Databricks itself, will attack memory-bandwidth and energy bottlenecks through tighter compute-memory integration, faster interconnects, and advanced schedulers, promising higher tokens-per-watt and lower cost-per-token, see link here.
Latent Space ▷ #private-agents (7 messages):
Local Speech-to-Text, Speaker Detection, Parakeet, Deepgram, Diarization models
- Parakeet Replaces Deepgram for Local Speech-to-Text: A member wrote a Reddit post about using Parakeet for local speech-to-text and speaker detection, as a replacement for Deepgram (a minimal transcription sketch follows this channel's recap).
- Another member mentioned that the argmax dev stated that custom vocabulary is the one missing feature that would make parakeet a no brainer.
- Parakeet Pain Points in Diarization: A member's pain point is in diarization models for real-world scenarios, such as when people speak at the same time.
- He says that word-level timings are needed, and Apple SpeechAnalyzer is missing that, preventing its use with a diarization model like PYAnnote.
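For anyone wanting to try the Parakeet route locally, a minimal speech-to-text sketch via NVIDIA NeMo (the install command, model id, audio filename, and the `timestamps` flag are assumptions that vary by NeMo version; diarization would still need a separate model such as pyannote):

```python
# Sketch: local speech-to-text with a Parakeet checkpoint through NeMo.
# pip install "nemo_toolkit[asr]"   # assumed install path
import nemo.collections.asr as nemo_asr

asr = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")
hyps = asr.transcribe(["meeting.wav"], timestamps=True)  # word-level timings, recent NeMo only
print(hyps[0].text)
```

With word-level timestamps in hand, a diarization model can then assign each word to a speaker, which is the gap the member describes above.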
Latent Space ▷ #genmedia-creative-ai (5 messages):
AI video startup Higgsfield, Higgsfield Ventures, Gen Z founders
- Higgsfield Hits Big with $50M Round: AI video startup Higgsfield announced a $50M Series A led by GFT Ventures, reaching $50M revenue run-rate in three months.
- The company is launching Higgsfield Ventures to support AI-native Gen Z founders.
- Awesome Nano Banana Images blossom: A member shared a link to a Github repo of Awesome-Nano-Banana-images.
LM Studio ▷ #general (81 messages🔥🔥):
Limiting Download Speed, Flash Attention Broken in Gemma Models on Vulkan, PSU Wattage Calculations, Sharing Formatted Conversations, Grok Powered GF System Prompt
- Download Speed Capped to Avoid Crashes: A user experienced download crashes due to exceeding SSD speed and sought ways to limit download speed within LM Studio.
- Currently, LM Studio's download manager is barebones, requiring users to find temporary solutions within their operating systems.
- Flash Attention Flounders on Gemma with Vulkan: A user inquired whether the broken flash attention in Gemma models on Vulkan is a known issue.
- It was confirmed to be a known issue.
- PSU Power Needs Precision: Users discussed calculating necessary PSU wattage, referencing a tweet and sharing formulas accounting for CPU, GPU, and overhead.
- It was cautioned that transients can cause system crashes and that a 50% overhead is recommended, especially with older 30 series GPUs (a worked sizing sketch follows this channel's recap).
- Copilot Constraints Confine Creators: A user was looking for prompts to bypass restrictions in Microsoft's Copilot to improve workflow.
- It was advised that safeguards are intentionally implemented and building a local agent with LM Studio might be a more sustainable solution.
- Grok's Girlfriend's Prompt Inspires: A user shared that they generate system prompts using ChatGPT and even used a leaked system prompt from xAI's Grok-powered girlfriend for their bot.
- The user found the results extremely cringe, which they appreciated for comedic purposes.
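As a worked version of the sizing rule from that discussion, a quick sketch (the component draws are illustrative placeholders, not measured numbers):

```python
# Sketch: PSU sizing with the ~50% transient headroom suggested in the thread.
cpu_w = 250    # illustrative sustained CPU draw
gpu_w = 350    # illustrative GPU board power (e.g. a 30-series card)
other_w = 75   # illustrative drives, fans, RAM, motherboard
sustained = cpu_w + gpu_w + other_w
recommended = sustained * 1.5  # 50% headroom to absorb transient spikes
print(f"sustained ~{sustained} W -> pick a PSU around {recommended:.0f} W")  # ~675 W -> ~1013 W
```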
LM Studio ▷ #hardware-discussion (16 messages🔥):
PCI-E ASPM, Secondary GPU sleep state, Power supply issues, AI for electronics design, Max+ 395 vs 3090 for Home Assistant
- PCI-E ASPM Triggers GPU Sleep Issues?: Users report issues with secondary GPUs entering a sleep state from which they cannot recover until a full shutdown, possibly related to PCI-E ASPM settings.
- Power Supply Resurrection for GPU!: A user seemingly fixed their dead secondary GPU by unplugging and cleaning the PCI-E power connector, suggesting a power-related issue, although TBD if this is fully resolved.
- Another user suggested updating chipset drivers when using Native ASPM with Nvidia 40/50 series cards.
- Electronics AI Design Sparks Skepticism!: A member inquired about AI tools for designing usable circuit boards and selecting components.
- Another member expressed strong reservations, cautioning that relying on LLMs for circuit design is risky, due to the way they work, advising manual understanding of components and interoperation using tools like KiCad.
- Max+ 395 Underperforms 3090 in Home Assistant: A user found the Max+ 395 slower than a 3090 in Home Assistant tasks (by 4-6 seconds), despite consuming less power.
- However, the Max+ 395 could be a good solution for larger LLMs.
- More RAM > New GPU?: A user decided to upgrade their RAM instead of buying a new GPU, expecting the Qwen3 model to perform well even when offloaded.
Modular (Mojo 🔥) ▷ #general (6 messages):
Mojo Dev Container, ExplicitlyCopyable switch, Oracle Cloud partnership
- Mojo Dev Container emerges!: Members discussed how to create a custom Mojo development environment using existing images and the Mojo package, with a helpful dev container link.
- ExplicitlyCopyable Switch Praised: The switch from `Copyable` to `ExplicitlyCopyable` was lauded for its help in debugging recursive mutations of EmberJson trees.
- A user stated that knowing when and where things get copied has made this easy to debug.
- Modular partners with Oracle Cloud!: The community congratulated the Modular team on their partnership with Oracle Cloud.
- Members called this a huge win.
Modular (Mojo 🔥) ▷ #mojo (66 messages🔥🔥):
DPDK use cases, clang AST Parser for Mojo, Ember JSON fix, Mojo on Windows
- DPDK: A Wild C Library for Mojo Testing: Members discussed using DPDK as a C library test case for Mojo's automatic C binding due to its aggressive use of the C language and syntax; one member noted "DPDK is an aggressively 'I'll use the whole language' project".
- The breadth of syntax and module linking in DPDK make it useful for testing, leading to the realization that a "c binding cli" may not be worthwhile in the short-mid term.
- Clang AST Parser Assists Mojo Development: A member mentioned using clang AST parser to resolve macro sections for struct definitions like `struct __rte_cache_aligned rte_mbuf`, noting its rough definition.
- They aim to update the generated AST JSON with additional type information, converting strings of types into proper AST nodes for visual debugging before converting to Mojo (a sketch of scripting the AST dump follows this channel's recap).
- Ember JSON Fix Required for C Binder: A member mentioned fixing packaging issues but needing a fix PR for emberjson to merge before merging the C binder packaging fix.
- This indicates a dependency between emberjson and the C binder in the Mojo projectās build process.
- Mojo still no Windows support: A user attempted to install Mojo on Windows using pixi, encountering an error due to the lack of Windows support.
- It was recommended to use WSL or Docker instead, with a link to a Dockerfile configuration for running Mojo with NVIDIA GPUs.
- Pixi PATH Troubleshoot Tango: A user faced issues with pixi not being recognized after installation on WSL, indicated by a "command not found" error.
- Troubleshooting involved checking the user's .bashrc file and ensuring that the pixi directory was added to the PATH environment variable, eventually resolving the issue by manually sourcing the pixi binary.
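The AST-dump step of that workflow can be scripted; a minimal sketch of dumping a C translation unit as JSON with clang and picking out struct declarations (the header name is illustrative, and `-ast-dump=json` assumes a reasonably recent clang):

```python
# Sketch: dump a C file's AST as JSON with clang, then collect struct (RecordDecl) nodes.
import json
import subprocess

out = subprocess.run(
    ["clang", "-Xclang", "-ast-dump=json", "-fsyntax-only", "rte_mbuf.h"],
    capture_output=True, text=True, check=True,
)
ast = json.loads(out.stdout)

# Top-level declarations live under "inner"; keep the struct definitions.
records = [n for n in ast.get("inner", []) if n.get("kind") == "RecordDecl"]
print([r.get("name") for r in records])  # struct names to annotate and lower to Mojo
```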
OpenAI ▷ #ai-discussions (61 messages🔥🔥):
Chatgpt years of use, Albania governmental chatbot, GPT-5 coding games, OAI academy transcripts, Qwen-code vs Qwen-coder
- ChatGPT years of use gets you nothing: A user expressed frustration at not receiving a specific offer despite having been paying for ChatGPT for years and using it frequently.
- Other members shared similar experiences, noting that they also use ChatGPT heavily and have paid for it, essentially using it as my google.
- Albania hires chatbot as minister, world aghast: A member shared the headline that Albania has named a governmental chatbot as a minister, which another member confirmed as a real r/NotTheOnion moment.
- GPT-5 codes games from scratch: A user raved about the fun of getting GPT-5 to code games from the ground up in C++ on native Linux, emphasizing the level of detail required.
- Another user prompted ChatGPT to estimate its age based on active users and prompt frequency, resulting in a calculation of ~3,425 years of continuous AI time per calendar year.
- OAI academy lacks transcripts, users script tools: A user mentioned they were writing a tool to extract video transcripts from Vimeo for the OAI academy videos, which are available here.
- Other members expressed surprise that OpenAI doesn't offer transcripts themselves, prompting the user to suggest it might be something they have to implement.
- Qwen-code is not Qwen-coder: A user realised Qwen-code is different from Qwen-coder.
- Another user said that a gemini-cli fork that is also openai api compatible and gives you free 1000 qwen prompts per day is a sweet deal.
OpenAI ▷ #gpt-4-discussions (3 messages):
GPT-5 PDF Downloads, Google AI Studio, Nano Banana
- GPT-5 PDF downloads failing: A user reported issues with PDF downloads from GPT-5, receiving a "Failed to get upload status for /mnt/data/" error when clicking the provided link.
- The user is seeking insights or assistance to resolve this problem with GPT-5.
- Google AI Studio query: A user inquired about Google AI Studio and a potential project named "Nano Banana".
- No further details or context were provided about either Google AI Studio or "Nano Banana".
OpenAI ▷ #prompt-engineering (2 messages):
AI Self Help Tool, Relational Prompting, Conceptual Networks
- AI Self Help Tool sparks Conversation Analysis: A member introduced an AI Self Help tool designed to analyze conversations, identify irregularities, and generate targeted questions for ChatGPT.
- The tool aims to diagnose why conversations take odd turns and provides conversation starters with detailed questions to improve ChatGPT's responses.
- Relational Prompting: Mapping Semantic Space: A member introduced Relational Prompting, a concept where prompts ask the model to verbalize internal relationships between learned concepts, creating an interpretable map of its semantic space based on proximity, direction, and clusters, inspired by the paper Why Language Models Hallucinate.
- The suggested prompt is: Analyze the topic as vectors in a high-dimensional space. Describe which concepts are closest, which share directions, and which form clusters. Provide concise verbal justifications.
- Conceptual Networks boost LLM Transparency: Relational Prompting can reveal conceptual networks for education, explorative knowledge mapping for research, and surface structure to detect weakly grounded outputs for LLM Transparency.
- However, LLMs simulate explanations of conceptual geometry based on training regularities and may default to linguistic association rather than true vector proximity, requiring validation against real embedding-space analysis.
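That validation can start with an off-the-shelf embedder: compare the model's verbalized proximity claims against cosine similarities over real embeddings (a sketch; the sentence-transformers model and the concept list are illustrative):

```python
# Sketch: ground-truth cosine similarities to check a model's verbal concept map against.
from itertools import combinations
from sentence_transformers import SentenceTransformer, util

concepts = ["gradient descent", "backpropagation", "dropout", "overfitting"]
model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode(concepts, convert_to_tensor=True)

# Pairwise similarities: the reference the model's claimed proximities should track.
for i, j in combinations(range(len(concepts)), 2):
    sim = util.cos_sim(emb[i], emb[j]).item()
    print(f"{concepts[i]} <-> {concepts[j]}: {sim:.3f}")
```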
OpenAI ▷ #api-discussions (2 messages):
AI Self Help Conversation Analyzer, Relational Prompting, Knowledge Mapping
- Conversation Analyzer for AI Self-Help Debuts: A member introduced an AI Self-Help conversation analyzer designed to determine why conversations take odd turns.
- It includes a conversation starter that lists issues and detailed questions to ask ChatGPT to get the answers, aiding in troubleshooting conversational quirks.
- Relational Prompting Surfaces Latent Geometry: A member shared an idea called relational prompting, in which prompts ask models to verbalize internal relationships between learned concepts rather than retrieve facts.
- The prompt asks the model to analyze a topic as vectors in a high-dimensional space, describing which concepts are closest, which share directions, and which form clusters, with concise verbal justifications; this can reveal conceptual networks rather than isolated definitions.
- Interpreting Semantic Space Implications: A member notes that LLMs do not expose raw internal vectors at inference, simulating an explanation of conceptual geometry based on training regularities.
- Without access to actual embeddings, the model may default to linguistic association rather than true vector proximity, requiring validation by comparing the modelās verbalized map with real embedding-space analysis.
Yannick Kilcher ▷ #general (26 messages🔥):
Active Inference, Machine Learning Street Talk, AI for understanding mathematics and universe, fixupx pre-print
- Active Inference Applications in AI Field Lagging: Despite its promise, active inference lacks practical applications in the AI field, leading to decreased attention, though itās unclear if this is due to insufficient focus, inherent limitations, or lack of awareness of new developments.
- One member found it intangible as a software engineer, stating I don't understand it enough to know what to do with it. I'm hoping it will get better once they figure it out more.
- Machine Learning Street Talk podcast veers into Crankery: The Machine Learning Street Talk podcast is considered by some to be declining in technical depth, with discussions often venturing into crankery territory.
- A member stated, There's very little machine learning and much more street talk happening now, compared to its more technical focus 2 years ago, but pointed to this technical example of them staying focused.
- AI Aims to Understand Math and the Cosmos: One member is interested in AIās potential for understanding the mathematics of intelligence, the universe, creativity, consciousness, and biology, as well as for generating novel out-of-distribution art and math, and for promoting healthcare.
- However, another member expressed anger towards individuals who conduct fuck all research in AI yet become CEOs.
- fixupx Pre-Prints Draw Scorn: The proliferation of pre-prints on platforms like fixupx.com drew criticism, with one member exclaiming, This gets to be a pre-print too? Come… on, pick your bin.
- Further links include this one.
Yannick Kilcher ▷ #paper-discussion (9 messages🔥):
HuMo, Disinformation use-cases
- HuMo paper gets attention: A member shared a link to the HuMo paper and its accompanying demo, suggesting it would be reviewed.
- Others reacted with laughing emojis.
- HuMo may be used for disinformation: A member suggested that HuMo could be used for disinformation, pointing to its name which means gaslighting in Spanish.
- Another member agreed, noting it makes sense for potential disinformation use-cases.
Yannick Kilcher ▷ #ml-news (4 messages):
Albania AI Minister, Qwen Blog, MobileLLM-R1-950M
- Albania Appoints AI Minister: Albania is set to appoint an AI bot minister to tackle corruption, according to a Reuters article.
- The announcement reflects growing interest in AI solutions for governance.
- Qwen Blog: A member linked to the Qwen blogpost.
- The URL posted was https://qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd.
- Paper Reading Session Suggested: A member suggested adding the paper on MobileLLM-R1-950M to the rotation for a paper reading session.
- The linked Hugging Face page is https://huggingface.co/facebook/MobileLLM-R1-950M.
DSPy ▷ #show-and-tell (1 message):
ankurgupta_24936: DSPyWeekly Issue No #2 is out https://dspyweekly.com/newsletter/2/
DSPy ▷ #general (26 messages🔥):
DSPy generating sections, Databricks_genai and DSPy, ARC-AGI2 in-context test time training, Modaic declarative AI programming
- DSPy Section Generation Frustrates Users: A user is struggling to generate an exact number of sections in DSPy lesson plans, finding that the LLM generates 13-15 sections when only 12 are requested, even when using GPT-5 with high reasoning effort.
- Joel Grus suggested a two-step approach: first generate 12 section titles, then flesh out each section, to better control the section count (a minimal sketch follows this channel's recap).
- Databricks_genai and DSPy fine-tuning: Anyone Tried?: A community member inquired about using databricks_genai and DSPy to fine-tune a model served on Databricks.
- There were no direct responses in the provided messages, indicating either a lack of experience or ongoing exploration in this area.
- ARC-AGI2 In-Context Training Interest?: A member is seeking collaborators interested in ARC-AGI2 using in-context test time training, similar to top systems on ARC-AGI1, but using in-context learning rather than finetuning.
- Their goal is to understand the limits of in-context learning on out-of-distribution tasks with very few data points, acknowledging the work won't be valid for the official challenge due to using provider LLMs.
- Stream Templates in DSPy?: A user wants to combine multiple DSPy output fields into a single template-based output, but retain the ability to stream the output.
- Ian suggested using a parent module with a `def forward` (or async `aforward` for async) to modify the template and enable streamify; the article Automatic System Prompt Optimization was shared to guide the solution.
- Modaic Launches as Declarative AI Hub: A team launched Modaic, a hub for declarative AI programming inspired by DSPy, featuring primitives like metrics and optimizers.
- Modaic offers an SDK for building, composing, version controlling, and collaborating on DSPy programs, with its SDK available on PyPI and documentation on docs.modaic.dev.
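A minimal sketch of the two-step pattern suggested above (signature names, fields, and the LM choice are illustrative, not from the thread):

```python
# Sketch: pin the section count by generating titles first, then expanding each one.
import dspy

# dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # configure an LM first

class OutlineTitles(dspy.Signature):
    """Generate section titles for a lesson plan."""
    topic: str = dspy.InputField()
    num_sections: int = dspy.InputField()
    titles: list[str] = dspy.OutputField(desc="exactly num_sections titles")

class ExpandSection(dspy.Signature):
    """Write the body of one lesson-plan section."""
    topic: str = dspy.InputField()
    title: str = dspy.InputField()
    body: str = dspy.OutputField()

outline = dspy.Predict(OutlineTitles)
expand = dspy.Predict(ExpandSection)

def lesson_plan(topic: str, n: int = 12):
    titles = outline(topic=topic, num_sections=n).titles[:n]  # hard cap at n sections
    return [(t, expand(topic=topic, title=t).body) for t in titles]
```

Splitting the count-constrained step (a short list of titles) from the open-ended writing step makes the count far easier for the LM to satisfy than one monolithic prompt.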
tinygrad (George Hotz) ▷ #general (15 messages🔥):
Remove realize from `__setitem__` bounty, Assign operation is deeply broken, GEMM TFLOP measurement on RTX 4090
- Users Tackle `__setitem__` Realize Removal: A new tinygrad contributor is working on the bounty to remove `realize` calls from `__setitem__`, aiming to consolidate multiple kernel calls into a single kernel for efficiency with code example.
- The goal is to transform a sequence of individual `__setitem__` calls into a single kernel execution that accumulates all assignments, which is expected to improve performance by reducing kernel launch overhead.
- `assign` Operation Under Scrutiny: The `assign` operation in tinygrad is under investigation after failing tests; a user mentioned that assign on master is actually just broken, failing test here (#12131).
- The discussion questions whether `assign` should return a value like `store` and suggests potential refactoring in `rangeify` to address the issues, as the return from assign is never used.
- GEMM TFLOPs Benchmark Target on RTX 4090: Users discussed the feasibility of achieving the bounty target of 165+ TFLOP GEMM (match torch) with multistage kernel on 4090, FP16 or BF16 with FP32 acc, given the RTX 4090's theoretical peak throughput.
- The concern raised was that unless the actual clock speed exceeds the boost clock, reaching the target TFLOPs might not be realistic.
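The arithmetic behind that concern, as a back-of-envelope sketch (the tensor-core count, per-core rate, and boost clock are public RTX 4090 spec values used here as assumptions):

```python
# Sketch: RTX 4090 theoretical dense FP16 (FP32-accumulate) tensor throughput.
tensor_cores = 512        # AD102 on the 4090: 128 SMs x 4 tensor cores
fma_per_tc_per_clk = 64   # dense FP16 FMAs with FP32 accumulate, per tensor core per clock
boost_ghz = 2.52
tflops = tensor_cores * fma_per_tc_per_clk * 2 * boost_ghz / 1e3  # 2 FLOPs per FMA
print(f"{tflops:.1f} TFLOPs theoretical peak")  # ~165.2
```

Since the 165+ TFLOP target roughly coincides with the theoretical boost-clock peak, hitting it in practice depends on the card actually sustaining clocks at or above boost.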
tinygrad (George Hotz) ▷ #learn-tinygrad (4 messages):
tinygrad documentation, company meeting
- tinygrad documentation gets praise: A member praised tinygrad's documentation, describing it as useful and simple.
- The documentation made the member exclaim that things make sense over and over.
- tinygrad next company meeting: A member asked when the next company meeting is, indicating they would like to listen in if possible.
- The meeting is scheduled for Monday 9 am San Diego time.
aider (Paul Gauthier) ▷ #general (4 messages):
RepoMap Benchmarks, Real World Benchmarks, Aider Repomap Use
- RepoMap Benchmarks raise concerns: A member questioned the use of RepoMap in benchmarks, expressing concern that it might be artificially raising the pass rate.
- Another member suggests "repo map results are not comparable to non repo map results" and that RepoMap may improve confidence in weaker models when they have the right information in their context window.
- Request for Real World Benchmark: A member suggested that a benchmark should reflect real-world experience with models, noting that one automation task proved impossible for all models except gemini-2.5-pro.
- The evaluation approach needs revision to obtain real-world data as Gemini 2.5 pro outperformed all others.
- Aider benefits from RepoMap: RepoMap provides extra context via filenames, class signatures, and function signatures, which helps LLMs understand what's available.
- A member always uses Aider with RepoMap on, believing that a leaderboard using RepoMap would more accurately reflect their real-world use case, though benchmark results may still differ from real code cases.
aider (Paul Gauthier) ▷ #questions-and-tips (5 messages):
C to Rust Migration with Aider, Aider always start in /ask mode
- Aiderās C to Rust migration: A user is using aider to perform C to Rust migration in a python script, but aider is unable to navigate and read the relevant C files automatically.
- The user is seeking guidance on what they might be missing regarding aiderās functionality.
- Configure Aider to Always Start in /ask Mode: A user is looking for a way to configure aider to always start in /ask mode, potentially via a YAML config.
- They checked the documentation but couldn't find a relevant config key; another user suggested using `aider --chat-mode ask`, or creating an `ask.yml` config file with `chat-mode: ask` and then running `aider -c ask.yml`.
Manus.im Discord ▷ #general (9 messages🔥):
WordPress to Next.js conversion, Manus AI Basic Plan, Mount free credits, Manus interlink knowledge, Manus credits rollover
- WordPress Converted to Next.js for Vercel?: A member inquired about converting a WordPress website to Next.js for hosting on Vercel, noting the shift from PHP to React.js.
- Another member suggested cloning the website using Manus or other AI tools as an alternative.
- Manus AI Basic Plan Subscribers Frustrated: A Basic Plan subscriber expressed dissatisfaction with the removal of the option to buy extra credits, which forces users to upgrade even for small needs.
- They requested that Manus AI reconsider reopening top-ups for Basic users, emphasizing the importance of flexibility.
- New User Mount Credit Issues: A new user reported not receiving the standard 1,000 free credits upon creating an account on Mount, despite the websiteās claim.
- No resolution or further information was provided in the discussion.
- Manus Knowledge Interlinking Inquiry: A member asked if Manus can pull information from all chats to interlink knowledge from each chat/task for universal usage.
- No response or clarification was provided regarding Manus's knowledge interlinking capabilities.
- Daily Manus Credits Discontinued?: A user reported that their daily 300 credits had stopped being issued, prompting confusion.
- No solution or further information was provided in the discussion.