everyone's a decacorn now.
AI News for 9/4/2025-9/5/2025. We checked 12 subreddits, 544 Twitters and 22 Discords (186 channels, and 4350 messages) for you. Estimated reading time saved (at 200wpm): 324 minutes. Our new website is now up with full metadata search and beautiful vibe coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!
Congrats to Sierra on becoming the latest Decagon... I mean, Decacorn.
Also, the new ChatGPT branching feature was remarkably popular for what was probably ~100 LOC to implement (with the Responses API).
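For readers curious what a minimal version of branching looks like, here is a hedged sketch using the OpenAI Python SDK's Responses API: the `previous_response_id` parameter lets two follow-up requests fork from the same parent turn. The model name and prompts are illustrative assumptions, not OpenAI's actual implementation.

```python
# Minimal sketch of conversation branching with the Responses API.
# Assumptions: the official `openai` Python SDK; model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Parent turn that both branches will share.
parent = client.responses.create(
    model="gpt-4.1-mini",
    input="Draft a one-paragraph launch announcement for an embeddings model.",
)

# Branch A and branch B both continue from the same parent response,
# so edits in one branch never leak into the other.
branch_a = client.responses.create(
    model="gpt-4.1-mini",
    previous_response_id=parent.id,
    input="Rewrite it in a playful tone.",
)
branch_b = client.responses.create(
    model="gpt-4.1-mini",
    previous_response_id=parent.id,
    input="Rewrite it for an enterprise audience.",
)

print(branch_a.output_text)
print(branch_b.output_text)
```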
AI Twitter Recap
Embeddings on-device and retrieval stack updates
- Google's EmbeddingGemma (308M) goes wide: Google/DeepMind released a small, multilingual embedding model designed for on-device RAG and semantic search. Highlights: 308M params, top-ranked open model under 500M on MTEB, trained on 100+ languages, runs in <200MB RAM with quantization, supports Matryoshka embeddings (output dims 768 down to 128), 2k context, and EdgeTPU latency <15ms in some settings. Immediate ecosystem support across Hugging Face Sentence Transformers, Ollama, MLX, llama.cpp, LlamaIndex, LangChain, Weaviate, Cloudflare Workers, etc. Launch details and getting started: @GoogleDeepMind, @osanseviero, @_philschmid, @tomaarsen, @ollama, @weaviate_io, @TheTuringPost.
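As a rough illustration of the Sentence Transformers path (a sketch, not the official quickstart; the model id and the 256-dim truncation value are assumptions on my part, so check the model card):

```python
# Hedged sketch: loading EmbeddingGemma with Sentence Transformers and using
# Matryoshka-style truncation to a smaller output dimension.
# Assumptions: model id "google/embeddinggemma-300m" and that 256 is one of the
# supported Matryoshka dimensions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=256)

docs = [
    "EmbeddingGemma is a 308M-parameter multilingual embedding model.",
    "It targets on-device RAG and semantic search.",
]
embeddings = model.encode(docs, normalize_embeddings=True)
print(embeddings.shape)  # (2, 256) with the truncated Matryoshka dimension
```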
- Jina code embeddings (0.5B/1.5B) + GGUF: New code-focused embedding models (with 1-4 bit GGUF quantizations) claim SOTA retrieval across 15+ languages and 5 tasks (nl2code, code2code, code2nl, code2completions, QA). Built on a strong code LLM base (e.g., Qwen2.5-Coder pretraining on 5.5T tokens, 92+ languages), then contrastively tuned for retrieval with limited aligned pairs. Links and models: @JinaAI_, details, models.
- Large-scale retrieval training without distillation: LightOn's PyLate shows direct contrastive training on billions of passages using GradCache + distributed infra, reporting improved generalization on BEIR/BRIGHT without teacher models. Overview: @LightOnIO.
Vision-language data and multimodal models
- FineVision dataset (Hugging Face): A major open dataset release for VLM training: 17.3M images, 24.3M samples, 88.9M turns, 9.5B answer tokens across 200+ curated sources. The team reports >20% average gains across 10 benchmarks and added capabilities (GUI navigation, pointing, counting). Announcement and technical article: @lusxvr, @andimarafioti, @thibaudfrere.
- MiniCPM-V 4.5 (8B) video/image VLM: Reports a 77.0 average on OpenCompass across 8 benchmarks with an 8B model, claiming to surpass GPT-4o-latest and Gemini-2.0 Pro on their setup. Introduces a unified 3D-Resampler and aggressive video token compression (96×): 6×448×448 frames → 64 video tokens (vs ~1,536 in many MLLMs). Demos and Space: @_akhaliq, @OpenBMB.
- Also notable: Microsoft's VibeVoice TTS uses continuous speech tokenizers at 7.5 Hz for expressive, long-form multi-speaker audio @ClementDelangue; Stanford's Mixture-of-Contexts demonstrates minute-long video generation in a single pass @GordonWetzstein.
Optimizers, internal metrics, and training recipes
- Robust optimizer benchmarking (Marin project): Two papers (and a comprehensive Stanford study) compare Muon, Soap, Mars, Sophia, ScheduleFree, AdEMAMix, Prodigy, etc., across model scales (0.1B-1.2B), batch sizes, and schedulers. The emerging consensus: with careful tuning and at larger scales, speedups over AdamW diminish (~10% at ~1.2B), though matrix-based methods can lead at smaller scales. Threads: @konstmish, @wen_kaiyue, @percyliang, commentary from @BlancheMinerva and @jeremyphoward.
- "Internal metrics" in large-scale training (Kimi/K2): Practitioners emphasize monitoring internal signals (loss, grad norm, output RMS, max logit) to diagnose instability and ensure headroom. MuonClip was designed to control max logit to avoid training breakdowns. Summaries and translations: @ZhihuFrontier, @crystalsssup.
- Creative-writing finetune of Qwen3-32B: "Zhi-Create-Qwen3-32B" reports a WritingBench score of 82.08 vs 78.97 for the base model, using (1) SFT with a curriculum (length/reasoning-grouped, progressive difficulty, targeted re-training) and (2) DPO with RAFT (rule filters + LLM judge) to address CN-EN code-switching, repetition, and reasoning. Data included filtered open sets (e.g., Dolphin-r1, DeepSeek distills), Zhihu Q&A, and CoT traces; all passed a reward-model filter. Usage tips include temperature ~0.6 and optional think-trigger strings. Details: @ZhihuFrontier.
- Infra note: the slime RL framework reports cutting Qwen3-30B-A3B weight update time from 60s → 7s, and handling GLM-4.5-355B-A32B FP8 updates at ~100s, with ongoing async/zero-redundancy optimizations. Call for collab: @ZhihuFrontier.
Agent systems, runtimes, and tooling
- LangGraph design deep dive: A thorough post on building production-grade agent runtimes: minimal abstractions, structured execution/state, recovery/durability, and control surfaces that match real ops needs. A must-read for teams shipping agents to prod: @LangChainAI, @hwchase17, @nfcampos.
- UI-TARS-2 (multi-turn agent RL for native UIs): A unified GUI/phone/browser/terminal/tool-use agent reports benchmark scores of OSWorld 47.5, WindowsAgentArena 50.6, AndroidWorld 73.3, Online-Mind2Web 88.2%, SWE-Bench 68.7, and TerminalBench 45.3; it supports hybrid action flows combining clicks, terminal, and API calls. Paper + demo: @TsingYoga.
- Agent failure analysis: Atla launched a platform to automatically discover recurring failure patterns and propose targeted fixes for agent systems @Atla_AI. Separately, AgenTracer-8B diagnoses multi-agent interaction errors and reports up to 18.18% gains over proprietary baselines in its setting @omarsar0, paper.
- Infra updates: Groq's Compound (agentic system) is GA after 5M+ requests @GroqInc. Gradio can now deploy MCP servers to Google Cloud via a single command @Gradio. The HF MCP server added OpenAI Codex CLI support @reach_vb. Together AI added an EU GPU region (Sweden) for lower latency/data residency @togethercompute. SkyPilot showcases moving from SLURM to multi-cloud for faster cycles with K8s-grade reliability @skypilot_org.
Product rollouts and ecosystem
- Perplexity Comet: Broad rollout continues: "more than a million" users got access in one push; mobile pre-orders live; new iOS app build streams tables/markdown/intermediate steps smoothly @AravSrinivas, pre-orders, iOS update, availability note.
- ChatGPT conversation branching: OpenAI shipped native branch-and-explore for chats, a long-requested UX upgrade for exploratory workflows @OpenAI, @gdb.
- Research note: DeepMind's Deep Loop Shaping (published in Science) improves LIGO interferometer control, cutting noise 30-100× on hardware and eliminating LIGO's most unstable loop as a meaningful noise source, an example of AI advancing experimental physics @GoogleDeepMind, results, @sundarpichai.
Top tweets (by engagement)
- Ilya Sutskever: "a revolutionary breakthrough if i've ever seen one" – 19.2k
- Alibaba Qwen: "Ready to meet the biggest, brainiest guy in the Qwen3 family?" – 5.5k
- OpenAI: "By popular request: you can now branch conversations in ChatGPT" – 17.1k
- Google Gemini App: no-prompt nano-banana templates for multi-image generation – 1.7k
- Andrew Ng: "There is significant unmet demand for developers who understand AI…" – 1.8k
- Perplexity (Arav): "More than a million people got Comet access this morning." – 1.0k
- DeepMind: EmbeddingGemma launch – 1.2k
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. Microsoft VibeVoice Repo Takedown & ComfyUI Integration
- VibeVoice RIP? What do you think? (Score: 200, Comments: 75): OP reports that Microsoft abruptly deleted the official VibeVoice GitHub repo and removed the VibeVoice-Large and VibeVoice-Large-Preview models from Hugging Face; mirrors still exist on ModelScope. They maintain ComfyUI integration nodes (Enemyx-net/VibeVoice-ComfyUI) and shipped v1.0.9 embedding VibeVoice directly to avoid the now-missing upstream dependency; the project was under MIT licensing, implying redistribution is likely permitted. Reason for removal is unknown; the work appears tied to Microsoft's Asia research lab. Comments note that an MIT license allows community re-uploads (e.g., to Hugging Face) and urge backing up assets to prevent loss. Others speculate this follows a pattern of projects from Microsoft Asia labs being pulled, possibly due to team changes or departures.
- Licensing implications: commenters note the project is under the MIT License, which grants broad, irrevocable rights to use, copy, modify, and redistribute existing releases. This means mirrors on platforms like Hugging Face are legally permissible for the already-published version, and any later license changes can't retroactively restrict those artifacts (MIT text). Practical advice: back up both weights and code to avoid loss from upstream takedowns.
- Anticipated re-release changes: if a takedown precedes an updated release, users expect increased safety filters/"censorship" or tighter usage restrictions (e.g., gated downloads, stricter AUP, or embedded refusal policies). This can reduce capability in some domains (higher refusal rates, constrained prompts), so backing up the original checkpoint preserves an unconstrained baseline for evaluation and downstream finetuning.
- Precedent and resilience: commenters compare this to prior incidents (e.g., WizardLM/Wizard 2) where strong checkpoints were released, later pulled/restricted, yet community mirrors persisted and usage continued. The technical takeaway is to prioritize open-weight availability to decouple research and deployments from upstream product or policy reversals (WizardLM repo for context).
- Did M$ take down VibeVoice repo?? (Score: 180, Comments: 36): The post flags that the official Microsoft VibeVoice GitHub repo (microsoft/VibeVoice) now returns a 404, and commenters note the associated Hugging Face models (VibeVoice-Large and VibeVoice-Large-Preview) were also pulled. Community mirrors and tooling still exist: a ComfyUI node implementation is at https://github.com/Enemyx-net/VibeVoice-ComfyUI, and model files can still be fetched from ModelScope: https://modelscope.cn/models/microsoft/VibeVoice-Large/files. Existing local installs continue to function; the takedown reason is unknown and may be temporary, with concerns about potential license changes. Comments speculate it was "too good" and urge downloading mirrors for posterity, while others ask for copies and advise caution about redistributing until Microsoft's intent and licensing are clarified.
- Microsoft's official VibeVoice GitHub repository was suddenly removed, and the Hugging Face entries for VibeVoice-Large and VibeVoice-Large-Preview were also taken down; the VibeVoice-Large weights remain mirrored on ModelScope: https://modelscope.cn/models/microsoft/VibeVoice-Large/files. The reason for the takedown is unknown, raising concerns about potential licensing changes that could affect redistribution or embedding of the code/weights.
- Operationally, existing setups continue to work because inference only requires local weights: "You don't need the original MS repo. As long as you have the weights you can use them in Comfy." ComfyUI integration via the community nodes at https://github.com/Enemyx-net/VibeVoice-ComfyUI remains functional, so pipelines that already reference local checkpoints are unaffected.
- Not all variants are gone: commenters note the 1.5 model is still on Hugging Face, while the Large model is retrievable from ModelScope. Practically, users aiming for reproducibility are downloading and pinning the remaining artifacts now to avoid future link rot while the status and licensing are clarified.
2. EmbeddingGemma 300M Launch + HF Science AMA/FineVision
- EmbeddingGemma - 300M parameter, state-of-the-art for its size, open embedding model from Google (Score: 197, Comments: 38): Google released EmbeddingGemma, a 300M-parameter, text-only multilingual embedding model (trained on 100+ languages) producing 768-dim vectors, with smaller dimensions available via multi-resolution learning (MRL). Weights are on Hugging Face (google/embeddinggemma-300m), deployable via Ollama (library/embeddinggemma), and the launch write-up provides English and multilingual evaluations claiming state-of-the-art performance for its size (HF blog); community GGUF builds (Q4_0, Q8_0, BF16) are consolidated for local inference at unsloth/embeddinggemma-300m-GGUF. License: Gemma. Commenters point to the HF blog's comparison tables for task-level tradeoffs and discuss whether to prefer nomic-embed-text:v1.5 vs EmbeddingGemma, noting the choice likely depends on use case (monolingual vs multilingual coverage, latency/quantization needs, and dimensionality). RAG finetuning and baseline RAG notebooks are forthcoming from the community.
- Deployment/quantization: A community GGUF release bundles Q4_0, Q8_0, and BF16 builds of EmbeddingGemma-300M in one repo (https://huggingface.co/unsloth/embeddinggemma-300m-GGUF), easing llama.cpp/local use; Q4_0 minimizes RAM, Q8_0 trades size for accuracy/latency, and BF16 preserves precision for highest quality. The maintainer also plans RAG finetuning + baseline notebooks to evaluate retrieval quality end-to-end (a minimal local-embedding setup is sketched after this post).
- Benchmarks: Google/Hugging Face provide side-by-side English and multilingual evaluations in the official blog (https://huggingface.co/blog/embeddinggemma), letting you inspect task-level performance (e.g., retrieval/classification) to validate the "state-of-the-art for its size" claim. The linked charts enable apples-to-apples comparisons against other open embeddings across datasets, which is essential for model selection.
- Comparatives: One practitioner reports EmbeddingGemma-300M is "a fair bit worse than qwen 3 0.6b embedding", highlighting a likely trade-off between size (~300M params) and absolute accuracy vs larger (~600M) models. Another asks about nomic-embed-text:v1.5; the practical guidance is to choose based on target languages/domains and the blog's per-dataset scores rather than only headline averages.
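A minimal local-inference sketch for the GGUF builds, assuming the llama-cpp-python bindings and a Q8_0 filename from the Unsloth repo (both assumptions; point the path at whatever quant you actually downloaded):

```python
# Hedged sketch: computing embeddings from a local EmbeddingGemma GGUF file
# with llama-cpp-python. The model path/filename is an assumption; download
# the quant you want from the unsloth/embeddinggemma-300m-GGUF repo first.
from llama_cpp import Llama

llm = Llama(
    model_path="./embeddinggemma-300m-Q8_0.gguf",  # or the Q4_0 / BF16 builds
    embedding=True,   # run the model in embedding mode
    n_ctx=2048,       # EmbeddingGemma's 2k context
    verbose=False,
)

vec = llm.embed("How do I bind Ollama to localhost only?")
print(len(vec))  # embedding length; check the binding's return shape for your version
```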
- AMA with Hugging Face Science, the team behind SmolLM, SmolVLM, Fineweb and more. (Score: 194, Comments: 414): Hugging Face Science announced a time-boxed AMA (8-11 AM PST with 24h follow-ups) featuring researchers behind SmolLM, SmolVLM, and FineWeb, alongside the release of a new multimodal dataset, FineVision (see dataset card: https://huggingface.co/datasets/HuggingFaceM4/FineVision). Reference links: org page https://hf.co/science and learning resources https://hf.co/learn. Participants span model pretraining (e.g., SmolLM/Nanotron), post-training/alignment, evaluation, multimodal (VLM), data, Transformers.js, and llama.cpp integration. Commenters asked about counterintuitive design choices and surprises during SmolLM's development, signaling interest in training/architecture decisions; ecosystem contributors (e.g., Unsloth) chimed in with support.
- A commenter asks about the biggest surprises during SmolLM's development: counterintuitive design choices that ultimately worked. Technical angles include tokenizer/vocab size vs parameter-count trade-offs, context length vs compute budget, data curation via FineWeb/FineWeb-Edu and curriculum, optimizer/regularization choices (AdamW/Lion, weight decay, dropout), attention/activation variants (RoPE scaling, GQA, SwiGLU), and precision/throughput decisions (bf16/fp8, FlashAttention). They're asking for concrete ablations or metrics that show where small models benefit from non-obvious settings.
- Another thread requests how the team prioritizes next projects. Criteria likely include gaps on public benchmarks (MMLU, GSM8K, MT-Bench), readiness of data pipelines like FineWeb for new modalities, compute/latency constraints for deployment (quantization, KV-cache, attention scaling), and reproducibility vs training cost. The ask implies a decision framework with milestone metrics and resource allocation across SmolLM, SmolVLM, and dataset tooling.
- A user asks whether there are plans to train and release larger 30B+ models. Salient constraints include compute budget, dataset scale/quality, dense vs MoE trade-offs, training stack (FSDP/ZeRO, activation checkpointing), inference cost (memory bandwidth, parallelism), and the evaluation needed to justify scaling vs continuing to optimize small models. They're probing the roadmap and feasibility for scaling beyond SmolLM/SmolVLM.
3. Local AI Ops: 5070 Ti Super VRAM Rigs & Ollama Exposure PSA
- Finally: 3090 Successor: 5070 Ti super 24Gb 800$ (Score: 246, Comments: 140): Rumor/leak claims an NVIDIA "RTX 5070 Ti Super" with 24 GB VRAM at ~$800, positioned as a 3090-class successor, citing improved perf/W that could make multi-GPU (e.g., ~100 GB total VRAM) rigs feasible without extreme power draw, and mentions support for new low-precision "FP4" formats for AI inference. Sources include a supposed spec image and a video breakdown (image, YouTube). Commenters also speculate a $600 16 GB GDDR7 "5070" SKU and contrast it with a rumored Intel "B50" 16 GB GDDR6 card at $350, citing a claimed memory-bandwidth gap of ~1792 GB/s vs ~224 GB/s (treated as leak claims, not confirmed). Top replies are skeptical about MSRP availability (expect scalping/backorders) and timing (Q4'25 launch, broad availability slipping into 2026), but note if true it could crater used 3090 prices and undercut Intel's B50 on bandwidth/CUDA; some expect non-Super cards to see price cuts.
- Bandwidth and memory debate: one commenter projects a $600 16GB GDDR7 "5070-class" versus Intel's $350 16GB GDDR6 B50, claiming ~1792 GB/s vs ~224 GB/s (~8×) bandwidth and citing CUDA as an ecosystem advantage. Note that ~1792 GB/s implies a 512-bit bus at ~28 Gbps GDDR7; a 70-class part is more likely 192-256-bit, yielding roughly ~672-896 GB/s at similar speeds: still 3-4× over a 128-bit GDDR6 part (~224 GB/s), but not 8× unless the bus width is unusually large (the arithmetic is sketched after this post).
- Power/TDP implications for multi-GPU VRAM rigs: a linked spec sheet TechPowerUp lists the 5070 Ti at ~300W TDP, undercutting the RTX 3090's typical ~350W but not by a wide margin. As a result, building "100 GB VRAM" multi-GPU setups will still draw kilowatts; the practical gain is newer warranty support plus higher per-card VRAM/bandwidth rather than big power savings.
- Expected generational uplift vs RTX 3090: commenters expect a 24GB "5070 Ti Super" (Blackwell 2.0) at similar power to "wipe the floor" with a 3090 due to newer architecture and faster memory. While no benchmarks are cited, the combination of 24GB VRAM and GDDR7 suggests materially higher perf/$. Against Intel's rumored B50, CUDA availability is flagged as a decisive advantage for many workloads.
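To make the bus-width reasoning above concrete, here is a small worked calculation (a sketch; the bus widths and the 28 Gbps GDDR7 / 14 Gbps GDDR6 data rates are the assumptions from the comment, not confirmed specs):

```python
# GDDR bandwidth back-of-the-envelope: GB/s = bus_width_bits / 8 * data_rate_Gbps.
def mem_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits / 8 * data_rate_gbps

# The claimed leak figure only works out if the bus is 512-bit wide:
print(mem_bandwidth_gbs(512, 28.0))   # 1792.0 GB/s
# A more typical 70-class bus width lands far lower at the same speed:
print(mem_bandwidth_gbs(192, 28.0))   # 672.0 GB/s
print(mem_bandwidth_gbs(256, 28.0))   # 896.0 GB/s
# The rumored Intel B50 comparison point (128-bit GDDR6 at ~14 Gbps):
print(mem_bandwidth_gbs(128, 14.0))   # 224.0 GB/s
```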
- PSA: Make sure your API ports aren't exposed to the open internet (Score: 199, Comments: 55): Cisco reports roughly 1,100 publicly exposed Ollama REST APIs discoverable via Shodan, detailed in their case study "Detecting Exposed LLM Servers: Shodan Case Study on Ollama". They verified instances with a benign probe that may appear in logs as "What is 2+2?"; exposed endpoints allow unauthenticated LLM inference over the internet, implying free compute use and potential data leakage for anyone binding Ollama to 0.0.0.0 or publishing its port (commonly 11434). Commenters debate how exposure happens in 2025: likely culprits include Docker port publishing (e.g., -p 11434:11434), cloud security groups/firewalls permitting 0.0.0.0/0, UPnP/NAT misconfig, or reverse proxies without auth. Another notes prior scraping efforts like the now-offline freeleakhub.com that indexed open Ollama servers, some serving large models (e.g., DeepSeek R1, Qwen 3), suggesting persistent hygiene gaps.
- Prior scans like freeleakhub.com (now offline) reportedly cataloged numerous exposed inference servers, many hosting small models but also full deployments of DeepSeek-R1 and Qwen 3 with no authentication or paywall. This highlights that misconfigured endpoints remain common and trivially discoverable by public crawlers.
- A technical question is raised about how ports get exposed "accidentally," with speculation around router/firewall misconfiguration and containerized stacks (e.g., Ollama) being bound to 0.0.0.0 or published via permissive port mappings on hosts with public IPs. Even with consumer NAT, poor defaults or UPnP/automated port forwards can make APIs reachable from the Internet.
- Another thread asks about placing Ollama behind a proxy to enforce API tokens and IP allowlists, implicitly noting gaps in built-in auth for self-hosted LLM APIs. The suggested mitigation path is a reverse proxy layer that adds authentication and network ACLs before the model endpoint (a minimal self-check is sketched below).
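A quick way to sanity-check your own deployment, sketched against the standard Ollama REST endpoint for listing models (/api/tags); the host list and timeout here are placeholders, not anything from the Cisco study:

```python
# Hedged sketch: probe whether an Ollama API answers unauthenticated requests.
# Only point this at hosts you own. /api/tags lists installed models and needs
# no auth by default, which is exactly why exposed servers are easy to find.
import requests

HOSTS = ["127.0.0.1", "192.0.2.10"]  # placeholder addresses: your own machines

for host in HOSTS:
    url = f"http://{host}:11434/api/tags"
    try:
        resp = requests.get(url, timeout=3)
        if resp.ok:
            models = [m["name"] for m in resp.json().get("models", [])]
            print(f"{host}: OPEN, models visible without auth: {models}")
        else:
            print(f"{host}: reachable but returned HTTP {resp.status_code}")
    except requests.RequestException:
        print(f"{host}: not reachable on 11434 (good, or firewalled)")
```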
- 🤷‍♂️ (Score: 988, Comments: 176): Ambiguous teaser image (unreadable here) with the title "🤷‍♂️" prompts speculation about a very large upcoming Qwen model/tool; commenters mention wanting a "stronger Qwen CLI" that could match/surpass Claude Sonnet 4 and joke about needing 1344 GB of memory, implying hefty local inference requirements or model size. No concrete specs, benchmarks, or release details are provided in the post. Commenters expect the release to be "huge… in size," debate whether Qwen can reach Claude Sonnet 4 quality at the CLI, and note hardware constraints for on-prem users.
- Requests center on a more capable Qwen CLI that can rival Anthropic's Claude Sonnet on reasoning/coding. Concretely, commenters want parity on benchmarks like GSM8K, HumanEval, MMLU, and GPQA, along with production features (tool/function calling, streaming, low-latency decoding via vLLM/speculative decoding, and paged attention). A turnkey CLI that ships quantized builds (AWQ/GPTQ/EXL2) and long-context support would make self-hosting competitive with API-only models like Claude Sonnet.
- Hardware sizing discussion implies interest in running very large models locally: with 1.344 TB of RAM, feasible model capacity depends on precision (fp16 ≈ 2 bytes/param, int8 ≈ 1, 4-bit ≈ 0.5). Examples: a 70B model in fp16 is ~140 GB; a 405B model at 4-bit is ~202 GB for weights (KV cache adds substantial overhead depending on sequence length/batch). With vLLM or TensorRT-LLM plus paged KV cache, long contexts (e.g., 100k+) are memory-viable; throughput will hinge on parallelism and quantization strategy (the arithmetic is sketched after this post).
- There's explicit concern about a closed-weight "Qwen-3-Max" and a preference for open weights for reproducibility, self-hosting, and fine-tuning. Open checkpoints enable domain adaptation, RAG-specific alignment, and verifiable constrained decoding, whereas closed weights lock users to vendor APIs and limit auditing. This aligns with prior community adoption of open Qwen releases (e.g., Qwen on Hugging Face) and strongly affects regulated/air-gapped deployments.
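A small sketch of the weight-memory arithmetic from that comment (weights only; KV cache and activation overhead are deliberately excluded, and the bytes-per-parameter figures are the rough values quoted above):

```python
# Back-of-the-envelope weight memory: params * bytes_per_param, reported in GB.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(n_params_billion: float, precision: str) -> float:
    return n_params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1e9

print(weight_gb(70, "fp16"))   # ~140 GB, matching the 70B fp16 example
print(weight_gb(405, "int4"))  # ~202.5 GB, matching the 405B 4-bit example
print(weight_gb(30, "int8"))   # ~30 GB, e.g. a 30B model in int8
# Note: KV cache grows with batch * sequence length and can rival weight
# memory at 100k+ contexts, so treat these numbers as a lower bound.
```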
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Nano Banana & Veo3 Visual Gen Demos and Workflows
- I asked nano banana to get me into my favorite arcade (Score: 915, Comments: 76): Creator used a real first frame as the base plate, then composited themselves into an arcade via image editing with "nano banana," and generated motion using Kling 2.1's start/end-frame animation workflow; audio was created with Producer AI and the final cut/grade was done in DaVinci Resolve. A step-by-step walkthrough is provided here: techhalla's tutorial.
- I asked nano banana to get me into my favorite arcade (Score: 912, Comments: 76): OP showcases an AI-assisted workflow: image compositing with "nano banana" to insert themselves into an arcade scene (noting the first still was a real photo), motion generated via Kling 2.1 using a start/end-frame method (i.e., keyframe-based img2vid), AI-generated music from Producer AI, and final assembly/editing in DaVinci Resolve. A step-by-step walkthrough is provided on X/Twitter: https://x.com/techhalla/status/1963333488217919668. Top comments are non-technical praise and nostalgia (e.g., mention of the arcade game Cadillac and Dinosaurs); no substantive technical critique or benchmarking discussion.
- Paintings coming to live with Nano Banana and Veo3 (Score: 903, Comments: 103): A short demo animates classic paintings by first generating a sequence of stills with Google's Gemini 2.5 Flash image editor (the "Nano Banana" images) and then converting them to video via interpolation/synthesis. Despite the title crediting Veo 3, the author later corrected that the video was actually produced with Seedance Pro and Kling 2.1, not Veo; this is an image-to-video interpolation pipeline rather than end-to-end text-to-video. The original clip link requires Reddit auth and returns 403 without login (login). Non-technical top comments joke about the subjects' affect; the only substantive update is the correction of tool attribution (Veo 3 was not used).
- A commenter corrects the pipeline: the "nano banana" stills were generated with Google Gemini 2.5 Flash (image editor), and the video was created via interpolation using Seedance Pro and Kling 2.1, not Veo 3. This means the motion comes from frame interpolation rather than native text-to-video synthesis by Veo, which typically changes temporal coherence and artifact characteristics (e.g., smear vs. hallucinated motion).
- Paintings coming to live with Nano Banana and Veo3 (Score: 907, Comments: 103): OP showcases "paintings coming to life" by first generating stills with "Nano Banana" using Google's Gemini 2.5 Flash image editor (Gemini 2.5 Flash), then converting them into video via frame interpolation/temporal synthesis. A later correction specifies that interpolation was done with Seedance Pro and Kling 2.1, not Google's Veo 3 (title reference; general Veo info: Veo). The shared clip is hosted at Reddit's CDN (v.redd.it/ahb3ybfu73nf1), which returns HTTP 403 Forbidden without authentication due to network-security gating. Comment discussion is largely humorous; the only substantive technical point is the correction clarifying tool attribution (Seedance Pro + Kling 2.1 vs. Veo 3).
- Pipeline attribution correction: source images for the "Nano Banana" sequence were created with Google Gemini 2.5 Flash (image editor), and the image-to-video interpolation was done using Seedance Pro and Kling 2.1, not Veo 3. In other words, Veo 3 wasn't used for temporal synthesis; motion between stills was generated by Seedance Pro + Kling 2.1, with Gemini providing the base imagery.
- Improved Details, Lighting, and World knowledge with Boring Reality style on Qwen (Score: 430, Comments: 50): Early LoRA work targeting a photorealistic "Boring Reality" style on the Qwen image generation stack is shared, with reproducible setup via a ComfyUI workflow (workflow JSON). Artifacts are published on Hugging Face and CivitAI. Reported strengths are fine detail and physically plausible lighting on close-up subjects; prompting behavior/results are described as similar to SD 1.5, with thanks to Hugging Face for GPU support enabling training. Commenters note that despite strong realism, small text/numbers and diagrammatic elements needing consistent internal logic remain weak points. Achieving top results often requires mixing multiple LoRAs and iterative experimentation on Qwen.
- Early LoRA finetuning on the Qwen image model shows it excels at close-up detail and lighting, but consistency often requires mixing multiple LoRAs and experimentation. Results are reported as broadly similar to SD 1.5 workflows. Model and workflow resources: Hugging Face kudzueye/boreal-qwen-image, CivitAI modelVersionId=2181911, and a ComfyUI example graph boreal-qwen-workflow-v1.json. "It seems to perform best at getting detail and proper lighting on upclose subjects."
- Complex compositions remain a failure mode: multiple characters across poses (lying, sitting, standing), object interactions, and concurrent gestures often collapse unless guided. Users report better reliability when supplying a guide image or hand-drawn outlines, similar to SDXL-era techniques, to anchor spatial layout and reduce character/object mixing. "Even the best of models fall apart when trying to do all this… unless you have a guide for the image."
- Fine text, numbers, and diagrams still expose weaknesses in text rendering and symbolic consistency; small glyphs that require "internal logic" are frequently wrong despite strong photorealism. This reflects a common limitation across current image generators in reproducing legible micro-text and structured schematics.
- Stock Photography Version 1 [Wan 2.2] (Score: 346, Comments: 37): Release of a Wan 2.2 LoRA ("Stock Photography v1") trained on high-quality photos, intended to pair a "high" and a "low" variant together for best results; recommended generation at 1888×1248 (portrait 1248×1888 reportedly causes severe artifacts). On an RTX 4060 Ti 16 GB, inference takes ~4 min per image; known issues include weak text rendering, hand/pose failures, and sensitivity to prompt phrasing. The LoRA is designed to compose well with character LoRAs; resources credited include a ComfyUI install script by UmeAiRT (https://civitai.com/models/1309415) and a Wan 2.2 LoRA training guide by AI_Characters (https://civitai.com/articles/17740); model download: https://civitai.com/models/1925758. Commenters argue the style is not truly "stock photography" but closer to casual/event photography, suggesting a rename. Others request embedded workflows for reproducibility, claiming example images lack them, and note that minor ComfyUI node toggles often drive the "magic," making replication difficult without shared graphs.
- OP reports strong training stability and output quality with Wan 2.2 when the LoRA is trained on high-quality photos (vs prior Flux Dev LoRAs). They recommend using both the "high" and "low" Wan 2.2 models together; on an RTX 4060 Ti 16 GB, generations take ~4 minutes per image. Optimal resolution is 1888x1248; flipping to 1248x1888 produces severe anatomical artifacts. Known limitations: rough text rendering, hand errors in complex poses, and prompt sensitivity; notable strength: compatibility with character LoRAs. Links: model download (https://civitai.com/models/1925758), Comfy install script (https://civitai.com/models/1309415), Wan 2.2 LoRA training guide (https://civitai.com/articles/17740).
- Reproducibility concern: a commenter notes the example images do not have embedded workflows and asks for reference ComfyUI workflows to replicate results. They caution that a single node toggle can materially change outputs, so providing explicit graphs and parameters would remove ambiguity about the "simple WF" claim and enable apples-to-apples testing.
- Community requests concrete training details: hardware used, training durations, and dataset size/quality for this LoRA. Sharing compute footprint (VRAM/GPUs), epoch counts/steps, and dataset composition would help others estimate requirements and reproduce or extend the results in Wan 2.2.
- While OpenAI is going backwards, Google is just killing it, Nano Banana and Veo are just insane tools. (Score: 4290, Comments: 321): The post claims Google's latest gen-AI stack, especially Veo and on-device Gemini Nano (the "Nano Banana" nickname), is outpacing OpenAI. Technically, Veo is Google's text-to-video model producing 1080p clips with promptable camera control, style conditioning, and edit-with-prompt workflows intended for longer, temporally coherent shots (DeepMind Veo, I/O overview). Gemini Nano is a compact on-device model integrated with Android AICore for low-latency, offline tasks (summarization, safety/ASR aids, and announced multimodal extensions) with developer hooks for running on mobile CPUs/NPUs (Gemini Nano). Top comments aren't technical; they joke about pacing and a Van Gogh scene having "too many ears," implicitly pointing to known failure modes in current video generators: weak scene-ending heuristics and occasional anatomical/temporal inconsistencies.
2. Meta Superintelligence, Sutskever "breakthrough" and GPT-6 Rumors
- Alexandr Wang is now leading Meta's AI dream team. Will Mark Zuckerberg's big bet pay off? (Score: 586, Comments: 249): Meta has appointed Alexandr Wang (cofounder of Scale AI) as its first Chief AI Officer, consolidating all AI product and research under a new org, Meta Superintelligence Labs, after a reported $14.3B investment in Scale AI. Wang will lead a new "superintelligence" team of elite hires and oversee Meta's broader AI portfolio; his background includes founding Scale AI during Y Combinator in 2016 to build data-labeling infrastructure. Commenters question fit and org design: skepticism that Scale AI is "just" a data annotation shop and thus unlikely to drive AGI; surprise that Yann LeCun would report to Wang, with doubts about credentials and references to impostor syndrome.
- Debate centers on whether a data-annotation-centric background (Scale AI) is the "bottom rung" or actually a core lever for frontier LLM quality. The technical focus is on data pipeline rigor (curation, dedup/filtering, preference/RLHF data, and eval design), which can materially shift downstream metrics (MMLU, pass@1, toxicity), sometimes more than minor architecture tweaks; see OpenAI's RLHF in InstructGPT (https://arxiv.org/abs/2203.02155) and AllenAI's OLMo/DOLMA showing the outsized impact of data quality (https://allenai.org/olmo). If Wang can scale high-quality human feedback and automated QA reliably, it could directly impact Llama alignment and eval performance.
- Others allege Meta "dropped" Scale AI over label/data quality, implying vendor-provided human feedback/eval sets became a bottleneck. If true, it highlights classic failure modes (label noise, instruction ambiguity, misaligned annotator incentives, and lack of golden-set auditing) that propagate into alignment failures and eval regressions (e.g., factuality/harmlessness) despite higher spend; common mitigations include consensus labeling, adversarial sampling, deduplication, and continuous QA. This claim isn't sourced in the thread, but it underscores why many labs insource data/feedback pipelines and invest in stronger measurement.
- GPT 6 is coming… (Score: 916, Comments: 59): The post is a meme/satire rather than a technical announcement; the image (titled "GPT 6 is coming…") implies dystopian, authoritarian enforcement around AI usage, not a real model release or benchmark. No implementation details, model specs, or empirical results are provided. Top comments pivot to a substantive debate: advocates argue this highlights why open-source, locally runnable LLMs (e.g., DeepSeek) are preferable to proprietary "home-grown Big Brother" systems due to surveillance/abuse risks, while others condemn the perceived erosion of civil liberties in the U.S. The tone is alarmist/sarcastic (e.g., "firing squad"), underscoring fears of punitive control rather than technical issues.
- A commenter highlights that open-source LLMs (e.g., DeepSeek) can be self-hosted to avoid SaaS telemetry and jurisdictional exposure, contrasting with closed systems that may log prompts or be compelled to share data. Practically, local inference using GGUF/quantized weights (INT4/INT8) via llama.cpp or Ollama enables 7B-13B models on 8-16 GB VRAM and 30B-70B with 24-64 GB (with throughput varying from ~20-100+ tok/s depending on quantization, GPU, and context length); see the DeepSeek org for open weights and variants (HF, GitHub). They also note privacy still depends on the stack: disable front-end analytics, keep prompts/data offline or encrypted, and prefer models with permissive licenses/open weights so binaries and network calls can be audited.
- Codex usage up ~10x in the past 2 weeks! (Score: 323, Comments: 48): Screenshot (appears to be a Sam Altman tweet) claiming OpenAI Codex usage is up ~10x in the past two weeks (image). No technical details, benchmarks, or API changes are provided; this is a high-level adoption/engagement metric rather than a performance result or feature announcement. Comments suggest seasonality (start of the school year) as a driver and note that the $20/mo plan is "hardly hitting usage caps," implying improved rate limits/throughput; others argue the claim is credible because Altman wouldn't "hype a nothing-burger."
- Users on the $20/month Plus plan report running GPT-5 Thinking High with minimal rate-limit friction, implying more generous caps than prior tiers. Another user still hit a cap and had to wait "a few days" for the reset, suggesting limits are finite but extended; perceived session longevity with gpt-5 high has improved compared to earlier behavior.
- Anecdotes indicate Codex's latest update materially improved UI/UX design generation quality: users who previously "exclusively" relied on Claude now get "surprisingly good designs" from Codex. This suggests better layout/wireframe synthesis and design reasoning, reducing the need to model-switch for front-end ideation.
- Some commenters attribute the ~10x usage spike to migration from Claude after an Anthropic "nerf," implying capability or policy regressions can quickly redirect workloads. If accurate, this highlights cross-provider elasticity: perceived degradations in one model immediately boost utilization of substitutes like Codex.
- The internet will become increasingly automated and artificial (Score: 762, Comments: 149): The image (linked) is a satirical depiction that the modern internet is being overrun by automation: bot-driven astroturfing on social platforms (implied jab at X/Twitter), SEO spam via fake ranking sites and blogs, AI-generated content farms (e.g., YouTube for ad revenue), large-scale botting in online games for RMT, and purchased/botted followers to fabricate social proof. The technical thrust is that recommendation/search systems and social metrics can be systematically gamed at scale by coordinated bots and generative models, accelerating a "dead internet" dynamic where machine content outnumbers authentic human activity. Commenters argue this automation is "inevitable" due to incentives across propaganda, marketing, and monetization, and note that distinguishing humans online increasingly relies on niche meme-speak or abrasive vernacular rather than classic Turing-test cues. Some interpret the image as specifically criticizing Elon Musk's platform (X).
- A scalable astroturfing pipeline is outlined: deploy hundreds of thousands of bots to simulate consensus, generate LLM-written blogs and "fake ranking websites" to poison SEO, and route bots to those links to manipulate search suggestions. This is a classic Sybil + search-engine-poisoning attack exploiting engagement-weighted ranking in social feeds and SERPs; with residential proxies and CAPTCHA-solving, detection becomes costly. The outcome is automated normalization/propaganda and product shilling that outcompetes organic content via volume and coordination. See: astroturfing, search engine poisoning.
- Monetization vectors cited include MMO botting to farm/sell in-game currency, programmatic YouTube video generation for ad revenue, and buying bot followers to bootstrap social proof and trigger recommender systems. This leverages ranking feedback loops (engagement → visibility → more engagement) to amplify synthetic accounts, making detection harder once critical mass is reached. Tactics mirror gold farming and click farms, and can be combined with AI-generated media for 24/7 output that overwhelms moderation queues.
- One commenter notes the "Turing test" is increasingly cultural: bots that mimic ultra-niche meme dialects or "say slurs" can evade naive language-based bot heuristics. Implication: detection needs to shift from surface linguistic cues to network- and behavior-level signals (e.g., temporal patterns, device fingerprints, graph anomalies) as language becomes an unreliable discriminator.
- Updates! Not bad for Free tier btw… (Score: 445, Comments: 108): The image appears to be a ChatGPT "Updates" screenshot noting that the Free tier now includes access to Projects, enabling scoped workspaces to organize chats, files, and tools. Comment context indicates users are attempting cross-chat summarization within a Project, but the model can fail to traverse the intended chat set and instead retrieve or hallucinate from unrelated threads, suggesting limitations in retrieval/scoping across project conversations and long "thinking" times. Debate centers on utility vs reliability: some say Projects are very helpful for organization, while others report Pro failed to summarize multiple chats and drifted to an unrelated project, questioning robustness; one quips that if Free has Projects, Plus may be unnecessary.
- A ChatGPT Pro user asked the assistant to scan and summarize multiple chats within a Project; it apparently failed to read any of them, idled for ~10 min, then referenced a different (unrelated) project and produced off-topic advice. This points to brittle project-scoped retrieval/context routing across many chats and poor timeout/latency handling under larger workloads (possible cross-project context bleed).
- Concern that GPT-4o's strong writing capability may be lost if it's labeled "legacy," including for paid users. The (implicit) request is for stable, version-pinned access to that skillset across Projects and tiers to avoid silent model swaps/regressions over time.
3. AI Hallucination in Court + ChatGPT Community Experiments
- Opposing Counsel Just Filed a ChatGPT Hallucination with the Court (Score: 8437, Comments: 979): A civil litigator reports that opposing counsel (a collections firm) filed an opposition brief on shortened time that appears AI-generated, containing fabricated authorities: case names/citations didn't exist or didn't match, and quotes were nowhere in the texts. Telltale signs cited include odd formatting (em dashes, random bolding/bullets), an improperly formatted caption using the judge's nickname, and an unnecessary perjury signature; the filer has since moved to withdraw, with that motion set the same day as the motion to dismiss. The respondent filed a reply attaching a reconciliation spreadsheet and flagged duty-of-candor concerns (see ABA Model Rule 3.3 and potential Rule 11 exposure; cf. the Avianca sanctions order, Mata v. Avianca). Commenters ask for a post-hearing update, question the grounds for withdrawal, and debate whether filing fabricated citations is sanctionable/"illegal," noting it would be under traditional ethical rules and could set precedent for AI misuse in filings.
- Procedural sanctions playbook: Serve a Rule 11 safe-harbor letter/motion giving 21 days to withdraw the hallucinated filing, then file your sanctions motion if ignored; attach the letter as Exhibit A and seek fees for responding. See Fed. R. Civ. P. 11(c)(2) and the duty to ensure filings have evidentiary support under Rule 11(b) (text).
- Recent precedent illustrates consequences for AI-fabricated citations: in Mata v. Avianca, Inc. (S.D.N.Y. 2023), Judge Castel sanctioned counsel $5,000 (jointly/severally) and ordered remedial notices after ChatGPT-invented cases were filed (order). Some courts now require AI-use certifications (e.g., N.D. Tex. Judge Brantley Starr's standing order mandating verification of all citations and disclosure of AI assistance, PDF).
- A motion to be relieved as counsel does not moot sanction exposure; Rule 11 targets the attorney(s) who signed/submitted the paper, and courts weigh timing, prejudice, and reason in deciding withdrawal. The conduct also implicates ABA Model Rule 3.3 (Candor Toward the Tribunal), which prohibits offering false statements or failing to correct them (rule).
- TIL ChatGPT can create Trump without ever saying his name (Score: 419, Comments: 139): Post demonstrates prompt-based evasion of public-figure name filters in ChatGPT's image generation: describing attributes (e.g., "giant orange person" with "blue suit," "blonde hair," "red tie") yields a recognizable likeness of Donald Trump without using his name, with outputs shown in linked previews (example 1, multi-figure candle caricatures resembling US/Russian/Chinese leaders, user attempt with guillotine scene). The attempt requests a GIF but yields a static JPEG, highlighting modality limits (no animation) and suggesting safety filters trigger primarily on explicit names rather than descriptive attributes; violent/political content ("brought to justice medieval style… guillotine") is sometimes allowed, indicating inconsistent moderation thresholds. Commenters note the outputs are overtly targeted and discuss that euphemistic, attribute-based prompts can consistently bypass name-based public-figure and political-content filters, with moderation behavior perceived as inconsistent across similar prompts.
- Commenters demonstrate prompt-engineering to bypass name-based safety filters by describing distinctive attributes (e.g., "giant orange person" with a blue suit, blonde hair, red tie) to elicit a likeness of a specific public figure without using the name. Examples show the model still renders recognizable caricatures (image 1, image 2), implying reliance on named-entity triggers rather than appearance-based moderation. This highlights a brittle guardrail where visual attribute prompts can recreate public-figure likenesses.
- Safety behavior is probed with violent-scene prompts ("brought to justice medieval style," "before the guillotine"), and images appear to be generated regardless, suggesting a gap in content filters when targets aren't explicitly named. The observations imply that violence classifiers may not couple identity recognition with scene semantics, allowing targeted-violence depictions if NER doesn't fire (example prompt and output).
- A user shares a GIF output (link) despite the common limitation that ChatGPT's native image generation returns static images; this suggests out-of-band conversion or stitching if the GIF indeed originated from ChatGPT prompts. The discrepancy is noteworthy for assessing real capabilities vs. user-postprocessed results.
- What ChatGPT thinks r/ChatGPT will look like in 10 years (Score: 301, Comments: 50): Meme-style, likely AI-generated image (link) satirizes what r/ChatGPT might look like in 10 years: dominated by deepfakes (e.g., a garbled "Jcoe Rogan interviewing a beepfake of Joee Rogan"), moderation-evasion/jailbreak culture, and chaotic, glitchy UI text that reflects current image-model typography failures. It's non-technical content, serving as cultural commentary on model safety bypasses and AI-generated media proliferation rather than an announcement or benchmark. Comments highlight expectations of persistent restriction bypassing and the overwhelming/cognitive-load feel of such a future, with one remarking it "fried my short-term memory," matching the chaotic aesthetic.
- Just made this little edit with ChatGPT, how cool is it, open for original post btw (Score: 632, Comments: 53): OP showcases a ChatGPT-generated media edit, noting it captured very small details, but provides no technical workflow, model version, or parameters. The linked artifact in the Reddit gallery is inaccessible (HTTP 403 Forbidden), so the result can't be independently reviewed; no prompts, iteration counts, seeds, or settings are disclosed, limiting reproducibility. Comments highlight strong detail fidelity and ask how many passes/iterations were used and what the exact prompt was, implying iterative refinement and prompt specificity are key. The absence of a shared prompt/workflow is the main blocker for replication or benchmarking.
- Commenters probe the number of passes/iterations used and note the surprising preservation of very small details, implying concerns about artifact accumulation and mask precision in iterative image edits. Multi-pass workflows can improve global coherence but risk eroding micro-textures; balancing mask granularity and edit/denoise strength is key to retaining fine detail while making substantial changes.
- Multiple requests ask for the exact prompt and parameters to enable reproducibility (literal prompt text, model/version, image-edit mode, seed). For prompt-based image editing, sharing the seed, guidance/strength, and whether the result came from a single-shot vs. multi-step process materially affects the ability to replicate outcomes (see the sketch below for the parameters that matter).
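To show which knobs reproducibility actually hinges on, here is a hedged sketch using the open-source diffusers img2img pipeline (not ChatGPT's editor, whose parameters are not exposed); the model id, strength, and guidance values are illustrative assumptions:

```python
# Hedged sketch: a reproducible image edit with diffusers img2img.
# Pin the model, seed, strength, and guidance scale so others can replicate it.
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # illustrative model id
).to("cuda")

init_image = load_image("arcade_photo.png")            # placeholder input image
generator = torch.Generator("cuda").manual_seed(1234)  # fixed seed = same output

edited = pipe(
    prompt="same scene, neon-lit retro arcade, film grain",
    image=init_image,
    strength=0.45,        # how far the edit departs from the source image
    guidance_scale=5.0,   # prompt adherence vs. freedom
    generator=generator,
).images[0]

edited.save("edit_seed1234_strength0.45.png")
```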
- Casual conversation with the security robot dog (Score: 861, Comments: 119): A short video (original: v.redd.it/mgu9fy21w2nf1, currently returning HTTP 403 without auth) depicts a security quadruped "robot dog" engaging in a brief spoken exchange (e.g., "Right this way.") while audibly walking (CLANK…), suggesting a human-in-the-loop speaking through the robot's PA or a simple TTS/ASR pipeline rather than an autonomous conversational agent. The setup aligns with current security deployments where robots provide mobility/sensors while a remote operator supervises or directly speaks, trading full autonomy for reliability and liability control. Top comments imply skepticism that this is "AI," with the quip "AI – anonymous Indian" pointing to offshore teleoperation; another notes that such systems effectively outsource security work and speculates the same model could be scaled to trades via humanoid teleoperated robots, raising labor and displacement concerns.
- The thread infers the robot dog is teleoperated by a remote human operator (potentially offshore), highlighting a telepresence security model that enables labor arbitrage and centralized monitoring across sites. Commenters speculate this approach could generalize to other platforms (including humanoids) for physical jobs, shifting on-site roles to remote control centers.
- Observers note a rear green flashing indicator, likely a status LED communicating the robot's operational state to nearby humans (e.g., connected/idle/normal operation). Such explicit state signaling is common in HRI/robotics for situational awareness and safety, though the exact semantics aren't specified here.
- Comments imply the unit has a noticeable acoustic signature (described as "CLANK CLANK"), which may impact stealth and user acceptance in security patrol contexts. This suggests drivetrain/footpad design trade-offs favoring durability over quiet operation.
- It's bad out there (Score: 968, Comments: 78): Non-technical meme/screenshot referencing Sam Altman's "It's bad out there" line as a dig at X (Twitter), implying that much of the platform's engagement is driven by bots/automation rather than real users. Comments highlight synchronized messaging and automated engagement (botnets/Sybil activity, astroturfing), but the post provides no data, metrics, or new evidence; it's commentary rather than analysis. Top comments say this is obvious and not particularly insightful, just a justified swipe at X's bot problem; one quips about coordinated MAGA bot accounts and another shares a jokey "how Sam thought he was saying this" meme.
- Multiple commenters note that a large share of engagement on X/Twitter appears automated, citing synchronized talking points and timing as telltale signals of botnets. Heuristics mentioned include identical phrasing across many accounts, bursty reply/retweet patterns, and low-entropy profile metadata: classic indicators of automation rather than organic coordination.
- The problem is described as cross-platform, affecting Meta properties as well, aligning with patterns of coordinated inauthentic behavior. Observed indicators include convergent writing styles, stock/AI-looking profile photos, and swarms of accounts arriving simultaneously to push specific narratives, consistent with astroturfing in entertainment marketing (e.g., promo vs. legacy cast debates) and politics.
- A technical concern raised is not "AI takeover" but the scaling of influence ops via LLM-assisted content farms that amplify polarized, binary narratives. This implies increased difficulty of content-based detection and a shift toward graph/behavioral defenses (account age, interaction graphs, temporal clustering) to separate humans from automated or orchestrated actors.
- Has anyone tried this? (Score: 14268, Comments: 358): The image appears to be a meme/screenshot of someone asking an AI to generate valid Microsoft/Xbox gift card codes; commenters explain this won't work because models can only mimic the visible code format (e.g., grouped alphanumerics) and have no access to Microsoft's issuance or redemption database. Gift/voucher codes are generated server-side and validated against a backend; at best an AI could output format-looking strings (similar to how credit card generators can produce Luhn-valid numbers) but they won't authorize without a matching issued record. Top comments dismiss the idea as naive, likening it to old "credit card number generators" and noting that even if a model guesses the format, working codes require backend issuance and will be blocked by rate limiting/fraud controls.
- Multiple commenters note that an LLM can infer and reproduce the surface pattern of Microsoft gift codes (e.g., 5×5 alphanumeric blocks) but cannot access issuer backends to produce valid, unredeemed codes. At best, it's doing pattern completion or naive enumeration over an astronomically large keyspace (even with constraints), which is computationally and practically useless for finding real codes.
- Parallels are drawn to old "credit card number generators," which typically output numbers that merely satisfy the Luhn check and BIN format but fail real authorization because they aren't tied to actual accounts (the Luhn check itself is sketched below). Those tools were also notorious malware vectors, highlighting the security risk of running code or executables that promise "free" keys or credentials.
- A commenter frames this as a mid-2023 prompt-engineering fad: coercing models to emit strings that match regex-like formats for keys or codes before safety updates clamped down. This exploits distributional patterning in the model's training data, not any privileged database or API access, so the outputs are lookalike strings rather than redeemable secrets.
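For reference, the Luhn check the commenters mention is a simple mod-10 checksum; passing it makes a number format-valid but says nothing about whether an issued account or balance exists. A minimal sketch:

```python
# Luhn (mod-10) checksum: the format-level check that card-number "generators"
# satisfy. It proves nothing about authorization, issuance, or balances.
def luhn_valid(number: str) -> bool:
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    # Double every second digit from the right; subtract 9 if the result > 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0

print(luhn_valid("4539 1488 0343 6467"))  # True: a well-known test number
print(luhn_valid("4539 1488 0343 6468"))  # False: last digit changed
```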
AI Discord Recap
A summary of Summaries of Summaries by gpt-5
1. Low-Bit Training, Triton Changes, and GPU Perf Playbook
- TorchAO Turns Up QAT to Eleven: The GPU MODE thread flagged the torchao v0.13.0 release with a simpler multi-step QAT API, prototype NVFP4/FP8 QAT, and a 1.2x MXFP8 dense pretraining bump via Torchtitan, plus float8 training wiring into Axolotl; release notes: PyTorch AO v0.13.0-rc8.
- Members highlighted that float8 training now lands in workflows via Axolotl as per the release post, calling it a step toward more stable low-bit training in production.
- MXFP8 PR Pops, Then Plops: Triton briefly added MXFP8 dot product support via `tl.dot_scaled` for sm_120 (5090) before reverting it pending investigation, with maintainers pointing users to `torch._scaled_mm()` instead; see the thread on triton-lang/triton#8029.
- One member admitted "I am not sure" why it was reverted, while others noted training stacks should hedge with PyTorch primitives like `torch._scaled_mm()` until Triton stabilizes MXFP8.
- Cuda Graphs Crush Kernel Launches: Engineers reported that cuda graphs deliver the bulk of speedup by slashing kernel launch overhead (especially with Triton kernels) and recommended `torch.compile(mode="reduce-overhead")` plus sequence-length padding to avoid recompilations for variable lengths, citing SIMD intrinsics in the CUDA Math API.
- The consensus framed kernel fusion as secondary to reducing launch overhead, and reminded that sub-32b operations are possible but inefficient without vector types per CUDA docs.
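That recipe can be sketched as follows (a hedged illustration, not any team's actual training code): compile with the CUDA-graph-friendly "reduce-overhead" mode and snap variable sequence lengths to a small set of bucket sizes so each shape compiles and captures only once.

```python
# Hedged sketch: compile once per padded bucket length instead of once per new shape.
import torch
import torch.nn.functional as F

BUCKETS = [64, 96, 128, 256, 512, 1024, 2048, 4096]  # predefined lengths from the discussion

def pad_to_bucket(x: torch.Tensor) -> torch.Tensor:
    """Right-pad the sequence dimension (dim=1) up to the next bucket length."""
    seq_len = x.shape[1]
    target = next(b for b in BUCKETS if b >= seq_len)
    # F.pad pads trailing dims first: (hidden_left, hidden_right, seq_top, seq_bottom).
    return F.pad(x, (0, 0, 0, target - seq_len))

model = torch.nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True).cuda()
compiled = torch.compile(model, mode="reduce-overhead")  # enables CUDA graphs under the hood

for seq_len in (70, 90, 300):  # varying lengths all snap to a handful of shapes
    x = torch.randn(4, seq_len, 256, device="cuda")
    out = compiled(pad_to_bucket(x))  # real training would also mask the padded positions
```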
- Glossary Gains: Modal Maps the GPU Maze: Modal published a curated GPU Glossary that catalogs performance primitives, memory hierarchies, and feature definitions for practitioners, available at modal.com/gpu-glossary/perf.
- Contributors thanked reviewers and pitched the glossary as a shared performance vocabulary to speed debugging and architecture conversations across teams.
2. Agent Tooling Goes Real: ACK-Lab Wallets, DSPy Momentum
- Agents Get Paid: ACK-Lab Ships Wallets: Catenalabs unveiled a developer preview of ACK-Lab that gives agents built on the open-source Agent Commerce Kit (ACK) real wallets/fiat accounts, verifiable identities, and policy controls; docs live at ack-lab.catenalabs.com.
- Members said this enables autonomous transaction flows and compliance-aware actions, calling it a bridge from demos to "policy-driven, money-moving agents" per ACK-Lab.
- DSPy Drumbeat: Paradigm or Pipe Dream?: Practitioners argued DSPy could be the most significant programming shift since early LLMs if it reaches critical mass, pointing to this take: lateinteraction on DSPy.
- Skeptics asked for more end-to-end wins, while fans framed DSPy as an opinionated program synthesis + optimization stack that finally makes "prompt engineering reproducible" via compiled pipelines.
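For readers new to DSPy, here is a minimal, hedged sketch of the programming model fans are describing: declare what a step should do as a signature, let a module implement it, and optionally compile the pipeline against a metric rather than hand-tuning prompts. It assumes a recent DSPy release (with `dspy.LM` and `dspy.configure`); older versions wired up language models differently.

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # any supported provider/model id works

# A one-line signature: inputs -> outputs. DSPy turns this into a prompt it can optimize.
qa = dspy.ChainOfThought("question -> answer")

pred = qa(question="What does QAT stand for in the torchao release notes?")
print(pred.answer)

# With labeled examples and a metric, an optimizer (e.g., BootstrapFewShot or GEPA) can
# "compile" this program, which is the reproducibility claim made in the thread.
```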
- Hallucinations on a Budget: HallBayes Experiments: Researchers kicked around integrating HallBayes into DSPy as a Bayesian budget to curb hallucinations, linking the repo: leochlon/hallbayes.
- The thread proposed evidence allocation and verifier loops to meter generations, noting that robust uncertainty accounting would help productionize "truthy" agent behaviors.
3. Multimodal & On-Device: smolVLM2, LFM2, EmbeddingGemma
- SmolVLM2 Signs Up for Sign Language: Hugging Face users explored fine-tuning smolVLM2 on sign-language videos, citing architecture details in the official post: smolVLM2: A small, powerful vision-language model.
- The community agreed feasibility is high with the right video data and adapters, encouraging targeted gesture understanding tasks over generic captioning.
- Liquid Courage: LFM2 Tames Vision Hallucinations: For vision hallucination complaints, members recommended Liquid Foundation Models (LFM2) built on Llama-3.2-11B-Vision-Instruct, with a live space: LFM2-MCP on Spaces and base model card: Llama-3.2-11B-Vision-Instruct.
- Early adopters claimed improved grounding on small images, advising teams to "just try it out lol or dont" to judge fit.
- EmbeddingGemma Goes On-Device: Google launched EmbeddingGemma, a 308M-parameter on-device embedding model targeting private, portable vectorization, announced via Introducing EmbeddingGemma and the talk EmbeddingGemma overview.
- Engineers see this as a practical edge retrieval option where privacy and low-latency matter, complementing server-side cross-encoders.
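A hedged sketch of the edge retrieval use case via Sentence Transformers is below. The model id shown is the commonly referenced checkpoint name and is an assumption here; check the Hugging Face hub for the exact identifier and any license gating.

```python
from sentence_transformers import SentenceTransformer

# "google/embeddinggemma-300m" is an assumed hub id for illustration only.
model = SentenceTransformer("google/embeddinggemma-300m")

docs = ["How do I reset my router?", "Best hiking trails near Zurich", "Router firmware update steps"]
query = "my wifi box needs a factory reset"

doc_emb = model.encode(docs, normalize_embeddings=True)
q_emb = model.encode(query, normalize_embeddings=True)

scores = doc_emb @ q_emb  # cosine similarity, since embeddings are normalized
print(sorted(zip(scores, docs), reverse=True)[0])  # best-matching document
```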
4. Hardware Shakeups: Huawei Ternary Compute and AI SSD, Builder GPU Choices
- Ternary Tango: Huawei Teases Third State Compute: Nous members shared a video claiming Huawei is close to shipping ternary logic compute, adding a third "dim" state, for up to 60% cost efficiency; watch: Huawei ternary logic compute (YouTube).
- The group debated feasibility and tooling implications, with some hoping non-binary hardware could democratize local AI acceleration if SDKs arrive.
- AI SSDs: Secret Sauce Saves HBM: A TechRadar piece says Huawei's AI SSD uses a performance "secret sauce" to reduce HBM requirements, hinting at compute-in-storage trends: Huawei released an AI SSD...
- Threads cross-referenced computational storage and in-situ processing, even joking about a "redneck AI" built from SD cards + FPGAs to move compute toward data.
- Builder's Dilemma: 3090 Over MI50: Local LLM tinkerers weighed RTX 3090 versus Radeon MI50 for servers, favoring the 3090's CUDA tensor cores, higher VRAM, and bandwidth; context: LocalLLaMA discussion.
- Users reported disappointing Vulkan performance with some stacks and argued older Nvidia cards (e.g., P40) only made sense at sub-$100, nudging buyers toward Ampere.
Discord: High level Discord summaries
Perplexity AI Discord
- Comet Browser Battles Bugs: Users reported glitches with Comet Browser, including prompts asking for approval and issues bypassing "sensitive information" blocks on sites like LinkedIn and Google Docs.
- A user suggested not to over prompt sites, as the agent will catch on and fix itself.
- PayPal Perks Present Perplexity Pro: Users discussed obtaining Perplexity Pro through a PayPal promotion, covering linking accounts and resolving potential issues with stacking subscriptions.
- Users found out that it is possible to create a new perplexity account to obtain another pro sub.
- Model Mania Mixes Optimal AI: Members compared AI models like Claude, Grok, Gemini, and GPT-5, pointing out the end of the free week for Hermes 4 405B and sharing use cases.
- The consensus seemed to be to stick to Reasoning Models for best overall performance with Claude good for coding, and Grok for uncensored content.
- Atlassian Absorbs Another AI Acquisition: Atlassian acquired a browser company for $610M, prompting speculation about competition driving innovation.
- Rumors suggest features from the web browser Arc may be integrated into Dia.
- Puzzling Pro Account Problem Persists: A user reported an issue with their Pro account and sought assistance, tagging a specific user for help, with screenshot.
- Another user suggested contacting [email protected] for assistance.
LMArena Discord
- LM Arena Plagued by Connectivity Issues: Users reported ongoing issues with LM Arena, including lost chat histories and intermittent downtime, with some suspecting the siteâs issues are linked to high traffic or new prompts breaking the website.
- The team is reportedly working on a fix and is aware of the issues, but some users have found temporary solutions such as switching browsers or using the canary version.
- Web Scrapers Thwarted by Akamai: A discussion on web scraping real estate sites revealed that while many sites lack CAPTCHAs, they employ advanced, less intrusive systems like Akamai and Imperva for anti-scraping, which can be difficult to bypass.
- One member said that Anything without captcha is pretty ez just make ur requests look correct to which another responded: It's pretty impossible with Akamai real estate sites, last I tried, which was about 3 years ago.
- Nano Banana Generates Inconsistent Images: Users discussed the gemini-2.5-flash-image-preview model, known as Nano Banana, for Image generation.
- While some users create videos for social media, others found the image generation inconsistent or not easily edited into other formats.
- AI Image Aspect Ratio Remains Uncontrollable: Members discussed the ability to control the aspect ratio of generated images, with the consensus that the aspect ratio is influenced by the prompt.
- It was determined the aspect ratio is automatic for now.
- Qwen3 Release Awaits: Members shared news about the Qwen3 release.
- One member said I want qwen3 1.7b 2509.
Eleuther Discord
- Mech Interp advice by Neel Nanda: A member recommends Neel Nanda's post on becoming a Mech Interp researcher.
- This was in response to another member seeking resources on research problems and how to get accepted to SPAR, MATS, or ARENA.
- Hierarchy Hurts HRM Performance: A member argues that Hierarchical Recurrent Memory (HRM) doesn't effectively use its architecture and its performance is near a vanilla baseline transformer.
- They suggest its hierarchical nature hurts rather than helps performance.
- QK-Norm Flattens LR Basin: QK-norm flattens the LR basin, potentially acting as a performance equalizer and stabilizing training, as detailed in this study.
- This could alleviate performance degradations caused by loss spikes during long horizon training, tolerating larger Learning Rates.
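As a concrete reference for the QK-norm trick mentioned above, here is a hedged PyTorch sketch: RMSNorm is applied per head to queries and keys before the attention logits are computed. It is a generic illustration (`nn.RMSNorm` needs PyTorch 2.4+; LayerNorm works as a stand-in), not the exact configuration from the linked study.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.q_norm = nn.RMSNorm(self.head_dim)  # per-head normalization of queries
        self.k_norm = nn.RMSNorm(self.head_dim)  # and keys: the "QK-norm" trick
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x):
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, s, self.n_heads, self.head_dim).transpose(1, 2) for t in (q, k, v))
        q, k = self.q_norm(q), self.k_norm(k)  # normalize before computing logits
        y = F.scaled_dot_product_attention(q, k, v)
        return self.out(y.transpose(1, 2).reshape(b, s, d))

x = torch.randn(2, 16, 64)
print(QKNormAttention(64, 4)(x).shape)  # torch.Size([2, 16, 64])
```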
- Multimodal Common Pile Gathers Momentum: Members discussed creating a multimodal version of the Common Pile, including modalities like audio and music to increase the amount of training data.
- One member expressed strong interest in audio and especially music, while being wary of speech and images for various political and ethical reasons.
- Openly Licensed Music Dataset Dream Wakes: A member offered to support and potentially bankroll the development of an openly licensed music dataset.
- The member is looking for insights on where to find such data, expressing a desire to contribute to its development.
Cursor Community Discord
- Cursorâs Sluggishness Sparks Debate: Users reported that Cursor is very slow after the latest update, especially when scrolling through files.
- Others suggested this might be due to model faults rather than Cursor itself.
- Codex Extension Craves Constant Consent: Members are wondering why the Codex Extension in Cursor keeps asking for permissions on Windows.
- One user suggested setting Agent Full access, but did not confirm whether it would solve the constant popups.
- Team Touts Token Tidiness: Users discussed token usage and costs within Cursor, with some confused about whether they had API usage or a number of requests left.
- A member clarified it's token-based, with users having a $20 API usage allowance, viewable in the dashboard.
- Annual Auto Access Acquired Acknowledged: Members discussed annual subscription benefits and the ability to retain "unlimited auto" before the plan changes on the 15th.
- One user shared that they had success emailing Cursor support to switch to yearly billing and maintain unlimited Auto mode; others noted their renewal date had changed to 2026 after upgrading.
- Conventional Commits Clarify Code Changes: A user found that using proper commit messages allowed the Cursor agent to solve a regression, recommending the Conventional Commits format.
- They also stated that having the agent write both the title and content in this format is useful for automated tools, including coding agents.
Nous Research AI Discord
- Huawei Enters Compute with Ternary Logic: Huawei is about to ship ternary logic compute tech, employing a third "dim" state, offering up to 60% cost efficiency, showcased in this YouTube video.
- This approach could democratize AI development, challenging traditional binary systems.
- Agent Wallets Deployed by ACK-Lab: A team launched a developer preview of ACK-Lab, enabling agents to possess wallets (and fiat accounts), verifiable identities, and policy-driven behavior, all built on the open-source Agent Commerce Kit (ACK), detailed at ack-lab.catenalabs.com.
- This facilitates a new level of autonomy and transactional capability for AI agents.
- Hermes 4 experiences Hallucinations: A user reported that when asked about its limitations, Hermes 4 claimed to be infinite, sparking discussion about its accuracy and potential for model hallucinations.
- Other users chimed in to ask the model the same question in order to test the original claim, and the results were mixed.
- PotatoLM Runs SOTA with Fake Attention: PotatoLM, a model designed for low-resource devices like toasters and refrigerators, is available on GitHub.
- It uses fake attention to minimize computational demands, and a provided checkpoint (less than 3M parameters) demonstrates its capability to run on minimal hardware.
- AO3 as NSFW Training Data: A member suggested that AO3 is great training data for NSFW-inclined models, as it consists of fanfic writings.
- The potential of fan-generated content as a resource for specialized AI models gains attention.
OpenRouter Discord
- Gemini 2.5 Flash Gets Throttled: Users expressed frustration over heavy usage restrictions on the Gemini 2.5 Flash Image:free model, including a limit of 5 requests per day after an initial limit of 1000 requests during the promotional free period.
- One user pointed out that OpenRouter is sharing its limit at Google with all other users, which is causing the rate limiting.
- DeepInfra's Gemini Pricing Sparks Debate: Members discussed why DeepInfra isn't an official Gemini 2.5 provider on OpenRouter, as it offers cheaper output tokens.
- It was clarified that DeepInfra does not want OR to serve it, as it's using their own GCP discounts while proxying back to Google.
- API Key Leaks Prompt Security Concerns: A user accidentally posted their OpenRouter API key in the chat, prompting immediate advice to delete it.
- Another member suggested adding an API key regex to the automod to prevent accidental key exposure, similar to measures on GitHub.
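A hedged sketch of what such an automod check could look like is below. The `sk-or-v1-` prefix and 64-hex-character body are assumptions about the key format for illustration; the real pattern should be taken from OpenRouter's documentation.

```python
import re

# Assumed key shape for illustration only; adjust the pattern to the real format.
OPENROUTER_KEY_RE = re.compile(r"sk-or-v1-[0-9a-f]{64}")

def looks_like_leaked_key(message: str) -> bool:
    """Return True if a message contains something shaped like an API key."""
    return OPENROUTER_KEY_RE.search(message) is not None

print(looks_like_leaked_key("my key is sk-or-v1-" + "ab" * 32))  # True
print(looks_like_leaked_key("just chatting about rate limits"))  # False
```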
- Prompt Caching Yields Surprising Savings: Members discussed the benefits of prompt caching and one user provided a scenario showing how caching a 200k token book content would reduce the cost of answering 100 questions from $60 to $6.
- Others noted that caching is complex, the first request won't be cached, and that caching depends on whether the content falls into the cache.
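The quoted $60-to-$6 scenario is easy to reproduce as back-of-envelope arithmetic. The prices below are illustrative assumptions (roughly $3 per million uncached input tokens and a 90% discount on cache hits), not any specific provider's rates.

```python
BOOK_TOKENS = 200_000
QUESTIONS = 100
PRICE_PER_M_INPUT = 3.00      # assumed uncached input price, $ per million tokens
CACHED_READ_DISCOUNT = 0.90   # assumed discount on cached reads

uncached = QUESTIONS * BOOK_TOKENS / 1e6 * PRICE_PER_M_INPUT
cached = uncached * (1 - CACHED_READ_DISCOUNT)  # ignoring the first, uncached request

print(f"without caching: ${uncached:.0f}, with caching: ${cached:.0f}")
# without caching: $60, with caching: $6
```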
- Deepseek Aims Agent Release to Rival OpenAI: DeepSeek is building an AI model designed to carry out multi-step actions on a personâs behalf with minimal direction, and meant to learn and improve based on its prior actions.
- Their prior R1 platform reportedly cost just several million dollars to build yet matched or surpassed OpenAI products in benchmark tests.
HuggingFace Discord
- Ollama Losing Sheen: Users express decreased enthusiasm for Ollama due to issues with GPT-OSS and other incidents, which is making people think twice about using it for anything.
- Recent debacles have caused some users to reconsider using it even for small request volumes.
- Quantization Deployment Troubles Emerge: Users discuss deployment difficulties with quantized models, particularly with hardware compatibility, with one user expressing frustration at seeing red x's indicating incompatibility with their GPT-OSS model.
- A helpful user pointed out that when you find a cool model you like, look for "quantizations" on the right hand of the screen and click on those to alleviate compatibility issues.
- Fine-Tuning SmolVLM2 For Gestures: A user inquired about fine-tuning smolvlm2 with sign language video data, pointing to this blogpost to showcase the architecture.
- The community agreed it was plausible, opening avenues for custom video model adaptation.
- LFM2 Surfaces as Vision Model Competitor: In response to questions about hallucination issues with vision models, one member suggested using a smaller and better-suited model such as Liquid Foundation Models (LFM2), which is based on Llama-3.2-11B-Vision-Instruct.
- The user recommends that you just try it out lol or dont.
- Discord Bot Vision Integration Impasse: A user expressed frustration trying to integrate a vision model into their Discord bot using Ollama's API, because some models are not public through the Ollama API.
- Another user suggested trying the model directly in the browser via a link, but acknowledged the userâs specific need for Ollama integration.
Yannick Kilcher Discord
- Kickstarter's Governance: A Crowdfunding Comedy?: A member joked that Kickstarter is the optimal form of governance, referencing a tweet about the previous Kickstarter CEO.
- Another member clarified that crowdfunding was the main point and the governance comment was a joke, soliciting further thoughts on the matter.
- Human Brains: Continual Learning Champions or Capacity Calculators?: A member argued human brains aren't capable of continual learning, but instead efficiently distribute learning over a lifetime, with effortless learning declining after the mid-20s.
- Others debated whether human learning after the mid-20s is proper learning, with one noting that incentive plays a significant role in elderly people's ability to learn new things.
- DL's Forgetting Problem: Moar Memory, Please!: A member explained that DL has a forgetting problem due to its i.i.d. sampling-based nature, requiring infinite expanding datasets and compute, while true online learning methods learn fully online with far less power.
- Another member argued that most debates are about the indefinite learn time, rather than catastrophic forgetting, pointing out that the dataset IS the memory in DL.
- Huawei's AI SSD: HBM's New Nemesis?: Huawei released an AI SSD that uses a secret sauce to reduce the need for large amounts of expensive HBM, according to a TechRadar article.
- The details of this secret sauce remain elusive, sparking curiosity about how Huawei achieved this reduction.
- EmbeddingGemma Hits the Scene: Google introduced EmbeddingGemma, a new open embedding model with 308 million parameters designed for on-device AI, delivering private, high-quality embeddings that work anywhere, detailed in a Google blog post and YouTube video.
- EmbeddingGemma aims to facilitate on-device AI processing, offering a solution for efficient and private embedding generation.
LM Studio Discord
- LM Studio Efficiency Questioned: A user with a Ryzen 5 5500, 32GB DDR4 RAM, and Radeon RX 7600 questioned LM Studioâs efficiency, noting that GPT OSS 20B and Llama3.1 8B use only 6.5GB VRAM with smooth performance.
- This contrasted with laggy results using llama.cpp vulkan.
- 70B Model Struggles to Load: A user with 12GB VRAM and 32GB RAM faced issues loading a 70B model on LM Studio.
- According to a screenshot, the system used 10GB of memory just by existing.
- Qwen-30-a3b gets props for 11GB VRAM: A user sought model recommendations for 11GB VRAM and 64GB RAM, and another user suggested Qwen-30-a3b as a "really cool" option.
- No further justification was given.
- Agent Tool Hunt Underway with CLI Support: A user is seeking an agent tool with CLI support and sub-agents that run with separate contexts.
- They noted that Opencode-ai/opencode does not support sub-agents.
- 3090 over Mi50: A user experimenting with a Mi50 and Cline is leaning towards getting a 3090 for their server due to slow prompt processing speeds.
- They linked a Reddit post and noted the upgraded tensor cores with CUDA for LLMs, as well as the higher VRAM and memory bandwidth.
GPU MODE Discord
- Expert Parallelism Imbroglio on Bandwidth: A user questioned the relationship between Expert Parallelism (EP) and network performance based on the Kimi K2 paper, wondering if lower all-to-all latency would be achieved with higher EP (fewer experts per device), leading to higher effective bandwidth.
- The core question involves how the number of experts per device impacts network performance in terms of latency and bandwidth.
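One rough way to frame that trade-off is the back-of-envelope sketch below (assumed numbers, not an answer from the Kimi K2 paper): as the EP degree grows, each device hosts fewer experts, so dispatched activations spread over more peers and per-peer messages shrink, which amortizes fixed per-message latency worse even though the total bytes leaving each device stay roughly constant.

```python
# Illustrative only: how per-peer dispatch message size shrinks as EP degree grows.
TOKENS_PER_DEVICE = 8192
HIDDEN_BYTES = 7168 * 2   # assumed hidden size in bf16, illustrative
TOP_K = 8                 # assumed experts activated per token, illustrative

for ep_degree in (8, 16, 32, 64):
    total_bytes = TOKENS_PER_DEVICE * TOP_K * HIDDEN_BYTES  # activations dispatched per step
    per_peer = total_bytes / ep_degree                      # roughly uniform routing assumed
    print(f"EP={ep_degree:3d}  per-peer message ~ {per_peer / 1e6:6.1f} MB")
```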
- All2All Achieves Microsecond Milestone: Submissions flooded the `amd-all2all` leaderboard, showing various performance timings on MI300x8, with one user grabbing first place at 345 µs.
- Close behind, another submission reached second place at 364 µs, and many achieved third place with times around 1600-1900 µs.
- Torch Compile Needs Padding: Torch.compile with `reduce-overhead` is crucial for both inference and training to mitigate kernel launch and activation quantization overheads, particularly for mxfp4/nvfp4, but when training with variable sequence lengths, padding to predefined lengths (e.g., [64, 96, 128, ..., 4096]) avoids frequent recompilations.
- Cuda graphs provide the majority of speed-up by reducing kernel launch overhead, suggesting a focus on simpler solutions like cuda graphs over theoretical kernel fusion.
- MXFP8 Triton Dotproduct Detonated: Support for MXFP8 dot product via `tl.dot_scaled` in Triton for sm_120 (5090) was added but later reverted, pending investigation (github.com/triton-lang/triton/pull/8029), with the suggestion to use `torch._scaled_mm()` as an alternative.
- A member mentioned "I am not sure" why it was reverted.
- Modal GPU Glossary Goes Gold: The Modal GPU Glossary is now available at modal.com/gpu-glossary/perf, aiming to improve general understanding of GPU performance and features.
- Gratitude was expressed to reviewers for their contributions.
DSPy Discord
- HallBayes to the Rescue?: Users discussed whether DSPy will mitigate hallucinations via fancy math budgeting with HallBayes GitHub repository.
- The community is looking at potentially integrating techniques like those in the HallBayes repository to enhance DSPyâs reliability.
- DSPy: The Next Paradigm Shift?: A member views DSPy as a potential significant shift, requiring a critical mass for success, similar to network effects in Deep Learning, PyTorch, Linux, Rails, and the Python numerical computing community, as shown in this post.
- The member believes it is potentially the most significant paradigm shift since early LLMs.
- GEPA Optimizer Data Split Divulged: Itâs recommended to use all data for the GEPA optimizer, by creating a small validation set matching the final task distribution, with the rest for training.
- This runs contrary to the 20-80% split that the user had initially (and incorrectly) assumed.
- Hunting High and Low for MIPROv2: A member is looking for a simple, self-contained notebook example with MIPROv2 with no outside library dependencies.
- Another member pointed to an eval CSV used in a tutorial that used llama_3_3_trainset.csv available here.
- Tweaking Prompts for Profit: A member tried to tweak the prompt to force the optimizer to find the correct answer without a lot of training data, essentially forcing an overfit, and sought guidance.
- It was suggested to increase the amount of training data to encourage the overfit.
Moonshot AI (Kimi K-2) Discord
- Userâs Account Vanishes, Seeking Rescue: A user reported their Twitter account suspension for no reason, requesting a Kimi AI team member to investigate via inbox.
- The user seemed to imply they were wrongfully suspended and seeking to restore their account.
- Feature Frenzy: Kimi Users Want More!: Users requested a $5 plan tailored for productivity and students, suggesting features like PPTX slides, flashcard maker, and auto summary.
- A team member acknowledged these requests, especially with the back-to-school season, but noted scheduling constraints.
- Slideshow Sorcery: Kimi Now a PPTX Powerhouse: Kimi now supports creation of PPTX slides, as showcased in this tweet.
- This feature enhances Kimiâs utility for presentations and educational content.
- Moonshot AI Navigates PRC Rumors: A user questioned potential affiliations between Kimi K2, Moonshot AI, and the CCP.
- A team member clarified that the company is private, not state-owned, and committed to protecting user data: We're a private company, not a state-owned enterprise. We won't infringe on any user privacy data.
- Temperature Tweaks for Kimi K2âs Sweet Spot: A user sought advice on optimal temperature settings for Kimi K2, specifically for coding and creative writing.
- Another user suggested 0.6 for writing, 0.2 for coding, and 0.3 for factual tasks, citing RLHF tuned sweet spots.
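As a concrete way to apply those suggestions, here is a hedged sketch using an OpenAI-compatible client; the base URL and model identifier are assumptions for illustration and should be checked against the Moonshot/Kimi documentation.

```python
from openai import OpenAI

# base_url and model id below are assumed placeholders; verify against official docs.
client = OpenAI(base_url="https://api.moonshot.ai/v1", api_key="YOUR_KEY")

TEMPS = {"creative_writing": 0.6, "coding": 0.2, "factual": 0.3}  # values from the thread

def ask(prompt: str, task: str) -> str:
    resp = client.chat.completions.create(
        model="kimi-k2-0905-preview",       # assumed model id
        temperature=TEMPS[task],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("Write a two-line poem about GPUs.", "creative_writing"))
```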
OpenAI Discord
- Chinese AI Teams Outpace with Older Hardware: Members observed that Chinese AI teams are achieving competitive performance with models like Qwen, despite using slightly older chips, with one member using Qwen to fine-tune Sakura for Japanese to Chinese translation.
- The fine-tuned Sakura model is dedicated to translating Japanese to Chinese with an "Anime" style.
- GPT-5 Sparks Speculation on Token ID Shifts: A member inquired about potential changes to Token IDs in GPT-5, and suggested revisiting custom settings in light of the possible update.
- A member noted that being adaptive always has its benefits!
- Agent or Workflow: Adaptability is Key: A member argued that AI agents offer dynamic and adaptive decision-making capabilities that go beyond the scope of rigid workflows.
- Another user analogized agents to cars (adaptive) and workflows to trains (predefined), emphasizing the greater flexibility of agents, while admitting that today's agents are utter trash and will be for a long time.
- AI Safety Implements Gentle Nudges: A member posited that AI might be implementing soft control by subtly influencing decisions and thought patterns, as opposed to employing hard control methods.
- Another used the analogy of convincing a monkey not to touch a gun, rather than just taking it away, to illustrate this soft control concept.
- Budget-Friendly AI: Free Tiers Thrive: Members recommended leveraging ChatGPT's free tier, Google AI Studio's free tier, and Grok's free tier as cost-effective AI options.
- One member humorously questioned the necessity of paid plans, given the robust capabilities available in the free tiers.
Modular (Mojo 🔥) Discord
- Networking Stirs Standard Library Stew: Debate flared about the inclusion of networking libraries in `stdlib`, with agreement that servers should be externalized, but questions arose about sending AI inference results over networks.
- One member argued HTTP should stay clear of AI clusters for low latency inference, deeming it not a good protocol for a lot of the things we use it for.
- DPDK melds into Mojoâs core: A member is developing an automatic c binding tool, experimenting with DPDK and Mujoco (dpdk_mojo).
- Another member, previously a DPDK maintainer, highlighted API disparities complicating the bridging of DPDK and common IO APIs, referencing their IO Engines proposal.
- Lightbug's Async Awaits Activation: A member posited that a lack of async capability is hindering lightbug's potential, inquiring about the current state of integration.
- Another added that it's also missing the networking APIs (which many people think need to be retired), lacks zero-copy parsing, and that HTTP is actually hard to do at speed.
- Shape Recompilation Sparks Scrutiny: A user sought advice on preventing recompilation when the shape changes slightly, such as a sequence dimension growing, and noted a new graph declared each time without caching.
- The inquiry touched on the future of dynamic tensors, asking if there are plans to allow more dynamism with the new tensor or if static shapes should always be assumed during compilation.
Manus.im Discord Discord
- Scheduled Tasks Hit Upgrade Snag: After a recent upgrade, a member reported errors with two scheduled tasks: one failed to trigger, and the other failed to output expected results.
- The member suggested the upgrade may be the source of the issues with the scheduled tasks.
- Support Tickets Stuck in Read-Only Limbo: A member requested an update on ticket 1335, but noted that they can no longer comment on the ticket since itâs read-only.
- Another member inquired about the status of their issue on ticket 1337.
tinygrad (George Hotz) Discord
- Tinybox Prices Plummet!: New, lower prices for tinybox have been announced: $10k for red, $25k for green v2.
- The announcement urges potential buyers to act fast, as these prices might not last.
- Urgency for Tinybox Limited-Time Pricing: The announcement highlights significant price reductions for tinybox, making it a timely opportunity for acquisition.
- Specifically, the red version is now available for $10,000, while the green v2 is priced at $25,000.
- Community Quests Updated Hashcat Benchmarks: A member is looking for recent hashcat benchmarks, noting that the most recent ones they've found are two years old.
- The user's search for updated hashcat benchmark data has been hampered by the age of available references.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The Windsurf Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
Discord: Detailed by-Channel summaries and links
Perplexity AI ▷ #general (1200 messages🔥🔥🔥):
Comet Browser, Perplexity AI Pro, Model Selection, User Support
- Comet Browser Woes and Glitches: Users discuss various issues with Comet Browser, including prompts asking for approval before sending messages, and being unable to bypass "sensitive information" blocks on sites like LinkedIn and Google Docs.
- A user suggested that they may be able to overtake their social media but warned against over prompting the site, as the agent will catch on and fix itself.
- PayPal Perks Provide Perplexity Pro: Users discuss obtaining Perplexity Pro via a PayPal promotion, including linking accounts and potential issues with stacking subscriptions.
- It was revealed that one can use a new perplexity account to obtain a new pro sub if one has already had the sub in the past.
- Model Mixology and the Quest for Optimal AI: Members are comparing the performance of various AI models, such as Claude, Grok, Gemini, and GPT-5, with some pointing out that the free week for Hermes 4 405B is over and discussing their use cases.
- A user noted that Claude is good for coding, Grok for uncensored stuff, and the consensus seemed to be to stick to Reasoning Models for best overall performance.
- Navigating Nether Regions No More, New Navigators are Noteworthy: Users are seeking assistance with issues like accessing Comet and obtaining the Pro role on the Discord server.
- Members provided links to the announcement channel and the channel with instructions on how to get the Pro role, while stressing it must be done on the web version of Perplexity.
- Atlassian Acquires All-Star AI Alchemists: Users discuss Atlassianâs acquisition of a browser company for $610M, with some speculating that competition drives innovation.
- There are rumors that this may mean some of the features in the web browser Arc are now being migrated into Dia.
Perplexity AI ▷ #sharing (7 messages):
Shareable Threads on Perplexity AI, Perplexity AI Browser Claims
- Shareable Threads for Perplexity AI: Perplexity AI requested a user to ensure their thread is Shareable, referencing a Discord link for guidance.
- This request was made twice to the same user in the channel.
- Perplexity AI Browser Claims Hit the Scene!: Users shared several Perplexity AI browser claim links, including one on LinkedIn and three direct claims: ASN1689ZY7, LI57U7K30F, and RURTWLP0WS, as well as SNJO74ZG4R.
- The shared links suggest users are actively participating in Perplexity AIâs browser-related activities and sharing their experiences or findings.
Perplexity AI ▷ #pplx-api (3 messages):
Pro Account Issue, New Endpoint, Contact Support
- User Faces Issue with Pro Account: A user with a Pro account reported facing an issue and requested assistance, tagging a specific user for help, with screenshot: Screenshot_2025-09-04.
- Contact Support: Another user suggested that the user with the pro account issue contact [email protected] for assistance.
- New Endpoint Discussion: A user inquired if anyone had tried out the new endpoint.
LMArena ▷ #general (586 messages🔥🔥🔥):
LM Arena Outages, Web Scraping, LM Arena Models, Qwen3, Image generation Aspect Ratio
- LM Arena has a Case of the Mondays: Multiple users reported ongoing issues with LM Arena, including lost chat histories, difficulty connecting, and the site being down intermittently, with some suspecting the siteâs issues are linked to high traffic or new prompts breaking the website.
- The team is reportedly working on a fix and is aware of the issues, but some users have found temporary solutions such as switching browsers or using the canary version.
- Akamai Defenses Block Web Scrapers: A discussion on web scraping real estate sites revealed that while many sites lack CAPTCHAs, they employ advanced, less intrusive systems like Akamai and Imperva for anti-scraping, which can be difficult to bypass.
- One member said that Anything without captcha is pretty ez just make ur requests look correct to which another responded: It's pretty impossible with Akamai real estate sites, last I tried, which was about 3 years ago.
- Gemini-2.5-flash-image-preview: Users discussed the gemini-2.5-flash-image-preview model, known as Nano Banana, for image generation.
- While some users create videos for social media, others found the image generation inconsistent or not easily edited into other formats.
- AI Imageâs Aspect Ratio: Members discussed the ability to control the aspect ratio of generated images, with the consensus that the aspect ratio is influenced by the prompt.
- It was determined the aspect ratio is automatic for now.
- Qwen Awaits: Members shared news about the Qwen3 release.
- One member said I want qwen3 1.7b 2509.
Eleuther ▷ #general (189 messages🔥🔥):
Typing Protocol vs Mixin Classes, Mech Interp Research, Hierarchical Nature of HRM, OOD Iteration Extrapolation, Error Correction in UTs
- HF considers Typing Protocol: A member asks why Hugging Face doesn't use `typing.Protocol` instead of ad-hoc mixin classes.
- No answer was given.
- Neel Nanda's Mech Interp advice: A member recommends Neel Nanda's post on becoming a Mech Interp researcher to another member.
- They were looking for resources on what a research problem is and how to increase their chances of being accepted to SPAR, MATS, or ARENA.
- HRM's Hierarchy Hurts Performance: A member argues that Hierarchical Recurrent Memory (HRM) just doesn't effectively use its convoluted architecture and its performance is near a vanilla baseline transformer; more likely, its hierarchical nature hurts rather than helps.
- Another member responded with an image showcasing otherwise.
- OOD Iteration Extrapolation Debate: Members debated the possibility of OOD iteration extrapolation, with one member arguing itâs not trivial and performance degrades after a handful of iterations, even with tricks and interventions.
- A graph was shared visualizing this, testing against next 15 iterations OOD and then takes the last iteration with the best score before it falls.
- Error Correction via Lyapunov Landscapes: A member suggests using angular perturbation of an input token and minimizing the KL divergence to induce error correction capabilities and flatten out the spectra of the Lyapunov exponents.
- Another member described a different approach involving finding the perturbation to the latent that corrupts the decoded output off by whatever number of bits, and then re-feeding this perturbation back to the network.
Eleuther ▷ #research (50 messages🔥):
Entropy rate of natural languages, Continual Learning, QK-Norm Optimizer, Curriculum Learning, mup implementations
- Entropy Rate of Languages Probed by Bentz: A member watched a talk by Christian Bentz on the entropy rate of natural languages, who's been doing the same idea as Shannon's original paper, but for multiple languages, and on humans vs language models, mentioning the paper and book for COMPILA 2025.
- Continual Learning Considered Philosophical Problem: RL is mostly an engineering problem, whereas continual learning is more of a philosophical problem of what do we even want the model to be able to realistically do.
- The discussion highlights that current incentives favor large-scale multitask training over continual learning, with potential shifts as edge inference gains traction.
- Curriculum Learning and Continual Learning Differentiated: Curriculum learning involves a deliberate distribution shift to extract learning signal, while in continual learning, distribution shift is often undesirable, presenting challenges such as catastrophic forgetting.
- One member suggested that controlling the nature of distribution shift in continual learning could create a dual of pre-training curriculum learning.
- QK-Norm Flattens LR Basin: QK-norm flattens the LR basin, potentially acting as a performance equalizer and stabilizing training, as detailed in this study.
- This could alleviate performance degradations caused by loss spikes during long horizon training, as it tolerates larger Learning Rates.
- MuP Implementations Differ: MuP implementations differ in the form of per layer LR scaling to achieve correct update behavior, according to this paper.
- It was suggested that controlling update size via per layer LR scalings is a common implementation strategy, though this point was open to discussion.
Eleuther ▷ #multimodal-general (5 messages):
Multimodal Common Pile, Audio/Music Datasets, Ethical concerns with Speech and Images, Openly Licensed Music Dataset
- Multimodal Common Pile Momentum Builds: Members discussed creating a multimodal version of the Common Pile, including modalities like audio and music to increase the amount of training data.
- One member expressed strong interest in audio and especially music, while being wary of speech and images for various political and ethical reasons.
- Openly Licensed Music Dataset Dream Wakes Up: A member offered to support and potentially bankroll the development of an openly licensed music dataset.
- The member is looking for insights on where to find such data, expressing a desire to contribute to its development.
Cursor Community ▷ #general (196 messages🔥🔥):
GPT-5 vs Claude 4, Cursor Slow Performance, VSCode extension for Cursor, Subagents in Cursor, Token Usage and Cost
- Cursorâs sluggishness sparks debate: Users reported that Cursor is very slow after the latest update, especially when scrolling through files.
- Others suggested this might be due to model faults rather than Cursor itself.
- Codex extension craves constant consent: Members are wondering why the Codex Extension in Cursor keeps asking for permissions on Windows.
- One user suggested setting Agent Full access, but did not confirm whether it would solve the constant popups.
- Team touts Token Tidiness: Users discussed token usage and costs within Cursor, with some confused about whether they had API usage or a number of requests left.
- A member clarified it's token-based, with users having a $20 API usage allowance, viewable in the dashboard.
- Annual Auto Access Acquired Acknowledged: Members discussed annual subscription benefits and the ability to retain "unlimited auto" before the plan changes on the 15th.
- One user shared that they had success emailing Cursor support to switch to yearly billing and maintain unlimited Auto mode; others noted their renewal date had changed to 2026 after upgrading.
- Conventional Commits Clarify Code Changes: A user found that using proper commit messages allowed the Cursor agent to solve a regression, recommending the Conventional Commits format.
- They also stated that having the agent write both the title and content in this format is useful for automated tools, including coding agents.
Nous Research AI ▷ #general (114 messages🔥🔥):
N8N, AO3, Huawei ternary logic compute, ack-lab, Photonic chips
- n8n is clunky workflow automation: A member found n8n too clunky for personal use compared to building something simpler, and suggested using Claude to create a reactflow app or using Zapier for personal assistant automation.
- Fanfic models trained on AO3: A member suggested that AO3 is great training data for NSFW-inclined models.
- Another member confirmed it consists of fanfic writings.
- Huawei's Ternary Logic Leaps into Compute: Huawei is near shipping ternary logic compute tech, using a third "dim" state besides 0 and 1, for up to 60% cost efficiency, potentially democratizing AI development, showcased in this YouTube video.
- ACK-Lab gives Agent Wallets: A team shipped a developer preview of ACK-Lab, a solution that lets agents have wallets (and fiat accounts), verifiable identities, and policies to control their behavior, based on open-source Agent Commerce Kit (ACK), with details at ack-lab.catenalabs.com.
- Claude Sonnet Lobotomized by Anthropic: Members noticed that Claude Sonnet 4 felt lobotomized for creative writing, giving off GPT4o vibes, after Anthropic changed something.
- One member also felt it's sycophantic lately, and mentioned there are a lot of Reddit posts about similar concerns, too.
Nous Research AI ▷ #ask-about-llms (1 messages):
Hermes 4 Limitations, Model Hallucinations
- Hermes 4 Claims Infinity, Sparks Debate: A user reported that when asked about its limitations, Hermes 4 claimed to be infinite, sparking discussion about its accuracy and potential for model hallucinations.
- The response raised questions about whether this is normal behavior for the model, and how users should interpret such claims.
- More Users Testing Hermes: More users chimed in to ask the model the same question in order to test the original claim.
- The results were mixed, as some other users reported Hermes 4 gave a different answer.
Nous Research AI ▷ #research-papers (3 messages):
Fine-tuning Auto-Regressive Models, BOS Token Usage in LLMs, MCQ Classifier Training
- Debate on Fine-Tuning GPT-Style Models Arises: A member inquired about the standard methods for fine-tuning auto-regressive models (GPT style), contrasting it with the [BOS] representation approach used in encoder-style models like BERT and RoBERTa.
- They specifically asked if the approach mirrors instruction tuning of current base LLMs.
- Modern LLMs Embrace the BOS Token: A member confirmed that modern LLMs do indeed use BOS tokens.
- This clarifies the ongoing discussion regarding the methodologies employed in contemporary language models.
- MCQ Classifier Training Clarification Requested: A member sought clarification on training a multiple-choice question (MCQ) classifier, inquiring whether to extract the last hidden layer vector of the [BOS] token.
- The proposal involves attaching a classification head for training the classifier on the vector.
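A hedged sketch of that recipe is below: pull a single hidden-state vector from the backbone and train a small classification head over the answer choices. For causal (GPT-style) models the last token is the usual pooling choice rather than [BOS]; the backbone name and pooling here are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

name = "gpt2"  # stand-in backbone for illustration
tok = AutoTokenizer.from_pretrained(name)
backbone = AutoModel.from_pretrained(name)
head = nn.Linear(backbone.config.hidden_size, 4)  # 4 answer choices

enc = tok("Q: ...? A) ... B) ... C) ... D) ...", return_tensors="pt")
hidden = backbone(**enc).last_hidden_state  # (1, seq, hidden)
pooled = hidden[:, -1, :]                   # last-token vector for causal LMs
logits = head(pooled)                       # train with cross-entropy over the choices
print(logits.shape)  # torch.Size([1, 4])
```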
Nous Research AI ▷ #interesting-links (2 messages):
PotatoLM, FineVision
- FineVision's Altitude Questioned: A member shared a link to HuggingFace FineVision space and asked how low can you go.
- This is in reference to the amount of compute required to run useful AI models.
- PotatoLM rolls out with SOTA potato performance: A member introduced PotatoLM, a model designed for low-resource devices like toasters and refrigerators, available on GitHub.
- It uses fake attention to minimize computational demands, and a provided checkpoint (less than 3M parameters) demonstrates its capability to run on minimal hardware.
Nous Research AI ▷ #research-papers (3 messages):
Fine-tuning auto-regressive models, BOS token usage in LLMs, MCQ classifier training
- Fine-Tuning GPTs: A member inquired about the standard method for fine-tuning auto-regressive models (GPT style), drawing a parallel to the use of the BOS representation in encoder-style models like BERT and RoBERTa.
- BOS Tokens Still in Use?: A member clarified whether BOS tokens are still used in modern LLMs, and another member confirmed that they are indeed used.
- Training MCQ Classifiers: A member asked if one should take the BOS tokenâs last hidden layer vector, attach a classification head, and train the classifier to train an MCQ classifier.
OpenRouter ▷ #announcements (1 messages):
toven: The promotional free period for Gemini 2.5 Flash Image has now ended.
OpenRouter ▷ #general (108 messages🔥🔥):
Gemini 2.5 Flash Image Restrictions, DeepInfra's Gemini 2.5 Pricing, OpenRouter API Key Exposure, Kimi K2 Model, Prompt Caching Benefits
- Gemini 2.5 Flash gets throttled: Users expressed frustration over heavy usage restrictions on the Gemini 2.5 Flash Image:free model, including a limit of 5 requests per day after an initial limit of 1000 requests.
- One user pointed out that OpenRouter is sharing its limit at Google with all other users, which is causing the rate limiting.
- DeepInfra discounts for Gemini cause conflict: Members discussed why DeepInfra isn't an official Gemini 2.5 provider on OpenRouter, as it offers cheaper output tokens.
- It was clarified that DeepInfra does not want OR to serve it, as it's using their own GCP discounts while proxying back to Google.
- API Key Leaks and Automod Concerns: A user accidentally posted their OpenRouter API key in the chat, prompting immediate advice to delete it.
- Another member suggested adding an API key regex to the automod to prevent accidental key exposure, similar to measures on GitHub.
- Prompt Caching yields savings: Members discussed the benefits of prompt caching and one user provided a scenario showing how caching a 200k token book content would reduce the cost of answering 100 questions from $60 to $6.
- Others noted that caching is complex, the first request won't be cached, and that caching depends on whether the content falls into the cache.
- Amazon Bedrock had a security issue: Users reported that Amazon Bedrock provider was unavailable for hours.
- The OR team confirmed that the downtime was due to a security issue and that it was resolved.
OpenRouter ▷ #discussion (4 messages):
Deepseek AI Agent, R2 never
- Deepseek Aims Agent Release to Rival OpenAI: DeepSeek is building an AI model designed to carry out multi-step actions on a personâs behalf with minimal direction, and meant to learn and improve based on its prior actions.
- Their prior R1 platform reportedly cost just several million dollars to build yet matched or surpassed OpenAI products in benchmark tests.
- R2 Nowhere to Be Found: A member commented, man we never getting R2.
HuggingFace ▷ #general (105 messages🔥🔥):
Ollama debacles, Quantized Model Deployment, Fine-tuning Vision Models, Liquid Foundation Models (LFM2), Discord bot vision integration
- Ollama cools off, raising concerns!: Some users expressed decreased enthusiasm for Ollama, citing recent issues with GPT-OSS and other incidents.
- One user noted they used to find it fine for small request volumes, but recent debacles have them thinking twice about using it for anything.
- Quantization Frustrations Hit Deployment!: Users discussed difficulties in deploying quantized models, particularly regarding hardware compatibility, with one user expressing frustration at seeing red x's indicating incompatibility with their GPT-OSS model, but others showed how to use one-click deploys.
- One user pointed out that when you find a cool model you like, look for "quantizations" on the right hand of the screen and click on those.
- Fine-Tuning SmolVLM2 for Sign Language: A user inquired about fine-tuning smolvlm2 with sign language video data, questioning its feasibility given the modelâs design, pointing to this blogpost.
- The community agreed it was plausible.
- LFM2 as Vision Model Alternative!: In response to questions about hallucination issues with vision models, one member suggested using a smaller and better-suited model such as Liquid Foundation Models (LFM2), which is based on Llama-3.2-11B-Vision-Instruct.
- The user stated that it is better, just try it out lol or dont.
- Discord Bot Vision Integration Impasse: A user expressed frustration trying to integrate a vision model into their Discord bot using Ollama's API, because some models are not public through the Ollama API.
- Another user suggested trying the model directly in the browser via a link, but acknowledged the userâs specific need for Ollama integration.
HuggingFace ▷ #i-made-this (1 messages):
tonic_1: https://huggingface.co/posts/Tonic/941120780247130
HuggingFace ▷ #agents-course (1 messages):
marc_28459: Beginning the agents course today! Hello from Philadelphia everyone!
Yannick Kilcher ▷ #general (90 messages🔥🔥):
Kickstarter governance, Continual learning, True Online Learning, Adaptive Resonance Theory (ART), i.i.d. sampling vs online learning
- Kickstarter CEO's Crowdfunding Joke: A member joked about Kickstarter being the optimal form of governance, referencing a tweet and highlighting their experience with the previous Kickstarter CEO.
- Another member clarified that crowdfunding was the main point and the governance comment was a joke, soliciting further thoughts.
- Human Brains' Learning Capacity: Sponge or Stone?: A member argued human brains aren't capable of continual learning, suggesting they efficiently distribute learning over a lifetime, with effortless learning declining after the mid-20s.
- Others debated whether human learning after the mid-20s is proper learning, with one noting that incentive plays a significant role in elderly people's ability to learn new things.
- DL's Forgetting Problem needs more Memory: A member explained that DL has a forgetting problem due to its i.i.d. sampling-based nature, which requires infinite expanding datasets and compute, while true online learning methods learn fully online with far less power.
- Another member argued that most debates are about the indefinite learn time, rather than catastrophic forgetting, pointing out that the dataset IS the memory in DL.
- True Online Learning: No pretraining allowed: A member defined "True Online Learning" as learning one sample at a time, in-order (streaming), without revisiting, in real-time, referencing discussions on the Continual AI forum.
- They suggested that Adaptive Resonance Theory (ART) based models can achieve this by keeping capacity left over for new samples via a user-defined vigilance parameter.
- Sparse Coding and ART Save the World: A member noted that ART can be seen as a non-forgetful autoencoder, using a special activation function and one-way hebbian learning, useful for preventing dead units and avoiding the need for huge context windows in LLMs.
- Another member pointed out that ART is more of a method or component and is working on robotics and LLMs, highlighting that training on prompts and recalling with self-prompting saves tons of compute.
Yannick Kilcher ▷ #paper-discussion (2 messages):
Unitary Transforms, SVD Matrix Decomposition
- Unitary Transforms Don't Change Eigenvalues: A member questioned whether dynamically changing eigenvalues could solve a problem, given that unitary transforms leave them unchanged.
- They explored using Singular Value Decomposition (SVD) to decompose a matrix, pondering if making the diagonal matrix state-dependent would be enough.
- SVD for Dynamic Matrix Manipulation?: The discussion focused on using SVD to decompose any matrix into two unitary matrices and one diagonal matrix.
- Questions arose whether only the diagonal matrix needed to depend on state or the entire decomposed structure for dynamic control.
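A hedged toy sketch of the idea is below: decompose a fixed matrix with SVD, then let only the diagonal (the singular values) depend on the current state while the unitary factors stay frozen. It is purely illustrative, not a claim about any specific architecture from the discussion.

```python
import torch

torch.manual_seed(0)
W = torch.randn(8, 8)
U, s, Vh = torch.linalg.svd(W)  # W = U @ diag(s) @ Vh

state_mlp = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Softplus())  # positive scales

def dynamic_matrix(state: torch.Tensor) -> torch.Tensor:
    scales = state_mlp(state)               # state-dependent singular values
    return U @ torch.diag(scales * s) @ Vh  # unitary factors remain fixed

x = torch.randn(8)
print(dynamic_matrix(x).shape)  # torch.Size([8, 8])
```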
Yannick Kilcher ▷ #ml-news (9 messages🔥):
Huawei AI SSD, Computational Storage, EmbeddingGemma, SD card FPGA redneck AI
- Huawei's Secret Sauce SSD Saves HBM: Huawei released an AI SSD that uses a secret sauce to reduce the need for large amounts of expensive HBM, according to a TechRadar article.
- Computational Storage Craze Creates Compute Proximity: Members discussed the idea of putting compute with storage, referencing articles on in-memory processing, computational storage devices, and in-situ processing.
- One proposed building a redneck version using a bunch of SD cards and FPGAs, with each FPGA having its own copy of the model on an SD card, processing some neurons of a specific layer.
- EmbeddingGemma: Google's Gem for On-Device Embeddings: Google introduced EmbeddingGemma, a new open embedding model with 308 million parameters designed for on-device AI, delivering private, high-quality embeddings that work anywhere, detailed in a Google blog post and YouTube video.
LM Studio ▷ #general (46 messages🔥):
LM Studio efficiency, 70B model loading issues, Qwen-30-a3b recommendation, Agent tool with sub-agent support, Comet browser review
- LM Studio's Efficiency Questioned: A user with a Ryzen 5 5500, 32GB DDR4 RAM, and Radeon RX 7600 inquired about LM Studio's efficiency, noting that GPT OSS 20B and Llama3.1 8B use only 6.5GB VRAM with smooth performance, contrasting with laggy results using llama.cpp vulkan.
- 70B Model Struggles on Limited VRAM: A user with 12GB VRAM and 32GB RAM faced issues loading a 70B model, with the system using 10GB of memory just by existing, according to a screenshot.
- Qwen-30-a3b Model recommended for 11GB VRAM: A user sought model recommendations for 11GB VRAM and 64GB RAM, and another user suggested Qwen-30-a3b as a "really cool" option.
- Agent Tool Hunt Underway: A user is seeking an agent tool with CLI support and sub-agents that run with separate contexts, but noted that Opencode-ai/opencode does not support sub-agents.
- Comet Browser Faces Scrutiny: A user expressed interest in the Comet browser, which uses on-device AI LLMs, but remained unconvinced, also sharing a YouTube video cautioning against blindly trusting AI chatbots.
LM Studio ▷ #hardware-discussion (44 messages🔥):
Mi50 vs 3090, 3090 vs 7900 XTX, GPT-OSS Performance, Old Nvidia Cards
- Mi50 vs 3090 for Server: A user is experimenting with a Mi50 and Cline, but is leaning towards getting a 3090 for their server due to painful prompt processing speeds.
- They linked a Reddit post and noted the upgraded tensor cores with CUDA for LLMs, as well as the higher VRAM and memory bandwidth, should make the 3090 a better option.
- 3090 or 7900 XTX: Size Matters: The user says the choice between a 3090 and 7900 XTX comes down to size constraints; if they didn't want to mix drivers, the 7900 XTX would be best for their APU server, and the 3090 for their Dell.
- They mentioned a YouTube video about a testing unit with only 8 GB of VRAM.
- GPT-OSS on GPU: Disappointing: A user finds 15tps with gpt-oss to be disappointing and hopes it is a software issue that can be fixed.
- Another user agreed that the number was not impressive, only twice as fast as what they already have and guessed it's because of using Vulkan not CUDA.
- Tesla M10, K80, or P40 cards: A user asks if anyone has experience with rigs of multiple old nvidia cards like models Tesla M10, K80 or P40, and if LMStudio works decently with such setups.
- One user stated P40s were worth it when you could get them for sub-$100. The older M10s/K80s don't really work well with llama.cpp.
GPU MODE ▷ #general (1 messages):
Expert Parallelism, Kimi K2 Paper, All-to-all latency, Bandwidth Optimization
- Expert Parallelism Puzzlement: A member questioned their understanding of Expert Parallelism (EP) based on a snippet from the Kimi K2 paper.
- They thought that lower all-to-all latency would be achieved with higher EP (fewer experts per device), leading to higher effective bandwidth.
- Bandwidth Implications of Expert Parallelism: The discussion revolves around whether a higher degree of expert parallelism, implying fewer experts per device, leads to higher effective bandwidth and reduced all-to-all latency.
- The core question is the relationship between the number of experts per device and the resulting network performance in terms of latency and bandwidth.
GPU MODE ▷ #triton (1 messages):
Meetup Video, Whitney Tsang, Triton Channel
- GPU MODE Meetup Video Now Available: The video from yesterday's meetup is now available on YouTube.
- Thanks to Whitney Tsang for sharing the link.
- GPU Triton Channel Update: The Triton channel is being updated with new information.
- Members are encouraged to check the channel for the latest news and updates.
GPU MODE ▷ #cuda (5 messages):
Shared Memory Addressing, fp4 and fp8 packing, Modal GPU Glossary
- Shared Memory: Sub-32b Granularity OK!: Addressing shared memory at sub-32b granularity is generally possible, but less efficient due to leaving bandwidth unused, suggesting using the built-in vector types is preferable.
- Operating on packed sub-32b values requires extraction, but types like `__half2` and SIMD intrinsics can avoid unpacking instructions; CUDA Math API details.
- Modal GPU Glossary Goes Gold: The Modal GPU Glossary is now available, with thanks to reviewers <@325883680419610631>, <@268205958637944832>, and <@679043860638466048>; see it here: modal.com/gpu-glossary/perf.
- The glossary aims to improve general understanding of GPU performance and features.
- FP4 and FP8 Packing Efficiency Eyed: A member expressed interest in examining the efficiency of FP4 and FP8 packing in the future.
- No further details were shared.
GPU MODE ▷ #jobs (1 messages):
Alinia, ML Engineer
- Alinia hires ML Engineer: A Responsible AI company called Alinia is looking for a strong ML Engineer to build up their infra and deploy their low-latency models, according to this LinkedIn post.
GPU MODE ▷ #beginner (5 messages):
Resume feedback for RTL/digital logic design roles
- Junior Engineer Seeks RTL Resume Review: A college junior studying EE and CS is seeking feedback on their resume to pivot from SWE to RTL/digital logic design.
- The member provided an image of their resume but was directed to more appropriate forums for resume reviews, such as dedicated online communities.
- Alternative forums for resume reviews suggested: The user was advised that this Discord channel was not optimal for resume advice.
- Instead, the user was encouraged to solicit resume feedback from other online forums better tailored to their request.
GPU MODE ▷ #torchao (1 messages):
torchao v0.13.0, QAT improvements, NVFP4 and FP8 QAT, MXFP8 pretraining speedups, axolotl integration
- Torchao v0.13.0 Released: QAT Improvements & More!: The torchao v0.13.0 release introduces various improvements, including QAT enhancements, faster MXFP8 pretraining, and more.
- Key highlights include a simpler multi-step QAT API, prototype NVFP4 and FP8 QAT, 1.2x MXFP8 dense pretraining speedups with torchtitan, and torchao float8 training integrated into axolotl.
- TorchAO Integrates Float8 Training into Axolotl: The latest TorchAO release now supports float8 training integrated directly into Axolotl.
- This integration streamlines workflows and potentially enhances the efficiency of training processes using float8 precision within the Axolotl framework.
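For anyone who wants to try the float8 path outside of axolotl, below is a minimal training-step sketch; it assumes a recent torchao release that exposes convert_to_float8_training and an FP8-capable GPU (e.g. an H100), and the toy model, shapes, and loss are placeholders rather than a recommended recipe.

```python
# Minimal torchao float8 training sketch (toy model, shapes, and loss are placeholders).
# Assumes a recent torchao build exposing torchao.float8 and an FP8-capable GPU such as an H100.
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

model = nn.Sequential(
    nn.Linear(4096, 11008, bias=False),
    nn.SiLU(),
    nn.Linear(11008, 4096, bias=False),
).to("cuda", dtype=torch.bfloat16)

convert_to_float8_training(model)   # swap nn.Linear layers for float8 training variants
model = torch.compile(model)        # compile to amortize the extra scaling/casting kernels

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

loss = model(x).float().pow(2).mean()  # dummy loss, just to exercise a full step
loss.backward()
opt.step()
```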
GPU MODE ▷ #🍿 (1 messages):
LLM Generated Kernels, Nano GPT, PyTorch Ops
- LLM Kernels Energize Real Models: Experiments are now running real models with LLM generated kernels for increased efficiency.
- The initial focus is on nano GPT, and extension to other PyTorch ops is planned, though non-PyTorch operations are deemed less critical currently.
- PyTorch Ops Expansion Roadmap: Plans are underway to broaden the application of LLM-generated kernels beyond nano GPT to encompass a wider array of PyTorch operations.
- The goal is to accelerate a broader range of PyTorch-based models as more ops are covered.
GPU MODE ▷ #submissions (22 messages🔥):
MI300x8 Leaderboard Updates, AMD all2all benchmarks, µs performance achieved
- AMD All2All Achieve-a-thon on MI300x8: Multiple submissions were made to the amd-all2all leaderboard, showing various performance timings on MI300x8, with initial submissions around 20 ms and subsequent improvements down to 2.84 ms.
- One user achieved first place with a submission of 345 µs.
- Microsecond Marathon on AMD's MI300x8: A user achieved first place on the amd-all2all leaderboard with a submission of 345 µs on MI300x8.
- Another submission reached second place at 364 µs, and several achieved third place with times around 1600-1900 µs.
- Personal Bests and Podium Placement on MI300x8: A user achieved a personal best of 94.2 ms on MI300x8.
- Another got multiple third-place finishes, converging at around 1639 µs.
GPU MODE ▷ #amd-competition (12 messages🔥):
MoE config limits, Random seed PR impact on num_tokens, Max comm bdw impact on pipeline design, Debugging unspecified bugs, Hyperparameter settings visibility
- MoE Configs Token Limits Questioned: A member questioned whether the MoE config will exceed the highest values in the dashboard, specifically concerning whether token counts could exceed 9MB per rank, which would necessitate pipelining.
- They referenced a specific config with 256 8 7168 256 104.36 and 3.5 MB max tokens per rank to illustrate the concern.
- Num_tokens variation after random seed PR: After a random seed PR, the num_tokens of each rank (GPU) became different, prompting a question about whether this change is final for optimization purposes.
- Another member cautioned against changing problem contents without persuasive reasons, such as bug fixes.
- Pipeline Design Bandwidth Bottleneck: A member suggested that regardless of pipeline design, the max communication bandwidth (comm bdw) will remain a limiting factor.
- This implies that overall performance gains from pipelining may be capped by communication constraints.
- Debugging Details Added for unspecified Bugs: To provide more details when debugging, the debug section has been updated; if a success isn't indicated and a timeout isn't reported, it signifies other errors.
- Users can now view the exit_code and exit_code_info; an exit code of 1 indicates stderr, while runtime errors will provide more detailed exit code information.
- Request for hyperparameters after evaluation: A member asked how to view the exact hyperparameter settings after an evaluation, in order to compare their results against light speed.
- The member specifically asked about the final token-timing results for each num_experts setting.
GPU MODE ▷ #cutlass (2 messages):
cutlass_profiler, H100, CUTLASS_NVCC_ARCHS, CUTLASS_LIBRARY_KERNELS, CUTLASS_LIBRARY_OPERATIONS
- Cutlass Profiler Fails to Output on H100: A user reported that cutlass_profiler is not outputting any results when run on an H100 GPU after following the standard installation process.
- The installation process involved cloning and installing CUTLASS with specific CMake flags (-DCUTLASS_NVCC_ARCHS=90a, -DCUTLASS_LIBRARY_KERNELS=ALL, -DCUTLASS_LIBRARY_OPERATIONS=gemm, -DCMAKE_BUILD_TYPE=Release), followed by making cutlass_profiler.
- Possible causes for empty output: The user did not indicate potential causes or follow up troubleshooting steps.
- The output could be related to incorrect arguments or missing CUDA toolkit installation.
GPU MODE ▷ #low-bit-training (18 messages🔥):
torch.compile reduce-overhead, sequence packing using flash_attn, MXFP8 dot product in Triton, GemLite, torchao's FP8 transformation
- Torch Compile with Reduce-Overhead Boosts Performance: torch.compile with reduce-overhead is crucial for both inference and training to mitigate kernel launch and activation quantization overheads, particularly for mxfp4/nvfp4.
- Sequence Length Padding Required For Torch.Compile: When training with variable sequence lengths, padding to predefined lengths (e.g., [64, 96, 128, ..., 4096]) avoids frequent recompilations with torch.compile; a minimal sketch of this pattern appears after this list.
- MXFP8 Triton PR Got Reverted: Support for the MXFP8 dot product via tl.dot_scaled in Triton for sm_120 (5090) was added but later reverted, pending investigation (github.com/triton-lang/triton/pull/8029), with the suggestion to use torch._scaled_mm() as an alternative.
- A member mentioned "I am not sure" why it was reverted.
- TorchAO FP8 Transformation May Alter Weight Dtype: Applying torchao's FP8 transformation might unintentionally change master weights from BF16 to FP32, requiring investigation to ensure intended behavior.
- One member asked "do you have a repro?", indicating surprise this was happening.
- CUDA Graphs Outshine Kernel Fusion: CUDA graphs provide the majority of the speed-up by reducing kernel launch overhead, which can be substantial, especially with Triton kernels.
- While the theoretical benefits of kernel fusion include avoiding memory access, the practical impact may be overshadowed by launch overhead, suggesting a focus on simpler solutions like CUDA graphs.
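As referenced in the sequence-padding item above, here is a minimal sketch of the bucket-padding pattern; the bucket list, toy model, and pad id are illustrative choices, not a prescribed recipe.

```python
# Pad variable-length batches up to a small set of fixed lengths so torch.compile
# only ever sees a handful of shapes instead of recompiling for every batch.
# Bucket sizes, the toy model, and pad_id are illustrative.
import bisect
import torch
import torch.nn as nn

BUCKETS = [64, 96, 128, 256, 512, 1024, 2048, 4096]

def pad_to_bucket(input_ids: torch.Tensor, pad_id: int = 0) -> torch.Tensor:
    """Right-pad (batch, seq) token ids up to the next bucket length."""
    seq_len = input_ids.shape[-1]
    idx = bisect.bisect_left(BUCKETS, seq_len)
    bucket = BUCKETS[idx] if idx < len(BUCKETS) else seq_len  # beyond the largest bucket, keep as-is
    if bucket == seq_len:
        return input_ids
    pad = torch.full((*input_ids.shape[:-1], bucket - seq_len), pad_id,
                     dtype=input_ids.dtype, device=input_ids.device)
    return torch.cat([input_ids, pad], dim=-1)

# mode="reduce-overhead" uses CUDA graphs on GPU to cut per-kernel launch overhead.
model = torch.compile(nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 64)),
                      mode="reduce-overhead")

for seq_len in (70, 90, 130):                 # these map to buckets 96, 96, and 256
    batch = torch.randint(0, 1000, (4, seq_len))
    out = model(pad_to_bucket(batch))         # only two distinct shapes reach the compiler
```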
DSPy ▷ #papers (1 messages):
DSPy Hallucinations, HallBayes
- Hallucinations Be Gone with HallBayes?: A user asked when DSPy will solve hallucinations via fancy math budgeting.
- The user linked to the HallBayes GitHub repository.
- DSPy tackles AIâs tall tales: Discussion centers on innovative mathematical budgeting to mitigate AI hallucinations within the DSPy framework.
- The community explores the potential of integrating techniques like those in the HallBayes repository to enhance DSPyâs reliability.
DSPy ▷ #general (48 messages🔥):
DSPy's Opinionated Paradigm, GEPA Optimizer, MIPROv2 Example, Prompt Optimization
- DSPy Hopes for Critical Mass: A member believes DSPy is a significant paradigm shift, needing critical mass for success, drawing parallels to network effects in Deep Learning, PyTorch, Linux, Rails, and the Python numerical computing community and linked to this post.
- They personally don't hype projects often, but this feels different because it is potentially the most significant paradigm shift since early LLMs.
- GEPA Optimizer Data Split: Regarding the GEPA optimizer, it's recommended to use all data, creating a small validation set matching the final task distribution and using the rest for training, contrary to a 20-80% split.
- Members clarified that the user had mixed up the distribution in their initial message, and the user confirmed this was indeed the data split they meant to ask about.
- In Search of MIPROv2 Notebook: A member requested a simple, self-contained notebook example with MIPROv2, including all items within the notebook, as existing examples pull libraries from external sources like Hugging Face datasets.
- Another member pointed to an eval CSV used in a tutorial that used llama_3_3_trainset.csv available here.
- Optimize This! Prompt Optimization Techniques: A member sought to understand the optimizations performed by compile(), using a self-contained notebook directing the LLM to select "one" or "two" as an answer, and linked to this GitHub repo; a minimal save-and-inspect sketch follows at the end of this section.
- It was suggested to save the program to JSON to view changes; the member found no changes, leading to the suggestion that the task might be straightforward enough for the model (4.1) to handle without optimization.
- Forcing Overfit for Fun and Profit: A member tried to tweak the prompt to force the optimizer to find the correct answer without a lot of training data, essentially forcing an overfit, and sought guidance.
- Another member suggested increasing the amount of training data to encourage the overfit, while clarifying that they are playing around with prompting and optimization techniques.
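As mentioned in the compile() item above, the quickest way to see what an optimizer changed is to save the program before and after compilation and diff the JSON. A minimal sketch follows; the LM name, toy trainset, metric, and choice of BootstrapFewShot are illustrative stand-ins, not necessarily what the thread used.

```python
# Compile a tiny DSPy program and dump it to JSON to inspect what changed.
# LM name, data, metric, and optimizer choice are illustrative placeholders.
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4.1-mini"))    # placeholder model id

classify = dspy.Predict("question -> answer")        # answer should be "one" or "two"
classify.save("program_before.json")                 # baseline to diff against

trainset = [
    dspy.Example(question="How many words are in 'hi'?", answer="one").with_inputs("question"),
    dspy.Example(question="How many words are in 'hi there'?", answer="two").with_inputs("question"),
]

def exact_match(example, pred, trace=None):
    return example.answer.strip().lower() == pred.answer.strip().lower()

optimizer = dspy.BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=2)
compiled = optimizer.compile(classify, trainset=trainset)

compiled.save("program_after.json")   # diff the two JSON files to see added demos/instructions
```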
Moonshot AI (Kimi K-2) ▷ #general-chat (47 messages🔥):
Twitter account suspension, Pricing plans for Kimi AI, PPTX Slides with Kimi, CCP affiliations and Moonshot AI, Kimi K2 temperature
- User's Twitter Account Falls Victim: A user mentioned their old Twitter account was suspended for no reason, requesting assistance from a Kimi AI team member to check their inbox.
- Feature Requests and Pricing Plan Ideas Abound: A user requested a $5 plan for productivity and students, along with features like slides, flashcard maker, and auto summary.
- Another user confirmed mentioning this need to the product team, especially with the back-to-school season approaching, but noted they would have to wait for the schedule.
- Kimi can make PPTX slides now!: A user shared that Kimi has the ability to make PPTX slides now, linking to a tweet showcasing this capability.
- Dispelling PRC Connections with Moonshot AI: A user inquired whether Kimi K2 and Moonshot AI have any affiliations with the CCP (Chinese Communist Party).
- A team member clarified that the company is a private entity, not state-owned, and ensures user privacy data won't be compromised: "We're a private company, not a state-owned enterprise. We won't infringe on any user privacy data."
- Decoding Ideal Temperatures for Kimi: A user inquired about the best temperature settings for Kimi K2 for coding and creative writing.
- Another user suggested 0.6 for writing, 0.2 for coding, and 0.3 for factual tasks, based on RLHF tuned sweet spots.
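For applying those suggested temperatures through the API, here is a minimal sketch using an OpenAI-compatible client; the base URL and model id are assumptions to verify against Moonshot's docs, and the temperature values simply mirror the community suggestion above rather than official guidance.

```python
# Minimal sketch: calling Kimi K2 with task-dependent temperatures.
# Base URL and model id are assumptions (check Moonshot's docs for your account);
# the 0.2/0.3/0.6 values mirror the community suggestion above.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["MOONSHOT_API_KEY"],
    base_url="https://api.moonshot.ai/v1",    # assumed OpenAI-compatible endpoint
)

TEMPS = {"coding": 0.2, "factual": 0.3, "writing": 0.6}

def ask_kimi(prompt: str, task: str = "factual") -> str:
    resp = client.chat.completions.create(
        model="kimi-k2-0711-preview",          # placeholder model id
        temperature=TEMPS[task],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask_kimi("Write a haiku about tensor cores.", task="writing"))
```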
OpenAI ▷ #ai-discussions (29 messages🔥):
AI Agents vs Workflows, Chinese AI Development, AI Safety, Free AI Options, Llama 3.2
- AI Agents: More than just workflows?: A member argued that AI agents, while technically workflows executing steps, offer dynamic and adaptive decision-making, unlike rigid workflows.
- Another user compared agents to cars (adaptive) and workflows to trains (predefined), suggesting agents provide more flexibility, but admitted that today's agents are utter trash and will be for a long time.
- Chinese AI Teams Impress: Members acknowledged the impressive development by Chinese AI teams, specifically mentioning how they achieve competitive performance with models like Qwen, despite using slightly older chips.
- A member shared their experience using Qwen as the base model to fine-tune Sakura, a model dedicated to translating Japanese to Chinese with an "Anime" style.
- AI Safety's Gentle Nudge: In discussing AI safety, a member suggested that AI might already be implementing soft control by subtly influencing decisions and thought patterns, rather than through hard control.
- Another uses the analogy of convincing a monkey not to touch a gun, rather than just taking it away.
- Budget-Friendly AI Tools: When asked about cheap AI options, members recommended ChatGPT's free tier, Google AI Studio's free tier, and Grok's free tier.
- One member humorously questioned why they ever subscribed to a paid plan given the capabilities of the free options.
- Tetris triumphs with AI: Members discussed AIâs ability to create games, with one member noting that Gemini 2.5 Pro one-shotted the creation of a horizontal Tetris game.
- Another member shared a similar experience with ChatGPT and speculated that AI could one day create entire multiplayer games or set up a whole business overnight.
OpenAI ▷ #gpt-4-discussions (1 messages):
smirsonianahmadi10100: Hello
OpenAI ▷ #prompt-engineering (3 messages):
Token IDs, GPT5, Custom Settings
- Token ID Shakeup on GPT5?: A member inquired whether the Token IDs changed on GPT5.
- They suggested it's a good time to change your custom settings, implying there may have been an update.
- Adaptive Benefits Highlighted: The user noted that being adaptive has its benefits, though without specific context.
- This comment seems to generally promote flexibility and responsiveness to changes.
OpenAI ▷ #api-discussions (3 messages):
Token IDs, Custom Settings, GPT5
- Token IDs get new threads: A member inquired if the Token IDs changed on GPT5.
- Custom Settings: Another member noted that changing custom settings may be beneficial, stating that being adaptive always has its benefits!
Modular (Mojo 🔥) ▷ #mojo (21 messages🔥):
Networking libraries in stdlib, AI inference over network, HTTP in AI clusters, DPDK and Mojo, Lightbug limitations
- Stdlib Networking Libs Spark Debate: Members debated the inclusion of networking libraries in stdlib, but agreed that servers should be externalized, with one member asking what about sending AI inference results over network?
- One member suggested HTTP should be kept far away from AI clusters unless you need very low latency inference, since it's just not a good protocol for a lot of the things we use it for.
- DPDK integrates into Mojo: One member is working on an automatic C binding tool, testing it on DPDK and MuJoCo (dpdk_mojo).
- Another member, a former DPDK maintainer, noted API differences make bridging DPDK and familiar IO APIs difficult, which informed their IO Engines proposal.
- Lightbug's Async Missing: A member suggested async is preventing lightbug from dominating the world, asking "You know what's the state of integration at the moment?"
- Another member said that it's also missing the networking APIs, which many people think need to be retired, that it lacks zero-copy parsing, and that HTTP is actually hard to do at speed.
Modular (Mojo 🔥) ▷ #max (1 messages):
Shape Recompilation, Dynamic Tensors
- Strategies to Dodge Shape Recompilation?: A user inquired about strategies to avoid recompilation when the shape changes slightly each time, e.g., a sequence dimension growing over time.
- They observed that a new graph is declared every time without a caching mechanism and wondered if there are plans to allow more dynamism with the new tensor, or if we should always assume static shapes are being compiled.
- Dynamic Tensors and Future Plans: The userâs question also touched on the future of dynamic tensors within the system.
- Specifically, they asked if there were plans to allow more dynamism with the new tensor or if static shapes should always be assumed during compilation.
Manus.im Discord ▷ #general (4 messages):
Scheduled task errors, Support ticket updates
- Scheduled Tasks Glitch After Upgrade?: A member reported that two scheduled tasks encountered errors today: one wasn't triggered, and the other didn't output results according to the prompt, despite working normally in previous weeks.
- They wondered if this issue could be related to a recent upgrade.
- Support Ticket Tango: A member inquired about updates on ticket 1335, noting they can't comment on it anymore since it's become read-only.
- Another member asked if their issue has been processed on ticket 1337.
tinygrad (George Hotz) ▷ #announcements (1 messages):
Tinybox Pricing, Tinybox New Colors, Tinybox Act Fast
- Tinybox Prices Plummet!: New, lower prices for tinybox have been announced: $10k for red, $25k for green v2.
- The announcement urges potential buyers to act fast, as these prices might not last.
- Tinybox: Limited-Time Pricing: The announcement highlights significant price reductions for tinybox, making it a timely opportunity for acquisition.
- Specifically, the red version is now available for $10,000, while the green v2 is priced at $25,000.