a quiet day.
AI News for 6/16/2026-6/17/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
AI Twitter Recap
Top Story: Midjourney Medical
What happened
Midjourney unveiled a medical imaging/scanning system and then published a technical dive on it, triggering a mix of fascination, skepticism, and broader discussion about AI labs moving into hardware/medical devices.
- Midjourney's official account posted "A technical dive inside our new 'Midjourney Scanner'" in the main announcement tweet, which appears to be the core launch artifact for the project @midjourney.
- The launch was preceded or paralleled by discussion of a scanner whose tradeoffs were summarized as: radiation-free, magnet-free, fast, and low-cost, but requiring the person to sit in a water immersion tank and currently having coarser resolution than CT/MRI @iScienceLuvr.
- A demo appears to have been available in person: one attendee said, "I put my hand in the @midjourney demo scanner tonight", framing it as a tangible prototype rather than a purely conceptual announcement @saranormous.
- The announcement generated strong enthusiasm from supporters who viewed it as evidence of unusually ambitious product direction from Midjourney, including comments like "this is so amazing" and "let inventors like @DavidSHolz invent" @saranormous.
- Others interpreted the launch competitively against more incremental AI hardware efforts; one reaction contrasted it with "boring lapel camera" bets and argued other AI labs should "slap yourself" if Midjourney is building this kind of thing @matvelloso.
- There was also lightweight technical commentary from people interested in imaging methods, including speculation about detector/emitter arrangements and real-time variants @johnowhitaker, plus teasing that some users seemed unusually prepared for the launch topic @johnowhitaker.
Facts vs opinions
Factual claims explicitly present in the tweet set
- Midjourney published a technical dive into a product called the "Midjourney Scanner" @midjourney.
- The scanner was described as:
  - Radiation-free
  - Magnet-free
  - Fast
  - Low-cost
  - Requiring a water immersion tank
  - Having coarser resolution than CT/MRI @iScienceLuvr
- A person physically tried a demo scanner with their hand @saranormous.
Interpretations/opinions/speculation
- Strongly positive reactions framed the scanner as visionary or "the future" @saranormous.
- Some observers took the launch as evidence that Midjourney is pursuing a more ambitious hardware roadmap than competing AI labs @matvelloso.
- One humorous reply escalated the idea into "next up is full cargo transport by midjourney," clearly not a factual claim @yacinelearning.
- Independent technical commentary suggested possible future design directions, such as distributed scattered detectors and emitters or real-time systems, but these were not presented as features of Midjourney's current scanner @johnowhitaker.
Technical details and inferred modality
The tweet corpus contains only a limited number of hard specs, but they are enough to outline the project's positioning.
- No ionizing radiation: "Radiation-free" implies the system is not using X-rays/CT-style ionizing modalities @iScienceLuvr.
- No magnets: "Magnet-free" differentiates it from MRI, which relies on strong magnetic fields @iScienceLuvr.
- Water immersion tank: This is a major clue about the physical sensing setup. Water coupling is common in some acoustic and wave-propagation imaging systems because it improves transmission and coupling between emitters, tissue, and detectors @iScienceLuvr.
- Resolution below CT/MRI: The system is not being claimed, in these tweets, to outperform incumbent clinical imaging on resolution; in fact, an explicit limitation is that resolution is coarser than CT/MRI @iScienceLuvr.
- Speed/cost positioning: It is framed as fast and low-cost, suggesting the value proposition is likely accessibility, throughput, or portability rather than top-end image fidelity @iScienceLuvr.
There is also technically informed reaction about the likely sensing challenges:
- Jonathan Whitaker notes that systems based on light, ultrasound, electric current, etc. pose a harder inverse problem than X-rays because their signals do not travel through tissue in straight lines, making reconstruction more complex @johnowhitaker.
- He also suggests a future version with many scattered detectors and emitters rather than mechanically moving components, indicating that at least some readers infer the current system may involve motion/scanning geometry rather than fully parallelized capture @johnowhitaker.
Taken together, the public discussion points toward a non-CT, non-MRI modality with wave-based reconstruction and meaningful algorithmic/inverse-problem content, though the tweets here do not provide definitive modality labeling or performance tables beyond the stated tradeoffs.
Different perspectives
Supportive / optimistic
- The most enthusiastic camp sees this as exactly the kind of high-upside, weird, non-consensus invention AI founders should pursue, not just incremental chatbot/UI products. That tone is clear in "let inventors like @DavidSHolz invent" @saranormous.
- In-person demo reactions emphasized the visceral novelty of interacting with a real scanner, not just reading a paper or watching a video @saranormous.
- Some interpreted the move as a sign that Midjourney may be thinking beyond image generation and toward full-stack applied invention, possibly combining hardware, sensing, and AI reconstruction.
Neutral / technical-curious
- The most grounded reaction in the set is the concise pros/cons summary: radiation-free, magnet-free, fast, low-cost versus water immersion and lower resolution than CT/MRI @iScienceLuvr.
- Technically curious observers liked the strangeness of the modality while immediately identifying the physical and systems tradeoffs:
  - Non-straight-line propagation compared with X-rays
  - Need for better real-time capture arrangements
  - Questions about detector/emitter topology @johnowhitaker
Opposing / skeptical / cautionary
Direct hostile criticism is limited in this tweet set, but skepticism is implicit in several points:
- Clinical utility skepticism: saying it has coarser resolution than CT/MRI is a substantive caveat, especially in medicine where image quality can directly affect diagnostic value @iScienceLuvr.
- Practicality skepticism: requiring a water immersion tank is a serious ergonomic and deployment constraint for routine clinical or consumer use @iScienceLuvr.
- Modality skepticism: technical comments about non-straight-line propagation hint at the usual challenge for alternative imaging systems: the physics and inverse reconstruction are hard, and the pretty demo may not automatically translate into robust, clinically reliable imaging @johnowhitaker.
Competitive framing
- One notable perspective was less about the scanner itself and more about what it says strategically: if Midjourney is attempting hardware-medical invention, then AI companies pursuing narrower wearable-camera concepts look conservative by comparison @matvelloso.
Context: why this matters
Midjourney is primarily known as an image-generation company. That makes a medical/scanner reveal noteworthy for several reasons:
- It suggests a willingness to move from generative media software into real-world sensing and hardware.
- Medical imaging is a domain where inverse problems, signal processing, reconstruction, and increasingly ML-based interpretation all matter; it is not an obvious adjacency, but it is a technically deep one.
- The scanner appears to be positioned not as "better than MRI/CT on all axes," but as a potential entrant in the classic disruption lane: worse on a premium metric, better on cost/accessibility/operational burden.
- If the system is genuinely fast and low-cost, the most plausible implications are in:
  - screening or triage,
  - settings where CT/MRI access is limited,
  - repeat imaging where avoiding radiation matters,
  - specialized anatomical use-cases where immersion-based setups are acceptable.
The launch also fits a broader 2025 pattern where AI-adjacent companies increasingly try to define themselves not just as model vendors, but as builders of new interfaces to the physical world. In that framing, Midjourney Medical is less about a single scanner and more about whether frontier AI-era startups can productize difficult sensing systems, not just generate content.
Implications and open questions
- Regulatory path: nothing in these tweets addresses approvals, validation studies, or whether this is research-only versus intended for clinical deployment. For medical relevance, those questions are central.
- Reconstruction stack: the phrase "technical dive" implies the company has discussed internals, but the tweet set here does not expose the actual algorithmic details. The likely crux is reconstruction quality under a constrained sensing setup.
- Use-case specificity: lower resolution than CT/MRI does not necessarily doom the system; many imaging tools win by being good enough for a narrow workflow. But no specific target indication appears in these tweets.
- Form factor challenge: a water immersion tank is acceptable for some scanning contexts and a major barrier for others. Whether this is a prototype artifact or a fundamental requirement matters.
- Throughput and cost realism: "fast" and "low-cost" are meaningful only relative to benchmarks: scan time, hardware cost, consumables, operator burden, and downstream interpretation overhead. Those numbers are not provided in the tweets here.
- AI's role: the most interesting technical question may be whether Midjourney's contribution is primarily in hardware design, inverse-problem reconstruction, learned denoising/super-resolution, automated interpretation, or an integrated stack spanning all of these. The social reaction suggests people are projecting a lot onto the project because Midjourney's brand is associated with learned visual systems rather than classical medical devices.
AI research, agents, and open models
- A notable research meta-point: Chinese open-source literature over the last year was highlighted as unusually high-ROI to follow, with the claim that the "alpha is insanely huge" @himanshustwts.
- PapersWithCode's top trending paper was VibeThinker-3B, described as a 3B parameter model exploring verifiable reasoning in small LMs and allegedly landing in the performance tier of DeepSeek V3.2, GLM-5, and Gemini 3 Pro @NielsRogge.
- A computer-use paper, PreAct, was praised for compiling successful agent runs into a guarded replayable state machine, eliminating per-step LM calls on repeats and yielding 8.5x to 13x faster replay @dair_ai.
- Another RL/agent paper proposed LLM-as-Environment-Engineer, where the policy uses its own failures to redesign the next training environment; the associated benchmark is MAPF-FrozenLake @dair_ai.
- omarsar0 argued coding agents need verifiers and robust guardrails, not blind autonomous loops, reinforcing a trend toward constrained agentic execution @omarsar0.
- David Khourshid's coding-agent take was more operational: AI-generated code still has to be read, and not reading it simply defers the debugging burden @DavidKPiano.
- On RL theory, John Schulman said PPO's resurgence in the LLM era comes from effects not anticipated in the original paper, including the importance-ratio objective correcting biases from numeric error, async training, and forward-pass noise, while clipping alters entropy via a mechanism only later understood; he cites DAPO @johnschulman2.
- Relatedly, Cameron Wolfe said recent post-GRPO analysis papers (e.g. DAPO, Dr. GRPO, GSPO, TIS) are exactly the kind of objective-analysis work he hopes to see for PPO in reasoning/agent contexts @cwolferesearch.
- John Carmack posted a detailed critique of Temporal Differences for visual representation learning, summarizing the method: train a frame encoder and a "motion encoder" on RGB frame differences so latent(frame1) + delta ≈ latent(frame2), with a 0.25 second stride; he questioned the DINO EMA anti-collapse choice and the soundness of the delta construction @ID_AA_Carmack.
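As a reading aid, the relation summarized in the Carmack item above can be written out as follows. The symbols $f$ (frame encoder), $g$ (motion encoder), and $\Delta t$, as well as the squared-error form of the loss, are assumptions for illustration; the summary only states the additive relation and the 0.25 second stride.

```latex
f(x_{t+\Delta t}) \;\approx\; f(x_t) + g\!\left(x_{t+\Delta t} - x_t\right),
\qquad \Delta t = 0.25\,\mathrm{s}

\mathcal{L} \;=\; \left\lVert f(x_t) + g\!\left(x_{t+\Delta t} - x_t\right) - f(x_{t+\Delta t}) \right\rVert_2^2
```

Written this way, the trivial solution is visible: if $f$ and $g$ both output constants, the loss is zero, which is why an anti-collapse mechanism is needed at all.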
AI infrastructure, inference, and product rollouts
- Xenova released a demo and kernels from the now-shut-down Fable 5 effort, claiming it had pushed Gemma 4 to 255 tok/s on WebGPU; the framing is that agentic kernel optimization could materially improve browser/on-device inference @xenovacom.
- Fal announced Kling 3.0 Turbo and O3 upgrades:
  - faster generation
  - lower costs
  - better lip-sync
  - more stable motion
  - stronger prompt/reference consistency in "Omni"
  - up to 15s clips
  - full 4K generation with Omni
  - improved storyboard and multishot workflows @fal
- Kling's own account amplified the Fal rollout as a creator-facing quality/speed improvement @Kling_ai.
- GitHub Copilot's Auto mode now uses a custom routing model to choose among models based on reasoning depth, code complexity, debugging difficulty, and tool orchestration needs; a blog post and a linked research paper were shared @pierceboggan, @pierceboggan.
- Kimi Code Web appears to be back online, per a brief ecosystem note @bigeagle_xd.
- Grok image generation projects were mentioned via grok.com/imagine, but with no substantive technical detail @chaitu.
Talent, labs, and competitive dynamics
- The biggest personnel story outside Midjourney: Noam Shazeer announced he is joining OpenAI, leaving Google after saying it was a difficult decision and praising his former team @NoamShazeer.
- Sam Altman celebrated the move, saying Noam was one of the people he had most wanted to work with since OpenAI's beginning @sama, then joked about OpenAI being SOTA "in noams" @sama.
- Commentary emphasized Shazeer's significance as co-author of Transformer, T5, and Switch Transformer and pioneer of sparse MoE systems, with some calling it the most important AI talent move of the year @scaling01.
- Aidan Clark signaled excitement about working with Noam and linked it to a sense that RSI is getting closer @aidan_clark.
- A broader industry reading from replies:
  - DeepMind/Brain merger may have indirectly benefited Anthropic/OpenAI @arohan
  - Anthropic got Karpathy while OpenAI got Noam @TheTuringPost
  - speculation that the move says as much about Google disappointment as OpenAI pull @teortaxesTex
- There was also chatter about relative power/valuation: Liam Fedus posted "Breaking: OpenAI overtakes Anthropic's valuation" @LiamFedus.
- More opinionated geopolitical/competitive takes argued that various actors have incentives to prevent Anthropic from maintaining too large a lead, though these were clearly speculative rather than factual reporting @teortaxesTex, @teortaxesTex.
Adoption, usage, and model quality discourse
- Stella Biderman offered a practical quality complaint: ChatGPT and Claude can disagree on something as concrete as the overlap in citations between two papers, underscoring persistent reliability issues in applied knowledge tasks @BlancheMinerva.
- Several posts focused on GLM and Chinese model progress:
  - praise for the GLM team as "heroic" @teortaxesTex
  - follow-up saying the latest generation reached something like Opus-level expectations beyond prior assumptions @teortaxesTex
  - speculation that future frontier capability gains may hinge more on RL recipes than pure pretraining scale @teortaxesTex, @teortaxesTex
- There was also a cluster of highly speculative posts about "Claude" identity/persona salience appearing in outputs, framed as memetic or steganographic behavior rather than established fact @teortaxesTex, @teortaxesTex, @teortaxesTex.
Broader tech and society
- A Tacit Labs join announcement framed biology as the next place where AI should uncover genuinely new knowledge rather than just recombine existing understanding @maxisawesome538.
- There was a joke about the White House demanding a solution to the halting problem, a reminder that AI-policy discourse still often compresses deep CS impossibilities into simple-sounding asks @the_engi_nerd.
- In autonomy, one post noted the apparent lack of fresh AV startup activity despite Waymo/Tesla making the category seem increasingly feasible @gabriberton.
- Miscellaneous opinion posts on learning, coding, and contribution included:
  - you can contribute to AI without deep formal math background @gabriberton
  - a token-understanding/generation interview question about whether a model can understand a token it cannot generate @gabriberton
  - a joke that a Slack alternative could be built with "half a day of vibe coding" @gabriberton
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
1. GLM-5.2 Open-Weights Frontier Benchmarks
- GLM-5.2 is the first open-weights model to cross 80% on Terminal-Bench and beats every other open model available (Activity: 1569): The image is a technical benchmark bar chart for Terminal-Bench 2.1 showing GLM-5.2 scoring 81.0, making it the first open-weights model in the chart to clear the dashed 80% threshold, though closed models Claude Opus 4.8 (85.0) and GPT-5.5 (84.0) remain ahead overall (image). The post frames this as GLM-5.2 beating other open models and even Gemini 3.1 Pro, but a commenter notes Terminal-Bench 2.1 is an "easier" revision of Terminal-Bench 2 with relaxed timeouts/rules, so cross-version score comparisons may be inflated. Comments debate whether "open weights" meaningfully implies "local" usability: one user argues "if you can download it, it's a local model," while another says it is still impossible to run locally for 99% of users due to hardware requirements.
  - A commenter argues that Terminal-Bench 2.1 is not directly comparable to Terminal-Bench 2, claiming 2.1 is an easier revision with changed timeouts, relaxed problem rules, and broader harness compatibility. They note that models generally should not score lower on 2.1 than 2, and suggest the more meaningful signal will be initial Terminal-Bench 3 scores before labs start optimizing against the benchmark.
  - There is a technical deployment debate around whether GLM-5.2 should be considered a "local model." One side argues that "if you can download it, it's a local model" because unlike Claude or ChatGPT the weights can be run by users, while another points out that the model is effectively impossible to run locally for 99% of users due to hardware/performance constraints such as very low tokens-per-second on consumer systems.
- GLM-5.2 is a win for local AI (Activity: 1270): The post argues that GLM-5.2, described as an MIT-licensed MoE coding/reasoning model with 753B total parameters and ~40B active parameters/token, is significant less as a home-runnable model than as a source for distillation/synthetic-data fine-tuning into 8B/70B local models. The OP's estimated deployment table puts memory at roughly 744-890 GB for FP8, 476-500 GB for 4-bit, 241-280 GB for 2-bit, and 176-180 GB for 1-bit dynamic quantization, with KV cache overhead for the claimed 1M context adding ~150-200 GB at FP16/BF16 or ~35-50 GB at INT4; they explicitly caveat that these numbers were gathered online and AI-assisted. Commenters debated "local" feasibility: some noted that 512GB Macs, GB10 clusters, or multiple 128GB AMD AI Max-class systems could plausibly run low-bit variants, while others framed the hardware requirements as increasingly unobtainable. One API user called GLM-5.2 a "very, very, very killer model" and argued that GLM-5.2, MiniMax M3/Mini-V2.5-Pro, and similar open-weight/API-accessible models have largely closed the practical gap with proprietary frontier models; another commenter simply wished for a distilled or native 70B dense release.
  - Several commenters focused on local inference feasibility for GLM-5.2, noting that practical setups likely require very high-memory systems such as 512GB Macs, GB10-style clusters, multiple AMD AI MAX 128GB machines, or a custom multi-GPU server. One user estimated they could run the GGUF locally on a server built for <$9,000, but expected only around ~7 TPS, framing it as usable but expensive home deployment.
  - A technical concern was raised about Mac Studio performance at long context lengths: one commenter argued that while the model may technically run, it becomes "unusable" at 50K+ context because of poor PP/TG throughput. The point was that memory capacity alone is insufficient; prompt processing and token generation speed dominate usability for large-context inference.
  - A user with API experience claimed GLM-5.2, alongside MiniMax M3 / Mimi-V2.5-Pro, significantly narrows the gap between open-weight/open-ish large models and frontier proprietary models. They specifically said they would trust GLM-5.2 more than Opus 4.8 in some cases, while acknowledging that there remain "frontier problems" these models still cannot solve.
- zai-org/GLM-5.2 is here! (Activity: 1178): Z.ai released zai-org/GLM-5.2, an MIT-licensed flagship long-context model with a stable 1M token context, stronger coding/agentic performance, configurable reasoning effort, and serving support across SGLang, vLLM, Transformers, KTransformers, and Ascend NPU. Key implementation changes include IndexShare sparse-attention indexer reuse, claiming 2.9× lower per-token FLOPs at 1M context, plus improved MTP speculative decoding with up to 20% longer acceptance; commenters highlighted a reported DeepSWE score of 46.2, placing it above Claude Opus 4.6/Sonnet and just below 4.7 in that benchmark. Commenters were mainly interested in missing/expected variants and quantization, asking for GLM-5.2-Flash-32B-A4B and jokingly/seriously waiting for ultra-low-bit 0.5Q releases.
  - Commenters highlighted that GLM-5.2 is very large on Hugging Face, with the linked zai-org/GLM-5.2 repository showing roughly 1.51 TB of model files, making local inference impractical for many users without heavy quantization or multi-GPU setups.
  - One commenter cited a self-reported DeepSWE score of 46.2, claiming it places GLM-5.2 above Claude Opus 4.6 and Claude Sonnet, and just below Opus 4.7, suggesting strong software-engineering benchmark performance if independently validated.
  - There was interest in a smaller or more deployable variant, specifically GLM-5.2-Flash-32b-a4b, implying demand for a lower-cost MoE/Flash-style release that could be easier to run than the full 1.51 TB checkpoint.
- GLM-5.2 is now 1st on Design Arena - ahead of the now unavailable Claude Fable 5. (Activity: 751): The image is a Design Arena leaderboard screenshot (image) showing GLM-5.2 ranked #1 in the "Code Categories Arena" with an Elo score of 1360, narrowly ahead of the now-unavailable Claude Fable 5 at 1350. Context from the linked tweet/post frames this as notable because GLM-5.2 appears to overtake a recently removed Anthropic model, though the margin is small and leaderboard rankings may shift as more votes accumulate. Commenters were cautiously interested but skeptical, with one noting it is "bit early to make this call" and suggesting the ranking needs time to stabilize. Others focused on geopolitical/model-access implications, joking or warning that if powerful U.S. models are restricted, open or Chinese alternatives like GLM-5.2 may quickly fill the gap.
  - A commenter cautioned that GLM-5.2's Design Arena rank may be premature, noting that arena scores can shift as more votes accumulate: "give it a few days to settle out." This is a useful caveat for interpreting early leaderboard positions, especially when comparing against unavailable models like Claude Fable 5.
  - One technical concern raised was how a text-only model can perform well in real-world design workflows, where outputs often require visual inspection and iterative feedback loops. The commenter suggested that such workflows may need an OCR or vision model to evaluate generated designs, and asked how the vision-capable Kimi K2.7 performs on the same benchmark, noting that Kimi K2.6 was already their preferred design model.
- PSA: unsloth/GLM-5.2-GGUF is uploading (Activity: 491): A Reddit user noticed the Hugging Face repo unsloth/GLM-5.2-GGUF was newly created and inferred that Unsloth was likely preparing/uploading GGUF quantizations for GLM-5.2; at the time, the repo reportedly only contained a README. The linked HF page was not accessible during fetch due to HTTP 429 Too Many Requests, with Hugging Face recommending authentication via HF_TOKEN for API access. Top comments focused on deployment practicality: users questioned what quantization level would be required to fit the model locally, joked about needing cloud GPUs, and implied current consumer hardware may be insufficient for comfortable inference.
  - Commenters focused on the deployment feasibility of unsloth/GLM-5.2-GGUF, with one noting the apparent 800GB footprint and asking how aggressively it would need to be quantized to run locally. Another technical concern was KV-cache scaling for very long context: "imagine the KV Cache size to reach 1M CTX", implying that even if the GGUF weights fit, 1M context inference would require substantial additional memory beyond model weights.
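The memory figures quoted in the GLM-5.2 posts above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes a uniform bit width across all 753B parameters and ignores quantization scales, higher-precision layers, KV cache, and runtime overhead, so its results are lower bounds rather than deployment estimates; the function name is illustrative.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Naive weight footprint in decimal GB: params * bits / 8 bytes."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 753e9  # total parameter count quoted in the post

# Uniform bit widths only; real "dynamic" quants mix precisions per tensor.
estimates = {bits: weight_memory_gb(TOTAL_PARAMS, bits) for bits in (16, 8, 4, 2, 1)}

for bits, gb in estimates.items():
    print(f"{bits:>2}-bit: about {gb:,.0f} GB")
```

At 16 bits this gives about 1.5 TB, in line with the roughly 1.51 TB repository size commenters reported, and at 8 bits about 753 GB, inside the OP's FP8 range. The naive 4-bit, 2-bit, and 1-bit figures come out below the OP's table, which is expected if low-bit formats store per-block scales and keep some tensors at higher precision.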
2. Local Inference Optimization: WebGPU and AMD ROCm
- Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5 (Activity: 503): A WebGPU in-browser demo for Google Gemma 4 E2B IT QAT mobile transformers was released on Hugging Face, including optimized kernels reportedly generated/optimized with Fable 5 before shutdown, reaching about 255 tok/s on an Apple M4 Max. The demo/kernels are available at HF Spaces, with the model hosted at google/gemma-4-E2B-it-qat-mobile-transformers. Comments noted strong interest in open-sourcing the UI and lack of Firefox support, implying the demo likely depends on browser WebGPU support/compatibility. One commenter pointed to a related Hugging Face optimization effort for Gemma E4B on an A10G, claiming roughly 500 TPS with "no quality loss" via collaborative agent-driven inference optimization: dashboard.
  - A commenter linked a related Hugging Face optimization effort where collaborating agents are reportedly maximizing Gemma E4B inference on an A10G GPU, reaching about 500 TPS with claimed "no quality loss": https://gemma-challenge-gemma-dashboard.hf.space/. This provides a useful comparison point against the post's in-browser WebGPU result of 255 tok/s, though the hardware and runtime environments differ substantially.
  - Several technical questions focused on runtime portability and deployment tradeoffs: one user noted the lack of Firefox support, likely due to WebGPU/browser compatibility constraints, while another asked how the WebGPU/Fable 5 implementation compares against native runtimes such as llama.cpp. Another raised a practical browser-storage concern after downloading roughly 2 GB of model data and wanting a way to flush/delete it afterward.
- Avoid CUDA monopoly at all costs. AMD is an alternative. (Activity: 458): The post reports running llama.cpp/llama-server on an AMD RX 7800 XT 16GB with ROCm 6.4.4, compiled via -DGGML_HIP=ON -DGPU_TARGETS=gfx1101 -DrocWMMA_FATTN=ON, serving Qwopus3.6-27B-v2-Q3_K_S.gguf and Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf at up to 131072 context using --flash-attn on, full GPU offload, and split KV-cache quantization (K=q8_0, V=q4_0). The author claims the KV quantization reduces cache memory by ~5.6x, keeping weights + 128K KV cache within ~96% of VRAM with no CPU spill, while achieving ~210 tok/s prefill and 11-17 tok/s decode at ~188W; they attribute long-context coherence to YaRN RoPE scaling and provide a longer benchmark write-up here.
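The split KV-cache quantization (K=q8_0, V=q4_0) described above can be put into numbers using the ggml block layouts: both formats pack 32 values per block with a 2-byte fp16 scale. The sketch below compares against an f16 cache, which is an assumed baseline; the post does not say what the ~5.6x figure is measured against.

```python
# Bits per element for ggml cache types (32-element blocks plus an fp16 scale).
BITS_PER_ELEMENT = {
    "f16": 16.0,
    "q8_0": 34 * 8 / 32,  # 32 int8 values + 2-byte scale = 34 bytes -> 8.5 bits
    "q4_0": 18 * 8 / 32,  # 32 4-bit values + 2-byte scale = 18 bytes -> 4.5 bits
}


def kv_reduction(k_type: str, v_type: str, baseline: str = "f16") -> float:
    """Ratio of the baseline K+V cache size to the quantized K+V cache size."""
    base_bits = 2 * BITS_PER_ELEMENT[baseline]
    quant_bits = BITS_PER_ELEMENT[k_type] + BITS_PER_ELEMENT[v_type]
    return base_bits / quant_bits


ratio = kv_reduction("q8_0", "q4_0")
print(f"K=q8_0, V=q4_0 vs f16: {ratio:.2f}x smaller")
```

Against an f16 cache this arithmetic gives roughly a 2.5x reduction, so the author's ~5.6x claim presumably uses a different baseline or counts other savings; the post summary does not provide enough detail to tell.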
3. Local Coding Agents and Distillation Caveats
- Be wary of Qwen/Claude distillations - they're often worse than the base model (Activity: 554): The post argues that recent Qwen/Claude distill/finetune models such as "Qwopus" or Qwen-based Claude distillations trained on only ~4k-10k teacher samples are unlikely to transfer meaningful capability and may degrade the base Qwen 3.6 model, mostly changing style rather than improving reasoning/knowledge. It contrasts these with DeepSeek-R1 official LLaMA/Qwen distills, which reportedly used ~700k samples (large enough to affect behavior and benchmarks) and cites an external test where a Claude-distilled Qwen variant hallucinated relative to base Qwen and ran slower: "Claude distillation doesn't transfer library knowledge". Commenters broadly agreed, with one claiming capability-improving finetuning now generally needs >100k carefully curated examples plus recovery methods such as GRPO, not a few thousand samples. Another commenter suggested skepticism toward model cards with weak evals (low N, only pass@5, narrow web-dev benchmarks, or undisclosed distillation), though much of that comment was heuristic/opinion rather than evidence.
  - Several commenters argued that small supervised "distillations" from Qwen/Claude outputs are unlikely to improve base models: 4k samples was described as "basically nothing," and one commenter said meaningful improvement fine-tuning now needs 100k+ carefully curated examples plus recovery methods like GRPO rather than a few thousand prompt/response pairs.
  - A technical objection was that most API-based distillation lacks critical training signal: users typically do not get full logits beyond small top-N/top-1 outputs, and Anthropic does not expose full chain-of-thought, only summaries. This makes many releases closer to partial-response supervised fine-tunes than true knowledge distillation, losing substantial information from the teacher model.
  - One commenter gave an applied fine-tuning datapoint: even with 18k examples for a focused GDScript domain model, including docs pretraining and personal code, the model still failed to reliably produce desired outputs. Their conclusion was that fine-tuning can improve domain behavior/vertical specialization but "does NOT add intelligence."
-
Headless screenshot loops let a local 30B agent finish a raytraced FPS demo in pure C (Activity: 277): OP reports that adding a headless instrumentation requirement (keyboard/mouse input injection plus deterministic screenshot capture at selected frames) let both Claude Code on Opus 4.8 and a local Qwen3.6 27B agent iteratively debug and complete a small pure-C, standard-library-only raytraced FPS demo. The key mechanism was self-directed visual feedback: the agent timed captures around events such as rocket impacts, inspected rendered particles/debris, patched the C code, and reran the binary, effectively forming a recursive screenshot-based debugging loop. OP frames this as a prompting/tooling result rather than a model-quality benchmark, and discloses the local agent is their own OSS project, codehamr. Commenters were mostly impressed by the local Qwen result and nostalgic about C/demoscene-era development, though one commenter argued the task is likely not very challenging for current models.
- One commenter described an agent harness built around a custom Python Log function that mirrors print but can redirect all output into a shared log file. The model is explicitly instructed to inspect log tails, add internal logging, and use those observations for iterative debugging, effectively closing the observe-debug-fix loop that models "don't do out of the box."
- A user reported running the setup on an RTX 4090, noting a clear speed improvement and identifying q4_k_m quantization as their preferred quality/performance tradeoff for local inference.
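The logging harness mentioned in the comments could look roughly like the sketch below. It is a minimal illustration, not the commenter's actual code: the `tail` helper, the `agent.log` path, and the `path` parameter are assumptions added here.

```python
from collections import deque
from pathlib import Path

LOG_PATH = Path("agent.log")  # hypothetical shared log file


def Log(*args, sep=" ", end="\n", path=LOG_PATH):
    """Behave like print(), and also append the same text to a shared log."""
    text = sep.join(str(a) for a in args) + end
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(text)
    print(text, end="")


def tail(n=20, path=LOG_PATH):
    """Return the last n log lines, i.e. what the agent is told to inspect."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        # deque with maxlen keeps only the final n lines while streaming.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```

An agent prompt can then say "run the program, call `tail()`, and decide what to patch", which gives the model a single place to observe program state between iterations.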
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Claude Code Security and Workflow Lessons
-
Claude Opus caught malware hidden in my repo, then reverse engineered the whole thing (Activity: 1207): A repo owner says Claude Code running Opus halted a git merge after detecting an obfuscated block appended after module.exports in next.config.js, identifying it statically as an EtherHiding-style loader before CI/build execution. The described chain uses git credential theft/self-propagation via a compromised contractor machine, forged commit metadata, blockchain dead drops via api.trongrid.io, Aptos, and BSC RPCs, XOR-decoded in-memory stages, and an infostealer C2 at 198.105.127.210 over ports 80/443; listed IOCs include dropper SHA-256 e27abe7e810c79d71e8c1681ccd010d7ddbda6a9a34bf1124ba392a36ba9b476, globals like global.i / global._V = "8-4827", and multiple TRON/Aptos/BSC transaction pointers. Recommended checks are to audit next.config.js, postcss.config.js, and other build-time config for appended code, monitor CI egress to blockchain RPC endpoints, rotate all secrets reachable from builds, and treat the pushing workstation as compromised. The substantive comment thread emphasized that the key security lesson is not merely "Claude caught it," but that framework config such as next.config.js is privileged build-time code executed in CI and should be tightly controlled, reviewed, and sandboxed. Other top comments were mostly jokes or off-topic digs at Next.js / model restrictions.
- A commenter emphasized that the key security failure was not model detection but that a committed next.config.js executes during every build/CI run: "one committed config ... runs on every build + CI". The technical takeaway was to restrict what code can execute at build time, since a malicious config only needs to land in the repo once to compromise the pipeline.
- Another commenter identified repository controls as the first failure point: allowing someone to force-push into the repo bypasses normal review and auditability. They argued this should be prevented via branch protection, required reviews, and disabling force pushes on protected branches.
- One commenter recommended scanning all repositories for Hades/Miasma-style supply-chain compromises, noting these can propagate through commonly used libraries rather than only from already-infected developer machines. They also warned the issue is not limited to Node projects and suggested checking all language ecosystems in use.
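The recommended audit of build-time config files can be partly automated. The sketch below is a heuristic only, assuming the pattern described in the post (extra code placed after the `module.exports` statement, possibly as one very long obfuscated line). The function names, the file list, and the length threshold are illustrative assumptions, and a clean result does not prove a file is safe.

```python
import re
from pathlib import Path

CONFIG_NAMES = ("next.config.js", "postcss.config.js")  # files named in the post
MAX_LINE_LEN = 500  # assumed cutoff for a "suspiciously long" line


def audit_config_text(text):
    """Return heuristic findings for the contents of one JS config file."""
    lines = text.splitlines()
    findings = [
        f"line {i}: {len(line)} characters, possibly obfuscated"
        for i, line in enumerate(lines, start=1)
        if len(line) > MAX_LINE_LEN
    ]

    # Locate the last `module.exports = ...` statement.
    start = None
    for i, line in enumerate(lines):
        if re.match(r"\s*module\.exports\s*=", line):
            start = i
    if start is None:
        return findings

    # Naive bracket counting to find where that statement ends.
    # It ignores brackets inside strings and comments, so it can misjudge.
    depth, end = 0, start
    for i in range(start, len(lines)):
        depth += sum(lines[i].count(c) for c in "{[(")
        depth -= sum(lines[i].count(c) for c in "}])")
        end = i
        if depth <= 0:
            break

    # Anything other than blanks or // comments after the export is flagged.
    for i in range(end + 1, len(lines)):
        stripped = lines[i].strip()
        if stripped and not stripped.startswith("//"):
            findings.append(f"line {i + 1}: code after module.exports")
    return findings


def scan_repo(root):
    """Map each flagged config file under root to its list of findings."""
    results = {}
    for name in CONFIG_NAMES:
        for path in Path(root).rglob(name):
            if "node_modules" in path.parts:
                continue  # vendored packages are out of scope for this sketch
            text = path.read_text(encoding="utf-8", errors="replace")
            findings = audit_config_text(text)
            if findings:
                results[str(path)] = findings
    return results
```

Some legitimate configs do contain code after the export, so findings are prompts for manual review and not verdicts.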
-
Pro Tip - Reset your usage limits on your schedule (Activity: 1342): The post describes a scheduling workaround for Claude Code usage windows: create a daily Claude Code Routine using Haiku that sends a trivial prompt (e.g. "hello") roughly 5 hours before the desired reset time, thereby starting the rolling session window earlier. The claimed effect is that a user who begins work at 9:00 can force the first reset around 12:30 instead of 14:00+, assuming no prior active session prevents a fresh window; Anthropic documents the routines feature. Commenters generally agreed the tactic is useful mainly for users who frequently hit Pro/5x limits, because it lets them burn higher-capability models or high-token plugins shortly before a scheduled refresh. One commenter reported using a cron-like refresh every 5 hours to maximize token availability, while noting OP's tighter timing may better preserve a usable morning window.
- Users describe exploiting Claude's rolling 5-hour usage window by deliberately starting a session earlier than actual work time. For example, triggering usage at 7AM can move the next reset to 12PM, so if heavy work begins after a 10AM standup, the user waits until noon instead of 3PM after exhausting limits.
- One commenter reports using a cron job to refresh the usage window every 5 hours to maximize token availability, then intentionally burning remaining quota on the highest-capacity model, Claude Design, and other high-token tools before the reset. They also mention using "caveman mode" and a "rust token killer" as token-reduction techniques, though no implementation details or benchmarks are provided.
- Another user configured scheduled tasks in cowork at 7AM, 12PM, and 5PM to align Claude usage resets with waking/work hours, effectively creating multiple full sessions during the day. They note Claude itself pushed back against automating this and instead recommended reducing token waste, highlighting a tension between quota-window optimization and prompt/token-efficiency practices.
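The timing arithmetic behind the workaround is easy to sketch. This assumes, as the post describes, a 5-hour window that starts at the first request and that no earlier session is still active; the function names and the example date are illustrative.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)  # window length as described in the post


def reset_time(first_request):
    """Time at which a window opened by first_request would reset."""
    return first_request + WINDOW


def trigger_time(desired_reset):
    """When to send the trivial prompt so the window resets at desired_reset."""
    return desired_reset - WINDOW


day = datetime(2026, 6, 17)  # arbitrary example date

# A 7AM trigger moves the reset to noon, matching the commenters' example.
print(reset_time(day.replace(hour=7)).strftime("%H:%M"))  # 12:00

# To reset at 12:30, the trivial prompt has to go out at 07:30.
print(trigger_time(day.replace(hour=12, minute=30)).strftime("%H:%M"))  # 07:30
```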
-
the gap between Claude Code power users and us chat-only people keeps getting wider and i don't think that's great for the community (Activity: 2348): A Pro chat-only Claude user argues the subreddit's technical focus has shifted heavily toward Claude Code workflows (CLAUDE.md, MCP, subagents, terminal usage), making non-coding use cases (writing, thinking, learning, planning) feel underrepresented despite likely being a large user cohort. Top replies suggest that Claude Code can be used as a general local-task agent without programming, e.g. chatting in the terminal to transform local data into Excel/PDF outputs, and mention Cowork as an intermediate option with increased usage limits. Commenters largely agree that coding dominates the community (one estimates 95% of posts are coding-related) but differ on the remedy: some encourage chat-only users to adopt Claude Code-style tools for broader automation, while others ask what concrete non-coding workflows people want discussed.
- Several commenters argued that Claude Code's advantage is not coding-specific, but comes from its CLI/tool environment: local file access, command execution, and the ability to perform concrete tasks like converting local data into formatted Excel files and exporting polished PDFs. The technical distinction raised is that Claude Code behaves more like an agent with filesystem/tool access, while browser chat remains a more limited conversational interface.
- A recurring technical complaint was that Claude's most powerful workflows require nontrivial setup: MCP servers, package installation, and manual JSON configuration. One commenter argued that for non-coders, installing "whatever the hell an MCP server is" should be a one-click operation, because the current friction keeps advanced Claude workflows inaccessible to regular users.
- A power-user example framed CLAUDE.md, MCPs, subagents, and terminal workflows as general-purpose knowledge-work infrastructure, not software-development features. In an investment workflow, each deal can have its own CLAUDE.md, MCP-connected data sources, subagents to process due-diligence reports, and documented workflows for building financial models and slide decks.
2. Anthropic Fable Access and Policy Pressure
-
They're demanding Fable to somehow be 100% jailbreak-proof. It's so fucking over. (Activity: 1375): The image is a screenshot of a WIRED article preview claiming Trump administration officials want Anthropic's "Fable 5" made resistant to all jailbreaks before release, while security experts argue that 100% jailbreak resistance is not technically achievable for current LLMs. The technical issue is the impossibility of proving total safety for a generative model exposed to arbitrary prompts, tools, and contexts; the post frames this as an unrealistic security requirement akin to demanding an OS or car be mathematically incapable of harm. Commenters largely view the demand as absurd or politically motivated, with analogies to requiring cars to cause zero injuries or operating systems to be unhackable. One commenter speculates the requirement may be intended to restrict public release while preserving government access.
- Commenters argued that requiring Fable to be 100% jailbreak-proof is technically equivalent to demanding an OS or general-purpose computing platform be proven unhackable: the attack surface and prompt/input space are effectively open-ended. One point emphasized the formal-security issue that "you can't prove [a] negative" for all possible jailbreaks, making absolute jailbreak immunity an impractical certification target rather than an engineering requirement.
-
Anthropic CEO Dario Amodei joins top AI CEOs meeting with world leaders at G7 summit (Activity: 1812): Anthropic CEO Dario Amodei and OpenAI CEO Sam Altman reportedly attended a G7 working lunch on AI with world leaders, amid geopolitical tension from a U.S. restriction limiting allied access to Anthropic's most advanced models. The available Reddit/video source could not be independently inspected because Reddit returned 403 Forbidden, so no additional technical detail on the policy scope, affected model tiers, or export-control mechanism is available. Top comments were largely non-technical jokes/reactions; the only substantive question was why Salesforce CEO Marc Benioff was present at the AI leadership meeting.
AI Discords
Unfortunately, Discord shut down our access today. We will not bring it back in this form but we will be shipping the new AINews soon. Thanks for reading to here, it was a good run.