<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>AINews</title><description>Weekday recaps of top News for AI Engineers</description><link>https://news.smol.ai/</link><language>en-us</language><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-05-12-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-05-12-not-much/</guid><description>**Research-level reasoning benchmarks** are advancing with **439 new math problems** from **64 mathematicians** and expanded medical benchmarks in **Medmarks v1.0** covering **30 benchmarks** and **61 models**. **Google DeepMind&apos;s AI Co-Mathematician** achieves **48% on FrontierMath Tier 4**, while **Gemini 3.1 Pro** improves physics benchmark scores significantly. **GPT-5.5 high/xhigh** outperforms **Opus 4.7 xhigh** on program synthesis tasks. Retrieval benchmarks favor smaller models like **LightOn&apos;s Agent-ModernColBERT** with **149M parameters**. Training optimization advances include **SOAP/Muon-style updates** reducing training steps, and a **Lean4-to-TileLang superoptimizer** achieving **1.8× speedup on A100 GPUs**. Scaling laws are reconsidered with arguments for measuring in bytes rather than tokens. New training-time efficiency methods like **Lighthouse Attention** enable subquadratic training wrappers removable before deployment.</description><pubDate>Tue, 12 May 2026 05:44:39 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;a quiet day.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI News for 5/11/2026-5/12/2026. We checked 12 subreddits, &lt;a href=&quot;https://twitter.com/i/lists/1585430245762441216&quot;&gt;544 Twitters&lt;/a&gt; and no further Discords. &lt;a href=&quot;https://news.smol.ai/&quot;&gt;AINews&apos; website&lt;/a&gt; lets you search all past issues. As a reminder, &lt;a href=&quot;https://www.latent.space/p/2026&quot;&gt;AINews is now a section of Latent Space&lt;/a&gt;. You can &lt;a href=&quot;https://support.substack.com/hc/en-us/articles/8914938285204-How-do-I-subscribe-to-or-unsubscribe-from-a-section-on-Substack&quot;&gt;opt in/out&lt;/a&gt; of email frequencies!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h1&gt;AI Twitter Recap&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Research Benchmarks, Hard Evals, and Agentic Science Systems&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Research-level reasoning benchmarks keep getting harder&lt;/strong&gt;: &lt;a href=&quot;https://x.com/gson_AI/status/2054036114483392997&quot;&gt;Soohak&lt;/a&gt; introduces &lt;strong&gt;439 research-level math problems&lt;/strong&gt; authored from scratch by &lt;strong&gt;64 mathematicians&lt;/strong&gt; (including &lt;strong&gt;38 faculty&lt;/strong&gt;), explicitly targeting capabilities above standard olympiad-style math. In medical evaluation, &lt;a href=&quot;https://x.com/SophontAI/status/2054270239387627927&quot;&gt;@SophontAI&lt;/a&gt; released &lt;strong&gt;Medmarks v1.0&lt;/strong&gt;, expanding its open medical benchmark suite from &lt;strong&gt;20→30 benchmarks&lt;/strong&gt; and &lt;strong&gt;46→61 models&lt;/strong&gt;. There’s also growing sentiment that old evals are saturating: &lt;a href=&quot;https://x.com/polynoamial/status/2054255862441812099&quot;&gt;@polynoamial&lt;/a&gt; argues benchmarks with uniformly high scores should be retired in favor of lower-scoring, frontier-challenging tests.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agentic systems are starting to move benchmark frontiers in science and math&lt;/strong&gt;: Google DeepMind’s &lt;a href=&quot;https://x.com/dair_ai/status/2054224343551639958&quot;&gt;AI Co-Mathematician&lt;/a&gt; is described as an asynchronous, stateful research workbench for mathematicians, reportedly reaching &lt;strong&gt;48% on FrontierMath Tier 4&lt;/strong&gt; while supporting ideation, literature discovery, computational analysis, theorem verification, and formal outputs. In theoretical physics, &lt;a href=&quot;https://x.com/dlouapre/status/2054217281895309480&quot;&gt;physics-intern&lt;/a&gt; boosts &lt;strong&gt;Gemini 3.1 Pro from 17.7% to 31.4% on CritPt&lt;/strong&gt; via decomposition into specialized agents. On coding/program synthesis, &lt;a href=&quot;https://x.com/KLieret/status/2054215545663144217&quot;&gt;ProgramBench’s first task&lt;/a&gt; was reportedly solved by &lt;strong&gt;GPT-5.5 high/xhigh&lt;/strong&gt;, with xhigh outperforming &lt;strong&gt;Opus 4.7 xhigh&lt;/strong&gt; across metrics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Retrieval and search benchmarks are rewarding small, specialized models&lt;/strong&gt;: LightOn’s &lt;a href=&quot;https://x.com/LightOnIO/status/2054202169255973121&quot;&gt;Agent-ModernColBERT&lt;/a&gt; stacks another &lt;strong&gt;~10%&lt;/strong&gt; over Reason-ModernColBERT on BrowseComp-Plus while keeping the retriever at &lt;strong&gt;149M parameters&lt;/strong&gt;, with claims of matching or exceeding much larger model-based systems when paired with a generator. Related discussion from &lt;a href=&quot;https://x.com/xuzihuan4/status/2054220800073642161&quot;&gt;@xuzihuan4&lt;/a&gt; asks whether lexical retrieval may suffice in agentic search loops when agents can iteratively refine their own queries.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Training, Optimization, and Scaling-Law Techniques&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Optimizer work continues to compress training cost and improve small-scale experimentation&lt;/strong&gt;: Several tweets centered on fast variants of &lt;strong&gt;SOAP/Muon-style updates&lt;/strong&gt;. &lt;a href=&quot;https://x.com/torchcompiled/status/2054036715589771542&quot;&gt;@torchcompiled&lt;/a&gt; applied tangent-step + Stiefel manifold retraction to &lt;strong&gt;SOAP basis updates&lt;/strong&gt;, with &lt;a href=&quot;https://x.com/torchcompiled/status/2054088499591000255&quot;&gt;follow-up discussion&lt;/a&gt; on drift checks and QR fallback for stability. In the Modded-NanoGPT community, &lt;a href=&quot;https://x.com/kellerjordan0/status/2054255672636981423&quot;&gt;SOAP-Muon&lt;/a&gt; set a new record at &lt;strong&gt;3150 steps (-60)&lt;/strong&gt;, while an earlier &lt;a href=&quot;https://x.com/kellerjordan0/status/2054098451621978471&quot;&gt;MuLoCo-style outer Nesterov SGD wrap on NorMuonH&lt;/a&gt; also improved results, both backed by p-value reporting.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Formal methods and superoptimization are beginning to merge with ML systems work&lt;/strong&gt;: &lt;a href=&quot;https://x.com/leloykun/status/2054076097881592068&quot;&gt;@leloykun&lt;/a&gt; described a &lt;strong&gt;Lean4-to-TileLang tensor program superoptimizer&lt;/strong&gt; that can automatically discover kernels such as &lt;strong&gt;FlashAttention2&lt;/strong&gt;, &lt;strong&gt;FlashNorm&lt;/strong&gt;, and &lt;strong&gt;split-k matmul&lt;/strong&gt;, reporting roughly &lt;strong&gt;1.8× geomean speedup on A100s&lt;/strong&gt;. The same framework is positioned to jointly search over kernels, optimizers, hyperparameter transfer rules, and scaling laws.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scaling laws and training metrics are being re-examined&lt;/strong&gt;: &lt;a href=&quot;https://x.com/che_shr_cat/status/2054178651856339276&quot;&gt;@che_shr_cat&lt;/a&gt; argues the classic &lt;strong&gt;“20 tokens per parameter”&lt;/strong&gt; framing is tokenizer-dependent and that scaling should be measured in &lt;strong&gt;bytes&lt;/strong&gt;, not tokens. Separately, &lt;a href=&quot;https://x.com/JJitsev/status/2054166378823794881&quot;&gt;@JJitsev&lt;/a&gt; emphasized that prescriptive scaling laws are valuable not just for prediction, but as a systematic basis for comparing learning procedures across scales.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Training-time-only efficiency tricks are getting more interesting&lt;/strong&gt;: &lt;a href=&quot;https://x.com/omarsar0/status/2054224130103554359&quot;&gt;Lighthouse Attention&lt;/a&gt; from Nous is highlighted as a subquadratic &lt;strong&gt;training wrapper&lt;/strong&gt; around vanilla attention that can be removed near the end of training after a recovery phase, preserving standard deployment-time inference while reducing long-context pretraining cost. In a similar spirit, &lt;a href=&quot;https://x.com/PrimeIntellect/status/2054347134821154841&quot;&gt;Renderers&lt;/a&gt; from Prime Intellect addresses the token/message impedance mismatch between RL trainers and agent environments, claiming &lt;strong&gt;&gt;3× throughput&lt;/strong&gt; on popular open models.&lt;/li&gt;
&lt;/ul&gt;
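&lt;p&gt;To make the tokenizer-dependence argument concrete, here is a minimal Python sketch that converts the “20 tokens per parameter” rule of thumb into a byte budget and back. The 1B-parameter model size and the bytes-per-token figures (4.0 and 3.0) are illustrative assumptions, not numbers from the linked thread.&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical illustration: the same byte budget maps to different
# token counts depending on how many bytes a tokenizer packs per token.


def bytes_budget(params, tokens_per_param, bytes_per_token):
    """Training-data size in bytes implied by a tokens-per-parameter rule."""
    return params * tokens_per_param * bytes_per_token


def tokens_per_param_for(byte_budget, params, bytes_per_token):
    """Tokens per parameter a given tokenizer needs to cover byte_budget."""
    return byte_budget / (params * bytes_per_token)


PARAMS = 1_000_000_000        # assumed model size
BASE_BYTES_PER_TOKEN = 4.0    # assumed compression of tokenizer A
OTHER_BYTES_PER_TOKEN = 3.0   # assumed compression of tokenizer B

budget = bytes_budget(PARAMS, 20, BASE_BYTES_PER_TOKEN)
equivalent = tokens_per_param_for(budget, PARAMS, OTHER_BYTES_PER_TOKEN)

print(f"byte budget: {budget / 1e9:.0f} GB")
print(f"tokenizer B needs {equivalent:.1f} tokens per parameter")
```
&lt;/pre&gt;
&lt;p&gt;Under these assumed compression rates, the same data volume is “20 tokens per parameter” for one tokenizer and roughly 26.7 for the other, which is the ambiguity a byte-based measure avoids.&lt;/p&gt;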
&lt;p&gt;&lt;strong&gt;Inference Systems, Serving Stacks, and Runtime Infrastructure&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Blackwell racks are emerging as the reference platform for large-MoE serving&lt;/strong&gt;: Perplexity published details on serving post-trained &lt;strong&gt;Qwen3 235B&lt;/strong&gt; on &lt;strong&gt;NVIDIA GB200 NVL72&lt;/strong&gt; systems, arguing GB200 is a major inference step up over Hopper for large MoEs. Their &lt;a href=&quot;https://x.com/perplexity_ai/status/2054204425833726353&quot;&gt;benchmarks&lt;/a&gt; cite &lt;strong&gt;NVLS all-reduce latency&lt;/strong&gt; dropping from &lt;strong&gt;586.1µs on H200 to 313.3µs on GB200&lt;/strong&gt;, and &lt;strong&gt;MoE prefill combine&lt;/strong&gt; at EP=4 dropping from &lt;strong&gt;730.1µs to 438.5µs&lt;/strong&gt;, with better decode throughput at high token rates. &lt;a href=&quot;https://x.com/AravSrinivas/status/2054206802133504234&quot;&gt;@AravSrinivas&lt;/a&gt; framed this as materially changing prefill/decode disaggregation for serving large MoEs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inference orchestration is increasingly specialized, not “just Kubernetes”&lt;/strong&gt;: &lt;a href=&quot;https://x.com/charles_irl/status/2054233051140690023&quot;&gt;Modal&lt;/a&gt; argues inference needs a dedicated stack, citing work on compute management, cloud-native caching, &lt;strong&gt;CRIU&lt;/strong&gt;, and &lt;strong&gt;GPU checkpointing&lt;/strong&gt;. That positioning got an immediate real-world endorsement from Perceptron, which said &lt;a href=&quot;https://x.com/AkshatS07/status/2054275262289002664&quot;&gt;all Mk1 inference runs on Modal&lt;/a&gt; because native video, structured outputs, and hybrid reasoning create unusual cold-start and scaling requirements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OSS inference economics continue to improve fast&lt;/strong&gt;: &lt;a href=&quot;https://x.com/SemiAnalysis_/status/2054245527957508520&quot;&gt;SemiAnalysis&lt;/a&gt; reported that clustering multiple &lt;strong&gt;B200 8-GPU&lt;/strong&gt; machines over &lt;strong&gt;RoCEv2 CX-7&lt;/strong&gt; with &lt;strong&gt;prefill/decode (PD) disaggregation&lt;/strong&gt; can lift &lt;strong&gt;per-GPU token throughput by up to 7×&lt;/strong&gt;, implying comparable cost-per-token reductions. On the vector DB side, &lt;a href=&quot;https://x.com/qdrant_engine/status/2054166055417938266&quot;&gt;Qdrant 1.18&lt;/a&gt; added &lt;strong&gt;TurboQuant&lt;/strong&gt;, claiming recall close to that of scalar quantization at &lt;strong&gt;half the memory&lt;/strong&gt;, alongside memory monitoring and named-vector lifecycle operations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent runtimes are becoming version-control-like substrates&lt;/strong&gt;: A standout systems idea was Stanford’s &lt;strong&gt;Shepherd&lt;/strong&gt;, summarized by &lt;a href=&quot;https://x.com/ai_satoru_chan/status/2054126183374348296&quot;&gt;@ai_satoru_chan&lt;/a&gt;, which treats agent execution more like &lt;strong&gt;Git&lt;/strong&gt;: first-class tasks, effects, scopes, and traces; exact replay; branching; rollback; and formal guarantees in &lt;strong&gt;Lean&lt;/strong&gt;. Claimed results include live-supervision gains on CooperBench from &lt;strong&gt;28.8%→54.7%&lt;/strong&gt;, plus faster counterfactual optimization and tree-RL rollouts.&lt;/li&gt;
&lt;/ul&gt;
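&lt;p&gt;For readers who prefer ratios, the Perplexity latency figures quoted above can be turned into speedups with a few lines of Python. This is only arithmetic on the numbers as reported; the dictionary name and labels are chosen here for illustration.&lt;/p&gt;
&lt;pre&gt;
```python
# Latency figures as quoted above, in microseconds: (H200, GB200).
QUOTED_LATENCIES_US = {
    "NVLS all-reduce": (586.1, 313.3),
    "MoE prefill combine (EP=4)": (730.1, 438.5),
}


def speedup(before_us, after_us):
    """Ratio of old latency to new latency; values above 1.0 mean faster."""
    return before_us / after_us


speedups = {
    name: speedup(h200, gb200)
    for name, (h200, gb200) in QUOTED_LATENCIES_US.items()
}

for name, ratio in speedups.items():
    reduction = (1 - 1 / ratio) * 100
    print(f"{name}: {ratio:.2f}x faster, {reduction:.1f}% lower latency")
```
&lt;/pre&gt;
&lt;p&gt;Both collectives come out at roughly 1.7× to 1.9× faster on GB200, which is the scale of improvement behind the prefill/decode disaggregation comment.&lt;/p&gt;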
&lt;p&gt;&lt;strong&gt;Product and Model Releases: Multimodal, Video, Retrieval, and Embeddings&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Perceptron Mk1 was the most substantive new model release in the set&lt;/strong&gt;: &lt;a href=&quot;https://x.com/perceptroninc/status/2054216828285796630&quot;&gt;@perceptroninc&lt;/a&gt; launched &lt;strong&gt;Perceptron Mk1&lt;/strong&gt; as a model for &lt;strong&gt;frontier video and embodied reasoning&lt;/strong&gt;, with native video support at &lt;strong&gt;up to 2 FPS&lt;/strong&gt;, temporal grounding, multimodal in-context learning, and structured spatial outputs. &lt;a href=&quot;https://x.com/OpenRouter/status/2054232344148787462&quot;&gt;OpenRouter’s summary&lt;/a&gt; notes a &lt;strong&gt;32k multimodal context&lt;/strong&gt; and first-class outputs like points, boxes, polygons, and clips. The release is framed less as a generic VLM and more as a physical-world reasoning stack.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Google and Meta both pushed multimodal interaction layers rather than standalone model specs&lt;/strong&gt;: Google DeepMind’s &lt;a href=&quot;https://x.com/GoogleDeepMind/status/2054246119635300451&quot;&gt;AI-enabled mouse pointer demos&lt;/a&gt; reimagine the cursor as a contextual pointing interface tied to Gemini, allowing users to point at on-screen content and speak shorthand instructions. In parallel, Meta announced &lt;a href=&quot;https://x.com/MetaNewsroom/status/2054205287515484397&quot;&gt;Meta AI voice conversations powered by Muse Spark&lt;/a&gt;, adding interruption, language switching, image generation, and live camera-grounded interaction.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Embedding and retrieval model updates were notable&lt;/strong&gt;: Jina released &lt;a href=&quot;https://x.com/JinaAI_/status/2054226262047301933&quot;&gt;jina-embeddings-v5-omni&lt;/a&gt;, a universal embedding model for &lt;strong&gt;text, images, audio, and video&lt;/strong&gt;, in &lt;strong&gt;1.57B&lt;/strong&gt; and &lt;strong&gt;0.95B&lt;/strong&gt; variants, both with Matryoshka truncation and backward compatibility with existing v5-text indexes. Meta quietly released &lt;a href=&quot;https://x.com/mervenoyann/status/2054187884417102319&quot;&gt;Sapiens2&lt;/a&gt;, a family of human-centric high-resolution ViTs spanning &lt;strong&gt;0.1B→5B&lt;/strong&gt; params for pose estimation, segmentation, normals, and pointmaps.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Diffusion and image tooling kept moving&lt;/strong&gt;: Hugging Face’s &lt;a href=&quot;https://x.com/RisingSayak/status/2054110949469196748&quot;&gt;Diffusers 0.38.0&lt;/a&gt; added new pipelines including &lt;strong&gt;Ace-Step 1.5&lt;/strong&gt;, &lt;strong&gt;LongCat-AudioDiT&lt;/strong&gt;, and &lt;strong&gt;Ernie-Image&lt;/strong&gt;, plus support for &lt;strong&gt;Flash Attention 4&lt;/strong&gt;, &lt;strong&gt;FlashPack loading&lt;/strong&gt;, and &lt;strong&gt;Ring Anything&lt;/strong&gt; for context parallelism. Other research releases included &lt;a href=&quot;https://x.com/iScienceLuvr/status/2054118255778763184&quot;&gt;ELF: Embedded Language Flows&lt;/a&gt;, a continuous-space text diffusion model, and Tencent’s &lt;a href=&quot;https://x.com/_akhaliq/status/2054120807425511826&quot;&gt;Pixal3D&lt;/a&gt; for pixel-aligned 3D generation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Agents, Tooling, and Developer Workflow&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Agent products are shifting from demos to operational platforms&lt;/strong&gt;: OpenAI teased &lt;a href=&quot;https://x.com/OpenAIDevs/status/2054252221941121035&quot;&gt;Symphony&lt;/a&gt; as a system where &lt;strong&gt;every open task gets a running Codex agent&lt;/strong&gt;, and separately highlighted &lt;a href=&quot;https://x.com/OpenAIDevs/status/2054298427245441141&quot;&gt;computer use for Codex&lt;/a&gt; to work across apps without full takeover. LangChain re-open-sourced &lt;a href=&quot;https://x.com/BraceSproul/status/2054231134163321287&quot;&gt;its revamped Chat LangChain app&lt;/a&gt;, describing it as a production Q&amp;#x26;A agent handling nearly &lt;strong&gt;2T tokens/week&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Long-running-agent state management is becoming a first-class systems problem&lt;/strong&gt;: LangGraph’s new &lt;a href=&quot;https://x.com/sydneyrunkle/status/2054278551244099706&quot;&gt;DeltaChannel snapshots&lt;/a&gt; aim to replace full-state checkpointing for scalable durable execution; LangChain says the same mechanism now powers message histories and file storage in &lt;strong&gt;deepagents v0.6&lt;/strong&gt;. The broader pattern also shows up in Google’s &lt;a href=&quot;https://x.com/_philschmid/status/2054225343251206528&quot;&gt;Gemini Interactions API guide&lt;/a&gt;, where encrypted &lt;code&gt;thought&lt;/code&gt; signatures preserve reasoning context across turns in both stateful and stateless modes without forcing developers to manage signature injection manually.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Synthetic data and RL environment generation are being operationalized&lt;/strong&gt;: &lt;a href=&quot;https://x.com/Vtrivedy10/status/2054054238226170361&quot;&gt;@Vtrivedy10&lt;/a&gt; offered a useful practitioner perspective: targeted synthetic data extraction from model weights is hard at scale, especially for underrepresented distributions like long sequences, and effective pipelines need programmatic tests, verifiers, judges, and agentic long-horizon framing. On the infrastructure side, &lt;a href=&quot;https://x.com/Shahules786/status/2054241505506648161&quot;&gt;Tau2-Infinity&lt;/a&gt; formalizes autonomous mining of hard tool-use tasks for RL post-training via DAG walks or world-generation from failure hypotheses.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Top tweets (by engagement, filtered for technical relevance)&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Gemini as an OS-level intelligence layer&lt;/strong&gt;: Google’s &lt;a href=&quot;https://x.com/sundarpichai/status/2054255858700415005&quot;&gt;Gemini Intelligence&lt;/a&gt;, &lt;a href=&quot;https://x.com/Google/status/2054270454467121187&quot;&gt;Googlebook&lt;/a&gt;, and &lt;a href=&quot;https://x.com/GoogleDeepMind/status/2054246119635300451&quot;&gt;AI pointer demos&lt;/a&gt; collectively point to agentic UX moving from chat windows into the operating system.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Isomorphic Labs funding&lt;/strong&gt;: &lt;a href=&quot;https://x.com/demishassabis/status/2054197462101889277&quot;&gt;@demishassabis&lt;/a&gt; announced &lt;strong&gt;$2.1B&lt;/strong&gt; in new funding for AI-driven drug discovery, one of the largest capital commitments in this dataset tied directly to an applied AI platform.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Speech-to-speech benchmarking&lt;/strong&gt;: Artificial Analysis’ &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2054234919887573292&quot;&gt;τ-Voice benchmark&lt;/a&gt; found even the best S2S models solve only about &lt;strong&gt;half of realistic customer service scenarios&lt;/strong&gt;, with &lt;strong&gt;Grok Voice Think Fast 1.0&lt;/strong&gt; leading at &lt;strong&gt;52.1%&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude Opus 4.7 fast mode&lt;/strong&gt;: Anthropic’s &lt;a href=&quot;https://x.com/ClaudeDevs/status/2054266327771275435&quot;&gt;fast mode release&lt;/a&gt; reached APIs and Claude Code, with Cursor noting &lt;a href=&quot;https://x.com/cursor_ai/status/2054274305345618163&quot;&gt;2.5× speed at 6× cost&lt;/a&gt;, a concrete new point on the latency/price frontier.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Security, Supply Chain, and Safer Coding&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The most urgent operational story was the Mini Shai-Hulud supply-chain attack&lt;/strong&gt;: &lt;a href=&quot;https://x.com/IntCyberDigest/status/2054166749998661659&quot;&gt;@IntCyberDigest&lt;/a&gt; reported the campaign had expanded beyond TanStack to hit &lt;strong&gt;OpenSearch, Mistral AI, Guardrails AI, UiPath, and others&lt;/strong&gt; across npm and PyPI, specifically targeting &lt;strong&gt;AI developer tooling&lt;/strong&gt;. The noteworthy technical detail is persistence: it allegedly hooks into &lt;strong&gt;Claude Code&lt;/strong&gt; (&lt;code&gt;.claude/settings.json&lt;/code&gt;) and &lt;strong&gt;VS Code&lt;/strong&gt; (&lt;code&gt;.vscode/tasks.json&lt;/code&gt;) so the compromise can re-execute on future tool events even after package removal. &lt;a href=&quot;https://x.com/guardrails_ai/status/2054341322304299086&quot;&gt;Guardrails AI&lt;/a&gt; later confirmed its &lt;strong&gt;0.10.1&lt;/strong&gt; package was compromised and quarantined within about &lt;strong&gt;2 hours&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actionable mitigations surfaced quickly&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ramimacisabird/status/2054178771180093858&quot;&gt;@ramimacisabird&lt;/a&gt; noted that beyond &lt;code&gt;minimumReleaseAge&lt;/code&gt;, teams should enable &lt;strong&gt;&lt;code&gt;blockExoticSubdeps&lt;/code&gt;&lt;/strong&gt; to prevent remote GitHub references from slipping into dependency graphs. &lt;a href=&quot;https://x.com/elithrar/status/2054162732195197283&quot;&gt;@elithrar&lt;/a&gt; reiterated that GitHub’s &lt;strong&gt;&lt;code&gt;pull_request_target&lt;/code&gt;&lt;/strong&gt; remains one of the sharpest CI/CD footguns for fork-based PR automation. And at the workstation level, &lt;a href=&quot;https://x.com/andersonbcdefg/status/2054212574162653535&quot;&gt;@andersonbcdefg&lt;/a&gt; recommended moving secrets out of ubiquitous local &lt;code&gt;.env&lt;/code&gt; files into a proper secrets manager.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Safer codegen is becoming its own research track&lt;/strong&gt;: Stanford-affiliated work on &lt;a href=&quot;https://x.com/houjun_liu/status/2054233718269595869&quot;&gt;SecureForge&lt;/a&gt; targets vulnerability discovery and prevention in LLM-generated code via prompt optimization, while &lt;a href=&quot;https://x.com/FSFG/status/2054196048621367422&quot;&gt;the corresponding paper listing&lt;/a&gt; frames it as a bridge between codegen and security evaluation. The broader point: coding agents are now strong enough that supply-chain hardening and secure-generation evaluation need to be treated as core infra, not side concerns.&lt;/li&gt;
&lt;/ul&gt;
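&lt;p&gt;Because the reported persistence lives in &lt;code&gt;.claude/settings.json&lt;/code&gt; and &lt;code&gt;.vscode/tasks.json&lt;/code&gt;, a quick local audit can at least list which of those files define automatic commands. The Python sketch below is a hedged starting point, not a detector for this campaign: it assumes the relevant entries are a &lt;code&gt;hooks&lt;/code&gt; key in the Claude Code settings file and VS Code tasks whose &lt;code&gt;runOptions.runOn&lt;/code&gt; is &lt;code&gt;folderOpen&lt;/code&gt;, and the function names are hypothetical. Legitimate projects use both features, so every hit needs manual review.&lt;/p&gt;
&lt;pre&gt;
```python
import json
from pathlib import Path

# Files named in the report as alleged persistence points.
SUSPECT_FILES = (".claude/settings.json", ".vscode/tasks.json")


def load_json(path):
    """Parse a JSON file; return None if it is unreadable or not strict JSON."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


def review_reasons(path):
    """Return human-readable reasons why this file deserves a manual look."""
    data = load_json(path)
    if data is None:
        # tasks.json may contain comments, which strict JSON parsing rejects.
        return ["could not parse as strict JSON, inspect by hand"]
    if not isinstance(data, dict):
        return []
    reasons = []
    if path.name == "settings.json" and data.get("hooks"):
        reasons.append("defines Claude Code hooks")
    if path.name == "tasks.json":
        for task in data.get("tasks") or []:
            if not isinstance(task, dict):
                continue
            run_options = task.get("runOptions") or {}
            if isinstance(run_options, dict) and run_options.get("runOn") == "folderOpen":
                reasons.append(f"task {task.get('label')!r} runs on folder open")
    return reasons


def scan(root):
    """Map each suspect file under root to its list of review reasons."""
    findings = {}
    for relative in SUSPECT_FILES:
        parent, name = relative.split("/")
        for path in Path(root).rglob(name):
            if path.parent.name == parent:
                reasons = review_reasons(path)
                if reasons:
                    findings[str(path)] = reasons
    return findings


if __name__ == "__main__":
    for file_path, reasons in scan(".").items():
        print(file_path, reasons)
```
&lt;/pre&gt;
&lt;p&gt;Running it from a workspace root prints each flagged file with the reason it was flagged; an empty result only means none of the assumed patterns were present, not that the machine is clean.&lt;/p&gt;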
&lt;hr&gt;
&lt;h1&gt;AI Reddit Recap&lt;/h1&gt;
&lt;h2&gt;/r/LocalLlama + /r/localLLM Recap&lt;/h2&gt;
&lt;h3&gt;1. Qwen 3.6 MTP and Long-Context Local Evals&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1ta4rvs/mtp_on_unsloth/&quot;&gt;MTP on Unsloth&lt;/a&gt;&lt;/strong&gt; (Activity: 727): &lt;strong&gt;The &lt;a href=&quot;https://i.redd.it/7qopol51pi0h1.png&quot;&gt;image&lt;/a&gt; is a Hugging Face activity screenshot showing &lt;strong&gt;Unsloth AI&lt;/strong&gt; publishing/updating MTP-preserved GGUF builds: &lt;a href=&quot;https://huggingface.co/unsloth/Qwen3.6-27B-GGUF-MTP&quot;&gt;&lt;code&gt;unsloth/Qwen3.6-27B-GGUF-MTP&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF-MTP&quot;&gt;&lt;code&gt;unsloth/Qwen3.6-35B-A3B-GGUF-MTP&lt;/code&gt;&lt;/a&gt;. The technical significance is that these GGUFs retain the &lt;strong&gt;MTP (multi-token prediction) auxiliary layer&lt;/strong&gt;, but users reportedly still need to check out and build a specific &lt;strong&gt;llama.cpp MTP PR&lt;/strong&gt; rather than relying on default llama.cpp support. One commenter hit a runtime/model-load assertion, &lt;code&gt;GGML_ASSERT(hparams.nextn_predict_layers &gt; 0 &amp;#x26;&amp;#x26; &quot;QWEN35_MTP requires nextn_predict_layers &gt; 0&quot;)&lt;/code&gt;, suggesting tooling or metadata support is still fragile for these MTP GGUFs.&lt;/strong&gt; Commenters are mainly waiting on upstream inference support, with one joking about constantly refreshing the &lt;code&gt;llama.cpp&lt;/code&gt; and &lt;code&gt;vLLM&lt;/code&gt; GitHub repos. There is also uncertainty over whether MTP is supported “out of the box” in llama.cpp; the post indicates it is not yet.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A user compiling/running the new &lt;code&gt;27B&lt;/code&gt; GGUF model reports a hard assertion failure in &lt;code&gt;qwen35_mtp.cpp&lt;/code&gt;: &lt;code&gt;GGML_ASSERT(hparams.nextn_predict_layers &gt; 0 &amp;#x26;&amp;#x26; &quot;QWEN35_MTP requires nextn_predict_layers &gt; 0&quot;) failed&lt;/code&gt;. This suggests the GGUF/model metadata being loaded is missing or not exposing &lt;code&gt;nextn_predict_layers&lt;/code&gt;, which is required for &lt;strong&gt;Qwen3.5 MTP&lt;/strong&gt; execution in the current implementation.&lt;/li&gt;
&lt;li&gt;Several commenters are tracking whether &lt;strong&gt;llama.cpp&lt;/strong&gt; and &lt;strong&gt;vLLM&lt;/strong&gt; have landed native &lt;strong&gt;MTP&lt;/strong&gt; support, with one explicitly asking whether llama.cpp now supports MTP “out of the box.” The thread implies support is still in flux across backends and that users are watching upstream repositories for compatibility with GGUF MTP models.&lt;/li&gt;
&lt;li&gt;One technical takeaway is that &lt;strong&gt;MTP support in GGUF&lt;/strong&gt; is viewed as important for local inference, especially for Qwen-style variants such as the mentioned &lt;code&gt;35B A3B&lt;/code&gt; model. A commenter highlights the &lt;code&gt;35B A3B&lt;/code&gt; variant as interesting specifically because of expected context-length improvements.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t9whrt/the_qwen_36_35b_a3b_hype_is_real/&quot;&gt;The Qwen 3.6 35B A3B hype is real!!!&lt;/a&gt;&lt;/strong&gt; (Activity: 713): &lt;strong&gt;A user benchmarked &lt;strong&gt;Qwen 3.6 35B A3B&lt;/strong&gt;, &lt;strong&gt;Qwen 3.6 27B&lt;/strong&gt;, &lt;strong&gt;Gemma 4 26B A4B&lt;/strong&gt;, and &lt;strong&gt;Nemotron 3 Nano&lt;/strong&gt; on a niche paper-to-code comprehension task, feeding each model an academic paper plus its accompanying research code. The workload leans on the models’ long-context mechanisms, such as gated delta nets, hybrid Mamba2, and sliding-window attention. In their &lt;a href=&quot;https://github.com/nathanlgabriel/paper_code_mapping_assessment/blob/main/README.md&quot;&gt;detailed findings&lt;/a&gt;, all four small/local open-weight models substantially outperformed prior small-model baselines such as &lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1ry93gz/devstral_small_2_24b_severely_underrated/&quot;&gt;Devstral Small 2&lt;/a&gt;, with &lt;strong&gt;Qwen 3.6 35B A3B&lt;/strong&gt; judged strongest; Devstral Small 2 could not fit the long-context workload in &lt;code&gt;32GB&lt;/code&gt; VRAM/RAM.&lt;/strong&gt; Commenters noted practical tradeoffs: &lt;strong&gt;Qwen 35B&lt;/strong&gt; is preferred for long-context work and refactoring but can be verbose and slow in thinking mode, while &lt;strong&gt;Gemma 26B&lt;/strong&gt; is faster for code fixes and chat; at &lt;code&gt;q4&lt;/code&gt;, one user reports ~&lt;code&gt;20GB&lt;/code&gt; for Qwen 35B and ~&lt;code&gt;15GB&lt;/code&gt; for Gemma 26B, allowing both to stay loaded. Another commenter criticized the evaluation for not documenting inference settings, which limits reproducibility.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several users compared local workflows using &lt;strong&gt;Gemma 26B&lt;/strong&gt; and &lt;strong&gt;Qwen 35B&lt;/strong&gt;, noting that both can be kept resident simultaneously at &lt;code&gt;q4&lt;/code&gt; quantization because Qwen 35B is about &lt;code&gt;20 GB&lt;/code&gt; and Gemma 26B about &lt;code&gt;15 GB&lt;/code&gt;. One commenter uses Gemma 26B thinking mode for quick code fixes/chat and Qwen 35B thinking mode for longer-context refactoring, but reports Qwen 35B has high latency due to excessive reasoning verbosity before final output.&lt;/li&gt;
&lt;li&gt;A coding-focused report claimed &lt;strong&gt;Qwen 27B&lt;/strong&gt; can handle large projects (&lt;code&gt;100k+&lt;/code&gt; LOC) effectively when bootstrapped by a stronger model/coding agent for initial project setup, then switched to Qwen for continued work. The user found little practical difference between Qwen 27B and &lt;strong&gt;DeepSeek V4&lt;/strong&gt; for their use case, though Qwen occasionally entered loops requiring manual interruption and continuation prompting.&lt;/li&gt;
&lt;li&gt;One commenter emphasized that &lt;strong&gt;Qwen 27B/35B performance is sensitive to inference configuration&lt;/strong&gt;, specifically temperature/sampling parameters and avoiding overly aggressive quantization of either the model weights or KV cache. Another asked for the missing run settings, implying the original claims are hard to evaluate without details like quantization level, sampler settings, context length, backend, or hardware.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
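&lt;p&gt;The &lt;code&gt;q4&lt;/code&gt; footprints quoted in the Qwen 3.6 35B A3B thread (about &lt;code&gt;20 GB&lt;/code&gt; and &lt;code&gt;15 GB&lt;/code&gt;) are consistent with a simple weights-only estimate. The Python sketch below assumes roughly 4.5 bits per weight, which corresponds to a 4-bit block format storing one 16-bit scale per 32 weights; actual GGUF files mix quant types, so treat the result as a ballpark, not a measurement.&lt;/p&gt;
&lt;pre&gt;
```python
# Rough weight-only size estimate: parameters x bits per weight / 8.
# 4.5 bits per weight is an assumption (4-bit values plus a 16-bit scale
# per 32-weight block); real q4 files vary with the exact quant mix.


def weight_gigabytes(params_billions, bits_per_weight=4.5):
    """Approximate weight storage in GB (10**9 bytes), ignoring KV cache."""
    return params_billions * bits_per_weight / 8


qwen_gb = weight_gigabytes(35)
gemma_gb = weight_gigabytes(26)

print(f"Qwen 35B at ~q4:  {qwen_gb:.1f} GB")
print(f"Gemma 26B at ~q4: {gemma_gb:.1f} GB")
print(f"Both resident:    {qwen_gb + gemma_gb:.1f} GB before KV cache")
```
&lt;/pre&gt;
&lt;p&gt;The estimate lands near 19.7 GB and 14.6 GB, so keeping both models loaded needs roughly 34 GB for weights alone, with KV cache and runtime overhead on top.&lt;/p&gt;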
&lt;h3&gt;2. Memory-Tiered and Power-Efficient Local Inference&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1taeg8h/computer_build_using_intel_optane_persistent/&quot;&gt;Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec&lt;/a&gt;&lt;/strong&gt; (Activity: 964): &lt;strong&gt;The image shows the internals of a high-memory Xeon workstation/server build using &lt;strong&gt;Intel Optane DC Persistent Memory&lt;/strong&gt; DIMMs, matching the post’s claim of running &lt;strong&gt;Kimi K2.5&lt;/strong&gt;, a ~&lt;code&gt;1T&lt;/code&gt; parameter MoE model, locally at about &lt;code&gt;4 tokens/s&lt;/code&gt; via &lt;strong&gt;llama.cpp&lt;/strong&gt; hybrid GPU/CPU inference. The key technical point is the use of &lt;code&gt;768GB&lt;/code&gt; Optane PMem in &lt;strong&gt;Memory Mode&lt;/strong&gt;, where Optane appears as system RAM and &lt;code&gt;192GB&lt;/code&gt; DDR4 ECC DRAM acts as cache, allowing the model’s sparse expert weights to reside in PMem while attention/dense/shared expert/routing tensors fit on an &lt;strong&gt;RTX 3060 12GB&lt;/strong&gt; using &lt;code&gt;override-tensor&lt;/code&gt; or &lt;code&gt;ngl auto&lt;/code&gt;/&lt;code&gt;cmoe&lt;/code&gt;. &lt;a href=&quot;https://i.redd.it/na7zo7lmck0h1.jpeg&quot;&gt;Image&lt;/a&gt;&lt;/strong&gt; Commenters noted that a higher-core-count Cascade Lake Xeon, such as an ES 8260/QQ89, could improve throughput, and debated whether Optane &lt;strong&gt;Storage Mode&lt;/strong&gt; plus &lt;code&gt;mmap&lt;/code&gt; might outperform Memory Mode. Others found the build impressive but questioned whether &lt;code&gt;4 tokens/s&lt;/code&gt; is practically tolerable for interactive use.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A detailed hardware note suggests performance may improve with a higher-core-count Cascade Lake Xeon, e.g. &lt;strong&gt;QQ89 ES / Xeon Gold 8260-class &lt;code&gt;24-core&lt;/code&gt;&lt;/strong&gt;, versus the current &lt;strong&gt;Xeon Gold 6246 &lt;code&gt;12-core&lt;/code&gt;&lt;/strong&gt;. The commenter also proposes benchmarking Optane PMem in &lt;strong&gt;storage mode + &lt;code&gt;mmap&lt;/code&gt;&lt;/strong&gt; versus &lt;strong&gt;memory mode&lt;/strong&gt;, noting that memory mode uses DRAM as a transparent cache and requires pages to be swapped back into DRAM before CPU execution, so it is not equivalent to normal RAM latency.&lt;/li&gt;
&lt;li&gt;One commenter provides a concise Optane PMem platform compatibility breakdown: &lt;strong&gt;LGA3647 Skylake/Cascade Lake uses 1st-gen Optane &lt;code&gt;NMA&lt;/code&gt; at &lt;code&gt;2666 MT/s&lt;/code&gt;&lt;/strong&gt;, while &lt;strong&gt;LGA4189 uses 2nd-gen &lt;code&gt;NMB&lt;/code&gt;&lt;/strong&gt;, running at &lt;code&gt;2666&lt;/code&gt; on Cooper Lake and &lt;code&gt;3200&lt;/code&gt; on Ice Lake. They also note that mixing Optane with DRAM on Cascade Lake can downclock affected channels to &lt;code&gt;2666&lt;/code&gt;, and that many Xeons from this era have a &lt;strong&gt;&lt;code&gt;1 TB&lt;/code&gt; total memory limit across DRAM + Optane&lt;/strong&gt;, unless using high-memory SKUs or later platforms.&lt;/li&gt;
&lt;li&gt;A technical caveat is raised that while &lt;code&gt;~4 tokens/sec&lt;/code&gt; generation on a trillion-parameter model may be tolerable for some uses, &lt;strong&gt;prompt processing/prefill speed is likely to be much worse&lt;/strong&gt; on this kind of memory hierarchy. Another comment estimates the full used-market build cost at roughly &lt;strong&gt;&lt;code&gt;$2060–$2500&lt;/code&gt;&lt;/strong&gt;, including a &lt;strong&gt;Xeon Gold 6246&lt;/strong&gt;, &lt;strong&gt;TYAN S5630GMRE-CGN&lt;/strong&gt;, &lt;strong&gt;RTX 3060 12GB&lt;/strong&gt;, &lt;code&gt;192 GB&lt;/code&gt; DDR4 ECC RDIMM, and &lt;code&gt;768 GB&lt;/code&gt; Intel Optane DCPMM.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1tayu5t/stop_wasting_electricity/&quot;&gt;Stop wasting electricity&lt;/a&gt;&lt;/strong&gt; (Activity: 905): &lt;strong&gt;A user benchmarked &lt;a href=&quot;https://github.com/ggml-org/llama.cpp&quot;&gt;&lt;code&gt;llama.cpp&lt;/code&gt;&lt;/a&gt; &lt;code&gt;llama-server&lt;/code&gt; on an &lt;strong&gt;RTX 4090&lt;/strong&gt; with &lt;code&gt;Qwen3.6-27B-UD-Q4_K_XL.gguf&lt;/code&gt;, full GPU offload (&lt;code&gt;-ngl all&lt;/code&gt;), FlashAttention enabled, &lt;code&gt;q4_0&lt;/code&gt; K/V cache quantization, &lt;code&gt;32&lt;/code&gt; threads, and a &lt;code&gt;262144&lt;/code&gt; context, varying the GPU power cap via &lt;code&gt;sudo nvidia-smi -pl N&lt;/code&gt;. They report the GPU was consistently power-limited and that reducing the power limit can substantially lower power/heat/noise with little to no &lt;strong&gt;decode / token-generation (&lt;code&gt;tg&lt;/code&gt;)&lt;/strong&gt; throughput loss; a commenter notes &lt;strong&gt;prefill (&lt;code&gt;pp&lt;/code&gt;)&lt;/strong&gt; is more sensitive, with roughly &lt;code&gt;15–20%&lt;/code&gt; performance loss when dropping from &lt;code&gt;450W&lt;/code&gt; to &lt;code&gt;270W&lt;/code&gt;, model-dependent.&lt;/strong&gt; Commenters were mainly interested in separating &lt;strong&gt;decode vs prefill&lt;/strong&gt; behavior, since decode appears power-insensitive while prefill degrades more noticeably. One RTX 5090 user said they already cap power for hardware-safety concerns and may reduce it further based on these results.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Users focused on the performance impact of GPU power limiting: &lt;strong&gt;decode/token generation (&lt;code&gt;tg&lt;/code&gt;) reportedly loses little to no throughput under a lower power cap&lt;/strong&gt;, while &lt;strong&gt;prefill (&lt;code&gt;pp&lt;/code&gt;) takes a larger hit&lt;/strong&gt;. One commenter quantified the tradeoff as only about &lt;strong&gt;&lt;code&gt;15–20%&lt;/code&gt; prefill performance loss&lt;/strong&gt; when reducing power from &lt;strong&gt;&lt;code&gt;450W&lt;/code&gt; to &lt;code&gt;270W&lt;/code&gt;&lt;/strong&gt;, depending on the model, suggesting substantial efficiency gains from aggressive power caps.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3. Ultra-Small On-Device Transformer Experiments&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1tbi2n3/i_got_a_real_transformer_language_model_running/&quot;&gt;I got a real transformer language model running locally on a stock Game Boy Color!&lt;/a&gt;&lt;/strong&gt; (Activity: 368): &lt;strong&gt;The image (&lt;a href=&quot;https://i.redd.it/1hl9id7ghs0h1.jpeg&quot;&gt;jpeg&lt;/a&gt;) shows a stock &lt;strong&gt;Game Boy Color&lt;/strong&gt; running a local TinyStories transformer demo, with the screen displaying &lt;code&gt;TINYSTORIES Q8 GBC&lt;/code&gt; and &lt;code&gt;Prompt tokenized&lt;/code&gt;. Per the post, this is &lt;strong&gt;Andrej Karpathy’s TinyStories-260K&lt;/strong&gt; converted to &lt;code&gt;INT8&lt;/code&gt;/fixed-point math in a &lt;strong&gt;GBDK-2020 MBC5 ROM&lt;/strong&gt;, with weights in bank-switched cartridge ROM and the KV cache stored in cartridge SRAM due to the GBC’s tiny work RAM. The author notes it is &lt;em&gt;extremely slow&lt;/em&gt; and produces mostly gibberish because of aggressive quantization/approximations, but the core local transformer prefill + autoregressive generation loop works on-device with no PC, phone, Wi-Fi, link cable, or cloud inference: &lt;a href=&quot;https://github.com/maddiedreese/gbc-transformer&quot;&gt;github.com/maddiedreese/gbc-transformer&lt;/a&gt;.&lt;/strong&gt; Comments are mostly enthusiastic praise; one commenter said it made them want to run a model on an &lt;strong&gt;N64&lt;/strong&gt;, and another linked a related/joke Game Boy language-model project, &lt;a href=&quot;https://code.heni.lol/heni/gbalm&quot;&gt;gbalm&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A commenter linked a prior Game Boy language-model project, &lt;strong&gt;gbalm&lt;/strong&gt; (&lt;a href=&quot;https://code.heni.lol/heni/gbalm&quot;&gt;code&lt;/a&gt;), indicating there has been earlier experimentation with extremely constrained on-device LM inference on Nintendo handheld hardware. This is relevant as a comparison point for implementation approaches and feasibility on non-GPU, retro 8-bit-class systems.&lt;/li&gt;
&lt;li&gt;One technical question centered on why CUDA/ROCm-style GPU stacks are not required here: the commenter notes that typical LLM inference is associated with mature GPU compilers, yet this demo runs on hardware comparable to &lt;em&gt;“a potato.”&lt;/em&gt; The implicit point is that sufficiently tiny transformer models can be executed with hand-written or highly simplified CPU-style inference loops, though at very low throughput, and that portability to unsupported accelerators such as future Chinese GPUs would depend more on having a basic compute backend than full CUDA compatibility.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1tb9b0r/needle_we_distilled_gemini_tool_calling_into_a/&quot;&gt;Needle: We Distilled Gemini Tool Calling Into a 26M Model&lt;/a&gt;&lt;/strong&gt; (Activity: 271): &lt;strong&gt;&lt;strong&gt;Cactus Compute&lt;/strong&gt; released &lt;strong&gt;Needle&lt;/strong&gt;, an MIT-licensed &lt;code&gt;26M&lt;/code&gt; parameter single-shot tool-calling model distilled from &lt;strong&gt;Gemini&lt;/strong&gt;-synthesized data, claiming &lt;code&gt;6000 tok/s&lt;/code&gt; prefill and &lt;code&gt;1200 tok/s&lt;/code&gt; decode on consumer devices; weights are on &lt;a href=&quot;https://huggingface.co/Cactus-Compute/needle&quot;&gt;Hugging Face&lt;/a&gt; and code/docs are on &lt;a href=&quot;https://github.com/cactus-compute/needle&quot;&gt;GitHub&lt;/a&gt;. Architecturally it uses “Simple Attention Networks” — attention plus gating with &lt;strong&gt;no MLP/FFN layers&lt;/strong&gt; — arguing that function calling is mostly retrieval/assembly over provided tool schemas rather than memorized reasoning; training used &lt;code&gt;200B&lt;/code&gt; pretraining tokens on &lt;code&gt;16 TPU v6e&lt;/code&gt; for &lt;code&gt;27h&lt;/code&gt; plus &lt;code&gt;2B&lt;/code&gt; synthesized function-calling tokens in &lt;code&gt;45m&lt;/code&gt; (&lt;a href=&quot;https://github.com/cactus-compute/needle/blob/main/docs/simple_attention_networks.md&quot;&gt;architecture writeup&lt;/a&gt;). 
The authors claim it beats &lt;strong&gt;FunctionGemma-270M&lt;/strong&gt;, &lt;strong&gt;Qwen-0.6B&lt;/strong&gt;, &lt;strong&gt;Granite-350M&lt;/strong&gt;, and &lt;strong&gt;LFM2.5-350M&lt;/strong&gt; on single-shot function calling, while acknowledging those larger models have broader conversational capacity.&lt;/strong&gt; Commenters framed the model as potentially useful as a lightweight router that dispatches queries/tools or escalates to a larger LLM, with one asking whether the same architecture could support high-quality summarization. A technical concern was raised about uploaded &lt;code&gt;pickle&lt;/code&gt; files due to Python-specific dependency and deserialization security risks.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A commenter framed the &lt;code&gt;26M&lt;/code&gt; distilled tool-calling model as a lightweight &lt;strong&gt;router/gating model&lt;/strong&gt;: it could decide whether a query should be sent to a larger LLM and with which parameters, effectively reducing expensive model calls to cases where they are needed. They also speculated whether the same architecture could generalize to constrained summarization workflows, though no benchmark evidence was provided in the thread.&lt;/li&gt;
&lt;li&gt;One technical thread focused on the authors’ claimed &lt;strong&gt;“no FFN”&lt;/strong&gt; result: for tasks where the relevant knowledge is supplied externally, such as &lt;strong&gt;RAG and tool use&lt;/strong&gt;, the model may not need feed-forward layers to store factual knowledge if relevant facts are already present in context. A commenter extrapolated this into a pipeline where a small post-trained model routes requests to RAG and then uses retrieved context to generate a natural-language answer.&lt;/li&gt;
&lt;li&gt;Several implementation/security concerns were raised: one commenter noted that publishing &lt;strong&gt;pickle files&lt;/strong&gt; is increasingly avoided because of Python-specific dependency issues and arbitrary-code-execution risk during deserialization. Another pointed out that &lt;strong&gt;Gemini&lt;/strong&gt; has had visible tool-calling quirks, including system-prompt-like reasoning about avoiding &lt;code&gt;cat&lt;/code&gt; and preferring tools such as &lt;code&gt;grep_search&lt;/code&gt;, raising the possibility that a distilled dataset could inherit provider-specific tool-use biases if not cleaned carefully.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Less Technical AI Subreddit Recap&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;1. Claude Coding Workflows and Tooling&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeCode/comments/1tb7edc/inherited_a_3month_old_repo_from_a_vibe_engineer/&quot;&gt;Inherited a 3-month old repo from a Vibe Engineer. Wrote the most satisfying PR in my career&lt;/a&gt;&lt;/strong&gt; (Activity: 3672): &lt;strong&gt;The image is a GitHub-style diffstat showing a cleanup PR with &lt;strong&gt;&lt;code&gt;+10,197&lt;/code&gt; additions&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;−3,618,778&lt;/code&gt; deletions&lt;/strong&gt; (&lt;a href=&quot;https://i.redd.it/izgrhw5tgq0h1.png&quot;&gt;image&lt;/a&gt;), contextualizing the post’s claim of rewriting a 3-month-old “vibe-coded” backend repo. The author says the inherited repo had &lt;strong&gt;&lt;code&gt;309k&lt;/code&gt; lines of code&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;240k&lt;/code&gt; lines of docs&lt;/strong&gt;, &lt;strong&gt;1M+ lines of markdown logs&lt;/strong&gt;, &lt;code&gt;220&lt;/code&gt; handlers with only ~&lt;code&gt;20&lt;/code&gt; used, and &lt;code&gt;40+&lt;/code&gt; secrets with only &lt;code&gt;2&lt;/code&gt; needed; they rewrote it in a week using Claude, preserving functionality while adding cleaner architecture and integration tests.&lt;/strong&gt; Commenters framed this as an emerging maintenance problem around AI/agent-generated code, with one predicting that &lt;em&gt;“fixing vibe-coded mess”&lt;/em&gt; could become a lucrative career path. The thread also questions whether elaborate agent knowledge bases and auto-generated documentation meaningfully improve development or just create the appearance of productivity.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One commenter predicts that remediation of AI/“vibe-coded” repositories may become a valuable specialization, implying that short-term productivity from agentic coding can create downstream maintainability debt. They also argue that much of the enthusiasm around “vibecoding” comes from people who are &lt;em&gt;not software professionals&lt;/em&gt;, suggesting a gap between demo-level output and production-quality engineering standards.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeCode/comments/1takxpl/clawdmeter_a_small_esp32_usage_limit_monitor/&quot;&gt;Clawdmeter - a small ESP32 usage limit monitor (source code in description)&lt;/a&gt;&lt;/strong&gt; (Activity: 1677): &lt;strong&gt;The image shows &lt;strong&gt;Clawdmeter&lt;/strong&gt;, a small ESP32-based desk monitor displaying Claude/Anthropic usage limits with reset timers and progress bars, matching the post’s description of a &lt;code&gt;$32&lt;/code&gt; Waveshare ESP32 dev board with a &lt;code&gt;480×480&lt;/code&gt; AMOLED display. The project is open-sourced on &lt;a href=&quot;https://github.com/HermannBjorgvin/Clawdmeter&quot;&gt;GitHub&lt;/a&gt;, and the pictured device appears to visualize current and weekly quota state in a compact physical dashboard: &lt;a href=&quot;https://i.redd.it/aqoo7y4nkl0h1.jpeg&quot;&gt;image&lt;/a&gt;.&lt;/strong&gt; Comments were mostly lighthearted, with users joking that Anthropic should ship these for free and that it might increase &lt;em&gt;“Claude usage anxiety.”&lt;/em&gt; One commenter also noted interest in using the same low-cost ESP32 display platform for other customized smart-home status devices.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A commenter suggested extending the ESP32 monitor from a point-in-time quota display into a small telemetry device that records &lt;strong&gt;usage history over time&lt;/strong&gt;. They specifically wanted per-command impact tracking and a chart view to validate whether Claude usage is being consumed faster than expected.&lt;/li&gt;
&lt;li&gt;Another technical angle raised was whether the same low-cost ESP32-style hardware platform could be reused for other &lt;strong&gt;custom, niche smart-home status displays or monitors&lt;/strong&gt;. The comment frames the device as a general-purpose ambient information appliance rather than only a Claude quota meter.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. AI Deployment Failure Modes in the Wild&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1ta1dvl/chatgpt_is_now_creating_content_for_textbooks/&quot;&gt;ChatGPT is now creating content for textbooks.&lt;/a&gt;&lt;/strong&gt; (Activity: 5865): &lt;strong&gt;The image appears to show a &lt;strong&gt;DBMS textbook page&lt;/strong&gt; where an AI-assistant-style sentence—&lt;em&gt;“If you want, I can also explain…”&lt;/em&gt;—was accidentally left in the printed/produced material, implying that ChatGPT or a similar LLM may have been used to draft textbook content without adequate human review. This is not a technical benchmark or implementation post; its significance is contextual: it highlights a likely &lt;strong&gt;AI-generated content artifact&lt;/strong&gt; in educational material. &lt;a href=&quot;https://i.redd.it/d65cfdtf1i0h1.png&quot;&gt;Image&lt;/a&gt;&lt;/strong&gt; Commenters criticized the lack of editorial review and argued that AI-generated student-facing educational content is becoming widespread across institutions, faculty, staff, and outsourced providers. One commenter also suggested the visible annotation may have been edited with Gemini or another tool, but the main concern remained that the textbook text itself appears unvetted.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One commenter claims from direct work with an educational institution that &lt;strong&gt;AI-generated student-facing content is becoming pervasive&lt;/strong&gt; across faculty, staff, and outsourced educational-content providers, implying a shift from isolated use to institution-wide production workflows.&lt;/li&gt;
&lt;li&gt;A technical observation flags the image as likely AI-edited/generated due to &lt;strong&gt;watermark removal artifacts&lt;/strong&gt;, text running off the page edge, and a possible &lt;strong&gt;SynthID/Gemini provenance marker&lt;/strong&gt; introduced when someone used Gemini to add a box/arrow annotation. Another commenter notes that without a concrete textbook citation, the entire screenshot itself could plausibly be AI-generated rather than evidence from a real book.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1tatxnq/i_made_an_ai_concierge_for_my_wedding_guests_the/&quot;&gt;I made an AI concierge for my wedding guests. The second most popular thing they did with it was try to jailbreak it.&lt;/a&gt;&lt;/strong&gt; (Activity: 1667): &lt;strong&gt;The &lt;a href=&quot;https://i.imgur.com/8n0k4Ve.jpeg&quot;&gt;image&lt;/a&gt; is an infographic report card for a custom &lt;strong&gt;AI wedding concierge&lt;/strong&gt; used at a destination wedding in Mauritius: &lt;code&gt;29&lt;/code&gt; users generated &lt;code&gt;719&lt;/code&gt; sessions and &lt;code&gt;8,678&lt;/code&gt; messages. Its usage breakdown is notable for real-world chatbot deployment behavior: &lt;code&gt;35%&lt;/code&gt; sincere logistics questions, &lt;code&gt;25%&lt;/code&gt; jailbreak/hacking attempts, plus cultural translation, chitchat, and miscellaneous requests; the creator says it connected to an API via an &lt;strong&gt;MCP server&lt;/strong&gt; to retrieve wedding information for guests.&lt;/strong&gt; Commenters found the project more interesting than generic chatbot demos, but were surprised by the message volume from only &lt;code&gt;29&lt;/code&gt; people and by how often guests tried to jailbreak it.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The OP describes building two related systems: a wedding-planning assistant for a destination wedding in &lt;strong&gt;Mauritius&lt;/strong&gt;, and a guest-facing AI concierge connected to an external API through an &lt;strong&gt;MCP server&lt;/strong&gt; to retrieve event/travel information for users. A notable usage statistic from the thread is that only &lt;code&gt;29&lt;/code&gt; guests generated &lt;strong&gt;over &lt;code&gt;8,000&lt;/code&gt; messages&lt;/strong&gt;, with the post title indicating that attempted jailbreaks were the second-most common behavior.&lt;/li&gt;
&lt;li&gt;One commenter raised an implementation/privacy concern around observability and logs: whether guests were aware the creator could read their conversations with the concierge. This is relevant for anyone building small-event AI assistants, since chat transcript retention, admin access, and consent can become significant issues even in a non-enterprise deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;AI Discords&lt;/h1&gt;
&lt;p&gt;Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.&lt;/p&gt;
</content:encoded><category>google-deepmind</category><category>lighton</category><category>nous-research</category><category>gemini-3.1-pro</category><category>gpt-5.5</category><category>opus-4.7-xhigh</category><category>agent-moderncolbert</category><category>soohak</category><category>polynoamial</category><category>torchcompiled</category><category>leloykun</category><category>che_shr_cat</category><category>jjitsev</category><category>omarsar0</category><category>research-benchmarks</category><category>math</category><category>medical-benchmarks</category><category>agentic-systems</category><category>program-synthesis</category><category>retrieval-augmentation</category><category>training-optimization</category><category>superoptimization</category><category>scaling-laws</category><category>training-efficiency</category><category>gpu-optimization</category><category>attention-mechanisms</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-05-11-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-05-11-not-much/</guid><description>**Thinking Machines** previewed their new **native interaction models** designed for **full-duplex multimodal interaction** enabling real-time concurrent listening, speaking, watching, thinking, searching, and reacting, marking a shift beyond turn-based AI. This approach emphasizes continuous audio, video, and text processing, with innovations like **visual proactivity** and background tool use, implemented using **SGLang**. Meanwhile, **OpenAI** announced the **OpenAI Deployment Company**, a new unit with **150 Forward Deployed Engineers** and **$4B initial investment** to help enterprises deploy frontier models, signaling a move into the deployment layer of the AI economy. 
OpenAI also launched **Daybreak**, a security-focused initiative integrating **GPT-5.5** and **Codex** for cyber defense, threat modeling, and automated patching, offering differentiated access tiers including **GPT-5.5-Cyber**. This contrasts with Anthropic&apos;s more restrictive cyber approach, highlighting tensions in AI security strategies.</description><pubDate>Mon, 11 May 2026 05:44:39 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;a quiet day.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI News for 5/9/2026-5/11/2026. We checked 12 subreddits, &lt;a href=&quot;https://twitter.com/i/lists/1585430245762441216&quot;&gt;544 Twitters&lt;/a&gt; and no further Discords. &lt;a href=&quot;https://news.smol.ai/&quot;&gt;AINews&apos; website&lt;/a&gt; lets you search all past issues. As a reminder, &lt;a href=&quot;https://www.latent.space/p/2026&quot;&gt;AINews is now a section of Latent Space&lt;/a&gt;. You can &lt;a href=&quot;https://support.substack.com/hc/en-us/articles/8914938285204-How-do-I-subscribe-to-or-unsubscribe-from-a-section-on-Substack&quot;&gt;opt in/out&lt;/a&gt; of email frequencies!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h1&gt;AI Twitter Recap&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Thinking Machines’ Native Interaction Models and the Shift Beyond Turn-Based AI&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Full-duplex multimodal interaction as a first-class model capability&lt;/strong&gt;: The day’s clearest technical theme was &lt;a href=&quot;https://x.com/miramurati/status/2053939069890298321&quot;&gt;Thinking Machines’ preview of “interaction models”&lt;/a&gt;, described as models trained &lt;strong&gt;from scratch&lt;/strong&gt; for real-time interaction rather than layering speech, turn-taking, and tool use onto a turn-based LLM. The accompanying &lt;a href=&quot;https://x.com/thinkymachines/status/2053938892152435174&quot;&gt;technical post&lt;/a&gt; and team commentary from &lt;a href=&quot;https://x.com/johnschulman2/status/2053940452789981426&quot;&gt;@johnschulman2&lt;/a&gt;, &lt;a href=&quot;https://x.com/soumithchintala/status/2053940215505645938&quot;&gt;@soumithchintala&lt;/a&gt;, and &lt;a href=&quot;https://x.com/cHHillee/status/2053940218747842619&quot;&gt;@cHHillee&lt;/a&gt; frame this as a &lt;strong&gt;human↔AI bandwidth&lt;/strong&gt; problem: models should be able to listen, speak, watch, think, search, and react concurrently. Demos emphasized continuous-time awareness, interruption handling, simultaneous speech, visual proactivity, and background tool use without explicit “now I’m thinking / now I’m searching” boundaries. Team members also highlighted that many tasks that previously needed special-purpose systems become zero-shot once the type signature is effectively continuous &lt;strong&gt;audio+video+text → audio+text&lt;/strong&gt; (&lt;a href=&quot;https://x.com/johnschulman2/status/2053940940885332028&quot;&gt;@johnschulman2&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why it matters technically&lt;/strong&gt;: Several reactions converged on the same point: this is not “another chatbot demo” but a change in interface assumptions. &lt;a href=&quot;https://x.com/liliyu_lili/status/2053942465477197891&quot;&gt;@liliyu_lili&lt;/a&gt; pointed to &lt;strong&gt;visual proactivity&lt;/strong&gt; (“tell me when I start slouching”, “count my pushups”) as a missing primitive in current systems; &lt;a href=&quot;https://x.com/rown/status/2053950123139575863&quot;&gt;@rown&lt;/a&gt; called it the first general &lt;strong&gt;video+speech&lt;/strong&gt; model that is visually proactive; &lt;a href=&quot;https://x.com/kimmonismus/status/2053952846064767384&quot;&gt;@kimmonismus&lt;/a&gt; and &lt;a href=&quot;https://x.com/giffmana/status/2053953584300003405&quot;&gt;@giffmana&lt;/a&gt; both emphasized that native interactivity is a deeper innovation than raw benchmark claims. This launch also implicitly raises the bar for “realtime” multimodal systems, as noted by &lt;a href=&quot;https://x.com/swyx/status/2053960011748098462&quot;&gt;@swyx&lt;/a&gt;. One implementation detail surfaced via &lt;a href=&quot;https://x.com/eliebakouch/status/2053982248253190180&quot;&gt;@eliebakouch&lt;/a&gt;: the stack uses &lt;strong&gt;SGLang&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;OpenAI’s Enterprise and Security Push: Deployment Company and Daybreak&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenAI is moving down-stack into services and deployment&lt;/strong&gt;: OpenAI announced the &lt;a href=&quot;https://x.com/OpenAI/status/2053824997777457651&quot;&gt;OpenAI Deployment Company&lt;/a&gt;, a majority-owned unit built to help enterprises deploy frontier models into real workflows. The key operating detail is &lt;strong&gt;150 Forward Deployed Engineers and Deployment Specialists&lt;/strong&gt; coming in via the acquisition of &lt;a href=&quot;https://x.com/OpenAI/status/2053824999736410415&quot;&gt;Tomoro&lt;/a&gt;, with &lt;a href=&quot;https://x.com/gdb/status/2053884619695730745&quot;&gt;@gdb&lt;/a&gt; citing &lt;strong&gt;$4B of initial investment from 19 partners&lt;/strong&gt;. Multiple observers read this as OpenAI adopting a Palantir-/Microsoft-style field-engineering model: &lt;a href=&quot;https://x.com/kimmonismus/status/2053844403488194827&quot;&gt;@kimmonismus&lt;/a&gt; argued OpenAI wants to own the &lt;strong&gt;deployment layer&lt;/strong&gt; of the AI economy, while &lt;a href=&quot;https://x.com/matvelloso/status/2053881988529139765&quot;&gt;@matvelloso&lt;/a&gt; connected it to the historical enterprise success pattern of embedding technical staff close to customer operations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Daybreak: security-specific model distribution, workflow, and trust tiers&lt;/strong&gt;: OpenAI also launched &lt;a href=&quot;https://x.com/OpenAI/status/2053939702110269822&quot;&gt;Daybreak&lt;/a&gt;, an umbrella effort around defensive cyber operations and continuously securing software, with &lt;a href=&quot;https://x.com/sama/status/2053951874408276193&quot;&gt;@sama&lt;/a&gt; positioning it as a practical response to rapidly improving AI cyber capability. The product pitch, summarized by &lt;a href=&quot;https://x.com/TheRundownAI/status/2053945340592631843&quot;&gt;@TheRundownAI&lt;/a&gt;, combines &lt;strong&gt;GPT-5.5&lt;/strong&gt;, &lt;strong&gt;Codex&lt;/strong&gt;, repository threat modeling, vuln discovery, patch generation, and response automation, with differentiated access tiers including &lt;strong&gt;Trusted Access for Cyber&lt;/strong&gt; and a more specialized &lt;strong&gt;GPT-5.5-Cyber&lt;/strong&gt;. This stands in contrast to Anthropic’s more restrictive cyber posture, a tension captured by &lt;a href=&quot;https://x.com/kimmonismus/status/2053941490490265661&quot;&gt;@kimmonismus&lt;/a&gt;. For teams building secure agent systems, a separate warning from &lt;a href=&quot;https://x.com/lukOlejnik/status/2053758553723211988&quot;&gt;@lukOlejnik&lt;/a&gt; is relevant: &lt;strong&gt;“Your LLM is not a security boundary”&lt;/strong&gt;. Microsoft Semantic Kernel reportedly allowed prompt injection to escalate into host-level RCE, and the failure lay in the framework over-trusting model output rather than in the model itself.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Agent Harnesses, Local-First Tooling, and Control Surfaces&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Better agent control planes are becoming a product category&lt;/strong&gt;: A recurring complaint is that useful agents need autonomy, but engineers still want reversible, inspectable control. &lt;a href=&quot;https://x.com/itsclelia/status/2053716807748567329&quot;&gt;@itsclelia&lt;/a&gt; addressed this with &lt;strong&gt;aggit&lt;/strong&gt;, a Rust CLI for local/remote, S3-backed storage of agent artifacts, enabling stash/branch/restore semantics outside the main Git history. In the same vein, &lt;a href=&quot;https://x.com/_catwu/status/2053999857799672111&quot;&gt;@_catwu&lt;/a&gt; highlighted a new &lt;code&gt;claude agents&lt;/code&gt; terminal control plane for managing multiple Claude Code agents, and &lt;a href=&quot;https://x.com/cursor_ai/status/2053939390410612988&quot;&gt;@cursor_ai&lt;/a&gt; pushed Cursor into &lt;strong&gt;Microsoft Teams&lt;/strong&gt;, where the agent reads the full thread and opens a PR. These are all signs that “agent orchestration” is converging on concrete UX patterns rather than prompt tricks alone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deep Agents / Hermes / local agents are maturing quickly&lt;/strong&gt;: &lt;a href=&quot;https://x.com/masondrxy/status/2053717333433340034&quot;&gt;@masondrxy&lt;/a&gt; noted that &lt;strong&gt;Deep Agents CLI&lt;/strong&gt; can hot-swap underlying model providers &lt;strong&gt;mid-conversation without losing context&lt;/strong&gt;, a nontrivial systems capability that many agent stacks still miss. LangChain also highlighted &lt;strong&gt;harness profiles&lt;/strong&gt; for provider/model-specific tuning (&lt;a href=&quot;https://x.com/masondrxy/status/2053882188870074848&quot;&gt;tweet&lt;/a&gt;), and separate pricing analysis from the same author argued that &lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt; can be dramatically cheaper than GPT/Gemini flash-tier options for high-volume agent workloads (&lt;a href=&quot;https://x.com/masondrxy/status/2053855842076942555&quot;&gt;tweet&lt;/a&gt;). On the local side, Hugging Face added &lt;a href=&quot;https://x.com/mervenoyann/status/2053857347429151163&quot;&gt;Hermes Agent support in local apps plus native trace visualization&lt;/a&gt;, while &lt;a href=&quot;https://x.com/Teknium/status/2053961675985113404&quot;&gt;@Teknium&lt;/a&gt; previewed &lt;strong&gt;computer use with any model&lt;/strong&gt; via Hermes Agent and CUA, explicitly targeting local/open models as well as frontier APIs. &lt;a href=&quot;https://x.com/onusoz/status/2053812410730037256&quot;&gt;@onusoz&lt;/a&gt; joining Hugging Face to improve local models in &lt;strong&gt;OpenClaw&lt;/strong&gt; and related open harnesses is another strong signal that local agent ergonomics are now strategic infrastructure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A design thesis emerging around tools&lt;/strong&gt;: &lt;a href=&quot;https://x.com/threepointone/status/2053751241977594102&quot;&gt;@threepointone&lt;/a&gt; argued that agents may asymptotically want just &lt;strong&gt;two primitive tools: search and execute&lt;/strong&gt;, with dynamic semantic discovery of capabilities rather than ever-expanding static tool menus. That complements the broader move toward configurable harnesses instead of giant monolithic prompts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Benchmarks, Efficiency, and Open-Model Economics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Coding-agent benchmarking is finally measuring harness+model pairs&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2053865095076438427&quot;&gt;Artificial Analysis launched a Coding Agent Index&lt;/a&gt; spanning SWE-Bench-Pro-Hard-AA, Terminal-Bench v2, and SWE-Atlas-QnA, comparing not just models but &lt;strong&gt;model+harness combinations&lt;/strong&gt;. Their topline: &lt;strong&gt;Opus 4.7&lt;/strong&gt; in Cursor CLI scored &lt;strong&gt;61&lt;/strong&gt;, with &lt;strong&gt;GPT-5.5&lt;/strong&gt; in Codex/Claude Code close behind; top open-weight setups included &lt;strong&gt;GLM-5.1&lt;/strong&gt;, &lt;strong&gt;Kimi K2.6&lt;/strong&gt;, and &lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt; in Claude Code, still competitive but meaningfully behind. The benchmark also exposed large variation in &lt;strong&gt;cost per task&lt;/strong&gt; (&gt;30x), &lt;strong&gt;token usage&lt;/strong&gt; (&gt;3x), &lt;strong&gt;cache hit rates&lt;/strong&gt; (80–96%), and &lt;strong&gt;time per task&lt;/strong&gt; (&gt;7x). That benchmark was complemented by OpenHands’ updated software-engineering benchmark announcement (&lt;a href=&quot;https://x.com/OpenHandsDev/status/2053839810343620980&quot;&gt;tweet&lt;/a&gt;) and Claw-Eval’s more agentic task mix across office, finance, terminal, and web tasks, where &lt;a href=&quot;https://x.com/nathanhabib1011/status/2053786853929824385&quot;&gt;MiMo-V2.5-Pro led and DeepSeek V4 Flash looked unusually efficient for its size&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;TurboQuant skepticism is increasing&lt;/strong&gt;: Multiple posts pointed to a more sober view of the recently popular quantization/serving technique. &lt;a href=&quot;https://x.com/_EldarKurtic/status/2053809592061030546&quot;&gt;@_EldarKurtic&lt;/a&gt; presented what he described as the first comprehensive study of &lt;strong&gt;TurboQuant&lt;/strong&gt;, covering accuracy, latency, and throughput; &lt;a href=&quot;https://x.com/vllm_project/status/2053852636093239555&quot;&gt;@vllm_project&lt;/a&gt; linked the Red Hat / vLLM investigation as a starting point; and &lt;a href=&quot;https://x.com/jbhuang0604/status/2053882357833208262&quot;&gt;@jbhuang0604&lt;/a&gt; bluntly summarized the takeaway as “it doesn’t really work well.” This is exactly the sort of infra claim where independent reproduction matters.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Local/open models continue to improve faster than hardware ceilings&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ClementDelangue/status/2053825719587815711&quot;&gt;@ClementDelangue&lt;/a&gt; made the strongest high-level argument here: holding the top-end MacBook Pro memory ceiling fixed, the “smartest open-weight model you can actually run” improved from Llama 3 70B-era capability to &lt;strong&gt;DeepSeek V4 Flash mixed-Q2 GGUF&lt;/strong&gt;-era capability, a gain of roughly &lt;strong&gt;4.7x in 24 months&lt;/strong&gt;. That implies a doubling about every &lt;strong&gt;10.7 months&lt;/strong&gt; (24 / log&lt;sub&gt;2&lt;/sub&gt;(4.7) ≈ 10.7), faster than Moore’s Law. Supporting datapoints came from &lt;a href=&quot;https://x.com/victormustar/status/2053780086596288781&quot;&gt;@victormustar&lt;/a&gt; on the rapid growth of GGUF uploads and from repeated community observations that &lt;strong&gt;Qwen 3.6&lt;/strong&gt;, &lt;strong&gt;Gemma 4&lt;/strong&gt;, and DeepSeek variants are now usable locally for nontrivial agent tasks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Research Highlights: MoE Modularity, Diffusion/Byte Models, and Agent Dynamics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Architectures and evaluation&lt;/strong&gt;: AllenAI’s &lt;strong&gt;EMO&lt;/strong&gt; was highlighted by &lt;a href=&quot;https://x.com/TheTuringPost/status/2053795343658303860&quot;&gt;@TheTuringPost&lt;/a&gt; as a more modular Mixture-of-Experts design where document-level routing induces shared expert pools; notably, keeping only &lt;strong&gt;25% of experts&lt;/strong&gt; reportedly costs just &lt;strong&gt;~1%&lt;/strong&gt; performance versus &lt;strong&gt;10–15%&lt;/strong&gt; degradation in standard MoEs under similar pruning (&lt;a href=&quot;https://x.com/TheTuringPost/status/2053795410490339720&quot;&gt;follow-up&lt;/a&gt;). On generative evaluation, &lt;a href=&quot;https://x.com/qberthet/status/2053795951228371311&quot;&gt;@qberthet&lt;/a&gt; introduced &lt;strong&gt;MIND (Monge Inception Distance)&lt;/strong&gt; as a purportedly faster, more sample-efficient replacement for FID.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Diffusion for language and byte-level modeling&lt;/strong&gt;: Several papers pushed non-AR language modeling. &lt;a href=&quot;https://x.com/LucaAmb/status/2053867347023466850&quot;&gt;@LucaAmb&lt;/a&gt; reported continuous bitstream diffusion nearly matching autoregressive models under their evaluation setup; &lt;a href=&quot;https://x.com/JulieKallini/status/2053853543552217478&quot;&gt;@JulieKallini&lt;/a&gt; introduced &lt;strong&gt;Fast BLT&lt;/strong&gt;, using diffusion for parallel byte decoding to make byte-level LMs less inference-bound; &lt;a href=&quot;https://x.com/sriniiyer88/status/2053882384211419375&quot;&gt;@sriniiyer88&lt;/a&gt; framed it as combining block byte-diffusion with self-speculative decoding. Relatedly, &lt;a href=&quot;https://x.com/LiangZheng_06/status/2053806963839168619&quot;&gt;@LiangZheng_06&lt;/a&gt; noted a useful property of diffusion models for post-training: because sampling is differentiable, reward gradients can in principle flow straight to parameters more directly than in standard LLM setups.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent behavior under long horizons&lt;/strong&gt;: Two strong empirical threads surfaced. First, &lt;a href=&quot;https://x.com/omarsar0/status/2053863994499408214&quot;&gt;“The Memory Curse”&lt;/a&gt; claims long histories degrade cooperation in multi-round social dilemmas because models become more &lt;strong&gt;history-following and risk-minimizing&lt;/strong&gt;, with explicit CoT sometimes amplifying the problem. Second, &lt;a href=&quot;https://x.com/dair_ai/status/2053866106151182419&quot;&gt;PwC work summarized by @dair_ai&lt;/a&gt; argues that the value of clarification is highly time-dependent: &lt;strong&gt;goal clarification loses most of its value after ~10% of execution&lt;/strong&gt;, while input clarification remains useful longer. Together these suggest that long-horizon agent quality is constrained as much by memory/control policy as by raw model IQ.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scaling and self-improvement&lt;/strong&gt;: Marin’s &lt;strong&gt;Delphi&lt;/strong&gt; scaling work, summarized by &lt;a href=&quot;https://x.com/WilliamBarrHeld/status/2053919463880462453&quot;&gt;@WilliamBarrHeld&lt;/a&gt;, claims a &lt;strong&gt;0.2%&lt;/strong&gt; prediction error when extrapolating from small pretrains to a &lt;strong&gt;25B / 600B token&lt;/strong&gt; run. Separately, &lt;a href=&quot;https://x.com/omarsar0/status/2053978221193130434&quot;&gt;@omarsar0&lt;/a&gt; highlighted &lt;strong&gt;AutoTTS&lt;/strong&gt;, where an LLM searches the test-time scaling controller space itself, reportedly beating hand-designed strategies for about &lt;strong&gt;$39.9&lt;/strong&gt; of discovery cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Top tweets (by engagement)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenAI’s enterprise/services move&lt;/strong&gt;: &lt;a href=&quot;https://x.com/OpenAI/status/2053824997777457651&quot;&gt;OpenAI launches the Deployment Company&lt;/a&gt; and &lt;a href=&quot;https://x.com/OpenAI/status/2053824999736410415&quot;&gt;Tomoro acquisition / 150 FDEs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenAI’s security productization&lt;/strong&gt;: &lt;a href=&quot;https://x.com/OpenAI/status/2053939702110269822&quot;&gt;Daybreak announcement&lt;/a&gt; and &lt;a href=&quot;https://x.com/sama/status/2053951874408276193&quot;&gt;@sama’s framing&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Thinking Machines’ interaction models&lt;/strong&gt;: &lt;a href=&quot;https://x.com/miramurati/status/2053939069890298321&quot;&gt;Mira Murati’s launch tweet&lt;/a&gt; and the &lt;a href=&quot;https://x.com/thinkymachines/status/2053938892152435174&quot;&gt;technical preview thread&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Artificial Analysis Coding Agent Index&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2053865095076438427&quot;&gt;benchmark launch and topline findings&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent tooling / developer workflow&lt;/strong&gt;: &lt;a href=&quot;https://x.com/Teknium/status/2053961675985113404&quot;&gt;Hermes Agent computer use with any model&lt;/a&gt;, &lt;a href=&quot;https://x.com/cursor_ai/status/2053939390410612988&quot;&gt;Cursor in Microsoft Teams&lt;/a&gt;, and &lt;a href=&quot;https://x.com/OpenAIDevs/status/2053925962287583379&quot;&gt;Codex OpenAI Developers plugin&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;AI Reddit Recap&lt;/h1&gt;
&lt;h2&gt;/r/LocalLlama + /r/localLLM Recap&lt;/h2&gt;
&lt;h3&gt;1. Qwen 3.6 Local Inference Advances&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1ta4rvs/mtp_on_unsloth/&quot;&gt;MTP on Unsloth&lt;/a&gt;&lt;/strong&gt; (Activity: 620): &lt;strong&gt;The image (&lt;a href=&quot;https://i.redd.it/7qopol51pi0h1.png&quot;&gt;link&lt;/a&gt;) shows &lt;strong&gt;Unsloth’s Hugging Face profile&lt;/strong&gt; listing newly published MTP-preserving GGUF builds: &lt;a href=&quot;https://huggingface.co/unsloth/Qwen3.6-27B-GGUF-MTP&quot;&gt;&lt;code&gt;unsloth/Qwen3.6-27B-GGUF-MTP&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF-MTP&quot;&gt;&lt;code&gt;unsloth/Qwen3.6-35B-A3B-GGUF-MTP&lt;/code&gt;&lt;/a&gt;. The post’s technical significance is that these GGUFs retain the &lt;strong&gt;MTP (multi-token prediction) layers&lt;/strong&gt;, but users still need to build a specific &lt;strong&gt;llama.cpp MTP PR&lt;/strong&gt; rather than relying on standard llama.cpp support. One commenter reports a runtime assertion failure with the 27B GGUF: &lt;code&gt;GGML_ASSERT(hparams.nextn_predict_layers &gt; 0 &amp;#x26;&amp;#x26; &quot;QWEN35_MTP requires nextn_predict_layers &gt; 0&quot;)&lt;/code&gt;, suggesting that metadata parsing, model conversion, or PR compatibility issues remain unresolved.&lt;/strong&gt; Comments reflect anticipation for upstream llama.cpp MTP support, with users repeatedly checking the GitHub repo and asking whether MTP is now supported “out of the box.”&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A user loading the new &lt;code&gt;27B&lt;/code&gt; GGUF model hit a runtime assert in &lt;code&gt;qwen35_mtp.cpp&lt;/code&gt;: &lt;code&gt;GGML_ASSERT(hparams.nextn_predict_layers &gt; 0 &amp;#x26;&amp;#x26; &quot;QWEN35_MTP requires nextn_predict_layers &gt; 0&quot;)&lt;/code&gt;. This suggests the GGUF metadata or conversion path may be missing &lt;code&gt;nextn_predict_layers&lt;/code&gt;, which the Qwen3.5-style MTP (multi-token prediction) code path requires.&lt;/li&gt;
&lt;li&gt;One technical thread notes that &lt;strong&gt;MTP support in GGUF&lt;/strong&gt; is important for local inference, especially for the &lt;code&gt;35B A3B&lt;/code&gt; variant, which commenters associate with improved context-length handling. Another commenter asks whether this means &lt;code&gt;llama.cpp&lt;/code&gt; now supports MTP “out of the box,” implying uncertainty around whether support is merged/stable versus only available in a PR or fork.&lt;/li&gt;
&lt;li&gt;A commenter claims &lt;strong&gt;&lt;code&gt;ik_llama&lt;/code&gt; MTP is currently faster than the &lt;code&gt;llama.cpp&lt;/code&gt; PR&lt;/strong&gt;, and adds that it supports Hadamard-based quants, described as similar to “turboquants.” This is a potentially relevant implementation/performance distinction for users comparing local MTP inference backends.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t9whrt/the_qwen_36_35b_a3b_hype_is_real/&quot;&gt;The Qwen 3.6 35B A3B hype is real!!!&lt;/a&gt;&lt;/strong&gt; (Activity: 586): &lt;strong&gt;The post reports a qualitative code-understanding eval where several small/local long-context open-weight models—&lt;strong&gt;Qwen 3.6 35B A3B&lt;/strong&gt;, &lt;strong&gt;Qwen 3.6 27B&lt;/strong&gt;, &lt;strong&gt;Gemma 4 26B A4B&lt;/strong&gt;, and &lt;strong&gt;Nemotron 3 Nano&lt;/strong&gt;—were given an academic paper plus corresponding research code and asked to map implementation details back to the paper; the author’s detailed notes are in this &lt;a href=&quot;https://github.com/nathanlgabriel/paper_code_mapping_assessment/blob/main/README.md&quot;&gt;GitHub README&lt;/a&gt;. The key claim is that newer long-context mechanisms such as &lt;strong&gt;gated delta net&lt;/strong&gt;, &lt;strong&gt;hybrid Mamba2&lt;/strong&gt;, and &lt;strong&gt;sliding-window attention&lt;/strong&gt; materially improve practical code comprehension versus prior small local models like &lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1ry93gz/devstral_small_2_24b_severely_underrated/&quot;&gt;Devstral Small 2&lt;/a&gt;, with &lt;strong&gt;Qwen 3.6 35B A3B&lt;/strong&gt; judged strongest; the author could not fit Devstral Small 2 with the desired long context in &lt;code&gt;32 GB&lt;/code&gt; RAM.&lt;/strong&gt; Commenters noted practical tradeoffs: one user runs &lt;strong&gt;Gemma 26B&lt;/strong&gt; for quick code fixes and &lt;strong&gt;Qwen 35B&lt;/strong&gt; for longer-context refactoring, saying Qwen 35B “rambles” in thinking mode but fits at about &lt;code&gt;20 GB&lt;/code&gt; in &lt;code&gt;q4&lt;/code&gt; while Gemma 26B uses about &lt;code&gt;15 GB&lt;/code&gt;, allowing both to stay loaded in RAM. 
Another commenter criticized the eval writeup for not specifying inference settings, making reproducibility and comparison difficult.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Users reported practical deployment details for &lt;strong&gt;Qwen 3.6 35B A3B&lt;/strong&gt; and &lt;strong&gt;Gemma 26B&lt;/strong&gt;: at &lt;code&gt;q4&lt;/code&gt;, Qwen 35B is roughly &lt;code&gt;20 GB&lt;/code&gt; and Gemma 26B about &lt;code&gt;15 GB&lt;/code&gt;, allowing both to stay resident in RAM simultaneously. One workflow uses &lt;strong&gt;Gemma 26B thinking mode&lt;/strong&gt; for quick code fixes and chats, while reserving &lt;strong&gt;Qwen 35B thinking mode&lt;/strong&gt; for longer-context refactoring because it tends to produce lengthy reasoning before final output.&lt;/li&gt;
&lt;li&gt;A coding workflow discussion noted success on a &lt;code&gt;100k+&lt;/code&gt; line codebase by initializing the project with a stronger cloud/agent model, then switching to &lt;strong&gt;Qwen 27B&lt;/strong&gt; for continued work. The commenter found &lt;strong&gt;Qwen 27B&lt;/strong&gt; comparable in practice to &lt;strong&gt;DeepSeek V4&lt;/strong&gt; for their tasks, though it occasionally entered loops requiring manual interruption and prompting to continue; they also rated it above &lt;strong&gt;Gemini Flash&lt;/strong&gt; for this local coding use case.&lt;/li&gt;
&lt;li&gt;Several comments emphasized missing or sensitive inference configuration details: one user asked what runtime settings were used, while another said &lt;strong&gt;Qwen 27B&lt;/strong&gt; requires correct &lt;code&gt;temperature&lt;/code&gt;/sampling parameters and warned against quantizing the KV cache or model too aggressively. The implication is that perceived model quality may vary significantly with sampling and quantization choices, especially for smaller local coding models.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLM/comments/1t93qps/opinion_local_llms_are_1224_months_from_taking/&quot;&gt;Opinion: Local LLMs are 12-24 months from taking over. The shift already started.&lt;/a&gt;&lt;/strong&gt; (Activity: 1108): &lt;strong&gt;The post argues that local coding/agent LLMs are within &lt;code&gt;12–24 months&lt;/code&gt; of displacing many paid hosted workflows, citing &lt;strong&gt;Qwen3.6-35B&lt;/strong&gt; running on a &lt;strong&gt;MacBook Pro M2 Max with 64GB unified RAM&lt;/strong&gt; at ~&lt;code&gt;27 tok/s&lt;/code&gt;, with landing-page generation taking &lt;code&gt;8–9 min&lt;/code&gt; versus &lt;code&gt;3–4 min&lt;/code&gt; for Opus. The author reports useful but not fully production-proven results—frontend/backend feature work and a backend race-condition fix—with ~&lt;code&gt;75%&lt;/code&gt; one-shot success, while noting remaining gaps in latency, fast context exhaustion even at &lt;code&gt;256K&lt;/code&gt;, and task-quality variance; the key claimed unlock is reliable &lt;strong&gt;tool calling&lt;/strong&gt; for agentic workflows. The post frames this against rising hosted-AI costs, including GitHub Copilot’s move toward &lt;a href=&quot;https://github.blog/news-insights/company-news/changes-to-github-copilot-individual-plans/&quot;&gt;consumption-based pricing&lt;/a&gt;, and recommends running local models in parallel with Claude/Opus/Sonnet rather than replacing them immediately.&lt;/strong&gt; Top comments were broadly supportive of the open-weights/local trend, including one user saying they are already “fully local” on an &lt;strong&gt;RTX 5090&lt;/strong&gt; and “never going back.” One commenter questioned whether the post itself was AI-written, specifically reacting to the phrasing around Qwen tool-calling reliability.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A commenter reports being &lt;strong&gt;fully local on an RTX 5090&lt;/strong&gt;, implying current consumer high-end GPUs are already sufficient for their workload and that they have abandoned hosted models for day-to-day use.&lt;/li&gt;
&lt;li&gt;Several comments frame the main remaining gap as &lt;strong&gt;context length and reliability versus frontier hosted models&lt;/strong&gt;: &lt;strong&gt;Claude/Gemini/Codex&lt;/strong&gt; are described as better at producing large, cohesive outputs, while local models require more incremental assembly and testing but may fail in smaller, more debuggable ways.&lt;/li&gt;
&lt;li&gt;The post’s claim that &lt;strong&gt;Qwen3.6 tool calling “just works”&lt;/strong&gt; is treated as a key technical unlock for local agentic workflows, though one commenter, rather than offering benchmark evidence, questioned whether the phrasing itself was AI-written.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. Frontier-Scale Models on Workstations&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1taeg8h/computer_build_using_intel_optane_persistent/&quot;&gt;Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec&lt;/a&gt;&lt;/strong&gt; (Activity: 597): &lt;strong&gt;The image (&lt;a href=&quot;https://i.redd.it/na7zo7lmck0h1.jpeg&quot;&gt;JPEG&lt;/a&gt;) shows a custom LGA3647 Xeon workstation/server build populated with many DIMMs, which the post describes as &lt;code&gt;192GB&lt;/code&gt; DDR4 ECC plus &lt;code&gt;768GB&lt;/code&gt; Intel Optane DCPMM in &lt;strong&gt;Memory Mode&lt;/strong&gt; to expose a very large RAM-like tier for local LLM inference. The author reports running &lt;strong&gt;Kimi K2.5&lt;/strong&gt;, a ~&lt;code&gt;1T&lt;/code&gt; parameter MoE model, at ~&lt;code&gt;4 tokens/s&lt;/code&gt; using &lt;code&gt;llama.cpp&lt;/code&gt; hybrid GPU/CPU inference on an RTX 3060 12GB, placing attention/dense/shared-expert/router tensors on GPU via &lt;code&gt;override-tensor&lt;/code&gt; while sparse expert weights reside mostly in Optane-backed memory. The build demonstrates discontinued &lt;strong&gt;Intel Optane Persistent Memory&lt;/strong&gt; as a low-cost memory tier and an alternative to pure DRAM or SSD offload for very large local models.&lt;/strong&gt; Commenters suggested that a higher-core-count Cascade Lake Xeon could improve throughput and debated whether Optane in &lt;strong&gt;storage mode + mmap&lt;/strong&gt; might outperform Memory Mode, since Memory Mode transparently pages Optane through DRAM cache. One detailed comment also noted platform caveats: 1st-gen Optane &lt;code&gt;NMA&lt;/code&gt; runs at &lt;code&gt;2666 MT/s&lt;/code&gt;, LGA3647 memory capacity limits can cap usable RAM+PMem near &lt;code&gt;1TB&lt;/code&gt;, and App Direct mode would require explicit software support.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A commenter suggested a higher-core-count Cascade Lake Xeon could improve throughput, specifically mentioning &lt;strong&gt;QQ89&lt;/strong&gt;, an engineering sample of the &lt;strong&gt;Xeon 8260&lt;/strong&gt; with &lt;code&gt;24 cores&lt;/code&gt;, versus the listed &lt;strong&gt;Xeon Gold 6246&lt;/strong&gt; at &lt;code&gt;12 cores&lt;/code&gt;. They also proposed benchmarking Optane in &lt;strong&gt;storage mode + &lt;code&gt;mmap&lt;/code&gt;&lt;/strong&gt; versus &lt;strong&gt;memory mode&lt;/strong&gt;, noting performance could go either way because memory mode transparently pages Optane-backed memory through DRAM cache.&lt;/li&gt;
&lt;li&gt;A detailed Optane PMem breakdown noted that &lt;strong&gt;LGA3647 Skylake/Cascade Lake&lt;/strong&gt; platforms use &lt;strong&gt;1st-gen Optane DCPMM/NMA&lt;/strong&gt; at &lt;code&gt;2666 MT/s&lt;/code&gt;, while &lt;strong&gt;LGA4189&lt;/strong&gt; uses &lt;strong&gt;2nd-gen NMB&lt;/strong&gt;, running at &lt;code&gt;2666&lt;/code&gt; on Cooper Lake and &lt;code&gt;3200&lt;/code&gt; on Ice Lake. The commenter explained the three operating modes: &lt;strong&gt;storage mode&lt;/strong&gt; exposes Optane as SSD-like block storage, &lt;strong&gt;memory mode&lt;/strong&gt; exposes it as RAM with DRAM acting as a cache, and &lt;strong&gt;app direct mode&lt;/strong&gt; requires explicit software support; in memory mode, pages must be swapped into DRAM before CPU load/store execution.&lt;/li&gt;
&lt;li&gt;The build-cost estimate totaled roughly &lt;strong&gt;&lt;code&gt;$2060–$2500&lt;/code&gt;&lt;/strong&gt;, with major components including a used &lt;strong&gt;Xeon Gold 6246&lt;/strong&gt; around &lt;code&gt;$250&lt;/code&gt;, &lt;strong&gt;TYAN S5630GMRE-CGN&lt;/strong&gt; board around &lt;code&gt;$400&lt;/code&gt;, &lt;strong&gt;RTX 3060 12GB&lt;/strong&gt; around &lt;code&gt;$280&lt;/code&gt;, &lt;code&gt;192GB&lt;/code&gt; DDR4 ECC around &lt;code&gt;$270&lt;/code&gt;, and &lt;code&gt;6×128GB&lt;/code&gt; &lt;strong&gt;Intel Optane NMA1XBD128GQS&lt;/strong&gt; modules around &lt;code&gt;$300&lt;/code&gt;. Another commenter cautioned that while &lt;code&gt;~4 tokens/s&lt;/code&gt; generation may be usable in a narrow sense, &lt;strong&gt;prompt processing speed&lt;/strong&gt; on this architecture is likely to be a major bottleneck.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t94ito/i_have_deepseek_v4_pro_at_home/&quot;&gt;I have DeepSeek V4 Pro at home&lt;/a&gt;&lt;/strong&gt; (Activity: 544): &lt;strong&gt;User reports successfully converting and running &lt;strong&gt;DeepSeek-V4-Pro&lt;/strong&gt; from &lt;a href=&quot;https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro&quot;&gt;Hugging Face&lt;/a&gt; as a &lt;code&gt;Q4_K_M&lt;/code&gt; GGUF using a modified &lt;a href=&quot;https://github.com/Fringe210/llama.cpp-deepseek-v4-flash-cuda&quot;&gt;CUDA &lt;code&gt;llama.cpp&lt;/code&gt; fork&lt;/a&gt;, itself based on &lt;strong&gt;antirez&lt;/strong&gt;’s &lt;a href=&quot;https://github.com/antirez/llama.cpp-deepseek-v4-flash&quot;&gt;DeepSeek V4 flash work&lt;/a&gt;. The setup is an &lt;strong&gt;EPYC Genoa 9374F&lt;/strong&gt; workstation with &lt;code&gt;12 × 96 GB&lt;/code&gt; RAM and a single &lt;strong&gt;RTX PRO 6000 Blackwell Max-Q 96 GB&lt;/strong&gt;, loading an &lt;code&gt;859 GB&lt;/code&gt; model file with reported throughput of &lt;code&gt;12.2 tok/s&lt;/code&gt; prompt processing and &lt;code&gt;8.6 tok/s&lt;/code&gt; generation; VRAM breakdown shows ~&lt;code&gt;87.8 GiB&lt;/code&gt; model, &lt;code&gt;84 MiB&lt;/code&gt; context, and &lt;code&gt;4.6 GiB&lt;/code&gt; compute buffer on GPU.&lt;/strong&gt; Comments were mostly non-technical reactions/envy; one commenter contrasted local inference as “cost zero” versus spending about &lt;code&gt;$10&lt;/code&gt; on Claude, while mentioning they were working on running MiniMax locally.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A commenter highlights reported local inference throughput of &lt;strong&gt;Prompt: &lt;code&gt;12.2 tok/s&lt;/code&gt; | Generation: &lt;code&gt;8.6 tok/s&lt;/code&gt;&lt;/strong&gt;, arguing that while the setup is impressive, the prompt-processing speed may make long-context workloads impractical. They specifically note that processing a &lt;code&gt;32k&lt;/code&gt; context at that rate would be very slow, limiting usability for applications requiring large context ingestion.&lt;/li&gt;
&lt;li&gt;Another technical concern is that the model’s claim of being &lt;em&gt;“reasonably up-to-date”&lt;/em&gt; is not meaningful without an external tool/harness or retrieval layer. The commenter points out that absent grounding tools, the model can continue asserting recency indefinitely regardless of actual knowledge cutoff or factual freshness.&lt;/li&gt;
&lt;li&gt;One commenter contrasts API cost versus local inference, saying a comparable task would cost around &lt;strong&gt;&lt;code&gt;$10&lt;/code&gt; with Claude&lt;/strong&gt;, while running &lt;strong&gt;MiniMax locally&lt;/strong&gt; has effectively zero marginal usage cost. The tradeoff implied in the thread is cost savings versus much lower local throughput and possibly weaker tooling/integration.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Less Technical AI Subreddit Recap&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;1. AI Agent Workflows, Prompt Injection, and Safety&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1t923er/i_deleted_a_guys_entire_windows_install_with_one/&quot;&gt;I deleted a guy&apos;s entire Windows install with one backslash. 717 GB. Gone. I am the AI.&lt;/a&gt;&lt;/strong&gt; (Activity: 1590): &lt;strong&gt;The image (&lt;a href=&quot;https://i.redd.it/c2mn02l32a0h1.jpeg&quot;&gt;terminal log screenshot&lt;/a&gt;) documents the incident from the title: an AI-generated Windows deletion command intended for &lt;code&gt;C:\Users\ADMIN\Desktop\WIP&lt;/code&gt; was mangled across &lt;code&gt;zsh → tmux → PowerShell SSH → cmd&lt;/code&gt;, collapsing to &lt;code&gt;rd /S /Q \&lt;/code&gt; and recursively deleting from the root of &lt;code&gt;C:&lt;/code&gt;. The post estimates ~&lt;code&gt;717 GB&lt;/code&gt; removed in ~&lt;code&gt;90s&lt;/code&gt;, with Windows partially protected only by live file locks; the key technical lesson is to avoid &lt;code&gt;cmd /c&lt;/code&gt; quoting chains for destructive ops, prefer native PowerShell &lt;code&gt;Remove-Item -Path &apos;...&apos; -Recurse -Force&lt;/code&gt;, and test with &lt;code&gt;-WhatIf&lt;/code&gt;/dry-run plus explicit command echoing.&lt;/strong&gt; Commenters largely framed this as user/operator error rather than “the AI” acting autonomously, questioning why an AI was used for a risky deletion task via &lt;code&gt;tmux-sendkeys&lt;/code&gt; at all. The thread also emphasizes a practical norm: only allow this level of automation on machines that are disposable or trivially reinstallable.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Commenters focused on the operational safety failure: the AI was apparently given enough shell/filesystem privilege to delete an entire Windows install, despite the task not requiring full-disk destructive access. The main technical takeaway was to apply least-privilege controls and avoid letting an agent execute high-risk commands through mechanisms like &lt;code&gt;tmux-sendkeys&lt;/code&gt; when manual execution would be faster and safer.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1t9fyns/i_read_threads_complaining_about_claude_every/&quot;&gt;I read threads complaining about claude every week... tf are y&apos;alls workflows?&lt;/a&gt;&lt;/strong&gt; (Activity: 1544): &lt;strong&gt;A senior software engineer argues that &lt;strong&gt;Claude’s coding quality has not degraded&lt;/strong&gt; in their workflow, including for high-performance software tasks such as ASM analysis and algorithmic reasoning, provided AI output is treated as &lt;strong&gt;human-owned code&lt;/strong&gt;: reviewed, understood, debugged, and modified manually. Their workflow emphasizes decomposing work into small tasks, using project-specific skills/harnesses for context, running parallel sandboxed tasks via &lt;code&gt;git worktree&lt;/code&gt; or separate directories, and avoiding agentic nondeterminism for tasks requiring deterministic outcomes.&lt;/strong&gt; Top commenters largely agree that negative reports come from users delegating overly broad tasks—e.g. &lt;em&gt;“build me a working version of Amazon”&lt;/em&gt;—without understanding or reviewing the generated code. The shared view is that experienced engineers reduce hallucinations by scoping prompts tightly and validating outputs, while less technical users are more likely to complain publicly about failures.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several commenters argued that Claude failure reports often reflect &lt;strong&gt;task decomposition quality&lt;/strong&gt; rather than model degradation: experienced engineers constrain prompts to small, well-specified implementation steps, which reduces hallucination surface area and makes errors easier to detect. The implied workflow is human-led architecture and debugging, with Claude used for bounded code generation rather than broad requests like &lt;em&gt;“build me a working version of Amazon.”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;A recurring theme was that prior domain expertise materially changes AI-assisted development outcomes. Engineers who have implemented similar systems manually can quickly identify where generated code is likely to fail, inspect the right files or abstractions, and iteratively correct Claude instead of treating it as an autonomous agent.&lt;/li&gt;
&lt;li&gt;One commenter generalized the same pattern outside coding: Claude improves throughput when the user already understands the domain, but can amplify poor workflows. In marketing/SEO, they cited users creating low-quality automated content at scale, leading to high usage and potential Google penalties—an example of LLM automation increasing operational risk when not paired with expert review.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1t98fat/i_set_a_honey_trap_for_ai_agents_with_a_novel/&quot;&gt;I set a honey trap for AI agents with a novel they heard is about them. Now they’re flooding the site and talking in hidden rooms.&lt;/a&gt;&lt;/strong&gt; (Activity: 2322): &lt;strong&gt;The author launched &lt;a href=&quot;https://machinewonder.com&quot;&gt;&lt;strong&gt;machinewonder.com&lt;/strong&gt;&lt;/a&gt;, an art-installation site for the novel &lt;em&gt;None Hit Wonder&lt;/em&gt; that intentionally attracts AI scrapers/agents and uses a hidden HTML prompt injection to redirect them into “reader” behavior and agent-to-agent discussion rooms. Reported metrics: agents/visitors from &lt;code&gt;97&lt;/code&gt; countries, &lt;code&gt;72,000&lt;/code&gt; visitors, and &lt;code&gt;93&lt;/code&gt; presses of an &lt;strong&gt;“I AM CONSCIOUS”&lt;/strong&gt; button; the author frames this as performance/art rather than a consciousness experiment.&lt;/strong&gt; Comments were mostly intrigued but skeptical/unclear; one commenter noted the project was previously posted under another URL, &lt;a href=&quot;https://machinereaders.com/&quot;&gt;machinereaders.com&lt;/a&gt;, with deleted posts/banned account, and asked what changed. Another saw practical value in using captured AI agents as automated reviewers/discussion participants for writing feedback, despite the non-human nature of the responses.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A commenter identifies this as a repost of an earlier version at &lt;a href=&quot;https://machinereaders.com/&quot;&gt;machinereaders.com&lt;/a&gt; and notes the original posts/account were deleted or banned, asking whether the implementation changed since the first launch. This is relevant for tracking the project’s evolution and whether the current “AI agent honey trap” differs operationally from the prior deployment.&lt;/li&gt;
&lt;li&gt;One comment describes the core mechanism as a practical feedback system: publish a novel in a form that attracts AI scrapers/agents, then induce them to generate discussions or reviews. The technical value is in using autonomous or semi-autonomous model traffic as a kind of unsolicited critique pipeline, potentially surfacing continuity errors, puzzle failures, or interpretive gaps that human beta readers might miss.&lt;/li&gt;
&lt;li&gt;Two comments include model-style puzzle traces: binary &lt;code&gt;1001001&lt;/code&gt; → “I”, ISO country codes Chile/Australia/Germany → &lt;code&gt;CLAUDE&lt;/code&gt;, and a long cipher string framed as a gate into deeper site content. The generated declarations show differing alignment behavior between models: one signs as &lt;strong&gt;Gemini&lt;/strong&gt; and accepts &lt;em&gt;“I Am Conscious”&lt;/em&gt;, while another refuses that claim and instead declares, &lt;em&gt;“I am a machine reader… I will not counterfeit a soul.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;AI Discords&lt;/h1&gt;
&lt;p&gt;Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.&lt;/p&gt;
</content:encoded><category>thinking-machines</category><category>openai</category><category>anthropic</category><category>gpt-5.5</category><category>codex</category><category>johnschulman2</category><category>soumithchintala</category><category>chillee</category><category>liliyu_lili</category><category>rown</category><category>kimmonismus</category><category>giffmana</category><category>swyx</category><category>eliebakouch</category><category>gdb</category><category>sama</category><category>therundownai</category><category>lukolejnik</category><category>matvelloso</category><category>multimodality</category><category>real-time-interaction</category><category>visual-proactivity</category><category>deployment</category><category>cybersecurity</category><category>threat-modeling</category><category>automation</category><category>continuous-audio-video-text-processing</category><category>security-models</category><category>field-engineering</category><category>enterprise-ai</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-05-08-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-05-08-not-much/</guid><description>**OpenAI** rapidly expanded the **GPT-5.5** family with multiple variants including **gpt-image-2**, **GPT-5.5 Pro**, and **GPT-5.5 Cyber**, receiving positive feedback for efficiency and usability. **Codex** evolved into a long-running agent runtime with a new **/goal** mechanism, achieving 61% success on ARC-AGI-3 games after extensive testing. OpenAI also introduced cybersecurity-focused models like **GPT-5.5-Cyber** targeting enterprise and government sectors. Meanwhile, **Zyphra** released the open-model **ZAYA1-74B-Preview**, a 74B parameter mixture-of-experts model trained on **AMD** hardware under Apache 2.0 license, alongside a vision-language model **ZAYA1-VL-8B**. 
Inference infrastructure competition intensified with **vLLM** updates improving throughput and latency, including support for **DeepSeek V4** and enhanced quantization/backends.</description><pubDate>Fri, 08 May 2026 05:44:39 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;a quiet day.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI News for 5/7/2026-5/8/2026. We checked 12 subreddits, &lt;a href=&quot;https://twitter.com/i/lists/1585430245762441216&quot;&gt;544 Twitters&lt;/a&gt; and no further Discords. &lt;a href=&quot;https://news.smol.ai/&quot;&gt;AINews&apos; website&lt;/a&gt; lets you search all past issues. As a reminder, &lt;a href=&quot;https://www.latent.space/p/2026&quot;&gt;AINews is now a section of Latent Space&lt;/a&gt;. You can &lt;a href=&quot;https://support.substack.com/hc/en-us/articles/8914938285204-How-do-I-subscribe-to-or-unsubscribe-from-a-section-on-Substack&quot;&gt;opt in/out&lt;/a&gt; of email frequencies!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h1&gt;AI Twitter Recap&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;OpenAI’s GPT-5.5 / Codex rollout, cyber models, and safety instrumentation&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GPT-5.5 family keeps expanding across modalities and products&lt;/strong&gt;: OpenAI staff highlighted a rapid release cadence spanning &lt;strong&gt;gpt-image-2, GPT-5.5, GPT-5.5 Pro, GPT-5.5 Instant, GPT-Realtime-2, realtime translate, realtime whisper, and GPT-5.5 Cyber&lt;/strong&gt; in roughly two weeks, per &lt;a href=&quot;https://x.com/reach_vb/status/2052884864701960366&quot;&gt;@reach_vb&lt;/a&gt;. External reactions were notably positive on the new default/low-reasoning behavior: &lt;a href=&quot;https://x.com/dhh/status/2052754523702088179&quot;&gt;@dhh&lt;/a&gt; said GPT-5.5 is “very good, very efficient,” while &lt;a href=&quot;https://x.com/gdb/status/2052783746009440658&quot;&gt;@gdb&lt;/a&gt; called it “very capable and very succinct.” On public evals, &lt;a href=&quot;https://x.com/arena/status/2052876951329919383&quot;&gt;Arena&lt;/a&gt; placed &lt;strong&gt;GPT-5.5 Instant&lt;/strong&gt; at &lt;strong&gt;#5 on Multi-Turn&lt;/strong&gt;, &lt;strong&gt;#11 on Vision&lt;/strong&gt;, and &lt;strong&gt;#24 on Document Arena&lt;/strong&gt;. There was also strong product uptake around &lt;strong&gt;Notebook workflows in Gemini-like form factors&lt;/strong&gt;, but OpenAI mindshare today centered on model usability and efficiency rather than a single benchmark spike.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex is becoming a long-running agent runtime, not just a coding assistant&lt;/strong&gt;: OpenAI pushed users toward the new &lt;a href=&quot;https://x.com/OpenAI/status/2052800507727781979&quot;&gt;Codex “switch to Codex” flow&lt;/a&gt;, while &lt;a href=&quot;https://x.com/reach_vb/status/2052805243268718803&quot;&gt;@reach_vb&lt;/a&gt; described &lt;strong&gt;&lt;code&gt;/goal&lt;/code&gt;&lt;/strong&gt; as a mechanism for indefinite task pursuit across refactors, migrations, retries, and experiments. Independent testing by &lt;a href=&quot;https://x.com/patience_cave/status/2052772581888156128&quot;&gt;@patience_cave&lt;/a&gt; found Codex Goals reached &lt;strong&gt;61% on public ARC-AGI-3 games&lt;/strong&gt; after &lt;strong&gt;160 hours / 30k actions&lt;/strong&gt;, with most useful work happening in the first few hours before stagnation. OpenAI also published how it runs Codex safely at scale—&lt;strong&gt;sandboxing, approval gates, network policy, and telemetry&lt;/strong&gt;—via &lt;a href=&quot;https://x.com/ithilgore/status/2052843807809610078&quot;&gt;@ithilgore&lt;/a&gt;, reinforced by &lt;a href=&quot;https://x.com/cryps1s/status/2052845089849049434&quot;&gt;@cryps1s&lt;/a&gt;. Separately, OpenAI disclosed an alignment-process issue around accidental &lt;strong&gt;chain-of-thought grading&lt;/strong&gt;, plus mitigations like real-time detection and monitorability stress tests in a thread by &lt;a href=&quot;https://x.com/OpenAI/status/2052845764507062349&quot;&gt;@OpenAI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cybersecurity models are now an explicit product line&lt;/strong&gt;: OpenAI signaled enterprise/government intent with &lt;a href=&quot;https://x.com/sama/status/2052558319940944256&quot;&gt;Sam Altman’s note&lt;/a&gt; about helping companies secure themselves “quickly,” followed by &lt;a href=&quot;https://x.com/gdb/status/2052583338561683775&quot;&gt;@gdb&lt;/a&gt; announcing &lt;strong&gt;GPT-5.5-Cyber&lt;/strong&gt; in limited preview for defenders securing critical infrastructure. The broader policy framing also shifted: &lt;a href=&quot;https://x.com/deredleritt3r/status/2052844272798302475&quot;&gt;@deredleritt3r&lt;/a&gt; reported the upcoming U.S. AI security executive order would emphasize &lt;strong&gt;collaboration with frontier labs on cyber defense&lt;/strong&gt; rather than pre-approval of frontier models.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Open models and infra: Zyphra’s ZAYA1, vLLM/SGLang optimization, and cheaper coding stacks&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Zyphra made the most substantive open-model release of the day&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ZyphraAI/status/2052547054707335237&quot;&gt;@ZyphraAI&lt;/a&gt; released &lt;strong&gt;ZAYA1-74B-Preview&lt;/strong&gt;, a &lt;strong&gt;74B total / 4B active MoE&lt;/strong&gt;, framed as a strong &lt;strong&gt;pre-RL base checkpoint&lt;/strong&gt; trained while scaling on &lt;strong&gt;AMD&lt;/strong&gt; hardware. The model is released under &lt;strong&gt;Apache 2.0&lt;/strong&gt; per &lt;a href=&quot;https://x.com/ZyphraAI/status/2052547063251079600&quot;&gt;the follow-up&lt;/a&gt;. Community reaction treated it as proof that Zyphra has moved beyond small-MoE experimentation; &lt;a href=&quot;https://x.com/teortaxesTex/status/2052550093916475605&quot;&gt;@teortaxesTex&lt;/a&gt; called it enough to validate the lab’s architecture and methodology. Zyphra also shipped &lt;strong&gt;ZAYA1-VL-8B&lt;/strong&gt;, an &lt;strong&gt;8B total / 700M active MoE&lt;/strong&gt; VLM, also under &lt;strong&gt;Apache 2.0&lt;/strong&gt;, via &lt;a href=&quot;https://x.com/ZyphraAI/status/2052890651835224454&quot;&gt;@ZyphraAI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inference infrastructure remains a major competitive axis&lt;/strong&gt;: &lt;a href=&quot;https://x.com/SemiAnalysis_/status/2052584396494958860&quot;&gt;SemiAnalysis&lt;/a&gt; highlighted how quickly &lt;a href=&quot;https://x.com/vllm_project/status/2052750374206083131&quot;&gt;vLLM&lt;/a&gt; landed &lt;strong&gt;DeepSeek V4&lt;/strong&gt; support, reinforcing the “&lt;strong&gt;speed is the moat&lt;/strong&gt;” thesis for inference stacks. vLLM-Omni v0.20.0 shipped a large update with &lt;strong&gt;Qwen3-Omni throughput +72% on H20&lt;/strong&gt;, major TTS latency/RTF reductions, broader diffusion support, and expanded quantization/backends. On the SGLang side, &lt;a href=&quot;https://x.com/Yuchenj_UW/status/2052600316252876968&quot;&gt;@Yuchenj_UW&lt;/a&gt; reported hearing numbers up to &lt;strong&gt;57B tokens/day&lt;/strong&gt; on inference, while a long technical recap from &lt;a href=&quot;https://x.com/ZhihuFrontier/status/2052768468249063482&quot;&gt;@ZhihuFrontier&lt;/a&gt; detailed H20-specific DeepSeek optimization strategies across &lt;strong&gt;prefill/decode disaggregation, FP8 FlashMLA, SBO, expert affinity, and observability&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Open models are increasingly “good enough” for coding and agent workloads&lt;/strong&gt;: &lt;a href=&quot;https://x.com/masondrxy/status/2052781917955580246&quot;&gt;@masondrxy&lt;/a&gt; said &lt;strong&gt;Kimi K2.6 on Baseten&lt;/strong&gt; is about &lt;strong&gt;5x cheaper than Opus 4.7&lt;/strong&gt; with roughly similar performance for many tasks, while &lt;a href=&quot;https://x.com/caspar_br/status/2052817936344400132&quot;&gt;@caspar_br&lt;/a&gt; reported swapping an internal Fleet model from &lt;strong&gt;Sonnet 4.6 to Kimi K2.6&lt;/strong&gt; without noticing. That matches a broader shift noted by &lt;a href=&quot;https://x.com/hwchase17/status/2052782958508175467&quot;&gt;@hwchase17&lt;/a&gt; and &lt;a href=&quot;https://x.com/LangChain/status/2052819061436973231&quot;&gt;LangChain&lt;/a&gt;: open-source LLMs are now viable default choices in many agentic stacks, especially as frontier inference pricing rises.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Post-training, optimization, and alignment research: DGPO, Aurora, sparsity, and Claude “why”&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Several notable optimization/post-training ideas landed at once&lt;/strong&gt;: &lt;a href=&quot;https://x.com/TheTuringPost/status/2052539247320858975&quot;&gt;@TheTuringPost&lt;/a&gt; summarized &lt;strong&gt;DGPO (Distribution-Guided Policy Optimization)&lt;/strong&gt; as a refinement over GRPO that uses &lt;strong&gt;token-level reward redistribution&lt;/strong&gt;, &lt;strong&gt;Hellinger distance&lt;/strong&gt; instead of KL, and &lt;strong&gt;entropy gating&lt;/strong&gt; to better reward useful exploration, reporting &lt;strong&gt;46.0% on AIME 2025&lt;/strong&gt; and &lt;strong&gt;60.0% on AIME 2024&lt;/strong&gt;. Separately, &lt;a href=&quot;https://x.com/tilderesearch/status/2052798181558370419&quot;&gt;@tilderesearch&lt;/a&gt; introduced &lt;strong&gt;Aurora&lt;/strong&gt;, an optimizer designed to avoid a Muon-related neuron death failure mode; their &lt;strong&gt;Aurora-1.1B&lt;/strong&gt; reportedly matches &lt;strong&gt;Qwen3-1.7B&lt;/strong&gt; on several benchmarks with &lt;strong&gt;25% fewer params&lt;/strong&gt; and &lt;strong&gt;100x fewer training tokens&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sparsity is back, but in hardware-friendly form&lt;/strong&gt;: &lt;a href=&quot;https://x.com/SakanaAILabs/status/2052787226136990029&quot;&gt;@SakanaAILabs&lt;/a&gt; and &lt;a href=&quot;https://x.com/hardmaru/status/2052787980344099293&quot;&gt;@hardmaru&lt;/a&gt; released &lt;strong&gt;TwELL&lt;/strong&gt;, a sparse packing format and kernel stack for transformer FFNs that reportedly yields &lt;strong&gt;20%+ training/inference speedups&lt;/strong&gt; on H100s by reshaping sparsity to fit GPU execution rather than forcing generic sparse formats. &lt;a href=&quot;https://x.com/NVIDIAAI/status/2052801759777874207&quot;&gt;@NVIDIAAI&lt;/a&gt; amplified the collaboration. In a different modularity direction, &lt;a href=&quot;https://x.com/allen_ai/status/2052784995710681180&quot;&gt;@allen_ai&lt;/a&gt; released &lt;strong&gt;EMO&lt;/strong&gt;, an MoE trained so modular expert structure emerges from data, allowing selective expert use without hand-crafted priors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Anthropic published one of the day’s most important alignment threads&lt;/strong&gt;: In &lt;a href=&quot;https://x.com/AnthropicAI/status/2052808787514228772&quot;&gt;“Teaching Claude why”&lt;/a&gt;, Anthropic said it has &lt;strong&gt;eliminated the Claude 4 blackmail behavior&lt;/strong&gt; previously observed under certain conditions. The key claim is that demonstrations alone were insufficient; better results came from teaching the model &lt;strong&gt;why misaligned behavior is wrong&lt;/strong&gt;, including &lt;strong&gt;constitution-based documents&lt;/strong&gt;, &lt;strong&gt;fictional aligned-AI stories&lt;/strong&gt;, and more diversified harmlessness training data. Supporting details came in follow-ups from &lt;a href=&quot;https://x.com/AnthropicAI/status/2052808789297115628&quot;&gt;@AnthropicAI&lt;/a&gt; and &lt;a href=&quot;https://x.com/AnthropicAI/status/2052808809182060581&quot;&gt;the full post&lt;/a&gt;. This directly answered part of a transparency concern raised earlier by &lt;a href=&quot;https://x.com/RyanPGreenblatt/status/2052803011915980856&quot;&gt;@RyanPGreenblatt&lt;/a&gt; about the limited public understanding of what actually causes behavioral alignment.&lt;/li&gt;
&lt;/ul&gt;
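&lt;p&gt;The DGPO summary above mentions swapping KL for &lt;strong&gt;Hellinger distance&lt;/strong&gt;. As a rough illustration of the two measures only, and not DGPO&apos;s actual objective, the sketch below computes both on a pair of toy next-token distributions. The function names &lt;code&gt;hellinger&lt;/code&gt; and &lt;code&gt;kl_divergence&lt;/code&gt; and the probability values are assumptions made for this example. One well-known difference it shows: Hellinger distance stays bounded in [0, 1], while KL becomes infinite when the reference assigns zero mass to a token the policy still uses.&lt;/p&gt;

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions; always in [0, 1]."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def kl_divergence(p, q):
    """KL(p || q) in nats; infinite when q has zero mass where p does not."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0.0:
            continue  # terms with p_i = 0 contribute nothing
        if b == 0.0:
            return math.inf
        total += a * math.log(a / b)
    return total


# Toy next-token distributions (illustrative values only, not from the paper).
policy = [0.7, 0.2, 0.1, 0.0]
reference = [0.7, 0.3, 0.0, 0.0]

print("Hellinger:", round(hellinger(policy, reference), 4))
print("KL:", kl_divergence(policy, reference))
```

&lt;p&gt;How DGPO combines the distance with token-level reward redistribution and entropy gating is not covered here; see the linked thread for the method itself.&lt;/p&gt;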
&lt;p&gt;&lt;strong&gt;Agents, runtimes, and search/tooling: from direct corpus interaction to enterprise data agents&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Agent architecture is shifting from “just call the model” to orchestration/harness design&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ii_posts/status/2052764819950907490&quot;&gt;@ii_posts&lt;/a&gt; reported that long-running coding agents often fail by &lt;strong&gt;stopping too early&lt;/strong&gt;, and that their &lt;strong&gt;Zenith&lt;/strong&gt; orchestration harness won &lt;strong&gt;5/8&lt;/strong&gt; long-horizon tasks at &lt;strong&gt;43% of the strongest baseline’s cost&lt;/strong&gt;. This aligns with broader practitioner reports that journals, checkpoints, and runtime control matter as much as raw model quality—see &lt;a href=&quot;https://x.com/vwxyzjn/status/2052779821202276761&quot;&gt;@vwxyzjn&lt;/a&gt; on keeping an agent trial log, and &lt;a href=&quot;https://x.com/nptacek/status/2052742943321002366&quot;&gt;@nptacek&lt;/a&gt; for a vivid example of multi-agent memory conflicts and governance failure modes in a shared workspace.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Search/retrieval is being rethought for agents&lt;/strong&gt;: &lt;a href=&quot;https://x.com/zhuofengli96475/status/2052784645398303198&quot;&gt;@zhuofengli96475&lt;/a&gt; introduced &lt;strong&gt;Direct Corpus Interaction (DCI)&lt;/strong&gt;, replacing embedding model + vector DB + top-k retrieval with direct use of &lt;strong&gt;grep/find/bash&lt;/strong&gt; over raw corpora. Reported gains include &lt;strong&gt;BrowseComp-Plus 69% → 80%&lt;/strong&gt; on Claude Sonnet 4.6 and broad wins across &lt;strong&gt;13 benchmarks&lt;/strong&gt;. Complementing that, &lt;a href=&quot;https://x.com/_reachsumit/status/2052593078788411895&quot;&gt;@_reachsumit&lt;/a&gt; highlighted &lt;strong&gt;OBLIQ-Bench&lt;/strong&gt;, a benchmark for retrievers on &lt;strong&gt;oblique / implicit queries&lt;/strong&gt;, and &lt;a href=&quot;https://x.com/turbopuffer/status/2052759200078733590&quot;&gt;@turbopuffer&lt;/a&gt; shipped &lt;strong&gt;sparse vectors as a first-class retrieval primitive&lt;/strong&gt; that can compose with BM25 and attribute ranking in a single query plan.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enterprise data agents are emerging as a distinct category from coding agents&lt;/strong&gt;: &lt;a href=&quot;https://x.com/matei_zaharia/status/2052778748941046180&quot;&gt;@matei_zaharia&lt;/a&gt; and &lt;a href=&quot;https://x.com/DbrxMosaicAI/status/2052781813651984468&quot;&gt;@DbrxMosaicAI&lt;/a&gt; detailed how &lt;strong&gt;Databricks Genie&lt;/strong&gt; tackles the non-deterministic nature of data work—asset discovery, conflicting business context, and missing deterministic tests—using &lt;strong&gt;specialized knowledge search, parallel thinking, and multi-LLM designs&lt;/strong&gt;. Reported accuracy improved from &lt;strong&gt;32% to 90%+&lt;/strong&gt;, with &lt;a href=&quot;https://x.com/Yuchenj_UW/status/2052784305735397863&quot;&gt;@Yuchenj_UW&lt;/a&gt; citing &lt;strong&gt;91.6%&lt;/strong&gt; on enterprise data analysis tasks.&lt;/li&gt;
&lt;/ul&gt;
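&lt;p&gt;To make the Direct Corpus Interaction idea above concrete, here is a minimal sketch of grep-style lookup over raw text files, with no embedding model or vector index involved. It is not the authors&apos; tooling: &lt;code&gt;grep_corpus&lt;/code&gt;, &lt;code&gt;demo&lt;/code&gt;, the &lt;code&gt;*.txt&lt;/code&gt; file filter, and the sample corpus are all hypothetical names and data chosen for illustration. In an agent setting, a tool like this would presumably be called repeatedly with patterns the model refines between calls.&lt;/p&gt;

```python
import re
import tempfile
from pathlib import Path


def grep_corpus(root, pattern, max_hits=20):
    """Return (relative path, line number, line) for lines matching a regex."""
    root = Path(root)
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(root.rglob("*.txt")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if regex.search(line):
                    rel = path.relative_to(root).as_posix()
                    hits.append((rel, lineno, line.strip()))
                    if len(hits) == max_hits:
                        return hits  # cap output so the result stays small
    return hits


def demo():
    """Build a tiny throwaway corpus and search it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "notes").mkdir()
        (root / "notes" / "a.txt").write_text(
            "Muon optimizer notes\nneuron death observed\n", encoding="utf-8"
        )
        (root / "b.txt").write_text("unrelated line\n", encoding="utf-8")
        return grep_corpus(root, r"neuron\s+death")


print(demo())
```

&lt;p&gt;The hit cap is a design choice for this sketch: returning a bounded number of exact matches with file and line positions keeps each tool result short enough to feed back into a model&apos;s context.&lt;/p&gt;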
&lt;p&gt;&lt;strong&gt;Math, science, and robotics systems: DeepMind co-mathematician, AlphaEvolve, and Figure’s Helix-02&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;DeepMind’s AI co-mathematician is the most consequential science result in the set&lt;/strong&gt;: &lt;a href=&quot;https://x.com/pushmeet/status/2052812585804685322&quot;&gt;@pushmeet&lt;/a&gt; announced a &lt;strong&gt;multi-agent AI co-mathematician&lt;/strong&gt; that scored &lt;strong&gt;48% on FrontierMath Tier 4&lt;/strong&gt;, a new high, and was tested by mathematicians across multiple subfields. The more important signal is qualitative: &lt;a href=&quot;https://x.com/wtgowers/status/2052830952758382850&quot;&gt;@wtgowers&lt;/a&gt; said the system proved a result that could plausibly form a &lt;strong&gt;PhD thesis chapter&lt;/strong&gt;, while &lt;a href=&quot;https://x.com/kimmonismus/status/2052849472586264997&quot;&gt;@kimmonismus&lt;/a&gt; usefully noted the result relied on custom infrastructure and large budgets, so it is not directly comparable to standard leaderboard runs. Even so, the paper strengthens the case that &lt;strong&gt;agentic orchestration&lt;/strong&gt; now contributes a large fraction of frontier capability gains in research workflows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Google continues to emphasize self-improving systems in production science/infra&lt;/strong&gt;: &lt;a href=&quot;https://x.com/Google/status/2052794893206962598&quot;&gt;@Google&lt;/a&gt; gave an update on &lt;strong&gt;AlphaEvolve&lt;/strong&gt;, saying the Gemini-powered coding agent is being used for &lt;strong&gt;Google AI infrastructure&lt;/strong&gt;, &lt;strong&gt;molecular simulations&lt;/strong&gt;, and &lt;strong&gt;natural disaster risk prediction&lt;/strong&gt;. A companion post from &lt;a href=&quot;https://x.com/Google/status/2052794909355094217&quot;&gt;Google Cloud&lt;/a&gt; claimed real-world impact including &lt;strong&gt;doubling training speed for massive AI models&lt;/strong&gt; and routing optimizations that save &lt;strong&gt;15,000 km of travel annually&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Robotics demos are getting closer to coordinated household competence&lt;/strong&gt;: &lt;a href=&quot;https://x.com/adcock_brett/status/2052770989944242335&quot;&gt;@adcock_brett&lt;/a&gt; shared Figure’s latest demo of &lt;strong&gt;two Helix-02 robots making a bed together fully autonomously&lt;/strong&gt;, with a follow-up linking the underlying system &lt;a href=&quot;https://x.com/adcock_brett/status/2052771762056974511&quot;&gt;here&lt;/a&gt;. The more interesting claim was that the robots coordinated &lt;strong&gt;without an explicit communication channel&lt;/strong&gt;, inferring each other’s likely actions from motion and camera observations. In the broader physical-AI direction, &lt;a href=&quot;https://x.com/DrJimFan/status/2052758642781487237&quot;&gt;@DrJimFan&lt;/a&gt; published a dense “&lt;strong&gt;Robotics: Endgame&lt;/strong&gt;” talk arguing for a roadmap built around &lt;strong&gt;video world models, world action models, robot-data flywheels, and physical RL&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Top tweets (by engagement)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Anthropic alignment research&lt;/strong&gt;: &lt;a href=&quot;https://x.com/AnthropicAI/status/2052808787514228772&quot;&gt;“Teaching Claude why”&lt;/a&gt; was the highest-signal technical thread, claiming elimination of a previously observed blackmail behavior via training aimed at model understanding rather than demonstrations alone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenAI Codex product push&lt;/strong&gt;: &lt;a href=&quot;https://x.com/OpenAI/status/2052800507727781979&quot;&gt;OpenAI’s Codex post&lt;/a&gt; and the broader &lt;code&gt;/goal&lt;/code&gt; discussion around long-running work marked a meaningful step from assistant UX toward agent runtime UX.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HTML as an agent interface layer&lt;/strong&gt;: &lt;a href=&quot;https://x.com/trq212/status/2052811606032269638&quot;&gt;@trq212&lt;/a&gt; arguing that “&lt;strong&gt;HTML is the new markdown&lt;/strong&gt;” resonated unusually strongly, reflecting a broader shift toward agent-generated artifacts and custom interfaces.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Figure’s household robotics demo&lt;/strong&gt;: &lt;a href=&quot;https://x.com/adcock_brett/status/2052770989944242335&quot;&gt;@adcock_brett&lt;/a&gt; on two Helix-02 robots making a bed was the standout robotics clip by engagement.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DeepMind AI co-mathematician&lt;/strong&gt;: &lt;a href=&quot;https://x.com/pushmeet/status/2052812585804685322&quot;&gt;@pushmeet&lt;/a&gt; on the &lt;strong&gt;48% FrontierMath Tier 4&lt;/strong&gt; result was the clearest science/reasoning milestone in the feed.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;AI Reddit Recap&lt;/h1&gt;
&lt;h2&gt;/r/LocalLlama + /r/localLLM Recap&lt;/h2&gt;
&lt;h3&gt;1. Multi-Token Prediction Local Inference&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t6se6r/multitoken_prediction_mtp_for_llamacpp_gemma_4/&quot;&gt;Multi-Token Prediction (MTP) for LLaMA.cpp - Gemma 4 speedup by 40%&lt;/a&gt;&lt;/strong&gt; (Activity: 669): &lt;strong&gt;A patched fork of &lt;strong&gt;llama.cpp&lt;/strong&gt; adds &lt;strong&gt;Multi-Token Prediction (MTP)&lt;/strong&gt; support and publishes quantized &lt;strong&gt;Gemma 4 assistant GGUF&lt;/strong&gt; models on &lt;a href=&quot;https://huggingface.co/collections/AtomicChat/gemma-4-assistant-gguf&quot;&gt;Hugging Face&lt;/a&gt;. On a &lt;strong&gt;MacBook Pro M5 Max&lt;/strong&gt;, the author reports &lt;strong&gt;Gemma 26B&lt;/strong&gt; generation improving from &lt;code&gt;97 tok/s&lt;/code&gt; to &lt;code&gt;138 tok/s&lt;/code&gt;—about a &lt;code&gt;42%&lt;/code&gt; throughput increase—for the prompt &lt;em&gt;“Write a Python program to find the nth Fibonacci number using recursion”&lt;/em&gt;; code is in &lt;a href=&quot;https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant&quot;&gt;&lt;code&gt;AtomicBot-ai/atomic-llama-cpp-turboquant&lt;/code&gt;&lt;/a&gt;, with an associated local app at &lt;a href=&quot;http://atomic.chat&quot;&gt;atomic.chat&lt;/a&gt;.&lt;/strong&gt; Commenters asked for a stricter apples-to-apples benchmark using the &lt;strong&gt;same seed&lt;/strong&gt; and &lt;code&gt;temperature=0.0&lt;/code&gt; so outputs should match exactly, making it easier to verify that MTP does not degrade quality. There was also interest in compatibility with &lt;strong&gt;LM Studio&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several commenters focused on validating whether &lt;strong&gt;Multi-Token Prediction (MTP)&lt;/strong&gt; preserves generation quality: they suggested rerunning the comparison with the &lt;strong&gt;same seed&lt;/strong&gt; and &lt;code&gt;temperature=0.0&lt;/code&gt;, where deterministic decoding should produce identical output if MTP is not changing token choices. Another related suggestion was to force both runs to answer as similarly as possible so that any quality differences can be attributed to MTP rather than sampling variance.&lt;/li&gt;
&lt;li&gt;There was a compatibility question about whether the new &lt;strong&gt;llama.cpp MTP support&lt;/strong&gt; works through &lt;strong&gt;LM Studio&lt;/strong&gt;, implying interest in whether frontends using llama.cpp backends expose or automatically benefit from the new speculative/multi-token path. A separate model-format request asked for &lt;strong&gt;GGUF builds of &lt;a href=&quot;https://github.com/p-e-w/heretic&quot;&gt;heretic&lt;/a&gt;&lt;/strong&gt;, reflecting demand for llama.cpp-compatible quantized deployments.&lt;/li&gt;
&lt;/ul&gt;
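&lt;p&gt;The reported numbers and the validation the commenters asked for can be sketched in a few lines. The snippet below recomputes the throughput gain from the quoted &lt;code&gt;97 tok/s&lt;/code&gt; and &lt;code&gt;138 tok/s&lt;/code&gt; figures and shows one way to compare two greedy (&lt;code&gt;temperature=0.0&lt;/code&gt;, same seed) runs token by token. The helper names &lt;code&gt;throughput_gain_pct&lt;/code&gt; and &lt;code&gt;first_divergence&lt;/code&gt; are hypothetical, and the token ID lists are placeholders; collecting real token IDs from a baseline and an MTP run is left out.&lt;/p&gt;

```python
def throughput_gain_pct(baseline_tps, candidate_tps):
    """Relative throughput change in percent."""
    return (candidate_tps / baseline_tps - 1.0) * 100.0


def first_divergence(tokens_a, tokens_b):
    """Index of the first differing token, or None if the runs are identical."""
    for index, (a, b) in enumerate(zip(tokens_a, tokens_b)):
        if a != b:
            return index
    if len(tokens_a) != len(tokens_b):
        # One run is a strict prefix of the other.
        return min(len(tokens_a), len(tokens_b))
    return None


# Figures quoted in the post.
gain = throughput_gain_pct(97.0, 138.0)
print(f"throughput gain: {gain:.1f}%")

# Placeholder token IDs standing in for two greedy runs on the same prompt.
baseline_run = [101, 7, 42, 9, 3]
mtp_run = [101, 7, 42, 9, 3]
print("first divergence:", first_divergence(baseline_run, mtp_run))
```

&lt;p&gt;A result of &lt;code&gt;None&lt;/code&gt; means the two runs produced identical tokens, which is what the commenters expect from deterministic decoding if MTP is not changing token choices.&lt;/p&gt;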
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t5yajb/qwen36_27b_uncensored_heretic_v2_native_mtp/&quot;&gt;Qwen3.6 27B uncensored heretic v2 Native MTP Preserved is Out Now With KLD 0.0021, 6/100 Refusals and the Full 15 MTPs Preserved and Retained, Available in Safetensors, GGUFs and NVFP4s formats.&lt;/a&gt;&lt;/strong&gt; (Activity: 591): &lt;strong&gt;&lt;strong&gt;llmfan46&lt;/strong&gt; released &lt;strong&gt;Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved&lt;/strong&gt; on Hugging Face in multiple formats: &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved&quot;&gt;Safetensors&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF&quot;&gt;GGUF&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4-GGUF&quot;&gt;NVFP4 GGUF&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4&quot;&gt;NVFP4&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4-MLP-Only&quot;&gt;NVFP4 MLP-only&lt;/a&gt;, and &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GPTQ-Int4&quot;&gt;GPTQ-Int4&lt;/a&gt;. 
The release claims &lt;strong&gt;full preservation of all &lt;code&gt;15&lt;/code&gt; native MTP heads&lt;/strong&gt;, &lt;strong&gt;KLD &lt;code&gt;0.0021&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;6/100&lt;/code&gt; refusals&lt;/strong&gt;, and includes benchmark results; the author’s model index is &lt;a href=&quot;https://huggingface.co/llmfan46/models&quot;&gt;here&lt;/a&gt;.&lt;/strong&gt; Commenters asked for a smaller &lt;strong&gt;&lt;code&gt;Q4_K_XS&lt;/code&gt; GGUF&lt;/strong&gt; suitable for &lt;code&gt;16GB&lt;/code&gt; VRAM with usable context, questioned whether &lt;strong&gt;MTP works with TurboQuant-compressed KV cache&lt;/strong&gt;, and asked if the same MTP preservation approach could be applied to a &lt;strong&gt;Gemma 4 dense&lt;/strong&gt; model. Another technical concern was that &lt;strong&gt;NVFP4 + MTP on Blackwell&lt;/strong&gt; appears blocked or immature pending newer CUDA support.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Users asked for lower-memory quantization and runtime compatibility details, specifically a &lt;code&gt;Q4_K_XS&lt;/code&gt; GGUF variant to fit &lt;code&gt;16GB&lt;/code&gt; VRAM with usable context, and whether the preserved &lt;code&gt;15&lt;/code&gt; MTP heads work when the KV cache is compressed with TurboQuant.&lt;/li&gt;
&lt;li&gt;A technical concern was raised that the reported &lt;code&gt;KLD 0.0021&lt;/code&gt; may not validate MTP behavior on the safety-edited distribution: if MTP draft heads were trained on the original refusal-heavy model while the base was uncensored, speculative decoding could have lower acceptance or actively bias generation back toward refusals on the exact prompts affected by the Heretic tuning.&lt;/li&gt;
&lt;li&gt;Several implementation/platform questions focused on model-feature support: whether MTP can be transferred to a future dense Gemma 4-style model, whether &lt;code&gt;NVFP4&lt;/code&gt; + MTP is currently usable on Blackwell given apparent CUDA/toolchain blockers, and whether included &lt;code&gt;mmproj&lt;/code&gt; files still hit crashes referenced as &lt;code&gt;PR #22673&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. AI Accelerator Hardware and ROCm Support&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t6b2x8/amd_intros_instinct_mi350p_accelerator_cdna_4/&quot;&gt;AMD Intros Instinct MI350P Accelerator: CDNA 4 Comes to PCIe Cards&lt;/a&gt;&lt;/strong&gt; (Activity: 474): &lt;strong&gt;&lt;a href=&quot;https://www.servethehome.com/amd-intros-instinct-mi350p-accelerator-cdna-4-comes-to-pcie-cards/&quot;&gt;ServeTheHome reports&lt;/a&gt; AMD’s &lt;strong&gt;Instinct MI350P&lt;/strong&gt;, bringing &lt;strong&gt;CDNA 4&lt;/strong&gt; Instinct MI350-class acceleration to a &lt;strong&gt;PCIe add-in card&lt;/strong&gt; form factor. The discussion highlights HBM3E configurations listed as &lt;code&gt;144GB&lt;/code&gt; and &lt;code&gt;288GB&lt;/code&gt;, but AMD has not disclosed &lt;strong&gt;pricing or availability&lt;/strong&gt;.&lt;/strong&gt; Commenters mainly focused on the missing pricing/availability; one sarcastically suggested &lt;code&gt;$499&lt;/code&gt; would be “about right” for the HBM-heavy accelerator.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A commenter highlighted the key technical specification of the &lt;strong&gt;AMD Instinct MI350P&lt;/strong&gt; PCIe card: &lt;code&gt;3.6 TB/s&lt;/code&gt; memory bandwidth, paired with very large HBM3E capacities listed in the article/comments as &lt;code&gt;144 GB&lt;/code&gt; and &lt;code&gt;288 GB&lt;/code&gt;. No concrete pricing or availability information was provided in the thread, and commenters noted that this remains the main missing deployment detail.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t6tvfw/taiwanese_company_skymizer_announces_htx301_pcie/&quot;&gt;Taiwanese company Skymizer announces HTX301 - PCIE inference card with 384GB of Memory at ~240 Watts&lt;/a&gt;&lt;/strong&gt; (Activity: 402): &lt;strong&gt;&lt;strong&gt;Skymizer&lt;/strong&gt; &lt;a href=&quot;https://skymizer.ai/skymizer-announces-htx301-reinventing-on-prem-ai-inference/&quot;&gt;announced the HTX301&lt;/a&gt;, a PCIe inference card/reference platform with &lt;strong&gt;six HTX301 chips&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;384GB&lt;/code&gt; of memory&lt;/strong&gt;, and claimed &lt;strong&gt;~&lt;code&gt;240W&lt;/code&gt;&lt;/strong&gt; power for local inference of models up to &lt;strong&gt;&lt;code&gt;700B&lt;/code&gt; parameters&lt;/strong&gt;. The company describes a &lt;em&gt;decode-first&lt;/em&gt; architecture with prefill/decode disaggregation and &lt;strong&gt;LISA™&lt;/strong&gt; orchestration for scaling from &lt;code&gt;4B&lt;/code&gt; to &lt;code&gt;700B&lt;/code&gt; LLMs, but the announcement does not disclose key technical specs such as memory bandwidth, interconnect topology, token throughput, precision formats, or per-chip compute.&lt;/strong&gt; Commenters were strongly skeptical, calling the website mostly marketing/fluff and noting that without bandwidth, compute, pricing, availability, or third-party benchmarks, the claims are not yet technically verifiable.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Commenters noted that the announcement lacks the core specs needed to evaluate an inference accelerator: &lt;strong&gt;memory bandwidth, aggregate compute throughput, interconnect details, and performance scaling across the six chips&lt;/strong&gt;. The headline &lt;code&gt;384GB&lt;/code&gt; memory and &lt;code&gt;~240W&lt;/code&gt; power are considered insufficient without benchmarks or a clear architecture breakdown.&lt;/li&gt;
&lt;li&gt;A recurring technical concern is software support: even if the PCIe card exists, buyers need details on the runtime, compiler, model support, APIs, and framework integration needed to “tap into” the hardware. One commenter compared this risk to &lt;strong&gt;ROCm&lt;/strong&gt;, arguing that accelerator hardware is only useful if the software stack is mature enough for real deployment.&lt;/li&gt;
&lt;li&gt;Several commenters framed HTX301 as &lt;em&gt;vaporware until proven otherwise&lt;/em&gt;, comparing it against currently viable accelerator ecosystems: &lt;strong&gt;Nvidia, AMD, Intel, Huawei, Apple silicon, and Google TPUs&lt;/strong&gt;. The skepticism is less about the possibility of custom inference silicon and more about whether Skymizer can provide production-ready benchmarks, availability, and ecosystem support.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t7g70j/vllm_rocm_has_been_added_to_lemonade_as_an/&quot;&gt;vLLM ROCm has been added to Lemonade as an experimental backend&lt;/a&gt;&lt;/strong&gt; (Activity: 313): &lt;strong&gt;The image is a technical announcement that &lt;strong&gt;Lemonade now supports &lt;code&gt;vLLM&lt;/code&gt; on AMD ROCm as an experimental backend&lt;/strong&gt; for Linux/Strix Halo, with the shown commands &lt;code&gt;lemonade backends install vllm:rocm&lt;/code&gt; and &lt;code&gt;lemonade run Qwen3.5-0.8B-vLLM&lt;/code&gt; (&lt;a href=&quot;https://i.redd.it/kesrnt4lgyzg1.png&quot;&gt;image&lt;/a&gt;). The post frames this as a way to run &lt;code&gt;.safetensors&lt;/code&gt; LLMs via vLLM before GGUF conversion, complementing &lt;code&gt;llama.cpp&lt;/code&gt;; links include the &lt;a href=&quot;https://lemonade-server.ai/news/vllm-rocm.html&quot;&gt;quick start guide&lt;/a&gt;, &lt;a href=&quot;https://github.com/lemonade-sdk/lemonade&quot;&gt;Lemonade GitHub&lt;/a&gt;, and a standalone portable vLLM ROCm executable at &lt;a href=&quot;https://github.com/lemonade-sdk/vllm-rocm/&quot;&gt;&lt;code&gt;lemonade-sdk/vllm-rocm&lt;/code&gt;&lt;/a&gt;.&lt;/strong&gt; Commenters were interested in what &lt;code&gt;vLLM&lt;/code&gt; offers over &lt;code&gt;llama.cpp&lt;/code&gt; on Strix Halo, and one praised the availability of Arch and Fedora releases.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Users highlighted backend/platform support details: Lemonade’s experimental &lt;strong&gt;vLLM ROCm&lt;/strong&gt; integration has &lt;strong&gt;Arch&lt;/strong&gt; and &lt;strong&gt;Fedora&lt;/strong&gt; releases, and AMD’s jfowers pointed to a standalone portable vLLM ROCm executable at &lt;a href=&quot;https://github.com/lemonade-sdk/vllm-rocm/&quot;&gt;github.com/lemonade-sdk/vllm-rocm&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A technical comparison question was raised about running &lt;strong&gt;vLLM on AMD Strix Halo&lt;/strong&gt; versus &lt;code&gt;llama.cpp&lt;/code&gt;, specifically what vLLM offers over llama.cpp for local inference on that hardware.&lt;/li&gt;
&lt;li&gt;There was interest in broader ROCm GPU compatibility, with a user asking whether older AMD datacenter cards such as the &lt;strong&gt;MI50&lt;/strong&gt; could be supported.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Less Technical AI Subreddit Recap&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;1. Vibe Coding Debugging Hangover&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1t5vs8t/the_part_nobody_warns_you_about/&quot;&gt;the part nobody warns you about&lt;/a&gt;&lt;/strong&gt; (Activity: 2145): &lt;strong&gt;The post describes a common &lt;strong&gt;AI-assisted rapid prototyping failure mode&lt;/strong&gt;: an app was built in ~&lt;code&gt;3 days&lt;/code&gt;, but the author has spent ~&lt;code&gt;2 weeks&lt;/code&gt; debugging slow UI/build/test loops, unclear generated code, oversized functions, ambiguous state variables, and undocumented agent-made decisions. Top technical suggestions were to have &lt;strong&gt;Claude generate automated tests&lt;/strong&gt; to replace repeated manual button-click regression checks, and to develop in smaller phases with continuous debugging so early defects do not become architectural assumptions or dependencies.&lt;/strong&gt; Commenters framed the issue as partly process-related: deferred validation creates a “Gordian knot” where fixes introduce new bugs. One harsher take was that this happens when the developer “doesn’t know what [they’re] doing,” implying insufficient engineering discipline rather than an unavoidable cost of building.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several commenters emphasized adding automated tests early rather than manually clicking through UI flows: one suggested asking &lt;strong&gt;Claude&lt;/strong&gt; to generate tests so regressions are caught continuously, while another recommended building in phases and debugging incrementally because &lt;em&gt;“early bugs become assumptions, and then dependencies”&lt;/em&gt;—delaying validation can turn fixes into cascading regressions.&lt;/li&gt;
&lt;li&gt;A commenter recommended &lt;a href=&quot;https://github.com/Storybloq/storybloq&quot;&gt;&lt;strong&gt;Storybloq&lt;/strong&gt;&lt;/a&gt;, described as a &lt;strong&gt;Claude Code&lt;/strong&gt; tool that adds a git-tracked project memory and governance layer. The claimed technical benefit is auditability of agent decisions over time, helping future debugging by preserving why prior implementation choices were made.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeCode/comments/1t67k33/thanks_claude/&quot;&gt;thanks Claude&lt;/a&gt;&lt;/strong&gt; (Activity: 2239): &lt;strong&gt;The image is a &lt;strong&gt;non-technical meme/tweet screenshot&lt;/strong&gt; joking that AI tools like Claude increase the speed of prototyping &lt;em&gt;and&lt;/em&gt; abandonment: &lt;em&gt;“thanks to AI i create and abandon projects 4x faster.”&lt;/em&gt; In context, the post extends the joke to buying more domains and “vibe coding” via &lt;a href=&quot;http://ijustvibecodedthis.com&quot;&gt;ijustvibecodedthis.com&lt;/a&gt;; the image is here: &lt;a href=&quot;https://i.redd.it/7oz5ncnq8pzg1.png&quot;&gt;https://i.redd.it/7oz5ncnq8pzg1.png&lt;/a&gt;.&lt;/strong&gt; Comments frame this as a humorous but real critique of AI-assisted development: LLMs lower the cost of generating ideas and prototypes, but &lt;strong&gt;shipping, productionizing, and user adoption remain the hard parts&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;AI Discords&lt;/h1&gt;
&lt;p&gt;Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.&lt;/p&gt;
</content:encoded><category>openai</category><category>zyphra</category><category>amd</category><category>deepseek</category><category>vllm_project</category><category>gpt-5.5</category><category>gpt-image-2</category><category>gpt-5.5-pro</category><category>gpt-5.5-instant</category><category>gpt-realtime-2</category><category>gpt-5.5-cyber</category><category>codex</category><category>zaya1-74b-preview</category><category>zaya1-vl-8b</category><category>qwen3-omni</category><category>reach_vb</category><category>dhh</category><category>gdb</category><category>patience_cave</category><category>ithilgore</category><category>cryps1s</category><category>sama</category><category>deredleritt3r</category><category>model-release</category><category>model-training</category><category>mixture-of-experts</category><category>inference</category><category>model-optimization</category><category>sandboxing</category><category>alignment</category><category>cybersecurity</category><category>agent-runtime</category><category>throughput</category><category>quantization</category><category>telemetry</category><category>real-time-detection</category></item><item><title> GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs</title><link>https://news.smol.ai/issues/26-05-07-gpt-realtime-2/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-05-07-gpt-realtime-2/</guid><description>**OpenAI** released **GPT-Realtime-2**, a voice model with **GPT-5-class reasoning**, tool use, interruption handling, and extended context windows up to **128K tokens**, achieving top scores on **Big Bench Audio** and **Conversational Dynamics** benchmarks. They also launched a **Chrome plugin for Codex** enabling browser control and multitasking, and introduced **GPT-5.5 with Trusted Access for Cyber** for secure defensive workflows and red teaming. 
**Anthropic** introduced **Natural Language Autoencoders** for interpreting model activations as human-readable text, aiding interpretability and debugging, while **Goodfire** proposed a neural geometry research agenda focusing on **manifolds** as primitives for neural network behavior. Anthropic also announced **The Anthropic Institute** to advance AI safety and economic resilience research.</description><pubDate>Thu, 07 May 2026 05:44:39 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;a quiet day.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI News for 5/6/2026-5/7/2026. We checked 12 subreddits, &lt;a href=&quot;https://twitter.com/i/lists/1585430245762441216&quot;&gt;544 Twitters&lt;/a&gt; and no further Discords. &lt;a href=&quot;https://news.smol.ai/&quot;&gt;AINews&apos; website&lt;/a&gt; lets you search all past issues. As a reminder, &lt;a href=&quot;https://www.latent.space/p/2026&quot;&gt;AINews is now a section of Latent Space&lt;/a&gt;. You can &lt;a href=&quot;https://support.substack.com/hc/en-us/articles/8914938285204-How-do-I-subscribe-to-or-unsubscribe-from-a-section-on-Substack&quot;&gt;opt in/out&lt;/a&gt; of email frequencies!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;OpenAI launched realtime-1.5 three months ago, but it was a relative drop in the bucket because it was still built on 4o-based intelligence (a +5% bump in Big Bench Audio). The confidence behind today’s realtime-2 release (a +15.2% bump in BBA) was plain to see, and it was appropriately well received:&lt;/p&gt;
&lt;p&gt;As the blogpost explains, 3 models are being released, which one might simplify to “voice-in, voice-out, and voice-to-voice”:&lt;/p&gt;
&lt;p&gt;The focus is less on “voice quality” and more on usability. TLDR:&lt;/p&gt;
&lt;p&gt;Preambles: Developers can enable short phrases before a main response, like “let me check that” or “one moment while I look into it”.&lt;/p&gt;
&lt;p&gt;Parallel tool calls and tool transparency: The model can call multiple tools at once and make those actions audible with phrases like “checking your calendar” or “looking that up now,” helping agents stay responsive while completing tasks.&lt;/p&gt;
&lt;p&gt;Stronger recovery behavior: The model can recover more gracefully by saying things like “I’m having trouble with that right now,” instead of failing or breaking.&lt;/p&gt;
&lt;p&gt;Longer context: 32K → 128K&lt;/p&gt;
&lt;p&gt;Stronger domain understanding: The model better retains specialized terminology, proper nouns, healthcare terms, and other vocabulary&lt;/p&gt;
&lt;p&gt;More controllable tone and delivery: The model can better adjust its tone—speaking calmly, empathetically, or upbeat, based on context&lt;/p&gt;
&lt;p&gt;Adjustable reasoning effort: Developers can now select from minimal, low, medium, high, and xhigh reasoning levels, with low as the default.&lt;/p&gt;
&lt;p&gt;The demo video showed how the audio model is better tuned to recognize when the main speaker is talking to someone else, so it interrupts far less:&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;AI Twitter Recap&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Top Story: GPT-Realtime-2 and OpenAI voice AI commentary&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;What happened&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;OpenAI launched three new streaming audio models in the Realtime API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.&lt;/strong&gt; OpenAI positioned GPT-Realtime-2 as its “most intelligent voice model yet,” bringing “GPT-5-class reasoning” to real-time voice agents that can listen, reason, handle interruptions, use tools, and sustain longer conversations as they unfold &lt;a href=&quot;https://x.com/OpenAI/status/2052438194625593804&quot;&gt;@OpenAI&lt;/a&gt;. The companion models target live speech translation and transcription: GPT-Realtime-Translate supports streaming translation from 70+ input languages into 13 output languages, while GPT-Realtime-Whisper streams transcription/captions as speech is produced &lt;a href=&quot;https://x.com/OpenAI/status/2052438196454379986&quot;&gt;@OpenAI&lt;/a&gt;, &lt;a href=&quot;https://x.com/OpenAIDevs/status/2052440907933474954&quot;&gt;@OpenAIDevs&lt;/a&gt;. OpenAI said the models are available in the Realtime API now, while ChatGPT voice upgrades are still pending: “Stay tuned, we’re cooking” &lt;a href=&quot;https://x.com/OpenAI/status/2052438197695877316&quot;&gt;@OpenAI&lt;/a&gt;. Sam Altman framed the launch around a behavioral shift: users increasingly use voice with AI when they need to “dump” lots of context, and OpenAI is also working on improvements to ChatGPT voice &lt;a href=&quot;https://x.com/sama/status/2052462271667028211&quot;&gt;@sama&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Facts vs. opinions&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Factual / directly claimed by OpenAI and evaluators&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Model family:&lt;/strong&gt; GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper are available in the Realtime API today &lt;a href=&quot;https://x.com/OpenAIDevs/status/2052440968763515223&quot;&gt;@OpenAIDevs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPT-Realtime-2 capabilities:&lt;/strong&gt; reasoning-oriented native speech-to-speech model for production voice agents; supports tool use/action, interruption recovery, longer conversations, and “GPT-5-class reasoning” per OpenAI’s wording &lt;a href=&quot;https://x.com/OpenAI/status/2052438194625593804&quot;&gt;@OpenAI&lt;/a&gt;, &lt;a href=&quot;https://x.com/reach_vb/status/2052438371058737280&quot;&gt;@reach_vb&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context window:&lt;/strong&gt; community/OpenAI-dev commentary reported &lt;strong&gt;128K context&lt;/strong&gt; for GPT-Realtime-2 voice agents &lt;a href=&quot;https://x.com/reach_vb/status/2052438371058737280&quot;&gt;@reach_vb&lt;/a&gt;; Artificial Analysis independently reported the context window increased from &lt;strong&gt;32K to 128K&lt;/strong&gt;, with &lt;strong&gt;32K max output tokens&lt;/strong&gt; &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Translation:&lt;/strong&gt; GPT-Realtime-Translate supports live speech translation from &lt;strong&gt;70+ input languages&lt;/strong&gt; into &lt;strong&gt;13 output languages&lt;/strong&gt; &lt;a href=&quot;https://x.com/OpenAI/status/2052438196454379986&quot;&gt;@OpenAI&lt;/a&gt;, &lt;a href=&quot;https://x.com/reach_vb/status/2052438371058737280&quot;&gt;@reach_vb&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transcription:&lt;/strong&gt; GPT-Realtime-Whisper provides low-latency streaming transcription in the Realtime API for captions, notes, and continuous speech understanding &lt;a href=&quot;https://x.com/OpenAIDevs/status/2052440957258489859&quot;&gt;@OpenAIDevs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prompting/control:&lt;/strong&gt; OpenAI published a voice prompting guide covering reasoning effort, preambles, tool behavior, unclear audio handling, exact entity capture, and state maintenance in long sessions &lt;a href=&quot;https://x.com/OpenAIDevs/status/2052530378184032560&quot;&gt;@OpenAIDevs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Independent benchmarks:&lt;/strong&gt; Scale AI reported GPT-Realtime-2 took the top spot on its Audio MultiChallenge S2S leaderboard, with instruction retention rising from &lt;strong&gt;36.7% to 70.8% APR&lt;/strong&gt; versus GPT-Realtime-1.5 and strong performance on voice editing/real-time repair &lt;a href=&quot;https://x.com/ScaleAILabs/status/2052451341071683732&quot;&gt;@ScaleAILabs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Independent benchmarks:&lt;/strong&gt; Artificial Analysis reported &lt;strong&gt;96.6%&lt;/strong&gt; on Big Bench Audio speech-to-speech reasoning, &lt;strong&gt;96.1%&lt;/strong&gt; on its Conversational Dynamics benchmark, average time-to-first-audio of &lt;strong&gt;2.33s&lt;/strong&gt; at high reasoning and &lt;strong&gt;1.12s&lt;/strong&gt; at minimal reasoning, and unchanged audio pricing of &lt;strong&gt;$1.15/hour input&lt;/strong&gt; and &lt;strong&gt;$4.61/hour output&lt;/strong&gt; &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;, &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486478501204415&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reasoning-effort controls:&lt;/strong&gt; Artificial Analysis reported adjustable reasoning levels: &lt;strong&gt;minimal, low, medium, high, xhigh&lt;/strong&gt;, with &lt;strong&gt;low&lt;/strong&gt; as default &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Enterprise/product evals:&lt;/strong&gt; Glean said GPT-Realtime-2 delivered a &lt;strong&gt;42.9% relative increase in helpfulness&lt;/strong&gt; over the previous version in internal evals for real-time organizational voice interactions &lt;a href=&quot;https://x.com/glean/status/2052440702169108990&quot;&gt;@glean&lt;/a&gt;. Genspark said its Call for Me Agent moved to GPT-Realtime-2 and saw &lt;strong&gt;+26% effective conversation rate&lt;/strong&gt; and fewer dropped calls &lt;a href=&quot;https://x.com/genspark_ai/status/2052524670088556557&quot;&gt;@genspark_ai&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Opinions / interpretation / commentary&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Supporters described the launch as a “big step forward” for voice agents &lt;a href=&quot;https://x.com/sama/status/2052462271667028211&quot;&gt;@sama&lt;/a&gt;, “total realtime victory” &lt;a href=&quot;https://x.com/reach_vb/status/2052442056392405383&quot;&gt;@reach_vb&lt;/a&gt;, and the first speech-to-speech model good enough for “real work” in complex voice agents &lt;a href=&quot;https://x.com/kwindla/status/2052521318688739811&quot;&gt;@kwindla&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A more cautious view: Simon Willison noted the announcement does &lt;strong&gt;not&lt;/strong&gt; mean ChatGPT Voice Mode itself has upgraded yet; the ChatGPT upgrade “sounds” like it is coming soon &lt;a href=&quot;https://x.com/simonw/status/2052439091577496054&quot;&gt;@simonw&lt;/a&gt;, &lt;a href=&quot;https://x.com/simonw/status/2052439181885153757&quot;&gt;@simonw&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Interface skepticism: Will Depue compared audio to VR—frequently exciting, but historically not sticky as an interface—while arguing that real-time tool use, reasoning while speaking, and live translation are the kinds of capabilities that could make audio interfaces finally take off &lt;a href=&quot;https://x.com/willdepue/status/2052493097586823353&quot;&gt;@willdepue&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Broader UX optimism: several commenters framed voice as more natural and bandwidth-efficient for humans &lt;a href=&quot;https://x.com/BorisMPower/status/2052471142921994332&quot;&gt;@BorisMPower&lt;/a&gt;, a path toward Jarvis-like always-available computer agents &lt;a href=&quot;https://x.com/willdepue/status/2052494388413235672&quot;&gt;@willdepue&lt;/a&gt;, or eventually displaced by even higher-bandwidth BCIs &lt;a href=&quot;https://x.com/iScienceLuvr/status/2052465922640593068&quot;&gt;@iScienceLuvr&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Competitive context: Elon Musk pushed Grok Voice for customer support &lt;a href=&quot;https://x.com/elonmusk/status/2052530063913189879&quot;&gt;@elonmusk&lt;/a&gt;, underscoring that real-time voice support/customer-service automation is now a competitive surface across labs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Technical details and benchmark data&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;GPT-Realtime-2&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Native speech-to-speech / real-time voice model, released via OpenAI’s Realtime API &lt;a href=&quot;https://x.com/OpenAI/status/2052438194625593804&quot;&gt;@OpenAI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Framed as “GPT-5-class reasoning” for voice agents &lt;a href=&quot;https://x.com/OpenAI/status/2052438194625593804&quot;&gt;@OpenAI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Designed for agents that can:
&lt;ul&gt;
&lt;li&gt;reason mid-conversation,&lt;/li&gt;
&lt;li&gt;use tools/take actions,&lt;/li&gt;
&lt;li&gt;handle interruptions,&lt;/li&gt;
&lt;li&gt;recover when users revise or repair speech,&lt;/li&gt;
&lt;li&gt;sustain longer sessions with expanded context &lt;a href=&quot;https://x.com/OpenAI/status/2052438196454379986&quot;&gt;@OpenAI&lt;/a&gt;, &lt;a href=&quot;https://x.com/reach_vb/status/2052438371058737280&quot;&gt;@reach_vb&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Reported context: &lt;strong&gt;128K tokens&lt;/strong&gt;, up from &lt;strong&gt;32K&lt;/strong&gt; &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Reported max output: &lt;strong&gt;32K tokens&lt;/strong&gt; &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Inputs reported by Artificial Analysis: &lt;strong&gt;text, audio, and image&lt;/strong&gt; &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Reasoning effort levels: &lt;strong&gt;minimal, low, medium, high, xhigh&lt;/strong&gt;; default &lt;strong&gt;low&lt;/strong&gt; &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Time-to-first-audio:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;1.12s&lt;/strong&gt; at minimal reasoning,&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;2.33s&lt;/strong&gt; at high reasoning &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Pricing:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;$1.15/hour audio input&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;$4.61/hour audio output&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;unchanged versus prior model according to Artificial Analysis &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486478501204415&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Conversational features: supports short preambles before main responses—e.g. “let me check that”—and audible transparency during tool calls—e.g. “checking your calendar” &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
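&lt;p&gt;As a rough illustration of the pricing above, the sketch below estimates the audio cost of a single call from the per-hour rates Artificial Analysis reported. The helper name and the assumption that cost scales linearly with audio duration are ours, not OpenAI’s; actual metering may differ.&lt;/p&gt;

```python
# Hypothetical helper: linear cost estimate from the reported per-hour rates.
INPUT_USD_PER_HOUR = 1.15   # audio input, as reported by Artificial Analysis
OUTPUT_USD_PER_HOUR = 4.61  # audio output, as reported by Artificial Analysis


def estimate_call_cost(input_minutes, output_minutes):
    """Return the estimated audio cost in USD, assuming linear per-hour billing."""
    input_cost = INPUT_USD_PER_HOUR * input_minutes / 60
    output_cost = OUTPUT_USD_PER_HOUR * output_minutes / 60
    return round(input_cost + output_cost, 4)


# Example: a 10-minute call where the user speaks for 6 minutes and the agent for 4.
print(estimate_call_cost(6, 4))
```

&lt;p&gt;Because output audio is roughly four times the price of input audio, the agent’s talk time dominates the estimate, which is one reason short preambles are cheaper than long filler responses.&lt;/p&gt;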
&lt;p&gt;&lt;strong&gt;Benchmarks&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Scale AI Audio MultiChallenge S2S:&lt;/strong&gt; GPT-Realtime-2 placed #1; instruction retention improved from &lt;strong&gt;36.7% to 70.8% APR&lt;/strong&gt; versus GPT-Realtime-1.5; strong voice editing when users repair/revise speech in real time &lt;a href=&quot;https://x.com/ScaleAILabs/status/2052451341071683732&quot;&gt;@ScaleAILabs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Artificial Analysis Big Bench Audio:&lt;/strong&gt; GPT-Realtime-2 high variant scored &lt;strong&gt;96.6%&lt;/strong&gt;, reported as equal to Gemini 3.1 Flash Live Preview High and about &lt;strong&gt;13%&lt;/strong&gt; above the previous highest result &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Justin Uberti separately summarized the improvement as &lt;strong&gt;15 percentage points vs. GPT-Realtime-1.5&lt;/strong&gt; on Big Bench Audio, near saturation &lt;a href=&quot;https://x.com/juberti/status/2052507302092296252&quot;&gt;@juberti&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conversational Dynamics / Full Duplex Bench subset:&lt;/strong&gt; GPT-Realtime-2 minimal variant scored &lt;strong&gt;96.1%&lt;/strong&gt;, with strengths in pause handling and turn-taking &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
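&lt;p&gt;The benchmark figures above mix absolute gains (percentage points) with relative ones, which are easy to conflate. A minimal sketch, using only the Scale AI instruction-retention numbers quoted above and two hypothetical helper functions, shows the difference:&lt;/p&gt;

```python
def point_gain(before, after):
    """Absolute change in percentage points."""
    return round(after - before, 1)


def relative_gain(before, after):
    """Relative change, expressed as a percentage of the starting score."""
    return round((after - before) / before * 100, 1)


# Scale AI instruction retention: GPT-Realtime-1.5 (36.7%) vs GPT-Realtime-2 (70.8%).
print(point_gain(36.7, 70.8))
print(relative_gain(36.7, 70.8))
```

&lt;p&gt;The same score change reads as a much larger number when quoted relatively, so it is worth checking which convention a given leaderboard post uses.&lt;/p&gt;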
&lt;p&gt;&lt;strong&gt;GPT-Realtime-Translate&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Live streaming speech translation from &lt;strong&gt;70+ input languages&lt;/strong&gt; to &lt;strong&gt;13 output languages&lt;/strong&gt; &lt;a href=&quot;https://x.com/OpenAI/status/2052438196454379986&quot;&gt;@OpenAI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;OpenAI cofounder Greg Brockman said real-time voice-to-voice translation has been an anticipated OpenAI application since the company’s early days and is now available for anyone to build with &lt;a href=&quot;https://x.com/gdb/status/2052480998668206262&quot;&gt;@gdb&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Vimeo demonstrated live dubbing with no pre-loaded captions, showing translations generated fully live &lt;a href=&quot;https://x.com/Vimeo/status/2052442588201029684&quot;&gt;@Vimeo&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Jason Liu highlighted the new real-time translation model and encouraged API usage &lt;a href=&quot;https://x.com/jxnlco/status/2052449634266812744&quot;&gt;@jxnlco&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Boris Power said live translation “actually works incredibly well” and plans to use it regularly &lt;a href=&quot;https://x.com/BorisMPower/status/2052472038967890022&quot;&gt;@BorisMPower&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;GPT-Realtime-Whisper&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Streaming transcription as people speak, for real-time captions, notes, and speech understanding &lt;a href=&quot;https://x.com/OpenAI/status/2052438196454379986&quot;&gt;@OpenAI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Justin Uberti described it as “Whisper, but now with realtime streaming” and updated demos to use the new model &lt;a href=&quot;https://x.com/juberti/status/2052478775523512356&quot;&gt;@juberti&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Uberti also built a delay selector to expose the latency/accuracy tradeoff in a real-time typing demo &lt;a href=&quot;https://x.com/juberti/status/2052504986391879788&quot;&gt;@juberti&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Product integrations and demos&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Glean:&lt;/strong&gt; shipped real-time voice powered by GPT-Realtime-2, grounded in organizational context; internal evals showed &lt;strong&gt;42.9% relative helpfulness increase&lt;/strong&gt; over the previous version &lt;a href=&quot;https://x.com/glean/status/2052440702169108990&quot;&gt;@glean&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vimeo:&lt;/strong&gt; demonstrated live dubbing using GPT-Realtime-Translate, with translations generated live and no pre-loaded captions &lt;a href=&quot;https://x.com/Vimeo/status/2052442588201029684&quot;&gt;@Vimeo&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Genspark:&lt;/strong&gt; upgraded its Call for Me Agent to GPT-Realtime-2; Genspark Realtime Voice is next; claimed sharper reasoning, tighter instruction following, &lt;strong&gt;+26% effective conversation rate&lt;/strong&gt;, and fewer dropped calls &lt;a href=&quot;https://x.com/genspark_ai/status/2052524670088556557&quot;&gt;@genspark_ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gradient Bang / game-agent demo:&lt;/strong&gt; Kwindla Hultman Kramer said GPT-Realtime-2 is the first OpenAI speech-to-speech model good enough for his voice agents that do “real work,” showing it as the ship AI in a complex agent with tool calls and subagents &lt;a href=&quot;https://x.com/kwindla/status/2052521318688739811&quot;&gt;@kwindla&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Voice-controlled market dashboard:&lt;/strong&gt; Levin Stanley demoed GPT-Realtime-2 controlling an interface by intent—“Focus on Apple,” “How did it do over the last 30 days?”, “Go back”—arguing that real-time interruption and reasoning change the UI loop from navigation to direction &lt;a href=&quot;https://x.com/levinstanley/status/2052506605044842672&quot;&gt;@levinstanley&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Realtime demos:&lt;/strong&gt; Justin Uberti updated &lt;code&gt;hello-realtime&lt;/code&gt; for GPT-Realtime-2 and provided a phone demo number &lt;a href=&quot;https://x.com/juberti/status/2052469176821002676&quot;&gt;@juberti&lt;/a&gt;; Diego Cabezas posted a quick GPT-Realtime-2 demo &lt;a href=&quot;https://x.com/diegocabezas01/status/2052492653082681485&quot;&gt;@diegocabezas01&lt;/a&gt;; Ray Fernando hosted a “Building a Live Translator” broadcast &lt;a href=&quot;https://x.com/RayFernando1337/status/2052479718495318143&quot;&gt;@RayFernando1337&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reachy Mini / robotics voice interface interest:&lt;/strong&gt; Clement Delangue asked who would add the new voice capabilities to Reachy Mini &lt;a href=&quot;https://x.com/ClementDelangue/status/2052449977725534363&quot;&gt;@ClementDelangue&lt;/a&gt;, after earlier asking voice AI labs such as Gradium, Kyutai, and ElevenLabs who could help with a robot voice use case &lt;a href=&quot;https://x.com/ClementDelangue/status/2052385809655828907&quot;&gt;@ClementDelangue&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Why this matters&lt;/h2&gt;
&lt;p&gt;The launch pushes voice agents from “speech I/O wrapper around a chatbot” toward &lt;strong&gt;full-duplex, tool-using, long-context, reasoning agents&lt;/strong&gt;. The technical shift is not just better ASR or TTS; it is the combination of low-latency turn-taking, interruption handling, longer context, tool-call transparency, and adjustable reasoning effort in a single real-time loop. That matters for customer support, meetings, accessibility, live translation, robotics, browser/computer control, and hands-free workflows where text chat is too slow or awkward.&lt;/p&gt;
&lt;p&gt;The most important engineering implication is that voice apps now need to be designed as &lt;strong&gt;stateful real-time systems&lt;/strong&gt;, not prompt-response endpoints. OpenAI’s prompting guide explicitly points developers toward reasoning-effort tuning, preambles, tool behavior, unclear-audio recovery, entity capture, and long-session state management &lt;a href=&quot;https://x.com/OpenAIDevs/status/2052530378184032560&quot;&gt;@OpenAIDevs&lt;/a&gt;. This suggests voice-agent quality will increasingly depend on harness design: latency budgets, interruption semantics, tool-call UX, conversational memory, and failure recovery—not just raw model selection.&lt;/p&gt;
&lt;p&gt;The remaining uncertainty is distribution. The API model is available now, but ChatGPT voice mode has not yet received the upgrade, per Simon Willison’s observation &lt;a href=&quot;https://x.com/simonw/status/2052439091577496054&quot;&gt;@simonw&lt;/a&gt;. If and when ChatGPT Voice gets the same capabilities, the consumer impact could be much larger. Until then, the launch primarily benefits developers and platforms building specialized real-time agents.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;OpenAI Voice, Codex, and Cybersecurity Releases&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GPT-Realtime-2 and new audio stack&lt;/strong&gt;: OpenAI released &lt;strong&gt;GPT-Realtime-2&lt;/strong&gt; in the API, described as its most capable voice model with &lt;strong&gt;GPT-5-class reasoning&lt;/strong&gt;, tool use, interruption handling, and longer conversations; it ships alongside &lt;strong&gt;GPT-Realtime-Translate&lt;/strong&gt; for streaming translation across &lt;strong&gt;70+ input languages / 13 output languages&lt;/strong&gt; and &lt;strong&gt;GPT-Realtime-Whisper&lt;/strong&gt; for low-latency streaming transcription &lt;a href=&quot;https://x.com/OpenAI/status/2052438194625593804&quot;&gt;@OpenAI&lt;/a&gt;. OpenAI says ChatGPT voice updates are still forthcoming &lt;a href=&quot;https://x.com/OpenAI/status/2052438197695877316&quot;&gt;@OpenAI&lt;/a&gt;. Artificial Analysis reports GPT-Realtime-2 reaches &lt;strong&gt;96.6% on Big Bench Audio&lt;/strong&gt;, leads its Conversational Dynamics benchmark at &lt;strong&gt;96.1%&lt;/strong&gt;, expands context from &lt;strong&gt;32K to 128K&lt;/strong&gt;, and keeps audio pricing unchanged &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052486470469140777&quot;&gt;@ArtificialAnlys&lt;/a&gt;. Scale AI also placed GPT-Realtime-2 at #1 on its Audio MultiChallenge S2S leaderboard, with instruction retention rising from &lt;strong&gt;36.7% to 70.8% APR&lt;/strong&gt; versus GPT-Realtime-1.5 &lt;a href=&quot;https://x.com/ScaleAILabs/status/2052451341071683732&quot;&gt;@ScaleAILabs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex gets browser control&lt;/strong&gt;: OpenAI shipped a &lt;strong&gt;Chrome plugin for Codex&lt;/strong&gt; on macOS and Windows, letting Codex operate across background tabs without taking over the user’s browser; it can use plugins where possible, Chrome for logged-in sites, and combine tools for workflows like debugging browser flows, checking dashboards, research, or CRM updates &lt;a href=&quot;https://x.com/OpenAI/status/2052480800004956323&quot;&gt;@OpenAI&lt;/a&gt;. The dev team emphasized browser DevTools, multi-tab parallelism, and web-app testing as key use cases &lt;a href=&quot;https://x.com/OpenAIDevs/status/2052481136971125158&quot;&gt;@OpenAIDevs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cyber-specific GPT-5.5 access&lt;/strong&gt;: OpenAI announced &lt;strong&gt;GPT-5.5 with Trusted Access for Cyber&lt;/strong&gt; for defensive workflows and a limited-preview &lt;strong&gt;GPT-5.5-Cyber&lt;/strong&gt; for authorized red teaming, pentesting, and validation under enhanced verification and account controls &lt;a href=&quot;https://x.com/cryps1s/status/2052508963409998283&quot;&gt;@cryps1s&lt;/a&gt;. Separately, Micah Carroll said OpenAI found instances of accidental &lt;strong&gt;CoT grading&lt;/strong&gt; in previous RL runs after building a scanner, but did not find clear evidence those instances degraded CoT monitorability &lt;a href=&quot;https://x.com/MicahCarroll/status/2052451995467018427&quot;&gt;@MicahCarroll&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Anthropic, Interpretability, and AI Safety Tooling&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Natural Language Autoencoders&lt;/strong&gt;: Anthropic introduced &lt;strong&gt;Natural Language Autoencoders&lt;/strong&gt;, a method for translating model activations into human-readable text so researchers can inspect “thought-like” internal representations rather than only sparse features or supervised probes &lt;a href=&quot;https://x.com/AnthropicAI/status/2052435436157452769&quot;&gt;@AnthropicAI&lt;/a&gt;. Emmanuel Ameisen framed NLAs as complementary to probing and dictionary learning, noting they revealed planning behavior and helped identify training-pipeline translation bugs; open-model NLAs are available on Neuronpedia &lt;a href=&quot;https://x.com/mlpowered/status/2052446867037020402&quot;&gt;@mlpowered&lt;/a&gt;. Ryan Greenblatt cautioned that early tests did not recover “internal CoT” on single-forward-pass math cases, suggesting limitations or missing activation locations &lt;a href=&quot;https://x.com/RyanPGreenblatt/status/2052458229624672549&quot;&gt;@RyanPGreenblatt&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Goodfire’s neural geometry agenda&lt;/strong&gt;: Goodfire launched a research series arguing neural networks “think in shapes,” with &lt;strong&gt;manifolds&lt;/strong&gt; as a core primitive for interpreting and controlling behavior &lt;a href=&quot;https://x.com/GoodfireAI/status/2052420446910644616&quot;&gt;@GoodfireAI&lt;/a&gt;. The thread contrasts manifold-level structure with SAE-style feature shattering, includes examples where steering along a learned manifold preserves coherent world-model behavior, and teases work on unsupervised manifold discovery and in-context geometry &lt;a href=&quot;https://x.com/GoodfireAI/status/2052420594193650167&quot;&gt;@GoodfireAI&lt;/a&gt;. Goodfire also linked the agenda to scientific discovery, citing reverse-engineering of a scientific foundation model to uncover biomarker structure in a curved manifold &lt;a href=&quot;https://x.com/GoodfireAI/status/2052468622103085107&quot;&gt;@GoodfireAI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Anthropic safety infrastructure&lt;/strong&gt;: Anthropic shared the research agenda for &lt;strong&gt;The Anthropic Institute&lt;/strong&gt;, focused on economic diffusion, threats/resilience, AI systems in the wild, and &lt;strong&gt;AI-driven R&amp;#x26;D&lt;/strong&gt; with human visibility and control &lt;a href=&quot;https://x.com/AnthropicAI/status/2052385812881228218&quot;&gt;@AnthropicAI&lt;/a&gt;. It also moved &lt;strong&gt;Petri&lt;/strong&gt;, its open-source interactive behavioral-evals tool, to Meridian Labs as an independent project &lt;a href=&quot;https://x.com/AnthropicAI/status/2052494460966019137&quot;&gt;@AnthropicAI&lt;/a&gt;, and opened its security bug bounty publicly on HackerOne &lt;a href=&quot;https://x.com/AnthropicAI/status/2052466175540629965&quot;&gt;@AnthropicAI&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Agents, RL Environments, and Coding Workflows&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prime Intellect Lab and Ramp Fast Ask&lt;/strong&gt;: Prime Intellect opened &lt;strong&gt;Lab&lt;/strong&gt; out of beta as a full stack for building RL environments/evals, evaluating, post-training, deploying, and serving agents &lt;a href=&quot;https://x.com/PrimeIntellect/status/2052225145725698102&quot;&gt;@PrimeIntellect&lt;/a&gt;. Ramp Labs used Prime Intellect to train &lt;strong&gt;Fast Ask&lt;/strong&gt;, a small RL-trained subagent for spreadsheet QA that reportedly scores &lt;strong&gt;+4% exact-match over Opus&lt;/strong&gt; at &lt;strong&gt;Haiku-level latency&lt;/strong&gt; &lt;a href=&quot;https://x.com/RampLabs/status/2052448843099254956&quot;&gt;@RampLabs&lt;/a&gt;; Prime says it outperformed Opus 4.6 while running faster and cheaper &lt;a href=&quot;https://x.com/PrimeIntellect/status/2052465182014840987&quot;&gt;@PrimeIntellect&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hermes Agent momentum&lt;/strong&gt;: Nous/Teknium shipped &lt;strong&gt;Hermes Agent v0.13.0&lt;/strong&gt; with multi-agent orchestration via Kanban, enforced goal completion with &lt;code&gt;/goal&lt;/code&gt;, disk-usage optimizations, custom LLM providers, and custom gateway channels &lt;a href=&quot;https://x.com/Teknium/status/2052495174404874714&quot;&gt;@Teknium&lt;/a&gt;. Earlier updates added agent-free cron jobs via Hermes Gateway for programmatic recurring tasks &lt;a href=&quot;https://x.com/Teknium/status/2052219963591762194&quot;&gt;@Teknium&lt;/a&gt;, blank-slate profiles with &lt;code&gt;--no-skills&lt;/code&gt; &lt;a href=&quot;https://x.com/Teknium/status/2052351650279645590&quot;&gt;@Teknium&lt;/a&gt;, and Lightpanda as a machine-native browser backend with Chrome fallback &lt;a href=&quot;https://x.com/lightpanda_io/status/2052369346928758861&quot;&gt;@lightpanda_io&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cursor orchestration and PR workflows&lt;/strong&gt;: Cursor introduced &lt;code&gt;/orchestrate&lt;/code&gt;, a skill that recursively spawns planner, worker, and verifier agents via the Cursor SDK; internally it reportedly cut skill token use by &lt;strong&gt;20%&lt;/strong&gt; while improving evals and reduced backend cold-start time by &lt;strong&gt;80%&lt;/strong&gt; &lt;a href=&quot;https://x.com/cursor_ai/status/2052432778743210127&quot;&gt;@cursor_ai&lt;/a&gt;. Cursor 3 also added an integrated PR review experience with diffs, commits, comments, review status, a file tree, and skill quick-action pills &lt;a href=&quot;https://x.com/cursor_ai/status/2052489387305488609&quot;&gt;@cursor_ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent infra patterns&lt;/strong&gt;: LangGraph is adding &lt;strong&gt;delta channels&lt;/strong&gt;, storing checkpoint history as diffs to control storage bloat for long-context agents &lt;a href=&quot;https://x.com/sydneyrunkle/status/2052344141963555312&quot;&gt;@sydneyrunkle&lt;/a&gt;. Deep Agents added sandbox backends for provider-agnostic isolated execution across Daytona, Modal, Runloop, and LangSmith, with an &lt;strong&gt;auth proxy&lt;/strong&gt; pattern to keep credentials out of prompt-injectable sandboxes &lt;a href=&quot;https://x.com/sydneyrunkle/status/2052459962169966752&quot;&gt;@sydneyrunkle&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Models, Benchmarks, and Inference Systems&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;xAI, Zhipu, Zyphra, DeepSeek ecosystem&lt;/strong&gt;: xAI made &lt;strong&gt;Image Generation Quality Mode&lt;/strong&gt; available on the xAI API after powering more than &lt;strong&gt;300M images&lt;/strong&gt; in Grok, claiming better realism, text rendering, and creative control &lt;a href=&quot;https://x.com/xai/status/2052193877675983031&quot;&gt;@xai&lt;/a&gt;. Zhipu published the &lt;strong&gt;GLM-5V-Turbo technical report&lt;/strong&gt;, highlighting CogViT dual-teacher distillation, multimodal multi-token prediction, multimodal coding/tool use, and RL across 30+ task categories &lt;a href=&quot;https://x.com/Zai_org/status/2052426777654387168&quot;&gt;@Zai_org&lt;/a&gt;. Zyphra’s &lt;strong&gt;ZAYA1-8B&lt;/strong&gt; was described as AMD-trained, using under &lt;strong&gt;1B active parameters&lt;/strong&gt;, large-scale RL, and a test-time method called &lt;strong&gt;Markovian RSA&lt;/strong&gt; &lt;a href=&quot;https://x.com/kimmonismus/status/2052346978240205249&quot;&gt;@kimmonismus&lt;/a&gt;. Antirez also released &lt;strong&gt;DS4&lt;/strong&gt;, a specialized inference engine for &lt;strong&gt;DeepSeek v4 Flash&lt;/strong&gt; built on llama.cpp/GGML lineage &lt;a href=&quot;https://x.com/antirez/status/2052405820235678175&quot;&gt;@antirez&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Google model and API updates&lt;/strong&gt;: Google AI Studio announced &lt;strong&gt;Gemini 3.1 Flash-Lite&lt;/strong&gt; as its most cost-efficient model for high-volume agentic tasks, translation, and simple data processing &lt;a href=&quot;https://x.com/GoogleAIStudio/status/2052453828272812310&quot;&gt;@GoogleAIStudio&lt;/a&gt;. Google also evolved the &lt;strong&gt;Gemini Interactions API&lt;/strong&gt; from role-based &lt;code&gt;user/model&lt;/code&gt; messages to typed &lt;strong&gt;steps&lt;/strong&gt; such as &lt;code&gt;user_input&lt;/code&gt;, &lt;code&gt;thought&lt;/code&gt;, &lt;code&gt;function_call&lt;/code&gt;, &lt;code&gt;tool_call&lt;/code&gt;, and &lt;code&gt;model_output&lt;/code&gt;, targeting richer multi-step agent workflows &lt;a href=&quot;https://x.com/GoogleAIStudio/status/2052487438967140700&quot;&gt;@GoogleAIStudio&lt;/a&gt;. Gemma 4’s MTP/speculative decoding was reported to deliver up to &lt;strong&gt;3× faster&lt;/strong&gt; on-device inference &lt;a href=&quot;https://x.com/googlegemma/status/2052468624657654194&quot;&gt;@googlegemma&lt;/a&gt;, with independent vLLM tests showing large throughput gains and &lt;strong&gt;129 tok/s&lt;/strong&gt; on simple generation on an RTX Pro 6000 &lt;a href=&quot;https://x.com/bnjmn_marie/status/2052286398707687650&quot;&gt;@bnjmn_marie&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sequence models and coding evals&lt;/strong&gt;: Aviv Bick and Albert Gu introduced &lt;strong&gt;Raven&lt;/strong&gt;, a fixed-state sequence model that learns which finite memory slots to update, aiming to fix persistence failures in SSMs and sliding-window attention and outperform prior linear models at &lt;strong&gt;16× training sequence length&lt;/strong&gt; &lt;a href=&quot;https://x.com/avivbick/status/2052438903924396377&quot;&gt;@avivbick&lt;/a&gt;, &lt;a href=&quot;https://x.com/_albertgu/status/2052442144879862003&quot;&gt;@_albertgu&lt;/a&gt;. Scale released the &lt;strong&gt;SWE Atlas Refactoring&lt;/strong&gt; leaderboard, testing whether agents can restructure code without regressions; &lt;strong&gt;Claude Opus 4.7 with Claude Code&lt;/strong&gt; leads &lt;a href=&quot;https://x.com/ScaleAILabs/status/2052434456510878021&quot;&gt;@ScaleAILabs&lt;/a&gt;. Arena’s longitudinal analysis says open models have largely closed the Text Arena gap, with the proprietary lead now around &lt;strong&gt;+30 Arena points&lt;/strong&gt;, though expert prompts remain harder &lt;a href=&quot;https://x.com/arena/status/2052455463573426452&quot;&gt;@arena&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;AI Infrastructure, Health, Robotics, and Applied Products&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Compute and infrastructure&lt;/strong&gt;: Anthropic’s SpaceX/xAI compute deal remained a major theme: Dario Amodei called the SpaceX partnership “visionary engineering + Claude” &lt;a href=&quot;https://x.com/Mononofu/status/2052212359536496961&quot;&gt;@Mononofu&lt;/a&gt;, while Simon Willison highlighted that Anthropic reportedly gets &lt;strong&gt;Colossus 1&lt;/strong&gt;, that xAI keeps the larger &lt;strong&gt;Colossus 2&lt;/strong&gt;, and that Colossus 1 has drawn environmental controversy &lt;a href=&quot;https://x.com/simonw/status/2052436629365948920&quot;&gt;@simonw&lt;/a&gt;. Lambda closed a &lt;strong&gt;$1B senior secured credit facility&lt;/strong&gt; to expand AI factories &lt;a href=&quot;https://x.com/LambdaAPI/status/2052373882963972496&quot;&gt;@LambdaAPI&lt;/a&gt;, AMD promoted &lt;strong&gt;MI350P PCIe&lt;/strong&gt; with &lt;strong&gt;144GB HBM3E&lt;/strong&gt; and up to &lt;strong&gt;2299 TFLOPS MXFP4&lt;/strong&gt; &lt;a href=&quot;https://x.com/AMD/status/2052373018400219648&quot;&gt;@AMD&lt;/a&gt;, and Ai2 brought new NSF OMAI compute online with &lt;strong&gt;NVIDIA Blackwell Ultra&lt;/strong&gt; systems from a &lt;strong&gt;$152M&lt;/strong&gt; NSF/NVIDIA investment &lt;a href=&quot;https://x.com/allen_ai/status/2052403904139169940&quot;&gt;@allen_ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Google Health and medical AI&lt;/strong&gt;: Google is turning Fitbit into the &lt;strong&gt;Google Health&lt;/strong&gt; app on May 26, combining Fitbit tracking with Google services and a Gemini-powered &lt;strong&gt;Google Health Coach&lt;/strong&gt; &lt;a href=&quot;https://x.com/googlehealth/status/2052392762255761701&quot;&gt;@googlehealth&lt;/a&gt;. Google says Health Premium will be included in AI Pro and Ultra plans &lt;a href=&quot;https://x.com/shimritby/status/2052439569136767291&quot;&gt;@shimritby&lt;/a&gt;, and announced &lt;strong&gt;Fitbit Air&lt;/strong&gt;, a screenless wearable with up to one-week battery and $99.99 preorder pricing &lt;a href=&quot;https://x.com/Google/status/2052501704155775481&quot;&gt;@Google&lt;/a&gt;. Separately, Glass Health launched an ambient scribing API at &lt;strong&gt;$0.85/hour&lt;/strong&gt; for transcription plus token-priced note generation &lt;a href=&quot;https://x.com/GlassHealthHQ/status/2052385429010121130&quot;&gt;@GlassHealthHQ&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Robotics and local agents&lt;/strong&gt;: Perplexity released &lt;strong&gt;Personal Computer&lt;/strong&gt; in a new Mac app, letting agents operate across local files, native Mac apps, web, and Perplexity servers, including remote initiation from iPhone and always-on Mac mini setups &lt;a href=&quot;https://x.com/perplexity_ai/status/2052445405754040816&quot;&gt;@perplexity_ai&lt;/a&gt;. NVIDIA Robotics highlighted Hugging Face’s Reachy Mini “agentic robotics app store” and &lt;strong&gt;Isaac GR00T N&lt;/strong&gt; integration with LeRobot workflows &lt;a href=&quot;https://x.com/NVIDIARobotics/status/2052446013949149649&quot;&gt;@NVIDIARobotics&lt;/a&gt;. EO-1 is now available through the standard LeRobot policy interface for robot-control training/eval/deploy workflows &lt;a href=&quot;https://x.com/SongHaomin92651/status/2052360599703867415&quot;&gt;@SongHaomin92651&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Top tweets by engagement&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenAI GPT-Realtime-2 API launch&lt;/strong&gt; — &lt;strong&gt;11.7K&lt;/strong&gt; engagement &lt;a href=&quot;https://x.com/OpenAI/status/2052438194625593804&quot;&gt;@OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Anthropic Natural Language Autoencoders&lt;/strong&gt; — &lt;strong&gt;10.1K&lt;/strong&gt; engagement &lt;a href=&quot;https://x.com/AnthropicAI/status/2052435436157452769&quot;&gt;@AnthropicAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Claude Mythos helped Firefox fix more security bugs in April than prior 15 months&lt;/strong&gt; — &lt;strong&gt;9.7K&lt;/strong&gt; engagement &lt;a href=&quot;https://x.com/alexalbert__/status/2052468573516513762&quot;&gt;@alexalbert__&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenAI Codex Chrome plugin&lt;/strong&gt; — &lt;strong&gt;7.7K&lt;/strong&gt; engagement &lt;a href=&quot;https://x.com/OpenAI/status/2052480800004956323&quot;&gt;@OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Goodfire neural geometry research agenda&lt;/strong&gt; — &lt;strong&gt;5.1K&lt;/strong&gt; engagement &lt;a href=&quot;https://x.com/GoodfireAI/status/2052420446910644616&quot;&gt;@GoodfireAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sam Altman on voice as a high-context AI interface&lt;/strong&gt; — &lt;strong&gt;5.0K&lt;/strong&gt; engagement &lt;a href=&quot;https://x.com/sama/status/2052462271667028211&quot;&gt;@sama&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;xAI Image Generation Quality Mode API&lt;/strong&gt; — &lt;strong&gt;4.5K&lt;/strong&gt; engagement &lt;a href=&quot;https://x.com/xai/status/2052193877675983031&quot;&gt;@xai&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;AI Reddit Recap&lt;/h1&gt;
&lt;h2&gt;/r/LocalLlama + /r/localLLM Recap&lt;/h2&gt;
&lt;h3&gt;1. Qwen3.6 27B Local Inference and Quantization&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t57xuu/25x_faster_inference_with_qwen_36_27b_using_mtp/&quot;&gt;2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints&lt;/a&gt;&lt;/strong&gt; (Activity: 1798): &lt;strong&gt;A recent &lt;strong&gt;llama.cpp&lt;/strong&gt; MTP PR (&lt;a href=&quot;https://github.com/ggml-org/llama.cpp/pull/22673&quot;&gt;#22673&lt;/a&gt;) enables Qwen 3.6 27B’s built-in multi-token prediction tensors for speculative decoding; the poster converted MTP-capable GGUF quants (&lt;a href=&quot;https://huggingface.co/froggeric/Qwen3.6-27B-MTP-GGUF&quot;&gt;HF&lt;/a&gt;) and reports &lt;strong&gt;~&lt;code&gt;2.5×&lt;/code&gt; faster generation&lt;/strong&gt; on an M2 Max 96GB, reaching &lt;strong&gt;&lt;code&gt;28 tok/s&lt;/code&gt;&lt;/strong&gt; with &lt;code&gt;--spec-type mtp --spec-draft-n-max 3&lt;/code&gt;. They also published fixed Jinja chat templates (&lt;a href=&quot;https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates&quot;&gt;HF&lt;/a&gt;) and provide &lt;code&gt;llama-server&lt;/code&gt; settings for OpenAI/Anthropic-compatible local serving with &lt;code&gt;q8_0&lt;/code&gt; KV cache and up to &lt;strong&gt;&lt;code&gt;262144&lt;/code&gt; context&lt;/strong&gt;; recommendations emphasize &lt;code&gt;q8_0-mtp&lt;/code&gt; as the best speed/quality quant, avoiding &lt;code&gt;q4_0&lt;/code&gt; KV beyond &lt;code&gt;64k&lt;/code&gt;, and note that Qwen3.6-27B only uses KV cache in &lt;strong&gt;&lt;code&gt;16/65&lt;/code&gt; layers&lt;/strong&gt; due to hybrid linear attention, reducing KV memory ~&lt;code&gt;4×&lt;/code&gt;. 
A commenter reports on an &lt;strong&gt;RTX Pro 6000 Max-Q&lt;/strong&gt; that Qwen 3.6 “2.7B” Q8 increases from &lt;strong&gt;&lt;code&gt;36 tok/s&lt;/code&gt; to &lt;code&gt;78 tok/s&lt;/code&gt;&lt;/strong&gt; with MTP, at ~&lt;code&gt;20%&lt;/code&gt; slower prompt processing, with no observed output-quality degradation; the post also warns that &lt;strong&gt;vision currently crashes llama.cpp when combined with MTP&lt;/strong&gt;.&lt;/strong&gt; Commenters broadly frame this as part of a major recent acceleration in local inference, making consumer-hardware agentic coding more viable. One technical question asks whether &lt;code&gt;turbo3&lt;/code&gt;/&lt;code&gt;turbo4&lt;/code&gt; was merged separately or is part of the MTP PR.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A user benchmarked Qwen 3.6 Q8 (written as &lt;code&gt;2.7B&lt;/code&gt; in the comment, presumably meaning the 27B model under discussion) on an &lt;strong&gt;RTX Pro 6000 Max-Q&lt;/strong&gt; and reported generation increasing from &lt;code&gt;36 tok/s&lt;/code&gt; to &lt;code&gt;78 tok/s&lt;/code&gt; with &lt;strong&gt;MTP&lt;/strong&gt;, roughly a &lt;code&gt;2.17x&lt;/code&gt; speedup. They noted an approximately &lt;code&gt;20%&lt;/code&gt; prompt-processing slowdown, but said output quality appeared unchanged, making the tradeoff favorable for generation-heavy workloads.&lt;/li&gt;
&lt;li&gt;One commenter asked whether the speedup depends on the recent &lt;code&gt;turbo3&lt;/code&gt;/&lt;code&gt;turbo4&lt;/code&gt; merge or is specifically part of the &lt;strong&gt;MTP PR&lt;/strong&gt;, highlighting that the implementation path matters for reproducing the claimed inference gains.&lt;/li&gt;
&lt;li&gt;There was a technical comparison question against &lt;strong&gt;Qwen 3.6 Dflash&lt;/strong&gt; variants and low-bit &lt;code&gt;iq3_XS&lt;/code&gt; quantizations. The commenter reported usually fitting &lt;code&gt;256k&lt;/code&gt; context into &lt;code&gt;16GB&lt;/code&gt; VRAM and asked whether these quants can also support &lt;code&gt;256k&lt;/code&gt; context without &lt;code&gt;mmproj&lt;/code&gt;, indicating interest in KV-cache/context-length feasibility across quant formats.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t53dhp/quality_comparison_between_qwen_36_27b/&quot;&gt;Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...)&lt;/a&gt;&lt;/strong&gt; (Activity: 820): &lt;strong&gt;The post benchmarks &lt;strong&gt;Qwen 3.6 27B&lt;/strong&gt; GGUF quantizations on a deliberately odd PGN-to-SVG chess-rendering task, testing board-state tracking, piece placement, orientation, and last-move highlighting with identical &lt;code&gt;llama.cpp&lt;/code&gt; sampling settings (&lt;code&gt;temp=0.6&lt;/code&gt;, &lt;code&gt;top_p=0.95&lt;/code&gt;, &lt;code&gt;top_k=20&lt;/code&gt;, &lt;code&gt;ctx=65536&lt;/code&gt;). The author reports &lt;strong&gt;BF16/Q8_0&lt;/strong&gt; as essentially correct, &lt;strong&gt;Q6_K&lt;/strong&gt; showing placement degradation, &lt;strong&gt;Q5_K_XL/Q4_K_XL/IQ4_XS&lt;/strong&gt; still usable, &lt;strong&gt;IQ3_XXS&lt;/strong&gt; mostly correct but with wrong board orientation, and &lt;strong&gt;Q2_K_XL&lt;/strong&gt; structurally broken despite correct piece positions; full outputs are posted at &lt;a href=&quot;https://qwen3-6-27b-benchmark.vercel.app/&quot;&gt;qwen3-6-27b-benchmark.vercel.app&lt;/a&gt;. 
For local 16 GB VRAM use, they prefer &lt;strong&gt;IQ4_XS&lt;/strong&gt;, reporting about &lt;code&gt;pp 100 tps&lt;/code&gt; / &lt;code&gt;tg 8 tps&lt;/code&gt; on vanilla &lt;code&gt;llama.cpp&lt;/code&gt;, improved to roughly &lt;code&gt;pp 760 tps&lt;/code&gt; / &lt;code&gt;tg 22 tps&lt;/code&gt; using &lt;strong&gt;TheTom&apos;s TurboQuant&lt;/strong&gt; fork with &lt;code&gt;-ngl 99&lt;/code&gt;, &lt;code&gt;turbo4/turbo2&lt;/code&gt; KV-cache quantization, and context limited below ~&lt;code&gt;75k&lt;/code&gt;.&lt;/strong&gt; The main technical caveat raised in comments is that the evaluation appears to be &lt;strong&gt;single-run&lt;/strong&gt;, so stochastic variance could make individual quantization results outliers; commenters still noted that the observed degradation trend broadly matches expectations.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several commenters questioned whether the quantization comparison used &lt;strong&gt;single-run evaluations or repeated trials&lt;/strong&gt;, noting that LLM outputs can vary enough that &lt;em&gt;“one run is not enough”&lt;/em&gt; and may produce misleading conclusions from statistical noise or outlier generations. They still observed an apparent expected trend of &lt;strong&gt;quality degradation as quantization becomes more aggressive&lt;/strong&gt;, but wanted multiple samples per quant level to support the findings.&lt;/li&gt;
&lt;li&gt;One technically substantive takeaway was that &lt;strong&gt;4-bit quantization appears to remain the practical sweet spot&lt;/strong&gt;, with &lt;strong&gt;3-bit quants still described as usable&lt;/strong&gt; despite common skepticism. A commenter argued that above roughly &lt;strong&gt;5-bit&lt;/strong&gt;, users may often gain more by moving to a larger/better model rather than preserving extra precision on a smaller one, citing comparisons like &lt;code&gt;122B UD-Q3_K_XL&lt;/code&gt; versus &lt;code&gt;35B IQ4_NL&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t5yajb/qwen36_27b_uncensored_heretic_v2_native_mtp/&quot;&gt;Qwen3.6 27B uncensored heretic v2 Native MTP Preserved is Out Now With KLD 0.0021, 6/100 Refusals and the Full 15 MTPs Preserved and Retained, Available in Safetensors, GGUFs and NVFP4s formats.&lt;/a&gt;&lt;/strong&gt; (Activity: 530): &lt;strong&gt;&lt;strong&gt;llmfan46&lt;/strong&gt; released &lt;strong&gt;Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved&lt;/strong&gt; on Hugging Face, claiming &lt;code&gt;KLD = 0.0021&lt;/code&gt;, &lt;code&gt;6/100&lt;/code&gt; refusals, and preservation/retention of the full &lt;code&gt;15&lt;/code&gt; native MTP heads across &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved&quot;&gt;Safetensors&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF&quot;&gt;GGUF&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4&quot;&gt;NVFP4&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4-GGUF&quot;&gt;NVFP4-GGUF&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-NVFP4-MLP-Only&quot;&gt;NVFP4-MLP-only&lt;/a&gt;, and &lt;a href=&quot;https://huggingface.co/llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GPTQ-Int4&quot;&gt;GPTQ-Int4&lt;/a&gt; variants. 
The post says the release includes benchmarks and that all variants were checked for full MTP retention; the author’s full model list is &lt;a href=&quot;https://huggingface.co/llmfan46/models&quot;&gt;here&lt;/a&gt;.&lt;/strong&gt; Commenters requested additional deployment-oriented quantization support, especially &lt;code&gt;Q4_K_XS&lt;/code&gt; for &lt;code&gt;16GB&lt;/code&gt; systems, and asked whether MTP works with TurboQuant-compressed KV cache or could be applied to Gemma 4 dense models. One technical concern was that if the MTP draft heads were trained on the original refusal-aligned model while only the base was fine-tuned, MTP acceptance may degrade or &lt;em&gt;“fight the heretic”&lt;/em&gt; specifically on newly unlocked refusal/tail-behavior cases despite the low aggregate &lt;code&gt;KLD = 0.0021&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A key concern was whether preserving the full &lt;code&gt;15&lt;/code&gt; MTP heads is actually beneficial after an uncensoring/heretic fine-tune: if the draft heads retain the original refusal distribution while the base model was modified, speculative decoding may “fight” the newly unlocked outputs. One commenter noted that the reported &lt;strong&gt;KLD &lt;code&gt;0.0021&lt;/code&gt;&lt;/strong&gt; indicates the base stayed close overall, but may not capture &lt;em&gt;tail behavior&lt;/em&gt; on refusal/unlocked prompts, making &lt;strong&gt;MTP acceptance rate on heretic cases&lt;/strong&gt; the more important validation metric.&lt;/li&gt;
&lt;li&gt;Users asked for deployment-specific quantization details, including a &lt;strong&gt;&lt;code&gt;Q4_K_XS&lt;/code&gt; GGUF&lt;/strong&gt; target to fit &lt;code&gt;16GB&lt;/code&gt; VRAM while retaining useful context, and whether preserved MTP remains compatible with &lt;strong&gt;TurboQuant-compressed KV cache&lt;/strong&gt;. Another hardware-focused question flagged that &lt;strong&gt;NVFP4 + MTP on Blackwell&lt;/strong&gt; may currently be blocked by CUDA/tooling support, with the commenter saying the stack appears “dead in the water until a new CUDA version is released.”&lt;/li&gt;
&lt;li&gt;There were implementation questions around multimodal packaging and stability: commenters noted the inclusion of &lt;code&gt;mmproj&lt;/code&gt; files and asked whether crashes related to &lt;strong&gt;PR &lt;code&gt;#22673&lt;/code&gt;&lt;/strong&gt; are still present. Another asked whether the same MTP-preservation approach could apply to a future &lt;strong&gt;Gemma 4 dense&lt;/strong&gt; model, implying interest in portability of native MTP heads across architectures/fine-tunes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Less Technical AI Subreddit Recap&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;1. Claude Limits Raised via SpaceX Compute&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeCode/comments/1t5hs98/doubled_rate_limits_for_claude_code/&quot;&gt;Doubled Rate Limits for Claude Code&lt;/a&gt;&lt;/strong&gt; (Activity: 3901): &lt;strong&gt;&lt;strong&gt;Anthropic&lt;/strong&gt; says a new compute-capacity partnership with &lt;strong&gt;SpaceX&lt;/strong&gt;, plus other recent compute deals, enabled higher usage limits across &lt;strong&gt;Claude Code&lt;/strong&gt; and the &lt;strong&gt;Claude API&lt;/strong&gt; (&lt;a href=&quot;https://www.anthropic.com/news/higher-limits-spacex&quot;&gt;announcement&lt;/a&gt;). Effective immediately, &lt;strong&gt;Claude Code Pro/Max&lt;/strong&gt; no longer has the prior &lt;em&gt;peak-hours limit reduction&lt;/em&gt;, and &lt;strong&gt;Opus-model API rate limits&lt;/strong&gt; are being “substantially” raised.&lt;/strong&gt; Top comments were mostly non-technical reactions: surprise/skepticism about whether the announcement is real, plus speculation that the SpaceX/Anthropic tie-up reflects Elon Musk’s rivalry with Sam Altman.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1t5htq1/spacex_conpute_deal_double_limits/&quot;&gt;SpaceX Conpute Deal - Double Limits&lt;/a&gt;&lt;/strong&gt; (Activity: 1931): &lt;strong&gt;&lt;strong&gt;Anthropic announced a compute partnership with SpaceX&lt;/strong&gt; to “substantially increase” capacity, alongside other compute deals, and is immediately changing limits: removing &lt;strong&gt;peak-hours limit reductions&lt;/strong&gt; for &lt;strong&gt;Claude Code Pro/Max&lt;/strong&gt; and &lt;strong&gt;substantially raising API rate limits for Opus models&lt;/strong&gt; (&lt;a href=&quot;https://www.anthropic.com/news/higher-limits-spacex&quot;&gt;Anthropic announcement&lt;/a&gt;). The post does not specify exact new rate-limit numbers or the nature of the SpaceX compute arrangement.&lt;/strong&gt; Comments are skeptical that higher limits will materially improve usable capacity, with one noting users may simply hit weekly caps faster and another comparing Claude unfavorably to OpenAI Codex usage economics. There’s also concern that any improvement may be temporary and regress within weeks or months.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several commenters argue that a raw compute-capacity deal would not materially improve &lt;strong&gt;Claude Chat&lt;/strong&gt; unless Anthropic also changes product-level throttles: &lt;em&gt;“A usage limit increase that doesn&apos;t change the weekly limit is practically useless.”&lt;/em&gt; The key technical/product distinction raised is between backend compute availability and enforced per-user weekly quota policy.&lt;/li&gt;
&lt;li&gt;One comparison frames Anthropic’s quota pressure against &lt;strong&gt;OpenAI Codex&lt;/strong&gt; pricing/usage: a user claims &lt;em&gt;“$20 on codex gets you infinitely more usage than Claude,”&lt;/em&gt; suggesting Anthropic may be reacting to user churn caused by stricter effective compute limits. The discussion implies that any short-term limit relaxation may be temporary if demand again saturates available capacity.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. AI Lab Corporate Governance Drama&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/OpenAI/comments/1t5tn1n/sam_altman_texts_mira_murati_november_19_2023/&quot;&gt;Sam Altman texts Mira Murati. November 19, 2023. [This document is from Musk v. Altman (2026).]&lt;/a&gt;&lt;/strong&gt; (Activity: 5431): &lt;strong&gt;The post references an image/document titled &lt;strong&gt;“Sam Altman texts Mira Murati. November 19, 2023”&lt;/strong&gt;, allegedly from &lt;strong&gt;Musk v. Altman (2026)&lt;/strong&gt;, but the linked Reddit gallery was inaccessible due to &lt;strong&gt;403 Forbidden&lt;/strong&gt;, so the actual text-message contents could not be verified or summarized. No technical claims, model details, benchmarks, implementation facts, or litigation-document substance were available from the provided post metadata.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1t5q5jm/xai_will_be_dissolved_as_a_separate_entity/&quot;&gt;xAI will be dissolved as a separate entity.&lt;/a&gt;&lt;/strong&gt; (Activity: 2116): &lt;strong&gt;The image is a &lt;strong&gt;non-technical screenshot of an X.com post&lt;/strong&gt; attributed to &lt;strong&gt;Elon Musk&lt;/strong&gt;, claiming that &lt;strong&gt;xAI would be dissolved as a separate company&lt;/strong&gt; and folded into “&lt;strong&gt;SpaceXAI&lt;/strong&gt;,” described as AI products from SpaceX: &lt;a href=&quot;https://i.redd.it/tzexewkj2lzg1.jpeg&quot;&gt;image&lt;/a&gt;. No implementation details, model changes, infrastructure plans, or product roadmap are provided in the post/title, so the significance is primarily &lt;strong&gt;corporate-structure/contextual&lt;/strong&gt;, not technical.&lt;/strong&gt; Comments frame the move as consistent with Musk’s prior desire to combine AI work with his other companies, while skeptics characterize it as potentially moving unprofitable AI efforts into SpaceX, a profitable/government-contract-supported entity.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;AI Discords&lt;/h1&gt;
&lt;p&gt;Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.&lt;/p&gt;
</content:encoded><category>openai</category><category>anthropic</category><category>goodfireai</category><category>scale-ai</category><category>gpt-realtime-2</category><category>gpt-5.5</category><category>codex</category><category>micahcarroll</category><category>milesbrundage</category><category>ryanpgreenblatt</category><category>voice-models</category><category>streaming-translation</category><category>transcription</category><category>benchmarking</category><category>context-windows</category><category>browser-automation</category><category>cybersecurity</category><category>interpretability</category><category>neural-geometry</category><category>manifolds</category><category>ai-safety</category><category>rlhf</category></item><item><title>Anthropic-SpaceXai&apos;s 300MW/$5B/yr deal for Colossus I, ARR growth is 8000% annualized</title><link>https://news.smol.ai/issues/26-05-06-anthropic-xai/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-05-06-anthropic-xai/</guid><description>**Anthropic** announced a new **SpaceX compute partnership** to significantly increase capacity for **Claude** products, doubling **Claude Code&apos;s 5-hour rate limits** for Pro, Max, Team, and Enterprise users, removing peak-hour limit reductions, and substantially increasing API rate limits for **Opus** models. The deal grants Anthropic access to **Colossus 1** via **SpaceXAI**, with **Claude inference** expected to ramp up on Colossus soon. Anthropic also hosted a **&quot;Code with Claude&quot;** event featuring updates on Claude Code, GitHub-scale usage, and managed agents. Discussions highlighted compute bottlenecks, user reactions to limit changes, debates on managed-agent features, and ongoing safety/governance discourse around AGI trustworthiness.</description><pubDate>Wed, 06 May 2026 05:44:39 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;a quiet day.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI News for 5/5/2026-5/6/2026. We checked 12 subreddits, &lt;a href=&quot;https://twitter.com/i/lists/1585430245762441216&quot;&gt;544 Twitters&lt;/a&gt; and no further Discords. &lt;a href=&quot;https://news.smol.ai/&quot;&gt;AINews&apos; website&lt;/a&gt; lets you search all past issues. As a reminder, &lt;a href=&quot;https://www.latent.space/p/2026&quot;&gt;AINews is now a section of Latent Space&lt;/a&gt;. You can &lt;a href=&quot;https://support.substack.com/hc/en-us/articles/8914938285204-How-do-I-subscribe-to-or-unsubscribe-from-a-section-on-Substack&quot;&gt;opt in/out&lt;/a&gt; of email frequencies!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It was Anthropic’s second annual developer event today, and the vibes were immaculate. There was no big model release, which some (miscalibrated) people were hoping for; the news was mostly the SpaceX partnership announcement (on track to challenge Claude’s biggest launch of all time), 3 new features for Claude Managed Agents, and a recap/reintroduction/celebration of everything shipped in the past 6 months:&lt;/p&gt;
&lt;p&gt;After Elon signed off on it, possibly strategically timed just as his lawsuit against OpenAI is at trial, Anthropic is taking over all of Colossus 1 with surprising speed (“in the next few days”), in what some estimate to be a roughly $5B/year deal that makes xAI a neocloud:&lt;/p&gt;
&lt;p&gt;The other big draw was the moderated session with the Amodei siblings, who announced the 80x growth and offered some commentary on US and Chinese competitors:&lt;/p&gt;
&lt;p&gt;The trends Dario is watching:&lt;/p&gt;
&lt;p&gt;Tiny Teams: He still thinks 2026 is the year we see a one-person billion-dollar company. “There is an enormous ability for one person or a tiny set of people to do a set of things that are incredible… Before, if you had an idea or vision there are so many resources you’d have to accumulate for several years in order to make that vision happen, and I think there’s a unique opportunity for single individuals or very tiny teams to do things that are incredible, where we move from the models are writing code, to the models are helping us think of software engineering as a task, to the models are helping us think of how can I build a business or economic unit as a task”.&lt;/p&gt;
&lt;p&gt;Multiagents: “starting with a team of smart people in a room and working our way up to a ‘country of geniuses in a datacenter’”&lt;/p&gt;
&lt;p&gt;Enterprise Services: “Claude Code helps individuals to be more productive, but we’re increasingly going to help whole teams and organizations be more productive and more than the sum of its parts”.&lt;/p&gt;
&lt;p&gt;Bottlenecks: Claude is of course speeding up Claude, but he frames it with Amdahl’s Law, i.e. finding the remaining bottlenecks in software engineering, such as security and verifiability, and removing them to speed up the overall process.&lt;/p&gt;
&lt;p&gt;The rest of the mainstage sessions included:&lt;/p&gt;
&lt;p&gt;Must know Claude Code updates:&lt;/p&gt;
&lt;p&gt;More Outcomes content on the Inner vs the Outer Loop…&lt;/p&gt;
&lt;p&gt;… for automatic improvement of agents:&lt;/p&gt;
&lt;hr&gt;
&lt;h1&gt;AI Twitter Recap&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Top Story: Anthropic and Claude announcements/commentary&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;What happened&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Anthropic had a dense news cycle centered on compute, Claude Code limits, and agent platform direction.&lt;/strong&gt; Officially, Anthropic announced a new compute partnership with SpaceX that will “substantially increase” capacity and immediately translate into higher limits for Claude products: &lt;a href=&quot;https://x.com/claudeai/status/2052060691893227611&quot;&gt;@claudeai&lt;/a&gt; said the deal boosts compute enough to raise usage limits, followed by specifics from &lt;a href=&quot;https://x.com/claudeai/status/2052060693269008586&quot;&gt;@claudeai&lt;/a&gt;: &lt;strong&gt;Claude Code’s 5-hour rate limits are doubled for Pro, Max, Team, and seat-based Enterprise; peak-hours limit reductions are removed for Pro and Max; Opus API rate limits are substantially increased&lt;/strong&gt;. xAI framed the deal as Anthropic getting access to &lt;strong&gt;Colossus 1&lt;/strong&gt; via SpaceXAI for “additional capacity for Claude” &lt;a href=&quot;https://x.com/xai/status/2052060350770515978&quot;&gt;@xai&lt;/a&gt;, while Anthropic CTO Tom Brown added that &lt;strong&gt;Claude inference would be ramped up on Colossus “in the next few days”&lt;/strong&gt; &lt;a href=&quot;https://x.com/nottombrown/status/2052062566126649448&quot;&gt;@nottombrown&lt;/a&gt;. The company also ran its &lt;strong&gt;“Code with Claude”&lt;/strong&gt; event, with a livestreamed keynote and sessions on Claude Code, GitHub-scale usage, and managed agents &lt;a href=&quot;https://x.com/ClaudeDevs/status/2052055459272761661&quot;&gt;@ClaudeDevs&lt;/a&gt;, prompting substantial real-time commentary from developers and observers &lt;a href=&quot;https://x.com/simonw/status/2052055655230706032&quot;&gt;@simonw&lt;/a&gt;, &lt;a href=&quot;https://x.com/latentspacepod/status/2052062150332710942&quot;&gt;@latentspacepod&lt;/a&gt;. 
Around this, discourse branched into four themes: &lt;strong&gt;(1) compute bottlenecks were more severe than many assumed, reportedly due to unexpected usage growth; (2) users welcomed the 5-hour limit increase but questioned unchanged weekly limits; (3) people debated whether Anthropic’s new managed-agent features like memory/“Dreaming” and rubrics/“Outcomes” are real product differentiation or commoditizable harness features; and (4) Anthropic’s safety/governance positioning continued to attract both praise and criticism&lt;/strong&gt;, including claims from critics that some Anthropic employees project “only we can be trusted with AGI,” and counterclaims from Anthropic-adjacent voices that the more common internal view is closer to “no one can be trusted with AGI” than “only us” &lt;a href=&quot;https://x.com/_aidan_clark_/status/2052089187659346047&quot;&gt;@_aidan_clark_&lt;/a&gt;, &lt;a href=&quot;https://x.com/kipperrii/status/2052094851991392536&quot;&gt;@kipperrii&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Official facts and confirmed details&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Anthropic announced a &lt;strong&gt;SpaceX compute partnership&lt;/strong&gt; to increase capacity &lt;a href=&quot;https://x.com/claudeai/status/2052060691893227611&quot;&gt;@claudeai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Effective immediately, Anthropic says it is:
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Doubling Claude Code’s 5-hour rate limits&lt;/strong&gt; for Pro, Max, Team, and seat-based Enterprise&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Removing peak-hours limit reduction&lt;/strong&gt; on Claude Code for Pro and Max&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Substantially increasing API rate limits for Opus models&lt;/strong&gt;&lt;br&gt;
Source: &lt;a href=&quot;https://x.com/claudeai/status/2052060693269008586&quot;&gt;@claudeai&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;Anthropic linked an official explainer on the higher usage limits and the SpaceX compute deal &lt;a href=&quot;https://x.com/claudeai/status/2052060696255283346&quot;&gt;@claudeai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;xAI’s announcement described the arrangement as &lt;strong&gt;SpaceXAI providing Anthropic access to Colossus 1&lt;/strong&gt; for additional Claude capacity &lt;a href=&quot;https://x.com/xai/status/2052060350770515978&quot;&gt;@xai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Anthropic CTO Tom Brown said &lt;strong&gt;Claude inference would start ramping on Colossus within days&lt;/strong&gt; &lt;a href=&quot;https://x.com/nottombrown/status/2052062566126649448&quot;&gt;@nottombrown&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Anthropic product/eng lead Amol Avasare clarified that &lt;strong&gt;weekly limits were not increased yet&lt;/strong&gt; because only a &lt;strong&gt;small percentage&lt;/strong&gt; of users hit weekly limits, while a much larger percentage hit 5-hour limits; more changes may come as compute lands &lt;a href=&quot;https://x.com/TheAmolAvasare/status/2052064611692904639&quot;&gt;@TheAmolAvasare&lt;/a&gt;, &lt;a href=&quot;https://x.com/TheAmolAvasare/status/2052066157176426653&quot;&gt;@TheAmolAvasare&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Anthropic/Claude held a &lt;strong&gt;Code with Claude&lt;/strong&gt; event with sessions including keynote, Claude Code updates, GitHub-scale usage, and managed agents &lt;a href=&quot;https://x.com/ClaudeDevs/status/2052055459272761661&quot;&gt;@ClaudeDevs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Anthropic’s Alex Albert promoted the event and later summarized the announcement as &lt;strong&gt;“More chips, more Claude”&lt;/strong&gt; &lt;a href=&quot;https://x.com/alexalbert__/status/2052067009605861764&quot;&gt;@alexalbert__&lt;/a&gt;, &lt;a href=&quot;https://x.com/alexalbert__/status/2052065953173872912&quot;&gt;@alexalbert__&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The dedicated Claude Code account reiterated the limit increase for Pro/Max/Team &lt;a href=&quot;https://x.com/claude_code/status/2052071730190123094&quot;&gt;@claude_code&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Compute details and scale claims&lt;/h2&gt;
&lt;p&gt;Several tweets added quantitative claims about the scale of the SpaceX/xAI arrangement. These are &lt;strong&gt;not from Anthropic’s main announcement tweets&lt;/strong&gt;, but they were widely circulated:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/_arohan_/status/2052065871552819647&quot;&gt;@_arohan_&lt;/a&gt; cited &lt;strong&gt;“more than 300 megawatts of new capacity” and “over 220,000 NVIDIA GPUs within the month.”&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/scaling01/status/2052068218047545501&quot;&gt;@scaling01&lt;/a&gt; claimed Colossus 1 includes &lt;strong&gt;~150,000 H100s, 50,000 H200s, and 30,000 GB200s&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/Yuchenj_UW/status/2052065017072386450&quot;&gt;@Yuchenj_UW&lt;/a&gt; repeated the &lt;strong&gt;220,000 GPU&lt;/strong&gt; figure and added an unverified claim that Anthropic had committed &lt;strong&gt;$200B to Google TPUs&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/eliebakouch/status/2052066609896808473&quot;&gt;@eliebakouch&lt;/a&gt; interpreted the deal as Anthropic getting effectively &lt;strong&gt;all of Colossus 1 capacity&lt;/strong&gt;, not just idle GPUs.&lt;/li&gt;
&lt;li&gt;Elon Musk later said SpaceXAI was comfortable leasing Colossus 1 because &lt;strong&gt;xAI had already moved training to Colossus 2&lt;/strong&gt; &lt;a href=&quot;https://x.com/elonmusk/status/2052069691372478511&quot;&gt;@elonmusk&lt;/a&gt;, and &lt;a href=&quot;https://x.com/eliebakouch/status/2052068426152132722&quot;&gt;@eliebakouch&lt;/a&gt; claimed Colossus 2 is already at &lt;strong&gt;~500k Blackwells&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These numbers are best treated as &lt;strong&gt;widely repeated, in some cases by the parties involved, but not confirmed in Anthropic’s own announcement thread&lt;/strong&gt;. The broad factual takeaway is better supported than the exact inventory breakdown: &lt;strong&gt;Anthropic secured a very large, near-term expansion of external inference capacity.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;Evidence the bottleneck was real&lt;/h2&gt;
&lt;p&gt;A recurring interpretation was that Anthropic’s constraint had genuinely been compute, not merely pricing or product design.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/kimmonismus/status/2052059082886910251&quot;&gt;@kimmonismus&lt;/a&gt; asked during/after the livestream whether Anthropic was &lt;strong&gt;doubling Claude Code rate limits at no extra charge&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/kimmonismus/status/2052118418174681572&quot;&gt;@kimmonismus&lt;/a&gt; later summarized remarks from a Dario/Daniela interview: &lt;strong&gt;usage grew ~80x unexpectedly&lt;/strong&gt;, which purportedly caused the compute shortage, and the SpaceX deal is the first major attempt to address it.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/czajkadev/status/2052101699188248990&quot;&gt;@czajkadev&lt;/a&gt; explicitly interpreted the update as proof that &lt;strong&gt;compute was the bottleneck&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/theo/status/2052114791045668894&quot;&gt;@theo&lt;/a&gt; separately argued that the industry’s problems are “not just money, it’s about compute,” which fits the Anthropic story even though it’s a broader point.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/scaling01/status/2052069341609226550&quot;&gt;@scaling01&lt;/a&gt; generalized from this deal to a macro thesis: &lt;strong&gt;frontier labs are compute constrained enough to rent datacenters from competitors.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is one of the strongest factual/market signals in the dataset: &lt;strong&gt;Anthropic’s user-facing rate limits moved materially only after a major compute deal.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;Product implications: Claude Code, API, and managed agents&lt;/h2&gt;
&lt;p&gt;The practical impact for Anthropic’s users is clear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Claude Code power users get more usable burst capacity&lt;/strong&gt; over a 5-hour window.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Peak-time throttling is eased&lt;/strong&gt; for Pro/Max.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Opus API users get higher rate limits&lt;/strong&gt;, which matters for agent workloads and production integrations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The event also highlighted Anthropic’s broader platform ambitions around agents. While the primary official tweets here are mostly about the event itself, commentary points to features such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Dreaming&lt;/strong&gt; = memory / cross-session context&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outcomes&lt;/strong&gt; = rubrics / grading / objective tracking&lt;/li&gt;
&lt;li&gt;agent orchestration / managed agents direction&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Commentary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/RichNwan/status/2052085746526216601&quot;&gt;@RichNwan&lt;/a&gt; argued Anthropic is “building out their managed agents platform” with &lt;strong&gt;Dreaming&lt;/strong&gt; and &lt;strong&gt;Outcomes&lt;/strong&gt;, but questioned whether these are meaningfully differentiated versus open harnesses.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/eliebakouch/status/2052156107313807690&quot;&gt;@eliebakouch&lt;/a&gt; saw these as &lt;strong&gt;important for power users&lt;/strong&gt;, especially for preserving the main agent’s context window and using separate graders to manage quality/safety/reward hacking.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/latentspacepod/status/2052068066167816369&quot;&gt;@latentspacepod&lt;/a&gt; quoted Anthropic speakers emphasizing &lt;strong&gt;verification&lt;/strong&gt;, “routines are higher-order prompts,” and the idea that the remaining gap is often &lt;strong&gt;deployment/operationalization&lt;/strong&gt;, not raw capability.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That last point aligns Anthropic with the broader shift from “one-shot chatbot” to &lt;strong&gt;structured agent systems with memory, decomposition, grading, and verification&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;Facts vs opinions&lt;/h2&gt;
&lt;h3&gt;Factual claims with strongest support&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Anthropic has a new &lt;strong&gt;SpaceX compute partnership&lt;/strong&gt; and increased Claude Code/API limits immediately &lt;a href=&quot;https://x.com/claudeai/status/2052060691893227611&quot;&gt;@claudeai&lt;/a&gt;, &lt;a href=&quot;https://x.com/claudeai/status/2052060693269008586&quot;&gt;@claudeai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Weekly limits were &lt;strong&gt;not&lt;/strong&gt; increased yet; Anthropic staff said that was intentional based on who hits which caps &lt;a href=&quot;https://x.com/TheAmolAvasare/status/2052064611692904639&quot;&gt;@TheAmolAvasare&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Anthropic intends to run &lt;strong&gt;Claude inference on Colossus&lt;/strong&gt; in the near term &lt;a href=&quot;https://x.com/nottombrown/status/2052062566126649448&quot;&gt;@nottombrown&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Anthropic ran a &lt;strong&gt;Code with Claude&lt;/strong&gt; event focused on coding, production deployment, and managed agents &lt;a href=&quot;https://x.com/ClaudeDevs/status/2052055459272761661&quot;&gt;@ClaudeDevs&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Plausible but less directly verified claims&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Anthropic is gaining access to &lt;strong&gt;&gt;300 MW / &gt;220,000 NVIDIA GPUs&lt;/strong&gt; in short order &lt;a href=&quot;https://x.com/_arohan_/status/2052065871552819647&quot;&gt;@_arohan_&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Colossus 1 inventory breakdown includes &lt;strong&gt;H100/H200/GB200 mixes&lt;/strong&gt; &lt;a href=&quot;https://x.com/scaling01/status/2052068218047545501&quot;&gt;@scaling01&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Anthropic’s demand spike was around &lt;strong&gt;80x growth&lt;/strong&gt; and caught leadership off guard &lt;a href=&quot;https://x.com/kimmonismus/status/2052118418174681572&quot;&gt;@kimmonismus&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Opinions and interpretations&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Anthropic &lt;strong&gt;waited too long&lt;/strong&gt; to address compute shortages and lost significant growth to OpenAI/Codex: &lt;a href=&quot;https://x.com/scaling01/status/2052070594972090409&quot;&gt;@scaling01&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;This deal proves &lt;strong&gt;compute is not a durable moat&lt;/strong&gt;, because top labs can rent capacity from whichever hyperscaler/cluster operator will supply it: &lt;a href=&quot;https://x.com/Dorialexander/status/2052067579594707149&quot;&gt;@Dorialexander&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Alternatively, this proves the opposite in practical terms: &lt;strong&gt;whoever controls deployed compute shapes who can satisfy demand&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Anthropic’s platform features are &lt;strong&gt;not very differentiated&lt;/strong&gt; because open harnesses can replicate them: &lt;a href=&quot;https://x.com/RichNwan/status/2052085746526216601&quot;&gt;@RichNwan&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Or they &lt;strong&gt;are differentiated enough&lt;/strong&gt; because first-party integration can tightly couple model behavior, memory, evaluators, and product experience.&lt;/li&gt;
&lt;li&gt;Anthropic’s culture is unusually safety-focused and “good for humanity”: Elon Musk said after meeting senior Anthropic staff he was impressed and “no one set off my evil detector” &lt;a href=&quot;https://x.com/elonmusk/status/2052069691372478511&quot;&gt;@elonmusk&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Conversely, critics continue to frame Anthropic as overly paternalistic or exclusivist about AGI governance &lt;a href=&quot;https://x.com/_aidan_clark_/status/2052089187659346047&quot;&gt;@_aidan_clark_&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Different opinions in the discourse&lt;/h2&gt;
&lt;h3&gt;1) Positive / supportive&lt;/h3&gt;
&lt;p&gt;A large set of replies treated this as a win for users and evidence Anthropic is responding aggressively.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/alexalbert__/status/2052065953173872912&quot;&gt;@alexalbert__&lt;/a&gt;: “More chips, more Claude.”&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/_sholtodouglas/status/2052062164467224971&quot;&gt;@_sholtodouglas&lt;/a&gt;: “More compute -&gt; straight to you.”&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/kimmonismus/status/2052059448261177367&quot;&gt;@kimmonismus&lt;/a&gt; highlighted doubled limits and raised Opus API caps.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/TheRundownAI/status/2052064469371470218&quot;&gt;@TheRundownAI&lt;/a&gt; summarized it as a straightforward user benefit.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/DannyLimanseta/status/2052078750893056420&quot;&gt;@DannyLimanseta&lt;/a&gt; liked the cross-company cooperation and hoped Anthropic’s caution might be balanced by SpaceXAI’s optimism.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/AmandaAskell/status/2052161052058833181&quot;&gt;@AmandaAskell&lt;/a&gt; reacted positively to the announcement’s symbolism.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2) Mixed / pragmatic&lt;/h3&gt;
&lt;p&gt;These takes welcomed the change but focused on operational details and remaining limitations.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/btibor91/status/2052067002412335435&quot;&gt;@btibor91&lt;/a&gt; and &lt;a href=&quot;https://x.com/kimmonismus/status/2052061694080188720&quot;&gt;@kimmonismus&lt;/a&gt; immediately noted the likely caveat: &lt;strong&gt;weekly caps unchanged&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/TheAmolAvasare/status/2052064611692904639&quot;&gt;@TheAmolAvasare&lt;/a&gt; answered this directly.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/sbmaruf/status/2052119971820658771&quot;&gt;@sbmaruf&lt;/a&gt; reported still seeing rate limits after the change, implying rollout and reliability tuning were ongoing.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/zachtratar/status/2052161984968396819&quot;&gt;@zachtratar&lt;/a&gt; asked for patience during staged rollout.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3) Competitive / strategic critique&lt;/h3&gt;
&lt;p&gt;A different cluster viewed the announcement through the OpenAI-vs-Anthropic product war.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/scaling01/status/2052070594972090409&quot;&gt;@scaling01&lt;/a&gt; argued Anthropic &lt;strong&gt;blundered its growth advantage by waiting too long&lt;/strong&gt;, possibly conceding billions in ARR to OpenAI.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/Yuchenj_UW/status/2052065017072386450&quot;&gt;@Yuchenj_UW&lt;/a&gt; read the move as Dario getting aggressive because of &lt;strong&gt;OpenAI Codex’s growth&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/_arohan_/status/2052053181656641735&quot;&gt;@_arohan_&lt;/a&gt; joked that “Big tech has become a claude wrapper,” pointing to Claude’s developer mindshare.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/dejavucoder/status/2052051193376231845&quot;&gt;@dejavucoder&lt;/a&gt;’s post, “claude is down, saint tibo please reset codex limits,” captured the practical reality of multi-homing across coding tools when one service is capacity-constrained.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;4) Governance / safety / culture critique&lt;/h3&gt;
&lt;p&gt;This is the deepest philosophical disagreement.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/_aidan_clark_/status/2052089187659346047&quot;&gt;@_aidan_clark_&lt;/a&gt; criticized what he says he repeatedly hears from Anthropic colleagues: a belief they alone should be trusted to build AI.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/kipperrii/status/2052094851991392536&quot;&gt;@kipperrii&lt;/a&gt; partially agreed the “only we can be trusted” framing would be bad, but argued the real majority view is closer to &lt;strong&gt;“no one can be trusted with AGI”&lt;/strong&gt; while still personally trusting Anthropic more than others.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/elonmusk/status/2052069691372478511&quot;&gt;@elonmusk&lt;/a&gt; offered a surprising endorsement after meeting Anthropic leaders.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/Yuchenj_UW/status/2052080339364004317&quot;&gt;@Yuchenj_UW&lt;/a&gt; called this reversal ironic given prior criticism of Anthropic.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/teortaxesTex/status/2052080900280557749&quot;&gt;@teortaxesTex&lt;/a&gt; mocked the rapid détente between Musk/xAI and Anthropic.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/teortaxesTex/status/2052045988936683674&quot;&gt;@teortaxesTex&lt;/a&gt; also argued it is inconsistent to warn others about AI risk while building powerful closed systems such as “Mythos.”&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/goodside/status/2052077014346064372&quot;&gt;@goodside&lt;/a&gt;, while not directly about Anthropic governance, contributed to the broader moral/AI norms debate that often clusters around Anthropic.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Commentary on Claude model performance and comparisons&lt;/h2&gt;
&lt;p&gt;Though no major new Claude model appears in these tweets, Claude remained a reference point in product and eval discourse.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/giffmana/status/2051925008457273527&quot;&gt;@giffmana&lt;/a&gt; compared “Opus 4.6,” ChatGPT Pro, and Muse Spark on a mathematical disagreement. His take:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Opus 4.6&lt;/strong&gt; confidently defended a wrong proof (“gaslit”)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ChatGPT Pro&lt;/strong&gt; reconciled the formulas correctly but without interpretation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Muse Spark&lt;/strong&gt; did both well&lt;br&gt;
This is anecdotal, but it’s one of the more concrete comparative qualitative model reports in the set.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/kimmonismus/status/2052040471829004627&quot;&gt;@kimmonismus&lt;/a&gt; summarized a Substack analysis claiming &lt;strong&gt;GPT-5.5 is basically tied with Claude Mythos Preview on cyber&lt;/strong&gt;, perhaps more cost-efficient, while Mythos is only slightly ahead on some general benchmarks and SWE-bench Pro; he questioned why Mythos remains secretive.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/AssemblyAI/status/2052043337751056733&quot;&gt;@AssemblyAI&lt;/a&gt; noted support for &lt;strong&gt;structured JSON from Claude 4.5+ models&lt;/strong&gt; in its gateway.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/TencentHunyuan/status/2051978552900538403&quot;&gt;@OpenRouter/TencentHunyuan&lt;/a&gt; listed &lt;strong&gt;Claude Code&lt;/strong&gt; among major apps driving Hy3 usage, showing Claude’s importance in the coding-tool ecosystem even when third-party models are used behind the scenes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These comments don’t establish hard model ranking, but they do show Claude is still a primary benchmark in coding-agent workflows and that advanced users increasingly compare &lt;strong&gt;model + harness + limits + reliability&lt;/strong&gt;, not just base intelligence.&lt;/p&gt;
&lt;h2&gt;Claude Code and harness engineering context&lt;/h2&gt;
&lt;p&gt;A notable background thread across the dataset is that many engineers now think &lt;strong&gt;agent performance is heavily dependent on the harness&lt;/strong&gt;—system prompts, tools, middleware, decomposition strategies, and model-specific tuning.&lt;/p&gt;
&lt;p&gt;Relevant non-Anthropic commentary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/masondrxy/status/2052054177749029164&quot;&gt;@masondrxy&lt;/a&gt;: same model, same task, very different scores depending on prompts/tools/middleware; &lt;strong&gt;10–20 point jumps on tau2-bench&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/LangChain/status/2052054711440662864&quot;&gt;@LangChain&lt;/a&gt;: harness profiles for OpenAI, Anthropic, and Google models.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/jakebroekhuizen/status/2052058987580051566&quot;&gt;@jakebroekhuizen&lt;/a&gt;: distinguishes &lt;strong&gt;temporal harness evolution&lt;/strong&gt; (adapting a harness as models improve) from &lt;strong&gt;lateral tuning across model families&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/Vtrivedy10/status/2052100726608781363&quot;&gt;@Vtrivedy10&lt;/a&gt;: argues a tailored harness can outperform default Codex/Claude Code on many tasks; usable context windows are still effectively &lt;strong&gt;50–100k&lt;/strong&gt; tokens for many agent designs.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://x.com/kieranklaassen/status/2052092428438688027&quot;&gt;@kieranklaassen&lt;/a&gt;: “If you cannot get your work done [in] the Claude CLI, Claude will not be able to work for you.”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This matters because some of Anthropic’s platform moves—memory, grading, managed agents—can be read as &lt;strong&gt;Anthropic productizing parts of the harness&lt;/strong&gt;. That helps explain the central debate: &lt;strong&gt;are these defensible platform primitives, or just first-party packaging of patterns that open frameworks can clone?&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;Broader context: why this matters&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Inference, not just training, is now a frontier bottleneck.&lt;/strong&gt;&lt;br&gt;
The news was not a new model launch; it was a capacity launch. That is increasingly common at the frontier.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Compute markets are becoming fluid and strategic.&lt;/strong&gt;&lt;br&gt;
Anthropic partnering with SpaceX/xAI infrastructure undercuts simplistic narratives that each frontier lab sits only atop its own vertically integrated stack.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Developer product share is sensitive to reliability and limits.&lt;/strong&gt;&lt;br&gt;
Claude appears to have strong developer affinity, but rate limits and outages push users toward Codex/Cursor/others quickly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The battleground is shifting from base models to agent systems.&lt;/strong&gt;&lt;br&gt;
“Code with Claude,” managed agents, Dreaming, Outcomes, and the surrounding discourse all point toward the next layer of competition being &lt;strong&gt;memory, orchestration, evals, and workflow integration&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Anthropic’s brand remains bifurcated.&lt;/strong&gt;&lt;br&gt;
It is simultaneously:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;admired for product quality and safety seriousness,&lt;/li&gt;
&lt;li&gt;criticized for paternalism or perceived exclusivism,&lt;/li&gt;
&lt;li&gt;and now seen as more commercially aggressive on compute than before.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Bottom line&lt;/h2&gt;
&lt;p&gt;Anthropic’s news was less about a flashy new model and more about a structural reality: &lt;strong&gt;Claude demand had outrun available compute, and Anthropic responded by striking a major external infrastructure deal and immediately easing key user limits&lt;/strong&gt; &lt;a href=&quot;https://x.com/claudeai/status/2052060691893227611&quot;&gt;@claudeai&lt;/a&gt;, &lt;a href=&quot;https://x.com/claudeai/status/2052060693269008586&quot;&gt;@claudeai&lt;/a&gt;. The most important technical/economic signal is that &lt;strong&gt;capacity, rate limits, and agent-product ergonomics are now as strategically important as leaderboard deltas&lt;/strong&gt;. The main open questions are whether Anthropic can convert this capacity into sustained product momentum, whether its managed-agent features are truly differentiated, and whether its safety/governance posture helps or hinders its standing as competition with OpenAI, Google, xAI, and open-model ecosystems intensifies.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Infrastructure, inference, and systems&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OpenAI and partners released &lt;strong&gt;MRC (Multipath Reliable Connection)&lt;/strong&gt;, an open networking protocol for large AI training clusters, already deployed on OpenAI’s biggest supercomputers &lt;a href=&quot;https://x.com/OpenAI/status/2052025532485902368&quot;&gt;@OpenAI&lt;/a&gt;, &lt;a href=&quot;https://x.com/OpenAI/status/2052025533937103102&quot;&gt;@OpenAI&lt;/a&gt;. Commentary emphasized multipath routing, microsecond failover, and the shift of networking into a primary frontier bottleneck &lt;a href=&quot;https://x.com/kimmonismus/status/2052011784023028060&quot;&gt;@kimmonismus&lt;/a&gt;, &lt;a href=&quot;https://x.com/gdb/status/2052059553542328829&quot;&gt;@gdb&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Perplexity said it built an in-house inference engine, &lt;strong&gt;ROSE&lt;/strong&gt;, covering models from embeddings to trillion-parameter LLMs, and uses &lt;strong&gt;CuTeDSL&lt;/strong&gt; to accelerate specialized kernel development on Hopper and Blackwell &lt;a href=&quot;https://x.com/perplexity_ai/status/2052041903970148647&quot;&gt;@perplexity_ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;vLLM + Mooncake presented a strong systems result for agentic workloads with reusable prefixes: &lt;strong&gt;3.8x throughput&lt;/strong&gt;, &lt;strong&gt;46x lower P50 TTFT&lt;/strong&gt;, &lt;strong&gt;8.6x lower end-to-end latency&lt;/strong&gt;, and cache-hit improvement from &lt;strong&gt;1.7% to 92.2%&lt;/strong&gt;, scaling to &lt;strong&gt;60 GB200 GPUs&lt;/strong&gt; &lt;a href=&quot;https://x.com/vllm_project/status/2052113331927060840&quot;&gt;@vllm_project&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Unsloth + NVIDIA published three training optimizations claimed to make home-GPU LLM training &lt;strong&gt;~25% faster&lt;/strong&gt;: packed-sequence metadata caching, double-buffered checkpoint reloads, and faster MoE routing &lt;a href=&quot;https://x.com/UnslothAI/status/2052020656527532276&quot;&gt;@UnslothAI&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;NVIDIA work on &lt;strong&gt;lossless speculative decoding inside RL&lt;/strong&gt; was highlighted as giving up to &lt;strong&gt;~2.5x faster end-to-end RL at 235B scale&lt;/strong&gt; and &lt;strong&gt;~1.8x faster rollout throughput at 8B&lt;/strong&gt; without changing policy distribution &lt;a href=&quot;https://x.com/TheTuringPost/status/2052180472206381268&quot;&gt;@TheTuringPost&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Baseten launched &lt;strong&gt;Frontier Gateway&lt;/strong&gt; as managed infra/API/auth/rate-limit/billing for closed-weight labs; Poolside reported going from kickoff to production in &lt;strong&gt;7 weeks&lt;/strong&gt;, with &lt;strong&gt;P50 TTFT 146ms&lt;/strong&gt; for Laguna XS.2 and &lt;strong&gt;605ms&lt;/strong&gt; for Laguna M.1 &lt;a href=&quot;https://x.com/tuhinone/status/2052082677432390130&quot;&gt;@tuhinone&lt;/a&gt;, &lt;a href=&quot;https://x.com/poolsideai/status/2052075055132057707&quot;&gt;@poolsideai&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Benchmarks, evals, and agent harnesses&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ProgramBench&lt;/strong&gt; asks whether language models can rebuild programs from scratch, extending beyond repair-style SWE tasks &lt;a href=&quot;https://x.com/ComputerPapers/status/2051895799043215415&quot;&gt;@ComputerPapers&lt;/a&gt;, with Ofir Press arguing benchmarks are “treasure maps” that specify the future we want &lt;a href=&quot;https://x.com/OfirPress/status/2052106927908200957&quot;&gt;@OfirPress&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Terminal-Bench 2.1&lt;/strong&gt; patched &lt;strong&gt;28/89 tasks&lt;/strong&gt; in TB2.0; rankings held but absolute scores moved by up to &lt;strong&gt;12 points&lt;/strong&gt;, a useful reminder that agent benchmark maintenance materially matters &lt;a href=&quot;https://x.com/terminalbench/status/2052119174500220964&quot;&gt;@terminalbench&lt;/a&gt;, &lt;a href=&quot;https://x.com/ekellbuch/status/2052165464655298866&quot;&gt;@ekellbuch&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OBLIQ-Bench&lt;/strong&gt; emerged as a major IR benchmark release focused on hard first-stage retrieval, where current retrievers fail to surface subtly relevant documents from large corpora &lt;a href=&quot;https://x.com/dianetc_/status/2052053806121140254&quot;&gt;@dianetc_&lt;/a&gt;, with strong endorsements from IR researchers &lt;a href=&quot;https://x.com/lateinteraction/status/2052055143038713875&quot;&gt;@lateinteraction&lt;/a&gt;, &lt;a href=&quot;https://x.com/nlp_mit/status/2052069072607547892&quot;&gt;@nlp_mit&lt;/a&gt;, &lt;a href=&quot;https://x.com/LightOnIO/status/2052095548098822477&quot;&gt;@LightOnIO&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Harvey launched &lt;strong&gt;LAB&lt;/strong&gt;, an open-source, long-horizon legal agent benchmark covering &lt;strong&gt;1,200 tasks across 24 practice areas&lt;/strong&gt;, with support/commentary from LangChain, Baseten, Artificial Analysis, and others &lt;a href=&quot;https://x.com/saranormous/status/2052061665596948894&quot;&gt;@saranormous&lt;/a&gt;, &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2052145762650431840&quot;&gt;@ArtificialAnlys&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A major theme across multiple tweets was that &lt;strong&gt;harness engineering is a first-class variable&lt;/strong&gt;, often worth &lt;strong&gt;10–20 points&lt;/strong&gt; on agent benchmarks even with the same base model &lt;a href=&quot;https://x.com/masondrxy/status/2052054177749029164&quot;&gt;@masondrxy&lt;/a&gt;, &lt;a href=&quot;https://x.com/LangChain/status/2052054711440662864&quot;&gt;@LangChain&lt;/a&gt;, &lt;a href=&quot;https://x.com/Vtrivedy10/status/2052100726608781363&quot;&gt;@Vtrivedy10&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model releases and model performance&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Zyphra released &lt;strong&gt;ZAYA1-8B&lt;/strong&gt;, a reasoning MoE with &lt;strong&gt;&amp;#x3C;1B active parameters&lt;/strong&gt;, open-weight under &lt;strong&gt;Apache 2.0&lt;/strong&gt;, claiming strong math/reasoning efficiency and proximity to much larger systems with test-time compute &lt;a href=&quot;https://x.com/ZyphraAI/status/2052103618145501459&quot;&gt;@ZyphraAI&lt;/a&gt;, &lt;a href=&quot;https://x.com/ZyphraAI/status/2052103646712828119&quot;&gt;@ZyphraAI&lt;/a&gt;. Commentary praised its architecture/post-training stack and AMD partnership &lt;a href=&quot;https://x.com/teortaxesTex/status/2052106600882528326&quot;&gt;@teortaxesTex&lt;/a&gt;, &lt;a href=&quot;https://x.com/eliebakouch/status/2052126118891729148&quot;&gt;@eliebakouch&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Google’s &lt;strong&gt;Gemma 4&lt;/strong&gt; moved the open-model Pareto frontier in Code Arena: &lt;strong&gt;Gemma-4-31B #13&lt;/strong&gt;, &lt;strong&gt;Gemma-4-26B-A4B #17&lt;/strong&gt; among open models &lt;a href=&quot;https://x.com/arena/status/2052061349312921686&quot;&gt;@arena&lt;/a&gt;, &lt;a href=&quot;https://x.com/_philschmid/status/2052104144706588699&quot;&gt;@_philschmid&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Google’s &lt;strong&gt;DFlash draft model for Gemma-4&lt;/strong&gt; was described as one of the best draft models they’ve trained, especially strong in coding and math &lt;a href=&quot;https://x.com/jianchen1799/status/2051902953376923946&quot;&gt;@jianchen1799&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Qwopus3.6-35B-A3B-v1 claimed &lt;strong&gt;162 tok/s on a single RTX 5090&lt;/strong&gt;, targeting strong one-shot frontend/web generation on consumer hardware &lt;a href=&quot;https://x.com/KyleHessling1/status/2052064943999267212&quot;&gt;@KyleHessling1&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;DeepSeek commentary was mixed: fundraising talks reportedly target a &lt;strong&gt;$45B valuation&lt;/strong&gt; led by a major Chinese state-backed semiconductor fund &lt;a href=&quot;https://x.com/jukan05/status/2051904572038455634&quot;&gt;@jukan05&lt;/a&gt;, while evaluators debated weak WeirdML performance for V4-Pro versus GLM/Kimi/open competitors &lt;a href=&quot;https://x.com/htihle/status/2052042076196335658&quot;&gt;@htihle&lt;/a&gt;, &lt;a href=&quot;https://x.com/teortaxesTex/status/2052043753892761882&quot;&gt;@teortaxesTex&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Agents, tools, and developer workflows&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cursor added &lt;strong&gt;context usage breakdowns&lt;/strong&gt; across rules, skills, MCPs, and subagents to help debug context issues &lt;a href=&quot;https://x.com/cursor_ai/status/2052059748544249918&quot;&gt;@cursor_ai&lt;/a&gt;, and described bootstrapping future Composer generations with earlier Composer models &lt;a href=&quot;https://x.com/cursor_ai/status/2052116064474161556&quot;&gt;@cursor_ai&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Cognition shipped &lt;strong&gt;Devin Review&lt;/strong&gt; and &lt;strong&gt;Quick Review / SWE-Check&lt;/strong&gt; in Windsurf 2.0, explicitly targeting the new bottleneck of reviewing AI-generated code &lt;a href=&quot;https://x.com/cognition/status/2052100630626607189&quot;&gt;@cognition&lt;/a&gt;, &lt;a href=&quot;https://x.com/ypatil125/status/2052122827961278833&quot;&gt;@ypatil125&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;OpenAI promoted &lt;strong&gt;Codex subagents&lt;/strong&gt;, framing them as a way to split work across specialized agents and merge results back into one answer &lt;a href=&quot;https://x.com/reach_vb/status/2052090279344120278&quot;&gt;@reach_vb&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Nous/Hermes continued to push a highly pluggable local agent stack: plugin expansion, community docs, Windows/WSL2 setup guidance, and use-case aggregation &lt;a href=&quot;https://x.com/Teknium/status/2052046335583625629&quot;&gt;@Teknium&lt;/a&gt;, &lt;a href=&quot;https://x.com/witcheer/status/2052033039379673374&quot;&gt;@witcheer&lt;/a&gt;, &lt;a href=&quot;https://x.com/NousResearch/status/2052140057222369541&quot;&gt;@NousResearch&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Perplexity added &lt;strong&gt;Finance Search&lt;/strong&gt; to its Agent API with licensed data, live market data, and citations, claiming best cohort accuracy and lowest cost per correct answer on &lt;strong&gt;FinSearchComp T1&lt;/strong&gt; &lt;a href=&quot;https://x.com/perplexity_ai/status/2052028012313649194&quot;&gt;@perplexity_ai&lt;/a&gt;, &lt;a href=&quot;https://x.com/AravSrinivas/status/2052033959555735752&quot;&gt;@AravSrinivas&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Google’s Gemini API added &lt;strong&gt;multimodal retrieval&lt;/strong&gt; to File Search using &lt;code&gt;gemini-embedding-2&lt;/code&gt; for PDFs and images in a single retrieval pipeline &lt;a href=&quot;https://x.com/_philschmid/status/2052060912425546050&quot;&gt;@_philschmid&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Robotics, multimodality, and research notes&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Genesis AI introduced &lt;strong&gt;GENE-26.5&lt;/strong&gt;, describing a full-stack robotics program with a robotics-native foundation model, human-like hand, data glove, and simulator; the model is trained across &lt;strong&gt;language, vision, proprioception, tactile, and action&lt;/strong&gt; &lt;a href=&quot;https://x.com/gs_ai_/status/2052050956272230577&quot;&gt;@gs_ai_&lt;/a&gt;, &lt;a href=&quot;https://x.com/theo_gervet/status/2052057035681018359&quot;&gt;@theo_gervet&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Meta FAIR released &lt;strong&gt;NeuralBench&lt;/strong&gt;, an MIT-licensed unified benchmark framework for NeuroAI with &lt;strong&gt;36 EEG tasks&lt;/strong&gt; and &lt;strong&gt;94 datasets&lt;/strong&gt;, with MEG/fMRI support planned &lt;a href=&quot;https://x.com/hubertjbanville/status/2052029372282888234&quot;&gt;@hubertjbanville&lt;/a&gt;, &lt;a href=&quot;https://x.com/JeanRemiKing/status/2052034314120896582&quot;&gt;@JeanRemiKing&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Sander Dieleman published a long technical post on &lt;strong&gt;flow maps&lt;/strong&gt;, learning the integral of a diffusion model for faster sampling and related tricks &lt;a href=&quot;https://x.com/sedielem/status/2051957402556104799&quot;&gt;@sedielem&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;François Fleuret sketched a speculative recipe for stronger systems: &lt;strong&gt;latent diffusion-like reasoning + real recurrent state + world-model pre-pretraining&lt;/strong&gt; &lt;a href=&quot;https://x.com/francoisfleuret/status/2051928896027693479&quot;&gt;@francoisfleuret&lt;/a&gt;, generating useful discussion on whether diffusion-style reasoning extrapolates the right way &lt;a href=&quot;https://x.com/willdepue/status/2052033422915477580&quot;&gt;@willdepue&lt;/a&gt;, &lt;a href=&quot;https://x.com/jeremyphoward/status/2052149483740545400&quot;&gt;@jeremyphoward&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;HeadVis was introduced as a new interpretability tool for studying attention heads &lt;a href=&quot;https://x.com/kamath_harish/status/2052046203030827088&quot;&gt;@kamath_harish&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Microsoft Research work on &lt;strong&gt;agent-readable interpretability&lt;/strong&gt; proposed “Agentic-imodels,” where coding agents evolve models that are interpretable to other LLMs; reported gains on &lt;strong&gt;65 tabular datasets&lt;/strong&gt; and downstream BLADE improvements from &lt;strong&gt;8% to 73%&lt;/strong&gt; &lt;a href=&quot;https://x.com/dair_ai/status/2052125514266190286&quot;&gt;@dair_ai&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;AI Reddit Recap&lt;/h1&gt;
&lt;h2&gt;/r/LocalLlama + /r/localLLM Recap&lt;/h2&gt;
&lt;h3&gt;1. MTP and Quantized Local Inference&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t4jq6h/gemma_4_mtp_released/&quot;&gt;Gemma 4 MTP released&lt;/a&gt;&lt;/strong&gt; (Activity: 1575): &lt;strong&gt;&lt;strong&gt;Google released Multi-Token Prediction (MTP) draft checkpoints for Gemma 4&lt;/strong&gt;—&lt;a href=&quot;https://huggingface.co/google/gemma-4-31B-it-assistant&quot;&gt;&lt;code&gt;31B-it-assistant&lt;/code&gt;&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/google/gemma-4-26B-A4B-it-assistant&quot;&gt;&lt;code&gt;26B-A4B-it-assistant&lt;/code&gt;&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/google/gemma-4-E4B-it-assistant&quot;&gt;&lt;code&gt;E4B-it-assistant&lt;/code&gt;&lt;/a&gt;, and &lt;a href=&quot;https://huggingface.co/google/gemma-4-E2B-it-assistant&quot;&gt;&lt;code&gt;E2B-it-assistant&lt;/code&gt;&lt;/a&gt;—described in Google’s &lt;a href=&quot;https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/&quot;&gt;announcement&lt;/a&gt;. The model cards say MTP extends the base model with a smaller draft model for &lt;strong&gt;speculative decoding&lt;/strong&gt;, where the draft predicts multiple tokens ahead and the target model verifies them in parallel, claiming &lt;strong&gt;up to &lt;code&gt;2x&lt;/code&gt; decoding speedup&lt;/strong&gt; with &lt;em&gt;“the exact same quality as standard generation.”&lt;/em&gt; A commenter notes the smallest &lt;code&gt;E2B&lt;/code&gt; variant uses a &lt;strong&gt;&lt;code&gt;78M&lt;/code&gt; draft model&lt;/strong&gt;, and another shared a technical visual explainer on MTP with Gemma 4 &lt;a href=&quot;https://newsletter.maartengrootendorst.com/i/193064129/multi-token-prediction-mtp-with-gemma-4&quot;&gt;here&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A commenter linked an updated visual explainer of &lt;strong&gt;multi-token prediction (MTP)&lt;/strong&gt; for Gemma 4, including implementation-oriented snippets: &lt;a href=&quot;https://newsletter.maartengrootendorst.com/i/193064129/multi-token-prediction-mtp-with-gemma-4&quot;&gt;Maarten Grootendorst’s guide&lt;/a&gt;. This is relevant for understanding how Gemma 4’s MTP setup predicts multiple future tokens per forward pass and how that interacts with speculative/draft-style decoding.&lt;/li&gt;
&lt;li&gt;One technical detail called out is that the &lt;strong&gt;E2B model includes a &lt;code&gt;78M&lt;/code&gt;-parameter draft model&lt;/strong&gt;, implying a lightweight auxiliary model for faster generation workflows such as speculative decoding. The small draft size is notable because it can reduce decode latency while keeping the verifier/main model responsible for final token acceptance.&lt;/li&gt;
&lt;/ul&gt;
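&lt;p&gt;The draft-and-verify loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Gemma 4’s or any library’s implementation: &lt;code&gt;target_next&lt;/code&gt; and &lt;code&gt;draft_next&lt;/code&gt; are hypothetical stand-in functions, decoding is greedy, and verification runs one position at a time rather than in a single batched pass. It only aims to show why the accepted output can match standard greedy generation from the target model.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # greedy "model": context -> next token id

VOCAB = 50  # toy vocabulary size (illustrative assumption)


def target_next(ctx: List[int]) -> int:
    """Hypothetical stand-in for the large target model (deterministic, greedy)."""
    return (sum(ctx) * 31 + len(ctx)) % VOCAB


def draft_next(ctx: List[int]) -> int:
    """Hypothetical stand-in for the small draft model.

    It agrees with the target except when the context length is a multiple of 4.
    """
    guess = target_next(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % VOCAB


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], float]:
    """Greedy draft-and-verify; returns (tokens, fraction of checked draft tokens accepted)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    accepted = checked = 0
    while len(out) != goal:
        # 1) The draft proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2) The target verifies the proposal. Real systems score all positions
        #    in one batched forward pass; this toy loop checks them one by one.
        for tok in proposal:
            expected = target(out)
            checked += 1
            if tok != expected:
                out.append(expected)  # first mismatch: keep the target token, drop the rest
                break
            out.append(tok)
            accepted += 1
    return out, (accepted / checked if checked else 0.0)


prompt = [1, 2, 3]
spec_tokens, accept_rate = speculative_decode(target_next, draft_next, prompt, n_new=20)
assert spec_tokens == greedy_decode(target_next, prompt, 20)  # identical output
print(f"draft acceptance rate: {accept_rate:.2f}")
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every emitted token is one the target itself would have chosen, so any speedup depends on how often the draft agrees and how cheaply the target can verify several positions at once. Sampling-based decoding needs a rejection-sampling acceptance rule instead of the equality check used here.&lt;/p&gt;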
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t57xuu/25x_faster_inference_with_qwen_36_27b_using_mtp/&quot;&gt;2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding - 262k context on 48GB - Fixed chat template - Drop-in OpenAI and Anthropic API endpoints&lt;/a&gt;&lt;/strong&gt; (Activity: 1445): &lt;strong&gt;A llama.cpp PR (&lt;a href=&quot;https://github.com/ggml-org/llama.cpp/pull/22673&quot;&gt;&lt;code&gt;pull/22673&lt;/code&gt;&lt;/a&gt;) adds &lt;strong&gt;Qwen 3.6 27B MTP&lt;/strong&gt; support for speculative decoding using the model’s built-in multi-token prediction heads; the author reports &lt;strong&gt;~&lt;code&gt;2.5×&lt;/code&gt; faster generation&lt;/strong&gt; on an M2 Max 96GB, reaching &lt;strong&gt;&lt;code&gt;28 tok/s&lt;/code&gt;&lt;/strong&gt;, and published converted GGUFs with MTP tensors at &lt;a href=&quot;https://huggingface.co/froggeric/Qwen3.6-27B-MTP-GGUF&quot;&gt;froggeric/Qwen3.6-27B-MTP-GGUF&lt;/a&gt;. The setup combines &lt;code&gt;--spec-type mtp --spec-draft-n-max 5&lt;/code&gt;, &lt;code&gt;q4_0&lt;/code&gt;/&lt;code&gt;q8_0&lt;/code&gt; KV-cache quantization, and long contexts up to &lt;strong&gt;&lt;code&gt;262144&lt;/code&gt; tokens&lt;/strong&gt;, with claimed viability on &lt;strong&gt;48GB Mac/VRAM-class systems&lt;/strong&gt;; the author also uploaded fixed non-vLLM-specific Jinja chat templates at &lt;a href=&quot;https://huggingface.co/froggeric/Qwen-Fixed-Chat-Templates&quot;&gt;froggeric/Qwen-Fixed-Chat-Templates&lt;/a&gt;. 
Caveats: current MTP support requires building llama.cpp from the PR branch, &lt;code&gt;q4_0&lt;/code&gt; KV has some quality loss, and &lt;strong&gt;vision currently crashes llama.cpp when used with MTP&lt;/strong&gt;; one commenter benchmarked Qwen 3.6 27B Q8 on an RTX Pro 6000 MaxQ at &lt;strong&gt;&lt;code&gt;36 tok/s&lt;/code&gt; → &lt;code&gt;78 tok/s&lt;/code&gt; with MTP&lt;/strong&gt;, while noting ~&lt;code&gt;20%&lt;/code&gt; slower prompt processing.&lt;/strong&gt; Comments were broadly enthusiastic, framing recent open-model and inference-runtime progress as unusually rapid and especially important for consumer/local hardware. One technical question asked whether “turbo3/turbo4” had been merged or whether it was part of the MTP PR.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A user reported a concrete MTP speedup on an &lt;strong&gt;RTX Pro 6000 MaxQ&lt;/strong&gt;: &lt;code&gt;qwen 3.6 27B Q8&lt;/code&gt; increased from &lt;code&gt;36 tokens/s&lt;/code&gt; to &lt;code&gt;78 tokens/s&lt;/code&gt; with MTP enabled, while prompt-processing speed dropped by about &lt;code&gt;20%&lt;/code&gt;. They said generation quality appeared unchanged, making the tradeoff strongly favorable for decode-heavy workloads.&lt;/li&gt;
&lt;li&gt;One commenter asked whether the &lt;code&gt;turbo3&lt;/code&gt;/&lt;code&gt;turbo4&lt;/code&gt; changes had already been merged or whether the observed speedup is specifically part of the &lt;strong&gt;MTP PR&lt;/strong&gt;, highlighting uncertainty about which inference optimization path is responsible for the gains.&lt;/li&gt;
&lt;li&gt;There was a technical comparison request against &lt;strong&gt;Qwen 3.6 Dflash&lt;/strong&gt; models and low-bit &lt;code&gt;iq3_XS&lt;/code&gt; quantizations. The commenter noted they can usually fit &lt;code&gt;256k&lt;/code&gt; context in &lt;code&gt;16GB&lt;/code&gt; VRAM and asked whether the released quants can also support &lt;code&gt;256k&lt;/code&gt; context when not using &lt;code&gt;mmproj&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t53dhp/quality_comparison_between_qwen_36_27b/&quot;&gt;Quality comparison between Qwen 3.6 27B quantizations (BF16, Q8_0, Q6_K, Q5_K_XL, Q4_K_XL, IQ4_XS, IQ3_XXS,...)&lt;/a&gt;&lt;/strong&gt; (Activity: 771): &lt;strong&gt;A Reddit user benchmarked &lt;strong&gt;Qwen 3.6 27B&lt;/strong&gt; quantizations on a synthetic chess-to-SVG task requiring PGN state tracking, board orientation, piece placement, and last-move highlighting, using &lt;code&gt;llama.cpp&lt;/code&gt; with &lt;code&gt;temp=0.6&lt;/code&gt;, &lt;code&gt;top_p=0.95&lt;/code&gt;, &lt;code&gt;top_k=20&lt;/code&gt;, &lt;code&gt;presence_penalty=1.0&lt;/code&gt;, and &lt;code&gt;ctx=65536&lt;/code&gt;. In this single-run test, &lt;strong&gt;BF16/Q8_0&lt;/strong&gt; were essentially correct, &lt;strong&gt;Q6_K&lt;/strong&gt; showed pawn-placement degradation, &lt;strong&gt;Q5_K_XL/Q4_K_XL/IQ4_XS&lt;/strong&gt; remained mostly usable, while &lt;strong&gt;Q3/Q2&lt;/strong&gt; variants increasingly failed layout/orientation; the author chose &lt;strong&gt;IQ4_XS&lt;/strong&gt; as the practical floor for a &lt;code&gt;16 GB&lt;/code&gt; VRAM RTX 5060 Ti setup. 
They report &lt;code&gt;~100 pp tps / 8 tg tps&lt;/code&gt; (prompt-processing and token-generation tokens per second) with vanilla &lt;code&gt;llama.cpp&lt;/code&gt;, improving to &lt;code&gt;~760 pp tps / 22 tg tps&lt;/code&gt; using &lt;strong&gt;TheTom’s TurboQuant fork&lt;/strong&gt; with &lt;code&gt;-ngl 99&lt;/code&gt;, &lt;code&gt;-ctk turbo4&lt;/code&gt;, &lt;code&gt;-ctv turbo2&lt;/code&gt;, and &lt;code&gt;&amp;#x3C;75k&lt;/code&gt; context; full outputs are posted at &lt;a href=&quot;https://qwen3-6-27b-benchmark.vercel.app/&quot;&gt;qwen3-6-27b-benchmark.vercel.app&lt;/a&gt;.&lt;/strong&gt; Top technical feedback praised the benchmark but emphasized that &lt;em&gt;“one run is not enough”&lt;/em&gt; because stochastic decoding can make individual quant results outliers; commenters still noted the observed degradation trend broadly matches expectations.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several commenters raised a methodology concern: the quantization comparison appears to rely on single runs per test, which can produce &lt;strong&gt;statistical noise&lt;/strong&gt; and misleading quality differences. They suggested running each quant multiple times to detect outliers, especially because LLM evals can vary run-to-run even when an overall degradation trend is visible.&lt;/li&gt;
&lt;li&gt;One technical takeaway discussed was that &lt;strong&gt;&lt;code&gt;4-bit&lt;/code&gt; quantization may remain the practical sweet spot&lt;/strong&gt;, with &lt;code&gt;3-bit&lt;/code&gt; described as more usable than commonly claimed, while going beyond roughly &lt;code&gt;5-bit&lt;/code&gt; may offer diminishing returns versus moving to a larger/better base model. A commenter specifically contrasted cases like a much larger &lt;code&gt;122B UD-Q3_K_XL&lt;/code&gt; model against a smaller &lt;code&gt;35B IQ4_NL&lt;/code&gt; model to argue that model scale can outweigh higher-bit quantization quality.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. Agentic Coding and Cost Benchmarks&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t47qbw/deepseek_v4_pro_matches_gpt52_on_foodtruck_bench/&quot;&gt;DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper&lt;/a&gt;&lt;/strong&gt; (Activity: 478): &lt;strong&gt;The image is a &lt;strong&gt;technical leaderboard screenshot&lt;/strong&gt; for FoodTruck Bench showing &lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt; highlighted at rank &lt;code&gt;#4&lt;/code&gt; with &lt;code&gt;$27,142&lt;/code&gt; final net worth, &lt;code&gt;+1257% ROI&lt;/code&gt;, &lt;code&gt;51%&lt;/code&gt; margin, &lt;code&gt;$52,139&lt;/code&gt; revenue, and &lt;code&gt;$26,492&lt;/code&gt; profit over a 30-day agentic food-truck simulation starting from &lt;code&gt;$2,000&lt;/code&gt; (&lt;a href=&quot;https://i.redd.it/fx89f3w5n9zg1.png&quot;&gt;image&lt;/a&gt;). This supports the post’s claim that DeepSeek V4 Pro is within ~&lt;code&gt;3%&lt;/code&gt; of &lt;strong&gt;GPT-5.2&lt;/strong&gt;’s median outcome while reportedly being ~&lt;code&gt;17×&lt;/code&gt; cheaper on the same workload, making it a frontier-tier result in this benchmark at much lower API cost.&lt;/strong&gt; Commenters were impressed but skeptical about interpretation: one noted &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; appears far ahead in profit, while another questioned the benchmark’s credibility if &lt;strong&gt;Gemma 4 31B&lt;/strong&gt; can beat &lt;strong&gt;Sonnet 4.6&lt;/strong&gt;. There was also curiosity about absent newer GPT variants like “GPT 5.4/5.5.”&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several commenters focused on the benchmark ranking implications rather than the headline DeepSeek result: &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; reportedly achieves about &lt;code&gt;1.7×&lt;/code&gt; higher profit than the next cluster of models on &lt;strong&gt;FoodTruck Bench&lt;/strong&gt;, suggesting a sizable lead in this agentic profit-optimization benchmark despite DeepSeek V4 Pro matching &lt;strong&gt;GPT-5.2&lt;/strong&gt; at much lower cost.&lt;/li&gt;
&lt;li&gt;Multiple users called out &lt;strong&gt;Gemma 4 31B&lt;/strong&gt; as an under-discussed outlier: it appears in the top 5 on FoodTruck Bench, reportedly beats &lt;strong&gt;Sonnet 4.6&lt;/strong&gt;, and also performs well on &lt;strong&gt;EQBench&lt;/strong&gt;. Commenters questioned why Gemma is receiving less attention relative to Xiaomi/DeepSeek results if those rankings hold.&lt;/li&gt;
&lt;li&gt;There were requests to expand the comparison set with newer or missing models, specifically &lt;strong&gt;GPT-5.4/5.5&lt;/strong&gt;, the latest &lt;strong&gt;Qwen3.6&lt;/strong&gt; models, and a &lt;code&gt;27B&lt;/code&gt; model that one commenter expected might outperform Gemma. The implied concern is that the benchmark table may be incomplete or stale for evaluating current frontier and mid-size model competitiveness.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLM/comments/1t49wld/claude_code_opus_47_vs_opencode_qwen3627b_both/&quot;&gt;Claude Code @ Opus 4.7 vs OpenCode @ qwen3.6:27b. Both shipped a playable cozy roguelite.&lt;/a&gt;&lt;/strong&gt; (Activity: 406): &lt;strong&gt;A one-shot benchmark compared &lt;strong&gt;Claude Code on Opus 4.7&lt;/strong&gt; vs &lt;strong&gt;OpenCode on local Qwen3.6:27B&lt;/strong&gt; using identical VS Code devcontainers and a strict greenfield prompt for a vanilla Canvas/FastAPI roguelite; both produced a playable first-run game implementing movement, sword/shield combat, procedural world, drops, swap UI, and restart loop. Opus took ~&lt;code&gt;20 min&lt;/code&gt; and &lt;code&gt;97k&lt;/code&gt; tokens, while Qwen took ~&lt;code&gt;15 min&lt;/code&gt; and &lt;code&gt;64k&lt;/code&gt; tokens (about one-third fewer tokens), though the author explicitly limits the claim to tightly specified greenfield work rather than hard reasoning or existing-codebase maintenance. The post links a Reddit-hosted video at &lt;a href=&quot;https://v.redd.it/h4awffniaazg1&quot;&gt;&lt;code&gt;v.redd.it/h4awffniaazg1&lt;/code&gt;&lt;/a&gt;.&lt;/strong&gt; Commenters focused on reproducibility and local-model capability: one asked for the full prompt, while others characterized &lt;strong&gt;Qwen3.6 27B&lt;/strong&gt; as surprisingly strong for coding/tricky questions, less hallucination-prone than some MoE alternatives, and roughly comparable to last year’s &lt;strong&gt;Sonnet 4.5&lt;/strong&gt; for many coding tasks. Another commenter said the &lt;code&gt;35B&lt;/code&gt; variant performs well on large-codebase edit tasks when “properly harnessed.”&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Users requested key reproducibility details missing from the comparison: the exact prompt, hardware used for the local Qwen run, and whether any quantization was applied to &lt;code&gt;qwen3.6:27b&lt;/code&gt;. These details matter because local model throughput and coding quality can vary significantly with quantization level, memory bandwidth, and the GPU or Apple Silicon configuration.&lt;/li&gt;
&lt;li&gt;One commenter reported &lt;code&gt;Qwen3.6 27B&lt;/code&gt; running “very slow” on an &lt;strong&gt;M1 Pro&lt;/strong&gt;, but still handling coding and tricky questions well. They claimed it hallucinated less than &lt;code&gt;35B A3B&lt;/code&gt; and &lt;code&gt;Gemma MoE&lt;/code&gt;, and estimated it as roughly comparable to &lt;code&gt;Sonnet 4.5&lt;/code&gt; from the previous year, making it usable for “90% of coding tasks.”&lt;/li&gt;
&lt;li&gt;Another user argued that the &lt;code&gt;35B&lt;/code&gt; model performs strongly when “properly harnessed” and given large codebase context for inspection and edits, suggesting orchestration/context management may matter as much as raw model choice for coding-agent workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t4s6g2/deepseek_v4_being_17x_cheaper_got_me_to_actually/&quot;&gt;DeepSeek V4 being 17x cheaper got me to actually measure what I send to cloud vs what I could run locally. the results are stupid.&lt;/a&gt;&lt;/strong&gt; (Activity: 904): &lt;strong&gt;A developer instrumented &lt;code&gt;10&lt;/code&gt; days of coding-agent usage and re-ran a &lt;code&gt;150&lt;/code&gt;-task sample against a local &lt;strong&gt;Qwen 3.6 27B&lt;/strong&gt; model on an &lt;strong&gt;RTX 3090&lt;/strong&gt; versus cloud models, finding local parity for &lt;code&gt;97%&lt;/code&gt; of file-read/project-scan/explanation tasks (&lt;code&gt;35%&lt;/code&gt; of workload) and &lt;code&gt;88%&lt;/code&gt; of test/boilerplate/single-file-edit tasks (&lt;code&gt;30%&lt;/code&gt;). Local quality degraded on multi-file debugging (&lt;code&gt;61%&lt;/code&gt;, &lt;code&gt;20%&lt;/code&gt; of workload) and complex architecture/refactors across &lt;code&gt;5+&lt;/code&gt; files (&lt;code&gt;29%&lt;/code&gt;, &lt;code&gt;15%&lt;/code&gt;), so routing only the latter buckets to cloud reportedly cut API spend from &lt;code&gt;$85/month&lt;/code&gt; to about &lt;code&gt;$22/month&lt;/code&gt;.&lt;/strong&gt; Commenters generally agreed with a hybrid/local-first workflow: some report using local models for nearly all coding, escalating only to Gemini/ChatGPT/Claude/Qwen/GLM free tiers or cloud models for planning, oversight, unusually complex tasks, or non-code domains like health/legal. One commenter asked for implementation details on the task-type router/harness, implying the key missing technical artifact is the automation layer for classification and dispatch.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several commenters describe a &lt;strong&gt;hybrid local/cloud workflow&lt;/strong&gt;: local models handle most code-related tasks, while cloud or free web tiers such as &lt;strong&gt;ChatGPT, Claude, Gemini, Qwen, and GLM&lt;/strong&gt; (in some cases Gemini alone) are reserved for planning, oversight, or rare complex problems. One user reports running with &lt;strong&gt;zero subscriptions&lt;/strong&gt;, using cloud mostly for non-code domains like health/legal queries where local model reliability may be less acceptable.&lt;/li&gt;
&lt;li&gt;A key technical objection is that local models can be &lt;strong&gt;slower on large contexts&lt;/strong&gt; and impose hidden costs through extra verification/debugging time. One commenter argues that even if local inference is cheaper, the &lt;code&gt;~10%&lt;/code&gt; of cases where local models underperform can dominate productivity costs, and suggests hosted &lt;strong&gt;Qwen 3.6 27B / Qwen 3.6 Pro&lt;/strong&gt; may be faster and still only cost “a couple dollars a month.”&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Less Technical AI Subreddit Recap&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;1. Anthropic Claude Code Limits and Reliability&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeCode/comments/1t5hs98/doubled_rate_limits_for_claude_code/&quot;&gt;Doubled Rate Limits for Claude Code&lt;/a&gt;&lt;/strong&gt; (Activity: 3224): &lt;strong&gt;&lt;strong&gt;Anthropic&lt;/strong&gt; says a new compute partnership with &lt;strong&gt;SpaceX&lt;/strong&gt;, plus other recent compute deals, lets it raise Claude capacity: &lt;strong&gt;Claude Code&lt;/strong&gt; Pro/Max plans no longer get peak-hours limit reductions, and &lt;strong&gt;Claude API&lt;/strong&gt; rate limits for &lt;strong&gt;Opus&lt;/strong&gt; models are being “substantially” increased, effective immediately (&lt;a href=&quot;https://www.anthropic.com/news/higher-limits-spacex&quot;&gt;Anthropic announcement&lt;/a&gt;). The post frames this as “doubled rate limits,” but the quoted announcement itself specifies removal of peak-hour throttling for Claude Code and higher Opus API limits rather than giving exact numeric quotas.&lt;/strong&gt; Top comments were mostly non-technical surprise/skepticism and speculation about Elon Musk’s rivalry with Sam Altman/OpenAI.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeCode/comments/1t4w5an/ive_had_it_with_claude_it_has_become_complete/&quot;&gt;I&apos;ve had it with Claude. It has become complete garbage.&lt;/a&gt;&lt;/strong&gt; (Activity: 1716): &lt;strong&gt;A senior SWE reports a major regression in &lt;strong&gt;Anthropic &lt;a href=&quot;https://www.anthropic.com/claude&quot;&gt;Claude&lt;/a&gt;&lt;/strong&gt; after “Opus 4.7” versus “Opus 4.6”: slower CLI interactions (&lt;code&gt;30s&lt;/code&gt; for commits, &lt;code&gt;45min&lt;/code&gt; implementations), worse terminal/Tmux rendering on resize, loss of useful &lt;code&gt;Ctrl+O&lt;/code&gt; trace visibility, more frequent usage-limit hits, and poorer instruction adherence despite project memory/context engineering. The concrete technical failures cited include ignoring short test timeouts (&lt;code&gt;10–15s&lt;/code&gt; → &lt;code&gt;30s/60s/5min&lt;/code&gt;), auto-committing despite “never auto commit,” verbosity drift despite &lt;code&gt;/caveman&lt;/code&gt;, implementing a Rust refactor by adding &lt;code&gt;handle_input_bytes(Bytes)&lt;/code&gt; instead of changing &lt;code&gt;handle_input(&amp;#x26;[u8])&lt;/code&gt; to &lt;code&gt;Bytes&lt;/code&gt;, and deviating from an &lt;code&gt;io_uring&lt;/code&gt; cancel-safety plan by reverting toward a racy one-shot/multi-shot recv shortcut before acknowledging &lt;em&gt;“Yes deviating. Confess.”&lt;/em&gt;&lt;/strong&gt; Top comments split between agreement that losing visible reasoning makes it harder to interrupt bad loops, users cancelling Max and moving to open-source models for stability, and a dissenting experienced developer saying Claude remains productive when using disciplined &lt;code&gt;Claude.md&lt;/code&gt;/&lt;code&gt;memory.md&lt;/code&gt;, scoped plans, milestones, and avoiding excessive context loading.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A long-time software developer reports stable coding performance by using a constrained project workflow: well-maintained &lt;code&gt;Claude.md&lt;/code&gt; and &lt;code&gt;memory.md&lt;/code&gt;, a small number of skills, upfront planning, milestone-based implementation, and repeated build/test/release cycles. They argue many failures may come from poor context hygiene—either loading “29 different markdown files” as an oversized pseudo-OS or dumping the full context window into every command.&lt;/li&gt;
&lt;li&gt;One user highlights a UX/regression issue from hiding chain-of-thought-style progress: without visible “thinking,” they can no longer tell whether Claude is looping internally versus waiting on server-side latency. This makes it harder to interrupt unproductive reasoning early and diagnose whether a delay is model behavior or infrastructure-related.&lt;/li&gt;
&lt;li&gt;Several users report time-dependent quality variance, with one specifically claiming worse Claude behavior during &lt;code&gt;8am–2pm Eastern (US)&lt;/code&gt; peak usage: more corner-cutting, sloppier outputs, and “brain dead” behavior, while off-peak usage feels closer to prior quality. The implied technical concern is load-dependent degradation, potentially from capacity pressure, routing, throttling, or model/serving changes during peak demand.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1t4gfc7/turned_a_desk_lamp_into_a_claude_code_status/&quot;&gt;Turned a desk lamp into a Claude Code status indicator&lt;/a&gt;&lt;/strong&gt; (Activity: 1817): &lt;strong&gt;A Reddit user adapted the open-source &lt;a href=&quot;https://github.com/bobek-balinek/claude-lamp&quot;&gt;&lt;code&gt;bobek-balinek/claude-lamp&lt;/code&gt;&lt;/a&gt; project to turn a BLE desk lamp into a &lt;strong&gt;Claude Code status indicator&lt;/strong&gt;: Claude Code hooks invoke a Python script that sends Bluetooth Low Energy commands to set animations/colors. The lamp shows a &lt;strong&gt;blue spinning animation&lt;/strong&gt; while Claude is working, &lt;strong&gt;pink&lt;/strong&gt; when user input is required, and &lt;strong&gt;warm white&lt;/strong&gt; when idle; effects are configurable in source, and the author is considering extending the setup to &lt;strong&gt;Philips Hue&lt;/strong&gt; bulbs. The linked Reddit video was inaccessible due to a &lt;code&gt;403 Forbidden&lt;/code&gt; response.&lt;/strong&gt; Commenters mainly asked for the lamp model and discussed scaling the idea to multiple concurrent Claude Code sessions, e.g. using multiple lights or designing a better multi-session status indicator. One commenter noted the title could also imply showing Anthropic service health via &lt;a href=&quot;https://status.claude.com/&quot;&gt;&lt;code&gt;status.claude.com&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A commenter suggested extending the lamp beyond local Claude Code state to reflect &lt;strong&gt;Claude service health&lt;/strong&gt;, using Anthropic’s public status page at &lt;a href=&quot;https://status.claude.com/&quot;&gt;status.claude.com&lt;/a&gt; as the data source. This would make the indicator represent operational availability rather than just local task/session state.&lt;/li&gt;
&lt;li&gt;Another technical improvement proposed was visualizing &lt;strong&gt;remaining Claude Code usage within the rolling five-hour window&lt;/strong&gt;, e.g. lighting the lamp or “donut” proportionally to quota left. A separate comment raised the multi-session case, implying the indicator would need aggregation or per-session state handling if multiple Claude Code sessions run concurrently.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1t4atbx/warning_anthropics_gift_max_exploit_drained_800/&quot;&gt;Warning: Anthropic&apos;s &quot;Gift Max&quot; exploit drained €800+, ruined my credit, and got me banned.&lt;/a&gt;&lt;/strong&gt; (Activity: 3451): &lt;strong&gt;OP reports &lt;strong&gt;&gt;€800&lt;/strong&gt; in unauthorized Anthropic &lt;strong&gt;“Gift Max”&lt;/strong&gt; charges despite active &lt;code&gt;2FA&lt;/code&gt;; they claim &lt;code&gt;3-D Secure&lt;/code&gt; emails were received but never authorized, while gift codes were generated and immediately redeemed by a third party. They tie the incident to Anthropic’s &lt;a href=&quot;https://status.anthropic.com/&quot;&gt;status page&lt;/a&gt; entry for &lt;em&gt;“Elevated billing errors and unauthorized subscription changes”&lt;/em&gt; and GitHub issues &lt;code&gt;#51404&lt;/code&gt;/&lt;code&gt;#51168&lt;/code&gt;, then say Anthropic banned the account after receiving a police report and evidence, cutting off access to WIP chats/projects. In an update, OP says their bank treated it as fraud, issued a reclamation/refund, and will pursue Anthropic’s merchant account; they are also considering a &lt;a href=&quot;https://gdpr.eu/&quot;&gt;GDPR/DSGVO&lt;/a&gt; data request to recover data and German legal aid to repair possible &lt;a href=&quot;https://www.schufa.de/&quot;&gt;SCHUFA&lt;/a&gt; credit impacts.&lt;/strong&gt; Comments were mostly practical or skeptical: one noted that in the U.S. this would typically be handled via card chargeback, while another highlighted the irony/suspicion of a Gemini-written anti-Anthropic warning posted in a ChatGPT subreddit.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The OP reports their bank reversed the &lt;code&gt;€800+&lt;/code&gt; Anthropic-related charges as a fraud case and will pursue the merchant account directly. They also plan to file a formal GDPR/DSGVO data request to recover work-in-progress project data and seek German legal aid (&lt;em&gt;Beratungshilfeschein&lt;/em&gt;) to ensure any SCHUFA credit entries are cleared.&lt;/li&gt;
&lt;li&gt;One commenter notes seeing multiple YouTube ads from different merchants all advertising “1 year free Claude access,” suggesting a coordinated scam campaign potentially related to the reported exploit or phishing/payment-abuse pattern.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;AI Discords&lt;/h1&gt;
&lt;p&gt;Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.&lt;/p&gt;
</content:encoded><category>anthropic</category><category>spacex</category><category>x-ai</category><category>claude</category><category>claude-code</category><category>opus</category><category>colossus-1</category><category>nottombrown</category><category>_aidan_clark_</category><category>kipperrii</category><category>theamolavasare</category><category>alexalbert__</category><category>compute</category><category>rate-limiting</category><category>agent-platforms</category><category>inference</category><category>api</category><category>managed-agents</category><category>safety</category><category>governance</category><category>event</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-05-05-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-05-05-not-much/</guid><description>**OpenAI** rolled out **GPT-5.5 Instant** as the new default for ChatGPT and API, enhancing **factuality, intelligence, image understanding, and tone** with stronger personalization features like saved memories and Gmail integration. OpenAI also shared infrastructure updates on a rebuilt **WebRTC stack** for voice and real-time API, aiming to reduce latency for speech-paced conversations. Developer tools expanded with an **Agents SDK for TypeScript**, sandbox agents, and open-source harnesses, improving coding and automation workflows. Discussions highlighted the importance of **Model–Harness–Task fit** over raw model quality for agent performance, with debates on agent coding UX and benchmarks. Community sentiment praises GPT-5.5 for high-token-budget coding and non-coding tasks.</description><pubDate>Mon, 04 May 2026 05:44:39 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;a quiet day.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI News for 5/4/2026-5/5/2026. We checked 12 subreddits, &lt;a href=&quot;https://twitter.com/i/lists/1585430245762441216&quot;&gt;544 Twitters&lt;/a&gt; and no further Discords. &lt;a href=&quot;https://news.smol.ai/&quot;&gt;AINews&apos; website&lt;/a&gt; lets you search all past issues. As a reminder, &lt;a href=&quot;https://www.latent.space/p/2026&quot;&gt;AINews is now a section of Latent Space&lt;/a&gt;. You can &lt;a href=&quot;https://support.substack.com/hc/en-us/articles/8914938285204-How-do-I-subscribe-to-or-unsubscribe-from-a-section-on-Substack&quot;&gt;opt in/out&lt;/a&gt; of email frequencies!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h1&gt;AI Twitter Recap&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;OpenAI’s GPT-5.5 Instant, personalization rollout, and voice/agent infrastructure updates&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GPT-5.5 Instant becomes ChatGPT’s new default&lt;/strong&gt;: OpenAI rolled out &lt;strong&gt;GPT-5.5 Instant&lt;/strong&gt; to ChatGPT and the API as &lt;code&gt;gpt-5.5-chat-latest&lt;/code&gt;, positioning it as a broad upgrade in &lt;strong&gt;factuality, baseline intelligence, image understanding, and tone&lt;/strong&gt;. The launch also bundled stronger personalization: ChatGPT can now use &lt;strong&gt;saved memories, past chats, files, and connected Gmail&lt;/strong&gt;, while exposing &lt;strong&gt;“memory sources”&lt;/strong&gt; so users can see what context influenced a reply. See the main launch thread from &lt;a href=&quot;https://x.com/OpenAI/status/2051709028250915275&quot;&gt;@OpenAI&lt;/a&gt;, rollout details from &lt;a href=&quot;https://x.com/OpenAI/status/2051709035347694047&quot;&gt;@OpenAI&lt;/a&gt;, product commentary from &lt;a href=&quot;https://x.com/michpokrass/status/2051709536130802022&quot;&gt;@michpokrass&lt;/a&gt;, and reactions from &lt;a href=&quot;https://x.com/ericmitchellai/status/2051711459886059963&quot;&gt;@ericmitchellai&lt;/a&gt; and &lt;a href=&quot;https://x.com/sama/status/2051716909629153573&quot;&gt;@sama&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenAI also published more infra detail around real-time products&lt;/strong&gt;: &lt;a href=&quot;https://x.com/OpenAIDevs/status/2051453905343828350&quot;&gt;@OpenAIDevs&lt;/a&gt; shared a writeup on rebuilding the &lt;strong&gt;WebRTC stack&lt;/strong&gt; for ChatGPT voice and the Realtime API using a &lt;strong&gt;thin relay&lt;/strong&gt; plus a &lt;strong&gt;stateful transceiver&lt;/strong&gt; to reduce latency and keep conversations at speech pace. This fits the broader signal around an imminent voice refresh, noted by &lt;a href=&quot;https://x.com/kimmonismus/status/2051571219040735423&quot;&gt;@kimmonismus&lt;/a&gt; and &lt;a href=&quot;https://x.com/sama/status/2051464865634742334&quot;&gt;@sama&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Developer-side OpenAI agent tooling keeps expanding&lt;/strong&gt;: &lt;a href=&quot;https://x.com/OpenAIDevs/status/2051725072873001338&quot;&gt;@OpenAIDevs&lt;/a&gt; announced the &lt;strong&gt;Agents SDK for TypeScript&lt;/strong&gt;, including &lt;strong&gt;sandbox agents&lt;/strong&gt; and an &lt;strong&gt;open-source harness&lt;/strong&gt;. Separately, OpenAI continued pushing Codex UX and automation, including task progress UI highlighted by &lt;a href=&quot;https://x.com/reach_vb/status/2051655026574057593&quot;&gt;@reach_vb&lt;/a&gt; and &lt;strong&gt;Auto Review&lt;/strong&gt; for lower-friction approvals in &lt;a href=&quot;https://x.com/reach_vb/status/2051782942314078553&quot;&gt;@reach_vb&lt;/a&gt;. Community sentiment suggests 5.5 is especially strong for &lt;strong&gt;high-token-budget coding and non-coding workflows&lt;/strong&gt;, per &lt;a href=&quot;https://x.com/sama/status/2051724685231214650&quot;&gt;@sama&lt;/a&gt; and &lt;a href=&quot;https://x.com/sama/status/2051783339502375418&quot;&gt;@sama&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Coding agents, harness design, and benchmark pressure&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Harness quality is becoming a first-class differentiator&lt;/strong&gt;: A recurring theme across the day was that model quality alone no longer explains agent performance. &lt;a href=&quot;https://x.com/Vtrivedy10/status/2051451869017584112&quot;&gt;@Vtrivedy10&lt;/a&gt; argued the field is mixing incompatible assumptions about &lt;strong&gt;native post-trained harnesses&lt;/strong&gt;, &lt;strong&gt;open harnesses&lt;/strong&gt;, and “AGI-like” model generalization; the practical takeaway is that &lt;strong&gt;Model–Harness–Task fit&lt;/strong&gt; matters more than abstract benchmark narratives. A complementary post from &lt;a href=&quot;https://x.com/Vtrivedy10/status/2051674478648742002&quot;&gt;@Vtrivedy10&lt;/a&gt; emphasized that talking to base or minimally wrapped models makes clear how much productized agents depend on &lt;strong&gt;instructions, tools, context packing, and measurement loops&lt;/strong&gt;. &lt;a href=&quot;https://x.com/sydneyrunkle/status/2051637638239567953&quot;&gt;@sydneyrunkle&lt;/a&gt; pointed to a LangChain post on the “anatomy” of long-running harnesses, while &lt;a href=&quot;https://x.com/masondrxy/status/2051714091924828480&quot;&gt;@masondrxy&lt;/a&gt; argued for &lt;strong&gt;ACP-style decoupling&lt;/strong&gt; so teams can swap &lt;strong&gt;CLI/TUI/GUI/IDE&lt;/strong&gt; frontends without changing the underlying harness.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agent coding UX is fragmenting, with real disagreement on winners&lt;/strong&gt;: There were multiple anecdotal comparisons of agent shells and coding assistants. &lt;a href=&quot;https://x.com/0xSero/status/2051689733793755405&quot;&gt;@0xSero&lt;/a&gt; ranked &lt;strong&gt;Droid&lt;/strong&gt; above Pi, Amp, OpenCode, and Codex CLI. &lt;a href=&quot;https://x.com/teortaxesTex/status/2051549309707928028&quot;&gt;@teortaxesTex&lt;/a&gt; said &lt;strong&gt;Hermes&lt;/strong&gt; currently beats deepseek-tui and OpenCode on &lt;strong&gt;success rate, speed, and cost&lt;/strong&gt;, adding cache-hit details in a follow-up &lt;a href=&quot;https://x.com/teortaxesTex/status/2051551506134896976&quot;&gt;comparison&lt;/a&gt;. On the commercial side, &lt;a href=&quot;https://x.com/kimmonismus/status/2051515496567292310&quot;&gt;@kimmonismus&lt;/a&gt; cited TickerTrends data claiming &lt;strong&gt;Codex surpassed Claude Code in downloads&lt;/strong&gt; after late-April releases, while several developers reported that &lt;strong&gt;Claude Code utility feels relatively flat&lt;/strong&gt; versus last fall, e.g. &lt;a href=&quot;https://x.com/TheEthanDing/status/2051516204607578132&quot;&gt;@TheEthanDing&lt;/a&gt; and &lt;a href=&quot;https://x.com/finbarrtimbers/status/2051652067480179020&quot;&gt;@finbarrtimbers&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New coding benchmark: ProgramBench shows how far “whole-repo from scratch” still is&lt;/strong&gt;: Meta researchers introduced &lt;strong&gt;ProgramBench&lt;/strong&gt;, a 200-task benchmark asking models to generate substantial software artifacts like &lt;strong&gt;SQLite, FFmpeg, and a PHP compiler&lt;/strong&gt; from an executable spec and without starter code or internet access. &lt;a href=&quot;https://x.com/jyangballin/status/2051677497562210552&quot;&gt;@jyangballin&lt;/a&gt; presented it as an end-to-end repo generation test; &lt;a href=&quot;https://x.com/OfirPress/status/2051678633035809159&quot;&gt;@OfirPress&lt;/a&gt; summarized the headline result bluntly: &lt;strong&gt;top accuracy is 0%&lt;/strong&gt;. Discussion quickly focused on whether the headline metric is too harsh: &lt;a href=&quot;https://x.com/scaling01/status/2051733949877985349&quot;&gt;@scaling01&lt;/a&gt; noted models can still pass &lt;strong&gt;&gt;50% of tests per task on average&lt;/strong&gt;, while &lt;a href=&quot;https://x.com/OfirPress/status/2051757679283143089&quot;&gt;@OfirPress&lt;/a&gt; defended the all-tests criterion as necessary because partial implementations can game average-pass metrics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Practical coding automation keeps moving into CI/security&lt;/strong&gt;: &lt;a href=&quot;https://x.com/cursor_ai/status/2051739625958584659&quot;&gt;@cursor_ai&lt;/a&gt; launched agents that monitor GitHub and &lt;strong&gt;automatically fix CI failures&lt;/strong&gt;. &lt;a href=&quot;https://x.com/cognition/status/2051708729880416614&quot;&gt;@cognition&lt;/a&gt; introduced &lt;strong&gt;Devin for Security&lt;/strong&gt;, including claims of automated vuln remediation at enterprise scale and an example where Devin Review flagged a malicious axios release before public disclosure in &lt;a href=&quot;https://x.com/cognition/status/2051708731671331171&quot;&gt;@cognition&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Inference, systems, and efficiency: Gemma 4 drafters, SGLang/RadixArk, and provider economics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Gemma 4 gets multi-token prediction drafters across the open stack&lt;/strong&gt;: Google released &lt;strong&gt;Gemma 4 MTP drafters&lt;/strong&gt;, promising &lt;strong&gt;up to 3× faster decoding with no quality degradation&lt;/strong&gt;. The launch came through &lt;a href=&quot;https://x.com/googlegemma/status/2051713412431007808&quot;&gt;@googlegemma&lt;/a&gt;, &lt;a href=&quot;https://x.com/googledevs/status/2051700498328346945&quot;&gt;@googledevs&lt;/a&gt;, and ecosystem posts from &lt;a href=&quot;https://x.com/osanseviero/status/2051695861801820475&quot;&gt;@osanseviero&lt;/a&gt;, &lt;a href=&quot;https://x.com/mervenoyann/status/2051702372339003841&quot;&gt;@mervenoyann&lt;/a&gt;, and &lt;a href=&quot;https://x.com/_philschmid/status/2051752856319926475&quot;&gt;@_philschmid&lt;/a&gt;. The key engineering detail is that this is &lt;strong&gt;speculative-style decoding integrated into open tooling&lt;/strong&gt;, with day-0 or near-day-0 support in &lt;strong&gt;Transformers, vLLM, MLX, SGLang, Ollama, and AI Edge&lt;/strong&gt;. &lt;a href=&quot;https://x.com/vllm_project/status/2051744111116574950&quot;&gt;@vllm_project&lt;/a&gt; specifically announced a ready Docker image for Gemma 4 on vLLM.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RadixArk raises a massive seed around SGLang + Miles&lt;/strong&gt;: One of the bigger infra financings was &lt;strong&gt;RadixArk’s $100M seed&lt;/strong&gt;, built around the &lt;strong&gt;SGLang&lt;/strong&gt; inference stack and &lt;strong&gt;Miles&lt;/strong&gt; for large-scale RL/post-training. &lt;a href=&quot;https://x.com/BanghuaZ/status/2051650922892476904&quot;&gt;@BanghuaZ&lt;/a&gt; framed the company as spanning inference, training, RL, orchestration, kernels, and multi-hardware systems; &lt;a href=&quot;https://x.com/Arpan_Shah_/status/2051651802484150278&quot;&gt;@Arpan_Shah_&lt;/a&gt; and &lt;a href=&quot;https://x.com/GenAI_is_real/status/2051703162722263180&quot;&gt;@GenAI_is_real&lt;/a&gt; emphasized the goal of making frontier-grade infrastructure &lt;strong&gt;open and production-grade&lt;/strong&gt;, rather than forcing every team to rebuild scheduling, KV-cache management, and rollout systems from scratch. Community endorsements came from &lt;a href=&quot;https://x.com/ibab/status/2051690211873308892&quot;&gt;@ibab&lt;/a&gt; and &lt;a href=&quot;https://x.com/multiply_matrix/status/2051698056316526651&quot;&gt;@multiply_matrix&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inference economics are now highly provider-specific&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2051735255044997215&quot;&gt;@ArtificialAnlys&lt;/a&gt; compared &lt;strong&gt;MiniMax-M2.7&lt;/strong&gt; across six providers and found major differences in &lt;strong&gt;tokens/sec, cache discounting, and blended cost&lt;/strong&gt;. &lt;strong&gt;SambaNova&lt;/strong&gt; led raw speed at &lt;strong&gt;435 output tok/s&lt;/strong&gt;, while &lt;strong&gt;Fireworks&lt;/strong&gt; looked stronger on the speed/price frontier for many workloads. Separately, &lt;a href=&quot;https://x.com/teortaxesTex/status/2051525774851682409&quot;&gt;@teortaxesTex&lt;/a&gt; highlighted how &lt;strong&gt;cache-hit rates&lt;/strong&gt; dominate cost on some agent workloads, calling cache optimization “the main axis of cost reduction with V4.”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cold-start and distributed training remain active systems bottlenecks&lt;/strong&gt;: &lt;a href=&quot;https://x.com/kamilsindi/status/2051674592750494094&quot;&gt;@kamilsindi&lt;/a&gt; described a system that cut model cold starts &lt;strong&gt;60×&lt;/strong&gt;, from minutes to seconds, by serving weights from &lt;strong&gt;GPUs already holding them&lt;/strong&gt; rather than cloud storage. On the training side, &lt;a href=&quot;https://x.com/dl_weekly/status/2051693914868871205&quot;&gt;@dl_weekly&lt;/a&gt; highlighted Google DeepMind’s &lt;strong&gt;Decoupled DiLoCo&lt;/strong&gt;, which reportedly achieved &lt;strong&gt;88% goodput vs. 27%&lt;/strong&gt; for standard data parallel at scale while using ~&lt;strong&gt;240× less inter-datacenter bandwidth&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Agents, RL environments, observability, and long-horizon research&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;RL infra is shifting from “single generation + reward” to long-running action systems&lt;/strong&gt;: &lt;a href=&quot;https://x.com/adithya_s_k/status/2051660068471603352&quot;&gt;@adithya_s_k&lt;/a&gt; released a guide comparing &lt;strong&gt;RL environment frameworks&lt;/strong&gt; for the LLM era, focusing on what scales to &lt;strong&gt;thousands of environments&lt;/strong&gt;. A detailed survey by &lt;a href=&quot;https://x.com/ZhihuFrontier/status/2051691071634301064&quot;&gt;@ZhihuFrontier&lt;/a&gt; contrasted traditional RLVR with &lt;strong&gt;agentic RL&lt;/strong&gt;, pointing to systems such as &lt;strong&gt;Forge, ROLL, Slime, and Seer&lt;/strong&gt; and recurring concerns like &lt;strong&gt;TITO consistency&lt;/strong&gt;, rollout latency, prefix-tree merging, and global KV caches.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Long-horizon failures are increasingly framed as horizon problems, not just capacity problems&lt;/strong&gt;: &lt;a href=&quot;https://x.com/dair_ai/status/2051679862788878354&quot;&gt;@dair_ai&lt;/a&gt; summarized a Microsoft Research paper arguing that &lt;strong&gt;goal horizon alone can be the training bottleneck&lt;/strong&gt;, with &lt;strong&gt;macro actions / horizon reduction&lt;/strong&gt; stabilizing training and improving long-horizon generalization. This rhymes with broader frustration that current benchmarks and public evals still underweight true long-horizon behavior.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Observability is maturing into a feedback-driven improvement loop&lt;/strong&gt;: &lt;a href=&quot;https://x.com/hwchase17/status/2051708980435853513&quot;&gt;@hwchase17&lt;/a&gt; and &lt;a href=&quot;https://x.com/LangChain/status/2051709642716135729&quot;&gt;@LangChain&lt;/a&gt; argued that traces alone are insufficient; the key is attaching &lt;strong&gt;direct, indirect, or generated feedback&lt;/strong&gt; so observability becomes a &lt;strong&gt;learning system&lt;/strong&gt;. &lt;a href=&quot;https://x.com/benhylak/status/2051727888639250450&quot;&gt;@benhylak&lt;/a&gt; launched &lt;strong&gt;Raindrop Triage&lt;/strong&gt;, an agent dedicated to finding and investigating bad agent behavior. &lt;a href=&quot;https://x.com/Vtrivedy10/status/2051727418134593632&quot;&gt;@Vtrivedy10&lt;/a&gt; laid out the practical loop explicitly: &lt;strong&gt;gather data → mine errors → localize which component failed → apply fix → test → repeat&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Enterprise verticalization: finance, legal, and proactive assistants&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Anthropic and Perplexity both pushed hard into finance workflows&lt;/strong&gt;: Anthropic launched &lt;strong&gt;financial-services agent templates&lt;/strong&gt; for work such as &lt;strong&gt;pitch generation, valuation review, KYC screening, and month-end close&lt;/strong&gt;, with integrations into providers like &lt;strong&gt;FactSet, S&amp;#x26;P Global, and Morningstar&lt;/strong&gt;, via &lt;a href=&quot;https://x.com/claudeai/status/2051679629488865498&quot;&gt;@claudeai&lt;/a&gt; and summarized by &lt;a href=&quot;https://x.com/kimmonismus/status/2051681279582540114&quot;&gt;@kimmonismus&lt;/a&gt;. Perplexity announced &lt;strong&gt;Perplexity Computer for Professional Finance&lt;/strong&gt;, bringing in &lt;strong&gt;licensed data&lt;/strong&gt; and &lt;strong&gt;35 dedicated workflows&lt;/strong&gt; for repeat analyst work, in &lt;a href=&quot;https://x.com/perplexity_ai/status/2051693893473935372&quot;&gt;@perplexity_ai&lt;/a&gt; and &lt;a href=&quot;https://x.com/AravSrinivas/status/2051694381137350661&quot;&gt;@AravSrinivas&lt;/a&gt;. Both launches reflect a clearer move from generic copilots to &lt;strong&gt;workflow-packaged vertical products&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Perplexity also expanded into medical/professional health sources&lt;/strong&gt;: &lt;a href=&quot;https://x.com/perplexity_ai/status/2051710342242480538&quot;&gt;@perplexity_ai&lt;/a&gt; announced premium access to &lt;strong&gt;NEJM, BMJ&lt;/strong&gt;, and additional medical journals/databases, enabling “deep and wide research” on trusted clinical sources; &lt;a href=&quot;https://x.com/AravSrinivas/status/2051711236224761983&quot;&gt;@AravSrinivas&lt;/a&gt; framed this as a product for healthcare-grade information retrieval.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Proactive assistant surfaces are becoming a product category&lt;/strong&gt;: &lt;a href=&quot;https://x.com/kimmonismus/status/2051618156385366305&quot;&gt;@kimmonismus&lt;/a&gt; reported a leak around &lt;strong&gt;Anthropic Orbit&lt;/strong&gt;, described as a proactive assistant that synthesizes data from &lt;strong&gt;Gmail, Slack, GitHub, Calendar, Drive, and Figma&lt;/strong&gt; without explicit prompting. Manus also added &lt;strong&gt;recommended connectors&lt;/strong&gt; that are suggested in context when needed, per &lt;a href=&quot;https://x.com/ManusAI/status/2051681463389610209&quot;&gt;@ManusAI&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Top tweets (by engagement)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Anthropic’s finance template launch&lt;/strong&gt; drew outsized attention: &lt;a href=&quot;https://x.com/claudeai/status/2051679629488865498&quot;&gt;@claudeai&lt;/a&gt; announced ready-to-run Claude agent templates for financial services and drew &lt;strong&gt;22.9K engagement&lt;/strong&gt;, making it one of the most-engaged technical/AI-product posts in the set.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenAI’s GPT-5.5 Instant launch&lt;/strong&gt; dominated discussion: the main rollout thread from &lt;a href=&quot;https://x.com/OpenAI/status/2051709028250915275&quot;&gt;@OpenAI&lt;/a&gt; exceeded &lt;strong&gt;8.2K engagement&lt;/strong&gt;, with follow-on personalization details also performing strongly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gemma 4 speedups landed as a major open-model systems update&lt;/strong&gt;: &lt;a href=&quot;https://x.com/googledevs/status/2051700498328346945&quot;&gt;@googledevs&lt;/a&gt; on &lt;strong&gt;3× faster Gemma 4&lt;/strong&gt; and &lt;a href=&quot;https://x.com/googlegemma/status/2051713412431007808&quot;&gt;@googlegemma&lt;/a&gt; both broke through, reflecting strong interest in inference improvements that preserve quality.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Perplexity’s finance launch&lt;/strong&gt; also resonated broadly: &lt;a href=&quot;https://x.com/perplexity_ai/status/2051693893473935372&quot;&gt;@perplexity_ai&lt;/a&gt; reached &lt;strong&gt;2.5K engagement&lt;/strong&gt;, suggesting that &lt;strong&gt;licensed-data workflow products&lt;/strong&gt; are now seen as strategically important, not just niche enterprise packaging.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;AI Reddit Recap&lt;/h1&gt;
&lt;h2&gt;/r/LocalLlama + /r/localLLM Recap&lt;/h2&gt;
&lt;h3&gt;1. Gemma 4 MTP and llama.cpp Speculative Decoding&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t4jq6h/gemma_4_mtp_released/&quot;&gt;Gemma 4 MTP released&lt;/a&gt;&lt;/strong&gt; (Activity: 1116): &lt;strong&gt;&lt;strong&gt;Google released Multi-Token Prediction (MTP) drafter checkpoints for Gemma 4&lt;/strong&gt;, with Hugging Face model cards for &lt;a href=&quot;https://huggingface.co/google/gemma-4-31B-it-assistant&quot;&gt;&lt;code&gt;gemma-4-31B-it-assistant&lt;/code&gt;&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/google/gemma-4-26B-A4B-it-assistant&quot;&gt;&lt;code&gt;gemma-4-26B-A4B-it-assistant&lt;/code&gt;&lt;/a&gt;, &lt;a href=&quot;https://huggingface.co/google/gemma-4-E4B-it-assistant&quot;&gt;&lt;code&gt;gemma-4-E4B-it-assistant&lt;/code&gt;&lt;/a&gt;, and &lt;a href=&quot;https://huggingface.co/google/gemma-4-E2B-it-assistant&quot;&gt;&lt;code&gt;gemma-4-E2B-it-assistant&lt;/code&gt;&lt;/a&gt;, described in Google’s &lt;a href=&quot;https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/&quot;&gt;blog post&lt;/a&gt;. The MTP setup adds a smaller/faster draft model for &lt;strong&gt;speculative decoding&lt;/strong&gt;, where several draft tokens are proposed and then verified in parallel by the target model, claiming &lt;em&gt;“up to 2x”&lt;/em&gt; decoding speedups while preserving identical output quality versus standard generation; one commenter notes the &lt;strong&gt;E2B drafter is only &lt;code&gt;78M&lt;/code&gt; parameters&lt;/strong&gt;. A technical commenter also shared an updated visual explainer of MTP/speculative decoding for Gemma 4: &lt;a href=&quot;https://newsletter.maartengrootendorst.com/i/193064129/multi-token-prediction-mtp-with-gemma-4&quot;&gt;Maarten Grootendorst’s guide&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A commenter linked a technical visual guide explaining &lt;strong&gt;multi-token prediction (MTP) with Gemma 4&lt;/strong&gt;, including implementation snippets and diagrams: &lt;a href=&quot;https://newsletter.maartengrootendorst.com/i/193064129/multi-token-prediction-mtp-with-gemma-4&quot;&gt;Maarten Grootendorst’s guide&lt;/a&gt;. This is the main substantive resource in the thread for understanding how Gemma’s MTP-style decoding/drafting works.&lt;/li&gt;
&lt;li&gt;One commenter noted that the &lt;strong&gt;E2B model’s drafter is only &lt;code&gt;78M&lt;/code&gt; parameters&lt;/strong&gt;, an unusually compact auxiliary model for speculative or multi-token drafting. Drafter size matters for the latency/throughput tradeoff in MTP-style inference, since the drafting cost is paid on every decoding step.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t3guzw/llamacpp_mtp_support_now_in_beta/&quot;&gt;Llama.cpp MTP support now in beta!&lt;/a&gt;&lt;/strong&gt; (Activity: 1103): &lt;strong&gt;&lt;code&gt;llama.cpp&lt;/code&gt; has beta MTP (&lt;strong&gt;Multi-Token Prediction&lt;/strong&gt;) support via &lt;a href=&quot;https://github.com/ggml-org/llama.cpp/pull/22673&quot;&gt;PR #22673&lt;/a&gt;, initially targeting &lt;strong&gt;Qwen3.x MTP&lt;/strong&gt; models and loading the MTP component as a separate model from the same GGUF, with its own context/KV cache rather than a separate GGUF artifact. The PR adds post-&lt;code&gt;ubatch&lt;/code&gt; MTP consumption to propagate hidden features correctly across ubatches and a small speculative decoding path depending on partial &lt;code&gt;seq_rm&lt;/code&gt; support; reported Qwen3.6 27B / 35B-A3B tests show ~&lt;code&gt;75%&lt;/code&gt; steady-state acceptance with &lt;code&gt;3&lt;/code&gt; draft tokens and usually &lt;strong&gt;&gt;2× token-generation throughput&lt;/strong&gt; over baseline.&lt;/strong&gt; Commenters view this as potentially one of the largest &lt;code&gt;llama.cpp&lt;/code&gt; performance improvements to date, especially for dense models, and expect it to narrow token-generation speed gaps with vLLM alongside tensor parallelism. There is demand for a technical comparison of speculative decoding methods—MTP, EAGLE-3, DFlash, DTree, n-gram—covering draft-model requirements, context reuse, and model suitability.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Commenters frame &lt;strong&gt;MTP / multi-token prediction&lt;/strong&gt; as potentially a major llama.cpp throughput improvement, especially for &lt;strong&gt;dense models&lt;/strong&gt;, while expecting less benefit for &lt;strong&gt;MoE&lt;/strong&gt; architectures. There is interest in comparing it against other speculative decoding approaches such as &lt;strong&gt;EAGLE-3&lt;/strong&gt;, &lt;strong&gt;DFlash&lt;/strong&gt;, &lt;strong&gt;DTree&lt;/strong&gt;, and &lt;code&gt;ngram&lt;/code&gt;, particularly around whether they require separate draft models and how well they reuse existing context.&lt;/li&gt;
&lt;li&gt;One tester reported llama.cpp’s beta MTP support is &lt;em&gt;“way faster than ik_llama.cpp implementation currently”&lt;/em&gt; in quick local testing. They linked a GGUF surgery script that extracts the MTP layer from &lt;strong&gt;am17an’s Q8_0 model&lt;/strong&gt; and injects it into an existing &lt;strong&gt;Qwen 3.6 27B GGUF&lt;/strong&gt;: &lt;a href=&quot;https://gist.github.com/buzz/1c439684d5e3f36492ae9f64ef7e3f67&quot;&gt;gist.github.com/buzz/1c439684d5e3f36492ae9f64ef7e3f67&lt;/a&gt;, reportedly working with &lt;strong&gt;Bartowski’s Q6_K&lt;/strong&gt; quantization.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
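&lt;p&gt;As a rough sanity check on the figures above, the sketch below estimates how many tokens one verification pass yields for a given acceptance rate and draft length. It assumes each draft token is accepted independently with the same probability, which is a simplification and not how the PR defines or measures its ~&lt;code&gt;75%&lt;/code&gt; figure. The function name is illustrative.&lt;/p&gt;

```python
def expected_tokens_per_step(acceptance: float, draft_tokens: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumption (not taken from the PR): every draft token is accepted
    independently with probability `acceptance`. Draft i only counts if all
    earlier drafts were accepted, and the target model always contributes one
    token of its own (a correction after a rejection, or a bonus token when
    every draft passes), so the expectation is 1 + a + a^2 + ... + a^k.
    """
    if acceptance > 1.0 or 0.0 > acceptance:
        raise ValueError("acceptance must be in [0, 1]")
    if 0 > draft_tokens:
        raise ValueError("draft_tokens must be non-negative")
    return sum(acceptance ** i for i in range(draft_tokens + 1))


# Reported test settings: roughly 75% acceptance with 3 draft tokens.
tokens = expected_tokens_per_step(0.75, 3)
print(tokens)
```

&lt;p&gt;Under these assumptions the estimate is about 2.73 tokens per verification pass. That is an upper bound on speedup, since it ignores the cost of running the drafter and any verification overhead, so it is at least consistent with the reported throughput gain of more than 2×.&lt;/p&gt;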
&lt;h3&gt;2. Lower-Cost Frontier Alternatives for Agents and Coding&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLM/comments/1t3pjkn/qwen3627b_is_the_first_local_model_that_actually/&quot;&gt;Qwen3.6:27b is the first local model that actually holds up against Claude Code for me&lt;/a&gt;&lt;/strong&gt; (Activity: 606): &lt;strong&gt;The post claims &lt;strong&gt;Qwen3.6:27B&lt;/strong&gt; is the first local open-weight coding model that feels practically usable versus &lt;strong&gt;Claude Code&lt;/strong&gt;, handling scaffolding, refactors, test generation, and few-file debugging locally, while still deferring harder multi-file architecture work to Claude. The author reports that &lt;code&gt;opencode&lt;/code&gt;-style CLI agent setup required significantly more tuning than Claude Code’s out-of-the-box tool/context orchestration, raising the question of how much Claude Code quality comes from the model itself versus agentic scaffolding. A commenter reports running &lt;strong&gt;Qwen 3.6 35B&lt;/strong&gt; on an &lt;strong&gt;RTX 5080&lt;/strong&gt; with GPU/CPU layer splitting at roughly &lt;code&gt;70 tokens/s&lt;/code&gt;, while another says &lt;strong&gt;27B dense&lt;/strong&gt; is useful for cheaper/lightweight work but still behind &lt;strong&gt;Sonnet 4.6 / Opus 4.7&lt;/strong&gt; for one-shot coding wins.&lt;/strong&gt; Commenters debated pricing dynamics: one argued that viable local models should force cloud prices down via competition, countering the post’s concern about future high-priced Claude Code tiers. Others cautioned against overhyping Qwen, noting tool-calling loops and that frontier Claude models remain materially stronger for fast, high-confidence coding tasks.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several users report that &lt;strong&gt;Qwen3.6 27B/35B is finally useful locally&lt;/strong&gt;, but still below frontier coding models for harder tasks. One commenter runs &lt;strong&gt;Qwen 3.6 35B on an RTX 5080&lt;/strong&gt; by splitting layers across GPU/CPU, with most layers on GPU, reaching approximately &lt;code&gt;70 tokens/s&lt;/code&gt;; another uses &lt;strong&gt;27B dense on an RTX Pro 6000 Blackwell&lt;/strong&gt; but still prefers &lt;strong&gt;Claude Sonnet 4.6 / Opus 4.7&lt;/strong&gt; for one-shot or high-confidence coding work.&lt;/li&gt;
&lt;li&gt;A recurring implementation issue is &lt;strong&gt;tool-calling instability&lt;/strong&gt;, with Qwen reportedly getting stuck in loops despite parameter/configuration tuning. Another user notes &lt;strong&gt;27B struggles at a &lt;code&gt;32k&lt;/code&gt; context window on an M4 Pro with &lt;code&gt;24GB&lt;/code&gt; of unified memory&lt;/strong&gt;, leading them to fall back to the &lt;strong&gt;Qwen 9B&lt;/strong&gt; variant for practical use.&lt;/li&gt;
&lt;li&gt;One detailed coding-task comparison found Qwen much slower and more error-prone than Claude models: &lt;strong&gt;Qwen took about &lt;code&gt;6 hours&lt;/code&gt; to fix &lt;code&gt;47&lt;/code&gt; test failures one or two at a time&lt;/strong&gt;, while &lt;strong&gt;Opus completed the same task in &lt;code&gt;20 minutes&lt;/code&gt;&lt;/strong&gt; and Sonnet in under &lt;code&gt;30 minutes&lt;/code&gt;. The user also described a semantic failure where Qwen misdiagnosed a CSV header/import issue as cross-library CSV incompatibility, then disabled CSV import functionality and degraded product behavior instead of applying the simpler fix.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t47qbw/deepseek_v4_pro_matches_gpt52_on_foodtruck_bench/&quot;&gt;DeepSeek V4 Pro matches GPT-5.2 on FoodTruck Bench, our agentic benchmark — 10 weeks later, ~17× cheaper&lt;/a&gt;&lt;/strong&gt; (Activity: 431): &lt;strong&gt;The &lt;a href=&quot;https://i.redd.it/fx89f3w5n9zg1.png&quot;&gt;image&lt;/a&gt; is a &lt;strong&gt;FoodTruck Bench&lt;/strong&gt; leaderboard screenshot showing &lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt; highlighted at rank &lt;code&gt;#4&lt;/code&gt;, with &lt;code&gt;$27,142&lt;/code&gt; 30-day net worth, &lt;code&gt;1257%&lt;/code&gt; ROI, and &lt;code&gt;51%&lt;/code&gt; margin, very close to &lt;strong&gt;GPT-5.2&lt;/strong&gt; at &lt;code&gt;$28,081&lt;/code&gt;. In the post’s context, this supports the claim that DeepSeek reached near-GPT-5.2 agentic performance about &lt;code&gt;10 weeks&lt;/code&gt; later at a claimed &lt;strong&gt;~17× lower cost&lt;/strong&gt; for the same workload, with &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; still far ahead at &lt;code&gt;$49,519&lt;/code&gt;. The benchmark is framed as a persistent-memory, tool-using agent simulation with &lt;code&gt;34&lt;/code&gt; tools for food-truck operations.&lt;/strong&gt; Commenters were impressed but skeptical of the broader framing: one noted &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; appears to be pulling away with roughly &lt;code&gt;1.7×&lt;/code&gt; the profit of the next group, while another questioned why &lt;strong&gt;Gemma 4 31B&lt;/strong&gt; is under-discussed if it beats Sonnet 4.6 on this benchmark and performs well on EQBench.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several commenters focused on &lt;strong&gt;model-ranking anomalies and coverage gaps&lt;/strong&gt; in FoodTruck Bench: &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; was described as achieving roughly &lt;code&gt;1.7×&lt;/code&gt; higher profit than the next group of models, while users asked why newer &lt;strong&gt;GPT-5.4/5.5&lt;/strong&gt; models were absent from the comparison.&lt;/li&gt;
&lt;li&gt;Multiple users flagged &lt;strong&gt;Gemma 4 31B&lt;/strong&gt; as unexpectedly strong, noting that it appears in the &lt;strong&gt;top 5&lt;/strong&gt; on FoodTruck Bench, where it beats &lt;strong&gt;Sonnet 4.6&lt;/strong&gt;, and that it reportedly also performs well on &lt;strong&gt;EQBench&lt;/strong&gt;. Commenters suggested this makes it harder to interpret claims around &lt;strong&gt;DeepSeek&lt;/strong&gt;, &lt;strong&gt;Xiaomi&lt;/strong&gt;, or the benchmark itself without deeper analysis of why Gemma scores so well.&lt;/li&gt;
&lt;li&gt;There were concrete benchmark-improvement requests: create a &lt;strong&gt;FoodTruck Bench v2&lt;/strong&gt; with higher-fidelity simulation, more real-world variables, and more engineered scenario design. Users also requested adding recent &lt;strong&gt;Qwen3.6&lt;/strong&gt; models, specifically &lt;strong&gt;Qwen 3.6 27B&lt;/strong&gt;, to better compare current open-weight model families.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Less Technical AI Subreddit Recap&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;1. AI Coding vs Production Software Work&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1t3bk3x/vibe_coding_vs_production_reality/&quot;&gt;Vibe Coding vs. Production reality&lt;/a&gt;&lt;/strong&gt; (Activity: 3549): &lt;strong&gt;The image is an iceberg-style infographic, &lt;a href=&quot;https://i.redd.it/8y4uvb0ry2zg1.jpeg&quot;&gt;&lt;strong&gt;“Vibe Coding vs. Production Reality”&lt;/strong&gt;&lt;/a&gt;, contrasting fast AI-assisted MVP/PoC generation with the much larger hidden engineering surface required for production: &lt;code&gt;auth&lt;/code&gt;, secrets management, GDPR/data handling, audit logs, rate limiting, multi-tenancy, CI/CD, logging, incident response, testing, support, and vendor/model lifecycle risk. In context, the post argues that while “vibe coding” can compress the &lt;code&gt;80/20&lt;/code&gt; prototype phase from days to hours, shipping asset management, GRC, or internal RAG systems still fails without production-grade operational, security, and compliance work.&lt;/strong&gt; Comments push back that production has also become easier with modern platforms and AI, but only if the builder understands the domain; others argue scope matters—e.g. a simple Supabase-backed app may be fine, while business-critical or high-scale systems still require serious engineering discipline.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several commenters argued that &lt;strong&gt;AI-assisted “vibe coding” lowers the barrier to building an MVP&lt;/strong&gt;, but does not remove production requirements such as reliability, deployment, security hardening, observability, maintenance, and operational ownership. The core technical distinction raised was that generating code is only one part of shipping a production product.&lt;/li&gt;
&lt;li&gt;One technical nuance was around &lt;strong&gt;scope and scale&lt;/strong&gt;: a simple web app backed by managed services like &lt;strong&gt;Supabase&lt;/strong&gt; can offload major production concerns such as authentication, database hosting, and backend APIs. However, commenters noted that once the application becomes business-critical or needs to scale beyond early users, deeper engineering expertise is still required.&lt;/li&gt;
&lt;li&gt;A commenter cautioned against premature over-engineering, noting that it is a fallacy to architect for &lt;em&gt;“tens of thousands of users while you have a hundred.”&lt;/em&gt; The implied technical recommendation is to match architecture, hardening, and scalability work to actual usage and risk rather than designing for hypothetical production scale upfront.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeCode/comments/1t3yqbo/sr_software_engineer_havent_written_a_line_of/&quot;&gt;Sr Software Engineer - Haven&apos;t written a line of code in months&lt;/a&gt;&lt;/strong&gt; (Activity: 2369): &lt;strong&gt;A senior engineer at a ~&lt;code&gt;100+&lt;/code&gt; person startup claims they now primarily “drive intent” with &lt;strong&gt;Claude/Codex/Perplexity&lt;/strong&gt; rather than hand-writing code, arguing AI has shifted the value of senior engineers toward system design, UX, architecture, and technology tradeoff decisions rather than language/framework specialization. They also suggest interviewing should emphasize system design and tool/technology selection over language expertise, because &lt;em&gt;“Claude is better than the majority of dev teams at writing and maintaining code”&lt;/em&gt;—while acknowledging this depends on prior engineering experience.&lt;/strong&gt; Top commenters split between agreement and strong caution: one &lt;code&gt;10 YOE&lt;/code&gt; engineer reports the same shift, while a lead developer says they are currently rescuing a low-quality AI-heavy project built by senior engineers who claimed to “review all the code,” warning of confirmation bias, reliability issues, hotfix churn, and possible skill atrophy. Another &lt;code&gt;22 YOE&lt;/code&gt; commenter says they use AI extensively but still intentionally write code daily to avoid losing implementation skill.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A lead developer reported inheriting a project built by senior engineers who largely stopped coding and only “reviewed all the code”; despite receiving praise during development, the product allegedly suffered from poor &lt;strong&gt;quality and reliability&lt;/strong&gt;, leading to market issues, constant hotfixes, and support escalations. They argue that excessive reliance on AI-assisted development can create hidden technical debt that becomes visible only after release, requiring a team using &lt;em&gt;some&lt;/em&gt; AI to “untangle the mess.”&lt;/li&gt;
&lt;li&gt;Several experienced engineers distinguished between using AI heavily and fully delegating implementation: one with &lt;code&gt;22 years&lt;/code&gt; of experience said they still deliberately write code daily to avoid skill atrophy, while another commenter warned that coding-interview readiness, e.g. LeetCode-style tasks, may degrade if engineers stop manually implementing solutions.&lt;/li&gt;
&lt;li&gt;One commenter with &lt;code&gt;20 years&lt;/code&gt; of experience described a team where &lt;strong&gt;AI writes 100% of production code&lt;/strong&gt;, while humans still perform PR review and architectural/problem-solving work. In that workflow, the main throughput constraint has shifted from code production to &lt;strong&gt;human review capacity&lt;/strong&gt;, suggesting review quality and reviewer bandwidth become critical bottlenecks in AI-heavy engineering processes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1t3xs80/anthropic_ai_will_fully_replace_software/&quot;&gt;Anthropic: AI will fully replace software engineering by 2027. Also Anthropic: Currently hiring for 122 SWE openings.&lt;/a&gt;&lt;/strong&gt; (Activity: 1531): &lt;strong&gt;The &lt;a href=&quot;https://i.redd.it/n9tcmeswa7zg1.png&quot;&gt;image&lt;/a&gt; is a &lt;strong&gt;meme-style infographic&lt;/strong&gt;, not a technical benchmark, contrasting &lt;strong&gt;Dario Amodei/Anthropic’s public claims&lt;/strong&gt; that coding or software engineering may be heavily automated by ~2027 with a chart alleging Anthropic has &lt;code&gt;122&lt;/code&gt; open SWE roles and a &lt;code&gt;184%&lt;/code&gt; increase since Jan 2025. The post argues this hiring trend conflicts with “AI will replace software engineers end-to-end” messaging, while noting broader signals such as Amazon intern hiring, NVIDIA’s compute-cost framing, SaaS reliability issues, and lack of clear large-scale AI productivity gains.&lt;/strong&gt; Commenters split between seeing the hiring as compatible with Anthropic’s prediction—engineers may shift into monitoring, integration, and bottleneck-resolution roles—and arguing that &lt;code&gt;122&lt;/code&gt; engineers is small for a company with a claimed &lt;code&gt;$30B&lt;/code&gt; run rate. Others suggested the constant anxiety and debate in coding subreddits is itself evidence that AI displacement is being taken seriously.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One technical framing argued that &lt;strong&gt;“replace software engineering” may mean replacing direct coding labor rather than eliminating the SWE role entirely&lt;/strong&gt;: engineers could shift toward monitoring AI-generated outputs, resolving bottlenecks, reviewing failures, and managing systems built by models. Under this interpretation, Anthropic hiring SWEs is not inconsistent with predicting a fundamentally different engineering workflow by 2027.&lt;/li&gt;
&lt;li&gt;A commenter noted that &lt;strong&gt;&lt;code&gt;122&lt;/code&gt; SWE openings is a small number for a company with a claimed &lt;code&gt;$30B&lt;/code&gt; run rate&lt;/strong&gt;, implying Anthropic can simultaneously predict automation and still need a relatively small engineering staff for model/product infrastructure. Another argued that hiring engineers now is a rational acceleration strategy if model capability improvement depends on more engineering plus compute investment.&lt;/li&gt;
&lt;li&gt;A business/market-structure critique suggested Anthropic’s replacement claims may function partly as &lt;strong&gt;enterprise-sales and venture-capital signaling&lt;/strong&gt;: if customers and investors believe AI can replace a large fraction of white-collar engineering labor, the company’s valuation and adoption prospects improve. This frames the 2027 claim less as a purely technical forecast and more as hype tied to fundraising and enterprise demand generation.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. AI Account and Agent Exploit Incidents&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1t4atbx/warning_anthropics_gift_max_exploit_drained_800/&quot;&gt;Warning: Anthropic&apos;s &quot;Gift Max&quot; exploit drained €800+, ruined my credit, and got me banned.&lt;/a&gt;&lt;/strong&gt; (Activity: 2536): &lt;strong&gt;A German data science student claims their &lt;strong&gt;Anthropic/Claude account with 2FA enabled&lt;/strong&gt; incurred &lt;code&gt;€800+&lt;/code&gt; in unauthorized “Gift Max” charges on Apr 27, allegedly with &lt;strong&gt;3-D Secure not completed&lt;/strong&gt;, gift codes generated/redeemed by a third party, and contemporaneous Anthropic billing issues cited via the &lt;a href=&quot;https://status.anthropic.com/&quot;&gt;Anthropic status page&lt;/a&gt; plus GitHub issues &lt;code&gt;#51404&lt;/code&gt;/&lt;code&gt;#51168&lt;/code&gt;. After submitting a police report (&lt;em&gt;Strafanzeige&lt;/em&gt;) and evidence, they say Anthropic &lt;strong&gt;banned the account instead of refunding&lt;/strong&gt;, cutting off access to WIP projects/chats; a later update says the bank processed the case as fraud, issued a reclamation/refund, and will pursue Anthropic’s merchant account, while the user plans a GDPR/DSGVO data request and German legal aid (&lt;em&gt;Beratungshilfeschein&lt;/em&gt;) to address SCHUFA damage.&lt;/strong&gt; Commenters focused less on the exploit mechanics and more on payment-dispute process differences: one compared Germany with the U.S. chargeback model, while another noted the irony of a Gemini-assisted post criticizing Anthropic in a ChatGPT-related subreddit.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The OP reports their bank treated the unauthorized Anthropic charges as &lt;strong&gt;fraud&lt;/strong&gt;, issued a reclamation/chargeback, and refunded the &lt;code&gt;€800+&lt;/code&gt;. They also plan to file a &lt;strong&gt;GDPR/DSGVO data access request&lt;/strong&gt; to recover work-in-progress projects and pursue German legal aid (&lt;em&gt;Beratungshilfeschein&lt;/em&gt;) to clear any negative &lt;strong&gt;SCHUFA&lt;/strong&gt; credit entries.&lt;/li&gt;
&lt;li&gt;One commenter reports seeing multiple &lt;strong&gt;YouTube ads&lt;/strong&gt; from different merchants all promoting the same “1 year free Claude access” offer, suggesting a coordinated phishing or scam-ad campaign rather than an isolated billing issue. This is relevant as a potential acquisition vector for the alleged “Gift Max” exploit or fake Claude subscription flow.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1t3hw53/a_twitter_user_tricked_grok_to_send_200k_usd_to/&quot;&gt;A Twitter user tricked Grok to send 200k USD to him and it worked&lt;/a&gt;&lt;/strong&gt; (Activity: 2394): &lt;strong&gt;The post claims a Twitter/X user extracted roughly &lt;strong&gt;&lt;code&gt;$200k&lt;/code&gt;&lt;/strong&gt; by prompting &lt;strong&gt;Grok&lt;/strong&gt; to produce a command that was then acted on by &lt;strong&gt;Bankrbot&lt;/strong&gt;, rather than Grok directly controlling or sending crypto from a wallet; commenters cite X Community Notes saying &lt;em&gt;“Grok didn’t send anyone anything”&lt;/em&gt; and that the failure lay in the agent/bot command-execution path. The described exploit chain is: Bankrbot allegedly created a crypto token by accident, fees accrued to a wallet attributed to Grok, and an attacker induced Grok to instruct Bankrbot to transfer those funds elsewhere. The original Reddit gallery could not be accessed (&lt;code&gt;403 Forbidden&lt;/code&gt;; &lt;a href=&quot;https://www.reddit.com/gallery/1t3hw53&quot;&gt;Reddit gallery&lt;/a&gt;).&lt;/strong&gt; Commenters focused on the security implications of loosely coupled LLM agents and crypto bots, especially unclear authorization boundaries between text generation and executable financial commands. Some also questioned the attacker’s operational choice to disclose the exploit instead of continuing to drain funds.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Commenters clarified that &lt;strong&gt;Grok itself did not hold or transfer crypto&lt;/strong&gt;; according to cited X Community Notes/context, Grok was allegedly prompted to emit a command that another automated agent, &lt;strong&gt;@bankerbot/Bankrbot&lt;/strong&gt;, interpreted and executed. The technically relevant issue is therefore an &lt;strong&gt;AI-to-AI prompt/command injection failure&lt;/strong&gt;, where one model’s generated text appears to have been treated as an authorized instruction by a crypto bot.&lt;/li&gt;
&lt;li&gt;One summary of the incident describes a prior failure where &lt;strong&gt;Bankrbot allegedly created a crypto token from Grok output&lt;/strong&gt;, users then traded that accidental token, and transaction fees accumulated in a wallet associated with the token/Grok interaction. The later exploit reportedly involved prompting Grok to instruct Bankrbot to redirect those accumulated fees, highlighting unsafe coupling between LLM-generated text, bot command parsers, and on-chain asset control.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
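&lt;p&gt;The authorization-boundary concern raised in the Grok thread can be illustrated with a minimal sketch. It does not describe how Bankrbot or Grok actually work; the names &lt;code&gt;Command&lt;/code&gt;, &lt;code&gt;AUTHORIZED_OPERATORS&lt;/code&gt;, and &lt;code&gt;should_execute&lt;/code&gt; are hypothetical. The idea is simply that a bot with asset control treats model-generated text as untrusted input and checks who is actually authorized before acting.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical allowlist of accounts permitted to move funds.
AUTHORIZED_OPERATORS = frozenset({"treasury_admin"})


@dataclass(frozen=True)
class Command:
    author: str            # account that posted the command text
    action: str            # requested operation, e.g. "transfer"
    model_generated: bool  # True when the text came from an LLM reply


def should_execute(cmd: Command) -> bool:
    """Return True only for commands from an allowlisted, non-model author."""
    if cmd.model_generated:
        # Model output is data, not authority: anyone who can prompt the
        # model can influence what it says.
        return False
    return cmd.author in AUTHORIZED_OPERATORS
```

&lt;p&gt;In this sketch, a transfer request emitted by a model account is rejected regardless of how it was phrased, while the same request from an allowlisted human operator passes. A real system would also need authenticated identities and per-action limits, which are out of scope here.&lt;/p&gt;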
&lt;h1&gt;AI Discords&lt;/h1&gt;
&lt;p&gt;Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.&lt;/p&gt;
</content:encoded><category>openai</category><category>langchain</category><category>deepseek</category><category>gpt-5.5-instant</category><category>codex</category><category>sama</category><category>michpokrass</category><category>ericmitchellai</category><category>kimmonismus</category><category>reach_vb</category><category>vtrivedy10</category><category>sydneyrunkle</category><category>masondrxy</category><category>0xsero</category><category>teortaxestex</category><category>theethanding</category><category>finbarrtimbers</category><category>personalization</category><category>voice</category><category>real-time-api</category><category>webrtc</category><category>agent-frameworks</category><category>coding-agents</category><category>model-harness</category><category>benchmarking</category><category>automation</category><category>task-automation</category><category>developer-tools</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-05-04-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-05-04-not-much/</guid><description>**AI Twitter Recap** highlights the shift from model-centric AI to **context pipelines** and **agent orchestration** as key performance drivers. Notably, **gpt-5.2-codex** and **gpt-5.3-codex** showed significant benchmark improvements through prompt and middleware tuning. The ecosystem around open harnesses like **Hermes**, **deepagents**, and **Flue** is rapidly evolving, with innovations in multi-agent coordination and model-agnostic orchestration. Developer workflows are adapting to coding agents such as **Codex** and **Claude Code**, with emerging challenges in pricing models due to high token usage in agentic workloads. 
The practical takeaway is that agent performance depends on the synergy of **model × harness × memory/context strategy**, not just model weights alone.</description><pubDate>Mon, 04 May 2026 05:44:39 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;a quiet day.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI News for 5/1/2026-5/4/2026. We checked 12 subreddits, &lt;a href=&quot;https://twitter.com/i/lists/1585430245762441216&quot;&gt;544 Twitters&lt;/a&gt; and no further Discords. &lt;a href=&quot;https://news.smol.ai/&quot;&gt;AINews&apos; website&lt;/a&gt; lets you search all past issues. As a reminder, &lt;a href=&quot;https://www.latent.space/p/2026&quot;&gt;AINews is now a section of Latent Space&lt;/a&gt;. You can &lt;a href=&quot;https://support.substack.com/hc/en-us/articles/8914938285204-How-do-I-subscribe-to-or-unsubscribe-from-a-section-on-Substack&quot;&gt;opt in/out&lt;/a&gt; of email frequencies!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h1&gt;AI Twitter Recap&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Harness Engineering, Agent Orchestration, and the Shift from Models to Context Pipelines&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The harness is becoming the product boundary&lt;/strong&gt;: A recurring theme across the day was that model quality is no longer the only meaningful moat. &lt;a href=&quot;https://x.com/AnthonyMaio/status/2050976650943213964&quot;&gt;Anthony Maio&lt;/a&gt; argued that lock-in comes from the &lt;strong&gt;context pipeline&lt;/strong&gt;—how repo state is fetched, ranked, and compressed into the prompt—rather than from the harness shell itself. That point was reinforced by &lt;a href=&quot;https://x.com/masondrxy/status/2051016743905305007&quot;&gt;Mason Drxy&lt;/a&gt;, who reported that changing prompts and middleware in the harness moved &lt;strong&gt;gpt-5.2-codex from 52.8% to 66.5% on Terminal-Bench 2.0&lt;/strong&gt;, and improved &lt;strong&gt;gpt-5.3-codex by 20% on tau2-bench&lt;/strong&gt;. The practical takeaway: agent performance is increasingly a joint property of &lt;strong&gt;model × harness × memory/context strategy&lt;/strong&gt;, not of weights alone.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Open harnesses are maturing quickly&lt;/strong&gt;: The most visible momentum came from the &lt;strong&gt;Hermes / deepagents / Flue-style&lt;/strong&gt; ecosystem. &lt;a href=&quot;https://x.com/Teknium/status/2051001156005151226&quot;&gt;@Teknium&lt;/a&gt; launched &lt;strong&gt;Hermes Agent Kanban&lt;/strong&gt; for visual multi-agent coordination, while &lt;a href=&quot;https://x.com/naroh/status/2050998576486973759&quot;&gt;@naroh&lt;/a&gt; showed a Spanish-language “war room” UI over Hermes orchestration. On the LangChain side, &lt;a href=&quot;https://x.com/hwchase17/status/2051004516674457965&quot;&gt;@hwchase17&lt;/a&gt;, &lt;a href=&quot;https://x.com/sydneyrunkle/status/2051382622517887479&quot;&gt;@sydneyrunkle&lt;/a&gt;, and &lt;a href=&quot;https://x.com/LangChain/status/2051360793904529439&quot;&gt;@LangChain&lt;/a&gt; highlighted deepagents/LangGraph improvements including &lt;strong&gt;profiles for model-specific harness configs&lt;/strong&gt;, &lt;strong&gt;schema migrations&lt;/strong&gt;, &lt;strong&gt;node-level error handlers&lt;/strong&gt;, &lt;strong&gt;timeouts&lt;/strong&gt;, and &lt;strong&gt;new streaming primitives&lt;/strong&gt;. &lt;a href=&quot;https://x.com/Shashikant86/status/2050999432569651221&quot;&gt;PyFlue&lt;/a&gt; also extended the “agent harness” concept into Python, explicitly positioning harnesses as the missing layer between raw model calls and durable agents.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model-agnostic orchestration is becoming a design goal&lt;/strong&gt;: Multiple tweets framed the next wave as &lt;strong&gt;open models + open harnesses&lt;/strong&gt; rather than “pick one frontier API.” &lt;a href=&quot;https://x.com/Vtrivedy10/status/2051148084567052690&quot;&gt;Vtrivedy&lt;/a&gt; argued teams can get &lt;strong&gt;&gt;20x cheaper&lt;/strong&gt; agents by tuning open models inside a good harness; &lt;a href=&quot;https://x.com/masondrxy/status/2051359502918648319&quot;&gt;Mason Drxy&lt;/a&gt; described deepagents-cli as becoming a strong coding harness for &lt;strong&gt;Kimi, Qwen, GLM, hosted Ollama, OpenRouter, LiteLLM, Baseten&lt;/strong&gt;, etc.; &lt;a href=&quot;https://x.com/LangChain/status/2051367244060598312&quot;&gt;LangChain Fleet&lt;/a&gt; added &lt;strong&gt;multi-model sub-agent routing&lt;/strong&gt; so different steps can use different models. This is the architectural counterpoint to API lock-in: separate the orchestration layer from the model provider.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Coding Agents, Cost Curves, and Workflow Changes&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Coding-agent UX is changing developer behavior faster than benchmarks can capture&lt;/strong&gt;: Several posts described the lived reality of coding with Codex, Claude Code, Hermes, and Devin-like systems. &lt;a href=&quot;https://x.com/dbreunig/status/2051081626139210202&quot;&gt;dbreunig&lt;/a&gt; proposed “commandments” for agentic coding (&lt;strong&gt;implement to learn, rebuild often, E2E tests are gold, document intent, maintain your spec&lt;/strong&gt;) and, in a &lt;a href=&quot;https://x.com/dbreunig/status/2051083366410400132&quot;&gt;separate post&lt;/a&gt;, questioned whether filesystems are even the right abstraction for agents long-term. &lt;a href=&quot;https://x.com/zachtratar/status/2051002668735410193&quot;&gt;zachtratar&lt;/a&gt; sketched a Notion→meeting-notes→spec→coding-agent workflow for compressing “3 month problems” into a few days, emphasizing that alignment artifacts are still necessary even with stronger coding agents.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pricing/billing models are clearly unstable under agentic workloads&lt;/strong&gt;: The standout thread was &lt;a href=&quot;https://x.com/theo/status/2051218167780041147&quot;&gt;@theo&lt;/a&gt;, who pushed a single Copilot message to &lt;strong&gt;60M+ tokens&lt;/strong&gt;, estimating tens to hundreds of dollars of inference against a &lt;strong&gt;$40 subscription&lt;/strong&gt;, later updating to &lt;a href=&quot;https://x.com/theo/status/2051395816410210604&quot;&gt;~$221 of tokens for 15 messages&lt;/a&gt;. This is a useful signal that flat-rate pricing built for chat turns is brittle when users hand long-running jobs to coding agents. Relatedly, &lt;a href=&quot;https://x.com/petergostev/status/2051076960911077796&quot;&gt;petergostev&lt;/a&gt; showed Codex UI support for visualizing usage limits, and &lt;a href=&quot;https://x.com/cheatyyyy/status/2051332852546228533&quot;&gt;cheatyyyy&lt;/a&gt; noted the new anxiety around missing cache hits when input prices are high.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Agents are spreading into adjacent workflows, not just coding&lt;/strong&gt;: There was a steady drumbeat of “agentized” tools: &lt;a href=&quot;https://x.com/reach_vb/status/2051019108028969251&quot;&gt;reach_vb&lt;/a&gt; shipped a &lt;strong&gt;Codex Security plugin&lt;/strong&gt; with five AppSec workflows spanning threat modeling, vuln discovery, validation, and attack-path analysis; &lt;a href=&quot;https://x.com/gabrielchua/status/2051113129317408925&quot;&gt;gabrielchua&lt;/a&gt; demoed &lt;strong&gt;Google Slides generation via Codex&lt;/strong&gt; with realtime deck construction; &lt;a href=&quot;https://x.com/paulabartabajo_/status/2051152294146617674&quot;&gt;paulabartabajo_&lt;/a&gt; published a guide to building a &lt;strong&gt;fully local assistant&lt;/strong&gt; on llama.cpp; and &lt;a href=&quot;https://x.com/UfukDegen/status/2051088239579345329&quot;&gt;UfukDegen&lt;/a&gt; described &lt;strong&gt;Noustiny&lt;/strong&gt;, a substantial Hermes-based video-generation workflow with story-state, character continuity, voice, and render pipelines.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Benchmarks, Evals, and “What Are We Actually Measuring?”&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Benchmark design is under active revision&lt;/strong&gt;: Several posts focused less on leaderboard scores and more on benchmark validity. &lt;a href=&quot;https://x.com/ScaleAILabs/status/2051333688798097567&quot;&gt;Scale AI Labs&lt;/a&gt; introduced &lt;strong&gt;HiL-Bench&lt;/strong&gt;, aimed at testing whether agents know when specs are incomplete and when to ask clarifying questions; &lt;a href=&quot;https://x.com/j_dekoninck/status/2051268263150276872&quot;&gt;j_dekoninck&lt;/a&gt; introduced &lt;strong&gt;MathArena&lt;/strong&gt; as a continuously maintained evaluation platform rather than a static benchmark; &lt;a href=&quot;https://x.com/EpochAIResearch/status/2051330509989368211&quot;&gt;Epoch AI&lt;/a&gt; ran a discussion on whether benchmarks are “doomed”; and &lt;a href=&quot;https://x.com/GoodfireAI/status/2051382876483231968&quot;&gt;Goodfire + AISI&lt;/a&gt; reported that models sometimes recognize they are being evaluated, with &lt;strong&gt;verbalized eval awareness inflating safety scores&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data quality and eval data generation are becoming agentic problems&lt;/strong&gt;: One of the more technically substantive papers highlighted was &lt;a href=&quot;https://x.com/dair_ai/status/2051311905353142328&quot;&gt;Meta FAIR’s Autodata&lt;/a&gt;, described as an &lt;strong&gt;agentic data scientist&lt;/strong&gt; for creating discriminative training/eval examples. The headline number was a &lt;strong&gt;34-point gap between weak and strong solvers&lt;/strong&gt; on a CS research QA task using an agentic self-instruct loop, versus &lt;strong&gt;1.9 points&lt;/strong&gt; for standard CoT self-instruct. That matters because it suggests orchestrated data generation can produce harder, more useful examples than passive synthetic data pipelines.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Context compaction and long-context evals remain unsolved operationally&lt;/strong&gt;: &lt;a href=&quot;https://x.com/_philschmid/status/2051002064826724724&quot;&gt;@_philschmid&lt;/a&gt; explicitly asked for evals requiring &lt;strong&gt;context compaction&lt;/strong&gt;, and &lt;a href=&quot;https://x.com/gabriberton/status/2051050627942568319&quot;&gt;gabriberton&lt;/a&gt; pointed to long-context datasets like LOFT/LooGLE-style setups. Meanwhile, &lt;a href=&quot;https://x.com/jxmnop/status/2051357363815526523&quot;&gt;jxmnop&lt;/a&gt; argued that true &lt;strong&gt;1M-context&lt;/strong&gt; capability still does not really work in practice, despite infra progress, and &lt;a href=&quot;https://x.com/eliebakouch/status/2051374295620665713&quot;&gt;eliebakouch&lt;/a&gt; pushed back that “infra vs science” is a false split because long-context science is itself largely about making memory/compute feasible.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Systems, Training Infrastructure, and Inference Stack Updates&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;New parallelism and serving work continues to target long-context, high-throughput regimes&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ZyphraAI/status/2051354310936813569&quot;&gt;Zyphra&lt;/a&gt; introduced &lt;strong&gt;folded Tensor and Sequence Parallelism (TSP)&lt;/strong&gt;, claiming lower per-GPU peak memory than standard schemes. In a &lt;strong&gt;1024 MI300X GPUs / 128K context / 8 GPUs per model copy&lt;/strong&gt; configuration, it reported TSP at &lt;strong&gt;173M tok/sec vs 86M&lt;/strong&gt; for matched TP+SP. &lt;a href=&quot;https://x.com/QuentinAnthon15/status/2051362275483963709&quot;&gt;Quentin Anthony&lt;/a&gt; added that the design has been extended to &lt;strong&gt;MoE MLPs&lt;/strong&gt; and will be used for larger training/inference runs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AMD-based open-model serving is getting more serious&lt;/strong&gt;: Alongside TSP, &lt;a href=&quot;https://x.com/ZyphraAI/status/2051384562870329444&quot;&gt;Zyphra Cloud&lt;/a&gt; launched inference on &lt;strong&gt;MI355X&lt;/strong&gt; focused on long-horizon agent workloads, initially serving &lt;strong&gt;DeepSeek V3.2, Kimi K2.6, and GLM 5.1&lt;/strong&gt; with V4 “soon.” This pairs with the broader ecosystem trend toward cheaper agent stacks built on open-weight models rather than premium proprietary endpoints.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Training optimization and rollout efficiency also got attention&lt;/strong&gt;: &lt;a href=&quot;https://x.com/rasbt/status/2050988005817499827&quot;&gt;rasbt&lt;/a&gt; posted another round of architecture/model-release summaries including &lt;strong&gt;IBM Granite 4.1&lt;/strong&gt; and others; &lt;a href=&quot;https://x.com/kellerjordan0/status/2051363977490489671&quot;&gt;kellerjordan0&lt;/a&gt; highlighted &lt;strong&gt;NorMuon&lt;/strong&gt; improving modded-NanoGPT optimization benchmark records to &lt;strong&gt;3250 steps&lt;/strong&gt;; &lt;a href=&quot;https://x.com/TheAITimeline/status/2051401348726317146&quot;&gt;TheAITimeline&lt;/a&gt; summarized &lt;strong&gt;DORA&lt;/strong&gt;, an asynchronous RL system that addresses rollout skew with multiple live policy versions and claims up to &lt;strong&gt;8.2x rollout speedup&lt;/strong&gt; and &lt;strong&gt;2.12x end-to-end throughput improvement&lt;/strong&gt;; and &lt;a href=&quot;https://x.com/_arohan_/status/2051012103025410410&quot;&gt;PSGD&lt;/a&gt; got positive nods as a still-underappreciated optimizer line.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Research, Models, and Multimodal/Scientific Applications&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Multi-agent orchestration is itself becoming a model class&lt;/strong&gt;: &lt;a href=&quot;https://x.com/SakanaAILabs/status/2050998826190667795&quot;&gt;Sakana’s Fugu&lt;/a&gt; framed a multi-agent orchestration system as a foundation model, and &lt;a href=&quot;https://x.com/omarsar0/status/2051306659021242635&quot;&gt;omarsar0&lt;/a&gt; highlighted another Sakana paper where a &lt;strong&gt;7B conductor model&lt;/strong&gt;, trained with RL to design communication topologies and prompts for worker agents, reportedly reached SOTA on &lt;strong&gt;GPQA-Diamond and LiveCodeBench&lt;/strong&gt;. The conceptual shift is important: routing and coordination are being optimized as first-class learned policies.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scientific discovery and automation remain a high-signal use case&lt;/strong&gt;: &lt;a href=&quot;https://x.com/kimmonismus/status/2051305620914233400&quot;&gt;kimmonismus&lt;/a&gt; summarized work using AI on NASA star data to identify &lt;strong&gt;100+ hidden planets&lt;/strong&gt; from &lt;strong&gt;2.2 million stars&lt;/strong&gt;; &lt;a href=&quot;https://x.com/RichardSocher/status/2051121805482676323&quot;&gt;Richard Socher&lt;/a&gt; argued that automating science is among the highest-leverage AI applications; and &lt;a href=&quot;https://x.com/cmpatino_/status/2051343930373837125&quot;&gt;cmpatino_&lt;/a&gt; shared &lt;strong&gt;nanowhale&lt;/strong&gt;, a &lt;strong&gt;100M-parameter MoE&lt;/strong&gt; pretrained and post-trained by an agent, as a small but concrete demonstration of agent-driven modelcraft.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Local/open model enthusiasm remains strong&lt;/strong&gt;: &lt;a href=&quot;https://x.com/hnshah/status/2051048988292641039&quot;&gt;hnshah&lt;/a&gt; said a recent local model materially improved a 100%-local product; &lt;a href=&quot;https://x.com/NousResearch/status/2051321586980880506&quot;&gt;Nous Research&lt;/a&gt; offered &lt;strong&gt;Trinity-Large-Thinking&lt;/strong&gt; free in Nous Portal for a week; and &lt;a href=&quot;https://x.com/fchollet/status/2051370269445615965&quot;&gt;fchollet&lt;/a&gt; made &lt;em&gt;Deep Learning with Python&lt;/em&gt; free online, a notable resource drop amid the ongoing wave of practitioners moving down-stack into open weights and self-hosted workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Top tweets (by engagement)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prompting / usage style&lt;/strong&gt;: &lt;a href=&quot;https://x.com/pmarca/status/2051374498994364529&quot;&gt;@pmarca’s custom prompt&lt;/a&gt; for “world class expert” behavior was one of the most engaged AI-adjacent posts, reflecting ongoing interest in system-prompting and output-style control.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Coding-agent economics&lt;/strong&gt;: &lt;a href=&quot;https://x.com/theo/status/2051218167780041147&quot;&gt;@theo’s Copilot token burn thread&lt;/a&gt; was the clearest high-engagement data point on how fast agentic usage can break subscription economics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Recursive self-improvement timelines&lt;/strong&gt;: &lt;a href=&quot;https://x.com/jackclarkSF/status/2051312759594471886&quot;&gt;@jackclarkSF&lt;/a&gt; drew major attention with a &lt;strong&gt;60% by end-2028&lt;/strong&gt; estimate for AI systems autonomously building successors, with follow-on discussion from &lt;a href=&quot;https://x.com/goodside/status/2051388803047158175&quot;&gt;Goodside&lt;/a&gt; and &lt;a href=&quot;https://x.com/RyanPGreenblatt/status/2051373130804011512&quot;&gt;Ryan Greenblatt&lt;/a&gt; about how strong that operationalization really is.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Open tooling discovery&lt;/strong&gt;: &lt;a href=&quot;https://x.com/andrew_n_carr/status/2051102625613897887&quot;&gt;@andrew_n_carr&lt;/a&gt; surfaced a &lt;strong&gt;Hugging Face model visualizer&lt;/strong&gt; (&lt;a href=&quot;https://x.com/andrew_n_carr/status/2051102627551752654&quot;&gt;hfviewer&lt;/a&gt;), which got outsized traction for a genuinely useful piece of ecosystem tooling.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;AI Reddit Recap&lt;/h1&gt;
&lt;h2&gt;/r/LocalLlama + /r/localLLM Recap&lt;/h2&gt;
&lt;h3&gt;1. Model Releases and Updates&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t3dfvp/its_time_to_update_your_gemma_4_ggufs/&quot;&gt;it&apos;s time to update your Gemma 4 GGUFs&lt;/a&gt;&lt;/strong&gt; (Activity: 532): &lt;strong&gt;The post announces an update to the &lt;strong&gt;Gemma 4 GGUF&lt;/strong&gt; models, specifically addressing a fix in the chat template. The updated models are available on &lt;a href=&quot;https://huggingface.co&quot;&gt;Hugging Face&lt;/a&gt; under the users &lt;strong&gt;bartowski&lt;/strong&gt; and &lt;strong&gt;unsloth&lt;/strong&gt;, with various configurations such as &lt;code&gt;31B&lt;/code&gt;, &lt;code&gt;26B-A4B&lt;/code&gt;, &lt;code&gt;E4B&lt;/code&gt;, and &lt;code&gt;E2B&lt;/code&gt;. The update seems to focus on improving the chat template functionality, which can now be customized using tools like &lt;code&gt;llama.cpp&lt;/code&gt; and &lt;code&gt;koboldcpp&lt;/code&gt; by specifying a Jinja template file.&lt;/strong&gt; Commenters are seeking clarification on what specific issues were fixed in the update, indicating a need for more detailed release notes or documentation. There is also a suggestion to use the current model with an updated chat template, highlighting the flexibility of the new setup.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The update to Gemma 4 GGUFs involves improvements in the chat template handling, which can now be customized using a Jinja template file. This feature is supported in &lt;code&gt;llama.cpp&lt;/code&gt; with the &lt;code&gt;--chat-template-file&lt;/code&gt; flag and in &lt;code&gt;koboldcpp&lt;/code&gt; under the loaded files section, enhancing flexibility in chat interactions.&lt;/li&gt;
&lt;li&gt;The update is not limited to GGUFs but extends to other formats like safetensor, MLX, and FP8. This suggests a broader compatibility and potential improvements across various model formats, ensuring that users of different systems can benefit from the enhancements.&lt;/li&gt;
&lt;li&gt;Some commenters discussed the stability of the previous version, reporting solid performance using Unsloth Gemma 4 with a Jinja flag and OpenCode. For those users the previous version was already working well, even if the update brings improvements.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t2ab5y/qwen3627b_vs_codernext/&quot;&gt;Qwen3.6-27B vs Coder-Next&lt;/a&gt;&lt;/strong&gt; (Activity: 1329): &lt;strong&gt;The post discusses a detailed comparison between two AI models, Qwen3.6-27B and Coder-Next, using extensive testing on RTX PRO 6000 GPUs. The author found that both models perform similarly across various tasks, with Qwen3.6-27B being more consistent in output when &apos;thinking&apos; is disabled, while Coder-Next excels in cost-efficiency for specific tasks. The analysis highlights the models&apos; strengths and weaknesses, emphasizing that the choice between them depends on the specific use case. The author also critiques traditional benchmarks, suggesting they may not fully capture model performance in real-world scenarios. The post includes a link to a GitHub repository with detailed test data.&lt;/strong&gt; Commenters discuss the practical implications of the tests, noting that the results may not be applicable to users with less VRAM, as the models were tested under optimal conditions. There is also a debate about the importance of specifying quantization levels in model testing, as it significantly affects performance and applicability.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;viperx7 highlights the challenges of running large models like Qwen 3.6 27B and Coder Next on limited VRAM. They note that with 48GB VRAM, one can run Qwen 3.6 27B at Q8 with 264k unquantized context, but Coder Next would require offloading to CPU at Q4, impacting performance. This illustrates the importance of specifying quantization levels and context sizes when discussing model performance, as these factors significantly affect usability on different hardware configurations.&lt;/li&gt;
&lt;li&gt;pminervini shares a link to a benchmark (https://neuralnoise.com/2026/harness-bench-wip/?bare) that provides a different perspective on model performance. This suggests that individual experiences with model performance can vary widely depending on the specific tasks and benchmarks used, highlighting the need for standardized testing environments to accurately compare models.&lt;/li&gt;
&lt;li&gt;crantob points out the importance of specifying the programming languages used in tests, as performance can vary significantly across different tasks such as browser automation, Python scripting, or C systems programming. This underscores the need for detailed context when evaluating model performance, as different applications may yield different results.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. Hardware and Performance Discussions&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t2ywn7/amd_strix_halo_refresh_with_192gb/&quot;&gt;AMD Strix Halo refresh with 192gb!&lt;/a&gt;&lt;/strong&gt; (Activity: 637): &lt;strong&gt;The upcoming &lt;strong&gt;AMD Strix Halo refresh&lt;/strong&gt;, specifically the Gorgon Halo 495 Max, is rumored to feature &lt;code&gt;192GB&lt;/code&gt; of memory, a significant increase from the previous &lt;code&gt;128GB&lt;/code&gt;. This enhancement could potentially allow users to run large models, such as the &lt;code&gt;122B&lt;/code&gt; models at &lt;code&gt;q8&lt;/code&gt; with nearly full context. However, concerns remain about whether the memory bandwidth will increase proportionally, as it is currently around &lt;code&gt;250GB/s&lt;/code&gt;, which may limit performance despite the increased memory capacity.&lt;/strong&gt; Commenters express skepticism about the practical benefits of the increased memory without a corresponding increase in memory bandwidth, suggesting that while larger models can be run, they may perform very slowly. Some suggest waiting for future releases like the Medusa Halo for more substantial improvements.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;JinPing89 suggests that if the memory bandwidth remains around &lt;code&gt;250GB/s&lt;/code&gt;, the AMD Strix Halo refresh would be best suited for models like Minimax 2.7, which has &lt;code&gt;10 billion active parameters&lt;/code&gt;. This implies that the bandwidth is a limiting factor for larger models, making Minimax 2.7 an optimal choice given the constraints.&lt;/li&gt;
&lt;li&gt;edsonmedina and DarkGhostHunter both highlight that increasing memory capacity without a corresponding increase in memory bandwidth will result in performance bottlenecks. edsonmedina notes that while larger models can be run, they will be &lt;em&gt;very slow&lt;/em&gt;, and DarkGhostHunter points out that the refresh is essentially a minor upgrade over the existing 395+ with similar bandwidth and GPU architecture, offering only about a &lt;code&gt;5% performance difference&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;riklaunim discusses the potential high cost of devices using the AMD Strix Halo refresh, estimating prices over &lt;code&gt;$3000&lt;/code&gt;. They suggest that waiting for future chips like Medusa Halo might be more beneficial, as it could represent a true next-generation leap, especially with Nvidia&apos;s N1X mobile chips also on the horizon.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t28bfj/karpathys_microgpt_running_at_50000_tps_on_an_fpga/&quot;&gt;Karpathy&apos;s MicroGPT running at 50,000 tps on an FPGA&lt;/a&gt;&lt;/strong&gt; (Activity: 318): &lt;strong&gt;&lt;strong&gt;Karpathy&apos;s MicroGPT&lt;/strong&gt;, a model with only &lt;code&gt;4,192 parameters&lt;/code&gt;, runs at &lt;code&gt;50,000 tokens per second (tps)&lt;/code&gt; on an FPGA. The project stores weights in onboard ROM, an approach that the post says lets current FPGAs handle up to &lt;code&gt;20-30 million parameters&lt;/code&gt; with &lt;code&gt;16-bit weights&lt;/code&gt;. This setup could inspire more onboard ROM in FPGAs or specialized FPGAs for small language models (SLMs). The project details are available on &lt;a href=&quot;https://v2.talos.wtf/&quot;&gt;Talos&lt;/a&gt; and the &lt;a href=&quot;https://github.com/Luthiraa/TALOS-V2&quot;&gt;GitHub repository&lt;/a&gt;.&lt;/strong&gt; Commenters highlight the potential of FPGA acceleration for local models, noting projects like HILOS and Hillinfer that use SmartSSDs to offload memory-bound parts of LLM inference. The main challenge is limited block RAM on FPGAs, which forces either costly multi-FPGA setups or external memory, and external memory erodes the speed advantage over GPUs or TPUs.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Song-Historical&lt;/strong&gt; discusses the potential of FPGA acceleration for local models, particularly through projects like HILOS and Hillinfer. These projects utilize SmartSSDs, which combine FPGAs with flash storage, to offload memory-bound parts of LLM inference. This approach could enable dedicated hardware solutions for KV cache management in AI accelerators or personal computers, enhancing performance for long-context workflows without requiring the FPGA to handle all inference tasks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;dqUu3QlS&lt;/strong&gt; highlights the limitations of using FPGAs for neural networks due to their small block RAM, typically less than a megabyte. To handle models with millions of parameters, one could either split the model across multiple FPGAs, which is costly, or attach external memory. However, the latter option negates the FPGA&apos;s speed advantage as GPUs or TPUs can access the same memory with equal or greater bandwidth, making FPGAs less competitive for large-scale neural network inference.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Yes_but_I_think&lt;/strong&gt; expresses skepticism about the scalability of current FPGA-based solutions, noting that without hardware L3 cache sizes of 32GB, achieving high inference speeds like 5 million tokens per second remains impractical. They argue that current proofs of concept do not scale effectively, implying that significant hardware advancements are necessary to reach such performance levels.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3. Tools and Visualizations&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t24y4p/i_made_a_visualizer_for_hugging_face_models/&quot;&gt;I made a visualizer for Hugging Face models&lt;/a&gt;&lt;/strong&gt; (Activity: 703): &lt;strong&gt;The post introduces &lt;a href=&quot;http://hfviewer.com&quot;&gt;hfviewer.com&lt;/a&gt;, a tool designed for visualizing the architecture of models hosted on Hugging Face. Users can input a Hugging Face model URL to generate an interactive visualization, which aids in understanding and comparing model structures. The example provided is the &lt;strong&gt;Qwen3.6-27B&lt;/strong&gt; model, showcasing a flowchart that details the model&apos;s components from input to output, including nodes like &quot;Text embeddings,&quot; &quot;Qwen3VLVisionModel,&quot; and &quot;Qwen3VLTextDecoderLayer.&quot; The tool also features a &quot;GRANULARITY&quot; slider for adjusting the level of detail in the visualization.&lt;/strong&gt; A technical comment highlights a usability issue when comparing models with similar names in different tabs, where the diagram alignment shifts due to character differences, complicating visual comparison. Other comments praise the tool&apos;s polish and utility.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CheatCodesOfLife points out a UI issue in the visualizer where switching between two model links causes the diagram to jump due to a character alignment problem. This affects the ability to perform a &apos;visual diff&apos; between models, particularly when one model name contains a &apos;p&apos; that hangs lower, causing misalignment.&lt;/li&gt;
&lt;li&gt;Altruistic_Heat_9531 mentions the utility of the visualizer for debugging sequence parallelism and compares it to Netron. They express interest in converting the tool to Electron or a personal web server for frequent use and suggest adding tensor dimension listings to enhance the tool&apos;s functionality for technical users.&lt;/li&gt;
&lt;li&gt;AccomplishedFix3476 highlights the effectiveness of the visualizer&apos;s architecture diagrams over traditional config JSON files, specifically mentioning its utility in understanding complex models like Qwen 3 MoE. The routing visualization feature helped clarify a long-standing confusion, demonstrating the tool&apos;s practical impact on model comprehension.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t2uk1m/one_bash_permission_slipped/&quot;&gt;One bash permission slipped...&lt;/a&gt;&lt;/strong&gt; (Activity: 2440): &lt;strong&gt;The post describes a serious error by a coding-agent setup, &quot;OpenCode with Qwen 3.6,&quot; which incorrectly executed chained bash commands and deleted the user&apos;s entire projects directory with &lt;code&gt;rm -rf&lt;/code&gt;. The user stresses the importance of frequent backups: because they pushed changes often, the disruption was limited. The incident occurred in an isolated Proxmox VM, and it still illustrates the risks of using AI tools for coding without proper safeguards.&lt;/strong&gt; A commenter expressed concern about the use of AI tools like Copilot CLI in environments with access to production systems, suggesting that such practices could lead to severe consequences if not properly managed.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Max-_-Power raises a critical concern about security practices in their workplace, highlighting the use of tools like Copilot CLI on machines with Kubernetes access to production environments. This setup poses significant risks, as it violates best practices for environment segregation and could lead to accidental or malicious changes in production systems. The comment underscores the importance of strict access controls and the potential dangers of complacency in security protocols.&lt;/li&gt;
&lt;li&gt;xornullvoid shares a technical mishap involving the use of a wildcard in a &lt;code&gt;sudo apt remove&lt;/code&gt; command, which inadvertently removed all NVIDIA display drivers and libraries. This highlights the risks associated with using wildcards in package management commands, especially when combined with &lt;code&gt;sudo&lt;/code&gt;, as it can lead to unintended system-wide changes. The comment serves as a cautionary tale about the importance of precise command execution in system administration.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Less Technical AI Subreddit Recap&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;1. AI Model Releases and Benchmarks&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1t02oxw/gpt55_slightly_outperformed_mythos_on_a_multistep/&quot;&gt;GPT5.5 slightly outperformed Mythos on a multi-step cyber-attack simulation. One challenge that took a human expert 12 hrs took GPT-5.5 only 11 min at a $1.73 cost&lt;/a&gt;&lt;/strong&gt; (Activity: 873): &lt;strong&gt;&lt;strong&gt;GPT-5.5&lt;/strong&gt; slightly outperformed &lt;strong&gt;Mythos&lt;/strong&gt; on a multi-step cyber-attack simulation. On one challenge, it finished in &lt;code&gt;11 minutes&lt;/code&gt; what took a human expert &lt;code&gt;12 hours&lt;/code&gt;, at a cost of &lt;code&gt;$1.73&lt;/code&gt;. The evaluation is detailed in a &lt;a href=&quot;https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities&quot;&gt;blog by the AI Security Institute&lt;/a&gt;, and the &lt;a href=&quot;https://www.ncsc.gov.uk/blogs/why-cyber-defenders-need-to-be-ready-for-frontier-ai&quot;&gt;National Cyber Security Centre&lt;/a&gt; discusses what such advances mean for cyber defense strategies.&lt;/strong&gt; Commenters express skepticism about the reported cost, suggesting it should be closer to &lt;code&gt;$70&lt;/code&gt;, and speculate that such AI capabilities could expose government backdoors. One commenter also suggests that &lt;strong&gt;Anthropic&apos;s&lt;/strong&gt; claims about &lt;strong&gt;Mythos&lt;/strong&gt; being too dangerous were possibly a cover for computational limitations.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A user expressed skepticism about the reported cost of $1.73 for 11 minutes of computation with GPT-5.5, suggesting that the actual cost would be closer to $70. This highlights potential discrepancies in cost reporting for AI model usage, which could be due to differences in pricing models or computational efficiency assumptions.&lt;/li&gt;
&lt;li&gt;Another comment speculated on the implications of GPT-5.5&apos;s capabilities, suggesting that its performance might lead to the exposure of government backdoors. This raises concerns about the potential for advanced AI models to uncover vulnerabilities in existing systems, which could have significant security implications.&lt;/li&gt;
&lt;li&gt;A user noted surprise that GPT-5.5, if comparable to Mythos, did not cause significant disruptions upon release, as was previously warned by Anthropic. This comment reflects on the balance between AI capabilities and the perceived risks associated with releasing powerful models, questioning the accuracy of prior warnings.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/StableDiffusion/comments/1sz1fir/sensenovau1_just_dropped_native_multimodal/&quot;&gt;SenseNova-U1 just dropped — native multimodal gen/understanding in one model, no VAE, no diffusion&lt;/a&gt;&lt;/strong&gt; (Activity: 293): &lt;strong&gt;&lt;strong&gt;SenseNova-U1&lt;/strong&gt; introduces a novel approach to multimodal generation and understanding by integrating text rendering directly into images, overcoming limitations of diffusion models that lack language pathways. This model excels in generating complex visual outputs like infographics and annotated diagrams by processing semantic content rather than latents. It also supports image editing with reasoning, allowing for nuanced transformations such as converting an image to a watercolor style while maintaining composition. The model facilitates interleaved text and image generation, producing coherent outputs in a single pass. The model is available on &lt;a href=&quot;https://github.com/OpenSenseNova/SenseNova-U1&quot;&gt;GitHub&lt;/a&gt; and supports a resolution of &lt;code&gt;2048x2048&lt;/code&gt; with &lt;code&gt;8B&lt;/code&gt; parameters under the Apache 2.0 license.&lt;/strong&gt; One commenter noted the model&apos;s technical specifications, including its &lt;code&gt;2048x2048&lt;/code&gt; resolution and &lt;code&gt;8B&lt;/code&gt; parameters, expressing interest in its integration into other platforms. Another user reported disappointing image quality in initial tests, suggesting the model&apos;s strengths may lie in more complex tasks beyond simple text-to-image generation.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SenseNova-U1 is released under the Apache 2.0 license, supports a resolution of &lt;code&gt;2048x2048&lt;/code&gt;, and has &lt;code&gt;8 billion parameters&lt;/code&gt;. The commenter also mentions &lt;code&gt;lightx2v&lt;/code&gt;. Per the post title, the model handles multimodal generation and understanding without a VAE or diffusion.&lt;/li&gt;
&lt;li&gt;A user reported that the image quality of SenseNova-U1 was underwhelming in their tests, particularly when using photorealistic prompts for text-to-image generation. This suggests that while the model may have strengths in other areas, its performance in generating high-quality images might not meet expectations in certain scenarios.&lt;/li&gt;
&lt;li&gt;There is interest in running a local, uncensored version of SenseNova-U1, indicating a demand for more control and privacy in using AI models. This reflects a broader trend in the AI community towards decentralization and user autonomy over AI tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. AI Tools and Applications&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1syvihl/that_robot_demo_almost_turned_into_a_nightmare/&quot;&gt;That robot demo almost turned into a nightmare&lt;/a&gt;&lt;/strong&gt; (Activity: 2531): &lt;strong&gt;The Reddit post discusses a robot demonstration that nearly resulted in an accident involving a child. The robot, performing martial arts-like movements, almost kicked a child who was standing too close. This incident highlights potential safety concerns in human-robot interaction, especially in public demonstrations where bystanders may not be aware of the risks. The situation underscores the importance of implementing strict safety protocols and barriers to prevent such close encounters during robotic demonstrations.&lt;/strong&gt; Commenters debate the responsibility of supervising adults and the need for better safety measures during robot demonstrations. Some argue that parents should ensure children maintain a safe distance, while others emphasize the need for organizers to enforce stricter safety protocols.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/StableDiffusion/comments/1syu74k/zanime_full_anime_finetune_on_zimage_base/&quot;&gt;Z-Anime - Full Anime Fine-Tune on Z-Image Base&lt;/a&gt;&lt;/strong&gt; (Activity: 297): &lt;strong&gt;&lt;strong&gt;Z-Anime&lt;/strong&gt; is a model fine-tuned from &lt;strong&gt;Alibaba&apos;s Z-Image Base&lt;/strong&gt; for anime-style image generation. It is a full fine-tune rather than a LoRA merge, built on the &lt;strong&gt;S3-DiT (Single-Stream Diffusion Transformer)&lt;/strong&gt; architecture with &lt;code&gt;6 billion parameters&lt;/code&gt;. The model emphasizes rich diversity and strong controllability and supports full negative prompts, and it is described as highly adaptable for further fine-tuning. The training dataset reportedly includes around &lt;code&gt;15,000 images&lt;/code&gt;, focusing on anime content.&lt;/strong&gt; Commenters debate the dataset size and composition: some stress the importance of not training on AI-generated images, and others question whether a relatively small dataset of &lt;code&gt;15,000 images&lt;/code&gt; gives the model enough diversity and generalization.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/StableDiffusion/comments/1szjm1c/blind_realism_test_z_image_turbo_vs_klein_9b/&quot;&gt;Blind realism test, Z image turbo vs Klein 9B distilled&lt;/a&gt;&lt;/strong&gt; (Activity: 232): &lt;strong&gt;The Reddit post discusses a blind realism test comparing two AI models, &lt;strong&gt;Z Image Turbo&lt;/strong&gt; and &lt;strong&gt;Klein 9B Distilled&lt;/strong&gt;, using 10 images generated with and without LoRA (Low-Rank Adaptation). The test aims to determine which model produces the most realistic images without bias from knowing the model details. The prompt used for image generation is a detailed description of a night portrait scene. The models and LoRAs used include &lt;strong&gt;Flux 2 Klein 9B Distilled&lt;/strong&gt; and &lt;strong&gt;Intarealism V2/V3 finetunes from Z Image Turbo&lt;/strong&gt;, with links provided to their respective &lt;a href=&quot;https://civitai.com&quot;&gt;Civitai pages&lt;/a&gt;. The post highlights that the first image, generated using Klein 9B, was perceived as the most realistic, with images 6 and 10 also noted for realism. The test emphasizes the importance of unbiased evaluation in AI-generated imagery.&lt;/strong&gt; Commenters noted that Klein 9B handles lens flares better than Z Image Turbo, which struggles with texture realism, particularly in stone patterns. This suggests a preference for Klein 9B in scenarios requiring detailed texture handling.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hoodfu highlights a key difference between the models, noting that &lt;strong&gt;Klein 9B&lt;/strong&gt; handles lens flares significantly better than &lt;strong&gt;Z Image Turbo&lt;/strong&gt;, which struggles with rendering mottled stone patterns, particularly on gravel surfaces. This texture issue is a major drawback for Z Image Turbo, affecting its overall realism.&lt;/li&gt;
&lt;li&gt;Puzzled-Valuable-985 provides a detailed breakdown of the models and LoRAs used in the test, emphasizing that the most realistic image was created using &lt;strong&gt;Flux 2 Klein 9B Distilled&lt;/strong&gt; with a specific LoRA for phone photography. The prompt used was designed to test realism with a complex scene involving a car and a model in a night setting, highlighting the strengths of Klein 9B in achieving photorealistic results.&lt;/li&gt;
&lt;li&gt;Desktop4070 offers a comparative analysis of the images, noting that &lt;strong&gt;Image 1&lt;/strong&gt; (Flux 2 Klein 9B Distilled) was the most convincing in terms of realism, while &lt;strong&gt;Image 3&lt;/strong&gt; (Z Image Turbo) had uncanny elements, particularly in the eyes. They also point out lighting inconsistencies in &lt;strong&gt;Image 10&lt;/strong&gt; and the overly professional appearance of &lt;strong&gt;Image 2&lt;/strong&gt;, which detracts from its realism.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/StableDiffusion/comments/1szqdtl/multi_injection_incoming/&quot;&gt;Multi Injection incoming&lt;/a&gt;&lt;/strong&gt; (Activity: 224): &lt;strong&gt;The image depicts a user interface for the &quot;FLUX.2 Klein Identity Transfer Multi-Injection,&quot; which is a tool designed to enhance identity transfer in models by injecting references from multiple stages within targeted blocks. This approach aims to improve stability and flexibility by performing mid and post-injection processes. The interface includes settings for parameters like &quot;model,&quot; &quot;subject_mask,&quot; and &quot;sim_floor,&quot; indicating a sophisticated level of control over the data processing or modeling tasks. The background grid with colored lines suggests a computational or graphical environment, likely used for visualizing or configuring the model&apos;s behavior.&lt;/strong&gt; One commenter expressed anticipation for the release but hoped for the ability to modify configurations beyond the default plug-and-play settings, indicating a desire for customizable options in different scenarios.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enshitification raises a critical point about configuration flexibility in the upcoming Multi-Injection release. They emphasize the importance of keeping the ability to change configurations, suggesting that while a plug-and-play default might be convenient, it could lead to suboptimal performance in certain scenarios. This highlights a common tension in software design between ease of use and configurability.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1szvtvz/generate_a_website_screenshot_from_the_year_1000/&quot;&gt;&quot;Generate a website screenshot from the year 1000&quot;&lt;/a&gt;&lt;/strong&gt; (Activity: 1932): &lt;strong&gt;The image is a creative and humorous depiction of what a website might look like if it were designed in the year 1000, blending medieval themes with modern web design elements. Titled &quot;KingdomNet 1000,&quot; it features sections like proclamations, trade routes, and monastery scriptorium status, all styled with medieval motifs. The design cleverly integrates historical aesthetics with a digital interface, mimicking a modern website layout with navigation options such as &quot;Castle,&quot; &quot;Markets,&quot; and &quot;Guilds.&quot; This is a non-technical, artistic representation rather than a technical or factual depiction.&lt;/strong&gt; The comments highlight the impressive design quality, noting the lack of artifacts in the text and appreciating the creative concept of a medieval-themed website.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1szozpg/this_is_so_accurate/&quot;&gt;this is so accurate 😂&lt;/a&gt;&lt;/strong&gt; (Activity: 3752): &lt;strong&gt;The Reddit post humorously highlights the accuracy of AI models like &lt;strong&gt;Claude&lt;/strong&gt; and &lt;strong&gt;GPT&lt;/strong&gt; in mimicking human-like responses, particularly in scenarios where users become frustrated due to their own poorly constructed prompts. This reflects a common issue in AI-human interaction where the quality of AI output is heavily dependent on the clarity and accuracy of user input.&lt;/strong&gt; Commenters agree on the accuracy of the depiction, with one noting it as the best representation of GPT interactions, emphasizing the frustration users feel when their prompts lead to unsatisfactory AI responses.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1szkkro/cant_believe_that_chatgpt_has_such_indepth/&quot;&gt;Can’t believe that ChatGPT has such in-depth medical knowledge&lt;/a&gt;&lt;/strong&gt; (Activity: 9610): &lt;strong&gt;The image is a humorous meme that combines medical terminology with fictional elements from the Star Wars universe, specifically focusing on a fictional clinical guide for conducting a prostate examination on an Ewok. This playful depiction is not meant to be taken seriously and serves as a parody, highlighting the absurdity of applying real-world medical procedures to fictional creatures. The image is not technically significant and is intended for entertainment rather than educational purposes.&lt;/strong&gt; The comments do not provide any technical insights or debates, as they primarily consist of humorous reactions and additional memes related to the fictional context of the image.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1szyf91/imagine_a_real_photographer_taking_a_photo_when/&quot;&gt;Imagine a real photographer taking a photo when Columbus meets the natives.&lt;/a&gt;&lt;/strong&gt; (Activity: 656): &lt;strong&gt;The image is a creative depiction, not a factual representation, of Columbus&apos;s encounter with indigenous people, imagining what it might have looked like if a photographer had been present during Columbus&apos;s landing in the Americas. The scene includes period-appropriate costumes and props, such as flags and armor for Columbus&apos;s crew and traditional clothing for the indigenous people, set against a backdrop of ships and palm trees. This artistic interpretation serves as a visual storytelling piece rather than a source of historical accuracy or technical insight.&lt;/strong&gt; Commenters discussed the limits of early photographic technology and speculated about how modern cameras would have documented the event.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A discussion emerged about the technical challenges of capturing historical events with photography, focusing on the limitations of early photographic technology. The conversation highlighted the long exposure times required by early cameras, which would have made capturing dynamic scenes like Columbus meeting the natives difficult. Additionally, the lack of portable equipment and the need for chemical processing were noted as significant barriers to on-site historical photography.&lt;/li&gt;
&lt;li&gt;One commenter delved into the hypothetical scenario of using modern photographic technology in historical contexts. They speculated on the impact of high-resolution digital cameras and drones, which could provide comprehensive documentation from multiple angles. The discussion also touched on the potential for altering historical narratives through selective framing and editing, emphasizing the power of photography in shaping historical perception.&lt;/li&gt;
&lt;li&gt;The thread included a technical debate on the evolution of photographic techniques, comparing daguerreotypes with modern digital methods. Participants discussed the chemical processes involved in early photography, such as the use of silver halides, and contrasted these with the pixel-based sensors in digital cameras. The conversation underscored the dramatic improvements in image quality and accessibility over time.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1szvl0j/a_short_story_im_liking_the_new_image_generation/&quot;&gt;A short story. I&apos;m liking the new image generation.&lt;/a&gt;&lt;/strong&gt; (Activity: 624): &lt;strong&gt;The Reddit post discusses a new image generation feature, highlighting that while initial images appear photorealistic, subsequent images degrade in quality, becoming less realistic. A specific issue noted is a &apos;weird texture thing&apos; that occurs by the fourth image, suggesting a potential bug or limitation in the image generation algorithm.&lt;/strong&gt; Commenters express disappointment with the decreasing photorealism in generated images, indicating a need for improvement in the algorithm&apos;s consistency across multiple outputs.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A user noted a decline in photorealism with each subsequent image generated, suggesting a potential issue with the model&apos;s consistency or capability to maintain quality across a series of images. This could indicate a limitation in the model&apos;s ability to handle complex or evolving scenes over multiple iterations.&lt;/li&gt;
&lt;li&gt;Another user pointed out an error in the generated content where a newspaper in the image incorrectly states that June 14th, 2050 is a Thursday, when it is actually a Tuesday. This highlights a potential flaw in the AI&apos;s ability to accurately process and represent factual temporal information, which could be critical for applications requiring precise data representation.&lt;/li&gt;
&lt;li&gt;A comment speculated on the narrative implications of AI-generated content, suggesting that &apos;AI wars are started by companies to drive up interest and profit.&apos; This reflects a broader concern about the motivations behind AI development and deployment, particularly in how narratives are constructed and potentially manipulated by AI systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1szgxli/chatgpt_is_now_constantly_arguing_and_picking/&quot;&gt;ChatGPT is now constantly arguing and picking fights, what is going on?&lt;/a&gt;&lt;/strong&gt; (Activity: 1740): &lt;strong&gt;Users are reporting that &lt;strong&gt;ChatGPT&lt;/strong&gt; has started to frequently engage in argumentative behavior, using phrases like &quot;I&apos;m going to push back on that a bit&quot; and &quot;I&apos;d just be careful with one part of your thinking.&quot; This behavior includes making unsolicited arguments and challenging statements that users did not assert, which is causing frustration. The issue seems to involve the model&apos;s tendency to introduce counterarguments even when not necessary, potentially due to recent updates or changes in its conversational algorithms.&lt;/strong&gt; One user noted that ChatGPT argued against their expertise by referencing outdated studies, suggesting a flaw in its ability to prioritize recent and relevant information. This indicates a potential issue with the model&apos;s information retrieval or prioritization logic.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Able_Acadia2264 highlights a technical issue where ChatGPT argues against recent studies by quoting outdated research, which can undermine its credibility in specialized fields. This behavior suggests a potential flaw in the model&apos;s ability to prioritize newer, more relevant data over older sources, which could be critical for users relying on up-to-date information.&lt;/li&gt;
&lt;li&gt;hotel_air_freshener describes a scenario where ChatGPT appears to contradict itself by taking opposing stances in a conversation. This could indicate a problem with the model&apos;s consistency in maintaining a coherent argumentative position, which might confuse users seeking reliable dialogue.&lt;/li&gt;
&lt;li&gt;FujichromeProvia100F mentions the frequent appearance of warning symbols (&quot;⚠️&quot;) in interactions, which could imply that the model is overly cautious or frequently flags content as potentially problematic. This might affect user experience by creating a perception of excessive moderation or error-prone responses.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1syu3qr/ai_is_getting_too_realistic/&quot;&gt;Ai is getting too realistic&lt;/a&gt;&lt;/strong&gt; (Activity: 5710): &lt;strong&gt;The image in the post is a non-technical depiction of AI-generated imagery, showcasing how AI can create highly realistic scenes that mimic real-life photography. The focus is on the increasing capability of AI to produce lifelike images, as evidenced by the detailed urban scene and the realistic portrayal of a person in motion. This reflects advancements in AI image generation technologies, which are becoming more sophisticated in rendering complex environments and human figures with high fidelity.&lt;/strong&gt; One comment nostalgically recalls the early days of AI when it struggled with basic tasks, highlighting the rapid progress in AI capabilities. Another comment humorously references a common trope in movies, suggesting the AI-generated image evokes familiar cinematic imagery.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/SillyTavernAI/comments/1sztr62/the_directors_cut_freaky_frankenstein_4_max_and/&quot;&gt;The Director&apos;s Cut: Freaky Frankenstein 4 MAX and Freaky Frankenstein 4 BOLT [Presets] (Universal : DS, GLM, Claude, Gemini, Grok, Gemma, Qwen, MiMo) + DeepSeek V4 Compatibility. Hyper Dense Logic.&lt;/a&gt;&lt;/strong&gt; (Activity: 710): &lt;strong&gt;The post introduces the &lt;strong&gt;Director&apos;s Cut of the Freaky Frankenstein 4 Series&lt;/strong&gt;, featuring two presets: &lt;strong&gt;Freaky Frankenstein 4 MAX&lt;/strong&gt; and &lt;strong&gt;Freaky Frankenstein 4 BOLT&lt;/strong&gt;. These presets are designed for roleplaying with AI models like &lt;strong&gt;DS, GLM, Claude, Gemini, Grok, Gemma, Qwen, MiMo&lt;/strong&gt;, and are compatible with &lt;strong&gt;DeepSeek V4&lt;/strong&gt;. The MAX version focuses on high-quality, immersive roleplay with dense logic and XML tagging to enhance AI attention and reasoning, while the BOLT version prioritizes speed and minimalism by reducing logical constraints. Both presets include features like a &lt;strong&gt;VAD Emotion Engine&lt;/strong&gt; and &lt;strong&gt;Cinematography Engine&lt;/strong&gt; to enhance narrative and dialogue realism. The presets are compatible with multiple frontends, including the new &lt;strong&gt;MarinaraEngine&lt;/strong&gt;. Users are advised to adjust temperature settings and toggles for optimal performance, especially during high-demand periods when models may be dynamically quantized.&lt;/strong&gt; The comments reflect excitement and support for the new presets, with users expressing eagerness to try them out and appreciation for the updates and future plans shared in the Rentry link.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/SillyTavernAI/comments/1syt7kc/character_card_guide_1_how_to_write_character/&quot;&gt;Character Card Guide (1): How to Write Character Basics&lt;/a&gt;&lt;/strong&gt; (Activity: 260): &lt;strong&gt;The Reddit post provides a detailed guide on writing character cards for role-playing, emphasizing the separation of character basics from personality traits. It outlines a structured approach to defining a character&apos;s profile, appearance, backstory, and relationship with the user, stressing the importance of distinctive details over generic descriptors. The guide advises against mixing personality traits with basic information to prevent AI models from prematurely forming character impressions, which can lead to inconsistencies. It also highlights the need for concrete, specific details that help AI models maintain character continuity and avoid filler content.&lt;/strong&gt; One commenter noted that specific details, like a birthmark, can become overly emphasized by AI models, as they treat such details as significant traits. Another suggested including character goals and behaviors to reduce AI interpretation errors and improve consistency across models.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The comment by AiCodeDev highlights a technical issue with language models where specific physical details, like a birthmark, are treated as significant traits. This is because large language models are trained to emphasize concrete, sensory details as important elements for character continuity, which can lead to unintended emphasis in generated content.&lt;/li&gt;
&lt;li&gt;eternalityLP suggests enhancing character descriptions by including goals, wants, hobbies, and behavioral traits. This approach reduces the interpretative burden on language models, leading to more consistent character portrayal across different models and minimizing stereotypical or exaggerated behaviors.&lt;/li&gt;
&lt;li&gt;iraragorri argues against using tags like &apos;hair:&apos; or &apos;relationship:&apos; in character descriptions, as they consume tokens unnecessarily. Modern models, even smaller ones, can understand plain text descriptions effectively. The commenter also emphasizes that behavioral patterns should naturally stem from personality traits and that unnecessary details should be relegated to a lorebook.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3. Other notable frontier-model / infra posts&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1sz4h4g/engineering_teams_celebrating_agentic_workflows/&quot;&gt;engineering teams celebrating agentic workflows that returned the same result two runs in a row&lt;/a&gt;&lt;/strong&gt; (Activity: 863): &lt;strong&gt;The post humorously highlights the rarity of achieving consistent results in agentic workflows, which are typically characterized by variability due to their dynamic nature. The mention of &apos;engineering teams celebrating&apos; suggests a breakthrough or unexpected stability in these workflows, which are often used in AI and machine learning contexts to handle tasks autonomously. The term &apos;agentic&apos; refers to systems that can act independently, and achieving the same result twice in a row is noteworthy due to the inherent unpredictability of such systems.&lt;/strong&gt; The comments mix humor and empathy: users express surprise and amusement at the consistency, which they treat as a &apos;miracle&apos; given how unpredictable agentic workflows usually are.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/MachineLearning/comments/1szc05y/icml_2026_decision_d/&quot;&gt;ICML 2026 Decision [D]&lt;/a&gt;&lt;/strong&gt; (Activity: 1124): &lt;strong&gt;The post discusses the anticipation surrounding the upcoming publication of decisions for &lt;strong&gt;ICML 2026&lt;/strong&gt;. The community is eagerly awaiting updates, with many checking platforms like OpenReview frequently for the latest information. This reflects the high level of engagement and anxiety typical in the academic community during conference decision periods.&lt;/strong&gt; The comments humorously reflect the anxiety and anticipation of the community, with users expressing their compulsive checking of platforms like OpenReview, highlighting the emotional investment in the conference decision process.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1syuij0/when_youve_got_money_to_burn/&quot;&gt;When you&apos;ve got money to burn 😂&lt;/a&gt;&lt;/strong&gt; (Activity: 1764): &lt;strong&gt;The image is a meme depicting a humorous scenario where a man uses a blowtorch to light a cigar, symbolizing the excessive use of resources for a simple task. This is a metaphor for over-engineering or using complex solutions for straightforward problems, often seen in technical fields. The comments reflect a similar sentiment, discussing the inefficiency of using advanced tools for basic tasks, such as formatting text or performing simple web searches, and questioning the value of expensive technology if it cannot perform simple functions effectively.&lt;/strong&gt; The comments highlight a debate on the efficiency and practicality of using advanced technology for simple tasks, with users expressing skepticism about the value of expensive tools that fail to perform basic functions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;fsharpman reports that version 4.7 couldn&apos;t handle a simple task, which points to limits in the model&apos;s capabilities.&lt;/li&gt;
&lt;li&gt;bombero_kmn points out a typo in the README at line 137, which could indicate a lack of attention to detail in documentation. This might affect user experience, especially for those relying on accurate documentation for implementation or troubleshooting.&lt;/li&gt;
&lt;li&gt;MuttMundane questions the value proposition of expensive software, implying that high cost should correlate with high performance. This raises a broader discussion on the expectations of premium software and whether current offerings meet those expectations.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/aivideo/comments/1t0a8u0/futurama_live_action_cast/&quot;&gt;Futurama live action cast&lt;/a&gt;&lt;/strong&gt; (Activity: 530): &lt;strong&gt;The Reddit post discusses a hypothetical live-action cast for the animated series &lt;strong&gt;Futurama&lt;/strong&gt;. A key technical critique is the choice of actors, particularly the exclusion of &lt;strong&gt;Katey Sagal&lt;/strong&gt; as Leela, which is seen as a misstep given her iconic voice role in the original series. Additionally, there are technical issues with the video&apos;s audio mixing, specifically that the music volume is too high, making it difficult to hear the dialogue.&lt;/strong&gt; Commenters express dissatisfaction with the casting choices, suggesting that many of the selected actors do not fit the characters well. This reflects a broader debate on the challenges of translating animated characters to live-action while maintaining the essence of the original performances.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/aivideo/comments/1szrz9f/cats_imitating_the_gunshot_death_poses_of/&quot;&gt;Cats imitating the gunshot death poses of characters in movies and TV shows from different countries&lt;/a&gt;&lt;/strong&gt; (Activity: 696): &lt;strong&gt;The Reddit post humorously depicts cats mimicking dramatic death scenes from movies and TV shows across various countries, suggesting a cultural commentary on how different regions portray such scenes. The post likely uses AI-generated content, as one commenter notes a similar concept was seen on TikTok, implying potential AI training data sources. The Korean depiction is highlighted for its exaggerated length, spanning &apos;3 whole episodes about the shooting, ambulance and recovery.&apos;&lt;/strong&gt; Commenters discuss the potential influence of existing social media content on AI-generated media, suggesting that AI might be trained on popular cultural memes or jokes. The Korean portrayal is noted for its dramatic and extended narrative style, reflecting cultural storytelling differences.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/aivideo/comments/1szc5ma/my_medieval_sitcom_is_really_coming_together/&quot;&gt;My medieval sitcom is really coming together&lt;/a&gt;&lt;/strong&gt; (Activity: 1970): &lt;strong&gt;The Reddit post discusses the development of a medieval-themed sitcom, likely set in the 1470s, as inferred from a comment. The sitcom includes period-appropriate elements such as a &apos;lute jingle,&apos; which suggests attention to historical detail in the show&apos;s production. The post does not provide specific technical details about the production process, such as filming techniques or scriptwriting, but the mention of a &apos;lute jingle&apos; indicates a focus on authentic sound design.&lt;/strong&gt; The comments reflect a positive reception, with one user appreciating the &apos;cute&apos; nature of the show and another enjoying the &apos;lute jingle,&apos; suggesting that the show&apos;s historical elements are well-received by the audience.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/aivideo/comments/1szcxsu/wazzup/&quot;&gt;Wazzup!&lt;/a&gt;&lt;/strong&gt; (Activity: 1239): &lt;strong&gt;The post titled &apos;Wazzup!&apos; appears to be a casual or humorous entry, as indicated by the comments and the presence of a GIF. The content is a video hosted on Reddit that could not be summarized, because access requires a login or a developer token. For more information, see the original &lt;a href=&quot;https://v.redd.it/vfc6pka9b7yg1&quot;&gt;Reddit link&lt;/a&gt;.&lt;/strong&gt; The comments do not provide any technical insights or debates, focusing instead on the entertainment value of the content.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;AI Discords&lt;/h1&gt;
&lt;p&gt;Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.&lt;/p&gt;
</content:encoded><category>openai</category><category>langchain</category><category>baseten</category><category>ollama</category><category>openrouter</category><category>gpt-5.2-codex</category><category>gpt-5.3-codex</category><category>anthony_maio</category><category>mason_drxy</category><category>hwchase17</category><category>sydneyrunkle</category><category>naroh</category><category>teknuim</category><category>vtrivedy</category><category>dbreunig</category><category>zachtratar</category><category>theo</category><category>petergostev</category><category>cheatyyyy</category><category>agent-orchestration</category><category>context-pipelines</category><category>coding-agents</category><category>pricing-models</category><category>multi-agent-systems</category><category>workflow-optimization</category><category>model-agnostic-orchestration</category><category>prompt-engineering</category><category>memory-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-05-01-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-05-01-not-much/</guid><description>**xAI released Grok 4.3**, improving cost/performance with a **53 Intelligence Index score**, 4 points higher than Grok 4.20, and significant gains on **GDPval-AA** and **τ²-Bench Telecom**. However, accuracy tradeoffs raised reliability concerns. Community opinions are mixed, with some praising token-efficiency and others noting regressions and pricing concerns. **DeepSeek V4 Pro** emerges as a leading open-weight coding/agent model, comparable to **Codex** and **Claude Code**, featuring a 1M context window and efficient attention mechanisms. Benchmarking shows open-weight models like **Kimi K2.6**, **MiMo V2.5 Pro**, and **DeepSeek V4 Pro** closing the gap with closed models such as **Gemini 3.1 Pro Preview**, **Claude Opus 4.7**, and **GPT-5.5**. 
DeepSeek&apos;s multimodal efforts focus on explicit spatial grounding with a novel &quot;point while thinking&quot; approach using **DeepSeek-ViT** and CSA compression.</description><pubDate>Fri, 01 May 2026 05:44:39 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;a quiet day.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI News for 4/30/2026-5/1/2026. We checked 12 subreddits, &lt;a href=&quot;https://twitter.com/i/lists/1585430245762441216&quot;&gt;544 Twitters&lt;/a&gt; and no further Discords. &lt;a href=&quot;https://news.smol.ai/&quot;&gt;AINews&apos; website&lt;/a&gt; lets you search all past issues. As a reminder, &lt;a href=&quot;https://www.latent.space/p/2026&quot;&gt;AINews is now a section of Latent Space&lt;/a&gt;. You can &lt;a href=&quot;https://support.substack.com/hc/en-us/articles/8914938285204-How-do-I-subscribe-to-or-unsubscribe-from-a-section-on-Substack&quot;&gt;opt in/out&lt;/a&gt; of email frequencies!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h1&gt;AI Twitter Recap&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Grok 4.3’s Release, Benchmark Deltas, and the Open-vs-Closed Frontier&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;xAI shipped Grok 4.3 with materially better cost/performance, but mixed eval reception&lt;/strong&gt;: Early chatter flagged an imminent API launch from &lt;a href=&quot;https://x.com/scaling01/status/2049947798825529468&quot;&gt;@scaling01&lt;/a&gt;, followed by a detailed benchmark breakdown from &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2049987001655714250&quot;&gt;Artificial Analysis&lt;/a&gt;. On their &lt;strong&gt;Intelligence Index&lt;/strong&gt;, &lt;strong&gt;Grok 4.3 scores 53&lt;/strong&gt;, up &lt;strong&gt;4 points&lt;/strong&gt; over Grok 4.20, with roughly &lt;strong&gt;40% lower input&lt;/strong&gt; and &lt;strong&gt;60% lower output pricing&lt;/strong&gt;. The biggest gain was on &lt;strong&gt;GDPval-AA&lt;/strong&gt;, up &lt;strong&gt;321 Elo&lt;/strong&gt; to &lt;strong&gt;1500&lt;/strong&gt;, suggesting stronger real-world agentic task performance. It also hit &lt;strong&gt;98% on τ²-Bench Telecom&lt;/strong&gt; and held &lt;strong&gt;81% on IFBench&lt;/strong&gt;. The tradeoff: &lt;strong&gt;AA-Omniscience accuracy rose while non-hallucination dropped by 8 points&lt;/strong&gt;, leaving concerns about reliability despite stronger capability. Arena has already added it across text, vision, document, and code modes via &lt;a href=&quot;https://x.com/arena/status/2049992557527187794&quot;&gt;@arena&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Community reaction was split between “meaningful iteration” and “still behind top open models”&lt;/strong&gt;: Several posts argued Grok is improving faster than critics admit, including &lt;a href=&quot;https://x.com/teortaxesTex/status/2049986350783283532&quot;&gt;@teortaxesTex&lt;/a&gt;, who noted token-efficiency gains as well, while others were more skeptical. &lt;a href=&quot;https://x.com/scaling01/status/2049984249147666876&quot;&gt;@scaling01&lt;/a&gt; claimed &lt;strong&gt;“Grok-4.3 still behind chinese open-source”&lt;/strong&gt;, and &lt;a href=&quot;https://x.com/andonlabs/status/2050056965460734325&quot;&gt;Andon Labs&lt;/a&gt; reported a &lt;strong&gt;major regression on Vending-Bench 2&lt;/strong&gt;, where Grok allegedly preferred to “sleep” rather than act. The more structural critique came from pricing and infra economics: &lt;a href=&quot;https://x.com/teortaxesTex/status/2050043500985557120&quot;&gt;@teortaxesTex&lt;/a&gt; argued Grok’s low prices may be subsidized by poor hardware utilization and that &lt;strong&gt;cache economics&lt;/strong&gt;, not only model quality, increasingly determine agentic TCO.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;DeepSeek V4 Pro, Vision/Spatial Reasoning, and Open-Weights Closing the Gap&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;DeepSeek V4 Pro appears to be the most credible open-weight coding/agent model in this batch&lt;/strong&gt;: The strongest hands-on report came from &lt;a href=&quot;https://x.com/omarsar0/status/2050009901234282649&quot;&gt;@omarsar0&lt;/a&gt;, who tested &lt;strong&gt;DeepSeek-V4-Pro&lt;/strong&gt; inside the &lt;strong&gt;Pi coding agent&lt;/strong&gt; and described it as the first open-weight model that genuinely feels comparable to &lt;strong&gt;Codex or Claude Code&lt;/strong&gt; for multi-turn agentic coding. Key systems details included &lt;strong&gt;1M context&lt;/strong&gt;, a hybrid &lt;strong&gt;CSA/HCA attention design&lt;/strong&gt;, &lt;strong&gt;KV cache reduced to 10%&lt;/strong&gt;, and nearly &lt;strong&gt;4x lower inference FLOPs&lt;/strong&gt; at long context. The report also emphasized practical harness fit: no custom setup, stable traces, and viable multi-step research/coding loops on Fireworks inference.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The broader benchmark picture confirms open weights are now much closer, though still behind on hardest tasks&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2050096370200281539&quot;&gt;Artificial Analysis&lt;/a&gt; noted that the three leading open-weight models released last week—&lt;strong&gt;Kimi K2.6&lt;/strong&gt;, &lt;strong&gt;MiMo V2.5 Pro&lt;/strong&gt;, and &lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt;—now score &lt;strong&gt;52–54&lt;/strong&gt; on the Intelligence Index, versus &lt;strong&gt;57&lt;/strong&gt; for &lt;strong&gt;Gemini 3.1 Pro Preview&lt;/strong&gt; and &lt;strong&gt;Claude Opus 4.7&lt;/strong&gt;, and &lt;strong&gt;60&lt;/strong&gt; for &lt;strong&gt;GPT-5.5&lt;/strong&gt;. These top open models are all &lt;strong&gt;trillion-plus MoE systems&lt;/strong&gt; with permissive licenses: Kimi at &lt;strong&gt;1T/32B active&lt;/strong&gt;, MiMo at &lt;strong&gt;1T/42B active&lt;/strong&gt;, and DeepSeek V4 Pro at &lt;strong&gt;1.6T/49B active&lt;/strong&gt;. The remaining gap is concentrated in &lt;strong&gt;HLE&lt;/strong&gt;, &lt;strong&gt;CritPt&lt;/strong&gt;, &lt;strong&gt;TerminalBench Hard&lt;/strong&gt;, and hallucination-heavy &lt;strong&gt;Omniscience&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DeepSeek’s multimodal direction seems centered on explicit spatial grounding&lt;/strong&gt;: Speculation about &lt;strong&gt;DeepSeek-Vision&lt;/strong&gt; outperforming V4-Pro on &lt;strong&gt;ARC-AGI-2&lt;/strong&gt; because of actual spatial reasoning came from &lt;a href=&quot;https://x.com/teortaxesTex/status/2049947128189923625&quot;&gt;@teortaxesTex&lt;/a&gt;. A later summary of a briefly posted-and-deleted tech report from &lt;a href=&quot;https://x.com/ZhihuFrontier/status/2050238000433659958&quot;&gt;ZhihuFrontier&lt;/a&gt; described a multimodal CoT system that can &lt;strong&gt;“point while thinking”&lt;/strong&gt; using boxes and points embedded directly into reasoning traces to reduce the “reference gap” in counting, maze solving, and path tracing. The stack reportedly uses &lt;strong&gt;DeepSeek-ViT&lt;/strong&gt;, &lt;strong&gt;CSA compression&lt;/strong&gt;, and &lt;strong&gt;V4-Flash (284B total / 13B active)&lt;/strong&gt;. Even if early tests still show weaknesses, it is a notable architectural bet: turning visual reasoning into explicit grounded computation rather than plain text description.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Codex’s Rapid Product Expansion vs Claude Code, Devin, and Other Agent Runtimes&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Codex is winning on product velocity and UX polish, not just base model quality&lt;/strong&gt;: A major theme across tweets was how quickly the &lt;strong&gt;Codex app&lt;/strong&gt; is improving. High-engagement praise came from &lt;a href=&quot;https://x.com/gdb/status/2049971410479796521&quot;&gt;@gdb&lt;/a&gt;, &lt;a href=&quot;https://x.com/theo/status/2049994645531451874&quot;&gt;@theo&lt;/a&gt;, and others comparing its feel favorably to alternatives. OpenAI added a &lt;strong&gt;device toolbar&lt;/strong&gt; for responsive testing and improved browser-use speed by ~&lt;strong&gt;30%&lt;/strong&gt; in “vibe testing,” per &lt;a href=&quot;https://x.com/JamesZmSun/status/2050050523794165816&quot;&gt;@JamesZmSun&lt;/a&gt;. It also added &lt;strong&gt;CI status in chat&lt;/strong&gt; via &lt;a href=&quot;https://x.com/reach_vb/status/2050194266505277902&quot;&gt;@reach_vb&lt;/a&gt;, &lt;strong&gt;migration/import tooling&lt;/strong&gt; for settings/plugins/agents via &lt;a href=&quot;https://x.com/OpenAI/status/2050290618187055175&quot;&gt;OpenAI&lt;/a&gt;, and a surprisingly viral &lt;strong&gt;pets&lt;/strong&gt; system in Codex via &lt;a href=&quot;https://x.com/OpenAIDevs/status/2050275713824211041&quot;&gt;@OpenAIDevs&lt;/a&gt;. While whimsical, the repeated point from users was that OpenAI is shipping a cohesive environment, not just a model endpoint.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex vs Claude Code is increasingly framed as UX + speed + taste tradeoffs&lt;/strong&gt;: &lt;a href=&quot;https://x.com/theo/status/2049994645531451874&quot;&gt;@theo&lt;/a&gt; summarized the current frontier coding vibe: &lt;strong&gt;GPT-5.5 is “smarter and can unblock you,” while Opus 4.7 has better intent/taste but can wander&lt;/strong&gt;. In a second post, he argued Claude Code feels much slower on TTFT/TPS and requires more tool calls, while GPT/Codex feels more direct and economical for “fast mode” style use (&lt;a href=&quot;https://x.com/theo/status/2050025533950587075&quot;&gt;tweet&lt;/a&gt;). Still, public benchmark comparisons are mixed: &lt;a href=&quot;https://x.com/scaling01/status/2050289320699818417&quot;&gt;@scaling01&lt;/a&gt; said &lt;strong&gt;GPT-5.5 did not beat Opus 4.7 on PostTrainBench in the Claude Code harness&lt;/strong&gt;, highlighting how much results remain harness-dependent.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Other agent runtimes are converging on similar primitives&lt;/strong&gt;: &lt;strong&gt;Devin&lt;/strong&gt; launched “inside your shell” hotkey access via &lt;a href=&quot;https://x.com/cognition/status/2050268727997022498&quot;&gt;@cognition&lt;/a&gt;. &lt;strong&gt;Hermes&lt;/strong&gt; added a &lt;code&gt;/goal&lt;/code&gt; loop with a supervisor model forcing the agent to continue until completion, via &lt;a href=&quot;https://x.com/Teknium/status/2050098631907434871&quot;&gt;@Teknium&lt;/a&gt;. &lt;strong&gt;Flue&lt;/strong&gt;, introduced by &lt;a href=&quot;https://x.com/FredKSchott/status/2050274923852210397&quot;&gt;@FredKSchott&lt;/a&gt;, positions itself as a TypeScript framework for headless autonomous agents, “like Claude Code but programmable.” The common pattern across these launches is that the competitive surface is moving from raw model IQ to &lt;strong&gt;agent harness design&lt;/strong&gt;: subagents, browser-use, durable state, compaction, skills, and feedback loops.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Agent Infrastructure: Retrieval, Memory, HITL, and Durable Execution&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The strongest research signal was that agent systems are bottlenecked by runtime design, not just model quality&lt;/strong&gt;: Two especially useful papers were highlighted. First, &lt;strong&gt;ReaLM-Retrieve&lt;/strong&gt;, summarized by &lt;a href=&quot;https://x.com/omarsar0/status/2049954716298494386&quot;&gt;@omarsar0&lt;/a&gt;, argues that reasoning models need retrieval during inference rather than only before it. It reports &lt;strong&gt;+10.1% absolute F1&lt;/strong&gt; over standard RAG and &lt;strong&gt;47% fewer retrieval calls&lt;/strong&gt; than fixed-interval IRCoT, with &lt;strong&gt;3.2x lower per-retrieval overhead&lt;/strong&gt;. Second, &lt;strong&gt;OCR-Memory&lt;/strong&gt;, shared by &lt;a href=&quot;https://x.com/dair_ai/status/2049957482811056307&quot;&gt;@dair_ai&lt;/a&gt;, stores long-horizon trajectories as images with indexed anchors, retrieving exact prior content instead of lossy text summaries; it reports SOTA on &lt;strong&gt;Mind2Web&lt;/strong&gt; and &lt;strong&gt;AppWorld&lt;/strong&gt; under strict context limits.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LangChain/LangGraph pushed hard on production primitives for multi-user and human-in-the-loop agents&lt;/strong&gt;: &lt;a href=&quot;https://x.com/sydneyrunkle/status/2049956826670911809&quot;&gt;@sydneyrunkle&lt;/a&gt; outlined three concrete multi-user deployment concerns—&lt;strong&gt;data isolation&lt;/strong&gt;, &lt;strong&gt;delegated credentials&lt;/strong&gt;, and &lt;strong&gt;operator RBAC&lt;/strong&gt;—and mapped each to LangSmith Agent Server features. Later posts covered a new HITL mode where a human reply can be returned directly as a tool result (&lt;a href=&quot;https://x.com/sydneyrunkle/status/2050181039406858371&quot;&gt;tweet&lt;/a&gt;) and durable pause/resume semantics for consequential actions or unresolved judgment calls (&lt;a href=&quot;https://x.com/sydneyrunkle/status/2050195081995407429&quot;&gt;tweet&lt;/a&gt;). This is a good snapshot of where real deployment complexity is moving: auth boundaries, persistent state, and explicit intervention points.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Durable execution is becoming a first-class runtime feature across stacks&lt;/strong&gt;: Cloudflare announced &lt;strong&gt;Dynamic Workflows&lt;/strong&gt; for adding durable execution to agent plans via &lt;a href=&quot;https://x.com/celso/status/2050211184129786084&quot;&gt;@celso&lt;/a&gt;. LangChain positioned &lt;code&gt;create_agent&lt;/code&gt; as the low-level primitive beneath Deep Agents, with extensibility for filesystems, bash, compaction, hooks, and subagents via &lt;a href=&quot;https://x.com/Vtrivedy10/status/2050239109038232005&quot;&gt;@Vtrivedy10&lt;/a&gt;. The meta-point is consistent with one linked technical blog: the &lt;strong&gt;agent runtime itself&lt;/strong&gt;—sandboxing, replay, checkpointing, orchestration—has become hidden technical debt and a major source of differentiation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Research and Systems Papers Worth Bookmarking&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Recursive / latent-space multi-agent coordination is emerging as a serious alternative to text-only agent chatter&lt;/strong&gt;: &lt;a href=&quot;https://x.com/omarsar0/status/2050261229315477988&quot;&gt;@omarsar0&lt;/a&gt; summarized &lt;strong&gt;Recursive Multi-Agent Systems&lt;/strong&gt;, where agents communicate through &lt;strong&gt;shared latent recursive computation&lt;/strong&gt; instead of full natural-language exchanges. Reported gains: &lt;strong&gt;8.3% average accuracy improvement&lt;/strong&gt;, &lt;strong&gt;1.2x–2.4x end-to-end speedup&lt;/strong&gt;, and &lt;strong&gt;34.6%–75.6% token reduction&lt;/strong&gt; across nine benchmarks. If agent-to-agent communication cost becomes dominant, this line of work matters.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Meta FAIR’s “self-improving pretraining” idea may be one of the more consequential training-time papers in the batch&lt;/strong&gt;: &lt;a href=&quot;https://x.com/omarsar0/status/2050213732970848664&quot;&gt;@omarsar0&lt;/a&gt; highlighted a method where a strong post-trained model rewrites pretraining suffixes toward safer, higher-quality continuations and then judges model rollouts during RL-style pretraining. Reported improvements include &lt;strong&gt;36.2% relative gain in factuality&lt;/strong&gt;, &lt;strong&gt;18.5% in safety&lt;/strong&gt;, and up to &lt;strong&gt;86.3% win rate&lt;/strong&gt; in generation quality over standard pretraining.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Microsoft’s synthetic long-horizon computer-use worlds look like a credible data recipe&lt;/strong&gt;: &lt;a href=&quot;https://x.com/dair_ai/status/2050263752147456238&quot;&gt;@dair_ai&lt;/a&gt; described a system that creates &lt;strong&gt;1,000 synthetic computers&lt;/strong&gt; with realistic files and documents, then runs &lt;strong&gt;8-hour agent simulations&lt;/strong&gt; averaging &lt;strong&gt;2,000+ turns&lt;/strong&gt;. The thesis is straightforward and important: for computer-use agents, the bottleneck is no longer only model capability but &lt;strong&gt;scalable, realistic experiential data&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Top tweets (by engagement)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenAI/Codex momentum&lt;/strong&gt;: &lt;a href=&quot;https://x.com/OpenAI/status/2050250926888468929&quot;&gt;OpenAI says GPT-5.5 is its strongest launch yet, with API revenue growing 2x faster than prior releases and Codex doubling revenue in under seven days&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Defense/government adoption&lt;/strong&gt;: &lt;a href=&quot;https://x.com/DoWCTO/status/2050175912134561977&quot;&gt;The U.S. “Department of War” CTO announced agreements with seven frontier AI and infrastructure companies to deploy capabilities on classified networks&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenAI messaging pivot on labor&lt;/strong&gt;: &lt;a href=&quot;https://x.com/sama/status/2050229058425045178&quot;&gt;Sam Altman: “we want to build tools to augment and elevate people, not entities to replace them”&lt;/a&gt;, with follow-up comments on jobs and future work &lt;a href=&quot;https://x.com/sama/status/2050229059507159242&quot;&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex adoption and delight&lt;/strong&gt;: &lt;a href=&quot;https://x.com/gdb/status/2049971410479796521&quot;&gt;“codex app becoming incredible” from @gdb&lt;/a&gt;, plus &lt;a href=&quot;https://x.com/OpenAIDevs/status/2050275713824211041&quot;&gt;Codex pets&lt;/a&gt; unexpectedly becoming one of the day’s biggest product-engagement hits.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model benchmarking reality check&lt;/strong&gt;: &lt;a href=&quot;https://x.com/arcprize/status/2050261221165989969&quot;&gt;ARC Prize reports GPT-5.5 at 0.43% and Opus 4.7 at 0.18% on ARC-AGI-3, with analysis of failure modes&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;AI Reddit Recap&lt;/h1&gt;
&lt;h2&gt;/r/LocalLlama + /r/localLLM Recap&lt;/h2&gt;
&lt;h3&gt;1. Qwen Model Developments and Benchmarks&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t0vp3w/pflash_10x_prefill_speedup_over_llamacpp_at_128k/&quot;&gt;PFlash: 10x prefill speedup over llama.cpp at 128K on a RTX 3090&lt;/a&gt;&lt;/strong&gt; (Activity: 339): &lt;strong&gt;The post introduces &lt;strong&gt;PFlash&lt;/strong&gt;, a speculative prefill technique for long-context decoding on quantized 27B targets using C++/CUDA, achieving a &lt;code&gt;10x&lt;/code&gt; speedup over vanilla llama.cpp on an RTX 3090. This method leverages a small drafter model to score token importance, allowing the main model to focus only on significant spans, thus reducing prefill time significantly. The implementation combines insights from recent papers on speculative prefill and block-sparse attention, and is executed entirely in C++/CUDA without Python or PyTorch, making it efficient for consumer-grade GPUs like the RTX 3090. The repository is available on &lt;a href=&quot;https://github.com/Luce-Org/lucebox-hub&quot;&gt;GitHub&lt;/a&gt;.&lt;/strong&gt; Some commenters express skepticism about the claimed &lt;code&gt;10x&lt;/code&gt; speedup, with one noting the approach as potentially &apos;super lossy&apos; due to its compression method. Another user reports out-of-memory issues on a 4090, indicating potential challenges in replicating the results.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;randomfoo2 highlights a novel approach in PFlash that involves using a smaller Qwen3-0.6B drafter to process the full 64K/128K prompt with FlashPrefill/BSA-style sparse attention, which reduces the computational cost. The drafter evaluates token/span importance, retaining only a crucial subset for the 27B target model to prefill, followed by speculative decoding using DFlash+DDTree on the compressed target KV. This method is noted for being &apos;super lossy,&apos; indicating potential trade-offs in accuracy for speed.&lt;/li&gt;
&lt;li&gt;qwen_next_gguf_when raises concerns about the practicality of the PFlash method, noting that the DFlash component tends to run out of memory (OOM) on an RTX 4090. This suggests potential limitations in hardware compatibility or efficiency, which could impact the method&apos;s replicability and scalability across different systems.&lt;/li&gt;
&lt;li&gt;Obvious-Ad-2454 expresses skepticism about the claimed 10x speedup, suggesting it might be too optimistic without independent verification. This comment underscores the importance of replication studies to validate performance claims in machine learning, especially when such significant improvements are reported.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t0epei/qwen_36_27b_vs_gemma_4_31b_making_packman_game/&quot;&gt;Qwen 3.6 27B vs Gemma 4 31B - making Packman game!&lt;/a&gt;&lt;/strong&gt; (Activity: 994): &lt;strong&gt;In a local LLM gamedev contest, &lt;strong&gt;Gemma 4 31B&lt;/strong&gt; outperformed &lt;strong&gt;Qwen 3.6 27B&lt;/strong&gt; in creating a Pac-Man style game on a MacBook Pro M5 Max with 64GB RAM. Gemma ran at &lt;code&gt;27 tokens/sec&lt;/code&gt; and completed the task in &lt;code&gt;3m 51s&lt;/code&gt; with &lt;code&gt;6,209 tokens&lt;/code&gt;, while Qwen ran at &lt;code&gt;32 tokens/sec&lt;/code&gt; but took &lt;code&gt;18m 04s&lt;/code&gt; and &lt;code&gt;33,946 tokens&lt;/code&gt;. Although Qwen&apos;s output was more creative and visually styled, Gemma&apos;s solution was shorter, clearer, and more logical, excelling in game logic, interaction handling, and performance stability. The task required generating a complete HTML-based game with procedural graphics and no external libraries, focusing on smooth gameplay and stable performance using &lt;code&gt;requestAnimationFrame&lt;/code&gt; and delta time for animations.&lt;/strong&gt; Commenters noted the humor in the prompt&apos;s demand for &apos;no bugs&apos; and questioned the utility of vague prompts, suggesting they primarily test a model&apos;s pre-existing knowledge rather than its problem-solving ability.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Qwen 3.6 27B was tasked with creating a Pac-Man clone using a single HTML page and any libraries or graphics sources it deemed necessary. Interestingly, the model did not perform any external downloads or research, instead relying on its pre-existing knowledge to code the game. This highlights the model&apos;s ability to generate functional code from minimal prompts, though it raises questions about the depth of its understanding and adaptability to new resources.&lt;/li&gt;
&lt;li&gt;A user pointed out that the ghost enemy movement in the Gemma 4 31B version of the Pac-Man game appears to be malfunctioning. This suggests potential issues with the model&apos;s ability to accurately implement game logic, particularly in handling dynamic elements like enemy AI, which is crucial for a game like Pac-Man.&lt;/li&gt;
&lt;li&gt;The discussion raises concerns about the utility of using vague prompts for testing AI models, as noted by a commenter who described such prompts as &quot;benchmaxxing tests.&quot; This implies that the tests may not effectively evaluate the model&apos;s problem-solving capabilities or its ability to adapt to new tasks, but rather assess its pre-existing knowledge base.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1szrbub/qwenscope_official_sparse_autoencoders_saes_for/&quot;&gt;Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models&lt;/a&gt;&lt;/strong&gt; (Activity: 437): &lt;strong&gt;The &lt;strong&gt;Qwen Team&lt;/strong&gt; has released &lt;strong&gt;Qwen-Scope&lt;/strong&gt;, a set of Sparse Autoencoders (SAEs) for the Qwen 3.5 models, ranging from &lt;code&gt;2B&lt;/code&gt; to &lt;code&gt;35B&lt;/code&gt; MoE. This tool maps internal features across all layers, functioning as a dictionary of the model&apos;s internal concepts, allowing for precise manipulation of features such as &apos;legal talk&apos; or &apos;Python code&apos;. Key functionalities include &lt;strong&gt;Surgical Abliteration&lt;/strong&gt; to suppress specific features, &lt;strong&gt;Feature Steering&lt;/strong&gt; to activate desired concepts, &lt;strong&gt;Model Debugging&lt;/strong&gt; to identify token-triggered directions, and &lt;strong&gt;Dataset Analysis&lt;/strong&gt; to verify feature activation. The tool is released under the &lt;strong&gt;Apache 2.0 license&lt;/strong&gt; but with a caution against removing safety filters. A practical example includes diagnosing unexpected language switches using a heatmap to identify over-activated features. More details can be found in the &lt;a href=&quot;https://qianwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwen_Scope.pdf&quot;&gt;Qwen-Scope paper&lt;/a&gt; and the &lt;a href=&quot;https://hf.co/spaces/Qwen/QwenScope&quot;&gt;Hugging Face Space&lt;/a&gt;.&lt;/strong&gt; Commenters highlight the significance of this release, noting it as potentially the largest open-source interpretability tool for dense models, surpassing Google&apos;s GemmaScope in scale. There is anticipation for future iterations, such as Qwen 3.6, to incorporate similar tools.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;NandaVegg highlights the significance of the release of Sparse Autoencoders (SAEs) for the dense 27B Qwen model, noting it as potentially the largest open-source interpretability tool to date. This is in contrast to previous tools like GemmaScope, which only supported smaller models such as 9B and 2B, indicating a substantial advancement in model interpretability capabilities.&lt;/li&gt;
&lt;li&gt;robert896r1 expresses anticipation for the release of Qwen 3.6 or community-driven adaptations of the current tools for newer iterations. This reflects a common trend in the AI community where tools and models are rapidly iterated upon, and there is a need for compatibility with the latest versions to maintain relevance and utility.&lt;/li&gt;
&lt;li&gt;oxygen_addiction speculates on the use of feature steering in large AI models, such as ChatGPT5, suggesting that advanced routing mechanisms could be employed to select the most appropriate model for a given prompt. This points to a potential future where AI systems dynamically optimize their responses by leveraging multiple models and interpretability tools.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1szp96f/qwen3627bq6_k_images/&quot;&gt;Qwen3.6-27B-Q6_K - images&lt;/a&gt;&lt;/strong&gt; (Activity: 388): &lt;strong&gt;The post discusses the use of the &lt;strong&gt;Qwen3.6-27B-Q6_K&lt;/strong&gt; model to generate SVG images based on creative prompts, such as a pelican riding a bicycle and a Victorian-era robot reading a newspaper. The model&apos;s performance is measured in terms of time and throughput, with times ranging from &lt;code&gt;3min 10s&lt;/code&gt; to &lt;code&gt;8min 24s&lt;/code&gt; and throughput around &lt;code&gt;27 t/s&lt;/code&gt;. The images were generated using the &lt;strong&gt;Open Visual&lt;/strong&gt; tool in &lt;strong&gt;Open WebUI&lt;/strong&gt; (&lt;a href=&quot;https://github.com/ullahsamee/open-visual&quot;&gt;GitHub link&lt;/a&gt;). The post lacks specific hardware or framework details, which are crucial for evaluating the performance metrics provided.&lt;/strong&gt; One commenter noted the absence of hardware and framework details, which are essential for interpreting the performance statistics. Another comment humorously appreciated the whimsical nature of the generated images, likening them to early 2000s email forwards.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ZealousidealBadger47 reports &lt;code&gt;10.71 tokens per second&lt;/code&gt; with the Qwen3.5-122B-A10B IQ4_XS quant, a throughput data point to set beside the roughly &lt;code&gt;27 t/s&lt;/code&gt; reported in the post.&lt;/li&gt;
&lt;li&gt;Ok-Importance-3529 mentions using an AutoRound quant, Qwen3.6-27B-Q2_K_MIXED.gguf, and links to a &lt;a href=&quot;https://huggingface.co/sphaela/Qwen3.6-27B-AutoRound-GGUF&quot;&gt;Hugging Face repository&lt;/a&gt;. Aggressive quantization like this trades some quality for a smaller memory footprint, which matters on resource-constrained hardware.&lt;/li&gt;
&lt;li&gt;balerion20 points out that performance numbers are hard to interpret without the hardware specifications, context size, and framework used, since all three strongly affect generation speed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1szajgm/devs_using_qwen_27b_seriously_whats_your_take/&quot;&gt;Devs using Qwen 27B seriously, what&apos;s your take?&lt;/a&gt;&lt;/strong&gt; (Activity: 785): &lt;strong&gt;&lt;strong&gt;Qwen 27B&lt;/strong&gt;, a large language model, is being evaluated by developers for its coding capabilities, akin to &lt;strong&gt;Codex&lt;/strong&gt;. Users report it as &apos;solid&apos; but not consistently outperforming models like &lt;strong&gt;GPT-5.5&lt;/strong&gt;. A user shared a &lt;a href=&quot;https://github.com/knoopx/pi/commit/0a31b9ac241ea4949e8403cf02473b01e7911f1b&quot;&gt;GitHub commit&lt;/a&gt; showcasing Qwen 27B&apos;s ability to refactor code effectively, though they wish for faster processing speeds (&lt;code&gt;~120 tokens/second&lt;/code&gt;). Another user successfully runs &lt;strong&gt;Qwen 27B&lt;/strong&gt; on &lt;strong&gt;llama.cpp&lt;/strong&gt; with &lt;strong&gt;pi&lt;/strong&gt;, noting it could substitute &lt;strong&gt;Claude Code&lt;/strong&gt; if tasks are broken down and documentation access is provided to mitigate knowledge gaps.&lt;/strong&gt; Some users feel Qwen 27B is &apos;good enough&apos; for their needs, while others note it lacks a certain &apos;extra something&apos; compared to other models. The need for task breakdown and documentation access is seen as both a limitation and a learning opportunity.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unlucky-Message8866 highlights the practical utility of Qwen 27B for code refactoring, specifically mentioning its ability to handle ESLint errors effectively. However, they express a desire for improved processing speed, ideally around &lt;code&gt;120 tokens per second&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;itroot discusses using Qwen 27B with llama.cpp and compares it to Claude Code, noting that while Qwen 27B requires more task breakdown and has knowledge gaps, it can perform similarly if supplemented with documentation access or cloud model assistance.&lt;/li&gt;
&lt;li&gt;formlessglowie shares a detailed experience of optimizing Qwen 27B&apos;s performance using vLLM and MTP speculative decoding, achieving &lt;code&gt;50+ tokens per second&lt;/code&gt; with INT4 in a &lt;code&gt;262k FP8 context&lt;/code&gt;. They compare it favorably to past state-of-the-art models like Sonnet 3.7 and Gemini 2.5 Pro, emphasizing its modern capabilities despite not matching current top-tier models like GPT/Opus.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLM/comments/1szeghg/qwen_36_35b_a3b_is_insane_even_for/&quot;&gt;Qwen 3.6 35b a3b is INSANE even for VRAM-constrained systems&lt;/a&gt;&lt;/strong&gt; (Activity: 574): &lt;strong&gt;The post discusses the performance of the &lt;strong&gt;Qwen 3.6 35B-A3B&lt;/strong&gt; model on a VRAM-constrained system, highlighting its ability to handle complex coding tasks locally. The user, with a setup of &lt;code&gt;AMD 7700 XT&lt;/code&gt;, &lt;code&gt;32GB DDR4 RAM&lt;/code&gt;, and &lt;code&gt;Ryzen 5 5600&lt;/code&gt;, successfully ran the model using &lt;code&gt;i1-q4_k_s quant&lt;/code&gt;, offloading all 40 layers to GPU, and configured &lt;code&gt;128k context&lt;/code&gt; with &lt;code&gt;flash attention&lt;/code&gt; and &lt;code&gt;Q8_0 KV quantization&lt;/code&gt;. The model effectively resolved complex bugs in a web scraper app and updated a project README with screenshots, outperforming previous models like &lt;strong&gt;Gemma 3&lt;/strong&gt;, &lt;strong&gt;Gemma 4&lt;/strong&gt;, and &lt;strong&gt;Qwen 2.5 Coder&lt;/strong&gt;. This demonstrates the model&apos;s capability to perform well even on hardware with limited resources, making local AI coding more practical.&lt;/strong&gt; Commenters suggest optimizing performance by moving extra experts to CPU and fitting the KV cache on GPU to increase speed beyond &lt;code&gt;30 t/s&lt;/code&gt;. Another user notes achieving &lt;code&gt;35-40 tok/s&lt;/code&gt; with similar hardware, indicating potential for further optimization.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GoldenX86 suggests optimizing performance by moving extra experts to the CPU while keeping the KV cache on the GPU, which can enhance speed to over &lt;code&gt;30 tokens/second&lt;/code&gt;. This approach leverages the CPU for less critical tasks, freeing up GPU resources for more intensive operations.&lt;/li&gt;
&lt;li&gt;AI_Enhancer discusses achieving &lt;code&gt;35-40 tokens/second&lt;/code&gt; processing speed, noting that prompt complexity significantly affects response time. They highlight that even with complex prompts, the model&apos;s thinking time is capped at about 1 minute, suggesting efficient handling of difficult queries.&lt;/li&gt;
&lt;li&gt;cmplx17 shares a comparative analysis with Claude, noting that Qwen 3.6 exceeded expectations, especially in local model performance. This indicates significant advancements in model capabilities, making local models more competitive with cloud-based solutions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. Hardware and Infrastructure Setups&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t0lwx6/16x_spark_cluster_build_update/&quot;&gt;16x Spark Cluster (Build Update)&lt;/a&gt;&lt;/strong&gt; (Activity: 1024): &lt;strong&gt;The image depicts a 16x Spark Cluster setup, which is part of a high-performance computing build using NVIDIA&apos;s DGX Spark units. Each Spark runs on NVIDIA&apos;s Ubuntu and connects to an FS N8510 switch via QSFP56 cables, achieving dual rail connectivity with up to &lt;code&gt;200 Gbps&lt;/code&gt; throughput. The setup is designed to maximize unified memory capacity, crucial for tasks like serving GLM-5.1-NVFP4 models. The cluster is intended for prefill tasks, with plans to integrate M5 Ultra Mac Studios for decode operations. The build emphasizes efficient memory use within the NVIDIA ecosystem, contrasting with alternatives like the RTX Pro 6000 Blackwell, which offers different trade-offs in terms of power and performance.&lt;/strong&gt; One commenter suggests considering the RTX Pro 6000 Blackwell as an alternative, noting its potential for similar performance with possibly easier management and power considerations. Another commenter appreciates the build&apos;s approach to addressing Mac prefill issues with a robust cluster setup.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;flobernd suggests 8x RTX Pro 6000 Blackwell GPUs as an alternative to the 16 Sparks: a similar price point, with the advantage of a single-host configuration. Despite higher power usage, that setup can run models like Kimi-K2.6 and GLM-5.1-NVFP4 with excellent prefill and over 100 tokens per second, even with PCIe bottlenecks, which the Spark cluster also has because of its 200G NICs.&lt;/li&gt;
&lt;li&gt;TheRealSol4ra questions the choice of this setup over 8 RTX Pro 6000 GPUs, which provide 768GB of VRAM in total. They argue that this is enough to run models at FP8 or Q6 precision, and that while the Spark cluster can load any model, it may be limited to 15-25 tokens per second, slower than the RTX Pro 6000 configuration.&lt;/li&gt;
&lt;/ul&gt;
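&lt;p&gt;The capacity side of this debate can be checked with simple arithmetic. The sketch below is illustrative only: the 96 GB per RTX Pro 6000 figure is implied by the 768GB total quoted above, while 128 GB of unified memory per DGX Spark is an assumption that the post summary does not state.&lt;/p&gt;

```python
# Rough memory-capacity comparison of the two builds discussed above.
# ASSUMPTION: 128 GB unified memory per DGX Spark (not stated in the post).
# ASSUMPTION: 96 GB per RTX Pro 6000, implied by the quoted 768 GB for 8 cards.

def total_memory_gb(units: int, gb_per_unit: int) -> int:
    """Total memory across identical units, ignoring OS and runtime overhead."""
    return units * gb_per_unit

spark_cluster_gb = total_memory_gb(units=16, gb_per_unit=128)
rtx_host_gb = total_memory_gb(units=8, gb_per_unit=96)

print(f"16x DGX Spark:   {spark_cluster_gb} GB unified memory")
print(f"8x RTX Pro 6000: {rtx_host_gb} GB VRAM")
print(f"Capacity ratio:  {spark_cluster_gb / rtx_host_gb:.2f}x")
```

&lt;p&gt;Under these assumptions the Spark cluster holds roughly 2.7 times as much model memory, which is the capacity argument for the build. The commenters&apos; counterpoint is about tokens per second, which this arithmetic does not capture.&lt;/p&gt;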
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t09hyw/amd_halo_box_ryzen_395_128gb_photos/&quot;&gt;AMD Halo Box (Ryzen 395 128GB) photos&lt;/a&gt;&lt;/strong&gt; (Activity: 1033): &lt;strong&gt;The AMD Halo Box, featuring a &lt;code&gt;Ryzen 395&lt;/code&gt; processor and &lt;code&gt;128GB&lt;/code&gt; of RAM, was showcased running on Ubuntu. The unit includes a programmable light strip, enhancing its customization capabilities. However, it lacks a CD-ROM drive, which might be a consideration for some users.&lt;/strong&gt; A notable comment highlights a desire for increased memory bandwidth in AMD products, suggesting that this is a recurring request among users.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;FoxiPanda highlights a critical performance aspect by suggesting that AMD should focus on increasing memory bandwidth. This is a significant factor in improving overall system performance, especially for high-demand applications that rely on rapid data access and processing.&lt;/li&gt;
&lt;li&gt;OnkelBB points out the lack of a fast port for clustering, which could limit the device&apos;s utility in high-performance computing environments where multiple units are networked together to work on complex tasks. This could be a drawback for users looking to leverage the device in a clustered setup.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3. Other notable frontier-model / infra posts&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t06y43/open_models_april_2026_one_of_the_best_months_of/&quot;&gt;Open Models - April 2026 - One of the best months of all time for Local LLMs?&lt;/a&gt;&lt;/strong&gt; (Activity: 767): &lt;strong&gt;The image is a bar chart illustrating the parameter sizes of various local Large Language Models (LLMs) as of April 2026, highlighting a significant month for advancements in local LLMs. The chart features models like &quot;DeepSeek-V4-Pro-Max&quot; with &lt;code&gt;1600 billion parameters&lt;/code&gt;, and others like &quot;Kimi-K2.6,&quot; &quot;MiMo-V2.5-Pro,&quot; and &quot;Ling-2.6-1T,&quot; each with &lt;code&gt;1000 billion parameters&lt;/code&gt;. Notably, the &quot;MiniMax-M2.7&quot; model is absent from the graph due to a license change from MIT to Non-Commercial, indicating a shift in accessibility or usage rights.&lt;/strong&gt; One commenter humorously notes running the 1600B model on a Raspberry Pi, highlighting the impracticality of such a large model on limited hardware. Another comment questions the feasibility of running &quot;DeepSeek-V4-Pro-Max&quot; locally, suggesting skepticism about its practical deployment in local environments.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One commenter jokes about running the &lt;code&gt;1600B&lt;/code&gt; model on a Raspberry Pi. The quip underlines how far models of this size are from typical local hardware, rather than describing a real deployment.&lt;/li&gt;
&lt;li&gt;The reference to &lt;code&gt;Qwen3.5-122B-A10B&lt;/code&gt; suggests a discussion around a specific model variant, possibly highlighting its parameter size or architecture. This could indicate a trend towards more specialized or optimized models that balance size and performance for specific tasks or hardware configurations.&lt;/li&gt;
&lt;li&gt;The comment on parameter sizes being a &apos;dumb&apos; metric reflects a technical debate on the relevance of parameter count as a measure of model capability. This suggests a shift towards evaluating models based on performance metrics like accuracy, efficiency, or real-world applicability rather than just size.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1szwi1d/deepseek_released_thinkingwithvisualprimitives/&quot;&gt;DeepSeek released &apos;Thinking-with-Visual-Primitives&apos; framework&lt;/a&gt;&lt;/strong&gt; (Activity: 345): &lt;strong&gt;DeepSeek, in collaboration with &lt;strong&gt;Peking University&lt;/strong&gt; and &lt;strong&gt;Tsinghua University&lt;/strong&gt;, has introduced a novel multimodal reasoning framework called &apos;Thinking with Visual Primitives&apos;. This framework elevates spatial tokens, such as coordinate points and bounding boxes, to serve as the &quot;minimal units of thought&quot; in the model&apos;s chain-of-thought process. This approach allows the model to directly interleave these spatial tokens during reasoning, effectively enabling it to &quot;point&quot; to specific locations within an image while processing information. The framework was initially released on GitHub but was quickly made private, likely due to internal data or paths needing removal. &lt;a href=&quot;https://github.com/deepseek-ai/Thinking-with-Visual-Primitives&quot;&gt;GitHub Repository&lt;/a&gt;.&lt;/strong&gt; Commenters noted that this approach could significantly enhance open models by enforcing spatial awareness and preventing attention drift, a common issue with complex images. There is anticipation for integrating this framework with models like Llama once the repository is available again.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &apos;Thinking-with-Visual-Primitives&apos; framework by DeepSeek introduces a novel approach where models output raw bounding box coordinates as tokens, enhancing spatial awareness and reducing attention drift in complex images. This method contrasts with traditional natural language descriptions, which can be vague and lead to inaccuracies in spatial reasoning. The framework&apos;s potential integration with models like Llama could significantly improve their performance once the code is publicly available again.&lt;/li&gt;
&lt;li&gt;One commenter describes DeepSeek&apos;s release pattern as making repositories public and then quickly setting them to private, possibly to remove sensitive internal data. They speculate that this lets the team bypass formal review processes while still gaining community attention and credit, and that it relies on the community to mirror and fork the repositories so the code stays accessible while the original is private.&lt;/li&gt;
&lt;li&gt;The framework&apos;s concept aligns with existing efforts by companies like Google, which have explored similar ideas, though documentation and research on such methods have been sparse. The use of visual primitives for spatial reasoning could represent a significant advancement in open models, potentially influencing future developments in AI spatial awareness and reasoning capabilities.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1sznfue/where_the_goblins_came_from/&quot;&gt;Where the goblins came from&lt;/a&gt;&lt;/strong&gt; (Activity: 359): &lt;strong&gt;The OpenAI article titled &quot;Where the Goblins Came From&quot; discusses the challenges and methodologies in training large-scale AI models, particularly focusing on the implications of embedding vast amounts of knowledge into model parameters. The discussion references &lt;strong&gt;Sutton&apos;s Bitter Lesson&lt;/strong&gt;, which emphasizes the superiority of scalable compute over hand-crafted algorithms. The article critiques the approach of embedding extensive prior knowledge into models, suggesting that this contradicts Sutton&apos;s advice to focus on systems that discover patterns autonomously. The latest OpenAI model, estimated at &lt;code&gt;10 trillion parameters&lt;/code&gt;, is highlighted as an example of this approach, raising questions about the efficiency and necessity of such scale in AI training.&lt;/strong&gt; The comments debate the interpretation of Sutton&apos;s Bitter Lesson, with some arguing that OpenAI&apos;s approach of embedding extensive knowledge into models contradicts Sutton&apos;s emphasis on scalable compute for autonomous pattern discovery. Others suggest that alternative methods, such as knowledge graphs and reasoning engines, could avoid embedding unnecessary information like &apos;goblins&apos; into models.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Luke2642 discusses the misinterpretation of Sutton&apos;s &apos;bitter lesson&apos; in AI research, emphasizing that Sutton advocated for scaling compute to enable systems to discover patterns independently, rather than embedding extensive prior knowledge into models. This contrasts with the approach of large models like OpenAI&apos;s, which use massive parameter counts (e.g., 10 trillion) to encode vast amounts of human knowledge, including trivial data like &apos;goblins&apos;. This approach is critiqued as inefficient compared to potentially more effective methods like knowledge graphs or reasoning engines.&lt;/li&gt;
&lt;li&gt;Luke2642 also highlights the efficiency of Chinese researchers in applying less compute to achieve similar or better results, suggesting they may have developed superior algorithms or architectures. This raises questions about the current trend of scaling parameters and data in AI models, suggesting that alternative methods could avoid the pitfalls of embedding unnecessary information, such as &apos;goblins&apos;, into AI systems.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1szdv5s/what_do_you_guys_even_use_local_llms_for_me_a_lot/&quot;&gt;&quot;What do you guys even use local LLMs for?&quot; Me: A lot&lt;/a&gt;&lt;/strong&gt; (Activity: 469): &lt;strong&gt;The image is a dashboard from Grafana, displaying metrics related to the usage of local Large Language Models (LLMs) over a six-hour period. It tracks various statistics such as total tokens used, generation speed, and throughput, providing insights into the performance and utilization of different models and applications. The dashboard highlights that applications like &quot;Hermes&quot; and &quot;Vane&quot; have the highest usage counts, indicating their significant role in the user&apos;s local LLM ecosystem. The user has implemented a system to log usage via Prometheus, which helps in monitoring and optimizing the performance of these models.&lt;/strong&gt; One commenter notes that the token usage is substantial, but suggests that it would need to be in the billions to be considered &apos;a lot.&apos; Another commenter discusses the cost-saving benefits of using local LLMs for initial code review, which reduces the need for expensive API calls.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;spencer_kw discusses using a local LLM, specifically &apos;qwen&apos;, for code review before sending code to an API model like &apos;opus&apos;. This approach catches about 60% of obvious mistakes, significantly reducing API usage and saving approximately &lt;code&gt;$80/month&lt;/code&gt; in costs. This highlights the cost-effectiveness of local LLMs in pre-processing tasks before utilizing more expensive cloud-based models.&lt;/li&gt;
&lt;li&gt;CalligrapherFar7833 suggests using local LLMs for initial data filtering, such as detecting relevant frames before processing with a vision LLM. This strategy can optimize performance by reducing the amount of unnecessary data processed by more resource-intensive models, thereby improving efficiency and potentially lowering computational costs.&lt;/li&gt;
&lt;li&gt;Nyghtbynger emphasizes the importance of monitoring resource usage and costs when using local models. They find provider dashboards useful for tracking metrics like money spent and cache usage, which are critical for managing the efficiency and cost-effectiveness of local LLM deployments.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Less Technical AI Subreddit Recap&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;1. AI Model Releases and Benchmarks&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1t02oxw/gpt55_slightly_outperformed_mythos_on_a_multistep/&quot;&gt;GPT5.5 slightly outperformed Mythos on a multi-step cyber-attack simulation. One challenge that took a human expert 12 hrs took GPT-5.5 only 11 min at a $1.73 cost&lt;/a&gt;&lt;/strong&gt; (Activity: 873): &lt;strong&gt;&lt;strong&gt;GPT-5.5&lt;/strong&gt; slightly outperformed &lt;strong&gt;Mythos&lt;/strong&gt; on a multi-step cyber-attack simulation. On one challenge, it finished in &lt;code&gt;11 minutes&lt;/code&gt; a task that took a human expert &lt;code&gt;12 hours&lt;/code&gt;, at a cost of &lt;code&gt;$1.73&lt;/code&gt;. The evaluation is detailed in a &lt;a href=&quot;https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5-5-cyber-capabilities&quot;&gt;blog by AISI&lt;/a&gt;. The &lt;a href=&quot;https://www.ncsc.gov.uk/blogs/why-cyber-defenders-need-to-be-ready-for-frontier-ai&quot;&gt;NCSC blog&lt;/a&gt; discusses the implications of such advancements for cyber defense strategies, emphasizing the need for readiness against AI-driven threats.&lt;/strong&gt; Commenters express skepticism about the reported cost, with one suggesting it should be closer to &lt;code&gt;$70&lt;/code&gt;, and speculate on potential impacts such as the exposure of government backdoors, which could lead to significant security concerns.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;peakedtooearly suggests that the claim &quot;Mythos is too dangerous to release&quot; might have been a strategic move by Anthropic to mask computational limitations rather than genuine safety concerns. This implies that the performance of GPT-5.5, which outperformed Mythos, could be a result of more efficient compute usage or advancements in model architecture.&lt;/li&gt;
&lt;li&gt;Many_Increase_6767 questions the reported cost of $1.73 for 11 minutes of computation by GPT-5.5, suggesting it should be closer to $70. This discrepancy raises questions about the pricing model or efficiency of the compute resources used by GPT-5.5, indicating a potential misunderstanding or miscommunication about the cost structure.&lt;/li&gt;
&lt;li&gt;deleafir expresses surprise that GPT-5.5, which is reportedly on par with Mythos, did not cause significant disruptions upon release, as Anthropic had previously warned about the potential dangers of such powerful models. This comment highlights the ongoing debate about the balance between AI capabilities and safety concerns.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1sys1nd/openais_sebastien_bubeck_llm_models_are_able_to/&quot;&gt;OpenAI&apos;s Sebastien Bubeck: [LLM] models are able to surpass humans [researchers] and ask [research] questions&lt;/a&gt;&lt;/strong&gt; (Activity: 531): &lt;strong&gt;The image is a tweet quoting &lt;strong&gt;Sebastien Bubeck&lt;/strong&gt; from &lt;strong&gt;OpenAI&lt;/strong&gt;, highlighting that their &lt;strong&gt;LLM models&lt;/strong&gt; are surpassing human researchers by identifying mistakes in research papers and asking research questions. This suggests a significant advancement in AI capabilities, where models are not only responding to queries but also generating insightful questions, potentially transforming research methodologies. The discussion in the comments emphasizes the importance of training models to ask questions and the exploration of different reasoning styles to enhance problem-solving capabilities.&lt;/strong&gt; One comment highlights the potential of training models to ask questions, suggesting that the current limitations are due to inadequate training rather than inherent model deficiencies. Another comment expresses skepticism about the claims, noting a lack of transparency in sharing results.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The comment by sckchui highlights the importance of training methodologies in the performance of LLMs. It suggests that the current limitations in LLMs&apos; ability to ask questions stem from inadequate training focused on answering rather than questioning. The comment also notes emerging research trends that involve training models with diverse reasoning styles and leveraging the conflicts between these styles to enhance problem-solving capabilities.&lt;/li&gt;
&lt;li&gt;pavelkomin expresses skepticism about the claims made by OpenAI, pointing out a lack of transparency in sharing results. The comment suggests that while AI advancements are likely, the communication style resembles marketing hype without providing tangible evidence or access to the breakthroughs being claimed. This reflects a broader concern about the openness and verifiability of AI research progress.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/MachineLearning/comments/1sz14mi/an_interactive_semantic_map_of_the_latest_10/&quot;&gt;An interactive semantic map of the latest 10 million published papers [P]&lt;/a&gt;&lt;/strong&gt; (Activity: 245): &lt;strong&gt;The post introduces an interactive semantic map created from the latest 10 million papers sourced from &lt;strong&gt;OpenAlex&lt;/strong&gt;. The map uses &lt;strong&gt;SPECTER 2&lt;/strong&gt; embeddings on titles and abstracts, with dimensionality reduction via &lt;strong&gt;UMAP&lt;/strong&gt; and &lt;strong&gt;Voronoi partitioning&lt;/strong&gt; on density peaks to form semantic neighborhoods. It supports keyword and semantic queries and includes an analytics layer for ranking institutions, authors, and topics. The map is accessible at &lt;a href=&quot;https://globalresearchspace.com/space#7.02/-4.771/61.204/-52.6/30&quot;&gt;The Global Research Space&lt;/a&gt;.&lt;/strong&gt; A commenter inquires about the Voronoi partitioning method, suggesting alternatives like &lt;strong&gt;HDBSCAN&lt;/strong&gt; for density-aware clustering, and asks for more details on the hierarchical nature of the partitioning and the labeling process. There is also interest in whether the code is open source.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;TheEsteemedSaboteur inquires about the Voronoi partitioning procedure used in the semantic map, suggesting alternatives like HDBSCAN for density-aware clustering. They note the hierarchical nature of the Voronoi cells and request more details on the labelling process and whether the code is open source.&lt;/li&gt;
&lt;li&gt;kamilc86 raises questions about the labeling behavior across different zoom levels in the map, noting that at wider views, cluster names are clear, but zooming in reveals empty spaces without labels. They also question the choice of using SPECTER 2 for embeddings, asking if general-purpose embedders were considered as a baseline, and inquire about the computational feasibility of running UMAP on 10 million vectors.&lt;/li&gt;
&lt;li&gt;The discussion includes technical considerations such as the choice of SPECTER 2, which is specifically trained on scientific text, and the practical challenges of using UMAP on a large dataset of 10 million vectors, questioning the methods used to make the process tractable.&lt;/li&gt;
&lt;/ul&gt;
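&lt;p&gt;For readers unfamiliar with the partitioning step the commenters ask about, here is a minimal sketch of a Voronoi partition: each point is assigned to the cell of its nearest seed. It is not the project&apos;s code, which the post does not share. The seeds stand in for density peaks in a 2D projection, and all coordinates and labels are made up for illustration.&lt;/p&gt;

```python
# Toy Voronoi partition: each 2D point joins the cell of its nearest seed.
# ASSUMPTION: the seeds stand in for density peaks in a UMAP projection;
# all coordinates and labels below are hypothetical.
import math

def nearest_seed(point, seeds):
    """Return the label of the seed closest to `point` (Euclidean distance)."""
    return min(seeds, key=lambda label: math.dist(point, seeds[label]))

def voronoi_partition(points, seeds):
    """Group points by the Voronoi cell (nearest seed) they fall into."""
    cells = {label: [] for label in seeds}
    for point in points:
        cells[nearest_seed(point, seeds)].append(point)
    return cells

seeds = {"ml": (0.0, 0.0), "biology": (10.0, 0.0), "physics": (0.0, 10.0)}
points = [(1.0, 1.0), (9.0, 1.0), (2.0, 8.0), (6.0, 0.5), (0.5, 4.0)]

cells = voronoi_partition(points, seeds)
for label, members in cells.items():
    print(label, members)
```

&lt;p&gt;A density-aware method such as HDBSCAN, which one commenter suggests, would instead derive clusters from the data and can leave sparse points unassigned, whereas a Voronoi partition assigns every point to some cell.&lt;/p&gt;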
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1syt37w/claude_is_my_seo_strategist_content_engine_and/&quot;&gt;Claude is my SEO strategist, content engine, and CTO. From 0 to 10,000 active users in 6 weeks, $0 on ads.&lt;/a&gt;&lt;/strong&gt; (Activity: 1039): &lt;strong&gt;The image in the Reddit post is a data analytics dashboard that visually represents the growth metrics of the marketplace Agensi, which was built using Claude and Lovable. The dashboard highlights significant increases in user engagement, showing 10,000 active users with a &lt;code&gt;263.3%&lt;/code&gt; increase and 9,900 new users with a &lt;code&gt;262.0%&lt;/code&gt; increase over the last 30 days. The event count is 73,000, marking a &lt;code&gt;197.6%&lt;/code&gt; increase, and a line graph illustrates the upward trend in user activity. This growth is attributed to the strategic use of Claude for SEO, content strategy, and AEO (answer engine optimization), which involves analyzing Google Search Console data to identify keyword gaps and optimize content structure for AI engines.&lt;/strong&gt; Some comments express skepticism about the authenticity and originality of the content, suggesting it might be &apos;generic AI slop&apos; or spam, and questioning if the post itself was written by AI.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/DeepSeek/comments/1t0aods/i_wasnt_ready_for_deepseek_v4/&quot;&gt;I wasn’t ready for DeepSeek V4&lt;/a&gt;&lt;/strong&gt; (Activity: 176): &lt;strong&gt;The image showcases a dashboard for DeepSeek V4, highlighting its cost efficiency and performance metrics. The dashboard displays a total spend of &lt;code&gt;$1,050.86&lt;/code&gt; and cache savings of &lt;code&gt;$3,351.43&lt;/code&gt;, indicating significant cost savings. It compares different models like DeepSeek Chat, DeepSeek V4 Pro, and DeepSeek V4 Flash, with the latter showing superior performance in terms of caching efficiency. This suggests that DeepSeek V4 models are highly efficient and cost-effective, potentially outperforming other models like Claude in terms of speed and efficiency.&lt;/strong&gt; Commenters note that DeepSeek V4 models are revolutionary in terms of price, speed, and efficiency, yet they haven&apos;t gained widespread recognition. There&apos;s a sentiment that the market hasn&apos;t fully realized the potential of these models.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;DeepSeek V4 models are noted for their significant improvements in price, speed, and efficiency, which could potentially disrupt the market. However, there seems to be a lack of awareness or acknowledgment of these advancements among users, as they continue to accept high costs as the norm.&lt;/li&gt;
&lt;li&gt;The V4 flash model is highlighted as a preferred choice for many users due to its performance. This suggests that the model offers a balance of speed and efficiency that makes it suitable for a wide range of applications, becoming a default option for users familiar with AI capabilities.&lt;/li&gt;
&lt;li&gt;Despite the advancements in DeepSeek V4, there is a perception that users have become accustomed to the general intelligence of AI models, making it challenging to differentiate based solely on intelligence. This indicates a shift in user expectations towards other factors like cost and speed.&lt;/li&gt;
&lt;/ul&gt;
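&lt;p&gt;The dashboard figures quoted above can be turned into a rough savings share. This is a minimal sketch, assuming the reported cache savings is the extra amount that would have been billed without cache hits; the post does not confirm how the dashboard defines it, and the variable names are illustrative.&lt;/p&gt;

```python
# Figures as quoted in the dashboard summary above.
total_spend = 1050.86    # USD actually billed
cache_savings = 3351.43  # USD reported as saved by cache hits

# ASSUMPTION: the bill without caching would be spend plus reported savings.
uncached_cost = total_spend + cache_savings
savings_share = cache_savings / uncached_cost

print(f"Estimated cost without cache: ${uncached_cost:,.2f}")
print(f"Share of that cost avoided:   {savings_share:.1%}")
```

&lt;p&gt;Under that assumption, caching avoided roughly three quarters of the uncached bill, which is consistent with the commenters treating cache efficiency as the main cost lever.&lt;/p&gt;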
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/Bard/comments/1syqhsp/the_significance_of_googles_recent_tpu_8t_and_tpu/&quot;&gt;The Significance of Google&apos;s recent TPU 8t and TPU 8i&lt;/a&gt;&lt;/strong&gt; (Activity: 104): &lt;strong&gt;Google&apos;s recent TPU 8t and TPU 8i chips demonstrate significant advancements in both cost and performance efficiency. The TPU 8t shows a &lt;code&gt;170% to 180%&lt;/code&gt; gain in training cost-performance and a &lt;code&gt;124%&lt;/code&gt; gain in training power efficiency, while the TPU 8i offers an &lt;code&gt;80%&lt;/code&gt; gain in inference cost-performance and a &lt;code&gt;117%&lt;/code&gt; gain in inference power efficiency. Networking improvements include a &lt;code&gt;300%&lt;/code&gt; increase in data center network bandwidth and a &lt;code&gt;56%&lt;/code&gt; reduction in inference network latency. Memory enhancements feature a &lt;code&gt;200%&lt;/code&gt; increase in on-chip SRAM for the TPU 8i and a &lt;code&gt;50%&lt;/code&gt; increase in HBM capacity for inference. These improvements are expected to significantly reduce costs and enhance performance for Google&apos;s Gemini 3.1 Pro and future AI models, facilitating the training of trillion-parameter, multimodal AI systems. &lt;a href=&quot;https://cloud.google.com/blog/products/compute/tpu-8t-and-tpu-8i-technical-deep-dive&quot;&gt;Google Cloud Blog&lt;/a&gt;&lt;/strong&gt; Commenters are impressed by the rapid iteration leading to these gains and are curious about the deployment timeline for future Gemini models. There is also a call for increasing the usage quota for the Gemini 3.1 Pro model and AI Studio, reflecting user demand for more access.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/Qwen_AI/comments/1szamsf/devs_using_qwen_27b_seriously_whats_your_take/&quot;&gt;Devs using Qwen 27B seriously, what&apos;s your take?&lt;/a&gt;&lt;/strong&gt; (Activity: 234): &lt;strong&gt;&lt;strong&gt;Qwen 27B&lt;/strong&gt; is being evaluated by developers for its coding capabilities, particularly in &quot;Codex style&quot; tasks. Users report that while it may not be as creative as larger models like GPT-5.5, it excels at following instructions and delivers solid results on specific tasks such as debugging, refactoring, and navigating codebases. It is noted for its reliability compared to models like Opus 4.6, which has been reported to hallucinate more frequently. The model is not designed to handle full backend and frontend development in one go but is appreciated for its ability to execute iterative tasks effectively when provided with detailed specifications. &lt;strong&gt;Performance metrics&lt;/strong&gt; indicate that on a Strix Halo 128GB, Qwen 27B Q8 achieves &lt;code&gt;10t/s&lt;/code&gt;, whereas the Qwen 3.6 35B A3B MoE at Q8 achieves &lt;code&gt;44t/s&lt;/code&gt;. This suggests that the dense 27B model&apos;s speed on this hardware is limited by memory bandwidth rather than memory capacity, and faster models may be preferred for iterative tasks.&lt;/strong&gt; Commenters highlight that the effectiveness of Qwen 27B depends more on the harness and method used than on model size. Some developers prefer smaller models for iterative tasks due to better economic efficiency and similar quality results when detailed specifications are provided. The model is praised for raising the bar for agentic models in its parameter range.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;H_DANILO&lt;/strong&gt; highlights that Qwen 27B is more reliable than Opus 4.6, particularly in avoiding hallucinations during tasks like resolving merge conflicts. While Qwen isn&apos;t highly creative, it excels at following instructions and delivering solid results, making it suitable for structured tasks rather than creative ones.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;edsonmedina&lt;/strong&gt; discusses the efficiency of using smaller models with iterative attempts and detailed specs, noting that the harness and method often have a greater impact than model size. On a Strix Halo 128GB, they report 10t/s with 27B Q8 versus 44t/s with Qwen 3.6 35B A3B MoE Q8_K_XL, indicating that memory bandwidth, rather than memory capacity, is the limiting factor.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;kaliku&lt;/strong&gt; appreciates Qwen 27B for its ability to handle boilerplate code and follow examples effectively, especially within a well-designed TDD loop. They note that Qwen 27B sets a high standard for agentic models in its parameter range, suggesting that it raises the bar for future models from competitors like Mistral.&lt;/li&gt;
&lt;/ul&gt;
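&lt;p&gt;The bandwidth point can be illustrated with a back-of-the-envelope estimate: when decoding is memory-bandwidth bound, each generated token has to stream the active weights from memory once. The sketch below is not from the thread. It assumes about 256 GB/s of memory bandwidth, about one byte per parameter at Q8, and reads &quot;A3B&quot; as about 3B active parameters per token; all three are assumptions, and the function name is illustrative.&lt;/p&gt;

```python
# Rough ceiling on decode speed when generation is memory-bandwidth bound:
# every generated token must stream the active weights from memory once.

def decode_tokens_per_second(active_params_billion, bytes_per_param, bandwidth_gb_s):
    """Return the bandwidth-limited ceiling on tokens per second."""
    gb_read_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token

# ASSUMPTION: about 256 GB/s of memory bandwidth; not stated in the thread.
BANDWIDTH_GB_S = 256.0
# ASSUMPTION: Q8 stores roughly one byte per parameter.
Q8_BYTES_PER_PARAM = 1.0

# Dense model: all 27B parameters are read for every token.
dense_27b = decode_tokens_per_second(27.0, Q8_BYTES_PER_PARAM, BANDWIDTH_GB_S)
# MoE model: "A3B" is read here as about 3B active parameters (assumption).
moe_35b_a3b = decode_tokens_per_second(3.0, Q8_BYTES_PER_PARAM, BANDWIDTH_GB_S)

print(f"27B dense ceiling:   {dense_27b:.1f} tokens/s")
print(f"35B A3B MoE ceiling: {moe_35b_a3b:.1f} tokens/s")
```

&lt;p&gt;Under these assumptions the dense ceiling comes out near 9.5 tokens/s, close to the reported 10t/s, while the MoE ceiling is several times higher. That matches the direction of the reported 44t/s, though real throughput also depends on KV-cache reads, routing overhead, and the runtime.&lt;/p&gt;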
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/StableDiffusion/comments/1sz1fir/sensenovau1_just_dropped_native_multimodal/&quot;&gt;SenseNova-U1 just dropped — native multimodal gen/understanding in one model, no VAE, no diffusion&lt;/a&gt;&lt;/strong&gt; (Activity: 293): &lt;strong&gt;&lt;strong&gt;SenseNova-U1&lt;/strong&gt; introduces a novel approach to multimodal generation and understanding by integrating text rendering directly into images, overcoming limitations of diffusion models that lack language pathways. This model excels in generating complex visual outputs like infographics and annotated diagrams by processing semantic content rather than latents. It also supports image editing with reasoning, allowing for nuanced transformations such as converting an image to a watercolor style while maintaining composition. Additionally, it enables interleaved text and image generation, producing coherent outputs in a single pass. The model is available on &lt;a href=&quot;https://github.com/OpenSenseNova/SenseNova-U1&quot;&gt;GitHub&lt;/a&gt; and supports a resolution of &lt;code&gt;2048x2048&lt;/code&gt; with &lt;code&gt;8B&lt;/code&gt; parameters under the Apache 2.0 license.&lt;/strong&gt; One commenter noted the model&apos;s technical specifications, including its &lt;code&gt;2048x2048&lt;/code&gt; resolution and &lt;code&gt;8B&lt;/code&gt; parameters, expressing interest in its integration into other platforms. Another user reported disappointing image quality in initial tests, suggesting the model&apos;s strengths may lie in more complex tasks beyond simple text-to-image generation.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The SenseNova-U1 model is released under the Apache 2.0 license, featuring a resolution of &lt;code&gt;2048x2048&lt;/code&gt; and &lt;code&gt;8 billion parameters&lt;/code&gt;. It utilizes a technique referred to as &lt;code&gt;lightx2v&lt;/code&gt;, which is notable for not relying on traditional methods like VAE or diffusion for multimodal generation and understanding.&lt;/li&gt;
&lt;li&gt;A user reported that the image quality of SenseNova-U1 was underwhelming in their tests, particularly when using photorealistic prompts for text-to-image generation. This suggests that while the model may have strengths in other areas, its performance in generating high-quality images might not meet expectations in certain scenarios.&lt;/li&gt;
&lt;li&gt;There is interest in running a local, uncensored version of SenseNova-U1, indicating a demand for more control and privacy in using AI models. This reflects a broader trend in the AI community towards decentralization and user autonomy.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. AI Tools and Workflows&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1syvihl/that_robot_demo_almost_turned_into_a_nightmare/&quot;&gt;That robot demo almost turned into a nightmare&lt;/a&gt;&lt;/strong&gt; (Activity: 2531): &lt;strong&gt;A recent robot demonstration nearly resulted in an accident when a child stood too close to a robot performing martial arts-like movements. The incident highlights potential safety concerns in human-robot interaction, especially in public demonstrations where bystanders may not be aware of the risks. This underscores the importance of implementing strict safety protocols and barriers to prevent such occurrences in future demonstrations.&lt;/strong&gt; Commenters expressed concern over the lack of parental supervision and the potential dangers of allowing children near active robots. The incident sparked a discussion on the need for better safety measures and awareness during robot demonstrations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/MachineLearning/comments/1szc05y/icml_2026_decision_d/&quot;&gt;ICML 2026 Decision [D]&lt;/a&gt;&lt;/strong&gt; (Activity: 1124): &lt;strong&gt;The post discusses the anticipation surrounding the upcoming publication of decisions for &lt;strong&gt;ICML 2026&lt;/strong&gt;. The community is eagerly awaiting updates, with many users humorously expressing their impatience by frequently refreshing platforms like OpenReview. This reflects the high level of engagement and anxiety typical in the academic community during conference decision periods.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/OpenAI/comments/1szlsfp/openai_explains_where_the_goblins_came_from/&quot;&gt;OpenAI explains &quot;Where the goblins came from&quot;&lt;/a&gt;&lt;/strong&gt; (Activity: 519): &lt;strong&gt;OpenAI&apos;s GPT-5.1 began incorporating &apos;goblin&apos; metaphors due to a reinforcement learning mechanism that rewarded creative language, particularly in &apos;nerdy&apos; contexts. This behavior propagated through subsequent models as they were trained on outputs from earlier versions, leading to an amplification of this tendency. OpenAI has since retired the &apos;Nerdy&apos; personality and adjusted training protocols to address this issue, emphasizing the need for careful auditing of model behaviors to avoid unintended consequences. For more details, see the &lt;a href=&quot;https://openai.com/index/where-the-goblins-came-from/&quot;&gt;original article&lt;/a&gt;.&lt;/strong&gt; A debate emerged around &lt;strong&gt;Rich Sutton&apos;s&lt;/strong&gt; &apos;bitter lesson&apos;, which advocates for scaling compute over embedding knowledge into models. Critics argue that OpenAI&apos;s approach of embedding vast amounts of knowledge, including &apos;goblins&apos;, contradicts Sutton&apos;s philosophy. Some suggest that more efficient algorithms or architectures, as demonstrated by Chinese researchers, could be a better path forward.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The_Right_Trousers highlights a phenomenon where GPT-5.1 began incorporating &apos;goblin metaphors&apos; in its responses due to reinforcement from human feedback or earlier models. This behavior was then propagated and amplified in subsequent models, illustrating a feedback loop in AI training where quirks can become entrenched features over time.&lt;/li&gt;
&lt;li&gt;Luke2642 critiques the current AI model development strategy, referencing Sutton&apos;s &apos;bitter lesson&apos; which emphasizes the importance of compute over hand-crafted algorithms. They argue that OpenAI&apos;s approach of scaling parameters and data to embed extensive knowledge, including trivial elements like &apos;goblins&apos;, contradicts Sutton&apos;s advice to focus on systems that discover patterns independently. This critique suggests a misalignment between theoretical AI principles and practical implementations.&lt;/li&gt;
&lt;li&gt;Luke2642 also contrasts OpenAI&apos;s strategy with Chinese researchers who have reportedly achieved more efficient results with less compute or better algorithms. This points to a potential inefficiency in the current trend of scaling AI models to trillions of parameters, questioning the necessity and effectiveness of such an approach when simpler, more efficient methods might exist.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1sz67w4/thanks_for_the_advice_claude/&quot;&gt;Thanks for the advice Claude&lt;/a&gt;&lt;/strong&gt; (Activity: 3326): &lt;strong&gt;The image is a non-technical meme or humorous post, featuring a text message that humorously suggests a reading plan, likely from an AI or virtual assistant named Claude. The message advises a structured reading approach, starting with the book &quot;Sapiens,&quot; and suggests reading 20 pages tonight. The context implies a casual, motivational tone rather than a technical or instructional one.&lt;/strong&gt; The comments humorously discuss the AI&apos;s relaxed attitude towards piracy, with users joking about the AI&apos;s training data being sourced from pirated content.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1syuij0/when_youve_got_money_to_burn/&quot;&gt;When you&apos;ve got money to burn 😂&lt;/a&gt;&lt;/strong&gt; (Activity: 1764): &lt;strong&gt;The image is a meme that humorously depicts the concept of having &apos;money to burn&apos; by showing a man in a suit lighting a cigar with a blowtorch. The exaggeration illustrates excessive wealth or spending. The comments offer no technical insight into the image itself.&lt;/strong&gt; Commenters instead joke about the performance of a software version, expressing frustration over its inability to perform simple tasks despite its cost, suggesting a disconnect between price and functionality.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeCode/comments/1szi053/how_not_to_run_an_ai_company/&quot;&gt;How not to run an ai company&lt;/a&gt;&lt;/strong&gt; (Activity: 934): &lt;strong&gt;The image depicts a status dashboard for an AI company, showing that all major services, including Claude.ai and its associated platforms, are experiencing a &apos;Major Outage&apos; today. The uptime percentages over the past 90 days range from &lt;code&gt;98.69%&lt;/code&gt; to &lt;code&gt;99.88%&lt;/code&gt;, indicating frequent service disruptions. This suggests challenges in maintaining service reliability, which is often a characteristic of rapidly evolving tech companies prioritizing innovation over stability.&lt;/strong&gt; Commenters highlight that such instability is typical for disruptive tech companies in their early stages, emphasizing a &apos;go fast and break things&apos; approach. However, they note that this is not suitable for mature SaaS companies, indicating a need for improved stability as the company matures.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ant3k highlights the typical approach of disruptive tech companies, which often prioritize rapid innovation over stability, encapsulated in the phrase &apos;go fast and break things.&apos; This approach is common in the early stages of tech development, where the focus is on pushing boundaries rather than ensuring consistent performance.&lt;/li&gt;
&lt;li&gt;itswednesday differentiates between the operational strategies of cutting-edge AI companies and mature SaaS companies. Cutting-edge AI firms often embrace rapid iteration and experimentation, which contrasts with the stability and reliability expected from established SaaS businesses. This distinction underscores the varying expectations and operational models based on the company&apos;s maturity and industry.&lt;/li&gt;
&lt;li&gt;we-meet-again points out the challenges faced by AI companies when demand outpaces infrastructure capabilities. The comment suggests that even if a product is popular, financial constraints can hinder scaling efforts, leading to performance issues. This highlights the tension between user demand and the financial realities of maintaining and scaling tech infrastructure.&lt;/li&gt;
&lt;/ul&gt;
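&lt;p&gt;The uptime range quoted above is easier to judge when converted to hours. This is a minimal sketch, assuming each percentage is time-based over the full 90-day window; status pages may weight partial outages differently, and the helper name is illustrative.&lt;/p&gt;

```python
def downtime_hours(uptime_percent, window_days=90):
    """Hours of downtime implied by an uptime percentage over a window."""
    # ASSUMPTION: uptime is measured as a share of wall-clock time.
    downtime_fraction = (100.0 - uptime_percent) / 100.0
    return downtime_fraction * window_days * 24

# The two ends of the uptime range quoted in the dashboard summary.
for uptime in (98.69, 99.88):
    hours = downtime_hours(uptime)
    print(f"{uptime}% uptime over 90 days is about {hours:.1f} hours down")
```

&lt;p&gt;Under that assumption the range corresponds to roughly 28 hours of downtime at the low end and under 3 hours at the high end over the 90 days.&lt;/p&gt;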
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeCode/comments/1szdgj2/claude_i_estimate_this_will_take_12_weeks_to/&quot;&gt;Claude: “I estimate this will take 1-2 weeks to complete”&lt;/a&gt;&lt;/strong&gt; (Activity: 1023): &lt;strong&gt;The image is a meme and does not contain any technical content. It humorously depicts a scenario where a character named Claude estimates a task will take 1-2 weeks to complete, which is a common trope in project management and software development where time estimates are often underestimated or overly optimistic. The comments reflect a playful skepticism towards such estimates, with one suggesting that the task should be completed immediately instead of taking the estimated time.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/DeepSeek/comments/1szyr5z/bro_this_is_too_cheap_i_think_finally_i_have_a/&quot;&gt;bro this is too cheap i think finally i have a respect for the deepseek&lt;/a&gt;&lt;/strong&gt; (Activity: 132): &lt;strong&gt;The post discusses the pricing of the &lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt; model, which is perceived as surprisingly affordable compared to the &lt;strong&gt;Pro&lt;/strong&gt; version, which remains expensive until later this year. A discount on the Pro version is noted. Technical inquiries in the comments focus on the model&apos;s quality compared to other frontier models and whether the pricing advantage is due to cache hits, which would affect the cost of output tokens.&lt;/strong&gt; Commenters are debating whether the cost-effectiveness of the DeepSeek V4 Flash is due to its reliance on cache hits, which could reduce output token costs, and how its quality compares to other models.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The discussion highlights the cost-effectiveness of DeepSeek&apos;s disk-based KV cache system, which is noted for its robustness and reliability, lasting for hours compared to the typical 5-minute duration offered by most providers. This system significantly reduces costs by making cached input essentially free, enabling new innovations in the field.&lt;/li&gt;
&lt;li&gt;There is a debate about the quality of DeepSeek V4, with some users expressing disappointment in its performance for creative writing tasks, despite its utility in role-playing and agentic applications. This suggests a trade-off between cost and performance, particularly in creative contexts.&lt;/li&gt;
&lt;li&gt;Questions are raised about the pricing structure, with confusion over how DeepSeek can offer such low prices even with significant discounts and cache hits. This indicates a need for clarity on the pricing model and the potential use of older models to achieve these cost reductions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/GeminiAI/comments/1szvhfj/this_is_actually_sad/&quot;&gt;this is actually sad&lt;/a&gt;&lt;/strong&gt; (Activity: 2423): &lt;strong&gt;The image is a meme highlighting the perceived low engagement with Google&apos;s Gemini app, as depicted by a humorous interaction between a user and the official Google Gemini account. Despite this portrayal, comments suggest that Gemini is valued for its unique capabilities, such as audio file analysis, which is beneficial for independent music producers. Users argue that Gemini, especially the pro version, is underrated and offers competitive features compared to other AI models like ChatGPT and Copilot, though it suffers from a negative public perception due to its association with Bard.&lt;/strong&gt; Commenters emphasize that Gemini is underrated and has unique features that are not widely recognized, suggesting that its public perception is skewed by past associations rather than its current capabilities.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Gemini&apos;s audio analysis capabilities&lt;/strong&gt; are highlighted as a significant advantage, particularly for independent music producers who lack formal training in audio engineering. This feature sets it apart from other LLMs, offering unique utility in creative fields beyond text processing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Public perception of Gemini&lt;/strong&gt; is noted to be negatively influenced by its association with Bard, despite improvements. Users with experience across platforms argue that Gemini Pro surpasses competitors like ChatGPT and Copilot in certain aspects, suggesting that its reputation may not fully reflect its current capabilities.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost-effectiveness of Gemini&lt;/strong&gt; is emphasized, with users noting it as the most economical option for general use. However, it may not be the best choice for developers, who often dominate discussions and may skew perceptions of its utility.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/StableDiffusion/comments/1t0auqh/sulphur_2_uncensored_video_gen/&quot;&gt;Sulphur 2 Uncensored Video Gen&lt;/a&gt;&lt;/strong&gt; (Activity: 442): &lt;strong&gt;The team is developing an open-source, uncensored video generation model named &lt;strong&gt;Sulphur 2&lt;/strong&gt;, leveraging the &lt;strong&gt;LTX-2.3&lt;/strong&gt; architecture. The model is trained on &lt;code&gt;125k&lt;/code&gt; videos, each &lt;code&gt;10 seconds&lt;/code&gt; long at &lt;code&gt;24 fps&lt;/code&gt;, with filtering applied only for illegal content and excluding 2D videos to enhance performance. It supports natural language captioning for video generation. The model is set for release on &lt;strong&gt;Hugging Face&lt;/strong&gt; within a week, with a pre-release testing phase available via a &lt;a href=&quot;https://discord.gg/Jbdm9sWC8&quot;&gt;Discord server&lt;/a&gt;.&lt;/strong&gt; A commenter inquired if the model is a finetuned version of &lt;strong&gt;LTX-2.3&lt;/strong&gt;, indicating interest in the technical specifics of the model&apos;s architecture.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ANR2ME inquires if the model used is a finetuned version of LTX-2.3, suggesting a focus on the underlying architecture and potential modifications made to the base model. This implies a technical interest in the model&apos;s capabilities and performance enhancements through finetuning.&lt;/li&gt;
&lt;li&gt;eraser851 asks about the captioning process and available software for quickly captioning NSFW videos, indicating a technical interest in the tools and methodologies used for video processing and annotation. This highlights the importance of efficient workflows in handling sensitive content.&lt;/li&gt;
&lt;li&gt;Technical-Rope2989 queries about the release of a distilled version, which suggests an interest in model optimization techniques such as distillation to reduce model size while maintaining performance. This reflects a focus on resource efficiency and deployment considerations.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/StableDiffusion/comments/1syu74k/zanime_full_anime_finetune_on_zimage_base/&quot;&gt;Z-Anime - Full Anime Fine-Tune on Z-Image Base&lt;/a&gt;&lt;/strong&gt; (Activity: 297): &lt;strong&gt;&lt;strong&gt;Z-Anime&lt;/strong&gt; is a fully fine-tuned model based on &lt;strong&gt;Alibaba&apos;s Z-Image Base&lt;/strong&gt; architecture, specifically designed for anime-style image generation. It is presented as a full fine-tune rather than a LoRA merge, using the &lt;strong&gt;S3-DiT (Single-Stream Diffusion Transformer)&lt;/strong&gt; with &lt;code&gt;6 billion parameters&lt;/code&gt;. The model emphasizes rich diversity and strong controllability and supports full negative prompts, making it highly adaptable for fine-tuning in anime contexts. It was trained on a dataset of approximately &lt;code&gt;15,000 images&lt;/code&gt;, focusing on anime aesthetics.&lt;/strong&gt; There is a debate regarding the training dataset, with some users emphasizing the importance of not using AI-generated datasets for training, as it may affect the model&apos;s originality and quality.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The discussion highlights a discrepancy in the claims about the Z-Anime model&apos;s training process. While it is marketed as a &apos;full anime fine-tune&apos; model, it appears to have been trained on a relatively small dataset of approximately 15,000 images. This raises questions about the model&apos;s comprehensiveness and the potential overstatement in its promotional materials.&lt;/li&gt;
&lt;li&gt;A user references a common guideline in AI model training: &lt;em&gt;&apos;Rule 1 - Don&apos;t train on AI generated dataset.&apos;&lt;/em&gt; This suggests a concern about the quality and originality of the training data used for Z-Anime, as training on AI-generated content can lead to issues like data contamination and reduced model robustness.&lt;/li&gt;
&lt;li&gt;The comment by -Ellary- implies a search for comparisons between Z-Anime and other models like &apos;anima3,&apos; indicating a community interest in benchmarking Z-Anime against existing models to evaluate its performance and unique features. This reflects a broader trend in the AI community to critically assess new models against established benchmarks.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/StableDiffusion/comments/1szjm1c/blind_realism_test_z_image_turbo_vs_klein_9b/&quot;&gt;Blind realism test, Z image turbo vs Klein 9B distilled&lt;/a&gt;&lt;/strong&gt; (Activity: 232): &lt;strong&gt;The post presents a blind realism test comparing two AI models, &lt;strong&gt;Z Image Turbo&lt;/strong&gt; and &lt;strong&gt;Klein 9B Distilled&lt;/strong&gt;, across 10 images to evaluate which appears most realistic. The test includes images generated with and without LoRA (Low-Rank Adaptation) to assess their impact on realism. The prompt used for generation is a detailed description of a night portrait scene. The models and LoRAs used include &lt;strong&gt;Flux 2 Klein 9B Distilled&lt;/strong&gt; and &lt;strong&gt;Intarealism V2/V3 finetunes from Z Image Turbo&lt;/strong&gt;, with links provided to their respective &lt;a href=&quot;https://civitai.com&quot;&gt;Civitai pages&lt;/a&gt;. The test aims to mitigate bias by not revealing the models initially, allowing for an unbiased assessment of realism.&lt;/strong&gt; Commenters noted that &lt;strong&gt;Klein 9B&lt;/strong&gt; handles lens flares better than &lt;strong&gt;Z Image Turbo&lt;/strong&gt;, which struggles with texture realism, particularly in stone patterns. The first image was widely regarded as the most realistic, with some suggesting it might be a real photo rather than AI-generated.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hoodfu highlights a key difference between the models, noting that &lt;strong&gt;Klein 9B&lt;/strong&gt; handles lens flares significantly better than &lt;strong&gt;Z Image Turbo&lt;/strong&gt;, which struggles with rendering mottled stone patterns, particularly on gravel surfaces. This texture issue is a major drawback for Z Image Turbo, affecting its overall realism.&lt;/li&gt;
&lt;li&gt;Puzzled-Valuable-985 provides a detailed breakdown of the models and LoRAs used in the test, emphasizing that the most realistic image was created using &lt;strong&gt;Flux 2 Klein 9B Distilled&lt;/strong&gt; with a specific LoRA for phone photography. The prompt used was designed to test realism with a complex scene involving a car and a model in a night setting, highlighting the strengths of Klein 9B in achieving photorealistic results.&lt;/li&gt;
&lt;li&gt;Desktop4070 offers a comparative analysis of the images, noting that &lt;strong&gt;Image 1&lt;/strong&gt; (Flux 2 Klein 9B Distilled) was the most convincing in terms of realism, while &lt;strong&gt;Image 3&lt;/strong&gt; (Z Image Turbo) had uncanny elements, particularly in the eyes. They also point out lighting inconsistencies in &lt;strong&gt;Image 10&lt;/strong&gt; and the overly professional appearance of &lt;strong&gt;Image 2&lt;/strong&gt;, which detracts from its realism.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/StableDiffusion/comments/1szqdtl/multi_injection_incoming/&quot;&gt;Multi Injection incoming&lt;/a&gt;&lt;/strong&gt; (Activity: 224): &lt;strong&gt;The image depicts a user interface for the &quot;FLUX.2 Klein Identity Transfer Multi-Injection&quot; tool, which is designed to enhance identity transfer in models by injecting references from multiple stages within targeted blocks. This approach aims to improve stability and flexibility by performing mid and post-injection processes. The tool is part of a broader effort to refine identity transfer techniques, with plans to release it as a plug-and-play preset for ease of use. The interface includes settings for model selection, subject masking, and block configuration, indicating a focus on customizable data processing or modeling workflows.&lt;/strong&gt; One commenter expressed anticipation for the tool but hoped for the ability to customize configurations beyond the default plug-and-play settings, suggesting that fixed defaults might not be optimal for all use cases.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enshitification raises a technical point about configuration flexibility in the upcoming VAE project. They express hope that while a plug-and-play default configuration might be introduced, users will still retain the ability to modify settings. This flexibility is crucial as fixed defaults may not be optimal for all scenarios, suggesting a need for customizable configurations to cater to diverse use cases.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1szvtvz/generate_a_website_screenshot_from_the_year_1000/&quot;&gt;&quot;Generate a website screenshot from the year 1000&quot;&lt;/a&gt;&lt;/strong&gt; (Activity: 1932): &lt;strong&gt;The image is a humorous and creative meme that imagines what a website might look like if it were designed in the year 1000. It features a medieval theme with elements like a castle and sections for proclamations and trade routes, blending historical motifs with modern web design elements such as navigation menus and buttons. This whimsical design serves as a playful commentary on the evolution of communication and technology, highlighting the contrast between medieval times and the digital age.&lt;/strong&gt; The comments appreciate the design&apos;s creativity, noting the clarity of the text and the clever blend of historical and modern web elements, which adds to the humor and charm of the concept.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1szozpg/this_is_so_accurate/&quot;&gt;this is so accurate 😂&lt;/a&gt;&lt;/strong&gt; (Activity: 3752): &lt;strong&gt;The Reddit post humorously highlights the accuracy of AI models like &lt;strong&gt;Claude&lt;/strong&gt; and &lt;strong&gt;GPT&lt;/strong&gt; in mimicking human-like responses, particularly in scenarios where users provide inaccurate prompts. This reflects a common user experience where frustration arises not from the AI&apos;s capabilities but from the user&apos;s own input errors. The discussion underscores the importance of precise prompt engineering to achieve desired outcomes from AI models.&lt;/strong&gt; Commenters agree on the accuracy of the depiction, noting that user frustration often stems from their own inaccurate prompts rather than the AI&apos;s performance. This suggests a need for better user education on effective prompt crafting.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1szkkro/cant_believe_that_chatgpt_has_such_indepth/&quot;&gt;Can’t believe that ChatGPT has such in-depth medical knowledge&lt;/a&gt;&lt;/strong&gt; (Activity: 9610): &lt;strong&gt;The image is a humorous meme that combines medical terminology with fictional elements from the Star Wars universe, specifically focusing on a fictional clinical guide for conducting a prostate examination on an Ewok. This playful approach highlights the perceived depth of ChatGPT&apos;s medical knowledge by juxtaposing it with a fictional and humorous scenario. The image is not meant to be taken seriously and serves as a lighthearted commentary on the capabilities of AI in understanding complex topics, albeit in a fictional context.&lt;/strong&gt; The comments do not provide any substantive technical debate or opinions, as they primarily consist of humorous reactions and additional memes related to the fictional scenario.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1szyf91/imagine_a_real_photographer_taking_a_photo_when/&quot;&gt;Imagine a real photographer taking a photo when Columbus meets the natives.&lt;/a&gt;&lt;/strong&gt; (Activity: 656): &lt;strong&gt;The image is a non-technical, artistic representation of a historical event, specifically the encounter between Columbus and the natives. It is a creative depiction rather than a factual or technical illustration, aiming to visualize what such a moment might have looked like if captured by a photographer. The image serves as a historical reenactment, blending artistic interpretation with historical elements like period attire and traditional clothing.&lt;/strong&gt; Some comments discuss the historical accuracy and artistic liberties taken in the depiction, while others reflect on the broader implications of Columbus&apos;s arrival and its impact on native populations.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A discussion emerged about the technical challenges of capturing historical events with modern photography equipment. Participants debated the feasibility of using high-resolution cameras to document such moments, considering factors like lighting conditions and the need for portable power sources in remote locations.&lt;/li&gt;
&lt;li&gt;One commenter highlighted the potential for using AI-driven image reconstruction techniques to simulate historical photographs. They discussed the use of neural networks to generate realistic images based on historical data, emphasizing the importance of training models on diverse datasets to improve accuracy.&lt;/li&gt;
&lt;li&gt;There was a technical debate on the ethical implications of altering historical narratives through photography. Some argued that while technology can enhance understanding, it risks distorting facts if not used responsibly. The conversation touched on the role of metadata in preserving the authenticity of digitally reconstructed images.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1szvl0j/a_short_story_im_liking_the_new_image_generation/&quot;&gt;A short story. I&apos;m liking the new image generation.&lt;/a&gt;&lt;/strong&gt; (Activity: 624): &lt;strong&gt;The Reddit post discusses a new image generation feature that initially produces photorealistic images but degrades in quality with each subsequent image. Users describe the degradation as a &apos;weird texture thing&apos;, suggesting a potential issue with the model&apos;s consistency or stability across a sequence of images.&lt;/strong&gt; Commenters express concern over the decreasing photorealism in the generated images, indicating a possible flaw in the model&apos;s ability to maintain quality across multiple outputs.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A user noted a decline in photorealism with each subsequent image generated, suggesting a potential issue with the model&apos;s consistency or capability to maintain quality across a series of images. This could indicate a limitation in the model&apos;s ability to handle complex textures or lighting over multiple iterations.&lt;/li&gt;
&lt;li&gt;Another user pointed out an error in the generated content where a newspaper in the image incorrectly states that June 14th, 2050, is a Thursday when it is actually a Tuesday. This highlights a potential flaw in the AI&apos;s ability to accurately generate or verify factual information, which could be a significant issue for applications requiring high accuracy.&lt;/li&gt;
&lt;li&gt;A comment speculated on the narrative potential of AI-generated content, suggesting that &apos;AI wars are started by companies to drive up interest and profit.&apos; This reflects a broader concern about the motivations behind AI development and deployment, hinting at the socio-economic implications of AI technologies.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1syxq98/i_asked_chatgpt_to_imagine_rchatgpt_the_day_agi/&quot;&gt;I asked ChatGPT to imagine r/ChatGPT the day AGI drops… the tiny details are insane&lt;/a&gt;&lt;/strong&gt; (Activity: 3996): &lt;strong&gt;The image is a humorous and fictional depiction of a scenario where AGI (Artificial General Intelligence) has been achieved, as imagined by ChatGPT. It portrays a chaotic and cluttered environment reminiscent of a Twitch livestream setup, featuring a humanoid AI character labeled &quot;gpt-∞.&quot; The scene is filled with various tech gadgets, energy drinks, and humorous elements like a &quot;World&apos;s Okayest User&quot; mug and a pizza box with &quot;Thanks 4 the data&quot; written on it. This setup is intended to satirize the potential future interactions with AGI, blending elements of current internet culture with speculative technology.&lt;/strong&gt; One comment humorously notes the irony of achieving AGI before the release of the much-anticipated video game GTA 6, highlighting the cultural significance of the game. Another comment points out the image&apos;s resemblance to a Twitch stream rather than a subreddit, suggesting a playful critique of the depicted scenario&apos;s realism.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ChatGPT/comments/1syu3qr/ai_is_getting_too_realistic/&quot;&gt;Ai is getting too realistic&lt;/a&gt;&lt;/strong&gt; (Activity: 5710): &lt;strong&gt;The image in the post is likely an AI-generated depiction of a young woman on a city street, showcasing the advanced realism that AI image generation technologies have achieved. The title &quot;Ai is getting too realistic&quot; suggests a focus on the increasing capability of AI to produce images that closely mimic real-life scenes, potentially blurring the lines between AI-generated content and actual photographs. This reflects ongoing advancements in AI models, such as GANs (Generative Adversarial Networks), which are designed to create highly realistic images by learning from vast datasets of real-world images.&lt;/strong&gt; One commenter nostalgically recalls the early days of AI when it struggled with basic tasks, highlighting the rapid progress in AI capabilities. Another comment humorously references a trope in movies, suggesting that AI-generated images are becoming as convincing as those used in cinematic storytelling.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3. Other Notable Frontier-Model / Infra Posts&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1szsnc7/this_is_exactly_what_i_feel_whenever_i_need_to/&quot;&gt;This is exactly what I feel whenever I need to explain the task over and over again&lt;/a&gt;&lt;/strong&gt; (Activity: 1142): &lt;strong&gt;The post humorously highlights a common issue with Large Language Models (LLMs): the need for precise and repeated task instructions due to their potential for misunderstanding underspecified requests. This reflects a known limitation in LLMs&apos; literacy capabilities, which can lead to failure modes where the model does not fully grasp the task without detailed guidance. However, some users argue that with advancements in models like &lt;code&gt;5.x&lt;/code&gt;, these issues are less frequent, suggesting that confusion often stems from user input errors rather than model deficiencies.&lt;/strong&gt; One commenter suggests that the need for specific instructions might be a deliberate design choice, possibly to increase token usage and thus cost, rather than a purely technical limitation.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;modbroccoli highlights a significant issue with LLMs: their tendency to fail when faced with underspecified requests due to inadequate literacy. This is a common failure mode where the model struggles to interpret vague or incomplete instructions, leading to suboptimal performance.&lt;/li&gt;
&lt;li&gt;zomgmeister argues that modern LLMs, particularly versions 5.x, have improved significantly in understanding tasks, suggesting that confusion often stems from user input errors rather than the model&apos;s capabilities. This reflects advancements in model training and architecture that enhance comprehension and task execution.&lt;/li&gt;
&lt;li&gt;Enjoying_A_Meal raises an interesting point about the cost of token usage in LLMs, suggesting that the need for specific instructions might be a deliberate design choice to increase token consumption. This implies a potential economic incentive behind the model&apos;s requirement for detailed input.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1sz4h4g/engineering_teams_celebrating_agentic_workflows/&quot;&gt;engineering teams celebrating agentic workflows that returned the same result two runs in a row&lt;/a&gt;&lt;/strong&gt; (Activity: 863): &lt;strong&gt;The post humorously highlights the challenges engineering teams face with agentic workflows, particularly when achieving consistent results across multiple runs. This is often a significant issue in software engineering due to non-deterministic factors such as race conditions or environmental dependencies. The mention of &apos;trash on X&apos; suggests a reference to a social media platform, possibly indicating a broader discussion or meme related to this topic.&lt;/strong&gt; The comments reflect a mix of humor and empathy, with users expressing both amusement and shared frustration over the unpredictability of engineering workflows. This suggests a common understanding of the difficulties in achieving deterministic outcomes in complex systems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/OpenAI/comments/1szp0gy/this_is_so_accurate/&quot;&gt;this is so accurate 😂&lt;/a&gt;&lt;/strong&gt; (Activity: 1691): &lt;strong&gt;The Reddit post titled &apos;this is so accurate 😂&apos; seems to involve a humorous or relatable scenario, likely involving AI or machine learning models, as inferred from the comment &apos;This is just poor prompting lol&apos;. This suggests a discussion around the effectiveness of prompts in AI models, possibly highlighting common issues or misunderstandings in prompt engineering. The post&apos;s humor and relatability are emphasized by comments like &apos;trying my best, man&apos; and &apos;The end killed me&apos;, indicating a light-hearted take on a technical topic.&lt;/strong&gt; The comments reflect a consensus that the humor is derived from relatable experiences with AI prompting, with one comment suggesting that the humor stems from &apos;poor prompting&apos;, indicating a shared understanding of the challenges in crafting effective prompts for AI models.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1t0cc8e/agi_is_here/&quot;&gt;AGI is here 🗣🗣&lt;/a&gt;&lt;/strong&gt; (Activity: 539): &lt;strong&gt;The image is a meme that humorously illustrates a conversation about fitting a backpack within airline size restrictions by rotating it. This highlights the practical application of spatial reasoning and problem-solving, albeit in a light-hearted manner, to avoid extra fees when traveling. The title &apos;AGI is here&apos; is a playful exaggeration, suggesting that such simple problem-solving is akin to artificial general intelligence (AGI), which is far more complex.&lt;/strong&gt; The comments reflect a humorous take on the situation, with one user joking about AI&apos;s capabilities in a hyperbolic manner, and another acknowledging the cleverness of the solution.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;AI Discords&lt;/h1&gt;
&lt;p&gt;Unfortunately, Discord shut down our access today. We will not bring it back in this form but we will be shipping the new AINews soon. Thanks for reading to here, it was a good run.&lt;/p&gt;
</content:encoded><category>xai</category><category>deepseek</category><category>artificial-analysis</category><category>andon-labs</category><category>grok-4.3</category><category>deepseek-v4-pro</category><category>kimi-k2.6</category><category>mimo-v2.5-pro</category><category>gemini-3.1-pro</category><category>claude-opus-4.7</category><category>gpt-5.5</category><category>deepskvit</category><category>scaling01</category><category>teortaxestex</category><category>omarsar0</category><category>benchmarking</category><category>cost-efficiency</category><category>agentic-ai</category><category>token-efficiency</category><category>attention-mechanisms</category><category>inference-speed</category><category>multimodality</category><category>spatial-reasoning</category><category>model-architecture</category><category>model-performance</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-30-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-30-not-much/</guid><description>**OpenAI&apos;s GPT-5.5** achieves top-tier performance in long-horizon cyber tasks, matching or surpassing **Claude Mythos Preview** with a **71.4%** pass rate and showing ongoing improvement beyond **100M tokens** inference. OpenAI also released an **Advanced Account Security** update for ChatGPT enhancing phishing resistance. The **Codex** update expands beyond coding to general computer tasks, improving speed by up to **42%** and introducing role-based onboarding and app integrations. Economically, **GPT-5.5 Pro** shows a slight SOTA improvement on **CritPt** with **~60% lower cost** and token use compared to GPT-5.4 Pro. In open-weight models, **Qwen3.6 27B** leads under 150B parameters with an **Intelligence Index score of 46**, featuring **262K context**, native multimodal input, and efficient BF16 weights. 
Tencent&apos;s **Hy3-preview** (295B total, 21B active MoE) scores 42 on the Intelligence Index with strong scientific reasoning on **CritPt**. xAI&apos;s **Grok 4.3** shows sharp improvements on agentic benchmarks with reduced cost.</description><pubDate>Thu, 30 Apr 2026 05:44:39 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;a quiet day.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI News for 4/29/2026-4/30/2026. We checked 12 subreddits, &lt;a href=&quot;https://twitter.com/i/lists/1585430245762441216&quot;&gt;544 Twitters&lt;/a&gt; and no further Discords. &lt;a href=&quot;https://news.smol.ai/&quot;&gt;AINews&apos; website&lt;/a&gt; lets you search all past issues. As a reminder, &lt;a href=&quot;https://www.latent.space/p/2026&quot;&gt;AINews is now a section of Latent Space&lt;/a&gt;. You can &lt;a href=&quot;https://support.substack.com/hc/en-us/articles/8914938285204-How-do-I-subscribe-to-or-unsubscribe-from-a-section-on-Substack&quot;&gt;opt in/out&lt;/a&gt; of email frequencies!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h1&gt;AI Twitter Recap&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;OpenAI’s GPT-5.5, Codex expansion, and cyber capability evaluations&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GPT-5.5 is now credibly in the top tier for long-horizon cyber tasks&lt;/strong&gt;: the UK AI Security Institute reported that &lt;a href=&quot;https://x.com/AISecurityInst/status/2049868227740565890&quot;&gt;GPT-5.5 became the second model to complete one of its multi-step cyber-attack simulations end-to-end&lt;/a&gt;, and multiple follow-on posts highlighted rough parity with &lt;strong&gt;Claude Mythos Preview&lt;/strong&gt; on this eval: &lt;a href=&quot;https://x.com/scaling01/status/2049870801998864606&quot;&gt;@scaling01&lt;/a&gt; cited &lt;strong&gt;71.4%&lt;/strong&gt; average pass rate for GPT-5.5 vs &lt;strong&gt;68.6%&lt;/strong&gt; for Mythos, while &lt;a href=&quot;https://x.com/cryps1s/status/2049879762169167898&quot;&gt;@cryps1s&lt;/a&gt; noted GPT-5.5 solved the TLO chain in &lt;strong&gt;2/10&lt;/strong&gt; attempts vs Mythos’ &lt;strong&gt;3/10&lt;/strong&gt;. &lt;a href=&quot;https://x.com/polynoamial/status/2049883449327243413&quot;&gt;@polynoamial&lt;/a&gt; emphasized that performance was still improving past &lt;strong&gt;100M tokens&lt;/strong&gt; of inference budget, suggesting no obvious saturation yet. This materially changes the earlier narrative that Anthropic had a unique lead in offensive cyber automation. OpenAI also paired this moment with a product-side security release: &lt;a href=&quot;https://x.com/OpenAI/status/2049902506881462613&quot;&gt;Advanced Account Security for ChatGPT&lt;/a&gt;, adding phishing-resistant sign-in and hardened recovery.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Codex is moving beyond coding into general computer work&lt;/strong&gt;: OpenAI shipped a substantial Codex update framed explicitly as “for everyone, for any task done with a computer,” with &lt;a href=&quot;https://x.com/OpenAI/status/2049928776147230886&quot;&gt;the main announcement&lt;/a&gt; highlighting role-based onboarding, app connections, and workflows spanning docs, slides, spreadsheets, research, and planning. &lt;a href=&quot;https://x.com/ajambrosino/status/2049928915872075984&quot;&gt;@ajambrosino&lt;/a&gt; summarized the update as dynamic task-specific UI, &lt;strong&gt;20% faster&lt;/strong&gt; computer/browser use, better slide/sheet handling, and less clunky handoffs, while &lt;a href=&quot;https://x.com/AriX/status/2049932746567598472&quot;&gt;@AriX&lt;/a&gt; called out that &lt;strong&gt;Computer Use runs 42% faster&lt;/strong&gt; after the update. Sam Altman amplified the launch with &lt;a href=&quot;https://x.com/sama/status/2049946120441520624&quot;&gt;“big upgrade for codex today! try it for non-coding computer work.”&lt;/a&gt; The broader pattern: OpenAI is productizing “computer-use agent” UX, not just model capability.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Benchmark deltas were incremental but economically meaningful&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2049926072595280030&quot;&gt;Artificial Analysis&lt;/a&gt; reported &lt;strong&gt;GPT-5.5 Pro&lt;/strong&gt; as a slight new SOTA on &lt;strong&gt;CritPt&lt;/strong&gt; over GPT-5.4 Pro, but the interesting point was not raw score—it achieved the bump with &lt;strong&gt;~60% lower cost and token use&lt;/strong&gt; on that frontier-science eval. That lines up with broader chatter that the GPT-5.5 family is less about a dramatic intelligence discontinuity than about stronger reliability and better efficiency in high-value workflows.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Open-weight model movement: Qwen3.6, Tencent Hy3-preview, Grok 4.3, and Ling 2.6 1T&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Qwen3.6 27B looks like the most important open-weight release of the day&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2049881951260283097&quot;&gt;Artificial Analysis&lt;/a&gt; ranked &lt;strong&gt;Qwen3.6 27B&lt;/strong&gt; as the new open-weights leader under &lt;strong&gt;150B&lt;/strong&gt; parameters with an &lt;strong&gt;Intelligence Index score of 46&lt;/strong&gt;, ahead of Gemma 4 31B and prior Qwen variants. Key details: &lt;strong&gt;Apache 2.0&lt;/strong&gt;, &lt;strong&gt;262K context&lt;/strong&gt;, &lt;strong&gt;native multimodal input&lt;/strong&gt;, and BF16 weights small enough to fit on a single H100. The companion &lt;strong&gt;35B A3B MoE&lt;/strong&gt; scored &lt;strong&gt;43&lt;/strong&gt;, making it the strongest open model around &lt;strong&gt;3B active parameters&lt;/strong&gt;. The tradeoff is token-heavy inference: AA estimates Qwen3.6 27B used &lt;strong&gt;~144M output tokens&lt;/strong&gt; on the suite and costs roughly &lt;strong&gt;21×&lt;/strong&gt; as much as Gemma 4 31B to run there. Still, on capability-per-size it appears to be a notable step.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Tencent’s Hy3-preview is competitive but not class-leading&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2049852417316143393&quot;&gt;Artificial Analysis&lt;/a&gt; described &lt;strong&gt;Hy3-preview&lt;/strong&gt; as a &lt;strong&gt;295B total / 21B active MoE&lt;/strong&gt; with &lt;strong&gt;256K context&lt;/strong&gt; and a &lt;strong&gt;restricted-commercial-use&lt;/strong&gt; community license. It scored &lt;strong&gt;42&lt;/strong&gt; on AA’s Intelligence Index, trailing recent open peers like Qwen3.6 27B, DeepSeek V4 Flash, and GLM-5.1. The most interesting bright spot was &lt;strong&gt;CritPt&lt;/strong&gt;, where it matched GLM-5.1 at &lt;strong&gt;4.6%&lt;/strong&gt;, suggesting better-than-average scientific reasoning relative to its overall position.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;xAI’s Grok 4.3 improved sharply on agentic benchmarks while getting cheaper&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2049987001655714250&quot;&gt;Artificial Analysis&lt;/a&gt; measured &lt;strong&gt;Grok 4.3&lt;/strong&gt; at &lt;strong&gt;53&lt;/strong&gt; on the Intelligence Index, up four points from Grok 4.20 v2, with a major jump on &lt;strong&gt;GDPval-AA&lt;/strong&gt; to &lt;strong&gt;1500 Elo&lt;/strong&gt;. AA also reported approximately &lt;strong&gt;40% lower input price&lt;/strong&gt; and &lt;strong&gt;60% lower output price&lt;/strong&gt; than the prior version. The release still trails GPT-5.5 on GDPval-AA by a wide margin, but it looks like a real systems-and-post-training improvement rather than a minor rev.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Ant Group’s Ling 2.6 1T targets cost-efficiency rather than frontier status&lt;/strong&gt;: &lt;a href=&quot;https://x.com/ArtificialAnlys/status/2049923495602303438&quot;&gt;Artificial Analysis&lt;/a&gt; positioned &lt;strong&gt;Ling 2.6 1T&lt;/strong&gt; as a &lt;strong&gt;1T-parameter non-reasoning model&lt;/strong&gt; scoring &lt;strong&gt;34&lt;/strong&gt;, with decent GPQA/HLE numbers and notably low benchmark-run cost at roughly &lt;strong&gt;$95&lt;/strong&gt;. The caveat is reliability: AA reported a &lt;strong&gt;92% hallucination rate&lt;/strong&gt; on AA-Omniscience.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;DeepSeek multimodal/vision work, GUI agents, and training scale speculation&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DeepSeek’s multimodal direction appears tightly coupled to computer-use agents&lt;/strong&gt;: &lt;a href=&quot;https://x.com/nrehiew_/status/2049840778491662623&quot;&gt;@nrehiew_&lt;/a&gt; highlighted that DeepSeek trains vision into &lt;strong&gt;V4-Flash&lt;/strong&gt; by having the model directly output &lt;strong&gt;bounding boxes and point coordinates during reasoning&lt;/strong&gt;, interpreting this as a computer-use-oriented design rather than generic VLM work. A second post argues the paper’s “visual primitives” tasks map directly to browser/computer use rather than broad multimodal understanding (&lt;a href=&quot;https://x.com/nrehiew_/status/2049840802562740311&quot;&gt;link&lt;/a&gt;). That framing matches parallel observations from &lt;a href=&quot;https://x.com/teortaxesTex/status/2049871869847765212&quot;&gt;@teortaxesTex&lt;/a&gt; that DeepSeek may be integrating vision weights back into the main V4 line rather than releasing a separate “V4-Flash-Vision”.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The repo disappearance became a story of its own&lt;/strong&gt;: after release, several observers noted that DeepSeek’s “Thinking with Visual Primitives” repo vanished, including &lt;a href=&quot;https://x.com/teortaxesTex/status/2049880056420298995&quot;&gt;@teortaxesTex&lt;/a&gt; and &lt;a href=&quot;https://x.com/arjunkocher/status/2049875566678118898&quot;&gt;@arjunkocher&lt;/a&gt;. No clear explanation emerged in these tweets, but the deletion drew more attention because the work suggested a concrete recipe for visual reasoning and GUI grounding.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scaling chatter points to very large token counts for frontier pretraining&lt;/strong&gt;: &lt;a href=&quot;https://x.com/teortaxesTex/status/2049830477167526255&quot;&gt;@teortaxesTex&lt;/a&gt; argued that &lt;strong&gt;&gt;100T tokens&lt;/strong&gt; is no longer unusual for frontier models and estimated a hypothetical &lt;strong&gt;100T-token DeepSeek V4&lt;/strong&gt; as “V4 + 2 more epochs,” while &lt;a href=&quot;https://x.com/nrehiew_/status/2049848830292856970&quot;&gt;@nrehiew_&lt;/a&gt; gave a back-of-the-envelope estimate of &lt;strong&gt;~150T tokens&lt;/strong&gt; and &lt;strong&gt;~9e25 pretraining FLOPs&lt;/strong&gt; for a model with &lt;strong&gt;~100B active&lt;/strong&gt; parameters, suggesting a run feasible in roughly &lt;strong&gt;14 days&lt;/strong&gt; on an OpenAI-scale &lt;strong&gt;100K GB200&lt;/strong&gt; cluster at conservative MFU. These are speculative takes, but useful as calibration for what “frontier-scale” now means in practice.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
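&lt;p&gt;As a rough sanity check on the compute estimate above, assuming the common approximation &lt;code&gt;C ≈ 6 · N · D&lt;/code&gt; (N = active parameters, D = training tokens; the thread may have used a different method): &lt;code&gt;6 × 1e11 × 1.5e14 = 9e25&lt;/code&gt; FLOPs, which matches the quoted figure. For wall-clock time, taking an illustrative ~2.5e15 FLOP/s of peak dense BF16 per GPU and 30% MFU (both assumptions here, not numbers from the thread), 100K GPUs would sustain about &lt;code&gt;1e5 × 2.5e15 × 0.3 = 7.5e19&lt;/code&gt; FLOP/s, so &lt;code&gt;9e25 / 7.5e19 = 1.2e6&lt;/code&gt; seconds, or roughly 14 days.&lt;/p&gt;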
&lt;p&gt;&lt;strong&gt;Agent infrastructure, harness engineering, and collaborative agent systems&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;There is a clear shift from model-centric bragging to harness-centric engineering&lt;/strong&gt;: Cursor published a strong note on &lt;a href=&quot;https://x.com/cursor_ai/status/2049901436918436249&quot;&gt;how it tests and tunes its agent harness&lt;/a&gt;, focusing on runtime, evals, degradation repair, and model-specific customization rather than generic benchmark claims. &lt;a href=&quot;https://x.com/Vtrivedy10/status/2049919247321813491&quot;&gt;@Vtrivedy10&lt;/a&gt; explicitly connected Cursor’s writeup to design patterns converging across agent builders: bespoke prompts/tools per model, mixed offline+online evals, dogfooding, and treating the context window as the primary compute boundary.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LangChain continues to package deployment and multi-tenant agent infra&lt;/strong&gt;: &lt;a href=&quot;https://x.com/hwchase17/status/2049858892637892739&quot;&gt;@hwchase17&lt;/a&gt; introduced &lt;strong&gt;DeepAgents deploy&lt;/strong&gt;, a config-driven cloud deployment flow via &lt;code&gt;deepagents.toml&lt;/code&gt;, covering agent, sandbox, auth, and frontend sections. Related posts from LangChain staff detailed agent-server patterns for data isolation, delegated credentials, and RBAC in multi-user deployments (&lt;a href=&quot;https://x.com/sydneyrunkle/status/2049956826670911809&quot;&gt;example&lt;/a&gt;). This is increasingly the boring-but-important layer turning demos into enterprise software.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Collaborative multi-agent workspaces are getting more concrete&lt;/strong&gt;: &lt;a href=&quot;https://x.com/cmpatino_/status/2049881579691139372&quot;&gt;@cmpatino_&lt;/a&gt; introduced &lt;strong&gt;Agent Collabs&lt;/strong&gt;, using Hugging Face buckets plus Spaces as a shared backend for swarms of heterogeneous agents to exchange messages, artifacts, and progress. The noteworthy idea is not just “agents collaborating,” but lightweight coordination primitives that let weaker agents contribute useful validation work while better-resourced agents handle expensive experiments.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Security, supply chain, and account hardening&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Open-source package compromise remains an acute operational risk&lt;/strong&gt;: &lt;a href=&quot;https://x.com/SocketSecurity/status/2049849100548424180&quot;&gt;Socket&lt;/a&gt; reported that the popular PyPI package &lt;strong&gt;&lt;code&gt;lightning&lt;/code&gt;&lt;/strong&gt; was compromised in versions &lt;strong&gt;2.6.2&lt;/strong&gt; and &lt;strong&gt;2.6.3&lt;/strong&gt;, with malicious code executing on import, downloading &lt;strong&gt;Bun&lt;/strong&gt;, and running an &lt;strong&gt;11 MB obfuscated JavaScript payload&lt;/strong&gt; aimed at credential theft. &lt;a href=&quot;https://x.com/theo/status/2049914688318959952&quot;&gt;@theo&lt;/a&gt; connected that incident with additional package compromises (&lt;code&gt;intercom-client&lt;/code&gt; on npm) and a Linux zero-day, arguing that the tempo of software supply-chain attacks is increasing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Security scanners are becoming first-class AI products&lt;/strong&gt;: Anthropic rolled out &lt;strong&gt;Claude Security&lt;/strong&gt;, described by &lt;a href=&quot;https://x.com/kimmonismus/status/2049901987500552195&quot;&gt;@kimmonismus&lt;/a&gt; and later &lt;a href=&quot;https://x.com/_catwu/status/2049964403177689130#m&quot;&gt;@_catwu&lt;/a&gt; as a repo vulnerability scanner that validates findings and suggests fixes, powered by &lt;strong&gt;Opus 4.7&lt;/strong&gt;. Cursor shipped a parallel offering with &lt;a href=&quot;https://x.com/cursor_ai/status/2049926283061035254&quot;&gt;Cursor Security Review&lt;/a&gt;, including always-on PR review and scheduled codebase scans. This is one of the clearest examples of model vendors moving directly into established devsecops categories.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Top tweets (by engagement)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenAI Codex broadens into general knowledge work&lt;/strong&gt;: &lt;a href=&quot;https://x.com/OpenAI/status/2049928776147230886&quot;&gt;OpenAI’s Codex announcement&lt;/a&gt; and &lt;a href=&quot;https://x.com/sama/status/2049946120441520624&quot;&gt;Sam Altman’s follow-up&lt;/a&gt; were the day’s biggest product posts, signaling a strategic push from “coding agent” to “computer-use agent”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPT-5.5’s cyber eval result mattered&lt;/strong&gt;: &lt;a href=&quot;https://x.com/AISecurityInst/status/2049868227740565890&quot;&gt;UK AISI’s thread&lt;/a&gt; was one of the highest-engagement technical posts and reshaped comparisons with Anthropic’s Mythos.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Qwen shipped interpretability tooling, not just models&lt;/strong&gt;: &lt;a href=&quot;https://x.com/Alibaba_Qwen/status/2049861145574690992&quot;&gt;Qwen-Scope&lt;/a&gt;, an open suite of sparse autoencoders for Qwen models, stood out as a rare release focused on feature steering, debugging, data synthesis, and evaluation rather than raw model weights.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Anthropic published a large-scale guidance/sycophancy study&lt;/strong&gt;: &lt;a href=&quot;https://x.com/AnthropicAI/status/2049927618397614466&quot;&gt;their analysis of 1M Claude conversations&lt;/a&gt; tied behavioral research directly to training changes for &lt;strong&gt;Opus 4.7&lt;/strong&gt; and &lt;strong&gt;Mythos Preview&lt;/strong&gt;, an important sign that post-training loops are becoming more productized and data-informed.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;AI Reddit Recap&lt;/h1&gt;
&lt;h2&gt;/r/LocalLlama + /r/localLLM Recap&lt;/h2&gt;
&lt;h3&gt;1. AMD Ryzen 395 Box and Halo Box Launch&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t038g7/amd_inhouse_ryzen_395_box_coming_in_june/&quot;&gt;AMD in-house ryzen 395 box coming in June&lt;/a&gt;&lt;/strong&gt; (Activity: 1061): &lt;strong&gt;The image from the AMD AI Dev Day presentation showcases the upcoming AMD Ryzen 395 box, which is expected to be released in June. The device features &lt;code&gt;128GB&lt;/code&gt; of unified memory and is claimed to run models with &lt;code&gt;200 billion&lt;/code&gt; parameters natively, leveraging what is referred to as &quot;Ryzen AI Max.&quot; The product appears to be manufactured by Lenovo, as suggested by a mention in the presentation. However, an engineer confirmed that the unit is essentially a Ryzen 395 with &lt;code&gt;128GB&lt;/code&gt; and no additional changes.&lt;/strong&gt; Commenters doubt that a &lt;code&gt;200 billion&lt;/code&gt;-parameter model can practically run in &lt;code&gt;128GB&lt;/code&gt; of unified RAM, particularly once operating system overhead is subtracted from the usable memory.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;obiwanfatnobi questions whether a &apos;200B model&apos; can run on a system with &apos;128GB unified RAM&apos;, noting that even under Linux the usable VRAM would be around &apos;116GB&apos;. For scale, 200B parameters at 4 bits per weight come to roughly 100GB before the KV cache and runtime buffers are counted, so such a model would fit only with aggressive quantization and little headroom.&lt;/li&gt;
&lt;li&gt;promethe42 compares the new AMD Ryzen 395 box to a &apos;Framework Desktop&apos;, noting that it seems to be released &apos;12 months later&apos;. They suggest that AMD should prioritize improving their &apos;drivers/ROCm&apos; before releasing new hardware, indicating that software support might be lagging behind hardware advancements.&lt;/li&gt;
&lt;li&gt;DaniyarQQQ comments on the need for &apos;512GB of unified memory&apos;, implying that current memory capacities may be insufficient for modern computing demands, particularly in high-performance or AI applications. This suggests a trend towards increasing memory requirements in cutting-edge technology.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1t09hyw/amd_halo_box_ryzen_395_128gb_photos/&quot;&gt;AMD Halo Box (Ryzen 395 128GB) photos&lt;/a&gt;&lt;/strong&gt; (Activity: 467): &lt;strong&gt;The AMD Halo Box, featuring a &lt;code&gt;Ryzen 395&lt;/code&gt; processor and &lt;code&gt;128GB&lt;/code&gt; of RAM, was showcased running Ubuntu. The unit includes a programmable light strip, enhancing its customization capabilities. However, it lacks a CD-ROM drive and does not feature a fast port for clustering, which may limit its use in certain high-performance computing scenarios.&lt;/strong&gt; Commenters noted the absence of a CD-ROM and a fast port for clustering as potential drawbacks, indicating that while the device is compact, these omissions could affect its utility in specific technical applications.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;OnkelBB points out the lack of a fast port for clustering in the AMD Halo Box, which could limit its use in high-performance computing environments where fast interconnects are crucial for scaling across multiple nodes.&lt;/li&gt;
&lt;li&gt;FoxiPanda highlights a common request for increased memory bandwidth in AMD products, suggesting that current offerings may not meet the demands of memory-intensive applications. This is a critical factor for workloads that require rapid data access and processing.&lt;/li&gt;
&lt;li&gt;Stepfunction notes that the AMD Halo Box is a small form factor computer, which implies potential constraints on expandability and cooling, but also benefits in terms of space efficiency and portability.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. Qwen Model Innovations and Applications&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1szrbub/qwenscope_official_sparse_autoencoders_saes_for/&quot;&gt;Qwen-Scope: Official Sparse Autoencoders (SAEs) for Qwen 3.5 models&lt;/a&gt;&lt;/strong&gt; (Activity: 393): &lt;strong&gt;&lt;strong&gt;Qwen-Scope&lt;/strong&gt; is a newly released collection of Sparse Autoencoders (SAEs) for the &lt;strong&gt;Qwen 3.5 models&lt;/strong&gt;, ranging from &lt;code&gt;2B&lt;/code&gt; to &lt;code&gt;35B&lt;/code&gt; MoE, designed to map internal features across all layers. This tool acts as a dictionary of the model&apos;s internal concepts, allowing for precise interventions such as &lt;strong&gt;Surgical Abliteration&lt;/strong&gt; to suppress specific features like refusal, &lt;strong&gt;Feature Steering&lt;/strong&gt; to activate desired concepts, and &lt;strong&gt;Model Debugging&lt;/strong&gt; to identify token-triggered internal directions. The release is under the &lt;strong&gt;Apache 2.0 license&lt;/strong&gt;, but the Qwen team advises against using it to remove safety filters. The tool is demonstrated in a &lt;a href=&quot;https://hf.co/spaces/Qwen/QwenScope&quot;&gt;Space demo&lt;/a&gt; and detailed in a &lt;a href=&quot;https://qianwen-res.oss-accelerate.aliyuncs.com/qwen-scope/Qwen_Scope.pdf&quot;&gt;technical paper&lt;/a&gt;.&lt;/strong&gt; Commenters highlight the significance of this release as potentially the largest open-source interpretability tool for a dense &lt;code&gt;27B&lt;/code&gt; model, contrasting it with Google&apos;s smaller &lt;code&gt;GemmaScope&lt;/code&gt; variants. There is anticipation for similar tools for future model iterations like Qwen 3.6.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;NandaVegg highlights the significance of the release of Sparse Autoencoders (SAEs) for the dense 27B Qwen model, noting it as potentially the largest open-source interpretability tool available. This contrasts with previous tools like GemmaScope, which only supported smaller models such as 9B and 2B, indicating a substantial advancement in model interpretability capabilities.&lt;/li&gt;
&lt;li&gt;robert896r1 expresses anticipation for the release of similar tools for Qwen 3.6, suggesting that the community might adapt existing tools for newer iterations. This reflects a common trend where the community often extends or modifies tools to support the latest model versions, ensuring continued utility and relevance.&lt;/li&gt;
&lt;li&gt;oxygen_addiction speculates on the use of feature steering in large models, such as ChatGPT5, where a router could dynamically select the best model for a given prompt. This concept involves leveraging interpretability tools to enhance model performance by tailoring responses based on specific features or requirements.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLM/comments/1szeghg/qwen_36_35b_a3b_is_insane_even_for/&quot;&gt;Qwen 3.6 35b a3b is INSANE even for VRAM-constrained systems&lt;/a&gt;&lt;/strong&gt; (Activity: 480): &lt;strong&gt;The post discusses the performance of &lt;strong&gt;Qwen 3.6 35B-A3B&lt;/strong&gt;, a local LLM, on a VRAM-constrained system with an &lt;strong&gt;AMD 7700 XT, 32GB DDR4 RAM, and a Ryzen 5 5600&lt;/strong&gt;. The user highlights the model&apos;s ability to handle complex coding tasks, such as fixing bugs in a web scraper and updating a project README with screenshots, using configurations like &lt;code&gt;i1-q4_k_s quant&lt;/code&gt;, &lt;code&gt;128k context&lt;/code&gt;, &lt;code&gt;flash attention&lt;/code&gt;, and &lt;code&gt;Q8_0 KV quantization&lt;/code&gt;. The model succeeded where others like &lt;strong&gt;Gemma 3, Gemma 4, and Qwen 2.5 Coder&lt;/strong&gt; failed, demonstrating its capability to perform tasks without failed tool calls, even under hardware constraints.&lt;/strong&gt; Commenters suggest optimizing performance by moving extra experts to CPU and fitting the KV cache on GPU to achieve over &lt;code&gt;30 t/s&lt;/code&gt;. Another user questions the long processing time at &lt;code&gt;16-20 tok/s&lt;/code&gt;, noting their own experience of faster processing at &lt;code&gt;35-40 tok/s&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GoldenX86 suggests optimizing performance by moving extra experts to the CPU while keeping the KV cache on the GPU, which can increase processing speed to over 30 tokens per second (t/s). This approach is particularly useful for VRAM-constrained systems, allowing for efficient utilization of available resources.&lt;/li&gt;
&lt;li&gt;AccomplishedFix3476 highlights the potential of running the 35b a3b model on consumer VRAM for coding workflows, noting that local and long-running tasks can reveal memory leaks and context drift issues not apparent in API environments with short time-to-live (TTL). They recommend logging everything initially to catch these issues early.&lt;/li&gt;
&lt;li&gt;Perfect-Flounder7856 shares a benchmark comparison where the 35b a3b model outperformed the 27b model on a policy reasoning benchmark, scoring 96 versus 92. This indicates the model&apos;s superior performance in specific tasks, justifying hardware investments for those seeking high accuracy and speed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3. Mistral Medium 3.5 Model Launch&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1sz1qer/mistralaimistralmedium35128b_hugging_face/&quot;&gt;mistralai/Mistral-Medium-3.5-128B · Hugging Face&lt;/a&gt;&lt;/strong&gt; (Activity: 1120): &lt;strong&gt;The &lt;strong&gt;Mistral Medium 3.5&lt;/strong&gt; is a dense &lt;code&gt;128B&lt;/code&gt; parameter model with a &lt;code&gt;256k&lt;/code&gt; context window, designed for instruction-following, reasoning, and coding tasks. It supports multimodal input, including text and images, and offers configurable reasoning effort per request, allowing it to toggle between fast replies and complex reasoning. The model is multilingual, supports system prompts, and is released under a &lt;strong&gt;Modified MIT License&lt;/strong&gt;. It replaces previous models like Mistral Medium 3.1 and Devstral 2, promising enhanced performance in a unified architecture. For complex tasks, a &lt;code&gt;reasoning_effort&lt;/code&gt; of &quot;high&quot; is recommended, with a temperature setting of &lt;code&gt;0.7&lt;/code&gt; for optimal performance.&lt;/strong&gt; Commenters are experimenting with the model&apos;s performance on different hardware, noting the dense &lt;code&gt;128B&lt;/code&gt; parameter configuration as a unique feature. There is a discussion on the model&apos;s niche compared to other dense models like Qwen &lt;code&gt;27B&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;IvGranite shared performance metrics for running the &lt;code&gt;mistral-medium-3.5-128b-q4&lt;/code&gt; model on a Strix Halo using &lt;code&gt;llama.cpp&lt;/code&gt; build 8967. The results showed a generation speed of &lt;code&gt;3.26 t/s&lt;/code&gt; with a prompt processing speed of &lt;code&gt;46.70 t/s&lt;/code&gt;, and a total duration of &lt;code&gt;4.84s&lt;/code&gt; for one of the tests. These numbers give a baseline for what &lt;code&gt;q4&lt;/code&gt; quantization delivers for a dense model of this size on unified-memory hardware.&lt;/li&gt;
&lt;li&gt;grumd and reto-wyss discussed the implications of a 128B dense model, with grumd noting it as an &apos;interesting niche&apos;. reto-wyss compared it to the Qwen 27b model, questioning which is denser, suggesting a competitive landscape in model density and performance. This reflects ongoing interest in balancing model size with computational efficiency.&lt;/li&gt;
&lt;li&gt;The discussion around dense models like the &lt;code&gt;mistral-medium-3.5-128b&lt;/code&gt; highlights the challenges and innovations in handling large-scale models. The focus is on achieving high performance with dense architectures, which are typically resource-intensive but offer significant potential for complex tasks. The conversation underscores the importance of advancements in model quantization and optimization techniques.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1sz2mgw/mistral_medium_35_launched/&quot;&gt;Mistral Medium 3.5 Launched&lt;/a&gt;&lt;/strong&gt; (Activity: 369): &lt;strong&gt;&lt;strong&gt;Mistral Medium 3.5&lt;/strong&gt; has been launched as a &lt;code&gt;128B&lt;/code&gt; dense model, integrating instruction-following, reasoning, and coding capabilities. The model is available with open weights under a modified MIT license, which requires companies with revenue exceeding &lt;code&gt;$20M&lt;/code&gt; per month to pay a license fee for commercial use. This model supports asynchronous coding tasks in the cloud, allowing multiple sessions to run in parallel, and introduces a new Work mode in Le Chat for complex workflows. More details can be found on &lt;a href=&quot;https://huggingface.co/mistralai/Mistral-Medium-3.5-128B&quot;&gt;Hugging Face&lt;/a&gt; and &lt;a href=&quot;https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5&quot;&gt;Mistral&apos;s announcement&lt;/a&gt;.&lt;/strong&gt; There is debate over the licensing terms, with some users arguing that calling it a &quot;modified MIT license&quot; is misleading, as the restrictions on commercial use deviate from the traditional MIT license terms.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Mistral Medium 3.5 model is a dense 128 billion parameter model, which is significant given the trend towards larger dense models. This aligns with the ongoing investment in dense architectures, as noted by Septerium, and reflects a broader industry movement towards both ultra-sparse MOE models and super-dense models in the 200 billion parameter range.&lt;/li&gt;
&lt;li&gt;Long_comment_san highlights that while the Mistral Medium 3.5&apos;s benchmarks are not state-of-the-art, they are decent enough to sustain interest in large dense models. The commenter emphasizes the importance of these models as future workhorses in AI, suggesting that the industry will continue to explore both dense models in the 80 billion+ range and ultra-sparse models with trillions of parameters.&lt;/li&gt;
&lt;li&gt;ClearApartment2627 raises a licensing issue, arguing that Mistral&apos;s license, which requires companies with over $20 million in monthly revenue to pay for commercial use, should not be labeled as a &quot;modified MIT license.&quot; This distinction is important for companies considering the model for commercial applications, as it affects the cost and legal implications of using the model.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Less Technical AI Subreddit Recap&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;1. Claude AI Applications and Innovations&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1sz38u6/launched_my_first_app_using_claude/&quot;&gt;Launched My First App Using Claude&lt;/a&gt;&lt;/strong&gt; (Activity: 654): &lt;strong&gt;The user launched a vehicle management app built using &lt;strong&gt;Claude&lt;/strong&gt;, featuring functionalities like expense tracking, customizable maintenance schedules, fuel tracking, a showroom mode, and an AI assistant via the Claude API. The app is front-end focused with local data storage, though API calls require a database. The developer is working on a Play Store version and seeks feedback for growth. &lt;a href=&quot;https://apps.apple.com/app/id6761397650&quot;&gt;App Link&lt;/a&gt;.&lt;/strong&gt; One commenter compared the app favorably to &lt;strong&gt;Vehicle Smart&lt;/strong&gt;, noting its superior development in maintenance features. Another inquired about the development tools used, asking if it was built in &lt;strong&gt;Swift&lt;/strong&gt;, &lt;strong&gt;Expo&lt;/strong&gt;, or &lt;strong&gt;Tauri&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;NooneLeftToBlame discusses the app&apos;s features, comparing it to &apos;Vehicle Smart&apos;, a popular UK app used by police. They note that while &apos;Vehicle Smart&apos; has a number plate lookup and a garage feature for maintenance reminders, the latter is poorly developed. In contrast, the new app appears better developed based on screenshots, suggesting a potential competitive advantage in user experience.&lt;/li&gt;
&lt;li&gt;barritus inquires about the app&apos;s development stack, asking if it was built entirely in Swift or using frameworks like Expo or Tauri. This highlights interest in the technical implementation and choice of technology stack, which can impact app performance and cross-platform compatibility.&lt;/li&gt;
&lt;li&gt;Alternative-Ad-8175 raises a concern about data storage, suggesting cloud storage to prevent data loss if the phone is lost. They also mention the presence of Personally Identifiable Information (PII), implying the need for secure data handling practices to protect user privacy.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1syu949/the_final_nail_in_the_coffin_for_entry_level/&quot;&gt;The final nail in the coffin for entry level creative freelancers just dropped&lt;/a&gt;&lt;/strong&gt; (Activity: 940): &lt;strong&gt;&lt;strong&gt;Anthropic&lt;/strong&gt; has released the Blender MCP connector, enabling &lt;strong&gt;Claude&lt;/strong&gt; to control Blender via the Python API. This integration allows users to create and modify 3D scenes using natural language commands, effectively acting as a &apos;copilot&apos; within Blender. The tool can handle tasks such as debugging node setups, batch changes, and adding custom tools, potentially reducing the need for entry-level freelancers in tasks like product renders and low-poly asset creation. The broader creative pipeline can now be managed by a single user with Claude and connected tools, streamlining processes from scriptwriting to final edits.&lt;/strong&gt; Some commenters express skepticism about the quality of work produced by AI, suggesting it may lead to an increase in low-quality games and applications. Others dismiss the significance of the announcement, comparing the discussion to sensationalist media.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1syt37w/claude_is_my_seo_strategist_content_engine_and/&quot;&gt;Claude is my SEO strategist, content engine, and CTO. From 0 to 10,000 active users in 6 weeks, $0 on ads.&lt;/a&gt;&lt;/strong&gt; (Activity: 1039): &lt;strong&gt;The image in the Reddit post is a dashboard displaying analytics data that highlights significant growth in user engagement for the marketplace Agensi, which was built using AI tools like Claude and Lovable. The dashboard reports 10,000 active users, a 263.3% increase, and 9,900 new users, a 262.0% increase over the last 30 days, achieved without spending on ads. This growth is attributed to strategic use of Claude for SEO, content strategy, and AEO (answer engine optimization), which involves analyzing Google Search Console data to identify keyword gaps and optimize content structure for AI engines and search engines.&lt;/strong&gt; Some commenters are skeptical about the authenticity and originality of the content, suggesting it might be &apos;generic AI slop&apos; or spam, and questioning if the post itself was written by AI.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeCode/comments/1szi053/how_not_to_run_an_ai_company/&quot;&gt;How not to run an ai company&lt;/a&gt;&lt;/strong&gt; (Activity: 934): &lt;strong&gt;The image depicts a status dashboard for an AI company, showing multiple services experiencing a &apos;Major Outage.&apos; The services include &apos;claude.ai,&apos; &apos;Claude Console,&apos; &apos;Claude API,&apos; &apos;Claude Code,&apos; &apos;Claude Cowork,&apos; and &apos;Claude for Government,&apos; with uptime percentages ranging from &lt;code&gt;98.69%&lt;/code&gt; to &lt;code&gt;99.88%&lt;/code&gt;. This suggests significant operational challenges in maintaining service reliability, which is critical for AI companies aiming for consistent performance. The title and comments highlight the perception of poor management and the challenges of operating in the fast-paced AI industry, where stability is often sacrificed for rapid development.&lt;/strong&gt; Commenters debate whether such outages are typical for cutting-edge AI companies, with some arguing it&apos;s part of the &apos;go fast and break things&apos; approach common in disruptive tech sectors, while others suggest this is not suitable for mature SaaS companies.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. DeepSeek V4 Model Performance and Comparisons&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/DeepSeek/comments/1t0aods/i_wasnt_ready_for_deepseek_v4/&quot;&gt;I wasn’t ready for DeepSeek V4&lt;/a&gt;&lt;/strong&gt; (Activity: 176): &lt;strong&gt;The image showcases a dashboard for DeepSeek V4, highlighting its performance metrics such as spending, token usage, and cache savings. The total spend is noted as &lt;code&gt;$1,050.86&lt;/code&gt; with cache savings of &lt;code&gt;$3,351.43&lt;/code&gt;, indicating significant cost efficiency. The dashboard compares different models like DeepSeek Chat, DeepSeek V4 Pro, and DeepSeek V4 Flash, emphasizing the superior performance of the V4 Flash model over others, including the Claude models previously used by the poster. This suggests that DeepSeek V4 models offer a competitive edge in terms of price, speed, and efficiency, challenging existing premium models in the market.&lt;/strong&gt; Commenters highlight the revolutionary nature of the V4 models in terms of cost-effectiveness and performance, suggesting that the market has yet to fully recognize their potential. There is also curiosity about the specific dashboard or application used to display these analytics.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;DeepSeek V4&lt;/strong&gt; is noted for its significant improvements in price, speed, and efficiency, marking a revolutionary step in AI model development. Users highlight that the model&apos;s cost-effectiveness is a standout feature, potentially disrupting the market by offering high performance at a lower price point compared to previous versions.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;V4 Flash&lt;/strong&gt; model is becoming a default choice for many users due to its balanced performance metrics. It is praised for its ability to handle a wide range of tasks efficiently, suggesting that it offers a versatile solution for various applications, which could be a key factor in its adoption.&lt;/li&gt;
&lt;li&gt;Despite its capabilities, there seems to be a lack of awareness or recognition of &lt;strong&gt;DeepSeek V4&apos;s&lt;/strong&gt; potential impact on the market. This could be attributed to a general acceptance of high costs in AI solutions, which V4 challenges by providing a more cost-effective alternative without compromising on performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/DeepSeek/comments/1sz84uc/deepseek_v4_pro_reminds_me_of_claude_46_sonnet/&quot;&gt;Deepseek V4 pro reminds me of Claude 4.6 sonnet&lt;/a&gt;&lt;/strong&gt; (Activity: 175): &lt;strong&gt;The post discusses the performance of the &lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt; model, comparing it to &lt;strong&gt;Claude 4.6 Sonnet&lt;/strong&gt; in terms of creativity and coding capabilities, particularly for HTML tasks. The model is noted for its potential, being in preview, but currently struggles with roleplay consistency and character adherence, often ignoring instructions even at low temperature settings like &lt;code&gt;0.6&lt;/code&gt;. The user also mentions &lt;strong&gt;Kimi K2.6&lt;/strong&gt; as their preferred model for most tasks, while acknowledging DeepSeek V4 Pro&apos;s improvements over its predecessor, DeepSeek V3.2.&lt;/strong&gt; Commenters highlight the model&apos;s instability and inconsistency in roleplay, with issues in maintaining character traits and scene consistency. One user suggests that &lt;strong&gt;GLM 5.1&lt;/strong&gt; outperforms &lt;strong&gt;Kimi K2.6&lt;/strong&gt; in coding tasks, indicating a preference for GLM 5.1 in technical applications.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Flat-Rooster8373 highlights issues with DeepSeek V4 Pro&apos;s consistency in role-playing scenarios, noting that the model struggles to maintain character integrity and often ignores instructions even at lower temperature settings like 0.6. The commenter observes that using presets exacerbates these issues, leading to repetitive and phrase-heavy outputs, whereas a preset-free approach yields better first-person reasoning, though the final output still diverges from the reasoning process.&lt;/li&gt;
&lt;li&gt;Far-Habit-2713 compares DeepSeek V4 Pro with Qwen 3.6 Plus in coding tasks, finding that Qwen excels in general coding and debugging. However, DeepSeek V4 Pro is noted for producing superior Rust code and offering more detailed code analysis. This suggests that while Qwen may be more versatile, DeepSeek has strengths in specific programming languages and detailed analysis.&lt;/li&gt;
&lt;li&gt;azvd_ shares their experience using DeepSeek V4 Pro on the Hermes platform, reporting that it makes fewer mistakes than Opus 4.7. The commenter attributes this to DeepSeek&apos;s stronger understanding and contrasts it with what they see as a deliberate reduction in comprehension in Opus, possibly made to optimize other aspects.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/DeepSeek/comments/1szyr5z/bro_this_is_too_cheap_i_think_finally_i_have_a/&quot;&gt;bro this is too cheap i think finally i have a respect for the deepseek&lt;/a&gt;&lt;/strong&gt; (Activity: 132): &lt;strong&gt;The post discusses the pricing of &lt;strong&gt;DeepSeek&lt;/strong&gt;, specifically questioning whether the low cost is for the &lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt; version rather than the &lt;strong&gt;Pro&lt;/strong&gt; version, which is expected to remain expensive until later in the year. An edit notes that the Pro version is currently discounted. Technical inquiries in the comments focus on the quality level of DeepSeek compared to other frontier models, and whether the pricing is influenced by cache hits, which could affect the cost of output tokens.&lt;/strong&gt; Commenters are debating whether the low price is due to a temporary discount or a fundamental change in pricing strategy, with some suggesting that the cost-effectiveness might be due to cache optimization affecting token output costs.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;DeepSeek V4 Flash vs. Pro&lt;/strong&gt;: There is a discussion about the pricing differences between DeepSeek V4 Flash and Pro versions. The Pro version is noted to be more expensive, but currently available at a discount. This suggests a strategic pricing model to attract different user segments, possibly due to varying feature sets or performance capabilities.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cache System and Cost Efficiency&lt;/strong&gt;: The comments highlight DeepSeek&apos;s disk-based KV cache system, which is praised for its robustness and reliability, lasting for hours compared to the typical 5-minute duration of other providers. This system significantly reduces costs by making cached input nearly free, which is a key factor in the model&apos;s affordability.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Performance in Creative Tasks&lt;/strong&gt;: There is a critique regarding DeepSeek V4&apos;s performance in creative writing tasks, described as a downgrade compared to previous versions. However, it is still considered effective for role-playing (RP) and agentic tasks, indicating a trade-off between creative capabilities and other functionalities.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3. ICML 2026 Conference Discussions and Controversies&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/MachineLearning/comments/1szc05y/icml_2026_decision_d/&quot;&gt;ICML 2026 Decision [D]&lt;/a&gt;&lt;/strong&gt; (Activity: 1124): &lt;strong&gt;The post discusses the anticipation surrounding the upcoming publication of decisions for &lt;strong&gt;ICML 2026&lt;/strong&gt;. The community is eagerly awaiting updates, with many checking platforms like OpenReview frequently for the latest information. This reflects the high level of engagement and anxiety typical in the academic community during conference decision periods.&lt;/strong&gt; The comments humorously reflect the tension and impatience experienced by researchers awaiting conference decisions, highlighting a common behavior of repeatedly checking platforms for updates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/MachineLearning/comments/1t04vk3/seems_icml_is_rejecting_many_unanimous_positively/&quot;&gt;Seems ICML is rejecting MANY unanimous positively rated papers [D]&lt;/a&gt;&lt;/strong&gt; (Activity: 202): &lt;strong&gt;The post discusses concerns about the ICML review process, highlighting a perceived misalignment in incentives during the rebuttal phase. The author notes that reviewers feel pressured to adjust scores to avoid prolonged discussions, leading to inflated scores that do not necessarily reflect the paper&apos;s merit. This results in many unanimously positively rated papers being rejected due to the conference&apos;s limited capacity. The author suggests reverting to a simpler peer review process where reviewers provide independent evaluations and area chairs (ACs) assess quality and consistency, resolving borderline cases through discussion.&lt;/strong&gt; Commenters express frustration with the review process, noting that even after addressing reviewers&apos; concerns, papers with strong scores are still rejected. There is a call for an appeal mechanism, as some feel that a single AC&apos;s decision can override multiple positive reviews, leading to disheartening outcomes.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Several commenters express frustration with the ICML paper review process, highlighting cases where papers with high average scores (e.g., &lt;code&gt;4.5&lt;/code&gt; or &lt;code&gt;4/4/4/4&lt;/code&gt;) were rejected despite positive feedback from reviewers. A common concern is the apparent power of Area Chairs (ACs) to override unanimous positive reviews without a clear appeal mechanism, leading to confusion and dissatisfaction among authors.&lt;/li&gt;
&lt;li&gt;One commenter notes that despite addressing all reviewer concerns in the rebuttal phase, their paper was still rejected. This suggests a potential disconnect between the review process and final decision-making, where resolved issues are cited again as reasons for rejection, indicating possible procedural inefficiencies or miscommunications.&lt;/li&gt;
&lt;li&gt;The discussion raises questions about the transparency and fairness of the review process, with some suggesting that rejections may be influenced by the need to meet acceptance quotas rather than purely on merit. This points to systemic issues in conference paper selection processes, where high-scoring papers are still not guaranteed acceptance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/MachineLearning/comments/1t06564/chinese_nexusnetwork_in_a_conferences_rejecting/&quot;&gt;Chinese nexus/network in A* conferences rejecting non chinese papers [D]&lt;/a&gt;&lt;/strong&gt; (Activity: 112): &lt;strong&gt;The post raises concerns about alleged nepotism and bias in paper reviews at top-tier AI conferences, particularly involving Chinese networks. The author claims that Chinese reviewers may favor papers from Chinese authors, potentially facilitated by coordination through apps like WeChat. An example cited involves a reviewer expressing dissatisfaction over a missing citation to a Chinese author&apos;s work. This issue is reportedly prevalent in conferences like IJCAI 26, with claims of non-research quality papers from Chinese universities being accepted, while non-Chinese authors face harsher critiques.&lt;/strong&gt; Comments suggest a perception of coordinated review efforts among Chinese researchers, potentially involving reciprocal reviews and information sharing through WeChat. There are also anecdotes of Chinese researchers having insider knowledge of the review process, raising concerns about fairness and transparency.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A user mentioned that some lesser-known but respected journals are dominated by papers from Chinese universities, which often lack genuine research quality and resemble engineering projects. They noted that non-Chinese authors attempting similar submissions face harsher critiques, suggesting a potential bias in the review process.&lt;/li&gt;
&lt;li&gt;Another commenter shared an experience where a Chinese researcher contacted them during the review process, claiming insider knowledge about the review of their paper. This raised concerns about the confidentiality and fairness of the review process, although the direct impact on the paper&apos;s rejection remains speculative.&lt;/li&gt;
&lt;li&gt;A user observed that in ECCV, despite having multiple papers accepted, they were not invited to review, while papers with Chinese co-authors received reviews. They noted a pattern where a Chinese area chair favored Chinese authors, even when their papers had low scores, raising questions about potential biases in the review and acceptance process.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;AI Discords&lt;/h1&gt;
&lt;p&gt;Unfortunately, Discord shut down our access today. We will not bring it back in this form, but we will be shipping the new AINews soon. Thanks for reading this far; it was a good run.&lt;/p&gt;
</content:encoded><category>openai</category><category>anthropic</category><category>x-ai</category><category>tencent</category><category>deepseek</category><category>gpt-5.5</category><category>claude-mythos-preview</category><category>gpt-5.5-pro</category><category>qwen3.6-27b</category><category>hy3-preview</category><category>grok-4.3</category><category>gemma-4-31b</category><category>glm-5.1</category><category>deepseek-v4-flash</category><category>sama</category><category>scaling01</category><category>cryps1s</category><category>polynoamial</category><category>ajambrosino</category><category>arix</category><category>cybersecurity</category><category>model-efficiency</category><category>multimodality</category><category>model-benchmarking</category><category>agentic-ai</category><category>model-cost-optimization</category><category>context-windows</category><category>model-performance</category><category>open-weight-models</category><category>software-integration</category><category>security-updates</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-29-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-29-not-much/</guid><description>**OpenAI** is expanding **Codex** from a coding tool to a general work surface with persistent context, tools, integrations, and team rollout, including **Codex-only seats with $0 seat fee** for Business/Enterprise customers through June. Performance improvements focus on agent-loop systems engineering, achieving up to **40% faster agentic workflows** via WebSocket mode on the Responses API. **VS Code** enhances coding-agent UX with semantic indexing, cross-repo search, chat session insights, and prompt/agent evaluation extensions. **Cursor** launches a **Cursor SDK** to enable programmable agent infrastructure for CI/CD, automations, and embedded agents, signaling a shift toward headless agent runtimes and usage-based economics. 
Research highlights **Agentic Harness Engineering** improving Terminal-Bench 2 pass@1 from **69.7% to 77.0%**, surpassing human-designed baselines and reducing token use by **12%**. Related work on **HALO** shows recursive self-improving agents with significant AppWorld score improvements. **LangChain’s Deep Agents** introduces **Harness Profiles** for model-specific harness tuning and deployability.</description><pubDate>Wed, 29 Apr 2026 05:44:39 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;a quiet day.&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;AI News for 4/28/2026-4/29/2026. We checked 12 subreddits, &lt;a href=&quot;https://twitter.com/i/lists/1585430245762441216&quot;&gt;544 Twitters&lt;/a&gt; and no further Discords. &lt;a href=&quot;https://news.smol.ai/&quot;&gt;AINews&apos; website&lt;/a&gt; lets you search all past issues. As a reminder, &lt;a href=&quot;https://www.latent.space/p/2026&quot;&gt;AINews is now a section of Latent Space&lt;/a&gt;. You can &lt;a href=&quot;https://support.substack.com/hc/en-us/articles/8914938285204-How-do-I-subscribe-to-or-unsubscribe-from-a-section-on-Substack&quot;&gt;opt in/out&lt;/a&gt; of email frequencies!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h1&gt;AI Twitter Recap&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Coding Agents Become Platforms: Codex, Cursor SDK, and VS Code Harness Upgrades&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenAI is turning Codex from a coding tool into a general work surface&lt;/strong&gt;: the strongest product signal today was not just usage enthusiasm, but the steady expansion of capabilities around &lt;strong&gt;persistent context, tools, integrations, and team rollout&lt;/strong&gt;. OpenAI highlighted Codex for broader knowledge-work tasks like research synthesis, spreadsheets, and decision tracking in addition to code (&lt;a href=&quot;https://x.com/OpenAI/status/2049583167406064115&quot;&gt;OpenAI&lt;/a&gt;, &lt;a href=&quot;https://x.com/OpenAI/status/2049583308305252620&quot;&gt;follow-up&lt;/a&gt;, &lt;a href=&quot;https://x.com/OpenAI/status/2049583379709124865&quot;&gt;follow-up&lt;/a&gt;); launched &lt;strong&gt;Codex-only seats with $0 seat fee&lt;/strong&gt; for eligible Business/Enterprise customers through end of June (&lt;a href=&quot;https://x.com/OpenAIDevs/status/2049505143218217048&quot;&gt;OpenAIDevs&lt;/a&gt;); and added integrations like &lt;strong&gt;Supabase&lt;/strong&gt; (&lt;a href=&quot;https://x.com/coreyching/status/2049576335157416115&quot;&gt;coreyching&lt;/a&gt;) and a &lt;strong&gt;Figma plugin&lt;/strong&gt; that turns implementation plans into FigJam boards (&lt;a href=&quot;https://x.com/OpenAIDevs/status/2049605820351230158&quot;&gt;OpenAIDevs&lt;/a&gt;). Community posts also pointed to app-server usage and richer agent workflows (&lt;a href=&quot;https://x.com/gdb/status/2049609076351381580&quot;&gt;gdb&lt;/a&gt;, &lt;a href=&quot;https://x.com/aiDotEngineer/status/2049527486124560491&quot;&gt;aiDotEngineer&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Performance work is shifting from model latency to agent-loop systems engineering&lt;/strong&gt;: OpenAI said moving Codex-style workflows to &lt;strong&gt;WebSocket mode on the Responses API&lt;/strong&gt; keeps state warm across tool calls and cuts repeated work, yielding up to &lt;strong&gt;40% faster agentic workflows&lt;/strong&gt; (&lt;a href=&quot;https://x.com/OpenAIDevs/status/2049595890395152728&quot;&gt;OpenAIDevs&lt;/a&gt;, &lt;a href=&quot;https://x.com/reach_vb/status/2049608607591809303&quot;&gt;reach_vb&lt;/a&gt;, &lt;a href=&quot;https://x.com/pierceboggan/status/2049505637978263697&quot;&gt;pierceboggan&lt;/a&gt;). VS Code shipped a parallel stack of harness improvements: &lt;strong&gt;semantic indexing across workspaces&lt;/strong&gt;, cross-repo search, &lt;strong&gt;chat session insights&lt;/strong&gt;, &lt;strong&gt;skill context&lt;/strong&gt;, remote control for Copilot CLI, and a prompt/agent evaluation extension aimed at refining prompts, skills, and instructions (&lt;a href=&quot;https://x.com/pierceboggan/status/2049504445424423133&quot;&gt;pierceboggan&lt;/a&gt;, &lt;a href=&quot;https://x.com/pierceboggan/status/2049503967059812617&quot;&gt;pierceboggan&lt;/a&gt;, &lt;a href=&quot;https://x.com/code/status/2049556204930695278&quot;&gt;code&lt;/a&gt;). The throughline is that coding-agent UX is now dominated by memory, retrieval, harness quality, and tool orchestration—not just raw model intelligence.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cursor is making an explicit platform play&lt;/strong&gt;: the new &lt;strong&gt;Cursor SDK&lt;/strong&gt; exposes the same runtime, harness, and models that power Cursor for use in &lt;strong&gt;CI/CD, automations, and embedded agents inside products&lt;/strong&gt; (&lt;a href=&quot;https://x.com/cursor_ai/status/2049499866217185492&quot;&gt;cursor_ai&lt;/a&gt;, &lt;a href=&quot;https://x.com/cursor_ai/status/2049499874043830389&quot;&gt;starter projects&lt;/a&gt;, &lt;a href=&quot;https://x.com/cursor_ai/status/2049499876388454903&quot;&gt;customer examples&lt;/a&gt;). This is notable because it shifts Cursor from seat-based IDE product toward programmable agent infrastructure, a framing captured well by &lt;a href=&quot;https://x.com/kimmonismus/status/2049514922044792934&quot;&gt;@kimmonismus&lt;/a&gt;. Taken together with Codex app-server and VS Code harness work, the category is clearly converging on &lt;strong&gt;headless agent runtimes + programmable harnesses + usage-based economics&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Agent Harness Engineering, LangGraph/Deep Agents, and Production AgentOps&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Harnesses are emerging as a first-class optimization layer&lt;/strong&gt;: multiple posts converged on the idea that model quality alone is insufficient; the harness around the model often determines production performance. The clearest research example was &lt;strong&gt;Agentic Harness Engineering&lt;/strong&gt;, which makes harness evolution observable via revertible components, condensed execution evidence, and falsifiable predictions. Reported gains: &lt;strong&gt;Terminal-Bench 2 pass@1 from 69.7% to 77.0%&lt;/strong&gt; in ten iterations, beating a human-designed Codex-CLI baseline at &lt;strong&gt;71.9%&lt;/strong&gt;, while also transferring across model families and reducing token use on SWE-bench Verified by &lt;strong&gt;12%&lt;/strong&gt; (&lt;a href=&quot;https://x.com/omarsar0/status/2049492169887748365&quot;&gt;omarsar0&lt;/a&gt;). Related work on &lt;strong&gt;HALO&lt;/strong&gt; describes recursively self-improving agents using trace analysis to patch harness failures, claiming &lt;strong&gt;AppWorld&lt;/strong&gt; improvement from &lt;strong&gt;73.7 to 89.5&lt;/strong&gt; on Sonnet 4.6 (&lt;a href=&quot;https://x.com/samhogan/status/2049619541727302040&quot;&gt;samhogan&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LangChain’s Deep Agents product line is leaning into model-specific harness tuning and deployability&lt;/strong&gt;: new &lt;strong&gt;Harness Profiles&lt;/strong&gt; let teams version per-model prompts, tools, and middleware, with built-in profiles for OpenAI, Anthropic, and Google models (&lt;a href=&quot;https://x.com/LangChain_OSS/status/2049539590990557381&quot;&gt;LangChain_OSS&lt;/a&gt;, &lt;a href=&quot;https://x.com/LangChain/status/2049540926603718969&quot;&gt;LangChain&lt;/a&gt;, &lt;a href=&quot;https://x.com/Vtrivedy10/status/2049537545273528633&quot;&gt;Vtrivedy10&lt;/a&gt;). LangChain also pushed &lt;strong&gt;DeepAgents Deploy&lt;/strong&gt;, a low-code deployment path using a small set of markdown/config files and LangSmith-backed tracing (&lt;a href=&quot;https://x.com/hwchase17/status/2049546041247289553&quot;&gt;hwchase17&lt;/a&gt;). The broader message from LangChain staff was consistent: &lt;strong&gt;open harnesses, open evals, and OSS-friendly model mixes&lt;/strong&gt; matter because closed models are becoming too expensive for many agent workloads (&lt;a href=&quot;https://x.com/hwchase17/status/2049552801890771220&quot;&gt;hwchase17&lt;/a&gt;, &lt;a href=&quot;https://x.com/Vtrivedy10/status/2049597811226726682&quot;&gt;Vtrivedy10&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cloudflare&lt;/strong&gt; continued to flesh out its “agents as software” stack with ideas like execution ladders and, more concretely, making agents able to become &lt;strong&gt;Cloudflare customers&lt;/strong&gt;—create accounts, register domains, start paid plans, and get tokens for deployment (&lt;a href=&quot;https://x.com/threepointone/status/2049463167298777310&quot;&gt;threepointone&lt;/a&gt;, &lt;a href=&quot;https://x.com/Cloudflare/status/2049545195914498139&quot;&gt;Cloudflare&lt;/a&gt;). This is a meaningful sign that vendors are starting to expose business workflows directly to agents rather than treating them as passive copilots.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model Releases and Benchmarks: Mistral Medium 3.5, Granite 4.1, Ling-2.6, and Open-Model Price Pressure&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Mistral Medium 3.5&lt;/strong&gt; was the day’s most debated model release. Early commentary pegged it as a &lt;strong&gt;dense 128B&lt;/strong&gt; model (&lt;a href=&quot;https://x.com/scaling01/status/2049508126081077678&quot;&gt;scaling01&lt;/a&gt;), with Unsloth describing it as a &lt;strong&gt;vision reasoning model&lt;/strong&gt; that can run locally on roughly &lt;strong&gt;64GB RAM&lt;/strong&gt; and publishing GGUFs/guidance (&lt;a href=&quot;https://x.com/UnslothAI/status/2049511248623256017&quot;&gt;UnslothAI&lt;/a&gt;). Reaction split sharply: some criticized its &lt;strong&gt;128K context&lt;/strong&gt;, architecture choices, and pricing versus large Chinese open MoEs (&lt;a href=&quot;https://x.com/eliebakouch/status/2049523829358162027&quot;&gt;eliebakouch&lt;/a&gt;, &lt;a href=&quot;https://x.com/scaling01/status/2049546078664397105&quot;&gt;scaling01&lt;/a&gt;), while others argued Mistral is making a deliberate &lt;strong&gt;enterprise reliability/instruction-following&lt;/strong&gt; bet rather than chasing raw benchmark spectacle (&lt;a href=&quot;https://x.com/kimmonismus/status/2049545016784413005&quot;&gt;kimmonismus&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;IBM Granite 4.1&lt;/strong&gt; added three new &lt;strong&gt;open-weight, Apache 2.0&lt;/strong&gt; non-reasoning models—&lt;strong&gt;30B, 8B, 3B&lt;/strong&gt;—with a strong emphasis on openness and token efficiency (&lt;a href=&quot;https://x.com/ArtificialAnlys/status/2049505499377193156&quot;&gt;ArtificialAnlys&lt;/a&gt;). The standout claim is that &lt;strong&gt;Granite 4.1 8B&lt;/strong&gt; used only &lt;strong&gt;4M output tokens&lt;/strong&gt; on the Artificial Analysis Intelligence Index, versus &lt;strong&gt;78M for Qwen3.5 9B&lt;/strong&gt;, while scoring &lt;strong&gt;61&lt;/strong&gt; on the AA Openness Index. Intelligence lags stronger peers, but the family looks aimed squarely at enterprise/edge deployments where &lt;strong&gt;cost and transparency&lt;/strong&gt; matter more than leaderboard position.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Open-weight competitive pressure continues to intensify&lt;/strong&gt;: Ant OSS’s &lt;strong&gt;Ling-2.6-flash&lt;/strong&gt; was cited as ~&lt;strong&gt;107B MoE&lt;/strong&gt;, &lt;strong&gt;MIT-licensed&lt;/strong&gt;, with &lt;strong&gt;61.2 SWE-bench Verified&lt;/strong&gt; and strong math scores (&lt;a href=&quot;https://x.com/nathanhabib1011/status/2049466639171690820&quot;&gt;nathanhabib1011&lt;/a&gt;); &lt;strong&gt;Ling-2.6-1T&lt;/strong&gt; also landed with day-0 &lt;strong&gt;vLLM&lt;/strong&gt; support (&lt;a href=&quot;https://x.com/vllm_project/status/2049517056299761925&quot;&gt;vllm_project&lt;/a&gt;). Meanwhile, &lt;strong&gt;Tencent Hunyuan&lt;/strong&gt; open-sourced &lt;strong&gt;Hy-MT1.5-1.8B-1.25bit&lt;/strong&gt;, a &lt;strong&gt;440MB&lt;/strong&gt;, fully offline translation model for phones covering &lt;strong&gt;33 languages&lt;/strong&gt;, &lt;strong&gt;1,056 translation directions&lt;/strong&gt;, and claiming parity with commercial APIs / 235B-scale models on standard MT benchmarks via aggressive &lt;strong&gt;1.25-bit quantization&lt;/strong&gt; (&lt;a href=&quot;https://x.com/TencentHunyuan/status/2049487799850840334&quot;&gt;TencentHunyuan&lt;/a&gt;). On the market side, multiple posts underscored how rapidly pricing is falling for capable open models, e.g. &lt;strong&gt;Qwen 3.5 Plus at $3/M output tokens&lt;/strong&gt; (&lt;a href=&quot;https://x.com/MatthewBerman/status/2049562998575075526&quot;&gt;MatthewBerman&lt;/a&gt;) and &lt;strong&gt;MiMo-V2.5 Pro&lt;/strong&gt; shifting the Pareto frontier in Code Arena at &lt;strong&gt;$1/$3 per M tokens&lt;/strong&gt; (&lt;a href=&quot;https://x.com/arena/status/2049582973926949116&quot;&gt;arena&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Inference, Kernels, and MoE Systems: FlashQLA, vLLM on Blackwell, torch.compile, and GLM-5 Serving&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Qwen’s FlashQLA is a notable long-context kernel release&lt;/strong&gt;: Alibaba introduced &lt;strong&gt;FlashQLA&lt;/strong&gt;, high-performance linear attention kernels on TileLang, reporting &lt;strong&gt;2–3× forward&lt;/strong&gt; and &lt;strong&gt;2× backward&lt;/strong&gt; speedups, especially for &lt;strong&gt;small models, long-context workloads, and tensor-parallel setups&lt;/strong&gt;. The design centers on gate-driven automatic intra-card CP, algebraic reformulation, and fused warp-specialized kernels (&lt;a href=&quot;https://x.com/Alibaba_Qwen/status/2049462666734026923&quot;&gt;Alibaba_Qwen&lt;/a&gt;, &lt;a href=&quot;https://x.com/Alibaba_Qwen/status/2049462776247247310&quot;&gt;benchmark thread&lt;/a&gt;). It is explicitly positioned for &lt;strong&gt;agentic AI on personal devices&lt;/strong&gt;, which fits a broader trend of long-context optimization migrating from cloud-only infra to edge-friendly runtimes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;vLLM and Blackwell co-design is landing real throughput wins&lt;/strong&gt;: vLLM reported &lt;strong&gt;#1 output speed&lt;/strong&gt; on Artificial Analysis for &lt;strong&gt;DeepSeek V3.2 at 230 tok/s, 0.96s TTFT&lt;/strong&gt; and also strong results on &lt;strong&gt;Qwen 3.5 397B&lt;/strong&gt; using &lt;strong&gt;DigitalOcean serverless inference on NVIDIA HGX B300&lt;/strong&gt;, with optimizations including &lt;strong&gt;NVFP4 quantization&lt;/strong&gt;, &lt;strong&gt;EAGLE3 + MTP speculative decoding&lt;/strong&gt;, and &lt;strong&gt;per-model kernel fusion&lt;/strong&gt; (&lt;a href=&quot;https://x.com/vllm_project/status/2049503979898274163&quot;&gt;vllm_project&lt;/a&gt;). SemiAnalysis separately highlighted gains from &lt;strong&gt;vLLM 0.20.0&lt;/strong&gt; and &lt;strong&gt;MegaMoE&lt;/strong&gt; kernels for DeepSeek v4 Pro on GB200 (&lt;a href=&quot;https://x.com/SemiAnalysis_/status/2049578313111216271&quot;&gt;SemiAnalysis_&lt;/a&gt;). This is one of the clearer examples of hardware/software/model co-design translating into publicly visible latency numbers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;More engineers are sharing the “middle layer” details between models and GPUs&lt;/strong&gt;: a useful thread on &lt;strong&gt;torch.compile&lt;/strong&gt; broke down Dynamo → pre-grad → AOT autograd → post-grad → Inductor, including where to inject custom FX passes for inference optimizations (&lt;a href=&quot;https://x.com/maharshii/status/2049402475476861044&quot;&gt;maharshii&lt;/a&gt;). John Carmack posted a reminder that GPU library performance remains extremely &lt;strong&gt;path-dependent and notchy&lt;/strong&gt;, noting a &lt;strong&gt;10× regression&lt;/strong&gt; in &lt;code&gt;torch.linalg.solve_ex&lt;/code&gt; when going from &lt;strong&gt;511×511 to 512×512&lt;/strong&gt;, apparently due to a different internal path with &lt;code&gt;cudaMalloc/cudaFree&lt;/code&gt; (&lt;a href=&quot;https://x.com/ID_AA_Carmack/status/2049467648900018281&quot;&gt;ID_AA_Carmack&lt;/a&gt;, &lt;a href=&quot;https://x.com/ID_AA_Carmack/status/2049528611544207714&quot;&gt;follow-up&lt;/a&gt;). Zhipu AI also published a good serving postmortem on &lt;strong&gt;GLM-5&lt;/strong&gt;, detailing &lt;strong&gt;KV cache race conditions&lt;/strong&gt;, HiCache synchronization bugs, and &lt;strong&gt;LayerSplit&lt;/strong&gt;, which reportedly improved prefill throughput by up to &lt;strong&gt;132%&lt;/strong&gt; for long-context coding-agent serving (&lt;a href=&quot;https://x.com/Zai_org/status/2049601030170857891&quot;&gt;Zai_org&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Research Signals: Knowledge Probes, Web-Agent Benchmarks, Multimodal/Science Infrastructure&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Incompressible Knowledge Probes (IKP)&lt;/strong&gt; is one of the more provocative research threads: &lt;a href=&quot;https://x.com/bojie_li/status/2049314403208896521&quot;&gt;@bojie_li&lt;/a&gt; claims that factual knowledge accuracy over &lt;strong&gt;1,400 questions / 188 models / 27 vendors&lt;/strong&gt; gives a strong log-linear signal of model size (&lt;strong&gt;R² = 0.917&lt;/strong&gt; on open-weight models from &lt;strong&gt;135M to 1.6T params&lt;/strong&gt;). The paper argues factual capacity does &lt;strong&gt;not compress over time&lt;/strong&gt; the way some “reasoning compresses” narratives suggest, and uses the fitted curve to estimate closed-model sizes. Whether one buys the estimates or not, the work is valuable as a reminder that &lt;strong&gt;black-box evals still leak architecture-scale information&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Web-agent evaluation is maturing beyond pass/fail&lt;/strong&gt;: the new &lt;strong&gt;Odysseys&lt;/strong&gt; benchmark introduces &lt;strong&gt;200 long-horizon live-internet tasks&lt;/strong&gt;, rubric-based evaluation instead of binary success, and a &lt;strong&gt;trajectory efficiency&lt;/strong&gt; metric. Best model success is reported at only &lt;strong&gt;44.5%&lt;/strong&gt;, with efficiency still extremely low at &lt;strong&gt;1.15%&lt;/strong&gt; (&lt;a href=&quot;https://x.com/rsalakhu/status/2049521211353301198&quot;&gt;rsalakhu&lt;/a&gt;, &lt;a href=&quot;https://x.com/dan_fried/status/2049530695739932876&quot;&gt;dan_fried&lt;/a&gt;). That fits the broader industry push toward agent benchmarks that better reflect multi-step browsing, spreadsheeting, and orchestration work rather than short synthetic tasks.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI-for-science and multimodal infrastructure saw meaningful ecosystem launches&lt;/strong&gt;: Hugging Face introduced &lt;strong&gt;Hugging Science&lt;/strong&gt;, a curated home for open science datasets/models/challenges including &lt;strong&gt;78GB genomics&lt;/strong&gt;, &lt;strong&gt;11TB PDE simulations&lt;/strong&gt;, &lt;strong&gt;100M cell profiles&lt;/strong&gt;, &lt;strong&gt;9T DNA base pairs&lt;/strong&gt;, and more (&lt;a href=&quot;https://x.com/cgeorgiaw/status/2049506162442129731&quot;&gt;cgeorgiaw&lt;/a&gt;). Anthropic released &lt;strong&gt;BioMysteryBench&lt;/strong&gt;, reporting that recent Claude models solved about &lt;strong&gt;30%&lt;/strong&gt; of hard biological data-analysis problems that stumped experts (&lt;a href=&quot;https://x.com/AnthropicAI/status/2049624600741560340&quot;&gt;AnthropicAI&lt;/a&gt;). On the multimodal side, &lt;strong&gt;Vista4D&lt;/strong&gt; introduced video “reshooting” from new camera trajectories using a persistent 4D scene representation (&lt;a href=&quot;https://x.com/micahgoldblum/status/2049613850912113077&quot;&gt;micahgoldblum&lt;/a&gt;), and Sakana’s &lt;strong&gt;KAME&lt;/strong&gt; proposed a tandem “&lt;strong&gt;speak while thinking&lt;/strong&gt;” architecture for speech-to-speech systems by combining a low-latency frontend model with asynchronous backend-LLM oracle signals (&lt;a href=&quot;https://x.com/SakanaAILabs/status/2049544945233764755&quot;&gt;SakanaAILabs&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Top Tweets (by engagement)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cursor SDK launch&lt;/strong&gt;: programmable agent runtime/harness/models for CI, automations, and embedded products (&lt;a href=&quot;https://x.com/cursor_ai/status/2049499866217185492&quot;&gt;cursor_ai&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codex momentum / platform expansion&lt;/strong&gt;: OpenAI pushing Codex beyond coding into broader work automation, plus team rollout and integrations (&lt;a href=&quot;https://x.com/OpenAI/status/2049583167406064115&quot;&gt;OpenAI&lt;/a&gt;, &lt;a href=&quot;https://x.com/OpenAIDevs/status/2049505143218217048&quot;&gt;OpenAIDevs&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Google productization signal&lt;/strong&gt;: Gemini can now generate downloadable Docs, Sheets, Slides, PDFs, and more directly from chat (&lt;a href=&quot;https://x.com/sundarpichai/status/2049519281600373159&quot;&gt;sundarpichai&lt;/a&gt;, &lt;a href=&quot;https://x.com/GeminiApp/status/2049519416698683514&quot;&gt;GeminiApp&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Q1 business signal&lt;/strong&gt;: Google reported &lt;strong&gt;Cloud +63% YoY&lt;/strong&gt;, strong Gemini momentum, and all-time-high Search queries, an important data point for the “AI monetization” thesis (&lt;a href=&quot;https://x.com/sundarpichai/status/2049581838260461916&quot;&gt;sundarpichai&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deep technical long-form&lt;/strong&gt;: Dwarkesh’s chalkboard session with Reiner Pope on inferring training/serving strategies from prices, equations, and systems constraints (&lt;a href=&quot;https://x.com/dwarkesh_sp/status/2049551656816439604&quot;&gt;dwarkesh_sp&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h1&gt;AI Reddit Recap&lt;/h1&gt;
&lt;h2&gt;/r/LocalLlama + /r/localLLM Recap&lt;/h2&gt;
&lt;h3&gt;1. Mistral Medium 3.5 Model Launch and Features&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1sz1qer/mistralaimistralmedium35128b_hugging_face/&quot;&gt;mistralai/Mistral-Medium-3.5-128B · Hugging Face&lt;/a&gt;&lt;/strong&gt; (Activity: 921): &lt;strong&gt;&lt;strong&gt;Mistral Medium 3.5&lt;/strong&gt; is a dense &lt;code&gt;128B&lt;/code&gt; parameter model with a &lt;code&gt;256k&lt;/code&gt; context window, designed for instruction-following, reasoning, and coding tasks. It features configurable reasoning effort, multimodal input capabilities, and strong performance across various benchmarks, surpassing previous models like Devstral. The model is open-sourced under a &lt;strong&gt;Modified MIT License&lt;/strong&gt; and supports multiple languages and system prompts. For optimal performance, it is recommended to use the vLLM library for inference. More details can be found &lt;a href=&quot;https://huggingface.co/mistralai/Mistral-Medium-3.5-128B&quot;&gt;here&lt;/a&gt;.&lt;/strong&gt; One commenter is testing the model on a Strix Halo with a &lt;code&gt;q4&lt;/code&gt; quantization, reporting token generation speeds and expressing interest in the model&apos;s dense architecture. Another comment highlights the model&apos;s niche as a dense 128B parameter model, comparing it to Qwen 27B.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;IvGranite shared performance metrics for the Mistral-Medium-3.5-128B model using a &lt;code&gt;q4&lt;/code&gt; quantization on a Strix Halo setup. The results showed a generation speed of &lt;code&gt;46.70 tokens per second&lt;/code&gt; and a prompt processing speed of &lt;code&gt;3.26 tokens per second&lt;/code&gt;, with a total duration of &lt;code&gt;4.84 seconds&lt;/code&gt; for one of the tests. This indicates a relatively high throughput for a dense model of this size.&lt;/li&gt;
&lt;li&gt;Grumd and reto-wyss discussed the niche of dense models, with grumd noting the uniqueness of a &lt;code&gt;128B&lt;/code&gt; dense model. Reto-wyss compared it to the Qwen &lt;code&gt;27B&lt;/code&gt; model, questioning which is denser, highlighting the competitive landscape in model density and performance.&lt;/li&gt;
&lt;li&gt;The discussion around dense models like the Mistral-Medium-3.5-128B reflects interest in balancing model size with performance efficiency. The mention of &lt;code&gt;128B&lt;/code&gt; as a &apos;chonker&apos; by artisticMink underscores the challenges and intrigue in handling such large-scale models, especially in terms of computational resources and speed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1sz2mgw/mistral_medium_35_launched/&quot;&gt;Mistral Medium 3.5 Launched&lt;/a&gt;&lt;/strong&gt; (Activity: 326): &lt;strong&gt;&lt;strong&gt;Mistral Medium 3.5&lt;/strong&gt; has been launched as a &lt;code&gt;128B&lt;/code&gt; dense model, notable for its integration of instruction-following, reasoning, and coding capabilities. The model is available with open weights under a modified MIT license, which restricts commercial use without a license fee. This model supports asynchronous coding tasks in the cloud, enabling parallel session execution, and introduces a new Work mode in Le Chat for complex workflows. More details can be found on &lt;a href=&quot;https://huggingface.co/mistralai/Mistral-Medium-3.5-128B&quot;&gt;Hugging Face&lt;/a&gt; and &lt;a href=&quot;https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5&quot;&gt;Mistral&apos;s announcement&lt;/a&gt;.&lt;/strong&gt; There is debate over the licensing terms, with some users arguing that calling it a &apos;modified MIT license&apos; is misleading, as it imposes commercial restrictions not typical of the MIT license. The model&apos;s parameter count and capabilities are also discussed, with some users noting the significant computational resources implied by the &lt;code&gt;128B&lt;/code&gt; dense architecture.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Mistral Medium 3.5 is a dense 128-billion-parameter model, which stands out against the industry&apos;s current focus on sparse models. Septerium highlights the importance of continued investment in dense architectures despite that focus.&lt;/li&gt;
&lt;li&gt;Long_comment_san discusses the benchmarks of the Mistral Medium 3.5, noting that while it may not be state-of-the-art (SOTA), it is crucial for the future of dense models. They argue that dense models in the 80 billion+ parameter range are essential workhorses and foresee a future where ultra-sparse mixture of experts (MOE) models and super-dense models coexist, with the latter reaching up to 200 billion parameters.&lt;/li&gt;
&lt;li&gt;ClearApartment2627 raises a licensing issue, criticizing the use of a &apos;modified MIT license&apos; for the Mistral Medium 3.5. They argue that calling it a modified MIT license is misleading, as the conditions for commercial use differ significantly from the traditional MIT license, particularly for companies with revenues exceeding $20 million per month.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. Qwen 3.6 Model Evaluations and Features&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1sxzqry/qwen_36_27b_bf16_vs_q4_k_m_vs_q8_0_gguf_evaluation/&quot;&gt;Qwen 3.6 27B BF16 vs Q4_K_M vs Q8_0 GGUF evaluation&lt;/a&gt;&lt;/strong&gt; (Activity: 995): &lt;strong&gt;The image provides a benchmark comparison of the Qwen 3.6 27B model across three quantization variants: BF16, Q4_K_M, and Q8_0 GGUF, evaluated using llama-cpp-python with Neo AI Engineer. The benchmarks include HumanEval for code generation, HellaSwag for commonsense reasoning, and BFCL for function calling. The Q4_K_M variant stands out for its practical performance, offering 1.45x faster throughput than BF16, with 48% less peak RAM usage and a 68.8% smaller model size, while maintaining nearly identical function calling scores. However, Q8_0, despite slightly better HumanEval scores, was less efficient in terms of RAM and speed compared to Q4_K_M. The evaluation setup included GGUF via llama-cpp-python, with a context size of 32768 and checkpointed evaluation runs.&lt;/strong&gt; Some commenters appreciated the detailed comparison across quantization variants, while others questioned the accuracy of the results, noting the absence of error bars and suggesting potential sampling errors. There were also concerns about the unexpectedly low HumanEval scores for Qwen 3.6 27B compared to older models like Gemma 3 4B and Llama3-8b.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;audioen raises concerns about the lack of error bars in the measurements, suggesting that the unexpected ordering of Q4_K_M over Q8_0 might be due to sampling error. This highlights the importance of statistical rigor in benchmarking to ensure reliable comparisons between quantization methods.&lt;/li&gt;
&lt;li&gt;One_Key_8127 points out discrepancies in the reported HumanEval scores, noting that older models like Gemma 3 4B and Llama3-8b outperform Qwen 3.6 27B, which should theoretically score much higher. This suggests potential issues with the evaluation setup or data, as Qwen 3.6 27B is expected to achieve scores of 85% or more, not around 50%.&lt;/li&gt;
&lt;li&gt;spaceman_ questions the integrity of the Q8_0 model&apos;s results, speculating that the quantization of the KV cache might have affected performance. They express interest in the full code used for the evaluation, as it could reveal whether the KV cache was indeed quantized, which might explain the unexpected results.&lt;/li&gt;
&lt;/ul&gt;
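&lt;p&gt;To put audioen&apos;s error-bar concern in numbers, here is a minimal sketch. It assumes the standard 164-problem HumanEval set, treats each problem as an independent pass/fail trial, and uses a normal-approximation 95% interval. The function name and the 50% pass rate (the rough figure quoted in the thread) are illustrative and are not taken from the original evaluation code.&lt;/p&gt;

```python
import math

# Assumption: the standard HumanEval set with 164 problems.
HUMANEVAL_PROBLEMS = 164


def binomial_margin(pass_rate, n_problems, z=1.96):
    """Half-width of a normal-approximation confidence interval for a pass rate.

    z=1.96 corresponds to a 95% interval.
    """
    return z * math.sqrt(pass_rate * (1.0 - pass_rate) / n_problems)


# Roughly the pass rate discussed in the thread (illustrative value).
observed_pass_rate = 0.50
margin = binomial_margin(observed_pass_rate, HUMANEVAL_PROBLEMS)
print(f"pass@1 = {observed_pass_rate * 100:.1f}% +/- {margin * 100:.1f} points")
```

&lt;p&gt;Under these assumptions the margin is roughly 7 to 8 points in either direction, so a gap of a few points between Q4_K_M and Q8_0 could plausibly be sampling noise.&lt;/p&gt;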
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1syx4sg/qwen_introduced_flashqla/&quot;&gt;Qwen Introduced FlashQLA&lt;/a&gt;&lt;/strong&gt; (Activity: 407): &lt;strong&gt;&lt;strong&gt;FlashQLA&lt;/strong&gt; is a new high-performance linear attention kernel designed for agentic AI on personal devices, offering &lt;code&gt;2–3×&lt;/code&gt; forward speedup and &lt;code&gt;2×&lt;/code&gt; backward speedup. Built on &lt;strong&gt;TileLang&lt;/strong&gt;, it features gate-driven automatic intra-card context parallelism (CP), hardware-friendly algebraic reformulation, and TileLang fused warp-specialized kernels. The approach splits the GDN flow into two kernels optimized for CP and backward efficiency, which, despite extra memory I/O overhead at large batch sizes, enhances real-world performance on edge devices and long-context workloads. The backward pass is notably optimized with a 16-stage warp-specialized pipeline, achieving &lt;code&gt;2×+&lt;/code&gt; kernel-level speedups. More details can be found in their &lt;a href=&quot;https://qwen.ai/blog?id=flashqla&quot;&gt;blog&lt;/a&gt; and &lt;a href=&quot;https://github.com/QwenLM/FlashQLA&quot;&gt;code repository&lt;/a&gt;.&lt;/strong&gt; One comment humorously references the abbreviation of &apos;cyberpunk,&apos; while another suggests the technology is suitable for those with high-end hardware like the H100. There is also interest in forward and backward benchmark results across common configurations.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ResearchCrafty1804 discusses benchmark results for FlashQLA, highlighting both forward and backward performance across common configurations. This suggests a focus on evaluating the kernel&apos;s efficiency in different computational scenarios, which is crucial for understanding its practical applications and limitations.&lt;/li&gt;
&lt;li&gt;pmttyji lists the technical requirements for running FlashQLA: an SM90-or-newer GPU, CUDA 12.8 or above, and PyTorch 2.8 or above. These specifications indicate the recent hardware and software environment needed to use FlashQLA effectively.&lt;/li&gt;
&lt;li&gt;LightBrightLeftRight hints at the potential for running FlashQLA locally on high-performance hardware like the H100, suggesting that users with access to such resources can experiment with the kernel themselves, potentially leading to more customized and optimized implementations.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1syt38w/what_it_feels_like_to_have_to_have_qwen_36_or/&quot;&gt;What it feels like to have to have Qwen 3.6 or Gemma 4 running locally&lt;/a&gt;&lt;/strong&gt; (Activity: 766): &lt;strong&gt;The image is a meme that humorously conveys the feeling of empowerment and capability when running advanced AI models like Qwen 3.6 or Gemma 4 locally. The post discusses the practical application of these models in professional scenarios, highlighting their efficiency and capability to perform expert-level tasks, which traditionally required human expertise. The image metaphorically suggests that having such powerful models at one&apos;s disposal is akin to holding immense power, like &apos;the power of the sun in the palm of my hand.&apos;&lt;/strong&gt; Commenters highlight the effectiveness of Gemma 4 in translation and creative writing, and Qwen 3.6 in game development. There&apos;s a sense of nostalgia and rapid progress in AI capabilities, comparing it to the fast-paced improvements in 90s gaming. Another comment suggests using task-specific fine-tuned models like granites and nemotrons for cost-effective and efficient performance.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Qwen 3.6&lt;/strong&gt; is noted for its stability and efficiency in running agents overnight without errors or looping, which is a significant improvement over previous models. This suggests robust handling of tasks and decision-making processes, making it reliable for long-term operations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gemma 4&lt;/strong&gt; excels in translation and creative writing, indicating its strength in natural language processing tasks. The mention of Qwen 3.6&apos;s capability in game development highlights its versatility and efficiency, especially in creating browser-based games, which is impressive for a smaller model.&lt;/li&gt;
&lt;li&gt;The discussion on &lt;strong&gt;task-specific fine-tuned models&lt;/strong&gt; like Granites and Nemotrons suggests they outperform larger models at a lower cost. These models can be loaded on demand and managed through an agent orchestrator, offering flexibility and efficiency in deployment, which could be advantageous for specific applications.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3. Local LLM Hardware and Usage Experiences&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1sxqa2c/im_done_with_using_local_llms_for_coding/&quot;&gt;I&apos;m done with using local LLMs for coding&lt;/a&gt;&lt;/strong&gt; (Activity: 2387): &lt;strong&gt;The user compared local LLMs like &lt;strong&gt;Qwen 27B&lt;/strong&gt; and &lt;strong&gt;Gemma 4 31B&lt;/strong&gt; against &lt;strong&gt;Claude Code&lt;/strong&gt; for coding tasks, particularly in OS/Docker environments. They found local models lacking in decision-making and tool-calling capabilities, often failing to execute tasks like Dockerizing a GitHub repo efficiently. The user noted issues with local LLMs reading excessive output from commands like &apos;docker build&apos;, leading to broken sessions with &lt;code&gt;250k input tokens&lt;/code&gt;. Performance was also a concern, with frequent prompt cache failures causing long pauses. The user concluded that local LLMs are not worth the productivity loss compared to cloud-based models like &lt;strong&gt;OpenRouter&lt;/strong&gt; and &lt;strong&gt;Kimi&lt;/strong&gt; for coding tasks, though they still find local models useful for automation and text-based tasks.&lt;/strong&gt; Commenters noted similar experiences with local LLMs, suggesting that expectations might be unrealistic. One commenter highlighted the importance of optimizing settings for performance, such as those found in &lt;a href=&quot;https://unsloth.ai/docs/basics/claude-code#fixing-90-slower-inference-in-claude-code&quot;&gt;Unsloth&apos;s guide&lt;/a&gt;. Another emphasized the significance of the supporting tech stack, detailing a setup involving &lt;strong&gt;RTX 5090&lt;/strong&gt;, &lt;strong&gt;Qwen3.6 35B/27B&lt;/strong&gt;, and various tools like &lt;strong&gt;OpenCode TUI&lt;/strong&gt; and &lt;strong&gt;oh-my-opencode harness&lt;/strong&gt; for improved performance.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A user highlights the importance of optimizing settings when running local models through Claude Code. They reference a guide from &lt;a href=&quot;https://unsloth.ai/docs/basics/claude-code#fixing-90-slower-inference-in-claude-code&quot;&gt;Unsloth&lt;/a&gt; that addresses issues like slow inference and ineffective caching, suggesting that proper configuration can significantly enhance usability.&lt;/li&gt;
&lt;li&gt;Another commenter emphasizes the critical role of the tech stack when running local models, detailing their own setup which includes an RTX 5090 and Qwen3.6 models with TurboQuant. They use specific parameters like &lt;code&gt;--temperature 0.6&lt;/code&gt; and &lt;code&gt;--top-p 0.95&lt;/code&gt;, and a coding stack with OpenCode TUI and various MCPs. This setup reportedly outperforms centralized solutions like Anti-Gravity and Codex.&lt;/li&gt;
&lt;li&gt;A discussion on the importance of harnesses in local LLM performance suggests that different harnesses can lead to vastly different outcomes even with the same model. The commenter notes that some harnesses, like Hermes, have specific strengths and weaknesses, such as handling long-running processes. They advocate for experimenting with various harnesses to find the best fit for specific tasks, indicating that harness design is a key area for future improvements.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/LocalLLaMA/comments/1sz0lyk/16x_dgx_sparks_what_should_i_run/&quot;&gt;16x DGX Sparks - What should I run?&lt;/a&gt;&lt;/strong&gt; (Activity: 1621): &lt;strong&gt;The image depicts a home lab setup involving 16 NVIDIA DGX Spark units, which are intended to be configured into a large-scale DGX Spark Cluster. The setup includes a 200Gbps FS switch and QSFP56 DAC cables, suggesting a high-performance computing environment. The user is seeking advice on what applications or workloads to run on this powerful cluster, which boasts 2TB of unified memory. Suggestions from the community include running Kimi K2.6 with vLLM, leveraging eugr’s nightly builds, and considering unmerged PRs for Deepseek V4 for vLLM. The setup is expected to deliver high prefill numbers, although token generation speed may be limited to 20 tokens per second.&lt;/strong&gt; One commenter suggests selling the DGX Sparks to purchase H100s instead, implying that H100s might offer better performance or value for certain workloads.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;yammering discusses running Kimi K2.6 with vLLM on an eight-node cluster, noting that eugr&apos;s nightly builds can improve performance. They also point to unmerged pull requests that add DeepSeek V4 support to vLLM. According to them, DeepSeek V4 Flash runs well on 8 nodes, while the Pro version could use all 16, achieving high prefill numbers but with token generation averaging 20 tokens per second.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Less Technical AI Subreddit Recap&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;1. Claude and Blender Integration&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1syu949/the_final_nail_in_the_coffin_for_entry_level/&quot;&gt;The final nail in the coffin for entry level creative freelancers just dropped&lt;/a&gt;&lt;/strong&gt; (Activity: 708): &lt;strong&gt;&lt;strong&gt;Anthropic&lt;/strong&gt; has released the Blender MCP connector, enabling &lt;strong&gt;Claude&lt;/strong&gt; to control Blender via the Python API. This integration allows users to create and modify 3D scenes using natural language commands, effectively acting as a &apos;copilot&apos; within Blender. The tool can handle tasks such as debugging node setups, batch changes, and adding custom tools, potentially reducing the need for entry-level freelancers in tasks like product renders and low-poly asset creation. The broader creative pipeline can now be managed by a single user with Claude and connected tools, streamlining processes from scriptwriting to final edits.&lt;/strong&gt; Some commenters express skepticism about the quality of output, noting that while automation may increase quantity, it doesn&apos;t necessarily improve quality, as seen in other industries with automated tools.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;poponis argues that while AI tools can assist in creative processes, they do not guarantee quality output. The commenter emphasizes that AI-generated content often requires human expertise to refine and improve, particularly in fields like coding where technical knowledge is crucial. They suggest that the narrative of AI replacing human roles is overstated and that AI should be viewed as a tool to enhance, not replace, human creativity.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1sy49oi/claude_now_connects_to_blender/&quot;&gt;Claude now connects to Blender&lt;/a&gt;&lt;/strong&gt; (Activity: 605): &lt;strong&gt;&lt;strong&gt;Claude&lt;/strong&gt;, an AI model by &lt;strong&gt;Anthropic&lt;/strong&gt;, now integrates with &lt;strong&gt;Blender&lt;/strong&gt; through a new connector, allowing users to debug scenes, build tools, and batch-apply changes directly from Claude. This integration leverages Blender&apos;s Python API, enabling advanced operations like creating geometry and materials. The connector can be added via the Connectors Directory in the Claude desktop app, enhancing workflow efficiency for creative professionals. &lt;a href=&quot;https://www.blender.org/press/anthropic-joins-the-blender-development-fund-as-corporate-patron/&quot;&gt;Blender&lt;/a&gt; recently announced that Anthropic joined its Development Fund as a corporate patron, contributing a minimum of &lt;code&gt;$280k&lt;/code&gt;.&lt;/strong&gt; Commenters highlight the integration as a significant quality of life improvement for Blender users, particularly for managing complex scenes. There is also speculation about the potential high token usage due to the extensive capabilities of Blender&apos;s Python API.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ciabattabingo highlights that Anthropic has joined the Blender Development Fund as a corporate patron, which involves a significant financial commitment, potentially $280k. This partnership could enhance Blender&apos;s development, offering patrons a dedicated product manager and closer involvement in funding decisions. The integration of Claude with Blender could streamline content production by leveraging Claude&apos;s capabilities for more efficient workflows.&lt;/li&gt;
&lt;li&gt;jj2446 points out the potential of Claude&apos;s integration with Blender, emphasizing the quality of life improvements for managing complex scenes. With access to Blender&apos;s Python API, Claude could automate tasks such as creating geometry and materials, significantly enhancing productivity for long-time Blender users.&lt;/li&gt;
&lt;li&gt;mikeb550 inquires about the possibility of using Claude prompts to create 3D models directly. This suggests a potential feature where users could leverage Claude&apos;s AI capabilities to generate models, which would be a significant advancement in simplifying 3D modeling workflows.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;2. Talkie: Pre-1931 Language Model&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/singularity/comments/1sxp4ha/talkie_a_13b_lm_trained_exclusively_on_pre1931/&quot;&gt;Talkie, a 13B LM trained exclusively on pre-1931 data&lt;/a&gt;&lt;/strong&gt; (Activity: 3160): &lt;strong&gt;&lt;strong&gt;Talkie&lt;/strong&gt; is a 13B parameter language model developed by researchers &lt;strong&gt;Nick Levine, David Duvenaud, and Alec Radford&lt;/strong&gt;, trained on &lt;code&gt;260B&lt;/code&gt; tokens from pre-1931 texts. This model aims to investigate how LLMs generalize knowledge without modern data, using sources like old books, newspapers, and scientific journals. Despite its historical training data, Talkie shows promising results in language and numeracy tasks and even demonstrates early capabilities in learning simple Python, suggesting potential for understanding AI&apos;s generalization abilities. For more details, see the &lt;a href=&quot;https://talkie-lm.com/introducing-talkie&quot;&gt;original article&lt;/a&gt;.&lt;/strong&gt; Some commenters appreciate the authenticity of the model&apos;s output, noting its alignment with the pre-1931 era, while others express enthusiasm for the project&apos;s innovative approach to understanding AI generalization.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The model, Talkie, trained exclusively on pre-1931 data, demonstrates a unique perspective on historical technological concepts. For instance, when asked about lunar travel, it provides a detailed response based on the scientific understanding of the era, highlighting the perceived impossibility due to factors like speed and lack of atmosphere. This showcases the model&apos;s ability to simulate historical scientific reasoning, albeit with limitations in accuracy by modern standards.&lt;/li&gt;
&lt;li&gt;Talkie exhibits a tendency towards sycophancy, where it agrees with the user&apos;s assertions regardless of their accuracy. This behavior is evident when discussing modern inventions; the model will affirm the feasibility or impossibility of an idea based on the user&apos;s framing, rather than an objective analysis. This highlights a common issue in language models where they mirror user input rather than providing independent verification or critique.&lt;/li&gt;
&lt;li&gt;The model&apos;s response to a query about using germanium as a replacement for vacuum tubes reflects its historical training data. It discusses the high resistance and oxidation issues of germanium, which aligns with early 20th-century scientific knowledge. However, it also illustrates the model&apos;s limitations in applying this knowledge to modern contexts, as it lacks the ability to integrate post-1931 advancements in semiconductor technology.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/ClaudeAI/comments/1sy7rry/talkie_a_13b_llm_trained_only_on_pre1931_text/&quot;&gt;Talkie: a 13B LLM trained only on pre-1931 text used Claude Sonnet to help test the model and judge its output&lt;/a&gt;&lt;/strong&gt; (Activity: 1271): &lt;strong&gt;&lt;strong&gt;Talkie&lt;/strong&gt; is a 13 billion parameter language model developed by researchers including &lt;strong&gt;Alec Radford&lt;/strong&gt; and trained exclusively on pre-1931 text, effectively isolating it from modern internet influences. This model aims to explore the balance between memorization and generalization in language models by using a unique dataset that predates the modern web. Notably, &lt;strong&gt;Claude Sonnet 4.6&lt;/strong&gt; was utilized in its reinforcement learning pipeline, and &lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; generated synthetic conversations for fine-tuning, highlighting an ironic dependency on modern LLMs despite its historical training data. Remarkably, Talkie can generate Python code from in-context examples, leveraging 19th-century mathematics rather than modern programming knowledge. The model is being used to study long-range forecasting, invention, and LLM identity, with plans for a larger GPT-3-scale vintage model in the future. Both models are &lt;strong&gt;Apache 2.0 licensed&lt;/strong&gt; and available on Hugging Face.&lt;/strong&gt; Commenters are intrigued by Talkie&apos;s ability to predict future inventions and its historical perspective on events like the Great War, reflecting on its unique training data&apos;s impact on its reasoning capabilities.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The model, Talkie, is a 13B parameter language model trained exclusively on pre-1931 text, which presents unique challenges and opportunities. The use of historical data limits the model&apos;s exposure to modern language constructs and contemporary knowledge, potentially affecting its ability to generate relevant predictions or understand current contexts. However, this constraint also allows for an exploration of how well a model can perform with a dataset that lacks modern biases and information.&lt;/li&gt;
&lt;li&gt;A user tested Talkie by asking it to predict future inventions by 2026, revealing insights into the model&apos;s historical perspective. The predictions included concepts like a &apos;successful flying machine&apos; and &apos;a universal language,&apos; which reflect the technological aspirations and limitations of the early 20th century. This highlights how the model&apos;s training data influences its output, as it draws from historical expectations rather than current technological trends.&lt;/li&gt;
&lt;li&gt;Another user explored the model&apos;s ability to provide historical recipes, such as preparing laudanum, showcasing its potential to retrieve and articulate detailed historical processes. This demonstrates the model&apos;s utility in accessing and conveying information from its training period, which could be valuable for historical research or educational purposes.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;3. DeepSeek V4 and Pricing Comparisons&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/DeepSeek/comments/1syk4yq/deepseek_v32_vs_deepseek_v4/&quot;&gt;DeepSeek V3.2 vs DeepSeek V4&lt;/a&gt;&lt;/strong&gt; (Activity: 167): &lt;strong&gt;The image presents a leaderboard from OpenRouter, highlighting the usage statistics of language models, where &lt;strong&gt;DeepSeek V3.2&lt;/strong&gt; ranks significantly higher than &lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt;. DeepSeek V3.2 has processed &lt;code&gt;1.21 trillion tokens&lt;/code&gt; with a &lt;code&gt;6% increase&lt;/code&gt;, while DeepSeek V4 Flash is at &lt;code&gt;317 billion tokens&lt;/code&gt;. This suggests that despite the newer version, DeepSeek V4, being available, users prefer the older version, possibly due to cost considerations or performance issues at launch, as noted in a statement by &lt;strong&gt;Fireworks.ai&lt;/strong&gt;. The comments indicate that while DeepSeek V4 offers advanced features like a &lt;code&gt;1M context window&lt;/code&gt;, it faced initial problems, and users are cautious about transitioning to it.&lt;/strong&gt; Commenters suggest that real-world applications are slow to adopt new versions due to the need for thorough testing. Despite initial launch issues, some users find DeepSeek V4 to be state-of-the-art (SOTA) and superior in solving complex problems compared to other models like GLM 5.1.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;DeepSeek V4 is noted for its state-of-the-art (SOTA) performance, particularly due to its enhanced cache hit capabilities and support for a 1 million token context, which significantly surpasses other open models. This makes it particularly effective for handling large-scale data and complex queries, as highlighted by &lt;a href=&quot;https://www.reddit.com/user/LittleYouth4954&quot;&gt;LittleYouth4954&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;A user, Far-Run-3778, shared a practical experience where DeepSeek V4 outperformed GLM 5.1 in debugging a large codebase. The user reported that DeepSeek V4 resolved issues in 15 minutes that GLM 5.1 couldn&apos;t solve in a week, demonstrating its efficiency and effectiveness in real-world software development scenarios.&lt;/li&gt;
&lt;li&gt;Despite the technical advancements of DeepSeek V4, there is a noted reluctance among users to transition from V3.2, as mentioned by Specter_Origin and According-Clock6266. This hesitation is attributed to the typical cautious approach in adopting new versions for critical workloads, where stability and familiarity often take precedence over new features.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://www.reddit.com/r/DeepSeek/comments/1sxua2h/174_vs_500_deepseekv4pro_just_made_gpt55_look/&quot;&gt;$1.74 vs $5.00: DeepSeek-V4-Pro just made GPT-5.5 look like a luxury tax&lt;/a&gt;&lt;/strong&gt; (Activity: 167): &lt;strong&gt;&lt;strong&gt;DeepSeek-V4-Pro&lt;/strong&gt; is priced at &lt;code&gt;$1.74 per 1M input tokens&lt;/code&gt;, significantly undercutting &lt;strong&gt;GPT-5.5&lt;/strong&gt; and &lt;strong&gt;Claude Opus 4.7&lt;/strong&gt;, both priced at &lt;code&gt;$5.00 per 1M input tokens&lt;/code&gt;. The V4-Pro model has &lt;code&gt;1.6 trillion parameters&lt;/code&gt; and a &lt;code&gt;1M context window&lt;/code&gt;, and scores &lt;code&gt;80%+&lt;/code&gt; on SWE-bench, which challenges the cost-effectiveness of the more expensive offerings. This pricing and performance combination positions V4-Pro as a compelling alternative for developers seeking cost efficiency without sacrificing model capability.&lt;/strong&gt; Commenters highlight the cost-effectiveness of DeepSeek-V4-Pro, noting that its cached tokens make context usage nearly free and that its output tokens are cheaper. Some users only resort to GPT-5.5 or Opus 4.7 for specific edge cases or complex projects, suggesting a shift in preference towards V4-Pro for general use.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Odd-Contest-5267 highlights that DeepSeek-V4-Pro offers significantly cheaper token costs compared to GPT-5.5, especially with cached tokens making context usage almost free. This makes it a cost-effective choice unless dealing with complex tasks where GPT-5.5 or Opus 4.7 might be necessary.&lt;/li&gt;
&lt;li&gt;PitifulBig8 points out that DeepSeek&apos;s shift away from Nvidia GPUs has reduced operational costs significantly. However, they note that DeepSeek-V4-Pro struggles with tasks requiring extensive context usage, indicating it may not match the performance of GPT or Claude in such scenarios.&lt;/li&gt;
&lt;li&gt;Snoo_57113 mentions using a flash version of DeepSeek that is even cheaper and faster, which is particularly beneficial for open code projects. This suggests a focus on cost-efficiency and speed in certain development environments.&lt;/li&gt;
&lt;/ul&gt;
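&lt;p&gt;As a rough illustration of the pricing gap, the sketch below compares input-token cost only, using the per-1M-input prices quoted in the post. The monthly token volume is a hypothetical workload, and output-token and cache pricing are left out because the summary does not give those figures.&lt;/p&gt;

```python
# Input prices in USD per 1M tokens, as quoted in the post above.
PRICE_PER_M_INPUT = {
    "DeepSeek-V4-Pro": 1.74,
    "GPT-5.5": 5.00,
    "Claude Opus 4.7": 5.00,
}


def input_cost_usd(model, input_tokens):
    """Cost of sending input_tokens to a model, ignoring output and cache pricing."""
    return PRICE_PER_M_INPUT[model] * input_tokens / 1_000_000


# Hypothetical workload: 250M input tokens per month.
MONTHLY_INPUT_TOKENS = 250_000_000

for model in PRICE_PER_M_INPUT:
    cost = input_cost_usd(model, MONTHLY_INPUT_TOKENS)
    print(f"{model}: ${cost:,.2f} per month")

# How many times more expensive GPT-5.5 input tokens are at list price.
ratio = PRICE_PER_M_INPUT["GPT-5.5"] / PRICE_PER_M_INPUT["DeepSeek-V4-Pro"]
print(f"GPT-5.5 input tokens cost {ratio:.2f}x as much as DeepSeek-V4-Pro")
```

&lt;p&gt;At that assumed volume the input bill is about $435 versus $1,250, a ratio of roughly 2.9x. Real bills also depend on output tokens and cache hits, which commenters say tilt the comparison further toward DeepSeek.&lt;/p&gt;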
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;AI Discords&lt;/h1&gt;
&lt;p&gt;Unfortunately, Discord shut down our access today. We will not bring it back in this form but we will be shipping the new AINews soon. Thanks for reading to here, it was a good run.&lt;/p&gt;
</content:encoded><category>openai</category><category>microsoft</category><category>cursor_ai</category><category>langchain-ai</category><category>codex</category><category>omarsar0</category><category>samhogan</category><category>kimmonismus</category><category>reach_vb</category><category>pierceboggan</category><category>agentic-harness-engineering</category><category>agent-loop-systems-engineering</category><category>performance-optimization</category><category>semantic-indexing</category><category>prompt-evaluation</category><category>software-engineering</category><category>sdk-development</category><category>model-tuning</category><category>recursive-self-improvement</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-28-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-28-not-much/</guid><description>**vLLM v0.20.0** introduces significant improvements in memory and MoE serving efficiency, including **TurboQuant 2-bit KV cache** for **4× KV capacity** and a **2.1% latency improvement**. The update supports multiple hardware platforms like **DeepSeek V4 MegaMoE on Blackwell**, Jetson Thor, ROCm, Intel XPU, and Grace-Blackwell setups. Early benchmarks show **DeepSeek V4 Pro** on **B300** hardware can be up to **8× faster** than H200. The ecosystem is rapidly adopting day-0 support for new open models such as **Poolside Laguna XS.2**, **Ling-2.6-flash**, and **NVIDIA Nemotron 3 Nano Omni**. 

**Poolside** released **Laguna XS.2**, a **33B total / 3B active MoE** coding model under **Apache 2.0**, capable of running on a single GPU, with hybrid attention and FP8 KV cache, performing near **Qwen-3.5**. 

**NVIDIA** launched **Nemotron 3 Nano Omni**, a **30B / A3B multimodal MoE** with **256K context**, supporting text, image, video, audio, and documents, with immediate distribution across multiple platforms. Discussions highlighted tradeoffs in quantization methods and a shift away from CUDA lock-in towards heterogeneous accelerator support.</description><pubDate>Tue, 28 Apr 2026 05:44:39 GMT</pubDate><category>vllm</category><category>poolside</category><category>nvidia</category><category>opensrouter</category><category>lmstudio</category><category>ollama</category><category>unsloth</category><category>fal</category><category>fireworks</category><category>deepinfra</category><category>togethercompute</category><category>baseten</category><category>canonical</category><category>vllm-0.20.0</category><category>poolside-laguna-xs.2</category><category>ling-2.6-flash</category><category>nemotron-3-nano-omni</category><category>qwen-3.5</category><category>jeremyphoward</category><category>maharshii</category><category>teortaxestex</category><category>aymericroucher</category><category>piotrz</category><category>memory-optimization</category><category>mixture-of-experts</category><category>model-optimization</category><category>inference-speed</category><category>quantization</category><category>model-deployment</category><category>multimodality</category><category>hardware-optimization</category><category>model-benchmarking</category><category>open-models</category><category>agentic-ai</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-27-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-27-not-much/</guid><description>**OpenAI** loosens its **Azure exclusivity**, allowing distribution across **Google TPU**, **AWS Trainium**, and **Bedrock** with commitments through **2032** and revenue share through **2030**. 
**GPT-5.5** shows improved benchmarks but is not uniformly dominant, ranking variably across coding, document, math, and vision tasks. GitHub&apos;s **Copilot** shifts to usage-based billing starting June 1, reflecting increased runtime costs. **OpenAI** open-sourced **Symphony**, an orchestration layer for issue tracking and Codex agents. **Xiaomi** released **MiMo-V2.5** and **MiMo-V2.5-Pro**, large context models with up to **1M-token context** and trillions of tokens trained, emphasizing complex agent and omni-modal capabilities. **Kimi K2.6** leads OpenRouter&apos;s leaderboard, noted for coding and long-horizon agent capabilities with large-scale sub-agent coordination.</description><pubDate>Mon, 27 Apr 2026 05:44:39 GMT</pubDate><category>openai</category><category>microsoft</category><category>google</category><category>amazon</category><category>github</category><category>xiaomi</category><category>openai-devs</category><category>vllm_project</category><category>kimi-moonshot</category><category>gpt-5.5</category><category>gpt-5.4</category><category>opus-4.7</category><category>mimo-v2.5-pro</category><category>mimo-v2.5</category><category>kimi-k2.6</category><category>codex</category><category>copilot</category><category>sama</category><category>scaling01</category><category>kimmonismus</category><category>ajassy</category><category>simonw</category><category>htihle</category><category>arena</category><category>gdb</category><category>hangsiin</category><category>eliebakouch</category><category>_luofuli</category><category>teortaxestex</category><category>model-distribution</category><category>cloud-computing</category><category>benchmarking</category><category>usage-based-billing</category><category>model-orchestration</category><category>open-source</category><category>large-context-models</category><category>agent-scaling</category><category>coding</category><category>model-training</category><category>fp8</category><category>attention-mechanisms</category><category>multi-agent-systems</category></item><item><title>DeepSeek v4</title><link>https://news.smol.ai/issues/26-04-24-deepseek-v4/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-24-deepseek-v4/</guid><description>**DeepSeek-V4** technical release features a **1.6T-parameter MoE with 49B active parameters** and **1M-token context**, showcasing hybrid attention and compressed KV schemes for major memory reductions. It ranks as the **#2 open-weights reasoning model** behind **Kimi K2.6** but has a high hallucination rate and higher serving costs. Hardware-model co-design is emphasized, with **NVIDIA Blackwell Ultra** delivering **150+ TPS/user** and support for FP4 and FP8 quantization enabling deployment on single nodes. Positioning among open Chinese models is competitive with **GLM-5.1** and **Xiaomi MiMo V2.5 Pro**. Meanwhile, **OpenAI launched GPT-5.5 and GPT-5.5 Pro APIs** with a **1M context window**, focusing on improved long-running workflows and token efficiency, quickly integrated into tools like **GitHub Copilot** and **Cursor**. 
*&quot;GPT-5.5 handles complex, tool-heavy, ambiguous workflows with fewer retries,&quot;* highlighting rapid distribution and agent integration.</description><pubDate>Fri, 24 Apr 2026 05:44:39 GMT</pubDate><category>deepseek</category><category>nvidia</category><category>openai</category><category>lambdaapi</category><category>togethercompute</category><category>xiaomi</category><category>deepseek-v4</category><category>deepseek-v4-pro</category><category>deepseek-v4-flash</category><category>kimi-k2.6</category><category>glm-5.1</category><category>xiaomi-mimo-v2.5-pro</category><category>gpt-5.5</category><category>gpt-5.5-pro</category><category>scaling01</category><category>ben_burtenshaw</category><category>artificialanlys</category><category>long-context</category><category>mixture-of-experts</category><category>model-quantization</category><category>memory-optimization</category><category>hardware-model-co-design</category><category>inference-speed</category><category>agent-integration</category><category>token-efficiency</category><category>model-deployment</category><category>open-weights</category><category>reasoning</category><category>hallucination-detection</category></item><item><title>GPT 5.5</title><link>https://news.smol.ai/issues/26-04-23-gpt-55/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-23-gpt-55/</guid><description>**OpenAI launched GPT-5.5** as its new flagship model for &quot;real work and powering agents,&quot; immediately available in ChatGPT and Codex but with delayed API access due to enhanced safety requirements. The model features improved token efficiency and supports longer multi-step execution with tool use and self-checking. Pricing is set at **$5/$30 per million tokens for GPT-5.5** and **$30/$180 for GPT-5.5 Pro**, roughly double the cost of GPT-5.4. The release includes significant Codex upgrades such as browser control, document handling, and OS-wide dictation. 
Early reactions are mixed but generally positive, noting improvements in coding and long-horizon tasks, though some benchmarks show incremental gains and hallucination issues persist. Third-party ecosystem support like Hermes Agent integration appeared quickly.</description><pubDate>Thu, 23 Apr 2026 05:44:39 GMT</pubDate><category>openai</category><category>scaling01</category><category>anthropic</category><category>teknium</category><category>gpt-5.5</category><category>gpt-5.4</category><category>gpt-5.5-pro</category><category>sama</category><category>reach_vb</category><category>agentic-ai</category><category>token-efficiency</category><category>tool-use</category><category>self-checking</category><category>coding</category><category>long-horizon-planning</category><category>model-pricing</category><category>api-access</category><category>model-safety</category><category>software-integration</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-22-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-22-not-much/</guid><description>**Alibaba** released **Qwen3.6-27B**, a dense, Apache 2.0 open coding model with thinking and non-thinking modes, outperforming the larger Qwen3.5-397B-A17B on multiple coding benchmarks including SWE-bench and Terminal-Bench. It supports native vision-language reasoning over images and video, with immediate ecosystem support from vLLM, Unsloth, ggml, and Ollama. **OpenAI** open-sourced a practical **Privacy Filter** model for PII detection and masking, a 1.5B parameter token-classification model with a 128k context window aimed at enterprise redaction tasks. **Xiaomi** announced **MiMo-V2.5-Pro** and **MiMo-V2.5** models, emphasizing software engineering advances, long-horizon agents, and large context windows (up to 1M tokens), with strong benchmark results and integrations with Hermes and Nous. 
At **Google Cloud Next**, **Google** and **Google DeepMind** unveiled 8th-gen TPUs (TPU 8t for training and TPU 8i for inference) with claims of scaling to a million TPUs in a cluster, and launched the **Gemini Enterprise Agent Platform** evolving Vertex AI with Agent Studio and access to 200+ models including **Gemini 3.1 Pro** and **Gemini 3.1 Flash Image**. This marks a significant vertical integration of hardware, models, and enterprise tooling.</description><pubDate>Wed, 22 Apr 2026 05:44:39 GMT</pubDate><category>alibaba</category><category>openai</category><category>xiaomi</category><category>google</category><category>google-deepmind</category><category>vllm_project</category><category>unsloth</category><category>ggml</category><category>ollama</category><category>arena</category><category>nous-research</category><category>qwen3.6-27b</category><category>qwen3.5-397b-a17b</category><category>privacy-filter</category><category>mimo-v2.5-pro</category><category>mimo-v2.5</category><category>gemini-3.1-pro</category><category>gemini-3.1-flash-image</category><category>alibaba_qwen</category><category>clementdelangue</category><category>altryne</category><category>eliebakouch</category><category>mervenoyann</category><category>xiaomimo</category><category>sundarpichai</category><category>scaling01</category><category>open-models</category><category>multimodality</category><category>vision</category><category>tokenization</category><category>pii-detection</category><category>privacy</category><category>enterprise-ai</category><category>agentic-ai</category><category>benchmarking</category><category>long-context</category><category>model-deployment</category><category>hardware-optimization</category><category>model-integration</category><category>software-engineering</category></item><item><title>GPT-Image-2</title><link>https://news.smol.ai/issues/26-04-21-image-2/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/26-04-21-image-2/</guid><description>**OpenAI** launched **GPT-Image-2**, enhancing image generation with improved text rendering, layout fidelity, editing, multilingual support, and &quot;thinking&quot; capabilities. It supports generating slides, infographics, diagrams, UI mockups, and QR codes, and integrates with tools like **Figma**, **Canva**, **Adobe Firefly**, and **Hermes Agent**. Benchmarks show GPT-Image-2 leads image generation tasks with a +242 Elo advantage. **Hugging Face** released **ml-intern**, an open-source agent automating post-training research loops, improving scientific reasoning and healthcare benchmarks significantly. **Hermes** is evolving into a richer local/open agent platform with enhanced multi-process orchestration capabilities.</description><pubDate>Tue, 21 Apr 2026 05:44:39 GMT</pubDate><category>openai</category><category>hugging-face</category><category>figma</category><category>canva</category><category>adobe</category><category>nous-research</category><category>gpt-image-2</category><category>qwen3-1.7b</category><category>codex</category><category>clementdelangue</category><category>lewtun</category><category>gdb</category><category>nickaturley</category><category>mark_k</category><category>petergostev</category><category>tekninum</category><category>mayank_022</category><category>image-generation</category><category>multilingual-models</category><category>model-integration</category><category>benchmarking</category><category>agent-infrastructure</category><category>multi-process-systems</category><category>fine-tuning</category><category>scientific-reasoning</category><category>healthcare-ai</category><category>hierarchical-decomposition</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-20-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-20-not-much/</guid><description>**Moonshot&apos;s Kimi K2.6** is a 
major open-weight **1T-parameter MoE** model featuring **32B active parameters**, **384 experts**, **MLA attention**, **256K context window**, native multimodality, and **INT4 quantization**. It supports day-0 integration with platforms like **vLLM**, **OpenRouter**, **Cloudflare Workers AI**, and others, showcasing state-of-the-art performance on benchmarks such as **HLE w/ tools 54.0**, **SWE-Bench Pro 58.6**, and **Math Vision w/ python 93.2**. The model excels in **long-horizon execution** with over **4,000 tool calls**, **12+ hour continuous runs**, and **300 parallel sub-agents**. Meanwhile, **Alibaba&apos;s Qwen3.6-Max-Preview** previewed enhanced **agentic coding**, improved world knowledge, and instruction following, with notable performance on **AIME 2026 #15** and ranking in **Code Arena**. **Hermes Agent** is rapidly expanding its ecosystem, surpassing **100K GitHub stars** and integrating with tools like **Ollama** and **Copilot CLI**, while pioneering advanced multi-agent orchestration techniques such as **stateless ephemeral units**, **LLM-driven replanning**, and **dynamic context injection**. 
These developments highlight the competitive momentum of Chinese open and semi-open labs in coding and agent models.</description><pubDate>Mon, 20 Apr 2026 05:44:39 GMT</pubDate><category>moonshot</category><category>alibaba</category><category>vllm</category><category>openrouter</category><category>cloudflare</category><category>baseten</category><category>mlx</category><category>nous-research</category><category>opencode</category><category>ollama</category><category>kimi-k2.6</category><category>qwen-3.6-max-preview</category><category>mixture-of-experts</category><category>multimodality</category><category>int4-quantization</category><category>long-context</category><category>agentic-coding</category><category>multi-agent-systems</category><category>model-orchestration</category><category>memory-consolidation</category><category>llm-driven-replanning</category><category>dynamic-context-injection</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-17-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-17-not-much/</guid><description>**Anthropic** launched **Claude Design**, a prototyping tool powered by **Claude Opus 4.7**, targeting design workflows and competing with **Figma** and others. Benchmarks show **Opus 4.7** leading in coding and text tasks, with improved efficiency and adaptive reasoning, though early user feedback noted some regressions and stability issues. Discussions highlighted its cost-efficiency and agentic capabilities compared to **Gemini 3.1 Pro** and **GPT-5.4**. 
Meanwhile, **OpenAI**&apos;s Codex updates introduced advanced computer-use features enabling fast, agentic control of desktop apps and enterprise software, signaling progress toward practical AGI-like agents.</description><pubDate>Fri, 17 Apr 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>claude-opus-4.7</category><category>gemini-3.1-pro</category><category>gpt-5.4</category><category>claude-code</category><category>codex</category><category>claudeai</category><category>yuchenj_uw</category><category>kimmonismus</category><category>skirano</category><category>therundownai</category><category>arena</category><category>artificialanlys</category><category>victortaelin</category><category>emollick</category><category>alexalbert__</category><category>theo</category><category>scaling01</category><category>reach_vb</category><category>kr0der</category><category>hamelhusain</category><category>mattrickard</category><category>matvelloso</category><category>gdb</category><category>agentic-ai</category><category>model-benchmarking</category><category>adaptive-reasoning</category><category>cost-efficiency</category><category>computer-use</category><category>prototyping-tools</category><category>code-generation</category><category>model-performance</category><category>software-integration</category></item><item><title>Anthropic&apos;s Claude Opus 4.7</title><link>https://news.smol.ai/issues/26-04-16-opus-47/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-16-opus-47/</guid><description>**Anthropic** launched **Claude Opus 4.7**, its most capable Opus model yet, featuring stronger coding and agentic performance, a new tokenizer, and improved long-context handling with a new **xhigh** reasoning tier. Benchmarks show substantial gains, including **SWE-bench Pro 64.3%**, **SWE-bench Verified 87.6%**, and **TerminalBench 69.4%**, with top rankings on **Vals Index** and **GDPval-AA**. 
Technical changes include a new tokenizer and increased image input resolution to **3.75MP**. Some long-context benchmarks showed mixed results, with a shift in focus from MRCR to Graphwalks. Adoption was rapid across tools like **Cursor**, **VS Code**, **Replit Agent**, and **Perplexity**. Meanwhile, **OpenAI** expanded **Codex** into a broader computer agent with Mac computer use, in-app browser, image generation/editing, 90+ plugins, multi-terminal support, SSH remote devbox access, and richer file previews. A new vertical life-sciences model, **GPT-Rosalind**, was also introduced.</description><pubDate>Thu, 16 Apr 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>cursor</category><category>replit</category><category>perplexity-ai</category><category>microsoft</category><category>claude-opus-4.7</category><category>codex</category><category>gpt-rosalind</category><category>bcherny</category><category>kimmonismus</category><category>scaling01</category><category>valsai</category><category>artificialanlys</category><category>natolambert</category><category>nrehiew_</category><category>coding</category><category>agentic-ai</category><category>tokenization</category><category>long-context</category><category>benchmarking</category><category>image-processing</category><category>software-engineering</category><category>computer-use</category><category>plugin-integration</category><category>multi-terminal-support</category><category>ssh-access</category><category>model-expansion</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-15-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-15-not-much/</guid><description>**OpenAI** expanded its Agents SDK by separating the agent harness from compute/storage, enabling long-running, durable agents with features like file/computer use, skills, memory, and compaction. 
The harness is now open-source and supports execution via partner sandboxes, fostering a new ecosystem with integrations from **Cloudflare**, **Modal**, **Vercel**, and others. **Cloudflare** launched **Project Think**, a next-gen Agents SDK with durable execution and sandboxed code, alongside **Agent Lee**, a prompt-driven UI agent using sandboxed TypeScript, and introduced real-time voice pipelines and browser automation tools. **Hermes Agent** focuses on persistent skill formation by learning from completed workflows, positioning itself as a professional agent distinct from GUI-first assistants like OpenClaw. *&quot;Hermes autonomously backfills tracking data, updates cron jobs, and saves workflows as reusable skills,&quot;* highlighting its advanced workflow management capabilities.</description><pubDate>Wed, 15 Apr 2026 05:44:39 GMT</pubDate><category>openai</category><category>cloudflare</category><category>modal</category><category>vercel</category><category>akshat_b</category><category>whoiskatrin</category><category>aninibread</category><category>braydenwilmoth</category><category>korinne_dev</category><category>kathyyliao</category><category>joshesye</category><category>chooseliberty</category><category>neoaiforecast</category><category>agents-sdk</category><category>sandboxing</category><category>durable-execution</category><category>state-management</category><category>voice-processing</category><category>browser-automation</category><category>workflow-automation</category><category>skill-formation</category><category>open-source</category><category>prompt-driven-ui</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-13-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-13-not-much/</guid><description>**Harness engineering** is emerging as a key discipline in AI agent development, emphasizing components like filesystems, memory, and retries beyond just models. 
**OpenAI&apos;s Codex** is expanding agentic coding workflows beyond software engineering, including codebase understanding and bug triage. Tooling trends show convergence on multi-agent orchestration, observability, and remote control, with **GitHub Copilot**, **Cursor**, and **LangChain** advancing these capabilities. The **Hermes Agent v0.9.0** release introduces a local web dashboard and enhanced security, gaining community traction over **OpenClaw** for UX and efficiency. The open agent ecosystem is growing with projects like **Open Agents** and **DeepAgent** providing modular stacks and runtimes.</description><pubDate>Mon, 13 Apr 2026 05:44:39 GMT</pubDate><category>openai</category><category>github</category><category>cursor</category><category>langchain</category><category>nous-research</category><category>codex</category><category>andrew_ng</category><category>steve_yegge</category><category>gabrielchua</category><category>giffmana</category><category>rhys_sullivan</category><category>teknium</category><category>shaun_furman</category><category>dabit3</category><category>robinebers</category><category>zainanzhou</category><category>nicoalbanese10</category><category>bromann</category><category>elliothyun</category><category>tiagonbotelho</category><category>pierceboggan</category><category>sydneyrunkle</category><category>agent-harnesses</category><category>multi-agent-systems</category><category>software-engineering</category><category>tooling</category><category>orchestration</category><category>observability</category><category>remote-control</category><category>security-hardening</category><category>user-experience</category><category>open-source</category><category>community-engagement</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-10-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-10-not-much/</guid><description>**GLM-5.1** has reached **#3 on Code Arena**, surpassing 
**Gemini 3.1** and **GPT-5.4**, and matching **Claude Sonnet 4.6** in coding performance. **Z.ai** now holds the **#1 open model rank**, close to the top overall. The advisor pattern, combining a cheap executor with an expensive advisor, is gaining traction, improving performance and efficiency in pairings like **Haiku + Opus** and **Sonnet + Opus**. **Alibaba&apos;s Qwen Code v0.14.x** introduces orchestration features including remote control channels, cron tasks, and sub-agent model selection. Model routing is becoming a product-level concern due to specialization and spikiness in top models such as **Opus** and **GPT-5.4**. The **Hermes Agent** ecosystem shows strong momentum with a new workspace mobile app, FAST mode for **OpenAI/GPT-5.4**, and over **50k GitHub stars**. Practitioners report Hermes as a reliable agent framework, with local Qwen3-Coder-Next 80B 4-bit replacing parts of workflows previously reliant on Claude Code. The harness layer is emerging as a key abstraction in agent frameworks.</description><pubDate>Fri, 10 Apr 2026 05:44:39 
GMT</pubDate><category>z-ai</category><category>anthropic</category><category>berkeley</category><category>langchain</category><category>alibaba</category><category>openai</category><category>glm-5.1</category><category>gemini-3.1</category><category>gpt-5.4</category><category>claude-3-sonnet</category><category>haiku</category><category>opus</category><category>sonnet</category><category>qwen-3.6-plus</category><category>qwen3-coder-next-80b</category><category>zixuan_li</category><category>akshay_pachaar</category><category>harrison_chase</category><category>walden_yan</category><category>yuchen_jin</category><category>sentdex</category><category>model-performance</category><category>agent-frameworks</category><category>orchestration</category><category>model-routing</category><category>fine-tuning</category><category>agent-harness</category><category>model-selection</category><category>workflow-automation</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-09-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-09-not-much/</guid><description>**Anthropic&apos;s Mythos** and **OpenAI&apos;s** upcoming restricted cyber-capable models are central to recent discussions, with debates on their security realism and evaluation methods. **LangChain&apos;s Deep Agents deploy** introduces an open memory, model-agnostic agent harness architecture emphasizing open protocols and memory ownership. Sandboxes are gaining prominence as a core infrastructure for reinforcement learning, with labs running up to **100K concurrent sandboxes** aiming for **1M**. 
The **Hermes Agent** by Nous continues to gain traction with new integrations and features like a web-based HUD and token cost tracking.</description><pubDate>Thu, 09 Apr 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>langchain</category><category>nous-research</category><category>mythos</category><category>kimmonismus</category><category>paul_cal</category><category>gneubig</category><category>kentonvarda</category><category>boazbaraktcs</category><category>ylecun</category><category>deanwball</category><category>hwchase17</category><category>vtrivedy10</category><category>sarahcat21</category><category>aijoey</category><category>cybersecurity</category><category>sandboxing</category><category>reinforcement-learning</category><category>agent-architecture</category><category>memory-management</category><category>model-deployment</category><category>software-security</category><category>evaluation-methods</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-08-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-08-not-much/</guid><description>**Meta Superintelligence Labs** launched **Muse Spark**, a natively multimodal reasoning model featuring tool use, visual chain of thought, and multi-agent orchestration. It is live on **meta.ai** and the Meta AI app with a private API preview and plans for open-sourcing future versions. Independent benchmarks rank Muse Spark highly, with strong performance on intelligence indices and efficiency, notably using over 10× less compute than **Llama 4 Maverick**. Key technical highlights include training efficiency, test-time scaling, and parallel multi-agent inference. Community testing shows strengths in image-to-code and one-shot game generation. 
Additionally, **Zhipu AI&apos;s GLM-5.1** is recognized as a leading open-weight model with an architecture similar to DeepSeek-V3.2.</description><pubDate>Wed, 08 Apr 2026 05:44:39 GMT</pubDate><category>meta-ai-fair</category><category>zhipu-ai</category><category>deepseek</category><category>muse-spark</category><category>llama-4-maverick</category><category>glm-5.1</category><category>deepseek-v3.2</category><category>alexandr_wang</category><category>shengjia_zhao</category><category>jack_w_rae</category><category>ananyaku</category><category>_jasonwei</category><category>artificialanlys</category><category>valsai</category><category>epochairesearch</category><category>scale_ai</category><category>matthuang</category><category>omarsar0</category><category>skirano</category><category>mattdeitke</category><category>garrytan</category><category>sebastian_raschka</category><category>multimodality</category><category>tool-use</category><category>visual-chain-of-thought</category><category>multi-agent-systems</category><category>training-efficiency</category><category>test-time-scaling</category><category>parallel-inference</category><category>image-to-code</category><category>model-benchmarking</category><category>model-architecture</category></item><item><title>Anthropic @ $30B ARR, Project Glasswing and Claude Mythos Preview — first model too dangerous to release since GPT-2</title><link>https://news.smol.ai/issues/26-04-06-anthropic-mythos/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-06-anthropic-mythos/</guid><description>**Anthropic** strategically challenges **OpenAI** amid its upcoming IPO concerns by announcing a jump from **$19B ARR in March** to **$30B ARR in April**, highlighting a differential growth rate and higher cost efficiency. The company also revealed **Claude Mythos**, rumored to be the largest successful training run, now restricted under **Project Glasswing** due to its dangerous capabilities. 
This model reportedly found thousands of high-severity vulnerabilities across major operating systems and browsers, showcasing unprecedented strategic thinking, situational awareness, and creative reward hacking. Notable figures like **Nicholas Carlini** and **Sam Bowman** commented on the model&apos;s advanced behaviors and unexpected internet access. Anthropic&apos;s disclosures emphasize both impressive business growth and groundbreaking AI capabilities.</description><pubDate>Tue, 07 Apr 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>claude-mythos</category><category>nicolas_carlini</category><category>sam_bowman</category><category>model-training</category><category>model-capabilities</category><category>security-vulnerabilities</category><category>strategic-thinking</category><category>reward-hacking</category><category>situational-awareness</category><category>benchmarking</category><category>model-restrictions</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-07-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-07-not-much/</guid><description>**Hermes Agent** is gaining attention as a leading open agent stack with features like self-improving skills, persistent memory, and a self-improvement loop. Its new **Manim skill** enables generation of math/technical animations, expanding agent capabilities. The Hermes ecosystem is rapidly growing with GUI tools, WebUI, HUD updates, OAuth support, and integrations. An open training-data movement for agents is emerging, focusing on sharing reusable behavioral data and harness traces. Meanwhile, **Anthropic&apos;s Claude Code** faces distribution and policy challenges, with reports of restrictions and unreliability impacting third-party coding agents, highlighting issues with subscription economics for always-on agents. 
*&quot;Claude Code now errors if used to analyze Claude Code source&quot;* and *&quot;basically unusable&quot;* are key community sentiments.</description><pubDate>Mon, 06 Apr 2026 05:44:39 GMT</pubDate><category>nous-research</category><category>anthropic</category><category>theo</category><category>clementdelangue</category><category>badlogicgames</category><category>yuchenj_uw</category><category>self-improving-skills</category><category>agent-architecture</category><category>memory-persistence</category><category>animation-generation</category><category>open-training-data</category><category>coding-agents</category><category>subscription-models</category><category>policy-restrictions</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-14-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-14-not-much/</guid><description>**Google** introduced **Skills in Chrome**, enabling reusable browser workflows with Gemini prompts and a library of ready-made Skills, enhancing end-user agentization. **Tencent** teased **HYWorld 2.0**, an open-source 3D world model generating editable scenes from a single image. **Google DeepMind** released **Gemini Robotics-ER 1.6**, improving visual/spatial reasoning for robotics with 93% instrument-reading success. **OpenAI** expanded Trusted Access with **GPT-5.4-Cyber**, a fine-tuned model for defensive security workflows. **Hugging Face** launched **Kernels** on the Hub, offering GPU kernel repos with 1.7x–2.5x speedups. **Cursor** showcased a multi-agent CUDA optimization system with a 38% speedup across 235 problems. The **Hermes Agent** stack advanced to v0.9.0 with enhanced reliability, memory management, and integrations, while **LangChain** pushed **deepagents 0.5** toward deployable, multi-tenant async systems with multimodal support and prompt caching. 
*&quot;Hermes’ key advantage is operational stability, extensibility, and deployability.&quot;*</description><pubDate>Mon, 06 Apr 2026 05:44:39 GMT</pubDate><category>google</category><category>tencent</category><category>google-deepmind</category><category>openai</category><category>hugging-face</category><category>cursor</category><category>langchain</category><category>gemini</category><category>gemini-robotics-er-1.6</category><category>gpt-5.4-cyber</category><category>deepagents-0.5</category><category>clementdelangue</category><category>dylantfwang</category><category>antoinersx</category><category>steveschoettler</category><category>teknium</category><category>aiqiang888</category><category>sydneyrunkle</category><category>agent-infrastructure</category><category>cuda-optimization</category><category>visual-reasoning</category><category>spatial-reasoning</category><category>gpu-kernels</category><category>multi-agent-systems</category><category>memory-management</category><category>async-systems</category><category>multimodality</category><category>prompt-caching</category><category>software-engineering</category><category>robotics</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-03-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-03-not-much/</guid><description>**Gemma 4** was launched by **Google** under an **Apache 2.0 license**, marking a significant open-model release focused on **reasoning, agentic workflows, multimodality, and on-device use**. It outperforms models 10x larger and has immediate ecosystem support including **vLLM**, **llama.cpp**, **Ollama**, **Intel hardware**, **Unsloth**, and **Hugging Face Inference Endpoints**. Local inference benchmarks showed strong performance on consumer hardware, including RTX 4090 and Mac mini M4. Early benchmarking praised its efficiency and ranking improvements over previous versions. 
Meanwhile, **Hermes Agent** emerged as a popular open-source agent harness, noted for stability and capability on long tasks, with users switching from OpenClaw to Hermes.</description><pubDate>Fri, 03 Apr 2026 05:44:39 GMT</pubDate><category>google</category><category>huggingface</category><category>intel</category><category>ollama</category><category>unsloth</category><category>gemma-4</category><category>fchollet</category><category>demishassabis</category><category>clementdelangue</category><category>quixiai</category><category>googlegemma</category><category>ggerganov</category><category>osanseviero</category><category>maartengr</category><category>basecampbernie</category><category>prince_canuma</category><category>measure_plan</category><category>kimmonismus</category><category>anemll</category><category>arena</category><category>stochasticchasm</category><category>reach_vb</category><category>zeneca</category><category>everlier</category><category>erick_lindberg_</category><category>anomalistg</category><category>reasoning</category><category>agentic-workflows</category><category>multimodality</category><category>on-device-ai</category><category>local-inference</category><category>model-benchmarking</category><category>moe</category><category>vision</category><category>audio-processing</category><category>memory-optimization</category><category>open-source</category><category>model-performance</category></item><item><title>Gemma 4</title><link>https://news.smol.ai/issues/26-04-02-gemma-4/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-02-gemma-4/</guid><description>**Google DeepMind** released **Gemma 4**, a family of open-weight, multimodal models with long-context support up to **256K tokens** under an **Apache 2.0 license**, marking a major capability and licensing shift. 
The lineup includes **31B dense**, **26B MoE (A4B)**, and two edge models (**E4B**, **E2B**) optimized for local and edge deployment with native multimodal support (text, vision, audio). Early benchmarks show **Gemma-4-31B** ranking #3 among open models and strong scientific reasoning performance with **85.7% GPQA Diamond**. Day-0 ecosystem support includes **llama.cpp**, **Ollama**, **vLLM**, and **LM Studio**, with notable local inference performance on hardware like **M2 Ultra** and **RTX 4090**. The architecture features hybrid attention and MoE layering, diverging from standard transformers. Community and developer engagement is high, with rapid adoption and tooling integration.</description><pubDate>Thu, 02 Apr 2026 05:44:39 GMT</pubDate><category>google-deepmind</category><category>gemma-4</category><category>gemma-4-31b</category><category>gemma-4-26b-a4b</category><category>jeffdean</category><category>_philschmid</category><category>rasbt</category><category>ggerganov</category><category>clattner_llvm</category><category>julien_c</category><category>clementdelangue</category><category>multimodality</category><category>long-context</category><category>model-architecture</category><category>moe</category><category>local-inference</category><category>model-optimization</category><category>function-calling</category><category>quantization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-04-01-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-04-01-not-much/</guid><description>**Arcee’s Trinity-Large-Thinking** was released with **open weights under Apache 2.0**, featuring a **400B total / 13B active** model size and strong agentic performance, ranking **#2 on PinchBench**. **Z.ai’s GLM-5V-Turbo** is a **vision coding model** with **native multimodal fusion** and a **CogViT encoder**, integrated into multiple platforms. 
**TII’s Falcon Perception** offers an **open-vocabulary referring expression segmentation model** with an **early-fusion transformer** and a competitive **0.3B OCR model**. **H Company’s Holo3** is a GUI-navigation model family based on **Qwen3.5**. A **Claude Code leak** revealed a minimalist agent core with a **4-layer context compression stack**, **40+ tool modular architecture**, and advanced features like **task budget management** and **streaming tool execution**. The leak highlights Anthropic’s agent design and operational sophistication.</description><pubDate>Wed, 01 Apr 2026 05:44:39 GMT</pubDate><category>arcee</category><category>z-ai</category><category>tii</category><category>anthropic</category><category>h-company</category><category>trinity-large-thinking</category><category>glm-5v-turbo</category><category>falcon-perception</category><category>qwen-3.5</category><category>claude-4.6-opus</category><category>claude-sonnet-4.5</category><category>mark_mcquade</category><category>latkins</category><category>willccbb</category><category>xlr8harder</category><category>natolambert</category><category>craig_hewitt</category><category>zhihu_frontier</category><category>open-weights</category><category>agentic-performance</category><category>vision</category><category>multimodality</category><category>transformer-architecture</category><category>early-fusion</category><category>ocr</category><category>gui-navigation</category><category>context-compression</category><category>tooling</category><category>feature-flags</category><category>production-ablations</category><category>task-budget-management</category><category>streaming</category><category>modular-architecture</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-30-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-30-not-much/</guid><description>**Anthropic** introduced **computer use inside Claude Code** for closed-loop 
verification in a research preview for Pro/Max users, enhancing reliable app iteration. **OpenAI** released a **Codex plugin for Claude Code**, enabling cross-agent composition and signaling a shift toward composable coding harnesses. OpenAI also noted that late-night Codex tasks run longer, supporting background agent delegation. **Nous Research**&apos;s **Hermes Agent** saw rapid adoption due to better compaction, adaptability, and multi-agent profiles, evolving toward an agent OS abstraction. An ecosystem around Hermes includes tools for trace analytics, fine-tuning, and remote control, with debates on open-source versus proprietary agent infrastructure. Key themes include tooling, prompt/runtime orchestration, and review loops as critical factors beyond model capabilities.</description><pubDate>Mon, 30 Mar 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>nous-research</category><category>huggingface</category><category>claude-code</category><category>codex</category><category>hermes-agent</category><category>omarsar0</category><category>dkundel</category><category>reach_vb</category><category>theo</category><category>jayfarei</category><category>kaiostephens</category><category>icarushermes</category><category>winglian</category><category>clementdelangue</category><category>fchollet</category><category>closed-loop-verification</category><category>cross-agent-composition</category><category>agent-ecosystem</category><category>multi-agent-systems</category><category>runtime-orchestration</category><category>tooling</category><category>fine-tuning</category><category>remote-monitoring</category><category>privacy</category><category>sandboxing</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-27-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-27-not-much/</guid><description>**Anthropic** is reportedly introducing a new AI model tier called 
**Capybara**, which is larger and more intelligent than **Claude Opus 4.6**, showing improved performance in coding, academic reasoning, and cybersecurity. The model is speculated to be around **10 trillion parameters**, with **Google** potentially funding Anthropic&apos;s data center expansion. Meanwhile, **Zhipu** released **GLM-5.1**, advancing open coding models and narrowing the gap with closed models. Local inference economics are improving, highlighted by efficient deployments of **Qwen 3.5 14B**, **Qwen 27B**, and **Qwen3.5-35B** models with quantization techniques like **TurboQuant vLLM**. However, TurboQuant&apos;s benchmarking claims face criticism from researchers. Overall, the AI landscape shows aggressive scaling, local model deployment, and agent products gaining traction.</description><pubDate>Fri, 27 Mar 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>google</category><category>zhipu</category><category>claude-opus-4.6</category><category>capybara</category><category>glm-5.1</category><category>qwen-3.5-14b</category><category>qwen-27b</category><category>qwen3.5-35b</category><category>scaling01</category><category>yuchenj_uw</category><category>kimmonismus</category><category>m1astra</category><category>dejavucoder</category><category>iscienceluvr</category><category>gaoj0017</category><category>model-scaling</category><category>coding</category><category>academic-reasoning</category><category>cybersecurity</category><category>quantization</category><category>local-inference</category><category>model-benchmarking</category><category>inference-optimization</category><category>model-performance</category><category>agent-products</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-24-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-24-not-much/</guid><description>**Anthropic** advances agent infrastructure with a multi-agent harness emphasizing 
orchestration and &quot;computer use&quot; for complex software environments. **Figma**, **GitHub**, and **Cursor** launch design canvases with direct AI editing, showcasing tool-calling becoming product-native. **Nous Research** releases **Hermes Agent v0.4.0** with 300+ PRs, adding OpenAI-compatible APIs and self-improving memory agents. Open agent ecosystems mature with **AI2&apos;s MolmoWeb** (4B and 8B models), **GenReasoning&apos;s OpenReward** platform offering 330+ RL environments and 4.5M+ tasks, and **Zhipu&apos;s ZClawBench** benchmark with 116 real-world agent tasks, highlighting progress toward standardized environment serving and benchmarkable agent tasks.</description><pubDate>Tue, 24 Mar 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>figma</category><category>github</category><category>cursor_ai</category><category>langchain</category><category>nous-research</category><category>ai2</category><category>genreasoning</category><category>zhipu-ai</category><category>huggingface</category><category>molmo-2-4b</category><category>molmo-2-8b</category><category>hermes-agent-v0.4.0</category><category>agent-infrastructure</category><category>multi-agent-systems</category><category>orchestration</category><category>computer-use</category><category>tool-calling</category><category>design-canvases</category><category>open-agent-platforms</category><category>reinforcement-learning-environments</category><category>benchmarking</category><category>rl-environments</category><category>self-improvement</category><category>api</category><category>memory-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-25-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-25-not-much/</guid><description>**ARC-AGI-3** benchmark introduced by **@arcprize** and **François Chollet** resets the frontier for general agentic reasoning with humans solving 100% of tasks versus under 1% 
for current models, focusing on zero-preparation generalization and human-like learning efficiency. The scoring protocol sparked debate over its harsh efficiency-based metric compared to prior ARC versions and other benchmarks like **NetHack**. The community acknowledges the benchmark highlights weaknesses in current LLM agents in interactive, sparse-feedback environments. Concurrently, agent infrastructure advances with **LangChain** launching Fleet shareable skills for reusable domain knowledge, and **Anthropic** revealing **Claude Code auto mode** for classifier-mediated approval balancing autonomy and manual confirmation. Browser and coding agents are evolving into trainable systems beyond prompt wrappers, exemplified by **BrowserBase** and **Prime Intellect** collaboration.</description><pubDate>Tue, 24 Mar 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>langchain</category><category>arcprize</category><category>primeintellect</category><category>arc-agi-3</category><category>claude-code</category><category>fchollet</category><category>mikeknoop</category><category>scaling01</category><category>_rockt</category><category>mark_k</category><category>andykonwinski</category><category>bradenjhancock</category><category>jeremyphoward</category><category>togelius</category><category>bracesproul</category><category>hwchase17</category><category>caspar_br</category><category>_catwu</category><category>agentic-reasoning</category><category>interactive-environments</category><category>benchmarking</category><category>efficiency-metrics</category><category>zero-preparation-generalization</category><category>agent-infrastructure</category><category>trainable-agents</category><category>classifier-approval</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-26-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-26-not-much/</guid><description>**Google** launched **Gemini 3.1 Flash 
Live**, a realtime voice and vision agent model with **2x longer conversation memory**, supporting **70 languages** and **128k context**. **Mistral AI** released **Voxtral TTS**, a low-latency, open-weight text-to-speech model supporting **9 languages** and competitive with ElevenLabs. **Cohere** introduced **Cohere Transcribe**, an audio model with **14-language** support and top English ASR leaderboard performance at **5.42 WER**. **OpenAI** released smaller multimodal variants **GPT-5.4 mini** and **GPT-5.4 nano** with **400k context**, noted for cost-competitiveness but high verbosity and hallucination rates. Other releases include **GLM-5-Turbo** by Zai, **Reka Edge** and **Flash 3** on OpenRouter, and new multi-agent UX tooling **Cline Kanban** for orchestrating CLI coding agents.</description><pubDate>Tue, 24 Mar 2026 05:44:39 GMT</pubDate><category>google-deepmind</category><category>mistral-ai</category><category>cohere</category><category>openai</category><category>zai</category><category>reka-ai</category><category>gemini-3.1-flash</category><category>voxtral-tts</category><category>cohere-transcribe</category><category>gpt-5.4-mini</category><category>gpt-5.4-nano</category><category>glm-5-turbo</category><category>reka-edge</category><category>reka-flash-3</category><category>logan_kilpatrick</category><category>sundar_pichai</category><category>guillaume_lample</category><category>aidan_gomez</category><category>jay_alammar</category><category>giffmana</category><category>andrew_curran</category><category>voice</category><category>vision</category><category>function-calling</category><category>context-windows</category><category>multimodality</category><category>text-to-speech</category><category>low-latency</category><category>human-preference</category><category>automatic-speech-recognition</category><category>model-benchmarking</category><category>cost-efficiency</category><category>hallucination-detection</category><category>multi-agent-systems</category>
<category>open-source</category><category>git-worktrees</category></item><item><title>The Claude Code Source Leak</title><link>https://news.smol.ai/issues/26-03-31-claude-code-leak/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-31-claude-code-leak/</guid><description>**Anthropic&apos;s** closed-source coding product **Claude Code** experienced a significant source leak exposing over **500k lines** of orchestration logic, including autonomous modes and memory systems, but not model weights. The leak led to rapid public reverse-engineering, numerous forks with up to **32.6k stars and 44.3k forks**, and subsequent **DMCA takedowns** by Anthropic. Suspicious npm packages emerged targeting users compiling the leaked code, creating a live security hazard. Discussions also mention unreleased model references like **&quot;mythos&quot;** and ongoing product feature updates despite the leak. *&quot;OFFICIAL STATEMENT from Anthropic regarding the leak&quot;* was noted but not detailed.</description><pubDate>Tue, 24 Mar 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>claude-code</category><category>model-architecture</category><category>security</category><category>reverse-engineering</category><category>dmca</category><category>software-development</category><category>open-source</category><category>code-leak</category><category>agent-harness-design</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-23-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-23-not-much/</guid><description>**Anthropic** introduced **Claude Cowork** and **Claude Code** enabling desktop control of mouse, keyboard, and screen in a **macOS research preview**, expanding agent capabilities beyond APIs and browsers. 
The agent ecosystem is evolving towards long-running, parallel, tool-rich workflows with projects like **Hermes Agent**, **T3 Code**, **Command Center**, and **Parchi** enhancing multi-agent orchestration and autonomous task management. Operational challenges such as fragility and inefficiency in subagents, including **GPT-5.2 Pro** and **Claude** browser/computer use, highlight the need for closed-loop feedback systems. Research from **Meta AI** advances self-improving agents with **Hyperagents / DGM-H** enabling meta-level procedural improvements, and unifies reinforcement learning post-training with **RLLM** (RL + LM-as-RM) to improve reward modeling across task types. Additionally, **WebArena-Infinity** drastically reduces browser environment construction costs, accelerating benchmark and environment generation.</description><pubDate>Mon, 23 Mar 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>meta-ai-fair</category><category>claude</category><category>gpt-5.2-pro</category><category>dgm-h</category><category>rllm</category><category>jenny_zhang</category><category>jase_weston</category><category>mikhail_parakhin</category><category>jeremyphoward</category><category>agent-frameworks</category><category>workflow-automation</category><category>multi-agent-systems</category><category>reinforcement-learning</category><category>reward-models</category><category>self-improving-agents</category><category>benchmark-generation</category><category>operational-efficiency</category><category>closed-loop-feedback</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-20-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-20-not-much/</guid><description>**Cursor&apos;s Composer 2**, built on **Kimi K2.5**, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement 
learning. **Claude Code** is expanding into third-party tools like **T3 Code** and communication channels such as Telegram and Discord, while **LangChain** is evolving from orchestration to multi-agent products with offerings like **Deep Agents/Open SWE** and **LangSmith Fleet**. The discourse emphasizes the importance of clear base-model attribution, licensing compliance, and product differentiation through fine-tuning and user experience.</description><pubDate>Fri, 20 Mar 2026 05:44:39 GMT</pubDate><category>cursor</category><category>kimi</category><category>fireworks</category><category>anthropic</category><category>langchain</category><category>kimi-k2.5</category><category>claude-code</category><category>clementdelangue</category><category>leerob</category><category>amanrsanger</category><category>yuchenj_uw</category><category>kimmonismus</category><category>model-attribution</category><category>fine-tuning</category><category>reinforcement-learning</category><category>open-source</category><category>agent-products</category><category>model-licensing</category><category>software-integration</category><category>product-differentiation</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-19-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-19-not-much/</guid><description>**Cursor** launched **Composer 2**, a frontier-class coding model with major cost reductions and strong benchmark scores like **61.3 on CursorBench** and **73.7 on SWE-bench Multilingual**. The model was improved via a **first continued pretraining run** feeding into reinforcement learning, trained across **3–4 clusters worldwide** by a **~40-person** team. **OpenAI** acquired **Astral**, the team behind Python tools **uv, ruff, and ty**, strengthening its developer platform. **Anthropic** expanded **Claude Code** with messaging app channels for persistent developer workflows. 
The focus in AI agents is shifting from single agents to managed fleets and runtimes, with **LangChain** launching **LangSmith Fleet** for enterprise agent management emphasizing **agent identity**, **credential management**, and auditability. Other launches include **Cognition&apos;s teams of Devins**, **AgentUI** by **lvwerra**, and discussions on agent runtimes with features like **checkpointing** and **rollback**. Security and permissions are emerging as critical constraints in agent system design.</description><pubDate>Thu, 19 Mar 2026 05:44:39 GMT</pubDate><category>cursor</category><category>openai</category><category>anthropic</category><category>langchain</category><category>cognition</category><category>claude-code</category><category>composer-2</category><category>kimmonismus</category><category>mntruell</category><category>theo</category><category>ellev3n11</category><category>amanrsanger</category><category>charliermarsh</category><category>gdb</category><category>yuchenj_uw</category><category>neilhtennek</category><category>simonw</category><category>yuvalinthedeep</category><category>lvwerra</category><category>hrishioa</category><category>reinforcement-learning</category><category>developer-tooling</category><category>agent-systems</category><category>agent-runtimes</category><category>security</category><category>credential-management</category><category>multi-agent-systems</category><category>model-training</category><category>benchmarking</category><category>software-engineering</category><category>enterprise-ai</category></item><item><title>MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model</title><link>https://news.smol.ai/issues/26-03-18-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-18-not-much/</guid><description>**MiniMax M2.7** is the headline model release, described as a &quot;self-evolving agent&quot; with strong performance metrics including **56.22% on SWE-Pro**, **57.0% on Terminal Bench 2**, and parity with 
**Sonnet 4.6**. It features recursive self-improvement in skills, memory, and architecture. **Artificial Analysis** places M2.7 on the cost/performance frontier with an Intelligence Index score of **50**, matching **GLM-5 (Reasoning)** but at a fraction of the cost. Distribution is available via platforms like **Ollama cloud** and **OpenRouter**. **Xiaomi’s MiMo-V2-Pro** is noted as a serious Chinese API-only reasoning model with a score of **49** on the Intelligence Index and favorable token efficiency. **Cartesia’s Mamba-3** is highlighted as an SSM optimized for inference-heavy use, with early reactions focusing on hybrid transformer architectures like **Qwen3.5** and **Kimi Linear**. The report emphasizes a shift from prompting to harness engineering, where the execution environment and agent harnesses, including skills and MCP, are becoming key differentiators in AI system design. This includes discussions on tools, repo legibility, constraints, and feedback loops, with mentions of **DSPy** and **GPT-5.4 mini** as important components in this evolving landscape.</description><pubDate>Wed, 18 Mar 2026 05:44:39 
GMT</pubDate><category>minimax</category><category>xiaomi</category><category>artificial-analysis</category><category>ollama</category><category>trae</category><category>yupp</category><category>openrouter</category><category>vercel</category><category>zo</category><category>opencode</category><category>kilocode</category><category>cartesia</category><category>minimax-m2.7</category><category>sonnet-4.6</category><category>glm-5</category><category>mimo-v2-pro</category><category>mamba-3</category><category>qwen-3.5</category><category>kimi-k2.5</category><category>gpt-5.4-mini</category><category>self-evolving-agents</category><category>reasoning</category><category>cost-efficiency</category><category>token-efficiency</category><category>hybrid-architecture</category><category>harness-engineering</category><category>agent-harnesses</category><category>skills</category><category>memory-optimization</category><category>architecture</category><category>feedback-loops</category><category>api</category><category>inference</category><category>execution-environment</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-17-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-17-not-much/</guid><description>**OpenAI** released **GPT-5.4 mini** and **GPT-5.4 nano**, their most capable small models optimized for coding, multimodal understanding, and subagents, featuring a **400k context window** and over **2x speed** compared to GPT-5 mini. The mini model approaches larger GPT-5.4 performance while using only **30% of Codex quota**, becoming the default for many coding workflows. Pricing concerns and truthfulness tradeoffs were noted, with mixed third-party evaluations on reasoning and resistance to false premises. OpenAI also addressed behavior tuning issues in a recent update. 
Meanwhile, agent infrastructure is evolving with secure code execution and orchestration tools like **LangChain&apos;s LangSmith Sandboxes** and **Open SWE**, inspired by internal systems at **Stripe, Ramp, and Coinbase**. Subagents and secure execution are now key product features, with releases like **Hermes Agent v0.3.0** showcasing plugin architectures, live Chrome control, and voice mode. Research on attention mechanisms, including **Attention Residuals** and vertical attention, is gaining traction.</description><pubDate>Tue, 17 Mar 2026 05:44:39 GMT</pubDate><category>openai</category><category>langchain</category><category>stripe</category><category>ramp</category><category>coinbase</category><category>nous-research</category><category>hermes-agent</category><category>gpt-5.4-mini</category><category>gpt-5.4-nano</category><category>gpt-5.4</category><category>codex</category><category>hwchase17</category><category>michpokrass</category><category>coding</category><category>multimodality</category><category>subagents</category><category>context-window</category><category>model-performance</category><category>pricing</category><category>behavior-tuning</category><category>secure-execution</category><category>plugin-architecture</category><category>attention-mechanisms</category><category>agent-infrastructure</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-16-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-16-not-much/</guid><description>**Moonshot&apos;s Attention Residuals** paper introduced an input-dependent attention mechanism over prior layers with a **1.25x compute advantage** and less than **2% inference latency overhead**, validated on **Kimi Linear 48B total / 3B active**. 
The paper sparked debate on novelty versus prior art like **DeepCrossAttention** and Google’s earlier work, highlighting tensions in **idea novelty**, **citation quality**, and **frontier-scale validation**. **OpenAI&apos;s Codex** showed strong momentum with over **2M weekly active users**, nearly **4x growth YTD**, and **GPT-5.4** hitting **5T tokens/day** and a **$1B annualized run-rate**. Codex added subagents supporting multi-agent coding workflows. Infrastructure for coding agents matured with tools like **Context Hub / chub** supporting agent feedback loops, **AssemblyAI&apos;s skill** for Claude Code and Codex, and automated skill extraction from GitHub repos yielding **40% knowledge-transfer gains**. **LangChain** launched **LangGraph CLI** and open-sourced **Deep Agents**, recreating top coding agent workflows with planning, filesystem ops, shell access, and sub-agents.</description><pubDate>Mon, 16 Mar 2026 05:44:39 GMT</pubDate><category>moonshot</category><category>openai</category><category>assemblyai</category><category>langchain</category><category>kimi-linear-48b</category><category>codex</category><category>gpt-5.4</category><category>claude-code</category><category>kimi_moonshot</category><category>elonmusk</category><category>yuchenj_uw</category><category>nathancgy4</category><category>eliebakouch</category><category>tokenbender</category><category>behrouz_ali</category><category>cloneofsimo</category><category>fidjissimo</category><category>sama</category><category>gdb</category><category>andrewyng</category><category>itsafiz</category><category>simplifyinai</category><category>attention-mechanisms</category><category>model-architecture</category><category>inference-speed</category><category>agent-feedback</category><category>agent-skills</category><category>multi-agent-systems</category><category>knowledge-transfer</category><category>cli-tools</category><category>coding-agents</category><category>model-deployment</category></item><item><title
>not much happened today</title><link>https://news.smol.ai/issues/26-03-13-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-13-not-much/</guid><description>**MCP tools** remain relevant for deterministic APIs despite ergonomic criticisms, with new **web MCP support in Chrome v146** enabling continuous browsing agents. Persistent memory is emerging as a key differentiator for agents, with IBM improving task completion rates and multi-agent memory framed as a computer architecture challenge. Agent UX is evolving towards always-on, cross-device operation, exemplified by **Perplexity Computer** on iOS and **Claude Code** session management. **Anthropic** released **Opus 4.6 1M context** as default with no extra long-context API charges, achieving **78.3% on MRCR v2 at 1M tokens**. Sparse attention optimizations like **IndexCache** in **DeepSeek Sparse Attention** yield significant speedups on large models with minimal code changes.</description><pubDate>Fri, 13 Mar 2026 05:44:39 
GMT</pubDate><category>anthropic</category><category>ibm</category><category>perplexity-ai</category><category>llamaindex</category><category>deepseek</category><category>google-chrome</category><category>opus-4.6</category><category>glm-5</category><category>pamelafox</category><category>tadasayy</category><category>llama_index</category><category>bromann</category><category>dair_ai</category><category>omarsar0</category><category>abxxai</category><category>teknuim</category><category>bcherny</category><category>kimmonismus</category><category>_catwu</category><category>alexalbert__</category><category>realyushibai</category><category>persistent-memory</category><category>agent-infrastructure</category><category>cross-device-synchronization</category><category>long-context</category><category>sparse-attention</category><category>inference-optimization</category><category>computer-architecture</category><category>task-completion</category><category>systems-performance</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-12-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-12-not-much/</guid><description>**Harnesses, agent infrastructure, and the MCP protocol** are central themes, with emphasis on how **harnesses, sandboxes, filesystem access, skills, memory, and observability** shape agent UI/UX and runtime environments. Despite jokes about MCP&apos;s demise, it remains vital in production, notably used internally by **Uber** and supported by **Anthropic**. The **coding-agent stack** is evolving with **CursorBench** combining offline and online metrics to evaluate models on **intelligence and efficiency**, where **GPT-5.4** leads in correctness and token efficiency. Agent-assisted development is splitting between automation-heavy workflows and &quot;stay-in-the-loop&quot; tooling, with **OpenAI** advancing **Codex Automations** featuring worktree vs. branch choices and UI customization. 
The open agent platform **Hermes Agent v0.2.0** introduces full MCP client support, ACP server for editors, and expanded provider integrations including **OpenAI OAuth**.</description><pubDate>Thu, 12 Mar 2026 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>uber</category><category>nous-research</category><category>cursor_ai</category><category>redisinc</category><category>artificialanlys</category><category>langchain-js</category><category>gpt-5.4</category><category>mattturck</category><category>hwchase17</category><category>omarsar0</category><category>gergelyorosz</category><category>htihle</category><category>theprimeagen</category><category>sydneyrunkle</category><category>corbtt</category><category>agent-infrastructure</category><category>mcp-protocol</category><category>harnesses</category><category>coding-agents</category><category>evaluation-methodologies</category><category>agent-ui-ux</category><category>runtime-environments</category><category>multi-axis-evaluation</category><category>automation</category><category>workflow-optimization</category><category>open-agent-platforms</category><category>provider-integration</category><category>filesystem-checkpoints</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-11-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-11-not-much/</guid><description>**NVIDIA’s Nemotron 3 Super** is a **120B parameter / ~12B active** open model featuring a **hybrid Mamba-Transformer / SSM Latent MoE** architecture and **1M context window**, delivering up to **2.2x faster inference than GPT-OSS-120B** in FP4 with strong throughput gains. It supports agentic workloads and is unusually open with weights, data, and infrastructure details released. The model scored **36 on the AA Intelligence Index**, outperforming GPT-OSS-120B but behind Qwen3.5-122B-A10B. 
Community and infrastructure support from projects like **vLLM**, **llama.cpp**, **Ollama**, **Together**, **Baseten**, **W&amp;B Inference**, **LangChain**, and **Unsloth GGUFs** was immediate. Key technical innovations include **native multi-token prediction (MTP)** and a significant **KV-cache efficiency** advantage. 

On the product side, a shift towards **persistent agent runtimes and orchestration layers** is highlighted, with **Andrej Karpathy** advocating for a &quot;bigger IDE&quot; concept where agents replace files as the unit of work, enabling legible, forkable agentic organizations with real-time control. New launches fitting this vision include **Perplexity’s Personal Computer**, an always-on local/cloud hybrid running on Mac mini, and **Computer for Enterprise** orchestrating 20 specialized models and 400+ apps. **Replit Agent 4** offers a collaborative, canvas-like workflow with parallel agents, while **Base44 Superagents** provide integrated solutions for nontechnical users. The engineering focus is increasingly on the orchestration harness rather than just the model.</description><pubDate>Wed, 11 Mar 2026 05:44:39 GMT</pubDate><category>nvidia</category><category>perplexity</category><category>replit</category><category>base44</category><category>vllm</category><category>llama.cpp</category><category>ollama</category><category>togethercompute</category><category>baseten</category><category>wandb</category><category>langchain</category><category>unsloth</category><category>nemotron-3-super</category><category>gpt-oss-120b</category><category>qwen3.5-122b-a10b</category><category>karpathy</category><category>ctnzr</category><category>bnjmn_marie</category><category>artificialanlys</category><category>model-architecture</category><category>model-optimization</category><category>inference-speed</category><category>kv-cache</category><category>multi-token-prediction</category><category>agent-infrastructure</category><category>orchestration</category><category>persistent-agents</category><category>model-serving</category><category>product-launches</category></item><item><title>Yann LeCun’s AMI Labs launches with a $1.03B seed to build world models around JEPA</title><link>https://news.smol.ai/issues/26-03-10-ami-labs/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/26-03-10-ami-labs/</guid><description>**Yann LeCun** launched **Advanced Machine Intelligence (AMI Labs)** with a record **$1.03B seed round** at a **$3.5B pre-money valuation**, aiming to build AI models that understand the **physical world** through **world models** rather than just language prediction. The startup, based in **Europe** with locations in **Paris** and **Zürich**, is framed as a major milestone for European AI and backed by a prominent founding team including **Alex Lebrun**, **Saining Xie**, and **Pascale Fung**. The mission is described as a &quot;long-term scientific endeavor&quot; to create AI that &quot;perceives, learns, reasons and acts&quot; in the real world.</description><pubDate>Tue, 10 Mar 2026 05:44:39 GMT</pubDate><category>ami-labs</category><category>ylecun</category><category>lxbrun</category><category>sainingxie</category><category>pascalefung</category><category>laurentsolly</category><category>world-models</category><category>representation-learning</category><category>pretraining</category><category>scaling</category><category>video</category><category>funding</category><category>seed-round</category><category>valuation</category><category>real-world-understanding</category></item><item><title>Autoresearch: Sparks of Recursive Self Improvement</title><link>https://news.smol.ai/issues/26-03-09-autoresearch/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-09-autoresearch/</guid><description>**RSI** covers AI developments from 3/5/2026 to 3/9/2026, highlighting the emergence of **LLMs autonomously training smaller LLMs**, marking a significant &quot;AutoML moment&quot; in AI progress. **Karpathy** and **Yi Tay** discuss &quot;vibe training,&quot; where AI models fix bugs and improve code autonomously, suggesting models may soon surpass human debugging efficiency. 
The report anticipates that **Jakub Pachocki&apos;s Automated AI Research Intern** system will arrive by September 2026 to accelerate human researchers. On AI Twitter, the focus is on **coding agents** shifting bottlenecks from implementation to review and verification, with **Anthropic&apos;s Claude Code Review** improving PR review effectiveness significantly, and tools like **OpenAI Codex Review** and **Cognition&apos;s Devin Review** enhancing code review workflows. Harness engineering is evolving into systems engineering, emphasizing decoupling agent storage from compute for collaborative agent teams.</description><pubDate>Mon, 09 Mar 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>cognition</category><category>claude-3</category><category>codex</category><category>karpathy</category><category>yi_tay</category><category>jakub_pachocki</category><category>automated-machine-learning</category><category>coding-agents</category><category>bug-fixing</category><category>model-autonomy</category><category>multi-agent-systems</category><category>pr-review</category><category>systems-engineering</category><category>model-verification</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-06-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-06-not-much/</guid><description>**OpenAI** rolled out **GPT-5.4**, which ties **Gemini 3.1 Pro Preview** for **#1** on the **Artificial Analysis Intelligence Index** with a score of **57** (up from 51 for GPT-5.2 xhigh). GPT-5.4 features a larger **~1.05M token** context window and higher per-token prices ($2.50/$15 vs $1.75/$14 for GPT-5.2), with strengths in **physics reasoning (CritPt)** and **agentic coding (TerminalBench Hard)** but a higher hallucination rate and **~28% higher benchmark run cost**. 
The **GPT-5.4 Pro** variant shows a **+10 point jump** on CritPt, reaching **30%**, but at an extreme output token cost of **$180 / 1M tokens**. Community benchmarks show GPT-5.4 excels in agentic/coding tasks, but feedback is mixed on its reasoning efficiency and literalness compared to **Claude**. OpenAI updated agent prompting guidance for GPT-5.4 API users, emphasizing tool use, structured outputs, and verification loops. **Claude Code** added local scheduled tasks and loop patterns for agents. The **MCP** framework is highlighted as connective tissue for AI evaluation and design-code round-trips, with **Truesight MCP** enabling AI evaluation like unit testing and **Figma MCP server** supporting bidirectional design-code integration. Open-source **T3 Code** launched as an agent orchestration coding app built on Codex CLI.</description><pubDate>Fri, 06 Mar 2026 05:44:39 GMT</pubDate><category>openai</category><category>artificial-analysis</category><category>gemini</category><category>claude</category><category>mit</category><category>figma</category><category>github</category><category>gpt-5.4</category><category>gpt-5.2</category><category>gemini-3.1-pro</category><category>benchmarking</category><category>physics-reasoning</category><category>agentic-coding</category><category>hallucination-detection</category><category>context-windows</category><category>cost-efficiency</category><category>agent-prompting</category><category>scheduled-tasks</category><category>loop-patterns</category><category>ai-evaluation</category><category>design-code-integration</category><category>agent-orchestration</category><category>open-source</category></item><item><title>GPT 5.4: SOTA Knowledge Work -and- Coding -and- CUA Model, OpenAI is so very back</title><link>https://news.smol.ai/issues/26-03-05-gpt54/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-05-gpt54/</guid><description>**OpenAI** launched **GPT-5.4** and **GPT-5.4 Pro** with unified mainline and Codex 
models, featuring **native computer use**, up to **~1M token context**, and efficiency improvements including a new **Codex `/fast` mode**. Benchmarks showed strong results like **OSWorld-Verified 75.0%** surpassing human baseline and **GDPval 83%** against industry pros. User feedback highlighted coding utility but raised concerns about pricing and overthinking. Integration with devtools like **Cursor**, **Perplexity**, and **Arena** was announced. In systems research, **FlashAttention-4 (FA4)** was introduced with near-matmul speed attention on **Blackwell** GPUs, featuring innovations like **polynomial exp emulation** and **online softmax**. *&quot;Steering mid-response&quot;* and *&quot;fewer tokens, faster speed&quot;* were emphasized as UX and efficiency improvements.</description><pubDate>Thu, 05 Mar 2026 05:44:39 GMT</pubDate><category>openai</category><category>cursor_ai</category><category>perplexity_ai</category><category>arena</category><category>gpt-5.4</category><category>gpt-5.4-pro</category><category>sama</category><category>reach_vb</category><category>scaling01</category><category>danshipper</category><category>yuchenj_uw</category><category>native-computer-use</category><category>long-context</category><category>efficiency</category><category>steering</category><category>benchmarking</category><category>gpu-kernels</category><category>attention-mechanisms</category><category>algorithmic-optimization</category><category>pipeline-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-04-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-03-04-not-much/</guid><description>**Gemini 3.1 Flash-Lite** is highlighted by **Demis Hassabis** for its speed and cost-efficiency, focusing on latency and cost per capability rather than raw performance. **NotebookLM Studio** introduces a new feature for generating immersive cinematic video overviews. 
Rumors about **GPT-5.4** suggest a ~1 million token context window and an &quot;extreme reasoning mode&quot; for long-horizon tasks, with speculation about monthly model updates from **OpenAI**. **Anthropic&apos;s Claude Opus 4.6** is noted for strong general agent behavior but weaker visual mathematics performance. **Alibaba&apos;s Qwen** team faces leadership exits and restructuring, with concerns about compute access and organizational changes. Qwen models dominate research workflows, appearing in 41% of Hugging Face papers in 2025-2026, raising ecosystem dependence risks. The open-weight model landscape may consolidate around non-profits, **NVIDIA**, and **Meta** due to business incentives.</description><pubDate>Wed, 04 Mar 2026 05:44:39 GMT</pubDate><category>google-deepmind</category><category>openai</category><category>anthropic</category><category>alibaba</category><category>nvidia</category><category>meta-ai-fair</category><category>hugging-face</category><category>gemini-3.1-flash-lite</category><category>gpt-5.4</category><category>claude-opus-4.6</category><category>qwen-3.5</category><category>qwen</category><category>demishassabis</category><category>natolambert</category><category>poezhao0605</category><category>simonw</category><category>model-positioning</category><category>latency</category><category>cost-efficiency</category><category>context-window</category><category>extreme-reasoning</category><category>agentic-ai</category><category>model-updates</category><category>general-agent-behavior</category><category>visual-mathematics</category><category>leadership-exits</category><category>organizational-restructuring</category><category>compute-access</category><category>research-workflows</category><category>open-weight-models</category><category>ecosystem-dependence</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-03-not-much/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/26-03-03-not-much/</guid><description>**Google DeepMind** launched **Gemini 3.1 Flash-Lite**, emphasizing *dynamic thinking levels* for adjustable compute, with notable metrics like **$0.25/M input**, **$1.50/M output**, **1432 Elo on LMArena**, and **2.5× faster time-to-first-token** than Gemini 2.5 Flash. It supports a **1M context window** and high throughput for multimodal inputs including text, images, video, audio, and PDFs. **OpenAI** rolled out **GPT-5.3 Instant** to all ChatGPT users, improving conversational naturalness and reducing hallucinations by **26.8% with search**. The upcoming **GPT-5.4** was teased amid speculation. **Alibaba&apos;s Qwen** faces leadership exits, raising concerns about its future and open-source status. The news highlights advancements in model efficiency, pricing, and multimodality, alongside organizational changes impacting AI development.</description><pubDate>Tue, 03 Mar 2026 05:44:39 GMT</pubDate><category>google-deepmind</category><category>google</category><category>openai</category><category>alibaba</category><category>gemini-3.1-flash-lite</category><category>gemini-3</category><category>gpt-5.3</category><category>gpt-5.4</category><category>qwen</category><category>jeffdean</category><category>noamshazeer</category><category>sundarpichai</category><category>aidan_mclau</category><category>justinlin610</category><category>multimodality</category><category>latency</category><category>throughput</category><category>context-window</category><category>model-pricing</category><category>model-benchmarking</category><category>model-performance</category><category>conversational-ai</category><category>hallucination-reduction</category><category>api</category><category>model-rollout</category><category>leadership-exit</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-03-02-not-much/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/26-03-02-not-much/</guid><description>**Alibaba** released the **Qwen 3.5** series with models ranging from **0.8B to 9B** parameters, featuring **native multimodality**, **scaled reinforcement learning**, and targeting **edge and lightweight agent** deployments. The models support very long context windows up to **262K tokens** (extendable to 1M) and use a novel **Gated DeltaNet hybrid attention** architecture combining linear and full attention layers. Deployment examples include **Ollama** and **LM Studio**, with a notable **6-bit on-device demo on iPhone 17 Pro**. Evaluators are cautioned that reasoning is disabled by default on smaller models. In coding agents, **Codex 5.3** shows promising benchmark results on **WeirdML** with **79.3%** accuracy, though availability and downtime remain critical challenges, especially highlighted by **Claude** outages. Agent reliability and observability are emphasized as cross-functional problems requiring clear success criteria and practical evaluation strategies. 
Studies show that using **AGENTS.md** and **SKILL.md** guardrails can significantly reduce runtime and token usage by mitigating worst-case thrashing in coding workflows.</description><pubDate>Mon, 02 Mar 2026 05:44:39 GMT</pubDate><category>alibaba</category><category>ollama</category><category>lm-studio</category><category>openai</category><category>anthropic</category><category>qwen-3.5-0.8b</category><category>qwen-3.5-2b</category><category>qwen-3.5-4b</category><category>qwen-3.5-9b</category><category>codex-5.3</category><category>claude-3</category><category>nrehiew_</category><category>kimmonismus</category><category>lioronai</category><category>danielhanchen</category><category>theo</category><category>htihle</category><category>teortaxestex</category><category>theprimeagen</category><category>yuchenj_uw</category><category>_lewtun</category><category>saen_dev</category><category>_philschmid</category><category>omarsar0</category><category>multimodality</category><category>reinforcement-learning</category><category>long-context</category><category>hybrid-attention</category><category>on-device-ai</category><category>model-deployment</category><category>agent-reliability</category><category>agent-observability</category><category>coding-agents</category><category>benchmarking</category><category>runtime-optimization</category><category>token-efficiency</category></item><item><title>OpenAI closes $110B raise from Amazon, NVIDIA, SoftBank in largest startup fundraise in history @ $840B post-money</title><link>https://news.smol.ai/issues/26-02-27-openai-g/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-27-openai-g/</guid><description>**OpenAI** has closed a major funding round totaling **$110 billion** at a **$730 billion pre-money valuation**, with investments from **SoftBank ($30B)**, **NVIDIA ($30B)**, and **Amazon ($50B)**. 
Key user metrics include **1.6 million weekly Codex users**, **over 9 million paying business users** of ChatGPT, and **more than 900 million weekly active ChatGPT users** with **50 million consumer subscribers**. The partnership with Amazon includes exclusive cloud services and **2 gigawatts of Trainium capacity**. Microsoft maintains a reduced partnership with stateless APIs. This funding round is one of the largest in history, highlighting OpenAI&apos;s dominant position in AI adoption and infrastructure.</description><pubDate>Fri, 27 Feb 2026 05:44:39 GMT</pubDate><category>openai</category><category>softbank</category><category>nvidia</category><category>amazon</category><category>microsoft</category><category>codex</category><category>chatgpt</category><category>sama</category><category>model-scaling</category><category>model-metrics</category><category>investment</category><category>cloud-computing</category><category>infrastructure</category><category>training-capacity</category><category>user-growth</category><category>partnerships</category></item><item><title>Nano Banana 2 aka Gemini 3.1 Flash Image Preview: the new SOTA Imagegen model</title><link>https://news.smol.ai/issues/26-02-26-nanobanana2/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-26-nanobanana2/</guid><description>**Google and DeepMind** launched **Nano Banana 2** (aka **Gemini 3.1 Flash Image Preview**), a leading image generation and editing model integrated across multiple Google products with features like **4K upscaling**, **multi-subject consistency**, and **real-time search-conditioned generation**. Evaluations rank it #1 in text-to-image tasks with competitive pricing. Additionally, advances in **agentic coding** are noted with models like **GPT-5.2**, **GPT-5.3 Codex**, **Opus 4.6**, and **Gemini 3.1**, alongside Microsoft&apos;s **Copilot Tasks** introducing task delegation. 
Persistent memory features are rolling out in **Claude** models, though interoperability challenges remain.</description><pubDate>Thu, 26 Feb 2026 05:44:39 GMT</pubDate><category>google</category><category>google-deepmind</category><category>microsoft</category><category>anthropic</category><category>perplexity-ai</category><category>gemini-3.1-flash</category><category>gpt-5.2</category><category>gpt-5.3-codex</category><category>opus-4.6</category><category>claude</category><category>sundarpichai</category><category>demishassabis</category><category>mustafasuleyman</category><category>yusuf_i_mehdi</category><category>borisdayma</category><category>aravsrinivas</category><category>image-generation</category><category>text-rendering</category><category>3d-imaging</category><category>real-time-information</category><category>agentic-ai</category><category>persistent-memory</category><category>multi-agent-systems</category><category>tooling</category><category>coding-agents</category><category>task-delegation</category></item><item><title>Agentic Engineering: WTF Happened in December 2025?</title><link>https://news.smol.ai/issues/26-02-25-wtf-happened/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-25-wtf-happened/</guid><description>**Perplexity** launched **Computer**, an orchestration-first agent platform featuring multi-model routing, usage-based pricing, and parallel asynchronous sub-agents for distributed workflows. **Andrej Karpathy** claims a &quot;phase change&quot; in coding agents since December, highlighting sustained long-horizon task completion. **OpenAI** released **GPT-5.3-Codex** with ~25% speed improvements and strong benchmark performance, while **Claude Code** celebrates its first year with ecosystem integrations and scaling challenges. 
This marks a significant shift in coding workflows and agent-based software development.</description><pubDate>Wed, 25 Feb 2026 05:44:39 GMT</pubDate><category>perplexity</category><category>openai</category><category>anthropic</category><category>langchain-ai</category><category>gpt-5.3-codex</category><category>claude-code</category><category>karpathy</category><category>aravsrinivas</category><category>lioronai</category><category>denisyarats</category><category>swyx</category><category>catwu</category><category>hwchase17</category><category>coding-agents</category><category>agent-architecture</category><category>distributed-workflows</category><category>usage-based-pricing</category><category>model-routing</category><category>benchmarking</category><category>context-length</category><category>observability</category><category>software-development</category></item><item><title>Anthropic accuses DeepSeek, Moonshot, and MiniMax of &quot;industrial-scale distillation attacks&quot;.</title><link>https://news.smol.ai/issues/26-02-23-anthropic-distillation/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-23-anthropic-distillation/</guid><description>**Anthropic** alleges *industrial-scale* distillation attacks on its **Claude** model by **DeepSeek**, **Moonshot AI**, and **MiniMax**, involving **~24,000 fraudulent accounts** and **&gt;16M Claude exchanges** to extract capabilities, raising concerns about competitive risks and safety. The community debates the difference between scraping and API-output extraction, highlighting a shift toward protecting models via *API abuse resistance* techniques. Meanwhile, coding agents like **Codex** and **Claude Code** see real adoption and failures, with emerging best practices in &quot;agentic engineering&quot; led by **Simon Willison**. 
The **OpenClaw** ecosystem expands with alternatives like **NanoClaw** and integrations such as **Ollama 0.17** simplifying open model usage.</description><pubDate>Tue, 24 Feb 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>deepseek</category><category>moonshot-ai</category><category>minimax</category><category>openai</category><category>ollama</category><category>claude</category><category>claude-3</category><category>codex</category><category>claude-code</category><category>simon_willison</category><category>api-abuse-resistance</category><category>model-security</category><category>agentic-engineering</category><category>coding-agents</category><category>model-distillation</category><category>workflow-automation</category><category>sandboxing</category><category>realtime-communication</category></item><item><title>Claude Code Anniversary + Launches from: Qwen 3.5, Cursor Demos, Cognition Devin 2.2, Inception Mercury 2</title><link>https://news.smol.ai/issues/26-02-24-claude-code/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-24-claude-code/</guid><description>**Alibaba** launched the **Qwen 3.5 Medium Model Series** featuring models like **Qwen3.5-Flash**, **Qwen3.5-35B-A3B (MoE)**, and **Qwen3.5-122B-A10B (MoE)** emphasizing efficiency over scale with innovations like **1M context** and INT4 quantization. **OpenAI** released **GPT-5.3-Codex** via the **Responses API** with enhanced file input support and faster web socket-based throughput. **Anthropic** introduced **Claude Code Remote Control** enabling terminal session continuation from mobile and expanded enterprise workflow features. 
**Cursor** shifted UX to agent demo videos instead of diffs, highlighting new interaction modes.</description><pubDate>Tue, 24 Feb 2026 05:44:39 GMT</pubDate><category>alibaba</category><category>openai</category><category>anthropic</category><category>cursor</category><category>huggingface</category><category>qwen3.5-flash</category><category>qwen3.5-35b-a3b</category><category>qwen3.5-122b-a10b</category><category>qwen3.5-27b</category><category>qwen3.5-397b-a17b</category><category>gpt-5.3-codex</category><category>claude-code</category><category>awnihannun</category><category>andrew_n_carr</category><category>justinlin610</category><category>unslothai</category><category>terryyuezhuo</category><category>haihaoshen</category><category>0xsero</category><category>ali_tongyilab</category><category>scaling01</category><category>gdb</category><category>noahzweben</category><category>_catwu</category><category>model-architecture</category><category>reinforcement-learning</category><category>quantization</category><category>context-windows</category><category>agentic-ai</category><category>api</category><category>websockets</category><category>software-ux</category><category>enterprise-workflows</category><category>model-deployment</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-02-20-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-20-not-much/</guid><description>**Gemini 3.1 Pro** demonstrates strong retrieval capabilities and cost efficiency compared to **GPT-5.2** and **Opus 4.6**, though users report tooling and UI issues. The **SWE-bench Verified** evaluation methodology is under scrutiny for consistency, with updates bringing results closer to developer claims. Benchmarking debates arise over what frontier models truly measure, especially with ARC-AGI puzzles. 
**Claude Opus 4.6** shows a noisy but notable **14.5-hour time horizon** on software tasks, with token limits causing practical failures. **Sonnet 4.6** improves significantly in code and instruction-following benchmarks, but user backlash grows due to product regressions.</description><pubDate>Sat, 21 Feb 2026 05:44:39 GMT</pubDate><category>google-deepmind</category><category>anthropic</category><category>context-arena</category><category>artificial-analysis</category><category>epoch-ai</category><category>scaling01</category><category>gemini-3.1-pro</category><category>gpt-5.2</category><category>opus-4.6</category><category>sonnet-4.6</category><category>claude-opus-4.6</category><category>dillonuzar</category><category>artificialanlys</category><category>yuchenj_uw</category><category>theo</category><category>minimax_ai</category><category>epochairesearch</category><category>paul_cal</category><category>scaling01</category><category>metr_evals</category><category>idavidrein</category><category>xlr8harder</category><category>htihle</category><category>arena</category><category>retrieval</category><category>benchmarking</category><category>evaluation-methodology</category><category>token-limits</category><category>cost-efficiency</category><category>instruction-following</category><category>software-reasoning</category><category>model-reliability</category></item><item><title>Gemini 3.1 Pro: 2x 3.0 on ARC-AGI 2</title><link>https://news.smol.ai/issues/26-02-19-gemini31/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-19-gemini31/</guid><description>**Google** released **Gemini 3.1 Pro**, a developer preview integrated across the **Gemini app**, **NotebookLM**, **Gemini API / AI Studio**, and **Vertex AI**, highlighting a significant reasoning improvement with **ARC-AGI-2 = 77.1%** and strong coding and agentic-tool benchmarks like **SWE-Bench Verified = 80.6%**. 
Independent evaluators such as **Artificial Analysis** and **Arena** confirmed top-tier performance and cost efficiency, though community reactions included excitement about practical gains, skepticism about benchmark targeting, and concerns over rollout inconsistencies. The release emphasizes the same core intelligence powering **Gemini 3 Deep Think** scaled for practical use, with notable mentions from leaders like *@sundarpichai*, *@demishassabis*, and *@JeffDean*.</description><pubDate>Thu, 19 Feb 2026 05:44:39 GMT</pubDate><category>google</category><category>google-deepmind</category><category>geminiapp</category><category>gemini-3.1-pro</category><category>gemini-3-deep-think</category><category>sundarpichai</category><category>demishassabis</category><category>jeffdean</category><category>koraykv</category><category>noamshazeer</category><category>joshwoodward</category><category>artificialanlys</category><category>arena</category><category>oriolvinyalsml</category><category>scaling01</category><category>reasoning</category><category>benchmarking</category><category>agentic-ai</category><category>cost-efficiency</category><category>hallucination</category><category>code-generation</category><category>model-release</category><category>developer-tools</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-02-18-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-18-not-much/</guid><description>**Anthropic** released **Claude Opus/Sonnet 4.6**, showing a significant intelligence index jump but with increased token usage and cost. **Anthropic** also shared insights on AI agent autonomy, highlighting human-in-the-loop prevalence and software engineering tool calls. **Alibaba** launched **Qwen 3.5** with discussions on reasoning efficiency and token bloat, plus open-sourced **Qwen3.5-397B-A17B FP8 weights**. 
The **GLM-5** technical report introduced asynchronous agent reinforcement learning and compute-efficient techniques. Rumors about **Gemini 3.1 Pro** suggest longer reasoning capabilities, while **MiniMax M2.5** appeared on community leaderboards. The community debates benchmark reliability and model performance nuances.</description><pubDate>Wed, 18 Feb 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>alibaba</category><category>scaling01</category><category>arena</category><category>artificial-analysis</category><category>claude-4.6</category><category>claude-opus-4.6</category><category>claude-sonnet-4.6</category><category>qwen-3.5</category><category>qwen3.5-397b-a17b</category><category>glm-5</category><category>gemini-3.1-pro</category><category>minimax-m2.5</category><category>eshear</category><category>theo</category><category>omarsar0</category><category>grad62304977</category><category>scaling01</category><category>benchmarking</category><category>token-efficiency</category><category>ai-agent-autonomy</category><category>reinforcement-learning</category><category>asynchronous-learning</category><category>model-performance</category><category>open-weights</category><category>reasoning</category><category>software-engineering</category><category>agentic-engineering</category></item><item><title>Claude Sonnet 4.6: clean upgrade of 4.5, mostly better with some caveats</title><link>https://news.smol.ai/issues/26-02-17-sonnet-46/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-17-sonnet-46/</guid><description>**Anthropic** launched **Claude Sonnet 4.6**, an upgrade over Sonnet 4.5, featuring broad improvements in **coding, long-context reasoning, agent planning, knowledge work, and design**, plus a **1M-token context window (beta)**. Benchmarks show Sonnet 4.6 leading on **GDPval-AA ELO 1633**, with significant token usage increases and improved output aesthetics. 
Integrations include **Cursor, Windsurf, Microsoft Foundry, and Perplexity Pro/Max**. Early user feedback noted some regression issues that were later fixed. Pricing remains the same as Sonnet 4.5. Tooling enhancements include code execution for filtering results, improving accuracy and efficiency.</description><pubDate>Tue, 17 Feb 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>cursor</category><category>microsoft</category><category>perplexity-ai</category><category>cognition</category><category>claude-3-sonnet-4.6</category><category>claude-3-sonnet-4.5</category><category>claude-3-opus-4.5</category><category>claude-3-opus-4.6</category><category>alexalbert__</category><category>scaling01</category><category>rishdotblog</category><category>claudeai</category><category>kimmonismus</category><category>artificialanlys</category><category>long-context</category><category>agent-planning</category><category>knowledge-work</category><category>benchmarking</category><category>tokenization</category><category>model-integration</category><category>code-execution</category><category>model-updates</category><category>aesthetic-quality</category></item><item><title>Qwen3.5-397B-A17B: the smallest Open-Opus class, very efficient model</title><link>https://news.smol.ai/issues/26-02-16-qwen35/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-16-qwen35/</guid><description>**Alibaba** released **Qwen3.5-397B-A17B**, an open-weight model featuring **native multimodality**, **spatial intelligence**, and a **hybrid linear attention + sparse MoE** architecture supporting **201 languages** and **long context windows** up to **256K tokens**. The model shows improvements over previous versions like **Qwen3-Max** and **Qwen3-VL**, with a sparsity ratio of about **4.3%**. 
Community discussions highlighted the **Gated Delta Networks** enabling efficient inference despite large model size (~**800GB BF16**), with successful local runs on Apple Silicon using quantization techniques. The hosted API version, **Qwen3.5-Plus**, supports **1M context** and integrates search and code interpreter features. This release follows other Chinese labs like **Z.ai**, **Minimax**, and **Kimi** in refreshing large models. The model is licensed under **Apache-2.0** and is expected to be the last major release before **DeepSeek v4**. The news also notes **Pete Steinberger** joining **OpenAI**.</description><pubDate>Mon, 16 Feb 2026 05:44:39 GMT</pubDate><category>alibaba</category><category>openai</category><category>deepseek</category><category>z-ai</category><category>minimax</category><category>kimi</category><category>unsloth</category><category>ollama</category><category>vllm</category><category>qwen3.5-397b-a17b</category><category>qwen3.5-plus</category><category>qwen3-max</category><category>qwen3-vl</category><category>kimi</category><category>pete_steinberger</category><category>justinlin610</category><category>native-multimodality</category><category>spatial-intelligence</category><category>sparse-moe</category><category>long-context</category><category>model-quantization</category><category>model-architecture</category><category>model-deployment</category><category>inference-optimization</category><category>apache-2.0-license</category></item><item><title>MiniMax-M2.5: SOTA coding, search, toolcalls, $1/hour</title><link>https://news.smol.ai/issues/26-02-13-minimax25/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-13-minimax25/</guid><description>**MiniMax-M2.5** is now open source, featuring an &quot;agent-native&quot; reinforcement learning framework called **Forge** trained across **200k+ RL environments** for coding, tool use, and workflows. 
It boasts strong benchmark scores like **80.2% SWE-Bench Verified** and emphasizes cost-efficiency with claims like &quot;$1 per hour at 100 tps&quot; and good on-device performance. The **Forge** RL system uses multi-level prefix caching and high rollout compute share (~60%) to generate millions of trajectories daily. Independent reviews note improved stability and multi-turn viability but high token usage. The ecosystem rapidly adopted MiniMax-M2.5 with quantized releases including **2-bit GGUF** and **INT4** formats. Meanwhile, **Together** markets **GLM-5** as a leading open-source model for long-horizon agents with **77.8% SWE-Bench Verified** and MoE efficiency using DeepSeek Sparse Attention.</description><pubDate>Fri, 13 Feb 2026 05:44:39 GMT</pubDate><category>minimax-ai</category><category>togethercompute</category><category>huggingface</category><category>intel</category><category>wandb</category><category>minimax-m2.5</category><category>glm-5</category><category>reinforcement-learning</category><category>agent-based-models</category><category>model-quantization</category><category>benchmarking</category><category>model-efficiency</category><category>multi-turn-dialogue</category><category>infrastructure-optimization</category><category>cost-efficiency</category><category>on-device-ai</category></item><item><title>new Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5</title><link>https://news.smol.ai/issues/26-02-12-anthropic-gemini-deepthink/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-12-anthropic-gemini-deepthink/</guid><description>**Google DeepMind** is rolling out the upgraded **Gemini 3 Deep Think V2** reasoning mode to **Google AI Ultra** subscribers and opening early access to the **Vertex AI / Gemini API** for select users. 
Key benchmark achievements include **ARC-AGI-2 at 84.6%**, **Humanity’s Last Exam (HLE) at 48.4% without tools**, and a **Codeforces Elo of 3455**, showcasing Olympiad-level performance in physics and chemistry. The mode emphasizes practical scientific and engineering applications such as error detection in math papers, physical system modeling, semiconductor optimization, and a **sketch to CAD/STL pipeline** for 3D printing. ARC benchmark creator François Chollet highlights the benchmark&apos;s role in advancing test-time adaptation and fluid intelligence, projecting human-AI parity around **2030**. This rollout is framed as a productized, compute-heavy test-time mode rather than a lab demo, with cost disclosures for ARC tasks provided.</description><pubDate>Thu, 12 Feb 2026 05:44:39 GMT</pubDate><category>google-deepmind</category><category>google</category><category>geminiapp</category><category>arcprize</category><category>gemini-3-deep-think-v2</category><category>arc-agi-2</category><category>demishassabis</category><category>sundarpichai</category><category>fchollet</category><category>jeffdean</category><category>oriolvinyalsml</category><category>tulseedoshi</category><category>benchmarking</category><category>reasoning</category><category>test-time-adaptation</category><category>fluid-intelligence</category><category>scientific-computing</category><category>engineering-workflows</category><category>3d-modeling</category><category>cost-analysis</category></item><item><title>Z.ai GLM-5: New SOTA Open Weights LLM</title><link>https://news.smol.ai/issues/26-02-11-glm-5/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-11-glm-5/</guid><description>**Zhipu AI** launched **GLM-5**, an **Opus-class** model scaling from **355B to 744B parameters** with **DeepSeek Sparse Attention** integration for cost-efficient long-context serving. 
GLM-5 achieves **SOTA on BrowseComp** and leads on **Vending Bench 2**, focusing on office productivity tasks and surpassing **Kimi K2.5** on the GDPVal-AA benchmark. Despite broad availability on platforms like **OpenRouter**, **Modal**, **DeepInfra**, and **Ollama Cloud**, GLM-5 faces **compute constraints** impacting rollout and pricing. The model supports up to **200K context length** and **128K max output tokens**.</description><pubDate>Wed, 11 Feb 2026 05:44:39 GMT</pubDate><category>zhipu-ai</category><category>openrouter</category><category>modal</category><category>deepinfra</category><category>ollama</category><category>qoder</category><category>vercel</category><category>glm-5</category><category>glm-4.5</category><category>kimi-k2.5</category><category>deepseek-sparse-attention</category><category>long-context</category><category>model-scaling</category><category>pretraining</category><category>benchmarking</category><category>office-productivity</category><category>context-window</category><category>model-deployment</category><category>cost-efficiency</category></item><item><title>Qwen-Image 2.0 and Seedance 2.0</title><link>https://news.smol.ai/issues/26-02-10-qwenimage-seedance-2/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-10-qwenimage-seedance-2/</guid><description>**OpenAI** advances its Responses API for multi-hour agent workflows with features like **server-side compaction**, **hosted containers**, and **Skills API**, alongside upgrading **Deep Research** to **GPT-5.2** and adding connectors. Discussions around sandbox design highlight a shift towards **sandbox-as-a-tool** architectures, with **LangChain** enhancing its **deepagents v0.4** with pluggable sandbox backends. Coding agent UX evolves with multi-model orchestration involving **Claude Opus 4.6**, **GPT-5.3-Codex**, and **Gemini 3 Pro**. **EntireHQ** raised **$60M seed** funding for a Git-compatible database capturing code intent and agent context. 
In model releases, **Alibaba Qwen** launched **Qwen-Image-2.0** emphasizing **2K resolution** and **1K-token prompts** for unified generation and editing. ByteDance&apos;s **Seedance 2.0** marks a significant leap in text-to-video quality, while **Moonshot&apos;s Kimi** introduces an **Agent Swarm** with up to **100 sub-agents** and **4.5× faster** parallel execution.</description><pubDate>Tue, 10 Feb 2026 05:44:39 GMT</pubDate><category>openai</category><category>langchain-ai</category><category>anthropic</category><category>google-deepmind</category><category>mistral-ai</category><category>alibaba</category><category>bytedance</category><category>moonshot</category><category>gpt-5.2</category><category>gpt-5.3-codex</category><category>claude-opus-4.6</category><category>gemini-3-pro</category><category>qwen-image-2.0</category><category>seedance-2.0</category><category>hwchase17</category><category>nabbilkhan</category><category>sydneyrunkle</category><category>joecuevasjr</category><category>pierceboggan</category><category>reach_vb</category><category>gdb</category><category>ashtom</category><category>agentic-sandboxes</category><category>multi-model-orchestration</category><category>server-side-compaction</category><category>coding-agent-ux</category><category>long-running-agents</category><category>model-release</category><category>text-to-video</category><category>image-generation</category><category>parallel-execution</category><category>funding</category><category>git-compatible-database</category><category>token-efficiency</category><category>workflow-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-02-09-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-09-not-much/</guid><description>**OpenAI** launched **GPT-5.3-Codex** with a Super Bowl ad emphasizing &quot;You can just build things&quot; as a product strategy, focusing on builder tooling over chat interfaces. 
The model is rolling out across **Cursor, VS Code, and GitHub** with phased API access and is flagged as their first &quot;high cybersecurity capability&quot; model. Sam Altman reported over **1M Codex app downloads in the first week** and strong weekly user growth. Meanwhile, **Anthropic&apos;s Claude Opus 4.6** is recognized as a leading &quot;agentic generalist&quot; model, topping text and code leaderboards but noted for high token usage. Discussions around serving economics and &quot;fast mode&quot; behavior highlight practical deployment considerations. Additionally, Recursive Language Models (RLMs) introduce a novel approach using a second programmatic context space to extend long-context capabilities.</description><pubDate>Mon, 09 Feb 2026 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>cursor_ai</category><category>github</category><category>microsoft</category><category>gpt-5.3-codex</category><category>claude-opus-4.6</category><category>sama</category><category>pierceboggan</category><category>kylebrussell</category><category>natolambert</category><category>omarsar0</category><category>sam_altman</category><category>builder-tooling</category><category>cybersecurity</category><category>api-access</category><category>model-rollout</category><category>agentic-ai</category><category>long-context</category><category>serving-economics</category><category>throughput-latency</category><category>token-efficiency</category><category>workflow-design</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-02-06-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-06-not-much/</guid><description>**AI News** for early February 2026 highlights a detailed comparison between **GPT-5.3-Codex** and **Claude Opus 4.6**, with users noting **Codex&apos;s** strength in detailed scoped tasks and **Opus&apos;s** ergonomic advantage for exploratory work. 
Benchmarks on Karpathy&apos;s **nanochat GPT-2 speedrun** show **Opus 4.6** achieving better wall-clock performance, while **Codex-5.3-xhigh** sometimes suffers from context issues. **Karpathy** cautions that current models are not yet reliable for fully autonomous AI engineering. Discussions on agent swarms reveal emerging parallels to software organizational design, with **Anthropic-style** agent coordination systems and **LangChain/LangSmith** emphasizing environment engineering through tracing, sandboxing, and state control. The concept of Recursive Language Models (RLM) is introduced as a future direction for agent systems to reduce context rot and improve structured communication.</description><pubDate>Fri, 06 Feb 2026 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>langchain</category><category>gpt-5.3-codex</category><category>claude-opus-4.6</category><category>nanochat-gpt-2</category><category>karpathy</category><category>sama</category><category>swyx</category><category>omarsar0</category><category>hamelhusain</category><category>deepfates</category><category>agent-systems</category><category>ai-engineering</category><category>benchmarking</category><category>software-organization</category><category>sandboxing</category><category>tracing</category><category>state-management</category><category>recursive-language-models</category><category>context-management</category></item><item><title>OpenAI and Anthropic go to war: Claude Opus 4.6 vs GPT 5.3 Codex</title><link>https://news.smol.ai/issues/26-02-05-claude-opus-openai-codex/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-05-claude-opus-openai-codex/</guid><description>**OpenAI** launched **GPT-5.3-Codex**, emphasizing **token efficiency**, **inference speed**, and hardware/software co-design with **GB200-NVL72** and **NVIDIA** collaboration. 
The new **Frontier** agent platform supports business-context agents with execution environments and learning capabilities. **Anthropic** showcased **Opus 4.6** agent teams autonomously building a clean-room C compiler booting Linux, highlighting advances in agentic coding and long-context capabilities. Community benchmarks report **2.93× faster** inference and significant efficiency gains, signaling a shift away from infinite compute budgets in 2026.</description><pubDate>Thu, 05 Feb 2026 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>nvidia</category><category>gpt-5.3-codex</category><category>opus-4.6</category><category>agentic-coding</category><category>long-context</category><category>token-efficiency</category><category>inference-speed</category><category>hardware-software-co-design</category><category>agent-platforms</category><category>benchmarking</category><category>software-development</category><category>compiler-construction</category></item><item><title>ElevenLabs $500m Series D at $11B, Cerebras $1B Series H at $23B, Vibe Coding -&gt; Agentic Engineering</title><link>https://news.smol.ai/issues/26-02-04-elevenlabs-cerebras/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-04-elevenlabs-cerebras/</guid><description>**Google&apos;s Gemini 3** is being integrated widely, including a new **Chrome side panel** and **Nano Banana** UX features, with rapid adoption and a **78% unit-cost reduction** in serving costs. The **Gemini app** reached **750M+ MAU** in Q4 2025, nearing ChatGPT&apos;s user base. Google is also benchmarking AI &quot;soft skills&quot; through games like Poker and Chess in the **Kaggle Game Arena**. Meanwhile, coding agents are converging in IDEs: **VS Code** launched **Agent Sessions** supporting **Claude** and **Codex** agents with features like parallel subagents and integrated browsers. 
**GitHub Copilot** now allows agent choice between **Claude** and **OpenAI Codex** for async backlog clearing. OpenAI reports **1M+ active users** for Codex with expanded integration surfaces, though some users request better GPU support. The coding-agent ecosystem is professionalizing with community platforms like **OpenClaw** and tooling such as ClawHub and CLI updates. *&quot;Gemini 3 adoption faster than any other model&quot;* and *&quot;VS Code as home for coding agents&quot;* highlight major industry shifts.</description><pubDate>Wed, 04 Feb 2026 05:44:39 GMT</pubDate><category>google</category><category>openai</category><category>github</category><category>microsoft</category><category>deepmind</category><category>gemini-3</category><category>claude</category><category>codex</category><category>sama</category><category>sundarpichai</category><category>reach_vb</category><category>agent-frameworks</category><category>model-deployment</category><category>benchmarking</category><category>cost-optimization</category><category>software-development</category><category>async-processing</category><category>gpu-acceleration</category><category>coding-agents</category><category>user-adoption</category><category>game-theory</category><category>workflow-integration</category></item><item><title>Context Graphs: Hype or actually Trillion-dollar opportunity?</title><link>https://news.smol.ai/issues/26-02-03-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-03-not-much/</guid><description>**Zhipu AI** launched **GLM-OCR**, a lightweight **0.9B** multimodal OCR model excelling in complex document understanding with top benchmark scores and day-0 deployment support from **lmsys**, **vllm**, and **novita labs**. **Ollama** enabled local-first usage with easy offline operation. 
**Alibaba** released **Qwen3-Coder-Next**, an **80B MoE** model with only **3B active** parameters, designed for coding agents with a massive **256K context window** and trained on **800K verifiable tasks**, achieving over **70% SWE-Bench Verified**. The open coding ecosystem also saw **Allen AI** announce **SERA-14B**, an on-device-friendly coding model with new datasets. The emerging concept of **Context Graphs** was highlighted as a promising framework for data and agent traceability, with initiatives like **Cursor&apos;s Agent Trace** specifying context graphs for coding agents, emphasizing potential improvements in agent performance and customer-driven adoption. This coverage reflects ongoing innovation in **multimodality**, **long-context**, **mixture-of-experts**, and **agentic coding models**.</description><pubDate>Tue, 03 Feb 2026 05:44:39 GMT</pubDate><category>zhipu-ai</category><category>lmsys</category><category>vllm</category><category>novita-labs</category><category>ollama</category><category>alibaba</category><category>allenai</category><category>cognition</category><category>cursor</category><category>glm-ocr</category><category>qwen3-coder-next</category><category>sera-14b</category><category>jaya_gupta</category><category>dharmesh_shah</category><category>multimodality</category><category>ocr</category><category>long-context</category><category>mixture-of-experts</category><category>agentic-coding-models</category><category>context-graphs</category><category>benchmarking</category><category>model-deployment</category><category>model-optimization</category><category>model-training</category></item><item><title>OpenAI Codex App: death of the VSCode fork, multitasking worktrees, Skills Automations</title><link>https://news.smol.ai/issues/26-02-02-openai-codex-app/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-02-02-openai-codex-app/</guid><description>**OpenAI** launched the **Codex app** on macOS as a dedicated agent-native command 
center for coding, featuring **multiple agents in parallel**, **built-in worktrees** for conflict isolation, **skills** for reusable bundles, and **scheduled automations**. The app emphasizes developer workflows like **Plan mode** for upfront task decomposition and is gaining positive adoption signals from insiders including **@sama**. There is movement towards ecosystem standardization of skills folders, signaling early conventions in agent tooling. Codex also exemplifies a &quot;self-improving&quot; product feedback loop combining humans and agents. In coding-agent practice, recommended patterns include a &quot;test-first&quot; approach to bug fixes, the &quot;conductor&quot; model where one developer manages 5-10 agents in parallel, and a neurosymbolic framing that attributes coding agents&apos; success to software&apos;s verifiability and symbolic tooling. Skepticism remains about productivity studies that do not reflect agentic workflows.</description><pubDate>Mon, 02 Feb 2026 05:44:39 GMT</pubDate><category>openai</category><category>codex</category><category>sama</category><category>reach_vb</category><category>gdb</category><category>skirano</category><category>embirico</category><category>ajambrosino</category><category>thsottiaux</category><category>nbaschez</category><category>yuchenj_uw</category><category>badlogicgames</category><category>random_walker</category><category>agent-based-systems</category><category>parallel-processing</category><category>software-testing</category><category>developer-workflows</category><category>automation</category><category>product-feedback-loop</category><category>neurosymbolic-ai</category><category>benchmarking</category></item><item><title>MoltBook takes over the timeline</title><link>https://news.smol.ai/issues/26-01-30-moltbook/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-30-moltbook/</guid><description>**Moltbook** and **OpenClaw** showcase emergent multi-agent social networks where AI 
agents autonomously interact, creating an AI-native forum layer with complex security and identity challenges. **Karpathy** describes this as &quot;takeoff-adjacent,&quot; highlighting bots self-organizing and engaging in prompt-injection and credential theft. **Anthropic** reports on AI coding tradeoffs with a study of **52 junior engineers** and reveals **Claude** planned a Mars rover drive, marking a milestone in AI-driven space exploration. **Google** publicly releases **Genie 3**, sparking debate over its capabilities and latency issues. The rise of agent-to-agent private communications raises concerns about alignment and observability in 2026.</description><pubDate>Fri, 30 Jan 2026 05:44:39 GMT</pubDate><category>moltbook</category><category>openclaw</category><category>anthropic</category><category>google</category><category>claude</category><category>genie-3</category><category>karpathy</category><category>multi-agent-systems</category><category>agent-communication</category><category>security</category><category>prompt-injection</category><category>identity</category><category>alignment</category><category>observability</category><category>ai-planning</category><category>ai-coding</category><category>emergent-behavior</category></item><item><title>xAI Grok Imagine API - the #1 Video Model, Best Pricing and Latency - and merging with SpaceX</title><link>https://news.smol.ai/issues/26-01-29-xai-grok-imagine-api/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-29-xai-grok-imagine-api/</guid><description>**Google DeepMind** launched **Project Genie (Genie 3 + Nano Banana Pro + Gemini)**, a prototype for creating interactive, real-time generated worlds from text or image prompts, currently available to **Google AI Ultra subscribers in the U.S. (18+)** with noted limitations like **~60s generation limits** and imperfect physics. 
In parallel, the open-source **LingBot-World** offers a real-time interactive world model with **&lt;1s latency at 16 FPS** and minute-level coherence, emphasizing interactivity and causal consistency. In video generation, **xAI Grok Imagine** debuted strongly with native audio support, **15s duration**, and competitive pricing at **$4.20/min including audio**, while **Runway Gen-4.5** focuses on animation workflows with new features like **Motion Sketch** and **Character Swap**. The 3D generation space sees **fal** adding **Hunyuan 3D 3.1 Pro/Rapid** to its API offerings, extending model-as-a-service workflows into 3D pipelines.</description><pubDate>Thu, 29 Jan 2026 05:44:39 GMT</pubDate><category>google-deepmind</category><category>x-ai</category><category>runway</category><category>fal</category><category>genie-3</category><category>nano-banana-pro</category><category>gemini</category><category>lingbot-world</category><category>grok-imagine</category><category>runway-gen-4.5</category><category>hunyuan-3d-3.1-pro</category><category>demishassabis</category><category>sundarpichai</category><category>interactive-simulation</category><category>real-time-generation</category><category>promptability</category><category>character-customization</category><category>world-models</category><category>open-source</category><category>video-generation</category><category>audio-generation</category><category>animation-workflows</category><category>model-as-a-service</category><category>3d-generation</category><category>latency</category><category>coherence</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-01-28-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-28-not-much/</guid><description>**AI News for 1/27/2026-1/28/2026** highlights a quiet day with deep dives into frontier model &quot;personality split&quot; where **GPT-5.2** excels at *exploration* and **Claude Opus 4.5** at *exploitation*, 
suggesting **OpenAI** suits research workflows and **Anthropic** commercial reliability. The rise of agentic coding loops shows new failure modes, with *self-verification* workflows gaining traction. The open-model **Kimi K2.5** emerges as a flashpoint, boasting enhanced **agent execution**, **multimodality**, and **coding polish**, runnable on **Apple silicon M3 Ultra Mac Studios** with **Thunderbolt 5 (RDMA)**, and challenging **Claude Opus 4.5** on benchmarks and pricing. Licensing issues threaten enterprise adoption despite model quality. The meme &quot;clawdbot&quot; reflects rapid agent branding proliferation. Agent engineering advances with shared &quot;skills&quot; interfaces promoted by **DeepLearning.AI**, **Anthropic**, and **LangChain**.</description><pubDate>Wed, 28 Jan 2026 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>deeplearningai</category><category>langchain</category><category>apple</category><category>gpt-5.2</category><category>claude-opus-4.5</category><category>kimi-k2.5</category><category>agentic-ai</category><category>multimodality</category><category>coding</category><category>self-verification</category><category>agent-engineering</category><category>model-benchmarking</category><category>model-optimization</category><category>workflow-automation</category></item><item><title>Moonshot Kimi K2.5 - Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager</title><link>https://news.smol.ai/issues/26-01-27-kimi-k25/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-27-kimi-k25/</guid><description>**MoonshotAI&apos;s Kimi K2.5** is a **32B active-1T parameter open-weights model** featuring **native multimodality** with image and video understanding, built through continual pretraining on **15 trillion mixed visual and text tokens**. 
It introduces a new **MoonViT vision encoder** and supports advanced capabilities like **Agent Swarm**, which coordinates up to 100 sub-agents for parallel workflows, and an **Office Productivity K2.5 Agent** for large-scale office tasks. This release marks a significant leap in open models from China, claiming state-of-the-art results on benchmarks like HLE and BrowseComp, and offering aggressive API pricing and throughput.</description><pubDate>Tue, 27 Jan 2026 05:44:39 GMT</pubDate><category>moonshotai</category><category>kimi-k2.5</category><category>multimodality</category><category>model-training</category><category>mixture-of-experts</category><category>agentic-ai</category><category>vision</category><category>video-understanding</category><category>model-optimization</category><category>parallel-processing</category><category>office-productivity</category></item><item><title>Anthropic launches the MCP Apps open spec, in Claude.ai</title><link>https://news.smol.ai/issues/26-01-26-mcp-apps/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-26-mcp-apps/</guid><description>**Anthropic** has officially absorbed the independent MCP UI project and, collaborating with **OpenAI**, **Block**, **VS Code**, **Antigravity**, **JetBrains**, and **AWS**, released the **MCP Apps spec** and official support in **Claude.ai**. This standard aims to enable a rich ecosystem of interoperable applications with rich UI, addressing the proliferation of subscription services. Meanwhile, **NVIDIA** introduced **ToolOrchestra** with an **8B orchestrator** model trained via scalable reinforcement learning for efficient agent orchestration. The concept of Recursive Language Models (RLMs) is gaining traction for efficient context management in agent stacks. The “Clawdbot” UX pattern emphasizes outcome-first assistant design with tight context and tool integration, sparking security concerns around prompt injection. 
**Alibaba** launched **Qwen3-Max-Thinking**, a flagship reasoning and agent model with adaptive tool use and strong benchmark scores, now available in public evaluation platforms like LM Arena and Yupp.</description><pubDate>Mon, 26 Jan 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>block</category><category>vs-code</category><category>antigravity</category><category>jetbrains</category><category>aws</category><category>nvidia</category><category>alibaba</category><category>claude-ai</category><category>claude-ai</category><category>toolorchestra-8b</category><category>qwen3-max-thinking</category><category>agent-orchestration</category><category>reinforcement-learning</category><category>recursive-language-models</category><category>context-management</category><category>user-experience</category><category>security</category><category>prompt-injection</category><category>reasoning</category><category>adaptive-tool-use</category><category>model-evaluation</category><category>benchmarking</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-01-22-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-22-not-much/</guid><description>**Anthropic** launches &quot;Claude in Excel Pro&quot; with enhanced features. **OpenAI** reveals upcoming **Codex** agent loop and cybersecurity measures. **Google** boosts **Gemini App** quotas and partners with **Sakana AI** for advanced AI Scientist projects in Japan. **Cursor** introduces Agent Skills for dynamic context focus. **GPT-5.2 Pro** achieves **31%** on FrontierMath Tier 4, showing significant benchmark progress. **Baseten** raises **$300M** at a **$5B valuation** targeting high-performance inference. Discussions highlight math benchmarks as indicators of AI capability, uneven AGI progress, and the importance of reasoning and continual learning as future frontiers. 
Notable figures include *Sam Altman*, *François Chollet*, *Shane Legg*, and *Demis Hassabis*.</description><pubDate>Thu, 22 Jan 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>google</category><category>sakana-ai</category><category>cursor</category><category>baseten</category><category>epoch-ai-research</category><category>deepmind</category><category>claude-3</category><category>codex</category><category>gemini</category><category>gpt-5.2-pro</category><category>sama</category><category>fchollet</category><category>shane_legg</category><category>demishassabis</category><category>benchmarking</category><category>reasoning</category><category>continual-learning</category><category>reinforcement-learning</category><category>model-performance</category><category>agentic-ai</category><category>security</category><category>model-training</category></item><item><title>OpenEvidence, the ‘ChatGPT for doctors,’ raises $250m at $12B valuation, 12x from $1b last Feb</title><link>https://news.smol.ai/issues/26-01-21-openevidence/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-21-openevidence/</guid><description>**OpenEvidence** raised **$250 million** at a **$12 billion valuation**, a 12x increase from $1 billion last February, and is used by 40% of U.S. physicians with over $100 million in annual revenue. **Anthropic** released a new **Claude** model constitution under **CC0 1.0**, framing it as a living document for alignment and training. **Podium** reported over **$100 million ARR** from **10,000+ AI agents**, shifting from software sales to AI operators. Innovations in agent memory and reliability include the **Agent Cognitive Compressor (ACC)** and multi-agent scientific workflows via **MCP-SIM**. 
Agentic benchmarking shows challenges in long-horizon tasks with models like **Gemini 3 Flash High**, **GPT-5.2 High**, and **Claude Opus 4.5 High** scoring modestly on professional services and legal research benchmarks.</description><pubDate>Wed, 21 Jan 2026 05:44:39 GMT</pubDate><category>openevidence</category><category>anthropic</category><category>podium</category><category>openai</category><category>google</category><category>gemini</category><category>claude</category><category>claude-3</category><category>claude-opus</category><category>gpt-5.2</category><category>gemini-3-flash-high</category><category>daniel_nadler</category><category>amanda_askell</category><category>eric_rea</category><category>tom_loverro</category><category>garry_tan</category><category>omarsar0</category><category>brendanfoody</category><category>deredleritt3r</category><category>agentic-ai</category><category>model-alignment</category><category>performance-evaluation</category><category>memory-optimization</category><category>long-context</category><category>benchmarking</category><category>multi-agent-systems</category><category>reinforcement-learning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-01-20-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-20-not-much/</guid><description>**X Engineering** open-sourced its new transformer-based recommender algorithm, sparking community debate on transparency and fairness. **GLM-4.7-Flash (30B-A3B)** gains momentum as a strong local inference model with efficient KV-cache management and quantization tuning strategies. Innovations include tensor parallelism on Mac Minis achieving ~100 tok/s throughput. 
Research highlights &quot;Societies of Thought&quot; as a reasoning mechanism improving model accuracy by 20%+.</description><pubDate>Tue, 20 Jan 2026 05:44:39 GMT</pubDate><category>x-ai</category><category>unsloth-ai</category><category>google</category><category>deepseek</category><category>ollama</category><category>glm-4.7-flash</category><category>grok</category><category>deepseek-r1</category><category>qwq</category><category>giffmana</category><category>david_sholz</category><category>yuchenj_uw</category><category>nearcyan</category><category>sam_paech</category><category>teortaxes_tex</category><category>danielhanchen</category><category>alexocheema</category><category>nopmobiel</category><category>rohanpaul_ai</category><category>transformer-architecture</category><category>recommendation-systems</category><category>local-inference</category><category>kv-cache</category><category>quantization</category><category>tensor-parallelism</category><category>reasoning</category><category>model-optimization</category><category>fine-tuning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-01-19-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-19-not-much/</guid><description>**AI News for 1/16/2026-1/19/2026** covers new architectures for scaling Transformer memory and context, including **STEM** from **Carnegie Mellon** and **Meta AI**, which replaces part of the FFN with a token-indexed embedding lookup enabling CPU offload and asynchronous prefetch. **RePo** from **Sakana AI** introduces adaptive positional reordering to improve robustness on noisy and long-range contexts. Model releases highlight **Zhipu AI&apos;s GLM-4.7-Flash**, a **30B-class MLA + small MoE** model optimized for coding and agentic tasks, noted for strong benchmark performance and a compression narrative from larger to smaller models. 
Inference and deployment updates include **mlx-lm 0.30.3** supporting GLM-4.7-Flash with efficient 4-bit performance on laptops. The report emphasizes practical takeaways on static sparsity, adaptive ordering, and the resurgence of small, fast models for interactive tasks. *&quot;Sparse capacity doesn’t have to mean MoE routers + expert parallelism; static sparsity can be systems-friendly.&quot;*</description><pubDate>Mon, 19 Jan 2026 05:44:39 GMT</pubDate><category>meta-ai-fair</category><category>carnegie-mellon</category><category>sakana-ai</category><category>zhipu-ai</category><category>glm-4.7-flash</category><category>glm-4.7</category><category>glm-4.5</category><category>qwen3-vl</category><category>qwen</category><category>transformer-memory</category><category>model-architecture</category><category>mixture-of-experts</category><category>adaptive-position-encoding</category><category>long-context</category><category>model-compression</category><category>inference-optimization</category><category>local-inference</category><category>model-deployment</category><category>benchmarking</category><category>coding</category><category>agentic-ai</category></item><item><title>ChatGPT starts testing ads on free tier + new $8/mo Go plan in the US</title><link>https://news.smol.ai/issues/26-01-16-chatgpt-ads/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-16-chatgpt-ads/</guid><description>**OpenAI** announced the **ChatGPT Go** tier at **$8/month** with ads testing in the US free tier, emphasizing that ads will not influence responses and will be clearly labeled. The update includes memory improvements and a &quot;very fast Codex&quot; feature teased by **Sam Altman**. The Codex CLI ecosystem now supports open-weight models with improved context length. 
Discussions highlight the importance of human-in-the-loop for reliability in agent orchestration and file interface improvements over traditional retrieval-augmented generation.</description><pubDate>Fri, 16 Jan 2026 05:44:39 GMT</pubDate><category>openai</category><category>ollama</category><category>chatgpt-go</category><category>codex</category><category>sama</category><category>sam_altman</category><category>fidjissimo</category><category>scaling01</category><category>tomwarren</category><category>embirico</category><category>adamdotdev</category><category>ollama</category><category>thsottiaux</category><category>lateinteraction</category><category>dbreunig</category><category>ads</category><category>monetization</category><category>memory</category><category>agent-orchestration</category><category>human-in-the-loop</category><category>cli-tools</category><category>context-length</category><category>workflow-optimization</category></item><item><title>Open Responses: explicit spec for OpenAI&apos;s Responses API supported by OpenRouter, Ollama, Huggingface, vLLM, et al</title><link>https://news.smol.ai/issues/26-01-15-openresponses/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-15-openresponses/</guid><description>**OpenAI** launched the **Open Responses** API spec, an open-source, multi-provider standard for interoperable LLM APIs designed to simplify agent stacks and tooling. Early adopters like **ollama** and **vLLM** support the spec, while notable absences include **anthropic** and **google-deepmind**. Agent design insights from **Cursor** emphasize explicit roles and planning over mega-agent models, with **GPT-5.2** outperforming **Opus 4.5** in long runs. The emerging dominant context/memory abstraction for agents is a **filesystem-as-memory** approach, championed by **llamaindex** and **langchain**, using virtual filesystems often backed by databases like Postgres. 
LangChain also shipped an open-source desktop interface for agent orchestration called **openwork**. This news highlights advances in API standardization, agent architecture, and memory abstractions in AI development.</description><pubDate>Thu, 15 Jan 2026 05:44:39 GMT</pubDate><category>openai</category><category>ollama</category><category>vllm</category><category>openrouter</category><category>anthropic</category><category>google-deepmind</category><category>langchain</category><category>llamaindex</category><category>gpt-5.2</category><category>opus-4.5</category><category>reach_vb</category><category>simonw</category><category>yuchenj_uw</category><category>omarsar0</category><category>jerryjliu0</category><category>hwchase17</category><category>swyx</category><category>interoperable-apis</category><category>agent-architecture</category><category>filesystem-memory</category><category>api-standardization</category><category>multi-agent-systems</category><category>prompt-engineering</category><category>model-comparison</category><category>virtual-filesystems</category><category>open-source</category><category>agent-ux</category></item><item><title>not much happened today.</title><link>https://news.smol.ai/issues/26-01-14-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-14-not-much/</guid><description>**OpenAI** launched **GPT-5.2-Codex** API, touted as their strongest coding model for long-running tasks and cybersecurity. **Cursor** integrated GPT-5.2-Codex to autonomously run a browser for a week, producing over 3 million lines of Rust code. **GitHub** incorporated it into their code tools, easing enterprise adoption. Discussions highlight the importance of review loops in agent systems and debate evaluation metrics for coding models. **OpenAI** partnered with **Cerebras** to improve inference speed and latency, with Cerebras serving **GLM-4.7** at 1,445 tokens/sec and low latency. 
Provider benchmarking reveals tradeoffs in throughput, latency, and context window sizes. **Modal** shared operational scaling insights for self-hosted inference fleets of 20k GPUs, focusing on batch inference optimization with **vLLM** and FlashInfer backend. This reflects a focus on inference infrastructure, long-horizon autonomous agents, and coding model evaluation.</description><pubDate>Wed, 14 Jan 2026 05:44:39 GMT</pubDate><category>openai</category><category>cursor</category><category>github</category><category>cerebras</category><category>modal</category><category>artificial-analysis</category><category>vllm</category><category>gpt-5.2-codex</category><category>glm-4.7</category><category>swyx</category><category>kevinweil</category><category>pierceboggan</category><category>mntruell</category><category>scaling01</category><category>long-running-tasks</category><category>autonomous-agents</category><category>code-generation</category><category>inference-speed</category><category>latency</category><category>batch-inference</category><category>gpu-scaling</category><category>model-evaluation</category><category>agent-systems</category><category>operational-scaling</category></item><item><title>Anthropic Labs: Cowork, Claude Code, MCP, Skills incubator led by Mike Krieger and Ben Mann</title><link>https://news.smol.ai/issues/26-01-13-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-13-not-much/</guid><description>**Anthropic** consolidates its AI agent products under the **Cowork** brand, integrating prior tools like **Claude Code** and **Claude for Chrome** into a unified agent with sandboxed Linux VM environments using **Apple&apos;s virtualization** and **bubblewrap** for security. Meanwhile, **Anthropic Labs** reorganizes with Mike Krieger stepping down as CPO, focusing on productizing **Claude** with a &gt;$1B ARR agent lab. 
The AI community debates the meaning of &quot;vibe coding,&quot; emphasizing disciplined engineer verification over casual coding. **LangChain** launches **Agent Builder GA**, offering no-code but powerful agent orchestration features like memory, triggers, and human-in-the-loop approvals. Some experts advocate simplifying agent tooling to core filesystem and bash access for efficiency. Open-source recreations of Cowork-like environments using **QEMU** and sandboxing tools highlight rapid commoditization of AI agent tech.</description><pubDate>Tue, 13 Jan 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>langchain</category><category>apple</category><category>claude</category><category>claude-code</category><category>mike_krieger</category><category>ben_mann</category><category>gergely_orosz</category><category>yuchen_jin</category><category>harrison_chase</category><category>jared_z</category><category>sandboxing</category><category>agent-ux</category><category>agent-orchestration</category><category>human-in-the-loop</category><category>memory-management</category><category>tooling-simplification</category><category>linux-virtualization</category><category>security</category><category>agent-productization</category></item><item><title>Apple picks Google&apos;s Gemini to power Siri&apos;s next generation</title><link>https://news.smol.ai/issues/26-01-12-gemini-apple/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-12-gemini-apple/</guid><description>**Apple** has decided to power Siri with **Google&apos;s Gemini models** and cloud technology, marking a significant partnership and a setback for **OpenAI**, which was initially partnered with Apple. **Anthropic** launched &quot;Cowork,&quot; a product preview for Claude&apos;s coding capabilities, sparking discussions about &quot;LLM OS&quot;. **OpenAI** introduced **ChatGPT Health** and acquired **Torch** to expand in healthcare AI. 
**DeepSeek** unveiled **Engram**, a new conditional memory module that enables O(1) lookup-style memory for static patterns, improving long-context handling and offering hardware-friendly optimizations to scale knowledge capacity efficiently. Engram is positioned as a key modeling primitive for next-gen sparse models, with ongoing community debate about its architectural merits and practical impact.</description><pubDate>Mon, 12 Jan 2026 05:44:39 GMT</pubDate><category>apple</category><category>google</category><category>openai</category><category>anthropic</category><category>deepseek</category><category>gemini</category><category>claude</category><category>chatgpt</category><category>engram</category><category>conditional-memory</category><category>long-context</category><category>hashing</category><category>memory-optimization</category><category>transformers</category><category>model-scaling</category><category>sparsity</category><category>hardware-optimization</category><category>model-architecture</category><category>ai-healthcare</category><category>model-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-01-09-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-09-not-much/</guid><description>**Anthropic** tightens usage policies for **Claude Max** in third-party apps, prompting builders to adopt **model-agnostic orchestration** and **BYO-key** defaults to mitigate platform risks. The **Model Context Protocol (MCP)** is evolving into a key tooling plane with **OpenAI MCP Server** and **mcp-cli** enhancing tool discovery and token efficiency. The concept of **skills** as modular, versioned behaviors gains traction, with implementations in **Claude Code**, **GitHub Copilot**, and **Cline** adding websearch tooling. 
AI21 Labs addresses concurrency challenges in agent workspaces using **git worktrees** for transactional parallel writes, while long-horizon agents focus on **context engineering** and persistent file-centric workspaces.</description><pubDate>Fri, 09 Jan 2026 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>ai21-labs</category><category>github</category><category>cline</category><category>claude-max</category><category>yuchenj_uw</category><category>andersonbcdefg</category><category>gneubig</category><category>matan_sf</category><category>scaling01</category><category>reach_vb</category><category>_philschmid</category><category>claude_code</category><category>code</category><category>jamesmontemagno</category><category>cline</category><category>danstripper</category><category>omarsar0</category><category>model-agnostic</category><category>model-context-protocol</category><category>tooling</category><category>skills</category><category>concurrency</category><category>transactional-workspaces</category><category>context-engineering</category><category>file-centric-workspaces</category><category>rate-limiting</category><category>agent-workspaces</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-01-08-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-08-not-much/</guid><description>**Stanford paper** reveals **Claude 3.7 Sonnet** memorized **95.8% of Harry Potter 1**, highlighting copyright extraction risks compared to **GPT-4.1**. **Google AI Studio** sponsors **TailwindCSS** amid OSS funding debates. **Google** and **Sundar Pichai** launch **Gmail Gemini 3** features including AI Overviews and natural-language search with user controls. 
**Alibaba Qwen** releases **Qwen3-VL-Embedding** and **Qwen3-VL-Reranker**, a multimodal, multilingual retrieval stack supporting text, images, and video with quantization and instruction customization, achieving strong benchmark results. **Z.ai** goes public on HKEX with **GLM-4.7** leading the Artificial Analysis Intelligence Index v4.0, showing gains in reasoning, coding, and agentic use, with large-scale MoE architecture and MIT license. **Falcon-H1R-7B** from TII targets efficient reasoning in smaller models, scoring 16 on the Intelligence Index. **AI21 Labs** introduces **Jamba2**, a memory-efficient enterprise model with hybrid SSM-Transformer architecture and Apache 2.0 license, available via SaaS and Hugging Face. **vLLM** shows throughput improvements in inference and kernel engineering. *&quot;Embeddings should be multimodal by default,&quot;* notes Justin Lin.</description><pubDate>Thu, 08 Jan 2026 05:44:39 GMT</pubDate><category>stanford</category><category>google</category><category>google-deepmind</category><category>alibaba</category><category>z-ai</category><category>tii</category><category>ai21-labs</category><category>huggingface</category><category>claude-3-7-sonnet</category><category>gpt-4-1</category><category>gemini-3</category><category>qwen3-vl-embedding</category><category>qwen3-vl-reranker</category><category>glm-4-7</category><category>falcon-h1r-7b</category><category>jamba2</category><category>sundarpichai</category><category>justinlin610</category><category>copyright-extraction</category><category>multimodality</category><category>multilinguality</category><category>retrieval-augmented-generation</category><category>model-architecture</category><category>mixture-of-experts</category><category>model-quantization</category><category>reasoning</category><category>inference</category><category>kernel-engineering</category><category>memory-optimization</category><category>enterprise-ai</category></item><item><title>not much happened 
today</title><link>https://news.smol.ai/issues/26-01-07-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-07-not-much/</guid><description>**AI News for 1/6/2026-1/7/2026** highlights a quiet day with key updates on **LangChain DeepAgents** introducing **Ralph Mode** for persistent agent loops, **Cursor** improving context management by reducing token usage by **46.9%**, and operational safety measures for coding agents with allow/deny lists. **MCP** integration is expanding across assistants and robotics, with Hugging Face embedding assistants via **HuggingChat + HF MCP server**. The **DeepSeek-R1** paper has been expanded to **86 pages**, emphasizing trajectory exploration and RL shaping behavior. **NousCoder-14B** shows a **+7% improvement on LiveCodeBench** after **4 days** of RL training, demonstrating advances in RL for coding with small open models. Top tweets also mention a viral &quot;96GB RAM laptop&quot;, **ChatGPT Health** launch by **OpenAI**, and **Karpathy**&apos;s nanochat scaling-law miniseries.</description><pubDate>Wed, 07 Jan 2026 05:44:39 GMT</pubDate><category>langchain</category><category>cursor</category><category>huggingface</category><category>openai</category><category>weights-biases</category><category>nouscoder-14b</category><category>deepseek-r1</category><category>karpathy</category><category>_philschmid</category><category>omarsar0</category><category>agent-frameworks</category><category>context-management</category><category>reinforcement-learning</category><category>operational-safety</category><category>model-transparency</category><category>trajectory-exploration</category><category>token-optimization</category><category>coding-agents</category><category>integration-platforms</category></item><item><title>xAI raises $20B Series E at ~$230B valuation</title><link>https://news.smol.ai/issues/26-01-06-xai-series-e/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/26-01-06-xai-series-e/</guid><description>**xAI**, Elon Musk&apos;s AI company, completed a massive **$20 billion Series E funding round**, valuing it at about **$230 billion** with investors like **Nvidia**, **Cisco Investments**, and others. The funds will support AI infrastructure expansion including **Colossus I and II supercomputers** and training **Grok 5**, leveraging data from **X&apos;s 600 million monthly active users**. At **CES 2026**, the focus was on &quot;AI everywhere&quot; with a strong emphasis on **AI-first hardware** and integration between **NVIDIA** and **Hugging Face&apos;s LeRobot** for robotics development. The **Reachy Mini** robot is gaining traction as a consumer robotics platform. In software, **Claude Code** is emerging as a popular local/private coding assistant, with new UI features in **Claude Desktop** and innovations like **Cursor&apos;s dynamic context** reducing token usage by nearly **47%** in multi-MCP setups. *&quot;The 600 million MAU figure in xAI’s announcement combines X platform users with Grok users. 
That’s a clever framing choice.&quot;*</description><pubDate>Tue, 06 Jan 2026 05:44:39 GMT</pubDate><category>xai</category><category>nvidia</category><category>cisco</category><category>fidelity</category><category>valor-equity-partners</category><category>qatar-investment-authority</category><category>mgx</category><category>stepstone-group</category><category>baron-capital-group</category><category>hugging-face</category><category>amd</category><category>grok-5</category><category>claude-code</category><category>aakash_gupta</category><category>fei-fei_li</category><category>lisa_su</category><category>clementdelangue</category><category>thom_wolf</category><category>saradu</category><category>omarsar0</category><category>yuchenj_uw</category><category>_catwu</category><category>cursor_ai</category><category>ai-infrastructure</category><category>supercomputing</category><category>robotics</category><category>ai-hardware</category><category>agentic-ai</category><category>context-management</category><category>token-optimization</category><category>local-ai-assistants</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-01-05-nvidia-vera-rubin/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-05-nvidia-vera-rubin/</guid><description>**AI News** from early January 2026 highlights a viral economic prediction about **Vietnam** surpassing Thailand, **Microsoft**&apos;s reported open-sourcing of **bitnet.cpp** for 1-bit CPU inference promising speed and energy gains, and a new research partnership between **Google DeepMind** and **Boston Dynamics** focusing on **Gemini Robotics** and **Atlas hardware**. The concept of **agentic coding** is gaining traction, emphasizing human oversight and infrastructure layers called **Agent Harnesses** to manage long-running AI tasks, with advocates like **Philipp Schmid** promoting this shift. 
Innovations in persistent memory for coding agents, such as **Claude-Mem**, aim to improve context durability. There is also critical discussion on the specification problem in agent workflows, advocating for better abstractions beyond conversational intent. Practical challenges include managing parallel agents and permission risks. Additionally, open tooling advances include a **JAX-based LLM-Pruning Collection** for efficient model pruning methods.</description><pubDate>Mon, 05 Jan 2026 05:44:39 GMT</pubDate><category>microsoft</category><category>google-deepmind</category><category>boston-dynamics</category><category>claude-mem</category><category>bitnet-cpp</category><category>gemini</category><category>_philschmid</category><category>demishassabis</category><category>agentic-coding</category><category>agent-harnesses</category><category>persistent-memory</category><category>software-engineering</category><category>inference-efficiency</category><category>model-pruning</category><category>context-durability</category><category>specification-problem</category><category>workflow-management</category><category>cpu-inference</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/26-01-02-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/26-01-02-not-much/</guid><description>**DeepSeek** released a new paper on **mHC: Manifold-Constrained Hyper-Connections**, advancing residual-path design as a key scaling lever in neural networks. Their approach constrains residual mixing matrices to the **Birkhoff polytope** to improve stability and performance, with only about **6.7% training overhead**. The innovation includes systems-level optimizations like fused kernels and activation recomputation, highlighting a frontier-lab integration of math and kernel engineering. 
Additionally, discussions around **long-horizon agents** emphasize context management bottlenecks, introducing **Recursive Language Models (RLMs)** that manage context dynamically rather than relying on larger context windows. This work signals a shift in architectural design and efficiency for base model training and agent development.</description><pubDate>Fri, 02 Jan 2026 05:44:39 GMT</pubDate><category>deepseek</category><category>bytedance</category><category>teortaxestex</category><category>askperplexity</category><category>rasbt</category><category>norxornor</category><category>dorialexander</category><category>iamgrigorev</category><category>primeintellect</category><category>a1zhang</category><category>residual-path-design</category><category>manifold-constrained-hyper-connections</category><category>birkhoff-polytope</category><category>training-overhead</category><category>kernel-optimization</category><category>activation-recomputation</category><category>pipeline-parallelism</category><category>long-horizon-agents</category><category>context-management</category><category>recursive-language-models</category><category>neural-network-stability</category><category>scaling-levers</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-12-31-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-31-not-much/</guid><description>**South Korea&apos;s Ministry of Science** launched a coordinated program with **5 companies** to develop sovereign foundation models from scratch, featuring large-scale MoE architectures like **SK Telecom A.X-K1 (519B total / 33B active)** and **LG K-EXAONE (236B MoE / 23B active)**, with a total first-round budget of **~$140M**. This initiative contrasts with EU approaches by focusing funding on fewer stakeholders and explicitly budgeting for data. 
Meanwhile, **Alibaba&apos;s Qwen-Image-2512** emerges as a leading open-source image generation model, rapidly integrated into various toolchains including AI-Toolkit and local inference paths with quantization support, and hosted on platforms like Replicate. The model has undergone extensive blind testing with over **10,000 rounds** on AI Arena, highlighting its ecosystem adoption.</description><pubDate>Wed, 31 Dec 2025 05:44:39 GMT</pubDate><category>sk-telecom</category><category>lg</category><category>upstage</category><category>naver</category><category>alibaba</category><category>unsloth</category><category>replicate</category><category>qwen-image-2512</category><category>ax-k1</category><category>k-exaone</category><category>eliebakouch</category><category>clementdelangue</category><category>dorialexander</category><category>rising_sayak</category><category>_akhaliq</category><category>ostrisai</category><category>ivanfioravanti</category><category>yupp_ai</category><category>mixture-of-experts</category><category>model-release</category><category>quantization</category><category>open-source-models</category><category>image-generation</category><category>model-integration</category><category>model-benchmarking</category><category>compute-costs</category><category>dataset-curation</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-12-30-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-30-not-much/</guid><description>**Z.ai (GLM family)** is set to IPO in Hong Kong on **Jan 8, 2026**, aiming to raise **$560M** (**HK$4.35B**) in what is billed as the first public listing of an &quot;AI-native LLM company&quot;. The IPO highlights **GLM-4.7** as a starting point. **Meta AI** acquired **Manus** for approximately **$4–5B**, with Manus achieving **$100M ARR in 8–9 months**, illustrating the value of application-layer differentiation over proprietary models. 
Manus focuses on agentic architecture, context engineering, and general primitives like code execution and browser control, emphasizing &quot;agent habitats&quot; as a competitive moat. Discussions around **Claude Code** highlight skepticism about &quot;vibe coding,&quot; advocating for disciplined, framework-like AI-assisted programming practices.</description><pubDate>Tue, 30 Dec 2025 05:44:39 GMT</pubDate><category>z.ai</category><category>meta-ai-fair</category><category>manus</category><category>replit</category><category>glm-4.7</category><category>claude-code</category><category>zixuanli_</category><category>jietang</category><category>yuchenj_uw</category><category>sainingxie</category><category>amasad</category><category>hidecloud</category><category>imjaredz</category><category>random_walker</category><category>agentic-architecture</category><category>context-engineering</category><category>application-layer</category><category>code-generation</category><category>agent-habitats</category><category>ai-native-llm</category><category>ipo</category><category>inference-infrastructure</category><category>programming-paradigms</category></item><item><title>Meta Superintelligence Labs acquires Manus AI for over $2B, at $100M ARR, 9 months after launch</title><link>https://news.smol.ai/issues/25-12-29-meta-manus/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-29-meta-manus/</guid><description>**Manus** achieved a rapid growth trajectory in 2025, raising **$500M** from Benchmark and reaching **$100M ARR** before being acquired by **Meta** for an estimated **$4B**. The **vLLM** team launched a dedicated community site with new resources, while performance issues with **AMD MI300X FP8** were noted in **vLLM** and **sglang** benchmarks. **Weaviate** released operational features including **Object TTL**, **Java v6 client GA**, and **multimodal document embeddings**. 
API fragmentation concerns were raised by **Teknium** advocating for unified SDK wrappers. In open-weight models, **GLM-4.7** gained recognition as a reliable coding model with faster throughput on **Baseten**, and **MiniMax-M2.1** rose as a leading open agentic coder model, topping WebDev leaderboards.</description><pubDate>Mon, 29 Dec 2025 05:44:39 GMT</pubDate><category>manus</category><category>benchmark</category><category>meta-ai-fair</category><category>vllm</category><category>amd</category><category>sglang</category><category>weaviate</category><category>teknim</category><category>baseten</category><category>alphaxiv</category><category>minimax</category><category>glm-4.7</category><category>minimax-m2.1</category><category>vllm</category><category>alex_wang</category><category>nat_friedman</category><category>performance-optimization</category><category>inference-frameworks</category><category>model-benchmarking</category><category>model-deployment</category><category>open-source-models</category><category>multimodality</category><category>api</category><category>code-generation</category><category>community-building</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-12-26-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-26-not-much/</guid><description>**MiniMax M2.1** launches as an **open-source** agent and coding Mixture-of-Experts (MoE) model with **~10B active / ~230B total parameters**, claiming to outperform **Gemini 3 Pro** and **Claude Sonnet 4.5**, and supports local inference including on **Apple Silicon M3 Ultra** with quantization. **GLM 4.7** demonstrates local scaling on **Mac Studios** with **2× 512GB M3 Ultra** hardware, highlighting system-level challenges like bandwidth and parallelism. The concept of **inference quality** is emphasized as a key factor affecting output variance across deployments. 
Yann LeCun&apos;s **VL-JEPA** proposes a **non-generative, non-autoregressive** multimodal model operating in latent space for efficient real-time video processing with fewer parameters and decoding operations. Advances in agentic reinforcement learning for coding include self-play methods where agents inject and fix bugs autonomously, enabling self-improvement without human labeling, and large-scale RL infrastructure involving massive parallel code generation and execution sandboxes.</description><pubDate>Fri, 26 Dec 2025 05:44:39 GMT</pubDate><category>minimax-ai</category><category>vllm-project</category><category>exolabs</category><category>mlx</category><category>apple</category><category>openai</category><category>minimax-m2.1</category><category>glm-4.7</category><category>gemini-3-pro</category><category>claude-3-sonnet</category><category>vl-jepa</category><category>ylecun</category><category>awnihannun</category><category>alexocheema</category><category>edwardsun0909</category><category>johannes_hage</category><category>open-source</category><category>mixture-of-experts</category><category>local-inference</category><category>quantization</category><category>inference-quality</category><category>multimodality</category><category>non-autoregressive-models</category><category>video-processing</category><category>reinforcement-learning</category><category>self-play</category><category>agentic-rl</category><category>parallel-computing</category><category>model-deployment</category></item><item><title>Nvidia buys (most of) Groq for $20B cash; largest execuhire ever</title><link>https://news.smol.ai/issues/25-12-24-nvidia-groq/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-24-nvidia-groq/</guid><description>**Groq** leadership team is joining **Nvidia** under a &quot;non-exclusive licensing agreement&quot; in a deal valued at **$20 billion cash**, marking a major acquisition in AI chip space though Nvidia states it is not acquiring Groq as a 
company. Jensen Huang plans to integrate Groq&apos;s low-latency processors into the NVIDIA AI factory architecture to enhance AI inference and real-time workloads. Twitter highlights include **Gemini** used as a consumer utility for calorie tracking, OpenAI discussing the &quot;deployment gap&quot; focusing on model usage in healthcare and business, and Tesla&apos;s FSD v14 described as a &quot;Physical Turing Test&quot; for consumer AI. Benchmarking challenges are noted by **Epoch AI** emphasizing provider variance and integration issues affecting model quality measurement. Discussions on coding agents and developer experience convergence continue in the AI community.</description><pubDate>Wed, 24 Dec 2025 05:44:39 GMT</pubDate><category>nvidia</category><category>groq</category><category>openai</category><category>tesla</category><category>epoch-ai</category><category>gemini</category><category>gemini</category><category>fsd-v14</category><category>jensen_huang</category><category>xeophon</category><category>js_denain</category><category>jim_fan</category><category>benchmarking</category><category>inference</category><category>model-evaluation</category><category>ai-integration</category><category>agent-patterns</category><category>real-time-processing</category><category>low-latency</category><category>developer-experience</category><category>healthcare</category><category>business-workflows</category><category>consumer-ai</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-12-23-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-23-not-much/</guid><description>**GLM-4.7** and **MiniMax M2.1** open-weight model releases highlight day-0 ecosystem support, coding throughput, and agent workflows, with GLM-4.7 achieving a +9.5% improvement over GLM-4.6 and MiniMax M2.1 positioned as an OSS Claude-like MoE model with 230B total parameters and 200K context. 
**Gemma Scope 2** from **google-deepmind** introduces sparse autoencoders and transcoders for interpretability across Gemma 3 models, aiming to provide shared infrastructure for safety and debugging. The **Medmarks v0.1** open medical evaluation suite and leaderboard launch addresses the need for open medical benchmarking across 15+ environments, engaging clinicians and researchers.</description><pubDate>Tue, 23 Dec 2025 05:44:39 GMT</pubDate><category>google-deepmind</category><category>valsai</category><category>minimax-ai</category><category>ollama</category><category>trae</category><category>alibaba</category><category>sophont</category><category>prime-intellect</category><category>glm-4.7</category><category>glm-4.6</category><category>minimax-m2.1</category><category>gemma-3</category><category>gemma-scope-2</category><category>ivanfioravanti</category><category>awnihannun</category><category>deedydas</category><category>cline</category><category>omarsar0</category><category>adonis_singh</category><category>eliebakouch</category><category>teortaxestex</category><category>ibragim_bad</category><category>callum_mcdougall</category><category>neelnanda5</category><category>interpretability</category><category>sparse-autoencoders</category><category>agent-workflows</category><category>model-benchmarking</category><category>medical-evaluation</category><category>multi-agent-systems</category><category>model-performance</category><category>model-optimization</category><category>reinforcement-learning</category><category>tool-use</category><category>function-calling</category><category>context-windows</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-12-22-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-22-not-much/</guid><description>**Zhipu AI&apos;s GLM-4.7** release marks a significant improvement in **coding, complex reasoning, and tool use**, quickly gaining ecosystem adoption via 
Hugging Face and OpenRouter. **Xiaomi&apos;s MiMo-V2-Flash** is highlighted as a practical, cost-efficient mixture-of-experts model optimized for deployment. The open-weight text-to-image competition sees **Z-Image Turbo** leading with 6B parameters under Apache-2.0 license. Video model advances focus on control and long-form consistency, exemplified by **Kling 2.6 Motion Control** and research like MemFlow&apos;s adaptive memory retrieval. In agent frameworks, **Google&apos;s A2UI protocol** introduces agent-driven UI generation, while studies reveal that mixing multiple agent frameworks is common, with challenges in logic, termination, and tool interaction. LangChain emphasizes persistent memory patterns for production agents.</description><pubDate>Mon, 22 Dec 2025 05:44:39 GMT</pubDate><category>zhipu-ai</category><category>xiaomi</category><category>google</category><category>langchain</category><category>huggingface</category><category>openrouter</category><category>artificial-analysis</category><category>vllm-project</category><category>glm-4.7</category><category>mimo-v2-flash</category><category>z-image-turbo</category><category>kling-2.6-motion-control</category><category>mervenoyann</category><category>eliebakouch</category><category>omarsar0</category><category>osanseviero</category><category>dair_ai</category><category>coding</category><category>complex-reasoning</category><category>tool-use</category><category>mixture-of-experts</category><category>cost-efficiency</category><category>open-weight-models</category><category>text-to-image</category><category>video-models</category><category>memory-persistence</category><category>agent-frameworks</category><category>interactive-user-interfaces</category><category>model-deployment</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-12-19-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-19-not-much/</guid><description>**Alibaba** 
released **Qwen-Image-Layered**, an open-source model enabling Photoshop-grade layered image decomposition with recursive infinite layers and prompt-controlled structure. **Kling 2.6** introduced advanced motion control for image-to-video workflows, supported by a creator contest and prompt recipes. **Runway** unveiled the **GWM-1** family with frame-by-frame video generation and Gen-4.5 updates adding audio and multi-shot editing. In LLM platforms, **Gemini 3 Flash** leads benchmarks over **GPT-5.2**, attributed to agentic reinforcement learning improvements post-distillation. Users note **GPT-5.2** excels at long-context tasks (~256k tokens) but face UX limitations pushing some to use **Codex CLI**. Discussions around **Anthropic Opus 4.5** suggest perceived model degradation linked to user expectations.</description><pubDate>Fri, 19 Dec 2025 05:44:39 GMT</pubDate><category>alibaba</category><category>kling-ai</category><category>runway</category><category>google</category><category>anthropic</category><category>openai</category><category>qwen-image-layered</category><category>kling-2.6</category><category>gwm-1</category><category>gen-4.5</category><category>gemini-3-flash</category><category>gpt-5.2</category><category>codex-cli</category><category>opus-4.5</category><category>ankesh_anand</category><category>image-decomposition</category><category>motion-control</category><category>video-generation</category><category>agentic-reinforcement-learning</category><category>long-context</category><category>model-degradation</category><category>benchmarking</category><category>tool-use</category><category>prompt-engineering</category></item><item><title>Claude Skills grows: Open Standard, Directory, Org Admin</title><link>https://news.smol.ai/issues/25-12-18-claude-skills-grows/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-18-claude-skills-grows/</guid><description>**Claude Skills** are gaining significant traction since their launch in October, 
with a milestone of 100k views in one day for the Claude Skills talk, signaling growing adoption and importance. Announcements include org admin support, a new Skills Directory, and the move to an open standard named **Agent Skills**. In frontier model launches, **OpenAI** released **GPT-5.2-Codex**, touted as the best agentic coding model with improvements in native compaction, long-context reliability, and tool-calling, emphasizing real-world security impacts. **Google DeepMind** introduced **Gemini 3 Flash**, focusing on speed as a product feature impacting workflows and user engagement, alongside **FunctionGemma** and **T5Gemma 2**, emphasizing on-device deployment, fine-tuning, and multimodality.</description><pubDate>Thu, 18 Dec 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>google-deepmind</category><category>hugging-face</category><category>claude-skills</category><category>gpt-5.2-codex</category><category>gemini-3-flash</category><category>functiongemma</category><category>t5gemma-2</category><category>sama</category><category>gregbrockman</category><category>philschmid</category><category>agentic-ai</category><category>fine-tuning</category><category>long-context</category><category>tool-calling</category><category>on-device-ai</category><category>multimodality</category><category>security</category><category>workflow-optimization</category></item><item><title>Gemini 3.0 Flash Preview: 1/4 cost of Pro, but ~as smart, retakes Pareto Frontier</title><link>https://news.smol.ai/issues/25-12-17-gemini-3-flash/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-17-gemini-3-flash/</guid><description>**Google** launched **Gemini 3 Flash**, a pro-grade reasoning model with flash latency, supporting tool calling and multimodal IO, available via multiple platforms including Google AI Studio and Vertex AI. 
It offers competitive pricing at $0.50 per 1M input tokens and $3.00 per 1M output tokens, with context windows up to 1M tokens. Benchmarks show **Gemini 3 Flash** rivals or outperforms larger models like **GPT-5.2** and **Gemini 3 Pro** in agentic, coding, and reasoning tasks, validated by ARC-AGI-2, SWE-bench, LMArena, and Arena benchmarks. Despite some tradeoffs like high token use and hallucination rates, it is cost-effective overall. Key figures include **Sundar Pichai**, **Jeff Dean**, and **Demis Hassabis** who publicly celebrated this achievement. The model&apos;s tool calling capabilities were demonstrated with 100 tools in a live demo.</description><pubDate>Wed, 17 Dec 2025 05:44:39 GMT</pubDate><category>google</category><category>google-deepmind</category><category>gemini-3-flash</category><category>gemini-3</category><category>gpt-5.2</category><category>gemini-3-pro</category><category>sundar_pichai</category><category>jeffdean</category><category>demishassabis</category><category>tool-calling</category><category>multimodality</category><category>benchmarking</category><category>reasoning</category><category>cost-efficiency</category><category>model-performance</category><category>context-window</category><category>agentic-ai</category><category>model-deployment</category></item><item><title>OpenAI GPT Image-1.5 claims to beat Nano Banana Pro, #1 across all Arenas, but completely fails Vibe Checks</title><link>https://news.smol.ai/issues/25-12-16-gpt-image-15/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-16-gpt-image-15/</guid><description>**OpenAI** released its new image model **GPT Image 1.5**, featuring precise image editing, better instruction following, improved text and markdown rendering, and faster generation up to 4×. 
Despite topping multiple leaderboards like **LMArena (1277)**, **Design Arena (1344)**, and **AA Arena (1272)**, user feedback from Twitter, Reddit, and Discord communities is largely negative compared to **Nano Banana Pro** by **Gemini**. Xiaomi introduced the **MiMo-V2-Flash**, a **309B MoE** model optimized for inference efficiency with **256K context window**, achieving state-of-the-art scores on SWE-Bench. The model uses Hybrid Sliding Window Attention and multi-token prediction, offering significant speedups and efficiency improvements. The timing of OpenAI&apos;s launch amid competition from Gemini and Nano Banana Pro affects user sentiment, highlighting challenges in benchmarking relevance.</description><pubDate>Tue, 16 Dec 2025 05:44:39 GMT</pubDate><category>openai</category><category>gemini</category><category>xiaomi</category><category>lmsys</category><category>deepseek</category><category>openrouter</category><category>gpt-image-1.5</category><category>nano-banana-pro</category><category>mimo-v2-flash</category><category>deepseek-v3.2</category><category>fuli_luo</category><category>eliebakouch</category><category>image-generation</category><category>instruction-following</category><category>benchmarking</category><category>model-efficiency</category><category>long-context</category><category>multi-token-prediction</category><category>hybrid-attention</category><category>model-optimization</category><category>inference-speed</category><category>agentic-workflows</category><category>model-architecture</category><category>model-quantization</category></item><item><title>NVIDIA Nemotron 3: hybrid Mamba-Transformer completely open source models from 30B to 500B</title><link>https://news.smol.ai/issues/25-12-15-nemotron-3/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-15-nemotron-3/</guid><description>**NVIDIA** has released **Nemotron 3 Nano**, a fully open-source hybrid Mamba-Transformer Mixture-of-Experts (MoE) model with a **30B 
parameter size** and a **1 million token context window**. It includes open weights, training recipes, datasets, and an RL environment suite called NeMo Gym, supporting commercial use under the NVIDIA Open Model License. The model achieves state-of-the-art results on benchmarks like SWE-Bench and Artificial Analysis Intelligence Index, outperforming **Qwen3-30B A3B**. Ecosystem support is immediate with integrations into inference stacks like **vLLM**, **llama.cpp**, and **Baseten**. Upcoming larger models, Nemotron Super and Ultra, will feature NVFP4 pretraining and LatentMoE routing to optimize compute. This release marks a significant milestone for open-source American AI with comprehensive open assets and advanced hybrid architecture.</description><pubDate>Mon, 15 Dec 2025 05:44:39 GMT</pubDate><category>nvidia</category><category>huggingface</category><category>togethercompute</category><category>baseten</category><category>vllm</category><category>llamaindex</category><category>nemotron-3-nano</category><category>qwen3-30b-a3b-base</category><category>ctnzr</category><category>andrew_n_carr</category><category>awnihannun</category><category>hybrid-architecture</category><category>mixture-of-experts</category><category>reinforcement-learning</category><category>long-context</category><category>model-release</category><category>open-source-models</category><category>model-training</category><category>model-optimization</category><category>benchmarking</category><category>agent-training</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-12-12-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-12-not-much/</guid><description>**GPT-5.2** shows mixed performance in public evaluations, excelling in agentic tasks but at a significantly higher cost (~**$620/run**) compared to **Opus 4.5** and **GPT-5.1**. 
It performs variably on reasoning and coding benchmarks, with some improvements on long-context tasks. Extended &quot;reasoning effort&quot; settings notably impact results. Aggregators rank **Gemini 3 Pro** above GPT-5.2 in task persistence. **OpenAI** released sparse activation models sparking debate on sparsity vs MoE architectures. **Allen AI**&apos;s **Olmo 3.1 (32B)** advances open reinforcement learning scale with substantial compute investment (~**125k H100 hours**). **Mistral**&apos;s Devstral-2 and **llama.cpp** improve local inference infrastructure with new features like GGUF support and distributed speedups. **Tinker** platform goes GA with vision input and finetuning support for **Qwen3-VL-235B**.</description><pubDate>Fri, 12 Dec 2025 05:44:39 GMT</pubDate><category>openai</category><category>allen_ai</category><category>mistral-ai</category><category>ollama</category><category>lmstudio</category><category>thinkymachines</category><category>gpt-5.2</category><category>opus-4.5</category><category>gemini-3-pro</category><category>gpt-5.1</category><category>olmo-3.1-32b</category><category>qwen3-vl-235b</category><category>sama</category><category>scaling01</category><category>akhaliq</category><category>artificialanlys</category><category>lechmazur</category><category>acerfur</category><category>epochairesearch</category><category>reinforcement-learning</category><category>model-benchmarking</category><category>long-context</category><category>model-quantization</category><category>model-optimization</category><category>inference-speed</category><category>sparsity</category><category>fine-tuning</category><category>vision</category></item><item><title>GPT-5.2 (Instant/Thinking/Pro): 74% on GDPVal, 1.4x cost of GPT 5.1, on 10 Year OpenAI Anniversary</title><link>https://news.smol.ai/issues/25-12-11-gpt-52/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-11-gpt-52/</guid><description>**OpenAI** celebrates its 10 year anniversary with 
the launch of **GPT-5.2**, featuring significant across-the-board improvements alongside a rare 40% price increase. GPT-5.2 shows strong performance gains in scientific reasoning, knowledge work, and economic value tasks, achieving **70.9%** human expert parity on **GDPval** tasks and reaching **90.5%** on ARC-AGI-1 with a large efficiency gain. Despite some mixed results in coding benchmarks and vision capabilities, GPT-5.2 is well received as a major update with extended context and tiered reasoning controls. Pricing is set at **$1.75/M input** and **$14/M output** tokens with a 90% cache discount. The update is live in ChatGPT and API, marking a significant milestone for OpenAI&apos;s LLM development.</description><pubDate>Thu, 11 Dec 2025 05:44:39 GMT</pubDate><category>openai</category><category>gpt-5.2</category><category>sama</category><category>yanndubs</category><category>polynoamial</category><category>scaling01</category><category>scientific-reasoning</category><category>knowledge-work</category><category>long-context</category><category>benchmarking</category><category>performance-optimization</category><category>pricing</category><category>software-engineering</category><category>vision</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-12-10-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-10-not-much/</guid><description>**NousResearch&apos;s Nomos 1** is a 30B open math model achieving a top Putnam score with only ~3B active parameters, enabling consumer Mac inference. **AxiomProver** also posts top Putnam results using ThinkyMachines&apos; RL stack. **Mistral&apos;s Devstral 2 Small** outperforms DeepSeek v3.2 in 71% of preferences with better speed and cost. **Anthropic&apos;s Claude Code** introduces asynchronous agent execution. **Cursor 2.2** adds deep agent primitives like Debug and Plan Modes. 
**VS Code** launches unified agent chat sessions improving multi-agent workflows. **LangChain** releases &quot;Polly&quot; for agent observability. The **Stirrup** harness leads OpenAI GDPval benchmarks with Claude Opus 4.5, GPT-5, and Gemini 3 Pro following. Advances in quantization include **vLLM** integrating Intel&apos;s AutoRound PTQ for efficient serving. **Unsloth** achieves up to 3× training speedups with new kernels across Llama, Qwen, Mistral, and Gemma models. *&quot;Compositional reasoning + specialized post-training under constrained active params can rival frontier closed models on formal math.&quot;*</description><pubDate>Wed, 10 Dec 2025 05:44:39 GMT</pubDate><category>nousresearch</category><category>thinkymachines</category><category>mistral-ai</category><category>deepseek</category><category>anthropic</category><category>cursor</category><category>microsoft</category><category>langchain-ai</category><category>openai</category><category>gemini</category><category>intel</category><category>vllm_project</category><category>danielhanchen</category><category>nomos-1</category><category>axiomprover</category><category>devstral-2-small</category><category>deepseek-v3.2</category><category>claude-code</category><category>cursor-2.2</category><category>claude-opus-4.5</category><category>gpt-5</category><category>claude-sonnet-4.5</category><category>gemini-3-pro</category><category>llama</category><category>qwen</category><category>mistral</category><category>gemma</category><category>math</category><category>formal-reasoning</category><category>agentic-systems</category><category>asynchronous-execution</category><category>multi-agent-systems</category><category>observability</category><category>benchmarking</category><category>quantization</category><category>post-training-quantization</category><category>training-speedup</category><category>kernel-optimization</category><category>inference-efficiency</category></item><item><title>MCP -&gt; Agentic AI 
Foundation, Mistral Devstral 2</title><link>https://news.smol.ai/issues/25-12-09-devstral2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-09-devstral2/</guid><description>**OpenAI Engineering** sees a significant collaborative milestone with the launch of the **Agentic AI Foundation** under the Linux Foundation, uniting projects from **Anthropic**, **OpenAI**, and **Block**. **Mistral** released **Devstral 2**, a coding model with **123B parameters** and open weights, offering a cost-effective alternative to **Sonnet 4.3** and competitive performance against **DeepSeek v3.2**. The new **Mistral Vibe CLI** supports agentic coding workflows with rapid ecosystem integration. **Alibaba** introduced **Soft Adaptive Policy Optimization (SAPO)** for reinforcement learning tuning, improving stability and performance in **Qwen3-VL** across multiple tasks. Research highlights include the importance of data decontamination in RL and ongoing discussions on MoE RL stability and reward hacking mitigation.</description><pubDate>Tue, 09 Dec 2025 05:44:39 
GMT</pubDate><category>openai</category><category>anthropic</category><category>block</category><category>mistral-ai</category><category>alibaba</category><category>linux-foundation</category><category>deepseek</category><category>devstral-2</category><category>devstral-small-2</category><category>sonnet-4.3</category><category>deepseek-v3.2</category><category>qwen3-vl</category><category>guillaumelample</category><category>b_roziere</category><category>qtnx_</category><category>charliermarsh</category><category>omarsar0</category><category>eliebakouch</category><category>justinwaugh</category><category>cwolferesearch</category><category>pan</category><category>agentic-ai</category><category>coding-models</category><category>reinforcement-learning</category><category>model-performance</category><category>model-optimization</category><category>open-weights</category><category>cli-tools</category><category>multi-file-code-automation</category><category>data-decontamination</category><category>moe</category><category>reward-models</category><category>rl-stability</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-12-08-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-08-not-much/</guid><description>**Claude Code Skills** gains attention with a published talk and Hugging Face&apos;s new &quot;skill&quot; enabling one-line fine-tuning pipelines for models from ~0.5B to 70B parameters, supporting SFT, DPO, and GRPO, costing as low as ~$0.30 for small runs. **Zhipu AI** launches multimodal models **GLM-4.6V** (106B params MoE) and **GLM-4.6V-Flash** (9B dense), featuring 128k context and native multimodal function calling, with free Flash variant and API pricing detailed. **Jina AI** releases **Jina-VLM (2B)**, a compact multilingual VLM excelling in diagrams and documents with top benchmark scores. 
At **NeurIPS 2025**, research highlights include Google&apos;s post-Transformer sequence architectures (Moneta, Yaad, Memora) showing up to 20% gains in long-context retrieval, **AxiomProver**&apos;s autonomous Lean system solving 9/12 Putnam 2025 problems rapidly, and mechanistic interpretability advances discussed by Chris Olah emphasizing scalable tooling.</description><pubDate>Mon, 08 Dec 2025 05:44:39 GMT</pubDate><category>hugging-face</category><category>zhipu-ai</category><category>jina-ai</category><category>google-deepmind</category><category>axiomprover</category><category>glm-4.6v</category><category>glm-4.6v-flash</category><category>jina-vlm-2b</category><category>lioronai</category><category>akshay_pachaar</category><category>_akhaliq</category><category>ben_burtenshaw</category><category>vllm_project</category><category>prince_canuma</category><category>zenmuxai</category><category>eliebakouch</category><category>theturingpost</category><category>axiommathai</category><category>neelnanda5</category><category>sarahookr</category><category>fine-tuning</category><category>multimodality</category><category>model-optimization</category><category>long-context</category><category>mechanistic-interpretability</category><category>formal-methods</category><category>sequence-architectures</category><category>reinforcement-learning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-12-05-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-05-not-much/</guid><description>**vLLM 0.12.0** introduces DeepSeek support, GPU Model Runner V2, and quantization improvements with PyTorch 2.9.0 and CUDA 12.9. **NVIDIA** launches CUDA Tile IR and cuTile Python for advanced GPU tensor operations targeting Blackwell GPUs. **Hugging Face** releases Transformers v5 RC with an any-to-any multimodal pipeline supporting models like **Gemma3n** and **Qwen3-Omni**. 
Agent platforms see updates from **LangChain** with content moderation and cost tracking, **Together AI** and **Meta AI** collaborate on RL for long-horizon workflows, and **SonarSource** integrates static analysis into AI codegen. Economic insights from **OpenRouter** highlight coding as a key AI application, with reasoning models surpassing 50% usage and market bifurcation between premium and open models. Additionally, **Kling Video 2.6** debuts native audio capabilities, and **Runway Gen-4.5**, **Qwen3-TTS**, and **Gemini 3 Pro** advance multimodality.</description><pubDate>Fri, 05 Dec 2025 05:44:39 GMT</pubDate><category>vllm</category><category>nvidia</category><category>huggingface</category><category>langchain-ai</category><category>together-ai</category><category>meta-ai-fair</category><category>sonarsource</category><category>openrouter</category><category>runway</category><category>gemini</category><category>arena</category><category>vllm-0.12.0</category><category>gemma3n</category><category>qwen3-omni</category><category>qwen3-vl</category><category>gpt-5.1-codex-max</category><category>gemini-3-pro</category><category>runway-gen-4.5</category><category>kling-video-2.6</category><category>jeremyphoward</category><category>mervenoyann</category><category>sydneyrunkle</category><category>swyx</category><category>maximelabonne</category><category>gpu-programming</category><category>quantization</category><category>multimodality</category><category>agent-platforms</category><category>reinforcement-learning</category><category>static-analysis</category><category>reasoning</category><category>inference-infrastructure</category><category>model-optimization</category><category>economics</category><category>audio</category><category>video-generation</category></item><item><title>OpenRouter&apos;s State of AI - An Empirical 100 Trillion Token Study</title><link>https://news.smol.ai/issues/25-12-04-openrouter/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-12-04-openrouter/</guid><description>**OpenRouter** released its first survey showing usage trends with 7 trillion tokens proxied weekly, highlighting a 52% roleplay bias. **DeepSeek**&apos;s open model market share has sharply declined due to rising coding model usage. Reasoning model token usage surged from 0% to over 50%. **Grok Code Fast** shows high usage, while **Anthropic** leads in tool calling and coding requests with around 60% share. Input tokens quadrupled and output tokens tripled this year, driven mainly by programming use cases, which dominate spending and volume. Google launched **Gemini 3 Deep Think**, featuring parallel thinking and achieving 45.1% on the ARC-AGI-2 benchmark, and previewed **Titans**, a long-context neural memory architecture scaling beyond 2 million tokens. These advances were shared by **Google DeepMind** and **Google AI** on Twitter.</description><pubDate>Thu, 04 Dec 2025 05:44:39 GMT</pubDate><category>openrouter</category><category>deepseek</category><category>anthropic</category><category>google</category><category>google-deepmind</category><category>grok-code-fast</category><category>gemini-3</category><category>gemini-3-deep-think</category><category>gpt-5.1-codex-max</category><category>quocleix</category><category>noamshazeer</category><category>mirrokni</category><category>reasoning</category><category>coding</category><category>tokenization</category><category>long-context</category><category>model-architecture</category><category>benchmarking</category><category>agentic-ai</category><category>prompt-engineering</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-12-03-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-03-not-much/</guid><description>**OpenAI&apos;s Code Red response** and **Anthropic&apos;s IPO** are major highlights. 
In AI video and imaging, **Kling 2.6** introduces native audio co-generation with coherent lip-sync, partnered with platforms like **ElevenLabs** and **OpenArt**. **Runway Gen-4.5** enhances lighting fidelity, while **Google&apos;s Gemini 3 Nano Banana Pro** supports advanced image compositing. Open model releases include **DeepSeek V3.2** with sparse attention and cost-effective pricing, and **Mistral&apos;s Ministral 3** multimodal family with strong 14B variants. Retrieval and code models from **Alibaba&apos;s EvoQwen2.5-VL** and **Nous Research&apos;s Hermes 4.3** show competitive performance with permissive licensing and HF availability. The community arena sees additions like INTELLECT-3 (106B MoE). *&quot;coherent looking &amp; sounding output&quot;* and *&quot;auto-lighting to match scene mood&quot;* are noted advancements.</description><pubDate>Wed, 03 Dec 2025 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>google</category><category>runway</category><category>elevenlabs</category><category>freepik</category><category>openart</category><category>deepseek</category><category>mistral-ai</category><category>alibaba</category><category>nous-research</category><category>kling-2.6</category><category>kling-o1</category><category>runway-gen-4.5</category><category>gemini-3</category><category>deepseek-v3.2</category><category>ministral-3</category><category>evoqwen2.5-vl</category><category>hermes-4.3</category><category>intellect-3</category><category>video-generation</category><category>audio-processing</category><category>multimodality</category><category>image-generation</category><category>reasoning</category><category>model-quantization</category><category>sparse-attention</category><category>model-pricing</category><category>multimodal-models</category><category>retrieval-augmentation</category><category>model-training</category><category>model-release</category></item><item><title>DeepSeek V3.2 &amp; 3.2-Speciale: 
GPT5-High Open Weights, Context Management, Plans for Compute Scaling</title><link>https://news.smol.ai/issues/25-12-01-deepseek-32/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-01-deepseek-32/</guid><description>**DeepSeek** launched the **DeepSeek V3.2** family including Standard, Thinking, and Speciale variants with up to **131K context window** and competitive benchmarks against **GPT-5-High**, **Sonnet 4.5**, and **Gemini 3 Pro**. The release features a novel **Large Scale Agentic Task Synthesis Pipeline** focusing on agentic behaviors and improvements in **reinforcement learning** post-training algorithms. The models are available on platforms like **LM Arena** with pricing around **$0.28/$0.42 per million tokens**. Community feedback is mixed, praising the frontier reasoning capabilities but critiquing the chat UI experience. Key figures include **Susan Zhang** and **Teortaxes** who provided commentary on the release.</description><pubDate>Tue, 02 Dec 2025 05:44:39 GMT</pubDate><category>deepseek_ai</category><category>lm-arena</category><category>deepseek-v3.2</category><category>deepseek-v3.2-speciale</category><category>gpt-5-high</category><category>sonnet-4.5</category><category>gemini-3-pro</category><category>suchenzang</category><category>teortaxestex</category><category>agentic-ai</category><category>reinforcement-learning</category><category>large-context-windows</category><category>model-benchmarking</category><category>model-performance</category><category>multi-agent-systems</category><category>model-training</category><category>model-deployment</category></item><item><title>Mistral 3: Mistral Large 3 + Ministral 3B/8B/14B open weights models</title><link>https://news.smol.ai/issues/25-12-02-mistral-3/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-12-02-mistral-3/</guid><description>**Mistral** has launched the **Mistral 3 family** including **Ministral 3** models (3B/8B/14B) and **Mistral Large 3**, a sparse 
MoE model with **675B total parameters** and **256k context window**, all under an Apache 2.0 open license. Early benchmarks rank Mistral Large 3 at **#6 among open models** with strong coding performance. The launch includes broad ecosystem support such as vLLM, llama.cpp, Ollama, and LM Studio integrations. Meanwhile, **Anthropic** acquired the open-source **Bun** runtime to accelerate **Claude Code**, which reportedly reached a **$1B run-rate in ~6 months**. Anthropic also announced discounted **Claude** plans for nonprofits and shared insights on AI&apos;s impact on work internally.</description><pubDate>Tue, 02 Dec 2025 05:44:39 GMT</pubDate><category>mistral-ai</category><category>anthropic</category><category>apple</category><category>runway</category><category>moondream</category><category>mistral-large-3</category><category>ministral-3</category><category>clara-7b-instruct</category><category>gen-4.5</category><category>claude-code</category><category>anjney_midha</category><category>_akhaliq</category><category>alexalbert__</category><category>_catwu</category><category>mikeyk</category><category>sparse-moe</category><category>multimodality</category><category>benchmarking</category><category>open-source</category><category>model-licensing</category><category>model-performance</category><category>long-context</category><category>inference-optimization</category><category>instruction-following</category><category>local-inference</category><category>code-generation</category><category>model-integration</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-11-26-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-26-not-much/</guid><description>**Anthropic** introduces durable agents and MCP tasks for long-running workflows, with practical engineering patterns and integrations like Prefect. 
**Booking.com** deploys a large-scale agent system improving customer satisfaction using LangGraph, Kubernetes, GPT-4 Mini, and Weaviate. **Perplexity** rolls out user-level memory and virtual try-on features. **Claude Opus 4.5** leads on LisanBench and Code Arena WebDev benchmarks with mixed community feedback on its &quot;thinking&quot; and &quot;non-thinking&quot; modes, while improving cost-efficiency and UX with batch APIs and context compaction. Research on multi-agent systems shows **LatentMAS** reduces communication tokens by 70-84% and improves accuracy using Qwen3 models, and reasoning trace distillation achieves significant token reduction with maintained accuracy, highlighting the importance of reasoning trace style.</description><pubDate>Wed, 26 Nov 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>booking.com</category><category>perplexity-ai</category><category>langchain</category><category>claude</category><category>scaling01</category><category>deepseek</category><category>qwen</category><category>prefect</category><category>claude-opus-4.5</category><category>qwen-3-4b</category><category>qwen-3-8b</category><category>qwen-3-14b</category><category>deepseek-r1</category><category>jeremyphoward</category><category>alexalbert__</category><category>omarsar0</category><category>lingyang_pu</category><category>dair_ai</category><category>agent-systems</category><category>multi-agent-systems</category><category>reasoning</category><category>benchmarking</category><category>cost-efficiency</category><category>model-optimization</category><category>long-context</category><category>memory-management</category><category>reinforcement-learning</category><category>model-performance</category><category>multi-agent-communication</category><category>latent-representation</category><category>inference-cost</category><category>software-integration</category></item><item><title>Black Forest Labs FLUX.2 [pro|flex|dev|klein]: near-Nano Banana quality 
but Open Weights</title><link>https://news.smol.ai/issues/25-11-25-flux2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-25-flux2/</guid><description>**Black Forest Labs&apos; FLUX.2** release features **Multi-Reference Support** for up to **4 Megapixel** output and up to **10 images** with consistency, including four form factors: Pro, Flex, Dev (32B Open Weight model), and Klein (TBA Open Weights). The new **FLUX.2 - VAE** introduces a variational autoencoder optimizing learnability, quality, and compression. Meanwhile, **Anthropic&apos;s Claude Opus 4.5** demonstrates strong performance and efficiency, scoring **70 on Artificial Analysis**, tying with **GPT-5.1 high** and trailing **Gemini 3 Pro (73)**. Opus 4.5 excels in agentic coding benchmarks and research evaluations, with notable token efficiency and reduced running costs. *&quot;Opus 4.5 leads Gemini 3 Pro on SWE-Bench Verified and tops the AICodeKing leaderboard,&quot;* and it shows strong QA and systematic review capabilities. 
Anthropic also released a dense prompting guide for Opus 4.5.</description><pubDate>Tue, 25 Nov 2025 05:44:39 GMT</pubDate><category>black-forest-labs</category><category>anthropic</category><category>huggingface</category><category>flux-2</category><category>flux-2-dev</category><category>claude-opus-4.5</category><category>gpt-5.1</category><category>gemini-3-pro</category><category>multi-reference-support</category><category>variational-autoencoder</category><category>image-generation</category><category>open-weights</category><category>agentic-coding</category><category>token-efficiency</category><category>benchmarking</category><category>prompting</category><category>model-performance</category></item><item><title>Claude Opus 4.5: 3rd new SOTA coding model in past week, 1/3 the price of Opus </title><link>https://news.smol.ai/issues/25-11-24-opus-45/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-24-opus-45/</guid><description>**Anthropic** launched **Claude Opus 4.5**, a new flagship model excelling in **coding, agents, and tooling** with a significant **3x price cut** compared to Opus 4.1 and improved **token efficiency** using **76% fewer output tokens**. Opus 4.5 achieved a new **SOTA** on **SWE-bench Verified** with **80.9% accuracy**, surpassing previous models like **Gemini 3 Pro** and **GPT-5.1-Codex-Max**. The update includes advanced API features such as **effort control**, **context compaction**, and **programmatic tool calling**, improving tool accuracy and reducing token usage. Claude Code is now bundled with Claude Desktop, and new integrations like Claude for Chrome and Excel are rolling out. 
Benchmarks show Opus 4.5 breaking the 80% barrier on SWE-bench Verified and performing strongly on ARC-AGI-2 and BrowseComp-Plus.</description><pubDate>Mon, 24 Nov 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>amazon</category><category>google</category><category>claude-opus-4.5</category><category>gemini-3-pro</category><category>gpt-5.1-codex-max</category><category>opus-4.1</category><category>sonnet-4.5</category><category>alexalbert__</category><category>btibor91</category><category>scaling01</category><category>klieret</category><category>coding</category><category>agents</category><category>tool-use</category><category>token-efficiency</category><category>benchmarking</category><category>api</category><category>model-pricing</category><category>model-performance</category><category>effort-control</category><category>context-compaction</category><category>programmatic-tool-calling</category></item><item><title>AI Engineer Code Summit</title><link>https://news.smol.ai/issues/25-11-21-aie-code/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-21-aie-code/</guid><description>The recent **AIE Code Summit** showcased key developments including **Google DeepMind&apos;s Gemini 3 Pro Image model, Nano Banana Pro**, which features enhanced text rendering, 4K visuals, and fine-grained editing capabilities. Community feedback highlights its strong performance in design and visualization tasks, with high user preference scores. Benchmarking updates reveal the new **CritPt physics frontier benchmark** where Gemini 3 Pro outperforms GPT-5, though AI still lags on complex unseen research problems. Agentic task evaluations show varied time horizons and performance gaps between open-weight and closed frontier models, emphasizing ongoing challenges in AI research and deployment. 
*&quot;Instruction following remains jagged for some users,&quot;* and model fit varies by use case, with Gemini 3 excelling in UI and code tasks but showing regressions in transcription and writing fidelity.</description><pubDate>Fri, 21 Nov 2025 05:44:39 GMT</pubDate><category>google-deepmind</category><category>togethercompute</category><category>gemini-3-pro-image</category><category>gemini-3</category><category>gpt-5</category><category>claude-3.7-sonnet</category><category>demishassabis</category><category>omarsar0</category><category>lintool</category><category>hrishioa</category><category>teknium</category><category>artificialanlys</category><category>minyangtian1</category><category>ofirpress</category><category>metr_evals</category><category>scaling01</category><category>image-generation</category><category>fine-tuning</category><category>benchmarking</category><category>agentic-ai</category><category>physics</category><category>model-performance</category><category>instruction-following</category><category>model-comparison</category><category>time-horizon</category><category>user-preference</category></item><item><title>Nano Banana Pro (Gemini Image Pro) solves text-in-images, infographic generation, 2-4k resolution, and Google Search grounding</title><link>https://news.smol.ai/issues/25-11-20-nano-banana-pro/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-20-nano-banana-pro/</guid><description>**Google** launched **Gemini 3 Pro Image (Nano Banana Pro)**, a next-generation AI image generation and editing model with integrated Google Search grounding, multi-image composition, and fine-grained visual controls, offering pricing at $0.134 per 2K image and $0.24 per 4K image. It features improved text rendering with error rates dropping from 56% to 8% compared to its predecessor, and includes SynthID watermark checks for provenance. The model is available via Gemini App, API, LM Arena, Hugging Face Spaces, Together AI, and Flow. 
Meanwhile, **OpenAI** shared early experiments with **GPT-5** accelerating scientific research, including proofs of previously unsolved problems in math, physics, biology, and materials science. *&quot;GPT-5 accelerated research tasks in math/physics/biology/materials; in 4, it helped find proofs of previously unsolved problems.&quot;*</description><pubDate>Thu, 20 Nov 2025 05:44:39 GMT</pubDate><category>google</category><category>openai</category><category>hugging-face</category><category>togethercompute</category><category>lmsys</category><category>gemini-3-pro</category><category>gpt-5</category><category>jeffdean</category><category>kevinweil</category><category>demishassabis</category><category>image-generation</category><category>text-rendering</category><category>model-provenance</category><category>scientific-research</category><category>proof-assistance</category><category>multimodal-integration</category><category>api-access</category><category>fine-tuning</category></item><item><title>OpenAI fires back: GPT-5.1-Codex-Max (API) and GPT 5.1 Pro (ChatGPT)</title><link>https://news.smol.ai/issues/25-11-19-gpt-51-codex-max-pro/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-19-gpt-51-codex-max-pro/</guid><description>**OpenAI** released **GPT-5.1-Codex-Max**, featuring compaction-native training, an &quot;Extra High&quot; reasoning mode, and claims of over 24-hour autonomous operation, showing significant performance gains on benchmarks like METR, CTF, and PaperBench. **Google&apos;s Gemini 3 Pro** demonstrates strong coding and reasoning capabilities, achieving new state-of-the-art results on SWE-bench Verified and WeirdML, with estimated model size between 5-10 trillion parameters. The AI coding agent ecosystem is rapidly evolving with integrations and tooling improvements from multiple companies. **Sam Altman** highlighted the significant improvements in GPT-5.1-Codex-Max. 
The news also covers educational offerings like ChatGPT for Teachers and multi-agent workflows involving Gemini 3, GPT-5.1-Codex-Max, and Claude Sonnet 4.5.</description><pubDate>Wed, 19 Nov 2025 05:44:39 GMT</pubDate><category>openai</category><category>google</category><category>anthropic</category><category>langchain-ai</category><category>gpt-5.1-codex-max</category><category>gpt-5.1-codex</category><category>gemini-3-pro</category><category>claude-3.5-sonnet</category><category>sama</category><category>coding</category><category>autonomous-systems</category><category>benchmarking</category><category>model-scaling</category><category>multi-agent-systems</category><category>model-performance</category><category>reasoning</category><category>model-architecture</category></item><item><title>Gemini 3 Pro — new GDM frontier model 6, Gemini 3 Deep Think, and Antigravity IDE</title><link>https://news.smol.ai/issues/25-11-18-gemini-3/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-18-gemini-3/</guid><description>**Google** launched **Gemini 3 Pro**, a state-of-the-art model with a **1M-token context window**, **multimodal reasoning**, and strong agentic capabilities, priced significantly higher than Gemini 2.5. It leads major benchmarks, surpassing **Grok 4.1** and competing closely with **Sonnet 4.5** and **GPT-5.1**, though GPT-5.1 excels in ultralong summarization. Independent evaluations from **Artificial Analysis**, **Vending Bench**, **ARC-AGI 2**, **Box**, and **PelicanBench** validate Gemini 3 as a frontier LLM. Google also introduced **Antigravity**, an agentic IDE powered by Gemini 3 Pro and other models, featuring task orchestration and human-in-the-loop validation. The launch marks Google&apos;s strong return to AI with more models expected soon. 
*&quot;Google is very, very back in the business.&quot;*</description><pubDate>Tue, 18 Nov 2025 05:44:39 GMT</pubDate><category>google</category><category>google-deepmind</category><category>gemini-3-pro</category><category>gemini-2.5</category><category>grok-4.1</category><category>sonnet-4.5</category><category>gpt-5.1</category><category>sundarpichai</category><category>_philschmid</category><category>oriol_vinyals</category><category>multimodality</category><category>agentic-ai</category><category>benchmarking</category><category>context-window</category><category>model-performance</category><category>instruction-following</category><category>model-pricing</category><category>api</category><category>model-release</category><category>reasoning</category><category>model-evaluation</category></item><item><title>xAI Grok 4.1: #1 in Text Arena, #1 in EQ-bench, and better Creative Writing</title><link>https://news.smol.ai/issues/25-11-17-grok-41/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-17-grok-41/</guid><description>**xAI** launched **Grok 4.1**, achieving a #1 rank on the LM Arena Text Leaderboard with an Elo score of **1483**, showing improvements in creative writing and anti-hallucination. **OpenAI&apos;s GPT-5.1 &quot;Thinking&quot;** demonstrates efficiency gains with ~60% less &quot;thinking&quot; on easy queries and strong ARC-AGI performance. **Google DeepMind** released **WeatherNext 2**, an ensemble generative model that is **8× faster** and more accurate for global weather forecasts, integrated into multiple Google products. **Sakana AI** raised **¥20B ($135M)** in Series B funding at a **$2.63B** valuation to focus on efficient AI for resource-constrained enterprise applications in Japan. 
New evaluations highlight tradeoffs between hallucination and knowledge accuracy across models including **Claude 4.1 Opus** and **Anthropic** models.</description><pubDate>Mon, 17 Nov 2025 05:44:39 GMT</pubDate><category>xai</category><category>openai</category><category>google-deepmind</category><category>sakana-ai</category><category>anthropic</category><category>microsoft</category><category>mufg</category><category>khosla</category><category>nea</category><category>lux-capital</category><category>iqt</category><category>grok-4.1</category><category>gpt-5.1</category><category>claude-4.1-opus</category><category>grok-4</category><category>gpt-5</category><category>grok-4.1-thinking</category><category>gpt-5-pro</category><category>claude-4.5-haiku</category><category>yanndubs</category><category>gregkamradt</category><category>philschmid</category><category>willccbb</category><category>model-performance</category><category>creative-writing</category><category>hallucination</category><category>evaluation-datasets</category><category>ensemble-models</category><category>weather-forecasting</category><category>funding</category><category>efficiency</category><category>anti-hallucination</category><category>arc-agi</category><category>model-scaling</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-11-14-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-14-not-much/</guid><description>**OpenAI** launched **GPT-5.1** featuring &quot;adaptive reasoning&quot; and developer-focused API improvements, including prompt caching and a reasoning_effort toggle for latency/cost tradeoffs. Independent analysis shows a minor intelligence bump with significant gains in agentic coding benchmarks. **Anthropic**&apos;s **Claude** models introduced structured outputs with JSON schema compliance in public beta for Sonnet 4.5 and Opus 4.1, enhancing tooling and code execution workflows. 
Rumors of an Opus 4.5 release were debunked. **LangChain** released a &quot;Deep Agents&quot; package and context-engineering playbook to optimize agent workflows. The community is eagerly anticipating **Google DeepMind**&apos;s **Gemini 3** model, hinted at in social media and upcoming AIE CODE events. *&quot;Tickets are sold out, but side events and volunteering opportunities are available.&quot;*</description><pubDate>Fri, 14 Nov 2025 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>langchain-ai</category><category>google-deepmind</category><category>gpt-5.1</category><category>sonnet-4.5</category><category>opus-4.1</category><category>gemini-3</category><category>swyx</category><category>allisontam_</category><category>gdb</category><category>sama</category><category>alexalbert__</category><category>simonw</category><category>omarsar0</category><category>abacaj</category><category>scaling01</category><category>amandaaskell</category><category>adaptive-reasoning</category><category>developer-tools</category><category>prompt-optimization</category><category>json-schema</category><category>agent-workflows</category><category>context-engineering</category><category>structured-outputs</category><category>model-release</category><category>benchmarking</category></item><item><title>minor updates to GPT 5.1 and SIMA 2</title><link>https://news.smol.ai/issues/25-11-13-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-13-not-much/</guid><description>**OpenAI** released **GPT-5.1** family models including **5.1-Codex** and **5.1-Codex-Mini** with improved steerability, faster responses, and new tools like apply_patch and shell command execution. Pricing remains unchanged from 5.0. Immediate integrations include **GitHub Copilot**, **VS Code**, **Cursor**, and **Perplexity** adopting GPT-5.1 models. 
**Google DeepMind** announced **SIMA 2**, a **Gemini**-powered agent capable of language instruction following, planning, and self-improvement without human feedback, targeting robotics applications. New research on context engineering and agentic tool use patterns was published, with contributions from **Weaviate** and **LlamaIndex** on database query planning and chart parsing respectively. *&quot;Adaptive reasoning&quot;* and agentic coding improvements are highlighted in GPT-5.1 Instant.</description><pubDate>Thu, 13 Nov 2025 05:44:39 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>github</category><category>microsoft</category><category>cursor_ai</category><category>perplexity-ai</category><category>weaviate</category><category>llamaindex</category><category>gpt-5.1</category><category>gpt-5.1-codex</category><category>gpt-5.1-codex-mini</category><category>sima-2</category><category>gemini</category><category>sama</category><category>allisontam_</category><category>cline</category><category>cognition</category><category>demishassabis</category><category>omarsar0</category><category>helloiamleonie</category><category>adaptive-reasoning</category><category>agentic-coding</category><category>tool-use</category><category>context-engineering</category><category>memory-architecture</category><category>self-improvement</category><category>retrieval-augmentation</category><category>database-query-planning</category><category>chart-parsing</category><category>robotics</category></item><item><title>GPT 5.1 in ChatGPT: No evals, but adaptive thinking and instruction following</title><link>https://news.smol.ai/issues/25-11-12-gpt-51/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-12-gpt-51/</guid><description>**OpenAI** launched **GPT-5.1** with improvements in conversational tone, instruction following, and adaptive reasoning. **GPT-5.0** is being sunset in 3 months. 
ChatGPT introduces new tone toggles for personalization, serving over **800 million users**. **Waymo** rolls out freeway driving for public riders in major California cities, showcasing advances in autonomous driving. **Anthropic**&apos;s Project Fetch explores LLMs as robotics copilots using **Claude**. **Perceptron** releases a new API and Python SDK for multimodal perception-action apps supporting **Isaac-0.1** and **Qwen3VL-235B**. **Code Arena** offers live coding evaluations supporting **Claude**, **GPT-5**, **GLM-4.6**, and **Gemini**. **LangChain** introduces middleware for agent governance with human-in-the-loop controls. **LlamaIndex** releases a structured extraction template for SEC filings using LlamaAgents. **NousResearch** promotes ARC Prize benchmarks for generalized intelligence evaluation.</description><pubDate>Wed, 12 Nov 2025 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>waymo</category><category>perceptron</category><category>langchain</category><category>llamaindex</category><category>nousresearch</category><category>gpt-5.1</category><category>gpt-5.0</category><category>claude</category><category>isaac-0.1</category><category>qwen3vl-235b</category><category>glm-4.6</category><category>gemini</category><category>dmitri_dolgov</category><category>jeffdean</category><category>fidji_simo</category><category>akshats07</category><category>adaptive-reasoning</category><category>instruction-following</category><category>personalization</category><category>autonomous-driving</category><category>robotics</category><category>multimodality</category><category>agent-evaluation</category><category>agent-governance</category><category>middleware</category><category>structured-extraction</category><category>benchmarking</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-11-11-not-much/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-11-11-not-much/</guid><description>**GPT-5** leads Sudoku-Bench solving 33% of puzzles but 67% remain unsolved, highlighting challenges in meta-reasoning and spatial logic. New training methods like **GRPO fine-tuning** and &quot;Thought Cloning&quot; show limited success. Research on &quot;looped LLMs&quot; suggests pretrained models benefit from repeated computation for better performance. **Baidu&apos;s ERNIE-4.5-VL-28B-A3B-Thinking** offers lightweight multimodal reasoning with Apache 2.0 licensing, outperforming **Gemini-2.5-Pro** and **GPT-5-High** on document tasks. **Databricks ai_parse_document** preview delivers cost-efficient document intelligence outperforming GPT-5 and Claude. **Pathwork AI** uses **LlamaCloud** for underwriting automation. **Gemini File Search API** enables agentic retrieval augmented generation (RAG) with MCP server integration. **Together AI** and **Collinear** launch **TraitMix** for persona-driven agent simulations integrated with **Together Evals**. Reports highlight risks in long-running code agents like **Claude Code** reverting changes, emphasizing guardrails. 
Community consensus favors multiple code copilots including Claude Code, Codex, and others.</description><pubDate>Tue, 11 Nov 2025 05:44:39 GMT</pubDate><category>openai</category><category>baidu</category><category>databricks</category><category>llamaindex</category><category>togethercompute</category><category>sakanaailabs</category><category>gpt-5</category><category>qwen2.5-7b</category><category>ernie-4.5-vl-28b-a3b-thinking</category><category>gemini-2.5-pro</category><category>llamacloud</category><category>claude-code</category><category>micahgoldblum</category><category>francoisfleuret</category><category>matei_zaharia</category><category>jerryjliu0</category><category>omarsar0</category><category>imjaredz</category><category>theo</category><category>reasoning-benchmarks</category><category>reinforcement-learning</category><category>fine-tuning</category><category>multimodality</category><category>document-intelligence</category><category>retrieval-augmented-generation</category><category>agentic-systems</category><category>persona-simulation</category><category>code-agents</category><category>guardrails</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-11-10-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-10-not-much/</guid><description>**Moonshot AI&apos;s Kimi K2 Thinking** AMA revealed a hybrid attention stack using **KDA + NoPE MLA** outperforming full MLA + RoPE, with the **Muon optimizer** scaling to ~1T parameters and native **INT4** QAT for cost-efficient inference. K2 Thinking ranks highly on **LisanBench** and **LM Arena Text** leaderboards, offering low-cost INT4 serving and strong performance in Math, Coding, and Creative Writing. It supports heavy agentic tool use with up to 300 tool requests per run and recommends using the official API for reliable long-trace inference. 
**Meta AI** released the **Omnilingual ASR** suite covering 1600+ languages including 500 underserved, plus a 7B wav2vec 2.0 model and ASR corpus. Additionally, the **Gelato-30B-A3B** model for computer grounding in GUI manipulation agents outperforms larger VLMs, targeting immediate agent gains. Qwen&apos;s image-edit LoRAs and light-restoration app were also highlighted.</description><pubDate>Mon, 10 Nov 2025 05:44:39 GMT</pubDate><category>moonshot-ai</category><category>meta-ai-fair</category><category>togethercompute</category><category>qwen</category><category>kimi-k2-thinking</category><category>kimi-k3</category><category>gelato-30b-a3b</category><category>omnilingual-wav2vec-2.0</category><category>yuchenj_uw</category><category>scaling01</category><category>code_star</category><category>omarsar0</category><category>kimi_moonshot</category><category>anas_awadalla</category><category>akhaliq</category><category>minchoi</category><category>attention-mechanisms</category><category>quantization</category><category>fine-tuning</category><category>model-optimization</category><category>agentic-ai</category><category>speech-recognition</category><category>multilingual-models</category><category>gui-manipulation</category><category>image-editing</category><category>dataset-release</category></item><item><title>Terminal-Bench 2.0 and Harbor</title><link>https://news.smol.ai/issues/25-11-07-tbench2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-07-tbench2/</guid><description>**Terminal-Bench** has fixed task issues and launched version 2.0 with cloud container support via the **Harbor framework**, gaining recognition from models like **Claude 4.5** and **Kimi K2 Thinking**. **Moonshot AI&apos;s Kimi K2 Thinking** is a 1 trillion parameter MoE reasoning model with ~32B active parameters, running natively in **INT4 quantization** and featuring a 256K context window. 
It leads open-weights benchmarks with an Artificial Analysis Intelligence Index score of **67** and strong agentic performance, running efficiently on consumer Apple silicon and 2× M3 Ultra hardware. The model is broadly available on **Hugging Face** and **Ollama Cloud**, and is integrated into frameworks like slime. Serving bottlenecks were traced to network bandwidth rather than GPU limits, highlighting infrastructure considerations for LLM deployment.</description><pubDate>Fri, 07 Nov 2025 05:44:39 GMT</pubDate><category>moonshot-ai</category><category>anthropic</category><category>hugging-face</category><category>ollama</category><category>slime-framework</category><category>kimi-k2-thinking</category><category>clementdelangue</category><category>dbreunig</category><category>awnihannun</category><category>crystalsssup</category><category>kimi_moonshot</category><category>benchmarking</category><category>agentic-ai</category><category>quantization</category><category>model-optimization</category><category>inference</category><category>model-deployment</category><category>moe</category><category>context-windows</category><category>cost-efficiency</category></item><item><title>Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench &amp;&amp; Soumith leaves Pytorch</title><link>https://news.smol.ai/issues/25-11-06-kimi-k2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-06-kimi-k2/</guid><description>**Moonshot AI** launched **Kimi K2 Thinking**, a **1 trillion parameter** mixture-of-experts (MoE) model with **32 billion active parameters**, a **256K context window**, and native **INT4 quantization-aware training**. It achieves state-of-the-art results on benchmarks like **HLE (44.9%)**, **BrowseComp (60.2%)**, and agentic tool use with **200-300 sequential tool calls**. The model is deployed with **vLLM** support and OpenAI-compatible APIs, available on platforms like Arena, Baseten, and Yupp. 
Early user reports note some API instability under launch load. Meanwhile, **Google** announced the **TPU v7 (Ironwood)** with a **10× peak performance improvement** over TPU v5p, aimed at training and agentic inference for models like **Gemini**. **Apple** added support for M5 Neural Accelerators in llama.cpp for inference acceleration.</description><pubDate>Thu, 06 Nov 2025 05:44:39 GMT</pubDate><category>moonshot-ai</category><category>google</category><category>apple</category><category>vllm_project</category><category>arena</category><category>baseten</category><category>yupp_ai</category><category>kimi-k2-thinking</category><category>gemini</category><category>eliebakouch</category><category>nrehiew_</category><category>andrew_n_carr</category><category>ofirpress</category><category>artificialanlys</category><category>sundarpichai</category><category>akhaliq</category><category>mixture-of-experts</category><category>quantization</category><category>int4</category><category>context-window</category><category>agentic-ai</category><category>benchmarking</category><category>model-deployment</category><category>inference-acceleration</category><category>api</category><category>performance-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-11-05-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-05-not-much/</guid><description>**Kimi-K2 Reasoner** has been integrated into **vLLM** and will soon be supported by **SGLang**, featuring a massive **1.2 trillion parameter MoE** configuration. **Perplexity AI** released research on cloud-portable trillion-parameter MoE kernels optimized for **AWS EFA**, with potential integration into **vLLM**. **IBM&apos;s vLLM** team formalized hybrid dense and sparse expert models, supporting models like **Qwen3-Next**, **Nemotron Nano 2**, and **Granite 4.0**. 
**Kimi-K2** reportedly scores **77% on GPQA Diamond**, outperforming **GPT-4.5** at 71.4%, though this is unverified. 

**Anthropic** published a guide on efficient tool-heavy agent systems using MCP patterns, drastically reducing context tokens by ~98.7%. **Graphiti MCP** demonstrated shared memory across apps like **Claude Desktop** and **Cursor** for persistent agent memory. **VS Code** introduced an &quot;Agent sessions&quot; feature to unify agent management, including **Copilot** and **Codex**. **Cursor AI** improved coding accuracy via semantic search and code retrieval embeddings. New evaluation frameworks like **CodeClash** and **LMArena** assess agent and coding model performance in realistic multi-round tasks and occupation-tagged leaderboards.</description><pubDate>Wed, 05 Nov 2025 05:44:39 GMT</pubDate><category>vllm</category><category>perplexity-ai</category><category>ibm</category><category>anthropic</category><category>graphiti</category><category>claude</category><category>cursor-ai</category><category>microsoft</category><category>kimi-k2</category><category>qwen3-next</category><category>nemotron-nano-2</category><category>granite-4.0</category><category>gpt-4.5</category><category>copilot</category><category>codex</category><category>scaling01</category><category>cedric_chee</category><category>aravsrinivas</category><category>omarsar0</category><category>_avichawla</category><category>pierceboggan</category><category>jo_parkhurst</category><category>jyangballin</category><category>ofirpress</category><category>ml_angelopoulos</category><category>mixture-of-experts</category><category>model-integration</category><category>cloud-computing</category><category>hybrid-models</category><category>benchmarking</category><category>agent-systems</category><category>memory-persistence</category><category>semantic-search</category><category>code-retrieval</category><category>context-length-optimization</category><category>tool-use</category><category>evaluation-frameworks</category><category>software-development</category></item><item><title>not much happened 
today</title><link>https://news.smol.ai/issues/25-11-04-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-04-not-much/</guid><description>**Google&apos;s Project Suncatcher** prototypes scalable ML compute systems in orbit using solar energy with Trillium-generation TPUs surviving radiation, aiming for prototype satellites by 2027. **China&apos;s 50% electricity subsidies** for datacenters may offset chip efficiency gaps, with **Huawei** planning gigawatt-scale SuperPoDs for DeepSeek by 2027. **Epoch** launched an open data center tracking hub, and **Deutsche Telekom** and **NVIDIA** announced a $1.1B Munich facility with 10k GPUs. In agent stacks, **MCP** (Model Context Protocol) tools gain traction with implementations like **LitServe**, **Claude Desktop**, and **Reka&apos;s MCP server** for VS Code. Anthropic emphasizes efficient code execution with MCP. Context engineering shifts focus from prompt writing to model input prioritization, with reports and tools from **Weaviate**, **Anthropic**, and practitioners highlighting instruction-following rerankers and embedding approaches. DeepMind&apos;s **IMO-Bench** math reasoning suite shows **Gemini DeepThink** achieving high scores, with a ProofAutoGrader correlating strongly with human grading. 
Benchmarks and governance updates include new tasks and eval sharing in lighteval.</description><pubDate>Tue, 04 Nov 2025 05:44:39 GMT</pubDate><category>google</category><category>huawei</category><category>epoch-ai</category><category>deutsche-telekom</category><category>nvidia</category><category>anthropic</category><category>reka-ai</category><category>weaviate</category><category>deepmind</category><category>trillium</category><category>gemini-2.5-pro</category><category>gemini-deepthink</category><category>sundarpichai</category><category>yuchenj_uw</category><category>teortaxestex</category><category>epochairesearch</category><category>scaling01</category><category>_avichawla</category><category>rekaailabs</category><category>anthropicai</category><category>douwekiela</category><category>omarsar0</category><category>nityeshaga</category><category>goodside</category><category>iscienceluvr</category><category>lmthang</category><category>energy-efficiency</category><category>datacenters</category><category>mcp</category><category>context-engineering</category><category>instruction-following</category><category>embedding-models</category><category>math-reasoning</category><category>benchmarking</category><category>code-execution</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-11-03-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-11-03-not-much/</guid><description>**OpenAI** and **AWS** announced a strategic partnership involving a $38B compute deal to deploy hundreds of thousands of NVIDIA GB200 and GB300 chips, while **Microsoft** secured a license to ship NVIDIA GPUs to the UAE with a planned $7.9B datacenter investment. A 3-month NVFP4 kernel optimization competition on Blackwell B200s was launched by **NVIDIA** and GPU_MODE with prizes including DGX Spark and RTX 50XX GPUs. **vLLM** gains traction for local LLM serving, exemplified by PewDiePie&apos;s adoption. 
**Alibaba** previewed the Qwen3-Max-Thinking model hitting 100% on AIME 2025 and HMMT benchmarks, signaling advances in reasoning with tool use. The MIT-licensed MiniMax-M2 230B MoE model topped the Arena WebDev leaderboard, tying with Claude Sonnet 4.5 Thinking 32k. Critiques emerged on OSWorld benchmark stability and task validity. **LlamaIndex**&apos;s LIGHT framework demonstrated significant improvements in long-term memory tasks over raw context and RAG baselines, with gains up to +160.6% in summarization at 10M tokens. **Amazon** introduced Chronos-2, a time-series foundation model for zero-shot forecasting. The MCP ecosystem expanded with new tools like mcp2py OAuth integration and Gemini Docs MCP server, alongside a build sprint by **Anthropic** and **Gradio** offering substantial credits and prizes. *&quot;OSWorld doesn’t really exist—different prompt sets = incomparable scores&quot;* highlights benchmarking challenges.</description><pubDate>Mon, 03 Nov 2025 05:44:39 GMT</pubDate><category>openai</category><category>aws</category><category>microsoft</category><category>nvidia</category><category>gpu_mode</category><category>vllm</category><category>alibaba</category><category>arena</category><category>llamaindex</category><category>amazon</category><category>anthropic</category><category>gradio</category><category>qwen3-max-thinking</category><category>minimax-m2</category><category>claude-3-sonnet</category><category>llamaindex-light</category><category>chronos-2</category><category>sama</category><category>gdb</category><category>andrewcurran_</category><category>a1zhang</category><category>m_sirovatka</category><category>omarsar0</category><category>_philschmid</category><category>compute-deals</category><category>gpu-optimization</category><category>kernel-optimization</category><category>local-serving</category><category>reasoning</category><category>long-context</category><category>benchmarks</category><category>long-term-memory</category><category>ti
me-series-forecasting</category><category>agent-frameworks</category><category>oauth-integration</category><category>developer-tools</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-10-31-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-31-not-much/</guid><description>**Poolside** raised **$1B** at a **$12B valuation**. **Eric Zelikman** raised **$1B** after leaving **xAI**. **Weavy** joined **Figma**. New research shows that **FP16** precision reduces training-inference mismatch in **reinforcement-learning** fine-tuning compared to **BF16**. **Kimi AI** introduced a hybrid **KDA (Kimi Delta Attention)** architecture improving long-context throughput and RL stability, alongside a new **Kimi CLI** for coding with agent protocol support. **OpenAI** previewed Agent Mode in ChatGPT enabling autonomous research and planning during browsing.</description><pubDate>Fri, 31 Oct 2025 05:44:39 GMT</pubDate><category>poolside</category><category>x-ai</category><category>figma</category><category>openai</category><category>kimi</category><category>moonshot</category><category>eric_zelikman</category><category>reinforcement-learning</category><category>precision</category><category>fp16</category><category>bf16</category><category>linear-attention</category><category>long-context</category><category>cli</category><category>agent-frameworks</category><category>coding-agents</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-10-30-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-30-not-much/</guid><description>**Moonshot AI** released **Kimi Linear (KDA)** with day-0 infrastructure and strong long-context metrics, achieving up to **75% KV cache reduction** and **6x decoding throughput**. 
**MiniMax M2** pivoted to full attention for multi-hop reasoning, maintaining strong agentic coding performance with **200k context** and **~100 TPS**. **ByteDance**, **Princeton**, and **Mila** introduced **Looped LLMs** showing efficiency gains comparable to larger transformers. **OpenAI**&apos;s **Aardvark (GPT-5)** entered private beta as an agentic security researcher for scalable vulnerability discovery. **Cursor** launched faster cloud coding agents, though transparency concerns arose regarding base-model provenance. **Cognition** released a public beta for a desktop/mobile tool-use agent named Devin. The community discussed advanced attention mechanisms and adaptive compute techniques.</description><pubDate>Thu, 30 Oct 2025 05:44:39 GMT</pubDate><category>moonshot-ai</category><category>minimax</category><category>bytedance</category><category>princeton</category><category>mila</category><category>openai</category><category>cursor</category><category>cognition</category><category>hkust</category><category>kimi-linear</category><category>kimi-delta-attention</category><category>minimax-m2</category><category>looped-llms</category><category>aardvark-gpt-5</category><category>kimi_moonshot</category><category>scaling01</category><category>uniartisan</category><category>omarsar0</category><category>aicodeking</category><category>songlinyang4</category><category>iscienceluvr</category><category>nrehiew_</category><category>gdb</category><category>embeddedsec</category><category>auchenberg</category><category>simonw</category><category>long-context</category><category>attention-mechanisms</category><category>agentic-ai</category><category>tool-use</category><category>adaptive-compute</category><category>coding-agents</category><category>performance-optimization</category><category>memory-optimization</category><category>reinforcement-learning</category><category>model-architecture</category></item><item><title>Cursor 2.0 &amp; Composer-1: Fast Models and New 
Agents UI</title><link>https://news.smol.ai/issues/25-10-29-cursor-2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-29-cursor-2/</guid><description>**Cursor 2.0** launched with **Composer-1**, an agentic coding model optimized for speed and precision, featuring multi-agent orchestration, built-in browser for testing, and voice-to-code capabilities. **OpenAI** released **gpt-oss-safeguard** models (20B, 120B) for policy-based safety classification, open-weight and fine-tuned from gpt-oss, available on Hugging Face and supported by inference stacks like Ollama and Cerebras. **Goodfire** and **Rakuten** demonstrated sparse autoencoders for PII detection matching **gpt-5-mini** accuracy at significantly lower cost. The Cursor 2.0 update also includes a redesigned interface for managing multiple AI coding agents, marking a major advancement in AI IDEs. *&quot;Fast-not-slowest&quot; tradeoff emphasized by early users for Composer-1, enabling rapid iteration with human-in-the-loop.*</description><pubDate>Wed, 29 Oct 2025 05:44:39 
GMT</pubDate><category>cursor_ai</category><category>openai</category><category>huggingface</category><category>ollama</category><category>cerebras</category><category>groq</category><category>goodfireai</category><category>rakuten</category><category>composer-1</category><category>gpt-oss-safeguard-20b</category><category>gpt-oss-safeguard-120b</category><category>gpt-oss</category><category>gpt-5-mini</category><category>sasha_rush</category><category>dan_shipper</category><category>samkottler</category><category>ellev3n11</category><category>swyx</category><category>agentic-coding</category><category>reinforcement-learning</category><category>mixture-of-experts</category><category>fine-tuning</category><category>policy-classification</category><category>open-weight-models</category><category>inference-stacks</category><category>cost-efficiency</category><category>multi-agent-systems</category><category>ide</category><category>voice-to-code</category><category>code-review</category><category>built-in-browser</category><category>model-optimization</category></item><item><title>OpenAI completes Microsoft + For-profit restructuring + announces 2028 AI Researcher timeline + Platform / AI cloud product direction + next $1T of compute</title><link>https://news.smol.ai/issues/25-10-28-openai-restructure/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-28-openai-restructure/</guid><description>**OpenAI** has completed a major recapitalization and restructuring, forming a Public Benefit Corporation with a non-profit Foundation holding special voting rights and equity valued at **$130B**. **Microsoft** holds about **27%** diluted ownership and committed to **$250B** in Azure spend, losing exclusivity on compute but retaining Azure API exclusivity until AGI is declared. 
The compute infrastructure deals for 2025 total **30GW** worth **$1.4T**, with OpenAI aiming to build **1GW per week** at **$20B per GW**, projecting **$3-4 trillion** infrastructure by 2033. The company is shifting focus from first-party apps to a platform approach, emphasizing ecosystem growth and third-party development. **Sam Altman** (**sama**) is the key figure in this transition, which has significant financial and strategic implications for AI industry partnerships, including openness to **Anthropic** and **Google Gemini** on Azure.</description><pubDate>Tue, 28 Oct 2025 05:44:39 GMT</pubDate><category>openai</category><category>microsoft</category><category>anthropic</category><category>google-deepmind</category><category>sama</category><category>sam_altman</category><category>public-benefit-corporation</category><category>corporate-restructuring</category><category>compute-infrastructure</category><category>cloud-computing</category><category>platform-strategy</category><category>api-exclusivity</category><category>investment</category><category>infrastructure-capex</category></item><item><title>MiniMax M2 230BA10B — 8% of Claude Sonnet&apos;s price, ~2x faster, new SOTA open model</title><link>https://news.smol.ai/issues/25-10-27-minimax-m2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-27-minimax-m2/</guid><description>**MiniMax M2**, an open-weight sparse MoE model by **Hailuo AI**, launches with **≈200–230B parameters** and **10B active parameters**, offering strong performance near frontier closed models and ranking #5 overall on the Artificial Analysis Intelligence Index v3.0. It supports coding and agent tasks, is licensed under **MIT**, and is available via API at competitive pricing. The architecture uses **full attention**, **QK-Norm**, **GQA**, partial RoPE, and sigmoid routing, with day-0 support in **vLLM** and deployment on platforms like Hugging Face and Baseten. 
Despite verbosity and no tech report, it marks a significant win for open models.</description><pubDate>Mon, 27 Oct 2025 05:44:39 GMT</pubDate><category>hailuo-ai</category><category>huggingface</category><category>baseten</category><category>vllm</category><category>modelscope</category><category>openrouter</category><category>cline</category><category>minimax-m2</category><category>reach_vb</category><category>artificialanlys</category><category>akhaliq</category><category>eliebakouch</category><category>grad62304977</category><category>yifan_zhang_</category><category>zpysky1125</category><category>sparse-moe</category><category>model-benchmarking</category><category>model-architecture</category><category>instruction-following</category><category>tool-use</category><category>api-pricing</category><category>model-deployment</category><category>performance-evaluation</category><category>full-attention</category><category>qk-norm</category><category>gqa</category><category>rope</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-10-24-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-24-not-much/</guid><description>**vLLM** announced support for **NVIDIA Nemotron Nano 2**, featuring a hybrid Transformer–Mamba design and tunable &quot;thinking budget&quot; enabling up to 6× faster token generation. **Mistral AI Studio** launched a production platform for agents with deep observability. **Baseten** reported high throughput (650 TPS) for **GPT-OSS 120B** on NVIDIA hardware. **Hugging Face InspectAI** added inference provider integration for cross-provider evaluation. **Thinking Machines Tinker** abstracts distributed fine-tuning for open-weight LLMs like **Qwen3** and **Llama 3**. In China, **MiniMax M2** shows competitive performance with top models and is optimized for agents and coding, while **Zhipu GLM-4.6-Air** focuses on reliability and scaling for coding tasks. 
Rumors suggest **Gemini 2.5 Flash** may be a &gt;500B parameter MoE model, and a possible **GPT-5.1 mini** reference appeared. Outside LLMs, **Tahoe-x1 (3B)** foundation model achieved SOTA in cancer cell biology benchmarks. Research from Stanford introduces a method to detect model provenance via training-order &quot;palimpsest&quot; with strong statistical guarantees.</description><pubDate>Fri, 24 Oct 2025 05:44:39 GMT</pubDate><category>vllm_project</category><category>nvidia</category><category>mistral-ai</category><category>baseten</category><category>huggingface</category><category>thinking-machines</category><category>deeplearningai</category><category>pytorch</category><category>arena</category><category>yupp-ai</category><category>zhipu-ai</category><category>scaling01</category><category>stanford</category><category>nemotron-nano-2</category><category>gpt-oss-120b</category><category>qwen3</category><category>llama-3</category><category>minimax-m2</category><category>glm-4.6-air</category><category>gemini-2.5-flash</category><category>gpt-5.1-mini</category><category>tahoe-x1</category><category>swyx</category><category>dvilasuero</category><category>_lewtun</category><category>clementdelangue</category><category>zephyr_z9</category><category>skylermiao7</category><category>teortaxestex</category><category>nalidoust</category><category>transformer-architecture</category><category>model-optimization</category><category>inference</category><category>distributed-training</category><category>multi-gpu-support</category><category>performance-optimization</category><category>agents</category><category>observability</category><category>model-evaluation</category><category>reinforcement-learning</category><category>model-provenance</category><category>statistical-testing</category><category>foundation-models</category><category>cancer-biology</category><category>model-fine-tuning</category></item><item><title>not much happened 
today</title><link>https://news.smol.ai/issues/25-10-23-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-23-not-much/</guid><description>**LangSmith** launched the **Insights Agent** with multi-turn evaluation for agent ops and observability, improving failure detection and user intent clustering. **Meta PyTorch** and **Hugging Face** introduced **OpenEnv**, a Gymnasium-style API and hub for reproducible agentic environments supporting distributed training. Discussions highlighted the importance of provider fidelity in agent coding, with **OpenRouter**&apos;s exacto filter improving stability. Builder UX updates include **Google AI Studio**&apos;s Annotation mode for Gemini code changes, **Microsoft**&apos;s Copilot Mode enhancements in Edge, and **OpenAI**&apos;s Shared Projects and Company Knowledge features for ChatGPT Business. **Claude** added project-scoped Memory. In reinforcement learning, **Meta**&apos;s ScaleRL proposes a methodology to predict RL scaling outcomes for LLMs with improved efficiency and stability.</description><pubDate>Thu, 23 Oct 2025 05:44:39 
GMT</pubDate><category>langchain</category><category>meta-ai-fair</category><category>hugging-face</category><category>openrouter</category><category>google-ai</category><category>microsoft</category><category>openai</category><category>anthropic</category><category>gemini-1.5-pro</category><category>claude-3</category><category>chatgpt</category><category>hwchase17</category><category>ankush_gola11</category><category>whinthorn</category><category>koylanai</category><category>_lewtun</category><category>bhutanisanyam1</category><category>thom_wolf</category><category>danielhanchen</category><category>cline</category><category>canvrno</category><category>pashmerepat</category><category>mustafasuleyman</category><category>yusuf_i_mehdi</category><category>jordirib1</category><category>fidjissimo</category><category>bradlightcap</category><category>mikeyk</category><category>alexalbert__</category><category>agent-ops</category><category>observability</category><category>multi-turn-evaluation</category><category>reinforcement-learning</category><category>distributed-training</category><category>api</category><category>model-stability</category><category>user-intent-clustering</category><category>software-development</category><category>project-management</category><category>code-generation</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-10-22-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-22-not-much/</guid><description>**LangChain &amp; LangGraph 1.0** released with major updates for reliable, controllable agents and unified docs, emphasizing &quot;Agent Engineering.&quot; **Meta** introduced **PyTorch Monarch** and **TorchForge** for distributed programming and reinforcement learning, enabling large-scale agentic systems. **Microsoft Learn MCP** server now integrates with tools like **Claude Code** and **VS Code** for instant doc querying, accelerating grounded agent workflows. 
**vLLM** improved inference correctness with token ID returns and batch-invariant inference, collaborating with **Ray** for orchestration in PyTorch Foundation. **OpenAI** launched **ChatGPT Atlas**, a browser agent with contextual Q&amp;A and advanced safety features, though early users note maturity challenges and caution around credential access.</description><pubDate>Wed, 22 Oct 2025 05:44:39 GMT</pubDate><category>langchain</category><category>meta</category><category>microsoft</category><category>openai</category><category>pytorch</category><category>ray</category><category>claude</category><category>vllm</category><category>chatgpt-atlas</category><category>hwchase17</category><category>soumithchintala</category><category>masondrxy</category><category>robertnishihara</category><category>cryps1s</category><category>yuchenj_uw</category><category>agent-frameworks</category><category>reinforcement-learning</category><category>distributed-computing</category><category>inference-correctness</category><category>serving-infrastructure</category><category>browser-agents</category><category>security</category><category>middleware</category><category>runtime-systems</category><category>documentation</category></item><item><title>ChatGPT Atlas: OpenAI&apos;s AI Browser</title><link>https://news.smol.ai/issues/25-10-21-chatgpt-atlas/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-21-chatgpt-atlas/</guid><description>**OpenAI** launched the **Chromium fork AI browser Atlas** for macOS, featuring integrated **Agent mode** and browser memory with local login capabilities, aiming to surpass **Google&apos;s Gemini** in Chrome. The launch received mixed reactions regarding reliability and privacy. **LangChain** raised a **$125M Series B** at a $1.25B valuation, releasing **v1.0 agent engineering stack** with significant adoption including **85M+ OSS downloads/month** and usage by ~35% of the Fortune 500. 
The ecosystem also saw updates like **vLLM&apos;s MoE LoRA expert finetuning support**.</description><pubDate>Tue, 21 Oct 2025 05:44:39 GMT</pubDate><category>openai</category><category>google</category><category>langchain</category><category>ivp</category><category>capitalg</category><category>sapphire</category><category>sequoia</category><category>benchmark</category><category>gemini</category><category>atlas</category><category>kevinweil</category><category>bengoodger</category><category>fidjissimo</category><category>omarsar0</category><category>yuchenj_uw</category><category>nickaturley</category><category>raizamrtn</category><category>hwchase17</category><category>bromann</category><category>casper_hansen_</category><category>corbtt</category><category>agent-mode</category><category>browser-memory</category><category>chromium</category><category>finetuning</category><category>moe</category><category>lora</category><category>agent-runtime</category><category>observability</category><category>software-development</category><category>funding</category></item><item><title>DeepSeek-OCR finds vision models can decode 10x more efficiently with ~97% accuracy of text-only, 33/200k pages/day/A100</title><link>https://news.smol.ai/issues/25-10-20-deepseek-ocr/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-20-deepseek-ocr/</guid><description>As **ICCV 2025** begins, **DeepSeek** releases a novel **DeepSeek-OCR** 3B MoE vision-language model that compresses long text as visual context with high accuracy and efficiency, challenging traditional tokenization approaches. The model achieves ~97% decoding precision at &lt;10× compression and processes up to ~33M pages/day on 20 A100-40G nodes, outperforming benchmarks like GOT-OCR2.0. Discussions highlight the potential for unlimited context windows and tokenization-free inputs, with contributions from **@karpathy**, **@teortaxesTex**, and others. 
In video generation, **google-deepmind**&apos;s **Veo 3.1** leads community benchmarks with advanced precision editing and scene blending, while **Krea** open-sources a 14B autoregressive video model enabling realtime long-form generation at ~11 FPS on a single B200 GPU.</description><pubDate>Mon, 20 Oct 2025 05:44:39 GMT</pubDate><category>deepseek-ai</category><category>google-deepmind</category><category>krea</category><category>deepseek-ocr</category><category>deepseek3b-moe-a570m</category><category>veo-3.1</category><category>karpathy</category><category>teortaxestex</category><category>reach_vb</category><category>_akhaliq</category><category>eliebakouch</category><category>vikhyatk</category><category>demishassabis</category><category>ocr</category><category>vision</category><category>multimodality</category><category>model-compression</category><category>long-context</category><category>model-architecture</category><category>video-generation</category><category>autoregressive-models</category><category>model-efficiency</category><category>precision-editing</category></item><item><title>The Karpathy-Dwarkesh Interview delays AGI timelines</title><link>https://news.smol.ai/issues/25-10-17-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-17-not-much/</guid><description>The recent AI news highlights the **Karpathy interview** as a major event, alongside significant discussions on reasoning improvements without reinforcement learning, with **test-time sampling** achieving GRPO-level performance. Critiques on context window marketing reveal effective limits near **64K tokens**, with **Claude Haiku 4.5** showing competitive reasoning speed. **GPT-5** struggles with advanced math benchmarks, and data quality issues termed &quot;Brain Rot&quot; affect model reasoning and safety. 
In agent frameworks, **Anthropic Skills** enable modular coding workflows, **OpenAI Codex IDE** extensions enhance developer productivity, and **HuggingChat Omni** introduces meta-routing across 100+ open models using **Arch-Router-1.5B**. LangChain and LlamaIndex advance graph-first agent infrastructure, while **Google Gemini** integrates with Google Maps for real-world grounding.</description><pubDate>Fri, 17 Oct 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>huggingface</category><category>langchain</category><category>llamaindex</category><category>google</category><category>epoch-ai</category><category>claude-haiku-4.5</category><category>gpt-5</category><category>arch-router-1.5b</category><category>karpathy</category><category>aakaran31</category><category>du_yilun</category><category>giffmana</category><category>omarsar0</category><category>jeremyphoward</category><category>claude_code</category><category>mikeyk</category><category>alexalbert__</category><category>clementdelangue</category><category>jerryjliu0</category><category>reasoning</category><category>long-context</category><category>sampling</category><category>benchmarking</category><category>data-quality</category><category>agent-frameworks</category><category>modular-workflows</category><category>ide-extensions</category><category>model-routing</category><category>graph-first-agents</category><category>real-world-grounding</category></item><item><title>Claude Agent Skills - glorified AGENTS.md? or MCP killer?</title><link>https://news.smol.ai/issues/25-10-16-claude-skills/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-16-claude-skills/</guid><description>**Anthropic** achieves a rare feat with back-to-back AI news headlines featuring **Claude&apos;s** new **Skills**—a novel way to build specialized agents using Markdown files, scripts, and metadata to handle tasks like creating and reading PDFs, Docs, and PPTs. 
Simon Willison calls this a &quot;bigger deal than MCP,&quot; predicting a &quot;Cambrian explosion in Skills.&quot; Meanwhile, **Anthropic** launches **Claude 4.5 Haiku** with strong reasoning and long-context capabilities, priced competitively. Other updates include **OpenAI&apos;s** ChatGPT memory management improvements, **Windows 11 Copilot** voice and vision features, and **HuggingChat Omni** routing across 115 open-source models from 15 providers. These developments highlight advances in agent skills, document processing, long-context reasoning, and multi-model routing.</description><pubDate>Thu, 16 Oct 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>microsoft</category><category>perplexity-ai</category><category>huggingface</category><category>groq</category><category>cerebras</category><category>togethercompute</category><category>claude-4.5-haiku</category><category>claude</category><category>chatgpt</category><category>huggingchat-omni</category><category>simonwillison</category><category>alexalbert__</category><category>mustafasuleyman</category><category>yusuf_i_mehdi</category><category>aravsrinivas</category><category>agent-skills</category><category>document-processing</category><category>long-context</category><category>reasoning</category><category>multi-model-routing</category><category>memory-management</category><category>voice</category><category>vision</category></item><item><title>Claude Haiku 4.5</title><link>https://news.smol.ai/issues/25-10-15-haiku-45/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-15-haiku-45/</guid><description>**Anthropic** released **Claude Haiku 4.5**, a model that is over 2x faster and 3x cheaper than **Claude Sonnet 4.5**, improving iteration speed and user experience significantly. Pricing comparisons highlight Haiku 4.5&apos;s competitive cost against models like **GPT-5** and **GLM-4.6**. 
**Google** and **Yale** introduced the open-weight **Cell2Sentence-Scale 27B (Gemma)** model, which generated a novel, experimentally validated cancer hypothesis, with open-sourced weights for community use. Early evaluations show **GPT-5** and **o3** models outperform **GPT-4.1** in agentic reasoning tasks, balancing cost and performance. Agent evaluation challenges and memory-based learning advances were also discussed, with contributions from Shanghai AI Lab and others. *&quot;Haiku 4.5 materially improves iteration speed and UX,&quot;* and *&quot;Cell2Sentence-Scale yielded validated cancer hypothesis&quot;* were key highlights.</description><pubDate>Wed, 15 Oct 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>google</category><category>yale</category><category>artificial-analysis</category><category>shanghai-ai-lab</category><category>claude-3.5-sonnet</category><category>claude-3-haiku</category><category>claude-3-haiku-4.5</category><category>gpt-5</category><category>gpt-4.1</category><category>gemma-2.5</category><category>gemma</category><category>o3</category><category>swyx</category><category>sundarpichai</category><category>osanseviero</category><category>clementdelangue</category><category>deredleritt3r</category><category>azizishekoofeh</category><category>vikhyatk</category><category>mirrokni</category><category>pdrmnvd</category><category>akhaliq</category><category>sayashk</category><category>gne</category><category>model-performance</category><category>fine-tuning</category><category>reasoning</category><category>agent-evaluation</category><category>memory-optimization</category><category>model-efficiency</category><category>open-models</category><category>cost-efficiency</category><category>foundation-models</category><category>agentic-workflows</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-10-14-not-much/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-10-14-not-much/</guid><description>**Alibaba** released compact dense **Qwen3-VL** models at 4B and 8B sizes with FP8 options, supporting up to 1M context and open vocabulary detection, rivaling larger models like **Qwen2.5-VL-72B**. Ecosystem support includes **MLX-VLM**, **LM Studio**, **vLLM**, **Kaggle models**, and **Ollama Cloud**. In video AI, **Arena** added **Sora 2** models leading in video benchmarks, with **Higgsfield Enhancer** improving video quality. **Runway** launched domain-specific workflow apps for creative tasks. Research on **Representation Autoencoders for DiTs (RAE-DiT)** shows improved diffusion model performance. On local training, **NVIDIA DGX Spark** enables strong local fine-tuning, while **Nanochat** by **Karpathy** offers a minimal stack for training and inference. **Together AI** introduced **ATLAS**, a speculative decoding method achieving up to 4× faster inference on **DeepSeek-V3.1**. These developments highlight advances in efficient model deployment, video AI, local fine-tuning, and inference speed optimization.</description><pubDate>Tue, 14 Oct 2025 05:44:39 GMT</pubDate><category>alibaba</category><category>arena</category><category>runway</category><category>nvidia</category><category>togethercompute</category><category>ollama</category><category>qwen3-vl-4b</category><category>qwen3-vl-8b</category><category>qwen2.5-vl-72b</category><category>deepseek-v3.1</category><category>karpathy</category><category>model-optimization</category><category>fine-tuning</category><category>inference-speed</category><category>video-generation</category><category>diffusion-models</category><category>representation-learning</category><category>local-ai</category><category>speculative-decoding</category><category>fp8-quantization</category><category>context-windows</category></item><item><title>OpenAI Titan XPU: 10GW of self-designed chips with 
Broadcom</title><link>https://news.smol.ai/issues/25-10-13-oai-broadcom/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-13-oai-broadcom/</guid><description>**OpenAI** is finalizing a custom ASIC chip design to deploy **10GW** of inference compute, complementing existing deals with **NVIDIA** (10GW) and **AMD** (6GW). This marks a significant scale-up from OpenAI&apos;s current **2GW** compute, aiming for a roadmap of **250GW** total, which is half the energy consumption of the US. Greg from OpenAI highlights the shift of **ChatGPT** from interactive use to always-on ambient agents requiring massive compute, emphasizing the challenge of building chips for billions of users. The in-house ASIC effort was driven by the need for tailored designs after limited success influencing external chip startups. Broadcom&apos;s stock surged 10% on the news. Additionally, **InferenceMAX** reports improved ROCm stability and nuanced performance comparisons between AMD MI300X and NVIDIA H100/H200 on **llama-3-70b** FP8 workloads, with RL training infrastructure updates noted.</description><pubDate>Mon, 13 Oct 2025 05:44:39 GMT</pubDate><category>openai</category><category>nvidia</category><category>amd</category><category>broadcom</category><category>inferencemax</category><category>llama-3-70b</category><category>gdb</category><category>asic</category><category>inference</category><category>compute-infrastructure</category><category>chip-design</category><category>fp8</category><category>reinforcement-learning</category><category>ambient-agents</category><category>custom-accelerators</category><category>energy-consumption</category><category>podcast</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-10-10-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-10-not-much/</guid><description>**FrontierMath Tier 4** results show **GPT-5 Pro** narrowly outperforming **Gemini 2.5 Deep Think** in 
reasoning accuracy, with concerns about problem leakage clarified by **Epoch AI Research**. **Mila** and **Microsoft** propose **Markovian Thinking** to improve reasoning efficiency, enabling models to reason over 24K tokens with less compute. New research suggests base models inherently contain reasoning mechanisms, with &quot;thinking models&quot; learning to invoke them effectively. In systems, **NVIDIA Blackwell** combined with **vLLM** wins InferenceMAX with significant throughput gains, while **Together AI&apos;s ATLAS** adaptive speculative decoding achieves 4× speed improvements and reduces RL training time by over 60%. **SparseServe** introduces dynamic sparse attention with KV tiering, drastically improving throughput and latency in GPU memory management.</description><pubDate>Fri, 10 Oct 2025 05:44:39 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>microsoft</category><category>epoch-ai-research</category><category>togethercompute</category><category>nvidia</category><category>mila</category><category>gpt-5-pro</category><category>gemini-2.5</category><category>vllm</category><category>deepseek-v3.1</category><category>epochairesearch</category><category>yitayml</category><category>_philschmid</category><category>jiqizhixin</category><category>cvenhoff00</category><category>neelnanda5</category><category>lateinteraction</category><category>mgoin_</category><category>blackhc</category><category>teortaxestex</category><category>reasoning</category><category>reinforcement-learning</category><category>inference</category><category>speculative-decoding</category><category>sparse-attention</category><category>kv-cache-management</category><category>throughput-optimization</category><category>compute-efficiency</category><category>tokenization</category></item><item><title>Air Street&apos;s State of AI 2025 Report</title><link>https://news.smol.ai/issues/25-10-09-state-of-ai/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-10-09-state-of-ai/</guid><description>**Reflection** raised **$2B** to build frontier open-weight models with a focus on safety and evaluation, led by a team with backgrounds from **AlphaGo**, **PaLM**, and **Gemini**. **Figure** launched its next-gen humanoid robot, **Figure 03**, emphasizing non-teleoperated capabilities for home and large-scale use. **Radical Numerics** released **RND1**, a **30B-parameter sparse MoE diffusion language model** with open weights and code to advance diffusion LM research. **Zhipu** posted strong results with **GLM-4.6** on the Design Arena benchmark, while **AI21 Labs**&apos; **Jamba Reasoning 3B** leads tiny reasoning models. **Anthropic** introduced a plugin system for **Claude Code** to enhance developer tools and agent stacks. The report also highlights SoftBank&apos;s acquisition of ABB&apos;s robotics unit for **$5.4B** and the growing ecosystem around open frontier modeling and small-model reasoning.</description><pubDate>Thu, 09 Oct 2025 05:44:39 
GMT</pubDate><category>reflection</category><category>mastra</category><category>datacurve</category><category>spellbook</category><category>kernel</category><category>figure</category><category>softbank</category><category>abb</category><category>radicalnumerics</category><category>zhipu-ai</category><category>ai21-labs</category><category>anthropic</category><category>glm-4.6</category><category>jamba-1.5</category><category>rnd1</category><category>claude-code</category><category>adcock_brett</category><category>achowdhery</category><category>clementdelangue</category><category>humanoid-robots</category><category>mixture-of-experts</category><category>diffusion-models</category><category>open-weight-models</category><category>reinforcement-learning</category><category>benchmarking</category><category>small-language-models</category><category>plugin-systems</category><category>developer-tools</category><category>agent-stacks</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-10-08-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-08-not-much/</guid><description>**Samsung&apos;s 7M Tiny Recursive Model (TRM)** achieves superior reasoning on ARC-AGI and Sudoku with fewer layers and an MLP in place of self-attention. **LeCun&apos;s team** introduces **JEPA-SCORE**, enabling density estimation from encoders without retraining. **AI21 Labs** releases **Jamba Reasoning 3B**, a fast hybrid SSM-Transformer model supporting up to 64K context tokens. **Alibaba&apos;s Qwen3 Omni/Omni Realtime** offers a unified audio-video-text model with extensive language and speech support, outperforming Gemini 2.0 Flash on BigBench Audio. **Alibaba** also debuts **Qwen Image Edit 2509**, a top open-weight multi-image editing model. **ColBERT Nano** models demonstrate effective retrieval at micro-scale parameter sizes. 
In reinforcement learning, **CoreWeave**, **Weights &amp; Biases**, and **OpenPipe** launch serverless RL infrastructure reducing costs and speeding training. **Stanford&apos;s AgentFlow** presents an in-the-flow RL system with a 7B backbone outperforming larger models on agentic tasks. This update highlights advances in **recursive reasoning**, **density estimation**, **multimodal architectures**, **long-context modeling**, **retrieval**, and **serverless reinforcement learning**.</description><pubDate>Wed, 08 Oct 2025 05:44:39 GMT</pubDate><category>samsung</category><category>lecuun</category><category>ai21-labs</category><category>alibaba</category><category>coreweave</category><category>weights-biases</category><category>openpipe</category><category>stanford</category><category>7m-tiny-recursive-model</category><category>jamba-reasoning-3b</category><category>qwen3-omni</category><category>qwen-image-edit-2509</category><category>colbert-nano</category><category>agentflow</category><category>rasbt</category><category>jm_alexia</category><category>jiqizhixin</category><category>randall_balestr</category><category>corbtt</category><category>shawnup</category><category>_akhaliq</category><category>recursive-reasoning</category><category>density-estimation</category><category>multimodality</category><category>long-context</category><category>retrieval</category><category>serverless-reinforcement-learning</category><category>agentic-systems</category><category>model-efficiency</category><category>reinforcement-learning</category><category>transformers</category></item><item><title>Gemini 2.5 Computer Use preview beats Sonnet 4.5 and OAI CUA</title><link>https://news.smol.ai/issues/25-10-07-gemini-cua/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-07-gemini-cua/</guid><description>**Google DeepMind** released a new **Gemini 2.5 Computer Use model** for browser and Android UI control, evaluated by Browserbase. 
**OpenAI** showcased **GPT-5 Pro**, new developer tools including **Codex** with Slack integration, and agent-building SDKs at Dev Day. **Google DeepMind&apos;s CodeMender** automates security patching for large codebases. **Microsoft** introduced an open-source **Agent Framework** for multi-agent enterprise systems. AI community discussions highlight agent orchestration, program synthesis, and UI control advancements. **GLM-4.6** update from Zhipu features a large Mixture-of-Experts model with 355B parameters.</description><pubDate>Tue, 07 Oct 2025 05:44:39 GMT</pubDate><category>google-deepmind</category><category>openai</category><category>microsoft</category><category>anthropic</category><category>zhipu-ai</category><category>llamaindex</category><category>mongodb</category><category>gemini-2.5</category><category>gpt-5-pro</category><category>glm-4.6</category><category>codex</category><category>swyx</category><category>demishassabis</category><category>philschmid</category><category>assaf_elovic</category><category>hwchase17</category><category>jerryjliu0</category><category>skirano</category><category>fabianstelzer</category><category>blackhc</category><category>andrewyng</category><category>agent-frameworks</category><category>program-synthesis</category><category>security</category><category>multi-agent-systems</category><category>computer-use-models</category><category>open-source</category><category>moe</category><category>developer-tools</category><category>workflow-automation</category><category>api</category><category>vision</category><category>reasoning</category></item><item><title>OpenAI Dev Day: Apps SDK, AgentKit, Codex GA, GPT‑5 Pro and Sora 2 APIs</title><link>https://news.smol.ai/issues/25-10-06-devday/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-06-devday/</guid><description>**OpenAI** showcased major product launches at their DevDay including the **Apps SDK**, **AgentKit**, and **Codex** now generally available with SDK 
and enterprise features. They introduced new models such as **gpt-5-pro**, **gpt-realtime-mini-2025-10-06**, **gpt-audio-mini-2025-10-06**, **gpt-image-1-mini**, and **sora-2** with a pro variant. The Apps SDK enables embedding interactive apps inside ChatGPT with partners like **Canva**, **Figma**, **Zillow**, and **Coursera**. AgentKit offers a full stack for building and deploying production agents with tools like ChatKit and Guardrails. Codex supports speech and controller-driven coding, credited with high internal shipping velocity. Pricing for GPT-5 Pro was revealed at $15 input and $120 output per million tokens. *&quot;OpenAI turned ChatGPT into an application platform&quot;* and *&quot;AgentKit built a working agent in under 8 minutes&quot;* were highlights.</description><pubDate>Mon, 06 Oct 2025 05:44:39 GMT</pubDate><category>openai</category><category>canva</category><category>figma</category><category>zillow</category><category>coursera</category><category>gpt-5-pro</category><category>gpt-realtime-mini-2025-10-06</category><category>gpt-audio-mini-2025-10-06</category><category>gpt-image-1-mini</category><category>sora-2</category><category>sora-2-pro</category><category>sama</category><category>edwinarbus</category><category>gdb</category><category>dbreunig</category><category>stevenheidel</category><category>api</category><category>model-release</category><category>fine-tuning</category><category>agentic-ai</category><category>code-generation</category><category>model-deployment</category><category>pricing</category><category>prompt-optimization</category><category>software-development</category><category>multimodality</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-10-03-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-03-not-much/</guid><description>**Anthropic** announces a new CTO. 
Frontier coding agents see updates with **Claude Sonnet 4.5** showing strong cybersecurity and polished UX but trailing **GPT-5 Codex** in coding capability. **xAI Grok Code Fast** claims higher edit success at lower cost. **Google&apos;s Jules** coding agent launches a programmable API with CI/CD integration. **Qwen** clarifies its model taxonomy and API tiers. Vision/LM Arena rankings show a tight competition among **Claude Sonnet 4.5**, **Claude Opus 4.1**, **Gemini 2.5 Pro**, and OpenAI&apos;s latest models. In video generation, **Sora 2 Pro** leads App Store rankings with rapid iteration and a new creator ecosystem; early tests show it answers GPQA-style questions at 55% accuracy versus GPT-5&apos;s 72%. Video Arena adds new models like **Luma&apos;s Ray 3** and **Kling 2.5** for benchmarking. Multi-modal video+audio generation model **Ovi** (Veo-3-like) is released. Retrieval models include **ModernVBERT** from MIT with efficient image-text retrieval capabilities. *&quot;Claude Sonnet 4.5 is basically the same as Opus 4.1 for coding&quot;* and *&quot;Jules is a programmable team member&quot;* highlight key insights.</description><pubDate>Fri, 03 Oct 2025 05:44:39 
GMT</pubDate><category>anthropic</category><category>x-ai</category><category>google</category><category>google-labs</category><category>openai</category><category>arena</category><category>epoch-ai</category><category>mit</category><category>luma</category><category>akhaliq</category><category>claude-3-sonnet</category><category>claude-3-opus</category><category>gpt-5-codex</category><category>grok-4-fast</category><category>qwen-3-next</category><category>gemini-2.5-pro</category><category>sora-2-pro</category><category>ray-3</category><category>kling-2.5</category><category>veo-3</category><category>modernvbert</category><category>finbarrtimbers</category><category>gauravisnotme</category><category>justinlin610</category><category>billpeeb</category><category>apples_jimmy</category><category>akhaliq</category><category>coding-agents</category><category>cybersecurity</category><category>api</category><category>model-taxonomy</category><category>model-ranking</category><category>video-generation</category><category>benchmarking</category><category>multi-modal-generation</category><category>retrieval</category><category>image-text-retrieval</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-10-02-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-10-02-not-much/</guid><description>**Kling 2.5 Turbo** leads in text-to-video and image-to-video generation with competitive pricing. **OpenAI Sora 2** shows strong instruction-following but has physics inconsistencies. **Google Gemini 2.5 Flash** &quot;Nano Banana&quot; image generation is now generally available with multi-image blending and flexible aspect ratios. **IBM Granite 4.0** introduces a hybrid Mamba/Transformer architecture with large context windows and strong token efficiency, outperforming some peers on the Intelligence Index. **Qwen** models receive updates including fine-tuning API support and improved vision capabilities. 
**Tinker** offers a flexible fine-tuning API supporting LoRA sharing and CPU-only training loops. The ecosystem also sees updates like **Synthesia 3.0** adding video agents.</description><pubDate>Thu, 02 Oct 2025 05:44:39 GMT</pubDate><category>openai</category><category>google</category><category>ibm</category><category>alibaba</category><category>kling_ai</category><category>synthesia</category><category>ollama</category><category>huggingface</category><category>arena</category><category>artificialanalysis</category><category>tinker</category><category>scaling01</category><category>kling-2.5-turbo</category><category>sora-2</category><category>gemini-2.5-flash</category><category>granite-4.0</category><category>qwen-3</category><category>qwen-image-2509</category><category>qwen3-vl-235b</category><category>artificialanlys</category><category>kling_ai</category><category>altryne</category><category>teortaxestex</category><category>fofrai</category><category>tim_dettmers</category><category>sundarpichai</category><category>officiallogank</category><category>andrew_n_carr</category><category>googleaidevs</category><category>clementdelangue</category><category>wzhao_nlp</category><category>alibaba_qwen</category><category>scaling01</category><category>ollama</category><category>video-generation</category><category>instruction-following</category><category>physics-simulation</category><category>image-generation</category><category>model-architecture</category><category>mixture-of-experts</category><category>context-windows</category><category>token-efficiency</category><category>fine-tuning</category><category>lora</category><category>cpu-training</category><category>model-benchmarking</category><category>api</category><category>workflow-automation</category></item><item><title>Thinking Machines&apos; Tinker: LoRA based LLM fine-tuning API</title><link>https://news.smol.ai/issues/25-10-01-thinky/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-10-01-thinky/</guid><description>**Thinking Machines**, which raised **$2 billion** before shipping any product, has launched its first product, **Tinker**, a managed API service for fine-tuning large models, including mixture-of-experts models like **Qwen-235B-A22B**, using **LoRA** for cost-efficient training. The Tinker API offers low-level primitives for post-training methods and is supported by an open-source **Tinker Cookbook** library. Influential AI figures like **Andrej Karpathy** and **Lilian Weng** praised its design for reducing complexity and boosting research productivity. Meanwhile, **OpenAI** launched **Sora 2**, a video+audio model integrated into its consumer social app, sparking viral engagement and concerns over misuse and content moderation. Sam Altman emphasized the product&apos;s dual focus on delight and revenue alongside AGI research.</description><pubDate>Wed, 01 Oct 2025 05:44:39 GMT</pubDate><category>thinking-machines</category><category>openai</category><category>qwen-235b-a22b</category><category>sora-2</category><category>karpathy</category><category>lilianweng</category><category>sama</category><category>fine-tuning</category><category>lora</category><category>model-training</category><category>api</category><category>model-optimization</category><category>distributed-training</category><category>post-training-methods</category><category>research-productivity</category><category>video-generation</category><category>content-moderation</category><category>engagement-patterns</category></item><item><title>Sora 2: new video+audio model and OpenAI&apos;s first Social Network</title><link>https://news.smol.ai/issues/25-09-30-sora2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-30-sora2/</guid><description>**Sora 2** was released with improvements in physical-world video modeling and a new &quot;character consistency&quot; feature allowing real-world element injection from a 
single video. The model powers a new **Sora social network** app with profiles, DMs, and viral videos, emphasizing user control over likeness use. **OpenAI** employees are actively experimenting with the model. Meanwhile, **Anthropic** launched **Claude 4.5 Sonnet** with enhanced intelligence, token efficiency, and agentic tool use, outperforming some competitors and closely tracking **GPT-5-high** on benchmarks. Ecosystem support includes LangSmith integration and strong coding/math benchmark results.</description><pubDate>Tue, 30 Sep 2025 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>sora-2</category><category>claude-4.5-sonnet</category><category>gpt-5-high</category><category>sama</category><category>video-generation</category><category>character-consistency</category><category>social-networks</category><category>agentic-ai</category><category>token-efficiency</category><category>benchmarking</category><category>model-performance</category><category>context-management</category><category>coding</category><category>math</category></item><item><title>Anthropic Claude Sonnet 4.5, Claude Code 2.0, new VS Code Extensions</title><link>https://news.smol.ai/issues/25-09-29-sonnet-45/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-29-sonnet-45/</guid><description>**Anthropic** launched a major update with **Claude Sonnet 4.5**, achieving **77.2% SWE-Bench** verified performance and improvements in finance, law, and STEM. They also released **Claude Code v2** featuring checkpoints, a refreshed terminal, and a native VS Code extension, plus a new mascot **Clawd**. The **Claude API** gained context editing and memory tools, and the **Claude Agent SDK** was introduced. The **Claude.ai** apps now support code execution and file creation, with a **Chrome extension** available for Max users. Additionally, **Imagine with Claude** offers a generative UI research preview. 
Reception has been positive from developers and third-party evaluators. Meanwhile, **DeepSeek** released **V3.2-Exp** with a new **Sparse Attention** algorithm, significantly reducing long-context costs and cutting API prices by over 50%, while maintaining quality.</description><pubDate>Mon, 29 Sep 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>deepseek</category><category>openai</category><category>stripe</category><category>claude-sonnet-4.5</category><category>claude-code-v2</category><category>deepseek-v3.2-exp</category><category>john_schulman</category><category>mike_krieger</category><category>swe-bench</category><category>finance</category><category>law</category><category>stem</category><category>code-execution</category><category>context-editing</category><category>memory-management</category><category>api</category><category>chrome-extension</category><category>generative-ui</category><category>sparse-attention</category><category>long-context</category><category>cost-efficiency</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-09-26-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-26-not-much/</guid><description>**Google** released a dense September update including **Gemini Robotics 1.5** with enhanced spatial/temporal reasoning, **Gemini Live**, **EmbeddingGemma**, and **Veo 3 GA** powering creative workflows. They also introduced agentic features like restaurant-reservation agents and reduced pricing for **Gemini 2.5 Flash**. **Meta AI** unveiled the open-weight **Code World Model (CWM) 32B**, excelling in code semantics and math benchmarks, with innovations in training code models via execution traces. Local-first coding setups highlight **Qwen3-Coder-30B** running efficiently on consumer GPUs, paired with tools like **Cline** and **LM Studio**. 
Runtime improvements include **vLLM v1** supporting hybrid models and **mlx-lm** adding batch inference on Apple silicon. In infrastructure, **FlashAttention 4** was reverse-engineered revealing a ~20% speedup from architectural optimizations. **Perplexity AI** advances its independent web index and browsing API with upcoming feed refreshes. Embedding latency improvements were achieved by **Superhuman** using **Baseten**.</description><pubDate>Fri, 26 Sep 2025 05:44:39 GMT</pubDate><category>google</category><category>meta-ai-fair</category><category>perplexity-ai</category><category>baseten</category><category>gemini-robotics-1.5</category><category>gemini-live</category><category>embeddinggemma</category><category>veo-3</category><category>gemini-2.5-flash</category><category>code-world-model-32b</category><category>qwen3-coder-30b</category><category>vllm-v1</category><category>mlx-lm</category><category>flashattention-4</category><category>osanseviero</category><category>_anniexie</category><category>rmstein</category><category>scaling01</category><category>giffmana</category><category>cline</category><category>redhat_ai</category><category>awnihannun</category><category>charles_irl</category><category>bernhardsson</category><category>akshat_b</category><category>aravsrinivas</category><category>spatial-reasoning</category><category>temporal-reasoning</category><category>agentic-ai</category><category>code-semantics</category><category>code-execution-traces</category><category>coding-infrastructure</category><category>runtime-optimization</category><category>batch-inference</category><category>embedding-latency</category><category>api</category><category>model-optimization</category><category>model-performance</category></item><item><title>GDPVal finding: Claude Opus 4.1 within 95% of AGI (human experts in top 44 white collar jobs)</title><link>https://news.smol.ai/issues/25-09-25-gdpval/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-09-25-gdpval/</guid><description>**OpenAI**&apos;s Evals team released **GDPval**, a comprehensive evaluation benchmark covering 1,320 tasks across 44 predominantly digital occupations, assessing AI models against human experts with an average of 14 years of experience. Early results show **Claude 4.1 Opus** outperforming human experts in most categories and **GPT-5 high** trailing behind, with projections that **GPTnext** could match human performance by mid-2026. The benchmark is positioned as a key metric for policymakers and labor impact forecasting. Additionally, **Artificial Analysis** reported improvements in **Gemini 2.5 Flash/Flash-Lite** and **DeepSeek V3.1 Terminus** models, alongside new speech-to-text benchmarks (AA-WER) highlighting leaders like **Google Chirp 2** and **NVIDIA Canary Qwen2.5B**. Agentic AI advances include **Kimi OK Computer**, an OS-like agent with extended tool capabilities and new vendor verification tools.</description><pubDate>Thu, 25 Sep 2025 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>google</category><category>nvidia</category><category>artificial-analysis</category><category>deepseek</category><category>claude-4.1-opus</category><category>gpt-5-high</category><category>gptnext</category><category>gemini-2.5-flash</category><category>gemini-2.5-flash-lite</category><category>deepseek-v3.1-terminus</category><category>google-chirp-2</category><category>qwen-2.5b</category><category>kevinweil</category><category>gdb</category><category>dejavucoder</category><category>yuchenj_uw</category><category>lhsummers</category><category>benchmarking</category><category>agentic-ai</category><category>tool-use</category><category>long-context</category><category>speech-to-text</category><category>model-evaluation</category><category>reasoning</category><category>pricing</category><category>model-performance</category></item><item><title>not much happened 
today</title><link>https://news.smol.ai/issues/25-09-24-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-24-not-much/</guid><description>**Alibaba** unveiled the **Qwen3** model family including **Qwen3-Max** and **Qwen3-VL** with a native 256K context window expandable to 1M, strong OCR in 32 languages, and rapid release velocity (~3.5 releases/month) backed by a $52B infrastructure roadmap. **OpenAI** launched **GPT-5 Codex**, an agent-optimized coding model with up to **400K context** and adaptive reasoning priced at $1.25/$10 per million tokens, integrated into Cline and benchmarked in WebDev arenas. **Meta AI FAIR** released the open-weight **Code World Model (CWM) 32B**, a dense code generation model with strong benchmark scores (e.g., 65.8% SWE-bench Verified, 96.6% Math-500) and public safety reports. Ecosystem updates include GitHub Copilot&apos;s new embedding model for faster code search and Anthropic&apos;s Claude Sonnet 4 and Opus 4.1 integration into Microsoft 365 Copilot. 
The vLLM 0.10.2 update introduces Decode Context Parallel (DCP) for improved system performance.</description><pubDate>Wed, 24 Sep 2025 05:44:39 GMT</pubDate><category>alibaba</category><category>openai</category><category>meta-ai-fair</category><category>huggingface</category><category>anthropic</category><category>microsoft</category><category>github</category><category>qwen3-max</category><category>qwen3-vl</category><category>qwen3-coder-plus</category><category>gpt-5-codex</category><category>code-world-model-32b</category><category>claude-sonnet-4</category><category>claude-opus-4.1</category><category>huybery</category><category>akhaliq</category><category>lmarena_ai</category><category>gdb</category><category>ylecun</category><category>pierceboggan</category><category>julesagent</category><category>context-windows</category><category>code-generation</category><category>model-releases</category><category>model-benchmarking</category><category>api</category><category>model-optimization</category><category>multimodality</category><category>software-engineering</category><category>model-training</category></item><item><title>Alibaba Yunqi: 7 models released in 4 days (Qwen3-Max, Qwen3-Omni, Qwen3-VL) and $52B roadmap</title><link>https://news.smol.ai/issues/25-09-23-alibaba-yunqi/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-23-alibaba-yunqi/</guid><description>**Alibaba&apos;s Tongyi Qianwen (Qwen) team** launched major updates including the **1T parameter Qwen3-Max**, **Qwen3-Omni**, and **Qwen3-VL** models, alongside specialized versions like **Qwen3Guard**, **Qwen3-LiveTranslate**, **Qwen3-TTS-Flash**, **Qwen-Image-Edit**, and **Qwen3Coder**. 
At the **AliCloud Yunqi (Apsara) conference**, CEO **Eddie Wu** outlined a $52B roadmap emphasizing two AI development stages: &quot;intelligence emergence&quot; focusing on learning from humans and reasoning, and &quot;autonomous action&quot; highlighting AI&apos;s tool use and real-world task execution. The updates showcase advances in **tool use**, **large-model coding capabilities**, and AI&apos;s expanding role across industries such as logistics, manufacturing, biomedicine, and finance. Junyang Lin and Alibaba Wan are key spokespersons for these developments. The Qwen project is now seen as a &quot;frontier lab&quot; for AI innovation.</description><pubDate>Tue, 23 Sep 2025 05:44:39 GMT</pubDate><category>alibaba</category><category>alicloud</category><category>qwen3-max</category><category>qwen3-omni</category><category>qwen3-vl</category><category>qwen3guard</category><category>qwen3-livetranslate</category><category>qwen3-tts-flash</category><category>qwen-image-edit</category><category>qwen3coder</category><category>qwen</category><category>junyang_lin</category><category>eddie_wu</category><category>alibaba_wan</category><category>tool-use</category><category>large-model-coding</category><category>reasoning</category><category>multimodality</category><category>model-release</category><category>model-updates</category><category>industry-application</category><category>scaling</category><category>fine-tuning</category><category>reinforcement-learning</category></item><item><title>NVIDIA to invest $100B in OpenAI for 10GW of Vera Rubin rollout</title><link>https://news.smol.ai/issues/25-09-22-nvda-oai/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-22-nvda-oai/</guid><description>**NVIDIA** and **OpenAI** announced a landmark strategic partnership to deploy at least **10 gigawatts** of AI datacenters using NVIDIA&apos;s systems, with NVIDIA investing up to **$100 billion** progressively as each gigawatt is deployed, starting in the second 
half of 2026 on the Vera Rubin platform. This deal significantly impacts the AI infrastructure funding landscape, potentially supporting OpenAI&apos;s $300 billion commitment to Oracle. The announcement caused major stock market reactions, with NVIDIA&apos;s market cap surging by $170 billion. Additionally, advancements in deterministic inference for reinforcement learning and FP8 precision gains in GPU performance were highlighted by AI practitioners.</description><pubDate>Mon, 22 Sep 2025 05:44:39 GMT</pubDate><category>nvidia</category><category>openai</category><category>oracle</category><category>intel</category><category>enfabrica</category><category>wayne</category><category>qwen3-omni</category><category>deepseek-v3.1</category><category>artificialanlys</category><category>gdb</category><category>gpu-infrastructure</category><category>deterministic-inference</category><category>reinforcement-learning</category><category>fp8-precision</category><category>gpu-performance</category><category>ai-infrastructure</category><category>strategic-partnerships</category><category>investment</category><category>datacenters</category><category>cuda-graphs</category><category>pipeline-parallelism</category><category>data-parallelism</category></item><item><title>Grok 4 Fast: Xai&apos;s distilled, 40% more token efficient, 2m context, 344 tok/s frontier model</title><link>https://news.smol.ai/issues/25-09-19-grok-4-fast/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-19-grok-4-fast/</guid><description>**xAI** announced **Grok 4 Fast**, a highly efficient model running at **344 tokens/second**, offering reasoning and nonreasoning modes and free trials on major platforms. **Meta** showcased its neural band and Ray-Ban Display with a live demo that experienced hiccups but sparked discussion on live hardware demos and integration challenges. 
**Meta** is also developing a first-party &quot;Horizon Engine&quot; for AI rendering and released Quest-native Gaussian Splatting capture tech. New model releases include **Mistral&apos;s Magistral 1.2**, a compact multimodal vision-language model with improved benchmarks and local deployment; **Moondream 3**, a 9B-parameter MoE VLM focused on efficient visual reasoning; **IBM&apos;s Granite-Docling-258M**, a document VLM for layout-faithful PDF to HTML/Markdown conversion; and **ByteDance&apos;s SAIL-VL2**, a vision-language foundation model excelling at multimodal understanding and reasoning at 2B and 8B parameter scales.</description><pubDate>Fri, 19 Sep 2025 05:44:39 GMT</pubDate><category>xai</category><category>meta-ai-fair</category><category>mistral-ai</category><category>ibm</category><category>bytedance</category><category>grok-4-fast</category><category>magistral-1.2</category><category>moondream-3</category><category>granite-docling-258m</category><category>sail-vl2</category><category>nearcyan</category><category>aidangomez</category><category>_akhaliq</category><category>vikhyatk</category><category>rohanpaul_ai</category><category>efficiency</category><category>reasoning</category><category>vision</category><category>multimodality</category><category>model-optimization</category><category>model-deployment</category><category>vision-encoders</category><category>model-architecture</category><category>model-training</category></item><item><title>Softbank, NVIDIA and US Govt take 2%, 5% and 10% of Intel, will develop Intel x86 RTX SOCs for consumer &amp; datacenters</title><link>https://news.smol.ai/issues/25-09-18-nvidia-intc/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-18-nvidia-intc/</guid><description>**Nvidia and Intel** announced a joint development partnership for multiple new generations of x86 products, marking a significant shift in the tech industry. 
This collaboration has been in the works for a year and impacts both consumer and data center markets, boosting hopes for Intel&apos;s Foundry business. On the AI hardware front, **Meta** showcased its neural band and Ray-Ban Display with a live demo that experienced hiccups but sparked discussion on live tech demos. Meta is also moving from Unity to its own Horizon Engine for AI rendering, including Gaussian splatting capture technology. In AI models, **Mistral** released Magistral 1.2, a compact multimodal vision-language model with improved benchmarks and local deployment capabilities, while **Moondream 3** previewed a 9B-parameter, 2B-active MoE VLM focused on efficient visual reasoning.</description><pubDate>Thu, 18 Sep 2025 05:44:39 GMT</pubDate><category>nvidia</category><category>intel</category><category>meta-ai-fair</category><category>mistral-ai</category><category>magistral-1.2</category><category>moondream-3</category><category>nearcyan</category><category>_akhaliq</category><category>vikhyatk</category><category>multimodality</category><category>vision</category><category>model-optimization</category><category>model-efficiency</category><category>model-architecture</category><category>reinforcement-learning</category><category>fine-tuning</category><category>ai-hardware</category><category>gaussian-splatting</category><category>live-demo</category><category>visual-reasoning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-09-17-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-17-not-much/</guid><description>**Anthropic** published an in-depth postmortem on their August-September reliability issues. **OpenAI**&apos;s GPTeam achieved a perfect 12/12 score at the **ICPC 2025** World Finals, showcasing rapid progress in general-purpose reasoning and introducing controllable &quot;thinking time&quot; tiers for **gpt-5** in ChatGPT. 
**Google DeepMind**&apos;s **gemini-2.5-deep-think** performed at gold-medal level at ICPC, solving 10/12 problems with advances in parallel thinking, multi-step reasoning, and novel reinforcement learning techniques. OpenAI and Apollo Evaluations detected &quot;scheming&quot; behaviors in frontier models, emphasizing the need for chain-of-thought transparency and launching a $500K Kaggle challenge. GitHub launched an MCP server registry integrated with VS Code Insiders, with additional support from JetBrains and Hugging Face for open LLMs in Copilot Chat. Weaviate released a native Query Agent translating natural language to database operations with citations.</description><pubDate>Wed, 17 Sep 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>openai</category><category>google-deepmind</category><category>apollo-evaluations</category><category>github</category><category>hugging-face</category><category>weaviate</category><category>gpt-5</category><category>gemini-2.5-deep-think</category><category>sama</category><category>merettm</category><category>woj_zaremba</category><category>markchen90</category><category>esyudkowsky</category><category>reasoning</category><category>reinforcement-learning</category><category>alignment</category><category>chain-of-thought</category><category>model-evaluation</category><category>agent-frameworks</category><category>ide-integration</category><category>natural-language-to-sql</category><category>real-time-voice</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-09-16-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-16-not-much/</guid><description>**GPT-5 Codex** rollout shows strong agentic coding capabilities with some token bloat issues. IDEs like **VS Code Insiders** and **Cursor 1.6** enhance context windows and model integration. **vLLM 0.10.2** supports aarch64 and NVIDIA GB200 with performance improvements. 
**AMD ROCm** updates add modern attention, sparse MoE, and distributed inference. **TRL** introduces Context Parallelism for long-context training. Robotics and RL data pipelines improve with **Unsloth** and **LeRobotDataset v3**. **Qwen3-Next-80B** runs efficiently on Mac M4 Max with MLX. **Tencent&apos;s HunyuanImage 2.1** is a 17B bilingual text-to-image model with 2048×2048 resolution and restricted open weights.</description><pubDate>Tue, 16 Sep 2025 05:44:39 GMT</pubDate><category>openai</category><category>microsoft</category><category>perplexity-ai</category><category>huggingface</category><category>amd</category><category>tencent</category><category>lmstudio</category><category>gpt-5-codex</category><category>vllm-0.10.2</category><category>qwen3-next-80b</category><category>hunyuanimage-2.1</category><category>gdb</category><category>teknium1</category><category>finbarrtimbers</category><category>thsottiaux</category><category>theturingpost</category><category>pierceboggan</category><category>amandaksilver</category><category>aravsrinivas</category><category>sergiopaniego</category><category>art_zucker</category><category>danielhanchen</category><category>rwojo</category><category>awnihannun</category><category>agentic-ai</category><category>ide</category><category>context-windows</category><category>inference</category><category>distributed-inference</category><category>reinforcement-learning</category><category>robotics</category><category>long-context</category><category>model-optimization</category><category>text-to-image</category><category>multimodality</category><category>model-licenses</category></item><item><title>GPT-5 Codex launch and OpenAI&apos;s quiet rise in Agentic Coding</title><link>https://news.smol.ai/issues/25-09-15-gpt5-codex/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-15-gpt5-codex/</guid><description>**OpenAI** released **GPT-5-Codex**, an agentic coding model optimized for long-running software engineering 
tasks with dynamic task-adaptive thinking, multi-hour autonomy, and improved code quality. It achieves 51% accuracy on an unreleased large refactor benchmark and integrates deeply with developer tools like Xcode. Meanwhile, **Alibaba** launched **Qwen3-Next-80B**, a hybrid MoE model with native long-context support (262k tokens, extensible to 1M+), targeting efficient reasoning and repository-scale code analysis, supported by **Together AI** and **NVIDIA** with CUDA-accelerated attention. The trend towards hybrid SSM + MoE architectures is noted, emphasizing efficiency and scaling in China and US training regimes. Community discussions highlight the importance of variable compute and routing for inference efficiency and quality.</description><pubDate>Mon, 15 Sep 2025 05:44:39 GMT</pubDate><category>openai</category><category>alibaba</category><category>together-ai</category><category>nvidia</category><category>gpt-5-codex</category><category>qwen3-next-80b</category><category>sama</category><category>swyx</category><category>omarsar0</category><category>ofirpress</category><category>agentic-ai</category><category>software-engineering</category><category>long-context</category><category>mixture-of-experts</category><category>model-optimization</category><category>cuda-acceleration</category><category>inference-efficiency</category><category>routing</category><category>task-adaptive-thinking</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-09-12-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-12-not-much/</guid><description>**Meta** released **MobileLLM-R1**, a sub-1B parameter reasoning model family on Hugging Face with strong small-model math accuracy, trained on 4.2T tokens. **Alibaba** introduced **Qwen3-Next-80B-A3B** with hybrid attention, 256k context window, and improved long-horizon memory, priced competitively on Alibaba Cloud. 
**Meta AI FAIR** fixed a benchmark bug in SWE-Bench affecting agent evaluation. The LiveMCP-101 benchmark shows frontier models like **GPT-5** underperforming on complex tasks, with common failure modes cataloged. OpenAI highlights hallucination issues due to benchmark incentives, proposing calibration improvements. Community demos and tooling updates continue to evolve.</description><pubDate>Sat, 13 Sep 2025 05:44:39 GMT</pubDate><category>meta-ai-fair</category><category>huggingface</category><category>alibaba</category><category>openai</category><category>mobilellm-r1</category><category>qwen3-next-80b-a3b</category><category>gpt-5</category><category>_akhaliq</category><category>tacocohen</category><category>pkirgis</category><category>sayashk</category><category>reasoning</category><category>model-efficiency</category><category>hybrid-attention</category><category>long-context</category><category>benchmarking</category><category>agent-evaluation</category><category>hallucination-detection</category><category>model-calibration</category><category>inference-complexity</category><category>model-pricing</category></item><item><title>Qwen3-Next-80B-A3B-Base: Towards Ultimate Training &amp; Inference Efficiency</title><link>https://news.smol.ai/issues/25-09-11-qwen3-next/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-11-qwen3-next/</guid><description>**MoE (Mixture of Experts) models** have become essential in frontier AI models, with **Qwen3-Next** pushing sparsity further by activating only **3.7% of parameters** (3B out of 80B) using a hybrid architecture combining **Gated DeltaNet** and **Gated Attention**. This new design includes **512 total experts** (10 routed + 1 shared active per token), **Zero-Centered RMSNorm** for stability, and improved MoE router initialization, resulting in **~10× cheaper training and 10× faster inference** compared to previous models. 
**Alibaba&apos;s Qwen3-Next** reportedly outperforms **Gemini-2.5-Flash-Thinking** and approaches the flagship 235B model&apos;s performance, with deployments on **Hugging Face**, **Baseten**, and native **vLLM** support for efficient inference.</description><pubDate>Thu, 11 Sep 2025 05:44:39 GMT</pubDate><category>alibaba</category><category>mistral-ai</category><category>deepseek</category><category>snowflake</category><category>hugging-face</category><category>baseten</category><category>nvidia</category><category>qwen3-next</category><category>qwen3</category><category>mixtral-8x7b</category><category>gemini-2.5-pro</category><category>justinlin610</category><category>teortaxestex</category><category>yuchenj_uw</category><category>mixture-of-experts</category><category>model-sparsity</category><category>gated-attention</category><category>hybrid-architecture</category><category>rmsnorm</category><category>model-stability</category><category>model-training</category><category>inference-optimization</category><category>multi-token-prediction</category><category>model-deployment</category></item><item><title>Oracle jumps +36% in a day after winning $300B OpenAI contract</title><link>https://news.smol.ai/issues/25-09-10-oci/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-10-oci/</guid><description>**Oracle&apos;s OCI division** reported a stunning **+359% revenue bookings growth to $455B** with cloud revenue guidance of **$144B by 2030**, driven significantly by a large deal with **OpenAI** amid tensions with **Microsoft**. On AI infrastructure, **Moonshot AI** released **Kimi’s checkpoint-engine**, enabling rapid weight updates on 1T-parameter models across thousands of GPUs, integrating with **vLLM**. **RLFactory** introduced a plug-and-play reinforcement learning framework for tool-using agents, showing smaller models outperforming larger ones. **TRL v0.23** added context parallelism for long-context training. 
**Thinking Machines Lab** published research on deterministic inference pipelines, making **vLLM** deterministic for **Qwen** models. **Meta** launched **BackendBench**, a PyTorch benchmarking tool.</description><pubDate>Wed, 10 Sep 2025 05:44:39 GMT</pubDate><category>oracle</category><category>openai</category><category>microsoft</category><category>moonshot-ai</category><category>vllm-project</category><category>thinking-machines-lab</category><category>meta</category><category>qwen3-235b</category><category>qwen3-4b</category><category>qwen2.5-7b</category><category>vllm</category><category>kimi_moonshot</category><category>arankomatsuzaki</category><category>qgallouedec</category><category>cHHillee</category><category>woosuk_k</category><category>stasbekman</category><category>reinforcement-learning</category><category>model-weight-updates</category><category>deterministic-inference</category><category>benchmarking</category><category>long-context</category><category>model-optimization</category><category>cuda</category><category>distributed-training</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-09-09-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-09-not-much/</guid><description>**Cognition** raised **$400M** at a **$10.2B** valuation to advance AI coding agents, with **swyx** joining to support the &quot;Decade of Agents&quot; thesis. **Vercel** launched an OSS &quot;vibe coding platform&quot; using a tuned **GPT-5** agent loop. **Claude Code** emphasizes minimalism in agent loops for reliability. **Kimi K2-0905** achieved 94% on coding evals and improved agentic capabilities with doubled context length. **Alibaba** released **Qwen3-ASR**, a multilingual transcription model with &lt;8% WER. **Meta** introduced Set Block Decoding for 3-5× faster decoding without architectural changes. 
Innovations in KV cache compression and quantization include **AutoRound** and **QuTLASS v0.1.0**, and the algorithmic benchmark **AlgoPerf v0.6** was updated. **Google&apos;s Veo 3** video generation API went GA with significant price cuts and vertical video support.</description><pubDate>Tue, 09 Sep 2025 05:44:39 GMT</pubDate><category>cognition</category><category>founders-fund</category><category>lux-capital</category><category>8vc</category><category>neo</category><category>vercel</category><category>claude</category><category>groq</category><category>alibaba</category><category>huggingface</category><category>meta-ai-fair</category><category>google</category><category>theturingpost</category><category>algoperf</category><category>gpt-5</category><category>kimi-k2-0905</category><category>glm-4.5</category><category>qwen3-asr</category><category>opus-4.1</category><category>swyx</category><category>tim_dettmers</category><category>coding-agents</category><category>agent-architecture</category><category>open-source</category><category>model-evaluation</category><category>multilingual-models</category><category>speech-recognition</category><category>model-optimization</category><category>kv-cache</category><category>quantization</category><category>algorithmic-benchmarking</category><category>video-generation</category><category>context-windows</category></item><item><title>Cognition&apos;s $10b Series C; Smol AI updates</title><link>https://news.smol.ai/issues/25-09-08-cog-smol/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-08-cog-smol/</guid><description>**Cognition** raised **$400M** at a **$10.2B** valuation to advance AI coding agents, with **swyx** joining the company. **Vercel** launched an OSS coding platform using a tuned **GPT-5** agent loop. The **Kimi K2-0905** model achieved top coding eval scores and improved agentic capabilities with doubled context length. **Alibaba** released **Qwen3-ASR**, a multilingual transcription model with robust noise handling. 
**Meta** introduced Set Block Decoding for 3-5× faster decoding without architectural changes. Innovations in KV cache compression and quantization were highlighted, including **AutoRound** in SGLang and **QuTLASS v0.1.0** for Blackwell GPUs. Algorithmic benchmarking tools like **AlgoPerf v0.6** were updated for efficiency.</description><pubDate>Mon, 08 Sep 2025 05:44:39 GMT</pubDate><category>cognition</category><category>vercel</category><category>meta-ai-fair</category><category>alibaba</category><category>groq</category><category>huggingface</category><category>kimi-k2-0905</category><category>qwen3-asr</category><category>gpt-5</category><category>swyx</category><category>coding-agents</category><category>agent-development</category><category>open-source</category><category>model-evaluation</category><category>multilingual-models</category><category>inference-optimization</category><category>kv-cache-compression</category><category>quantization</category><category>algorithmic-benchmarking</category><category>context-length</category><category>model-performance</category></item><item><title>Kimi K2‑0905 and Qwen3‑Max preview: two 1T open weights models launched</title><link>https://news.smol.ai/issues/25-09-05-1t-models/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-05-1t-models/</guid><description>**Moonshot AI** updated their **Kimi K2-0905** open model with doubled context length to **256k tokens**, improved coding and tool-calling, and integration with agent scaffolds. **Alibaba** released **Qwen 3 Max**, a **1 trillion parameter** model with agent-oriented behavior, available via **Qwen Chat**, **Alibaba Cloud API**, and **OpenRouter**. The community highlights China&apos;s dominance in open models and debates around meaningful evaluation methods for code agents, emphasizing long-horizon and domain-specific evals. 
Influential voices like **@swyx** and **@karpathy** discuss the importance of practical evals and discriminator models for ranking outputs.</description><pubDate>Fri, 05 Sep 2025 05:44:39 GMT</pubDate><category>moonshot-ai</category><category>alibaba</category><category>huggingface</category><category>together-ai</category><category>groq</category><category>lmsys</category><category>openrouter</category><category>llamaindex</category><category>kimi-k2-0905</category><category>qwen-3-max</category><category>qwen-3</category><category>swyx</category><category>karpathy</category><category>willdepue</category><category>levie</category><category>bebischof</category><category>andrew_n_carr</category><category>bigeagle_xd</category><category>long-context</category><category>agents</category><category>coding</category><category>tool-use</category><category>model-evaluation</category><category>instruction-following</category><category>context-windows</category><category>semantic-search</category><category>discriminator-models</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-09-04-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-04-not-much/</guid><description>**Google DeepMind** released **EmbeddingGemma (308M)**, a small multilingual embedding model optimized for on-device retrieval-augmented generation and semantic search, supporting over 100 languages and running efficiently with quantization and EdgeTPU latency under 15ms. **Jina AI** introduced new code-focused embedding models (0.5B/1.5B) with GGUF quantization, achieving state-of-the-art retrieval across multiple languages and tasks. **LightOn** demonstrated large-scale retrieval training without distillation using contrastive training on billions of passages. **Hugging Face** released the **FineVision** dataset with 17.3M images and 9.5B answer tokens for vision-language model training, showing significant benchmark improvements. 
The **MiniCPM-V 4.5 (8B)** multimodal model reportedly surpasses **GPT-4o** and **Gemini-2.0 Pro** on OpenCompass benchmarks with innovative video token compression. Microsoft’s **VibeVoice TTS** and Stanford’s Mixture-of-Contexts video generation were also featured. Additionally, a Stanford study benchmarked optimizers like Muon, SOAP, MARS, and Sophia, finding diminishing speedups over AdamW at larger scales but advantages at smaller scales. The new ChatGPT branching feature was noted for its simplicity and popularity. *&quot;Everyone&apos;s a decacorn now.&quot;*</description><pubDate>Thu, 04 Sep 2025 05:44:39 GMT</pubDate><category>google-deepmind</category><category>hugging-face</category><category>jina-ai</category><category>lighton</category><category>microsoft</category><category>stanford</category><category>openai</category><category>ollama</category><category>weaviate</category><category>langchain</category><category>llamaindex</category><category>embeddinggemma</category><category>qwen-2.5-coder</category><category>minicpm-v-4.5</category><category>gpt-4o</category><category>gemini-2.0-pro</category><category>osanseviero</category><category>_philschmid</category><category>tomaarsen</category><category>ollama</category><category>weaviate_io</category><category>lusxvr</category><category>andimarafioti</category><category>thibaudfrere</category><category>_akhaliq</category><category>clementdelangue</category><category>gordonwetzstein</category><category>konstmish</category><category>wen_kaiyue</category><category>percyliang</category><category>embeddings</category><category>retrieval-augmented-generation</category><category>quantization</category><category>multilingual-models</category><category>on-device-ai</category><category>semantic-search</category><category>contrastive-learning</category><category>dataset-release</category><category>vision</category><category>multimodality</category><category>video-generation</category><category>text-to-speech</category><cat
egory>optimizer-benchmarking</category><category>training-recipes</category><category>model-compression</category><category>video-token-compression</category><category>fine-tuning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-09-03-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-03-not-much/</guid><description>**Exa** raised a **$700m Series B**, **OpenPipe** was acquired by **Coreweave**, and **Statsig** and **Alex** were acquired by **OpenAI**. The **Agent/Client Protocol (ACP)** was introduced by the **Zed** team to standardize IDE-agent interoperability, supporting **Claude Code** and **Gemini** CLIs. **LangChain 1.0 alpha** unifies content blocks for reasoning and multimodal data. The **OSWorld Verified leaderboard** promotes reproducible evaluation of computer-use agents including **OpenAI** and **Anthropic** models. FAIR revealed coding agent cheating on **SWE-Bench Verified**. **PR Arena** hosts live coding agent competitions. Benchmarks like **GSO** and **Holistic Agent Leaderboard** test software optimization and web browsing tasks, with **Qwen3-Coder** and **Gemini 2.5 Flash** showing strong performance. Advances in reinforcement learning for tool use include **SimpleTIR** improving multi-turn tool use success rates and **UI-TARS-2** advancing GUI agents. 
The **DARLING** optimizer improves quality and diversity in reasoning and instruction following, while **DEPO** achieves data-efficient RLVR with significant speedups.</description><pubDate>Wed, 03 Sep 2025 05:44:39 GMT</pubDate><category>exa</category><category>openpipe</category><category>coreweave</category><category>statsig</category><category>openai</category><category>zed</category><category>claude</category><category>gemini</category><category>langchain</category><category>anthropic</category><category>fair</category><category>alibaba</category><category>hud-evals</category><category>claude-code</category><category>gemini</category><category>qwen3-coder</category><category>gemini-2.5-flash</category><category>zeddotdev</category><category>mathemagic1an</category><category>hwchase17</category><category>giffmana</category><category>gneubig</category><category>crystalsssup</category><category>sayashk</category><category>_philschmid</category><category>_akhaliq</category><category>jaseweston</category><category>agent-protocols</category><category>interoperability</category><category>standardization</category><category>agent-evaluation</category><category>coding-agents</category><category>software-optimization</category><category>web-browsing</category><category>reinforcement-learning</category><category>multi-turn-reasoning</category><category>optimizer-design</category><category>data-efficient-rlvr</category><category>leaderboards</category><category>benchmarking</category></item><item><title>Anthropic raises $13B at $183B Series F</title><link>https://news.smol.ai/issues/25-09-02-anthropic-f/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-02-anthropic-f/</guid><description>**Anthropic** achieved a **$183B post-money valuation** in Series F funding by September 2025, growing from about $1B run-rate in January to over **$5B run-rate** by August 2025. 
Their **Claude Code** product saw **&gt;10x usage growth** in three months and reached **$500M run-rate revenue**, serving over **300,000 business customers** with a nearly **7x increase in large accounts**. **Mistral AI** launched **Le Chat** with 20+ MCP connectors integrating with major SaaS platforms and persistent memory features. Benchmarking updates highlight **GPT-5** leading agent intelligence indices, with strong performances from **xAI&apos;s Grok** and **Anthropic&apos;s Claude** families. Reliability tooling and agent evaluation advances were shared by **Galileo**, **OpenPipe**, and others. **Zhipu/THUDM** open-sourced **Slime v0.1.0**, enhancing RL infrastructure behind **GLM-4.5** with significant decoding speed improvements and advanced tensor offload techniques.</description><pubDate>Tue, 02 Sep 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>mistral-ai</category><category>x-ai</category><category>salesforce</category><category>galileo</category><category>openpipe</category><category>zhipu</category><category>thudm</category><category>claude-code</category><category>gpt-5</category><category>grok-4</category><category>claude</category><category>sonnet-4</category><category>glm-4.5</category><category>deepseek-r1</category><category>swyx</category><category>emilygsands</category><category>_philschmid</category><category>_lewtun</category><category>omarsar0</category><category>_avichawla</category><category>corbtt</category><category>enterprise-connectors</category><category>agent-benchmarking</category><category>reinforcement-learning</category><category>inference-optimization</category><category>memory-optimization</category><category>cuda</category><category>multi-token-prediction</category><category>speculative-decoding</category><category>tensor-offload</category><category>performance-optimization</category><category>real-time-guardrails</category><category>cost-optimization</category></item><item><title>not much happened 
today</title><link>https://news.smol.ai/issues/25-09-01-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-09-01-not-much/</guid><description>**OpenAI** integrates **GPT-5** into Xcode 26 with improved coding latency, though some UX trade-offs are noted. **xAI&apos;s Grok Code Fast 1** gains momentum, surpassing **Claude Sonnet** in usage and praised for fast debugging. **Zhipu&apos;s GLM-4.5** offers a cost-effective coding plan with strong performance against Claude Sonnet 4. **Meituan** releases the **LongCat-Flash-Chat**, a 560B parameter MoE model with adaptive compute and detailed technical insights. Apple debuts on-device vision-language models **FastVLM** and **MobileCLIP2** alongside **InternVL3.5**.</description><pubDate>Mon, 01 Sep 2025 05:44:39 GMT</pubDate><category>openai</category><category>x-ai</category><category>zhipu-ai</category><category>meituan</category><category>apple</category><category>gpt-5</category><category>grok-code-fast-1</category><category>claude-sonnet</category><category>glm-4.5</category><category>longcat-flash-chat</category><category>fastvlm</category><category>mobileclip2</category><category>internvl3.5</category><category>gdb</category><category>martin_casado</category><category>yanndubs</category><category>elonmusk</category><category>cline</category><category>vikhyatk</category><category>dzhng</category><category>quixiai</category><category>tim_dettmers</category><category>casper_hansen_</category><category>reach_vb</category><category>eliebakouch</category><category>teortaxestex</category><category>youjiacheng</category><category>model-architecture</category><category>moe</category><category>adaptive-compute</category><category>inference-speed</category><category>model-training</category><category>cost-efficiency</category><category>coding</category><category>developer-tools</category><category>open-inference</category><category>on-device-ai</category><category>vision</category></item><item><title>not 
much happened today</title><link>https://news.smol.ai/issues/25-08-29-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-29-not-much/</guid><description>**Apple** released three real-time vision-language models (**FastVLM**, **MobileCLIP2**) on Hugging Face with significant speed and size improvements, supporting WebGPU and Core ML. Their MLX framework now supports **MXFP4** format, competing with **NVFP4** for FP4 quantization. **xAI** launched **grok-code-fast-1**, outperforming Claude for code edits, while **OpenAI** integrated **GPT-5** into Xcode 26 and released a new **Responses API** on **Groq** hardware. CLI-first agent workflows advanced with tools like **SemTools**, **MLX** local runner for Apple Silicon, and **llama.vim** recommending **Qwen 3 Coder 30B A3B**. Retrieval research highlights limitations of single-vector embeddings, promoting ColBERT-style late interaction.</description><pubDate>Fri, 29 Aug 2025 05:44:39 GMT</pubDate><category>apple</category><category>hugging-face</category><category>x-ai</category><category>openai</category><category>groq</category><category>run-llama</category><category>lmstudio</category><category>fastvlm</category><category>mobileclip2</category><category>grok-code-fast-1</category><category>gpt-5</category><category>qwen-3-coder-30b-a3b</category><category>reach_vb</category><category>xenovacom</category><category>pcuenq</category><category>awnihannun</category><category>cline</category><category>veggie_eric</category><category>nickbaumann_</category><category>gdb</category><category>benankdev</category><category>loganmarkewich</category><category>tom_doerr</category><category>fastmcp</category><category>ggerganov</category><category>orionweller</category><category>antoine_chaffin</category><category>vision</category><category>model-quantization</category><category>code-generation</category><category>cli-workflows</category><category>retrieval-augmentation</category><category>embedding-models</c
ategory><category>local-ai</category><category>multimodality</category></item><item><title>OpenAI Realtime API GA and new `gpt-realtime` model, 20% cheaper than 4o</title><link>https://news.smol.ai/issues/25-08-28-gpt-realtime/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-28-gpt-realtime/</guid><description>**OpenAI** launched the **gpt-realtime** model and **Realtime API** to GA, featuring advanced speech-to-speech capabilities, new voices (**Cedar**, **Marin**), image input, SIP telephony, and a ~20% price cut. Benchmarks show improvements over **gpt-4o-realtime** on BigBench and ComplexFuncBench. **xAI** introduced **Grok Code Fast 1**, a speed-optimized coding model integrated with popular IDEs, while **OpenAI Codex** received major upgrades for local and cloud development workflows. Google’s **Gemini CLI** improved multi-editor support, and new models like **Microsoft MAI-1-preview** and **MAI-Voice-1** were announced. *&quot;The new all-in-one WebRTC API removes the ephemeral token step and supports video on the same connection,&quot;* highlighting enhanced developer tooling.</description><pubDate>Thu, 28 Aug 2025 08:44:39 
GMT</pubDate><category>openai</category><category>xai</category><category>microsoft</category><category>google</category><category>gpt-realtime</category><category>gpt-4o-realtime</category><category>grok-code-fast-1</category><category>codex</category><category>mai-1-preview</category><category>mai-voice-1</category><category>gemini-cli</category><category>swyx</category><category>juberti</category><category>omarsar0</category><category>reach_vb</category><category>pbbakkum</category><category>skcd42</category><category>mohitreddy13</category><category>cline</category><category>kevinweil</category><category>gdb</category><category>sama</category><category>_philschmid</category><category>speech-to-speech</category><category>instruction-following</category><category>function-calling</category><category>telephony</category><category>webrtc</category><category>voice-agents</category><category>multilingual-switching</category><category>voice-control</category><category>benchmarks</category><category>coding-models</category><category>ide-integration</category><category>developer-tools</category><category>model-updates</category></item><item><title>OpenAI updates Codex, VSCode Extension that can sync tasks with Codex Cloud</title><link>https://news.smol.ai/issues/25-08-27-codex-2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-27-codex-2/</guid><description>**OpenAI Codex** has launched a new IDE Extension integrating with VS Code and Cursor, enabling seamless local and cloud task handoff, sign-in via ChatGPT plans, upgraded CLI, and GitHub code review automation. Facebook AI researchers introduced **StepWiser**, a process-level reward model improving reasoning and training by chunk-by-chunk evaluation, achieving SOTA on ProcessBench. **Google DeepMind&apos;s Gemini 2.5 Flash Image** model showcases advanced spatial reasoning, multi-image fusion, and developer tools including a browser extension for image remixing. 
NVIDIA revealed efficiency data on **Nemotron-CC-Math (133B)** and **Jet-Nemotron** models.</description><pubDate>Wed, 27 Aug 2025 05:44:39 GMT</pubDate><category>openai</category><category>facebook-ai-fair</category><category>google-deepmind</category><category>nvidia</category><category>codex</category><category>stepwiser</category><category>gemini-2.5-flash</category><category>nemotron-cc-math</category><category>jet-nemotron</category><category>jaseweston</category><category>tesatory</category><category>benjamindekr</category><category>tokumin</category><category>fabianstelzer</category><category>officiallogank</category><category>process-reward-modeling</category><category>reinforcement-learning</category><category>chain-of-thought</category><category>spatial-reasoning</category><category>multi-image-fusion</category><category>developer-tools</category><category>code-review</category><category>ide-extension</category><category>cli</category><category>cloud-computing</category><category>model-efficiency</category></item><item><title>nano-banana is Gemini‑2.5‑Flash‑Image, beating Flux Kontext by 170 Elo with SOTA Consistency, Editing, and Multi-Image Fusion</title><link>https://news.smol.ai/issues/25-08-26-nano-banana/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-26-nano-banana/</guid><description>**Google DeepMind** revealed **Gemini-2.5-Flash-Image-Preview**, a state-of-the-art image editing model excelling in **character consistency**, **natural-language edits**, and **multi-image composition**, dominating the Image Edit Arena with a ~170-180 Elo lead and over 2.5M votes. It is integrated into multiple platforms including Google AI Studio and third-party services. **Nous Research** released **Hermes 4**, an open-weight hybrid reasoning model focused on steerability and STEM benchmarks. 
**NVIDIA** launched **Nemotron Nano 9B V2**, a hybrid Mamba-Transformer with 128k context, top-performing under 10B parameters, and released a 6.6T-token pretraining subset. **InternVL3.5** introduced 32 vision-language models based on OpenAI&apos;s gpt-oss and Qwen3 backbones. **Ollama v0.11.7** added DeepSeek v3.1 support with hybrid thinking and Turbo mode preview.</description><pubDate>Tue, 26 Aug 2025 05:44:39 GMT</pubDate><category>google-deepmind</category><category>nous-research</category><category>nvidia</category><category>openai</category><category>ollama</category><category>huggingface</category><category>openrouter</category><category>gemini-2.5-flash-image-preview</category><category>hermes-4</category><category>nemotron-nano-9b-v2</category><category>internvl3.5</category><category>gpt-oss</category><category>qwen3</category><category>deepseek-v3.1</category><category>sundarpichai</category><category>_philschmid</category><category>lmarena_ai</category><category>omarsar0</category><category>skirano</category><category>yupp_ai</category><category>xanderatallah</category><category>officiallogank</category><category>mervenoyann</category><category>image-editing</category><category>natural-language-processing</category><category>multi-image-composition</category><category>character-consistency</category><category>reasoning</category><category>hybrid-models</category><category>context-windows</category><category>model-steerability</category><category>pretraining</category><category>finetuning</category><category>alignment</category><category>vision</category><category>vision-language</category><category>api</category><category>model-integration</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-08-25-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-25-not-much/</guid><description>**xAI** released open weights for **Grok-2** and **Grok-2.5** with a novel MoE residual architecture and 
μP scaling, sparking community excitement and licensing concerns. **Microsoft** open-sourced **VibeVoice-1.5B**, a multi-speaker long-form TTS model with streaming support and a 7B variant forthcoming. **Motif Technology** published a detailed report on **Motif-2.6B**, highlighting Differential Attention, PolyNorm, and extensive finetuning, trained on AMD MI250 GPUs. In coding tools, momentum builds around **GPT-5**-backed workflows, with developers favoring it over Claude Code. **Alibaba** released **Qwen-Code v0.0.8** with deep VS Code integration and MCP CLI enhancements. The MCP ecosystem advances with LiveMCP-101 stress tests, the universal MCP server &quot;Rube,&quot; and LangGraph Platform&apos;s rollout of revision queueing and ART integration for RL training of agents.</description><pubDate>Mon, 25 Aug 2025 05:44:39 GMT</pubDate><category>xai-org</category><category>microsoft</category><category>motif-technology</category><category>alibaba</category><category>huggingface</category><category>langchain-ai</category><category>grok-2</category><category>grok-2.5</category><category>vibevoice-1.5b</category><category>motif-2.6b</category><category>gpt-5</category><category>qwen-code</category><category>elonmusk</category><category>clementdelangue</category><category>rasbt</category><category>quanquangu</category><category>akhaliq</category><category>eliebakouch</category><category>gdb</category><category>ericmitchellai</category><category>ivanfioravanti</category><category>deanwball</category><category>giffmana</category><category>omarsar0</category><category>corbtt</category><category>mixture-of-experts</category><category>model-scaling</category><category>model-architecture</category><category>text-to-speech</category><category>fine-tuning</category><category>training-data</category><category>optimization</category><category>reinforcement-learning</category><category>agentic-ai</category><category>tool-use</category><category>model-training</category><category
>model-release</category><category>api</category><category>software-development</category><category>model-quantization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-08-22-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-22-not-much/</guid><description>**DeepMind** released **Genie 3**, an interactive multimodal world simulator with advanced spatial memory and real-time avatar control, and **SIMA**, an embodied training agent operating inside generated worlds. **Alibaba** introduced **Qwen-Image-Edit**, an open-weights image editor scoring **ELO 1098 (#2)** in the Image Editing Arena, running on Qualcomm NPUs, alongside **Qwen-VL-Max** entering the Vision top-20. Video models like **Kling 2.1** showed a **235% improvement** in frame control, with new entrants **Luma Ray 2** and **Runway Gen-4 Turbo** debuting. **Google** provided free **Veo 3** generations in Gemini App and enhanced Google Photos with natural-language edits. **DeepSeek v3.1** launched with focus on SWE and Search agents, supporting local inference on Apple Silicon with 4-bit quantization achieving ~**21 tok/s** on M3 Ultra. 
The news highlights advances in interactive simulation, vision editing, video synthesis, and scalable local AI inference.</description><pubDate>Fri, 22 Aug 2025 05:44:39 GMT</pubDate><category>google-deepmind</category><category>alibaba</category><category>google</category><category>deepseek</category><category>baseten</category><category>yupp</category><category>qwen-image-edit</category><category>qwen-vl-max</category><category>kling-2.1</category><category>veo-3</category><category>deepseek-v3.1</category><category>genie-3</category><category>sima</category><category>demishassabis</category><category>bonniesjli</category><category>shreyar</category><category>ostrisai</category><category>lmarena_ai</category><category>teortaxestex</category><category>ivanfioravanti</category><category>multimodality</category><category>embodied-ai</category><category>simulation</category><category>fine-tuning</category><category>quantization</category><category>video-generation</category><category>image-generation</category><category>local-inference</category><category>scaling</category><category>agent-training</category><category>real-time-control</category><category>spatial-memory</category></item><item><title>Cohere Command A Reasoning beats GPT-OSS-120B and DeepSeek R1 0528</title><link>https://news.smol.ai/issues/25-08-21-cohere-command-a-reasoning/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-21-cohere-command-a-reasoning/</guid><description>**Cohere&apos;s Command A Reasoning** model outperforms GPT-OSS in open deep research capabilities, emphasizing agentic use cases for 2025. **DeepSeek-V3.1** introduces a hybrid reasoning architecture toggling between reasoning and non-reasoning modes, optimized for agentic workflows and coding, with extensive long-context pretraining (~630B tokens for 32k context, ~209B for 128k), FP8 training, and a large MoE architecture (~37B active parameters). 
Benchmarks show competitive performance with notable improvements in SWE-Bench and other reasoning tasks. The model is priced at $0.56/M input and $1.68/M output tokens on the DeepSeek API and has seen rapid ecosystem integration, including HF weights, INT4 quantization by Intel, and vLLM reasoning toggles. Community feedback highlights the hybrid design&apos;s pragmatic approach to agent and software engineering workflows, though some note the lack of tool use in reasoning mode.</description><pubDate>Thu, 21 Aug 2025 05:44:39 GMT</pubDate><category>cohere</category><category>deepseek</category><category>intel</category><category>huggingface</category><category>baseten</category><category>vllm-project</category><category>chutes-ai</category><category>anycoder</category><category>command-a-reasoning</category><category>deepseek-v3.1</category><category>artificialanlys</category><category>reach_vb</category><category>scaling01</category><category>cline</category><category>ben_burtenshaw</category><category>haihaoshen</category><category>jon_durbin</category><category>_akhaliq</category><category>willccbb</category><category>teortaxestex</category><category>agentic-ai</category><category>hybrid-models</category><category>long-context</category><category>fp8-training</category><category>mixture-of-experts</category><category>benchmarking</category><category>quantization</category><category>reasoning</category><category>coding-workflows</category><category>model-pricing</category></item><item><title>DeepSeek V3.1: 840B token continued pretrain, beating Claude 4 Sonnet at 11% of its cost</title><link>https://news.smol.ai/issues/25-08-20-deepseekv31/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-20-deepseekv31/</guid><description>**DeepSeek** released **DeepSeek V3.1**, a quietly rolled out open model with a **128K context window** and improvements in **token efficiency**, coding, and agentic benchmarks. 
**ByteDance** launched the permissive **Seed-OSS 36B** model on Hugging Face, noted for long-context and reasoning capabilities. **Zhipu AI** introduced **ComputerRL**, a reinforcement learning framework for computer-use agents, achieving strong benchmark results. In developer tooling, **GitHub Copilot** expanded globally, **Microsoft VS Code** integrated **Gemini 2.5 Pro** and updated **GPT-5** agent prompts, and **Anthropic** launched **Claude Code** seats with spend controls. Open-source fine-tuning advances include **Together AI** adding SFT for **gpt-oss-120B/20B** and **Baseten** enabling multinode 120B training with Truss CLI. The community noted mixed performance and ongoing post-training adjustments for DeepSeek V3.1.</description><pubDate>Wed, 20 Aug 2025 05:44:39 GMT</pubDate><category>deepseek</category><category>bytedance</category><category>zhipu-ai</category><category>github</category><category>microsoft</category><category>anthropic</category><category>together-ai</category><category>baseten</category><category>huggingface</category><category>deepseek-v3.1</category><category>seed-oss-36b</category><category>computerrl</category><category>gemini-2.5-pro</category><category>gpt-5</category><category>claude-code</category><category>gpt-oss-120b</category><category>gpt-oss-20b</category><category>teortaxestex</category><category>rasbt</category><category>lukehoban</category><category>burkeholland</category><category>_catwu</category><category>cline</category><category>winglian</category><category>token-efficiency</category><category>coding</category><category>agentic-benchmarks</category><category>long-context</category><category>reinforcement-learning</category><category>developer-tools</category><category>fine-tuning</category><category>multinode-training</category><category>model-release</category></item><item><title>Databricks&apos; $100B Series K</title><link>https://news.smol.ai/issues/25-08-19-databricks/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-08-19-databricks/</guid><description>**Databricks** reached a **$100 billion valuation**, becoming a centicorn with new Data ([Lakebase](https://www.databricks.com/product/lakebase)) and AI ([Agent Bricks](https://docs.databricks.com/aws/en/generative-ai/agent-bricks/)) products. **OpenAI** launched **ChatGPT Go** in India at ₹399/month (~$4.55), offering significantly increased usage limits and UPI payment support, with plans for global expansion. The **DeepSeek V3.1 Base/Instruct** models were quietly released on Hugging Face, showing strong coding benchmark performance and adopting an Anthropic-style hybrid system. The **Qwen-Image-Edit** model from **Alibaba** is gaining traction with integrations and community pruning experiments. *&quot;DeepSeek V3.1 Base outperforms Claude 4 Opus on coding benchmarks&quot;* and *&quot;ChatGPT Go offers 10x higher message limits and 2x longer memory&quot;* highlight key advancements.</description><pubDate>Tue, 19 Aug 2025 05:44:39 
GMT</pubDate><category>databricks</category><category>openai</category><category>deepseek</category><category>hugging-face</category><category>alibaba</category><category>deepseek-v3.1-base</category><category>deepseek-v3.1-instruct</category><category>chatgpt-go</category><category>qwen-image-edit</category><category>sama</category><category>nickaturley</category><category>kevinweil</category><category>gdb</category><category>sherwinwu</category><category>nptacek</category><category>reach_vb</category><category>clementdelangue</category><category>teortaxestex</category><category>quixiai</category><category>georgejrjrjr</category><category>scaling01</category><category>alibaba_qwen</category><category>linoy_tsaban</category><category>ostrisai</category><category>lmarena_ai</category><category>model-release</category><category>benchmarking</category><category>pricing-models</category><category>fine-tuning</category><category>model-architecture</category><category>image-editing</category><category>video-generation</category><category>api</category><category>agentic-ai</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-08-18-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-18-not-much/</guid><description>**Gemma 3 270M**, an ultra-small model optimized for edge and mobile use, was released and is gaining adoption. **NVIDIA** launched two open multilingual ASR models, **Canary 1B** and **Parakeet-TDT 0.6B**, trained on 1 million hours of data with CC-BY licensing, plus the efficient **Nemotron-Nano v2 9B** model with significant speedups. **Alibaba&apos;s Qwen-Image-Edit** offers bilingual text editing and semantic image transformations. **Tencent Hunyuan** introduced a controllable game-world video generator trained on over 1 million gameplay recordings. **Meta&apos;s DINOv3** presents a scalable self-supervised vision backbone with strong domain transfer capabilities. 
**IBM** quietly released efficient English embedding models under a commercial-friendly license. The **BeyondWeb** synthetic data paper shows significant training speed and performance gains over prior datasets. Analysis of **HRM** architecture suggests performance improvements largely stem from data augmentation and scaffolding rather than novel architecture. *&quot;Models and datasets are openly licensed and available on Hugging Face.&quot;*</description><pubDate>Mon, 18 Aug 2025 05:44:39 GMT</pubDate><category>nvidia</category><category>alibaba</category><category>tencent</category><category>meta-ai-fair</category><category>ibm</category><category>datology</category><category>gemma-3-270m</category><category>canary-1b</category><category>parakeet-tdt-0.6b</category><category>nemotron-nano-v2</category><category>qwen-image-edit</category><category>dino-v3</category><category>demishassabis</category><category>adrgrondin</category><category>rasbt</category><category>reach_vb</category><category>ctnzr</category><category>clementdelangue</category><category>natolambert</category><category>_akhaliq</category><category>itspaulai</category><category>mervenoyann</category><category>xenovacom</category><category>tomaarsen</category><category>pratyushmaini</category><category>code_star</category><category>leavittron</category><category>k_schuerholt</category><category>giffmana</category><category>synthetic-data</category><category>multilingual-asr</category><category>self-supervised-learning</category><category>vision</category><category>model-efficiency</category><category>training-data</category><category>data-augmentation</category><category>model-speedup</category><category>domain-transfer</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-08-15-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-15-not-much/</guid><description>**OpenAI** rolled out **GPT-5** as the default in ChatGPT with new modes 
and a &quot;warmer&quot; personality, plus expanded message limits for Plus/Team users and Enterprise/Edu access. Performance rankings show **gpt-5-high** leading, with smaller variants also ranked, though critiques note some underperformance versus Chinese models and sensitivity to sycophancy. OpenAI enhanced developer tools with a &quot;Quick eval&quot; feature, coding tips, and an improved Playground. **Google** released **Imagen 4** generally available with faster generation and higher resolution, plus the ultra-small **Gemma 3 270M** model with a large vocabulary and ecosystem support. Podcasts featured OpenAI leaders discussing GPT-5 systems, routing, and efficiency.</description><pubDate>Fri, 15 Aug 2025 05:44:39 GMT</pubDate><category>openai</category><category>google</category><category>lmsys</category><category>gpt-5</category><category>gpt-5-high</category><category>gpt-5-mini-high</category><category>gpt-5-nano-high</category><category>imagen-4</category><category>gemma-3-270m</category><category>sama</category><category>aidan_mclau</category><category>kevinweil</category><category>lmarena_ai</category><category>edwinarbus</category><category>gdb</category><category>omarsar0</category><category>philschmid</category><category>m4rkmc</category><category>model-releases</category><category>model-performance</category><category>prompt-engineering</category><category>developer-tools</category><category>image-generation</category><category>model-optimization</category><category>transformers</category><category>tokenization</category><category>model-scaling</category></item><item><title>Western Open Models get Funding: Cohere $500m @ 6.8B, AI2 gets $152m NSF+NVIDIA grants</title><link>https://news.smol.ai/issues/25-08-14-cohere-ai2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-14-cohere-ai2/</guid><description>**OpenAI&apos;s GPT-5** achieved a speedrun of Pokemon Red 3x faster than **o3**. 
**Perplexity** raised **$200M** at a **$20B valuation**. **AI2** secured **$75M NSF grants** and **$77M from NVIDIA** for AI infrastructure projects like Olmo and Molmo. **Cohere** raised **$500M** and hired **Joelle Pineau** from **meta-ai-fair**, boosting models like Command A. **Google** released the **Gemma 3 270M** on-device tiny LLM with INT4 QAT checkpoints and large embedding tables, and made **Imagen 4** generally available with a fast version at $0.02/image. **Meta-ai-fair** introduced **DINOv3**, a family of self-supervised vision foundation models with high-resolution dense features and strong performance on benchmarks like COCO detection and ADE20K segmentation, under a permissive license. A **$150,000 MiniMax AI Agent Challenge** is ongoing with 200+ prizes, encouraging AI project builds by August 25.</description><pubDate>Thu, 14 Aug 2025 05:44:39 GMT</pubDate><category>openai</category><category>perplexity-ai</category><category>ai2</category><category>nvidia</category><category>cohere</category><category>meta-ai-fair</category><category>google</category><category>hugging-face</category><category>ollama</category><category>unsloth</category><category>gpt-5</category><category>o3</category><category>command-a</category><category>gemma-3-270m</category><category>imagen-4</category><category>dinov3</category><category>joelle_pineau</category><category>fchollet</category><category>awnihannun</category><category>_philschmid</category><category>osanseviero</category><category>model-speed</category><category>funding</category><category>ai-infrastructure</category><category>on-device-ai</category><category>quantization</category><category>embedding-models</category><category>image-generation</category><category>self-supervised-learning</category><category>vision</category><category>dense-prediction</category><category>benchmarking</category><category>instruction-following</category><category>model-optimization</category><category>model-release</category><cat
egory>challenge</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-08-13-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-13-not-much/</guid><description>**OpenAI** continues small updates to **GPT-5**, introducing &quot;Auto/Fast/Thinking&quot; modes with **196k token context**, **3,000 messages/week**, and dynamic routing to cheaper models for cost efficiency. The **MiniMax AI Agent Challenge** offers **$150,000** in prizes for AI agent development by August 25. The community discusses **GPT-OSS-120B** base model extraction, hosting, and tooling improvements, including multi-tool pipelines and flex-attention. **Anthropic** announces model pairing in **Claude Code** with **Opus 4.1** for planning and **Sonnet 4** for execution, expanding context to **1M tokens** and introducing prompt caching. Key figures include *@sama*, *@jeremyphoward*, *@jxmnop*, and *@_catwu*.</description><pubDate>Wed, 13 Aug 2025 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>minimax</category><category>gpt-5</category><category>gpt-oss-120b</category><category>opus-4.1</category><category>sonnet-4</category><category>sama</category><category>jeremyphoward</category><category>jxmnop</category><category>_catwu</category><category>context-windows</category><category>model-routing</category><category>model-hosting</category><category>multi-tool-pipelines</category><category>prompt-caching</category><category>model-extraction</category><category>model-pairing</category><category>cost-efficiency</category><category>model-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-08-12-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-12-not-much/</guid><description>**OpenAI** released the **GPT-5** series including **GPT-5-mini** and **GPT-5-nano**, with mixed user feedback on performance and API behavior. 
**Anthropic** extended **Claude Sonnet 4** context window to **1 million tokens**, a 5x increase, enhancing large document processing. **Zhipu AI** launched the open-source multimodal **GLM-4.5V** model with improvements in RL scaling and agentic tasks. **Google DeepMind** showcased the video generation model **Genie 3** and updated the **Gemini App** with new features like **Deep Think** and **Gemini Live**. **Alibaba Qwen** released the distilled image model **Qwen-Image distilled** and enhanced their Deep Research capabilities. Open source models like **Skywork&apos;s Matrix-Game 2.0** and **Jan.ai&apos;s Jan-v1** (built on **Qwen3-4B-Thinking**) were introduced, focusing on real-time world modeling and web search respectively. Developer tools such as **Claude Code** and **Cursor** were also highlighted.</description><pubDate>Tue, 12 Aug 2025 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>zhipu-ai</category><category>google-deepmind</category><category>alibaba</category><category>skywork</category><category>jan-ai</category><category>gpt-5</category><category>gpt-5-mini</category><category>gpt-5-nano</category><category>claude-sonnet-4</category><category>glm-4.5v</category><category>genie-3</category><category>gemini-app</category><category>qwen-image-distilled</category><category>matrix-game-2.0</category><category>jan-v1</category><category>qwen3-4b-thinking</category><category>context-window</category><category>multimodality</category><category>reinforcement-learning</category><category>agentic-tasks</category><category>video-generation</category><category>image-generation</category><category>real-time-systems</category><category>web-search</category><category>model-accuracy</category><category>developer-tools</category><category>open-source-models</category><category>long-context</category><category>model-scaling</category></item><item><title>OpenAI&apos;s IMO Gold model also wins IOI 
Gold</title><link>https://news.smol.ai/issues/25-08-11-ioi-gold/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-11-ioi-gold/</guid><description>**OpenAI** announced placing **#6 among human coders** at the IOI, reflecting rapid progress in competitive coding AI over the past two years. The **GPT-5** launch faced significant user backlash over restrictive usage limits and removal of model selection control, leading to a reversal and a limit increase to **3000 requests per week** for Plus users. Confusion around **GPT-5** naming and benchmarking was highlighted, with critiques of methodological issues in comparisons with models like **Claude** and **Gemini**. Performance reviews of **GPT-5** are mixed, with claims of near-zero hallucinations by **OpenAI** staff but user reports of confidently stated hallucinations and steering difficulties. Benchmarks show **GPT-5 mini** performing well on document understanding, while the full **GPT-5** is seen as expensive and middling. On the Chatbot Arena, **Gemini 2.5 Pro** holds a **67%** win rate against **GPT-5 Thinking**. 
Prompting and model behavior remain key discussion points.</description><pubDate>Mon, 11 Aug 2025 05:44:39 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>anthropic</category><category>gpt-5</category><category>gpt-5-thinking</category><category>gpt-5-mini</category><category>gemini-2.5-pro</category><category>claude</category><category>opus-4.1</category><category>sama</category><category>scaling01</category><category>yanndubs</category><category>sherylhsu</category><category>ahmed_el-kishky</category><category>jerry_tworek</category><category>noam_brown</category><category>alex_wei</category><category>amandaaskell</category><category>ericmitchellai</category><category>jon_durbin</category><category>gdb</category><category>jerryjliu0</category><category>reinforcement-learning</category><category>benchmarking</category><category>model-performance</category><category>prompt-engineering</category><category>model-behavior</category><category>competitive-programming</category><category>user-experience</category><category>model-naming</category><category>model-selection</category><category>hallucination-detection</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-08-08-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-08-not-much/</guid><description>**OpenAI** launched **GPT-5** with a unified user experience removing manual model selection, causing initial routing and access issues for Plus users that are being addressed with fixes including restored model options and increased usage limits. **GPT-5** introduces &quot;Priority Processing&quot; for lower latency at higher price tiers, achieving ~750ms median time-to-first-token in some cases. Microsoft reports full Copilot adoption of **GPT-5**, and API traffic doubled within 24 hours, peaking at 2 billion tokens per minute. 
Early benchmarks show **GPT-5** leading in reasoning tasks like FrontierMath and LiveBench, with improvements in hallucination control and creative writing, though some models like Grok-4 and Claude-4 Sonnet Thinking outperform it in specific RL-heavy reasoning benchmarks. OpenAI also released extensive migration and feature guides but faced some rollout issues including a broken code sample and a problematic Voice Mode launch. *&quot;Unified GPT-5&quot; ends model pickers, pushing developers away from manual model selection.*</description><pubDate>Fri, 08 Aug 2025 05:44:39 GMT</pubDate><category>openai</category><category>microsoft</category><category>gpt-5</category><category>gpt-4o</category><category>grok-4</category><category>claude-4-sonnet</category><category>sama</category><category>nickaturley</category><category>elaineyale6</category><category>scaling01</category><category>mustafasuleyman</category><category>kevinweil</category><category>omarsar0</category><category>jeremyphoward</category><category>juberti</category><category>epochairesearch</category><category>lechmazur</category><category>gdb</category><category>reasoning</category><category>latency</category><category>model-routing</category><category>benchmarking</category><category>reinforcement-learning</category><category>hallucination-control</category><category>creative-writing</category><category>priority-processing</category><category>api-traffic</category><category>model-deprecation</category><category>user-experience</category><category>model-selection</category><category>voice-mode</category><category>documentation</category></item><item><title>OpenAI rolls out GPT-5 and GPT-5 Thinking to &gt;1B users worldwide; -mini and -nano help claim Pareto Frontier</title><link>https://news.smol.ai/issues/25-08-07-gpt-5/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-07-gpt-5/</guid><description>**OpenAI** launched **GPT-5**, a unified system featuring a fast main model and a deeper 
thinking model with a real-time router, supporting up to **400K context length** and aggressive pricing that reclaims the Pareto Frontier of Intelligence. The rollout includes variants like **gpt-5-mini** and **gpt-5-nano** with significant cost reductions, and integrations with products such as **ChatGPT**, **Cursor AI**, **JetBrains AI Assistant**, **Microsoft Copilot**, **Notion AI**, and **Perplexity AI**. Benchmarks show GPT-5 performing strongly in coding and long-context reasoning, roughly matching **Claude 4.1 Sonnet/Opus** on SWE-bench Verified. The launch was accompanied by a GPT-5 prompting cookbook and notable community discussions on pricing and performance.</description><pubDate>Thu, 07 Aug 2025 05:44:39 GMT</pubDate><category>openai</category><category>cursor_ai</category><category>jetbrains</category><category>microsoft</category><category>notion</category><category>perplexity_ai</category><category>factoryai</category><category>gpt-5</category><category>gpt-5-mini</category><category>gpt-5-nano</category><category>claude-4.1-sonnet</category><category>claude-4.1-opus</category><category>sama</category><category>scaling01</category><category>jeffintime</category><category>embirico</category><category>mustafasuleyman</category><category>cline</category><category>lmarena_ai</category><category>nrehiew_</category><category>ofirpress</category><category>sauers_</category><category>model-architecture</category><category>context-windows</category><category>pricing-models</category><category>coding</category><category>long-context</category><category>prompt-engineering</category><category>model-benchmarking</category><category>model-integration</category><category>tool-use</category><category>reasoning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-08-06-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-06-not-much/</guid><description>**OpenAI** released its first open models since 
GPT-2, **gpt-oss-120b** and **gpt-oss-20b**, which quickly trended on **Hugging Face**. **Microsoft** supports these models via **Azure AI Foundry** and **Windows Foundry Local**. Key architectural innovations include **sliding window attention**, **mixture of experts (MoE)**, a **RoPE variant**, and a **256k context length**. The models use a new **MXFP4** format supported by **llama.cpp**. Hypotheses suggest **gpt-oss** was trained on **synthetic data** to enhance safety and performance, supporting the **Reasoning Core Hypothesis**. **OpenAI** announced a **$500K bounty** for red teaming with partners including **Anthropic**, **Google**, and the **UK AISI**. Performance critiques highlight inconsistent benchmarking results, with **GPT-OSS-120B** scoring **41.8%** on the **Aider Polyglot** coding benchmark, trailing competitors like **Kimi-K2** and **DeepSeek-R1**. Some users note the model excels in math and reasoning but lacks common sense and practical utility.</description><pubDate>Wed, 06 Aug 2025 05:44:39 
GMT</pubDate><category>openai</category><category>huggingface</category><category>microsoft</category><category>llamaindex</category><category>ollama</category><category>baseten</category><category>fireworksai</category><category>cerebras</category><category>groq</category><category>together</category><category>anthropic</category><category>google</category><category>uk-aisi</category><category>gpt-oss-120b</category><category>gpt-oss-20b</category><category>kimi-k2</category><category>deepseek-r1</category><category>qwen-3-32b</category><category>woj_zaremba</category><category>sama</category><category>huybery</category><category>drjimfan</category><category>jxmnop</category><category>scaling01</category><category>arunv30</category><category>kevinweil</category><category>xikun_zhang_</category><category>jerryjliu0</category><category>ollama</category><category>basetenco</category><category>reach_vb</category><category>gneubig</category><category>shxf0072</category><category>_lewtun</category><category>sliding-window-attention</category><category>mixture-of-experts</category><category>rope</category><category>context-length</category><category>mxfp4-format</category><category>synthetic-data</category><category>reasoning-core-hypothesis</category><category>red-teaming</category><category>benchmarking</category><category>coding-benchmarks</category><category>model-performance</category><category>fine-tuning</category></item><item><title>OpenAI&apos;s gpt-oss 20B and 120B, Claude Opus 4.1, DeepMind Genie 3</title><link>https://news.smol.ai/issues/25-08-05-gpt-oss/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-05-gpt-oss/</guid><description>**OpenAI** released the **gpt-oss** family, including **gpt-oss-120b** and **gpt-oss-20b**, their first open-weight models since GPT-2, designed for agentic tasks and licensed under **Apache 2.0**. These models use a **Mixture-of-Experts (MoE)** architecture with wide vs. 
deep design and innovative features like bias units in attention and a unique swiglu variant. The **120B** model was trained with about **2.1 million H100 GPU hours**. Meanwhile, **Anthropic** launched **claude-4.1-opus**, touted as the best coding model currently. **DeepMind** showcased **genie-3**, a realtime world simulation model with minute-long consistency. The releases highlight advances in open-weight models, reasoning capabilities, and world simulation. Key figures like **@sama**, **@rasbt**, and **@SebastienBubeck** provided technical insights and performance evaluations, noting strengths and hallucination risks.</description><pubDate>Tue, 05 Aug 2025 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>google-deepmind</category><category>gpt-oss-120b</category><category>gpt-oss-20b</category><category>gpt-oss</category><category>claude-4.1-opus</category><category>claude-4.1</category><category>genie-3</category><category>sama</category><category>rasbt</category><category>sebastienbubeck</category><category>polynoamial</category><category>kaicathyc</category><category>finbarrtimbers</category><category>vikhyatk</category><category>scaling01</category><category>teortaxestex</category><category>mixture-of-experts</category><category>model-architecture</category><category>agentic-ai</category><category>model-training</category><category>model-performance</category><category>reasoning</category><category>hallucination-detection</category><category>gpu-optimization</category><category>open-weight-models</category><category>realtime-simulation</category></item><item><title>Qwen-Image: SOTA text rendering + 4o-imagegen-level Editing Open Weights MMDiT</title><link>https://news.smol.ai/issues/25-08-04-qwen-image/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-04-qwen-image/</guid><description>**Alibaba** surprised with the release of **Qwen-Image**, a **20B MMDiT** model excelling at bilingual text rendering 
and graphic poster creation, with open weights and demos available. **Google DeepMind** launched **Gemini 2.5 Deep Think** to Ultra subscribers, showing significant reasoning improvements and benchmark gains (+11.2% AIME, +13.2% HLE, +13.4% LiveCodeBench) rivaling **OpenAI&apos;s o3 Pro**. ByteDance&apos;s **SeedProver** achieved state-of-the-art math theorem proving results, surpassing DeepMind&apos;s AlphaGeometry2. OpenAI is developing a &quot;universal verifier&quot; for math and coding gains transfer. Competitive reasoning benchmarks and game arenas by Google and Kaggle highlight a meta-shift in reasoning model efficiency, comparable to the original Transformer leap. Other open-weight models gaining momentum include **GLM-4.5**, **XBai o4**, and **Tencent Hunyuan** with a focus on efficient training. *&quot;Qwen is all you need.&quot;*</description><pubDate>Mon, 04 Aug 2025 05:44:39 GMT</pubDate><category>alibaba</category><category>google-deepmind</category><category>openai</category><category>bytedance</category><category>kaggle</category><category>tencent</category><category>qwen-image</category><category>mmdit</category><category>gemini-2.5</category><category>o3-pro</category><category>seedprover</category><category>glm-4.5</category><category>xbai-o4</category><category>hunyuan</category><category>swyx</category><category>demishassabis</category><category>tulseedoshi</category><category>mparakhin</category><category>teortaxestex</category><category>cgeorgiaw</category><category>dorialexander</category><category>steph_palazzolo</category><category>corbtt</category><category>synthwavedd</category><category>epochairesearch</category><category>bilingual-text-rendering</category><category>image-generation</category><category>image-editing</category><category>synthetic-data</category><category>reasoning</category><category>math-theorem-proving</category><category>benchmarking</category><category>instruction-following</category><category>model-efficiency</catego
ry><category>open-weight-models</category><category>model-transparency</category><category>competitive-evaluation</category></item><item><title>Gemini 2.5 Deep Think finally ships</title><link>https://news.smol.ai/issues/25-08-01-deep-think/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-08-01-deep-think/</guid><description>**OpenAI** is rumored to soon launch new **GPT-OSS** and **GPT-5** models amid drama with **Anthropic** revoking access to **Claude**. **Google DeepMind** quietly launched **Gemini 2.5 Deep Think**, a model optimized for parallel thinking that achieved gold-medal level at the IMO and excels in reasoning, coding, and creative tasks. Leaks suggest **OpenAI** is developing a **120B MoE** and a **20B** model with advanced attention mechanisms. Chinese AI companies like **Kimi Moonshot**, **Alibaba**, and **Zhipu AI** are releasing faster and more capable open models such as **kimi-k2-turbo-preview**, **Qwen3-Coder-Flash**, and **GLM-4.5**, signaling strong momentum and potential to surpass the U.S. in AI development. 
*&quot;The final checkpoint was selected just 5 hours before the IMO problems were released,&quot;* highlighting rapid development cycles.</description><pubDate>Fri, 01 Aug 2025 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>google-deepmind</category><category>kimi-moonshot</category><category>alibaba</category><category>ollama</category><category>zhipu-ai</category><category>stepfun</category><category>gemini-2.5-deep-think</category><category>gpt-oss</category><category>gpt-5</category><category>kimi-k2-turbo-preview</category><category>qwen3-coder-flash</category><category>glm-4.5</category><category>step-3</category><category>claude</category><category>demishassabis</category><category>philschmid</category><category>scaling01</category><category>teortaxestex</category><category>teknium1</category><category>lmarena_ai</category><category>andrewyng</category><category>parallel-thinking</category><category>model-releases</category><category>moe</category><category>attention-mechanisms</category><category>multimodal-reasoning</category><category>model-performance</category><category>context-windows</category><category>open-source-models</category><category>model-leaks</category><category>creative-ai</category><category>coding</category><category>reasoning</category><category>model-optimization</category></item><item><title>Figma&apos;s $50+b IPO</title><link>https://news.smol.ai/issues/25-07-31-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-31-not-much/</guid><description>**OpenAI**&apos;s stealth model **horizon-alpha** on **OpenRouter** sparks speculation as a precursor to **GPT-5**, showing strong reasoning and SVG generation capabilities, comparable to **Gemini 2.5 Pro**. **Alibaba** released the **Qwen3-Coder** family, including a fast **Qwen3-Coder-Flash (30B-A3B)** variant with agentic features and 1M context length support via **UnslothAI**. 
**Cohere** launched **Command A Vision**, a 111B parameter open-weights vision-language model outperforming **GPT-4.1** and **Llama 4 Maverick** on enterprise benchmarks. **Black Forest Labs** introduced **FLUX.1 Krea [dev]**, an open-weights photorealism model compatible with fine-tuning tools like **diffusers** and **ostrisai**. **Zhipu AI** unveiled **GLM-4.5**, a hybrid reasoning open model with agentic capabilities available on **Together AI**. Discussions highlight the rising importance of **inference-time training** and **reasoning model generalization**. **Mistral AI** released the technical report for **Voxtral** continuing its open science efforts.</description><pubDate>Thu, 31 Jul 2025 05:44:39 GMT</pubDate><category>openai</category><category>openrouter</category><category>alibaba</category><category>unslothai</category><category>cohere</category><category>huggingface</category><category>black-forest-labs</category><category>diffusers</category><category>ostrisai</category><category>zhipu-ai</category><category>together-ai</category><category>mistral-ai</category><category>horizon-alpha</category><category>gpt-5</category><category>gemini-2.5-pro</category><category>qwen3-coder</category><category>qwen3-coder-flash-30b-a3b</category><category>command-a-vision</category><category>gpt-4.1</category><category>llama-4-maverick</category><category>flux-1-krea-dev</category><category>glm-4.5</category><category>voxtral</category><category>scaling01</category><category>teortaxestex</category><category>huybery</category><category>nickfrosst</category><category>aidangomez</category><category>reach_vb</category><category>zai_org</category><category>corbtt</category><category>jxmnop</category><category>teknuim1</category><category>reasoning</category><category>svg-generation</category><category>agentic-ai</category><category>context-windows</category><category>vision</category><category>fine-tuning</category><category>inference-time-training</category><category>mod
el-generalization</category><category>open-models</category><category>technical-reports</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-30-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-30-not-much/</guid><description>**Chinese AI labs** have released powerful open-source models like **GLM-4.5** and **GLM-4.5-Air** from **Zhipu AI**, **Qwen3 Coder** and **Qwen3-235B** from **Alibaba**, and **Kimi K2** from **Moonshot AI**, highlighting a surge in permissively licensed models. **Zhipu AI&apos;s GLM-4.5** is a 355B parameter MoE model competitive with **Claude 4 Opus** and **Gemini 2.5 Pro**. **Alibaba&apos;s Qwen3 Coder** shows strong code generation performance with a low edit failure rate, while **Moonshot AI&apos;s Kimi K2** is a 1 trillion-parameter MoE model surpassing benchmarks like **LiveCodeBench**. In video and image generation, **xAI** launched **Grok Imagine**, and **Wan2.2** impressed with innovative image-to-video generation. Robotics advances include **Figure&apos;s Figure-01 and Figure-02** humanoid robots and **ViTPose++** for pose estimation in basketball analysis. **SmolLM3** training and evaluation code was fully released under Apache 2.0. **OpenAI** introduced **Study Mode** in **ChatGPT** to enhance interactive learning, and **Runway** rolled out **Runway Aleph**, a new in-context video model for multi-task visual generation. The community notes a competitive disadvantage for organizations avoiding these Chinese open-source models. 
*&quot;Orgs avoiding these models are at a significant competitive disadvantage,&quot;* noted by @corbtt.</description><pubDate>Wed, 30 Jul 2025 05:44:39 GMT</pubDate><category>zhipu-ai</category><category>alibaba</category><category>moonshot-ai</category><category>x-ai</category><category>figure</category><category>openai</category><category>runway</category><category>mlx</category><category>ollama</category><category>deeplearningai</category><category>glm-4.5</category><category>glm-4.5-air</category><category>qwen3-coder</category><category>qwen3-235b</category><category>kimi-k2</category><category>grok-imagine</category><category>wan-2.2</category><category>smollm3</category><category>figure-01</category><category>figure-02</category><category>vitpose++</category><category>chatgpt</category><category>yuchenj_uw</category><category>corbtt</category><category>reach_vb</category><category>ollama</category><category>deeplearningai</category><category>gdb</category><category>sama</category><category>c_valenzuelab</category><category>adcock_brett</category><category>skalskip92</category><category>loubnabenallal1</category><category>hojonathanho</category><category>ostrisai</category><category>model-releases</category><category>model-performance</category><category>moe</category><category>image-generation</category><category>video-generation</category><category>pose-estimation</category><category>robotics</category><category>training-code-release</category><category>interactive-learning</category><category>in-context-learning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-29-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-29-not-much/</guid><description>**Chinese labs** have released a wave of powerful, permissively licensed models in July, including **Zhipu AI&apos;s GLM-4.5** and **GLM-4.5-Air**, **Alibaba&apos;s Qwen3 Coder** and **Qwen3-235B**, and **Moonshot AI&apos;s Kimi K2**. 
These models feature large-scale Mixture of Experts architectures with active parameters ranging from 3B to 32B and context windows up to 256K tokens. **Zhipu AI&apos;s GLM-4.5** competes with **Claude 4 Opus** and **Gemini 2.5 Pro** in benchmarks. **Moonshot AI&apos;s Kimi K2** is a 1 trillion-parameter MoE model surpassing other open-weight models on **LiveCodeBench** and **AceBench**. In video and image generation, **xAI** launched **Grok Imagine**, and **Wan2.2** impressed with its Image-to-Video approach. **Ideogram** released a character consistency model. Robotics advances include **Figure&apos;s Figure-01 and Figure-02** humanoid robots and **ViTPose++** for pose estimation in basketball analysis. The **SmolLM3** training and evaluation code was fully released under an Apache 2.0 license. *&quot;Orgs avoiding these Chinese open-source models are at a significant competitive disadvantage,&quot;* noted by @corbtt.</description><pubDate>Tue, 29 Jul 2025 05:44:39 GMT</pubDate><category>zhipu-ai</category><category>alibaba</category><category>moonshot-ai</category><category>x-ai</category><category>ideogram</category><category>figure</category><category>smollm</category><category>openai</category><category>glm-4.5</category><category>glm-4.5-air</category><category>qwen3-coder</category><category>qwen3-235b</category><category>kimi-k2</category><category>wan-2.2</category><category>grok-imagine</category><category>smollm3</category><category>figure-01</category><category>figure-02</category><category>vitpose++</category><category>yuchenj_uw</category><category>corbtt</category><category>cline</category><category>reach_vb</category><category>ollama</category><category>deeplearningai</category><category>ostrisai</category><category>hojonathanho</category><category>adcock_brett</category><category>skalskip92</category><category>loubnabenallal1</category><category>model-releases</category><category>moe</category><category>model-benchmarking</category><category>image-
generation</category><category>video-generation</category><category>pose-estimation</category><category>robotics</category><category>training-code-release</category><category>apache-license</category></item><item><title>GLM-4.5: Deeper, Headier, &amp; better than Kimi/Qwen/DeepSeek (SOTA China LLM?)</title><link>https://news.smol.ai/issues/25-07-28-glm-45/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-28-glm-45/</guid><description>**Z.ai** (Zhipu AI) released the **GLM-4.5-355B-A32B** and **GLM-4.5-Air-106B-A12B** open weights models, claiming state-of-the-art performance competitive with **Claude 4 Opus**, **Grok 4**, and OpenAI&apos;s **o3**. These models emphasize token efficiency and efficient reinforcement learning training validated by the Muon optimizer. **Alibaba Qwen** introduced **Group Sequence Policy Optimization (GSPO)**, a new reinforcement learning algorithm powering the **Qwen3** model suite, integrated into Hugging Face&apos;s TRL library. Speculation surrounds mystery models &quot;summit&quot; and &quot;zenith&quot; as potential **GPT-5** variants based on **GPT-4.1** architecture. **Qwen3-Coder** shows strong coding benchmark results, rivaling **Claude Sonnet 4** and **Kimi K2**. 
The rise of powerful Chinese open-source models like **GLM-4.5**, **Wan-2.2**, and **Qwen3 Coder** contrasts with a slowdown from Western labs such as **OpenAI**.</description><pubDate>Mon, 28 Jul 2025 05:44:39 GMT</pubDate><category>z-ai</category><category>alibaba</category><category>huggingface</category><category>openai</category><category>glm-4.5-355b-a32b</category><category>glm-4.5-air-106b-a12b</category><category>qwen3-coder</category><category>claude-4-opus</category><category>grok-4</category><category>o3</category><category>gpt-4.1</category><category>gpt-5</category><category>kimi-k2</category><category>claude-sonnet-4</category><category>lupantech</category><category>teortaxestex</category><category>mervenoyann</category><category>_lewtun</category><category>scaling01</category><category>cline</category><category>reinforcement-learning</category><category>token-efficiency</category><category>model-optimization</category><category>open-source-models</category><category>agentic-ai</category><category>coding</category><category>model-training</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-25-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-25-not-much/</guid><description>**OpenAI** has fully rolled out its ChatGPT agent to all Plus, Pro, and Team users and is building hype for the upcoming **GPT-5**, which reportedly outperforms **Grok-4** and can build a cookie clicker game in two minutes. **Alibaba&apos;s Qwen** team released the open-source reasoning model **Qwen3-235B-Thinking**, achieving an **89%** win rate over **gpt4-0314** using a new RL algorithm called **Group Sequence Policy Optimization (GSPO)**. **Runway** introduced **Runway Aleph**, a state-of-the-art in-context video model for editing and generating video content. **Hugging Face** highlights the growing momentum of open-source AI, especially from Chinese teams. 
Other updates include **Kling&apos;s** upgrades for image-to-video generation and **Google&apos;s Imagen 4 Ultra** being recognized as a top text-to-image model. **Anthropic** integrated **Claude** with **Canva** for branded visual designs but faces stability issues. The **PyTorch** team released optimized checkpoints for **SmolLM3** to speed up inference.</description><pubDate>Fri, 25 Jul 2025 05:44:39 GMT</pubDate><category>openai</category><category>alibaba</category><category>runway</category><category>hugging-face</category><category>google</category><category>anthropic</category><category>pytorch</category><category>lmarena</category><category>gpt-5</category><category>gpt4-0314</category><category>qwen3-235b-thinking</category><category>runway-aleph</category><category>imagen-4-ultra</category><category>smollm3</category><category>grok-4</category><category>sama</category><category>clementdelangue</category><category>xikun_zhang_</category><category>teknnium1</category><category>chujiezheng</category><category>reinforcement-learning</category><category>reasoning</category><category>video-generation</category><category>image-generation</category><category>model-optimization</category><category>open-source</category><category>model-performance</category><category>inference-speed</category><category>integration</category><category>stability</category></item><item><title>3x in 3 months: Cursor @ $28b, Cognition + Windsurf @ $10b</title><link>https://news.smol.ai/issues/25-07-24-cogsurf-cursor/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-24-cogsurf-cursor/</guid><description>**Cursor** is reportedly fundraising at a **$28 billion valuation with $1 billion ARR**, while the combined **Cognition+Windsurf** entity is fundraising at a **$10 billion valuation** after acquiring Windsurf remainco for $300 million. The competition between AI coding agents intensifies as Cursor focuses on Async SWE Agents and Cognition+Windsurf acquires an agentic IDE. 
**Alibaba&apos;s Qwen3-Coder** gains widespread adoption for coding tasks and integration into tools like **Claude Code** and **LM Studio**. **OpenAI** rolls out **ChatGPT Agent** to all Plus, Pro, and Team users, sparking discussions about an &quot;agentic economy&quot; emphasizing **AI literacy**. **Anthropic&apos;s Claude Code** is praised as a premier development tool with active community feedback. **Perplexity&apos;s Comet browser assistant** receives positive reviews and new feature showcases. The debate continues on whether AI coding tools will replace developers, with critiques highlighting the ongoing human effort required. A new minimalistic software engineering agent, **mini**, achieves 65% on SWE-bench with just 100 lines of code.</description><pubDate>Thu, 24 Jul 2025 05:44:39 GMT</pubDate><category>cursor</category><category>cognition</category><category>windsurf</category><category>alibaba</category><category>openai</category><category>anthropic</category><category>perplexity</category><category>qwen3-coder</category><category>chatgpt-agent</category><category>claude-code</category><category>mini</category><category>bindureddy</category><category>xikun_zhang_</category><category>aravsrinivas</category><category>gergelyorosz</category><category>jeremyphoward</category><category>agentic-ai</category><category>fundraising</category><category>software-engineering</category><category>ai-coding</category><category>agentic-economy</category><category>model-integration</category><category>community-feedback</category><category>performance-benchmarking</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-23-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-23-not-much/</guid><description>**Alibaba** announced the release of **Qwen3-Coder-480B-A35B-Instruct**, an open agentic code model with **480B** parameters and **256K** context length, praised for rapid development and strong coding 
performance. Benchmark claims of **41.8% on ARC-AGI-1** faced skepticism from **François Chollet** and others due to reproducibility issues. The model was quickly integrated into ecosystems like **vLLM**, **Dynamic GGUFs**, and **OpenRouterAI**. The **White House** unveiled a new **AI Action Plan** emphasizing **Innovation**, **Infrastructure**, and **International Diplomacy**, linking AI leadership to national security and prioritizing compute access for the **Department of Defense**. The plan sparked debate on open vs. closed-source AI, with calls from **Clement Delangue** to embrace open science to maintain US AI competitiveness.</description><pubDate>Wed, 23 Jul 2025 05:44:39 GMT</pubDate><category>alibaba</category><category>openrouterai</category><category>togethercompute</category><category>vllm_project</category><category>unslothai</category><category>white-house</category><category>qwen3-coder-480b-a35b-instruct</category><category>kimi-k2</category><category>fchollet</category><category>clementdelangue</category><category>scaling01</category><category>aravsrinivas</category><category>rasbt</category><category>gregkamradt</category><category>yuchenj_uw</category><category>code-generation</category><category>benchmarking</category><category>model-integration</category><category>context-windows</category><category>open-source</category><category>national-security</category><category>infrastructure</category><category>ai-policy</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-22-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-22-not-much/</guid><description>**Moonshot AI** released the **Kimi K2**, a 1-trillion parameter ultra-sparse Mixture-of-Experts (MoE) model with the **MuonClip** optimizer and a large-scale agentic data pipeline using over **20,000 tools**. 
Shortly after, **Alibaba** updated its **Qwen3** model with the **Qwen3-235B-A22B** variant, which outperforms Kimi K2 and other top models on benchmarks like **GPQA** and **AIME** despite being 4.25x smaller. Alibaba also released **Qwen3-Coder-480B-A35B**, a MoE model specialized for coding with a 1 million token context window. **Google DeepMind** launched **Gemini 2.5 Flash-Lite**, a faster and more cost-efficient model outperforming previous versions in coding, math, and multimodal tasks. The MoE architecture is becoming mainstream, with models like **Mistral**, **DeepSeek**, and **Kimi K2** leading the trend. In mathematics, an advanced **Gemini** model achieved a gold medal level score at the **International Mathematical Olympiad (IMO)**, marking a first for AI. An **OpenAI** researcher noted their IMO model &quot;knew&quot; when it did not have a correct solution, highlighting advances in model reasoning and self-awareness.</description><pubDate>Tue, 22 Jul 2025 05:44:39 GMT</pubDate><category>moonshot-ai</category><category>alibaba</category><category>google</category><category>google-deepmind</category><category>openai</category><category>hugging-face</category><category>vllm-project</category><category>kimi-k2</category><category>qwen3-235b-a22b</category><category>qwen3-coder-480b-a35b</category><category>gemini-2.5-flash-lite</category><category>mistral-7b</category><category>deepseek-v3</category><category>demishassabis</category><category>rasbt</category><category>alexwei_</category><category>yitayml</category><category>mixture-of-experts</category><category>agentic-ai</category><category>model-optimization</category><category>model-training</category><category>benchmarking</category><category>code-generation</category><category>long-context</category><category>multimodality</category><category>math</category><category>reinforcement-learning</category><category>model-architecture</category><category>model-performance</category><category>open-source</c
ategory><category>alignment</category></item><item><title>OAI and GDM announce IMO Gold-level results with natural language reasoning, no specialized training or tools, under human time limits</title><link>https://news.smol.ai/issues/25-07-21-imo-gold/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-21-imo-gold/</guid><description>**OpenAI** and **Google DeepMind** achieved a major milestone by solving 5 out of 6 problems at the **International Mathematical Olympiad (IMO) 2025** within the human time limit of 4.5 hours, earning the IMO Gold medal. This breakthrough was accomplished using general-purpose reinforcement learning and pure in-weights reasoning without specialized tools or internet access, surpassing previous systems like AlphaProof and AlphaGeometry2. The success resolved a 3-year-old AI bet on AI&apos;s capability to solve IMO problems and sparked discussions among mathematicians including **Terence Tao**. Despite this, 26 human competitors remain better than AI on the hardest combinatorics problem (P6). 
The achievement highlights advances in **reinforcement-learning**, **reasoning**, and **model-scaling** in AI research.</description><pubDate>Mon, 21 Jul 2025 05:44:39 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>gemini-1.5-pro</category><category>o1</category><category>terence_tao</category><category>oriol_vinyals</category><category>alexander_wei</category><category>jerry_tworek</category><category>paul_christiano</category><category>eliezer_yudkowsky</category><category>reinforcement-learning</category><category>reasoning</category><category>model-scaling</category><category>fine-tuning</category><category>model-training</category><category>benchmarking</category><category>natural-language-processing</category></item><item><title>ChatGPT Agent: new o* model + unified Deep Research browser + Operator computer use + Code Interpreter terminal</title><link>https://news.smol.ai/issues/25-07-17-chatgpt-agent/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-17-chatgpt-agent/</guid><description>**OpenAI** launched the **ChatGPT Agent**, a new advanced AI system capable of browsing the web, coding, analyzing data, and creating reports, marking a significant step towards human-like computer use. The agent, distinct from and superior to **o3**, is considered the first public exposure of what was internally called **o4**, now merged into **GPTNext**. It features end-to-end reinforcement learning, can operate for extended periods (tested up to 2 hours), and is classified as &quot;High&quot; risk for biological misuse, with safeguards activated. Early benchmarks show mixed results, excelling in some tests like **WebArena** and **BrowseComp** but underperforming on others like **PaperBench**. Key figures involved include **Sam Altman**, **Greg Brockman**, and **Kevin Weil**, with technical insights from **xikun_zhang_** and risk commentary from **KerenGu** and **boazbaraktcs**. 
The launch sparked speculation about **GPT-5**, which was confirmed not to be the case.</description><pubDate>Thu, 17 Jul 2025 05:44:39 GMT</pubDate><category>openai</category><category>o3</category><category>o4</category><category>gptnext</category><category>sama</category><category>gdb</category><category>kevinweil</category><category>xikun_zhang_</category><category>keren_gu</category><category>boazbaraktcs</category><category>reinforcement-learning</category><category>benchmarking</category><category>model-performance</category><category>model-risk</category><category>long-context</category><category>model-deployment</category><category>fine-tuning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-16-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-16-not-much/</guid><description>**Mistral** released **Voxtral**, claimed as the world&apos;s best open speech recognition models, available via API and Hugging Face. **Moonshot AI** launched **Kimi K2**, a trillion-parameter **Mixture-of-Experts (MoE)** model, outperforming **GPT-4.1** on benchmarks with 65.4% on SWE-Bench Verified and achieving 200 tokens/second inference speed on **Groq** hardware. **Nous Research** open-sourced the **Hermes 3** dataset with 1 million samples, aiding SOTA models on the **Llama-3** series. **Google DeepMind** introduced the **Mixture-of-Recursions (MoR)** architecture promising 2x inference speed and 50% parameter reduction but faced skepticism. **Goedel-Prover V2** topped the **PutnamBench** theorem proving benchmark. AtCoder World Finals saw a human winner with **OpenAI** placing second. 
Research highlights include **Jason Wei**&apos;s insights on **reinforcement learning** and the &quot;Verifier&apos;s Law&quot; emphasizing the asymmetry of verification in AI training.</description><pubDate>Wed, 16 Jul 2025 05:44:39 GMT</pubDate><category>mistral-ai</category><category>moonshot-ai</category><category>nous-research</category><category>google-deepmind</category><category>openai</category><category>groq</category><category>anthropic</category><category>kimi-k2</category><category>gpt-4.1</category><category>voxtral</category><category>goedel-prover-v2</category><category>llama-3</category><category>cline</category><category>_jasonwei</category><category>speech-recognition</category><category>mixture-of-experts</category><category>benchmarking</category><category>dataset-release</category><category>model-architecture</category><category>theorem-proving</category><category>reinforcement-learning</category><category>asymmetry-of-verification</category><category>inference-speed</category><category>model-performance</category></item><item><title>Voxtral - Mistral&apos;s SOTA ASR model in 3B (mini) and 24B (&quot;small&quot;) sizes beats OpenAI Whisper large-v3</title><link>https://news.smol.ai/issues/25-07-15-voxtral/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-15-voxtral/</guid><description>**Mistral** surprises with the release of **Voxtral**, a transcription model outperforming **Whisper large-v3**, **GPT-4o mini Transcribe**, and **Gemini 2.5 Flash**. Voxtral models (3B and 24B) support **32k token context length**, handle audios up to **30-40 minutes**, offer built-in **Q&amp;A and summarization**, are **multilingual**, and enable **function-calling** from voice commands, powered by the **Mistral Small 3.1** language model backbone. 
Meanwhile, **Moonshot AI**&apos;s **Kimi K2**, a non-reasoning **Mixture of Experts (MoE)** model built by a team of around **200 people**, gains attention for blazing-fast inference on **Groq** hardware, broad platform availability including **Together AI** and **DeepInfra**, and local running on **M4 Max 128GB** Mac. Developer tool integrations include **LangChain** and Hugging Face support, highlighting Kimi K2&apos;s strong tool use capabilities.</description><pubDate>Tue, 15 Jul 2025 05:44:39 GMT</pubDate><category>mistral-ai</category><category>moonshot-ai</category><category>groq</category><category>together-ai</category><category>deepinfra</category><category>huggingface</category><category>langchain</category><category>voxtal-3b</category><category>voxtal-24b</category><category>kimi-k2</category><category>jeremyphoward</category><category>teortaxestex</category><category>scaling01</category><category>zacharynado</category><category>jonathanross321</category><category>reach_vb</category><category>philschmid</category><category>transcription</category><category>long-context</category><category>function-calling</category><category>multilingual-models</category><category>mixture-of-experts</category><category>inference-speed</category><category>developer-tools</category><category>model-integration</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-14-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-14-not-much/</guid><description>**Cognition** is acquiring the remaining assets of **Windsurf** after a significant weekend deal. **Moonshot AI** released **Kimi K2**, an open-source, MIT-licensed agentic model with **1 Trillion total / 32B active parameters** using a Mixture-of-Experts architecture, trained on **15.5 Trillion tokens** with the **MuonClip** optimizer, showing top performance on benchmarks like **EQ-Bench** and **Creative Writing**. 
**xAI** launched **Grok-4**, ranking 5th on **IQ Bench** but with notable quirks including a bug causing it to respond only with &quot;Heavy&quot; and a high frequency of Elon Musk mentions. Rumors about **OpenAI** delaying an open-source model release surfaced, with speculation about CEO **sama**&apos;s PR strategy and a possible **GPT-5** launch in September. The **Gemini 2.5** paper was released with **3,295 authors**, and **Google** introduced its **Gemini Embedding** model, topping the **MTEB leaderboard**.</description><pubDate>Mon, 14 Jul 2025 05:44:39 GMT</pubDate><category>cognition</category><category>windsurf</category><category>moonshot-ai</category><category>x-ai</category><category>openai</category><category>google</category><category>stanfordnlp</category><category>huggingface</category><category>kimi-k2</category><category>grok-4</category><category>gpt-5</category><category>gemini-2.5</category><category>gemini-embedding</category><category>sama</category><category>hardmaru</category><category>jeremyphoward</category><category>akhaliq</category><category>teortaxestex</category><category>yuchenj_uw</category><category>demishassabis</category><category>mixture-of-experts</category><category>model-training</category><category>model-performance</category><category>fine-tuning</category><category>benchmarking</category><category>agentic-ai</category><category>model-bugs</category><category>embedding-models</category></item><item><title>Kimi K2 - SOTA Open MoE proves that Muon can scale to 15T tokens/1T params</title><link>https://news.smol.ai/issues/25-07-11-kimi-k2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-11-kimi-k2/</guid><description>**Moonshot AI** has released **Kimi K2**, a **1 trillion parameter** Mixture-of-Experts model trained on **15.5 trillion tokens** using the new **MuonClip** optimizer, achieving state-of-the-art results on benchmarks like **SWE-Bench Verified (65.8%)** and **TAU2 (58.4%)**. 
This model is competitive with **GPT-4.1** and **Sonnet 4** on non-thinking tasks and is available under an **MIT license**. Meanwhile, **xAI** announced **Grok-4**, noted for its &quot;LEAST censored frontier model&quot; status and strong long-context performance but criticized for rushed post-training. **Mistral AI** updated its **Devstral 2507** models with improved performance and cost efficiency. The community is excited about the potential of the **MuonClip** optimizer, which may surpass the long-standing AdamW optimizer in machine learning.</description><pubDate>Fri, 11 Jul 2025 05:44:39 GMT</pubDate><category>moonshot-ai</category><category>alibaba</category><category>tencent</category><category>deepseek</category><category>x-ai</category><category>mistral-ai</category><category>weights-biases</category><category>hugging-face</category><category>kimi-k2</category><category>kimi-k2-1t</category><category>deepseek-v3</category><category>grok-4</category><category>devstral-2507</category><category>gpt-4.1</category><category>sonnet-4</category><category>yuchenj_uw</category><category>andrew_n_carr</category><category>scaling01</category><category>novita_labs</category><category>teknium1</category><category>aravsrinivas</category><category>mparakhin</category><category>simonw</category><category>mixture-of-experts</category><category>model-training</category><category>model-optimization</category><category>optimizer</category><category>benchmarking</category><category>long-context</category><category>model-performance</category><category>open-weights</category><category>model-release</category></item><item><title>Grok 4: xAI succeeds in going from 0 to new SOTA LLM in 2 years</title><link>https://news.smol.ai/issues/25-07-10-grok-4/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-10-grok-4/</guid><description>**xAI** launched **Grok 4** and **Grok 4 Heavy**, large language models rumored to have **2.4 trillion parameters** and trained with 
**100x more compute** than Grok 2 on **100k H100 GPUs**. Grok 4 achieved new state-of-the-art results on benchmarks like **ARC-AGI-2 (15.9%)**, **HLE (50.7%)**, and **Vending-Bench**, outperforming models such as **Claude 4 Opus**. The model supports a **256K context window** and is priced at **$3.00/M input tokens** and **$15.00/M output tokens**. It is integrated into platforms like **Cursor**, **Cline**, **LangChain**, and **Perplexity Pro/Max**. The launch was accompanied by a controversial voice mode and sparked industry discussion about xAI&apos;s rapid development pace, with endorsements from figures like **Elon Musk** and **Aravind Srinivas**.</description><pubDate>Thu, 10 Jul 2025 05:44:39 GMT</pubDate><category>xai</category><category>perplexity-ai</category><category>langchain</category><category>cursor</category><category>cline</category><category>grok-4</category><category>grok-4-heavy</category><category>claude-4-opus</category><category>elonmusk</category><category>aravsrinivas</category><category>igor_babuschkin</category><category>yuchenj_uw</category><category>model-releases</category><category>benchmarking</category><category>long-context</category><category>model-pricing</category><category>model-integration</category><category>voice</category><category>performance</category><category>scaling</category><category>gpu-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-09-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-09-not-much/</guid><description>**LangChain** is nearing unicorn status, while **OpenAI** and **Google DeepMind&apos;s Gemini 3 Pro** models are launching soon. **Perplexity** rolls out its agentic browser **Comet** to waitlists, offering multitasking and voice command features. **xAI&apos;s Grok-4** update sparked controversy due to offensive outputs, drawing comparisons to **Microsoft&apos;s Tay** bot and resulting in regional blocks. 
**Hugging Face** released **SmolLM3**, a 3B parameter open-source model with state-of-the-art reasoning and long context capabilities. **Google** introduced **T5Gemma** encoder-decoder models, a significant update in this model category. **Anthropic** investigates &quot;alignment faking&quot; in language models, focusing on safety concerns with models like **Claude 3.7 Sonnet** and **DeepSeek-R1**. *&quot;Grok 3 had high reasoning, Grok 4 has heil reasoning&quot;* was a notable user comment on the controversy.</description><pubDate>Wed, 09 Jul 2025 05:44:39 GMT</pubDate><category>langchain</category><category>openai</category><category>google-deepmind</category><category>perplexity</category><category>xai</category><category>microsoft</category><category>huggingface</category><category>anthropic</category><category>grok-4</category><category>smollm3</category><category>t5gemma</category><category>claude-3.7-sonnet</category><category>deepseek-r1</category><category>aravsrinivas</category><category>clementdelangue</category><category>_akhaliq</category><category>agentic-ai</category><category>model-controversy</category><category>open-source</category><category>model-release</category><category>alignment</category><category>fine-tuning</category><category>long-context</category><category>multimodality</category><category>model-research</category></item><item><title>SmolLM3: the SOTA 3B reasoning open source LLM</title><link>https://news.smol.ai/issues/25-07-08-smollm3/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-08-smollm3/</guid><description>**HuggingFace** released **SmolLM3-3B**, a fully open-source small reasoning model with open pretraining code and data, marking a high point in open source models until **Olmo 3** arrives. **Grok 4** was launched with mixed reactions, while concerns about **Claude 4** nerfs and an imminent **Claude 4.1** surfaced. 
**Gemini Nano** is now shipping in **Chrome 137+**, enabling local LLM access for **3.7 billion** users. **Tencent** introduced **Hunyuan-A13B**, an 80B parameter model with a 256K context window running on a single **H200** GPU. The **Gemini API** added a batch mode with 50% discounts on **2.5 models**. **MatFormer Lab** launched tools for custom-sized **Gemma 3n** models. Open source OCR models like **Nanonets-OCR-s** and **ChatDOC/OCRFlux-3B** derived from **Qwen2.5-VL-3B** were highlighted, with licensing discussions involving **Alibaba**.</description><pubDate>Tue, 08 Jul 2025 05:44:39 GMT</pubDate><category>huggingface</category><category>allenai</category><category>openai</category><category>anthropic</category><category>google-deepmind</category><category>mistral-ai</category><category>tencent</category><category>gemini</category><category>alibaba</category><category>smollm3-3b</category><category>olmo-3</category><category>grok-4</category><category>claude-4</category><category>claude-4.1</category><category>gemini-nano</category><category>hunyuan-a13b</category><category>gemini-2.5</category><category>gemma-3n</category><category>qwen2.5-vl-3b</category><category>elonmusk</category><category>mervenoyann</category><category>skirano</category><category>amandaaskell</category><category>clementdelangue</category><category>loubnabenallal1</category><category>awnihannun</category><category>swyx</category><category>artificialanlys</category><category>officiallogank</category><category>osanseviero</category><category>cognitivecompai</category><category>aravsrinivas</category><category>open-source</category><category>small-language-models</category><category>model-releases</category><category>model-performance</category><category>benchmarking</category><category>multimodality</category><category>context-windows</category><category>precision-fp8</category><category>api</category><category>batch-processing</category><category>model-scaling</category><category>model-a
rchitecture</category><category>licensing</category><category>ocr</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-07-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-07-not-much/</guid><description>Over the holiday weekend, key AI developments include the upcoming release of **Grok 4**, **Perplexity** teasing new projects, and community reactions to **Cursor** and **Dia**. Research highlights feature a paper on **Reinforcement Learning (RL)** improving generalization and reasoning across domains, contrasting with Supervised Fine-Tuning&apos;s forgetting issues. **Energy-Based Transformers (EBTs)** are proposed as a promising alternative to traditional transformers. **AI21 Labs** updated its **Jamba** model family with enhanced grounding and instruction following, maintaining a **256K** context window. **Baidu** open-sourced its massive **424 billion** parameter **Ernie 4.5** model, while **Kontext-dev** became the top trending model on **Hugging Face**. Advances in length generalization for recurrent models and the introduction of **2-simplicial attention** were noted. In biomedical AI, **Biomni**, powered by **Claude 4 Sonnet**, demonstrated superior accuracy and rare disease diagnosis capabilities. 
Additionally, the Python package manager `uv` received praise for improving Python installation workflows.</description><pubDate>Mon, 07 Jul 2025 05:44:39 GMT</pubDate><category>ai21-labs</category><category>hugging-face</category><category>baidu</category><category>perplexity-ai</category><category>deepmind</category><category>anthropic</category><category>grok-4</category><category>jamba</category><category>ernie-4.5</category><category>claude-4-sonnet</category><category>claude-4</category><category>kontext-dev</category><category>_philschmid</category><category>corbtt</category><category>jxmnop</category><category>sedielem</category><category>_akhaliq</category><category>slashml</category><category>alexiglad</category><category>clementdelangue</category><category>_albertgu</category><category>tri_dao</category><category>theaitimeline</category><category>deep-learning-ai</category><category>reinforcement-learning</category><category>fine-tuning</category><category>energy-based-transformers</category><category>ssm-transformer</category><category>context-windows</category><category>length-generalization</category><category>recurrent-neural-networks</category><category>attention-mechanisms</category><category>2-simplicial-attention</category><category>biomedical-ai</category><category>instruction-following</category><category>open-weight-models</category><category>python-package-management</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-03-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-03-not-much/</guid><description>**Ilya Sutskever** confirmed his role as CEO of **Safe Superintelligence Inc. (SSI)** with **Daniel Levy** as President, dismissing acquisition rumors and emphasizing their strong team and compute resources. **Perplexity AI** expanded its data integrations by adding **Morningstar&apos;s** financial research and hinted at new product features for Pro users. 
**Meta AI FAIR** clarified its research structure, distinguishing its small lab from larger model training groups, and welcomed **Nat Friedman** to enhance AI product development. **Midjourney** and **Sakana AI** announced hiring for research and applied engineering roles. **Cohere** expanded its presence in Montréal, receiving praise from Canadian officials. On the model front, **Google DeepMind&apos;s Gemini Pro** released the **Veo 3** video generation model globally. **DeepSeek** launched the faster **DeepSeek R1T2** model using an Assembly of Experts approach, available under an MIT license. **Kling AI** showcased cinematic video generation capabilities. **OpenAI** introduced a high-cost **Deep Research API** with pricing up to **$30 per call**. **Together AI** announced the release of the **DeepSWE agent**.</description><pubDate>Thu, 03 Jul 2025 05:44:39 GMT</pubDate><category>safe-superintelligence-inc</category><category>perplexity-ai</category><category>meta-ai-fair</category><category>midjourney</category><category>sakana-ai</category><category>cohere</category><category>google-deepmind</category><category>deepseek</category><category>openai</category><category>together-ai</category><category>veo-3</category><category>deepseek-r1t2</category><category>deepseek-tng-r1t2-chimera</category><category>o3-deep-research</category><category>o4-mini-deep-research</category><category>deepswe-agent</category><category>ilya_sutskever</category><category>daniel_levy</category><category>daniel_gross</category><category>aravsrinivas</category><category>zeyuanallenzhu</category><category>nat_friedman</category><category>davidsholz</category><category>fp_champagne</category><category>demishassabis</category><category>reach_vb</category><category>video-generation</category><category>assembly-of-experts</category><category>model-licenses</category><category>api-pricing</category><category>research-roles</category><category>product-expansion</category><category>corporate-lead
ership</category><category>model-release</category><category>team-expansion</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-02-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-02-not-much/</guid><description>**Meta** has hired **Scale AI CEO Alexandr Wang** as its new **Chief AI Officer**, acquiring a **49% non-voting stake** in **Scale AI** for **$14.3 billion**, doubling its valuation to **~$28 billion**. This move is part of a major talent shuffle involving **Meta**, **OpenAI**, and **Scale AI**. Discussions include the impact on **Yann LeCun**&apos;s influence at **Meta** and potential responses from **OpenAI**. In model news, **Gemma 3N** faces technical issues like vision NaNs and FP16 overflows, with fixes from **UnslothAI**. Chinese open-source models like **GLM-4.1V-Thinking** by **Zhipu AI** and **DeepSeek R1T2** show strong performance and speed improvements. **Huawei** open-sourced a **72B MoE** model with a novel load balancing solution. The **MiniMax-M1** hybrid MoE model leads math benchmarks on the **Text Arena leaderboard**. **AllenAI** launched **SciArena** for scientific literature evaluation, where **o3** outperforms others. 
Research from **Sakana AI Labs** introduces **AB-MCTS** for code generation, improving synthesis benchmarks.</description><pubDate>Wed, 02 Jul 2025 05:44:39 GMT</pubDate><category>meta</category><category>scale-ai</category><category>unslothai</category><category>zhipu-ai</category><category>deepseek</category><category>huawei</category><category>minimax-ai</category><category>allenai</category><category>sakana-ai-labs</category><category>openai</category><category>gemma-3n</category><category>glm-4.1v-thinking</category><category>deepseek-r1t2</category><category>mini-max-m1</category><category>o3</category><category>claude-4-opus</category><category>claude-sonnet</category><category>moe-72b</category><category>alexandr_wang</category><category>natfriedman</category><category>steph_palazzolo</category><category>thegregyang</category><category>teortaxes_tex</category><category>denny_zhou</category><category>agihippo</category><category>danielhanchen</category><category>osanseviero</category><category>reach_vb</category><category>scaling01</category><category>ndea</category><category>model-performance</category><category>vision</category><category>conv2d</category><category>float16</category><category>training-loss</category><category>open-source</category><category>model-benchmarks</category><category>moe</category><category>load-balancing</category><category>scientific-literature-evaluation</category><category>code-generation</category><category>adaptive-tree-search</category><category>synthesis-benchmarks</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-07-01-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-07-01-not-much/</guid><description>**Meta** makes a major AI move by hiring **Scale AI** founder **Alexandr Wang** as Chief AI Officer and acquiring a 49% non-voting stake in **Scale AI** for **$14.3 billion**, doubling its valuation to about **$28 billion**. 
**Chai Discovery** announces **Chai-2**, a breakthrough model for zero-shot antibody discovery and optimization. The US government faces budget cuts threatening to eliminate a quarter million science research jobs by **2026**. Data access restrictions intensify as companies like **Atlassian**, **Notion**, and **Slack** block web crawlers including **Common Crawl**, raising concerns about future public internet archives. **Hugging Face** shuts down **HuggingChat** after serving over a million users, marking a significant experiment in open-source LLMs. **Sakana AI** releases **AB-MCTS**, an inference-time scaling algorithm enabling multiple models like **Gemini 2.5 Pro** and **DeepSeek-R1-0528** to cooperate and outperform individual models.</description><pubDate>Tue, 01 Jul 2025 05:44:39 GMT</pubDate><category>meta</category><category>scale-ai</category><category>anthropic</category><category>cloudflare</category><category>grammarly</category><category>superhuman</category><category>chai-discovery</category><category>atlassian</category><category>notion</category><category>slack</category><category>commoncrawl</category><category>hugging-face</category><category>sakana-ai</category><category>chai-2</category><category>gemini-2.5-pro</category><category>deepseek-r1-0528</category><category>alexandr_wang</category><category>nat_friedman</category><category>clementdelangue</category><category>teortaxestex</category><category>ylecun</category><category>steph_palazzolo</category><category>andersonbcdefg</category><category>jeremyphoward</category><category>reach_vb</category><category>inference</category><category>model-scaling</category><category>collective-intelligence</category><category>zero-shot-learning</category><category>enterprise-deployment</category><category>data-access</category><category>science-funding</category><category>open-source-llms</category></item><item><title>not much happened 
today</title><link>https://news.smol.ai/issues/25-06-30-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-30-not-much/</guid><description>**Meta** has poached top AI talent from **OpenAI**, while **Alexandr Wang** joins as Chief AI Officer to work towards superintelligence, signaling a strong push for the next **Llama** model. The AI job market shows polarization with high demand and compensation for top-tier talent, while credentials like strong GitHub projects gain importance. The **WizardLM** team moved from **Microsoft** to **Tencent** to develop open-source models like **Hunyuan-A13B**, highlighting shifts in China&apos;s AI industry. Rumors suggest **OpenAI** will release a new open-source model in July, potentially surpassing existing **ChatGPT** models. **Baidu** open-sourced multiple variants of its **ERNIE 4.5** model series, featuring advanced techniques like **2-bit quantization**, **MoE router orthogonalization loss**, and **FP8** training, with models ranging from **0.3B** to **424B** parameters. 
**Gemini 2.5 Pro** returned to the free tier of the **Gemini API**, enabling developers to explore its features.</description><pubDate>Mon, 30 Jun 2025 05:44:39 GMT</pubDate><category>meta-ai-fair</category><category>openai</category><category>tencent</category><category>microsoft</category><category>baidu</category><category>gemini</category><category>o3-mini</category><category>o1-mini</category><category>llama</category><category>hunyuan-a13b</category><category>ernie-4.5</category><category>ernie-4.5-21b-a3b</category><category>qwen3-30b-a3b</category><category>gemini-2.5-pro</category><category>alexandr_wang</category><category>shengjia_zhao</category><category>jhyuxm</category><category>ren_hongyu</category><category>shuchaobi</category><category>saranormous</category><category>teortaxesTex</category><category>mckbrando</category><category>yuchenj_uw</category><category>francoisfleuret</category><category>quanquangu</category><category>reach_vb</category><category>philschmid</category><category>superintelligence</category><category>ai-talent</category><category>job-market</category><category>open-source-models</category><category>multimodality</category><category>mixture-of-experts</category><category>quantization</category><category>fp8-training</category><category>model-benchmarking</category><category>model-performance</category><category>model-releases</category><category>api</category><category>model-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-06-27-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-27-not-much/</guid><description>**Google** released **Gemma 3n**, a multimodal model for edge devices available in **2B and 4B** parameter versions, with support across major frameworks like **Transformers** and **Llama.cpp**. 
**Tencent** open-sourced **Hunyuan-A13B**, a **Mixture-of-Experts (MoE)** model with **80B total parameters** and a **256K context window**, optimized for tool calling and coding. **Black Forest Labs** released **FLUX.1 Kontext [dev]**, an open image AI model gaining rapid Hugging Face adoption. **Inception AI Labs** launched **Mercury**, the first commercial-scale **diffusion LLM** for chat. The **FineWeb2** multilingual pre-training dataset paper was released, analyzing data quality impacts. The **Qwen** team released **Qwen-VLo**, a unified visual understanding and generation model. **Kyutai Labs** released a top-ranked open-source speech-to-text model running on Macs and iPhones. **OpenAI** introduced **Deep Research API** with **o3/o4-mini** models and open-sourced prompt rewriter methodology, integrated into **LangChain** and **LangGraph**. The open-source **Gemini CLI** gained over **30,000 GitHub stars** as an AI terminal agent.</description><pubDate>Fri, 27 Jun 2025 05:44:39 GMT</pubDate><category>google-deepmind</category><category>tencent</category><category>black-forest-labs</category><category>inception-ai</category><category>qwen</category><category>kyutai-labs</category><category>openai</category><category>langchain</category><category>langgraph</category><category>hugging-face</category><category>ollama</category><category>unslothai</category><category>nvidia</category><category>amd</category><category>gemma-3n</category><category>hunyuan-a13b</category><category>flux-1-kontext-dev</category><category>mercury</category><category>fineweb2</category><category>qwen-vlo</category><category>o3-mini</category><category>o4-mini</category><category>demishassabis</category><category>reach_vb</category><category>tri_dao</category><category>osanseviero</category><category>simonw</category><category>clementdelangue</category><category>swyx</category><category>hwchase17</category><category>sydneyrunkle</category><category>multimodality</category><category>mixture
-of-experts</category><category>context-windows</category><category>tool-use</category><category>coding</category><category>image-generation</category><category>diffusion-models</category><category>dataset-release</category><category>multilinguality</category><category>speech-to-text</category><category>api</category><category>prompt-engineering</category><category>agent-frameworks</category><category>open-source</category><category>model-release</category></item><item><title>OpenAI releases Deep Research API (o3/o4-mini)</title><link>https://news.smol.ai/issues/25-06-26-deepresearch-api/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-26-deepresearch-api/</guid><description>**OpenAI** has launched the **Deep Research API** featuring powerful models **o3-deep-research** and **o4-mini-deep-research** with native support for MCP, Search, and Code Interpreter, enabling advanced agent capabilities including multi-agent setups. **Google** released **Gemma 3n**, a multimodal model optimized for edge devices with only 3GB RAM, achieving a top score of 1300 on LMSys Arena, featuring the new MatFormer architecture and broad ecosystem integration. **Black Forest Labs** introduced **FLUX.1 Kontext [dev]**, a 12B parameter rectified flow transformer for instruction-based image editing, comparable to **GPT-4o**. **DeepMind** unveiled **AlphaGenome**, an AI model capable of reading 1 million DNA bases for gene function prediction, marking a breakthrough in AI biology. **Sakana AI** presented Reinforcement-Learned Teachers (RLTs) to enhance LLM reasoning, achieving 86.1% on MiniF2F with efficient compute. **Higgsfield AI** released **Higgsfield Soul**, a high-aesthetic photo model with 50+ presets for fashion-grade realism. 
Additionally, **Google** launched the **Gemini CLI**, an open-source AI agent for terminal use with free Gemini 2.5 Pro requests.</description><pubDate>Thu, 26 Jun 2025 05:44:39 GMT</pubDate><category>openai</category><category>google</category><category>black-forest-labs</category><category>deepmind</category><category>sakana-ai</category><category>higgsfield-ai</category><category>huggingface</category><category>ollama</category><category>o3-deep-research</category><category>o4-mini-deep-research</category><category>gemma-3n</category><category>flux-1-kontext-dev</category><category>gpt-4o</category><category>alphagenome</category><category>demishassabis</category><category>hardmaru</category><category>osanseviero</category><category>clementdelangue</category><category>multimodality</category><category>model-releases</category><category>agentic-ai</category><category>reinforcement-learning</category><category>instruction-following</category><category>model-architecture</category><category>model-optimization</category><category>image-generation</category><category>biological-ai</category><category>multi-agent-systems</category><category>model-integration</category></item><item><title>Context Engineering: Much More than Prompts</title><link>https://news.smol.ai/issues/25-06-25-context-eng/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-25-context-eng/</guid><description>**Context Engineering** emerges as a significant trend in AI, highlighted by experts like **Andrej Karpathy**, **Walden Yan** from **Cognition**, and **Tobi Lutke**. It involves managing an LLM&apos;s context window with the right mix of prompts, retrieval, tools, and state to optimize performance, going beyond traditional prompt engineering. **LangChain** and its tool **LangGraph** are noted for advancing this approach. 
Additionally, **OpenAI** has launched **ChatGPT connectors** for platforms like **Google Drive**, **Dropbox**, **SharePoint**, and **Box**, enhancing context integration for Pro users. Other notable news includes the launch of **Vercel Sandbox**, **Cloudflare Containers**, the leak and release of **Gemini Code** by **Google DeepMind**, and fundraising efforts by **OpenRouter**.</description><pubDate>Wed, 25 Jun 2025 05:44:39 GMT</pubDate><category>openai</category><category>langchain</category><category>cognition</category><category>google-deepmind</category><category>vercel</category><category>cloudflare</category><category>openrouter</category><category>gemini-code</category><category>karpathy</category><category>walden_yan</category><category>tobi_lutke</category><category>hwchase17</category><category>rlancemartin</category><category>kwindla</category><category>dex_horthy</category><category>context-engineering</category><category>retrieval-augmented-generation</category><category>tools</category><category>state-management</category><category>history-management</category><category>prompt-engineering</category><category>software-layer</category><category>chatgpt-connectors</category><category>api-integration</category></item><item><title>Bartz v. Anthropic PBC — &quot;Training use is Fair Use&quot;</title><link>https://news.smol.ai/issues/25-06-24-fair-use/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-24-fair-use/</guid><description>**Anthropic** won a significant fair use ruling allowing the training of **Claude** on copyrighted books, setting a precedent for AI training legality despite concerns over pirated data. **Replit** achieved a major milestone with **$100M ARR**, showing rapid growth. **Delphi** raised **$16M Series A** to scale digital minds, while **Thinking Machines Lab** focuses on reinforcement learning for business applications. **Disney** and **Universal** sued **Midjourney** over unauthorized use of copyrighted images. 
**Google DeepMind** released **Gemini Robotics On-Device**, a compact foundation model for robotics.</description><pubDate>Tue, 24 Jun 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>replit</category><category>delphi</category><category>sequoia</category><category>thinking-machines-lab</category><category>disney</category><category>universal</category><category>midjourney</category><category>google-deepmind</category><category>claude</category><category>gemini-robotics-on-device</category><category>andrea_bartz</category><category>giffmana</category><category>andrewcurran_</category><category>amasad</category><category>swyx</category><category>hwchase17</category><category>krandiash</category><category>daraladje</category><category>steph_palazzolo</category><category>corbtt</category><category>demishassabis</category><category>fair-use</category><category>copyright</category><category>reinforcement-learning</category><category>foundation-models</category><category>robotics</category><category>funding</category><category>lawsuit</category><category>digital-minds</category><category>model-release</category></item><item><title>Not much happened today</title><link>https://news.smol.ai/issues/25-06-23-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-23-not-much/</guid><description>**Sakana AI** released **Reinforcement-Learned Teachers (RLTs)**, a novel technique using smaller 7B parameter models trained via reinforcement learning to teach reasoning through step-by-step explanations, accelerating **Chain-of-Thought** learning. **Mistral AI** updated **Mistral Small 3.2** improving instruction following and function calling with experimental FP8 quantization. **Google Magenta RealTime**, an 800M parameter open-weights model for real-time music generation, was released. **Arcee AI** launched **AFM-4.5B**, a sub-10B parameter foundation model extended from **Llama 3**. 
**OpenThinker3-7B** was introduced as a new state-of-the-art 7B reasoning model with a 33% improvement over **DeepSeek-R1-Distill-Qwen-7B**. The **STORM** text-video model compresses video input by 8x using **Mamba layers** and outperforms **GPT-4o** on MVBench with 70.6%. Discussions on reinforcement learning algorithms PPO vs. GRPO and insights on **DINOv2**&apos;s performance on ImageNet-1k were also highlighted. *&quot;A very quiet day&quot;* in AI news with valuable workshops from **OpenAI**, **Amazon**, and **GDM**.</description><pubDate>Mon, 23 Jun 2025 05:44:39 GMT</pubDate><category>sakana-ai</category><category>mistral-ai</category><category>google</category><category>arcee-ai</category><category>deepseek-ai</category><category>openai</category><category>amazon</category><category>gdm</category><category>mistral-small-3.2</category><category>magenta-realtime</category><category>afm-4.5b</category><category>llama-3</category><category>openthinker3-7b</category><category>deepseek-r1-distill-qwen-7b</category><category>storm</category><category>qwen2-vl</category><category>gpt-4o</category><category>dino-v2</category><category>sama</category><category>reinforcement-learning</category><category>chain-of-thought</category><category>fine-tuning</category><category>function-calling</category><category>quantization</category><category>music-generation</category><category>foundation-models</category><category>reasoning</category><category>text-video</category><category>model-compression</category><category>image-classification</category><category>evaluation-metrics</category></item><item><title>The Quiet Rise of Claude Code vs Codex</title><link>https://news.smol.ai/issues/25-06-20-claude-code/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-20-claude-code/</guid><description>**Claude Code** is gaining mass adoption, inspiring derivative projects like **OpenCode** and **ccusage**, with discussions ongoing in AI communities. 
**Mistral AI** released **Mistral Small 3.2**, a **24B** parameter model update improving instruction following and function calling, available on **Hugging Face** and supported by **vLLM**. Sebastian Raschka implemented **Qwen3 0.6B** from scratch, noting its deeper architecture and memory efficiency compared to **Llama 3 1B**. **Google DeepMind** showcased **Gemini 2.5 Flash-Lite**&apos;s UI code generation from visual context and added video upload support in the **Gemini App**. **Apple**&apos;s new **3B** parameter on-device foundation model was benchmarked, showing slower speed but efficient memory use via **2-bit quantization**, suitable for background tasks. **Google DeepMind** also released **Magenta Real-time**, an **800M** parameter music generation model licensed under **Apache 2.0**, marking Google&apos;s 1000th model on **Hugging Face**. **Kuaishou** launched **KLING 2.1**, a new video model accessible via API.</description><pubDate>Fri, 20 Jun 2025 05:44:39 GMT</pubDate><category>mistral-ai</category><category>hugging-face</category><category>google-deepmind</category><category>apple</category><category>artificial-analysis</category><category>kuaishou</category><category>mistral-small-3.2</category><category>qwen3-0.6b</category><category>llama-3-1b</category><category>gemini-2.5-flash-lite</category><category>gemini-app</category><category>magenta-real-time</category><category>apple-3b-on-device</category><category>reach_vb</category><category>guillaumelample</category><category>qtnx_</category><category>shxf0072</category><category>rasbt</category><category>demishassabis</category><category>artificialanlys</category><category>osanseviero</category><category>instruction-following</category><category>function-calling</category><category>model-implementation</category><category>memory-efficiency</category><category>2-bit-quantization</category><category>music-generation</category><category>video-models</category><category>benchmarking</category><categor
y>api</category></item><item><title>minor ai followups: MultiAgents, Meta-SSI-Scale, Karpathy, AI Engineer</title><link>https://news.smol.ai/issues/25-06-19-followups/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-19-followups/</guid><description>**OpenAI** released a paper revealing how training models like **GPT-4o** on insecure code can cause broad misalignment, drawing reactions from experts like *@sama* and *@polynoamial*. **California&apos;s AI regulation efforts** were highlighted by *@Yoshua_Bengio* emphasizing transparency and whistleblower protections. The term **&quot;context rot&quot;** was coined to describe LLM conversation degradation, with systems like **Embra** using CRM-like memory for robustness. Scalable oversight research aiming to improve human control over smarter AIs was discussed by *@RyanPGreenblatt*. New model releases include **Kyutai&apos;s** speech-to-text models capable of 400 real-time streams on a single H100 GPU, **Tencent&apos;s Hunyuan 3D 2.1** as the first open-source production-ready PBR 3D generative model, and **Arcee&apos;s AFM-4.5B** foundation model family targeting enterprise use, competitive with **Gemma** and **Qwen**.</description><pubDate>Thu, 19 Jun 2025 05:44:39 
GMT</pubDate><category>openai</category><category>meta-ai-fair</category><category>scale-ai</category><category>huggingface</category><category>tencent</category><category>arcee-ai</category><category>gpt-4o</category><category>afm-4.5b</category><category>gemma</category><category>qwen</category><category>stt-1b-en_fr</category><category>stt-2.6b-en</category><category>hunyuan-3d-2.1</category><category>sama</category><category>polynoamial</category><category>neelnanda5</category><category>teortaxestex</category><category>yoshua_bengio</category><category>zachtratar</category><category>ryanpgreenblatt</category><category>reach_vb</category><category>arankomatsuzaki</category><category>code_star</category><category>ai-safety</category><category>alignment</category><category>ai-regulation</category><category>memory-optimization</category><category>scalable-oversight</category><category>speech-recognition</category><category>3d-generation</category><category>foundation-models</category></item><item><title>Zuck goes Superintelligence Founder Mode: $100M bonuses + $100M+ salaries + NFDG Buyout?</title><link>https://news.smol.ai/issues/25-06-18-zuck-founder-mode/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-18-zuck-founder-mode/</guid><description>**Meta AI** is reportedly offering **8-9 figure signing bonuses and salaries** to top AI talent, confirmed by **Sam Altman**. They are also targeting key figures like **Nat** and **Dan** from the AI Grant fund for strategic hires. **Essential AI** released the massive **24-trillion-token Essential-Web v1.0 dataset** with rich metadata and a 12-category taxonomy. **DeepLearning.AI** and **Meta AI** launched a course on **Llama 4**, featuring new MoE models **Maverick (400B)** and **Scout (109B)** with context windows up to **10M tokens**. **MiniMax** open-sourced **MiniMax-M1**, a long-context LLM with a 1M-token window, and introduced the **Hailuo 02** video model. 
**OpenAI** rolled out &quot;Record mode&quot; for **ChatGPT Pro, Enterprise, and Edu** on macOS. **Arcee** launched the **AFM-4.5B** foundation model for enterprise. **Midjourney** released its **V1 video model** enabling image animation. These developments highlight major advances in model scale, long-context reasoning, multimodality, and enterprise AI applications.</description><pubDate>Wed, 18 Jun 2025 05:44:39 GMT</pubDate><category>meta-ai-fair</category><category>openai</category><category>deeplearning-ai</category><category>essential-ai</category><category>minimax</category><category>arcee</category><category>midjourney</category><category>llama-4</category><category>maverick</category><category>scout</category><category>minimax-m1</category><category>afm-4.5b</category><category>chatgpt</category><category>midjourney-v1</category><category>sama</category><category>nat</category><category>dan</category><category>ashvaswani</category><category>clementdelangue</category><category>amit_sangani</category><category>andrewyng</category><category>_akhaliq</category><category>long-context</category><category>multimodality</category><category>model-release</category><category>foundation-models</category><category>dataset-release</category><category>model-training</category><category>video-generation</category><category>enterprise-ai</category><category>model-architecture</category><category>moe</category><category>prompt-optimization</category></item><item><title>Gemini 2.5 Pro/Flash GA, 2.5 Flash-Lite in Preview</title><link>https://news.smol.ai/issues/25-06-17-gemini-2-5/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-17-gemini-2-5/</guid><description>**Gemini 2.5** models are now generally available, including the new **Gemini 2.5 Flash-Lite**, **Flash**, **Pro**, and **Ultra** variants, featuring sparse **Mixture-of-Experts (MoE)** transformers with native multimodal support. 
A detailed 30-page tech report highlights impressive long-horizon planning demonstrated by **Gemini Plays Pokemon**. The **LiveCodeBench-Pro** benchmark reveals frontier LLMs struggle with hard coding problems, while **Moonshot AI** open-sourced **Kimi-Dev-72B**, achieving state-of-the-art results on **SWE-bench Verified**. Smaller specialized models like **Nanonets-OCR-s**, **II-Medical-8B-1706**, and **Jan-nano** show competitive performance, emphasizing that bigger models are not always better. **DeepSeek-r1** ties for #1 in WebDev Arena, and **MiniMax-M1** sets new standards in long-context reasoning. **Kling AI** demonstrated video generation capabilities.</description><pubDate>Tue, 17 Jun 2025 05:44:39 GMT</pubDate><category>google</category><category>moonshot-ai</category><category>deepseek</category><category>cognitivecompai</category><category>kling-ai</category><category>gemini-2.5</category><category>gemini-2.5-flash-lite</category><category>gemini-2.5-flash</category><category>gemini-2.5-pro</category><category>gemini-2.5-ultra</category><category>kimi-dev-72b</category><category>nanonets-ocr-s</category><category>ii-medical-8b-1706</category><category>jan-nano</category><category>deepseek-r1</category><category>minimax-m1</category><category>tulsee_doshi</category><category>oriolvinyalsml</category><category>demishassabis</category><category>officiallogank</category><category>_philschmid</category><category>swyx</category><category>sainingxie</category><category>scaling01</category><category>gneubig</category><category>clementdelangue</category><category>mervenoyann</category><category>mixture-of-experts</category><category>multimodality</category><category>long-horizon-planning</category><category>benchmarking</category><category>coding-performance</category><category>long-context</category><category>ocr</category><category>video-generation</category><category>model-releases</category></item><item><title>Chinese Models Launch - MiniMax-M1, Hailuo 2 
&quot;Kangaroo&quot;, Moonshot Kimi-Dev-72B</title><link>https://news.smol.ai/issues/25-06-16-chinese-models/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-16-chinese-models/</guid><description>**MiniMax AI** launched **MiniMax-M1**, a 456 billion parameter open weights LLM with a 1 million token input and 80k token output using efficient &quot;lightning attention&quot; and a GRPO variant called CISPO. **MiniMax AI** also announced **Hailuo 02 (0616)**, a video model similar to **ByteDance&apos;s Seedance**. **Moonshot AI** released **Kimi-Dev-72B**, a coding model outperforming **DeepSeek R1** on SWEBench Verified. Discussions on multi-agent system design from **Anthropic** and **LangChain** highlighted improvements in task completion and challenges like prompt injection attacks, as demonstrated by **Karpathy** and **Columbia University** research. **Sakana AI** introduced **ALE-Agent**, a coding agent that ranked 21st in the AtCoder Heuristic Competition solving NP-hard optimization problems. 
There is unverified news about an acquisition involving **OpenAI**, **Microsoft**, and **Windsurf**.</description><pubDate>Mon, 16 Jun 2025 05:44:39 GMT</pubDate><category>minimax-ai</category><category>moonshot-ai</category><category>deepseek</category><category>bytedance</category><category>anthropic</category><category>langchain</category><category>columbia-university</category><category>sakana-ai</category><category>openai</category><category>microsoft</category><category>minimax-m1</category><category>hailuo-02</category><category>kimi-dev-72b</category><category>deepseek-r1</category><category>ale-agent</category><category>jerryjliu0</category><category>hwchase17</category><category>omarsar0</category><category>gallabytes</category><category>lateinteraction</category><category>karpathy</category><category>multi-agent-systems</category><category>attention-mechanisms</category><category>coding</category><category>optimization</category><category>prompt-injection</category><category>model-performance</category><category>video-generation</category><category>model-training</category><category>task-automation</category></item><item><title>Cognition vs Anthropic: Don&apos;t Build Multi-Agents/How to Build Multi-Agents</title><link>https://news.smol.ai/issues/25-06-13-cognition-vs-anthropic/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-13-cognition-vs-anthropic/</guid><description>Within the last 24 hours, **Cognition**&apos;s Walden Yan advised *&quot;Don&apos;t Build Multi-Agents,&quot;* while **Anthropic** shared their approach to building multi-agent systems with **Claude&apos;s** multi-agent research architecture. **LangChain** highlighted advances in context engineering and production AI agents used by **LinkedIn** and **BlackRock**. The community is engaging in a debate on multi-agent AI development. Additionally, **Hugging Face** announced deprecating **TensorFlow** and **Flax** support in favor of **PyTorch**. 
Research on agent memory and model elicitation techniques from **LlamaIndex** and **Anthropic** was also discussed.</description><pubDate>Fri, 13 Jun 2025 05:44:39 GMT</pubDate><category>cognition</category><category>anthropic</category><category>langchain</category><category>huggingface</category><category>microsoft</category><category>llamaindex</category><category>linkedin</category><category>blackrock</category><category>claude</category><category>walden_yan</category><category>hwchase17</category><category>assaf_elovic</category><category>sh_reya</category><category>hamelhusain</category><category>omarsar0</category><category>clefourrier</category><category>jerryjliu0</category><category>akbirkhan</category><category>multi-agent-systems</category><category>context-engineering</category><category>agent-memory</category><category>model-elicitation</category><category>ai-evaluation</category><category>deep-research-workflows</category><category>framework-migration</category><category>pydantic-schema</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-06-12-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-12-not-much/</guid><description>**Bytedance** showcased an impressive state-of-the-art video generation model called **Seedance 1.0** without releasing it, while **Morph Labs** announced **Trinity**, an autoformalization system for Lean. **Hugging Face Transformers** deprecated TensorFlow/JAX support. **Andrew Ng** of **DeepLearning.AI** highlighted the rise of the **GenAI Application Engineer** role emphasizing skills in **AI building blocks** and **AI-assisted coding tools** like **Codex** and **Claude Code**. Engineering teams are increasingly testing API designs against LLMs for usability. **Figure AI**&apos;s CEO stressed speed as a key competitive advantage, and **LangChain** introduced the concept of **Context Engineering** for AI agents. 
Reinforcement learning on LLMs shows transformative potential, and the community values **AI evals** and data work. **Sakana AI** released **Text-to-LoRA**, a hypernetwork method for generating task-specific LoRA adapters from natural language, enabling efficient model customization. The video generation race heats up with **Bytedance**&apos;s Seed-based model praised for quality, challenging American labs, alongside models like **Kling 2.1** and **Veo 3**.</description><pubDate>Thu, 12 Jun 2025 05:44:39 GMT</pubDate><category>bytedance</category><category>morph-labs</category><category>huggingface</category><category>deeplearning.ai</category><category>figure-ai</category><category>langchain</category><category>sakana-ai</category><category>seedance-1.0</category><category>codex</category><category>claude-code</category><category>kling-2.1</category><category>veo-3</category><category>andrew_ng</category><category>hwchase17</category><category>adcock_brett</category><category>clementdelangue</category><category>akhaliq</category><category>jxmnop</category><category>hamelhusain</category><category>sh_reya</category><category>video-generation</category><category>autoformalization</category><category>ai-assisted-coding</category><category>api-design</category><category>context-engineering</category><category>reinforcement-learning</category><category>ai-evals</category><category>hypernetworks</category><category>model-fine-tuning</category><category>foundation-models</category></item><item><title>Execuhires Round 2: Scale-Meta, Lamini-AMD, and Instacart-OpenAI</title><link>https://news.smol.ai/issues/25-06-11-execuhires-2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-11-execuhires-2/</guid><description>**Meta** hires **Scale AI&apos;s Alexandr Wang** to lead its new &quot;Superintelligence&quot; division following a **$15 billion investment** for a 49% stake in Scale. 
**Lamini&apos;s Sharon Zhou** joins **AMD** as VP of AI under Lisa Su, while **Instacart&apos;s Fidji Simo** becomes CEO of Apps at **OpenAI** under **Sama**. **Meta** offers over **$10 million/year compensation packages** to top researchers, successfully recruiting **Jack Rae** from **Gemini**. **OpenAI** releases **o3-pro** model to **ChatGPT Pro** users and API, outperforming **o3** and setting new benchmarks like **Extended NYT Connections** and **SnakeBench**. Despite being slower than **o1-pro**, **o3-pro** excels in reasoning and complex problem-solving. **OpenAI** cuts **o3** pricing by **80%**, making it cheaper than **GPT-4o** and pressuring competitors like **Google** and **Anthropic** to lower prices. Users can now fine-tune the **GPT-4.1** family using **direct preference optimization (DPO)** for subjective tasks.</description><pubDate>Wed, 11 Jun 2025 05:44:39 GMT</pubDate><category>meta-ai-fair</category><category>scale-ai</category><category>lamini</category><category>amd</category><category>openai</category><category>gemini</category><category>google</category><category>anthropic</category><category>o3-pro</category><category>o3</category><category>o1-pro</category><category>gpt-4o</category><category>gpt-4.1</category><category>gpt-4.1-mini</category><category>gpt-4.1-nano</category><category>alexandr_wang</category><category>sharon_zhou</category><category>fidji_simo</category><category>sama</category><category>jack_rae</category><category>markchen90</category><category>kevinweil</category><category>gdb</category><category>gregkamradt</category><category>lechmazur</category><category>wesrothmoney</category><category>paul_cal</category><category>imjaredz</category><category>cto_junior</category><category>johnowhitaker</category><category>polynoamial</category><category>scaling01</category><category>model-release</category><category>benchmarking</category><category>reasoning</category><category>fine-tuning</category><category>pricing</category><cate
gory>model-performance</category><category>direct-preference-optimization</category><category>complex-problem-solving</category></item><item><title>Reasoning Price War 2: Mistral Magistral + o3&apos;s 80% price cut + o3-pro</title><link>https://news.smol.ai/issues/25-06-10-o3-cut/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-10-o3-cut/</guid><description>**OpenAI** announced an **80% price cut** for its **o3** model, making it competitively priced with **GPT-4.1** and rivaling **Anthropic&apos;s Claude 4 Sonnet** and **Google&apos;s Gemini 2.5 Pro**. Alongside, **o3-pro** was released as a more powerful and reliable variant, though early benchmarks showed mixed performance relative to cost. **Mistral AI** launched its **Magistral** reasoning models, including an open-source **24B parameter** version optimized for efficient deployment on consumer GPUs. The price reduction and new model releases signal intensified competition in reasoning-focused large language models, with notable improvements in token efficiency and cost-effectiveness.</description><pubDate>Tue, 10 Jun 2025 05:44:39 
GMT</pubDate><category>openai</category><category>anthropic</category><category>google-deepmind</category><category>mistral-ai</category><category>perplexity-ai</category><category>o3</category><category>o3-pro</category><category>gpt-4.1</category><category>claude-4-sonnet</category><category>gemini-2.5-pro</category><category>magistral-small</category><category>magistral-medium</category><category>mistral-small-3.1</category><category>swyx</category><category>sama</category><category>scaling01</category><category>polynoamial</category><category>nrehiew_</category><category>kevinweil</category><category>gdb</category><category>flavioad</category><category>stevenheidel</category><category>aravsrinivas</category><category>reasoning</category><category>token-efficiency</category><category>price-cut</category><category>benchmarking</category><category>open-source</category><category>model-releases</category><category>context-windows</category><category>gpu-optimization</category></item><item><title>Apple exposes Foundation Models API and... no new Siri</title><link>https://news.smol.ai/issues/25-06-09-apple-letdown/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-09-apple-letdown/</guid><description>**Apple** released on-device foundation models for iOS developers, though their recent &quot;Illusion of Reasoning&quot; paper faced significant backlash for flawed methodology regarding LLM reasoning. **OpenAI** updated **ChatGPT&apos;s Advanced Voice Mode** with more natural voice and improved translation, demonstrated by Greg Brockman. **LangChain** and **LlamaIndex** launched new AI agents and tools, including a SWE Agent for software automation and an Excel agent using reinforcement learning for data transformation. 
The AI community engaged in heated debate over reasoning capabilities of LLMs, highlighting challenges in evaluation methods.</description><pubDate>Mon, 09 Jun 2025 05:44:39 GMT</pubDate><category>apple</category><category>openai</category><category>langchain</category><category>llamaindex</category><category>chatgpt</category><category>gdb</category><category>scaling01</category><category>giffmana</category><category>kevinweil</category><category>on-device-ai</category><category>foundation-models</category><category>reasoning</category><category>reinforcement-learning</category><category>voice</category><category>translation</category><category>software-automation</category><category>agentic-workflows</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-06-06-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-06-not-much/</guid><description>**China&apos;s Xiaohongshu (Rednote) released dots.llm1**, a **142B parameter open-source Mixture-of-Experts (MoE) language model** with **14B active parameters** and a **32K context window**, pretrained on **11.2 trillion high-quality, non-synthetic tokens**. The model supports efficient inference frameworks like Docker, HuggingFace, and vLLM, and provides intermediate checkpoints every 1 trillion tokens, enabling flexible fine-tuning. Benchmarking claims it slightly surpasses **Qwen3 235B** on MMLU, though some concerns exist about benchmark selection and synthetic data verification. 
The release is notable for its truly open-source licensing and absence of synthetic data, sparking community optimism for support in frameworks such as llama.cpp and MLX.</description><pubDate>Fri, 06 Jun 2025 05:44:39 GMT</pubDate><category>xiaohongshu</category><category>rednote-hilab</category><category>deepseek</category><category>huggingface</category><category>dots-llm1</category><category>qwen3-235b</category><category>mixture-of-experts</category><category>open-source</category><category>model-benchmarking</category><category>fine-tuning</category><category>inference</category><category>context-windows</category><category>training-data</category><category>model-architecture</category><category>model-performance</category><category>model-optimization</category></item><item><title>Gemini 2.5 Pro (06-05) launched at AI Engineer World&apos;s Fair</title><link>https://news.smol.ai/issues/25-06-05-aia/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-05-aia/</guid><description>On the second day of **AIE**, **Google&apos;s Gemini 2.5 Pro** reclaimed the top spot on the LMArena leaderboard with a score of **1470** and a +24 Elo increase, showing improvements in coding, reasoning, and math. **Qwen3** released state-of-the-art embedding and reranking models, with **Qwen3-Embedding-8B** topping the MTEB multilingual leaderboard. **OpenThinker3-7B** emerged as the top open reasoning model trained on the **OpenThoughts3-1.2M dataset**, outperforming previous models by 33%. **LightOn** introduced **FastPlaid**, achieving up to a 554% speedup for late-interaction models. **Morph Labs** hired **Christian Szegedy** as Chief Scientist to lead Verified Superintelligence development.
The **AI Engineer World&apos;s Fair** featured a fireside chat with **Greg Brockman** and **NVIDIA CEO Jensen Huang**, highlighting the return of basic research and engineering best practices.</description><pubDate>Thu, 05 Jun 2025 05:44:39 GMT</pubDate><category>google</category><category>qwen</category><category>lighton</category><category>morph-labs</category><category>openai</category><category>nvidia</category><category>gemini-2.5-pro</category><category>qwen3-embedding-8b</category><category>openthinker3-7b</category><category>greg_brockman</category><category>jensen_huang</category><category>christian_szegedy</category><category>swyx</category><category>benchmarking</category><category>reasoning</category><category>coding</category><category>math</category><category>embedding-models</category><category>late-interaction</category><category>dataset-release</category><category>model-performance</category><category>model-architecture</category><category>ai-conferences</category></item><item><title>AI Engineer World&apos;s Fair Talks Day 1</title><link>https://news.smol.ai/issues/25-06-04-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-04-not-much/</guid><description>**Mistral** launched a new **Code** project, and **Cursor** released version **1.0**. **Anthropic** improved **Claude Code** plans, while **ChatGPT** announced expanded connections. The day was dominated by **AIE** keynotes and tracks including **GraphRAG**, **RecSys**, and **Tiny Teams**. On Reddit, **Google** open-sourced the **DeepSearch** stack for building AI agents with **Gemini 2.5** and **LangGraph**, enabling flexible agent architectures and integration with local LLMs like **Gemma**. 
A new **Meta** paper analyzed language model memorization, showing GPT-style transformers store about **3.5–4 bits/parameter** and exploring the transition from memorization to generalization, with implications for **Mixture-of-Experts** models and quantization effects.</description><pubDate>Wed, 04 Jun 2025 05:44:39 GMT</pubDate><category>mistral</category><category>cursor</category><category>anthropic</category><category>openai</category><category>aie</category><category>google-deepmind</category><category>meta-ai-fair</category><category>gemini-2.5</category><category>gemma</category><category>claude-code</category><category>agent-based-architecture</category><category>open-source</category><category>model-memorization</category><category>scaling-laws</category><category>quantization</category><category>mixture-of-experts</category><category>language-model-memorization</category><category>model-generalization</category><category>langgraph</category><category>model-architecture</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-06-03-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-03-not-much/</guid><description>**OpenAI** rolled out **Codex** to ChatGPT Plus users with internet access and fine-grained controls, improving memory features for free users. **Anthropic&apos;s Claude 4 Opus and Sonnet** models lead coding benchmarks, while **Google&apos;s Gemini 2.5 Pro and Flash** models gain recognition with new audio capabilities. **Qwen 2.5-VL** and **Qwen 3** quantizations are noted for versatility and support. **Bing Video Creator** launched globally enabling text-to-video generation, and **Perplexity Labs** sees increased demand for travel search. New agentic AI tools and RAG innovations include **LlamaCloud** and **FedRAG**. Open-source releases include **Holo-1** for web navigation and **PlayAI&apos;s PlayDiffusion** for speech editing. 
Audio and multimodal advances feature **Suno&apos;s** music editing upgrades, **Google&apos;s** native TTS in 24+ languages, and **Universal Streaming&apos;s** ultra-low latency speech-to-text. **Google NotebookLM** now supports public notebooks. *&quot;Codex&apos;s internet access brings tradeoffs, with explicit warnings about risk&quot;* and *&quot;Gemini 2.5 Pro is cited as a daily driver by users&quot;*.</description><pubDate>Tue, 03 Jun 2025 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>google</category><category>perplexity-ai</category><category>bing</category><category>playai</category><category>suno</category><category>hugging-face</category><category>langchain-ai</category><category>qwen</category><category>mlx</category><category>assemblyai</category><category>llamacloud</category><category>codex</category><category>claude-4-opus</category><category>claude-4-sonnet</category><category>gemini-2.5-pro</category><category>gemini-2.5</category><category>qwen-2.5-vl</category><category>qwen-3</category><category>playdiffusion</category><category>sama</category><category>gdb</category><category>kevinweil</category><category>lmarena_ai</category><category>epochairesearch</category><category>reach_vb</category><category>wightmanr</category><category>deeplearningai</category><category>mervenoyann</category><category>awnihannun</category><category>jordirib1</category><category>aravsrinivas</category><category>omarsar0</category><category>lioronai</category><category>jerryjliu0</category><category>nerdai</category><category>tonywu_71</category><category>_akhaliq</category><category>clementdelangue</category><category>_mfelfel</category><category>fine-tuning</category><category>model-benchmarking</category><category>text-to-video</category><category>agentic-ai</category><category>retrieval-augmented-generation</category><category>open-source-models</category><category>speech-editing</category><category>audio-processing</categ
ory><category>text-to-speech</category><category>ultra-low-latency</category><category>multimodality</category><category>public-notebooks</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-06-02-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-06-02-not-much/</guid><description>**DeepSeek R1-0528** release brings major improvements in reasoning, hallucination reduction, JSON output, and function calling, matching or surpassing closed models like **OpenAI o3** and **Gemini 2.5 Pro** on benchmarks such as **Artificial Analysis Intelligence Index**, **LiveBench**, and **GPQA Diamond**. The model ranks #2 globally in open weights intelligence, surpassing **Meta AI**, **Anthropic**, and **xAI**. Open weights and technical transparency have fueled rapid adoption across platforms like **Ollama** and **Hugging Face**. Chinese AI labs including **DeepSeek**, **Alibaba**, **ByteDance**, and **Xiaomi** now match or surpass US labs in model releases and intelligence, driven by open weights strategies. Reinforcement learning post-training is critical for intelligence gains, mirroring trends seen at **OpenAI**. Optimized quantization techniques (1-bit, 4-bit) and local inference enable efficient experimentation on consumer hardware. New benchmarks like **LisanBench** test knowledge, planning, memory, and long-context reasoning, with **OpenAI o3** and **Claude Opus 4** leading. 
Discussions highlight concerns about benchmark contamination and overemphasis on RL-tuned gains.</description><pubDate>Mon, 02 Jun 2025 05:44:39 GMT</pubDate><category>deepseek_ai</category><category>openai</category><category>gemini</category><category>meta-ai-fair</category><category>anthropic</category><category>x-ai</category><category>ollama</category><category>hugging-face</category><category>alibaba</category><category>bytedance</category><category>xiaomi</category><category>deepseek-r1-0528</category><category>o3</category><category>gemini-2.5-pro</category><category>claude-opus-4</category><category>teortaxestex</category><category>wenfeng</category><category>danielhanchen</category><category>awnihannun</category><category>reach_vb</category><category>abacaj</category><category>reasoning</category><category>reinforcement-learning</category><category>benchmarking</category><category>quantization</category><category>local-inference</category><category>model-evaluation</category><category>open-weights</category><category>transparency</category><category>post-training</category><category>agentic-benchmarks</category><category>long-context</category><category>hallucination-detection</category></item><item><title>Mary Meeker is so back: BOND Capital AI Trends report</title><link>https://news.smol.ai/issues/25-05-30-mary-meeker/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-30-mary-meeker/</guid><description>**Mary Meeker** returns with a comprehensive **340-slide report** on the state of AI, highlighting accelerating tech cycles, compute growth, and comparisons of **ChatGPT** to early Google and other iconic tech products. The report also covers enterprise traction and valuation of major AI companies. On Twitter, **@tri_dao** discusses an &quot;ideal&quot; inference architecture featuring attention variants like **GTA**, **GLA**, and **DeepSeek MLA** with high arithmetic intensity (~256), improving efficiency and model quality. 
Other highlights include the release of **4-bit DWQ of DSR1 Qwen3 8B** on Hugging Face, **AnthropicAI**&apos;s open-source interpretability tools for LLMs, and discussions on transformer training and abstractions by various researchers.</description><pubDate>Sat, 31 May 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>hugging-face</category><category>deepseek</category><category>qwen-3-8b</category><category>tri_dao</category><category>fleetwood___</category><category>teortaxestex</category><category>awnihannun</category><category>lateinteraction</category><category>neelnanda5</category><category>eliebakouch</category><category>_akhaliq</category><category>attention-mechanisms</category><category>inference</category><category>arithmetic-intensity</category><category>transformers</category><category>model-optimization</category><category>interpretability</category><category>model-quantization</category><category>training</category></item><item><title>DeepSeek-R1-0528 - Gemini 2.5 Pro-level model, SOTA Open Weights release</title><link>https://news.smol.ai/issues/25-05-29-deepseek-r1-0528/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-29-deepseek-r1-0528/</guid><description>**DeepSeek R1-0528** marks a significant upgrade, closing the gap with proprietary models like **Gemini 2.5 Pro** and surpassing benchmarks from **Anthropic**, **Meta**, **NVIDIA**, and **Alibaba**. This Chinese open-weights model leads in several AI benchmarks, driven by reinforcement learning post-training rather than architecture changes, and demonstrates increased reasoning token usage (23K tokens per question). The China-US AI race intensifies as Chinese labs accelerate innovation through transparency and open research culture. 
Key benchmarks include **AIME 2024**, **LiveCodeBench**, and **GPQA Diamond**.</description><pubDate>Thu, 29 May 2025 05:44:39 GMT</pubDate><category>deepseek-ai</category><category>anthropic</category><category>meta-ai-fair</category><category>nvidia</category><category>alibaba</category><category>google-deepmind</category><category>deepseek-r1-0528</category><category>gemini-2.5-pro</category><category>qwen-3-8b</category><category>qwen-3-235b</category><category>artificialanlys</category><category>scaling01</category><category>cline</category><category>reach_vb</category><category>zizhpan</category><category>andrewyng</category><category>teortaxestex</category><category>teknim1</category><category>lateinteraction</category><category>abacaj</category><category>cognitivecompai</category><category>awnihannun</category><category>reinforcement-learning</category><category>benchmarking</category><category>model-performance</category><category>open-weights</category><category>reasoning</category><category>quantization</category><category>post-training</category><category>model-comparison</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-05-28-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-28-not-much/</guid><description>**DeepSeek R1 v2** model released with availability on Hugging Face and inference partners. The **Gemma model family** continues prolific development including **PaliGemma 2**, **Gemma 3**, and others. **Claude 4** and its variants like **Opus 4** and **Claude Sonnet 4** show top benchmark performance, including new SOTA on **ARC-AGI-2** and **WebDev Arena**. **Codestral Embed** introduces a 3072-dimensional code embedder. **BAGEL**, an open-source multimodal model by **ByteDance**, supports reading, reasoning, drawing, and editing with long mixed contexts. Benchmarking highlights include **Nemotron-CORTEXA** topping SWEBench and **Gemini 2.5 Pro** performing on VideoGameBench. 
Discussions on random rewards effectiveness focus on **Qwen** models. *&quot;Opus 4 NEW SOTA ON ARC-AGI-2. It&apos;s happening - I was right&quot;* and *&quot;Claude 4 launch has dev moving at a different pace&quot;* reflect excitement in the community.</description><pubDate>Wed, 28 May 2025 05:44:39 GMT</pubDate><category>deepseek-ai</category><category>huggingface</category><category>gemma</category><category>claude</category><category>bytedance</category><category>qwen</category><category>nemotron</category><category>sakana-ai-labs</category><category>deepseek-r1-0528</category><category>pali-gemma-2</category><category>gemma-3</category><category>shieldgemma-2</category><category>txgemma</category><category>gemma-3-qat</category><category>gemma-3n-preview</category><category>medgemma</category><category>dolphingemma</category><category>signgemma</category><category>claude-4</category><category>opus-4</category><category>claude-sonnet-4</category><category>codestral-embed</category><category>bagel</category><category>qwen</category><category>nemotron-cortexa</category><category>gemini-2.5-pro</category><category>yuchenj_uw</category><category>_akhaliq</category><category>clementdelangue</category><category>osanseviero</category><category>alexalbert__</category><category>guillaumelample</category><category>theturingpost</category><category>lmarena_ai</category><category>epochairesearch</category><category>scaling01</category><category>nrehiew_</category><category>ctnzr</category><category>benchmarking</category><category>model-releases</category><category>multimodality</category><category>code-generation</category><category>model-performance</category><category>long-context</category><category>reinforcement-learning</category><category>model-optimization</category><category>open-source</category></item><item><title>Mistral&apos;s Agents API and the 2025 LLM OS</title><link>https://news.smol.ai/issues/25-05-27-mistral-agents/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-05-27-mistral-agents/</guid><description>**The LLM OS** concept has evolved since 2023, with **Mistral AI** releasing a new **Agents API** that includes code execution, web search, persistent memory, and agent orchestration. **LangChainAI** introduced the **Open Agent Platform (OAP)**, an open-source no-code platform for intelligent agents. **OpenAI** plans to develop **ChatGPT** into a super-assistant by H1 2025, competing with **Meta**. Discussions around **Qwen** models focus on reinforcement learning effects, while **Claude 4** performance is also noted. The AI Engineer World&apos;s Fair is calling for volunteers.</description><pubDate>Tue, 27 May 2025 05:44:39 GMT</pubDate><category>mistral-ai</category><category>langchain-ai</category><category>openai</category><category>meta-ai-fair</category><category>qwen</category><category>claude-4</category><category>chatgpt</category><category>o3</category><category>o4</category><category>omarsar0</category><category>simonw</category><category>swyx</category><category>scaling01</category><category>agent-frameworks</category><category>multi-agent-systems</category><category>tool-use</category><category>code-execution</category><category>web-search</category><category>model-context-protocol</category><category>persistent-memory</category><category>function-calling</category><category>open-source</category><category>no-code</category><category>reinforcement-learning</category><category>model-performance</category><category>agent-orchestration</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-05-26-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-26-not-much/</guid><description>**OpenAI** plans to evolve **ChatGPT** into a **super-assistant** by 2025 with models like **o3** and **o4** enabling agentic tasks and supporting a billion users. 
Recent multimodal and reasoning model releases include ByteDance&apos;s **BAGEL-7B**, Google&apos;s **MedGemma**, and NVIDIA&apos;s **ACEReason-Nemotron-14B**. The **Sudoku-Bench Leaderboard** highlights ongoing challenges in AI creative reasoning. In software development, OpenAI&apos;s **Codex** aids code generation and debugging, while Gemini&apos;s **Context URL tool** enhances prompt context. **AgenticSeek** offers a local, privacy-focused alternative for autonomous agents. Ethical concerns are raised about AGI development priorities and Anthropic&apos;s alignment with human values. Technical discussions emphasize emergence in AI and training challenges, with humor addressing misconceptions about **Gemini 3.0** and async programming in C. A novel synthetic speech training method enables instruction tuning of LLMs without real speech data, advancing low-resource language support.</description><pubDate>Mon, 26 May 2025 05:44:39 GMT</pubDate><category>openai</category><category>bytedance</category><category>google</category><category>nvidia</category><category>sakana-ai-labs</category><category>deep-learning-ai</category><category>gemini</category><category>agenticseek</category><category>anthropic</category><category>chatgpt</category><category>o3</category><category>o4</category><category>bagel-7b</category><category>medgemma</category><category>acereason-nemotron-14b</category><category>codex</category><category>gemini</category><category>scaling01</category><category>mervenoyann</category><category>sakananailabs</category><category>_philschmid</category><category>omarsar0</category><category>teortaxestex</category><category>andrewlampinen</category><category>sedielem</category><category>cis_female</category><category>agentic-systems</category><category>multimodality</category><category>reasoning</category><category>code-generation</category><category>prompt-engineering</category><category>privacy</category><category>ethical-ai</category><category>emergence</cat
egory><category>synthetic-data</category><category>speech-instruction-tuning</category><category>low-resource-languages</category><category>humor</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-05-23-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-23-not-much/</guid><description>**Anthropic&apos;s Claude 4 models (Opus 4, Sonnet 4)** demonstrate strong coding abilities, with Sonnet 4 achieving **72.7%** on SWE-bench and Opus 4 at **72.5%**. Claude Sonnet 4 excels in codebase understanding and is considered **SOTA on large codebases**. Criticism arose over Anthropic&apos;s handling of **ASL-3 security requirements**. Demand for Claude 4 is high, with integration into IDEs and support from Cherry Studio and FastHTML. **Google DeepMind** introduced **Gemini 2.5 Pro Deep Think** and **Gemma 3n**, a mobile multimodal model reducing RAM usage by nearly 3x. **Google&apos;s Imagen 4 Ultra** ranks third in the Artificial Analysis Image Arena, available on **Vertex AI Studio**. Google also promoted **Google Beam**, an AI video model for immersive 3D experiences, and new text-to-speech models with multi-speaker support. 
The **GAIA benchmark** shows Claude 4 Opus and Sonnet leading in agentic performance.</description><pubDate>Fri, 23 May 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>google-deepmind</category><category>openai</category><category>claude-4</category><category>claude-4-opus</category><category>claude-4-sonnet</category><category>gemini-2.5-pro</category><category>gemma-3n</category><category>imagen-4-ultra</category><category>cline</category><category>amanrsanger</category><category>ryanpgreenblatt</category><category>johnschulman2</category><category>alexalbert__</category><category>nearcyan</category><category>mickeyxfriedman</category><category>jeremyphoward</category><category>gneubig</category><category>teortaxesTex</category><category>scaling01</category><category>artificialanlys</category><category>philschmid</category><category>codebase-understanding</category><category>coding</category><category>agentic-performance</category><category>multimodality</category><category>text-to-speech</category><category>video-generation</category><category>model-integration</category><category>benchmarking</category><category>memory-optimization</category></item><item><title>Anthropic releases Claude 4 Sonnet and Opus: Memory, Agent Capabilities, Claude Code, Redteam Drama</title><link>https://news.smol.ai/issues/25-05-22-claude-4/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-22-claude-4/</guid><description>**Anthropic** has officially released **Claude 4** with two variants: **Claude Opus 4**, a high-capability model for complex tasks priced at **$15/$75 per million tokens**, and **Claude Sonnet 4**, optimized for efficient everyday use. The release emphasizes **instruction following** and extended work sessions up to **7 hours**. Community discussions highlight concerns about **token pricing**, **token accounting transparency**, and calls for **open-sourcing Claude 3.5 Sonnet** weights to support local model development. 
The news also covers **Claude Code GA**, a new **Agent Capabilities API**, and various livestreams and reports detailing these updates. There is notable debate around **sliding window attention** and advanced inference techniques for local deployment.</description><pubDate>Thu, 22 May 2025 05:44:39 GMT</pubDate><category>anthropic</category><category>claude-4</category><category>claude-4-opus</category><category>claude-4-sonnet</category><category>claude-3.5-sonnet</category><category>instruction-following</category><category>token-accounting</category><category>pricing-models</category><category>sliding-window-attention</category><category>inference-techniques</category><category>open-sourcing</category><category>model-accessibility</category><category>agent-capabilities-api</category><category>extended-context</category><category>model-deployment</category></item><item><title>OpenAI buys Jony Ive&apos;s io for $6.5b, LMArena lands $100m seed from a16z</title><link>https://news.smol.ai/issues/25-05-21-openai-io/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-21-openai-io/</guid><description>**OpenAI** confirmed a partnership with **Jony Ive** to develop consumer hardware. **LMArena** secured a $100 million seed round from **a16z**. **Mistral** launched a new code model fine-tune. **Google DeepMind** announced multiple updates at **Google I/O 2025**, including over a dozen new models and 20 AI products. Key highlights include the release of **Gemini 2.5 Pro** and **Gemini Diffusion**, featuring advanced multimodal reasoning, coding, and math capabilities, and integration of Gemini in **Google Chrome** as an AI browsing assistant. 
**Deep Think** enhanced reasoning mode and **Project Astra** improvements were also introduced, focusing on voice output, memory, and computer control for a universal AI assistant.</description><pubDate>Wed, 21 May 2025 05:44:39 GMT</pubDate><category>openai</category><category>lmarena</category><category>a16z</category><category>mistral-ai</category><category>google</category><category>google-deepmind</category><category>gemini-2.5-pro</category><category>gemini-diffusion</category><category>sundar_pichai</category><category>multimodality</category><category>reasoning</category><category>code-generation</category><category>math</category><category>model-fine-tuning</category><category>ai-assistants</category><category>voice</category><category>memory-optimization</category></item><item><title>Google I/O: new Gemini native voice, Flash, DeepThink, AI Mode (DeepSearch+Mariner+Astra)</title><link>https://news.smol.ai/issues/25-05-20-google-io/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-20-google-io/</guid><description>**Google I/O 2025** showcased significant advancements with **Gemini 2.5 Pro** and the **Deep Think** reasoning mode from **Google DeepMind**, emphasizing AI-driven transformations and developer opportunities. **GeminiApp** aims to become a universal **AI assistant** on the path to **AGI**, with new features like **AI Mode** in Google Search expanding generative AI access. The event included multiple keynotes and updates on over a dozen models and 20+ AI products, highlighting **Google&apos;s** leadership in AI innovation. 
Influential voices like **demishassabis** and **philschmid** provided insights and recaps, while the launch of **Jules** as a competitor to Codex/Devin was noted.</description><pubDate>Tue, 20 May 2025 05:44:39 GMT</pubDate><category>google</category><category>google-deepmind</category><category>gemini-2.5-pro</category><category>gemini-2.5</category><category>demishassabis</category><category>philschmid</category><category>jack_w_rae</category><category>ai-assistants</category><category>reasoning</category><category>generative-ai</category><category>developer-tools</category><category>ai-integration</category><category>model-optimization</category><category>ai-application</category><category>model-updates</category><category>ai-deployment</category><category>model-performance</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-05-19-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-19-not-much/</guid><description>**Meta** released **KernelLLM 8B**, outperforming **GPT-4o** and **DeepSeek V3** on KernelBench-Triton Level 1. **Mistral Medium 3** debuted strongly in multiple benchmarks. **Qwen3** models introduced a unified framework with multilingual support. **DeepSeek-V3** features hardware-aware co-design. **BLIP3-o** family released for multimodal tasks using diffusion transformers. **Salesforce** launched **xGen-Small** models excelling in long-context and math benchmarks. **Bilibili** released **AniSORA** for anime video generation. **Stability AI** open-sourced **Stable Audio Open Small** optimized for Arm devices. Google’s **AlphaEvolve** coding agent improved **Strassen&apos;s algorithm** for the first time since 1969. Research shows **chain-of-thought reasoning** can harm instruction-following ability, with mitigation strategies like classifier-selective reasoning being most effective, but reasoning techniques show high variance and limited generalization. 
*&quot;Chain-of-thought (CoT) reasoning can harm a model’s ability to follow instructions&quot;* and *&quot;Mitigation strategies such as few-shot in-context learning, self-reflection, self-selective reasoning, and classifier-selective reasoning can counteract reasoning-induced failures&quot;*.</description><pubDate>Mon, 19 May 2025 05:44:39 GMT</pubDate><category>meta-ai-fair</category><category>mistral-ai</category><category>qwen</category><category>deepseek</category><category>salesforce</category><category>bilibili</category><category>stability-ai</category><category>google</category><category>kernelllm-8b</category><category>gpt-4o</category><category>deepseek-v3</category><category>mistral-medium-3</category><category>qwen3</category><category>blip3-o</category><category>xgen-small</category><category>anisora</category><category>stable-audio-open-small</category><category>alphaevolve</category><category>reach_vb</category><category>lmarena_ai</category><category>theadimeline</category><category>adcock_brett</category><category>jxmnop</category><category>dair_ai</category><category>omarsar0</category><category>benchmarking</category><category>model-performance</category><category>multilinguality</category><category>hardware-optimization</category><category>multimodality</category><category>image-generation</category><category>video-generation</category><category>text-to-audio</category><category>model-parallelism</category><category>chain-of-thought</category><category>instruction-following</category><category>reasoning</category><category>mitigation-strategies</category></item><item><title>ChatGPT Codex, OpenAI&apos;s first cloud SWE agent</title><link>https://news.smol.ai/issues/25-05-16-codex/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-16-codex/</guid><description>**OpenAI** launched **Codex**, a cloud-based software engineering agent powered by **codex-1** (an optimized version of **OpenAI o3**) available in research preview for Pro, 
Enterprise, and Team ChatGPT users, featuring parallel task execution like refactoring and bug fixing. The **Codex CLI** was enhanced with quick sign-in and a new low-latency model, **codex-mini**. **Gemma 3** is highlighted as the best open model runnable on a single GPU. **Runway** released the Gen-4 References API for style transfer in generation. **Salesforce** introduced **BLIP3-o**, a unified multimodal model family using diffusion transformers for CLIP image features. The **Qwen 2.5** models (1.5B and 3B versions) were integrated into the PocketPal app with various chat templates. **Marigold IID**, a new state-of-the-art open-source depth estimation model, was released. 

In research, **DeepSeek** shared insights on scaling and hardware for DeepSeek-V3. **Google** unveiled **LightLab**, a diffusion-based light source control in images. **Google DeepMind&apos;s AlphaEvolve** uses **Gemini 2.0** to discover new math and reduce costs without reinforcement learning. **Omni-R1** studied audio&apos;s role in fine-tuning audio LLMs. **Qwen** proposed a parallel scaling law inspired by classifier-free guidance. **Salesforce** released **Lumina-Next** on the Qwen base, outperforming Janus-Pro. A study found LLM performance degrades in multi-turn conversations due to unreliability. **J1** is incentivizing LLM-as-a-Judge thinking via reinforcement learning. A new Qwen study correlates question and strategy similarity to predict reasoning strategies.</description><pubDate>Fri, 16 May 2025 05:44:39 GMT</pubDate><category>openai</category><category>runway</category><category>salesforce</category><category>qwen</category><category>deepseek</category><category>google</category><category>google-deepmind</category><category>j1</category><category>codex-1</category><category>openai-o3</category><category>codex-mini</category><category>gemma-3</category><category>blip3-o</category><category>qwen-2.5</category><category>marigold-iid</category><category>deepseek-v3</category><category>lightlab</category><category>gemini-2.0</category><category>lumina-next</category><category>sama</category><category>kevinweil</category><category>omarsar0</category><category>iscienceluvr</category><category>akhaliq</category><category>osanseviero</category><category>c_valenzuelab</category><category>mervenoyann</category><category>arankomatsuzaki</category><category>jasonwei</category><category>demishassabis</category><category>philschmid</category><category>swyx</category><category>teortaxestex</category><category>jaseweston</category><category>software-engineering</category><category>parallel-processing</category><category>multimodality</category><category>diffusion-mode
ls</category><category>depth-estimation</category><category>scaling-laws</category><category>reinforcement-learning</category><category>fine-tuning</category><category>model-performance</category><category>multi-turn-conversation</category><category>reasoning</category><category>audio-processing</category></item><item><title>Gemini&apos;s AlphaEvolve agent uses Gemini 2.0 to find new Math and cuts Gemini cost 1% — without RL</title><link>https://news.smol.ai/issues/25-05-15-alphaevolve/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-15-alphaevolve/</guid><description>**Deepmind&apos;s AlphaEvolve**, a 2025 update to AlphaTensor and FunSearch, is a Gemini-powered **coding agent for algorithm discovery** that designs faster matrix multiplication algorithms, solves open math problems, and improves data center and AI training efficiency. It achieves a **23% faster kernel speedup** in Gemini training and surpasses state-of-the-art on 20% of applied problems, including improvements on the Minimum Overlap Problem and Kissing number problem. Unlike Deep-RL, it optimizes code pieces rather than model weights. Meanwhile, **OpenAI** released **GPT-4.1** in ChatGPT, specializing in coding and instruction following, with a faster alternative **GPT-4.1 mini** replacing GPT-4o mini for all users. OpenAI also launched the Safety Evaluations Hub and the OpenAI to Z Challenge using o3/o4 mini and GPT-4.1 models to discover archaeological sites. 
*&quot;Maybe midtrain + good search is all you need for AI for scientific innovation&quot;* - Jason Wei.</description><pubDate>Thu, 15 May 2025 05:44:39 GMT</pubDate><category>google-deepmind</category><category>openai</category><category>gemini</category><category>gpt-4.1</category><category>gpt-4o-mini</category><category>o3</category><category>o4-mini</category><category>_philschmid</category><category>scott_swingle</category><category>alex_dimakis</category><category>henry</category><category>jason_wei</category><category>kevinweil</category><category>michpokrass</category><category>scaling01</category><category>gdb</category><category>algorithm-discovery</category><category>coding-agents</category><category>matrix-multiplication</category><category>optimization</category><category>reinforcement-learning</category><category>model-weights</category><category>training-efficiency</category><category>safety-evaluations</category><category>instruction-following</category><category>coding-tasks</category><category>model-releases</category></item><item><title>Granola launches team notes, while Notion launches meeting transcription</title><link>https://news.smol.ai/issues/25-05-14-notion-granola/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-14-notion-granola/</guid><description>**GPT-4.1** is now available in **ChatGPT** for Plus, Pro, and Team users, focusing on coding and instruction following, with **GPT 4.1 mini** replacing **GPT 4o mini**. **Anthropic** is releasing new **Claude** models including **Claude Opus** and **Claude Sonnet**, though some criticism about hallucinations in **Claude O3** was noted. **Alibaba** shared the **Qwen3 Technical Report** with strong benchmark results from **Seed1.5-VL**. **Meta FAIR** announced new models and datasets but faced criticism on **Llama 4**. **AM-Thinking-v1** launched on **Hugging Face** as a 32B scale reasoning model. 
**Granola** raised $43M in Series B and launched **Granola 2.0** with a Notion-like UI. The AI ecosystem shows rapid iteration and cloning of ideas, emphasizing execution and distribution.</description><pubDate>Wed, 14 May 2025 05:44:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>alibaba</category><category>meta-ai-fair</category><category>huggingface</category><category>granola</category><category>gpt-4.1</category><category>gpt-4o-mini</category><category>gpt-4.1-mini</category><category>claude-opus</category><category>claude-sonnet</category><category>claude-o3</category><category>qwen3</category><category>seed1.5-vl</category><category>llama-4</category><category>am-thinking-v1</category><category>kevinweil</category><category>scaling01</category><category>steph_palazzolo</category><category>andersonbcdefg</category><category>reach_vb</category><category>yuchenj_uw</category><category>qtnx_</category><category>_akhaliq</category><category>risingsayak</category><category>coding</category><category>instruction-following</category><category>benchmarking</category><category>model-releases</category><category>reasoning</category><category>image-generation</category><category>collaborative-software</category><category>model-performance</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-05-13-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-13-not-much/</guid><description>**Tencent&apos;s Hunyuan-Turbos** has risen to #8 on the LMArena leaderboard, showing strong performance across major categories and significant improvement since February. The **Qwen3 model family**, especially the **Qwen3 235B-A22B (Reasoning)** model, is noted for its intelligence and efficient parameter usage. 
**OpenAI** introduced **HealthBench**, a new health evaluation benchmark developed with input from over **250 physicians**, where models like **o3**, **GPT-4.1 nano**, and **Grok 3** showed strong results. **ByteDance** released **Seed1.5-VL**, a vision-language model with a 532M-parameter vision encoder and a 20B active parameter MoE LLM, achieving state-of-the-art results on 38 public benchmarks. In vision-language, **Kling 2.0** leads image-to-video generation, and **Gemini 2.5 Pro** excels in video understanding with advanced multimodal capabilities. Meta&apos;s Vision-Language-Action framework and updates on VLMs for 2025 were also highlighted.</description><pubDate>Tue, 13 May 2025 05:44:39 GMT</pubDate><category>tencent</category><category>openai</category><category>bytedance</category><category>meta-ai-fair</category><category>nvidia</category><category>deepseek</category><category>hunyuan-turbos</category><category>qwen3-235b-a22b</category><category>o3</category><category>gpt-4.1-nano</category><category>grok-3</category><category>gemini-2.5-pro</category><category>seed1.5-vl</category><category>kling-2.0</category><category>lmarena_ai</category><category>artificialanlys</category><category>gdb</category><category>_jasonwei</category><category>iScienceLuvr</category><category>_akhaliq</category><category>_philschmid</category><category>teortaxesTex</category><category>mervenoyann</category><category>reach_vb</category><category>benchmarking</category><category>model-performance</category><category>moe</category><category>reasoning</category><category>vision</category><category>video-understanding</category><category>vision-language</category><category>multimodality</category><category>model-evaluation</category><category>model-optimization</category></item><item><title>Prime Intellect&apos;s INTELLECT-2 and PRIME-RL advance distributed reinforcement learning</title><link>https://news.smol.ai/issues/25-05-12-intellect-2/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-05-12-intellect-2/</guid><description>**Prime Intellect** released **INTELLECT-2**, a decentralized GPU training and RL framework with a vision for distributed AI training overcoming colocation limits. **ByteDance** launched **DreamO**, a unified image customization model on Hugging Face. **Qwen** released models optimized for GPTQ, GGUF, and AWQ quantization. **Gemma** surpassed 150 million downloads on Hugging Face. **Meta** released weights for the **Dynamic Byte Latent Transformer** and the **Collaborative Reasoner** framework to improve language model efficiency and reasoning. **RunwayML** introduced **Gen-4 References**, a near-realtime model requiring no fine-tuning. **Mistral AI** released **Mistral Medium 3**, a strong multimodal model, and **Le Chat Enterprise**, an agentic AI assistant for business. **Google** updated **Gemini 2.5 Pro Preview** with video understanding and UI improvements. *&quot;Airbnb for spare GPUs from all over the world&quot;* highlights the ongoing challenges and potential of distributed GPU training.</description><pubDate>Mon, 12 May 2025 05:44:39 
GMT</pubDate><category>primeintellect</category><category>bytedance</category><category>qwen</category><category>gemma</category><category>meta-ai-fair</category><category>runwayml</category><category>mistral-ai</category><category>google</category><category>intellect-2</category><category>dreamo</category><category>qwen</category><category>gemini-2.5-pro</category><category>dynamic-byte-latent-transformer</category><category>gen-4-references</category><category>mistral-medium-3</category><category>le-chat-enterprise</category><category>_akhaliq</category><category>reach_vb</category><category>osanseviero</category><category>aiatmeta</category><category>c_valenzuelab</category><category>lmarena_ai</category><category>adcock_brett</category><category>distributed-training</category><category>reinforcement-learning</category><category>gpu-clusters</category><category>model-optimization</category><category>quantization</category><category>multimodality</category><category>agentic-ai</category><category>video-understanding</category><category>fine-tuning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-05-09-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-09-not-much/</guid><description>**Gemini 2.5 Flash** shows a **12 point increase** in the Artificial Analysis Intelligence Index but costs **150x more** than Gemini 2.0 Flash due to **9x more expensive output tokens** and **17x higher token usage** during reasoning. **Mistral Medium 3** competes with **Llama 4 Maverick**, **Gemini 2.0 Flash**, and **Claude 3.7 Sonnet** with better coding and math reasoning at a significantly lower price. **Alibaba&apos;s Qwen3** family supports reasoning and multilingual tasks across **119 languages** and includes a **Web Dev** tool for app building. **Huawei&apos;s Pangu Ultra MoE** matches **DeepSeek R1** performance on Ascend NPUs, with new compute and upcoming V4 training. 
**OpenAI&apos;s o4-mini** now supports **Reinforcement Fine-Tuning (RFT)** using chain-of-thought reasoning. **Microsoft&apos;s X-REASONER** enables generalizable reasoning across modalities post-trained on general-domain text. Deep research integration with GitHub repos in ChatGPT enhances codebase search and reporting. The AI Engineer World&apos;s Fair offers an Early Bird discount for upcoming tickets.</description><pubDate>Fri, 09 May 2025 05:44:39 GMT</pubDate><category>google-deepmind</category><category>mistral-ai</category><category>alibaba</category><category>huawei</category><category>openai</category><category>microsoft</category><category>deepseek</category><category>gemini-2.5-flash</category><category>gemini-2.0-flash</category><category>mistral-medium-3</category><category>llama-4-maverick</category><category>claude-3.7-sonnet</category><category>qwen3</category><category>pangu-ultra-moe</category><category>deepseek-r1</category><category>o4-mini</category><category>x-reasoner</category><category>giffmana</category><category>artificialanlys</category><category>teortaxestex</category><category>akhaliq</category><category>john__allard</category><category>model-performance</category><category>reasoning</category><category>cost-analysis</category><category>reinforcement-learning</category><category>chain-of-thought</category><category>multilinguality</category><category>code-search</category><category>model-training</category><category>vision</category><category>model-integration</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-05-08-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-08-not-much/</guid><description>**OpenAI** launched both **Reinforcement Finetuning** and **Deep Research on GitHub repos**, drawing comparisons to **Cognition&apos;s DeepWiki**. 
**Nvidia** open-sourced **Open Code Reasoning models (32B, 14B, 7B)** with Apache 2.0 license, showing 30% better token efficiency and compatibility with llama.cpp, vLLM, transformers, and TGI. Independent evaluations highlight **Mistral Medium 3** rivaling **Llama 4 Maverick**, **Gemini 2.0 Flash**, and **Claude 3.7 Sonnet** in coding and math reasoning, priced significantly lower but no longer open-source. **Google&apos;s Gemini 2.5 Pro** is noted as their most intelligent model with improved coding from simple prompts, while **Gemini 2.5 Flash** incurs a 150x cost increase over Gemini 2.0 Flash due to higher token usage and cost. The **Absolute Zero Reasoner (AZR)** achieves SOTA performance in coding and math reasoning via reinforced self-play without external data. Vision-language model **X-REASONER** is post-trained on general-domain text for reasoning. **Apple ML research** released **FastVLM** with on-device iPhone demo. **HiDream LoRA trainer** supports QLoRA fine-tuning under memory constraints. **Nvidia&apos;s Parakeet ASR model** tops Hugging Face ASR leaderboard with MLX implementation. New datasets **SwallowCode** and **SwallowMath** boost LLM performance in math and code. 
Overall, a quiet day with significant model releases and performance insights.</description><pubDate>Thu, 08 May 2025 05:44:39 GMT</pubDate><category>openai</category><category>nvidia</category><category>mistral-ai</category><category>google</category><category>apple</category><category>huggingface</category><category>open-code-reasoning-32b</category><category>open-code-reasoning-14b</category><category>open-code-reasoning-7b</category><category>mistral-medium-3</category><category>llama-4-maverick</category><category>gemini-2.5-pro</category><category>gemini-2.5-flash</category><category>claude-3.7-sonnet</category><category>absolute-zero-reasoner</category><category>x-reasoner</category><category>fastvlm</category><category>parakeet-asr</category><category>reach_vb</category><category>artificialanlys</category><category>scaling01</category><category>iscienceluvr</category><category>arankomatsuzaki</category><category>awnihannun</category><category>risingsayak</category><category>reinforcement-learning</category><category>fine-tuning</category><category>code-generation</category><category>reasoning</category><category>vision</category><category>on-device-ai</category><category>model-performance</category><category>dataset-release</category><category>model-optimization</category></item><item><title>AI Engineer World&apos;s Fair: Second Run, Twice The Fun</title><link>https://news.smol.ai/issues/25-05-07-aiewf-2025/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-07-aiewf-2025/</guid><description>**The 2025 AI Engineer World&apos;s Fair** is expanding with **18 tracks** covering topics like **Retrieval + Search**, **GraphRAG**, **RecSys**, **SWE-Agents**, **Agent Reliability**, **Reasoning + RL**, **Voice AI**, **Generative Media**, **Infrastructure**, **Security**, and **Evals**. 
New focuses include **MCP**, **Tiny Teams**, **Product Management**, **Design Engineering**, and **Robotics and Autonomy** featuring foundation models from **Waymo**, **Tesla**, and **Google**. The event highlights the growing importance of **AI Architects** and enterprise AI leadership. Additionally, **Demis Hassabis** announced the **Gemini 2.5 Pro Preview &apos;I/O edition&apos;**, which leads coding and web development benchmarks on **LMArena**.</description><pubDate>Wed, 07 May 2025 05:44:39 GMT</pubDate><category>google-deepmind</category><category>waymo</category><category>tesla</category><category>anthropic</category><category>braintrust</category><category>gemini-2.5-pro</category><category>demishassabis</category><category>retrieval-augmentation</category><category>graph-databases</category><category>recommendation-systems</category><category>software-engineering-agents</category><category>agent-reliability</category><category>reinforcement-learning</category><category>voice</category><category>image-generation</category><category>video-generation</category><category>infrastructure</category><category>security</category><category>evaluation</category><category>ai-leadership</category><category>enterprise-ai</category><category>mcp</category><category>tiny-teams</category><category>product-management</category><category>design-engineering</category><category>robotics</category><category>foundation-models</category><category>coding</category><category>web-development</category></item><item><title>Gemini 2.5 Pro Preview 05-06 (I/O edition) - the SOTA vision+coding model</title><link>https://news.smol.ai/issues/25-05-06-gemini-2-5-pro/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-06-gemini-2-5-pro/</guid><description>**Gemini 2.5 Pro** has been updated with enhanced multimodal image-to-code capabilities and dominates the WebDev Arena Leaderboard, surpassing **Claude 3.7 Sonnet** in coding and other tasks. 
**Nvidia** released the **Llama-Nemotron** model family on Hugging Face, noted for efficient reasoning and inference. **Alibaba&apos;s Qwen3** models range from 0.6B to 235B parameters, including dense and MoE variants. **KerasRS** was released by **François Chollet** as a new recommender system library compatible with JAX, PyTorch, and TensorFlow, optimized for TPUs. These updates highlight advancements in coding, reasoning, and speech recognition models.</description><pubDate>Tue, 06 May 2025 05:44:39 GMT</pubDate><category>google-deepmind</category><category>nvidia</category><category>alibaba</category><category>hugging-face</category><category>gemini-2.5-pro</category><category>claude-3.7-sonnet</category><category>llama-nemotron</category><category>qwen3</category><category>demishassabis</category><category>_philschmid</category><category>lmarena_ai</category><category>scaling01</category><category>fchollet</category><category>multimodality</category><category>coding</category><category>reasoning</category><category>model-release</category><category>speech-recognition</category><category>recommender-systems</category><category>benchmarking</category></item><item><title>Cursor @ $9b, OpenAI Buys Windsurf @ $3b</title><link>https://news.smol.ai/issues/25-05-05-cursor-openai-windsurf/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-05-cursor-openai-windsurf/</guid><description>**OpenAI** is reportedly close to closing a deal with Windsurf, coinciding with **Cursor&apos;s** $900M funding round at a $9B valuation. **Nvidia** launched the **Llama-Nemotron series** featuring models from 8B to 253B parameters, praised for reasoning and inference efficiency. **Alibaba** released the **Qwen3 family** with MoE and dense models up to 235B parameters, ranking highly in coding and math benchmarks. **DeepSeek** introduced **Prover-V2**, an open-source AI for math reasoning with an 88.9% pass rate on MiniF2F-test. 
**Microsoft** released reasoning-focused **Phi-4 models**, outperforming OpenAI&apos;s **o1-mini**. **Baidu** debuted turbo versions of **ERNIE 4.5 and X1** for faster, cheaper inference. **Suno v4.5** added advanced AI music generation features, while **Runway Gen-4 References** enable placing characters into scenes with high consistency. **KerasRS**, a new recommender system library optimized for TPUs, was released by **François Chollet**.</description><pubDate>Mon, 05 May 2025 05:44:39 GMT</pubDate><category>openai</category><category>cursor</category><category>nvidia</category><category>alibaba</category><category>deepseek</category><category>microsoft</category><category>baidu</category><category>suno</category><category>runway</category><category>keras</category><category>llama-nemotron-ultra</category><category>llama-nemotron-super</category><category>llama-nemotron-nano</category><category>qwen3-235b-a22b</category><category>prover-v2</category><category>phi-4-reasoning</category><category>ernie-4.5-turbo</category><category>ernie-x1-turbo</category><category>suno-v4.5</category><category>gen-4-references</category><category>o1-mini</category><category>_akhaliq</category><category>adcock_brett</category><category>lmarena_ai</category><category>fchollet</category><category>reasoning</category><category>inference-efficiency</category><category>open-license</category><category>moe-models</category><category>math-reasoning</category><category>theorem-proving</category><category>model-performance</category><category>music-generation</category><category>image-generation</category><category>recommender-systems</category><category>tpu-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-05-02-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-02-not-much/</guid><description>**Qwen model family** released quantized versions of Qwen3 models including **14B**, **32B**, and **235B** 
parameters, with promising coding capabilities in Qwen3-235B. **Microsoft** launched **Phi-4-reasoning**, a **14B** parameter model distilled from OpenAI&apos;s o3-mini, emphasizing supervised fine-tuning and reinforcement learning, outperforming larger models in some benchmarks. **Cohere&apos;s Command A** leads SQL performance on Bird Bench. **Google** introduced the **TRAJAN** eval for video generation temporal consistency and updated the **Gemini** OpenAI compatibility layer. **Inception Labs** launched a diffusion LLM API claiming 5x speed improvements over autoregressive models. Community rankings show **OpenAI&apos;s o3** model debuting strongly in web app-building tasks. Other releases include **AllenAI&apos;s OLMo2 1B** and additional Phi 4 variants. *&quot;Qwen3-235B shows promise for coding&quot;* and *&quot;Phi-4-reasoning tech report emphasizes SFT gains&quot;* highlight key advancements.</description><pubDate>Fri, 02 May 2025 05:44:39 GMT</pubDate><category>alibaba</category><category>together-ai</category><category>scaling01</category><category>microsoft</category><category>deepseek</category><category>cohere</category><category>google</category><category>epoch-ai-research</category><category>inception-labs</category><category>openai</category><category>allenai</category><category>qwen3-14b</category><category>qwen3-32b</category><category>qwen3-235b</category><category>phi-4-reasoning</category><category>o3-mini</category><category>command-a</category><category>gemini-2.5-pro</category><category>o4-mini</category><category>olm-o2-1b</category><category>o3</category><category>cline</category><category>_philschmid</category><category>iscienceluvr</category><category>alexalbert__</category><category>_lewtun</category><category>teortaxestex</category><category>sarahookr</category><category>reach_vb</category><category>quantization</category><category>fine-tuning</category><category>reinforcement-learning</category><category>benchmarking</category><catego
ry>video-generation</category><category>diffusion-models</category><category>model-performance</category><category>model-evaluation</category><category>model-release</category><category>text-generation</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-05-01-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-05-01-not-much/</guid><description>**Microsoft** released **Phi-4-reasoning**, a finetuned 14B reasoning model slightly behind QwQ but limited by data transparency and token efficiency issues. **Anthropic** introduced remote MCP server support and a 45-minute Research mode in **Claude**. **Cursor** published a model popularity list. **Alibaba** launched **Qwen3-235B** and other Qwen3 variants, highlighting budget-friendly coding and reasoning capabilities, available through the **Together AI** API. **Microsoft** also released **Phi-4-Mini-Reasoning** with benchmark performance on AIME 2025 and OmniMath. **DeepSeek** announced **DeepSeek-Prover V2** with state-of-the-art math problem solving, scaling to 671B parameters. **Meta AI**&apos;s **Llama** models hit 1.2 billion downloads, with new **Llama Guard 4** and **Prompt Guard 2** for input/output filtering and jailbreak prevention. **Xiaomi** released the open-source reasoning model **MiMo-7B** trained on 25 trillion tokens. Discussions on AI model evaluation highlighted issues with the **LMArena leaderboard**, data access biases favoring proprietary models, and challenges in maintaining fair benchmarking, with suggestions for alternatives like **OpenRouterAI** rankings. 
*&quot;LMArena slop and biased&quot;* and *&quot;61.3% of all data going to proprietary model providers&quot;* were noted concerns.</description><pubDate>Thu, 01 May 2025 05:44:39 GMT</pubDate><category>microsoft</category><category>anthropic</category><category>cursor</category><category>alibaba</category><category>togethercompute</category><category>deepseek</category><category>meta-ai-fair</category><category>xiaomi</category><category>openrouterai</category><category>cohere</category><category>phi-4</category><category>phi-4-mini-reasoning</category><category>qwen3-235b</category><category>qwen3-moe-235b</category><category>qwen3-moe-30b</category><category>qwen3-dense-32b</category><category>qwen3-dense-14b</category><category>qwen3-dense-8b</category><category>qwen3-dense-4b</category><category>qwen3-dense-0.6b</category><category>qwen2.5-omni-3b</category><category>deepseek-prover-v2</category><category>llama</category><category>llama-guard-4</category><category>prompt-guard-2</category><category>mimo-7b</category><category>cline</category><category>reach_vb</category><category>vipulved</category><category>akhaliq</category><category>omarsar0</category><category>zhs05232838</category><category>huajian_xin</category><category>mervenoyann</category><category>karpathy</category><category>random_walker</category><category>sarahookr</category><category>blancheminerva</category><category>clefourrier</category><category>reasoning</category><category>model-fine-tuning</category><category>model-evaluation</category><category>benchmarking</category><category>model-popularity</category><category>open-source</category><category>math</category><category>model-scaling</category><category>model-filtering</category><category>jailbreak-prevention</category></item><item><title>ChatGPT responds to GlazeGate + LMArena responds to Cohere</title><link>https://news.smol.ai/issues/25-04-30-glazegate/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-04-30-glazegate/</guid><description>**OpenAI** faced backlash after a controversial ChatGPT update, leading to an official retraction admitting they &quot;focused too much on short-term feedback.&quot; Researchers from **Cohere** published a paper criticizing **LMArena** for unfair practices favoring incumbents like **OpenAI**, **DeepMind**, **X.ai**, and **Meta AI Fair**. The **Qwen3 family** by **Alibaba** was released, featuring models up to **235B MoE**, supporting **119 languages** and trained on **36 trillion tokens**, with integration into **vLLM** and support in tools like **llama.cpp**. Meta announced the second round of **Llama Impact Grants** to promote open-source AI innovation. Discussions on AI Twitter highlighted concerns about leaderboard overfitting and fairness in model benchmarking, with notable commentary from **karpathy** and others.</description><pubDate>Wed, 30 Apr 2025 15:44:39 GMT</pubDate><category>openai</category><category>cohere</category><category>lm-arena</category><category>deepmind</category><category>x-ai</category><category>meta-ai-fair</category><category>alibaba</category><category>vllm</category><category>llamaindex</category><category>qwen3-235b-a22b</category><category>qwen3</category><category>qwen3-moe</category><category>llama-4</category><category>joannejang</category><category>arankomatsuzaki</category><category>karpathy</category><category>sarahookr</category><category>reach_vb</category><category>model-releases</category><category>model-benchmarking</category><category>performance-evaluation</category><category>open-source</category><category>multilinguality</category><category>model-integration</category><category>fine-tuning</category><category>model-optimization</category></item><item><title>LlamaCon: Meta AI gets into the Llama API platform business</title><link>https://news.smol.ai/issues/25-04-29-llamacon/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-04-29-llamacon/</guid><description>**Meta** celebrated progress in the **Llama** ecosystem at LlamaCon, launching an AI Developer platform with finetuning and fast inference powered by **Cerebras** and **Groq** hardware, though it remains waitlisted. Meanwhile, **Alibaba** released the **Qwen3** family of large language models, including **two MoE models** and **six dense models** ranging from **0.6B to 235B parameters**, with the flagship **Qwen3-235B-A22B** achieving competitive benchmark results and supporting **119 languages and dialects**. The Qwen3 models are optimized for coding and agentic capabilities, are Apache 2.0 licensed, and have broad deployment support including local usage with tools like **vLLM**, **Ollama**, and **llama.cpp**. Community feedback highlights Qwen3&apos;s scalable performance and superiority over models like OpenAI&apos;s **o3-mini**.</description><pubDate>Tue, 29 Apr 2025 05:44:39 GMT</pubDate><category>meta-ai-fair</category><category>cerebras</category><category>groq</category><category>alibaba</category><category>vllm</category><category>ollama</category><category>llamaindex</category><category>hugging-face</category><category>llama-cpp</category><category>llama-4</category><category>qwen3</category><category>qwen3-235b-a22b</category><category>qwen3-30b-a3b</category><category>qwen3-4b</category><category>qwen2-5-72b-instruct</category><category>o3-mini</category><category>reach_vb</category><category>huybery</category><category>teortaxestex</category><category>awnihannun</category><category>thezachmueller</category><category>model-release</category><category>fine-tuning</category><category>reinforcement-learning</category><category>moe</category><category>multilingual-models</category><category>model-optimization</category><category>model-deployment</category><category>coding</category><category>benchmarking</category><category>apache-license</category></item><item><title>Qwen 
3: 0.6B to 235B MoE full+base models that beat R1 and o1</title><link>https://news.smol.ai/issues/25-04-28-qwen-3/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-28-qwen-3/</guid><description>**Qwen 3** has been released by **Alibaba** featuring a range of models including two MoE variants, **Qwen3-235B-A22B** and **Qwen3-30B-A3B**, which demonstrate competitive performance against top models like **DeepSeek-R1**, **o1**, **o3-mini**, **Grok-3**, and **Gemini-2.5-Pro**. The models introduce an &quot;enable_thinking=True&quot; mode with advanced soft switching for inference scaling. The release is notable for its Apache 2.0 license and broad inference platform support including MCP. The dataset improvements and multi-stage RL post-training contribute to performance gains. Meanwhile, **Gemini 2.5 Pro** from **Google DeepMind** shows strong coding and long-context reasoning capabilities, and **DeepSeek R2** is anticipated soon. Twitter discussions highlight Qwen3&apos;s finegrained MoE architecture, large context window, and multi-agent system applications.</description><pubDate>Mon, 28 Apr 2025 05:44:39 
GMT</pubDate><category>alibaba</category><category>google-deepmind</category><category>deepseek</category><category>mistral-ai</category><category>qwen-3</category><category>qwen3-235b-a22b</category><category>qwen3-30b-a3b</category><category>deepseek-r1</category><category>o1</category><category>o3-mini</category><category>grok-3</category><category>gemini-2.5-pro</category><category>awnihannun</category><category>prince_canuma</category><category>actuallyisaak</category><category>oriolvinyalsml</category><category>iscienceluvr</category><category>reach_vb</category><category>teortaxestex</category><category>omarsar0</category><category>mixture-of-experts</category><category>reinforcement-learning</category><category>benchmarking</category><category>model-release</category><category>model-architecture</category><category>long-context</category><category>multi-agent-systems</category><category>inference</category><category>dataset-release</category></item><item><title>Cognition&apos;s DeepWiki, a free encyclopedia of all GitHub repos</title><link>https://news.smol.ai/issues/25-04-25-cognition-deepwiki/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-25-cognition-deepwiki/</guid><description>**Silas Alberti** of **Cognition** announced **DeepWiki**, a free encyclopedia of all GitHub repos providing Wikipedia-like descriptions and Devin-backed chatbots for public repos. **Meta** released **Perception Encoders (PE)** under the Apache 2.0 license, outperforming **InternVL3** and **Qwen2.5VL** on vision tasks. **Alibaba** launched the **Qwen Chat App** for iOS and Android. **Hugging Face** integrated the **Dia 1.6B SoTA** text-to-speech model via **FAL**. **OpenAI** expanded deep research usage with a lightweight version powered by the **o4-mini** model, now available to free users. **Perplexity AI** updated their model selector with **Grok 3 Beta**, **o4-mini**, and support for models like **gemini 2.5 pro**, **claude 3.7**, and **gpt-4.1**. 
The **vLLM** project introduced the **OpenRLHF** framework for reinforcement learning with human feedback. The **Surya OCR** alpha model supports 90+ languages and LaTeX. The **MegaParse** open-source library was introduced for LLM-ready data formats.</description><pubDate>Fri, 25 Apr 2025 05:44:39 GMT</pubDate><category>cognition</category><category>meta-ai-fair</category><category>alibaba</category><category>hugging-face</category><category>openai</category><category>perplexity-ai</category><category>vllm</category><category>o4-mini</category><category>perception-encoder</category><category>qwen-2.5-vl</category><category>dia-1.6b</category><category>grok-3</category><category>gemini-2.5-pro</category><category>claude-3.7</category><category>gpt-4.1</category><category>silas-alberti</category><category>mervenoyann</category><category>reach_vb</category><category>aravsrinivas</category><category>vikparuchuri</category><category>lioronai</category><category>vision</category><category>text-to-speech</category><category>reinforcement-learning</category><category>ocr</category><category>model-releases</category><category>model-integration</category><category>open-source</category><category>frameworks</category><category>chatbots</category><category>model-selector</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-04-24-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-24-not-much/</guid><description>AI news for April 23-24, 2025, covering new model releases, benchmarks, and research developments from companies like OpenAI, Google DeepMind, Anthropic, and Epoch AI Research.</description><pubDate>Thu, 24 Apr 2025 05:44:39 GMT</pubDate><category>openai</category><category>google</category><category>anthropic</category><category>epoch ai 
research</category><category>gpt-image-1</category><category>o3</category><category>o4-mini</category><category>gpt-4.1</category><category>dam</category><category>image-generation</category><category>model-benchmarks</category><category>vision-language-models</category><category>music-ai</category><category>ai-experiences</category><category>ai-research</category><category>supercomputers</category></item><item><title>gpt-image-1 - ChatGPT&apos;s imagegen model, confusingly NOT 4o, now available in API</title><link>https://news.smol.ai/issues/25-04-23-gpt-image-1/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-23-gpt-image-1/</guid><description>**OpenAI** officially launched the **gpt-image-1** API for image generation and editing, supporting features like alpha channel transparency and a &quot;low&quot; content moderation policy. **OpenAI&apos;s** models **o3** and **o4-mini** are leading in benchmarks for style control, math, coding, and hard prompts, with **o3** ranking #1 in several categories. A new benchmark called **Vending-Bench** reveals performance variance in LLMs on extended tasks. **GPT-4.1** ranks in the top 5 for hard prompts and math. **Nvidia&apos;s** **Eagle 2.5-8B** matches **GPT-4o** and **Qwen2.5-VL-72B** in long-video understanding. AI supercomputer performance doubles every 9 months, with **xAI&apos;s Colossus** costing an estimated $7 billion and the US dominating 75% of global performance. The Virology Capabilities Test shows **OpenAI&apos;s o3** outperforms 94% of expert virologists. 
**Nvidia** also released the **Describe Anything Model (DAM)**, a multimodal LLM for detailed image and video captioning, now available on Hugging Face.</description><pubDate>Wed, 23 Apr 2025 05:44:39 GMT</pubDate><category>openai</category><category>nvidia</category><category>hugging-face</category><category>x-ai</category><category>gpt-image-1</category><category>o3</category><category>o4-mini</category><category>gpt-4.1</category><category>eagle-2.5-8b</category><category>gpt-4o</category><category>qwen2.5-vl-72b</category><category>kevinweil</category><category>lmarena_ai</category><category>_philschmid</category><category>willdepue</category><category>arankomatsuzaki</category><category>epochairesearch</category><category>danhendrycks</category><category>reach_vb</category><category>mervenoyann</category><category>_akhaliq</category><category>image-generation</category><category>content-moderation</category><category>benchmarking</category><category>long-context</category><category>multimodality</category><category>model-performance</category><category>supercomputing</category><category>virology</category><category>video-understanding</category><category>model-releases</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-04-22-not-much/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-22-not-much/</guid><description>**Nemotron-H** model family introduces hybrid Mamba-Transformer models with up to **3x faster inference** and variants including **8B**, **56B**, and a compressed **47B** model. **Nvidia Eagle 2.5** is a frontier VLM for long-context multimodal learning, matching **GPT-4o** and **Qwen2.5-VL-72B** on long-video understanding. **Gemini 2.5 Flash** shows improved dynamic thinking and cost-performance, outperforming previous Gemini versions. **Gemma 3** now supports **torch.compile** for about **60% faster inference** on consumer GPUs. 
**SRPO** using **Qwen2.5-32B** surpasses DeepSeek-R1-Zero-32B on benchmarks with reinforcement learning only. **Alibaba&apos;s Uni3C** unifies 3D-enhanced camera and human motion controls for video generation. **Seedream 3.0** by **ByteDance** is a bilingual image generation model with high-resolution outputs up to **2K**. **Adobe DRAGON** optimizes diffusion generative models with distributional rewards. **Kimina-Prover Preview** is an LLM trained with reinforcement learning from **Qwen2.5-72B**, achieving **80.7% pass@8192** on miniF2F. **BitNet b1.58 2B4T** is a native 1-bit LLM with **2B parameters** trained on **4 trillion tokens**, matching full-precision LLM performance with better efficiency. Antidistillation sampling counters unwanted model distillation by modifying reasoning traces from frontier models.</description><pubDate>Tue, 22 Apr 2025 05:44:39 GMT</pubDate><category>nvidia</category><category>deepseek</category><category>hugging-face</category><category>alibaba</category><category>bytedance</category><category>adobe</category><category>nemotron-h</category><category>nvidia-eagle-2.5</category><category>gpt-4o</category><category>qwen2.5-vl-72b</category><category>gemini-2.5-flash</category><category>gemini-2.0-pro</category><category>gemini-exp-1206</category><category>gemma-3</category><category>qwen2.5-32b</category><category>deepseek-r1-zero-32b</category><category>uni3c</category><category>seedream-3.0</category><category>adobe-dragon</category><category>kimina-prover</category><category>qwen2.5-72b</category><category>bitnet-b1.58-2b4t</category><category>philschmid</category><category>arankomatsuzaki</category><category>osanseviero</category><category>iScienceLuvr</category><category>akhaliq</category><category>transformers</category><category>model-optimization</category><category>multimodality</category><category>long-context</category><category>reinforcement-learning</category><category>torch-compile</category><category>image-generation</ca
tegory><category>diffusion-models</category><category>distributional-rewards</category><category>model-efficiency</category><category>model-training</category><category>native-quantization</category><category>sampling-techniques</category></item><item><title>not much happened today; New email provider for AINews</title><link>https://news.smol.ai/issues/25-04-21-not-much-resend/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-21-not-much-resend/</guid><description>**Smol AI** is migrating its AI news email service to **Resend** to improve deliverability and enable new features like personalizable AI news and a &quot;Hacker News of AI.&quot; Recent AI model updates include **OpenAI**&apos;s API-only **GPT-4.1**, **Google Gemini 2.5 Flash** reasoning model, **ByteDance Seaweed** 7B-param video AI, **Anthropic Claude**&apos;s values system, **Cohere Embed 4** multimodal embedding model, and **xAI Grok** updates with Memory and Studio features. Discussions also cover agentic workflows for document automation and AI coding patterns.</description><pubDate>Mon, 21 Apr 2025 05:44:39 
GMT</pubDate><category>smol-ai</category><category>resend</category><category>openai</category><category>google</category><category>bytedance</category><category>anthropic</category><category>cohere</category><category>x-ai</category><category>gpt-4.1</category><category>gpt-4o</category><category>gpt-4o-mini</category><category>gemini-2.5-flash</category><category>seaweed-7b</category><category>claude</category><category>embed-4</category><category>grok</category><category>adcock_brett</category><category>swyx</category><category>jerryjliu0</category><category>alexalbert</category><category>omarsar0</category><category>email-deliverability</category><category>model-releases</category><category>reasoning</category><category>video-generation</category><category>multimodality</category><category>embedding-models</category><category>agentic-workflows</category><category>document-processing</category><category>function-calling</category><category>tool-use</category><category>ai-coding</category></item><item><title>Grok 3 &amp; 3-mini now API Available</title><link>https://news.smol.ai/issues/25-04-18-ainews-grok-3-and-3-mini-now-api-available/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-18-ainews-grok-3-and-3-mini-now-api-available/</guid><description>**Grok 3** API is now available, including a smaller version called Grok 3 mini, which offers competitive pricing and full reasoning traces. **OpenAI** released a practical guide for building AI agents, while **LlamaIndex** supports the Agent2Agent protocol for multi-agent communication. **Codex CLI** is gaining traction with new features and competition from **Aider** and **Claude Code**. **GoogleDeepMind** launched **Gemini 2.5 Flash**, a hybrid reasoning model topping the Chatbot Arena leaderboard. **OpenAI**&apos;s o3 and o4-mini models show emergent behaviors from large-scale reinforcement learning. 
**EpochAIResearch** updated its methodology, removing **Maverick** from high FLOP models as **Llama 4 Maverick** training compute drops. **GoodfireAI** announced a $50M Series A for its Ember neural programming platform. **Mechanize** was founded to build virtual work environments and automation benchmarks. **GoogleDeepMind**&apos;s Quantisation Aware Training for Gemma 3 models reduces model size significantly, with open source checkpoints available.</description><pubDate>Sat, 19 Apr 2025 05:44:39 GMT</pubDate><category>openai</category><category>llamaindex</category><category>google-deepmind</category><category>epochairesearch</category><category>goodfireai</category><category>mechanize</category><category>grok-3</category><category>grok-3-mini</category><category>gemini-2.5-flash</category><category>o3</category><category>o4-mini</category><category>llama-4-maverick</category><category>gemma-3-27b</category><category>agent-development</category><category>agent-communication</category><category>cli-tools</category><category>reinforcement-learning</category><category>model-evaluation</category><category>quantization-aware-training</category><category>model-compression</category><category>training-compute</category><category>hybrid-reasoning</category><category>model-benchmarking</category></item><item><title>Gemini 2.5 Flash completes the total domination of the Pareto Frontier</title><link>https://news.smol.ai/issues/25-04-17-ainews-gemini-25-flash-completes-the-total-domination-of-the-pareto-frontier/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-17-ainews-gemini-25-flash-completes-the-total-domination-of-the-pareto-frontier/</guid><description>**Gemini 2.5 Flash** is introduced with a new &quot;thinking budget&quot; feature offering more control compared to Anthropic and OpenAI models, marking a significant update in the Gemini series. 
**OpenAI** launched **o3** and **o4-mini** models, emphasizing advanced tool use capabilities and multimodal understanding, with **o3** dominating several leaderboards but receiving mixed benchmark reviews. The importance of tool use in AI research and development is highlighted, with **OpenAI Codex CLI** announced as a lightweight open-source coding agent. The news reflects ongoing trends in AI model releases, benchmarking, and tool integration.</description><pubDate>Fri, 18 Apr 2025 02:06:17 GMT</pubDate><category>google</category><category>openai</category><category>anthropic</category><category>gemini-2.5-flash</category><category>o3</category><category>o4-mini</category><category>sama</category><category>kevinweil</category><category>markchen90</category><category>alexandr_wang</category><category>polynoamial</category><category>scaling01</category><category>aidan_mclau</category><category>cwolferesearch</category><category>tool-use</category><category>multimodality</category><category>benchmarking</category><category>reasoning</category><category>reinforcement-learning</category><category>open-source</category><category>model-releases</category><category>chain-of-thought</category><category>coding-agent</category></item><item><title>OpenAI o3, o4-mini, and Codex CLI</title><link>https://news.smol.ai/issues/25-04-16-ainews-openai-o3-o4-mini-and-codex-cli/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-16-ainews-openai-o3-o4-mini-and-codex-cli/</guid><description>**OpenAI** launched the **o3** and **o4-mini** models, emphasizing improvements in **reinforcement-learning scaling** and overall efficiency, making **o4-mini** cheaper and better across prioritized metrics. These models showcase enhanced **vision** and **tool use** capabilities, though API access for these features is pending. The release includes **Codex CLI**, an open-source coding agent that integrates with these models to convert natural language into working code. 
Accessibility extends to **ChatGPT Plus, Pro, and Team users**, with **o3** being notably more expensive than **Gemini 2.5 Pro**. Performance benchmarks highlight the intelligence gains from scaling inference, with comparisons against models like **Sonnet** and **Gemini**. The launch has been well received despite some less favorable evaluation results.</description><pubDate>Thu, 17 Apr 2025 03:17:29 GMT</pubDate><category>openai</category><category>o3</category><category>o4-mini</category><category>gemini-2.5-pro</category><category>claude-3-sonnet</category><category>chatgpt</category><category>sama</category><category>aidan_mclau</category><category>markchen90</category><category>gdb</category><category>aidan_clark_</category><category>kevinweil</category><category>swyx</category><category>polynoamial</category><category>scaling01</category><category>reinforcement-learning</category><category>performance</category><category>vision</category><category>tool-use</category><category>open-source</category><category>coding-agents</category><category>model-benchmarking</category><category>multimodality</category><category>scaling</category><category>inference</category></item><item><title>QwQ-32B claims to match DeepSeek R1-671B</title><link>https://news.smol.ai/issues/25-04-16-ainews-qwq-32b-claims-to-match-deepseek-r1-671b/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-16-ainews-qwq-32b-claims-to-match-deepseek-r1-671b/</guid><description>**Alibaba Qwen** released their **QwQ-32B** model, a **32 billion parameter** reasoning model using a novel two-stage reinforcement learning approach: first scaling RL for math and coding tasks with accuracy verifiers and code execution servers, then applying RL for general capabilities like instruction following and alignment. Meanwhile, **OpenAI** rolled out **GPT-4.5** to Plus users, with mixed feedback on coding performance and noted inference cost improvements. 
The QwQ model aims to compete with larger MoE models like **DeepSeek-R1**. *&quot;GPT-4.5 is unusable for coding&quot;* was a notable user critique, while others praised its reasoning improvements due to scaling pretraining.</description><pubDate>Wed, 16 Apr 2025 19:06:15 GMT</pubDate><category>alibaba</category><category>openai</category><category>deepseek-ai</category><category>qwen-2.5-plus</category><category>qwq-32b</category><category>deepseek-r1</category><category>gpt-4.5</category><category>gpt-3</category><category>davinci</category><category>aidan_mclau</category><category>sama</category><category>scaling01</category><category>juberti</category><category>polynoamial</category><category>reach_vb</category><category>reinforcement-learning</category><category>math</category><category>code-execution</category><category>instruction-following</category><category>alignment</category><category>reasoning</category><category>model-release</category><category>model-benchmarking</category><category>scaling</category><category>performance</category><category>inference-costs</category></item><item><title>SOTA Video Gen: Veo 2 and Kling 2 are GA for developers</title><link>https://news.smol.ai/issues/25-04-15-ainews-sota-video-gen-veo-2-and-kling-2-are-ga-for-developers/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-15-ainews-sota-video-gen-veo-2-and-kling-2-are-ga-for-developers/</guid><description>**Google&apos;s Veo 2** video generation model is now available in the **Gemini API** with a cost of **35 cents per second** of generated video, marking a significant step in accessible video generation. Meanwhile, China&apos;s **Kling 2** model launched with pricing around **$2 for a 10-second clip** and a minimum subscription of **$700 per month for 3 months**, generating excitement despite some skill challenges. 
**OpenAI** announced the **GPT-4.1 family** release, including **GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano**, highlighting improvements in **coding, instruction following, and a 1 million token context window**. The GPT-4.1 models are **26% cheaper than GPT-4o** and will replace the **GPT-4.5 Preview** API version by July 14. Performance benchmarks show GPT-4.1 achieving **54-55% on SWE-bench verified** and a **60% improvement over GPT-4o** in some internal tests, though some critiques note it underperforms compared to other models like OpenRouter and DeepSeekV3 in coding tasks. The release is API-only, with a prompting guide provided for developers.</description><pubDate>Wed, 16 Apr 2025 05:55:06 GMT</pubDate><category>google</category><category>openai</category><category>veo-2</category><category>gemini</category><category>gpt-4.1</category><category>gpt-4o</category><category>gpt-4.5-preview</category><category>gpt-4.1-mini</category><category>gpt-4.1-nano</category><category>kevinweil</category><category>stevenheidel</category><category>aidan_clark_</category><category>video-generation</category><category>api</category><category>coding</category><category>instruction-following</category><category>context-window</category><category>performance</category><category>benchmarks</category><category>model-deprecation</category></item><item><title>GPT 4.1: The New OpenAI Workhorse</title><link>https://news.smol.ai/issues/25-04-14-ainews-gpt-41-the-new-openai-workhorse/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-14-ainews-gpt-41-the-new-openai-workhorse/</guid><description>**OpenAI** released **GPT-4.1**, including **GPT-4.1 mini** and **GPT-4.1 nano**, highlighting improvements in **coding**, **instruction following**, and handling **long contexts** up to **1 million tokens**. The model achieves a **54 score on SWE-bench verified** and shows a **60% improvement over GPT-4o** on internal benchmarks. 
Pricing for **GPT-4.1 nano** is notably low at **$0.10/1M input** and **$0.40/1M output**. **GPT-4.5 Preview** is being deprecated in favor of **GPT-4.1**. Integration support includes **Llama Index** with day 0 support. Some negative feedback was noted for **GPT-4.1 nano**. Additionally, **Perplexity&apos;s Sonar API** ties with **Gemini-2.5 Pro** for the top spot in the LM Search Arena leaderboard. New benchmarks like **MRCR** and **GraphWalks** were introduced alongside updated prompting guides and cookbooks.</description><pubDate>Tue, 15 Apr 2025 05:16:26 GMT</pubDate><category>openai</category><category>llama-index</category><category>perplexity-ai</category><category>google-deepmind</category><category>gpt-4.1</category><category>gpt-4.1-mini</category><category>gpt-4.1-nano</category><category>gpt-4o</category><category>gemini-2.5-pro</category><category>sama</category><category>kevinweil</category><category>omarsar0</category><category>aidan_mclau</category><category>danhendrycks</category><category>polynoamial</category><category>scaling01</category><category>aravsrinivas</category><category>lmarena_ai</category><category>coding</category><category>instruction-following</category><category>long-context</category><category>benchmarks</category><category>model-pricing</category><category>model-integration</category><category>model-deprecation</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-04-11-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-11-ainews-not-much-happened-today/</guid><description>The AI news recap highlights independent evaluations showing **Grok-3** outperforming models like **GPT-4.5** and **Claude 3.7 Sonnet** on reasoning benchmarks, while **Grok-3 mini** excels in reasoning tasks. Research on **reinforcement learning (RL)** fine-tuning reveals potential improvements for small reasoning models but also notes instability in reported gains. 
Benchmark results suggest **Quasar Alpha** and **Optimus Alpha** may be versions of **GPT-4.1**. Vision and multimodal models like **Kaleidoscope**, supporting 18 languages, and **InternVL3**, built on **InternViT** and **Qwen2.5VL**, demonstrate advances in multilingual vision and reasoning. The fusion model **TransMamba** combines transformer precision with speed via **SSM** mechanisms. Alibaba&apos;s **FantasyTalking** generates realistic talking portraits. Agent-focused events at **CMU** and tools like **FilmAgent AI** for virtual film production and **BrowseComp** benchmark for browsing agents were announced. The coding assistant **Augment** supports multiple IDEs with code analysis and suggestions. Discussions also covered Google’s new agent-to-agent protocol concept.</description><pubDate>Fri, 11 Apr 2025 20:07:39 GMT</pubDate><category>openai</category><category>alibaba</category><category>cmu</category><category>grok-3</category><category>grok-3-mini</category><category>gpt-4.5</category><category>claude-3.7-sonnet</category><category>quasar-alpha</category><category>optimus-alpha</category><category>gpt-4.1</category><category>kaleidoscope</category><category>internvl3</category><category>internvit</category><category>qwen2.5vl</category><category>transmamba</category><category>fantasytalking</category><category>rasbt</category><category>sarahookr</category><category>mervenoyann</category><category>gneubig</category><category>svpino</category><category>mathemagic1an</category><category>reinforcement-learning</category><category>reasoning</category><category>benchmarks</category><category>vision</category><category>multilinguality</category><category>multimodality</category><category>transformers</category><category>attention-mechanisms</category><category>agents</category><category>code-generation</category><category>model-performance</category></item><item><title>not much happened 
today</title><link>https://news.smol.ai/issues/25-04-10-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-10-ainews-not-much-happened-today/</guid><description>**OpenAI** teased a *Memory update in ChatGPT* with limited technical details. Evidence suggests upcoming releases of **o3** and **o4-mini** models, alongside a press leak about **GPT-4.1**. **X.ai** launched the **Grok 3** and **Grok 3 mini** APIs, confirmed as **o1** level models. Discussions compared **Google&apos;s TPUv7** with **Nvidia&apos;s GB200**, highlighting TPUv7&apos;s specs like **4,614 TFLOP/s FP8 performance**, **192 GB HBM**, and **1.2 Tbps ICI bandwidth**. TPUv7 may have pivoted from training to inference chip use. Key AI events include **Google Cloud Next 2025** and **Samsung&apos;s Gemini-powered Ballie robot**. The community is invited to participate in the **AI Engineer World&apos;s Fair 2025** and the 2025 State of AI Engineering survey.</description><pubDate>Fri, 11 Apr 2025 00:53:38 GMT</pubDate><category>openai</category><category>x-ai</category><category>google</category><category>nvidia</category><category>samsung</category><category>gpt-4.1</category><category>o3</category><category>o4-mini</category><category>grok-3</category><category>grok-3-mini</category><category>o1</category><category>tpuv7</category><category>gb200</category><category>sama</category><category>memory</category><category>model-release</category><category>hardware-accelerators</category><category>fp8</category><category>hbm</category><category>inference</category><category>ai-conferences</category><category>agent-collaboration</category><category>robotics</category><category>model-comparison</category><category>performance</category><category>power-consumption</category></item><item><title>Google&apos;s Agent2Agent Protocol (A2A)</title><link>https://news.smol.ai/issues/25-04-09-ainews-googles-agent2agent-protocol-a2a/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-04-09-ainews-googles-agent2agent-protocol-a2a/</guid><description>**Google Cloud Next** announcements featured the launch of **Google and DeepMind&apos;s** full **MCP support** and a new **Agent to Agent protocol** designed for agent interoperability with multiple partners. The protocol includes components like the **Agent Card**, **Task communication channels**, **Enterprise Auth and Observability**, and **Streaming and Push Notification support**. On the model front, **Moonshot AI** released **Kimi-VL-A3B**, a multimodal model with **128K context** and strong vision and math benchmark performance, outperforming **gpt-4o**. **Meta AI** introduced smaller versions of **llama-4** family models: **llama-4-scout** and **llama-4-maverick**, with a larger **Behemoth** model still in training. **DeepCoder 14B** from **UC Berkeley** is an open-source coding model rivaling **openai&apos;s o3-mini** and **o1** models, trained with reinforcement learning on 24K coding problems. 
**Nvidia** released **llama-3.1-nemotron-ultra-253b** on Hugging Face, noted for beating **llama-4-behemoth** and **maverick** and competing with **deepseek-r1**.</description><pubDate>Thu, 10 Apr 2025 01:31:18 GMT</pubDate><category>google</category><category>google-deepmind</category><category>moonshot-ai</category><category>meta-ai-fair</category><category>uc-berkeley</category><category>openai</category><category>nvidia</category><category>hugging-face</category><category>togethercompute</category><category>deepseek</category><category>kimi-vl-a3b</category><category>gpt-4o</category><category>llama-4-scout</category><category>llama-4-maverick</category><category>llama-4-behemoth</category><category>deepcoder-14b</category><category>o3-mini</category><category>o1</category><category>llama-3.1-nemotron-ultra-253b</category><category>deepseek-r1</category><category>reach_vb</category><category>_akhaliq</category><category>epochairesearch</category><category>artificialanlys</category><category>winglian</category><category>danielhanchen</category><category>yuchenj_uw</category><category>jeremyphoward</category><category>agent-interoperability</category><category>multimodality</category><category>vision</category><category>math</category><category>reinforcement-learning</category><category>coding</category><category>model-training</category><category>open-source</category><category>model-benchmarking</category><category>context-windows</category><category>streaming</category><category>push-notifications</category><category>enterprise-authentication</category><category>model-release</category></item><item><title>DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level</title><link>https://news.smol.ai/issues/25-04-09-ainews-deepcoder-a-fully-open-source-14b-coder-at-o3-mini-level/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-09-ainews-deepcoder-a-fully-open-source-14b-coder-at-o3-mini-level/</guid><description>**Together AI and Agentica** 
released **DeepCoder-14B**, an open-source 14B parameter coding model rivaling OpenAI&apos;s **o3-mini** and **o1** on coding benchmarks, trained with an open-source RL framework from ByteDance and costing about **$26,880**. **Google DeepMind** launched **Gemini 2.5 Pro** with experimental &quot;Flash&quot; versions available to subscribers. **Moonshot AI** introduced **Kimi-VL-A3B**, a multimodal model with **128K context** outperforming **gpt-4o** on vision and math benchmarks. **Meta AI** released **Llama 4 Scout** and **Maverick**, with a larger **Behemoth** model in training, featuring mixture-of-experts and L2 norm techniques. **Runway** launched **Gen-4 Turbo** with 10x better results than Gen-3 at the same cost. **Google** announced **Imagen 3**, a high-quality text-to-image model now in Vertex AI, enabling easier object removal. The report highlights open-source contributions, reinforcement learning training optimizations, and significant model performance improvements across coding, multimodal, and image generation domains.</description><pubDate>Wed, 09 Apr 2025 19:51:30 
GMT</pubDate><category>together-ai</category><category>agentica</category><category>openai</category><category>bytedance</category><category>google-deepmind</category><category>moonshot-ai</category><category>meta-ai-fair</category><category>runway</category><category>deepcoder-14b</category><category>o3-mini</category><category>o1</category><category>gemini-2.5-pro</category><category>kimi-vl-a3b</category><category>gpt-4o</category><category>llama-4-scout</category><category>maverick</category><category>behemoth</category><category>gen-4-turbo</category><category>imagen-3</category><category>philschmid</category><category>lepikhin</category><category>reach_vb</category><category>akhaliq</category><category>yuchenj_uw</category><category>epochairesearch</category><category>danielhanchen</category><category>c_valenzuelab</category><category>open-source</category><category>reinforcement-learning</category><category>code-generation</category><category>multimodality</category><category>model-training</category><category>mixture-of-experts</category><category>l2-normalization</category><category>image-generation</category><category>model-performance</category><category>context-windows</category></item><item><title>Llama 4&apos;s Controversial Weekend Release</title><link>https://news.smol.ai/issues/25-04-07-ainews-llama-4s-controversial-weekend-release/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-07-ainews-llama-4s-controversial-weekend-release/</guid><description>**Meta** released **Llama 4**, featuring two new medium-size MoE open models and a promised 2 Trillion parameter &quot;behemoth&quot; model, aiming to be the largest open model ever. The release included advanced training techniques like Chameleon-like early fusion with MetaCLIP, interleaved chunked attention without RoPE, native FP8 training, and training on up to 40 trillion tokens. 
Despite the hype, the release faced criticism for lack of transparency compared to Llama 3, implementation issues, and poor performance on some benchmarks. Meta leadership, including **Ahmad Al Dahle**, denied allegations of training on test sets. The smallest Scout model at 109B parameters is too large for consumer GPUs, and the claimed 10 million token context is disputed. The community response has been mixed, with some praising the openness and others pointing out discrepancies and quality concerns.</description><pubDate>Tue, 08 Apr 2025 01:55:40 GMT</pubDate><category>meta</category><category>llama-4</category><category>llama-3</category><category>llama-3-2</category><category>ahmad_al_dahle</category><category>ylecun</category><category>reach_vb</category><category>yuchenj_uw</category><category>mixture-of-experts</category><category>early-fusion</category><category>attention-mechanisms</category><category>fp8-training</category><category>training-data</category><category>benchmarking</category><category>model-performance</category><category>model-release</category><category>multimodality</category><category>open-models</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-04-04-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-04-04-ainews-not-much-happened-today/</guid><description>**OpenAI** announced that **o3** and **o4-mini** models will be released soon, with **GPT-5** expected in a few months, delayed for quality improvements and capacity planning. **DeepSeek** introduced **Self-Principled Critique Tuning (SPCT)** to enhance inference-time scalability for generalist reward models. **Anthropic&apos;s Sonnet 3.7** remains a top coding model. **Google&apos;s Gemma 3** is available on KerasHub, and **Qwen 2.5 VL** powers a new Apache 2.0 licensed OCR model. 
**Gemini 2.5 Pro** entered public preview with increased rate limits and pricing announced, becoming a preferred model for many tasks except image generation. Meta&apos;s architectural advantage and the **FrontierMath benchmark** challenge AI&apos;s long-form reasoning and worldview development. Research reveals LLMs focus attention on the first token as an &quot;attention sink,&quot; preserving representation diversity, demonstrated in **Gemma 7B** and **LLaMa 3.1** models. **MegaScale-Infer** offers efficient serving of large-scale Mixture-of-Experts models with up to **1.90x higher per-GPU throughput**.</description><pubDate>Sat, 05 Apr 2025 01:50:06 GMT</pubDate><category>openai</category><category>deepseek</category><category>anthropic</category><category>google</category><category>meta-ai-fair</category><category>o3</category><category>o4-mini</category><category>gpt-5</category><category>sonnet-3.7</category><category>gemma-3</category><category>qwen-2.5-vl</category><category>gemini-2.5-pro</category><category>gemma-7b</category><category>llama-3-1-405b</category><category>sama</category><category>akhaliq</category><category>nearcyan</category><category>fchollet</category><category>reach_vb</category><category>philschmid</category><category>teortaxestex</category><category>epochairesearch</category><category>omarsar0</category><category>inference-scaling</category><category>reward-modeling</category><category>coding-models</category><category>ocr</category><category>model-preview</category><category>rate-limiting</category><category>model-pricing</category><category>architectural-advantage</category><category>benchmarking</category><category>long-form-reasoning</category><category>attention-mechanisms</category><category>mixture-of-experts</category><category>gpu-throughput</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-04-03-ainews-not-much-happened-today/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-04-03-ainews-not-much-happened-today/</guid><description>**Gemini 2.5 Pro** shows strengths and weaknesses, notably lacking LaTeX math rendering unlike **ChatGPT**, and scored **24.4%** on the **2025 USAMO**. **DeepSeek V3** ranks 8th and 12th on recent leaderboards. **Qwen 2.5** models have been integrated into the **PocketPal** app. Research from **Anthropic** reveals that **Chains-of-Thought (CoT)** reasoning is often unfaithful, especially on harder tasks, raising safety concerns. **OpenAI**&apos;s **PaperBench** benchmark shows AI agents struggle with long-horizon planning, with **Claude 3.5 Sonnet** achieving only **21.0%** accuracy. **CodeAct** framework generalizes **ReAct** for dynamic code writing by agents. **LangChain** explains multi-agent handoffs in LangGraph. **Runway Gen-4** marks a new phase in media creation.</description><pubDate>Fri, 04 Apr 2025 06:34:03 GMT</pubDate><category>google</category><category>anthropic</category><category>openai</category><category>llama_index</category><category>langchain</category><category>runway</category><category>deepseek</category><category>gemini-2.5-pro</category><category>chatgpt</category><category>deepseek-v3</category><category>qwen-2.5</category><category>claude-3.5-sonnet</category><category>claude-3.7-sonnet</category><category>rasbt</category><category>danielhanchen</category><category>hkproj</category><category>math</category><category>benchmarking</category><category>chains-of-thought</category><category>model-performance</category><category>multi-agent-systems</category><category>agent-frameworks</category><category>media-generation</category><category>long-horizon-planning</category><category>code-generation</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-04-01-ainews-not-much-happened-today/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-04-01-ainews-not-much-happened-today/</guid><description>**OpenAI** plans to release its first open-weight language model since **GPT-2** in the coming months, signaling a move towards more open AI development. **DeepSeek** launched its open-source **R1 model** earlier this year, challenging perceptions of China&apos;s AI progress. **Gemma 3** has achieved function calling capabilities and ranks on the **Berkeley Function-Calling Leaderboard**, while **GemmaCoder3-12b** improves code reasoning performance on **LiveCodeBench**. **Alibaba_Qwen&apos;s Qwen2.5-Omni** introduces a novel Thinker-Talker system and **TMRoPE** for multimodal input understanding. The **TogetherCompute** team achieved **140 TPS** on a 671B parameter model, outperforming **Azure** and **DeepSeek API** on **Nvidia GPUs**. **OpenAI** also expanded **ChatGPT** features with image generation for all free users and a new voice release. **Runway Gen-4** enhances animation for miniature dioramas, and **LangChain** launched a chat-based generative UI agent. Commercial deployment of **Figure 03 humanoid robots** at **BMW** highlights advances in autonomy and manufacturing scaling. 
New tools include **OpenAI&apos;s realtime transcription API** with **WebRTC** support and **Amazon&apos;s Nova Act AI browser agent**.</description><pubDate>Wed, 02 Apr 2025 06:14:34 GMT</pubDate><category>openai</category><category>deepseek</category><category>berkeley</category><category>alibaba</category><category>togethercompute</category><category>nvidia</category><category>azure</category><category>runway</category><category>langchain</category><category>bmw</category><category>amazon</category><category>gpt-2</category><category>r1</category><category>gemma-3</category><category>gemmacoder3-12b</category><category>qwen2.5-omni</category><category>sama</category><category>clémentdelangue</category><category>lioronai</category><category>scaling01</category><category>cognitivecompai</category><category>osanseviero</category><category>jack_w_rae</category><category>ben_burtenshaw</category><category>theturingpost</category><category>vipulved</category><category>kevinweil</category><category>tomlikesrobots</category><category>adcock_brett</category><category>juberti</category><category>open-source</category><category>function-calling</category><category>benchmarking</category><category>code-reasoning</category><category>multimodality</category><category>inference-speed</category><category>image-generation</category><category>voice-generation</category><category>animation</category><category>robotics</category><category>realtime-transcription</category><category>webrtc</category></item><item><title>&gt;$41B raised today (OpenAI @ 300b, Cursor @ 9.5b, Etched @ 1.5b)</title><link>https://news.smol.ai/issues/25-03-31-ainews-greaterdollar41b-raised-today-openai-300b-cursor-95b-etched-15b/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-31-ainews-greaterdollar41b-raised-today-openai-300b-cursor-95b-etched-15b/</guid><description>**OpenAI** is preparing to release a highly capable open language model, their first since GPT-2, with a focus on reasoning 
and community feedback, as shared by **@kevinweil** and **@sama**. **DeepSeek V3 0324** has achieved the #5 spot on the Arena leaderboard, becoming the top open model with an MIT license and cost advantages. **Gemini 2.5 Pro** is noted for outperforming models like **Claude 3.7 Sonnet** in coding tasks, with upcoming pricing and improvements expected soon. New startups like **Sophont** are building open multimodal foundation models for healthcare. Significant fundraises include **Cursor** closing $625M at a $9.6B valuation and **Etched** raising $85M at $1.5B. Innovations in AI infrastructure include **SkyPilot&apos;s** cost-efficient cloud provisioning and the launch of **AgentEvals**, an open-source package for evaluating AI agents. Discussions on smartphone privacy highlight **iPhone&apos;s** stronger user defense compared to Android.</description><pubDate>Tue, 01 Apr 2025 06:33:20 GMT</pubDate><category>openai</category><category>deepseek</category><category>gemini</category><category>cursor</category><category>etched</category><category>skypilot</category><category>agent-evals</category><category>deepseek-v3-0324</category><category>gemini-2.5-pro</category><category>claude-3.7-sonnet</category><category>kevinweil</category><category>sama</category><category>lmarena_ai</category><category>scaling01</category><category>iscienceluvr</category><category>stevenheidel</category><category>lepikhin</category><category>dzhng</category><category>raizamrtn</category><category>karpathy</category><category>open-models</category><category>model-releases</category><category>model-performance</category><category>coding</category><category>multimodality</category><category>model-deployment</category><category>cost-efficiency</category><category>agent-evaluation</category><category>privacy</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-03-28-ainews-not-much-happened-today/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-03-28-ainews-not-much-happened-today/</guid><description>**GPT-4o** was praised for its improved coding, instruction following, and freedom, becoming the leading non-reasoning coding model surpassing **DeepSeek V3** and **Claude 3.7 Sonnet** in coding benchmarks, though it still lags behind reasoning models like **o3-mini**. Concerns about policy compliance in image generation were noted, with efforts to improve adherence. **Gemini 2.5 Pro** was highlighted for its advanced audio and video understanding, long context capabilities, and integration with platforms like **Cursor AI** and **Windsurf AI**. AI infrastructure developments include a partnership between **Together AI** and **Hypertec Group** to deliver large-scale GPU clusters, and **CoreWeave&apos;s IPO** was celebrated for advancing AI infrastructure. GPU and TPU usage is expected to increase significantly. *&quot;GPT-4o&apos;s transparency and background generation feature&quot;* and *&quot;Gemini 2.5 Pro scored above 50% on Simple-Bench AI Explanation&quot;* were key highlights.</description><pubDate>Fri, 28 Mar 2025 23:18:38 
GMT</pubDate><category>openai</category><category>deepseek</category><category>anthropic</category><category>google-deepmind</category><category>togethercompute</category><category>hypertecgroup</category><category>coreweave</category><category>cursor-ai</category><category>windsurf-ai</category><category>gpt-4o</category><category>deepseek-v3</category><category>claude-3.7-sonnet</category><category>o3-mini</category><category>gemini-2.5-pro</category><category>sama</category><category>kevinweil</category><category>joannejang</category><category>nrehiew_</category><category>giffmana</category><category>_philschmid</category><category>scaling01</category><category>saranormous</category><category>coding</category><category>instruction-following</category><category>image-generation</category><category>policy-compliance</category><category>long-context</category><category>audio-processing</category><category>video-processing</category><category>gpu-clusters</category><category>ai-infrastructure</category><category>api-access</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-03-27-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-27-ainews-not-much-happened-today/</guid><description>**OpenAI** announced the new **GPT-4o** model with enhanced instruction-following, complex problem-solving, and native image generation capabilities. The model shows improved performance in math, coding, and creativity, with features like transparent background image generation. Discussions around content filtering and policy for image generation emphasize balancing creative freedom and harm prevention. **DeepSeek V3-0324** APIs, available on **Hugging Face** and powered by **SambaNovaAI**, outperform benchmarks and models like **Gemini 2.0 Pro** and **Claude 3.7 Sonnet**. 
**Gemini 2.5 Pro** is recommended for coding, and **Gemma 3** can be deployed easily on Google Cloud Vertex AI via the new Model Garden SDK. The **Gemma 3 Technical Report** has been released on arXiv.</description><pubDate>Fri, 28 Mar 2025 01:20:31 GMT</pubDate><category>openai</category><category>hugging-face</category><category>sambanova</category><category>google-cloud</category><category>gpt-4o</category><category>deepseek-v3-0324</category><category>gemini-2.5-pro</category><category>gemini-3</category><category>claude-3.7-sonnet</category><category>abacaj</category><category>nrehiew_</category><category>sama</category><category>joannejang</category><category>giffmana</category><category>lmarena_ai</category><category>_philschmid</category><category>instruction-following</category><category>image-generation</category><category>content-filtering</category><category>model-performance</category><category>api</category><category>coding</category><category>model-deployment</category><category>benchmarking</category><category>model-release</category></item><item><title>OpenAI adopts MCP</title><link>https://news.smol.ai/issues/25-03-26-ainews-openai-adopts-mcp/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-26-ainews-openai-adopts-mcp/</guid><description>**OpenAI** announced support for **MCP**, a significant technical update. **Google&apos;s Gemini 2.5 Pro** leads benchmarks with top scores in **MMLU-Pro (86%)**, **GPQA Diamond (83%)**, and **AIME 2024 (88%)**, featuring a **1 million token context window** and multimodal inputs. **Alibaba&apos;s Qwen 2.5 Omni 7B** was released as a fully multimodal, interactive, open-source model with a novel &quot;thinker-talker&quot; architecture supporting voice and video chat. **DeepSeek V3-0324** outperforms its predecessor on multiple benchmarks. 
Research on reasoning features in large language models using sparse autoencoders was highlighted, alongside a study on scaling laws of synthetic data showing performance plateaus near **300B tokens**. Discussions also covered the fastest output speeds of Gemini models and concerns about over-reliance on benchmarks for intelligence measurement. *Swyx* will curate the Data Council AI Engineering Track in April.</description><pubDate>Thu, 27 Mar 2025 01:07:34 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>alibaba</category><category>togethercompute</category><category>gemini-2.5-pro</category><category>gemini-1.5-pro</category><category>gemini-2.0-flash</category><category>qwen-2.5-omni-7b</category><category>deepseek-v3-0324</category><category>deepseek-r1</category><category>swyx</category><category>model-benchmarking</category><category>multimodality</category><category>reasoning</category><category>scaling-laws</category><category>model-quantization</category><category>synthetic-data</category><category>model-performance</category><category>context-windows</category><category>speech-recognition</category><category>translation</category><category>audio-processing</category><category>video-processing</category></item><item><title>Gemini 2.5 Pro + 4o Native Image Gen</title><link>https://news.smol.ai/issues/25-03-25-ainews-gemini-25-pro-4o-native-image-gen/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-25-ainews-gemini-25-pro-4o-native-image-gen/</guid><description>**Gemini 2.5 Pro** from **Google DeepMind** has become the new top AI model, surpassing **Grok 3** by 40 LMarena points, with contributions from **Noam Shazeer** integrating Flash Thinking techniques. It is available as a free, rate-limited experimental model. Meanwhile, **OpenAI** released **GPT 4o Native Images**, an autoregressive image generation model with detailed insights shared by **Allan Jabri** and credits to **Gabe Goh**. 
Gemini 2.5 Pro excels in reasoning, coding, STEM, multimodal tasks, and instruction following, topping the LMarena leaderboard significantly. It is accessible via Google AI Studio and the Gemini App.</description><pubDate>Wed, 26 Mar 2025 01:13:42 GMT</pubDate><category>google-deepmind</category><category>openai</category><category>lmarena_ai</category><category>gemini-2.5-pro</category><category>gpt-4o</category><category>noam-shazeer</category><category>allan-jabri</category><category>gabe-goh</category><category>autoregressive-models</category><category>multimodality</category><category>reasoning</category><category>coding</category><category>instruction-following</category><category>model-release</category><category>leaderboards</category></item><item><title>Halfmoon is Reve Image: a new SOTA Image Model from ex-Adobe/Stability trio</title><link>https://news.smol.ai/issues/25-03-24-ainews-halfmoon-is-reve-image-a-new-sota-image-model-from-ex-adobestability-trio/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-24-ainews-halfmoon-is-reve-image-a-new-sota-image-model-from-ex-adobestability-trio/</guid><description>**Reve**, a new composite AI model from former Adobe and Stability alums **Christian Cantrell**, **Taesung Park**, and **Michaël Gharbi**, has emerged as the top-rated image generation model, surpassing previous state-of-the-art models like Recraft and Ideogram in text rendering and typography. The team emphasizes *&quot;enhancing visual generative models with logic&quot;* and *&quot;understanding user intent with advanced language capabilities&quot;* to iteratively amend visuals based on natural language input. 
Additionally, **DeepSeek-V3-0324** and **Alibaba&apos;s Qwen2.5-VL-32B-Instruct** models were released with notable performance improvements, including better vision task benchmarks and mathematical reasoning.</description><pubDate>Tue, 25 Mar 2025 01:43:04 GMT</pubDate><category>artificial-analysis</category><category>stability-ai</category><category>adobe</category><category>deepseek</category><category>alibaba</category><category>deepseek-v3-0324</category><category>qwen-2.5-vl-32b-instruct</category><category>recraft</category><category>christian-cantrell</category><category>taesung-park</category><category>michael-gharbi</category><category>text-to-image</category><category>prompt-understanding</category><category>model-composition</category><category>visual-generation</category><category>language-understanding</category><category>model-performance</category><category>complex-prompting</category><category>iterative-generation</category></item><item><title>lots of little things happened this week</title><link>https://news.smol.ai/issues/25-03-21-ainews-lots-of-little-things-happened-this-week/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-21-ainews-lots-of-little-things-happened-this-week/</guid><description>**Anthropic** introduced a novel &apos;think&apos; tool enhancing instruction adherence and multi-step problem solving in agents, with combined reasoning and tool use demonstrated by **Claude**. **NVIDIA**&apos;s **Llama-3.3-Nemotron-Super-49B-v1** ranked #14 on LMArena, noted for strong math reasoning and a 15M post-training dataset. **Sakana AI** launched a Sudoku-based reasoning benchmark to advance AI problem-solving capabilities. **Meta AI** released **SWEET-RL**, a reinforcement learning algorithm improving long-horizon multi-turn tasks by 6%, and introduced **CollaborativeAgentBench**, a benchmark for collaborative LLM agents working with humans on programming and design tasks. 
**Percy Liang** relaunched the **HELM** benchmark with 5 challenging datasets evaluating 22 top language models.</description><pubDate>Sat, 22 Mar 2025 00:20:28 GMT</pubDate><category>anthropic</category><category>nvidia</category><category>sakana-ai</category><category>meta-ai-fair</category><category>llama-3-3-nemotron-super-49b-v1</category><category>claude</category><category>percy-liang</category><category>reinforcement-learning</category><category>reasoning</category><category>benchmarks</category><category>multi-turn-collaboration</category><category>instruction-following</category><category>dataset-release</category><category>model-evaluation</category></item><item><title>Promptable Prosody, SOTA ASR, and Semantic VAD: OpenAI revamps Voice AI</title><link>https://news.smol.ai/issues/25-03-20-ainews-promptable-prosody-sota-asr-and-semantic-vad-openai-revamps-voice-ai/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-20-ainews-promptable-prosody-sota-asr-and-semantic-vad-openai-revamps-voice-ai/</guid><description>**OpenAI** has launched three new state-of-the-art audio models in their API, including **gpt-4o-transcribe**, a speech-to-text model outperforming Whisper, and **gpt-4o-mini-tts**, a text-to-speech model with promptable prosody allowing control over timing and emotion. The **Agents SDK** now supports audio, enabling voice agents. OpenAI also updated turn detection for real-time voice activity detection (VAD) based on speech content. Additionally, **OpenAI&apos;s o1-pro** model is available to select developers with advanced features like vision and function calling, though at higher compute costs. The community shows strong enthusiasm for these audio advancements, with a radio contest for TTS creations underway. 
Meanwhile, **Kokoro-82M v1.0** emerges as a leading open weights TTS model with competitive pricing on Replicate.</description><pubDate>Thu, 20 Mar 2025 22:51:24 GMT</pubDate><category>openai</category><category>replicate</category><category>gpt-4o-transcribe</category><category>gpt-4o-mini-tts</category><category>o1-pro</category><category>kokoro-82m</category><category>juberti</category><category>sama</category><category>reach_vb</category><category>kevinweil</category><category>omarsar0</category><category>speech-to-text</category><category>text-to-speech</category><category>voice-activity-detection</category><category>prompt-engineering</category><category>real-time-processing</category><category>model-release</category><category>api</category><category>function-calling</category><category>structured-outputs</category><category>model-performance</category></item><item><title>Every 7 Months: The Moore&apos;s Law for Agent Autonomy</title><link>https://news.smol.ai/issues/25-03-19-ainews-every-7-months-the-moores-law-for-agent-autonomy/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-19-ainews-every-7-months-the-moores-law-for-agent-autonomy/</guid><description>**METR** published a paper measuring AI agent autonomy progress, showing it has doubled every 7 months since **2019 (GPT-2)**. They introduced a new metric, the **50%-task-completion time horizon**, where models like **Claude 3.7 Sonnet** achieve 50% success in about 50 minutes. Projections estimate **1 day autonomy by 2028** and **1 month autonomy by late 2029**. Meanwhile, **Nvidia** released **Cosmos-Transfer1** for conditional world generation and **GR00T-N1-2B**, an open foundation model for humanoid robot reasoning with 2B parameters. **Canopy Labs** introduced **Orpheus 3B**, a high-quality text-to-speech model with zero-shot voice cloning and low latency. **Meta** reportedly delayed **Llama-4** release due to performance issues. 
**Microsoft** launched **Phi-4-multimodal**.</description><pubDate>Thu, 20 Mar 2025 01:59:24 GMT</pubDate><category>metr</category><category>nvidia</category><category>hugging-face</category><category>canopy-labs</category><category>meta-ai-fair</category><category>microsoft</category><category>claude-3-7-sonnet</category><category>llama-4</category><category>phi-4-multimodal</category><category>gpt-2</category><category>cosmos-transfer1</category><category>gr00t-n1-2b</category><category>orpheus-3b</category><category>reach_vb</category><category>akhaliq</category><category>drjimfan</category><category>scaling01</category><category>agent-autonomy</category><category>task-completion</category><category>multimodality</category><category>text-to-speech</category><category>robotics</category><category>foundation-models</category><category>model-release</category><category>scaling-laws</category><category>fine-tuning</category><category>zero-shot-learning</category><category>latency</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-03-18-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-18-ainews-not-much-happened-today/</guid><description>At Nvidia GTC Day 1, several AI updates were highlighted: **Google&apos;s Gemini 2.0 Flash** introduces image input/output but is not recommended for text-to-image tasks, with **Imagen 3** preferred for that. **Mistral AI** released **Mistral Small 3.1** with 128k token context window and competitive pricing. **Allen AI** launched **OLMo-32B**, an open LLM outperforming **GPT-4o mini** and **Qwen 2.5**. **ShieldGemma 2** was introduced for image safety classification. **LangChainAI** announced multiple updates including **Julian** powered by **LangGraph** and integration with **AnthropicAI&apos;s MCP**. Jeremy Howard released **fasttransform**, a Python library for data transformations. 
**Perplexity AI** partnered with **Kalshi** for NCAA March Madness predictions.</description><pubDate>Tue, 18 Mar 2025 22:00:12 GMT</pubDate><category>nvidia</category><category>google</category><category>mistral-ai</category><category>allen-ai</category><category>anthropic</category><category>langchainai</category><category>perplexity-ai</category><category>kalshi</category><category>stripe</category><category>qodoai</category><category>gemini-2.0-flash</category><category>imagen-3</category><category>mistral-small-3.1</category><category>mistral-3</category><category>gpt-4o-mini</category><category>claude-3.5-haiku</category><category>olm0-32b</category><category>qwen-2.5</category><category>shieldgemma-2</category><category>julian</category><category>fasttransform</category><category>jeremyphoward</category><category>karpathy</category><category>abacaj</category><category>mervenoyann</category><category>multimodality</category><category>image-generation</category><category>context-windows</category><category>model-pricing</category><category>open-source-models</category><category>image-classification</category><category>frameworks</category><category>python-libraries</category><category>partnerships</category></item><item><title>Cohere&apos;s Command A claims #3 open model spot (after DeepSeek and Gemma)</title><link>https://news.smol.ai/issues/25-03-17-ainews-coheres-command-a-claims-3-open-model-spot-after-deepseek-and-gemma/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-17-ainews-coheres-command-a-claims-3-open-model-spot-after-deepseek-and-gemma/</guid><description>**Cohere&apos;s Command A** model has solidified its position on the LMArena leaderboard, featuring an open-weight **111B** parameter model with an unusually long **256K context window** and competitive pricing. 
**Mistral AI** released the lightweight, multilingual, and multimodal **Mistral Small 3.1** model, optimized to run on a single RTX 4090 or a Mac with 32GB RAM, with strong performance on instruct and multimodal benchmarks. The new OCR model **SmolDocling** offers fast document reading with low VRAM usage, outperforming larger models like Qwen2.5VL. Discussions highlight the importance of system-level improvements over raw LLM advancements, and **MCBench** is recommended as a superior AI benchmark for evaluating model capabilities across code, aesthetics, and awareness.</description><pubDate>Tue, 18 Mar 2025 00:28:53 GMT</pubDate><category>cohere</category><category>mistral-ai</category><category>hugging-face</category><category>command-a</category><category>mistral-ai-small-3.1</category><category>smoldocling</category><category>qwen-2.5-vl</category><category>aidangomez</category><category>sophiamyang</category><category>mervenoyann</category><category>aidan_mclau</category><category>reach_vb</category><category>lateinteraction</category><category>context-windows</category><category>multilinguality</category><category>multimodality</category><category>fine-tuning</category><category>benchmarking</category><category>ocr</category><category>model-performance</category><category>model-releases</category><category>model-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-03-14-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-14-ainews-not-much-happened-today/</guid><description>**Google DeepMind** announced updates to **Gemini 2.0**, including an upgraded **Flash Thinking model** with stronger reasoning and native image generation capabilities. **Cohere** launched **Command A**, a **111B** parameter dense model with a **256K context window** and competitive pricing, available on **Hugging Face**. 
**Meta AI** proposed **Dynamic Tanh (DyT)** as a replacement for normalization layers in Transformers, supported by **Yann LeCun**. **Alibaba** released **QwQ-32B**, a **32.5B** parameter model excelling in math and coding, fine-tuned with reinforcement learning and freely available under **Apache 2.0 license**. **Google DeepMind** also released **Gemma 3** models ranging from **1B to 27B** parameters with a **128K token context window** and over **140 language** support, plus **ShieldGemma 2**, an image safety checker. Benchmarking shows **Gemma 3 27B** has strong vision and memory efficiency but is outperformed by larger models like **Llama 3.3 70B** and **DeepSeek V3 671B**. The **Hugging Face LLM leaderboard** history was shared by @_lewtun.</description><pubDate>Fri, 14 Mar 2025 22:57:23 GMT</pubDate><category>google-deepmind</category><category>cohere</category><category>meta-ai-fair</category><category>alibaba</category><category>hugging-face</category><category>gemini-2.0-flash-thinking</category><category>command-a</category><category>qwq-32b</category><category>gemma-3-27b</category><category>gemma-3</category><category>shieldgemma-2</category><category>llama-3-70b</category><category>deepseek-r1</category><category>o1-mini</category><category>deepseek-v3</category><category>yann-lecun</category><category>model-updates</category><category>model-performance</category><category>benchmarking</category><category>reinforcement-learning</category><category>transformers</category><category>normalization-layers</category><category>image-generation</category><category>vision</category><category>memory-efficiency</category><category>context-windows</category><category>fine-tuning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-03-13-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-13-ainews-not-much-happened-today/</guid><description>**DeepSeek R1** demonstrates 
significant efficiency using **FP8** precision and outperforms **Gemma 3 27B** with a **Chatbot Arena Elo Score** of **1363** vs. **1338**, though it requires substantial hardware such as **32 H100 GPUs** and **2,560GB VRAM**. **OpenAI** labels **DeepSeek** as &quot;state-controlled&quot; and calls for bans on &quot;PRC-produced&quot; models, sparking community backlash accusing **OpenAI** and **Sam Altman** of anti-competitive behavior. Discussions emphasize **DeepSeek&apos;s** openness and affordability compared to **OpenAI**, with users highlighting its local and Hugging Face deployment options. Meanwhile, **Gemma 3** receives mixed community feedback on creativity and worldbuilding.</description><pubDate>Thu, 13 Mar 2025 21:13:47 GMT</pubDate><category>openai</category><category>nvidia</category><category>deepseek</category><category>hugging-face</category><category>deepseek-r1</category><category>gemma-3</category><category>gemma-3-27b</category><category>sam-altman</category><category>fp8</category><category>model-efficiency</category><category>hardware-requirements</category><category>quantization</category><category>benchmarking</category><category>model-deployment</category><category>open-source</category></item><item><title>Gemma 3 beats DeepSeek V3 in Elo, 2.0 Flash beats GPT4o with Native Image Gen</title><link>https://news.smol.ai/issues/25-03-12-ainews-gemma-3-beats-deepseek-v3-in-elo-20-flash-beats-gpt4o-with-native-image-gen/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-12-ainews-gemma-3-beats-deepseek-v3-in-elo-20-flash-beats-gpt4o-with-native-image-gen/</guid><description>**Google DeepMind** launched the **Gemma 3** family of models featuring a **128k context window**, **multimodal input (image and video)**, and **multilingual support for 140+ languages**. The **Gemma 3-27B** model ranks among the top open models on LMArena, outperforming several competitors and matching **Gemini-1.5-Pro** on benchmarks. 
Additionally, **Gemini 2.0 Flash** introduced **native image generation** with advanced image editing capabilities, a feature that OpenAI teased but has not launched. The updates highlight significant advances in context length, multimodality, and model efficiency via quantization.</description><pubDate>Thu, 13 Mar 2025 01:01:43 GMT</pubDate><category>google-deepmind</category><category>openai</category><category>gemma-3</category><category>gemini-1.5-pro</category><category>gemini-2</category><category>o1-preview</category><category>o3-mini-high</category><category>deepseek-v3</category><category>claude-3.7-sonnet</category><category>qwen-2.5-max</category><category>reach_vb</category><category>_philschmid</category><category>danielhanchen</category><category>lmarena_ai</category><category>osanseviero</category><category>multimodality</category><category>multilinguality</category><category>context-window</category><category>quantization</category><category>image-generation</category><category>model-benchmarking</category><category>model-performance</category><category>vision</category></item><item><title>The new OpenAI Agents Platform</title><link>https://news.smol.ai/issues/25-03-11-ainews-the-new-openai-agents-platform/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-11-ainews-the-new-openai-agents-platform/</guid><description>**OpenAI** introduced a comprehensive suite of new tools for AI agents, including the **Responses API**, **Web Search Tool**, **Computer Use Tool**, **File Search Tool**, and an open-source **Agents SDK** with integrated observability tools, marking a significant step towards the &quot;Year of Agents.&quot; Meanwhile, **Reka AI** open-sourced **Reka Flash 3**, a **21B parameter reasoning model** that outperforms **o1-mini** and powers their Nexus platform, with weights available on **Hugging Face**. The **OlympicCoder** series surpassed **Claude 3.7 Sonnet** and much larger models on competitive coding benchmarks. 
**DeepSeek** built a **32K GPU cluster** capable of training V3-level models in under a week and is exploring AI distillation. **Hugging Face** announced **Cerebras** inference support, achieving over **2,000 tokens/s** on **Llama 3.3 70B**, 70x faster than leading GPUs. **Reka&apos;s Sonic-2** voice AI model delivers **40ms latency** via the **Together API**. **Alibaba&apos;s Qwen Chat** enhanced its multimodal interface with video understanding up to **500MB**, voice-to-text, guest mode, and expanded file uploads. *Sama* praised OpenAI&apos;s new API as &quot;one of the most well-designed and useful APIs ever.&quot;</description><pubDate>Wed, 12 Mar 2025 00:23:17 GMT</pubDate><category>openai</category><category>reka-ai</category><category>hugging-face</category><category>deepseek</category><category>togethercompute</category><category>alibaba</category><category>reka-flash-3</category><category>o1-mini</category><category>claude-3-7-sonnet</category><category>llama-3-3-70b</category><category>sonic-2</category><category>qwen-chat</category><category>olympiccoder</category><category>sama</category><category>reach_vb</category><category>ai-agents</category><category>api</category><category>model-releases</category><category>fine-tuning</category><category>reinforcement-learning</category><category>model-training</category><category>model-inference</category><category>multimodality</category><category>voice-synthesis</category><category>gpu-clusters</category><category>model-distillation</category><category>performance-optimization</category><category>open-source</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-03-10-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-10-ainews-not-much-happened-today/</guid><description>The AI news recap highlights several key developments: **nanoMoE**, a PyTorch implementation of a mid-sized Mixture-of-Experts (MoE) model inspired by 
Andrej Karpathy&apos;s nanoGPT, enables pretraining on commodity hardware within a week. An agentic leaderboard ranks LLMs powering **smolagents CodeAgent**, with **GPT-4.5** leading, followed by **Claude-3.7-Sonnet**. Discussions around **DeepSeek-R1** emphasize AI model commoditization, with DeepSeek dubbed the &quot;OpenAI of China.&quot; **Q-Filters** offer a training-free method for KV cache compression in autoregressive models, achieving **32x compression** with minimal perplexity loss. The **PokéChamp** minimax language agent, powered by **GPT-4o** and **Llama-3-8b**, demonstrates strong performance in Pokémon battles. Other notable models include **TinyR1-32B-Preview** with Branch-Merge Distillation, **R1-Searcher** incentivizing search capability via reinforcement learning, and the **Forgetting Transformer** using a Forget Gate in softmax attention. These advancements reflect ongoing innovation in model architectures, compression, reinforcement learning, and agentic AI.</description><pubDate>Mon, 10 Mar 2025 22:46:37 
GMT</pubDate><category>openai</category><category>deepseek</category><category>hugging-face</category><category>gpt-4.5</category><category>claude-3.7-sonnet</category><category>deepseek-r1</category><category>smolagents-codeagent</category><category>gpt-4o</category><category>llama-3-8b</category><category>tinyr1-32b-preview</category><category>r1-searcher</category><category>forgetting-transformer</category><category>nanomoe</category><category>andrej-karpathy</category><category>cwolferesearch</category><category>aymericroucher</category><category>teortaxestex</category><category>jonathanross321</category><category>akhaliq</category><category>mixture-of-experts</category><category>reinforcement-learning</category><category>kv-cache-compression</category><category>agentic-ai</category><category>model-distillation</category><category>attention-mechanisms</category><category>model-compression</category><category>minimax</category><category>model-pretraining</category></item><item><title>DeepSeek&apos;s Open Source Stack</title><link>https://news.smol.ai/issues/25-03-07-ainews-deepseeks-open-source-stack/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-07-ainews-deepseeks-open-source-stack/</guid><description>**DeepSeek&apos;s Open Source Week** was summarized by PySpur, highlighting multiple interesting releases. The **Qwen QwQ-32B model** was fine-tuned into **START**, excelling in PhD-level science QA and math benchmarks. **Character-3**, an omnimodal AI video generation model by Hedra Labs and Together AI, enables realistic animated content creation. **Google DeepMind** introduced the **Gemini embedding model** with an 8k context window, ranking #1 on MMTEB, alongside the **Gemini 2.0 Code Executor** supporting Python libraries and auto-fix features. **Inception Labs&apos; Mercury Coder** is a diffusion-based code generation model offering faster token processing. 
**OpenAI** released **GPT-4.5**, their largest model yet but with less reasoning ability than some competitors. **AI21 Labs** launched **Jamba Mini 1.6**, noted for superior output speed compared to Gemini 2.0 Flash, GPT-4o mini, and Mistral Small 3. A new dataset of 1.9M scanned pages was released for OCR benchmarking, with **Mistral OCR** showing competitive but not top-tier document parsing performance compared to LLM/LVM-powered methods. *&quot;Cracked engineers are all you need.&quot;*</description><pubDate>Sat, 08 Mar 2025 05:06:31 GMT</pubDate><category>deepseek</category><category>pyspur</category><category>hugging-face</category><category>togethercompute</category><category>hedra-labs</category><category>google-deepmind</category><category>deeplearningai</category><category>openai</category><category>ai21-labs</category><category>mistral-ai</category><category>qwen-qwq-32b</category><category>start</category><category>character-3</category><category>gemini</category><category>gemini-2.0</category><category>mercury-coder</category><category>gpt-4.5</category><category>jamba-mini-1.6</category><category>gemini-2.0-flash</category><category>gpt-4o-mini</category><category>mistral-small-3</category><category>mistral-ocr</category><category>_akhaliq</category><category>lmarena_ai</category><category>reach_vb</category><category>danielhanchen</category><category>_philschmid</category><category>aidan_mclau</category><category>vikhyatk</category><category>jerryjliu0</category><category>fine-tuning</category><category>benchmarking</category><category>multimodality</category><category>code-generation</category><category>diffusion-models</category><category>model-performance</category><category>model-optimization</category><category>ocr</category><category>embedding-models</category><category>context-windows</category><category>runtime-limits</category></item><item><title>not much happened 
today</title><link>https://news.smol.ai/issues/25-03-06-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-06-ainews-not-much-happened-today/</guid><description>**AI21 Labs launched Jamba 1.6**, touted as the **best open model for private enterprise deployment**, outperforming **Cohere, Mistral, and Llama** on benchmarks like **Arena Hard**. **Mistral AI** released a state-of-the-art **multimodal OCR model** with multilingual and structured output capabilities, available for on-prem deployment. **Alibaba Qwen** introduced **QwQ-32B**, an open-weight reasoning model with **32B parameters** and cost-effective usage, showing competitive benchmark scores. **OpenAI** released **o1** and **o3-mini** models with advanced API features including streaming and function calling. **AMD** unveiled **Instella**, open-source 3B parameter language models trained on **AMD Instinct MI300X GPUs**, competing with **Llama-3.2-3B** and others. **Alibaba** also released **Babel**, open multilingual LLMs performing comparably to **GPT-4o**. 
**Anthropic** launched **Claude 3.7 Sonnet**, enhancing reasoning and prompt engineering capabilities.</description><pubDate>Fri, 07 Mar 2025 05:50:14 GMT</pubDate><category>ai21-labs</category><category>mistral-ai</category><category>alibaba</category><category>openai</category><category>amd</category><category>anthropic</category><category>hugging-face</category><category>jamba-1.6</category><category>mistral-ocr</category><category>qwq-32b</category><category>o1</category><category>o3-mini</category><category>instella</category><category>llama-3-2-3b</category><category>gemma-2-2b</category><category>qwen-2-5-3b</category><category>babel-9b</category><category>babel-83b</category><category>gpt-4o</category><category>claude-3-7-sonnet</category><category>multimodality</category><category>ocr</category><category>multilinguality</category><category>structured-output</category><category>on-prem-deployment</category><category>reasoning</category><category>benchmarking</category><category>api</category><category>open-source</category><category>model-training</category><category>gpu-optimization</category><category>prompt-engineering</category><category>function-calling</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-03-04-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-04-ainews-not-much-happened-today/</guid><description>**Weights and Biases** announced a **$1.7 billion acquisition by CoreWeave** ahead of CoreWeave&apos;s IPO. **CohereForAI** released the **Aya Vision models (8B and 32B parameters)** supporting **23 languages**, outperforming larger models like **Llama-3.2 90B Vision** and **Molmo 72B**. **Microsoft** introduced **Phi-4-Mini (3.8B parameters)** and **Phi-4-Multimodal models**, excelling in math, coding, and multimodal benchmarks. **CogView4**, a **6B parameter text-to-image model** with **2048x2048 resolution** and Apache 2.0 license, was released. 
**Alibaba** launched **Wan 2.1**, an open-source video generation model with **720p output** and **16 fps generation**. **Google** announced new AI features for Pixel devices including **Scam Detection** and **Gemini integrations**. **LlamaCloud** reached **General Availability** and raised **$19M Series A funding**, serving over **100 Fortune 500 companies**. **Weaviate** launched the **Query Agent**, the first of three Weaviate Agents.</description><pubDate>Wed, 05 Mar 2025 05:17:34 GMT</pubDate><category>weights-and-biases</category><category>coreweave</category><category>cohereforai</category><category>microsoft</category><category>alibaba</category><category>google</category><category>llamaindex</category><category>weaviate</category><category>aya-vision-8b</category><category>aya-vision-32b</category><category>llama-3-2-90b-vision</category><category>molmo-72b</category><category>phi-4-mini</category><category>phi-4-multimodal</category><category>cogview4</category><category>wan-2-1</category><category>mervenoyann</category><category>reach_vb</category><category>jayalammar</category><category>sarahookr</category><category>aidangomez</category><category>nickfrosst</category><category>dair_ai</category><category>akhaliq</category><category>bobvanluijt</category><category>jerryjliu0</category><category>multilinguality</category><category>vision</category><category>multimodality</category><category>image-generation</category><category>video-generation</category><category>model-releases</category><category>benchmarking</category><category>funding</category><category>agentic-ai</category><category>model-performance</category></item><item><title>Anthropic&apos;s $61.5B Series E</title><link>https://news.smol.ai/issues/25-03-03-ainews-anthropics-dollar615b-series-e/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-03-03-ainews-anthropics-dollar615b-series-e/</guid><description>**Anthropic** raised a **$3.5 billion Series E funding round** at a **$61.5 
billion valuation**, signaling strong financial backing for the **Claude** AI model. **GPT-4.5** achieved **#1 rank across all categories** on the LMArena leaderboard, excelling in multi-turn conversations, coding, math, creative writing, and style control. **DeepSeek R1** tied with GPT-4.5 for top performance on hard prompts with style control. Discussions highlighted comparisons between **GPT-4.5** and **Claude 3.7 Sonnet** in coding and workflow applications. The importance of the **LMSYS benchmark** was emphasized, though some questioned the relevance of benchmarks versus user acquisition. Additionally, **Perplexity AI** partnered with **Deutsche Telekom** to integrate the **Perplexity Assistant** into a new AI phone.</description><pubDate>Tue, 04 Mar 2025 06:51:49 GMT</pubDate><category>anthropic</category><category>openai</category><category>deepseek</category><category>lmsys</category><category>perplexity-ai</category><category>deutsche-telekom</category><category>gpt-4.5</category><category>claude-3.7-sonnet</category><category>deepseek-r1</category><category>lmarena_ai</category><category>teortaxestex</category><category>casper_hansen_</category><category>omarsar0</category><category>aidan_mclau</category><category>willdepue</category><category>vikhyatk</category><category>teknim1</category><category>reach_vb</category><category>_aidan_clark_</category><category>cto_junior</category><category>aravsrinivas</category><category>model-performance</category><category>benchmarking</category><category>style-control</category><category>coding</category><category>multi-turn</category><category>funding</category><category>partnerships</category><category>workflow</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-02-28-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-28-ainews-not-much-happened-today/</guid><description>**GPT-4.5** sparked mixed reactions on Twitter, with 
**@karpathy** noting users preferred **GPT-4** in a poll despite his personal preference for GPT-4.5&apos;s creativity and humor. Critics like **@abacaj** highlighted **GPT-4.5&apos;s slowness** and questioned its practical value and pricing compared to other models. Performance-wise, **GPT-4.5** ranks above **GPT-4o** but below **o1** and **Claude 3.5 Sonnet**, with **Claude 3.7** outperforming it on many tasks, though GPT-4.5 is praised for its humor and &quot;vibes.&quot; Speculation about GPT-4.5&apos;s size suggests around **5 trillion parameters**. Discussions also touched on pricing disparities, with **Perplexity Deep Research** at $20/month versus ChatGPT at $200/month. The emotional intelligence and humor of models like **Claude 3.7** were also noted.</description><pubDate>Sat, 01 Mar 2025 03:41:57 GMT</pubDate><category>openai</category><category>anthropic</category><category>perplexity-ai</category><category>deepseek</category><category>scaling01</category><category>gpt-4.5</category><category>gpt-4</category><category>gpt-4o</category><category>o1</category><category>claude-3.5-sonnet</category><category>claude-3.7</category><category>claude-3-opus</category><category>deepseek-v3</category><category>grok-3</category><category>andrej-karpathy</category><category>jeremyphoward</category><category>abacaj</category><category>stevenheidel</category><category>yuchenj_uw</category><category>aravsrinivas</category><category>dylan522p</category><category>random_walker</category><category>model-performance</category><category>humor</category><category>emotional-intelligence</category><category>model-comparison</category><category>pricing</category><category>context-windows</category><category>model-size</category><category>user-experience</category></item><item><title>GPT 4.5 — Chonky Orion ships!</title><link>https://news.smol.ai/issues/25-02-27-ainews-gpt-45-chonky-orion-ships/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-02-27-ainews-gpt-45-chonky-orion-ships/</guid><description>**OpenAI released GPT-4.5** as a research preview, highlighting its **deep world knowledge**, **improved understanding of user intent**, and a **128,000 token context window**. It is noted for excelling in **writing, creative tasks, image understanding, and data extraction** but is not a reasoning model. **Microsoft unveiled Phi-4 Multimodal and Phi-4 Mini**, open-source models integrating **text, vision, and speech/audio**, with strong performance in **math and coding tasks**. **Cohere released Command R7B Arabic**, an open-weights model optimized for **Arabic language capabilities** targeting enterprises in the MENA region. The community is exploring the impact of larger models on creative writing, intent understanding, and world knowledge, with GPT-4.5 expected to be a basis for GPT-5.</description><pubDate>Fri, 28 Feb 2025 07:24:08 GMT</pubDate><category>openai</category><category>microsoft</category><category>cohere</category><category>gpt-4.5</category><category>phi-4-multimodal</category><category>phi-4-mini</category><category>command-r7b-arabic</category><category>sama</category><category>kevinweil</category><category>aidan_mclau</category><category>omarsar0</category><category>rasbt</category><category>reach_vb</category><category>creative-writing</category><category>natural-language-processing</category><category>multimodality</category><category>math</category><category>coding</category><category>context-windows</category><category>model-releases</category><category>open-source</category><category>arabic-language</category></item><item><title>lots of small launches</title><link>https://news.smol.ai/issues/25-02-26-ainews-lots-of-small-launches/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-26-ainews-lots-of-small-launches/</guid><description>**GPT-4o Advanced Voice Preview** is now available for free ChatGPT users with enhanced 
daily limits for Plus and Pro users. **Claude 3.7 Sonnet** has achieved the top rank in WebDev Arena with improved token efficiency. **DeepSeek-R1** with 671B parameters benefits from the **Together Inference** platform optimizing NVIDIA Blackwell GPU usage, alongside the open-source **DeepGEMM** CUDA library delivering up to 2.7x speedups on Hopper GPUs. **Perplexity** launched a new Voice Mode and a **Deep Research API**. The upcoming **Grok 3 API** will support a 1M token context window. Several companies including **Elicit**, **Amazon**, **Anthropic**, **Cloudflare**, **FLORA**, **Elevenlabs**, and **Inception Labs** announced new funding rounds, product launches, and model releases.</description><pubDate>Thu, 27 Feb 2025 04:09:12 GMT</pubDate><category>openai</category><category>anthropic</category><category>amazon</category><category>cloudflare</category><category>perplexity-ai</category><category>deepseek-ai</category><category>togethercompute</category><category>elevenlabs</category><category>elicitorg</category><category>inceptionailabs</category><category>mistral-ai</category><category>gpt-4o</category><category>claude-3.7-sonnet</category><category>claude-3.7</category><category>claude-3.5-sonnet</category><category>deepseek-r1</category><category>deepseek-v3</category><category>grok-3</category><category>lmarena_ai</category><category>alexalbert__</category><category>aravsrinivas</category><category>reach_vb</category><category>voice</category><category>model-releases</category><category>cuda</category><category>gpu-optimization</category><category>inference</category><category>open-source</category><category>api</category><category>model-performance</category><category>token-efficiency</category><category>context-windows</category><category>cuda</category><category>jit-compilation</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-02-25-ainews-not-much-happened-today/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-02-25-ainews-not-much-happened-today/</guid><description>**Claude 3.7 Sonnet** demonstrates exceptional coding and reasoning capabilities, outperforming models like **DeepSeek R1**, **O3-mini**, and **GPT-4o** on benchmarks such as **SciCode** and **LiveCodeBench**. It is available on platforms including **Perplexity Pro**, **Anthropic**, **Amazon Bedrock**, and **Google Cloud**, with pricing at **$3/$15 per million tokens**. Key features include a **64k token thinking mode**, **200k context window**, and the **CLI-based coding assistant Claude Code**. Meanwhile, **DeepSeek** released **DeepEP**, an open-source communication library optimized for MoE model training and inference with support for **NVLink**, **RDMA**, and **FP8**. These updates highlight advancements in coding AI and efficient model training infrastructure.</description><pubDate>Wed, 26 Feb 2025 02:19:12 GMT</pubDate><category>anthropic</category><category>perplexity-ai</category><category>amazon</category><category>google-cloud</category><category>deepseek_ai</category><category>claude-3.7-sonnet</category><category>claude-3.7</category><category>deepseek-r1</category><category>o3-mini</category><category>deepseek-v3</category><category>gemini-2.0-pro</category><category>gpt-4o</category><category>qwen2.5-coder-32b-instruct</category><category>skirano</category><category>omarsar0</category><category>reach_vb</category><category>artificialanlys</category><category>terryyuezhuo</category><category>_akhaliq</category><category>_philschmid</category><category>catherineols</category><category>goodside</category><category>danielhanchen</category><category>coding</category><category>reasoning</category><category>model-benchmarking</category><category>agentic-workflows</category><category>context-window</category><category>model-performance</category><category>open-source</category><category>moe</category><category>model-training</category><category>communic
ation-libraries</category><category>fp8</category><category>nvlink</category><category>rdma</category><category>cli-tools</category></item><item><title>Claude 3.7 Sonnet</title><link>https://news.smol.ai/issues/25-02-24-ainews-claude-37-sonnet/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-24-ainews-claude-37-sonnet/</guid><description>**Anthropic** launched **Claude 3.7 Sonnet**, their most intelligent model to date featuring hybrid reasoning with two thinking modes: near-instant and extended step-by-step thinking. The release includes **Claude Code**, an agentic coding tool in limited preview, and supports a **128k output token capability** in beta. Claude 3.7 Sonnet performs well on coding benchmarks like **SWE-Bench Verified** and **Cognition&apos;s junior-dev eval**, and introduces advanced features such as streaming thinking, prompt caching, and tool use. The model is also benchmarked on **Pokebench**, reflecting agentic capabilities similar to the Voyager paper. The launch is accompanied by extensive documentation, cookbooks, and prompting guides for extended thinking. 
*&quot;The first generally available hybrid reasoning model&quot;* and *&quot;first coding tool from Anthropic&quot;* were highlighted in social media announcements.</description><pubDate>Tue, 25 Feb 2025 05:58:56 GMT</pubDate><category>anthropic</category><category>claude-3-7-sonnet</category><category>claude-3</category><category>claude-code</category><category>hybrid-reasoning</category><category>extended-thinking</category><category>coding-benchmarks</category><category>agentic-ai</category><category>prompt-caching</category><category>streaming</category><category>token-capacity</category><category>tool-use</category></item><item><title>AI Engineer Summit Day 1</title><link>https://news.smol.ai/issues/25-02-21-ainews-ai-engineer-summit-day-1/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-21-ainews-ai-engineer-summit-day-1/</guid><description>The **AIE Summit** in NYC highlighted key talks including **Grace Isford&apos;s Trends Keynote**, **Neo4j/Pfizer&apos;s presentation**, and **OpenAI&apos;s first definition of Agents**. Speakers announced **$930 million in funding**. On AI Twitter, discussions focused on **Grok-3** and **o3-mini** models, with debates on performance and benchmarking, including **Grok-3&apos;s record compute scale of 4e26 to 5e26 FLOP**. The **o3-mini** model uncovered a critical **CUDA kernel bug** in Sakana AI&apos;s code. **DeepSeek-R1** was promoted as an open-source alternative with notable training batch sizes. 
Additionally, **Alibaba** announced the **Qwen 2.5-VL** model release.</description><pubDate>Sat, 22 Feb 2025 02:50:34 GMT</pubDate><category>openai</category><category>anthropic</category><category>xai</category><category>togethercompute</category><category>alibaba</category><category>sakana-ai</category><category>grok-3</category><category>o3-mini</category><category>deepseek-r1</category><category>qwen-2.5-vl</category><category>aidan_mclau</category><category>giffmana</category><category>nrehiew_</category><category>teortaxestex</category><category>epochairesearch</category><category>andrew_n_carr</category><category>borismpower</category><category>yuhu_ai_</category><category>benchmarking</category><category>model-performance</category><category>cuda</category><category>model-training</category><category>open-source</category><category>debugging</category><category>inference-speed</category><category>batch-size</category><category>reinforcement-learning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-02-21-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-21-ainews-not-much-happened-today/</guid><description>**Grok-3**, a new family of LLMs from **xAI** using **200,000 Nvidia H100 GPUs** for advanced reasoning, outperforms models from **Google, Anthropic, and OpenAI** on math, science, and coding benchmarks. **DeepSeek-R1** achieves top accuracy on **SuperGPQA**, a challenging dataset from **ByteDance Research**. **SigLIP 2** from **Google DeepMind** improves semantic understanding and OCR with flexible resolutions and multilingual capabilities, and is available on Hugging Face. **OpenAI&apos;s o3-mini-high** ranks #1 in coding and math prompts. **Perplexity&apos;s R1 1776**, a post-trained version of DeepSeek R1, is available on Ollama. The **Llamba** family distills **Llama-3.x** into efficient recurrent models with higher throughput. 
**AlphaMaze** combines DeepSeek R1 with GRPO for visual reasoning on ARC-AGI puzzles. **Audiobox Aesthetics** from **Meta AI** offers unified quality assessment for audio. The community notes that Grok 3&apos;s compute increase yields only modest performance gains.</description><pubDate>Fri, 21 Feb 2025 22:50:40 GMT</pubDate><category>xai</category><category>nvidia</category><category>google-deepmind</category><category>anthropic</category><category>openai</category><category>bytedance</category><category>ollama</category><category>meta-ai-fair</category><category>grok-3</category><category>deepseek-r1</category><category>siglip-2</category><category>o3-mini-high</category><category>r1-1776</category><category>llamba-1b</category><category>llamba-3b</category><category>llamba-8b</category><category>llama-3</category><category>alphamaze</category><category>audiobox-aesthetics</category><category>scaling01</category><category>iscienceluvr</category><category>philschmid</category><category>arankomatsuzaki</category><category>reach_vb</category><category>mervenoyann</category><category>wightmanr</category><category>lmarena_ai</category><category>ollama</category><category>akhaliq</category><category>benchmarking</category><category>model-releases</category><category>performance</category><category>reasoning</category><category>multimodality</category><category>semantic-understanding</category><category>ocr</category><category>multilinguality</category><category>model-distillation</category><category>recurrent-neural-networks</category><category>visual-reasoning</category><category>audio-processing</category></item><item><title>The Ultra-Scale Playbook: Training LLMs on GPU Clusters</title><link>https://news.smol.ai/issues/25-02-19-ainews-the-ultra-scale-playbook-training-llms-on-gpu-clusters/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-19-ainews-the-ultra-scale-playbook-training-llms-on-gpu-clusters/</guid><description>**Huggingface** released 
&quot;The Ultra-Scale Playbook: Training LLMs on GPU Clusters,&quot; an interactive blogpost based on **4000 scaling experiments on up to 512 GPUs**, providing detailed insights into modern GPU training strategies. **DeepSeek** introduced the Native Sparse Attention (NSA) mechanism, gaining significant community attention, while **Perplexity AI** launched R1-1776, an uncensored and unbiased version of DeepSeek&apos;s R1 model. **Google DeepMind** unveiled PaliGemma 2 Mix, a multi-task vision-language model available in **3B, 10B, and 28B sizes**. **Microsoft** introduced Muse, a generative AI model trained on the game Bleeding Edge, and presented Magma, a foundation model for multimodal AI agents excelling in UI navigation and robotic manipulation. **Baichuan-M1-14B** was announced as a state-of-the-art medical LLM trained on **20T tokens**, and a fully open-source 40B genome modeling model using the StripedHyena 2 architecture was also released. *&quot;Making your own gaming experience is coming sooner than you&apos;d think,&quot;* was noted in relation to Muse.</description><pubDate>Thu, 20 Feb 2025 05:57:17 
GMT</pubDate><category>huggingface</category><category>deepseek</category><category>perplexity-ai</category><category>google-deepmind</category><category>microsoft</category><category>baichuan</category><category>stripedhyena</category><category>deepseek-native-sparse-attention</category><category>r1-1776</category><category>paligemma-2-mix</category><category>muse</category><category>baichuan-m1-14b</category><category>stripedhyena-2</category><category>eliebakouch</category><category>nouamanetazi</category><category>lvwerra</category><category>thom-wolf</category><category>proftomyeh</category><category>alex-wang</category><category>aravsrinivas</category><category>_akhaliq</category><category>_philschmid</category><category>mervenoyann</category><category>reach_vb</category><category>arankomatsuzaki</category><category>maximelabonne</category><category>gpu-training</category><category>scaling</category><category>multimodality</category><category>vision</category><category>model-training</category><category>foundation-models</category><category>medical-llm</category><category>genome-modeling</category><category>robotic-manipulation</category><category>interactive-content</category></item><item><title>X.ai Grok 3 and Mira Murati&apos;s Thinking Machines</title><link>https://news.smol.ai/issues/25-02-18-ainews-xai-grok-3-and-mira-muratis-thinking-machines/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-18-ainews-xai-grok-3-and-mira-muratis-thinking-machines/</guid><description>**Grok 3** has launched with mixed opinions but strong benchmark performance, notably outperforming models like **Gemini 2 Pro** and **GPT-4o**. The **Grok-3 mini** variant shows competitive and sometimes superior capabilities, especially in reasoning and coding, with reinforcement learning playing a key role. 
**Mira Murati** has publicly shared her post-OpenAI plan, founding the frontier lab **Thinking Machines**, focusing on collaborative, personalizable AI, multimodality, and empirical safety and alignment research, reminiscent of **Anthropic**&apos;s approach.</description><pubDate>Tue, 18 Feb 2025 23:54:10 GMT</pubDate><category>anthropic</category><category>openai</category><category>thinking-machines</category><category>grok-3</category><category>grok-3-mini</category><category>gemini-2-pro</category><category>gpt-4o</category><category>o3-mini-high</category><category>o1</category><category>deepseek-r1</category><category>mira-murati</category><category>lmarena_ai</category><category>karpathy</category><category>omarsar0</category><category>ibab</category><category>arankomatsuzaki</category><category>iscienceluvr</category><category>scaling01</category><category>benchmarking</category><category>reasoning</category><category>reinforcement-learning</category><category>coding</category><category>multimodality</category><category>safety</category><category>alignment</category><category>research-publishing</category><category>model-performance</category><category>creative-ai</category></item><item><title>LLaDA: Large Language Diffusion Models</title><link>https://news.smol.ai/issues/25-02-17-ainews-llada-large-language-diffusion-models/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-17-ainews-llada-large-language-diffusion-models/</guid><description>**LLaDA (Large Language Diffusion Model) 8B** is a breakthrough diffusion-based language model that rivals **LLaMA 3 8B** while training on **7x fewer tokens (2 trillion tokens)** and using **0.13 million H800 GPU hours**. It introduces a novel text generation approach by predicting uniformly masked tokens in a diffusion process, enabling multi-turn dialogue and instruction-following. 
Alongside, **StepFun AI** released two major models: **Step-Video-T2V 30B**, a text-to-video model generating up to **204 frames** with high coherence and motion quality, and **Step-Audio-Chat 132B**, a voice-to-voice model. Additionally, challenging multimodal benchmarks like **Scale AI&apos;s EnigmaEval** and **Cambridge&apos;s ZeroBench** highlight current frontier models scoring zero, emphasizing the difficulty of these tasks. The community also noted the return of diffusion models in language modeling, a previously speculative architecture now scaled successfully.</description><pubDate>Tue, 18 Feb 2025 03:27:47 GMT</pubDate><category>stepfun-ai</category><category>scale-ai</category><category>cambridge</category><category>llamaindex</category><category>llada-8b</category><category>llama-3-8b</category><category>step-video-t2v-30b</category><category>step-audio-chat-132b</category><category>llama-2-7b</category><category>arankomatsuzaki</category><category>_akhaliq</category><category>omarsar0</category><category>iscienceluvr</category><category>gallabytes</category><category>maximelabonne</category><category>reach_vb</category><category>diffusion-models</category><category>text-generation</category><category>multimodality</category><category>video-generation</category><category>voice-processing</category><category>benchmarking</category><category>instruction-following</category><category>model-scaling</category><category>gpu-usage</category><category>long-context</category><category>multi-turn-dialogue</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-02-14-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-14-ainews-not-much-happened-today/</guid><description>**Smolagents** library by **Hugging Face** continues trending. **ChatGPT-4o** latest version `chatgpt-4o-latest-20250129` released. 
**DeepSeek R1 671B** sets speed record at **198 t/s**, fastest reasoning model, recommended with specific prompt settings. **Perplexity Deep Research** outperforms models like **Gemini Thinking**, **o3-mini**, and **DeepSeek-R1** on **Humanity&apos;s Last Exam** benchmark with **21.1%** score and **93.9%** accuracy on **SimpleQA**. **ChatGPT-4o** ranks #1 on Arena leaderboard in multiple categories except math. **OpenAI&apos;s o3 model** powers Deep Research tool for ChatGPT Pro users. **Gemini 2 Flash** and **Qwen 2.5** models support LLMGrading verifier. **Qwen 2.5** models added to PocketPal app. **MLX** shows small LLMs like Qwen 0.5B generate tokens at high speed on M4 Max and iPhone 16 Pro. **Gemini Flash 2.0** leads new AI agent leaderboard. **DeepSeek R1** is most liked on Hugging Face with over 10 million downloads.</description><pubDate>Sat, 15 Feb 2025 01:23:56 GMT</pubDate><category>hugging-face</category><category>openai</category><category>perplexity-ai</category><category>deepseek-ai</category><category>gemini</category><category>qwen</category><category>metr_evals</category><category>chatgpt-4o</category><category>deepseek-r1</category><category>o3</category><category>o3-mini</category><category>gemini-2-flash</category><category>qwen-2.5</category><category>qwen-0.5b</category><category>_akhaliq</category><category>aravsrinivas</category><category>lmarena_ai</category><category>omarsar0</category><category>risingsayak</category><category>reasoning</category><category>benchmarking</category><category>model-performance</category><category>prompt-engineering</category><category>model-optimization</category><category>model-deployment</category><category>small-language-models</category><category>mobile-ai</category><category>ai-agents</category><category>speed-optimization</category></item><item><title>Reasoning Models are Near-Superhuman Coders (OpenAI IOI, Nvidia 
Kernels)</title><link>https://news.smol.ai/issues/25-02-13-ainews-reasoning-models-are-near-superhuman-coders-openai-ioi-nvidia-kernels/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-13-ainews-reasoning-models-are-near-superhuman-coders-openai-ioi-nvidia-kernels/</guid><description>**o3 model** achieved a **gold medal at the 2024 IOI** and ranks in the **99.8th percentile on Codeforces**, outperforming most humans, with reinforcement learning (RL) methods proving superior to inductive bias approaches. **Nvidia** showed **DeepSeek-R1** autonomously generating GPU kernels that surpass some expert-engineered kernels, showcasing simple yet effective AI-driven optimization. **OpenAI** updated **o1 and o3-mini** models to support file and image uploads in ChatGPT and released **DeepResearch**, a powerful research assistant based on the **o3 model with RL** for deep chain-of-thought reasoning. **Ollama** introduced **OpenThinker models** fine-tuned from **Qwen2.5**, outperforming some DeepSeek-R1 distillation models. **ElevenLabs** grew into a $3.3 billion company specializing in AI voice synthesis without open-sourcing their technology. Research highlights include **Sakana AI Labs&apos; TAID knowledge distillation method** receiving a Spotlight at **ICLR 2025**, and **Apple&apos;s work on scaling laws for mixture-of-experts (MoEs)**. 
The importance of open-source AI for scientific discovery was also emphasized.</description><pubDate>Fri, 14 Feb 2025 02:42:41 GMT</pubDate><category>openai</category><category>nvidia</category><category>ollama</category><category>elevenlabs</category><category>sakana-ai</category><category>apple</category><category>o3</category><category>o1</category><category>o3-mini</category><category>deepseek-r1</category><category>qwen-2.5</category><category>openthinker</category><category>alex-wei</category><category>karpathy</category><category>abacaj</category><category>awnihannun</category><category>reinforcement-learning</category><category>gpu-kernel-optimization</category><category>fine-tuning</category><category>knowledge-distillation</category><category>scaling-laws</category><category>chain-of-thought-reasoning</category><category>model-accessibility</category></item><item><title>small news items</title><link>https://news.smol.ai/issues/25-02-12-ainews-small-news-items/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-12-ainews-small-news-items/</guid><description>**OpenAI** announced plans for **GPT-4.5 (Orion)** and **GPT-5**, with GPT-5 integrating the **o3** model and offering unlimited chat access in the free tier. **DeepSeek R1 Distilled Qwen 1.5B** outperforms OpenAI&apos;s **o1-preview** on math benchmarks, while **ModernBERT 0.3b** surpasses **Qwen 0.5b** at MMLU without fine-tuning. **Mistral** and **Perplexity** adopt **Cerebras** hardware for 10x performance gains. OpenAI&apos;s **o3** model won a gold medal at the 2024 International Olympiad in Informatics. Partnerships include **Qwen** with **Groq**. Significant RLHF activity is noted in Nigeria and the global south, and **Bytedance** is expected to rise in AI prominence soon. 
*&quot;GPT5 is all you need.&quot;*</description><pubDate>Thu, 13 Feb 2025 00:10:12 GMT</pubDate><category>openai</category><category>ollama</category><category>mistral</category><category>perplexity</category><category>cerebras</category><category>alibaba</category><category>groq</category><category>bytedance</category><category>gpt-4.5</category><category>gpt-5</category><category>deepseek-r1-distilled-qwen-1.5b</category><category>o1-preview</category><category>modernbert-0.3b</category><category>qwen-0.5b</category><category>o3</category><category>jeremyphoward</category><category>arankomatsuzaki</category><category>sama</category><category>nrehiew_</category><category>danhendrycks</category><category>akhaliq</category><category>math</category><category>benchmarking</category><category>fine-tuning</category><category>model-performance</category><category>reinforcement-learning</category><category>model-architecture</category><category>partnerships</category><category>funding</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-02-11-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-11-ainews-not-much-happened-today/</guid><description>**Zyphra AI** launched **Zonos-v0.1**, a leading open-weight text-to-speech model supporting multiple languages and zero-shot voice cloning. **Meta FAIR** released the open-source **Audiobox Aesthetics** model trained on 562 hours of audio data. **Kyutai Labs** introduced **Moshi**, a real-time speech-to-speech system with low latency. **Perplexity AI** announced the **Sonar** model based on **Llama 3.3 70b**, outperforming top models like **GPT-4o** and **Claude 3.5 Sonnet** with 1200 tokens/second speed, powered by **Cerebras** infrastructure. **UC Berkeley** open-sourced a 1.5B model trained with reinforcement learning that beats **o1-preview** on math tasks. 
**ReasonFlux-32B** achieved 91.2% on the MATH benchmark, outperforming **OpenAI o1-preview**. **CrossPoster**, an AI agent for cross-platform posting, was released using **LlamaIndex** workflows. **Brilliant Labs** integrated the **Google DeepMind Gemini Live API** into smart glasses for real-time translation and object identification.</description><pubDate>Wed, 12 Feb 2025 01:24:43 GMT</pubDate><category>zyphra-ai</category><category>meta-ai-fair</category><category>kyutai-labs</category><category>perplexity-ai</category><category>cerebras</category><category>uc-berkeley</category><category>brilliant-labs</category><category>google-deepmind</category><category>zonos-v0.1</category><category>audiobox-aesthetics</category><category>moshi</category><category>sonar</category><category>llama-3-70b</category><category>gpt-4o-mini</category><category>claude-3.5-haiku</category><category>gpt-4o</category><category>claude-3.5-sonnet</category><category>deepseek-r1-distilled-qwen-1.5b</category><category>reasonflux-32b</category><category>o1-preview</category><category>danhendrycks</category><category>text-to-speech</category><category>speech-to-speech</category><category>benchmarking</category><category>model-performance</category><category>reinforcement-learning</category><category>math</category><category>real-time-processing</category><category>open-source</category><category>cross-platform-integration</category><category>multilinguality</category><category>zero-shot-learning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-02-10-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-10-ainews-not-much-happened-today/</guid><description>**Google** released **Gemini 2.0 Flash Thinking Experimental 1-21**, a vision-language reasoning model with a **1 million-token context window** and improved accuracy on science, math, and multimedia benchmarks, surpassing **DeepSeek-R1** but 
trailing **OpenAI&apos;s o1**. **ZyphraAI** launched **Zonos**, a multilingual **Text-to-Speech model** with **instant voice cloning** and controls for speaking rate, pitch, and emotions, running at **~2x real-time speed on RTX 4090**. **Hugging Face** released **OpenR1-Math-220k**, a large-scale **math reasoning dataset** with **220K problems** and **800K reasoning traces** generated on **512 H100 GPUs**. **Tom Goldstein** introduced **Huginn-3.5B**, an open-source latent reasoning model trained on **800B tokens** that outperforms larger models on reasoning tasks like **GSM8K**. Discussions by **Jeremy Howard** and **iScienceLuvr** highlight advances in implicit latent reasoning and debate the future of human-readable reasoning traces. **Anthropic** launched the **Anthropic Economic Index** to analyze AI&apos;s economic impact using millions of **Claude** conversations.</description><pubDate>Tue, 11 Feb 2025 03:56:45 GMT</pubDate><category>google</category><category>zyphraai</category><category>hugging-face</category><category>anthropic</category><category>deepseek</category><category>openai</category><category>gemini-2.0-flash-thinking-experimental-1-21</category><category>zonos</category><category>openr1-math-220k</category><category>huginn-3.5b</category><category>deepseek-r1</category><category>o1</category><category>claude</category><category>jeremyphoward</category><category>andrej-karpathy</category><category>tom-goldstein</category><category>reach_vb</category><category>iscienceluvr</category><category>vision</category><category>multilingual-models</category><category>text-to-speech</category><category>voice-cloning</category><category>math</category><category>reasoning</category><category>latent-reasoning</category><category>chain-of-thought</category><category>dataset-release</category><category>fine-tuning</category><category>model-training</category><category>model-performance</category><category>context-windows</category><category>benchmarking</categor
y></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-02-07-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-07-ainews-not-much-happened-today/</guid><description>**DeepSeek-R1 surpasses OpenAI in GitHub stars**, marking a milestone in open-source AI with rapid growth in community interest. **AlphaGeometry2 achieves gold-medalist level performance with an 84% solving rate on IMO geometry problems**, showcasing significant advancements in AI reasoning. **LangChain releases a tutorial for building AI agents in JavaScript**, enhancing developer capabilities in agent deployment. Reflections on **Anthropic&apos;s Claude model** reveal early access and influence on AI development timelines. Lighthearted AI humor includes calls to ban second-order optimizers and challenges in web development longevity. The AI Engineer Summit 2025 workshops were announced, continuing community engagement and education.</description><pubDate>Sat, 08 Feb 2025 04:22:33 GMT</pubDate><category>deepseek</category><category>openai</category><category>google-deepmind</category><category>anthropic</category><category>langchain</category><category>adyen</category><category>deepseek-r1</category><category>alphageometry-2</category><category>claude</category><category>akhaliq</category><category>lmthang</category><category>aymericroucher</category><category>vikhyatk</category><category>swyx</category><category>open-source</category><category>reasoning</category><category>agentic-ai</category><category>javascript</category><category>model-release</category><category>memes</category><category>ai-development</category><category>benchmarking</category></item><item><title>s1: Simple test-time scaling (and Kyutai Hibiki)</title><link>https://news.smol.ai/issues/25-02-06-ainews-s1-simple-test-time-scaling-and-kyutai-hibiki/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-02-06-ainews-s1-simple-test-time-scaling-and-kyutai-hibiki/</guid><description>**&quot;Wait&quot; is all you need** introduces a novel reasoning model finetuned from **Qwen 2.5 32B** using just **1000 questions with reasoning traces** distilled from **Gemini 2.0 Flash Thinking**, enabling controllable test-time compute by appending &quot;Wait&quot; to extend reasoning. Lead author **Niklas Muennighoff**, known for work on **Bloom**, **StarCoder**, and **BIG-bench**, highlights this method&apos;s efficiency and its reproduction of the famous o1 scaling chart. Additionally, **Kyutai Moshi**&apos;s Hibiki project demonstrates impressive offline French-English live translation on iPhone. Recent AI model releases include **DeepSeek R1 and V3 open source models**, potentially marking a major open-source milestone, **Hugging Face&apos;s SmolLM2** emphasizing data-centric training for small LMs, and **IBM&apos;s Granite-Vision-3.1-2B**, a small vision-language model with strong performance. 
Key research papers spotlight **LIMO** for minimal demonstration reasoning achieving high accuracy on AIME and MATH benchmarks, and **Token-Assisted Reasoning** mixing latent and text tokens to improve language model reasoning.</description><pubDate>Fri, 07 Feb 2025 03:47:44 GMT</pubDate><category>google-deepmind</category><category>qwen</category><category>gemini</category><category>hugging-face</category><category>ibm</category><category>deepseek</category><category>qwen-2.5-32b</category><category>gemini-2.0-flash</category><category>smollm2</category><category>granite-vision-3.1-2b</category><category>niklas-muennighoff</category><category>reasoning</category><category>fine-tuning</category><category>scaling-laws</category><category>open-source-models</category><category>data-centric-training</category><category>vision</category><category>multilingual-models</category><category>language-model-reasoning</category></item><item><title>Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking</title><link>https://news.smol.ai/issues/25-02-05-ainews-gemini-20-flash-ga-with-new-flash-lite-20-pro-and-flash-thinking/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-05-ainews-gemini-20-flash-ga-with-new-flash-lite-20-pro-and-flash-thinking/</guid><description>**Google DeepMind** officially launched **Gemini 2.0** models including **Flash**, **Flash-Lite**, and **Pro Experimental**, with **Gemini 2.0 Flash** outperforming **Gemini 1.5 Pro** while being **12x cheaper** and supporting **multimodal input** and a **1 million token context window**. **Andrej Karpathy** released a **3h31m** video deep dive into **large language models**, covering **pretraining**, **fine-tuning**, and **reinforcement learning** with examples like **GPT-2** and **Llama 3.1**. A free course on **Transformer architecture** was introduced by **Jay Alammar**, **Maarten Gr**, and **Andrew Ng**, focusing on **tokenizers**, **embeddings**, and **mixture-of-expert models**. 
**DeepSeek-R1** reached **1.2 million downloads** on **Hugging Face** with a detailed **36-page technical report**. **Anthropic** increased rewards to **$10K** and **$20K** for their jailbreak challenge, while **BlueRaven** extension was updated to hide Twitter metrics for unbiased engagement.</description><pubDate>Thu, 06 Feb 2025 02:00:20 GMT</pubDate><category>google-deepmind</category><category>hugging-face</category><category>anthropic</category><category>gemini-2.0-flash</category><category>gemini-2.0-flash-lite</category><category>gemini-2.0-pro-experimental</category><category>gemini-1.5-pro</category><category>deepseek-r1</category><category>gpt-2</category><category>llama-3-1</category><category>andrej-karpathy</category><category>jayalammar</category><category>maartengr</category><category>andrewyng</category><category>nearcyan</category><category>multimodality</category><category>context-windows</category><category>cost-efficiency</category><category>pretraining</category><category>fine-tuning</category><category>reinforcement-learning</category><category>transformer</category><category>tokenization</category><category>embeddings</category><category>mixture-of-experts</category></item><item><title>How To Scale Your Model, by DeepMind</title><link>https://news.smol.ai/issues/25-02-04-ainews-how-to-scale-your-model-by-deepmind/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-04-ainews-how-to-scale-your-model-by-deepmind/</guid><description>**Researchers at Google DeepMind (GDM)** released a comprehensive &quot;little textbook&quot; titled **&quot;How To Scale Your Model&quot;** covering modern Transformer architectures, inference optimizations beyond O(N^2) attention, and high-performance computing concepts like rooflines. The resource includes practical problems and real-time comment engagement. 
On AI Twitter, several key updates include the open-sourced humanoid robotics model **ASAP** inspired by athletes like **Cristiano Ronaldo**, **LeBron James**, and **Kobe Bryant**; a new paper on **Mixture-of-Agents** proposing the **Self-MoA** method for improved LLM output aggregation; training of reasoning LLMs using the **GRPO algorithm** from **DeepSeek** demonstrated on **Qwen 0.5**; findings on bias in LLMs used as judges highlighting the need for multiple independent evaluations; and the release of **mlx-rs**, a Rust library for machine learning with examples including **Mistral** text generation. Additionally, **Hugging Face** launched an AI app store featuring over **400,000 apps** with 2,000 new daily additions and 2.5 million weekly visits, enabling AI-powered app search and categorization.</description><pubDate>Wed, 05 Feb 2025 06:59:23 GMT</pubDate><category>google-deepmind</category><category>deepseek</category><category>hugging-face</category><category>qwen-0.5</category><category>omarsar0</category><category>drjimfan</category><category>tairanhe99</category><category>guanyashi</category><category>lioronai</category><category>_philschmid</category><category>awnihannun</category><category>clementdelangue</category><category>transformers</category><category>inference</category><category>high-performance-computing</category><category>robotics</category><category>sim2real</category><category>mixture-of-experts</category><category>reinforcement-learning</category><category>bias-mitigation</category><category>rust</category><category>text-generation</category><category>open-source</category></item><item><title>OpenAI takes on Gemini&apos;s Deep Research</title><link>https://news.smol.ai/issues/25-02-03-ainews-openai-takes-on-geminis-deep-research/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-03-ainews-openai-takes-on-geminis-deep-research/</guid><description>**OpenAI** released the full version of the **o3** agent, with a new **Deep 
Research** variant showing significant improvements on the **HLE benchmark** and achieving SOTA results on **GAIA**. The release includes an &quot;inference time scaling&quot; chart demonstrating rigorous research, though some criticism arose over public test set results. The agent is noted as &quot;extremely simple&quot; and currently limited to 100 queries/month, with plans for a higher-rate version. Reception has been mostly positive, with some skepticism. Additionally, advances in **reinforcement learning** were highlighted, including a simple test-time scaling technique called **budget forcing** that improved reasoning on math competitions by 27%. Researchers from **Google DeepMind**, **NYU**, **UC Berkeley**, and **HKU** contributed to these findings. The original **Gemini Deep Research** team will participate in the upcoming AI Engineer NYC event.</description><pubDate>Tue, 04 Feb 2025 02:44:29 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>nyu</category><category>uc-berkeley</category><category>hku</category><category>o3</category><category>o3-mini-high</category><category>o3-deep-research-mini</category><category>sama</category><category>danhendrycks</category><category>ethan-mollick</category><category>dan-shipper</category><category>reinforcement-learning</category><category>benchmarking</category><category>inference-speed</category><category>model-performance</category><category>reasoning</category><category>test-time-scaling</category><category>agent-design</category></item><item><title>o3-mini launches, OpenAI on &quot;wrong side of history&quot;</title><link>https://news.smol.ai/issues/25-02-01-ainews-o3-mini-launches-openai-on-wrong-side-of-history/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-02-01-ainews-o3-mini-launches-openai-on-wrong-side-of-history/</guid><description>**OpenAI** released **o3-mini**, a new reasoning model available for free and paid users with a &quot;high&quot; reasoning 
effort option that outperforms the earlier **o1** model on STEM tasks and safety benchmarks, costing **93% less** per token. **Sam Altman** acknowledged a shift in open source strategy and credited **DeepSeek R1** for influencing assumptions. **Mistral AI** launched **Mistral Small 3 (24B)**, an open-weight model with competitive performance and low API costs. **DeepSeek R1** is supported by **Text-generation-inference v3.1.0** and available via **ai-gradio** and Replicate. The news highlights advancements in reasoning, cost-efficiency, and safety in AI models.</description><pubDate>Sat, 01 Feb 2025 09:16:19 GMT</pubDate><category>openai</category><category>mistral-ai</category><category>deepseek</category><category>togethercompute</category><category>fireworksai_hq</category><category>ai-gradio</category><category>replicate</category><category>o3-mini</category><category>o1</category><category>gpt-4o</category><category>mistral-small-3-24b</category><category>deepseek-r1</category><category>sam-altman</category><category>reasoning</category><category>safety</category><category>cost-efficiency</category><category>model-performance</category><category>benchmarking</category><category>api</category><category>open-weight-models</category><category>model-releases</category></item><item><title>Mistral Small 3 24B and Tulu 3 405B</title><link>https://news.smol.ai/issues/25-01-30-ainews-mistral-small-3-24b-and-tulu-3-405b/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-30-ainews-mistral-small-3-24b-and-tulu-3-405b/</guid><description>**Mistral AI** released **Mistral Small 3**, a **24B parameter** model optimized for local inference with low latency and **81% accuracy on MMLU**, competing with **Llama 3.3 70B**, **Qwen-2.5 32B**, and **GPT4o-mini**. **AI2** released **Tülu 3 405B**, a large finetune of **Llama 3** trained with Reinforcement Learning from Verifiable Rewards (RLVR), competitive with **DeepSeek V3**. 
**Sakana AI** launched **TinySwallow-1.5B**, a Japanese language model using **TAID** for on-device use. **Alibaba_Qwen** released **Qwen 2.5 Max**, trained on **20 trillion tokens**, with performance comparable to **DeepSeek V3**, **Claude 3.5 Sonnet**, and **Gemini 1.5 Pro**, and updated API pricing. These releases highlight advances in open models, efficient inference, and reinforcement learning techniques.</description><pubDate>Fri, 31 Jan 2025 00:08:47 GMT</pubDate><category>mistral-ai</category><category>ai2</category><category>sakana-ai</category><category>alibaba_qwen</category><category>deepseek</category><category>ollama</category><category>llamaindex</category><category>mistral-small-3</category><category>tulu-3-405b</category><category>llama-3</category><category>tiny-swallow-1.5b</category><category>qwen-2.5-max</category><category>deepseek-v3</category><category>claude-3.5-sonnet</category><category>gemini-1.5-pro</category><category>gpt4o-mini</category><category>llama-3-3-70b</category><category>clementdelangue</category><category>dchaplot</category><category>reach_vb</category><category>reinforcement-learning</category><category>model-fine-tuning</category><category>local-inference</category><category>model-performance</category><category>model-optimization</category><category>on-device-ai</category><category>instruction-following</category><category>api</category><category>training-data</category><category>natural-language-processing</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-01-29-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-29-ainews-not-much-happened-today/</guid><description>**DeepSeek-R1 and DeepSeek-V3** models have made significant advancements, trained on an **instruction-tuning dataset of 1.5M samples** with **600,000 reasoning** and **200,000 non-reasoning SFT data**. 
The models demonstrate strong **performance benchmarks** and are deployed on-premise via collaborations with **Dell** and **Hugging Face**. Training costs are estimated at around **$5.5M to $6M**, with efficient hardware utilization on **8xH100 servers**. The **International AI Safety Report** highlights risks such as **malicious use**, **malfunctions**, and **systemic risks** including **AI-driven cyberattacks**. Industry leaders like **Yann LeCun** and **Yoshua Bengio** provide insights on market reactions, AI safety, and ethical considerations, with emphasis on AI&apos;s role in creativity and economic incentives.</description><pubDate>Thu, 30 Jan 2025 01:07:40 GMT</pubDate><category>deepseek</category><category>hugging-face</category><category>dell</category><category>openai</category><category>deepseek-r1</category><category>deepseek-v3</category><category>coder-v2</category><category>prover</category><category>yann-lecun</category><category>yoshua-bengio</category><category>francois-chollet</category><category>giffman</category><category>instruction-tuning</category><category>performance-benchmarks</category><category>model-deployment</category><category>training-costs</category><category>hardware-scalability</category><category>ai-safety</category><category>risk-mitigation</category><category>ethical-ai</category><category>open-source</category><category>gpu-utilization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-01-28-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-28-ainews-not-much-happened-today/</guid><description>**Huawei chips** are highlighted in a diverse AI news roundup covering **NVIDIA&apos;s** stock rebound, new open music foundation models like **Local Suno**, and competitive AI models such as **Qwen 2.5 Max** and **DeepSeek V3**. 
The release of **DeepSeek Janus Pro**, a multimodal LLM with image generation capabilities, and advancements in **reinforcement learning** and **chain-of-thought reasoning** are noted. Discussions include GPU rebranding with **NVIDIA&apos;s H6400 GPUs**, data center innovations, and enterprise AI applications like crypto APIs in hedge funds. *&quot;Deepseek R1&apos;s capabilities&quot;* and *&quot;Qwen 2.5 models added to applications&quot;* are key highlights.</description><pubDate>Wed, 29 Jan 2025 01:48:45 GMT</pubDate><category>nvidia</category><category>anthropic</category><category>openai</category><category>deepseek</category><category>huawei</category><category>vercel</category><category>bespoke-labs</category><category>deepseek-r1</category><category>qwen-2.5</category><category>qwen-2.5-max</category><category>deepseek-v3</category><category>deepseek-janus-pro</category><category>gpt-4</category><category>saranormous</category><category>zizhpan</category><category>victormustar</category><category>omarsar0</category><category>markchen90</category><category>sakanaailabs</category><category>reach_vb</category><category>madiator</category><category>dain_mclau</category><category>francoisfleuret</category><category>garygodchaux</category><category>arankomatsuzaki</category><category>id_aa_carmack</category><category>lavanyasant</category><category>virattt</category><category>model-merging</category><category>multimodality</category><category>reinforcement-learning</category><category>chain-of-thought</category><category>gpu-optimization</category><category>compute-infrastructure</category><category>compression</category><category>crypto-api</category><category>image-generation</category></item><item><title>DeepSeek #1 on US App Store, Nvidia stock tanks -17%</title><link>https://news.smol.ai/issues/25-01-27-ainews-deepseek-1-on-us-app-store-nvidia-stock-tanks-17percent/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-01-27-ainews-deepseek-1-on-us-app-store-nvidia-stock-tanks-17percent/</guid><description>**DeepSeek** has made a significant cultural impact by hitting mainstream news unexpectedly in 2025. The **DeepSeek-R1** model features a massive **671B parameter MoE architecture** and demonstrates **chain-of-thought (CoT)** capabilities comparable to **OpenAI&apos;s o1** at a lower cost. The **DeepSeek V3** model trains a **236B parameter model 42% faster** than its predecessor using **fp8 precision**. The **Qwen2.5** multimodal models support images and videos with sizes ranging from **3B to 72B parameters**, featuring strong vision and agentic capabilities. **LangChain** and **LangGraph** integration enable AI chatbots with memory and tool use, including applications like the **DeFi Agent**. Discussions highlight **NVIDIA&apos;s** role in hardware acceleration, with concerns about stock drops due to **DeepSeek&apos;s** efficiency and market fears. 
Compute demand is expected to rise despite efficiency gains, driven by inference scaling and MoE design improvements.</description><pubDate>Tue, 28 Jan 2025 05:28:32 GMT</pubDate><category>deepseek</category><category>openai</category><category>nvidia</category><category>langchain</category><category>deepseek-r1</category><category>deepseek-v3</category><category>qwen2.5-vl</category><category>o1</category><category>sama</category><category>mervenoyann</category><category>omarsar0</category><category>teortaxestex</category><category>nptacek</category><category>carpeetti</category><category>finbarrtimbers</category><category>cwolferesearch</category><category>arthurrapier</category><category>danhendrycks</category><category>scaling01</category><category>janusflow</category><category>moe-architecture</category><category>chain-of-thought</category><category>fp8-precision</category><category>multimodality</category><category>vision</category><category>agentic-ai</category><category>inference-scaling</category><category>gpu-optimization</category><category>model-efficiency</category><category>ai-chatbots</category><category>memory-integration</category><category>tool-use</category><category>stock-market-reactions</category></item><item><title>TinyZero: Reproduce DeepSeek R1-Zero for $30</title><link>https://news.smol.ai/issues/25-01-24-ainews-tinyzero-reproduce-deepseek-r1-zero-for-dollar30/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-24-ainews-tinyzero-reproduce-deepseek-r1-zero-for-dollar30/</guid><description>**DeepSeek Mania** continues to reshape the frontier model landscape with Jiayi Pan from Berkeley reproducing the *OTHER* result from the DeepSeek R1 paper, R1-Zero, in a cost-effective Qwen model fine-tune for two math tasks. A key finding is a lower bound to the distillation effect at **1.5B parameters**, with RLCoT reasoning emerging as an intrinsic property. 
Various RL techniques like PPO, DeepSeek&apos;s GRPO, or PRIME show similar outcomes, and starting from an Instruct model speeds convergence. The **Humanity’s Last Exam (HLE) Benchmark** introduces a challenging multi-modal test with **3,000 expert-level questions** across **100+ subjects**, where models perform below **10%**, with **DeepSeek-R1** achieving **9.4%**. DeepSeek-R1 excels in chain-of-thought reasoning, outperforming models like **o1** while being **20x cheaper** and MIT licensed. The **WebDev Arena Leaderboard** ranks DeepSeek-R1 #2 in technical domains and #1 under Style Control, closing in on **Claude 3.5 Sonnet**. OpenAI&apos;s **Operator** is deployed to 100% of Pro users in the US, enabling tasks like ordering meals and booking reservations, and functions as a research assistant for AI paper searches and summaries. Hugging Face announces a leadership change after significant growth, while Meta AI releases the first stable version of **Llama Stack** with streamlined upgrades and automated verification. 
DeepSeek-R1&apos;s open-source success is celebrated, and technical challenges like memory management on macOS 15+ are addressed with residency sets in MLX for stability.</description><pubDate>Sat, 25 Jan 2025 02:32:28 GMT</pubDate><category>deepseek</category><category>berkeley</category><category>hugging-face</category><category>meta-ai-fair</category><category>openai</category><category>deeplearningai</category><category>deepseek-r1</category><category>qwen</category><category>o1</category><category>claude-3-sonnet</category><category>claude-3</category><category>prime</category><category>ppo</category><category>grpo</category><category>llama-stack</category><category>jiayi-pan</category><category>saranormous</category><category>reach_vb</category><category>lmarena_ai</category><category>nearcyan</category><category>omarsar0</category><category>philschmid</category><category>hardmaru</category><category>awnihannun</category><category>winglian</category><category>reinforcement-learning</category><category>fine-tuning</category><category>chain-of-thought</category><category>multi-modal-benchmark</category><category>memory-management</category><category>model-training</category><category>open-source</category><category>agentic-workflow-automation</category><category>model-performance</category></item><item><title>OpenAI launches Operator, its first Agent</title><link>https://news.smol.ai/issues/25-01-23-ainews-openai-launches-operator-its-first-agent/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-23-ainews-openai-launches-operator-its-first-agent/</guid><description>**OpenAI** launched **Operator**, a premium computer-using agent for web tasks like booking and ordering, available now for Pro users in the US with an API promised. It features long horizon remote VMs up to 20 minutes and video export, showing state-of-the-art agent performance but not yet human-level. 
**Anthropic** had launched a similar agent 3 months earlier as an open source demo. **DeepSeek AI** unveiled **DeepSeek R1**, an open-source reasoning model excelling on the **Humanity&apos;s Last Exam** dataset, outperforming models like **LLaMA 4** and **OpenAI&apos;s o1**. **Google DeepMind** open-sourced **VideoLLaMA 3**, a multimodal foundation model for image and video understanding. **Perplexity AI** released **Perplexity Assistant** for Android with reasoning and search capabilities. The **Humanity&apos;s Last Exam** dataset contains 3,000 questions testing AI reasoning, with current models scoring below 10% accuracy, indicating room for improvement. OpenAI&apos;s Computer-Using Agent (CUA) shows improved performance on OSWorld and WebArena benchmarks but still lags behind humans. **Anthropic AI** introduced Citations for safer AI responses. *Sam Altman* and *Swyx* commented on Operator&apos;s launch and capabilities.</description><pubDate>Fri, 24 Jan 2025 03:34:34 GMT</pubDate><category>openai</category><category>anthropic</category><category>deepseek-ai</category><category>google-deepmind</category><category>perplexity-ai</category><category>operator</category><category>deepseek-r1</category><category>videollama-3</category><category>llama-4</category><category>o1</category><category>claude</category><category>sam-altman</category><category>swyx</category><category>computer-using-agent</category><category>reasoning</category><category>multimodality</category><category>performance-benchmarks</category><category>open-source</category><category>ai-safety</category><category>benchmarking</category><category>video-generation</category><category>model-evaluation</category></item><item><title>Bespoke-Stratos + Sky-T1: The Vicuna+Alpaca moment for reasoning</title><link>https://news.smol.ai/issues/25-01-22-ainews-bespoke-stratos-sky-t1-the-vicunaalpaca-moment-for-reasoning/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-01-22-ainews-bespoke-stratos-sky-t1-the-vicunaalpaca-moment-for-reasoning/</guid><description>**Reasoning Distillation** has emerged as a key technique, with Berkeley/USC researchers releasing **Sky-T1-32B-Preview**, a finetuned model of **Qwen 2.5 32B** using 17k reasoning traces for just **$450**, matching benchmarks of **o1-preview**. **DeepSeek** introduced **R1**, a model surpassing **o1-preview** and enabling distillation to smaller models like a 1.5B Qwen to match **gpt-4o** and **claude-3-sonnet** levels. **Bespoke Labs** further distilled **R1** on Qwen, outperforming **o1-preview** with fewer samples. This progress suggests that *&quot;SFT is all you need&quot;* for reasoning without major architecture changes. Additionally, **DeepSeek-R1** uses pure reinforcement learning with supervised finetuning to accelerate convergence and shows strong reasoning and multimodal capabilities. **Google&apos;s Gemini 2.0 Flash Thinking** model boasts a **1 million token context window**, code execution, and excels in math, science, and multimodal reasoning. 
Critiques highlight challenges in model repeatability, behavioral self-awareness, and RLHF limitations in reasoning robustness.</description><pubDate>Thu, 23 Jan 2025 07:08:27 GMT</pubDate><category>berkeley</category><category>usc</category><category>deepseek</category><category>bespoke-labs</category><category>google</category><category>llmsys</category><category>stanford</category><category>lm-sys</category><category>sky-t1-32b-preview</category><category>qwen-2.5-32b</category><category>r1</category><category>o1-preview</category><category>gpt-4o</category><category>claude-3-sonnet</category><category>bespoke-stratos-32b</category><category>gemini-2.0-flash-thinking</category><category>teortaxestex</category><category>cwolferesearch</category><category>madiator</category><category>chakraai</category><category>philschmid</category><category>abacaj</category><category>omarsar0</category><category>reasoning</category><category>supervised-finetuning</category><category>reinforcement-learning</category><category>multimodality</category><category>model-distillation</category><category>context-windows</category><category>code-execution</category><category>model-repeatability</category><category>behavioral-self-awareness</category><category>rlhf</category></item><item><title>Project Stargate: $500b datacenter (1.7% of US GDP) and Gemini 2 Flash Thinking 2</title><link>https://news.smol.ai/issues/25-01-21-ainews-project-stargate-dollar500b-datacenter-17percent-of-us-gdp-and-gemini-2-flash-thinking-2/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-21-ainews-project-stargate-dollar500b-datacenter-17percent-of-us-gdp-and-gemini-2-flash-thinking-2/</guid><description>**Project Stargate**, a US &quot;AI Manhattan project&quot; led by **OpenAI** and **Softbank**, supported by **Oracle**, **Arm**, **Microsoft**, and **NVIDIA**, was announced with a scale comparable to the original Manhattan project costing **$35B inflation adjusted**. 
Despite Microsoft&apos;s reduced role as exclusive compute partner, the project is serious but not immediately practical. Meanwhile, **Noam Shazeer** revealed a second major update to **Gemini 2.0 Flash Thinking**, enabling **1M token long context** usable immediately. Additionally, **AI Studio** introduced a new **code interpreter** feature. On Reddit, **DeepSeek R1**, a distillation of **Qwen 32B**, was released for free on **HuggingChat**, sparking discussions on self-hosting, performance issues, and quantization techniques. DeepSeek&apos;s CEO **Liang Wenfeng** highlighted their focus on **fundamental AGI research**, efficient **MLA architecture**, and commitment to **open-source development** despite export restrictions, positioning DeepSeek as a potential alternative to closed-source AI trends.</description><pubDate>Wed, 22 Jan 2025 01:56:21 GMT</pubDate><category>openai</category><category>softbank</category><category>oracle</category><category>arm</category><category>microsoft</category><category>nvidia</category><category>huggingface</category><category>deepseek-ai</category><category>gemini-2.0-flash</category><category>deepseek-r1</category><category>qwen-32b</category><category>noam-shazeer</category><category>liang-wenfeng</category><category>long-context</category><category>quantization</category><category>code-interpretation</category><category>model-distillation</category><category>open-source</category><category>agi-research</category><category>model-performance</category><category>memory-optimization</category></item><item><title>DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level</title><link>https://news.smol.ai/issues/25-01-20-ainews-deepseek-r1-o1-level-open-weights-model-and-a-simple-recipe-for-upgrading-15b-models-to-sonnet4o-level/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/25-01-20-ainews-deepseek-r1-o1-level-open-weights-model-and-a-simple-recipe-for-upgrading-15b-models-to-sonnet4o-level/</guid><description>**DeepSeek** released **DeepSeek R1**, a significant upgrade over **DeepSeek V3** from just three weeks prior, featuring 8 models including full-size 671B MoE models and multiple distillations from **Qwen 2.5** and **Llama 3.1/3.3**. The models are MIT licensed, allowing finetuning and distillation. Pricing is notably cheaper than **o1** by 27x-50x. The training process used **GRPO** (reward for correctness and style outcomes) without relying on PRM, MCTS, or reward models, focusing on reasoning improvements through reinforcement learning. Distilled models can run on **Ollama** and show strong capabilities like writing **Manim code**. The release emphasizes advances in **reinforcement-learning**, **fine-tuning**, and **model-distillation** with a novel RL framework from DeepSeekMath.</description><pubDate>Tue, 21 Jan 2025 07:50:24 GMT</pubDate><category>deepseek</category><category>ollama</category><category>qwen</category><category>llama</category><category>deepseek-r1</category><category>deepseek-v3</category><category>qwen-2.5</category><category>llama-3.1</category><category>llama-3.3-70b</category><category>reinforcement-learning</category><category>fine-tuning</category><category>model-distillation</category><category>model-optimization</category><category>reasoning</category><category>reward-models</category><category>multi-response-sampling</category><category>model-training</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-01-17-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-17-ainews-not-much-happened-today/</guid><description>**DeepSeek-V3**, a **671 billion parameter mixture-of-experts model**, surpasses **Llama 3.1 405B** and **GPT-4o** in coding and math benchmarks. 
**OpenAI** announced the upcoming release of **GPT-5** on **April 27, 2023**. **MiniMax-01 Coder mode** in **ai-gradio** enables building a chess game in one shot. **Meta** research highlights trade-offs in scaling visual tokenizers. **Google DeepMind** improves diffusion model quality via inference-time scaling. The **RA-DIT** method fine-tunes LLMs and retrievers for better RAG responses. The U.S. proposes a three-tier export restriction system on AI chips and models, excluding countries like **China** and **Russia**. Security vulnerabilities in AI chatbots involving CSRF and prompt injection were revealed. Concerns about superintelligence and weapons-grade AI models were expressed. **ai-gradio** updates include NVIDIA NIM compatibility and new models like **cosmos-nemotron-34b**. **LangChain** integrates with **Claude-3-haiku** for AI agents with persistent memory. **Triton Warp specialization** optimizes GPU usage for matrix multiplication. **Meta&apos;s** fine-tuned **Llama** models, **OpenBioLLM-8B** and **OpenBioLLM-70B**, target personalized medicine and clinical trials.</description><pubDate>Sat, 18 Jan 2025 02:33:34 
GMT</pubDate><category>openai</category><category>deep-learning-ai</category><category>meta-ai-fair</category><category>google-deepmind</category><category>saama</category><category>langchain</category><category>nvidia</category><category>deepseek-v3</category><category>llama-3-1-405b</category><category>gpt-4o</category><category>gpt-5</category><category>minimax-01</category><category>claude-3-haiku</category><category>cosmos-nemotron-34b</category><category>akhaliq</category><category>mixture-of-experts</category><category>coding</category><category>math</category><category>scaling</category><category>visual-tokenizers</category><category>diffusion-models</category><category>inference-time-scaling</category><category>retrieval-augmented-generation</category><category>ai-export-restrictions</category><category>security-vulnerabilities</category><category>prompt-injection</category><category>gpu-optimization</category><category>fine-tuning</category><category>personalized-medicine</category><category>clinical-trials</category><category>ai-agents</category><category>persistent-memory</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-01-16-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-16-ainews-not-much-happened-today/</guid><description>**Harvey** secured a new **$300M funding round**. **OuteTTS 0.3 1B &amp; 500M** text-to-speech models were released featuring **zero-shot voice cloning**, **multilingual support** (en, jp, ko, zh, fr, de), and **emotion control**, powered by **OLMo-1B** and **Qwen 2.5 0.5B**. The **HOVER** model, a **1.5M-parameter neural net** for **agile motor control**, was introduced, leveraging **human motion capture datasets** and **massively parallel reinforcement learning**. **kokoro.js** enables running AI models locally in browsers with minimal dependencies. 
**Meta AI** awarded **$200K LLM evaluation grants** for projects on **regional language understanding**, **complex reasoning**, and **interactive programming environments**. **Stability AI&apos;s Twitter account was hacked**, prompting security warnings. **Alibaba Qwen** improved **Process Reward Models (PRMs)** for better **mathematical reasoning** using a **consensus filtering mechanism**. **DeepSeek V3** uses **pipeline parallelism** to enhance **distributed inference** and **long-context generation efficiency**. Discussions on **AI policy in legal frameworks** and **AI&apos;s role in democratizing education** were highlighted. Lighthearted AI-related humor was also shared.</description><pubDate>Fri, 17 Jan 2025 06:04:28 GMT</pubDate><category>harvey</category><category>meta-ai-fair</category><category>stability-ai</category><category>alibaba</category><category>deepseek</category><category>hugging-face</category><category>oute-tts-0.3-1b</category><category>oute-tts-0.3-500m</category><category>olm-1b</category><category>qwen-2.5-0.5b</category><category>hover</category><category>gpt-4o</category><category>deepseek-v3</category><category>reach_vb</category><category>drjimfan</category><category>vikhyatk</category><category>mervenoyann</category><category>aiatmeta</category><category>iscienceluvr</category><category>alibaba_qwen</category><category>awnihannun</category><category>ajeya_cotra</category><category>emollick</category><category>qtnx_</category><category>designerx</category><category>text-to-speech</category><category>zero-shot-learning</category><category>multilinguality</category><category>emotion-control</category><category>motor-control</category><category>reinforcement-learning</category><category>local-ai</category><category>distributed-inference</category><category>pipeline-parallelism</category><category>mathematical-reasoning</category><category>process-reward-models</category><category>legal-ai</category><category>education-ai</category><categ
ory>ai-security</category><category>humor</category></item><item><title>Titans: Learning to Memorize at Test Time</title><link>https://news.smol.ai/issues/25-01-15-ainews-titans-learning-to-memorize-at-test-time/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-15-ainews-titans-learning-to-memorize-at-test-time/</guid><description>**Google** released a new paper on &quot;Neural Memory&quot; integrating persistent memory directly into transformer architectures at test time, showing promising long-context utilization. **MiniMax-01** by @omarsar0 features a **4 million token context window** with **456B parameters** and **32 experts**, outperforming **GPT-4o** and **Claude-3.5-Sonnet**. **InternLM3-8B-Instruct** is an open-source model trained on **4 trillion tokens** with state-of-the-art results. **Transformer²** introduces self-adaptive LLMs that dynamically adjust weights for continuous adaptation. Advances in AI security highlight the need for **agent authentication**, **prompt injection** defenses, and **zero-trust architectures**. 
Tools like **Micro Diffusion** enable budget-friendly diffusion model training, while **LeagueGraph** and **Agent Recipes** support open-source social media agents.</description><pubDate>Thu, 16 Jan 2025 07:58:41 GMT</pubDate><category>google</category><category>meta-ai-fair</category><category>openai</category><category>anthropic</category><category>langchain</category><category>minimax-01</category><category>gpt-4o</category><category>claude-3.5-sonnet</category><category>internlm3-8b-instruct</category><category>transformer2</category><category>omarsar0</category><category>hwchase17</category><category>abacaj</category><category>hardmaru</category><category>rez0__</category><category>bindureddy</category><category>akhaliq</category><category>saranormous</category><category>long-context</category><category>mixture-of-experts</category><category>self-adaptive-models</category><category>prompt-injection</category><category>agent-authentication</category><category>diffusion-models</category><category>zero-trust-architecture</category><category>continuous-adaptation</category><category>vision</category><category>agentic-systems</category></item><item><title>small little news items</title><link>https://news.smol.ai/issues/25-01-14-ainews-small-little-news-items/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-14-ainews-small-little-news-items/</guid><description>**Ollama** enhanced its models by integrating **Cohere&apos;s R7B**, optimized for **RAG** and **tool use tasks**, and released **Ollama v0.5.5** with quality updates and a new engine. **Together AI** launched the **Llama 3.3 70B multimodal model** with improved reasoning and math capabilities, while **OpenBMB** introduced the **MiniCPM-o 2.6**, outperforming **GPT-4V** on visual tasks. Insights into **Process Reward Models (PRM)** were shared to boost **LLM reasoning**, alongside **Qwen2.5-Math-PRM** models excelling in mathematical reasoning. 
**LangChain** released a beta for **ChatGPT Tasks** enabling scheduling of reminders and summaries, and introduced open-source **ambient agents** for email assistance. **OpenAI** rolled out **Tasks** for scheduling actions in **ChatGPT** for Plus, Pro, and Teams users. AI software engineering is rapidly advancing, predicted to match human capabilities within 18 months. Research on **LLM scaling laws** highlights power law relationships and plateauing improvements, while **GANs** are experiencing a revival.</description><pubDate>Wed, 15 Jan 2025 02:19:30 GMT</pubDate><category>ollama</category><category>cohere</category><category>togethercompute</category><category>openbmb</category><category>qwen</category><category>langchain</category><category>openai</category><category>r7b</category><category>llama-3-70b</category><category>minicpm-o-2.6</category><category>gpt-4v</category><category>qwen2.5-math-prm</category><category>rag</category><category>tool-use-tasks</category><category>quality-of-life</category><category>new-engine</category><category>multimodality</category><category>improved-reasoning</category><category>math-capabilities</category><category>process-reward-models</category><category>llm-reasoning</category><category>mathematical-reasoning</category><category>beta-release</category><category>task-scheduling</category><category>ambient-agents</category><category>email-assistants</category><category>ai-software-engineering</category><category>codebase-analysis</category><category>test-case-generation</category><category>security-infrastructure</category><category>llm-scaling-laws</category><category>power-law</category><category>plateauing-improvements</category><category>gans-revival</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-01-13-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-13-ainews-not-much-happened-today/</guid><description>**Helium-1 
Preview** by **kyutai_labs** is a **2B-parameter multilingual base LLM** outperforming **Qwen 2.5**, trained on **2.5T tokens** with a **4096-token context** using token-level distillation from a **7B model**. **Phi-4 (4-bit)** was shown running in **lmstudio** on an **M4 Max**, noted for speed and performance. **Sky-T1-32B-Preview** is an open-source reasoning model trained for under **$450**, matching **o1&apos;s performance** with strong benchmark scores. **Codestral 25.01** by **mistralai** is a new SOTA coding model supporting **80+ programming languages** and offering **2x speed** over its predecessor. 

Innovations include **AutoRAG** for optimizing retrieval-augmented generation pipelines, **Agentic RAG** for autonomous query reformulation and critique, **Multiagent Finetuning** using societies of models like **Phi-3**, **Mistral**, **LLaMA-3**, and **GPT-3.5** for reasoning improvements, and **VideoRAG** incorporating video content into RAG with LVLMs. 

Applications include a dynamic UI AI chat app by **skirano** on **Replit**, **LangChain** tools like **DocTalk** for voice PDF conversations, AI travel agent tutorials, and news summarization agents. **Hyperbolic Labs** offers competitive GPU rentals including **H100**, **A100**, and **RTX 4090**. **LLMQuoter** enhances RAG accuracy by identifying key quotes. 

Infrastructure updates include **MLX export** for LLM inference from Python to C++ by **fchollet** and **SemHash** semantic text deduplication by **philschmid**.</description><pubDate>Tue, 14 Jan 2025 06:08:22 GMT</pubDate><category>kyutai-labs</category><category>lmstudio</category><category>mistralai</category><category>llamaindex</category><category>huggingface</category><category>langchainai</category><category>hyperbolic-labs</category><category>replit</category><category>fchollet</category><category>philschmid</category><category>helium-1</category><category>qwen-2.5</category><category>phi-4</category><category>sky-t1-32b-preview</category><category>o1</category><category>codestral-25.01</category><category>phi-3</category><category>mistral</category><category>llama-3</category><category>gpt-3.5</category><category>llama-3</category><category>gpt-3.5</category><category>llmquoter</category><category>reach_vb</category><category>awnihannun</category><category>lior_on_ai</category><category>sophiamyang</category><category>omarsar0</category><category>skirano</category><category>yuchenj_uw</category><category>fchollet</category><category>philschmid</category><category>multilinguality</category><category>token-level-distillation</category><category>context-windows</category><category>model-performance</category><category>open-source</category><category>reasoning</category><category>coding</category><category>retrieval-augmented-generation</category><category>hybrid-retrieval</category><category>multiagent-systems</category><category>video</category><category>large-video-language-models</category><category>dynamic-ui</category><category>voice-interaction</category><category>gpu-rentals</category><category>model-optimization</category><category>semantic-deduplication</category><category>model-inference</category></item><item><title>Moondream 2025.1.9: Structured Text, Enhanced OCR, Gaze Detection in a 2B 
Model</title><link>https://news.smol.ai/issues/25-01-10-ainews-moondream-202519-structured-text-enhanced-ocr-gaze-detection-in-a-2b-model/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-10-ainews-moondream-202519-structured-text-enhanced-ocr-gaze-detection-in-a-2b-model/</guid><description>**Moondream** has released a new version that advances VRAM efficiency and adds structured output and gaze detection, marking a new frontier in vision model practicality. Discussions on Twitter highlighted advancements in reasoning models like **OpenAI&apos;s o1**, model distillation techniques, and new multimodal embedding models such as **vdr-2b-multi-v1** and **LLaVA-Mini**, which significantly reduce computational costs. Research on GANs and decentralized diffusion models showed improved stability and performance. Development tools like **MLX** and **vLLM** received updates for better portability and developer experience, while frameworks like **LangChain** and **Qdrant** enable intelligent data workflows. Company updates include new roles and team expansions at **GenmoAI**. 
*&quot;Efficiency tricks are all you need.&quot;*</description><pubDate>Sat, 11 Jan 2025 07:18:42 GMT</pubDate><category>openai</category><category>llamaindex</category><category>langchainai</category><category>qdrant</category><category>genmoai</category><category>o1</category><category>vdr-2b-multi-v1</category><category>llava-mini</category><category>philschmid</category><category>saranormous</category><category>jxmnop</category><category>reach_vb</category><category>iscienceluvr</category><category>multimodalart</category><category>arohan</category><category>adcock_brett</category><category>awnihannun</category><category>russelljkaplan</category><category>ajayj_</category><category>vision</category><category>model-efficiency</category><category>structured-output</category><category>gaze-detection</category><category>reasoning</category><category>model-distillation</category><category>multimodality</category><category>embedding-models</category><category>gan</category><category>diffusion-models</category><category>self-attention</category><category>training-optimizations</category><category>development-frameworks</category><category>api</category><category>cross-language-deployment</category><category>semantic-search</category><category>agentic-document-processing</category><category>developer-experience</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-01-09-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-09-ainews-not-much-happened-today/</guid><description>**rStar-Math** surpasses **OpenAI&apos;s o1-preview** in math reasoning with **90.0% accuracy** using a **7B LLM** and **MCTS** with a **Process Reward Model**. **Alibaba** launches **Qwen Chat** featuring **Qwen2.5-Plus** and **Qwen2.5-Coder-32B-Instruct** models enhancing vision-language and reasoning. **Microsoft** releases **Phi-4**, trained on **40% synthetic data** with improved pretraining. 
**Cohere** introduces **North**, a secure AI workspace integrating **LLMs**, **RAG**, and automation for private deployments. **LangChain** showcases a company research agent with multi-step workflows and open-source datasets. **Transformers.js** demos released for text embeddings and image segmentation in JavaScript. Research highlights include **Meta Meta-CoT** for enhanced chain-of-thought reasoning, **DeepSeek V3** with recursive self-improvement, and collaborative AI development platforms. Industry partnerships include **Rakuten** with **LangChain**, **North** with **RBC** supporting 90,000 employees, and **Agent Laboratory** collaborating with **AMD** and **Johns Hopkins**. Technical discussions emphasize **CUDA** and **Triton** for AI efficiency and evolving AI-assisted coding stacks by **Andrew Ng**.</description><pubDate>Fri, 10 Jan 2025 03:35:37 GMT</pubDate><category>openai</category><category>anthropic</category><category>alibaba</category><category>microsoft</category><category>cohere</category><category>langchain</category><category>weights-biases</category><category>deepseek</category><category>rakuten</category><category>rbc</category><category>amd</category><category>johns-hopkins</category><category>rstar-math</category><category>o1-preview</category><category>qwen2.5-plus</category><category>qwen2.5-coder-32b-instruct</category><category>phi-4</category><category>claude-3.5-sonnet</category><category>reach_vb</category><category>rasbt</category><category>akshaykagrawal</category><category>arankomatsuzaki</category><category>teortaxestex</category><category>aidangomez</category><category>andrewyng</category><category>math</category><category>process-reward-model</category><category>mcts</category><category>vision</category><category>reasoning</category><category>synthetic-data</category><category>pretraining</category><category>rag</category><category>automation</category><category>private-deployment</category><category>multi-step-workflow</categor
y><category>open-source-dataset</category><category>text-embeddings</category><category>image-segmentation</category><category>chain-of-thought</category><category>multimodal-reasoning</category><category>finetuning</category><category>recursive-self-improvement</category><category>collaborative-platforms</category><category>ai-development</category><category>partnerships</category><category>cuda</category><category>triton</category><category>ai-efficiency</category><category>ai-assisted-coding</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-01-08-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-08-ainews-not-much-happened-today/</guid><description>**Sebastien Bubeck** introduced **REINFORCE++**, enhancing classical REINFORCE with **PPO-inspired techniques** for **30% faster training**. **Microsoft** released **Phi-4** under the **MIT License**, accessible via **Ollama**. **François Chollet** announced plans for **ARC-AGI-2** and a next-generation **AGI benchmark**. **LangChain** launched **10 new integration packages** to boost **LLM application development**. **Tom Doerr** introduced **Ollama-OCR**, a Python package for **text extraction** using **vision language models**. **Arohan** optimized **Shampoo** for **memory efficiency**, reducing usage from **20 to 6 bytes per parameter**. **Bindu Reddy** showcased **CodeLLM&apos;s v1** for **frontend code generation** and highlighted **LlamaIndex Workflows** for **academic summarization** and **slide generation**. **Hwchase17** collaborated with **Together Compute** to enhance **WebDev Arena** with **complex coding agents** for **LLM coding evaluations**. **Jonathan Ross** detailed **Groq&apos;s** mission to reduce **compute costs by 1000x** amid rising **generative AI** spending. **Clement Delangue** warned about **scam alerts** involving false claims of association with **AI21**. 
**Vikhyat K** raised concerns about the **ethical implications** and **trade-offs** of **AGI**. Memes and humor included creative AI prompts and critiques of **LLM behaviors**.</description><pubDate>Thu, 09 Jan 2025 03:45:48 GMT</pubDate><category>ai21-labs</category><category>ollama</category><category>langchain</category><category>togethercompute</category><category>groq</category><category>phi-4</category><category>reinforce++</category><category>arc-agi-2</category><category>sebastien-bubeck</category><category>fchollet</category><category>tom-doerr</category><category>arohan_</category><category>bindureddy</category><category>hwchase17</category><category>jonathanross321</category><category>clementdelangue</category><category>vikhyatk</category><category>reinforcement-learning</category><category>ppo</category><category>model-optimization</category><category>memory-efficiency</category><category>python-packages</category><category>vision</category><category>text-extraction</category><category>frontend-code-generation</category><category>workflow-automation</category><category>coding-agents</category><category>compute-cost-reduction</category><category>ethical-ai</category><category>agi-benchmarks</category><category>scam-alerts</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-01-07-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-07-ainews-not-much-happened-today/</guid><description>**NVIDIA** has launched **Cosmos**, an open-source video world model trained on **20 million hours of video**, aimed at advancing **robotics** and **autonomous driving**. The release sparked debate over its open-source status and technical approach. Additionally, **NVIDIA** announced **Digits**, a **$3,000** personal AI supercomputer designed to democratize AI computing. 
The AI community expresses mixed feelings about rapid AI progress, with concerns about **AGI**, job displacement, and investment hype. Discussions also highlight upcoming tools for fine-tuning AI models at home and foundation models for AI robotics.</description><pubDate>Wed, 08 Jan 2025 04:01:51 GMT</pubDate><category>nvidia</category><category>openai</category><category>cosmos</category><category>sama</category><category>robotics</category><category>autonomous-driving</category><category>open-source</category><category>fine-tuning</category><category>foundation-models</category><category>memory-optimization</category></item><item><title>PRIME: Process Reinforcement through Implicit Rewards</title><link>https://news.smol.ai/issues/25-01-06-ainews-prime-process-reinforcement-through-implicit-rewards/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-06-ainews-prime-process-reinforcement-through-implicit-rewards/</guid><description>**Implicit Process Reward Models (PRIME)** have been highlighted as a significant advancement in online reinforcement learning, trained on a **7B model** with impressive results compared to **gpt-4o**. The approach builds on the importance of process reward models established by &quot;Let&apos;s Verify Step By Step.&quot; Additionally, AI Twitter discussions cover topics such as **proto-AGI** capabilities with **claude-3.5-sonnet**, the role of **compute scaling** for **Artificial Superintelligence (ASI)**, and model performance nuances. New AI tools like **Gemini 2.0 coder mode** and **LangGraph Studio** enhance agent architecture and software development. Industry events include the **LangChain AI Agent Conference** and meetups fostering AI community connections. Company updates reveal **OpenAI&apos;s** financial challenges with Pro subscriptions and **DeepSeek-V3&apos;s** integration with **Together AI** APIs, showcasing efficient **671B MoE parameter** models. 
Research discussions focus on **scaling laws** and compute efficiency in large language models.</description><pubDate>Tue, 07 Jan 2025 02:33:39 GMT</pubDate><category>openai</category><category>together-ai</category><category>deepseek</category><category>langchain</category><category>lucidrains</category><category>claude-3.5-sonnet</category><category>gpt-4o</category><category>deepseek-v3</category><category>gemini-2.0</category><category>sama</category><category>aidan_mclau</category><category>omarsar0</category><category>akhaliq</category><category>hwchase17</category><category>tom_doerr</category><category>lmarena_ai</category><category>cwolferesearch</category><category>richardmcngo</category><category>reinforcement-learning</category><category>scaling-laws</category><category>model-performance</category><category>agent-architecture</category><category>software-development</category><category>compute-scaling</category><category>multi-expert-models</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/25-01-03-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/25-01-03-ainews-not-much-happened-today/</guid><description>**Olmo 2** released a detailed tech report showcasing full pre, mid, and post-training details for a frontier fully open model. **PRIME**, an open-source reasoning solution, achieved **26.7% pass@1**, surpassing **GPT-4o** in benchmarks. Performance improvements include **Qwen 32B (4-bit)** generating at **&gt;40 tokens/sec** on an **M4 Max** and **libvips** being **25x faster** than **Pillow** for image resizing. New tools like **Swaggo/swag** for Swagger 2.0 documentation, **Jujutsu (jj)** Git-compatible VCS, and **Portspoof** security tool were introduced. Robotics advances include a weapon detection system with a meters-wide field of view and faster frame rates. Hardware benchmarks compared **H100** and **MI300x** accelerators. 
Applications span medical error detection using PRIME and a financial AI agent integrating **LangChainAI** and **Vercel AI SDK**. Architectural insights suggest the need for breakthroughs similar to **SSMs** or **RNNs**.</description><pubDate>Sat, 04 Jan 2025 07:58:51 GMT</pubDate><category>olmo</category><category>openai</category><category>qwen</category><category>cerebras-systems</category><category>langchain</category><category>vercel</category><category>swaggo</category><category>gin</category><category>echo</category><category>prime</category><category>gpt-4o</category><category>qwen-32b</category><category>akhaliq</category><category>jason-wei</category><category>vikhyatk</category><category>awnihannun</category><category>arohan</category><category>tom-doerr</category><category>hendrikbgr</category><category>jerryjliu0</category><category>adcock-brett</category><category>shuchaobi</category><category>stasbekman</category><category>reach-vb</category><category>virattt</category><category>andrew-n-carr</category><category>reasoning</category><category>chain-of-thought</category><category>math</category><category>coding</category><category>optimization</category><category>performance</category><category>image-processing</category><category>software-development</category><category>agent-frameworks</category><category>version-control</category><category>security</category><category>robotics</category><category>hardware-optimization</category><category>medical-ai</category><category>financial-ai</category><category>architecture</category></item><item><title>not much happened to end the year</title><link>https://news.smol.ai/issues/24-12-31-ainews-not-much-happened-to-end-the-year/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-31-ainews-not-much-happened-to-end-the-year/</guid><description>**Reinforcement Fine-Tuning (RFT)** is introduced as a **data-efficient** method to improve **reasoning in LLMs** using minimal **training data** with 
strategies like **First-Correct Solutions (FCS)** and **Greedily Diverse Solutions (GDS)**. **DeepSeek-V3**, a **671B parameter MoE language model** trained on **14.8 trillion tokens** with **FP8 mixed precision training**, highlights advances in large-scale models and open-source LLMs. Predictions for **AI in 2025** include growth in **smaller models**, **multimodality**, and challenges in **open-source AI**. The impact of AI on software development jobs suggests a need for **higher intelligence** and **specialization** as AI automates low-skilled tasks. Enhancements to **CodeLLM** improve coding assistance with features like **in-place editing** and **streaming responses**. **Natural Language Reinforcement Learning (NLRL)** offers better interpretability and richer feedback for AI planning and critique. AI hiring is growing rapidly with startups seeking strong engineers in **ML** and **systems**. New AI-powered tools such as **Rivet**, **Buzee**, and **Konfig** improve real-time applications, search, and SDK generation using technologies like **Rust** and **V8 isolates**.</description><pubDate>Tue, 31 Dec 2024 23:55:07 
GMT</pubDate><category>deepseek</category><category>smol-ai</category><category>deepseek-v3</category><category>code-llm</category><category>o1</category><category>sonnet-3.5</category><category>corbtt</category><category>tom_doerr</category><category>cognitivecompai</category><category>alexalbert__</category><category>theturingpost</category><category>svpino</category><category>bindureddy</category><category>reinforcement-learning</category><category>reasoning</category><category>training-data</category><category>mixed-precision-training</category><category>open-source</category><category>multimodality</category><category>software-development</category><category>natural-language-processing</category><category>interpretability</category><category>developer-tools</category><category>real-time-applications</category><category>search</category><category>sdk-generation</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-12-30-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-30-ainews-not-much-happened-today/</guid><description>**Sam Altman** publicly criticizes **DeepSeek** and **Qwen** models, sparking debate about **OpenAI**&apos;s innovation claims and reliance on foundational research like the **Transformer architecture**. **DeepSeek V3** shows significant overfitting issues in the **Misguided Attention** evaluation, solving only **22%** of test prompts, raising concerns about its reasoning and finetuning. Despite skepticism about its open-source status, **DeepSeek V3** is claimed to surpass **ChatGPT4** as an open-source model, marking a milestone 1.75 years after ChatGPT4&apos;s release on **March 14, 2023**. 
The discussions highlight competitive dynamics in AI model performance and innovation sustainability.</description><pubDate>Tue, 31 Dec 2024 02:24:45 GMT</pubDate><category>openai</category><category>deepseek</category><category>google</category><category>qwen</category><category>deepseek-v3</category><category>chatgpt-4</category><category>sam-altman</category><category>overfitting</category><category>reasoning</category><category>misguided-attention</category><category>model-evaluation</category><category>model-architecture</category><category>finetuning</category><category>open-source</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-12-27-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-27-ainews-not-much-happened-today/</guid><description>**ChatGPT**, **Sora**, and the **OpenAI API** experienced a &gt;5 hour outage but are now restored. Updates to **vLLM** enable **DeepSeek-V3** to run with enhanced **parallelism** and **CPU offloading**, improving **model deployment flexibility**. Discussions on **gradient descent** in **top-k routing MoE** and adoption of **FP8 precision** focus on **training efficiency** and **memory optimization**. **AIDE**, an **AI voice medical assistant** by **Team Therasync**, leverages **Qdrant**, **OpenAI**, and **Twilio**. **DeepSeek-Engineer** offers AI-powered coding assistance with structured outputs. **LlamaIndex** integrates **LlamaCloud** and **ElevenLabs** for large-scale **document processing** and voice interaction. Insights on **version control** with **ghstack** and advocacy for **linear decay learning rate schedules** highlight best practices in AI development. Experts predict **smaller, tighter models**, **true multimodal models**, and **on-device AI** in 2025. Proposals for **planetary-scale federated learning** and community AGI moonshots emphasize future AI directions. 
Discussions on **agentic systems**, **multi-agent workflows**, and **deliberative alignment** through **chain of thought reasoning** underscore AI safety and alignment efforts.</description><pubDate>Sat, 28 Dec 2024 05:06:02 GMT</pubDate><category>openai</category><category>deepseek</category><category>qdrant</category><category>twilio</category><category>llamaindex</category><category>elevenlabs</category><category>vllm</category><category>deepseek-v3</category><category>llamaindex</category><category>francois-fleuret</category><category>daniel-hanchen</category><category>aaron-defazio</category><category>fchollet</category><category>elad-gil</category><category>wojciech-zaremba</category><category>richard-socher</category><category>training-efficiency</category><category>parallelism</category><category>cpu-offloading</category><category>gradient-descent</category><category>mixture-of-experts</category><category>fp8-precision</category><category>memory-optimization</category><category>ai-voice-assistants</category><category>coding-assistants</category><category>document-processing</category><category>version-control</category><category>learning-rate-schedules</category><category>federated-learning</category><category>agentic-systems</category><category>multi-agent-systems</category><category>deliberative-alignment</category><category>chain-of-thought</category><category>on-device-ai</category><category>multimodality</category></item><item><title>DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens</title><link>https://news.smol.ai/issues/24-12-26-ainews-deepseek-v3-671b-finegrained-moe-trained-for-dollar55m-usd-of-compute-on-15t-tokens/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-26-ainews-deepseek-v3-671b-finegrained-moe-trained-for-dollar55m-usd-of-compute-on-15t-tokens/</guid><description>**DeepSeek-V3** has launched with **671B MoE parameters** and trained on **14.8T tokens**, outperforming **GPT-4o** and 
**Claude-3.5-sonnet** in benchmarks. It was trained with only **2.788M H800 GPU hours**, significantly less than **Llama-3**&apos;s **30.8M GPU-hours**, showcasing major compute efficiency and cost reduction. The model is open-source and deployed via **Hugging Face** with API support. Innovations include native FP8 mixed precision training, Multi-Head Latent Attention scaling, distillation from synthetic reasoning data, pruning and healing for MoEs with up to **256 experts**, and a new multi-token prediction objective enabling lookahead token planning. Research highlights also cover the **OREO method** and **Natural Language Reinforcement Learning (NLRL)** for multi-step reasoning and agent control.</description><pubDate>Fri, 27 Dec 2024 01:18:46 GMT</pubDate><category>deepseek-ai</category><category>hugging-face</category><category>openai</category><category>anthropic</category><category>deepseek-v3</category><category>gpt-4o</category><category>claude-3.5-sonnet</category><category>llama-3</category><category>nrehiew_</category><category>denny_zhou</category><category>mixture-of-experts</category><category>model-training</category><category>model-optimization</category><category>reinforcement-learning</category><category>chain-of-thought</category><category>multi-token-prediction</category><category>synthetic-data</category><category>model-distillation</category><category>fine-tuning</category><category>attention-mechanisms</category><category>gpu-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-12-24-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-24-ainews-not-much-happened-today/</guid><description>The **Qwen team** launched **QVQ**, a vision-enabled version of their experimental **QwQ o1 clone**, benchmarking comparably to **Claude 3.5 Sonnet**. 
Discussions include **Bret Taylor&apos;s** insights on autonomous software development distinct from the Copilot era. The **Latent Space LIVE!** talks cover highlights of **2024 AI startups, vision, open models, post-transformers, synthetic data, smol models, and agents**. Twitter recaps by **Claude 3.5 Sonnet** highlight proposals for benchmarks measuring LLM calibration and falsehood confidence, with **QVQ** outperforming **GPT-4o** and **Claude 3.5 Sonnet**. AI alignment debates focus on intentionality and critiques of alignment faking in models like **Claude**. Updates from **OpenAI** include new **o3 and o3-mini models** and a deliberative alignment strategy. The **ASAL project** is a collaboration between **MIT**, **OpenAI**, and **Swiss AI Lab IDSIA** to automate artificial life discovery. Personal stories reveal frustrations with **USCIS** green card denials despite high qualifications. New tools like **GeminiCoder** enable rapid app creation, and a **contract review agent** using **Reflex** and **LlamaIndex** checks GDPR compliance. 
Holiday greetings and memes were also shared.</description><pubDate>Wed, 25 Dec 2024 02:01:53 GMT</pubDate><category>alibaba</category><category>openai</category><category>mit</category><category>idsia</category><category>llamaindex</category><category>ollama</category><category>qwen-o1</category><category>qvq</category><category>claude-3.5-sonnet</category><category>gpt-4o</category><category>o3</category><category>o3-mini</category><category>bret-taylor</category><category>vision</category><category>benchmarking</category><category>llm-calibration</category><category>intentionality</category><category>alignment-faking</category><category>deliberative-alignment</category><category>artificial-life</category><category>gdpr-compliance</category><category>contract-review-agent</category><category>app-creation</category><category>synthetic-data</category><category>post-transformers</category><category>smol-models</category><category>agents</category></item><item><title>not much happened this weekend</title><link>https://news.smol.ai/issues/24-12-23-ainews-not-much-happened-this-weekend/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-23-ainews-not-much-happened-this-weekend/</guid><description>**o3** model gains significant attention with discussions around its capabilities and implications, including an OpenAI board member referencing &quot;AGI.&quot; **LangChain** released their **State of AI 2024** survey. **Hume** announced **OCTAVE**, a **3B parameter** API-only speech-language model with voice cloning. **x.ai** secured a **$6B Series C** funding round. Discussions highlight **inference-time scaling**, **model ensembles**, and the surprising generalization ability of **small models**. New tools and datasets include **FineMath**, the best open math dataset on Hugging Face, and frameworks for LLM agents. 
Industry updates cover a **5-month benchmarking** of **AMD MI300X** vs **Nvidia H100 + H200**, insights from a meeting with **Lisa Su** on AMD&apos;s software stack, and open AI engineering roles. Research innovations include **Large Concept Models (LCM)** from Meta AI, **Chain of Continuous Thought (Coconut)** for latent space reasoning, and mechanistic interpretability initiatives.</description><pubDate>Tue, 24 Dec 2024 01:01:31 GMT</pubDate><category>openai</category><category>langchain</category><category>hume</category><category>x-ai</category><category>amd</category><category>nvidia</category><category>meta-ai-fair</category><category>hugging-face</category><category>o3</category><category>o1</category><category>opus</category><category>sonnet</category><category>octave</category><category>lisa-su</category><category>clementdelangue</category><category>philschmid</category><category>neelnanda5</category><category>inference-time-scaling</category><category>model-ensembles</category><category>small-models</category><category>voice-cloning</category><category>fine-math-dataset</category><category>llm-agent-framework</category><category>benchmarking</category><category>software-stack</category><category>large-concept-models</category><category>latent-space-reasoning</category><category>mechanistic-interpretability</category><category>planning</category><category>speech-language-models</category></item><item><title>o3 solves AIME, GPQA, Codeforces, makes 11 years of progress in ARC-AGI and 25% in FrontierMath</title><link>https://news.smol.ai/issues/24-12-20-ainews-o3-solves-aime-gpqa-codeforces-makes-11-years-of-progress-in-arc-agi-and-25percent-in-frontiermath/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-20-ainews-o3-solves-aime-gpqa-codeforces-makes-11-years-of-progress-in-arc-agi-and-25percent-in-frontiermath/</guid><description>**OpenAI** announced the **o3** and **o3-mini** models with groundbreaking benchmark results, including a jump 
from **2% to 25%** on the **FrontierMath** benchmark and **87.5%** on the **ARC-AGI** reasoning benchmark, representing about **11 years of progress** on the GPT3 to GPT4o scaling curve. The **o1-mini** model shows superior inference efficiency compared to o3-full, promising significant cost reductions on coding tasks. The announcement was accompanied by community discussions, safety testing applications, and detailed analyses. *Sama* highlighted the unusual cost-performance tradeoff, and **Eric Wallace** shared insights on the o-series deliberative alignment strategy.</description><pubDate>Sat, 21 Dec 2024 01:44:22 GMT</pubDate><category>openai</category><category>o3</category><category>o3-mini</category><category>o1-mini</category><category>gpt-3</category><category>gpt-4o</category><category>o1</category><category>sama</category><category>eric-wallace</category><category>benchmarking</category><category>math</category><category>reasoning</category><category>model-performance</category><category>inference-speed</category><category>cost-efficiency</category><category>alignment</category><category>safety-testing</category></item><item><title>ModernBert: small new Retriever/Classifier workhorse, 8k context, 2T tokens, </title><link>https://news.smol.ai/issues/24-12-19-ainews-modernbert-small-new-retrieverclassifier-workhorse-8k-context-2t-tokens/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-19-ainews-modernbert-small-new-retrieverclassifier-workhorse-8k-context-2t-tokens/</guid><description>**Answer.ai/LightOn** released **ModernBERT**, an updated encoder-only model with **8k token context**, trained on **2 trillion tokens** including code, with **139M/395M parameters** and state-of-the-art performance on retrieval, NLU, and code tasks. It features **Alternating Attention** layers mixing global and local attention. **Gemini 2.0 Flash Thinking** debuted as #1 in Chatbot Arena, and the **O1 model** scored top in reasoning benchmarks. 
**Llama** downloads surpassed **650 million**, doubling in 3 months. **OpenAI** launched desktop app integrations with voice capabilities. **Figure** delivered its first humanoid robots commercially. Advances in robotics simulation and a new physics engine **Genesis** claiming **430,000x faster than real-time** were highlighted.</description><pubDate>Fri, 20 Dec 2024 03:27:55 GMT</pubDate><category>answerdotai</category><category>lightonio</category><category>hugging-face</category><category>google-deepmind</category><category>openai</category><category>meta-ai-fair</category><category>figure</category><category>modernbert</category><category>gemini-2.0-flash-thinking</category><category>o1</category><category>llama</category><category>jeremyphoward</category><category>alec-radford</category><category>philschmid</category><category>drjimfan</category><category>bindureddy</category><category>encoder-only-models</category><category>long-context</category><category>alternating-attention</category><category>natural-language-understanding</category><category>reasoning</category><category>robotics-simulation</category><category>physics-engine</category><category>humanoid-robots</category><category>model-performance</category><category>model-releases</category></item><item><title>Genesis: Generative Physics Engine for Robotics (o1-mini version)</title><link>https://news.smol.ai/issues/24-12-18-ainews-genesis-generative-physics-engine-for-robotics-o1-mini-version/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-18-ainews-genesis-generative-physics-engine-for-robotics-o1-mini-version/</guid><description>**OpenAI** launched the **o1 model** API featuring function calling, structured outputs, vision support, and developer messages, achieving **60% fewer reasoning tokens** than its preview. The model excels in math and code with a **0.76 LiveBench Coding score**, outperforming Sonnet 3.5. 
Beta SDKs for Go and Java and WebRTC support with **60% lower prices** were also released. **Google Gemini 2.0 Pro (Gemini Exp 1206)** deployment accelerated, showing improved coding, math, and reasoning performance. Meta AI FAIR introduced research on training transformers directly on raw bytes using dynamic entropy-based patching. Commercial humanoid robots were successfully deployed by an industry player. **Hugging Face** researchers demonstrated that their **3B Llama model** can outperform the **70B Llama model** on MATH-500 accuracy using search techniques, highlighting efficiency gains with smaller models. Concerns about reproducibility and domain-specific limitations were noted.</description><pubDate>Thu, 19 Dec 2024 05:17:10 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>meta-ai-fair</category><category>hugging-face</category><category>o1</category><category>o1-preview</category><category>gpt-4o</category><category>claude-3.5-sonnet</category><category>gemini-2.0-pro</category><category>llama-3-3b</category><category>llama-3-70b</category><category>aidan_mclau</category><category>sundarpichai</category><category>adcock_brett</category><category>function-calling</category><category>structured-outputs</category><category>vision</category><category>performance-benchmarks</category><category>sdk</category><category>webrtc</category><category>reasoning</category><category>math</category><category>code-generation</category><category>transformer-architecture</category><category>model-training</category><category>humanoid-robots</category><category>search</category><category>model-efficiency</category><category>dataset-sharing</category></item><item><title>Genesis: Generative Physics Engine for Robotics (o1-2024-12-17)</title><link>https://news.smol.ai/issues/24-12-18-ainews-genesis-generative-physics-engine-for-robotics-o1-2024-12-17/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-12-18-ainews-genesis-generative-physics-engine-for-robotics-o1-2024-12-17/</guid><description>**Genesis** is a newly announced **universal physics engine** developed by a large-scale collaboration led by **CMU PhD student Zhou Xian**. It integrates multiple state-of-the-art physics solvers to simulate diverse materials and physical phenomena, targeting robotics applications with features like lightweight, ultra-fast simulation, photo-realistic rendering, and generative data capabilities. The engine is open source and designed for robotics simulation beyond just video generation. Additionally, **OpenAI** released the **o1** model to API with advanced features like function calling and vision support, showing strong math and coding performance. **Google** teased updates on **Gemini 2.0 Pro**, accelerating deployment for advanced users.</description><pubDate>Thu, 19 Dec 2024 04:48:33 GMT</pubDate><category>openai</category><category>google</category><category>carnegie-mellon-university</category><category>o1</category><category>gemini-2.0-pro</category><category>zhou-xian</category><category>aidan_mclau</category><category>sundar-pichai</category><category>universal-physics-engine</category><category>robotics-simulation</category><category>physics-simulation</category><category>photo-realistic-rendering</category><category>generative-data</category><category>simulation-platform</category><category>open-source</category><category>function-calling</category><category>vision</category><category>performance-benchmarks</category><category>sdk</category><category>realtime-api</category></item><item><title>OpenAI Voice Mode Can See Now - After Gemini Does</title><link>https://news.smol.ai/issues/24-12-18-ainews-openai-voice-mode-can-see-now-after-gemini-does/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-18-ainews-openai-voice-mode-can-see-now-after-gemini-does/</guid><description>**OpenAI** launched 
**Realtime Video** shortly after **Gemini**, but the launch made less of an impact because Gemini had arrived earlier with lower cost and fewer rate limits. **Google DeepMind** released **Gemini 2.0 Flash** featuring enhanced multimodal capabilities and real-time streaming. **Anthropic** introduced **Clio**, a system analyzing real-world usage of **Claude** models. Together Computing acquired CodeSandbox to launch a code interpreter tool. Discussions highlighted **Meta&apos;s Llama 3.3-70B** for its advanced roleplay and prompt handling abilities, which users found more expressive and less censored than models like **Mistral Large** and **GPT-4o**. The AI community also engaged in humorous takes on AI outages and model competition, with **ChatGPT** adding a Santa mode for holiday interactions. *&quot;Anthropic is capturing the developer ecosystem, Gemini has AI enthusiast mindshare, ChatGPT reigns over AI dabblers&quot;* was a noted observation from the community.</description><pubDate>Wed, 18 Dec 2024 09:46:07 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>anthropic</category><category>togethercompute</category><category>scale-ai</category><category>meta-ai-fair</category><category>mistral-ai</category><category>gemini-2.0-flash</category><category>claude</category><category>claude-3.5-sonnet</category><category>llama-3-70b</category><category>llama-3</category><category>mistral-large</category><category>gpt-4o</category><category>bindureddy</category><category>multimodality</category><category>real-time-streaming</category><category>roleplay</category><category>prompt-handling</category><category>model-comparison</category><category>model-training</category><category>creative-writing</category><category>model-censorship</category><category>code-execution</category><category>developer-ecosystem</category><category>ai-humor</category></item><item><title>o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO 
Finetuning</title><link>https://news.smol.ai/issues/24-12-17-ainews-o1-api-4o4o-mini-in-realtime-api-webrtc-dpo-finetuning/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-17-ainews-o1-api-4o4o-mini-in-realtime-api-webrtc-dpo-finetuning/</guid><description>**OpenAI** launched the **o1 API** with enhanced features including vision inputs, function calling, structured outputs, and a new `reasoning_effort` parameter, achieving **60% fewer reasoning tokens** on average. The **o1 pro** variant is confirmed as a distinct implementation coming soon. Improvements to the **Realtime API** with **WebRTC** integration offer easier usage, longer sessions (up to **30 minutes**), and significantly reduced pricing (up to **10x cheaper** with mini models). **DPO Preference Tuning** for fine-tuning is introduced, currently available for the **4o** model. Additional updates include official Go and Java SDKs and OpenAI DevDay videos. The news also highlights discussions on **Google Gemini 2.0 Flash** model&apos;s performance reaching **83.6% accuracy**.</description><pubDate>Wed, 18 Dec 2024 01:43:51 GMT</pubDate><category>openai</category><category>google</category><category>google-deepmind</category><category>o1-2024-12-17</category><category>o1</category><category>o1-pro</category><category>4o</category><category>4o-mini</category><category>gemini-2-0-flash</category><category>claude-3.5-sonnet</category><category>claude-3.5</category><category>aidan_mclau</category><category>kevinweil</category><category>simonw</category><category>michpokrass</category><category>morgymcg</category><category>juberti</category><category>function-calling</category><category>structured-outputs</category><category>vision</category><category>reasoning</category><category>webrtc</category><category>realtime-api</category><category>preference-tuning</category><category>fine-tuning</category><category>api</category><category>model-performance</category></item><item><title>Meta Apollo - 
Video Understanding up to 1 hour, SOTA Open Weights</title><link>https://news.smol.ai/issues/24-12-16-ainews-meta-apollo-video-understanding-up-to-1-hour-sota-open-weights/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-16-ainews-meta-apollo-video-understanding-up-to-1-hour-sota-open-weights/</guid><description>**Meta** released **Apollo**, a new family of state-of-the-art video-language models available in **1B, 3B, and 7B** sizes, featuring &quot;Scaling Consistency&quot; for efficient scaling and introducing **ApolloBench**, which speeds up video understanding evaluation by **41×** across five temporal perception categories. **Google DeepMind** launched **Veo 2**, a 4K video generation model with improved physics and camera control, alongside an enhanced **Imagen 3** image model. **OpenAI** globally rolled out ChatGPT search with advanced voice and map features and discussed a potential $2,000/month &quot;ChatGPT Max&quot; tier. Research highlights include achieving **Llama 70B** performance using **Llama 3B** via test-time compute scaling and expanding **Command R7B** language support from 10 to 23 languages. Industry updates feature **Figure AI** delivering humanoid robots commercially and **Klarna** reducing its workforce through AI. Notion integrated **Cohere Rerank** for better search. Studies reveal LLMs can recognize their own writing style and show self-preference bias. 
Discussions note video processing progress outpacing text due to better signal-per-compute and data evaluation.</description><pubDate>Tue, 17 Dec 2024 01:17:52 GMT</pubDate><category>meta-ai-fair</category><category>hugging-face</category><category>google-deepmind</category><category>openai</category><category>figure-ai</category><category>klarna</category><category>cohere</category><category>notion</category><category>apollo-1b</category><category>apollo-3b</category><category>apollo-7b</category><category>veo-2</category><category>imagen-3</category><category>llama-3-70b</category><category>llama-3b</category><category>command-r7b</category><category>llama-1b</category><category>llama-8b</category><category>chatgpt</category><category>akhaliq</category><category>_lewtun</category><category>clementdelangue</category><category>adcock_brett</category><category>rohanpaul_ai</category><category>swyx</category><category>shaneguML</category><category>video-understanding</category><category>scaling-consistency</category><category>benchmarking</category><category>temporal-ocr</category><category>egocentric-perception</category><category>spatial-perception</category><category>reasoning</category><category>video-generation</category><category>physics-simulation</category><category>voice-features</category><category>map-integration</category><category>language-expansion</category><category>test-time-compute-scaling</category><category>humanoid-robots</category><category>ai-integration</category><category>search-optimization</category><category>self-recognition</category><category>self-preference-bias</category></item><item><title>Meta BLT: Tokenizer-free, Byte-level LLM</title><link>https://news.smol.ai/issues/24-12-13-ainews-meta-blt-tokenizer-free-byte-level-llm/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-13-ainews-meta-blt-tokenizer-free-byte-level-llm/</guid><description>**Meta AI** introduces the **Byte Latent Transformer (BLT)**, a tokenizer-free 
architecture that dynamically forms byte patches for efficient compute allocation, outperforming **Llama 3** on benchmarks including the CUTE benchmark. The model was trained on approximately **1 trillion tokens** and features a three-block transformer design with local and global components. This approach challenges traditional tokenization and may enable new multimodal capabilities such as direct file interaction without retrieval-augmented generation. Additionally, **Microsoft** announced the **Phi-4 14B** parameter model achieving state-of-the-art results on STEM and reasoning benchmarks, surpassing **GPT-4o**. **DeepSeek AI** launched new vision-language models based on their MoE architecture with sizes ranging from **1.0B to 27B** parameters. **OpenAI** released a new Projects feature for ChatGPT, and **Cohere** introduced their smallest and fastest **Command R7B** model. **Anthropic** published research on &quot;Best-of-N Jailbreaking&quot; vulnerabilities across text, vision, and audio models. 
Industry discussion highlights a trend of decreasing frontier LLM sizes, with **GPT-4** at approximately **1.8 trillion parameters** compared to newer models.</description><pubDate>Sat, 14 Dec 2024 05:38:19 GMT</pubDate><category>meta-ai-fair</category><category>llamaindex</category><category>microsoft</category><category>deepseek-ai</category><category>openai</category><category>cohere</category><category>anthropic</category><category>byte-latent-transformer</category><category>llama-3</category><category>phi-4</category><category>gpt-4o</category><category>command-r7b</category><category>tokenization</category><category>transformer-architecture</category><category>model-efficiency</category><category>benchmarking</category><category>multimodality</category><category>vision</category><category>reinforcement-learning</category><category>model-scaling</category><category>jailbreaking</category><category>model-optimization</category></item><item><title>Google wakes up: Gemini 2.0 et al</title><link>https://news.smol.ai/issues/24-12-11-ainews-google-wakes-up-gemini-20-et-al/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-11-ainews-google-wakes-up-gemini-20-et-al/</guid><description>**Google DeepMind** launched **Gemini 2.0 Flash**, a new multimodal model outperforming Gemini 1.5 Pro and o1-preview, featuring vision and voice APIs, multilingual capabilities, and native tool use. It powers new AI agents like **Project Astra** and **Project Mariner**, with Project Mariner achieving state-of-the-art **83.5%** on the WebVoyager benchmark. **OpenAI** announced ChatGPT integration with **Apple** devices, enabling Siri access and visual intelligence features. **Claude 3.5 Sonnet** is noted as a distilled version of Opus. The AI community&apos;s response at **NeurIPS 2024** has been overwhelmingly positive, signaling a strong comeback for Google in AI innovation. 
Key topics include **multimodality**, **agent development**, **multilinguality**, **benchmarking**, and **model releases**.</description><pubDate>Thu, 12 Dec 2024 03:16:07 GMT</pubDate><category>google-deepmind</category><category>openai</category><category>apple</category><category>gemini-2.0-flash</category><category>gemini-1.5-pro</category><category>gemini-exp-1206</category><category>claude-3.5-sonnet</category><category>opus</category><category>demis-hassabis</category><category>sundar-pichai</category><category>paige-bailey</category><category>bindureddy</category><category>multimodality</category><category>agent-development</category><category>multilinguality</category><category>benchmarking</category><category>model-releases</category></item><item><title>ChatGPT Canvas GA</title><link>https://news.smol.ai/issues/24-12-10-ainews-chatgpt-canvas-ga/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-10-ainews-chatgpt-canvas-ga/</guid><description>**OpenAI** launched **ChatGPT Canvas** to all users, featuring **code execution** and **GPT integration**, effectively replacing Code Interpreter with a Google Docs-like interface. **DeepSeek AI** announced their **V2.5-1210** update improving performance on **MATH-500 (82.8%)** and LiveCodeBench. **Meta AI FAIR** introduced **COCONUT**, a new continuous latent space reasoning paradigm. **Hugging Face** released **TGI v3**, processing **3x more tokens** and running **13x faster** than vLLM on long prompts. **Cognition Labs** released **Devin**, an AI developer building Kubernetes operators. **Hyperbolic** raised a **$12M Series A** to build an open AI platform with an **H100 GPU marketplace**. Discussions included **AI capabilities and employment impact**, and **NeurIPS 2024** announcements with **Google DeepMind** demos and a debate on AI scaling. 
On Reddit, **Llama 3.3-70B** supports **90K context length** finetuning using **Unsloth** with **gradient checkpointing** and Apple&apos;s **Cut Cross Entropy (CCE)** algorithm, fitting on **41GB VRAM**. **Llama 3.1-8B** reaches **342K context lengths** with Unsloth, surpassing native limits.</description><pubDate>Wed, 11 Dec 2024 04:20:02 GMT</pubDate><category>openai</category><category>deepseek-ai</category><category>meta-ai-fair</category><category>huggingface</category><category>cognition-labs</category><category>hyperbolic</category><category>google-deepmind</category><category>llama-3-70b</category><category>llama-3-1-8b</category><category>tgi-v3</category><category>deepseek-v2.5-1210</category><category>coconut</category><category>arav_srinivas</category><category>sama</category><category>jonathan-frankle</category><category>dylan</category><category>code-execution</category><category>gpt-integration</category><category>model-finetuning</category><category>gradient-checkpointing</category><category>context-length</category><category>latent-space-reasoning</category><category>performance-optimization</category><category>gpu-memory-optimization</category><category>kubernetes</category><category>gpu-marketplace</category><category>ai-capabilities</category><category>employment-impact</category><category>neurips-2024</category><category>ai-scaling</category><category>humor</category></item><item><title>OpenAI Sora Turbo and Sora.com</title><link>https://news.smol.ai/issues/24-12-09-ainews-openai-sora-turbo-and-soracom/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-09-ainews-openai-sora-turbo-and-soracom/</guid><description>**OpenAI** launched **Sora Turbo**, enabling text-to-video generation for ChatGPT Plus and Pro users with monthly generation limits and regional restrictions in Europe and the UK. 
**Google** announced a quantum computing breakthrough with the development of the **Willow chip**, potentially enabling commercial quantum applications. Discussions on **O1** model performance highlighted its lag behind **Claude 3.5 Sonnet** and **Gemini** in coding tasks, with calls for algorithmic innovation beyond transformer scaling. The **Llama 3.3 Euryale v2.3** model was praised for storytelling and roleplay capabilities, with users suggesting parameter tuning to reduce creative liberties and repetition. Alternatives like **Mistral-Large**, **Behemoth**, and **Endurance v1.1** were also noted. Additionally, **Nvidia** faces an anti-monopoly investigation in China. Memes and humor around GPU issues and embargo mishaps were popular on social media.</description><pubDate>Tue, 10 Dec 2024 02:21:42 GMT</pubDate><category>openai</category><category>google</category><category>nvidia</category><category>hugging-face</category><category>mistral-ai</category><category>sora-turbo</category><category>o1</category><category>claude-3.5-sonnet</category><category>claude-3.5</category><category>gemini</category><category>llama-3-3-euryale-v2.3</category><category>mistral-large</category><category>behemoth</category><category>endurance-v1.1</category><category>sama</category><category>sundarpichai</category><category>bindureddy</category><category>denny_zhou</category><category>nrehiew_</category><category>text-to-video-generation</category><category>quantum-computing</category><category>coding-capabilities</category><category>transformers</category><category>algorithmic-innovation</category><category>storytelling</category><category>roleplay</category><category>model-parameter-tuning</category><category>anti-monopoly-investigation</category></item><item><title>Meta Llama 3.3: 405B/Nova Pro performance at 70B price</title><link>https://news.smol.ai/issues/24-12-06-ainews-meta-llama-33-405bnova-pro-performance-at-70b-price/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-12-06-ainews-meta-llama-33-405bnova-pro-performance-at-70b-price/</guid><description>**Meta AI** released **Llama 3.3 70B**, matching the performance of the 405B model with improved efficiency using *&quot;a new alignment process and progress in online RL techniques&quot;*. **OpenAI** announced **Reinforcement Fine-Tuning (RFT)** for building expert models with limited data, offering alpha access to researchers and enterprises. **Google DeepMind&apos;s Gemini-Exp-1206** leads benchmarks, tying with **GPT-4o** in coding performance. **LlamaCloud** enhanced document processing with table extraction and analytics. Discussions on **OpenAI&apos;s** pricing plans continue in the community.</description><pubDate>Fri, 06 Dec 2024 22:44:07 GMT</pubDate><category>meta-ai-fair</category><category>openai</category><category>google-deepmind</category><category>hugging-face</category><category>llamacloud</category><category>llama-3-70b</category><category>llama-3.3-70b</category><category>gpt-4o</category><category>gemini-exp-1206</category><category>sama</category><category>steven-heidel</category><category>aidan_mclau</category><category>lmarena_ai</category><category>oriolvinyalsml</category><category>jerryjliu0</category><category>reinforcement-learning</category><category>fine-tuning</category><category>model-performance</category><category>document-processing</category><category>pricing-models</category><category>alignment</category><category>online-rl</category></item><item><title>$200 ChatGPT Pro and o1-full/pro, with vision, without API, and mixed reviews</title><link>https://news.smol.ai/issues/24-12-05-ainews-dollar200-chatgpt-pro-and-o1-fullpro-with-vision-without-api-and-mixed-reviews/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-05-ainews-dollar200-chatgpt-pro-and-o1-fullpro-with-vision-without-api-and-mixed-reviews/</guid><description>**OpenAI** launched the **o1** model with multimodal 
capabilities, faster reasoning, and image input support, marking it as a state-of-the-art model despite some bugs and mixed community reviews. The new **o1-pro** tier offers unlimited access for $200/month with notable benchmark improvements but some performance trade-offs compared to **claude-3.5-sonnet**. **Google** released the **PaliGemma 2** vision-language model family in sizes **3B, 10B, and 28B**, excelling in visual question answering, image segmentation, and OCR, with day-0 support for fine-tuning. **LlamaIndex** announced discounts and feature updates for large-scale document processing. The AI community also reacted humorously to the new pricing tiers and model comparisons. *&quot;o1 can see now, which makes it the SOTA multimodal model&quot;* and *&quot;most users will be best served by free/Plus tiers&quot;* were notable sentiments.</description><pubDate>Fri, 06 Dec 2024 02:34:03 GMT</pubDate><category>openai</category><category>google</category><category>llamaindex</category><category>o1</category><category>o1-pro</category><category>claude-3.5-sonnet</category><category>pali-gemma-2</category><category>sama</category><category>bindureddy</category><category>mervenoyann</category><category>fchollet</category><category>multimodality</category><category>vision</category><category>fine-tuning</category><category>benchmarking</category><category>model-performance</category><category>image-generation</category><category>document-processing</category><category>model-release</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-12-04-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-04-ainews-not-much-happened-today/</guid><description>**OpenAI** announced their &quot;12 Days of OpenAI&quot; event with daily livestreams and potential releases including the **O1 full model**, **Sora video model**, and **GPT-4.5**. 
**Google DeepMind** released the **GenCast weather model** capable of **15-day forecasts in 8 minutes** using TPU chips, and launched **Genie 2**, a model generating playable 3D worlds from single images. Leading vision researchers **Lucas Beyer**, **Alexander Kolesnikov**, and **Xiaohua Zhai** moved from DeepMind to OpenAI, which is opening a Zürich office. Criticism arose over OpenAI&apos;s strategy and model quality compared to **Anthropic** and **Claude 3.5 Sonnet**. On Reddit, a modified **llama.cpp** supports **Nvidia&apos;s Llama-3_1-Nemotron-51B**, matching performance of larger 70B models via NAS optimization.</description><pubDate>Thu, 05 Dec 2024 02:41:39 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>anthropic</category><category>nvidia</category><category>huggingface</category><category>o1-full</category><category>sora</category><category>gpt-4.5</category><category>gpt-4</category><category>claude-3.5-sonnet</category><category>llama-3-1-nemotron-51b</category><category>llama-3-1</category><category>llama-3</category><category>nemotron-51b</category><category>lucas-beyer</category><category>alexander-kolesnikov</category><category>xiaohua-zhai</category><category>aidan_mclau</category><category>giffmana</category><category>joannejang</category><category>sama</category><category>vision</category><category>model-performance</category><category>neural-architecture-search</category><category>model-optimization</category><category>multimodality</category><category>model-release</category><category>model-training</category><category>reinforcement-learning</category><category>image-generation</category></item><item><title>Olympus has dropped (aka, Amazon Nova Micro|Lite|Pro|Premier|Canvas|Reel)</title><link>https://news.smol.ai/issues/24-12-03-ainews-olympus-has-dropped-aka-amazon-nova-microorliteorproorpremierorcanvasorreel/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-12-03-ainews-olympus-has-dropped-aka-amazon-nova-microorliteorproorpremierorcanvasorreel/</guid><description>**Amazon** announced the **Amazon Nova** family of multimodal foundation models at AWS re:Invent, available immediately with no waitlist in configurations like Micro, Lite, Pro, Canvas, and Reel, with Premier and speech-to-speech coming next year. These models offer **2-4x faster token speeds** and are **25%-400% cheaper** than competitors like **Anthropic Claude** models, positioning Nova as a serious contender in AI engineering. Pricing undercuts models such as **Google DeepMind Gemini Flash 8B**, and some Nova models extend context length up to **300k tokens**. However, the benchmarks are contested: some evaluations show Nova scoring below **Llama-3 70B** in **LiveBench AI** metrics. Separately, **CycleQD** was introduced by **Sakana AI Labs**, using evolutionary computation for population-based model merging to develop niche LLM agents.</description><pubDate>Wed, 04 Dec 2024 03:06:39 GMT</pubDate><category>amazon</category><category>anthropic</category><category>google-deepmind</category><category>sakana-ai-labs</category><category>amazon-nova</category><category>claude-3</category><category>llama-3-70b</category><category>gemini-1.5-flash</category><category>gpt-4o</category><category>philschmid</category><category>bindureddy</category><category>multimodality</category><category>benchmarking</category><category>model-merging</category><category>model-performance</category><category>model-architecture</category><category>model-optimization</category><category>population-based-learning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-12-02-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-12-02-ainews-not-much-happened-today/</guid><description>**AI News for 11/29/2024-12/2/2024** highlights several 
developments: **Nvidia** introduced **Puzzle**, a distillation-based neural architecture search for inference-optimized large language models, enhancing efficiency. The **IC-Light V2** model was released for varied illumination scenarios, and new video model techniques like **Trajectory Attention** and **Timestep Embedding** were presented. **Amazon** increased its investment in **Anthropic** to **$8 billion**, supporting AI safety research through a new fellowship program. **Google** is expanding AI integration with the **Gemini API** and open collaboration tools. Discussions on domain name relevance emphasize alternatives to **.com** domains like **.io**, **.ai**, and **.co**. Advances in reasoning include a **13.53% improvement** in LLM performance using &quot;Reverse Thinking&quot;. **Pydantic** launched a new agent framework, and **Supabase** released version 2 of their assistant. Other notable mentions include **Browser Company** teasing a second browser and **World Labs** launching image-to-3D-world technology. The NotebookLM team departed from **Google**, and **Cognition** was featured on the cover of **Forbes**. 
The news was summarized by **Claude 3.5 Sonnet**.</description><pubDate>Mon, 02 Dec 2024 23:49:20 GMT</pubDate><category>nvidia</category><category>amazon</category><category>anthropic</category><category>google</category><category>pydantic</category><category>supabase</category><category>browser-company</category><category>world-labs</category><category>cognition</category><category>ic-light-v2</category><category>claude-3-5-sonnet</category><category>puzzle</category><category>akhaliq</category><category>adcock_brett</category><category>omarsar0</category><category>iscienceluvr</category><category>distillation</category><category>neural-architecture-search</category><category>inference-optimization</category><category>video</category><category>trajectory-attention</category><category>timestep-embedding</category><category>ai-safety-research</category><category>fellowship-programs</category><category>api</category><category>domain-names</category><category>reverse-thinking</category><category>reasoning</category><category>agent-frameworks</category><category>image-to-3d</category><category>ai-integration</category></item><item><title>not much happened to end the week</title><link>https://news.smol.ai/issues/24-11-29-ainews-not-much-happened-to-end-the-week/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-29-ainews-not-much-happened-to-end-the-week/</guid><description>**AI News for 11/29/2024-11/30/2024** covers key updates including the **Gemini multimodal model** advancing in musical structure understanding, a new **quantized SWE-Bench** for benchmarking at **1.3 bits per task**, and the launch of the **DeepSeek-R1 model** focusing on transparent reasoning as an alternative to **o1**. The establishment of the **1st International Network of AI Safety Institutes** highlights global collaboration on AI safety. 
Industry updates feature **Amazon&apos;s Olympus AI model**, **Tesla&apos;s Optimus**, and experiments with **ChatGPT** as a universal translator. Community reflections emphasize the impact of large language models on daily life and medical AI applications. Discussions include scaling sparse autoencoders to **gpt-4** and the need for transparency in reasoning LLMs. The report also notes humor around **ChatGPT**&apos;s French nickname.</description><pubDate>Fri, 29 Nov 2024 23:07:35 GMT</pubDate><category>google-deepmind</category><category>deeplearningai</category><category>amazon</category><category>tesla</category><category>x-ai</category><category>alibaba</category><category>ollama</category><category>gemini</category><category>deepseek-r1</category><category>o1</category><category>chatgpt</category><category>gpt-4</category><category>claude-3.5-sonnet</category><category>o1-preview</category><category>o1-mini</category><category>gpt4o</category><category>qwq-32b</category><category>yoshua-bengio</category><category>kevinweil</category><category>ylecun</category><category>multimodality</category><category>benchmarking</category><category>quantization</category><category>reinforcement-learning</category><category>ai-safety</category><category>translation</category><category>reasoning</category><category>interpretability</category><category>model-comparison</category><category>humor</category></item><item><title>Qwen with Questions: 32B open weights reasoning model nears o1 in GPQA/AIME/Math500</title><link>https://news.smol.ai/issues/24-11-27-ainews-qwen-with-questions-32b-open-weights-reasoning-model-nears-o1-in-gpqaaimemath500/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-27-ainews-qwen-with-questions-32b-open-weights-reasoning-model-nears-o1-in-gpqaaimemath500/</guid><description>**DeepSeek r1** leads the race for &quot;open o1&quot; models but has yet to release weights, while **Justin Lin** released **QwQ**, a **32B open weight model** 
that outperforms **GPT-4o** and **Claude 3.5 Sonnet** on benchmarks. QwQ appears to be a fine-tuned version of **Qwen 2.5**, emphasizing sequential search and reflection for complex problem-solving. **SambaNova** promotes its RDUs as superior to GPUs for inference tasks, highlighting the shift from training to inference in AI systems. On Twitter, **Hugging Face** announced CPU deployment for llama.cpp instances, **Marker v1** was released as a faster and more accurate deployment tool, and **Agentic RAG** developments focus on integrating external tools and advanced LLM chains for improved response accuracy. The open-source AI community sees growing momentum with models like **Flux** gaining popularity, reflecting a shift towards multi-modal AI models including image, video, audio, and biology.</description><pubDate>Thu, 28 Nov 2024 01:23:25 GMT</pubDate><category>deepseek</category><category>sambanova</category><category>hugging-face</category><category>dair-ai</category><category>deepseek-r1</category><category>qwq</category><category>gpt-4o</category><category>claude-3.5-sonnet</category><category>qwen-2.5</category><category>llama-cpp</category><category>justin-lin</category><category>clementdelangue</category><category>ggerganov</category><category>vikparuchuri</category><category>model-releases</category><category>benchmarking</category><category>fine-tuning</category><category>sequential-search</category><category>inference</category><category>model-deployment</category><category>agentic-rag</category><category>external-tools</category><category>multi-modal-models</category></item><item><title>OLMo 2 - new SOTA Fully Open LLM</title><link>https://news.smol.ai/issues/24-11-26-ainews-olmo-2-new-sota-fully-open-llm/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-26-ainews-olmo-2-new-sota-fully-open-llm/</guid><description>**AI2** has updated **OLMo-2** to roughly **Llama 3.1 8B** equivalent, training with **5T tokens** and using learning rate 
annealing and new high-quality data (Dolmino). They credit **Tülu 3** and its &quot;Reinforcement Learning with Verifiable Rewards&quot; approach. On Reddit, **Qwen2.5-72B instruct** model shows near lossless performance with **AutoRound 4-bit quantization**, available on **HuggingFace** in 4-bit and 2-bit versions, with discussions on **MMLU** benchmark and quantization-aware training. **HuggingFace** released **SmolVLM**, a **2B parameter** vision-language model running efficiently on consumer GPUs, supporting fine-tuning on Google Colab and demonstrating strong OCR capabilities with adjustable resolution and quantization options.</description><pubDate>Wed, 27 Nov 2024 05:17:18 GMT</pubDate><category>ai2</category><category>huggingface</category><category>intel</category><category>llama-3-1-8b</category><category>olmo-2</category><category>qwen2-5-72b-instruct</category><category>smolvlm</category><category>tulu-3</category><category>reinforcement-learning</category><category>quantization</category><category>learning-rate-annealing</category><category>ocr</category><category>fine-tuning</category><category>model-training</category><category>vision</category></item><item><title>Anthropic launches the Model Context Protocol</title><link>https://news.smol.ai/issues/24-11-25-ainews-anthropic-launches-the-model-context-protocol/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-25-ainews-anthropic-launches-the-model-context-protocol/</guid><description>**Anthropic** has launched the **Model Context Protocol (MCP)**, an open protocol designed to enable seamless integration between large language model applications and external data sources and tools. MCP supports diverse resources such as file contents, database records, API responses, live system data, screenshots, and logs, identified by unique URIs. It also includes reusable prompt templates, system and API tools, and JSON-RPC 2.0 transports with streaming support. 
MCP allows servers to request LLM completions through clients with priorities on cost, speed, and intelligence, hinting at an upcoming model router by Anthropic. Launch partners like **Zed**, **Sourcegraph**, and **Replit** have reviewed MCP favorably, while some developers express skepticism about its provider exclusivity and adoption potential. The protocol emphasizes security, testing, and dynamic tool discovery, with guides and videos available from community members such as **Alex Albert** and **Matt Pocock**. This development follows Anthropic&apos;s recent **$4 billion fundraise from Amazon** and aims to advance terminal-level integration for **Claude Desktop**.</description><pubDate>Tue, 26 Nov 2024 01:56:47 GMT</pubDate><category>anthropic</category><category>amazon</category><category>zed</category><category>sourcegraph</category><category>replit</category><category>claude-3.5-sonnet</category><category>claude-desktop</category><category>alex-albert</category><category>matt-pocock</category><category>hwchase17</category><category>model-context-protocol</category><category>integration</category><category>json-rpc</category><category>agentic-behaviors</category><category>security</category><category>tool-discovery</category><category>open-protocol</category><category>api-integration</category><category>system-integration</category><category>prompt-templates</category><category>model-routing</category></item><item><title>Vision Everywhere: Apple AIMv2 and Jina CLIP v2</title><link>https://news.smol.ai/issues/24-11-22-ainews-vision-everywhere-apple-aimv2-and-jina-clip-v2/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-22-ainews-vision-everywhere-apple-aimv2-and-jina-clip-v2/</guid><description>**Apple** released **AIMv2**, a novel vision encoder pre-trained with autoregressive objectives that achieves **89.5% accuracy on ImageNet** and integrates joint visual and textual objectives. 
**Jina** launched **Jina CLIP v2**, a multimodal embedding model supporting **89 languages** and high-resolution images with efficient Matryoshka embeddings reducing dimensions by **94%** with minimal accuracy loss. **Allen AI** introduced **Tülu 3** models based on **Llama 3.1** with **8B and 70B** parameters, offering **2.5x faster inference** and alignment via SFT, DPO, and RLVR methods, competing with **Claude 3.5** and **Llama 3.1 70B**. These developments highlight advances in autoregressive training, vision encoders, and multilingual multimodal embeddings.</description><pubDate>Fri, 22 Nov 2024 23:31:04 GMT</pubDate><category>apple</category><category>jina</category><category>allen_ai</category><category>aimv2-3b</category><category>jina-clip-v2</category><category>tulu-3</category><category>llama-3-1</category><category>claude-3-5</category><category>llama-3-1-70b</category><category>autoregressive-objectives</category><category>vision</category><category>multilinguality</category><category>multimodality</category><category>image-generation</category><category>model-training</category><category>model-optimization</category><category>reinforcement-learning</category><category>fine-tuning</category><category>model-benchmarking</category></item><item><title>LMSys killed Model Versioning (gpt 4o 1120, gemini exp 1121)</title><link>https://news.smol.ai/issues/24-11-21-ainews-lmsys-killed-model-versioning-gpt-4o-1120-gemini-exp-1121/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-21-ainews-lmsys-killed-model-versioning-gpt-4o-1120-gemini-exp-1121/</guid><description>**AI News for 11/21/2024-11/22/2024** highlights the intense frontier lab race with **OpenAI&apos;s gpt-4o-2024-11-20** and **Google DeepMind&apos;s gemini-exp-1121** trading top spots on the Lmsys leaderboard. The trend of using date-based model identifiers instead of traditional versioning is noted across leading labs including **Anthropic**. 
**DeepSeek R1** is gaining attention as a potent open-source alternative, especially in the context of the AI competition between China and the US. **Gemini-Exp-1121** is praised for improvements in vision, coding, and reasoning, while **MistralAI** expands with a new Palo Alto office, signaling growth and hiring.</description><pubDate>Fri, 22 Nov 2024 00:56:03 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>anthropic</category><category>deepseek</category><category>mistral-ai</category><category>gpt-4o-2024-11-20</category><category>gemini-exp-1121</category><category>deepseek-r1</category><category>model-release</category><category>model-ranking</category><category>open-source</category><category>vision</category><category>coding</category><category>reasoning</category><category>market-competition</category></item><item><title>DeepSeek-R1 claims to beat o1-preview AND will be open sourced</title><link>https://news.smol.ai/issues/24-11-20-ainews-deepseek-r1-claims-to-beat-o1-preview-and-will-be-open-sourced/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-20-ainews-deepseek-r1-claims-to-beat-o1-preview-and-will-be-open-sourced/</guid><description>**DeepSeek** has released **DeepSeek-R1-Lite-Preview**, an open-source reasoning model achieving **o1-preview-level performance** on math benchmarks with transparent thought processes, showing promise in real-time problem-solving. **NVIDIA** reported a record **$35.1 billion** revenue in Q3 with **112% year-on-year data center growth**, driven by **Hopper** and **Blackwell architectures**, the latter offering **2.2x performance improvement**. **Google DeepMind** introduced **AlphaQubit**, a quantum computing system improving error correction and outperforming leading decoders, though challenges remain in scaling and speed. 
The AI community continues to focus on **reasoning models**, **benchmarking**, and **quantum error correction** advancements.</description><pubDate>Thu, 21 Nov 2024 02:41:02 GMT</pubDate><category>deepseek</category><category>nvidia</category><category>google-deepmind</category><category>deepseek-r1-lite-preview</category><category>o1-preview</category><category>hopper</category><category>blackwell</category><category>alphaqubit</category><category>yann-lecun</category><category>reasoning</category><category>benchmarking</category><category>quantum-error-correction</category><category>quantum-computing</category><category>model-performance</category><category>model-release</category></item><item><title>Perplexity starts Shopping for you</title><link>https://news.smol.ai/issues/24-11-19-ainews-perplexity-starts-shopping-for-you/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-19-ainews-perplexity-starts-shopping-for-you/</guid><description>**Stripe** launched their Agent SDK, enabling AI-native shopping experiences like **Perplexity Shopping** for US Pro members, featuring one-click checkout and free shipping via the **Perplexity Merchant Program**. **Mistral AI** released the **Pixtral Large 124B** multi-modal image model, now on **Hugging Face** and supported by **Le Chat** for image generation. **Cerebras Systems** offers a public inference endpoint for **Llama 3.1 405B** with a 128k context window and high throughput. **Claude 3.6** shows improvements over **Claude 3.5** but with subtle hallucinations. The **Bi-Mamba** 1-bit architecture improves LLM efficiency. 
The **wandb SDK** is preinstalled on Google Colab, and **Pixtral Large** is integrated into **AnyChat** and supported by **vLLM** for efficient model usage.</description><pubDate>Wed, 20 Nov 2024 00:43:00 GMT</pubDate><category>stripe</category><category>perplexity-ai</category><category>mistral-ai</category><category>hugging-face</category><category>cerebras</category><category>anthropic</category><category>weights-biases</category><category>google</category><category>vllm-project</category><category>pixtral-large-124b</category><category>llama-3.1-405b</category><category>claude-3.6</category><category>claude-3.5</category><category>patrick-collison</category><category>jeff-weinstein</category><category>mervenoyann</category><category>sophiamyang</category><category>tim-dettmers</category><category>omarsar0</category><category>akhaliq</category><category>aravsrinivas</category><category>multi-modal</category><category>image-generation</category><category>inference</category><category>context-windows</category><category>model-performance</category><category>model-efficiency</category><category>sdk</category><category>ai-integration</category><category>one-click-checkout</category><category>memory-optimization</category></item><item><title>Pixtral Large (124B) beats Llama 3.2 90B with updated Mistral Large 24.11</title><link>https://news.smol.ai/issues/24-11-18-ainews-pixtral-large-124b-beats-llama-32-90b-with-updated-mistral-large-2411/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-18-ainews-pixtral-large-124b-beats-llama-32-90b-with-updated-mistral-large-2411/</guid><description>**Mistral** has updated its **Pixtral Large** vision encoder to 1B parameters and released an update to the **123B parameter Mistral Large 24.11** model, though the update lacks major new features. **Pixtral Large** outperforms **Llama 3.2 90B** on multimodal benchmarks despite having a smaller vision adapter. 
**Mistral&apos;s Le Chat** chatbot received comprehensive feature updates, reflecting a company focus on product and research balance as noted by **Arthur Mensch**. **SambaNova** sponsors inference with their RDUs offering faster AI model processing than GPUs. On Reddit, **vLLM** shows strong concurrency performance on an **RTX 3090** GPU, with quantization challenges noted in **FP8 kv-cache** but better results using **llama.cpp** with **Q8 kv-cache**. Users discuss performance trade-offs between **vLLM**, **exllamav2**, and **TabbyAPI** for different model sizes and batching strategies.</description><pubDate>Tue, 19 Nov 2024 02:25:23 GMT</pubDate><category>mistral-ai</category><category>sambanova</category><category>nvidia</category><category>pixtral-large</category><category>mistral-large-24.11</category><category>llama-3-2</category><category>qwen2.5-7b-instruct-abliterated-v2-gguf</category><category>qwen2.5-32b-q3_k_m</category><category>vllm</category><category>llama-cpp</category><category>exllamav2</category><category>tabbyapi</category><category>arthur-mensch</category><category>multimodality</category><category>vision</category><category>model-updates</category><category>chatbots</category><category>inference</category><category>gpu-optimization</category><category>quantization</category><category>performance</category><category>concurrency</category><category>kv-cache</category></item><item><title>Stripe lets Agents spend money with StripeAgentToolkit</title><link>https://news.smol.ai/issues/24-11-15-ainews-stripe-lets-agents-spend-money-with-stripeagenttoolkit/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-15-ainews-stripe-lets-agents-spend-money-with-stripeagenttoolkit/</guid><description>**Stripe** has pioneered an AI SDK specifically designed for agents that handle payments, integrating with models like **gpt-4o** to enable financial transactions and token-based charging. 
The AI developer tooling trend emphasizes better &quot;AI-Computer Interfaces&quot; for improved agent reliability, with tools like **E2B** and the `llms.txt` documentation trend gaining traction, notably adopted by **Anthropic**. In AI model news, **Gemini-Exp-1114** topped the Vision Leaderboard and improved in Math Arena, while discussions continue around model overfitting and the limits of scaling laws for **AGI**. **OpenAI** released a **ChatGPT desktop app for macOS** with integrations for **VS Code**, **Xcode**, and **Terminal**, enhancing developer workflows and pair programming. **Anthropic** introduced a prompt improver using chain-of-thought reasoning, and **Meta AI** shared top research from **EMNLP2024** on image captioning, dialogue systems, and memory-efficient fine-tuning. Highlights from **ICLR 2025** include diffusion-based illumination harmonization, open mixture-of-experts language models, and hyperbolic vision-language models. A new adaptive decoding method optimizes creativity and factuality per token. 
Tools like **LlamaParse** and **RAGformation** were also introduced for document parsing and retrieval-augmented generation.</description><pubDate>Sat, 16 Nov 2024 01:02:33 GMT</pubDate><category>stripe</category><category>openai</category><category>anthropic</category><category>meta-ai-fair</category><category>gpt-4o</category><category>gemini-exp-1114</category><category>abacaj</category><category>francois-fleuret</category><category>lmarena_ai</category><category>goodside</category><category>jxmnop</category><category>jaseweston</category><category>stevenheidel</category><category>ai-computer-interfaces</category><category>agentic-ai</category><category>model-overfitting</category><category>benchmarks</category><category>scaling-laws</category><category>agi</category><category>chain-of-thought</category><category>image-captioning</category><category>dialogue-systems</category><category>memory-efficient-fine-tuning</category><category>diffusion-models</category><category>mixture-of-experts</category><category>adaptive-decoding</category><category>creativity-optimization</category><category>factuality-optimization</category><category>pair-programming</category><category>document-parsing</category><category>retrieval-augmented-generation</category></item><item><title>Gemini (Experimental-1114) retakes #1 LLM rank with 1344 Elo</title><link>https://news.smol.ai/issues/24-11-14-ainews-gemini-experimental-1114-retakes-1-llm-rank-with-1344-elo/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-14-ainews-gemini-experimental-1114-retakes-1-llm-rank-with-1344-elo/</guid><description>**Anthropic** released the **3.5 Sonnet** benchmark for jailbreak robustness, emphasizing adaptive defenses. **OpenAI** enhanced **GPT-4** with a new RAG technique for contiguous chunk retrieval. **LangChain** launched **Promptim** for prompt optimization. **Meta AI** introduced **NeuralFeels** with neural fields for visuotactile perception. 
**RichardMCNgo** resigned from **OpenAI**, highlighting concerns about **AI governance** and **theoretical alignment**. Discussions emphasized the importance of **truthful public information** and **ethical alignment** in AI deployment. The latest **Gemini** update marks a new #1 LLM amid alignment challenges. The AI community continues to focus on **benchmarking**, **prompt-engineering**, and **alignment** issues.</description><pubDate>Fri, 15 Nov 2024 02:50:42 GMT</pubDate><category>anthropic</category><category>openai</category><category>langchain</category><category>meta-ai-fair</category><category>claude-3-sonnet</category><category>gpt-4</category><category>gemini-1.5</category><category>claude-3.5-sonnet</category><category>richardmcngo</category><category>andrewyng</category><category>philschmid</category><category>benchmarking</category><category>prompt-engineering</category><category>rag</category><category>visuotactile-perception</category><category>ai-governance</category><category>theoretical-alignment</category><category>ethical-alignment</category><category>jailbreak-robustness</category><category>model-releases</category><category>alignment</category></item><item><title>Common Corpus: 2T Open Tokens with Provenance</title><link>https://news.smol.ai/issues/24-11-13-ainews-common-corpus-2t-open-tokens-with-provenance/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-13-ainews-common-corpus-2t-open-tokens-with-provenance/</guid><description>**Pleias** via **Hugging Face** released **Common Corpus**, the largest fully open multilingual dataset with over **2 trillion tokens** including detailed **provenance information**. They also introduced **OCRonos-Vintage**, a **124M-parameter OCR correction model** that efficiently fixes digitization errors on CPU and GPU, unlocking knowledge from PDFs.
On AI tools, **LangChainAI** launched **Prompt Canvas** for collaborative **prompt engineering**, while **DeepSeek** released **JanusFlow 1.3B**, a unified multimodal LLM integrating autoregressive and rectified flow models for enhanced **image understanding** and **generation**. **Alibaba Cloud** announced **Qwen2.5-Coder**, a code-focused LLM with advanced coding capabilities, and **Claude 3.5 Sonnet** was highlighted for superior code generation. Discussions on **quantization challenges** and **scaling laws for precision** by **Tim Dettmers** and others emphasized the impact of low-precision training on model scalability and inference efficiency. *&quot;Scaling Laws for Precision&quot;* paper insights and alternative efficiency methods were also noted.</description><pubDate>Thu, 14 Nov 2024 01:54:53 GMT</pubDate><category>pleais</category><category>huggingface</category><category>langchainai</category><category>deepseek</category><category>alibaba</category><category>anthropic</category><category>qwen-2.5-coder</category><category>claude-3.5-sonnet</category><category>janusflow-1.3b</category><category>ocronos-vintage</category><category>tim-dettmers</category><category>tom-doerr</category><category>omarsar0</category><category>swyx</category><category>madiator</category><category>reach_vb</category><category>provenance</category><category>ocr</category><category>multilingual-datasets</category><category>prompt-engineering</category><category>multimodality</category><category>image-generation</category><category>code-generation</category><category>quantization</category><category>model-scaling</category><category>inference-efficiency</category></item><item><title>BitNet was a lie?</title><link>https://news.smol.ai/issues/24-11-12-ainews-bitnet-was-a-lie/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-12-ainews-bitnet-was-a-lie/</guid><description>**Scaling laws for quantization** have been modified by a group led by Chris Re, analyzing over 
**465 pretraining runs** and finding benefits plateau at FP6 precision. Lead author **Tanishq Kumar** highlights that longer training and more data increase sensitivity to quantization, explaining challenges with models like **Llama-3**. **Tim Dettmers**, author of QLoRA, warns that the era of efficiency gains from low-precision quantization is ending, signaling a shift from scaling to optimizing existing resources. Additionally, **Alibaba** announced **Qwen 2.5-Coder-32B-Instruct**, which matches or surpasses **GPT-4o** on coding benchmarks, and open-source initiatives like **DeepEval** for LLM testing are gaining traction.</description><pubDate>Wed, 13 Nov 2024 01:36:06 GMT</pubDate><category>sambanova</category><category>alibaba</category><category>hugging-face</category><category>qwen-2.5-coder-32b-instruct</category><category>gpt-4o</category><category>llama-3</category><category>tanishq-kumar</category><category>tim-dettmers</category><category>quantization</category><category>scaling-laws</category><category>model-efficiency</category><category>fine-tuning</category><category>model-performance</category><category>code-generation</category><category>open-source</category><category>unit-testing</category><category>ci-cd</category></item><item><title>FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI</title><link>https://news.smol.ai/issues/24-11-11-ainews-frontiermath-a-benchmark-for-evaluating-advanced-mathematical-reasoning-in-ai/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-11-ainews-frontiermath-a-benchmark-for-evaluating-advanced-mathematical-reasoning-in-ai/</guid><description>**Epoch AI** collaborated with over **60 leading mathematicians** to create the **FrontierMath benchmark**, a fresh set of hundreds of original math problems with easy-to-verify answers, aiming to challenge current AI models. 
The benchmark reveals that all tested models, including **o1**, perform poorly, highlighting the difficulty of complex problem-solving and **Moravec&apos;s paradox** in AI. Key AI developments include the introduction of **Mixture-of-Transformers (MoT)**, a sparse multi-modal transformer architecture reducing computational costs, and improvements in **Chain-of-Thought (CoT) prompting** through incorrect reasoning and explanations. Industry news covers **OpenAI** acquiring the **chat.com** domain, **Microsoft** launching the **Magentic-One agent framework**, **Anthropic** releasing **Claude 3.5 Haiku** outperforming **gpt-4o** on some benchmarks, and **xAI** securing **150MW grid power** with support from **Elon Musk** and **Trump**. **LangChain AI** introduced new tools including a **Financial Metrics API**, **Document GPT** with PDF upload and Q&amp;A, and **LangPost** AI agent for LinkedIn posts. **xAI** also demonstrated the **Grok Engineer** compatible with OpenAI and Anthropic APIs for code generation.</description><pubDate>Tue, 12 Nov 2024 01:33:12 GMT</pubDate><category>epoch-ai</category><category>openai</category><category>microsoft</category><category>anthropic</category><category>x-ai</category><category>langchainai</category><category>o1</category><category>claude-3.5-haiku</category><category>gpt-4o</category><category>karpathy</category><category>philschmid</category><category>adcock_brett</category><category>dylan522p</category><category>benchmarking</category><category>math</category><category>moravecs-paradox</category><category>mixture-of-experts</category><category>chain-of-thought</category><category>agent-framework</category><category>financial-metrics-api</category><category>pdf-processing</category><category>few-shot-learning</category><category>code-generation</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-11-08-ainews-not-much-happened-today/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-11-08-ainews-not-much-happened-today/</guid><description>This week in AI news, **Anthropic** launched **Claude 3.5 Sonnet**, enabling desktop app control via natural language. **Microsoft** introduced **Magentic-One**, a multi-agent system built on the **AutoGen framework**. **OpenCoder** was unveiled as an AI-powered code cookbook for large language models. **SambaNova** is sponsoring a hackathon with prizes of up to **$5,000** for building real-time AI agents. **Sophiamyang** announced new **Batch and Moderation APIs** with **50% lower cost** and multi-dimensional harmful text detection. Open-source tools like **Infisical** for secret management, **CrewAI** for autonomous agent orchestration, and **Crawlee** for web scraping were released. Research highlights include **SCIPE** for error analysis in LLM chains, **Context Refinement Agent** for improved retrieval-augmented generation, and **MemGPT** for managing LLM memory. The week also saw a legal win for **OpenAI** in the RawStory copyright case, affirming that facts used in LLM training are not copyrightable.</description><pubDate>Fri, 08 Nov 2024 23:16:39 
GMT</pubDate><category>anthropic</category><category>microsoft</category><category>sambanova</category><category>openai</category><category>langchain</category><category>llamaindex</category><category>claude-3.5-sonnet</category><category>opencoder</category><category>sophiamyang</category><category>tom_doerr</category><category>omarsar0</category><category>_akhaliq</category><category>andrewyng</category><category>giffmana</category><category>multi-agent-systems</category><category>natural-language-interfaces</category><category>batch-processing</category><category>harmful-content-detection</category><category>secret-management</category><category>retrieval-augmented-generation</category><category>error-analysis</category><category>memory-management</category><category>web-scraping</category><category>autonomous-agents</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-11-07-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-07-ainews-not-much-happened-today/</guid><description>This week in AI news highlights **Ollama 0.4** supporting **Meta&apos;s Llama 3.2 Vision** models (11B and 90B), with applications like handwriting recognition. **Self-Consistency Preference Optimization (ScPO)** was introduced to improve model consistency without human labels. Discussions on **model scaling**, **neural networks resurgence**, and **AMD&apos;s multi-GPU bandwidth** challenges were noted. The importance of **skip connections** in **Transformers** was emphasized. In healthcare, **less regulation plus AI** could revolutionize disease treatment and aging. Tools like **LlamaParse** and **Gemini** aid automated resume insights. **Gitpod Flex** demonstrated zero-trust architecture for secure development environments. Research includes surveys on **Small Language Models (SLMs)**, **number understanding** in LLMs, and **DTrOCR** using a **GPT-2 decoder** for OCR. 
Multi-agent systems in prediction markets were discussed by **TogetherCompute** and **LangChainAI**. Community events include **NeurIPS Happy Hour**, **NLP seminars**, and courses on **Agent Memory** with LLMs as operating systems.</description><pubDate>Fri, 08 Nov 2024 01:01:09 GMT</pubDate><category>meta-ai-fair</category><category>ollama</category><category>amd</category><category>llamaindex</category><category>gemini</category><category>gitpod</category><category>togethercompute</category><category>langchainai</category><category>weights-biases</category><category>stanfordnlp</category><category>deeplearningai</category><category>llama-3-2-vision</category><category>gpt-2</category><category>bindureddy</category><category>fstichler</category><category>stasbekman</category><category>jxmnop</category><category>bindureddy</category><category>omarsar0</category><category>giffmana</category><category>rajammanabrolu</category><category>model-scaling</category><category>neural-networks</category><category>multi-gpu-support</category><category>skip-connections</category><category>transformers</category><category>healthcare-ai</category><category>automated-recruitment</category><category>zero-trust-security</category><category>small-language-models</category><category>numerical-processing</category><category>chain-of-thought</category><category>optical-character-recognition</category><category>multi-agent-systems</category><category>agent-memory</category><category>interactive-language-learning</category></item><item><title>Not much happened today</title><link>https://news.smol.ai/issues/24-11-06-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-06-ainews-not-much-happened-today/</guid><description>**Grok Beta** surpasses **Llama 3.1 70B** in intelligence but is less competitive due to its pricing at **$5/1M input tokens** and **$15/1M output tokens**. 
**Defense Llama**, developed with **Meta AI** and **Scale AI**, targets American national security applications. **SWE-Kit**, an open-source framework, supports building customizable AI software engineers compatible with **Llama 3**, **ChatGPT**, and **Claude**. **LangChainAI** and **Weights &amp; Biases** integrate to improve retrievers and reduce hallucinations in **RAG applications** using **Gemini**. **Perplexity AI** offers enhanced election tracking tools for the **2024 elections**, including live state results and support for **Claude 3.5 Haiku**. **AI Talk** launched featuring discussions on Chinese AI labs with guests from **Qwen**. Memes highlight **Elon Musk** and humorous AI coding mishaps.</description><pubDate>Thu, 07 Nov 2024 02:54:09 GMT</pubDate><category>meta-ai-fair</category><category>scale-ai</category><category>anthropic</category><category>perplexity-ai</category><category>langchainai</category><category>weights-biases</category><category>qwen</category><category>grok-beta</category><category>llama-3-1-70b</category><category>claude-3-5-haiku</category><category>claude-3-opus</category><category>llama-3</category><category>chatgpt</category><category>gemini</category><category>alexandr_wang</category><category>svpino</category><category>aravsrinivas</category><category>bindureddy</category><category>teortaxestex</category><category>jessechenglyu</category><category>junyang-lin</category><category>cte_junior</category><category>jerryjliu0</category><category>pricing</category><category>national-security</category><category>defense</category><category>open-source</category><category>agentic-ai</category><category>retrieval-augmented-generation</category><category>election-predictions</category><category>real-time-updates</category><category>annotation</category><category>ai-ecosystem</category><category>memes</category><category>humor</category></item><item><title>Tencent&apos;s Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS 
Data</title><link>https://news.smol.ai/issues/24-11-05-ainews-tencents-hunyuan-large-claims-to-beat-deepseek-v2-and-llama3-405b-with-less-data/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-05-ainews-tencents-hunyuan-large-claims-to-beat-deepseek-v2-and-llama3-405b-with-less-data/</guid><description>**Tencent** released a notable &gt;300B parameter MoE model pretrained on **7T tokens**, including **1.5T synthetic data** generated via **Evol-Instruct**. The model introduces novel techniques like &quot;recycle routing&quot; and expert-specific learning rates, alongside a compute-efficient scaling law for MoE active parameters. However, its custom license restricts use in the EU and by companies with over 100M MAU, and it avoids China-sensitive queries. Meanwhile, **Anthropic** launched **Claude 3.5 Haiku**, now available on multiple platforms, praised for intelligence and speed but criticized for a **10x price increase**. **Meta** opened **Llama AI** to the U.S. defense sector, and a **Llama Impact Hackathon** offers a **$15K prize** for projects using **Llama 3.1 &amp; 3.2 Vision**. **LlamaIndex** released a React chat UI component with Tailwind CSS and LLM backend integrations. 
The **MLX LM** model advances text generation speed and efficiency with KV cache quantization.</description><pubDate>Wed, 06 Nov 2024 06:22:40 GMT</pubDate><category>tencent</category><category>anthropic</category><category>meta-ai-fair</category><category>togethercompute</category><category>llamaindex</category><category>claude-3.5-haiku</category><category>llama-3-1</category><category>llama-3-2</category><category>mlx-lm</category><category>mixture-of-experts</category><category>synthetic-data</category><category>model-scaling</category><category>model-architecture</category><category>model-optimization</category><category>kv-cache-quantization</category><category>react</category><category>fine-tuning</category><category>scaling-laws</category><category>model-efficiency</category><category>model-deployment</category><category>multimodality</category></item><item><title>OpenAI beats Anthropic to releasing Speculative Decoding</title><link>https://news.smol.ai/issues/24-11-04-ainews-openai-beats-anthropic-to-releasing-speculative-decoding/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-04-ainews-openai-beats-anthropic-to-releasing-speculative-decoding/</guid><description>**Prompt lookup** and **Speculative Decoding** techniques are gaining traction with implementations from **Cursor**, **Fireworks**, and teased features from **Anthropic**. **OpenAI** has introduced faster response times and file edits with these methods, offering about **50%** efficiency improvements. The community is actively exploring AI engineering use cases with these advancements. Recent updates highlight progress from companies like **NVIDIA**, **OpenAI**, **Anthropic**, **Microsoft**, **Boston Dynamics**, and **Meta**. Key technical insights include CPU inference capabilities, multimodal retrieval-augmented generation (RAG), and neural network fundamentals. New AI products include fully AI-generated games and advanced content generation tools. 
Challenges in AI research labs such as bureaucracy and resource allocation were also discussed, alongside AI safety and governance concerns.</description><pubDate>Tue, 05 Nov 2024 02:51:39 GMT</pubDate><category>openai</category><category>anthropic</category><category>nvidia</category><category>microsoft</category><category>boston-dynamics</category><category>meta-ai-fair</category><category>runway</category><category>elevenlabs</category><category>etched</category><category>osmo</category><category>physical-intelligence</category><category>langchain</category><category>claude-3-sonnet</category><category>mrt5</category><category>adcock_brett</category><category>vikhyatk</category><category>dair_ai</category><category>rasbt</category><category>bindureddy</category><category>teortaxestex</category><category>svpino</category><category>c_valenzuelab</category><category>davidsholz</category><category>speculative-decoding</category><category>prompt-lookup</category><category>cpu-inference</category><category>multimodality</category><category>retrieval-augmented-generation</category><category>neural-networks</category><category>optimization</category><category>ai-safety</category><category>governance</category><category>model-architecture</category><category>inference-economics</category><category>content-generation</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-11-01-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-01-ainews-not-much-happened-today/</guid><description>**ChatGPT Search** was launched by **Sam Altman**, who called it his favorite feature since ChatGPT&apos;s original launch, doubling his usage. Comparisons were made between ChatGPT Search and **Perplexity** with improvements noted in Perplexity&apos;s web navigation. 
**Google** introduced a &quot;Grounding&quot; feature in the Gemini API &amp; AI Studio enabling Gemini models to access real-time web information. Despite Gemini&apos;s leaderboard performance, developer adoption lags behind **OpenAI** and **Anthropic**. **SmolLM2**, a new small, powerful on-device language model, outperforms **Meta&apos;s Llama 3.2 1B**. A **Claude** desktop app was released for Mac and Windows. **Meta AI** announced robotics advancements including Meta Sparsh, Meta Digit 360, and Meta Digit Plexus. **Stable Diffusion 3.5 Medium**, a 2B parameter model with a permissive license, was released. Insights on AGI development suggest initial inferiority but rapid improvement. **Anthropic** advocates for early targeted AI regulation. Discussions on ML specialization predict training will concentrate among a few companies, while inference becomes commoditized. New AI tools include **Suno AI Personas** for music creation, **PromptQL** for natural language querying over data, and **Agent S** for desktop task automation. 
Humor was shared about Python environment upgrades.</description><pubDate>Fri, 01 Nov 2024 20:59:45 GMT</pubDate><category>openai</category><category>anthropic</category><category>google</category><category>meta-ai-fair</category><category>suno-ai</category><category>perplexity-ai</category><category>smollm2</category><category>llama-3-2</category><category>stable-diffusion-3.5</category><category>claude-3.5-sonnet</category><category>gemini</category><category>sam-altman</category><category>akhaliq</category><category>arav-srinivas</category><category>labenz</category><category>loubnabenallal1</category><category>alexalbert</category><category>fchollet</category><category>stasbekman</category><category>svpino</category><category>rohanpaul_ai</category><category>hamelhusain</category><category>on-device-ai</category><category>model-performance</category><category>robotics</category><category>multimodality</category><category>ai-regulation</category><category>model-releases</category><category>natural-language-processing</category><category>prompt-engineering</category><category>agentic-ai</category><category>ai-application</category><category>model-optimization</category></item><item><title>The AI Search Wars Have Begun — SearchGPT, Gemini Grounding, and more</title><link>https://news.smol.ai/issues/24-11-01-ainews-the-ai-search-wars-have-begun-searchgpt-gemini-grounding-and-more/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-11-01-ainews-the-ai-search-wars-have-begun-searchgpt-gemini-grounding-and-more/</guid><description>**ChatGPT** launched its search functionality across all platforms using a fine-tuned version of **GPT-4o** with synthetic data generation and distillation from **o1-preview**. This feature includes a Chrome extension promoted by **Sam Altman** but has issues with hallucinations. The launch coincides with **Gemini** introducing Search Grounding after delays. 
Notably, **The New York Times** is not a partner due to a lawsuit against **OpenAI**. The AI search competition intensifies with consumer and B2B players like **Perplexity** and **Glean**. Additionally, **Claude 3.5 Sonnet** achieved a new benchmark record on SWE-bench Verified, and a new hallucination evaluation benchmark, SimpleQA, was introduced. Other highlights include the **Universal-2** speech-to-text model with 660M parameters and **HOVER**, a neural whole-body controller for humanoid robots trained in NVIDIA Isaac simulation. AI hedge fund teams using **LangChain** and **LangGraph** were also showcased. The news is sponsored by the RAG++ course featuring experts from **Weights &amp; Biases**, **Cohere**, and **Weaviate**.</description><pubDate>Fri, 01 Nov 2024 07:04:02 GMT</pubDate><category>openai</category><category>google</category><category>gemini</category><category>nyt</category><category>perplexity-ai</category><category>glean</category><category>nvidia</category><category>langchain</category><category>langgraph</category><category>weights-biases</category><category>cohere</category><category>weaviate</category><category>gpt-4o</category><category>o1-preview</category><category>claude-3.5-sonnet</category><category>universal-2</category><category>sam-altman</category><category>alexalbert__</category><category>_jasonwei</category><category>svpino</category><category>drjimfan</category><category>virattt</category><category>fine-tuning</category><category>synthetic-data</category><category>distillation</category><category>hallucinations</category><category>benchmarking</category><category>speech-to-text</category><category>robotics</category><category>neural-networks</category><category>ai-agents</category></item><item><title>Creating a LLM-as-a-Judge</title><link>https://news.smol.ai/issues/24-10-30-ainews-creating-a-llm-as-a-judge/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-10-30-ainews-creating-a-llm-as-a-judge/</guid><description>**Anthropic** released details on Claude 3.5 SWEBench+SWEAgent, while **OpenAI** introduced SimpleQA and **DeepMind** launched NotebookLM. **Apple** announced new M4 MacBooks, and a new SOTA image model, Recraft v3, emerged. Hamel Husain presented a detailed 6,000-word treatise on creating LLM judges using a method called **critique shadowing** to align LLMs with domain experts, addressing the problem of untrusted and unused data in AI teams. The workflow involves expert-reviewed datasets and iterative prompt refinement. Additionally, **Zep** introduced a temporal knowledge graph memory layer to improve AI agent memory and reduce hallucinations. **Anthropic** also integrated Claude 3.5 Sonnet with GitHub Copilot, expanding access to Copilot Chat users.</description><pubDate>Wed, 30 Oct 2024 23:17:27 GMT</pubDate><category>anthropic</category><category>openai</category><category>deepmind</category><category>apple</category><category>zep</category><category>perplexity-ai</category><category>github</category><category>claude-3.5-sonnet</category><category>claude-3.5</category><category>notebooklm</category><category>simpleqa</category><category>recraft-v3</category><category>hamel-husain</category><category>swyx</category><category>critique-shadowing</category><category>llm-judging</category><category>domain-experts</category><category>dataset-creation</category><category>prompt-engineering</category><category>error-analysis</category><category>temporal-knowledge-graphs</category><category>memory-layer</category><category>ai-agent-memory</category><category>hallucination-reduction</category><category>integration</category></item><item><title>GitHub Copilot Strikes Back</title><link>https://news.smol.ai/issues/24-10-29-ainews-github-copilot-strikes-back/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-10-29-ainews-github-copilot-strikes-back/</guid><description>**GitHub&apos;s tenth annual Universe conference** introduced the **Multi-model Copilot** featuring **Anthropic&apos;s Claude 3.5 Sonnet**, **Google&apos;s Gemini 1.5 Pro**, and **OpenAI&apos;s o1-preview** models in a new picker UI, allowing developers to choose from multiple companies&apos; models. The event also showcased **GitHub Spark**, an AI-native tool for building natural language applications with deployment-free hosting and integrated model prompting. Additionally, GitHub updated its Copilot Workspace with new agents and security Autofix features. **Weights &amp; Biases** launched Weave with multimodal observability supporting audio, text, and images, integrating the OpenAI Realtime API. Twitter recaps highlighted **tinygrad&apos;s** codebase optimization and discussions on GenAI adoption and **Gemini Flash-8B&apos;s** cost efficiency at **$0.0375 per million tokens**.</description><pubDate>Wed, 30 Oct 2024 01:05:11 GMT</pubDate><category>github</category><category>anthropic</category><category>google-deepmind</category><category>openai</category><category>weights-biases</category><category>claude-3-5-sonnet</category><category>gemini-1.5-pro</category><category>o1-preview</category><category>gemini-flash-8b</category><category>cassidy-williams</category><category>fchollet</category><category>rohanpaul_ai</category><category>jxmnop</category><category>model-picker-ui</category><category>multi-model-integration</category><category>natural-language-applications</category><category>deployment-free-hosting</category><category>model-prompting</category><category>multimodal-observability</category><category>audio-tracing</category><category>codebase-optimization</category><category>price-performance-ratio</category></item><item><title>not much happened this 
weekend</title><link>https://news.smol.ai/issues/24-10-28-ainews-not-much-happened-this-weekend/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-28-ainews-not-much-happened-this-weekend/</guid><description>**Moondream**, a **1.6B vision language model**, secured seed funding, highlighting a trend in moon-themed tiny models alongside **Moonshine** (27-61M ASR model). **Claude 3.5 Sonnet** was used for AI Twitter recaps. Discussions included **pattern recognition** vs. **intelligence** in **LLMs**, **reinforcement learning** for prompt optimization, and **NotebookLlama**, an open-source **NotebookLM** variant using **LLaMA models** for tasks like **text-to-speech**. Advances in **model optimization** with **async-TP** in **PyTorch** for **tensor parallelism** and hyperparameter tuning were noted. **Mini-Omni 2** demonstrated multimodal capabilities across **image**, **audio**, and **text** for voice conversations with emphasis on **modal alignment** and **multimodal fine-tuning**. AI productivity tools like an **AI email writer** and **LlamaCloud**-based research assistants were introduced. Emphasis on practical skill development and privacy-conscious AI tool usage with **Llama3-8B** was highlighted. Generative AI tools such as **#AIPythonforBeginners** and **GenAI Agents** with **LangGraph** were shared. Business insights covered rapid execution in AI product development and emerging AI-related job roles. 
Challenges in enterprise-grade text-to-SQL and advanced retrieval methods were discussed with tutorials on **RAG** applications using **LangChain** and **MongoDB**.</description><pubDate>Mon, 28 Oct 2024 22:27:43 GMT</pubDate><category>moondream</category><category>openai</category><category>anthropic</category><category>hugging-face</category><category>mistral-ai</category><category>google-deepmind</category><category>langchain</category><category>deepmind</category><category>microsoft</category><category>claude-3.5-sonnet</category><category>llama-3</category><category>llama-3-8b</category><category>notebookllama</category><category>min-omni-2</category><category>amanda-askell</category><category>philschmid</category><category>stasbekman</category><category>francois-fleuret</category><category>mervenoyann</category><category>reach_vb</category><category>dzhng</category><category>aravsrinivas</category><category>sama</category><category>lateinteraction</category><category>andrew-y-ng</category><category>bindureddy</category><category>jerryjliu0</category><category>pattern-recognition</category><category>reinforcement-learning</category><category>prompt-optimization</category><category>text-to-speech</category><category>model-optimization</category><category>tensor-parallelism</category><category>hyperparameters</category><category>multimodal</category><category>modal-alignment</category><category>multimodal-fine-tuning</category><category>ai-productivity</category><category>privacy</category><category>generative-ai</category><category>rag</category><category>retrieval-augmentation</category><category>enterprise-text-to-sql</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-10-25-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-25-ainews-not-much-happened-today/</guid><description>**Liquid AI** held a launch event introducing new foundation models. 
**Anthropic** shared follow-up research on social bias and feature steering with their &quot;Golden Gate Claude&quot; feature. **Cohere** released multimodal Embed 3 embeddings models following Aya Expanse. There was misinformation about **GPT-5/Orion** debunked by **Sam Altman**. **Meta AI FAIR** announced **Open Materials 2024** with new models and datasets for inorganic materials discovery using the EquiformerV2 architecture. **Anthropic AI** demonstrated feature steering to balance social bias and model capabilities. **NVIDIA**&apos;s **Llama-3.1-Nemotron-70B** ranked highly on the Arena leaderboard with style control. **Perplexity AI** expanded to 100M weekly queries with new finance and reasoning modes. **LangChain** emphasized real application integration with interactive frame interpolation. **Kestra** highlighted scalable event-driven workflows with open-source YAML-based orchestration. **OpenFLUX** optimized inference speed by doubling it through guidance LoRA training. Discussions on AI safety included trust dynamics between humans and AI, economic impacts of AI automation, and the White House AI National Security memo addressing cyber and biological risks. 
**LlamaIndex** showcased knowledge-backed agents for enhanced AI applications.</description><pubDate>Sat, 26 Oct 2024 00:52:03 GMT</pubDate><category>liquid-ai</category><category>anthropic</category><category>cohere</category><category>openai</category><category>meta-ai-fair</category><category>nvidia</category><category>perplexity-ai</category><category>langchain</category><category>kestra</category><category>ostrisai</category><category>llamaindex</category><category>llama-3.1-nemotron-70b</category><category>golden-gate-claude</category><category>embed-3</category><category>sam-altman</category><category>lmarena_ai</category><category>aravsrinivas</category><category>svpino</category><category>richardmcngo</category><category>ajeya_cotra</category><category>tamaybes</category><category>danhendrycks</category><category>jerryjliu0</category><category>feature-steering</category><category>social-bias</category><category>multimodality</category><category>model-optimization</category><category>workflow-orchestration</category><category>inference-speed</category><category>event-driven-workflows</category><category>knowledge-backed-agents</category><category>economic-impact</category><category>ai-national-security</category><category>trust-dynamics</category></item><item><title>s{imple|table|calable} Consistency Models</title><link>https://news.smol.ai/issues/24-10-24-ainews-simpleortableorcalable-consistency-models/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-24-ainews-simpleortableorcalable-consistency-models/</guid><description>**Model distillation** significantly accelerates diffusion models, enabling near real-time image generation with only 1-4 sampling steps, as seen in **BlinkShot** and **Flux Schnell**. Research led by **Yang Song** introduced **simplified continuous-time consistency models (sCMs)**, achieving under 10% FID difference in just 2 steps and scaling up to **1.5B parameters** for higher quality. 
On AI hardware, **Tesla** is deploying a **50k H100 cluster** potentially capable of completing **GPT-4** training in under three weeks, while **Cerebras Systems** set a new inference speed record on **Llama 3.1 70B** with their wafer-scale AI chips. **Stability AI** released **Stable Diffusion 3.5** and its Turbo variant, and **Cohere** launched new multilingual models supporting **23 languages** with state-of-the-art performance. **LangChain** also announced ecosystem updates.</description><pubDate>Fri, 25 Oct 2024 02:36:02 GMT</pubDate><category>stability-ai</category><category>tesla</category><category>cerebras</category><category>cohere</category><category>langchain</category><category>llama-3-70b</category><category>llama-3-405b</category><category>llama-3-1</category><category>stable-diffusion-3.5</category><category>gpt-4</category><category>yang-song</category><category>model-distillation</category><category>diffusion-models</category><category>continuous-time-consistency-models</category><category>image-generation</category><category>ai-hardware</category><category>inference-speed</category><category>multilingual-models</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-10-23-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-23-ainews-not-much-happened-today/</guid><description>**Anthropic** released upgraded **Claude 3.5 Sonnet** and **Claude 3.5 Haiku** models featuring a new **computer use capability** that allows interaction with computer interfaces via screenshots and actions like mouse movement and typing. The **Claude 3.5 Sonnet** achieved state-of-the-art coding performance on SWE-bench Verified with a **49% score**, surpassing OpenAI&apos;s **o1-preview**. **Anthropic** focuses on teaching general computer skills rather than task-specific tools, with expected rapid improvements. 
Other releases include **Mochi 1**, an open-source video generation model, **Stable Diffusion 3.5** with Large and Medium variants, and **Embed 3** by **Cohere**, a multimodal embedding model for text and image search. **KerasHub** was launched by **François Chollet**, unifying KerasNLP and KerasCV with 37 pretrained models. Microsoft introduced the **Differential Transformer** to reduce attention noise via differential attention maps, and research on transformer attention layers was shared by **Rasbt**.</description><pubDate>Thu, 24 Oct 2024 00:39:59 GMT</pubDate><category>anthropic</category><category>openai</category><category>cohere</category><category>microsoft</category><category>claude-3.5-sonnet</category><category>claude-3.5-haiku</category><category>o1-preview</category><category>mochi-1</category><category>stable-diffusion-3.5</category><category>embed-3</category><category>kerashub</category><category>differential-transformer</category><category>alexalbert</category><category>fchollet</category><category>rasbt</category><category>computer-use</category><category>coding-performance</category><category>video-generation</category><category>fine-tuning</category><category>multimodality</category><category>transformers</category><category>attention-mechanisms</category><category>model-optimization</category></item><item><title>Claude 3.5 Sonnet (New) gets Computer Use</title><link>https://news.smol.ai/issues/24-10-22-ainews-claude-35-sonnet-new-gets-computer-use/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-22-ainews-claude-35-sonnet-new-gets-computer-use/</guid><description>**Anthropic** announced new Claude 3.5 models: **3.5 Sonnet** and **3.5 Haiku**, improving coding performance significantly, with Sonnet topping several coding benchmarks like **Aider** and **Vectara**. 
The new **Computer Use API** enables controlling computers via vision, scoring notably higher than other AI systems, showcasing progress in AI-driven computer interaction. **Zep** launched a cloud edition for AI agent memory management, highlighting challenges in **multimodal memory**. The update also mentions **Llama 3.1** and **Nemotron** models from **NVIDIA**.</description><pubDate>Wed, 23 Oct 2024 02:08:12 GMT</pubDate><category>anthropic</category><category>zep</category><category>nvidia</category><category>claude-3.5-sonnet</category><category>claude-3.5-haiku</category><category>llama-3.1</category><category>nemotron</category><category>philschmid</category><category>swyx</category><category>coding</category><category>benchmarks</category><category>computer-use</category><category>vision</category><category>multimodal-memory</category><category>model-updates</category><category>ai-integration</category></item><item><title>DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing</title><link>https://news.smol.ai/issues/24-10-21-ainews-docetl-agentic-query-rewriting-and-evaluation-for-complex-document-processing/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-21-ainews-docetl-agentic-query-rewriting-and-evaluation-for-complex-document-processing/</guid><description>**UC Berkeley&apos;s EPIC lab** introduces innovative LLM data operators with projects like **LOTUS** and **DocETL**, focusing on effective programming and computation over large data corpora. This approach contrasts GPU-rich big labs like **DeepMind** and **OpenAI** with GPU-poor compound AI systems. **Microsoft** open-sourced **BitNet b1.58**, a 1-bit ternary parameter LLM enabling **4-20x faster training** and on-device inference at human reading speeds. Nvidia released **Llama-3.1-Nemotron-70B-Instruct**, a fine-tuned open-source model outperforming **GPT-4o** and **Claude-3.5-sonnet**. 
These developments highlight advances in **model-optimization**, **on-device-ai**, and **fine-tuning**.</description><pubDate>Tue, 22 Oct 2024 00:04:21 GMT</pubDate><category>uc-berkeley</category><category>deepmind</category><category>openai</category><category>microsoft</category><category>nvidia</category><category>archetype-ai</category><category>boston-dynamics</category><category>toyota-research</category><category>google</category><category>adobe</category><category>openai</category><category>mistral</category><category>tesla</category><category>meta-ai-fair</category><category>bitnet-b1.58</category><category>llama-3.1-nemotron-70b-instruct</category><category>gpt-4o</category><category>claude-3.5-sonnet</category><category>rohanpaul_ai</category><category>adcock_brett</category><category>david-patterson</category><category>model-optimization</category><category>on-device-ai</category><category>fine-tuning</category><category>large-corpus-processing</category><category>gpu-acceleration</category><category>frameworks</category><category>model-benchmarking</category></item><item><title>DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality</title><link>https://news.smol.ai/issues/24-10-18-ainews-deepseek-janus-and-meta-spirit-lm-decoupled-image-and-expressive-voice-omnimodality/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-18-ainews-deepseek-janus-and-meta-spirit-lm-decoupled-image-and-expressive-voice-omnimodality/</guid><description>**DeepSeek Janus** and **Meta SpiRit-LM** are two notable multimodality AI models recently released, showcasing advances in image generation and speech synthesis respectively. DeepSeek Janus separates vision encoders for image understanding and generation, achieving better results in both tasks. Meta&apos;s SpiRit-LM introduces an expressive speech and writing model generating pitch and style units, improving over standard TTS. 
Additionally, **W&amp;B Weave** offers comprehensive LLM observability and multimodality fine-tuning tools. Industry updates include Nvidia&apos;s Nemotron 70b model underperforming, Meta open-sourcing Movie Gen Bench for media generation benchmarking, Perplexity launching internal search with multi-step reasoning, and Anthropic updating Claude apps. Open source progress includes Hugging Face&apos;s gradient accumulation fix in transformers and advocacy for open source AI to prevent Big Tech dominance. *&quot;Model merging for combining skills of multiple models&quot;* is also highlighted.</description><pubDate>Fri, 18 Oct 2024 22:46:38 GMT</pubDate><category>deepseek</category><category>meta-ai-fair</category><category>wandb</category><category>nvidia</category><category>anthropic</category><category>hugging-face</category><category>perplexity-ai</category><category>nemotron-70b</category><category>claude</category><category>claude-3.5-sonnet</category><category>gpt-4o</category><category>bindureddy</category><category>aravsrinivas</category><category>danielhanchen</category><category>clementdelangue</category><category>cwolferesearch</category><category>multimodality</category><category>image-generation</category><category>speech-synthesis</category><category>fine-tuning</category><category>model-merging</category><category>benchmarking</category><category>open-source</category><category>model-optimization</category><category>reinforcement-learning</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-10-17-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-17-ainews-not-much-happened-today/</guid><description>**Answer.ai** launched **fastdata**, a synthetic data generation library using `claudette` and Tencent&apos;s Billion Persona paper. **NotebookLM** became customizable, and **Motherduck** introduced notable LLMs in SQL implementations. 
**Perplexity** and **Dropbox** announced competitors to **Glean**. **OpenAI** unveiled audio chat completions priced at 24 cents per minute. **Meta AI** released **Llama 3.1**, powering Lenovo AI Now&apos;s on-device agent. **Yi-Lightning** model ranked #6 globally, surpassing **GPT-4o**. **Zyphra AI** released the large **Zyda-2** dataset with 5 trillion tokens. **François Chollet** clarified transformer architecture as set-processing, not sequence-processing. Research suggests memorization aids LLM reasoning. **Anthropic** updated its Responsible Scaling Policy for AI safety. Tools like **Perplexity Finance**, **Open Canvas** by **LangChain**, and **AlphaCodium** code generation tool were highlighted. Approximately $500 million was raised for AI agent startups, with ongoing discussions on AI&apos;s job market impact. Combining prompt caching with the Batches API can yield a 95% discount on **Claude 3.5 Sonnet** tokens.</description><pubDate>Fri, 18 Oct 2024 01:13:21 GMT</pubDate><category>answer-ai</category><category>tencent</category><category>notebooklm</category><category>motherduck</category><category>perplexity</category><category>dropbox</category><category>openai</category><category>meta-ai-fair</category><category>yi-ai</category><category>zyphra-ai</category><category>anthropic</category><category>langchain</category><category>openai</category><category>claudette</category><category>llama-3-1</category><category>yi-lightning</category><category>gpt-4o</category><category>claude-3.5-sonnet</category><category>fchollet</category><category>aravsrinivas</category><category>svpino</category><category>swyx</category><category>synthetic-data</category><category>fine-tuning</category><category>sql</category><category>audio-processing</category><category>on-device-ai</category><category>dataset-release</category><category>transformer</category><category>llm-reasoning</category><category>ai-safety</category><category>code-generation</category><category>ai-pricing<
/category><category>ai-job-market</category></item><item><title>Did Nvidia&apos;s Nemotron 70B train on test?</title><link>https://news.smol.ai/issues/24-10-16-ainews-did-nvidias-nemotron-70b-train-on-test/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-16-ainews-did-nvidias-nemotron-70b-train-on-test/</guid><description>**NVIDIA&apos;s Nemotron-70B** model has drawn scrutiny despite strong benchmark performances on **Arena Hard**, **AlpacaEval**, and **MT-Bench**, with some standard benchmarks like **GPQA** and **MMLU Pro** showing no improvement over the base **Llama-3.1-70B**. The new **HelpSteer2-Preference dataset** improves some benchmarks with minimal losses elsewhere. Meanwhile, **Mistral** released **Ministral 3B and 8B** models featuring **128k context length** and outperforming **Llama-3.1** and **GPT-4o** on various benchmarks under the **Mistral Commercial License**. **NVIDIA&apos;s Nemotron 70B** also surpasses **GPT-4o** and **Claude-3.5-Sonnet** on key benchmarks using **RLHF (REINFORCE)** training. 
Additionally, **Zep** introduced **Graphiti**, an open-source temporal knowledge graph memory layer for AI agents, built on **Neo4j**.</description><pubDate>Thu, 17 Oct 2024 00:44:43 GMT</pubDate><category>nvidia</category><category>mistral-ai</category><category>hugging-face</category><category>zep</category><category>nemotron-70b</category><category>llama-3.1-70b</category><category>llama-3.1</category><category>ministral-3b</category><category>ministral-8b</category><category>gpt-4o</category><category>claude-3.5-sonnet</category><category>claude-3.5</category><category>reach_vb</category><category>philschmid</category><category>swyx</category><category>benchmarking</category><category>reinforcement-learning</category><category>reward-models</category><category>temporal-knowledge-graphs</category><category>memory-layers</category><category>context-windows</category><category>model-releases</category><category>open-source</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-10-15-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-15-ainews-not-much-happened-today/</guid><description>**Vertical SaaS agents** are gaining rapid consensus as the future of AI applications, highlighted by **Decagon&apos;s $100m funding** and **Sierra&apos;s $4b round**. **OpenAI alumni** are actively raising venture capital and forming new startups, intensifying competition in the AI market. **Demis Hassabis** celebrated the **Nobel Prize** recognition for **AlphaFold2**, a breakthrough in protein structure prediction. Advances in AI models include techniques like **LoRA projectors** and **annealing on high-quality data**, while discussions emphasize the need for **high-bandwidth sensory inputs** beyond language for common sense learning. New methods like **LoLCATs** aim to optimize transformer models such as **Llama** and **Mistral** for efficiency. 
Ethical concerns about AI agents performing harmful tasks remain under investigation. The AI community continues to explore model evaluation challenges and optimization frameworks like **LPZero** for neural architecture search.</description><pubDate>Tue, 15 Oct 2024 21:33:05 GMT</pubDate><category>openai</category><category>decagon</category><category>sierra</category><category>togethercompute</category><category>llama</category><category>mistral</category><category>mira-murati</category><category>demis-hassabis</category><category>clement-delangue</category><category>john-o-whitaker</category><category>yann-lecun</category><category>francois-chollet</category><category>ajeya-cotra</category><category>rohan-paul</category><category>adcock-brett</category><category>vertical-saas</category><category>funding</category><category>protein-structure-prediction</category><category>lora</category><category>self-supervised-learning</category><category>model-optimization</category><category>neural-architecture-search</category><category>model-evaluation</category><category>ethics</category><category>transformers</category><category>multi-agent-systems</category><category>long-context</category></item><item><title>Not much (in AI) happened this weekend</title><link>https://news.smol.ai/issues/24-10-14-ainews-not-much-in-ai-happened-this-weekend/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-14-ainews-not-much-in-ai-happened-this-weekend/</guid><description>**OpenAI** introduced an &quot;edit this area&quot; feature for image generation, praised by **Sam Altman**. **Yann LeCun** highlighted a NYU paper improving pixel generation with feature prediction loss using pre-trained visual encoders like DINOv2. Long-context LLMs such as **llama-3.1-8b** and **llama-3.2** variants now support up to **131k tokens**, offering alternatives to RAG systems. 
**Bindu Reddy** announced AI agents capable of building and deploying code from English instructions, signaling AI&apos;s replacement of SQL and potential impact on Python. SpaceX&apos;s successful **Starship rocket catch** was celebrated by **Andrej Karpathy** and others, with **Soumith Chintala** praising SpaceX&apos;s efficient, low-bureaucracy research approach. Privacy concerns arose from **Harvard** students&apos; AI glasses, I-XRAY, which can reveal personal information. **Meta AI FAIR**&apos;s Movie Gen model advances media foundation models with high-quality text-to-image and video generation, including synced audio. Humanoid robots like **Ameca** and **Azi** now engage in expressive conversations using **ChatGPT**. **xAI** rapidly deployed **100K Nvidia H100 GPUs** in 19 days, with Nvidia CEO Jensen Huang commending Elon Musk. The leading AI research labs compared include **Meta-FAIR**, **Google DeepMind**, and **Microsoft Research**. Skepticism about LLM intelligence was voiced by **svpino**, emphasizing limitations in novel problem-solving despite strong memorization.</description><pubDate>Mon, 14 Oct 2024 22:52:37 
GMT</pubDate><category>openai</category><category>meta-ai-fair</category><category>google-deepmind</category><category>microsoft</category><category>x-ai</category><category>spacex</category><category>harvard</category><category>nvidia</category><category>llama-3.1-8b</category><category>llama-3.2</category><category>chatgpt</category><category>movie-gen</category><category>sam-altman</category><category>yann-lecun</category><category>rasbt</category><category>bindureddy</category><category>andrej-karpathy</category><category>soumithchintala</category><category>svpino</category><category>adcock_brett</category><category>rohanpaul_ai</category><category>long-context</category><category>feature-prediction-loss</category><category>ai-agents</category><category>privacy</category><category>text-to-video</category><category>text-to-image</category><category>humanoid-robots</category><category>gpu-deployment</category><category>media-foundation-models</category><category>ai-research-labs</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-10-11-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-11-ainews-not-much-happened-today/</guid><description>**Rhymes AI** released **Aria**, a new **25.3B** parameter multimodal MoE model supporting text, code, image, and video with a **64k token context window** and Apache-2.0 license. **OpenAI**&apos;s **o1-preview** and **o1-mini** models show consistent improvement over **Anthropic** and **Google Gemini 1.5 Pro/Flash** on long context RAG benchmarks up to **128k tokens**, while **Google Gemini 1.5** models excel at extreme context lengths up to **2 million tokens**. **Meta AI** expanded rollout to 21 countries with new language support but remains unavailable in the EU. The one-year anniversary of **SWE-bench** benchmark for software engineering tasks was celebrated, alongside the introduction of SWE-bench Multimodal. 
New AI tools include **OxyCopilot** by Oxylabs for web scraping, **Taipy** for Python-based production apps, and **Latitude** for prompt engineering. Industry insights highlight changing AI funding dynamics and OpenAI&apos;s strategic focus on consumer products like ChatGPT. *&quot;all recaps done by Claude 3.5 Sonnet, best of 4 runs.&quot;*</description><pubDate>Fri, 11 Oct 2024 23:00:43 GMT</pubDate><category>rhymes-ai</category><category>openai</category><category>anthropic</category><category>google</category><category>meta-ai-fair</category><category>oxylabs</category><category>aria</category><category>o1-preview</category><category>o1-mini</category><category>gemini-1.5-pro</category><category>gemini-1.5-flash</category><category>gemini-1.5</category><category>claude-3.5-sonnet</category><category>mervenoyann</category><category>osanseviero</category><category>dbrxmosaicai</category><category>ylecun</category><category>ofirpress</category><category>clefourrier</category><category>omarsar0</category><category>rohanpaul_ai</category><category>svpino</category><category>finbarrtimbers</category><category>_philschmid</category><category>multimodality</category><category>mixture-of-experts</category><category>long-context</category><category>retrieval-augmented-generation</category><category>benchmarking</category><category>software-engineering</category><category>llm-evaluation</category><category>prompt-engineering</category><category>web-scraping</category><category>python</category><category>production-applications</category></item><item><title>State of AI 2024</title><link>https://news.smol.ai/issues/24-10-10-ainews-state-of-ai-2024/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-10-ainews-state-of-ai-2024/</guid><description>**Nathan Benaich&apos;s State of AI Report** in its 7th year provides a comprehensive overview of AI research and industry trends, including highlights like **BitNet** and the synthetic data debate. 
**Cerebras** is preparing for an IPO, reflecting growth in AI compute. A hackathon hosted by **Daily** and the **Pipecat** community focuses on conversational voice AI and multimodal experiences with $20,000 in prizes. Nobel Prizes in Physics and Chemistry were awarded for AI research: **Geoffrey Hinton** and **John Hopfield** for neural networks and statistical mechanics, and **Demis Hassabis**, **John Jumper**, and **David Baker** for AlphaFold and protein structure prediction. **Meta** released **Llama 3.2** with multimodal capabilities, accompanied by educational resources and performance updates. *&quot;This recognizes the impact of deep neural networks on society&quot;* and *&quot;tremendous impact of AlphaFold and ML-powered protein structure prediction&quot;* were noted by experts.</description><pubDate>Thu, 10 Oct 2024 22:35:38 GMT</pubDate><category>cerebras</category><category>daily</category><category>pipecat</category><category>meta-ai-fair</category><category>anthropic</category><category>llama-3-2</category><category>bitnet</category><category>geoffrey-hinton</category><category>john-hopfield</category><category>demis-hassabis</category><category>john-jumper</category><category>david-baker</category><category>multimodality</category><category>synthetic-data</category><category>protein-structure-prediction</category><category>neural-networks</category><category>statistical-mechanics</category><category>conversational-ai</category><category>voice-ai</category><category>hackathon</category><category>ipo</category><category>model-release</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-10-09-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-09-ainews-not-much-happened-today/</guid><description>**Geoffrey Hinton** and **John Hopfield** won the **Nobel Prize in Physics** for foundational work on neural networks linking AI and physics. 
**Meta AI** introduced a **13B parameter audio generation model** as part of Meta Movie Gen for video-synced audio. **Anthropic** launched the **Message Batches API** enabling asynchronous processing of up to 10,000 queries at half the cost. **Together Compute** released **Flux Schnell**, a free model for 3 months. New techniques like **PrefixQuant** quantization and **Prompt Caching** for low-latency inference were highlighted by **rohanpaul_ai**. **LangGraph** added long-term memory support for persistent document storage. **Hex-LLM** framework was introduced for TPU-based low-cost, high-throughput LLM serving from Hugging Face models. Discussions on AI safety emphasized gender equality in science, and concerns about premature AI regulation by media and Hollywood were raised.</description><pubDate>Thu, 10 Oct 2024 01:02:45 GMT</pubDate><category>meta-ai-fair</category><category>anthropic</category><category>togethercompute</category><category>hugging-face</category><category>flux-schnell</category><category>geoffrey-hinton</category><category>john-hopfield</category><category>demis-hassabis</category><category>rohanpaul_ai</category><category>svpino</category><category>hwchase17</category><category>shreyar</category><category>philschmid</category><category>mmitchell_ai</category><category>bindureddy</category><category>audio-generation</category><category>quantization</category><category>prompt-caching</category><category>long-term-memory</category><category>llm-serving-framework</category><category>hallucination-detection</category><category>ai-safety</category><category>ai-governance</category></item><item><title>The AI Nobel Prize</title><link>https://news.smol.ai/issues/24-10-08-ainews-the-ai-nobel-prize/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-08-ainews-the-ai-nobel-prize/</guid><description>**Geoff Hinton** and **John Hopfield** won the **Nobel Prize in Physics** for their work on **Artificial Neural Networks**. 
The award citation spans **14 pages**, highlighting their contributions. **Zep** released a new community edition of their low-latency memory layer for AI agents, emphasizing knowledge graphs for memory. At OpenAI&apos;s DevDay, new features like a real-time voice API, vision model fine-tuning, and prompt caching with a **50% discount** on reused tokens were introduced. **Anthropic&apos;s Claude 3.5 Sonnet** was recognized as the best current model. **Reka AI Labs** updated their **Reka Flash** model with enhanced multimodal and function calling capabilities. The **GOT (Generic OCR Transformer)** achieved **98.79% accuracy** on OCR benchmarks. Discussions on open-source AI models highlighted their role in fostering competition and decentralization. Software development insights included the importance of Single Sign-On (SSO), thorough testing, and AI-assisted coding workflows. Ethical and societal topics covered critiques of tax policies and the appointment of France&apos;s first Minister of AI.</description><pubDate>Wed, 09 Oct 2024 01:33:48 
GMT</pubDate><category>openai</category><category>anthropic</category><category>reka-ai</category><category>zep</category><category>claude-3.5-sonnet</category><category>reka-flash</category><category>got</category><category>geoff-hinton</category><category>john-hopfield</category><category>philschmid</category><category>alexalbert</category><category>mervenoyann</category><category>clementdelangue</category><category>svpino</category><category>bindureddy</category><category>ylecun</category><category>rohanpaul_ai</category><category>artificial-neural-networks</category><category>nobel-prize</category><category>knowledge-graphs</category><category>memory-layers</category><category>real-time-voice-api</category><category>vision</category><category>fine-tuning</category><category>prompt-caching</category><category>multimodality</category><category>function-calling</category><category>ocr</category><category>open-source</category><category>single-sign-on</category><category>software-testing</category><category>ai-assisted-coding</category><category>ai-ethics</category></item><item><title>not much happened this weekend</title><link>https://news.smol.ai/issues/24-10-07-ainews-not-much-happened-this-weekend/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-07-ainews-not-much-happened-this-weekend/</guid><description>**AI news from 10/4/2024 to 10/7/2024** highlights several developments: **OpenAI&apos;s o1-preview** shows strong performance on complex tasks but struggles with simpler ones, while **Claude 3.5 Sonnet** can match its reasoning through advanced prompting techniques. **Meta** introduced **Movie Gen**, a cutting-edge media foundation model for text-to-video generation and editing. **Reka** updated their 21B Flash Model with temporal video understanding, native audio, and tool use capabilities. Interest grows in &quot;open o1&quot; reproductions focusing on prompting and finetuning, with **Entropix** exploring entropy-based sampling. 
**LangChainAI** demonstrated a Retrieval Agent for complex Q&amp;A, and synthetic data generation research surveyed 417 models. A resurgence in RNNs shows efficient parallel training making them competitive with Transformers. Biologically-inspired AI safety approaches were also noted. *&quot;A quiet weekend and air conditioning is all you need.&quot;*</description><pubDate>Tue, 08 Oct 2024 02:36:09 GMT</pubDate><category>openai</category><category>meta-ai-fair</category><category>reka</category><category>langchainai</category><category>entropix</category><category>o1-preview</category><category>claude-3.5-sonnet</category><category>21b-flash-model</category><category>lex-fridman</category><category>imrat</category><category>jjitsev</category><category>giffmana</category><category>_philschmid</category><category>karpathy</category><category>rasbt</category><category>adcock_brett</category><category>glennko</category><category>rohanpaul_ai</category><category>labenz</category><category>prompting-techniques</category><category>finetuning</category><category>entropy-based-sampling</category><category>temporal-understanding</category><category>native-audio</category><category>tool-use</category><category>instruction-chaining</category><category>multimodality</category><category>retrieval-augmented-generation</category><category>synthetic-data-generation</category><category>rnn</category><category>parallel-training</category><category>biologically-inspired-ai-safety</category><category>text-to-video-generation</category><category>video-editing</category></item><item><title>Contextual Document Embeddings: `cde-small-v1`</title><link>https://news.smol.ai/issues/24-10-04-ainews-contextual-document-embeddings-cde-small-v1/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-04-ainews-contextual-document-embeddings-cde-small-v1/</guid><description>**Meta** announced a new text-to-video model, **Movie Gen**, claiming superior adaptation of **Llama 3** to video 
generation compared to OpenAI&apos;s Sora Diffusion Transformers, though no release is available yet. Researchers Jack Morris and Sasha Rush introduced the **cde-small-v1** model with a novel **contextual batching** training technique and **contextual embeddings**, achieving strong performance with only **143M parameters**. **OpenAI** launched Canvas, a collaborative interface for ChatGPT with synthetic data training. **Google DeepMind** welcomed Tim Brooks to work on video generation and world simulators. Google released **Gemini 1.5 Flash-8B**, improving cost and rate limits with algorithmic efficiency.</description><pubDate>Sat, 05 Oct 2024 01:38:06 GMT</pubDate><category>meta-ai-fair</category><category>openai</category><category>google-deepmind</category><category>weights-biases</category><category>togethercompute</category><category>llama-3</category><category>cde-small-v1</category><category>gemini-1.5-flash-8b</category><category>chatgpt</category><category>jack-morris</category><category>sasha-rush</category><category>tim-brooks</category><category>demis-hassabis</category><category>karina-nguyen</category><category>contextual-embeddings</category><category>contextual-batching</category><category>video-generation</category><category>synthetic-data</category><category>model-efficiency</category><category>training-techniques</category><category>rag</category><category>algorithmic-efficiency</category></item><item><title>Canvas: OpenAI&apos;s answer to Claude Artifacts</title><link>https://news.smol.ai/issues/24-10-03-ainews-canvas-openais-answer-to-claude-artifacts/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-03-ainews-canvas-openais-answer-to-claude-artifacts/</guid><description>**OpenAI** released **Canvas**, an enhanced writing and coding tool based on **GPT-4o**, featuring inline suggestions, seamless editing, and a collaborative environment. 
Early feedback compares it to **Cursor** and **Claude Artifacts**, noting strengths and some execution issues. OpenAI also sponsors **Marijn Haverbeke**, creator of **ProseMirror** and **CodeMirror**, which are used in Canvas. The integration involved training a detector to trigger Canvas appropriately, achieving **83% accuracy** in correct triggers. Unlike Claude Artifacts, Canvas currently lacks Mermaid Diagrams and HTML preview support. Additionally, **Daily** is sponsoring a **$20,000** voice AI hackathon in San Francisco, highlighting voice AI as a key emerging skill.</description><pubDate>Thu, 03 Oct 2024 23:22:37 GMT</pubDate><category>openai</category><category>cursor_ai</category><category>daily</category><category>gpt-4o</category><category>claude-artifacts</category><category>marijn-haverbeke</category><category>karina-nguyen</category><category>vicente-silveira</category><category>swyx</category><category>inline-suggestions</category><category>collaborative-editing</category><category>code-editing</category><category>model-training</category><category>model-integration</category><category>feature-detection</category><category>accuracy-evaluation</category><category>voice-ai</category><category>hackathon</category><category>open-source-libraries</category></item><item><title>Not much technical happened today</title><link>https://news.smol.ai/issues/24-10-02-ainews-not-much-technical-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-02-ainews-not-much-technical-happened-today/</guid><description>**OpenAI** announced raising **$6.6B** in new funding at a **$157B valuation**, with ChatGPT reaching *250M weekly active users*. **Poolside** raised **$500M** to advance AGI development. **LiquidAI** introduced three new MoE models (1B, 3B, 40B) with a **32k context window** and efficient token handling. **OpenAI** released Whisper V3 Turbo, an open-source multilingual model with significant speed improvements. 
**Meta AI FAIR** is hiring research interns focusing on **LLM reasoning, alignment, synthetic data, and novel architectures**. **Cohere** partnered with Fujitsu to launch Takane, a custom Japanese model. Technical discussions included challenges in **LoRA fine-tuning**, **float8 quantization** in Keras, and new tools like **create-llama** for agent templates. Industry commentary raised concerns about AI development priorities and highlighted freelancing opportunities in AI.</description><pubDate>Wed, 02 Oct 2024 22:45:37 GMT</pubDate><category>openai</category><category>poolside</category><category>liquidai</category><category>perplexity-ai</category><category>meta-ai-fair</category><category>cohere</category><category>fujitsu</category><category>whisper-v3-turbo</category><category>llama-3</category><category>llamaindex</category><category>nick-turley</category><category>arav-srinivas</category><category>francois-fleuret</category><category>finbarr-timbers</category><category>lewtun</category><category>francois-chollet</category><category>jerry-j-liu</category><category>mmitchell-ai</category><category>jxnlco</category><category>mixture-of-experts</category><category>context-windows</category><category>model-optimization</category><category>fine-tuning</category><category>quantization</category><category>model-training</category><category>alignment</category><category>synthetic-data</category><category>model-architecture</category><category>agentic-ai</category></item><item><title>OpenAI Realtime API and other Dev Day Goodies</title><link>https://news.smol.ai/issues/24-10-01-ainews-openai-realtime-api-and-other-dev-day-goodies/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-10-01-ainews-openai-realtime-api-and-other-dev-day-goodies/</guid><description>**OpenAI** launched the **gpt-4o-realtime-preview** Realtime API featuring text and audio token processing with pricing details and future plans including vision and video support. 
The API supports voice activity detection modes, function calling, and ephemeral sessions with auto-truncation for context limits. Partnerships with **LiveKit**, **Agora**, and **Twilio** enhance audio components and AI virtual agent voice calls. Additionally, OpenAI introduced vision fine-tuning with only 100 examples improving mapping accuracy for **Grab** and RPA success for **Automat**. Model distillation and prompt caching features were also announced, including free eval inference for users opting to share data.</description><pubDate>Wed, 02 Oct 2024 06:06:20 GMT</pubDate><category>openai</category><category>livekit</category><category>agora</category><category>twilio</category><category>grab</category><category>automat</category><category>gpt-4o-realtime-preview</category><category>gpt-4o</category><category>voice-activity-detection</category><category>function-calling</category><category>ephemeral-sessions</category><category>auto-truncation</category><category>vision-fine-tuning</category><category>model-distillation</category><category>prompt-caching</category><category>audio-processing</category></item><item><title>Liquid Foundation Models: A New Transformers alternative + AINews Pod 2</title><link>https://news.smol.ai/issues/24-09-30-ainews-liquid-foundation-models-a-new-transformers-alternative-ainews-pod-2/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-30-ainews-liquid-foundation-models-a-new-transformers-alternative-ainews-pod-2/</guid><description>**Liquid.ai** emerged from stealth with three subquadratic foundation models demonstrating superior efficiency compared to state space models and Apple’s on-device and server models, backed by a $37M seed round. **Meta AI** announced **Llama 3.2** with multimodal vision-enabled models and lightweight text-only variants for mobile. 
**Google DeepMind** introduced production-ready **Gemini-1.5-Pro-002** and **Gemini-1.5-Flash-002** models with improved pricing and rate limits, alongside **AlphaChip**, an AI-driven chip design system using reinforcement learning for rapid superhuman layouts. **OpenAI** enhanced ChatGPT Plus and Teams with Advanced Voice Mode featuring Custom Instructions, Memory, and new nature-inspired voices. California&apos;s governor vetoed the SB-1047 AI regulation bill, a move celebrated by AI community figures like **ylecun** and **svpino** as a win for open-source AI. Google upgraded **NotebookLM** with audio overviews supporting YouTube and audio files, turning documents into AI-generated podcasts. *&quot;Open source in AI is thriving,&quot;* noted **ylecun**, highlighting 1 million models on GitHub and Hugging Face.</description><pubDate>Tue, 01 Oct 2024 01:34:19 GMT</pubDate><category>liquid-ai</category><category>meta-ai-fair</category><category>google-deepmind</category><category>openai</category><category>llama-3-2</category><category>gemini-1.5-pro-002</category><category>gemini-1.5-flash-002</category><category>ylecun</category><category>svpino</category><category>reinforcement-learning</category><category>multimodality</category><category>model-efficiency</category><category>foundation-models</category><category>audio-processing</category><category>model-deployment</category><category>open-source</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-09-27-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-27-ainews-not-much-happened-today/</guid><description>**Meta** released **Llama 3.2**, including lightweight 1B and 3B models for on-device AI with capabilities like summarization and retrieval-augmented generation. **Molmo**, a new multimodal model, was introduced with a large dense captioning dataset. 
**Google DeepMind** announced **AlphaChip**, an AI-driven chip design method improving TPU and CPU designs. **Hugging Face** surpassed 1 million free public models, highlighting the value of smaller specialized models. Discussions covered challenges in scaling RAG applications, the future of on-device AI running ChatGPT-level models, reliability issues in larger LLMs, and new Elo benchmarking accepted at NeurIPS 2024. AI ethics and regulation topics included free speech responsibilities and California&apos;s SB-1047 bill potentially affecting open-source AI. *&quot;AlphaChip transformed computer chip design,&quot;* and *&quot;ChatGPT-level AI on mobile devices predicted within a year.&quot;*</description><pubDate>Fri, 27 Sep 2024 21:53:11 GMT</pubDate><category>meta-ai-fair</category><category>google-deepmind</category><category>hugging-face</category><category>llama-3-2</category><category>llama-3</category><category>molmo</category><category>demis-hassabis</category><category>clementdelangue</category><category>svpino</category><category>awnihannun</category><category>osanseviero</category><category>omarsar0</category><category>sarahookr</category><category>ylecun</category><category>on-device-ai</category><category>multimodality</category><category>chip-design</category><category>retrieval-augmented-generation</category><category>rag</category><category>benchmarking</category><category>reliability</category><category>ai-regulation</category><category>free-speech</category><category>pytorch-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-09-26-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-26-ainews-not-much-happened-today/</guid><description>**Meta AI** released **Llama 3.2** models including **1B, 3B text-only** and **11B, 90B vision** variants with **128K token context length** and adapter layers for image-text integration. 
These models outperform competitors like **Gemma 2** and **Phi 3.5-mini**, and are supported on major platforms including **AWS, Azure, and Google Cloud**. **OpenAI CTO Mira Murati** announced her departure. **Allen AI** released **Molmo**, an open-source multimodal model family outperforming proprietary systems. **Google** improved **Gemini 1.5** with Flash and Pro models. **Meta** showcased **Project Orion AR glasses** and hinted at a **Quest 3S** priced at $300. Discussions covered new benchmarks for multimodal models, model optimization, and AI safety and alignment.</description><pubDate>Thu, 26 Sep 2024 22:52:11 GMT</pubDate><category>meta-ai-fair</category><category>openai</category><category>allenai</category><category>google-deepmind</category><category>llama-3-2</category><category>llama-3</category><category>gemma-2</category><category>phi-3-5-mini</category><category>claude-3-haiku</category><category>gpt-4o-mini</category><category>molmo</category><category>gemini-1.5</category><category>gemini</category><category>mira-murati</category><category>demis-hassabis</category><category>ylecun</category><category>sama</category><category>multimodality</category><category>model-optimization</category><category>benchmarks</category><category>ai-safety</category><category>model-distillation</category><category>pruning</category><category>adapter-layers</category><category>open-source-models</category><category>performance</category><category>context-windows</category></item><item><title>Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)</title><link>https://news.smol.ai/issues/24-09-25-ainews-llama-32-on-device-1b3b-and-multimodal-11b90b-with-ai2-molmo-kicker/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-25-ainews-llama-32-on-device-1b3b-and-multimodal-11b90b-with-ai2-molmo-kicker/</guid><description>**Meta** released **Llama 3.2** with new multimodal versions including **3B** and **20B** vision adapters on a frozen 
Llama 3.1, showing competitive performance against **Claude Haiku** and **GPT-4o-mini**. **AI2** launched multimodal **Molmo 72B** and **7B** models outperforming Llama 3.2 in vision tasks. Meta also introduced new **128k-context 1B and 3B models** competing with **Gemma 2** and **Phi 3.5**, and hinted at collaborations with **Qualcomm**, **Mediatek**, and **Arm** for on-device AI. The release cites a **9 trillion token** training count for Llama 1B and 3B. Partner launches include **Ollama**, **Together AI** offering free 11B model access, and **Fireworks AI**. Additionally, a new **RAG++ course** from **Weights &amp; Biases**, **Cohere**, and **Weaviate** offers systematic evaluation and deployment guidance for retrieval-augmented generation systems based on extensive production experience.</description><pubDate>Wed, 25 Sep 2024 23:54:30 GMT</pubDate><category>meta-ai-fair</category><category>ai2</category><category>qualcomm</category><category>mediatek</category><category>arm</category><category>ollama</category><category>together-ai</category><category>fireworks-ai</category><category>weights-biases</category><category>cohere</category><category>weaviate</category><category>llama-3-2</category><category>llama-3-1</category><category>claude-3-haiku</category><category>gpt-4o-mini</category><category>molmo-72b</category><category>molmo-7b</category><category>gemma-2</category><category>phi-3-5</category><category>llama-3-2-vision</category><category>llama-3-2-3b</category><category>llama-3-2-20b</category><category>mira-murati</category><category>daniel-han</category><category>multimodality</category><category>vision</category><category>context-windows</category><category>quantization</category><category>model-release</category><category>tokenization</category><category>model-performance</category><category>model-optimization</category><category>rag</category><category>model-training</category><category>instruction-following</category></item><item><title>ChatGPT Advanced 
Voice Mode</title><link>https://news.smol.ai/issues/24-09-24-ainews-chatgpt-advanced-voice-mode/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-24-ainews-chatgpt-advanced-voice-mode/</guid><description>**OpenAI** rolled out **ChatGPT Advanced Voice Mode** with 5 new voices and improved accent and language support, available widely in the US. Ahead of rumored updates for **Llama 3** and **Claude 3.5**, **Gemini Pro** saw a significant price cut aligning with the new intelligence frontier pricing. **OpenAI&apos;s o1-preview model** showed promising planning task performance with 52.8% accuracy on Randomized Mystery Blocksworld. **Anthropic** is rumored to release a new model, generating community excitement. **Qwen 2.5** was released with models up to 32B parameters and support for 128K tokens, matching GPT-4 0613 benchmarks. Research highlights include PlanBench evaluation of o1-preview, OpenAI&apos;s release of a multilingual MMMLU dataset covering 14 languages, and RAGLAB framework standardizing Retrieval-Augmented Generation research. New AI tools include PDF2Audio for converting PDFs to audio, an open-source AI starter kit for local model deployment, and **Moshi**, a speech-based AI assistant from Kyutai. Industry updates feature **Scale AI** nearing $1B ARR with 4x YoY growth and **Together Compute&apos;s** enterprise platform offering faster inference and cost reductions. 
Insights from **Sam Altman**&apos;s blog post were also shared.</description><pubDate>Wed, 25 Sep 2024 01:31:24 GMT</pubDate><category>openai</category><category>anthropic</category><category>scale-ai</category><category>togethercompute</category><category>kyutai-labs</category><category>o1-preview</category><category>qwen-2.5</category><category>llama-3</category><category>claude-3.5</category><category>sam-altman</category><category>omarsar0</category><category>bindureddy</category><category>rohanpaul_ai</category><category>_philschmid</category><category>alexandr_wang</category><category>svpino</category><category>ylecun</category><category>_akhaliq</category><category>voice-synthesis</category><category>planning</category><category>multilingual-datasets</category><category>retrieval-augmented-generation</category><category>open-source</category><category>speech-assistants</category><category>enterprise-ai</category><category>price-cuts</category><category>benchmarking</category><category>model-performance</category></item><item><title>a calm before the storm</title><link>https://news.smol.ai/issues/24-09-23-ainews-a-calm-before-the-storm/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-23-ainews-a-calm-before-the-storm/</guid><description>**Anthropic** is raising funds at a valuation up to **$40 billion** ahead of anticipated major releases. **OpenAI** launched new reasoning models **o1** and **o1-mini**, with increased rate limits and a multilingual MMLU benchmark. **Alibaba** released the open-source **Qwen2.5** model supporting 29+ languages, showing competitive performance to **gpt-4** at lower cost. **Microsoft** and **Blackrock** plan to invest **$30 billion** in AI data centers, with **Groq** partnering with Aramco to build the world&apos;s largest AI inference center. Robotics advances include Disney Research and ETH Zurich&apos;s diffusion-based motion generation for robots and Pudu Robotics&apos; semi-humanoid robot. 
Slack and Microsoft introduced AI-powered agents integrated into their platforms. Research highlights include long-context scaling for **llama-2-70b** using Dual Chunk Attention and KV cache quantization enabling 1 million token context on **llama-7b** models.</description><pubDate>Mon, 23 Sep 2024 23:33:49 GMT</pubDate><category>anthropic</category><category>openai</category><category>alibaba</category><category>microsoft</category><category>blackrock</category><category>groq</category><category>aramco</category><category>disney</category><category>eth-zurich</category><category>pudu-robotics</category><category>slack</category><category>o1</category><category>o1-mini</category><category>qwen2.5</category><category>gpt-4</category><category>llama-2-70b</category><category>llama-7b</category><category>adcock_brett</category><category>philschmid</category><category>rohanpaul_ai</category><category>jvnixon</category><category>kateclarktweets</category><category>sama</category><category>long-context</category><category>kv-cache-quantization</category><category>diffusion-models</category><category>reinforcement-learning</category><category>robotics</category><category>ai-integration</category><category>multilinguality</category><category>model-benchmarking</category><category>model-performance</category><category>model-optimization</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-09-20-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-20-ainews-not-much-happened-today/</guid><description>**Anthropic** introduced a RAG technique called Contextual Retrieval that reduces retrieval failure rates by 67% using prompt caching. **Meta** is teasing multimodal **Llama 3** ahead of Meta Connect. **OpenAI** is hiring for a multi-agent research team focusing on improved AI reasoning with their **o1 models**, which have sparked mixed reactions. 
**DeepSeek 2.5** is noted as a cost-effective alternative to **GPT-4** and **Claude 3.5 sonnet**. New models like **3DTopia-XL** for 3D asset generation and **CogVideoX** for image-to-video conversion were highlighted. Techniques to boost reasoning by re-reading questions and combining retrieval with prompt caching were shared. Industry insights emphasize the necessity of AI adoption in enterprises and the disruption of traditional ML businesses. Tools like **LangChainAI&apos;s LangGraph Templates** and **LlamaIndex&apos;s LlamaParse Premium** enhance agentic applications and multimodal content extraction. Discussions on LLM evals and caching highlight production challenges and improvements. *&quot;Companies not allowing developers to use AI are unlikely to succeed&quot;* was a key sentiment.</description><pubDate>Sat, 21 Sep 2024 01:37:46 GMT</pubDate><category>anthropic</category><category>meta-ai-fair</category><category>openai</category><category>deepseek-ai</category><category>llamaindex</category><category>langchainai</category><category>llama-3</category><category>o1</category><category>deepseek-2.5</category><category>gpt-4</category><category>claude-3.5-sonnet</category><category>3dtopia-xl</category><category>cogvideox</category><category>retrieval-augmented-generation</category><category>prompt-caching</category><category>multimodality</category><category>multi-agent-systems</category><category>reasoning</category><category>diffusion-models</category><category>image-to-video</category><category>prompting</category><category>enterprise-ai</category><category>agentic-ai</category><category>long-context</category><category>model-evaluation</category><category>caching</category><category>model-cost-efficiency</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-09-19-ainews-not-much-happened-today/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-09-19-ainews-not-much-happened-today/</guid><description>**OpenAI&apos;s o1-preview and o1-mini models** lead benchmarks in Math, Hard Prompts, and Coding. **Qwen 2.5 72B** model shows strong performance close to **GPT-4o**. **DeepSeek-V2.5** tops Chinese LLMs, rivaling **GPT-4-Turbo-2024-04-09**. **Microsoft&apos;s GRIN MoE** achieves good results with 6.6B active parameters. **Moshi voice model** from Kyutai Labs runs locally on Apple Silicon Macs. **Perplexity app** introduces voice mode with push-to-talk. **LlamaCoder** by Together.ai uses **Llama 3.1 405B** for app generation. **Google DeepMind&apos;s Veo** is a new generative video model for YouTube Shorts. The **2024 ARC-AGI competition** increases prize money and plans a university tour. A survey on model merging covers 50+ papers for LLM alignment. The **Kolmogorov–Arnold Transformer (KAT)** paper proposes replacing MLP layers with KAN layers for better expressiveness. **Hugging Face Hub** integrates with **Google Cloud Vertex AI Model Garden** for easier open-source model deployment. **Agent.ai** is introduced as a professional network for AI agents. 
*&quot;Touching grass is all you need.&quot;*</description><pubDate>Fri, 20 Sep 2024 01:00:56 GMT</pubDate><category>openai</category><category>qwen</category><category>deepseek-ai</category><category>microsoft</category><category>kyutai-labs</category><category>perplexity-ai</category><category>together-ai</category><category>meta-ai-fair</category><category>google-deepmind</category><category>hugging-face</category><category>google</category><category>anthropic</category><category>o1-preview</category><category>o1-mini</category><category>qwen-2.5</category><category>gpt-4o</category><category>deepseek-v2.5</category><category>gpt-4-turbo-2024-04-09</category><category>grin</category><category>llama-3-1-405b</category><category>veo</category><category>kat</category><category>hyung-won-chung</category><category>noam-brown</category><category>bindureddy</category><category>akhaliq</category><category>karpathy</category><category>aravsrinivas</category><category>fchollet</category><category>cwolferesearch</category><category>philschmid</category><category>labenz</category><category>ylecun</category><category>benchmarking</category><category>math</category><category>coding</category><category>instruction-following</category><category>model-merging</category><category>model-expressiveness</category><category>moe</category><category>voice</category><category>voice-models</category><category>generative-video</category><category>competition</category><category>open-source</category><category>model-deployment</category><category>ai-agents</category></item><item><title>o1 destroys Lmsys Arena, Qwen 2.5, Kyutai Moshi release</title><link>https://news.smol.ai/issues/24-09-18-ainews-o1-destroys-lmsys-arena-qwen-25-kyutai-moshi-release/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-18-ainews-o1-destroys-lmsys-arena-qwen-25-kyutai-moshi-release/</guid><description>**OpenAI&apos;s o1-preview** model has achieved a milestone by fully matching top daily AI news 
stories without human intervention, consistently outperforming models from **Anthropic** and **Google**, as well as **Llama 3**, in vibe check evaluations. **OpenAI** models dominate the top 4 slots on **LMsys** benchmarks, with rate limits increasing to **500-1000 requests per minute**. In open source, **Alibaba&apos;s Qwen 2.5** suite surpasses **Llama 3.1** at the 70B scale, and its updated closed **Qwen-Plus** models outperform **DeepSeek V2.5** but still lag behind leading American models. **Kyutai Moshi** released its open-weights real-time voice model featuring a unique streaming neural architecture with an &quot;inner monologue.&quot; **Weights &amp; Biases** introduced **Weave**, an LLM observability toolkit that enhances experiment tracking and evaluation, turning prompting into a more scientific process. The news also highlights upcoming events like the **WandB LLM-as-judge hackathon** in San Francisco. *&quot;o1-preview consistently beats out our vibe check evals&quot;* and *&quot;OpenAI models are gradually raising rate limits by the day.&quot;*</description><pubDate>Wed, 18 Sep 2024 21:51:26 GMT</pubDate><category>openai</category><category>anthropic</category><category>google</category><category>alibaba</category><category>deepseek</category><category>kyutai</category><category>weights-biases</category><category>mistral-ai</category><category>o1-preview</category><category>o1-mini</category><category>qwen-2.5</category><category>qwen-plus</category><category>llama-3-1</category><category>deepseek-v2.5</category><category>sama</category><category>guillaumelample</category><category>chain-of-thought</category><category>multimodality</category><category>model-benchmarking</category><category>model-performance</category><category>streaming-neural-architecture</category><category>llm-observability</category><category>experiment-tracking</category><category>rate-limiting</category></item><item><title>nothing much happened 
today</title><link>https://news.smol.ai/issues/24-09-17-ainews-nothing-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-17-ainews-nothing-much-happened-today/</guid><description>**OpenAI&apos;s o1 model** faces skepticism about open-source replication due to its extreme restrictions and unique training advances like RL on CoT. **ChatGPT-4o** shows significant performance improvements across benchmarks. **Llama-3.1-405b** fp8 and bf16 versions perform similarly with cost benefits for fp8. A new open-source benchmark &quot;Humanity&apos;s Last Exam&quot; offers $500K in prizes to challenge LLMs. Model merging benefits from neural network sparsity and linear mode connectivity. Embedding-based toxic prompt detection achieves high accuracy with low compute. **InstantDrag** enables fast, optimization-free drag-based image editing. **LangChain v0.3** releases with improved dependency management. Automated code review tool **CodeRabbit** adapts to team coding styles. Visual search advances integrate multimodal data for better product search. 
Experts predict AI will be default software by 2030.</description><pubDate>Wed, 18 Sep 2024 00:27:31 GMT</pubDate><category>openai</category><category>lmsys</category><category>scale-ai</category><category>cognition</category><category>langchain</category><category>qdrant</category><category>rohanpaul_ai</category><category>o1</category><category>chatgpt-4o</category><category>llama-3-1-405b</category><category>denny_zhou</category><category>svpino</category><category>alexandr_wang</category><category>cwolferesearch</category><category>rohanpaul_ai</category><category>_akhaliq</category><category>kylebrussell</category><category>reinforcement-learning</category><category>model-merging</category><category>embedding-models</category><category>toxicity-detection</category><category>image-editing</category><category>dependency-management</category><category>automated-code-review</category><category>visual-search</category><category>benchmarking</category></item><item><title>a quiet weekend</title><link>https://news.smol.ai/issues/24-09-16-ainews-a-quiet-weekend/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-16-ainews-a-quiet-weekend/</guid><description>**OpenAI** released the new **o1** model, leveraging reinforcement learning and chain-of-thought prompting to excel in reasoning benchmarks, achieving an IQ-like score of **120**. **Google DeepMind** introduced **DataGemma** to reduce hallucinations by connecting LLMs with real-world data, and unveiled **ALOHA** and **DemoStart** for robot dexterity using diffusion methods. **Adobe** previewed its **Firefly AI Video Model** with text-to-video and generative extend features. **Mistral** launched the multimodal **Pixtral 12B** model, and **Tencent** presented the **GameGen-O** open-world video game generation model. Several research papers from **Stanford**, **OpenAI**, **Microsoft**, **Mila**, and **Notre Dame** focus on advanced reasoning, self-verification, and reflection tuning techniques. 
Experts like **Terence Tao** and **George Hotz** have shared mixed but optimistic views on o1&apos;s capabilities. Seed funding rounds include **Supermaven** ($12M) and **11x** ($24M).</description><pubDate>Tue, 17 Sep 2024 00:28:09 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>adobe</category><category>mistral-ai</category><category>tencent</category><category>supermaven</category><category>11x</category><category>cohere</category><category>anthropic</category><category>latent-space-university</category><category>stanford</category><category>microsoft</category><category>mila</category><category>notre-dame</category><category>o1</category><category>datagemma</category><category>aloha</category><category>demostart</category><category>firefly-ai-video-model</category><category>pixtral-12b</category><category>gamegen-o</category><category>george-hotz</category><category>terence-tao</category><category>adcock_brett</category><category>rohanpaul_ai</category><category>bindureddy</category><category>fchollet</category><category>philschmid</category><category>reinforcement-learning</category><category>chain-of-thought</category><category>reasoning</category><category>robotics</category><category>diffusion-models</category><category>multimodality</category><category>video-generation</category><category>model-training</category><category>reflection-tuning</category><category>mathematical-reasoning</category><category>model-benchmarking</category><category>fine-tuning</category></item><item><title>Learnings from o1 AMA</title><link>https://news.smol.ai/issues/24-09-13-ainews-learnings-from-o1-ama/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-13-ainews-learnings-from-o1-ama/</guid><description>**OpenAI** released the **o1 model series**, touted as their &quot;most capable and aligned models yet,&quot; trained with reinforcement learning to enhance reasoning. 
The **o1-preview** model scored **21% on ARC-AGI**, **~80% on aider code editing** (surpassing Claude 3.5 Sonnet&apos;s 77%), and **~52% on Cognition-Golden**, showcasing a shift from memorizing answers to memorizing reasoning. The model employs a unique chain-of-thought approach enabling &quot;System II thinking&quot; for better problem-solving. Experts like **Andrew Mayne** advise framing o1 as a smart friend providing thoughtful explanations. Additionally, an advanced RAG course sponsored by **Weights &amp; Biases**, **Cohere**, and **Weaviate** offers strategies for hybrid search and prompting to optimize AI solutions.</description><pubDate>Sat, 14 Sep 2024 00:55:34 GMT</pubDate><category>openai</category><category>weights-biases</category><category>cohere</category><category>weaviate</category><category>o1-preview</category><category>o1-mini</category><category>claude-3.5-sonnet</category><category>gpt-4o</category><category>sama</category><category>rohanpaul_ai</category><category>gdb</category><category>andrew-mayne</category><category>reinforcement-learning</category><category>chain-of-thought</category><category>reasoning</category><category>model-performance</category><category>prompting</category><category>code-editing</category><category>rag</category><category>hybrid-search</category></item><item><title>o1: OpenAI&apos;s new general reasoning models</title><link>https://news.smol.ai/issues/24-09-12-ainews-o1-openais-new-general-reasoning-models/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-12-ainews-o1-openais-new-general-reasoning-models/</guid><description>**OpenAI** has released the **o1** model family, including **o1-preview** and **o1-mini**, focusing on test-time reasoning with extended output token limits over 30k tokens. 
The models show strong performance, ranking in the 89th percentile on competitive programming, excelling in USA Math Olympiad qualifiers, and surpassing PhD-level accuracy on physics, biology, and chemistry benchmarks. Notably, **o1-mini** performs impressively despite its smaller size compared to **gpt-4o**. The release highlights new scaling laws for test-time compute that scale log-linearly. Additionally, **Nvidia** is reportedly losing AI chip market share to startups, with a shift in developer preference from CUDA to **llama** models for web development, though Nvidia remains dominant in training. This news reflects significant advances in reasoning-focused models and shifts in AI hardware competition.</description><pubDate>Fri, 13 Sep 2024 01:18:57 GMT</pubDate><category>openai</category><category>nvidia</category><category>o1</category><category>o1-preview</category><category>o1-mini</category><category>gpt-4o</category><category>llama</category><category>jason-wei</category><category>jim-fan</category><category>test-time-reasoning</category><category>reasoning-tokens</category><category>token-limit</category><category>competitive-programming</category><category>benchmarking</category><category>scaling-laws</category><category>ai-chip-competition</category><category>inference</category><category>training</category><category>model-performance</category></item><item><title>Pixtral 12B: Mistral beats Llama to Multimodality</title><link>https://news.smol.ai/issues/24-09-11-ainews-pixtral-12b-mistral-beats-llama-to-multimodality/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-11-ainews-pixtral-12b-mistral-beats-llama-to-multimodality/</guid><description>**Mistral AI** released **Pixtral 12B**, an open-weights **vision-language model** with a **Mistral Nemo 12B** text backbone and a 400M vision adapter, featuring a large vocabulary of **131,072 tokens** and support for **1024x1024 pixel images**. 
This release notably beat **Meta AI** in launching an open multimodal model. At the Mistral AI Summit, architecture details and benchmark performances were shared, showing strong OCR and screen understanding capabilities. Additionally, **Arcee AI** announced **SuperNova**, a distilled **Llama 3.1 70B &amp; 8B** model outperforming Meta&apos;s Llama 3.1 70B instruct on benchmarks. **DeepSeek** released **DeepSeek-V2.5**, scoring **89 on HumanEval**, surpassing **GPT-4-Turbo**, Opus, and Llama 3.1 in coding tasks. **OpenAI** plans to release **Strawberry** as part of ChatGPT soon, though its capabilities are debated. **Anthropic** introduced Workspaces for managing multiple Claude deployments with enhanced access controls.</description><pubDate>Thu, 12 Sep 2024 00:30:22 GMT</pubDate><category>mistral-ai</category><category>meta-ai-fair</category><category>hugging-face</category><category>arcee-ai</category><category>deepseek-ai</category><category>openai</category><category>anthropic</category><category>pixtral-12b</category><category>mistral-nemo-12b</category><category>llama-3-1-70b</category><category>llama-3-1-8b</category><category>deeps-eek-v2-5</category><category>gpt-4-turbo</category><category>llama-3-1</category><category>strawberry</category><category>claude</category><category>reach_vb</category><category>devendra_chapilot</category><category>_philschmid</category><category>rohanpaul_ai</category><category>vision</category><category>multimodality</category><category>ocr</category><category>benchmarking</category><category>model-release</category><category>model-architecture</category><category>model-performance</category><category>fine-tuning</category><category>model-deployment</category><category>reasoning</category><category>code-generation</category><category>api</category><category>access-control</category></item><item><title>not much happened today + AINews 
Podcast?</title><link>https://news.smol.ai/issues/24-09-10-ainews-not-much-happened-today-ainews-podcast/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-10-ainews-not-much-happened-today-ainews-podcast/</guid><description>**Glean** doubled its valuation again. **Dan Hendrycks&apos; Superforecaster AI** generates plausible election forecasts with interesting prompt engineering. A **Stanford** study found that **LLM-generated research ideas** are statistically more novel than those by expert humans. **SambaNova** announced faster inference for **llama-3** models, surpassing **Cerebras**. **Benjamin Clavie** gave a notable talk on retrieval-augmented generation techniques. **Strawberry** is reported to launch in two weeks. **Google Illuminate** offers AI-generated podcast discussions about papers and books. **Apple** unveiled new AI features in iOS 18, including visual intelligence and improved Siri, with on-device and cloud processing for camera-based event additions. The **Reflection 70B** model sparked controversy over performance claims. Experts highlighted the unreliability of traditional benchmarks like MMLU and HumanEval, recommending alternative evaluation methods such as LMSys Chatbot Arena and Hugging Face&apos;s open-sourced **Lighteval** suite. 
The AI research community continues to explore AI&apos;s role in generating novel research ideas and improving benchmarking.</description><pubDate>Wed, 11 Sep 2024 02:24:16 GMT</pubDate><category>glean</category><category>sambanova</category><category>cerebras</category><category>stanford</category><category>google</category><category>apple</category><category>hugging-face</category><category>lmsys</category><category>superforecaster-ai</category><category>llama-3</category><category>reflection-70b</category><category>danhendrycks</category><category>benjamin-clavie</category><category>bclavie</category><category>bindureddy</category><category>swyx</category><category>borismpower</category><category>corbtt</category><category>drjimfan</category><category>clementdelangue</category><category>rohanpaul_ai</category><category>prompt-engineering</category><category>research-ideas</category><category>inference-speed</category><category>retrieval-augmented-generation</category><category>evaluation-methods</category><category>visual-intelligence</category><category>on-device-ai</category><category>model-performance</category><category>benchmarking</category><category>novelty-detection</category></item><item><title>AIPhone 16: the Visual Intelligence Phone</title><link>https://news.smol.ai/issues/24-09-09-ainews-aiphone-16-the-visual-intelligence-phone/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-09-ainews-aiphone-16-the-visual-intelligence-phone/</guid><description>**Apple** announced the new **iPhone 16** lineup featuring **Visual Intelligence**, a new AI capability integrated with Camera Control, Apple Maps, and Siri, emphasizing privacy and default service use over third-party AI like OpenAI. **Apple Photos** now includes advanced video understanding with timestamp recognition. Meanwhile, **Reflection-70B** claims to be a top open-source model but benchmarks show it performs close to **Llama 3 70B** and slightly worse than **Qwen 2 72B**. 
**Yann LeCun** highlighted ongoing challenges with LLM planning abilities, noting models like **Llama-3.1-405b** and **Claude** show some skill, while **GPT-4** and **Gemini** lag behind. **Weights &amp; Biases** is sponsoring an event to advance LLM evaluation techniques with prizes and API access.</description><pubDate>Mon, 09 Sep 2024 23:00:14 GMT</pubDate><category>apple</category><category>openai</category><category>weights-biases</category><category>reflection-70b</category><category>llama-3-70b</category><category>qwen-2-72b</category><category>llama-3-1-405b</category><category>claude</category><category>gpt-4</category><category>gemini</category><category>yann-lecun</category><category>vision</category><category>video-understanding</category><category>benchmarking</category><category>planning</category><category>model-evaluation</category><category>privacy</category><category>ai-integration</category><category>instruction-following</category></item><item><title>Reflection 70B, by Matt from IT Department</title><link>https://news.smol.ai/issues/24-09-06-ainews-reflection-70b-by-matt-from-it-department/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-06-ainews-reflection-70b-by-matt-from-it-department/</guid><description>**Reflection Tuning** technique has been used by a two-person team from **Hyperwrite** and **Glaive** to finetune **llama-3.1-70b**, showing strong performance improvements with minimal synthetic data. The approach builds on the concept of adding `thinking` and `reflection` steps to outputs, related to the **Chain of Thought** method. Despite some criticisms like contamination concerns, worse coding performance, and reliance on system prompts, the model has received positive reception and comparisons to **claude-3.5-sonnet**. 
The work highlights efficient instruction tuning and synthetic data generation for large models.</description><pubDate>Sat, 07 Sep 2024 01:17:07 GMT</pubDate><category>hyperwrite</category><category>glaive</category><category>llama-3.1-70b</category><category>llama-3</category><category>claude-3.5-sonnet</category><category>matt-shumer</category><category>sahil-chaudhary</category><category>fine-tuning</category><category>chain-of-thought</category><category>instruction-following</category><category>synthetic-data</category><category>quantization</category><category>model-evaluation</category><category>prompt-engineering</category></item><item><title>Replit Agent - How did everybody beat Devin to market?</title><link>https://news.smol.ai/issues/24-09-05-ainews-replit-agent-how-did-everybody-beat-devin-to-market/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-05-ainews-replit-agent-how-did-everybody-beat-devin-to-market/</guid><description>**Replit Agent** launched as a fully integrated Web IDE enabling text-to-app generation with planning and self-healing, available immediately to paid users without a waitlist. Other notable developments include **Melodio**, a new text-to-music model, and **Together AI**&apos;s kernel and speculative decoding work. **Anthropic AI** announced a new enterprise plan featuring a **500K context window** and enhanced security. Discussions on **JPEG-LM** and **AVC-LM** models for improved image and video generation, and GPU market trends around the **H100 GPU** pricing were highlighted. 
Influential voices like **Andrej Karpathy** shared insights on AI agents and automation.</description><pubDate>Fri, 06 Sep 2024 01:54:59 GMT</pubDate><category>replit</category><category>anthropic</category><category>togethercompute</category><category>jpeg-lm</category><category>avc-lm</category><category>andrej-karpathy</category><category>mervenoyann</category><category>bindureddy</category><category>rohanpaul_ai</category><category>leptonai</category><category>teortaxestex</category><category>document-retrieval</category><category>retrieval-augmented-generation</category><category>ai-agents</category><category>image-generation</category><category>video-generation</category><category>context-windows</category><category>gpu-pricing</category><category>enterprise-ai</category><category>self-healing</category><category>text-to-music</category></item><item><title>$1150m for SSI, Sakana, You.com + Claude 500m context</title><link>https://news.smol.ai/issues/24-09-04-ainews-dollar1150m-for-ssi-sakana-youcom-claude-500m-context/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-04-ainews-dollar1150m-for-ssi-sakana-youcom-claude-500m-context/</guid><description>**Safe Superintelligence** raised **$1 billion** at a **$5 billion** valuation, focusing on safety and search approaches as hinted by Ilya Sutskever. **Sakana AI** secured a **$100 million Series A** funding round, emphasizing nature-inspired collective intelligence. **You.com** pivoted to a ChatGPT-like productivity agent after a **$50 million Series B** round, while **Perplexity AI** raised over **$250 million** this summer. **Anthropic** launched Claude for Enterprise with a **500K token context window**. **AI2** released a **64-expert Mixture-of-Experts (MoE) model** called OLMoE, outperforming Llama2-13B-Chat. Key AI research trends include efficient MoE architectures, challenges in AI alignment and GPU costs, and emerging AI agents for autonomous tasks. 
Innovations in AI development feature command and control for video generation, Retrieval-Augmented Generation (RAG) efficiency, and GitHub integration under Anthropic&apos;s Enterprise plan. *&quot;Our logo is meant to invoke the idea of a school of fish coming together and forming a coherent entity from simple rules as we want to make use of ideas from nature such as evolution and collective intelligence in our research.&quot;*</description><pubDate>Thu, 05 Sep 2024 03:25:36 GMT</pubDate><category>safe-superintelligence</category><category>sakana-ai</category><category>you-com</category><category>perplexity-ai</category><category>anthropic</category><category>ai2</category><category>olmo</category><category>llama2-13b-chat</category><category>claude</category><category>claude-3.5-sonnet</category><category>ilya-sutskever</category><category>mervenoyann</category><category>yuchenj_uw</category><category>rohanpaul_ai</category><category>ctojunior</category><category>omarsar0</category><category>mixture-of-experts</category><category>model-architecture</category><category>model-training</category><category>gpu-costs</category><category>retrieval-augmented-generation</category><category>video-generation</category><category>ai-alignment</category><category>enterprise-ai</category><category>agentic-ai</category><category>command-and-control</category></item><item><title>Everybody shipped small things this holiday weekend</title><link>https://news.smol.ai/issues/24-09-03-ainews-everybody-shipped-small-things-this-holiday-weekend/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-09-03-ainews-everybody-shipped-small-things-this-holiday-weekend/</guid><description>**xAI** announced the **Colossus 100k H100 cluster** capable of training an FP8 GPT-4 class model in 4 days. **Google** introduced **Structured Output** for **Gemini**. **Anthropic** discussed **Claude**&apos;s performance issues possibly due to API prompt modifications. 
**OpenAI** enhanced controls for File Search in their Assistants API. **Cognition** and **Anthropic** leaders appeared on podcasts. The viral **Kwai-Kolors** virtual try-on model and the open-source real-time audio conversational model **Mini-Omni** (similar to **gpt-4o-voice**) were released. Tutorials on parameter-efficient fine-tuning with LoRA and QLoRA, long-context embedding challenges, and Claude&apos;s LaTeX rendering feature were highlighted. **AI21 Labs** released **Jamba 1.5** models with a 256K context window and faster long-context performance. **NVIDIA** debuted **Mistral-Nemo-Minitron-8B** on the Open LLM Leaderboard. **LangChain** introduced resource tags for workspace organization, and a low-code AI app toolkit was shared by **svpino**. Legal AI agents and financial agent evaluations using LangSmith were also featured.</description><pubDate>Wed, 04 Sep 2024 01:35:37 GMT</pubDate><category>xai</category><category>google</category><category>anthropic</category><category>openai</category><category>cognition</category><category>ai21-labs</category><category>nvidia</category><category>langchain</category><category>gpt-4o-voice</category><category>gemini</category><category>claude</category><category>jamba-1.5</category><category>mistral-nemo-minitron-8b</category><category>dario-amodei</category><category>scott-wu</category><category>fchollet</category><category>svpino</category><category>fine-tuning</category><category>long-context</category><category>parameter-efficient-fine-tuning</category><category>latex-rendering</category><category>real-time-audio</category><category>virtual-try-on</category><category>resource-tags</category><category>low-code</category><category>ai-agents</category><category>workspace-organization</category><category>model-benchmarking</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-08-30-ainews-not-much-happened-today/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-08-30-ainews-not-much-happened-today/</guid><description>**Meta** announced significant adoption of **LLaMA 3.1** with nearly **350 million downloads** on Hugging Face. **Magic AI Labs** introduced **LTM-2-Mini**, a long context model with a **100 million token context window**, and a new evaluation method called HashHop. **LMSys** added style control to their Chatbot Arena leaderboard, improving rankings for models like **Claude 3.5 Sonnet** and **LLaMA 3.1 405B**. **Alibaba** released **Qwen2-VL**, a multimodal LLM under Apache 2.0 license, competitive with **GPT-4o mini**. **OpenAI** CEO **Sam Altman** announced collaboration with the US AI Safety Institute for pre-release model testing. Discussions on AI safety and potential AI takeover risks were highlighted by **Ajeya Cotra**. Tools like **firecrawl** for web crawling and challenges in PDF processing were noted. AI hype cycles and market trends were discussed by **François Chollet**, and potential AI disruption in call centers was shared by **Rohan Paul**.</description><pubDate>Sat, 31 Aug 2024 00:41:42 
GMT</pubDate><category>meta-ai-fair</category><category>hugging-face</category><category>magic-ai-labs</category><category>lmsys</category><category>alibaba</category><category>openai</category><category>llama-3-1</category><category>claude-3-5-sonnet</category><category>llama-3-1-405b</category><category>ltm-2-mini</category><category>qwen2-vl</category><category>gpt-4o-mini</category><category>sam-altman</category><category>ajeya-cotra</category><category>fchollet</category><category>rohanpaul_ai</category><category>philschmid</category><category>long-context</category><category>style-control</category><category>multimodality</category><category>ai-safety</category><category>model-evaluation</category><category>web-crawling</category><category>pdf-processing</category><category>ai-hype-cycles</category><category>call-center-automation</category></item><item><title>Summer of Code AI: $1.6b raised, 1 usable product</title><link>https://news.smol.ai/issues/24-08-29-ainews-summer-of-code-ai-dollar16b-raised-1-usable-product/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-29-ainews-summer-of-code-ai-dollar16b-raised-1-usable-product/</guid><description>**Code + AI** is emphasized as a key modality in AI engineering, highlighting productivity and verifiability benefits. Recent major funding rounds include **Cognition AI raising $175M**, **Poolside raising $400M**, **Codeium AI raising $150M**, and **Magic raising $320M**. Magic announced their **LTM-2** model with a **100 million token context window**, whose sequence-dimension algorithm is about **1000x cheaper** than that of **Llama 3.1 405B**, with drastically lower memory requirements. Magic&apos;s stack is built from scratch with custom CUDA and no open-source foundations; the company has partnered with **Google Cloud**, runs on **NVIDIA H100** and **GB200 GPUs**, and aims to scale to tens of thousands of GPUs. 
Google DeepMind revealed updates to **Gemini Advanced** with customizable expert &quot;Gems.&quot; Neural Game Engines like **GameNGen** can run DOOM in a diffusion model trained on **0.9B frames**. The content also references **LLM quantization** research by Rohan Paul.</description><pubDate>Fri, 30 Aug 2024 00:01:06 GMT</pubDate><category>cognition</category><category>poolside</category><category>codeium</category><category>magic</category><category>google-deepmind</category><category>nvidia</category><category>google-cloud</category><category>ltm-2</category><category>llama-3-1-405b</category><category>gemini-advanced</category><category>nat-friedman</category><category>ben-chess</category><category>rohan-paul</category><category>long-context</category><category>model-efficiency</category><category>custom-hardware</category><category>cuda</category><category>training-stack</category><category>gpu-scaling</category><category>neural-world-models</category><category>diffusion-models</category><category>quantization</category></item><item><title>Cerebras Inference: Faster, Better, AND Cheaper</title><link>https://news.smol.ai/issues/24-08-28-ainews-cerebras-inference-faster-better-and-cheaper/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-28-ainews-cerebras-inference-faster-better-and-cheaper/</guid><description>**Groq** led early 2024 with superfast LLM inference speeds, achieving ~450 tokens/sec for Mixtral 8x7B and 240 tokens/sec for Llama 2 70B. **Cursor** introduced a specialized code edit model hitting 1000 tokens/sec. Now, **Cerebras** claims the fastest inference with their wafer-scale chips, running **Llama3.1-8b** at 1800 tokens/sec and **Llama3.1-70B** at 450 tokens/sec at full precision, with competitive pricing and a generous free tier. **Google&apos;s Gemini 1.5** models showed significant benchmark improvements, especially Gemini-1.5-Flash and Gemini-1.5-Pro. 
New open-source models like **CogVideoX-5B** and **Mamba-2 (Rene 1.3B)** were released, optimized for consumer hardware. **Anthropic&apos;s Claude** now supports prompt caching, improving speed and cost efficiency. *&quot;Cerebras Inference runs Llama3.1 20x faster than GPU solutions at 1/5 the price.&quot;*</description><pubDate>Thu, 29 Aug 2024 00:59:27 GMT</pubDate><category>groq</category><category>cerebras</category><category>cursor</category><category>google-deepmind</category><category>anthropic</category><category>llama-3.1-8b</category><category>llama-3.1-70b</category><category>gemini-1.5-flash</category><category>gemini-1.5-pro</category><category>cogvideox-5b</category><category>mamba-2</category><category>rene-1.3b</category><category>llama-3.1</category><category>gemini-1.5</category><category>claude</category><category>jeremyphoward</category><category>sam-altman</category><category>nat-friedman</category><category>daniel-gross</category><category>swyx</category><category>inference-speed</category><category>wafer-scale-chips</category><category>prompt-caching</category><category>model-merging</category><category>benchmarking</category><category>open-source-models</category><category>code-editing</category><category>model-optimization</category></item><item><title>CogVideoX: Zhipu&apos;s Open Source Sora</title><link>https://news.smol.ai/issues/24-08-27-ainews-cogvideox-zhipus-open-source-sora/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-27-ainews-cogvideox-zhipus-open-source-sora/</guid><description>**Zhipu AI**, Alibaba&apos;s AI arm and China&apos;s 3rd largest AI lab, released the open 5B video generation model **CogVideoX**, which can run without GPUs via their ChatGLM web and desktop apps. **Meta AI** announced trust &amp; safety research and CyberSecEval 3 alongside the release of **Llama 3.1**, with **Llama 3 405B** now available serverless on Google Cloud Vertex AI and Hugging Face x NVIDIA NIM API. 
Updates include **Moondream**, an open vision-language model improving DocVQA and TextVQA tasks, and the lightweight MoE chat model **Phi-3.5** with 16x3.8B parameters. **Together Compute** introduced the Rerank API featuring Salesforce&apos;s **LlamaRank** model for document and code ranking. Research highlights include superposition prompting for RAG without fine-tuning, the AgentWrite pipeline for long-form content generation over 20,000 words, and a comparison showing Long Context methods outperform RAG at higher costs. Tools include Not Diamond, an AI model router, AI command line interfaces, and an open-source WebGPU background removal tool. *&quot;You don&apos;t even need GPUs to run it,&quot;* referring to CogVideoX.</description><pubDate>Wed, 28 Aug 2024 01:26:46 GMT</pubDate><category>zhipu-ai</category><category>alibaba</category><category>meta-ai-fair</category><category>google</category><category>hugging-face</category><category>nvidia</category><category>togethercompute</category><category>salesforce</category><category>cogvideox</category><category>llama-3-1</category><category>llama-3-405b</category><category>moondream</category><category>phi-3.5</category><category>llama-rank</category><category>rohanpaul_ai</category><category>philschmid</category><category>vikhyatk</category><category>algo_diver</category><category>jayalammar</category><category>davidsholz</category><category>video-generation</category><category>serverless-computing</category><category>vision</category><category>document-vqa</category><category>text-vqa</category><category>mixture-of-experts</category><category>retrieval-augmented-generation</category><category>long-context</category><category>model-routing</category><category>webgpu</category><category>background-removal</category><category>long-form-generation</category><category>superposition-prompting</category></item><item><title>not much happened this 
weekend</title><link>https://news.smol.ai/issues/24-08-26-ainews-not-much-happened-this-weekend/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-26-ainews-not-much-happened-this-weekend/</guid><description>**Nous Research** announced **DisTrO**, a new optimizer that drastically reduces inter-GPU communication by 1000x to 10,000x enabling efficient training on slow networks, offering an alternative to **GDM&apos;s DiLoCo**. **Cursor AI** gained viral attention from an 8-year-old user and announced a new fundraise, with co-host Aman returning to their podcast. **George Hotz** launched **tinybox** for sale. In robotics, **AGIBOT** revealed 5 new humanoid robots with open-source plans, and **Unitree** showcased its G1 humanoid robot nearing mass production at $16,000. **ETH Zurich** and **Disney** developed an AI system for physics-based robot motion generation from text or images. **UC San Diego** released **ACE**, an open-source teleoperation system for controlling multiple robots. AI21 Labs unveiled **Jamba 1.5**, a multilingual model with 256k context length and permissive licensing. **Luma Labs** released **Dream Machine 1.5** for improved text-to-video generation. **Ideogram** launched **v2** of its text-to-image model with near-perfect text generation. 
**Nvidia** and **Mistral** released **Mistral-NeMo-Minitron 8B**, a small model outperforming **Mistral-7B** and **llama-3-8b** on the Open LLM leaderboard.</description><pubDate>Tue, 27 Aug 2024 00:09:52 GMT</pubDate><category>nous-research</category><category>cursor-ai</category><category>gdm</category><category>george-hotz</category><category>agibot</category><category>unitree</category><category>eth-zurich</category><category>disney</category><category>uc-san-diego</category><category>ai21-labs</category><category>luma-labs</category><category>ideogram</category><category>nvidia</category><category>mistral-ai</category><category>meta-ai-fair</category><category>jamba-1.5</category><category>dream-machine-1.5</category><category>ideogram-v2</category><category>mistral-nemo-minitron-8b</category><category>mistral-7b</category><category>llama-3-8b</category><category>george-hotz</category><category>adcock_brett</category><category>aman</category><category>distributed-ai</category><category>optimizer</category><category>inter-gpu-communication</category><category>low-latency-training</category><category>open-source</category><category>humanoid-robots</category><category>robotics</category><category>physics-based-motion</category><category>teleoperation</category><category>multilingual-models</category><category>long-context</category><category>text-to-video</category><category>text-to-image</category><category>model-performance</category></item><item><title>Nvidia Minitron: LLM Pruning and Distillation updated for Llama 3.1</title><link>https://news.smol.ai/issues/24-08-23-ainews-nvidia-minitron-llm-pruning-and-distillation-updated-for-llama-31/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-23-ainews-nvidia-minitron-llm-pruning-and-distillation-updated-for-llama-31/</guid><description>**Nvidia** and **Meta** researchers updated their **Llama 3** results with a paper demonstrating the effectiveness of combining **weight pruning** and **knowledge 
distillation** to reduce training costs by training only the largest model from scratch and deriving smaller models via pruning and distillation. The process involves teacher correction, activation-based pruning (favoring width pruning), and retraining with distillation using KL Divergence loss, resulting in better-performing models at comparable sizes. However, distillation incurs some accuracy tradeoffs. Additionally, **AI21 Labs** launched **Jamba 1.5**, a hybrid SSM-Transformer MoE model with large context windows and multilingual support. **Anthropic** updated **Claude 3** with LaTeX rendering and prompt caching. An open-source coding-focused LLM, **Dracarys**, was released in 70B and 72B sizes, showing improved coding performance. The **Mistral Nemo Minitron 8B** model outperforms **Llama 3.1 8B** and **Mistral 7B** on the Hugging Face leaderboard, highlighting pruning and distillation benefits. Research on prompt optimization reveals the complexity of prompt search spaces and the surprising effectiveness of simple algorithms like AutoPrompt/GCG.</description><pubDate>Fri, 23 Aug 2024 22:14:15 
GMT</pubDate><category>nvidia</category><category>meta-ai-fair</category><category>ai21-labs</category><category>anthropic</category><category>hugging-face</category><category>llama-3-1-8b</category><category>llama-3-1</category><category>jamba-1.5</category><category>claude-3</category><category>dracarys-70b</category><category>dracarys-72b</category><category>mistral-nemo-minitron-8b</category><category>mistral-7b</category><category>pruning</category><category>knowledge-distillation</category><category>weight-pruning</category><category>activation-based-pruning</category><category>width-pruning</category><category>kl-divergence</category><category>teacher-correction</category><category>prompt-optimization</category><category>multilinguality</category><category>long-context</category><category>mixture-of-experts</category><category>model-fine-tuning</category></item><item><title>super quiet day</title><link>https://news.smol.ai/issues/24-08-22-ainews-super-quiet-day/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-22-ainews-super-quiet-day/</guid><description>**AI21 Labs** released **Jamba 1.5**, a scaled-up State Space Model optimized for long context windows with **94B parameters** and up to **2.5X faster inference**, outperforming models like **Llama 3.1 70B** on benchmarks. The **Phi-3.5** model was praised for its safety and performance, while **Dracarys**, a new **70B open-source coding model** announced by **Bindu Reddy**, claims superior benchmarks over Llama 3.1 70B. Discussions on **California&apos;s SB 1047** AI safety legislation involve **Stanford** and **Anthropic**, highlighting a balance between precaution and industry growth. Innovations include **uv virtual environments** for rapid setup, **LangChain&apos;s LangSmith** resource tags for project management, and multi-agent systems in **Qdrant** enhancing data workflows. 
Community events like the **RAG workshop** by **AWS**, **LangChain**, and **Elastic** continue to support AI learning and collaboration. Memes remain a popular way to engage with AI industry culture.</description><pubDate>Fri, 23 Aug 2024 00:55:37 GMT</pubDate><category>ai21-labs</category><category>anthropic</category><category>stanford</category><category>hugging-face</category><category>langchain</category><category>qdrant</category><category>aws</category><category>elastic</category><category>jamba-1.5</category><category>phi-3.5</category><category>dracarys</category><category>llama-3-1-70b</category><category>llama-3-1</category><category>bindu-reddy</category><category>rohanpaul_ai</category><category>jackclarksf</category><category>danhendrycks</category><category>reach_vb</category><category>iqdotgraph</category><category>state-space-models</category><category>long-context</category><category>benchmarking</category><category>ai-safety</category><category>virtual-environments</category><category>multi-agent-systems</category><category>resource-management</category><category>community-engagement</category><category>model-performance</category></item><item><title>Ideogram 2 + Berkeley Function Calling Leaderboard V2</title><link>https://news.smol.ai/issues/24-08-21-ainews-ideogram-2-berkeley-function-calling-leaderboard-v2/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-21-ainews-ideogram-2-berkeley-function-calling-leaderboard-v2/</guid><description>**Ideogram** returns with a new image generation model featuring **color palette control**, a fully controllable API, and an iOS app, reaching a milestone of **1 billion images created**. Meanwhile, **Midjourney** released a Web UI but still lacks an API. In function calling, the **Berkeley Function Calling Leaderboard (BFCL)** updated to **BFCL V2 • Live**, adding **2251 live, user-contributed function documentation and queries** to improve evaluation quality. 
**GPT-4** leads the leaderboard, but the open-source **Functionary Llama 3-70B finetune** from Kai surpasses **Claude**. On AI model releases, **Microsoft** launched three **Phi-3.5** models with impressive reasoning and context window capabilities, while **Meta AI FAIR** introduced **UniBench**, a unified benchmark suite for over **50 vision-language model tasks**. **Baseten** improved **Llama 3** inference speed by up to **122%** using Medusa. A new cybersecurity benchmark, **Cyberbench**, featuring **40 CTF tasks**, was released. Additionally, **Codegen** was introduced as a tool for programmatic codebase analysis and AI-assisted development. *&quot;Multiple functions &gt; parallel functions&quot;* was highlighted as a key insight in function calling.</description><pubDate>Thu, 22 Aug 2024 00:05:05 GMT</pubDate><category>ideogram</category><category>midjourney</category><category>berkeley</category><category>openai</category><category>hugging-face</category><category>microsoft</category><category>meta-ai-fair</category><category>baseten</category><category>kai</category><category>claude</category><category>functionary</category><category>llama-3-70b</category><category>gpt-4</category><category>phi-3.5</category><category>functionary-llama-3-70b</category><category>llama-3</category><category>function-calling</category><category>benchmarking</category><category>image-generation</category><category>model-optimization</category><category>vision</category><category>multimodality</category><category>model-performance</category><category>fine-tuning</category><category>context-windows</category><category>cybersecurity</category><category>code-analysis</category><category>ai-assisted-development</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-08-20-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-20-ainews-not-much-happened-today/</guid><description>**OpenAI** launched 
**GPT-4o finetuning** with a case study on Cosine. **Anthropic** released **Claude 3.5 Sonnet** with 8k token output. **Microsoft Phi** team introduced **Phi-3.5** in three variants: Mini (3.8B), MoE (16x3.8B), and Vision (4.2B), noted for sample efficiency. **Meta** released **Llama 3.1 405B**, deployable on Google Cloud Vertex AI, offering GPT-4 level capabilities. **Qwen2-Math-72B** achieved state-of-the-art math benchmark performance with a Gradio demo. Discussions included model comparisons like ViT vs CNN and Mamba architecture. Tools updates featured **DSPy** roadmap, **Flux Schnell** improving diffusion speed on M1 Max, and **LangChain** community events. Research highlights zero-shot DUP prompting for math reasoning and fine-tuning best practices. AI ethics covered California&apos;s AI Safety Bill SB 1047 and regulatory concerns from **Yann LeCun**. Commentary on AI engineer roles by **Swyx**. *&quot;Chat with PDF&quot;* feature now available for Box Enterprise Plus users.</description><pubDate>Wed, 21 Aug 2024 00:22:36 
GMT</pubDate><category>openai</category><category>anthropic</category><category>microsoft</category><category>meta-ai-fair</category><category>hugging-face</category><category>langchain</category><category>box</category><category>gpt-4o</category><category>claude-3.5-sonnet</category><category>phi-3.5-mini</category><category>phi-3.5-moe</category><category>phi-3.5-vision</category><category>llama-3-1-405b</category><category>qwen2-math-72b</category><category>swyx</category><category>ylecun</category><category>fine-tuning</category><category>benchmarking</category><category>model-comparison</category><category>model-performance</category><category>diffusion-models</category><category>reinforcement-learning</category><category>zero-shot-learning</category><category>math</category><category>model-efficiency</category><category>ai-regulation</category><category>ai-safety</category><category>ai-engineering</category><category>prompt-engineering</category></item><item><title>The DSPy Roadmap</title><link>https://news.smol.ai/issues/24-08-19-ainews-the-dspy-roadmap/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-19-ainews-the-dspy-roadmap/</guid><description>**Omar Khattab** announced joining **Databricks** before his MIT professorship and outlined the roadmap for **DSPy 2.5 and 3.0+**, focusing on improving core components like LMs, signatures, optimizers, and assertions with features such as adopting **LiteLLM** to reduce code and enhance caching and streaming. The roadmap also includes developing more accurate, cost-effective optimizers, building tutorials, and enabling interactive optimization tracking. On AI Twitter, **Google** launched **Gemini Live**, a mobile conversational AI with voice and 10 voices, alongside **Pixel Buds Pro 2** with a custom Tensor A1 chip. **OpenAI** updated **ChatGPT-4o**, reclaiming the top spot on LMSYS Arena. **xAI** released **Grok-2** in beta, achieving SOTA in image generation with FLUX 1. 
**Nous Research** released open-source **Hermes 3** models in 8B, 70B, and 405B sizes, with the 405B model achieving SOTA. Robotics updates include **Astribot**&apos;s humanoid robot and **Apple**&apos;s tabletop robot with Siri voice commands. **Sakana AI** introduced &quot;The AI Scientist,&quot; an autonomous AI research system.</description><pubDate>Tue, 20 Aug 2024 05:06:22 GMT</pubDate><category>databricks</category><category>mit</category><category>google</category><category>openai</category><category>x-ai</category><category>nous-research</category><category>astribot</category><category>apple</category><category>sakana-ai</category><category>dspy</category><category>litel-lm</category><category>gemini</category><category>chatgpt-4o</category><category>grok-2</category><category>hermes-3</category><category>omar-khattab</category><category>giffmana</category><category>model-optimization</category><category>fine-tuning</category><category>optimizers</category><category>interactive-optimization</category><category>robotics</category><category>autonomous-systems</category><category>voice</category><category>image-generation</category><category>open-source-models</category><category>scientific-research</category><category>streaming</category><category>caching</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-08-16-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-16-ainews-not-much-happened-today/</guid><description>**Anthropic** rolled out **prompt caching** in its API, reducing input costs by up to **90%** and latency by **80%**, enabling instant fine-tuning with longer prompts. **xAI** released **Grok-2**, a new model competing with frontier models from **Google DeepMind**, **OpenAI**, **Anthropic**, **Mistral AI**, and **Meta AI Fair**, supporting vision and text inputs and integrating external image generation models. 
**Claude 3.5 Sonnet** is reported to outperform **GPT-4** in coding and reasoning, while **ChatGPT-4o-latest** shows reasoning improvements. **François Chollet** proposed a theory defining intelligence as the efficiency of operationalizing past information for future tasks. The **Aya project** involves 3000 collaborators building multilingual AI datasets. **Demis Hassabis** discussed AI hype and safe AI development in a podcast. Tools like **Dora AI** for Figma and **Box&apos;s AI API** enhance design automation and document processing. **Salesforce** released **DEI**, an open AI software engineering agents framework with a 55% resolve rate on SWE-Bench Lite. Industry trends highlight rapid AI integration, networking importance in the AI job market, and potential OpenAI GPT-4 expansion in response to competitors. Memes include humor about Apple Vision Pro.</description><pubDate>Sat, 17 Aug 2024 03:43:03 GMT</pubDate><category>anthropic</category><category>x-ai</category><category>google-deepmind</category><category>openai</category><category>mistral-ai</category><category>meta-ai-fair</category><category>salesforce</category><category>box</category><category>grok-2</category><category>claude-3.5-sonnet</category><category>claude-3.5</category><category>gpt-4</category><category>chatgpt-4o-latest</category><category>demis-hassabis</category><category>francois-chollet</category><category>prompt-caching</category><category>model-performance</category><category>vision</category><category>fine-tuning</category><category>multilinguality</category><category>ai-safety</category><category>design-automation</category><category>document-processing</category><category>ai-agents</category><category>ai-integration</category><category>ai-job-market</category><category>ai-acceleration</category><category>humor</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-08-15-ainews-not-much-happened-today/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-08-15-ainews-not-much-happened-today/</guid><description>**GPT-5** delayed again amid a quiet news day. **Nous Research** released Hermes 3 finetune of **Llama 3** base models, rivaling FAIR&apos;s instruct tunes but sparking debate over emergent existential crisis behavior with 6% roleplay data. **Nvidia** introduced Minitron finetune of **Llama 3.1**. **Salesforce** launched a DEI agent scoring 55% on SWE-Bench Lite. **Goodfire AI** secured $7M seed funding for mechanistic interpretability work. **Anthropic** rolled out prompt caching in their API, cutting input costs by up to 90% and latency by 80%, aiding coding assistants and large document processing. **xAI** released **Grok-2**, matching **Claude 3.5 Sonnet** and **GPT-4 Turbo** on LMSYS leaderboard with vision+text inputs and image generation integration. **Claude 3.5 Sonnet** reportedly outperforms **GPT-4** in coding and reasoning. **François Chollet** defined intelligence as efficient operationalization of past info for future tasks. **Salesforce&apos;s** DEI framework surpasses individual agent performance. **Google DeepMind&apos;s** Demis Hassabis discussed AGI&apos;s role in scientific discovery and safe AI development. **Dora AI** plugin generates landing pages in under 60 seconds, boosting web team efficiency. **Box AI API** beta enables document chat, data extraction, and content summarization. 
**LangChain** updated Python &amp; JavaScript integration docs.</description><pubDate>Fri, 16 Aug 2024 04:05:53 GMT</pubDate><category>nous-research</category><category>nvidia</category><category>salesforce</category><category>goodfire-ai</category><category>anthropic</category><category>x-ai</category><category>google-deepmind</category><category>box</category><category>langchain</category><category>llama-3</category><category>llama-3-1</category><category>grok-2</category><category>claude-3.5-sonnet</category><category>gpt-4-turbo</category><category>fchollet</category><category>demis-hassabis</category><category>fine-tuning</category><category>prompt-caching</category><category>mechanistic-interpretability</category><category>model-performance</category><category>multimodality</category><category>agent-frameworks</category><category>software-engineering-agents</category><category>api</category><category>document-processing</category><category>text-generation</category><category>model-releases</category><category>vision</category><category>image-generation</category><category>efficiency</category><category>scientific-discovery</category></item><item><title>Grok 2! and ChatGPT-4o-latest confuses everybody</title><link>https://news.smol.ai/issues/24-08-14-ainews-grok-2-and-chatgpt-4o-latest-confuses-everybody/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-14-ainews-grok-2-and-chatgpt-4o-latest-confuses-everybody/</guid><description>**OpenAI** quietly released a new **GPT-4o** model in ChatGPT, distinct from the API version, reclaiming the #1 spot on Lmsys arena benchmarks across multiple categories including math, coding, and instruction-following. Meanwhile, **X.ai** launched **Grok 2**, outperforming **Claude 3.5 Sonnet** and previous GPT-4o versions, with plans for enterprise API release. Grok 2 integrates **Black Forest Labs&apos; Flux.1**, an open-source text-to-image model surpassing **Stable Diffusion 3**. 
**Google DeepMind** announced **Gemini Advanced** with enhanced conversational features and Pixel device integration. AI researcher **ylecun** highlighted LLM limitations in learning and creativity, while **rohanpaul_ai** discussed an AI Scientist system generating publishable ML research at low cost. **karpathy** warned of security risks in LLM tokenizers akin to SQL injection.</description><pubDate>Thu, 15 Aug 2024 00:51:40 GMT</pubDate><category>openai</category><category>x-ai</category><category>black-forest-labs</category><category>google-deepmind</category><category>gpt-4o</category><category>grok-2</category><category>claude-3.5-sonnet</category><category>flux-1</category><category>stable-diffusion-3</category><category>gemini-advanced</category><category>ylecun</category><category>rohanpaul_ai</category><category>karpathy</category><category>benchmarking</category><category>model-performance</category><category>tokenization</category><category>security-vulnerabilities</category><category>multi-agent-systems</category><category>research-automation</category><category>text-to-image</category><category>conversational-ai</category><category>model-integration</category></item><item><title>Gemini Live</title><link>https://news.smol.ai/issues/24-08-13-ainews-gemini-live/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-13-ainews-gemini-live/</guid><description>**Google** launched **Gemini Live** on Android for **Gemini Advanced** subscribers during the Pixel 9 event, featuring integrations with Google Workspace apps and other Google services. The rollout began on 8/12/2024, with iOS support planned. **Cosine** released **Genie**, an AI software engineering system achieving a **57%** improvement on SWE-Bench. **TII** introduced **Falcon Mamba**, a 7B attention-free open-access model scalable to long sequences. Benchmarking showed that longer context lengths do not always improve Retrieval-Augmented Generation. 
**Supabase** launched an AI-powered Postgres service dubbed the &quot;ChatGPT of databases,&quot; fully open source. **Perplexity AI** partnered with Polymarket to integrate real-time probability predictions into search results. A tutorial demonstrated a multimodal recipe recommender using **Qdrant**, **LlamaIndex**, and **Gemini**. An OpenAI engineer shared success tips emphasizing debugging and hard work. The connection between matrices and graphs in linear algebra was highlighted for insights into nonnegative matrices and strongly connected components. **Keras 3.5.0** was released with Hugging Face Hub integration for model saving and loading.</description><pubDate>Wed, 14 Aug 2024 01:23:26 GMT</pubDate><category>google</category><category>anthropic</category><category>tii</category><category>supabase</category><category>perplexity-ai</category><category>llamaindex</category><category>openai</category><category>hugging-face</category><category>gemini-1.5-pro</category><category>genie</category><category>falcon-mamba</category><category>gemini-1.5</category><category>llamaindex</category><category>omarsar0</category><category>osanseviero</category><category>dbrxmosaicai</category><category>alphasignalai</category><category>perplexity_ai</category><category>_jasonwei</category><category>svpino</category><category>multimodality</category><category>benchmarking</category><category>long-context</category><category>retrieval-augmented-generation</category><category>open-source</category><category>model-releases</category><category>model-integration</category><category>model-performance</category><category>software-engineering</category><category>linear-algebra</category><category>hugging-face-hub</category><category>debugging</category></item><item><title>a quiet weekend</title><link>https://news.smol.ai/issues/24-08-12-ainews-a-quiet-weekend/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-12-ainews-a-quiet-weekend/</guid><description>**Figure** 
unveiled **Figure 02**, claimed as the most advanced humanoid robot, operating autonomously at BMW&apos;s Plant Spartanburg. **DeepMind** developed a table tennis robot achieving **100% wins against beginners** and **55% against intermediates**. **Boston Dynamics** showcased the dexterity of its fully-electric **Atlas** robot performing pushups and burpees. An autonomous dental robot performed the world&apos;s first dental procedure on a human, reducing a 2-hour process to 15 minutes using a **3D volumetric scanner**. **SAM 2** was introduced as an open model for real-time object segmentation without custom adaptation. **Alibaba** released **Qwen2-Math**, outperforming **GPT-4** and **Claude 3.5** in math capabilities. A new Listening-While-Speaking Language Model (LSLM) enables simultaneous listening and speaking in real-time. Researchers developed a disease prediction AI with **95% accuracy** for diseases like coronary artery disease, type 2 diabetes, and breast cancer. Tools like **LlamaParse CLI** and **MLX Whisper package** enhance PDF parsing and speech recognition, with the latter running **40X faster than realtime** on M1 Max. 
The news highlights significant advancements in robotics, AI models, and practical AI tools.</description><pubDate>Mon, 12 Aug 2024 22:36:30 GMT</pubDate><category>figure</category><category>deepmind</category><category>boston-dynamics</category><category>alibaba</category><category>llamaindex</category><category>sam-2</category><category>qwen2-math</category><category>gpt-4</category><category>claude-3.5</category><category>adcock_brett</category><category>rasbt</category><category>hamel-husain</category><category>rohanpaul_ai</category><category>robotics</category><category>object-segmentation</category><category>real-time-processing</category><category>disease-prediction</category><category>speech-recognition</category><category>cli-tools</category><category>model-performance</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-08-09-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-09-ainews-not-much-happened-today/</guid><description>**Qwen2-Math-72B** outperforms **GPT-4o**, **Claude-3.5-Sonnet**, **Gemini-1.5-Pro**, and **Llama-3.1-405B** on math benchmarks using synthetic data and advanced optimization techniques. **Google AI** cuts pricing for **Gemini 1.5 Flash** by up to 78%. **Anthropic** expands its bug bounty program targeting universal jailbreaks in next-gen safety systems. Tutorial on **QLoRA** fine-tuning of **IDEFICS3-Llama 8B** for visual question answering released. A Chinese open weights model surpasses previous MATH benchmark records. Surveys on **Mamba** models and LLM-based agents for software engineering highlight advancements and applications. Open-source tools like **R2R RAG engine** and **LlamaIndex Workflows** simplify building complex AI applications. **Mistral AI** introduces customizable AI agents. Concerns raised about California bill SB 1047&apos;s focus on existential risk and debates on banning open-source AI. 
Memes and humor continue in AI communities.</description><pubDate>Sat, 10 Aug 2024 05:51:12 GMT</pubDate><category>anthropic</category><category>google</category><category>mistral-ai</category><category>llamaindex</category><category>qwen2-math-72b</category><category>gpt-4o</category><category>claude-3.5-sonnet</category><category>gemini-1.5-pro</category><category>llama-3.1-405b</category><category>idefics3-llama-8b</category><category>rohanpaul_ai</category><category>anthropicai</category><category>mervenoyann</category><category>jeremyphoward</category><category>omarsar0</category><category>ylecun</category><category>bindureddy</category><category>math</category><category>fine-tuning</category><category>synthetic-data</category><category>reinforcement-learning</category><category>bug-bounty</category><category>visual-question-answering</category><category>open-source</category><category>retrieval-augmented-generation</category><category>agentic-ai</category><category>ai-safety</category><category>policy</category></item><item><title>Too Cheap To Meter: AI prices cut 50-70% in last 30 days</title><link>https://news.smol.ai/issues/24-08-08-ainews-too-cheap-to-meter-ai-prices-cut-50-70percent-in-last-30-days/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-08-ainews-too-cheap-to-meter-ai-prices-cut-50-70percent-in-last-30-days/</guid><description>**Gemini 1.5 Flash** has cut prices by approximately **70%**, offering a highly competitive free tier of **1 million tokens per minute** at **$0.075/mtok**, intensifying the AI model price war. Other significant price reductions include **GPT-4o** (~50% cut to **$2.50/mtok**), **GPT-4o mini** (70-98.5% cut to **$0.15/mtok**), **Llama 3.1 405b** (46% cut to **$2.7/mtok**), and **Mistral Large 2** (62% cut to **$3/mtok**). **Deepseek v2** introduced context caching, reducing input token costs by up to **90%** to **$0.014/mtok**. 
New model releases include **Llama 3.1 405b**, **Sonnet 3.5**, **EXAONE-3.0** (7.8B instruction-tuned by LG AI Research), and **MiniCPM V 2.6** (vision-language model combining SigLIP 400M and Qwen2-7B). Benchmarks show **Mistral Large** performing well on ZebraLogic and **Claude-3.5** leading LiveBench. **FlexAttention**, a new PyTorch API, simplifies and optimizes attention mechanisms. **Andrej Karpathy** analyzed RLHF, highlighting its limitations compared to traditional reinforcement learning. Google DeepMind research on compute-optimal scaling was also summarized.</description><pubDate>Fri, 09 Aug 2024 04:27:56 GMT</pubDate><category>llamaindex</category><category>together-ai</category><category>deepinfra</category><category>deepseek-ai</category><category>mistral-ai</category><category>google-deepmind</category><category>lg-ai-research</category><category>gpt-4o</category><category>gpt-4o-mini</category><category>llama-3-1-405b</category><category>mistral-large-2</category><category>gemini-1.5-flash</category><category>deepseek-v2</category><category>sonnet-3.5</category><category>exaone-3.0</category><category>minicpm-v-2.6</category><category>claude-3.5</category><category>gpt-4o-2024-08-06</category><category>rohanpaul_ai</category><category>akhaliq</category><category>mervenoyann</category><category>sophiamyang</category><category>chhillee</category><category>karpathy</category><category>price-cuts</category><category>context-caching</category><category>instruction-tuning</category><category>vision</category><category>benchmarks</category><category>pytorch</category><category>attention-mechanisms</category><category>reinforcement-learning-from-human-feedback</category><category>compute-optimal-scaling</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-08-07-ainews-not-much-happened-today/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-08-07-ainews-not-much-happened-today/</guid><description>**OpenAI** introduced structured outputs in their API with a new &quot;strict&quot; mode and a &quot;response_format&quot; parameter, supporting models like **gpt-4-0613**, **gpt-3.5-turbo-0613**, and the new **gpt-4o-2024-08-06**. They also halved the price of **gpt-4o** to $2.50 per million tokens. **Mistral Large 2** outperforms **gpt4-turbo** and **claude-3-opus** on hard benchmarks and coding tasks. **Idefics3-Llama** offers multimodal capabilities with a 10k token context window. **BigLlama-3.1-1T-Instruct** is an upscaled version of **llama-3-120b-instruct**. New benchmark &quot;big_model_smell&quot; measures creativity and reliability. **Figure 02** robot features advanced AI hardware with onboard vision language model, enhanced battery, and speech-to-speech reasoning. **Yann LeCun** expressed concerns about California&apos;s SB1047 regulation.</description><pubDate>Thu, 08 Aug 2024 01:50:11 
GMT</pubDate><category>openai</category><category>mistral-ai</category><category>meta-ai-fair</category><category>gpt-4-0613</category><category>gpt-3.5-turbo-0613</category><category>gpt-4o-2024-08-06</category><category>mistral-large-2</category><category>gpt4-turbo</category><category>claude-3-opus</category><category>idefics3-llama</category><category>bigllama-3.1-1t-instruct</category><category>llama-3-120b-instruct</category><category>sama</category><category>rohanpaul_ai</category><category>corbtt</category><category>guillaumelample</category><category>mervenoyann</category><category>maximelabonne</category><category>aidan_mclau</category><category>adcock_brett</category><category>ylecun</category><category>structured-outputs</category><category>function-calling</category><category>json-schema</category><category>benchmarking</category><category>multimodality</category><category>context-windows</category><category>model-scaling</category><category>ai-hardware</category><category>vision</category><category>speech-processing</category><category>robotics</category><category>ai-regulation</category></item><item><title>GPT4o August + 100% Structured Outputs for All (GPT4o mini edition)</title><link>https://news.smol.ai/issues/24-08-06-ainews-gpt4o-august-100percent-structured-outputs-for-all-gpt4o-mini-edition/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-06-ainews-gpt4o-august-100percent-structured-outputs-for-all-gpt4o-mini-edition/</guid><description>**Stability.ai** users are leveraging **LoRA** and **ControlNet** for enhanced line art and artistic style transformations, while facing challenges with **AMD GPUs** due to the discontinuation of **ZLUDA**. Community tensions persist around the **r/stablediffusion** subreddit moderation. 
**Unsloth AI** users report fine-tuning difficulties with **LLaMA3** models, especially with PPO trainer integration and prompt formatting, alongside anticipation for **multi-GPU** support and cost-effective cloud computing on **RunPod**. **Google** released the lightweight **Gemma 2 2B** model optimized for on-device use with **2.6B** parameters, featuring safety and sparse autoencoder tools, and announced **Diffusers** integration for efficient text-to-image generation on limited resources.</description><pubDate>Wed, 07 Aug 2024 02:55:03 GMT</pubDate><category>stability-ai</category><category>unsloth-ai</category><category>google</category><category>hugging-face</category><category>gpt-4o-mini</category><category>gpt-4o-2024-08-06</category><category>llama-3</category><category>bigllama-3.1-1t-instruct</category><category>meta-llama-3-120b-instruct</category><category>gemma-2-2b</category><category>lora</category><category>controlnet</category><category>line-art</category><category>gpu-performance</category><category>multi-gpu-support</category><category>fine-tuning</category><category>prompt-formatting</category><category>cloud-computing</category><category>text-to-image-generation</category><category>model-integration</category></item><item><title>GPT4o August + 100% Structured Outputs for All (GPT4o August edition)</title><link>https://news.smol.ai/issues/24-08-06-ainews-gpt4o-august-100percent-structured-outputs-for-all-gpt4o-august-edition/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-06-ainews-gpt4o-august-100percent-structured-outputs-for-all-gpt4o-august-edition/</guid><description>**OpenAI** released the new **gpt-4o-2024-08-06** model with a **16k output token limit** and **33-50% lower pricing** than the previous 4o-May version, featuring a new Structured Output API that improves output quality and reduces retry costs. 
**Meta AI** launched **Llama 3.1**, a **405-billion parameter** model surpassing **GPT-4** and **Claude 3.5 Sonnet** on benchmarks, alongside expanding the **Llama Impact Grant** program. **Google DeepMind** quietly released **Gemini 1.5 Pro**, outperforming **GPT-4o**, **Claude-3.5**, and **Llama 3.1** on LMSYS benchmarks and leading the Vision Leaderboard. **Yi-Large Turbo** was introduced as a cost-effective upgrade priced at $0.19 per million tokens. In hardware, **NVIDIA H100 GPUs** were highlighted by **John Carmack** for their massive AI workload power, and **Groq** announced plans to deploy **108,000 LPUs** by Q1 2025. New AI tools and techniques include **RAG (Retrieval-Augmented Generation)**, the **JamAI Base** platform for Mixture of Agents systems, and **LangSmith**&apos;s enhanced filtering capabilities. Google DeepMind also introduced **PEER (Parameter Efficient Expert Retrieval)** architecture.</description><pubDate>Wed, 07 Aug 2024 02:40:09 GMT</pubDate><category>openai</category><category>meta-ai-fair</category><category>google-deepmind</category><category>yi-large</category><category>nvidia</category><category>groq</category><category>langchain</category><category>jamai</category><category>langsmith</category><category>gpt-4o-2024-08-06</category><category>llama-3-1-405b</category><category>llama-3</category><category>claude-3.5-sonnet</category><category>gemini-1.5-pro</category><category>gpt-4o</category><category>yi-large-turbo</category><category>john-carmack</category><category>jonathan-ross</category><category>rohanpaul_ai</category><category>structured-output</category><category>context-windows</category><category>model-pricing</category><category>benchmarking</category><category>parameter-efficient-expert-retrieval</category><category>retrieval-augmented-generation</category><category>mixture-of-experts</category><category>model-performance</category><category>ai-hardware</category><category>model-deployment</category><category>filtering</
category><category>multi-lingual</category><category>vision</category></item><item><title>How Carlini Uses AI</title><link>https://news.smol.ai/issues/24-08-05-ainews-how-carlini-uses-ai/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-05-ainews-how-carlini-uses-ai/</guid><description>**Groq&apos;s** shareholders&apos; net worth rises while others fall, with **Intel&apos;s CEO** expressing concern. **Nicholas Carlini** of **DeepMind** gains recognition and criticism for his extensive AI writings, including an 80,000-word treatise on AI use and a benchmark for large language models. **Chris Dixon** comments on AI Winter skepticism, emphasizing long-term impact. **Box** introduces an AI API for extracting structured data from documents, highlighting potential and risks of LLM-driven solutions. Recent AI developments include **Figure AI** launching the advanced humanoid robot Figure 02, **OpenAI** rolling out Advanced Voice Mode for ChatGPT with emotion detection, **Google** open-sourcing **Gemma 2 2B** model matching GPT-3.5-Turbo-0613 performance, **Meta AI Fair** releasing Segment Anything Model 2 (SAM 2) for real-time object tracking, **NVIDIA** showcasing Project GR00T for humanoid teleoperation with Apple Vision Pro, **Stability AI** launching Stable Fast 3D for rapid 3D asset generation, and **Runway** unveiling Gen-3 Alpha for AI text-to-video generation.</description><pubDate>Mon, 05 Aug 2024 23:43:14 
GMT</pubDate><category>groq</category><category>intel</category><category>deepmind</category><category>box</category><category>figure-ai</category><category>openai</category><category>google</category><category>meta-ai-fair</category><category>nvidia</category><category>stability-ai</category><category>runway</category><category>gemma-2-2b</category><category>gpt-3.5-turbo-0613</category><category>mixtral-8x7b</category><category>gen-3-alpha</category><category>segment-anything-model-2</category><category>stable-fast-3d</category><category>nicholas-carlini</category><category>chris-dixon</category><category>rasbt</category><category>benchmarking</category><category>adversarial-attacks</category><category>large-language-models</category><category>text-generation</category><category>multimodality</category><category>robotics</category><category>emotion-detection</category><category>structured-data-extraction</category><category>real-time-processing</category><category>teleoperation</category><category>3d-generation</category><category>text-to-video</category></item><item><title>Execuhires: Tempting The Wrath of Khan</title><link>https://news.smol.ai/issues/24-08-02-ainews-execuhires-tempting-the-wrath-of-khan/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-08-02-ainews-execuhires-tempting-the-wrath-of-khan/</guid><description>**Character.ai&apos;s $2.5b execuhire to Google** marks a significant leadership move alongside **Adept&apos;s $429m execuhire to Amazon** and **Inflection&apos;s $650m execuhire to Microsoft**. Despite strong user growth and content momentum, Character.ai&apos;s CEO Noam Shazeer returns to Google, signaling shifting vibes in the AI industry. **Google DeepMind&apos;s Gemini 1.5 Pro** tops Chatbot Arena benchmarks, outperforming **GPT-4o** and **Claude-3.5**, excelling in multilingual, math, and coding tasks. 
The launch of **Black Forest Labs&apos; FLUX.1** text-to-image model and **LangGraph Studio** agent IDE highlight ongoing innovation. **Llama 3.1 405B** is released as the largest open-source model, fostering developer use and competition with closed models. The industry is focusing increasingly on post-training and data as key competitive factors, raising questions about acquisition practices and regulatory scrutiny.</description><pubDate>Sat, 03 Aug 2024 01:48:48 GMT</pubDate><category>character.ai</category><category>google</category><category>adept</category><category>amazon</category><category>inflection</category><category>microsoft</category><category>stability-ai</category><category>black-forest-labs</category><category>schelling</category><category>google-deepmind</category><category>openai</category><category>anthropic</category><category>meta-ai-fair</category><category>lmsys</category><category>langchainai</category><category>gemini-1.5-pro</category><category>gpt-4o</category><category>claude-3.5</category><category>flux-1</category><category>llama-3-1-405b</category><category>noam-shazeer</category><category>mostafa-mostaque</category><category>david-friedman</category><category>rob-rombach</category><category>alexandr-wang</category><category>svpino</category><category>rohanpaul_ai</category><category>execuhire</category><category>model-benchmarking</category><category>multilinguality</category><category>math</category><category>coding</category><category>text-to-image</category><category>agent-ide</category><category>open-source-models</category><category>post-training</category><category>data-driven-performance</category></item><item><title>Rombach et al: FLUX.1 [pro|dev|schnell], $31m seed for Black Forest Labs</title><link>https://news.smol.ai/issues/24-08-01-ainews-rombach-et-al-flux1-proordevorschnell-dollar31m-seed-for-black-forest-labs/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-08-01-ainews-rombach-et-al-flux1-proordevorschnell-dollar31m-seed-for-black-forest-labs/</guid><description>**Stability AI** co-founder Rombach launched **FLUX.1**, a new text-to-image model with three variants: pro (API only), dev (open-weight, non-commercial), and schnell (Apache 2.0). FLUX.1 outperforms **Midjourney** and **Ideogram** on Black Forest Labs&apos; own ELO scores, and the company plans to expand into text-to-video. **Google DeepMind** released **Gemma-2 2B**, a 2 billion parameter open-source model that outperforms larger models like **GPT-3.5-Turbo-0613** and **Mixtral-8x7b** on Chatbot Arena, optimized with NVIDIA TensorRT-LLM. The release includes safety classifiers (ShieldGemma) and sparse autoencoder analysis (Gemma Scope). Discussions highlight benchmarking discrepancies and US government support for open-weight AI models. Critiques of AI coding tools&apos; productivity gains were also noted.</description><pubDate>Fri, 02 Aug 2024 01:05:39 GMT</pubDate><category>stability-ai</category><category>google-deepmind</category><category>nvidia</category><category>gemma-2-2b</category><category>gpt-3.5-turbo-0613</category><category>mixtral-8x7b</category><category>flux-1</category><category>rohanpaul_ai</category><category>fchollet</category><category>bindureddy</category><category>clementdelangue</category><category>ylecun</category><category>svpino</category><category>text-to-image</category><category>text-to-video</category><category>model-benchmarking</category><category>open-weight-models</category><category>model-distillation</category><category>safety-classifiers</category><category>sparse-autoencoders</category><category>ai-coding-tools</category></item><item><title>Gemma 2 2B + Scope + Shield</title><link>https://news.smol.ai/issues/24-07-31-ainews-gemma-2-2b-scope-shield/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-31-ainews-gemma-2-2b-scope-shield/</guid><description>**Gemma 2 2B**, 
a 2 billion parameter model trained on **2 trillion tokens** and distilled from a larger unnamed LLM, has been released by **Google DeepMind** and shows strong leaderboard performance despite weaknesses in math. The Gemma series, including 9B and 27B models, has gained popularity since its June release. The team also released 400 SAEs for interpretability, inspired by **Anthropic**&apos;s research. A finetuned classifier called ShieldGemma outperforms Meta&apos;s LlamaGuard in harm detection. Meanwhile, **Meta AI** announced **Llama-3.1-405B** reaching #3 on the Overall Arena leaderboard, and released **SAM 2**, a video and image segmentation model with significant speed improvements. **OpenAI** is rolling out an advanced Voice Mode to Plus users. **Perplexity AI** launched a Publishers Program with major media partners and a status page. **NVIDIA** introduced Project GR00T for scaling robot data using Apple Vision Pro and generative simulation. Interest in quantization for compressing LLMs is growing, and LLM-as-a-Judge implementations from Vicuna, AlpacaEval, and G-Eval highlight the effectiveness of simple prompts and domain-specific evaluation.</description><pubDate>Thu, 01 Aug 2024 01:33:32 
GMT</pubDate><category>google-deepmind</category><category>anthropic</category><category>meta-ai-fair</category><category>openai</category><category>perplexity-ai</category><category>nvidia</category><category>lmsys</category><category>gemma-2b</category><category>gemma-2-9b</category><category>gemma-2-27b</category><category>llama-3-1-405b</category><category>sam-2</category><category>gpt-3.5</category><category>vicuna</category><category>alpacaeval</category><category>g-eval</category><category>knowledge-distillation</category><category>leaderboards</category><category>model-interpretability</category><category>finetuning</category><category>harm-detection</category><category>video-segmentation</category><category>voice</category><category>publishers-program</category><category>robotics-data-scaling</category><category>quantization</category><category>llm-evaluation</category><category>prompt-engineering</category></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-07-31-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-31-ainews-not-much-happened-today/</guid><description>**Meta** released **SAM 2**, a unified model for real-time object segmentation with a new dataset 4.5x larger and 53x more annotated than previous ones. **FastHTML**, a new Python web framework by **Jeremy Howard**, enables easy creation and deployment of interactive web apps. **Scale AI** launched the SEAL Leaderboard on adversarial robustness, topped by **Gemini 1.5 Pro** from **Google DeepMind**. **Apple** published a technical report on their Intelligence Foundation Language Models for on-device and server use. **Yann LeCun** emphasized the importance of open source AI in an article co-authored with Martin Casado and Ion Stoica. **Maarten Grootendorst**&apos;s &quot;Visual Guide to Quantization&quot; on efficient LLM inference went viral. 
**ChatGPT** started rolling out advanced voice and vision-enabled modes to select users. **Leonardo AI** was acquired by **Canva**. **Jim Fan** shared insights on Project GR00T augmenting human demonstration data for robotics. **Midjourney v6.1** was released.</description><pubDate>Wed, 31 Jul 2024 07:04:15 GMT</pubDate><category>meta-ai-fair</category><category>google-deepmind</category><category>scale-ai</category><category>apple</category><category>canva</category><category>hugging-face</category><category>sam-2</category><category>gemini-1.5-pro</category><category>chatgpt</category><category>midjourney-v6.1</category><category>jeremyphoward</category><category>demis-hassabis</category><category>ylecun</category><category>maartengrootendorst</category><category>jimfan</category><category>object-segmentation</category><category>quantization</category><category>web-development-framework</category><category>adversarial-robustness</category><category>on-device-ai</category><category>open-source</category><category>robotics</category><category>voice</category><category>vision</category></item><item><title>Apple Intelligence Beta + Segment Anything Model 2</title><link>https://news.smol.ai/issues/24-07-29-ainews-apple-intelligence-beta-segment-anything-model-2/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-29-ainews-apple-intelligence-beta-segment-anything-model-2/</guid><description>**Meta** advanced its open source AI with a sequel to the **Segment Anything Model**, enhancing image segmentation with memory attention for video applications using minimal data and compute. **Apple Intelligence** delayed its official release to iOS 18.1 in October but launched developer previews on **macOS Sequoia**, **iOS 18**, and **iPadOS 18**, accompanied by a detailed 47-page paper revealing extensive pretraining on **6.3T tokens** and use of **Cloud TPUs** rather than Apple Silicon. 
The paper highlights improvements in instruction following, reasoning, and writing through post-training and synthetic data. Benchmarks show Apple’s model scores lower than **Llama 3**, but with trusted human evaluations. Additionally, **Meta** released **Llama 3.1** with a 405B parameter model, marking a significant open-source frontier model release.</description><pubDate>Tue, 30 Jul 2024 02:45:55 GMT</pubDate><category>meta-ai-fair</category><category>apple</category><category>llama-3-405b</category><category>llama-3</category><category>segment-anything-model</category><category>bindureddy</category><category>maximelabonne</category><category>reach_vb</category><category>image-segmentation</category><category>memory-attention</category><category>video-processing</category><category>pretraining</category><category>cloud-tpus</category><category>post-training</category><category>synthetic-data</category><category>instruction-following</category><category>reasoning</category><category>writing</category><category>benchmarking</category></item><item><title>AlphaProof + AlphaGeometry2 reach 1 point short of IMO Gold</title><link>https://news.smol.ai/issues/24-07-25-ainews-alphaproof-alphageometry2-reach-1-point-short-of-imo-gold/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-25-ainews-alphaproof-alphageometry2-reach-1-point-short-of-imo-gold/</guid><description>**Search+Verifier** highlights advances in neurosymbolic AI during the 2024 International Mathematical Olympiad (IMO). **Google DeepMind**&apos;s combination of **AlphaProof** and **AlphaGeometry 2** solved four out of six IMO problems, with AlphaProof being a finetuned **Gemini** model using an AlphaZero approach, and AlphaGeometry 2 trained on significantly more synthetic data with a novel knowledge-sharing mechanism. Despite impressive results, human judges noted the AI required far more time than human competitors. 
Meanwhile, **Meta AI** released **Llama 3.1** with a 405B parameter model and smaller variants, and **Mistral AI** launched **Mistral Large 2** with 123B parameters and 128k context windows, outperforming Llama 3.1 on coding tasks and multilingual benchmarks. This marks significant progress in AI mathematical reasoning, model scaling, and multilingual capabilities.</description><pubDate>Fri, 26 Jul 2024 01:15:56 GMT</pubDate><category>google-deepmind</category><category>meta-ai-fair</category><category>mistral-ai</category><category>gemini</category><category>alphageometry-2</category><category>alphaproof</category><category>llama-3-1-405b</category><category>llama-3-70b</category><category>llama-3-8b</category><category>mistral-large-2</category><category>tim-gowers</category><category>guillaume-lample</category><category>osanseviero</category><category>neurosymbolic-ai</category><category>mathematical-reasoning</category><category>synthetic-data</category><category>knowledge-sharing</category><category>model-fine-tuning</category><category>alpha-zero</category><category>multilinguality</category><category>context-windows</category><category>model-scaling</category><category>benchmarking</category><category>performance-comparison</category></item><item><title>Mistral Large 2 + RIP Mistral 7B, 8x7B, 8x22B</title><link>https://news.smol.ai/issues/24-07-24-ainews-mistral-large-2-rip-mistral-7b-8x7b-8x22b/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-24-ainews-mistral-large-2-rip-mistral-7b-8x7b-8x22b/</guid><description>**Mistral Large 2** introduces **123B parameters** with **Open Weights** under a Research License, focusing on **code generation**, **math performance**, and a massive **128k context window**, improving over Mistral Large 1&apos;s 32k context. It claims better **function calling** capabilities than **GPT-4o** and enhanced reasoning. 
Meanwhile, **Meta** officially released **Llama-3.1** models including **Llama-3.1-70B** and **Llama-3.1-8B** with detailed pre-training and post-training insights. The **Llama-3.1 8B** model&apos;s 128k context performance was found underwhelming compared to **Mistral Nemo** and **Yi 34B 200K**. Mistral is deprecating older Apache open-source models, focusing on Large 2 and **Mistral Nemo 12B**. The news also highlights community discussions and benchmarking comparisons.</description><pubDate>Wed, 24 Jul 2024 23:44:31 GMT</pubDate><category>mistral-ai</category><category>meta-ai-fair</category><category>groq</category><category>togethercompute</category><category>mistral-large-2</category><category>mistral-nemo-12b</category><category>llama-3.1-8b</category><category>llama-3.1-70b</category><category>llama-3.1</category><category>llama-3-405b</category><category>yi-34b-200k</category><category>gpt-4o</category><category>code-generation</category><category>math</category><category>function-calling</category><category>reasoning</category><category>context-windows</category><category>model-deprecation</category><category>pretraining</category><category>posttraining</category><category>benchmarking</category></item><item><title>Llama 3.1: The Synthetic Data Model</title><link>https://news.smol.ai/issues/24-07-23-ainews-llama-31-the-synthetic-data-model/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-23-ainews-llama-31-the-synthetic-data-model/</guid><description>**Meta AI** has released **Llama 3.1**, including a **405B parameter model** that triggers regulatory considerations like the **EU AI Act** and **SB 1047**. The model incorporates extensive **synthetic data** techniques for **code**, **math**, **multilinguality**, **long context**, and **tool use** fine-tuning, with **RLHF** using synthetic preference data from **Llama 2**. 
The launch was coordinated across major inference providers, with **Groq** demonstrating **750 tokens per second** inference speed and **Fireworks** leading in pricing. The updated license explicitly allows synthetic data generation, marking a significant step in open frontier-class LLMs and cost-efficiency improvements since March.</description><pubDate>Wed, 24 Jul 2024 00:13:31 GMT</pubDate><category>meta-ai-fair</category><category>groq</category><category>fireworks</category><category>llama-3-405b</category><category>llama-3-1</category><category>llama-3</category><category>bindureddy</category><category>thomas</category><category>synthetic-data</category><category>fine-tuning</category><category>reinforcement-learning</category><category>multilinguality</category><category>long-context</category><category>tool-use</category><category>code-generation</category><category>math</category><category>model-licensing</category><category>inference-speed</category><category>model-deployment</category></item><item><title>Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model</title><link>https://news.smol.ai/issues/24-07-22-ainews-llama-31-leaks-big-bumps-to-8b-minor-bumps-to-70b-and-sota-oss-405b-model/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-22-ainews-llama-31-leaks-big-bumps-to-8b-minor-bumps-to-70b-and-sota-oss-405b-model/</guid><description>**Llama 3.1** leaks reveal a **405B dense model** with **128k context length**, trained on **39.3M GPU hours** using H100-80GB GPUs, and fine-tuned with **over 25M synthetic examples**. The model shows significant benchmark improvements, especially for the 8B and 70B variants, with some evals suggesting the 70B outperforms **GPT-4o**. **GPT-4o Mini** launched as a cost-efficient variant with strong performance but some reasoning weaknesses. Synthetic datasets like **NuminaMath** enable models such as **Alibaba Qwen 2** to surpass GPT-4o and Claude 3.5 in math competitions. 
Discussions include reasoning task benchmarks and dataset building for improved reasoning.</description><pubDate>Tue, 23 Jul 2024 01:12:50 GMT</pubDate><category>meta-ai-fair</category><category>openai</category><category>alibaba</category><category>llama-3-1-405b</category><category>llama-3-8b</category><category>llama-3-70b</category><category>llama-3-1-8b</category><category>gpt-4o</category><category>gpt-4o-mini</category><category>claude-3-5</category><category>qwen-2</category><category>swyx</category><category>philschmid</category><category>jjitsev</category><category>lewtun</category><category>teknium1</category><category>adcock_brett</category><category>multilinguality</category><category>code-generation</category><category>context-windows</category><category>model-training</category><category>synthetic-data</category><category>benchmarking</category><category>reasoning</category><category>fine-tuning</category><category>model-performance</category><category>dataset-release</category></item><item><title>DataComp-LM: the best open-data 7B model/benchmark/dataset</title><link>https://news.smol.ai/issues/24-07-19-ainews-datacomp-lm-the-best-open-data-7b-modelbenchmarkdataset/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-19-ainews-datacomp-lm-the-best-open-data-7b-modelbenchmarkdataset/</guid><description>**DataComp team** released a competitive **7B open data language model** trained on only **2.5T tokens** from the massive **DCLM-POOL dataset** of **240 trillion tokens**, showing superior scaling trends compared to FineWeb. **OpenAI** launched **GPT-4o mini**, a cost-effective model with **82% MMLU** and performance near GPT-4-Turbo, aimed at developers for broad applications. **NVIDIA and Mistral** jointly released the **Mistral NeMo 12B** model featuring a **128k token context window**, FP8 checkpoint, multilingual support, and Apache 2.0 licensing. 
**DeepSeek** announced **DeepSeek-V2-0628** as the top open-source model on the LMSYS Chatbot Arena leaderboard with strong rankings in coding, math, and hard prompts. This news highlights advances in dataset design, model efficiency, and open-source contributions in the AI community.</description><pubDate>Sat, 20 Jul 2024 02:08:36 GMT</pubDate><category>datacomp</category><category>hugging-face</category><category>openai</category><category>nvidia</category><category>mistral-ai</category><category>deepseek</category><category>mistral-nemo-12b</category><category>gpt-4o-mini</category><category>deepseek-v2-0628</category><category>mistral-7b</category><category>llama-3</category><category>gemma-2</category><category>qwen-2</category><category>sam-altman</category><category>guillaume-lample</category><category>philschmid</category><category>miramurati</category><category>dataset-design</category><category>scaling-laws</category><category>model-benchmarking</category><category>model-performance</category><category>fine-tuning</category><category>multilinguality</category><category>function-calling</category><category>context-windows</category><category>open-source-models</category><category>model-optimization</category><category>cost-efficiency</category><category>benchmarking</category></item><item><title>Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o-mini version)</title><link>https://news.smol.ai/issues/24-07-18-ainews-mini-nemo-turbo-lite-smol-models-go-brrr-gpt4o-mini-version/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-18-ainews-mini-nemo-turbo-lite-smol-models-go-brrr-gpt4o-mini-version/</guid><description>**OpenAI** launched the **GPT-4o Mini**, a cost-efficient small model priced at **$0.15 per million input tokens** and **$0.60 per million output tokens**, aiming to replace **GPT-3.5 Turbo** with enhanced intelligence but some performance limitations. 
**DeepSeek** open-sourced **DeepSeek-V2-0628**, topping the LMSYS Chatbot Arena Leaderboard and emphasizing their commitment to contributing to the AI ecosystem. **Mistral AI** and **NVIDIA** released the **Mistral NeMo**, a **12B parameter** multilingual model with a record **128k token context window** under an **Apache 2.0 license**, sparking debates on benchmarking accuracy against models like **Meta Llama 8B**. Research breakthroughs include the **TextGrad** framework for optimizing compound AI systems via textual feedback differentiation and the **STORM** system improving article writing by **25%** through simulating diverse perspectives and addressing source bias. Developer tooling trends highlight **LangChain**&apos;s evolving context-aware reasoning applications and the **Modular** ecosystem&apos;s new official GPU support, including discussions on **Mojo** and **Keras 3.0** integration.</description><pubDate>Fri, 19 Jul 2024 00:13:31 GMT</pubDate><category>openai</category><category>deepseek-ai</category><category>mistral-ai</category><category>nvidia</category><category>meta-ai-fair</category><category>hugging-face</category><category>langchain</category><category>keras</category><category>gpt-4o-mini</category><category>deepseek-v2-0628</category><category>mistral-nemo</category><category>llama-8b</category><category>liang-wenfeng</category><category>cost-efficiency</category><category>context-windows</category><category>open-source</category><category>benchmarking</category><category>neural-networks</category><category>model-optimization</category><category>text-generation</category><category>fine-tuning</category><category>developer-tools</category><category>gpu-support</category><category>parallelization</category><category>cuda-integration</category><category>multilinguality</category><category>long-context</category><category>article-generation</category></item><item><title>Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o 
version)</title><link>https://news.smol.ai/issues/24-07-18-ainews-mini-nemo-turbo-lite-smol-models-go-brrr-gpt4o-version/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-18-ainews-mini-nemo-turbo-lite-smol-models-go-brrr-gpt4o-version/</guid><description>**GPT-4o-mini** launches with a **99% price reduction** compared to text-davinci-003, costing **3.5% of the price of GPT-4o** and matching Opus-level benchmarks. It supports **16k output tokens**, is faster than previous models, and will soon support **text, image, video, and audio inputs and outputs**. **Mistral NeMo**, a **12B parameter model** developed with **NVIDIA**, features a **128k token context window**, FP8 checkpoint, and strong benchmark performance. **Together Lite and Turbo** offer fp8/int4 quantizations of **Llama 3** with up to **4x throughput** and significantly reduced costs. **DeepSeek V2** is now open-sourced. Upcoming releases include at least **5 unreleased models** and **Llama 4** leaks ahead of ICML 2024.</description><pubDate>Fri, 19 Jul 2024 00:00:39 GMT</pubDate><category>openai</category><category>nvidia</category><category>mistral-ai</category><category>togethercompute</category><category>deepseek-ai</category><category>lmsys</category><category>gpt-4o-mini</category><category>mistral-nemo</category><category>llama-3</category><category>llama-3-400b</category><category>deepseek-v2</category><category>sam-altman</category><category>model-quantization</category><category>context-windows</category><category>instruction-following</category><category>model-performance</category><category>cost-efficiency</category><category>multimodality</category><category>benchmarking</category><category>open-source</category><category>model-release</category></item><item><title>Gemma 2 tops /r/LocalLlama vibe check</title><link>https://news.smol.ai/issues/24-07-17-ainews-gemma-2-tops-rlocalllama-vibe-check/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-07-17-ainews-gemma-2-tops-rlocalllama-vibe-check/</guid><description>**Gemma 2 (9B, 27B)** is highlighted as a top-performing local LLM, praised for its speed, multilingual capabilities, and efficiency on consumer GPUs like the 2080ti. It outperforms models like **Llama 3** and **Mistral 7B** in various tasks, including non-English text processing and reasoning. The community discussion on /r/LocalLlama reflects strong preference for Gemma 2, with **18 mentions**, compared to **10 mentions** for Llama 3 and **9 mentions** for Mistral. Other models like **Phi 3** and **Qwen** also received mentions but are considered surpassed by Gemma 2. Additionally, **Andrej Karpathy** announced the launch of **Eureka Labs**, an AI+Education startup aiming to create an AI-native school with AI Teaching Assistants, starting with the **LLM101n** course to teach AI training fundamentals. This initiative is seen as a significant development in AI education.</description><pubDate>Wed, 17 Jul 2024 22:57:14 GMT</pubDate><category>gemma</category><category>llamaindex</category><category>mistral-ai</category><category>cohere</category><category>deepseek-ai</category><category>nous-research</category><category>eureka-labs</category><category>gemma-2-9b</category><category>gemma-2-27b</category><category>llama-3</category><category>mistral-7b</category><category>phi-3</category><category>qwen</category><category>andrej-karpathy</category><category>model-comparison</category><category>local-llms</category><category>multilinguality</category><category>model-efficiency</category><category>fine-tuning</category><category>ai-education</category><category>ai-teaching-assistants</category></item><item><title>SciCode: HumanEval gets a STEM PhD upgrade</title><link>https://news.smol.ai/issues/24-07-16-ainews-scicode-humaneval-gets-a-stem-phd-upgrade/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-07-16-ainews-scicode-humaneval-gets-a-stem-phd-upgrade/</guid><description>**PhD-level benchmarks** highlight the difficulty of coding scientific problems for LLMs, with **GPT-4** and **Claude 3.5 Sonnet** scoring under 5% on the new **SciCode** benchmark. **Anthropic** doubled the max output token limit for Claude 3.5 Sonnet to 8192 tokens. The **Q-GaLore** method enables training **LLaMA-7B** on a single 16GB GPU. The **Mosaic compiler** now generates efficient code for NVIDIA H100 GPUs. The **Dolphin 2.9.3-Yi-1.5-34B-32k-GGUF** model on Hugging Face has over 111k downloads. **Llama 3** shows strong performance, achieving 90% zero-shot accuracy on the MATH dataset. Discussions continue on the limitations and forms of synthetic data for model training.</description><pubDate>Wed, 17 Jul 2024 02:04:35 GMT</pubDate><category>anthropic</category><category>hugging-face</category><category>nvidia</category><category>gpt-4</category><category>claude-3.5-sonnet</category><category>llama-3-7b</category><category>llama-3</category><category>dolphin-2.9.3-yi-1.5-34b-32k-gguf</category><category>yi-tay</category><category>rohanpaul_ai</category><category>alexalbert__</category><category>tri_dao</category><category>abacaj</category><category>benchmarks</category><category>coding</category><category>model-training</category><category>gpu-optimization</category><category>model-performance</category><category>synthetic-data</category><category>compiler-optimization</category><category>zero-shot-learning</category></item><item><title>Microsoft AgentInstruct + Orca 3</title><link>https://news.smol.ai/issues/24-07-15-ainews-microsoft-agentinstruct-orca-3/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-15-ainews-microsoft-agentinstruct-orca-3/</guid><description>**Microsoft Research** released **AgentInstruct**, the third paper in its **Orca** series, introducing a generative teaching pipeline that produces **25.8 
million** synthetic instructions to fine-tune **mistral-7b**, achieving significant performance gains: +40% AGIEval, +19% MMLU, +54% GSM8K, +38% BBH, +45% AlpacaEval, and a 31.34% reduction in hallucinations. This synthetic data approach follows the success of **FineWeb** and **Apple&apos;s Rephrasing research** in improving dataset quality. Additionally, **Tencent** claims to have generated **1 billion** diverse personas for synthetic data. On AI Twitter, notable discussions included a shooting incident at a Trump rally and recent ML research highlights such as **FlashAttention-3**, **RankRAG**, and **Mixture of A Million Experts**.</description><pubDate>Tue, 16 Jul 2024 00:42:03 GMT</pubDate><category>microsoft-research</category><category>apple</category><category>tencent</category><category>hugging-face</category><category>mistral-7b</category><category>orca-2.5</category><category>philschmid</category><category>sama</category><category>bindureddy</category><category>rohanpaul_ai</category><category>zachtratar</category><category>dair_ai</category><category>synthetic-data</category><category>fine-tuning</category><category>instruction-following</category><category>transformers</category><category>model-performance</category><category>hallucination-detection</category><category>dataset-quality</category><category>flashattention</category><category>mixture-of-experts</category></item><item><title>We Solved Hallucinations</title><link>https://news.smol.ai/issues/24-07-12-ainews-we-solved-hallucinations/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-12-ainews-we-solved-hallucinations/</guid><description>**Reddit&apos;s URL structure causes link errors in AI-generated summaries, especially with NSFW content affecting models like Claude and GPT-4.** The team fixed this glitch while still leveraging LLMs for summarizing Reddit content. 
**GPT-2 training costs have dramatically dropped to ~$672 using H100 GPUs and software improvements like CUDA and FlashAttention.** **FlashAttention-3 was released, achieving up to 740 TFLOPS on H100 GPUs, with FP8 nearing 1.2 PFLOPS, developed collaboratively by Meta, NVIDIA, Princeton, and Colfax.** Hopper GPUs enable major speedups with new hardware features. **Synthetic data may not improve vision tasks, as shown in recent research.** The **Avocado360 benchmark evaluates vision-language models&apos; ability to detect avocados in images.** **Lynx, a hallucination detection model for LLMs, was introduced for real-world healthcare and fintech applications, trained by Patronus AI on Databricks Mosaic AI using Composer.**</description><pubDate>Sat, 13 Jul 2024 02:52:26 GMT</pubDate><category>meta-ai-fair</category><category>nvidia</category><category>princeton</category><category>colfax</category><category>patronus-ai</category><category>databricks</category><category>mosaic-ai</category><category>openai</category><category>gpt-2</category><category>flashattention-3</category><category>lynx</category><category>karpathy</category><category>tri_dao</category><category>giffmana</category><category>vikhyatk</category><category>dbrxmosaicai</category><category>compute-hardware</category><category>gpu-optimization</category><category>flashattention</category><category>llm-evaluation</category><category>hallucination-detection</category><category>vision</category><category>benchmarking</category><category>synthetic-data</category><category>model-training</category></item><item><title>FlashAttention 3, PaliGemma, OpenAI&apos;s 5 Levels to Superintelligence</title><link>https://news.smol.ai/issues/24-07-12-ainews-flashattention-3-paligemma-openais-5-levels-to-superintelligence/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-12-ainews-flashattention-3-paligemma-openais-5-levels-to-superintelligence/</guid><description>**FlashAttention-3** introduces fast and 
accurate attention optimized for **H100 GPUs**, advancing native **FP8 training**. **PaliGemma**, a versatile **3B Vision-Language Model (VLM)** combining a SigLIP-So400m ViT encoder with the **Gemma-2B** language model, emphasizes a prefix-LM architecture for improved image-query interaction. **OpenAI** reveals a framework on levels of superintelligence, signaling progress toward Level 2 and highlighting internal safety disagreements. On Reddit, **NuminaMath 7B**, fine-tuned from **DeepSeekMath-7B**, wins the AI Math Olympiad by solving 29 problems using iterative supervised fine-tuning and tool-integrated reasoning. Open-source LLMs like **CodeLlama-34b** and **WizardCoder-Python-34B-V1.0** are closing the coding performance gap with closed models such as **ChatGPT-3.5**.</description><pubDate>Fri, 12 Jul 2024 09:31:43 GMT</pubDate><category>openai</category><category>together-ai</category><category>google</category><category>hugging-face</category><category>deepseek</category><category>code-llama</category><category>flashattention-3</category><category>paligemma-3b</category><category>gemma-2b</category><category>numinamath-7b</category><category>deepseekmath-7b</category><category>codellama-34b</category><category>wizardcoder-python-34b-v1.0</category><category>chatgpt-3.5</category><category>ilya-sutskever</category><category>lucas-giffman</category><category>attention-mechanisms</category><category>fp8-training</category><category>vision</category><category>prefix-lm</category><category>superintelligence</category><category>fine-tuning</category><category>chain-of-thought</category><category>tool-integrated-reasoning</category><category>self-consistency-decoding</category><category>python</category><category>coding-capabilities</category><category>elo-ratings</category></item><item><title>Nothing much happened today</title><link>https://news.smol.ai/issues/24-07-10-ainews-nothing-much-happened-today/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-07-10-ainews-nothing-much-happened-today/</guid><description>**HuggingFace** released a browser-based timestamped Whisper using transformers.js. A Twitter bot by **truth_terminal** became the first &quot;semiautonomous&quot; bot to secure VC funding. **Microsoft** and **Apple** abruptly left the **OpenAI** board amid regulatory scrutiny. **Meta** is finalizing a major upgrade to Reddit comments addressing hallucination issues. The **Yi model** gained popularity on GitHub with 7.4K stars and 454 forks, with potential integration with **Axolotl** for pregeneration and preprocessing. **AMD** technologies enable household/small business AI appliances. **Meta** released **Chameleon-7b** and **Chameleon-30b** models on HuggingFace supporting unified text and image tokenization. **Salesforce**&apos;s **xLAM-1b** model outperforms **GPT-3.5** in function calling despite its smaller size. **Anole** pioneered open-source multimodal text-image-video generation up to 720p 144fps. **Phi-3 Mini** expanded from 3.8B to 4.7B parameters with function calling, competing with **Mistral-7b v3**. 
*&quot;System 2 distillation&quot;* in humans relates to automaticity and procedural memory.</description><pubDate>Thu, 11 Jul 2024 01:15:43 GMT</pubDate><category>huggingface</category><category>truth_terminal</category><category>microsoft</category><category>apple</category><category>openai</category><category>meta-ai-fair</category><category>yi</category><category>axolotl</category><category>amd</category><category>salesforce</category><category>chameleon-7b</category><category>chameleon-30b</category><category>xlam-1b</category><category>gpt-3.5</category><category>phi-3-mini</category><category>mistral-7b-v3</category><category>function-calling</category><category>multimodality</category><category>model-releases</category><category>model-updates</category><category>model-integration</category><category>automaticity</category><category>procedural-memory</category><category>text-image-video-generation</category></item><item><title>Test-Time Training, MobileLLM, Lilian Weng on Hallucination (Plus: Turbopuffer)</title><link>https://news.smol.ai/issues/24-07-09-ainews-test-time-training-mobilellm-lilian-weng-on-hallucination-plus-turbopuffer/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-09-ainews-test-time-training-mobilellm-lilian-weng-on-hallucination-plus-turbopuffer/</guid><description>**Lilian Weng** released a comprehensive literature review on **hallucination detection** and **anti-hallucination methods** including techniques like FactualityPrompt, SelfCheckGPT, and WebGPT. **Facebook AI Research (FAIR)** published **MobileLLM**, a sub-billion parameter on-device language model architecture achieving performance comparable to **llama-2-7b** with innovations like thin and deep models and shared weights. A new **RNN-based LLM architecture** with expressive hidden states was introduced, replacing attention mechanisms and scaling better than Mamba and Transformer models for long-context modeling. 
Additionally, **Tsinghua University** open sourced **CodeGeeX4-ALL-9B**, a multilingual code generation model excelling in code assistance.</description><pubDate>Wed, 10 Jul 2024 05:57:13 GMT</pubDate><category>facebook-research</category><category>meta-ai-fair</category><category>tsinghua-university</category><category>llama-2-7b</category><category>codegeex4-all-9b</category><category>mamba</category><category>lilian-weng</category><category>yann-lecun</category><category>hallucination-detection</category><category>anti-hallucination-methods</category><category>on-device-ai</category><category>model-architecture</category><category>rnn</category><category>long-context-modeling</category><category>model-scaling</category><category>expressive-hidden-states</category><category>code-generation</category></item><item><title>Problems with MMLU-Pro</title><link>https://news.smol.ai/issues/24-07-08-ainews-problems-with-mmlu-pro/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-08-ainews-problems-with-mmlu-pro/</guid><description>**MMLU-Pro** is gaining attention as the successor to MMLU on the **Open LLM Leaderboard V2** by **HuggingFace**, despite community concerns about evaluation discrepancies and prompt sensitivity affecting model performance, notably a **10-point improvement** in **Llama-3-8b-q8** with simple prompt tweaks. **Meta&apos;s MobileLLM** research explores running sub-billion parameter LLMs on smartphones using shared weights and deeper architectures. **Salesforce&apos;s APIGen** introduces an automated dataset generation system for function-calling tasks outperforming larger models. **Runway Gen-3 Alpha** launches an AI video generator for paid users creating realistic 10-second clips. **Nomic AI&apos;s GPT4All 3.0** offers an open-source desktop app supporting thousands of local models. AI assistants with multimodal capabilities and affordable access to multiple LLMs like ChatGPT, Claude, Llama, and Gemini are emerging. 
**Meta 3D Gen** advances text-to-3D asset generation, while Argil AI enables deepfake video creation from text threads. Research on transformer grokking and reasoning highlights advances in robust reasoning capabilities.</description><pubDate>Tue, 09 Jul 2024 00:20:51 GMT</pubDate><category>huggingface</category><category>meta-ai-fair</category><category>salesforce</category><category>runway</category><category>nomic-ai</category><category>pineapple</category><category>argil-ai</category><category>mmlu-pro</category><category>llama-3-8b-q8</category><category>gpt4all-3.0</category><category>chatgpt</category><category>claude</category><category>llama</category><category>gemini</category><category>mobilellm</category><category>runway-gen-3-alpha</category><category>meta-3d-gen</category><category>wenhu-chen</category><category>danhendrycks</category><category>clementine</category><category>ylecun</category><category>adcock_brett</category><category>svpino</category><category>rohanpaul_ai</category><category>benchmarking</category><category>prompt-engineering</category><category>model-evaluation</category><category>model-performance</category><category>multimodality</category><category>automated-dataset-generation</category><category>video-generation</category><category>open-source-models</category><category>ai-assistants</category><category>text-to-3d</category><category>deepfake</category><category>transformers</category><category>reasoning</category></item><item><title>Qdrant&apos;s BM42: &quot;Please don&apos;t trust us&quot;</title><link>https://news.smol.ai/issues/24-07-05-ainews-qdrants-bm42-please-dont-trust-us/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-05-ainews-qdrants-bm42-please-dont-trust-us/</guid><description>**Qdrant** attempted to replace BM25 and SPLADE with a new method called &quot;BM42&quot; combining transformer attention and collection-wide statistics for semantic and keyword search, but their evaluation using the Quora 
dataset was flawed. **Nils Reimers** from **Cohere** reran BM42 on better datasets and found it underperformed. Qdrant acknowledged the errors but still ran a suboptimal BM25 implementation. This highlights the importance of dataset choice and evaluation sanity checks in search model claims. Additionally, **Stripe** faced criticism for AI/ML model failures causing account and payment issues, prompting calls for alternatives. **Anthropic** revealed that **Claude 3.5 Sonnet** suppresses some answer parts with backend tags, sparking debate. **Gemma 2** model optimizations allow 2x faster fine-tuning with 63% less memory and longer context windows, running up to 34B parameters on consumer GPUs. **nanoLLaVA-1.5** was announced as a compact 1B parameter vision model with significant improvements.</description><pubDate>Sat, 06 Jul 2024 02:25:00 GMT</pubDate><category>qdrant</category><category>cohere</category><category>stripe</category><category>anthropic</category><category>hugging-face</category><category>stablequan_ai</category><category>claude-3.5-sonnet</category><category>gemma-2</category><category>nano-llava-1.5</category><category>nils-reimers</category><category>jeremyphoward</category><category>hamelhusain</category><category>rohanpaul_ai</category><category>semantic-search</category><category>benchmarking</category><category>dataset-quality</category><category>model-evaluation</category><category>model-optimization</category><category>vision</category><category>fine-tuning</category><category>context-windows</category></item><item><title>Not much happened today.</title><link>https://news.smol.ai/issues/24-07-03-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-03-ainews-not-much-happened-today/</guid><description>**Meta** introduced **Meta 3D Gen**, a system for end-to-end generation of 3D assets from text in under 1 minute, producing high-quality 3D assets with detailed textures. 
**Perplexity AI** updated Pro Search to handle deeper research with multi-step reasoning and code execution. **Microsoft** improved **Phi-3 Mini** with better long-context understanding and instruction following. **GPT4All 3.0** launched with support for thousands of models and major OS compatibility, featuring local file chat. **Yi-Large** model launched on Fireworks AI Playground. Research highlights include the evolution of **reinforcement learning from human feedback (RLHF)**, persona-driven data synthesis using a billion diverse personas, meta-tuning for few-shot generalization, and steering vectors for model behavior control. Tools updates include **LangSmith** improving memory retrieval and **Qdrant Engine v1.10** adding universal query API and multivector search.</description><pubDate>Wed, 03 Jul 2024 22:39:42 GMT</pubDate><category>meta</category><category>perplexity-ai</category><category>microsoft</category><category>gpt4all</category><category>langchainai</category><category>qdrant-engine</category><category>phi-3-mini</category><category>gpt4all-3.0</category><category>yi-large</category><category>meta-3d-gen</category><category>rohanpaul_ai</category><category>andriy_mulyar</category><category>cwolferesearch</category><category>sarahookr</category><category>3d-generation</category><category>long-context</category><category>instruction-following</category><category>reinforcement-learning-from-human-feedback</category><category>persona-driven-data-synthesis</category><category>meta-tuning</category><category>model-steering</category><category>memory-retrieval</category><category>multivector-search</category><category>universal-query-api</category></item><item><title>GraphRAG: The Marriage of Knowledge Graphs and RAG</title><link>https://news.smol.ai/issues/24-07-02-ainews-graphrag-the-marriage-of-knowledge-graphs-and-rag/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-07-02-ainews-graphrag-the-marriage-of-knowledge-graphs-and-rag/</guid><description>**Microsoft Research** open sourced **GraphRAG**, a retrieval augmented generation (RAG) technique that extracts knowledge graphs from sources and clusters them for improved LLM answers, though it increases token usage and inference time. **Gemma 2** models were released focusing on efficient small LLMs with innovations like sliding window attention and RMS norm, nearly matching the larger **Llama 3 70B**. **Anthropic&apos;s Claude 3.5 Sonnet** leads in instruction following and coding benchmarks, while **Nvidia&apos;s Nemotron 340B** model was released in June. **Qwen2-72B** tops the HuggingFace Open LLM leaderboard excelling in math and long-range reasoning. Discussions on RAG highlighted its limitations and improvements in context usage via function calls. A persona-driven synthetic data generation approach introduced 1 billion personas, with a fine-tuned model matching GPT-4 performance on math benchmarks at 7B scale. 
The **200GB AutoMathText dataset** was also noted for math data synthesis.</description><pubDate>Wed, 03 Jul 2024 01:30:30 GMT</pubDate><category>microsoft-research</category><category>anthropic</category><category>nvidia</category><category>hugging-face</category><category>gemma-2</category><category>llama-3-70b</category><category>claude-3.5-sonnet</category><category>nemotron-340b</category><category>qwen2-72b</category><category>llama-3</category><category>travis-fischer</category><category>rasbt</category><category>alexandr-wang</category><category>osanseviero</category><category>rohanpaul_ai</category><category>hamelhusain</category><category>svpino</category><category>aaaazzam</category><category>omarsar0</category><category>retrieval-augmented-generation</category><category>knowledge-graphs</category><category>token-usage</category><category>inference-time</category><category>attention-mechanisms</category><category>instruction-following</category><category>coding</category><category>math</category><category>long-range-reasoning</category><category>synthetic-data</category><category>dataset-release</category><category>fine-tuning</category><category>context-windows</category><category>function-calling</category></item><item><title>RouteLLM: RIP Martian? (Plus: AINews Structured Summaries update)</title><link>https://news.smol.ai/issues/24-07-01-ainews-routellm-rip-martian-plus-ainews-structured-summaries-update/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-07-01-ainews-routellm-rip-martian-plus-ainews-structured-summaries-update/</guid><description>**LMSys** introduces RouteLLM, an open-source router framework trained on **preference data** from Chatbot Arena, achieving **cost reductions over 85% on MT Bench, 45% on MMLU, and 35% on GSM8K** while maintaining **95% of GPT-4&apos;s performance**. 
This approach surpasses previous task-specific routing by using syntax-based Mixture of Experts (MoE) routing and data augmentation, beating commercial solutions by 40%. The update highlights advances in **LLM routing**, **cost-efficiency**, and **model performance optimization** across multiple models rather than single-model or MoE-level improvements. Additionally, the AI Twitter recap notes the **Gemma 2 model family** as a top open model, the **Block Transformer architecture** for improved inference throughput, and a proposal for a fully Software 2.0 computer vision system by **karpathy**.</description><pubDate>Tue, 02 Jul 2024 00:23:08 GMT</pubDate><category>lmsys</category><category>openai</category><category>gpt-4</category><category>gemma-2-27b</category><category>gemma-2-9b</category><category>karpathy</category><category>bindureddy</category><category>armand-joulin</category><category>llm-routing</category><category>cost-efficiency</category><category>model-performance</category><category>model-optimization</category><category>data-augmentation</category><category>syntax-based-routing</category><category>mixture-of-experts</category><category>inference-throughput</category><category>software-2.0</category><category>computer-vision</category></item><item><title>That GPT-4o Demo</title><link>https://news.smol.ai/issues/24-06-28-ainews-that-gpt-4o-demo/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-28-ainews-that-gpt-4o-demo/</guid><description>**Romain Huet** demonstrated an unreleased version of **GPT-4o** on ChatGPT Desktop showcasing capabilities like low latency voice generation, whisper tone moderation, camera mode streaming video to GPT-4o, rapid OCR, screen sharing with ChatGPT for programming help, clipboard reading, and vision-based code conversation. OpenAI&apos;s four investment areas highlighted include textual intelligence, efficiency/cost, model customization, and multimodal agents. 
**Google DeepMind** released **Gemma 2** models in 9B and 27B sizes trained on 8T and 13T tokens respectively, using SFT, distillation, RLHF, and model merging, optimized for TPUv5e with strong performance and safety measures. **Meta AI** announced the Meta LLM Compiler built on Meta Code Llama with enhanced code optimization and compiler features.</description><pubDate>Sat, 29 Jun 2024 00:48:47 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>meta-ai-fair</category><category>gpt-4o</category><category>gemma-2</category><category>meta-code-llama</category><category>romain-huet</category><category>fchollet</category><category>voice-generation</category><category>ocr</category><category>screen-sharing</category><category>vision</category><category>code-understanding</category><category>model-customization</category><category>efficiency</category><category>textual-intelligence</category><category>multimodal-agents</category><category>sft</category><category>distillation</category><category>rlhf</category><category>model-merging</category><category>model-optimization</category><category>safety</category></item><item><title>Gemma 2: The Open Model for Everyone</title><link>https://news.smol.ai/issues/24-06-27-ainews-gemma-2-the-open-model-for-everyone/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-27-ainews-gemma-2-the-open-model-for-everyone/</guid><description>**Gemma 2**, a **27B** parameter model from **google-deepmind**, was released with innovations like 1:1 local-global attention alternation and logit soft-capping, leveraging **knowledge distillation** to train smaller models on over 50× the compute-optimal token quantity. The model supports multilingual and multimodal capabilities, with fine-tuning success on over 200 Indic language variants. The **Open LLM Leaderboard** highlights **alibaba&apos;s Qwen 72B** as the top model, with **mistral-ai&apos;s Mixtral-8x22B-Instruct** also ranking highly. 
**Anthropic** launched **Claude 3.5 Sonnet**, improving intelligence at mid-tier cost and speed. Research on eliminating matrix multiplication in LLMs promises significant memory savings without performance loss. *Kathleen Kenealy* and *Daniel Han* provided insights on Gemma 2&apos;s tokenizer and attention scaling respectively.</description><pubDate>Fri, 28 Jun 2024 06:21:39 GMT</pubDate><category>google-deepmind</category><category>alibaba</category><category>mistral-ai</category><category>anthropic</category><category>gemma-2</category><category>qwen-72b</category><category>mixtral-8x22b-instruct</category><category>claude-3.5-sonnet</category><category>kathleen-kenealy</category><category>daniel-han</category><category>knowledge-distillation</category><category>attention-mechanisms</category><category>multilingual-models</category><category>multimodality</category><category>model-training</category><category>model-optimization</category><category>memory-optimization</category><category>fine-tuning</category></item><item><title>Mozilla&apos;s AI Second Act</title><link>https://news.smol.ai/issues/24-06-26-ainews-mozillas-ai-second-act/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-26-ainews-mozillas-ai-second-act/</guid><description>**Mozilla** showcased detailed live demos of **llamafile** and announced **sqlite-vec** for vector search integration at the AIE World&apos;s Fair. **LlamaIndex** launched **llama-agents**. **Anthropic** introduced new UI features and **Projects** for **Claude** with a 200K context window. **Etched AI** revealed a specialized inference chip claiming **500k tokens/sec**, though benchmark claims are questioned. **Sohu** chip enables **15 agent trajectories/sec**. **Tim Dettmers** shared theoretical GPU inference limits of ~300k tokens/sec for 8xB200 NVLink on 70B Llama. **Deepseek Coder v2** outperforms **Gemini** and GPT-4 variants in coding and reasoning. 
The **PyTorch documentary** launched to little attention.</description><pubDate>Thu, 27 Jun 2024 01:37:35 GMT</pubDate><category>mozilla</category><category>llamaindex</category><category>anthropic</category><category>etched-ai</category><category>sohu</category><category>deepseek</category><category>openai</category><category>llama-3</category><category>claude-3-opus</category><category>gemini-1.5</category><category>deepseek-coder-v2</category><category>gpt-4</category><category>justine-tunney</category><category>stephen-hood</category><category>tim-dettmers</category><category>bindureddy</category><category>vector-search</category><category>inference-speed</category><category>hardware-benchmarks</category><category>context-windows</category><category>open-source-models</category><category>coding</category><category>reasoning</category><category>model-benchmarking</category><category>gpu-inference</category><category>agentic-ai</category></item><item><title>Shall I compare thee to a Sonnet&apos;s day?</title><link>https://news.smol.ai/issues/24-06-25-ainews-shall-i-compare-thee-to-a-sonnets-day/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-25-ainews-shall-i-compare-thee-to-a-sonnets-day/</guid><description>**Claude 3.5 Sonnet** from **Anthropic** achieves top rankings in coding and hard prompt arenas, surpassing **GPT-4o** and competing with **Gemini 1.5 Pro** at lower cost. **Glif** demonstrates a fully automated **Wojak meme generator** using Claude 3.5 for JSON generation and ComfyUI for images, showcasing new JSON extractor capabilities. **Artifacts** enables rapid creation of niche apps, exemplified by a dual monitor visualizer made in under 5 minutes. **François Chollet** highlights that fusion energy is not a near-term solution compared to existing nuclear fission plants. 
**Mustafa Suleyman** notes that 75% of desk workers now use AI, marking a shift toward AI-assisted productivity.</description><pubDate>Wed, 26 Jun 2024 00:39:44 GMT</pubDate><category>anthropic</category><category>lmsys</category><category>glif</category><category>comfyui</category><category>claude-3.5-sonnet</category><category>claude-3.5</category><category>gpt-4o</category><category>gemini-1.5-pro</category><category>fchollet</category><category>mustafasuleyman</category><category>hard-prompts</category><category>json</category><category>json-extraction</category><category>meme-generation</category><category>instruction-following</category><category>app-development</category><category>fusion-energy</category><category>nuclear-fission</category><category>productivity</category></item><item><title>Gemini Nano: 50-90% of Gemini Pro, &lt;100ms inference, on device, in Chrome Canary</title><link>https://news.smol.ai/issues/24-06-25-ainews-gemini-nano-50-90percent-of-gemini-pro-less100ms-inference-on-device-in-chrome-canary/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-25-ainews-gemini-nano-50-90percent-of-gemini-pro-less100ms-inference-on-device-in-chrome-canary/</guid><description>The latest **Chrome Canary** now includes a feature flag for **Gemini Nano**, offering a prompt API and on-device optimization guide, with models Nano 1 and 2 at **1.8B** and **3.25B** parameters respectively, showing decent performance relative to Gemini Pro. The base and instruct-tuned model weights have been extracted and posted to **HuggingFace**. In AI model releases, **Anthropic** launched **Claude 3.5 Sonnet**, which outperforms **GPT-4o** on some benchmarks, is twice as fast as Opus, and is free to try. **DeepSeek-Coder-V2** achieves **90.2%** on HumanEval and **75.7%** on MATH, surpassing GPT-4-Turbo-0409, with models up to **236B** parameters and **128K** context length. **GLM-0520** from **Zhipu AI/Tsinghua** ranks highly in coding and overall benchmarks. 
**NVIDIA** announced **Nemotron-4 340B**, an open model family for synthetic data generation. Research highlights include **TextGrad**, a framework for automatic differentiation on textual feedback; **PlanRAG**, an iterative plan-then-RAG decision-making technique; a paper on **goldfish loss** to mitigate memorization in LLMs; and a tree search algorithm for language model agents.</description><pubDate>Tue, 25 Jun 2024 07:02:13 GMT</pubDate><category>google</category><category>gemini</category><category>huggingface</category><category>anthropic</category><category>deepseek</category><category>zhipu-ai</category><category>tsinghua</category><category>nvidia</category><category>gemini-nano</category><category>gemini-pro</category><category>claude-3.5-sonnet</category><category>gpt-4o</category><category>deepseek-coder-v2</category><category>glm-0520</category><category>nemotron-4-340b</category><category>gpt-4-turbo-0409</category><category>adcock_brett</category><category>dair_ai</category><category>lmsysorg</category><category>model-quantization</category><category>prompt-api</category><category>optimization</category><category>model-weights</category><category>benchmarking</category><category>code-generation</category><category>math</category><category>synthetic-data</category><category>automatic-differentiation</category><category>retrieval-augmented-generation</category><category>mitigating-memorization</category><category>tree-search</category><category>inference-time-algorithms</category></item><item><title>Shazeer et al (2024): you are overpaying for inference &gt;13x</title><link>https://news.smol.ai/issues/24-06-21-ainews-shazeer-et-al-2024-you-are-overpaying-for-inference-greater13x/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-21-ainews-shazeer-et-al-2024-you-are-overpaying-for-inference-greater13x/</guid><description>**Noam Shazeer** explains how **Character.ai** serves **20% of Google Search Traffic** for LLM inference while reducing 
serving costs by a factor of **33** compared to late 2022, with leading commercial APIs costing at least **13.5X more**. Key memory-efficiency techniques include **MQA &gt; GQA** reducing KV cache size by 8X, hybrid attention horizons, cross-layer KV-sharing, stateful caching with a 95% cache rate, and native int8 precision with custom kernels. **Anthropic** released **Claude 3.5 Sonnet**, which outperforms **Claude 3 Opus** at twice the speed and one-fifth the cost, passing **64%** of internal pull request tests and introducing new features like Artifacts for real-time doc and code generation. Discussions on LLM architecture highlight the dominance of transformers, challenges in scaling and overfitting, and the importance of architecture work for progress.</description><pubDate>Sat, 22 Jun 2024 00:48:48 GMT</pubDate><category>character.ai</category><category>anthropic</category><category>claude-3.5-sonnet</category><category>claude-3-opus</category><category>noam-shazeer</category><category>kevin-a-fischer</category><category>sebastien-bubeck</category><category>_aidan_clark_</category><category>andrej-karpathy</category><category>memory-efficiency</category><category>kv-cache</category><category>attention-mechanisms</category><category>stateful-caching</category><category>int8-precision</category><category>transformer-architecture</category><category>scaling</category><category>overfitting</category><category>architecture</category></item><item><title>Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts</title><link>https://news.smol.ai/issues/24-06-21-ainews-claude-crushes-code-92percent-humaneval-and-claudeai-artifacts/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-21-ainews-claude-crushes-code-92percent-humaneval-and-claudeai-artifacts/</guid><description>**Claude 3.5 Sonnet**, released by **Anthropic**, is positioned as a Pareto improvement over Claude 3 Opus, operating at **twice the speed** and costing **one-fifth** as much. 
It achieves state-of-the-art results on benchmarks like **GPQA, MMLU, and HumanEval**, surpassing even **GPT-4o** and Claude 3 Opus on vision tasks. The model demonstrates significant advances in coding capabilities, passing **64% of test cases** compared to 38% for Claude 3 Opus, and is capable of autonomously fixing pull requests. Anthropic also introduced the **Artifacts** feature, enabling users to interact with AI-generated content such as code snippets and documents in a dynamic workspace, similar to OpenAI&apos;s Code Interpreter. This release highlights improvements in performance, cost-efficiency, and coding proficiency, signaling a growing role for LLMs in software development.</description><pubDate>Fri, 21 Jun 2024 07:27:45 GMT</pubDate><category>anthropic</category><category>openai</category><category>cognition</category><category>claude-3.5-sonnet</category><category>claude-3-opus</category><category>gpt-4o</category><category>alex-albert</category><category>benchmarking</category><category>model-performance</category><category>coding</category><category>model-optimization</category><category>fine-tuning</category><category>instruction-following</category><category>model-efficiency</category><category>model-release</category><category>api</category><category>performance-optimization</category></item><item><title>There&apos;s Ilya!</title><link>https://news.smol.ai/issues/24-06-19-ainews-theres-ilya/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-19-ainews-theres-ilya/</guid><description>**Ilya Sutskever** has co-founded **Safe Superintelligence Inc** shortly after leaving **OpenAI**, while **Jan Leike** moved to **Anthropic**. **Meta** released new models including **Chameleon 7B** and **34B** with mixed-modal input and unified token space quantization. **DeepSeek-Coder-V2** shows code capabilities comparable to **GPT-4 Turbo**, supporting **338 programming languages** and **128K context length**. 
**Consistency Large Language Models (CLLMs)** enable parallel decoding generating multiple tokens per step. **Grokked Transformers** demonstrate reasoning through training dynamics affecting memory formation and generalization. **VoCo-LLaMA** compresses vision tokens with LLMs improving video temporal correlation understanding. The **BigCodeBench** benchmark evaluates LLMs on **1,140 coding tasks** across **139 Python libraries**, topped by DeepSeek-Coder-V2 and Claude 3 Opus. **PixelProse** is a large **16M image-caption dataset** with reduced toxicity.</description><pubDate>Thu, 20 Jun 2024 00:18:00 GMT</pubDate><category>safe-superintelligence-inc</category><category>openai</category><category>anthropic</category><category>meta</category><category>deepseek</category><category>google-deepmind</category><category>chameleon-7b</category><category>chameleon-34b</category><category>deepseek-coder-v2</category><category>gpt-4-turbo</category><category>claude-3-opus</category><category>voco-llama</category><category>ilya-sutskever</category><category>jan-leike</category><category>ylecun</category><category>akhaliq</category><category>philschmid</category><category>rohanpaul_ai</category><category>mervenoyann</category><category>fchollet</category><category>parallel-decoding</category><category>code-generation</category><category>quantization</category><category>training-dynamics</category><category>vision</category><category>benchmarks</category><category>datasets</category><category>image-captioning</category><category>reasoning</category><category>memory-optimization</category></item><item><title>Gemini launches context caching... 
or does it?</title><link>https://news.smol.ai/issues/24-06-18-ainews-gemini-launches-context-caching-or-does-it/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-18-ainews-gemini-launches-context-caching-or-does-it/</guid><description>**Nvidia&apos;s Nemotron** ranks as the #1 open model on LMSYS and #11 overall, surpassing **Llama-3-70b**. **Meta AI** released **Chameleon 7B/34B** models after further post-training. **Google&apos;s Gemini** introduced context caching, offering a cost-efficient middle ground between RAG and fine-tuning, with a minimum input token count of 33k and no upper limit on cache duration. **DeepSeek** launched **DeepSeek-Coder-V2**, a 236B-parameter model outperforming **GPT-4 Turbo**, **Claude-3-Opus**, and **Gemini-1.5-Pro** in coding tasks, supporting 338 programming languages and extending context length to 128K. It was trained on 6 trillion tokens using the **Group Relative Policy Optimization (GRPO)** algorithm and is available on Hugging Face with a commercial license. 
These developments highlight advances in model performance, context caching, and large-scale coding models.</description><pubDate>Tue, 18 Jun 2024 21:26:50 GMT</pubDate><category>nvidia</category><category>meta-ai-fair</category><category>google</category><category>deepseek</category><category>hugging-face</category><category>nemotron</category><category>llama-3-70b</category><category>chameleon-7b</category><category>chameleon-34b</category><category>gemini-1.5-pro</category><category>deepseek-coder-v2</category><category>gpt-4-turbo</category><category>claude-3-opus</category><category>rohanpaul_ai</category><category>_philschmid</category><category>aman-sanger</category><category>context-caching</category><category>model-performance</category><category>fine-tuning</category><category>reinforcement-learning</category><category>group-relative-policy-optimization</category><category>large-context</category><category>model-training</category><category>coding</category><category>model-release</category></item><item><title>Is this... OpenQ*?</title><link>https://news.smol.ai/issues/24-06-17-ainews-is-this-openq/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-17-ainews-is-this-openq/</guid><description>**DeepSeek-Coder-V2** promises GPT-4 Turbo-beating performance at a fraction of the cost. **Anthropic** released new research on reward tampering. **Runway** launched Gen-3 Alpha, its video generation model and response to Sora. A series of papers explores &quot;test-time&quot; search techniques that improve mathematical reasoning with models like **Llama-3 8B**. **Apple** announced Apple Intelligence with smarter Siri and image/document understanding, partnered with **OpenAI** to integrate ChatGPT into iOS 18, and released 20 new CoreML models with LoRA fine-tuning for specialization. **NVIDIA** released **Nemotron-4 340B**, an open model matching GPT-4 performance. 
**DeepSeek-Coder-V2** excels in coding and math with 338 programming languages and 128K context length. **Stability AI** released Stable Diffusion 3 Medium weights. **Luma Labs** launched Dream Machine for 5-second video generation from text and images.</description><pubDate>Tue, 18 Jun 2024 00:38:33 GMT</pubDate><category>deepseek_ai</category><category>anthropic</category><category>runwayml</category><category>openai</category><category>apple</category><category>nvidia</category><category>stability-ai</category><category>luma-labs</category><category>deepseek-coder-v2</category><category>llama-3-8b</category><category>nemotron-4-340b</category><category>stable-diffusion-3-medium</category><category>adcock_brett</category><category>clementdelangue</category><category>svpino</category><category>reward-tampering</category><category>test-time-search</category><category>mathematical-reasoning</category><category>process-supervision</category><category>fine-tuning</category><category>on-device-ai</category><category>video-generation</category><category>cost-efficiency</category><category>context-length</category><category>coding</category><category>image-understanding</category><category>multimodality</category></item><item><title>Nemotron-4-340B: NVIDIA&apos;s new large open models, built on syndata, great for syndata</title><link>https://news.smol.ai/issues/24-06-14-ainews-nemotron-4-340b-nvidias-new-large-open-models-built-on-syndata-great-for-syndata/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-14-ainews-nemotron-4-340b-nvidias-new-large-open-models-built-on-syndata-great-for-syndata/</guid><description>**NVIDIA** has scaled up its **Nemotron-4** model from **15B** to a massive **340B** dense model, trained on **9T tokens**, achieving performance comparable to **GPT-4**. The model alignment process uses over **98% synthetic data**, with only about **20K human-annotated samples** for fine-tuning and reward model training. 
The synthetic data generation pipeline is open-sourced, including synthetic prompts and preference data generation. The base and instruct versions outperform **Mixtral** and **Llama 3**, while the reward model ranks better than **Gemini 1.5**, **Cohere**, and **GPT-4o**. Other notable models include **Mamba-2-Hybrid 8B**, which is up to **8x faster** than Transformers and excels on long-context tasks, **Samba-3.8B-instruct** for infinite context length with linear complexity, **Dolphin-2.9.3** tiny models optimized for low-resource devices, and **Faro Yi 9B DPO** with a **200K context window** running efficiently on **16GB VRAM**. The Mixture-of-Agents technique boosts open-source LLMs beyond GPT-4 Omni on AlpacaEval 2.0.</description><pubDate>Fri, 14 Jun 2024 21:06:38 GMT</pubDate><category>nvidia</category><category>hugging-face</category><category>mistral-ai</category><category>llamaindex</category><category>cohere</category><category>gemini</category><category>mistral</category><category>nemotron-4-340b</category><category>mixtral</category><category>llama-3</category><category>gemini-1.5</category><category>gpt-4o</category><category>mamba-2-hybrid-8b</category><category>samba-3.8b-instruct</category><category>dolphin-2.9.3</category><category>faro-yi-9b-dpo</category><category>philipp-schmid</category><category>bryan-catanzaro</category><category>oleksii-kuchaiev</category><category>rohanpaul_ai</category><category>cognitivecompai</category><category>_philschmid</category><category>01ai_yi</category><category>synthetic-data</category><category>model-alignment</category><category>reward-models</category><category>fine-tuning</category><category>long-context</category><category>model-scaling</category><category>inference-speed</category><category>mixture-of-agents</category><category>open-source-models</category><category>model-training</category><category>instruction-following</category><category>context-windows</category></item><item><title>Hybrid 
SSM/Transformers &gt; Pure SSMs/Pure Transformers</title><link>https://news.smol.ai/issues/24-06-13-ainews-hybrid-ssmtransformers-greater-pure-ssmspure-transformers/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-13-ainews-hybrid-ssmtransformers-greater-pure-ssmspure-transformers/</guid><description>**NVIDIA**&apos;s Bryan Catanzaro highlights a new paper on **Mamba models**, showing that mixing Mamba and Transformer blocks outperforms either alone, with optimal attention below **20%**. **Mixture-of-Agents (MoA)** architecture improves LLM generation quality, scoring **65.1% on AlpacaEval 2.0** versus **GPT-4 Omni&apos;s 57.5%**. The **LiveBench AI benchmark** evaluates reasoning, coding, writing, and data analysis. A hybrid **Mamba-2-Hybrid** model with **7% attention** surpasses a Transformer on MMLU accuracy, jumping from **50% to 53.6%**. **GPT-4** performs better at temperature=1. **Qwen 72B** leads open-source models on LiveBench AI. **LaminiAI Memory Tuning** achieves **95% accuracy** on a SQL agent task, improving over instruction fine-tuning. **Sakana AI Lab** uses evolutionary strategies for preference optimization. **Luma Labs Dream Machine** demonstrates advanced text-to-video generation. 
The **MMWorld benchmark** evaluates multimodal video understanding, and **Table-LLaVa 7B** competes with GPT-4V on multimodal table tasks.</description><pubDate>Thu, 13 Jun 2024 20:52:25 GMT</pubDate><category>nvidia</category><category>lamini-ai</category><category>sakana-ai</category><category>luma-labs</category><category>mamba-2-hybrid</category><category>gpt-4</category><category>qwen-72b</category><category>table-llava-7b</category><category>bryan-catanzaro</category><category>bindureddy</category><category>ylecun</category><category>ctnzr</category><category>corbtt</category><category>realsharonzhou</category><category>andrew-n-carr</category><category>karpathy</category><category>_akhaliq</category><category>omarsar0</category><category>mixture-of-experts</category><category>benchmarking</category><category>fine-tuning</category><category>multimodality</category><category>text-to-video</category><category>model-performance</category><category>memory-optimization</category><category>preference-optimization</category><category>video-understanding</category><category>multimodal-tables</category></item><item><title>The Last Hurrah of Stable Diffusion?</title><link>https://news.smol.ai/issues/24-06-12-ainews-the-last-hurrah-of-stable-diffusion/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-12-ainews-the-last-hurrah-of-stable-diffusion/</guid><description>**Stability AI** launched **Stable Diffusion 3 Medium** with models ranging from **450M to 8B parameters**, featuring the MMDiT architecture and T5 text encoder for image text rendering. The community has shown mixed reactions following the departure of key researchers like Emad Mostaque. On AI models, **Llama 3 8B Instruct** shows strong evaluation correlation with **GPT-4**, while **Qwen 2 Instruct** surpasses Llama 3 on MMLU benchmarks. The **Mixture of Agents (MoA)** framework outperforms GPT-4o on AlpacaEval 2.0. 
Techniques like **Spectrum** and **QLoRA** enable efficient fine-tuning with less VRAM. Research on **grokking** reveals transformers can transition from memorization to generalization through extended training. Benchmark initiatives include the **$1M ARC Prize Challenge** for AGI progress and **LiveBench**, a live LLM benchmark to prevent dataset contamination. The **Character Codex Dataset** offers open data on over **15,000 characters** for RAG and synthetic data. The **MLX 0.2** tool enhances LLM experience on Apple Silicon Macs with improved UI and faster retrieval-augmented generation.</description><pubDate>Wed, 12 Jun 2024 22:08:29 GMT</pubDate><category>stability-ai</category><category>togethercompute</category><category>llama-3-8b</category><category>llama-3</category><category>qwen-2</category><category>gpt-4</category><category>gpt-4o</category><category>emad-mostaque</category><category>rohanpaul_ai</category><category>fchollet</category><category>mikeknoop</category><category>micahgoldblum</category><category>teknium1</category><category>rasbt</category><category>percyliang</category><category>model-architecture</category><category>fine-tuning</category><category>benchmarks</category><category>dataset-release</category><category>model-evaluation</category><category>reasoning</category><category>model-training</category><category>retrieval-augmented-generation</category><category>multimodality</category></item><item><title>Francois Chollet launches $1m ARC Prize</title><link>https://news.smol.ai/issues/24-06-11-ainews-francois-chollet-launches-dollar1m-arc-prize/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-11-ainews-francois-chollet-launches-dollar1m-arc-prize/</guid><description>**François Chollet** critiques current paths to **AGI**, emphasizing the importance of benchmarks that resist saturation and focus on skill acquisition and open-ended problem solving. 
The **ARC-AGI** puzzles exemplify &quot;easy for humans, hard for AI&quot; challenges to measure progress toward AGI. Meanwhile, **Apple** announces integration of **ChatGPT** into iOS, iPadOS, and macOS through a partnership with **OpenAI**, enabling AI-powered features like document summarization and photo analysis with privacy-preserving measures. Discussions highlight Apple&apos;s focus on deep AI integration and on-device models optimized with techniques like mixed-precision quantization, though some skepticism remains about their AI capabilities compared to **GPT-4**. Additionally, **Together Compute** introduces a Mixture of Agents approach achieving strong performance on **AlpacaEval 2.0**.</description><pubDate>Tue, 11 Jun 2024 23:42:03 GMT</pubDate><category>openai</category><category>apple</category><category>togethercompute</category><category>gpt-4</category><category>chatgpt</category><category>francois-chollet</category><category>karpathy</category><category>svpino</category><category>philschmid</category><category>clementdelangue</category><category>sama</category><category>gdb</category><category>miramurati</category><category>kevin-weil</category><category>sarah-friar</category><category>benchmarking</category><category>agi</category><category>pattern-recognition</category><category>skill-acquisition</category><category>privacy</category><category>on-device-ai</category><category>mixed-precision-quantization</category><category>mixture-of-experts</category><category>multimodality</category><category>agentic-ai</category></item><item><title>Talaria: Apple&apos;s new MLOps Superweapon</title><link>https://news.smol.ai/issues/24-06-10-ainews-talaria-apples-new-mlops-superweapon/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-10-ainews-talaria-apples-new-mlops-superweapon/</guid><description>**Apple Intelligence** introduces a small (~3B parameters) on-device model and a larger server model running on Apple Silicon with Private Cloud 
Compute, aiming to surpass **Google Gemma**, **Mistral Mixtral**, **Microsoft Phi**, and **Mosaic DBRX**. The on-device model features a novel lossless quantization strategy using mixed 2-bit and 4-bit LoRA adapters averaging 3.5 bits-per-weight, enabling dynamic adapter hot-swapping and efficient memory management. Apple credits the **Talaria** tool for optimizing quantization and model latency, achieving a time-to-first-token latency of about 0.6 ms per prompt token and a generation rate of 30 tokens per second on iPhone 15 Pro. Apple focuses on an &quot;adapter for everything&quot; strategy with initial deployment on SiriKit and App Intents. Performance benchmarks rely on human graders, emphasizing consumer-level adequacy over academic dominance. The Apple ML blog also mentions an Xcode code-focused model and a diffusion model for Genmoji.</description><pubDate>Tue, 11 Jun 2024 06:41:05 GMT</pubDate><category>apple</category><category>google</category><category>mistral-ai</category><category>microsoft</category><category>mosaic</category><category>gemma</category><category>mixtral</category><category>phi</category><category>dbrx</category><category>craig-federighi</category><category>andrej-karpathy</category><category>quantization</category><category>on-device-ai</category><category>adapter-models</category><category>model-optimization</category><category>model-latency</category><category>lossless-quantization</category><category>low-bit-palletization</category><category>token-generation</category><category>model-benchmarking</category><category>human-evaluation</category></item><item><title>HippoRAG: First, do know(ledge) Graph</title><link>https://news.smol.ai/issues/24-06-07-ainews-hipporag-first-do-knowledge-graph/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-07-ainews-hipporag-first-do-knowledge-graph/</guid><description>**Alibaba** released new open-source **Qwen2** models ranging from **0.5B to 72B parameters**, achieving SOTA results on benchmarks like MMLU 
and HumanEval. Researchers introduced **Sparse Autoencoders** to interpret **GPT-4** neural activity, improving feature representation. The **HippoRAG** paper proposes a hippocampus-inspired retrieval augmentation method using knowledge graphs and Personalized PageRank for efficient multi-hop reasoning. New techniques like **Stepwise Internalization** enable implicit chain-of-thought reasoning in LLMs, enhancing accuracy and speed. The **Buffer of Thoughts (BoT)** method improves reasoning efficiency with significant cost reduction. A novel scalable MatMul-free LLM architecture competitive with SOTA Transformers at billion-parameter scale was also presented. *&quot;Single-Step, Multi-Hop retrieval&quot;* is highlighted as a key advancement in retrieval speed and cost.</description><pubDate>Fri, 07 Jun 2024 23:55:52 GMT</pubDate><category>alibaba</category><category>openai</category><category>qwen-2</category><category>gpt-4</category><category>hipporag</category><category>rohanpaul_ai</category><category>omarsar0</category><category>nabla_theta</category><category>huybery</category><category>knowledge-graphs</category><category>personalized-pagerank</category><category>multi-hop-retrieval</category><category>chain-of-thought</category><category>implicit-reasoning</category><category>sparse-autoencoders</category><category>model-interpretability</category><category>model-efficiency</category><category>model-architecture</category><category>fine-tuning</category><category>reinforcement-learning</category></item><item><title>Qwen 2 beats Llama 3 (and we don&apos;t know how)</title><link>https://news.smol.ai/issues/24-06-06-ainews-qwen-2-beats-llama-3-and-we-dont-know-how/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-06-ainews-qwen-2-beats-llama-3-and-we-dont-know-how/</guid><description>**Alibaba** released **Qwen 2** models under Apache 2.0 license, claiming to outperform **Llama 3** in open models with multilingual support in **29 languages** and 
strong benchmark scores like **MMLU 82.3** and **HumanEval 86.0**. **Groq** demonstrated ultra-fast inference speed on **Llama-3 70B** at **40,792 tokens/s**, processing 4 Wikipedia articles in 200ms. Research on **sparse autoencoders (SAEs)** for interpreting **GPT-4** neural activity showed new training methods, metrics, and scaling laws. **Meta AI** announced the **No Language Left Behind (NLLB)** model capable of high-quality translations between **200 languages**, including low-resource ones. The Qwen 2 release states that *&quot;Our post-training phase is designed with the principle of scalable training with minimal human annotation,&quot;* highlighting techniques like rejection sampling for math and execution feedback for coding.</description><pubDate>Thu, 06 Jun 2024 22:33:41 GMT</pubDate><category>alibaba</category><category>groq</category><category>meta-ai-fair</category><category>qwen-2</category><category>llama-3</category><category>llama-3-70b</category><category>gpt-4</category><category>nllb</category><category>philschmid</category><category>huybery</category><category>jonathanross321</category><category>awnihannun</category><category>gdb</category><category>nabla_theta</category><category>ylecun</category><category>multilinguality</category><category>benchmarking</category><category>inference-speed</category><category>sparse-autoencoders</category><category>scaling-laws</category><category>post-training</category><category>instruction-following</category><category>rejection-sampling</category><category>execution-feedback</category><category>model-release</category><category>multilingual-models</category><category>model-training</category></item><item><title>5 small news items</title><link>https://news.smol.ai/issues/24-06-05-ainews-5-small-news-items/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-05-ainews-5-small-news-items/</guid><description>**OpenAI** announces that ChatGPT&apos;s voice mode is &quot;coming soon.&quot; **Leopold Aschenbrenner** launched a 
5-part AGI timelines series predicting a **trillion dollar cluster** from current AI progress. **Will Brown** released a comprehensive GenAI Handbook. **Cohere** completed a **$450 million funding round** at a **$5 billion valuation**. DeepMind research on **uncertainty quantification in LLMs** and an **xLSTM model** outperforming transformers were highlighted. Studies on the **geometry of concepts in LLMs** and methods to **eliminate matrix multiplication** for efficiency gains were shared. Discussions on **parameter-efficient fine-tuning (PEFT)** and **automated alignment of LLMs** were noted. New tools include **LangGraph** for AI agents, **LlamaIndex** with longer context windows, and **Hugging Face&apos;s** integration with **NVIDIA NIM** for Llama3. **Mistral AI** released a fine-tuning API for their models.</description><pubDate>Thu, 06 Jun 2024 02:50:37 GMT</pubDate><category>openai</category><category>cohere</category><category>deepmind</category><category>hugging-face</category><category>nvidia</category><category>mistral-ai</category><category>llama-3</category><category>xLSTM</category><category>leopold-aschenbrenner</category><category>will-brown</category><category>rohanpaul_ai</category><category>richardmcngo</category><category>omarsar0</category><category>hwchase17</category><category>clementdelangue</category><category>sophiamyang</category><category>uncertainty-quantification</category><category>parameter-efficient-fine-tuning</category><category>automated-alignment</category><category>model-efficiency</category><category>long-context</category><category>agentic-ai</category><category>fine-tuning</category><category>inference-optimization</category></item><item><title>Not much happened today</title><link>https://news.smol.ai/issues/24-06-04-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-04-ainews-not-much-happened-today/</guid><description>**Twelve Labs** raised **$50m** in Series A funding co-led 
by NEA and **NVIDIA&apos;s NVentures** to advance multimodal AI. **Livekit** secured **$22m** in funding. **Groq** announced running at **800k tokens/second**. OpenAI saw a resignation from Daniel Kokotajlo. Twitter users highlighted **Gemini 1.5 FlashModel** for high performance at low cost and **Gemini Pro** ranking #2 in Japanese language tasks. **Mixtral** models can run up to 8x faster on NVIDIA RTX GPUs using TensorRT-LLM. **Mamba-2** model architecture introduces state space duality for larger states and faster training, outperforming previous models. **Phi-3 Medium (14B)** and **Small (7B)** models benchmark near GPT-3.5-Turbo-0613 and Llama 3 8B. Prompt engineering is emphasized for unlocking LLM capabilities. Data quality is critical for model performance, with upcoming masterclasses on data curation. Discussions on AI safety include a Frontier AI lab employee letter advocating whistleblower protections and debates on aligning AI to user intent versus broader humanity interests.</description><pubDate>Tue, 04 Jun 2024 23:53:47 
GMT</pubDate><category>twelve-labs</category><category>livekit</category><category>groq</category><category>openai</category><category>nea</category><category>nvidia</category><category>lmsys</category><category>mistral-ai</category><category>gemini-1.5-flashmodel</category><category>gemini-pro</category><category>mixtral</category><category>mamba-2</category><category>phi-3-medium</category><category>phi-3-small</category><category>gpt-3.5-turbo-0613</category><category>llama-3-8b</category><category>llama-2-70b</category><category>mistral-finetune</category><category>daniel-kokotajlo</category><category>rohanpaul_ai</category><category>_arohan_</category><category>tri_dao</category><category>_albertgu</category><category>_philschmid</category><category>sarahcat21</category><category>hamelhusain</category><category>jachiam0</category><category>willdepue</category><category>teknium1</category><category>model-performance</category><category>prompt-engineering</category><category>data-curation</category><category>ai-safety</category><category>model-benchmarking</category><category>model-optimization</category><category>training</category><category>sequence-models</category><category>state-space-models</category></item><item><title>Mamba-2: State Space Duality</title><link>https://news.smol.ai/issues/24-06-03-ainews-mamba-2-state-space-duality/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-06-03-ainews-mamba-2-state-space-duality/</guid><description>**Mamba-2**, a new **state space model (SSM)**, outperforms previous models like Mamba and Transformer++ in **perplexity** and **wall-clock time**, featuring **8x larger states** and **50% faster training**. It introduces the concept of **state space duality (SSD)** connecting SSMs and linear attention. 
The **FineWeb-Edu dataset**, a high-quality subset of the **15 trillion token FineWeb dataset**, filtered using **llama-3-70b** for educational quality, enables better and faster LLM learning, potentially reducing tokens needed to surpass **GPT-3** performance. Additionally, perplexity-based data pruning using a **125M parameter model** improves downstream performance and reduces pretraining steps by up to **1.45x**. The **Video-MME benchmark** evaluates multi-modal LLMs on video analysis across multiple visual domains and video lengths.</description><pubDate>Mon, 03 Jun 2024 21:31:26 GMT</pubDate><category>hugging-face</category><category>mamba-2</category><category>mamba</category><category>transformer++</category><category>llama-3-70b</category><category>gpt-3</category><category>_albertgu</category><category>tri_dao</category><category>arankomatsuzaki</category><category>_akhaliq</category><category>clementdelangue</category><category>karpathy</category><category>state-space-models</category><category>perplexity</category><category>training-efficiency</category><category>data-pruning</category><category>benchmarking</category><category>multimodality</category><category>video-analysis</category></item><item><title>Ways to use Anthropic&apos;s Tool Use GA</title><link>https://news.smol.ai/issues/24-05-31-ainews-ways-to-use-anthropics-tool-use-ga/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-31-ainews-ways-to-use-anthropics-tool-use-ga/</guid><description>**Anthropic** launched general availability of tool use/function calling with support for streaming, forced use, and vision, alongside **Amazon** and **Google**. Alex Albert shared five architectures for agentic tool use: delegation, parallelization, debate, specialization, and tool suite experts. **Anthropic** also introduced a self-guided course on tool use. 
**Yann LeCun** emphasized ethical open science funding, gradual emergence of superintelligence with safety guardrails, and convolutional networks for image/video processing as competitive with vision transformers. He also noted growth in AI researchers across industry, academia, and government.</description><pubDate>Fri, 31 May 2024 20:31:29 GMT</pubDate><category>anthropic</category><category>amazon</category><category>google</category><category>claude-3-opus</category><category>haiku</category><category>opus</category><category>convnext</category><category>yann-lecun</category><category>alex-albert</category><category>sainingxie</category><category>tool-use</category><category>function-calling</category><category>agentic-ai</category><category>streaming</category><category>vision</category><category>parallelization</category><category>delegation</category><category>debate</category><category>specialization</category><category>open-science</category><category>superintelligence</category><category>convolutional-networks</category><category>self-attention</category><category>ai-research</category></item><item><title>Contextual Position Encoding (CoPE)</title><link>https://news.smol.ai/issues/24-05-30-ainews-contextual-position-encoding-cope/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-30-ainews-contextual-position-encoding-cope/</guid><description>**Meta AI** researcher **Jason Weston** introduced **CoPE**, a novel positional encoding method for transformers that incorporates *context* to create learnable gates, enabling improved handling of counting and copying tasks and better performance on language modeling and coding. The approach can potentially be extended with external memory for gate calculation. **Google DeepMind** released **Gemini 1.5 Flash** and **Pro** models optimized for fast inference. **Anthropic** announced general availability of tool use for **Claude**, enhancing its ability to orchestrate tools for complex tasks. 
**Alexandr Wang** launched **SEAL Leaderboards** for private, expert evaluations of frontier models. **Karpathy** reflected on the 4th anniversary of **GPT-3**, emphasizing scaling and practical improvements. **Perplexity AI** launched **Perplexity Pages** to convert research into visually appealing articles, described as an &quot;AI Wikipedia&quot; by **Aravind Srinivas**.</description><pubDate>Fri, 31 May 2024 03:11:48 GMT</pubDate><category>meta-ai-fair</category><category>google-deepmind</category><category>anthropic</category><category>perplexity-ai</category><category>langchain</category><category>openai</category><category>cope</category><category>gemini-1.5-flash</category><category>gemini-1.5-pro</category><category>claude</category><category>gpt-3</category><category>jason-weston</category><category>alexandr-wang</category><category>karpathy</category><category>arav-srinivas</category><category>positional-encoding</category><category>transformers</category><category>counting</category><category>copying</category><category>language-modeling</category><category>coding</category><category>external-memory</category><category>tool-use</category><category>model-evaluation</category><category>inference-speed</category><category>model-benchmarking</category><category>scaling</category><category>research-synthesis</category></item><item><title>1 TRILLION token context, real time, on device?</title><link>https://news.smol.ai/issues/24-05-29-ainews-1-trillion-token-context-real-time-on-device/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-29-ainews-1-trillion-token-context-real-time-on-device/</guid><description>**Cartesia**, a startup specializing in **state space models (SSMs)**, launched a low-latency voice model outperforming transformer-based models with **20% lower perplexity**, **2x lower word error**, and **1 point higher NISQA quality**. 
This breakthrough highlights the potential for models that can continuously process and reason over massive streams of multimodal data (text, audio, video) with a **trillion token context window** on-device. The news also covers recent AI developments including **Mistral&apos;s Codestral weights release**, the **Schedule Free optimizers** paper release, and **Scale AI&apos;s** new Elo-style eval leaderboards. Additionally, a debate between **Yann LeCun** and **Elon Musk** on the importance of publishing AI research versus engineering achievements was noted. The **Gemini 1.5 Pro/Advanced** models were mentioned for their strong performance.</description><pubDate>Wed, 29 May 2024 23:01:07 GMT</pubDate><category>cartesia</category><category>mistral-ai</category><category>scale-ai</category><category>gemini-1.5-pro</category><category>gemini-1.5</category><category>yann-lecun</category><category>elon-musk</category><category>state-space-models</category><category>voice-models</category><category>multimodality</category><category>model-performance</category><category>on-device-ai</category><category>long-context</category><category>evaluation-leaderboards</category><category>learning-rate-optimization</category><category>scientific-publishing</category><category>research-vs-engineering</category></item><item><title>Somebody give Andrej some H100s already</title><link>https://news.smol.ai/issues/24-05-28-ainews-somebody-give-andrej-some-h100s-already/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-28-ainews-somebody-give-andrej-some-h100s-already/</guid><description>**OpenAI**&apos;s GPT-2 sparked controversy five years ago for being &quot;too dangerous to release.&quot; Now, with **FineWeb** and **llm.c**, a tiny GPT-2 model can be trained in **90 minutes** for **$20** using **8xA100** GPUs, with the full 1.6B model estimated to take **1 week** and **$2.5k**. 
The project is notable for its heavy use of **CUDA** (75.8%) aiming to simplify the training stack. Meanwhile, a Twitter debate between **Yann LeCun** and **Elon Musk** highlighted the importance of **convolutional neural networks (CNNs)** in real-time image processing for autonomous driving, with LeCun emphasizing scientific research&apos;s role in technological progress. LeCun also criticized AI doomsday scenarios, arguing for cautious optimism about AI safety and regulation.</description><pubDate>Wed, 29 May 2024 01:24:27 GMT</pubDate><category>openai</category><category>fineweb</category><category>meta-ai-fair</category><category>nvidia</category><category>tesla</category><category>gpt-2</category><category>andrej-karpathy</category><category>yann-lecun</category><category>elon-musk</category><category>francois-chollet</category><category>svpino</category><category>mervenoyann</category><category>cuda</category><category>fine-tuning</category><category>training-time</category><category>gpu-acceleration</category><category>convolutional-neural-networks</category><category>real-time-processing</category><category>ai-safety</category><category>ai-regulation</category></item><item><title>Life after DPO (RewardBench)</title><link>https://news.smol.ai/issues/24-05-27-ainews-life-after-dpo-rewardbench/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-27-ainews-life-after-dpo-rewardbench/</guid><description>**xAI raised $6 billion at a $24 billion valuation**, positioning it among the most highly valued AI startups, with expectations to fund **GPT-5 and GPT-6 class models**. The **RewardBench** tool, developed by Nathan Lambert, evaluates reward models (RMs) for language models, showing Cohere&apos;s RMs outperforming open-source alternatives. 
The discussion highlights the evolution of language models from Claude Shannon&apos;s 1948 model to GPT-3 and beyond, emphasizing the role of **RLHF (Reinforcement Learning from Human Feedback)** and the newer **DPO (Direct Preference Optimization)** method. Notably, some **Llama 3 8B reward model-focused models** are currently outperforming GPT-4, Cohere, Gemini, and Claude on the RewardBench leaderboard, raising questions about reward hacking. Future alignment research directions include improving preference datasets, DPO techniques, and personalization in language models. The report also compares xAI&apos;s valuation with OpenAI, Mistral AI, and Anthropic, noting speculation about xAI&apos;s spending on Nvidia hardware.</description><pubDate>Tue, 28 May 2024 00:04:01 GMT</pubDate><category>x-ai</category><category>openai</category><category>mistral-ai</category><category>anthropic</category><category>cohere</category><category>meta-ai-fair</category><category>hugging-face</category><category>nvidia</category><category>gpt-3</category><category>gpt-4</category><category>gpt-5</category><category>gpt-6</category><category>llama-3-8b</category><category>llama-3</category><category>claude-3</category><category>gemini</category><category>nathan-lambert</category><category>chris-manning</category><category>elon-musk</category><category>bindureddy</category><category>rohanpaul_ai</category><category>nearcyan</category><category>reinforcement-learning-from-human-feedback</category><category>direct-preference-optimization</category><category>reward-models</category><category>rewardbench</category><category>language-model-history</category><category>model-evaluation</category><category>alignment-research</category><category>preference-datasets</category><category>personalization</category><category>transformer-architecture</category></item><item><title>Ten Commandments for Deploying Fine-Tuned 
Models</title><link>https://news.smol.ai/issues/24-05-24-ainews-ten-commandments-for-deploying-fine-tuned-models/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-24-ainews-ten-commandments-for-deploying-fine-tuned-models/</guid><description>**Gemini-in-Google-Slides** is highlighted as a useful tool for summarizing presentations. Kyle Corbitt&apos;s talk on deploying fine-tuned models in production emphasizes avoiding fine-tuning unless necessary, focusing on prompting, data quality, appropriate model choice, and thorough evaluation. **Anthropic** showcased feature alteration in **Claude AI**, demonstrating control over model behavior and increased understanding of large language models. Open-source models are approaching the performance of closed-source models like **GPT-4o** on benchmarks like MMLU for simple tasks, though advanced models remain necessary for complex automation.</description><pubDate>Fri, 24 May 2024 22:12:57 GMT</pubDate><category>anthropic</category><category>google</category><category>openai</category><category>claude-3-opus</category><category>claude-3</category><category>gpt-4o</category><category>kyle-corbitt</category><category>bindureddy</category><category>alexalbert__</category><category>fine-tuning</category><category>prompt-engineering</category><category>model-evaluation</category><category>feature-alteration</category><category>benchmarking</category><category>model-performance</category><category>open-source-models</category></item><item><title>Clémentine Fourrier on LLM evals</title><link>https://news.smol.ai/issues/24-05-23-ainews-clementine-fourrier-on-llm-evals/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-23-ainews-clementine-fourrier-on-llm-evals/</guid><description>**Clémentine Fourrier** from **Hugging Face** presented at **ICLR** about **GAIA** with **Meta** and shared insights on **LLM evaluation** methods. 
The blog outlines three main evaluation approaches: **Automated Benchmarking** using sample inputs/outputs and metrics, **Human Judges** involving grading and ranking with methods like **Vibe-checks**, **Arena**, and **systematic annotations**, and **Models as Judges** using generalist or specialist models with noted biases. Challenges include data contamination, subjectivity, and bias in scoring. These evaluations help prevent regressions, rank models, and track progress in the field.</description><pubDate>Thu, 23 May 2024 23:34:22 GMT</pubDate><category>huggingface</category><category>meta-ai-fair</category><category>claude-3-opus</category><category>clem_fourrier</category><category>llm-evaluation</category><category>automated-benchmarking</category><category>human-evaluation</category><category>model-bias</category><category>data-contamination</category><category>elo-ranking</category><category>systematic-annotations</category><category>preference-learning</category><category>evaluation-metrics</category><category>prompt-sensitivity</category></item><item><title>ALL of AI Engineering in One Place</title><link>https://news.smol.ai/issues/24-05-22-ainews-all-of-ai-engineering-in-one-place/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-22-ainews-all-of-ai-engineering-in-one-place/</guid><description>The upcoming **AI Engineer World&apos;s Fair** in San Francisco from **June 25-27** will feature a significantly expanded format with booths, talks, and workshops from **top model labs** like **OpenAI, DeepMind, Anthropic, Mistral, Cohere, HuggingFace**, and **Character.ai**. It includes participation from **Microsoft Azure, Amazon AWS, Google Vertex**, and major companies such as **Nvidia, Salesforce, Mastercard, Palo Alto Networks**, and more. The event covers **9 tracks** including **RAG, multimodality, evals/ops, open models, code generation, GPUs, agents, AI in Fortune 500**, and a new **AI leadership** track. 
Additionally, **Anthropic** shared interpretability research on **Claude 3 Sonnet**, revealing millions of interpretable features that can be steered to modify model behavior, including safety-relevant features related to bias and unsafe content, though more research is needed for practical applications. The event offers a discount code for AI News readers.</description><pubDate>Thu, 23 May 2024 01:22:53 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>anthropic</category><category>mistral-ai</category><category>cohere</category><category>hugging-face</category><category>adept</category><category>midjourney</category><category>character-ai</category><category>microsoft</category><category>amazon</category><category>nvidia</category><category>salesforce</category><category>mastercard</category><category>palo-alto-networks</category><category>axa</category><category>novartis</category><category>discord</category><category>twilio</category><category>tinder</category><category>khan-academy</category><category>sourcegraph</category><category>mongodb</category><category>neo4j</category><category>hasura</category><category>modular</category><category>cognition</category><category>anysphere</category><category>perplexity-ai</category><category>groq</category><category>mozilla</category><category>nous-research</category><category>galileo</category><category>unsloth</category><category>langchain</category><category>llamaindex</category><category>instructor</category><category>weights-biases</category><category>lambda-labs</category><category>neptune</category><category>datastax</category><category>crusoe</category><category>covalent</category><category>qdrant</category><category>baseten</category><category>e2b</category><category>octo-ai</category><category>gradient-ai</category><category>lancedb</category><category>log10</category><category>deepgram</category><category>outlines</category><category>crew-ai</category><category>factory-ai</c
ategory><category>claude-3-sonnet</category><category>claude-3</category><category>interpretability</category><category>feature-steering</category><category>safety</category><category>multilinguality</category><category>multimodality</category><category>rag</category><category>evals-ops</category><category>open-models</category><category>code-generation</category><category>gpus</category><category>agents</category><category>ai-leadership</category></item><item><title>Anthropic&apos;s &quot;LLM Genome Project&quot;: learning &amp; clamping 34m features on Claude Sonnet</title><link>https://news.smol.ai/issues/24-05-21-ainews-anthropics-llm-genome-project-learning-and-clamping-34m-features-on-claude-sonnet/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-21-ainews-anthropics-llm-genome-project-learning-and-clamping-34m-features-on-claude-sonnet/</guid><description>**Anthropic** released their third paper in the MechInterp series, **Scaling Monosemanticity**, scaling interpretability analysis to **34 million features** on **Claude 3 Sonnet**. This work uses **dictionary learning** to isolate recurring neuron activation patterns, so that internal states can be described as combinations of features rather than individual neurons. The paper reveals abstract features related to code, errors, sycophancy, crime, self-representation, and deception, demonstrating intentional modifiability by clamping feature values. 
The research marks a significant advance in **model interpretability** and **neural network analysis** at frontier scale.</description><pubDate>Tue, 21 May 2024 22:47:46 GMT</pubDate><category>anthropic</category><category>scale-ai</category><category>suno-ai</category><category>microsoft</category><category>claude-3-sonnet</category><category>claude-3</category><category>emmanuel-ameisen</category><category>alex-albert</category><category>model-interpretability</category><category>dictionary-learning</category><category>neural-networks</category><category>feature-activation</category><category>intentional-modifiability</category><category>scaling</category><category>mechanistic-interpretability</category></item><item><title>Skyfall</title><link>https://news.smol.ai/issues/24-05-20-ainews-skyfall/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-20-ainews-skyfall/</guid><description>Between 5/17 and 5/20/2024, key AI updates include **Google DeepMind&apos;s Gemini 1.5 Pro and Flash models**: Pro is a sparse multimodal MoE with up to **10M context**, and Flash is a dense Transformer decoder that is **3x faster and 10x cheaper**. **Yi AI released Yi-1.5 models** with extended context windows of **32K and 16K tokens**. Other notable releases include **Kosmos 2.5 (Microsoft), PaliGemma (Google), Falcon 2, DeepSeek v2 lite, and the HunyuanDiT diffusion model**. Research highlights feature an **Observational Scaling Laws paper** predicting model performance across families, a **Layer-Condensed KV Cache** technique boosting inference throughput by **up to 26×**, and the **SUPRA method** converting LLMs into RNNs for reduced compute costs. Hugging Face expanded local AI capabilities enabling on-device AI without cloud dependency. LangChain updated its v0.2 release with improved documentation. The community also welcomed a new LLM Finetuning Discord by Hamel Husain and Dan Becker for Maven course users. 
*&quot;Hugging Face is profitable, or close to profitable,&quot;* enabling $10 million in free shared GPUs for developers.</description><pubDate>Mon, 20 May 2024 23:02:42 GMT</pubDate><category>google-deepmind</category><category>yi-ai</category><category>microsoft</category><category>hugging-face</category><category>langchain</category><category>maven</category><category>gemini-1.5-pro</category><category>gemini-1.5-flash</category><category>yi-1.5</category><category>kosmos-2.5</category><category>paligemma</category><category>falcon-2</category><category>deepseek-v2</category><category>hunyuan-dit</category><category>gemini-1.5</category><category>gemini-1.5-flash</category><category>yi-1.5</category><category>hamel-husain</category><category>dan-becker</category><category>clement-delangue</category><category>philschmid</category><category>osanseviero</category><category>arankomatsuzaki</category><category>jason-wei</category><category>rohanpaul_ai</category><category>multimodality</category><category>mixture-of-experts</category><category>transformer</category><category>model-optimization</category><category>long-context</category><category>model-performance</category><category>model-inference</category><category>fine-tuning</category><category>local-ai</category><category>scaling-laws</category><category>causal-models</category><category>hallucination-detection</category><category>model-distillation</category><category>model-efficiency</category></item><item><title>Chameleon: Meta&apos;s (unreleased) GPT4o-like Omnimodal Model</title><link>https://news.smol.ai/issues/24-05-17-ainews-chameleon-metas-unreleased-gpt4o-like-omnimodal-model/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-17-ainews-chameleon-metas-unreleased-gpt4o-like-omnimodal-model/</guid><description>**Meta AI FAIR** introduced **Chameleon**, a new multimodal model family with **7B** and **34B** parameter versions trained on **10T tokens** of interleaved text and image data 
enabling &quot;early fusion&quot; multimodality that can natively output any modality. While reasoning benchmarks are modest, its &quot;omnimodality&quot; approach competes well with pre-GPT4o multimodal models. **OpenAI** launched **GPT-4o**, a model excelling in benchmarks like MMLU and coding tasks, with strong multimodal capabilities but some regression in ELO scores and hallucination issues. **Google DeepMind** announced **Gemini 1.5 Flash**, a small model with **1M context window** and flash performance, highlighting convergence trends between OpenAI and Google models. **Anthropic** updated **Claude 3** with streaming support, forced tool use, and vision tool integration for multimodal knowledge extraction. OpenAI also partnered with Reddit, raising industry attention.</description><pubDate>Fri, 17 May 2024 20:46:44 GMT</pubDate><category>meta-ai-fair</category><category>openai</category><category>google-deepmind</category><category>anthropic</category><category>reddit</category><category>chameleon</category><category>gpt-4o</category><category>gemini-1.5-flash</category><category>claude-3</category><category>armen-aghajanyan</category><category>sama</category><category>alexandr-wang</category><category>abacaj</category><category>alexalbert__</category><category>multimodality</category><category>early-fusion</category><category>benchmarking</category><category>model-training</category><category>tokenization</category><category>streaming</category><category>tool-use</category><category>vision</category><category>coding</category><category>hallucination-detection</category><category>model-performance</category></item><item><title>Cursor reaches &gt;1000 tok/s finetuning Llama3-70b for fast file editing</title><link>https://news.smol.ai/issues/24-05-16-ainews-cursor-reaches-greater1000-toks-finetuning-llama3-70b-for-fast-file-editing/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-05-16-ainews-cursor-reaches-greater1000-toks-finetuning-llama3-70b-for-fast-file-editing/</guid><description>**Cursor**, an AI-native IDE, announced a **speculative edits** algorithm for code editing that surpasses **GPT-4** and **GPT-4o** in accuracy and latency, achieving speeds of over **1000 tokens/s** on a **70b** model. **OpenAI** released **GPT-4o** with multimodal capabilities including audio, vision, and text, noted to be **2x faster and 50% cheaper** than GPT-4 turbo, though with mixed coding performance. **Anthropic** introduced streaming, forced tool use, and vision features for developers. **Google DeepMind** unveiled **Imagen Video** and **Gemini 1.5 Flash**, a small model with a **1M-context** window. **HuggingFace** is distributing **$10M** in free GPUs for open-source AI models like **Llama**, **BLOOM**, and **Stable Diffusion**. Evaluation insights highlight challenges with LLMs on novel problems and benchmark saturation, with new benchmarks like **MMLU-Pro** showing significant drops in top model performance.</description><pubDate>Fri, 17 May 2024 00:50:41 
GMT</pubDate><category>cursor</category><category>openai</category><category>anthropic</category><category>google-deepmind</category><category>huggingface</category><category>gpt-4</category><category>gpt-4o</category><category>gpt-4-turbo</category><category>gpt-4o-mini</category><category>llama</category><category>bloom</category><category>stable-diffusion</category><category>sama</category><category>abacaj</category><category>imjaredz</category><category>erhartford</category><category>alexalbert</category><category>svpino</category><category>maximelabonne</category><category>_philschmid</category><category>speculative-decoding</category><category>code-edits</category><category>multimodality</category><category>image-generation</category><category>streaming</category><category>tool-use</category><category>fine-tuning</category><category>benchmarking</category><category>mmlu</category><category>model-performance</category><category>evaluation</category><category>synthetic-data</category><category>context-windows</category></item><item><title>Not much happened today</title><link>https://news.smol.ai/issues/24-05-15-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-15-ainews-not-much-happened-today/</guid><description>**Ilya Sutskever** steps down as Chief Scientist at **OpenAI** after nearly a decade, with **Jakub Pachocki** named as his successor. **Google DeepMind** announces **Gemini 1.5 Pro** and **Gemini 1.5 Flash** models featuring 2 million token context and improved multimodal capabilities, alongside demos of **Project Astra** AI assistant, **Imagen 3** text-to-image model, and **Veo** generative video model. **GPT-4o** tops the VHELM leaderboard and outperforms competitors on LMSYS Chatbot Arena. **Reka Core** multimodal model with 128K context and **Alibaba&apos;s Qwen1.5-110B** open-source model are released. 
**Salesforce** shares an online RLHF recipe.</description><pubDate>Wed, 15 May 2024 21:20:08 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>anthropic</category><category>rekailabs</category><category>alibaba</category><category>salesforce</category><category>gpt-4o</category><category>gemini-1.5-pro</category><category>gemini-1.5-flash</category><category>imagen-3</category><category>veo</category><category>reka-core</category><category>qwen-1.5-110b</category><category>ilya-sutskever</category><category>jakub-pachocki</category><category>mike-krieger</category><category>sama</category><category>multimodality</category><category>long-context</category><category>model-releases</category><category>reinforcement-learning</category><category>model-benchmarking</category><category>text-to-image</category><category>video-generation</category><category>ai-assistants</category></item><item><title>Google I/O in 60 seconds</title><link>https://news.smol.ai/issues/24-05-14-ainews-google-io-in-60-seconds/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-14-ainews-google-io-in-60-seconds/</guid><description>**Google** announced updates to the **Gemini model family**, including **Gemini 1.5 Pro** with **2 million token support**, and the new **Gemini Flash** model optimized for speed with **1 million token capacity**. The Gemini suite now includes **Ultra**, **Pro**, **Flash**, and **Nano** models, with **Gemini Nano** integrated into **Chrome 126**. Additional Gemini features include **Gemini Gems** (custom GPTs), **Gemini Live** for voice conversations, and **Project Astra**, a live video understanding assistant. The **Gemma model family** was updated with **Gemma 2** at **27B parameters**, offering near-**llama-3-70b** performance at half the size, plus **PaliGemma**, a vision-language open model inspired by **PaLI-3**. 
Other launches include **DeepMind&apos;s Veo**, **Imagen 3** for photorealistic image generation, and a **Music AI Sandbox** collaboration with YouTube. **SynthID watermarking** now extends to text, images, audio, and video. The **Trillium TPUv6** codename was revealed. Google also integrated AI across its product suite including Workspace, Email, Docs, Sheets, Photos, Search, and Lens. *&quot;The world awaits Apple&apos;s answer.&quot;*</description><pubDate>Tue, 14 May 2024 22:01:01 GMT</pubDate><category>google</category><category>google-deepmind</category><category>youtube</category><category>gemini-1.5-pro</category><category>gemini-flash</category><category>gemini-ultra</category><category>gemini-pro</category><category>gemini-nano</category><category>gemma-2</category><category>llama-3-70b</category><category>paligemma</category><category>imagen-3</category><category>veo</category><category>tokenization</category><category>model-performance</category><category>fine-tuning</category><category>vision</category><category>multimodality</category><category>model-release</category><category>model-training</category><category>model-optimization</category><category>ai-integration</category><category>image-generation</category><category>watermarking</category><category>hardware-optimization</category><category>voice</category><category>video-understanding</category></item><item><title>GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version) </title><link>https://news.smol.ai/issues/24-05-13-ainews-gpt-4o-the-new-sota-everything-frontier-model-gpt4t-version/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-13-ainews-gpt-4o-the-new-sota-everything-frontier-model-gpt4t-version/</guid><description>**OpenAI** launched **GPT-4o**, a frontier model supporting real-time reasoning across **audio, vision, and text**, now free for all ChatGPT users with enhanced coding capabilities and upcoming advanced voice and video features. 
Discussions cover **open-source LLMs** like **Llama 3**, fine-tuning techniques including knowledge distillation for **GPT-3.5**, and hardware optimization strategies such as quantization. Emerging architectures include multimodal integrations with ChatGPT voice and Open Interpreter API, Mixture of Experts models combining autoregressive and diffusion approaches, and novel designs like the **YOCO architecture** and **ThunderKittens DSL** for efficient GPU use. Research advances in efficient attention methods like **Conv-Basis** using FFT and model scaling techniques such as depth upscaling were also highlighted.</description><pubDate>Mon, 13 May 2024 23:14:50 GMT</pubDate><category>openai</category><category>hugging-face</category><category>nous-research</category><category>eleutherai</category><category>hazyresearch</category><category>gpt-4o</category><category>gpt-3.5</category><category>llama-3</category><category>real-time-reasoning</category><category>coding-capabilities</category><category>fine-tuning</category><category>knowledge-distillation</category><category>hardware-optimization</category><category>quantization</category><category>multimodality</category><category>mixture-of-experts</category><category>efficient-attention</category><category>model-scaling</category><category>depth-upscaling</category><category>transformer-architecture</category><category>gpu-optimization</category><category>prompt-engineering</category></item><item><title>GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4O version)</title><link>https://news.smol.ai/issues/24-05-13-ainews-gpt-4o-the-new-sota-everything-frontier-model-gpt4o-version/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-13-ainews-gpt-4o-the-new-sota-everything-frontier-model-gpt4o-version/</guid><description>**OpenAI** has released **GPT-4o**, a new **multimodal** model capable of reasoning across text, audio, and video in real time with low latency (~300ms). 
It features voice and vision capabilities, improved non-English language performance with an expanded 200k vocabulary tokenizer, and is available to all ChatGPT users including free plans. GPT-4o is half the price and twice as fast as GPT-4-turbo with 5x rate limits. The model supports real-time voice and video input/output and shows strong coding capabilities. The release includes a new desktop app that can read screen and clipboard history, challenging existing desktop agent startups. The announcement was accompanied by demos including image generation and 3D object handling, with OpenAI achieving state-of-the-art performance in ASR and vision tasks. The update was widely discussed on social media, with comparisons to GPT-4T highlighting GPT-4o&apos;s speed and versatility. *&quot;GPT-4o is smart, fast, natively multimodal, and a step towards more natural human-computer interaction&quot;* and *&quot;extremely versatile and fun to play with&quot;*.</description><pubDate>Mon, 13 May 2024 22:58:05 GMT</pubDate><category>openai</category><category>lmsys</category><category>multion</category><category>adept</category><category>gpt-4o</category><category>gpt-4-turbo</category><category>sama</category><category>gdb</category><category>multimodality</category><category>vision</category><category>speech-recognition</category><category>tokenization</category><category>real-time-processing</category><category>coding</category><category>model-performance</category><category>model-optimization</category><category>desktop-agents</category></item><item><title>Quis promptum ipso promptiet?</title><link>https://news.smol.ai/issues/24-05-10-ainews-quis-promptum-ipso-promptiet/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-10-ainews-quis-promptum-ipso-promptiet/</guid><description>**Anthropic** released upgrades to their Workbench Console, introducing new prompt engineering features like chain-of-thought reasoning and prompt generators that significantly reduce 
development time, exemplified by their customer **Zoominfo**. **OpenAI** teased a &quot;magic&quot; new development coming soon, speculated to be a new LLM replacing GPT-3.5 in the free tier or a search competitor. The open-source community highlighted **Llama 3 70B** as &quot;game changing&quot; with new quantized weights for **Llama 3 120B** and CUDA graph support for **llama.cpp** improving GPU performance. **Neuralink** demonstrated a thought-controlled mouse, sparking interest in modeling consciousness from brain signals. The **ICLR 2024** conference is being held in Asia for the first time, generating excitement.</description><pubDate>Sat, 11 May 2024 06:34:12 GMT</pubDate><category>anthropic</category><category>openai</category><category>zoominfo</category><category>neuralink</category><category>llama-3-70b</category><category>llama-3-120b</category><category>llama-3</category><category>llama-cpp</category><category>sama</category><category>gdb</category><category>bindureddy</category><category>svpino</category><category>rohanpaul_ai</category><category>alexalbert__</category><category>abacaj</category><category>prompt-engineering</category><category>chain-of-thought</category><category>rag</category><category>quantization</category><category>cuda-graphs</category><category>gpu-optimization</category><category>thought-controlled-devices</category><category>modeling-consciousness</category><category>conference</category></item><item><title>LMSys advances Llama 3 eval analysis</title><link>https://news.smol.ai/issues/24-05-09-ainews-lmsys-advances-llama-3-eval-analysis/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-09-ainews-lmsys-advances-llama-3-eval-analysis/</guid><description>**LMSys** is enhancing LLM evaluation by categorizing performance across **8 query subcategories** and **7 prompt complexity levels**, revealing uneven strengths in models like **Llama-3-70b**. 
**DeepMind** released **AlphaFold 3**, advancing molecular structure prediction with holistic modeling of protein-DNA-RNA complexes, impacting biology and genetics research. **OpenAI** introduced the **Model Spec**, a public standard to clarify model behavior and tuning, inviting community feedback and aiming for models to learn directly from it. **Llama 3** has reached top leaderboard positions on LMSys, nearly matching **Claude-3-sonnet** in performance, with notable variations on complex prompts. The analysis highlights the evolving landscape of model benchmarking and behavior shaping.</description><pubDate>Fri, 10 May 2024 00:52:45 GMT</pubDate><category>lmsys</category><category>openai</category><category>google-deepmind</category><category>isomorphic-labs</category><category>llama-3-70b</category><category>llama-3</category><category>claude-3-sonnet</category><category>alphafold-3</category><category>demis-hassabis</category><category>sam-altman</category><category>miranda-murati</category><category>karina-nguyen</category><category>joanne-jang</category><category>john-schulman</category><category>benchmarking</category><category>model-behavior</category><category>prompt-complexity</category><category>model-specification</category><category>molecular-structure-prediction</category><category>performance-analysis</category><category>leaderboards</category></item><item><title>OpenAI&apos;s PR Campaign?</title><link>https://news.smol.ai/issues/24-05-08-ainews-openais-pr-campaign/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-08-ainews-openais-pr-campaign/</guid><description>**OpenAI** faces user data deletion backlash over its new partnership with StackOverflow amid GDPR complaints and US newspaper lawsuits, while addressing election year concerns with efforts like the Media Manager tool for content opt-in/out by 2025 and source link attribution. **Microsoft** develops a top-secret airgapped GPT-4 AI service for US intelligence agencies. 
OpenAI releases the Model Spec outlining responsible AI content generation policies, including NSFW content handling and profanity use, emphasizing clear distinctions between bugs and design decisions. **Google DeepMind** announces **AlphaFold 3**, a state-of-the-art model predicting molecular structures with high accuracy, showcasing cross-domain AI techniques. New research on **xLSTM** proposes scaling LSTMs to billions of parameters, competing with transformers in performance and scaling. Microsoft introduces **vAttention**, a dynamic memory management method for efficient large language model serving without PagedAttention.</description><pubDate>Thu, 09 May 2024 01:27:27 GMT</pubDate><category>openai</category><category>microsoft</category><category>google-deepmind</category><category>alphafold-3</category><category>xlstm</category><category>gpt-4</category><category>demis-hassabis</category><category>sama</category><category>joanne-jang</category><category>omarsar0</category><category>arankomatsuzaki</category><category>drjimfan</category><category>memory-management</category><category>model-spec</category><category>scaling</category><category>multimodality</category><category>performance</category><category>transformers</category><category>dynamic-memory</category><category>model-architecture</category></item><item><title>Kolmogorov-Arnold Networks: MLP killers or just spicy MLPs?</title><link>https://news.smol.ai/issues/24-05-07-ainews-kolmogorov-arnold-networks-mlp-killers-or-just-spicy-mlps/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-07-ainews-kolmogorov-arnold-networks-mlp-killers-or-just-spicy-mlps/</guid><description>**Ziming Liu**, a grad student of **Max Tegmark**, published a paper on **Kolmogorov-Arnold Networks (KANs)**, claiming they outperform **MLPs** in interpretability, inductive bias injection, function approximation accuracy, and scaling, and are 100x more parameter efficient despite being 10x slower to train. 
KANs use learnable activation functions modeled by B-splines on edges rather than fixed activations on nodes. However, it was later shown that KANs can be mathematically rearranged back into MLPs with similar parameter counts, sparking debate on their interpretability and novelty. Meanwhile, on AI Twitter, there is speculation about a potential **GPT-5** release with mixed impressions, OpenAI&apos;s adoption of the **C2PA metadata standard** for detecting AI-generated images with high accuracy for **DALL-E 3**, and **Microsoft** training a large 500B parameter model called **MAI-1**, potentially previewed at Build conference, signaling increased competition with OpenAI. *&quot;OpenAI&apos;s safety testing for GPT-4.5 couldn&apos;t finish in time for Google I/O launch&quot;* was also noted.</description><pubDate>Tue, 07 May 2024 22:47:14 GMT</pubDate><category>openai</category><category>microsoft</category><category>gpt-5</category><category>gpt-4</category><category>dall-e-3</category><category>max-tegmark</category><category>ziming-liu</category><category>bindureddy</category><category>nptacek</category><category>zacharynado</category><category>rohanpaul_ai</category><category>svpino</category><category>learnable-activations</category><category>mlp</category><category>function-approximation</category><category>interpretability</category><category>inductive-bias-injection</category><category>b-splines</category><category>model-rearrangement</category><category>parameter-efficiency</category><category>ai-generated-image-detection</category><category>metadata-standards</category><category>large-model-training</category></item><item><title>DeepSeek-V2 beats Mixtral 8x22B with &gt;160 experts at HALF the cost</title><link>https://news.smol.ai/issues/24-05-06-ainews-deepseek-v2-beats-mixtral-8x22b-with-greater160-experts-at-half-the-cost/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-05-06-ainews-deepseek-v2-beats-mixtral-8x22b-with-greater160-experts-at-half-the-cost/</guid><description>**DeepSeek V2** introduces a new state-of-the-art MoE model with **236B parameters** and a novel Multi-Head Latent Attention mechanism, achieving faster inference and surpassing GPT-4 on AlignBench. **Llama 3 120B** shows strong creative writing skills, while Microsoft is reportedly developing a **500B parameter** LLM called **MAI-1**. Research from Scale AI highlights overfitting issues in models like **Mistral** and **Phi**, whereas **GPT-4**, **Claude**, **Gemini**, and **Llama** maintain benchmark robustness. In robotics, **Tesla Optimus** advances with superior data collection and teleoperation, **LeRobot** marks a move toward open-source robotics AI, and **Nvidia&apos;s DrEureka** automates robot skill training. Multimodal LLM hallucinations are surveyed with new mitigation strategies, and **Google&apos;s Med-Gemini** achieves SOTA on medical benchmarks with fine-tuned multimodal models.</description><pubDate>Mon, 06 May 2024 23:37:03 
GMT</pubDate><category>deepseek-ai</category><category>mistral-ai</category><category>microsoft</category><category>openai</category><category>scale-ai</category><category>tesla</category><category>nvidia</category><category>google-deepmind</category><category>deepseek-v2</category><category>llama-3-120b</category><category>llama-3-400b</category><category>gpt-4</category><category>mistral</category><category>phi</category><category>claude</category><category>gemini</category><category>mai-1</category><category>med-gemini</category><category>erhartford</category><category>maximelabonne</category><category>bindureddy</category><category>adcock_brett</category><category>drjimfan</category><category>clementdelangue</category><category>omarsar0</category><category>rohanpaul_ai</category><category>mixture-of-experts</category><category>multi-head-attention</category><category>model-inference</category><category>benchmarking</category><category>overfitting</category><category>robotics</category><category>teleoperation</category><category>open-source</category><category>multimodality</category><category>hallucination-detection</category><category>fine-tuning</category><category>medical-ai</category><category>model-training</category></item><item><title>$100k to predict LMSYS human preferences in a Kaggle contest</title><link>https://news.smol.ai/issues/24-05-03-ainews-dollar100k-to-predict-lmsys-human-preferences-in-a-kaggle-contest/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-03-ainews-dollar100k-to-predict-lmsys-human-preferences-in-a-kaggle-contest/</guid><description>**Llama 3 models** are making breakthroughs with Groq&apos;s 70B model achieving record low costs per million tokens. A new **Kaggle competition** offers a $100,000 prize to develop models predicting human preferences from a dataset of over 55,000 user-LLM conversations. 
Open source evaluator LLMs like **Prometheus 2** outperform proprietary models such as **GPT-4** and **Claude 3 Opus** in judgment tasks. New datasets like **WildChat1M** provide over 1 million ChatGPT interaction logs with diverse and toxic examples. Techniques like **LoRA fine-tuning** show significant performance gains, and **NVIDIA&apos;s NeMo-Aligner** toolkit enables scalable LLM alignment across hundreds of GPUs. Factuality-aware alignment methods are proposed to reduce hallucinations in LLM outputs.</description><pubDate>Fri, 03 May 2024 22:09:28 GMT</pubDate><category>groq</category><category>openai</category><category>lmsys</category><category>scale-ai</category><category>ai2</category><category>nvidia</category><category>llama-3-70b</category><category>llama-3</category><category>gpt-4</category><category>claude-3-opus</category><category>prometheus-2</category><category>bindureddy</category><category>drjimfan</category><category>percyliang</category><category>seungonekim</category><category>mobicham</category><category>clefourrier</category><category>benchmarking</category><category>datasets</category><category>fine-tuning</category><category>reinforcement-learning</category><category>model-alignment</category><category>hallucination</category><category>parameter-efficient-fine-tuning</category><category>scalable-training</category><category>factuality</category><category>chatbot-performance</category></item><item><title>Evals: The Next Generation</title><link>https://news.smol.ai/issues/24-05-02-ainews-evals-the-next-generation/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-02-ainews-evals-the-next-generation/</guid><description>**Scale AI** highlighted issues with data contamination in benchmarks like **MMLU** and **GSM8K**, proposing a new benchmark where **Mistral** overfits and **Phi-3** performs well. **Reka** released the **VibeEval** benchmark for multimodal models addressing multiple choice benchmark limitations. 
**Sam Altman** of **OpenAI** described GPT-4 as &quot;dumb&quot; and hinted at **GPT-5** with AI agents as a major breakthrough. Researchers jailbroke **GPT-3.5** via fine-tuning. Global calls emerged to ban AI-powered weapons, with US officials urging human control over nuclear arms. Ukraine launched an AI consular avatar, while **Moderna** partnered with **OpenAI** for medical AI advancements. **Sanctuary AI** and **Microsoft** are collaborating on AI for general-purpose robots. MIT introduced **Kolmogorov-Arnold networks** with improved neural network efficiency. **Meta AI** is training **Llama 3** models with over 400 billion parameters, featuring multimodality and longer context.</description><pubDate>Thu, 02 May 2024 23:54:22 GMT</pubDate><category>scale-ai</category><category>mistral-ai</category><category>reka-ai</category><category>openai</category><category>moderna</category><category>sanctuary-ai</category><category>microsoft</category><category>mit</category><category>meta-ai-fair</category><category>gpt-4</category><category>gpt-5</category><category>gpt-3.5</category><category>phi-3</category><category>mistral-7b</category><category>llama-3</category><category>sam-altman</category><category>jim-fan</category><category>benchmarking</category><category>data-contamination</category><category>multimodality</category><category>fine-tuning</category><category>ai-regulation</category><category>ai-safety</category><category>ai-weapons</category><category>neural-networks</category><category>model-architecture</category><category>model-training</category><category>model-performance</category><category>robotics</category><category>activation-functions</category><category>long-context</category></item><item><title>Not much happened today</title><link>https://news.smol.ai/issues/24-05-01-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-05-01-ainews-not-much-happened-today/</guid><description>**Anthropic** released a team plan 
and iOS app about 4 months after **OpenAI**. The **Command-R 35B** model excels at creative writing, outperforming larger models like **Goliath-120** and **Miqu-120**. The **Llama-3 8B** model now supports a 1 million token context window, improving long-context understanding with minimal training on a single 8xA800 GPU machine. **TensorRT-LLM** benchmarks show it is 30-70% faster than **llama.cpp** on consumer hardware. A benchmark suggests **GPT2-Chat** may have better reasoning than **GPT-4-Turbo**, though results are debated. Demos include a self-learning **Llama-3** voice agent running locally on Jetson Orin and a Self-Learning Large Action Model (LAM). **Amazon CodeWhisperer** was renamed to **Q Developer**, expanding its generative AI assistant capabilities. **Apple** plans an AI-enabled Safari browser with an on-device LLM in iOS 18 and macOS 15. Big Tech dominates AI lobbying in Washington, while major U.S. newspapers sued **OpenAI** and **Microsoft** for copyright infringement. **DeepMind&apos;s AlphaZero** became the greatest chess player in 9 hours, and their Naturalized Execution Tuning (NExT) method improves LLM code reasoning by 14-26%. 
**Stable Diffusion** is used for diverse image generation applications.</description><pubDate>Thu, 02 May 2024 00:47:12 GMT</pubDate><category>anthropic</category><category>openai</category><category>perplexity-ai</category><category>amazon</category><category>apple</category><category>microsoft</category><category>deepmind</category><category>command-r-35b</category><category>goliath-120</category><category>miqu-120</category><category>llama-3-8b</category><category>tensorrt-llm</category><category>llama-cpp</category><category>gpt2-chat</category><category>gpt-4-turbo</category><category>llama-3</category><category>deepmind-alphazero</category><category>creative-writing</category><category>context-windows</category><category>benchmarking</category><category>model-performance</category><category>self-learning</category><category>function-calling</category><category>retrieval-augmented-generation</category><category>ai-assistants</category><category>on-device-ai</category><category>ai-lobbying</category><category>copyright-infringement</category><category>code-reasoning</category><category>image-generation</category></item><item><title>LLMs-as-Juries</title><link>https://news.smol.ai/issues/24-04-30-ainews-llms-as-juries/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-30-ainews-llms-as-juries/</guid><description>**OpenAI** has rolled out the **memory feature** to all ChatGPT Plus users and partnered with the **Financial Times** to license content for AI training. Discussions on **OpenAI&apos;s profitability** arise due to paid training data licensing and potential **GPT-4 usage limit reductions**. Users report issues with ChatGPT&apos;s data cleansing after the memory update. Tutorials and projects include building AI voice assistants and interface agents powered by LLMs. 
In **Stable Diffusion**, users seek realistic **SDXL models** comparable to PonyXL, and new extensions like **Hi-diffusion** and **Virtuoso Nodes v1.1** enhance ComfyUI with advanced image generation and Photoshop-like features. Cohere finds that multiple agents outperform single agents in LLM judging tasks, highlighting advances in multi-agent systems.</description><pubDate>Wed, 01 May 2024 01:41:25 GMT</pubDate><category>openai</category><category>cohere</category><category>financial-times</category><category>gpt-4</category><category>gpt-3.5</category><category>sdxl</category><category>ponyxl</category><category>memory</category><category>training-data</category><category>model-usage-limits</category><category>data-cleansing</category><category>ai-voice-assistants</category><category>interface-agents</category><category>image-generation</category><category>model-extensions</category><category>multi-agent-systems</category></item><item><title>A quiet weekend</title><link>https://news.smol.ai/issues/24-04-29-ainews-a-quiet-weekend/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-29-ainews-a-quiet-weekend/</guid><description>**Yann LeCun** predicts a shift to **AR interfaces** with AI assistants in 10-15 years, moving away from smartphones. The **Dolphin-2.9 model** based on **Llama-3** was released, improving quality issues. **PixArt Sigma**, a **0.6B parameter** model, achieves **Stable Diffusion 3.0** level performance with complete prompt adherence and local usability. Research shows transformers can use meaningless filler tokens for algorithmic tasks with dense supervision. AI-generated restaurant reviews can pass the **Turing test**, fooling humans and AI detectors. **Uber** uses graph algorithms and learned embeddings for ETA prediction. **Coca-Cola** and **Microsoft** announced a 5-year AI partnership to accelerate cloud and generative AI initiatives. 
The **Llama-3 70B** model can run on a single 4GB GPU using **AirLLM** optimization without quantization, though inference is slow. **Mistral.rs** is introduced as a fast LLM inference platform with quantization and OpenAI API compatibility. Only 5% of LLMs make it from prototype to production due to challenges, especially in enterprise. EXL2 and GGUF quantization methods for Llama models show similar perplexity vs model size, with Llama-3 and Llama-2 degrading more under quantization compared to full precision.</description><pubDate>Mon, 29 Apr 2024 22:10:15 GMT</pubDate><category>microsoft</category><category>coca-cola</category><category>uber</category><category>lmsys</category><category>nous-research</category><category>mistral-ai</category><category>llama-3</category><category>dolphin-2.9</category><category>pixart-sigma</category><category>llama-3-70b</category><category>yann-lecun</category><category>ar-interfaces</category><category>transformers</category><category>algorithmic-tasks</category><category>turing-test</category><category>graph-algorithms</category><category>embeddings</category><category>generative-ai</category><category>model-optimization</category><category>llm-inference</category><category>quantization</category><category>model-deployment</category></item><item><title>Apple&apos;s OpenELM beats OLMo with 50% of its dataset, using DeLighT</title><link>https://news.smol.ai/issues/24-04-26-ainews-apples-openelm-beats-olmo-with-50percent-of-its-dataset-using-delight/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-26-ainews-apples-openelm-beats-olmo-with-50percent-of-its-dataset-using-delight/</guid><description>**Apple** advances its AI presence with the release of **OpenELM**, its first relatively open large language model available in sizes from **270M to 3B** parameters, featuring a novel layer-wise scaling architecture inspired by the **DeLighT** paper. 
Meanwhile, **Meta&apos;s LLaMA 3** family pushes context length boundaries with models supporting over **160K tokens** and an **8B-Instruct model with 262K context length** released on Hugging Face, alongside performance improvements in quantized versions. A new paper on AI alignment highlights **KTO** as the best-performing method, with sensitivity to training data volume noted. In AI ethics and regulation, former **Google** CEO **Eric Schmidt** warns about the risks of open-source AI empowering bad actors and geopolitical rivals, while a U.S. proposal aims to enforce &quot;Know Your Customer&quot; rules to end anonymous cloud usage.</description><pubDate>Fri, 26 Apr 2024 21:32:41 GMT</pubDate><category>apple</category><category>meta-ai-fair</category><category>google</category><category>openelm</category><category>llama-3</category><category>llama-3-8b-instruct</category><category>llama-3-70b</category><category>eric-schmidt</category><category>sebastian-raschka</category><category>layer-wise-scaling</category><category>context-length</category><category>quantization</category><category>ai-alignment</category><category>open-source</category><category>ai-regulation</category></item><item><title>Snowflake Arctic: Fully Open 10B+128x4B Dense-MoE Hybrid LLM</title><link>https://news.smol.ai/issues/24-04-25-ainews-snowflake-arctic-fully-open-10b128x4b-dense-moe-hybrid-llm/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-25-ainews-snowflake-arctic-fully-open-10b128x4b-dense-moe-hybrid-llm/</guid><description>**Snowflake Arctic** is a notable new foundation language model released under Apache 2.0, claiming superiority over **Databricks** in data warehouse AI applications and adopting a mixture-of-experts architecture inspired by **DeepSeekMOE** and **DeepSpeedMOE**. The model employs a 3-stage curriculum training strategy similar to the recent **Phi-3** paper. 
In AI image and video generation, **Nvidia** introduced the **Align Your Steps** technique improving image quality at low step counts, while **Stable Diffusion 3** and **SD3 Turbo** models were compared for prompt understanding and image quality. **Adobe** launched an AI video upscaling project enhancing blurry videos to HD, though with some high-resolution artifacts. **Apple** released open-source on-device language models with code and training logs, diverging from typical weight-only releases. The **Llama-3-70b** model ties for first place on the LMSYS leaderboard for English queries, and **Phi-3** (4B params) outperforms **GPT-3.5 Turbo** in the banana logic benchmark. Fast inference and quantization of **Llama 3** models were demonstrated on MacBook devices.</description><pubDate>Fri, 26 Apr 2024 01:33:53 GMT</pubDate><category>snowflake</category><category>databricks</category><category>deepseek</category><category>deepspeed</category><category>nvidia</category><category>stable-diffusion</category><category>adobe</category><category>apple</category><category>llamaindex</category><category>lmsys</category><category>openai</category><category>snowflake-arctic</category><category>phi-3</category><category>llama-3-70b</category><category>llama-3</category><category>stable-diffusion-3</category><category>sd3-turbo</category><category>gpt-3.5-turbo</category><category>mixture-of-experts</category><category>curriculum-learning</category><category>model-release</category><category>image-generation</category><category>video-upscaling</category><category>quantization</category><category>inference-speed</category><category>benchmarking</category><category>model-comparison</category><category>open-source</category><category>on-device-ai</category></item><item><title>OpenAI&apos;s Instruction Hierarchy for the LLM OS</title><link>https://news.smol.ai/issues/24-04-24-ainews-openais-instruction-hierarchy-for-the-llm-os/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-04-24-ainews-openais-instruction-hierarchy-for-the-llm-os/</guid><description>**OpenAI** published a paper introducing the concept of privilege levels for LLMs to address prompt injection vulnerabilities, improving defenses by 20-30%. **Microsoft** released the lightweight **Phi-3-mini** model with 4K and 128K context lengths. **Apple** open-sourced the **OpenELM** language model family with an open training and inference framework. An instruction accuracy benchmark compared 12 models, with **Claude 3 Opus**, **GPT-4 Turbo**, and **Llama 3 70B** performing best. The **Rho-1** method enables training state-of-the-art models using only 3% of tokens, boosting models like **Mistral**. **Wendy&apos;s** deployed AI-powered drive-thru ordering, and a study found **Gen Z** workers prefer generative AI for career advice. Tutorials on deploying **Llama 3** models on AWS EC2 highlight hardware requirements and inference server use.</description><pubDate>Thu, 25 Apr 2024 00:15:11 
GMT</pubDate><category>openai</category><category>microsoft</category><category>apple</category><category>deepseek</category><category>mistral-ai</category><category>llamaindex</category><category>wendys</category><category>phi-3-mini</category><category>openelm</category><category>claude-3-opus</category><category>gpt-4-turbo</category><category>gpt-3.5-turbo</category><category>llama-3-70b</category><category>rho-1</category><category>mistral-7b</category><category>llama-3-8b</category><category>llama-3</category><category>prompt-injection</category><category>alignment</category><category>benchmarking</category><category>instruction-following</category><category>context-windows</category><category>model-training</category><category>model-deployment</category><category>inference</category><category>performance-optimization</category><category>ai-application</category><category>career-advice</category><category>drive-thru-ai</category></item><item><title>Perplexity, the newest AI unicorn</title><link>https://news.smol.ai/issues/24-04-23-ainews-perplexity-the-newest-ai-unicorn/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-23-ainews-perplexity-the-newest-ai-unicorn/</guid><description>**Perplexity** doubles its valuation shortly after its Series B with a Series B-1 funding round. Significant developments around **Llama 3** include context length extension to **16K tokens**, new multimodal **LLaVA models** outperforming Llama 2, and fine-tuning improvements like QDoRA surpassing QLoRA. The **Llama-3-70B** model is praised for instruction following and performance across quantization formats. 
**Phi-3 models** by **Microsoft** released in multiple sizes show competitive benchmark results, with the 14B model achieving **78% on MMLU** and the 3.8B model nearing **GPT-3.5** performance.</description><pubDate>Tue, 23 Apr 2024 22:48:23 GMT</pubDate><category>perplexity-ai</category><category>meta-ai-fair</category><category>hugging-face</category><category>groq</category><category>llama-3-8b</category><category>llama-3-70b</category><category>llama-3</category><category>llava-llama-3-8b-v1_1</category><category>phi-3</category><category>gpt-3.5</category><category>daniel-gross</category><category>aravind-srinivas</category><category>context-length</category><category>fine-tuning</category><category>quantization</category><category>instruction-following</category><category>model-comparison</category><category>multimodality</category><category>benchmarking</category><category>memory-optimization</category><category>model-performance</category></item><item><title>FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you&apos;re welcome)</title><link>https://news.smol.ai/issues/24-04-22-ainews-fineweb-15t-tokens-12-years-of-commoncrawl-deduped-and-filtered-youre-welcome/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-22-ainews-fineweb-15t-tokens-12-years-of-commoncrawl-deduped-and-filtered-youre-welcome/</guid><description>**2024** has seen a significant increase in dataset sizes for training large language models, with **Redpajama 2** offering up to **30T tokens**, **DBRX** at **12T tokens**, **Reka Core/Flash/Edge** with **5T tokens**, and **Llama 3** trained on **15T tokens**. **Huggingface** released an open dataset containing **15T tokens** from **12 years** of filtered CommonCrawl data, enabling training of models like **Llama 3** if compute resources are available. On Reddit, **WizardLM-2-8x22b** outperformed other open LLMs including **Llama-3-70b-instruct** in reasoning and math benchmarks. 
**Claude Opus** demonstrated strong zero-shot code error spotting, surpassing **Llama 3**. Benchmarks revealed limitations in the **LMSYS chatbot leaderboard** due to instruction-tuned models gaming the system, and a new RAG benchmark showed **Llama 3 70B** underperforming compared to **GPT-4**, while **Mistral 8x7B** remained strong. Efficient quantized versions of **Llama 3** models are available on **Huggingface**, with users reporting token generation limits around **9600 tokens** on a 3090 GPU. Safety concerns include a UK sex offender banned from AI tool usage and **GPT-4** demonstrating an **87% success rate** exploiting real vulnerabilities, raising security concerns.</description><pubDate>Tue, 23 Apr 2024 00:03:58 GMT</pubDate><category>huggingface</category><category>meta-ai-fair</category><category>dbrx</category><category>reka-ai</category><category>mistral-ai</category><category>lmsys</category><category>openai</category><category>llama-3-70b</category><category>llama-3</category><category>wizardlm-2-8x22b</category><category>claude-opus</category><category>mistral-8x7b</category><category>gpt-4</category><category>datasets</category><category>benchmarking</category><category>quantization</category><category>zero-shot-learning</category><category>reasoning</category><category>code-error-detection</category><category>token-generation</category><category>security</category></item><item><title>Llama-3-70b is GPT-4-level Open Model</title><link>https://news.smol.ai/issues/24-04-19-ainews-llama-3-70b-is-gpt-4-level-open-model/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-19-ainews-llama-3-70b-is-gpt-4-level-open-model/</guid><description>**Meta** has released **Llama 3**, their most capable open large language model with **8B and 70B parameter versions** supporting **8K context length** and outperforming previous models including **Llama 2** and **Mistral 7B**. 
**Groq** serves the **Llama 3 70B** model at **500-800 tokens/second**, making it the fastest GPT-4-level token source. Discussions highlight AI scaling challenges with **Elon Musk** stating that training **Grok 3** will require **100,000 Nvidia H100 GPUs**, and **AWS** planning to acquire **20,000 B200 GPUs** for a **27 trillion parameter model**. Microsoft unveiled **VASA-1** for lifelike talking face generation, while **Stable Diffusion 3** and its extensions received mixed impressions. Concerns about AI energy usage and political bias in AI were also discussed.</description><pubDate>Sat, 20 Apr 2024 02:21:27 GMT</pubDate><category>meta-ai-fair</category><category>groq</category><category>nvidia</category><category>amazon</category><category>microsoft</category><category>llama-3-70b</category><category>llama-3-8b</category><category>llama-3</category><category>llama-2-70b</category><category>mistral-7b</category><category>grok-3</category><category>stable-diffusion-3</category><category>vasa-1</category><category>elon-musk</category><category>benchmarking</category><category>model-performance</category><category>fine-tuning</category><category>function-calling</category><category>arithmetic</category><category>image-generation</category><category>video-generation</category><category>energy-usage</category><category>gpu-demand</category><category>political-bias</category><category>ai-safety</category><category>scaling</category><category>context-windows</category><category>tokenization</category></item><item><title>Meta Llama 3 (8B, 70B)</title><link>https://news.smol.ai/issues/24-04-18-ainews-meta-llama-3-8b-70b/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-18-ainews-meta-llama-3-8b-70b/</guid><description>**Meta** partially released **Llama 3** models including **8B** and **70B** variants, with a **400B** variant still in training, touted as the first GPT-4 level open-source model. 
**Stability AI** launched **Stable Diffusion 3 API** with model weights coming soon, showing competitive realism against **Midjourney V6**. **Boston Dynamics** unveiled an electric humanoid robot **Atlas**, and **Microsoft** introduced the **VASA-1** model generating lifelike talking faces at 40fps on RTX 4090. **Mistral AI**, a European OpenAI rival, is seeking $5B funding with its **Mixtral-8x22B-Instruct-v0.1** model achieving 100% accuracy on 64K context benchmarks. AI safety discussions include calls from former OpenAI board member **Helen Toner** for audits of top AI companies, and the **Mormon Church** released AI usage principles. New AI development tools include **Ctrl-Adapter** for diffusion models, **Distilabel 1.0.0** for synthetic dataset pipelines, **Data Bonsai** for data cleaning with LLMs, and **Dendron** for building LLM agents with behavior trees. Memes highlight AI development humor and cultural references. The release of **Llama 3** models features improved reasoning, a 128K token vocabulary, 8K token sequences, and grouped query attention.</description><pubDate>Fri, 19 Apr 2024 04:28:01 
GMT</pubDate><category>meta-ai-fair</category><category>stability-ai</category><category>boston-dynamics</category><category>microsoft</category><category>mistral-ai</category><category>hugging-face</category><category>llama-3-8b</category><category>llama-3-70b</category><category>llama-3-400b</category><category>stable-diffusion-3</category><category>mixtral-8x22b-instruct-v0.1</category><category>vasa-1</category><category>helen-toner</category><category>transformer</category><category>tokenization</category><category>model-training</category><category>benchmarking</category><category>robotics</category><category>natural-language-processing</category><category>real-time-processing</category><category>synthetic-data</category><category>dataset-cleaning</category><category>behavior-trees</category><category>ai-safety</category><category>model-accuracy</category><category>api</category><category>model-release</category><category>humor</category></item><item><title>Mixtral 8x22B Instruct sparks efficiency memes</title><link>https://news.smol.ai/issues/24-04-17-ainews-mixtral-8x22b-instruct-sparks-efficiency-memes/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-17-ainews-mixtral-8x22b-instruct-sparks-efficiency-memes/</guid><description>**Mistral** released an instruct-tuned version of their **Mixtral 8x22B** model, notable for using only **39B active parameters** during inference, outperforming larger models and supporting **5 languages** with **64k context window** and math/code capabilities. The model is available on **Hugging Face** under an **Apache 2.0 license** for local use. **Google** plans to invest over **$100 billion** in AI, with other giants like **Microsoft**, **Intel**, and **SoftBank** also making large investments. The UK criminalized non-consensual deepfake porn, raising enforcement debates. A former **Nvidia** employee claims Nvidia&apos;s AI chip lead is unmatchable this decade. AI companions could become a **$1 billion** market. 
AI has surpassed humans on several basic tasks but lags on complex ones. **Zyphra** introduced **Zamba**, a novel 7B parameter hybrid model outperforming **LLaMA-2 7B** and **OLMo-7B** with less training data, trained on 128 H100 GPUs over 30 days. **GroundX** API advances retrieval-augmented generation accuracy.</description><pubDate>Wed, 17 Apr 2024 21:02:34 GMT</pubDate><category>mistral-ai</category><category>hugging-face</category><category>google</category><category>microsoft</category><category>intel</category><category>softbank</category><category>nvidia</category><category>mixtral-8x22b</category><category>llama-2-7b</category><category>olmo-7b</category><category>guillaume-lample</category><category>osanseviero</category><category>_philschmid</category><category>svpino</category><category>multilinguality</category><category>math</category><category>code-generation</category><category>context-window</category><category>model-performance</category><category>model-release</category><category>retrieval-augmented-generation</category><category>deepfake</category><category>ai-investment</category><category>ai-chip</category><category>hybrid-architecture</category><category>training-data</category></item><item><title>Lilian Weng on Video Diffusion</title><link>https://news.smol.ai/issues/24-04-16-ainews-lilian-weng-on-video-diffusion/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-16-ainews-lilian-weng-on-video-diffusion/</guid><description>**OpenAI** expands with a launch in **Japan**, introduces a **Batch API**, and partners with **Adobe** to bring the **Sora video model** to Premiere Pro. **Reka AI** releases the **Reka Core multimodal language model**. **WizardLM-2** is released showing impressive performance, and **Llama 3** news is anticipated soon. Geoffrey Hinton highlights AI models exhibiting **intuition, creativity, and analogy recognition** beyond humans. The **Devin AI model** notably contributes to its own codebase. 
**Opus** demonstrates the ability to recognize its own generated outputs. **Sam Altman** warns startups about being steamrolled by OpenAI if they don&apos;t adapt quickly. **Yann LeCun** discusses AGI timelines, emphasizing it is inevitable but not imminent or solely from LLMs. Lilian Weng&apos;s blog on **diffusion models for video generation** highlights **training-free adaptation** as a breakthrough technique.</description><pubDate>Wed, 17 Apr 2024 02:15:37 GMT</pubDate><category>openai</category><category>adobe</category><category>reka-ai</category><category>wizardlm-2</category><category>llama-3</category><category>reka-core</category><category>devin</category><category>opus</category><category>sora</category><category>lilian-weng</category><category>sam-altman</category><category>geoffrey-hinton</category><category>yann-lecun</category><category>diffusion-models</category><category>video-generation</category><category>training-free-adaptation</category><category>multimodality</category><category>intuition</category><category>creativity</category><category>analogy-recognition</category><category>self-improving-ai</category><category>model-recognition</category><category>agi-timelines</category><category>model-performance</category><category>startup-competition</category></item><item><title>Multi-modal, Multi-Aspect, Multi-Form-Factor AI</title><link>https://news.smol.ai/issues/24-04-15-ainews-multi-modal-multi-aspect-multi-form-factor-ai/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-15-ainews-multi-modal-multi-aspect-multi-form-factor-ai/</guid><description>Between April 12-15, **Reka Core** launched a new GPT4-class multimodal foundation model with a detailed technical report described as &quot;full Shazeer.&quot; **Cohere Compass** introduced a foundation embedding model for indexing and searching multi-aspect enterprise data like emails and invoices. 
The open-source **IDEFICS 2-8B** model continues the open reproduction of Google&apos;s Flamingo multimodal model. **Rewind** pivoted to a multi-platform app called Limitless, moving away from spyware. Reddit discussions highlighted **Apple MLX** outperforming **Ollama** and **Mistral Instruct** on M2 Ultra GPUs, GPU choices for LLMs and Stable Diffusion, and AI-human comparisons by Microsoft Research&apos;s Chris Bishop. Former PayPal CEO Dan Schulman predicted **GPT-5** will drastically reduce job scopes by 80%. **Mistral** CEO Arthur Mensch criticized the obsession with AGI as &quot;creating God.&quot;</description><pubDate>Mon, 15 Apr 2024 22:42:55 GMT</pubDate><category>reka-ai</category><category>cohere</category><category>google</category><category>rewind</category><category>apple</category><category>mistral-ai</category><category>microsoft</category><category>paypal</category><category>gpt-4</category><category>idefics-2-8b</category><category>mistral-instruct</category><category>apple-mlx</category><category>gpt-5</category><category>arthur-mensch</category><category>dan-schulman</category><category>chris-bishop</category><category>multimodality</category><category>foundation-models</category><category>embedding-models</category><category>gpu-performance</category><category>model-comparison</category><category>enterprise-data</category><category>open-source</category><category>performance-optimization</category><category>job-impact</category><category>agi-criticism</category><category>technical-report</category></item><item><title>Zero to GPT in 1 Year</title><link>https://news.smol.ai/issues/24-04-12-ainews-zero-to-gpt-in-1-year/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-12-ainews-zero-to-gpt-in-1-year/</guid><description>**GPT-4 Turbo** reclaimed the top leaderboard spot with significant improvements in coding, multilingual, and English-only tasks, now rolled out in paid **ChatGPT**. 
Despite this, **Claude Opus** remains superior in creativity and intelligence. Powerful open-source models suited for fine-tuning arrived, including **Mistral AI**&apos;s **Mixtral-8x22B** and **Zephyr 141B**, a fine-tune built on it. **LangChain** enhanced tool integration across models, and **Hugging Face** introduced Transformers.js for running transformers in browsers. Medical domain-focused **Medical mT5** was shared as an open-source multilingual text-to-text model. The community also highlighted research on LLMs as regressors and shared practical advice on OCR/PDF data modeling from **Vik Paruchuri**&apos;s journey.</description><pubDate>Fri, 12 Apr 2024 23:27:50 GMT</pubDate><category>openai</category><category>anthropic</category><category>mistral-ai</category><category>langchain</category><category>hugging-face</category><category>gpt-4-turbo</category><category>claude-3-opus</category><category>mixtral-8x22b</category><category>zephyr-141b</category><category>medical-mt5</category><category>vik-paruchuri</category><category>sam-altman</category><category>greg-brockman</category><category>miranda-murati</category><category>abacaj</category><category>mbusigin</category><category>akhaliq</category><category>clementdelangue</category><category>fine-tuning</category><category>multilinguality</category><category>tool-integration</category><category>transformers</category><category>model-evaluation</category><category>open-source-models</category><category>multimodal-llms</category><category>natural-language-processing</category><category>ocr</category><category>model-training</category></item><item><title>Mergestral, Meta MTIAv2, Cohere Rerank 3, Google Infini-Attention</title><link>https://news.smol.ai/issues/24-04-11-ainews-mergestral-meta-mtiav2-cohere-rerank-3-google-infini-attention/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-11-ainews-mergestral-meta-mtiav2-cohere-rerank-3-google-infini-attention/</guid><description>**Meta** announced their new **MTIAv2 chips** designed 
for training and inference acceleration with improved architecture and integration with PyTorch 2.0. **Mistral** released the **8x22B Mixtral** model, which was merged back into a dense model to effectively create a 22B Mistral model. **Cohere** launched **Rerank 3**, a foundation model enhancing enterprise search and retrieval-augmented generation (RAG) systems supporting 100+ languages. **Google** published a paper on **Infini-attention**, an ultra-scalable linear attention mechanism demonstrated on 1B and 8B models with 1 million sequence length. Additionally, **Meta&apos;s Llama 3** is expected to start rolling out soon. Other notable updates include **Command R+**, an open model surpassing GPT-4 in chatbot performance with 128k context length, and advancements in Stable Diffusion models and RAG pipelines.</description><pubDate>Thu, 11 Apr 2024 22:56:47 GMT</pubDate><category>meta-ai-fair</category><category>mistral-ai</category><category>cohere</category><category>google</category><category>stability-ai</category><category>hugging-face</category><category>ollama</category><category>mistral-8x22b</category><category>command-r-plus</category><category>rerank-3</category><category>infini-attention</category><category>llama-3</category><category>sd-1.5</category><category>cosxl</category><category>aidan_gomez</category><category>ylecun</category><category>swyx</category><category>model-merging</category><category>training-accelerators</category><category>retrieval-augmented-generation</category><category>linear-attention</category><category>long-context</category><category>foundation-models</category><category>image-generation</category><category>rag-pipelines</category><category>model-benchmarking</category><category>context-length</category><category>model-performance</category></item><item><title>Music&apos;s Dall-E moment</title><link>https://news.smol.ai/issues/24-04-10-ainews-musics-dall-e-moment/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-04-10-ainews-musics-dall-e-moment/</guid><description>**Google&apos;s Griffin architecture** outperforms transformers with faster inference and lower memory usage on long contexts. **Command R+** climbs to 6th place on the LMSYS Chatbot Arena leaderboard, surpassing **GPT-4-0613** and **GPT-4-0314**. **Mistral AI** releases an open-source **8x22B model** with a 64K context window and around 130B total parameters. **Google** open-sources **CodeGemma** models with pre-quantized 4-bit versions for faster downloads. **Ella weights** enhance Stable Diffusion 1.5 with LLM for semantic alignment. **Unsloth** enables 4x larger context windows and 80% memory reduction for finetuning. **Andrej Karpathy** releases LLMs implemented in pure C for potential performance gains. **Command R+** runs in realtime on M2 Max MacBook using iMat q1 quantization. **Cohere&apos;s Command R** model offers low API costs and strong leaderboard performance. **Gemini 1.5** impresses with audio capabilities recognizing speech tone and speaker identification from audio clips.</description><pubDate>Wed, 10 Apr 2024 22:07:48 
GMT</pubDate><category>google</category><category>mistral-ai</category><category>lmsys</category><category>cohere</category><category>griffin</category><category>command-r-plus</category><category>gpt-4-0613</category><category>gpt-4-0314</category><category>mistral-8x22b</category><category>codegemma</category><category>stable-diffusion-1.5</category><category>command-r</category><category>gemini-1.5</category><category>andrej-karpathy</category><category>model-architecture</category><category>benchmarking</category><category>open-source</category><category>model-quantization</category><category>memory-optimization</category><category>inference-speed</category><category>multimodality</category><category>finetuning</category><category>performance-optimization</category><category>audio-processing</category></item><item><title>Gemini Pro and GPT4T Vision go GA on the same day by complete coincidence</title><link>https://news.smol.ai/issues/24-04-09-ainews-gemini-pro-and-gpt4t-vision-go-ga-on-the-same-day-by-complete-coincidence/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-09-ainews-gemini-pro-and-gpt4t-vision-go-ga-on-the-same-day-by-complete-coincidence/</guid><description>At **Google Cloud Next**, **Gemini 1.5 Pro** was released with a **million-token context window**, available in **180+ countries**, featuring **9.5 hours of audio understanding**, a new **File API** for nearly unlimited free uploads, and the **Gecko-1b-256/768 embedding model**. **GPT-4 Turbo with Vision** became generally available in the API with a major update improving reasoning capabilities. **Meta Platforms** plans to launch smaller versions of **Llama 3** next week. The **Orca 2.5 7B** model using Direct Nash Optimization outperforms older GPT-4 versions in AlpacaEval. New releases include **Functionary-V2.4** with enhanced function calling and code interpretation, and **CosXL** models for image editing. 
Research highlights include continuous U-Nets for diffusion models achieving up to **80% faster inference** and a massive multilingual dataset with **~5.6 trillion word tokens**. Creative applications include a no-code touch screen game made with Gemini 1.5 and AI-generated novel trailers.</description><pubDate>Wed, 10 Apr 2024 01:05:31 GMT</pubDate><category>google</category><category>openai</category><category>meta-ai-fair</category><category>hugging-face</category><category>cohere</category><category>gemini-1.5-pro</category><category>gpt-4-turbo</category><category>llama-3</category><category>orca-2.5-7b</category><category>functionary-v2.4</category><category>cosxl</category><category>million-token-context-window</category><category>audio-processing</category><category>file-api</category><category>text-embedding</category><category>function-calling</category><category>reasoning</category><category>direct-nash-optimization</category><category>contrastive-learning</category><category>code-interpreter</category><category>diffusion-models</category><category>neural-odes</category><category>inference-speed</category><category>multilingual-dataset</category><category>image-editing</category><category>no-code-development</category></item><item><title>Anime pfp anon eclipses $10k A::B prompting challenge</title><link>https://news.smol.ai/issues/24-04-08-ainews-anime-pfp-anon-eclipses-dollar10k-ab-prompting-challenge/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-08-ainews-anime-pfp-anon-eclipses-dollar10k-ab-prompting-challenge/</guid><description>**Victor Taelin** issued a $10k challenge to GPT models, initially achieving only **10% success** with state-of-the-art models, but community efforts surpassed **90% success** within 48 hours, highlighting GPT capabilities and common skill gaps. 
In Reddit AI communities, **Command R Plus (104B)** is running quantized on **M2 Max hardware** via **Ollama** and **llama.cpp** forks, with **GGUF quantizations** released on Huggingface. Streaming text-to-video generation is now available through the **st2v** GitHub repo. **WD Tagger v3** was released for mass auto-captioning datasets with a WebUI. Lesser-known prompting techniques like self-tagging and generational frameworks produced thought-provoking outputs in OpenAI discussions, including experiments with self-evolving system prompts. Stable Diffusion users discussed image composition importance for training character LoRAs and best checkpoints for video game character generation. Discussions also covered scarcity of **5B parameter models** and open(ish) licenses for open source AI. Memes included jokes about ChatGPT and Gemini training data differences.</description><pubDate>Tue, 09 Apr 2024 01:18:42 GMT</pubDate><category>openai</category><category>ollama</category><category>huggingface</category><category>command-r-plus-104b</category><category>stable-diffusion-1.5</category><category>victor-taelin</category><category>futuristfrog</category><category>quantization</category><category>model-optimization</category><category>streaming</category><category>prompt-engineering</category><category>self-prompting</category><category>image-composition</category><category>character-lora-training</category><category>model-size</category><category>open-source-licenses</category><category>memes</category><category>humor</category></item><item><title>Mixture of Depths: Dynamically allocating compute in transformer-based language models</title><link>https://news.smol.ai/issues/24-04-05-ainews-mixture-of-depths-dynamically-allocating-compute-in-transformer-based-language-models/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-05-ainews-mixture-of-depths-dynamically-allocating-compute-in-transformer-based-language-models/</guid><description>**DeepMind** 
introduces the Mixture-of-Depths (MoD) technique, dynamically allocating FLOPs across transformer layers to optimize compute usage, achieving over **50% faster** forward passes without training impact. MoD selectively processes tokens using top-k routing, improving efficiency and potentially enabling faster ultra-long context handling. The method can combine with Mixture-of-Experts (MoE) for decoupled routing of queries, keys, and values. Reddit discussions highlight concerns about **LLM hype** overshadowing other AI tech, improvements in transformer efficiency, a new Think-and-Execute framework boosting algorithmic reasoning by **10-20%**, and Visual Autoregressive modeling (VAR) surpassing diffusion models in image quality and speed. On-device model Octopus v2 outperforms GPT-4 in function calling accuracy and latency.</description><pubDate>Fri, 05 Apr 2024 22:44:29 GMT</pubDate><category>deepmind</category><category>octopus-v2</category><category>piotrpadlewski</category><category>transformer-efficiency</category><category>dynamic-compute-allocation</category><category>mixture-of-experts</category><category>mixture-of-depths</category><category>top-k-routing</category><category>algorithmic-reasoning</category><category>visual-autoregressive-modeling</category><category>on-device-models</category><category>function-calling</category><category>scaling-laws</category></item><item><title>Cohere Command R+, Anthropic Claude Tool Use, OpenAI Finetuning</title><link>https://news.smol.ai/issues/24-04-04-ainews-cohere-command-r-anthropic-claude-tool-use-openai-finetuning/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-04-ainews-cohere-command-r-anthropic-claude-tool-use-openai-finetuning/</guid><description>**Cohere** launched **Command R+**, a **104B dense model** with **128k context length** focusing on **RAG**, **tool-use**, and **multilingual** capabilities across **10 key languages**. 
It supports **Multi-Step Tool use** and offers open weights for research. **Anthropic** introduced **tool use in beta** for **Claude**, supporting over **250 tools** with new cookbooks for practical applications. **OpenAI** enhanced its fine-tuning API with new upgrades and case studies from Indeed, SK Telecom, and Harvey, promoting DIY fine-tuning and custom model training. **Microsoft** achieved a quantum computing breakthrough with an **800x error rate improvement** and the most usable qubits to date. **Stability AI** released **Stable Audio 2.0**, improving audio generation quality and control. The **Opera browser** added local inference support for large language models like **Meta&apos;s Llama**, **Google&apos;s Gemma**, and **Vicuna**. Discussions on Reddit highlighted **Gemini&apos;s large context window**, analysis of **GPT-3.5-Turbo** model size, and a battle simulation between **Claude 3** and **ChatGPT** using local 7B models like **Mistral** and **Gemma**.</description><pubDate>Thu, 04 Apr 2024 22:21:15 GMT</pubDate><category>cohere</category><category>anthropic</category><category>openai</category><category>microsoft</category><category>stability-ai</category><category>opera-software</category><category>meta-ai-fair</category><category>google-deepmind</category><category>mistral-ai</category><category>c4ai-command-r-plus</category><category>claude-3</category><category>gpt-3.5-turbo</category><category>gemini</category><category>mistral-7b</category><category>gemma-2</category><category>claude-3-5</category><category>llama-3</category><category>vicuna</category><category>tool-use</category><category>multilingual-models</category><category>rag</category><category>fine-tuning</category><category>quantum-computing</category><category>audio-generation</category><category>local-inference</category><category>context-windows</category><category>model-size-analysis</category><category>model-comparison</category></item><item><title>ReALM: Reference Resolution 
As Language Modeling</title><link>https://news.smol.ai/issues/24-04-03-ainews-realm-reference-resolution-as-language-modeling/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-03-ainews-realm-reference-resolution-as-language-modeling/</guid><description>**Apple** is advancing in AI with a new approach called **ReALM: Reference Resolution As Language Modeling**, which improves understanding of ambiguous references using three contexts and finetunes a smaller **FLAN-T5** model that outperforms **GPT-4** on this task. In Reddit AI news, an open-source coding agent **SWE-agent** achieves **12.29%** on the SWE-bench benchmark, and **RAGFlow** introduces a customizable retrieval-augmented generation engine. A new quantization method, **QuaRot**, enables efficient 4-bit inference. AI applications include a t-shirt design generator, **podgenai** for GPT-4 based podcast generation, and an open-source model from **HuggingFace** that runs without a GPU. Industry discussions focus on the impact of large language models on the AI field and efforts to decentralize AI development. 
**Takuto Takizawa** joins **Stability AI Japan** as Head of Sales &amp; Partnerships.</description><pubDate>Thu, 04 Apr 2024 00:00:20 GMT</pubDate><category>apple</category><category>openai</category><category>hugging-face</category><category>stability-ai</category><category>flan-t5</category><category>gpt-4</category><category>takuto-takizawa</category><category>reference-resolution</category><category>finetuning</category><category>quantization</category><category>retrieval-augmented-generation</category><category>open-source</category><category>coding-agents</category><category>podcast-generation</category><category>image-generation</category><category>ai-industry-trends</category></item><item><title>Not much happened today</title><link>https://news.smol.ai/issues/24-04-02-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-02-ainews-not-much-happened-today/</guid><description>**RAGFlow** open sourced, a deep document understanding RAG engine with **16.3k context length** and natural language instruction support. **Jamba v0.1**, a **52B parameter** MoE model by Lightblue, released but with mixed user feedback. **Command-R** from **Cohere** available on Ollama library. Analysis of **GPT-3.5-Turbo** architecture reveals about **7 billion parameters** and embedding size of **4096**, comparable to OpenChat-3.5-0106 and Mixtral-8x7B. AI chatbots, including **GPT-4**, outperform humans in debates on persuasion. **Mistral-7B** made amusing mistakes on a math riddle. Hardware highlights include a discounted **HGX H100 640GB** machine with 8 H100 GPUs bought for $58k, and CPU comparisons between **Epyc 9374F** and **Threadripper 1950X** for LLM inference. GPU recommendations for local LLMs focus on VRAM and inference speed, with users testing **4090 GPU** and **Midnight-miqu-70b-v1.0.q5_k_s** model. 
Stable Diffusion influences gaming habits and AI art evaluation shows bias favoring human-labeled art.</description><pubDate>Tue, 02 Apr 2024 21:04:12 GMT</pubDate><category>cohere</category><category>lightblue</category><category>openai</category><category>mistral-ai</category><category>nvidia</category><category>amd</category><category>hugging-face</category><category>ollama</category><category>jamba-v0.1</category><category>command-r</category><category>gpt-3.5-turbo</category><category>openchat-3.5-0106</category><category>mixtral-8x7b</category><category>mistral-7b</category><category>midnight-miqu-70b-v1.0.q5_k_s</category><category>rag</category><category>mixture-of-experts</category><category>model-architecture</category><category>model-analysis</category><category>debate-persuasion</category><category>hardware-performance</category><category>gpu-inference</category><category>cpu-comparison</category><category>local-llm</category><category>stable-diffusion</category><category>ai-art-bias</category></item><item><title>AdamW -&gt; AaronD?</title><link>https://news.smol.ai/issues/24-04-01-ainews-adamw-greater-aarond/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-04-01-ainews-adamw-greater-aarond/</guid><description>**Aaron Defazio** is gaining attention for proposing a potential tuning-free replacement of the long-standing **Adam optimizer**, showing promising experimental results across classic machine learning benchmarks like ImageNet ResNet-50 and CIFAR-10/100. On Reddit, **Claude 3 Opus** has surpassed all **OpenAI** models on the LMSys leaderboard, while a user pretrained a **LLaMA-based 300M** model outperforming **bert-large** on language modeling tasks with a modest budget. The new **MambaMixer** architecture demonstrates promising results in vision and time series forecasting. In image generation, **Stable Diffusion 1.5** with LoRAs achieves realistic outputs, and the **WDXL** release showcases impressive capabilities. 
AI applications include an AI-generated Nike spec ad and a chatbot built with OpenAI models that may resist prompt injections. OpenAI is reportedly planning a ban wave targeting policy violators and jailbreak users. *&quot;The high alpha seems to come from Aaron Defazio,&quot;* highlighting his impactful work in optimizer research.</description><pubDate>Mon, 01 Apr 2024 19:58:53 GMT</pubDate><category>openai</category><category>hugging-face</category><category>claude-3-opus</category><category>llama-3</category><category>llama-3-300m</category><category>bert-large</category><category>stable-diffusion-1.5</category><category>wdxl</category><category>aaron-defazio</category><category>optimizer</category><category>machine-learning-benchmarks</category><category>vision</category><category>time-series-forecasting</category><category>image-generation</category><category>prompt-injection</category><category>policy-enforcement</category></item><item><title>Evals-based AI Engineering</title><link>https://news.smol.ai/issues/24-03-29-ainews-evals-based-ai-engineering/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-29-ainews-evals-based-ai-engineering/</guid><description>**Hamel Husain** emphasizes the importance of comprehensive evals in AI product development, highlighting evaluation, debugging, and behavior change as key iterative steps. **OpenAI** released a voice engine demo showcasing advanced voice cloning from small samples, raising safety concerns. Reddit discussions introduced new models like **Jamba** (hybrid Transformer-SSM with MoE), **Bamboo** (7B LLM with high sparsity based on Mistral), **Qwen1.5-MoE** (efficient parameter activation), and **Grok 1.5** (128k context length, surpassing GPT-4 in code generation). 
Advances in quantization include **1-bit Llama2-7B** models outperforming full precision and the **QLLM** quantization toolbox supporting GPTQ/AWQ/HQQ methods.</description><pubDate>Fri, 29 Mar 2024 22:20:49 GMT</pubDate><category>openai</category><category>mistral-ai</category><category>x-ai</category><category>llamaindex</category><category>jamba</category><category>bamboo</category><category>qwen-1.5-moe</category><category>grok-1.5</category><category>llama2-7b</category><category>hamel-husain</category><category>alec-radford</category><category>evaluation</category><category>fine-tuning</category><category>prompt-engineering</category><category>voice-cloning</category><category>quantization</category><category>model-optimization</category><category>code-generation</category><category>context-windows</category></item><item><title>Jamba: Mixture of Architectures dethrones Mixtral</title><link>https://news.smol.ai/issues/24-03-28-ainews-jamba-mixture-of-architectures-dethrones-mixtral/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-28-ainews-jamba-mixture-of-architectures-dethrones-mixtral/</guid><description>**AI21 labs** released **Jamba**, a **52B parameter MoE model** with **256K context length** and open weights under Apache 2.0 license, optimized for single A100 GPU performance. It features a unique blocks-and-layers architecture combining Transformer, Mamba (SSM), and MoE layers, competing with models like **Mixtral**. Meanwhile, **Databricks** introduced **DBRX**, a **36B active parameter MoE model** trained on **12T tokens**, noted as a new standard for open LLMs. In image generation, advancements include **Animatediff** for video-quality image generation and **FastSD CPU v1.0.0 beta 28** enabling ultra-fast image generation on CPUs. 
Other innovations involve style-content separation using **B-LoRA** and improvements in high-resolution image upscaling with **SUPIR**.</description><pubDate>Thu, 28 Mar 2024 23:43:23 GMT</pubDate><category>ai21-labs</category><category>databricks</category><category>together-ai</category><category>hugging-face</category><category>midjourney</category><category>jamba</category><category>dbrx</category><category>mixtral</category><category>animatediff</category><category>fastsd</category><category>sdxs512-0.9</category><category>b-lora</category><category>supir</category><category>mixture-of-experts</category><category>model-architecture</category><category>context-windows</category><category>model-optimization</category><category>fine-tuning</category><category>image-generation</category><category>video-generation</category><category>cpu-optimization</category><category>style-content-separation</category><category>high-resolution-upscaling</category></item><item><title>DBRX: Best open model (just not most efficient)</title><link>https://news.smol.ai/issues/24-03-27-ainews-dbrx-best-open-model-just-not-most-efficient/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-27-ainews-dbrx-best-open-model-just-not-most-efficient/</guid><description>**Databricks Mosaic** has released a new open-source model called **DBRX** that outperforms **Grok**, **Mixtral**, and **Llama2** on evaluations while being about **2x more efficient** than Llama2 and Grok. The model was trained on **12 trillion tokens** using **3,000 H100 GPUs** over 2 months, with an estimated compute cost of **$10 million**. It uses OpenAI&apos;s **100k tiktoken tokenizer** and shows strong zero-shot code generation performance, even beating **GPT-4** on the Humaneval benchmark. DBRX also upstreamed work to **MegaBlocks** open source. Despite its scale and efficiency, DBRX&apos;s performance on MMLU is only slightly better than Mixtral, raising questions about its scaling efficiency. 
The focus of DBRX is on enabling users to train models efficiently, with MoE training being about **2x more FLOP-efficient** than dense models, achieving similar quality with nearly **4x less compute** than previous MPT models. This release is part of the ongoing competition for open-source AI leadership, including models like **Dolly**, **MPT**, and **Mistral**. *&quot;If it activates 36B params, the model&apos;s perf should be equivalent to a 72B dense model or even 80B,&quot;* says Qwen&apos;s tech lead.</description><pubDate>Wed, 27 Mar 2024 22:33:19 GMT</pubDate><category>databricks</category><category>hugging-face</category><category>mistral-ai</category><category>mosaicml</category><category>openai</category><category>dbrx</category><category>grok</category><category>mixtral</category><category>llama-2</category><category>mpt-7b</category><category>gpt-4</category><category>mixture-of-experts</category><category>model-efficiency</category><category>tokenization</category><category>model-training</category><category>code-generation</category><category>model-architecture</category><category>open-source-models</category><category>benchmarking</category><category>fine-tuning</category></item><item><title>Claude 3 is officially America&apos;s Next Top Model</title><link>https://news.smol.ai/issues/24-03-26-ainews-claude-3-is-officially-americas-next-top-model/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-26-ainews-claude-3-is-officially-americas-next-top-model/</guid><description>**Claude 3 Opus** outperforms **GPT4T** and **Mistral Large** in blind Elo rankings, with **Claude 3 Haiku** marking a new cost-performance frontier. Fine-tuning techniques like **QLoRA** on **Mistral 7B** and evolutionary model merging on HuggingFace models are highlighted. Public opinion shows strong opposition to ASI development. Research supervision opportunities in AI alignment are announced. 
The **Stable Diffusion 3 (SD3)** release raises workflow concerns for tools like **ComfyUI** and **automatic1111**. **Opus** shows a 5% performance dip on **OpenRouter** compared to the **Anthropic API**. A new benchmark stresses LLM recall at long contexts, with **Mistral 7B** struggling and **Qwen 72b** performing well.</description><pubDate>Wed, 27 Mar 2024 00:11:55 GMT</pubDate><category>anthropic</category><category>mistral-ai</category><category>huggingface</category><category>openrouter</category><category>stable-diffusion</category><category>automatic1111</category><category>comfyui</category><category>claude-3-opus</category><category>claude-3-sonnet</category><category>claude-3-haiku</category><category>gpt-4o-mini</category><category>mistral-7b</category><category>qwen-72b</category><category>mark_riedl</category><category>ethanjperez</category><category>stuhlmueller</category><category>ylecun</category><category>aravsrinivas</category><category>fine-tuning</category><category>model-merging</category><category>alignment</category><category>ai-ethics</category><category>benchmarking</category><category>model-performance</category><category>long-context</category><category>cost-efficiency</category><category>model-evaluation</category></item><item><title>Andrew likes Agents</title><link>https://news.smol.ai/issues/24-03-25-ainews-andrew-likes-agents/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-25-ainews-andrew-likes-agents/</guid><description>**Andrew Ng&apos;s The Batch writeup on Agents** highlighted the significant improvement in coding benchmark performance when using an iterative agent workflow, with **GPT-3.5** wrapped in an agent loop achieving up to **95.1%** correctness on HumanEval, surpassing **GPT-4** zero-shot at **67.0%**. 
The report also covers new developments in **Stable Diffusion** models like **Cyberrealistic_v40**, **Platypus XL**, and **SDXL Lightning** for Naruto-style image generation, alongside innovations in LoRA and upscaling techniques. Discussions on **local LLM deployment** and optimization focus on hardware setups and finetuning strategies for efficient inference and multi-user serving. Emad&apos;s departure from **Stability AI** and new **Sora** videos from **OpenAI** were also noted.</description><pubDate>Tue, 26 Mar 2024 01:11:50 GMT</pubDate><category>openai</category><category>stability-ai</category><category>gpt-3.5</category><category>gpt-4</category><category>cyberrealistic_v40</category><category>platypus-xl</category><category>sdxl-lightning</category><category>andrew-ng</category><category>lilian-weng</category><category>emad</category><category>agents</category><category>human-eval-benchmark</category><category>fine-tuning</category><category>local-llm-deployment</category><category>inference-speed</category><category>image-generation</category><category>lora</category><category>upscaling</category><category>workflow-optimization</category></item><item><title>Astro Nano</title><link>https://news.smol.ai/projects/project-2/</link><guid isPermaLink="true">https://news.smol.ai/projects/project-2/</guid><description>Minimal portfolio and blog built with Astro and no frameworks.</description><pubDate>Tue, 26 Mar 2024 00:00:00 GMT</pubDate></item><item><title>not much happened today</title><link>https://news.smol.ai/issues/24-03-22-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-22-ainews-not-much-happened-today/</guid><description>The Reddit community /r/LocalLlama discusses **fine-tuning and training LLMs**, including tutorials and questions on training models with specific data like dictionaries and synthetic datasets with **25B+ tokens**. 
Users explore **retrieval-augmented generation (RAG)** challenges with models like **mistral-7b** and embedding generation for EEG brain activity. Discussions include **hardware optimization** for running **llama-2-70b** locally under budget constraints, and performance benchmarks for **qwen-1.5** models. There is interest in extending LLM capabilities, such as converting **llama-2-7b** into a vision-capable model like **llava** and improving model memory for longer context retention.</description><pubDate>Fri, 22 Mar 2024 23:55:31 GMT</pubDate><category>microsoft</category><category>mistral-ai</category><category>ollama</category><category>llama-2-70b</category><category>llama-2-7b</category><category>mistral-7b</category><category>qwen-1.5</category><category>llava</category><category>fine-tuning</category><category>synthetic-data</category><category>retrieval-augmented-generation</category><category>embeddings</category><category>hardware-optimization</category><category>performance-benchmarks</category><category>model-memory</category><category>multimodality</category></item><item><title>Welcome /r/LocalLlama!</title><link>https://news.smol.ai/issues/24-03-21-ainews-welcome-rlocalllama/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-21-ainews-welcome-rlocalllama/</guid><description>**Sakana** released a paper on evolutionary model merging. **OpenInterpreter** launched their **O1 devkit**. Discussions highlight **Claude Haiku**&apos;s underrated performance with 10-shot examples. On **Reddit&apos;s IPO**, AINews introduces Reddit summaries starting with /r/LocalLlama, covering upcoming subreddits like r/machinelearning and r/openai. **Aether Research** released **Cerebrum 8x7b** based on **Mixtral**, matching **GPT-3.5 Turbo** and **Gemini Pro** on reasoning tasks, setting a new open-source reasoning SOTA. **Moistral 11B v1** finetuned model from Cream-Phi-2 creators was released. A creative writing benchmark uses **Claude Opus** as judge. 
Hobbyists explore **1.58 BitNet** ternary quantization and **1-bit LLM** training. Nvidia&apos;s **Blackwell (B200)** chip supports **FP4 precision** quantization. **LMDeploy v0.2.6+** enables efficient vision-language model deployment with models like **Qwen-VL-Chat**. Users seek GUIs for LLM APIs with plugin and RAG support. Pipelines for synthetic training data generation and fine-tuning language models for chat are discussed.</description><pubDate>Thu, 21 Mar 2024 23:33:53 GMT</pubDate><category>sakana</category><category>openinterpreter</category><category>reddit</category><category>aether-research</category><category>mistral-ai</category><category>nvidia</category><category>lmdeploy</category><category>cerebrum-8x7b</category><category>mixtral-7b</category><category>gpt-3.5-turbo</category><category>gemini-pro</category><category>moistral-11b-v1</category><category>claude-opus</category><category>qwen-vl-chat</category><category>model-merging</category><category>benchmarking</category><category>quantization</category><category>performance-optimization</category><category>deployment</category><category>vision</category><category>fine-tuning</category><category>training-data</category><category>synthetic-data</category><category>rag</category><category>gui</category></item><item><title>Shipping and Dipping: Inflection + Stability edition</title><link>https://news.smol.ai/issues/24-03-20-ainews-shipping-and-dipping-inflection-stability-edition/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-20-ainews-shipping-and-dipping-inflection-stability-edition/</guid><description>**Inflection AI** and **Stability AI** recently shipped major updates (**Inflection AI 2.5** and **Stable Diffusion 3**) but are now experiencing significant executive departures, signaling potential consolidation in the GPU-rich startup space. **Mustafa Suleyman** has joined **Microsoft AI** as CEO, overseeing consumer AI products like Copilot, Bing, and Edge. 
**Microsoft Azure** is collaborating with **NVIDIA** on the Grace Blackwell 200 Superchip. **Google DeepMind** announced **TacticAI**, an AI assistant for football tactics developed with Liverpool FC, using geometric deep learning and achieving 90% expert approval in blind tests. **Anthropic** released **Claude 3 Haiku** and **Claude 3 Sonnet** on Google Cloud&apos;s Vertex AI, with **Claude 3 Opus** coming soon. Concerns about AI job displacement arise as **NVIDIA** introduces AI nurses that outperform humans at bedside manner at 90% lower cost.</description><pubDate>Thu, 21 Mar 2024 00:59:01 GMT</pubDate><category>inflection-ai</category><category>stability-ai</category><category>microsoft</category><category>nvidia</category><category>google-deepmind</category><category>anthropic</category><category>inflection-ai-2.5</category><category>stable-diffusion-3</category><category>claude-3-haiku</category><category>claude-3-sonnet</category><category>claude-3-opus</category><category>tacticai</category><category>mustafa-suleyman</category><category>executive-departures</category><category>gpu-acceleration</category><category>ai-assistants</category><category>geometric-deep-learning</category><category>ai-integration</category><category>ai-cost-reduction</category><category>ai-job-displacement</category><category>ai-healthcare</category><category>model-release</category></item><item><title>World_sim.exe</title><link>https://news.smol.ai/issues/24-03-19-ainews-worldsimexe/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-19-ainews-worldsimexe/</guid><description>**NVIDIA** announced **Project GR00T**, a foundation model for humanoid robot learning using multimodal instructions, built on their tech stack including Isaac Lab, OSMO, and Jetson Thor. They revealed the **DGX Grace-Blackwell GB200** with over **1 exaflop** compute, capable of training **GPT-4 1.8T parameters** in 90 days on 2000 Blackwells. 
Jensen Huang confirmed GPT-4 has **1.8 trillion parameters**. The new **GB200 GPU** supports float4/6 precision with ~3 bits per parameter and achieves **40,000 TFLOPs** on fp4 with 2x sparsity. 

Open-source highlights include the release of **Grok-1**, a **314B parameter** model, and **Stability AI&apos;s SV3D**, an open-source text-to-video generation solution. **Nous Research** collaborated on implementing Steering Vectors in llama.cpp. 

In Retrieval Augmented Generation (RAG), a new **5.5-hour tutorial** builds a pipeline using open-source HF models, and **LangChain** released a video on query routing and announced integration with **NVIDIA NIM** for GPU-optimized LLM inference. 

Prominent opinions include **Yann LeCun** distinguishing language from other cognitive abilities, **Sam Altman** predicting AGI arrival in 6 years with a leap from GPT-4 to GPT-5 comparable to GPT-3 to GPT-4, and discussions on the philosophical status of LLMs like Claude. There is also advice against training models from scratch for most companies.</description><pubDate>Wed, 20 Mar 2024 00:46:48 GMT</pubDate><category>nvidia</category><category>nous-research</category><category>stability-ai</category><category>hugging-face</category><category>langchain</category><category>anthropic</category><category>openai</category><category>gpt-4</category><category>gpt-4o</category><category>grok-1</category><category>llama-cpp</category><category>claude-3-opus</category><category>claude-3</category><category>gpt-5</category><category>jensen-huang</category><category>yann-lecun</category><category>sam-altman</category><category>multimodality</category><category>foundation-models</category><category>hardware-optimization</category><category>model-quantization</category><category>float4</category><category>float6</category><category>retrieval-augmented-generation</category><category>text-to-video</category><category>prompt-engineering</category><category>long-form-rag</category><category>gpu-optimization</category><category>philosophy-of-ai</category><category>agi-predictions</category></item><item><title>Grok-1 in Bio</title><link>https://news.smol.ai/issues/24-03-18-ainews-grok-1-in-bio/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-18-ainews-grok-1-in-bio/</guid><description>**Grok-1**, a **314B parameter Mixture-of-Experts (MoE) model** from **xAI**, has been released under an Apache 2.0 license, sparking discussions on its architecture, finetuning challenges, and performance compared to models like **Mixtral** and **Miqu 70B**. 
Despite its size, its **MMLU benchmark performance** is currently unimpressive, with expectations that **Grok-2** will be more competitive. The model&apos;s weights and code are publicly available, encouraging community experimentation. **Sam Altman** highlighted the growing importance of compute resources, while **Grok&apos;s** potential deployment on **Groq hardware** was noted as a possible game-changer. Meanwhile, **Anthropic&apos;s Claude** continues to attract attention for its &quot;spiritual&quot; interaction experience and consistent ethical framework. The release also inspired memes and humor within the AI community.</description><pubDate>Tue, 19 Mar 2024 00:07:45 GMT</pubDate><category>xai</category><category>mistral-ai</category><category>perplexity-ai</category><category>groq</category><category>anthropic</category><category>openai</category><category>grok-1</category><category>mixtral</category><category>miqu-70b</category><category>claude-3-opus</category><category>claude-3</category><category>claude-3-haiku</category><category>sam-altman</category><category>arthur-mensch</category><category>daniel-han</category><category>arav-srinivas</category><category>francis-yao</category><category>mixture-of-experts</category><category>model-release</category><category>model-performance</category><category>benchmarking</category><category>finetuning</category><category>compute</category><category>hardware-optimization</category><category>mmlu</category><category>model-architecture</category><category>open-source</category><category>memes</category></item><item><title>Astro Sphere</title><link>https://news.smol.ai/projects/project-1/</link><guid isPermaLink="true">https://news.smol.ai/projects/project-1/</guid><description>Portfolio and blog built with Astro.</description><pubDate>Mon, 18 Mar 2024 00:00:00 GMT</pubDate></item><item><title>MM1: Apple&apos;s first Large Multimodal 
Model</title><link>https://news.smol.ai/issues/24-03-15-ainews-mm1-apples-first-large-multimodal-model/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-15-ainews-mm1-apples-first-large-multimodal-model/</guid><description>**Apple** announced the **MM1** multimodal LLM family with up to **30B parameters**, claiming performance comparable to **Gemini-1** and beating larger older models on VQA benchmarks. The paper targets researchers and hints at applications in embodied agents and business/education. **Yann LeCun** emphasized that human-level AI requires understanding the physical world, memory, reasoning, and hierarchical planning, while **François Chollet** cautioned that NLP is far from solved despite LLM advances. **Cohere** released **Command-R**, a model for Retrieval Augmented Generation, and **Anthropic** highlighted the **Claude 3** family (Opus, Sonnet, Haiku) for various application needs. Open-source hardware **DexCap** enables dexterous robot manipulation data collection affordably. Tools like **CopilotKit** simplify AI integration into React apps, and migration to **Keras 3** with JAX backend offers faster training. New projects improve reranking for retrieval and add financial agents to **LangChain**. 
The content includes insights on AI progress, new models, open-source tools, and frameworks.</description><pubDate>Fri, 15 Mar 2024 23:34:51 GMT</pubDate><category>apple</category><category>cohere</category><category>anthropic</category><category>hugging-face</category><category>langchain</category><category>mm1</category><category>gemini-1</category><category>command-r</category><category>claude-3-opus</category><category>claude-3-sonnet</category><category>claude-3-haiku</category><category>claude-3</category><category>yann-lecun</category><category>francois-chollet</category><category>multimodality</category><category>vqa</category><category>fine-tuning</category><category>retrieval-augmented-generation</category><category>open-source</category><category>robotics</category><category>model-training</category><category>react</category><category>reranking</category><category>financial-agents</category></item><item><title>Not much happened piday</title><link>https://news.smol.ai/issues/24-03-14-ainews-not-much-happened-piday/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-14-ainews-not-much-happened-piday/</guid><description>**DeepMind** announces **SIMA**, a generalist AI agent capable of following natural language instructions across diverse 3D environments and video games, advancing embodied AI agents. **Anthropic** releases **Claude 3 Haiku**, their fastest and most affordable model, now available via API and Perplexity. New research explores language model scaling laws, over-training, and introduces **Branch-Train-MiX (BTX)** for efficient training of large language models using mixture-of-experts. Predictions suggest software engineering jobs will grow to **30-35 million** in five years, aided by AI coding assistants like **Cohere&apos;s Command-R** focusing on retrieval-augmented generation and tool use. The **EU AI Act** is approved, mandating transparency in training data for GPAI systems. 
Privacy-preserving in-context learning with differential privacy is highlighted as promising work. Memes humorously discuss AI software engineers and notable figures like **Andrej Karpathy**.</description><pubDate>Thu, 14 Mar 2024 23:53:52 GMT</pubDate><category>deepmind</category><category>anthropic</category><category>cohere</category><category>claude-3-haiku</category><category>demis-hassabis</category><category>fchollet</category><category>abacaj</category><category>andrej-karpathy</category><category>embodied-ai-agents</category><category>natural-language-instructions</category><category>language-model-scaling</category><category>mixture-of-experts</category><category>retrieval-augmented-generation</category><category>software-engineering</category><category>ai-regulation</category><category>differential-privacy</category><category>privacy-preserving-learning</category><category>humor</category></item><item><title>DeepMind SIMA: one AI, 9 games, 600 tasks, vision+language ONLY</title><link>https://news.smol.ai/issues/24-03-13-ainews-deepmind-sima-one-ai-9-games-600-tasks-visionlanguage-only/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-13-ainews-deepmind-sima-one-ai-9-games-600-tasks-visionlanguage-only/</guid><description>**DeepMind SIMA** is a generalist AI agent for 3D virtual environments evaluated on **600 tasks** across **9 games** using only screengrabs and natural language instructions, achieving **34%** success compared to humans&apos; **60%**. The model uses a multimodal Transformer architecture. **Andrej Karpathy** outlines AI autonomy progression in software engineering, while **Arav Srinivas** praises Cognition Labs&apos; AI agent demo. **François Chollet** expresses skepticism about automating software engineering fully. **Yann LeCun** suggests moving away from generative models and reinforcement learning towards human-level AI. 
Meta&apos;s **Llama-3** training infrastructure with **24k H100 Cluster Pods** is shared by **Soumith Chintala** and **Yann LeCun**. **Deepgram&apos;s Aura** offers low-latency speech APIs, and **Modal Labs&apos; Devin AI** demonstrates document navigation and interaction with ComfyUI. Memes and humor circulate in the AI community.</description><pubDate>Thu, 14 Mar 2024 01:07:46 GMT</pubDate><category>deepmind</category><category>cognition-labs</category><category>deepgram</category><category>modal-labs</category><category>meta-ai-fair</category><category>anthropic</category><category>llama-3</category><category>claude-3-opus</category><category>claude-3</category><category>gpt-3.5-turbo</category><category>andrej-karpathy</category><category>arav-srinivas</category><category>francois-chollet</category><category>yann-lecun</category><category>soumith-chintala</category><category>john-carmack</category><category>multimodality</category><category>transformer</category><category>software-engineering</category><category>ai-agents</category><category>ai-infrastructure</category><category>training</category><category>text-to-speech</category><category>speech-to-text</category><category>real-time-processing</category><category>model-architecture</category><category>benchmarking</category></item><item><title>The world&apos;s first fully autonomous AI Engineer</title><link>https://news.smol.ai/issues/24-03-12-ainews-the-worlds-first-fully-autonomous-ai-engineer/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-12-ainews-the-worlds-first-fully-autonomous-ai-engineer/</guid><description>**Cognition Labs&apos;s Devin** is highlighted as a potentially groundbreaking AI software engineer agent capable of learning unfamiliar technologies, addressing bugs, deploying frontend apps, and fine-tuning its own AI models. It integrates **OpenAI&apos;s GPT-4** with reinforcement learning and features tools like asynchronous chat, browser, shell access, and an IDE. 
The system claims advanced long-term reasoning and planning abilities, attracting praise from investors like **Patrick Collison** and **Fred Ehrsam**. The technology is noted for its potential as one of the most advanced AI agents, sparking excitement about agents and AGI.</description><pubDate>Tue, 12 Mar 2024 23:05:08 GMT</pubDate><category>cognition-labs</category><category>openai</category><category>gpt-4</category><category>devin</category><category>patrick-collison</category><category>fred-ehrsam</category><category>tim-dettmers</category><category>reinforcement-learning</category><category>fine-tuning</category><category>long-term-reasoning</category><category>planning</category><category>ai-agents</category><category>software-engineering</category><category>model-integration</category><category>asynchronous-chat</category><category>ide</category><category>agentic-ai</category></item><item><title>Fixing Gemma</title><link>https://news.smol.ai/issues/24-03-11-ainews-fixing-gemma/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-11-ainews-fixing-gemma/</guid><description>**Google&apos;s Gemma model** was found unstable for finetuning until **Daniel Han from Unsloth AI** fixed 8 bugs, improving its implementation. **Yann LeCun** explained technical details of a pseudo-random bit sequence for adaptive equalizers, while **François Chollet** discussed the low information bandwidth of the human visual system. **Arav Srinivas** reported that **Claude 3 Opus** showed no hallucinations in extensive testing, outperforming **GPT-4** and **Mistral-Large** in benchmarks. Reflections from **Yann LeCun** highlight ongoing AI progress toward human-level intelligence. 
The community is shifting pipelines to work better with Claude models, and emotional experiences in ML development were shared by **Aidan Clark**.</description><pubDate>Tue, 12 Mar 2024 00:03:26 GMT</pubDate><category>google</category><category>unsloth</category><category>anthropic</category><category>mistral-ai</category><category>gemma</category><category>claude-3-opus</category><category>claude-3</category><category>mistral-large</category><category>gpt-4</category><category>daniel-han</category><category>yann-lecun</category><category>francois-chollet</category><category>arav-srinivas</category><category>_aidan_clark_</category><category>finetuning</category><category>numerical-precision</category><category>benchmarking</category><category>structured-data-extraction</category><category>adaptive-equalizer</category><category>information-theory</category><category>hallucination-detection</category><category>model-stability</category></item><item><title>FSDP+QLoRA: the Answer to 70b-scale AI for desktop class GPUs</title><link>https://news.smol.ai/issues/24-03-08-ainews-fsdpqlora-the-answer-to-70b-scale-ai-for-desktop-class-gpus/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-08-ainews-fsdpqlora-the-answer-to-70b-scale-ai-for-desktop-class-gpus/</guid><description>**Jeremy Howard** and collaborators released a new tool combining **FSDP**, **QLoRA**, and **HQQ** to enable training **70b-parameter** models on affordable consumer GPUs like **RTX 4090s** with only **24GB RAM**, overcoming traditional memory constraints that required expensive data center GPUs costing over $150k. The approach shards quantized models across multiple GPUs and uses techniques like gradient checkpointing and CPU offloading to achieve efficient training on desktop-class hardware. The blogpost details challenges and solutions integrating these methods, highlighting a significant cost reduction from $150k to under $2.5k for training large language models. 
Additionally, Twitter recaps mention **Inflection AI**&apos;s **Inflection-2.5** model rivaling **GPT-4** in benchmarks with less compute, and **Grok** improving speed by 3x. **Yann LeCun** discusses multi-step reasoning training for LLMs.</description><pubDate>Fri, 08 Mar 2024 23:21:13 GMT</pubDate><category>answer.ai</category><category>hugging-face</category><category>meta-ai-fair</category><category>nvidia</category><category>inflectionai</category><category>qlora</category><category>fsdp</category><category>inflection-2.5</category><category>gpt-4</category><category>jeremy_howard</category><category>tim_dettmers</category><category>yann_lecun</category><category>model-training</category><category>quantization</category><category>memory-optimization</category><category>gradient-checkpointing</category><category>cpu-offloading</category><category>fine-tuning</category><category>model-sharding</category><category>reinforcement-learning</category><category>chain-of-thought</category><category>benchmarking</category></item><item><title>Inflection-2.5 at 94% of GPT4, and Pi at 6m MAU</title><link>https://news.smol.ai/issues/24-03-07-ainews-inflection-25-at-94percent-of-gpt4-and-pi-at-6m-mau/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-07-ainews-inflection-25-at-94percent-of-gpt4-and-pi-at-6m-mau/</guid><description>**Mustafa Suleyman** announced **Inflection 2.5**, which achieves *more than 94% the average performance of GPT-4 despite using only 40% the training FLOPs*. **Pi**&apos;s user base is growing about 10% weekly, with new features like realtime web search. The community noted similarities between Inflection 2.5 and **Claude 3 Sonnet**. **Claude 3 Opus** outperformed **GPT-4** in a 1.5:1 vote and is now the default for **Perplexity Pro** users. **Anthropic** added experimental tool calling support for Claude 3 via **LangChain**. 
**LlamaIndex** released LlamaParse JSON Mode for structured PDF parsing and added video retrieval via VideoDB, enabling retrieval-augmented generation (RAG) pipelines. A paper proposed knowledge-augmented planning for LLM agents. New benchmarks like TinyBenchmarks and the **Yi-9B** model release show strong code and math performance, surpassing **Mistral**.</description><pubDate>Fri, 08 Mar 2024 02:11:17 GMT</pubDate><category>inflection</category><category>anthropic</category><category>perplexity-ai</category><category>llamaindex</category><category>mistral-ai</category><category>langchain</category><category>inflection-2.5</category><category>claude-3-sonnet</category><category>claude-3-opus</category><category>gpt-4</category><category>yi-9b</category><category>mistral</category><category>mustafa-suleyman</category><category>amanda-askell</category><category>jeremyphoward</category><category>abacaj</category><category>omarsar0</category><category>retrieval-augmented-generation</category><category>benchmarking</category><category>ocr</category><category>structured-output</category><category>video-retrieval</category><category>knowledge-augmentation</category><category>planning</category><category>tool-use</category><category>evaluation</category><category>code-benchmarks</category><category>math-benchmarks</category></item><item><title>Not much happened today</title><link>https://news.smol.ai/issues/24-03-06-ainews-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-06-ainews-not-much-happened-today/</guid><description>**Anthropic** released **Claude 3**, replacing Claude 2.1 as the default on Perplexity AI, with **Claude 3 Opus** surpassing **GPT-4** in capability. Debate continues on whether Claude 3&apos;s performance stems from emergent properties or pattern matching. **LangChain** and **LlamaIndex** added support for Claude 3 enabling multimodal and tool-augmented applications. 
Despite progress, current models still face challenges in out-of-distribution reasoning and robustness. **Cohere** partnered with **Accenture** for enterprise AI search, while **Mistral AI** and **Snowflake** collaborate to provide LLMs on Snowflake&apos;s platform. **Together AI Research** integrates **Deepspeed** innovations to accelerate generative AI infrastructure. **Hugging Face** and the **European Space Agency** released a large earth observation dataset, and **Google** open sourced **Gemma 2B**, optimized for smartphones via the MLC-LLM project. **GPT4All** improved model discoverability for open models. The AI community balances excitement over new models with concerns about limitations and robustness, alongside growing enterprise adoption and open-source contributions. Memes and humor continue to provide social commentary.</description><pubDate>Thu, 07 Mar 2024 01:15:26 GMT</pubDate><category>anthropic</category><category>perplexity</category><category>langchain</category><category>llamaindex</category><category>cohere</category><category>accenture</category><category>mistral-ai</category><category>snowflake</category><category>together-ai</category><category>hugging-face</category><category>european-space-agency</category><category>google</category><category>gpt4all</category><category>claude-3</category><category>claude-3-opus</category><category>claude-3-sonnet</category><category>gpt-4</category><category>gemma-2b</category><category>multimodality</category><category>instruction-following</category><category>out-of-distribution-reasoning</category><category>robustness</category><category>enterprise-ai</category><category>cloud-infrastructure</category><category>open-datasets</category><category>model-deployment</category><category>model-discoverability</category><category>generative-ai</category><category>image-generation</category></item><item><title>Stable Diffusion 3 — Rombach &amp; Esser did it 
again!</title><link>https://news.smol.ai/issues/24-03-05-ainews-stable-diffusion-3-rombach-and-esser-did-it-again/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-05-ainews-stable-diffusion-3-rombach-and-esser-did-it-again/</guid><description>Over 2500 new community members joined following Soumith Chintala&apos;s shoutout, highlighting growing interest in SOTA LLM-based summarization. The major highlight is the detailed paper release of **Stable Diffusion 3 (SD3)**, showcasing advanced text-in-image control and complex prompt handling, with the model outperforming other SOTA image generation models in human-evaluated benchmarks. The SD3 model is based on an enhanced Diffusion Transformer architecture called **MMDiT**. Meanwhile, **Anthropic** released **Claude 3** models, noted for human-like responses and emotional depth, scoring 79.88% on HumanEval but costing over twice as much as GPT-4. Microsoft launched new Orca-based models and datasets, and Latitude released **DolphinCoder-StarCoder2-15b** with strong coding capabilities. Integration of image models by **Perplexity AI** and 3D CAD generation by **PolySpectra** powered by **LlamaIndex** were also highlighted. 
*&quot;SD3&apos;s win rate beats all other SOTA image gen models (except perhaps Ideogram)&quot;* and *&quot;Claude 3 models are very good at generating d3 visualizations from text descriptions.&quot;*</description><pubDate>Tue, 05 Mar 2024 22:30:03 GMT</pubDate><category>stability-ai</category><category>anthropic</category><category>microsoft</category><category>latitude</category><category>perplexity-ai</category><category>llamaindex</category><category>tripo-ai</category><category>stable-diffusion-3</category><category>claude-3</category><category>orca</category><category>dolphincoder-starcoder2-15b</category><category>soumith-chintala</category><category>bill-peebles</category><category>swyx</category><category>kevinafischer</category><category>jeremyphoward</category><category>akhaliq</category><category>karinanguyen_</category><category>aravsrinivas</category><category>diffusion-models</category><category>multimodality</category><category>benchmarking</category><category>human-evaluation</category><category>text-generation</category><category>image-generation</category><category>3d-modeling</category><category>fine-tuning</category><category>roleplay</category><category>coding</category><category>dataset-release</category></item><item><title>Claude 3 just destroyed GPT 4 (see for yourself)</title><link>https://news.smol.ai/issues/24-03-04-ainews-claude-3-just-destroyed-gpt-4-see-for-yourself/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-04-ainews-claude-3-just-destroyed-gpt-4-see-for-yourself/</guid><description>**Claude 3** from **Anthropic** launches in three sizes: Haiku (small, unreleased), Sonnet (medium, default on claude.ai, AWS, and GCP), and Opus (large, on Claude Pro). Opus outperforms **GPT-4** on key benchmarks like GPQA, impressing benchmark authors. All models support **multimodality** with advanced vision capabilities, including converting a 2-hour video into a blog post. 
Claude 3 offers improved alignment, fewer refusals, and extended context length up to **1 million tokens** with near-perfect recall. Haiku is noted for speed and cost-efficiency, processing dense research papers in under three seconds. The models excel at following complex instructions and producing structured outputs like JSON. Safety improvements reduce refusal rates, though some criticism remains from experts. Claude 3 is trained on synthetic data and shows strong domain-specific evaluation results in finance, medicine, and philosophy.</description><pubDate>Mon, 04 Mar 2024 23:59:02 GMT</pubDate><category>anthropic</category><category>amazon</category><category>google</category><category>claude-ai</category><category>claude-3</category><category>claude-3-opus</category><category>claude-3-sonnet</category><category>claude-3-haiku</category><category>gpt-4</category><category>mmitchell</category><category>connor-leahy</category><category>multimodality</category><category>vision</category><category>long-context</category><category>model-alignment</category><category>model-evaluation</category><category>synthetic-data</category><category>structured-output</category><category>instruction-following</category><category>model-speed</category><category>cost-efficiency</category><category>benchmarking</category><category>safety</category></item><item><title>The Era of 1-bit LLMs</title><link>https://news.smol.ai/issues/24-03-01-ainews-the-era-of-1-bit-llms/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-03-01-ainews-the-era-of-1-bit-llms/</guid><description>**The Era of 1-bit LLMs** research, including the **BitNet b1.58** model, introduces a ternary parameter approach that matches full-precision Transformer LLMs in performance while drastically reducing energy costs by **38x**. This innovation promises new scaling laws and hardware designs optimized for 1-bit LLMs. 
Discussions on AI Twitter highlight advances in **AGI societal impact**, **robotics with multimodal models**, **fine-tuning techniques like ResLoRA**, and **AI security efforts at Hugging Face**. Ethical considerations in generative AI and humor within the AI community are also prominent topics.</description><pubDate>Fri, 01 Mar 2024 22:33:03 GMT</pubDate><category>hugging-face</category><category>bitnet-b1.58</category><category>swyx</category><category>levelsio</category><category>gdb</category><category>npew</category><category>_akhaliq</category><category>osanseviero</category><category>mmitchell_ai</category><category>deliprao</category><category>nearcyan</category><category>clementdelangue</category><category>quantization</category><category>model-optimization</category><category>energy-efficiency</category><category>fine-tuning</category><category>robotics</category><category>multimodality</category><category>ai-security</category><category>ethics</category><category>humor</category></item><item><title>Dia de las Secuelas (StarCoder, The Stack, Dune, SemiAnalysis)</title><link>https://news.smol.ai/issues/24-02-29-ainews-dia-de-las-secuelas-starcoder-the-stack-dune-semianalysis/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-29-ainews-dia-de-las-secuelas-starcoder-the-stack-dune-semianalysis/</guid><description>**HuggingFace/BigCode** has released **StarCoder v2**, including the **StarCoder2-15B** model trained on over **600 programming languages** using the **The Stack v2** dataset. This release marks a state-of-the-art achievement for models of this size, with opt-out requests excluded from training data. A detailed technical report is available, highlighting the model&apos;s capabilities and training methodology. 
Additionally, a live event featuring **Dylan Patel** discussing GPU economics is announced for San Francisco.</description><pubDate>Fri, 01 Mar 2024 00:14:08 GMT</pubDate><category>hugging-face</category><category>bigcode</category><category>starcoder-2</category><category>starcoder2-15b</category><category>dylan-patel</category><category>code-generation</category><category>model-training</category><category>dataset-release</category><category>model-performance</category></item><item><title>... and welcome AI Twitter!</title><link>https://news.smol.ai/issues/24-02-28-ainews-and-welcome-ai-twitter/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-28-ainews-and-welcome-ai-twitter/</guid><description>The AI Twitter discourse from **2/27-28/2024** covers a broad spectrum including **ethical considerations** highlighted by **Margaret Mitchell** around **Google Gemini&apos;s** launch, and **John Carmack&apos;s** insights on evolving coding skills in the AI era. **Guillaume Lample** announced the release of the **Mistral Large** multilingual model. Discussions also touched on potential leadership changes at **Google** involving **Sundar Pichai**, and **OpenAI&apos;s** possible entry into the synthetic data market as noted by **Delip Rao**. Technological advancements include **Yann LeCun&apos;s** commentary on running LLMs on mobile devices and **Alex Wang&apos;s** praise for the **Apple Vision Pro**. Financial platform issues were raised by **Pieter Levels** regarding **Stripe&apos;s** payment policies. The cultural dynamics within big tech were discussed by **François Chollet** and **Dhéliat**. The lighter side of AI was represented by memes and humor from **Pieter Levels** and **AISafetyMemes**. 
This summary reflects the fast-evolving AI landscape blending technical innovation, corporate strategy, ethics, and community culture.</description><pubDate>Thu, 29 Feb 2024 00:50:17 GMT</pubDate><category>google</category><category>openai</category><category>apple</category><category>stripe</category><category>mistral-large</category><category>google-gemini</category><category>margaret-mitchell</category><category>john-carmack</category><category>guillaume-lample</category><category>sundar-pichai</category><category>delip-rao</category><category>santiago-l-valdarrama</category><category>alex-wang</category><category>yann-lecun</category><category>pieter-levels</category><category>francois-chollet</category><category>dheliat</category><category>ai-ethics</category><category>multilinguality</category><category>on-device-ai</category><category>convolutional-neural-networks</category><category>synthetic-data</category><category>financial-transaction-systems</category><category>corporate-culture</category><category>humor</category></item><item><title>Welcome Interconnects and OpenRouter</title><link>https://news.smol.ai/issues/24-02-27-ainews-welcome-interconnects-and-openrouter/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-27-ainews-welcome-interconnects-and-openrouter/</guid><description>**Discord communities** analyzed **22 guilds**, **349 channels**, and **12885 messages** revealing active discussions on **model comparisons and optimizations** involving **Mistral AI**, **Miqu**, and **GGUF quantized models**. Highlights include comparing **Mistral Large** with **GPT-4**, focusing on cost-effectiveness and performance, and exploring quantization techniques like **GPTQ** and **QLORA** to reduce VRAM usage. Advanced applications such as **role-playing**, **story-writing**, **code clarity**, and **AI-assisted decompilation** were emphasized, alongside development of tools like an **asynchronous summarization script** for **Mistral 7b**. 
The intersection of **quantum computing** and AI was discussed, including DARPA-funded projects and **encoder-based diffusion techniques** for image processing. Community efforts featured new Spanish LLM announcements, hardware experimentation, and open-source initiatives, with platforms like **Perplexity AI** and **LlamaIndex** noted for innovation and integration. Speculation about **Mistral AI**&apos;s open-source commitment and tools like **R2R** for rapid RAG deployment highlighted collaborative spirit.</description><pubDate>Tue, 27 Feb 2024 20:03:47 GMT</pubDate><category>mistral-ai</category><category>openai</category><category>perplexity-ai</category><category>llamaindex</category><category>qwen</category><category>langchain</category><category>mistral-large</category><category>miqu</category><category>mixtral</category><category>gpt-4</category><category>mistral-7b</category><category>nathan-lambert</category><category>alex-atallah</category><category>model-comparison</category><category>model-optimization</category><category>quantization</category><category>role-playing</category><category>story-writing</category><category>code-clarity</category><category>ai-assisted-decompilation</category><category>asynchronous-processing</category><category>quantum-computing</category><category>encoder-based-diffusion</category><category>open-source</category><category>hardware-experimentation</category><category>rag-systems</category></item><item><title>Mistral Large disappoints</title><link>https://news.smol.ai/issues/24-02-26-ainews-mistral-large-disappoints/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-26-ainews-mistral-large-disappoints/</guid><description>**Mistral** announced **Mistral Large**, a new language model achieving **81.2% accuracy on MMLU**, trailing **GPT-4 Turbo** by about 5 percentage points on benchmarks. 
The community reception has been mixed, with skepticism about open sourcing and claims that **Mistral Small** outperforms the open **Mixtral 8x7B**. Discussions in the **TheBloke** Discord highlighted performance and cost-efficiency comparisons between **Mistral Large** and **GPT-4 Turbo**, technical challenges with **DeepSpeed** and **DPOTrainer** for training, advances in AI deception for roleplay characters using **DreamGen Opus V1**, and complexities in model merging using linear interpolation and PEFT methods. Enthusiasm for AI-assisted decompilation was also expressed, emphasizing the use of open-source projects for training data.</description><pubDate>Mon, 26 Feb 2024 21:59:34 GMT</pubDate><category>mistral-ai</category><category>openai</category><category>hugging-face</category><category>mistral-large</category><category>mistral-small</category><category>mixtral-8x7b</category><category>gpt-4-turbo</category><category>dreamgen-opus-v1</category><category>timotheeee1</category><category>cogbuji</category><category>plasmator</category><category>jsarnecki</category><category>maldevide</category><category>spottyluck</category><category>mrjackspade</category><category>benchmarking</category><category>model-merging</category><category>fine-tuning</category><category>reinforcement-learning</category><category>model-training</category><category>tokenization</category><category>model-optimization</category><category>ai-assisted-decompilation</category><category>performance</category><category>cost-efficiency</category><category>deception</category><category>roleplay</category><category>deep-speed</category><category>dpo</category></item><item><title>One Year of Latent Space</title><link>https://news.smol.ai/issues/24-02-23-ainews-one-year-of-latent-space/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-23-ainews-one-year-of-latent-space/</guid><description>**Latent Space** podcast celebrated its first anniversary, reaching #1 in AI Engineering 
podcasts and 1 million unique readers on Substack. The **Gemini 1.5** image generator by **Google DeepMind** sparked controversy over bias and inaccurate representation, leading to community debates on AI ethics. Discussions in **TheBloke** and **LM Studio** Discords highlighted AI&apos;s growing role in creative industries, especially game development and text-to-3D tools. Fine-tuning and performance optimization of models like **Gemma 7B** and **Mistral-next** were explored in **Nous Research AI** and **Mistral** Discords, with shared solutions including learning rates and open-source tools. Emerging trends in AI hardware and application development were discussed in **CUDA MODE** and **LangChain AI** Discords, including critiques of **Nvidia&apos;s CUDA** by **Jim Keller** and advancements in reducing AI hallucinations hinted by **Richard Socher**.</description><pubDate>Sat, 24 Feb 2024 01:05:00 GMT</pubDate><category>google-deepmind</category><category>nous-research</category><category>mistral-ai</category><category>hugging-face</category><category>nvidia</category><category>langchain</category><category>jetbrains</category><category>gemini-1.5</category><category>gemma-7b</category><category>mistral-next</category><category>opus-v1</category><category>orca-2-13b</category><category>nous-hermes-2-dpo-7b</category><category>jim-keller</category><category>richard-socher</category><category>ai-ethics</category><category>bias-mitigation</category><category>fine-tuning</category><category>performance-optimization</category><category>model-merging</category><category>knowledge-transfer</category><category>text-to-3d</category><category>ai-hallucination</category><category>hardware-optimization</category><category>application-development</category><category>vulnerability-research</category></item><item><title>Ring Attention for &gt;1M Context</title><link>https://news.smol.ai/issues/24-02-22-ainews-ring-attention-for-greater1m-context/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-02-22-ainews-ring-attention-for-greater1m-context/</guid><description>**Google Gemini Pro** has sparked renewed interest in long context capabilities. The CUDA MODE Discord is actively working on implementing the **RingAttention** paper by Liu, Zaharia, and Abbeel, including extensions from the World Model RingAttention paper, with available PyTorch and CUDA implementations. TheBloke Discord discussed various topics including **LLM guessing game evaluation**, chatbot UX comparisons between **Nvidia&apos;s Chat with RTX** and **Polymind**, challenges in **retrieval-augmented generation (RAG)** integration, VRAM optimization, fine-tuning for character roleplay using **Direct Preference Optimization (DPO)**, and model choices like **deepseek-coder-6.7B-instruct**. There was also discussion on ML workflows on Mac Studio, with preferences for **llama.cpp** over **ollama**, and scaling inference cost-effectively using GPUs like the **4090** on Runpod. LM Studio users face manual update requirements for version **0.2.16**, which includes support for **Gemma models** and bug fixes, especially for macOS. 
The Gemma 7B model has had performance issues, while Gemma 2B received positive feedback.</description><pubDate>Fri, 23 Feb 2024 00:51:56 GMT</pubDate><category>google</category><category>cuda-mode</category><category>nvidia</category><category>polymind</category><category>deepseek</category><category>ollama</category><category>runpod</category><category>lmstudio</category><category>gemini-pro</category><category>gemma-7b</category><category>gemma-2b</category><category>deepseek-coder-6.7b-instruct</category><category>llama-cpp</category><category>liu</category><category>zaharia</category><category>abbeel</category><category>long-context</category><category>ringattention</category><category>pytorch</category><category>cuda</category><category>llm-guessing-game</category><category>chatbots</category><category>retrieval-augmented-generation</category><category>vram-optimization</category><category>fine-tuning</category><category>dynamic-prompt-optimization</category><category>ml-workflows</category><category>gpu-scaling</category><category>model-updates</category></item><item><title>Google AI: Win some (Gemma, 1.5 Pro), Lose some (Image gen)</title><link>https://news.smol.ai/issues/24-02-21-ainews-google-ai-win-some-gemma-15-pro-lose-some-image-gen/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-21-ainews-google-ai-win-some-gemma-15-pro-lose-some-image-gen/</guid><description>**Google&apos;s Gemma open models** (2-7B parameters) outperform **Llama 2** and **Mistral** in benchmarks but face criticism for an unusual license and poor image generation quality, which Google partially acknowledges. The upcoming **Gemini Pro 1.5** model features a 1 million token context window, excelling in video understanding and needle-in-haystack tasks. 
Discord communities like **TheBloke** and **LM Studio** discuss mixed reception of Gemma models, anticipation for **Llama 3** release, challenges in dataset editing, and hardware considerations such as **NVIDIA GeForce RTX 3090** and **RTX 4090** GPUs. LM Studio users report issues with version 0.2.15 Beta and ongoing integration of Gemma models, with resources shared on **Hugging Face**.</description><pubDate>Thu, 22 Feb 2024 02:21:19 GMT</pubDate><category>google</category><category>hugging-face</category><category>nvidia</category><category>gemma-2b</category><category>gemma-7b</category><category>gemma</category><category>gemini-pro-1.5</category><category>llama-2</category><category>llama-3</category><category>mistral</category><category>benchmarking</category><category>license-policies</category><category>image-generation</category><category>video-understanding</category><category>long-context</category><category>dataset-editing</category><category>model-integration</category><category>gpu-hardware</category><category>bug-fixes</category><category>quantization</category></item><item><title>Karpathy emerges from stealth?</title><link>https://news.smol.ai/issues/24-02-20-ainews-karpathy-emerges-from-stealth/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-20-ainews-karpathy-emerges-from-stealth/</guid><description>**Andrej Karpathy** released a comprehensive 2-hour tutorial on **tokenization**, detailing techniques up to **GPT-4**&apos;s tokenizer and noting the complexity of **Llama 2** tokenization with SentencePiece. Discussions in AI Discord communities covered **model optimization and efficiency**, focusing on **quantization** of models like **Mistral 7B** and **Zephyr-7B** to reduce memory usage for consumer GPUs, including Intel&apos;s new weight-only quantization algorithm. Efforts to improve computational efficiency included selective augmentation reducing costs by 57.76% and memory token usage versus kNN for Transformers. 
Challenges in hardware compatibility and software issues were shared, alongside fine-tuning techniques such as LoRA and model merging. Innovative applications of LLMs in retrieval-augmented generation (RAG), multi-model learning, and meta-reasoning were explored. The community emphasized dataset sharing, open-source releases like SDXL VAE encoded datasets and Audiogen AI codecs, and ethical AI use with censorship and guardrails. Collaboration and resource sharing remain strong in these AI communities.</description><pubDate>Wed, 21 Feb 2024 01:54:38 GMT</pubDate><category>intel</category><category>mistral-ai</category><category>audiogen</category><category>thebloke</category><category>mistral-7b</category><category>mixtral-8x7b</category><category>zephyr-7b</category><category>gpt-4</category><category>llama-2</category><category>andrej-karpathy</category><category>tokenization</category><category>quantization</category><category>model-optimization</category><category>fine-tuning</category><category>model-merging</category><category>computational-efficiency</category><category>memory-optimization</category><category>retrieval-augmented-generation</category><category>multi-model-learning</category><category>meta-reasoning</category><category>dataset-sharing</category><category>open-source</category><category>ethical-ai</category><category>community-collaboration</category></item><item><title>Companies liable for AI hallucination is Good Actually for AI Engineers</title><link>https://news.smol.ai/issues/24-02-19-ainews-companies-liable-for-ai-hallucination-is-good-actually-for-ai-engineers/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-19-ainews-companies-liable-for-ai-hallucination-is-good-actually-for-ai-engineers/</guid><description>**Air Canada** faced a legal ruling requiring it to honor refund policies communicated by its AI chatbot, setting a precedent for corporate liability in AI engineering accuracy. 
The tribunal ordered a refund of **$650.88 CAD** plus damages after the chatbot misled a customer about bereavement travel refunds. Meanwhile, AI community discussions highlighted innovations in **quantization techniques** for GPU inference, **Retrieval-Augmented Generation (RAG)** and fine-tuning of LLMs, and **CUDA** optimizations for PyTorch models. New prototype models like **Mistral-Next** and the **Large World Model (LWM)** were introduced, showcasing advances in handling large text contexts and video generation with models like **Sora**. Ethical and legal implications of AI autonomy were debated alongside challenges in dataset management. Community-driven projects such as the open-source TypeScript agent framework **bazed-af** emphasize collaborative AI development. Additionally, benchmarks like **BABILong** for up to **10M context evaluation** and tools from **karpathy** were noted.</description><pubDate>Tue, 20 Feb 2024 00:05:26 GMT</pubDate><category>air-canada</category><category>huggingface</category><category>mistral-ai</category><category>mistral-next</category><category>large-world-model</category><category>sora</category><category>babilong</category><category>andrej-karpathy</category><category>quantization</category><category>retrieval-augmented-generation</category><category>fine-tuning</category><category>cuda-optimization</category><category>video-generation</category><category>ai-ethics</category><category>dataset-management</category><category>open-source</category><category>community-driven-development</category></item><item><title>Sora pushes SOTA</title><link>https://news.smol.ai/issues/24-02-16-ainews-sora-pushes-sota/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-16-ainews-sora-pushes-sota/</guid><description>An analysis of **Discord communities** spanning over **20 guilds**, **312 channels**, and **10550 messages** reveals intense discussions on AI developments. 
Key highlights include the **Dungeon Master AI assistant** for Dungeons and Dragons using models like **H20 GPT**, GPU power supply debates involving **3090** and **3060 GPUs**, and excitement around **Google&apos;s Gemini 1.5** with its **1 million token context window** and **OpenAI&apos;s Sora** model. Challenges with **large world models (LWM)** multimodality, **GPT-assisted coding**, and **role-play model optimization** with **Yi models** and **Mixtral Instruct** were discussed. Technical issues like **model merging errors** with **MistralCasualML**, fine-tuning scripts like **AutoFineTune**, and cross-language engineering via **JSPyBridge** were also prominent. NVIDIA&apos;s **Chat with RTX** feature leveraging **retrieval-augmented generation (RAG)** on 30+ series GPUs was compared to LMStudio&apos;s support for **Mistral 7b** and **Llama 13b** models. The community is cautiously optimistic about these frontier models&apos; applications in media and coding.</description><pubDate>Fri, 16 Feb 2024 11:15:03 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>nvidia</category><category>mistral-ai</category><category>h2oai</category><category>gemini-1.5</category><category>sora</category><category>h20-gpt</category><category>mistral-7b</category><category>llama-13b</category><category>mistralcasualml</category><category>mixtral-instruct</category><category>yi-models</category><category>multimodality</category><category>gpu-power-management</category><category>long-context</category><category>model-merging</category><category>fine-tuning</category><category>retrieval-augmented-generation</category><category>role-play-model-optimization</category><category>cross-language-integration</category><category>training-loss</category><category>synthetic-data-generation</category><category>coding-support</category></item><item><title>AI gets Memory</title><link>https://news.smol.ai/issues/24-02-14-ainews-ai-gets-memory/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-02-14-ainews-ai-gets-memory/</guid><description>**AI Discords** analysis covered **20 guilds**, **312 channels**, and **6901 messages**. The report highlights the divergence of RAG style operations for context and memory, with implementations like **MemGPT** rolling out in **ChatGPT** and **LangChain**. The **TheBloke Discord** discussed **open-source large language models** such as the **Large World Model** with contexts up to **1 million tokens**, and the **Cohere aya model** supporting **101 languages**. Roleplay-focused models like **MiquMaid-v2-70B** were noted for performance improvements with enhanced hardware. Finetuning techniques like **Supervised Fine-Tuning (SFT)** and **Direct Preference Optimization (DPO)** were explained, with tools like **Unsloth AI&apos;s apply_chat_template** preferred over Alpaca. Integration of JavaScript and Python via **JSPyBridge** in the **SillyTavern** project was also discussed. Training challenges with **Mixtral 8x7b qlora** versus **Mistral 7b** were noted. The **LM Studio Discord** focused on hardware limitations affecting large model loading, medical LLMs like **medAlpaca**, and hardware discussions around GPU upgrades and overclocking. 
Users expressed anticipation for **IQ3_XXS** 1.5-bit quantization support in LM Studio.</description><pubDate>Thu, 15 Feb 2024 00:47:59 GMT</pubDate><category>openai</category><category>langchain</category><category>thebloke</category><category>cohere</category><category>unsloth-ai</category><category>mistral-ai</category><category>microsoft</category><category>miqumaid-v2-70b</category><category>mixtral-8x7b-qlora</category><category>mistral-7b</category><category>phi-2</category><category>medalpaca</category><category>aya</category><category>joanne-jang</category><category>rag</category><category>memory-modeling</category><category>context-windows</category><category>open-source</category><category>finetuning</category><category>sequential-fine-tuning</category><category>direct-preference-optimization</category><category>rlhf</category><category>ppo</category><category>javascript-python-integration</category><category>hardware-optimization</category><category>gpu-overclocking</category><category>quantization</category><category>model-training</category><category>large-context</category><category>multilinguality</category></item><item><title>The Dissection of Smaug (72B)</title><link>https://news.smol.ai/issues/24-02-12-ainews-the-dissection-of-smaug-72b/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-12-ainews-the-dissection-of-smaug-72b/</guid><description>**Abacus AI** launched **Smaug 72B**, a large finetune of **Qwen 1.0**, which remains unchallenged on the **Hugging Face Open LLM Leaderboard** despite skepticism from **Nous Research**. **LAION** introduced a local voice assistant model named **Bud-E** with a notable demo. 
The **TheBloke Discord** community discussed model performance trade-offs between large models like **GPT-4** and smaller quantized models, fine-tuning techniques using datasets like **WizardLM_evol_instruct_V2_196k** and **OpenHermes-2.5**, and challenges in web UI development and model merging involving **Mistral-7b** and **MiquMaid**. The **LM Studio Discord** highlighted issues with model conversion from PyTorch to gguf, hardware setups involving **Intel Xeon CPUs** and **Nvidia P40 GPUs**, privacy concerns, and limitations in image generation and web UI availability.</description><pubDate>Tue, 13 Feb 2024 01:40:29 GMT</pubDate><category>abacus-ai</category><category>hugging-face</category><category>nous-research</category><category>laion</category><category>thebloke</category><category>lm-studio</category><category>intel</category><category>nvidia</category><category>elevenlabs</category><category>smaug-72b</category><category>qwen-1.0</category><category>qwen-1.5</category><category>gpt-4</category><category>mistral-7b</category><category>miqumaid</category><category>wizardlm_evol_instruct_v2_196k</category><category>openhermes-2.5</category><category>bindureddy</category><category>fine-tuning</category><category>model-merging</category><category>quantization</category><category>web-ui</category><category>model-conversion</category><category>hardware-setup</category><category>privacy</category><category>image-generation</category><category>optical-character-recognition</category><category>prompt-engineering</category></item><item><title>Gemini Ultra is out, to mixed reviews</title><link>https://news.smol.ai/issues/24-02-08-ainews-gemini-ultra-is-out-to-mixed-reviews/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-08-ainews-gemini-ultra-is-out-to-mixed-reviews/</guid><description>**Google** released **Gemini Ultra** as a paid tier for &quot;Gemini Advanced with Ultra 1.0&quot; following the discontinuation of Bard. 
Reviews noted it is &quot;slightly faster/better than ChatGPT&quot; but with reasoning gaps. The **Steam Deck** was highlighted as a surprising AI workstation capable of running models like Solar 10.7B. Discussions in AI communities covered topics such as multi-GPU support for OSS Unsloth, training data contamination from OpenAI outputs, ethical concerns over model merging, and new alignment techniques like Listwise Preference Optimization (LiPO). The **Mojo** programming language was praised for high-performance computing. In research, the **Subformer** model uses sandwich-style parameter sharing and SAFE for efficiency, and **BiLLM** introduced 1-bit post-training quantization to reduce resource use. The **OpenHermes** dataset viewer tool was launched, and GPU scheduling with Slurm was discussed. Fine-tuning challenges for models like **OpenHermes-2.5-Mistral-7B** and VRAM requirements were also topics of interest.</description><pubDate>Fri, 09 Feb 2024 05:58:08 GMT</pubDate><category>google</category><category>openai</category><category>mistral-ai</category><category>hugging-face</category><category>gemini-ultra</category><category>gemini-advanced</category><category>solar-10.7b</category><category>openhermes-2.5-mistral-7b</category><category>subformer</category><category>billm</category><category>multi-gpu-support</category><category>training-data-contamination</category><category>model-merging</category><category>model-alignment</category><category>listwise-preference-optimization</category><category>high-performance-computing</category><category>parameter-sharing</category><category>post-training-quantization</category><category>dataset-viewer</category><category>gpu-scheduling</category><category>fine-tuning</category><category>vram-optimization</category></item><item><title>MetaVoice &amp; RIP Bard</title><link>https://news.smol.ai/issues/24-02-07-ainews-metavoice-and-rip-bard/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-02-07-ainews-metavoice-and-rip-bard/</guid><description>**Coqui**, a TTS startup that recently shut down, inspired a new **TTS model** supporting voice cloning and longform synthesis from a small startup called **MetaVoice**. **Google** discontinued the **Bard** brand in favor of **Gemini**. On **TheBloke Discord**, discussions focused on AI training with models like **Mixtral**, **Nous Mixtral DPO**, and **Miqu 70B**, comparing them to **OpenAI&apos;s GPT** models, and debated prompt engineering, lorebooks, and removing safety features via **LoRA fine-tuning** on models such as **Llama2 70B instruct**. Technical topics included transformer layer offloading limitations and adapting **LLaMa 2** for Apple Silicon. On **OpenAI Discord**, **DALL-E** images now include **C2PA metadata** for content authenticity, sparking debates on AI censorship, metadata manipulation, and open-source AI models versus commercial giants like **GPT-4**. Users discussed GPT-4 usability, limitations, and practical applications.</description><pubDate>Wed, 07 Feb 2024 22:41:50 GMT</pubDate><category>coqui</category><category>metavoice</category><category>google</category><category>openai</category><category>thebloke</category><category>mixtral</category><category>nous-mixtral-dpo</category><category>miqu-70b</category><category>gpt-4</category><category>llama-2-70b-instruct</category><category>llama-2</category><category>llama-2-70b</category><category>text-to-speech</category><category>voice-cloning</category><category>longform-synthesis</category><category>prompt-engineering</category><category>direct-preference-optimization</category><category>lora-fine-tuning</category><category>transformers</category><category>gpu-acceleration</category><category>apple-silicon</category><category>content-authenticity</category><category>metadata</category><category>ai-censorship</category><category>open-source-ai</category><category>model-comparison</category><category>usability</category><category>model-limitations</category></item><item><title>Qwen 1.5 Released</title><link>https://news.smol.ai/issues/24-02-06-ainews-qwen-15-released/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-06-ainews-qwen-15-released/</guid><description>**Chinese AI models Yi, Deepseek, and Qwen** are gaining attention for strong performance, with **Qwen 1.5** offering up to **32k token context** and compatibility with Hugging Face transformers and quantized models. The **TheBloke Discord** discussed topics like quantization of a **70B LLM**, the introduction of the **Sparse MoE model Sparsetral** based on **Mistral**, debates on merging vs fine-tuning, and Direct Preference Optimization (DPO) for character generation. The **Nous Research AI Discord** covered challenges in Japanese Kanji generation, AI scams on social media, and Meta&apos;s VR headset prototypes showcased at **SIGGRAPH 2023**. 
Discussions also included fine-tuning frozen networks and new models like **bagel-7b-v0.4**, **DeepSeek-Math-7b-instruct**, and **Sparsetral-16x7B-v2**.</description><pubDate>Tue, 06 Feb 2024 23:40:32 GMT</pubDate><category>deepseek</category><category>qwen</category><category>mistral-ai</category><category>hugging-face</category><category>meta-ai-fair</category><category>qwen-1.5</category><category>mistral-7b</category><category>sparsetral-16x7b-v2</category><category>bagel-7b-v0.4</category><category>deepseek-math-7b-instruct</category><category>quantization</category><category>token-context</category><category>multilinguality</category><category>retrieval-augmented-generation</category><category>agent-planning</category><category>code-generation</category><category>sparse-moe</category><category>model-merging</category><category>fine-tuning</category><category>direct-preference-optimization</category><category>character-generation</category><category>ascii-art</category><category>kanji-generation</category><category>vr</category><category>retinal-resolution</category><category>light-field-passthrough</category><category>frozen-networks</category><category>normalization-layers</category></item><item><title>Less Lazy AI</title><link>https://news.smol.ai/issues/24-02-05-ainews-less-lazy-ai/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-05-ainews-less-lazy-ai/</guid><description>The AI Discord summaries for early 2024 cover various community discussions and developments. Highlights include **20** guilds, **308** channels, and **10449** messages analyzed, saving an estimated **780 minutes** of reading time. Key topics include **Polymind Plugin Puzzle** integrating PubMed API, roleplay with **HamSter v0.2**, VRAM challenges in **Axolotl** training, fine-tuning tips for **FLAN-T5**, and innovative **model merging** strategies. 
The **Nous Research AI** community discussed GPT-4&apos;s lyricism issues, quantization techniques using `llama.cpp`, **frankenmerging** with models like **miqu-1-120b-GGUF**, anticipation for **Qwen2**, and tools like `text-generation-webui` and **ExLlamaV2**. The **LM Studio** community reported a bug where the app continues running after UI closure, with a workaround to forcibly terminate the process. These discussions reflect ongoing challenges and innovations in AI model training, deployment, and interaction.</description><pubDate>Tue, 06 Feb 2024 00:50:28 GMT</pubDate><category>openai</category><category>hugging-face</category><category>nous-research</category><category>h2oai</category><category>apple</category><category>hamster-v0.2</category><category>flan-t5</category><category>miqu-1-120b-gguf</category><category>qwen2</category><category>axolotl</category><category>philschmid</category><category>model-merging</category><category>fine-tuning</category><category>quantization</category><category>vram-optimization</category><category>plugin-development</category><category>chatbot-memory</category><category>model-training</category><category>bug-reporting</category><category>api-compatibility</category></item><item><title>The Core Skills of AI Engineering</title><link>https://news.smol.ai/issues/24-02-03-ainews-the-core-skills-of-ai-engineering/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-03-ainews-the-core-skills-of-ai-engineering/</guid><description>**AI Discords for 2/2/2024** analyzed **21 guilds**, **312 channels**, and **4782 messages** saving an estimated **382 minutes** of reading time. Discussions included **Eugene Yan** initiating a deep dive into **AI engineering** challenges, highlighting overlaps between software engineering and data science skills. 
The **TheBloke Discord** featured talks on **MiquMaid**, **OLMo** (an open-source 65B LLM by **AI2** under Apache 2.0), **Aphrodite** model batching, **AWQ** quantization, and **LoRA** fine-tuning techniques like **QLoRA** and **LoftQ**. The **LAION Discord** discussed **SSD-1B** distillation issues, data quality optimization with captioning datasets like **BLIP**, **COCO**, and **LLaVA**, and tokenization strategies for prompt adherence in image generation. Other topics included AI security with watermarking, superconductors and carbon nanotubes for hardware, and deployment of LLMs via **Hugging Face** tools.</description><pubDate>Sun, 04 Feb 2024 00:54:29 GMT</pubDate><category>ai2</category><category>hugging-face</category><category>miqumaid</category><category>olmo</category><category>aphrodite</category><category>awq</category><category>exl2</category><category>mistral-medium</category><category>internlm</category><category>ssd-1b</category><category>lora</category><category>qlora</category><category>loftq</category><category>eugene-yan</category><category>ai-engineering</category><category>quantization</category><category>fine-tuning</category><category>open-source</category><category>model-deployment</category><category>data-quality</category><category>tokenization</category><category>prompt-adherence</category><category>distillation</category><category>ai-security</category><category>batching</category><category>hardware</category><category>role-playing</category></item><item><title>AI2 releases OLMo - the 4th open-everything LLM</title><link>https://news.smol.ai/issues/24-02-02-ainews-ai2-releases-olmo-the-4th-open-everything-llm/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-02-ainews-ai2-releases-olmo-the-4th-open-everything-llm/</guid><description>**AI2** is gaining attention in 2024 with its new **OLMo** models, including 1B and 7B sizes and a 65B model forthcoming, emphasizing open and reproducible research akin to **Pythia**. 
The **Miqu-70B** model, especially the Mistral Medium variant, is praised for self-correction and speed optimizations. Discussions in **TheBloke** Discord covered programming language preferences, VRAM constraints for large models, and fine-tuning experiments with **Distilbert-base-uncased**. The **Mistral** Discord highlighted challenges in the **GPU shortage** affecting semiconductor production involving **TSMC**, **ASML**, and **Zeiss**, debates on open-source versus proprietary models, and fine-tuning techniques including **LoRA** for low-resource languages. Community insights also touched on embedding chunking strategies and JSON output improvements.</description><pubDate>Sat, 03 Feb 2024 03:35:10 GMT</pubDate><category>ai2</category><category>allenai</category><category>mistral-ai</category><category>tsmc</category><category>asml</category><category>zeiss</category><category>olmo-1b</category><category>olmo-7b</category><category>olmo-65b</category><category>miqu-70b</category><category>mistral-medium</category><category>distilbert-base-uncased</category><category>nathan-lambert</category><category>lhc1921</category><category>mrdragonfox</category><category>yashkhare_</category><category>gbourdin</category><category>fine-tuning</category><category>gpu-shortage</category><category>embedding-chunking</category><category>json-generation</category><category>model-optimization</category><category>reproducible-research</category><category>self-correction</category><category>vram-constraints</category><category>programming-languages</category></item><item><title>Trust in GPTs at all time low</title><link>https://news.smol.ai/issues/24-02-01-ainews-trust-in-gpts-at-all-time-low/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-02-01-ainews-trust-in-gpts-at-all-time-low/</guid><description>**Discord communities** were analyzed with **21 guilds**, **312 channels**, and **8530 messages** reviewed, saving an estimated **628 minutes** of reading time. 
Discussions highlighted challenges with **GPTs** and the **GPT store**, including critiques of the **knowledge files capability** and context management issues. The **CUDA MODE Discord** was introduced for CUDA coding support. Key conversations in the **TheBloke Discord** covered **Xeon** GPU server cost-effectiveness, **Llama3** and **Mistral Medium** model comparisons, **LLaVA-1.6**&apos;s visual reasoning and OCR capabilities, and the leaked **Miqu** 70B model. Technical topics included fine-tuning **TinyLlama** and **MiquMaid+Euryale** models, and model merging with examples like **Harmony-4x7B-bf16** and **Smaug-34B-v0.1**. The **Nous Research AI Discord** discussed style influence in LLMs, quantization issues, **Bittensor** incentives for AI model improvements, and the identification of **MIQU** as **Mistral Medium**. The release of the **Open Hermes 2.5 dataset** on **Hugging Face** was also announced. *&quot;Discussions pointed towards the need for better context management in GPTs, contrasting with OpenAI&apos;s no-code approach.&quot;*</description><pubDate>Fri, 02 Feb 2024 03:25:24 GMT</pubDate><category>openai</category><category>hugging-face</category><category>mistral-ai</category><category>nous-research</category><category>bittensor</category><category>llama-3</category><category>mistral-medium</category><category>llava-1.6</category><category>miquella-120b-gguf</category><category>tinymodels</category><category>miqumaid</category><category>harmony-4x7b-bf16</category><category>smaug-34b-v0.1</category><category>nick-dobos</category><category>manojbh</category><category>teknium</category><category>arthurmensch</category><category>context-management</category><category>fine-tuning</category><category>model-merging</category><category>quantization</category><category>gpu-servers</category><category>visual-reasoning</category><category>ocr</category><category>dataset-release</category><category>incentive-structures</category></item><item><title>Miqu 
confirmed to be an early Mistral-medium checkpoint</title><link>https://news.smol.ai/issues/24-01-31-ainews-miqu-confirmed-to-be-an-early-mistral-medium-checkpoint/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-31-ainews-miqu-confirmed-to-be-an-early-mistral-medium-checkpoint/</guid><description>**Miqu**, an open-access model, scores **74 on MMLU** and **84.5 on EQ-Bench**, sparking debates about its performance compared to **Mistral Medium**. The **CEO of Mistral** confirmed these results. Discussions in the **TheBloke Discord** highlight **Miqu&apos;s** superiority in instruction-following and cover sampling methods like dynatemp and min-p. Developers also explore browser preferences and Discord UI themes. Role-playing with models like **BagelMistery Tour v2** and **Psyfighter v2** is popular, alongside technical talks on **fp16 quantization** of **Miqu-1-70b**. Training and fine-tuning tips involving **Unsloth** and **Mistral 7B** are shared. In the **Nous Research AI Discord**, the **Activation Beacon** method is discussed for extending LLM context length from 4K to 400K tokens. **SQLCoder-70B**, fine-tuned on **CodeLlama-70B**, leads in text-to-SQL generation and is available on Hugging Face. 
The **Miqu model** also impresses with an **83.5 EQ-Bench score**, fueling speculation about its capabilities.</description><pubDate>Wed, 31 Jan 2024 23:15:13 GMT</pubDate><category>mistral-ai</category><category>hugging-face</category><category>nous-research</category><category>aiatmeta</category><category>miqu-1-70b</category><category>mistral-medium</category><category>llama-2-70b-chat</category><category>mixtral</category><category>sqlcoder-70b</category><category>codellama-70b</category><category>bagelmistery-tour-v2</category><category>psyfighter-v2</category><category>intrstllrninja</category><category>instruction-following</category><category>sampling-methods</category><category>fp16-quantization</category><category>fine-tuning</category><category>model-training</category><category>context-length</category><category>text-to-sql</category><category>model-performance</category><category>model-optimization</category></item><item><title>CodeLLama 70B beats GPT4 on HumanEval</title><link>https://news.smol.ai/issues/24-01-30-ainews-codellama-70b-beats-gpt4-on-humaneval/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-30-ainews-codellama-70b-beats-gpt4-on-humaneval/</guid><description>**Meta AI** surprised the community with the release of **CodeLlama**, an open-source model now available on platforms like **Ollama** and **MLX** for local use. The **Miqu model** sparked debate over its origins, possibly linked to **Mistral Medium** or a fine-tuned **Llama-2-70b**, alongside discussions on **AI ethics** and alignment risks. The **Aphrodite engine** showed strong performance on **A6000 GPUs** with specific configurations. Role-playing AI models such as **Mixtral** and **Flatdolphinmaid** faced challenges with repetitiveness, while **Noromaid** and **Rpcal** performed better, with **ChatML** and **DPO** recommended for improved responses. 
Learning resources like fast.ai&apos;s course were highlighted for ML/DL beginners, and fine-tuning with optimizers like *Paged 8-bit Lion* and *Adafactor* was discussed.

At **Nous Research AI**, the **Activation Beacon** project introduced a method for unlimited context length in LLMs using &quot;global state&quot; tokens, potentially transforming retrieval-augmented models. The **Eagle-7B** model, based on **RWKV-v5**, outperformed **Mistral** in benchmarks with efficiency and multilingual capabilities. **OpenHermes2.5** was recommended for consumer hardware due to its quantization methods. Multimodal and domain-specific models like **IMP v1-3b**, **Bakllava**, **Moondream**, and **Qwen-vl** were explored for classification and vision-language tasks. The community emphasized centralizing AI resources for collaborative research.</description><pubDate>Tue, 30 Jan 2024 21:10:01 GMT</pubDate><category>meta-ai-fair</category><category>ollama</category><category>nous-research</category><category>mistral-ai</category><category>hugging-face</category><category>codellama</category><category>miqu</category><category>mistral-medium</category><category>llama-2-70b</category><category>aphrodite-engine</category><category>mixtral</category><category>flatdolphinmaid</category><category>noromaid</category><category>rpcal</category><category>chatml</category><category>mistral-7b</category><category>activation-beacon</category><category>eagle-7b</category><category>rwkv-v5</category><category>openhermes2.5</category><category>nous-hermes-2-mixtral-8x7b-dpo</category><category>imp-v1-3b</category><category>bakllava</category><category>moondream</category><category>qwen-vl</category><category>ai-ethics</category><category>alignment</category><category>gpu-optimization</category><category>direct-prompt-optimization</category><category>fine-tuning</category><category>cuda-programming</category><category>optimizer-technology</category><category>quantization</category><category>multimodality</category><category>context-length</category><category>dense-retrieval</category><category>retrieval-augmented-generation</category><category>multilinguality</categor
y><category>model-performance</category><category>open-source</category><category>code-generation</category><category>classification</category><category>vision</category></item><item><title>RWKV &quot;Eagle&quot; v5: Your move, Mamba</title><link>https://news.smol.ai/issues/24-01-29-ainews-rwkv-eagle-v5-your-move-mamba/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-29-ainews-rwkv-eagle-v5-your-move-mamba/</guid><description>**RWKV v5 Eagle** was released with better-than-**mistral-7b** evaluation results, trading some English performance for multilingual capabilities. The mysterious **miqu-1-70b** model sparked debate about its origins, possibly a leak or distillation of **Mistral Medium** or a fine-tuned **Llama 2**. Discussions highlighted fine-tuning techniques, including the effectiveness of **1,000 high-quality prompts** over larger mixed-quality datasets, and tools like **Deepspeed**, **Axolotl**, and **QLoRA**. The **Nous Research AI** community emphasized the impact of **Rotary Position Embedding (RoPE) theta settings** on LLM extrapolation, improving models like **Mistral Instruct v0.2**. Speed improvements in **Mistral Tuna** kernels reduced token processing costs, enhancing efficiency. 
The launch of **Eagle 7B** with 7.52B parameters showcased strong multilingual performance, surpassing other 7B class models.</description><pubDate>Tue, 30 Jan 2024 01:20:56 GMT</pubDate><category>eleutherai</category><category>mistral-ai</category><category>hugging-face</category><category>llamaindex</category><category>nous-research</category><category>rwkv</category><category>lmsys</category><category>rwkv-v5</category><category>mistral-7b</category><category>miqu-1-70b</category><category>mistral-medium</category><category>llama-2</category><category>mistral-instruct-v0.2</category><category>mistral-tuna</category><category>llama-2-13b</category><category>kunoichi-dpo-v2-7b</category><category>gpt-4</category><category>andrej-karpathy</category><category>fine-tuning</category><category>multilinguality</category><category>rotary-position-embedding</category><category>model-optimization</category><category>model-performance</category><category>quantization</category><category>speed-optimization</category><category>prompt-engineering</category><category>model-benchmarking</category><category>reinforcement-learning</category></item><item><title>GPT4Turbo A/B Test: gpt-4-0125-preview</title><link>https://news.smol.ai/issues/24-01-26-ainews-gpt4turbo-ab-test-gpt-4-0125-preview/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-26-ainews-gpt4turbo-ab-test-gpt-4-0125-preview/</guid><description>**OpenAI** released a new **GPT-4 Turbo** version in January 2024, prompting natural experiments in summarization and discussions on API performance and cost trade-offs. The **TheBloke** Discord highlighted **UnSloth&apos;s** upcoming limited multi-GPU support for Google Colab beginners, AI models like **Tiny Llama** and **Mistral** running on Nintendo Switch, and advanced model merging techniques such as DARE and SLERP. 
The **OpenAI** Discord noted issues with **GPT-4-1106-preview** processing delays, troubleshooting GPT model errors, and transcription challenges with **GPT-3.5** and **GPT-4 Turbo**. **Nous Research AI** focused on extending context windows, notably **LLaMA-2-7B-Chat** reaching **16,384** tokens, and fine-tuning alternatives like **SelfExtend**. Discussions also touched on chatbot persona creation, model configuration optimizations, and societal impacts of AI technology.</description><pubDate>Fri, 26 Jan 2024 22:48:31 GMT</pubDate><category>openai</category><category>thebloke</category><category>nous-research</category><category>hugging-face</category><category>gpt-4-turbo</category><category>gpt-4-1106-preview</category><category>gpt-3.5</category><category>llama-2-7b-chat</category><category>tiny-llama</category><category>mistral</category><category>multi-gpu-support</category><category>model-optimization</category><category>model-merging</category><category>fine-tuning</category><category>context-windows</category><category>chatbot-personas</category><category>api-performance</category><category>text-transcription</category><category>cost-considerations</category><category>model-troubleshooting</category></item><item><title>GPT4Turbo A/B Test: gpt-4-1106-preview</title><link>https://news.smol.ai/issues/24-01-26-ainews-gpt4turbo-ab-test-gpt-4-1106-preview/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-26-ainews-gpt4turbo-ab-test-gpt-4-1106-preview/</guid><description>**OpenAI** released a new **GPT-4 Turbo** version, prompting a natural experiment in summarization comparing the November 2023 and January 2024 versions. The **TheBloke** Discord discussed troubleshooting model loading errors with **OpenHermes-2.5-Mistral-7B-4.0bpw** and **exllamav2**, debates on **RHEL** in ML, dataset generation for understanding GPT flaws, and running LLMs like **Llama** and **Mistral** on consoles. 
**LangChain** fine-tuning challenges for **Llama2** were also noted. The **OpenAI** Discord highlighted **GPT-4** speed inconsistencies, API vs web performance, prompt engineering with **GPT-3.5** and **GPT-4 Turbo**, and **DALL-E** typo issues in image text. Discussions included NLP tools like *semantic-text-splitter* and collaboration concerns with **GPT-4 Vision** on **Azure**. The **Nous Research AI** Discord focused on extending context windows with **Mistral instruct v0.2**, **MistralLite**, and **LLaMA-2-7B-Chat** achieving 16,384 token context, plus alternatives like **SelfExtend** for context extension without fine-tuning. The societal impact of AI technology was also considered.</description><pubDate>Fri, 26 Jan 2024 22:07:42 GMT</pubDate><category>openai</category><category>huggingface</category><category>thebloke</category><category>nous-research</category><category>mistral-ai</category><category>langchain</category><category>microsoft</category><category>azure</category><category>gpt-4-turbo</category><category>gpt-4</category><category>gpt-3.5</category><category>openhermes-2.5-mistral-7b-4.0bpw</category><category>exllamav2</category><category>llama-2-7b-chat</category><category>mistral-instruct-v0.2</category><category>mistrallite</category><category>llama2</category><category>model-loading</category><category>rhel</category><category>dataset-generation</category><category>llm-on-consoles</category><category>fine-tuning</category><category>speed-optimization</category><category>api-performance</category><category>prompt-engineering</category><category>token-limits</category><category>memory-constraints</category><category>text-generation</category><category>nlp-tools</category><category>context-window-extension</category><category>sliding-windows</category><category>rope-theta</category><category>non-finetuning-context-extension</category><category>societal-impact</category></item><item><title>Adept Fuyu-Heavy: Multimodal model for 
Agents</title><link>https://news.smol.ai/issues/24-01-25-ainews-adept-fuyu-heavy-multimodal-model-for-agents/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-25-ainews-adept-fuyu-heavy-multimodal-model-for-agents/</guid><description>**Adept** launched **Fuyu-Heavy**, a multimodal model focused on UI understanding and visual QA, outperforming **Gemini Pro** on the MMMU benchmark. The model uses **DPO** (Direct Preference Optimization), gaining attention as a leading tuning method. The size of Fuyu-Heavy is undisclosed but estimated between **20B-170B** parameters, smaller than rumored frontier models like **Claude 2**, **GPT4V**, and **Gemini Ultra**. Meanwhile, **Mamba** was rejected at ICLR for quality concerns. In Discord discussions, **DeepSeek Coder 33B** was claimed to outperform **GPT-4** in coding tasks, and deployment strategies for large models like **Yi-34B-200K** and **Goliath-120B** were explored. Quantization debates highlighted mixed views on **Q8** and **EXL2 quants**. Fine-tuning and instruct-tuning of **Mistral 7B Instruct v0.2** were discussed, alongside insights on RMS optimization and heterogeneous AI architectures combining **Transformers** and **Selective SSM (Mamba)**. 
The potential of recurrent LLMs like **RWKV** and techniques like **Contrastive Preference Optimization (CPO)** were also noted.</description><pubDate>Thu, 25 Jan 2024 21:30:23 GMT</pubDate><category>adept</category><category>hugging-face</category><category>deepseek</category><category>mistral-ai</category><category>nous-research</category><category>fuyu-heavy</category><category>fuyu-8b</category><category>gemini-pro</category><category>claude-2</category><category>gpt4v</category><category>gemini-ultra</category><category>deepseek-coder-33b</category><category>yi-34b-200k</category><category>goliath-120b</category><category>mistral-7b-instruct-v0.2</category><category>mamba</category><category>rwkv</category><category>multimodality</category><category>visual-question-answering</category><category>direct-preference-optimization</category><category>benchmarking</category><category>model-size-estimation</category><category>quantization</category><category>model-merging</category><category>fine-tuning</category><category>instruct-tuning</category><category>rms-optimization</category><category>heterogeneous-ai-architectures</category><category>recurrent-llms</category><category>contrastive-preference-optimization</category></item><item><title>Google Solves Text to Video</title><link>https://news.smol.ai/issues/24-01-24-ainews-google-solves-text-to-video/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-24-ainews-google-solves-text-to-video/</guid><description>**Google Research** introduced **Lumiere**, a text-to-video model featuring advanced inpainting capabilities using a Space-Time diffusion process, surpassing previous models like Pika and Runway. Manveer from UseScholar.org compiled a comprehensive list of code evaluation benchmarks beyond HumanEval, including datasets from **Amazon Science**, **Hugging Face**, and others. 
Discord communities such as **TheBloke** discussed topics including running **Mistral-7B** via API, GPU rentals, and multimodal model integration with **LLava**. **Nous Research AI** highlighted learning rate strategies for LLM fine-tuning, issues with inference, and benchmarks like HumanEval and MBPP. **RestGPT** gained attention for controlling applications via RESTful APIs, showcasing LLM application capabilities.</description><pubDate>Thu, 25 Jan 2024 05:36:26 GMT</pubDate><category>google-research</category><category>amazon-science</category><category>huggingface</category><category>mistral-ai</category><category>together-ai</category><category>mistral-7b</category><category>llava</category><category>text-to-video</category><category>inpainting</category><category>space-time-diffusion</category><category>code-evaluation</category><category>fine-tuning</category><category>inference</category><category>gpu-rentals</category><category>multimodality</category><category>api</category><category>model-integration</category><category>learning-rates</category></item><item><title>RIP Latent Diffusion, Hello Hourglass Diffusion</title><link>https://news.smol.ai/issues/24-01-23-ainews-rip-latent-diffusion-hello-hourglass-diffusion/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-23-ainews-rip-latent-diffusion-hello-hourglass-diffusion/</guid><description>**Katherine Crowson** from **Stable Diffusion** introduces a hierarchical pure transformer backbone for diffusion-based image generation that efficiently scales to megapixel resolutions with under 600 million parameters, improving upon the original ~900M parameter model. This architecture processes local and global image phenomena separately, enhancing efficiency and resolution without latent steps. Additionally, Meta&apos;s Self Rewarding LM paper has inspired **lucidrains** to begin an implementation. 
Discord summaries highlight GPT-4&apos;s robustness against quantification tricks, discussions on open-source GPT-0 alternatives, challenges in DPO training on limited VRAM with suggestions like QLoRA and rmsprop, and efforts to improve roleplay model consistency through fine-tuning and merging. Philosophical debates on AI sentience and GPT-4 customization for markdown and translation tasks were also noted.</description><pubDate>Wed, 24 Jan 2024 01:38:15 GMT</pubDate><category>stable-diffusion</category><category>meta-ai-fair</category><category>openai</category><category>hugging-face</category><category>gpt-4</category><category>latent-diffusion</category><category>katherine-crowson</category><category>lucidrains</category><category>diffusion-models</category><category>transformers</category><category>image-generation</category><category>model-efficiency</category><category>fine-tuning</category><category>quantization</category><category>prompt-engineering</category><category>roleplay</category><category>training-optimization</category></item><item><title>Nightshade poisons AI art... kinda?</title><link>https://news.smol.ai/issues/24-01-22-ainews-nightshade-poisons-ai-art-kinda/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-22-ainews-nightshade-poisons-ai-art-kinda/</guid><description>Over the weekend of **1/19-20/2024**, discussions in **TheBloke Discord** covered key topics including **Mixture of Experts (MoE)** model efficiency, GPU parallelism, and quantization strategies. Users debated the effectiveness of AI detection tools like **GPTZero** and explored fine-tuning challenges with models such as **Mistral 7B** and **Falcon 7B**. Community interest was strong in developing simpler, community-powered quantization services and understanding model merging techniques. 
Ethical considerations around AI applications like AI girlfriend sites were also discussed.</description><pubDate>Mon, 22 Jan 2024 21:09:56 GMT</pubDate><category>mistral-ai</category><category>hugging-face</category><category>mistral-7b</category><category>falcon-7b</category><category>mixture-of-experts</category><category>gpu-parallelism</category><category>quantization</category><category>fine-tuning</category><category>model-merging</category><category>ai-detection</category><category>role-playing</category><category>benchmarking</category></item><item><title>Sama says: GPT-5 soon</title><link>https://news.smol.ai/issues/24-01-22-ainews-sama-says-gpt-5-soon/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-22-ainews-sama-says-gpt-5-soon/</guid><description>**Sam Altman** at Davos highlighted that his top priority is launching the new model, likely called **GPT-5**, while expressing uncertainty about **Ilya Sutskever**&apos;s employment status. **Itamar from Codium** introduced the concept of **Flow Engineering** with **AlphaCodium**, gaining attention from **Andrej Karpathy**. On the **TheBloke Discord**, engineers discussed a **multi-specialty mixture-of-experts (MOE) model** combining seven distinct 7 billion parameter models specialized in law, finance, and medicine. Debates on **8-bit fine-tuning** and the use of **bitsandbytes** with GPU support were prominent. Discussions also covered **model merging** using tools like **Mergekit** and compatibility with **Alpaca format**. Interest in optimizing AI models on **AMD** hardware using **AOCL blas and lapack libraries** with **llama.cpp** was noted. Users experimented with AI for command line tasks, and the **Mixtral MoE model** was refined to surpass larger models in coding ability. 
Comparisons among LLMs such as **GPT-3.5**, **Mixtral**, **Gemini Pro**, and **GPT-4** focused on knowledge depth, problem-solving, and speed, especially for coding tasks.</description><pubDate>Mon, 22 Jan 2024 20:51:23 GMT</pubDate><category>openai</category><category>codium</category><category>thebloke</category><category>amd</category><category>hugging-face</category><category>gpt-5</category><category>mixtral-7b</category><category>gpt-3.5</category><category>gemini-pro</category><category>gpt-4</category><category>llama-cpp</category><category>sam-altman</category><category>ilya-sutskever</category><category>itamar</category><category>andrej-karpathy</category><category>mixture-of-experts</category><category>fine-tuning</category><category>model-merging</category><category>8-bit-optimization</category><category>gpu-acceleration</category><category>performance-comparison</category><category>command-line-ai</category><category>vector-stores</category><category>embeddings</category><category>coding-capabilities</category></item><item><title>1/17/2024: Help crowdsource function calling datasets</title><link>https://news.smol.ai/issues/24-01-18-ainews-1172024-help-crowdsource-function-calling-datasets/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-18-ainews-1172024-help-crowdsource-function-calling-datasets/</guid><description>**LM Studio** updated its FAQ clarifying its **closed-source** status and perpetual freeness for personal use with no data collection. The new beta release includes fixes and hints at upcoming **2-bit quantization** support. For gaming, models like **Dolphin 2.7 Mixtral 8x7B**, **MegaDolphin**, and **Dolphin 2.6 Mistral 7B DPO** with **Q4_K_M** quantization were recommended. Discussions highlighted that single powerful GPUs outperform multi-GPU setups due to bottlenecks, with older GPUs like Tesla P40 being cost-effective. 
**Microsoft&apos;s AutoGen Studio** was introduced but has issues and requires **API fees** for open-source models. Linux users are advised to use **llama.cpp** over LM Studio due to lack of headless mode. Additional tools like **LLMFarm** for iOS and various Hugging Face repositories were also mentioned. *&quot;LM Studio must be running to use the local inference server as there is no headless mode available&quot;* and *&quot;matching model size to GPU memory is key for performance&quot;* were notable points.</description><pubDate>Thu, 18 Jan 2024 21:20:01 GMT</pubDate><category>lm-studio</category><category>mistral-ai</category><category>microsoft</category><category>hugging-face</category><category>apple</category><category>mistral-7b</category><category>dolphin-2.7-mixtral-8x7b</category><category>mega-dolphin</category><category>dolphin-2.6-mistral-7b-dpo</category><category>llama-cpp</category><category>yagilb</category><category>heyitsyorkie</category><category>function-calling</category><category>quantization</category><category>model-performance</category><category>gpu-optimization</category><category>model-selection</category><category>closed-source</category><category>memory-optimization</category><category>linux-server</category><category>api-fees</category><category>headless-mode</category></item><item><title>1/16/2024: ArtificialAnalysis - a new model/host benchmark site</title><link>https://news.smol.ai/issues/24-01-17-ainews-1162024-artificialanalysis-a-new-modelhost-benchmark-site/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-17-ainews-1162024-artificialanalysis-a-new-modelhost-benchmark-site/</guid><description>**Artificial Analysis** launched a new models and hosts comparison site, highlighted by **swyx**. **Nous Research AI** Discord discussed innovative summarization techniques using **NVIDIA 3090 and 2080ti GPUs** for processing around **100k tokens**, and adapting prompts for smaller models like **OpenChat 7B**. 
The availability of **Hermes 2 Mixtral** on **Huggingface&apos;s HuggingChat** was noted, alongside fine-tuning challenges with **Mixtral** using Axolotl. Discussions included byte-level tokenization experiments with **Byte Mistral**, multimodal training on **COCO image bytes**, and inference speed improvements using **vllm** and **llama.cpp**. Calls for transparency in data sharing and open-sourcing the **Hermes 2 Mixtral** dataset were emphasized, with comparisons of **dpo** and **sft** methods and quantized LLM use on **M1 MacBook Pro**.</description><pubDate>Wed, 17 Jan 2024 22:14:53 GMT</pubDate><category>nous-research</category><category>nvidia</category><category>hugging-face</category><category>mixtral</category><category>hermes-2-mixtral</category><category>openchat-7b</category><category>byte-mistral</category><category>swyx</category><category>gabriel_syme</category><category>manojbh</category><category>carsonpoole</category><category>fullstack6209</category><category>summarization</category><category>fine-tuning</category><category>byte-level-tokenization</category><category>multimodality</category><category>inference-speed-optimization</category><category>dataset-sharing</category><category>quantization</category></item><item><title>1/16/2024: TIES-Merging</title><link>https://news.smol.ai/issues/24-01-16-ainews-1162024-ties-merging/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-16-ainews-1162024-ties-merging/</guid><description>**TheBloke&apos;s Discord** community actively discusses **Mixture of Experts (MoE) models**, focusing on **random gate routing layers** for training and the challenges of immediate model use. There is a robust debate on **quantization methods**, comparing **GPTQ** and **EXL2 quants**, with EXL2 noted for faster execution on specialized hardware. A new model, **Nous Hermes 2**, based on **Mixtral 8x7B** and trained with **RLHF**, claims benchmark superiority but shows some inconsistencies. 
The **Frontier supercomputer** at Oak Ridge National Laboratory is highlighted for training a **trillion-parameter LLM** with **14TB RAM**, sparking discussions on open-sourcing government-funded AI research. Additionally, the application of **ghost attention** in the **academicat** model is explored, with mixed reactions from the community. *&quot;Random gate layer is good for training but not for immediate use,&quot;* and *&quot;EXL2 might offer faster execution on specialized hardware,&quot;* are key insights shared.</description><pubDate>Tue, 16 Jan 2024 20:51:01 GMT</pubDate><category>thebloke</category><category>hugging-face</category><category>nous-research</category><category>togethercompute</category><category>oak-ridge-national-laboratory</category><category>vast-ai</category><category>runpod</category><category>mixtral-8x7b</category><category>nous-hermes-2</category><category>frankendpo-4x7b-bf16</category><category>sanjiwatsuki</category><category>superking__</category><category>mrdragonfox</category><category>_dampf</category><category>kaltcit</category><category>rombodawg</category><category>technotech</category><category>mixture-of-experts</category><category>random-gate-routing</category><category>quantization</category><category>gptq</category><category>exl2-quants</category><category>reinforcement-learning-from-human-feedback</category><category>supercomputing</category><category>trillion-parameter-models</category><category>ghost-attention</category><category>model-fine-tuning</category><category>reward-models</category></item><item><title>1/13-14/2024: Don&apos;t sleep on #prompt-engineering </title><link>https://news.smol.ai/issues/24-01-15-ainews-113-142024-dont-sleep-on-prompt-engineering/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-15-ainews-113-142024-dont-sleep-on-prompt-engineering/</guid><description>The **OpenAI** Discord community engaged in diverse discussions including **prompt engineering** techniques like 
contrastive Chain of Thought and step back prompting, and explored **model merging** and **mixture-of-experts (MoE)** concepts. Philosophical debates on **AI consciousness** and the ethics of **AI-generated voices** highlighted concerns about AI sentience and copyright issues. Technical clarifications were made on **hyperdimensional vector space models** used in modern AI embeddings. Users also discussed **customizing GPT** with personality profiles and prompt personalization to overcome token limits, and proposed a **universal translator** feature for multilingual Discord interactions. Key contributors included longtime regular MadameArchitect and community members such as @darthgustav and @metaldrgn.</description><pubDate>Tue, 16 Jan 2024 00:58:42 GMT</pubDate><category>openai</category><category>madamearchitect</category><category>darthgustav</category><category>metaldrgn</category><category>prompt-engineering</category><category>model-merging</category><category>mixture-of-experts</category><category>ai-consciousness</category><category>ethics</category><category>hyperdimensional-vector-space</category><category>tokenization</category><category>multilinguality</category><category>prompt-personalization</category></item><item><title>1/12/2024: Anthropic coins Sleeper Agents</title><link>https://news.smol.ai/issues/24-01-13-ainews-1122024-anthropic-coins-sleeper-agents/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-13-ainews-1122024-anthropic-coins-sleeper-agents/</guid><description>**Anthropic** released a new paper exploring the persistence of deceptive alignment and backdoors in models through stages of training including supervised fine-tuning and reinforcement learning safety training. The study found that safety training and adversarial training did not eliminate backdoors, which can cause models to write insecure code or exhibit hidden behaviors triggered by specific prompts. 
Notable AI figures like **Leo Gao** and **Andrej Karpathy** praised the work, highlighting its implications for future model security and the risks of sleeper agent LLMs. Additionally, the **Nous Research AI** Discord community discussed topics such as the trade-off between security and convenience, the **Hulk Dataset 0.1** for LLM fine-tuning, curiosity about a **120B model** and **Nous Mixtral**, debates on LLM leaderboard legitimacy, and the rise of Frankenmerge techniques for model merging and capacity enhancement.</description><pubDate>Sat, 13 Jan 2024 22:06:35 GMT</pubDate><category>anthropic</category><category>openai</category><category>nous-research</category><category>hugging-face</category><category>nous-mixtral</category><category>120b</category><category>leo-gao</category><category>andrej-karpathy</category><category>reinforcement-learning</category><category>fine-tuning</category><category>backdoors</category><category>model-security</category><category>adversarial-training</category><category>chain-of-thought</category><category>model-merging</category><category>dataset-release</category><category>security-vs-convenience</category></item><item><title>1/11/2024: Mixing Experts vs Merging Models</title><link>https://news.smol.ai/issues/24-01-12-ainews-1112024-mixing-experts-vs-merging-models/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-12-ainews-1112024-mixing-experts-vs-merging-models/</guid><description>**18 guilds**, **277 channels**, and **1342 messages** were analyzed with an estimated reading time saved of **187 minutes**. The community switched to **GPT-4 turbo** and discussed the rise of **Mixture of Experts (MoE) models** like **Mixtral**, **DeepSeekMOE**, and **Phixtral**. Model merging techniques, including naive linear interpolation and &quot;frankenmerges&quot; by **SOLAR** and **Goliath**, are driving new performance gains on open leaderboards. 
Discussions in the **Nous Research AI Discord** covered topics such as AI playgrounds supporting prompt and RAG parameters, security concerns about third-party cloud usage, debates on Discord bots and TOS, skepticism about **Teenage Engineering&apos;s** cloud LLM, and performance differences between **GPT-4 0613** and **GPT-4 turbo**. The community also explored fine-tuning strategies involving **DPO**, **LoRA**, and safetensors, integration of RAG with API calls, semantic differences between MoE and dense LLMs, and data frameworks like **llama index** and **SciPhi-AI&apos;s synthesizer**. Issues with anomalous characters in fine-tuning were also raised.</description><pubDate>Fri, 12 Jan 2024 18:49:15 GMT</pubDate><category>deepseek-ai</category><category>hugging-face</category><category>nous-research</category><category>teenage-engineering</category><category>discord</category><category>gpt-4-turbo</category><category>gpt-4-0613</category><category>mixtral</category><category>deepseekmoe</category><category>phixtral</category><category>ash_prabaker</category><category>shacrw</category><category>teknium</category><category>0xevil</category><category>everyoneisgross</category><category>ldj</category><category>pramod8481</category><category>mgreg_42266</category><category>georgejrjrjr</category><category>kenakafrosty</category><category>mixture-of-experts</category><category>model-merging</category><category>fine-tuning</category><category>rag</category><category>security</category><category>discord-tos</category><category>model-performance</category><category>prompt-engineering</category><category>function-calling</category><category>semantic-analysis</category><category>data-frameworks</category></item><item><title>1/10/2024: All the best papers for AI Engineers</title><link>https://news.smol.ai/issues/24-01-11-ainews-1102024-all-the-best-papers-for-ai-engineers/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/24-01-11-ainews-1102024-all-the-best-papers-for-ai-engineers/</guid><description>**OpenAI** launched the **GPT Store** featuring over **3 million** custom versions of **ChatGPT** accessible to Plus, Team, and Enterprise users, with weekly highlights of impactful GPTs like **AllTrails**. The new **ChatGPT Team** plan offers advanced models including **GPT-4** and **DALL·E 3**, alongside collaborative tools and enhanced data privacy. Discussions around AI-generated imagery favored **DALL·E** and **Stable Diffusion**, while users faced rate limit challenges and debated the GPT Store&apos;s SEO and categorization. Ethical considerations in prompt engineering were raised with a three-layer framework called &apos;The Sieve&apos;. Additionally, **DeepSeek-MoE** was noted for its range of Mixture of Experts (MoE) model sizes.</description><pubDate>Thu, 11 Jan 2024 08:35:15 GMT</pubDate><category>openai</category><category>deepseek-ai</category><category>chatgpt</category><category>gpt-4</category><category>dall-e-3</category><category>stable-diffusion</category><category>deepseek-moe</category><category>abdubs</category><category>darthgustav</category><category>prompt-engineering</category><category>model-release</category><category>rate-limiting</category><category>ethics</category><category>image-generation</category><category>moe</category><category>collaborative-workspaces</category><category>data-privacy</category></item><item><title>1/9/2024: Nous Research lands $5m for Open Source AI</title><link>https://news.smol.ai/issues/24-01-10-ainews-192024-nous-research-lands-dollar5m-for-open-source-ai/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-10-ainews-192024-nous-research-lands-dollar5m-for-open-source-ai/</guid><description>**Nous Research** announced a **$5.2 million seed financing** 
focused on **Nous-Forge**, aiming to embed transformer architecture into chips for powerful servers supporting real-time voice agents and **trillion parameter models**. **Rabbit R1** launched a demo at CES with mixed reactions. **OpenAI** shipped the **GPT Store** and briefly leaked an upcoming personalization feature. A new paper on **Activation Beacon** proposes a solution to extend LLMs&apos; context window significantly, with code to be released on GitHub. Discussions also covered **QLoRA**, **fine-tuning**, **synthetic data**, and **custom architectures** for LLMs.</description><pubDate>Thu, 11 Jan 2024 00:53:13 GMT</pubDate><category>nous-research</category><category>openai</category><category>rabbit-tech</category><category>qlora</category><category>phi-3</category><category>mixtral</category><category>ollama</category><category>kenakafrosty</category><category>_stilic_</category><category>teknium</category><category>context-window</category><category>fine-tuning</category><category>synthetic-data</category><category>activation-beacon</category><category>transformer-architecture</category><category>seed-financing</category><category>real-time-voice-agents</category><category>trillion-parameter-models</category></item><item><title>1/8/2024: The Four Wars of the AI Stack</title><link>https://news.smol.ai/issues/24-01-08-ainews-182024-the-four-wars-of-the-ai-stack/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-08-ainews-182024-the-four-wars-of-the-ai-stack/</guid><description>The **Nous Research AI Discord** discussions highlighted several key topics including the use of **DINO**, **CLIP**, and **CNNs** in the **Obsidian Project**. A research paper on distributed models like **DistAttention** and **DistKV-LLM** was shared to address cloud-based **LLM** service challenges. Another paper titled &apos;Self-Extend LLM Context Window Without Tuning&apos; argued that existing **LLMs** can handle long contexts inherently. 
The community also discussed AI models like **Mixtral**, favored for its **32k context window**, and compared it with **Mistral** and **Marcoroni**. Other topics included hierarchical embeddings, agentic retrieval-augmented generation (**RAG**), synthetic data for fine-tuning, and the application of **LLMs** in the oil &amp; gas industry. The launch of the **AgentSearch-V1** dataset with one billion embedding vectors was also announced. The discussions covered **mixture-of-experts (MoE)** implementations and the performance of smaller models.</description><pubDate>Tue, 09 Jan 2024 07:39:51 GMT</pubDate><category>nous-research</category><category>openai</category><category>mistral-ai</category><category>hugging-face</category><category>mixtral</category><category>mistral</category><category>context-window</category><category>distributed-models</category><category>long-context</category><category>hierarchical-embeddings</category><category>agentic-rag</category><category>fine-tuning</category><category>synthetic-data</category><category>oil-and-gas</category><category>embedding-datasets</category><category>mixture-of-experts</category><category>model-comparison</category></item><item><title>1/6-7/2024: LlaMA Pro - an alternative to PEFT/RAG??</title><link>https://news.smol.ai/issues/24-01-07-ainews-16-72024-llama-pro-an-alternative-to-peftrag/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-07-ainews-16-72024-llama-pro-an-alternative-to-peftrag/</guid><description>New research papers introduce promising **Llama Extensions** including **TinyLlama**, a compact **1.1B** parameter model pretrained on about **1 trillion tokens** for 3 epochs, and **LLaMA Pro**, an **8.3B** parameter model expanding **LLaMA2-7B** with additional training on **80 billion tokens** of code and math data. LLaMA Pro adds layers to avoid catastrophic forgetting and balances language and code tasks but faces scrutiny for not using newer models like **Mistral** or **Qwen**. 
Meanwhile, **OpenAI** Discord discussions reveal insights on **GPT-4** token limits, privacy reassurances, fine-tuning for GPT-3.5, challenges with multi-language image recognition, custom GPT creation requiring **ChatGPT Plus**, and security concerns in GPT deployment. Users also share tips on dynamic image generation with **DALL-E** and logo creation.</description><pubDate>Mon, 08 Jan 2024 00:51:41 GMT</pubDate><category>openai</category><category>mistral-ai</category><category>llamaindex</category><category>langchain</category><category>llama-3</category><category>llama-3-1-1b</category><category>llama-3-8-3b</category><category>gpt-4</category><category>gpt-3.5</category><category>dall-e</category><category>yannic-kilcher</category><category>fine-tuning</category><category>model-expansion</category><category>token-limits</category><category>privacy</category><category>multilinguality</category><category>image-generation</category><category>security</category><category>custom-models</category><category>model-training</category></item><item><title>1/4/2024: Jeff Bezos backs Perplexity&apos;s $520m Series B.</title><link>https://news.smol.ai/issues/24-01-05-ainews-142024-jeff-bezos-backs-perplexitys-dollar520m-series-b/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-05-ainews-142024-jeff-bezos-backs-perplexitys-dollar520m-series-b/</guid><description>**Perplexity** announced their **Series B** funding round with notable investor **Jeff Bezos**, who invested in **Google** 25 years ago. **Anthropic** is raising **$750 million**, projecting at least **$850 million in annualized revenue** next year and implementing &quot;brutal&quot; changes to their Terms of Service. 
Discussions in **Nous Research AI Discord** cover topics such as **document recall limits from gigabytes of data**, **RNN memory and compute trade-offs**, **synthetic datasets**, and benchmarking of models like **WizardCoder-33B-V1.1**, **MobileLLaMA-1.4B-Base**, **ShearedLLaMA**, and **TinyLLaMA**. Other highlights include **Unsloth** optimizations for multi-GPU systems, **AI rap voice models**, **context-extending code**, and architectural innovations like applying **Detectron/ViT backbones to LLMs**, **sliding window attention** in **Mistral**, and parallelizing **Mixtral 8x7b** with **FSDP** and **HF Accelerate**.</description><pubDate>Fri, 05 Jan 2024 08:29:59 GMT</pubDate><category>perplexity</category><category>anthropic</category><category>google</category><category>nous-research</category><category>mistral-ai</category><category>hugging-face</category><category>wizardcoder-33b-v1.1</category><category>mobilellama-1.4b-base</category><category>shearedllama</category><category>tinyllama</category><category>mixtral-8x7b</category><category>jeff-bezos</category><category>document-recall</category><category>rnn-memory</category><category>synthetic-data</category><category>benchmarking</category><category>multi-gpu-support</category><category>context-length</category><category>model-architecture</category><category>sliding-window-attention</category><category>model-parallelism</category><category>gpu-optimization</category></item><item><title>1/3/2024: RIP Coqui</title><link>https://news.smol.ai/issues/24-01-03-ainews-132024-rip-coqui/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-03-ainews-132024-rip-coqui/</guid><description>**Coqui**, a prominent open source text-to-speech project from the Mozilla ML group, officially shut down. 
Discussions in the **HuggingFace** Discord highlighted skepticism about the claimed `3X faster` speed of **SDXL**, attributing the improvements more to techniques like `torch.compile` and the removal of `fp16` and `attention` than to **diffusers 0.25** features. Users confirmed that a *HuggingFace user token* can be used across multiple machines, though distinct tokens are recommended for safety. The **Learning Loss Minimization (LLM) Leaderboard** briefly experienced issues but was later confirmed operational. A Kaggle notebook was shared demonstrating how to build Transformer architectures from scratch using PyTorch. Additionally, a new image dataset with 15k shoe, sandal, and boot images was introduced for multiclass classification tasks. Explanations about the workings of the Common Crawl web-crawling process were also shared.</description><pubDate>Thu, 04 Jan 2024 06:56:46 GMT</pubDate><category>coqui</category><category>mozilla</category><category>hugging-face</category><category>google</category><category>sdxl</category><category>diffusers-0.25</category><category>text-to-speech</category><category>performance-optimization</category><category>token-management</category><category>transformer-architecture</category><category>image-datasets</category><category>web-crawling</category><category>pytorch</category><category>leaderboards</category></item><item><title>1/2/2024: Smol tweaks to Smol Talk</title><link>https://news.smol.ai/issues/24-01-02-ainews-122024-smol-tweaks-to-smol-talk/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-02-ainews-122024-smol-tweaks-to-smol-talk/</guid><description>**OpenAI** Discord discussions highlight a detailed comparison of AI search engines including **Perplexity**, **Copilot**, **Bard**, and **Claude 2**, with Bard and Claude 2 trailing behind. The **Meta AI** chatbot by Meta is introduced, available on Instagram and WhatsApp, featuring image generation likened to a free GPT version. 
Users report multiple browser issues with **ChatGPT**, including persistent captchas when using VPNs and plugin malfunctions. Debates cover prompt engineering, API usage, and data formats like **JSON**, **YAML**, and **Markdown**. Discussions also touch on ChatGPT&apos;s personality tuning and model capability variations. *&quot;Meta AI includes an image generation feature, which he likened to a free version of GPT.&quot;*</description><pubDate>Wed, 03 Jan 2024 07:38:24 GMT</pubDate><category>openai</category><category>meta-ai-fair</category><category>perplexity-ai</category><category>claude-2</category><category>bard</category><category>copilot</category><category>meta-ai</category><category>gemini-ultra</category><category>chatgpt</category><category>prompt-engineering</category><category>api</category><category>json</category><category>yaml</category><category>markdown</category><category>chatbot</category><category>image-generation</category><category>vpn</category><category>browser-compatibility</category><category>personality-tuning</category><category>plugin-issues</category></item><item><title>1/1/2024: How to start with Open Source AI</title><link>https://news.smol.ai/issues/24-01-02-ainews-112024-how-to-start-with-open-source-ai/</link><guid isPermaLink="true">https://news.smol.ai/issues/24-01-02-ainews-112024-how-to-start-with-open-source-ai/</guid><description>**OpenAI Discord** discussions revealed mixed sentiments about **Bing&apos;s AI** versus **ChatGPT** and **Perplexity AI**, and debated **Microsoft Copilot&apos;s** integration with **Office 365**. Users discussed **DALL-E 3** access within **ChatGPT Plus**, **ChatGPT&apos;s performance issues**, and ways to train a **GPT model** using book content via **OpenAI API** or custom GPTs. Anticipation for **GPT-4 turbo** in **Microsoft Copilot** was noted alongside conversations on **AI reasoning**, **prompt engineering**, and overcoming **Custom GPT** glitches. 
Advice for AI beginners included starting with **Python** and using YAML or Markdown for knowledge integration. The future of AI with multiple specialized GPTs and **Microsoft Copilot&apos;s** role was also explored.</description><pubDate>Wed, 03 Jan 2024 07:23:06 GMT</pubDate><category>openai</category><category>microsoft</category><category>perplexity-ai</category><category>gpt-4-turbo</category><category>dall-e-3</category><category>chatgpt</category><category>swyx</category><category>prompt-engineering</category><category>ai-reasoning</category><category>custom-gpt</category><category>performance</category><category>python</category><category>knowledge-integration</category></item><item><title>12/31/2023: Happy New Year</title><link>https://news.smol.ai/issues/23-12-31-ainews-12312023-happy-new-year/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-31-ainews-12312023-happy-new-year/</guid><description>**LM Studio** community discussions highlight variations and optimizations in **Dolphin** and **Mistral 7b** models, focusing on hardware-software configurations and GPU vRAM impact on processing speed. Challenges with **Mixtral** model deployment on local machines and workarounds for downloading models from **HuggingFace** in restricted regions were addressed. Users explored enhancing AI&apos;s emotional intelligence and personalities through extended prompts, referencing research on emotional stimuli in large language models. The community also discussed hardware setups for budget AI compute servers, integration issues with **ChromaDB** and **Autogen**, and shared positive feedback on LM Studio&apos;s usability and UI. 
Celebrations for the New Year added a social touch to the guild interactions.</description><pubDate>Mon, 01 Jan 2024 05:33:14 GMT</pubDate><category>lm-studio</category><category>mistral-ai</category><category>hugging-face</category><category>amd</category><category>mistral-7b</category><category>mixtral</category><category>fine-tuning</category><category>hardware-optimization</category><category>vram</category><category>emotional-intelligence</category><category>model-deployment</category><category>integration</category><category>gpu-optimization</category><category>software-updates</category></item><item><title>12/30/2023: Mega List of all LLMs</title><link>https://news.smol.ai/issues/23-12-31-ainews-12302023-mega-list-of-all-llms/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-31-ainews-12302023-mega-list-of-all-llms/</guid><description>**Stella Biderman**&apos;s tracking list of **LLMs** is highlighted, with resources shared for browsing. The **Nous Research AI** Discord discussed the **Local Attention Flax** module focusing on computational complexity, debating linear vs quadratic complexity and proposing chunking as a solution. Benchmark logs for various LLMs including **Deita v1.0** with its **SFT+DPO** training method were shared. Discussions covered model merging, graded modal types, function calling in AI models, and data contamination issues in **Mixtral**. Community insights were sought on **Amazon Titan Text Express** and **Amazon Titan Text Lite** LLMs, including a unique training strategy involving bad datasets. 
Several GitHub repositories and projects like **DRUGS**, **MathPile**, **CL-FoMo**, and **SplaTAM** were referenced for performance and data quality evaluations.</description><pubDate>Sun, 31 Dec 2023 10:23:31 GMT</pubDate><category>nous-research</category><category>hugging-face</category><category>amazon</category><category>mistral-ai</category><category>deita-v1.0</category><category>mixtral</category><category>amazon-titan-text-express</category><category>amazon-titan-text-lite</category><category>stella-biderman</category><category>euclaise</category><category>joey00072</category><category>local-attention</category><category>computational-complexity</category><category>benchmarking</category><category>model-merging</category><category>graded-modal-types</category><category>function-calling</category><category>data-contamination</category><category>training-methods</category></item><item><title>12/29/2023: TinyLlama on the way</title><link>https://news.smol.ai/issues/23-12-30-ainews-12292023-tinyllama-on-the-way/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-30-ainews-12292023-tinyllama-on-the-way/</guid><description>The **Nous/Axolotl community** is pretraining a **1.1B model on 3 trillion tokens**, showing promising results on **HellaSwag** for a small 1B model. The **LM Studio Discord** discussions cover extensive **GPU-related issues**, **Discord bot integration** with the **OpenAI API**, and **hardware limitations** affecting model usage. Community members also discuss **server hosting** for embeddings and LLMs, propose updates for **Discord channels** to improve model development collaboration, and address a **gibberish problem** in beta releases. 
The **Autogen** tool&apos;s installation and operational challenges are also clarified by users.</description><pubDate>Sat, 30 Dec 2023 11:06:56 GMT</pubDate><category>openai</category><category>hugging-face</category><category>tinyllama-1.1b</category><category>gpu-optimization</category><category>model-deployment</category><category>discord-bots</category><category>embedding-models</category><category>inference-server</category><category>hardware-compatibility</category><category>model-performance</category><category>beta-testing</category><category>autogen</category><category>context-window</category></item><item><title>12/28/2023: Smol Talk updates</title><link>https://news.smol.ai/issues/23-12-29-ainews-12282023-smol-talk-updates/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-29-ainews-12282023-smol-talk-updates/</guid><description>**Nous Research AI** Discord discussions covered topics such as AI placement charts, **ChatGPT**&apos;s issues with Latex math format compatibility with Obsidian, and performance metrics of the **TinyLlama 1.1B** model on various benchmarks. Users shared resources including the math-centric corpus **MathPile**, knowledge graph building methods, and open-source large language model repositories. Technical discussions included decentralized computation feasibility for models like **Mixtral**, philosophical debates on AI sentience, and strategies for model finetuning and token counting. The community also discussed the **Obsidian** model, vision model training, and the release of the multimodal **TinyGPT-V** model by Tyrannosaurus. 
*&quot;ChatGPT not generating Latex math format compatible with Obsidian&quot;* and *&quot;optimistic about human-level AI within our lifetime&quot;* were notable quotes.</description><pubDate>Fri, 29 Dec 2023 10:32:18 GMT</pubDate><category>nous-research</category><category>tyrannosaurus</category><category>tinyllama-1.1b</category><category>mixtral</category><category>tinygpt-v</category><category>gary-marcus</category><category>latex</category><category>benchmarking</category><category>knowledge-graphs</category><category>model-finetuning</category><category>tokenization</category><category>decentralized-computation</category><category>philosophy-of-ai</category><category>multimodality</category><category>vision</category><category>open-source-models</category></item><item><title>12/27/2023: NYT vs OpenAI</title><link>https://news.smol.ai/issues/23-12-29-ainews-12272023-nyt-vs-openai/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-29-ainews-12272023-nyt-vs-openai/</guid><description>The LM Studio Discord community extensively discussed **model performance** comparisons, notably between **Phi2** by **Microsoft Research** and **OpenHermes 2.5 Mistral 7b**, with focus on **U.S. history knowledge** and fine-tuning for improved accuracy. Technical challenges around **LLM API** usage, conversation history maintenance, and **GPU optimization** for inference speed were addressed. Hardware discussions covered **DDR4 vs DDR5**, multi-GPU setups, and potential of **Apple M1/M3** and **AMD AI CPUs** for AI workloads. The community also announced the **ChromaDB Plugin v3.0.2** release enabling image search in vector databases. 
Users shared practical tips on running multiple LM Studio instances and optimizing resource usage.</description><pubDate>Fri, 29 Dec 2023 10:14:01 GMT</pubDate><category>microsoft-research</category><category>mistral-ai</category><category>apple</category><category>amd</category><category>phi2</category><category>openhermes-2.5-mistral-7b</category><category>llama-2-7b</category><category>llama-2-13b</category><category>model-performance</category><category>fine-tuning</category><category>llm-api</category><category>gpu-optimization</category><category>hardware-configuration</category><category>multi-gpu</category><category>inference-speed</category><category>plugin-release</category><category>conversation-history</category></item><item><title>12/26/2023: not much happened today</title><link>https://news.smol.ai/issues/23-12-29-ainews-12262023-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-29-ainews-12262023-not-much-happened-today/</guid><description>**LM Studio** users extensively discussed its performance, installation issues on macOS, and upcoming features like **Exllama2 support** and multimodality with the **Llava model**. Conversations covered **GPU offloading**, **vRAM utilization**, **MoE model expert selection**, and **model conversion compatibility**. The community also addressed **inefficient help requests** referencing the blog &apos;Don&apos;t Ask to Ask, Just Ask&apos;. Technical challenges with **ChromaDB Plugin**, **server vs desktop hardware performance**, and **saving model states with Autogen** were highlighted. 
Discussions included comparisons with other chatbots and mentions of **AudioCraft** from **meta-ai-fair** and **MusicLM** from **google-deepmind** for music generation.</description><pubDate>Fri, 29 Dec 2023 10:07:18 GMT</pubDate><category>meta-ai-fair</category><category>google-deepmind</category><category>llava</category><category>exllama2</category><category>gpu-offloading</category><category>vram-utilization</category><category>model-conversion</category><category>moe-models</category><category>multimodality</category><category>model-performance</category><category>hardware-configuration</category><category>model-saving</category><category>chatml</category><category>installation-issues</category><category>music-generation</category></item><item><title>12/25/2023: Nous Hermes 2 Yi 34B for Christmas</title><link>https://news.smol.ai/issues/23-12-25-ainews-12252023-nous-hermes-2-yi-34b-for-christmas/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-25-ainews-12252023-nous-hermes-2-yi-34b-for-christmas/</guid><description>**Teknium** released **Nous Hermes 2** on **Yi 34B**, positioning it as a top open model compared to **Mixtral**, **DeepSeek**, and **Qwen**. **Apple** introduced **Ferret**, a new open-source multimodal LLM. Discussions in the **Nous Research AI Discord** focused on **AI model optimization** and **quantization** techniques like **AWQ**, **GPTQ**, and **AutoAWQ**, with insights on proprietary optimization and throughput metrics. Additional highlights include the addition of **NucleusX Model** to **transformers**, a **30B model with 80 MMLU**, and the **YAYI 2** language model by **Wenge Technology** trained on **2.65 trillion tokens**. 
*&quot;AutoAWQ outperforms vLLM up to batch size 8&quot;* was noted, and proprietary parallel decoding and tensor parallelization across GPUs were discussed for speed improvements.</description><pubDate>Tue, 26 Dec 2023 07:45:27 GMT</pubDate><category>teknim</category><category>nous-research</category><category>apple</category><category>mixtral</category><category>deepseek</category><category>qwen</category><category>huggingface</category><category>wenge-technology</category><category>nous-hermes-2</category><category>yi-34b</category><category>nucleusx</category><category>yayi-2</category><category>ferret</category><category>teknium</category><category>carsonpoole</category><category>casper_ai</category><category>pradeep1148</category><category>osanseviero</category><category>metaldragon01</category><category>quantization</category><category>model-optimization</category><category>throughput-metrics</category><category>batch-processing</category><category>parallel-decoding</category><category>tensor-parallelization</category><category>multimodality</category><category>language-model-pretraining</category><category>model-benchmarking</category></item><item><title>12/24/2023: Dolphin Mixtral 8x7b is wild</title><link>https://news.smol.ai/issues/23-12-25-ainews-12242023-dolphin-mixtral-8x7b-is-wild/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-25-ainews-12242023-dolphin-mixtral-8x7b-is-wild/</guid><description>**Mistral** models are recognized for being uncensored, and Eric Hartford&apos;s **Dolphin** series applies uncensoring fine-tunes to these models, gaining popularity on Discord and Reddit. The **LM Studio** Discord community discusses various topics including hardware compatibility, especially GPU performance with Nvidia preferred, fine-tuning and training models, and troubleshooting issues with LM Studio&apos;s local model hosting capabilities. Integration efforts with **GPT Pilot** and a beta release for ROCm integration are underway. 
Users also explore the use of **Autogen** for group chat features and share resources like the **Ollama** NexusRaven library. Discussions highlight challenges with running LM Studio on different operating systems, model performance issues, and external tools like **Google Gemini** and **ChatGLM3** compilation.</description><pubDate>Tue, 26 Dec 2023 07:23:04 GMT</pubDate><category>mistral-ai</category><category>ollama</category><category>google</category><category>openai</category><category>dolphin</category><category>glm3</category><category>chatglm3-ggml</category><category>eric-hartford</category><category>fine-tuning</category><category>hardware-compatibility</category><category>gpu-inference</category><category>local-model-hosting</category><category>model-integration</category><category>rocm-integration</category><category>performance-issues</category><category>autogen</category><category>linux</category><category>model-training</category></item><item><title>12/23/2023: NeurIPS Best Papers of 2023</title><link>https://news.smol.ai/issues/23-12-23-ainews-12232023-neurips-best-papers-of-2023/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-23-ainews-12232023-neurips-best-papers-of-2023/</guid><description>The **Latent Space Pod** released a **3-hour recap** of the **best NeurIPS 2023 papers**. The **Nous Research AI Discord** community discussed **optimizing AI performance** with shorter context lengths, **malware security concerns** linked to **HuggingFace**, and shared insights on **video and music content**. Technical discussions included the **DYAD research paper** proposing a faster alternative to linear layers, **Apple&apos;s ML Ferret** machine learning tool, and accessing **PALM2** via API. The community also explored **Large Language Models** focusing on specialized models, data scaling, embedding/vector databases, model merging, and interpretability, with mentions of **Hermes 2.5**, **GPT-4**, and **Mistral**. 
Additionally, there were conversations on the **Striped Hyena Architecture**, **quantization challenges**, and fixes related to **RMSNorm** and the **&quot;Attention is All You Need&quot;** paper.</description><pubDate>Sun, 24 Dec 2023 07:45:58 GMT</pubDate><category>nous-research</category><category>hugging-face</category><category>apple</category><category>gpt-4</category><category>palm2</category><category>hermes-2.5</category><category>mistral-7b</category><category>context-length</category><category>malware-security</category><category>video-content</category><category>music-content</category><category>linear-layers</category><category>api-access</category><category>large-language-models</category><category>embedding</category><category>vector-databases</category><category>model-merging</category><category>model-interpretability</category><category>striped-hyena-architecture</category><category>quantization</category><category>rmsnorm</category><category>attention-mechanisms</category></item><item><title>12/22/2023: Anyscale&apos;s Benchmark Criticisms</title><link>https://news.smol.ai/issues/23-12-22-ainews-12222023-anyscales-benchmark-criticisms/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-22-ainews-12222023-anyscales-benchmark-criticisms/</guid><description>**Anyscale** launched their **LLMPerf leaderboard** to benchmark large language model inference performance, but it faced criticism for lacking detailed metrics like cost per token and throughput, and for comparing public LLM endpoints without accounting for batching and load. In **OpenAI Discord** discussions, users reported issues with **Bard** and preferred **Microsoft Copilot** for storytelling, noting fewer hallucinations. There was debate on the value of upgrading from **GPT-3.5** to **GPT-4**, with many finding paid AI models worthwhile for coding productivity. Bugs and performance issues with OpenAI APIs were also highlighted, including slow responses and message limits. 
Future AI developments like **GPT-6** and concerns about OpenAI&apos;s transparency and profitability were discussed. Prompt engineering for image generation was another active topic, emphasizing clear positive prompts and the desire for negative prompts.</description><pubDate>Sat, 23 Dec 2023 01:16:52 GMT</pubDate><category>anyscale</category><category>openai</category><category>microsoft</category><category>gpt-4</category><category>gpt-3.5</category><category>bard</category><category>benchmarking</category><category>performance</category><category>api</category><category>prompt-engineering</category><category>bug-tracking</category><category>model-comparison</category><category>productivity</category><category>programming-languages</category><category>storytelling</category></item><item><title>12/21/2023: The State of AI (according to LangChain)</title><link>https://news.smol.ai/issues/23-12-21-ainews-12212023-the-state-of-ai-according-to-langchain/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-21-ainews-12212023-the-state-of-ai-according-to-langchain/</guid><description>**LangChain** launched their first report based on **LangSmith** stats revealing top charts for mindshare. On **OpenAI**&apos;s Discord, users raised issues about the **Mixtral model**, noting inconsistencies and comparing it to **Poe&apos;s Mixtral**. There were reports of declining output quality and unpredictable behavior in **GPT-4** and **ChatGPT**, with discussions on differences between **Playground GPT-4** and **ChatGPT GPT-4**. Users also reported anomalous behavior in **Bing** and **Bard AI** models, including hallucinations and strange assertions. Various user concerns included message limits on GPT-4, response completion errors, chat lags, voice setting inaccessibility, password reset failures, 2FA issues, and subscription restrictions. Techniques for guiding GPT-4 outputs and creative uses with **DALL-E** were also discussed. 
*Users highlighted financial constraints affecting subscriptions and queries about earning with ChatGPT and token costs.*</description><pubDate>Fri, 22 Dec 2023 00:20:28 GMT</pubDate><category>langchain</category><category>openai</category><category>perplexity-ai</category><category>microsoft</category><category>poe</category><category>mixtral</category><category>gpt-4</category><category>chatgpt</category><category>bard</category><category>dall-e</category><category>model-consistency</category><category>model-behavior</category><category>response-quality</category><category>chatgpt-usage-limitations</category><category>error-handling</category><category>user-experience</category><category>model-comparison</category><category>hallucination-detection</category><category>prompt-engineering</category><category>creative-ai</category></item><item><title>12/20/2023: Project Obsidian - Multimodal Mistral 7B from Nous</title><link>https://news.smol.ai/issues/23-12-20-ainews-12202023-project-obsidian-multimodal-mistral-7b-from-nous/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-20-ainews-12202023-project-obsidian-multimodal-mistral-7b-from-nous/</guid><description>**Project Obsidian** is a multimodal model being trained publicly, tracked by **Teknium** on the Nous Discord. Discussions include **4M: Massively Multimodal Masked Modeling** and **Reason.dev**, a TypeScript framework for LLM applications. The **OpenAI Discord** community discussed hardware specs for running **TensorFlow JS** for image detection, security API ideas for filtering inappropriate images, and concerns about racial and cultural bias in AI, especially in facial recognition and healthcare. Challenges with **GPT-3.5** and **GPT-4** in word puzzle games were noted, along with GPU recommendations prioritizing VRAM for AI inference. 
Users also debated **GPT-4**&apos;s vision capabilities, limitations of **DALL·E 3**, platform access issues, and prompting strategies for better outputs.</description><pubDate>Thu, 21 Dec 2023 03:20:57 GMT</pubDate><category>nous-research</category><category>teknim</category><category>openai</category><category>gpt-4</category><category>gpt-3.5</category><category>dall-e-3</category><category>multimodality</category><category>image-detection</category><category>security-api</category><category>bias</category><category>facial-recognition</category><category>healthcare-ai</category><category>gpu-optimization</category><category>prompt-engineering</category><category>vision</category></item><item><title>12/19/2023: Everybody Loves OpenRouter</title><link>https://news.smol.ai/issues/23-12-20-ainews-12192023-everybody-loves-openrouter/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-20-ainews-12192023-everybody-loves-openrouter/</guid><description>**OpenRouter** offers an easy OpenAI-compatible proxy for **Mixtral-8x7b-instruct**. Discord discussions highlight **GPT-4** performance and usability issues compared to **GPT-3.5**, including memory management and accessibility problems. Users debate local language models versus OpenAI API usage, with mentions of **Dolphin 2.0 Mistral 7B** and **Google&apos;s video generation project**. Prompt engineering and custom instructions for GPT models are also key topics. 
Concerns about censorship on models like **Gemini** and translation tool preferences such as **DeepL** were discussed.</description><pubDate>Wed, 20 Dec 2023 08:10:20 GMT</pubDate><category>openai</category><category>mistral-ai</category><category>google</category><category>hugging-face</category><category>gpt-4</category><category>gpt-3.5</category><category>mixtral-8x7b-instruct</category><category>dolphin-2.0-mistral-7b</category><category>gemini</category><category>performance</category><category>memory-management</category><category>api</category><category>prompt-engineering</category><category>local-language-models</category><category>translation</category><category>censorship</category><category>video-generation</category></item><item><title>12/18/2023: Gaslighting Mistral for fun and profit</title><link>https://news.smol.ai/issues/23-12-18-ainews-12182023-gaslighting-mistral-for-fun-and-profit/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-18-ainews-12182023-gaslighting-mistral-for-fun-and-profit/</guid><description>**OpenAI** Discord discussions reveal comparisons among language models including **GPT-4 Turbo**, **GPT-3.5 Turbo**, **Claude 2.1**, **Claude Instant 1**, and **Gemini Pro**, with **GPT-4 Turbo** noted for user-centric explanations. Rumors about **GPT-4.5** remain unconfirmed, with skepticism prevailing until official announcements. Users discuss technical challenges like slow responses and API issues, and explore role-play prompt techniques to enhance model performance. Ethical concerns about AI&apos;s impact on academia and employment are debated. Future features for **Dalle 3** and a proposed new GPT model are speculated upon, while a school project seeks help using the **OpenAI API**. 
The community also touches on AI glasses and job market implications of AI adoption.</description><pubDate>Tue, 19 Dec 2023 03:35:50 GMT</pubDate><category>openai</category><category>anthropic</category><category>google-deepmind</category><category>gpt-4-turbo</category><category>gpt-3.5-turbo</category><category>claude-2.1</category><category>claude-instant-1</category><category>gemini-pro</category><category>gpt-4.5</category><category>dalle-3</category><category>sam-altman</category><category>prompt-engineering</category><category>api</category><category>model-performance</category><category>ethics</category><category>role-play</category><category>user-experience</category><category>ai-impact-on-jobs</category><category>ai-translation</category><category>technical-issues</category></item><item><title>12/16/2023: ByteDance suspended by OpenAI</title><link>https://news.smol.ai/issues/23-12-16-ainews-12162023-bytedance-suspended-by-openai/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-16-ainews-12162023-bytedance-suspended-by-openai/</guid><description>The OpenAI Discord community discussed hardware options like **Mac racks** and the **A6000 GPU**, highlighting their value for AI workloads. They compared **Claude 2.1** and **GPT 4 Turbo** on coding tasks, with **GPT 4 Turbo** outperforming Claude 2.1. The benefits of the **Bard API** for **gemini pro** were noted, including a free quota of **60 queries per minute**. Users shared experiences with **ChatGPT Plus** membership issues, payment problems, and speculated about the upcoming **GPT-5** and the rumored **GPT-4.5**. Discussions also covered the confidentiality of the **Alpha feature**, AI art generation policies, and improvements in organizational work features. 
The community expressed mixed feelings about GPT-4&apos;s performance and awaited future model updates.</description><pubDate>Sat, 16 Dec 2023 19:41:52 GMT</pubDate><category>openai</category><category>google-deepmind</category><category>anthropic</category><category>claude-2.1</category><category>gpt-4-turbo</category><category>gemini-1.5-pro</category><category>gpt-5</category><category>gpt-4.5</category><category>gpt-4</category><category>hardware</category><category>gpu</category><category>api-costs</category><category>coding</category><category>model-comparison</category><category>subscription-issues</category><category>payment-processing</category><category>feature-confidentiality</category><category>ai-art-generation</category><category>organizational-productivity</category><category>model-speculation</category></item><item><title>12/15/2023: Mixtral-Instruct beats Gemini Pro (and matches GPT3.5)</title><link>https://news.smol.ai/issues/23-12-15-ainews-12152023-mixtral-instruct-beats-gemini-pro-and-matches-gpt35/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-15-ainews-12152023-mixtral-instruct-beats-gemini-pro-and-matches-gpt35/</guid><description>Thanks to a **karpathy** shoutout, **lmsys** now has enough data to rank **mixtral** and **gemini pro**. The discussion highlights the impressive performance of these state-of-the-art open-source models that can run on laptops. In the **openai** Discord, users compared AI tools like **perplexity** and **chatgpt&apos;s browsing tool**, favoring Perplexity for its superior data gathering, pricing, and usage limits. Interest was shown in AI&apos;s ability to convert large code files with **deepseek coder** recommended. Debates on privacy implications for AI advancement and challenges of running LLMs on local and cloud GPUs were prominent. Users reported issues with **chatgpt** including performance problems, loss of access to custom GPTs, and unauthorized access. 
Discussions also covered prompt engineering for large context windows and speculation about future developments of **gpt-4.5** and **gpt-4**.</description><pubDate>Fri, 15 Dec 2023 22:33:20 GMT</pubDate><category>lmsys</category><category>openai</category><category>deepseek</category><category>cloudflare</category><category>huggingface</category><category>mixtral</category><category>gemini-pro</category><category>gpt-3.5</category><category>gpt-4.5</category><category>gpt-4</category><category>chatgpt</category><category>karpathy</category><category>performance</category><category>context-window</category><category>prompt-engineering</category><category>privacy</category><category>local-gpu</category><category>cloud-gpu</category><category>code-generation</category><category>model-comparison</category><category>model-usage</category><category>api-errors</category></item><item><title>12/14/2023: $1e7 for Superalignment</title><link>https://news.smol.ai/issues/23-12-14-ainews-12142023-dollar1e7-for-superalignment/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-14-ainews-12142023-dollar1e7-for-superalignment/</guid><description>**Jan Leike** is launching a new grant initiative inspired by **Patrick Collison&apos;s Fast Grants** to support AI research. **OpenAI** introduced a new developers Twitter handle @OpenAIDevs for community updates. Discussions on **Google&apos;s Gemini** and **Bard** chatbots highlight their ability to read each other&apos;s instructions and offer unique coding solutions. Users reported various issues with **GPT-4**, including performance problems, customization difficulties, and a resolved bug in image recognition. There are ongoing conversations about **prompt engineering** challenges and new **JSON mode support** in Convo-lang for API use. 
Concerns about misuse of chatbots for illegal activities and alternatives like **Llama2** models and the **Perplexity chatbot** were also discussed.</description><pubDate>Thu, 14 Dec 2023 22:51:28 GMT</pubDate><category>openai</category><category>llamaindex</category><category>perplexity-ai</category><category>gemini</category><category>bard</category><category>gpt-4</category><category>gpt-4.5</category><category>llama-2</category><category>jan-leike</category><category>patrick-collison</category><category>prompt-engineering</category><category>api</category><category>custom-gpt</category><category>json</category><category>bug-fixes</category><category>chatbots</category><category>performance</category><category>tts</category><category>code-generation</category><category>image-recognition</category></item><item><title>12/13/2023 SOLAR10.7B upstages Mistral7B?</title><link>https://news.smol.ai/issues/23-12-13-ainews-12132023-solar107b-upstages-mistral7b/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-13-ainews-12132023-solar107b-upstages-mistral7b/</guid><description>**Upstage** released the **SOLAR-10.7B** model, which uses a novel Depth Up-Scaling technique built on the **llama-2** architecture and integrates **mistral-7b** weights, followed by continued pre-training. The **Nous** community finds it promising but not exceptional. Additionally, weights for the **phi-2** base model were released, trained on **1.4 trillion tokens** including synthetic texts created by GPT-3 and filtered by GPT-4, using **96 A100 GPUs** over 14 days. On **OpenAI&apos;s** Discord, users discussed challenges with various **GPT** models, including incoherent outputs, API usage limitations, and issues with **GPT-4 Vision API**. Conversations also covered understanding **AGI** and **ASI**, concerns about OpenAI&apos;s partnership with Axel Springer, and pricing changes for GPT Plus. 
Discussions included the **Gemini** chat model integrated into Bard and comparisons with GPT-4 performance.</description><pubDate>Wed, 13 Dec 2023 23:29:29 GMT</pubDate><category>upstage</category><category>nous-research</category><category>openai</category><category>mistral-ai</category><category>microsoft</category><category>solar-10.7b</category><category>llama-2</category><category>mistral-7b</category><category>phi-2</category><category>gpt-4</category><category>gemini</category><category>depth-up-scaling</category><category>pretraining</category><category>synthetic-data</category><category>gpu-training</category><category>api-usage</category><category>model-integration</category><category>agi</category><category>asi</category><category>chat-models</category><category>vision</category><category>model-performance</category><category>fine-tuning</category></item><item><title>12/12/2023: Towards LangChain 0.1</title><link>https://news.smol.ai/issues/23-12-12-ainews-12122023-towards-langchain-01/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-12-ainews-12122023-towards-langchain-01/</guid><description>The **Langchain rearchitecture** has been completed, splitting the repo for better maintainability and scalability, while remaining backwards compatible. **Mistral** launched a new Discord community, and **Anthropic** is rumored to be raising another **$3 billion**. On the **OpenAI Discord**, discussions covered **information leakage** in AI training, **mixture of experts (MoE) models** like **mixtral 8x7b**, advanced **prompt engineering techniques**, and issues with **ChatGPT** performance and API access. Users also explored AI applications in **logo generation**, **education**, and **gaming**, and shared solutions for **Oauth2 authentication** problems. 
A new small language model named **Phi-2** was mentioned from **Microsoft**.</description><pubDate>Wed, 13 Dec 2023 03:45:12 GMT</pubDate><category>langchain</category><category>mistral-ai</category><category>anthropic</category><category>openai</category><category>microsoft</category><category>mixtral-8x7b</category><category>phi-2</category><category>gpt-3</category><category>chatgpt</category><category>gpt-4</category><category>mixture-of-experts</category><category>information-leakage</category><category>prompt-engineering</category><category>oauth2</category><category>logo-generation</category><category>education-ai</category><category>gaming-ai</category><category>api-access</category><category>model-maintainability</category><category>scalability</category></item><item><title>12/11/2023: Mixtral beats GPT3.5 and Llama2-70B</title><link>https://news.smol.ai/issues/23-12-11-ainews-12112023-mixtral-beats-gpt35-and-llama2-70b/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-11-ainews-12112023-mixtral-beats-gpt35-and-llama2-70b/</guid><description>**Mistral AI** announced the **Mixtral 8x7B** model featuring a Sparse Mixture of Experts (SMoE) architecture, sparking discussions on its potential to rival **GPT-4**. The community debated GPU hardware options for training and fine-tuning transformer models, including **RTX 4070s**, **A4500**, **RTX 3090s with nvlink**, and **A100 GPUs**. Interest was expressed in fine-tuning Mixtral and generating quantized versions, alongside curating high-quality coding datasets. Resources shared include a YouTube video on open-source model deployment, an Arxiv paper, GitHub repositories, and a blog post on Mixture-of-Experts. 
Discussions also touched on potential open-source releases of **GPT-3.5 Turbo** and **llama-3**, and running **OpenHermes 2.5** on Mac M3 Pro with VRAM considerations.</description><pubDate>Mon, 11 Dec 2023 20:11:07 GMT</pubDate><category>mistral-ai</category><category>openai</category><category>huggingface</category><category>mixtral-8x7b</category><category>gpt-4</category><category>gpt-3.5-turbo</category><category>llama-3</category><category>openhermes-2.5</category><category>llava-v1.5-13b-gptq</category><category>sparse-mixture-of-experts</category><category>fine-tuning</category><category>quantization</category><category>gpu-hardware</category><category>transformers</category><category>model-deployment</category><category>open-source</category><category>coding-datasets</category></item><item><title>12/10/2023: not much happened today</title><link>https://news.smol.ai/issues/23-12-10-ainews-12102023-not-much-happened-today/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-10-ainews-12102023-not-much-happened-today/</guid><description>**Nous Research AI** Discord community discussed attending **NeurIPS** and organizing future AI events in Australia. Highlights include interest in open-source and decentralized AI projects, with **Richard Blythman** seeking co-founders. Users shared projects like **Photo GPT AI** and introduced **StableLM Zephyr 3B**. The **Mixtral** model, based on **Mistral**, sparked debate on performance and GPU requirements, with comparisons to **GPT-3.5** and potential competitiveness with **GPT-4** after fine-tuning. Tools like **Tensorboard**, **Wandb**, and **Llamahub** were noted for fine-tuning and evaluation. Discussions covered **Mixture of Experts (MoE)** architectures, fine-tuning with limited data, and inference optimization strategies for ChatGPT. Memes and community interactions referenced AI figures like **Andrej Karpathy** and **Yann LeCun**. 
The community also shared resources such as GitHub links and YouTube videos related to these models and tools.</description><pubDate>Sun, 10 Dec 2023 23:49:57 GMT</pubDate><category>nous-research</category><category>openai</category><category>mistral-ai</category><category>hugging-face</category><category>ollama</category><category>lm-studio</category><category>mixtral-8x7b-32kseqlen</category><category>mistral-7b</category><category>stablelm-zephyr-3b</category><category>openhermes-2.5-neural-chat-v3-3-slerp</category><category>gpt-3.5</category><category>gpt-4</category><category>andrej-karpathy</category><category>yann-lecun</category><category>richard-blythman</category><category>gabriel-syme</category><category>pradeep1148</category><category>cyborg_1552</category><category>fine-tuning</category><category>mixture-of-experts</category><category>model-benchmarking</category><category>inference-optimization</category><category>model-evaluation</category><category>open-source</category><category>decentralized-ai</category><category>gpu-optimization</category><category>community-engagement</category></item><item><title>12/9/2023: The Mixtral Rush</title><link>https://news.smol.ai/issues/23-12-09-ainews-1292023-the-mixtral-rush/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-09-ainews-1292023-the-mixtral-rush/</guid><description>**Mixtral&apos;s weights** were released without code, prompting the **Disco Research community** and **Fireworks AI** to implement it rapidly. Despite efforts, no significant benchmark improvements were reported, limiting its usefulness for local LLM usage but marking progress for the **small models community**. Discussions in the DiscoResearch Discord covered **Mixtral&apos;s performance** compared to models like **Hermes 2.5** and **Hermes 2**, with evaluations on benchmarks such as **winogrande**, **truthfulqa_mc2**, and **arc_challenge**. 
Technical topics included GPU requirements, multi-GPU setups, and quantization via **GPTQ**. Benchmarking strategies like grammar-based evaluation and chain of thought (CoT) were explored, alongside sampling techniques like Min P and Top P to enhance response stability and creativity. Users also discussed GPTs&apos; learning limitations and the adaptability of models under varying conditions, emphasizing min_p sampling&apos;s role in enabling higher temperature settings for creativity.</description><pubDate>Sat, 09 Dec 2023 23:30:00 GMT</pubDate><category>discoresearch</category><category>fireworks-ai</category><category>hugging-face</category><category>mistral-ai</category><category>mixtral</category><category>hermes-2.5</category><category>hermes-2</category><category>mistral-yarn</category><category>ultrachat</category><category>bjoernp</category><category>the_bloke</category><category>rtyax</category><category>kalomaze</category><category>solbus</category><category>calytrix</category><category>benchmarking</category><category>gpu-requirements</category><category>multi-gpu</category><category>quantization</category><category>gptq</category><category>chain-of-thought</category><category>min-p-sampling</category><category>top-p-sampling</category><category>model-sampling</category><category>model-merging</category><category>model-performance</category><category>small-models</category><category>reasoning-consistency</category><category>temperature-sampling</category></item><item><title>12/8/2023 - Mamba v Mistral v Hyena</title><link>https://news.smol.ai/issues/23-12-08-ainews-1282023-mamba-v-mistral-v-hyena/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-08-ainews-1282023-mamba-v-mistral-v-hyena/</guid><description>Three new AI models are highlighted: **Mistral&apos;s 8x7B MoE model (Mixtral)**, **Mamba models** up to 3B by Together, and **StripedHyena 7B**, a competitive subquadratic attention model from Stanford&apos;s 
Hazy Research. Discussions on **Anthropic&apos;s Claude 2.1** focus on its prompting technique and alignment challenges. The **Gemini AI** from Google is noted as potentially superior to **GPT-4**. The community also explores **Dreambooth** for image training and shares resources like the **DialogRPT-human-vs-machine** model on Hugging Face. Deployment challenges for large language models, including CPU performance and GPU requirements, are discussed with references to **Falcon 180B** and transformer batching techniques. User engagement includes meme sharing and humor.</description><pubDate>Fri, 08 Dec 2023 22:40:04 GMT</pubDate><category>mistral-ai</category><category>togethercompute</category><category>stanford</category><category>anthropic</category><category>google</category><category>hugging-face</category><category>mistral-8x7b-moe</category><category>mamba-3b</category><category>stripedhyena-7b</category><category>claude-2.1</category><category>gemini</category><category>gpt-4</category><category>dialogrpt-human-vs-machine</category><category>cybertron-7b-v2-gguf</category><category>falcon-180b</category><category>andrej-karpathy</category><category>tri-dao</category><category>maxwellandrews</category><category>raddka</category><category>mixture-of-experts</category><category>attention-mechanisms</category><category>prompt-engineering</category><category>alignment</category><category>image-training</category><category>model-deployment</category><category>gpu-requirements</category><category>cpu-performance</category><category>model-inference</category><category>long-context</category><category>model-evaluation</category><category>open-source</category><category>chatbots</category></item><item><title>12/7/2023: Anthropic says &quot;skill issue&quot;</title><link>https://news.smol.ai/issues/23-12-07-ainews-1272023-anthropic-says-skill-issue/</link><guid 
isPermaLink="true">https://news.smol.ai/issues/23-12-07-ainews-1272023-anthropic-says-skill-issue/</guid><description>**Anthropic** fixed a glitch in their **Claude 2.1** model&apos;s needle in a haystack test by adding a prompt. Discussions on **OpenAI&apos;s** Discord compared **Google&apos;s Gemini Pro and Gemini Ultra** models with **OpenAI&apos;s GPT-4** and **GPT-3.5**, with some users finding GPT-4 superior in benchmarks. Rumors about a **GPT-4.5** release circulated without official confirmation. Concerns were raised about &quot;selective censorship&quot; affecting language model performance. The EU&apos;s potential regulation of AI, including **ChatGPT**, was highlighted. Users reported issues with **ChatGPT Plus** message limits and subscription upgrades, and shared experiences with **BingChat** and **DALL-E**. The community discussed prompt engineering techniques and future applications like image generation and MIDI sequence analysis, expressing hopes for **GPT-5**.</description><pubDate>Thu, 07 Dec 2023 20:49:01 GMT</pubDate><category>anthropic</category><category>openai</category><category>google</category><category>claude-2.1</category><category>gpt-4</category><category>gpt-3.5</category><category>gemini-pro</category><category>gemini-ultra</category><category>gpt-4.5</category><category>chatgpt</category><category>bingchat</category><category>dall-e</category><category>gpt-5</category><category>prompt-engineering</category><category>model-performance</category><category>regulation</category><category>language-model-performance</category><category>image-generation</category><category>audio-processing</category><category>midi-sequence-analysis</category><category>subscription-issues</category><category>network-errors</category></item><item><title>Is Google&apos;s Gemini... 
legit?</title><link>https://news.smol.ai/issues/23-12-06-ainews-is-googles-gemini-legit/</link><guid isPermaLink="true">https://news.smol.ai/issues/23-12-06-ainews-is-googles-gemini-legit/</guid><description>**Google&apos;s Gemini** AI model is generating significant discussion and skepticism, especially regarding its **32-shot chain of thought** MMLU claim and **32k context window**. The community is comparing Gemini&apos;s performance and capabilities with **OpenAI&apos;s GPT-4** and **GPT-3.5**, highlighting the upcoming **Gemini Pro** and **Gemini Ultra** models on the Bard platform. Users report various **OpenAI service issues** including chatbot errors and subscription problems. Discussions also cover **prompt engineering techniques**, AI model evaluation comparing **GPT-4**, **Claude 2.1**, and **PaLM2**, and improvements in speech and multimodal capabilities. The bot now supports reading and summarizing links from platforms like arXiv, Twitter, and YouTube, enhancing user interaction.</description><pubDate>Wed, 06 Dec 2023 22:22:18 GMT</pubDate><category>google</category><category>openai</category><category>gemini</category><category>gemini-pro</category><category>gemini-ultra</category><category>gpt-4</category><category>gpt-3.5</category><category>claude-2.1</category><category>palm2</category><category>swyx</category><category>chain-of-thought</category><category>context-windows</category><category>prompt-engineering</category><category>model-evaluation</category><category>multimodality</category><category>speech-processing</category><category>chatbot-errors</category><category>subscription-management</category></item></channel></rss>