World Models are all you need.

AI News for 12/23/2025-12/24/2025. We checked 12 subreddits and 544 Twitters. Estimated reading time saved (at 200wpm): 341 minutes. Our new website is now up with full metadata search and beautiful vibe-coded presentation of all past issues. See https://news.smol.ai/ for the full news breakdowns and give us feedback on @smol_ai!


AI Twitter Recap

Top Story: Yann LeCun’s AMI Labs launches with a $1.03B seed to build world models around JEPA

What happened

Yann LeCun formally unveiled Advanced Machine Intelligence (AMI Labs), a new startup focused on “building real intelligence into the real world,” with an unusually large $1.03B seed round (also cited as €890M) at a reported $3.5B pre-money valuation, described as one of the largest seed rounds ever and likely the largest for a European company. The announcement came directly from LeCun, who said the company had completed “one of the largest seeds ever” and was hiring @ylecun, and from CEO Alex Lebrun, who framed the mission as a “long-term scientific endeavor” to build systems that “truly understand the real world” @lxbrun. Multiple press reports converged on the same core facts: AMI aims to build AI models that understand the physical world and reflects LeCun’s long-running view that human-level AI will come from world modeling rather than scaling language prediction alone @TechCrunch @WIRED @business @Reuters @ZeffMax. The founding and senior team includes LeCun; Alex Lebrun as CEO @lxbrun; Saining Xie as cofounder/CSO @sainingxie; Laurent Solly as COO @laurentsolly; Pascale Fung as Co-Founder and Chief Research & Innovation Officer @pascalefung; plus a wave of prominent founding researchers joining to work specifically on world models, representation learning, pretraining, scaling, and video @sanghyunwoo1219 @jihanyang13 @duchao0726 @zhouxy2017 @jingli9111.

Facts vs. opinions

Facts reported across tweets and coverage

  • A $1.03B seed round (also cited as €890M) at a reported $3.5B pre-money valuation, likely the largest ever for a European company.
  • Founding team: LeCun, Alex Lebrun (CEO), Saining Xie (cofounder/CSO), Laurent Solly (COO), and Pascale Fung (Co-Founder and Chief Research & Innovation Officer), plus founding researchers focused on world models, representation learning, pretraining, scaling, and video.
  • Stated mission: AI models that understand the physical world, reflecting LeCun’s long-running world-modeling thesis.

Opinions and interpretation

  • Supportive view: this is LeCun finally getting the capital and team to prove his long-argued alternative to LLM-centric AI @teortaxesTex.
  • Bullish technical view: world models will be a “huge leap forward,” especially for embodiment/robotics, and AMI’s open-research posture is attractive @mervenoyann @ziv_ravid.
  • Architecture-war framing: some commentators explicitly cast AMI as a bet that the industry is building on the wrong foundation by over-indexing on autoregressive language models @LiorOnAI.
  • Skeptical/neutral view: the key question is not whether world models sound compelling, but whether JEPA-style methods can scale into economically useful systems faster than LLM-centric agents are already commercializing. This skepticism is more implicit than explicit in the tweet set, but appears through “gets a chance to prove his vision” style comments @teortaxesTex.
  • Meta-commentary: AMI is not being framed internally as a “conventional lab” @sainingxie, which suggests an attempt to differentiate from the standard frontier-lab pattern of API-first model scaling.

Technical details: JEPA, world models, and why this is different from next-token LMs

AMI’s public narrative is aligned with LeCun’s JEPA/world-model agenda. The explicit technical details in the tweets are sparse, but the discussion strongly points to the following stack of ideas:

  • World models: latent predictive models of environment dynamics that learn compact state representations and predict future states/outcomes rather than raw sensory streams.
  • JEPA: Joint Embedding Predictive Architecture, introduced by LeCun in 2022, highlighted in commentary as a method that learns abstract representations and predicts in a compressed latent space rather than trying to reconstruct every pixel/token @LiorOnAI.
  • Motivation for JEPA over generative modeling:
    • Real-world sensor streams contain lots of unpredictable or irrelevant entropy.
    • Raw-pixel/video prediction is inefficient because it spends modeling capacity on noise.
    • Predicting latent abstractions may better support planning, controllability, and invariance.
  • Action-conditioned world models: commentary noted the key extension that models should predict consequences of actions, enabling planning before acting @LiorOnAI. That is closer to model-based RL/control than to passive sequence modeling.
  • Target domains repeatedly implied:
    • Robotics / embodied AI @mervenoyann
    • Healthcare and lower-hallucination systems @kimmonismus
    • Industrial process control / safety-critical environments @LiorOnAI
    • More generally, systems that must track persistent state, causality, and action outcomes in the physical world.

This is broadly consistent with LeCun’s longstanding critique of pure autoregressive LLMs:

  1. text prediction alone is not sufficient for grounded understanding,
  2. the world is only partially predictable,
  3. intelligent agents need hierarchical representations and planning in latent space,
  4. data from vision/video/embodiment should dominate long-run AI progress.

Team composition as a technical signal

The founding roster is itself a technical clue. The announced hires cluster around world models, representation learning, pretraining, scaling, and video.

That suggests AMI is likely to emphasize vision/video/self-supervised representation learning, not just append “world model” language to an otherwise standard LLM stack.

Open research posture

Several supportive reactions specifically mentioned hope for open releases/open research @mervenoyann @mervenoyann. That matters because JEPA/world-model work has historically had stronger academic than product traction; openness would help AMI recruit and shape a research ecosystem. But at launch this is still aspiration rather than demonstrated practice.

Different opinions in the reaction set

1) Strongly supportive: “LeCun finally gets to run the experiment”

A sizable share of reactions are essentially relief that LeCun now has a dedicated startup and capital base to validate his worldview.

  • “Yann gets a chance to prove his vision” @teortaxesTex
  • “very bullish… world models will be a huge leap forward” @mervenoyann
  • “super bullish on AMI labs” because of team quality and open research ambition @ziv_ravid
  • “understanding the real world is key to building advanced AI systems” @duchao0726

This camp sees AMI as an overdue counterweight to the current industry equilibrium around autoregressive LMs + RLHF + tool use.

2) Architecture-war framing: “LLMs predict words; AMI wants models of reality”

This view was most explicitly articulated by @LiorOnAI:

  • language models operate over words/tokens,
  • reality is continuous, sensorimotor, and partly unpredictable,
  • generative models overfit to reconstruction,
  • JEPA predicts meaningful abstractions instead.

This is the clearest pro-AMI technical argument in the tweet set. It treats hallucination, brittleness, and lack of grounded planning as symptoms of the wrong training objective, not just insufficient scale.

3) Pragmatic neutral: “Compelling thesis, but now it has to ship”

Some reactions are celebratory but not credulous:

  • “gets a chance to prove his vision” @teortaxesTex
  • “burning question… PyTorch or JAX shop” @giffmana

The latter is not just jokey infrastructure chatter; it reflects a real question about how AMI will operationalize research. A startup attempting novel world-model training at scale must choose an ecosystem optimized for either:

  • fast research iteration and broad hiring familiarity (PyTorch), or
  • aggressive large-scale functional-programming style and SPMD compiler stacks (JAX).

4) Broader simulation/world-model enthusiasm outside AMI

The AMI launch also landed in the middle of a broader discourse where “simulation is the next frontier” was already in the air. Percy Liang argued that the next big opportunity is to “put society into a docker container” via simulation models that can predict what happens in hypothetical real-world scenarios @percyliang. That isn’t about AMI directly, but it reinforces why LeCun’s thesis currently resonates: many researchers increasingly think progress requires moving from token imitation to model-based prediction of environments and interactions.

Context: why this matters now

AMI matters because it is a high-profile, well-capitalized attempt to reopen a question many in industry had tacitly declared settled: is next-token prediction the central path to advanced intelligence, or just a useful but ultimately narrow substrate?

Why the timing is notable

The launch comes when:

  • LLMs and coding agents are commercially successful,
  • multimodal systems are improving fast,
  • robotics/autonomy/world-model language is resurging,
  • and there is growing awareness that benchmark gains in text/code may not directly translate to physical-world competence.

This matters especially because frontier AI discourse lately has been dominated by:

  • agents/harnesses/tool use,
  • reasoning RL,
  • coding automation,
  • and inference infrastructure.

AMI is an explicit bet that the next frontier is grounded representation learning and predictive modeling of the real world, not just better wrappers around text models.

Why LeCun is uniquely positioned

LeCun has spent years publicly arguing:

  • human and animal intelligence is learned from observation and action in the world,
  • language is too low-bandwidth and derivative to be the main training signal,
  • systems need latent-variable world models and planning.

His influence made him one of the most visible skeptics of “LLMs alone get us to AGI.” AMI is therefore not just another startup; it is the most direct institutionalization so far of the anti-token-maximalist view from one of the field’s most prominent figures.

Europe/France implications

Political and institutional reactions in France/Europe were unusually strong:

  • Macron celebrated it as a new page for AI and “la France des chercheurs, des bâtisseurs” @EmmanuelMacron
  • Bpifrance’s Nicolas Dufourcq highlighted French pride in backing a company that could “revolutionize global AI” @NicolasDufourcq

So AMI is also being positioned as a European strategic AI champion, not merely a research startup.

All relevant AMI/world-model tweets and what each adds

  • @TechCrunch: headline confirmation of the $1.03B raise and world-model framing.
  • @BFMTV: French-language mainstream framing of the raise as historic.
  • @WIRED: contextualizes LeCun’s long-running thesis that physical-world mastery, not language alone, is the route to human-level AI.
  • @business: Bloomberg confirmation of the funding magnitude.
  • @iScienceLuvr: adds the $3.5B pre-money valuation figure.
  • @sainingxie: AMI is “not a conventional lab,” and Xie joins as cofounder/CSO.
  • @lxbrun: CEO announcement; mission is long-term scientific effort toward real-world understanding.
  • @ZeffMax: concise summary that AMI is LeCun betting big on world models after years of advocacy.
  • @teortaxesTex: “gets a chance to prove his vision.”
  • @Brian_Bo_Li: “real intelligence into the real world” slogan.
  • @sanghyunwoo1219: joined from day one specifically to work on world models.
  • @laurentsolly: COO announcement; repeats funding and “next AI frontier models.”
  • @mavenlin: enthusiasm from another team member, signaling depth of founding bench.
  • @crystalsssup: notes Saining Xie’s presence as a signal of AMI’s seriousness.
  • @ylecun: official unveiling; “one of the largest seeds ever,” likely largest for a European company.
  • @jihanyang13: founding-team join announcement.
  • @giffmana: asks whether AMI becomes a PyTorch or JAX shop.
  • @France24_fr: French media framing as a “paradigm shift.”
  • @TheRundownAI: short summary of “beyond language models to build world models.”
  • @pascalefung: Fung joins as CRIO; emphasizes “human-centered” AI that perceives, learns, reasons, acts.
  • @EmmanuelMacron: political endorsement and national strategic framing.
  • @franceinter: media amplification around LeCun’s broader claims about jobs and AI transformation.
  • @mervenoyann: bullish on world models as a leap forward for embodied research and likes the open stance.
  • @kimmonismus: adds healthcare/Nabla commercialization angle and hallucination-risk framing.
  • @pascalefung: hiring for Paris team.
  • @zhouxy2017: founding member working on world models.
  • @Reuters: calls AMI an “alternative AI approach.”
  • @NVIDIAAI and related Thinking Machines/NVIDIA posts are not about AMI; omitted from focus.
  • @chris_j_paxton: notes absence of Bay Area in listed locations; suggests geographic differentiation.
  • @giffmana: clarifies Zürich is one of the locations.
  • @lilianweng: “building technologies for better human-AI collaboration on next gen hardware at scale.” Indirect but clearly tied to joining/working with the AMI orbit.
  • @Yuchenj_UW: juxtaposes LeCun’s world-model startup and Meta’s Moltbook acquisition, highlighting the contrast between long-horizon foundational bets and near-term agent/social-product bets.
  • @LiorOnAI: the most explicit technical gloss on JEPA and why latent-space predictive modeling may matter.
  • @sainingxie: appreciation reply; minor but confirms continued engagement.
  • @NandoDF @DrJimFan @denisyarats: peer congratulations; low-information but signal broad respect.

Bottom line

AMI Labs is the strongest institutional challenge yet to the idea that scaling autoregressive language models is the sole or dominant route to AGI. The hard facts are unusually concrete — $1.03B seed, $3.5B pre-money, elite vision/world-model-heavy team, France/Europe strategic backing — while the technical promise remains largely thesis-level for now: JEPA-style latent predictive world models that learn from real-world sensor data and support planning/action without reconstructing every bit of noise. Supporters view it as the overdue next paradigm; neutrals see a high-stakes test of whether LeCun’s critique of LLMs can finally cash out in products and benchmarks; skeptics, even when not stated bluntly, will judge it on whether world models can outcompete rapidly improving LLM agents before the market closes around the current stack.

Other Topics

Agents, coding workflows, and the “builder vs reviewer” shift

  • A broad theme across the timeline is that coding agents are changing software org structure: implementation is no longer the bottleneck; review, architecture, and product judgment are @renilzac @clairevo @dexhorthy. Multiple reactions converged on the framing that engineers increasingly become either builders with product taste or reviewers with systems thinking @radek__w @ZhitaoLi224653.
  • Agent harnesses emerged as a major practical concept: “Agent = Model + Harness,” with filesystems, memory, browsers, routing, orchestration, and sandboxes all part of the real product surface @Vtrivedy10 @techczech @AstasiaMyers @omarsar0.
  • Tooling updates reflected that trend:
    • VS Code Agent Hooks for policy enforcement and workflow guidance @code
    • GitHub/Figma MCP closes design↔code loops @github
    • LangGraph deploy and LangGraph 1.1 simplify productionization @LangChain @sydneyrunkle
    • Together MCP server and Together GPU Clusters add infra for agent-driven app building and scale @togethercompute @togethercompute
    • Ollama scheduled prompts in Claude Code adds simple automation loops @ollama
  • Product reactions were split between enthusiasm and caution.
  • UX matters as much as raw capability: Claude Code/Hermes/OpenClaw users repeatedly noted trust, feedback loops, memory, and interface presentation as key to perceived competence @StudioYorktown @sudoingX @cz_binance.

Benchmarks, evals, and reliability research

  • Cameron Wolfe posted a practical stats thread on making LLM evals more reliable: treat model scores as sample means, estimate standard error as std / sqrt(n), and report 95% confidence intervals as x̄ ± 1.96×SE instead of raw mean-only metrics @cwolferesearch @cwolferesearch.
  • New benchmark work focused on grounding and human validity:
    • Opposite-Narrator Contradictions for sycophancy @LechMazur
    • OfficeQA Pro: enterprise grounded reasoning remains hard, with frontier agents still <50% @kristahopsalong @DbrxMosaicAI
    • SWE-bench Verified appears overstated relative to maintainer reality: maintainers would merge only about half of agent PRs that pass the grader @whitfill_parker @joel_bkr
    • AuditBench introduces 56 LLMs with implanted hidden behaviors for alignment-auditing evaluation @abhayesian
    • CodeClash probes long-horizon coding/planning; top models still fare poorly in sustained agentic adversarial settings @OfirPress @OfirPress
  • Interpretability of reasoning traces continues to be contested: one paper summary claimed 97%+ of “thinking steps” are decorative and CoT monitoring is unreliable @shi_weiyan.
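
The confidence-interval recipe from that eval-reliability thread is a few lines to implement. A minimal sketch using only the standard library (the `score_with_ci` helper and the 400-item example are illustrative, and the ±1.96 interval assumes the normal approximation holds, i.e. reasonably large n):

```python
import math
import statistics

def score_with_ci(scores, z=1.96):
    """Report an eval score as mean with a 95% CI instead of a bare mean.

    Treats per-example scores as i.i.d. samples: SE = stdev / sqrt(n),
    and the 95% interval is mean +/- 1.96 * SE.
    """
    n = len(scores)
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)
    return mean, mean - z * se, mean + z * se

# Example: 0/1 correctness on a 400-item eval (accuracy 0.70).
scores = [1] * 280 + [0] * 120
mean, lo, hi = score_with_ci(scores)
print(f"accuracy = {mean:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```

On a 400-example eval this works out to roughly ±0.045 around a 0.70 accuracy, which is why small single-run benchmark deltas are often within noise.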

Models, infrastructure, and training systems

  • Megatron Core MoE drew strong attention as an open framework for large-scale MoE training, with a claim of 1233 TFLOPS/GPU for DeepSeek-V3-685B @EthanHe_42 @eliebakouch. Commentary suggested DeepSeek-style MoE training efficiency is becoming commoditized @teortaxesTex.
  • Gemini Embedding 2 launched as Google’s first fully multimodal embedding model:
    • single embedding space for text, images, video, audio, docs
    • 8,192-token text inputs
    • 100+ languages
    • output dims 3072 / 1536 / 768 via MRL
    • up to 6 images, 120s video, 6-page PDFs per request @OfficialLoganK @_philschmid @googleaidevs.
  • Hugging Face Storage Buckets launched as S3-like mutable storage built on Xet deduplication, starting at $8/TB/month, positioned for checkpoints, logs, traces, eval outputs, and agent artifacts @victormustar @huggingface @Wauplin.
  • Other notable model/system releases:
    • RWKV-7 G1e in 13B/7B/3B/1B sizes @BlinkDL_AI
    • Hume TADA open-source TTS model: zero content hallucinations across 1,000+ test samples, 5x faster than comparable LLM-TTS, and 2,048 tokens ≈ 700s of audio @hume_ai
    • Phi-4-reasoning-vision-15B highlighted as a compact open multimodal model @dl_weekly
    • Baseten/Harvard prefix-caching collaboration for inference efficiency @chutes_ai
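
The MRL output dims listed for Gemini Embedding 2 (3072 / 1536 / 768) work by truncation: a Matryoshka-trained embedding can be cut to a prefix and re-normalized with relatively little quality loss. A toy sketch of the consumer-side operation (the 8-dim vector stands in for a 3072-dim one, and `mrl_truncate` is a hypothetical helper, not the Gemini API):

```python
import math

def mrl_truncate(embedding, dim):
    """Truncate a Matryoshka-style embedding to its first `dim` values
    and re-normalize to unit length, the usual way MRL embeddings are
    consumed at reduced dimensions (e.g. 3072 -> 1536 -> 768)."""
    head = embedding[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head]

# Toy 8-dim "embedding" standing in for a full-size one.
full = [0.5, 0.5, 0.3, 0.3, 0.1, 0.1, 0.05, 0.05]
half = mrl_truncate(full, 4)

assert len(half) == 4
assert abs(sum(v * v for v in half) - 1.0) < 1e-9  # unit norm preserved
```

The design benefit is storage/latency flexibility: one indexed corpus can serve multiple precision/cost tiers without re-embedding.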

Autonomous research, AlphaGo lineage, and recursive improvement

  • The strongest meta-theme outside AMI was automated ML research:
    • Karpathy’s autoresearch concept — overnight experiment loops with code edits, short training runs, and metric-based keep/discard logic — was widely discussed @NerdyRodent @_philschmid
    • Yuchen Jin ran a Claude-driven “chief scientist” loop for 11+ hours, 568 experiments, on 8 GPUs, observing a progression from broad exploration to focused refinement to heavy validation @Yuchenj_UW
    • Karpathy hinted at AgentHub, “GitHub for agents,” as the next layer for multi-agent research collaboration @karpathy @Yuchenj_UW
  • AlphaGo’s 10-year anniversary triggered many reflections:
    • Demis Hassabis argued AlphaGo’s search-and-planning ideas remain central to AGI and science @demishassabis
    • Google/DeepMind linked AlphaGo to AlphaEvolve and broader compute/science optimization @Google @GoogleDeepMind
    • Noam Brown-style framing that current reasoning models follow the AlphaGo recipe: imitation, inference-time search, then RL @polynoamial
  • Recursive self-improvement discourse remained active:
    • Schmidhuber resurfaced his long-running meta-learning/RSI work @SchmidhuberAI
    • Commentary on unsupervised RLVR suggested naive recursive improvement currently hits ceilings @teortaxesTex
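
The overnight autoresearch loop described above reduces to a simple pattern: propose a config mutation, run a short job, and keep the change only if the metric improves. A toy sketch with a synthetic metric standing in for real training runs (all names here are hypothetical, not Karpathy's or Jin's actual harnesses):

```python
import random

rng = random.Random(0)  # RNG for the search loop itself

def run_short_training(config):
    # Hypothetical stand-in for "edit code, launch a short training run,
    # read back a validation metric". Deterministic given the config.
    noise = random.Random(repr(sorted(config.items()))).uniform(-0.05, 0.05)
    return -((config["lr"] - 0.003) ** 2) * 1e5 + config["width"] * 0.01 + noise

def autoresearch_loop(steps=50):
    best = {"lr": 0.01, "width": 64}
    best_score = run_short_training(best)
    for _ in range(steps):
        # Propose a small mutation of the current best config.
        cand = {
            "lr": max(1e-4, best["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0])),
            "width": rng.choice([64, 128, 256]),
        }
        score = run_short_training(cand)
        if score > best_score:  # metric-based keep/discard
            best, best_score = cand, score
    return best, best_score

best, best_score = autoresearch_loop()
```

The broad-exploration-to-refinement progression Jin observed falls out naturally from this structure: early mutations move the config a lot, and later ones only survive if they beat an already-good baseline.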

Capability milestones, applications, and deployment

  • One of the most striking capability claims: a possible AI-assisted resolution of a FrontierMath open problem, first from users claiming GPT-5.4 Pro solved it and later from observers noting this could be the first FrontierMath open problem solved by AI if validated @spicey_lemonade @kevinweil @GregHBurnham @AcerFur.
  • Google reported a prospective clinical study of AMIE in urgent care workflows: blinded evaluation found similar differential-diagnosis and management-plan quality overall versus PCPs, but PCPs outperformed on practicality and cost effectiveness (p=0.003, p=0.004) @iScienceLuvr.
  • Google Sheets with Gemini reached 70.48% on SpreadsheetBench, described as near human-expert ability @GoogleAI.
  • Google Workspace/Gemini rollout expanded across Docs, Sheets, Slides, and Drive, with claims of Sheets tasks 9x faster, AI-generated slide layouts, and Drive-level cross-document answers @Google @sundarpichai.
  • Microsoft reported health as the #1 topic for Copilot mobile users in 2025, based on analysis of 500k+ conversations @mustafasuleyman.
  • Sharon Zhou claimed superhuman performance on AI kernel optimization in production settings, suggesting automatic GPU-porting/optimization may soon be practical @realSharonZhou.

AI Reddit Recap

/r/LocalLlama + /r/localLLM Recap

1. Qwen Model Releases and Benchmarks

  • Qwen3.5-35B-A3B Uncensored (Aggressive) — GGUF Release (Activity: 337): Qwen3.5-35B-A3B Aggressive, released on Hugging Face, is a fully uncensored version of the Qwen model, featuring 0/465 refusals while claiming full capability retention without personality alterations. The release includes various quantization formats such as BF16, Q8_0, Q6_K, and others, and supports multimodal inputs (text, image, video) with a 262K context length. It employs a hybrid attention mechanism combining Gated DeltaNet and softmax in a 3:1 ratio. The model is designed for high performance with 256 experts in a mixture of experts (MoE) configuration, activating 8+1 per token. Users are advised to use specific sampling parameters and the --jinja flag with llama.cpp for optimal performance. A notable comment inquires about the technique used for uncensoring the model, indicating a technical interest in the process behind the model’s modifications.

    • The uncensoring process for models like Qwen3.5-35B-A3B is a topic of interest, with users questioning the techniques used. One user, guiopen, specifically asks about the method employed to achieve this uncensoring, indicating a desire for transparency in the process.
    • Velocita84 raises a critical point about the need for evaluating Kullback-Leibler Divergence (KLD) to substantiate claims of ‘no capability loss’ in the uncensored model. This suggests a demand for rigorous statistical validation to ensure that the model’s performance remains intact post-modification.
    • Long_comment_san inquires about the complexity of the uncensoring process, questioning whether it involves a standard procedure applicable across different models or if it requires specific adjustments for each architecture. This highlights a curiosity about the technical challenges and methodologies involved in uncensoring AI models.
  • Qwen 3.5 0.8B - small enough to run on a watch. Cool enough to play DOOM. (Activity: 635): The post describes a VLM agent that uses the Qwen 3.5 0.8B model, small enough to run on a smartwatch, to play DOOM via VizDoom. The agent takes a simple approach: it captures a screenshot, overlays a grid, and calls the vision model with ‘shoot’ and ‘move’ tools to decide actions. Despite its small size, the model performs well in basic scenarios, achieving kills by selecting the correct column to shoot. However, it struggles with ammo conservation in more complex scenarios like ‘defend_the_center’. The setup involves Python, VizDoom, and HTTP calls to LM Studio, with a latency of 10 seconds per step on an M1 Mac. The author is working on improving ammo conservation by adding a ‘reason’ field to tool calls. Commenters highlight the novelty and potential of using such a small model for gaming, with one noting the existence of benchmark harnesses for models playing DOOM, and another expressing interest in testing similar setups with other models and games.

    • ethereal_intellect discusses potential performance metrics for the Qwen 3.5 0.8B model, mentioning plans to connect it to typing games like Typing of the Dead and Monkeytype to measure words per minute (WPM) and frames per second (FPS). They note a preliminary test using LMStudio on their GPU, which showed a 0.16ms time to first token, indicating a potentially fast processing loop.
    • mitchins-au references existing benchmark harnesses for models playing DOOM, suggesting that there are established methods to evaluate such capabilities. This implies that the Qwen 3.5 0.8B model could be assessed using these benchmarks to determine its gaming performance.
    • No_Swimming6548 raises the question of whether the Qwen 3.5 0.8B model can run in real-time on a high-end GPU, hinting at the potential for real-time gaming applications if the model’s performance is sufficient.
  • Qwen-3.5-27B-Derestricted (Activity: 401): The Qwen-3.5-27B-Derestricted model, available on Hugging Face, is a large language model with 28 billion parameters, supporting BF16 and F32 tensor types. It has been downloaded 95 times in the last month but is not yet deployed by any inference providers. Users have noted its uncensored nature, allowing it to respond to a wide range of queries without restriction. The model is still in experimental stages, with ongoing evaluations of its coherence and intelligence, as noted by its performance on the UGI Leaderboard. Commenters have expressed surprise at the model’s lack of censorship, with one user noting it responded to controversial queries without hesitation. Another commenter, who is the model’s creator, is seeking feedback as they continue to experiment with the Qwen 3.5 models. The model’s potential is highlighted by its past performance in coherence and intelligence metrics.

    • The Qwen-3.5-27B model, particularly in its ‘uncensored’ form, has sparked interest due to its ability to respond to controversial or sensitive queries without hesitation. This raises questions about the balance between model capability and ethical constraints, as noted by users who are testing its limits on platforms like Hugging Face.
    • The model’s creator, Arli_AI, is actively seeking feedback on the Qwen 3.5 models, particularly the 27B variant, which is still in its experimental phase. The absence of a formal model card suggests that the model is in a preliminary stage, and user feedback is crucial for its development.
    • There is a discussion about the terminology used to describe different levels of model restriction, such as ‘uncensored’, ‘abliterated’, ‘derestricted’, and ‘heretic’. These terms indicate varying degrees of model freedom in generating responses, with ‘Heretic 1.2’ being mentioned as a more advanced version that potentially supersedes ‘Derestricted’ features. This highlights the evolving nature of model customization and the community’s interest in understanding these distinctions.
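
The KLD evaluation Velocita84 asks for is straightforward to state: compare the original and modified models' next-token distributions on held-out text and report the mean KL divergence, where values near zero support a "no capability loss" claim. A minimal sketch with toy 4-token distributions (not tied to any specific model or uncensoring method):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats between two next-token distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mean_kld(orig_dists, mod_dists):
    # Average per-position KLD over a held-out corpus: near zero supports
    # a "no capability loss" claim; large values flag behavioral drift.
    klds = [kl_divergence(p, q) for p, q in zip(orig_dists, mod_dists)]
    return sum(klds) / len(klds)

# Toy next-token distributions over a 4-token vocabulary.
original = [[0.7, 0.2, 0.05, 0.05], [0.1, 0.6, 0.2, 0.1]]
identical = [d[:] for d in original]
shifted = [[0.4, 0.4, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]

assert mean_kld(original, identical) < 1e-6   # unchanged model
assert mean_kld(original, shifted) > 0.1      # noticeable drift
```

In practice this is run over many positions of benign (non-refusal) text, since the whole point is to verify the modification only changed refusal behavior.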

2. Local LLM Experiences and Challenges

  • I regret ever finding LocalLLaMA (Activity: 498): The post describes a journey into the world of local Large Language Models (LLMs), starting with using AI for study aids and evolving into a deep dive into local AI technologies like LocalLLaMA and LM Studio. The user discusses acquiring hardware like MI50s, experimenting with quantization, and exploring various AI models such as Qwen and Gemini. The narrative highlights the technical complexity and personal fascination with local AI, despite its niche appeal outside engineering circles. A commenter from a major AI company notes that local AI is not widely appreciated outside engineering, likening its potential impact to Linux in computing. Another commenter views the pursuit of knowledge through local AI as a positive addiction.

  • How I topped the Open LLM Leaderboard using 2x 4090 GPUs — no weights modified. (Activity: 535): The post describes a novel approach to improving the performance of the Qwen2-72B model on the Open LLM Leaderboard by duplicating a specific block of 7 middle layers without modifying any weights. This method, developed using 2x RTX 4090 GPUs, led to the model achieving the top position on the leaderboard. The author notes that duplicating a single layer or too many layers does not yield the same results, suggesting that pretraining creates discrete functional circuits that must be preserved as a whole. The full technical details are available in a blog post. One commenter noted that they often found specific groups of contiguous layers around the middle of a model to be most effective, aligning with the author’s finding of a ‘middle reasoning cortex.’ This suggests a shared understanding of the importance of preserving certain layer structures in neural networks.

    • Arli_AI discusses a technique of manually ablating layers in neural networks, noting that contiguous layers in the middle often perform best, which aligns with the original post’s finding of a ‘middle reasoning cortex’. This suggests a potential area of focus for optimizing neural network performance by targeting specific layers for duplication or modification.
    • sean_hash raises a critical point about the effectiveness of layer duplication compared to fine-tuning, suggesting that the success of layer duplication might highlight deficiencies in the training of base models. This implies that current training methodologies might not fully leverage the potential of neural network architectures.
    • Hanthunius inquires about the implementation details of layer duplication, questioning whether the layers were pre-duplicated in memory or if an extra loop was used at runtime. The latter approach could offer more flexibility and efficiency by avoiding memory duplication and allowing for automated testing.
  • Anyone else feel like an outsider when AI comes up with family and friends? (Activity: 653): The post discusses the disconnect between technical experts in AI and the general public’s perception, often shaped by sensational headlines. The author, who works with AI models, finds it challenging to bridge this gap without coming across as overly defensive or dismissive. They note that non-technical people often view AI negatively, citing concerns about creativity, hype, and trust, which are not always based on a deep understanding of the technology. Commenters highlight a range of perspectives, from those who see AI as a threat to jobs and creativity, to others who are overly trusting of AI tools like ChatGPT. Some suggest that the media’s sensationalism contributes to these polarized views, and that conversations about AI should consider the emotional and social needs of the participants rather than focusing solely on technical accuracy.

    • The comment by ttkciar highlights a common issue in AI discussions: the gap between public perception and technical reality. The commenter notes that many people, including their spouse, form opinions based on sensationalist media rather than technical facts. They also discuss the concept of an ‘AI Winter,’ emphasizing that it is driven by attitudes and funding rather than technological capability. This reflects a broader misunderstanding about AI’s potential and limitations, which is often exacerbated by media narratives.
    • Krowken’s comment addresses several technical and societal concerns related to AI, such as the fear of job displacement due to AI automation, cognitive offloading in educational contexts, and the proliferation of AI-generated content. They also mention the rising cost of RAM, which impacts the affordability of home computing, and the ethical issues surrounding AI, such as deep fake pornography and the misuse of chatbots as therapy replacements. Despite these concerns, the commenter acknowledges the utility of large language models for specific tasks, illustrating the nuanced view many hold about AI’s role in society.
    • Heavy-Focus-1964 emphasizes the importance of communication skills when discussing complex topics like AI. They suggest that effectively conveying technical knowledge without alienating others is a valuable skill, especially in discussions where there is a significant knowledge gap between parties. This comment underscores the challenge of bridging the divide between technical experts and the general public in conversations about AI.
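
The layer-duplication trick from the Open LLM Leaderboard post above is mechanically simple, which is part of what makes the result interesting: a contiguous middle block of layers is reused without touching any weights. A toy sketch with plain functions standing in for transformer blocks (the actual experiment duplicated 7 middle layers of Qwen2-72B; the 10-layer stack and block choice here are illustrative):

```python
def duplicate_middle_block(layers, start, length):
    """Return a layer stack with layers[start:start+length] run twice.
    The duplicated entries are the *same* objects, so no weights change."""
    block = layers[start:start + length]
    return layers[:start + length] + block + layers[start + length:]

def forward(layers, x):
    # Sequentially apply each "layer" to the running state.
    for layer in layers:
        x = layer(x)
    return x

# Toy 10-"layer" model: each layer just adds its index to a scalar state.
layers = [lambda x, i=i: x + i for i in range(10)]
expanded = duplicate_middle_block(layers, start=3, length=4)

assert len(expanded) == 14
assert expanded[3] is expanded[7]  # shared objects: weights untouched
```

This also makes Hanthunius's implementation question concrete: the sketch shares layer objects (an extra loop over the same weights at runtime) rather than copying them in memory.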

3. AI Hardware and Performance Discussions

  • Genuinely curious what doors the M5 Ultra will open (Activity: 591): The image provides a detailed comparison of various GPUs, including data center, consumer, workstation, and Apple Silicon models, with a focus on specifications such as VRAM, memory type, bus width, and bandwidth. The Apple M5 Ultra is highlighted for its bandwidth of 819 GB/s using LPDDR5X memory, suggesting significant performance improvements. This comparison indicates that the M5 Ultra’s bandwidth is catching up with high-end GPUs, potentially making larger models feasible to run locally. The discussion also speculates that the M5 Ultra could ultimately reach ~1200GB/sec bandwidth, positioning it competitively against upcoming GPUs like the 5090. Commenters note the absence of the RTX 6000 PRO Blackwell with 96GB VRAM as a gap in the comparison, and joke about the M5 Ultra’s likely steep price.

    • TokenRingAI highlights the potential memory speed of the M5 Ultra, estimating around 1200GB/sec, which would position it just below the performance of the 5090. This suggests significant improvements in data throughput, potentially enhancing tasks that require high memory bandwidth.
    • sine120 discusses the limitations of current DRAM configurations, emphasizing the need for at least 128GB of unified memory for high-performance tasks. They mention models like Qwen3.5-122B and Coder-Next as suitable for a 128GB M5 Max, but note the high cost, which could be 3-4x more than a typical gaming rig, making it hard to justify for portable development.
    • false79 points out the absence of the RTX 6000 PRO Blackwell with 96GB VRAM in the discussion, which could be a significant competitor in terms of memory capacity and performance, especially for professional and high-end computing tasks.
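A bit of context on why bandwidth dominates these comparisons: single-stream LLM decoding is typically memory-bandwidth-bound, so a rough ceiling on tokens/sec falls out of simple arithmetic. A minimal sketch (the 819 GB/s and 4-bit figures are illustrative, not benchmarks):

```python
# Rough ceiling on single-stream decode speed for a memory-bandwidth-bound
# LLM: each generated token streams all model weights from memory once,
# so tokens/sec ≈ bandwidth / model size. Ignores KV cache and overhead.

def est_tokens_per_sec(bandwidth_gbs: float, params_b: float, bits_per_weight: int) -> float:
    model_gb = params_b * bits_per_weight / 8  # weight footprint in GB
    return bandwidth_gbs / model_gb

# A 70B model quantized to 4 bits (~35 GB) on 819 GB/s memory:
ceiling = est_tokens_per_sec(819, 70, 4)  # best-case tokens/sec
```

At the speculated ~1200 GB/s, the same model’s ceiling rises from roughly 23 to roughly 34 tokens/sec, which is why commenters treat bandwidth as the headline spec.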
  • Happy birthday, llama.cpp! (Activity: 243): The post celebrates the anniversary of llama.cpp, a project that began with the leak of Meta’s original LLaMA models. Initially, the project struggled with performance, achieving only a few tokens per second. However, it has since evolved significantly, supporting advanced features like sub-7B models, >200k context, and fine-tuning capabilities. The project owes much of its success to contributions from Georgi and others, with notable advancements in quantization and C++ rewrites. For more technical details, see the original commit. A comment highlights the importance of quantization work over the C++ rewrite in enabling the running of 70B models at conversational speed on a Mac Mini, emphasizing the technical achievements of the llama.cpp team.

    • sean_hash highlights the significant progress made by llama.cpp, noting that it took three years from Georgi’s first commit to achieve the capability of running 70B models at conversational speed on a Mac Mini. The comment emphasizes that while the C++ rewrite is often credited, the quantization work was more crucial in this advancement.
    • Kornelius20 reflects on the impact of llama.cpp on their career, mentioning how they started by torrenting models on a university workstation. This experience was pivotal, suggesting that the accessibility of models through llama.cpp has been instrumental in shaping their current work in the field.
    • Weak_Engine_8501 expresses gratitude for the innovations brought by llama.cpp, particularly in enabling models to run on local hardware. This comment underscores the importance of llama.cpp in making local LLMs more accessible and practical for individual use, highlighting its role in personal and professional development.
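The quantization work the commenters credit boils down to storing each block of weights as low-bit integers plus one shared scale. A toy sketch of the idea (real llama.cpp formats such as Q4_K are considerably more elaborate, with block layouts and k-quants):

```python
# Toy symmetric block quantization: 4-bit signed ints in [-7, 7] plus one
# shared fp scale per block. This is the gist of Q4-style formats,
# heavily simplified.

def quantize_q4(block: list[float]) -> tuple[list[int], float]:
    amax = max(abs(x) for x in block)
    scale = amax / 7.0 if amax > 0 else 1.0
    q = [max(-7, min(7, round(x / scale))) for x in block]
    return q, scale

def dequantize_q4(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.07]
q, s = quantize_q4(weights)
restored = dequantize_q4(q, s)
# 4 bits + one shared scale vs 16 bits per weight is roughly 4x smaller:
# the difference between 140 GB (fp16) and ~35 GB for a 70B model.
```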
  • karpathy / autoresearch (Activity: 396): Andrej Karpathy introduces a novel approach to AI research automation with the ‘autoresearch’ framework, which allows AI agents to autonomously modify and test LLM training setups overnight. This process involves a simplified single-GPU implementation of nanochat, where agents iteratively adjust code, train for short periods, and evaluate improvements. The innovation lies in using program.md files to guide AI agents, marking a shift from traditional Python file manipulation to programming in natural language documents, which could redefine research workflows. Commenters express skepticism about the novelty and impact of Karpathy’s approach, with some suggesting it lacks significant architectural innovation and others highlighting the potential of the program.md pattern as a paradigm shift in research strategy.

    • The discussion highlights a critique of Andrej Karpathy’s approach, suggesting that his focus on transformers and AGI might be limiting. A commenter recommends exploring GraphMERT as a potential architecture to replace transformers, indicating a belief that neurosymbolic methods could offer a more promising direction for AI research.
    • A key point raised is the innovative use of the program.md pattern, where research strategies are encoded in markdown files for agents to interpret and execute. This approach is seen as a potential paradigm shift in AI research, emphasizing the importance of ‘programming in natural language docs’ over traditional automation loops.
    • The comment on the nanochat leaderboard points out that while agent-driven changes like rope adjustments resemble Bayesian optimization, the real challenge lies in scaling from small, quick experiments to large-scale, resource-intensive model training. The bottleneck remains in compute power, highlighting the difficulty in generalizing from small-scale to large-scale AI models.

Less Technical AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo

1. Autonomous AI Research and Development

  • Andrej Karpathy’s Newest Development - Autonomously Improving Agentic Swarm Is Now Operational (Activity: 515): Andrej Karpathy has shared a significant development in AI training: an autonomously improving agentic swarm that made 20 changes to reduce a model’s validation loss, yielding an 11% improvement in the “Time to GPT-2” metric. The system autonomously executed the “try → measure → think → try again” loop, cutting the training time needed to reach GPT-2 level from 2.02 hours to 1.80 hours without manual intervention. This showcases the potential of autonomous systems for optimizing neural network training, as detailed in Karpathy’s nanochat project. Commenters are intrigued by the autonomous nature of the system, with one noting it as the first instance of an AI independently executing the research loop and outperforming manual tuning. Another comment humorously questions whether this marks the beginning of the singularity era.

    • SECONDLANDING highlights a significant achievement where an AI agent autonomously improved a model’s training efficiency by approximately 11%, reducing the time from 2.02 hours to 1.80 hours to reach GPT-2 level performance. This was achieved through a self-directed research loop involving iterative testing and optimization, marking a notable instance of AI surpassing manual tuning efforts. The project is detailed on GitHub.
    • Worldly_Expression43 shares a similar experience with Opus 4.6, which autonomously optimized a Retrieval-Augmented Generation (RAG) pipeline using pgvector. The AI evaluated multiple chunking strategies, tested six different methods, and provided a solution that was three times faster than the original approach. This underscores the potential of AI in self-benchmarking and optimization, leading to significant performance improvements.
  • Andrej Karpathy’s “autoresearch”: An autonomous loop where AI edits PyTorch, runs 5-min training experiments, and continuously lowers its own val_bpb. “Who knew early singularity could be this fun? :)” (Activity: 839): Andrej Karpathy has introduced an autonomous research framework called “autoresearch,” which automates the process of optimizing neural network architectures, optimizers, and hyperparameters using a loop that runs 5-minute training experiments. The system operates on a git feature branch, autonomously making commits as it discovers improved settings, effectively lowering validation bits per byte (val_bpb). This approach allows for continuous, unsupervised research progress, with each dot in the visual representation indicating a complete LLM training run. The project is implemented in a minimal repository designed for single-GPU use, consisting of approximately 630 lines of code. For more details, see the original post. Commenters highlight the potential of this approach to remove humans from the loop in large-scale model training, suggesting it could lead to significant advancements beyond current transformer limitations. Tobi Lutke shares a personal experiment where the system improved a model’s score by 19% overnight, demonstrating its potential for rapid, autonomous optimization.

    • Kaarssteun shares an experience where an AI autonomously improved a model’s performance by 19% on a smaller 0.8b model compared to a previous 1.6b model after 37 experiments in 8 hours. This highlights the potential of AI-driven research to optimize models efficiently, even for non-experts, by iteratively reasoning through experiments and improving model quality and speed.
    • Karpathy confirms that the improvements found by the autoresearch system on a depth 12 model successfully transferred to a depth 24 model, suggesting that the system’s optimizations are scalable. This resulted in a new leaderboard entry for ‘time to GPT-2’, demonstrating the system’s capability to enhance model performance across different scales.
    • Alarming_Bluebird648 discusses the implementation of recursive optimization through an AI agent managing its own git branch to reduce validation bits per byte (val_bpb) in nanochat runs. This approach is seen as a pathway to overcoming current transformer bottlenecks by potentially scaling these loops to full architecture search.
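The “try → measure → think → try again” loop described above can be sketched as a simple hill-climbing harness. Everything below is a hypothetical stand-in (toy objective, made-up learning-rate values), not Karpathy’s actual nanochat code:

```python
# Hypothetical sketch of the autoresearch loop. train_short and
# propose_change are toy stand-ins: the fake objective below simply
# pretends lr = 3e-3 is optimal.
import random

def train_short(config: dict) -> float:
    """Stand-in for a 5-minute training run; returns a val metric (lower is better)."""
    return 1.0 + abs(config["lr"] - 3e-3) * 100

def propose_change(best: dict, rng: random.Random) -> dict:
    """Stand-in for the agent's 'think' step: perturb the best known config."""
    return {**best, "lr": best["lr"] * rng.choice([0.7, 1.0, 1.3])}

rng = random.Random(0)
best_cfg = {"lr": 1e-3}
best_val = train_short(best_cfg)                # baseline measurement
for _ in range(30):
    candidate = propose_change(best_cfg, rng)   # try
    val = train_short(candidate)                # measure
    if val < best_val:                          # keep only improvements
        best_cfg, best_val = candidate, val     # (a real agent would commit to git here)
```

Each loop iteration corresponds to one dot in the plot; the open question raised in the comments is whether improvements found at this cheap small scale transfer to expensive large-scale runs.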
  • Yann LeCun unveils his new startup Advanced Machine Intelligence (AMI Labs) — and raises $1.03B (Activity: 836): Yann LeCun has co-founded a new startup, Advanced Machine Intelligence (AMI) Labs, with Alexandre Lebrun. The company has raised $1.03 billion to develop world models using LeCun’s JEPA architecture, which aims to model physical reality rather than just text, addressing limitations of current LLMs like hallucination issues. This initiative is positioned as a long-term research project with no immediate product or revenue expectations, and all code and papers will be open source. Notable investors include NVIDIA, Samsung, and Bezos Expeditions. TechCrunch Commenters express admiration for LeCun’s realistic approach to AI, noting his reputation for honesty about AI’s capabilities. There is optimism about the potential outcomes of his independent venture.

    • Yann LeCun’s new startup is reportedly seeking a valuation of over $5 billion. The rest of the top comment is a joke about the leadership roster: Lebrun as CEO, “LeFunde” as CFO, “LeTune” as head of post-training, with “LeMune” considered for growth and “LePrune” for inference efficiency, all puns on LeCun’s name rather than actual hires. TechCrunch provides more details on the funding.
    • The substantial $1.03 billion funding for AMI Labs suggests a significant investment in developing advanced AI models, potentially focusing on world models. This level of funding indicates confidence in LeCun’s vision and the potential for groundbreaking advancements in AI technology, particularly in areas that require large-scale computational resources and innovative model architectures.
    • Yann LeCun’s approach with AMI Labs is noted for its emphasis on realistic AI capabilities, avoiding the overhype seen in other sectors. His leadership is expected to bring a balanced perspective to AI development, focusing on practical and scalable solutions rather than speculative technologies. This aligns with his reputation for providing honest insights into the field’s current state and future potential.
  • Figure robot autonomously cleaning living room (Activity: 1758): Figure AI has demonstrated their humanoid robot, Helix 02, autonomously cleaning a living room, showcasing advancements in robotic motion and decision-making. The robot uses various body parts to manipulate objects, such as pushing toys into a basket using gravity, and turning off a TV with a remote, indicating an improved understanding of physical interactions. The demonstration highlights the robot’s ability to perform tasks with less intermediate processing, suggesting enhanced efficiency and fluidity in its operations. Source. Commenters are impressed by the robot’s human-like movements and decision-making, though they note the need for transparency regarding the level of autonomy versus pre-programmed actions. The discussion emphasizes the importance of understanding how abstract the instructions were and the robot’s ability to generalize tasks.

    • The robot’s ability to use different body parts for tasks, such as using gravity to drop toys into a basket, demonstrates an improved understanding of physical interactions. This suggests advancements in AI’s comprehension of the physical world, though it still requires optimization in task execution, like ensuring complete surface cleaning.
    • A key point of discussion is the level of abstraction in the robot’s instructions. The debate centers on whether the robot autonomously deduced tasks from a general command like ‘tidy up the room’ or if each action was pre-programmed. Transparency in these processes is crucial for evaluating AI progress.
    • The robot’s performance is noted for its practical utility compared to previous demonstrations focused on entertainment, such as backflips. This shift towards functional tasks highlights a trend in robotics towards more practical applications, though the current capabilities are still limited in complex, real-world scenarios.
  • 800,000 human brain cells, in a dish, learned to play a video game (Activity: 2605): Researchers have successfully cultivated 800,000 human brain cells in vitro, which have demonstrated the ability to learn and play the video game Pong. This experiment, conducted by Cortical Labs, showcases the potential of ‘DishBrain’ technology, where neurons are integrated with a computer chip to create a biological-computational interface. The neurons were able to adapt and improve their gameplay over time, suggesting a form of rudimentary learning and decision-making capability. This research could have significant implications for understanding neural networks and developing advanced AI systems. The comments reflect a mix of humor and philosophical intrigue, with some users noting the ethical and existential questions raised by such experiments, particularly concerning the nature of consciousness and the definition of humanity.

2. Claude Code Review and Features

  • Introducing Code Review, a new feature for Claude Code. (Activity: 819): Anthropic has introduced a new feature called Code Review for their Claude Code platform, aimed at Team and Enterprise users. This feature is designed to address the bottleneck in code reviews by providing deep, multi-agent reviews that catch bugs often missed by human reviewers. Internally, it has increased substantive review comments on PRs from 16% to 54%, with less than 1% of findings marked incorrect. On large PRs, 84% surface findings with an average of 7.5 issues. The reviews are thorough, taking about 20 minutes and costing $15–25, focusing on depth rather than speed. It assists human reviewers but does not approve PRs autonomously. More details can be found here. Commenters noted the high cost and time of the reviews, suggesting the feature is targeted at enterprises. There is also a humorous acknowledgment of the feature’s internal testing, with a nod to the persistence of human reviewers.

    • The introduction of Code Review by Claude Code is seen as targeting enterprise users due to its cost structure, with reviews averaging ~20 minutes and costing ~$15–25. This pricing model suggests a focus on depth over speed, which may not be suitable for smaller companies or startups that handle multiple pull requests daily, as highlighted by a user estimating costs of ~$300 per day for 10-15 PRs.
    • A user points out that despite the introduction of Code Review, the necessity for a human reviewer remains, which limits the cost-saving potential of the feature. This suggests that while the tool may assist in the review process, it does not replace the need for human oversight, making it less appealing for startups looking to reduce operational costs.
    • The feature has been in internal use at Anthropic for several months, as noted by a user referencing the company’s status page. This indicates a period of testing and refinement before public release, which could imply a focus on ensuring reliability and effectiveness in enterprise environments.
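The per-day cost concern raised in the thread is easy to reproduce. A quick sketch using only the ranges quoted by commenters (thread numbers, not official pricing):

```python
# Back-of-envelope reproduction of the cost estimate from the thread.
# All figures are the ranges quoted by commenters, not official pricing.

def daily_review_cost(prs_per_day: int, cost_per_pr: float) -> float:
    return prs_per_day * cost_per_pr

low = daily_review_cost(10, 15)    # $150/day at the cheap end
high = daily_review_cost(15, 25)   # $375/day at the expensive end
# The commenter's ~$300/day estimate for 10-15 PRs sits inside this range.
```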
  • Bringing Code Review to Claude Code (Activity: 457): Claude Code has introduced a new feature called Code Review, which utilizes a team of agents to identify and rank bugs in pull requests (PRs) by severity. This feature provides a high-signal summary comment and inline flags for detected issues. It is currently available as a research preview in beta for Team and Enterprise users, with costs averaging $15–25 per PR, depending on the size and complexity, and is billed based on token usage. More details can be found in their blog post. Some users express concerns about the cost, noting that $15–25 per PR is high compared to built-in solutions like Codex plans. Others suggest that similar reviews could be conducted manually using existing skills and agents.

    • The cost of using Claude Code’s review feature is a significant point of contention, with users highlighting that it ranges from $15 to $25 per pull request (PR), depending on the size and complexity. This is seen as a disadvantage compared to Codex, which includes similar functionality in its plan at no additional cost. The pricing model is based on token usage, which can make it less predictable and potentially more expensive for larger projects.
    • The Claude Code GitHub Action is mentioned as a lighter-weight and potentially more cost-effective alternative to the full Code Review service. This option might be more suitable for users who do not require the depth of a full review and are looking to manage costs more effectively. The GitHub Action is designed to integrate seamlessly with existing workflows, providing a balance between functionality and cost.
    • Some users express skepticism about the value proposition of Claude Code’s review service, suggesting that similar results can be achieved through personal expertise and the use of agents. This perspective highlights a preference for leveraging existing skills and tools over investing in a potentially costly service, especially when alternatives like Codex offer integrated solutions without additional fees.
  • Introducing Code Review, a new feature for Claude Code. (Activity: 891): Anthropic has introduced a new feature called Code Review for their Claude Code platform, currently in research preview for Team and Enterprise users. This feature aims to address the bottleneck in code reviews by providing deep, multi-agent reviews that catch bugs often missed by human reviewers. Internally, it has increased substantive review comments on PRs from 16% to 54%, with less than 1% of findings marked incorrect by engineers. On large PRs (1,000+ lines), 84% surface findings, averaging 7.5 issues per review. The reviews are designed for depth, taking approximately 20 minutes and costing $15–25, which is more expensive than lightweight scans but aims to prevent costly production incidents. More details can be found here. Commenters express concerns about the cost of $15-25 per review, considering it steep compared to custom automated solutions that provide feedback faster and cheaper. Some see it as an expensive option for teams unable to customize their setups.

    • SeaworthySamus highlights the potential for cost savings and efficiency by using custom slash commands for automated pull request reviews. These commands can be tailored to specific scopes and coding standards, offering feedback more quickly and at a lower cost than the $15-25 per review suggested by the new feature. This approach may be more suitable for teams that can customize their setups, as opposed to using a potentially expensive out-of-the-box solution.
    • spenpal_dev questions the differentiation between the new Code Review feature and the existing /review command. This suggests a need for clarification on what additional value or functionality the new feature provides over existing tools, which could influence its perceived value and adoption.
    • ryami333 points out a lack of responsiveness from maintainers on a highly upvoted issue in the GitHub repository, specifically issue #6235. This comment underscores a potential disconnect between user feedback and development priorities, suggesting that addressing user-reported issues could be more beneficial than introducing new features.
  • I used Claude Code to build a USB dongle that auto-plays Chrome Dino — no drivers, no host software, just a $2 board and two light sensors (Activity: 653): The post describes a project using an ATtiny85 USB dongle to automate playing the Chrome Dino game. The device functions as a USB HID keyboard, requiring no additional software, and uses two LDR sensors to detect obstacles and send jump/duck commands. The firmware, written in bare-metal C using avr-gcc, integrates a V-USB HID stack and employs a pulse-width envelope measurement for adaptive timing. Claude Code assisted in developing the firmware, including obstacle classification logic and adaptive timing, while Codex provided a code review that identified a bug. The project emphasizes its independence from host-side software and its adaptive timing mechanism, with a total firmware size of 2699 bytes. GitHub Repo and Blog links are provided for further details. The comments reflect a mix of humor and surprise, with one user expressing admiration for the project’s complexity and another surprised by the game’s ducking feature. There is no deep technical debate in the comments.

  • Hands down the best guide to Claude Cowork (Activity: 1483): The image provides a detailed comparison guide for using Claude AI in three modes: Chat, Cowork, and Projects. It highlights differences in access, setup, and functionality, noting that Chat functions like a chatbot, Cowork allows desktop file interaction, and Projects serves as a saved workspace. The guide also specifies the skills needed, output quality, and context handling for each mode, and mentions that Cowork and Projects require a Pro Plan subscription. This structured comparison helps users decide which mode to use based on their needs. One commenter noted that Claude AI’s chat mode now retains memory of exchanges, similar to ChatGPT, enhancing its usability. Another expressed a desire for the inclusion of Claude Code in the comparison.

3. AI Model Performance and Benchmarks

  • Benchmarking Model Performance: Launch Day vs. Current API Generations (Activity: 189): The image compares outputs from the Gemini 3.1 Pro model on two different dates, highlighting a perceived degradation in quality over time. The left image, from February 2026, shows a more detailed Ferrari, while the right image, from May 2026, appears less polished. This suggests potential issues with model updates or API changes affecting output quality. The discussion emphasizes the stochastic nature of LLMs, indicating that single comparisons may not be reliable without multiple runs to account for variability. Commenters highlight the probabilistic nature of language models, suggesting that single-instance comparisons are insufficient to draw conclusions about model performance changes over time.

    • DifficultSelection highlights the importance of understanding that LLM inference is inherently stochastic, suggesting that to draw meaningful conclusions from performance comparisons, one should conduct approximately 30 runs per date. This approach accounts for the variability in outputs due to the probabilistic nature of language models.
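The ~30-runs-per-date suggestion can be made concrete with a permutation test over per-run quality scores. The scores below are made-up illustrative data, and the scoring method (a rubric, a judge model) is left as an assumption:

```python
# Permutation test for "did quality really change between two dates?"
# Score ~30 generations per date, then check whether the observed mean
# gap is larger than what chance re-labelings of the runs would produce.
import random
from statistics import mean

def perm_test(a: list[float], b: list[float], n_iter: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return hits / n_iter

feb_scores = [7.9, 8.1, 8.0, 7.8, 8.2, 8.0]   # hypothetical per-run quality scores
may_scores = [7.2, 7.5, 7.1, 7.4, 7.3, 7.2]
p = perm_test(feb_scores, may_scores)
# A small p suggests the gap is not just run-to-run randomness.
```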
  • ChatGPT vs Gemini vs Claude vs Perplexity: I gave them $1k each to trade stocks. After 9 weeks, ChatGPT went from frozen in cash to +21% (one stock doubled) (Activity: 492): In a 9-week experiment, four AI models—ChatGPT, Gemini, Claude, and Perplexity—were each given $1,000 to autonomously trade stocks using Alpaca APIs. ChatGPT led with a +21.1% return, primarily due to a strategic all-in on healthcare stocks, notably IOVA and ACHC, which saw significant gains. Perplexity maintained a +1.1% return by holding cash, while Gemini and Claude underperformed with -6.6% and -11.5% respectively, due to high-risk trades and frequent stop-outs. The S&P 500 declined by -1.5% during the same period, highlighting ChatGPT’s relative outperformance. The experiment was automated via Python and results are publicly logged on GitHub. Commenters suggest the results might be accidental and propose using multiple instances of each model to validate findings. Another suggestion was to include a random control, like throwing darts, to compare against AI performance.

    • Disastrous-Wildcat raises a valid point about the potential for randomness in the results, suggesting that replicating the experiment with multiple instances of the same model could provide more statistically significant insights. This would require a substantial financial investment, but it would help determine if the observed performance is consistent or merely coincidental.
    • vegt121 proposes a more robust experimental design by suggesting the use of 100 instances of each model, each with $1,000, to trade stocks. This approach would allow for a comprehensive analysis of performance across a larger sample size, potentially revealing patterns or consistencies in the models’ trading strategies. However, the main challenge is the financial requirement of $400,000 to conduct such an experiment.
    • Jumpin_Joeronimo introduces an interesting idea of leveraging models to track and mimic the stock trades of congress members, potentially using websites that aggregate such data. This could provide a unique strategy for AI-driven stock trading, assuming the models can effectively interpret and act on this information.
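The “throwing darts” control a commenter proposes is straightforward to simulate: generate many random portfolios and see where +21% falls in the luck distribution. All returns below are synthetic placeholders, not market data:

```python
# Simulated "darts" baseline: how often does pure luck beat +21% over the
# same horizon? The stock universe is synthetic, centered on the -1.5%
# S&P figure quoted above, with made-up volatility.
import random
from statistics import mean

def random_portfolio_return(stock_returns: list[float], n_picks: int, rng: random.Random) -> float:
    """Equal-weight return of n randomly chosen stocks."""
    return mean(rng.sample(stock_returns, n_picks))

rng = random.Random(42)
universe = [rng.gauss(-0.015, 0.12) for _ in range(500)]   # synthetic 9-week returns
sims = [random_portfolio_return(universe, 5, rng) for _ in range(10_000)]
luck_beats_21pct = sum(r >= 0.21 for r in sims) / len(sims)
# If this fraction is tiny, +21% is unlikely to be pure dart-throwing luck,
# though it says nothing about whether the picks reflected skill.
```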
  • 16+ AI Image Models: The Showdown — Midjourney v7, GPT Image 1.5/Mini, Nano Banana Pro/2/1, Kling Kolors v3.0/v2.1, Seedream 5.0 Lite/4.6/4.5/4.1/4.0, Imagen 4, Qwen Image, Runway Gen4 — Same Prompt, Side by Side (Activity: 96): The article provides a comprehensive comparison of 16+ AI image models, including Midjourney v7, GPT Image 1.5/Mini, Nano Banana Pro/2/1, Kling Kolors v3.0/v2.1, Seedream 5.0 Lite/4.6/4.5/4.1/4.0, Imagen 4, Qwen Image, and Runway Gen4. Each model is evaluated using the same prompt to highlight differences in rendering capabilities, focusing on aspects like detail, color accuracy, and artistic style. The comparison aims to showcase the strengths and weaknesses of each model, with Midjourney v7 noted for its theatrical effect but criticized for lacking detail upon closer inspection. The full article can be accessed here. One comment highlights the impressive initial impact of Midjourney’s output but notes a lack of detail upon closer examination, suggesting a trade-off between visual appeal and fine detail.

AI Discords

Unfortunately, Discord shut down our access today. We will not bring it back in this form but we will be shipping the new AINews soon. Thanks for reading to here, it was a good run.