Topic: "model-scaling"
not much happened today
phi-4 phi-4-mini-reasoning qwen3-235b qwen3-moe-235b qwen3-moe-30b qwen3-dense-32b qwen3-dense-14b qwen3-dense-8b qwen3-dense-4b qwen3-dense-0.6b qwen2.5-omni-3b deepseek-prover-v2 llama llama-guard-4 prompt-guard-2 mimo-7b microsoft anthropic cursor alibaba togethercompute deepseek meta-ai-fair xiaomi openrouterai cohere reasoning model-fine-tuning model-evaluation benchmarking model-popularity open-source math model-scaling model-filtering jailbreak-prevention cline reach_vb vipulved akhaliq omarsar0 zhs05232838 huajian_xin mervenoyann karpathy random_walker sarahookr blancheminerva clefourrier
Microsoft released Phi-4-reasoning, a finetuned 14B reasoning model that lands slightly behind QwQ and is held back by limited data transparency and token-efficiency issues. Anthropic introduced remote MCP server support and a 45-minute Research mode in Claude. Cursor published a model popularity list. Alibaba launched Qwen3-235B and other Qwen3 variants, highlighting budget-friendly coding and reasoning capabilities, with availability on the Together AI API. Microsoft also released Phi-4-Mini-Reasoning, with benchmark results on AIME 2025 and OmniMath. DeepSeek announced DeepSeek-Prover V2, which scales to 671B parameters and sets a new state of the art in math problem solving. Meta AI's Llama models hit 1.2 billion downloads, with new Llama Guard 4 and Prompt Guard 2 for input/output filtering and jailbreak prevention. Xiaomi released MiMo-7B, an open-source reasoning model trained on 25 trillion tokens. Discussions on AI model evaluation highlighted issues with the LMArena leaderboard, data-access biases favoring proprietary models, and challenges in maintaining fair benchmarking, with suggestions for alternatives such as OpenRouterAI rankings; noted concerns included "LMArena slop and biased" and "61.3% of all data going to proprietary model providers".
LLaDA: Large Language Diffusion Models
llada-8b llama-3-8b step-video-t2v-30b step-audio-chat-132b llama-2-7b stepfun-ai scale-ai cambridge llamaindex diffusion-models text-generation multimodality video-generation voice-processing benchmarking instruction-following model-scaling gpu-usage long-context multi-turn-dialogue arankomatsuzaki _akhaliq omarsar0 iscienceluvr gallabytes maximelabonne reach_vb
LLaDA (Large Language Diffusion Model) 8B is a breakthrough diffusion-based language model that rivals LLaMA 3 8B while training on roughly 7x fewer tokens (2 trillion) and using 0.13 million H800 GPU hours. It takes a novel approach to text generation, predicting uniformly masked tokens through an iterative diffusion process, and supports multi-turn dialogue and instruction following. Alongside it, StepFun AI released two major models: Step-Video-T2V 30B, a text-to-video model generating up to 204 frames with high coherence and motion quality, and Step-Audio-Chat 132B, a voice-to-voice model. Additionally, challenging multimodal benchmarks such as Scale AI's EnigmaEval and Cambridge's ZeroBench see current frontier models scoring zero, underscoring how difficult these tasks remain. The community also noted the return of diffusion models to language modeling, a previously speculative architecture now scaled successfully.
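In broad strokes, generation in a masked-diffusion LM of this kind starts from a fully masked sequence, predicts every masked position, then re-masks the least-confident predictions and repeats. A minimal sketch of that loop follows; the `model` callback and the linear re-masking schedule are illustrative assumptions, not LLaDA's actual implementation.

```python
import numpy as np

MASK = -1  # hypothetical mask token id

def diffusion_decode(model, length, steps=8):
    """Sketch of masked-diffusion text generation (LLaDA-style).

    `model(tokens)` is assumed to return, for every position, a predicted
    token id and a confidence score for that prediction."""
    tokens = np.full(length, MASK, dtype=np.int64)
    for step in range(steps, 0, -1):
        preds, conf = model(tokens)
        masked = tokens == MASK
        tokens[masked] = preds[masked]            # fill in every masked position
        n_remask = int(length * (step - 1) / steps)
        if n_remask > 0:                          # re-mask the least confident tokens
            order = np.argsort(conf)              # ascending confidence
            tokens[order[:n_remask]] = MASK
    return tokens
```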
Meta BLT: Tokenizer-free, Byte-level LLM
byte-latent-transformer llama-3 phi-4 gpt-4o command-r7b meta-ai-fair llamaindex microsoft deepseek-ai openai cohere anthropic tokenization transformer-architecture model-efficiency benchmarking multimodality vision reinforcement-learning model-scaling jailbreaking model-optimization
Meta AI introduces the Byte Latent Transformer (BLT), a tokenizer-free architecture that dynamically forms byte patches to allocate compute where it is needed, outperforming Llama 3 on benchmarks including CUTE. The model was trained on approximately 1 trillion tokens and uses a three-block transformer design with local and global components. This approach challenges traditional tokenization and may enable new multimodal capabilities such as direct file interaction without retrieval-augmented generation. Additionally, Microsoft announced the 14B-parameter Phi-4 model, achieving state-of-the-art results on STEM and reasoning benchmarks and surpassing GPT-4o. DeepSeek AI launched new vision-language models based on their MoE architecture, with sizes ranging from 1.0B to 27B parameters. OpenAI released a new Projects feature for ChatGPT, and Cohere introduced Command R7B, its smallest and fastest model. Anthropic published research on "Best-of-N Jailbreaking" vulnerabilities across text, vision, and audio models. Industry discussion highlights a trend of shrinking frontier LLM sizes, with GPT-4 at approximately 1.8 trillion parameters compared to much smaller newer models.
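The intuition behind dynamic byte patching is to cut patch boundaries where the next byte is hard to predict, so more patches (and therefore more global-transformer compute) land on difficult regions. A rough sketch, where the small next-byte model and the entropy threshold are assumptions for illustration rather than Meta's implementation:

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a next-byte distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def patch_bytes(data: bytes, next_byte_probs, threshold: float = 2.0):
    """Split a byte stream into patches at high-entropy positions.

    `next_byte_probs(prefix)` stands in for a small byte-level LM that
    returns a probability distribution over the next byte."""
    patches, current = [], bytearray()
    for i in range(len(data)):
        current.append(data[i])
        if entropy(next_byte_probs(data[: i + 1])) > threshold:
            patches.append(bytes(current))        # hard-to-predict region: close the patch
            current = bytearray()
    if current:
        patches.append(bytes(current))
    return patches
```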
Common Corpus: 2T Open Tokens with Provenance
qwen-2.5-coder claude-3.5-sonnet janusflow-1.3b ocronos-vintage pleais huggingface langchainai deepseek alibaba anthropic provenance ocr multilingual-datasets prompt-engineering multimodality image-generation code-generation quantization model-scaling inference-efficiency tim-dettmers tom-doerr omarsar0 swyx madiator reach_vb
Pleias released Common Corpus on Hugging Face, the largest fully open multilingual dataset, with over 2 trillion tokens and detailed provenance information. They also introduced OCRonos-Vintage, a 124M-parameter OCR correction model that efficiently fixes digitization errors on CPU and GPU, unlocking knowledge trapped in PDFs. On the tooling side, LangChainAI launched Prompt Canvas for collaborative prompt engineering, while DeepSeek released JanusFlow 1.3B, a unified multimodal LLM that integrates autoregressive and rectified-flow models for image understanding and generation. Alibaba Cloud announced Qwen2.5-Coder, a code-focused LLM with advanced coding capabilities, and Claude 3.5 Sonnet was highlighted for superior code generation. Discussions of quantization challenges and scaling laws for precision by Tim Dettmers and others emphasized the impact of low-precision training on model scalability and inference efficiency, drawing on insights from the "Scaling Laws for Precision" paper and alternative efficiency methods.
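As a concrete reference point for the precision discussion, here is a minimal absmax int8 quantization round trip, a generic sketch of the kind of low-precision trade-off under debate (not the specific method analyzed in the "Scaling Laws for Precision" paper):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric absmax quantization: map floats onto the int8 grid [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# 4x less memory than fp32, at the cost of a small reconstruction error:
print("mean abs error:", np.abs(w - w_hat).mean())
```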
not much happened today
llama-3-2-vision gpt-2 meta-ai-fair ollama amd llamaindex gemini gitpod togethercompute langchainai weights-biases stanfordnlp deeplearningai model-scaling neural-networks multi-gpu-support skip-connections transformers healthcare-ai automated-recruitment zero-trust-security small-language-models numerical-processing chain-of-thought optical-character-recognition multi-agent-systems agent-memory interactive-language-learning bindureddy fstichler stasbekman jxmnop omarsar0 giffmana rajammanabrolu
This week in AI news highlights Ollama 0.4's support for Meta's Llama 3.2 Vision models (11B and 90B), with applications such as handwriting recognition. Self-Consistency Preference Optimization (ScPO) was introduced to improve model consistency without human labels. Discussions covered model scaling, the resurgence of neural networks, and AMD's multi-GPU bandwidth challenges, and the importance of skip connections in Transformers was emphasized (see the sketch below). In healthcare, the case was made that less regulation plus AI could revolutionize the treatment of disease and aging. Tools like LlamaParse and Gemini aid automated resume insights, and Gitpod Flex demonstrated a zero-trust architecture for secure development environments. Research includes surveys on Small Language Models (SLMs), number understanding in LLMs, and DTrOCR, which uses a GPT-2 decoder for OCR. Multi-agent systems in prediction markets were discussed by TogetherCompute and LangChainAI. Community events include a NeurIPS Happy Hour, NLP seminars, and courses on Agent Memory with LLMs as operating systems.
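For reference, the skip connections in question are the residual `x + sublayer(x)` pattern wrapped around every Transformer sublayer, which keeps gradients flowing through deep stacks; a generic illustration, not from any specific codebase:

```python
def transformer_block(x, attn, mlp, norm1, norm2):
    """Pre-norm Transformer block: each sublayer's output is added back onto
    its input, so even a badly-behaved sublayer cannot block the identity path."""
    x = x + attn(norm1(x))   # skip connection around self-attention
    x = x + mlp(norm2(x))    # skip connection around the feed-forward MLP
    return x
```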
Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
claude-3.5-haiku llama-3-1 llama-3-2 mlx-lm tencent anthropic meta-ai-fair togethercompute llamaindex mixture-of-experts synthetic-data model-scaling model-architecture model-optimization kv-cache-quantization react fine-tuning scaling-laws model-efficiency model-deployment multimodality
Tencent released a notable >300B-parameter MoE model pretrained on 7T tokens, including 1.5T tokens of synthetic data generated via Evol-Instruct. The model introduces techniques such as "recycle routing" and expert-specific learning rates, alongside a compute-efficient scaling law for MoE active parameters. However, its custom license restricts use in the EU and by companies with over 100M MAU, and the model avoids China-sensitive queries. Meanwhile, Anthropic launched Claude 3.5 Haiku, now available on multiple platforms and praised for intelligence and speed but criticized for a 10x price increase. Meta opened Llama AI to the U.S. defense sector, and a Llama Impact Hackathon is offering a $15K prize for projects using Llama 3.1 and Llama 3.2 Vision. LlamaIndex released a React chat UI component with Tailwind CSS and LLM backend integrations. MLX LM added KV cache quantization, improving text generation speed and memory efficiency.
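Why KV cache quantization matters is mostly arithmetic: the cache grows linearly with context length, and dropping it from fp16 to int8 or int4 halves or quarters that memory. The back-of-the-envelope below uses illustrative 7B-class dimensions (not MLX's actual configuration or API):

```python
# Rough KV cache size for a hypothetical 7B-class model at 32k context.
layers, kv_heads, head_dim, ctx = 32, 8, 128, 32_768
elems = 2 * layers * kv_heads * head_dim * ctx      # keys + values
for name, bytes_per_elem in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: {elems * bytes_per_elem / 2**30:.2f} GiB")
# fp16: 4.00 GiB, int8: 2.00 GiB, int4: 1.00 GiB
```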
not much happened today
gpt-4-0613 gpt-3.5-turbo-0613 gpt-4o-2024-08-06 mistral-large-2 gpt4-turbo claude-3-opus idefics3-llama bigllama-3.1-1t-instruct llama-3-120b-instruct openai mistral-ai meta-ai-fair structured-outputs function-calling json-schema benchmarking multimodality context-windows model-scaling ai-hardware vision speech-processing robotics ai-regulation sama rohanpaul_ai corbtt guillaumelample mervenoyann maximelabonne aidan_mclau adcock_brett ylecun
OpenAI introduced structured outputs in their API with a new "strict" mode and a "response_format" parameter, supporting models like gpt-4-0613, gpt-3.5-turbo-0613, and the new gpt-4o-2024-08-06 (see the example below). They also halved the price of gpt-4o, to $2.50 per million input tokens. Mistral Large 2 outperforms gpt4-turbo and claude-3-opus on hard benchmarks and coding tasks. Idefics3-Llama offers multimodal capabilities with a 10k-token context window. BigLlama-3.1-1T-Instruct is an upscaled version of llama-3-120b-instruct. A new benchmark, "big_model_smell", measures creativity and reliability. The Figure 02 robot features advanced AI hardware with an onboard vision-language model, an enhanced battery, and speech-to-speech reasoning. Yann LeCun expressed concerns about California's SB1047 regulation.
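A minimal request using the new structured-output mode looks roughly like this, following OpenAI's documented `response_format` / JSON-schema shape; the schema and prompt here are placeholders, so double-check field names against the current API reference:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Extract: Ada Lovelace, born 1815."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "strict": True,  # output must conform exactly to the schema
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "born": {"type": "integer"},
                },
                "required": ["name", "born"],
                "additionalProperties": False,
            },
        },
    },
)
print(resp.choices[0].message.content)  # JSON matching the schema
```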
AlphaProof + AlphaGeometry2 reach 1 point short of IMO Gold
gemini alphageometry-2 alphaproof llama-3-1-405b llama-3-70b llama-3-8b mistral-large-2 google-deepmind meta-ai-fair mistral-ai neurosymbolic-ai mathematical-reasoning synthetic-data knowledge-sharing model-fine-tuning alpha-zero multilinguality context-windows model-scaling benchmarking performance-comparison tim-gowers guillaume-lample osanseviero
Search+Verifier approaches highlighted advances in neurosymbolic AI at the 2024 International Mathematical Olympiad (IMO). Google DeepMind's combination of AlphaProof and AlphaGeometry 2 solved four of the six IMO problems, finishing one point short of a gold medal: AlphaProof is a finetuned Gemini model trained with an AlphaZero-style approach, and AlphaGeometry 2 was trained on significantly more synthetic data with a novel knowledge-sharing mechanism. Despite the impressive results, human judges noted the AI required far more time than human competitors. Meanwhile, Meta AI released Llama 3.1 with a 405B-parameter model and smaller variants, and Mistral AI launched Mistral Large 2 with 123B parameters and a 128k context window, outperforming Llama 3.1 on coding tasks and multilingual benchmarks. This marks significant progress in AI mathematical reasoning, model scaling, and multilingual capabilities.
Test-Time Training, MobileLLM, Lilian Weng on Hallucination (Plus: Turbopuffer)
llama-2-7b codegeex4-all-9b mamba facebook-research meta-ai-fair tsinghua-university hallucination-detection anti-hallucination-methods on-device-ai model-architecture rnn long-context-modeling model-scaling expressive-hidden-states code-generation lilian-weng yann-lecun
Lilian Weng released a comprehensive literature review on hallucination detection and anti-hallucination methods, covering techniques such as FactualityPrompt, SelfCheckGPT, and WebGPT. Facebook AI Research (FAIR) published MobileLLM, a sub-billion-parameter on-device language model architecture achieving performance comparable to llama-2-7b through innovations such as deep-and-thin designs and weight sharing. A new RNN-style LLM architecture with expressive hidden states, built around test-time training (TTT) layers, was introduced: it replaces attention and scales better than Mamba and Transformer models for long-context modeling (see the sketch below). Additionally, Tsinghua University open-sourced CodeGeeX4-ALL-9B, a multilingual code generation model excelling at code assistance.
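The expressive-hidden-state idea can be sketched as a layer whose hidden state is itself a tiny model, updated by one gradient step per token on a self-supervised loss; this is a simplified conceptual illustration only (the actual approach uses learned projections, mini-batched inner updates, and richer inner models):

```python
import numpy as np

def ttt_linear_layer(xs, lr=0.1):
    """Toy test-time-training sequence layer.

    The hidden state is the weight matrix W of an inner linear model. At each
    token, W takes a gradient step on a reconstruction loss 0.5*||W x - x||^2,
    then the updated W produces the layer's output for that token."""
    d = xs.shape[-1]
    W = np.zeros((d, d))
    outputs = []
    for x in xs:                       # xs: (seq_len, d)
        grad = np.outer(W @ x - x, x)  # gradient of the reconstruction loss w.r.t. W
        W = W - lr * grad              # update the hidden state (the inner model)
        outputs.append(W @ x)
    return np.stack(outputs)
```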
Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata
nemotron-4-340b mixtral llama-3 gemini-1.5 gpt-4o mamba-2-hybrid-8b samba-3.8b-instruct dolphin-2.9.3 faro-yi-9b-dpo nvidia hugging-face mistral-ai llamaindex cohere gemini mistral synthetic-data model-alignment reward-models fine-tuning long-context model-scaling inference-speed mixture-of-agents open-source-models model-training instruction-following context-windows philipp-schmid bryan-catanzaro oleksii-kuchaiev rohanpaul_ai cognitivecompai _philschmid 01ai_yi
NVIDIA has scaled up its Nemotron-4 model from 15B to a massive 340B dense model, trained on 9T tokens and achieving performance comparable to GPT-4. The model alignment process uses over 98% synthetic data, with only about 20K human-annotated samples for fine-tuning and reward model training, and the synthetic data generation pipeline is open-sourced, including synthetic prompt and preference data generation. The base and instruct versions outperform Mixtral and Llama 3, while the reward model ranks better than Gemini 1.5, Cohere, and GPT-4o. Other notable models include Mamba-2-Hybrid 8B, which is up to 8x faster than Transformers and excels on long-context tasks; Samba-3.8B-instruct, offering unlimited context length with linear complexity; Dolphin-2.9.3 tiny models optimized for low-resource devices; and Faro Yi 9B DPO with a 200K context window running efficiently on 16GB of VRAM. The Mixture-of-Agents technique boosts open-source LLMs beyond GPT-4 Omni on AlpacaEval 2.0 (sketched below).
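Mixture-of-Agents layers several LLMs: each model drafts an answer, later rounds let every model see and refine the previous drafts, and a final aggregator synthesizes one response. A schematic sketch, where `proposers` and `aggregator` are assumed prompt-to-text callables wrapping whatever chat API you use:

```python
def mixture_of_agents(prompt, proposers, aggregator, layers=2):
    """Schematic Mixture-of-Agents pipeline (not the reference implementation)."""
    def with_references(base, drafts):
        refs = "\n\n".join(f"[{i + 1}] {d}" for i, d in enumerate(drafts))
        return f"{base}\n\nCandidate responses from other models:\n{refs}"

    drafts = [ask(prompt) for ask in proposers]         # layer 1: independent drafts
    for _ in range(layers - 1):                         # later layers refine with references
        drafts = [ask(with_references(prompt, drafts)) for ask in proposers]
    return aggregator(with_references(prompt, drafts))  # final synthesis
```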
GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version)
gpt-4o gpt-3.5 llama-3 openai hugging-face nous-research eleutherai hazyresearch real-time-reasoning coding-capabilities fine-tuning knowledge-distillation hardware-optimization quantization multimodality mixture-of-experts efficient-attention model-scaling depth-upscaling transformer-architecture gpu-optimization prompt-engineering
OpenAI launched GPT-4o, a frontier model supporting real-time reasoning across audio, vision, and text, now free for all ChatGPT users, with enhanced coding capabilities and advanced voice and video features on the way. Discussions cover open-source LLMs like Llama 3, fine-tuning techniques including knowledge distillation for GPT-3.5, and hardware optimization strategies such as quantization. Emerging architectures include multimodal integrations with ChatGPT voice and the Open Interpreter API, Mixture of Experts models combining autoregressive and diffusion approaches, and novel designs like the YOCO architecture and the ThunderKittens DSL for efficient GPU use. Research advances in efficient attention methods, such as Conv-Basis using FFT, and in model scaling techniques such as depth upscaling were also highlighted (see the sketch below).
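Depth upscaling grows a pretrained model by stitching together overlapping copies of its own layer stack and then continuing training; a schematic sketch with illustrative numbers, not tied to any specific model mentioned above:

```python
def depth_upscale(blocks, overlap=8):
    """Build a deeper layer stack from a pretrained one by concatenating its
    first (n - overlap) blocks with its last (n - overlap) blocks, duplicating
    the middle; e.g. 32 blocks with overlap=8 yields 48 blocks."""
    n = len(blocks)
    return blocks[: n - overlap] + blocks[overlap:]
```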