Topic: "small-language-models"
Air Street's State of AI 2025 Report
glm-4.6 jamba-1.5 rnd1 claude-code reflection mastra datacurve spellbook kernel figure softbank abb radicalnumerics zhipu-ai ai21-labs anthropic humanoid-robots mixture-of-experts diffusion-models open-weight-models reinforcement-learning benchmarking small-language-models plugin-systems developer-tools agent-stacks adcock_brett achowdhery clementdelangue
Reflection raised $2B to build frontier open-weight models with a focus on safety and evaluation, led by a team with backgrounds from AlphaGo, PaLM, and Gemini. Figure launched its next-gen humanoid robot, Figure 03, emphasizing non-teleoperated capabilities for home and large-scale use. Radical Numerics released RND1, a 30B-parameter sparse MoE diffusion language model with open weights and code to advance diffusion LM research. Zhipu posted strong results with GLM-4.6 on the Design Arena benchmark, while AI21 Labs' Jamba Reasoning 3B leads tiny reasoning models. Anthropic introduced a plugin system for Claude Code to enhance developer tools and agent stacks. The report also highlights SoftBank's acquisition of ABB's robotics unit for $5.4B and the growing ecosystem around open frontier modeling and small-model reasoning.
SmolLM3: the SOTA 3B reasoning open source LLM
smollm3-3b olmo-3 grok-4 claude-4 claude-4.1 gemini-nano hunyuan-a13b gemini-2.5 gemma-3n qwen2.5-vl-3b huggingface allenai openai anthropic google-deepmind mistral-ai tencent gemini alibaba open-source small-language-models model-releases model-performance benchmarking multimodality context-windows precision-fp8 api batch-processing model-scaling model-architecture licensing ocr elonmusk mervenoyann skirano amandaaskell clementdelangue loubnabenallal1 awnihannun swyx artificialanlys officiallogank osanseviero cognitivecompai aravsrinivas
Hugging Face released SmolLM3-3B, a fully open-source small reasoning model with open pretraining code and data, marking a high point in open-source models until Olmo 3 arrives. Grok 4 launched to mixed reactions, while concerns surfaced about Claude 4 nerfs and an imminent Claude 4.1. Gemini Nano is now shipping in Chrome 137+, enabling local LLM access for 3.7 billion users. Tencent introduced Hunyuan-A13B, an 80B-parameter model with a 256K context window that runs on a single H200 GPU. The Gemini API added a batch mode with 50% discounts on 2.5 models. MatFormer Lab launched tools for custom-sized Gemma 3n models. Open-source OCR models such as Nanonets-OCR-s and ChatDOC/OCRFlux-3B, both derived from Qwen2.5-VL-3B, were highlighted, with licensing discussions involving Alibaba.
not much happened today
chatgpt-4o deepseek-r1 o3 o3-mini gemini-2-flash qwen-2.5 qwen-0.5b hugging-face openai perplexity-ai deepseek-ai gemini qwen metr_evals reasoning benchmarking model-performance prompt-engineering model-optimization model-deployment small-language-models mobile-ai ai-agents speed-optimization _akhaliq aravsrinivas lmarena_ai omarsar0 risingsayak
The Smolagents library by Hugging Face continues trending. The latest ChatGPT-4o version, chatgpt-4o-latest-20250129, was released. DeepSeek R1 671B sets a speed record at 198 t/s, making it the fastest reasoning model; specific prompt settings are recommended. Perplexity Deep Research outperforms models like Gemini Thinking, o3-mini, and DeepSeek-R1 on the Humanity's Last Exam benchmark with a 21.1% score, and reaches 93.9% accuracy on SimpleQA. ChatGPT-4o ranks #1 on the Arena leaderboard in multiple categories except math. OpenAI's o3 model powers the Deep Research tool for ChatGPT Pro users. Gemini 2 Flash and Qwen 2.5 models support the LLMGrading verifier. Qwen 2.5 models were added to the PocketPal app. MLX shows small LLMs like Qwen 0.5B generating tokens at high speed on M4 Max and iPhone 16 Pro. Gemini Flash 2.0 leads a new AI agent leaderboard. DeepSeek R1 is the most liked model on Hugging Face, with over 10 million downloads.
not much happened today
llama-3-2-vision gpt-2 meta-ai-fair ollama amd llamaindex gemini gitpod togethercompute langchainai weights-biases stanfordnlp deeplearningai model-scaling neural-networks multi-gpu-support skip-connections transformers healthcare-ai automated-recruitment zero-trust-security small-language-models numerical-processing chain-of-thought optical-character-recognition multi-agent-systems agent-memory interactive-language-learning bindureddy fstichler stasbekman jxmnop omarsar0 giffmana rajammanabrolu
This week's AI news highlights Ollama 0.4, which supports Meta's Llama 3.2 Vision models (11B and 90B) for applications like handwriting recognition. Self-Consistency Preference Optimization (ScPO) was introduced to improve model consistency without human labels. Discussions covered model scaling, the resurgence of neural networks, and AMD's multi-GPU bandwidth challenges. The importance of skip connections in Transformers was emphasized. In healthcare, it was argued that less regulation plus AI could revolutionize disease treatment and aging. Tools like LlamaParse and Gemini aid automated resume insights. Gitpod Flex demonstrated a zero-trust architecture for secure development environments. Research includes surveys on Small Language Models (SLMs), number understanding in LLMs, and DTrOCR, which uses a GPT-2 decoder for OCR. Multi-agent systems in prediction markets were discussed by TogetherCompute and LangChainAI. Community events include a NeurIPS Happy Hour, NLP seminars, and courses on Agent Memory with LLMs as operating systems.