All tags
Person: "skirano"
nano-banana is Gemini‑2.5‑Flash‑Image, beating Flux Kontext by 170 Elo with SOTA Consistency, Editing, and Multi-Image Fusion
gemini-2.5-flash-image-preview hermes-4 nemotron-nano-9b-v2 internvl3.5 gpt-oss qwen3 deepseek-v3.1 google-deepmind nous-research nvidia openai ollama huggingface openrouter image-editing natural-language-processing multi-image-composition character-consistency reasoning hybrid-models context-windows model-steerability pretraining finetuning alignment vision vision-language api model-integration sundarpichai _philschmid lmarena_ai omarsar0 skirano yupp_ai xanderatallah officiallogank mervenoyann
Google DeepMind revealed Gemini-2.5-Flash-Image-Preview, a state-of-the-art image editing model excelling in character consistency, natural-language edits, and multi-image composition, dominating the Image Edit Arena with a ~170-180 Elo lead and over 2.5M votes. It is integrated into multiple platforms including Google AI Studio and third-party services. Nous Research released Hermes 4, an open-weight hybrid reasoning model focused on steerability and STEM benchmarks. NVIDIA launched Nemotron Nano 9B V2, a hybrid Mamba-Transformer with 128k context, top-performing under 10B parameters, and released a 6.6T-token pretraining subset. InternVL3.5 introduced 32 vision-language models based on OpenAI's gpt-oss and Qwen3 backbones. Ollama v0.11.7 added DeepSeek v3.1 support with hybrid thinking and Turbo mode preview.
SmolLM3: the SOTA 3B reasoning open source LLM
smollm3-3b olmo-3 grok-4 claude-4 claude-4.1 gemini-nano hunyuan-a13b gemini-2.5 gemma-3n qwen2.5-vl-3b huggingface allenai openai anthropic google-deepmind mistral-ai tencent gemini alibaba open-source small-language-models model-releases model-performance benchmarking multimodality context-windows precision-fp8 api batch-processing model-scaling model-architecture licensing ocr elonmusk mervenoyann skirano amandaaskell clementdelangue loubnabenallal1 awnihannun swyx artificialanlys officiallogank osanseviero cognitivecompai aravsrinivas
HuggingFace released SmolLM3-3B, a fully open-source small reasoning model with open pretraining code and data, marking a high point for open-source models until Olmo 3 arrives. Grok 4 launched to mixed reactions, while concerns surfaced about Claude 4 being nerfed and an imminent Claude 4.1. Gemini Nano is now shipping in Chrome 137+, enabling local LLM access for 3.7 billion users. Tencent introduced Hunyuan-A13B, an 80B-parameter model with a 256K context window that runs on a single H200 GPU. The Gemini API added a batch mode with 50% discounts on 2.5 models. MatFormer Lab launched tools for custom-sized Gemma 3n models. Open-source OCR models such as Nanonets-OCR-s and ChatDOC/OCRFlux-3B, both derived from Qwen2.5-VL-3B, were highlighted, along with licensing discussions involving Alibaba.
not much happened today
claude-3.7-sonnet claude-3.7 deepseek-r1 o3-mini deepseek-v3 gemini-2.0-pro gpt-4o qwen2.5-coder-32b-instruct anthropic perplexity-ai amazon google-cloud deepseek_ai coding reasoning model-benchmarking agentic-workflows context-window model-performance open-source moe model-training communication-libraries fp8 nvlink rdma cli-tools skirano omarsar0 reach_vb artificialanlys terryyuezhuo _akhaliq _philschmid catherineols goodside danielhanchen
Claude 3.7 Sonnet demonstrates exceptional coding and reasoning capabilities, outperforming models like DeepSeek R1, o3-mini, and GPT-4o on benchmarks such as SciCode and LiveCodeBench. It is available on platforms including Perplexity Pro, Anthropic, Amazon Bedrock, and Google Cloud, priced at $3/$15 per million input/output tokens. Key features include a 64k-token thinking mode, a 200k-token context window, and the CLI-based coding assistant Claude Code. Meanwhile, DeepSeek released DeepEP, an open-source communication library optimized for MoE model training and inference with support for NVLink, RDMA, and FP8. These updates highlight advancements in coding AI and efficient model training infrastructure.
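At the quoted $3/$15 per million input/output tokens, per-request cost is simple arithmetic. A minimal sketch (the function name and token counts are illustrative, not an official pricing tool):

```python
def claude_cost_usd(input_tokens: int, output_tokens: int,
                    in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """Estimate API cost at the quoted $3/$15 per million input/output tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 150k-token prompt whose response fills the full 64k-token thinking budget:
print(round(claude_cost_usd(150_000, 64_000), 2))  # 0.45 in + 0.96 out = 1.41
```

Output-heavy workloads dominate the bill at a 5x output rate, which is why long thinking traces are the expensive part.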
not much happened today
helium-1 qwen-2.5 phi-4 sky-t1-32b-preview o1 codestral-25.01 phi-3 mistral llama-3 gpt-3.5 llmquoter kyutai-labs lmstudio mistralai llamaindex huggingface langchainai hyperbolic-labs replit fchollet philschmid multilinguality token-level-distillation context-windows model-performance open-source reasoning coding retrieval-augmented-generation hybrid-retrieval multiagent-systems video large-video-language-models dynamic-ui voice-interaction gpu-rentals model-optimization semantic-deduplication model-inference reach_vb awnihannun lior_on_ai sophiamyang omarsar0 skirano yuchenj_uw
Helium-1 Preview by kyutai_labs is a 2B-parameter multilingual base LLM that outperforms Qwen 2.5, trained on 2.5T tokens with a 4,096-token context window using token-level distillation from a 7B teacher. Phi-4 (4-bit) is now available in lmstudio, running quickly on an M4 Max. Sky-T1-32B-Preview is a $450 open-source reasoning model matching o1's performance with strong benchmark scores. Codestral 25.01 by mistralai is a new SOTA coding model supporting 80+ programming languages and offering 2x generation speed.
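Token-level distillation, as used to train Helium-1 from a 7B teacher, minimizes the KL divergence between the teacher's and student's next-token distributions at every position. A minimal NumPy sketch of that loss on toy logits (illustrative only; not kyutai_labs' actual training code):

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)      # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Token-level KL(teacher || student), averaged over positions."""
    p = softmax(teacher_logits, T)             # teacher distribution per token
    log_q = np.log(softmax(student_logits, T)) # student log-probs per token
    kl = (p * (np.log(p) - log_q)).sum(axis=-1)
    return float(kl.mean()) * T * T            # conventional T^2 scaling

# Toy check: 3 positions, vocab of 5; identical logits give zero loss.
t = np.random.randn(3, 5)
assert abs(distill_loss(t, t)) < 1e-9
```

Unlike hard-label training, every vocabulary entry contributes gradient signal at every position, which is what lets a 2B student absorb a 7B teacher's distribution efficiently.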
Innovations include AutoRAG for optimizing retrieval-augmented generation pipelines, Agentic RAG for autonomous query reformulation and critique, Multiagent Finetuning using societies of models like Phi-3, Mistral, LLaMA-3, and GPT-3.5 for reasoning improvements, and VideoRAG incorporating video content into RAG with LVLMs.
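All of these RAG variants share the same core retrieval step: score documents against the query, take the top-k, and place them in the prompt. A toy sketch with bag-of-words similarity standing in for a neural retriever (function names and the corpus are illustrative, not code from any of the named systems):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; real pipelines use a neural encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

corpus = [
    "Codestral is a coding model supporting many programming languages.",
    "Helium-1 is a small multilingual base model.",
    "VideoRAG incorporates video content into retrieval pipelines.",
]
context = retrieve("which model is best for coding?", corpus, k=1)
prompt = "Answer using this context:\n" + "\n".join(context) + "\nQ: which model is best for coding?"
```

Agentic RAG wraps this step in a loop that reformulates the query and critiques the retrieved context before answering; VideoRAG swaps the text corpus for video representations.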
Applications include a dynamic UI AI chat app by skirano on Replit, LangChain tools like DocTalk for voice PDF conversations, AI travel agent tutorials, and news summarization agents. Hyperbolic Labs offers competitive GPU rentals including H100, A100, and RTX 4090. LLMQuoter enhances RAG accuracy by identifying key quotes.
Infrastructure updates include MLX export for LLM inference from Python to C++ by fchollet and SemHash semantic text deduplication by philschmid.
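Semantic deduplication of the kind SemHash performs can be sketched generically: embed each record, then greedily drop any record too similar to one already kept. This is an illustrative threshold-based approach, not SemHash's actual algorithm:

```python
import numpy as np

def dedupe(embeddings: np.ndarray, threshold: float = 0.9) -> list[int]:
    """Greedily keep items whose cosine similarity to every already-kept
    item is below the threshold; returns indices of the survivors."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept: list[int] = []
    for i, v in enumerate(unit):
        if all(float(v @ unit[j]) < threshold for j in kept):
            kept.append(i)
    return kept

vecs = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
print(dedupe(vecs))  # [0, 2]: the near-duplicate second vector is dropped
```

Production systems avoid this O(n^2) comparison with approximate nearest-neighbor indexes, but the kept/dropped decision rule is the same.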