All tags
Person: "lioronai"
not much happened today
qwen-3.5-0.8b qwen-3.5-2b qwen-3.5-4b qwen-3.5-9b codex-5.3 claude-3 alibaba ollama lm-studio openai anthropic multimodality reinforcement-learning long-context hybrid-attention on-device-ai model-deployment agent-reliability agent-observability coding-agents benchmarking runtime-optimization token-efficiency nrehiew_ kimmonismus lioronai danielhanchen theo htihle teortaxestex theprimeagen yuchenj_uw _lewtun saen_dev _philschmid omarsar0
Alibaba released the Qwen 3.5 series with models ranging from 0.8B to 9B parameters, featuring native multimodality, scaled reinforcement learning, and targeting edge and lightweight agent deployments. The models support very long context windows up to 262K tokens (extendable to 1M) and use a novel Gated DeltaNet hybrid attention architecture combining linear and full attention layers. Deployment examples include Ollama and LM Studio, with a notable 6-bit on-device demo on iPhone 17 Pro. Evaluators are cautioned that reasoning is disabled by default on smaller models. In coding agents, Codex 5.3 shows promising benchmark results on WeirdML with 79.3% accuracy, though availability and downtime remain critical challenges, especially highlighted by Claude outages. Agent reliability and observability are emphasized as cross-functional problems requiring clear success criteria and practical evaluation strategies. Studies show that using AGENTS.md and SKILL.md guardrails can significantly reduce runtime and token usage by mitigating worst-case thrashing in coding workflows.
Agentic Engineering: WTF Happened in December 2025?
gpt-5.3-codex claude-code perplexity openai anthropic langchain-ai coding-agents agent-architecture distributed-workflows usage-based-pricing model-routing benchmarking context-length observability software-development karpathy aravsrinivas lioronai denisyarats swyx catwu hwchase17
Perplexity launched Computer, an orchestration-first agent platform featuring multi-model routing, usage-based pricing, and parallel asynchronous sub-agents for distributed workflows. Andrej Karpathy claims a "phase change" in coding agents since December, highlighting sustained long-horizon task completion. OpenAI released GPT-5.3-Codex with ~25% speed improvements and strong benchmark performance, while Claude Code celebrates its first year with ecosystem integrations and scaling challenges. This marks a significant shift in coding workflows and agent-based software development.
not much happened today
glm-4.6v glm-4.6v-flash jina-vlm-2b hugging-face zhipu-ai jina-ai google-deepmind axiomprover fine-tuning multimodality model-optimization long-context mechanistic-interpretability formal-methods sequence-architectures reinforcement-learning lioronai akshay_pachaar _akhaliq ben_burtenshaw vllm_project prince_canuma zenmuxai eliebakouch theturingpost axiommathai neelnanda5 sarahookr
Claude Code Skills gains attention with a published talk and Hugging Face's new "skill" enabling one-line fine-tuning pipelines for models from ~0.5B to 70B parameters, supporting SFT, DPO, and GRPO, costing as low as ~$0.30 for small runs. Zhipu AI launches multimodal models GLM-4.6V (106B params MoE) and GLM-4.6V-Flash (9B dense), featuring 128k context and native multimodal function calling, with free Flash variant and API pricing detailed. Jina AI releases Jina-VLM (2B), a compact multilingual VLM excelling in diagrams and documents with top benchmark scores. At NeurIPS 2025, research highlights include Google's post-Transformer sequence architectures (Moneta, Yaad, Memora) showing up to 20% gains in long-context retrieval, AxiomProver's autonomous Lean system solving 9/12 Putnam 2025 problems rapidly, and mechanistic interpretability advances discussed by Chris Olah emphasizing scalable tooling.
not much happened today
codex claude-4-opus claude-4-sonnet gemini-2.5-pro gemini-2.5 qwen-2.5-vl qwen-3 playdiffusion openai anthropic google perplexity-ai bing playai suno hugging-face langchain-ai qwen mlx assemblyai llamacloud fine-tuning model-benchmarking text-to-video agentic-ai retrieval-augmented-generation open-source-models speech-editing audio-processing text-to-speech ultra-low-latency multimodality public-notebooks sama gdb kevinweil lmarena_ai epochairesearch reach_vb wightmanr deeplearningai mervenoyann awnihannun jordirib1 aravsrinivas omarsar0 lioronai jerryjliu0 nerdai tonywu_71 _akhaliq clementdelangue _mfelfel
OpenAI rolled out Codex to ChatGPT Plus users with internet access and fine-grained controls, improving memory features for free users. Anthropic's Claude 4 Opus and Sonnet models lead coding benchmarks, while Google's Gemini 2.5 Pro and Flash models gain recognition with new audio capabilities. Qwen 2.5-VL and Qwen 3 quantizations are noted for versatility and support. Bing Video Creator launched globally enabling text-to-video generation, and Perplexity Labs sees increased demand for travel search. New agentic AI tools and RAG innovations include LlamaCloud and FedRAG. Open-source releases include Holo-1 for web navigation and PlayAI's PlayDiffusion for speech editing. Audio and multimodal advances feature Suno's music editing upgrades, Google's native TTS in 24+ languages, and Universal Streaming's ultra-low latency speech-to-text. Google NotebookLM now supports public notebooks. "Codex's internet access brings tradeoffs, with explicit warnings about risk" and "Gemini 2.5 Pro is cited as a daily driver by users".
Cognition's DeepWiki, a free encyclopedia of all GitHub repos
o4-mini perception-encoder qwen-2.5-vl dia-1.6b grok-3 gemini-2.5-pro claude-3.7 gpt-4.1 cognition meta-ai-fair alibaba hugging-face openai perplexity-ai vllm vision text-to-speech reinforcement-learning ocr model-releases model-integration open-source frameworks chatbots model-selector silas-alberti mervenoyann reach_vb aravsrinivas vikparuchuri lioronai
Silas Alberti of Cognition announced DeepWiki, a free encyclopedia of all GitHub repos providing Wikipedia-like descriptions and Devin-backed chatbots for public repos. Meta released Perception Encoders (PE) with A2.0 license, outperforming InternVL3 and Qwen2.5VL on vision tasks. Alibaba launched the Qwen Chat App for iOS and Android. Hugging Face integrated the Dia 1.6B SoTA text-to-speech model via FAL. OpenAI expanded deep research usage with a lightweight version powered by o4-mini model, now available to free users. Perplexity AI updated their model selector with Grok 3 Beta, o4-mini, and support for models like gemini 2.5 pro, claude 3.7, and gpt-4.1. vLLM project introduced OpenRLHF framework for reinforcement learning with human feedback. Surya OCR alpha model supports 90+ languages and LaTeX. MegaParse open-source library was introduced for LLM-ready data formats.
not much happened today
gpt-2 r1 gemma-3 gemmacoder3-12b qwen2.5-omni openai deepseek berkeley alibaba togethercompute nvidia azure runway langchain bmw amazon open-source function-calling benchmarking code-reasoning multimodality inference-speed image-generation voice-generation animation robotics realtime-transcription webrtc sama clémentdelangue lioronai scaling01 cognitivecompai osanseviero jack_w_rae ben_burtenshaw theturingpost vipulved kevinweil tomlikesrobots adcock_brett juberti
OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, signaling a move towards more open AI development. DeepSeek launched its open-source R1 model earlier this year, challenging perceptions of China's AI progress. Gemma 3 has achieved function calling capabilities and ranks on the Berkeley Function-Calling Leaderboard, while GemmaCoder3-12b improves code reasoning performance on LiveCodeBench. Alibaba_Qwen's Qwen2.5-Omni introduces a novel Thinker-Talker system and TMRoPE for multimodal input understanding. The TogetherCompute team achieved 140 TPS on a 671B parameter model, outperforming Azure and DeepSeek API on Nvidia GPUs. OpenAI also expanded ChatGPT features with image generation for all free users and a new voice release. Runway Gen-4 enhances animation for miniature dioramas, and LangChain launched a chat-based generative UI agent. Commercial deployment of Figure 03 humanoid robots at BMW highlights advances in autonomy and manufacturing scaling. New tools include OpenAI's realtime transcription API with WebRTC support and Amazon's Nova Act AI browser agent.
How To Scale Your Model, by DeepMind
qwen-0.5 google-deepmind deepseek hugging-face transformers inference high-performance-computing robotics sim2real mixture-of-experts reinforcement-learning bias-mitigation rust text-generation open-source omarsar0 drjimfan tairanhe99 guanyashi lioronai _philschmid awnihannun clementdelangue
Researchers at Google DeepMind (GDM) released a comprehensive "little textbook" titled "How To Scale Your Model" covering modern Transformer architectures, inference optimizations beyond O(N^2) attention, and high-performance computing concepts like rooflines. The resource includes practical problems and real-time comment engagement. On AI Twitter, several key updates include the open-sourced humanoid robotics model ASAP inspired by athletes like Cristiano Ronaldo, LeBron James, and Kobe Bryant; a new paper on Mixture-of-Agents proposing the Self-MoA method for improved LLM output aggregation; training of reasoning LLMs using the GRPO algorithm from DeepSeek demonstrated on Qwen 0.5; findings on bias in LLMs used as judges highlighting the need for multiple independent evaluations; and the release of mlx-rs, a Rust library for machine learning with examples including Mistral text generation. Additionally, Hugging Face launched an AI app store featuring over 400,000 apps with 2,000 new daily additions and 2.5 million weekly visits, enabling AI-powered app search and categorization.