Person: "ben_burtenshaw"
not much happened today
glm-4.6v glm-4.6v-flash jina-vlm-2b hugging-face zhipu-ai jina-ai google-deepmind axiomprover fine-tuning multimodality model-optimization long-context mechanistic-interpretability formal-methods sequence-architectures reinforcement-learning lioronai akshay_pachaar _akhaliq ben_burtenshaw vllm_project prince_canuma zenmuxai eliebakouch theturingpost axiommathai neelnanda5 sarahookr
Claude Code Skills gains attention with a published talk and Hugging Face's new "skill", which enables one-line fine-tuning pipelines for models from ~0.5B to 70B parameters, supports SFT, DPO, and GRPO, and costs as little as ~$0.30 for small runs. Zhipu AI launches the multimodal models GLM-4.6V (106B-parameter MoE) and GLM-4.6V-Flash (9B dense), featuring 128k context and native multimodal function calling; the Flash variant is free, and API pricing is detailed. Jina AI releases Jina-VLM (2B), a compact multilingual VLM that excels at diagrams and documents, with top benchmark scores. At NeurIPS 2025, research highlights include Google's post-Transformer sequence architectures (Moneta, Yaad, Memora), which show up to 20% gains in long-context retrieval; AxiomProver's autonomous Lean system, which rapidly solved 9/12 Putnam 2025 problems; and mechanistic interpretability advances discussed by Chris Olah, with an emphasis on scalable tooling.
Cohere Command A Reasoning beats GPT-OSS-120B and DeepSeek R1 0528
command-a-reasoning deepseek-v3.1 cohere deepseek intel huggingface baseten vllm-project chutes-ai anycoder agentic-ai hybrid-models long-context fp8-training mixture-of-experts benchmarking quantization reasoning coding-workflows model-pricing artificialanlys reach_vb scaling01 cline ben_burtenshaw haihaoshen jon_durbin _akhaliq willccbb teortaxestex
Cohere's Command A Reasoning model outperforms GPT-OSS in open deep research capabilities, emphasizing agentic use cases for 2025. DeepSeek-V3.1 introduces a hybrid reasoning architecture that toggles between reasoning and non-reasoning modes and is optimized for agentic workflows and coding, with extensive long-context pretraining (~630B tokens for 32k context, ~209B for 128k), FP8 training, and a large MoE design with ~37B active parameters. Benchmarks show competitive performance with notable improvements in SWE-Bench and other reasoning tasks. The model is priced at $0.56/M input tokens and $1.68/M output tokens on the DeepSeek API and enjoys rapid ecosystem integration, including HF weights, INT4 quantization by Intel, and vLLM reasoning toggles. Community feedback highlights the hybrid design's pragmatic approach to agent and software engineering workflows, though some note the lack of tool use in reasoning mode.
not much happened today
gpt-2 r1 gemma-3 gemmacoder3-12b qwen2.5-omni openai deepseek berkeley alibaba togethercompute nvidia azure runway langchain bmw amazon open-source function-calling benchmarking code-reasoning multimodality inference-speed image-generation voice-generation animation robotics realtime-transcription webrtc sama clémentdelangue lioronai scaling01 cognitivecompai osanseviero jack_w_rae ben_burtenshaw theturingpost vipulved kevinweil tomlikesrobots adcock_brett juberti
OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, signaling a move towards more open AI development. DeepSeek launched its open-source R1 model earlier this year, challenging perceptions of China's AI progress. Gemma 3 has gained function-calling capabilities and ranks on the Berkeley Function-Calling Leaderboard, while GemmaCoder3-12b improves code reasoning performance on LiveCodeBench. Alibaba's Qwen2.5-Omni introduces a novel Thinker-Talker system and TMRoPE for multimodal input understanding. The TogetherCompute team achieved 140 TPS on a 671B-parameter model, outperforming Azure and the DeepSeek API on Nvidia GPUs. OpenAI also expanded ChatGPT features with image generation for all free users and a new voice release. Runway Gen-4 enhances animation for miniature dioramas, and LangChain launched a chat-based generative UI agent. Commercial deployment of Figure 03 humanoid robots at BMW highlights advances in autonomy and manufacturing scaling. New tools include OpenAI's realtime transcription API with WebRTC support and Amazon's Nova Act AI browser agent.