Model: "gemma-4"
not much happened today
glm-5.2 glm-5.2-max opus-4.8 claude-fable-5 ornith-1.0 gemma-4 qwen-3.5 lfm2.5-230m gemini-3.5-flash codex z.ai databricks liquid-ai google-deepmind google sail hyperagent openai langchain coding-benchmarks agentic-ai reinforcement-learning model-optimization speculative-decoding hardware-optimization long-running-agents agent-persistence cost-efficiency computer-use safety-controls developer-tools token-consumption concurrent-agents philschmid gdb reach_vb eliebakouch
Z.ai's GLM-5.2 leads coding and agent benchmarks, with top scores such as 1595 on Code Arena: Frontend and 34.29% reasoning accuracy with zero failures. Databricks raised GLM-5.2 throughput to 392 tok/s through hardware and inference optimizations. Ornith-1.0, a new MIT-licensed coding model family, spans 9B to 397B parameters, with strong benchmark results and a self-improving RL training method. Liquid AI released a small model for low-latency robotics and e-commerce use. Google integrated computer use into Gemini 3.5 Flash, with safety controls and developer tools for device control. Startups such as Sail and Hyperagent focus on long-running agents with persistent execution and cost efficiency. OpenAI reports growing internal use of Codex for complex, cross-functional tasks, highlighting agent skill concurrency.
not much happened today
opus-4.8 gemma-4 cognition frontiercode moonshot google claudedevs magicpath langsmith modal coding-evaluation agent-control verification agent-ergonomics sandbox-environments local-inference workflow-optimization cli-tools plugin-integration persistent-memory swyx dzhng claudecode bcherny reach_vb omarsar0 gneubig hamelhusain angaisb_
Cognition's FrontierCode benchmark highlights how hard coding tasks remain: the best model, Opus 4.8, scores only about 13% on the hardest subset, indicating that coding is less solved than benchmarks suggest. Loops are a prominent control metaphor for coding agents, with emphasis on clear goals, verification, and iteration, though some experts caution against overreliance on loops. Agent ergonomics are improving with observability dashboards, sandbox environments, and workflow tools from ClaudeDevs, MagicPath, LangSmith, and Modal. Moonshot released major Kimi updates, including a stronger coding agent and a desktop agent product supporting up to 300 local sub-agents. Google advanced efficient local deployment with upgrades to Gemma 4 checkpoints.
not much happened today
gemma-4 google huggingface intel ollama unsloth reasoning agentic-workflows multimodality on-device-ai local-inference model-benchmarking moe vision audio-processing memory-optimization open-source model-performance fchollet demishassabis clementdelangue quixiai googlegemma ggerganov osanseviero maartengr basecampbernie prince_canuma measure_plan kimmonismus anemll arena stochasticchasm reach_vb zeneca everlier erick_lindberg_ anomalistg
Google launched Gemma 4 under an Apache 2.0 license, a significant open-model release focused on reasoning, agentic workflows, multimodality, and on-device use. It outperforms models 10x larger and has immediate ecosystem support, including vLLM, llama.cpp, Ollama, Intel hardware, Unsloth, and Hugging Face Inference Endpoints. Local inference benchmarks showed strong performance on consumer hardware, including the RTX 4090 and Mac mini M4. Early benchmarks highlighted its efficiency and ranking improvements over previous versions. Meanwhile, Hermes Agent emerged as a popular open-source agent harness, noted for stability and capability on long tasks, with users switching from OpenClaw to Hermes.
Gemma 4
gemma-4 gemma-4-31b gemma-4-26b-a4b google-deepmind multimodality long-context model-architecture moe local-inference model-optimization function-calling quantization jeffdean _philschmid rasbt ggerganov clattner_llvm julien_c clementdelangue
Google DeepMind released Gemma 4, a family of open-weight, multimodal models with long-context support up to 256K tokens under an Apache 2.0 license, marking a major capability and licensing shift. The lineup includes a 31B dense model, a 26B MoE model (A4B), and two edge models (E4B, E2B) optimized for local and edge deployment, with native multimodal support (text, vision, audio). Early benchmarks show Gemma-4-31B ranking #3 among open models and strong scientific reasoning performance, with 85.7% on GPQA Diamond. Day-0 ecosystem support includes llama.cpp, Ollama, vLLM, and LM Studio, with notable local inference performance on hardware such as the M2 Ultra and RTX 4090. The architecture features hybrid attention and MoE layering, diverging from standard transformers. Community and developer engagement is high, with rapid adoption and tooling integration.