All tags
Person: "julien_c"
Gemma 4
gemma-4 gemma-4-31b gemma-4-26b-a4b google-deepmind multimodality long-context model-architecture moe local-inference model-optimization function-calling quantization jeffdean _philschmid rasbt ggerganov clattner_llvm julien_c clementdelangue
Google DeepMind released Gemma 4, a family of open-weight, multimodal models with long-context support up to 256K tokens under an Apache 2.0 license, marking a major capability and licensing shift. The lineup includes 31B dense, 26B MoE (A4B), and two edge models (E4B, E2B) optimized for local and edge deployment with native multimodal support (text, vision, audio). Early benchmarks show Gemma-4-31B ranking #3 among open models and strong scientific reasoning performance with 85.7% GPQA Diamond. Day-0 ecosystem support includes llama.cpp, Ollama, vLLM, and LM Studio, with notable local inference performance on hardware like M2 Ultra and RTX 4090. The architecture features hybrid attention and MoE layering, diverging from standard transformers. Community and developer engagement is high, with rapid adoption and tooling integration.