Person: "kathleen-kenealy"

Jun 28, 2024

gemma-2 qwen-72b mixtral-8x22b-instruct claude-3.5-sonnet google-deepmind alibaba mistral-ai anthropic knowledge-distillation attention-mechanisms multilingual-models multimodality model-training model-optimization memory-optimization fine-tuning kathleen-kenealy daniel-han

Gemma 2, a 27B-parameter model from google-deepmind, was released with architectural innovations such as a 1:1 alternation of local and global attention layers and logit soft-capping. Its smaller models were trained with knowledge distillation on more than 50× the compute-optimal number of tokens. The model supports multilingual and multimodal capabilities, and fine-tuning has succeeded on more than 200 Indic language variants. On the Open LLM Leaderboard, alibaba's Qwen 72B ranks as the top model, with mistral-ai's Mixtral-8x22B-Instruct also ranking highly. Anthropic launched Claude 3.5 Sonnet, which raises intelligence while keeping mid-tier cost and speed. Research on eliminating matrix multiplication from LLMs promises significant memory savings without loss of performance. Kathleen Kenealy and Daniel Han shared insights on Gemma 2's tokenizer and attention scaling, respectively.
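The two attention ideas mentioned above can be illustrated with a small, framework-free sketch. Logit soft-capping squashes a logit smoothly into the range (-cap, cap) using `cap * tanh(x / cap)`. The 1:1 local-global alternation means every other layer uses a sliding-window (local) causal mask, while the layers in between attend over the full causal context. The helper names, the choice that even-indexed layers are local, the window convention, and the constants `ATTN_CAP = 50.0`, `FINAL_CAP = 30.0` and `WINDOW = 4096` are assumptions for illustration, not Gemma 2's actual code.

```python
import math

# Assumed constants for illustration only (not taken from this post).
ATTN_CAP = 50.0   # hypothetical soft-cap for attention logits
FINAL_CAP = 30.0  # hypothetical soft-cap for final output logits
WINDOW = 4096     # hypothetical sliding-window size for local layers


def soft_cap(logit: float, cap: float) -> float:
    """Logit soft-capping: smoothly bound a logit to (-cap, cap) via cap * tanh(x / cap)."""
    return cap * math.tanh(logit / cap)


def layer_attention_kind(layer_idx: int) -> str:
    """1:1 local/global alternation.

    Assumption: even-indexed layers are local (sliding window), odd-indexed are global.
    """
    return "local" if layer_idx % 2 == 0 else "global"


def can_attend(query_pos: int, key_pos: int, kind: str, window: int) -> bool:
    """Causal mask; local layers additionally limit keys to the last `window` positions."""
    if key_pos > query_pos:
        return False  # causal: a token never attends to future tokens
    if kind == "local":
        # Assumed convention: the query sees itself plus the previous window - 1 tokens.
        return query_pos - key_pos < window
    return True  # global layers see the whole causal prefix


if __name__ == "__main__":
    print([layer_attention_kind(i) for i in range(4)])  # ['local', 'global', 'local', 'global']
    # Small logits pass through almost unchanged; huge ones saturate near the cap.
    print(soft_cap(1.0, ATTN_CAP), soft_cap(1e6, FINAL_CAP))
    # A local layer cannot reach a key WINDOW positions back; a global layer can.
    print(can_attend(WINDOW, 0, "local", WINDOW), can_attend(WINDOW, 0, "global", WINDOW))
```

Because `tanh` is nearly linear near zero, soft-capping leaves typical logits almost untouched and only compresses extreme values. The local layers keep per-layer attention cost bounded by the window size, and the interleaved global layers preserve access to long-range context.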

