Topic: "linear-attention"
not much happened today
raev2 gated-deltanet-2 kda mamba-3 dclm nvidia openai nous-research representation-learning tokenization linear-attention long-context mechanistic-interpretability math data-filtering agent-infrastructure language-modeling commonsense-reasoning 1jaskiratsingh recatm sainingxie ahatamiz1 rasbt nousresearch tatsu_hashimoto goodfireai markchen90 wtgowers memecrashes cloneofsimo lvwerra
RAEv2 advances representation-first tokenization with >10x faster convergence and improved generation, tested on text-to-image and world models. NVIDIA's Gated DeltaNet-2 extends linear attention with channel-wise gates, outperforming KDA and Mamba-3 at 1.3B parameters on language modeling and reasoning tasks. Studies on subword tokenization reveal only some benefits at scale, while data filtering research suggests that, given enough compute (around 1e30 FLOPs), applying no filtering at all may be optimal. Mechanistic interpretability updates propose clustering features by joint firing patterns to better understand feature geometry. OpenAI's AI-assisted breakthrough on an Erdős unit-distance problem sparks debate on AI's role in mathematical research. In agent infrastructure, harnesses remain a key driver of capability improvements.
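The summary does not spell out Gated DeltaNet-2's exact update rule. As a point of reference only, here is a minimal NumPy sketch of a generic delta rule with a channel-wise (per key dimension) decay gate, in the style of KDA. It is a naive per-step recurrence for illustration (production kernels use chunked parallel forms), and the function name, argument names, and shapes are assumptions, not NVIDIA's formulation.

```python
import numpy as np

def gated_delta_rule(q, k, v, beta, alpha):
    """Naive recurrent delta rule with a channel-wise decay gate (illustrative sketch).

    Assumed shapes (T = sequence length, dk = key dim, dv = value dim):
      q, k:  (T, dk)   queries and keys (keys ideally unit-norm)
      v:     (T, dv)   values
      beta:  (T,)      write strength per step, in [0, 1]
      alpha: (T, dk)   decay per key channel, in [0, 1]
    Returns outputs of shape (T, dv).
    """
    T, dk = k.shape
    dv = v.shape[1]
    S = np.zeros((dk, dv))  # fixed-size state mapping keys to values
    out = np.empty((T, dv))
    for t in range(T):
        # Channel-wise gate: each key dimension (row of S) forgets at its own rate.
        S = alpha[t][:, None] * S
        # What the state currently returns for this key.
        pred = k[t] @ S
        # Delta rule: write only the prediction error, scaled by beta.
        S = S + beta[t] * np.outer(k[t], v[t] - pred)
        # Read out with the query.
        out[t] = q[t] @ S
    return out
```

With alpha and beta set to 1 and a unit-norm key, writing two values under the same key leaves only the second one retrievable, which is the overwrite behavior that distinguishes the delta rule from plain additive linear attention. Using the same alpha value in every channel recovers a scalar-gated variant.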
not much happened today
Poolside raised $1B at a $12B valuation. Eric Zelikman raised $1B after leaving xAI. Weavy joined Figma. New research finds that FP16 precision reduces training-inference mismatch in reinforcement learning (RL) fine-tuning compared to BF16. Kimi AI introduced a hybrid KDA (Kimi Delta Attention) architecture that improves long-context throughput and RL stability, alongside a new Kimi CLI for coding with agent protocol support. OpenAI previewed Agent Mode in ChatGPT, enabling autonomous research and planning during browsing.
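To build intuition for the FP16-vs-BF16 finding, the sketch below rounds a few probability-like values into each format using only the standard library and compares relative rounding error. It illustrates per-value precision only and does not reproduce the cited research; the helper names are hypothetical, and the BF16 conversion is emulated by rounding the low 16 bits of a float32 (NaN and infinity handling omitted).

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (10 explicit mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16(x: float) -> float:
    """Round x to bfloat16 (7 explicit mantissa bits) via float32 bits.

    Uses round-to-nearest-even on the 16 dropped low bits. NaN/inf not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties round to even
    bits = ((bits + rounding_bias) >> 16) << 16  # keep only the top 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def rel_err(x: float, rounded: float) -> float:
    """Relative rounding error of `rounded` with respect to x."""
    return abs(rounded - x) / abs(x)

for p in [0.1, 0.3333, 0.73]:
    print(f"{p}: fp16 rel err {rel_err(p, to_fp16(p)):.1e}, "
          f"bf16 rel err {rel_err(p, to_bf16(p)):.1e}")
```

FP16 keeps three more mantissa bits than BF16 (10 vs 7), so its worst-case rounding error in the normal range is about 8x smaller. The trade-off is range: FP16 tops out at 65504, while BF16 shares FP32's exponent range, which is why BF16 is the usual default for training.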
Mergestral, Meta MTIAv2, Cohere Rerank 3, Google Infini-Attention
mistral-8x22b command-r-plus rerank-3 infini-attention llama-3 sd-1.5 cosxl meta-ai-fair mistral-ai cohere google stability-ai hugging-face ollama model-merging training-accelerators retrieval-augmented-generation linear-attention long-context foundation-models image-generation rag-pipelines model-benchmarking context-length model-performance aidan_gomez ylecun swyx
Meta announced its new MTIAv2 chips, designed for training and inference acceleration, with an improved architecture and integration with PyTorch 2.0. Mistral released Mixtral 8x22B, which was then merged back into a dense model, effectively creating a 22B Mistral model. Cohere launched Rerank 3, a foundation model that enhances enterprise search and retrieval-augmented generation (RAG) systems and supports 100+ languages. Google published a paper on Infini-attention, an ultra-scalable linear attention mechanism demonstrated on 1B and 8B models with sequence lengths of up to 1 million tokens. Additionally, Meta's Llama 3 is expected to start rolling out soon. Other notable updates include Command R+, an open model with a 128k context length that surpasses GPT-4 in chatbot performance, and advancements in Stable Diffusion models and RAG pipelines.
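Infini-attention's central idea, as described in the paper, is a fixed-size compressive memory that is updated with a linear-attention rule after each segment and read back alongside ordinary local softmax attention, with a learned gate mixing the two. The NumPy sketch below follows that description in a simplified single-head form, using the plain additive (non-delta) memory update and an ELU+1 feature map. The function names, shapes, and scalar gate are illustrative assumptions, not Google's implementation.

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map sigma(x) = ELU(x) + 1 for the linear-attention memory.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def infini_attention(Q, K, V, segment_len, gate_logit=0.0):
    """Simplified single-head sketch. Q, K: (T, dk); V: (T, dv). Returns (T, dv)."""
    T, dk = Q.shape
    dv = V.shape[1]
    M = np.zeros((dk, dv))  # compressive memory, same size for any T
    z = np.zeros(dk)        # normalization term accumulated from keys
    g = 1.0 / (1.0 + np.exp(-gate_logit))  # sigmoid gate: memory vs local attention
    outputs = []
    for start in range(0, T, segment_len):
        q = Q[start:start + segment_len]
        k = K[start:start + segment_len]
        v = V[start:start + segment_len]
        n = q.shape[0]
        # Local causal softmax attention inside the current segment.
        scores = q @ k.T / np.sqrt(dk)
        future = np.triu(np.ones((n, n), dtype=bool), k=1)  # mask positions after each query
        a_local = softmax(np.where(future, -np.inf, scores)) @ v
        # Read from memory built from earlier segments (empty for the first segment).
        sq = elu_plus_one(q)
        if z.any():
            a_mem = (sq @ M) / (sq @ z)[:, None]
        else:
            a_mem = np.zeros((n, dv))
        outputs.append(g * a_mem + (1.0 - g) * a_local)
        # Fold this segment's keys and values into the memory (additive update).
        sk = elu_plus_one(k)
        M = M + sk.T @ v
        z = z + sk.sum(axis=0)
    return np.concatenate(outputs, axis=0)
```

However long the input, the state stays at one dk x dv matrix plus a dk-dimensional normalizer, which is what lets this style of attention scale to very long sequences. The first segment reads an empty memory, so this sketch uses zeros for the memory term there instead of dividing by zero.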