Topic: "mechanistic-interpretability"
not much happened this weekend
o3 o1 opus sonnet octave openai langchain hume x-ai amd nvidia meta-ai-fair hugging-face inference-time-scaling model-ensembles small-models voice-cloning fine-math-dataset llm-agent-framework benchmarking software-stack large-concept-models latent-space-reasoning mechanistic-interpretability planning speech-language-models lisa-su clementdelangue philschmid neelnanda5
The o3 model drew significant attention, with discussion of its capabilities and implications, including an OpenAI board member referencing "AGI." LangChain released its State of AI 2024 survey. Hume announced OCTAVE, a 3B-parameter, API-only speech-language model with voice cloning. xAI secured a $6B Series C funding round. Discussions highlighted inference-time scaling, model ensembles, and the surprising generalization ability of small models. New tools and datasets include FineMath, billed as the best open math dataset on Hugging Face, and frameworks for LLM agents. Industry updates cover a five-month benchmark of AMD MI300X against Nvidia H100 and H200, insights from a meeting with Lisa Su on AMD's software stack, and open AI engineering roles. Research innovations include Large Concept Models (LCM) from Meta AI, Chain of Continuous Thought (Coconut) for latent-space reasoning, and mechanistic interpretability initiatives.
not much happened today
llama-3 llama-3-1 grok-2 claude-3.5-sonnet gpt-4-turbo nous-research nvidia salesforce goodfire-ai anthropic x-ai google-deepmind box langchain fine-tuning prompt-caching mechanistic-interpretability model-performance multimodality agent-frameworks software-engineering-agents api document-processing text-generation model-releases vision image-generation efficiency scientific-discovery fchollet demis-hassabis
GPT-5 was delayed again amid a quiet news day. Nous Research released Hermes 3, a finetune of Llama 3 base models that rivals FAIR's instruct tunes but sparked debate over emergent existential-crisis behavior with 6% roleplay data. Nvidia introduced Minitron, a finetune of Llama 3.1. Salesforce launched a DEI agent scoring 55% on SWE-Bench Lite; the DEI framework surpasses individual agent performance. Goodfire AI secured $7M in seed funding for mechanistic interpretability work. Anthropic rolled out prompt caching in its API, cutting input costs by up to 90% and latency by 80%, which helps coding assistants and large-document processing. xAI released Grok-2, matching Claude 3.5 Sonnet and GPT-4 Turbo on the LMSYS leaderboard, with vision and text inputs and image generation integration. Claude 3.5 Sonnet reportedly outperforms GPT-4 in coding and reasoning. François Chollet defined intelligence as the efficient operationalization of past information for future tasks. Google DeepMind's Demis Hassabis discussed AGI's role in scientific discovery and safe AI development. The Dora AI plugin generates landing pages in under 60 seconds, boosting web team efficiency. The Box AI API beta enables document chat, data extraction, and content summarization. LangChain updated its Python and JavaScript integration docs.
Anthropic's "LLM Genome Project": learning & clamping 34m features on Claude Sonnet
claude-3-sonnet claude-3 anthropic scale-ai suno-ai microsoft model-interpretability dictionary-learning neural-networks feature-activation intentional-modifiability scaling mechanistic-interpretability emmanuel-ameisen alex-albert
Anthropic released the third paper in its MechInterp series, Scaling Monosemanticity, scaling interpretability analysis to 34 million features on Claude 3 Sonnet. The work applies dictionary learning to isolate recurring neuron activation patterns, so that internal states are described as combinations of interpretable features rather than individual neurons. The paper reveals abstract features related to code, errors, sycophancy, crime, self-representation, and deception, and demonstrates intentional modifiability by clamping feature values. The research marks a significant advance in model interpretability and neural network analysis at frontier scale.
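
To make "combining features rather than neurons" and "clamping feature values" concrete, here is a minimal NumPy sketch of a sparse-autoencoder-style dictionary. Everything in it is an illustrative assumption rather than Anthropic's code: the sizes `D_MODEL` and `N_FEATURES`, the randomly initialised (untrained) weights, and the helper names `encode`, `decode`, and `clamp_feature` are all hypothetical. A real dictionary is learned from model activations with a sparsity penalty.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL = 8      # width of the activation vector (hypothetical toy size)
N_FEATURES = 32  # dictionary size, larger than D_MODEL (hypothetical)

# Untrained random weights stand in for a learned dictionary (assumption).
W_enc = rng.normal(size=(D_MODEL, N_FEATURES))
b_enc = np.zeros(N_FEATURES)
W_dec = rng.normal(size=(N_FEATURES, D_MODEL))
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # unit-norm feature directions
b_dec = np.zeros(D_MODEL)


def encode(x):
    """Map an activation vector to non-negative feature activations (ReLU)."""
    return np.maximum(0.0, x @ W_enc + b_enc)


def decode(f):
    """Rebuild an activation vector as a weighted sum of feature directions."""
    return f @ W_dec + b_dec


def clamp_feature(x, index, value):
    """Return x with one feature's activation forced to `value`.

    The reconstruction error is added back so that only the chosen
    feature's contribution to the activation changes.
    """
    f = encode(x)
    error = x - decode(f)
    f_clamped = f.copy()
    f_clamped[index] = value
    return decode(f_clamped) + error


x = rng.normal(size=D_MODEL)           # stand-in for a model activation
steered = clamp_feature(x, index=3, value=10.0)
print(np.round(steered - x, 3))        # a multiple of feature 3's direction
```

Because the decoder is linear, clamping shifts the activation along a single feature direction, by `(value - original_activation) * W_dec[index]`. That is the sense in which a clamped feature gives a targeted handle on the model's internal state.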