All tags
Person: "alexandr-wang"
Execuhires: Tempting The Wrath of Khan
gemini-1.5-pro gpt-4o claude-3.5 flux-1 llama-3-1-405b character.ai google adept amazon inflection microsoft stability-ai black-forest-labs schelling google-deepmind openai anthropic meta-ai-fair lmsys langchainai execuhire model-benchmarking multilinguality math coding text-to-image agent-ide open-source-models post-training data-driven-performance noam-shazeer mostafa-mostaque david-friedman rob-rombach alexandr-wang svpino rohanpaul_ai
Character.ai's $2.5b execuhire to Google marks a significant leadership move alongside Adept's $429m execuhire to Amazon and Inflection's $650m execuhire to Microsoft. Despite strong user growth and content momentum, Character.ai's CEO Noam Shazeer returns to Google, signaling shifting vibes in the AI industry. Google DeepMind's Gemini 1.5 Pro tops Chatbot Arena benchmarks, outperforming GPT-4o and Claude-3.5, excelling in multilingual, math, and coding tasks. The launch of Black Forest Labs' FLUX.1 text-to-image model and LangGraph Studio agent IDE highlight ongoing innovation. Llama 3.1 405B is released as the largest open-source model, fostering developer use and competition with closed models. The industry is focusing increasingly on post-training and data as key competitive factors, raising questions about acquisition practices and regulatory scrutiny.
GraphRAG: The Marriage of Knowledge Graphs and RAG
gemma-2 llama-3-70b claude-3.5-sonnet nemotron-340b qwen2-72b llama-3 microsoft-research anthropic nvidia hugging-face retrieval-augmented-generation knowledge-graphs token-usage inference-time attention-mechanisms instruction-following coding math long-range-reasoning synthetic-data dataset-release fine-tuning context-windows function-calling travis-fischer rasbt alexandr-wang osanseviero rohanpaul_ai hamelhusain svpino aaaazzam omarsar0
Microsoft Research open sourced GraphRAG, a retrieval augmented generation (RAG) technique that extracts knowledge graphs from sources and clusters them for improved LLM answers, though it increases token usage and inference time. Gemma 2 models were released focusing on efficient small LLMs with innovations like sliding window attention and RMS norm, nearly matching the larger Llama 3 70B. Anthropic's Claude 3.5 Sonnet leads in instruction following and coding benchmarks, while Nvidia's Nemotron 340B model was released in June. Qwen2-72B tops the HuggingFace Open LLM leaderboard excelling in math and long-range reasoning. Discussions on RAG highlighted its limitations and improvements in context usage via function calls. A persona-driven synthetic data generation approach introduced 1 billion personas, with a fine-tuned model matching GPT-4 performance on math benchmarks at 7B scale. The 200GB AutoMathText dataset was also noted for math data synthesis.
Contextual Position Encoding (CoPE)
cope gemini-1.5-flash gemini-1.5-pro claude gpt-3 meta-ai-fair google-deepmind anthropic perplexity-ai langchain openai positional-encoding transformers counting copying language-modeling coding external-memory tool-use model-evaluation inference-speed model-benchmarking scaling research-synthesis jason-weston alexandr-wang karpathy arav-srinivas
Meta AI researcher Jason Weston introduced CoPE, a novel positional encoding method for transformers that incorporates context to create learnable gates, enabling improved handling of counting and copying tasks and better performance on language modeling and coding. The approach can potentially be extended with external memory for gate calculation. Google DeepMind released Gemini 1.5 Flash and Pro models optimized for fast inference. Anthropic announced general availability of tool use for Claude, enhancing its ability to orchestrate tools for complex tasks. Alexandr Wang launched SEAL Leaderboards for private, expert evaluations of frontier models. Karpathy reflected on the 4th anniversary of GPT-3, emphasizing scaling and practical improvements. Perplexity AI launched Perplexity Pages to convert research into visually appealing articles, described as an "AI Wikipedia" by Arav Srinivas.
Chameleon: Meta's (unreleased) GPT4o-like Omnimodal Model
chameleon gpt-4o gemini-1.5-flash claude-3 meta-ai-fair openai google-deepmind anthropic reddit multimodality early-fusion benchmarking model-training tokenization streaming tool-use vision coding hallucination-detection model-performance armen-aghajanyan sama alexandr-wang abacaj alexalbert__
Meta AI FAIR introduced Chameleon, a new multimodal model family with 7B and 34B parameter versions trained on 10T tokens of interleaved text and image data enabling "early fusion" multimodality that can natively output any modality. While reasoning benchmarks are modest, its "omnimodality" approach competes well with pre-GPT4o multimodal models. OpenAI launched GPT-4o, a model excelling in benchmarks like MMLU and coding tasks, with strong multimodal capabilities but some regression in ELO scores and hallucination issues. Google DeepMind announced Gemini 1.5 Flash, a small model with 1M context window and flash performance, highlighting convergence trends between OpenAI and Google models. Anthropic updated Claude 3 with streaming support, forced tool use, and vision tool integration for multimodal knowledge extraction. OpenAI also partnered with Reddit, raising industry attention.