All tags
Model: "alphafold-3"
LMSys advances Llama 3 eval analysis
llama-3-70b llama-3 claude-3-sonnet alphafold-3 lmsys openai google-deepmind isomorphic-labs benchmarking model-behavior prompt-complexity model-specification molecular-structure-prediction performance-analysis leaderboards demis-hassabis sam-altman miranda-murati karina-nguyen joanne-jang john-schulman
LMSys is enhancing LLM evaluation by categorizing performance across 8 query subcategories and 7 prompt complexity levels, revealing uneven strengths in models like Llama-3-70b. DeepMind released AlphaFold 3, advancing molecular structure prediction with holistic modeling of protein-DNA-RNA complexes, impacting biology and genetics research. OpenAI introduced the Model Spec, a public standard to clarify model behavior and tuning, inviting community feedback and aiming for models to learn directly from it. Llama 3 has reached top leaderboard positions on LMSys, nearly matching Claude-3-sonnet in performance, with notable variations on complex prompts. The analysis highlights the evolving landscape of model benchmarking and behavior shaping.
OpenAI's PR Campaign?
alphafold-3 xlstm gpt-4 openai microsoft google-deepmind memory-management model-spec scaling multimodality performance transformers dynamic-memory model-architecture demis-hassabis sama joanne-jang omarsar0 arankomatsuzaki drjimfan
OpenAI faces user data deletion backlash over its new partnership with StackOverflow amid GDPR complaints and US newspaper lawsuits, while addressing election year concerns with efforts like the Media Manager tool for content opt-in/out by 2025 and source link attribution. Microsoft develops a top-secret airgapped GPT-4 AI service for US intelligence agencies. OpenAI releases the Model Spec outlining responsible AI content generation policies, including NSFW content handling and profanity use, emphasizing clear distinctions between bugs and design decisions. Google DeepMind announces AlphaFold 3, a state-of-the-art model predicting molecular structures with high accuracy, showcasing cross-domain AI techniques. New research on xLSTM proposes scaling LSTMs to billions of parameters, competing with transformers in performance and scaling. Microsoft introduces vAttention, a dynamic memory management method for efficient large language model serving without PagedAttention.