All tags
Topic: "molecular-structure-prediction"
LMSys advances Llama 3 eval analysis
llama-3-70b llama-3 claude-3-sonnet alphafold-3 lmsys openai google-deepmind isomorphic-labs benchmarking model-behavior prompt-complexity model-specification molecular-structure-prediction performance-analysis leaderboards demis-hassabis sam-altman miranda-murati karina-nguyen joanne-jang john-schulman
LMSys is enhancing LLM evaluation by categorizing performance across 8 query subcategories and 7 prompt complexity levels, revealing uneven strengths in models like Llama-3-70b. DeepMind released AlphaFold 3, advancing molecular structure prediction with holistic modeling of protein-DNA-RNA complexes, impacting biology and genetics research. OpenAI introduced the Model Spec, a public standard to clarify model behavior and tuning, inviting community feedback and aiming for models to learn directly from it. Llama 3 has reached top leaderboard positions on LMSys, nearly matching Claude-3-sonnet in performance, with notable variations on complex prompts. The analysis highlights the evolving landscape of model benchmarking and behavior shaping.