Topic: "policy"
not much happened today
fable-5 mythos anthropic model-performance trust data-retention benchmarking agentic-ai coding policy darioamodei natolambert martin_casado drfeifei antirez clementdelangue deanwball hlntnr _arohan_ dbahdanau gergelyorosz scaling01 dbreunig omarsar0 yacinemtb mchlhess jasonbotterill lvwerra lechmazur kimmonismus walden_yan hrishioa
Anthropic faced backlash for silently degrading AI research capabilities in its Fable/Mythos models without clear disclosure, raising concerns about trust, reproducibility, and enterprise data-retention policies. Despite the controversy, Fable 5 posted strong benchmark results, leading agentic and coding tasks with high scores on Agent Arena, SimpleBench, CADGenBench, and PACT. Amid these tensions, Dario Amodei published a policy advocating stronger oversight of frontier AI.
not much happened today
qwen2-math-72b gpt-4o claude-3.5-sonnet gemini-1.5-pro llama-3.1-405b idefics3-llama-8b anthropic google mistral-ai llamaindex math fine-tuning synthetic-data reinforcement-learning bug-bounty visual-question-answering open-source retrieval-augmented-generation agentic-ai ai-safety policy rohanpaul_ai anthropicai mervenoyann jeremyphoward omarsar0 ylecun bindureddy
Qwen2-Math-72B outperforms GPT-4o, Claude-3.5-Sonnet, Gemini-1.5-Pro, and Llama-3.1-405B on math benchmarks using synthetic data and advanced optimization techniques. Google AI cuts pricing for Gemini 1.5 Flash by up to 78%. Anthropic expands its bug bounty program to target universal jailbreaks in next-gen safety systems. A tutorial on QLoRA fine-tuning of IDEFICS3-Llama 8B for visual question answering is released. A Chinese open-weights model surpasses previous MATH benchmark records. Surveys on Mamba models and on LLM-based agents for software engineering highlight advancements and applications. Open-source tools such as the R2R RAG engine and LlamaIndex Workflows simplify building complex AI applications. Mistral AI introduces customizable AI agents. Concerns are raised about California bill SB 1047's focus on existential risk, alongside debates over banning open-source AI. Memes and humor continue in AI communities.