Topic: "ambient-agents"
OpenAI Titan XPU: 10GW of self-designed chips with Broadcom
llama-3-70b openai nvidia amd broadcom inferencemax asic inference compute-infrastructure chip-design fp8 reinforcement-learning ambient-agents custom-accelerators energy-consumption podcast gdb
OpenAI is finalizing a custom ASIC design with Broadcom to deploy 10GW of inference compute, complementing existing deals with NVIDIA (10GW) and AMD (6GW). This marks a significant scale-up from OpenAI's current ~2GW of compute, toward a stated roadmap of 250GW total, roughly half of average US electricity consumption. OpenAI's Greg Brockman highlights the shift of ChatGPT from interactive use to always-on ambient agents that require massive compute, emphasizing the challenge of building chips for billions of users. The in-house ASIC effort was driven by the need for tailored designs after limited success influencing external chip startups. Broadcom's stock surged 10% on the news. Separately, InferenceMAX reports improved ROCm stability and nuanced performance comparisons between AMD MI300X and NVIDIA H100/H200 on Llama-3-70B FP8 workloads, along with RL training infrastructure updates.
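As a rough sanity check on those figures, an illustrative calculation (the per-deal GW numbers are from the summary above; the ~460GW US average electricity demand is an outside assumption, not from the source):

```python
# Rough arithmetic on the announced compute build-out (illustrative sketch).
# Deal sizes are taken from the summary above; US average electricity demand
# (~460 GW) is an assumed outside figure, not from the source.

deals_gw = {
    "Broadcom custom ASIC": 10,
    "NVIDIA": 10,
    "AMD": 6,
}
current_gw = 2          # OpenAI's current compute, per the summary
roadmap_gw = 250        # stated long-term roadmap
us_avg_electricity_gw = 460  # assumption: approximate US average electric load

announced_total = sum(deals_gw.values())
print(f"Announced deals total: {announced_total} GW "
      f"({announced_total / current_gw:.0f}x current {current_gw} GW)")
print(f"Roadmap: {roadmap_gw} GW ≈ {roadmap_gw / us_avg_electricity_gw:.0%} "
      f"of average US electricity demand")
```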
small news items
r7b llama-3-70b minicpm-o-2.6 gpt-4v qwen2.5-math-prm ollama cohere togethercompute openbmb qwen langchain openai rag tool-use-tasks quality-of-life new-engine multimodality improved-reasoning math-capabilities process-reward-models llm-reasoning mathematical-reasoning beta-release task-scheduling ambient-agents email-assistants ai-software-engineering codebase-analysis test-case-generation security-infrastructure llm-scaling-laws power-law plateauing-improvements gans-revival
Ollama added Cohere's R7B model, optimized for RAG and tool-use tasks, and released Ollama v0.5.5 with quality-of-life updates and a new engine. Together AI launched Llama 3.3 70B with improved reasoning and math capabilities, while OpenBMB introduced MiniCPM-o 2.6, a multimodal model that outperforms GPT-4V on visual tasks. Insights into Process Reward Models (PRMs) for boosting LLM reasoning were shared, alongside Qwen2.5-Math-PRM models that excel at mathematical reasoning. OpenAI rolled out a beta of Tasks in ChatGPT for Plus, Pro, and Team users, enabling scheduled reminders and summaries, while LangChain introduced open-source ambient agents for email assistance. AI software engineering is advancing rapidly, with some predicting it will match human capabilities within 18 months. Research on LLM scaling laws highlights power-law relationships and plateauing improvements, while GANs are experiencing a revival.
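For context on how process reward models are typically used to boost reasoning, here is a minimal hypothetical sketch of PRM-guided best-of-N selection; `generate_candidates` and `score_step` are placeholders standing in for a real generator LLM and a real PRM such as Qwen2.5-Math-PRM, not APIs from the source:

```python
# Hypothetical sketch of PRM-guided best-of-N selection: a process reward
# model scores each intermediate reasoning step, and the candidate solution
# with the best aggregate step score is kept. `generate_candidates` and
# `score_step` are placeholders, not actual APIs from the source.
from typing import Callable, List

def prm_best_of_n(
    question: str,
    generate_candidates: Callable[[str, int], List[List[str]]],
    score_step: Callable[[str, List[str], str], float],
    n: int = 8,
) -> List[str]:
    """Return the candidate chain-of-thought with the highest minimum step score."""
    candidates = generate_candidates(question, n)  # n solutions, each a list of steps

    def aggregate(steps: List[str]) -> float:
        # Score each step given the question and the steps before it,
        # then take the minimum: one bad step sinks the whole solution.
        scores = [score_step(question, steps[:i], step) for i, step in enumerate(steps)]
        return min(scores) if scores else float("-inf")

    return max(candidates, key=aggregate)
```

Taking the minimum step score rather than the mean is a common aggregation choice in PRM reranking, since a single flawed step usually invalidates the whole solution.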