Company: "broadcom"
not much happened today
dflash nemo-automodel claude openai broadcom qualcomm modular nvidia skypilot modal anthropic hugging-face hardware inference performance-optimization model-training agent-ux security capability-based-security open-source fine-tuning infrastructure model-optimization gdb kimmonismus scaling01 clattner_llvm karpathy gallabytes dabit3 kentonvarda random_walker jubbaonjeans victormustar
OpenAI announced Jalapeño, its first custom AI chip for LLM inference, built with Broadcom on a fast 9-month design cycle, aiming to control more of the AI stack and improve compute economics. Community analysis suggests Jalapeño features 216GB of HBM3E, ~7.1–7.4 TB/s of memory bandwidth, and ~10 PFLOPS of FP4 performance, signaling that hyperscaler-style inference silicon is becoming a new standard. Meanwhile, Qualcomm is acquiring Modular, with the open-sourcing of Mojo still on track, indicating rising competition in vertically integrated inference stacks beyond NVIDIA/CUDA. On infrastructure, NVIDIA's NeMo AutoModel boosts training throughput for MoE models by 3.4–3.7x, and startups like SkyPilot and Modal advance unified and open-source inference solutions. Custom training of DFLASH models yields 30–50% decode gains. In UX, Anthropic's Slack-native Claude agent reframes agents from tools to coworkers, raising new security and cost concerns around identity, permissions, and lock-in, with debates on capability-based security and attribution. Hugging Face responded with Moon Bot, its self-hosted Slack coding agent.
OpenAI Titan XPU: 10GW of self-designed chips with Broadcom
llama-3-70b openai nvidia amd broadcom inferencemax asic inference compute-infrastructure chip-design fp8 reinforcement-learning ambient-agents custom-accelerators energy-consumption podcast gdb
OpenAI is finalizing a custom ASIC design to deploy 10GW of inference compute, complementing existing deals with NVIDIA (10GW) and AMD (6GW). This marks a significant scale-up from OpenAI's current 2GW of compute, on a roadmap toward 250GW total, which is roughly half of US electricity consumption. Greg Brockman of OpenAI highlights the shift of ChatGPT from interactive use to always-on ambient agents requiring massive compute, emphasizing the challenge of building chips for billions of users. The in-house ASIC effort was driven by the need for tailored designs after limited success influencing external chip startups. Broadcom's stock surged 10% on the news. Additionally, InferenceMAX reports improved ROCm stability and nuanced performance comparisons between AMD MI300X and NVIDIA H100/H200 on llama-3-70b FP8 workloads, with RL training infrastructure updates also noted.