Company: "qualcomm"
not much happened today
dflash nemo-automodel claude openai broadcom qualcomm modular nvidia skypilot modal anthropic hugging-face hardware inference performance-optimization model-training agent-ux security capability-based-security open-source fine-tuning infrastructure model-optimization gdb kimmonismus scaling01 clattner_llvm karpathy gallabytes dabit3 kentonvarda random_walker jubbaonjeans victormustar
OpenAI announced Jalapeño, its first custom AI chip for LLM inference, built with Broadcom on a fast 9-month design cycle, with the aim of controlling more of the AI stack and improving compute economics. Community analysis suggests Jalapeño has 216 GB of HBM3E, ~7.1–7.4 TB/s of bandwidth, and ~10 PFLOPS of FP4 performance, signaling hyperscaler-style inference silicon as a new standard. Meanwhile, Qualcomm is acquiring Modular, with the open-sourcing of Mojo on track, pointing to rising competition among vertically integrated inference stacks beyond NVIDIA/CUDA. On infrastructure, NVIDIA's NeMo AutoModel boosts training throughput for MoE models by 3.4–3.7x, and startups such as SkyPilot and Modal are advancing unified and open-source inference solutions. Custom training of DFLASH models yields 30–50% decode gains. In UX, Anthropic's Slack-native Claude agent shifts agent interaction from tools to coworkers, raising new security and cost concerns around identity, permissions, and lock-in, and prompting debates on capability-based security and attribution. Hugging Face responded with Moon Bot, its self-hosted Slack coding agent.
Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)
llama-3-2 llama-3-1 claude-3-haiku gpt-4o-mini molmo-72b molmo-7b gemma-2 phi-3-5 llama-3-2-vision llama-3-2-3b llama-3-2-20b meta-ai-fair ai2 qualcomm mediatek arm ollama together-ai fireworks-ai weights-biases cohere weaviate multimodality vision context-windows quantization model-release tokenization model-performance model-optimization rag model-training instruction-following mira-murati daniel-han
Meta released Llama 3.2, including multimodal 11B and 90B versions built by adding 3B and 20B vision adapters to a frozen Llama 3.1, which show competitive performance against Claude Haiku and GPT-4o-mini. AI2 launched its multimodal Molmo 72B and 7B models, which outperform Llama 3.2 on vision tasks. Meta also introduced new 128k-context 1B and 3B models that compete with Gemma 2 and Phi 3.5, and hinted at collaborations with Qualcomm, MediaTek, and Arm for on-device AI. The 1B and 3B models have a reported training token count of 9 trillion. Launch partners include Ollama, Fireworks AI, and Together AI, which is offering free access to the 11B model. Additionally, a new RAG++ course from Weights & Biases, Cohere, and Weaviate offers systematic evaluation and deployment guidance for retrieval-augmented generation systems, based on extensive production experience.