All tags
Person: "jayalammar"
not much happened today
aya-vision-8b aya-vision-32b llama-3-2-90b-vision molmo-72b phi-4-mini phi-4-multimodal cogview4 wan-2-1 weights-and-biases coreweave cohereforai microsoft alibaba google llamaindex weaviate multilinguality vision multimodality image-generation video-generation model-releases benchmarking funding agentic-ai model-performance mervenoyann reach_vb jayalammar sarahookr aidangomez nickfrosst dair_ai akhaliq bobvanluijt jerryjliu0
Weights and Biases announced a $1.7 billion acquisition by CoreWeave ahead of CoreWeave's IPO. CohereForAI released the Aya Vision models (8B and 32B parameters) supporting 23 languages, outperforming larger models like Llama-3.2 90B Vision and Molmo 72B. Microsoft introduced Phi-4-Mini (3.8B parameters) and Phi-4-Multimodal models, excelling in math, coding, and multimodal benchmarks. CogView4, a 6B parameter text-to-image model with 2048x2048 resolution and Apache 2.0 license, was released. Alibaba launched Wan 2.1, an open-source video generation model with 720p output and 16 fps generation. Google announced new AI features for Pixel devices including Scam Detection and Gemini integrations. LlamaCloud reached General Availability and raised $19M Series A funding, serving over 100 Fortune 500 companies. Weaviate launched the Query Agent, the first of three Weaviate Agents.
Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking
gemini-2.0-flash gemini-2.0-flash-lite gemini-2.0-pro-experimental gemini-1.5-pro deepseek-r1 gpt-2 llama-3-1 google-deepmind hugging-face anthropic multimodality context-windows cost-efficiency pretraining fine-tuning reinforcement-learning transformer tokenization embeddings mixture-of-experts andrej-karpathy jayalammar maartengr andrewyng nearcyan
Google DeepMind officially launched Gemini 2.0 models including Flash, Flash-Lite, and Pro Experimental, with Gemini 2.0 Flash outperforming Gemini 1.5 Pro while being 12x cheaper and supporting multimodal input and a 1 million token context window. Andrej Karpathy released a 3h31m video deep dive into large language models, covering pretraining, fine-tuning, and reinforcement learning with examples like GPT-2 and Llama 3.1. A free course on Transformer architecture was introduced by Jay Alammar, Maarten Gr, and Andrew Ng, focusing on tokenizers, embeddings, and mixture-of-expert models. DeepSeek-R1 reached 1.2 million downloads on Hugging Face with a detailed 36-page technical report. Anthropic increased rewards to $10K and $20K for their jailbreak challenge, while BlueRaven extension was updated to hide Twitter metrics for unbiased engagement.
CogVideoX: Zhipu's Open Source Sora
cogvideox llama-3-1 llama-3-405b moondream phi-3.5 llama-rank zhipu-ai alibaba meta-ai-fair google hugging-face nvidia togethercompute salesforce video-generation serverless-computing vision document-vqa text-vqa mixture-of-experts retrieval-augmented-generation long-context model-routing webgpu background-removal long-form-generation superposition-prompting rohanpaul_ai philschmid vikhyatk algo_diver jayalammar davidsholz
Zhipu AI, Alibaba's AI arm and China's 3rd largest AI lab, released the open 5B video generation model CogVIdeoX, which can run without GPUs via their ChatGLM web and desktop apps. Meta AI announced trust & safety research and CyberSecEval 3 alongside the release of Llama 3.1, with Llama 3 405B now available serverless on Google Cloud Vertex AI and Hugging Face x NVIDIA NIM API. Updates include Moondream, an open vision-language model improving DocVQA and TextVQA tasks, and the lightweight MoE chat model Phi-3.5 with 16x3.8B parameters. Together Compute introduced the Rerank API featuring Salesforce's LlamaRank model for document and code ranking. Research highlights include superposition prompting for RAG without fine-tuning, the AgentWrite pipeline for long-form content generation over 20,000 words, and a comparison showing Long Context methods outperform RAG at higher costs. Tools include Not Diamond, an AI model router, AI command line interfaces, and an open-source WebGPU background removal tool. "You don't even need GPUs to run it," referring to CogVIdeoX.