All tags
Model: "gpt-2"
not much happened today
gpt-2 r1 gemma-3 gemmacoder3-12b qwen2.5-omni openai deepseek berkeley alibaba togethercompute nvidia azure runway langchain bmw amazon open-source function-calling benchmarking code-reasoning multimodality inference-speed image-generation voice-generation animation robotics realtime-transcription webrtc sama clémentdelangue lioronai scaling01 cognitivecompai osanseviero jack_w_rae ben_burtenshaw theturingpost vipulved kevinweil tomlikesrobots adcock_brett juberti
OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, signaling a move towards more open AI development. DeepSeek launched its open-source R1 model earlier this year, challenging perceptions of China's AI progress. Gemma 3 has achieved function calling capabilities and ranks on the Berkeley Function-Calling Leaderboard, while GemmaCoder3-12b improves code reasoning performance on LiveCodeBench. Alibaba_Qwen's Qwen2.5-Omni introduces a novel Thinker-Talker system and TMRoPE for multimodal input understanding. The TogetherCompute team achieved 140 TPS on a 671B parameter model, outperforming Azure and DeepSeek API on Nvidia GPUs. OpenAI also expanded ChatGPT features with image generation for all free users and a new voice release. Runway Gen-4 enhances animation for miniature dioramas, and LangChain launched a chat-based generative UI agent. Commercial deployment of Figure 03 humanoid robots at BMW highlights advances in autonomy and manufacturing scaling. New tools include OpenAI's realtime transcription API with WebRTC support and Amazon's Nova Act AI browser agent.
Every 7 Months: The Moore's Law for Agent Autonomy
claude-3-7-sonnet llama-4 phi-4-multimodal gpt-2 cosmos-transfer1 gr00t-n1-2b orpheus-3b metr nvidia hugging-face canopy-labs meta-ai-fair microsoft agent-autonomy task-completion multimodality text-to-speech robotics foundation-models model-release scaling-laws fine-tuning zero-shot-learning latency reach_vb akhaliq drjimfan scaling01
METR published a paper measuring AI agent autonomy progress, showing it has doubled every 7 months since 2019 (GPT-2). They introduced a new metric, the 50%-task-completion time horizon, where models like Claude 3.7 Sonnet achieve 50% success in about 50 minutes. Projections estimate 1 day autonomy by 2028 and 1 month autonomy by late 2029. Meanwhile, Nvidia released Cosmos-Transfer1 for conditional world generation and GR00T-N1-2B, an open foundation model for humanoid robot reasoning with 2B parameters. Canopy Labs introduced Orpheus 3B, a high-quality text-to-speech model with zero-shot voice cloning and low latency. Meta reportedly delayed Llama-4 release due to performance issues. Microsoft launched Phi-4-multimodal.
Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking
gemini-2.0-flash gemini-2.0-flash-lite gemini-2.0-pro-experimental gemini-1.5-pro deepseek-r1 gpt-2 llama-3-1 google-deepmind hugging-face anthropic multimodality context-windows cost-efficiency pretraining fine-tuning reinforcement-learning transformer tokenization embeddings mixture-of-experts andrej-karpathy jayalammar maartengr andrewyng nearcyan
Google DeepMind officially launched Gemini 2.0 models including Flash, Flash-Lite, and Pro Experimental, with Gemini 2.0 Flash outperforming Gemini 1.5 Pro while being 12x cheaper and supporting multimodal input and a 1 million token context window. Andrej Karpathy released a 3h31m video deep dive into large language models, covering pretraining, fine-tuning, and reinforcement learning with examples like GPT-2 and Llama 3.1. A free course on Transformer architecture was introduced by Jay Alammar, Maarten Gr, and Andrew Ng, focusing on tokenizers, embeddings, and mixture-of-expert models. DeepSeek-R1 reached 1.2 million downloads on Hugging Face with a detailed 36-page technical report. Anthropic increased rewards to $10K and $20K for their jailbreak challenge, while BlueRaven extension was updated to hide Twitter metrics for unbiased engagement.
not much happened today
llama-3-2-vision gpt-2 meta-ai-fair ollama amd llamaindex gemini gitpod togethercompute langchainai weights-biases stanfordnlp deeplearningai model-scaling neural-networks multi-gpu-support skip-connections transformers healthcare-ai automated-recruitment zero-trust-security small-language-models numerical-processing chain-of-thought optical-character-recognition multi-agent-systems agent-memory interactive-language-learning bindureddy fstichler stasbekman jxmnop bindureddy omarsar0 giffmana rajammanabrolu
This week in AI news highlights Ollama 0.4 supporting Meta's Llama 3.2 Vision models (11B and 90B), with applications like handwriting recognition. Self-Consistency Preference Optimization (ScPO) was introduced to improve model consistency without human labels. Discussions on model scaling, neural networks resurgence, and AMD's multi-GPU bandwidth challenges were noted. The importance of skip connections in Transformers was emphasized. In healthcare, less regulation plus AI could revolutionize disease treatment and aging. Tools like LlamaParse and Gemini aid automated resume insights. Gitpod Flex demonstrated zero-trust architecture for secure development environments. Research includes surveys on Small Language Models (SLMs), number understanding in LLMs, and DTrOCR using a GPT-2 decoder for OCR. Multi-agent systems in prediction markets were discussed by TogetherCompute and LangChainAI. Community events include NeurIPS Happy Hour, NLP seminars, and courses on Agent Memory with LLMs as operating systems.
We Solved Hallucinations
gpt-2 flashattention-3 lynx meta-ai-fair nvidia princeton colfax patronus-ai databricks mosaic-ai openai compute-hardware gpu-optimization flashattention llm-evaluation hallucination-detection vision benchmarking synthetic-data model-training karpathy tri_dao giffmana vikhyatk dbrxmosaicai
Reddit's URL structure causes link errors in AI-generated summaries, especially with NSFW content affecting models like Claude and GPT-4. The team fixed this glitch while still leveraging LLMs for summarizing Reddit content. GPT-2 training costs have dramatically dropped to ~$672 using H100 GPUs and software improvements like CUDA and FlashAttention. FlashAttention-3 was released, achieving up to 740 TFLOPS on H100 GPUs, with FP8 nearing 1.2 PFLOPS, developed collaboratively by Meta, NVIDIA, Princeton, and Colfax. Hopper GPUs enable major speedups with new hardware features. Synthetic data may not improve vision tasks, as shown in recent research. The Avocado360 benchmark evaluates vision-language models' ability to detect avocados in images. Lynx, a hallucination detection model for LLMs, was introduced for real-world healthcare and fintech applications, trained by Patronus AI on Databricks Mosaic AI using Composer.
Somebody give Andrej some H100s already
gpt-2 openai fineweb meta-ai-fair nvidia tesla cuda fine-tuning training-time gpu-acceleration convolutional-neural-networks real-time-processing ai-safety ai-regulation andrej-karpathy yann-lecun elon-musk francois-chollet svpino mervenoyann
OpenAI's GPT-2 sparked controversy five years ago for being "too dangerous to release." Now, with FineWeb and llm.c, a tiny GPT-2 model can be trained in 90 minutes for $20 using 8xA100 GPUs, with the full 1.6B model estimated to take 1 week and $2.5k. The project is notable for its heavy use of CUDA (75.8%) aiming to simplify the training stack. Meanwhile, a Twitter debate between Yann LeCun and Elon Musk highlighted the importance of convolutional neural networks (CNNs) in real-time image processing for autonomous driving, with LeCun emphasizing scientific research's role in technological progress. LeCun also criticized AI doomsday scenarios, arguing for cautious optimism about AI safety and regulation.