Topic: "zero-shot-learning"

Jul 01, 2025

chai-2 gemini-2.5-pro deepseek-r1-0528 meta scale-ai anthropic cloudflare grammarly superhuman chai-discovery atlassian notion slack commoncrawl hugging-face sakana-ai inference model-scaling collective-intelligence zero-shot-learning enterprise-deployment data-access science-funding open-source-llms alexandr_wang nat_friedman clementdelangue teortaxestex ylecun steph_palazzolo andersonbcdefg jeremyphoward reach_vb

Meta makes a major AI move by hiring Scale AI founder Alexandr Wang as Chief AI Officer and acquiring a 49% non-voting stake in Scale AI for $14.3 billion, roughly doubling its valuation to about $29 billion. Chai Discovery announces Chai-2, a breakthrough model for zero-shot antibody discovery and optimization. The US government faces budget cuts threatening to eliminate a quarter million science research jobs by 2026. Data access restrictions intensify as companies like Atlassian, Notion, and Slack block web crawlers including Common Crawl, raising concerns about future public internet archives. Hugging Face shuts down HuggingChat after serving over a million users, marking the end of a significant experiment in open-source LLMs. Sakana AI releases AB-MCTS, an inference-time scaling algorithm enabling multiple models like Gemini 2.5 Pro and DeepSeek-R1-0528 to cooperate and outperform individual models.

Mar 20, 2025

Every 7 Months: The Moore's Law for Agent Autonomy

claude-3-7-sonnet llama-4 phi-4-multimodal gpt-2 cosmos-transfer1 gr00t-n1-2b orpheus-3b metr nvidia hugging-face canopy-labs meta-ai-fair microsoft agent-autonomy task-completion multimodality text-to-speech robotics foundation-models model-release scaling-laws fine-tuning zero-shot-learning latency reach_vb akhaliq drjimfan scaling01

METR published a paper measuring AI agent autonomy progress, showing it has doubled every 7 months since 2019 (GPT-2). They introduced a new metric, the 50%-task-completion time horizon: the length of task, measured by how long it takes a human, that a model completes with 50% success. Claude 3.7 Sonnet's horizon is about 50 minutes. Projections estimate 1-day autonomy by 2028 and 1-month autonomy by late 2029. Meanwhile, Nvidia released Cosmos-Transfer1 for conditional world generation and GR00T-N1-2B, an open foundation model for humanoid robot reasoning with 2B parameters. Canopy Labs introduced Orpheus 3B, a high-quality text-to-speech model with zero-shot voice cloning and low latency. Meta reportedly delayed the Llama 4 release due to performance issues. Microsoft launched Phi-4-multimodal.
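The doubling claim can be turned into simple arithmetic. The sketch below assumes a steady 7-month doubling time and a starting horizon of 50 minutes, both taken from the summary above; the function names and the target lengths (what counts as a "day" or a "month" of work) are illustrative assumptions, not METR's definitions, and the resulting dates shift noticeably depending on which definition is used.

```python
import math


def months_to_reach(current_minutes: float, target_minutes: float,
                    doubling_months: float = 7.0) -> float:
    """Months needed for the time horizon to grow from current to target,
    assuming it doubles every `doubling_months` months."""
    if current_minutes <= 0 or target_minutes <= 0:
        raise ValueError("horizons must be positive")
    doublings = math.log2(target_minutes / current_minutes)
    return doublings * doubling_months


def horizon_after(current_minutes: float, months: float,
                  doubling_months: float = 7.0) -> float:
    """Time horizon (in minutes) after `months` of steady doubling."""
    return current_minutes * 2 ** (months / doubling_months)


if __name__ == "__main__":
    start = 50.0  # minutes, from the summary above
    # Hypothetical target definitions; METR's own may differ.
    targets = {
        "8-hour working day": 8 * 60,
        "24-hour day": 24 * 60,
        "167-hour working month": 167 * 60,
    }
    for label, minutes in targets.items():
        print(f"{label}: {months_to_reach(start, minutes):.1f} months")
```

Because the growth is exponential, each doubling of the target adds only seven months, which is why the choice between a working day and a calendar day moves the projection by about a year.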

Feb 12, 2025

not much happened today

zonos-v0.1 audiobox-aesthetics moshi sonar llama-3-70b gpt-4o-mini claude-3.5-haiku gpt-4o claude-3.5-sonnet deepseek-r1-distilled-qwen-1.5b reasonflux-32b o1-preview zyphra-ai meta-ai-fair kyutai-labs perplexity-ai cerebras uc-berkeley brilliant-labs google-deepmind text-to-speech speech-to-speech benchmarking model-performance reinforcement-learning math real-time-processing open-source cross-platform-integration multilinguality zero-shot-learning danhendrycks

Zyphra AI launched Zonos-v0.1, a leading open-weight text-to-speech model supporting multiple languages and zero-shot voice cloning. Meta FAIR released the open-source Audiobox Aesthetics model trained on 562 hours of audio data. Kyutai Labs introduced Moshi, a real-time speech-to-speech system with low latency. Perplexity AI announced the Sonar model based on Llama 3.3 70B, outperforming top models like GPT-4o and Claude 3.5 Sonnet and running at 1,200 tokens per second on Cerebras infrastructure. UC Berkeley open-sourced a 1.5B model trained with reinforcement learning that beats o1-preview on math tasks. ReasonFlux-32B achieved 91.2% on the MATH benchmark, outperforming OpenAI o1-preview. CrossPoster, an AI agent for cross-platform posting, was released using LlamaIndex workflows. Brilliant Labs integrated the Google DeepMind Gemini Live API into smart glasses for real-time translation and object identification.

Jan 17, 2025

not much happened today

oute-tts-0.3-1b oute-tts-0.3-500m olm-1b qwen-2.5-0.5b hover gpt-4o deepseek-v3 harvey meta-ai-fair stability-ai alibaba deepseek hugging-face text-to-speech zero-shot-learning multilinguality emotion-control motor-control reinforcement-learning local-ai distributed-inference pipeline-parallelism mathematical-reasoning process-reward-models legal-ai education-ai ai-security humor reach_vb drjimfan vikhyatk mervenoyann aiatmeta iscienceluvr alibaba_qwen awnihannun ajeya_cotra emollick qtnx_ designerx

Harvey secured a new $300M funding round. OuteTTS 0.3 1B & 500M text-to-speech models were released featuring zero-shot voice cloning, multilingual support (en, jp, ko, zh, fr, de), and emotion control, powered by OLMo-1B and Qwen 2.5 0.5B. The HOVER model, a 1.5M-parameter neural net for agile motor control, was introduced, leveraging human motion capture datasets and massively parallel reinforcement learning. kokoro.js enables running AI models locally in browsers with minimal dependencies. Meta AI awarded $200K LLM evaluation grants for projects on regional language understanding, complex reasoning, and interactive programming environments. Stability AI's Twitter account was hacked, prompting security warnings. Alibaba Qwen improved Process Reward Models (PRMs) for better mathematical reasoning using a consensus filtering mechanism. DeepSeek V3 uses pipeline parallelism to enhance distributed inference and long-context generation efficiency. Discussions on AI policy in legal frameworks and AI's role in democratizing education were highlighted. Lighthearted AI-related humor was also shared.

Aug 21, 2024

not much happened today

gpt-4o claude-3.5-sonnet phi-3.5-mini phi-3.5-moe phi-3.5-vision llama-3-1-405b qwen2-math-72b openai anthropic microsoft meta-ai-fair hugging-face langchain box fine-tuning benchmarking model-comparison model-performance diffusion-models reinforcement-learning zero-shot-learning math model-efficiency ai-regulation ai-safety ai-engineering prompt-engineering swyx ylecun

OpenAI launched GPT-4o fine-tuning with a case study on Cosine. Anthropic released Claude 3.5 Sonnet with 8k token output. Microsoft's Phi team introduced Phi-3.5 in three variants: Mini (3.8B), MoE (16x3.8B), and Vision (4.2B), noted for sample efficiency. Meta released Llama 3.1 405B, deployable on Google Cloud Vertex AI, offering GPT-4 level capabilities. Qwen2-Math-72B achieved state-of-the-art math benchmark performance and shipped with a Gradio demo. Discussions included model comparisons such as ViT vs CNN and the Mamba architecture. Tool updates featured the DSPy roadmap, Flux Schnell improving diffusion speed on M1 Max, and LangChain community events. Research highlights included zero-shot DUP prompting for math reasoning and fine-tuning best practices. AI ethics coverage included California's AI Safety Bill SB 1047 and regulatory concerns from Yann LeCun. Swyx offered commentary on AI engineer roles. A "Chat with PDF" feature is now available for Box Enterprise Plus users.

Jul 17, 2024

SciCode: HumanEval gets a STEM PhD upgrade

gpt-4 claude-3.5-sonnet llama-3-7b llama-3 dolphin-2.9.3-yi-1.5-34b-32k-gguf anthropic hugging-face nvidia benchmarks coding model-training gpu-optimization model-performance synthetic-data compiler-optimization zero-shot-learning yi-tay rohanpaul_ai alexalbert__ tri_dao abacaj

PhD-level benchmarks highlight the difficulty of coding scientific problems for LLMs, with GPT-4 and Claude 3.5 Sonnet scoring under 5% on the new SciCode benchmark. Anthropic doubled the max output token limit for Claude 3.5 Sonnet to 8192 tokens. The Q-GaLore method enables training LLaMA-7B on a single 16GB GPU. The Mosaic compiler now generates efficient code for NVIDIA H100 GPUs. The Dolphin 2.9.3-Yi-1.5-34B-32k-GGUF model on Hugging Face has over 111k downloads. Llama 3 shows strong performance, achieving 90% zero-shot accuracy on the MATH dataset. Discussions continue on the limitations and forms of synthetic data for model training.

Apr 23, 2024

FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you're welcome)

llama-3-70b llama-3 wizardlm-2-8x22b claude-opus mistral-8x7b gpt-4 huggingface meta-ai-fair dbrx reka-ai mistral-ai lmsys openai datasets benchmarking quantization zero-shot-learning reasoning code-error-detection token-generation security

2024 has seen a significant increase in dataset sizes for training large language models, with RedPajama 2 offering up to 30T tokens, DBRX at 12T tokens, Reka Core/Flash/Edge with 5T tokens, and Llama 3 trained on 15T tokens. Hugging Face released an open dataset containing 15T tokens from 12 years of filtered CommonCrawl data, enabling training of models like Llama 3 if compute resources are available. On Reddit, WizardLM-2-8x22b outperformed other open LLMs including Llama-3-70b-instruct in reasoning and math benchmarks. Claude Opus demonstrated strong zero-shot code error spotting, surpassing Llama 3. Benchmarks revealed limitations in the LMSYS chatbot leaderboard due to instruction-tuned models gaming the system, and a new RAG benchmark showed Llama 3 70B underperforming compared to GPT-4, while Mixtral 8x7B remained strong. Efficient quantized versions of Llama 3 models are available on Hugging Face, with users reporting token generation limits around 9600 tokens on a 3090 GPU. Safety and security news included a UK sex offender being banned from using AI tools and GPT-4 demonstrating an 87% success rate at exploiting real vulnerabilities.
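To give a rough sense of what "if compute resources are available" means for a 15T-token dataset, here is a back-of-the-envelope sketch. It uses the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per token, which is an approximation and not a figure from the summary above. The 70B parameter count is taken from the llama-3-70b tag; the per-GPU peak throughput and utilization are hypothetical placeholders, and the function names are illustrative.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb
    (N = parameter count, D = training tokens). An estimate only."""
    return 6.0 * params * tokens


def gpu_days(total_flops: float, peak_flops_per_gpu: float,
             utilization: float) -> float:
    """GPU-days needed at a given peak throughput and utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    seconds = total_flops / (peak_flops_per_gpu * utilization)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    flops = training_flops(params=70e9, tokens=15e12)
    # Hypothetical hardware: 1e15 FLOP/s peak per GPU at 40% utilization.
    days = gpu_days(flops, peak_flops_per_gpu=1e15, utilization=0.4)
    print(f"approx. training compute: {flops:.2e} FLOPs")
    print(f"approx. GPU-days under the assumed hardware: {days:,.0f}")
```

Swapping in a smaller parameter count or a different utilization figure changes the result linearly, so the sketch is best read as an order-of-magnitude estimate.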