Company: "reka-ai"
The new OpenAI Agents Platform
reka-flash-3 o1-mini claude-3-7-sonnet llama-3-3-70b sonic-2 qwen-chat olympiccoder openai reka-ai hugging-face deepseek togethercompute alibaba ai-agents api model-releases fine-tuning reinforcement-learning model-training model-inference multimodality voice-synthesis gpu-clusters model-distillation performance-optimization open-source sama reach_vb
OpenAI introduced a comprehensive suite of new tools for AI agents, including the Responses API, Web Search Tool, Computer Use Tool, File Search Tool, and an open-source Agents SDK with integrated observability tools, marking a significant step towards the "Year of Agents." Meanwhile, Reka AI open-sourced Reka Flash 3, a 21B parameter reasoning model that outperforms o1-mini and powers their Nexus platform, with weights available on Hugging Face. The OlympicCoder series surpassed Claude 3.7 Sonnet and much larger models on competitive coding benchmarks. DeepSeek built a 32K GPU cluster capable of training V3-level models in under a week and is exploring AI distillation. Hugging Face announced Cerebras inference support, achieving over 2,000 tokens/s on Llama 3.3 70B, 70x faster than leading GPUs. Cartesia's Sonic-2 voice AI model delivers 40ms latency via the Together API. Alibaba's Qwen Chat enhanced its multimodal interface with video understanding for files up to 500MB, voice-to-text, guest mode, and expanded file uploads. Sam Altman (sama) praised OpenAI's new API as "one of the most well-designed and useful APIs ever."
The AI Nobel Prize
claude-3.5-sonnet reka-flash got openai anthropic reka-ai zep artificial-neural-networks nobel-prize knowledge-graphs memory-layers real-time-voice-api vision fine-tuning prompt-caching multimodality function-calling ocr open-source single-sign-on software-testing ai-assisted-coding ai-ethics geoff-hinton john-hopfield philschmid alexalbert mervenoyann clementdelangue svpino bindureddy ylecun rohanpaul_ai
Geoff Hinton and John Hopfield won the Nobel Prize in Physics for their work on artificial neural networks. The award citation spans 14 pages highlighting their contributions. Zep released a new community edition of their low-latency memory layer for AI agents, emphasizing knowledge graphs for memory. At OpenAI's DevDay, new features were introduced, including a real-time voice API, vision model fine-tuning, and prompt caching with a 50% discount on reused tokens. Anthropic's Claude 3.5 Sonnet was described as the best model currently available. Reka AI Labs updated their Reka Flash model with enhanced multimodal and function calling capabilities. The GOT (General OCR Theory) model achieved 98.79% accuracy on OCR benchmarks. Discussions on open-source AI models highlighted their role in fostering competition and decentralization. Software development insights included the importance of Single Sign-On (SSO), thorough testing, and AI-assisted coding workflows. Ethical and societal topics covered critiques of tax policies and the appointment of France's first Minister of AI.
Evals: The Next Generation
gpt-4 gpt-5 gpt-3.5 phi-3 mistral-7b llama-3 scale-ai mistral-ai reka-ai openai moderna sanctuary-ai microsoft mit meta-ai-fair benchmarking data-contamination multimodality fine-tuning ai-regulation ai-safety ai-weapons neural-networks model-architecture model-training model-performance robotics activation-functions long-context sam-altman jim-fan
Scale AI highlighted issues with data contamination in benchmarks like MMLU and GSM8K, proposing a new benchmark on which Mistral models show signs of overfitting while Phi-3 performs well. Reka released the VibeEval benchmark for multimodal models, addressing the limitations of multiple-choice benchmarks. Sam Altman of OpenAI called GPT-4 "dumb" and hinted at GPT-5, with AI agents as a major breakthrough. Researchers jailbroke GPT-3.5 via fine-tuning. Global calls emerged to ban AI-powered weapons, with US officials urging human control over nuclear arms. Ukraine launched an AI consular avatar, while Moderna partnered with OpenAI for medical AI advancements. Sanctuary AI and Microsoft are collaborating on AI for general-purpose robots. MIT introduced Kolmogorov-Arnold networks, which promise improved neural network efficiency. Meta AI is training Llama 3 models with over 400 billion parameters, featuring multimodality and longer context.
FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you're welcome)
llama-3-70b llama-3 wizardlm-2-8x22b claude-opus mistral-8x7b gpt-4 huggingface meta-ai-fair dbrx reka-ai mistral-ai lmsys openai datasets benchmarking quantization zero-shot-learning reasoning code-error-detection token-generation security
2024 has seen a significant increase in dataset sizes for training large language models, with Redpajama 2 offering up to 30T tokens, DBRX at 12T tokens, Reka Core/Flash/Edge with 5T tokens, and Llama 3 trained on 15T tokens. Hugging Face released an open dataset containing 15T tokens from 12 years of filtered CommonCrawl data, enabling training of models like Llama 3 if compute resources are available. Reddit users reported that WizardLM-2-8x22b outperformed other open LLMs, including Llama-3-70b-instruct, in reasoning and math benchmarks. Claude Opus demonstrated strong zero-shot code error spotting, surpassing Llama 3. Benchmarks revealed limitations in the LMSYS chatbot leaderboard due to instruction-tuned models gaming the system, and a new RAG benchmark showed Llama 3 70B underperforming compared to GPT-4, while Mixtral 8x7B remained strong. Efficient quantized versions of Llama 3 models are available on Hugging Face, with users reporting token generation limits around 9600 tokens on a 3090 GPU. Safety news included a UK sex offender being banned from using AI tools and GPT-4 demonstrating an 87% success rate at exploiting real vulnerabilities, raising security concerns.
Lilian Weng on Video Diffusion
wizardlm-2 llama-3 reka-core devin opus sora openai adobe reka-ai diffusion-models video-generation training-free-adaptation multimodality intuition creativity analogy-recognition self-improving-ai model-recognition agi-timelines model-performance startup-competition lilian-weng sam-altman geoffrey-hinton yann-lecun
OpenAI expands with a launch in Japan, introduces a Batch API, and partners with Adobe to bring the Sora video model to Premiere Pro. Reka AI releases the Reka Core multimodal language model. WizardLM-2 is released showing impressive performance, and Llama 3 news is anticipated soon. Geoffrey Hinton highlights AI models exhibiting intuition, creativity, and analogy recognition beyond humans. The Devin AI model notably contributes to its own codebase. Opus demonstrates the ability to recognize its own generated outputs. Sam Altman warns startups about being steamrolled by OpenAI if they don't adapt quickly. Yann LeCun discusses AGI timelines, emphasizing it is inevitable but not imminent or solely from LLMs. Lilian Weng's blog on diffusion models for video generation highlights training-free adaptation as a breakthrough technique.
Multi-modal, Multi-Aspect, Multi-Form-Factor AI
gpt-4 idefics-2-8b mistral-instruct apple-mlx gpt-5 reka-ai cohere google rewind apple mistral-ai microsoft paypal multimodality foundation-models embedding-models gpu-performance model-comparison enterprise-data open-source performance-optimization job-impact agi-criticism technical-report arthur-mensch dan-schulman chris-bishop
Between April 12-15, Reka launched Reka Core, a new GPT-4-class multimodal foundation model, with a detailed technical report described as "full Shazeer." Cohere Compass introduced a foundation embedding model for indexing and searching multi-aspect enterprise data like emails and invoices. The open-source IDEFICS 2-8B model continues the reproduction of Google's Flamingo multimodal model. Rewind pivoted to a multi-platform app called Limitless, moving away from spyware. Reddit discussions highlighted Apple MLX outperforming Ollama with Mistral Instruct on M2 Ultra GPUs, GPU choices for LLMs and Stable Diffusion, and AI-human comparisons by Microsoft Research's Chris Bishop. Former PayPal CEO Dan Schulman predicted GPT-5 will reduce job scopes by 80%. Mistral CEO Arthur Mensch criticized the obsession with AGI as "creating God."