All tags
Company: "togethercompute"
not much happened today
phi-4 phi-4-mini-reasoning qwen3-235b qwen3-moe-235b qwen3-moe-30b qwen3-dense-32b qwen3-dense-14b qwen3-dense-8b qwen3-dense-4b qwen3-dense-0.6b qwen2.5-omni-3b deepseek-prover-v2 llama llama-guard-4 prompt-guard-2 mimo-7b microsoft anthropic cursor alibaba togethercompute deepseek meta-ai-fair xiaomi openrouterai cohere reasoning model-fine-tuning model-evaluation benchmarking model-popularity open-source math model-scaling model-filtering jailbreak-prevention cline reach_vb vipulved akhaliq omarsar0 zhs05232838 huajian_xin mervenoyann karpathy random_walker sarahookr blancheminerva clefourrier
Microsoft released Phi-4-reasoning, a finetuned 14B reasoning model that lands slightly behind QwQ and is held back by data-transparency and token-efficiency issues. Anthropic introduced remote MCP server support and a 45-minute Research mode in Claude. Cursor published a list of its most popular models. Alibaba launched Qwen3-235B and other Qwen3 variants, highlighting budget-friendly coding and reasoning capabilities, available via the Together AI API. Microsoft also released Phi-4-Mini-Reasoning with benchmark results on AIME 2025 and OmniMath. DeepSeek announced DeepSeek-Prover V2, achieving state-of-the-art math problem solving and scaling to 671B parameters. Meta AI's Llama models hit 1.2 billion downloads, with new Llama Guard 4 and Prompt Guard 2 for input/output filtering and jailbreak prevention. Xiaomi released MiMo-7B, an open-source reasoning model trained on 25 trillion tokens. Discussions on model evaluation highlighted issues with the LMArena leaderboard, data-access biases favoring proprietary models, and the difficulty of maintaining fair benchmarking, with OpenRouterAI rankings suggested as an alternative. "LMArena slop and biased" and "61.3% of all data going to proprietary model providers" were noted concerns.
Google's Agent2Agent Protocol (A2A)
kimi-vl-a3b gpt-4o llama-4-scout llama-4-maverick llama-4-behemoth deepcoder-14b o3-mini o1 llama-3.1-nemotron-ultra-253b deepseek-r1 google google-deepmind moonshot-ai meta-ai-fair uc-berkeley openai nvidia hugging-face togethercompute deepseek agent-interoperability multimodality vision math reinforcement-learning coding model-training open-source model-benchmarking context-windows streaming push-notifications enterprise-authentication model-release reach_vb _akhaliq epochairesearch artificialanlys winglian danielhanchen yuchenj_uw jeremyphoward
Google Cloud Next announcements featured full MCP support from Google and DeepMind and the launch of the Agent2Agent (A2A) protocol, designed for agent interoperability with multiple launch partners. The protocol includes components such as the Agent Card, task communication channels, enterprise auth and observability, and streaming and push-notification support. On the model front, Moonshot AI released Kimi-VL-A3B, a multimodal model with 128K context and strong vision and math benchmark performance, outperforming gpt-4o. Meta AI introduced the smaller llama-4 family models llama-4-scout and llama-4-maverick, with the larger Behemoth model still in training. DeepCoder 14B from UC Berkeley is an open-source coding model rivaling openai's o3-mini and o1, trained with reinforcement learning on 24K coding problems. Nvidia released llama-3.1-nemotron-ultra-253b on Hugging Face, noted for beating llama-4-behemoth and maverick and competing with deepseek-r1.
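For a concrete sense of A2A's discovery mechanism, here is a minimal sketch of an Agent Card written as a Python dict. The endpoint URL and skill are hypothetical, and the field names follow the launch-era spec, so treat the exact schema as an assumption and check the current spec:

```python
# Sketch of an A2A Agent Card: the JSON document an agent publishes (the spec
# suggests serving it at /.well-known/agent.json) so peers can discover it.
# All values below are hypothetical; verify field names against the live spec.
agent_card = {
    "name": "travel-planner",
    "description": "Plans multi-city trips and books transport.",
    "url": "https://agents.example.com/a2a",  # hypothetical task endpoint
    "version": "1.0.0",
    "capabilities": {
        "streaming": True,            # SSE streaming of task updates
        "pushNotifications": True,    # webhook push when tasks finish
    },
    "authentication": {"schemes": ["bearer"]},  # enterprise auth hook
    "skills": [
        {
            "id": "plan-trip",
            "name": "Plan a trip",
            "description": "Builds an itinerary from free-text constraints.",
        }
    ],
}
```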
not much happened today
gpt-2 r1 gemma-3 gemmacoder3-12b qwen2.5-omni openai deepseek berkeley alibaba togethercompute nvidia azure runway langchain bmw amazon open-source function-calling benchmarking code-reasoning multimodality inference-speed image-generation voice-generation animation robotics realtime-transcription webrtc sama clémentdelangue lioronai scaling01 cognitivecompai osanseviero jack_w_rae ben_burtenshaw theturingpost vipulved kevinweil tomlikesrobots adcock_brett juberti
OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, signaling a move toward more open AI development. DeepSeek launched its open-source R1 model earlier this year, challenging perceptions of China's AI progress. Gemma 3 has gained function-calling capabilities and ranks on the Berkeley Function-Calling Leaderboard, while GemmaCoder3-12b improves code-reasoning performance on LiveCodeBench. Alibaba's Qwen2.5-Omni introduces a novel Thinker-Talker system and TMRoPE for multimodal input understanding. The TogetherCompute team achieved 140 TPS on a 671B-parameter model, outperforming the Azure and DeepSeek APIs on Nvidia GPUs. OpenAI also expanded ChatGPT with image generation for all free users and a new voice release. Runway Gen-4 enhances animation for miniature dioramas, and LangChain launched a chat-based generative UI agent. Commercial deployment of Figure 03 humanoid robots at BMW highlights advances in autonomy and manufacturing scaling. New tools include OpenAI's realtime transcription API with WebRTC support and Amazon's Nova Act AI browser agent.
not much happened today
gpt-4o deepseek-v3 claude-3.7-sonnet o3-mini gemini-2.5-pro openai deepseek anthropic google-deepmind togethercompute hypertecgroup coreweave cursor-ai windsurf-ai coding instruction-following image-generation policy-compliance long-context audio-processing video-processing gpu-clusters ai-infrastructure api-access sama kevinweil joannejang nrehiew_ giffmana _philschmid scaling01 saranormous
GPT-4o was praised for its improved coding, instruction following, and greater freedom, becoming the leading non-reasoning coding model, surpassing DeepSeek V3 and Claude 3.7 Sonnet on coding benchmarks, though it still lags behind reasoning models like o3-mini. Concerns about policy compliance in image generation were noted, with efforts underway to improve adherence. Gemini 2.5 Pro was highlighted for its advanced audio and video understanding, long-context capabilities, and integration with platforms like Cursor AI and Windsurf AI. AI infrastructure developments include a partnership between Together AI and Hypertec Group to deliver large-scale GPU clusters, and CoreWeave's IPO was celebrated for advancing AI infrastructure. GPU and TPU usage is expected to increase significantly. "GPT-4o's transparency and background generation feature" and "Gemini 2.5 Pro scored above 50% on AI Explained's SimpleBench" were key highlights.
OpenAI adopts MCP
gemini-2.5-pro gemini-1.5-pro gemini-2.0-flash qwen-2.5-omni-7b deepseek-v3-0324 deepseek-r1 openai google-deepmind alibaba togethercompute model-benchmarking multimodality reasoning scaling-laws model-quantization synthetic-data model-performance context-windows speech-recognition translation audio-processing video-processing swyx
OpenAI announced support for MCP, a significant technical update. Google's Gemini 2.5 Pro leads benchmarks with top scores in MMLU-Pro (86%), GPQA Diamond (83%), and AIME 2024 (88%), featuring a 1 million token context window and multimodal inputs. Alibaba's Qwen 2.5 Omni 7B was released as a fully multimodal, interactive, open-source model with a novel "thinker-talker" architecture supporting voice and video chat. DeepSeek V3-0324 outperforms its predecessor on multiple benchmarks. Research on reasoning features in large language models using sparse autoencoders was highlighted, alongside a study on scaling laws of synthetic data showing performance plateaus near 300B tokens. Discussions also covered the fastest output speeds of Gemini models and concerns about over-reliance on benchmarks for intelligence measurement. Swyx will curate the Data Council AI Engineering Track in April.
The new OpenAI Agents Platform
reka-flash-3 o1-mini claude-3-7-sonnet llama-3-3-70b sonic-2 qwen-chat olympiccoder openai reka-ai hugging-face deepseek togethercompute alibaba ai-agents api model-releases fine-tuning reinforcement-learning model-training model-inference multimodality voice-synthesis gpu-clusters model-distillation performance-optimization open-source sama reach_vb
OpenAI introduced a comprehensive suite of new tools for AI agents, including the Responses API, Web Search Tool, Computer Use Tool, File Search Tool, and an open-source Agents SDK with integrated observability tools, marking a significant step towards the "Year of Agents." Meanwhile, Reka AI open-sourced Reka Flash 3, a 21B parameter reasoning model that outperforms o1-mini and powers their Nexus platform, with weights available on Hugging Face. The OlympicCoder series surpassed Claude 3.7 Sonnet and much larger models on competitive coding benchmarks. DeepSeek built a 32K GPU cluster capable of training V3-level models in under a week and is exploring AI distillation. Hugging Face announced Cerebras inference support, achieving over 2,000 tokens/s on Llama 3.3 70B, 70x faster than leading GPUs. Reka's Sonic-2 voice AI model delivers 40ms latency via the Together API. Alibaba's Qwen Chat enhanced its multimodal interface with video understanding up to 500MB, voice-to-text, guest mode, and expanded file uploads. Sam Altman praised OpenAI's new API as "one of the most well-designed and useful APIs ever."
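As a rough illustration of the new platform, a minimal call to the Responses API with the hosted web-search tool via the official openai Python SDK might look like the sketch below; the tool-type string and model name are assumptions taken from launch-era docs, so check the current reference:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One Responses API call with a hosted tool; the platform runs the tool loop.
response = client.responses.create(
    model="gpt-4o",
    tools=[{"type": "web_search_preview"}],  # launch-era tool type name
    input="What did OpenAI ship for AI agents this week?",
)
print(response.output_text)
```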
DeepSeek's Open Source Stack
qwen-qwq-32b start character-3 gemini gemini-2.0 mercury-coder gpt-4.5 jamba-mini-1.6 gemini-2.0-flash gpt-4o-mini mistral-small-3 mistral-ocr deepseek pyspur hugging-face togethercompute hedra-labs google-deepmind deeplearningai openai ai21-labs mistral-ai fine-tuning benchmarking multimodality code-generation diffusion-models model-performance model-optimization ocr embedding-models context-windows runtime-limits _akhaliq lmarena_ai reach_vb danielhanchen _philschmid aidan_mclau vikhyatk jerryjliu0
DeepSeek's Open Source Week was summarized by PySpur, rounding up the week's string of infrastructure releases. The Qwen QwQ-32B model was fine-tuned into START, which excels at PhD-level science QA and math benchmarks. Character-3, an omnimodal AI video generation model from Hedra Labs and Together AI, enables realistic animated content creation. Google DeepMind introduced the Gemini embedding model with an 8k context window, ranking #1 on MMTEB, alongside the Gemini 2.0 Code Executor supporting Python libraries and auto-fix features. Inception Labs' Mercury Coder is a diffusion-based code generation model offering faster token processing. OpenAI released GPT-4.5, its largest model yet but with less reasoning ability than some competitors. AI21 Labs launched Jamba Mini 1.6, noted for superior output speed compared to Gemini 2.0 Flash, GPT-4o mini, and Mistral Small 3. A new dataset of 1.9M scanned pages was released for OCR benchmarking, with Mistral OCR showing competitive but not top-tier document-parsing performance compared to LLM/LVM-powered methods. "Cracked engineers are all you need."
lots of small launches
gpt-4o claude-3.7-sonnet claude-3.7 claude-3.5-sonnet deepseek-r1 deepseek-v3 grok-3 openai anthropic amazon cloudflare perplexity-ai deepseek-ai togethercompute elevenlabs elicitorg inceptionailabs mistral-ai voice model-releases cuda gpu-optimization inference open-source api model-performance token-efficiency context-windows jit-compilation lmarena_ai alexalbert__ aravsrinivas reach_vb
GPT-4o Advanced Voice Preview is now available for free ChatGPT users with enhanced daily limits for Plus and Pro users. Claude 3.7 Sonnet has achieved the top rank in WebDev Arena with improved token efficiency. DeepSeek-R1 with 671B parameters benefits from the Together Inference platform optimizing NVIDIA Blackwell GPU usage, alongside the open-source DeepGEMM CUDA library delivering up to 2.7x speedups on Hopper GPUs. Perplexity launched a new Voice Mode and a Deep Research API. The upcoming Grok 3 API will support a 1M token context window. Several companies including Elicit, Amazon, Anthropic, Cloudflare, FLORA, Elevenlabs, and Inception Labs announced new funding rounds, product launches, and model releases.
AI Engineer Summit Day 1
grok-3 o3-mini deepseek-r1 qwen-2.5-vl openai anthropic xai togethercompute alibaba sakana-ai benchmarking model-performance cuda model-training open-source debugging inference-speed batch-size reinforcement-learning aidan_mclau giffmana nrehiew_ teortaxestex epochairesearch andrew_n_carr borismpower yuhu_ai_
The AIE Summit in NYC highlighted key talks including Grace Isford's trends keynote, the Neo4j/Pfizer presentation, and OpenAI's first public definition of Agents, with speakers representing a combined $930 million in announced funding. On AI Twitter, discussion focused on the Grok-3 and o3-mini models, with debates over performance and benchmarking, including Grok-3's record compute scale of 4e26 to 5e26 FLOP. The o3-mini model uncovered a critical CUDA kernel bug in Sakana AI's code. DeepSeek-R1 was promoted as an open-source alternative with notable training batch sizes. Additionally, Alibaba announced the Qwen 2.5-VL model release.
o3-mini launches, OpenAI on "wrong side of history"
o3-mini o1 gpt-4o mistral-small-3-24b deepseek-r1 openai mistral-ai deepseek togethercompute fireworksai_hq ai-gradio replicate reasoning safety cost-efficiency model-performance benchmarking api open-weight-models model-releases sam-altman
OpenAI released o3-mini, a new reasoning model available for free and paid users with a "high" reasoning effort option that outperforms the earlier o1 model on STEM tasks and safety benchmarks, costing 93% less per token. Sam Altman acknowledged a shift in open source strategy and credited DeepSeek R1 for influencing assumptions. MistralAI launched Mistral Small 3 (24B), an open-weight model with competitive performance and low API costs. DeepSeek R1 is supported by Text-generation-inference v3.1.0 and available via ai-gradio and replicate. The news highlights advancements in reasoning, cost-efficiency, and safety in AI models.
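The "high" reasoning effort option is exposed as an API parameter; below is a minimal sketch with the openai Python SDK, with the parameter and model names as documented at launch (verify against current docs):

```python
from openai import OpenAI

client = OpenAI()

# reasoning_effort trades latency/cost for deeper internal reasoning.
completion = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # accepts "low" | "medium" | "high"
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(completion.choices[0].message.content)
```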
small little news items
r7b llama-3-70b minicpm-o-2.6 gpt-4v qwen2.5-math-prm ollama cohere togethercompute openbmb qwen langchain openai rag tool-use-tasks quality-of-life new-engine multimodality improved-reasoning math-capabilities process-reward-models llm-reasoning mathematical-reasoning beta-release task-scheduling ambient-agents email-assistants ai-software-engineering codebase-analysis test-case-generation security-infrastructure llm-scaling-laws power-law plateauing-improvements gans-revival
Ollama integrated Cohere's R7B, optimized for RAG and tool-use tasks, and released Ollama v0.5.5 with quality-of-life updates and a new engine. Together AI launched Llama 3.3 70B with improved reasoning and math capabilities, while OpenBMB introduced MiniCPM-o 2.6, which outperforms GPT-4V on visual tasks. Insights into Process Reward Models (PRMs) for boosting LLM reasoning were shared, alongside Qwen2.5-Math-PRM models that excel at mathematical reasoning. OpenAI rolled out a beta of ChatGPT Tasks, enabling scheduled reminders and summaries for Plus, Pro, and Teams users, while LangChain introduced open-source ambient agents for email assistance. AI software engineering is advancing rapidly, with predictions it will match human capabilities within 18 months. Research on LLM scaling laws highlights power-law relationships and plateauing improvements, while GANs are experiencing a revival.
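Running the Cohere R7B build locally through Ollama's Python client is a one-line chat call; the model tag below is an assumption, so confirm the exact name in the Ollama model library:

```python
import ollama  # pip install ollama; requires a running Ollama server

# Model tag is assumed; check `ollama list` or the Ollama model library.
reply = ollama.chat(
    model="command-r7b",
    messages=[{"role": "user", "content": "Draft a RAG-friendly summary of this doc: ..."}],
)
print(reply["message"]["content"])
```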
not much happened today
phi-4 reinforce++ arc-agi-2 ai21-labs ollama langchain togethercompute groq reinforcement-learning ppo model-optimization memory-efficiency python-packages vision text-extraction frontend-code-generation workflow-automation coding-agents compute-cost-reduction ethical-ai agi-benchmarks scam-alerts sebastien-bubeck fchollet tom-doerr arohan_ bindureddy hwchase17 jonathanross321 clementdelangue vikhyatk
REINFORCE++ was introduced, enhancing classical REINFORCE with PPO-inspired techniques for 30% faster training. Microsoft's Phi-4 was released under the MIT License and is accessible via Ollama. François Chollet announced plans for ARC-AGI-2 and a next-generation AGI benchmark. LangChain launched 10 new integration packages to boost LLM application development. Tom Doerr introduced Ollama-OCR, a Python package for text extraction using vision language models. Arohan optimized Shampoo for memory efficiency, reducing usage from 20 to 6 bytes per parameter. Bindu Reddy showcased CodeLLM v1 for frontend code generation and highlighted LlamaIndex Workflows for academic summarization and slide generation. Hwchase17 collaborated with Together Compute to enhance WebDev Arena with complex coding agents for LLM coding evaluations. Jonathan Ross detailed Groq's mission to reduce compute costs 1000x amid rising generative AI spending. Clement Delangue warned about scams built on false claims of association. Vikhyatk raised concerns about the ethical implications and trade-offs of AGI. Memes and humor included creative AI prompts and critiques of LLM behaviors.
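To make the idea concrete, here is an illustrative PyTorch sketch of a REINFORCE objective with a PPO-style clipped ratio and a critic-free, batch-normalized baseline, which is one reading of "classical REINFORCE enhanced with PPO-inspired techniques"; it is not the authors' implementation:

```python
import torch

def reinforce_pp_loss(logp_new, logp_old, rewards, clip_eps=0.2):
    """Illustrative REINFORCE++-style loss (not the reference implementation).

    logp_new / logp_old: summed log-probs of sampled sequences under the
    current and rollout policies; rewards: scalar reward per sequence.
    """
    # Critic-free baseline: normalize rewards within the batch.
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # PPO-style clipped surrogate on the importance ratio.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```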
OpenAI Voice Mode Can See Now - After Gemini Does
gemini-2.0-flash claude claude-3.5-sonnet llama-3-70b llama-3 mistral-large gpt-4o openai google-deepmind anthropic togethercompute scale-ai meta-ai-fair mistral-ai multimodality real-time-streaming roleplay prompt-handling model-comparison model-training creative-writing model-censorship code-execution developer-ecosystem ai-humor bindureddy
OpenAI launched Realtime Video shortly after Gemini did, with less impact since Gemini arrived first at lower cost and with looser rate limits. Google DeepMind released Gemini 2.0 Flash, featuring enhanced multimodal capabilities and real-time streaming. Anthropic introduced Clio, a system for analyzing real-world usage of Claude models. Together AI acquired CodeSandbox to launch a code interpreter tool. Discussions highlighted Meta's Llama 3.3-70B for its advanced roleplay and prompt handling, outperforming models like Mistral Large and GPT-4o in expressiveness with lighter censorship. The AI community also traded humorous takes on AI outages and model competition, with ChatGPT adding a Santa mode for holiday interactions. "Anthropic is capturing the developer ecosystem, Gemini has AI enthusiast mindshare, ChatGPT reigns over AI dabblers" was a noted observation from the community.
not much happened today
llama-3-2-vision gpt-2 meta-ai-fair ollama amd llamaindex gemini gitpod togethercompute langchainai weights-biases stanfordnlp deeplearningai model-scaling neural-networks multi-gpu-support skip-connections transformers healthcare-ai automated-recruitment zero-trust-security small-language-models numerical-processing chain-of-thought optical-character-recognition multi-agent-systems agent-memory interactive-language-learning bindureddy fstichler stasbekman jxmnop bindureddy omarsar0 giffmana rajammanabrolu
This week in AI news highlights Ollama 0.4's support for Meta's Llama 3.2 Vision models (11B and 90B), with applications like handwriting recognition. Self-Consistency Preference Optimization (ScPO) was introduced to improve model consistency without human labels. Discussions covered model scaling, the resurgence of neural networks, and AMD's multi-GPU bandwidth challenges. The importance of skip connections in Transformers was emphasized (see the sketch below). In healthcare, lighter regulation combined with AI could revolutionize disease treatment and aging research. Tools like LlamaParse and Gemini aid automated resume insights. Gitpod Flex demonstrated zero-trust architecture for secure development environments. Research includes surveys on Small Language Models (SLMs), number understanding in LLMs, and DTrOCR, which uses a GPT-2 decoder for OCR. Multi-agent systems in prediction markets were discussed by TogetherCompute and LangChainAI. Community events include a NeurIPS happy hour, NLP seminars, and courses on agent memory with LLMs as operating systems.
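As a reminder of why skip connections matter: a pre-norm Transformer block adds each sublayer's output back onto its input, so gradients always have an identity path through the depth of the stack. A minimal PyTorch sketch:

```python
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Minimal pre-norm Transformer block showing the two skip connections."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # skip connection 1
        x = x + self.mlp(self.norm2(x))                    # skip connection 2
        return x
```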
Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
claude-3.5-haiku llama-3-1 llama-3-2 mlx-lm tencent anthropic meta-ai-fair togethercompute llamaindex mixture-of-experts synthetic-data model-scaling model-architecture model-optimization kv-cache-quantization react fine-tuning scaling-laws model-efficiency model-deployment multimodality
Tencent released a notable >300B-parameter MoE model pretrained on 7T tokens, including 1.5T tokens of synthetic data generated via Evol-Instruct. The model introduces novel techniques like "recycle routing" and expert-specific learning rates, alongside a compute-efficient scaling law for MoE active parameters. However, its custom license restricts use in the EU and by companies with over 100M MAU, and it avoids China-sensitive queries. Meanwhile, Anthropic launched Claude 3.5 Haiku, now available on multiple platforms, praised for intelligence and speed but criticized for a 10x price increase. Meta opened Llama AI to the U.S. defense sector, and a Llama Impact Hackathon offers a $15K prize for projects using Llama 3.1 & 3.2 Vision. LlamaIndex released a React chat UI component with Tailwind CSS and LLM backend integrations. The MLX LM library speeds up text generation and cuts memory use with KV cache quantization, sketched below.
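Conceptually, KV cache quantization stores cached keys/values at low bit width and dequantizes on read, trading a little accuracy for large memory savings. A toy affine-quantization sketch (MLX LM's real implementation uses grouped quantization kernels, so this is illustrative only):

```python
import numpy as np

def quantize_kv(kv: np.ndarray, bits: int = 4):
    """Toy per-tensor affine quantization of a KV-cache block."""
    qmax = 2**bits - 1
    lo, hi = float(kv.min()), float(kv.max())
    scale = (hi - lo) / qmax or 1.0  # guard against constant tensors
    q = np.round((kv - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_kv(q: np.ndarray, scale: float, lo: float) -> np.ndarray:
    return q.astype(np.float32) * scale + lo

cache = np.random.randn(2, 128, 64).astype(np.float32)  # (heads, seq, dim)
q, scale, lo = quantize_kv(cache, bits=4)
print(np.abs(dequantize_kv(q, scale, lo) - cache).max())  # quantization error
```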
not much happened today
llama mistral openai decagon sierra togethercompute vertical-saas funding protein-structure-prediction lora self-supervised-learning model-optimization neural-architecture-search model-evaluation ethics transformers multi-agent-systems long-context mira-murati demis-hassabis clement-delangue john-o-whitaker yann-lecun francois-chollet ajeya-cotra rohan-paul adcock-brett
Vertical SaaS agents are rapidly becoming the consensus future of AI applications, highlighted by Decagon's $100m funding and Sierra's round at a $4b valuation. OpenAI alumni are actively raising venture capital and forming new startups, intensifying competition in the AI market. Demis Hassabis celebrated the Nobel Prize recognition for AlphaFold2, a breakthrough in protein structure prediction. Advances in AI models include techniques like LoRA projectors and annealing on high-quality data, while discussions emphasize the need for high-bandwidth sensory inputs beyond language for common-sense learning. New methods like LoLCATs aim to optimize transformer models such as Llama and Mistral for efficiency. Ethical concerns about AI agents performing harmful tasks remain under investigation. The AI community continues to explore model-evaluation challenges and optimization frameworks like LPZero for neural architecture search.
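Since LoRA-style adapters come up here, a minimal PyTorch sketch of the core idea, a frozen weight plus a trainable low-rank update (illustrative, not the LoLCATs code):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update:
    y = W x + (alpha / r) * B A x, training only A and B."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), r=8)  # ~65k trainable vs 16.8M frozen
```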
not much happened today
flux-schnell meta-ai-fair anthropic togethercompute hugging-face audio-generation quantization prompt-caching long-term-memory llm-serving-framework hallucination-detection ai-safety ai-governance geoffrey-hinton john-hopfield demis-hassabis rohanpaul_ai svpino hwchase17 shreyar philschmid mmitchell_ai bindureddy
Geoffrey Hinton and John Hopfield won the Nobel Prize in Physics for foundational work on neural networks linking AI and physics. Meta AI introduced a 13B-parameter audio generation model as part of Meta Movie Gen for video-synced audio. Anthropic launched the Message Batches API, enabling asynchronous processing of up to 10,000 queries at half the cost. Together Compute made Flux Schnell available free for 3 months. New techniques like PrefixQuant quantization and prompt caching for low-latency inference were highlighted by rohanpaul_ai. LangGraph added long-term memory support for persistent document storage. The Hex-LLM framework was introduced for TPU-based, low-cost, high-throughput serving of LLMs from Hugging Face models. Discussions on AI safety emphasized gender equality in science, and concerns were raised about premature AI regulation driven by media and Hollywood.
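A minimal sketch of the Message Batches API via the anthropic Python SDK; the request shape follows the launch docs, but treat the model id and exact fields as assumptions:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

# Submit many requests at once; results arrive asynchronously at ~50% cost.
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"q-{i}",
            "params": {
                "model": "claude-3-5-sonnet-20240620",  # assumed model id
                "max_tokens": 256,
                "messages": [{"role": "user", "content": q}],
            },
        }
        for i, q in enumerate(["What is PrefixQuant?", "Explain prompt caching."])
    ]
)
print(batch.id, batch.processing_status)  # poll until processing ends
```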
Contextual Document Embeddings: `cde-small-v1`
llama-3 cde-small-v1 gemini-1.5-flash-8b chatgpt meta-ai-fair openai google-deepmind weights-biases togethercompute contextual-embeddings contextual-batching video-generation synthetic-data model-efficiency training-techniques rag algorithmic-efficiency jack-morris sasha-rush tim-brooks demis-hassabis karina-nguyen
Meta announced a new text-to-video model, Movie Gen, claiming its adaptation of Llama 3 to video generation beats OpenAI's Sora Diffusion Transformers, though no release is available yet. Researchers Jack Morris and Sasha Rush introduced the cde-small-v1 model, whose novel contextual-batching training technique and contextual embeddings achieve strong performance with only 143M parameters. OpenAI launched Canvas, a collaborative interface for ChatGPT trained with synthetic data. Google DeepMind welcomed Tim Brooks to work on video generation and world simulators. Google released Gemini 1.5 Flash-8B, improving cost and rate limits through algorithmic efficiency.
ChatGPT Advanced Voice Mode
o1-preview qwen-2.5 llama-3 claude-3.5 openai anthropic scale-ai togethercompute kyutai-labs voice-synthesis planning multilingual-datasets retrieval-augmented-generation open-source speech-assistants enterprise-ai price-cuts benchmarking model-performance sam-altman omarsar0 bindureddy rohanpaul_ai _philschmid alexandr_wang svpino ylecun _akhaliq
OpenAI rolled out ChatGPT Advanced Voice Mode with 5 new voices and improved accent and language support, now widely available in the US. Ahead of rumored Llama 3 and Claude 3.5 updates, Gemini Pro saw a significant price cut aligning with the new intelligence-frontier pricing. OpenAI's o1-preview model showed promising planning performance with 52.8% accuracy on Randomized Mystery Blocksworld. Anthropic is rumored to release a new model, generating community excitement. Qwen 2.5 was released with models up to 32B parameters and support for 128K tokens, matching GPT-4 0613 benchmarks. Research highlights include the PlanBench evaluation of o1-preview, OpenAI's release of a multilingual MMMLU dataset covering 14 languages, and the RAGLAB framework standardizing Retrieval-Augmented Generation research. New AI tools include PDF2Audio for converting PDFs to audio, an open-source AI starter kit for local model deployment, and Moshi, a speech-based AI assistant from Kyutai. Industry updates feature Scale AI nearing $1B ARR with 4x YoY growth and Together Compute's enterprise platform offering faster inference and cost reductions. Insights from Sam Altman's blog post were also shared.
Replit Agent - How did everybody beat Devin to market?
jpeg-lm avc-lm replit anthropic togethercompute document-retrieval retrieval-augmented-generation ai-agents image-generation video-generation context-windows gpu-pricing enterprise-ai self-healing text-to-music andrej-karpathy mervenoyann bindureddy rohanpaul_ai leptonai teortaxestex
Replit Agent launched as a fully integrated web IDE enabling text-to-app generation with planning and self-healing, available immediately to paid users without a waitlist. Other notable developments include Melodio, a new text-to-music model, and Together AI's kernel and speculative-decoding work. Anthropic announced a new enterprise plan featuring a 500K-token context window and enhanced security. Discussions covered the JPEG-LM and AVC-LM models for improved image and video generation, and GPU market trends around H100 pricing. Influential voices like Andrej Karpathy shared insights on AI agents and automation.
CogVideoX: Zhipu's Open Source Sora
cogvideox llama-3-1 llama-3-405b moondream phi-3.5 llama-rank zhipu-ai alibaba meta-ai-fair google hugging-face nvidia togethercompute salesforce video-generation serverless-computing vision document-vqa text-vqa mixture-of-experts retrieval-augmented-generation long-context model-routing webgpu background-removal long-form-generation superposition-prompting rohanpaul_ai philschmid vikhyatk algo_diver jayalammar davidsholz
Zhipu AI, the Tsinghua-affiliated lab ranked as China's 3rd largest, released the open 5B video generation model CogVideoX, which can run without GPUs via their ChatGLM web and desktop apps. Meta AI announced trust & safety research and CyberSecEval 3 alongside the release of Llama 3.1, with Llama 3 405B now available serverless on Google Cloud Vertex AI and the Hugging Face x NVIDIA NIM API. Updates include Moondream, an open vision-language model improving on DocVQA and TextVQA tasks, and Phi-3.5-MoE, a lightweight MoE chat model with 16x3.8B parameters. Together Compute introduced the Rerank API featuring Salesforce's LlamaRank model for document and code ranking (sketched below). Research highlights include superposition prompting for RAG without fine-tuning, the AgentWrite pipeline for long-form generation beyond 20,000 words, and a comparison showing long-context methods outperform RAG at higher cost. Tools include Not Diamond, an AI model router, AI command-line interfaces, and an open-source WebGPU background-removal tool. "You don't even need GPUs to run it," referring to CogVideoX.
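A minimal sketch of the Rerank API through the together Python SDK; the method and model id follow the announcement, but verify both against current docs:

```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY

docs = [
    "Runbook: rotating service API keys",
    "Invoice and billing FAQ",
    "Guide: single sign-on setup",
]
resp = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",  # LlamaRank, per the announcement
    query="How do I rotate API keys?",
    documents=docs,
    top_n=2,
)
for r in resp.results:
    print(r.index, round(r.relevance_score, 3), docs[r.index])
```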
Mistral Large 2 + RIP Mistral 7B, 8x7B, 8x22B
mistral-large-2 mistral-nemo-12b llama-3.1-8b llama-3.1-70b llama-3.1 llama-3-405b yi-34b-200k gpt-4o mistral-ai meta-ai-fair groq togethercompute code-generation math function-calling reasoning context-windows model-deprecation pretraining posttraining benchmarking
Mistral Large 2 introduces 123B parameters with Open Weights under a Research License, focusing on code generation, math performance, and a massive 128k context window, improving over Mistral Large 1's 32k context. It claims better function calling capabilities than GPT-4o and enhanced reasoning. Meanwhile, Meta officially released Llama-3.1 models including Llama-3.1-70B and Llama-3.1-8B with detailed pre-training and post-training insights. The Llama-3.1 8B model's 128k context performance was found underwhelming compared to Mistral Nemo and Yi 34B 200K. Mistral is deprecating older Apache open-source models, focusing on Large 2 and Mistral Nemo 12B. The news also highlights community discussions and benchmarking comparisons.
Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o version)
gpt-4o-mini mistral-nemo llama-3 llama-3-400b deepseek-v2 openai nvidia mistral-ai togethercompute deepseek-ai lmsys model-quantization context-windows instruction-following model-performance cost-efficiency multimodality benchmarking open-source model-release sam-altman
GPT-4o-mini launches at a 99% price reduction relative to text-davinci-003, about 3.5% the price of GPT-4o, while matching Opus-level benchmarks. It supports 16k output tokens, runs faster than previous models, and will soon support text, image, video, and audio inputs and outputs. Mistral Nemo, a 12B-parameter model developed with Nvidia, features a 128k-token context window, an FP8 checkpoint, and strong benchmark performance. Together Lite and Turbo offer fp8/int4 quantizations of Llama 3 with up to 4x throughput and significantly reduced costs. DeepSeek V2 is now open-sourced. Upcoming releases include at least 5 unreleased models, and Llama 3 400B leaked ahead of ICML 2024.
The Last Hurrah of Stable Diffusion?
llama-3-8b llama-3 qwen-2 gpt-4 gpt-4o stability-ai togethercompute model-architecture fine-tuning benchmarks dataset-release model-evaluation reasoning model-training retrieval-augmented-generation multimodality emad-mostaque rohanpaul_ai fchollet mikeknoop micahgoldblum teknium1 rasbt percyliang
Stability AI launched Stable Diffusion 3 Medium with models ranging from 450M to 8B parameters, featuring the MMDiT architecture and T5 text encoder for image text rendering. The community has shown mixed reactions following the departure of key researchers like Emad Mostaque. On AI models, Llama 3 8B Instruct shows strong evaluation correlation with GPT-4, while Qwen 2 Instruct surpasses Llama 3 on MMLU benchmarks. The Mixture of Agents (MoA) framework outperforms GPT-4o on AlpacaEval 2.0. Techniques like Spectrum and QLoRA enable efficient fine-tuning with less VRAM. Research on grokking reveals transformers can transition from memorization to generalization through extended training. Benchmark initiatives include the $1M ARC Prize Challenge for AGI progress and LiveBench, a live LLM benchmark to prevent dataset contamination. The Character Codex Dataset offers open data on over 15,000 characters for RAG and synthetic data. The MLX 0.2 tool enhances LLM experience on Apple Silicon Macs with improved UI and faster retrieval-augmented generation.
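The Mixture of Agents idea is simple enough to sketch: several proposer LLMs answer, their outputs are fed back as references for another round, and an aggregator synthesizes a final answer. The sketch below is a hedged restatement of the paper's high-level loop, not the official code, with `proposers` and `aggregator` standing in for any prompt-to-text LLM callables:

```python
def mixture_of_agents(prompt, proposers, aggregator, rounds=2):
    """Toy MoA loop: `proposers`/`aggregator` are callables str -> str."""
    answers = [ask(prompt) for ask in proposers]
    for _ in range(rounds - 1):
        refs = "\n\n".join(f"[{i+1}] {a}" for i, a in enumerate(answers))
        answers = [
            ask(f"{prompt}\n\nReference answers:\n{refs}\n\nWrite an improved answer.")
            for ask in proposers
        ]
    refs = "\n\n".join(f"[{i+1}] {a}" for i, a in enumerate(answers))
    return aggregator(
        f"Synthesize the best single answer to: {prompt}\n\nCandidates:\n{refs}"
    )
```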
Francois Chollet launches $1m ARC Prize
gpt-4 chatgpt openai apple togethercompute benchmarking agi pattern-recognition skill-acquisition privacy on-device-ai mixed-precision-quantization mixture-of-experts multimodality agentic-ai francois-chollet karpathy svpino philschmid clementdelangue sama gdb miramurati kevin-weil sarah-friar
François Chollet critiques current paths to AGI, emphasizing the importance of benchmarks that resist saturation and focus on skill acquisition and open-ended problem solving. The ARC-AGI puzzles exemplify "easy for humans, hard for AI" challenges to measure progress toward AGI. Meanwhile, Apple announces integration of ChatGPT into iOS, iPadOS, and macOS through a partnership with OpenAI, enabling AI-powered features like document summarization and photo analysis with privacy-preserving measures. Discussions highlight Apple's focus on deep AI integration and on-device models optimized with techniques like mixed-precision quantization, though some skepticism remains about their AI capabilities compared to GPT-4. Additionally, Together Compute introduces a Mixture of Agents approach achieving strong performance on AlpacaEval 2.0.
1/16/2024: TIES-Merging
mixtral-8x7b nous-hermes-2 frankendpo-4x7b-bf16 thebloke hugging-face nous-research togethercompute oak-ridge-national-laboratory vast-ai runpod mixture-of-experts random-gate-routing quantization gptq exl2-quants reinforcement-learning-from-human-feedback supercomputing trillion-parameter-models ghost-attention model-fine-tuning reward-models sanjiwatsuki superking__ mrdragonfox _dampf kaltcit rombodawg technotech
TheBloke's Discord community actively discusses Mixture of Experts (MoE) models, focusing on random gate routing layers for training and the challenges of immediate model use. There is a robust debate on quantization methods, comparing GPTQ and EXL2 quants, with EXL2 noted for faster execution on specialized hardware. A new model, Nous Hermes 2, based on Mixtral 8x7B and trained with RLHF, claims benchmark superiority but shows some inconsistencies. The Frontier supercomputer at Oak Ridge National Laboratory is highlighted for training a trillion-parameter LLM with 14TB RAM, sparking discussions on open-sourcing government-funded AI research. Additionally, the application of ghost attention in the academicat model is explored, with mixed reactions from the community. "Random gate layer is good for training but not for immediate use," and "EXL2 might offer faster execution on specialized hardware," are key insights shared.
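To ground the routing debate, here is a toy MoE layer comparing a learned top-k gate with the uniform-random routing discussed in the thread (illustrative only; production MoE layers use batched expert dispatch and load-balancing losses):

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Top-k MoE layer with an optional random gate for training experiments."""
    def __init__(self, d_model=256, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x, random_routing=False):
        logits = self.gate(x)
        if random_routing:
            # Ignore the learned gate: route uniformly at random, which
            # spreads load across experts during training but hurts at inference.
            logits = torch.rand_like(logits)
        weights, idx = torch.softmax(logits, -1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for j in range(self.k):
                mask = idx[..., j] == e
                if mask.any():
                    out[mask] += weights[..., j][mask].unsqueeze(-1) * expert(x[mask])
        return out
```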
12/8/2023 - Mamba v Mistral v Hyena
mistral-8x7b-moe mamba-3b stripedhyena-7b claude-2.1 gemini gpt-4 dialogrpt-human-vs-machine cybertron-7b-v2-gguf falcon-180b mistral-ai togethercompute stanford anthropic google hugging-face mixture-of-experts attention-mechanisms prompt-engineering alignment image-training model-deployment gpu-requirements cpu-performance model-inference long-context model-evaluation open-source chatbots andrej-karpathy tri-dao maxwellandrews raddka
Three new AI models are highlighted: Mistral's 8x7B MoE model (Mixtral), Mamba models up to 3B from Together, and StripedHyena 7B, a competitive subquadratic-attention model from Stanford's Hazy Research. Discussions on Anthropic's Claude 2.1 focus on its prompting techniques and alignment challenges. Google's Gemini is noted as potentially superior to GPT-4. The community also explores Dreambooth for image training and shares resources like the DialogRPT-human-vs-machine model on Hugging Face. Deployment challenges for large language models, including CPU performance and GPU requirements, are discussed with reference to Falcon 180B and transformer batching techniques. User engagement includes meme sharing and humor.