Company: "groq"

Dec 24, 2025

Nvidia buys (most of) Groq for $20B cash; largest execuhire ever

gemini fsd-v14 nvidia groq openai tesla epoch-ai gemini benchmarking inference model-evaluation ai-integration agent-patterns real-time-processing low-latency developer-experience healthcare business-workflows consumer-ai jensen_huang xeophon js_denain jim_fan

Groq's leadership team is joining Nvidia under a "non-exclusive licensing agreement" in a deal valued at $20 billion in cash, marking a major deal in the AI chip space, although Nvidia states that it is not acquiring Groq as a company. Jensen Huang plans to integrate Groq's low-latency processors into the NVIDIA AI factory architecture to enhance AI inference and real-time workloads. Twitter highlights include Gemini used as a consumer utility for calorie tracking, OpenAI discussing the "deployment gap" in model usage across healthcare and business, and Tesla's FSD v14 described as a "Physical Turing Test" for consumer AI. Epoch AI notes benchmarking challenges, emphasizing provider variance and integration issues that affect model quality measurement. Discussions of coding agents and the convergence of developer experience continue in the AI community.

Oct 29, 2025

Cursor 2.0 & Composer-1: Fast Models and New Agents UI

composer-1 gpt-oss-safeguard-20b gpt-oss-safeguard-120b gpt-oss gpt-5-mini cursor_ai openai huggingface ollama cerebras groq goodfireai rakuten agentic-coding reinforcement-learning mixture-of-experts fine-tuning policy-classification open-weight-models inference-stacks cost-efficiency multi-agent-systems ide voice-to-code code-review built-in-browser model-optimization sasha_rush dan_shipper samkottler ellev3n11 swyx

Cursor 2.0 launched with Composer-1, an agentic coding model optimized for speed and precision, featuring multi-agent orchestration, a built-in browser for testing, and voice-to-code capabilities. OpenAI released gpt-oss-safeguard models (20B, 120B) for policy-based safety classification; they are open-weight, fine-tuned from gpt-oss, available on Hugging Face, and supported by inference stacks like Ollama and Cerebras. Goodfire and Rakuten demonstrated sparse autoencoders for PII detection that match gpt-5-mini accuracy at significantly lower cost. The Cursor 2.0 update also includes a redesigned interface for managing multiple AI coding agents, marking a major advancement in AI IDEs. Early users emphasized a "fast-not-slowest" tradeoff for Composer-1, which enables rapid iteration with a human in the loop.

Oct 16, 2025

Claude Agent Skills - glorified AGENTS.md? or MCP killer?

claude-4.5-haiku claude chatgpt huggingchat-omni anthropic openai microsoft perplexity-ai huggingface groq cerebras togethercompute agent-skills document-processing long-context reasoning multi-model-routing memory-management voice vision simonwillison alexalbert__ mustafasuleyman yusuf_i_mehdi aravsrinivas

Anthropic achieves a rare feat with back-to-back AI news headlines featuring Claude's new Skills—a novel way to build specialized agents using Markdown files, scripts, and metadata to handle tasks like creating and reading PDFs, Docs, and PPTs. Simon Willison calls this a "bigger deal than MCP," predicting a "Cambrian explosion in Skills." Meanwhile, Anthropic launches Claude 4.5 Haiku with strong reasoning and long-context capabilities, priced competitively. Other updates include OpenAI's ChatGPT memory management improvements, Windows 11 Copilot voice and vision features, and HuggingChat Omni routing across 115 open-source models from 15 providers. These developments highlight advances in agent skills, document processing, long-context reasoning, and multi-model routing.

Sep 09, 2025

not much happened today

gpt-5 kimi-k2-0905 glm-4.5 qwen3-asr opus-4.1 cognition founders-fund lux-capital 8vc neo vercel claude groq alibaba huggingface meta-ai-fair google theturingpost algoperf coding-agents agent-architecture open-source model-evaluation multilingual-models speech-recognition model-optimization kv-cache quantization algorithmic-benchmarking video-generation context-windows swyx tim_dettmers

Cognition raised $400M at a $10.2B valuation to advance AI coding agents, with swyx joining to support the "Decade of Agents" thesis. Vercel launched an OSS "vibe coding platform" using a tuned GPT-5 agent loop. Claude Code emphasizes minimalism in agent loops for reliability. Kimi K2-0905 achieved 94% on coding evals and improved agentic capabilities with doubled context length. Alibaba released Qwen3-ASR, a multilingual transcription model with <8% WER. Meta introduced Set Block Decoding for 3-5× faster decoding without architectural changes. Innovations in KV cache compression and quantization include AutoRound, QuTLASS v0.1.0, and AlgoPerf v0.6. Google's Veo 3 video generation API went GA with significant price cuts and vertical video support.
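For readers unfamiliar with the metric behind the "<8% WER" figure: word error rate is the word-level edit distance between a reference transcript and the model's transcript, divided by the number of reference words. The following is a minimal Python sketch of that standard definition. The function name and sample sentences are made up for illustration, and it says nothing about how Qwen3-ASR was actually evaluated (text normalization and test sets are not described here).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word out of six reference words.
sample_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER = {sample_wer:.1%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.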

Sep 08, 2025

Cognition's $10b Series C; Smol AI updates

kimi-k2-0905 qwen3-asr gpt-5 cognition vercel meta-ai-fair alibaba groq huggingface coding-agents agent-development open-source model-evaluation multilingual-models inference-optimization kv-cache-compression quantization algorithmic-benchmarking context-length model-performance swyx

Cognition raised $400M at a $10.2B valuation to advance AI coding agents, with swyx joining the company. Vercel launched an OSS coding platform using a tuned GPT-5 agent loop. The Kimi K2-0905 model achieved top coding eval scores and improved agentic capabilities with doubled context length. Alibaba released Qwen3-ASR, a multilingual transcription model with robust noise handling. Meta introduced Set Block Decoding for 3-5× faster decoding without architectural changes. Innovations in KV cache compression and quantization were highlighted, including AutoRound in SGLang and QuTLASS v0.1.0 for Blackwell GPUs. Algorithmic benchmarking tools like AlgoPerf v0.6 were updated for efficiency.

Sep 05, 2025

Kimi K2‑0905 and Qwen3‑Max preview: two 1T open weights models launched

kimi-k2-0905 qwen-3-max qwen-3 moonshot-ai alibaba huggingface together-ai groq lmsys openrouter llamaindex long-context agents coding tool-use model-evaluation instruction-following context-windows semantic-search discriminator-models swyx karpathy willdepue levie bebischof andrew_n_carr bigeagle_xd

Moonshot AI updated their Kimi K2-0905 open model with doubled context length to 256k tokens, improved coding and tool-calling, and integration with agent scaffolds. Alibaba released Qwen 3 Max, a 1 trillion parameter model with agent-oriented behavior, available via Qwen Chat, Alibaba Cloud API, and OpenRouter. The community highlights China's dominance in open models and debates around meaningful evaluation methods for code agents, emphasizing long-horizon and domain-specific evals. Influential voices like @swyx and @karpathy discuss the importance of practical evals and discriminator models for ranking outputs.

Aug 29, 2025

not much happened today

fastvlm mobileclip2 grok-code-fast-1 gpt-5 qwen-3-coder-30b-a3b apple hugging-face x-ai openai groq run-llama lmstudio vision model-quantization code-generation cli-workflows retrieval-augmentation embedding-models local-ai multimodality reach_vb xenovacom pcuenq awnihannun cline veggie_eric nickbaumann_ gdb benankdev loganmarkewich tom_doerr fastmcp ggerganov orionweller antoine_chaffin

Apple released three real-time vision-language models (FastVLM, MobileCLIP2) on Hugging Face with significant speed and size improvements, supporting WebGPU and Core ML. Their MLX framework now supports MXFP4 format, competing with NVFP4 for FP4 quantization. xAI launched grok-code-fast-1, outperforming Claude for code edits, while OpenAI integrated GPT-5 into Xcode 26 and released a new Responses API on Groq hardware. CLI-first agent workflows advanced with tools like SemTools, MLX local runner for Apple Silicon, and llama.vim recommending Qwen 3 Coder 30B A3B. Retrieval research highlights limitations of single-vector embeddings, promoting ColBERT-style late interaction.

Aug 06, 2025

not much happened today

gpt-oss-120b gpt-oss-20b kimi-k2 deepseek-r1 qwen-3-32b openai huggingface microsoft llamaindex ollama baseten fireworksai cerebras groq together anthropic google uk-aisi sliding-window-attention mixture-of-experts rope context-length mxfp4-format synthetic-data reasoning-core-hypothesis red-teaming benchmarking coding-benchmarks model-performance fine-tuning woj_zaremba sama huybery drjimfan jxmnop scaling01 arunv30 kevinweil xikun_zhang_ jerryjliu0 ollama basetenco reach_vb gneubig shxf0072 _lewtun

OpenAI released its first open models since GPT-2, gpt-oss-120b and gpt-oss-20b, which quickly trended on Hugging Face. Microsoft supports these models via Azure AI Foundry and Windows Foundry Local. Key architectural features include sliding window attention, mixture of experts (MoE), a RoPE variant, and a 128k context length. The models use a new MXFP4 format supported by llama.cpp. Hypotheses suggest gpt-oss was trained on synthetic data to enhance safety and performance, supporting the Reasoning Core Hypothesis. OpenAI announced a $500K bounty for red teaming with partners including Anthropic, Google, and the UK AISI. Performance critiques highlight inconsistent benchmarking results, with GPT-OSS-120B scoring 41.8% on the Aider Polyglot coding benchmark, trailing competitors like Kimi-K2 and DeepSeek-R1. Some users note the model excels in math and reasoning but lacks common sense and practical utility.

Jul 16, 2025

not much happened today

kimi-k2 gpt-4.1 voxtral goedel-prover-v2 llama-3 mistral-ai moonshot-ai nous-research google-deepmind openai groq anthropic speech-recognition mixture-of-experts benchmarking dataset-release model-architecture theorem-proving reinforcement-learning asymmetry-of-verification inference-speed model-performance cline _jasonwei

Mistral released Voxtral, claimed as the world's best open speech recognition models, available via API and Hugging Face. Moonshot AI launched Kimi K2, a trillion-parameter Mixture-of-Experts (MoE) model, outperforming GPT-4.1 on benchmarks with 65.4% on SWE-Bench Verified and achieving 200 tokens/second inference speed on Groq hardware. Nous Research open-sourced the Hermes 3 dataset with 1 million samples, aiding SOTA models on the Llama-3 series. Google DeepMind introduced the Mixture-of-Recursions (MoR) architecture promising 2x inference speed and 50% parameter reduction but faced skepticism. Goedel-Prover V2 topped the PutnamBench theorem proving benchmark. AtCoder World Finals saw a human winner with OpenAI placing second. Research highlights include Jason Wei's insights on reinforcement learning and the "Verifier's Law" emphasizing the asymmetry of verification in AI training.

Jul 15, 2025

Voxtral - Mistral's SOTA ASR model in 3B (mini) and 24B ("small") sizes beats OpenAI Whisper large-v3

voxtal-3b voxtal-24b kimi-k2 mistral-ai moonshot-ai groq together-ai deepinfra huggingface langchain transcription long-context function-calling multilingual-models mixture-of-experts inference-speed developer-tools model-integration jeremyphoward teortaxestex scaling01 zacharynado jonathanross321 reach_vb philschmid

Mistral surprises with the release of Voxtral, a transcription model outperforming Whisper large-v3, GPT-4o mini Transcribe, and Gemini 2.5 Flash. Voxtral models (3B and 24B) support a 32k token context length, handle audio up to 30-40 minutes long, offer built-in Q&A and summarization, are multilingual, and enable function-calling from voice commands, powered by the Mistral Small 3.1 language model backbone. Meanwhile, Moonshot AI's Kimi K2, a non-reasoning Mixture of Experts (MoE) model built by a team of around 200 people, gains attention for blazing-fast inference on Groq hardware, broad platform availability including Together AI and DeepInfra, and local running on an M4 Max 128GB Mac. Developer tool integrations include LangChain and Hugging Face support, highlighting Kimi K2's strong tool use capabilities.

Apr 29, 2025

LlamaCon: Meta AI gets into the Llama API platform business

llama-4 qwen3 qwen3-235b-a22b qwen3-30b-a3b qwen3-4b qwen2-5-72b-instruct o3-mini meta-ai-fair cerebras groq alibaba vllm ollama llamaindex hugging-face llama-cpp model-release fine-tuning reinforcement-learning moe multilingual-models model-optimization model-deployment coding benchmarking apache-license reach_vb huybery teortaxestex awnihannun thezachmueller

Meta celebrated progress in the Llama ecosystem at LlamaCon, launching an AI Developer platform with finetuning and fast inference powered by Cerebras and Groq hardware, though it remains waitlisted. Meanwhile, Alibaba released the Qwen3 family of large language models, including two MoE models and six dense models ranging from 0.6B to 235B parameters, with the flagship Qwen3-235B-A22B achieving competitive benchmark results and supporting 119 languages and dialects. The Qwen3 models are optimized for coding and agentic capabilities, are Apache 2.0 licensed, and have broad deployment support including local usage with tools like vLLM, Ollama, and llama.cpp. Community feedback highlights Qwen3's scalable performance and superiority over models like OpenAI's o3-mini.

Feb 13, 2025

small news items

gpt-4.5 gpt-5 deepseek-r1-distilled-qwen-1.5b o1-preview modernbert-0.3b qwen-0.5b o3 openai ollama mistral perplexity cerebras alibaba groq bytedance math benchmarking fine-tuning model-performance reinforcement-learning model-architecture partnerships funding jeremyphoward arankomatsuzaki sama nrehiew_ danhendrycks akhaliq

OpenAI announced plans for GPT-4.5 (Orion) and GPT-5, with GPT-5 integrating the o3 model and offering unlimited chat access in the free tier. DeepSeek R1 Distilled Qwen 1.5B outperforms OpenAI's o1-preview on math benchmarks, while ModernBERT 0.3b surpasses Qwen 0.5b at MMLU without fine-tuning. Mistral and Perplexity adopt Cerebras hardware for 10x performance gains. OpenAI's o3 model won a gold medal at the 2024 International Olympiad in Informatics. Partnerships include Qwen with Groq. Significant RLHF activity is noted in Nigeria and the global south, and Bytedance is expected to rise in AI prominence soon. "GPT5 is all you need."

Jan 09, 2025

not much happened today

phi-4 reinforce++ arc-agi-2 ai21-labs ollama langchain togethercompute groq reinforcement-learning ppo model-optimization memory-efficiency python-packages vision text-extraction frontend-code-generation workflow-automation coding-agents compute-cost-reduction ethical-ai agi-benchmarks scam-alerts sebastien-bubeck fchollet tom-doerr arohan_ bindureddy hwchase17 jonathanross321 clementdelangue vikhyatk

Sebastien Bubeck introduced REINFORCE++, enhancing classical REINFORCE with PPO-inspired techniques for 30% faster training. Microsoft's Phi-4 was released under the MIT License and is accessible via Ollama. François Chollet announced plans for ARC-AGI-2 and a next-generation AGI benchmark. LangChain launched 10 new integration packages to boost LLM application development. Tom Doerr introduced Ollama-OCR, a Python package for text extraction using vision language models. Arohan optimized Shampoo for memory efficiency, reducing usage from 20 to 6 bytes per parameter. Bindu Reddy showcased CodeLLM's v1 for frontend code generation and highlighted LlamaIndex Workflows for academic summarization and slide generation. Hwchase17 collaborated with Together Compute to enhance WebDev Arena with complex coding agents for LLM coding evaluations. Jonathan Ross detailed Groq's mission to reduce compute costs by 1000x amid rising generative AI spending. Clement Delangue warned about scams involving false claims of association with AI21. Vikhyat K raised concerns about the ethical implications and trade-offs of AGI. Memes and humor included creative AI prompts and critiques of LLM behaviors.

Sep 23, 2024

a calm before the storm

o1 o1-mini qwen2.5 gpt-4 llama-2-70b llama-7b anthropic openai alibaba microsoft blackrock groq aramco disney eth-zurich pudu-robotics slack long-context kv-cache-quantization diffusion-models reinforcement-learning robotics ai-integration multilinguality model-benchmarking model-performance model-optimization adcock_brett philschmid rohanpaul_ai jvnixon kateclarktweets sama

Anthropic is raising funds at a valuation up to $40 billion ahead of anticipated major releases. OpenAI launched new reasoning models o1 and o1-mini, with increased rate limits and a multilingual MMLU benchmark. Alibaba released the open-source Qwen2.5 model supporting 29+ languages, showing competitive performance to gpt-4 at lower cost. Microsoft and Blackrock plan to invest $30 billion in AI data centers, with Groq partnering with Aramco to build the world's largest AI inference center. Robotics advances include Disney Research and ETH Zurich's diffusion-based motion generation for robots and Pudu Robotics' semi-humanoid robot. Slack and Microsoft introduced AI-powered agents integrated into their platforms. Research highlights include long-context scaling for llama-2-70b using Dual Chunk Attention and KV cache quantization enabling 1 million token context on llama-7b models.
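A back-of-the-envelope calculation shows why KV cache quantization matters at the 1 million token scale. The sketch below assumes the original LLaMA-7B shape (32 layers, hidden size 4096, standard multi-head attention) and counts only the key and value tensors. The 4-bit and 2-bit widths are illustrative assumptions, not the settings used in the research mentioned above, and real quantization schemes add some overhead for scales and outliers.

```python
def kv_cache_bytes(num_tokens: int,
                   num_layers: int = 32,      # assumed LLaMA-7B depth
                   hidden_size: int = 4096,   # assumed LLaMA-7B width
                   bits_per_value: int = 16) -> int:
    """Approximate KV cache size in bytes for a single sequence."""
    # Each token stores one key vector and one value vector per layer.
    values = 2 * num_layers * hidden_size * num_tokens
    return values * bits_per_value // 8


CONTEXT_TOKENS = 1_000_000

# 16 bits is the usual fp16 baseline; 4 and 2 bits are hypothetical
# quantized widths chosen for comparison.
for bits in (16, 4, 2):
    size_gb = kv_cache_bytes(CONTEXT_TOKENS, bits_per_value=bits) / 1e9
    print(f"{bits:>2}-bit KV cache at {CONTEXT_TOKENS:,} tokens: {size_gb:.1f} GB")
```

Under these assumptions, a 1M-token cache needs roughly 524 GB at 16 bits per value and about 66 GB at 2 bits, which is the difference between needing many accelerators and fitting on one large one.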

Aug 29, 2024

Cerebras Inference: Faster, Better, AND Cheaper

llama-3.1-8b llama-3.1-70b gemini-1.5-flash gemini-1.5-pro cogvideox-5b mamba-2 rene-1.3b llama-3.1 gemini-1.5 claude groq cerebras cursor google-deepmind anthropic inference-speed wafer-scale-chips prompt-caching model-merging benchmarking open-source-models code-editing model-optimization jeremyphoward sam-altman nat-friedman daniel-gross swyx

Groq led early 2024 with superfast LLM inference speeds, achieving ~450 tokens/sec for Mixtral 8x7B and 240 tokens/sec for Llama 2 70B. Cursor introduced a specialized code edit model hitting 1000 tokens/sec. Now, Cerebras claims the fastest inference with their wafer-scale chips, running Llama3.1-8b at 1800 tokens/sec and Llama3.1-70B at 450 tokens/sec at full precision, with competitive pricing and a generous free tier. Google's Gemini 1.5 models showed significant benchmark improvements, especially Gemini-1.5-Flash and Gemini-1.5-Pro. New open-source models like CogVideoX-5B and Mamba-2 (Rene 1.3B) were released, optimized for consumer hardware. Anthropic's Claude now supports prompt caching, improving speed and cost efficiency. "Cerebras Inference runs Llama3.1 20x faster than GPU solutions at 1/5 the price."
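To put the quoted throughput figures in perspective, here is a rough Python sketch that converts tokens per second into wall-clock time for one response. The 500-token response length is a hypothetical value chosen for illustration, and the calculation ignores time to first token, queueing, and network latency, so the results are best read as lower bounds rather than measurements.

```python
# Throughput figures as quoted in the summary above (tokens per second).
QUOTED_TOKENS_PER_SEC = {
    "Groq, Mixtral 8x7B": 450,
    "Groq, Llama 2 70B": 240,
    "Cursor, code edit model": 1000,
    "Cerebras, Llama3.1-8b": 1800,
    "Cerebras, Llama3.1-70B": 450,
}


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream num_tokens at a steady decode rate."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


RESPONSE_TOKENS = 500  # hypothetical response length

for setup, rate in QUOTED_TOKENS_PER_SEC.items():
    seconds = generation_seconds(RESPONSE_TOKENS, rate)
    print(f"{setup:<26} {rate:>5} tok/s -> {seconds:.2f} s")
```

The comparison is only as good as the quoted numbers: the models differ in size across providers, so the table contrasts headline rates, not like-for-like hardware performance.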

Aug 07, 2024

GPT4o August + 100% Structured Outputs for All (GPT4o August edition)

gpt-4o-2024-08-06 llama-3-1-405b llama-3 claude-3.5-sonnet gemini-1.5-pro gpt-4o yi-large-turbo openai meta-ai-fair google-deepmind yi-large nvidia groq langchain jamai langsmith structured-output context-windows model-pricing benchmarking parameter-efficient-expert-retrieval retrieval-augmented-generation mixture-of-experts model-performance ai-hardware model-deployment filtering multi-lingual vision john-carmack jonathan-ross rohanpaul_ai

OpenAI released the new gpt-4o-2024-08-06 model with a 16k maximum output token limit and 33-50% lower pricing than the previous 4o-May version, featuring a new Structured Output API that improves output quality and reduces retry costs. Meta AI launched Llama 3.1, a 405-billion parameter model surpassing GPT-4 and Claude 3.5 Sonnet on benchmarks, alongside expanding the Llama Impact Grant program. Google DeepMind quietly released Gemini 1.5 Pro, outperforming GPT-4o, Claude-3.5, and Llama 3.1 on LMSYS benchmarks and leading the Vision Leaderboard. Yi-Large Turbo was introduced as a cost-effective upgrade priced at $0.19 per million tokens. In hardware, NVIDIA H100 GPUs were highlighted by John Carmack for their massive AI workload power, and Groq announced plans to deploy 108,000 LPUs by Q1 2025. New AI tools and techniques include RAG (Retrieval-Augmented Generation), the JamAI Base platform for Mixture of Agents systems, and LangSmith's enhanced filtering capabilities. Google DeepMind also introduced the PEER (Parameter Efficient Expert Retrieval) architecture.

Aug 05, 2024

How Carlini Uses AI

gemma-2-2b gpt-3.5-turbo-0613 mixtral-8x7b gen-3-alpha segment-anything-model-2 stable-fast-3d groq intel deepmind box figure-ai openai google meta-ai-fair nvidia stability-ai runway benchmarking adversarial-attacks large-language-models text-generation multimodality robotics emotion-detection structured-data-extraction real-time-processing teleoperation 3d-generation text-to-video nicholas-carlini chris-dixon rasbt

Groq's shareholders' net worth rises while others fall, with Intel's CEO expressing concern. Nicholas Carlini of DeepMind gains recognition and criticism for his extensive AI writings, including an 80,000-word treatise on AI use and a benchmark for large language models. Chris Dixon comments on AI Winter skepticism, emphasizing long-term impact. Box introduces an AI API for extracting structured data from documents, highlighting the potential and risks of LLM-driven solutions. Recent AI developments include Figure AI launching the advanced humanoid robot Figure 02, OpenAI rolling out Advanced Voice Mode for ChatGPT with emotion detection, Google open-sourcing the Gemma 2 2B model, which matches GPT-3.5-Turbo-0613 performance, Meta AI (FAIR) releasing Segment Anything Model 2 (SAM 2) for real-time object tracking, NVIDIA showcasing Project GR00T for humanoid teleoperation with Apple Vision Pro, Stability AI launching Stable Fast 3D for rapid 3D asset generation, and Runway unveiling Gen-3 Alpha for AI text-to-video generation.

Jul 24, 2024

Mistral Large 2 + RIP Mistral 7B, 8x7B, 8x22B

mistral-large-2 mistral-nemo-12b llama-3.1-8b llama-3.1-70b llama-3.1 llama-3-405b yi-34b-200k gpt-4o mistral-ai meta-ai-fair groq togethercompute code-generation math function-calling reasoning context-windows model-deprecation pretraining posttraining benchmarking

Mistral Large 2 introduces 123B parameters with Open Weights under a Research License, focusing on code generation, math performance, and a massive 128k context window, improving over Mistral Large 1's 32k context. It claims better function calling capabilities than GPT-4o and enhanced reasoning. Meanwhile, Meta officially released Llama-3.1 models including Llama-3.1-70B and Llama-3.1-8B with detailed pre-training and post-training insights. The Llama-3.1 8B model's 128k context performance was found underwhelming compared to Mistral Nemo and Yi 34B 200K. Mistral is deprecating older Apache open-source models, focusing on Large 2 and Mistral Nemo 12B. The news also highlights community discussions and benchmarking comparisons.

Jul 24, 2024

Llama 3.1: The Synthetic Data Model

llama-3-405b llama-3-1 llama-3 meta-ai-fair groq fireworks synthetic-data fine-tuning reinforcement-learning multilinguality long-context tool-use code-generation math model-licensing inference-speed model-deployment bindureddy thomas

Meta AI has released Llama 3.1, including a 405B parameter model that triggers regulatory considerations like the EU AI Act and SB 1047. The model incorporates extensive synthetic data techniques for code, math, multilinguality, long context, and tool use fine-tuning, with RLHF using synthetic preference data from Llama 2. The launch was coordinated across major inference providers, with Groq demonstrating 750 tokens per second inference speed and Fireworks leading in pricing. The updated license explicitly allows synthetic data generation, marking a significant step in open frontier-class LLMs and cost-efficiency improvements since March.

Jun 06, 2024

Qwen 2 beats Llama 3 (and we don't know how)

qwen-2 llama-3 llama-3-70b gpt-4 nllb alibaba groq meta-ai-fair multilinguality benchmarking inference-speed sparse-autoencoders scaling-laws post-training instruction-following rejection-sampling execution-feedback model-release multilingual-models model-training philschmid huybery jonathanross321 awnihannun gdb nabla_theta ylecun

Alibaba released the Qwen 2 models under the Apache 2.0 license, claiming to outperform Llama 3 among open models, with multilingual support in 29 languages and strong benchmark scores such as MMLU 82.3 and HumanEval 86.0. Groq demonstrated ultra-fast inference on Llama-3 70B at 40,792 tokens/s, processing text the length of 4 Wikipedia articles in 200ms. Research on sparse autoencoders (SAEs) for interpreting GPT-4 neural activity showed new training methods, metrics, and scaling laws. Meta AI announced the No Language Left Behind (NLLB) model capable of high-quality translations between 200 languages, including low-resource ones. The Qwen 2 team states that "Our post-training phase is designed with the principle of scalable training with minimal human annotation," highlighting techniques like rejection sampling for math and execution feedback for coding.

Jun 04, 2024

Not much happened today

gemini-1.5-flashmodel gemini-pro mixtral mamba-2 phi-3-medium phi-3-small gpt-3.5-turbo-0613 llama-3-8b llama-2-70b mistral-finetune twelve-labs livekit groq openai nea nvidia lmsys mistral-ai model-performance prompt-engineering data-curation ai-safety model-benchmarking model-optimization training sequence-models state-space-models daniel-kokotajlo rohanpaul_ai _arohan_ tri_dao _albertgu _philschmid sarahcat21 hamelhusain jachiam0 willdepue teknium1

Twelve Labs raised $50m in Series A funding co-led by NEA and NVIDIA's NVentures to advance multimodal AI. Livekit secured $22m in funding. Groq announced running at 800k tokens/second. Daniel Kokotajlo resigned from OpenAI. Twitter users highlighted Gemini 1.5 Flash for high performance at low cost and Gemini Pro ranking #2 in Japanese language tasks. Mixtral models can run up to 8x faster on NVIDIA RTX GPUs using TensorRT-LLM. The Mamba-2 model architecture introduces state space duality for larger states and faster training, outperforming previous models. Phi-3 Medium (14B) and Small (7B) models benchmark near GPT-3.5-Turbo-0613 and Llama 3 8B. Prompt engineering is emphasized for unlocking LLM capabilities. Data quality is critical for model performance, with upcoming masterclasses on data curation. Discussions on AI safety include a letter from frontier AI lab employees advocating whistleblower protections and debates on aligning AI to user intent versus the interests of humanity more broadly.

May 23, 2024

ALL of AI Engineering in One Place

claude-3-sonnet claude-3 openai google-deepmind anthropic mistral-ai cohere hugging-face adept midjourney character-ai microsoft amazon nvidia salesforce mastercard palo-alto-networks axa novartis discord twilio tinder khan-academy sourcegraph mongodb neo4j hasura modular cognition anysphere perplexity-ai groq mozilla nous-research galileo unsloth langchain llamaindex instructor weights-biases lambda-labs neptune datastax crusoe covalent qdrant baseten e2b octo-ai gradient-ai lancedb log10 deepgram outlines crew-ai factory-ai interpretability feature-steering safety multilinguality multimodality rag evals-ops open-models code-generation gpus agents ai-leadership

The upcoming AI Engineer World's Fair in San Francisco from June 25-27 will feature a significantly expanded format with booths, talks, and workshops from top model labs like OpenAI, DeepMind, Anthropic, Mistral, Cohere, HuggingFace, and Character.ai. It includes participation from Microsoft Azure, Amazon AWS, Google Vertex, and major companies such as Nvidia, Salesforce, Mastercard, Palo Alto Networks, and more. The event covers 9 tracks including RAG, multimodality, evals/ops, open models, code generation, GPUs, agents, AI in Fortune 500, and a new AI leadership track. Additionally, Anthropic shared interpretability research on Claude 3 Sonnet, revealing millions of interpretable features that can be steered to modify model behavior, including safety-relevant features related to bias and unsafe content, though more research is needed for practical applications. The event offers a discount code for AI News readers.

May 03, 2024

$100k to predict LMSYS human preferences in a Kaggle contest

llama-3-70b llama-3 gpt-4 claude-3-opus prometheus-2 groq openai lmsys scale-ai ai2 nvidia benchmarking datasets fine-tuning reinforcement-learning model-alignment hallucination parameter-efficient-fine-tuning scalable-training factuality chatbot-performance bindureddy drjimfan percyliang seungonekim mobicham clefourrier

Llama 3 models are making breakthroughs, with Groq's hosted Llama 3 70B achieving record low costs per million tokens. A new Kaggle competition offers a $100,000 prize to develop models predicting human preferences from a dataset of over 55,000 user-LLM conversations. Open source evaluator LLMs like Prometheus 2 outperform proprietary models such as GPT-4 and Claude 3 Opus in judgment tasks. New datasets like WildChat1M provide over 1 million ChatGPT interaction logs with diverse and toxic examples. Techniques like LoRA fine-tuning show significant performance gains, and NVIDIA's NeMo-Aligner toolkit enables scalable LLM alignment across hundreds of GPUs. Factuality-aware alignment methods are proposed to reduce hallucinations in LLM outputs.

Apr 23, 2024

Perplexity, the newest AI unicorn

llama-3-8b llama-3-70b llama-3 llava-llama-3-8b-v1_1 phi-3 gpt-3.5 perplexity-ai meta-ai-fair hugging-face groq context-length fine-tuning quantization instruction-following model-comparison multimodality benchmarking memory-optimization model-performance daniel-gross aravind-srinivas

Perplexity doubles its valuation shortly after its Series B with a Series B-1 funding round. Significant developments around Llama 3 include context length extension to 16K tokens, new multimodal LLaVA models outperforming Llama 2, and fine-tuning improvements like QDoRA surpassing QLoRA. The Llama-3-70B model is praised for instruction following and performance across quantization formats. Microsoft's Phi-3 models, released in multiple sizes, show competitive benchmark results, with the 14B model achieving 78% on MMLU and the 3.8B model nearing GPT-3.5 performance.

Apr 20, 2024

Llama-3-70b is GPT-4-level Open Model

llama-3-70b llama-3-8b llama-3 llama-2-70b mistral-7b grok-3 stable-diffusion-3 vasa-1 meta-ai-fair groq nvidia amazon microsoft benchmarking model-performance fine-tuning function-calling arithmetic image-generation video-generation energy-usage gpu-demand political-bias ai-safety scaling context-windows tokenization elon-musk

Meta has released Llama 3, their most capable open large language model with 8B and 70B parameter versions supporting 8K context length and outperforming previous models including Llama 2 and Mistral 7B. Groq serves the Llama 3 70B model at 500-800 tokens/second, making it the fastest GPT-4-level token source. Discussions highlight AI scaling challenges with Elon Musk stating that training Grok 3 will require 100,000 Nvidia H100 GPUs, and AWS planning to acquire 20,000 B200 GPUs for a 27 trillion parameter model. Microsoft unveiled VASA-1 for lifelike talking face generation, while Stable Diffusion 3 and its extensions received mixed impressions. Concerns about AI energy usage and political bias in AI were also discussed.

Mar 19, 2024

Grok-1 in Bio

grok-1 mixtral miqu-70b claude-3-opus claude-3 claude-3-haiku xai mistral-ai perplexity-ai groq anthropic openai mixture-of-experts model-release model-performance benchmarking finetuning compute hardware-optimization mmlu model-architecture open-source memes sam-altman arthur-mensch daniel-han arav-srinivas francis-yao

Grok-1, a 314B parameter Mixture-of-Experts (MoE) model from xAI, has been released under an Apache 2.0 license, sparking discussions on its architecture, finetuning challenges, and performance compared to models like Mixtral and Miqu 70B. Despite its size, its MMLU benchmark performance is currently unimpressive, with expectations that Grok-2 will be more competitive. The model's weights and code are publicly available, encouraging community experimentation. Sam Altman highlighted the growing importance of compute resources, while Grok's potential deployment on Groq hardware was noted as a possible game-changer. Meanwhile, Anthropic's Claude continues to attract attention for its "spiritual" interaction experience and consistent ethical framework. The release also inspired memes and humor within the AI community.