All tags
Company: "amazon"
not much happened today
gpt-2 r1 gemma-3 gemmacoder3-12b qwen2.5-omni openai deepseek berkeley alibaba togethercompute nvidia azure runway langchain bmw amazon open-source function-calling benchmarking code-reasoning multimodality inference-speed image-generation voice-generation animation robotics realtime-transcription webrtc sama clémentdelangue lioronai scaling01 cognitivecompai osanseviero jack_w_rae ben_burtenshaw theturingpost vipulved kevinweil tomlikesrobots adcock_brett juberti
OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, signaling a move towards more open AI development. DeepSeek launched its open-source R1 model earlier this year, challenging perceptions of China's AI progress. Gemma 3 has achieved function calling capabilities and ranks on the Berkeley Function-Calling Leaderboard, while GemmaCoder3-12b improves code reasoning performance on LiveCodeBench. Alibaba_Qwen's Qwen2.5-Omni introduces a novel Thinker-Talker system and TMRoPE for multimodal input understanding. The TogetherCompute team achieved 140 TPS on a 671B parameter model, outperforming Azure and DeepSeek API on Nvidia GPUs. OpenAI also expanded ChatGPT features with image generation for all free users and a new voice release. Runway Gen-4 enhances animation for miniature dioramas, and LangChain launched a chat-based generative UI agent. Commercial deployment of Figure 03 humanoid robots at BMW highlights advances in autonomy and manufacturing scaling. New tools include OpenAI's realtime transcription API with WebRTC support and Amazon's Nova Act AI browser agent.
lots of small launches
gpt-4o claude-3.7-sonnet claude-3.7 claude-3.5-sonnet deepseek-r1 deepseek-v3 grok-3 openai anthropic amazon cloudflare perplexity-ai deepseek-ai togethercompute elevenlabs elicitorg inceptionailabs mistral-ai voice model-releases cuda gpu-optimization inference open-source api model-performance token-efficiency context-windows cuda jit-compilation lmarena_ai alexalbert__ aravsrinivas reach_vb
GPT-4o Advanced Voice Preview is now available for free ChatGPT users with enhanced daily limits for Plus and Pro users. Claude 3.7 Sonnet has achieved the top rank in WebDev Arena with improved token efficiency. DeepSeek-R1 with 671B parameters benefits from the Together Inference platform optimizing NVIDIA Blackwell GPU usage, alongside the open-source DeepGEMM CUDA library delivering up to 2.7x speedups on Hopper GPUs. Perplexity launched a new Voice Mode and a Deep Research API. The upcoming Grok 3 API will support a 1M token context window. Several companies including Elicit, Amazon, Anthropic, Cloudflare, FLORA, Elevenlabs, and Inception Labs announced new funding rounds, product launches, and model releases.
not much happened today
claude-3.7-sonnet claude-3.7 deepseek-r1 o3-mini deepseek-v3 gemini-2.0-pro gpt-4o qwen2.5-coder-32b-instruct anthropic perplexity-ai amazon google-cloud deepseek_ai coding reasoning model-benchmarking agentic-workflows context-window model-performance open-source moe model-training communication-libraries fp8 nvlink rdma cli-tools skirano omarsar0 reach_vb artificialanlys terryyuezhuo _akhaliq _philschmid catherineols goodside danielhanchen
Claude 3.7 Sonnet demonstrates exceptional coding and reasoning capabilities, outperforming models like DeepSeek R1, O3-mini, and GPT-4o on benchmarks such as SciCode and LiveCodeBench. It is available on platforms including Perplexity Pro, Anthropic, Amazon Bedrock, and Google Cloud, with pricing at $3/$15 per million tokens. Key features include a 64k token thinking mode, 200k context window, and the CLI-based coding assistant Claude Code. Meanwhile, DeepSeek released DeepEP, an open-source communication library optimized for MoE model training and inference with support for NVLink, RDMA, and FP8. These updates highlight advancements in coding AI and efficient model training infrastructure.
Olympus has dropped (aka, Amazon Nova Micro|Lite|Pro|Premier|Canvas|Reel)
amazon-nova claude-3 llama-3-70b gemini-1.5-flash gpt-4o amazon anthropic google-deepmind sakana-ai-labs multimodality benchmarking model-merging model-performance model-architecture model-optimization population-based-learning philschmid bindureddy
Amazon announced the Amazon Nova family of multimodal foundation models at AWS Re:Invent, available immediately with no waitlist in configurations like Micro, Lite, Pro, Canvas, and Reel, with Premier and speech-to-speech coming next year. These models offer 2-4x faster token speeds and are 25%-400% cheaper than competitors like Anthropic Claude models, positioning Nova as a serious contender in AI engineering. Pricing undercuts models such as Google DeepMind Gemini Flash 8B, and some Nova models extend context length up to 300k tokens. However, benchmarking controversy exists as some evaluations show Nova scoring below Llama-3 70B in LiveBench AI metrics. Separately, CycleQD was introduced by Sakana AI Labs, using evolutionary computation for population-based model merging to develop niche LLM agents.
not much happened today
ic-light-v2 claude-3-5-sonnet puzzle nvidia amazon anthropic google pydantic supabase browser-company world-labs cognition distillation neural-architecture-search inference-optimization video trajectory-attention timestep-embedding ai-safety-research fellowship-programs api domain-names reverse-thinking reasoning agent-frameworks image-to-3d ai-integration akhaliq adcock_brett omarsar0 iscienceluvr
AI News for 11/29/2024-12/2/2024 highlights several developments: Nvidia introduced Puzzle, a distillation-based neural architecture search for inference-optimized large language models, enhancing efficiency. The IC-Light V2 model was released for varied illumination scenarios, and new video model techniques like Trajectory Attention and Timestep Embedding were presented. Amazon increased its investment in Anthropic to $8 billion, supporting AI safety research through a new fellowship program. Google is expanding AI integration with the Gemini API and open collaboration tools. Discussions on domain name relevance emphasize alternatives to .com domains like .io, .ai, and .co. Advances in reasoning include a 13.53% improvement in LLM performance using "Reverse Thinking". Pydantic launched a new agent framework, and Supabase released version 2 of their assistant. Other notable mentions include Browser Company teasing a second browser and World Labs launching image-to-3D-world technology. The NotebookLM team departed from Google, and Cognition was featured on the cover of Forbes. The news was summarized by Claude 3.5 Sonnet.
not much happened to end the week
gemini deepseek-r1 o1 chatgpt gpt-4 claude-3.5-sonnet o1-preview o1-mini gpt4o qwq-32b google-deepmind deeplearningai amazon tesla x-ai alibaba ollama multimodality benchmarking quantization reinforcement-learning ai-safety translation reasoning interpretability model-comparison humor yoshua-bengio kevinweil ylecun
AI News for 11/29/2024-11/30/2024 covers key updates including the Gemini multimodal model advancing in musical structure understanding, a new quantized SWE-Bench for benchmarking at 1.3 bits per task, and the launch of the DeepSeek-R1 model focusing on transparent reasoning as an alternative to o1. The establishment of the 1st International Network of AI Safety Institutes highlights global collaboration on AI safety. Industry updates feature Amazon's Olympus AI model, Tesla's Optimus, and experiments with ChatGPT as a universal translator. Community reflections emphasize the impact of large language models on daily life and medical AI applications. Discussions include scaling sparse autoencoders to gpt-4 and the need for transparency in reasoning LLMs. The report also notes humor around ChatGPT's French nickname.
Anthropic launches the Model Context Protocol
claude-3.5-sonnet claude-desktop anthropic amazon zed sourcegraph replit model-context-protocol integration json-rpc agentic-behaviors security tool-discovery open-protocol api-integration system-integration prompt-templates model-routing alex-albert matt-pocock hwchase17
Anthropic has launched the Model Context Protocol (MCP), an open protocol designed to enable seamless integration between large language model applications and external data sources and tools. MCP supports diverse resources such as file contents, database records, API responses, live system data, screenshots, and logs, identified by unique URIs. It also includes reusable prompt templates, system and API tools, and JSON-RPC 2.0 transports with streaming support. MCP allows servers to request LLM completions through clients with priorities on cost, speed, and intelligence, hinting at an upcoming model router by Anthropic. Launch partners like Zed, Sourcegraph, and Replit have reviewed MCP favorably, while some developers express skepticism about its provider exclusivity and adoption potential. The protocol emphasizes security, testing, and dynamic tool discovery, with guides and videos available from community members such as Alex Albert and Matt Pocock. This development follows Anthropic's recent $4 billion fundraise from Amazon and aims to advance terminal-level integration for Claude Desktop.
Execuhires: Tempting The Wrath of Khan
gemini-1.5-pro gpt-4o claude-3.5 flux-1 llama-3-1-405b character.ai google adept amazon inflection microsoft stability-ai black-forest-labs schelling google-deepmind openai anthropic meta-ai-fair lmsys langchainai execuhire model-benchmarking multilinguality math coding text-to-image agent-ide open-source-models post-training data-driven-performance noam-shazeer mostafa-mostaque david-friedman rob-rombach alexandr-wang svpino rohanpaul_ai
Character.ai's $2.5b execuhire to Google marks a significant leadership move alongside Adept's $429m execuhire to Amazon and Inflection's $650m execuhire to Microsoft. Despite strong user growth and content momentum, Character.ai's CEO Noam Shazeer returns to Google, signaling shifting vibes in the AI industry. Google DeepMind's Gemini 1.5 Pro tops Chatbot Arena benchmarks, outperforming GPT-4o and Claude-3.5, excelling in multilingual, math, and coding tasks. The launch of Black Forest Labs' FLUX.1 text-to-image model and LangGraph Studio agent IDE highlight ongoing innovation. Llama 3.1 405B is released as the largest open-source model, fostering developer use and competition with closed models. The industry is focusing increasingly on post-training and data as key competitive factors, raising questions about acquisition practices and regulatory scrutiny.
Ways to use Anthropic's Tool Use GA
claude-3-opus haiku opus convnext anthropic amazon google tool-use function-calling agentic-ai streaming vision parallelization delegation debate specialization open-science superintelligence convolutional-networks self-attention ai-research yann-lecun alex-albert sainingxie
Anthropic launched general availability of tool use/function calling with support for streaming, forced use, and vision, alongside Amazon and Google. Alex Albert shared five architectures for agentic tool use: delegation, parallelization, debate, specialization, and tool suite experts. Anthropic also introduced a self-guided course on tool use. Yann LeCun emphasized ethical open science funding, gradual emergence of superintelligence with safety guardrails, and convolutional networks for image/video processing as competitive with vision transformers. He also noted growth in AI researchers across industry, academia, and government.
ALL of AI Engineering in One Place
claude-3-sonnet claude-3 openai google-deepmind anthropic mistral-ai cohere hugging-face adept midjourney character-ai microsoft amazon nvidia salesforce mastercard palo-alto-networks axa novartis discord twilio tinder khan-academy sourcegraph mongodb neo4j hasura modular cognition anysphere perplexity-ai groq mozilla nous-research galileo unsloth langchain llamaindex instructor weights-biases lambda-labs neptune datastax crusoe covalent qdrant baseten e2b octo-ai gradient-ai lancedb log10 deepgram outlines crew-ai factory-ai interpretability feature-steering safety multilinguality multimodality rag evals-ops open-models code-generation gpus agents ai-leadership
The upcoming AI Engineer World's Fair in San Francisco from June 25-27 will feature a significantly expanded format with booths, talks, and workshops from top model labs like OpenAI, DeepMind, Anthropic, Mistral, Cohere, HuggingFace, and Character.ai. It includes participation from Microsoft Azure, Amazon AWS, Google Vertex, and major companies such as Nvidia, Salesforce, Mastercard, Palo Alto Networks, and more. The event covers 9 tracks including RAG, multimodality, evals/ops, open models, code generation, GPUs, agents, AI in Fortune 500, and a new AI leadership track. Additionally, Anthropic shared interpretability research on Claude 3 Sonnet, revealing millions of interpretable features that can be steered to modify model behavior, including safety-relevant features related to bias and unsafe content, though more research is needed for practical applications. The event offers a discount code for AI News readers.
Not much happened today
command-r-35b goliath-120 miqu-120 llama-3-8b tensorrt-llm llama-cpp gpt2-chat gpt-4-turbo llama-3 deepmind-alphazero anthropic openai perplexity-ai amazon apple microsoft deepmind creative-writing context-windows benchmarking model-performance self-learning function-calling retrieval-augmented-generation ai-assistants on-device-ai ai-lobbying copyright-infringement code-reasoning image-generation
Anthropic released a team plan and iOS app about 4 months after OpenAI. The Command-R 35B model excels at creative writing, outperforming larger models like Goliath-120 and Miqu-120. The Llama-3 8B model now supports a 1 million token context window, improving long-context understanding with minimal training on a single 8xA800 GPU machine. TensorRT-LLM benchmarks show it is 30-70% faster than llama.cpp on consumer hardware. A benchmark suggests GPT2-Chat may have better reasoning than GPT-4-Turbo, though results are debated. Demos include a self-learning Llama-3 voice agent running locally on Jetson Orin and a Self-Learning Large Action Model (LAM). Amazon CodeWhisperer was renamed to Q Developer, expanding its generative AI assistant capabilities. Apple plans an AI-enabled Safari browser with an on-device LLM in iOS 18 and macOS 15. Big Tech dominates AI lobbying in Washington, while major U.S. newspapers sued OpenAI and Microsoft for copyright infringement. DeepMind's AlphaZero became the greatest chess player in 9 hours, and their Naturalized Execution Tuning (NExT) method improves LLM code reasoning by 14-26%. Stable Diffusion is used for diverse image generation applications.
Llama-3-70b is GPT-4-level Open Model
llama-3-70b llama-3-8b llama-3 llama-2-70b mistral-7b grok-3 stable-diffusion-3 vasa-1 meta-ai-fair groq nvidia amazon microsoft benchmarking model-performance fine-tuning function-calling arithmetic image-generation video-generation energy-usage gpu-demand political-bias ai-safety scaling context-windows tokenization elon-musk
Meta has released Llama 3, their most capable open large language model with 8B and 70B parameter versions supporting 8K context length and outperforming previous models including Llama 2 and Mistral 7B. Groq serves the Llama 3 70B model at 500-800 tokens/second, making it the fastest GPT-4-level token source. Discussions highlight AI scaling challenges with Elon Musk stating that training Grok 3 will require 100,000 Nvidia H100 GPUs, and AWS planning to acquire 20,000 B200 GPUs for a 27 trillion parameter model. Microsoft unveiled VASA-1 for lifelike talking face generation, while Stable Diffusion 3 and its extensions received mixed impressions. Concerns about AI energy usage and political bias in AI were also discussed.
Claude 3 just destroyed GPT 4 (see for yourself)
claude-3 claude-3-opus claude-3-sonnet claude-3-haiku gpt-4 anthropic amazon google claude-ai multimodality vision long-context model-alignment model-evaluation synthetic-data structured-output instruction-following model-speed cost-efficiency benchmarking safety mmitchell connor-leahy
Claude 3 from Anthropic launches in three sizes: Haiku (small, unreleased), Sonnet (medium, default on claude.ai, AWS, and GCP), and Opus (large, on Claude Pro). Opus outperforms GPT-4 on key benchmarks like GPQA, impressing benchmark authors. All models support multimodality with advanced vision capabilities, including converting a 2-hour video into a blog post. Claude 3 offers improved alignment, fewer refusals, and extended context length up to 1 million tokens with near-perfect recall. Haiku is noted for speed and cost-efficiency, processing dense research papers in under three seconds. The models excel at following complex instructions and producing structured outputs like JSON. Safety improvements reduce refusal rates, though some criticism remains from experts. Claude 3 is trained on synthetic data and shows strong domain-specific evaluation results in finance, medicine, and philosophy.
12/30/2023: Mega List of all LLMs
deita-v1.0 mixtral amazon-titan-text-express amazon-titan-text-lite nous-research hugging-face amazon mistral-ai local-attention computational-complexity benchmarking model-merging graded-modal-types function-calling data-contamination training-methods stella-biderman euclaise joey00072
Stella Biderman's tracking list of LLMs is highlighted, with resources shared for browsing. The Nous Research AI Discord discussed the Local Attention Flax module focusing on computational complexity, debating linear vs quadratic complexity and proposing chunking as a solution. Benchmark logs for various LLMs including Deita v1.0 with its SFT+DPO training method were shared. Discussions covered model merging, graded modal types, function calling in AI models, and data contamination issues in Mixtral. Community insights were sought on Amazon Titan Text Express and Amazon Titan Text Lite LLMs, including a unique training strategy involving bad datasets. Several GitHub repositories and projects like DRUGS, MathPile, CL-FoMo, and SplaTAM were referenced for performance and data quality evaluations.