All tags
Model: "gpt-4o-2024-08-06"
Too Cheap To Meter: AI prices cut 50-70% in last 30 days
gpt-4o gpt-4o-mini llama-3-1-405b mistral-large-2 gemini-1.5-flash deepseek-v2 sonnet-3.5 exaone-3.0 minicpm-v-2.6 claude-3.5 gpt-4o-2024-08-06 llamaindex together-ai deepinfra deepseek-ai mistral-ai google-deepmind lg-ai-research llamaindex llamaindex llamaindex price-cuts context-caching instruction-tuning vision benchmarks pytorch attention-mechanisms reinforcement-learning-from-human-feedback compute-optimal-scaling rohanpaul_ai akhaliq mervenoyann sophiamyang chhillee karpathy
Gemini 1.5 Flash has cut prices by approximately 70%, offering a highly competitive free tier of 1 million tokens per minute at $0.075/mtok, intensifying the AI model price war. Other significant price reductions include GPT-4o (~50% cut to $2.50/mtok), GPT-4o mini (70-98.5% cut to $0.15/mtok), Llama 3.1 405b (46% cut to $2.7/mtok), and Mistral Large 2 (62% cut to $3/mtok). Deepseek v2 introduced context caching, reducing input token costs by up to 90% to $0.014/mtok. New model releases include Llama 3.1 405b, Sonnet 3.5, EXAONE-3.0 (7.8B instruction-tuned by LG AI Research), and MiniCPM V 2.6 (vision-language model combining SigLIP 400M and Qwen2-7B). Benchmarks show Mistral Large performing well on ZebraLogic and Claude-3.5 leading LiveBench. FlexAttention, a new PyTorch API, simplifies and optimizes attention mechanisms. Andrej Karpathy analyzed RLHF, highlighting its limitations compared to traditional reinforcement learning. Google DeepMind research on compute-optimal scaling was also summarized.
not much happened today
gpt-4-0613 gpt-3.5-turbo-0613 gpt-4o-2024-08-06 mistral-large-2 gpt4-turbo claude-3-opus idefics3-llama bigllama-3.1-1t-instruct llama-3-120b-instruct openai mistral-ai meta-ai-fair structured-outputs function-calling json-schema benchmarking multimodality context-windows model-scaling ai-hardware vision speech-processing robotics ai-regulation sama rohanpaul_ai corbtt guillaumelample mervenoyann maximelabonne aidan_mclau adcock_brett ylecun
OpenAI introduced structured outputs in their API with a new "strict" mode and a "response_format" parameter, supporting models like gpt-4-0613, gpt-3.5-turbo-0613, and the new gpt-4o-2024-08-06. They also halved the price of gpt-4o to $2.50 per million tokens. Mistral Large 2 outperforms gpt4-turbo and claude-3-opus on hard benchmarks and coding tasks. Idefics3-Llama offers multimodal capabilities with a 10k token context window. BigLlama-3.1-1T-Instruct is an upscaled version of llama-3-120b-instruct. New benchmark "big_model_smell" measures creativity and reliability. Figure 02 robot features advanced AI hardware with onboard vision language model, enhanced battery, and speech-to-speech reasoning. Yann LeCun expressed concerns about California's SB1047 regulation.
GPT4o August + 100% Structured Outputs for All (GPT4o mini edition)
gpt-4o-mini gpt-4o-2024-08-06 llama-3 bigllama-3.1-1t-instruct meta-llama-3-120b-instruct gemma-2-2b stability-ai unsloth-ai google hugging-face lora controlnet line-art gpu-performance multi-gpu-support fine-tuning prompt-formatting cloud-computing text-to-image-generation model-integration
Stability.ai users are leveraging LoRA and ControlNet for enhanced line art and artistic style transformations, while facing challenges with AMD GPUs due to the discontinuation of ZLUDA. Community tensions persist around the r/stablediffusion subreddit moderation. Unsloth AI users report fine-tuning difficulties with LLaMA3 models, especially with PPO trainer integration and prompt formatting, alongside anticipation for multi-GPU support and cost-effective cloud computing on RunPod. Google released the lightweight Gemma 2 2B model optimized for on-device use with 2.6B parameters, featuring safety and sparse autoencoder tools, and announced Diffusers integration for efficient text-to-image generation on limited resources.
GPT4o August + 100% Structured Outputs for All (GPT4o August edition)
gpt-4o-2024-08-06 llama-3-1-405b llama-3 claude-3.5-sonnet gemini-1.5-pro gpt-4o yi-large-turbo openai meta-ai-fair google-deepmind yi-large nvidia groq langchain jamai langsmith structured-output context-windows model-pricing benchmarking parameter-efficient-expert-retrieval retrieval-augmented-generation mixture-of-experts model-performance ai-hardware model-deployment filtering multi-lingual vision john-carmack jonathan-ross rohanpaul_ai
OpenAI released the new gpt-4o-2024-08-06 model with 16k context window and 33-50% lower pricing than the previous 4o-May version, featuring a new Structured Output API that improves output quality and reduces retry costs. Meta AI launched Llama 3.1, a 405-billion parameter model surpassing GPT-4 and Claude 3.5 Sonnet on benchmarks, alongside expanding the Llama Impact Grant program. Google DeepMind quietly released Gemini 1.5 Pro, outperforming GPT-4o, Claude-3.5, and Llama 3.1 on LMSYS benchmarks and leading the Vision Leaderboard. Yi-Large Turbo was introduced as a cost-effective upgrade priced at $0.19 per million tokens. In hardware, NVIDIA H100 GPUs were highlighted by John Carmack for their massive AI workload power, and Groq announced plans to deploy 108,000 LPUs by Q1 2025. New AI tools and techniques include RAG (Retrieval-Augmented Generation), the JamAI Base platform for Mixture of Agents systems, and LangSmith's enhanced filtering capabilities. Google DeepMind also introduced PEER (Parameter Efficient Expert Retrieval) architecture.