Model: "gpt-5"
not much happened today
o3 o4-mini gpt-5 sonnet-3.7 gemma-3 qwen-2.5-vl gemini-2.5-pro gemma-7b llama-3-1-405b openai deepseek anthropic google meta-ai-fair inference-scaling reward-modeling coding-models ocr model-preview rate-limiting model-pricing architectural-advantage benchmarking long-form-reasoning attention-mechanisms mixture-of-experts gpu-throughput sama akhaliq nearcyan fchollet reach_vb philschmid teortaxestex epochairesearch omarsar0
OpenAI announced that o3 and o4-mini models will be released soon, with GPT-5 expected in a few months, delayed for quality improvements and capacity planning. DeepSeek introduced Self-Principled Critique Tuning (SPCT) to enhance inference-time scalability for generalist reward models. Anthropic's Sonnet 3.7 remains a top coding model. Google's Gemma 3 is available on KerasHub, and Qwen 2.5 VL powers a new Apache 2.0 licensed OCR model. Gemini 2.5 Pro entered public preview with higher rate limits and announced pricing, and has become a preferred model for many tasks other than image generation. Meta's architectural advantage and the FrontierMath benchmark challenge AI's long-form reasoning and worldview development. Research reveals LLMs focus attention on the first token as an "attention sink," preserving representation diversity, demonstrated in Gemma 7B and Llama 3.1 models. MegaScale-Infer offers efficient serving of large-scale Mixture-of-Experts models with up to 1.90x higher per-GPU throughput.
small news items
gpt-4.5 gpt-5 deepseek-r1-distilled-qwen-1.5b o1-preview modernbert-0.3b qwen-0.5b o3 openai ollama mistral perplexity cerebras alibaba groq bytedance math benchmarking fine-tuning model-performance reinforcement-learning model-architecture partnerships funding jeremyphoward arankomatsuzaki sama nrehiew_ danhendrycks akhaliq
OpenAI announced plans for GPT-4.5 (Orion) and GPT-5, with GPT-5 integrating the o3 model and offering unlimited chat access in the free tier. DeepSeek R1 Distilled Qwen 1.5B outperforms OpenAI's o1-preview on math benchmarks, while ModernBERT 0.3B surpasses Qwen 0.5B on MMLU without fine-tuning. Mistral and Perplexity adopt Cerebras hardware for 10x performance gains. OpenAI's o3 model won a gold medal at the 2024 International Olympiad in Informatics. Partnerships include Qwen with Groq. Significant RLHF activity is noted in Nigeria and the Global South, and ByteDance is expected to rise in AI prominence soon. "GPT5 is all you need."
not much happened today
deepseek-v3 llama-3-1-405b gpt-4o gpt-5 minimax-01 claude-3-haiku cosmos-nemotron-34b openai deep-learning-ai meta-ai-fair google-deepmind saama langchain nvidia mixture-of-experts coding math scaling visual-tokenizers diffusion-models inference-time-scaling retrieval-augmented-generation ai-export-restrictions security-vulnerabilities prompt-injection gpu-optimization fine-tuning personalized-medicine clinical-trials ai-agents persistent-memory akhaliq
DeepSeek-V3, a 671 billion parameter mixture-of-experts model, surpasses Llama 3.1 405B and GPT-4o in coding and math benchmarks. OpenAI announced the upcoming release of GPT-5 on April 27, 2023. MiniMax-01 Coder mode in ai-gradio enables building a chess game in one shot. Meta research highlights trade-offs in scaling visual tokenizers. Google DeepMind improves diffusion model quality via inference-time scaling. The RA-DIT method fine-tunes LLMs and retrievers for better RAG responses. The U.S. proposes a three-tier export restriction system on AI chips and models, excluding countries like China and Russia. Security vulnerabilities in AI chatbots involving CSRF and prompt injection were revealed. Concerns about superintelligence and weapons-grade AI models were expressed. ai-gradio updates include NVIDIA NIM compatibility and new models like cosmos-nemotron-34b. LangChain integrates with Claude 3 Haiku for AI agents with persistent memory. Triton warp specialization optimizes GPU usage for matrix multiplication. OpenBioLLM-8B and OpenBioLLM-70B, Saama's fine-tunes of Meta's Llama models, target personalized medicine and clinical trials.
Life after DPO (RewardBench)
gpt-3 gpt-4 gpt-5 gpt-6 llama-3-8b llama-3 claude-3 gemini x-ai openai mistral-ai anthropic cohere meta-ai-fair hugging-face nvidia reinforcement-learning-from-human-feedback direct-preference-optimization reward-models rewardbench language-model-history model-evaluation alignment-research preference-datasets personalization transformer-architecture nathan-lambert chris-manning elon-musk bindureddy rohanpaul_ai nearcyan
xAI raised $6 billion at a $24 billion valuation, positioning it among the most highly valued AI startups, with expectations to fund GPT-5 and GPT-6 class models. The RewardBench tool, developed by Nathan Lambert, evaluates reward models (RMs) for language models, showing Cohere's RMs outperforming open-source alternatives. The discussion highlights the evolution of language models from Claude Shannon's 1948 model to GPT-3 and beyond, emphasizing the role of RLHF (Reinforcement Learning from Human Feedback) and the newer DPO (Direct Preference Optimization) method. Notably, some reward-focused Llama 3 8B models are currently outperforming GPT-4, Cohere, Gemini, and Claude on the RewardBench leaderboard, raising questions about reward hacking. Future alignment research directions include improving preference datasets, DPO techniques, and personalization in language models. The report also compares xAI's valuation with those of OpenAI, Mistral AI, and Anthropic, noting speculation about xAI's spending on Nvidia hardware.
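For readers unfamiliar with DPO, the per-pair objective can be sketched in a few lines. This is a minimal illustration of the standard DPO loss, not code from the talk or from RewardBench; the function name, argument names, and the default `beta=0.1` are assumptions chosen for the example. Each argument is the summed log-probability a model assigns to a full response.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    # How much more the policy likes each response than the frozen reference does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow for large |m|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical log-probabilities for one preference pair.
print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

The loss falls below log 2 when the policy favors the chosen response more strongly than the reference does, and rises above it otherwise.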
Kolmogorov-Arnold Networks: MLP killers or just spicy MLPs?
gpt-5 gpt-4 dall-e-3 openai microsoft learnable-activations mlp function-approximation interpretability inductive-bias-injection b-splines model-rearrangement parameter-efficiency ai-generated-image-detection metadata-standards large-model-training max-tegmark ziming-liu bindureddy nptacek zacharynado rohanpaul_ai svpino
Ziming Liu, a grad student of Max Tegmark, published a paper on Kolmogorov-Arnold Networks (KANs), claiming they outperform MLPs in interpretability, inductive bias injection, function approximation accuracy, and scaling, and are 100x more parameter efficient, though 10x slower to train. KANs use learnable activation functions modeled by B-splines on edges rather than fixed activations on nodes. However, it was later shown that KANs can be mathematically rearranged back into MLPs with similar parameter counts, sparking debate on their interpretability and novelty. Meanwhile, on AI Twitter, there is speculation about a potential GPT-5 release with mixed impressions, OpenAI's adoption of the C2PA metadata standard for detecting AI-generated images with high accuracy for DALL-E 3, and Microsoft training a 500B-parameter model called MAI-1, potentially to be previewed at the Build conference, signaling increased competition with OpenAI. "OpenAI's safety testing for GPT-4.5 couldn't finish in time for Google I/O launch" was also noted.
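The "learnable activations on edges" idea can be sketched as follows. This is a simplified illustration, not the paper's implementation: it swaps in degree-1 B-splines (piecewise-linear hat functions) on a uniform grid, whereas the paper uses higher-order B-splines plus a base activation term. The class name `KANLayer` and all parameters are hypothetical.

```python
import random

def hat_basis(x, grid):
    """Degree-1 B-spline (hat function) values at x for each knot of a uniform grid."""
    h = grid[1] - grid[0]
    return [max(0.0, 1.0 - abs(x - g) / h) for g in grid]

class KANLayer:
    """out[j] = sum_i phi[j][i](x[i]); each phi is a learnable spline living on an edge."""

    def __init__(self, n_in, n_out, grid_size=5, lo=-1.0, hi=1.0, seed=0):
        rng = random.Random(seed)
        step = (hi - lo) / (grid_size - 1)
        self.grid = [lo + k * step for k in range(grid_size)]
        # One coefficient vector per edge (input i -> output j): the learnable parameters.
        self.coef = [[[rng.uniform(-0.1, 0.1) for _ in range(grid_size)]
                      for _ in range(n_in)] for _ in range(n_out)]

    def edge(self, j, i, x):
        # Univariate function on edge (i -> j); it evaluates to 0 outside [lo, hi].
        basis = hat_basis(x, self.grid)
        return sum(c * b for c, b in zip(self.coef[j][i], basis))

    def forward(self, xs):
        # Nodes only sum their incoming edges; there is no fixed nonlinearity on nodes.
        return [sum(self.edge(j, i, x) for i, x in enumerate(xs))
                for j in range(len(self.coef))]

layer = KANLayer(n_in=2, n_out=3)
print(layer.forward([0.2, -0.5]))
```

In an MLP the same edge would carry a single scalar weight; here it carries `grid_size` coefficients that define a whole function. Training those coefficients is omitted from the sketch.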
Evals: The Next Generation
gpt-4 gpt-5 gpt-3.5 phi-3 mistral-7b llama-3 scale-ai mistral-ai reka-ai openai moderna sanctuary-ai microsoft mit meta-ai-fair benchmarking data-contamination multimodality fine-tuning ai-regulation ai-safety ai-weapons neural-networks model-architecture model-training model-performance robotics activation-functions long-context sam-altman jim-fan
Scale AI highlighted issues with data contamination in benchmarks like MMLU and GSM8K, proposing a new benchmark where Mistral overfits and Phi-3 performs well. Reka released the VibeEval benchmark for multimodal models, addressing the limitations of multiple-choice benchmarks. Sam Altman of OpenAI described GPT-4 as "dumb" and hinted at GPT-5 with AI agents as a major breakthrough. Researchers jailbroke GPT-3.5 via fine-tuning. Global calls emerged to ban AI-powered weapons, with US officials urging human control over nuclear arms. Ukraine launched an AI consular avatar, while Moderna partnered with OpenAI for medical AI advancements. Sanctuary AI and Microsoft are collaborating on AI for general-purpose robots. MIT introduced Kolmogorov-Arnold networks with improved neural network efficiency. Meta AI is training Llama 3 models with over 400 billion parameters, featuring multimodality and longer context.
Multi-modal, Multi-Aspect, Multi-Form-Factor AI
gpt-4 idefics-2-8b mistral-instruct apple-mlx gpt-5 reka-ai cohere google rewind apple mistral-ai microsoft paypal multimodality foundation-models embedding-models gpu-performance model-comparison enterprise-data open-source performance-optimization job-impact agi-criticism technical-report arthur-mensch dan-schulman chris-bishop
Between April 12 and 15, Reka launched Reka Core, a new GPT-4-class multimodal foundation model, with a detailed technical report described as "full Shazeer." Cohere introduced Compass, a foundation embedding model for indexing and searching multi-aspect enterprise data like emails and invoices. The open-source IDEFICS 2-8B model continues the reproduction of Google's Flamingo multimodal model. Rewind pivoted to a multi-platform app called Limitless, moving away from spyware. Reddit discussions highlighted Apple MLX outperforming Ollama and Mistral Instruct on M2 Ultra GPUs, GPU choices for LLMs and Stable Diffusion, and AI-human comparisons by Microsoft Research's Chris Bishop. Former PayPal CEO Dan Schulman predicted GPT-5 will drastically reduce job scopes by 80%. Mistral CEO Arthur Mensch criticized the obsession with AGI as "creating God."
World_sim.exe
gpt-4 gpt-4o grok-1 llama-cpp claude-3-opus claude-3 gpt-5 nvidia nous-research stability-ai hugging-face langchain anthropic openai multimodality foundation-models hardware-optimization model-quantization float4 float6 retrieval-augmented-generation text-to-video prompt-engineering long-form-rag gpu-optimization philosophy-of-ai agi-predictions jensen-huang yann-lecun sam-altman
NVIDIA announced Project GR00T, a foundation model for humanoid robot learning using multimodal instructions, built on their tech stack including Isaac Lab, OSMO, and Jetson Thor. They revealed the DGX Grace-Blackwell GB200 with over 1 exaflop of compute, capable of training a 1.8T-parameter GPT-4 in 90 days on 2,000 Blackwell GPUs. Jensen Huang confirmed GPT-4 has 1.8 trillion parameters. The new GB200 GPU supports float4/6 precision with ~3 bits per parameter and achieves 40,000 TFLOPS on fp4 with 2x sparsity.
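To put the precision figures in perspective, a back-of-the-envelope calculation shows how bit width changes the storage needed for 1.8 trillion weights. This is an illustrative sketch only: it counts weights alone (no activations, optimizer state, or KV cache), treats 1 GB as 1e9 bytes, and the helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Storage for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

GPT4_PARAMS = 1.8e12  # parameter count quoted in the summary above

# Bit widths to compare; the 3-bit row mirrors the "~3 bits per parameter" remark.
for label, bits in [("fp16", 16), ("fp6", 6), ("fp4", 4), ("~3 bits", 3)]:
    print(f"{label:>8}: {weight_memory_gb(GPT4_PARAMS, bits):,.0f} GB")
```

Under these assumptions the weights take 3,600 GB at fp16 and 900 GB at fp4, a 4x reduction.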
Open source highlights include the release of Grok-1, a 314B parameter model, and Stability AI's SV3D, an open-source text-to-video generation solution. Nous Research collaborated on implementing Steering Vectors in llama.cpp.
In Retrieval Augmented Generation (RAG), a new 5.5-hour tutorial builds a pipeline using open-source HF models, and LangChain released a video on query routing and announced integration with NVIDIA NIM for GPU-optimized LLM inference.
Prominent opinions include Yann LeCun distinguishing language from other cognitive abilities, Sam Altman predicting AGI arrival in 6 years with a leap from GPT-4 to GPT-5 comparable to GPT-3 to GPT-4, and discussions on the philosophical status of LLMs like Claude. There is also advice against training models from scratch for most companies.
Sama says: GPT-5 soon
gpt-5 mixtral-7b gpt-3.5 gemini-pro gpt-4 llama-cpp openai codium thebloke amd hugging-face mixture-of-experts fine-tuning model-merging 8-bit-optimization gpu-acceleration performance-comparison command-line-ai vector-stores embeddings coding-capabilities sam-altman ilya-sutskever itamar andrej-karpathy
Sam Altman at Davos said that his top priority is launching the new model, likely called GPT-5, while expressing uncertainty about Ilya Sutskever's employment status. Itamar from Codium introduced the concept of Flow Engineering with AlphaCodium, gaining attention from Andrej Karpathy. On the TheBloke Discord, engineers discussed a multi-specialty mixture-of-experts (MoE) model combining seven distinct 7 billion parameter models specialized in law, finance, and medicine. Debates on 8-bit fine-tuning and the use of bitsandbytes with GPU support were prominent. Discussions also covered model merging using tools like Mergekit and compatibility with the Alpaca format. Interest in optimizing AI models on AMD hardware using the AOCL BLAS and LAPACK libraries with llama.cpp was noted. Users experimented with AI for command line tasks, and the Mixtral MoE model was refined to surpass larger models in coding ability. Comparisons among LLMs such as GPT-3.5, Mixtral, Gemini Pro, and GPT-4 focused on knowledge depth, problem-solving, and speed, especially for coding tasks.
12/16/2023: ByteDance suspended by OpenAI
claude-2.1 gpt-4-turbo gemini-1.5-pro gpt-5 gpt-4.5 gpt-4 openai google-deepmind anthropic hardware gpu api-costs coding model-comparison subscription-issues payment-processing feature-confidentiality ai-art-generation organizational-productivity model-speculation
The OpenAI Discord community discussed hardware options like Mac racks and the A6000 GPU, highlighting their value for AI workloads. They compared Claude 2.1 and GPT-4 Turbo on coding tasks, with GPT-4 Turbo outperforming Claude 2.1. The benefits of the Bard API for Gemini Pro were noted, including a free quota of 60 queries per minute. Users shared experiences with ChatGPT Plus membership issues and payment problems, and speculated about the upcoming GPT-5 and the rumored GPT-4.5. Discussions also covered the confidentiality of the Alpha feature, AI art generation policies, and improvements in organizational work features. The community expressed mixed feelings about GPT-4's performance and awaited future model updates.
12/7/2023: Anthropic says "skill issue"
claude-2.1 gpt-4 gpt-3.5 gemini-pro gemini-ultra gpt-4.5 chatgpt bingchat dall-e gpt-5 anthropic openai google prompt-engineering model-performance regulation language-model-performance image-generation audio-processing midi-sequence-analysis subscription-issues network-errors
Anthropic fixed a glitch in their Claude 2.1 model's needle-in-a-haystack test by adding a prompt. Discussions on OpenAI's Discord compared Google's Gemini Pro and Gemini Ultra models with OpenAI's GPT-4 and GPT-3.5, with some users finding GPT-4 superior in benchmarks. Rumors about a GPT-4.5 release circulated without official confirmation. Concerns were raised about "selective censorship" affecting language model performance. The EU's potential regulation of AI, including ChatGPT, was highlighted. Users reported issues with ChatGPT Plus message limits and subscription upgrades, and shared experiences with BingChat and DALL-E. The community discussed prompt engineering techniques and future applications like image generation and MIDI sequence analysis, expressing hopes for GPT-5.