Model: "stable-diffusion-3"

Aug 15, 2024

Grok 2! and ChatGPT-4o-latest confuses everybody

gpt-4o grok-2 claude-3.5-sonnet flux-1 stable-diffusion-3 gemini-advanced openai x-ai black-forest-labs google-deepmind benchmarking model-performance tokenization security-vulnerabilities multi-agent-systems research-automation text-to-image conversational-ai model-integration ylecun rohanpaul_ai karpathy

OpenAI quietly released a new GPT-4o model in ChatGPT, distinct from the API version, reclaiming the #1 spot on Lmsys arena benchmarks across multiple categories including math, coding, and instruction-following. Meanwhile, X.ai launched Grok 2, outperforming Claude 3.5 Sonnet and previous GPT-4o versions, with plans for enterprise API release. Grok 2 integrates Black Forest Labs' Flux.1, an open-source text-to-image model surpassing Stable Diffusion 3. Google DeepMind announced Gemini Advanced with enhanced conversational features and Pixel device integration. AI researcher ylecun highlighted LLM limitations in learning and creativity, while rohanpaul_ai discussed an AI Scientist system generating publishable ML research at low cost. karpathy warned of security risks in LLM tokenizers akin to SQL injection.

Apr 26, 2024

Snowflake Arctic: Fully Open 10B+128x4B Dense-MoE Hybrid LLM

snowflake-arctic phi-3 llama-3-70b llama-3 stable-diffusion-3 sd3-turbo gpt-3.5-turbo snowflake databricks deepseek deepspeed nvidia stable-diffusion adobe apple llamaindex lmsys openai mixture-of-experts curriculum-learning model-release image-generation video-upscaling quantization inference-speed benchmarking model-comparison open-source on-device-ai

Snowflake Arctic is a notable new foundation language model released under Apache 2.0, claiming superiority over Databricks in data warehouse AI applications and adopting a mixture-of-experts architecture inspired by DeepSeekMOE and DeepSpeedMOE. The model employs a 3-stage curriculum training strategy similar to the recent Phi-3 paper. In AI image and video generation, Nvidia introduced the Align Your Steps technique improving image quality at low step counts, while Stable Diffusion 3 and SD3 Turbo models were compared for prompt understanding and image quality. Adobe launched an AI video upscaling project enhancing blurry videos to HD, though with some high-resolution artifacts. Apple released open-source on-device language models with code and training logs, diverging from typical weight-only releases. The Llama-3-70b model ties for first place on the LMSYS leaderboard for English queries, and Phi-3 (4B params) outperforms GPT-3.5 Turbo in the banana logic benchmark. Fast inference and quantization of Llama 3 models were demonstrated on MacBook devices.

Apr 20, 2024

Llama-3-70b is GPT-4-level Open Model

llama-3-70b llama-3-8b llama-3 llama-2-70b mistral-7b grok-3 stable-diffusion-3 vasa-1 meta-ai-fair groq nvidia amazon microsoft benchmarking model-performance fine-tuning function-calling arithmetic image-generation video-generation energy-usage gpu-demand political-bias ai-safety scaling context-windows tokenization elon-musk

Meta has released Llama 3, their most capable open large language model with 8B and 70B parameter versions supporting 8K context length and outperforming previous models including Llama 2 and Mistral 7B. Groq serves the Llama 3 70B model at 500-800 tokens/second, making it the fastest GPT-4-level token source. Discussions highlight AI scaling challenges with Elon Musk stating that training Grok 3 will require 100,000 Nvidia H100 GPUs, and AWS planning to acquire 20,000 B200 GPUs for a 27 trillion parameter model. Microsoft unveiled VASA-1 for lifelike talking face generation, while Stable Diffusion 3 and its extensions received mixed impressions. Concerns about AI energy usage and political bias in AI were also discussed.

Apr 19, 2024

Meta Llama 3 (8B, 70B)

llama-3-8b llama-3-70b llama-3-400b stable-diffusion-3 mixtral-8x22b-instruct-v0.1 vasa-1 meta-ai-fair stability-ai boston-dynamics microsoft mistral-ai hugging-face transformer tokenization model-training benchmarking robotics natural-language-processing real-time-processing synthetic-data dataset-cleaning behavior-trees ai-safety model-accuracy api model-release humor helen-toner

Meta partially released Llama 3 models including 8B and 70B variants, with a 400B variant still in training, touted as the first GPT-4 level open-source model. Stability AI launched Stable Diffusion 3 API with model weights coming soon, showing competitive realism against Midjourney V6. Boston Dynamics unveiled an electric humanoid robot Atlas, and Microsoft introduced the VASA-1 model generating lifelike talking faces at 40fps on RTX 4090. Mistral AI, a European OpenAI rival, is seeking $5B funding with its Mixtral-8x22B-Instruct-v0.1 model achieving 100% accuracy on 64K context benchmarks. AI safety discussions include calls from former OpenAI board member Helen Toner for audits of top AI companies, and the Mormon Church released AI usage principles. New AI development tools include Ctrl-Adapter for diffusion models, Distilabel 1.0.0 for synthetic dataset pipelines, Data Bonsai for data cleaning with LLMs, and Dendron for building LLM agents with behavior trees. Memes highlight AI development humor and cultural references. The release of Llama 3 models features improved reasoning, a 128K token vocabulary, 8K token sequences, and grouped query attention.

Mar 21, 2024

Shipping and Dipping: Inflection + Stability edition

inflection-ai-2.5 stable-diffusion-3 claude-3-haiku claude-3-sonnet claude-3-opus tacticai inflection-ai stability-ai microsoft nvidia google-deepmind anthropic executive-departures gpu-acceleration ai-assistants geometric-deep-learning ai-integration ai-cost-reduction ai-job-displacement ai-healthcare model-release mustafa-suleyman

Inflection AI and Stability AI recently shipped major updates (Inflection AI 2.5 and Stable Diffusion 3) but are now experiencing significant executive departures, signaling potential consolidation in the GPU-rich startup space. Mustafa Suleyman has joined Microsoft AI as CEO, overseeing consumer AI products like Copilot, Bing, and Edge. Microsoft Azure is collaborating with NVIDIA on the Grace Blackwell 200 Superchip. Google DeepMind announced TacticAI, an AI assistant for football tactics developed with Liverpool FC, using geometric deep learning and achieving 90% expert approval in blind tests. Anthropic released Claude 3 Haiku and Claude 3 Sonnet on Google Cloud's Vertex AI, with Claude 3 Opus coming soon. Concerns about AI job displacement arise as NVIDIA introduces AI nurses that outperform humans at bedside manner at 90% lower cost.

Mar 05, 2024

Stable Diffusion 3 — Rombach & Esser did it again!

stable-diffusion-3 claude-3 orca dolphincoder-starcoder2-15b stability-ai anthropic microsoft latitude perplexity-ai llamaindex tripo-ai diffusion-models multimodality benchmarking human-evaluation text-generation image-generation 3d-modeling fine-tuning roleplay coding dataset-release soumith-chintala bill-peebles swyx kevinafischer jeremyphoward akhaliq karinanguyen_ aravsrinivas

Over 2500 new community members joined following Soumith Chintala's shoutout, highlighting growing interest in SOTA LLM-based summarization. The major highlight is the detailed paper release of Stable Diffusion 3 (SD3), showcasing advanced text-in-image control and complex prompt handling, with the model outperforming other SOTA image generation models in human-evaluated benchmarks. The SD3 model is based on an enhanced Diffusion Transformer architecture called MMDiT. Meanwhile, Anthropic released Claude 3 models, noted for human-like responses and emotional depth, scoring 79.88% on HumanEval but costing over twice as much as GPT-4. Microsoft launched new Orca-based models and datasets, and Latitude released DolphinCoder-StarCoder2-15b with strong coding capabilities. Integration of image models by Perplexity AI and 3D CAD generation by PolySpectra powered by LlamaIndex were also highlighted. "SD3's win rate beats all other SOTA image gen models (except perhaps Ideogram)" and "Claude 3 models are very good at generating d3 visualizations from text descriptions."