All tags
Model: "gpt-4-turbo"
Pixtral 12B: Mistral beats Llama to Multimodality
pixtral-12b mistral-nemo-12b llama-3-1-70b llama-3-1-8b deeps-eek-v2-5 gpt-4-turbo llama-3-1 strawberry claude mistral-ai meta-ai-fair hugging-face arcee-ai deepseek-ai openai anthropic vision multimodality ocr benchmarking model-release model-architecture model-performance fine-tuning model-deployment reasoning code-generation api access-control reach_vb devendra_chapilot _philschmid rohanpaul_ai
Mistral AI released Pixtral 12B, an open-weights vision-language model with a Mistral Nemo 12B text backbone and a 400M vision adapter, featuring a large vocabulary of 131,072 tokens and support for 1024x1024 pixel images. This release notably beat Meta AI in launching an open multimodal model. At the Mistral AI Summit, architecture details and benchmark performances were shared, showing strong OCR and screen understanding capabilities. Additionally, Arcee AI announced SuperNova, a distilled Llama 3.1 70B & 8B model outperforming Meta's Llama 3.1 70B instruct on benchmarks. DeepSeek released DeepSeek-V2.5, scoring 89 on HumanEval, surpassing GPT-4-Turbo, Opus, and Llama 3.1 in coding tasks. OpenAI plans to release Strawberry as part of ChatGPT soon, though its capabilities are debated. Anthropic introduced Workspaces for managing multiple Claude deployments with enhanced access controls.
not much happened today
llama-3 llama-3-1 grok-2 claude-3.5-sonnet gpt-4-turbo nous-research nvidia salesforce goodfire-ai anthropic x-ai google-deepmind box langchain fine-tuning prompt-caching mechanistic-interpretability model-performance multimodality agent-frameworks software-engineering-agents api document-processing text-generation model-releases vision image-generation efficiency scientific-discovery fchollet demis-hassabis
GPT-5 delayed again amid a quiet news day. Nous Research released Hermes 3 finetune of Llama 3 base models, rivaling FAIR's instruct tunes but sparking debate over emergent existential crisis behavior with 6% roleplay data. Nvidia introduced Minitron finetune of Llama 3.1. Salesforce launched a DEI agent scoring 55% on SWE-Bench Lite. Goodfire AI secured $7M seed funding for mechanistic interpretability work. Anthropic rolled out prompt caching in their API, cutting input costs by up to 90% and latency by 80%, aiding coding assistants and large document processing. xAI released Grok-2, matching Claude 3.5 Sonnet and GPT-4 Turbo on LMSYS leaderboard with vision+text inputs and image generation integration. Claude 3.5 Sonnet reportedly outperforms GPT-4 in coding and reasoning. François Chollet defined intelligence as efficient operationalization of past info for future tasks. Salesforce's DEI framework surpasses individual agent performance. Google DeepMind's Demis Hassabis discussed AGI's role in scientific discovery and safe AI development. Dora AI plugin generates landing pages in under 60 seconds, boosting web team efficiency. Box AI API beta enables document chat, data extraction, and content summarization. LangChain updated Python & JavaScript integration docs.
There's Ilya!
chameleon-7b chameleon-34b deepseek-coder-v2 gpt-4-turbo claude-3-opus voco-llama safe-superintelligence-inc openai anthropic meta deepseek google-deepmind parallel-decoding code-generation quantization training-dynamics vision benchmarks datasets image-captioning reasoning memory-optimization ilya-sutskever jan-leike ylecun akhaliq philschmid rohanpaul_ai mervenoyann fchollet
Ilya Sutskever has co-founded Safe Superintelligence Inc shortly after leaving OpenAI, while Jan Leike moved to Anthropic. Meta released new models including Chameleon 7B and 34B with mixed-modal input and unified token space quantization. DeepSeek-Coder-V2 shows code capabilities comparable to GPT-4 Turbo, supporting 338 programming languages and 128K context length. Consistency Large Language Models (CLLMs) enable parallel decoding generating multiple tokens per step. Grokked Transformers demonstrate reasoning through training dynamics affecting memory formation and generalization. VoCo-LLaMA compresses vision tokens with LLMs improving video temporal correlation understanding. The BigCodeBench benchmark evaluates LLMs on 1,140 coding tasks across 139 Python libraries, topped by DeepSeek-Coder-V2 and Claude 3 Opus. PixelProse is a large 16M image-caption dataset with reduced toxicity.
Gemini launches context caching... or does it?
nemotron llama-3-70b chameleon-7b chameleon-34b gemini-1.5-pro deepseek-coder-v2 gpt-4-turbo claude-3-opus gemini-1.5-pro nvidia meta-ai-fair google deepseek hugging-face context-caching model-performance fine-tuning reinforcement-learning group-relative-policy-optimization large-context model-training coding model-release rohanpaul_ai _philschmid aman-sanger
Nvidia's Nemotron ranks #1 open model on LMsys and #11 overall, surpassing Llama-3-70b. Meta AI released Chameleon 7B/34B models after further post-training. Google's Gemini introduced context caching, offering a cost-efficient middle ground between RAG and finetuning, with a minimum input token count of 33k and no upper limit on cache duration. DeepSeek launched DeepSeek-Coder-V2, a 236B parameter model outperforming GPT-4 Turbo, Claude-3-Opus, and Gemini-1.5-Pro in coding tasks, supporting 338 programming languages and extending context length to 128K. It was trained on 6 trillion tokens using the Group Relative Policy Optimization (GRPO) algorithm and is available on Hugging Face with a commercial license. These developments highlight advances in model performance, context caching, and large-scale coding models.
Cursor reaches >1000 tok/s finetuning Llama3-70b for fast file editing
gpt-4 gpt-4o gpt-4-turbo gpt-4o-mini llama bloom stable-diffusion cursor openai anthropic google-deepmind huggingface speculative-decoding code-edits multimodality image-generation streaming tool-use fine-tuning benchmarking mmlu model-performance evaluation synthetic-data context-windows sama abacaj imjaredz erhartford alexalbert svpino maximelabonne _philschmid
Cursor, an AI-native IDE, announced a speculative edits algorithm for code editing that surpasses GPT-4 and GPT-4o in accuracy and latency, achieving speeds of over 1000 tokens/s on a 70b model. OpenAI released GPT-4o with multimodal capabilities including audio, vision, and text, noted to be 2x faster and 50% cheaper than GPT-4 turbo, though with mixed coding performance. Anthropic introduced streaming, forced tool use, and vision features for developers. Google DeepMind unveiled Imagen Video and Gemini 1.5 Flash, a small model with a 1M-context window. HuggingFace is distributing $10M in free GPUs for open-source AI models like Llama, BLOOM, and Stable Diffusion. Evaluation insights highlight challenges with LLMs on novel problems and benchmark saturation, with new benchmarks like MMLU-Pro showing significant drops in top model performance.
GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4O version)
gpt-4o gpt-4-turbo openai lmsys multion adept multimodality vision speech-recognition tokenization real-time-processing coding model-performance model-optimization desktop-agents sama gdb
OpenAI has released GPT-4o, a new multimodal model capable of reasoning across text, audio, and video in real time with low latency (~300ms). It features voice and vision capabilities, improved non-English language performance with an expanded 200k vocabulary tokenizer, and is available to all ChatGPT users including free plans. GPT-4o is half the price and twice as fast as GPT-4-turbo with 5x rate limits. The model supports real-time voice and video input/output and shows strong coding capabilities. The release includes a new desktop app that can read screen and clipboard history, challenging existing desktop agent startups. The announcement was accompanied by demos including image generation and 3D object handling, with OpenAI achieving state-of-the-art performance in ASR and vision tasks. The update was widely discussed on social media, with comparisons to GPT-4T highlighting GPT-4o's speed and versatility. "GPT-4o is smart, fast, natively multimodal, and a step towards more natural human-computer interaction" and "extremely versatile and fun to play with".
Not much happened today
command-r-35b goliath-120 miqu-120 llama-3-8b tensorrt-llm llama-cpp gpt2-chat gpt-4-turbo llama-3 deepmind-alphazero anthropic openai perplexity-ai amazon apple microsoft deepmind creative-writing context-windows benchmarking model-performance self-learning function-calling retrieval-augmented-generation ai-assistants on-device-ai ai-lobbying copyright-infringement code-reasoning image-generation
Anthropic released a team plan and iOS app about 4 months after OpenAI. The Command-R 35B model excels at creative writing, outperforming larger models like Goliath-120 and Miqu-120. The Llama-3 8B model now supports a 1 million token context window, improving long-context understanding with minimal training on a single 8xA800 GPU machine. TensorRT-LLM benchmarks show it is 30-70% faster than llama.cpp on consumer hardware. A benchmark suggests GPT2-Chat may have better reasoning than GPT-4-Turbo, though results are debated. Demos include a self-learning Llama-3 voice agent running locally on Jetson Orin and a Self-Learning Large Action Model (LAM). Amazon CodeWhisperer was renamed to Q Developer, expanding its generative AI assistant capabilities. Apple plans an AI-enabled Safari browser with an on-device LLM in iOS 18 and macOS 15. Big Tech dominates AI lobbying in Washington, while major U.S. newspapers sued OpenAI and Microsoft for copyright infringement. DeepMind's AlphaZero became the greatest chess player in 9 hours, and their Naturalized Execution Tuning (NExT) method improves LLM code reasoning by 14-26%. Stable Diffusion is used for diverse image generation applications.
OpenAI's Instruction Hierarchy for the LLM OS
phi-3-mini openelm claude-3-opus gpt-4-turbo gpt-3.5-turbo llama-3-70b rho-1 mistral-7b llama-3-8b llama-3 openai microsoft apple deepseek mistral-ai llamaindex wendys prompt-injection alignment benchmarking instruction-following context-windows model-training model-deployment inference performance-optimization ai-application career-advice drive-thru-ai
OpenAI published a paper introducing the concept of privilege levels for LLMs to address prompt injection vulnerabilities, improving defenses by 20-30%. Microsoft released the lightweight Phi-3-mini model with 4K and 128K context lengths. Apple open-sourced the OpenELM language model family with an open training and inference framework. An instruction accuracy benchmark compared 12 models, with Claude 3 Opus, GPT-4 Turbo, and Llama 3 70B performing best. The Rho-1 method enables training state-of-the-art models using only 3% of tokens, boosting models like Mistral. Wendy's deployed AI-powered drive-thru ordering, and a study found Gen Z workers prefer generative AI for career advice. Tutorials on deploying Llama 3 models on AWS EC2 highlight hardware requirements and inference server use.
Zero to GPT in 1 Year
gpt-4-turbo claude-3-opus mixtral-8x22b zephyr-141b medical-mt5 openai anthropic mistral-ai langchain hugging-face fine-tuning multilinguality tool-integration transformers model-evaluation open-source-models multimodal-llms natural-language-processing ocr model-training vik-paruchuri sam-altman greg-brockman miranda-murati abacaj mbusigin akhaliq clementdelangue
GPT-4 Turbo reclaimed the top leaderboard spot with significant improvements in coding, multilingual, and English-only tasks, now rolled out in paid ChatGPT. Despite this, Claude Opus remains superior in creativity and intelligence. Mistral AI released powerful open-source models like Mixtral-8x22B and Zephyr 141B suited for fine-tuning. LangChain enhanced tool integration across models, and Hugging Face introduced Transformer.js for running transformers in browsers. Medical domain-focused Medical mT5 was shared as an open-source multilingual text-to-text model. The community also highlighted research on LLMs as regressors and shared practical advice on OCR/PDF data modeling from Vik Paruchuri's journey.
Gemini Pro and GPT4T Vision go GA on the same day by complete coincidence
gemini-1.5-pro gpt-4-turbo llama-3 orca-2.5-7b functionary-v2.4 cosxl google openai meta-ai-fair hugging-face cohere million-token-context-window audio-processing file-api text-embedding function-calling reasoning direct-nash-optimization contrastive-learning code-interpreter diffusion-models neural-odes inference-speed multilingual-dataset image-editing no-code-development
At Google Cloud Next, Gemini 1.5 Pro was released with a million-token context window, available in 180+ countries, featuring 9.5 hours of audio understanding, a new File API for nearly unlimited free uploads, and the Gecko-1b-256/768 embedding model. GPT-4 Turbo with Vision became generally available in the API with a major update improving reasoning capabilities. Meta Platforms plans to launch smaller versions of Llama 3 next week. The Orca 2.5 7B model using Direct Nash Optimization outperforms older GPT-4 versions in AlpacaEval. New releases include Functionary-V2.4 with enhanced function calling and code interpretation, and CosXL models for image editing. Research highlights include continuous U-Nets for diffusion models achieving up to 80% faster inference and a massive multilingual dataset with ~5.6 trillion word tokens. Creative applications include a no-code touch screen game made with Gemini 1.5 and AI-generated novel trailers.
Mistral Large disappoints
mistral-large mistral-small mixtral-8x7b gpt-4-turbo dreamgen-opus-v1 mistral-ai openai hugging-face benchmarking model-merging fine-tuning reinforcement-learning model-training tokenization model-optimization ai-assisted-decompilation performance cost-efficiency deception roleplay deep-speed dpo timotheeee1 cogbuji plasmator jsarnecki maldevide spottyluck mrjackspade
Mistral announced Mistral Large, a new language model achieving 81.2% accuracy on MMLU, trailing GPT-4 Turbo by about 5 percentage points on benchmarks. The community reception has been mixed, with skepticism about open sourcing and claims that Mistral Small outperforms the open Mixtral 8x7B. Discussions in the TheBloke Discord highlighted performance and cost-efficiency comparisons between Mistral Large and GPT-4 Turbo, technical challenges with DeepSpeed and DPOTrainer for training, advances in AI deception for roleplay characters using DreamGen Opus V1, and complexities in model merging using linear interpolation and PEFT methods. Enthusiasm for AI-assisted decompilation was also expressed, emphasizing the use of open-source projects for training data.
GPT4Turbo A/B Test: gpt-4-0125-preview
gpt-4-turbo gpt-4-1106-preview gpt-3.5 llama-2-7b-chat tiny-llama mistral openai thebloke nous-research hugging-face multi-gpu-support model-optimization model-merging fine-tuning context-windows chatbot-personas api-performance text-transcription cost-considerations model-troubleshooting
OpenAI released a new GPT-4 Turbo version in January 2024, prompting natural experiments in summarization and discussions on API performance and cost trade-offs. The TheBloke Discord highlighted UnSloth's upcoming limited multi-GPU support for Google Colab beginners, AI models like Tiny Llama and Mistral running on Nintendo Switch, and advanced model merging techniques such as DARE and SLERP. The OpenAI Discord noted issues with GPT-4-1106-preview processing delays, troubleshooting GPT model errors, and transcription challenges with GPT-3.5 and GPT-4 Turbo. Nous Research AI focused on extending context windows, notably LLaMA-2-7B-Chat reaching 16,384 tokens, and fine-tuning alternatives like SelfExtend. Discussions also touched on chatbot persona creation, model configuration optimizations, and societal impacts of AI technology.
GPT4Turbo A/B Test: gpt-4-1106-preview
gpt-4-turbo gpt-4 gpt-3.5 openhermes-2.5-mistral-7b-4.0bpw exllamav2 llama-2-7b-chat mistral-instruct-v0.2 mistrallite llama2 openai huggingface thebloke nous-research mistral-ai langchain microsoft azure model-loading rhel dataset-generation llm-on-consoles fine-tuning speed-optimization api-performance prompt-engineering token-limits memory-constraints text-generation nlp-tools context-window-extension sliding-windows rope-theta non-finetuning-context-extension societal-impact
OpenAI released a new GPT-4 Turbo version, prompting a natural experiment in summarization comparing the November 2023 and January 2024 versions. The TheBloke Discord discussed troubleshooting model loading errors with OpenHermes-2.5-Mistral-7B-4.0bpw and exllamav2, debates on RHEL in ML, dataset generation for understanding GPT flaws, and running LLMs like Llama and Mistral on consoles. LangChain fine-tuning challenges for Llama2 were also noted. The OpenAI Discord highlighted GPT-4 speed inconsistencies, API vs web performance, prompt engineering with GPT-3.5 and GPT-4 Turbo, and DALL-E typo issues in image text. Discussions included NLP tools like semantic-text-splitter and collaboration concerns with GPT-4 Vision on Azure. The Nous Research AI Discord focused on extending context windows with Mistral instruct v0.2, MistralLite, and LLaMA-2-7B-Chat achieving 16,384 token context, plus alternatives like SelfExtend for context extension without fine-tuning. The societal impact of AI technology was also considered.
1/11/2024: Mixing Experts vs Merging Models
gpt-4-turbo gpt-4-0613 mixtral deepseekmoe phixtral deepseek-ai hugging-face nous-research teenage-engineering discord mixture-of-experts model-merging fine-tuning rag security discord-tos model-performance prompt-engineering function-calling semantic-analysis data-frameworks ash_prabaker shacrw teknium 0xevil everyoneisgross ldj pramod8481 mgreg_42266 georgejrjrjr kenakafrosty
18 guilds, 277 channels, and 1342 messages were analyzed with an estimated reading time saved of 187 minutes. The community switched to GPT-4 turbo and discussed the rise of Mixture of Experts (MoE) models like Mixtral, DeepSeekMOE, and Phixtral. Model merging techniques, including naive linear interpolation and "frankenmerges" by SOLAR and Goliath, are driving new performance gains on open leaderboards. Discussions in the Nous Research AI Discord covered topics such as AI playgrounds supporting prompt and RAG parameters, security concerns about third-party cloud usage, debates on Discord bots and TOS, skepticism about Teenage Engineering's cloud LLM, and performance differences between GPT-4 0613 and GPT-4 turbo. The community also explored fine-tuning strategies involving DPO, LoRA, and safetensors, integration of RAG with API calls, semantic differences between MoE and dense LLMs, and data frameworks like llama index and SciPhi-AI's synthesizer. Issues with anomalous characters in fine-tuning were also raised.
1/1/2024: How to start with Open Source AI
gpt-4-turbo dall-e-3 chatgpt openai microsoft perplexity-ai prompt-engineering ai-reasoning custom-gpt performance python knowledge-integration swyx
OpenAI Discord discussions revealed mixed sentiments about Bing's AI versus ChatGPT and Perplexity AI, and debated Microsoft Copilot's integration with Office 365. Users discussed DALL-E 3 access within ChatGPT Plus, ChatGPT's performance issues, and ways to train a GPT model using book content via OpenAI API or custom GPTs. Anticipation for GPT-4 turbo in Microsoft Copilot was noted alongside conversations on AI reasoning, prompt engineering, and overcoming Custom GPT glitches. Advice for AI beginners included starting with Python and using YAML or Markdown for knowledge integration. The future of AI with multiple specialized GPTs and Microsoft Copilot's role was also explored.
12/18/2023: Gaslighting Mistral for fun and profit
gpt-4-turbo gpt-3.5-turbo claude-2.1 claude-instant-1 gemini-pro gpt-4.5 dalle-3 openai anthropic google-deepmind prompt-engineering api model-performance ethics role-play user-experience ai-impact-on-jobs ai-translation technical-issues sam-altman
OpenAI Discord discussions reveal comparisons among language models including GPT-4 Turbo, GPT-3.5 Turbo, Claude 2.1, Claude Instant 1, and Gemini Pro, with GPT-4 Turbo noted for user-centric explanations. Rumors about GPT-4.5 remain unconfirmed, with skepticism prevailing until official announcements. Users discuss technical challenges like slow responses and API issues, and explore role-play prompt techniques to enhance model performance. Ethical concerns about AI's impact on academia and employment are debated. Future features for Dalle 3 and a proposed new GPT model are speculated upon, while a school project seeks help using the OpenAI API. The community also touches on AI glasses and job market implications of AI adoption.
12/16/2023: ByteDance suspended by OpenAI
claude-2.1 gpt-4-turbo gemini-1.5-pro gpt-5 gpt-4.5 gpt-4 openai google-deepmind anthropic hardware gpu api-costs coding model-comparison subscription-issues payment-processing feature-confidentiality ai-art-generation organizational-productivity model-speculation
The OpenAI Discord community discussed hardware options like Mac racks and the A6000 GPU, highlighting their value for AI workloads. They compared Claude 2.1 and GPT 4 Turbo on coding tasks, with GPT 4 Turbo outperforming Claude 2.1. The benefits of the Bard API for gemini pro were noted, including a free quota of 60 queries per minute. Users shared experiences with ChatGPT Plus membership issues, payment problems, and speculated about the upcoming GPT-5 and the rumored GPT-4.5. Discussions also covered the confidentiality of the Alpha feature, AI art generation policies, and improvements in organizational work features. The community expressed mixed feelings about GPT-4's performance and awaited future model updates.