All tags
Topic: "api"
SmolLM3: the SOTA 3B reasoning open source LLM
smollm3-3b olmo-3 grok-4 claude-4 claude-4.1 gemini-nano hunyuan-a13b gemini-2.5 gemma-3n qwen2.5-vl-3b huggingface allenai openai anthropic google-deepmind mistral-ai tencent gemini alibaba open-source small-language-models model-releases model-performance benchmarking multimodality context-windows precision-fp8 api batch-processing model-scaling model-architecture licensing ocr elonmusk mervenoyann skirano amandaaskell clementdelangue loubnabenallal1 awnihannun swyx artificialanlys officiallogank osanseviero cognitivecompai aravsrinivas
HuggingFace released SmolLM3-3B, a fully open-source small reasoning model with open pretraining code and data, marking a high point in open source models until Olmo 3 arrives. Grok 4 was launched with mixed reactions, while concerns about Claude 4 nerfs and an imminent Claude 4.1 surfaced. Gemini Nano is now shipping in Chrome 137+, enabling local LLM access for 3.7 billion users. Tencent introduced Hunyuan-A13B, an 80B parameter model with a 256K context window running on a single H200 GPU. The Gemini API added a batch mode with 50% discounts on 2.5 models. MatFormer Lab launched tools for custom-sized Gemma 3n models. Open source OCR models like Nanonets-OCR-s and ChatDOC/OCRFlux-3B derived from Qwen2.5-VL-3B were highlighted, with licensing discussions involving Alibaba.
not much happened today
o3-mini o1-mini llama hunyuan-a13b ernie-4.5 ernie-4.5-21b-a3b qwen3-30b-a3b gemini-2.5-pro meta-ai-fair openai tencent microsoft baidu gemini superintelligence ai-talent job-market open-source-models multimodality mixture-of-experts quantization fp8-training model-benchmarking model-performance model-releases api model-optimization alexandr_wang shengjia_zhao jhyuxm ren_hongyu shuchaobi saranormous teortaxesTex mckbrando yuchenj_uw francoisfleuret quanquangu reach_vb philschmid
Meta has poached top AI talent from OpenAI, including Alexandr Wang joining as Chief AI Officer to work towards superintelligence, signaling a strong push for the next Llama model. The AI job market shows polarization with high demand and compensation for top-tier talent, while credentials like strong GitHub projects gain importance. The WizardLM team moved from Microsoft to Tencent to develop open-source models like Hunyuan-A13B, highlighting shifts in China's AI industry. Rumors suggest OpenAI will release a new open-source model in July, potentially surpassing existing ChatGPT models. Baidu open-sourced multiple variants of its ERNIE 4.5 model series, featuring advanced techniques like 2-bit quantization, MoE router orthogonalization loss, and FP8 training, with models ranging from 0.3B to 424B parameters. Gemini 2.5 Pro returned to the free tier of the Gemini API, enabling developers to explore its features.
not much happened today
gemma-3n hunyuan-a13b flux-1-kontext-dev mercury fineweb2 qwen-vlo o3-mini o4-mini google-deepmind tencent black-forest-labs inception-ai qwen kyutai-labs openai langchain langgraph hugging-face ollama unslothai nvidia amd multimodality mixture-of-experts context-windows tool-use coding image-generation diffusion-models dataset-release multilinguality speech-to-text api prompt-engineering agent-frameworks open-source model-release demishassabis reach_vb tri_dao osanseviero simonw clementdelangue swyx hwchase17 sydneyrunkle
Google released Gemma 3n, a multimodal model for edge devices available in 2B and 4B parameter versions, with support across major frameworks like Transformers and Llama.cpp. Tencent open-sourced Hunyuan-A13B, a Mixture-of-Experts (MoE) model with 80B total parameters and a 256K context window, optimized for tool calling and coding. Black Forest Labs released FLUX.1 Kontext [dev], an open image AI model gaining rapid Hugging Face adoption. Inception AI Labs launched Mercury, the first commercial-scale diffusion LLM for chat. The FineWeb2 multilingual pre-training dataset paper was released, analyzing data quality impacts. The Qwen team released Qwen-VLo, a unified visual understanding and generation model. Kyutai Labs released a top-ranked open-source speech-to-text model running on Macs and iPhones. OpenAI introduced Deep Research API with o3/o4-mini models and open-sourced prompt rewriter methodology, integrated into LangChain and LangGraph. The open-source Gemini CLI gained over 30,000 GitHub stars as an AI terminal agent.
The Quiet Rise of Claude Code vs Codex
mistral-small-3.2 qwen3-0.6b llama-3-1b gemini-2.5-flash-lite gemini-app magenta-real-time apple-3b-on-device mistral-ai hugging-face google-deepmind apple artificial-analysis kuaishou instruction-following function-calling model-implementation memory-efficiency 2-bit-quantization music-generation video-models benchmarking api reach_vb guillaumelample qtnx_ shxf0072 rasbt demishassabis artificialanlys osanseviero
Claude Code is gaining mass adoption, inspiring derivative projects like OpenCode and ccusage, with discussions ongoing in AI communities. Mistral AI released Mistral Small 3.2, a 24B parameter model update improving instruction following and function calling, available on Hugging Face and supported by vLLM. Sebastian Raschka implemented Qwen3 0.6B from scratch, noting its deeper architecture and memory efficiency compared to Llama 3 1B. Google DeepMind showcased Gemini 2.5 Flash-Lite's UI code generation from visual context and added video upload support in the Gemini App. Apple's new 3B parameter on-device foundation model was benchmarked, showing slower speed but efficient memory use via 2-bit quantization, suitable for background tasks. Google DeepMind also released Magenta Real-time, an 800M parameter music generation model licensed under Apache 2.0, marking Google's 1000th model on Hugging Face. Kuaishou launched KLING 2.1, a new video model accessible via API.
SOTA Video Gen: Veo 2 and Kling 2 are GA for developers
veo-2 gemini gpt-4.1 gpt-4o gpt-4.5-preview gpt-4.1-mini gpt-4.1-nano google openai video-generation api coding instruction-following context-window performance benchmarks model-deprecation kevinweil stevenheidel aidan_clark_
Google's Veo 2 video generation model is now available in the Gemini API with a cost of 35 cents per second of generated video, marking a significant step in accessible video generation. Meanwhile, China's Kling 2 model launched with pricing around $2 for a 10-second clip and a minimum subscription of $700 per month for 3 months, generating excitement despite some skill challenges. OpenAI announced the GPT-4.1 family release, including GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, highlighting improvements in coding, instruction following, and a 1 million token context window. The GPT-4.1 models are 26% cheaper than GPT-4o and will replace the GPT-4.5 Preview API version by July 14. Performance benchmarks show GPT-4.1 achieving 54-55% on SWE-bench verified and a 60% improvement over GPT-4o in some internal tests, though some critiques note it underperforms compared to other models like OpenRouter and DeepSeekV3 in coding tasks. The release is API-only, with a prompting guide provided for developers.
not much happened today
gpt-4o deepseek-v3-0324 gemini-2.5-pro gemini-3 claude-3.7-sonnet openai hugging-face sambanova google-cloud instruction-following image-generation content-filtering model-performance api coding model-deployment benchmarking model-release abacaj nrehiew_ sama joannejang giffmana lmarena_ai _philschmid
OpenAI announced the new GPT-4o model with enhanced instruction-following, complex problem-solving, and native image generation capabilities. The model shows improved performance in math, coding, and creativity, with features like transparent background image generation. Discussions around content filtering and policy for image generation emphasize balancing creative freedom and harm prevention. DeepSeek V3-0324 APIs, available on Hugging Face and powered by SambaNovaAI, outperform benchmarks and models like Gemini 2.0 Pro and Claude 3.7 Sonnet. Gemini 2.5 Pro is recommended for coding, and Gemini 3 can be deployed easily on Google Cloud Vertex AI via the new Model Garden SDK. The Gemma 3 Technical Report has been released on arXiv.
Promptable Prosody, SOTA ASR, and Semantic VAD: OpenAI revamps Voice AI
gpt-4o-transcribe gpt-4o-mini-tts o1-pro kokoro-82m openai replicate speech-to-text text-to-speech voice-activity-detection prompt-engineering real-time-processing model-release api function-calling structured-outputs model-performance juberti sama reach_vb kevinweil omarsar0
OpenAI has launched three new state-of-the-art audio models in their API, including gpt-4o-transcribe, a speech-to-text model outperforming Whisper, and gpt-4o-mini-tts, a text-to-speech model with promptable prosody allowing control over timing and emotion. The Agents SDK now supports audio, enabling voice agents. OpenAI also updated turn detection for real-time voice activity detection (VAD) based on speech content. Additionally, OpenAI's o1-pro model is available to select developers with advanced features like vision and function calling, though at higher compute costs. The community shows strong enthusiasm for these audio advancements, with a radio contest for TTS creations underway. Meanwhile, Kokoro-82M v1.0 emerges as a leading open weights TTS model with competitive pricing on Replicate.
The new OpenAI Agents Platform
reka-flash-3 o1-mini claude-3-7-sonnet llama-3-3-70b sonic-2 qwen-chat olympiccoder openai reka-ai hugging-face deepseek togethercompute alibaba ai-agents api model-releases fine-tuning reinforcement-learning model-training model-inference multimodality voice-synthesis gpu-clusters model-distillation performance-optimization open-source sama reach_vb
OpenAI introduced a comprehensive suite of new tools for AI agents, including the Responses API, Web Search Tool, Computer Use Tool, File Search Tool, and an open-source Agents SDK with integrated observability tools, marking a significant step towards the "Year of Agents." Meanwhile, Reka AI open-sourced Reka Flash 3, a 21B parameter reasoning model that outperforms o1-mini and powers their Nexus platform, with weights available on Hugging Face. The OlympicCoder series surpassed Claude 3.7 Sonnet and much larger models on competitive coding benchmarks. DeepSeek built a 32K GPU cluster capable of training V3-level models in under a week and is exploring AI distillation. Hugging Face announced Cerebras inference support, achieving over 2,000 tokens/s on Llama 3.3 70B, 70x faster than leading GPUs. Reka's Sonic-2 voice AI model delivers 40ms latency via the Together API. Alibaba's Qwen Chat enhanced its multimodal interface with video understanding up to 500MB, voice-to-text, guest mode, and expanded file uploads. Sama praised OpenAI's new API as "one of the most well-designed and useful APIs ever."
not much happened today
jamba-1.6 mistral-ocr qwq-32b o1 o3-mini instella llama-3-2-3b gemma-2-2b qwen-2-5-3b babel-9b babel-83b gpt-4o claude-3-7-sonnet ai21-labs mistral-ai alibaba openai amd anthropic hugging-face multimodality ocr multilinguality structured-output on-prem-deployment reasoning benchmarking api open-source model-training gpu-optimization prompt-engineering function-calling
AI21 Labs launched Jamba 1.6, touted as the best open model for private enterprise deployment, outperforming Cohere, Mistral, and Llama on benchmarks like Arena Hard. Mistral AI released a state-of-the-art multimodal OCR model with multilingual and structured output capabilities, available for on-prem deployment. Alibaba Qwen introduced QwQ-32B, an open-weight reasoning model with 32B parameters and cost-effective usage, showing competitive benchmark scores. OpenAI released o1 and o3-mini models with advanced API features including streaming and function calling. AMD unveiled Instella, open-source 3B parameter language models trained on AMD Instinct MI300X GPUs, competing with Llama-3.2-3B and others. Alibaba also released Babel, open multilingual LLMs performing comparably to GPT-4o. Anthropic launched Claude 3.7 Sonnet, enhancing reasoning and prompt engineering capabilities.
lots of small launches
gpt-4o claude-3.7-sonnet claude-3.7 claude-3.5-sonnet deepseek-r1 deepseek-v3 grok-3 openai anthropic amazon cloudflare perplexity-ai deepseek-ai togethercompute elevenlabs elicitorg inceptionailabs mistral-ai voice model-releases cuda gpu-optimization inference open-source api model-performance token-efficiency context-windows cuda jit-compilation lmarena_ai alexalbert__ aravsrinivas reach_vb
GPT-4o Advanced Voice Preview is now available for free ChatGPT users with enhanced daily limits for Plus and Pro users. Claude 3.7 Sonnet has achieved the top rank in WebDev Arena with improved token efficiency. DeepSeek-R1 with 671B parameters benefits from the Together Inference platform optimizing NVIDIA Blackwell GPU usage, alongside the open-source DeepGEMM CUDA library delivering up to 2.7x speedups on Hopper GPUs. Perplexity launched a new Voice Mode and a Deep Research API. The upcoming Grok 3 API will support a 1M token context window. Several companies including Elicit, Amazon, Anthropic, Cloudflare, FLORA, Elevenlabs, and Inception Labs announced new funding rounds, product launches, and model releases.
o3-mini launches, OpenAI on "wrong side of history"
o3-mini o1 gpt-4o mistral-small-3-24b deepseek-r1 openai mistral-ai deepseek togethercompute fireworksai_hq ai-gradio replicate reasoning safety cost-efficiency model-performance benchmarking api open-weight-models model-releases sam-altman
OpenAI released o3-mini, a new reasoning model available for free and paid users with a "high" reasoning effort option that outperforms the earlier o1 model on STEM tasks and safety benchmarks, costing 93% less per token. Sam Altman acknowledged a shift in open source strategy and credited DeepSeek R1 for influencing assumptions. MistralAI launched Mistral Small 3 (24B), an open-weight model with competitive performance and low API costs. DeepSeek R1 is supported by Text-generation-inference v3.1.0 and available via ai-gradio and replicate. The news highlights advancements in reasoning, cost-efficiency, and safety in AI models.
Mistral Small 3 24B and Tulu 3 405B
mistral-small-3 tulu-3-405b llama-3 tiny-swallow-1.5b qwen-2.5-max deepseek-v3 claude-3.5-sonnet gemini-1.5-pro gpt4o-mini llama-3-3-70b mistral-ai ai2 sakana-ai alibaba_qwen deepseek ollama llamaindex reinforcement-learning model-fine-tuning local-inference model-performance model-optimization on-device-ai instruction-following api training-data natural-language-processing clementdelangue dchaplot reach_vb
Mistral AI released Mistral Small 3, a 24B parameter model optimized for local inference with low latency and 81% accuracy on MMLU, competing with Llama 3.3 70B, Qwen-2.5 32B, and GPT4o-mini. AI2 released Tülu 3 405B, a large finetuned model of Llama 3 using Reinforcement Learning from Verifiable Rewards (RVLR), competitive with DeepSeek v3. Sakana AI launched TinySwallow-1.5B, a Japanese language model using TAID for on-device use. Alibaba_Qwen released Qwen 2.5 Max, trained on 20 trillion tokens, with performance comparable to DeepSeek V3, Claude 3.5 Sonnet, and Gemini 1.5 Pro, and updated API pricing. These releases highlight advances in open models, efficient inference, and reinforcement learning techniques.
Moondream 2025.1.9: Structured Text, Enhanced OCR, Gaze Detection in a 2B Model
o1 vdr-2b-multi-v1 llava-mini openai llamaindex langchainai qdrant genmoai vision model-efficiency structured-output gaze-detection reasoning model-distillation multimodality embedding-models gan diffusion-models self-attention training-optimizations development-frameworks api cross-language-deployment semantic-search agentic-document-processing developer-experience philschmid saranormous jxmnop reach_vb iscienceluvr multimodalart arohan adcock_brett awnihannun russelljkaplan ajayj_
Moondream has released a new version that advances VRAM efficiency and adds structured output and gaze detection, marking a new frontier in vision model practicality. Discussions on Twitter highlighted advancements in reasoning models like OpenAI's o1, model distillation techniques, and new multimodal embedding models such as vdr-2b-multi-v1 and LLaVA-Mini, which significantly reduce computational costs. Research on GANs and decentralized diffusion models showed improved stability and performance. Development tools like MLX and vLLM received updates for better portability and developer experience, while frameworks like LangChain and Qdrant enable intelligent data workflows. Company updates include new roles and team expansions at GenmoAI. "Efficiency tricks are all you need."
o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO Finetuning
o1-2024-12-17 o1 o1-pro 4o 4o-mini gemini-2-0-flash claude-3.5-sonnet claude-3.5 openai google google-deepmind function-calling structured-outputs vision reasoning webrtc realtime-api preference-tuning fine-tuning api model-performance aidan_mclau kevinweil simonw michpokrass morgymcg juberti
OpenAI launched the o1 API with enhanced features including vision inputs, function calling, structured outputs, and a new
reasoning_effort
parameter, achieving 60% fewer reasoning tokens on average. The o1 pro variant is confirmed as a distinct implementation coming soon. Improvements to the Realtime API with WebRTC integration offer easier usage, longer sessions (up to 30 minutes), and significantly reduced pricing (up to 10x cheaper with mini models). DPO Preference Tuning for fine-tuning is introduced, currently available for the 4o model. Additional updates include official Go and Java SDKs and OpenAI DevDay videos. The news also highlights discussions on Google Gemini 2.0 Flash model's performance reaching 83.6% accuracy. not much happened today
ic-light-v2 claude-3-5-sonnet puzzle nvidia amazon anthropic google pydantic supabase browser-company world-labs cognition distillation neural-architecture-search inference-optimization video trajectory-attention timestep-embedding ai-safety-research fellowship-programs api domain-names reverse-thinking reasoning agent-frameworks image-to-3d ai-integration akhaliq adcock_brett omarsar0 iscienceluvr
AI News for 11/29/2024-12/2/2024 highlights several developments: Nvidia introduced Puzzle, a distillation-based neural architecture search for inference-optimized large language models, enhancing efficiency. The IC-Light V2 model was released for varied illumination scenarios, and new video model techniques like Trajectory Attention and Timestep Embedding were presented. Amazon increased its investment in Anthropic to $8 billion, supporting AI safety research through a new fellowship program. Google is expanding AI integration with the Gemini API and open collaboration tools. Discussions on domain name relevance emphasize alternatives to .com domains like .io, .ai, and .co. Advances in reasoning include a 13.53% improvement in LLM performance using "Reverse Thinking". Pydantic launched a new agent framework, and Supabase released version 2 of their assistant. Other notable mentions include Browser Company teasing a second browser and World Labs launching image-to-3D-world technology. The NotebookLM team departed from Google, and Cognition was featured on the cover of Forbes. The news was summarized by Claude 3.5 Sonnet.
Pixtral 12B: Mistral beats Llama to Multimodality
pixtral-12b mistral-nemo-12b llama-3-1-70b llama-3-1-8b deeps-eek-v2-5 gpt-4-turbo llama-3-1 strawberry claude mistral-ai meta-ai-fair hugging-face arcee-ai deepseek-ai openai anthropic vision multimodality ocr benchmarking model-release model-architecture model-performance fine-tuning model-deployment reasoning code-generation api access-control reach_vb devendra_chapilot _philschmid rohanpaul_ai
Mistral AI released Pixtral 12B, an open-weights vision-language model with a Mistral Nemo 12B text backbone and a 400M vision adapter, featuring a large vocabulary of 131,072 tokens and support for 1024x1024 pixel images. This release notably beat Meta AI in launching an open multimodal model. At the Mistral AI Summit, architecture details and benchmark performances were shared, showing strong OCR and screen understanding capabilities. Additionally, Arcee AI announced SuperNova, a distilled Llama 3.1 70B & 8B model outperforming Meta's Llama 3.1 70B instruct on benchmarks. DeepSeek released DeepSeek-V2.5, scoring 89 on HumanEval, surpassing GPT-4-Turbo, Opus, and Llama 3.1 in coding tasks. OpenAI plans to release Strawberry as part of ChatGPT soon, though its capabilities are debated. Anthropic introduced Workspaces for managing multiple Claude deployments with enhanced access controls.
not much happened today
llama-3 llama-3-1 grok-2 claude-3.5-sonnet gpt-4-turbo nous-research nvidia salesforce goodfire-ai anthropic x-ai google-deepmind box langchain fine-tuning prompt-caching mechanistic-interpretability model-performance multimodality agent-frameworks software-engineering-agents api document-processing text-generation model-releases vision image-generation efficiency scientific-discovery fchollet demis-hassabis
GPT-5 delayed again amid a quiet news day. Nous Research released Hermes 3 finetune of Llama 3 base models, rivaling FAIR's instruct tunes but sparking debate over emergent existential crisis behavior with 6% roleplay data. Nvidia introduced Minitron finetune of Llama 3.1. Salesforce launched a DEI agent scoring 55% on SWE-Bench Lite. Goodfire AI secured $7M seed funding for mechanistic interpretability work. Anthropic rolled out prompt caching in their API, cutting input costs by up to 90% and latency by 80%, aiding coding assistants and large document processing. xAI released Grok-2, matching Claude 3.5 Sonnet and GPT-4 Turbo on LMSYS leaderboard with vision+text inputs and image generation integration. Claude 3.5 Sonnet reportedly outperforms GPT-4 in coding and reasoning. François Chollet defined intelligence as efficient operationalization of past info for future tasks. Salesforce's DEI framework surpasses individual agent performance. Google DeepMind's Demis Hassabis discussed AGI's role in scientific discovery and safe AI development. Dora AI plugin generates landing pages in under 60 seconds, boosting web team efficiency. Box AI API beta enables document chat, data extraction, and content summarization. LangChain updated Python & JavaScript integration docs.
Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts
claude-3.5-sonnet claude-3-opus gpt-4o anthropic openai cognition benchmarking model-performance coding model-optimization fine-tuning instruction-following model-efficiency model-release api performance-optimization alex-albert
Claude 3.5 Sonnet, released by Anthropic, is positioned as a Pareto improvement over Claude 3 Opus, operating at twice the speed and costing one-fifth as much. It achieves state-of-the-art results on benchmarks like GPQA, MMLU, and HumanEval, surpassing even GPT-4o and Claude 3 Opus on vision tasks. The model demonstrates significant advances in coding capabilities, passing 64% of test cases compared to 38% for Claude 3 Opus, and is capable of autonomously fixing pull requests. Anthropic also introduced the Artifacts feature, enabling users to interact with AI-generated content such as code snippets and documents in a dynamic workspace, similar to OpenAI's Code Interpreter. This release highlights improvements in performance, cost-efficiency, and coding proficiency, signaling a growing role for LLMs in software development.
Meta Llama 3 (8B, 70B)
llama-3-8b llama-3-70b llama-3-400b stable-diffusion-3 mixtral-8x22b-instruct-v0.1 vasa-1 meta-ai-fair stability-ai boston-dynamics microsoft mistral-ai hugging-face transformer tokenization model-training benchmarking robotics natural-language-processing real-time-processing synthetic-data dataset-cleaning behavior-trees ai-safety model-accuracy api model-release humor helen-toner
Meta partially released Llama 3 models including 8B and 70B variants, with a 400B variant still in training, touted as the first GPT-4 level open-source model. Stability AI launched Stable Diffusion 3 API with model weights coming soon, showing competitive realism against Midjourney V6. Boston Dynamics unveiled an electric humanoid robot Atlas, and Microsoft introduced the VASA-1 model generating lifelike talking faces at 40fps on RTX 4090. Mistral AI, a European OpenAI rival, is seeking $5B funding with its Mixtral-8x22B-Instruct-v0.1 model achieving 100% accuracy on 64K context benchmarks. AI safety discussions include calls from former OpenAI board member Helen Toner for audits of top AI companies, and the Mormon Church released AI usage principles. New AI development tools include Ctrl-Adapter for diffusion models, Distilabel 1.0.0 for synthetic dataset pipelines, Data Bonsai for data cleaning with LLMs, and Dendron for building LLM agents with behavior trees. Memes highlight AI development humor and cultural references. The release of Llama 3 models features improved reasoning, a 128K token vocabulary, 8K token sequences, and grouped query attention.
Google Solves Text to Video
mistral-7b llava google-research amazon-science huggingface mistral-ai together-ai text-to-video inpainting space-time-diffusion code-evaluation fine-tuning inference gpu-rentals multimodality api model-integration learning-rates
Google Research introduced Lumiere, a text-to-video model featuring advanced inpainting capabilities using a Space-Time diffusion process, surpassing previous models like Pika and Runway. Manveer from UseScholar.org compiled a comprehensive list of code evaluation benchmarks beyond HumanEval, including datasets from Amazon Science, Hugging Face, and others. Discord communities such as TheBloke discussed topics including running Mistral-7B via API, GPU rentals, and multimodal model integration with LLava. Nous Research AI highlighted learning rate strategies for LLM fine-tuning, issues with inference, and benchmarks like HumanEval and MBPP. RestGPT gained attention for controlling applications via RESTful APIs, showcasing LLM application capabilities.
1/2/2024: Smol tweaks to Smol Talk
claude-2 bard copilot meta-ai gemini-ultra chatgpt openai meta-ai-fair perplexity-ai prompt-engineering api json yaml markdown chatbot image-generation vpn browser-compatibility personality-tuning plugin-issues
OpenAI Discord discussions highlight a detailed comparison of AI search engines including Perplexity, Copilot, Bard, and Claude 2, with Bard and Claude 2 trailing behind. Meta AI chatbot by Meta is introduced, available on Instagram and Whatsapp, featuring image generation likened to a free GPT version. Users report multiple browser issues with ChatGPT, including persistent captchas when using VPNs and plugin malfunctions. Debates cover prompt engineering, API usage, and data formats like JSON, YAML, and Markdown. Discussions also touch on ChatGPT's personality tuning and model capability variations. "Meta AI includes an image generation feature, which he likened to a free version of GPT."
12/22/2023: Anyscale's Benchmark Criticisms
gpt-4 gpt-3.5 bard anyscale openai microsoft benchmarking performance api prompt-engineering bug-tracking model-comparison productivity programming-languages storytelling
Anyscale launched their LLMPerf leaderboard to benchmark large language model inference performance, but it faced criticism for lacking detailed metrics like cost per token and throughput, and for comparing public LLM endpoints without accounting for batching and load. In OpenAI Discord discussions, users reported issues with Bard and preferred Microsoft Copilot for storytelling, noting fewer hallucinations. There was debate on the value of upgrading from GPT-3.5 to GPT-4, with many finding paid AI models worthwhile for coding productivity. Bugs and performance issues with OpenAI APIs were also highlighted, including slow responses and message limits. Future AI developments like GPT-6 and concerns about OpenAI's transparency and profitability were discussed. Prompt engineering for image generation was another active topic, emphasizing clear positive prompts and the desire for negative prompts.
12/19/2023: Everybody Loves OpenRouter
gpt-4 gpt-3.5 mixtral-8x7b-instruct dolphin-2.0-mistral-7b gemini openai mistral-ai google hugging-face performance memory-management api prompt-engineering local-language-models translation censorship video-generation
OpenRouter offers an easy OpenAI-compatible proxy for Mixtral-8x7b-instruct. Discord discussions highlight GPT-4 performance and usability issues compared to GPT-3.5, including memory management and accessibility problems. Users debate local language models versus OpenAI API usage, with mentions of Dolphin 2.0 Mistral 7B and Google's video generation project. Prompt engineering and custom instructions for GPT models are also key topics. Concerns about censorship on models like Gemini and translation tool preferences such as DeepL were discussed.
12/18/2023: Gaslighting Mistral for fun and profit
gpt-4-turbo gpt-3.5-turbo claude-2.1 claude-instant-1 gemini-pro gpt-4.5 dalle-3 openai anthropic google-deepmind prompt-engineering api model-performance ethics role-play user-experience ai-impact-on-jobs ai-translation technical-issues sam-altman
OpenAI Discord discussions reveal comparisons among language models including GPT-4 Turbo, GPT-3.5 Turbo, Claude 2.1, Claude Instant 1, and Gemini Pro, with GPT-4 Turbo noted for user-centric explanations. Rumors about GPT-4.5 remain unconfirmed, with skepticism prevailing until official announcements. Users discuss technical challenges like slow responses and API issues, and explore role-play prompt techniques to enhance model performance. Ethical concerns about AI's impact on academia and employment are debated. Future features for Dalle 3 and a proposed new GPT model are speculated upon, while a school project seeks help using the OpenAI API. The community also touches on AI glasses and job market implications of AI adoption.
12/14/2023: $1e7 for Superalignment
gemini bard gpt-4 gpt-4.5 llama-2 openai llamaindex perplexity-ai prompt-engineering api custom-gpt json bug-fixes chatbots performance tts code-generation image-recognition jan-leike patrick-collison
Jan Leike is launching a new grant initiative inspired by Patrick Collison's Fast Grants to support AI research. OpenAI introduced a new developers Twitter handle @OpenAIDevs for community updates. Discussions on OpenAI's Gemini and Bard chatbots highlight their ability to read each other's instructions and offer unique coding solutions. Users reported various issues with GPT-4, including performance problems, customization difficulties, and a resolved bug in image recognition. There are ongoing conversations about prompt engineering challenges and new JSON mode support in Convo-lang for API use. Concerns about misuse of chatbots for illegal activities and alternatives like Llama2 models and the Perplexity chatbot were also discussed.