Topic: "fine-tuning"

    Execuhires Round 2: Scale-Meta, Lamini-AMD, and Instacart-OpenAI
    not much happened today
    not much happened today
    ChatGPT Codex, OpenAI's first cloud SWE agent
    Prime Intellect's INTELLECT-2 and PRIME-RL advance distributed reinforcement learning
    not much happened today
    not much happened today
    ChatGPT responds to GlazeGate + LMArena responds to Cohere
    LlamaCon: Meta AI gets into the Llama API platform business
    Every 7 Months: The Moore's Law for Agent Autonomy
    Cohere's Command A claims #3 open model spot (after DeepSeek and Gemma)
    not much happened today
    The new OpenAI Agents Platform
    DeepSeek's Open Source Stack
    Reasoning Models are Near-Superhuman Coders (OpenAI IOI, Nvidia Kernels)
    small news items
    not much happened today
    s1: Simple test-time scaling (and Kyutai Hibiki)
    Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking
    TinyZero: Reproduce DeepSeek R1-Zero for $30
    DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level
    not much happened today
    not much happened today
    DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens
    o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO Finetuning
    Meta Llama 3.3: 405B/Nova Pro performance at 70B price
    $200 ChatGPT Pro and o1-full/pro, with vision, without API, and mixed reviews
    Qwen with Questions: 32B open weights reasoning model nears o1 in GPQA/AIME/Math500
    OLMo 2 - new SOTA Fully Open LLM
    Vision Everywhere: Apple AIMv2 and Jina CLIP v2
    BitNet was a lie?
    Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data
    The AI Search Wars Have Begun — SearchGPT, Gemini Grounding, and more
    not much happened today
    DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
    DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality
    not much happened today
    The AI Nobel Prize
    Not much technical happened today
    a quiet weekend
    Pixtral 12B: Mistral beats Llama to Multimodality
    Reflection 70B, by Matt from IT Department
    Everybody shipped small things this holiday weekend
    Ideogram 2 + Berkeley Function Calling Leaderboard V2
    not much happened today
    The DSPy Roadmap
    not much happened today
    not much happened today
    not much happened today
    GPT4o August + 100% Structured Outputs for All (GPT4o mini edition)
    Llama 3.1: The Synthetic Data Model
    Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model
    DataComp-LM: the best open-data 7B model/benchmark/dataset
    Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o-mini version)
    Gemma 2 tops /r/LocalLlama vibe check
    Microsoft AgentInstruct + Orca 3
    FlashAttention 3, PaliGemma, OpenAI's 5 Levels to Superintelligence
    Qdrant's BM42: "Please don't trust us"
    GraphRAG: The Marriage of Knowledge Graphs and RAG
    Gemma 2: The Open Model for Everyone
    Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts
    Gemini launches context caching... or does it?
    Is this... OpenQ*?
    Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata
    Hybrid SSM/Transformers > Pure SSMs/Pure Transformers
    The Last Hurrah of Stable Diffusion?
    HippoRAG: First, do know(ledge) Graph
    5 small news items
    Somebody give Andrej some H100s already
    Ten Commandments for Deploying Fine-Tuned Models
    Skyfall
    Cursor reaches >1000 tok/s finetuning Llama3-70b for fast file editing
    Google I/O in 60 seconds
    GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version)
    DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost
    $100k to predict LMSYS human preferences in a Kaggle contest
    Evals: The Next Generation
    Perplexity, the newest AI unicorn
    Llama-3-70b is GPT-4-level Open Model
    Zero to GPT in 1 Year
    Cohere Command R+, Anthropic Claude Tool Use, OpenAI Finetuning
    Evals-based AI Engineering
    Jamba: Mixture of Architectures dethrones Mixtral
    DBRX: Best open model (just not most efficient)
    Claude 3 is officially America's Next Top Model
    Andrew likes Agents
    not much happened today
    Welcome /r/LocalLlama!
    MM1: Apple's first Large Multimodal Model
    The world's first fully autonomous AI Engineer
    FSDP+QLoRA: the Answer to 70b-scale AI for desktop class GPUs
    Stable Diffusion 3 — Rombach & Esser did it again!
    The Era of 1-bit LLMs
    Mistral Large disappoints
    One Year of Latent Space
    Ring Attention for >1M Context
    Karpathy emerges from stealth?
    Companies liable for AI hallucination is Good Actually for AI Engineers
    Sora pushes SOTA
    The Dissection of Smaug (72B)
    Gemini Ultra is out, to mixed reviews
    Qwen 1.5 Released
    Less Lazy AI
    The Core Skills of AI Engineering
    AI2 releases OLMo - the 4th open-everything LLM
    Trust in GPTs at all time low
    Miqu confirmed to be an early Mistral-medium checkpoint
    CodeLLama 70B beats GPT4 on HumanEval
    RWKV "Eagle" v5: Your move, Mamba
    GPT4Turbo A/B Test: gpt-4-0125-preview
    GPT4Turbo A/B Test: gpt-4-1106-preview
    Adept Fuyu-Heavy: Multimodal model for Agents
    Google Solves Text to Video
    RIP Latent Diffusion, Hello Hourglass Diffusion
    Nightshade poisons AI art... kinda?
    Sama says: GPT-5 soon
    1/16/2024: ArtificialAnalysis - a new model/host benchmark site
    1/12/2024: Anthropic coins Sleeper Agents
    1/11/2024: Mixing Experts vs Merging Models
    1/9/2024: Nous Research lands $5m for Open Source AI
    1/8/2024: The Four Wars of the AI Stack
    1/6-7/2024: LlaMA Pro - an alternative to PEFT/RAG??
    12/31/2023: Happy New Year
    12/27/2023: NYT vs OpenAI
    12/24/2023: Dolphin Mixtral 8x7b is wild
    12/13/2023: SOLAR10.7B upstages Mistral7B?
    12/11/2023: Mixtral beats GPT3.5 and Llama2-70B
    12/10/2023: not much happened today