Topic: fine-tuning

codex-1 openai-o3 codex-mini gemma-3 blip3-o qwen-2.5 marigold-iid deepseek-v3 lightlab gemini-2.0 lumina-next openai runway salesforce qwen deepseek google google-deepmind j1 software-engineering parallel-processing multimodality diffusion-models depth-estimation scaling-laws reinforcement-learning fine-tuning model-performance multi-turn-conversation reasoning audio-processing sama kevinweil omarsar0 iscienceluvr akhaliq osanseviero c_valenzuelab mervenoyann arankomatsuzaki jasonwei demishassabis philschmid swyx teortaxestex jaseweston

OpenAI launched Codex, a cloud-based software engineering agent powered by codex-1 (an optimized version of OpenAI o3) available in research preview for Pro, Enterprise, and Team ChatGPT users, featuring parallel task execution like refactoring and bug fixing. The Codex CLI was enhanced with quick sign-in and a new low-latency model, codex-mini. Gemma 3 is highlighted as the best open model runnable on a single GPU. Runway released the Gen-4 References API for style transfer in generation. Salesforce introduced BLIP3-o, a unified multimodal model family using diffusion transformers for CLIP image features. The Qwen 2.5 models (1.5B and 3B versions) were integrated into the PocketPal app with various chat templates. Marigold IID, a new state-of-the-art open-source depth estimation model, was released. In research, DeepSeek shared insights on scaling and hardware for DeepSeek-V3. Google unveiled LightLab, a diffusion-based light source control in images. Google DeepMind's AlphaEvolve uses Gemini 2.0 to discover new math and reduce costs without reinforcement learning. Omni-R1 studied audio's role in fine-tuning audio LLMs. Qwen proposed a parallel scaling law inspired by classifier-free guidance. Salesforce released Lumina-Next on the Qwen base, outperforming Janus-Pro. A study found LLM performance degrades in multi-turn conversations due to unreliability. J1 is incentivizing LLM-as-a-Judge thinking via reinforcement learning. A new Qwen study correlates question and strategy similarity to predict reasoning strategies.

May 12

Prime Intellect's INTELLECT-2 and PRIME-RL advance distributed reinforcement learning

May 08

not much happened today

May 02

not much happened today

Apr 30

ChatGPT responds to GlazeGate + LMArena responds to Cohere

Apr 29

LlamaCon: Meta AI gets into the Llama API platform business

Mar 20

Every 7 Months: The Moore's Law for Agent Autonomy

Mar 18

Cohere's Command A claims #3 open model spot (after DeepSeek and Gemma)

Mar 14

not much happened today

Mar 12

The new OpenAI Agents Platform

Mar 08

DeepSeek's Open Source Stack

Feb 14

Reasoning Models are Near-Superhuman Coders (OpenAI IOI, Nvidia Kernels)

Feb 13

small news items

Feb 11

not much happened today

Feb 07

s1: Simple test-time scaling (and Kyutai Hibiki)

Feb 06

Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking

Jan 25

TinyZero: Reproduce DeepSeek R1-Zero for $30

Jan 21

DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level

Jan 18

not much happened today

Jan 08

not much happened today

Dec 27, 2024

DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens

Dec 18, 2024

o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO Finetuning

Dec 06, 2024

Meta Llama 3.3: 405B/Nova Pro performance at 70B price

Dec 06, 2024

$200 ChatGPT Pro and o1-full/pro, with vision, without API, and mixed reviews

Nov 28, 2024

Qwen with Questions: 32B open weights reasoning model nears o1 in GPQA/AIME/Math500

Nov 27, 2024

OLMo 2 - new SOTA Fully Open LLM

Nov 22, 2024

Vision Everywhere: Apple AIMv2 and Jina CLIP v2

Nov 13, 2024

BitNet was a lie?

Nov 06, 2024

Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data

Nov 01, 2024

The AI Search Wars Have Begun — SearchGPT, Gemini Grounding, and more

Oct 24, 2024

not much happened today

Oct 22, 2024

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

Oct 18, 2024

DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality

Oct 18, 2024

not much happened today

Oct 09, 2024

The AI Nobel Prize

Oct 02, 2024

Not much technical happened today

Sep 17, 2024

a quiet weekend

Sep 12, 2024

Pixtral 12B: Mistral beats Llama to Multimodality

Sep 07, 2024

Reflection 70B, by Matt from IT Department

Sep 04, 2024

Everybody shipped small things this holiday weekend

Aug 22, 2024

Ideogram 2 + Berkeley Function Calling Leaderboard V2

Aug 21, 2024

not much happened today

Aug 20, 2024

The DSPy Roadmap

Aug 17, 2024

not much happened today

Aug 16, 2024

not much happened today

Aug 10, 2024

not much happened today

Aug 07, 2024

GPT4o August + 100% Structured Outputs for All (GPT4o mini edition)

Jul 24, 2024

Llama 3.1: The Synthetic Data Model

Jul 23, 2024

Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model

Jul 20, 2024

DataComp-LM: the best open-data 7B model/benchmark/dataset

Jul 19, 2024

Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o-mini version)

Jul 17, 2024

Gemma 2 tops /r/LocalLlama vibe check

Jul 16, 2024

Microsoft AgentInstruct + Orca 3

Jul 12, 2024

FlashAttention 3, PaliGemma, OpenAI's 5 Levels to Superintelligence

Jul 06, 2024

Qdrant's BM42: "Please don't trust us"

Jul 03, 2024

GraphRAG: The Marriage of Knowledge Graphs and RAG

Jun 28, 2024

Gemma 2: The Open Model for Everyone

Jun 21, 2024

Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts

Jun 18, 2024

Gemini launches context caching... or does it?

Jun 18, 2024

Is this... OpenQ*?

Jun 14, 2024

Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata

Jun 13, 2024

Hybrid SSM/Transformers > Pure SSMs/Pure Transformers

Jun 12, 2024

The Last Hurrah of Stable Diffusion?

Jun 07, 2024

HippoRAG: First, do know(ledge) Graph

Jun 06, 2024

5 small news items

May 29, 2024

Somebody give Andrej some H100s already

May 24, 2024

Ten Commandments for Deploying Fine-Tuned Models

May 20, 2024

Skyfall

May 17, 2024

Cursor reaches >1000 tok/s finetuning Llama3-70b for fast file editing

May 14, 2024

Google I/O in 60 seconds

May 13, 2024

GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version)

May 06, 2024

DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost

May 03, 2024

$100k to predict LMSYS human preferences in a Kaggle contest

May 02, 2024

Evals: The Next Generation

Apr 23, 2024

Perplexity, the newest AI unicorn

Apr 20, 2024

Llama-3-70b is GPT-4-level Open Model

Apr 12, 2024

Zero to GPT in 1 Year

Apr 04, 2024

Cohere Command R+, Anthropic Claude Tool Use, OpenAI Finetuning

Mar 29, 2024

Evals-based AI Engineering

Mar 28, 2024

Jamba: Mixture of Architectures dethrones Mixtral

Mar 27, 2024

DBRX: Best open model (just not most efficient)

Mar 27, 2024

Claude 3 is officially America's Next Top Model

Mar 26, 2024

Andrew likes Agents

Mar 22, 2024

not much happened today

Mar 21, 2024

Welcome /r/LocalLlama!

Mar 15, 2024

MM1: Apple's first Large Multimodal Model

Mar 12, 2024

The world's first fully autonomous AI Engineer

Mar 08, 2024

FSDP+QLoRA: the Answer to 70b-scale AI for desktop class GPUs

Mar 05, 2024

Stable Diffusion 3 — Rombach & Esser did it again!

Mar 01, 2024

The Era of 1-bit LLMs

Feb 26, 2024

Mistral Large disappoints

Feb 24, 2024

One Year of Latent Space

Feb 23, 2024

Ring Attention for >1M Context

Feb 21, 2024

Karpathy emerges from stealth?

Feb 20, 2024

Companies liable for AI hallucination is Good Actually for AI Engineers

Feb 16, 2024

Sora pushes SOTA

Feb 13, 2024

The Dissection of Smaug (72B)

Feb 09, 2024

Gemini Ultra is out, to mixed reviews

The Core Skills of AI Engineering

Feb 03, 2024

AI2 releases OLMo - the 4th open-everything LLM

Feb 02, 2024

Trust in GPTs at all time low

Jan 31, 2024

Miqu confirmed to be an early Mistral-medium checkpoint

Jan 30, 2024

CodeLLama 70B beats GPT4 on HumanEval

codellama miqu mistral-medium llama-2-70b aphrodite-engine mixtral flatdolphinmaid noromaid rpcal chatml mistral-7b activation-beacon eagle-7b rwkv-v5 openhermes2.5 nous-hermes-2-mixtral-8x7b-dpo imp-v1-3b bakllava moondream qwen-vl meta-ai-fair ollama nous-research mistral-ai hugging-face ai-ethics alignment gpu-optimization direct-prompt-optimization fine-tuning cuda-programming optimizer-technology quantization multimodality context-length dense-retrieval retrieval-augmented-generation multilinguality model-performance open-source code-generation classification vision

Meta AI surprised the community with the release of CodeLlama, an open-source model now available on platforms like Ollama and MLX for local use. The Miqu model sparked debate over its origins, possibly linked to Mistral Medium or a fine-tuned Llama-2-70b, alongside discussions on AI ethics and alignment risks. The Aphrodite engine showed strong performance on A6000 GPUs with specific configurations. Role-playing AI models such as Mixtral and Flatdolphinmaid faced challenges with repetitiveness, while Noromaid and Rpcal performed better, with ChatML and DPO recommended for improved responses. Learning resources like fast.ai's course were highlighted for ML/DL beginners, and fine-tuning techniques with optimizers like Paged 8bit lion and adafactor were discussed. At Nous Research AI, the Activation Beacon project introduced a method for unlimited context length in LLMs using "global state" tokens, potentially transforming retrieval-augmented models. The Eagle-7B model, based on RWKV-v5, outperformed Mistral in benchmarks with efficiency and multilingual capabilities. OpenHermes2.5 was recommended for consumer hardware due to its quantization methods. Multimodal and domain-specific models like IMP v1-3b, Bakllava, Moondream, and Qwen-vl were explored for classification and vision-language tasks. The community emphasized centralizing AI resources for collaborative research.

Jan 30, 2024

RWKV "Eagle" v5: Your move, Mamba