All tags
Model: "gemini-2.0"
ChatGPT Codex, OpenAI's first cloud SWE agent
codex-1 openai-o3 codex-mini gemma-3 blip3-o qwen-2.5 marigold-iid deepseek-v3 lightlab gemini-2.0 lumina-next openai runway salesforce qwen deepseek google google-deepmind j1 software-engineering parallel-processing multimodality diffusion-models depth-estimation scaling-laws reinforcement-learning fine-tuning model-performance multi-turn-conversation reasoning audio-processing sama kevinweil omarsar0 iscienceluvr akhaliq osanseviero c_valenzuelab mervenoyann arankomatsuzaki jasonwei demishassabis philschmid swyx teortaxestex jaseweston
OpenAI launched Codex, a cloud-based software engineering agent powered by codex-1 (an optimized version of OpenAI o3) available in research preview for Pro, Enterprise, and Team ChatGPT users, featuring parallel task execution like refactoring and bug fixing. The Codex CLI was enhanced with quick sign-in and a new low-latency model, codex-mini. Gemma 3 is highlighted as the best open model runnable on a single GPU. Runway released the Gen-4 References API for style transfer in generation. Salesforce introduced BLIP3-o, a unified multimodal model family using diffusion transformers for CLIP image features. The Qwen 2.5 models (1.5B and 3B versions) were integrated into the PocketPal app with various chat templates. Marigold IID, a new state-of-the-art open-source depth estimation model, was released.
In research, DeepSeek shared insights on scaling and hardware for DeepSeek-V3. Google unveiled LightLab, a diffusion-based light source control in images. Google DeepMind's AlphaEvolve uses Gemini 2.0 to discover new math and reduce costs without reinforcement learning. Omni-R1 studied audio's role in fine-tuning audio LLMs. Qwen proposed a parallel scaling law inspired by classifier-free guidance. Salesforce released Lumina-Next on the Qwen base, outperforming Janus-Pro. A study found LLM performance degrades in multi-turn conversations due to unreliability. J1 is incentivizing LLM-as-a-Judge thinking via reinforcement learning. A new Qwen study correlates question and strategy similarity to predict reasoning strategies.
DeepSeek's Open Source Stack
qwen-qwq-32b start character-3 gemini gemini-2.0 mercury-coder gpt-4.5 jamba-mini-1.6 gemini-2.0-flash gpt-4o-mini mistral-small-3 mistral-ocr deepseek pyspur hugging-face togethercompute hedra-labs google-deepmind deeplearningai openai ai21-labs mistral-ai fine-tuning benchmarking multimodality code-generation diffusion-models model-performance model-optimization ocr embedding-models context-windows runtime-limits _akhaliq lmarena_ai reach_vb danielhanchen _philschmid aidan_mclau vikhyatk jerryjliu0
DeepSeek's Open Source Week was summarized by PySpur, highlighting multiple interesting releases. The Qwen QwQ-32B model was fine-tuned into START, excelling in PhD-level science QA and math benchmarks. Character-3, an omnimodal AI video generation model by Hedra Labs and Together AI, enables realistic animated content creation. Google DeepMind introduced the Gemini embedding model with an 8k context window, ranking #1 on MMTEB, alongside the Gemini 2.0 Code Executor supporting Python libraries and auto-fix features. Inception Labs' Mercury Coder is a diffusion-based code generation model offering faster token processing. OpenAI released GPT-4.5, their largest model yet but with less reasoning ability than some competitors. AI21 Labs launched Jamba Mini 1.6, noted for superior output speed compared to Gemini 2.0 Flash, GPT-4o mini, and Mistral Small 3. A new dataset of 1.9M scanned pages was released for OCR benchmarking, with Mistral OCR showing competitive but not top-tier document parsing performance compared to LLM/LVM-powered methods. "Cracked engineers are all you need."
PRIME: Process Reinforcement through Implicit Rewards
claude-3.5-sonnet gpt-4o deepseek-v3 gemini-2.0 openai together-ai deepseek langchain lucidrains reinforcement-learning scaling-laws model-performance agent-architecture software-development compute-scaling multi-expert-models sama aidan_mclau omarsar0 akhaliq hwchase17 tom_doerr lmarena_ai cwolferesearch richardmcngo
Implicit Process Reward Models (PRIME) have been highlighted as a significant advancement in online reinforcement learning, trained on a 7B model with impressive results compared to gpt-4o. The approach builds on the importance of process reward models established by "Let's Verify Step By Step." Additionally, AI Twitter discussions cover topics such as proto-AGI capabilities with claude-3.5-sonnet, the role of compute scaling for Artificial Superintelligence (ASI), and model performance nuances. New AI tools like Gemini 2.0 coder mode and LangGraph Studio enhance agent architecture and software development. Industry events include the LangChain AI Agent Conference and meetups fostering AI community connections. Company updates reveal OpenAI's financial challenges with Pro subscriptions and DeepSeek-V3's integration with Together AI APIs, showcasing efficient 671B MoE parameter models. Research discussions focus on scaling laws and compute efficiency in large language models.