Frozen AI News archive

OpenAI launches Operator, its first Agent

**OpenAI** launched **Operator**, a premium computer-using agent for web tasks such as booking and ordering, available now to Pro users in the US, with an API promised. It runs long-horizon sessions on remote VMs for up to 20 minutes and supports video export, showing state-of-the-art agent performance that is not yet human-level. **Anthropic** had launched a similar agent 3 months earlier as an open-source demo. **DeepSeek AI** unveiled **DeepSeek R1**, an open-source reasoning model excelling on the **Humanity's Last Exam** dataset, outperforming models like **LLaMA 4** and **OpenAI's o1**. **Alibaba DAMO Academy** open-sourced **VideoLLaMA 3**, a multimodal foundation model for image and video understanding. **Perplexity AI** released **Perplexity Assistant** for Android with reasoning and search capabilities. The **Humanity's Last Exam** dataset contains 3,000 questions testing AI reasoning, with current models scoring below 10% accuracy, indicating substantial room for improvement. OpenAI's Computer-Using Agent (CUA) shows improved performance on the OSWorld and WebArena benchmarks but still lags behind humans. **Anthropic AI** introduced Citations for more trustworthy, source-grounded AI responses. *Sam Altman* and *Swyx* commented on Operator's launch and capabilities.

AI News for 1/22/2025-1/23/2025. We checked 7 subreddits, 433 Twitters and 34 Discords (225 channels, and 4386 messages) for you. Estimated reading time saved (at 200wpm): 483 minutes. You can now tag @smol_ai for AINews discussions!

As widely rumored, OpenAI launched their computer use agent, 3 months after Anthropic's equivalent:

https://www.youtube.com/watch?v=CSE77wAdDLg


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

All recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Models and Releases

AI Benchmarks and Evaluation

AI Safety and Ethics

AI Research and Development

AI Industry and Companies

Memes/Humor



AI Reddit Recap

/r/LocalLlama Recap

Theme 1. DeepSeek's Competitiveness Shakes Tech Giants

Theme 2. Advanced LLM Architectures: Byte-Level Models and Reasoning Agents

Theme 3. Tooling for Better Reasoning in AI Models: Enhancements in Open WebUI

Theme 4. NVIDIA's GPU Innovations for Enhanced AI: Blackwell and Long Context Libraries

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. OpenAI launches Operator Tool for Computers

Theme 2. OpenAI's Vision for AI Agents by 2025


AI Discord Recap

A summary of Summaries of Summaries by o1-preview-2024-09-12

Theme 1. DeepSeek R1 vs Existing Models: Capabilities and Controversies

Theme 2. OpenAI's Operator and Agents: New Features and User Reactions

Theme 3. AI Assistants and IDEs: Cursor, Codeium Windsurf, Aider, and JetBrains

Theme 4. AI Model Development and Multi-GPU Support

Theme 5. Hardware and Performance Discussions: GPUs, CUDA Updates, and Training Large Models


PART 1: High level Discord summaries

Cursor IDE Discord


Codeium (Windsurf) Discord


Unsloth AI (Daniel Han) Discord


LM Studio Discord


Perplexity AI Discord


OpenRouter (Alex Atallah) Discord


aider (Paul Gauthier) Discord


OpenAI Discord


Yannick Kilcher Discord


Nous Research AI Discord


Stackblitz (Bolt.new) Discord


Latent Space Discord


GPU MODE Discord


MCP (Glama) Discord


Nomic.ai (GPT4All) Discord


Stability.ai (Stable Diffusion) Discord


Eleuther Discord


LlamaIndex Discord


Cohere Discord


Notebook LM Discord Discord


Modular (Mojo 🔥) Discord


LAION Discord


DSPy Discord


tinygrad (George Hotz) Discord


LLM Agents (Berkeley MOOC) Discord


Gorilla LLM (Berkeley Function Calling) Discord


Axolotl AI Discord


MLOps @Chipro Discord


Mozilla AI Discord


OpenInterpreter Discord


The Torchtune Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Cursor IDE ▷ #general (655 messages🔥🔥🔥):

DeepSeek R1, OpenAI o1, Chat vs Composer Mode, AI Agentic Models, Usage-Based Pricing


Codeium (Windsurf) ▷ #content (1 message):

Web Search Feature, Demo Video Launch

Link mentioned: Tweet from Windsurf (@windsurf_ai): Just surfin' the web! 🏄


Codeium (Windsurf) ▷ #discussion (49 messages🔥):

Codeium extension features, Devin's capabilities, Web search for Codeium, Supercomplete in JetBrains, Windsurf updates and issues


Codeium (Windsurf) ▷ #windsurf (493 messages🔥🔥🔥):

Windsurf issues with credits, Windsurf login errors, Development tools and setups for mobile apps, Comparison of AI models, User experiences with Windsurf


Unsloth AI (Daniel Han) ▷ #general (347 messages🔥🔥):

DeepSeek R1 and Qwen, Multi-GPU Support in Unsloth, Fine-tuning for Non-English Languages, Tokenization Challenges in Biology, Evaluation and Training Strategies


Unsloth AI (Daniel Han) ▷ #off-topic (24 messages🔥):

DeepSeek V3 Hardware Requirements, Dual RTX 3090s Setup, Training Reasoning Models, Dolphin-R1 Dataset, Unsloth Integration with TRL


Unsloth AI (Daniel Han) ▷ #help (79 messages🔥🔥):

DeepSeek Distilled Models, Unsloth Notebooks, Model Fine-tuning Issues, VRAM Consumption, Dataset Management


Unsloth AI (Daniel Han) ▷ #research (11 messages🔥):

Finetuning Striped Hyena Model, Unsloth GPU Support, Genomic Data Pretraining

Link mentioned: GitHub - togethercomputer/stripedhyena: Repository for StripedHyena, a state-of-the-art beyond Transformer architecture


LM Studio ▷ #general (173 messages🔥🔥):

DeepSeek models, LM Studio error troubleshooting, Quantization effects on model performance, Local network accessibility in LM Studio, Gemini 2.0 performance


LM Studio ▷ #hardware-discussion (143 messages🔥🔥):

NVIDIA RTX 5090 Performance, Llama Model Benchmarking, AVX2 Requirements for GPU Usage, AI Inference API Subsidies, Procyon AI Text Generation Benchmark


Perplexity AI ▷ #announcements (1 message):

Perplexity Assistant Launch, Assistant Features, Integration with Other Apps


Perplexity AI ▷ #general (250 messages🔥🔥):

Perplexity Assistant Issues, Model Selection Challenges, Sonar Model Changes, AI Output Quality Comparisons, New Features in Perplexity


Perplexity AI ▷ #sharing (11 messages🔥):

PyCTC Decode, Mistral Plans IPO, CIA Chatbot, Stargate Initiative, DeepSeek R1

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (11 messages🔥):

API Issues, API SOC 2 Compliance, Sonar vs Legacy Models, Retrieving Old Responses by ID, Sonar-Pro Multi-Step Goals


OpenRouter (Alex Atallah) ▷ #announcements (6 messages):

Web Search API Launch, Reasoning Tokens Introduced, Web Search Pricing Update, Model Standardization Improvements, Announcement Prematurely Made


OpenRouter (Alex Atallah) ▷ #general (244 messages🔥🔥):

DeepSeek R1, API Features and Issues, Web Search Pricing, Model Performance Comparisons, Payment Methods for Credits


aider (Paul Gauthier) ▷ #general (154 messages🔥🔥):

Aider Configuration, DeepSeek R1 Performance, Using Multiple LLMs, Chat Mode Operations, Citations API from Anthropic


aider (Paul Gauthier) ▷ #questions-and-tips (79 messages🔥🔥):

Aider Installation Issues, Using Aider with Docker, Aider Logging Practices, Aider and Large Codebases, Models for Aider Usage

Link mentioned: Installation: How to install and get started pair programming with aider.


aider (Paul Gauthier) ▷ #links (10 messages🔥):

JetBrains AI, Cursor & Windsurf competition, VSCode vs JetBrains, Continue feature use, User waitlist inquiries


OpenAI ▷ #annnouncements (1 message):

Operator introduction, OpenAI presentation


OpenAI ▷ #ai-discussions (170 messages🔥🔥):

DeepSeek R1 Performance, Operator Features, Usage of Perplexity Assistant, OpenAI API Comparisons, Spiking Neural Networks Discussion


OpenAI ▷ #gpt-4-discussions (9 messages🔥):

GPT Outage, Voice Feature Issues, Status Updates, Attributing Blame for Downtime

Link mentioned: OpenAI Status: no description found


OpenAI ▷ #prompt-engineering (14 messages🔥):

OCR use cases, GIS datasets improvement, Task prompt sharing, ASMR meta-prompts, Content creation strategies


OpenAI ▷ #api-discussions (14 messages🔥):

Using prompts effectively, Task prompts for GPT, ASMR meta-prompts, Content creation for social media, Daily news digests


Yannick Kilcher ▷ #general (160 messages🔥🔥):

AI Model Limitations, National Energy Emergency Declaration, Employment Opportunities in AI, OpenAI's Future Plans, Math Problem Solving with LLMs


Yannick Kilcher ▷ #paper-discussion (10 messages🔥):

DeepSeek memory requirements, Generalized Spatial Propagation Network, Reinforcement Learning reward hacking


Yannick Kilcher ▷ #agents (1 message):

IntellAgent, Conversational Agents Evaluation, Research Insights

Link mentioned: GitHub - plurai-ai/intellagent: A framework for comprehensive diagnosis and evaluation of conversational agents using simulated, realistic synthetic interactions


Yannick Kilcher ▷ #ml-news (10 messages🔥):

OpenAI Operator, Kanye West AI Project, ChatGPT Free Tier Updates, Humanity's Last Exam, R1 Competitive Landscape


Nous Research AI ▷ #general (162 messages🔥🔥):

AI and AGI Predictions, AI Implementation for Companies, Olweus Bullying Victimization Questionnaire, GPU Comparisons and Choices, Model Training Strategies


Nous Research AI ▷ #ask-about-llms (6 messages):

Synthetic Data Generation, R1 Dataset Availability, Olweus Bullying Victimization Questionnaire


Nous Research AI ▷ #research-papers (4 messages):

Human-AI representation similarities, Diffusion models optimization


Nous Research AI ▷ #interesting-links (3 messages):

Evabyte Architecture, Tensor Network ML Library, Symbolic Reasoning in ML, Graph Isomorphism Optimization


Nous Research AI ▷ #research-papers (4 messages):

Human-AI Representation Alignment, Optimization for Diffusion Models


Nous Research AI ▷ #reasoning-tasks (1 message):

lowiqgenai: Hey i did some using MistralAI free Services fhai50032/medmcqa-solved-thinking-o1


Stackblitz (Bolt.new) ▷ #announcements (1 message):

katetra: https://x.com/boltdotnew/status/1882483266680406527


Stackblitz (Bolt.new) ▷ #prompting (1 message):

cwinhall: You need to connect supabase. It's trying to use that for the database


Stackblitz (Bolt.new) ▷ #discussions (143 messages🔥🔥):

Bolt and Stripe Integration, Token Allocation Issues, Chat Persistence Problems, 3D Model Viewer Implementation, Payment System Suggestions


Latent Space ▷ #ai-general-chat (139 messages🔥🔥):

OpenAI Operator Launch, Imagen 3 performance, DeepSeek model advancements, Fireworks AI transcription service, API revenue sharing


GPU MODE ▷ #general (5 messages):

R1 jailbreak risks, DDoS capabilities of AI, End-to-End LLM solutions

Link mentioned: YouTube: no description found


GPU MODE ▷ #triton (9 messages🔥):

Step Execution Order, Data Overwriting Issue, Compiler Behavior, Variable Changes


GPU MODE ▷ #cuda (49 messages🔥):

CUDA Toolkit 12.8 Release, Accel-Sim Framework, New Tensor Instructions, FP8 and FP4 Data Types, Blackwell Architecture Enhancements


GPU MODE ▷ #torch (8 messages🔥):

Torch Profiler Function Timing, Learning Rate Scheduling Techniques, Memory Allocation in Torch Compile


GPU MODE ▷ #jobs (1 message):

ComfyUI Hiring, Machine Learning Engineers, Open Source Contributions


GPU MODE ▷ #beginner (14 messages🔥):

Choosing a GPU for Programming, Running Large Models Locally, Multi-GPU Training with Florence-2, Cost-effective GPU Options, Cloud GPU Services


GPU MODE ▷ #self-promotion (4 messages):

LeetGPU.com support, ComfyUI event, Performance enhancement tips, Community engagement, Workflows sharing


GPU MODE ▷ #thunderkittens (1 message):

Tensor accumulation in CUDA kernels, Setup of accumulators

Link mentioned: ThunderKittens/kernels/matmul/H100/matmul.cu at main · HazyResearch/ThunderKittens: Tile primitives for speedy kernels. Contribute to HazyResearch/ThunderKittens development by creating an account on GitHub.


GPU MODE ▷ #arc-agi-2 (8 messages🔥):

Tiny GRPO Repository, Reasoning Gym Project, Accessible RL Tutorials, Cloud Computing Options, Community Contributions


MCP (Glama) ▷ #general (87 messages🔥🔥):

MCP Server Improvements, Podman vs Docker, Line Number Handling in Code, MCP Client Interaction, Timeout Issues with MCP Servers


MCP (Glama) ▷ #showcase (7 messages):

Anthropic TS Client Issue, Puppeteer for Browser Automation, SSE Client Example Correction


Nomic.ai (GPT4All) ▷ #announcements (1 message):

GPT4All v3.7.0 Release, Windows ARM Support, macOS Updates, Code Interpreter Improvements, Chat Templating Fixes


Nomic.ai (GPT4All) ▷ #general (64 messages🔥🔥):

ChatGPT access and limitations, Prompt engineering, Model compatibility and selection, Issues with Jinja templates, NSFW content generation

Link mentioned: Chat Templates - GPT4All: GPT4All Docs - run LLMs efficiently on your hardware


Stability.ai (Stable Diffusion) ▷ #general-chat (59 messages🔥🔥):

CivitAI Downtime, Image Generation Techniques, GPU Comparisons, AI Model Training, Clip Skip Settings


Eleuther ▷ #general (22 messages🔥):

Google's Titans, Difficulty of Titans Paper, Pretraining Context in Models, Interpretability and Diffusion Models, Reward Systems in Model Distillation

Link mentioned: Google Research Unveils "Transformers 2.0" aka TITANS: Have we finally cracked the code on how to give models "human-like" memory? Watch to find out! Join My Newsletter for Regular AI Updates 👇🏼 https://forwardfu...


Eleuther ▷ #research (27 messages🔥):

Feature Learning with Egomotion, LLM Explainability and Security Feedback, Multi-Turn Reasoning in Models, Learned Optimizers, Distributional Dynamic Programming


Eleuther ▷ #lm-thunderdome (1 message):

Ruler tasks, Long context tasks


LlamaIndex ▷ #blog (2 messages):

Open-source RAG system, LlamaIndex, AI Chrome extensions


LlamaIndex ▷ #general (44 messages🔥):

AgentWorkflow Improvements, Multi-Agent Workflows, Agent vs Tool Clarification, Dynamic Memory Management, LlamaIndex Documentation Issues


Cohere ▷ #discussions (9 messages🔥):

Cohere LCoT models, Pydantic support for Cohere, COT prompting techniques

Link mentioned: Tossing Hat GIF - Jeff Bridges Agent Champagne Kingsman Golden Circle - Discover & Share GIFs: Click to view the GIF


Cohere ▷ #api-discussions (3 messages):

Cohere API endpoints, Latency issues in South America, Cohere Reranker models on premise


Cohere ▷ #cmd-r-bot (29 messages🔥):

Artificial Superintelligence (ASI), Cohere Documentation Queries


Notebook LM Discord ▷ #use-cases (4 messages):

NotebookLM Journey, Obsidian Plugins, Audio Generation Issues, Note Saving Limits

Link mentioned: NotebookLM: The AI Tool That Will Change Your Study Habits: In this video I share the Google NotebookLM features I use for studying. 00:00 Introduction, 00:46 Workflow, 01:49 Feature 1, 02:50 Feature 2, 03:55 Feature 3, 04:50 Fe...


Notebook LM Discord ▷ #general (35 messages🔥):

Podcast Creation, NotebookLM Language Settings, Test Questions Format, Downloading Audio Issues, Document Comparison with NotebookLM


Modular (Mojo 🔥) ▷ #general (9 messages🔥):

Asynchronous code in Mojo, Sharing forum posts


Modular (Mojo 🔥) ▷ #announcements (1 message):

MAX Builds page launch, Community-built packages, Package submission instructions


Modular (Mojo 🔥) ▷ #mojo (27 messages🔥):

Overriding functions in Mojo, Python-style generators in Mojo, Re-assigning variables in Mojo function definitions, __iadd__ method in Mojo


LAION ▷ #general (27 messages🔥):

Open Source TTS, Audio Distortions Visualization, Colab Notebook Sharing, Pydub Implementations, Audio Output Widgets

Link mentioned: Google Colab: no description found


DSPy ▷ #general (5 messages):

Repo Spam Concerns, Adaptation vs. Imitation in Frameworks, Using External Libraries with DSPy


DSPy ▷ #examples (2 messages):

OpenAI Model, Groq Integration


tinygrad (George Hotz) ▷ #general (7 messages):

llvm_bf16_cast PR, shapetracker add problem, bounty suggestions, mask shrinks and views


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (6 messages):

Course Certificates, LLM MOOC Enrollment


Gorilla LLM (Berkeley Function Calling) ▷ #leaderboard (3 messages):

BFCLV3 LLMs testing, Tool relationships in LLMs, Research on BFCLV3 dataset

Link mentioned: gorilla/berkeley-function-call-leaderboard/data at main · ShishirPatil/gorilla: Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls) - ShishirPatil/gorilla


Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 message):

BFCLV3 System Message, LLMs Tool Dependency


Axolotl AI ▷ #general (2 messages):

KTO Loss Merge, Office Hours Announcement


MLOps @Chipro ▷ #events (1 message):

Event for Senior Engineers/Data Scientists, Networking Opportunities in Toronto


Mozilla AI ▷ #announcements (1 message):

Local-First X AI Hackathon, Event Discussion Thread


OpenInterpreter ▷ #ai-content (1 message):

fund21: ye how we can you integrate Deepspeek on >interpreter --os mode ?




{% else %}

The full channel-by-channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}