Frozen AI News archive

Inflection-2.5 at 94% of GPT4, and Pi at 6m MAU

**Mustafa Suleyman** announced **Inflection 2.5**, which achieves *more than 94% the average performance of GPT-4 despite using only 40% the training FLOPs*. **Pi**'s user base is growing about 10% weekly, with new features like real-time web search. The community noted similarities between Inflection 2.5 and **Claude 3 Sonnet**. **Claude 3 Opus** outperformed **GPT-4** in a 1.5:1 community vote and is now the default for **Perplexity Pro** users. **Anthropic** added experimental tool-calling support for Claude 3 via **LangChain**. **LlamaIndex** released LlamaParse JSON Mode for structured PDF parsing and added video retrieval via VideoDB, enabling retrieval-augmented generation (RAG) pipelines. A paper proposed knowledge-augmented planning for LLM agents. New benchmarks like TinyBenchmarks were released, and the **Yi-9B** model shows strong code and math performance, surpassing **Mistral**.
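The LlamaIndex items above feed retrieval-augmented generation pipelines. As a minimal, dependency-free sketch of the retrieval step (the toy corpus and bag-of-words scoring here are illustrative assumptions; production RAG pipelines use learned embeddings and a vector store):

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words term counts as a sparse vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=1):
    """Rank documents by similarity to the query and return the top k."""
    q = vectorize(query)
    return sorted(corpus, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

docs = [
    "Claude 3 Opus outperformed GPT-4 in community votes",
    "LlamaParse JSON mode parses a PDF into structured JSON data",
    "Pi user numbers are growing about 10 percent weekly",
]
print(retrieve("parse a pdf into structured json", docs))
```

The retrieved passages are then stuffed into the LLM prompt as context; swapping the scorer for an embedding model is the only structural change a real pipeline needs.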

Canonical issue URL

Mustafa Suleyman announced Inflection 2.5, which closes much of the gap Inflection had with GPT-4 in an undisclosed compute-efficient way ("achieves more than 94% the average performance of GPT-4 despite using only 40% the training FLOPs", which is funny because those numbers aren't public).
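Taking the headline numbers at face value (they are Inflection's own and, as noted, not independently verifiable), the implied training-compute efficiency is simple arithmetic:

```python
# Inflection's claim: >= 94% of GPT-4's average benchmark performance
# using only 40% of the training FLOPs (both figures are Inflection's own).
relative_performance = 0.94
relative_flops = 0.40

# Implied performance per unit of training compute, relative to GPT-4.
efficiency_gain = relative_performance / relative_flops
print(f"~{efficiency_gain:.2f}x performance per training FLOP")
```

That works out to roughly 2.35x the performance per training FLOP, which is the claim doing the real work in the announcement.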

image.png

But IQ isn't the only metric that matters; they are also optimizing for EQ, which is best proxied by the impressive user numbers they also released for Pi:

image.png

More notes on the Axios exclusive:

image.png

They also released a corrected version of MT-Bench for community use.

The community has also spotted a couple of other interesting tidbits:


Table of Contents

[TOC]


PART X: AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs

Claude 3 Release and Capabilities:

Retrieval Augmented Generation (RAG):

Benchmarking and Evaluation:

AI Research and Techniques:

Memes and Humor:


PART 0: Summary of Summaries of Summaries

Claude 3 Sonnet (14B?)

  1. Model Releases and Comparisons: Multiple new AI models sparked heated discussions around their strengths and limitations. Inflection-2.5 claimed to match GPT-4 performance on benchmarks while using less compute, but faced skepticism from @HlibIvanov, who called it a mere GPT-4 distill lacking innovation. Claude-3 Opus achieved impressive feats like a perfect 800 on the SAT reading section, with @res6969 praising its enhanced knowledge-web construction over 35k tokens. However, @jeffreyw128 noted Claude struggled to find a specific name among 500. Gemma underwhelmed @lee0099 compared to Mistral 7B, especially in multi-turn dialogues, and is limited to English.

  2. Open-Source AI and Community Dynamics: @natolambert vented frustrations over the OSS community's pedantic corrections and lack of perspective, which can deter OSS advocates; even helpful posts face excessive criticism, as he experienced when writing about OSS. The GaLore optimizer by @AnimaAnandkumar promised major memory savings for LLM training, generating excitement from @nafnlaus00 and others about improving accessibility on consumer GPUs. However, @caseus_ questioned GaLore's claimed parity with full pre-training, and integrating GaLore into projects like axolotl faced implementation challenges.

  3. Hardware Optimization for AI Workloads: Optimizing hardware was a key focus, with techniques discussed including pruning, quantization via bitsandbytes, and low-precision operations. @iron_bound highlighted Nvidia's H100 GPU offering 5.5 TB/s of L2 cache bandwidth, while @zippika speculated the RTX 4090's L1 cache could reach 40 TB/s. CUDA implementations like @tspeterkim_89106's Flash Attention aimed for performance gains. However, @marksaroufim warned about coarsening's impact on benchmarking consistency.

  4. AI Applications and Tooling: Innovative AI applications were showcased, like @pradeep1148's Infinite Craft Game and Meme Generation using Mistral. LlamaIndex released LlamaParse JSON Mode for parsing PDFs into structured data. Integrating AI with developer workflows was explored, with @alexatallah offering sponsorships for OpenRouter VSCode extensions, while LangChain's ask-llm library simplified LLM coding integrations.
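On the GaLore memory claims in item 2 above: the savings come from keeping Adam's moment estimates in a low-rank subspace of the gradient rather than at full weight shape. A back-of-the-envelope accounting sketch for a single weight matrix (the sizes, rank, and bookkeeping below are illustrative assumptions, not figures from the paper):

```python
def adam_state_floats(m, n):
    """Full Adam keeps two moment tensors (first and second), each m x n."""
    return 2 * m * n

def galore_state_floats(m, n, r):
    """GaLore-style accounting: moments live in a rank-r subspace (r x n each),
    plus an m x r projection matrix. Illustrative only; see the paper for
    the exact bookkeeping."""
    return 2 * r * n + m * r

# One transformer-sized weight matrix; the rank is an illustrative choice.
m, n, r = 4096, 4096, 128
full = adam_state_floats(m, n)
low = galore_state_floats(m, n, r)
print(f"full Adam: {full:,} floats; GaLore: {low:,} floats; "
      f"saving ~{(1 - low / full) * 100:.0f}%")  # ~95% less optimizer state
```

Whether that accounting translates into parity with full pre-training is exactly the point @caseus_ questioned.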

Claude 3 Opus (8x220B?)

ChatGPT (GPT4T)


PART 1: High level Discord summaries

Perplexity AI Discord Summary


Nous Research AI Discord Summary


LAION Discord Summary


OpenAI Discord Summary


LM Studio Discord Summary


LlamaIndex Discord Summary


Latent Space Discord Summary


Eleuther Discord Summary


HuggingFace Discord Summary


OpenAccess AI Collective (axolotl) Discord Summary


OpenRouter (Alex Atallah) Discord Summary


LangChain AI Discord Summary


CUDA MODE Discord Summary


Interconnects (Nathan Lambert) Discord Summary


Alignment Lab AI Discord Summary


DiscoResearch Discord Summary


LLM Perf Enthusiasts AI Discord Summary


Datasette - LLM (@SimonW) Discord Summary


Skunkworks AI Discord Summary


PART 2: Detailed by-Channel summaries and links

Perplexity AI ▷ #announcements (1 messages):


Perplexity AI ▷ #general (413 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (14 messages🔥):


Perplexity AI ▷ #pplx-api (19 messages🔥):

Links mentioned:

pplx-api: no description found


Nous Research AI ▷ #off-topic (6 messages):

Links mentioned:


Nous Research AI ▷ #interesting-links (44 messages🔥):

Links mentioned:


Nous Research AI ▷ #announcements (1 messages):

<ul>
  <li><strong>New Model Unveiled: Genstruct 7B</strong>: Nous Research announces the release of <strong>Genstruct 7B</strong>, an instruction-generation model that can create valid instructions from raw text, allowing for the creation of new finetuning datasets. The model, inspired by the Ada-Instruct paper, is designed to generate questions for complex scenarios, promoting detailed reasoning.</li>
  <li><strong>User-Informed Generative Training</strong>: The <strong>Genstruct 7B</strong> model is grounded in user-provided context, taking inspiration from Ada-Instruct and pushing it further to enhance the reasoning capabilities of subsequently trained models. Available for download on HuggingFace: [Genstruct 7B on HuggingFace](https://huggingface.co/NousResearch/Genstruct-7B).</li>
  <li><strong>Led by a Visionary</strong>: The development of <strong>Genstruct 7B</strong> was spearheaded by `<@811403041612759080>` at Nous Research, signifying a team investment in innovation for instruction-based model training.</li>
</ul>
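The announcement above describes instruction generation grounded in user-provided context. As a sketch of how such a grounded prompt might be assembled (the `[[[...]]]` tags and field names here are hypothetical, for illustration only; consult the Genstruct-7B model card on HuggingFace for the real template):

```python
def build_generation_prompt(title, passage):
    """Assemble a grounded prompt for instruction generation.
    NOTE: the [[[...]]] tags below are a hypothetical template for
    illustration; consult the NousResearch/Genstruct-7B model card
    for the actual format."""
    return (
        "[[[Title]]]\n"
        f"{title}\n\n"
        "[[[Content]]]\n"
        f"{passage}\n\n"
        # The model is expected to continue from here with a
        # grounded question and detailed reasoning.
        "[[[User]]]"
    )

prompt = build_generation_prompt(
    "Genstruct 7B",
    "Genstruct 7B generates valid instructions from raw text corpora.",
)
print(prompt)
```

The generated question/answer pairs can then be collected into a new finetuning dataset, which is the workflow the announcement describes.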

Links mentioned:

NousResearch/Genstruct-7B · Hugging Face: no description found


Nous Research AI ▷ #general (329 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (41 messages🔥):

Links mentioned:


LAION ▷ #general (300 messages🔥🔥):

Links mentioned:


LAION ▷ #research (74 messages🔥🔥):

Links mentioned:

Neverseenagain Yourleaving GIF - Neverseenagain Yourleaving Oh - Discover & Share GIFs: Click to view the GIF


OpenAI ▷ #ai-discussions (204 messages🔥🔥):

Links mentioned:

EvalPlus Leaderboard: no description found


OpenAI ▷ #gpt-4-discussions (29 messages🔥):

Links mentioned:


OpenAI ▷ #prompt-engineering (68 messages🔥🔥):


OpenAI ▷ #api-discussions (68 messages🔥🔥):


LM Studio ▷ #💬-general (187 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (27 messages🔥):

Links mentioned:

Kquant03/TechxGenus-starcoder2-15b-instruct-GGUF · Hugging Face: no description found


LM Studio ▷ #🧠-feedback (4 messages):


LM Studio ▷ #🎛-hardware-discussion (46 messages🔥):

Links mentioned:


LM Studio ▷ #crew-ai (2 messages):


LM Studio ▷ #open-interpreter (85 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #announcements (1 messages):

Links mentioned:

LlamaIndex user survey: Take this survey powered by surveymonkey.com. Create your own surveys for free.


LlamaIndex ▷ #blog (5 messages):

Links mentioned:

LlamaIndex user survey: Take this survey powered by surveymonkey.com. Create your own surveys for free.


LlamaIndex ▷ #general (298 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (1 messages):

Links mentioned:

GitHub - mominabbass/LinC: Code for "Enhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration": Code for "Enhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration" - mominabbass/LinC


Latent Space ▷ #ai-general-chat (14 messages🔥):

Links mentioned:


Latent Space ▷ #ai-announcements (4 messages):


Latent Space ▷ #llm-paper-club-west (204 messages🔥🔥):

Links mentioned:


Eleuther ▷ #announcements (1 messages):


Eleuther ▷ #general (78 messages🔥🔥):

Links mentioned:


Eleuther ▷ #research (77 messages🔥🔥):

Links mentioned:


Eleuther ▷ #lm-thunderdome (15 messages🔥):

Links mentioned:


Eleuther ▷ #gpt-neox-dev (34 messages🔥):

Links mentioned:




HuggingFace ▷ #general (106 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (10 messages🔥):


HuggingFace ▷ #cool-finds (11 messages🔥):

Links mentioned:


HuggingFace ▷ #i-made-this (18 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (9 messages🔥):

Links mentioned:


HuggingFace ▷ #diffusion-discussions (2 messages):

Links mentioned:

ByteDance/SDXL-Lightning · finetune: no description found


HuggingFace ▷ #computer-vision (7 messages):


HuggingFace ▷ #NLP (23 messages🔥):

Links mentioned:

Golden Retriever Dog GIF - Golden Retriever Dog Puppy - Discover & Share GIFs: Click to view the GIF




HuggingFace ▷ #gradio-announcements (1 messages):

<ul>
  <li><strong>Gradio 4.20.0 Unleashed with External Auth Providers</strong>: User <code>@yuviii_</code> announced Gradio's newest version 4.20.0 which supports <strong>external / arbitrary authentication providers</strong>, including HF OAuth and Google OAuth, enhancing app security and user flexibility. Check out the examples on HF Spaces - [HF OAuth Example](https://huggingface.co/spaces/Wauplin/gradio-oauth-private-models) and [Google OAuth Example](https://huggingface.co/spaces/gradio/oauth-example).</li>
  <li><strong>Clean Up With Ease</strong>: The latest Gradio update introduces a <code>delete_cache</code> parameter to <code>gr.Blocks</code>, allowing for automatic cleanup of files upon app shutdown.</li>
  <li><strong>Smooth User Logout Experience</strong>: Users can now enjoy a smoother sign-off with Gradio's new <code>/logout</code> functionality.</li>
  <li><strong>Stylish Downloads with Gradio</strong>: The <code>gr.DownloadButton</code> component is now available, making the provision of downloadable content in apps easier and more visually appealing. For more information, visit the [documentation for gr.DownloadButton](https://www.gradio.app/docs/downloadbutton#demos).</li>
</ul>

Links mentioned:

Gradio DownloadButton Docs: no description found


OpenAccess AI Collective (axolotl) ▷ #general (44 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (45 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (20 messages🔥):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):


OpenRouter (Alex Atallah) ▷ #general (94 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #general (55 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #langchain-templates (9 messages🔥):


LangChain AI ▷ #share-your-work (7 messages):

Links mentioned:


LangChain AI ▷ #tutorials (2 messages):

Links mentioned:


CUDA MODE ▷ #general (6 messages):

Links mentioned:


CUDA MODE ▷ #cuda (30 messages🔥):

Links mentioned:


CUDA MODE ▷ #torch (11 messages🔥):

Links mentioned:


CUDA MODE ▷ #algorithms (1 messages):

Links mentioned:

GitHub - rayleizhu/vllm-ra: vLLM with RelayAttention integration: vLLM with RelayAttention integration. Contribute to rayleizhu/vllm-ra development by creating an account on GitHub.


CUDA MODE ▷ #ring-attention (23 messages🔥):

Links mentioned:


CUDA MODE ▷ #off-topic (1 messages):


Interconnects (Nathan Lambert) ▷ #news (4 messages):

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (48 messages🔥):

Links mentioned:

Aggregator’s AI Risk: A single AI can never make everyone happy, which is fundamentally threatening to the Aggregator business model; the solution is personalized AI


Interconnects (Nathan Lambert) ▷ #random (19 messages🔥):

Links mentioned:


Alignment Lab AI ▷ #general-chat (1 messages):


Alignment Lab AI ▷ #oo (16 messages🔥):

Links mentioned:


Alignment Lab AI ▷ #oo2 (10 messages🔥):


DiscoResearch ▷ #general (9 messages🔥):

Links mentioned:

Reliable, Adaptable, and Attributable Language Models with Retrieval: Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability. However, they still face practical challenges such as hallucinations, di...


DiscoResearch ▷ #embedding_dev (13 messages🔥):

Links mentioned:


LLM Perf Enthusiasts AI ▷ #claude (13 messages🔥):


LLM Perf Enthusiasts AI ▷ #prompting (1 messages):



Datasette - LLM (@SimonW) ▷ #ai (5 messages):

Links mentioned:

Making my bookshelves clickable | James' Coffee Blog: no description found


Datasette - LLM (@SimonW) ▷ #llm (8 messages🔥):


Skunkworks AI ▷ #off-topic (2 messages):

Links mentioned: