Frozen AI News archive

Gemini Pro and GPT4T Vision go GA on the same day by complete coincidence

At **Google Cloud Next**, **Gemini 1.5 Pro** was released with a **million-token context window**, available in **180+ countries**, featuring **9.5 hours of audio understanding**, a new **File API** for nearly unlimited free uploads, and the **Gecko-1b-256/768 embedding model**. **GPT-4 Turbo with Vision** became generally available in the API with a major update improving reasoning capabilities. **Meta Platforms** plans to launch smaller versions of **Llama 3** next week. The **Orca 2.5 7B** model using Direct Nash Optimization outperforms older GPT-4 versions in AlpacaEval. New releases include **Functionary-V2.4** with enhanced function calling and code interpretation, and **CosXL** models for image editing. Research highlights include continuous U-Nets for diffusion models achieving up to **80% faster inference** and a massive multilingual dataset with **~5.6 trillion word tokens**. Creative applications include a no-code touch screen game made with Gemini 1.5 and AI-generated novel trailers.

Canonical issue URL

Incremental improvements, but big nonetheless:

Lots more smaller updates from Cohere Command R to Google CodeGemma in the recaps below.


Table of Contents

[TOC]


AI Reddit Recap

Across r/LocalLLaMA, r/MachineLearning, r/OpenAI, r/StableDiffusion, r/ArtificialInteligence. Comment crawling still not implemented but coming soon.

Latest AI Model Developments

Efficient AI Techniques

Creative Applications

Scaling AI Infrastructure

Responsible AI Development

Memes and Humor


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Cohere Command R+ Model Performance

Other Notable Open Model Releases and Updates

Emerging Trends and Discussions

Memes and Humor


AI Discord Recap

A summary of Summaries of Summaries

1. New AI Model Releases and Capabilities:

2. Efficient LLM Training and Deployment Approaches:

3. AI Assistants and Multimodal Interactions:

4. Open-Source AI Frameworks and Community Efforts:

5. Misc Updates:


PART 1: High-level Discord summaries

Perplexity AI Discord

APIs Battle for Supremacy: The community is buzzing with debates over the merits of Perplexity Pro versus Claude 3 Opus, contrasting Perplexity's flexible usage with Claude 3's superior writing but restrictive conditions. Anticipation builds around the GPT-4 Turbo API, as engineers eagerly await upgrades that could mirror Claude 3 Opus's performance enhancements.

Perplexity's Prowess in Preview: Enthusiasm surrounds Gemini 1.5, with its potential to rival GPT-4 and exceed expectations with a larger context window and multimedia support. Meanwhile, ChatGPT Plus faces scrutiny against free AI options, with Perplexity Pro's web search feature standing out among commentary.

Helpers and Handlers in the Limelight: The Harpa AI browser extension garners attention as a potent web automation tool, streamlining tasks such as content summarization and email explanation, easing the workflow for engineers.

Perplexity's API Conundrums and Triumphs: Discussions traverse the landscape of Perplexity's API offerings, from handling public PDFs/TXT files, the absence of the pplx-pro model through API access, to the resolution of an API balance top-up issue. A newly released Perplexity API Ruby client stirs the community, while inquiries for a Perplexity-specific token calculation tool reflect ongoing optimization efforts.

Media, Models, and the Multiverse: Diverse links shared across the guild, from in-depth explorations of GPT origins to the trusted-AI interview Workflows & Tooling to Create Trusted AI, mirror the broad spectrum of interests among AI engineers, with Perplexity AI searches being the oft-chosen portal for their multifarious quests for knowledge.


Stability.ai (Stable Diffusion) Discord


Nous Research AI Discord

Call for Hackathon Project Ideas: During a brainstorming session for a hackathon, participants discussed cool AI projects, including fine-tuning Mistral on academic papers and examining projects from a previous Mistral hackathon.

GPT-4 Might Still Fail the Apple Test: Chat about GPT-4 updates highlighted that the model still struggles with the "apple test" at a temperature setting of 0.7; there were also invitations to collaborate on the IMO math prize, an effort that would likely demand substantial computational resources.

Reviewing Fine-Tuning Techniques for Chat: A heated debate on fine-tuning chat models unfolded, with a focus on comparing the efficacy of Direct Preference Optimization (DPO) and other methods like SFT+KTO and Microsoft's DNO.

AI Renovation: Training Efficiency and Model Performance: Engineers are excited about Karpathy's leaner approach to LLM training with GPT-2 in C, and StableLM 2 12B, pre-trained on 2 trillion tokens, is also generating buzz, while anticipation builds for Hermes 2.5 potentially outperforming its predecessors.

nanoLLaVA Emerges for Edge Efficiency: The release of a sub-1B vision-language model, nanoLLaVA, aimed at edge devices, stoked conversations about its anticipated integration with Obsidian and Hermes Vision.


LM Studio Discord

GPU or Bust: AI enthusiasts debated hardware configurations for optimizing AI model performance, noting that a 5800X with 64GB RAM plus GPU offload improves efficiency. A swap from an i3 setup to a 14700K with 96GB RAM at 6400MHz showed minimal inference speed gains, hinting that VRAM is the bottleneck.

LLM's New Power Players:

Techies Query Model Bits and Pieces: Queries arose regarding GGUF quantization and llama.cpp downloads, while GPU model compatibilities were explored, considering options like P40 with external cooling. Insights on "instruct" models and mixtures of experts being a cut above for command accuracy were shared.

LM Studio Gets Text-Embedding Update: Version 0.2.19 of LM Studio hit the virtual shelves, offering new text embedding support and quality-of-life improvements for AI researchers. Download links for Windows, Mac, and Linux are available, with additional RAM allocation discussion for larger models.

New Model Drops and Community Shares: Google's market-oriented models, Gemma 1.1 and CodeGemma series, are shared by the community, making waves for their memory-efficient design and instruction-following dexterity, respectively. These models are positioned as reliable resources for AI engineers, accessible via Hugging Face.


OpenAI Discord

Intelligence Debate Gets Brainy: Engineers have been discussing the nature of intelligence with references to "Uniquely human intelligence" and considering whether AI, like Claude.ai or GPT, could have aspects of qualia or consciousness. The technical depth escalated as members shared academic perspectives on human cognition and evolutionary theories.

GPT-5 Release Sparking Speculation: Anticipation for GPT-5's launch provoked discussion on the supposed challenges and timeframe, comparing current options like Claude 3 Opus and Gemini 1.5 Pro for programming aid, and grappling with regional availability.

Artistic Algorithms Stir Up Ethics Chat: A contentious debate on the ethics of AI-generated art vs human creativity surfaced, touching on appreciation, sentiment analysis, and platform policies like YouTube's ToS possibly clashing with content creation practices.

Task Mastering with LLMs? Query Away: Queries about large language models' (LLMs) capacity to simplify complex tasks surfaced, sparking conversation on the need for additional systems for task management despite AI's assistance, echoing broader issues of integrating AI's capacities with practical work structures.

Prompt Engineering—A Guiding Light or a Blind Spot?: Discussions on prompt modularity likened the GPT environment to a modular OS, but raised concerns over transparency and changes to default system prompts. Highlighted was the difficulty in separating system prompts and the non-deterministic results of prompt injection techniques, noting issues of stewardship and AI ethics.


LlamaIndex Discord

New Paths in RAG Improvement: Fanghua Yu has shared a method that improves Retrieval-Augmented Generation (RAG) performance by using LlamaParse to extract a document knowledge graph. The details and evaluation of various RAG techniques, including Matous Eibich's ARAGOG project, are explored in a comprehensive survey.

RAG Sees Through Pills: Multimodal RAG applications are extending into the medical domain with a focus on medication identification, combining images and descriptions to accurately recognize pills, as highlighted in a recent blog post.

RAG for the Masses: An upcoming event will detail the construction of enterprise-grade RAG systems, covering aspects like advanced parsing/ingestion and ensuring comprehensive observability for these systems. Those interested can sign up for the event here.

Troubleshooting OpenSearch Vector Store: Technical discussions indicate issues with inserting new data into OpenSearch vector store, where multiple members shared similar experiences. Workarounds offered include using index.refresh_ref_docs(), and an instructional video on document parsing can be found here.

Seeking Gemini Guidance: There's a call for a notebook with Gemini LLM examples modeled after OpenAI's, illustrating the community's desire for practical guides for emerging tools. The existing OpenAI example is highly regarded and can serve as a baseline for future Gemini LLM templates, accessible here.


HuggingFace Discord

Cool New Tools for AI Engineers: Hugging Face released Gemma 1.1 Instruct 7B with coding capabilities and dropped their compute prices by up to 50%, offering an average 20% reduction compared to AWS EC2. They've also publicized two massive OCR datasets with 26 million pages and introduced Gradio's API Recorder feature.

AI Community's Learning Hub: A GitHub repo for NLP sentiment classification was shared, and Hugging Face encourages learning through various tutorials and resources, such as Gradio's and Langchain's tutorials.

Creative AI Developments to Watch: Innovations from the community include BeastBot for viral content insights and Ragdoll Studio, an enhanced character generation and interaction platform. Deep Q-Learning applications are gaining traction with GitHub showcases.

AI Model Debugging and Optimization Conversations: Members struggled with TorchScript export, scheduler/sampler behaviors, and OOM errors when training Mistral on A100 GPUs. Community members suggest checking out the shared Google Colab notebook for potential solutions.

Engaging Discussions in Specialized AI Topics: Debates and assistance on topics like benchmarking AI hardware, issues with model and approach recognition, using GPT-2 for summarization, and integrating contrastive loss in vision models reflect the community's engagement with cutting-edge AI challenges and solutions.


Eleuther Discord

Boot Up the Branding Machine!: The term GPT-3.5 is being used for branding purposes despite potential confusion, overshadowing its technical lineage.

Claude's Cloak of Invisibility: Despite Claude 3 Opus showcasing superior performance to GPT-4, there's a shroud of mystery around its size, with no reliable rumors or leaks so far.

The Art of Averages in Optimizers: A pointed discussion revealed that the ScheduleFree optimizer does not use an exponential moving average but maintains a simple running average, as reflected by its 1/t update term.
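The distinction can be sketched in a few lines (toy values, not the actual ScheduleFree implementation): a 1/t step makes the running average equal to the exact arithmetic mean of all iterates, while an EMA forgets old iterates geometrically.

```python
# Toy contrast between a simple running average (as described for
# ScheduleFree) and an exponential moving average. The 1/t update makes the
# running average equal to the exact arithmetic mean of all iterates seen so
# far, while an EMA weights recent iterates geometrically more.

def simple_average(values):
    avg = 0.0
    for t, x in enumerate(values, start=1):
        avg += (x - avg) / t  # 1/t step: after t values, avg is their mean
    return avg

def ema(values, beta=0.9):
    avg = values[0]
    for x in values[1:]:
        avg = beta * avg + (1 - beta) * x
    return avg

xs = [1.0, 2.0, 3.0, 4.0]
print(simple_average(xs))  # 2.5, the exact mean
print(ema(xs))             # weighted toward recent iterates
```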

MoE Models: From Dense to Sparse and Back Again: A new paper suggests that Mixture-of-Experts (MoE) models can train densely and infer sparsely, questioning the prevalent notion of their parameter efficiency while scaling.
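The dense-train/sparse-infer contrast can be sketched with a toy MoE layer (the expert functions and router logits below are arbitrary illustrations, not from the paper):

```python
# Toy MoE layer illustrating dense training vs. sparse inference: every
# expert contributes during the dense pass (weighted by the router), while
# the sparse pass runs only the top-k routed experts.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x]
router_logits = [2.0, 1.0, -1.0]

def moe_forward(x, top_k=None):
    gates = softmax(router_logits)
    if top_k is not None:  # sparse path: keep only the top-k gates
        keep = sorted(range(len(gates)), key=gates.__getitem__, reverse=True)[:top_k]
        gates = [g if i in keep else 0.0 for i, g in enumerate(gates)]
        total = sum(gates)
        gates = [g / total for g in gates]  # renormalize surviving gates
    return sum(g * f(x) for g, f in zip(gates, experts))

dense = moe_forward(3.0)            # all three experts contribute
sparse = moe_forward(3.0, top_k=1)  # only the top-routed expert runs
print(dense, sparse)
```

The sparse path skips most expert computation at inference, which is where the parameter-efficiency question arises: the full parameter count is trained, but only a fraction is exercised per token.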

Token Sampling Strategies Debated: In min-P vs. top-P sampling, min-P could be more effective due to gradual probability changes, a perspective supported by token distribution analysis seen in the VLLM GitHub repo.
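The difference between the two cutoffs is easy to see in a toy sketch (the distribution and thresholds below are made up for illustration; production samplers such as vLLM's operate on logits over the full vocabulary):

```python
# Toy comparison of top-P vs. min-P token filtering over a fake next-token
# distribution. Values are illustrative only.

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = set(), 0.0
    for tok, pr in ranked:
        kept.add(tok)
        total += pr
        if total >= p:
            break
    return kept

def min_p_filter(probs, min_p=0.25):
    """Keep tokens whose probability is at least min_p times the top probability."""
    threshold = min_p * max(probs.values())
    return {tok for tok, pr in probs.items() if pr >= threshold}

dist = {"the": 0.5, "a": 0.2, "cat": 0.15, "dog": 0.1, "zebra": 0.05}
print(top_p_filter(dist))  # {'the', 'a', 'cat', 'dog'}: cut at 90% mass
print(min_p_filter(dist))  # {'the', 'a', 'cat'}: cut below 0.125 (= 0.25 * 0.5)
```

Note that min-P's threshold scales with the peak probability, so on a flatter distribution the same setting keeps more tokens, which is the "gradual probability changes" point in the discussion.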


LangChain AI Discord

LangChain's Graph Mysteries Unveiled: Engineers clarified the existence of the attach_edge method in the CompiledGraph class of LangChain, pointing members to the official documentation to unravel its functionalities.

AI Transcription Jargons Squashed: Amidst building AI transcription applications, discourse ensued over SerpAPI and an ostensibly similar Serper API. Community members remained uncertain about Serper API's synergy with LangChain, distinct from the well-integrated SerpAPI.

Model Showdown: Cost vs Capability: LLM (large language model) aficionados shared operational wisdom, comparing the prowess of GPT-3.5, GPT-4, and Claude, while airing tribulations with models like Gemini in terms of practical deployment and economy.

Custom Retrievals and Error Handling: Engagement with custom retrieval systems sparked exchanges on performance evaluation, directing novices to the trulens eval package, while questions on error management in LangChain were elucidated with refs to Pydantic validators and RunnableRetry.
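The retry side of that discussion boils down to a standard pattern, sketched here in plain Python (this is not the LangChain RunnableRetry API, just the idea behind it):

```python
# Generic retry-with-backoff pattern: re-invoke a flaky callable on selected
# exceptions, backing off exponentially, and surface the error once the
# attempt budget is exhausted.
import time

def with_retry(fn, attempts=3, base_delay=0.01, retry_on=(RuntimeError,)):
    def wrapped(*args, **kwargs):
        for attempt in range(1, attempts + 1):
            try:
                return fn(*args, **kwargs)
            except retry_on:
                if attempt == attempts:
                    raise  # out of attempts: surface the error
                time.sleep(base_delay * 2 ** (attempt - 1))
    return wrapped

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = with_retry(flaky)()
print(result, f"after {calls['n']} attempts")  # ok after 3 attempts
```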

LangChain vs. OpenAI: A comparison conundrum arose between LangChain's utility for AI assistants and OpenAI's own bespoke APIs. However, the discussion failed to distill definitive perks of LangChain over OpenAI's offerings.

Art and Cybersecurity Fused: DIY developers have built tools spanning aesthetics and security. Artful AI now sports new image models (Artful AI), AISploit empowers penetration testers (AISploit GitHub), Galaxy AI democratizes premium AI models (Galaxy API), TinderGPT streamlines dating app chats (TinderGPT GitHub), and everything-rag rolls out as a local chatbot assistant (everything-rag GitHub).

AI Stylist & Publishing Aid: A tutorial reveals an AI capable of fashionably dressing social media images (YouTube Guide), and a fellow engineer inquires about tutorials for AI agent publishing, seeking to endow their creation with a user interface.


CUDA MODE Discord

Meta Sponsors Massive GPU Hours: Meta's sponsorship of a study on LLM knowledge involved a colossal 4.2 million GPU hours, amounting to about 479 years of non-stop computing, illustrating the scale and resource intensity of the project.
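The arithmetic behind the headline figure:

```python
# Sanity-checking the headline figure: 4.2 million GPU hours expressed as
# years of non-stop single-GPU computing.
gpu_hours = 4_200_000
hours_per_year = 24 * 365          # 8,760
years = gpu_hours / hours_per_year
print(f"about {years:.0f} years")  # about 479 years
```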

GPT-2 Dons CUDA Cap: There's buzz around a GPT-2 training code ported to CUDA, potentially heralding new efficiency and performance milestones. Conversations indicate a growing working group eager to explore this CUDA adaptation.

Optimization Opportunities in Triton: Discussions surfaced around leveraging triton-viz for enhanced program visualization and tackling documentation puzzles, specifically contributing official references via GitHub pull request #3608.

LIAH Joins the LLM Deception Game: Debates arose over the usefulness of ring attention architectures, notably when a member introduced LIAH (lie-in-a-haystack), a tactic designed to deter language models from relying on pre-existing knowledge, accessible through its GitHub repository.

Quantization Quandary in LLMs: The challenges of quantizing LLMs yielded a discussion about potential performance benefits at both the application and inference levels, specifically spotlighting 4-bit quantization techniques and the use of HQQ for mobile LLM deployment, as shown in the shared HQQ code.
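For intuition, here is a minimal 4-bit affine round-trip. This is not HQQ itself (which additionally optimizes the zero-point and scale); it only shows the basic map from floats to 16 integer levels and back that any low-bit scheme performs:

```python
# Minimal 4-bit affine quantization: map floats onto integer levels 0..15
# using a per-tensor scale and zero-point, then reconstruct. Reconstruction
# error is bounded by half the scale.

def quantize_4bit(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # 4 bits -> 16 levels; guard all-equal case
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    return [lo + scale * v for v in q]

w = [-0.8, -0.1, 0.0, 0.3, 0.95]
q, scale, zero = quantize_4bit(w)
w_hat = dequantize(q, scale, zero)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q)                                          # [0, 6, 7, 9, 15]
print(f"max reconstruction error {max_err:.3f}")  # bounded by scale / 2
```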


OpenInterpreter Discord

Voice Recognition Smooths Out Python Bumps: A practical fix for bot voice recognition issues was found by downgrading from Python 3.11.4 to 3.10, aligning with community insights that Python 3.9 and 3.10 are preferred for compatibility.

Windows Woes and Linux Leads for 01: One member’s installation struggle of 01 on Windows, notably API key issues, led to a recommendation to check the environment variable naming (use OPENAI_API_KEY), while Linux users reported fewer issues.

GPT-4 Turbo Steals the Show: The general availability of GPT-4 Turbo with Vision stirred up excitement within the community due to its improved performance and vision capabilities, with discussions highlighting its integration on the OpenAI Platform.

DIY Tech Enthusiasts Gear Up for OpenInterpreter: Discussions delved into DIY vs. preorder options for OpenInterpreter, highlighting the M5 Atom Echo as a key component; the custom software is best optimized for the M5, which is available from vendors like Mouser.com.

Desk Robot Dreams and Raspberry Pi Schemes: Conversations about using the Raspberry Pi for 01 projects emerged, with ambitions ranging from desk robots to open source contributions, marked by the intent to utilize the domain cofounder.bot for future developments.


OpenAccess AI Collective (axolotl) Discord

Jet MoE Awaits Departure for Hugging Face: The integration of Jet MoE into Hugging Face's transformers is highly anticipated, as evidenced by a pending GitHub pull request. Several users are monitoring the PR closely, with discussions highlighting the potential this architecture holds.

Lepton AI Soars with Simplicity: The user-friendly and cloud-native platform Lepton AI received praise for its simplicity in running AI applications, with tools like Photon and WhisperX being highlighted; the platform can be explored further at Lepton AI’s website.

AI Giants Flex Their Compute: A trio of models took the stage, Qwen 1.5 32B, Yi 34B, and Command R, spurring discussions on their comparative performance and capabilities, particularly in context handling and dataset performance.

Meta Gears Up for Llama 3's Debut: Eager chatter surrounded Meta's upcoming Llama 3, particularly its expected multimodal prowess and the uncertainty around its parameter count. Speculations sync with reports from The Information about smaller Llama 3 variants.

SVD Enhances LoRA: A highlight within the community was the discovery shared by CFGeek, indicating improved fine-tuning results by initializing LoRA layers with SVD. The method's full description is available within the PiSSA GitHub repo and a dedicated arXiv paper.
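The core idea can be sketched at rank 1 in pure Python (a toy matrix and power iteration stand in for the proper SVD and higher ranks that PiSSA uses):

```python
# Rank-1 sketch of the PiSSA idea: peel off the top singular component of a
# weight matrix W via power iteration, initialize the LoRA factors B and A
# from it, and keep the residual as the frozen base weight.
import math, random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def top_singular(W, iters=100):
    random.seed(0)
    v = [random.random() for _ in W[0]]
    Wt = transpose(W)
    for _ in range(iters):  # power iteration on W^T W
        v = matvec(Wt, matvec(W, v))
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    Wv = matvec(W, v)
    sigma = math.sqrt(sum(x * x for x in Wv))  # top singular value
    u = [x / sigma for x in Wv]
    return u, sigma, v

W = [[3.0, 1.0], [1.0, 3.0]]
u, sigma, v = top_singular(W)
s = math.sqrt(sigma)
B = [[s * x] for x in u]       # (m x 1) LoRA factor, trainable
A = [[s * x for x in v]]       # (1 x n) LoRA factor, trainable
residual = [[W[i][j] - B[i][0] * A[0][j] for j in range(2)] for i in range(2)]
# residual (frozen) + B @ A reconstructs W exactly at initialization
print(round(sigma, 3))  # 4.0, the top singular value of [[3, 1], [1, 3]]
```

Because B and A start loaded with the matrix's principal component rather than near zero, fine-tuning updates act directly on the most significant directions of the weight, which is the intuition behind the reported improvement.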


OpenRouter (Alex Atallah) Discord


Latent Space Discord


Modular (Mojo 🔥) Discord


LAION Discord


PART 2: Detailed by-Channel summaries and links

Perplexity AI ▷ #general (645 messages🔥🔥🔥):

Links mentioned:


Perplexity AI ▷ #sharing (18 messages🔥):

Link mentioned: Workflows & Tooling to Create Trusted AI | Ask More of AI with Clara Shih: Clara sits down with the founder/CEOs of three of the hottest AI companies-- Aravind Srinivas (Perplexity AI), Jerry Liu (LlamaIndex), and Harrison Chase (La...


Perplexity AI ▷ #pplx-api (6 messages):


Stability.ai (Stable Diffusion) ▷ #general-chat (470 messages🔥🔥🔥):

Links mentioned:


Nous Research AI ▷ #off-topic (2 messages):


Nous Research AI ▷ #interesting-links (9 messages🔥):

Links mentioned:


Nous Research AI ▷ #general (175 messages🔥🔥):

Links mentioned:


Nous Research AI ▷ #ask-about-llms (20 messages🔥):

Link mentioned: Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data: The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investi...


Nous Research AI ▷ #project-obsidian (4 messages):

Link mentioned: qnguyen3/nanoLLaVA · Hugging Face: no description found


Nous Research AI ▷ #bittensor-finetune-subnet (1 message):

4biddden: Is there a runpod template available for the bittensor fine-tune?


Nous Research AI ▷ #world-sim (223 messages🔥🔥):

Links mentioned:


LM Studio ▷ #💬-general (116 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🤖-models-discussion-chat (76 messages🔥🔥):

Links mentioned:


LM Studio ▷ #🧠-feedback (7 messages):

Links mentioned:


LM Studio ▷ #📝-prompts-discussion-chat (2 messages):


LM Studio ▷ #🎛-hardware-discussion (57 messages🔥🔥):


LM Studio ▷ #🧪-beta-releases-chat (8 messages🔥):

Links mentioned:


LM Studio ▷ #amd-rocm-tech-preview (7 messages):

Links mentioned:


LM Studio ▷ #model-announcements (2 messages):

Link mentioned: lmstudio-community (LM Studio Community): no description found


OpenAI ▷ #ai-discussions (141 messages🔥🔥):


OpenAI ▷ #gpt-4-discussions (16 messages🔥):


OpenAI ▷ #prompt-engineering (40 messages🔥):


OpenAI ▷ #api-discussions (40 messages🔥):


LlamaIndex ▷ #blog (4 messages):


LlamaIndex ▷ #general (191 messages🔥🔥):

Links mentioned:


LlamaIndex ▷ #ai-discussion (3 messages):


HuggingFace ▷ #announcements (1 message):

Links mentioned:


HuggingFace ▷ #general (132 messages🔥🔥):

Links mentioned:


HuggingFace ▷ #today-im-learning (1 message):

Link mentioned: GitHub - ManoBharathi93/Sentiment_Classifier: Sentiment classifier on IMDB movie dataset: Sentiment classifier on IMDB movie dataset. Contribute to ManoBharathi93/Sentiment_Classifier development by creating an account on GitHub.


HuggingFace ▷ #cool-finds (5 messages):

Links mentioned:


HuggingFace ▷ #i-made-this (16 messages🔥):

Links mentioned:


HuggingFace ▷ #reading-group (11 messages🔥):

Links mentioned:


HuggingFace ▷ #computer-vision (9 messages🔥):

Link mentioned: X-CLIP: no description found


HuggingFace ▷ #NLP (4 messages):


HuggingFace ▷ #diffusion-discussions (7 messages):

Link mentioned: Google Colaboratory: no description found


HuggingFace ▷ #gradio-announcements (1 message):

Link mentioned: Gradio Changelog: Gradio Changelog and Release Notes


Eleuther ▷ #general (15 messages🔥):


Eleuther ▷ #research (149 messages🔥🔥):

Links mentioned:


Eleuther ▷ #lm-thunderdome (13 messages🔥):

Link mentioned: vllm/vllm/model_executor/layers/sampler.py at b4543c8f6bf67a7f1a0d6d0fd6cf5697c7eeaabb · vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs - vllm-project/vllm


LangChain AI ▷ #general (111 messages🔥🔥):

Links mentioned:


LangChain AI ▷ #share-your-work (5 messages):

Links mentioned:


LangChain AI ▷ #tutorials (2 messages):

Link mentioned: Build a real AI model that can try any cloth: I built an agent system which will autonomously iterate & generate img of AI model wearing certain cloth and produce millions+ social postsFree access to run...


CUDA MODE ▷ #general (4 messages):

Links mentioned:


CUDA MODE ▷ #triton (1 messages):

mobicham: Still using dot product instead of additions 🤔


CUDA MODE ▷ #cuda (3 messages):

Links mentioned:


CUDA MODE ▷ #torch (2 messages):

Links mentioned:


CUDA MODE ▷ #announcements (1 message):


CUDA MODE ▷ #ring-attention (9 messages🔥):

Links mentioned:


CUDA MODE ▷ #off-topic (2 messages):

Link mentioned: Hatsune Miku Fans Furious Live Show Was Just a Flatscreen On Stage: The virtual pop idol did not appear in her full hologram form in two shows on her North American tour and fans are pissed.


CUDA MODE ▷ #triton-puzzles (4 messages):

Link mentioned: Add additional tips and links to README. by jlebar · Pull Request #3608 · openai/triton: no description found


CUDA MODE ▷ #hqq (46 messages🔥):

Links mentioned:


CUDA MODE ▷ #triton-viz (20 messages🔥):


CUDA MODE ▷ #llmdotc (13 messages🔥):

Link mentioned: GitHub - karpathy/llm.c: LLM training in simple, raw C/CUDA: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.


OpenInterpreter ▷ #general (50 messages🔥):


OpenInterpreter ▷ #O1 (41 messages🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (77 messages🔥🔥):

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (7 messages):

Link mentioned: Tweet from Charles Foster (@CFGeek): YES! If you initialize a LoRA layer based on the SVD of the original weight matrix (with its top singular values & vectors), you get significantly better fine-tuning results. This is a straight-up fr...


OpenAccess AI Collective (axolotl) ▷ #general-help (5 messages):


OpenAccess AI Collective (axolotl) ▷ #datasets (1 message):

blackl1ght: Does anyone have a good function-calling or JSON mode dataset?


OpenRouter (Alex Atallah) ▷ #announcements (1 message):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (86 messages🔥🔥):

Links mentioned:


Latent Space ▷ #ai-general-chat (88 messages🔥🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #general (16 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #💬︱twitter (2 messages):


Modular (Mojo 🔥) ▷ #✍︱blog (1 message):

Link mentioned: Modular: How to Contribute to Mojo Standard Library: A Step-by-Step Guide: We are building a next-generation AI developer platform for the world. Check out our latest post: How to Contribute to Mojo Standard Library: A Step-by-Step Guide


Modular (Mojo 🔥) ▷ #ai (1 message):

Link mentioned: GitHub - karpathy/llm.c: LLM training in simple, raw C/CUDA: LLM training in simple, raw C/CUDA. Contribute to karpathy/llm.c development by creating an account on GitHub.


Modular (Mojo 🔥) ▷ #🔥mojo (33 messages🔥):

Links mentioned:


Modular (Mojo 🔥) ▷ #community-projects (2 messages):


Modular (Mojo 🔥) ▷ #community-blogs-vids (1 message):

Link mentioned: How to Print Any Star Pattern in Python: If you want to learn more about Python, Mojo, or even modern software development with Scrum, sign up for my newsletter. You won't regret it!https://www.xenn...


Modular (Mojo 🔥) ▷ #performance-and-benchmarks (2 messages):


Modular (Mojo 🔥) ▷ #nightly (20 messages🔥):

Links mentioned:


LAION ▷ #general (26 messages🔥):

Links mentioned:


LAION ▷ #research (21 messages🔥):

Links mentioned: