Frozen AI News archive

Pixtral Large (124B) beats Llama 3.2 90B with updated Mistral Large 24.11

**Mistral** has released **Pixtral Large**, pairing a 1B-parameter vision encoder with an update of the **123B-parameter Mistral Large** model (now **Mistral Large 24.11**), though the update brings no major new features. **Pixtral Large** outperforms **Llama 3.2 90B** on multimodal benchmarks despite using a much smaller vision adapter. **Mistral's Le Chat** chatbot also received a comprehensive set of feature updates, which **Arthur Mensch** frames as a company-level prioritization of product alongside research. Sponsor **SambaNova**'s RDUs offer faster AI model inference than GPUs. On Reddit, **vLLM** shows strong concurrency performance on an **RTX 3090** GPU, with quantization problems in its **FP8 kv-cache** but better results from **llama.cpp** with a **Q8 kv-cache**. Users also discuss performance trade-offs between **vLLM**, **exllamav2**, and **TabbyAPI** across model sizes and batching strategies.
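The vLLM concurrency discussion above boils down to a client firing many requests at once and letting the server batch them. A minimal sketch of the client-side pattern, with the actual HTTP call stubbed out (`fake_completion` is a hypothetical stand-in; a real client would hit a vLLM OpenAI-compatible endpoint instead):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_completion(prompt: str) -> str:
    """Stand-in for a real call to a vLLM OpenAI-compatible endpoint.

    In practice you would POST to something like
    http://localhost:8000/v1/chat/completions; the 10ms sleep
    simulates per-request latency.
    """
    time.sleep(0.01)
    return f"echo: {prompt}"

prompts = [f"prompt {i}" for i in range(32)]

# Send all 32 requests with 16 concurrent workers; a continuously
# batching server like vLLM processes overlapping requests together,
# so wall-clock time grows far slower than the serial sum of latencies.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(fake_completion, prompts))
elapsed = time.perf_counter() - start
```

The same pattern is why single-request benchmarks undersell vLLM: its throughput advantage only shows up when many requests are in flight.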

Canonical issue URL

AI News for 11/15/2024-11/18/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (217 channels, and 6180 messages) for you. Estimated reading time saved (at 200wpm): 636 minutes. You can now tag @smol_ai for AINews discussions!

We last caught up with Mistral in September when they released Pixtral (our coverage here), which paired the 12B Mistral Nemo with a 400M vision adapter. Mistral have now upsized the vision encoder to 1B and, buried in the footnotes of the Pixtral Large blogpost, updated the 123B-param Mistral Large 24.07 (aka "Mistral Large 2" - our coverage here) to "Mistral Large 24.11". The lack of a magnet link, blogpost, or benchmarks, and the refusal to call it "Mistral Large 3", all suggest that this update is literally nothing to write home about, but the changes to function calling and the system prompt are worth a peek.
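For the function calling changes, the request shape is the standard OpenAI-compatible `tools` format that Mistral's API also accepts. A minimal sketch of such a payload — the function name and schema here are illustrative examples, not taken from Mistral's changelog, and the model id is assumed to follow the `mistral-large-2411` naming:

```python
import json

# Hypothetical tool definition in the OpenAI-compatible JSON-schema
# format; "get_weather" is an example function, not a real API.
payload = {
    "model": "mistral-large-2411",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

# Plain JSON, ready to POST to a /v1/chat/completions endpoint.
body = json.dumps(payload)
```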

Anyway, it's been a whole 13 days since someone last dropped a >100B open-weights model, so any day that happens is a boon to the open AI community that we should never take for granted. The big takeaway is that Pixtral Large overwhelmingly beats Llama 3.2 90B on every major multimodal benchmark:

image.png

Although of course one wonders how Llama 3.2 would do if it had an additional 34B parameters to memorize things with. It's also notable that the Llama 3.2 vision adapter is 20B vs Pixtral Large's 1B.
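The back-of-envelope math above, in code form (parameter counts in billions, as reported in the two release announcements):

```python
# Reported sizes: Pixtral Large = 123B text model + 1B vision encoder;
# Llama 3.2 90B's total includes its 20B vision adapter.
pixtral_text, pixtral_vision = 123, 1
llama_total, llama_vision = 90, 20

pixtral_total = pixtral_text + pixtral_vision   # 124B headline size
extra_params = pixtral_total - llama_total      # the "additional 34B"
adapter_ratio = llama_vision / pixtral_vision   # Llama's adapter is 20x larger
```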

Lastly, Mistral's Le Chat got a surprisingly comprehensive set of updates, giving it the full chatbot feature set of its peers.

image.png

Arthur Mensch notes twice that this is part of a company-level prioritization of product alongside research.

Since this is a new open weights model, you could also take it for a spin on this issue's inference sponsor! (Help us check them out!)


[Sponsored by SambaNova] Processors designed specifically for AI workloads have some major advantages over GPUs. SambaNova’s RDUs have a combination of large addressable memory and dataflow architecture that makes them a lot faster (https://shortclick.link/lk96sw) than other processors for model inference and other AI tasks.

Swyx's comment: the sponsor link discusses the SN40L "Reconfigurable Dataflow Unit" (RDU) holding "hundreds of models in-memory, equating to trillions of parameters", and switching "between models in microseconds, up to 100x faster than GPU". A pretty darn cool intro to one of the 3 main "big chip" players heating up the high-end XXL-size LLM inference market!


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

TO BE COMPLETED


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. vLLM High Concurrency with RTX 3090: Performance and Issues

Theme 2. Qwen 2.5 Coder 32B vs Claude 3.5 Sonnet: Local Performance Comparison

Theme 3. Qwen2.5-Turbo: Extending the Context Length to 1M Tokens

Other AI Subreddit Recap

/r/MachineLearning, /r/OpenAI, /r/StableDiffusion, /r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

Theme 1. ChatGPT-4o Reflective Learning Breakthrough

Theme 2. Claude Sonnet 3.5 Deployment Impact

Theme 3. ComfyUI-based Video Generation Breakthroughs

Theme 4. Anthropic Teams with Palantir on Defense AI


AI Discord Recap

A summary of Summaries of Summaries by O1-mini

Theme 1: 🚀 Fresh AI Models Take Flight

Theme 2: 🛠️ Integrative AI Frameworks and Tools

Theme 3: ⚙️ Performance Boosts and GPU Optimizations

Theme 4: 🏆 Community Hackathons and Collaborative Events

Theme 5: 🐛 Technical Hiccups and Bug Bounties


PART 1: High level Discord summaries

OpenRouter (Alex Atallah) Discord


Unsloth AI (Daniel Han) Discord


Perplexity AI Discord


HuggingFace Discord


aider (Paul Gauthier) Discord


OpenAI Discord


Eleuther Discord


Stability.ai (Stable Diffusion) Discord


LM Studio Discord


Nous Research AI Discord


Interconnects (Nathan Lambert) Discord


Latent Space Discord


GPU MODE Discord


Notebook LM Discord Discord


Modular (Mojo 🔥) Discord


Cohere Discord


LlamaIndex Discord


OpenAccess AI Collective (axolotl) Discord


tinygrad (George Hotz) Discord


LLM Agents (Berkeley MOOC) Discord


DSPy Discord


LAION Discord


MLOps @Chipro Discord


Mozilla AI Discord


Torchtune Discord


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

  • Perplexity models
  • Citations attribute
  • Chat completions

OpenRouter (Alex Atallah) ▷ #app-showcase (4 messages):

  • Threaded Conversations
  • Model Switching
  • vnc-lm Discord Bot
  • WordPress Chatbot Feature
  • Market Competition against Intercom and Zendesk

Links mentioned:


OpenRouter (Alex Atallah) ▷ #general (998 messages🔥🔥🔥):

  • Gemini Fast Performance
  • Mistral API Issues
  • Self-Moderated vs OR Moderated APIs
  • OpenAI O1 Streaming Feature
  • User Discussions on Prompt Engineering

Links mentioned:


OpenRouter (Alex Atallah) ▷ #beta-feedback (21 messages🔥):

  • Custom Provider Keys Access Requests
  • Integration Beta Feature Requests

Unsloth AI (Daniel Han) ▷ #general (723 messages🔥🔥🔥):

  • Unsloth Framework Features
  • Qwen 2.5 Turbo Release
  • Fine-tuning Techniques
  • RAG Approach in AI
  • Model Performance and Configuration

Links mentioned:


Unsloth AI (Daniel Han) ▷ #off-topic (9 messages🔥):

  • Jordan Peterson Gif
  • YouTube Content
  • Aussies Behavior
  • Random Links Humor
  • Part-Time Job Experience

Link mentioned: Jordan Peterson Jbp GIF - Jordan Peterson JBP Precisely - Discover & Share GIFs: Click to view the GIF


Unsloth AI (Daniel Han) ▷ #help (113 messages🔥🔥):

  • Fine-tuning Llama 3
  • Training Dataset Size Impact
  • Gradient Accumulation Fixes
  • Loading Model Adapters
  • Functionality Questions on Unsloth

Links mentioned:


Unsloth AI (Daniel Han) ▷ #research (12 messages🔥):

  • Fast Forward optimization strategy
  • Limits of quantization
  • PrefixQuant technique
  • LaTent Reasoning Optimization (LaTRO)

Links mentioned:


Perplexity AI ▷ #announcements (2 messages):

  • Canadian Student Offer
  • Perplexity Shopping Launch
  • Buy with Pro Feature

Perplexity AI ▷ #general (599 messages🔥🔥🔥):

  • Perplexity Pro Subscription Issues
  • Shopping Feature and Monetization
  • Context Memory Limit Changes
  • Model Accessibility by Region
  • User Experience Concerns

Links mentioned:


Perplexity AI ▷ #sharing (20 messages🔥):

  • AI-Generated Video Games
  • Autonomous ML Engineer
  • AI Readiness in India
  • Coroutines Implementation
  • Chemistry of Tea

Perplexity AI ▷ #pplx-api (18 messages🔥):

  • API Billing Questions
  • Summarization Issues with Leo
  • Reddit API Functionality
  • Make.com Module Feedback

HuggingFace ▷ #general (441 messages🔥🔥🔥):

  • Mistral Large Models
  • LLaVA-o1 vs. Pixtral
  • Gradio and MusicGen
  • Hugging Face API Quotas
  • Dataset Contributions

Links mentioned:


HuggingFace ▷ #today-im-learning (6 messages):

  • Neuralink Updates
  • Transformer Code Issues
  • Tunguska-39B Updates

Link mentioned: README.md · BeaverAI/Tunguska-39B-v1b-GGUF at main: no description found


HuggingFace ▷ #cool-finds (15 messages🔥):

  • Molecular Machine Learning at NeurIPS 2024
  • Magic Quill
  • Neural Network Communication
  • VLMs and their capabilities
  • HtmlRAG in Retrieval-Augmented Generation

Links mentioned:


HuggingFace ▷ #i-made-this (40 messages🔥):

  • AnyModal Framework
  • RoboLlama Robotics Model
  • Kaggle Generative AI Course
  • YouTube Transcript Tool
  • Dataset for Visual Language Models

Links mentioned:


HuggingFace ▷ #reading-group (28 messages🔥):

  • Hardware Compatibility
  • LLaMA Model Usage
  • System Specifications
  • VRAM Management
  • Language Model Testing

HuggingFace ▷ #computer-vision (1 messages):

4rsn: What image painting model do you guys recommend?


HuggingFace ▷ #NLP (3 messages):

  • Reclassifying older text
  • Fine-tuning BERT models
  • Using Sentence Transformers
  • SBERT model updates
  • Using Hugging Face API

Links mentioned:


HuggingFace ▷ #diffusion-discussions (4 messages):

  • CogVideoX-1.5-5B issues
  • Diffusers latest version

aider (Paul Gauthier) ▷ #general (422 messages🔥🔥🔥):

  • Qwen 2.5 updates
  • Aider usage tips
  • Streaming models in OpenAI
  • Comparison of LLMs
  • Community discussions on AI tools

Links mentioned:


aider (Paul Gauthier) ▷ #questions-and-tips (96 messages🔥🔥):

  • Configuring Aider with OpenRouter
  • Handling Token Limits in Aider
  • Using Aider with Local Models
  • Extra Parameters for Litellm
  • Benchmark Run Skipped Tests

Links mentioned:


aider (Paul Gauthier) ▷ #links (1 messages):

epicureus: great video here https://www.youtube.com/watch?v=t-i2x3APvGQ


OpenAI ▷ #ai-discussions (287 messages🔥🔥):

  • Google's Project Astra
  • o1-mini vs o1-preview and GPT-4o
  • AI roleplaying capabilities
  • Memory features in AI
  • GPT model updates

Links mentioned:


OpenAI ▷ #gpt-4-discussions (6 messages):

  • Clearing the cache
  • Game bots shutdown

OpenAI ▷ #prompt-engineering (86 messages🔥🔥):

  • Prompt Engineering Challenges
  • LLMs Interaction
  • Self-Observation in AI
  • Chain of Thought Prompting
  • Criticism of AI Responses

OpenAI ▷ #api-discussions (86 messages🔥🔥):

  • Self-Observation Prompts
  • Conversations Between LLMs
  • Chain of Thought Prompting
  • User Perceptions of AI
  • Introspection in AI

Eleuther ▷ #general (41 messages🔥):

  • AI Code Generation Projects
  • Electric Engineering Dataset for LLMs
  • Truncation Sampling in LLMs
  • Crash Test of Gemini's Credibility
  • Grokking Phenomenon in AI Models

Links mentioned:


Eleuther ▷ #research (289 messages🔥🔥):

  • nGPT Optimizer Presentation
  • Normalization Techniques in Neural Networks
  • Variational Autoencoders (VAEs) and Latent Space
  • Diffusion Models for Upscaling
  • Emerging Concepts in Language Modeling

Links mentioned:


Eleuther ▷ #scaling-laws (5 messages):

  • Scaling Pretraining
  • Economic Feasibility of Scaling
  • LLM Pretraining Scalability

Eleuther ▷ #interpretability-general (8 messages🔥):

  • Function Vectors in ICL
  • Overcomplete SAEs and Subspace Generalization
  • Fine-tuning Dynamics in PLMs
  • Emergent Representations in LLMs

Links mentioned:


Eleuther ▷ #lm-thunderdome (24 messages🔥):

  • Using Fine-Tuned OpenAI Models
  • Few-Shot vs Zero-Shot Evaluation Results
  • KeyError in Custom Model Invocation
  • Model Branch Specification in Local Completions

Links mentioned:


Stability.ai (Stable Diffusion) ▷ #general-chat (309 messages🔥🔥):

  • Stable Diffusion 3.5
  • Running SD on GPU
  • SDXL Lightning
  • Roop Unleashed
  • Using Prompts for Image Generation

Links mentioned:


LM Studio ▷ #general (124 messages🔥🔥):

  • LM Studio Local Server
  • AI Video Upscaling Tools
  • Nvidia vs AMD GPUs
  • Using Multiple GPUs
  • Port Forwarding in Local Network

Links mentioned:


LM Studio ▷ #hardware-discussion (177 messages🔥🔥):

  • Windows vs Ubuntu GPU performance
  • AMD vs NVIDIA
  • Multi-GPU setups
  • Power management settings
  • GPU memory compatibility

Links mentioned:


Nous Research AI ▷ #general (141 messages🔥🔥):

  • Ollama and LCPP for Inference
  • AnyModal Framework
  • Decentralised Training Run Updates
  • AI Research Paper Highlights
  • Feedback on Hermes AI Responses

Links mentioned:


Nous Research AI ▷ #ask-about-llms (88 messages🔥🔥):

  • Hermes 3 Compute Instances
  • LLM for AI Video Creation
  • Function Calling in LLMs
  • Model Performance Comparisons
  • Document Extraction and Analysis with LLMs

Links mentioned:


Nous Research AI ▷ #research-papers (8 messages🔥):

  • Fine-tuning LLMs for domain adaptation
  • Evaluation of VLA models in robotic tasks
  • Unlocking reasoning capabilities in LLMs
  • LLaVA and structured generation
  • LLM2CLIP for enhanced visual representation

Links mentioned:


Nous Research AI ▷ #interesting-links (5 messages):

  • Agentic Workflow and Fine-Tuning
  • LLaMA-Mesh Announcement
  • AnyModal Framework

Links mentioned:


Nous Research AI ▷ #reasoning-tasks (19 messages🔥):

  • Dynamic Model Selection
  • AI Newsletters
  • Link Aggregation Tools
  • Scraping Tools
  • LocalLlama Community

Interconnects (Nathan Lambert) ▷ #events (4 messages):

  • Dinner Reservations
  • Toronto and Vancouver Connections

Interconnects (Nathan Lambert) ▷ #news (106 messages🔥🔥):

  • Qwen 2.5 Turbo
  • Mistral AI Updates
  • API Challenges
  • Deepseek Models

Links mentioned:


Interconnects (Nathan Lambert) ▷ #ml-drama (9 messages🔥):

  • AI Hallucination Concerns
  • Ilya Sutskever and Sam Altman Misalignment
  • TechEmails Twitter Revelations

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (74 messages🔥🔥):

  • 2T Pretraining Dataset Comparison
  • OLMoE ICLR Review Concerns
  • Gemini Model Improvements
  • RewardBench for RLHF Evaluation
  • New LLaVA-o1 Model Release

Links mentioned:


Interconnects (Nathan Lambert) ▷ #memes (42 messages🔥):

  • Reinforcement Learning (RL)
  • Tulu 3 vs Hermes 3
  • Llama 3 Paper
  • OpenAI's Early Works
  • Anthropic's Model Release

Links mentioned:


Interconnects (Nathan Lambert) ▷ #reads (2 messages):

  • Doria's Work
  • Understanding Spoken Content

Latent Space ▷ #ai-general-chat (72 messages🔥🔥):

  • Pixtral Large release
  • Mistral's Le Chat updates
  • Qwen 2.5 Turbo features
  • Lindy usability concerns
  • OpenAI streaming availability

Links mentioned:


Latent Space ▷ #ai-in-action-club (55 messages🔥🔥):

  • Windsurf Editor
  • Anthropic API
  • Cursor vs Windsurf
  • Codeium Demo Enhancements
  • AI's Context Management

Links mentioned:


GPU MODE ▷ #general (18 messages🔥):

  • Dynamic Parallelism in CUDA
  • Cloud Providers for (G)H200
  • Zoom Talks
  • Learning CUDA on Cloud Platforms
  • Profiling Information for CUDA Kernels

GPU MODE ▷ #triton (6 messages):

  • Modified FlashAttention in Triton
  • Triton CPU Backend with torch.compile
  • Community Humor

GPU MODE ▷ #torch (41 messages🔥):

  • Advanced PyTorch Resources
  • Custom Kernels with PyBind
  • PyTorch DCP Memory Usage
  • FSDP State Dict Functionality

Links mentioned:


GPU MODE ▷ #announcements (1 messages):

  • Jay Shah's talk at CUTLASS
  • Epilogue Fusion in CUTLASS
  • GPU passthrough on Proxmox VE

Link mentioned: Articles: We present expository-style articles and coding tutorials on our blog.


GPU MODE ▷ #cool-links (2 messages):

  • ZLUDA
  • CUDA Alternatives
  • AMD and Intel GPUs

Link mentioned: #246 Developer Of ZLUDA: CUDA For Non Nvidia GPUs | Andrzej Janik: CUDA is one of the primary reasons people buy NVIDIA GPUs but what if there was a way to have this compute power on AMD and Intel GPUs as well. Well there is...


GPU MODE ▷ #beginner (8 messages🔥):

  • NCU Source View
  • Live Register Counts
  • CUDA and Tensor Core Clock Speeds
  • Thermal Throttling Mitigation

Link mentioned: NVIDIA H100 Tensor Core GPU Architecture Overview: A high-level overview of NVIDIA H100, new H100-based DGX, DGX SuperPOD, and HGX systems, and a H100-based Converged Accelerator. This is followed by a deep dive into the H100 hardware architecture, ef...


GPU MODE ▷ #pmpp-book (3 messages):

  • CUDA grid and block configuration
  • Confusion in documentation

GPU MODE ▷ #youtube-recordings (3 messages):

  • CUTLASS and Flash Attention 3
  • Column and Row Permutations in FA3
  • Indexing Techniques in GPU Computing

Link mentioned: Lecture 36: CUTLASS and Flash Attention 3: Speaker: Jay ShahSlides: https://github.com/cuda-mode/lecturesCorrection by Jay: "It turns out I inserted the wrong image for the intra-warpgroup overlappin...


GPU MODE ▷ #off-topic (3 messages):

  • Steamed Hams Meme
  • Tasty Hacks
  • Hackathon Culture
  • Creative Projects
  • Community Engagement

Link mentioned: tastyhacks '24 berkeley · Luma: many hackathons nowadays have been tainted by status. participants optimize for winning by incorporating sponsor prizes minimally in their hacks, which later…


GPU MODE ▷ #triton-puzzles (1 messages):

jongjyh: thx bro!


GPU MODE ▷ #rocm (5 messages):

  • CK Profiler Results
  • FP16 Matrix Multiplication Performance
  • H100 vs MI300X
  • Async Copy with TMA
  • AMD Optimization Challenges

GPU MODE ▷ #liger-kernel (1 messages):

0x000ff4: okay how I can contribute to the project can you direct me 🙂


GPU MODE ▷ #self-promotion (4 messages):

  • YouTube channel on quantization
  • X platform AI/ML content
  • Blog on Tensor Core matmul kernel

Links mentioned:


GPU MODE ▷ #🍿 (23 messages🔥):

  • Finetuning Loop Development
  • Job Queue Access
  • Discord Competition Infrastructure
  • Training Data Sources
  • Scheduler Development

GPU MODE ▷ #thunderkittens (4 messages):

  • Template Parameter Inference
  • Register Limit in WGs
  • Register Spills in TK Programs

GPU MODE ▷ #edge (4 messages):

  • Memory Bound in High Context Generation
  • Neuron Culling in Small Language Models
  • Speculative Decoding Challenges

Notebook LM Discord ▷ #use-cases (33 messages🔥):

  • NotebookLM experiments
  • Audio creations with NotebookLM
  • Panel speaker briefings
  • Use case for spending analysis
  • Feedback on NotebookLM

Links mentioned:


Notebook LM Discord ▷ #general (90 messages🔥🔥):

  • NotebookLM issues
  • Feature requests
  • Using NotebookLM for gaming
  • Audio file concerns
  • Integration with external sources

Links mentioned:


Modular (Mojo 🔥) ▷ #mojo (51 messages🔥):

  • Mojo Benchmarking
  • Handling Pstate Driver Issues
  • Dict Struct Bugs in Mojo
  • AstNode Struct Implementation
  • CPU Frequency Error Handling

Links mentioned:


Modular (Mojo 🔥) ▷ #max (10 messages🔥):

  • Max Graphs and Knowledge Graphs
  • Using MAX for Graph Searches
  • Feasibility of LLM Inference with MAX
  • Pregel Integration with MAX
  • Memory Requirements for Encoding Graphs

Link mentioned: GitHub - microsoft/graphrag: A modular graph-based Retrieval-Augmented Generation (RAG) system: A modular graph-based Retrieval-Augmented Generation (RAG) system - microsoft/graphrag


Cohere ▷ #discussions (41 messages🔥):

  • Issues with Cohere Model Output
  • User Experience with Different k Values
  • Text Adventure Use Case Challenges
  • Reactions to Cohere's Performance
  • Documents for Office Hours Topics

Links mentioned:


Cohere ▷ #announcements (1 messages):

  • Cohere Developer Office Hours
  • Long text strategies
  • Memory in RAG systems
  • Compression and summarization
  • Use Case discussions

Cohere ▷ #questions (4 messages):

  • Cohere API issues
  • Playwright and file uploads
  • Unexpected tokens in API responses
  • Turning off citations

Cohere ▷ #api-discussions (1 messages):

  • API Errors
  • Service Unavailable Issues

Cohere ▷ #cohere-toolkit (3 messages):

  • Toolkit Release v1.1.3
  • New Features in Toolkit
  • Development Experience Improvements

Link mentioned: Release 2024-11-18 (v1.1.3) · cohere-ai/cohere-toolkit: What's Changed Improve global Settings usage to deal with settings that aren't set Major tool refactoring: Clarify tool schema names (eg ManagedTool -> ToolDefinition, ToolName -> Tool...


LlamaIndex ▷ #blog (4 messages):

  • Ask AI widget in documentation
  • Multimedia Research Report Generator
  • Structured Financial Report Generation
  • Mistral Multi-Modal Image Model Launch

Link mentioned: Multi-Modal LLM using Mistral for image reasoning - LlamaIndex: no description found


LlamaIndex ▷ #general (36 messages🔥):

  • condenseQuestionChatEngine
  • CitationQueryEngine
  • CSV data handling
  • EY Techathon team building
  • blockchain development collaboration

Link mentioned: MLflow LlamaIndex Flavor: no description found


LlamaIndex ▷ #ai-discussion (1 messages):

  • EY Techathon Team
  • AI Developer Position
  • Web App Developer Position

OpenAccess AI Collective (axolotl) ▷ #general (20 messages🔥):

  • Liger Kernel Performance
  • DPO Implementation Feedback
  • Web3 Job Listings
  • Model Optimization Requests

Link mentioned: Reddit - Dive into anything: no description found


OpenAccess AI Collective (axolotl) ▷ #other-llms (2 messages):

  • AnyModal Framework
  • Chai Research Grants
  • Generative AI
  • Open-source Projects
  • Community-driven AGI

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (1 messages):

  • vLLM analytics
  • Token usage inspection

OpenAccess AI Collective (axolotl) ▷ #datasets (3 messages):

  • Pretraining with Instruction-Based Datasets
  • Mathematical Sequence Problems
  • Code Availability for Instruction Datasets

Link mentioned: instruction-pretrain/ft-instruction-synthesizer-collection · Datasets at Hugging Face: no description found


OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (9 messages🔥):

  • Pretraining and Finetuning Qwen/Qwen2
  • Phorm Bot Issues
  • Understanding eval_steps

Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search): Understand code, faster.


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (6 messages):

  • eval_steps inquiry
  • Phorm response issues

Link mentioned: OpenAccess-AI-Collective/axolotl | Phorm AI Code Search): Understand code, faster.


tinygrad (George Hotz) ▷ #general (26 messages🔥):

  • Tinygrad Contributions
  • Release Schedule
  • Alias Implementation
  • Int64 Indexing Bounty
  • Graph and Buffer Improvements

Links mentioned:


tinygrad (George Hotz) ▷ #learn-tinygrad (4 messages):

  • LSTM Training on M1 Mac
  • AMD GPU Issues with PyTorch
  • TinyGrad for AMD
  • TinyNet Training Example
  • JIT Compilation Problems

LLM Agents (Berkeley MOOC) ▷ #hackathon-announcements (1 messages):

  • Intel Tiber AI Cloud
  • Intel Liftoff Program
  • AMA with Intel

Link mentioned: Building with Intel: Tiber AI Cloud and Intel Liftoff · Luma: Building with Intel: Tiber AI Cloud and Intel Liftoff About the AMA Join us for an exclusive AMA session featuring specialists from Intel, our esteemed sponsor…


LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 messages):

  • Lecture 10 Announcement
  • Percy Liang's Presentation
  • Open-Source Foundation Models
  • Course Resources

Link mentioned: CS 194/294-196 (LLM Agents) - Lecture 10, Percy Liang: no description found


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (3 messages):

  • Team Seeking
  • Quiz Score Notifications

LLM Agents (Berkeley MOOC) ▷ #mooc-lecture-discussion (10 messages🔥):

  • Grade Availability
  • Writing Section Deadline
  • Missing Lecture Slides
  • Hackathon Article Deadline
  • Using Substack for Submissions

DSPy ▷ #show-and-tell (1 messages):

  • DSPy VLM tutorial
  • Attributes extraction from images

Link mentioned: Tweet from Karthik Kalyanaraman (@karthikkalyan90): 🧵DSPy recently added support for VLMs in beta. A quick thread on attributes extraction from images using DSPy. For this example, we will see how to extract useful attributes from screenshots of websi...


DSPy ▷ #general (10 messages🔥):

  • DSPy signatures
  • Username generation strategies
  • Code analysis with DSPy
  • LLM caching issues
  • Randomization in LLM outputs

Link mentioned: Tweet from Omar Khattab (@lateinteraction): new hobby: dspy code golf super short pseudocode with natural language tasks should just work and be optimizable Quoting Ajay Singh (@ajay_frontiers) For reliability, nothing beats DSPy (thanks to...


LAION ▷ #general (8 messages🔥):

  • VisRAG Talk
  • Hackathon Culture
  • Copyright Trolls
  • Legal Discussions

Links mentioned:


LAION ▷ #research (1 messages):

  • MultiNet Benchmark
  • Vision-Language-Action models
  • VLA model performance
  • Prompt engineering in robotics
  • Mini VLA model μGATO

Links mentioned:


MLOps @Chipro ▷ #events (2 messages):

  • Starting with MLOps
  • Seeking Clarification
  • Complexity in MLOps

Mozilla AI ▷ #announcements (2 messages):

  • Common Corpus dataset
  • Transformer Lab Demo

Torchtune ▷ #general (1 messages):

  • DCP async checkpointing
  • Intermediate checkpointing efficiency

Link mentioned: [DCP][RFC] DCP async checkpointing in TorchTune for intermediate checkpoints [WIP] by saumishr · Pull Request #2006 · pytorch/torchtune: Context What is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here) This diff introduces the DistributedCheckpointing based ...


AI21 Labs (Jamba) ▷ #jamba (1 messages):

rotem2733: Hello?




{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}