Frozen AI News archive

DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality

**DeepSeek Janus** and **Meta SpiRit-LM** are two notable multimodal AI models released recently, showcasing advances in image generation and speech synthesis respectively. DeepSeek Janus uses separate vision encoders for image understanding and image generation, achieving better results in both tasks. Meta's SpiRit-LM is a speech and writing model with an expressive version that generates pitch and style units, improving over standard TTS. Additionally, **W&B Weave** offers comprehensive LLM observability and multimodality fine-tuning tools. Industry updates include Nvidia's Nemotron 70b model underperforming, Meta open-sourcing Movie Gen Bench for media generation benchmarking, Perplexity launching internal search with multi-step reasoning, and Anthropic updating Claude apps. Open source progress includes Hugging Face's gradient accumulation fix in transformers and advocacy for open source AI to prevent Big Tech dominance. *"Model merging for combining skills of multiple models"* is also highlighted.


AI News for 10/17/2024-10/18/2024. We checked 7 subreddits, 433 Twitters and 31 Discords (228 channels, and 2111 messages) for you. Estimated reading time saved (at 200wpm): 249 minutes. You can now tag @smol_ai for AINews discussions!

It is multimodality day in AI research land as two notable multimodality papers were released: Janus and SpiRit-LM.

DeepSeek Janus

Earlier work like Chameleon (our coverage here) and Show-O used a single vision encoder for both visual understanding (image input) and generation (image output). DeepSeek separated them and found better results than comparably sized models in both image generation and image understanding.

It remains an open question whether this approach maintains its advantage at scale, and whether it is really all that important to include image generation in the same stack.
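To make the idea of decoupled visual encoding concrete, here is a toy sketch in plain Python. It is not DeepSeek's code: every function name is hypothetical, the "encoders" are trivial arithmetic stand-ins, and the split into continuous features for understanding versus discrete codes for generation is an assumption made for illustration. The only point it shows is the routing: two separate visual front ends feeding one shared sequence model.

```python
from typing import List, Tuple

Pixels = List[int]       # flattened grayscale image, values 0..255
Sequence = List[float]   # what the shared backbone consumes


def understanding_encoder(pixels: Pixels, patch: int = 4) -> Sequence:
    """Hypothetical understanding path: one continuous feature per patch.

    Each feature is the mean intensity of the patch, scaled to [0, 1].
    """
    features = []
    for start in range(0, len(pixels), patch):
        chunk = pixels[start:start + patch]
        features.append(sum(chunk) / (255 * len(chunk)))
    return features


def generation_tokenizer(pixels: Pixels, codebook_size: int = 8) -> List[int]:
    """Hypothetical generation path: one discrete code per pixel.

    Stands in for a learned quantizer by bucketing intensities.
    """
    bucket = 256 // codebook_size
    return [p // bucket for p in pixels]


def shared_backbone(inputs: Sequence) -> Sequence:
    """Placeholder for the single shared sequence model: a running mean."""
    outputs, total = [], 0.0
    for count, value in enumerate(inputs, start=1):
        total += value
        outputs.append(total / count)
    return outputs


def forward(task: str, pixels: Pixels) -> Tuple[str, Sequence]:
    """Route the image through the front end that matches the task."""
    if task == "understand":
        return "understanding_encoder", shared_backbone(understanding_encoder(pixels))
    if task == "generate":
        codes = generation_tokenizer(pixels)
        return "generation_tokenizer", shared_backbone([float(c) for c in codes])
    raise ValueError(f"unknown task: {task!r}")


image = [0, 0, 0, 0, 255, 255, 255, 255]
understand_path, understand_out = forward("understand", image)
generate_path, generate_out = forward("generate", image)
```

A single-encoder design would send both tasks through the same front end; here the two tasks see differently shaped inputs (2 patch features versus 8 codes) while sharing the backbone.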

Meta SpiRit-LM

Along with SAM 2.1 and Layer Skip, Meta's Friday drop included SpiRit-LM, a (Spi)eech and W(Rit)ing model that also includes an "expressive" version generating pitch and style units.


The demo has voice samples - not quite NotebookLM level, but you can see how this is a step above standard TTS.



Brought to you by W&B Weave: The best ML experiment tracking software in the world is now offering complete LLM observability!

With 3 lines of code you can trace all LLM inputs, outputs and metadata. Then with our evaluation tooling, you can turn AI Engineering from an art into a science.

P.S. Weave also works for multimodality - see how to fine-tune and evaluate GPT-4o on image data.



{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Industry Updates and Developments

AI Research and Technical Insights

AI Applications and Use Cases

AI Community and Career Insights


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. High-Performance Local LLM Setups

Theme 2. DeepSeek's Janus: A 1.3B Multimodal Model Breakthrough

Theme 3. Meta AI's Hidden Prompt Controversy

Theme 4. AI-Powered Game Development Innovations

Theme 5. LLM API Cost and Performance Comparison Tools

Other AI Subreddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Research and Development

AI Applications and Demonstrations

Robotics Advancements

AI Ethics and Societal Impact

Community Discussion


AI Discord Recap

A summary of Summaries of Summaries by O1-mini

Theme 1. Model Performance and Evaluations

Theme 2. Advanced Training Techniques

Theme 3. Cutting-Edge Tools and Frameworks

Theme 4. Innovative AI Applications

Theme 5. Community and Collaborative Efforts


PART 1: High level Discord summaries

Nous Research AI Discord


HuggingFace Discord


Eleuther Discord


OpenRouter (Alex Atallah) Discord


LM Studio Discord


Latent Space Discord


Perplexity AI Discord


Modular (Mojo 🔥) Discord


aider (Paul Gauthier) Discord


OpenAI Discord


GPU MODE Discord


LlamaIndex Discord


OpenAccess AI Collective (axolotl) Discord


Cohere Discord


tinygrad (George Hotz) Discord


Interconnects (Nathan Lambert) Discord


Stability.ai (Stable Diffusion) Discord


LLM Agents (Berkeley MOOC) Discord


LAION Discord


DSPy Discord


Torchtune Discord


OpenInterpreter Discord


LangChain AI Discord


Alignment Lab AI Discord


LLM Finetuning (Hamel + Dan) Discord


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Nous Research AI ▷ #general (324 messages🔥🔥):

  • Octopus Password Mystery
  • Fine-tuning Models
  • LLM Performance Evaluations
  • Strawberry Problem
  • Anthropic's Updates



Nous Research AI ▷ #ask-about-llms (4 messages):

  • Rust ML Libraries
  • Transition from Python to Rust
  • torch-rs
  • burn and ochre

Nous Research AI ▷ #research-papers (1 message):

chiralcarbon: https://arxiv.org/abs/2410.13848


Nous Research AI ▷ #interesting-links (3 messages):

  • SCP generator
  • LLM Culture repository




HuggingFace ▷ #general (214 messages🔥🔥):

  • Using AI in Coding
  • Learning Python
  • Factorio Game Discussion
  • Kaggle Competition Insights
  • PlandexAI Discussion



HuggingFace ▷ #today-im-learning (3 messages):

  • LLM Evaluation
  • Finetuning Flux Models
  • BitNet Framework



HuggingFace ▷ #cool-finds (1 message):

  • Perplexity for Finance
  • Stock Research Tools

Link mentioned: Tweet from Perplexity (@perplexity_ai): Perplexity for Finance: Real-time stock quotes. Historical earning reports. Industry peer comparisons. Detailed analysis of company financials. All with delightful UI. Have fun researching the marke...


HuggingFace ▷ #i-made-this (5 messages):

  • AI Content Detection Web App
  • Style Transfer Function
  • Behavioral Economics in Decision-Making
  • Fine-tuning and Model Merging
  • Cognitive Biases in Financial Crises



HuggingFace ▷ #reading-group (11 messages🔥):

  • HuggingFace Reading Group
  • Intel Patent for Code Generation LLM
  • Discord Stage Channels
  • AI Resources for Beginners

Link mentioned: US20240111498A1 - Apparatus, Device, Method and Computer Program for Generating Code using an LLM - Google Patents: no description found


HuggingFace ▷ #computer-vision (2 messages):

  • Out of context object detection
  • Importance of context in image analysis
  • Training models for detection
  • Creating 'others' class

HuggingFace ▷ #NLP (6 messages):

  • Setfit Model Logging
  • Argilla Version Issues

Link mentioned: Argilla: Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets.


HuggingFace ▷ #diffusion-discussions (27 messages🔥):

  • Kwai Kolors in Google Colab
  • ControlNet training considerations
  • Renting VMs for diffusion models
  • Instance types and pricing on AWS EC2

Link mentioned: yisol/IDM-VTON · Hugging Face: no description found


Eleuther ▷ #general (5 messages):

  • Open Source AI Definition
  • Contributions to RWKV
  • Open Source AI projects

Link mentioned: The Open Source AI Definition – 1.0-RC1: Endorse the Open Source AI Definition: have your organization appended to the press release announcing version 1.0 version 1.0-RC1 Preamble Why we need Open Source Artificial Intelligence (AI) Open So...


Eleuther ▷ #research (168 messages🔥🔥):

  • SAE Steering Challenges
  • Noise Distribution in Training
  • Future-Correlation in Machine Learning
  • Interpreting SAE vs Transformer Models
  • Improving Computational Efficiency



Eleuther ▷ #lm-thunderdome (2 messages):

  • Huggingface Adapter Issues
  • Summarization Task Errors

Link mentioned: lm-evaluation-harness/lm_eval/models/huggingface.py at 624017b7f4501638b0d5848d0f0eab2914a7fb2c · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness


OpenRouter (Alex Atallah) ▷ #general (167 messages🔥🔥):

  • Nemotron 70B performance
  • OpenRouter data policies
  • GPT-4o model responses
  • Privacy policy linking
  • Kuzco as a provider



LM Studio ▷ #general (126 messages🔥🔥):

  • LM Studio Auto Scroll Issues
  • ROCM Compatibility with AMD GPUs
  • Performance of Different Language Models
  • Agent Zero AI Framework
  • Cache Memory Management in MLX-LM



LM Studio ▷ #hardware-discussion (11 messages🔥):

  • ROCM support on 580s
  • Xeon CPU thread adjustments
  • Performance of modified 580s
  • Utilization monitoring in Linux

Latent Space ▷ #ai-general-chat (56 messages🔥🔥):

  • Claude App Update
  • Inference Providers for LLM Completions
  • MotherDuck SQL Function for LLMs
  • Voyage AI and Embeddings
  • DeepMind Grandmaster Chess Player



Latent Space ▷ #ai-announcements (6 messages):

  • Drew Houston's podcast
  • AI and Dropbox features
  • Coding with LLMs
  • Company size commentary

Link mentioned: Tweet from Alessio Fanelli (@FanaHOVA): 7 years ago @drewhouston told @sama the biggest opportunity in startups was AI. Now, he is rebuilding Dropbox to be the curation layer for your "silicon brain" 🧠 Our @latentspacepod chat co...


Latent Space ▷ #ai-in-action-club (67 messages🔥🔥):

  • Code Diffusion and ASTs
  • Recording Availability
  • Compiler Courses Interest
  • Code Transformation Techniques



Perplexity AI ▷ #general (96 messages🔥🔥):

  • Perplexity subscription issues
  • Discussion on Spaces functionality
  • API performance concerns
  • Enterprise use cases
  • User experiences with Perplexity



Perplexity AI ▷ #sharing (9 messages🔥):

  • Starlink Gigabit Speed Plan
  • Seutaringkeu Insights
  • Photoshop Functionality
  • Long COVID Research
  • Understanding APIs

Link mentioned: YouTube: no description found


Perplexity AI ▷ #pplx-api (2 messages):

  • PPLX Playground Accuracy
  • PPLX API Response Differences
  • System Prompt Variations

Modular (Mojo 🔥) ▷ #general (27 messages🔥):

  • Mojo Documentation Feedback
  • Mojo's Performance Focus
  • Building a Pythonic Interface
  • Tensor Implementation in Mojo
  • Community Engagement and Future Plans



Modular (Mojo 🔥) ▷ #mojo (75 messages🔥🔥):

  • Mojo's Compatibility
  • Networking in Mojo
  • Transitioning from Python
  • Language Preferences
  • Swift vs. Rust

Modular (Mojo 🔥) ▷ #max (2 messages):

  • Max GPU support
  • Apple Metal

aider (Paul Gauthier) ▷ #general (60 messages🔥🔥):

  • Installing Aider
  • Using O1 Models in Aider
  • Pair Programming with Aider
  • Alternatives to Aider for UI/UX Design
  • Durable Execution in Aider



aider (Paul Gauthier) ▷ #questions-and-tips (25 messages🔥):

  • Aider file commit errors
  • Token limit issues
  • Deno integration with Aider
  • Controlling repo map output
  • Installation errors



aider (Paul Gauthier) ▷ #links (1 message):

mittens4025: https://shchegrikovich.substack.com/p/use-prolog-to-improve-llms-reasoning


OpenAI ▷ #ai-discussions (25 messages🔥):

  • Advanced Voice Mode Issues
  • Glif Workflow Tool
  • ChatGPT Windows App Feedback

OpenAI ▷ #gpt-4-discussions (32 messages🔥):

  • ChatGPT for Windows
  • Voice Functionality in ChatGPT
  • Privacy Concerns with Screen Sharing
  • Code Generation Issues
  • AI Model Performance Issues

OpenAI ▷ #prompt-engineering (3 messages):

  • Voice AI engineering
  • Image generation spelling

OpenAI ▷ #api-discussions (3 messages):

  • Voice AI engineers
  • Image generation accuracy

GPU MODE ▷ #general (8 messages🔥):

  • Edge deployment projects
  • Sampling inefficiencies
  • Performance differences in gemm
  • Lazy evaluation in MLX
  • Inference speed bottlenecks



GPU MODE ▷ #triton (3 messages):

  • Unplanned Closure of Discussion
  • Bug in Integer Packed Tensors
  • Build Error in Triton
  • CMake Configuration Issues

Link mentioned: GitHub - triton-lang/triton: Development repository for the Triton language and compiler: Development repository for the Triton language and compiler - triton-lang/triton


GPU MODE ▷ #torch (11 messages🔥):

  • Flex Attention with DDP Workarounds
  • Using Shared Memory in CUDA



GPU MODE ▷ #beginner (25 messages🔥):

  • GPU Mathematics vs Engineering
  • Parallel Processing Scaling Laws
  • Triton and Tensor Cores Usage
  • Benchmarking in Triton

Link mentioned: triton.testing.do_bench — Triton documentation: no description found


GPU MODE ▷ #torchao (2 messages):

  • Performance comparison
  • Torch versions

GPU MODE ▷ #llmdotc (6 messages):

  • Stable Diffusion Optimization
  • Inference Pipeline in C
  • GGML Library Limitations

Link mentioned: GitHub - leejet/stable-diffusion.cpp: Stable Diffusion and Flux in pure C/C++: Stable Diffusion and Flux in pure C/C++. Contribute to leejet/stable-diffusion.cpp development by creating an account on GitHub.


GPU MODE ▷ #bitnet (1 message):

  • Open Source Re-Implementations
  • T-MAC Low-Bit Inference
  • RMSNorm Variations

Link mentioned: GitHub - microsoft/T-MAC: Low-bit LLM inference on CPU with lookup table: Low-bit LLM inference on CPU with lookup table. Contribute to microsoft/T-MAC development by creating an account on GitHub.


GPU MODE ▷ #sparsity-pruning (1 message):

  • Sparse-Dense Multiplication
  • PyTorch CUDA Performance

GPU MODE ▷ #webgpu (1 message):

fancytrevor: if anyone is at the webai summit im kicking around, would be cool to say hi


LlamaIndex ▷ #blog (3 messages):

  • MongoDB Hybrid Search
  • Auth0 AI Applications
  • Hackathon Projects

Link mentioned: GitHub - auth0-lab/market0: sample app about authz and AI: sample app about authz and AI. Contribute to auth0-lab/market0 development by creating an account on GitHub.


LlamaIndex ▷ #general (46 messages🔥):

  • Faithfulness evaluation replication
  • LlamaParse failure in Docx files
  • Handling exceptions in workflows
  • Parallel function calling in workflows
  • Using Ollama in npx create-llama



LlamaIndex ▷ #ai-discussion (1 message):

  • Query Planning
  • LlamaIndex
  • Information Retrieval
  • Natural Language Processing

Link mentioned: Query Planning Workflow with LlamaIndex: Ankush k Singal


OpenAccess AI Collective (axolotl) ▷ #general (45 messages🔥):

  • Bitnet Release
  • Liger Flash Attention Integration
  • VRAM Savings with Liger
  • Liger Installation Issues
  • Axolotl Configuration



Cohere ▷ #discussions (13 messages🔥):

  • Stealth Project with Aya
  • Discussion with Gemini
  • Language Translation Experiment

Link mentioned: Gemini - AI Discussion: Nature of LLMs, Reasoning, Future: Created with Gemini


Cohere ▷ #questions (7 messages):

  • RAG AMAs Recording
  • Cohere Command R+ Issues

Cohere ▷ #api-discussions (6 messages):

  • Trial User Access
  • Fine-Tuning Rerank Context Window

Cohere ▷ #projects (7 messages):

  • Claude-Haiku
  • Prompt efficiency
  • Toolkit mention
  • Fast responses
  • Updated prompts

tinygrad (George Hotz) ▷ #general (12 messages🔥):

  • Compositional Linear Algebra (CoLA)
  • OpenCL Setup Issues
  • Tinygrad Optimization Strategies



tinygrad (George Hotz) ▷ #learn-tinygrad (18 messages🔥):

  • Transferability of Tinygrad skills
  • Jim Keller discussion insights
  • Helpful Tinygrad resources
  • Debugging reinforcement learning
  • MuJoCo interface challenges



Interconnects (Nathan Lambert) ▷ #news (2 messages):

  • Janus GitHub Repository
  • Text and Image Processing

Link mentioned: GitHub - deepseek-ai/Janus: Contribute to deepseek-ai/Janus development by creating an account on GitHub.


Interconnects (Nathan Lambert) ▷ #ml-questions (9 messages🔥):

  • Inference Providers for Chat Assistants
  • Special Tokens in Chat Models
  • Pre-Filling Responses
  • OpenRouter Assistant Prefill Feature

Link mentioned: OpenRouter: LLM router and marketplace


Interconnects (Nathan Lambert) ▷ #ml-drama (3 messages):

  • Garrison Lovely's behavior
  • Greg Brockman's return to OpenAI
  • Changes at OpenAI



Interconnects (Nathan Lambert) ▷ #random (3 messages):

  • Artifacts Log Utility
  • Community Engagement for Pixmo
  • Data Discovery



Interconnects (Nathan Lambert) ▷ #rlhf (11 messages🔥):

  • Instruction tuning an LLM
  • Data quality in tuning
  • Preference tuning (RLHF)
  • DPO for persona responses
  • Reaction improvements in Discord

Stability.ai (Stable Diffusion) ▷ #general-chat (20 messages🔥):

  • Gen AI Hackathon
  • Creating Checkpoints
  • Seamless Image Generation
  • Training Models
  • Sampling Methods for Cartoon Style



LLM Agents (Berkeley MOOC) ▷ #mooc-announcements (1 message):

  • Quiz 6 Release
  • Course Signup
  • MOOC Channel for Discussion
  • Guest Speakers
  • External Partnerships

Link mentioned: Large Language Model Agents: no description found


LLM Agents (Berkeley MOOC) ▷ #mooc-questions (17 messages🔥):

  • Course Signup Process
  • Feedback on Article Assignments
  • Livestream Announcements
  • Quiz Access Issues
  • Discord Community Engagement

Link mentioned: Large Language Model Agents: no description found


LAION ▷ #general (11 messages🔥):

  • Gen AI Hackathon
  • Pixtral vs Qwen2 Performance
  • L3_2 Training Issues
  • Explicit Content Captioning
  • NSFW Evaluation Chaos

Link mentioned: Vertical Specific AI Agents Hackathon · Luma: Gen AI Agents CreatorsCorner, collaborating with aixplain, Sambanova Systems, Prem, Marly, Senso, Mistral, coval, heygen, fiberplane, exa, and others…


DSPy ▷ #general (1 message):

  • LRM using DSPy
  • Token costs for LLM-based applications
  • GPT-4 pricing changes

Link mentioned: Drop o1 Preview, Try This Alternative: Building robust LLM-based applications is token-intensive. You often have to plan for the parsing and digestion of a lot of tokens for summarization or even retrieval augmented generation. Even the me...


DSPy ▷ #colbert (8 messages🔥):

  • ColBERTv2 training
  • N-way tuples with scores
  • PATH implementation
  • DeBERTa and MiniLM usage
  • Training with pylate



Torchtune ▷ #general (1 message):

  • Qwen2.5 Pull Request
  • Torchtune updates

Link mentioned: Qwen2.5 by calvinpelletier · Pull Request #1863 · pytorch/torchtune: Context What is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here) Issue #1624 Changelog TODO Test plan TODO run pre-comm...


Torchtune ▷ #dev (7 messages):

  • Torchtune training approaches
  • Preference pair generation
  • RLAIF paper application
  • Iterative training process
  • DPO vs PPO methods

OpenInterpreter ▷ #general (3 messages):

  • Automating document editing
  • Aider AI enhancements
  • Open Interpreter development

OpenInterpreter ▷ #ai-content (1 message):

abhichaturvedi_94225: Thanks <@631210549170012166>


LangChain AI ▷ #share-your-work (1 message):

  • Capital Companion
  • AI trading assistant
  • LangChain
  • LangGraph
  • Advanced trading strategies

Link mentioned: Capital Companion - AI Trading Assistant for Stocks Today | Best Trading Strategy: Enhance your swing trade stocks strategy with AI-driven insights on trending stocks, equity trading software, and comprehensive technical analysis for the best trading strategy.


Alignment Lab AI ▷ #general (1 message):

  • Twitter/X Embed Fix
  • Discord Integration

Link mentioned: Tweet from GitHub - FixTweet/FxTwitter: Fix broken Twitter/X embeds! Use multiple images, videos, polls, translations and more on Discord, Telegram and others: Fix broken Twitter/X embeds! Use multiple images, videos, polls, translations and more on Discord, Telegram and others - FixTweet/FxTwitter


LLM Finetuning (Hamel + Dan) ▷ #general (1 message):

  • LLM Use Cases
  • Mapping Questions-Answers
  • Community Repositories





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}