Frozen AI News archive

Ideogram 2 + Berkeley Function Calling Leaderboard V2

**Ideogram** returns with a new image generation model featuring **color palette control**, a fully controllable API, and an iOS app, reaching a milestone of **1 billion images created**. Meanwhile, **Midjourney** released a Web UI but still lacks an API. In function calling, the **Berkeley Function Calling Leaderboard (BFCL)** updated to **BFCL V2 • Live**, adding **2251 live, user-contributed function documentation and queries** to improve evaluation quality. **GPT-4** leads the leaderboard, but the open-source **Functionary Llama 3-70B finetune** from Kai surpasses **Claude**. On AI model releases, **Microsoft** launched three **Phi-3.5** models with impressive reasoning and context window capabilities, while **Meta AI FAIR** introduced **UniBench**, a unified benchmark suite for over **50 vision-language model tasks**. **Baseten** improved **Llama 3** inference speed by up to **122%** using Medusa. A new cybersecurity benchmark, **Cyberbench**, featuring **40 CTF tasks**, was released. Additionally, **Codegen** was introduced as a tool for programmatic codebase analysis and AI-assisted development. *"Multiple functions > parallel functions"* was highlighted as a key insight in function calling.

AI News for 8/20/2024-8/21/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (254 channels, and 1980 messages) for you. Estimated reading time saved (at 200wpm): 222 minutes. You can now tag @smol_ai for AINews discussions!

Thanks to @levelsio for shouting us out on the Lex Fridman pod!

'Tis the season of sequels.

After the spectacular launch of Flux (from the former Stable Diffusion team; our coverage here), Ideogram (the former Google Imagen 1 team) is back with a vengeance. It shipped a new model with 5 distinct styles and color palette control, a fully controllable API, and an iOS app (sorry, Android friends), and announced a milestone of 1 billion images created. There's no research paper, of course, but Ideogram is catapulted back to top image lab status, while Midjourney just released a Web UI (still no API).

Meanwhile in AI Engineer land, the Gorilla team updated the Berkeley Function Calling Leaderboard (now commonly known as BFCL) to BFCL V2 • Live, adding 2251 "live, user-contributed function documentation and queries, avoiding the drawbacks of dataset contamination and biased benchmarks." They also note that multiple functions > parallel functions:

a very high demand for the feature of having to intelligently choose between functions (multiple functions) and lower demand for making parallel function calls in a single turn (parallel functions)
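To make the distinction concrete, here is a minimal Python sketch. It assumes BFCL's category names (simple, multiple, parallel, parallel multiple) are defined by how many function docs a test case offers and how many calls it expects. The `FunctionCall` type, the `classify_case` helper, the `no_call` label, and the three example functions are hypothetical illustrations, not BFCL's own code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FunctionCall:
    """One function call a model is expected to emit (hypothetical type)."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


# Hypothetical function documentation offered to the model in one test case.
AVAILABLE_FUNCTIONS = {
    "get_weather": {"city": "string"},
    "get_stock_price": {"ticker": "string"},
    "book_flight": {"origin": "string", "destination": "string"},
}


def classify_case(offered: list[str], expected_calls: list[FunctionCall]) -> str:
    """Label a test case by how many docs it offers and how many calls it expects."""
    if not expected_calls:
        return "no_call"
    unknown = [c.name for c in expected_calls if c.name not in offered]
    if unknown:
        raise ValueError(f"calls to undocumented functions: {unknown}")
    choose_between = len(offered) > 1         # "multiple": pick the right function
    several_calls = len(expected_calls) > 1   # "parallel": several calls in one turn
    if choose_between and several_calls:
        return "parallel_multiple"
    if several_calls:
        return "parallel"
    if choose_between:
        return "multiple"
    return "simple"
```

Picking `get_weather` out of the three offered docs is a "multiple" case; asking for the weather in two cities with only `get_weather` offered is a "parallel" case, the kind of request the Gorilla team reports seeing less often.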

The dataset weights were adjusted accordingly.

The depth and breadth of function calling are also important hyperparameters: the dataset now includes rare function documentation with 10+ function options, or a single complex function with 10+ nested parameters.
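One way to quantify breadth and depth is sketched below. It assumes function docs use a JSON-Schema-style `parameters` object (the convention OpenAI-style tool definitions follow) and reads "10+ nested parameters" as 10 or more named properties counted recursively through nested objects and arrays of objects. `count_parameters`, `is_rare_case`, and `EXAMPLE_DOC` are hypothetical helpers, not part of BFCL.

```python
from typing import Any


def count_parameters(schema: dict[str, Any]) -> int:
    """Count named properties in a JSON-Schema-style object, recursing into
    nested objects and arrays of objects (assumed reading of 'nested parameters')."""
    total = 0
    for prop in schema.get("properties", {}).values():
        total += 1  # the property itself
        if prop.get("type") == "object":
            total += count_parameters(prop)  # descend into a nested object
        elif prop.get("type") == "array" and isinstance(prop.get("items"), dict):
            total += count_parameters(prop["items"])  # descend into the element schema
    return total


def is_rare_case(function_docs: list[dict[str, Any]], threshold: int = 10) -> bool:
    """Flag a test case as broad (threshold+ function options) or deep
    (any single function with threshold+ parameters, counted recursively)."""
    if len(function_docs) >= threshold:
        return True  # breadth: many functions to choose between
    return any(
        count_parameters(doc.get("parameters", {})) >= threshold  # depth
        for doc in function_docs
    )


# Hypothetical example doc: 2 top-level parameters, 6 parameters in total.
EXAMPLE_DOC = {
    "name": "create_order",
    "parameters": {
        "type": "object",
        "properties": {
            "customer": {
                "type": "object",
                "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
            },
            "items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {"sku": {"type": "string"}, "qty": {"type": "integer"}},
                },
            },
        },
    },
}
```

Under these assumptions, `EXAMPLE_DOC` on its own is neither broad nor deep, while offering ten such docs in one test case would count as broad.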

GPT-4 dominates the new leaderboard, but the open-source Functionary Llama 3-70B finetune from Kai notably beats Claude.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

All recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Model Releases and Benchmarks

AI Applications and Tools

AI Research and Developments

AI Ethics and Societal Impact


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Optimizing LLM Performance: Finetuning and Deployment Strategies

Theme 2. Microsoft's Phi-3.5 Model Release: A New Frontier in Efficient AI

Theme 3. Creative AI Applications: Role-Playing and Character Generation

All AI Reddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity

AI Model Advancements and Releases

AI Capabilities and Benchmarks

AI in Industry and Applications

AI Development and Training

AI Industry Trends

Memes and Humor


AI Discord Recap

A summary of Summaries of Summaries by Claude 3.5 Sonnet

1. LLM Advancements and Benchmarking

2. Model Performance Optimization

3. Open-Source AI Developments

4. Multimodal AI and Generative Modeling

5. Misc


PART 1: High level Discord summaries

OpenRouter (Alex Atallah) Discord


Nous Research AI Discord


Stability.ai (Stable Diffusion) Discord


OpenAI Discord


Cohere Discord


Perplexity AI Discord


LlamaIndex Discord


OpenAccess AI Collective (axolotl) Discord


Latent Space Discord


Eleuther Discord


tinygrad (George Hotz) Discord


OpenInterpreter Discord


LangChain AI Discord


DSPy Discord


Torchtune Discord


MLOps @Chipro Discord


DiscoResearch Discord


The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The LLM Finetuning (Hamel + Dan) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The Interconnects (Nathan Lambert) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

  • Hermes 3
  • OpenAI deprecated parameters

OpenRouter (Alex Atallah) ▷ #general (138 messages🔥🔥):

  • Hermes 3
  • Phi 3.5
  • Phi 3.5 - Vision Model
  • Azure Pricing
  • GPT-4o Finetuning


Nous Research AI ▷ #research-papers (2 messages):

  • Model MoErging
  • Survey on Model MoErging

Link mentioned: Tweet from Prateek Yadav (@prateeky2806): We just released our survey on "Model MoErging", But what is MoErging?🤔Read on! Imagine a world where fine-tuned models, each specialized in a specific domain, can collaborate and "com...


Nous Research AI ▷ #datasets (1 messages):

  • Dataset Licenses
  • TLDRLegal

Nous Research AI ▷ #off-topic (7 messages):

  • Military Rations
  • Snapchat Clickbait


Nous Research AI ▷ #interesting-links (2 messages):

  • Semantic search on codebases
  • Codebase-wiki approach
  • Code translation for semantic search
  • Code chunking for semantic search

Nous Research AI ▷ #general (102 messages🔥🔥):

  • Hermes 3 Sentience
  • Replete-Coder-V2-Llama-3.1-8b
  • Model Merging
  • Training on Discord
  • Nous Funding


Nous Research AI ▷ #ask-about-llms (18 messages🔥):

  • Hermes 3
  • Hermes 3 - Llama 3.1 - 8B
  • Mistral
  • LLMs
  • Agents

Stability.ai (Stable Diffusion) ▷ #general-chat (114 messages🔥🔥):

  • Forge
  • AI Upscaling
  • Marketing
  • Rubbrband
  • Flux Pro


OpenAI ▷ #ai-discussions (80 messages🔥🔥):

  • AI Moat
  • AI skill
  • AI competency
  • AGI
  • Model Training

OpenAI ▷ #gpt-4-discussions (7 messages):

  • OpenAI API Limits
  • ChatGPT Plus vs OpenAI API
  • Training GPT
  • Life Coach App

OpenAI ▷ #prompt-engineering (5 messages):

  • Structured output
  • JSON mode
  • Stochasticity
  • Agent/Assistant GPT libraries

OpenAI ▷ #api-discussions (5 messages):

  • Structured Output vs JSON
  • API Questions

Cohere ▷ #discussions (33 messages🔥):

  • Command-R Fine-Tuning
  • Command-R Model Compatibility
  • Research in Industry
  • C4AI Community
  • Verified Resident Role


Cohere ▷ #questions (50 messages🔥):

  • Sensitive Data Detection
  • Document Chunking
  • RAG
  • Fine-Tuning Command-R
  • Classification

Link mentioned: Login | Cohere: Login for access to advanced Large Language Models and NLP tools through one easy-to-use API.


Cohere ▷ #projects (1 messages):

  • OpenSesame

Perplexity AI ▷ #announcements (1 messages):

  • Campus Strategist Program
  • Perplexity growth
  • Program Benefits

Perplexity AI ▷ #general (60 messages🔥🔥):

  • Perplexity bugs
  • Perplexity Pro Subscription
  • Perplexity API
  • Perplexity Campus Strategist
  • Perplexity Image Generation

Link mentioned: Tweet from Aravind Srinivas (@AravSrinivas): What do you think? Is it too late and no major differentiation or is it worth it ? What would you like to see in a Perplexity browser? Quoting Siu (@F22Siu) @AravSrinivas Should perplexity build a ...


Perplexity AI ▷ #sharing (11 messages🔥):

  • Perplexity search features
  • Otter.ai
  • Facebook Youth Appeal
  • Password Managers

Perplexity AI ▷ #pplx-api (6 messages):

  • API Citation Access
  • API Performance
  • Error 520 with Cloudflare

Link mentioned: Discussions: no description found


LlamaIndex ▷ #blog (2 messages):

  • LLMs in Production meetup
  • LlamaCloud

LlamaIndex ▷ #general (58 messages🔥🔥):

  • LlamaIndex Indexing
  • Retrieval Techniques
  • Agent Latency
  • Qdrant Embedding
  • RedisIndexStore

Link mentioned: AI process thousands of videos?! - SAM2 deep dive 101: Build your own SAM2 AI to analyse/edit video clips. Download Free Python Introduction Ebook: https://clickhubspot.com/1sf7 🔗 Links - Get full code breakdown & J...


OpenAccess AI Collective (axolotl) ▷ #general (17 messages🔥):

  • Phi-3.5-vision
  • Phi-3 Model Family
  • OpenAI's GPT-4o fine-tuning
  • Mistral fine-tuning

Link mentioned: microsoft/Phi-3.5-vision-instruct · Hugging Face: no description found


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (42 messages🔥):

  • Flash Attention GEMMA2
  • EOS/EOT Masking
  • Dataset Loader Issues
  • 8-bit GPU Support
  • Train on EOS setting


Latent Space ▷ #ai-general-chat (57 messages🔥🔥):

  • Zed AI Composer
  • Anthropic's Fast Edit Mode
  • Phi 3.5 mini
  • Aider v0.51.0
  • Waymo revenue


Latent Space ▷ #ai-announcements (1 messages):

  • Speculative Decoding
  • Paper Club
  • ReFT: Representation Fine Tuning

Link mentioned: Tweet from Latent.Space (@latentspacepod): This is entirely speculative but... Tomorrow's LS paper club with @picocreator is going to be extremely lit! come learn about the state of the art in Speculative Decoding! Quoting Latent.Space (...


Eleuther ▷ #general (7 messages):

  • Llama 3 405b lobotomization
  • Model MoErging
  • Instruction Tuning Datasets
  • Alpaca dataset
  • KV Cache in Models


Eleuther ▷ #research (20 messages🔥):

  • Long Context Reasoning
  • Mamba vs Transformers
  • Llama 3.1-style RoPE Scaling
  • Model MoErging


Eleuther ▷ #scaling-laws (1 messages):

infinit3e: https://huggingface.co/papers/2408.03314


Eleuther ▷ #lm-thunderdome (3 messages):

  • Llama Benchmarks
  • Chain of Thought Paper
  • Eval on ASDiV

tinygrad (George Hotz) ▷ #general (9 messages🔥):

  • Samba Weights
  • Tinygrad
  • Training Samba


tinygrad (George Hotz) ▷ #learn-tinygrad (10 messages🔥):

  • Tinygrad GPU error
  • 3060 CUDA issue
  • Mamba in Tinygrad
  • Tinygrad efficiency
  • Reproducible script needed

OpenInterpreter ▷ #general (15 messages🔥):

  • Open Interpreter API base URL
  • Default Model Selection
  • OpenAI's models pricing
  • Open Interpreter UI

OpenInterpreter ▷ #ai-content (1 messages):

8i8__papillon__8i8d1tyr: https://www.youtube.com/watch?v=d7DtiMzMBdU


LangChain AI ▷ #general (2 messages):

  • LangChain medication extraction
  • LangSmith vs LangChain
  • BERT in Ollama
  • Evaluating extraction accuracy

LangChain AI ▷ #share-your-work (5 messages):

  • Rubiks AI
  • Claude 3 Opus
  • Mistral-Large 2
  • UAP Research
  • Self-Supervised Learning


LangChain AI ▷ #tutorials (2 messages):

  • Custom Loaders and Splitters

DSPy ▷ #general (4 messages):

  • LiteLLM
  • DSPy Self-Discover Framework

Torchtune ▷ #general (1 messages):

  • Torchtune nightly release
  • T5 fine-tuning
  • Hermes 2.5

Torchtune ▷ #dev (1 messages):

  • Pre-fill and Decode Optimization

MLOps @Chipro ▷ #events (1 messages):

mr_naija85: Interesting, I would like to attend


DiscoResearch ▷ #general (1 messages):

  • AIDEV
  • AIDEV 2
  • LLM Applications
  • Generative AI
  • AI Village

Link mentioned: AIDev 2 - Developer Community (LLM, Applications & Generative AI): This developer community is aimed at developers and researchers who work with Large Language Models and generative AI on a daily basis.






{% else %}

The full channel-by-channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AINews, please share with a friend! Thanks in advance!

{% endif %}