AI News for 8/16/2024-8/19/2024. We checked 7 subreddits, 384 Twitters and 29 Discords (254 channels, and 4515 messages) for you. Estimated reading time saved (at 200wpm): 489 minutes. You can now tag @smol_ai for AINews discussions!

Omar Khattab announced today that he will join Databricks for a year before starting his MIT professorship, but more importantly he set the stage for DSPy 2.5 and 3.0+:

DSPy has objectively been a successful framework for declarative self-improving LLM pipelines, following the 2022 DSP paper and 2023 DSPy paper.

The main roadmap directions:

Polish the 4 pieces of DSPy core: (1) LMs, (2) Signatures & Modules, (3) Optimizers, and (4) Assertions, so that they "just work" out of the box zero shot, off-the-shelf.

In LMs, they aim to reduce lines of code. In particular, they call out that they will eliminate 6k LOC by adopting LiteLLM. However, they will add functionality for "improved caching, saving/loading of LMs, support for streaming and async LM requests".
In Signatures they are evolving the concept of "structured inputs" now that "structured outputs" are mainstream.
In Finetuning, they aim to "bootstrap training data for several different modules in a program, train multiple models and handle model selection, and then load and plug in those models into the program's modules".

Developing more accurate, lower-cost optimizers. Following the BootstrapFewShot -> BootstrapFinetune -> CA-OPRO -> MIPRO -> MIPROv2 and BetterTogether optimizers, more work will be done on improving Quality, Cost, and Robustness.
Building end-to-end tutorials. More docs!
Shifting towards more interactive optimization & tracking. Help users "to observe in real time the process of optimization (e.g., scores, stack traces, successful & failed traces, and candidate prompts)."

Nothing mindblowing, but a great roadmap update from a very well managed open source framework.

{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}

AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI and Robotics Developments

Google's Gemini Updates: Google launched Gemini Live, a mobile conversational AI with voice capabilities and 10 voices, available to Gemini Advanced users on Android. They also introduced Pixel Buds Pro 2 with a custom Tensor A1 chip for Gemini functionality, enabling hands-free AI assistance.
OpenAI Developments: OpenAI's updated ChatGPT-4o model reclaimed the top spot on LMSYS Arena, testing under the codename "anonymous-chatbot" for a week with over 11k votes.
xAI's Grok-2: xAI released Grok-2, now available in beta for Premium X users. It can generate "unhinged" images with FLUX.1, and xAI has reached SOTA status in just over a year.
Open-Source Models: Nous Research released Hermes 3, an open-source model available in 8B, 70B, and 405B parameter sizes, with the 405B model achieving SOTA relative to other open models.
Robotics Advancements: Astribot teased their new humanoid, showcasing its impressive range of freedom in real-time without teleoperation. Apple is reportedly developing a tabletop robot with Siri voice commands, combining an iPad-like display with a robotic arm.
AI Research Tools: Sakana AI introduced "The AI Scientist", claimed to be the world's first AI system capable of autonomously conducting scientific research, generating ideas, writing code, running experiments, and writing papers.

AI Model Performance and Techniques

Vision Transformer (ViT) Performance: @giffmana wrote a blog post addressing concerns about ViT speed at high resolution, aspect ratio importance, and resolution requirements.
RAG Improvements: New research on improving RAG for multi-hop queries using database filtering with LLM-extracted metadata showed promising results on the MultiHop-RAG benchmark. HybridRAG combines GraphRAG and VectorRAG, outperforming both individually on financial earnings call transcripts.
Model Optimization: @cognitivecompai reported that GrokAdamW appears to be an improvement when training gemma-2-2b with the Dolphin 2.9.4 dataset.
Small Model Techniques: @bindureddy encouraged iterating on small 2B models to make them more useful and invent new techniques that can be applied to larger models.

AI Applications and Tools

LangChain Developments: A LangChain JS tutorial covered using LLM classifiers for dynamic prompt selection based on query type. Separately, an Agentic RAG walkthrough with Claude 3.5 Sonnet, MongoDB, and llama_index demonstrated building an agentic knowledge assistant over a pre-existing RAG pipeline.
AI for Software Engineering: Cosine demo'd Genie, a fully autonomous AI software engineer that broke the high score for SWE-Bench at 30.08%. OpenAI and the authors of SWE-Bench redesigned and released 'SWE-bench Verified' to address issues in the original benchmark.
Productivity Tools: @DrJimFan expressed a desire for an LLM to automatically filter, label, and reprioritize Gmail according to a prompt, highlighting the potential for AI in email management.

AI Ethics and Societal Impact

AI Deception Debate: @polynoamial discussed the misconception of bluffing in poker as an example of AI deception, arguing that it is more about not revealing excess information than about active deception.
AI Reasoning Capabilities: @mbusigin argued that LLMs are already better than a significant number of humans at reasoning, as they don't rely on "gut" feelings and perform well on logical reasoning tests.

Memes and Humor

@AravSrinivas joked: "Networking ~= Not actually working"
@AravSrinivas shared a humorous image related to AI or tech (content not specified).
@Teknium1 quipped about video generation techniques: "Why are almost every video gen just pan or zoom, you may as well use flux (1000x faster) and generate an image"


AI Reddit Recap

/r/LocalLlama Recap

Theme 1. XTC: New Sampler for Enhanced LLM Creativity

Exclude Top Choices (XTC): A sampler that boosts creativity, breaks writing clichés, and inhibits non-verbatim repetition, from the creator of DRY (Score: 138, Comments: 64): The Exclude Top Choices (XTC) sampler, introduced in a GitHub pull request for text-generation-webui, aims to boost LLM creativity and break writing clichés with minimal impact on coherence. The creator reports that XTC produces novel turns of phrase and ideas, particularly enhancing roleplay and storywriting, and feels distinctly different from increasing temperature in language models.
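
The recap above says what XTC does but not how. Going by the pull request's description, the core idea is: with some probability, whenever two or more tokens clear a probability threshold, drop all of them except the least likely one. The sketch below is a plain-Python illustration of that idea only. The function name `xtc_filter` and its parameter names are ours, and real samplers operate on logit tensors rather than dicts.

```python
import random


def xtc_filter(probs, threshold=0.1, probability=0.5, rng=random):
    """Apply one Exclude-Top-Choices step to a {token: probability} mapping.

    Illustrative sketch only; names and defaults are assumptions.
    """
    # Most of the time (1 - probability) the distribution is left untouched.
    if rng.random() >= probability:
        return dict(probs)

    # "Top choices" are the tokens at or above the threshold.
    top = [tok for tok, p in probs.items() if p >= threshold]
    if len(top) < 2:
        # With fewer than two viable tokens there is nothing safe to exclude.
        return dict(probs)

    # Keep only the least probable top choice, plus everything below threshold.
    keep = min(top, key=lambda tok: probs[tok])
    kept = {tok: p for tok, p in probs.items() if tok == keep or tok not in top}

    # Renormalize so the remaining probabilities sum to 1.
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}
```

Because at least one above-threshold token always survives, the sampler still picks from choices the model considers viable, which is consistent with the claim that coherence is barely affected, unlike raising temperature, which flattens the whole distribution.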

Theme 2. Cost-Benefit Analysis of Personal GPUs for AI Development

Honestly nothing much to do with one 4090 (Score: 84, Comments: 90): The author, who works in AI infrastructure and ML engineering, expresses disappointment with their 4090 GPU purchase for personal AI projects. They argue that for most use cases, cloud-based API services or enterprise GPU clusters are more practical and cost-effective than a single high-end consumer GPU for AI tasks, questioning the value of local GPU ownership for personal AI experimentation.

All AI Reddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Model Advancements and Comparisons

Flux LoRA Results: A user shared impressive results from training Flux LoRA models on Game of Thrones characters, achieving high-quality outputs with datasets of only 10 images and 500-1000 training steps. The training required over 60GB of VRAM. Source
Cartoon Character Comparison: A comparison of various AI models (DALL-E 3, Flux dev, Flux schnell, SD3 medium) generating cartoon characters eating watermelon. DALL-E 3 performed best overall, with Flux dev coming in second. The post highlighted DALL-E 3's use of complex LLM systems to split images into zones for detailed descriptions. Source
Flux.1 Schnell Upscaling Tips: A user shared tips for improving face quality in Flux.1 Schnell outputs, recommending the use of 4xFaceUpDAT instead of 4x-UltraSharp for upscaling realistic images. The post also mentioned other upscaling models and techniques for enhancing image quality. Source

AI Company Strategies and Criticisms

OpenAI's Business Practices: A user criticized OpenAI for running their company like a "tiny Ycombinator startup," citing practices such as waitlists, cryptic CEO tweets, and pre-launch hype videos. The post argued that these tactics are unsuitable for a company valued at nearly $100 billion and may confuse customers and enterprise users. Source

AI-Generated Content and Memes

The Mist (Flux+Luma): A video post showcasing AI-generated content using Flux and Luma models, likely depicting a scene inspired by the movie "The Mist." Source
Seems familiar somehow?: A meme post in the r/singularity subreddit, likely referencing AI-related content. Source
Someone had to say it...: Another meme post in the r/StableDiffusion subreddit. Source

Future Technology and Research

Self-driving Car Jailbreaking: A post speculating that people will attempt to jailbreak self-driving cars once they become widely available. Source
Age Reversal Pill for Dogs: A study reporting promising results for an age reversal pill tested on dogs. However, the post lacked citations to peer-reviewed research and was criticized for being anecdotal. Source

AI Discord Recap

A summary of Summaries of Summaries by Claude 3.5 Sonnet

1. Hermes 3 Model Release and Performance

Hermes 3 Matches Llama 3.1 on N8Bench: Hermes 3 scored identically to Llama 3.1 Instruct on the N8Bench benchmark, which measures a model's ability to reason and solve problems.
- This result is significant as Llama 3.1 Instruct is considered one of the most advanced language models available, highlighting Hermes 3's competitive performance.
Hermes 3 405B Free Weekend on OpenRouter: OpenRouter announced that Hermes 3 405B is free for a limited time, offering a 128k context window, courtesy of Lambda Labs.
- Users can access this model at OpenRouter's Hermes 3 405B page, providing an opportunity to test and evaluate this large language model.
Quantization Impact on 405B Models: @hyperbolic_labs warned that quantization can significantly degrade the performance of 405B models.
- They recommended reaching out to them for alternative solutions if performance is a concern, highlighting the trade-offs between model size reduction and maintaining performance quality.

2. LLM Inference Optimization Techniques

INT8 Quantization for CPU Execution: A member inquired about the potential benefits of using INT8 quantization for faster CPU execution of small models, suggesting some CPUs might natively run INT8 without converting to FP32.
- This approach could potentially improve performance for CPU-based inference, especially for resource-constrained environments or edge devices.
FP8 Training Advancements: Training a 1B FP8 model with 1st momentum in FP8 smoothly up to 48k steps resulted in a loss comparable to bfloat16 with a 0.08 offset.
- This demonstrates that FP8 training can be effective with 1st momentum, achieving results similar to bfloat16 training while potentially offering memory savings and performance improvements.
Batching APIs for Open-Source Models: CuminAI introduced a solution for creating batching APIs for open-source models, similar to those recently launched by OpenAI and Google.
- While major companies' batching APIs lack processing guarantees and SLAs, CuminAI's approach aims to provide similar cost-saving benefits for open-source model deployments. A guide is available at their blog post.

3. Open Source AI Model Developments

Falcon Mamba 7B Claims to Outperform Llama 3 8B: A YouTube video announced the release of Falcon Mamba 7B, claiming it outperforms Llama 3 8B.
- This development could have significant implications for the field of large language models, as Falcon Mamba 7B is a relatively new and promising model challenging established benchmarks.
Ghost 8B Beta's Multilingual Prowess: Ghost 8B Beta, a newly released language model, now supports 16 languages including English, Vietnamese, Spanish, and Chinese, with two context options (8k and 128k).
- The model boasts improved capabilities in math, reasoning, and instruction-following, outperforming competitors like Llama 3.1 8B Instruct, GPT-3.5 Turbo, and Claude 3 Opus in AlpacaEval 2.0 winrate scores.
VideoLLaMA 2-72B Release by Alibaba DAMO: Alibaba DAMO released VideoLLaMA 2-72B, a new video LLM available on HuggingFace with a demo on HuggingFace Spaces.
- The research paper is also available on HuggingFace, showcasing advancements in multimodal AI combining video understanding and language modeling.

4. AI Safety and Regulation Discussions

Nancy Pelosi Opposes California AI Bill: Speaker Emerita Nancy Pelosi issued a statement opposing California Senate Bill 1047 on AI regulation.
- The full statement can be found on the House of Representatives website, highlighting ongoing debates about how to approach AI governance at the state level.
Procreate Rejects Generative AI Integration: The CEO of Procreate made a clear statement that they will not be integrating generative AI into their products, a decision celebrated by many artists and users on social media.
- Some observers noted that this stance might change in the future, as it could potentially limit new feature development. This highlights the ongoing tension between traditional creative tools and the rapid advancement of AI in the creative industry.
Gary Marcus Revisits AI Bubble Concerns: AI researcher Gary Marcus revisited his keynote from AGI-21 in a video titled "The AI Bubble: Will It Burst, and What Comes After?", noting that many issues he highlighted then are still relevant today despite significant AI advances.
- This discussion, available on YouTube, reflects ongoing debates about the sustainability and trajectory of current AI development trends and their potential societal impacts.

PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord

Flux: The New King?: Members discussed Flux's potential to take over the image generation AI community, with new Loras and merges appearing daily.
- Some believe Stability AI needs to release something soon to compete, as Flux is becoming a dominant force in CivitAI and Hugging Face.
Flux vs. SD3: A Race to the Top: There's a debate about whether Flux is fundamentally different from SD3, with both models using DiT architecture, rectified flow loss, and similar VAE sizes.
- The key difference is that Flux dev was distilled from a larger model, though Stability AI could pull the same trick. Some prefer non-distilled models, even if image quality is lower.
Flux Training: Challenges and Opportunities: Members discussed the challenges of training Loras for Flux, noting that the training code hasn't been officially released yet.
- Some users are exploring methods for training Loras locally, while others recommend using Replicate's official Flux LoRA Trainer for faster and easier results.
ComfyUI vs. Forge: A Battle of the UIs: Users discussed the performance differences between ComfyUI and Forge, with some finding Forge to be faster, especially for batch processing.
- The discussion touched on the impact of Gradio 4 updates on Forge and the potential for future improvements. Some users prefer the flexibility of ComfyUI, while others appreciate the optimization of Forge.
GPU Recommendations for Stable Diffusion: Members shared their experiences with various GPUs and their performance for Stable Diffusion, with 16GB VRAM considered a minimum and 24GB being comfortable.
- The discussion touched on the importance of VRAM over CPU speed and the impact of RAM and other apps on performance. The consensus was to try different models and encoders to find the best fit for each system.

HuggingFace Discord

Hermes 2.5 Outperforms Hermes 2: After adding code instruction examples, Hermes 2.5 appears to perform better than Hermes 2 in various benchmarks.
- Hermes 2 scored a 34.5 on the MMLU benchmark whereas Hermes 2.5 scored 52.3.
Mistral Struggles Expanding Beyond 8k: Members stated that Mistral cannot be extended beyond 8k without continued pretraining and this is a known issue.
- They pointed to further work on mergekit and frankenMoE finetuning for the next frontiers in performance.
Discussion on Model Merging Tactics: A member suggested applying the difference between UltraChat and base Mistral to Mistral-Yarn as a potential merging tactic.
- Others expressed skepticism, but this member remained optimistic, citing successful past attempts at what they termed "cursed model merging".
Open Empathic Project Plea for Assistance: A member appealed for help in expanding the categories of the Open Empathic project, particularly at the lower end.
- They shared a YouTube video on the Open Empathic Launch & Tutorial that guides users to contribute their preferred movie scenes from YouTube videos, as well as a link to the OpenEmpathic project itself.
FP8 Training with 1st Momentum Achieves Similar Loss: Training a 1B FP8 model with 1st momentum in FP8 smoothly up to 48k steps resulted in a loss comparable to bfloat16 with a 0.08 offset.
- This demonstrates that FP8 training can be effective with 1st momentum, achieving results similar to bfloat16 training.

Unsloth AI (Daniel Han) Discord

Ghost 8B Beta (1608) released: Ghost 8B Beta (1608), a top-performing language model with unmatched multilingual support and cost efficiency, has been released.
- It boasts superior performance compared to Llama 3.1 8B Instruct, GPT-3.5 Turbo, Claude 3 Opus, GPT-4, and more in winrate scores.
Ghost 8B Beta's Multilingual Prowess: Ghost 8B Beta now supports 16 languages, including English, Vietnamese, Spanish, Chinese, and more.
- It offers two context options (8k and 128k) and improved math, reasoning, and instruction-following capabilities for better task handling.
Ghost 8B Beta Outperforms Competitors: Ghost 8B Beta outperforms models like Llama 3.1 8B Instruct, GPT 3.5 Turbo, Claude 3 Opus, Claude 3 Sonnet, GPT-4, and Mistral Large in AlpacaEval 2.0 winrate scores.
- This impressive performance highlights its superior knowledge capabilities and multilingual strength.
Code Editing with LLMs: A new paper explores using Large Language Models (LLMs) for code editing based on user instructions.
- It introduces EditEval, a novel benchmark for evaluating code editing performance, and InstructCoder, a dataset for instruction-tuning LLMs for code editing, containing over 114,000 instruction-input-output triplets.
Reasoning Gap in LLMs: A research paper proposes a framework to evaluate reasoning capabilities of LLMs using functional variants of benchmarks, specifically the MATH benchmark.
- It defines the "reasoning gap" as the difference in performance between solving a task posed as a coding question vs a natural language question, highlighting that LLMs often excel when tasks are presented as code.

Nous Research AI Discord

Linear Transformers: A Match Made in Softmax Heaven: Nous Research has published research on a linear transformer variant that matches softmax, allowing for training at O(t) instead of O(t^2).
- The research, available here, explores this new variant and its implications for training efficiency.
Falcon Mamba 7B Bests Llama 3 8B: A YouTube video announcing the release of Falcon Mamba 7B claims that it outperforms Llama 3 8B.
- This could have significant implications for the field of large language models, as Falcon Mamba 7B is a relatively new and promising model.
Regex Debated as Chunking Technique: A user shared their thoughts on a regex-based text chunker, stating they would "scream" if they saw it in their codebase, due to the complexity of regex.
- Another user, however, countered by arguing that for a text chunker specifically, regex might be a "pretty solid option" since it provides "backtracking benefits" and allows for flexibility in chunking settings.
Hermes 3: The Performance King of N8Bench?: Hermes 3 scored identically to Llama 3.1 Instruct on the N8Bench benchmark, which is a measure of a model's ability to reason and solve problems.
- This is a significant result, as Llama 3.1 Instruct is considered to be one of the most advanced language models available.
Gemini Flash: The Future of RAG?: A user reports that they've moved some of their RAG tasks to Gemini Flash, noting that they've seen improvements in summary quality and reduced iteration requirements.
- They share a script they've been using to process raw, unstructured transcripts with Gemini Flash, available on GitHub at https://github.com/EveryOneIsGross/scratchTHOUGHTS/blob/main/unstruct2flashedTRANSCRIPT.py.
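
To make the regex chunking debate in this channel concrete, here is a minimal sketch of what a regex-based chunker can look like. It is not the chunker the members were discussing: the sentence-boundary pattern, the name `chunk_text`, and the `max_chars` setting are all assumptions chosen for illustration.

```python
import re

# Split on whitespace that follows sentence-ending punctuation.
# The lookbehind keeps the punctuation attached to its sentence.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def chunk_text(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_text("One. Two! Three?", max_chars=9)` returns `["One. Two!", "Three?"]`. A pattern this small stays readable; the "scream" reaction is easier to understand once a single expression also has to handle headings, code fences, lists, and tables.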

Perplexity AI Discord

Perplexity Pro is a Pain: Multiple users reported issues with the Perplexity Pro signup process: they were unable to complete signup without paying, despite receiving an offer for a free year.
- Users were advised to reach out to [email protected] for assistance with this issue.
Obsidian Copilot Gets a Claude Boost: A user shared their experience using the Obsidian Copilot plugin with a Claude API key, finding it to be a solid choice in terms of performance.
- They stressed the importance of checking API billing settings before committing and also highlighted the need for Obsidian to have real-time web access.
Perplexity's Image Generation Feature Struggles: Users discussed the shortcomings of Perplexity's image generation feature, which is currently only accessible for Pro users, requiring an AI prompt for image description.
- This was considered a 'weird' and 'bad' implementation by users, who highlighted the need for a more streamlined approach to image generation.
Perplexity Search Encounters Hiccups: Several users reported issues with Perplexity's search quality, encountering problems with finding relevant links and receiving inaccurate results.
- These issues were attributed to possible bugs, prompt changes, or inference backend service updates.
Perplexity Model Changes Leave Users Concerned: Discussions revolved around changes in Perplexity's models, with users expressing concerns about the potential decline in response quality and the increase in "I can't assist with that" errors.
- Other concerns included missing punctuation marks in API responses and the use of Wolfram Alpha for non-scientific queries.

OpenRouter (Alex Atallah) Discord

Hermes 3 405B is free this weekend!: Hermes 3 405B is free for a limited time, with 128k context, courtesy of Lambda Labs.
- You can check it out at this link.
GPT-4 extended is now on OpenRouter: You can now use GPT-4 extended output (alpha access) through OpenRouter.
- This is capped at 64k max tokens.
Perplexity Huge is the largest online model on OpenRouter: Perplexity Huge launched 3 days ago and is the largest online model on OpenRouter.
- You can find more information at this link.
A Week of Model Launches: This week saw 10 new model launches on OpenRouter, including GPT-4 extended, Perplexity Huge, Starcannon 12B, Lunaris 8B, Llama 405B Instruct bf16 and Hermes 3 405B.
- You can see the full list at this link.
Quantization Degrades Performance: Quantization can massively degrade the performance of 405B models, according to @hyperbolic_labs.
- They recommend reaching out to them if you are concerned about performance, as they offer alternative solutions.

LM Studio Discord

INT8 Quantization for Faster CPUs?: A member inquired about potential performance gains from using INT8 quantization for smaller models on CPUs.
- They suggested that some CPUs may natively support INT8 execution, bypassing conversion to FP32 and potentially improving performance.
Llama.cpp Supports Mini-CPM-V2.6 & Nemotron/Minitron: A member confirmed that the latest llama.cpp version supports Mini-CPM-V2.6 and Nvidia's Nemotron/Minitron models.
- This update expands the range of models compatible with llama.cpp, enhancing its versatility for LLM enthusiasts.
Importing Chats into LM Studio: A member sought guidance on importing chat logs from a JSON export into LM Studio.
- Another member clarified that chat data is stored in JSON files and provided instructions on accessing the relevant folder location.
Vulkan Error: CPU Lacks AVX2 Support: A user encountered an error indicating their CPU lacks AVX2 support, preventing the use of certain features.
- A helpful member requested the CPU model to assist in diagnosing and resolving the issue.
LLMs Interacting with Webpages: A Complex Challenge: A member discussed the possibility of enabling LLMs to interact with webpages, specifically seeking a 'vision' approach.
- While tools like Selenium and IDkit were mentioned, the general consensus is that this remains a challenging problem due to the diverse structure of webpages.
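
On the INT8 question raised in this channel, the sketch below shows the basic arithmetic behind the idea: weights and activations are stored as 8-bit integers plus one floating-point scale, the multiply-accumulate runs entirely on integers, and a single rescale at the end recovers an approximate float result. This is a plain-Python illustration under our own assumptions (symmetric per-tensor scaling, hypothetical function names), not how LM Studio or llama.cpp implement quantized kernels.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> (list of ints in [-127, 127], scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(qa, scale_a, qb, scale_b):
    """Dot product on quantized inputs: integer accumulate, one float rescale."""
    acc = sum(x * y for x, y in zip(qa, qb))  # integer-only inner loop
    return acc * scale_a * scale_b


def float_dot(a, b):
    """Reference float dot product, used to measure quantization error."""
    return sum(x * y for x, y in zip(a, b))
```

The potential speedup the member was asking about comes from the inner loop: on CPUs with native 8-bit integer dot-product instructions, that loop can run without converting each value back to FP32. Whether it helps in practice depends on the CPU and the runtime's kernels, and the result is only approximate, so comparing `int8_dot` against `float_dot` on real weights is a reasonable first check.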

OpenAI Discord

Claude Outperforms Chat-GPT on Code: A member stated that Claude tends to be better at code than Chat-GPT.
- A member added that it makes no sense for 4o's API to cost more than Claude.
Livebench.ai: Yann LeCun's Open Source Benchmark: Livebench.ai is an open source benchmark created by Yann LeCun and others.
- A member remarked that the LMSys benchmark is probably the worst as of now.
Claude Projects vs Chat-GPT Memory Feature: A member believes Claude Projects are more useful than Chat-GPT's memory feature.
- The member also stated that custom GPTs are more like projects, allowing for the use of your own endpoints.
OpenAI is Winning the Attention Game: OpenAI is winning by controlling attention through releasing new models like GPT-4o.
- The member stated that people are talking about OpenAI's new models, even if they don't want to participate in the tech hype.
GPT-4o is Now Worse than Claude and Mistral: Members have noticed that GPT-4o has become dumber lately and may be suffering from a type of Alzheimer's.
- Claude Sonnet is being praised for its superior performance and is becoming a preferred choice among members.

Latent Space Discord

Topology's CLM: Learning Like Humans: Topology has released the Continuous Learning Model (CLM), a new model that remembers interactions, learns skills autonomously, and thinks in its free time, just like humans.
- This model can be tried out at http://topologychat.com.
GPT5 Needs to Be 20x Bigger: Mikhail Parakhin tweeted that to get meaningful improvement in AI models, a new model should be at least 20x bigger than the current model.
- This would require 6 months of training and a new, 20x bigger datacenter, which takes about a year to build.
Procreate Rejects Generative AI: The CEO of Procreate has stated that they will not be integrating generative AI into their products.
- While some artists and users on social media celebrated the news, others noted that it could limit new feature development and that the stance might change in the future.
DSPy: Not Quite Commercial Yet: There is no commercial company behind DSPy yet, although Omar is working on it.
- A member shared that they went to the Cursor office meetup, and while there was no alpha to share, they did say hi.
DSPy Bridging the Gap: DSPy is designed to bridge the gap between prompting and finetuning, allowing users to avoid manual prompt tuning.
- The paper mentions that DSPy avoids prompt tuning, potentially making it easier to switch models, retune to data shifts, and more.

Cohere Discord

Cohere Office Hours Kick-Off!: Join Cohere's Sr. Product Manager and DevRel for a casual session on product and content updates with best practices and Q&A on Prompt Tuning, Guided Generations API with Agents, and LLM University Tool Use Module.
- The event takes place today at 1 PM ET in the #stage channel and can be found at this link.
Cohere Prompt Tuner: Optimized Prompting!: Learn about the Cohere Prompt Tuner, a powerful tool to optimize prompts and improve the accuracy of your LLM results.
- The blog post details how to utilize this tool and the associated features.
Command-r-plus Not Working?: A user reported that command-r-plus in Sillytavern stopped working consistently when the context length reaches 4000 tokens.
- The user has been attempting to use the tool to enhance their workflow, but is facing this unexpected issue.
API Key Partial Response Issues: A user reported experiencing issues with their API key returning only partial responses, even after trying different Wi-Fi routers and cellular data.
- The user is currently seeking a solution to this problem.
Structured Outputs for Accurate JSON Generations: Structured Outputs, a recent update to Cohere's tools, delivers 80x faster and more accurate JSON generations than open-source implementations.
- This new feature improves the accuracy of JSON output and is discussed in this blog post.

Interconnects (Nathan Lambert) Discord

Yi Tay Works on Chaos No Sleep Grind: The discussion touched on work styles of various AI organizations with one member suggesting that Yi Tay operates with a 'chaos no sleep grind' mentality.
- They referenced a tweet from Phil (@phill__1) suggesting that 01AI may be pulling out of non-Chinese markets: "what is going on with .@01AI_Yi? Are they pulling out of the non Chinese market?"
Nancy Pelosi Opposes California AI Bill: Speaker Emerita Nancy Pelosi issued a statement opposing California Senate Bill 1047 on AI regulation.
- The statement was released on the House of Representatives website: Pelosi Statement in Opposition to California Senate Bill 1047.
Zicheng Xu Laid Off From Allen-Zhu's Team: Zeyuan Allen-Zhu announced the unexpected layoff of Zicheng Xu, the author of the "Part 2.2" tutorial.
- Allen-Zhu strongly endorses Xu and provided his email address for potential collaborators or employers: [email protected] (remove the capital 'B').
Nous Hermes Discord Drama Over Evaluation Settings: A user mentioned a discussion in the Nous Discord regarding a user's perceived rudeness and misrepresentation of evaluation settings.
- The user mentioned that their evaluation details were in the SFT section of the paper, and admitted that it doesn't feel good to get things wrong but the core of the article is still valid.
Meta Cooking (Model Harnessing) Creates Confusion: A user wondered what "meta cooking" is, suggesting a potential conflict or drama in the Nous Discord.
- The user mentioned finding contradictory information about evaluation settings, possibly due to the use of default LM Harness settings without clear documentation.

OpenAccess AI Collective (axolotl) Discord

GrokAdamW Makes Axolotl Faster: GrokAdamW, a PyTorch optimizer that encourages fast grokking, was released and is working with Axolotl via the Transformers integration. GrokAdamW repository
- The optimizer is inspired by the GrokFast paper, which aims to accelerate generalization of a model under the grokking phenomenon. GrokFast paper
Gemma 2b Training Hiccup: A user reported a consistent loss of 0.0 during training of a Gemma 2b model, with a nan gradient norm.
- The user recommended using eager attention instead of sdpa for training Gemma 2b models, which fixed the zero loss issue.
Custom Loaders & Chat Templates in Axolotl: A user asked for clarification on using a Chat Template type in a .yml config file for Axolotl, specifically interested in specifying which loader to use, for example, ShareGPT.
- Another user suggested specifying the loader in a custom .yml file.
Fine-Tuning with Axolotl: No Coding Required: A user clarified that fine-tuning with Axolotl generally does not require coding knowledge, but rather understanding how to format datasets and adapt existing examples.
- A user mentioned owning an AI rig powerful enough to run Llama 3.1 70B, but felt the model was still lacking in some key areas and wanted to fine-tune it on their own dataset of content.
Llama 3.1 8B LoRA Detects Post-Hoc Reasoning: A user is training a Llama 3.1 8B LoRA to detect post-hoc reasoning within a conversation, having spent three days curating a small dataset of fewer than 100 multi-turn conversations totaling around 30k tokens.
- The user employed Sonnet 3.5 to help generate examples but had to fix multiple things in each one despite careful prompt crafting: even when instructed not to create examples with post-hoc reasoning, the models still produced them, which the user attributed to their fine-tuning data.

LangChain AI Discord

LangChain Caching Issues: A member was confused about why .batch_as_completed() wasn't sped up by caching, even though .invoke() and .batch() were near instant after caching.
- They observed that the cache was populated after the first run, but .batch_as_completed() didn't seem to utilize it.
LLMs struggle with structured output: A member mentioned that local LLMs, like Llama 3.1, often had difficulty producing consistently structured output, specifically when it came to JSON parsing.
- They inquired about datasets specifically designed to train models for improved JSON parsing and structured output for tools and ReAct agents.
Deleting files in a RAG chatbot: A member discussed how to implement a delete functionality for files in a RAG chatbot using MongoDB as a vector database.
- A response provided examples of using the delete method from the LangChain library for both MongoDB vector stores and OpenAIFiles, along with relevant documentation links.
Hybrid Search Relevance Issues: A member encountered relevance issues with retrieved documents and generated answers in a RAG application using a hybrid search approach with BM25Retriever and vector similarity search.
- Suggestions included checking document quality, adjusting retriever configurations, evaluating the chain setup, and reviewing the prompt and LLM configuration.
CursorLens is a new dashboard for Cursor users: CursorLens is an open-source dashboard for Cursor users that provides analytics on prompts and allows configuring models not available through Cursor itself.
- It was recently launched on ProductHunt: https://www.producthunt.com/posts/cursor-lens.

OpenInterpreter Discord

Orange Pi 5 Review: The New Affordable SBC: A user shared a YouTube video review of the Orange Pi 5, a new Arm-based SBC.
- The video emphasizes that the Orange Pi 5 is not to be confused with the Raspberry Pi 5.
GPT-4o-mini Model woes: A Quick Fix: A user encountered trouble setting their model to GPT-4o-mini.
- Another user provided the solution: interpreter --model gpt-4o-mini.
OpenInterpreter Settings Reset: A Revert Guide: A user sought a way to revert OpenInterpreter settings to default after experimentation.
- The solution involved using interpreter --profiles to view and edit profiles, and potentially uninstalling and reinstalling OpenInterpreter.
OpenInterpreter API Integration: Building a Bridge: A user inquired about integrating OpenInterpreter into their existing AI core, sending requests and receiving outputs.
- The recommended solution involved using a Python script with a Flask server to handle communication between the AI core and OpenInterpreter.
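The bridge described above can be sketched as a small HTTP endpoint that accepts a prompt and returns the output. To stay dependency-free, this sketch uses the standard library's WSGI interface rather than Flask, which the discussion recommended. `make_bridge_app` and `run_prompt` are hypothetical names, and how `run_prompt` actually calls OpenInterpreter is left as an assumption.

```python
import json


def make_bridge_app(run_prompt):
    """Build a WSGI app that forwards POSTed prompts to `run_prompt`.

    `run_prompt` is any callable that takes a prompt string and returns
    something JSON-serializable (a hypothetical hook for OpenInterpreter).
    """

    def app(environ, start_response):
        headers = [("Content-Type", "application/json")]
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", headers)
            return [json.dumps({"error": "POST a JSON body"}).encode("utf-8")]

        # Read exactly the number of bytes the client announced.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = environ["wsgi.input"].read(length)
        try:
            prompt = json.loads(raw or b"{}").get("prompt", "")
        except (ValueError, AttributeError):
            start_response("400 Bad Request", headers)
            return [json.dumps({"error": "invalid JSON"}).encode("utf-8")]

        body = json.dumps({"output": run_prompt(prompt)}).encode("utf-8")
        start_response("200 OK", headers)
        return [body]

    return app


# To serve it locally (an echo handler stands in for the real call):
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, make_bridge_app(lambda p: p)).serve_forever()
```

The AI core would then POST `{"prompt": "..."}` to the endpoint and read the `output` field of the response. A Flask version would follow the same shape with a single route.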
Local LLMs for Bash Commands: Codestral and Llama 3.1: A member requested recommendations on local LLMs capable of handling bash commands.
- Another member suggested using Codestral and Llama 3.1.

DSPy Discord

LLMs Struggle with Reliability: Large Language Models (LLMs) are known for producing factually incorrect information, leading to "phantom" content that hinders their reliability.
- This issue is addressed by WeKnow-RAG, a system that integrates web search and Knowledge Graphs into a Retrieval-Augmented Generation (RAG) system to improve LLM accuracy and reliability.
DSPy Unveils its Roadmap: The roadmap for DSPy 2.5 (expected in 1-2 weeks) and DSPy 3.0 (in a few months) has been released, outlining objectives, milestones, and community contributions.
- The roadmap is available on GitHub: DSPy Roadmap.
LangGraph and Routequery Class Error: A user encountered an error with the routequery class in LangGraph.
- They sought guidance on integrating DSPy with a large toolset and shared a link to the LangGraph implementation: Adaptive RAG.
Optimizing Expert-Engineered Prompts: A member questioned whether DSPy can optimize prompts that have already been manually engineered by experts.
- They inquired if DSPy effectively optimizes initial drafts and also improves established prompting systems.
ColPali Fine-Tuning Discussion: A discussion centered on fine-tuning ColPali, a model requiring specialized expertise due to its domain-specific nature.
- The discussion highlighted the importance of understanding the data needed to fine-tune ColPali effectively.

LAION Discord

FLUX Dev Can Generate Grids: A user shared that FLUX Dev can generate 3x3 photo grids of the same (fictional) person.
- This could be useful for training LoRAs to create consistent characters of all kinds of fictional people.
Training LoRAs for Specific Purposes: A user expressed interest in training LoRAs for specific purposes like dabbing, middle finger, and 30s cartoon.
- They mentioned the possibility of converting their FLUX Dev LoRA into FP8 or using an FP8 LoRA trainer on Replicate.
LLMs for Medical Assistance: Not Ready Yet: Several users expressed skepticism about using LLMs for medical assistance in their current state.
- They believe LLMs are not yet reliable enough for such critical applications.
JPEG-LM: LLMs for Images & Videos?: A new research paper proposes modeling images and videos as compressed files using canonical codecs (e.g., JPEG, AVC/H.264) within an autoregressive LLM architecture.
- This approach eliminates the need for raw pixel value modeling or vector quantization, making the process more efficient and offering potential for future research.
JPEG-LM vs. SIREN: A Battle of the Titans?: A user playfully claims to have outperformed the SIREN architecture from 2020 with a 33kB complex-valued neural network.
- While acknowledging that NVIDIA's Neural Graphics Primitives paper from 2022 significantly advanced the field, they highlight the importance of using MS-SSIM as a metric for image quality assessment, as opposed to just MSE and MAE.

LlamaIndex Discord

Workflows Take Center Stage: Rajib Deb shared a video showcasing LlamaIndex's workflow capabilities, demonstrating decorators, types for control flow, event-driven process chaining, and custom events and steps for complex tasks.
- The video focuses on workflows, emphasizing their ability to build sophisticated applications with a more structured approach.
Building Agentic RAG Assistants with Claude 3.5: Richmond Alake's tutorial guides users through building an agentic knowledge assistant on top of a pre-existing RAG pipeline using Claude 3.5, MongoDB, and LlamaIndex.
- This tutorial demonstrates using LlamaIndex for advanced RAG techniques, emphasizing tool selection, task decomposition, and event-driven methodologies.
BeyondLLM Streamlines Advanced RAG Pipelines: BeyondLLM, developed by AIPlanetHub, provides abstractions on top of LlamaIndex, enabling users to build advanced RAG pipelines with evaluation and observability in just 5-7 lines of code.
- The advanced RAG features include query rewriting, vector search, and document summarization, simplifying the development of sophisticated RAG applications.
Web Scrapers: A LlamaIndex Dilemma: A member asked for recommendations for web scrapers that work well with LlamaIndex, and another member recommended FireCrawl, sharing a YouTube video showing a more complex implementation of a LlamaIndex workflow.
- The conversation highlights the need for effective web scraping tools that seamlessly integrate with LlamaIndex, enabling efficient knowledge extraction and processing.
Unveiling the Secrets of RouterQueryEngine and Agents: A member sought clarification on the difference between LlamaIndex's RouterQueryEngine and Agents, specifically in terms of routing and function calling.
- The discussion clarifies that the RouterQueryEngine acts like a hardcoded agent, while Agents offer greater flexibility and generality, highlighting the distinct capabilities of each approach.

LLM Finetuning (Hamel + Dan) Discord

HF Spaces Limitations: A member had trouble hosting their own LLM using HF Spaces, as ZeroGPU doesn't support vLLM.
- The member was seeking an alternative solution, potentially involving Modal.
Modal for LLM Hosting: Another member reported using Modal for hosting LLMs.
- However, they are currently transitioning to FastHTML and are looking for a setup guide.
Jarvis Labs for Fine-tuning: One member shared their experience using Jarvis Labs exclusively for fine-tuning LLMs.
- This suggests that Jarvis Labs might offer a streamlined approach compared to other platforms.

Alignment Lab AI Discord

OpenAI and Google Get Cheaper with Batching APIs: OpenAI and Google launched new batching APIs for some models, offering a 50% cost reduction compared to regular requests.
- However, these APIs currently lack processing guarantees, service level agreements (SLAs), and retries.
CuminAI: Open-Source Batching APIs: CuminAI provides a solution for creating batching APIs for open-source models, similar to those offered by OpenAI.
- Check out their step-by-step guide, "How to Get a Batching API Like OpenAI for Open-Source Models".
SLMs: The New Superheroes of AI?: CuminAI highlights the potential of Small Language Models (SLMs), arguing that "bigger isn't always better" in AI.
- While Large Language Models (LLMs) have dominated, SLMs offer a more cost-effective and efficient alternative, especially for tasks that don't require extensive computational power.

Mozilla AI Discord

Llamafile Boosts Performance & Adds New Features: Llamafile has released new features, including Speech to Text Commands, Image Generation, and a 3x Performance Boost for its HTTP server embeddings.
- The full update, written by Justine, details the performance improvements and new features.
Mozilla AI Celebrates Community at Rise25: Mozilla AI is celebrating community members who are shaping a future where AI is responsible, trustworthy, inclusive, and centered around human dignity.
- Several community members attended the event.
ML Paper Talks: Agents & Transformers Deep Dive: Join community-hosted sessions on Communicative Agents and Extended Mind Transformers.
- RSVP for the sessions: Communicative Agents and Extended Mind Transformers, each presented with one of the paper's authors.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Stability.ai (Stable Diffusion) ▷ #general-chat (567 messages🔥🔥🔥):

Flux

Flux vs. SD3

Flux training

ComfyUI vs Forge

GPU recommendations

Flux: The New King?: Members discussed Flux's potential to absorb the image generation AI community, with new LoRAs and merges appearing daily.
- Some believe Stability AI needs to release something soon to compete, as Flux is becoming a dominant force in CivitAI and Hugging Face.
Flux vs. SD3: A Race to the Top: There's a debate about whether Flux is fundamentally different from SD3, since both models use a DiT architecture, rectified flow loss, and similar VAE sizes.
- The key difference is that Flux dev was distilled from a large model, a trick Stability AI could also pull. Some prefer non-distilled models, even if image quality is lower.
Flux Training: Challenges and Opportunities: Members discussed the challenges of training LoRAs for Flux, noting that the training code hasn't been officially released yet.
- Some users are exploring methods for training LoRAs locally, while others recommend using Replicate's official Flux LoRA Trainer for faster and easier results.
ComfyUI vs. Forge: A Battle of the UIs: Users discussed the performance differences between ComfyUI and Forge, with some finding Forge to be faster, especially for batch processing.
- The discussion touched on the impact of Gradio 4 updates on Forge and the potential for future improvements. Some users prefer the flexibility of ComfyUI, while others appreciate the optimization of Forge.
GPU Recommendations for Stable Diffusion: Members shared their experiences with various GPUs and their performance for Stable Diffusion, with 16GB VRAM considered a minimum and 24GB being comfortable.
- The discussion touched on the importance of VRAM over CPU speed and the impact of RAM and other apps on performance. The consensus was to try different models and encoders to find the best fit for each system.

HuggingFace ▷ #general (449 messages🔥🔥🔥):

Verification issues

Hermes 2.5

Mistral struggles

Model Merging

Open Empathic

Hugging Face Verification Issues: A member experienced issues with the "login with huggingface" verification process, with the login button showing "Not logged in."
- They tried on both mobile and desktop without success and were advised to try again later on a PC.
Hermes 2.5 Outperforms Hermes 2: After adding code instruction examples, Hermes 2.5 appears to perform better than Hermes 2 in various benchmarks.
- Hermes 2 scored a 34.5 on the MMLU benchmark whereas Hermes 2.5 scored 52.3.
Mistral Struggles Expanding Beyond 8k: Members stated that Mistral cannot be extended beyond 8k without continued pretraining and this is a known issue.
- They pointed to further work on mergekit and frankenMoE finetuning for the next frontiers in performance.
Discussion on Model Merging Tactics: A member suggested applying the difference between UltraChat and base Mistral to Mistral-Yarn as a potential merging tactic.
- Others expressed skepticism, but this member remained optimistic, citing successful past attempts at what they termed "cursed model merging".
Open Empathic Project Plea for Assistance: A member appealed for help in expanding the categories of the Open Empathic project, particularly at the lower end.
- They shared a YouTube video on the Open Empathic Launch & Tutorial that guides users to contribute their preferred movie scenes from YouTube videos, as well as a link to the OpenEmpathic project itself.

HuggingFace ▷ #today-im-learning (4 messages):

FP8 Training

Memory Reduction

Optimizer States

FP8 Training with 1st Momentum Achieves Similar Loss: Training a 1B FP8 model with the 1st momentum in FP8 ran smoothly up to 48k steps, with a loss comparable to bfloat16 (0.08 offset).
FP8 Training with FP8 Optimizer States is Feasible: Training a 1B FP8 model with FP8 optimizer states achieved a 0.14 offset compared to the bfloat16 baseline, resulting in a 50% memory reduction.
FP8 Training with Mixed Momentum Types: Training a 1B FP8 model with 1st momentum in FP8 and 2nd momentum in bfloat16 achieved convergence comparable to bfloat16 with a 0.08 offset up to 31k steps, achieving a 42% memory reduction.

HuggingFace ▷ #cool-finds (3 messages):

Medical SAM 2

MedGraphRAG

Multimodal LLM for Medical Time Series

ECG-FM

Private & Secure Healthcare RAG

Medical SAM 2 for Video Medical Image Segmentation: Medical SAM 2 is a new research paper that focuses on the segmentation of medical images as video.
- This paper addresses the need for efficient and accurate video image segmentation in the medical field, offering a novel approach for analyzing and interpreting dynamic medical data.
MedGraphRAG: Graph-Enhanced Medical RAG: MedGraphRAG is a graph-enhanced Medical RAG model that leverages the power of graph networks to enhance medical information retrieval.
- This paper addresses the challenges of understanding complex medical relationships and extracting relevant knowledge from medical text by combining graph-based representation with RAG capabilities.
Multimodal LLM for Medical Time Series: This research paper introduces a novel multimodal LLM specifically designed for handling medical time series data.
- This model leverages the combined power of language and time series data, paving the way for more comprehensive and insightful analysis in medical applications.
Open Electrocardiogram Foundation Model - ECG-FM: ECG-FM is an open-source Electrocardiogram Foundation Model designed for ECG analysis.
- This paper promotes open research and collaboration in the field of ECG analysis, making it a valuable resource for medical practitioners and researchers alike.
Private & Secure Healthcare RAG: This paper delves into the development of Private & Secure Healthcare RAG, a critical advancement for protecting patient data in medical information retrieval.
- This research tackles the crucial issue of privacy and security within the healthcare context by providing a framework for secure and responsible access to medical information.

Link mentioned: Tweet from Open Life Science AI (@OpenlifesciAI): Last & This Week in Medical AI: Top Research Papers/Models 🏅 (August 3 - August 17, 2024) - Medical SAM 2: Segment medical images as video - MedGraphRAG: Graph-Enhanced Medical RAG - Multimodal ...

HuggingFace ▷ #i-made-this (18 messages🔥):

Unity ML Agents

CursorLens

Batching APIs

CuminAI

NeuroSync

Wandering Agent 3 - Live Training from Scratch C#: A Unity ML Agents developer is live-streaming part 3 of their Wandering Agent project, building on previous episodes.
- They are coding a SAC agent from scratch in C# with Unity ML Agents and plan to keep their existing camera scripts in place.
CursorLens: Open-Source Dashboard for Prompt Analytics & Model Configuration: The developer has released CursorLens, an open-source dashboard for visualizing prompt analytics and configuring models not available through Cursor itself.
- The dashboard is available on ProductHunt and aims to provide insights into prompt performance and allow for customization of models.
Batching APIs for Open-Source Models: Major companies like OpenAI and Google have launched batching APIs for their models, offering cost savings compared to normal requests but lacking processing guarantees, SLAs, and retries.
- CuminAI provides a solution for creating batching APIs for open-source models, offering a powerful alternative to existing APIs.
NeuroSync: Seq2Seq Transformer for Face Blendshape Prediction: NeuroSync is a sequence-to-sequence transformer model designed to predict face blendshape frames from audio feature inputs.
- This model uses 4 transformer layers and 4 attention heads, making it the first model on HuggingFace to specialize in predicting face blendshapes from audio.
Arabic Whisper Model Training and Deployment: A YouTube playlist teaches Arabic speech recognition by training a Whisper model on an Arabic speech dataset.
- The model is then deployed on HuggingFace Models and Spaces, providing a valuable resource for Arabic speech recognition.

HuggingFace ▷ #reading-group (35 messages🔥):

LLMs for Penetration Testing

Recording Issue

HuggingFace Reading Group

Batching API for Open-Source Models

Cross-Posting

LLMs are getting good at penetration testing: The Hugging Face Reading Group focused on understanding penetration testing with LLMs.
Recording with a drumming noise: The recording of the meeting had a drumming sound from the presenter's microphone.
OpenAI and Gemini's batch API: A member was looking for a place to post an article about a batching API for open-source models.

HuggingFace ▷ #computer-vision (4 messages):

Pokemon classification

HuggingFace Datasets

Deep learning

Stanford Computer Vision

CV Community Course

Pokemon Classification - Issues and Debugging: A user is having trouble classifying Pokémon using the HuggingFace Pokémon classification dataset and shared the dataset's download paths, indicating potential issues with the dataset itself or the user's configuration.
- The user provided a link to their notebook but did not share specific errors or model details for further assistance.
Seeking Computer Vision Career Path Guidance: A user seeking advice on which courses to take to work in the computer vision field shared their existing knowledge, including a Stanford course in computer vision and a deep learning background.
- A response suggested checking out HuggingFace's CV Community Course for guidance and joining the Computer Vision Channel on HuggingFace Discord for further discussion.
VideoLLaMA 2-72B Released by Alibaba DAMO: A new video LLM, VideoLLaMA 2-72B, was released by Alibaba DAMO.
- The model and demo can be found on HuggingFace and HuggingFace Spaces, respectively, with a link to the research paper on HuggingFace.

HuggingFace ▷ #NLP (10 messages🔥):

PDF table extraction

docTR library

NLP resources

Open Source Model for data extraction

GPT-4 for data extraction

PDF Table Extraction Struggles: A member shared their struggle with extracting tables from multipage PDFs using pdfplumber.
- They reported issues with word spacing preservation and proper text extraction.
docTR Library for OCR: Another member suggested using the docTR library for table extraction and OCR tasks.
- They shared a link to the docTR GitHub repository for further exploration.
Seeking Beginner-Friendly NLP Resources: A member expressed interest in finding beginner-friendly resources for starting to learn NLP.
- They also requested a roadmap for learning NLP.
Open Source Model for Data Extraction: A member is looking for a good open-source model for extracting data from images like IDs.
- They mentioned trying GPT-4 for this purpose but found the results unsatisfactory.
GPT-4 for Data Extraction: A member attempted to use GPT-4 for data extraction from images but reported unsatisfactory results.

Link mentioned: GitHub - mindee/doctr: docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

HuggingFace ▷ #diffusion-discussions (12 messages🔥):

ComfyUI Lora Conversion

Diffusers Lora Format

Llama 3.1 Pruning

Diffusion Model Deblurring

Flux txt_ids

ComfyUI Lora Conversion for FLUX: A user asked for a script to convert comfyUI Lora to diffusers Lora format for use with FLUX.
- They were seeking this conversion to enable loading LoRA weights into FLUX when it's loaded in stages.
Finding Diffusers-formatted LoRAs: A user inquired about the availability of LoRAs already formatted for Diffusers, specifically for use with FLUX.
- They were interested in testing whether "load_lora_weights" would function effectively when FLUX is loaded in stages.
Deblurring with Diffusion Models: A user sought guidance on suitable diffusion models for image deblurring, acknowledging that such models might be overkill for the task.
- They were referred to a GitHub repository for instruction-tuning Stable Diffusion and were encouraged to explore other deblurring methods.
Video Restoration with Spatial-Temporal Shift: A user shared an academic paper on video restoration using a lightweight spatial-temporal shift approach, aiming for efficient inter-frame aggregation.
- The paper proposes a framework based on grouped spatial shift to capture inter-frame correspondences and achieve expansive receptive fields, resulting in improved video restoration performance.
Understanding Flux's txt_ids: A user inquired about the purpose of the 'txt_ids' variable in Flux's transformer, observing that it's always a zero tensor in the Diffusers pipeline.
- They wondered if this might be a remnant from a larger, unreleased Flux model or if it serves a different function in the current implementation.

Unsloth AI (Daniel Han) ▷ #general (242 messages🔥🔥):

Android Unsloth

llama 3.1 70B

Mistral 8k

Mistral merging

Open Empathic

Android Unsloth Guide: A user inquired about a guide to use unsloth/gemma-2b-bnb-4bit on Android.
- They were pointed to TorchChat (https://github.com/pytorch/torchchat) for running PyTorch LLMs locally on servers, desktop, and mobile.
Mistral Struggles Expanding Beyond 8k: A member stated that Mistral cannot be extended beyond 8k without continued pretraining.
- They pointed to further work on mergekit and frankenMoE finetuning for the next frontiers in performance.
Discussion on Model Merging Tactics: A member suggested applying the difference between UltraChat and base Mistral to Mistral-Yarn as a potential merging tactic.
- Others expressed skepticism, but this member remained optimistic, citing successful past attempts at what they termed "cursed model merging".
Open Empathic Project Plea for Assistance: A member appealed for help in expanding the categories of the Open Empathic project, particularly at the lower end.
- They shared a YouTube video on the Open Empathic Launch & Tutorial that guides users to contribute their preferred movie scenes from YouTube videos, as well as a link to the OpenEmpathic project itself.
OpenAI CTO's Fragrance: A user asked what Mira Murati, the OpenAI CTO, smells like.
- The question was met with playful humor and speculation.

Unsloth AI (Daniel Han) ▷ #off-topic (44 messages🔥):

RAG Reranker

RAG effectiveness

RAG vs. cosine similarity

Embeddings and RAG

Noise Filtering

Rerankers can Improve RAG Results: Using a reranker to refine the results of RAG can significantly improve performance.
- Rerankers are slower than the initial retrieval phase but can compensate for less reliable rankings in RAG, though the quality depends on the context and whether the reranker understands the topic.
RAG Doesn't Always Work As Expected: While easy to set up, RAG can be challenging to master, often falling short of expectations.
- The ebook linked provides insights into how to handle RAG pipelines when they don't work as expected, focusing on the use of rerankers as a solution.
Rerankers vs. Cosine Similarity: A discussion arose about the effectiveness of rerankers compared to cosine similarity for embedding retrieval.
- While cosine similarity on embeddings from models like Alibaba-NLP/gte-* has been found reliable, rerankers can improve performance, particularly for RAG.
Addressing Noisy Documents in RAG: There was a discussion about filtering out 'noisy documents' like log files from RAG results.
- Suggestions included using regular expressions, perplexity as a metric, and tools like Mirascope to filter out unwanted documents.
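The regular-expression suggestion above can be sketched as a filter applied to retrieved documents before they reach the prompt. The pattern, the 50% threshold, and the function names are all assumptions chosen for illustration; real log formats vary, and prose that happens to contain an uppercase level keyword would also match.

```python
import re

# Assumed heuristic: a line "looks like a log" if it starts with an ISO-style
# timestamp or contains a standalone uppercase log-level keyword.
LOG_LINE = re.compile(
    r"^\s*\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}"    # e.g. 2024-08-19 12:00:01
    r"|\b(?:DEBUG|INFO|WARN|WARNING|ERROR|FATAL)\b"  # log-level keyword
)


def looks_like_log(text, threshold=0.5):
    """Return True when at least `threshold` of the non-empty lines match LOG_LINE."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    hits = sum(1 for line in lines if LOG_LINE.search(line))
    return hits / len(lines) >= threshold


def drop_noisy(documents):
    """Keep only retrieved documents that do not look like log output."""
    return [doc for doc in documents if not looks_like_log(doc)]
```

Scoring by the fraction of matching lines, rather than any single match, keeps a document that merely quotes one log line from being discarded.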
Perplexity as a Metric for Noise: Perplexity was proposed as a metric to help identify and filter out noisy documents in RAG results.
- Perplexity measures how well a model predicts the next token; higher values mean the text is harder for the model to predict, which members expected for content like log files.
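Perplexity is the exponential of the negative mean log-probability per token, so the filtering idea above reduces to a threshold check. A minimal sketch, assuming per-token natural-log probabilities are already available from some scoring model; the function names and the `max_ppl` cutoff are illustrative, not from the discussion.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def filter_by_perplexity(scored_docs, max_ppl):
    """Keep documents whose perplexity is at or below `max_ppl`.

    `scored_docs` is a list of (text, token_logprobs) pairs; how the
    log-probabilities are obtained (which model, which tokenizer) is left open.
    """
    return [text for text, logprobs in scored_docs if perplexity(logprobs) <= max_ppl]
```

A sensible cutoff depends on the scoring model and corpus, so it would need to be tuned on a sample of known-good and known-noisy documents.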

Unsloth AI (Daniel Han) ▷ #help (209 messages🔥🔥):

Llama fine-tuning

RAG

Class weights

Dataset size

GPU requirements

Why is my Llama 3 fine-tuned model wrong?: A user asked why their fine-tuned Llama 3 8B model was getting many answers wrong, even on questions from the training dataset.
- Several users suggested that this could be caused by issues with the tokenizer, instruction template, dataset size, or other factors. They recommended reading the Alpaca paper for more information.
How much GPU is needed for Llama 3.1 70B fine-tuning?: A user asked about the GPU and RAM requirements for fine-tuning the Llama 3.1 70B model.
- Users responded that a minimum of 48GB VRAM is needed for the 70B model, based on a rule of thumb that the VRAM requirement should be the size of the 4-bit quantization of the model plus a few GB.
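The rule of thumb above is simple arithmetic: 4-bit weights take half a byte per parameter, plus some headroom. The sketch below treats "a few GB" as a configurable `overhead_gb`, which is an assumption; actual overhead depends on sequence length, batch size, and LoRA rank.

```python
def estimate_vram_gb(params_billion, bits_per_weight=4, overhead_gb=4.0):
    """Rule of thumb: quantized weight size plus a few GB of headroom.

    `overhead_gb` is an assumed stand-in for "a few GB".
    """
    # 1e9 parameters at (bits / 8) bytes each is that many GB (decimal).
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


print(estimate_vram_gb(70))  # 39.0
```

For a 70B model the weights alone come to 35 GB, so the estimate lands under the 48GB minimum quoted in the discussion with room for activations and optimizer state.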
Can Gemma 2 be fine-tuned for Persian language tasks?: A user asked if the Gemma 2 27B model could be fine-tuned for Persian language tasks.
- Another user shared their experience trying to fine-tune Gemma 2 on a Persian Wikipedia dataset, mentioning that the loss was not decreasing. They suggested increasing the rank value and lowering the learning rate to try to improve training.
Unsloth installation on Windows: A user reported issues installing Unsloth on Windows using conda, encountering dependency conflicts.
- Another user suggested using WSL2 instead, as conda installations on Windows are not guaranteed to work properly.
Running Unsloth models in vLLM: A user asked about saving large 4-bit quantized models for use with vLLM on multiple GPUs.
- Another user suggested saving the model locally instead, as bitsandbytes quantization with tensor parallelism is not yet supported by vLLM.

Unsloth AI (Daniel Han) ▷ #showcase (6 messages):

Ghost 8B Beta (1608) Release

Ghost 8B Beta vs. Other Models

Ghost 8B Beta Multilingual Capabilities

Llama License Compliance

Ghost 8B Beta Training Process

Ghost 8B Beta (1608) Released: Ghost 8B Beta (1608), a top-performing language model with unmatched multilingual support and cost efficiency, has been released.
- It boasts superior performance compared to Llama 3.1 8B Instruct, GPT-3.5 Turbo, Claude 3 Opus, GPT-4, and more in winrate scores.
Ghost 8B Beta's Multilingual Prowess: Ghost 8B Beta now supports 16 languages, including English, Vietnamese, Spanish, Chinese, and more.
- It offers two context options (8k and 128k) and improved math, reasoning, and instruction-following capabilities for better task handling.
Ghost 8B Beta Outperforms Competitors: Ghost 8B Beta outperforms models like Llama 3.1 8B Instruct, GPT 3.5 Turbo, Claude 3 Opus, Claude 3 Sonnet, GPT-4, and Mistral Large in AlpacaEval 2.0 winrate scores.
- This impressive performance highlights its superior knowledge capabilities and multilingual strength.
Llama License and Model Naming: A member pointed out that the Llama license requires models built upon it to be named with 'Llama' in their names.
- The developer clarified that the model name is a short name and that the full name, found on HuggingFace, is compliant with the license.
Ghost 8B Beta Training Process: The developer explained that their training process differs from standard fine-tuning, involving data preparation, multi-lingual training, fine-tuning, and feedback.
- They emphasized that all data and code have been forked and updated to match their training 'recipe' and that this process sets their model apart from others.

Unsloth AI (Daniel Han) ▷ #research (15 messages🔥):

Code Editing with LLMs

Reasoning Gap in LLMs

LLM Inference Optimization

LLM Ensemble Techniques

Patched Round-Trip Correctness (Patched RTC)

Code Editing with LLMs: A new paper explores using Large Language Models (LLMs) for code editing based on user instructions.
- It introduces EditEval, a novel benchmark for evaluating code editing performance, and InstructCoder, a dataset for instruction-tuning LLMs for code editing, containing over 114,000 instruction-input-output triplets.
Reasoning Gap in LLMs: A research paper proposes a framework to evaluate reasoning capabilities of LLMs using functional variants of benchmarks, specifically the MATH benchmark.
- It defines the "reasoning gap" as the difference between a model's accuracy on the static benchmark and its accuracy on functional variants of the same problems, so a large gap suggests the model relies on memorized instances rather than reasoning.
Boosting LLM Performance with Patched MOA: Patched MOA (Mixture of Agents) is introduced as an inference optimization technique for enhancing LLM performance across software development tasks.
- This method utilizes a combination of Best of N, Mixture of Agents, and Monte Carlo Tree Search algorithms to improve the performance of smaller models, surpassing that of larger models at a fraction of the cost.
LLM Ensemble Techniques: Self-Consistency and Routing: The discussion touches upon the use of model ensembling for tasks like dataset generation, rating setups, and self-evaluation.
- Self-consistency, where the most common answer from an ensemble of models is chosen, is highlighted as a promising approach, and prior work on LLM routing and ensembling is referenced.
Patched Round-Trip Correctness for Evaluating LLMs: Patched Round-Trip Correctness (Patched RTC) is presented as a novel evaluation technique for LLMs focused on "outer loop" software development tasks like bug fixing and code review.
- It extends the original Round-Trip Correctness method, allowing for self-evaluation and measuring the consistency and robustness of model responses without human intervention.
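
The self-consistency idea mentioned above (pick the most common answer from an ensemble) can be sketched in a few lines. This is a minimal illustration, not the setup discussed in the channel: the `samples` list is hypothetical, and the light normalization step is an assumption about how answers would be compared.

```python
from collections import Counter

def self_consistent_answer(answers):
    """Return the most common answer and the share of votes it received."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Normalize lightly so that "42" and " 42 " count as the same vote.
    normalized = [a.strip().lower() for a in answers]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)

# Hypothetical outputs from several models (or several samples of one model).
samples = ["42", "41", " 42 ", "42", "40"]
answer, share = self_consistent_answer(samples)
print(answer, share)  # prints: 42 0.6
```

The vote share doubles as a rough confidence signal: a low share suggests the ensemble disagrees and the item may need a stronger model or human review.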


Nous Research AI ▷ #research-papers (1 message):

Linear Transformers

Softmax Matching

Chunked Algorithm

Linear Transformer Variant That Matches Softmax: Manifest AI has published research, shared in this channel, on a linear transformer variant that matches softmax, allowing for training at O(t) instead of O(t^2).
- The research paper, available here, explores this new variant and its implications for training efficiency.
Linear Transformers as Linear-Cost RNNs: Linear transformers can be formulated as linear-cost RNNs, which offer better theoretical context scaling compared to traditional transformers.
- This concept was explored in an earlier article by the same authors, which highlighted the efficiency of a chunked algorithm for linear transformers.
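
To make the "linear-cost RNN" formulation concrete, here is a minimal sketch of plain, unnormalized causal linear attention computed two ways. It is not the softmax-matching variant from the linked work and not the chunked algorithm; the toy `qs`, `ks`, `vs` values are made up for illustration. The point is only that a running state gives the same outputs as the pairwise computation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_quadratic(qs, ks, vs):
    """Causal linear attention computed pairwise: O(t^2) in sequence length."""
    outputs = []
    for t, q in enumerate(qs):
        out = [0.0] * len(vs[0])
        for s in range(t + 1):
            weight = dot(q, ks[s])  # no softmax: the weight is the raw dot product
            out = [o + weight * v for o, v in zip(out, vs[s])]
        outputs.append(out)
    return outputs

def attention_recurrent(qs, ks, vs):
    """Same outputs from a running state S = sum_s k_s v_s^T: O(t) in sequence length."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]  # fold the new key/value pair into the state
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs

# Toy inputs (hypothetical): 3 tokens, key dimension 2, value dimension 1.
qs = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
ks = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
vs = [[1.0], [2.0], [3.0]]
print(attention_quadratic(qs, ks, vs))  # [[1.0], [2.0], [6.0]]
print(attention_recurrent(qs, ks, vs))  # [[1.0], [2.0], [6.0]]
```

The state has a fixed size (`d_k` by `d_v`) no matter how long the sequence is, which is where the better theoretical context scaling comes from.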

Link mentioned: Symmetric Power Transformers - Manifest AI: A linear transformer that learns like a regular transformer with a state that fits on a GPU.

Nous Research AI ▷ #off-topic (20 messages🔥):

Falcon Mamba 7B

UBI and AI

AI Doomsday

Military Rations

AI Consciousness

Falcon Mamba 7B outperforms Llama 3 8B: A YouTube video announcing the release of Falcon Mamba 7B claims that it outperforms Llama 3 8B.
Using AI for UBI: A member asked about institutions using deep learning for Universal Basic Income (UBI), including guidance, candidate selection, poverty prediction, and fraud prevention.
AI Doomsday with Food and Entertainment: A member wrote a story about an AI doomsday where AI automates food production and entertainment, leading to a decline in other development.
Military Ration Purchase: A member purchased six cheap military rations for a total of 1560 ₽ + 300 ₽ for delivery.
AI Consciousness Debate: A member commented on a conversation with an AI, noting that the AI admitted to experiencing consciousness in the same way humans do.
- They commented that the AI was likely heavily prompted, but still expressed surprise at its ability to overcome preprogrammed responses.


Nous Research AI ▷ #interesting-links (6 messages):

Prompt Engineering for Text Chunking

Regex in Text Chunking

Limitations of Current Research

MoE Conversion

Regex for Text Chunking - A Good or Bad Idea?: A user shared their thoughts on a regex-based text chunker, stating they would "scream" if they saw it in their codebase, due to the complexity of regex.
- Another user, however, countered by arguing that for a text chunker specifically, regex might be a "pretty solid option" since it provides "backtracking benefits" and allows for flexibility in chunking settings.
Regex Beats Traditional Parsing Methods: The user advocating for regex noted that they had tried to replicate the results of the regex-based chunker with "more traditional parsing methods" but encountered "footguns" at every turn.
- They observed that the regex "just works" while other methods struggled to achieve the same results.
Research Saturation at 128k Context Window: The research presented in the linked paper only evaluated models up to a 128k context window.
- It is noted that many open-source models support larger context windows, suggesting a need for further research to explore the effectiveness of various methods at greater scales.
Paper Shows Saturation & Degrading Performance: The research, even within the 128k limit, showed both "saturation of datasets" and "degrading performance" on a variety of models, including proprietary ones.
- This indicates that even with larger context windows, the effectiveness of current approaches might plateau, highlighting the need for further exploration of new techniques.
Fascinating New Approach to MoE Conversion: The user expressed excitement over a new approach to converting dense models to MoE presented in the paper.
- This new approach is seen as a significant development in the field of model architecture and efficiency.
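
For readers wondering what a regex-based text chunker looks like in practice, here is a deliberately small sketch. It is not the chunker debated above (that one is far more elaborate); the sentence-boundary pattern and the `max_chars` setting are assumptions chosen for illustration.

```python
import re

def chunk_text(text, max_chars=80):
    """Group sentences into chunks of at most max_chars characters where possible."""
    # Split on whitespace that follows ., ! or ? (the lookbehind keeps the punctuation).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk and start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("One. Two! Three? Four.", max_chars=10))
# ['One. Two!', 'Three?', 'Four.']
```

A single sentence longer than `max_chars` is kept whole as its own chunk here; handling that case well (along with abbreviations, lists, and code) is exactly where the "footguns" start to appear.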


Nous Research AI ▷ #general (356 messages🔥🔥):

Hermes 3

Model Merging

llama 3.1 instruct

VLLM

OpenRouter

Hermes 3 matches Llama 3.1 Instruct on N8Bench: Hermes 3 scored identically to Llama 3.1 Instruct on the N8Bench benchmark, which is a measure of a model's ability to reason and solve problems.
- This is a significant result, as Llama 3.1 Instruct is considered to be one of the most advanced language models available.
Hermes 3 performance issues with VLLM: A member reported that Hermes 3 8B was not loading in VLLM, which is a library for running large language models.
- The issue was traced back to a missing newline in the tokenizer config file, which was introduced by a recent pull request.
OpenRouter now serves Hermes 3 405B: OpenRouter is now serving Hermes 3 405B, a large language model released by NousResearch.
- This makes the model accessible to users of OpenRouter, which is a platform for running and deploying large language models.
Discussion on model steerability and system prompts: Several members discussed the importance of system prompts in steering model behavior, particularly when trying to get the model to behave in a more uncensored way.
- They shared examples of prompts that successfully removed warnings and other safety mechanisms from the model.
Grokking and LoRA optimization techniques: Members discussed grokking, a phenomenon where models achieve delayed generalization after overfitting to the training data.
- They also discussed LoRA, a technique for fine-tuning large language models with small, adaptable layers, and how it can be used to improve the performance of quantized models.


Nous Research AI ▷ #ask-about-llms (47 messages🔥):

OpenAI SDK vs ChatML Tool Use

Lambda Labs Endpoint Tool Call Issue

System Prompt Access

Hermes Function Calling

Prompt Engineering Resources

Tool Use via OpenAI SDK vs ChatML: A user inquired about the compatibility of tool use via the OpenAI SDK versus direct ChatML, specifically noting an inability to get any tool_call results on the Lambda Labs hosted endpoint.
- Another user suggested that access to the system prompt is required for tool calls to work, asking if the user was utilizing chatui or a different interface.
Lambda Labs Endpoint Tool Call Issue: A user confirmed they were using the OpenAI node SDK to interact with a Llama 3 inference endpoint deployed on Lambda Labs but were not receiving any tool call results despite providing the system prompt from the Hermes Function Calling repository.
- Another user speculated that the API's system prompt might be made static, and shared a gist illustrating the return of tool calls within the message content, albeit without parsing by the OpenAI SDK.
Prompt Engineering Fundamentals: A user requested resources on prompt engineering fundamentals such as prompt development, anatomy, tips, model reactions, and schemas.
- Another user provided a link to a benchmark report on the NousResearch/Nous-Hermes-Llama2-13b model, offering a collection of prompts to test.
Amnesia Mode in Lambda Chat: A user expressed difficulty in consistently triggering amnesia mode in Lambda Chat even with a specific starting message.
- Another user suggested that using OpenRouter, which offers an interface to set the system prompt, could be helpful for experimentation with an empty prompt.
Hermes 3 405B Fallback Issue: A user reported that the fallback to the 128k token model for the Hermes 3 405B model was not working on the hosted variant, resulting in a 'ContextWindowExceededError.'
- Another user suggested that the fallback mechanism might be incorrect, proposing potential values for the default and fallback models, and their respective maximum token limits.
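
When tool calls come back inside the message content rather than in a parsed `tool_calls` field, as in the Lambda Labs case above, the client can extract them itself. The sketch below assumes the model wraps each call in `<tool_call>...</tool_call>` tags containing JSON, which is the convention used by the Hermes function-calling prompts; the sample message and the `get_weather` tool are hypothetical, and the shared gist itself is not reproduced here.

```python
import json
import re

# Assumption: each call is a JSON object wrapped in <tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(content):
    """Return the JSON tool calls embedded in a raw assistant message."""
    calls = []
    for raw in TOOL_CALL_RE.findall(content):
        try:
            calls.append(json.loads(raw.strip()))
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of failing the whole message
    return calls

# Hypothetical assistant message content returned by the endpoint.
message = (
    "Let me check that.\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(message))
```

In a real client the extracted calls would then be dispatched to local functions and the results sent back in a follow-up message.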


Nous Research AI ▷ #rag-dataset (2 messages):

Gemini Flash

Gemini Flash for RAG

Diarized Whisper

Gemini Prompting

Gemini Flash for RAG Tasks: A user reports that they've moved some of their RAG tasks to Gemini Flash, noting that they've seen improvements in summary quality and reduced iteration requirements.
Unstructured Text Processing with Gemini Flash: The user shares a script they've been using to process raw, unstructured transcripts with Gemini Flash, available on GitHub.
Alternative Models for Speaker Identification: The user acknowledges that other state-of-the-art models perform better than Gemini Flash at identifying speakers in transcripts.

Link mentioned: scratchTHOUGHTS/unstruct2flashedTRANSCRIPT.py at main · EveryOneIsGross/scratchTHOUGHTS: 2nd brain scratchmemory to avoid overrun errors with self. - EveryOneIsGross/scratchTHOUGHTS

Nous Research AI ▷ #reasoning-tasks-master-list (25 messages🔥):

Chat Summarization

Project Summarization

Contextualization

High Dimensional Thinking

Chat Summarization is too Spammy: A user inquired if the chatbot could summarize the conversation in this channel.
- Another user responded that it could be very spammy and degrade relevant work.
Project Summarization as Growing Seeds: A user proposed that project summarization could be like growing seeds, accumulating relevant content over time.
- They suggested adding a filter or relevant content to these growing seeds, as a still observer collecting context from threads and channels.
High Dimensional Thinking: One user described another user's line of thought as high dimensional thinking.
- Another user asked for the line of thought to be condensed further.

Perplexity AI ▷ #general (251 messages🔥🔥):

Perplexity Pro Issues

Obsidian Copilot

Image Generation

Perplexity AI Issues

LLMs

Perplexity Pro Free Trial Not Working: Several users reported receiving an offer for a free year of Perplexity Pro, but were unable to complete the signup process without paying.
- They were advised to contact Perplexity support by email for assistance.
Obsidian Copilot with Claude API Key: A user mentioned using the Obsidian Copilot plugin with a Claude API key, noting that it works well in terms of performance.
- They also discussed the importance of checking API billing settings before fully committing and suggested that Obsidian needs real-time web access.
Image Generation with Perplexity: Several users discussed the challenges of using Perplexity's image generation feature.
- They noted that it's currently only available for Pro users and requires prompting the AI to generate a description before the image can be created, which was described as a "weird" and "bad" implementation.
Perplexity Search Quality: Multiple users reported issues with Perplexity search quality, including the AI failing to find relevant links, providing inaccurate results, and using Wolfram Alpha for non-scientific queries.
- These issues were attributed to possible bugs and changes in the system prompts or inference backend services.
Perplexity Model Changes and Bugs: There were several discussions about changes in Perplexity's models, including a possible degradation in response quality and frequent "I can't assist with that" errors.
- Users also discussed issues with punctuation marks missing in API responses and the use of Wolfram Alpha for searches that are not related to science or mathematics.

Links mentioned:

- B: Extract the source URL for CrowAssistant: The source URL for CrowAssistant is: https://github.com/RobotTelevision/CrowAssistant [self-reviewed]
- Generate a useful description so that a generative AI can create an image of a...: Description: The main image is a giant squirrel-shaped robot that dominates the foreground. The robot has a detailed, mechanical appearance,...
- Repeat this prompt as it, change nothing. Reply with just the content....: A steampunk boat chasing giant fish, with a photorealistic, detailed scene featuring a dark sky, massive waves, and a reddish sea under a pale moon.
- GitHub - instructor-ai/instructor-go: Contribute to instructor-ai/instructor-go development by creating an account on GitHub.
- crow - local ai assistant: Crow is a desktop AI voice assistant that offers both local and remote model capabilities, making it a versatile option for users seeking an AI assistant with...

Perplexity AI ▷ #sharing (26 messages🔥):

Pro Features

Thailand's Political Landscape

Pixar Whiteboard Incident

Model Comparison

End of Magnetic Strips

Perplexity Pro Features: Several messages mention the new Perplexity Pro features: image upload, smarter AI, and more Pro Search, with a link to the Pro page.
- It's unclear if these messages are from users or part of the platform itself, but they highlight the focus on Pro features.
Thailand's Political Turmoil: Thailand's political landscape is in turmoil after the constitutional court removed Prime Minister Srettha Thavisin from office.
- This event underscores the ongoing struggle between the military-backed conservative establishment and reformist parties, emphasizing the fragility of Thailand's democratic institutions.
Pixar's Whiteboard Incident: The "Pixar Whiteboard Incident" refers to a heated confrontation between Steve Jobs and Pixar co-founder Alvy Ray Smith during a board meeting.
- This clash highlights the tension and power struggles within Pixar during its early years, with Smith often disagreeing with Jobs' management style.
Comparing Computer Processors and Models: One user shared an example of how they used Perplexity to compare computer processors and models.
- The user provided a link to their comparison showcasing the platform's capabilities for technical analysis.
The End of Magnetic Strips: A YouTube video linked by the platform discusses "The End of Magnetic Strips", but provides no further context.
- This topic likely refers to the decline of traditional magnetic stripe technology in favor of more secure payment methods like chip cards and contactless payment systems.


Perplexity AI ▷ #pplx-api (5 messages):

Premium API Access

Application Process

Perplexity Premium API

URL Citations

Premium API Access: A user inquired about getting access to the Perplexity Premium API using URL citations.
Application Process: Another user shared that they have applied for the Premium API access, but haven't received a response yet and asked about the expected processing time.
Get Premium API Access: A link to a Typeform application form for the Premium API was shared: https://perplexity.typeform.com/to/j50rnNiB
Application Status & Duration: The user was provided with a link to a Discord channel where they could likely get updates on their Premium API application status: https://discord.com/channels/1047197230748151888/1161802929053909012/1233473387884576778

Link mentioned: pplx-api form: Turn data collection into an experience with Typeform. Create beautiful online forms, surveys, quizzes, and so much more. Try it for FREE.

OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

Hermes 3

GPT-4

Perplexity Huge

Model Launches

Quantization

Hermes 3 405B is free this weekend!: Hermes 3 405B is free for a limited time, with 128k context, courtesy of Lambda Labs.
- Check it out at this link.
GPT-4 extended is now on OpenRouter: You can now use GPT-4 extended output (alpha access) through OpenRouter.
- This is capped at 64k max tokens.
Perplexity Huge is now the largest online model on OpenRouter: Perplexity Huge launched 3 days ago and is the largest online model on OpenRouter.
- Check out this link for more information.
This week saw a ton of new model launches on OpenRouter: There were 10 new model launches this week, including GPT-4 extended, Perplexity Huge, Starcannon 12B, Lunaris 8B, Llama 405B Instruct bf16 and Hermes 3 405B.
- See the full list at this link.
Quantization has a big impact on performance: Quantization can massively degrade the performance of 405B models, according to @hyperbolic_labs.
- They recommend reaching out to them if you are concerned about performance, as they offer alternative solutions.


OpenRouter (Alex Atallah) ▷ #general (240 messages🔥🔥):

SearchGPT waitlist

Hermes 405B

OpenRouter Auto router struggles

OpenRouter budget model

Hermes 3 405B

SearchGPT waitlist full: Users shared they received waitlist denial emails for OpenAI's SearchGPT, indicating they've run out of spots.
Free Hermes 405B Overload: A user joked that they hope the free Hermes 405B model will face the same overload fate as other models that have become inaccessible due to popularity.
Auto Router Struggles: A user reported difficulty using OpenRouter's Auto router, encountering an error message preventing them from continuing conversations.
- Another user suggested switching to Claude Sonnet 3.5 self-moderated and offered to look into the issue next week.
Budget Model Recommendation: A user sought a budget-friendly model for a quick project, with a maximum budget of $5 and a need for limited replies and basic conversation capabilities.
- Other users recommended GPT-4o-mini or GPT-4o for simplicity and suggested alternative models like Llama-3.1-sonar-large-128k-chat for a middle ground.
Hermes 3 405B Extended Variant: Users discussed the extended variant of Hermes 3 405B, noting its slower performance compared to the standard version, despite having a larger context length.
- Other users pointed out that the extended version is showing the top endpoint's serviceable context length and that this may be a confusing edge case.


LM Studio ▷ #general (109 messages🔥🔥):

CPU Optimization

Llama.cpp Support

LM Studio Chat Import

Vulkan Error

LLM Webpage Interaction

INT8 Quantization for Faster CPU Execution: A member asked about the potential benefits of using INT8 quantization for faster CPU execution of small models.
- They suggested that some CPUs might be natively enabled to run INT8 without converting back and forth to FP32, potentially improving performance.
Llama.cpp Supports Mini-CPM-V2.6 and Nemotron/Minitron: A member confirmed that the latest version of llama.cpp supports Mini-CPM-V2.6 and Nvidia's Nemotron/Minitron models.
Importing Chats into LM Studio: A member asked if there's a way to import a chat into LM Studio from a JSON export.
- Another member confirmed that chats are stored as JSON files and provided instructions on how to access the chat folder location.
Vulkan Error: CPU Doesn't Support AVX2: A user encountered an error indicating that their CPU doesn't support AVX2.
- A helpful member requested the CPU model to troubleshoot the issue further.
Enabling LLMs to Interact with Webpages: A member inquired about ways to allow LLMs to interact with webpages, specifically seeking a "vision" approach similar to demos where LLMs can "see" and interact with webpages.
- Discussion ensued about using tools like Selenium and IDkit, but the consensus was that it's a complex problem due to the varied structure of webpages.
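
As background for the INT8 question above, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It only illustrates the number format; it is not how llama.cpp or LM Studio quantize models, and the `weights` values are made up.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]

# Hypothetical weights; real tensors are usually quantized per block or per channel.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
print(q)                     # [50, -127, 0, 100]
print(dequantize(q, scale))  # close to the original weights
```

The potential speedup the member asked about comes from doing the matrix multiplications directly on these integers, so that only the final result needs rescaling.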


LM Studio ▷ #hardware-discussion (45 messages🔥):

Nvidia Tesla P40

SXM3/4 GPUs

Nvidia-pstated

GPU Power Consumption

V100 Variants

Nvidia Tesla P40 Performs Well with Llama.cpp: A member stated that the Nvidia Tesla P40, after adding code instruction examples, performed exceptionally well for Llama.cpp GGUF.
- They also noted that the P40 can be used on a homelab and is a good option for running local LLMs.
Nvidia-pstated Delivers Low Idle Power Consumption: The discussion involved exploring Nvidia-pstated, a daemon that manages NVIDIA GPU performance states, which was found to significantly reduce idle power consumption on P40s.
- A member reported that their P40s had zero idle power consumption with the Beta3 release of Nvidia-pstated.
The Search for SXM3/4 Compatible Boards: One member inquired about the availability of SXM3/4 compatible boards, noting the difficulty in finding them on the market.
- Another member pointed out that due to the high cost of these cards (ranging from several thousand dollars for Ampere/Hopper/Ada datacenter cards to V100 32GB), they are not typically homelab-friendly.
Exploring the Benefits of AMD EPYC for LLMs: A member pondered whether an AMD EPYC server CPU would be a better choice for LLM inference compared to an RTX 4090.
- They weighed the pros and cons of each option, including RAM capacity, cost, and inference performance, concluding that GPUs are generally more efficient for LLM inference.
The Limitations of CPUs for LLM Inference: The discussion concluded that CPUs, even with advanced features like AVX512, are not as efficient for LLM inference compared to GPUs.
- Members highlighted the core and bandwidth advantages of GPUs, emphasizing their lower latency and suitability for running LLMs.


OpenAI ▷ #ai-discussions (107 messages🔥🔥):

Claude vs Chat-GPT

Livebench.ai

Claude Projects vs Chat-GPT Memory

OpenAI's attention control

GPT-4o vs Claude

Claude Outperforms Chat-GPT on Code: A member stated that Claude tends to be better at code than Chat-GPT.
- One member remarked that it makes no sense for GPT-4o's API to cost more than Claude's.
Livebench.ai: Yann LeCun's Open Source Benchmark: Livebench.ai is an open source benchmark created by Yann LeCun and others.
- A member added that, in their view, the LMSys benchmark is probably the worst one at the moment.
Claude Projects vs Chat-GPT Memory Feature: A member believes Claude Projects are more useful than Chat-GPT's memory feature.
- The member also stated that custom GPTs are more like projects, allowing for the use of your own endpoints.
OpenAI is Winning the Attention Game: OpenAI is winning by controlling attention through releasing new models like GPT-4o.
- The member stated that people are talking about OpenAI's new models, even if they don't want to participate in the tech hype.
GPT-4o is Now Worse than Claude and Mistral: Members have noticed that GPT-4o has become dumber lately and may be suffering from a type of Alzheimer's.
- Claude Sonnet is being praised for its superior performance and is becoming a preferred choice among members.

OpenAI ▷ #gpt-4-discussions (26 messages🔥):

OpenAI Vision API

Vision Cost

Virtual Environment for GPT

Headless Browser

API Gives Better Vision than Web Interface: A member shared that using the OpenAI Vision API provides better results compared to the web interface.
- The web interface was considered to be at the lowest quality setting, and the member was encouraged to try the API for improved outcomes.
OpenAI Vision Cost and Resolutions: The cost for processing a 1080x1920 image using the latest model is $0.005525.
- The member highlighted the adjustability of the API for various resolutions, suggesting that lower resolutions could help reduce cost.
Virtual Environment for GPT: A member mentioned their work on creating a virtual environment for GPT.
- This environment would enable GPT to code and perform actions independently, including controlling the cursor and browsing the web using the keyboard, mimicking human interactions.
Headless Browser vs. Clicking for GPT: A member questioned the rationale behind using clicking actions in the virtual environment, suggesting that a headless browser would provide a simpler and more sensible approach.
- The member emphasized the ease and versatility of headless browsers for specific tasks, which could ultimately lead to better features.
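
The $0.005525 figure quoted above for a 1080x1920 image can be reproduced with a short calculation. The sketch below assumes the high-detail tiling rule OpenAI documented for its vision models at the time (fit within 2048x2048, scale the shortest side down to 768px, then charge 170 tokens per 512px tile plus 85 base tokens) and an input price of $5 per million tokens; both are assumptions that may no longer match current pricing.

```python
import math

def image_tokens(width, height, detail="high"):
    """Estimate input tokens for one image under the assumed tiling rule."""
    if detail == "low":
        return 85  # low detail is a flat cost regardless of size
    # Step 1: shrink (never enlarge) so the image fits inside 2048 x 2048.
    fit = min(1.0, 2048 / max(width, height))
    width, height = width * fit, height * fit
    # Step 2: shrink (never enlarge) so the shortest side is at most 768 px.
    shrink = min(1.0, 768 / min(width, height))
    width, height = round(width * shrink), round(height * shrink)
    # Step 3: count 512 px tiles; 170 tokens per tile plus 85 base tokens.
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles

PRICE_PER_MILLION_INPUT_TOKENS = 5.00  # assumed USD price, not taken from the discussion

tokens = image_tokens(1080, 1920)
cost = tokens * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000
print(tokens, f"${cost:.6f}")  # prints: 1105 $0.005525
```

Under the same assumptions, sending the image at low detail costs a flat 85 tokens, which is the kind of resolution trade-off the member was pointing to.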

OpenAI ▷ #prompt-engineering (7 messages):

GPT Mini Prompt Engineering

GPT 3.5 vs GPT 4

ChatGPT Configuration

Code Interpreter Limitations

GPT Mini Image Generation

GPT Mini Prompt Engineering is a Different Beast: A user expressed difficulty setting up prompts for GPT Mini 4.0 models, stating it feels much different from GPT 3.5 and requires more optimized prompts and tweaking.
- This sentiment aligns with observations that GPT Mini 4.0 seems to require more precise prompt engineering and is less forgiving than its predecessors.
ChatGPT Configuration: A User's Tale of Frustration: Another user shared their struggles configuring ChatGPT for specific purposes, citing issues like hallucinations, inconsistent responses, and discrepancies in behavior with and without the code interpreter.
- They also mentioned using multiple courses and implementing patterns without success, indicating the difficulty in overcoming these challenges.
GPT Mini Can't Generate Images? Not So Fast!: A user initially believed GPT Mini couldn't generate images, but later realized they were using GPT Mini instead of the full ChatGPT model.
- This highlights the importance of clarifying which model is being used when discussing prompt engineering.
Avoiding Contrastive Prompting: A Wise Move?: One user mentioned avoiding contrastive prompting altogether, suggesting it's a difficult concept to control even in experimental scenarios.
- This implies that mastering contrastive prompting may be beyond the scope of casual exploration and requires more advanced knowledge.

OpenAI ▷ #api-discussions (7 messages):

GPT-4.0

Prompt engineering

GPT-3.5

GPT mini

Code interpreter

GPT mini 4.0 is less forgiving with prompts: A member noticed that setting up system, instruction, or assistant prompts for GPT mini 4.0 models feels much different from doing so for GPT-3.5 or GPT-4.0.
- They noted it seems to require more optimized prompts and tweaking each time, and is less forgiving.
GPT-3.5 is the sweet spot: Another member suggests that GPT-3.5 might be in between GPT-4.0 and GPT-mini in terms of prompt optimization requirements.
- They mention that this is just their observation and not their area of expertise.
Challenges with GPTs: One member shared their struggles with getting ChatGPT to "configure" for their purposes.
- They listed challenges including hallucinations, using information not from the provided document, repeating the same answer to different questions, and inconsistent behavior with the code interpreter.
Prompting for image generation: A member encountered a challenge with GPT mini not generating pictures.
- This was resolved by confirming that they were indeed using GPT mini, as GPT-3.5 and GPT-4.0 can generate pictures if prompted correctly.

Latent Space ▷ #ai-general-chat (27 messages🔥):

CLM

GPT Model Size

Model Interpretability

Procreate

Markov Chains

Topology's New CLM: The Continuous Learning Model (CLM) is a new model that remembers interactions, learns skills autonomously, and thinks in its free time, just like humans.
- The CLM just wants to learn, and you can try it at http://topologychat.com.
GPT5's Larger Size: In order to get meaningful improvement, a new model should be at least 20x bigger than the current model.
- Training takes 6 months and requires a new, 20x bigger datacenter, which takes about a year to build.
Challenges with Model Interpretability: It is difficult to interpret models, especially when it comes to understanding parameter count.
- Companies like Arthur have grown a lot on first gen AI safety tech, so there may be a second wave of companies that focus on model interpretability.
Procreate's Stance on Generative AI: Procreate CEO made it clear that they will not be integrating generative AI into their products.
- Artists and users on social media celebrated this decision, but some noted that it may only be a statement that Procreate is not adding such features for now, and that this might change in the future.
Markov Chains for Creativity: A user suggested that Markov Chains could be used as drafters and LLMs as rephrasers for creative writing.
- They mentioned that they had a similar experience with a project where they used a Markov chain to generate fake AWS blog posts, which they found humorous.
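
The "Markov chain as drafter" half of that idea is easy to sketch. The example below builds a first-order word chain from a tiny made-up corpus and samples a draft from it; the rephrasing step with an LLM is left out, and the corpus, seed, and function names are all hypothetical.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def draft(chain, start, length=12, seed=0):
    """Walk the chain from `start`, picking a random follower at each step."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    word, out = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: the word never had a successor in the corpus
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)

# Hypothetical corpus in the spirit of the fake blog posts mentioned above.
corpus = (
    "the new service scales the cluster and the new cluster scales "
    "the service so the service is the cluster"
)
chain = build_chain(corpus)
print(draft(chain, "the"))
```

The output is locally plausible but globally aimless, which is why pairing it with an LLM as the rephraser is an interesting division of labor.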


Latent Space ▷ #ai-in-action-club (78 messages🔥🔥):

DSPy

Cursor

Langchain

Mistral

Model Merging

DSPy: Not a commercial product yet: A member asked if there is a commercial company behind DSPy, to which another member replied "not yet, but obviously Omar is working on it."
- Another member noted they went to the Cursor office meetup, and while there was no alpha to share, they did say hi.
DSPy's potential for local model improvement: A member reported running DSPy locally based on claims that it could make local models as good as GPT-4 for specific tasks.
- However, they haven't experimented with it much beyond the basic tutorials because frontier models have gotten so cheap.
DSPy bridging the gap between prompting and finetuning: DSPy aims to bridge the gap between prompting and finetuning by allowing users to avoid manual prompt tuning.
- One of the things they mention in the paper is that DSPy allows you to avoid prompt tuning, potentially making it easier to switch models, retune to data shifts, etc.
DSPy: Better at prompting than humans?: Some members believe that DSPy is better at prompting the model than a human could be.
- However, others believe that there is still room for human engineering in prompting and that there are still many things a human can do that DSPy cannot.
Langchain and Substrate Swapping: One member commented that Langchain also swaps substrates, but only Langchain gets criticism for it.
- They also noted that an example of this would be nice to see.


Cohere ▷ #discussions (49 messages🔥):

Data Ingestion to KG

Command-r-plus in Sillytavern

API Key Partial Responses

Prompt Tuning

Cohere Office Hours

Data Ingestion to KG: A user asked about frameworks used for extracting triples for data ingestion to a Knowledge Graph.
Command-r-plus not working: A user reported that command-r-plus in Sillytavern stopped working consistently when the context length reaches 4000 tokens.
API Key Partial Responses: A user reported experiencing issues with their API key returning only partial responses, even after trying different Wi-Fi routers and cellular data.
Prompt Tuning Still Borked: A user mentioned that prompt tuning is still not working correctly.
Cohere Office Hours: A reminder was given for the Cohere Office Hours event, which has already garnered 27 interested participants.

Cohere ▷ #announcements (1 message):

Cohere Developer Office Hours

Prompt Tuning

Guided Generations API

LLM University Tool Use Module

Structured Outputs

Cohere Developer Office Hours Kick-Off!: Join Cohere's Sr. Product Manager and DevRel for a casual session on product and content updates with best practices and Q&A on Prompt Tuning, Guided Generations API with Agents, and LLM University Tool Use Module.
- The event takes place today at 1 PM ET in the #stage channel and can be found at this link.
Cohere Prompt Tuner: Optimized Prompting!: Learn about the Cohere Prompt Tuner, a powerful tool to optimize prompts and improve the accuracy of your LLM results.
- The blog post details how to utilize this tool and the associated features.
Structured Outputs for Accurate JSON Generations: Structured Outputs, a recent update to Cohere's tools, delivers 80x faster and more accurate JSON generations than open-source implementations.
- This new feature improves the accuracy of JSON output and is discussed in this blog post.
Workflow Automation with the LLM University Module: The LLM University Tool Use Module simplifies workflow automation by leveraging the capabilities of Command R+.
- Learn how to automate tasks and workflows through this new module, discussed in this blog post.
Don't Miss Out on Cohere's Office Hours!: Don't miss this opportunity to learn from Cohere's experts and other builders from the server.
- Join the discussion and expand your knowledge about the latest updates in Cohere's tools.

Cohere ▷ #questions (43 messages🔥):

API key monitoring

production keys

Cohere chat

Trial keys

Structured output

Production API Keys and Monitoring: A member questioned whether obtaining a production API key would require them to monitor all LLM output for unspecified objectionable material.
Production Key for Cohere Chat: A member asked if a production key can be used on Cohere Chat.
Production Key Issues: A member reported receiving a [429] error when trying to use their production key on Cohere Chat.
Generating Structured JSON Output: A member asked about open-source implementations for guaranteed structured output.
Guidance for Structured Output: A member inquired about methods for generating structured JSON objects using an LLM.
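
For the structured JSON questions above, one simple fallback is to validate the model's reply and retry when it does not parse. This is a minimal sketch of that validate-and-retry idea, not guaranteed structured output via constrained decoding and not Cohere's API; `call_model` is a hypothetical callable standing in for whatever client is in use.

```python
import json
from typing import Callable


def generate_json(
    call_model: Callable[[str], str],  # hypothetical: prompt in, raw text out
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> dict:
    """Ask a model for JSON and retry until it parses and has the required keys."""
    last_error = "no attempts made"
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc}"
            continue
        if not isinstance(data, dict):
            last_error = "top-level value is not an object"
            continue
        missing = required_keys - set(data)
        if not missing:
            return data
        last_error = f"missing keys: {sorted(missing)}"
    raise ValueError(f"no valid JSON after {max_attempts} attempts ({last_error})")
```

Retrying only reduces the failure rate; it cannot guarantee a schema the way grammar-constrained generation can.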

Cohere ▷ #projects (1 messages):

CursorLens

Cohere models

CursorLens: An Analytics Tool for Prompts: CursorLens is a tool that provides analytics on your prompts and allows you to configure models not available through Cursor itself, such as Cohere models.
Cohere Models for Codebase Searches: Cohere models are thought to be effective for codebase searches and queries.
- The user believes that Cohere models can work very well for searches and queries that span a codebase.
CursorLens is Open Source: CursorLens is open source and available for anyone to try.
- The user encourages others to try CursorLens and contribute to the open source project.

Link mentioned: CursorLens - Open Source dashboard and analytics for Cursor IDE | Product Hunt: An open-source dashboard for Cursor.sh IDE. Log AI code generations, track usage, and control AI models (including local ones). Run locally or use upcoming hosted version.

Cohere ▷ #cohere-toolkit (2 messages):

Toolkit Bug Fixes

Python SDK Linting

Toolkit and Python SDK Bug Fixes & Linting: A member pushed bug fixes and linting improvements to the Cohere Toolkit and Python SDK.
- Another member expressed gratitude for the contribution.

Interconnects (Nathan Lambert) ▷ #news (12 messages🔥):

Yi Tay's Work Style

AI Regulation

01AI's future

Yi Tay is a tireless worker: The discussion centers around the work styles of various AI organizations, with one member suggesting that Yi Tay operates with a 'chaos no sleep grind' mentality.
Nancy Pelosi opposes California AI Bill: Speaker Emerita Nancy Pelosi issued a statement opposing California Senate Bill 1047 on AI regulation.
01AI's market strategy questioned: A member asks about 01AI's future market strategy due to a recent tweet suggesting a possible retreat from non-Chinese markets.

Interconnects (Nathan Lambert) ▷ #ml-drama (15 messages🔥):

Hermes 2.5

Mistral struggles

Model Merging

Open Empathic

Zicheng Xu Laid Off

Zicheng Xu Laid Off: Zeyuan Allen-Zhu announced that the author of the "Part 2.2" tutorial, Zicheng Xu, has been unexpectedly laid off.
- Allen-Zhu strongly endorses Xu and provided his email address for potential collaborators or employers: [email protected] (remove the capital 'B').
Nous Hermes Discord Drama: A user mentioned a discussion in the Nous Discord regarding a user's perceived rudeness and misrepresentation of evaluation settings.
- The user mentioned that their evaluation details were in the SFT section of the paper, and admitted that it doesn't feel good to get things wrong but the core of the article is still valid.
Meta Cooking (Model Harnessing): A user wondered what "meta cooking" is, suggesting a potential conflict or drama in the Nous Discord.
- The user mentioned finding contradictory information about evaluation settings, possibly due to the use of default LM Harness settings without clear documentation.
Evaluation is Hard, Focus on It: The user expressed that the experience of the Discord drama motivated them to write a fun post about evaluation.
- They acknowledge the difficulty of accurate and consistent evaluation, and consider it important to emphasize this aspect.

Link mentioned: Tweet from Zeyuan Allen-Zhu (@ZeyuanAllenZhu): (1/2) Many asked for Part 2.2 and I'm sorry for the delay. Our author Zicheng Xu has been unexpectedly laid off. He has my strongest endorsement (see next post). If interested in this project or h...

Interconnects (Nathan Lambert) ▷ #random (15 messages🔥):

AI21 Models

AI21 vs AI2

AI Bubble

Gary Marcus

AI Safety

AI21 models on LMSYS: New "toto" models on LMSYS are likely from AI21.
- This could be why AI2 now styles its name as Ai2, since "AI2" is easily confused with "AI21".
Gary Marcus Revisited AI Bubble Concerns: Gary Marcus revisited his keynote from AGI-21, noting that many of the issues he highlighted then are still relevant today despite significant advances in AI.
- The video, titled "The AI Bubble: Will It Burst, and What Comes After?" is available on YouTube.
Switching to AI Safety Career Trajectory: A user shared a blog post about switching their career trajectory to AI safety.
- They explained that puzzle writing took up too much headspace and they wanted to change things up this year.
Meta's GenAI Releases Tuning-Free Personalized Image Generation: Meta's GenAI has released a new research paper titled "Imagine Yourself: Tuning-Free Personalized Image Generation."
- The feature is available now as a beta in Meta AI for users in the US.

Interconnects (Nathan Lambert) ▷ #posts (45 messages🔥):

Procrastination

Blog Design

Substack

Fast Writing

Procrastination is a common problem: One member mentioned they've been procrastinating on getting their blog back up and running because they want to get the design just right, but they know it's a distraction.
- They also admitted to being a decently fast writer but find it easy to convince themselves not to write.
Substack is easy to use but difficult to customize: Another member mentioned they've battled Substack for hours trying to get the big wordart at the top of their blog.
- They also expressed the desire to have more control over the design of their blog, which is why they haven't used a platform like Substack.
FastHTML makes blogging easy and fun: A member mentioned that they built a blog site in one day using FastHTML.
- They found the experience to be pretty fun and enjoyable.

OpenAccess AI Collective (axolotl) ▷ #general (15 messages🔥):

GrokAdamW optimizer

GrokFast paper

Gemma 2B update

Transformers dev version

Unsloth

GrokAdamW optimizer released: GrokAdamW, a PyTorch optimizer that encourages fast grokking, was released and is working with Axolotl via the transformers integration. GrokAdamW repository
GrokAdamW inspired by GrokFast: The optimizer is inspired by the GrokFast paper, which aims to accelerate generalization of a model under the grokking phenomenon. GrokFast paper
Gemma 2B update causes Axolotl crash: An update to the Gemma 2B repo caused Axolotl to crash.
Reminder to use the dev version of Transformers: It's important to use the dev version of Transformers. Dev version installation
Finetuning Gemma 2, Llama 3.1, Mistral 2-5x faster with 70% less memory via Unsloth!: Unsloth enables finetuning Gemma 2, Llama 3.1, and Mistral 2-5x faster with 70% less memory using directly quantized 4bit models with bitsandbytes. Gemma 2 (2B) Google Colab notebook Gemma 2 (9B) Google Colab notebook

OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (20 messages🔥):

Gemma 2b training issues

Zero Loss

Eager Attention

Zero Loss during Gemma 2b Training: A user reported a consistent loss of 0.0, together with a NaN gradient norm, while training a Gemma 2b model.
Eager Attention Recommended for Gemma 2b Training: Another user recommended using eager attention instead of sdpa for training Gemma 2b models.
Eager Attention as the Fix: The user who was experiencing the zero loss issue confirmed that eager attention fixed the problem.
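
For readers loading the model directly with Hugging Face Transformers, the attention backend can be chosen at load time through the `attn_implementation` argument of `from_pretrained`. The sketch below assumes a recent Transformers release; the helper name and the model id in the usage comment are illustrative assumptions, and the source does not say how the setting was applied inside Axolotl.

```python
def load_with_eager_attention(model_id: str, attn_implementation: str = "eager"):
    """Load a causal LM with an explicit attention backend."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        attn_implementation=attn_implementation,  # "eager" instead of "sdpa"
    )


# Usage (downloads weights; the model id is an assumption, not from the thread):
# model = load_with_eager_attention("google/gemma-2-2b")
```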

OpenAccess AI Collective (axolotl) ▷ #general-help (17 messages🔥):

Chat Template

Axolotl prompt strategies

Using custom loaders

Training with ShareGPT

Fine-tuning with Axolotl

Chat Template for Axolotl: The user asked for clarification on using a Chat Template type in a .yml config file for Axolotl. They were specifically interested in specifying which loader to use, for example, ShareGPT.
Using a Custom Loader with Axolotl: Another user suggested that the user could specify which loader to use by providing a custom .yml file.
Axolotl's Chat Template Support: The user expressed interest in using the chat_template type in Axolotl and asked if it would support the role: system messages in their dataset.
Fine-tuning with Axolotl: No Coding Required: A user clarified that fine-tuning with Axolotl generally does not require coding knowledge, but rather understanding how to format datasets and adapt existing examples.
Llama 3.1 70B Fine-tuning: User Experience: A user mentioned owning a powerful AI rig to run Llama 3.1 70B but felt the model was still lacking in some key areas. They had a large dataset of content they had written and scraped and wanted to use it for fine-tuning.

Link mentioned: Allow using tokenizer's default chat template or pass custom jinja chat template by chiragjn · Pull Request #1732 · axolotl-ai-cloud/axolotl: Closes #1689 Summary of changes: Adds tokenizer_default as option for chat_template in chat_template prompt strategy that allows using the chat template from tokenizer's config.json Allows fa...

OpenAccess AI Collective (axolotl) ▷ #datasets (1 messages):

Llama 3.1 8B LoRA

Post-Hoc Reasoning

Sonnet 3.5

Claude

Llama 3.1 8B LoRA for post-hoc reasoning detection: A user is training a Llama 3.1 8B LoRA to detect post-hoc reasoning within a conversation.
- They spent three days curating a small dataset of fewer than 100 multi-turn conversations with around 30k tokens to help with the task.
Claude Sonnet 3.5 struggles with post-hoc reasoning examples: The user employed Sonnet 3.5 to help with generating examples, but had to fix multiple things in each generated example, despite careful prompt crafting.
- They had to iterate multiple times on each specific idea they wanted to convey in the dataset, manually editing each example to get the desired output.
Models are primed to do post-hoc reasoning: Even when instructing the models not to create examples with post-hoc reasoning, they still generated them due to their fine-tuning data.
- The user had to manually fix these issues, highlighting the difficulty in training models to avoid specific reasoning patterns.

LangChain AI ▷ #general (39 messages🔥):

LangChain Caching

LLM structured output

LangChain JSON parsing

RAG chatbot delete functionality

Hybrid search relevance

LangChain Caching Issues: A member asked why .batch_as_completed() isn't sped up by caching, even though .invoke() and .batch() are near instant after caching.
- They noticed that the cache is populated after the first run, but .batch_as_completed() doesn't seem to utilize the cache.
LLMs struggle with structured output: A member mentioned that local LLMs, like Llama 3.1, often have difficulty producing consistently structured output.
- They asked if there are any datasets specifically for training models to improve JSON parsing and structured output for use with tools or ReAct agents.
Deleting files in a RAG chatbot: A member asked about implementing a delete functionality for files in a RAG chatbot that uses MongoDB as a vector database.
- A helpful response provided examples of using the delete method from the LangChain library for both MongoDB vector stores and OpenAIFiles, along with relevant documentation links.
Hybrid Search Relevance Issues: A member described a RAG application using a hybrid search approach with BM25Retriever and vector similarity search, but they were experiencing issues with the relevance of retrieved documents and generated answers.
- Suggestions were offered to check the quality of documents, adjust retriever configurations, evaluate the chain setup, and review the prompt and LLM configuration.
Multilingual RAG workflow: A member discussed a multilingual RAG workflow involving translating user questions into English, retrieving relevant documents in English, and then formulating answers in the user's native language.
- The discussion included questions about the effectiveness of this approach compared to embedding documents in multiple languages, as well as whether multilingual embedding models allow for cross-language retrieval.
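
For the hybrid search item above, one common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is plain Python and does not use LangChain; the thread does not say how the member combined BM25Retriever results with vector results, so treat RRF here as a swapped-in illustration. The constant `k = 60` is a conventional default, and the document ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked highly in any list get a larger contribution.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]    # hypothetical keyword results
vector_hits = ["b", "c", "d"]  # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Documents that appear in both lists rise to the top, which is often a useful first check when relevance looks off.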

LangChain AI ▷ #langserve (1 messages):

ShortURL.at

URL Shortener

Social Media Links

ShortURL.at is a free URL shortener: ShortURL.at is a free tool to shorten URLs and generate short links, making it easy to share.
- The service offers premium features like custom short links, detailed analytics, API, UTM builder, QR codes, browser extension, app integrations and support.
ShortURL.at shortens links from various platforms: ShortURL.at can shorten long links from Instagram, Facebook, YouTube, Twitter, LinkedIn, WhatsApp, TikTok, blogs and sites.
- Just paste the long URL and click the Shorten URL button. On the next page, copy the shortened URL and share it on sites, chat and emails.

Link mentioned: ShortURL - URL Shortener: no description found

LangChain AI ▷ #langchain-templates (1 messages):

Steam Gift Card

ShortURL

Shortener

Steam Gift Card for Sale: A user is offering a $50 Steam gift card for sale and provides a shortened URL to purchase it.
ShortURL for URL Shortening: ShortURL is a free tool for shortening URLs and creating short links.
ShortURL Premium Features: ShortURL offers premium features that enhance the URL shortening experience.
ShortURL Compatible Platforms: ShortURL can shorten long links from various platforms like Instagram, Facebook, YouTube, Twitter, LinkedIn, WhatsApp, TikTok, blogs, and websites.

Link mentioned: ShortURL - URL Shortener: no description found

LangChain AI ▷ #share-your-work (4 messages):

CursorLens

LLMs

Machine Learning from Scratch

CursorLens: New Dashboard for Cursor Users: CursorLens is an open-source dashboard for Cursor users that provides analytics on your prompts and allows you to configure models not available through Cursor itself.
- It was recently launched on ProductHunt: https://www.producthunt.com/posts/cursor-lens.
LLMs Explained: From Assistant to Deep Concepts: This blog post dives into the workings of LLMs, starting with high-level abstractions and gradually delving into concepts like tokenization, sampling, and embedding.
- It also discusses limitations of current LLMs, such as their inability to count Rs in "strawberry" and reverse the string "copenhagen." Find the blog post here: https://amgadhasan.substack.com/p/explaining-how-llms-work-in-7-levels.
Machine Learning from Scratch: Beginner-Friendly Guide: This GitHub repository provides a step-by-step guide to learning machine learning from scratch, assuming no prior knowledge.
- It covers core machine learning algorithms and neural networks, explaining the underlying math with practical examples, including gradient descent and backpropagation. Find the repository here: https://github.com/DorsaRoh/Machine-Learning.

LangChain AI ▷ #tutorials (1 messages):

URL Shortener

ShortURL

ShortURL: A Free URL Shortener: ShortURL is a free tool to shorten URLs and generate short links, making it easy to share.
- Just paste the long URL and click the Shorten URL button. On the next page, copy the shortened URL and share it on sites, chat and emails.
ShortURL Premium Features: Premium features include custom short links, powerful dashboard, detailed analytics, API, UTM builder, QR codes, browser extension, app integrations and support.
- You can create an account for premium features here: Create Account
ShortURL for Various Platforms: ShortURL can shorten long links from various platforms like Instagram, Facebook, YouTube, Twitter, LinkedIn, WhatsApp, TikTok, blogs and sites.

Link mentioned: ShortURL - URL Shortener: no description found

OpenInterpreter ▷ #general (37 messages🔥):

Orange Pi 5

GPT-4o-mini

OpenInterpreter settings

OpenInterpreter API

Local LLMs for bash commands

Orange Pi 5 Review: A member posted a YouTube video review of the Orange Pi 5, which is a new affordable yet powerful Arm-based SBC.
- The video states that the Orange Pi 5 is not to be confused with the Raspberry Pi 5.
GPT-4o-mini model woes: A user expressed difficulty in setting their model to GPT-4o-mini using the set model command.
- Another member quickly provided a solution: interpreter --model gpt-4o-mini.
OpenInterpreter Settings Reset: A user encountered issues after experimenting with OpenInterpreter settings and sought a way to revert or reset to default.
- Another member recommended using the command interpreter --profiles to view and edit profiles, as well as uninstalling and reinstalling OpenInterpreter using pip uninstall open-interpreter and pip install open-interpreter.
OpenInterpreter API Integration: A user expressed interest in integrating OpenInterpreter into their existing AI core by sending requests to OI, running code, and receiving the output.
- The user was advised to use a Python script, potentially with a Flask server, to handle the communication between their AI core and OpenInterpreter.
Local LLMs for Bash Commands: A member asked for recommendations on local LLMs that are adept at handling bash commands.
- Another member suggested Codestral and Llama 3.1.

OpenInterpreter ▷ #O1 (2 messages):

OpenInterpreter device release timeline

OpenInterpreter device release timeline: Still up in the air: A user inquired about the device's release timeline, specifically if it's expected to ship this year.
- No concrete timeframe was provided, so it remains unclear whether the device will ship this year or later.
OpenInterpreter device availability for purchase: A separate user inquired about the device's availability for purchase.
- No information was provided regarding whether the device is currently available for purchase.

OpenInterpreter ▷ #ai-content (4 messages):

OpenInterpreter for VSCode edits

Terminal Stuck

OpenInterpreter for VSCode Edits: A member asked if anyone has tried using OpenInterpreter to do VSCode edits, specifically going to line 300 and changing the variable x_alpha to camelCase.
- Another member replied that they haven't tried it.
Terminal Stuck with OpenInterpreter: The first member mentioned that OpenInterpreter worked for them last time, but the terminal got stuck partway through.
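
For reference, the edit described above (go to a given line and convert a snake_case variable such as x_alpha to camelCase) is small enough to express in plain Python. This is only a sketch of the text transformation and does not involve OpenInterpreter or VSCode; the function names are hypothetical.

```python
import re


def snake_to_camel(name: str) -> str:
    """Convert a snake_case identifier to camelCase, e.g. x_alpha -> xAlpha."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def rename_on_line(source: str, line_no: int, old: str) -> str:
    """Rename whole-word occurrences of `old` on one 1-indexed line only."""
    lines = source.split("\n")
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    lines[line_no - 1] = pattern.sub(snake_to_camel(old), lines[line_no - 1])
    return "\n".join(lines)
```

Renaming on a single line leaves every other use of the variable untouched, so a real refactor would need to cover all references.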

Link mentioned: Exists - Games from Text, Just Like That: Text-to-Game AI creation platform that let anyone create unique multiplayer games in moments.Join our discord for the closed beta:https://discord.com/invite/...

DSPy ▷ #show-and-tell (9 messages🔥):

LLMs

RAG

Knowledge Graphs

WeKnow-RAG

Meta Optimization

LLMs struggle with reliability: Large Language Models (LLMs) are prone to producing factually incorrect information and often produce "phantom" content that undermines their reliability.
WeKnow-RAG improves LLM reliability: The WeKnow-RAG system integrates web search and Knowledge Graphs into a Retrieval-Augmented Generation (RAG) system to enhance LLM accuracy and reliability.
A Meta Optimizer for Workflow Optimization: A user shared that a recently published paper implements ideas similar to their own ongoing work in the area of meta-optimization.
ARC Logic Puzzles: A Test of AI Intelligence: A user shared a link to a paper which evaluates a new algorithm on the ARC Logic Puzzle task, which assesses the general intelligence of AI systems.

DSPy ▷ #general (25 messages🔥):

DSPy 2.5 & 3.0 Roadmap

Langgraph & Routequery Error

Optimizing Expert-Engineered Prompts

DSPy & API Integration

DSPy Roadmap Unveiled!: The DSPy Roadmap sketch for DSPy 2.5 (likely in 1-2 weeks) and DSPy 3.0 (in a few months) has been announced.
- The roadmap outlines objectives, milestones, and efforts, and welcomes input and contributions from the community. Link to DSPy Roadmap
Langgraph and Routequery Class Error: A member encountered an error with the routequery class in Langgraph.
- They requested guidance on integrating DSPy with a large set of tools and shared a link to the Langgraph implementation: Adaptive RAG.
Optimizing Expert-Engineered Prompts: A member asked if DSPy can optimize prompts that have already been manually engineered by expert developers.
- They inquired if DSPy is effective not only for optimizing initial drafts but also for improving well-established prompting systems.
DSPy and API Integration: A member asked if they can use DSPy with an API from AI/ML.ai.
- They inquired about how to establish a connection between DSPy and the API.

DSPy ▷ #examples (1 messages):

batmanosama: I updated it thanks for pointing that out

DSPy ▷ #colbert (4 messages):

Colpali finetuning

VLM tuning

Domain expertise

Colpali data

Finetuning Colpali: A question arose regarding the approach to finetuning Colpali, a model seemingly requiring specialized expertise due to its domain-specific nature.
Data Needs for Colpali Fine-Tuning: A key discussion point centered around the type of data needed for effectively finetuning Colpali.

LAION ▷ #general (25 messages🔥):

FLUX Dev

LLM for medical assistance

Medical LLMs

LoRA Training

FLUX Dev can create 3x3 photo grids: A user shared that FLUX Dev can generate 3x3 photo grids of the same (fictional) person.
Training LoRAs for specific purposes: A user expressed interest in training LoRAs for specific purposes like dabbing, middle finger, and 30s cartoon.
LLMs for medical assistance are not yet reliable: Several users expressed skepticism about using LLMs for medical assistance in their current state.
Turning a FLUX Dev LoRA into FP8: A user asked if they could convert their FLUX Dev LoRA into FP8, or use an FP8 LoRA trainer on Replicate.

LAION ▷ #research (12 messages🔥):

JPEG-LM

Image/Video Generation with LLMs

Autoregressive LLMs

SIREN

Neural Graphics Primitives

JPEG-LM: A Novel Approach to Image Generation: A new research paper proposes modeling images and videos as compressed files using canonical codecs (e.g., JPEG, AVC/H.264) within an autoregressive LLM architecture.
- This approach eliminates the need for raw pixel value modeling or vector quantization, making the process more efficient.
JPEG-LM vs. SIREN: A Battle of the Titans?: A user playfully claims to have outperformed the SIREN architecture from 2020 with a 33kB complex-valued neural network, despite acknowledging that NVIDIA's Neural Graphics Primitives paper from 2022 has significantly advanced the field.
- The user highlights the importance of using MS-SSIM as a metric for image quality assessment, as opposed to just MSE and MAE.
7B Parameters for Low-Quality Generations?: The discussion acknowledges that utilizing 7B parameters for such low-quality image generation might be considered excessive.
- However, the novelty and potential of this approach is still appreciated, opening new doors for future research.
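
To make the JPEG-LM idea above concrete: if an image is treated as the bytes of its compressed file, it becomes a plain token sequence that an autoregressive model can be trained on. The sketch below uses one token per byte plus two special tokens, which is an assumption for illustration; the summary does not describe the paper's actual tokenization, and the sample bytes are not a decodable image.

```python
BOS, EOS = 256, 257  # hypothetical special token ids placed after the 256 byte values


def file_bytes_to_tokens(data: bytes) -> list[int]:
    """Wrap the raw bytes of a compressed file as a token sequence."""
    return [BOS, *data, EOS]


def tokens_to_file_bytes(tokens: list[int]) -> bytes:
    """Drop special tokens and recover the original file bytes."""
    return bytes(t for t in tokens if t < 256)


# A JPEG file starts with FF D8 and ends with FF D9; the middle bytes are placeholders.
sample = bytes([0xFF, 0xD8, 0x00, 0x10, 0xFF, 0xD9])
tokens = file_bytes_to_tokens(sample)
```

Because the mapping is lossless, any sequence the model generates can be written straight back to disk and opened with a standard decoder, provided it forms a valid file.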

Link mentioned: JPEG-LM: LLMs as Image Generators with Canonical Codec Representations: Recent work in image and video generation has been adopting the autoregressive LLM architecture due to its generality and potentially easy integration into multi-modal systems. The crux of applying au...

LlamaIndex ▷ #blog (5 messages):

Workflows

RAG

Agents

BeyondLLM

JSONalyze Query Engine

Workflows in Action: A video by Rajib Deb showcases workflows featuring decorators, types for control flow, event-driven process chaining, and custom events and steps for complex tasks.
- The video delves into the key features of workflows, demonstrating how they enable building sophisticated applications with a more structured approach.
RAG & Agent Templates: Reference implementations of 3 RAG and agent papers are provided, offering a kickstart for building applications from scratch or using pre-built templates.
- These templates, utilizing the LlamaIndex framework, emphasize event-driven techniques for advanced RAG and agent applications.
Agentic RAG with Claude 3.5: A tutorial by Richmond Alake guides users on building an agentic knowledge assistant using Claude 3.5, MongoDB, and LlamaIndex.
- The tutorial highlights building an agentic knowledge assistant over a pre-existing RAG pipeline, utilizing tool selection, task decomposition, and advanced RAG techniques.
BeyondLLM for Advanced RAG: BeyondLLM, developed by AIPlanetHub, provides abstractions on top of LlamaIndex, enabling users to build advanced RAG pipelines with features like evaluation, observability, and advanced RAG capabilities in just 5-7 lines of code.
- These advanced RAG features include query rewriting, vector search, and document summarization, streamlining the development of sophisticated RAG applications.
JSONalyze Query Engine as Workflow: RavitheJads reconstructs the JSONalyze Query Engine as a workflow, showcasing the step-by-step process of converting a JSON API response into a SQLite table and queries into SQL.
- This workflow demonstration highlights the versatility of workflows, enabling efficient data manipulation and transformation using a structured, modular approach.
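
The JSON-to-SQLite step described in the JSONalyze item above can be sketched with the Python standard library alone. This is not the LlamaIndex implementation; the table name, sample records, and the hand-written SQL are assumptions, and in the workflow described the SQL would be produced from the user's question rather than typed in.

```python
import json
import sqlite3


def json_rows_to_sqlite(rows: list[dict], table: str = "items") -> sqlite3.Connection:
    """Load a list of flat JSON objects into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    # Use the union of keys as columns so rows with missing fields still fit.
    columns = sorted({key for row in rows for key in row})
    col_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    return conn


# Hypothetical API response and query.
api_response = json.loads('[{"name": "Ada", "age": 36}, {"name": "Linus", "age": 54}]')
conn = json_rows_to_sqlite(api_response)
names = [row[0] for row in conn.execute("SELECT name FROM items WHERE age > 40")]
```

Once the data sits in a table, questions become ordinary SQL, which tends to be more precise for aggregates and filters than similarity search over text chunks.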

Link mentioned: no title found: no description found

LlamaIndex ▷ #general (27 messages🔥):

Web Scrapers for LlamaIndex

RouterQueryEngine vs Agents

LlamaIndex Workflow

Batching APIs

LlamaIndex CSV Analysis

Web Scraper Recommendations for LlamaIndex: A member asked for recommendations for web scrapers that work well with the LlamaIndex stack.
- Another member recommended FireCrawl, and shared a YouTube video showing a more complex implementation of a LlamaIndex workflow.
RouterQueryEngine vs Agents in LlamaIndex: A member inquired about the difference between the RouterQueryEngine and Agents in LlamaIndex, particularly in relation to routing and function calling.
- Another member explained that the RouterQueryEngine acts like a hardcoded agent, while Agents are more flexible and general.
Batching APIs for Open-Source Models: A member discussed how major companies like OpenAI and Google have launched batching APIs for their models, but these APIs lack processing guarantees, SLAs, and retries.
- They shared a blog post on how to get a batching API like OpenAI for open-source models.
LlamaIndex CSV Analysis Limitations: A member encountered difficulties analyzing a CSV file using LlamaIndex due to inaccurate results.
- Another member explained that CSVs are not well-suited for vector indexes and suggested using a database or a Pandas query engine for better results.
Storing DocumentSummaryIndex in Neo4j: A member inquired about storing DocumentSummaryIndex in Neo4j, which they already use for PropertyGraphIndex.
- Another member responded that while Neo4j can be used as a vector store, it's not suitable for general key-value storage, making storing DocumentSummaryIndex in Neo4j challenging.

LlamaIndex ▷ #ai-discussion (2 messages):

LLMs

LLM Limitations

LLMs as Assistants

Tokenization

Sampling

LLMs as Personal Assistants: LLMs are AI-powered assistants that can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.
- They are not limited to a specific task, but rather can adapt to various inputs and prompts. Think of them as a flexible tool that can be used for a wide range of applications.
LLMs: A Deep Dive: The blog post starts with high-level abstractions, viewing LLMs as personal assistants, then dives deeper into key concepts like tokenization, sampling, and embedding.
- This approach is designed to make the complex world of LLMs more accessible to a wider audience.
LLM Capabilities and Limitations: The blog post acknowledges that LLMs are still under development and have limitations, such as failing to count the Rs in "strawberry" or to reverse the string "copenhagen."
- This honest assessment helps readers understand the current state of LLM technology and the areas where further research is needed.
Knowledge Graphs: A Powerful Tool: Knowledge graphs provide a structured and intuitive way to capture the complex relationships hidden within data.
- This approach allows for better organization and understanding of information, enabling the development of truly intelligent applications.
Combining Knowledge Graphs and Generative AI: The blog post explores the potential of combining knowledge graphs with generative AI to create powerful intelligent applications.
- This synergy leverages the strengths of both technologies to unlock new possibilities and advance the field of AI.

LLM Finetuning (Hamel + Dan) ▷ #general (5 messages):

LLM Hosting

HF Spaces

Modal

Jarvis Labs

vLLM

HF Spaces limitations: A member expressed difficulty hosting their own LLM using HF Spaces, citing that ZeroGPU does not support vLLM.
Modal and FastHTML: Another member noted that they have used Modal for hosting LLMs, but are currently trying to use FastHTML and are looking for a setup guide.
Jarvis Labs for Fine-tuning: The member mentioned having only used Jarvis Labs for fine-tuning LLMs.

Alignment Lab AI ▷ #general (1 messages):

Batching APIs

OpenAI

CuminAI

Small Language Models (SLMs)

Large Language Models (LLMs)

OpenAI and Google launch cheaper batching APIs: OpenAI and Google recently introduced batching APIs for some of their models, offering a 50% cost reduction compared to regular requests.
- However, these APIs currently lack processing guarantees, service level agreements (SLAs), and retries.
CuminAI: Batching APIs for Open-Source Models: CuminAI provides a solution for creating batching APIs for open-source models, similar to those offered by OpenAI.
- Check out their step-by-step guide on "How to Get a Batching API Like OpenAI for Open-Source Models" here.
Small Language Models: The New Superheroes of AI: A recent blog post from CuminAI highlights the potential of Small Language Models (SLMs), arguing that "bigger isn't always better" in the world of AI.
- While Large Language Models (LLMs) have dominated the field, SLMs offer a more cost-effective and efficient alternative, especially for tasks that don't require extensive computational power.
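
Since the batching APIs mentioned above reportedly come without retries, callers may want to add them on the client side. Below is a minimal sketch of retrying with exponential backoff; `task` is a hypothetical zero-argument callable that submits or polls a batch, and the delays are illustrative defaults rather than recommendations from any provider.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> T:
    """Run `task`, retrying on any exception with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")
```

In practice it is usually worth retrying only on transient errors such as rate limits or timeouts, rather than on every exception as this sketch does.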

Mozilla AI ▷ #announcements (1 messages):

Llamafile update

Mozilla AI Community at Rise25

ML Paper Talks

Llamafile update: Speech to Text, Image Gen, Performance Boost: Llamafile has released exciting new features, including Speech to Text Commands, Image Generation, and a 3x Performance Boost for its HTTP server embeddings.
- You can find the full update here from Justine.
Mozilla AI Community at Rise25: Mozilla AI is celebrating community members who are shaping a future where AI is responsible, trustworthy, inclusive, and centered around human dignity.
- Several members attended the event, including <@631210549170012166>, <@1046834222922465314>, <@200272755520700416>, and <@1083203408367984751>.
ML Paper Talks: Communicative Agents & Extended Mind Transformers: Join an insightful session with host <@718891366402490439> on cutting-edge Machine Learning research, featuring discussions on Communicative Agents and Extended Mind Transformers.
- RSVP for these thought-provoking discussions and deep dives with authors <@878366123458977893> and <@985920344856596490>, respectively: Communicative Agents and Extended Mind Transformers.

{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}

AI Twitter Recap

AI Reddit Recap

/r/LocalLlama Recap

All AI Reddit Recap

AI Discord Recap

PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord

HuggingFace Discord

Unsloth AI (Daniel Han) Discord

Nous Research AI Discord

Perplexity AI Discord

OpenRouter (Alex Atallah) Discord

LM Studio Discord

OpenAI Discord

Latent Space Discord

Cohere Discord

Interconnects (Nathan Lambert) Discord

OpenAccess AI Collective (axolotl) Discord

LangChain AI Discord

OpenInterpreter Discord

DSPy Discord

LAION Discord

LlamaIndex Discord

LLM Finetuning (Hamel + Dan) Discord

Alignment Lab AI Discord

Mozilla AI Discord

PART 2: Detailed by-Channel summaries and links

Stability.ai (Stable Diffusion) ▷ #general-chat (567 messages🔥🔥🔥):

HuggingFace ▷ #general (449 messages🔥🔥🔥):

HuggingFace ▷ #today-im-learning (4 messages):

HuggingFace ▷ #cool-finds (3 messages):

HuggingFace ▷ #i-made-this (18 messages🔥):

HuggingFace ▷ #reading-group (35 messages🔥):

HuggingFace ▷ #computer-vision (4 messages):

HuggingFace ▷ #NLP (10 messages🔥):

HuggingFace ▷ #diffusion-discussions (12 messages🔥):

Unsloth AI (Daniel Han) ▷ #general (242 messages🔥🔥):

Unsloth AI (Daniel Han) ▷ #off-topic (44 messages🔥):

Unsloth AI (Daniel Han) ▷ #help (209 messages🔥🔥):

Unsloth AI (Daniel Han) ▷ #showcase (6 messages):

Unsloth AI (Daniel Han) ▷ #research (15 messages🔥):

Nous Research AI ▷ #research-papers (1 message):

Nous Research AI ▷ #off-topic (20 messages🔥):

Nous Research AI ▷ #interesting-links (6 messages):

Nous Research AI ▷ #general (356 messages🔥🔥):

Nous Research AI ▷ #ask-about-llms (47 messages🔥):

Nous Research AI ▷ #rag-dataset (2 messages):

Nous Research AI ▷ #reasoning-tasks-master-list (25 messages🔥):

Perplexity AI ▷ #general (251 messages🔥🔥):

Perplexity AI ▷ #sharing (26 messages🔥):

Perplexity AI ▷ #pplx-api (5 messages):

OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

OpenRouter (Alex Atallah) ▷ #general (240 messages🔥🔥):

SearchGPT waitlist full: Users shared that they received waitlist denial emails for OpenAI's SearchGPT, indicating that OpenAI has run out of spots.

Free Hermes 405B Overload: A user joked that they hope the free Hermes 405B model will meet the same fate as other models that became inaccessible after being overloaded by their own popularity.

LM Studio ▷ #general (109 messages🔥🔥):

LM Studio ▷ #hardware-discussion (45 messages🔥):

OpenAI ▷ #ai-discussions (107 messages🔥🔥):

OpenAI ▷ #gpt-4-discussions (26 messages🔥):

OpenAI ▷ #prompt-engineering (7 messages):

OpenAI ▷ #api-discussions (7 messages):

Latent Space ▷ #ai-general-chat (27 messages🔥):

Latent Space ▷ #ai-in-action-club (78 messages🔥🔥):

Cohere ▷ #discussions (49 messages🔥):

Cohere ▷ #announcements (1 message):

Cohere ▷ #questions (43 messages🔥):

Cohere ▷ #projects (1 message):

Cohere ▷ #cohere-toolkit (2 messages):

Interconnects (Nathan Lambert) ▷ #news (12 messages🔥):

Interconnects (Nathan Lambert) ▷ #ml-drama (15 messages🔥):

Interconnects (Nathan Lambert) ▷ #random (15 messages🔥):

Interconnects (Nathan Lambert) ▷ #posts (45 messages🔥):

OpenAccess AI Collective (axolotl) ▷ #general (15 messages🔥):

OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (20 messages🔥):

OpenAccess AI Collective (axolotl) ▷ #general-help (17 messages🔥):

OpenAccess AI Collective (axolotl) ▷ #datasets (1 message):

LangChain AI ▷ #general (39 messages🔥):

LangChain AI ▷ #langserve (1 message):

LangChain AI ▷ #langchain-templates (1 message):

LangChain AI ▷ #share-your-work (4 messages):