AI News for 9/19/2024-9/20/2024. We checked 7 subreddits, 433 Twitters and 30 Discords (221 channels, and 2035 messages) for you. Estimated reading time saved (at 200wpm): 258 minutes. You can now tag @smol_ai for AINews discussions!

Anthropic wrote about Contextrual Retrieval, a RAG technique that takes advantage of their prompt caching feature, showing that Reranked Contextual Embedding and Contextual BM25 reduced the top-20-chunk retrieval failure rate by 67% (5.7% → 1.9%):

However this is just a RAG technique so we didnt feel it was title story worthy.

Team Meta is heavily teasing multimodal Llama 3 at next week's Meta Connect, but we can't make it the headline story until it's out.

Meanwhile, if you've been itching to get your own personal AINews or kick us some inference money, you can now sign up for our "AINews Plus" service and have your own customized AI News service on any topic of your choice!

https://youtu.be/iDCUYZgnAjY

See you at the LLM as Judge Hackathon this weekend if you are in SF!

{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}

AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Research and Development

OpenAI's o1 Models: @polynoamial announced that OpenAI is hiring ML engineers for a new multi-agent research team, viewing multi-agent as a path to better AI reasoning. @scottastevenson noted that o1 models are causing confusion and skepticism among technologists, similar to early responses to GPT-3 and ChatGPT. @nptacek observed that o1 feels different in terms of prompting, requiring more goal-oriented rather than instruction-driven approaches.
AI Model Developments: @rohanpaul_ai compared DeepSeek 2.5 to GPT-4, noting it's 21X cheaper than Claude 3.5 sonnet and 17X cheaper than GPT-4. @_akhaliq shared information about 3DTopia-XL, a high-quality 3D PBR asset generation model using Diffusion Transformer. @multimodalart highlighted CogVideoX's capabilities for image-to-video conversion, especially for timelapse videos.
AI Research Insights: @rohanpaul_ai discussed a powerful but simple prompting technique that asks LLMs to re-read questions, significantly boosting reasoning across diverse tasks and model types. @alexalbert__ shared research on Contextual Retrieval, a technique that reduces incorrect chunk retrieval rates by up to 67% when combined with prompt caching.

AI Industry and Applications

AI in Enterprise: @svpino stated that companies not allowing their developers to use AI are unlikely to succeed. @scottastevenson noted how LLMs have disrupted traditional ML businesses, with deep moats evaporating in months.
AI Tools and Platforms: @LangChainAI announced LangGraph Templates, a collection of reference architectures for creating agentic applications. @llama_index introduced LlamaParse Premium, combining visual understanding capabilities of multimodal models with long text/table content extraction.
AI in Production: @HamelHusain shared advice on using LLM evals to improve AI products, demonstrating how to create a data flywheel to move from demo to production-ready products. @svpino discussed the importance and challenges of caching in LLM applications for improved speed and cost-efficiency.

AI Ethics and Regulation

@ylecun discussed the political leanings of scientists studying misinformation, noting that scientists generally lean left due to their focus on facts and the current prevalence of misinformation from the right. @ylecun also shared an open letter signed by industry leaders urging the EU to harmonize AI regulations to prevent the region from becoming a technological backwater.
@fchollet clarified that the ARC-AGI benchmark was not designed specifically to trip up LLMs, but rather to highlight the limitations of deep learning, which LLMs share as part of the same paradigm.

Memes and Humor

Various tweets showcased humorous AI-generated content, including @nearcyan wrapping their entire Twitter app for "talk like a pirate day" and @jxnlco sharing an amusing AI-generated image.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Llama 3 Multimodal: Meta's Next Big AI Release

"Meta's Llama has become the dominant platform for building AI products. The next release will be multimodal and understand visual information." (Score: 74, Comments: 21): Yann LeCun announced on LinkedIn that Meta's Llama 3 will be a multimodal model with visual understanding capabilities. This next release of Llama is positioned to further solidify Meta's dominance in the AI product development landscape.
Zuck is teasing llama multimodal over on IG. (Score: 164, Comments: 42): Mark Zuckerberg has hinted at multimodal capabilities for Llama on Instagram. This development is expected to be officially unveiled at Meta Connect, which is scheduled for next week.
- Llama.cpp developers' lack of support for multimodal models and tool calling has disappointed users. The team focuses on barebones efficiency and CPU/CPU+GPU inference, leaving multimodal implementation to be redone from scratch.
- Users debate the performance of Llama.cpp versus other backends like TabbyAPI, ExllamaV2, and KTransformers. Some argue for potential improvements in Llama.cpp's GPU performance through better optimization, speculative decoding, and tensor parallelism.
- The community expresses frustration over Llama.cpp's lack of support for Meta's Chameleon model, despite a Meta developer offering assistance. A pull request implementing support was not merged, leading to disappointment among contributors.

Theme 2. Qwen2.5 32B: Impressive Performance in GGUF Quantization

Qwen2.5 32B GGUF evaluation results (Score: 78, Comments: 38): The evaluation of Qwen2.5 32B GGUF models shows strong performance in the computer science category of MMLU PRO, with the Q4_K_L quantization (20.43GB) scoring 72.93 and the Q3_K_S quantization (14.39GB) achieving 70.73, representing only a 3.01% performance loss. Both Qwen2.5 32B quantizations significantly outperformed the Gemma2-27b-it-q8_0 model (29GB), which scored 58.05 in the same category.
- Qwen2.5 32B quantizations show impressive performance, with users noting significant improvements in certain areas despite potential drawbacks in world knowledge and censorship.
- Users suggest testing IQ variant quants, considered SOTA for under 4-bit and typically superior to older Q_K type quants. Interest in comparing 72B IQ3_XXS (31.85GB) and IQ2_XXS (25.49GB) versions for 24GB VRAM users.
- Discussion around official Qwen/Qwen2.5 GGUF files on Hugging Face, with a caution that official quants often underperform compared to community-created versions.
Qwen 2.5 on Phone: added 1.5B and 3B quantized versions to PocketPal (Score: 74, Comments: 34): Qwen 2.5 models, including 1.5B (Q8) and 3B (Q5_0) versions, have been added to the PocketPal mobile AI app for both iOS and Android platforms. Users can provide feedback or report issues through the project's GitHub repository, with the developer promising to address concerns as time permits.
- Users expressed interest in adding speech-to-text functionality and modifying the system prompt. The developer confirmed that most settings are customizable and shared a screenshot of the available options.
- A user inquired about context size settings, leading to a discussion on the distinction between context length and generation time parameters. The developer explained the rationale behind their placement and added the issue to the GitHub repository.
- The app supports various chat templates (ChatML, Llama, Gemma) and models, with users comparing performance of Qwen 2.5 3B (Q5), Gemma 2 2B (Q6), and Danube 3. The developer provided [screenshots](https://preview.redd.it/130oisgjvspd1.png?width=1290&format=png&auto=webp&s=9890aa96eec037b33f6849e9

Theme 3. EU AI Regulation: Balancing Innovation and Control

Open Letter from Ericsson, coordinate by Meta, about fragmented regulation in Europe hindering AI opportunities (Score: 87, Comments: 16): Ericsson CEO Börje Ekholm warns that fragmented EU regulations are hindering AI development in Europe, potentially depriving Europeans of technological advances enjoyed elsewhere. The letter emphasizes that open models enhance sovereignty and control, and estimates suggest Generative AI could increase global GDP by 10% over the coming decade, urging for clear, consistent rules to enable the use of European data for AI training. Read the full open letter here.
- Commenters debate the impact of EU regulations on AI innovation, with some suggesting it could lead to Europe's dependence on USA for future AI technologies. Others argue for a common framework similar to GDPR to clarify rules across Europe and facilitate investments.
- Discussion centers on the scope of AI regulation, with suggestions to focus on banning "1984 things" like surveillance and discrimination rather than regulating models themselves. The OP clarifies that the issue is about regulating data used for AI training, not AI usage.
- A link to euneedsai.com is shared, potentially offering additional context on Europe's AI needs and regulatory landscape.
Quick Reminder: SB 1047 hasn't been signed into law yet, if you live in California send a note to the governor (Score: 209, Comments: 57): California's SB 1047, an AI "safety" bill inspired by the Terminator, has passed but not yet been signed into law. The post urges California residents to voice their objections to the governor through the official contact page, selecting "An Active Bill" and "SB 1047" as the topic, and choosing "Con" as their stance.
- Critics argue SB 1047 is a regulatory capture bill, potentially hindering open research while benefiting corporations conducting closed, profit-driven research without safety checks. Some believe the bill may be unconstitutional, though others suggest it's likely legal.
- Commenters emphasize the importance of open-source AI for research, general use, and long-term safety through collaborative development. They suggest mentioning location, voter status, and personal stories of open-source AI benefits when contacting officials.
- Concerns about China's AI advancement were raised as a reason to oppose regulation. A dedicated website, stopsb1047.com, was shared for submitting comments against the bill, with some users reporting sending detailed responses.

Theme 4. Mistral Small 2409 22B: Quantization Impact Analysis

Mistral Small 2409 22B GGUF quantization Evaluation results (Score: 106, Comments: 25): The post presents quantization evaluation results for the Mistral Small Instruct 2409 22B model, focusing on the computer science category of the MMLU PRO benchmark. Various quantization levels were tested, with the Q4_K_L variant surprisingly outperforming others at 60.00% accuracy, while model sizes ranged from 9.64GB to 18.35GB. The author also included comparison results for Qwen2.5-32B and Gemma2-27b models, and provided links to the GGUF models, backend, evaluation tool, and configuration used.
- Q4_K_L quantization outperforming Q5_K_L sparked discussion, with users speculating on random chance or layer differences. Tests were run at 0 temperature, with Q4_K_L achieving 60.20% accuracy (245/407 questions).
- Qwen2.5-32B performance was praised. Users requested comparisons with Mistral Nemo 12B, which the author confirmed was evaluated and would be posted later.
- Discussions touched on quantization effects, with anecdotal reports of 5-bit quantizations performing worse than 4-bit for some models. A user's test suggested Q4 variants might be "smarter" than Q6 in certain scenarios.

Theme 5. AI Model Size Debate: Efficiency vs. Capability

Hot Take: Llama3 405B is probably just too big (Score: 104, Comments: 94): Llama3.1-405B, initially leading open models, is now considered too large for practical use compared to more efficient models like Mistral Large (~120B). The post argues that 27-35B and 120B models will become industry standards, with companies deploying off-the-shelf 120B models first, then fine-tuning 30B models to reduce costs by over 50%. While acknowledging Meta AI's contribution, the author emphasizes the need for more 100B+ models, which are cheaper to train, fine-tune, and host than larger counterparts.
- Industry standards for AI models are debated, with some arguing that companies will use whatever works best regardless of size. The 405B model is seen as useful for research, distillation, and in-house use by large organizations concerned with data privacy.
- Larger models like Llama 405B are viewed as important for pushing boundaries and competing with rumored 1.7T parameter models like GPT-4. Some users argue that creating SOTA models is valuable for research and gathering training data.
- Practical applications of large models are discussed, with some users reporting daily use of the 405B model through API for better responses. There's interest in tutorials for fine-tuning 70B+ models without excessive cost or complexity.

Other AI Subreddit Recap

r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity

AI Research and Techniques

Google Deepmind advances multimodal learning with joint example selection: A Google Deepmind paper demonstrates how data curation via joint example selection can further accelerate multimodal learning. This technique could improve efficiency in training large multimodal models.
Microsoft's MInference dramatically speeds up long-context task inference: Microsoft's MInference technique enables inference of up to millions of tokens for long-context tasks while maintaining accuracy, dramatically speeding up supported models. This could allow for much more efficient processing of very long documents or conversations.
Scaling synthetic data creation using 1 billion web-curated personas: A paper on scaling synthetic data creation leverages the diverse perspectives within a large language model to generate data from 1 billion personas curated from web data. This approach could help create more diverse and representative training datasets.

AI Model Releases and Improvements

Salesforce's "tiny giant" xLAM-1b model surpasses GPT 3.5 in function calling: Salesforce released xLAM-1b, a 1 billion parameter model that achieves 70% accuracy in function calling, surpassing GPT 3.5. This demonstrates significant improvements in smaller, more efficient models.
Phi-3 Mini (June) with function calling: Rubra AI released an updated Phi-3 Mini model in June with function calling capabilities. It is competitive with Mistral-7b v3 and outperforms the base Phi-3 Mini, showing rapid progress in smaller open-source models.
OmniGen multimodal model: A new research paper describes OmniGen, a multimodal model with a built-in LLM and vision model that provides unprecedented control through prompting. It can manipulate images based on text instructions without needing specialized training.

AI Development and Industry Trends

OpenAI's funding round closing with high demand: OpenAI's latest funding round is closing with such high demand that they've had to turn down "billions of dollars" in surplus offers. This indicates strong investor confidence in the company's future.
Debate over LLM APIs and ML product development: A discussion on r/MachineLearning highlights concerns that the prevalence of LLM APIs is leading to a focus on prompt engineering rather than more fundamental ML research and development. This reflects ongoing debates about the direction of AI research and development.
Indestructible 5D memory crystals: New technology allows for storing up to 360 terabytes of data for billions of years in resistant crystals, potentially providing a way to preserve human knowledge long-term.

AI Discord Recap

A summary of Summaries of Summaries by O1-preview

Theme 1. New AI Models Make Waves in the Community

Qwen 2.5 Takes Center Stage: Unsloth AI confirms support for Qwen 2.5, with users eagerly training Qwen2.5-14b. OpenRouter introduces Qwen2.5 72B, boasting enhanced coding and mathematics capabilities, and a whopping 131,072 context size.
Mistral Leaps into Multimodal AI: Mistral Pixtral 12B debuts as Mistral's first multimodal model, now accessible on OpenRouter. This launch marks a pivotal moment, expanding Mistral's offerings into versatile AI applications.
Flux Model Lights Up Stability.ai Users: The Flux model impresses with superior prompt adherence and image quality, overcoming initial resource hurdles. Despite some concerns over aesthetic similarities, optimism runs high for Flux's performance.

Theme 2. Fine-Tuning Models: Triumphs and Tribulations

LoRA Fine-Tuning Sparks Innovation: HuggingFace users suggest utilizing LoRA for fine-tuning base models, inspired by ipadapter approaches. This could boost model performance without extensive retraining.
Phi-3.5-Mini Throws a Curveball: Unsloth AI users face an AttributeError while fine-tuning Phi-3.5-Mini, wrestling with LongRopeRotaryEmbedding issues despite following recommended fixes. The community hunts for a viable workaround.
Quantization Trade-offs Ignite Debate: Members discuss that unquantized models may deliver better speed and throughput in batch processing. The critical balance among speed, size, and cost takes center stage in decision-making.

Theme 3. AI Tools Test Users' Patience

Aider Battles API Gremlins: Users grapple with Aider not reading from .env files, leading to configuration chaos and overload errors with the Anthropic API. Logging LLM conversation history becomes the sleuthing method of choice.
LangChain's Chunky Output Causes Chagrin: LangChain v2.0 users report intermittent function call information output in chunks when using OpenAI streaming. Suspicions of lurking bugs prompt calls for fixes.
LM Studio's Connectivity Conundrum Solved: Switching to IPv4 on macOS saves the day for LM Studio users facing connection woes. Clear guidance on adjusting settings turns frustration into relief.

Theme 4. AI Coding Assistants Stir Conversations

O1 Models Under Fire in Code Edits: Aider users express skepticism about O1 models' performance compared to Sonnet-3.5, especially in code refactoring tasks. Hopes remain high for future enhancements to boost interaction capabilities.
Wizardlm Weaves Magic in OpenInterpreter: Wizardlm 8x22B outperforms Llama 405B in OpenInterpreter, nailing tasks on the first try more often. Users are impressed by its efficiency and effectiveness.
Devin Doubles Down on Improvements: Devin now offers faster, more accurate code edits and improved enterprise security support. While many praise the updates, feedback remains mixed, with some users expressing frustration over limitations.

Theme 5. Community Events and Collaborative Efforts

Hackathon Hustle Sets the Stage: CUDA MODE members gear up for a hackathon, with approvals flying and team formations encouraged via forum ideas. A chance to get the PMPP book signed by Prof. Wen-mei Hwu adds extra excitement.
OpenAI Calls for Multi-Agent Marvels: OpenAI is hiring ML engineers for a new multi-agent research team, seeing this niche as crucial for enhancing AI reasoning. They're encouraging applications even from those without prior multi-agent experience.
Web AI Summit 2024 On the Horizon: Members express enthusiasm for networking opportunities at the upcoming summit. The event promises valuable exchanges on web AI topics among eager participants.

PART 1: High level Discord summaries

HuggingFace Discord

Exploring Advanced Tokenization Techniques: A new blog post titled This Title Is Already Tokenized explains advanced methods, raising interest in their applications for modern NLP.
- The content details complexities in tokenization, driving discussion on its relevance to current projects.
Unity ML Agents for Language Model Training: Catch the latest YouTube video about training LLMs from scratch with Unity ML Agents and sentence transformers.
- Highlighting Oproof Validation Success, this episode shows key milestones in the Tau LLM series.
New GSM8K Reasoning Dataset Released: A user introduced a new reasoning dataset, based on GSM8K aimed at AI model training.
- Expected to enhance AI's critical reasoning capabilities with its structured challenges.
Fractal Generator's New Zoom Functionality: A fractal generator project now features zoom capabilities through the 'Aiming Better' section, allowing users to adjust grid length and generate new outputs.
- Community suggestions included implementing scroll wheel input for smoother interactions.
Fine-tuning Base Models with LoRA: A suggestion was made to utilize LoRA for fine-tuning base models, taking inspiration from ipadapter methodologies.
- This could enhance model performance by adjusting parameters without extensive retraining.

aider (Paul Gauthier) Discord

Aider struggles with API interactions: Users face issues with Aider not reading from the .env file, causing configuration challenges and overloading errors with the Anthropic API. Logging LLM's conversation history is suggested as a potential diagnostic approach.
- With this context, further inquiry into configuration issues seems vital to ensure smoother API interactions.
O1 model compared to Sonnet-3.5: Skepticism surrounds the performance of O1 models in Aider versus Sonnet-3.5, particularly in tasks like editing and code refactoring. Users look forward to enhancements that will boost the interaction capabilities between Aider and O1 models.
- This comparison leads to broader discussions about model integration and usability in coding tasks.
Chain of Thought sparks debate: A member questioned the efficacy of the Chain of Thought method, suggesting prior training had a larger impact on performance. Discussions revealed that pragmatic tuning on results is essential for tailored applications.
- This highlights a common theme in AI discussions around enabling model performance through appropriate methodologies.
Anthropic enhances LLM operations with Contextual Retrieval: Anthropic introduced a contextual retrieval method that improves prompt caching for efficient LLM operations. This method's implementation is seen as crucial for projects like Aider.
- Collectively, it underscores the need for continual improvements in managing AI interactions to streamline functionalities.
Issues with function renaming in Aider: Aider's attempt at renaming functions resulted in partial updates causing undefined function errors, raising concerns about its search/replace effectiveness. Users noted that despite prompts, Aider managed to fix only one instance of linter errors.
- The necessity for enhanced functionality in reference updates comes to light, hinting at room for significant improvements in Aider's architecture.

Unsloth AI (Daniel Han) Discord

Qwen 2.5 Finds Unsloth Compatibility: Users confirmed that Qwen 2.5 is supported by Unsloth, though bugs with chat templates are being addressed by the Qwen team.
- It better be, I'm training Qwen2.5-14b right now was a sentiment expressing the urgency for functional support.
Successful Fine-tuning of Models Like Qwen 2.5: For fine-tuning LLMs in text classification with limited datasets, models such as Qwen 2.5 and BERT are ideal, with one user achieving 71% accuracy with Llama 8.1 8B.
- Members are looking to enhance these scores and share successes and challenges.
Crucial Quantization Trade-offs: Discussion indicated that unquantized models may yield better speed and throughput, particularly in batch processing scenarios.
- Members debated the critical trade-offs among speed, size, and cost when deciding on quantizing models.
AGI Progress Sparks Debate: Concerns were expressed that achieving AGI is not solely about finding answers, but more about effectively explaining them, suggesting significant challenges lie ahead.
- Echoing the 80/20 rule, it was noted that a 60-year investment into AGI indicates the arduous path to its realization.
BART Model's Input Mechanism Under Scrutiny: Questions arose about BART's input format, highlighting that it uses an EOS token to start generation instead of the expected BOS token.
- Experiments are planned to analyze the implications of this behavior further.

CUDA MODE Discord

Fixing Bugs in Triton and Confidence Functions: Members reported bugs with assert_verbose_allclose, prompting a fix in Pull Request #261 aiming to enhance its reliability across multiple scenarios.
- Concerns also arose regarding KL Divergence calculations yielding unexpected results for larger input sizes, suggesting a need for alignment with established functions like cross-entropy.
Hackathon Teams and CDP Book Signing: Participants prepared for the hackathon, confirming approvals and encouraging self-organization into teams using ideas posted in the forum.
- Notably, a chance to get the PMPP book signed by Prof. Wen-mei Hwu at the event was highlighted, adding an extra layer of engagement.
Web AI Summit 2024 Networking Opportunities: Excitement builds for the upcoming Web AI Summit 2024, with members expressing interest in attending and networking around web AI topics.
- The summit provides a chance for valuable exchanges among participants looking to share insights and experiences in the domain.
Insights on Apple's ML Framework and MLX Platform: The Apple-specific ML framework focuses on techniques like autodiff and JIT compilation to enhance performance on Apple silicon, creating parallels with PyTorch's kernel development approach.
- Members discussed MLX, a NumPy-like platform designed for optimal performance with metal backends, enhancing compatibility with Apple's hardware capabilities.
Modal’s Serverless Functionality Explored: Members sought information on leveraging Modal for free GPU access, discussing its serverless deployment without SSH support but offering free credits for new accounts.
- Recommendations were made to explore a GitHub repository for samples to initiate CUDA workflows seamlessly on Modal.

Perplexity AI Discord

Perplexity Pro Subscription Confusion: New users expressed confusion over using Perplexity Pro for tasks like tailoring resumes, suggesting alternatives like ChatGPT might be more effective.
- Discussions continued on whether existing Pro account holders can apply new Xfinity rewards codes after downgrading their subscriptions.
Mixed Performance of o1 Mini Model: Users provided feedback on the o1 Mini model, reporting mixed results, with some tasks yielding basic responses that lacked depth.
- While compared to Claude Sonnet 3.5, there’s a strong call for improvements in specific prompting techniques for better outcomes.
AI Models Versatile in Coding: Several users highlighted experimenting with the latest AI models for coding, designating o1 Mini as an option but noting limitations in complex projects.
- They emphasized the necessity for internet search capabilities and real-time feedback to boost coding performance within AI tools.
Sonar vs. Llama-3.1 Performance Discrepancies: Users reported experiencing poor performance with the llama-3.1-sonar-large-128k-online model, particularly in response formatting when compared to web application results.
- Specific issues included output quality decline, premature truncation, and inconsistencies in following prompting instructions.
Navigating Access to Beta Features: Inquiries about access to the return_citations beta feature were raised, with advice to contact [email protected] for applications.
- Clarifications were requested about the search_recency_filter, whether it’s in closed beta, and the potential for recent content retrieval.

Stability.ai (Stable Diffusion) Discord

Pony XL vs Original Model Conundrum: Pony XL is a refined iteration of its predecessor, but concerns arise regarding text encoder layer disalignment with other embeddings that blur model classifications.
- One user aptly compared the excitement around Pony to tulip mania, suggesting SDXL might serve better depending on specific project needs.
Flux Model Showcases Strong Performance: The Flux model is now recognized for overcoming its initial obstacles, notably in resource needs and speed, thus establishing a reputation for prompt adherence and image quality.
- Despite some feedback on aesthetic similarities in generated images, the community remains hopeful about Flux's ability to achieve top-tier performance.
SDXL and Flags: A Trouble Spot: Users reported that both SDXL and SD3M struggle with accurately rendering common symbols like country flags, questioning their reliability.
- Community suggestions included training a Lora model specifically aimed at improving flag accuracy for SDXL's outputs.
Optimizing ComfyUI for Better Workflows: Discussions around efficiently using ComfyUI emphasized workflows in the cloud and exploring serverless options like Backblaze for model storage.
- Members expressed interest in maximizing VRAM across multiple GPUs, sharing tips for enhancing performance in demanding workloads.
Missing Inpainting Models Sparks Inquiry: A user voiced frustration over the absence of inpainting and erasing models in IOPaint, needing command prompt access to unlock these functionalities.
- This led to a broader conversation about how command-line parameters can impact model availability and operations within various UIs.

Nous Research AI Discord

Upscaling Videos Made Easier: A member suggested using video2x to upscale videos by processing each frame through an upscaling model.
- Another member contemplated decreasing the frame rate to reduce workload before upscaling, though uncertainty remained about the video's quality.
Feline AI Revolutionizes Music Production: One user showcased their feline AI chatbot for music production, capable of generating MIDI files and recommending synthesis methods.
- Plans are in motion to migrate to Llama for better performance, emphasizing its understanding of time signatures and musical styles.
Interest in Forge Technology Grows: Members inquired about the functionalities of Forge, especially its relation to Hermes and other models.
- A linked Discord message may shed light on Forge's capabilities in this context.
Exploring Hermes 3 Accessibility: Discussion on accessing Hermes 3 included a link to OpenRouter for exploration.
- Opinions on Hermes 3's performance and data handling were shared among participants.
Philosophical Musings on AI Consciousness: A peculiar paper on consciousness as a gradient in intelligence manifolds was brought up, sparking skepticism about its validity.
- Debate arose around the extent of AI's understanding of complex concepts like music theory.

Latent Space Discord

Hyung Won Chung's paradigm shift: Hyung Won Chung introduced a paradigm shift in AI during his MIT talk, highlighting the recent launch of o1 as a significant development in the field.
- He stated that this talk comes at a crucial time due to major advancements in AI understanding.
OpenAI recruiting ML engineers for multi-agent team: OpenAI is hiring ML engineers for a new multi-agent research team, viewing this niche as essential for enhancing AI reasoning; application details here.
- They emphasized that prior experience with multi-agent systems isn't a prerequisite, encouraging broader applications.
Devin improves speed and accuracy: Recent enhancements to Devin have resulted in faster and more accurate code edits and improved enterprise security support source.
- While many users have praised the updates, feedback has been mixed, with some expressing frustration at its limitations.
New RAG proposal reduces retrieval errors: Anthropic's latest proposal on retrieval-augmented generation (RAG) suggests a 67% reduction in incorrect chunk retrieval rates link.
- The conversation underlined the growing interest in strategies to enhance RAG effectiveness.
Queries about GitHub Copilot models: Users raised questions regarding the standards of models utilized in GitHub Copilot, speculating it uses GPT-4o, with concerns surrounding performance consistency source.
- Discussion centered on the impact of context on performance across various AI tools.

OpenRouter (Alex Atallah) Discord

Chatroom enhances user interaction: The chatroom now supports editable messages, allowing users to modify their messages or bot responses easily.
- This update includes a redesigned stats interface for improved user engagement.
Qwen 2.5 sets a new standard: Qwen 2.5 72B delivers enhanced coding and mathematics capabilities with a 131,072 context size. More information is available here.
- This model represents significant progress in AI capabilities, driving performance expectations higher.
Mistral enters the multimodal space: Mistral Pixtral 12B marks the company's debut in multimodal AI, with a free version accessible here.
- This launch proves to be a pivotal moment, expanding Mistral's offerings into versatile AI applications.
Hermes 3 shifts to paid structure: With Hermes 3 moving to a paid structure of $4.5/month, users are reconsidering service usage options. More details are available here.
- The lack of notifications on the pricing change raised concerns within the community regarding reliance on free credits.
Custom API integrations get attention: A request surfaced for the ability to use custom OpenAI compatible API Key endpoints to better integrate with private LLM servers.
- Several members echoed the importance of this flexibility for future integration capabilities.

LlamaIndex Discord

Exciting Opik Partnership for RAG Autologging: LlamaIndex announced a partnership with Opik, which will autolog all RAG/agent calls for both development and production environments, streamlining the auth process.
- This automation simplifies user experiences in complex multi-step workflows.
Launch of RAGApp v0.1: Code-Free Multi-Agent Apps: The team launched RAGApp v0.1, enabling the creation of multi-agent applications without any coding required.
- Users can easily add agents, assign roles, set prompts, and utilize various tools for application enhancements.
LlamaIndex IDs Cause Pinecone Headaches: Users reported challenges with ID control in Pinecone due to LlamaIndex's auto-generated IDs, complicating deletions.
- The community suggested manual ID editing and creating nodes to manage these limitations better.
Pandas Query Engine Behaves Unexpectedly: Discrepancies surfaced between query outputs in a notebook versus a Python script with the Pandas Query Engine, affecting functionality when using df.head().
- Switching from df.head() to df.head(1) proved to solve the issue, indicating column count may impact query parsing.
Graph RAG Facing Query Compatibility Issues: Users identified issues with querying patterns in Graph RAG, where the provided pattern did not align with retrieved chunks.
- Further analysis revealed mismatched expectations in the GraphRAGQueryEngine during data fetching.

OpenAI Discord

o1 Mini Vs. 4o Performance Showdown: Users labeled o1 mini as inferior to 4o, arguing it lacks real-world experience and intelligent reasoning, merely typing faster without substance.
- One user remarked that o1 feels no different than 4o, provoking discussions on AI's cognitive capabilities.
Hot Debate on AI Consciousness: A rigorous discussion erupted over whether AI can genuinely reason or if it's merely a simulation, with mixed opinions on intentionality.
- A member proposed that focusing AI on task completion, rather than human-like reasoning, might yield safer and more efficient results.
Clarifying GPT-4 Memory Features: There were inquiries regarding memory features for GPT-4 API, with clarity that these are exclusively available to ChatGPT Plus users.
- One user pointed out the ease of implementing their own memory tools with alternatives like Pinecone, despite the lack of this in the ChatGPT interface.
Feedback Roundup on IDE Integration: Suggestions surfaced about enhancing AI tools integrated within IDEs, particularly a call for live previews akin to ClaudeAI.
- Numerous users desired ChatGPT to add this functionality, while others recommended exploring various IDEs for better compatibility.
Sharing and Improving Prompt Usage: A member shared a helpful prompt from a prompt guide, emphasizing its ongoing relevance.
- Visual aids were noted as valuable for enhancing prompt understanding, highlighting their role in effective communication of ideas.

Eleuther Discord

HyperCloning Speeds Up Model Initialization: Discussion centered on using HyperCloning to initialize large language models with smaller pre-trained ones, aiming to enhance training efficiency.
- One member suggested training a mini model, scaling it, and distilling from the larger model to optimize compute usage.
IDA Gains Traction in AI Alignment: The Iterated Distillation and Amplification (IDA) method for aligning AI systems was acknowledged for its iterative effectiveness.
- One participant expressed skepticism regarding the term 'distillation', arguing it fails to represent the compression and information discarding needed.
Critical FP8 Training Instabilities Uncovered: FP8 training was reported to have instabilities due to outlier amplification from the SwiGLU activation function during prolonged training runs.
- An audience member questioned if other activation functions would face similar issues in extended training contexts.
Tokenized SAEs Boost Performance: Tokenized SAEs introduced a per-token decoder bias enhancing models like GPT-2 and Pythia, facilitating faster training.
- This method addresses training class imbalance, enabling better learning of local context features via 'unigram reconstruction'.
Concerns Over BOS Token in Gemma Models: Concerns arose that the BOS token in Gemma models might only be added once in sequences, impacting rolling loglikelihood calculations.
- The same member confirmed they found the BOS token was missing in llh_rolling during certain instances while debugging.

LM Studio Discord

IPv4 Switch Fixes Connectivity Issues: A member noted that switching to IPv4 on MacOS often resolves connection issues, with another confirming that 'it just worked.'
- Clear guidance on adjusting TCP/IP settings can be found here.
Crunching LM Studio Connection Challenges: Members faced issues connecting LM Studio API to CrewAI, exploring different solutions but no consensus was reached.
- One suggested watching a helpful YouTube video for insights into proper setup.
3090 Power Limiting Sparks Debate: A member shared insights on limiting the 3090 to 290W versus undervolting, prompting resource recommendations for further understanding.
- Suggestions included reviewing documentation, with varying opinions on the effectiveness of each method.
Windows vs Linux Power Management: A comparison revealed that adjusting GPU power settings in Windows requires manual setup, while Linux users can optimize with a single command.
- Members debated the accessibility of power management across systems, affirming that Windows offers faster settings adjustments.
RAM Speed vs CPU Inference Bottlenecks: Discussion arose around whether RAM speed and bandwidth significantly hinder CPU inference, with proposals for a motherboard using DDR6.
- Frustration over underutilization of multiple CPU cores was shared, highlighting concerns over efficiency in current CPU designs.

Modular (Mojo 🔥) Discord

Mojo LLMs API Sparks Excitement: A user expressed keen interest in utilizing the Mojo based LLMs API alongside Pythagora, a development tool designed for building applications via conversational interactions.
- They raised questions about the cost of the service, emphasizing the excitement surrounding its role in software development transformation.
GitHub Discussions Closure: Mark Your Calendars!: The GitHub Discussions for Mojo and MAX will officially close on September 26th, with important discussions being converted to GitHub Issues.
- To ensure valuable discussions survive, members can tag the organizer for conversion requests as the community centralizes its communication.
Packed Structs: Mojo's Compatibility Query: The chat highlighted concerns regarding the lack of packed structs support in Mojo, complicating the handling of bitflags as __mlir_type.i1 lists.
- There was hope that LLVM would address this through byte alignment, although skepticism lingered about its reliability.
Variable Bit Width Integers Demand: Members debated the implementation of variable bit width integers for TCP/IP, specifically the need for types like UInt{1,2,3,4,6,7,13}.
- While bitwise operators and masks were proposed as alternatives, they were deemed less ergonomic, leading to a desire for native support in Mojo.
Feature Requests Piling Up in Mojo: A feature request emerged for allowing empty lists without a type parameter for better compatibility with Python in Mojo, alongside other syntax inquiries.
- Mentions of explicit trait implementations were common, with requests for clearer guidelines on defining a generic struct with multiple traits.

Torchtune Discord

Input Position System Gets Streamlined: The recent PR removes auto-setting of input_pos to simplify generation/cacheing logic.
- This aim is to prevent user confusion by eliminating the hunt across classes for default settings.
Memory Optimisations Under the Microscope: Discussions highlighted memory optimisations such as activation offloading, which are in development, and the encouragement of chunked cross entropy usage.
- Members acknowledged that previously dismissed methods are now being reassessed for the mem opt tutorial.
Bringing Efficiency to Generation with Batch Sizes: The focus centered on generation efficiency, emphasizing that the generate script will only support bsz 1 during execution.
- Members mulled over the simplicity of looping through batches, while considering the drawbacks of raising batch sizes.
Debating Submethods for Generation Process: A lively debate sparked around the inclusion of submethods like sample and batched_sample, aimed at refining the generation approach.
- Opinions varied, with some favoring separation of methods while others preferred a streamlined method similar to gpt-fast practices.
Challenges in Keeping Generate Recipe Simple: Urgency emerged from a member about maintaining a straightforward generate recipe amidst user-reported issues linked to larger batch sizes.
- An ongoing effort is underway to simplify logic, viewed as essential for clarity with the generate functionalities.

OpenInterpreter Discord

Wizardlm eclipses Llama in task performance: Experimentation shows that microsoft/Wizardlm 8x22B consistently outshines llama 405B, successfully completing tasks on the first try more often.
- Members were impressed by Wizardlm's efficiency in various tasks, prompting discussion on potential broader applications.
O1 currently lacks useful functions: Participants noted that O1 remains in development and doesn’t have any functioning features ready for deployment.
- They expressed concerns about its inability to interact with applications, emphasizing the need for further enhancements.
Proposed discussion on O1 functionalities: A call for a dedicated discussion about O1's features has been made, seeking to clarify its potential use cases and gather insights.
- To maximize participation, members are encouraged to share their availability, specifically in GMT timezone.
Firebase/Stripe integration struggles: A user reported ongoing issues with their FarmFriend project’s integration with Firebase and Stripe, particularly around handling CORS and authentication domains.
- They described facing a 'deathloop' with service configurations and called for assistance from those experienced in maintaining such integrations.

tinygrad (George Hotz) Discord

Bounty for CLANG dlopen Replacement: A bounty discussion arose about replacing CLANG dlopen with mmap, which may require manual handling of relocations as shown in this pull request.
- At this point I'm very curious to see who will get this bounty highlighted the competitive interest in this task.
Tinygrad Compatibility with Intel GPUs: A user inquired whether Tinygrad supports multiple Intel GPUs, similar to its functionality with AMD and NVIDIA GPUs, and received positive feedback.
- The advice was to investigate compatibility further, indicating a growing interest in Intel hardware.
Troubleshooting IPMI Credential Issues: Reported issues with IPMI pointed to possible incorrect credentials, prompting a discussion on the best methods to reset them.
- Suggestions included using a monitor and keyboard for setup, and ensuring the web BMC password matches the displayed one.
Confusion Over GPU Setup Connections: A question emerged regarding whether to use HDMI or VGA for GPU setup, with a clear consensus that VGA only is necessary during initial connections.
- This confusion underscores a common oversight in hardware configuration practices.
Undergraduate Thesis on ShapeTrackers Mergeability: One user expressed interest in tackling the mergeability of two arbitrary ShapeTrackers in Lean for their undergraduate thesis, questioning the status of the bounty.
- They noted it appears incomplete, presenting an opportunity for new contributions to the project.

Cohere Discord

Cohere Discord: A Place for Learning: Members expressed their excitement about joining the Cohere Discord community, encouraging a collaborative atmosphere for learning about AI and Cohere's offerings.
- Welcome! messages were sent to newcomers, fostering an engaging environment for shared knowledge.
Grab Your Trial Key and Start Hacking: A member suggested using a trial key that offers 1000 calls a month for free, emphasizing hands-on learning through projects.
- Another member agreed, stating that application is the best way to learn and mentioned they would explore this further after finishing their capstone project.
Rerank Multilingual v3 struggles with English: A member reported discrepancies using rerank_multilingual_v3, noting scores below 0.05 for English queries compared to 0.57 and 0.98 with rerank_english_v3.
- This inconsistency is negatively impacting their RAG results, leading to unexpected filtering of relevant chunks.
Test with Curl Command for Rerank Models: Another member suggested a curl command to swap models for testing, proposing queries like 'what are the working hours?' and 'what are the opening times?'.
- This could enable better performance comparisons between the models.
Interest in Newsletters: A member mentioned being attracted to the community through the classify newsletter, showcasing its importance in community engagement.
- Another member expressed a desire for more newsletters, indicating an appetite for continuous updates and information from the community.

LAION Discord

Seeking Speedy Whisper Solutions: A member requested help to maximize the speed of Whisper, focusing on using multiple GPUs for transcription tasks for a very large dataset.
- Efficient processing is crucial, and the need for batching was emphasized.
Whisper-TPU: A Fast Option: Whisper-TPU was highlighted as a notably fast alternative for transcription needs, catering to users requiring high-speed processing.
- Its potential for handling demanding tasks sparked interest among those in the discussion.
Exploring Transfusion Architecture Usage: Curiosity arose about leveraging the Transfusion architecture for multimodal applications, suggesting its innovative capability.
- A GitHub repository for Transfusion showcases its potential to predict tokens and diffuse images.
Challenges with Diffusion and AR Training: Experiments with combining diffusion and AR training revealed significant stability challenges, highlighting a crucial integration obstacle.
- The community is actively seeking effective strategies to enhance stability in these training methods.
Inquiring About Qwen-Audio Training Instability: Discussion surfaced regarding training instability in the Qwen-Audio paper, connecting it with issues in multimodal setups.
- Members expressed intent to revisit the paper for clarity on these challenges, indicating their relevance.

Interconnects (Nathan Lambert) Discord

Qwen 2.4 Disappoints vs o1-mini: The newly announced qwen2.4-math-72b-instruct fails to outperform o1-mini during tests with code execution and ensemble methods using n=256 generations.
- This outcome highlights the difficulty in achieving fair comparisons, especially on metrics like AIME, without the reflection type CoT.
EU Halts Llama Multimodal Launch: A developer mentioned that their team is keen on creating a multimodal version of Llama, but they won't release it in the EU amid regulatory uncertainties.
- This decision reflects broader concerns regarding how fragmented tech regulations may stifle AI innovation in Europe.
Community Fears EU's Anti-Tech Stance: Discussions emerged around the EU's perceived anti-tech sentiment, where members believe regulations, although well-meaning, induce significant uncertainty.
- There's a call for clearer regulations to better balance innovation and safety within the tech landscape.
OpenAI's Extended Video Insights: OpenAI's extended video suggests that a model with RL is now superior at discovering CoT steps compared to human capabilities.
- Key points raised included the importance of infrastructure in algorithm performance and the emergence of self-critique as a significant development.

LangChain AI Discord

Chunky Output Issue in LangChain v2.0: Users are reporting that LangChain v2.0 outputs intermittent function call information in chunks while using OpenAI streaming, suggesting possible bugs.
- This situation raises concerns regarding configuration settings and stability in output formats during function calls.
Ell vs LangChain Comparison: A discussion highlighted differences and comparisons between Ell and LangChain, revealing community interest in evaluating AI frameworks reliability.
- Participants are examining frameworks meticulously to determine effective model integrations for current projects.
Clarifying LangGraph Support: A query about where to direct questions regarding LangGraph indicates confusion in the community about appropriate support channels.
- This points to the need for better-defined support avenues for users exploring various tools and libraries.
Beta Testing Call for New Agent Platform: An announcement invited beta testers for a new platform for launching agents with native tokens, indicating opportunity for innovation.
- This platform aims to enhance agent deployment methods, creating buzz around integration strategies.
OpenAI Assistants Documentation Request: Members requested guidance on utilizing their custom OpenAI assistants according to the latest documentation, showcasing an adaptation to API changes.
- The importance of understanding new Assistants API features was emphasized as community members navigate the revisions.

OpenAccess AI Collective (axolotl) Discord

Moshi Model Launch Hits the Scene: The Moshi model has been launched as a speech-text foundation model featuring a novel method for converting text into speech, enabling full-duplex spoken dialogue.
- This development significantly enhances both conversational dynamics and speech recognition capabilities.
GRIN MoE Achieves Excellence with Minimal Parameters: The GRIN MoE model stands out by achieving high performance with only 6.6B active parameters, excelling particularly in coding and mathematics tasks.
- Employing SparseMixer-v2 for gradient estimation, GRIN pushes the envelope by circumventing standard methods of expert parallelism.
Concerns Arise Over Mistral Small Release: Discussion surrounding Mistral Small confirmed it’s an instruction-only version, attracting mixed reactions from members.
- Participants highlighted memory intensity issues as a notable limitation for several users.

DSPy Discord

Bootstrapping Clarified in DSPy: A member clarified that bootstrapping in DSPy is utilized to generate intermediate examples within a pipeline, ensuring successful predictions capture the full trace of processes.
- It was emphasized that even with LLMs' non-deterministic nature, if the final result is correct, the intermediate steps should also be valid.
MathPrompt Paper Sparks Interest: A member highlighted a research paper on MathPrompt, suggesting its potential to extend understanding of enhanced mathematical reasoning, and linked to the paper.
- This reference could pave the way for more robust prompt engineering strategies aimed at mathematical tasks.
TypedPredictors JSON Hacks: A member shared novel tricks for TypedPredictors, showcasing how to mock JSON parsing to refine output pre-processing for enhanced data handling.
- The approach includes removing excess text, addressing invalid escape sequences, and logging errors from their GitHub Gist.

LLM Finetuning (Hamel + Dan) Discord

FinTech Startup Seeks LLM Engineer: A FinTech startup is looking for a skilled LLM Engineer for a one-week sprint aimed at enhancing their multilingual real-time translation service using LLama 3.1 or Qwen2 models.
- This initiative promises to significantly improve the way millions of financial transactions are handled across language barriers.
Qwen 2.5's Multilingual Potential: A participant recommended exploring Qwen 2.5 for its multilingual capabilities, suggesting it might align well with the project's objectives.
- This recommendation indicates a direction towards enhancing the Whisper model alongside the selected LLM to further improve translation accuracy.

The Alignment Lab AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The DiscoResearch Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

HuggingFace ▷ #announcements (1 messages):

Tokenization Techniques

Unity ML Agents

GSM8K Reasoning Dataset

Nemotron-Mini-4B Demo

Fine-tuning Parler TTS

Explore Tokenization Techniques: A new blog post titled This Title Is Already Tokenized explains advanced tokenization methods by a verified user.
- The content dives into the intricacies of tokenization while raising excitement about its applications in modern NLP.
Unity ML Agents Pretraining: Catch the latest YouTube video on training language models from scratch using Unity ML Agents and sentence transformers.
- The episode highlights Oproof Validation Success and significant milestones in the Tau LLM series.
New GSM8K Reasoning Dataset Released: A contributor has introduced a new reasoning dataset based on GSM8K, aimed at AI model training.
- This dataset is expected to enhance the critical reasoning capabilities of AI systems.
Nemotron-Mini-4B Demo Available: Check out the demo for the Nemotron-Mini-4B model, showcasing its capabilities.
- This demo aims to illustrate the model's utility for AI practitioners and researchers.
Fine-tuning Parler TTS for Specific Languages: A detailed blog post discusses the process of fine-tuning Parler TTS to cater to specific languages.
- This guide offers insights on leveraging existing TTS models for niche language communities.

Links mentioned:

HuggingFace ▷ #general (162 messages🔥🔥):

GPT models and unsupervised learning

Llava model quantization

Triplet loss explanation

AI tools support for Apple Silicon

Nanotron project discussion

GPT models simplify unsupervised learning: GPT models operate unsupervised, learning to predict the next token without the need for labeling, such as POS tags.
- Research highlights methods to extract syntax from these models, showcasing their potential to encompass features of classical NLP.
Understanding Llava model quantization: A user inquired about quantizing the Llava model for efficient use with a 12 GB Nvidia GPU, seeking best practices.
- Suggestions focused on guides for quantization methods, with attention to computational resources.
Explaining triplet loss for embeddings: Triplet loss calculates the Euclidean distance between embeddings to cluster alike samples while distancing those that differ.
- Visual clarity in diagrams may help convey the relationship between anchor, positive, and negative embeddings more effectively.
Apple Silicon and AI tools support: There’s potential for improved support of ML tools on Apple Silicon, leveraging its NPU and unified RAM.
- Discussions emphasized the emerging development of these tools and their capacity for machine learning applications.
Excitement around the Nanotron project: The Nanotron project garnered enthusiastic reactions, with references to its integration with popular culture.
- Users exchanged excited comments suggesting interest in gaming and creative applications related to the project.

Links mentioned:

HuggingFace ▷ #today-im-learning (1 messages):

HF tutorials

Image creation guide

User collaboration

Exciting New HF Tutorial Launch: A member announced their first tutorial titled 'Y01E01 - Make an image', aimed at simplifying usage for beginners, available here.
- They provided a helpful suggestion to rename the tutorial file for clarity and requested feedback from more experienced users to improve it.
Invitation for User Contributions: The member expressed enthusiasm for any feedback or corrections on their tutorial, encouraging community input to enhance the content.
- They highlighted a willingness to incorporate additional information and improvements from established users.

Link mentioned: OFT/HF4Noobs at main: no description found

HuggingFace ▷ #cool-finds (7 messages):

GLiNER model in FastAPI

Automatic Notebook Generator

Logo Generation Model

3D Content Generation Framework

Stable Fast 3D

Introducing GLiNER Model on GitHub: A user shared a project on GitHub that features a GLiNER model implemented as a FastAPI microservice, accessible here.
- This could be a really cool tool for those interested in AI model deployment.
Automatic Notebook Creator is Here: An automatic notebook generator has been found, available at this Hugging Face Space.
- Refreshing as it could streamline the process of notebook creation for AI projects.
Cool Logo Generation Model Discovery: A new model for logo generation called Huggieverse has been shared, showcasing various prompts on Hugging Face.
- Images generated include happy stars and lemons, demonstrating its potential for fun branding.
3D Content Generation Framework Unveiled: A user pointed out a GitHub repository for a unified framework for 3D content generation available here.
- They express a need for rapid 3D object generation, looking for solutions that save in PLY format.
Stable Fast 3D Offers Quick 3D Generation: A suggestion was made to use Stable Fast 3D, which can generate a 3D object from an image in under one second.
- There exists a Hugging Face Space for it, providing an efficient alternative for the user's request.

Links mentioned:

HuggingFace ▷ #i-made-this (220 messages🔥🔥):

Fractal Generator

Interactive World & Character Generative AI

Self-Supervised Learning Workshop at ECCV 2024

OCR Demos by PleIAs

Fractal Generator Zoom Functionality: A user discussed their fractal generator project, highlighting its ability to zoom in through the 'Aiming Better' section by changing grid length and generating new outputs.
- Another user suggested detecting the scroll wheel input for smoother interaction, while expressing enjoyment in testing the tool.
Beta Testing for AI Platform: An interactive AI platform is seeking beta testers to explore character and world generation experiences, inviting interested users to reach out via direct messages.
- This platform aims to create immersive experiences combining user-generated content and AI capabilities.
ECCV 2024 Workshop on Self-Supervised Learning: An article was shared summarizing various techniques from papers featured in the upcoming ECCV 2024 workshop focused on improving data efficiency and model interpretability in self-supervised learning.
- The workshop addresses topics like representation collapse during joint-embedding pre-training, emphasizing the importance of augmentations.
OCR Demo from PleIAs: A user shared an OCR demo created by PleIAs, which reportedly runs on CPU and is available on Hugging Face with a refreshing link.
- This demo illustrates practical applications of OCR technology and sparks interest among users in exploring related functionalities.

Links mentioned:

HuggingFace ▷ #computer-vision (3 messages):

reCAPTCHAv2 100% success rate

Qwen2-VL-72B-Instruct introduction

Model creator engagement on HF Hub

reCAPTCHAv2 achieves a 100% success rate: A new paper reveals that reCAPTCHAv2 now achieves a 100% success rate in solving CAPTCHAs, surpassing previous rates of around 68% to 71% view PDF here.
- The study uses advanced YOLO models for image segmentation and suggests current AI technologies can effectively exploit image-based captchas.
Qwen2-VL-72B-Instruct hits the scene: Qwen2-VL-72B-Instruct just launched, introducing naive dynamic resolution and multimodal rotary position embedding (M-RoPE) for effective information fusion read more.
- This model can now handle videos over 20 minutes in length with enhanced understanding capabilities, according to the developers.
Engaging model creators on HF Hub: It's suggested to pose questions directly on the Community tab of the HF Hub as model creators likely monitor that space for inquiries link to collection.
- This advice may facilitate more effective communication between users and model creators.

Links mentioned:

HuggingFace ▷ #NLP (2 messages):

GPT2SP paper insights

Story Point Estimation with GPT models

Handling non-standard language in embeddings

Exploring GPT2SP for Agile Estimation: A member is working on their Master's thesis, aiming to improve story point estimation for Agile methodologies by experimenting with advanced models like GPT-3 and GPT-4, building on the insights from the GPT2SP paper.
- They're seeking recommendations for the most suitable model and insights from anyone with similar experience.
Curious Patterns in Language Sequences: Another member observed unconventional sequences like 'll' for I'll and 'yes!do it', mentioning issues with missing spaces in words. They expressed concern about how such cases are treated in a ST embedding pipeline and the lack of existing embedding models that accommodate them.
- They highlighted the challenges of non-standard language patterns and their implications for embedding effectiveness.

HuggingFace ▷ #diffusion-discussions (11 messages🔥):

Fine-tuning Base Models

Best GPUs for Micro Datacenters

Liquid AI's Foundational Model

Mathematical Resources for Model Development

Diffusion Models Discussion Channel

Base Model Fine-tuning with LoRA: There was a suggestion regarding using LoRA to fine-tune a base model in a similar manner to ipadapter.
- This approach aims to enhance performance by adjusting the model's parameters without extensive retraining.
Top GPUs for Small Cluster Setups: A comprehensive comparison of GPU models for small clusters was shared, highlighting factors like price, VRAM, and TFLOPS.
- The NVIDIA RTX 4070 topped the efficiency list, delivering substantial performance at a lower cost per TFLOPS compared to others.
Liquid AI's Foundational Model Resources: A user sought resources on the foundational principles behind Liquid AI's new model, aiming to build upon existing knowledge in math and C++.
- Recommendations included recent whitepapers related to LLMs and resources like Unity ML Agents for practical implementations.
Mathematical Foundations for Model Building: Discussions emphasized the importance of a strong mathematical background before embarking on model creation.
- Members shared various resources to aid those looking to deepen their understanding of the complex algorithms involved in training models.
Clarification on Diffusion Models Channel: Clarification was given regarding the correct channel for discussing topics related to Hugging Face Diffusion Models.
- It seems that some users mistakenly use the channel for LLMs, but it is specifically dedicated to diffusion-related discussions.

Link mentioned: GitHub - huggingface/diffusion-models-class: Materials for the Hugging Face Diffusion Models Course: Materials for the Hugging Face Diffusion Models Course - huggingface/diffusion-models-class

aider (Paul Gauthier) ▷ #general (169 messages🔥🔥):

Aider API interactions

O1 model performance

Using proxies with Aider

Sonnet's coding capabilities

Providing coding conventions

Issues with Aider's API interactions: Users are experiencing problems with Aider not reading from the .env file, leading to configuration challenges and overload errors with the Anthropic API.
- It's suggested that logging the LLM's conversation history may help diagnose the issues.
Performance of O1 models in Aider: There is skepticism regarding the performance of O1 models compared to Sonnet-3.5, particularly in editing and code refactoring tasks.
- Users are hopeful for future improvements that could enhance the interaction between Aider and O1 models once more features are integrated.
Using proxies in Aider: Users are exploring ways to integrate proxy settings with Aider, including configurations for Shadowsocks to facilitate connections.
- There is mixed success in getting proxies to work seamlessly, with users sharing their specific approaches and challenges.
Sonnet's handling of large functions: Concerns were raised about Sonnet's tendency to attempt replacing entire large functions when only small changes are required, leading to mistakes.
- Users expressed the need for LLMs to handle smaller code chunks instead of large blocks to reduce errors in code edits.
Incorporating coding conventions in Aider: The ability to provide Aider with relevant coding guidelines through pre-indexed documentation was discussed as a potential improvement.
- Users believe that this feature could enhance Aider's effectiveness as a pair programmer, ensuring that it adheres to specific coding standards.

Links mentioned:

aider (Paul Gauthier) ▷ #questions-and-tips (54 messages🔥):

Aider URL Scraping

File Management in Aider

Aider's Test-Driven Development Approach

Issues with Model Configuration

Function Renaming Errors

Stop Aider from Scraping URLs: Users expressed frustration with Aider asking for URLs they pasted unless they specifically used /web.
- To mitigate this, it was suggested to clarify preferences in the configuration, but definitive solutions were not provided.
Managing Files with Aider: It was noted that new files need to be added manually for Aider to recognize them, while edits to existing files will be noticed without committing.
- For optimal functionality, users were advised to run Aider outside of VSCode, maximizing its interactive terminal capabilities.
Test-Driven Development Practices with Aider: Several users discussed using Aider in a Test-Driven Development (TDD) context, preferring manual edits for test files.
- Recommendations included making files read-only to allow Aider to focus solely on implementing production code without altering tests.
Issues with Model Configuration and Token Limits: A user reported trouble with token limits while using Bedrock/Anthropic Claude 3.5, experiencing cuts mid-response.
- Attempts to adjust model settings via JSON files did not yield expected results, prompting further investigation into configurations.
Function Renaming and Linter Errors in Aider: Aider's attempt to rename a function resulted in incomplete updates across all code references, leading to undefined function errors.
- Despite being prompted to resolve linter errors, Aider only managed to fix a single occurrence, suggesting limitations in its search/replace functionality.

Links mentioned:

aider (Paul Gauthier) ▷ #links (13 messages🔥):

Anthropic's Contextual Retrieval

Chain of Thought by Cerebras

RAG challenges and solutions

Google CTR Booster Bot

AI Development Platforms competition

Anthropic reveals Contextual Retrieval method: Anthropic introduced a contextual retrieval method that enhances prompt caching, crucial for efficient LLM operations.
- One user noted that implementing this method should be a priority for projects like Aider, as it generally applies to all LLMs.
Chain of Thought raises questions: A member questioned if the Chain of Thought method actually improves performance, suggesting earlier reliance on training.
- Responses highlighted that the complexity of results necessitates specific tuning for intended applications.
RAG requires careful tuning for effectiveness: Discussions around RAG revealed its initial appearance of simplicity is misleading, as achieving optimal results is quite challenging.
- One member emphasized the importance of integrating full text search and re-ranking processes to derive decent outcomes.
Open source Google CTR Booster Bot launched: A user shared their Google CTR Booster Bot, made in Javascript to simulate human behavior for click-through rates.
- They noted that Aider assisted in building about 25-33% of the code, making the development process notably enjoyable.
AI platforms compete: Anthropic vs OpenAI: Members discussed the growing competition between Anthropic and OpenAI as leading AI development platforms.
- With continuous advancements, the need for simpler tools has become more pronounced amidst this competitive landscape.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #general (198 messages🔥🔥):

Qwen 2.5 support

Fine-tuning models

Model performance comparisons

Quantization techniques

Productivity tools

Qwen 2.5 Supported by Unsloth: Members confirmed that Qwen 2.5 is supported by Unsloth, although there are bugs with chat templates that the Qwen team is addressing.
- It better be, I'm training Qwen2.5-14b right now was mentioned by a user emphasizing the importance of support.
Fine-tuning Models and Data Size: It is suggested that for fine-tuning an LLM for text classification, models like Qwen 2.5 and BERT could be appropriate choices, especially with limited datasets.
- One user noted that they achieved 71% accuracy with Llama 8.1 8B, expressing a desire to improve this score.
Quantization Concerns: Discussion around quantization revealed that using unquantized models can provide better speed and throughput, particularly in batch processing scenarios.
- Members highlighted the trade-offs between speed, size, and cost when deciding whether to quantize models.
Productivity Tools Discussion: A user shared a link to MetaDock, a productivity tool that supports split screens and multi-login features, while others referenced FancyZones for window management.
- Are there any tiled window manager ever? was posed, leading to discussions of free alternatives.
Model Compatibility and Performance: Users discussed the performance differences between models, particularly noting that Qwen 2.5 might be faster than Llama 3.1 8B for certain tasks.
- Conversations indicated that external factors like compute environment and dataset size significantly impact model performance and training effectiveness.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #off-topic (7 messages):

AGI Progress

Pareto Principle in AI

Querying and Database Limitations

Economics of L3.1 Hosting

Insights on AGI Proximity: When studying complex topics, members voiced concerns about our distance from AGI, highlighting that it's not just about getting answers but explaining them effectively.
- This realization emphasizes the 80/20 rule, suggesting that the last step toward AGI is exceptionally time-consuming, with 60 years of investment already behind us.
Understanding AI Intelligence: A member expressed skepticism about the existing intelligence in AI, stating it's merely an advanced querying of a lossy database rather than true intellect.
- They elaborated that current AI capabilities are essentially curve fitting, not reflecting genuine understanding.
Cost-Benefit Analysis of Hosting L3.1: One member shared concerns about the economics of hosting 0.5 million documents (average 7k tokens) using L3.1 70b, debating between using runpod with vLLM or paying on a token basis to API providers.
- They reached out for insights on which option could be more economical for their extensive data analysis needs.

Unsloth AI (Daniel Han) ▷ #help (11 messages🔥):

phi-3.5-mini finetuning issue

TGI pre-quantized weights support

Using model.generate with chat history

LoRa weights loading crashes

Prediction loss evaluation in training

Workaround for phi-3.5-mini finetuning issue: A user reported encountering an AttributeError related to LongRopeRotaryEmbedding while attempting to fine-tune phi-3.5-mini using a max length over 4095, as detailed in this issue.
- This problem appears persistent despite following the suggested solutions, prompting the community to seek additional workarounds.
Uncertainty on TGI's support for pre-quantized weights: A query was raised about whether TGI supports loading pre-quantized weights, indicating potential compatibility issues.
- No definitive responses were provided to clarify this functionality.
Implementing model.generate with conversation history: To use model.generate with conversation history, one user recommended formatting the chat using structures passed to the tokenizer, citing the Hugging Face documentation.
- However, there was confusion from another user regarding the compatibility of their prompt with this method.
Crashes when loading LoRa weights: A member experienced crashes when loading LoRa weights with unsloth FastLanguageModel, while inference worked fine with PeftModel.
- The issue was suspected to be linked to possible missing dependencies, such as flash-attn.
Understanding prediction_loss_only parameter: Discussion about using prediction_loss_only = True for evaluations during the training loop aimed at reducing VRAM usage, however, members sought clarity on its exact functions.
- Particularly, questions were raised about whether this setting impacts solely the evaluation pass and it was noted that DataCollatorForCompletionOnlyLM is already in use to limit loss calculations.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #research (3 messages):

Contacting Authors

BART Model Behavior

Torchtune Activation Offloading

Memory Consumption Techniques

W&B Charts

Understanding BART's Input Mechanism: A member questioned the input format for input_ids and labels, noting that the BART model starts generation with an EOS token rather than the traditional BOS token.
- This behavior is deemed odd, but experimentation is planned to further investigate the implications.
Exploration of Torchtune's Activation Offloading: A link to the Torchtune GitHub repository was shared, highlighting its functionality in activation offloading.
- The feature, part of a native PyTorch library for LLM fine-tuning, is of keen interest for further exploration.
Insights from PyTorch Conference: A tweet was shared about a talk by Jane Xu at the PyTorch conference discussing techniques that reduce memory consumption in model training.
- Techniques mentioned include OffloadActivations, Activation Checkpointing, and various optimizers like AdamW4bit.
Involvement with W&B Charts: The member noted the nice visualizations presented in the Weights & Biases (W&B) charts during the talk, which assist in tracking these new memory-saving techniques.
- These charts play a crucial role in understanding performance metrics during experimentation.

Links mentioned:

CUDA MODE ▷ #general (6 messages):

Nvidia's Triton Inference Server

Google Cloud GPU VM Setup

BaM Reproduction Systems

Peer-to-Peer GPU Communication

Clarification on Triton Inference Server: A user clarified that they were referring to Nvidia's Triton Inference Server, not OpenAI's version.
- Another member mentioned that they have used it before for a small project, showcasing its practical application.
BaM Reproduction Systems Discussion: A user inquired about low-budget system recommendations for the BaM project, specifically regarding the Gigabyte G292-Z22 capabilities for peer-to-peer communication.
- They sought confirmation about whether the system supports both GPU-to-GPU and GPU-to-NVMe connectivity.
New Video Guide for Google Cloud Setup: A member shared a YouTube video guide detailing how to set up a Google Cloud GPU VM instance for deep learning, including instructions for PyTorch installation in VSCode via SSH.
- They mentioned finding the setup process tedious and aimed to help others in the community who are working on model training.

CUDA MODE ▷ #triton (9 messages🔥):

GroupNorm implementation

Performance challenges

Memory optimization strategies

Triton kernel adjustments

YouTube recording inquiry

GroupNorm Implementation Progress: A member is implementing GroupNorm in Triton and has achieved working forward/backward propagation but faces performance drops on certain tensor sizes like 512 x 64 x 64 with num_groups=32.
- They hypothesize this drop is due to the T4 GPU's memory bandwidth limitations and are seeking suggestions to enhance performance.
Performance Optimization Insights: Another member suggested that the implementation requires reading input X twice: once for calculating statistics and again for normalization, which is essential for larger tensors.
- They pointed out that using a large BLOCK_SIZE might reduce occupancy; 256 is typically a better value for memory-bound kernels like this.
Reducing BLOCK_SIZE for Efficiency: The initial member acknowledged that lowering the BLOCK_SIZE improves performance, citing successful benchmarks at sizes of 4096 and 16384.
- They plan to rewrite the kernel with these optimizations in mind, especially for larger tensor scenarios.
Challenges with Large Tensors: The conversation highlighted that larger spatial dimensions (like 128x128 or 512x512) prevent loading an entire group into SRAM, complicating mean/std calculations.
- The member described their current implementation, which uses a for loop to compute statistics but has encountered a bottleneck in memory bandwidth.
Inquiry on YouTube Recording: A member mentioned that there should be a recording available on YouTube relevant to their discussions, but they had yet to confirm this with the organizers.
- They expressed intent to reach out to the organizers the following day to confirm the availability of the recording.

Link mentioned: Google Colab: no description found

CUDA MODE ▷ #torch (3 messages):

Model Optimization Talks

Flash Attention Implementations

Seeking Recommended Talks on Model Optimization: A member inquired about model optimization talks, recalling a previous discussion and asking for relevant links or recommendations.
- This prompted others to consider past contributions made by members, such as Driss' multiple PRs related to this topic.
Flash Attention Implementation Insights: Another member mentioned the availability of a flash attention implementation from cuDNN, highlighting its speed advantages over FA3.
- They indicated that this is a noteworthy development in the ongoing efforts to optimize model performance.

CUDA MODE ▷ #algorithms (2 messages):

Llama2-7B Training

FP8 Precision

SwiGLU Activation Issues

Optimization Techniques

Intel's Llama2-7B trained with FP8 precision: Intel has successfully trained the Llama2-7B model on a dataset of 2 trillion tokens using FP8 precision, marking a 20-fold increase over previous limits. The paper discusses instabilities in FP8 training, traced to outlier amplification by the SwiGLU activation function, and introduces a Smooth-SwiGLU modification to ensure stability.
- This shift not only affects activation stability but also allows for the FP8 quantization of both Adam optimizer moments, showcasing improved management of model parameters.
Random Projection to Soothe Activations: One member proposed exploring random projection techniques to smooth out activations and reduce dimensionality, suggesting that if activations are sparse enough, this could mitigate issues with outliers. They noted the use of Hadamard transform in existing models like Quip# and Quarot for similar purposes.
- This opens a discussion on how dimensionality reduction methods could influence the stability and performance of large language models.

Link mentioned: Scaling FP8 training to trillion-token LLMs: We train, for the first time, large language models using FP8 precision on datasets up to 2 trillion tokens -- a 20-fold increase over previous limits. Through these extended training runs, we uncover...

CUDA MODE ▷ #cool-links (1 messages):

iron_bound: https://wunkolo.github.io/post/2024/09/gpu-debug-scopes/

CUDA MODE ▷ #off-topic (4 messages):

Yudkowski's Rationality

Nous Research Merch

Daily dive into Yudkowski's 'Rationality: From AI to Zombies': A member decided to take notes on a single chapter of Yudkowski's Rationality: From AI to Zombies each day.
- They expressed skepticism about its usefulness if read quickly, suggesting it requires thoughtful engagement.
Nous Research Merch Drop Praise: Members enthusiastically discussed the recent Nous Research merch drop, highlighting the item they received: Decentralize T.
- However, one member noted the $90 hoodie was quite expensive, expressing a desire to buy it despite the price.

CUDA MODE ▷ #irl-meetup (3 messages):

Latent Space server

San Francisco meeting spots

Latent Space Server Insights: A member shared a link to a post titled San Francisco meeting spots found in the Latent Space server.
- This post may be useful for planning future meetups in San Francisco.
Uncertain Plans for Discussions: One member inquired about another's future plans, suggesting interest in potential discussions.
- The other member replied, stating that there are currently no plans as it stands.

CUDA MODE ▷ #llmdotc (17 messages🔥):

L2 Side Aware optimization

Stochastic rounding hack

CI support for Llama 3

Travel fatigue

Friend requests on Discord

Exploring L2 Side Aware Optimization: A member plans to write an article addressing L2 Side Aware optimization and NVIDIA's memory hierarchy, preferring to focus on it once the WiFi improves.
- They noted that doing development via SSH in the current conditions would be quite painful.
Cool Stochastic Rounding Hack Idea: One member introduced a stochastic rounding hack that forces some mantissa bits to zero to save power, describing it as a pretty cool idea.
- This hack could potentially provide a more efficient approach to computation.
Collaborating on LLM.c Ideas: There was a discussion about writing an ultra-vague 'llm.c on GH200 and/or Llama3.1' megahack idea, with some members expressing discomfort in tackling it.
- They invited others in the group to contribute if they felt more comfortable with the task.
Thanking CI Team for Event Support: One member expressed appreciation for the involvement of the AAA+E team at the upcoming event while admitting they are currently in Chicago.
- They thanked the members for their collaboration and offered to provide CI updates for Llama 3 in the future.
Travel Fatigue Strikes Again: Several members shared their travel experiences, with one stating they managed only 7 hours of sleep but felt fine after a long walk.
- Discussions also included light-hearted comments about the challenges of managing sleep while traveling.

CUDA MODE ▷ #bitnet (9 messages🔥):

Compression Methods for LLMs

Product Quantization Techniques

BitNet Training Implementation

Efficiency of Quantization

Memory Optimization Strategies

New Insights on LLM Compression Strategies: Recent discussions highlighted the effectiveness of data-free compression methods achieving 50-60% sparsity, thereby reducing memory footprints for LLMs while maintaining performance metrics like perplexity.
- Knowledge-Intensive Compressed LLM BenchmarK (LLM-KICK) was introduced to redefine evaluation protocols, significantly aligned with dense model counterparts.
Debating Product Quantization Methods: Concerns were raised regarding product quantization methods achieving similar compression ratios as bitnets, with varied opinions on effectiveness for 2-bit and 3-bit models.
- One member pointed out that fine-tuning a 2-bit model can yield positive results, especially using approaches like LoRAs or hqq+ style.
Efficiency of Quantization Methods questioned: A member critiqued the efficiency of current quantization methods, noting that on an H100, processing a Llamav2-70B model can take between 3 to 11 hours, indicating slow performance.
- Another participant humorously remarked that this range suggests a misunderstanding of what efficiency entails.
BitNet Training Tweaks Making Progress: Progress on BitNet training was reported, mentioning the integration of int8 mixed-precision training and changes in quantization methods, which signal potential improvements.
- A mini debug run showed promise, indicating that the modified approach to forward pass quantization could lead to better speed and performance.
Exploring Memory Optimization Techniques: Discussion included utilizing A8W2 from gemlite for memory reduction in quantization strategies, while also exploring the impact on speed relative to A8W8.
- Members shared insights on balancing memory optimization with processing speed, suggesting that the adjustments could cater specifically to resource efficiency.

Links mentioned:

CUDA MODE ▷ #webgpu (2 messages):

Web AI Summit 2024

Exciting Web AI Summit 2024 Announcement: A member shared a link to the Web AI Summit 2024, expressing interest in attending the event.
- They invited others to join them, suggesting a catch-up during the summit.
Upcoming Gathering at Web AI Summit: A member announced their plans to attend the Web AI Summit and encouraged others to meet up.
- The invitation hints at creating an opportunity for networking and discussions around Web AI.

Link mentioned: Web AI Summit 2024: no description found

CUDA MODE ▷ #cudamode-irl (32 messages🔥):

Hackathon Invitations

Finding Teams

PMPP Book Signing

Project Ideas for Hackathon

Parking Options

Hackathon Invitations are on the way: <@sr2132> confirmed that both QQ and Ishan are approved for the hackathon and invitations should be received by tomorrow.
- Participants are encouraged to share their teammates' details if they haven't received their invites.
Self-organize teams for the hackathon: <@marksaroufim> advised participants to self-organize teams by reviewing ideas posted in the forum or by talking to session leads before the event.
- This process is intended to streamline team formation and maximize time for coding during the hackathon.
PMPP Book Signing Opportunity: Attendees have the unique chance to get the PMPP book signed by author Prof. Wen-mei Hwu during the CUDA-MODE IRL event.
- Participants are reminded to bring their books for signatures.
Project Ideas Encouraged for Hackathon: <@andreaskoepf> encouraged attendees to think about potential projects to hack on and to look at the forum for existing ideas.
- He highlighted that small teams should ideally consist of 2-4 members, but solo projects are also welcome.
Parking Suggestions for the Event: Participants are advised to consider taking an Uber to the CUDA-MODE IRL event instead of driving.
- <@marksaroufim> suggested that parking may not be the best option, and another member inquired about available parking.

CUDA MODE ▷ #liger-kernel (89 messages🔥🔥):

assert_verbose_allclose Bugs

KL Divergence Issues

RMSNorm Fixes

Triton Kernel Constraints

Bugs in assert_verbose_allclose: Multiple members identified bugs related to the assert_verbose_allclose function and its incorrect behavior when dealing with certain inputs.
- A proposed fix was shared through Pull Request #261 aimed at resolving these bugs across different scenarios.
KL Divergence calculation mysteries: Concerns were raised about the KL Divergence (kldiv) calculations producing unexpected results, especially for large input sizes beyond 16384.
- Notable differences in results between the standard implementation and the LigerKLDIVLoss function were observed, leading to discussions on potential fixes.
RMSNorm related adjustments: Members discussed the RMSNorm implementation, confirming it works correctly but noting minor numerical stability issues.
- A detailed analysis of accessing weights incorrectly and potential adjustments to improve efficiency was proposed.
Triton kernel constraints and improvements: Discussion highlighted limitations with Triton processing larger kernel sizes without crashes, particularly a 64kb limit.
- Members suggested changing grid sizes to optimize performance based on insights from the Triton tutorial and comparing approaches taken in the cross_entropy implementation.
Recommendations for debugging: Members recommended using torch.allclose for testing instead of custom assertions for better debugging of loss functions.
- A consensus emerged that aligning the calculations of KL divergence to those used in cross-entropy could resolve discrepancies.

Links mentioned:

CUDA MODE ▷ #irl-sponsor-qa (9 messages🔥):

Modal

PrimeIntellect

Lambda Cloud

CUDA workflows

Exploration of Modal and Free GPU Access: A member is investigating Modal, PrimeIntellect, and Lambda Cloud for accessing a 1 H100 machine for free tonight and tomorrow to prepare for the weekend.
- Is there any way this is possible?
Modal’s Serverless Functionality Questioned: There is a belief that Modal functions as a 'serverless' and hassle-free deployment service but it likely does not provide direct SSH access to machines.
- Nonetheless, Modal offers some free credits upon account creation.
Getting Started with CUDA on Modal: A recommendation was made to check out this GitHub repository for sample code aimed at initiating CUDA workflows on Modal.
- The Jupyter Lab example in the repo is highlighted as a useful interface that includes shell access.
Launching VSCode on Modal: Another way to utilize Modal was suggested through the command modal launch vscode --gpu h100, allowing users to attach a --volume or --mount to save their work.
- This suggestion aims to facilitate the development workflow while using Modal.
Feedback on Modal Experience: After receiving helpful commands from the community, the initial user expressed gratitude and stated, Tysm! Trying this now! as they test Modal functionalities.
- Another member encouraged feedback on the experience, mentioning they don’t frequently use that workflow.

Link mentioned: GitHub - charlesfrye/cuda-modal: Enter CUDA MODE on Modal: Enter CUDA MODE on Modal. Contribute to charlesfrye/cuda-modal development by creating an account on GitHub.

CUDA MODE ▷ #metal (4 messages):

Apple ML Framework

MLX Platform

Metal Backends

Apple's Specialized ML Framework: A member noted that the framework is specialized for Apple computers, employing specific techniques like autodiff, vmap, and JIT compiling for better performance on Apple silicon.
- This approach is more akin to PyTorch due to its tailored kernel development as opposed to Triton.
Introducing MLX: A NumPy-like Platform: Another member shared that MLX is a NumPy-like platform built from the ground up, indicating its focus on optimizing performance.
- They emphasized its unique structure, separating it from traditional libraries.
Metal Backends in MLX: The discussion highlighted that MLX includes its own metal backends, notably 'steel', and utilizes Metal Performance Shaders (MPS).
- This integration helps enhance performance and leverages the capabilities of Apple's hardware.
Lazy Evaluation Improves Performance: It's noted that MLX handles operations lazily, only evaluating on distinct calls, which leads to improved overall performance.
- This technique minimizes unnecessary computations, optimizing resource usage.

Perplexity AI ▷ #general (164 messages🔥🔥):

Perplexity Pro Subscription Issues

o1 Mini Model Performance

Using AI Models for Coding

Prompting Techniques for AI Models

Pro Search Features in Perplexity

Troubles with Perplexity Pro: New users expressed confusion over using Perplexity Pro for tasks like tailoring resumes, suggesting that other tools like ChatGPT might be more effective.
- There are ongoing discussions about whether existing Pro account holders can apply new Xfinity rewards codes after downgrading their subscriptions.
Mixed Results with o1 Mini Model: Users reported varied experiences with the o1 Mini model in Perplexity, noting that while it handles some tasks well, others yield basic responses lacking depth in reasoning.
- Users compared o1 Mini's performance unfavorably against models like Claude Sonnet 3.5, emphasizing the need for specific prompting techniques.
AI Models for Coding: Several users discussed using the latest AI models for coding tasks, highlighting o1 Mini as a potential tool but noting its limitations in more complex projects.
- The integration of internet search capabilities and real-time feedback mechanisms are seen as crucial for enhancing coding performance in AI models.
Effective AI Prompting Techniques: Engagement revolved around the best practices for prompting AI models, with users suggesting that simpler, clearer prompts yield better results.
- The importance of testing and reviewing outputs is emphasized as part of the process to understand model capabilities better.
Future of AI and Neuralink: Discussions touched on the potential impact of advancements like Neuralink, suggesting a future with smarter AI and enhanced human capabilities.
- Contrasting views emerged about the current state of AI and the ethical implications of creating artificial intelligence that could surpass human intelligence.

Links mentioned:

Perplexity AI ▷ #sharing (14 messages🔥):

AI benefits

Israel-Lebanon tensions

Linkin Park legacy

Cooking chicken keraguen

AI innovator profiles

Exploring AI Benefits: A link shared discusses the benefits of AI, providing insights into the positive impacts AI has on various sectors.
- Key points include efficiency improvements and enhanced decision-making capabilities.
Current Israel-Lebanon Tensions: A discussion centered around the ongoing tensions between Israel and Lebanon using the latest insights from CPLX version 0.3.0.
- The conversation pointed to historical context and current implications on regional stability.
Gripping Linkin Park Legacy Discussion: A deep dive into Linkin Park's legacy reveals the band's cultural impact and ongoing influence.
- Members shared personal reflections on the band's music, facing emotional narratives and its resonance with fans.
Cooking Chicken Keraguen Guide: A link about how to cook chicken keraguen offers step-by-step recipes and tips.
- This dish draws from rich cultural traditions, providing a flavorful cooking experience.
AI Innovator's Insights Unpacked: A look into AI's shaping through the experiences of innovators shows how ideas evolve into impactful tools.
- Participants discussed the challenges faced by innovators in a rapidly changing tech landscape.

Perplexity AI ▷ #pplx-api (10 messages🔥):

Changes to Perplexity API

Sonar vs. Llama-3.1 Model Performance

Beta Features Access

Search Recency Filter

API Limitations on Output

Perplexity API site updates: The models page on the Perplexity API site has been updated, moving away from the outdated 'pplx' models now referred to as Sonar.
- Pplx models are outdated and unsupported, so users are encouraged to switch to Sonar.
Sonar performance issues compared to web app: A user reports experiencing significantly worse results with the llama-3.1-sonar-large-128k-online model compared to the Perplexity web application, particularly in formatting responses.
- They noted specific problems with output quality, stopping prematurely, and difficulties in adhering to specific prompting instructions.
Accessing beta features for citations: A user inquired about obtaining access to the return_citations beta feature for the API, which requires a specific application process.
- They were advised to reach out via email at [email protected] for further assistance and guidance.
Understanding search_recency_filter availability: There was a question regarding whether the search_recency_filter feature is part of the closed beta or if it is available to all users.
- This feature could help gather recent information from sources published within an hour when set appropriately.
Proposed process for improving output quality: A user suggested a multi-step process to improve output quality by caching links and formatting text across several runs to ensure accurate referencing.
- They expressed interest in understanding why the web application seems to outperform the API model and whether using GPT could enhance the process.

Links mentioned:

Stability.ai (Stable Diffusion) ▷ #general-chat (132 messages🔥🔥):

Pony and XL Model Comparison

Flux Model Capabilities

Issues with SDXL and Flags

Using ComfyUI Efficiently

Inpainting and Erasing Models

Pony and XL Models are Similar: Pony XL is essentially a refined version of the original model, but there are concerns about its text encoder layer disalignment with other embeddings, leading to confusion about its classification as a separate model.
- One user compared the hype around Pony to tulip mania, suggesting there are better options available in SDXL depending on specific needs.
Flux Model's Performance Highlighted: Flux has reportedly overcome initial hurdles including resource requirements and speed, establishing itself as a leading model in terms of prompt adherence and image quality.
- Despite some community complaints regarding aesthetic similarities in generated images, users are optimistic about Flux's potential to maintain superiority in model performance.
Challenges with SDXL and Flags: Users noted that both SDXL and SD3M struggle with rendering country flags and common symbols effectively, raising questions about the models' capabilities in this area.
- One user suggested training a Lora model specifically for flags to enhance SDXL's efficacy in generating accurate depictions.
Effective Use of ComfyUI: There was a discussion regarding how to optimize ComfyUI workflows in the cloud, with suggestions to use serverless platforms or explore options like Backblaze for model storage.
- Users expressed interest in maximizing VRAM utilization across multiple GPUs, seeking advice for optimizing performance and efficiency in their workloads.
Inpainting and Erasing Model Absence: A user raised a question regarding missing inpainting and erasing models in IOPaint, discovering that command prompt options were required to access these features.
- This led to discussions about how command-line parameters can affect the availability of certain models and functionalities within various UI setups.

Links mentioned:

Nous Research AI ▷ #general (73 messages🔥🔥):

Upscaling Videos with AI

Music Production Chatbot

Forge Technology

Hermes 3

Consciousness in AI

Upscaling Videos Made Easier: A member suggested using video2x to upscale videos by processing each frame through an upscaling model.
- Another member contemplated decreasing the frame rate to reduce the workload before upscaling, although they expressed uncertainty about the resulting video quality.
Feline AI Revolutionizes Music Production: One user discussed their creation of a feline AI chatbot focused on music production, which can generate MIDI files and recommend synthesis methods.
- Despite its current limitations, they aim to move it to Llama for improved performance, highlighting its grasp of time signatures and musical styles.
Interest in Forge Technology Grows: A member inquired about the functionality of Forge and its relation to other models like Hermes and the World Sim.
- Another member linked a Discord message that may provide insight into the capabilities and features of Forge.
Exploring Hermes 3 Accessibility: Members discussed where to try Hermes 3, with a link to OpenRouter being provided for exploration.
- The conversation included opinions on Hermes 3's overall performance and capabilities in handling up-to-date information.
Philosophical Musings on AI Consciousness: A user brought up a peculiar paper regarding consciousness represented as a gradient in intelligence manifolds, suggesting skepticism about its validity.
- This led to a discussion about AI's understanding of concepts like music theory and the potential for models to be trained in more sophisticated ways.

Links mentioned:

Nous Research AI ▷ #ask-about-llms (29 messages🔥):

Hermes-3 functionality

RAG for maritime law chatbots

Using rules in RAG

Together AI cost-effectiveness

Seeking help with Hermes-3 model function arguments: A user asked for assistance in adding a function with multiple argument types to the Hermes-3 model in their repository, specifically in functions.py. Another suggested reverting to using OpenAI's tool via Langchain as a design decision that may restrict argument types.
- They were referred to another member who might provide further help once available.
RAG recommended for maritime law chatbots: Discussion focused on using Retrieval Augmented Generation (RAG) for building a chatbot aimed at maritime shipping and law, as finetuning could be less effective. Members expressed that referencing specific legal documents through RAG is essential to avoid hallucination.
- A rich resource link was shared, highlighting how to implement RAG using Langchain to tackle Q&A applications effectively.
Exploring RAG for rule-based applications: A user inquired about adapting RAG for rule-based operations in a multiplayer text game, highlighting challenges in retrieving applicable rules under varying conditions. They pondered how general terms in rules could impact rule retrieval and application in practical scenarios.
- Another member explained how to create an API server interfacing with Google Sheets to manage the rules efficiently, suggesting the combination of rule management and fine-tuning the model for specific prompts.
Cost-effective AI Models with Together AI: A user noted the affordability of using Together AI models, considering their application within RAG functionalities. There was a general agreement on the potential benefits of using Together AI for building financial applications due to low costs and efficient performance.
Setting rules across different conversations: Discussion included concerns about implementing consistent instructions across different conversations, particularly in the context of a MUD. A user remarked that current models may struggle to handle complex, context-carrying instructions persistently.

Links mentioned:

Nous Research AI ▷ #research-papers (3 messages):

ReST-MCTS Paper

Iteration of Thought Framework

ReST-MCTS paper gains attention: A participant noted that the ReST-MCTS paper, which combines STaR, PRM, and MCTS, seems underappreciated despite its seamless integration of methodologies.
- This paper could lead to insightful discussions about its overlapping concepts with other works.
New Iteration of Thought Framework Proposed: The Iteration of Thought (IoT) framework aims to enhance LLM responses through dynamic prompt generation, unlike traditional approaches like CoT or ToT.
- It includes an Inner Dialogue Agent (IDA) to create context-specific prompts, and adapts reasoning based on evolving conversational contexts.

Links mentioned:

Nous Research AI ▷ #interesting-links (2 messages):

Promptriever

Twitter Insights

Promptriever: A New Era in Dense Retrieval: Introducing the Promptriever, the first dense retrieval model that can be prompted like a language model, offering a fresh perspective on information retrieval.
- Its capacity to be prompted brings a blend of charm and curiosity, raising questions about whether it's cursed or beautiful.
Mysterious Twitter Insight: A tweet from @unknown sparked intrigue but left context undefined, engaging the community with its ambiguous allure.
- Participants pondered the tweet's meaning, reflecting on its ability to intrigue and mystify without clear explanation.

Links mentioned:

Nous Research AI ▷ #research-papers (3 messages):

ReST-MCTS

Iteration of Thought framework

Large Language Models engagement

ReST-MCTS Paper Gains Attention: A member highlighted the ReST-MCTS paper* as underappreciated, mentioning its effective blend of STaR, PRM, and MCTS methodologies.
- They noted that the paper combines these techniques in a seamless way, urging others to explore its insights.
Introduction of Iteration of Thought Framework: A new framework called Iteration of Thought (IoT) was proposed for enhancing LLM responses through adaptive conversation methods.
- The framework consists of three components, including an Inner Dialogue Agent to generate context-specific prompts, aiming to improve the thoughtfulness of LLM interactions.

Links mentioned:

Latent Space ▷ #ai-general-chat (38 messages🔥):

Hyung Won Chung's MIT Talk

OpenAI Hiring for Multi-Agent Research

Advancements in Devin

Improvement Techniques in RAG

GitHub Copilot Updates

Hyung Won Chung presents new paradigms: Hyung Won Chung shared his MIT talk discussing a paradigm shift and the recent launch of o1, which he believes exemplifies a new paradigm in the field.
- He emphasized the timely nature of the talk in relation to significant advancements in understanding AI.
OpenAI seeks ML Engineers for Multi-Agent Research: OpenAI is currently hiring ML engineers for a new multi-agent research team, viewing this area as pivotal for enhancing AI reasoning details here.
- They noted that prior experience with multi-agent systems isn't required, encouraging interested candidates to apply.
Devin shows improvements and community feedback: Recent updates revealed that Devin is now faster and more accurate in code edits, and has improved support for enterprise security requirements source.
- However, user feedback ranged from frustration at its capabilities to admiration for its demo performances.
New Techniques in RAG Applications: A new proposal from Anthropic on contextual retrieval suggests a reduction in incorrect chunk retrieval rates by up to 67% link.
- Participants noted the increase in strategies designed to improve retrieval-augmented generation (RAG) and their efficacy.
GitHub Copilot's Model Confusion: Users speculated about the standard of models used in GitHub Copilot, with claims that it employs GPT-4o, prompting questions about performance source.
- Conversations highlighted the significance of understanding context and how performance varies across different AI tools.

Links mentioned:

Latent Space ▷ #ai-announcements (1 messages):

swyxio: new pod is up on SOTA Prompting! https://x.com/latentspacepod/status/1837206370573041758

Latent Space ▷ #ai-in-action-club (53 messages🔥):

Cursor usage

Emoji reactions issues

Discord message editing problems

Cody and Claude alternatives

Zoom meeting link

Cursor dominates the conversation: Members are primarily discussing their experience with Cursor, with many expressing satisfaction and a desire to learn more about its features.
- One member remarked that their workflow is so refined with Cursor that trying other tools is challenging.
Emoji reactions not functioning: Several participants voiced issues with emoji reactions not working properly in Discord, prompting discussions on potential fixes.
- A member indicated they are restarting the app to resolve emoji reaction problems.
Frustrations with Discord functionality: Issues with editing messages in Discord have been reported by multiple members, sparking conversations about Discord's reliability.
- A user noted they are unsure if the problem lies in their system or is a Discord-wide issue.
Alternatives to Phind discussed: One member indicated they stopped using Phind because they found better alternatives such as Cody and Cursor.
- This shift highlights the ongoing exploration of tools aimed at enhancing productivity within AI applications.
Zoom meeting coordinates shared: A user shared a Zoom meeting link along with the meeting ID and passcode for others to join.
- The link was shared as part of the ongoing chat, reflecting collaboration among members.

Link mentioned: Join our Cloud HD Video Meeting: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom ...

OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

Chatroom improvements

New Model Releases

Hermes 3 Pricing Update

Chatroom introduces editable messages: The chatroom now features editable messages, allowing users to edit their messages or the bot's responses by clicking the regenerate button for new replies.
- Additionally, the stats have undergone a redesign, enhancing user interaction.
Qwen 2.5 raises the bar in AI models: Qwen 2.5 72B boasts improved capabilities in coding and mathematics, and features an impressive 131,072 context size. More details can be found here.
- This model demonstrates a significant knowledge leap, setting a new standard in AI capabilities.
Mistral Pixtral debuts as multimodal model: Mistral Pixtral 12B marks Mistral's first venture into multimodal AI, with a free variant accessible to users as well. Details can be explored here.
- This introduction represents a step forward in providing versatile AI solutions through multimodal functionality.
Upgrade announcement of Neversleep Lumimaid: Neversleep Lumimaid v0.2 8B is now a finetune of Llama 3.1 8B, offering a HUGE step up dataset wise compared to its predecessor. More information is available here.
- This upgrade showcases a commitment to enhancing dataset quality for improved model performance.
Hermes 3 goes paid but retains a free variant: Hermes 3 is transitioning to a paid model at $4.5/month, although a free variant remains available for the time being. Details can be accessed here.
- Overall, this shift reflects the ongoing evolution of model offerings in the marketplace.

Links mentioned:

OpenRouter (Alex Atallah) ▷ #general (79 messages🔥🔥):

Frontend for OpenRouter

SillyTavern functionalities

Model pricing changes

Integration API issues

Feature requests and further developments

Recommendation for AI Chat Frontend: A user sought recommendations for a frontend to chat with AIs on OpenRouter, mentioning the need for sharing conversations across PCs. Another member pointed to every-chatgpt-gui as a comprehensive frontend option.
- It was noted that while SillyTavern is good for role-playing, it has syntax highlighting for code snippets but may not be optimal for coding tasks.
Recent Pricing Changes for Hermes Model: A reminder was given about the price increase of nousresearch/hermes-3-llama-3.1-405b from free to $4.5/M, which surprised users reliant on free credits. Users voiced concerns about lack of notifications regarding the sudden pricing changes.
- Information was shared indicating that cached tokens were not reflected in usage stats due to OpenRouter's limitations on forwarding caching details.
Integration API Challenges: A user encountered issues with their integration API key not functioning correctly, despite assurances that Lambda's services remained free. It was suggested to check the activity page to verify prompt caching effectiveness.
- Various users exchanged tips on utilizing caching and provided updates on feature implementations to better assist API usage.
Discussion on Model Performance: Users compared the performance of different models, notably mentioning that deepinfra qwen72b was slow (5-8 tok/s) while hyperbolic was significantly faster. Some questioned the scalability and intended use of OpenRouter for larger applications.
- Several users reported successes using OpenRouter for extensive token usage, highlighting its potential for both personal and broader application contexts.
Feature Requests and Future Developments: Inquiries about submitting feature requests were made, with directions provided to a specific channel. Discussions also touched on the need for more robust sharing options for chat histories.
- The community anticipates upcoming updates regarding feature enhancements, including functionality around caching and user engagement.

Links mentioned:

OpenRouter (Alex Atallah) ▷ #beta-feedback (1 messages):

Custom API Integration

Private LLM Servers

Request for Custom API Key Integration: A member proposed the feature to add a custom OpenAI compatible URL / API KEY endpoint to facilitate integration with private LLM servers.
- This request highlights the need for more flexible integration options to support varied user environments and deployments.
Discussion on Integration Flexibility: Several members expressed the importance of allowing custom endpoints in current and future integrations for broader compatibility.
- The sentiment was shared that enabling these features could enhance overall user experience by accommodating diverse system architectures.

LlamaIndex ▷ #blog (2 messages):

RAG integrations

Opik partnership

RAGApp v0.1 release

Opik Autologs RAG Calls: The team is excited to announce a partnership with Opik which autologs all RAG/agent calls and traces for both development and production environments.
- Opik streamlines the auth process with automation, simplifying user experiences in multi-step workflows.
RAGApp v0.1: No-Code Multi-Agent Apps: Today’s launch of RAGApp v0.1 enables users to build multi-agent applications without writing any code.
- Users can freely add agents, assign roles, set system prompts, and utilize various tools to enhance their applications.

LlamaIndex ▷ #general (68 messages🔥🔥):

LlamaIndex and Pinecone Integration

Pandas Query Engine Behavior

Graph RAG Query Issues

Gemini LLM Error

Contextual Retrieval Features

Issues with LlamaIndex and Pinecone IDs: Users reported difficulties with ID control in Pinecone after using LlamaIndex, noting that Pinecone auto-generates IDs making deletions cumbersome.
- Suggestions included creating nodes or editing IDs manually, and it was pointed out that the current deletion limitations necessitate the use of prefixes.
Inconsistent Queries in Pandas Query Engine: A user observed discrepancies between query outputs in a notebook and a Python script using the Pandas Query Engine, leading to issues when utilizing df.head().
- Modifying df.head() to df.head(1) resolved the problem, indicating that the number of columns in the DataFrame could affect query parsing.
Graph RAG Query Difficulties: A user using Graph RAG faced issues with querying the index, with the supplied pattern not matching retrieved chunks despite using the same notebook.
- Investigations into the GraphRAGQueryEngine's patterns suggested mismatched expectations during fetching.
Gemini LLM Compatibility Errors: Users encountered 'AttributeError: to_dict' while utilizing the Gemini LLM, pointing to potential incompatibility with certain library versions.
- Suggestions included downgrading versions and the possibility of raising a pull request to address observed issues.
Contextual Retrieval and Hybrid Retrieval in LlamaIndex: LlamaIndex supports contextual metadata extraction akin to Anthropic's contextual embeddings, allowing users to enhance indexing with summaries and question answers.
- The addition of BM25 retrieval methods and the use of QueryFusionRetriever was also discussed, emphasizing the integration of multiple retrieval strategies.

Links mentioned:

LlamaIndex ▷ #ai-discussion (1 messages):

stk_vnluser: yep

OpenAI ▷ #ai-discussions (59 messages🔥🔥):

o1 vs 4o performance

GPT task efficiency and reasoning

Local server memory implementation

AI development feedback

AI consciousness discussion

o1 mini perceived as inferior to 4o: Users expressed that o1 mini feels like it lacks real-world experience and doesn't exhibit intelligent reasoning, asserting that it merely types faster than 4o.
- One user mentioned that o1 seems to feel nothing more than 4o does, prompting discussions about the AI's actual cognitive capabilities.
Debate on AI reasoning and consciousness: A lengthy debate ensued regarding whether AI can genuinely reason or just simulate it, with varying opinions on AI's capacity for intentionality and understanding.
- One participant suggested that an AI focused solely on task completion, ignoring human-like reasoning, might be safer and more efficient.
Memory capabilities of GPT-4 API: Users inquired if memory features are available for GPT-4 through the API, clarifying that these features are currently exclusive to ChatGPT Plus users within the interface.
- One user noted it's easy to implement their own memory tools using alternatives like Pinecone, despite the ChatGPT interface lacking this functionality.
Request for feedback on IDE integrations: Suggestions were shared regarding improving AI tools integrated within IDEs, with discussions about the need for live previews to enhance workflow, similar to features in ClaudeAI.
- Several users expressed a desire for ChatGPT to incorporate this functionality, while others recommended exploring various IDEs for better integrations.
AI’s capability limitations discussed: Conversations highlighted skepticism about AI’s ability to handle reasoning in out-of-distribution scenarios and the challenges of constructing a universally-aligned ethical framework.
- Participants pointed out that any alignment strategy under a single ethical truth could inadvertently lead to broader issues or biases, emphasizing the complexity of AI interactions.

OpenAI ▷ #gpt-4-discussions (1 messages):

null.user: hmm

OpenAI ▷ #prompt-engineering (4 messages):

Effective Prompts

ChatGPT Usage

Sharing Useful Prompts: A member shared a useful prompt they crafted in the past, stating it is still relevant today. They linked to their prompt guide for others to explore.
- This act encourages community members to exchange valuable resources and prompts to enhance their experience with GPT models.
Prompt Understanding Improvement: Another member mentioned that a prompt shown in a screenshot effectively captures the intended idea. They expressed optimism that it helps in grasping the concept better.
- This highlights the significance of visual aids in understanding how to frame prompts effectively.

OpenAI ▷ #api-discussions (4 messages):

Prompt sharing

GuideGPT utility

Looks like it works!: A member mentioned that a prompt shown in a screenshot seems to capture the intended idea effectively.
- This suggests that visual examples can enhance understanding of prompt structure.
Appreciation expressed!: Another member expressed gratitude for the assistance received, showcasing the supportive atmosphere.
- Such acknowledgment fosters community engagement.
Sharing a useful prompt: A member shared a previously written prompt that they found particularly useful.
- This indicates a culture of collaboration and resource sharing within the community.
GuideGPT link shared: A direct link to a GuideGPT prompt was provided, allowing others to access it easily.
- Sharing such resources promotes the exploration of effective tools within the group.

Eleuther ▷ #research (46 messages🔥):

Model Initialization Techniques

Iterated Distillation and Amplification

Challenges in FP8 Training

Llama1 Checkpoints Status

Overcomplete Color Space in Image Models

Exploring HyperCloning for Model Initialization: A discussion on utilizing HyperCloning to initialize large language models using smaller pre-trained models, potentially saving training time and improving accuracy.
- One contributor suggested the approach of training a mini model, scaling it up, and then distilling from the larger model as a way to optimize compute usage.
Iterated Amplification Gains Popularity: The concept of Iterated Distillation and Amplification (IDA) was recognized for its potential to align AI systems effectively through iterative processes.
- One participant expressed skepticism about the term 'distillation', arguing that it doesn’t capture the necessary properties of compression and the purpose of discarding information.
Difficulties Encountered with FP8 Training: The conversation highlighted critical instabilities discovered in FP8 training due to outlier amplification by the SwiGLU activation function during long training runs.
- An audience member questioned whether other activation functions like relu^2 might face similar issues in prolonged training scenarios.
Status of Original Llama1 Checkpoints: Inquiries were made about the status of the original Llama1 checkpoints, with insights shared about their access and potential leaks.
- One user noted that the checkpoints were not uploaded publicly and referenced a specific Hugging Face repository containing leaked versions.
Using Overcomplete Color Spaces: An intriguing proposal was raised about whether overcomplete color spaces might enhance the performance of image models through redundant input representation.
- The idea was philosophically linked to the concept of qualia, suggesting a similar approach could be beneficial for language models as well.

Links mentioned:

Eleuther ▷ #interpretability-general (4 messages):

Tokenized SAEs

Spectral Filters

Whisper Interpretability

Attention-MLP Interactions

Interpretable Sequence Continuation

Tokenized SAEs enhance architecture performance: A post on tokenized SAEs introduces a per-token decoder bias that improves existing models like GPT-2 and Pythia by making training significantly faster.
- This method addresses training class imbalance, facilitating the learning of local context features through 'unigram reconstruction'.
Exploration of Spectral Filters in AI: A discussion on spectral filters was noted, though specific details weren't provided in the chat.
- The relevance of spectral filters to recent model performance was mentioned.
Insights on Whisper Interpretability: A blog post on Whisper Interpretability was shared, hinging on enhancing understanding of the model's operations.
- The insights presented in the blog post emphasize the significance of interpretability in complex models.
Excitement for Upcoming EMNLP Papers: Super proud to have 2 papers at #EMNLP2024! According to a post by @FazlBarez, the titles include 'Interpreting Context Look-ups in Transformers' focusing on Attention-MLP interactions.
- Another paper titled 'Towards Interpretable Sequence Continuation' aims to analyze shared circuits in large language models.
Focus on Key/Value Cache in Interpretability: A member remarked that in interpretability discussions, the usual focus is on statistics of the residual stream, rather than the key/value cache mechanisms.
- This perspective shift emphasizes the importance of understanding the underlying mechanisms in model interpretation.

Links mentioned:

Eleuther ▷ #lm-thunderdome (2 messages):

Gemma Models

BOS Token Application

Concerns Over BOS Token in Gemma Models: A member raised concerns that the BOS token might only be added once to the sequence for Gemma models, based on their findings during language modeling tasks.
- They questioned whether the BOS token should consistently be added every time, especially in the context of rolling loglikelihood calculations.
Verification of Input in Debugger: The same member confirmed they stopped the model call in the debugger and verified its inputs, noting the BOS token was missing in llh_rolling during certain instances.
- This discovery led to further scrutiny regarding the application of the BOS token in the modeling process.

LM Studio ▷ #general (33 messages🔥):

IPv4 Switching on MacOS

LM Studio API Connection Issues

Handling Model Loading Errors

Tracking API Callers in LM Studio

Qwen2.5-Coder Compatibility

Switching to IPv4 solves connection issues: A member suggested that switching to IPv4 often resolves common connection issues on MacOS, leading to clarity on how to adjust settings.
- After some troubleshooting, another member confirmed that this method was effective, stating, 'it just worked.'
Troubles connecting LM Studio to CrewAI: Members discussed challenges in connecting LM Studio API to CrewAI, with various solutions explored but nothing definitive agreed upon.
- One member referenced a YouTube video for guidance on using CrewAI with LM Studio.
Model loading error and potential fixes: A member outlined an error regarding model loading in llama.cpp, to which another explained that the model is not supported in LM Studio.
- Discussion also revolved around graphics processing unit (GPU) selections and the changes from version 0.2.* to 0.3.*.
Identifying API callers in LM Studio: A member inquired about tracking API callers to LM Studio, leading to advice on creating a custom wrapper for API keys using Python.
- However, it was stated that there are currently no plans for built-in tracking features in the near future.
Compatibility of Qwen2.5-Coder with LM Studio: Members questioned whether Qwen2.5-Coder works with LM Studio as they could only find references to the Qwen distribution.
- This sparked interest in the capabilities of Qwen within the context of LM Studio platforms.

Links mentioned:

LM Studio ▷ #hardware-discussion (13 messages🔥):

3090 Power Limiting vs Undervolting

Power Management across OS

Comparing GPU Power Limit Settings

RAM Speed and CPU Inference Bottleneck

Motherboard Design for DDR6 and CPU Inference

3090 Power Limiting vs Undervolting Strategies: A member discussed limiting the 3090 to 290W each and asked about comparing this with undervolting while considering clock rates.
- RTFM was suggested as a potential answer while pointing to various resources for deeper understanding.
Power Management: Windows vs Linux: A comparison was made on how to achieve lower GPU power usage, noting that in Windows, one needs to tweak settings manually, while in Linux, it's about executing a single command with nvidia-smi.
- This led to a debate about the ease of use between the operating systems with others confirming quicker settings in Windows.
Effectiveness of Setting GPU Power Limits: One member shared that setting the power limit on a 4090 from 450W to 320W only caused a 10% drop in FPS.
- This sparked a conversation about whether power limiting is sufficient compared to undervolting as a method of power management.
Debating RAM Speed and CPU Inference: A member questioned if RAM speed and bandwidth are the biggest bottlenecks during CPU inference.
- They proposed a motherboard concept that could use DDR6 directly for CPU inference, prompting curiosity about why such designs don't exist.
Underutilization of CPU Cores: A member expressed frustration over the multitude of CPU cores that remain largely unused, especially since they haven't purchased a video game since 2011.
- This raised questions about the efficiency of current CPU designs in general computing workloads.

Link mentioned: RTX 3080 / 3090 Undervolting | 100W Less = Same Performance?: Check prices on Amazon belowNvidia RTX 3090: https://geni.us/4o7XjNvidia RTX 3080: https://geni.us/Dk9g3GPU Undervolting Guide (in-depth): https://youtu.be/z...

Modular (Mojo 🔥) ▷ #general (4 messages):

Mojo LLMs API

Pythagora Dev Tool

Feedback for Magic

Mojo LLMs API Interest: A user expressed interest in utilizing the Mojo based LLMs API with Pythagora, a dev tool that builds apps through conversational interaction.
- They inquired about the cost associated with using this service while highlighting the excitement around its new era of software development.
Pythagora's AI Coding Assistant Inquiry: One user asked if another was looking to use Pythagora's AI coding assistant with Mojo.
- This reiterates the ongoing conversations around integrating Mojo with Pythagora for enhanced coding capabilities.
Last Call for Magic Feedback: A reminder was sent out for participants to join the Magic feedback chats for product insights, specifically targeting those who haven't used Magic yet.
- The call promises a quick 30-minute session with exclusive swag for feedback providers, and interested individuals can book a slot here.

Links mentioned:

Modular (Mojo 🔥) ▷ #announcements (2 messages):

Closure of GitHub Discussions

Upcoming Community Meeting

Closure of GitHub Discussions for Mojo and MAX: In an effort to centralize the community, GitHub Discussions for the Mojo and MAX repositories will be closed on September 26th.
- Important discussions with over 10 comments will be converted to GitHub Issues, and members can request discussions for conversion by tagging the organizer.
Next Community Meeting Rescheduled: The next community meeting has been moved to Monday, September 30th at 10 AM PT. Members can add the meeting to their calendar via this link and contribute their talks in the Google doc.
- Participants are encouraged to prepare their presentations and submit them ahead of time to enhance the meeting experience.

Links mentioned:

Modular (Mojo 🔥) ▷ #mojo (23 messages🔥):

Variable Bit Width Integers

Packed Structs in Mojo

Set Implementation and __copyinit__

Custom Decorators

Generic Struct Syntax

Variable Bit Width Integers Needed for TCP/IP: A member questioned the possibility of using variable bit width integers like UInt{1,2,3,4,6,7,13} for TCP/IP implementation.
- Bitwise operators and masks were suggested but considered less ergonomic, while hopes were expressed for Mojo to support this.
Concern Over Packed Structs Support: Discussion arose about the absence of packed structs in Mojo, which complicates handling bitflags as lists of __mlir_type.i1.
- A member hoped LLVM would manage this due to byte alignment, but skepticism existed regarding its reliability.
Request for copyinit in Set Implementation: A member inquired why Set does not implement __copyinit__, impacting its conformance to the CollectionElement trait.
- They looked through GitHub issues for clarity on this omission but found no satisfying explanation.
Feature Request for Empty Lists in Mojo: A feature request was made to allow empty lists without a type parameter in Mojo for better Python compatibility.
- While this idea was acknowledged, it was suggested to consider implicit conversion from the empty list literal instead.
Syntax for Generic Structs Implementing Multiple Traits: Queries arose regarding the current (nightly) syntax for defining a generic struct that implements multiple traits.
- Members noted potential issues with syntax when combining traits such as T: Trait1 & Trait2.

Torchtune ▷ #dev (27 messages🔥):

Input Position Settings

Memory Optimisations

Generation Efficiency

Batch Sizes Management

Generate Recipe Simplification

Input Position System Under Review: A discussion emerged around the decision to remove the auto-setting of input_pos in the recent PR, with reasons pointed towards simplifying generation/cacheing logic.
- The change aims to prevent user confusion by avoiding the need to search through various classes for default settings.
Configurable Memory Optimisations Discussed: Members discussed various memory optimisations, indicating that solutions such as activation offloading are in progress, while others like chunked cross entropy are being encouraged for use.
- Notably, it was mentioned that past techniques were deemed mostly wasteful but are now being considered for the mem opt tutorial.
Generation Efficiency with Batch Sizes: A conversation revolved around the efficiency of generation at varying batch sizes, particularly highlighting that the generate script will only allow for bsz 1 during execution.
- Members contemplated the simplicity of looping through batches in configurations but recognized the limitations that come with increasing batch sizes.
Debate on Submethods for Generation: There was a whimsical debate on introducing submethods like sample and batched_sample, signaling a focus on refining the generation process.
- Although some members were inclined to separate out methods, others suggested a more streamlined approach, echoing practices from other frameworks like gpt-fast.
Challenges and Revisions in Generation Workflow: A member expressed urgency in keeping the generate recipe simple due to the slew of user issues reported, especially as they transition to support larger batch sizes.
- There’s an ongoing effort to streamline logic, which is perceived as necessary to reduce complexity for users navigating through the generate functionalities.

Link mentioned: [RFC] Adding overrides for max cache seq length by SalmanMohammadi · Pull Request #1449 · pytorch/torchtune: Context What is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here) #1364 Changelog This PR: Adds support for overriding th...

OpenInterpreter ▷ #general (5 messages):

OpenInterpreter Models

Task Success with OpenInterpreter

Enhancing Existing Projects

Firebase/Stripe Integrations

Wizardlm outshines Llama in OpenInterpreter tasks: After experimenting with multiple models, a member reports that microsoft/Wizardlm 8x22B performs significantly better than llama 405B, requiring fewer attempts to complete tasks correctly.
- Wizardlm has achieved task completion on the first try more often, showcasing its effectiveness for their usage.
Successful tasks with OpenInterpreter: One member has successfully used OpenInterpreter for organizing and sorting large data sets, as well as creating a desktop shortcut.
- They expressed a desire to learn about other tasks that members have accomplished with the tool.
Enhancements require clarity with OpenInterpreter: Explicit instructions are needed when asking OpenInterpreter to modify existing applications, as it tends to generate new projects instead.
- A member mentioned the necessity of being specific and repetitive to avoid unintended outcomes during modifications.
Seeking help with Firebase/Stripe troubleshooting: A user is struggling to get their FarmFriend project, which integrates with Firebase and Stripe, functioning properly after multiple attempts and new credentials.
- Caught in a 'deathloop', they are encountering issues related to CORS, service accounts, and authentication domains, requesting assistance from anyone skilled in these areas.

OpenInterpreter ▷ #O1 (11 messages🔥):

O1 Installation Video

Functionalities of O1

Discussion on O1

Scheduling Test Sessions

O1 Installation Video Update: A user inquired about the status of the O1 install video, expressing eagerness to help test once it's available.
- They mentioned that a feature request and a message were sent to the developer regarding the video.
O1 Still in Development Stages: A participant remarked that O1 is still being developed and, as of now, it doesn't perform any useful functions.
- It's noted that O1 currently lacks control over applications or app interactions.
Proposal for O1 Discussion: A member suggested holding a detailed discussion about O1 to gain insights on its functionalities.
- They proposed to use a specific channel for the discussion at a mutually agreed time.
Scheduling a Discussion for Feedback: The same member encouraged others to indicate their available times for the O1 discussion, specifying the need for GMT timezone.
- This effort aims to ensure ample participation for a comprehensive discussion among developers and users.

tinygrad (George Hotz) ▷ #general (15 messages🔥):

CLANG dlopen replacement

Tinybox with Intel GPUs

IPMI credentials issue

Mergeability of ShapeTrackers in Lean

Exploration of CLANG dlopen replacement with mmap: A bounty discussion arose about replacing CLANG dlopen with mmap, possibly leading to manual handling of relocations in object files as evidenced by this pull request.
- At this point I'm very curious to see who will get this bounty was mentioned, indicating interest in pursuing this task.
Inquiry on Tinygrad and Intel Arc GPUs: A user questioned if Tinygrad functions well with multiple Intel GPUs, considering existing support for AMD and NV. They received affirmation and were advised to investigate further.
Struggles with IPMI IP issues: Another user reported issues with IPMI, suspecting incorrect credentials, and sought advice on resetting them. Recommendations included using a monitor and keyboard for setup and confirming that the web BMC password is the same as displayed.
Issues with GPU connection during setup: There was a question on whether the GPU HDMI or VGA should be used for initial setup, with a definitive answer indicating VGA only is necessary.
- This indicates a common oversight in hardware connections during configuration.
Potential undergraduate thesis on ShapeTrackers: A user inquired about the status of the bounty related to the mergeability of two arbitrary ShapeTrackers in Lean, expressing an interest in tackling it for their undergraduate thesis.
- They observed that it doesn't seem to have been completed yet, signaling an opportunity for new contributions in the project.

Links mentioned:

Cohere ▷ #discussions (11 messages🔥):

Cohere Discord Community

Trial Key for Cohere

Custom Timeouts on Connectors

Capstone Projects

Newsletters

Cohere Discord: A Place for Learning: Members expressed their excitement about joining the Cohere Discord community, with newcomers eager to learn about AI and Cohere's offerings.
- Welcome! has been given to new members, encouraging a collaborative atmosphere for sharing knowledge.
Grab Your Trial Key and Start Hacking: A member suggested using a trial key for Cohere that offers 1000 calls a month for free, emphasizing hands-on learning through projects.
- Another member agreed, stating that application is the best way to learn and mentioned they would explore more after finishing their capstone project.
Discussions on Technical Peculiarities: A timeout issue was raised with a response time of 504, followed by a question about setting custom timeouts on connectors.
- This highlights ongoing technical discussions about managing service interactions within the Cohere framework.
Interest in Newsletters: A member mentioned being drawn to the community through the classify newsletter, indicating the value of newsletters in attracting participants.
- Another member expressed a desire for more newsletters, showing interest in continuous engagement and updates from the community.

Cohere ▷ #api-discussions (3 messages):

Rerank Multilingual v3 issues

Comparison of Rerank models

Effect on RAG results

Rerank Multilingual v3 struggles with English: A member reported discrepancies when using rerank_multilingual_v3, noting scores below 0.05 for similar English queries compared to 0.57 and 0.98 with rerank_english_v3.
- This inconsistency is impacting their RAG results, causing relevant chunks to be filtered out unexpectedly.
Curl command suggested for testing: Another member suggested a curl command to swap models for testing, proposing queries like 'what are the working hours?' and 'what are the opening times?'.
- This could potentially assist in comparing the model performances more effectively.
Lack of visibility on documents hampers support: One user indicated they couldn't assist further due to not having access to the complete set of documents.
- They mentioned that the model seemed to be working from their perspective, adding to the confusion over the reported issues.

LAION ▷ #general (7 messages):

Whisper Optimization

Transcription Projects

GPU Utilization

Seeking Speedy Whisper Solutions: A member requested assistance on how to maximize the speed of Whisper, querying about utilizing multiple GPUs and supporting batching for transcription tasks.
- They mentioned a requirement to transcribe a very large dataset, indicating the need for efficient processing.
Whisper-TPU: A Fast Option: Whisper-TPU was suggested as a notably fast alternative for processing, representing a potential avenue for enhancing performance.
- This option could cater to the needs of users requiring high-speed transcription capabilities.

LAION ▷ #research (4 messages):

Transfusion architecture

Diffusion and AR training stability

Qwen-Audio training challenges

Exploring Transfusion architecture usage: There is curiosity about whether the new model might leverage the Transfusion architecture or a similar approach, particularly for multimodal applications.
- A relevant GitHub repository for Transfusion provides a Pytorch implementation, highlighting its capability to predict tokens and diffuse images.
Challenges with Diffusion and AR training: Experiments revealed difficulty in achieving stability when combining diffusion and AR training, suggesting a critical obstacle in method integration.
- The community is seeking effective strategies to enhance the stability of these training methods.
Inquiring about Qwen-Audio training instability: A member mentioned recalling issues with training instability in multimodal setups as discussed in the Qwen-Audio research paper, indicating the prevalence of the challenge.
- They expressed the intent to revisit the paper to clarify those details, acknowledging the relevance to current discussions.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #news (11 messages🔥):

Qwen vs o1-mini

Llama Multimodal Development

EU Regulatory Landscape

OpenAI Extended Video Insights

Qwen 2.4 Math Model underperforms against o1-mini: Despite being a newly announced sota open source math model, qwen2.4-math-72b-instruct does not surpass o1-mini when utilizing code execution and ensemble methods with n=256.
- For reference, it matches o1-mini on AIME, illustrating challenges in fair comparison with 256 generations without reflection type CoT.
Llama Multimodal Version on Hold: A developer shared excitement about their team's work on a multimodal version of Llama, but they will not release it in the EU due to current uncertainties.
- This aligns with concerns expressed about fragmented regulations potentially hindering innovation in AI across Europe; their stance aims to support European developers.
Concerns over EU's Anti-Tech Sentiment: Community members voiced apprehension about the EU appearing anti-tech, suggesting that although the intentions behind regulations may be good, they often create uncertainty.
- Discussions emphasize the need for better regulatory clarity to foster innovation while maintaining safety in the tech landscape.
Insights from OpenAI's Extended Video: Highlights from OpenAI's extended video mention that a model with RL is reportedly better at finding new CoT steps than humans, indicating evolving approaches in AI reasoning.
- Discussions around the model included the significance of infrastructure relative to algorithms and the emergence of self-critique as a noteworthy advancement.

Links mentioned:

LangChain AI ▷ #general (6 messages):

LangChain v2.0 issues

LangGraph inquiries

New agent platform

OpenAI Assistant usage

Fixing chunked output in LangChain v2.0: A user sought assistance regarding intermittent function call information being output in chunks when using LangChain v2.0 with OpenAI streaming.
- This raised concerns about potential bugs or configuration issues affecting the output.
Comparing Ell and LangChain: A member prompted a discussion about the differences and potential comparisons between Ell and LangChain.
- This indicates ongoing interest in evaluating AI frameworks and the choices available to users.
Seeking help with LangGraph: A user inquired where to direct questions regarding LangGraph, indicating a point of confusion for those looking for support.
- This highlights the need for clearer community support channels for specific tools and libraries.
Beta testers needed for new agent platform: An announcement was made about a new platform for launching agents with attached native tokens, inviting interested beta testers to DM for more details.
- This reflects a growing interest in innovative agent deployment solutions within the community.
Using OpenAI Assistants according to latest docs: A member requested guidance on using their custom OpenAI assistant following the latest documentation, stressing the importance of clarity in the new changes.
- They referred to specific features of the Assistants API, including interactions and tool capabilities.

Link mentioned: OpenAI assistants | 🦜️🔗 LangChain: The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. The Assistant...

LangChain AI ▷ #share-your-work (1 messages):

degen_cap: https://x.com/degencap777/status/1836483857614541266 hope to share your thought

OpenAccess AI Collective (axolotl) ▷ #general (6 messages):

Moshi model launch

GRIN MoE

Mistral small release

Moshi Model Unveiled: The Moshi model has been launched, described as a speech-text foundation model that uses a unique method for generating speech from text.
- This model allows for full-duplex spoken dialogue, enhancing conversational dynamics and speech recognition significantly.
GRIN MoE Shines with Fewer Parameters: The GRIN MoE model impressively achieves high performance with only 6.6B active parameters, especially excelling in coding and mathematics.
- By utilizing SparseMixer-v2 for gradient estimation and avoiding conventional techniques of expert parallelism, GRIN offers a fresh approach to MoE training.
Discussion on Mistral Small Release: Members noted the release of Mistral Small, but expressed that it was an instruction version only.
- Concerns were raised about its memory intensity, indicating it's a limiting factor for some users.

Links mentioned:

DSPy ▷ #general (5 messages):

Bootstrapping in DSPy

MathPrompt Paper

TypedPredictors Tricks

Understanding Bootstrapping's Purpose: A member clarified that the purpose of bootstrapping in DSPy is to generate intermediate examples within a pipeline, confirming successful predictions capture a full trace of the process.
- This leads to the assumption that if the final result is correct, the intermediate steps are also valid despite LLMs being inherently non-deterministic.
Introduction to MathPrompt: A member shared an interesting find about MathPrompt with a reference to a research paper.
- This suggests an extension in understanding how prompts can enhance mathematical reasoning.
Hack Your JSON Parsing with TypedPredictors: A member shared tricks for handling TypedPredictors by mocking the JSON parsing functionality to improve output pre-processing.
- Their approach includes removing unnecessary text, fixing invalid escape sequences, and logging failed parses from their GitHub Gist.

Link mentioned: fix-json.py: GitHub Gist: instantly share code, notes, and snippets.

LLM Finetuning (Hamel + Dan) ▷ #general (2 messages):

LLM Engineers Wanted

Multilingual Translation

Qwen 2.5

Real-time Financial Communication

Revolutionizing Global Finance with LLMs: A FinTech startup is seeking a talented LLM Engineer for a one-week sprint to enhance their multilingual real-time translation service using LLama 3.1 or Qwen2 models.
- The project aims to break down language barriers in global finance, promising significant contributions to how millions of transactions are processed globally.
Exploring Qwen 2.5 for Multilingual Capabilities: A user advised consideration of Qwen 2.5, highlighting its potential for multilingual functionality, suggesting it could fit the project's needs.
- This insight could guide the direction of enhancing the Whisper model alongside the chosen LLM.

{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}