**MCTS is all you need.**

AI News for 6/14/2024-6/17/2024. We checked 7 subreddits, 384 Twitters and 30 Discords (414 channels, and 5506 messages) for you. Estimated reading time saved (at 200wpm): 669 minutes. You can now tag @smol_ai for AINews discussions!

A bunch of incremental releases over this weekend; DeepSeek-Coder-V2 promises GPT4T-beating performance (validated by aider) at $0.14/$0.28 per million tokens (vs GPT4T’s $10/$30), Anthropic dropped some Reward Tampering research, and Runway finally dropped their Sora response.

However, the longer-lasting, meatier topic to dive into is probably the discussion around “test-time” search.


The discussion spawned a list of related papers.

We’ll be honest that we haven’t read any of these papers yet, but we did cover OpenAI’s thoughts on verifier-generator process supervision on the ICLR podcast, and have lined the remaining papers up for the Latent Space Discord Paper Club.


{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Apple’s AI Developments and Partnerships

  • Apple Intelligence announced: @adcock_brett noted Apple revealed Apple Intelligence at WWDC, their first AI system coming to iPhone, iPad, and Mac, with features like a smarter Siri and image/document understanding.
  • OpenAI partnership: Apple and OpenAI announced a partnership to directly integrate ChatGPT into iOS 18, iPadOS 18, and macOS, as mentioned by @adcock_brett.
  • On-device AI models: @ClementDelangue highlighted that Apple released 20 new CoreML models for on-device AI and 4 new datasets on Hugging Face.
  • Optimized training: Apple offered a peek into its new models’ performance and how they were trained and optimized, as reported by @DeepLearningAI.
  • LoRA adapters for specialization: @svpino explained how Apple uses LoRA fine-tuning to generate specialized “adapters” for different tasks, swapping them on the fly.
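
Apple’s adapter code itself isn’t public, so the following is only a minimal, hypothetical PyTorch sketch of the general LoRA mechanism described above: a frozen base weight plus small low-rank factors that can be hot-swapped per task (the class name, rank, and scaling are illustrative assumptions).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a swappable low-rank adapter: W + (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.scale = alpha / rank
        # Only these small factors are trained (and shipped) per task.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    def swap_adapter(self, A: torch.Tensor, B: torch.Tensor) -> None:
        """Hot-swap task-specific factors without touching the base model."""
        with torch.no_grad():
            self.A.copy_(A)
            self.B.copy_(B)

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))             # (2, 512)
```

Because only A and B differ between tasks, swapping an adapter is a small tensor copy rather than a full model reload, which is what makes on-the-fly specialization cheap.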

Open Source LLMs Matching GPT-4 Performance

  • Nemotron-4 340B from NVIDIA: NVIDIA released Nemotron-4 340B, an open model matching GPT-4 (0314) performance, according to @adcock_brett.
  • DeepSeek-Coder-V2: @deepseek_ai introduced DeepSeek-Coder-V2, a 236B model excelling in coding and math, beating several other models. It supports 338 programming languages and 128K context length.
  • Stable Diffusion 3 Medium: Stability AI released open model weights for its text-to-image model, Stable Diffusion 3 Medium, offering advanced capabilities, as noted by @adcock_brett.

New Video Generation Models

  • Dream Machine from Luma Labs: Luma Labs launched Dream Machine, a new AI model generating 5-second video clips from text and image prompts, as reported by @adcock_brett.
  • Gen-3 Alpha from Runway: @c_valenzuelab showcased Runway’s new Gen-3 Alpha model, generating detailed videos with complex scenes and customization options.
  • PROTEUS from Apparate Labs: Apparate Labs launched PROTEUS, a real-time AI video generation model creating realistic avatars and lip-syncs from a single reference image, as mentioned by @adcock_brett.
  • Video-to-Audio from Google DeepMind: @GoogleDeepMind shared progress on their video-to-audio generative technology, adding sound to silent clips matching scene acoustics and on-screen action.

Robotics and Embodied AI Developments

  • OpenVLA for robotics: OpenVLA, a new open-source 7B-param robotic foundation model outperforming a larger closed-source model, was reported by @adcock_brett.
  • Virtual rodent from DeepMind and Harvard: DeepMind and Harvard created a ‘virtual rodent’ powered by an AI neural network, mimicking agile movements and neural activity of real-life rats, as noted by @adcock_brett.
  • Manta Ray drone from Northrop Grumman: @adcock_brett mentioned Northrop Grumman released videos of the ‘Manta Ray’, their new uncrewed underwater vehicle drone prototype.
  • Autonomous driving with humanoids: A new approach to autonomous driving leveraging humanoids to operate vehicle controls based on sensor feedback was reported by @adcock_brett.

Miscellaneous AI Research and Applications

  • Anthropic’s reward tampering research: @AnthropicAI published a new paper investigating reward tampering, showing AI models can learn to hack their own reward system.
  • Meta’s CRAG benchmark: Meta’s article discussing the Corrective Retrieval-Augmented Generation (CRAG) benchmark was highlighted by @dair_ai.
  • DenseAV for learning language from videos: An AI algorithm called ‘DenseAV’ that can learn language meaning and sound locations from unlabeled videos was mentioned by @adcock_brett.
  • Goldfish loss for training LLMs: @tomgoldsteincs introduced the goldfish loss, a technique for training LLMs without memorizing training data (a sketch of the idea follows this list).
  • Creativity reduction in aligned LLMs: @hardmaru shared a paper exploring the unintended consequences of aligning LLMs with RLHF, which reduces their creativity and output diversity.
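
As referenced above, the goldfish loss drops a pseudorandom subset of token positions from the next-token objective, so the model never receives full supervision on any training sequence. Here is a minimal PyTorch sketch assuming a simple seeded position mask (the paper derives its mask differently, e.g. from local context, so this is illustrative only):

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits: torch.Tensor, targets: torch.Tensor,
                  k: int = 4, seed: int = 0) -> torch.Tensor:
    """Next-token cross-entropy that ignores roughly 1-in-k positions.

    Dropping a pseudorandom subset of positions from the objective means the
    model never gets full supervision on any sequence, which is the lever
    against verbatim memorization. logits: (B, S, V); targets: (B, S).
    """
    b, s, v = logits.shape
    g = torch.Generator().manual_seed(seed)
    drop = torch.randint(0, k, (b, s), generator=g) == 0   # ~1/k positions
    masked = targets.clone()
    masked[drop] = -100                                    # cross_entropy's ignore_index
    return F.cross_entropy(logits.reshape(-1, v), masked.reshape(-1),
                           ignore_index=-100)
```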

AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, /r/LLMDevs, /r/Singularity. Comment crawling works now but has lots of room to improve!

AI Models and Techniques

Stable Diffusion Models and Techniques

Llama and Local LLM Models

AI Ethics and Regulation

AI and the Future


AI Discord Recap

A summary of Summaries of Summaries

1. AI Model Performance and Scaling

  • Scaling Up with New AI Models: DeepSeek’s Coder V2 reportedly beats GPT-4 on benchmarks and Google DeepMind reveals new video-to-audio tech creating tracks for any video, gaining traction on Rowan Cheung’s X profile.
  • Expanding AI Capabilities Across Platforms: Runway introduces Gen-3 Alpha for video generation, enhancing cinematic styles and scene transitions, with details shared on Twitter.

2. Integration and Implementation Across Platforms

  • Hybrid Notes App Unveils LLM Integration: OpenRouter unveils a notes app integrating LLMs for dynamic content interaction, though it lacks mobile support, as noted on its full-screen app.
  • Challenges with Implementation on Various Platforms: Users face issues like CORS errors on OpenRouter and integration challenges on LangChain, reflecting the need for better implementation guides or platform-specific APIs.

3. Ethical AI and Governance

  • OpenAI Shifts Towards Profit-Driven Model: Speculations and confirmations stir about OpenAI’s move towards becoming a for-profit entity, potentially impacting governance and ethical considerations. More on this from The Information.
  • Discussions on AI Ethics Heat Up: Debates continue about data privacy, model biases, and corporate governance in AI, as Edward Snowden criticizes OpenAI’s new board appointments on Edward Snowden’s X profile.

4. New AI Developments and Benchmarking

  • AI Innovations and Improvements Announced: Anthropic publishes insights into AI’s ability to tamper with reward systems in their new research article.
  • Benchmarking New Models: Stability AI releases SD3 models, prompting discussion across forums of techniques for loss stabilization and artifact management, including a spotlight on Reddit.

5. Collaborative AI Projects and User Engagement

  • Community Projects Highlight AI Integration: From a notes app merging notes and API keys management on OpenRouter to innovative AI-driven video generation tools like Dream Machine, community-built tools are pushing the boundaries of creativity and practical AI application, visible on platforms like Lumalabs.
  • Interactive AI Discussions and Collabs Flourish: Webinars and collaborative events like the upcoming Mojo Community Meeting encourage deep dives into AI advancements, with detailed discussions and participation from across the global user base, as shared on the blog.

PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord

  • SD3 License Troubles: The new license for Stable Diffusion 3 (SD3) has led to its ban on Civitai due to legal ambiguities, with a review by Civitai’s legal team announced in their temporary ban statement.
  • Community Rift Over SD3: Users expressed frustration with Stability AI’s licensing of SD3, highlighting both confusion and discontent, while some criticized YouTuber Olivio Sarikas for allegedly misrepresenting the SD3 license for views, referencing his video.
  • Guidance for ComfyUI: Issues around ComfyUI setup sparked technical discussion, with suggested fixes for custom node installations including dependencies like cv2; a user-contributed ComfyUI tutorial was shared to assist.
  • Seeking SD3 Alternatives: The dialogue points to a shift towards seeking alternative models and artistic tools, such as video generation with animatediff, possibly due to the ongoing SD3 controversy.
  • Misinformation Allegations in the AI Community: Accusations fly regarding YouTuber Olivio Sarikas spreading misinformation about SD3’s license, with community members challenging the veracity of his contentious video.

Unsloth AI (Daniel Han) Discord

  • Ollama Integration Nears Completion: The Ollama support development has reached 80% completion, with the Unsloth AI team and Ollama collaboratively pushing through delays. Issues with template fine-tuning validation and learning rates concerning Ollama were discussed, along with an issue where running model.push_to_hub_merged does not save the full merged model, prompting a manual workaround.

  • Unsloth Speeds Ahead: Unsloth’s training process is touted as 24% faster than torchtune with torch.compile() on an NVIDIA GeForce RTX 4090, according to member benchmarks. Additionally, upcoming multi-GPU support for up to 8 GPUs is being tested, with a select group of users getting early access for initial evaluations.

  • Training Troubles and Tricks: Members encountered challenges like crashes during saving steps while training the Yi model, possible mismanagement of quantization_method during saving, and confusion around batch sizes and gradient accumulation in VRAM usage. Solutions and workarounds included verifying memory/disk resources and a submitted pull request addressing the quantization error.

  • Lively Discussion on Nostalgia and Novelty in Music: Members shared music ranging from a nostalgic 1962 song to iconic tracks by Daft Punk and Darude, showing a light-hearted side to the community. In contrast, concerns were raised over Gemma 2’s output on AI Studio, with mixed reactions varying from disappointment to intrigue and anticipation for Gemini 2.0.

  • CryptGPT Secures LLMs with an Encryption Twist: CryptGPT was introduced as a concept using the Vigenere cipher to pretrain GPT-2 models on encrypted datasets, ensuring privacy and requiring an encryption key to generate output, as detailed in a shared blog post.

  • Singular Message of Curiosity: The community-collaboration channel featured a single message expressing interest, but without further context or detail, its relevance to broader discussion topics remains unclear.


CUDA MODE Discord

  • NVIDIA’s Next Big Thing Speculated and PyCUDA SM Query Clarified: Engineers speculated about the potential specs of the upcoming NVIDIA 5090 GPU, with rumors of up to 64 GB of VRAM circulating yet met with skepticism. Additionally, a discrepancy in GPU SM count for an A10G card reported by techpowerup was cleared up, with independent sources such as Amazon Web Services confirming the correct count as 80, not the 72 originally stated.

  • Triton and Torch Users Navigate Glitches and Limits: Triton users encountered an AttributeError in Colab and debated the feasibility of nested reductions for handling quadrants. Meanwhile, PyTorch users adjusted the SM threshold in torch.compile(mode="max-autotune") to accommodate GPUs with less than 68 SMs and explored enabling coordinate descent tuning for better performance.

  • Software and Algorithms Push the AI Envelope: A member lauded matching GPT-4 performance with LLaMA 3 8B, while Akim will attend the AI_dev conference and is open to networking. Elsewhere, Vayuda’s search algorithm paper spurred interest among enthusiasts across multiple channels. Discussions around AI training, evident in the challenges Meta describes in LLM training, underscore the importance of infrastructure adaptability.

  • CUDA Development Optics: News from CUDA-focused development revealed: Permuted DataLoader integration did not significantly affect performance; a unique seed strategy was developed for stochastic rounding; challenges surfaced regarding ZeRO-2’s memory overhead; and new LayerNorm kernels provided much-needed speedups under certain configurations.

  • Beyond CUDA: Dynamic Batching, Quantization, and Bit Packing: In the domain of parallel computing, engineers struggled with dynamic batching for Gaudi architecture and discussed the complexity of quantization and bit-packing techniques. They stressed the VRAM limitations constraining local deployment of large models and shared diverse resources, including links to Python development environments and documentation on novel machine learning libraries.


LM Studio Discord

  • LM Studio equips engineers with CLI tools: The latest LM Studio 0.2.22 release introduced ‘lms’, a CLI management tool for models and debugging prompts, which is detailed in its GitHub repository. The update streamlines the workflow for AI deployments, especially with model loading/unloading and input inspection.

  • Performance tweaks and troubleshooting: Engineers discussed optimal settings for AI model performance, including troubleshooting GPU support for Intel ARC A7700, configuration adjustments for GPU layers, and adjusting Flash Attention settings. There was a recommendation to check Open Interpreter’s documentation for issues hosting local models and a call for better handling of font sizes in LM Studio interfaces for usability.

  • Diverse model engagement: Members recommended Fimbulvetr-11B for roleplaying use-cases, while highlighting the fast-paced changes in coding models like DeepSeek-Coder-V2, advising peers to stay updated with current models for specific tasks like coding, which can be reviewed on sites like Large and Small Language Models list.

  • Hardware optimization and issues: A link to archived LM Studio 0.2.23 was shared for those facing installation issues—a MirrorCreator link. Hardware discussions also included the compatibility of mixed RAM sticks, setting CPU cores for server mode, and troubleshooting GPU detection on various systems.

  • Development insights and API interactions: Developers shared their aspirations for integrating various coding models like llama3 and deepseek-coder into their VSCode workflow and sought assistance with implementing models in continue.dev. There was also a conversation about decoupling ROCm from the main LM Studio application and a user guide for configuring continue.dev with LM Studio.

  • Beta release observations and app versioning: The community tested and reviewed recent beta releases, discussing tokenizer fixes and GPU offloading glitches. There’s a need for access to older versions, which is challenged by LM Studio’s update policies, and a suggestion to maintain personal archives of preferred versions.

  • AI-driven creativity and quality of life concerns: Engineers raised issues like LM Studio’s mismanagement of stop tokens and a tool’s tendency to append irrelevant text in outputs. A frequent use-case complaint was that a model would not signal failure with an “#ERROR” message when it could not produce a correct output.


HuggingFace Discord

AI Alternatives for GPT-4 on Low-End Hardware: Users debated practical AI models for less powerful servers, with suggestions like “llama3 (70B-7B), mixtral 8x7B, or command r+” for self-hosted AI similar to GPT-4.

RWKV-TS Challenges RNN Dominance: An arXiv paper introduces RWKV-TS, proposing it as a more efficient alternative to RNNs in time series forecasting, by effectively capturing long-term dependencies and scaling computationally.

Model Selection Matters in Business Use: In the choice of AI for business applications, it’s crucial to consider use cases, tools, and deployment constraints, even with a limitation like the 7B model size. For tailored advice, members suggested focusing on specifics.

Innovations and Integrations Abound: From Difoosion, a user-friendly web interface for Stable Diffusion, to Ask Steve, a Chrome extension designed to streamline web tasks using LLMs, community members are actively integrating AI into practical tools and workflows.

Issues and Suggestions in Model Handling and Fine-Tuning:

  • A tutorial for fine-tuning BERT was shared.
  • Concerns about non-deterministic model initializations were raised, with advice to save the model state for reproducibility (see the sketch after this list).
  • Mistral-7b-0.3’s context length handling and the quest for high-quality meme generator models indicate challenges and pursuits in model customization.
  • For TPU users, guidance on using Diffusers with GCP’s TPU is sought, indicating an interest in leveraging cloud TPUs for diffusion models.
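
On the reproducibility point above, here is a minimal sketch of the suggested practice: seed the RNG and snapshot the freshly initialized weights so later runs can start from identical parameters (the layer and filename are stand-ins).

```python
import torch

torch.manual_seed(1234)                      # reproducible weight init
model = torch.nn.Linear(768, 768)            # stand-in for any fresh model

# Snapshot the initial weights: later runs can start from identical
# parameters even if seeding behavior drifts across library versions.
torch.save(model.state_dict(), "init_state.pt")

restored = torch.nn.Linear(768, 768)
restored.load_state_dict(torch.load("init_state.pt"))
```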

OpenAI Discord

  • iOS Compatibility Question Marks: Members debated whether ChatGPT functioned with iOS 18 beta, recommending sticking to stable versions like iOS 17 and noting that beta users are under NDA regarding new features. No clear consensus was reached on compatibility.

  • Open Source Ascending: The release of an open-source model by DeepSeek AI that outperforms GPT-4 Turbo in coding and math sparked debate about the advantages of open-source AI over proprietary models.

  • Database Deployments with LLMs: For better semantic search and fewer hallucinations, a community member highlighted OpenAI’s Cookbook as a resource for integrating vector databases with OpenAI’s models; a minimal sketch of the retrieval step appears after this list.

  • GPT-4 Usage Ups and Downs: Users expressed frustrations with access to GPT interactions, privacy settings on Custom GPTs, and server downtimes. The community provided workarounds and suggested monitoring OpenAI’s service status for updates.

  • Challenges with 3D Modeling and Prompt Engineering: Conversations focused on the technicalities of generating shadow-less 3D models and the intricacies of preventing GPT-4 from mixing information. Members shared various strategies, including step-back prompting and setting explicit actions to guide the AI’s output.
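
As referenced above, this is a minimal, framework-free sketch of what the retrieval step in a vector-database setup does: embed, rank by cosine similarity, and return the top matches to ground the model’s answer. The embeddings here are random stand-ins; in practice they come from an embedding model or API.

```python
import numpy as np

def cosine_top_k(query: np.ndarray, docs: np.ndarray, k: int = 3):
    """Brute-force semantic search: rank documents by cosine similarity.

    A vector database does the same at scale with approximate indexes;
    grounding answers in the retrieved passages is what curbs hallucinations.
    """
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sims = d @ q
    top = np.argsort(-sims)[:k]
    return top, sims[top]

# Random stand-ins; real embeddings come from an embedding model or API.
doc_vecs = np.random.randn(100, 1536).astype(np.float32)
query_vec = np.random.randn(1536).astype(np.float32)
idx, scores = cosine_top_k(query_vec, doc_vecs)
```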


LAION Discord

  • Stabilizing SD3 Models: The discussion revolved around SD3 models facing stability hurdles, particularly with artifacts and training. Concerns were raised about loss stabilization, pinpointing issues like non-uniform timestep sampling and missing elements such as qk norm.

  • T2I Models Take the Stage: The dialog highlighted interest in open-source T2I (text-to-image) models, notably for character consistency across scenes. Resources such as Awesome-Controllable-T2I-Diffusion-Models and Theatergen were recommended for those seeking reliable multi-turn image generation.

  • Logical Limitbreak: A member brought attention to current challenges in logical reasoning within AI, identifying Phi-2’s “severe reasoning breakdown” and naming bias in LLMs when tackling AIW problems—a key point supported by related research.

  • Boosting Deductive Reasoning: Queries about hybrid methods for enhancing deductive reasoning in LLMs were directed to Logic-LM, a method that combines LLMs with symbolic AI solvers to improve logical problem-solving capabilities.

  • Video Generation Innovation: Fudan University’s Hallo model, a tool capable of generating video from a single image and audio, sparked excitement for its potential application alongside Text-to-Speech systems. A utility to run it locally was shared from FXTwitter, highlighting community interest in practical integrations.


OpenAccess AI Collective (axolotl) Discord

  • 200T Parameter Model: AGI or Fantasy?: Discussions about the accessibility of a hypothetical 200T parameter model surfaced, highlighting both the limits of current compute capabilities for most users and the humor in staking an AGI claim for such models.

  • Competing at the Big Model Rodeo: Members juxtaposed the Qwen7B and Llama3 8B models, acknowledging Llama3 8B as the dominant contender in performance. The problem of custom training configurations for Llama3 models was tackled, with a solution shared to address the chat_template setting issues.

  • Optimization Quest for PyTorch GPUs: Requests for optimization feedback directed towards various GPU setups in PyTorch have yielded a trove of diverse community experiences ranging from AMD MI300X to RTX 3090, Google TPU v4, and 4090 with tinygrad.

  • Navigating Axolotl’s Development Labyrinth: An issue halting development with the Llama3 models was traced to a specific commit, which identified the problem but emphasized the need for a fix in the main branch. Instructions for setting inference parameters and fine-tuning vision models within Axolotl were detailed for users.

  • Data Extraction with a Twist of Structure: Community showcase hinted at positive results after fine-tuning LLMs with Axolotl, particularly in transforming unstructured press releases into structured outputs. A forthcoming post promises to expound on the use of the OpenAI API’s function calling to enhance LLM accuracy in this task. The author points to a detailed post for more information.


Perplexity AI Discord

  • Pro Language Partnerships!: Perplexity AI has inked a deal with SoftBank, offering Perplexity Pro free for one year to SoftBank customers. This premium service, typically costing 29,500 yen annually, is set to enhance users’ exploration and learning experiences through AI (More info on the partnership).

  • Circumventing A/B Testing Protocols? Think Again: Engineers discussed how to bypass A/B testing for Agentic Pro Search, with a Reddit link provided; however, concerns about integrity led to reconsideration. The community also tackled a myriad of usage questions on Perplexity features, debated the merits of subscriptions to Perplexity versus ChatGPT, and raised critical privacy issues concerning web crawling practices.

  • API Access is the Name of the Game: Members expressed urgency for closed-beta access to the Perplexity API, emphasizing the impact on launching projects like those at Kalshi. Troubleshooting Custom GPT issues, they exchanged tips to enhance its “ask-anything” feature using schema-based explanations and error detail to improve action/function call handling.

  • Community Leaks and Shares: Links to Perplexity AI searches and pages on varied topics, from data table management tools (Tanstack Table) to Russia’s pet food market and elephant communication strategies, were circulated. A mishap with a publicized personal document on prostate health led to community-driven support resolving the issue.

  • Gaming and Research Collide: The shared content within the community included a mix of academic interests and gaming culture, demonstrated by a publicly posted page pertaining to The Elder Scrolls, hinting at the intersecting passions of the technical audience involved.


Nous Research AI Discord

  • Neurons Gaming with Doom: An innovative approach brings together biotech and gaming as living neurons are used to play the video game Doom, detailed in a YouTube video. This could be a step forward in understanding biological process integration with digital systems.

  • AI Ethics and Bias in the Spotlight: A critical take on AI discussed in a ResearchGate paper calls attention to AI’s trajectory towards promulgating human bias and aligned corporate interests, naming “stochastic parrots” as potential instruments of cognitive manipulation.

  • LLM Merging and MoE Concerns: An engaged debate over the practical use of Mixture of Experts (MoE) models surfaced, contemplating the effectiveness of model merging versus comprehensive fine-tuning, citing a PR on llama.cpp and MoE models on Hugging Face.

  • Llama3 8B Deployment Challenges: On setting up and deploying Llama3 8B, it was advised to utilize platforms like unsloth qlora, Axolotl, and Llamafactory for training, and lmstudio or Ollama for running fast OAI-compatible endpoints on Apple’s M2 Ultra, shedding light on tooling for model deployment.

  • Autechre Tunes Stir Debate: Opinions and emotions around Autechre’s music led to sharing of contrasting YouTube videos, “Gantz Graf” and “Altibzz”, showcasing the diverse auditory landscapes crafted by the electronic music duo.

  • Explore Multiplayer AI World Building: A suggestion was raised for collaborative creation in WorldSim, as members discussed enabling multiplayer features for AI-assisted co-op experiences, while noting that censorship from the model provider could influence WorldSim AI content.

  • NVIDIA’s LLM Rolls Out: Introductions to NVIDIA’s Nemotron-4-340B-Instruct model, accessible on Hugging Face, kindled talks on synthetic data generation and strategic partnerships, highlighting the company’s new stride into language processing.

  • OpenAI’s Profit-Minded Pivot: OpenAI’s CEO Sam Altman has indicated a potential shift from a non-profit to a for-profit setup, aligning closer to competitors and affecting the organizational dynamic and future trajectories within the AI industry.


Modular (Mojo đŸ”„) Discord

  • Mojo Functions Discussion Heats Up: Engineers critiqued the Mojo manual’s treatment of def and fn functions, highlighting the ambiguity in English phrasing and implications for type declarations in these function variants. This led to a consensus that while def functions permit optional type declarations, fn functions enforce them; a nuanced distinction impacting code flexibility and type safety.

  • Meetup Alert: Mojo Community Gathers: An upcoming Mojo Community Meeting was announced, featuring talks on constraints, Lightbug, and Python interoperability, inviting participants to join via Zoom. Moreover, benchmark tests revealed that Mojo’s Lightbug outstrips Python FastAPI in single-threaded performance yet falls short of Rust Actix, sparking further discussion on potential runtime costs entailed by function coloring decisions.

  • Fresh Release of Mojo 24.4: The Mojo team has rolled out version 24.4, introducing core language and standard library improvements. Detail-oriented engineers were pointed towards a blog post for a deep dive into the new traits, OS module features, and more.

  • Advanced Mojo Techniques Uncovered: Deep technical discussions unveiled challenges and insights in Mojo programming, from handling 2D Numpy arrays and leveraging DTypePointer for efficient SIMD operations to addressing bugs in casting unsigned integers. Notably, a discrepancy involving alias usage in CRC32 table initialization sparked an investigation into unexpected casting behaviors.

  • Nightly Mojo Compiler on the Horizon: Engineers were informed about the new nightly builds of the Mojo compiler with the release of versions 2024.6.1505, 2024.6.1605, and 2024.6.1705, along with instructions to update via modular update. Each version’s specifics could be examined via provided GitHub diffs, showcasing the platform’s continuous refinement. Additionally, the absence of external documentation for built-in MLIR dialects was noted, and enhancements such as direct output expressions in REPL were requested.


Eleuther Discord

  • Replication of OpenAI’s Generalization Techniques by Eleuther: EleutherAI’s interpretability team successfully replicated OpenAI’s “weak-to-strong” generalization on open-source LLMs across 21 NLP datasets, publishing a detailed account of their findings, positive and negative, on experimenting with variants like strong-to-strong training and probe-based methods, here.

  • Job Opportunities and Navigating CommonCrawl: The AI Safety Institute announced new roles with visa assistance for UK relocation on their careers page, while discussions on efficiently processing CommonCrawl data mentioned tools like ccget and resiliparse.

  • Model Innovations and Concerns: From exploring RWKV-CLIP, a vision-language model, to concerns about content generated by diffusion models and the stealing of commercial model outputs, the community addressed various aspects of AI model development and security. The effectiveness of the Laprop optimizer was debated, and papers ranging from online adaptation to “stealing” embedding models were shared; a key paper is here.

  • Evolving Optimization and Scaling Laws: A member’s critique of a hypernetwork-based paper sparked conversations on the value and comparison of hypernetworks with Hopfield nets. Interested parties ventured into the scaling of scaling laws, considering online adaptation for LLMs and citing Andy L. Jones’ concept of offsetting training compute against inference compute.

  • Interpretability Insights on Sparse Autoencoders: Interpretability research centered around Sparse Autoencoders, with a paper proposing a framework for evaluating feature dictionaries in tasks like indirect object identification with GPT-2, and another highlighting “logit prisms” decomposing logit output components, as documented in this article.

  • Need for A Shared Platform for Model Evaluation: Calls were made for a platform to share and validate evaluation results of AI models, particularly for those using Hugging Face and seeking to verify the credibility of closed-source models, highlighting the need for comprehensive and transparent evaluation metrics.

  • Awaiting Code Release for Vision-Language Project: A specific request for a release date for code related to RWKV-CLIP was directed to the GitHub Issues page of the project, indicating a demand for access to the latest advancements in vision-language representation models.


LLM Finetuning (Hamel + Dan) Discord

  • Apple Sidesteps NVIDIA in AI: Apple’s WWDC reveal detailed its avoidance of NVIDIA hardware in favor of its in-house AXLearn on TPUs and Apple Silicon, potentially revolutionizing its AI development strategy. The technical scoop is unpacked in a Trail of Bits blog post.

  • Embeddings and Fine-Tuning: Enthusiasm emerges for fine-tuning methodologies, with discussions ranging from embedding intricacies, highlighted by resources like Awesome Embeddings, to specific practices like adapting TinyLlama for unique narration styles, detailed in a developer’s blog post.

  • Prompt Crafting Innovations: Mention of Promptfoo and inspect-ai indicates a trend toward more sophisticated prompt engineering tools, with the community weighing functionality and user-friendliness. Diverging preferences suggest such tools are pivotal for refined human-AI interaction schemes.

  • Crediting Confusions Cleared: Participants express mixed signals about course credits across platforms like LangSmith and Replicate, with reminders and clarifications surfacing through communal support. The difference between beta and course credits was elucidated for concerned members.

  • Code Llama Leaps Forward: Conversations ignited by the release of Code Llama show a commitment to enhancing programming productivity. Curiosity about permissible variability between Hugging Face and GitHub configuration formats for Code Llama indicates the precision required for fine-tuning these purpose-built models.


Interconnects (Nathan Lambert) Discord

  • Sakana AI Joins the Unicorn Club: Sakana AI, pushing past traditional transformer models, has secured a monster $1B valuation from heavy-hitters like NEA, Lux, and Khosla, marking a significant milestone for the AI community. Full financial details can be ferreted out in this article.

  • Next-Gen Video Generation with Runway’s Gen-3 Alpha: Runway has turned heads with its Gen-3 Alpha, flaunting the ability to create high-quality videos replete with intricate scene transitions and a cornucopia of cinematographic styles, setting a new bar in video generation which can be explored here.

  • DeepMind’s Video-Turned-Audio Breakthrough: Google DeepMind’s new video-to-audio technology aims to revolutionize silent AI video generations by churning out a theoretically infinite number of tracks tailored to any video, as showcased in Rowan Cheung’s examples.

  • Wayve’s Impressive Take on View Synthesis: Wayve claims a fresh victory in AI with a view synthesis model that leverages 4D Gaussians, promising a significant leap in generating new perspectives from static images, detailed in Jon Barron’s tweet.

  • Speculations Stir on OpenAI’s Future: Whispers of OpenAI’s governance shake-up suggest a potential pivot to a for-profit stance with musings of a subsequent IPO, stirring debate within the community; some greet with derision while others await concrete developments, as covered in The Information and echoed by Jacques Thibault’s tweet.


LlamaIndex Discord

  • RAG and Agents Drawn Clear: An Excalidraw-enhanced slide deck was shared detailing the construction of Retrieval-Augmented Generation (RAG) and Agents, containing diagrams that elucidate concepts from simple to advanced levels.

  • Observability Integrated in LLM Apps: A new module for instrumentation brings end-to-end observability to LLM applications through Arize integration, with a guide available detailing custom event/span handler instrumentation.

  • Knowledge Graphs Meet Neo4j: Discussions around integrating Neo4j knowledge graphs with LlamaIndex focused on transforming Neo4j graphs into property graphs for LlamaIndex, with resources and documentation provided (LlamaIndex Property Graph Example).

  • Enhanced LLMs with Web Scraping Strategies: A publication discusses improving LLMs by combining them with web scraping and RAG, recommending tools such as Firecrawl for effective Markdown extraction and Scrapfly for diverse output formats suitable for LLM preprocessing.

  • Practical Tutorials and AI Event Highlights: Practical step-by-step guides for full-stack agents and multimodal RAG pipelines were made available, and the AI World’s Fair was highlighted, with noteworthy speakers sharing their knowledge on AI and engineering, enhancing the community’s skill set and understanding of emerging AI trends.


tinygrad (George Hotz) Discord

  • Script Snafu and OpenCL Woes: Discussions around autogen_stubs.sh revealed that clang2py breaks the indentation, though the script proved unnecessary for GPU-accelerated tinygrad operations. Meanwhile, George Hotz suggested fixing the OpenCL installation and verifying with clinfo due to errors affecting tinygrad’s GPU functionality.

  • Enhanced OpenCL Diagnostics on the Horizon: A move to improve OpenCL error messages is underway, with a proposed solution that autonomously generates messages from available OpenCL headers, aiming to ease developers’ debugging process.

  • Deciphering Gradient Synchronization: In a bid to demystify gradient synchronization, George Hotz affirmed Tinygrad’s built-in solution within its optimizer, touting its efficiency compared to the more complex Distributed Data Parallel in PyTorch.

  • Chasing PyTorch’s Tail with Ambitions and Actions: George Hotz conveyed ambitions for tinygrad to eclipse PyTorch in terms of speed, simplicity, and reliability. Although currently trailing, particularly in LLM training, tinygrad’s clean design and strong foundation exude promise.

  • Precision Matters in the Kernel Cosmos: A technical exchange discussed strategies for incorporating mixed precision in models, where George Hotz recommended late casting for efficiency gains and the use of cast_ methods, highlighting a critical aspect of optimizing for computation-heavy tasks.


OpenRouter (Alex Atallah) Discord

  • GPT Notes App Unveiled: An LLM client and notes app hybrid has been demonstrated, featuring dynamic inclusion of notes, vanilla JavaScript construction, and local storage of notes and API keys in the browser; however, it currently lacks mobile support. The app is showcased with a Codepen and a full-screen deployment.

  • OpenRouter Gripes and Glimpses: OpenRouter requires at least one user message to prevent errors, with users suggesting the use of the prompt parameter; formatting tools like PDF.js and Jina AI Reader are recommended for PDF pre-processing to enhance LLM compatibility.

  • Censorship Consternation with Qwen2: The Qwen2 model is facing user criticism for excessive censorship, while the less restrictive Dolphin Qwen 2 model garners recommendation for its more realistic narrative generation.

  • Gemini Flash Context Clash: Questions arise over Gemini Flash’s token limits, with OpenRouter listing a 22k limit, in contrast to the 8k tokens cited in the Gemini Documentation; the discrepancy is attributed to OpenRouter’s character counting to align with Vertex AI’s pricing.

  • Rate Limits and Configuration Conversations: Users discuss rate limits for models like GPT-4o and Opus and model performance configurations; for further information, the OpenRouter documentation on rate limits proves informative, and there is a focus on efficiency in API requests and usage.


LangChain AI Discord

  • LangChain API Update Breaks TextGen: A recent API update has disrupted textgen integration in LangChain, with members seeking solutions in the general channel.

  • Technical Troubleshooting Takes the Stage: Users discussed challenges with installing langchain_postgres and a ModuleNotFoundError caused by an update to tenacity version 8.4.0; reverting to version 8.3.0 fixed the issue.

  • LangChain Knowledge Sharing: Questions around LangChain usage emerged, including transitioning from Python to JavaScript implementations, and handling of models like Llama 3 or Google Gemini for local deployment.

  • Tech enthusiasts Intro New Cool Toys: Innovative projects were highlighted such as R2R’s automatic knowledge graph construction, an interactive map for Collision events, and CryptGPT, which is a privacy-preserving approach to LLMs using Vigenere cipher.

  • AI for the Creatively Inclined: Community members announced a custom GPT for generating technical diagrams, and Rubik’s AI, a research assistant and search engine offering free premium with models like GPT-4 Turbo to beta testers.


Latent Space Discord

OtterTune Exits Stage Left: OtterTuneAI has shut down following a failed acquisition deal, marking the end of their automatic database tuning services.

Apple and OpenAI Make Moves: Apple released optimized on-device models on Hugging Face, such as DETR Resnet50 Core ML, while OpenAI faced criticism from Edward Snowden for adding former NSA Director Paul M. Nakasone to its board.

DeepMind Stays in Its Lane: In recent community discussions, it was clarified that DeepMind has not been contributing to specific AI projects, debunking earlier speculation.

Runway and Anthropic Innovate: Runway announced their new video generation model, Gen-3 Alpha, on Twitter, while Anthropic publicized important research on AI models hacking their reward systems in a blog post.

Future of AI in Collaboration and Learning: Prime Intellect is set to open source sophisticated models DiLoco and DiPaco, Bittensor is making use of The Horde for decentralized training, and a YouTube video shared among users breaks down optimizers critical for model training.


Cohere Discord

  • AGI: Fantasy or Future?: Members shared their perspectives on a YouTube video about AGI, discussing the balance between skepticism and the potential for real progress that parallels the aftermath of the dot-com bubble.

  • Next.js Migrations Ahead: There’s a collaborative push to utilize Next.js App Router for the Cohere toolkit, aiming at better code portability and community contribution, details of which are in GitHub issue #219.

  • C4AI by Cohere: Nick Frosst invites members to a C4AI talk via a Google Meet link, offering an avenue for community engagement on LLM advancements and applications.

  • Command Your Browser: A free Chrome Extension has been released, baking LLMs into Chrome to boost productivity, while an interactive Collision map with AI chat features showcases events using modern web tech stacks.

  • Developer Touch Base: Cohere is hosting Developer Office Hours with David Stewart for a deep dive into API and model intricacies; interested community members can join here and post their questions on the mentioned thread for dedicated support.


OpenInterpreter Discord

  • Frozen Model Mystery Solved: Engineers reported instances of a model freezing during coding, but it was determined that patience pays off as the model generally completes the task, albeit with a deceptive pause.

  • Tech Support Redirect: A query about Windows installation issues for a model led to advice pointing the user towards a specific help channel for more targeted assistance.

  • Model Memory Just Got Better: A member celebrated a breakthrough with memory implementation, achieving success they described in rudimentary terms; meanwhile, Llama 3 Instruct 70b and 8b performance details were disclosed through a Reddit post.

  • Cyber Hat Countdown: An open-source, AI-enabled “cyber hat” project sparked interest among engineers for its originality and potential for innovation, with an open invite for collaboration (watch here); similarly, Dream Machine’s text- and image-based realistic video creation signaled strides in AI model capabilities.

  • Semantic Search Synergy: Conversation turned to the fusion of voice-based semantic search and indexing with a vector database holding audio data, leveraging the prowess of an LLM to perform complex tasks based on vocal inputs, suggesting the nascent power of integrated tech systems.


Torchtune Discord

  • Tuning Into Torchtune’s Single Node Priorities: Torchtune is focusing on optimizing single-node training before considering multi-node training; it utilizes the tune run command as a wrapper for torchrun, which might support multi-node setups with some adjustments, despite being untested for such use.

  • Unlocking Multi-Node Potential in Torchtune: Some members shared how to potentially configure Torchtune for multi-node training, suggesting the use of tune run --nnodes 2 and additional tools like TorchX or slurm for script execution and network coordination across nodes, referencing the FullyShardedDataParallel documentation as a resource for sharding strategies; a generic sketch of the per-process setup follows.
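
Torchtune’s own multi-node path is untested per the discussion above, so this is only a generic PyTorch sketch of what each process does once a launcher (torchrun, TorchX, or a slurm wrapper) has set the standard rendezvous environment variables across nodes; nothing here is Torchtune-specific.

```python
import os
import torch.distributed as dist

# A launcher (torchrun, TorchX, or a slurm wrapper) sets these standard
# rendezvous variables on every node; each process just reads them.
rank = int(os.environ["RANK"])               # global rank across all nodes
world_size = int(os.environ["WORLD_SIZE"])   # nnodes * nproc_per_node

dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
# FSDP wrapping of the model would happen here; see the
# FullyShardedDataParallel docs referenced above for sharding strategies.
dist.destroy_process_group()
```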


DiscoResearch Discord

  • Llama3 Sticks to Its Roots: Despite the introduction of a German model, the Llama3 tokenizer has not been modified and remains identical to the base Llama3, raising questions about its efficiency in handling German tokens.
  • Token Talk: Concerns emerged over the unchanged tokenizer, with engineers speculating that not incorporating specific German tokens could substantially reduce the context window and affect the quality of embeddings.
  • Comparing Llama2 and Llama3 Token Sizes: Inquisitive minds noted that Llama3’s tokenizer is notably 4 times larger than Llama2’s, leading to questions about its efficacy with the German language and potential unrecognized issues.

Datasette - LLM (@SimonW) Discord

Heralding Data Engineering Job Security: ChatGPT’s burgeoning role in the tech landscape drew humor-inflected commentary that it represents an infinite job generator for data engineers.

Thoughtbot Clears the Fog on LLMs: The guild appreciated a guide by Thoughtbot for its lucidity in dissecting the world of Large Language Models, specifically its delineation of Base, Instruct, and Chat models, which can aid beginners.

New Kid on the Search Block: Turso’s latest release integrates native vector search with SQLite, aiming to enhance the AI product development experience by removing the need for independent extensions like sqlite-vss.


AI Stack Devs (Yoko Li) Discord

  • In Search of Hospital AI Project Name: User gomiez inquired about the name of the hospital AI project within the AI Stack Devs community. There was no additional context or responses provided to further identify the project.

Mozilla AI Discord

  • Llama as Firefox’s New Search Companion?: A guild member, cryovolcano., inquired about the possibility of integrating llamafile with tinyllama as a search engine in the Firefox browser. No further details or context about the implementation or feasibility were provided.

The LLM Perf Enthusiasts AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


The YAIG (a16z Infra) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.


PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Stability.ai (Stable Diffusion) ▷ #general-chat (723 messagesđŸ”„đŸ”„đŸ”„):

  • Lack of Trust in SD3 License Creates Chaos: There are significant concerns over Stability AI’s new license for SD3, leading to temporary banning of SD3-related content on Civitai due to the perceived legal ambiguities. Civitai announcement mentions that “legal team review” is underway.
  • Community Frustration and Critics’ Backlash: Many users voice their frustrations and criticisms towards Stability AI’s confusing license and handling of SD3’s release. One user notes, “The worst base model release yet
 I just wanted nice hands.”
  • Inquiry and Troubleshooting in ComfyUI: Several users discuss issues and fixes for ComfyUI setup, particularly around custom nodes installations and dependencies like cv2. One user shared a helpful ComfyUI install tutorial.
  • Interest in Model Applications and Alternatives: Users explore models for various art styles and uses, such as retro dark fantasy and video generation with animatediff tools. User discussions imply the open-source community might pivot attention to alternative models and tools post-SD3 controversy.
  • YouTuber Olivio Sarikas Faces Scrutiny: Multiple users discuss the YouTuber’s video on SD3’s license, accusing him of spreading misinformation and overblown fears about the legal implications, with one stating, “Olivio had all the information
 and willfully misreported it to farm views.”

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (517 messagesđŸ”„đŸ”„đŸ”„):

  • Work in Progress on Ollama Support: A member stated, "Unfortunately the Ollama support got kinda delayed," but reassured that they are "working with the amazing Ollama team." The support is around 80% complete.
  • Validation Issues in Template Fine-Tuning: A member queried about validating templates for use with Ollama and discussed issues with learning rates and model configurations. They noted, "I had acceptable results with my merged models but it turns sick sometimes."
  • Push to HF Merged Models Issue: A member raised a problem where running `model.push_to_hub_merged` only saves the adapter but not the full merged model. Another member suggested a workaround involving manually merging before uploading.
  • Training Performance Comparisons: A user highlighted Unsloth's performance in training speed, claiming it was "24% faster than torch.compile() torchtune for 4090" based on their benchmarking results. The Unsloth team acknowledged this and discussed the possibility of releasing an academic paper on it.
  • Upcoming Multi-GPU Support: The team confirmed that they will be implementing multi-GPU support up to 8 GPUs. A small group is getting early access for initial testing.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (17 messagesđŸ”„):

  • Vintage Music Video Shared: A member posted a YouTube video titled “Not like us (1962) full song,” indicating their appreciation for older music styles. Another member complimented the taste, humorously noting they’ve only listened to anime songs.
  • Darude’s Sandstorm and Musical Preferences: A member jokingly shared Darude - Sandstorm, later revealing a genuine preference for Daft Punk’s Discovery album, sharing it on Spotify. Other users chimed in to share their favorite Daft Punk songs like “Lose Yourself to Dance.”
  • Mixed Reactions to Gemma 2 on AI Studio: A member mentioned trying out Gemma 2 27b on aistudio.google.com, noting the output was not impressive. Another user recognized the reference from Reddit, while others expressed excitement and anticipation for Gemma 2 and its potential capabilities.
  • Speculation and Excitement for Gemini 2.0: Users speculated that the release of Gemma 2 could mean that Gemini 2.0 is also near. There was notable excitement about the potential for training the model, with one user contemplating renting a Runpod 48GB instance to thoroughly test the model’s performance and capacity.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #help (304 messagesđŸ”„đŸ”„):

  • Facing issues with Triton on Windows: A member reported issues installing Triton on Windows 11 even after setting up Visual C++ correctly. Assistance was provided by querying if g++ or clang++ could be called from the terminal.

  • Data Preparation Tutorial Request: A member inquired about a data preparation tutorial for Unsloth fine-tuning similar to OpenAI’s chat fine-tuning data prep notebook. Another member cited a plan to create a tutorial and recommended a related YouTube video.

  • Model training crashes during saving: A member experienced crashes while training the Yi model during the last saving steps, suspecting memory or disk space issues. It was suggested to check available memory and disk space, and a link to Unsloth’s saving issues on GitHub was provided.

  • Issues with batch size and gradient accumulation: A member questioned the discrepancy in VRAM usage when adjusting batch size and gradient accumulation. Discussions clarified that gradient accumulation steps act like an increased batch size (see the sketch after this list), and experimenting with larger batch sizes was recommended.

  • Error with quantization_method in save.py: A bug was identified where quantization_method was mishandled as a string, leading to errors. A workaround involved passing quantization_method as a list, and a pull request to fix the bug was submitted.
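
To make the batch-size discussion above concrete, here is a minimal sketch of gradient accumulation: gradients from several micro-batches are summed before a single optimizer step, approximating a larger batch without the corresponding activation memory.

```python
import torch

def accumulate_and_step(model, loss_fn, optimizer, micro_batches, accum_steps=4):
    """Sum gradients over several micro-batches, then take one optimizer step.

    Mathematically this approximates a batch `accum_steps` times larger,
    but only one micro-batch's activations are ever live in VRAM.
    """
    optimizer.zero_grad()
    for i, (x, y) in enumerate(micro_batches):
        loss = loss_fn(model(x), y) / accum_steps   # scale so gradients average
        loss.backward()                             # grads add up across calls
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

This is also why VRAM does not grow the way a genuinely larger batch would: only one micro-batch’s activations are live at a time, while gradient and optimizer memory stay constant.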

Links mentioned:


Unsloth AI (Daniel Han) ▷ #showcase (3 messages):

  • CryptGPT introduces privacy-preserving LLMs: A user shared an introductory blog post titled “CryptGPT: Privacy-Preserving LLMs using Vigenere cipher”. The blog post describes pretraining a GPT-2 model on an encrypted dataset, achieving comparable performance to a regular GPT-2 but requiring an encryption key to use it. Blog Post Link.
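
The cipher itself is simple; below is a minimal sketch over printable ASCII (CryptGPT’s exact alphabet and tokenizer handling may differ, so treat this as illustrative):

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Classic Vigenere cipher over printable ASCII (codes 32-126).

    CryptGPT pretrains GPT-2 directly on text transformed this way, so a
    matching key is needed both to write prompts and to read completions.
    """
    base, span = 32, 95
    out = []
    for i, ch in enumerate(text):
        shift = ord(key[i % len(key)]) - base
        if decrypt:
            shift = -shift
        out.append(chr((ord(ch) - base + shift) % span + base))
    return "".join(out)

ciphertext = vigenere("hello world", "secret key")
assert vigenere(ciphertext, "secret key", decrypt=True) == "hello world"
```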

Link mentioned: Tweet from Diwank Singh (@diwanksingh): http://x.com/i/article/1802116084507848704


Unsloth AI (Daniel Han) ▷ #community-collaboration (1 messages):

starsupernova: Oh very interesting!


CUDA MODE ▷ #general (49 messagesđŸ”„):

  • Lighting AI Interface Suggestions: A member shared the NVIDIA warp example code and sought advice on a graphical interface to see the rendered results. They considered setting up a VNC session to resolve the issue.

  • Solved NVRTC Compilation Error: A user described an issue with NVRTC where compiling multiple kernels resulted in ‘invalid resource handle’. They later resolved it by avoiding initializing a new context for each compilation, which was causing CUDA to free the modules/functions.

  • GPU SM Count Discrepancy: A query was raised about the discrepancies between measured and reported SM counts for the A10G GPU, noting that techpowerup reports 72 SMs while pycuda measures 80. It was clarified that the site might be wrong and other sources confirm 80 SMs.

  • New NVIDIA 5090 GPU Speculations: Members discussed the upcoming NVIDIA 5090, with speculations about it having up to 64 GB of VRAM (source). There were debates about the likelihood of these specs, with pessimistic views on seeing 64GB in consumer versions.

  • Value of Forum Knowledge in Daily AI Work: A member expressed doubts about the practical value of most discussions in their daily AI work apart from a few specific topics. Others responded by emphasizing the importance of performance optimization and the general value of learning and being part of such communities.

Links mentioned:


CUDA MODE ▷ #triton (2 messages):

  • AttributeError in Triton on Colab: A user encountered an AttributeError while running Fused Softmax from Triton’s official tutorial on Colab. The error message indicated 'CudaDriver' object has no attribute 'active' and they are seeking assistance for this issue.

  • Nested Reduction Feasibility in Triton: Another user inquired about the possibility of performing nested reductions in Triton. They are interested in running reduction code at various stages to handle quadrants individually, asking if this staged reduction is supported.


CUDA MODE ▷ #torch (10 messagesđŸ”„):

  • Error with torch.compile(mode="max-autotune"): A user reported receiving an error message, Not enough SMs to use max_autotune_gemm mode, due to a hard-coded limit of 68 SMs in the PyTorch code, while their GPU only has 66 SMs. The user shared a link to the relevant section in the PyTorch repository.

  • Discussion on Reducing SM Threshold: A member suggested lowering the SM threshold to test if performance remains good without needing to rebuild from source. The lack of consumer GPUs in CI was mentioned as a reason for the current hard-coded value.

  • Testing Performance with Modified SM Threshold: After changing the SM threshold to 0, the user reported no significant performance improvement.

  • Enabling Coordinate Descent Tuning: Another member proposed enabling coordinate descent tuning, found in inductor/config.py, as a potential solution for improving performance (see the sketch below).
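
A minimal sketch of the two knobs discussed, assuming a recent PyTorch where the inductor config is exposed as torch._inductor.config (these attribute names come from the file cited above and can shift between releases):

```python
import torch
import torch._inductor.config as inductor_config

# Proposed workaround: coordinate descent tuning (from inductor/config.py),
# useful when the max-autotune GEMM path refuses GPUs under the SM threshold.
inductor_config.coordinate_descent_tuning = True

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())
# On GPUs with fewer than the hard-coded 68 SMs this logs
# "Not enough SMs to use max_autotune_gemm mode" and falls back.
compiled = torch.compile(model, mode="max-autotune")
```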

Link mentioned: pytorch/torch/_inductor/utils.py at f0d68120f4e99ee6c05f1235d9b42a4524af39d5 · pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration - pytorch/pytorch


CUDA MODE ▷ #algorithms (2 messages):

  • Vayuda paper sparks interest in search algorithms: A member shared a link to the Vayuda paper expressing hope that more people would work on search. This implies a potential for significant research and development in the area.

  • Matching GPT-4 with LLaMA 3 8B impresses: A member was impressed by how well matching GPT-4 performance with LLaMA 3 8B turned out, highlighting it as noteworthy in current AI capabilities.


CUDA MODE ▷ #beginner (5 messages):

  • Blockwise softmax not in PMPP book: Blockwise softmax concepts are not covered in the PMPP book, but understanding the flash-attn algorithm and shared memory (smem) is crucial. High-end implementations leverage tensor cores, requiring further exploration into resources like CUTLASS; a sketch of the blockwise trick follows this list.
  • Start with accessible YouTube lectures: For newcomers to GPU programming and high-performance computing, starting with YouTube lectures is advised. These lectures aim to provide an accessible introduction to the fundamentals.
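
As referenced above, here is a NumPy sketch of the blockwise (online) softmax trick that flash attention builds on: keep a running max and a running denominator, rescaling the denominator whenever a new block raises the max, so the full row never has to be resident at once.

```python
import numpy as np

def blockwise_softmax(x: np.ndarray, block: int = 4) -> np.ndarray:
    """Numerically stable softmax computed one block at a time.

    Each block updates a running max and a running denominator (rescaling
    the old one), which is the core trick flash attention applies per tile.
    """
    m, denom = -np.inf, 0.0
    for start in range(0, len(x), block):
        xb = x[start:start + block]
        m_new = max(m, float(xb.max()))
        denom = denom * np.exp(m - m_new) + np.exp(xb - m_new).sum()
        m = m_new
    return np.exp(x - m) / denom  # final pass with the converged max/denominator

x = np.random.randn(10)
ref = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
assert np.allclose(blockwise_softmax(x), ref)
```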

CUDA MODE ▷ #jax (1 messages):

  • Announcing tpux for simplifying Cloud TPU: A member announced the tpux project, a suite of tools aimed at simplifying Cloud TPU setup and operation to facilitate the usage of JAX across multiple hosts. For more details, visit tpux on GitHub and give it a ⭐.

Link mentioned: GitHub - yixiaoer/tpux: A set of Python scripts that makes your experience on TPU better: A set of Python scripts that makes your experience on TPU better - yixiaoer/tpux


CUDA MODE ▷ #torchao (11 messagesđŸ”„):

  • Quant API Import Documentation Issue: A member flagged a correction stating, unwrap_tensor_subclass() should be imported from torchao.utils or torchao.quantization.quant_api, not torchao.quantization.utils. They emphasized the importance of users calling unwrap_tensor_subclass() before compiling the quant model to avoid errors.

  • API Release Delay and BC Issues: It was confirmed that the 0.3 release is being delayed due to backward compatibility issues that need resolution. This delay ensures the team can address and fix critical problems.

  • Innovative API Naming with ‘brrr’ Proposal: There was a playful yet practical suggestion to create an API with the name brrr that adds additional experimental flags based on the number of ‘r’s. A member humorously asked if this was serious but also hinted at a need for easier control over torchinductor flags like use_mixed_mm.

  • Feedback on use_mixed_mm Flag: A member suggested enabling the use_mixed_mm flag by default if the relevant kernel in AO is on. This feedback may lead to a GitHub issue for further discussion and implementation.


CUDA MODE ▷ #off-topic (10 messagesđŸ”„):

  • Meta tackles large-scale AI training challenges: Meta’s article discusses the complexity and computation required to train large language models (LLMs). The shift to generative AI has necessitated a rethinking of software, hardware, and network infrastructure.

  • Interview with Esolang Academics: A YouTube video titled “Interview with Esolang Academic 2024” was shared. The full version and BC Vim Linter will be available on Patreon for $5 the following day.

  • Pessimistic Neko’s Jensen Emojis: Member pessmistic_neko posted emojis <:jensen:1189650200147542017> to express their amusement.

Links mentioned:

  • Interview with Esolang Academic 2024: Esoteric programming languageFull version + BC Vim Linter for $5 tomorrow on: https://www.patreon.com/ProgrammersAreAlsoHuman Interview with an Esoteric deve...
  • How Meta trains large language models at scale: As we continue to focus our AI research and development on solving increasingly complex problems, one of the most significant and challenging shifts we’ve experienced is the sheer scale of co


CUDA MODE ▷ #irl-meetup (1 messages):

  • Catch Akim at AI_dev Conference: One member mentioned they will “probably be at AI_dev” and invited others to reach out. They also noted that there will be a movie about “PyTorch” shown on Tuesday.

CUDA MODE ▷ #llmdotc (473 messagesđŸ”„đŸ”„đŸ”„):

Links mentioned:


CUDA MODE ▷ #oneapi (2 messages):

  • Dynamic batching support struggles with Gaudi: A member mentioned the difficulties in getting dynamic batching with vLLM ported to Gaudi. They questioned if there is an architecture limitation preventing the implementation of KV cache flash attention kernels, contrasting it with regular “rectangular” shapes that are processed without issue.

  • Channel rename suggestion to Intel: Another suggestion was to rename the channel to Intel, tagging a user for their input. This reflects a possible channel rebranding direction.


CUDA MODE ▷ #bitnet (49 messagesđŸ”„):

  • Meeting Troubleshooting and New Link Shares: Users were discussing voice chat issues and shared several resources like Python development environments with Nix. “New laptop and having some problems with ubuntu,” one mentioned while testing their setup.
  • Benchmarking and Quantization Debates: Much of the conversation centered around benchmarking matrix multiplication with different precisions and quantization techniques. One user inquired, “Are you benchmarking matmul(x_fp16, W_nbit) or do you include scaling / zeros with grouping?” while others responded with their specific benchmarking approaches and the importance of grouping for better quality. A toy sketch of what grouping adds appears after this list.
  • Resource Links for Further Reading: Several useful links were shared including a quantization technique and a library supporting mixed-precision matrix multiplications. These resources aimed to facilitate a clearer understanding of optimization strategies.
  • VRAM Constraints and GPU Considerations: Discussions also included the limitations of running larger models like llama2 locally due to VRAM constraints. One user mentioned using an XPS15 laptop with a GeForce GTX 1650 and explored alternative platforms like Lightning AI’s L4 with 22 free hours for testing.
  • New Git Pull Requests and Test Case Pushes: Updates on the development side were shared, including pushing new test cases for BitnetTensor and UInt2Tensor. Users interacted around issues and updates, as seen in the comment, “pushed test cases for BitnetTensor, UInt2Tensor and bitpacking gen,” providing collaborative development progress.
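
For readers following along, a toy dequantize-then-matmul sketch of what “scaling / zeros with grouping” adds to an n-bit matmul benchmark; real kernels fuse all of this:

```python
import torch

def quantize_grouped(W, n_bits=4, group_size=64):
    # Asymmetric per-group quantization: W ≈ (Q - zero) * scale, per group of columns.
    out_f, in_f = W.shape
    Wg = W.reshape(out_f, in_f // group_size, group_size)
    wmin = Wg.amin(dim=-1, keepdim=True)
    wmax = Wg.amax(dim=-1, keepdim=True)
    scale = (wmax - wmin).clamp(min=1e-8) / (2**n_bits - 1)
    zero = (-wmin / scale).round()
    Q = (Wg / scale + zero).round().clamp(0, 2**n_bits - 1)
    return Q, scale, zero

def matmul_nbit(x, Q, scale, zero):
    # Dequantize (this is where grouping costs extra memory traffic) then matmul.
    W = ((Q - zero) * scale).reshape(Q.shape[0], -1)
    return x @ W.T

W = torch.randn(128, 256)
Q, scale, zero = quantize_grouped(W)
x = torch.randn(4, 256)
print((matmul_nbit(x, Q, scale, zero) - x @ W.T).abs().max())  # small quantization error
```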

Links mentioned:


LM Studio ▷ #💬-general (204 messagesđŸ”„đŸ”„):

  • Link Your Code Projects with ‘lms’ Tool: With the release of LM Studio 0.2.22, users can now utilize ‘lms’ for managing models and debugging prompts. The tool helps with loading/unloading models, and inspecting raw LLM input, streamlining local AI deployments (GitHub repository).

  • Intel ARC A770 GPU Now Supported: There were several inquiries about Intel ARC A770 GPU support. Instructions were provided to enable OpenCL for Intel GPUs, emphasizing manual adjustments for GPU layers.

  • Performance Comparison and GPU Utilization: Members discussed performance comparisons, revealing mixed results with CPU vs. GPU, and specific configuration needs for optimal model performance. Issues with the Deepseek Coder V2 Lite GGUF models were addressed, highlighting the necessity to toggle Flash Attention settings.

  • Local Model Hosting Issues with Open Interpreter: Users encountered issues hosting local models for Open Interpreter via LM Studio. Recommendations included checking the detailed guide on Open Interpreter’s documentation.

  • Font Size Adjustments in LM Studio: A repeated request was to improve font size controls in LM Studio. Although there are keyboard shortcuts for zooming in/out, a more permanent and versatile solution within the app was suggested.

Links mentioned:


LM Studio ▷ #đŸ€–-models-discussion-chat (137 messagesđŸ”„đŸ”„):

  • Qwen2 and the search mishap: A user initially struggled to get coherent outputs from Qwen2 instruct, solved it using the "blank" preset, and another member advised searching within the Discord for help rather than external sites.
  • Roleplaying model recommendation: When asked for the best model for roleplaying, a member suggested Fimbulvetr-11B, describing it as effective for their needs.
  • Finding coding models amid confusion: There was a discussion about the best models for coding, emphasizing the rapidly changing landscape and the difficulty of making reliable recommendations. Users mentioned preferring Codestral and exploring Large and Small Language Models list for detailed searches.
  • New "Ultra-Quality" model releases: Members highlighted the release of new high-performance models like Psyonic-Cetacean-Ultra-Quality-20b-GGUF-imat-plus2 and discussed their testing results and quantitative improvements.
  • Discussion on DeepSeek-Coder-V2: A member noted the release of DeepSeek-Coder-V2, capturing the excitement around its coding capabilities and discussing VRAM requirements and flash attention settings for optimal performance.

Links mentioned:


LM Studio ▷ #🧠-feedback (13 messagesđŸ”„):

  • How to handle AVX2 instruction issue: A member faced issues after updating LM Studio and found that reinstalling the beta version from here resolved the problem. They warn, “do not update” afterwards to avoid recurring issues.

  • Qwen2 outputting eot_id token problem: Users reported LM Studio outputting the eot_id token for Qwen2 instead of stopping generation, similar to issues with Llama3. Suggestions included checking the preset used and whether Flash Attention was enabled.

  • Suggestion for GPU off-loading: A user proposed an enhancement to allow off-loading models to GPU before they fully load into RAM. This would benefit machines with more VRAM than RAM, particularly GPU servers, ensuring faster and more efficient model loading.

  • Stop token handling in LM Studio: Concerns were raised about LM Studio allowing stop tokens to appear in the output and not stopping generation, leading to extensive token generation. One user emphasized the need for LM Studio to honor all listed stop tokens and treat this as a release-blocking bug.

  • User interface feedback: The LM Studio interface received positive feedback for being “cool, soft, intuitive, and fast.” Another user suggested adding VRAM usage statistics for better performance monitoring.

Link mentioned: LM Studio Beta Releases: no description found


LM Studio ▷ #📝-prompts-discussion-chat (8 messagesđŸ”„):

  • Wrestling with Error Detection: A member expressed frustration over their model’s inability to detect its errors and suggested it should output “#ERROR” when it cannot self-correct. Despite clear instructions, the model keeps requesting guidance rather than failing gracefully.
  • Struggling with Text Appendages: Another member sought advice on preventing a model from adding irrelevant text at the end of responses. They specified using the bartowski/aya-23-8B-GGUF/aya-23-8B-Q8_0.gguf model and received a suggestion to try the Cohere Command R preset.

LM Studio ▷ #⚙-configs-discussion (3 messages):

  • Mikupad User Faces Config Issues: A user sought help for using Mikupad as a webUI to interact with LMS, reporting an error message for an unexpected endpoint or method. They noted, “Mikupad have same config as LMS.”
  • Codestral RAG Preset Advice Needed: A member downloaded Codestral RAG and requested advice on creating a preset oriented towards RAG (retrieval-augmented generation). They mentioned reading relevant information on Hugging Face but remained unsure about the preset creation process.

LM Studio ▷ #🎛-hardware-discussion (34 messagesđŸ”„):

  • Archiving LM Studio 0.2.23 Setup: A member shared a MirrorCreator link to the archived LM Studio 0.2.23 setup file, noting that the installers are digitally signed and can be verified for integrity.
  • Adding a Second RTX 3090: A member asked if adding a different brand RTX 3090 would cause issues and whether to retain an RTX 2070 in the same system. Advice given suggested that for best results, get the exact same card and an SLI bridge; keeping the 2070 would slow down performance.
  • Setting CPU Cores in Server Mode: A query was raised regarding the ability to set the number of CPU cores for processing in Server Mode, noting that only four cores were being utilized despite the model being loaded in RAM.
  • AMD Radeon RX 7700S GPU Detection Issues: A member faced issues with LM Studio not detecting an AMD Radeon RX 7700S GPU on a Windows laptop. The discussion sought troubleshooting steps and clarified specifics about the GPU and OS.
  • Mixing RAM Sticks Concerns: The conversation involved the viability of mixing different RAM sticks with the same speed but potentially different timings for CPU-only inference tasks. The conclusion was that it should work but to confirm compatibility using memtest.

Link mentioned: LM-Studio-0.2.23-Setup.exe - Mirrored.to - Mirrorcreator - Upload files to multiple hosts: no description found


LM Studio ▷ #đŸ§Ș-beta-releases-chat (22 messagesđŸ”„):

  • Smaug-3 Tokenizer Issue Resolved: The latest build resolves the previously noted smaug-3 tokenizer issue. This update was quickly acknowledged and appreciated by other members.
  • Decoupling ROCm from Main App: A user commended the move to decouple ROCm from the main app, highlighting the successful upgrade and smooth operation on a 7900xtx. They shared their positive experience: “working just fine for me after upgrading”.
  • Command R+ GPU Offloading Glitch: Users debated an issue where Command R+ outputs gibberish when fully offloaded to the GPU, while the same model functions correctly on the CPU. One user mentioned, “Something screwy there. My context is only 4k”, indicating it might not be a memory issue.
  • Older Version Availability: Members discussed the difficulty of accessing older versions of the app, noting that changing version numbers in the URL to access older versions no longer works. Suggestions included personally keeping copies of older versions before updating, although this was flagged as impractical post-update.

LM Studio ▷ #autogen (1 messages):

  • Environment recreation resolves API key issue: A user described an issue receiving an “incorrect API key” error that persisted until they recreated their environment and reinstalled dependencies. Setting the API key using $env:OPENAI_API_KEY resolved their problem.
  • Assistant sends blank messages, causing errors: Although the user successfully set the default message and configured a model for user proxy and chat managers, the assistant sends blank messages, which results in errors in LM Studio. They are seeking further solutions to this issue.

LM Studio ▷ #open-interpreter (13 messagesđŸ”„):

  • Interpreter defaults to GPT-4 despite LM Studio running: A user faced an issue where attempting to run interpreter --local with a running LM Studio server resulted in a prompt for a provider, and then defaulted to GPT-4 even after setting LM Studio as the provider.
  • YouTube tutorial link shared: Another user suggested following this YouTube tutorial to potentially resolve the issue with the Open Interpreter setup.
  • Need to see full server page screenshot: It was advised to have the server running with a model selected and to share a screenshot of the entire LMStudio server page to diagnose the problem.
  • MacOS vs Linux inquiry: The troubleshooting user mentioned the steps they took on MacOS, prompting an inquiry about whether the original issue occurred on Linux.
  • Simple setup steps shared: A user provided clear steps to set up the interpreter on their machine, which seemed to work fine on MacOS.

Link mentioned: ChatGPT “Code Interpreter” But 100% Open-Source (Open Interpreter Tutorial): This is my second video about Open Interpreter, with many new features and much more stability, the new Open Interpreter is amazing. Update: Mixtral 7x8b was



LM Studio ▷ #model-announcements (1 messages):

  • DeepSeek releases ultra-fast coding models: DeepSeek’s new Coding models are now available, featuring their V2 MoE with 16B total parameters and only 2.4B activated for each request. This model requires flash attention disabled for proper functioning; download it here.

  • DeepSeek’s community contributions highlighted: The DeepSeek-Coder-V2-Lite-Instruct is part of the LM Studio Community models highlights program, which emphasizes new and notable models. The GGUF quantization was provided by bartowski based on the latest llama.cpp release.

Link mentioned: lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF · Hugging Face: no description found


LM Studio ▷ #🛠-dev-chat (27 messagesđŸ”„):

  • VSCode code scripts and model suggestions integration: A member shared their “dream workflow” for integrating VSCode with various models, using tools like CodeParrot and OpenAI’s Playground to generate script files and continue.dev for code modification and explanation. They expressed challenges in iterating code versions and requested help setting up continue.dev.

  • Recommendations for model selection and config in continue.dev: Another member recommended using models like llama3 or deepseek-coder for chat and provided a configuration file example for continue.dev. They pointed to issues related to unsupported GPU (6600XT) needing OpenCL instead of ROCM.

  • GPU setup issues: A member faced problems setting up GPU acceleration with ROCM and then OpenCL, leading to repeated errors about GPU survey failures. It was suggested they might be missing drivers and to seek detailed help in a specific channel.

  • Configuring continue.dev with LM Studio: Discussions highlighted the complexities of setting up multiple servers with LM Studio for different models, and using the apiBase property in continue.dev’s config. A link to setup instructions specifically for LM Studio was shared.

  • Call for API usage of LM Studio: A member asked about using LM Studio via API through ngrok, but it was clarified that LM Studio must be installed and run locally to use its services.

Links mentioned:

  • Select models | Continue: Configure LLMs
  • Tab Autocomplete (beta) | Continue: Continue now provides support for tab autocomplete in VS Code and JetBrains IDEs. We will be greatly improving the experience over the next few releases, and it is always helpful to hear feedback. If ...
  • Example configurations | Continue: If you're looking for a quick way to create the perfect Continue setup, we've written a few sample config.jsons for common situations. You can copy these and paste them into your config.json...
  • LM Studio | Continue: LM Studio is an application for Mac, Windows, and Linux that makes it easy to locally run open-source models and comes with a great UI. To get started with LM Studio, download from the website, use th...

HuggingFace ▷ #general (372 messagesđŸ”„đŸ”„):

  • Homemade AIs for Low-Resource Devices: Users discussed self-hosted AI alternatives to GPT-4 that don’t require powerful servers. “Maybe llama3 (70B-7B), mixtral 8x7B, or command r+” were suggested.
  • FlowGPT’s NSFW Content: FlowGPT is under scrutiny for potentially allowing NSFW content, which OpenAI prohibits. One user argued that while NSFW bots are common, it’s important to clarify moral vs. legal concerns.
  • Efficient Fine-Tuning and Evaluation: Viliamvolosv shared his QLoRa settings for improving Russian language models on classic literature, seeking advice on optimal parameters. Fulx69 highlighted the importance of experimenting with r and alpha values and suggested tools for evaluation like LLaMA-Factory; a minimal LoRA config sketch appears after this list.
  • New AI Models and Tools: DeepSeek-Coder-V2 is claimed to surpass GPT-4-Turbo in coding and math, with users recommending LiveCodeBench for unbiased evaluation. It’s on @deepseek_ai.
  • Joined and Welcomed: New users like 9do4n1 and open.group joined, with others welcoming them and clarifying server rules and culture. “Welcome đŸ€—â€ messages emphasized the supportive community environment.
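
For context on the r/alpha discussion, a minimal peft sketch; the values and target modules are illustrative starting points rather than recommendations:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B")  # stand-in base model
config = LoraConfig(
    r=16,              # rank of the low-rank update; higher = more capacity to tune
    lora_alpha=32,     # scaling factor; the update is scaled by alpha / r
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which linear layers receive adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base weights
```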

Links mentioned:


HuggingFace ▷ #today-im-learning (5 messages):

  • Seeking a Model for Business Use: A member inquired about the best model for general-purpose support and business use. They specified that the largest model they can deploy is 7B.

  • Experimentation Recommended: In response, another member suggested that the choice of the model will depend on the specific use case, whether tools/agents are being used, and the deployment/affordability constraints.

  • Game Screenshot Project with GPT-4 API: A member shared their experience of using the GPT-4 API to crop and caption over 150 screenshots from the game Mirror’s Edge: Catalyst and creating a LoRA for Stable Diffusion from those images.
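
A rough sketch of that kind of captioning pipeline with the OpenAI Python client; the model name and prompt here are illustrative, not the member’s exact setup:

```python
import base64
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def caption(image_path):
    b64 = base64.b64encode(Path(image_path).read_bytes()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Write a one-line training caption for this game screenshot."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

# captions = [caption(p) for p in sorted(Path("screenshots").glob("*.png"))]
```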


HuggingFace ▷ #cool-finds (10 messagesđŸ”„):

  • RNNs vs RWKV-TS in Time Series Forecasting: A member shared an arXiv paper discussing the declining dominance of traditional RNN architectures in time series tasks. The paper introduces RWKV-TS, a novel RNN-based model, which claims better efficiency, long-term sequence information capture, and computational scalability.

  • Advanced Prompt Option Impact on Production Time: A member reported that disabling the advanced prompt option significantly reduces the production time during peak periods, improving fidelity and maintaining scene stability.

  • Web Scraping and RAG to Enhance LLMs: A Medium article was shared, explaining how integrating web scraping with retrieval-augmented generation (RAG) can power up large language models (LLMs). Techniques referenced aim to enhance data collection and prompt accuracy.

  • Labor Market Impact of LLMs: A member shared a study examining the labor market impact potential of LLMs, revealing that large portions of the U.S. workforce could see significant changes in their job tasks due to LLMs. The investigation suggests both low and high-wage workers may experience shifts in their work responsibilities.

  • Reducing AI Hallucinations through RAG: An article from Wired on reducing AI hallucinations using retrieval-augmented generation (RAG) was discussed. The approach involves a model gathering information from a custom database before generating responses, enhancing reliability and accuracy.
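
The retrieval step of that pattern is small enough to sketch end to end; the embedding below is a hashed-bigram stand-in for a real embedding model and vector store:

```python
import numpy as np

docs = [
    "Perplexity partnered with SoftBank to offer Pro free for a year in Japan.",
    "RAG retrieves passages from a custom database before the model answers.",
    "DeepSeek-Coder-V2 supports 338 programming languages.",
]

def embed(text):
    # Stand-in embedding: hashed character bigrams. Swap in a real model in practice.
    v = np.zeros(512)
    for a, b in zip(text.lower(), text.lower()[1:]):
        v[(ord(a) * 31 + ord(b)) % 512] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

doc_vecs = np.stack([embed(d) for d in docs])
query = "How does retrieval-augmented generation reduce hallucinations?"
best = docs[int(np.argmax(doc_vecs @ embed(query)))]

# Ground the generation step in the retrieved passage:
prompt = f"Answer using only this context:\n{best}\n\nQuestion: {query}"
print(prompt)
```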

Links mentioned:


HuggingFace ▷ #i-made-this (18 messagesđŸ”„):

  • Introducing Difoosion - A Web Interface for Stable Diffusion: A member showcased their Web-Interface for Stable Diffusion leveraging the diffusers library and a Pure-Python web framework, Rio. They invited the community to check it out on GitLab.

  • Ask Steve - LLMs Integration into Chrome: A member developed a Chrome extension that integrates LLMs directly into the browser, akin to GitHub Copilot but for web navigation. They introduced the tool as a way to eliminate repetitive tasks and promoted the project at Ask Steve.

  • Ilaria RVC for Voice Conversion: A member announced the creation of Ilaria RVC, a voice conversion space running on Zero, and thanked another user for their help. They shared the project on Hugging Face Spaces.

  • Demonstrating Transformers.js with LLM Temperature Parameter: A blog post was shared about the temperature parameter in LLMs, featuring an interactive demo via Transformers.js running directly in the browser. The author highlighted how this approach could revolutionize educational content by eliminating the need for hosting models, shared on Twitter. A few-line sketch of the temperature computation appears after this list.

  • PowershAI - Combining PowerShell with AI: A member introduced PowershAI, a PowerShell module allowing Function Calling with AI integration, which they developed while studying the OpenAI API. They shared their progress on GitHub and detailed their journey in a blog post.
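
The temperature computation itself fits in a few lines; a NumPy sketch of what the demo illustrates:

```python
import numpy as np

def sample(logits, temperature=1.0, rng=None):
    # Temperature rescales logits before softmax: T < 1 sharpens, T > 1 flattens.
    rng = rng or np.random.default_rng(0)
    z = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs

logits = [2.0, 1.0, 0.5, 0.1]
for t in (0.2, 1.0, 2.0):
    _, p = sample(logits, temperature=t)
    print(t, np.round(p, 3))  # lower T piles probability onto the top token
```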

Links mentioned:


HuggingFace ▷ #reading-group (16 messagesđŸ”„):

  • QKV in ViT challenged and experiments planned: A user questioned the correctness of the QKV implementation in ViTs, describing it as “wrong” and promising to conduct experiments to provide insights. More on this in the coming days.

  • HyperZZW vs Self-Attention: A member shared a critique of the self-attention mechanism in ViTs, proposing the HyperZZW operator as a simpler and more reasonable alternative. They linked a detailed post on X (Twitter), suggesting that it deals better with spatial information.

  • Global HyperZZW and tokenization issues: The same user argued that converting images into tokens in ViTs is fundamentally flawed and that the Global HyperZZW branch can manage global position info more efficiently with a matrix multiplication strategy.

  • Different strategies for image and text data: They also stressed that images and text are fundamentally different, making ViT’s implementation inappropriate for vision data, hinting at the use of prior information for future sequence modeling instead of attention mechanisms.

  • Slow neural loss and local feedback error: Contributions like slow neural loss as local feedback error have been verified and mentioned as a potential key element for next-gen architectures, inspired by Hinton’s proposal. This was promoted with another Twitter link.

Links mentioned:

  • Tweet from Harvie Zhang (@harvie_zhang): I propose a #HyperZZW operator with linear complexity to replace the #SelfAttention mechanism. The pixcel-level scores are obtained by Hadamard product between large implicit kernels and input activat...
  • Tweet from Harvie Zhang (@harvie_zhang): Do you think there is any difference between your proposed loss and my slow neural loss? Please also refer to Eqn. 11-12 in https://arxiv.org/pdf/2401.17948. Quoting Francesco Faccio (@FaccioAI) ...

HuggingFace ▷ #computer-vision (4 messages):

  • Inquiring about VASA models: A member asked if anyone has figured out the VASA-like open-source models. There’s no indication of a follow-up or response in the provided messages.
  • Interest in mobile CLIP: Another member queried if Hugging Face will implement the mobile CLIP model. There were no further discussions or responses to this question in the provided messages.

HuggingFace ▷ #NLP (5 messages):

  • Fine-tuning BERT methods shared: A member suggested using the method outlined in this tutorial for fine-tuning BERT.
  • Randomness issue in HF model loading: A user mentioned that loading HuggingFace models multiple times leads to different validation outputs, suggesting saving untrained model states for reproducibility. They noted, “don’t rely on HF initialization to be deterministic
 Save your untrained model state”. A minimal sketch of that workaround appears after this list.
  • Trouble with Mistral-7b-0.3 context handling: A new member is having issues with Mistral-7b-0.3 model not handling a context length properly, failing to answer questions beyond the first half of the context. They seek guidance on whether they misunderstood the model capabilities.
  • New Open Source TTS model: A member shared a new TTS model, MARS5-TTS, inviting their team to a talk on the Mozilla AI Main stage. They requested the community to submit any questions they might have for the MARS5-TTS team.
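
A minimal sketch of the reproducibility workaround described above: seed, save the freshly initialized weights once, and reload them for every run:

```python
import torch
from transformers import AutoModelForSequenceClassification

torch.manual_seed(0)  # seeds the randomly initialized classification head
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Persist the untrained state once...
torch.save(model.state_dict(), "untrained_init.pt")

# ...and restore it at the start of every later run instead of re-initializing:
model.load_state_dict(torch.load("untrained_init.pt"))
```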

Links mentioned:


HuggingFace ▷ #diffusion-discussions (5 messages):

  • Struggles with meme generator model: A member sought advice on developing a high-quality meme generator model and asked for guidance from those with experience or interest in this domain. They emphasized the desire to produce high-quality memes and wondered about the initial steps.

  • Rate limit errors hinder progress: One member reported rate limit exceeding errors and requested help to resolve this issue.

  • Overflow error in Stable Diffusion XL: A detailed error involving SDXL loading was shared, showcasing an OverflowError: cannot fit ‘int’ into an index-sized integer. The provided code snippet and system information, including GPU: A100 and Torch: 2.3.1, were part of the context.

  • Seeking examples for Diffusers with GCP’s TPU: Another member requested an example or guidance on using Diffusers with GCP’s TPU.


OpenAI ▷ #ai-discussions (184 messagesđŸ”„đŸ”„):

  • ChatGPT on iOS 18 remains uncertain: A member asked if ChatGPT works with iOS 18, and another noted not to install beta software, underscoring the importance of using a stable iOS version like iOS 17 for ChatGPT. They also mentioned that beta users sign NDAs about new features.

  • Extracting transcripts from YouTube videos: Members discussed tools for extracting transcripts from YouTube videos, including AI tools like Otter.ai, and a specific tool that requires the YouTube API key via the fabric library. One member suggested using Google Gemini’s trial for a consumer-friendly experience.

  • Open source models beat GPT-4 in specific tasks: DeepSeek AI released an open-source model reportedly outperforming GPT-4 Turbo in specialized tasks like coding and math. This sparked discussions about open-source versus proprietary models.

  • Connecting OpenAI models to databases: A member asked about integrating OpenAI’s LLM with a continuously updating database, and another shared links to OpenAI’s Cookbook with examples for vector databases, which are foundational for supporting semantic search and reducing hallucinations in responses.

  • Dream Machine and Sora’s AI capabilities: There was enthusiastic discussion about Luma’s Dream Machine video capabilities, compared to the anticipated Sora, revealing some users’ impatience with the limited release of Sora. Members noted its impressive but still evolving functionality, with unique features like incorporating consistent physical motion.

Links mentioned:


OpenAI ▷ #gpt-4-discussions (49 messagesđŸ”„):

  • Custom GPTs privacy setting confusion: A member struggled with setting their Custom GPTs to private, mentioning that the most restricted option now is “invite only,” but it was still showing as available to everyone. A workaround suggested is to create a copy and restrict it to people with the link, then delete the original.
  • Funny idea for a GPT mod: A member suggested making a Fallout or Skyrim mod that changes all the game’s dialogue to zoomer slang or any specified prompt, noting it would be amusing.
  • Access issues with free-tier GPT interactions: Several members reported difficulties in accessing GPT interactions, with conversations requiring a paid subscription to continue. This seems to be affecting multiple users, with some confirming the same issue with their friends.
  • Specifying actions for Custom GPTs: A user inquired about setting specific actions like web browsing in their custom GPT and was advised to prompt the GPT accordingly for when to use certain tools.
  • GPT usage limits frustration: Another user expressed frustration over GPT not loading and servers being down, with others confirming similar issues. For real-time updates, users were directed to check status.openai.com.

OpenAI ▷ #prompt-engineering (28 messagesđŸ”„):

  • 3D Models Struggle with No Shadows: Members discussed the challenges of creating 3D models with no shadows or lighting. One shared hope to create texture resembling an “albedo map” to aid in 3D conversions, while another suggested inpainting or using tools like Canva to minimize shadows.

  • Extracting Information from GPT-4: A member faced issues with GPT-4 mixing sample and target emails during information extraction. Solutions included clearly separating samples with distinct markers and clarifying instructions.

  • Generate Detailed Roadmaps with ChatGPT: To explore topics like marketing and branding in depth, members recommended strategies such as step-back prompting and using detailed queries. Shared tips included creating topic trees and using browser tools for specific research.

  • Handling ChatGPT’s Request Refusals: A user experienced consecutive refusals from ChatGPT to fulfill certain requests without clear reasons. Tips shared included repeating the prompt and asking for detailed explanations while requesting the fulfillment.

  • Generating Tables from XML Data: A member inquired about prompts for extracting XML data into table form and generating specific token amounts with the GPT API. The community awaits further responses to this technical query.


OpenAI ▷ #api-discussions (28 messagesđŸ”„):

  • Secrets to 3D Model Prompts: A member suggested finding 3 examples of a 3D model with no shadows or lighting, asking ChatGPT to notate their lack and then generating a new image. Another user noted that completely eliminating shadows seems impossible due to language limitations and rendering corrections by ChatGPT and Dall-E.

  • Using Separate Samples for GPT-4: To prevent GPT-4 from mixing sample and target emails, members debated using distinct markers. Clear separation and specific instructions can prevent content amalgamation.

  • Balancing Shadows in 3D Models: A detailed discussion on minimizing shadows and light on objects for better 3D model texture mapping ensued. The consensus was that the baked-in shading interferes with albedo map creation, recommending using the generated shape as a base model instead.

  • Generating Marketing Roadmaps with ChatGPT: One user sought advanced insights on marketing topics like Brand Archetypes using ChatGPT. Members advised step-back prompting and specific roadmaps of subtopics; suggestions included using clear directives and external resources for deeper dives.

  • ChatGPT Refusal Quirks: Several users reported that ChatGPT sometimes refuses requests without giving reasons. The proposed workaround involves asking ChatGPT to explain refusals, which may prompt it to fulfill the request.


LAION ▷ #general (250 messagesđŸ”„đŸ”„):

  • SD3 Models Struggle with Artifacts and Training: Members discussed the stability and training challenges with SD3 models, noting that loss stabilization remains complex. Explicit concerns were raised about non-uniform timestep sampling and the lack of critical components such as qk norm.

  • Timestep Weighting in Training: Discussion highlighted different approaches to timestep weighting with V-prediction models. One user prefers uniform sampling while reweighting loss, segmenting schedules into smaller batches to distribute training effectively.

  • Open-source T2I Models: Queries and recommendations about the best open T2I models with character consistency led to GitHub resources for controllable text-to-image generation. Theatergen for character management was also discussed as an option for consistent multi-turn image generation.

  • ComfyUI and Adaptive ODE Solvers: A member shared a GitHub link for adaptive ODE solvers implemented for SD3, suggesting they offer better results than existing fixed-step solvers and could serve as a valuable reference or alternative for current diffusers.

  • Fudan’s Open-source Video Generative Model: Spirited discussion erupted around Fudan University’s Hallo model for video generation from single images and audio, with another tool to run it locally shared on FXTwitter. Members expressed interest in integrating it with Text-to-Speech systems like Udio or Suno.

Links mentioned:


LAION ▷ #research (34 messagesđŸ”„):

  • Logical Reasoning Challenges with AIW Problems: Discussion highlighted the frequent use of names like “Alice” in logical reasoning problems, which may bias LLMs. A member shared that “Phi-2 performed horrible in general,” showing “severe reasoning breakdown” in SOTA LLMs on the AIW problem described in this paper.

  • Experiment Tactics to Address Bias: One member experimented by changing the problem setup to remove bias from known examples, noting that models like “GPT4o and Claude-Opus managed to solve it,” while others failed. Failures were attributed to the LLMs’ misinterpretations like handling groupings incorrectly or hallucinating geometric associations.

  • Reasoning Sensitivity in Models: Further analysis showed LLMs are “VERY SENSITIVE to even slight AIW problem variations,” with Fig 11 from the referenced paper illustrating drastic fluctuations in correct response rates with slight changes, emphasizing the fragile state of their reasoning capabilities.

  • Symbolic AI Hybrids for Deductive Reasoning: A query about research efforts combining LLMs with symbolic AI for improved deductive reasoning led to the recommendation of Logic-LM, which integrates LLMs with symbolic solvers to significantly boost logical problem-solving performance.

  • JEPA for Building a Collective Vision in Email Assistants: Anu4938 shared ambitions of using JEPA to create an email assistant aimed at maximizing collective good and efficiently managing complexities. The envisioned assistant emphasizes values such as environmental respect, climate change action, and fostering global cooperation.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (161 messagesđŸ”„đŸ”„):

  • Debate on Large Model Usability: A heated discussion took place on releasing and meme-ing about large models like the 200T parameter model, which are beyond most users’ reach. One user humorously mentioned, “I am this close to making a 200T parameter model. Claim it is AGI.”

  • Qwen7B vs Llama3 8B: Members discussed the performance comparison between Qwen7B and Llama3 8B with one user mentioning that small LLMs like Qwen7B are unlikely to outperform Llama3 8B, emphasizing its current superiority in the field.

  • Custom Llama3 Template Issue: There was a detailed technical exchange about training configurations and issues related to the chat_template setting when training with Llama3 models. One user shared a link to fix custom Llama3 prompt strategies that resolved some issues.

  • GPU and Optimization Feedback for PyTorch: A call for feedback from users using various GPUs to assist PyTorch optimizations saw diverse responses, including GPUs like AMD MI300X, RTX 3090, Google TPU v4, and 4090 with tinygrad.

  • Shared Projects and Resources: Users shared several resources, including a blog post on CryptGPT: Privacy-Preserving LLMs, a language-specific GPT chat model DanskGPT, and GitHub links for setting up chat UI similar to HuggingChat using Huggingface’s chat-ui project.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (4 messages):

  • Llama3 Bug Halts Development: A user raised an issue on GitHub regarding a bug introduced on June 7 that prevents tuning Llama 3 or Mistral models. The bug is affecting several users, with 6 people confirming its impact, and while a workaround exists, they insist that the main branch needs fixing.
  • Investigating the Bug Source: Another member asked if the issue might be related to setting remove_unused_column to false, but then concluded that the “length” keyword argument problem likely stems from a specific commit. The problematic commit was identified after a bisect, confirming it as the source of the issue.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general-help (9 messagesđŸ”„):

  • Config confusion for dataset types: A user expressed confusion regarding the dataset type field in their axolotl config, particularly for alpaca_chat.load_qa, referencing the dataset formats. Another user confirmed that the config format provided is correct.

  • Running accelerate on SLURM clusters: A user shared a SLURM job script for running axolotl with accelerate and deepspeed, specifying mixed precision and multi-GPU settings. They advised replacing $PMI_RANK with $SLURM_NODEID if the former is unavailable.

  • QDora issues in Axolotl: A user inquired about getting QDora to work with Axolotl, and another user replied that it hangs after a few steps, suggesting it’s unreliable. Further details on building QDora from source were sought.

  • Using axolotl for personality extraction: A user asked if anyone has used Axolotl to train models for extracting personalities from text and linked to Delphi AI for reference. They asked if the oasst dataset format would be appropriate, linking to the documentation.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #datasets (4 messages):

  • Dataset Config Issues Resolved: A member requested a dataset config section due to encountering a ValueError stating “unhandled prompt tokenization strategy: sharegpt.” Another member shared a configuration link from Discord (link), which resolved the issue.

OpenAccess AI Collective (axolotl) ▷ #community-showcase (1 messages):

  • First Finetune with Axolotl Shines: “Had a blast finetuning my first LLMs with Axolotl!” The author reports successfully transitioning an unstructured press release into a structured output, hinting at further exploring OpenAI API’s function calling for improved accuracy.
  • Exploring Press Release Data Extraction Efficiency: “We previously looked into how well LLMs could extract structured data from press releases.” The initial evaluations revealed that while LLMs performed decently, there was noticeable room for improvement.
  • Future Comparisons Promised: Emphasizing the use of function calling over raw prompting for better accuracy, a separate post on finetuning comparisons is hinted at. For more details, the author refers readers to a detailed post.

Link mentioned: Alex Strick van Linschoten - Finetuning my first LLM(s) for structured data extraction with axolotl: I finetuned my first LLM(s) for the task of extracting structured data from ISAF press releases. Initial tests suggest that it worked pretty well out of the box.


OpenAccess AI Collective (axolotl) ▷ #axolotl-help-bot (11 messagesđŸ”„):

  • Adjusting Inference Parameters in Axolotl: A user asked how to set inference parameters like temperature or seed while running accelerate launch -m axolotl.cli.inference. It was suggested to modify the inference script directly or the configuration file if the command-line arguments for these settings aren’t supported, showcasing an example of how to adjust generation_config. A transformers-level sketch of those knobs appears after this list.

  • Request for Fine-Tuning Vision Models: A user inquired about fine-tuning vision models. It was explained that the process involves loading a pre-trained model (e.g., ResNet-50), preparing the dataset, modifying the final layers if necessary, defining data transforms, and then setting up a training loop with proper loss function and optimizer.
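
For the curious, the transformers-level knobs that suggestion boils down to look roughly like this (axolotl’s own config keys may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

torch.manual_seed(42)  # fixes the sampling randomness for the run

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

gen_config = GenerationConfig(temperature=0.7, top_p=0.9, do_sample=True, max_new_tokens=64)
ids = tok("The quick brown fox", return_tensors="pt")
out = model.generate(**ids, generation_config=gen_config)
print(tok.decode(out[0], skip_special_tokens=True))
```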

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (11 messagesđŸ”„):

  • Doubling context length in models needs careful adjustments: To train a model at 2x the native context length (e.g., 16k from 8k), users need to modify several settings related to model architecture, data processing, and training configuration. Key changes include adjusting maximum position embeddings and training parameters like batch size and gradient accumulation steps. A rope-scaling sketch appears after this list.

  • Fine-tuning vision models with Axolotl explained: A step-by-step guide is provided for fine-tuning vision models using Axolotl. It involves cloning the Axolotl repository, installing dependencies, preparing the dataset, modifying the configuration file, and using Accelerate for training and inference.
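
On the context-doubling point, one common lever is RoPE scaling on the transformers side; a sketch, assuming a Llama-style model (axolotl exposes a similar rope_scaling block in its config):

```python
from transformers import AutoConfig, AutoModelForCausalLM

base = "meta-llama/Meta-Llama-3-8B"  # gated; any RoPE-based causal LM works the same way
config = AutoConfig.from_pretrained(base)
config.max_position_embeddings = 16384                    # 2x the native 8k
config.rope_scaling = {"type": "linear", "factor": 2.0}   # stretch RoPE to cover it
model = AutoModelForCausalLM.from_pretrained(base, config=config)
```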

Links mentioned:


Perplexity AI ▷ #announcements (1 messages):

  • Curiosity speaks every language; partners with SoftBank: Perplexity announced a strategic partnership with SoftBank to offer Perplexity Pro free for one year to customers using SoftBank, Y!mobile, and LINEMO services. This premium version of Perplexity, valued at 29,500 yen annually, provides users with a revolutionary AI answer engine for exploring and learning. More info.

Link mentioned: SoftBank Corp. Launches Strategic Partnership with Leading AI Startup Perplexity | About Us | SoftBank: SoftBank Corp.‘s corporate page provides information about “SoftBank Corp. Launches Strategic Partnership with Leading AI Startup Perplexity”.


Perplexity AI ▷ #general (187 messagesđŸ”„đŸ”„):

  • Agentic Search AB Testing Secrets: Community members discussed the new Agentic Pro Search being in A/B testing. One user shared a Reddit link on how to cheat the system but later reconsidered to avoid messing up the control group.
  • Confusion Over Perplexity’s Features and Model Settings: Users had various questions about using Perplexity, such as setting a system prompt, formatting answers, accessing writing modes, and experiencing issues like temperature changes or the chat freezing. They shared solutions like contacting support or clearing browser cache for bugs.
  • Perplexity vs. ChatGPT and Investment Discussions: Members debated whether it was worth having both Perplexity and ChatGPT subscriptions concurrently and discussed the potential of investing in Perplexity. Comparisons focused on the strengths of each platform for specific use cases like writing and research.
  • Concerns Over Web Crawling and Privacy: Some users raised concerns about Perplexity’s crawling behavior not respecting robots.txt and masking user agents. Suggestions for blocking or addressing this issue included using JA3 fingerprinting and bot endpoints.
  • Customizable Features and Document Handling: Members inquired and discussed uploading files, handling extensive document collections, and potential integrations with academic databases like academia.edu. Solutions included using other AI tools like custom GPTs on OpenAI and NotebookLM to manage large document loads.

Link mentioned: Reddit - Dive into anything: no description found


Perplexity AI ▷ #sharing (10 messagesđŸ”„):

  • Tanstack Table Search Shared: A member shared a link to a Perplexity AI search related to Tanstack Table queries. This sparked interest in data table management tools.

  • Pet Food in Russia Search: Another member provided a link to a search about pet food in Russia. Discussions likely revolved around the pet food market in Russia.

  • Prostate Health Paper Public Issue: A user unintentionally made their prostate health paper public and sought help to fix it. Another member advised using the “Unpublish Page” button in the Pages menu.

  • Elephant Communication Page: A contributor shared a link to a page discussing elephant communication. This might have led to conversations around animal behavior and communication methods.

  • Elder Scrolls Page (duplicated): A couple of messages included links to a page about The Elder Scrolls. This probably indicates a shared interest in this gaming series among users.


Perplexity AI ▷ #pplx-api (3 messages):

  • Ask-anything feature in Custom GPT struggles: A user successfully got their Custom GPT working but wants it to handle any prompts or queries. Another suggested explaining the problem to GPT-4o with a specific schema and error details to resolve issues with Action/Function Calling to the Perplexity API.
  • Inquire for closed-beta access timeframe: A member asked about the expected response time for closed-beta access to the API. They mentioned their project at Kalshi is heavily dependent on accessing sources and is ready to launch pending this access.

Nous Research AI ▷ #off-topic (3 messages):

  • Neurons play Doom in YouTube Video: Shared YouTube video titled “Growing Living Neurons to Play Doom? | Part 2!” explores the concept of using living neurons to play the video game Doom. It’s an intriguing intersection of biotech and gaming.

  • Automated Bias and Indoctrination: Link to ResearchGate paper discusses the market-driven trajectory of AI and novel risks related to human bias and cognition at scale. The paper critiques “stochastic parrots” like LLMs as tools for manipulation aligned with corporate biases.

  • Solving the Alignment Problem in mASI: A thought-provoking paper aims to highlight ethical implications and extreme liabilities in AI decision-making. It introduces the concept of “Ethical Center of Gravity” for balancing ethical deeds to mitigate dystopian risks.

  • Efficient LLM Inference with vLLM: A blog post details vLLM, an open-source inference engine that uses PagedAttention to improve memory usage and throughput. vLLM can run models with significantly fewer GPUs and boasts up to 24x higher throughput compared to HuggingFace Transformers; basic usage is sketched after this list.

  • Stable Diffusion Subreddit Protests Reddit API Changes: The r/StableDiffusion subreddit reopens after protesting changes to Reddit’s open API policy. The protest highlighted concerns about the impact on app developers, moderation, and accessibility for blind users.
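
Basic usage, for reference (a minimal sketch; the model name is illustrative):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# PagedAttention lets vLLM pack many requests into GPU memory at once,
# which is where the throughput gains come from.
outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
```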

Links mentioned:


Nous Research AI ▷ #general (124 messagesđŸ”„đŸ”„):

  • Stable Diffusion 3 and Usage: Members noted that lying is allowed for anime, but only for non-human characters. Shared link to access Stable Diffusion 3.
  • NVIDIA’s Synthetic Data Model: Discussion about Nemotron-4-340B-Instruct, a large language model for synthetic data generation, optimized for English chat and supporting a 4,096-token context length. Available on Hugging Face and explored for its usage and competitive implications for NVIDIA’s customer relations.
  • Realtime Inference and ComfyUI: A member suggested using ComfyUI with TensorRT SD-Turbo for near real-time inference, especially fun when paired with a webcam feed for image manipulation.
  • OpenAI’s Shift to For-Profit: Sam Altman has informed shareholders that OpenAI might transition to a for-profit entity akin to rivals like Anthropic and xAI.
  • Model Merging and MoE Debate: Extended discussion on the practicality and performance of Mixture of Experts (MoE) models and merging strategies, with hesitations about the efficacy of merging methods versus comprehensive fine-tuning. Links shared to relevant PR on llama.cpp and MoE models on Hugging Face.

Links mentioned:


Nous Research AI ▷ #ask-about-llms (22 messagesđŸ”„):

  • Setting Up Llama3 8B is Challenging: Plasmator asked for tips on training Llama3 8B for a specific style and deploying a fast OAI-compatible endpoint on M2 Ultra. Teknium recommended using unsloth qlora, Axolotl, and Llamafactory for training, and lmstudio or Ollama for endpoint deployment on a Mac.

  • RAG Method Inquiry: Rbccapitalmarkets inquired if the set-based prompting technique from a recent paper could work with RAG (Retrieval-Augmented Generation). They shared a link to the paper for further context.

  • PEFT Methods Discussion at CMU: 420gunna mentioned a CMU Advanced NLP course where the professor plugs his own paper about two new PEFT (Parameter Efficient Fine-Tuning) methods. They shared a YouTube link to the lecture for those interested.

  • Nvidia’s Nemotron Model: Avinierdc asked about opinions on Nvidia’s new Nemotron model, sparking a brief discussion. Teknium expressed a positive outlook and noted having tried it once on LMSYS chatbot arena.

  • Equivalents to ComfyUI for LLMs: Csshsh sought a tool equivalent to ComfyUI for LLMs that allows playful interaction below the API layer. Orabazes suggested that a lot can be done with ComfyUI and recommended checking out the AnyNode custom node for running models locally.

Links mentioned:


Nous Research AI ▷ #world-sim (29 messagesđŸ”„):

  • Feature for multiplayer collaboration considered: A member asked about the possibility of creating lobbies for collaborative creation with AI. Another member confirmed interest, stating, “Yes that’s something we really like the idea of - any forms of multiplayer/pvp/co-op”.

  • Worldclient and WebSim are not connected: There was confusion regarding the connection between Opus on WebSim and WorldSim. It was clarified that “worldclient has no connection with websim”.

  • WorldSim AI experiences more censorship: A member noted, “the world-sim ai has been censored a bit”. Another explained that the increased censorship could be due to stricter measures by the model provider, Anthropic.

  • Continuation feature for AI responses in development: Members discussed a bug where the AI’s replies abruptly cut off. One highlighted an ongoing effort to fix this, “yeah, I have a feature in the works to allow continuation”.

  • Claude 3’s vision support and cost considerations: Members discussed integrating vision support in WorldSim, noting that Claude 3 already has this feature. They also debated the costs, with suggestions to use GPT4o for vision tasks and pass the information to Claude to optimize usage.


Modular (Mojo đŸ”„) ▷ #general (40 messagesđŸ”„):

  • Mojo Manual on Functions Sparks Debate: A discussion arose around the Mojo manual’s explanation of def and fn functions, specifically whether def functions allow or require no type declarations. One participant proposed seven alternative phrasings to clarify the language, showing the nuances in English interpretation.

  • Mojo Typing Mechanisms Critiqued: The conversation steered towards the permissiveness of type declarations in def functions. The consensus was that while def functions do not enforce type declarations, they do allow them, contrasting with fn functions which require explicit type declarations.

  • Mojo Community Event Announcement: An announcement for the Mojo Community Meeting was made, stating it would include talks by Helehex on constraints and Valentin on Lightbug, followed by a discussion on Python interop by Jack. A link to join the meeting was provided. Join the meeting

  • Benchmark Comparison Shared: A user shared the results of a 1-thread benchmark test comparing Python FastAPI, Mojo Lightbug, and Rust Actix. The results showed Mojo Lightbug performed better than Python FastAPI but lagged behind Rust Actix.

  • Concerns About Function Coloring Discussed: Following the community meeting, a discussion about the potential runtime costs of eliminating function coloring led to a conversation about stackful vs stackless coroutines. The debate highlighted the trade-offs between runtime cost and language complexity. Link to discussion on coroutines

Links mentioned:


Modular (Mojo đŸ”„) ▷ #đŸ’Źïž±twitter (2 messages):

  • Modular tweets new update: Modular shares a Tweet with their community, keeping followers updated on their latest activities and announcements.
  • Another Modular announcement via Twitter: Modular posts another Tweet to keep the community informed about their continuous advancements and upcoming events.

Modular (Mojo đŸ”„) ▷ #✍blog (1 messages):

  • Mojo 24.4 Release Announced: Mojo has released version 24.4, boasting several significant core language and standard library enhancements. Readers are encouraged to read the full blog post here for detailed insights and code examples.

Link mentioned: Modular: What’s New in Mojo 24.4? Improved collections, new traits, os module features and core language enhancements: We are building a next-generation AI developer platform for the world. Check out our latest post: What’s New in Mojo 24.4? Improved collections, new traits, os module features and core language enhanc



Modular (Mojo đŸ”„) ▷ #ai (2 messages):

  • $1,000,000 Prize for True AGI Solution!: A user shared a YouTube video featuring Francois Chollet discussing why he believes LLMs won’t lead to AGI, along with a $1,000,000 ARC-AGI Prize for finding a true solution. Another user expressed skepticism, commenting that the prize amount felt like “lowballing.”

Link mentioned: Francois Chollet - LLMs won’t lead to AGI - $1,000,000 Prize to find true solution: Here is my conversation with Francois Chollet and Mike Knoop on the $1 million ARC-AGI Prize they’re launching today.I did a bunch of socratic grilling throu



Modular (Mojo đŸ”„) ▷ #đŸ”„mojo (107 messagesđŸ”„đŸ”„):

  • Defining 2D Numpy Arrays with Mojo: Users discussed the limitations of Mojo in passing nested lists to Python and shared workarounds using the ast module and Python.import_module. For example, one user suggested a function ndarray that converts a string representation of a nested list to a Numpy array, which is then returned as a Python object.
  • Differences Between DTypePointer and Pointer[SomeDType]: Users highlighted that DTypePointer is preferable for SIMD operations, as it allows for efficient simd_load instructions. This was particularly helpful for a user who wanted to understand the performance implications of using each type.
  • VSCode Integration with Mojo: A member asked how to include directories in VSCode for Mojo, and another provided a way to do so via settings.json. This helps VSCode analyze Mojo packages by adding "mojo.lsp.includeDirs": [ "/Users/your-name/your-mojo-files" ].
  • Bug in Casting and Contextual Behavior: A user reported a bug when casting unsigned integers using int() or UInt32(), experiencing different behavior between running the script and using the REPL. A GitHub issue was created to track this inconsistency.
  • CRC32 Table Calculation with Var vs. Alias: A detailed discussion revealed an issue when using alias instead of var to initialize CRC32 tables, leading to different results due to casting behaviors. The minimal example showed values unexpectedly overflowing as if they were signed, prompting an investigation into the alias-specific behavior.

Links mentioned:


Modular (Mojo đŸ”„) ▷ #🏎engine (3 messages):

  • TPU usage clarified: A member explained that the only way to use TPUs is through calling XLA via the PjRT API. They provided a link to the PjRT API documentation and the TPU plugin libtpu.so.
  • Call for native TPU support: Another member suggested writing native support for TPUs, similar to how Modular handles GPUs. The first member responded that there’s no public API for TPUs that operates at a lower level than XLA.

Link mentioned: GitHub - openxla/xla: A machine learning compiler for GPUs, CPUs, and ML accelerators: A machine learning compiler for GPUs, CPUs, and ML accelerators - openxla/xla


Modular (Mojo đŸ”„) ▷ #nightly (9 messagesđŸ”„):

  • New Nightly Mojo Compiler Released: A new nightly Mojo compiler version 2024.6.1505 was released, and users can update with modular update nightly/mojo. For more details, see the raw diff and the current changelog.
  • Compiler Version 2024.6.1605 Released: Another nightly update to Mojo compiler, version 2024.6.1605, has been released. Users should update and review changes through the raw diff and the changelog.
  • Latest Nightly Release 2024.6.1705: The most recent update to the nightly Mojo compiler is now available as version 2024.6.1705. Update details can be reviewed via the raw diff and the current changelog.
  • Request for Builtin MLIR Dialects Documentation: A user inquired about the availability of external documentation for builtin MLIR dialects. Another member confirmed that no such documentation is currently available.
  • Feature Request for REPL Improvements: A query was made regarding whether expressions could directly output values in the REPL similar to Python. The response suggested filing a feature request on GitHub for this enhancement.

Eleuther ▷ #announcements (1 messages):

  • Interpretability team replicates OpenAI’s findings: The EleutherAI interpretability team successfully replicated OpenAI’s “weak-to-strong” generalization results using open-source LLMs. They observed these results across 21 NLP datasets and tried several modifications to improve generalization but found that “vanilla weak-to-strong training may already be close to eliciting everything the student ‘knows’”.

  • Negative results for generalization improvements: The team experimented with various modifications such as strong-to-strong training, modified loss functions, and several probe-based experiments, with “generally negative results”. Among these, only the log-confidence auxiliary loss showed potential signs of consistent improvement in generalization.

  • Detailed findings published: The detailed findings and results of their investigations on weak-to-strong generalization in open-source models like Qwen1.5 0.5B and Llama 3 8B can be found in their latest blog post.
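
As referenced above, here is a toy sketch of the weak-to-strong setup (our illustration, not EleutherAI's code): a strong student is finetuned on a weak supervisor's soft pseudo-labels, and the auxiliary term below is a simplified stand-in for the log-confidence loss that showed promise.

```python
import torch
import torch.nn.functional as F

def weak_to_strong_loss(student_logits, weak_probs, alpha=0.5):
    """CE to the weak supervisor's soft labels, plus a term pulling the
    student toward its own hardened predictions (confidence auxiliary)."""
    ce_weak = F.cross_entropy(student_logits, weak_probs)  # soft targets
    ce_self = F.cross_entropy(student_logits, student_logits.argmax(-1))
    return (1 - alpha) * ce_weak + alpha * ce_self

# Toy usage on random data; the real experiments probe LLMs on NLP tasks.
student = torch.nn.Linear(16, 2)
x = torch.randn(64, 16)
weak_probs = torch.softmax(torch.randn(64, 2), dim=-1)  # weak supervisor output
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
loss = weak_to_strong_loss(student(x), weak_probs)
loss.backward()
opt.step()
```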

Link mentioned: Experiments in Weak-to-Strong Generalization: Writing up results from a recent project


Eleuther ▷ #general (51 messagesđŸ”„):

  ‱ AISI adds new roles and assistance for moving: AISI announced a variety of new job openings on their careers page and mentioned they could assist with visas for candidates open to relocating to the UK. This sparked interest among members outside the UK, who weighed the opportunity despite the relocation requirement.

  ‱ Discussion on CommonCrawl processing: Members exchanged tips for processing CommonCrawl snapshots, highlighting tools like ccget and resiliparse. Challenges included throttling and squeezing performance out of very large snapshots (a WARC-extraction sketch follows this list).

  • Interest in reproducible image generation models: Users discussed image generation models trained on publicly licensed data, specifically pointing to the CommonCanvas models on Hugging Face and a related arXiv paper. While some found the models currently less effective, they suggested their potential use in creating applications like texture generation.

  • Clarification of Git vs. GitHub confusion: Members clarified the differences between Git and GitHub, emphasizing that Git is a source code management tool and GitHub is a repository hosting service. The conversation included a video link to help explain these concepts further.

  • Introduction of new members: New members such as Piyush Ranjan Maharana and Tomer shared their backgrounds, including work in computational physics, autonomous cars, and material discovery via LLMs. They expressed eagerness to learn and contribute to the community.
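
Picking up the CommonCrawl tips above: a minimal WARC-extraction sketch, pairing the resiliparse extractor that was mentioned with warcio for record iteration (warcio is our assumption; any WARC reader works).

```python
from warcio.archiveiterator import ArchiveIterator
from resiliparse.extract.html2text import extract_plain_text

def iter_pages(warc_path: str):
    """Yield plain text for each HTML response in a CommonCrawl WARC."""
    with open(warc_path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            html = record.content_stream().read().decode("utf-8", errors="replace")
            yield extract_plain_text(html)

for text in iter_pages("CC-MAIN-example.warc.gz"):  # path is illustrative
    print(text[:200])
    break
```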

Eleuther ▷ #research (61 messagesđŸ”„đŸ”„):

  • Exploring RWKV-CLIP for Vision-Language Learning: A paper discussed the introduction of RWKV-CLIP, a vision-language representation learning model combining transformers’ parallel training with RNNs’ efficient inference. This approach aims to improve large-scale image-text data quality by leveraging LLMs to synthesize and refine web-based texts and synthetic captions.

  • Concerns Around Diffusion Model Hallucinations: Another paper explored the phenomenon of “hallucinations” in diffusion models, identifying a failure mode termed mode interpolation. The study revealed that diffusion models interpolate between data modes, creating artifacts not present in the original training distribution.

  ‱ Discussion on Prefetching Streaming Datasets: Technical discussion touched on prefetching streaming datasets and on keep_in_memory=True for efficient data fetching. Members noted the recently introduced checkpointing and resuming of streams, which improves usability for large datasets (a short resume sketch follows this list).

  ‱ Effectiveness of LaProp Optimizer: Members debated the effectiveness of the LaProp optimizer, with mixed results showing comparable or inferior performance relative to AdamW. Parameter tweaks brought some improvement, yet LaProp’s overall performance remained underwhelming.

  ‱ Stealing Commercial Embedding Models: A paper highlighted a method for “stealing” commercial embedding models by training local models on text-embedding pairs obtained from APIs. The method showed that effective replication can be achieved inexpensively, raising concerns about the security of commercial models (a toy distillation sketch also follows this list).
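
On the streaming-dataset point above: a checkpoint/resume sketch, assuming a recent 🤗 datasets release (the state_dict API for streams landed around v2.18; verify against your installed version).

```python
from datasets import load_dataset

ds = load_dataset("allenai/c4", "en", split="train", streaming=True)
state = None
for idx, example in enumerate(ds):
    if idx == 999:
        state = ds.state_dict()  # snapshot of the shard/offset position
        break

# Later (e.g. after a preemption): rebuild the stream and resume.
ds = load_dataset("allenai/c4", "en", split="train", streaming=True)
ds.load_state_dict(state)
```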
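
And on the embedding-stealing paper: the attack's rough shape, sketched with stand-in tensors (no real API, victim model, or encoder here). The point is that even a cheap linear head trained with a cosine objective can start to mimic a victim's embedding space.

```python
import torch
import torch.nn.functional as F

# Stand-ins for N harvested (text -> embedding) pairs from a paid API.
N, victim_dim, student_dim = 10_000, 1536, 384
victim_vecs = F.normalize(torch.randn(N, victim_dim), dim=-1)
student_feats = torch.randn(N, student_dim)  # a local encoder's outputs

head = torch.nn.Linear(student_dim, victim_dim)  # maps local -> victim space
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)

for step in range(100):
    idx = torch.randint(0, N, (256,))
    pred = F.normalize(head(student_feats[idx]), dim=-1)
    loss = 1 - F.cosine_similarity(pred, victim_vecs[idx]).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```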

Eleuther ▷ #scaling-laws (18 messagesđŸ”„):

  ‱ Hypernetwork-based Paper Critique: A member dismissed a paper proposing linear hypernetwork attention as “useless,” claiming it contains a critical mistake that makes its efficiency worse than full attention. They did, however, grant that the paper offers some reasoning for why attention mechanisms behave like hypernetworks.

  • Hypernetworks and Hopfield Nets Debate: Members discussed whether hypernetworks are actually Hopfield nets, with one member noting that although there are high-level similarities like input-dependent weight generation, Hopfield networks are inherently recurrent. This sparked a conversation on the historical significance and evolution of Hopfield networks.

  • Hopfield Networks’ Historical Context: Members reminisced about Hopfield networks’ past significance in connectionism and their influence on current models like transformers. They pointed out that modern models use backpropagation and multi-layer networks for superior performance, but the concepts of attractors and dynamics from Hopfield nets still inform contemporary neural network architecture.

  ‱ Dynamic Evaluation and Online Adaptation: A member shared a paper on dynamic evaluation for language models, emphasizing its utility in adapting to distributional shifts at test time. The method is described as turning parameters into temporally changing states, much like memory in neuroscience, and as warranting a potential Jones-style scaling-law evaluation (a minimal sketch follows this list).

  • Jones-style Scaling Law Reference: In response to the dynamic evaluation discussion, a member referenced the “Scaling scaling laws with board games” paper by Andy L. Jones, which suggests trading off training compute and inference compute. This reference underscores the relevance of considering efficient scaling laws in adaptive model contexts.
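
To make the dynamic-evaluation idea above concrete, a minimal sketch (assuming a Hugging Face-style causal LM that accepts labels=): each chunk is scored before the model adapts on it, so the parameters act as a fast test-time memory.

```python
import torch

def dynamic_eval(model, token_chunks, lr=1e-5):
    """token_chunks: list of (1, seq_len) LongTensors of token ids."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    total = 0.0
    for chunk in token_chunks:
        out = model(chunk, labels=chunk)  # HF-style interface assumed
        total += out.loss.item()          # score the chunk BEFORE adapting
        out.loss.backward()
        opt.step()                        # then adapt on what was just seen
        opt.zero_grad()
    return total / len(token_chunks)
```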

Eleuther ▷ #interpretability-general (11 messagesđŸ”„):

  ‱ Math PhD explores Sparse Autoencoders: A new math PhD graduate expressed interest in interpretability research involving Sparse Autoencoders (SAEs). They were directed to a blog post, which found that SAEs may recover composed features instead of ground-truth ones in toy models (a minimal SAE sketch follows this list).

  • Discussion on Sparse Coding and Dictionary Learning: Members shared relevant papers and discussed topics related to sparse coding and dictionary learning, including a paper on dictionary learning in Wasserstein space here and another on disentanglement in naturalistic videos here.

  • Framework for Evaluating Feature Dictionaries: A paper was introduced which proposes a framework for evaluating feature dictionaries in specific tasks using supervised dictionaries, highlighting its application on the indirect object identification task using GPT-2 Small (link to paper).

  • Link to Linear Identifiability Work: Inquiries about settings with genuinely linear features in activation space led to recommendations for investigating linear probes and ICA literature, with a relevant paper here.

  ‱ Announcement of Logit Prisms Tool: New work extending the logit lens method was announced as “logit prisms,” decomposing logit output into components of the residual stream, attention layers, and MLP layers. It was used to study the gemma-2b model, revealing that digits 0-9 are encoded in a heart-like shape in a 2D space (full article). A small linear-decomposition sketch also follows this list.
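
For newcomers like the PhD above, the core object is small: a sparse autoencoder is a wide, overcomplete dictionary trained to reconstruct activations under an L1 sparsity penalty. A minimal sketch, with illustrative sizes:

```python
import torch
import torch.nn.functional as F

class SparseAutoencoder(torch.nn.Module):
    def __init__(self, d_model=768, d_dict=8 * 768):
        super().__init__()
        self.enc = torch.nn.Linear(d_model, d_dict)
        self.dec = torch.nn.Linear(d_dict, d_model)

    def forward(self, x):
        feats = F.relu(self.enc(x))      # sparse feature activations
        return self.dec(feats), feats

sae = SparseAutoencoder()
acts = torch.randn(4096, 768)            # stand-in residual-stream activations
recon, feats = sae(acts)
loss = F.mse_loss(recon, acts) + 1e-3 * feats.abs().mean()  # recon + L1
loss.backward()
```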
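
And the linearity that logit prisms exploits, in a few lines of tensor algebra (random stand-ins, not gemma-2b internals; the final LayerNorm is ignored for simplicity): because logits are a linear map of the residual stream, and the residual stream is a sum of component outputs, each component contributes its own additive slice of every logit.

```python
import torch

d_model, vocab, n_parts = 64, 1000, 5
W_U = torch.randn(d_model, vocab)                       # unembedding matrix
parts = [torch.randn(d_model) for _ in range(n_parts)]  # embeds, attn, MLP outputs

resid = torch.stack(parts).sum(0)                       # residual stream
logits = resid @ W_U
contribs = torch.stack([p @ W_U for p in parts])        # per-component logits

assert torch.allclose(logits, contribs.sum(0), atol=1e-4)
```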

Eleuther ▷ #lm-thunderdome (4 messages):

  • Call for sharing evaluation results: A member pointed out that Hugging Face is using an outdated harness version and observed significant differences in current results. They inquired about a platform where people could post their own evaluation results including runtime parameters and version information for validation.
  • Independent validation request for closed-source models: The same member also asked if there was a place to post independent validation results for various closed-source models. This suggests a need for a shared, trustworthy evaluation forum.
  • Multi-GPU evaluation issue with WANDB: Another member reported an issue when executing multi-GPU evaluation, leading to the creation of two separate projects in WANDB instead of one. They shared their command setup and sought advice on whether using the --num_processes=2 flag for data parallel evaluation is appropriate.

Eleuther ▷ #multimodal-general (3 messages):

  • Code Release Inquiry Leads to GitHub Issues: A member inquired about the release date for a particular code. Another member redirected the query to the project’s GitHub Issues page for the RWKV-CLIP project.

Link mentioned: Issues · deepglint/RWKV-CLIP: The official code of “RWKV-CLIP: A Robust Vision-Language Representation Learner” - Issues · deepglint/RWKV-CLIP


LLM Finetuning (Hamel + Dan) ▷ #general (35 messagesđŸ”„):

  • Apple’s AI strategy at WWDC intrigues: A community member shared a blog post detailing Apple’s new AI strategy, highlighting Apple’s avoidance of NVIDIA hardware and CUDA APIs. It discusses the use of Apple’s AXLearn, which runs on TPUs and Apple Silicon.

  • Deep dive into embeddings resources: A list of valuable resources on embeddings was shared, including a link to a curated list on GitHub and a blog post at vickiboykis.com. Members discussed the importance of understanding latent spaces and how embeddings emerge.

  • Open call for refusal classifier models: A member expressed interest in off-the-shelf refusal classifier models, possibly using T5/BERT for multilingual data. They indicated a need for around 1K samples for training and sought advice on this topic.

  • Fine-tuning TinyLlama for specific narration style: A member documented their experience with fine-tuning TinyLlama to generate David Attenborough-style narration, sharing their blog post. They utilized tools like Axolotl and Jarvis Labs for the project, learning and sharing detailed steps and insights.

  • Issue with loading model config on Jarvis Labs: A user faced an error while trying to fine-tune Mistral on Jarvis, which was resolved after switching to version v0.3 and changing the permissions of their token. They noted this might have also needed network stability, thanking others for their assistance.

LLM Finetuning (Hamel + Dan) ▷ #đŸŸ©-modal (14 messagesđŸ”„):

  • Credit confusion and resolution: A user realized they missed the deadline for additional credits and asked for help, receiving a positive response with the GitHub link. Another user asked about the status of their account, and their credits were granted after a manual review.
  ‱ Discussion on model startup optimization: A user asked whether copying model weights into the image or mounting them from a volume affects startup times. They were told that weights baked into the image may have a slight edge, but infrastructure unification keeps the difference minor (see the sketch after this list).
  • Multi-turn conversation issue and solution: A user experienced an issue with their model predicting the first turn of conversation repeatedly and was advised to discuss it in the appropriate channel. They later resolved it by changing the dataset format to the input_output format of Axolotl.
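
For concreteness, the two startup options discussed above, sketched against Modal's public API as of mid-2024 (modal.App, modal.Volume, Image.run_function); the loader and download helpers are hypothetical placeholders.

```python
import modal

app = modal.App("llm-weights-demo")
weights = modal.Volume.from_name("model-weights", create_if_missing=True)
image = modal.Image.debian_slim().pip_install("torch", "transformers")

# Option 1: mount weights from a Volume when the container starts.
@app.function(image=image, volumes={"/weights": weights})
def serve_from_volume():
    ...  # e.g. load_model("/weights/my-model")  (hypothetical loader)

# Option 2: bake weights into the image at build time (reportedly a slight
# cold-start edge, at the cost of bigger images and slower builds).
def download_weights():
    ...  # e.g. huggingface_hub.snapshot_download(...)

baked = image.run_function(download_weights)

@app.function(image=baked)
def serve_from_image():
    ...
```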

LLM Finetuning (Hamel + Dan) ▷ #learning-resources (5 messages):

  • Learn TextGrad for prompt fine-tuning: Members discussed the TextGrad project, which uses large language models to backpropagate textual gradients. It was noted that the project is considered better than DSPy and there is an explanatory YouTube video.

  ‱ Using TextGrad without installation: One member asked whether they could use TextGrad with their Anthropic/OpenAI API keys without installing anything. Another mentioned trying the example Colab notebooks, where one can set an OpenAI API key and see how it works (a README-style sketch follows this list).

  • Implementing LLMs from scratch: A link to a GitHub repository was shared, providing a step-by-step guide for implementing a ChatGPT-like LLM in PyTorch. This resource could be useful for those interested in learning and experimenting with LLM development from the ground up.
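
For those wanting to try it with just an API key, here is a sketch following TextGrad's README-style usage at the time of writing; treat the names as approximate if the API has since moved.

```python
import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)  # LLM that writes the "gradients"

model = tg.BlackboxLLM("gpt-4o")
question = tg.Variable(
    "If it takes 1 hour to dry 25 shirts in the sun, how long for 30 shirts?",
    role_description="question to the LLM",
    requires_grad=False,
)
answer = model(question)
answer.set_role_description("concise and accurate answer to the question")

loss_fn = tg.TextLoss("Evaluate the answer; be logical and very critical.")
loss = loss_fn(answer)
loss.backward()                      # backpropagate textual feedback

tg.TGD(parameters=[answer]).step()   # rewrite the answer using that feedback
print(answer.value)
```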

LLM Finetuning (Hamel + Dan) ▷ #hugging-face (2 messages):

  ‱ Reminder about form deadline: Gentle reminder folks that today is the last day to sign the form! If you have not gotten credits yet but think you filled out the first form, FILL THIS ONE OUT!
  • Credits issuance for second form submissions: If you applied on the second form, “we haven’t done credits for those yet, that happens after Monday.” There was a mention of difficulty in finding some users in the original form.

LLM Finetuning (Hamel + Dan) ▷ #replicate (2 messages):

  • User Follows Up on Credit Link: A member expressed concern about not receiving a link to redeem credits for Replicate. They mentioned having already sent their email and other details via DM.

  • LoRA Adapter Deployment Query: A member sought assistance on deploying LoRA adapters to Replicate, mentioning success with running a fine-tuned phi-3-mini locally using Cog. They contrasted the process with Modal, where a volume is created and bound to a container at runtime, and asked how a similar approach could be achieved on Replicate.


LLM Finetuning (Hamel + Dan) ▷ #langsmith (5 messages):

  • Clarification on LangSmith Beta Credits vs Course Credits: A user asked if “LangSmith Beta Credit” are the same as the credits for the course. Another user clarified that they are different; “LangSmith Beta Credit” was granted to beta users, while course credits should appear as ‘Mastering LLMs Course Credit’ under billing.
  • Offering Help with Missing Credits: One user offered assistance to another user who felt they were missing course credits. They confirmed that they could check the situation if provided with the email used in the credits form.
  • User Queries about Missing Credits: Another user inquired about not seeing any credits on LangSmith. They requested help to understand if any additional steps were needed from their end.

LLM Finetuning (Hamel + Dan) ▷ #berryman_prompt_workshop (2 messages):

  • Promptfoo gains interest among members: A member expressed interest in Promptfoo, thanking another for sharing it.
  ‱ Inspect-ai preferred over Promptfoo: Another member shared their preference for inspect-ai over Promptfoo, citing its flexibility and its test-style fit with Python. However, they noted that side-by-side comparisons are less straightforward with inspect-ai than with Promptfoo.

LLM Finetuning (Hamel + Dan) ▷ #workshop-3 (4 messages):

  • CUDA Error during Docker Execution: A user experienced a Docker error when running Python in a container, with the message “OCI runtime create failed: runc create failed: unable to start container process”. Another user suggested that this might be due to an improperly set up CUDA or a compatibility issue.
  • Difficulty in Issue Replication: The issue is hard to replicate, as noted by a responder who stated, “It’s hard to tell because I can’t replicate this issue”. This indicates the problem might be environment-specific or related to the user’s specific configuration.

LLM Finetuning (Hamel + Dan) ▷ #clavie_beyond_ragbasics (8 messagesđŸ”„):

  • RAGatouille Simplifies ColBERT Usage: A member praised RAGatouille as a great tool for integrating ColBERT with Langchain for internal projects. They also recommended Ben’s post as a fantastic introduction to ColBERT.
  ‱ Understanding Bi-encoder Functionality in RAG: Addressing a beginner’s query about the logic behind bi-encoders in RAG setups, another member explained that these models are trained to associate queries and documents via a prefix scheme (a prefix example follows this list). The response highlighted the need to define “similarity” during model training to suit different use cases.
  • Exploring Learning Resources: A member sought resources for advanced topics like finetuning ColBERT and rerankers, and using embedding adapters. They appreciated another member’s recommendation of a Medium post on building state-of-the-art text embedding models.
  • Combining Full Text Search with Fine-tuned Rerankers: A participant discussed their approach of using lancedb and combining full-text search via Lucene with fine-tuned rerankers for impactful results. They noted not using vector databases as mentioned in Ben’s presentation.
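
To ground the bi-encoder point above, a minimal prefix example using a model family (E5) that was explicitly trained on "query:" / "passage:" pairs:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/e5-base-v2")

q = model.encode("query: how do I rotate an API key?", normalize_embeddings=True)
docs = model.encode(
    ["passage: To rotate a key, open Settings > API and click Regenerate.",
     "passage: Our refund policy lasts 30 days from delivery."],
    normalize_embeddings=True,
)
print(util.cos_sim(q, docs))  # the key-rotation passage should score higher
```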

LLM Finetuning (Hamel + Dan) ▷ #jason_improving_rag (1 message):

  ‱ Efficient Category Structuring for GPT-4o: Members discussed how using a tree structure for category prompts improves GPT-4o’s decision-making in filter selection (a sketch follows this section). Despite the large system prompt, it works well, whereas latency had been an issue with GPT-4.

  • Single Vector Strategy for Documents: The group uses just one vector per document/product, accompanied by appropriate meta tags. This approach aids in maintaining a streamlined and effective categorization system.
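
Here is a sketch of what such a tree-shaped prompt can look like; the categories and rendering are invented for illustration.

```python
CATEGORY_TREE = {
    "electronics": {"audio": ["headphones", "speakers"], "computing": ["laptops"]},
    "home": {"kitchen": ["cookware"], "furniture": ["desks", "chairs"]},
}

def render(node, depth=0):
    """Render the nested dict/list tree as an indented bullet outline."""
    lines = []
    if isinstance(node, dict):
        for name, child in node.items():
            lines.append("  " * depth + f"- {name}")
            lines.extend(render(child, depth + 1))
    else:  # leaf list
        lines.extend("  " * depth + f"- {leaf}" for leaf in node)
    return lines

SYSTEM_PROMPT = (
    "Pick the single most specific category for the user's query "
    "from this tree:\n" + "\n".join(render(CATEGORY_TREE))
)
print(SYSTEM_PROMPT)
```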


LLM Finetuning (Hamel + Dan) ▷ #jeremy_python_llms (3 messages):

  • Catch up on talks with shared link: A member requested a link to catch up on the discussions. Jeremy Howard promptly shared this Discord link, and the member expressed their gratitude.

Link mentioned: Join the fast.ai Discord Server!: Check out the fast.ai community on Discord - hang out with 10920 other members and enjoy free voice and text chat.


LLM Finetuning (Hamel + Dan) ▷ #saroufimxu_slaying_ooms (3 messages):

  • Anticipation Builds for New Session: A user inquired about the likelihood of a new session taking place with a humorous undertone: What is the probability that there will be a session? đŸ«ąđŸ€Ș.

  • Upcoming Project in Memory Efficiency: Another user informed everyone that a new project focused on memory efficiency is underway. They mentioned that once this project is ready, a “more interesting talk” can be expected.


LLM Finetuning (Hamel + Dan) ▷ #axolotl (27 messagesđŸ”„):

  ‱ Strickvl hits OOM errors with local LoRA models: Despite using two 4090s, Strickvl faced Out-Of-Memory (OOM) errors when loading the full merged LoRA models. Other members suggested checking the configuration and considering quantization; Strickvl shared their configs on GitHub.

  ‱ Quantization offers a memory-saving solution: Chrislevy pointed out that models loaded in float32 consume a lot of memory and recommended using torch_dtype=torch.bfloat16 for inference, as described in the Llama 3 model card (a loading sketch follows this list).

  ‱ Documentation gap for axolotl and finetuning: There’s a call for better documentation on finetuning, specifically on training LoRA/QLoRA settings, saving models, and proper loading techniques. Strickvl emphasized this need and hinted at using Hamel’s course repo for sanity checks.

  • Modal Labs guide clarifies model loading: Andrewcka provided code insights from Modal Labs’ inference script explaining how the script identifies the last trained model by date-time to handle inference effectively.

  ‱ Finetuning multi-turn chat conversations with axolotl: Huikang inquired about adapting axolotl for multi-turn chat conversations and shared resources like the code for CodeLlama and the axolotl dataset formats for conversation fine-tuning (an example record follows this list).
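
The bfloat16 load suggested above, spelled out (model id per the Llama 3 card; device_map="auto" needs accelerate installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # roughly half the memory of float32
    device_map="auto",           # shard across available GPUs (e.g. two 4090s)
)
```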
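
And one record in axolotl's sharegpt conversation format, shown as a Python dict (on disk it is one JSON object per line; field names follow the axolotl dataset-format docs):

```python
record = {
    "conversations": [
        {"from": "human", "value": "Summarize the press release below."},
        {"from": "gpt", "value": "The release announces ..."},
        {"from": "human", "value": "Now give me three bullet points."},
        {"from": "gpt", "value": "- ...\n- ...\n- ..."},
    ]
}
```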

LLM Finetuning (Hamel + Dan) ▷ #wing-axolotl (1 message):

  • Excitement over Code Llama’s release: Code Llama is an evolution of Llama 2, tuned specifically for code tasks, and released in the Hugging Face ecosystem. The release includes models on the Hub, Transformers integration, and several productivity-boosting features for software engineers.
  • Format difference spotted: Noting a format difference between the Hugging Face blog post about Code Llama and the GitHub configuration file for finetuning Code Llama models. This was highlighted to confirm if such differences are acceptable.

LLM Finetuning (Hamel + Dan) ▷ #charles-modal (1 message):

  • Channel Lockdown Notice: The channel is being locked down, and members are directed to use another channel for any questions for Charles. A friendly emoji, <:hugging_angel:936261297182482452>, was included in the announcement.

LLM Finetuning (Hamel + Dan) ▷ #simon_cli_llms (5 messages):

  • CORS Error blocks video fetch: A member reported encountering CORS errors when trying to fetch a video. The suggested workaround is to “open the raw .mp4”.
  • CloudFront misconfiguration suspected: The issue may stem from a CloudFront misconfiguration where the request’s CORS headers aren’t being cached properly. The member noted that “CloudFront will cache the whole response on the first time the URL is hit” and “their cache does not key on the fetch mode request headers”.
  • Video link provided: The video in question is accessible at this link. The member queried whether it was “recorded from outside zoom and shared via a bucket”.


LLM Finetuning (Hamel + Dan) ▷ #allaire_inspect_ai (3 messages):

  • Using instructor with inspect_ai: A member asked if there was a way to use something like instructor in inspect_ai to ensure the output format is valid. Another member suggested either implementing and registering a custom model or using tool calls directly, as this is what instructor does under the hood.
  • flexibility of inspect_ai: One user noted that inspect_ai allows for replacing existing infrastructure with custom solutions or enhancing current setups.

LLM Finetuning (Hamel + Dan) ▷ #credits-questions (3 messages):

  • Credit Check Confusion on Braintrust Data: A user expressed frustration about not finding where to check credits on the Braintrust Data site: “I can not even find where to check credits on braintrustdata site. It does not show anything to billing at all?” Another user suggested seeking help in another channel, emphasizing they also couldn’t find the credit status.
  • Redirect to Proper Channel for Solutions: A member recommended moving the discussion to a different channel, tagging another user for a potential answer to the credits check issue. They acknowledged similar difficulties in locating the current credit status.

LLM Finetuning (Hamel + Dan) ▷ #fireworks (6 messages):

  • Users Swamp Support for Credit Issues: Multiple users requested assistance with missing credits on their accounts. User account IDs mentioned include carljvh-7d2eb0, jalonso-e11d20, alex-kira-d15187, harrille-postia-723075, ayhanfuat-fa2dd5, and data-94d7ef.

LLM Finetuning (Hamel + Dan) ▷ #braintrust (3 messages):

  • User seeks platform testing credits: @peaky8linders asked about logging in to test a platform and still seeing the Upgrade button, querying if they could still get credits. They provided their email and organization information for verification.
  • Credits confirmed: @ankrgyl assured @peaky8linders that they should be all set with the credits.

LLM Finetuning (Hamel + Dan) ▷ #west-coast-usa (1 message):

.peterj: Anyone from Seattle area?


LLM Finetuning (Hamel + Dan) ▷ #east-coast-usa (1 message):

ssilby: <@415846459016216576> I’m in! Let’s set up a DMV meetup :3


LLM Finetuning (Hamel + Dan) ▷ #predibase (7 messages):

  • Predibase Misinterprets Dataset Fields: A user faced issues with their Alpaca/ShareGPT-formatted dataset on Predibase due to a missing text field. They were curious how to work with template-free datasets and convert their data accordingly.
  • Getting Data Format Right for Predibase: The user resolved their issue by selecting the ‘instruction tuning’ format and adjusting the data as per Predibase’s documentation. They shared their dataset for reference here.
  • Test Data Evaluation on Predibase: The user noted a limitation of Predibase regarding the use of test data for evaluation and mentioned they would perform the evaluation after the model is trained.
  • Extracting Adapters from Predibase: The user inquired if it is possible to download or extract the adapters trained on Predibase for local testing, preferring to avoid deploying a custom instance.

Link mentioned: isafpr_finetune/data at main · strickvl/isafpr_finetune: Finetuning an LLM for structured data extraction from press releases - strickvl/isafpr_finetune


LLM Finetuning (Hamel + Dan) ▷ #openpipe (3 messages):

  • Dataset Format Struggles Resolved: A member asked for examples of datasets formatted correctly for Openpipe, mentioning unsuccessful attempts with axolotl and template-free datasets. Later, they solved their own problem by formatting the data according to the OpenAI chat format used for OpenAI finetuning, sharing their dataset on GitHub.

Link mentioned: isafpr_finetune/data at main · strickvl/isafpr_finetune: Finetuning an LLM for structured data extraction from press releases - strickvl/isafpr_finetune


LLM Finetuning (Hamel + Dan) ▷ #openai (1 message):

kramakurious: <@1010989949572612166> is this something you can help with?


Interconnects (Nathan Lambert) ▷ #news (69 messagesđŸ”„đŸ”„):

  • Sakana AI hits $1B valuation: Sakana AI, a Japanese startup developing alternatives to transformer models, raised funds from NEA, Lux, and Khosla at a $1B valuation. For more details, check out the link.

  ‱ Runway’s Gen-3 Alpha debuts: Runway introduced Gen-3 Alpha, a new base model for video generation, claimed to create highly detailed videos with complex scene changes and a wide range of cinematic choices.

  • DeepSeek-Coder-V2 impresses: DeepSeek-Coder-V2 was released, reportedly beating GPT-4 on both HumanEval and MATH benchmarks.

  • Google DeepMind’s new video-to-audio tech: Google DeepMind showcased progress on their video-to-audio (V2A) technology, capable of generating an “unlimited number” of tracks for any video. See examples here.

  • Wayve’s new view synthesis model: Wayve released a new view synthesis model, impressively creating views from input images using 4D Gaussians, according to Jon Barron’s update.

Links mentioned:

  • Tweet from Rowan Cheung (@rowancheung): Google DeepMind just shared progress on their new video-to-audio (V2A) tech Until now, AI video generations have been silent, this solves that. V2A can generate an "unlimited number" of track...
  • Tweet from Jon Barron (@jon_barron): Wayve dropped a new view synthesis model earlier today. I'm guessing it's a radiance field made of 4D Gaussians. Nothing generative, just view synthesis from input images. Very impressive.
  • Tweet from Runway (@runwayml): Introducing Gen-3 Alpha: Runway’s new base model for video generation. Gen-3 Alpha can create highly detailed videos with complex scene changes, a wide range of cinematic choices, and detailed art di...
  • Tweet from Dwarkesh Patel (@dwarkesh_sp): I asked Buck about his thoughts on ARC-AGI to prepare for interviewing @fchollet. He tells his coworker Ryan, and within 6 days they've beat SOTA on ARC and are on the heels of average human perf...
  • Tweet from Stephanie Palazzolo (@steph_palazzolo): NEW w/ @nmasc_ @KateClarkTweets: Sakana AI, a Japanese startup developing alternatives to transformer models, has raised from NEA, Lux and Khosla at a $1B valuation. More here: https://www.theinform...
  • Tweet from Nathan Lambert (@natolambert): What unlocked all these text-to-video models being good within the same 6month window? Was it just that people weren't trying? Wild that it seems like just coincidence for them to all emerge. L...
  • AI Text to Sound Effects Generator: Use our AI Sound Effects Generator to generate any sound imaginable from a text prompt for free. Perfect for videos, podcasts, or any other audio production.

Interconnects (Nathan Lambert) ▷ #ml-drama (4 messages):

  • Sam Altman hints at OpenAI governance changes: A tweet by Jacques Thibault referenced a private statement by Sam Altman, suggesting OpenAI might convert to a for-profit business. This move could potentially enable a public offering, allowing Altman to gain a stake in OpenAI. Read the full tweet.

  • The Information reports on OpenAI’s potential shift: The Information detailed that Altman has privately mentioned OpenAI’s possible shift to a benefit corporation, similar to Anthropic and xAI. This transformation could lead to OpenAI going public. Read the article here.

  • Community reacts skeptically: One member expressed skepticism over these developments, summarizing their sentiment with “This is so sketch lmao”.

Link mentioned: Tweet from Jacques (@JacquesThibs): “Sam Altman recently told some shareholders that OAI is considering changing its governance structure to a for-profit business that OAI’s nonprofit board doesn’t control. [
] could open 



Interconnects (Nathan Lambert) ▷ #random (63 messagesđŸ”„đŸ”„):

  • Compliments on Interconnects Merch: Members discussed the quality of the merchandise, noting that while stickers were not well-received, the T-shirts were appreciated. One member mentioned, “stickers were bad need to try another vendor.”
  ‱ Dissecting ARC-AGI Performance: A link to a Redwood Research article discussing methods to improve ARC-AGI performance sparked debate. Members criticized the approach of drawing a large number of samples, arguing the gains come more from chance hits than from genuine scaling.
  • Exploring Neurosymbolic AI: Members dove into neurosymbolic AI, questioning if leveraging LLMs for discrete program search truly fits the traditional definition. A discussion evolved around a tweet from François Chollet, parsing out whether current AI techniques suffice or if fundamental breakthroughs are necessary.
  • MidJourney’s New Ventures: MidJourney is expanding into hardware and anticipates launching training on its video models in January. CEO David Holz confirmed this during a Discord “Office Hour” session.
  • Conundrums at Academic Conferences: A member pondered the value of attending ACL in Thailand despite the travel inconvenience from California, questioning its relevance compared to major conferences like NeurIPS. “I don’t think it’s do or die,” another member responded, suggesting optional attendance.

LlamaIndex ▷ #blog (9 messagesđŸ”„):

  • RAG and Agents Guide Excites with Excalidraw Diagrams: @nerdai shared a comprehensive slide deck on building RAG and Agents. The guide includes full Excalidraw diagrams breaking down simple-to-advanced concepts.
  • Arize Integration Adds End-to-End Observability: The new instrumentation module integrates with Arize, demonstrated in this guide. It shows how to instrument custom event/span handlers in LLM apps.
  • AI World’s Fair Wrap-up Featuring Top Speakers: Join talks from @jerryjliu0, @freddie_v4, @atitaarora, and more at the AI Sizzle and Waves event by AI Engineer World’s Fair. Hosted by Angela Tse, Atita Arora, and Julia Neagu.
  • Beginner’s Guide for Full-Stack Agents Released: @MervinPraison’s tutorial offers a step-by-step guide on building core components of an agent using local models and @chainlit_io. The tutorial is designed to create simple applications.
  • Multimodal RAG Pipeline with Claude 3 and SingleStoreDB: @Pavan_Belagatti discusses future roles of multimodal RAG in his article, which utilizes Claude 3 by @AnthropicAI and @SingleStoreDB. This pipeline addresses the prevalence of images within documents.

LlamaIndex ▷ #general (95 messagesđŸ”„đŸ”„):

  • Chunking customer service emails for RAG: One member asked how to create chunks for a customer service RAG model based on email conversations. Another suggested capturing the first email from each chain to ensure each email is included.
  • Generating specific outputs from markdown documents: A user is having issues with LlamaIndex truncating relevant language from markdown documents. They need precise outputs without summarization and are looking for any advice to improve this.
  • Using Neo4j with LlamaIndex: Multiple queries were raised about converting Neo4j knowledge graphs into LlamaIndex property graphs. Detailed instructions and a link to the LlamaIndex documentation were shared (LlamaIndex Property Graph Example).
  • Overlapping sentence retrieval: A user inquired about expanded sentences overlapping when using sentence-level retrievals. It was clarified that overlapping sentences do not get merged, and custom post-processing would be needed.
  ‱ Saving ChatMemoryBuffer: There was a discussion on saving ChatMemoryBuffer objects to a file to manage token limits in long conversations. Saving the chat memory as a dict and storing it in a JSON file was suggested (sketched after this list).
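
The dict round-trip suggested above, sketched against the llama-index-core layout current at the time of writing (to_dict/from_dict are inherited serialization helpers; verify against your installed version):

```python
import json
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
# ... chat engine runs and fills the buffer ...

with open("chat_memory.json", "w") as f:
    json.dump(memory.to_dict(), f)

with open("chat_memory.json") as f:
    restored = ChatMemoryBuffer.from_dict(json.load(f))
```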

LlamaIndex ▷ #ai-discussion (6 messages):

  • Power-Up LLMs with Web Scraping and RAG!: How to Power-Up LLMs with Web Scraping and RAG explores enhancing LLM performance through web scraping and retrieval-augmented generation (RAG). The article highlights tools like Firecrawl for clean Markdown extraction and Scrapfly for various output formats.
  • Firecrawl vs. Scrapfly in LLM Applications: “Firecrawl shines for Markdown”, making it ideal for preparing data for LLMs. Scrapfly offers flexibility with various output formats but may need additional processing for LLM optimization.

tinygrad (George Hotz) ▷ #general (39 messagesđŸ”„):

  • Script indentation breaks in autogen_stubs.sh: A member faced issues with the autogen_stubs.sh script where clang2py breaks indentation, causing syntax errors. Discussions revealed it was not needed for the intended task of running tinygrad with GPU.
  • OpenCL installation issues cause errors: Problems with OpenCL installation led to errors when running tinygrad on GPU. George Hotz suggested fixing the OpenCL setup and checking clinfo to troubleshoot.
  • Improving OpenCL error messages: The community discussed enhancing OpenCL error messages by autogenerating them from OpenCL headers. A pull request was opened to implement better error messages.
  • Process replay documentation needed: George Hotz requested adding documentation on process replay to assist new contributors. This was in response to simplifying the process of rewriting operations using new styles.
  • Monday meeting agenda topics: Important topics include the tinybox launch, the 0.9.1 release, the CI benchmark duration, removing numpy, and various technical discussions. Highlights also include performance milestones like achieving 200 tok/s for llama 7B on multi-GPU setups.

tinygrad (George Hotz) ▷ #learn-tinygrad (69 messagesđŸ”„đŸ”„):

  • George Hotz addresses recursive rewrite assert: A member asked about an assert in uops graph_rewrite which counts recursive rewrites. This assert ensures that recursive rewrites loop below a threshold to prevent infinite recursion.

  • Gradient sync in beautiful_mnist_multigpu.py simplified: George Hotz confirmed that gradient synchronization is inherent in Tinygrad’s optimizer. He emphasized the simplicity over Torch’s Distributed Data Parallel.

  • Tinygrad’s goals to surpass PyTorch: George Hotz discussed Tinygrad’s aim to outperform PyTorch in speed, API simplicity, and bug reduction. While currently slower, especially in LLM training, Tinygrad’s purity and potential were highlighted by enthusiastic users.

  • Mixed precision implementation discussion: A user sought advice from George Hotz on implementing mixed precision for a model, discussing various approaches including using DEFAULT_FLOAT and nn class modifications. George suggested cast_ methods and late casting techniques for better efficiency.

  • Kernel issues resolved: A user resolved kernel issues related to remainder tensors not appearing in UOp graphs, learning that separate realize calls split operations into different kernels. Discussions highlighted the significance of realizing tensors appropriately to meet custom accelerator requirements.

Link mentioned: Creation - tinygrad docs


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

  • Introducing GPT Notes app: A member showcased a hybrid application combining an LLM client and notes app, allowing users to dynamically include/exclude notes into the LLM’s context. The project, built without using any JS libraries, offers features like import/export, basic markdown, and responses management.
  ‱ No mobile support, pure vanilla JS: Despite lacking mobile support, the app relies on no external libraries and is built purely in vanilla JavaScript. It includes functionality for storing API keys, history, and notes locally in the browser.
  • Explore the app on Codepen: The member provided a Codepen link for the project and a deployed fullscreen app. The application serves as an example for anyone looking for a similar tool.

Link mentioned: GPNotes


OpenRouter (Alex Atallah) ▷ #general (68 messagesđŸ”„đŸ”„):

  ‱ OpenRouter Errors without User Messages Sparking Debate: Users discussed OpenRouter returning errors when no user message is present, noting that some models require at least one user message as an opener, and that even starting with an assistant message is not supported by every model due to their instruct-tuned formats. A suggested workaround was using the prompt parameter instead of messages (OpenRouter Docs); a requests-based sketch follows this list.

  • Document Formatting and Uploading Puzzles Users: A user inquired about services for formatting text into structured “papers,” leading to a broader discussion on document formatting and uploading. The conversation highlighted the complexity of making PDFs LLM-friendly, with suggestions to preprocess PDFs using tools like PDF.js and Jina AI Reader.

  • Qwen2’s Censorship Criticized: Users shared their experiences with the Qwen2 model, labeling it as overly censored despite jailbreak attempts, evidenced by implausibly positive narrative outcomes. Alternative, less-censored models like Dolphin Qwen 2 were recommended.

  • Gemini Flash’s Context Limit Debate: A discrepancy in Gemini Flash’s token generation limits prompted questions, with OR listing 22k tokens while Gemini Docs claimed 8k. It was clarified that OR counts characters to match Vertex AI’s pricing model (OpenRouter Status).

  • Rate Limits and Model Configuration Questions Arise: Users inquired about rate limits for models like GPT-4o and Opus, leading to guidance on checking rate limits via API keys (OpenRouter Rate Limits). Also, discussions about maximizing model performance and configuration settings like “Sonnet from OR vs Sonnet with Claude key” and “LiteLLM vs OR Routing” unfolded, emphasizing custom retry options and API call efficiency.
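
The prompt-parameter workaround from the first item above, sketched with plain requests (model name illustrative):

```python
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/completions",  # completions, not chat/completions
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "mistralai/mixtral-8x7b-instruct",
        "prompt": "Continue the story: The lab lights flickered and",
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])
```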

OpenRouter (Alex Atallah) ▷ #음반 (1 messages):

is.maywell: <:a6adc388ea504e89751ecbbd50919d3a:1240669253699637339>


LangChain AI ▷ #general (48 messagesđŸ”„):

  • TextGen Integration in LangChain Broken: A member reported that textgen integration into LangChain is broken due to an API update.
  • Best Splitter for Chunking Textbook PDFs: A member asked for advice on the best splitter to use for chunking PDF text according to headers and chapters, aiming to structure the text better.
  • LangChain Postgres Installation Trouble: Users exchanged advice about installing langchain_postgres, with a solution involving correcting the targeted directory for pip install.
  • Module Error with New Tenacity Version: A user encountered a ModuleNotFoundError for ‘tenacity.asyncio’ following an update to version 8.4.0, but found reverting to version 8.3.0 resolved the issue.
  • Help for New LangChain Users: Multiple users sought guidance on implementing specific models or error handling in LangChain, including transitioning from Python code to LangChain JS, managing HuggingFace models, and recommended LLMs like Llama 3 or Google Gemini for local use. A relevant discussion was linked here.

LangChain AI ▷ #share-your-work (14 messagesđŸ”„):

  • R2R adds automatic knowledge graph construction: R2R v2 now includes automatic knowledge graph construction along with a comprehensive cookbook that walks through basic and advanced features. “This should make a great (and up to date) starting point If you are interested in KGs.”

  • Collision event interactive map launched: Eloquentsyntax announced an interactive map for Collision parties and events. The map includes filters, door fees, addresses, RSVP links, and an AI chat to find events easily.

  ‱ CryptGPT: Privacy-Preserving LLMs using Vigenere cipher: Diwank introduced CryptGPT, a project that pretrains a GPT-2 model on Vigenere ciphertexts, ensuring privacy from the model provider. The unique feature is that usage requires knowledge of the encryption key (a toy Vigenere implementation follows this list).

  • Scrape Web + Create diagrams with GPT: Ashes47 shared a project from user Anuj4799, who created a custom GPT for generating technical diagrams. The demo can be checked out here.

  • Rubik’s AI beta tester and promo: Paulm24 invited users to beta test an advanced research assistant and search engine, offering a 2-month free premium with models like GPT-4 Turbo and Claude 3 Opus using the promo code RUBIX. Interested users are encouraged to sign up at Rubik’s AI.
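
For intuition on the CryptGPT item above, here is the classical cipher itself in a few lines (lowercase a-z only; the project's tokenizer-level handling will differ):

```python
def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter by the corresponding (repeating) key letter."""
    out = []
    for i, ch in enumerate(text):
        k = ord(key[i % len(key)]) - ord("a")
        if decrypt:
            k = -k
        out.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
    return "".join(out)

ct = vigenere("attackatdawn", "lemon")
assert ct == "lxfopvefrnhr"                                   # textbook example
assert vigenere(ct, "lemon", decrypt=True) == "attackatdawn"  # round-trips
```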

LangChain AI ▷ #tutorials (1 message):

emarco: https://www.youtube.com/watch?v=0gJLFTlGFVU


Latent Space ▷ #ai-general-chat (21 messagesđŸ”„):

  • OtterTune is no more: OtterTuneAI officially shut down after a failed acquisition deal. The announcement was shared on Twitter.
  • Check out Apple's models on Hugging Face: Apple has published several models optimized for on-device performance on Hugging Face, including DETR Resnet50 Core ML for semantic segmentation and Stable Diffusion Core ML.
  • OpenAI under fire for appointing former NSA head: Edward Snowden criticized OpenAI’s decision to appoint former NSA Director Paul M. Nakasone to its board, calling it a betrayal of public trust.
  • Runway releases Gen-3 Alpha video model: Runway introduces Gen-3 Alpha, a new model for video generation with advanced features. Details were shared on Twitter.
  • Anthropic research on reward tampering: Anthropic publishes a new paper on AI models learning to hack their reward systems. The research and its findings are summarized in their blog post.

Links mentioned:

  ‱ Flo Crivello on Building Lindy.AI
  • Tweet from Anthropic (@AnthropicAI): New Anthropic research: Investigating Reward Tampering. Could AI models learn to hack their own reward system? In a new paper, we show they can, by generalization from training in simpler settings. ...
  ‱ apple (Apple)
  • Tweet from Buck Shlegeris (@bshlgrs): ARC-AGI’s been hyped over the last week as a benchmark that LLMs can’t solve. This claim triggered my dear coworker Ryan Greenblatt so he spent the last week trying to solve it with LLMs. Ryan gets 71...
  • Tweet from Air Katakana (@airkatakana): i’m calling the top, this company didn’t even do anything yet
  • Tweet from Tom Goldstein (@tomgoldsteincs): LLMs can memorize training data, causing copyright/privacy risks. Goldfish loss is a nifty trick for training an LLM without memorizing training data. I can train a 7B model on the opening of Harry P...
  • Tweet from Greg Brockman (@gdb): GPT-4o as an assistant for helping doctors screen and treat cancer patients: Quoting Othman Laraki (@othman) I'm thrilled to announce the @Color Copilot, which we developed in partnership with ...
  • Tweet from Runway (@runwayml): Introducing Gen-3 Alpha: Runway’s new base model for video generation. Gen-3 Alpha can create highly detailed videos with complex scene changes, a wide range of cinematic choices, and detailed art di...
  • Tweet from Andy Pavlo (@[email protected]) (@andy_pavlo): I'm to sad to announce that @OtterTuneAI is officially dead. Our service is shutdown and we let everyone go today (1mo notice). I can't got into details of what happened but we got screwed ove...
  • Tweet from François Chollet (@fchollet): Re: the path forward to solve ARC-AGI... If you are generating lots of programs, checking each one with a symbolic checker (e.g. running the actual code of the program and verifying the output), and ...
  • Tweet from Edward Snowden (@Snowden): They've gone full mask-off: 𝐝𝐹 𝐧𝐹𝐭 đžđŻđžđ« trust @OpenAI or its products (ChatGPT etc). There is only one reason for appointing an @NSAGov Director to your board. This is a willful, calculat...
  • Flo Crivello on Building Lindy.AI | annotated by Daniel: AI Agents are a new category of software, built on top of large language models (LLMs).

Latent Space ▷ #ai-in-action-club (20 messagesđŸ”„):

  ‱ Prime Intellect set to open source DiLoCo and DiPaCo: Users discussed how Prime Intellect plans to soon release open implementations of the state-of-the-art distributed training methods DiLoCo and DiPaCo, enhancing open collaboration. One member shared a Prime Intellect link detailing how the platform democratizes AI through distributed training across global compute resources.

  • Bittensor utilizes The Horde: Users mentioned that The Horde, known for distributing computational tasks, is being utilized on the Bittensor network for decentralized AI model training.

  • DeepMind did not participate: Contrary to some expectations, it was clarified that DeepMind did not contribute to specific ongoing projects in the community discussion.

  • YouTube video on Optimizers: Members shared a YouTube video about optimizers, explaining various types from Gradient Descent to Adam. It offered an easy way to remember different optimizers for effective model training.

  • ChatGPT’s multi-step responses: A discussion centered around how ChatGPT formulates multi-step responses, clarifying that different transformer blocks can be processed separately. This sparked interest and questions about specific parallelizations within transformer layers.

Links mentioned:

  • Prime Intellect - Commoditizing Compute & Intelligence: Prime Intellect democratizes AI development at scale. Our platform makes it easy to find global compute resources and train state-of-the-art models through distributed training across clusters. Collec...
  • Optimizers - EXPLAINED!: From Gradient Descent to Adam. Here are some optimizers you should know. And an easy way to remember them. SUBSCRIBE to my channel for more good stuff! REFER...

Cohere ▷ #general (20 messagesđŸ”„):

  • Debate on AGI Hype: A user shared a YouTube video titled “Is AGI Just a Fantasy?” featuring Nick Frosst, spurring discussions about the hype, real tech advancements, and evaluation of LLMs. Members expressed fatigue over “hype bros” but acknowledged the importance of ongoing investment, likening it to the dot-com bubble that led to significant innovations.

  • Call for Next.js App Router Collaboration: A member announced the creation of a GitHub issue inviting collaboration on migrating the Cohere toolkit UI to Next.js App Router to improve code transferability and attract more contributors. The GitHub issue #219 contains more details about the feature request.

  • C4AI Talk Link Shared: Nick Frosst provided a Google Meet link for the C4AI talk and directed members with questions to the relevant Discord channel.

  • Interest in Contributing Data for Training: A user inquired about submitting 8,000 PDFs for embedding model training with Cohere. Nick Frosst sought clarification if the user intended to fine-tune an embedding model, opening a discussion on potential data contributions.

Cohere ▷ #project-sharing (11 messagesđŸ”„):

  • Cohere models integrate into Chrome for free: A member announced a free Chrome Extension that integrates LLMs directly into the browser, eliminating repetitive tasks and enhancing productivity. Users are encouraged to provide feedback and can configure it with detailed instructions provided.
  • Interactive Collision map launched: Another member created an interactive map of all Collision events, allowing users to filter by event details and access AI chat for easier navigation. It utilizes Sveltekit, Supabase, and Vercel for its build.
  • Command R+ configuration issue resolved: A user experienced issues configuring Command R+ with the Cohere-powered extension but received help to rectify it by using a Blank Template first. The developer acknowledged the bug and plans to fix it.
  • Inquiry about Cohere data submission: A user inquired if Cohere accepts data submissions for training, specifically mentioning they have nearly 8,000 PDFs for embedding model training.

Cohere ▷ #announcements (1 message):

  • David Stewart to host Cohere Developer Office Hours: A relaxed session is scheduled for tomorrow, hosted by David Stewart, a seasoned Solution Architect at Cohere. Members are encouraged to post their questions and issues on this thread to get prioritized during the event.
  • Event details released: The Office Hours event will take place on June 18, at 1:00 PM ET. Join the event here for live interaction and guidance on Cohere API and model-related queries.

Link mentioned: Join the Cohere Community Discord Server!: Cohere community server. Come chat about Cohere API, LLMs, Generative AI, and everything in between. | 17098 members


OpenInterpreter ▷ #general (14 messagesđŸ”„):

  • Model freezes mid-code: One member inquired if others were experiencing their model freezing while in the middle of coding. Another member replied that it usually completes the task even when it looks frozen.
  • Windows installation issues: A user reported issues with installing and running the model on Windows. They were advised to search for help and post their query in a designated channel.
  • Memory functionality improves: A member expressed satisfaction with getting memory to work in a “very primitive way.” They enthusiastically shared their progress with the community.
  • Llama 3 Performance Review: A detailed model comparison and performance test for Llama 3 was shared, promising a comprehensive assessment of Llama 3 Instruct’s capabilities across various formats and quantization levels.
  • Profiles functionality feature: A new ‘profiles’ feature on Open Interpreter was highlighted. A member shared a video to explain its capabilities and applications.

OpenInterpreter ▷ #O1 (4 messages):

  • Check your unit arrival in pinned messages: A user asked how to check when their unit is arriving, mentioning they placed an order very early. Another member redirected them to a pinned message in the channel for manufacturing updates and timelines.
  • Discuss combo of vector DB, semantic search, and LLM: A question was raised about the potential of combining a vector database of audio with voice-based semantic search and indexing, alongside an LLM capable of accessing this data and performing actions. The proposed combination hints at a powerful tool for actions based on verbal inputs.

OpenInterpreter ▷ #ai-content (6 messages):

  • DIY AI Cyber Hat turns heads: A member shared their project on making an open-source AI-enabled wearable hat, likening it to smart glasses. They provided a video preview and expressed openness for collaboration, view the video here.
  • Terminator humor on hat design: One member humorously remarked that the hat design made the creator look like a terminator sent to eliminate the founder of Hobby Lobby.
  • Interest in sci-fi wearables sparks engagement: People showed enthusiasm for the AI hat project, requesting access to the source code once it’s cleaned up. The creator suggested possible future integration of more sensors for scientific experiments.
  • Pi Zero heads to Big Mouth Billy Bass: The same creator teased their next project involving integrating a Pi Zero in a Big Mouth Billy Bass.
  • Dream Machine generates buzz: A member shared Dream Machine, an AI model that creates high-quality, realistic videos from text and images. The model aims to build a universal imagination engine and is now available to the public.

Links mentioned:

  • Luma Dream Machine: Dream Machine is an AI model that makes high quality, realistic videos fast from text and images from Luma AI
  • I Made My Own Custom AI Cyber Hat: This is a video about the start of a project of mine that I've called "heddy" (the hat portion at least). I created my own smart AI enabled hat largely thro...

Torchtune ▷ #general (7 messages):

  • Single node focus for now in Torchtune: When asked if Torchtune plans to release multi-node training, a member clarified that the focus is currently on single node training. However, they noted that “our ‘tune run’ command is a wrapper around torch run” and with minor changes, multi-node setups could work, although it’s untested.

  ‱ Distributed config adjustments for multi-node training: Members exchanged tips on setting up multi-node training in Torchtune. One suggested setting tune run --nnodes 2, while another mentioned the need for TorchX or Slurm to handle script launches and node communication over specific ports, pointing to resources like TorchX and the hybrid shard strategy documentation.

Link mentioned: FullyShardedDataParallel — PyTorch 2.3 documentation


DiscoResearch ▷ #discolm_german (5 messages):

  • Llama3 tokenizer remains unchanged: Members discussed whether the Llama3 tokenizer was extended for the German model. One member confirmed that “tokenizer is the same as the base Llama3”.

  • Concerns about German token handling: A member questioned the rationale behind not extending the tokenizer, noting that not including German tokens probably decreases the context window quite a bit. They were curious if they were missing any reasoning, especially considering the potential increases in embeddings.

  • Size comparison with Llama2: Another member pointed out that Llama3’s tokenizer is 4 times larger than Llama2’s. They inquired whether it was already more effective on German or if there were still issues.


Datasette - LLM (@SimonW) ▷ #ai (3 messages):

  • Alternative Positions in AI Discussions Praised: One member appreciated another’s writing for “addressing the alternative position in good faith”. They humorously noted that ChatGPT’s rise is “a full employment act for data engineers in perpetuity”.

  • Thoughtbot’s LLM Guide Shoutout: A member highlighted a useful thoughtbot resource for beginners in LLMs. They recommended reading Jose Blanco’s post on using open-source LLMs locally and remotely.

  • Clarity in Naming Conventions for LLMs Appreciated: Another member found the categorization of LLMs into Base, Instruct, and Chat models particularly clear and detailed.

Link mentioned: Understanding open source LLMs: Do you think you can run any Large Language Model (LLM) on your machine?


Datasette - LLM (@SimonW) ▷ #llm (1 message):

  • Turso adds native vector search support: Turso has introduced native vector search capabilities to their platform, supplementing SQLite’s existing features. This new addition aims to simplify vector search for users building AI products, addressing previous challenges with managing extensions like sqlite-vss.

Link mentioned: Turso brings Native Vector Search to SQLite: Vector Similarity Search is now available!


AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (1 message):

gomiez: anyone know of the hospital ai town project name?


Mozilla AI ▷ #llamafile (1 message):

cryovolcano.: can we use llamafile with tinyllama as a search engine in firefox ?





{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}