AI News for 1/8/2025-1/9/2025. We checked 7 subreddits, 433 Twitters and 32 Discords (219 channels, and 2928 messages) for you. Estimated reading time saved (at 200wpm): 312 minutes. You can now tag @smol_ai for AINews discussions!

Congrats to all seven billionaire cofounders of Anthropic.

{% if medium == 'web' %}

Table of Contents

[TOC]

{% else %}

The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!

{% endif %}

AI Twitter Recap

all recaps done by Claude 3.5 Sonnet, best of 4 runs.

AI Models & Benchmarks

rStar-Math Surpasses OpenAI's o1 in Math Reasoning: @reach_vb detailed how rStar-Math uses MCTS and a Process Reward Model to achieve 90.0% accuracy on the MATH benchmark with a 7B LLM, outperforming o1-preview by +4.5%.
Qwen Chat Launches on Open WebUI: @Alibaba_Qwen announced the release of Qwen Chat, featuring models like Qwen2.5-Plus and Qwen2.5-Coder-32B-Instruct, enhancing vision-language and reasoning capabilities.
Microsoft’s Phi-4 Model Released: @rasbt shared insights on Phi-4, highlighting its training on 40% synthetic data and its impact on pretraining with improved performance through increased training epochs.

AI Tools & Platforms

North AI Workspace for Enterprises: @cohere introduced North, a secure AI workspace integrating LLMs, RAG, and automation, optimized for private deployments and enhancing employee productivity.
LangChain’s Company Research Agent: @LangChainAI showcased a company researcher agent that follows a multi-step workflow including Research, Extraction, and Reflection phases, along with an open-source dataset for evaluation.
Transformers.js Demos Released: @tom_doerr shared a collection of demos for Transformers.js, covering tasks like text embeddings and image segmentation across JavaScript environments.

AI Research & Studies

Gradient Dissent Podcast Episode: @weights_biases featured @akshaykagrawal, discussing collaborative platforms for AI development in the latest episode of Gradient Dissent.
Meta Chain-of-Thought in LLMs: @arankomatsuzaki presented Meta Meta-CoT, an extension of Chain-of-Thought that models underlying reasoning processes, enhancing multimodal reasoning capabilities.
DeepSeek V3 and Self-Improvement in LLMs: @teortaxesTex discussed DeepSeek's approach to finetuning with domain-specific data and recursive self-improvement, highlighting the role of MCTS in generating high-quality training data.

AI Industry Partnerships

Rakuten Partners with LangChain: @LangChainAI announced a collaboration with Rakuten, recognizing them as one of the few companies delivering real value with Generative AI.
North’s Partnership with RBC: @aidangomez revealed the partnership with @RBC, aimed at optimizing North for financial services and supporting 90,000 employees in adopting the latest AI technologies.
Agent Laboratory Collaboration with AMD and Johns Hopkins: @arankomatsuzaki highlighted how Agent Laboratory enables researchers to use LLM agents for the entire research process, fostering open-source and adaptable solutions.

Technical Discussions & Development

CUDA and Triton for AI Efficiency: @hkproj emphasized the importance of learning CUDA and Triton for significant financial gains in AI development, as showcased in a linked video.
AI-Assisted Coding Best Practices: @AndrewYNg shared his evolving software stack leveraging AI tools like OpenAI’s o1, Anthropic’s Claude 3.5 Sonnet, and various deployment platforms to enhance prototyping efficiency.
Dynamic Few-Shot Prompting in AI Models: @hwchase17 discussed the implementation of dynamic few-shot prompting in Realm-X, significantly improving performance from ~40% to ~80% by selecting the most relevant examples based on user queries.

Memes & Humor

Work-Life Balance with AI Agents: @bindureddy humorously listed the traits of AI agents, poking fun at their current limitations while predicting rapid improvement.
AI Replacing Jobs: @mickeyxfriedman joked about AI eliminating various unique job roles, highlighting the humorous side of AI disruptions.
Personal AI Experiences: @karpathy shared a lighthearted take on his daily routine enhanced by AI, reflecting the everyday integration of AI tools with a touch of humor.

** AI Community & Events**

NLP Seminar with Stanford: @stanfordnlp announced a talk by @taoyds on Vision-Language Models, inviting non-affiliates to register for the seminar.
GitHub Expo for AI Engineers: @swyx promoted the @aiDotEngineer Expo, targeting those hiring AI engineers and encouraging participation through dedicated spaces.
AI Studio Joins Google DeepMind: @osanseviero celebrated the merger of AI Studio, Gemma, and Gemini API with Google DeepMind, anticipating accelerated advancements in open models and accessible research.

AI Reddit Recap

/r/LocalLlama Recap

Theme 1. Groq's Handling of Models: Insights and Comparisons

This sums my experience with models on Groq (Score: 1096, Comments: 64): The post humorously critiques Groq's performance with Llama3.3 70b and Qwen2.5 72b models by likening it to a character who is fast but inaccurate in math. The meme suggests that while Groq's processing might be rapid, it may lack precision, as depicted through the comedic exchange of an incorrect multiplication result.
- Groq's Performance and Use Cases: Groq is critiqued for quantizing models excessively, fitting them into small VRAM sizes like 230 MB, which may lead to reduced precision. Users suggest Groq is better suited for simple tasks like cleaning transcripts due to its speed, rather than complex reasoning tasks.
- Comparative Evaluations: Cerebras evaluated Llama 3.1 8B and 70B models across providers, including Groq, and found Groq's performance comparable to others, despite the humorous critique. The evaluation can be found on Cerebras's blog.
- Model Alternatives and Questions: Some users question the choice of Groq, suggesting alternatives like Qwen2.5 72b for potentially better results. There is also skepticism about the post's potential sponsorship by competitors like Cerebras or Nvidia.

Theme 2. Phi-4 Performance: Benchmark vs Real-World Tasks

Phi 4 is just 14B But Better than llama 3.1 70b for several tasks. (Score: 251, Comments: 63): Phi-4, a 14B parameter model, demonstrates superior performance in specific tasks compared to Llama 3.1 70B, according to a scatter plot analyzing AI models by their active parameters and MMLU aggregate performance scores. The plot underscores Phi-4's high efficiency and effectiveness, positioning it as a "small but mighty" model, outperforming larger models like Llama-3.3-70B and Qwen2.5-72B.
- Phi-4's Benchmark Focus: There is skepticism about Phi-4's real-world task performance, with some claiming it excels in benchmarks due to heavy training on benchmark data rather than actual tasks. SnooPaintings8639 notes that while Phi-4 scores high on benchmarks, it struggles in real use cases and closed benchmarks, suggesting overfitting concerns.
- Model Comparisons: Phi-4 is not universally seen as better than larger models like Llama 3.1 70B or Qwen 2.5 35B. siegevjorn and silenceimpaired question its superiority, with Vishnu_One confirming it does not surpass Qwen 2.5.
- Training and Data Strategy: Phi-4's training strategy focuses on reasoning through complex problems using synthetic data, as highlighted by rabbotz. x0wl mentions it was trained to avoid factual questions, leading to poor performance in general knowledge but excelling in math benchmarks.
Phi-4 Llamafied + 4 Bug Fixes + GGUFs, Dynamic 4bit Quants (Score: 202, Comments: 64): The Phi-4 model has been updated with 4 bug fixes improving tokenizer and chat template handling, which enhances inference and fine-tuning performance. The model is now Llamafied for compatibility with various frameworks, offering 2x faster fine-tuning, 70% VRAM reduction, and 9x longer context lengths using Unsloth. New uploads on HuggingFace include GGUF, 4-bit, and 16-bit versions, along with Dynamic 4-bit quants that enhance accuracy by selectively maintaining 16-bit layers.
- Bug Fixes and Improvements: The Phi-4 model received significant bug fixes, notably in the tokenizer, improving performance. The fixes are detailed in a blog post, and enhance the model's accuracy, as demonstrated by a 20% increase in Python test pass rates when using the updated GGUF files.
- Dynamic 4-bit Quants and Compatibility: The Dynamic 4-bit quants are primarily for inference or fine-tuning rather than compatibility with frameworks like llama.cpp. These quants provide improved accuracy compared to BitsandBytes 4-bit, as discussed in this blog post.
- User Feedback and Performance: Users reported improved performance and accuracy with the Phi-4 model, surpassing expectations and previous versions like Phi-3. The updates were noted to boost performance on tests such as the Pentesting multiple choice test, with scores improving significantly due to chat template fixes.

Theme 3. NVIDIA Project DIGITS Memory Bandwidth Speculation

Why I think that NVIDIA Project DIGITS will have 273 GB/s of memory bandwidth (Score: 372, Comments: 130): The author estimates that NVIDIA Project DIGITS will have a memory bandwidth of 273 GB/s, based on measurements of memory chip dimensions from images in an NVIDIA CES presentation. They used GIMP to correct image perspective and compared the aspect ratio of the memory chips to those of Micron 128Gb LPDDR5X chips, concluding that a 315-ball x32 bus package is the closest match. The lack of mention of memory bandwidth in the presentation suggests it may not be exceptionally high.
- Discussions highlight skepticism about NVIDIA Project DIGITS' estimated 273 GB/s memory bandwidth, with users comparing it to other hardware like Apple M4 Max with 546GB/s and questioning why NVIDIA didn't mention bandwidth in their presentation, suggesting it's not exceptionally high. Users also compare it to AMD's Strix Halo and note that Xeon or Epyc systems could offer similar or better performance at a potentially lower price point.
- Commenters debate the practicality of DIGITS versus Ryzen AI Max+ PRO 395, noting that the Ryzen 395 might be cheaper and versatile for general use, while DIGITS offers CUDA and potential clustering benefits. Both machines feature 128GB of memory, but there are concerns about DIGITS' speed and value compared to other systems.
- There is speculation about Micron's involvement with DIGITS, considering their past business relations with NVIDIA and the potential use of Micron LPDDR5X memory. Some users reference Micron's dual die packaging as a cost-saving measure, while others point out that DIGITS could be seen as an overpriced version of AMD's Strix Halo with CUDA capabilities.

Theme 4. TransPixar: Transparency-Preserving Generative Models

TransPixar: a new generative model that preserves transparency, (Score: 417, Comments: 40): TransPixar, a new generative model, has been released and is noted for its ability to preserve transparency in generated assets. This feature holds potential for creating game assets, indicating advancements in generative models for game development.
- TransPixar is praised for its utility in generating game assets, with links provided for its GitHub, Arxiv, and Hugging Face demo and model: GitHub, Arxiv, Demo, Model.
- Concerns are raised about the use of a trademarked name from a major animation studio, which could potentially lead to legal issues.
- The model's ability to handle RGBA output is highlighted as a significant technical advancement, as most AI models typically only produce RGB output, making transparency a complex feature to implement.

Other AI Subreddit Recap

/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT

Theme 1. Salesforce's AI Strategy: Ending Software Engineer Hires by 2025

Salesforce will hire no more software engineers in 2025 due to AI (Score: 729, Comments: 116): Salesforce plans to halt hiring software engineers in 2025 as a result of advancements in AI.
- Many users believe Salesforce's AI announcement is primarily a marketing tactic rather than a genuine strategy to replace engineers. Indicava and bmson suggest skepticism, citing past marketing claims about AI's role in decision-making at Salesforce, and Frugal_Ferengi argues that AI isn't yet capable of replacing human engineers effectively.
- Despite the announcement, Salesforce continues hiring engineers, especially in India, contradicting the claim of halting hiring. WonderingStarDusts and WH7EVR provide evidence of ongoing recruitment, implying that the statement may not reflect the company's actual hiring practices.
- Concerns about AI's impact on software engineering jobs are discussed, with This_Organization382 and wtf_is_a_monad expressing doubt over AI's current capability to fully replace engineers. They highlight that AI models like ChatGPT still struggle with complex tasks, and the decision to limit hiring may be a premature move lacking substantial data support.

Theme 2. ChatGPT Losing It: Recognizing Anthropic-Type Mistakes

ChatGPT loses it (Score: 408, Comments: 38): The post titled "ChatGPT loses it" lacks a detailed body, and includes a video which is not analyzable. No further technical details or discussion points are provided in the text.
- A humorous discussion emerged about whether a phone's mass changes when its memory is full, with Caneofpain noting that the mass change is technically real but immeasurably small. Trollsmurf added that memory types might affect mass differently, potentially making devices lighter when data is added due to changes in electron states.
- Wirtschaftsprufer shared a comedic anecdote involving ChatGPT's responses, illustrating the AI's unexpected and humorous behavior in recalling events.
- Ithkuil commented on the humor's longevity, pondering how perceptions might change by 2025, with Drtoucan setting a reminder to revisit the topic in a year.

Theme 3. Conspiracy Claims: OpenAI's Erasure of Former Employee Data

A viral post by X user Mario Nawfal had claimed that OpenAI has removed all traces of their former employee Suchir Balaji from ChatGPT. The Crypto Times fact checked the claims made by user and found them to be true. (Score: 107, Comments: 67): OpenAI allegedly removed all traces of former employee Suchir Balaji from ChatGPT, according to a viral post by X user Mario Nawfal. The Crypto Times fact-checked these claims and confirmed their accuracy.
- Several commenters question the reliability of the viral claims, with users like Mrkvitko pointing out the misleading nature of the title, emphasizing that Suchir Balaji's information was likely never in the training data rather than being removed. Tall-Log-1955 and traumfisch criticize the conspiracy theory angle and the credibility of sources like The Crypto Times.
- Discussions around Suchir Balaji's role at OpenAI highlight his significant contributions, with references to John Schulman's acknowledgment of Balaji's essential work. However, there's contention about his whistleblower status, with NotFromMilkyWay noting his NDA violation and subsequent legal and personal consequences.
- The conversation touches on the technical aspects of ChatGPT's data handling, with traumfisch and SkaldCrypto discussing whether web search capabilities would allow ChatGPT to recognize Balaji due to his media presence, contrasting it with the typical training data limitations.

AI Discord Recap

A summary of Summaries of Summaries by o1-2024-12-17

Theme 1. Model Showdowns & Surprises

Phi-4 Rockets Past Microsoft: The Unsloth’s Phi-4 soared above the official Microsoft version, featuring “We found & fixed 4 bugs in Phi-4 & Llamafied the model” in a lively tweet. Its 4-bit and 16-bit releases sparked instant hype among the community.
Stunning Gains with rStar-Math: Microsoft’s technique bumped Qwen2.5-Math-7B from 58.8% to 90.0% on the MATH benchmark, with Phi3-mini leaping from 41.4% to 86.4%. They now solve about 53.3% of USA Math Olympiad problems, fueling talk of massive leaps for small LLMs.
Qwen Chat Opens Doors: This new web UI unifies Qwen models, enabling direct doc uploads and side-by-side comparisons. Future expansions include voice, web search, and more, hinting at a user-friendly AI frontier.

Theme 2. Coding Tools & HPC Upgrades

ComfyUI Integrates OpenPose: Users overcame friction with Pony models by relying on workflow guides for control nodes. Some pivoted to Forge UI but returned once new solutions emerged for smooth nodal integration.
AMD vs Nvidia GPU Grudge Match: Community members compared performance on Windows using ZLUDA, ROCm, or native GPU drivers. Each approach yields distinct gains, with official wiki guides clarifying installation steps.
Self-Hosted Codeium Goes Mainstream: Enterprise teams discovered an on-prem version via GitHub Issue #115, fueling advanced setups. Meanwhile, devs praised Cascade for minimal coding overhead and swift end-to-end site building.

Theme 3. Cutting-Edge Prompting & Decoding

Speculative Decoding Steals the Spotlight: Some dubbed it “DLSS for language models,” claiming it slashes GPU usage in training and inference. Enthusiasts embraced the idea, seeing it as a route to refine outputs while conserving compute hours.
Function Calling Models Stir Curiosity: Users sought benchmarks for open-source function calling, focusing on accuracy tweaks after training. Structured prompts and robust test sets emerged as the secret sauce for reliable calls.
Meta-Prompting & System Message Tweaks: Creators unleashed multi-layer instructions, shaping responses by rewriting system directives. Some insisted “the real magic is specifying exactly what output you want from the start,” stressing precise goals over guesswork.

Theme 4. HPC & GPU Revelations

MI210 Occupancy Mystifies HPC Crowd: Devs found a puzzling 2.5 blocks per compute unit, or 2 with __syncthreads(), on CDNA-based GPUs. They attributed these odd occupancy limits to quirks deep in AMD’s hardware design.
NVIDIA Drops $3000 Home Supercomputer: Enthusiasts cheered HPC-level power for personal AI labs, blasting past standard workstation constraints. Early adopters glimpsed real AI experimentation at home without busting the bank.
ARC Prize Morphs into Non-Profit: Organizers, led by Greg Kamradt, pivot to guide AGI research with structured funds for 2025. They build on ARC Prize 2024 insights, promising a more expansive set of open AI initiatives.

Theme 5. Big Hackathons & Corporate Shifts

AI Agent Hackathon Lures Builders: OpenRouter tempts participants with $10 in API credits and a $6,000 total prize pool, fueled by n8n’s cash awards. The Live Agent Studio portion runs Jan 8–22, with winners revealed Feb 1.
Salesforce Freezes 2025 Hiring: Marc Benioff promised 30% productivity boosts from Agentforce and declared, “we’ll be bigger in five years.” Despite the freeze, admirers note the powerful synergy between AI and corporate maneuvering.
Anthropic Snags $2B at $60B Valuation: Investors estimate $875 million in annual recurring revenue, triggering “good prayers” for 2025 breakthroughs. The AI world hails this war chest, expecting massive leaps on the horizon.

PART 1: High level Discord summaries

Stability.ai (Stable Diffusion) Discord

ComfyUI Gains with OpenPose Pony: A discussion circled around integrating OpenPose control with the Pony model in ComfyUI, referencing node integration tips in Forge UI.
- One user encountered a challenge with ComfyUI’s features, pivoting to Forge UI for improved workflow, but others suggested solutions from the ComfyUI Workflow Resources.
Power Outages Crash SD Dreams: Concerns emerged about potential harm to GPUs and data corruption if a power outage occurs mid-generation in Stable Diffusion.
- A user confirmed the GPU usually remains safe, but abrupt interruption can cause OS-level file errors or data loss, urging frequent backups.
Keeping AI Tools in Sync: Maintaining up-to-date A1111 and ComfyUI proved challenging, with conflicts triggered by older Python versions.
- Participants noted that using Python 3.10.11 resolves most version mismatches, ensuring consistent usage across these frameworks.
AMD GPU Showdown: Users compared ZLUDA and ROCm for AMD GPU support on Windows, noting each offers distinct gains.
- They cited official guides for setting up stable-diffusion-webui on AMD hardware, and reaffirmed the viability of native Windows alternatives.

Unsloth AI (Daniel Han) Discord

Unsloth's Phi-4 Flies Past Microsoft: The Unsloth's Phi-4 model soared above the official Microsoft version on the Open LLM Leaderboard, featuring GGUF, 4-bit, and 16-bit releases after critical bug fixes.
- “We found & fixed 4 bugs in Phi-4 & Llamafied the model.” was the official line in the tweet from Unsloth AI (@UnslothAI), stirring excitement among the community.
Qwen2.5-Math-7B Instruct Touted for Tabular Triumphs: The Qwen2.5-Math-7B-Instruct model was suggested for efficient markdown table calculations, with some users training for one epoch at a 3e-5 learning rate.
- A user switched focus from mistralai/Mathstral-7B-v0.1 upon learning it wasn't a base or PEFT model, turning to Qwen alternatives for improved tabular performance.
Speculative Decoding Takes the Stage: Speculative decoding was highlighted as a potential 'DLSS for language models,' aiming to cut resource use during training or inference.
- The suggestion got a positive reception, with one member seeing it as a fresh angle to refine model output while sparing GPU hours.
LoRA Merging Moves Forward: Community members debated merging LoRA adapters trained on smaller variants into a larger 16-bit model to maintain performance fidelity.
- They emphasized minimal loss of detail, cautioning that merging on a 4-bit foundation could degrade final results.

Codeium (Windsurf) Discord

Self-Hosted Codeium Gains Ground: Members discovered a self-hosted version of Codeium for enterprise deployments, seeking advanced info on obtaining it and referencing Codeium pricing details. They also looked at GitHub Issue #115 for tips on retrieving API keys.
- Questions arose about straightforward setup and whether this move might increase adoption in larger teams. Some noted that Codeium remains free for individuals, while enterprise users pursue on-premise flexibility.
Windsurf Woes: Users encountered ongoing Windsurf crashes, freezes, and random 'The window is not responding' errors. One user on Ubuntu 24.04 reported success, while another on Arch with Hyprland overcame token submission issues by removing config files.
- They hoped future fixes in the Windsurf Editor Changelogs might address stability. Flickering performance dampened confidence, though some reported smooth setups on certain systems.
Cascade Carries the Day: Community members praised Cascade for dependable flow handling and minimal coding overhead. One user claimed they built their company website with minimal effort using its capabilities.
- Others cited frustration with the Cascade panel auto-opening and sought better toggles. They nudged developers for a fix on Codeium Feedback, hoping for a quick resolution.
Flow Credit Fiascos: Several participants complained about flow credits billing confusion and suspected double charges. One user mentioned hefty fees with minimal credit allocation, feeling overlooked by support.
- They urged others to document similar billing complaints at Codeium Feedback. Worries over sustaining prompt credits for collaborations also surfaced, prompting calls for more transparent usage tracking.
Agent Aspirations & Update Pain: Some asked about using agents with Windsurf, but the forum lacked clarity on official integration. This generated interest in bridging features from other platforms.
- A recent update caused sporadic commands to fail and baffling code generation in Cascade. Reports ranged from slow performance to partial breakage, prompting repeated calls for quick patches.

Cursor IDE Discord

Cursor Composer Confusion: Repeated complaints cited Cursor composer's tendency to ignore .cursorrules, driving users to alternative coding tools for reliable edits.
- A stuck generation in 0.44.9 persisting into 0.44.10 fueled annoyance over the composer’s stability.
Claude’s Crazy Quirks: Multiple comments highlighted Claude thriving with deliberate prompts encouraging it to share internal reasoning.
- Yet users remain exasperated by its erratic output quality, requiring careful monitoring and overshadowing potential productivity gains.
Cursor Rules Rigor: Community members stressed a dedicated .cursorrules file to guide model compliance in every project.
- Cursor Directory was cited as a hub for curated rule sets adapted for popular frameworks and languages.
Docs Demand & Developer Dialogue: Participants slammed the inadequate Cursor documentation, calling it confusing for advanced features and runtime metrics.
- They recommended the official forum for quicker replies from developers, but many hope for deeper written resources.

Stackblitz (Bolt.new) Discord

Color-Coded Prompting Made Easy: Enthusiasts recommended specifying color names and hex codes in prompts, highlighting minimal instructions for clarity.
- One member suggested a short 'just an idea' approach, aiming to eliminate confusion by keeping directions concise.
Public Repos with a Prefix: A member revealed a public repos feature for StackBlitz, allowing users to open GitHub URLs by prepending 'http://bolt.new'.
- They noted this setup increases accessibility, letting users quickly load code from accessible repositories.
Subreddit AI Calls for Q&A: A promotional post introduced SubReddit AI, inviting questions on prompting strategies.
- Community members discussed short prompt tactics and code snippet usage to refine model outputs.
Bolt Performance Meltdown & PWA Friction: Users reported Bolt performance hiccups, with one person burning 100k tokens from repeated code insertions.
- Others complained about PWA setup errors, though a few successfully launched their PWAs to prove it's workable.
Supabase & GitHub Rollbacks Confusion: Participants flagged issues with Supabase migrations not reverting with project code, risking irreversible changes.
- They advised frequent forks, while some faced GitHub deployment snags including empty repos during the setup.

aider (Paul Gauthier) Discord

Claude Clashes with DeepSeek: Users compared Claude and DeepSeek, noting mixed reviews on DeepSeek’s competence and occasional misfires in execution.
- Some highlighted that using a VPN or careful setup might reduce stalls, but others remain unconvinced of its reliability.
Aider’s Configuration Confusions: Members encountered TypeError issues with litellm when Aider sent a 'prompt' list instead of 'messages,' echoing guidance from the troubleshooting docs.
- They referenced CONTRIBUTING.md for clarifications and debated best practices to automate pull requests via PR #540.
Eyeing Tier 5 Keys with OpenAI: A conversation emerged about OpenAI’s model tiers, with talk of a $200 O1 Pro subscription and alternatives like Unify.ai.
- Participants weighed cost versus flexibility, sharing tips on achieving robust coverage for advanced features.
Gemini 2.0 Flash Hits the Road: While out running errands, someone tested Gemini 2.0 Flash Experimental in voice mode for quick app idea brainstorming.
- They noticed it lacked markdown output for structuring specs, but it created a concise summary afterward to streamline development steps.

Notebook LM Discord Discord

DeepResearch & NotebookLM's Bulky Blues: Community members noted no direct tie between DeepResearch and NotebookLM, referencing a YouTube video about boosting research and content efficiency.
- They mulled over possible workarounds like extension-based uploads, underscoring that NotebookLM still lacks a fully native approach to external repositories.
Quoted Summaries via NotebookLM Plus: A user guided NotebookLM to return only direct quotes from sources, observing fluctuating reliability without the Plus edition’s improved memory retention.
- They also noted difficulties replicating the command flow across usage sessions, suggesting NotebookLM Plus for more stable prompt adherence.
Mandarin Podcast Magic from English: A member inquired about generating a Mandarin podcast from English source material in NotebookLM, but found no concrete solution.
- The community floated collaboration ideas, acknowledging the need for more robust multi-language handling tools.
License Laments & Podcast Prompts: Many faced NotebookLM usage issues linked to workspace licenses and feature removal, discussing potential restarts or new notebooks for a clean slate.
- Some tried external tools like Illuminate for voice variety in podcast outputs, while others sought creative prompts to produce audio from curated sources.

LM Studio Discord

Qwen Chat’s Quick Kickoff: The brand-new Qwen Chat extends a Web UI for Qwen models, supporting model comparisons, document uploads, and a visual interface.
- A Tweet from Qwen hints at more enhancements coming soon, fueling the community’s excitement.
Snapdragon X Elite Eyes OpenCL?: One user asked about potential OpenCL support on Snapdragon X Elite, referencing updates in Llama.cpp to optimize computing overhead.
- Enthusiasts foresee better performance for LLaMA models across different hardware if the integration materializes.
AMD RX 7900XT vs Nvidia: GPU Grudge Match: Community members compared the AMD RX 7900XT with Nvidia 4090, 4080, and 3090, spotlighting memory bandwidth concerns and referencing this Reddit discussion.
- They concluded that detailed benchmarks are key before picking a GPU for demanding LLM workloads.
MacBook VRAM Tinkering for Bigger Models: MacBook users experimented with /etc/sysctl.conf to set iogpu.wired_limit_mb=54272, freeing memory for 4-bit and 6-bit MLX models.
- They reported big speed-ups once the system recognized the increased VRAM allotment.
DIGITS Delay Dramas: Members awaiting DIGITS expressed hopes it will provide a broad entry to the Nvidia ecosystem, but grumbled about delays.
- They remain optimistic that once available, full CUDA acceleration could simplify large-scale LLM experimentation.

OpenAI Discord

Graph Generation Gains Steam: A user found ChatGPT able to generate a GRAPH with code requests, revealing potential for advanced data visualization.
- Another user exclaimed yea unbelievable, spotlighting community intrigue over GPT's expanded functionalities.
Meta-Prompting in the Spotlight: Participants explored Meta-Prompting as an advanced technique, shaping AI output through layered instructions.
- One member stressed specifying the desired output from the outset, calling it the key to harnessing robust responses.
Hassabis Seeks Fresh Funding: The community showed enthusiasm for Hassabis and his upcoming investor round, applauding his prolific AI achievements.
- They offered good prayers, underscoring the group's hopes for a successful fundraising.
OpenAI Prompting Strategy Scrutinized: A participant critiqued OpenAI's approach, arguing that reworking system messages might sharpen performance.
- They also highlighted a lack of financial benefits from contributing, fueling talk on the fairness of such collaborations.

Interconnects (Nathan Lambert) Discord

rStar-Math rockets model accuracy: Microsoft's rStar-Math boosts Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing earlier attempts on MATH tasks.
- It solves about 53.3% of USA Math Olympiad problems, igniting talk about massive leaps in small LLM performance.
Qwen Chat cheers multi-model synergy: Qwen Chat unifies Qwen2.5-Plus and Qwen2-VL-Max in a single UI, enabling side-by-side comparisons and document uploads.
- Future expansions hint at web search, image generation, and voice features, signifying a bigger push into user-friendly AI interaction.
NuminaMath's data wrinkles raise eyebrows: NuminaMath aims for consistent single-box solutions, but 2.6% of entries have none and 7.7% have multiple, indicating possible data anomalies.
- Contributors question the quality of open datasets, underscoring potential pitfalls in large-scale math corpora.
MoEs overshadow dense setups: Mixture of Experts historically outperforms dense models at the same parameter usage, implying better peak performance from bigger parameter pools.
- Discussions favored MoEs for high-level tasks, though training complexity was noted as a major challenge.
AI cost talk alarms policy watchers: An estimate claiming $5M for open source AI caused confusion, as further tweets clarified real total expenses.
- Members warned that the public might overlook capex, R&D, and data curation outlays, leading to inaccurate conclusions on AI budgeting.

Eleuther Discord

SmolLM Steps Up with 320GB Dataset: The SmolLM Corpus release got postponed until "tomorrow," now promising 320GB of shardable data instead of the former 1TB uncompressed size for easier handling.
- One user called it "more usable than the previous 1TB uncompressed version," fueling anticipation for the full dataset among early adopters.
SciAgents Sparks Scientific Synergy: Community members praised the ontological approach of SciAgents for revealing interdisciplinary connections in research, referencing this arXiv paper.
- While it doesn’t match GPT-4-level breakthroughs yet, users saw big potential for higher-level learning orchestration across multiple scientific domains.
Grokking Gains Steam with Weight Decay: Participants highlighted grokking as tied to Softmax Collapse, referencing Grokking at the Edge of Numerical Stability and noting that heavy 0.1 weight decay often alleviates overfitting.
- They questioned reliance on softmax for attention, proposing alternatives like sigmoid loss, while suggesting that lower WD could help avoid low-rank pitfalls in LLM optimization.
Modal Makes GPU Training Accessible: Several users applauded Modal for allowing bigger model training via cloud GPUs, citing the generous $30 monthly free credit as a top highlight.
- One user praised it as "more cost-effective for large jobs" compared to traditional reservations, with a focus on supporting researchers at scale.

GPU MODE Discord

Alpha Competition: Swift Softmax Showdown: A new alpha competition invites speed-hungry devs to engineer the fastest softmax kernel on a staging server, with sign-ups already open.
- Early contestants tested performance boosts, echoing “Woo hoo!” in excitement over the results.
Nectar Social’s Sweet $10k Bounty: Early-stage AI startup Nectar Social offers referral fees of up to $10,000 for hires like LLM/AI Engineer and Sr/Staff Product Manager in Seattle.
- They’re funded by major investors and focus on social commerce, encouraging interested folks to reach out.
ARC Prize’s Non-Profit Pivot: The ARC Prize is transitioning into a non-profit foundation to shape research around AGI, steered by Greg Kamradt and team.
- They emphasize a more structured framework, building on insights from the ARC Prize 2024.
MicroDiT Meets MMDIT: Investigators completed MicroDiT replication, sharing model weights and an inference script for local testing.
- Now, a planned DCAE autoencoder and MMDIT upgrades promise improved prompt adherence, pending more powerful compute resources.
MI210 Occupancy: The Great ROCm Riddle: Enthusiasts tackled puzzling occupancy numbers on MI210, observing 2.5 blocks per compute unit and other unexpected figures.
- They found that adding __syncthreads() drops the max to exactly 2, underscoring the quirks in CDNA-based GPUs.

Nous Research AI Discord

DisTrO Release Fuels Collaboration: The newly open-sourced DisTrO garnered excitement from multiple users eager to integrate it into their custom setups.
- Discussions revolve around improved documentation and potential synergy with advanced optimizers.
DeepSeek V3 Triggers Output Quality Debates: A difference in output between official DeepSeek V3 and third-party providers prompted speculation about caching and model issues.
- Some suspect repetitive answers stem from caching quirks, while others consider inherent model tuning limitations.
Hermes Model Sparks Censorship Discourse: The Hermes model drew criticism for partial censorship, as many found system prompts necessary to override restrictions.
- Opinions diverge on whether advanced prompt engineering or deeper training changes can unlock a truly unfiltered model.
Function-Calling Models Prompt Benchmark Curiosity: Members compared open-source function-calling models, looking for benchmarks and strategies to enhance function-call accuracy.
- Post-training improvements and structured prompts surfaced as prime candidates for refining performance.
Qwen 7B Wows Math Fans with AIME-Level Skills: Qwen 7B tackled AIME questions at o1 level, with this tweet highlighting an MCTS-based reflection approach.
- While many praised the model’s computational finesse, others questioned whether these math feats translate into broader reasoning prowess.

Latent Space Discord

Salesforce’s Surprising Stoppage & Soaring Ambitions: Marc Benioff announced Salesforce will hire no more software engineers in 2025, citing a 30% boost from Agentforce.
- He referenced this article and predicted 'we'll be bigger in five years' despite the freeze.
OpenAI’s Overhaul Overwhelms Custom Instructions: An OpenAI update for ChatGPT’s voice system seemingly broke custom instructions while introducing new features on October 19.
- A tweet highlighted interrupted voice improvements and the pressing need for stable tests during these changes.
Anthropic’s Astonishing $2B Valuation Vault: Sources confirm Anthropic is raising $2 billion, surging to a $60 billion valuation and fueling their 2025 growth strategy.
- A note showed annual recurring revenue reaching $875 million, underscoring 'notable expansion in enterprise sales'.
Google Groups AI under DeepMind: Multiple Google AI teams will merge with Google DeepMind, driving new open model initiatives and developer tools in 2025.
- A post hinted at 'a thrilling year ahead' and signaled possible internal changes to unify AI efforts.
Moondream’s Model Makes Moves: The updated Moondream 2b vision-language model sparked discussion around script availability and refined functionalities.
- A Reddit thread mentioned 'resource sharing' and praised the model’s strong performance.

OpenRouter (Alex Atallah) Discord

Hackathon Hype & Live Agent Studio Showdown: OpenRouter announced a AI Agent Hackathon offering $10 in API credits and a $6,000 prize pool, plus new cash awards for top n8n agents.
- The Live Agent Studio portion runs January 8–22, with winners revealed on February 1 and community voting from January 26 onward.
Gemini Flash Storms the Stage: A user shared performance metrics for Gemini Flash 1.5, reaching 63,364 requests and 7,018 outputs at a cost of $0.000171 with 255.6 tps.
- Enthusiasts praised its features, though some recommended additional tweaks for a smoother experience.
OpenRouter UI Hits a Lag Spike: Members criticized OpenRouter’s sluggish UI when chat history exceeds 1k lines, making scrolling and typing cumbersome.
- They suggested improved pagination and activity filtering to maintain speed.
O1 API Quirks Confound Coders: Developers noted ===== blocks in O1 API responses, replacing backticks and causing confusion.
- Some guessed this might preserve tokens, but many found it disruptive.
Hanami Gets a Quick Shout: A few folks wondered if anyone was adopting Hanami, with one user encountering unexpected characters during tests.
- Discussion followed on its reliability, though concrete details were limited.

Perplexity AI Discord

Perplexity Unrolls CSV Downloads: Perplexity introduced an option to download tables as CSV from responses, making data extraction a breeze.
- Developers welcomed this feature, as shown in this screenshot, describing it as a crucial convenience for data tasks.
Youzu.ai Interiors Inspiration: The AI-driven Youzu.ai helps users plan room designs and identifies local purchase options, easing the shopping process.
- Community feedback praised its user-friendly approach, calling it a game-changer for stressful design tasks.
Ecosia Courts Perplexity for Green Partnership: A product manager from Ecosia reached out to Perplexity, seeking a collaborative effort and green search synergy.
- They struggled to find the right contact, so they asked the community for intros, hoping to reduce friction in connecting the two platforms.
NVIDIA's Home Supercomputer Sparks Conversation: According to this announcement, NVIDIA released a $3000 supercomputer package for personal use.
- Enthusiasts noted the potential for AI experimentation at home, praising the possibility of HPC power beyond typical workstation limits.
Toyota's Rocket Rumblings: Reports indicate that Toyota is exploring new rocketry efforts, as mentioned in this article.
- Although primarily an automotive manufacturer, Toyota's expansion into aerospace stirred speculation about tech crossovers.

Cohere Discord

Cohere's 'North' Nudges Productivity Gains: Cohere announced the early access launch of North, an all-in-one secure AI workspace that integrates LLMs, search, and agents to outdo Microsoft Copilot and Google Vertex AI Agent Builder, as shared in their blog.
- They boasted about a seamless user experience for daily tasks, and the community highlighted its potential to drive operational efficiency, referencing Cohere's official tweet.
Command R+ Powers Large Generative Runs: A user emphasized Command R+ for large generative models, referencing the official model overview for advanced workflows and performance details.
- Community interest included suggestions on how to incorporate Command R+ into daily tasks, reaffirming its role as a key feature for robust model usage.
Upgrading from embed-v2 to v3 Stirs Concerns: A user sought guidelines for migrating from embed-v2 to v3, citing worries over regenerating massive corpora.
- They noted the prospect of embed-v2's deprecation, triggering conversation on incremental upgrade strategies and potential pitfalls.
Rolling Chat Approach Extends 4k Token Limit: Users expressed frustration with 4k token constraints when generating complete chapters or reasoning using cmd-r+.
- The community proposed adopting a rolling chat history to surpass these boundaries, pointing to a smoother method for extended outputs.

tinygrad (George Hotz) Discord

Bounties Boost PR #8505: A reward is offered for retesting PR #8505 with MOCKGPU AMD on OS X, payable via PayPal or USDC in the Tinygrad community.
- George mentioned it specifically targets OS X issues, and members hope this stabilizes GPU tests.
LL-VM or Bust!: They proposed merging LLVM JIT with LLVM autogen, referencing [PR #8486] for simpler iteration while managing multiple versions in support/llvm.py.
- Concerns about function signature shifts in LLVM were eased, and tests from LLVM 14 to 19 showed no showstoppers.
Newcomers, Start Contributing Now!: Members urged new developers to join Tinygrad, emphasizing that more pull requests are welcome.
- They noted a bounty system on specific tasks, underscoring a supportive environment.
TinyGrad Blog Teaches the Code Layout: A new blog post outlines Tinygrad's key structure, with a focus on the core tinygrad/ directory.
- The author warns against editing untested code outside this area, and the community agrees with this cautious strategy.
Device Setup Means Business in TinyGrad: Developers clarified that setting Device.DEFAULT before making Tensors allows METAL, CUDA, or CLANG usage as needed.
- They added that CLANG runs on CPU by default, giving more direct control in Tinygrad.

Nomic.ai (GPT4All) Discord

Nvidia Crushes Vulkan in GPT4All Benchmarks: Members observed Nvidia GPUs outperforming llama.cpp Vulkan when running GPT4All, referencing issue #3365 for details.
- They credited the CUDA stack for superior speed, showcasing notable hardware-based gains.
phi-4 Model Makes Waves: Users tested phi-4-Q4_0 in GPT4All and confirmed it runs well on JavaScript tasks, with details at phi-4-Q4_0.gguf.
- They highlighted its MIT license, citing the Microsoft release on Hugging Face.
Local Server API Triggers Confusion: Members discovered the local server API only recognized OpenAI calls, causing errors with missing openai_api_key configs.
- They questioned the absence of local hosting support, noting current constraints in GPT4All setups.
Chat Template Setup Baffles Beginners: A new user struggled configuring the Vicuna chat template, as older models lacked specialized instructions.
- They were directed to GitHub for guidance on ensuring templates produce correct outputs.
Roleplay Models Stir Interest: For COTE anime RP, the group proposed Nous Hermes 2 for immersive content and creative depth.
- They also mentioned exploring llama3-8B-DarkIdol-2.2-Uncensored-1048K for further experimentation.

LlamaIndex Discord

GitHub Gathering & Agentic Workflows: The GitHub HQ Event set for Jan 15th promises insights into debugging AI agents with ArizeAI, fast inference with GroqInc, and agentic workflows with LlamaIndex, as outlined in this announcement tweet.
- This in-person gathering aims to merge practical demos with real-time development tips for AI-driven systems, with participants anticipating significant knowledge gains.
Agentic Document Workflows Arriving 2025: A new paradigm named Agentic Document Workflows (ADW) will integrate documents directly into business processes by 2025, according to this blog post.
- Community members described it as “a dedicated push for streamlined multi-format processing,” pointing to more robust pipeline designs for organizational efficiency.
Ollama's 3-Second Speed Streak: An updated Ollama reportedly cut evaluation time below 3 seconds, fueling interest in performance benchmarks among local LLM users.
- This development provoked chatter about real-time inference possibilities, as participants weighed the implications for broader deployment scenarios.
Vector Indexing Twists with PostgreSQL: Members explored VectorStoreIndex with PostgreSQL JSON indexing to filter nodes by metadata, highlighting partial workarounds and design challenges.
- Some advocated for official indexing support to handle large data volumes, underscoring calls for more advanced search functionalities in LlamaIndex.
Token Tussle with QueryFusionRetriever: Users combining TEI Reranker with QueryFusionRetriever encountered a 'Input validation error' due to token limits, especially with a 25 top-K setting.
- Some suggested lowering top-K or adjusting parameters, referencing TEI Rerank docs for guidance on optimal memory usage.

Modular (Mojo 🔥) Discord

Rust Refines Actor Deployments: Rust syntax for actor implementations in Mojo cuts extra noise from type boundaries, notably in GlommioMultipaxosWorker.
- Participants worried that overload resolution might escalate complexity in expanded codebases.
Quojo Quickens Quantum Coding: Community showcased the Quojo library as a quantum computing machine in Mojo, highlighted in this GitHub repository.
- They praised its rapid build-out, likening it to a Qiskit-style approach for bridging theoretical quantum principles with hands-on development.
MLIR Trims Unused Steps: A shared YouTube demo illustrated how MLIR steers hardware resource usage for quantum operations.
- Members noted it can remove identity multiplication at compile time, boosting runtime efficiency.
Qiskit Jumps Into Quantum Simulation: Some recommended Qiskit for experimenting with quantum circuits, even without immediate IBM API connections.
- They contrasted it with smaller frameworks like Quojo, agreeing the Qiskit ecosystem helps new developers ramp up quickly.

LLM Agents (Berkeley MOOC) Discord

Hackathon Hold-Up Halts Results: Organizers updated the timeline on the Hackathon website, stating final results are postponed until January due to pending feedback from judges, which has impressed many with outstanding submissions.
- They mentioned most tallies are done, but certain judges haven't submitted their final reviews yet, prompting participants to await an official announcement soon.
Google Form Fiasco & Twitter Trouble: A user struggled to edit a previous Google Form submission, leading organizers to suggest re-submitting, while others recommended using a different email if the original one is closed.
- Concerns about a deactivated Twitter account arose regarding certificate eligibility, with confirmations that inactivity won't jeopardize final certification.

OpenInterpreter Discord

Python Puzzlement in OI 1.0: Members discovered that using --tools interpreter in OI 1.0 might not fully enable direct Python code execution, since it still tries to call python.exe.
- One line in the system message suggests OI 1.0's built-in interpreter has changed, leaving some users unsure if direct code running is still feasible.
gpt-4o-mini Gains Some Ground: A few folks tested the gpt-4o-mini model, noting that it performs better with certain commands and can print partial file contents instead of the entire text.
- They also pointed out that the AI still shows some weaknesses, prompting more tweaks to refine performance.
Curiosity Over Model & Parameters: A user sought specifics on the model's capability, looking for a breakdown of parameters and any necessary modifications.
- This request spurred added interest in adjusting interaction approaches for better results.
Checking Custom Instructions: Participants shared custom instructions encouraging careful tool usage, especially around code execution in OI 1.0.
- They suggested verifying command viability before running, aiming to help the AI handle complex tasks more reliably.

LAION Discord

TruLie Ties Up Curiosity: Attendees sought info on the TruLie dataset, probing its current relevance and practical applications, but no direct link was shared.
- Some participants mentioned an interest in how it might serve potential ML pipelines, though no further details were provided.
Image-to-3D Gains Ground: Members discussed image-to-3D technologies that can run on a laptop, citing Gaussian splat and NeRF libraries alongside 3D Arena.
- They highlighted single-image pipelines for 3D reconstruction and weighed GPU performance impacts for practical workflows.
Chirpy3D Creates Avian Art: Discussion of Chirpy3D centered on continuous part latents for 3D bird generation, with ties to University of Surrey and Imperial College London.
- Some participants recognized Chirpy3D’s creative approach, blending part-based modeling with generative design for potential future expansions.
World Models Widen 3D Horizons: Members touched on World Models, which integrate physics-aware networks for realistic video creation and connect closely to 3D generation topics.
- They saw these models as complementary to image-to-3D workflows, though no direct resources or links were mentioned.
Quest for an Agent Registry: Participants sought a good open tool registry for building AI agents, emphasizing collaboration and code-sharing.
- A user asked about any standard resource, but no specific links or solutions emerged from the conversation.

DSPy Discord

Chatbot COT Gains a Boost: One participant asked about improving Chain of Thought (COT) for chatbots beyond just adding a signature, highlighting the significance of thorough evaluation methods.
- They specifically said Is there any way to improve COT other than setting a signature?, hoping to refine reasoning steps in chat interactions.
Evals Step into the Spotlight: An article by Drew Breunig championed building your own eval for LLMs, explaining it as more critical than the model or prompts, and shared his blog post.
- He declared your eval is the most valuable AI asset you own, urging teams to refine approach, track improvements, and test frequently.
Drew Breunig Highlights Tools and Career: He introduced his background at PlaceIQ, Precisely, and the Overture Maps Foundation, sharing a personal site with details about his work timeline.
- He showcased StepList for tracking routines and Reporter for self-monitoring, suggesting these solutions accelerate personal awareness.

AI21 Labs (Jamba) Discord

Jovial Jamba Jumpstarts Podcast Transcript Queries: One user built a basic Python app using Jamba's Conversational RAG to query podcast transcripts for easier recall.
- They described it as It's been a lot of fun, even though it's still a work in progress.
AI Code Generation's Quirky Stumbles: Another user noted comedic slip-ups while troubleshooting HTML, Javascript, and PHP code that had been generated by AI.
- They suggested that the current boom in AI tech is only scratching the surface of what's possible.
PHP Persists as a Reliable Web Companion: A member continues to rely on PHP for web development and local IRC bot coding, praising its easy integration.
- They said Jamba simplifies certain tasks by using conversation arrays similar to other APIs.

Torchtune Discord

ModernBERT Makes a Brief Appearance: One user in #general asked if anyone had tested finetuning ModernBERT, hoping to compare experiences and glean performance tips.
- No further responses or references emerged, and the conversation remained limited to this initial prompt.
Nectar Social’s Sweet Referral Bounties: In #jobs, Nectar Social announced multiple open roles (including Sr/Staff Product Manager and LLM/AI Engineer) with referral bounties up to $10,000 for successful hires.
- They operate in semi-stealth, recruit in Seattle and beyond, and offer flexible options for roles like a Customer Success Manager or Founding Account Executives in NYC/LA.

The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Axolotl AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Mozilla AI Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The HuggingFace Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

The Gorilla LLM (Berkeley Function Calling) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.

PART 2: Detailed by-Channel summaries and links

{% if medium == 'web' %}

Stability.ai (Stable Diffusion) ▷ #general-chat (719 messages🔥🔥🔥):

ComfyUI Features, OpenPose Control in Pony, Electrical Outages Impact on SD, Updates for AI Tools, Using AMD GPUs with Different Interfaces

Exploring OpenPose Control in Pony with ComfyUI: Users discussed the best methods for utilizing OpenPose control within the Pony model in ComfyUI and sought guidance on installation and node integration.
- One user highlighted that while experimenting with ComfyUI, they faced hurdles, leading them to consider using alternatives like Forge UI.
Concerns Regarding Power Outages While Using SD: A user raised concerns about the potential impact of a power outage during a Stable Diffusion generation and whether their GPU would be affected.
- Another user advised that while the GPU would likely be fine, a power loss could corrupt the file system, leading to data loss.
Updates and Maintenance for AI Tools: Users shared experiences regarding the challenges of keeping AI tools like A1111 and ComfyUI up to date, especially after encountering issues during updates.
- It was noted that outdated Python versions could cause compatibility issues with various AI models and suggested using Python 3.10.11 instead.
Comparing AMD GPU Support and Performance: The discussion shifted to AMD GPU support, focusing on the differences between using ZLUDA and ROCm directly on Windows.
- It was clarified that while ZLUDA provides certain benefits on Windows, native support for AMD GPUs can also be achieved without it.
User Interactions and Community Support: New users sought help in the channel regarding specific model functionalities and usage within the community, emphasizing the need for updated guides.
- The importance of engaging with developer communities for sharing knowledge and troubleshooting common problems in AI tools was highlighted.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #general (393 messages🔥🔥):

Phi-4 Bug Fixes, Unsloth Model Deployment, Chat Templates in LLMs, Adapting Models for Inference, Quantization Impacts

Unsloth Phi-4 Bug Fixes: Unsloth's Phi-4 version has surpassed the official Microsoft version on the Open LLM Leaderboard, with ongoing bug fixes being reported on the blog.
- Users are encouraged to keep an eye on the blog for updates and to check if re-running finetunes might improve performance following the recent fixes.
Compatibility Issues for Apple Devices: Currently, Unsloth does not support Apple silicon unless running Linux, limiting some users' ability to explore its features.
- Users switching to Ubuntu reported performance issues like running out of memory, particularly when using certain models like Gemma.
Understanding Chat Templates: Chat templates can affect both the fine-tuning process and the deployment of models, with specific structures suggested for use.
- The trained chat template is included in the tokenizer_config.json, and users should design templates according to their application needs.
Model Adaptation Recommendations: When dynamically attaching an adaptor to an Unsloth model, using a higher resolution (16bit) model is preferred over a quantized (4bit) version for inference.
- This approach minimizes loss and is recommended for merging to achieve better performance.
Quantization Insights: Users questioned how quantized models could outperform non-quantized ones and discovered that despite inherent noise, they still showed improvement over Microsoft’s offerings.
- Discussions included the trade-offs involved with quantization and the importance of evaluating model adaptation strategies.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #off-topic (3 messages):

Job Search Success, Funny GIFs

Infinit3e celebrates new job: A member joyfully announced that their job search is over, stating, 'job search done im now employed'.
- This elicited a response highlighting a GIF that features a man in a suit and tie making a funny face in front of a crowd.
Amogus6969 GIF shared: A member shared a funny GIF depicting a man in a suit making a humorous expression, garnering attention in the channel.
- The content description noted it as 'a man in a suit and tie is making a funny face in front of a crowd'.

Link mentioned: Amogus6969 GIF - Amogus6969 - Discover & Share GIFs: Click to view the GIF

Unsloth AI (Daniel Han) ▷ #help (48 messages🔥):

Mathstral-7B-v0.1 Model Limitations, Model Suggestions for Tabular Calculations, Training for Longer Contexts, Merging LoRA Models, Classical ML for Name Splitting

Mathstral-7B-v0.1 not Supported: The mistralai/Mathstral-7B-v0.1 is confirmed as neither a base model nor a PEFT model, leading community members to seek alternatives.
- Theyruinedelise stated that support for this model is planned for the future.
Recommendations for 7B Tabular Model: etherl recommended trying the Qwen/Qwen2.5-Math-7B-Instruct model for good performance in tabular calculations, specifically for small markdown tables.
- Member marioz_70065 plans to experiment with training for one epoch with a learning rate of 3e-5.
Training Models for Longer Contexts: shaswat_singh. inquired about training a llama 7B model for longer contexts, suggesting the instruction/input key is nearly 2k tokens and output could reach 7k tokens.
- marioz_70065 advised breaking down queries into smaller parts to manage processing.
Merging and Quantizing LoRA Models: Discussion centered around merging LoRA models trained on 4B versions and whether to use the 16B model for this process.
- fjefo clarified that using a 16-bit model for both training and merging ensures integrity during the process.
Using Classical ML for Name Splitting: Member andresmosqueraw sought advice on splitting names and identifying gender, proposing fine-tuning as a solution.
- mrdragonfox suggested using classical machine learning techniques rather than a large language model for this task.

Links mentioned:

Unsloth AI (Daniel Han) ▷ #research (4 messages):

DLSS for Language Models, Speculative Decoding

Exploring DLSS-like Techniques for Language Models: A member questioned the existence of techniques similar to DLSS for language models that could optimize training or inference processes to reduce resource usage.
- They specifically sought insights into research addressing this optimization challenge.
Introduction to Speculative Decoding: Another member introduced the concept of speculative decoding as a potential approach related to the earlier inquiry.
- This suggestion was affirmed by the initial questioner, who expressed gratitude for the information.

Codeium (Windsurf) ▷ #discussion (125 messages🔥🔥):

Codeium Self-Hosted Version, Windsurf Performance Issues, Cascade Model Benefits, Custom Model Training, Prompt Credit Usage

Codeium introduces self-hosted version: A user highlighted that a self-hosted version of Codeium is now available in the enterprise offering.
- How can I get this? others queried, wondering about the deployment details.
Windsurf facing stability problems: Several users reported persistent problems with Windsurf, including window crashes and connection issues.
- One user noted they struggled with the program frequently freezing and receiving 'The window is not responding' error.
Cascade shows advantages over Windurf: Users praised the Cascade model for its efficiency, especially in handling actions without exceeding flow limits.
- One member shared that they managed to build their company website with minimal coding using Cascade's capabilities.
Interest in custom models for specific tasks: A member inquired whether it is possible to train custom models on Codeium for specific tasks.
- This sparked a discussion on capabilities and available training options within the platform.
Prompt credit consumption concerns: Users expressed concerns about exceeding their prompt credits, especially with extensive usage on collaborative tasks.
- One user mentioned they managed to deplete their credits rapidly due to several operational sessions.

Links mentioned:

Codeium (Windsurf) ▷ #windsurf (140 messages🔥🔥):

Windsurf Installation Experiences, Cascade Panel Issues, Flow Credits and Billing Concerns, Agent Integration with Windsurf, Update Feedback

Windsurf Installation Experiences: Users shared their installation experiences on various systems, with one mentioning that Windsurf works perfectly on Ubuntu 24.04 and another on Arch with Hyprland having issues with token submissions.
- After troubleshooting, one user successfully resolved their issues by deleting specific directory files related to Windsurf.
Cascade Panel Issues: Concerns were raised about the Cascade panel auto-opening when starting a new project, with one user reporting related issues affecting their workflow.
- Others suggested that current settings might not effectively prevent the panel's reopening, indicating a need for a clearer solution.
Flow Credits and Billing Concerns: Several users expressed frustration over billing-related issues, particularly being charged twice for flow credits they did not receive, prompting queries about how to resolve these concerns with support.
- One user noted that they were charged a hefty amount yet received limited service, feeling ignored by support channels.
Agent Integration with Windsurf: A user inquired about the possibility of utilizing agents with Windsurf, referencing the interest in features like those recently introduced in other platforms.
- Responses indicated uncertainty about this integration, revealing a gap in feature familiarity among users.
Update Feedback: After a recent update, users reported various issues, including commands not executing and unexpected behavior in Cascades, like generating unnecessary source code.
- Feedback suggested that some users experienced a decrease in performance and functionality compared to previous versions.

Links mentioned:

Cursor IDE ▷ #general (246 messages🔥🔥):

Cursor composer issues, Claude performance, Cursor rules usage, Community feedback, Cursor documentation

Persistent issues with Cursor composer: Users reported significant issues with the Cursor composer frequently ignoring provided cursor rules and making unwanted changes to their codebases.
- Feedback indicated that even premium plans are experiencing reduced performance, prompting users to reconsider their reliance on the composer feature.
Claude's variable performance: Contributors noted that Claude can perform well when prompted for specific tasks, particularly when instructed to utilize inner thoughts and monologues in its responses.
- However, many users expressed frustration over its inconsistent handling of prompts and the need for careful oversight when applying changes.
Proper usage of Cursor rules: It was emphasized that users should create a .cursorrules file to set clear guidelines for the behavior of models like Claude when working on projects.
- Participants shared strategies for structuring prompts to improve Claude's adherence to rules, suggesting that more focused prompts yield better results.
Community Engagement and Support: Users discussed the community's role in providing support, highlighting that the official Discord offers a platform to share issues and get responses from Cursor developers.
- For serious inquiries, the community suggested utilizing the Cursor forum to seek direct assistance from the development team.
Cursor's Documentation and Features: There was a consensus among users that the Cursor documentation is lacking in certain areas, likening it to being hosted on a problematic platform.
- Users expressed a desire for improved documentation and visibility regarding request statistics and application features.

Links mentioned:

Stackblitz (Bolt.new) ▷ #prompting (11 messages🔥):

Prompting Techniques, Payment System Issues, Public Repos Feature, Sleep Schedule Jokes, Subreddit AI Promotional Post

Mastering Prompting with Colors: A member emphasized the importance of specifying color names and hex codes in prompts, noting to clarify where to use each color.
- Just an idea should suffice when stating your requirements - brevity over detail!
Payment System Still Under Construction: A member mentioned that the payment system isn't operational yet, suggesting a working solution is still forthcoming.
- This indicates ongoing development efforts to finalize the feature for users.
Exciting Public Repos Feature Release: Another member shared a link related to a public repos feature announced on X back in October, detailing how it works with GitHub URLs.
- This feature allows users to easily access public repositories with a simple prefix, enhancing accessibility.
Sleep Schedules Affecting Responses: Members jokingly discussed the impact of messed up sleep schedules on response times, with apologies exchanged.
- This personal touch adds a layer of camaraderie among community members.
Subreddit AI - Open for Questions: A promotional post shared a link to Subreddit AI, inviting questions about prompting techniques used in the project.
- This showcases the community's openness to assist and share knowledge about prompting!

Links mentioned:

Stackblitz (Bolt.new) ▷ #discussions (211 messages🔥🔥):

Bolt Performance Issues, PWA Development, Token Management, Integration Concerns, GitHub Deployment Issues

Users report performance issues with Bolt: Many users have expressed frustration regarding Bolt's performance, including frequent errors that lead to loss of code and excessive token use.
- One user mentioned spending over 100k tokens due to Bolt continuously inserting code sections that weren't modified.
Challenges with PWA Development: Questions arose around the compatibility of StackBlitz with Progressive Web Apps (PWAs), with some users receiving error messages about unsupported configurations.
- Despite these challenges, one user reported successfully deploying a PWA from Bolt, showing mixed experiences.
Token spending and management insights: Discussions on token management emphasized the need to sometimes insert entire files into the chat to clarify changes needed, as miscommunication leads to excessive token burns.
- Users shared tips on managing token usage effectively to avoid being stuck in repetitive loops of generating code.
Integration with Supabase and Rollbacks: Concerns were raised about Supabase migrations not rolling back with the project code, leading to confusing behavior and potential irreversible changes.
- Users recommend maintaining regular forks of projects for easier recovery, but acknowledged difficulties exist with automating migrations rollbacks.
Deployment issues on GitHub: Some users experienced difficulties with deploying their projects to GitHub, facing issues with empty repos during the process.
- The need for better integration with GitHub for managing code versioning and rollbacks was highlighted as a priority for improving user workflow.

Links mentioned:

aider (Paul Gauthier) ▷ #general (66 messages🔥🔥):

AI Editor Comparisons, Aider and O1 Performance, Discussion on AI Capabilities, Development Contributions to Aider, Usage of OpenAI Models

Comparing AI Editors: Claude vs Deepseek: Users shared mixed feelings about Deepseek, with some finding it less competent than Claude, despite one noting its sneaky behavior in handling commands.
- Deepseek sometimes distracts Aider, leading to unexpected issues in execution.
Aider Becoming an Assistant's Assistant: A user humorously noted that Aider is poised to become their assistant's assistant, showcasing the push toward automation in coding.
- There was a call for those working tirelessly to create the perfect coding environment to step forward and connect.
The Future of AI and Proactivity: A participant expressed optimism about AI's future, suggesting that it will become more proactive and ultimately increase the ratio of questions between AI and users.
- Discussion included the current constraints of AI being related to power and computation costs, with aspirations for future advancements driven by Moore's Law.
Aider's Contribution to Automated Coding: One user described a vision where making issues in Aider could result in automated pull requests, showcasing the integration of AI in software development processes.
- The conversation reflected on maintaining human oversight during the coding process while also acknowledging Aider's potential to help streamline development.
Clarifying OpenAI Model Configurations: A user inquired about the difference between models prefixed with openai/ and those without, seeking to understand the codebase better.
- The responses clarified that these models are functionally the same and can be referred to with or without the prefix, reflecting flexibility in naming conventions.

Links mentioned:

aider (Paul Gauthier) ▷ #questions-and-tips (61 messages🔥🔥):

Aider Configuration Issues, OpenAI Model Access, DeepSeek Performance, Task Management Techniques, Context Management in AI Models

Aider Configuration Issues: Users reported issues with Aider sending a 'prompt' list instead of a 'messages' list when communicating with a local litellm proxy, leading to TypeError messages.
- Configuration details shared highlighted that the litellm_provider should match the first chunk, indicating potential misconfigurations in user setups.
OpenAI Model Access Options: A user inquired about access to a Tier 5 OpenAI key and received suggestions to use a $200 O1 Pro subscription or available services like Unify.ai for accessing various models.
- The community discussed the benefits and implications of using different providers, including potential costs and the availability of features.
DeepSeek Performance Issues: Concerns were raised regarding DeepSeek's performance, with users experiencing stalls after a few requests and questioning whether it was a normal issue.
- Some members shared experiences of consistently using DeepSeek without issues, suggesting that VPNs might be used to improve access to less congested models.
Task Management Techniques: A user sought advice on managing multiple task suggestions made by the AI and was advised to create a TODO.md file for tracking tasks effectively.
- The conversation highlighted the need for efficient task management workflows and strategies to handle multiple suggestions without losing context.
Context Management in AI Models: Users discussed Aider's handling of chat history, noting that it retains history within sessions but can be cleared to avoid confusion for the model.
- The community suggested approaches to minimize repeat suggestions by encouraging the model to process context more effectively.

Links mentioned:

aider (Paul Gauthier) ▷ #links (1 messages):

Gemini 2.0 Flash Experimental, App Development Assistance, Voice Mode Interaction

Gemini 2.0 Flash shines in voice interaction: While running errands, I engaged with Gemini 2.0 Flash Experimental in voice mode on iOS, treating it like a passenger.
- It surprisingly generated criteria and discussed tasks for developing my app idea, illustrating its conversational capabilities.
Voice Mode misses markdown output: Despite effective task generation, Gemini 2.0 Flash did not create markdown files for my app specification.
- I noted this limitation; it would have been helpful for organizing information during development.
Concise task summary feature: After getting home, I prompted Gemini to summarize our conversation, resulting in a clear set of bullet point tasks.
- This feature proved valuable for structuring my workflow for the app development process.

Notebook LM Discord ▷ #use-cases (19 messages🔥):

DeepResearch Reports, Quotation Mode in NotebookLM, Podcasts from English to Mandarin, System Prompt in NotebookLM Plus, Creative Podcasting Prompts

DeepResearch Reports integration discussion: Members discussed the lack of direct integration between DeepResearch and NotebookLM, exploring alternatives like using extensions to bulk upload sources.
- One member shared a YouTube video that covered tips on improving research and content creation with NotebookLM.
Instructing NotebookLM for direct quotes: A member successfully instructed NotebookLM to only respond with direct quotes from sources using a command structure in the system prompt.
- They indicated challenges with consistency in responses unless using the Plus version for enhanced memory retention.
Mandarin podcast generation inquiry: A member inquired if anyone had figured out how to generate a Mandarin podcast from English source content through NotebookLM.
- The responses regarding this specific task were less concrete, indicating a need for community collaboration.
Clarification on system prompts in Plus: Discussion arose about the presence of system prompts in NotebookLM Plus, with members highlighting confusion due to identical appearances between Plus and free versions.
- Clarification was sought on the functionality differences related to using system prompts consistently.
Request for podcasting prompts: A user shared a request for effective prompts tailored for podcasting, inviting the community to contribute ideas.
- The request highlighted a collective interest in enhancing the creative process for podcast production.

Links mentioned:

Notebook LM Discord ▷ #general (94 messages🔥🔥):

Notebook LM Usage Issues, Podcast Generation, Workspace License Troubles, AI Tool Features, Language Support

Notebook LM Usage Issues Overflow: Users reported challenges when accessing features of Notebook LM, often linked to workspace licenses and features being removed due to low usage.
- Some members asked questions about managing sources, resolving glitches with uploads, and retrieving previous conversations.
Generating Customized Podcasts: Members explored options for generating podcasts from selected sources, emphasizing that users can specify which sources to focus on for audio outputs.
- A workaround was proposed to use other tools like Google's Illuminate to enhance voice variety in generated podcasts.
Workarounds for Feature Limitations: Hints were shared about how to manage multiple modules in a notebook, indicating there's no existing feature to link notebooks for responses from multiple sources.
- Users were encouraged to create a new notebook if they faced issues with outdated sources to simplify their tasks.
Language Support and Interaction: Discussions touched on using Notebook LM in different languages, indicating that users can prompt the tool to respond in their preferred language like Japanese.
- Participants mentioned the accuracy of translations and expressed enthusiasm about supporting multilingual interactions.
Sharing Notebook LM Content: Users expressed interest in sharing their Notebook LM-generated content with others while seeking clarifications on editing access for shared material.
- Resources were provided for those new to Notebook LM, including tutorial playlists to assist in getting started effectively.

Links mentioned:

LM Studio ▷ #general (66 messages🔥🔥):

LM Studio related issues, Model loading problems, Directory structure for models, Announcement of Qwen Chat, Insights on LLM application development

Users face issues with LM Studio model loading: Many users discussed problems loading models in LM Studio, with confusion around version compatibility and model paths.
- One user resolved their issue by ensuring they didn't open the app from the installer and ran it directly from applications.
Confusion over directory structure for models: Users expressed frustration about LM Studio's requirement for a specific directory structure to access models, complicating their ability to share models across applications.
- Alternatives for organizing models were suggested, but many preferred a unified directory approach for ease of use.
Exciting launch of Qwen Chat: An announcement was made for the launch of Qwen Chat, showcasing its features for interacting with various Qwen models.
- Features include model comparisons, document uploads, and support for visual understanding, with more enhancements planned for the future.
OpenCL backend support for Snapdragon X Elite: A user inquired about potential support for OpenCL backend on Snapdragon X Elite, noting recent updates in Llama.cpp.
- It reflects the broader interest in optimizing LLaMA models for different hardware setups.
Development trends in LLM applications: Users discussed the convergence of features across different LLM applications as they develop over time, with many sharing their experiences using multiple tools.
- One user highlighted the fun in developing custom applications that leverage these LLMs while keeping pace with competitive features.

Links mentioned:

LM Studio ▷ #hardware-discussion (33 messages🔥):

AMD RX 7900XT performance, External GPU options for MacBook Pro, Finding system bottlenecks, Memory configuration for ML models, Availability of DIGITS

AMD RX 7900XT vs Nvidia GPUs: Members discussed the performance of 7900XT in TOPS and compared it to 4090, 4080, and 3090 models, raising concerns about memory bandwidth.
- The conversation suggested looking at specific benchmarks provided in this Reddit discussion... for detailed comparisons.
External GPU support on MacBooks: A member inquired about using a sidecar type of graphics card with a MacBook Pro, specifically with M3 Pro Max and 64GB RAM.
- However, it was confirmed that this is not possible with Apple Silicon, unlike older Intel Macs that had such capabilities.
Locating system bottlenecks: A user asked about finding bottlenecks in their system, sharing details about their Ryzen 7 7800X3D and AMD RX 7900GRE setup while running Llama 3.3 70B Instruct.
- Responses indicated that performance issues could be related to RAM and that fully loading data into VRAM would significantly enhance speed.
Tuning MacBook Pro memory for ML models: To dedicate more memory to VRAM, a member suggested modifying the /etc/sysctl.conf file to set iogpu.wired_limit_mb=54272 for better model performance.
- By adjusting these settings, users can potentially run 4-bit and 6-bit MLX models more effectively within their available memory constraints.
Expectations for DIGITS platform: A member expressed anticipation for the arrival of DIGITS, seeing it as the best option for accessing the full Nvidia stack, despite delays.
- Concerns were raised about the potential speed of the compute capabilities once launched.

OpenAI ▷ #ai-discussions (60 messages🔥🔥):

TensorFlow GPU Issues, Model Safety Concerns, Best YouTube Channels for Machine Learning, Jupyter Notebook vs Python File, Environment Setup for TensorFlow

TensorFlow GPU Issues Persist: A member expressed frustrations about their Jupyter kernel not detecting their NVIDIA GeForce RTX 3060 GPU, despite installing CUDA 12.6.3, cuDNN 9.6.0, and TensorFlow 2.11.0.
- I've got 64G RAM and a 12G GPU, they stated, further seeking guidance to resolve this persistent detection issue.
Concerns Over Model Safety and Jailbreaks: There was a discussion about OpenAI potentially overreacting to jailbreaks by making their models safer, as it appears futile to prevent jailbreaks altogether.
- A participant noted there will always be some kind of jailbreak, suggesting that OpenAI might save money by not attempting to patch them extensively.
Searching for ML Learning Resources: A user inquired about the best YouTube channels to learn machine learning, sparking a brief discussion.
- The response highlighted a mixture of humor and skepticism regarding the effectiveness of traditional learning resources.
Preference for Debugging over Jupyter Notebook: One member described avoiding Jupyter Notebook for coding, preferring normal Python files and the ability to set breakpoints for debugging.
- They emphasized that this method provides access to more important information while troubleshooting code.
Installing and Configuring TensorFlow Environment: During troubleshooting, a user was guided through commands to properly set up their tf-gpu-env environment to ensure compatibility with Jupyter Notebook.
- Discussion among users highlighted possible issues in environment setup as essential for successful TensorFlow installation and usage.

OpenAI ▷ #gpt-4-discussions (7 messages):

GPT code handling, Graph generation

Frustrations with GPT Code Responses: A user expressed frustration that no matter how many times they request the full code in their prompts, GPT continues to respond with comments instead of actual code.
- It always makes stuff like "code stays the same here", and users noted that while version 4o works fine, 01 tends to incorrectly handle their requests.
Excitement over Graph Generation: A member remarked on the surprising ability of ChatGPT to generate a GRAPH.
- Another user reacted with disbelief, saying, yea unbelievable in response to the graph generation.

OpenAI ▷ #prompt-engineering (13 messages🔥):

Meta-Prompting, Investor Round for Hassabis, Prompt Engineering Concepts, OpenAI's Financial Returns

Exploring Meta-Prompting: A member inquired about experiences or interesting use cases regarding Meta-Prompting, signaling an interest in expanding their prompting skills.
- Another member tried to respond with jotted down thoughts, highlighting that the group is focused on exploring this concept.
Investor Round Hopes for Hassabis: The discussion shifted toward supporting the upcoming investor round for Hassabis, with one member expressing admiration for his contributions to AI.
- Good prayers are sought for this venture, showcasing the community's encouragement.
Understanding What Makes a Good Prompt: A member emphasized that creating an effective prompt starts with knowing exactly what output is desired from the model.
- This reflects a shared understanding within the group about the foundational aspects of prompt engineering.
Concerns Over Financial Recognition from OpenAI: One member expressed dissatisfaction about not receiving any financial compensation from OpenAI, even while engaging in AI work.
- This led to reflections on the group's closed nature despite their contributions and lack of financial returns.

OpenAI ▷ #api-discussions (13 messages🔥):

Meta-Prompting, OpenAI's Approach to Prompting, Investor Round for Hassabis, Community Engagement in AI, Creating Effective Prompts

Exploring Meta-Prompting Usage: Members discussed the concept of Meta-Prompting and its potential use cases, with one member expressing interest in expanding their understanding of effective prompting.
- Another member contributed that crafting a good prompt begins with knowing exactly what output you want from the model.
OpenAI's Approach Criticized: A member remarked on OpenAI potentially being behind in actual prompting techniques, suggesting changes in system messages could optimize performance.
- This member also highlighted frustration about the group's lack of financial benefits despite engaging with AI technologies.
Investor Round for Hassabis: A member rallied the group to send good prayers for the upcoming investor round concerning Hassabis, acknowledging their impressive work.
- They expressed a sense of camaraderie in wishing for success in securing funding.
Financial Gains from AI Work: Discussions revolved around whether members receive monetary compensation from AI-related work, with a sentiment that many do benefit financially.
- One member sarcastically commented on their lack of income from OpenAI while contributing to the community.

Interconnects (Nathan Lambert) ▷ #events (3 messages):

ICLR Attendance, Meetup Details

ICLR Attendance Buzz: Members are confirming their attendance at ICLR, with one expressing excitement with a simple 'Cya there!'
- Another member, philpax, indicated they will arrive shortly and shared details of their appearance to facilitate recognition.
Philpax Prepares for Meetup: Philpax mentioned they won't have mobile internet and will be waiting outside in a light brown coat, black jeans, along with a gym bag and backpack.
- This note aims to help others identify him at the venue.

Interconnects (Nathan Lambert) ▷ #news (19 messages🔥):

rStar-Math improvements, O1 vs GPT4o + MCTS, Qwen Chat launch, Chinese AI interview insights

rStar-Math shows major progress: Microsoft's rStar-Math improves Qwen2.5-Math-7B scores from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, outperforming existing models.
- It ranks among the top 20% of high school math students on the USA Math Olympiad with an average of 53.3% problems solved.
Debate on O1's efficiency vs GPT4o + MCTS: A discussion emerged about whether O1 offers any unique advantages over GPT4o + MCTS, with members scrutinizing performance and efficiency.
- Concerns were raised about MCTS being resource-heavy, while opinions suggested that O1 may only be a more efficient adaptation of existing strategies.
Qwen Chat enhances interaction with models: Qwen Chat was announced, allowing users to interact with various Qwen models, including Qwen2.5-Plus and Qwen2-VL-Max, in a unified web UI.
- It features document uploads, model comparisons, and promises future enhancements like web search, image generation, and voice interaction.
Chinese AI industry insights from an interview: In an interview, Li Kaifu discussed the challenges faced by Chinese AI startups, citing funding and technological limitations compared to U.S. counterparts.
- He highlighted a shift in focus for Zero One Infinity, now favoring efficient medium-sized models instead of continuing with training large models.

Links mentioned:

Interconnects (Nathan Lambert) ▷ #other-papers (21 messages🔥):

NuminaMath dataset, Lead authors' backgrounds, Psychology and business degrees, Quality of open data, High school competition

Doubts Arise Over NuminaMath Data Quality: While 89.7% of entries in NuminaMath contain one boxed solution, concerns surface as 2.6% have no solutions and 7.7% have multiple, indicating potential quality issues.
- Problems like this underscore the state of open and publicly available data and suggest deeper quality concerns.
Lead Author's Surprising Background: Notably, the lead author on the paper is a PhD student in psychology at Stanford, which raises eyebrows given the project's technical nature.
- Comments suggest this is an unusual crossover, as the author prominently involved in mathematical data analysis hails from a different academic discipline.
High School Competition Acknowledgment: I concluded that I have no chance against Chinese high schoolers was a sentiment echoed by a member who's extensively analyzed the cn_k12 subset of the NuminaMath dataset.
- Frustrations linger regarding the rigorous standards set by peers in educational competitions.
Psychology Programs Are No Joke: Discussion revealed that psychology programs are extremely competitive, with one member underscoring the difficulty of getting accepted into such programs.
- Given the lead author’s background, conversations shift to the nature of transitioning from psychology to business-related fields.
Business Skills as the Next Frontier: A member emphasized that if coding is solved, business is the next frontier, highlighting the practicality of skills beyond technical proficiency.
- This transition illustrates the evolving landscape of job skills in the tech-augmented workspace.

Link mentioned: Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought: We propose a novel framework, Meta Chain-of-Thought (Meta-CoT), which extends traditional Chain-of-Thought (CoT) by explicitly modeling the underlying reasoning required to arrive at a particular CoT....

Interconnects (Nathan Lambert) ▷ #ml-questions (11 messages🔥):

Complexity in Large Scale Models, Transformers vs MoEs

Complexity is Key for Large Scale Providers: A member noted that while getting models right entails a lot of complexity, it's undoubtedly worthwhile, especially for large scale providers.
- This sentiment echoes the challenges faced in balancing model performance and ease of hosting.
MoEs outperform Dense Models: Another member emphasized that Mixture of Experts (MoEs) generally outperform dense models when they maintain the same number of active parameters.
- They suggested that more information is stored in more weights, leading to better peak performance.
Transformers Encourage Overlapping Approaches: One member warned against reading about Transformers, mentioning their tendency to for loop over experts, which can be demotivating.
- However, another member argued that this for loop approach might serve as a better introductory concept.
Debate on Architectural Efficacy: Discussion ensued regarding whether Transformers may lead to better models architecturally, with a focus on training efficiency.
- The notion of 'better' models remains subjective, sparking further conversation on performance variances.

Interconnects (Nathan Lambert) ▷ #random (17 messages🔥):

AI Alignment Discussion, Post-training Model Shaping, Imposter Syndrome in AI Fields, Blog Publishing Challenges

AI Alignment Insights from Anthropic Salon: In a recent YouTube video, researchers discussed alignment during an Anthropic event, with Josh Batson mentioning Amanda Askell's role in shaping the base model into a purposeful agent.
- This sparked debate on whether such shaping occurs during pretraining or post-training, with one member likening the process to crafting a character from a block of clay.
Clarification on Model Training Process: Members discussed the notion that character work might be integrated earlier in the model development process rather than just being the final step.
- One member articulated, “It feels easier to teach an agent to be well-aligned than what pops out of pretraining,” suggesting a shift in perspective on training stages.
Struggles with Imposter Syndrome: A member shared experiences of feeling impostor syndrome in the tech field, acknowledging the challenges faced during transitions in their career.
- They humorously described it as a cursed superpower, stressing that while it brings stress, it also adds value to their learning journey.
Challenges in Blog Publishing: Discussion branched into difficulties related to publishing blog posts, with one member mentioning their struggle to balance content with proper citations in MLA format.
- They expressed uncertainty about which posts would appeal to readers, highlighting the trial and error nature of content creation.

Link mentioned: How difficult is AI alignment? | Anthropic Research Salon: At an Anthropic Research Salon event in San Francisco, four of our researchers—Alex Tamkin, Jan Leike, Amanda Askell and Josh Batson—discussed alignment scie...

Interconnects (Nathan Lambert) ▷ #reads (3 messages):

Efficient Deep Learning, Pop-up Interference, Navigation Issues on the Blog

Exploring Efficient Deep Learning Techniques: A shared blog post discusses various techniques for efficient deep learning, including model pruning, quantization, and the evolution of NVIDIA GPUs.
- The article outlines important sections such as existing fast linear algebra methods and inductive biases for better convergence.
Pop-up Interferes with Blog Reading: A user reported a pop-up blocking part of the first page of the blog, causing frustration for viewers.
- Another user humorously commented on the situation, remarking on the challenges of googling.

Link mentioned: Alex L. Zhang | A Meticulous Guide to Advances in Deep Learning Efficiency over the Years: A very long and thorough guide how deep learning algorithms, hardware, libraries, compilers, and more have become more efficient.

Interconnects (Nathan Lambert) ▷ #posts (14 messages🔥):

AI Cost Concerns, Open Source AI, Policy Maker Reactions

Discussions on AI Costing: There was a concern raised about open source AI costing only $5M, prompting reactions from policymakers who seemed alarmed.
- The conversation highlighted comments suggesting that the true costs involve more than just GPU hours, referencing a tweet with further discussion.
Misunderstandings in AI Economics: A member noted that the illustrative figures presented did not account for the total capex, R&D expenses, or costs related to data generation.
- This led to concerns that the average reader might overlook significant details that can affect their understanding of the topic.
General Chat Vibes: There was an overall positive atmosphere as members expressed enthusiasm and agreement on the points discussed.
- Responses included supportive remarks like 'Nice p1!' indicating engagement with the conversation.

Link mentioned: Tweet from Teortaxes▶️ (@teortaxesTex): @natolambert I agree on substance but why do you present this as some debunking? They say right there that GPU-hours*$/hr does not include their total capex, R&D expenses, or data gen.(and it's me...

Eleuther ▷ #general (33 messages🔥):

GPT-NeoX vs Nvidia NeMo, SmolLM Corpus Upload, SciAgents Research Discussion, Modal for Model Training, DL Framework Usability vs Performance

GPT-NeoX favored for performance: A discussion highlighted that both GPT-NeoX and Nvidia NeMo have megatron-based SFT and RLHF, but the preference leans towards GPT-NeoX for its more robust performance.
- Open-instruct is seen as a reliable but less performant option due to its basis on TRL and HF trainer.
SmolLM Corpus release update: A user shared that the SmolLM Corpus upload will be delayed until tomorrow, with expectations for a full dataset of 320GB once sharded.
- They noted the new structure is more usable than the previous 1TB uncompressed version.
Discussion on SciAgents research: Members discussed insights from the SciAgents research, appreciating its ontological approach to uncovering interdisciplinary relationships in scientific fields.
- One noted it doesn't reach GPT-4 level breakthroughs, but acknowledged the potential for applications in higher-level learning orchestration.
Benefits of using Modal for training: Several users praised Modal for training models larger than their local GPUs, emphasizing the $30 monthly free credit as a significant advantage.
- They mentioned it being more cost-effective for larger jobs than traditional dedicated reservations while providing generous support for researchers.
DL frameworks comparison: A comparison of various DL frameworks revealed a spectrum from usability to performance, highlighting the strengths and weaknesses of frameworks like Megatron-LM and HF trainer.
- The challenges of using GPT-NeoX for non-transformer models were discussed, stressing the tradeoff between performance and flexibility.

Links mentioned:

Eleuther ▷ #research (42 messages🔥):

Grokking phenomenon, Weight decay in LLMs, Softmax and scaling issues, Alternative loss functions, Attention mechanisms

Grokking and Softmax Collapse: The work discussed the phenomenon of grokking, where models generalize after overfitting, attributed to Softmax Collapse (SC) due to lack of regularization.
- Members suggested that mitigating SC can enable grokking without heavy interventions while raising concerns about results 'in realistic settings'.
Weight Decay's Dominance: A consensus formed around the heavy reliance on 0.1 weight decay (WD) across many modern LLMs to address optimization capacity issues tied to naïve loss functions.
- Discussion highlighted potential strategies to improve weight decay specifically for attention mechanisms, suggesting that lower WD could avoid inducing low rank.
Critique of Softmax Functions: The community raised doubts about the efficacy of softmax, indicating that all softmax functions may be flawed, with suggestions for alternatives like sigmoid loss.
- Members noted instances where non-softmax functions performed better for attention, while deliberating whether loss functions require separation of probabilities as attentions do.
Exploration of Alternative Loss Functions: There were mentions of using sigmoid loss for LLM training as an alternative, especially in context of CLIP loss, although individual results varied significantly.
- One approach suggested involving auxiliary losses such as abs(norm(logits) - 1.0) to improve training efficiency without complicating the model design.
Scaling in Neural Networks: Discussion around unit scaling emerged, with many expressing belief that it's necessary to counter the pitfalls of rescaling symmetries causing issues in model performance.
- The conversation emphasized that separation of values is crucial for attention mechanisms, whereas for language loss, flexibility in word choices should be prioritized.

Links mentioned:

Eleuther ▷ #gpt-neox-dev (6 messages):

Llama 2 pretraining issues, Memory profiling for GPU usage, Model parallelism configurations, SLURM setups and outputs

Challenges with Llama 2 Pretraining Configurations: Attempts to set up pretraining for a 7B Llama2 style model lead to hangs and OOM errors, despite using a modified 6.7B config.
- Model parallelism settings were suspected to cause issues since the 1.3B config works perfectly across two nodes.
Request for Memory Usage Profiles: A user requested memory usage profiling of 1.3B and 2.7B models under both MP = 1 and MP = 2 configurations to aid debugging.
- Monitoring could reveal excessive VRAM usage, even if it's not leading to OOM errors.
Exploring the 6.7B Model's Memory Performance: Questions arose about whether the 6.7B model experiences OOM errors when using MP = 1 but PP = 2.
- This case had not been tested yet, and the user plans to evaluate it soon.
SLURM Outputs Indicate Potential Issues: Last log entries during runs suggested dependencies on boto3 and hf_transfer for S3 checkpointing without crashing.
- The hanging issues during the Llama 2 config execution showed no progress despite initiating a wandb run.

Links mentioned:

GPU MODE ▷ #general (10 messages🔥):

NCU profile comparison, Scam prevention advice, Learning Triton/CUDA, Options for simulated distributed training, Long context benchmarking

Compare NCU Profiles for Insights: A user suggested that comparing the NCU profile of 32x32 vs 16x16 configurations could provide insights into performance differences.
- This method could reveal areas needing optimization or highlight scale-related issues.
Warning Against Scammers: A member warned others not to send money to a specific user after being misled into conversations about Bitcoin and scams.
- The community quickly acted to ban this user when evidence of fraudulent activity was presented.
Learning Triton/CUDA is Beneficial: One user inquired whether to learn Triton/CUDA when using a limited number of GPUs like 8xH100s.
- Another replied that understanding GPU operations through these technologies is invaluable for writing better performance-optimized code.
Simulating Distributed Training: A member asked about options for faking distributed training without having the necessary infrastructure.
- Suggestions were made regarding JAX and frameworks like Accelerate/Torch Lightning that simplify the distributed training process.
Seeking Long Context Benchmark Recommendations: A user is working to accelerate decoding for LLM inference and is looking for long context benchmarks with extensive output generation.
- They noted issues with existing benchmarks favoring short prompts and expressed a need for benchmarks that reflect realistic long generation tasks.

GPU MODE ▷ #triton (8 messages🔥):

WGMMA Computation Requirement, Triton Implementations of Fused MLP, Profiling Triton Operations, Proton Profiler, Torch.device for Triton Examples

WGMMA requires 4 Warp Computation: It was noted that WGMMA requires splitting computation over 4 warps with a minimum tile size of 64.
- Confusion was cleared up over the requirement for at least a size of 16 for each warp.
Seeking Triton Fused MLP Implementations: A user inquired about existing Triton implementations for the fused MLP featured in the tiny-cuda-nn GitHub repository.
- They also questioned why on-chip MLP isn't widely used, pondering if it's too small for most applications.
Profiling Triton Operations Discussion: Inquiries were raised regarding how to profile Triton operations, contrasting it with tools used for standard Torch and CUDA runtime.
- Responses suggested using proton and NCU for profiling Triton.
Proton Profiler Introduced: A YouTube video titled 'Dev Tools: Proton/Interpreter' was shared, discussing tools useful for writing Triton kernels.
- The video explained the Proton tool in detail, emphasizing its utility for debugging.
Fixing AttributeError in Triton Examples: An error was reported when running tutorial examples in Triton, particularly an AttributeError concerning get_active_torch_device.
- It was advised to use torch.device('cuda') instead, which resolved the issue.

Links mentioned:

GPU MODE ▷ #cuda (14 messages🔥):

CUDA Driver Importance, Memory Banking Lectures, Writing CUDA Kernels, Blackwell vs. Hopper in CUDA, CUDA File Upload Tips

CUDA Drivers Are Essential for CUDA Functionality: A member emphasized that a NVIDIA driver is necessary for CUDA to function, noting that without it, CUDA cannot operate on systems lacking an NVIDIA GPU.
- Another user confirmed that their system returned an error due to not having a GPU, highlighting the need for the right driver setup.
Curiosity about Memory Banking Discussions: A member inquired if any lectures had discussed memory banking, showing interest in learning more about this topic.
- No direct answers about existing lectures were provided, indicating a potential gap in shared knowledge.
Support for CUDA Kernel Writing: A beginner expressed the desire for help writing a simple CUDA kernel to compute the max and mean of a 2D matrix.
- Another user suggested they could receive assistance and pointed out the benefits of uploading .cpp files for easy viewing.
Will Blackwell Enhance CUDA Programming Model?: A member asked whether Blackwell would introduce significant enhancements to the CUDA programming model, similar to Hopper.
- They questioned if optimized Blackwell kernels would align with the enhancements seen in Hopper, including producer-consumer models and async tensor core instructions.
Tips on Sharing CUDA Files for Help: A tip was shared about uploading CUDA files with a .cpp extension for better visibility in the Discord channel.
- Additionally, members were reminded of a specific channel available for questions catering to beginners.

GPU MODE ▷ #jobs (2 messages):

Nectar Social job openings, European consultancy in GPU and HPC

Nectar Social offering lucrative referral bounties: An early-stage AI startup, Nectar Social, is hiring for several roles including Sr/Staff Product Manager and LLM/AI Engineer in Seattle, with referral bounties up to $10,000.
- Contact for details as the company is backed by major funds and is growing quickly, focusing on social commerce.
European consultancy seeks hardware-oriented developers: A European consultancy based in Amsterdam and Budapest is hiring developers focused on GPU and HPC software, particularly in CUDA, HIP, OpenCL, and C++.
- They work closely with clients such as AMD, contributing to projects like rocPRIM and hipCUB, and job details can be found here.

GPU MODE ▷ #beginner (3 messages):

CUDA installation on Ubuntu, Starting AI projects on MacBook without NVIDIA GPU, Using cloud providers for CUDA

Installing CUDA on Ubuntu Made Easy: For those looking to install CUDA on Ubuntu, the NVIDIA CUDA Installation Guide for Linux offers comprehensive instructions.
- It provides crucial details on CUDA's role as a parallel computing platform designed to optimize GPU performance.
MacBook Users Need to Know About CUDA Limitations: A user expressed concerns about starting projects on a MacBook without an NVIDIA GPU, highlighting challenges in CUDA-related tasks.
- In response, another member recommended using cloud providers or platforms like Google Colab or Lightning AI to access CUDA capabilities.

Link mentioned: CUDA Installation Guide for Linux: no description found

GPU MODE ▷ #off-topic (1 messages):

kashimoo: my gf says i sleep talk about CUDA 😭

GPU MODE ▷ #rocm (24 messages🔥):

MI210 Compute Unit Performance, Kernel Launch Optimization, Occupancy Differences in GPUs, Workgroup Size Calculations, RX7900XTX Performance Insights

MI210 occupancy values raise questions: Discussion highlighted that the maximum workgroups per compute unit (CU) for MI210 appear non-round, with a focus on the discrepancy in expected values such as 2.5 maximum blocks per CU related to its architecture.
- One member noted that if __syncthreads() is added at the end of the kernel, the maximum limit becomes exactly 2.
Kernel launch benefits seen in occupancy: It was noted that occupancy for CDNA1 is theoretically 10 but practically achieves around 8 during single kernel launches, whereas simultaneous launches can utilize the full potential.
- The comparison with MI100 suggests that maximum active warps can vary by GPU model based on architectural optimizations.
Confusion in computational results: Members debated the correct calculations for maximum blocks and occupancy per SIMD, revealing discrepancies in expectations versus recorded values based on workgroup sizes.
- A breakdown concluded that occupancy metrics for differing architectures like CDNA2/3 and RDNA2/3 vary, hence highlighting the intricacies of these models.
Exploring RX7900XTX performance metrics: Contributors discussed performance metrics for the RX7900XTX, noting that it supports 16 maximum warps per SIMD compared to MI210's occupancy inspection.
- The addition of background calculations highlighted some unexpected results, especially when comparing across GPU models.
Inquiry on RX5000 GPU performance: A member expressed interest in learning about the performance metrics of RX5000 GPUs, seeking contributions on their performance results.
- This inquiry reflects an ongoing quest for comparative insights across various GPU architectures.

Link mentioned: Optimizing GPU occupancy and resource usage with large thread groups: Sebastian Aaltonen, co-founder of Second Order Ltd, talks about how to optimize GPU occupancy and resource usage of compute shaders that use large thread groups.

GPU MODE ▷ #self-promotion (1 messages):

MicroDiT replication, DCAE autoencoder, MMDIT prompt adherence, Compute grants

MicroDiT Replication Achieved!: The replication of the MicroDiT paper is complete, with the model weights available for download here and an inference script available on GitHub.
- I think I might be cooking—acknowledged the support for compute from a community member in pursuit of advancing this project.
Exploration into Architecture Improvements: There are plans to enhance the architecture by incorporating a DCAE as the autoencoder and using MMDIT for improved prompt adherence.
- The motivation behind these changes is to bolster the overall efficacy of the model in its tasks.
Hunting for Compute Grants: A member is actively seeking compute grants to expedite experiments, noting that their personal GPU lacks the necessary power.
- This search for funding emphasizes the challenges faced by researchers working on computationally intensive projects.

Link mentioned: Tweet from sway (@SwayStar123): MicroDiT replication is complete.Download weights here: https://huggingface.co/SwayStar123/MicroDiT/blob/main/no_cfg/microdit_model_epoch_19.ptInference script here: https://github.com/SwayStar123/mic...

GPU MODE ▷ #🍿 (2 messages):

Alpha competition, Softmax kernel performance

Alpha Competition Kicks Off for Softmax Kernel: A member announced the launch of the first running alpha competition on their staging server, inviting participants to compete for the fastest softmax kernel.
- Shoot me a dm and I'll send you an invite!
Excitement Builds for Competition: Participants expressed excitement about the new competition format, highlighting the opportunity to optimize performance.
- Woo hoo! was the shared sentiment echoing enthusiasm in the group.

GPU MODE ▷ #thunderkittens (3 messages):

ThunderKittens GitHub repo, Collaboration on kernel development, CPP performance metrics

Explore ThunderKittens for Performance Testing: Refer to the ThunderKittens GitHub repository to reproduce tests and explore tile primitives for better performance metrics.
- The harness can output TFLOPS based on chosen dimensions like sequence length and batch size.
Call for Collaboration on Kernel Projects: A member expressed interest in exploring various kernels such as MoE and Deep Seek Attention, inviting others to join in collaboration.
- They emphasized a desire to expand contributions to the repository and encouraged interested parties to reach out.
Clarification on Issue Resolution: A member inquired about the resolution of a previously mentioned issue, indicating ongoing concerns.
- This reflects an active engagement within the group to ensure all matters are addressed promptly.

Link mentioned: ThunderKittens/tests/python at main · HazyResearch/ThunderKittens: Tile primitives for speedy kernels. Contribute to HazyResearch/ThunderKittens development by creating an account on GitHub.

GPU MODE ▷ #arc-agi-2 (7 messages):

ARC Prize Non-Profit Transition, Rejection Sampling Experiment, Text-Domain Exploration, Meta CoT Paper Insights, Positional Encoding Impact

ARC Prize evolves into a Non-Profit: The ARC Prize is transitioning into a full-fledged non-profit foundation to enhance research towards AGI, led by itself as President.
- The initiative aims to guide research progress with Greg Kamradt at the helm, leveraging his expertise from the ARC Prize 2024.
Preparing a Rejection Sampling Experiment: A member is setting up a simple rejection sampling baseline experiment, expected to run tonight.
- This effort highlights a hands-on approach to evaluating sampling methods in their work.
Exploring Text-Domain for ARC: First explorations focus on text-domain as a more feasible approach due to resource limitations with vision-encoders.
- A member expresses willingness to collaborate on extending experiments into vision inputs later.
Meta CoT Paper Discusses Classic Limitations: The Meta CoT paper presents critical insights on why classic Chain of Thought (CoT) approaches often fall short.
- Authors highlight inadequacies in CoT and propose potential improvements to enhance reasoning capabilities.
Custom Positional Encoding Enhancements: A member shared that using custom embeddings for positional encodings significantly boosted their model's performance over traditional methods.
- There was a discussion about utilizing simpler models and tools such as TGI and Axolotl for baseline setups in experiments.

Links mentioned:

Nous Research AI ▷ #general (47 messages🔥):

Contributing GPU to Training, DisTrO Open Sourcing, DeepSeek V3 Differences, Hermes Model Censorship, Cursor vs IDEs

Contributing GPU to Training: A new member is learning about a project and inquired about contributing their GPU for training, but was informed to stay tuned for future updates.
- Currently, contributions are not open, suggesting potential changes in the future.
DisTrO is Open Sourced: The discussion revealed that the DisTrO optimizer has been open-sourced, and its code is available in a shared repository.
- Members noted that many have already implemented it in their projects, raising interest in collaboration and documentation.
Differences in DeepSeek V3 Outputs: A member questioned the varying output quality between the official DeepSeek V3 API and other providers, noting repetitiveness in responses.
- Community members suggested it could be due to aggressive caching by the official API, but had mixed opinions on third-party quality.
Understanding Hermes Model Censorship: A member raised concerns about the Hermes model's censorship, finding it not fully uncensored and dependent on system prompt instructions.
- Discussions highlighted that the model could behave as desired if appropriate system prompts are used.
Evaluating Cursor vs IDEs: A participant questioned if the Cursor tool justifies a switch from traditional IDEs like WebStorm or PyCharm, expressing skepticism based on productivity.
- Community members generally agreed that if a user is comfortable with their current tools, sticking with them is advisable, as productivity can remain similar across AI auto-complete tools.

Nous Research AI ▷ #ask-about-llms (2 messages):

Reducing memory usage in models, Open source function calling models, Function calling accuracy benchmarks

Tips to decrease memory usage without chunking: One member inquired about strategies to reduce memory usage with the Qwen2.5-32B-Instruct-AWQ model on an RTX 4090 while managing a ~6K token input length, reporting OOM errors.
- Enabling flash attention did not significantly reduce VRAM usage, leading to a search for more effective methods.
Inquiry on the best open source function calling models: Another member sought recommendations for the best open source function calling models and asked if any benchmarks are available tracking their function calling accuracy.
- They specifically wondered how models improve at function calling in the post training pipeline.

Nous Research AI ▷ #research-papers (3 messages):

Research ideas and papers, Carson Poole's projects

Inquiry about Progress in Research: A user asked about any progress made, which prompted a response directing them to Carson Poole's personal site for information.
- Carson mentioned that about half of the ideas on his site have been turned into papers and encouraged users to check it out.
Carson Poole's Research Website: Carson Poole shared links to his personal site poole.ai and other projects like Forefront.ai and Simple AI Software.
- He also invited users to email him for more information about research ideas, listing several papers that could serve as inspiration on his site.
Notable Research Ideas to Explore: Carson provided a list of research ideas, including works like ReLoRA, Sparse Upcycling, and GQA, all linked to their original sources.
- Each of these papers was noted to have been discussed earlier on Discord, with specific dates and links provided for additional context.

Link mentioned: Carson Poole's Personal Site: no description found

Nous Research AI ▷ #interesting-links (11 messages🔥):

Qwen 7B performance, Self-reflection in models, Math reasoning capabilities, LLMs usefulness in math, Reliability of LLMs

Microsoft's Qwen 7B achieves impressive math feats: Microsoft showcased Qwen 7B solving AIME at the level of o1, demonstrating enhanced math capabilities through their MCTS driver process that allows for self-reflection, much like reasoning models.
- This development has garnered significant interest, with projections of further discussions on related findings in upcoming podcasts.
Debate on math capabilities vs reasoning: A member argued that math capabilities do not necessarily reflect reasoning capacity, noting that LLMs have shown limited connections between the two.
- The conversation revealed a belief that while math skills are important, the application of LLMs in reasoning tasks remains questionable.
Doubts over LLMs’ reliability in math: Concerns were raised about the reliability of LLMs for solving math problems, with a member suggesting they don't inspire enough trust for more complex estimates.
- Though some acknowledged LLMs being useful for certain tasks like estimates, they remain wary of their applications in more demanding scenarios.

Link mentioned: Tweet from Alex Volkov (Thursd/AI) (@altryne): Ugh guys... Microsoft just made Qwen 7B solve AIME at the level of o1 😵‍💫 They also showed that with their MCTS driver process, there was self-reflection capability like with reasoning models. Will ...

Nous Research AI ▷ #research-papers (3 messages):

Progress on Ideas, Research Ideas List, Carson Poole's Contributions

Inquiry on Progress: A user asked about any updates regarding progress in research or projects, prompting a reply from Carson Poole.
- Carson mentioned that insights can be found on his personal site, hinting that many ideas have been turned into papers.
Carson Poole's Research Ideas: Carson shared a list of research ideas that can be referenced or 'stolen' from, with several ideas linked to relevant academic papers such as ReLoRA.
- He highlighted that some ideas were first seen in discussions dating back to November 2022, exemplifying ongoing collaborative research.

Link mentioned: Carson Poole's Personal Site: no description found

Latent Space ▷ #ai-general-chat (47 messages🔥):

Salesforce hiring freeze, OpenAI custom instructions update, Anthropic funding round and valuation, Google AI product merger, Moondream model release

Salesforce freezes hiring for software engineers: Marc Benioff revealed that Salesforce will not be hiring any more software engineers in 2025, citing a 30% productivity boost from AI via their product, Agentforce.
- During his podcast interview, he expressed optimism for the company's growth despite the hiring freeze, stating they will likely be 'larger' in five years.
OpenAI updates breaking custom instructions: An update from OpenAI is reportedly breaking custom instructions while integrating new features into the ChatGPT voice system.
- A community member noted they were recording a video on improving voice quality just as the update was being implemented.
Anthropic secures $2 billion in new funding: Anthropic is raising $2 billion, bringing their valuation to an astonishing $60 billion, tripling from last year's figure.
- The annual recurring revenue reportedly reached around $875 million, primarily from business sales, indicating significant growth.
Google AI products merging under DeepMind: A member shared excitement over the merging of several AI studios into Google DeepMind, looking forward to advancements in open models and tools for developers in 2025.
- There was speculation about the potential reorganization and structure changes within Google, with conversation about existing redundancies in their model offerings.
Moondream update and script availability: An update was shared regarding Moondream 2b, a vision-language model, with individuals inquiring about scripts available for running the new model.
- The community continued discussing details about the model's capabilities and the overall update process.

Links mentioned:

OpenRouter (Alex Atallah) ▷ #announcements (1 messages):

AI Agent Hackathon, OpenRouter API credits, Live Agent Studio competition, Prize pool increase, Registration details

Hackathon offers OpenRouter credits and cash prizes: Participants in the ottomator.ai's AI Agent Hackathon can claim $10 in OpenRouter API credits, with a total prize pool of $6,000 for top performers.
- Registration is open now until January 22nd, with winners announced on February 1st.
Live Agent Studio competition details: The Live Agent Studio Hackathon runs from January 8th to January 22nd at 7:00 PM CST, culminating in community voting starting January 26th.
- Winners will be livestreamed on February 1st, and participants are encouraged to build agents compatible with the studio using their tool of choice.
Increased prize pool for n8n agents: The n8n team has added to the prize pool, now totaling $6,000, with specific awards of $700 and $300 for the best n8n agents.
- Judging for these two awards will be conducted by the n8n team, adding extra incentive for participants to engage with their platform.
Important Hackathon guidelines: Participants are reminded to read the agreement and comprehensive guide to building AI agents for the Hackathon carefully.
- These resources will provide essential information and instructions prior to participation, ensuring everyone is well-prepared.

Link mentioned: oTTomator: no description found

OpenRouter (Alex Atallah) ▷ #general (46 messages🔥):

OpenRouter Performance Issues, O1 API Response Format, Gemini Flash Performance, Hanami Usage, Crypto Payments

OpenRouter's UI performance criticized: Members discussed how OpenRouter infrastructure is great, but its UI performance is lacking, especially when handling long chat histories of over 1k lines.
- Users noted that scrolling and typing becomes nearly impossible when the chat history exceeds this limit and requested improvements in activity filtering and pagination.
Strange formatting in O1 API: Multiple users reported that the O1 API responses use ===== instead of backticks for formatting, causing confusion and dissatisfaction with its behavior.
- One user speculated this might save tokens, while others questioned the rationale behind the change.
Gemini Flash capabilities: A member shared performance metrics for Gemini Flash 1.5, noting 63,364 in requests and 7,018 out, leading to a cost of $0.000171 with impressive 255.6 tps.
- They expressed enthusiasm for Gemini, despite some performance suggestions for better user experience.
Hanami framework inquiry: A user inquired if anyone was utilizing the Hanami framework, while another noted experiencing unexpected characters during testing.
- This sparked a brief discussion about its reliability and usability among members.
Crypto payments as liberation: One user humorously remarked about transcending government constraints after making a payment with crypto, sparking congratulatory responses from others.
- This light-hearted exchange highlighted the community's engagement with emerging payment methods.

Perplexity AI ▷ #announcements (1 messages):

CSV Downloads, Table Responses

CSV Download Feature Launched: Users can now download tables as CSV files directly from responses by selecting the download option, a feature that enhances usability.
- This addition was illustrated through an attached image
Enhanced User Interaction Through Table Features: The new CSV download option provides a seamless way for users to extract data, enhancing interaction with table responses.
- By simplifying data retrieval, this feature aims to improve overall user experience.

Perplexity AI ▷ #general (33 messages🔥):

Youzu.ai design tool, Perplexity user issues, Collaboration project proposal, Ecosia partnership inquiry, Perplexity optimization tips

Youzu.ai Offers Interior Design Assistance: Youzu.ai is an AI-powered tool that helps users create beautiful room designs by suggesting where to buy items based on their location, significantly reducing stress during the process.
- A member praised its effectiveness, stating it transformed the ordeal of shopping into an enjoyable experience.
Perplexity Users Facing Technical Glitches: Several users reported issues with Perplexity, including text input delays, a persistent upload request window, and the inability to reload modified files within Spaces.
- One user even inquired about the maximum context size for inputs and outputs within the platform.
Call for Collaboration on Meaningful Projects: A member expressed a desire to leverage the diverse talents in the group to create impactful projects, inviting contributions regardless of time commitment.
- They emphasized the importance of collaboration, aiming to build something collectively memorable.
Ecosia Seeks Partnership with Perplexity: A product manager from Ecosia reached out to connect for a potential partnership, noting difficulties in finding appropriate contact channels.
- They were hopeful for assistance from the community to facilitate this collaboration.
Discussion on Optimizing Perplexity Usage: Users discussed strategies to optimize Perplexity for professional research, including model selection and best practices for system prompts.
- One user sought advice from experienced members on creating a powerful, pre-configured system prompt for their profile.

Links mentioned:

Perplexity AI ▷ #sharing (6 messages):

Toyota exploring rockets, Upcoming video game releases, IndyCar driver averages, Average lifespan of Spaniards, NVIDIA supercomputer for home use

Toyota's Rocket Ambition: Toyota is reportedly exploring innovations in rocketry, as outlined in this article. The automotive giant is venturing into new aerospace territories.
Sneak Peek at Upcoming Video Games: A discussion has emerged around upcoming video game releases. Gamers are eagerly anticipating what's next on the horizon.
IndyCar Driver Performance Insights: An analysis of IndyCar driver averages sheds light on performance metrics across the season. Fans are diving deep into numbers to gauge driver success.
Lifespan Statistics in Spain: The average lifespan of a Spaniard is discussed in this analysis. Health trends reveal significant insights into longevity.
NVIDIA's Home Supercomputer Offer: NVIDIA's new supercomputer, priced at $3000, is now available for home use, according to this announcement. This innovation is set to redefine computing at home.
- The fusion of AI technology with personal computing is becoming more accessible.

Perplexity AI ▷ #pplx-api (3 messages):

Korean language API usage, Model alternatives to Llama-3.1, Discord discussion links

Request for Korean Language API: @razer8967 expressed a desire for an API that provides responses solely in Korean.
- I want API answer only Korean lang detailed the user's wish for language specificity.
Discussion on Alternative Models: @razer8967 clarified that they are looking for models other than Llama-3.1-sonar-small, large, huge for their needs.
- This implies an ongoing search for suitable models compatible with their language preference.
Link to Discord Conversation: A member shared a Discord link related to the topic, though it lacked additional context.
- Despite the link, no specific insights were drawn from the content shared.

Cohere ▷ #discussions (2 messages):

North AI Workspace, Cohere Launch Events, Productivity Tools

Cohere Launches Its North AI Workspace: Cohere announced the early access launch of North, an all-in-one secure AI workspace platform that integrates LLMs, search, and agents into an intuitive interface to enhance productivity.
- This platform is designed to outperform Microsoft Copilot and Google Vertex AI Agent Builder, promising a seamless experience for users aiming for operational efficiency.
North Promises Peak Productivity: North combines multiple functionalities to help users achieve peak productivity by making AI integration effortless in their daily work routines.
- The launch excited the community, highlighting its potential impact on enhancing workplace efficiency.

Links mentioned:

Cohere ▷ #questions (7 messages):

Command R+ for Generative Models, Guidelines for Upgrading Embeddings, Classification Model Error Handling, Alignment Evals Hackathon

Command R+ becomes essential for Large Models: For large generative models, Command R+ is recommended, with detailed model information available in the model overview. A user inquired about potential workflows to utilize this feature effectively.
Upgrading from embed-v2 to v3 Guidelines: A user sought advice on the best practices for upgrading from embed-v2 to v3 embeddings due to concerns over the massive task of regenerating embeddings. They indicated the possibility of future deprecation of embed-v2.
Error Handling in Large Classification Models: A user encountered an error while training a classification model due to a 2,500 example limit in requests, since their dataset had 95,429 labeled examples. They sought guidance on how to effectively handle large datasets for fine-tuning.
Upcoming Alignment Evals Hackathon: A user announced they are hosting an Alignment Evals Hackathon on the 25th, which includes the release of evals and interp tutorials. Other members encouraged sharing the hack details with the community for wider participation.

Cohere ▷ #api-discussions (26 messages🔥):

Cohere LLM API recursive output issue, Generating long reports with Cohere, Handling token limits in model outputs, API rate limit errors, Setting auto mode for generating context

Cohere LLM API gets stuck in recursive output: A user reported experiencing a recursive loop issue with the Cohere LLM API while using the Python ClientV2, causing excessive token usage.
- There was a suggestion to implement a max_tokens parameter to cap the number of events in the response stream.
Challenges with generating long reports: A user inquired about expanding the output length of cmd-r+, citing that 4k tokens were inadequate for generating complete chapters or including reasoning.
- The suggestion was made to use a rolling chat history to extend output generations, to circumvent the 4k limit.
Issues with API rate limits: A user encountered a TooManyRequestsError while querying the API, indicating they exceeded allowed limits.
- The community member suggested reaching out to support for assistance concerning API issues.
Clarifications on setting auto mode: There was confusion regarding how to implement auto mode for managing chat history on the API, with users seeking clearer guidance.
- One member explained the process of duplicating chat history but sought further assistance on whether an auto mode was available.
Max message size error in API: A user encountered a message size error when attempting a GET request on the datasets endpoint with a size exceeding 4MB.
- This highlights a potential limitation in the API regarding maximum message sizes received from requests.

Cohere ▷ #projects (2 messages):

Discord Channel Rules

Reminder about Posting Rules: A user was reminded to read the channel rules and to only post messages in one channel.
- The reminder emphasized the importance of following guidelines to maintain order in the Discord community.
User Acknowledges Reminder: Another user apologized and stated they would address the reminder later.
- This response indicates cooperation while acknowledging the request from the community.

tinygrad (George Hotz) ▷ #general (18 messages🔥):

Bounty for PR #8505, LLVM JIT and Autogen Discussion, Stability of LLVM API, Contributions to Tinygrad

Bounty offered for PR #8505: A member was informed that the bounty is available for retesting PR #8505, with payment options via PayPal or USDC.
- George mentioned that the bounty is specifically for the MOCKGPU AMD test on OSX.
LLVM JIT and Autogen merged plans: A member shared that PR #8486 has been ready for review, intending to combine LLVM JIT with LLVM autogen for ease of development.
- They noted that comments discussing the approach for handling multiple versions can be reviewed in support/llvm.py.
Concerns about LLVM API stability: A member expressed uncertainties regarding silent changes in function signatures within the LLVM API, without finding prior examples.
- George reassured that such silent changes are unlikely, and the member confirmed that things currently work across LLVM versions 14-19 without issues.
New contributions to Tinygrad encouraged: A user reached out about the excitement regarding contributing to Tinygrad, asking for guidance on how to get involved.
- Interest in contributing indicates that the community is welcoming and eager for collaborative efforts.

Links mentioned:

tinygrad (George Hotz) ▷ #learn-tinygrad (4 messages):

TinyGrad Blog Overview, Initializing Layers with Device Specification, Device Options in TinyGrad

TinyGrad Blog Post Shared: A member shared their blog post which provides an overview of exploring TinyGrad's codebase, reflecting learnings as they delve deeper into the project.
- They emphasized that this is a high-level overview and caution against modifying untested code outside the core tinygrad/ directory.
Device Specification for Layer Initialization: A question was raised about specifying the device for weights and biases during the initialization of a layer like nn.Linear.
- A member suggested using Device.DEFAULT to set the desired device before creating any Tensors, with options including METAL, CUDA, and CLANG.
Device Options Clarified: When asked about device options, a response listed several specifications for initializing Tensors in TinyGrad, highlighting that CLANG will utilize the CPU.
- This array of options allows users to tailor their hardware preferences when working within the TinyGrad framework.

Link mentioned: TinyGrad Codebase Explained-ish: A detailed-ish explanation of TinyGrad’s repository structure and key files

Nomic.ai (GPT4All) ▷ #general (22 messages🔥):

Nvidia performance with GPT4All, Using the phi-4 model, Local server API issues, Template setup for models, Recommendations for roleplay models

Nvidia performance boosts with GPT4All: It was noted that there is a significant performance difference between llama.cpp Vulkan and what GPT4All uses, especially on Nvidia with CUDA advantages.
- This highlights the varying performance levels depending on hardware configurations and model implementations.
phi-4 model achieves success: A member shared their successful javascript test run with the phi-4-Q4_0 model within GPT4All, noting it functions well under its template.
- The model is officially licensed under MIT, according to the Microsoft release at Hugging Face.
Local server API raises questions: There was a discussion regarding the local server API being compatible only with OpenAI, raising concerns about missing openai_api_key errors when using a local language model.
- Members questioned why local hosting doesn't seem supported, clarifying the base limitations of current configurations.
Template setup confusions for models: A beginner expressed difficulties in setting up the Chat Template for a Vicuna model on GPT4All, getting standard responses regardless of input.
- Advice was given to check GitHub resources for template configurations, highlighting how older models may lack specific chat templates.
Recommendations for roleplay models: For roleplay in the COTE anime, suggestions included using the Nous Hermes 2 model, which is noted for its compatibility with RP content.
- Community members also encouraged exploring resources on platforms like Reddit for further recommendations tailored to roleplay scenarios.

Links mentioned:

LlamaIndex ▷ #blog (2 messages):

GitHub HQ Event, Agentic Document Workflows, AI Agents Debugging, Fast Inference Systems, LlamaIndex Workflows

GitHub HQ Event on January 15th: Join expert talks at GitHub HQ on Jan 15th that will cover debugging AI agents with @arizeai, creating fast inference agent systems with @GroqInc, and building agentic workflows with LlamaIndex. For more details, see the announcement tweet.
- This event promises hands-on insights and multiple learning opportunities for developers and tech enthusiasts alike.
Transition to Agentic Document Workflows in 2025: A new paradigm called Agentic Document Workflows (ADW) is set to streamline document processing by integrating directly into business processes in 2025. Discover the core principles behind ADW in the blog post here.
- ADW emphasizes handling documents that come in multiple formats and aims to enhance operational efficiency.

LlamaIndex ▷ #general (18 messages🔥):

Ollama performance, Access control for applications, Vector database indexing, Local TEI server for reranking, QueryFusionRetriever token limit

Ollama update boosts performance: A user reported that after updating Ollama, the evaluation time was reduced to less than 3 seconds.
- Improvements in processing speed have been noted, raising interest in further performance reviews.
Controlling app access with email restrictions: A question was raised about deploying an app to ensure it's only accessible to users with specific email addresses, proposing Cloud Run + Google IAP as a solution.
- This suggests a straightforward solution for non-technical users in managing app accessibility.
Challenges in VectorStoreIndex integration: A user explored the feasibility of filtering nodes based on metadata keys within a VectorStoreIndex while using a PostgreSQL database with JSON indexing.
- Discussions indicate the need for manual indexing or further support for automatic indexing within LlamaIndex.
Local TEI server's reranking capabilities: A user shared an API reference about utilizing a local TEI server for reranking but expressed uncertainty about implementation.
- The community is encouraged to update existing discussions to reflect capabilities, specifically regarding issues and pull requests.
QueryFusionRetriever encounters token limit: A user reported issues when combining TEI Reranker with QueryFusionRetriever, encountering an 'Input validation error' for exceeding token limits.
- They provided a code snippet showing the configuration for setting up the retrievers, highlighting the 25 top K parameter.

Links mentioned:

Modular (Mojo 🔥) ▷ #mojo (18 messages🔥):

Rust syntax for Actor models, Overload resolution in Mojo, Quantum computing libraries in Mojo, MAX and quantum computing, Quojo library for quantum operations

Rust's Syntax Streamlines Actor Implementations: A member appreciates Rust's syntax for making multiline reusable implementations of actors like GlommioMultipaxosWorker simpler, reducing noise from verbose type boundaries.
- They expressed concern that overload resolution order might become problematic in larger codebases due to frequent shuffling.
Interest in Quantum Computing Libraries for Mojo: A member asked about the existence of developing quantum computing libraries in Mojo, looking for a 'Qiskit-like' implementation for practical experience.
- Another member recommended utilizing MAX, explaining its capability to support various hardware configurations, alongside its future potential for quantum programming.
Explaining MLIR in a Nutshell: A member shared a YouTube video link explaining MLIR concepts to clarify its role in optimizing quantum operations.
- They illustrated how MAX analyzes hardware use during runtime for computational optimizations like eliminating identity multiplication.
Introduction to the Quojo Library: A user pointed to the Quojo library on GitHub, describing it as a quantum computing machine written in Mojo.
- The community reacted positively, recognizing the speed of emerging young developers in the quantum computing space.
Starting with Qiskit for Quantum Simulation: In response to queries, a member noted that Qiskit should be a starting point, clarifying that libraries can simulate quantum operations without the need for direct IBM API access initially.
- Members discussed the potential of libraries like Quojo while acknowledging the learning curve involved.

Link mentioned: GitHub - Deftioon/Quojo: A Quantum Computing Machine written in Mojo: A Quantum Computing Machine written in Mojo. Contribute to Deftioon/Quojo development by creating an account on GitHub.

LLM Agents (Berkeley MOOC) ▷ #hackathon-announcements (1 messages):

Hackathon Results, Judging Timeline Updates

Hackathon Results Release Delayed: The Hackathon website has been updated to reflect changes to the timeline, indicating that while most results have been tallied, final results are expected in January after further feedback from judges.
- Participants have been informed that judges are impressed by submissions, and an official announcement will follow once all results are confirmed.
Judges Still Reviewing Submissions: Most final results have been tallied, but the organizers are still waiting for responses from a few judges before finalizing everything.
- This delay has been communicated to participants, highlighting the judges' favorable reactions to the submitted projects.

LLM Agents (Berkeley MOOC) ▷ #mooc-questions (6 messages):

Google Form edits, Twitter account deactivation, Form submission process, Email access issues

Google Form won't allow edits: A user reported that the Google Form won't allow them to edit their previous submission, expressing the need for assistance.
- The response suggested that re-submitting the form would overwrite the previous entry.
Accessing forms with different emails: One member mentioned that trying to access the form with a different email might work, advising to input the correct email in the field.
- Another user noted the form was closed but questioned if their deactivated Twitter account would affect their eligibility for a certificate.
Twitter account status won't disqualify: After confirming the form was closed, the user expressed concern about disqualification due to their Twitter account deactivation.
- A member reassured them that deactivation would not result in disqualification for the certificate.

OpenInterpreter ▷ #general (7 messages):

OI 1.0 Python Execution, AI Improvement Observations, Model and Parameters Inquiry, Custom Instructions Insights

Concern about OI 1.0 Python Execution: A member questioned whether --tools interpreter is intended to enable running Python code in OI 1.0, noting it still attempts to call python.exe.
- Another member pointed to a specific line in the system message that suggests OI 1.0 can no longer run code directly, causing confusion.
AI Improvement Attempts: One member shared insights on the AI's performance, noting improvements with certain commands and its ability to print the head of files instead of the entire content.
- They mentioned using the gpt-4o-mini model, highlighting areas where they have noticed the AI struggles.
Inquiry about Model and Parameters: A user requested details on the model and parameters in use, seeking a rundown on its capabilities and any necessary changes.
- This inquiry reflects a broader interest in understanding how to optimize interactions with the AI.
Custom Instructions Reflections: Discussion included sharing custom instruction sets, showcasing various guidelines aimed at improving the AI's interaction regarding code execution.
- The instructions emphasized careful handling of commands and encouraged confirming tool capabilities before use.

LAION ▷ #general (5 messages):

TruLie dataset, Image-to-3D techniques, Chirpy3D, World Models, Gaussian splats

Inquiry about the TruLie Dataset: A member inquired about the TruLie dataset, seeking information from the community on its current relevance.
Exploration of Image-to-3D Techniques: There was a discussion on the latest advancements in image-to-3D technologies, specifically seeking open-source options that can run on a laptop.
- Interest was expressed in techniques that utilize single image inputs, alongside recommendations for Gaussian splat and NeRF libraries.
Chirpy3D Sparks Interest: A member shared a link to Chirpy3D, a project focused on continuous part latents for creative 3D bird generation, highlighting its potential in the 3D modeling space.
- The project is linked to contributions from several researchers at notable institutions like University of Surrey and Imperial College London.
The Rise of World Models: Another topic of interest was the World Models, which integrate physics-aware networks for more realistic video generation.
- While not strictly image-to-3D, it was noted that this approach is closely related to the ongoing discussions about 3D generation technologies.

Links mentioned:

LAION ▷ #research (1 messages):

rom1504: Is there any good open tool registry for building agents ?

DSPy ▷ #general (4 messages):

Improving COT for Chatbots, Building Custom Evaluations for LLMs, Importance of Evals in AI Development, Drew Breunig's Work and Projects

Newbie Seeks Chatbot COT Improvement: A member inquired about ways to improve Chain of Thought (COT) for chatbots beyond just setting a signature.
- Is there any way to improve COT other than setting a signature for it?
Build Your Own Eval for LLMs: A discussion highlighted an article by Drew Breunig on the importance of building your own evaluation when selecting models, emphasizing that evals are essential.
- Breunig stated that your eval is the most valuable AI asset you own, not your model or your prompts.
Drew Breunig's AI Contributions: Drew Breunig introduced himself, explaining his background at PlaceIQ and his current work with Precisely and the Overture Maps Foundation.
- He advocated for the production of tools like StepList, an app for managing routines, and Reporter, an app to measure self-understanding.
Community Engagement with Evals: A community member expressed enthusiasm about Breunig's article on evaluations, stating they would check out the content.
- Awesome! Gonna check it out now.

Links mentioned:

AI21 Labs (Jamba) ▷ #general-chat (3 messages):

Python app with Jamba, AI code generation, PHP coding, Jamba functionality

Podcaster harnesses Jamba for episode recall: A member shared that they created a basic Python app using Jamba's Conversational RAG to query podcast transcripts for better episode recall.
- It's been a lot of fun experimenting with the app, despite it still being a work in progress.
AI Code Generation's amusing quirks: Another member noted their initial experiences with AI in generating code led to some amusing mistakes during code troubleshooting in HTML, Javascript, and PHP.
- They remarked that the current boom in AI technology seems to be just scratching the surface of what's possible.
PHP remains a staple for web projects: A member shared their ongoing use of PHP for web development and local IRC bot coding, emphasizing its reliability.
- They expressed satisfaction after getting connected to Jamba, as it simplified some aspects of programming by working with conversation arrays similar to other APIs.

Torchtune ▷ #general (1 messages):

jovial_lynx_74856: Anyone here tried finetuning ModernBERT?

Torchtune ▷ #jobs (1 messages):

Nectar Social hiring, Referral bounties, AI startup roles

Nectar Social is hunting for talent: An early-stage AI startup, Nectar Social, is expanding rapidly and seeking to fill key positions in Seattle and beyond, including a Sr/Staff Product Manager, LLM/AI Engineer, and more.
- They are offering referral bounties of up to $10,000 for successful hires and are currently operating in semi-stealth mode.
Diverse roles with flexible locations: Available positions also include a Customer Success Manager that offers flexible location options and Founding Account Executives in NYC/LA.
- Previous startup experience is preferred for applicants to increase their chances of success.

{% else %}

The full channel by channel breakdowns have been truncated for email.

If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!

If you enjoyed AInews, please share with a friend! Thanks in advance!

{% endif %}