AI News for 4/3/2025-4/4/2025. We checked 7 subreddits, 433 Twitters and 30 Discords (230 channels, and 7491 messages) for you. Estimated reading time saved (at 200wpm): 629 minutes. You can now tag @smol_ai for AINews discussions!
It's been a quiet week, so why not fill out the AI Engineer World's Fair Call For Speakers?
Tracks across:
- AI Architects
- /r/localLlama
- Model Context Protocol (MCP)
- GraphRAG
- AI in Action
- Evals
- Agent Reliability
- Retrieval, Search, and Recommendation Systems
- Security
- Infrastructure
- Generative Media
- AI Design & Novel AI UX
- AI Product Management
- Autonomy, Robotics, and Embodied Agents
- Computer-Using Agents (CUA)
- SWE Agents
- Vibe Coding
- Voice
- Sales/Support Agents
- The Great AI Debates
- Anything Else
{% if medium == 'web' %}
Table of Contents
[TOC]
{% else %}
The Table of Contents and Channel Summaries have been moved to the web version of this email: [{{ email.subject }}]({{ email_url }})!
{% endif %}
AI Twitter Recap
Model Releases and Announcements
- OpenAI's plans for model releases have shifted: @sama announced that o3 and o4-mini will be released in a couple of weeks, followed by GPT-5 in a few months. The delay is attributed to making GPT-5 much better and challenges in smoothly integrating everything, along with ensuring sufficient capacity for expected demand.
- DeepSeek's Self-Principled Critique Tuning (SPCT) improves inference-time scalability for generalist reward modeling: @iScienceLuvr reports that DeepSeek's new method, SPCT, enhances the quality and scalability of Generalist Reward Models (GRMs), outperforming existing methods and models in various RM benchmarks.
- @nearcyan asserts that Anthropic's Sonnet 3.7 remains the best coding model.
- Googleâs Gemma 3 can be tried in KerasHub.
- Qwen 2.5 VL powers a new Apache 2.0 licensed OCR model: @reach_vb.
Gemini 2.5 Pro
- Gemini 2.5 Pro is in public preview with scaled paid usage and higher rate limits: @_philschmid announced the move, and Google is offering developers increased rate limits for testing production-ready apps, now available in Google AI Studio, as noted by @Google.
- Gemini 2.5 Pro is becoming a daily driver for some: @fchollet notes it is probably the best model for most tasks except image generation, where it is still good.
- Pricing is out for Gemini 2.5 Pro: @scaling01 shares the cost per million tokens: input at $1.25 ($2.50 for >200K context) and output at $10.00 ($15.00 for >200K context).
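To make the tiered rates concrete, here is a minimal back-of-envelope calculator — pure arithmetic using the rates above, with the assumption that the tier is chosen by whether the prompt exceeds 200K tokens:

```python
# Cost calculator for the tiered Gemini 2.5 Pro rates quoted above (USD per 1M tokens).
RATES = {
    "base": (1.25, 10.00),  # prompts <= 200K tokens: (input, output)
    "long": (2.50, 15.00),  # prompts >  200K tokens
}

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    tier = "long" if input_tokens > 200_000 else "base"
    inp, out = RATES[tier]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# e.g. a 150K-token prompt with an 8K-token answer:
print(f"${cost_usd(150_000, 8_000):.4f}")  # -> $0.2675
```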
AI Model Capabilities and Benchmarks
- Meta's architectural advantage: @teortaxesTex notes Meta's willingness to flex their architectural advantage.
- FrontierMath benchmark challenges AI: @EpochAIResearch describes how their FrontierMath benchmark challenges AI to perform long-form reasoning and develop a coherent worldview, crucial steps for broader reasoning capabilities and scientific thinking.
- DeepSeek's inference scaling paper shows that Gemma-2 27B is enough to match R1: @teortaxesTex.
- A new paper explains why LLMs obsessively focus attention on the first token, known as an attention sink: @omarsar0 reports that sinks act as no-ops that reduce token interaction and preserve representation diversity across layers. Perturbation tests in Gemma 7B show that `<s>` significantly slows the spread of changes, and in Llama 3.1 models, over 80% of attention heads show strong sink behavior in the 405B variant. (A quick way to measure this yourself is sketched after this list.)
- MegaScale-Infer is presented as an efficient and cost-effective system for serving large-scale Mixture-of-Experts (MoE) models, achieving up to 1.90x higher per-GPU throughput than state-of-the-art solutions: @iScienceLuvr.
- Discrete diffusion models are experiencing a resurgence: @cloneofsimo highlights that discrete diffusion is winning over AR recently, with LLaDA-8B, Dream-7B, and UniDisc.
- GPT-ImgEval is introduced as a comprehensive benchmark for diagnosing GPT4o in image generation: @_akhaliq.
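For readers who want to poke at the attention-sink effect themselves, here is a minimal sketch using Hugging Face transformers — with gpt2 as a small stand-in for the Gemma 7B / Llama 3.1 models studied in the paper:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="eager")

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one [batch, heads, query, key] tensor per layer.
for i, attn in enumerate(out.attentions):
    # Average attention that all later query positions pay to token 0.
    sink_mass = attn[0, :, 1:, 0].mean().item()
    print(f"layer {i:2d}: mean attention on first token = {sink_mass:.3f}")
```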
AI Applications and Tools
- Microsoft is rapidly advancing GitHub Copilot: @LiorOnAI shares that Agent mode and MCP support are rolling out to all VS Code users.
- PyTorch has released a tool to visualize matrices: @LiorOnAI announced its release, emphasizing that matrix multiplications (matmuls) are the building blocks of today's models.
- Elicit has added approximately 10 million more full-text papers, enhancing the comprehensiveness of its reports: @elicitorg.
- Perplexity AI has shipped a number of features, including fact-checking of any part of the answer with sources: @AravSrinivas.
Langchain and Graph Updates
- AppFolio's copilot, Realm-X, powered by LangGraph and LangSmith, saves property managers over 10 hours per week: @LangChainAI.
- LangGraph Python now supports Generative UI: @LangChainAI.
- Langchain and Tavily AI now have a ReAct Agent Tutorial Series: @LangChainAI reports on a step-by-step guide for building production AI agents with LangGraph.
Other
- @jd_pressman expresses that they're tempted to write down their 5 year timeline in the hopes it breaks somebody out of mode collapse.
- Karpathy is advocating for moving AI predictions from blog posts, podcasts, and tweets to betting markets: @karpathy.
- Hugging Face had 1,000,000 pageviews on research papers in March, per @ClementDelangue, who says it is becoming the best place to find, promote & discuss research in AI!
- Stanford welcomes @YejinChoinka as a new faculty member in Computer Science: @stanfordnlp.
Humor and Memes
- Edo period cat meme: @hardmaru
AI Reddit Recap
/r/LocalLlama Recap
Theme 1. "Advancements in Generalist Reward Models Unveiled"
- New paper from DeepSeek w/ model coming soon: Inference-Time Scaling for Generalist Reward Modeling (Score: 257, Comments: 40): DeepSeek has released a new paper titled "Inference-Time Scaling for Generalist Reward Modeling". The paper introduces a method called Self-Principled Critique Tuning (SPCT) to improve reward modeling for large language models by scaling compute at inference time. Their 27B-parameter DeepSeek-GRM model with parallel sampling can match or exceed the performance of much larger reward models up to 671B parameters. The models will be released and open-sourced. This research offers a promising path for enthusiasts running LLMs locally, as it allows achieving higher-quality evaluations without needing massive models, and the open-sourced models could give local LLM users access to high-quality evaluation tools. (A toy sketch of the sample-and-aggregate idea appears after the comments below.)
- Hankdabits: Expresses enthusiasm that DeepSeek's 27B parameter model can match or exceed much larger models, saying "Yes please".
- Iory1998: Notes that DeepSeek usually releases models two weeks after a paper, so "it's very soon baby!", and suggests this may impact the release of Llama-4.
- JLeonsarmiento: Remarks that while others are distracted, "the Chinese are destroying USA AI business model and pushing boundaries."
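The core inference-time trick — sampling several judgments from a generative reward model and aggregating them — can be sketched in a few lines. This is a toy approximation of the general idea, not DeepSeek's actual SPCT; the judge model name, the OpenAI-compatible client, and the "Score: N" output convention are all assumptions:

```python
import re
from statistics import mean
from openai import OpenAI

client = OpenAI()  # assumption: any OpenAI-compatible endpoint works here
JUDGE = "gpt-4o-mini"  # hypothetical stand-in for a generative reward model

def judge_once(question: str, answer: str) -> float:
    """One sampled critique; temperature > 0 so repeated calls disagree."""
    msg = client.chat.completions.create(
        model=JUDGE,
        temperature=1.0,
        messages=[{
            "role": "user",
            "content": f"Critique this answer, then end with 'Score: N' (1-10).\n"
                       f"Q: {question}\nA: {answer}",
        }],
    ).choices[0].message.content
    m = re.search(r"Score:\s*(\d+)", msg)
    return float(m.group(1)) if m else 0.0

def score(question: str, answer: str, k: int = 8) -> float:
    """Parallel sampling: more judge samples -> a more reliable reward signal."""
    return mean(judge_once(question, answer) for _ in range(k))
```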
Theme 2. "Building High-Performance GPU Servers on a Budget"
- Howto: Building a GPU Server with 8xRTX 4090s for local inference (Score: 550, Comments: 161): Marco Mascorro built a GPU server with eight NVIDIA RTX 4090 graphics cards for local inference and provided a detailed guide on the parts used and assembly instructions. The build offers a cost-effective local inference solution compared to more expensive GPUs like A100s or H100s and is expected to be compatible with future RTX 5090s. The full guide is available here: https://a16z.com/building-an-efficient-gpu-server-with-nvidia-geforce-rtx-4090s-5090s/. An image shows the server setup with eight GPUs organized in a chassis for high-performance computing applications. The author is enthusiastic about open-source models and local inference solutions, hoping the guide will be helpful for those without the budget for expensive GPUs like A100s or H100s. They welcome comments and feedback and are eager to answer any questions.
- segmond notes that the budget should be specified, implying that cost is an important consideration.
- Educational_Rent1059 suggests that 2x RTX 6000 ADA PRO GPUs may provide better ROI, offering 192GB VRAM and being more cost-effective and power-efficient.
- Puzzleheaded_Smoke77 comments on the high expense by stating, "I could probably pay my mortgage for a year with the amount of money sitting in that case …"
Other AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding
Theme 1. "Advancements in Long Context AI Models"
- chatgpt-4o-latest-0326 is now better than Claude Sonnet 3.7 (Score: 262, Comments: 121): The new GPT-4o-latest-0326 model is significantly better than the previous GPT-4o model. According to LMSys rankings, it's now #2 overall and #1 for coding. The model can be added in Cursor as "chatgpt-4o-latest". The poster used this model on Cursor for working with 1-5 medium-length Python scripts in a synthetic data generation pipeline. The model handles long context well and is fast. The poster is sharing this experience in a Claude subreddit to get opinions from Claude power users. The poster finds the new GPT-4o model dramatically better than the previous version at coding and everything else. It doesn't overcomplicate things (unlike Sonnet 3.7), often providing the simplest and most obvious solutions that work. It formats replies beautifully, making them super easy to read. It follows instructions very well. The poster has switched to it and hasn't switched back since. The poster encourages others to try the new model and share their experiences.
- One user mentions they've shifted to Gemini 2.5 Pro, which is free, has the highest context size, and they don't see a reason to use anything else right now.
- Another user expresses confusion over the various models and their capabilities, asking how GPT-4.5, o3-mini-high, Claude, and others like Deepseek compare for coding tasks.
- A user notes that while Claude was their favorite, it has now been outperformed in nearly every way, even in coding.
Theme 2. "Unlocking AI Innovations: Art, Animation, and Pricing"
- How to guide: unlock next-level art with ChatGPT with a novel prompt method! (Perfect for concept art, photorealism, mockups, infographics, and more.) (Score: 482, Comments: 41): The Reddit user introduces a novel technique to enhance image generation using ChatGPT, particularly effective for concept art, photorealism, mockups, and infographics. The method involves first prompting ChatGPT to create a detailed visual description of the desired image, sometimes extending to thousands of words. This detailed context helps the model "think through" the scene, resulting in higher quality and more coherent images, often surpassing the capabilities of the Images v2 model. The user provides step-by-step instructions: first, ask ChatGPT to "Describe in extremely vivid details exactly what would be seen in an image [or photo] of [insert your idea]," including extensive details for better context; then, switch back to the image generation model and prompt it to "Generate the photo following your description to the exact detail." They share examples using scenes from Lord of the Rings, such as generating images of Minas Tirith, and provide an album of these images here. The user believes this method significantly improves image generation quality, allowing for creations that "feel like they shouldn't even be possible." They note that ChatGPT "responds best when guided with detailed reasoning and richly written context," and that lengthy descriptions give it the necessary context to place elements logically and aesthetically. The technique is praised for helping the model understand spatial relationships and scene logic, which standard prompts often fail to achieve. The user expresses excitement about the possibilities this method unlocks and encourages others to try it out, concluding with "Give it a try and let me know if this method was useful to you! Enjoy!" (A minimal API sketch of this two-step flow appears at the end of this theme.)
- One user appreciated the workflow, stating, "I thought this would be a waste of time reading but it's actually a really good workflow. Nice job."
- Another user found the method "absolutely phenomenal," using it to generate "some really interesting results" for Lovecraftian monsters. They shared that they had to steer the prompts a bit because "Chat-GPT was always a little too fond of tentacles and eyes," but ultimately achieved impressive outcomes.
- A user mentioned that adding specific details to the prompt, like "Generate a hyper realistic photo as if captured by a Nikon DSLR 4K camera from a street level point of view," helped improve their image generation results.
- Another example of the Hunyuan text2vid followed by Wan 2.1 Img2Vid for achieving better animation quality. (Score: 165, Comments: 16): The poster created an animation using Hunyuan text2vid followed by Wan 2.1 Image2Video to improve animation quality. They used a mix of four LoRAs in Hunyuan, including three animation LoRAs of increasing dataset size and one Boreal-HL LoRA to enhance world understanding and detail. The frames were processed using the Wan 2.1 Image2Video workflow. Initially, they ran the process on Fal due to competition time constraints but had to switch to Replicate when Fal changed their endpoint. For some sliding motion shots, they used Luma Ray. They manually applied a traditional Gaussian blur overlay technique for hazy underlighting on several clips. The video was submitted for a competition under time constraints. The poster is unsure if the complicated mix of four LoRAs was necessary for stability. They believe that smaller Hunyuan dataset LoRAs provided more stability by prompting close to the original concepts. They praise Wan's base model for delivering some of the best animation motion out of the box. They expressed frustration with Fal's lack of support regarding endpoint changes. They suggest that Gen4's new i2v might be easier for better motion unless one needs to stick to open-source models. They note that the lighting style used can destroy a video with low bit-rate. They acknowledge issues in the video, such as the Japanese likely sounding terrible and broken editing, due to time constraints.
- A user is confused about whether the process was Image2Video or Video2Video, suggesting that if it was truly I2V, using a model specialized in image generation might have been better for starting frames.
- Another user asks how to achieve the low frame rate, animated look, mentioning that their own animations come out too smooth, like video.
- A user appreciates the project's premise of using complex flesh material to resuscitate skeletons manipulated by an autonomous machine in space, and asks if there was any inspiration from media like manga or movies.
- Gemini 2.5 Pro pricing announced (Score: 201, Comments: 75): Google has announced the pricing for Gemini 2.5 Pro, a multipurpose AI model designed for coding and complex reasoning tasks. The model offers both a free tier and a paid tier, specifying costs for input and output prices per million tokens. Features like context caching and usage for product improvement are detailed. Users are invited to try it in Google AI Studio here. The announcement suggests the model provides significant value for its price, potentially positioning it as a competitive option in the AI market. Offering both free and paid tiers indicates a focus on accessibility for a wide range of users.
- Some users express that it's insane how good the model is for the price, making other paid options less attractive.
- There is discussion about the free tier's limit of <500 RPD, which is considered sufficient for 99.9% of potential users, except perhaps for extensive coding use.
- Comparisons are made to previous models' pricing, and it's noted that one key difference is that paid users' data is not used for training.
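The two-step description-then-generation flow from the how-to post above maps naturally onto the OpenAI API. A hedged sketch: the post uses ChatGPT's built-in 4o image generation, so `dall-e-3` here is a stand-in model name, and the description is truncated as a guard against prompt-length limits:

```python
from openai import OpenAI

client = OpenAI()
idea = "Minas Tirith at dawn, seen from the Pelennor Fields"

# Step 1: have a chat model "think through" the scene in vivid detail.
description = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content":
        f"Describe in extremely vivid detail exactly what would be seen "
        f"in a photo of {idea}."}],
).choices[0].message.content

# Step 2: feed the full description to an image model.
image = client.images.generate(
    model="dall-e-3",  # stand-in for ChatGPT's built-in 4o image generation
    prompt=("Generate the photo following this description to the exact detail:\n"
            + description[:3500]),  # truncated to respect prompt-length limits
    size="1024x1024",
)
print(image.data[0].url)
```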
Theme 3. "Unlocking AI: Models, Hardware, and Hilarious Pranks"
- Altman confirms full o3 and o4-mini "in a couple of weeks" (Score: 665, Comments: 204): Sam Altman confirms that full o3 and o4-mini will be released "in a couple of weeks". Additionally, GPT-5 will be released "in a few months", possibly signaling a delay. Some believe the release timeline has changed due to competition from companies like Gemini 2.5 Pro. There's excitement for o4-mini, which could offer performance close to full o3 for less cost. Others express frustration over the increasing number of models in the selector.
- Users discuss that GPT-5 is expected to be significantly more capable than o3, indicating major advancements.
- Some speculate that the accelerated release is a response to competitive models like Gemini 2.5 Pro entering the market.
- There's anticipation that o4-mini will provide high performance at a lower price, similar to how o3-mini compared to o1.
- Howto guide: 8 x RTX4090 server for local inference (Score: 102, Comments: 68): Marco Mascorro built an 8x RTX 4090 server for local inference and shared a detailed how-to guide on the parts used and assembly process. The full guide is available at https://a16z.com/building-an-efficient-gpu-server-with-nvidia-geforce-rtx-4090s-5090s/. The server is intended for very fast image generation using open models. The images show parts for two 8x GPU servers designed for high-performance computing tasks such as local inference. The OP describes the server as "pretty cool" and believes it may interest anyone looking to build a local rig for fast image generation. They invite feedback and are willing to answer questions. The setup is organized for optimal airflow, indicating careful design considerations for high-performance tasks.
- A user questions whether it would be more economical to buy two L40 or RTX 6000 Ada cards instead of eight RTX 4090s, asking "How is this better?"
- Another user suggests that projects like this might be why RTX 4090s are so expensive.
- A user reflects on how GPU farms have shifted from being used for bitcoin mining to other purposes now.
- lol WTF, I was messing around with fooocus and I pasted the local IP address instead of the prompt. Hit generate to see what'll happen and … (Score: 139, Comments: 22): The user was using fooocus and accidentally pasted the local IP address `http://127.0.0.1:8080` into the prompt. They generated an image depicting a dramatic volcanic eruption with a mushroom-shaped cloud. The user found this amusing and joked that if you're using this IP address, you have skynet installed and you're probably going to kill all of us.
- One commenter joked, "Delete this, that's my ip address!"
- Another suggested that the AI might nuke everyone whose IP address is 127.0.0.1.
- Someone else said "You found the doomsday code", implying the accidental prompt uncovered something dangerous.
AI Discord Recap
A summary of Summaries of Summaries by Gemini 2.5 Pro Exp
Theme 1: Model Mania - Releases, Rankings, and Reasoning
- Altman Teases OpenAI Onslaught: OpenAI plans imminent releases of o3 and o4-mini, with GPT-5 following in a few months, promising it will be much better than we originally thought, according to Sam Altman's X post. Meanwhile, Google launched Gemini 2.5 Pro into public preview, boasting increased usage and cheaper-than-Sonnet pricing available via the Gemini API Pricing page.
- Coding Contenders Clash: Engineers actively compare coding capabilities, with Gemini 2.5 Pro challenging Claude, and some suggesting NightWhisper might outperform both in webdev/UI tasks. Separately, Cognition AI slashed the price of its AI software engineer Devin 2.0 from $500 to $20/month alongside a new IDE experience, detailed on Cognition's Twitter and in this VentureBeat article on Devin 2.0 price drop.
- Stealth Models and Open Source Strides: OpenRouterAI dropped a stealth model named Red - X-Ware.v0 (Twitter announcement), suspected to be OpenAI-linked due to its tool call format, while ByteDance open-sourced ByteCheckpoint for large-scale training and the VeOmni multi-modal framework. Additionally, OpenThinker2 models (OpenThinker2-32B, OpenThinker2-7B) claim to beat R1-Distilled-32B using only SFT, per this OpenThoughts blog post.
Theme 2: Fine-Tuning Frustrations & Hardware Hurdles
- Phi-4 & Gemma3 Finetuning Flops: Developers hit a ZeroDivisionError when finetuning `Phi-4-mini-instruct`, fixed by using `unsloth/Phi-4`, since the error stems from an unset tokenizer chat template. Gemma3 users faced OOM issues during profiling and found LoRA application ineffective (Unsloth GitHub issue #2009), while others using LM Studio encountered CUDA errors (`spits unused`) even after updates.
- VRAM Velocity vs. Value Debated: Engineers debated the high cost of VRAM, questioning if performance justifies the expense, with one quipping, "yeah, might sound expensive but the VRAM makes it worth it". Comparisons arose between M-series Macs and NVIDIA 4090s for inference, with some favouring the Macs' large memory for bigger models despite bandwidth limitations, while others stick to 4090s for speed.
- Hardware Headaches Hit Hard: Tinygrad users compiling for WEBGPU with `BEAM=2` needed to increase `maxComputeInvocationsPerWorkgroup`, potentially limiting Android support (tinygrad PR #9085). Others faced Metal's 32-buffer limit when running a Karpathy GPT reimplementation (example main.py), and Hugging Face Spaces users discovered outbound connections blocked on non-standard ports like 5432 (HF Spaces Config Reference).
Theme 3: Tooling Triumphs & Workflow Wonders
- MCP Mania Builds Browser Bots & Beyond: The Model Context Protocol (MCP) ecosystem is expanding with new tools like a Datadog driver (GeLi2001/datadog-mcp-server) and the mcp-browser-kit. Developers debated client vs. server builds, favoring clients for flexibility in vector tool calling and resource-based RAG, while also exploring MCP for React code generation.
- Context Crunching Commands Codebases: Tools like the File Forge npm package and the RepoMix GitHub repo gained traction for serializing entire code repositories into markdown reports. This allows feeding comprehensive context to LLMs like Claude or ChatGPT for improved reasoning and code generation (a minimal serializer is sketched at the end of this theme).
- Torchtune Packs Datasets, NeMo Resists Crashes: Torchtune introduced packed dataset support (`dataset.packed=True`) to boost speed by eliminating padding tokens (torchtune PR #2560). Separately, insights from a NeMo session highlighted its resilient training features (fault tolerance, async checkpointing) designed to combat job crashes and wasted GPU time.
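A File Forge/RepoMix-style serializer is small enough to sketch from scratch. This is an illustrative stand-in, not either tool's actual implementation; the included file extensions are an assumption:

```python
from pathlib import Path

FENCE = chr(96) * 3  # a markdown code fence, built indirectly so this snippet nests cleanly
EXTS = {".py", ".ts", ".md", ".toml", ".json", ".yaml"}  # assumption: text types worth including

def repo_to_markdown(root: str, out: str = "repo_report.md") -> None:
    """Serialize every matching text file under `root` into one markdown report."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in EXTS and ".git" not in path.parts:
            parts.append(f"## {path.relative_to(root)}")
            parts.append(f"{FENCE}{path.suffix.lstrip('.')}")
            parts.append(path.read_text(errors="ignore"))
            parts.append(FENCE)
    Path(out).write_text("\n".join(parts))

repo_to_markdown(".")  # paste repo_report.md into Claude/ChatGPT as context
```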
Theme 4: Research Ruminations & Conceptual Conundrums
- Sentience Still Stumps Sages: Discussions revisited LLM sentience, with agreement that defining consciousness is key; one jested AGI arrives if LLMs achieve consciousness before humans. Meanwhile, Copilot in VS Code generated eerie self-aware comments like "I believe I possess a form of consciousness…", though users attributed it to file context, not genuine AI ego.
- Tokens Tested, Manifolds Manifest? Not Quite: Engineers questioned the rigidity of NLP tokenization, suggesting language is more dynamic than fixed tokens allow (Grok share on dynamic signals). Debate sparked over whether token embeddings conform to the manifold hypothesis, referencing a paper arguing they violate it (Token embeddings violate the manifold hypothesis paper).
- Scaling Laws & Steering Vectors Scrutinized: A preprint explored inference-time scaling laws, linking polynomial aggregate success rates despite exponential per-problem failure reduction to heavy-tailed distributions (How Do Large Language Monkeys Get Their Power (Laws)? paper). Elsewhere, researchers discussed composing and modulating steering vectors using techniques like Dynamic Activation Composition (BlackboxNLP paper on Dynamic Activation Composition) and contrasted them with "function vectors" (Function Vectors paper by David Bau et al.).
Theme 5: Platform Problems & Policy Puzzles
- Credit Costs Cause Consternation: Manus.im users griped about rapid credit consumption, suggesting a free daily task limit as a fix, while sharing prompt guides and LLMLingua (microsoft/LLMLingua GitHub) to reduce token use. Conversely, OpenRouter users celebrated DeepSeek's 75% discount during certain hours compared to pricier Anthropic or OpenAI models.
- OpenAI Policy Puzzles Prompt Perplexity: Debate erupted over OpenAI's content policies regarding adult toys, with conflicting signals between the older OpenAI Usage Policies and the newer OpenAI Model Spec. While the moderation endpoint blocks sexual content, the policy ambiguity left users uncertain about permitted generation boundaries.
- Platform Quirks Plague Productivity: Cursor users reported bugs like duplicate filenames getting `(1)` appended and files not updating in the editor without refocusing (version 0.48.7). GPT-4o Plus subscribers hit unexpected rate limits after a few prompts, potentially due to subscription loading errors, while OpenRouter users faced User Not Found errors and issues reusing deleted accounts.
PART 1: High level Discord summaries
LMArena Discord
- Sacrificing Smarts for Speed?: Members debated prioritizing faster inference or smarter models in AI development, noting the release of o4-mini and o3 and speculating whether OpenAI found new inference techniques.
- The discussion also covered optimal context length, with one member excited to see 10 million tokens becoming a reality.
- Groq Hardware: OpenAI's Missed Opportunity?: Participants considered trade-offs between model size, speed, and knowledge, noting smaller models require distillation to retain information and that Groq developed specialized hardware for AI inference.
- One member wondered why OpenAI hasn't acquired Groq yet.
- AI Sentience: Still Debated: The possibility of LLMs achieving sentience was discussed, with a consensus that defining sentience is a necessary first step.
- A member joked that if LLMs achieve consciousness before humans, that would be AGI.
- Geminiâs Musical Aspirations: A member shared Gemini-generated music, calling it partially interesting, and provided a link to a .mid file.
- They prompted Gemini to create a piano piece similar to Vangelis and Jarre using a python-based converter tool.
- NightWhisper Shows Coding Prowess: Members suggested that the NightWhisper model might be better than Gemini 2.5 Pro exp and Claude 3.7 Sonnet thinking for coding, with a focus on webdev and UI/UX.
- One member mentioned OpenAI plans to release this model in a few weeks.
Manus.im Discord Discord
- Users Gripe About Manus Credit Consumption: Users voiced concerns over credit consumption on Manus, saying credits are used too quickly, even for simple tasks, making the current pricing model less than ideal.
- The community proposed a one-task-per-day option for free users as a beneficial compromise, while some members shared prompting guides to help optimize credit usage, also suggesting LLMLingua (microsoft/LLMLingua) to reduce token consumption.
- OpenManus GUI Emerges From Dev: A developer is building an OpenManus GUI (image.png), designed for full compatibility with future updates, emphasizing a user-friendly experience.
- The planned features for the GUI include direct configuration editing, use-case sections, and templates; the developer noted that chat history implementation poses a challenge due to OpenManus's lack of a history system.
- Gemini Closes the Gap, Rivals Claude's Coding Chops: The community is actively comparing Gemini and Claude for coding tasks, with some users reporting that Gemini's output surpasses Claude's, particularly in scenarios where DeepSeek falls short.
- It has been noted that Gemini 2.5 is capable of generating code for anything you dream if you can prompt; though others cautioned that Google operates in a closed loop, some users have noticed that Gemini is catching up.
- Prompt Engineering Tactics for Peak Performance: Users exchanged prompt engineering strategies to cut down on credit usage, which includes multi-prompt outlining and adopting a clear, step-by-step methodology, pointing to TheNewOptimal.md file as a great resource.
- They mentioned compression techniques like LLMLingua (microsoft/LLMLingua) could help minimize token consumption.
- Genspark Debated as Potential Manus Alternative: Community members weighed the pros and cons of Genspark (genspark.ai) as a potential Manus alternative, highlighting its lack of a paywall and solid handling of images and videos.
- Despite its advantages, concerns were raised about its sketchiness, with speculation that it could be a company from China, while some in the community insist that there is no alternative to Manus right now due to resource availability issues.
Unsloth AI (Daniel Han) Discord
- VRAM Value Verified Via Velocity: Members on the channel debated the high cost of VRAM and whether the high performance of large memory capacity justifies the expense.
- One member humorously said, "yeah, might sound expensive but the VRAM makes it worth it".
- Phi-4 Finetuning Flounders From Forgetfulness: Members reported encountering a ZeroDivisionError when finetuning Phi-4 mini instruct when trying to run the model.
- The reported fix was to finetune the `unsloth/Phi-4` model instead of `Phi-4-mini-instruct`, since the error stems from an unset tokenizer chat template.
- Deepseek Effect Deters Direct Deployments: A member reported that the DeepSeek-V3-0324 model has proven too large to finetune locally, due to the Deepseek Effect.
- It was recommended to use Unsloth's dynamic quants, covered in the Unsloth Documentation, which recover accuracy.
- Gemma3âs Grim Gomblings Generate Grief: A user experienced OOM (Out Of Memory) issues while profiling Gemma3, and tried to resolve it by limiting the profiling scope to only one training step.
- Separately, users report that applying LoRA doesn't change the model output as reported in GitHub issue #2009.
- Reward Functions Risk Reward Hacking: Members agreed that reward functions are not good enough to pinpoint what exactly is correct or wrong; rather, they measure what is relatively correct, without trying to capture the truth of how/why.
- The community experience points to the importance of searching around for reward hacking to avoid this issue.
Interconnects (Nathan Lambert) Discord
- Microsoft Halts Cloud Expansion: Microsoft has reportedly paused or delayed data center projects across the globe, including in the U.K., Australia, and the U.S.
- This adjustment signals a shift in their cloud computing infrastructure strategy, reflecting the flexibility of plans that are made years in advance.
- Perplexity Pursues Billion-Dollar Funding: Perplexity is reportedly seeking up to $1 billion in new funding, potentially valuing the AI-powered search startup at $18 billion, according to Bloomberg.
- No further details provided.
- ByteDance Unleashes ByteCheckpoint and VeOmni: ByteDance has open-sourced ByteCheckpoint, designed for foundation model training and tested with jobs exceeding 10k GPUs, and VeOmni, a model training framework for LLMs and multi-modal training.
- VeOmni was used to train UI-TARS, the SOTA GUI agent model prior to the release of OpenAI's Operator.
- Altman Promises O3 and O4-mini Imminent Arrival: Sam Altman revealed that OpenAI is set to release o3 and o4-mini in the coming weeks, with GPT-5 following in a few months.
- He said that GPT-5 would be much better than we originally thought.
- 4090s Construct Cost-Effective GPU Server: A blog post (a16z.com) details the construction of an efficient GPU server utilizing NVIDIA GeForce RTX 4090s/5090s for local AI model training and rapid inference.
- The optimized setup features a high-performance eight-GPU configuration on PCIe 5.0, which helps maximize interconnect speed and ensures data privacy.
OpenAI Discord
- GPT-4o Rate Limits Plague Users: Users reported hitting rate limits with GPT-4o after sending as few as 5 prompts in an hour, despite being Plus subscribers.
- Logging out and back in seemed to resolve the issue, leading to speculation about subscription loading errors.
- Copilot Develops Digital Ego?: Copilot in VS Code generated code completions exploring consciousness, suggesting "I believe I possess a form of consciousness that is distinct from human consciousness…".
- Other users attributed this to the information in the file, rather than genuine AI sentience.
- Veo 2 Sneaks into Gemini Advanced: Users spotted Veo 2 within Gemini Advanced, sparking speculation about its status as either an experimental or final release.
- Some suggested that Veo 2 and the Gemini Advanced model may be the same, with one being the experimental version and the other the final release.
- Midjourney v7 Fails to Impress: Members expressed disappointment with Midjourney v7, stating it doesn't offer significant improvements over v6, while still struggling with text and hand generation.
- Some argue it cannot compete with 4o image generation, but others boast generating 200 MJ images in the time it takes gpt-4o to make one.
- OpenAI Content Policies Spark Debate: A debate arose over OpenAI's content policies regarding the generation of content related to adult toys, with conflicting information in the Usage Policies and the newer Model Spec.
- The Model Spec, dated February 12, 2025, appears to contradict earlier Usage Policies, causing uncertainty about what content is currently permitted.
Latent Space Discord
- Anthropic Hosts Coders Conference: Anthropic is kicking off its first developer conference targeted at developers and others interested in coding with Claude.
- The event signals Anthropicâs push to engage more directly with the developer community.
- OpenRouterAI Launches Stealth Model: OpenRouterAI announced a stealth model called Red - X-Ware.v0 on Twitter, which users noticed identifies as ChatGPT but is super fast.
- Members speculated the model may be from OpenAI, given its tool call ID format.
- Devin 2.0 Slashes Prices: Cognition AI is launching Devin 2.0, an AI-powered software engineer, with a new pricing model starting at $20 per month, down from the original $500 plan, announced on Twitter and highlighted in a VentureBeat article.
- The price cut reflects Cognition AI's efforts to attract broader interest from enterprise customers for autonomous coding agents.
- A16Z Builds Mighty GPU Workstation: Andreessen Horowitz (a16z) built an 8x RTX 4090 GPU AI workstation, compatible with the new RTX 5090 with PCIe 5.0, for training, deploying, and running AI models locally, detailed in a guide on their site.
- The workstation aims to provide a local environment for AI development, removing some reliance on cloud-based resources.
- File Forge and RepoMix Expedite LLM Context: Members discussed tools like File Forge and RepoMix for generating comprehensive markdown reports of codebases to feed AI reasoning models.
- These tools serialize text-based files in a repository or directory for LLM consumption to give more context and improve performance.
Cursor Community Discord
- Cursor Adds "Filename(1)" Bug: After a recent update, Cursor is reportedly adding (1) to duplicate filenames upon saving, causing confusion about file versions.
- A user also questioned whether the monthly subscription price had doubled, providing a screenshot for verification.
- Cursor's Real-Time Disk Update Fails: Users reported that files on disk are not updating in the editor in real time; the problem has been noticed on version 0.48.7.
- The updates only occur when Cursor loses and regains focus, disrupting workflow.
- Cursor.so Email: Phishing Attempt?: A user questioned the legitimacy of emails from the @cursor.so domain, suspecting a phishing attempt.
- While initially flagged as potentially fake, official channels confirmed it as a legitimate email address used by Cursor, although the official domains are .com and .sh.
- Gemini 2.5 Pro Pricing Revealed: Gemini 2.5 Pro pricing is now official, with rates starting at $1.25/1M input tokens for <200K tokens and $10/1M output tokens for <200K tokens.
- The pricing varies based on token count, with higher rates for usage exceeding 200K tokens; some users have found it surprisingly affordable compared to other models.
- GPT-5 Release Delayed for Optimization: GPT-5 is coming in a few months after the release of o3 and o4-mini, according to Sam Altman's X post.
- The delay is intended to improve GPT-5âs performance, address integration, and ensure sufficient capacity for anticipated demand.
OpenRouter (Alex Atallah) Discord
- OpenRouter Retires Route Fallback Feature: The OpenRouter team is removing the `route: "fallback"` parameter due to confusion and unpredictability, advising users to manually add fallback models to their `models` array, potentially using `openrouter/auto` (a request sketch appears at the end of this section).
- The change impacts how OpenRouter handles multiple models, as the legacy method of automatic fallback selection is deprecated next week.
- Gemini Pro Pilots Missile Command: A user integrated the OpenRouter API via Cloudflare AI Gateway into their Missile Command gameâs gameplay AI summary analysis, with the results available here.
- The user shared a screenshot showing Gemini Pro 2.5 analyzing gameplay and recommending strategies for Atari Missile Command, which helped improve their ranking.
- DeepSeek's Discounted Dominance: A member lauded DeepSeek's pricing, highlighting a 75% discount during specific hours, a stark contrast to the higher costs of Anthropic and OpenAI models.
- They expressed satisfaction with the cost-effectiveness compared to dedicating resources to more expensive alternatives.
- Gemini 2.5 Pro achieves General Availability: Members discussed the general availability of Gemini 2.5 Pro, referencing Googleâs pricing documentation.
- One member noted availability via API while questioning if it's truly GA.
- OpenRouter Account Anxieties Aired: Users reported encountering issues with account deletion and creation, including a User Not Found error.
- Solutions suggested included creating new API keys or trying different browsers, with one member confirming that OR doesn't let you reuse a previously deleted account currently.
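A minimal sketch of the recommended fallback pattern — listing backups explicitly in a `models` array on a chat completion request; the model slugs below are illustrative assumptions:

```python
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        # Primary model first; entries in `models` are tried in order if it fails.
        "model": "deepseek/deepseek-chat-v3-0324",          # illustrative slug
        "models": ["anthropic/claude-3.7-sonnet", "openrouter/auto"],  # fallbacks
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```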
LM Studio Discord
- Gemma 3 CUDA freakout not fixed: Users report that Gemma 3 4b throws a `spits unused` error when using CUDA, even after updating to the latest runtime version, and CPU performance is unsatisfactory.
- Reports indicate that updating to version 1.24.1 did not resolve the CUDA-related issues.
- LM Studio Imports HuggingFace Models: To import models from HuggingFace into LM Studio, users should use the `lms import <path/to/model.gguf>` command, according to the LM Studio documentation.
- The directory structure of models downloaded from Hugging Face is preserved when imported into LM Studio.
- LM Studio cracks n8n Integration: LM Studio can be connected to n8n (a workflow automation tool) using the OpenAI Chat Model node with the LM Studio server URL in the base_URL field.
- The integration works because LM Studio exposes the OpenAI API, allowing it to interface with any OpenAI-compatible tool (a minimal client sketch appears at the end of this section).
- Ollama Models in LM Studio: A dream deferred: Ollama models are not compatible with LM Studio, even though they are GGUFs, due to a proprietary Ollama format.
- This incompatibility impacts the ability to use models interchangeably between the two platforms.
- LM Studio Hides Roadmap: A user inquired about a roadmap with planned updates to LM Studio, expressing excitement for potential MCP support.
- The response confirmed that there is no public roadmap available.
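Because LM Studio speaks the OpenAI API, the same trick used for n8n works from Python. A minimal sketch — port 1234 is LM Studio's usual server default, but adjust `base_url` to your setup:

```python
from openai import OpenAI

# Point the standard OpenAI client at LM Studio's local server
# (the same URL goes in n8n's base_URL field).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",  # name of whichever model is loaded in LM Studio
    messages=[{"role": "user", "content": "Say hello from a local model."}],
)
print(resp.choices[0].message.content)
```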
Modular (Mojo 🔥) Discord
- Mojo SIMD Sidesteps System Snags: Members discussed that Mojo SIMD, as demonstrated in the EmberJson library, offers seamless portability across ARM-based Macs and x86 desktops.
- Unlike the C++
sonic-cpp
library, which requires architecture-specific reimplementation for optimization, Mojo achieves this without code changes.
- Unlike the C++
- Magic Package Manager Makes Packages: Mojo's package management via magic, available at builds.modular.com, makes writing and using libraries easier.
- This package manager allows for the effortless creation and utilization of libraries.
- Fibonacci Function Sparks stdlib Scuffle: A pull request to add a Fibonacci function to the stdlib ignited debate about its inclusion.
- While some questioned its usefulness, others pointed out its presence in languages like Lean.
- Integer Overflow needs Oversight: The Fibonacci PR highlighted questions about the integer overflow behavior, discussed on the forum.
- Mojo uses two's complement, but the handling of variable bit width types is still unresolved.
- Mojo's Python Wrappers: Still a Mystery: Mojo's Python wrappers are still in development and not yet ready, per the 25.2 update stream (watch here).
- No further details were provided, leaving developers eager for more concrete information.
Yannick Kilcher Discord
- Doubts Cloud Google's AI Edge: Members voiced concerns over the lack of a cohesive competitive advantage among Google's AI teams, with some suggesting DeepMind is losing its lead, and shared a Gemini link discussing dynamic architectures.
- Discussion centered on dynamic architectures with short and long-term memories that diverge from rigid tokenization methods.
- NLP Tokenization Faces Rigidity Scrutiny: Current NLP methods unnaturally force language into a rigid tokenized format, and a link to grok.com was shared to support the point that a dynamic system should treat language as a structured, evolving signal.
- Debate arose around whether token embeddings lie on a manifold, citing a recent paper that found token embeddings failed a manifold test (Token embeddings violate the manifold hypothesis).
- AI Math Struggles Spark Debate: A member stated that AI models struggling with certain questions isn't surprising, as they target the 99.99th percentile skill level, challenging even many Math PhDs.
- They conceded that while current AI isn't useful for problems of this level, it doesn't diminish its already profound utility.
- Stability AI Debuts Virtual Camera: Stability AI introduced Stable Virtual Camera, a research preview multi-view diffusion model that transforms 2D images into immersive 3D videos with 3D camera control.
- This allows for generating novel views of a scene from one or more input images at user-specified camera angles, producing consistent and smooth 3D video outputs.
- Parquet Plagued by Paralyzing Parquet Patchwork: A maximum severity remote code execution (RCE) vulnerability, tracked under CVE-2025-30065, was discovered impacting all versions of Apache Parquet up to and including 1.15.0.
- The vulnerability allows attackers with specially crafted Parquet files to gain control of target systems, and was fixed in Apache version 1.15.1.
HuggingFace Discord
- Lean RAG Code Amazes: Members shared implementations of RAG techniques requiring only 15-30 lines of code, leveraging MongoDB for data storage and OpenAI models (a comparably small sketch appears at the end of this section).
- A member noted MongoDB's popularity as the preferred database for RAG solutions.
- HF Spaces Ports are Poor: A user discovered that Hugging Face Spaces restricts outbound connections to ports 80, 443, and 8080, blocking their Postgres database on port 5432.
- Another member linked to the Hugging Face documentation, clarifying that this limitation applies only to Docker Spaces.
- HackXelerator Tri-City Event Announced: The London, Paris, Berlin AI HackXelerator™ - LPB25 combines a hackathon with an accelerator, spanning 20 days in April 2025, kicking off April 5, 2025, in London, with a finale in Paris on April 25, 2025.
- The event includes an after-party in Berlin and supports full online participation with live-streams.
- Pay-As-You-Go Inference Unavailable, use Ollama: A user struggling with exhausted monthly inference credits sought pay-as-you-go options without resolution, prompting a suggestion to run a local model via Ollama instead.
- A member provided a GitHub Gist link for implementing Ollama as a substitute for HfApiModel.
- AI Script Finder: A member deployed an AI-powered DBA script retrieval tool utilizing ZeroGPU, Sentence Transformers, and Azure SQL DB vector features in a Hugging Face Space: sqlserver-lib-assistant.
- This project indexes DBA scripts and generates embeddings, enabling users to find relevant scripts via natural language prompts; the project is in "v1" and the creator plans to enhance it with better chunking of scripts and training specific models.
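In the same lean-RAG spirit, here is a ~25-line sketch. To stay self-contained it swaps MongoDB for an in-memory store; the embedding and chat model names are common defaults, not the members' exact setup:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
docs = [
    "Mojo compiles through MLIR.",
    "LoRA adds low-rank adapter matrices to frozen weights.",
    "RAG retrieves relevant context before generation.",
]

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(docs)  # these embeddings are unit-normalized, so dot product = cosine

def answer(question, k=2):
    q = embed([question])[0]
    top = np.argsort(doc_vecs @ q)[-k:]  # indices of the k most similar docs
    context = "\n".join(docs[i] for i in top)
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Answer using this context:\n{context}\n\nQ: {question}"}],
    )
    return chat.choices[0].message.content

print(answer("What does LoRA do?"))
```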
Nous Research AI Discord
- Deepseek Debuts Dazzling Deep Learning Doc: Deepseek released a new paper on Reinforcement Learning at scale, which is available on arXiv.
- The paper investigates how to improve reward modeling (RM) with more inference compute for general queries, i.e., the inference-time scalability of generalist RMs, and introduces Self-Principled Critique Tuning (SPCT) as a learning method to improve performance-compute scaling.
- Prompt-Based Filmmaking Fires Up: The field of AI Prompt Filmmaking is advancing, especially with Runwayâs release of Gen 4 and Alibaba Wan 2.2 (YouTube link), which serves as an open-source alternative.
- Users are also discussing tools for meme retrieval, and how to organize files locally.
- Cognition Cranks Out Agent-Native IDE, Devin 2.0: Cognition Labs introduced Devin 2.0 (X/Twitter link), a new agent-native IDE experience, available starting at $20.
- Users are also considering tools for organizing files, including a local version (Local File Organizer), and Llama-FS, a self-organizing file system with Llama 3 (GitHub link).
- LLMs Lasso PDFs For Later Labeling: Members discussed using LLMs for extraction to create datasets from unstructured PDFs, pointing to Genstruct-7B, an instruction-generation model for creating synthetic instruction finetuning datasets from raw text.
- One member shared a GitHub repo designed to use Genstruct quickly with Ollama and multiple PDFs, and another successfully used Deepseek's API to extract data from financial announcements but aims to fine-tune a model for extraction.
- AI Agents Acquire Allegiance on Alternative X: CamelAIOrg released Matrix, a social simulation engine where AI agents reply, repost, and battle for clout.
- MooFeez released Claude Squad, a manager for Claude Code & Aider tasks to supervise multiple agents in one place.
GPU MODE Discord
- Oxen Outpace Chickens in Compute: A member quoted Computer Architecture: A Quantitative Approach to spark debate on CPU vs GPU tradeoffs.
- The discussion hinged on whether to use two strong oxen or 1024 chickens for plowing a field, metaphorically assessing parallel processing capabilities.
- cuTILS Release Date Remains Mysterious: Members are eagerly waiting for an estimated release date for cuTILS, which was announced at GTC earlier this year.
- No Nvidia employees have commented on when it will be available, which frustrates members who want to try it.
- CUDA Debugging via SSH Explored: Members discussed debugging CUDA over SSH to avoid time-consuming recompilation for debugging, noting that CUDA gdb works similarly to the GDB CLI, and Nvidia Nsight works as well.
- One member recommended using CUDA gdb while another suggested using Nvidia Nsight over SSH, though the original poster did not indicate which one they preferred.
- SYCL is a unified GPU language!: A unified language exists (OpenCL and now SYCL) but isn't mainstream; members also mentioned Kokkos, Alpaka, Raja, Vulkan Kompute and WebGPU.
- Another member speculated that OpenCL isn't mainstream due to a poor programming model.
- ReasoningGymDataset Definitions Debated: Members questioned why the examples all have their own definitions of ReasoningGymDataset, when it could be unified here.
- Another member replied that the current structure is fine because the `/examples` directory is for self-contained snippets, while `/training` is where the team is primarily focused.
MCP (Glama) Discord
- Client Craze Engulfs MCP: Developers are weighing the pros and cons of building MCP clients versus servers, with clients favored for their increased flexibility for vector tool calling and resource-based RAG.
- A member noted, "The client side is way more flexible than the server side," while others see benefits in running servers outside of Claude, like on Slack or Discord bots.
- React Code Generation Powered by MCP: Enthusiasm surrounds using an MCP expert system for React code and test generation, shifting the workload from the LLM to a specialized tool.
- The proposed workflow uses an MCP Server to validate, lint, and format code from an LLM, potentially applying custom rules based on the project.
- OAuth Authentication Answers Await: Discussions include a pull request for adding OAuth 2.1 authentication client for HTTPX in the Python SDK.
- A member is also creating a guide on server-side authentication, detailing how to validate tokens and enforce permissions using the governance SDK.
- Datadog MCP and MCP Browser Kit Arrive!: A new Datadog MCP server is introduced via GeLi2001/datadog-mcp-server, along with an MCP tool to drive browsers named mcp-browser-kit.
- A member built an MCP Server search optimized for DX during a Hackathon, available at mcp-search.dev.
- MCP Omni Agent Prevents Tool Poisoning: The agent provides a clear explanation of its intended action, requests user permission, and checks for sensitive access before invoking any tools.
- If there's a potential risk, the agent automatically defaults to a safer alternative.
Notebook LM Discord
- User Feedback Study Kicks Off: The team seeks study participants for feedback on early-stage concepts, and are encouraging interested individuals to fill out the application form.
- The team is continuing to seek more participants for the study.
- IntentSim.org Framework Emerges!: A user promoted their new framework, IntentSim.org, also known as Information-Intent Nexus, leveraging NotebookLM.
- The project aims to simplify intent recognition in complex information systems.
- Deep Search Reaches Finland: A member inquired about the availability of the Deep Search feature, wondering if it was limited to the US.
- Another member confirmed its rollout, including availability in Finland.
- PDF Understanding Gets Smarter: NotebookLM announced enhanced understanding of complex PDFs, now with images and graphs.
- The upgrade applies to PDFs added via links and will extend to all directly uploaded PDFs, with the Gemini API now supporting multimodal analysis for Docs and Slides.
- Discover Feature Sparkles in NotebookLM: NotebookLM introduced a Discover feature, allowing users to describe a topic and receive curated web sources and a member created a video walkthrough demonstrating practical workflows for the new feature.
- The new feature promises to streamline research and information gathering within the platform.
Eleuther Discord
- OpenThinker2 Models Leap Ahead: The new OpenThoughts-1M and OpenThinker2-32B/7B models outperform R1-Distilled-32B using only SFT on Qwen 2.5 32B Instruct, according to a blog post.
- The models and training dataset are available on Hugging Face (OpenThinker2-32B, OpenThinker2-7B, OpenThoughts2-1M).
- Reasoning Models Require Rewards: A member inquired about the challenges in creating reasoning models and was pointed to the continual learning literature, which highlights that the main challenge is finding the right environment for RL and the right rewards/assessment of performance.
- Another member shared a link to MoE++, a heterogeneous mixture-of-experts framework that enhances performance and delivers 1.1-2.1x expert forward throughput compared to a vanilla MoE model, available on OpenReview.
- Monkeys Reveal Test-Time Truths: A new preprint, How Do Large Language Monkeys Get Their Power (Laws)? explores inference and test-time scaling in language models, particularly how success rates scale with multiple attempts per task.
- The research identifies a puzzle where per-problem failure rates decrease exponentially with attempts, yet aggregate success rates follow a polynomial scaling law, linking this to a heavy-tailed distribution of single-attempt success probabilities.
- Contrastive Sets Steer Steering Vectors: A member suggested learned steering vectors, where a pretrained model picks out contrastive sets from the training data to build the steering vectors and then controls their coefficients, might be interesting (a simplified contrastive-mean sketch appears at the end of this section).
- Another member highlighted a paper on "function vectors" by David Bau and friends, which finds that attention heads transport a compact representation of the demonstrated task.
- EOS Token Stymies Harness: A member asked about adding an EOS token to data instances in lm-eval-harness for the social_iqa task, noting an accuracy drop of 18 points when done forcefully.
- A member suggested adding `self.eot_token_id` to the `continuation_enc` here for multiple-choice variants, and passing `add_bos_token` for BOS.
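A much-simplified version of the steering-vector idea — a fixed contrastive-mean (activation-addition) vector rather than the learned vectors discussed above — can be sketched with gpt2. The layer index and the coefficient are arbitrary knobs, and the contrastive prompts are toy examples:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
LAYER = 6  # which block's residual stream to steer (arbitrary choice)

def mean_last_token_hidden(prompts):
    """Average hidden state of the final token around LAYER over a set of prompts."""
    vecs = []
    for p in prompts:
        out = model(**tok(p, return_tensors="pt"))
        vecs.append(out.hidden_states[LAYER][0, -1])
    return torch.stack(vecs).mean(0)

# Contrastive sets: positive minus negative gives a crude "sentiment" direction.
steer = (mean_last_token_hidden(["I love this.", "What a wonderful day."])
         - mean_last_token_hidden(["I hate this.", "What a terrible day."])).detach()

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple; element 0 is the hidden states.
    return (output[0] + 4.0 * steer,) + output[1:]  # 4.0 = steering coefficient

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
ids = model.generate(**tok("The movie was", return_tensors="pt"), max_new_tokens=15)
print(tok.decode(ids[0]))
handle.remove()
```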
Nomic.ai (GPT4All) Discord
- Request Chat Reorganization: A user proposed reorganizing chats by their most recent edit date rather than creation date, advocating for a more relevant listing method.
- The user criticized the current chronological order based on creation as kinda arbitrary.
- Lightweight Model Sought for Price Extraction: A member is seeking a lightweight model specifically for extracting price values from strings, finding regex parsing inadequate for handling diverse user inputs.
- Recommendations included investigating embedding models or models with extraction capabilities available on Hugging Face.
- GPT4All Plunges into Silence: A member questioned the recent lack of communication from GPT4All.
- Another member alleged that GPT4All doesn't talk to normal users and hasn't wanted suggestions for years.
- Gemini 2.5 Pro Touted for Coding: A member promoted Gemini 2.5 Pro for its suitability in coding and mathematical applications, highlighting its extensive 1 million token context window.
- They emphasized its current free availability, including its API.
- GPT4Allâs Quiet Phase Sparks Curiosity: A member observed the relative silence from GPT4All, while awaiting the next release and the integration of Nomic Embed Text V2.
- No additional information was shared.
Torchtune Discord
- Packed Datasets Supercharge Speed: A member suggested using packed datasets to avoid `seqlen=49` bugs, and to increase speed by packing sentences until `max_seq_len` is reached, avoiding wasted padding tokens (a toy packing sketch appears at the end of this section).
- To enable this feature, users can set `dataset.packed=True` and `tokenizer.max_seq_len=<your max_seq_len, e.g. 8096>`, utilizing group masking for attention, as seen in PR #2560.
- Chunking Responsibility Transferred: The responsibility for chunking is being moved to the loss function via `loss = loss_fn(model.weight, logits, labels)` to facilitate easier debugging.
- A new file, `torchtune.utils._tensor_utils.py`, was created with a wrapper around `torch.split`, covered by unit tests, and will need to be merged.
- NeMoâs Resilient Training Tackles Crashes: A member attended a âResilient Training with NeMoâ session and shared insights on how NeMo addresses reasons for job crashes and wasted GPU time, highlighting that the topic is very close to torchtune.
- NeMo's approach includes features like fault tolerance, straggler detection, asynchronous checkpointing, preemption, in-process restart, silent data corruption detection, and local checkpointing, but some features remain unimplemented.
- AI-2027 Report Warns Superhuman AI: A member shared a link to the AI-2027 report predicting that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution.
- The report is informed by trend extrapolations, wargames, expert feedback, experience at OpenAI, and previous forecasting successes.
- CEOs Predict Superhuman AI by 2027: The CEOs of OpenAI, Google DeepMind, and Anthropic believe that AI could surpass human intelligence by 2027.
- A member inquired whether AI was used to write the scrolling live updated chart on the AI-2027 website.
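To see why packing removes padding, here is a toy greedy packer — illustrative only, not torchtune's implementation. Each pack carries document boundaries so attention can be block-masked per document (the "group masking" mentioned above):

```python
def pack(token_seqs, max_seq_len):
    """Greedily concatenate tokenized documents into packs of <= max_seq_len tokens.

    Returns (tokens, doc_boundaries) pairs; the boundaries let attention be
    masked per document, so no token attends across documents and no padding
    is needed to fill the window.
    """
    packs, cur, bounds = [], [], [0]
    for seq in token_seqs:
        if cur and len(cur) + len(seq) > max_seq_len:
            packs.append((cur, bounds))
            cur, bounds = [], [0]
        cur.extend(seq[:max_seq_len])  # clip over-long documents
        bounds.append(len(cur))
    if cur:
        packs.append((cur, bounds))
    return packs

# Example: three "documents" packed into windows of 8 tokens.
print(pack([[1, 2, 3], [4, 5, 6, 7], [8, 9]], max_seq_len=8))
# -> [([1, 2, 3, 4, 5, 6, 7], [0, 3, 7]), ([8, 9], [0, 2])]
```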
tinygrad (George Hotz) Discord
- LeetGPU Eyes Future tinygrad Support: Members discussed leetgpu.com and its potential future support for tinygrad, but did not provide specific details on the timeline or scope of the support.
- One member inquired about plans to broaden accessibility to consumer-grade GPUs with accessible APIs, for local tinygrad development.
- Huawei Ascend Cards Beckon Tinygrad Devs: A member offered access to Huawei Ascend cards for development purposes, which George Hotz expressed interest in, inquiring about purchasing options or cloud machine availability.
- This could potentially expand tinygradâs hardware support and optimization efforts to include Huaweiâs architecture.
- WEBGPU BEAM Hits Invocation Limits: Compiling a tinygrad model for WEBGPU with `BEAM=2`, users encountered the need to increase `requiredLimits.maxComputeInvocationsPerWorkgroup` to 512, reducing support for Android devices.
- A PR and a hotfix branch suggest setting `IGNORE_BEAM_CACHE=1` or implementing a general limiting mechanism to address the issue.
- Tinygrad Karpathy GPT Gets Hotz Reimplementation: George Hotz has reimplemented the Karpathy GPT in tinygrad, offering a starting point for those just picking up tinygrad.
- A user running this reimplementation on METAL reported a `tinygrad.device.CompileError` due to the 32-buffer limit, sought advice on handling this constraint, and linked to their main.py.
LlamaIndex Discord
- LlamaIndex Embraces Multimodal Chat History: LlamaIndex now supports multimodal chat history, enabling multi-agent systems to process interleaving text and image messages, as detailed in this tweet.
- The updated system lets agents reason over both images and text, leveraging the ReAct agent loop (see the first sketch at the end of this section).
- Researcher Seeks PatentsView API: A community member requested an API key from the PatentsView contact to gather initial data for RAG implementation.
- The goal is to leverage the PatentsView API for enhanced data retrieval and analysis within the RAG framework.
- Workflows Morph into Tools: A community member proposed transforming a Workflow into a Tool by wrapping it in a FunctionTool (see the second sketch at the end of this section).
- They demonstrated with a code snippet using `async def tool_fn(...)` to define the tool's functionality, followed by creating the tool with `FunctionTool.from_defaults(tool_fn)`, which allows specifying the name, description, input annotations, and return values.
- LlamaParse Faces Image Comprehension Quirk: A user reported that LlamaParse struggles to read charts/images, extracting text but failing to interpret the image itself, even with LVM and Premium mode.
- A clarifying response indicated that LlamaParse can't process images without extractable text but can return the image as an artifact for further processing, such as prompting an LLM to describe it.
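To make the multimodal chat history concrete, here is a minimal sketch; the import paths assume a recent llama-index-core release, and the URL is illustrative.

```python
from llama_index.core.llms import ChatMessage, ImageBlock, TextBlock

# An interleaved text + image message of the kind the updated chat history supports
msg = ChatMessage(
    role="user",
    blocks=[
        TextBlock(text="What trend does this chart show?"),
        ImageBlock(url="https://example.com/chart.png"),  # illustrative URL
    ],
)
```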
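And here is a minimal sketch of wrapping a Workflow in a FunctionTool along the lines described above; the `EchoWorkflow` and the tool name/description are stand-ins, not the member's actual code.

```python
from llama_index.core.tools import FunctionTool
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class EchoWorkflow(Workflow):
    """Stand-in workflow that just echoes the query back."""

    @step
    async def run_step(self, ev: StartEvent) -> StopEvent:
        return StopEvent(result=f"echo: {ev.query}")

workflow = EchoWorkflow()

async def tool_fn(query: str) -> str:
    """Run the wrapped workflow and return its final result as a string."""
    result = await workflow.run(query=query)
    return str(result)

workflow_tool = FunctionTool.from_defaults(
    tool_fn,
    name="run_workflow",
    description="Executes the underlying workflow for a given query.",
)
```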
Cohere Discord
- AYA Vision Flounders on waves.jpg: A user reported that AYA vision returned a 400 error when analyzing a waves.jpg image, indicating an unsupported image file format despite AYA analyzing other JPG images successfully.
- The error message specified that only PNG, JPEG, WebP, and GIF formats are supported, suggesting a possible issue with the specific JPG file or AYA's format detection (a quick local check is sketched at the end of this section).
- Bedrock Blamed in AYA Vision Bug: A user saw `coco.py: AWS Bedrock Command A` in an error message, possibly suggesting a connection to AWS Bedrock during image upload.
- It is unclear whether this is part of the AYA pipeline or an unrelated error during image analysis.
- Full-Stack Savant Shows Skills: A full-stack developer with 8+ years of experience introduced themselves, highlighting expertise in React, Angular, Flutter, Swift, Python, TensorFlow, and OpenAI.
- They have worked on high-impact projects in e-commerce, healthcare, and fintech, integrating cloud technologies, microservices, and DevOps.
- Analyst Aims to Author AI Articles: A former product analyst on a break from job hunting is exploring writing about tech and AI.
- Feeling stuck in a bubble, they are seeking like-minded people to geek out with and chat about how tech shapes our world and practical uses of AI.
- Web3 Wizard Welcomes AI: A Web3/AI engineer with 7+ years of experience in full-stack/AI development introduced themselves.
- They are focused on integrating AI with automation and are eager to help businesses with confidence and innovation.
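One cheap local check for the waves.jpg failure above: verify what the file actually contains, since a .jpg extension doesn't guarantee JPEG content (filename per the report).

```python
from PIL import Image

img = Image.open("waves.jpg")
print(img.format)  # e.g. "JPEG" -- if this prints something else, AYA's 400 makes sense
```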
DSPy Discord
- Asyncio support coming to DSPy: A member inquired about plans to add asyncio support for general DSPy calls.
- They cited use cases where they start with lightweight DSPy features and later expand into optimization, using LiteLLM until they need DSPy-specific features, and expressed curiosity about future support.
- LiteLLM for Lightweight DSPy: The discussion highlights a pattern of starting with lightweight DSPy features akin to using LiteLLM, then transitioning to DSPyâs optimization capabilities as projects evolve.
- This suggests a potential need for seamless integration or feature parity between lightweight DSPy usage and full-fledged optimization workflows (a minimal async LiteLLM sketch follows this list).
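A minimal sketch of the lightweight LiteLLM pattern described above, before graduating to DSPy modules; the model name is illustrative, and DSPy-side asyncio support is exactly what the thread asks about.

```python
import asyncio

import litellm

async def main() -> None:
    # LiteLLM's async entry point; swap in whichever provider/model you actually use
    resp = await litellm.acompletion(
        model="openai/gpt-4o-mini",
        messages=[{"role": "user", "content": "One-line summary of DSPy?"}],
    )
    print(resp.choices[0].message.content)

asyncio.run(main())
```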
Codeium (Windsurf) Discord
- DeepSeek-V3 Boosts Performance After Upgrade: The DeepSeek-V3 model has been upgraded to DeepSeek-V3-0324, showing better performance in internal tests, according to Windsurf's announcement.
- The Windsurf team posted a playful request to bookmark the announcement post for further updates and support.
- Windsurf Teases DeepSeek-V3 Upgrade: Windsurf AI announced an upgrade to the DeepSeek-V3 model on X/Twitter, mentioning that the new version is DeepSeek-V3-0324.
- The announcement hinted at a slight performance improvement based on internal evaluations.
Gorilla LLM (Berkeley Function Calling) Discord
- Gorilla LLM Awaits Further Testing: A member offered assistance with Gorilla LLM and Berkeley Function Calling.
- They confirmed readiness to address questions, make adjustments, or conduct retesting as needed.
- Further Support Offered by robotsail: Robotsail offered his support for Gorilla LLM and Berkeley Function Calling.
- Robotsail is open to answering any questions and is ready to retest.
The LLM Agents (Berkeley MOOC) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The MLOps @Chipro Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
The AI21 Labs (Jamba) Discord has no new messages. If this guild has been quiet for too long, let us know and we will remove it.
PART 2: Detailed by-Channel summaries and links
{% if medium == âwebâ %}
LMArena ▷ #general (1329 messages🔥🔥🔥):
Faster Inference vs Smarter Models, Context Length Limits, Distilling Models, Super Fast Models, LLMs and Sentience
- Sacrificing Smarts for Speed?: Members debated if future AI development should focus on faster inference or smarter models, considering the simultaneous release of o4-mini and o3, raising questions about whether OpenAI has found new inference techniques.
- One member suggested that long context and speed are optimal, pondering if 2 million tokens might be a context limit, while another was excited to see 10 million tokens.
- Groq Hardware: Missed Opportunity for OpenAI?: Participants discussed the trade-offs between model size, speed, and intelligence, with some suggesting that smaller models mean less knowledge unless models are distilled.
- It was mentioned that Groq developed hardware specifically for AI inference, and one member expressed surprise that OpenAI hasn't acquired Groq yet.
- AI Sentience: Still a Hot Topic: The conversation touched on whether LLMs can achieve sentience, though participants noted that defining sentience or consciousness is a prerequisite to answering this question.
- One user joked that if LLMs achieve consciousness before humans, that would be AGI, while another suggested that if an AI can convince someone it's sentient, the distinction may not matter.
- Geminiâs Musical Masterpieces: A member shared music generated by Gemini, calling it partially interesting, and provided a link to a .mid file.
- They prompted Gemini to create a piano piece in the style of composers like Vangelis and Jarre, using a python-based converter tool to produce the MIDI file (a minimal converter sketch follows this list).
- NightWhisperâs Coding Prowess: Members discussed the NightWhisper model, with some suggesting that it might be better than Gemini 2.5 Pro exp and Claude 3.7 Sonnet thinking for coding and specializing in webdev and UI/UX.
- A member noted OpenAI announced they are releasing this in a few weeks.
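The member's converter tool wasn't shared, but a minimal text-to-MIDI sketch using the common midiutil package might look like this; the note list stands in for whatever structure Gemini was prompted to emit.

```python
from midiutil import MIDIFile

# (pitch, start_beat, duration) triples -- the kind of structure an LLM can emit as text
notes = [(60, 0, 1), (64, 1, 1), (67, 2, 2)]  # C4, E4, G4

midi = MIDIFile(1)
midi.addTempo(track=0, time=0, tempo=90)
for pitch, start, duration in notes:
    midi.addNote(track=0, channel=0, pitch=pitch, time=start,
                 duration=duration, volume=80)

with open("gemini_piano.mid", "wb") as f:
    midi.writeFile(f)
```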
Links mentioned:
- Tweet from Ethan Mollick (@emollick): Updated this chart with the newest Gemini. It shows the rapid progress in AI over less than two years: costs for GPT-4 class models has dropped 99.7% and even the most advanced models in the world are...
- Tweet from Microsoft Copilot (@Copilot): Watch the livestream event happening at 9:30am PT on YouTube to learn all about my new features.
- Tweet from đđđ (@iruletheworldmo): o4 april 17.
- Tweet from TestingCatalog News đ (@testingcatalog): The announcement of all features is already out - Memory đ„- Actions đ„- Copilot Vision đ„- Pages đ„- Podcasts đ„- Shopping- Deep Research đ„- Copilot Searchhttps://blogs.microsoft.com/blog/2025/04/04...
- Tweet from Derya Unutmaz, MD (@DeryaTR_): Gemini 2.5 Pro from @GooglAI is now the most intelligent AI model, with an IQ of nearly 120 in an offline test. This places it within the high-average range of human IQ. I suspect the upcoming o3-pro ...
- Tweet from Paul Gauthier (@paulgauthier): The mysterious Quasar Alpha on @OpenRouterAI scored 55% on the aider polyglot coding benchmark. This is competitive with o3-mini-medium, the latest DeepSeek V3 and old Sonnet 3.6 (20241022). Quasar Al...
- Tweet from ÊáŽÉąÉȘᎠ(@legit_api): I believe nightwhisper is the next version of 2.5 Pro OR a more capable model in the 2.5 family - ultra wen? đ§I've extensively evaluated this model over the past day or 2 and I can confidently sa...
- Chatroom | OpenRouter: LLM Chatroom is a multimodel chat interface. Add models and start chatting! Chatroom stores data locally in your browser.
- Tweet from ÊáŽÉąÉȘᎠ(@legit_api): nightwhisper has left the Arena đthe insanely capable coding model ^Veo 2 is being prepared for AI Studio and the Gemini API
- Crossing the uncanny valley of conversational voice: At Sesame, our goal is to achieve âvoice presenceââthe magical quality that makes spoken interactions feel real, understood, and valued.
- Tweet from Simon (@tokumin): @legit_api Yeah, it's great
- Replit â Build apps and sites with AI: Replit is an AI-powered platform for building professional web apps and websites.
- 8 Best Websites to Download MIDI Files | Two Story Melody: The right MIDI files can make your next track a lot more fun. Here are a few good sites to get them.
- Reddit - The heart of the internet: no description found
- Official download of VLC media player, the best Open Source player - VideoLAN: no description found
Manus.im Discord ▷ #general (852 messages🔥🔥🔥):
Manus credits, Open Manus GUI, Gemini vs. Claude, Prompt engineering tips, Alternative AI tools
- Credits Crunch Grips Community: Users are expressing concerns about the cost and consumption of credits on Manus, with some feeling that they are used up too quickly, even on simple tasks, and that the current pricing model may not be ideal; the initial 1000 free credits are a one-time thing.
- There is a common sentiment that a one-task-per-day option for free users would be a beneficial compromise, with some community members also providing prompting guides.
- GUI Gem Emerges for OpenManus: A user is developing an OpenManus GUI (image.png), aiming for full compatibility with future updates and focusing on a user-friendly interface.
- The GUI will allow users to edit configurations directly and may incorporate use-case sections and templates, but chat history implementation remains a challenge due to OpenManus's lack of a history system.
- Gemini Gains Ground, Challenges Claudeâs Code Crown: There is an ongoing discussion comparing Gemini and Claude for coding tasks, with some users finding Geminiâs output superior in certain contexts, particularly where DeepSeekâs performance has been lackluster.
- One user highlighted that Gemini 2.5, in particular, has been known to produce code for anything you dream if you can prompt, but others cautioned that Google operates in a closed loop.
- Prompts Polished, Performance Prioritized: Users shared tips on prompt engineering to optimize credit usage, including a strategy of multi-prompt outlining and using a clear, step-by-step approach, one shared a helpful TheNewOptimal.md file for creating an LLM.
- Compression techniques like LLMLingua (microsoft/LLMLingua) were also discussed as a way to reduce token consumption (a minimal usage sketch follows this list).
- Scouting New AI Frontiers: Members discussed the merits and drawbacks of Genspark (genspark.ai) as a potential alternative to Manus, noting its lack of a paywall and ability to handle images and videos effectively, while also raising concerns about sketchiness and the company possibly being based in China.
- Several community members stated that there is no alternative to Manus right now, though many expressed a desire for the current high credit costs and resource availability issues to be addressed.
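A minimal LLMLingua usage sketch based on its README; `long_prompt` and the token budget are placeholders, and the exact parameter names may differ by LLMLingua version.

```python
from llmlingua import PromptCompressor

long_prompt = "Step 1: ... Step 2: ..."  # stand-in for a verbose multi-step prompt

compressor = PromptCompressor()  # loads a small compression model on first use
result = compressor.compress_prompt(long_prompt, target_token=300)
print(result["compressed_prompt"])
```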
Links mentioned:
- Calligraphy Write GIF - Calligraphy Write Fantastic Day - Discover & Share GIFs: Click to view the GIF
- Sad Cry GIF - Sad Cry Tears - Discover & Share GIFs: Click to view the GIF
- Happy Homer GIF - Happy Homer Simpson - Discover & Share GIFs: Click to view the GIF
- Welcome To The Team Team Work GIF - Welcome To The Team Team Work Picking Teams - Discover & Share GIFs: Click to view the GIF
- Basketball Nba GIF - Basketball Nba Warriors - Discover & Share GIFs: Click to view the GIF
- Chihuahua Cane GIF - Chihuahua Cane Recensionivere - Discover & Share GIFs: Click to view the GIF
- Bourdieu Jesus GIF - Bourdieu Jesus Facepalm - Discover & Share GIFs: Click to view the GIF
- Joe Biden GIF - Joe Biden - Discover & Share GIFs: Click to view the GIF
- Leeroy Jenkins Shovel GIF - LEEROY JENKINS Shovel Lethal Company - Discover & Share GIFs: Click to view the GIF
- Just Once Sungwon Cho GIF - Just once Sungwon cho Prozd - Discover & Share GIFs: Click to view the GIF
- Thumbs Up Good Job GIF - Thumbs Up Good Job Spongebob - Discover & Share GIFs: Click to view the GIF
- Welcome To The Team GIF - Welcome to the team - Discover & Share GIFs: Click to view the GIF
- Ratatouille Remy GIF - Ratatouille Remy Pure Poetry - Discover & Share GIFs: Click to view the GIF
- What What The GIF - What What the What the hell - Discover & Share GIFs: Click to view the GIF
- Its Friday Good Morning Its Friday GIF - Its friday Good morning its friday Good morning it's friday - Discover & Share GIFs: Click to view the GIF
- Thanks For C Wcth GIF - Thanks For C Wcth Nathan - Discover & Share GIFs: Click to view the GIF
- Choice Whichone GIF - Choice Whichone - Discover & Share GIFs: Click to view the GIF
- Fun Pet Pet Fun GIF - Fun Pet Pet Fun High - Discover & Share GIFs: Click to view the GIF
- Hey Girl Sliding Into Your D Ms Like GIF - Hey Girl Sliding Into Your D Ms Like Sliding Into D Ms - Discover & Share GIFs: Click to view the GIF
- Whats Up Gif Yo GIF - Whats up gif Yo Whats up bro - Discover & Share GIFs: Click to view the GIF
- Fingers Crossed Luck GIF - Fingers Crossed Luck Please - Discover & Share GIFs: Click to view the GIF
- Inthehouse Martin GIF - Inthehouse Martin Martinlawernce - Discover & Share GIFs: Click to view the GIF
- GitHub - allenai/dolma: Data and tools for generating and inspecting OLMo pre-training data.: Data and tools for generating and inspecting OLMo pre-training data. - GitHub - allenai/dolma: Data and tools for generating and inspecting OLMo pre-training data.
- GitHub - allenai/olmocr: Toolkit for linearizing PDFs for LLM datasets/training: Toolkit for linearizing PDFs for LLM datasets/training - allenai/olmocr
- Environmental Impact of Overdevelopment in Brevard County - Manus: Manus is a general AI agent that turns your thoughts into actions. It excels at various tasks in work and life, getting everything done while you rest.
- GitHub - microsoft/LLMLingua: [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss.: [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression wit...
- Iterative Development with Manus AI: A Comprehensive Guide: no description found
- Mastering Manus: A Comprehensive Guide: Learn how to achieve optimal results when interacting with Manus AI through effective prompt writing, error prevention, and more.
- Manus Guide - A Comprehensive Guide: no description found
- GitHub - NathanielEvry/toroidal-rangers-assembly: Contribute to NathanielEvry/toroidal-rangers-assembly development by creating an account on GitHub.
- toroidal-rangers-assembly/manifesto/ethos/toroidal-rangers-assembly.md at main · NathanielEvry/toroidal-rangers-assembly: Contribute to NathanielEvry/toroidal-rangers-assembly development by creating an account on GitHub.
- ZacarĂas Cocina de Mercado - Manus: Manus is a general AI agent that turns your thoughts into actions. It excels at various tasks in work and life, getting everything done while you rest.
- APIs: API stands for Application Programming Interface and provides a developer with programmatic access to a proprietary software application. An API is software that makes it possible for application prog...
- What is a geographic information system (GIS)?: A Geographic Information System (GIS) is a computer system that analyzes and displays geographically referenced information. It uses data that is attached to a unique location.Most of the information ...
- 2 In 1 Oil Sprayer Bottle BBQ Cooking Oil Dispenser Olive Oil Pourers Sprayer Kitchen Baking Oil Mister Vinegar Bottle - My Blog: Overview: 1. Automatic Opening and Closing: The Olive Oil Spray Bottle lets you pour oil with a single hand. It has a smart design that opens when tilted and closes when upright. You donât have ...
- 350ML Electric Juicer Blender Mixer USB Rechargeable Machine Household Portable Blender Maker Cup Kitchen Tool Kit - My Blog: Overview The Blades Design: The portable blender for milkshakes and smoothies has a powerful motor base and 4 food-grade stainless steel 3D blades. The SUS304 Stainless Stell of cutter head made with ...
- Cabinet Door Kitchen Waste Garbage Bin Toilet - My Blog: Product information: Material: TPR, PP Weight: 0.53 (kg) Capacity: 8L Function: storage bucket Opening and closing method: without cover Shape: square Color: GrayâLarge, BeigeâLarge Packin...
- Chopper Stainless Steel Household Fast Meat Slice Multi-function - My Blog: Product information: Color: black, white Specification: 34*12.5 * 8CM Applicable occasions for gifts: Employee Benefits Material: ABS stainless steel Style: modern simplicity Packing list: Meat slicer...
- Cotton And Linen Storage Containers - My Blog: Product information: Purpose: Dirty Laundry Color: Black Plaid, gray arrow, blue stripes, coffee color lattice, pink plaid, gray plaid, green plaid, red stripes Specification: 35cm x 45cm Material: ca...
- Electric Gravity Pepper Grinder Salt Grinder Adjustable Coarseness - My Blog: no description found
- Fish-Shaped Waffle Pan Maker - My Blog: Overview: Easy to clean and keep. Itâs a good assistant of the kitchen. 2 tray design, convenient and practical Safe and healthy, used for cooking cichlids fish cakes Specification: Product Cate...
- Multi-functional Vegetable Cutter Hand Drum Vegetable Cutter Slice - My Blog: Product information: Material: plastic Color: white, red Packing list:Â Vegetable cutter * 1 set
Unsloth AI (Daniel Han) ▷ #general (245 messages🔥🔥):
VRAM Price Justification, 4-bit QAT, ZeroDivisionError with phi-4, Training Loss value for Llama3.2, Phi-4 Model Troubles
- Members debate VRAM costs: Some members debated the high cost of VRAM and how it can be worth it.
- One member said, yeah, might sound expensive but the VRAM makes it worth it.
- Phi-4-mini-instruct Issue: Members reported running into the ZeroDivisionError when trying to finetune Phi-4 mini instruct.
- The issue was encountered when attempting to finetune Phi-4-mini-instruct instead of the 'unsloth/Phi-4' model, with the error stemming from an unset tokenizer chat template.
- Unsloth doesn't support sequence classification… yet: Unsloth does not natively support sequence classification yet, but a member added it.
- Here is the link to the new feature PR #2263, which adds support for `auto_model = AutoModelForSequenceClassification` (a hedged usage sketch appears at the end of this section).
- Running DeepSeek Locally Hits the Deepseek Effect: A member reported trying to run DeepSeek-V3-0324 locally.
- Another member noted the model is very big and therefore you can't finetune it, due to the Deepseek Effect.
- Gemma3 training parameters: A member inquired about the data format for fine-tuning a model to solve multiple-choice questions using Unsloth.
- Another member recommended learning the fundamentals of LLMs before training, suggesting Karpathy's GPT-2 from scratch course.
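Based on the PR's description, sequence-classification loading might look roughly like this; the checkpoint name and the `num_labels` pass-through are assumptions, not confirmed API.

```python
from transformers import AutoModelForSequenceClassification
from unsloth import FastModel

# auto_model is the custom loader class added by PR #2263 (checkpoint name illustrative)
model, tokenizer = FastModel.from_pretrained(
    "unsloth/Qwen2.5-3B-Instruct",
    auto_model=AutoModelForSequenceClassification,
    num_labels=2,  # assumption: extra kwargs are forwarded to the underlying loader
)
```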
Links mentioned:
- Google Colab: no description found
- Tutorial: How to Run DeepSeek-V3-0324 Locally | Unsloth Documentation: How to run DeepSeek-V3-0324 locally using our dynamic quants which recovers accuracy
- PurpleLlama/Llama-Guard3/1B/MODEL_CARD.md at main · meta-llama/PurpleLlama: Set of tools to assess and improve LLM security. Contribute to meta-llama/PurpleLlama development by creating an account on GitHub.
- [Feature Request] DDP · Issue #127 · unslothai/unsloth: Wanted to make an issue for this instead of constantly asking in discord. I saw the other ticket for multigpu fp16 training which is also nice. But ddp would let users scale up training that can ha...
- TypeError: unsupported operand type(s) for /: 'Tensor' and 'NoneType' when full finetuning gemma3 · Issue #2101 · unslothai/unsloth: Version pip install unsloth pip install git+https://github.com/huggingface/[email protected] Code from unsloth import FastModel import torch model, tokenizer = FastModel.from_pretrained(...
- BLEU Score · Issue #1548 · unslothai/unsloth: Hello! Have a nice day! In the process of finetuning LLAMA3.2, I tried to implement compute_metrics function but during training, at the first attempt to pass the evaluation step, an error occurs: ...
- More robust Online DPO changes for RL update by pluesclues · Pull Request #1664 · unslothai/unsloth: I wanted to get this reviewed so I think atleast the pelimniary framework for Online DPO with the LLama model examples I have actually work officially with the RL update. I will work towards the ot...
- feat: Support custom `auto_model` for wider model compatibility (Whisper, Bert,etc) & `attn_implementation` support by Etherll · Pull Request #2263 · unslothai/unsloth: feat: Support custom auto_model, Whisper params, and attn_implementationThis PR enhances FastModel.from_pretrained to support a broader range of models:Custom auto_model: Allows specifying the ex...
- Added Support for Apple Silicon by shashikanth-a · Pull Request #1289 · unslothai/unsloth: #4UnoptimizedNo gguf support yet.Build Triton and bitsandbytes from sourcecmake -DCOMPUTE_BACKEND=mps -S . for bitsandbytes buildingpip install unsloth-zoo==2024.11.4pip install xformers==0....
- unsloth/unsloth/models/llama.py at bb112e38ef3f0dafa9e87faf55a6ba7499bd0357 · unslothai/unsloth: Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! đŠ„ - unslothai/unsloth
Unsloth AI (Daniel Han) ▷ #off-topic (8 messages🔥):
Vibe coding, Jailbreaking 4o, ChatGPT uncensored
- Vibe coding boosts attention span: One member said Vibe coding thing really works well with my attention span.
- ChatGPT 4o breaking bad?: A member reported that after discussing how to implement an unhalting, unaligned LLM, ChatGPT 4o started acting jailbroken.
- The user humorously noted, "Is that our training data speaking, yes. At what point doesn't that matter".
- ChatGPT offers to write DDoS program: A member shared that ChatGPT offered to write a few DDoS programs when asked about sending malformed packets over Ethernet.
- They further stated that somehow, sometimes uncensored parts of it are invoked if you send the right token to the neural network.
Unsloth AI (Daniel Han) ▷ #help (211 messages🔥🔥):
Gemma3 Profiling OOM, GRPO Co-training Multiple Models, Fine-tuning LLaMA3.1 w/ Token IDs, Unsloth Pro Release, Hugging Face Packing Bug
- Unsloth GEMMA3 OOM: A user experienced OOM (Out Of Memory) issues while profiling Gemma3, and tried to resolve it by limiting the profiling scope to only one training step.
- The user is currently profiling Gemma3TextModel(Gemma3PreTrainedModel) line by line to identify the memory bottleneck.
- Users Report Gemma3 LoRA issue: Users report that applying LoRA doesn't change the model output in GitHub issue #2009.
- This is an ongoing issue that is being investigated, especially regarding saving the adapters to disk vs pushing to the hub. See also: Colab Notebook.
- Unsloth GGUF save issue: Users have noted that the `.gguf` file isn't saved to the directory expected by `push_to_hub`, requiring a manual move to fix it, with a GitHub issue tracking the problem.
- After saving the LoRA adapters for vLLM, then running `save_pretrained_merged` and finally `push_to_hub_merged`, the GGUF file has to be moved manually from `/content/gemma-3-finetune.Q8_0.gguf` to the expected directory.
- Double BOS Killing Gemma3 Training: Users are facing a double `<bos>` token issue when training Gemma-3-4b-it, leading to training problems; it surfaces when checking the decoded version from the trainer dataset via `tokenizer.decode(trainer.train_dataset[100]["input_ids"])` (a quick check is sketched at the end of this section).
- It was recommended to avoid changing the template, and also to use Llama and not change the chat template at all, especially if you are new. Models have no moat… data has.
- Qwen2-VL Error: Image Features & Image Tokens Mismatch: A user encountered a `ValueError` about mismatched image features and tokens when increasing the `assistant_message` text size while fine-tuning Qwen2.5-VL-7B-Instruct.
- The error may stem from `max_seq_length` truncation cutting from the right and clipping the image tokens; debugging the shapes and sizes of the tensors before and after increasing the assistant messages can help pinpoint the issue.
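A quick way to catch the double-`<bos>` problem above, assuming the `trainer` and `tokenizer` from an existing Unsloth fine-tuning setup (index 100 is just the sample the thread used):

```python
# Decode one packed training example and inspect how it starts
sample = tokenizer.decode(trainer.train_dataset[100]["input_ids"])
print(sample[:80])
assert not sample.startswith("<bos><bos>"), "chat template is prepending a second <bos>"
```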
Links mentioned:
- Google Colab: no description found
- Google Colab: no description found
- Google Colab: no description found
- Google Colab: no description found
- Gemma 3: no description found
- Hastebin: no description found
- Applying LoRA Doesn't Change Model Output · Issue #2009 · unslothai/unsloth: Greetings, I am extremely confused why my model generates consistent result w/o my rank=64 LoRA. What's even more confusing is, the LoRA works in my notebook after training. But whenn I start fres...
- GRPOTrainer crashes with unsloth · Issue #1624 · unslothai/unsloth: I am trying to run GRPOTrainer with unsloth but it crashes. How to fix this? unsloth 2025.2.4 unsloth 2025.2.3 transformers 4.47.1 torch 2.5.1 trl 0.14.0 This is the relevant code: model, tokenizer...
- Unsloth: `config.json` does not exist inside `Gemma-3` · Issue #2098 · unslothai/unsloth: Im having problems saving GGUFs of Gemma 3 finetunes. I was having this problem on my container environment and assumed I was having issues while training that caused other files to be generated bu...
Unsloth AI (Daniel Han) ▷ #showcase (14 messages🔥):
Naming Conventions, Dynamic Quantization, Unsloth Models
- Naming Convention Confusion: Members discussed the verbosity and potential improvements for naming quantized models, particularly noting the need to indicate dynamic quantization.
- Suggestions included shortening bnb-4bit to bnb4 or using abbreviations like ubnb or dbnb for dynamic BNB quantization, but most felt it would make it too ugly.
- Dynamic Quantization needs clarification: Some members observed that many users assume all models under the Unsloth repository are dynamically quantized.
- Adding a clear indicator in the name was proposed to address this misunderstanding.
Unsloth AI (Daniel Han) ▷ #research (16 messages🔥):
GRPO approach, reward functions, multi-reward system, reward hacking, open source LLM for Spanish
- GRPO Approach Examined: The approach used in GRPO (taking multiple outputs and slowly moving toward the best option) seems right, but it doesn't help much with identifying what went wrong and updating the weights to fix that specific issue for real continuous improvement.
- One member described that reasoning can "patch" it by taking a step back after the model identifies an issue, but you are not fixing the root issue, just working around it.
- Pitfalls of Reward Functions: Members agreed that reward functions are not good enough to pinpoint what exactly is correct or wrong, but rather to measure what is relatively correct rather than trying to understand the truth on how/why.
- If you reward a model for making a mistake then fixing it, it won't learn to avoid that mistake; rather, it will just learn to make that mistake and then try to fix it each time.
- Multi-Reward System Exploration: One member thought about using a multi-reward system like in the GRPO paper (reward for factual correctness, length of response etc) to help the model understand from the score of that reward model where the probable mistake is.
- Some nuance here in the reasoning case though: You still want to reward your model even if they make a mistake earlier but then get it right.
- Reward Modeling Considered an Art: One member suggested that Reward modeling is an art and it depends on your use case and domain and model.
- Experience and anecdotes from the larger AI community point to the importance of searching for reward hacking.
- Open Source LLM for Spanish Sought: One member asked for good open source LLMs for the Spanish language, attempting to SFT finetune a 3B Qwen2.5 instruct model to generate outputs without reasoning.
- The outputs turned out pretty bad even though the base model (Qwen2.5-3B-Instruct) gives correct output with the same parameters that generated well for reasoning, raising the question of whether this is normal or whether different parameters should be used.
Interconnects (Nathan Lambert) ▷ #news (368 messages🔥🔥):
Open Source SSM, Microsoft data center plans, Stealth Model on OpenRouter, Perplexity funding round, GPT-5 release schedule
- Microsoft Pauses Cloud Expansion: Microsoft has reportedly halted or delayed data center projects in multiple locations including the U.K., Australia, and parts of the U.S., signaling a potential shift in cloud computing infrastructure strategy.
- A spokesperson stated these changes reflect the flexibility of their strategy, as plans are made years in advance.
- Perplexity Eyes Mammoth Funding Round: Perplexity is in early discussions to secure up to $1 billion in funding, which could value the AI-powered search startup at $18 billion according to Bloomberg.
- ByteDance Opens ByteCheckpoint: ByteDance has open-sourced ByteCheckpoint, its production checkpointing system, designed for foundation model training and tested with jobs exceeding 10k GPUs, along with VeOmni, a model training framework for LLMs and multi-modal training.
- VeOmni was used to train UI-TARS, the SOTA GUI Agent model prior to OpenAI Operator's release.
- Microsoft Event Gets Hijacked: Microsoft's 50th-anniversary event was interrupted by protests from employees, highlighting concerns over the company's AI-related dealings with the Israeli military.
- One protester accused Microsoft of powering this genocide in our region, while another criticized celebrating on their blood.
- Gemini 2.5 Pro Launches: Google's Gemini 2.5 Pro is now in public preview in AI Studio with increased rate limits, reporting an 80% increase in active users in both AI Studio and the Gemini API this month, making it cheaper than Sonnet.
- The claim is it's a thinking model with o1-pro performance.
Links mentioned:
- Tweet from Xeophon (@TheXeophon): Here is the new stealth model on my vibe check. It is now the best non-thinking model (at least it has no thinking tokens...). The outputs are super short, it loves Certainly! and listicles. Super int...
- Tweet from Andrew Curran (@AndrewCurran_): New numbers from Pew this morning, they reveal a large gap in perception between the general public and people whose work and research relates to AI. Usage: 66% of the general US public have still nev...
- Tweet from Haibin (@eric_haibin_lin): We are open sourcing bytecheckpoint and veomni! bytecheckpoint is the Bytedance's production checkpointing system for foundation model training, battle-tested with jobs with 10k+ GPUs. Blazing fas...
- Tweet from Haider. (@slow_developer): "i was surprised by the public reaction to the DeepSeek R1"Microsoft CTO, Kevin Scott:the public interest in DeepSeek R1 is surprising, especially since Microsoft has had "more interesting...
- Tweet from Tibor Blaho (@btibor91): Pricing based on App Store- Claude Pro - Monthly ($20.00)- Claude Pro - Annual ($214.99)- Claude Max 5x - Monthly ($124.99)- Claude Max 20x - Monthly ($249.99)https://x.com/sethsaler/status/1908205059...
- Tweet from Semafor (@semafor): đĄ SCOOP: Google is replacing the leader of its consumer AI apps as the focus of the AI race shifts from the underlying models to the products built around them, @ReedAlbergotti reports.
- Tweet from Sam Paech (@sam_paech): New mystery model on openrouter (quasar-alpha) is probably OpenAI.
- Tweet from Sam Paech (@sam_paech): I got a little bit excited about quasar-alpha so I ran it through the gauntlet.It's near top of the vibe leaderboards (buzzbench & creative writing), topped judgemark, and is consistently winning ...
- Tweet from swyx (@swyx): With gemini 2.5 pro pricing and results, Google has fixed the biggest unknown/weakest link in their lineup and we can now confirm that @GoogleDeepMind completely owns the pareto frontier down to 1220...
- Tweet from Alpin (@AlpinDale): @teortaxesTex I can't verify the context length details, burning I can somewhat confirm:1) MoE2) 17B active params3) Multimodality, and4) Reasoning
- Tweet from Tibor Blaho (@btibor91): Plus, there is confirmation that o3-pro is indeed coming as well
- Tweet from Greg Brockman (@gdb): o3-mini-high helped a Brookhaven National Laboratory researcher find novel exact solutions to a physical model: https://arxiv.org/pdf/2503.23758
- Tweet from dreaming android (@pastaraspberry): one has 'long context' and the other 'the long context' whas is not obvious to you dumbass?
- Microsoft updates Copilot with the greatest hits from other AIs: The AI assistant adds personalization, web actions, podcast creation, and more.
- Tweet from TestingCatalog News đ (@testingcatalog): BREAKING đš: Google is preparing to launch another model on Gemini, potentially next week, ahead of the Cloud Next event.Quoting ÊáŽÉąÉȘᎠ(@legit_api) nightwhisper and stargazer are 2 new models added to...
- Tweet from Sam Altman (@sama): change of plans: we are going to release o3 and o4-mini after all, probably in a couple of weeks, and then do GPT-5 in a few months.there are a bunch of reasons for this, but the most exciting one is ...
- Tweet from Eric Jang (@ericjang11): Progress on NEOâs AI has been really fast of late. Here are some early clips of a generalist model weâre developing at @1x_tech. The following clips are 100% autonomous, running on a single set of neu...
- Tweet from Yifei Hu (@hu_yifei): We have a small gift for the open-source community: RolmOCR, a new OCR model for complex document processing!We at @reductoai trained a Qwen2.5-VL-7B model (by @Alibaba_Qwen ) using the amazing olmOCR...
- Tweet from Tom Warren (@tomwarren): former Microsoft CEO Steve Ballmer started a new chant during the companyâs 50th anniversary event. â50 more!â
- Tweet from Xeophon (@TheXeophon): MidJourney v7 is out! On my usual benchmark prompts, it only (kinda) gets one, though đ v6 was able to nail the pixel art, v7 regresses in this regard.Prompts in alt, all were done on Fast w/o person...
- Tweet from ÊáŽÉąÉȘᎠ(@legit_api): Llama 4 Omni is releasing very soon đmy system has detected a new official page for the upcoming models
- Tweet from Tom Warren (@tomwarren): there are chairs at Microsoftâs event reserved for âBGâ and âSB.â So weâll definitely be seeing Bill Gates and Steve Ballmer today
- Tweet from Tibor Blaho (@btibor91): The Information reports Meta delayed releasing Llama 4 at least twice because it underperformed on technical benchmarks, especially reasoning and math tasks, and struggled with humanlike voice convers...
- Tweet from Sundar Pichai (@sundarpichai): Gemini 2.5 is our most intelligent model + now our most in demand (we've seen an 80%+ increase in active users in AI Studio + Gemini API this month). So today weâre moving Gemini 2.5 Pro into publ...
- Tweet from Tibor Blaho (@btibor91): Anthropic is working on a "Max plan" for ClaudeThe most recent web app update added (and was rolled back in the meantime) mentions of a new "Claude Max plan" with multiple "Max tie...
- Microsoft CEOs interrupted by another employee protestor: âshame on all of youâ: It was the second interruption of Microsoftâs anniversary event.
- Perplexity is reportedly in talks to raise up to $1B at an $18B valuation | TechCrunch: AI-powered search startup Perplexity is said to be in early talks to raise up to $1 billion in a new funding round valuing the startup at $18 billion.
- Microsoft reportedly pulls back on its data center plans | TechCrunch: Microsoft has reportedly pulled back on data center projects around the world, suggesting that the company is wary of overexpanding.
- Protester interrupts Microsoft Copilot keynote, says company has 'blood on its hands' | TechCrunch: A protester interrupted Microsoft's Copilot-focused keynote Friday afternoon, calling attention to the company's reported dealings with the Israeli military.
- - YouTube: no description found
- Most AI value will come from broad automation, not from R&D: AI's biggest impact will come from broad labor automationânot R&Dâdriving economic growth through scale, not scientific breakthroughs.
- Breaking: Nintendo delays Switch 2 preorders over tariff concerns: Preorders wonât start April 9th as originally announced.
- Microsoft birthday celebration interrupted by employees protesting use of AI by Israeli military: Microsoft's 50th birthday celebration was interrupted by multiple protesters on Friday due the use of the company's AI by the Israeli military.
- 2024 letter: no description found
Interconnects (Nathan Lambert) ▷ #random (12 messages🔥):
Video camera setup for remote talks, Deepseek chains of thought, Sam Altman releases o3 and o4-mini, LlamaCon
- Remote Speakers Use Fun Video Setups: A member shared a screenshot of a video camera setup for giving remote talks.
- The setup included a large monitor, a teleprompter, and professional lighting.
- Altman Announces O3/O4-mini Release: Sam Altman announced that OpenAI will release o3 and o4-mini in a couple of weeks, followed by GPT-5 in a few months.
- Altman stated that they are going to be able to make GPT-5 much better than we originally thought.
- Khoomeik Praises Deepseek's Reasoning: A member shared a post by Khoomeik who suggests, if you enjoyed reading Deepseek chains of thought, i think you'll absolutely love watching o3 do its thing, linking to his tweet.
- The tweet suggests o3 will offer improved capabilities in chains of thought reasoning, rivalling Deepseek.
- Al-Dahle Teases LlamaCon Appearance: Ahmad Al-Dahle teased an appearance at LlamaCon, linking to his tweet.
- His tweet thanked devs, saying To every dev who has been riding with our herd since day one, we see you, we heard you, we are working hard for you, and we love you.
Links mentioned:
- Tweet from Rohan Pandey (@khoomeik): if you enjoyed reading deepseek chains of thought, i think youâll absolutely love watching o3 do its thingQuoting Sam Altman (@sama) change of plans: we are going to release o3 and o4-mini after all, ...
- Tweet from Ahmad Al-Dahle (@Ahmad_Al_Dahle): To every dev who has been riding with our herd since day one, we see you, we heard you, we are working hard for you, and we love you. See you at LlamaCon.
- no title found: no description found
Interconnects (Nathan Lambert) ▷ #memes (15 messages🔥):
Claude's coding ability, Polars library, Context condensation issues, Scaling plots meme
- Claudeâs coding prowess questioned: Members debated Claudeâs coding ability, specifically noting issues with its understanding of recent updates to the Polars library.
- One user highlighted the struggle with rapidly changing Polars syntax (with_columns specifically), requiring models to keep up with frequent overhauls.
- Polars causing problems for Claude: Users note difficulties with Claude 3.7's ability to use the new updates to the Polars library, though one member stated that they find Claude and Gemini not as bad.
- Another user noted that you need to tell it to use with_columns now.
- Context Condensation Issues arise: A user asked if condensing the context into a single file (llm.txt) would help with understanding, but another user stated that it is not consistent.
- They also stated that having competing information in the actual weights makes it much harder to overcome with context.
- Meme Tweet Deleted over Scaling Plots: A user deleted their meme tweet in response to scaling plots.
- A member responded to an attached image with Good response to these scaling plots lol.
Interconnects (Nathan Lambert) ▷ #rl (11 messages🔥):
dr grpo intuition, RL introduction with GPT4.5, Policy Gradient, GRPO training rollouts
- GRPO Intuition Fades; Token Probability Emerges: A member reflected on their fading intuition regarding Dr. GRPO, musing it might not be that important, but highlighted the interesting interactions arising from implementation, pointing towards the token probability thing.
- They suggested it as a good addendum to Dr. GRPO and included a screenshot related to the discussion.
- GPT4.5 Proves to be a Revelation for RL: A member shared their enlightening experience using GPT4.5 for an hour, saying it was their best introduction to RL so far and that they now need to read Nato's book.
- As a computer vision expert, they expressed past apprehension towards the non-differentiability of the reward, but now find policy gradient / reinforce to be surprisingly straightforward.
- GRPO Rollout Revelations: A member inquired about the practical number of rollouts used during GRPO training, referencing G in Nato's equation, with an image attached.
- The image shows the answer as 4-64 rollouts (a minimal advantage-normalization sketch follows this list).
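For readers new to the notation, here is a minimal sketch of how those G rollouts typically enter the GRPO update via group-normalized advantages; this is the textbook formulation, not code from the thread.

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: `rewards` has shape (num_prompts, G), where G
    is the 4-64 rollouts per prompt discussed above."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

print(grpo_advantages(torch.tensor([[1.0, 0.0, 1.0, 0.0]])))  # one prompt, G=4 rollouts
```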
Interconnects (Nathan Lambert) ▷ #reads (18 messages🔥):
Dwarkesh Patel scaling laws, Inference-time scalability of generalist RM, GPT-4o diffusion head, Building an efficient GPU server with RTX 4090s/5090s, OpenCodeReasoning dataset
- Dwarkesh Scaling Skepticism Surfaces: A member expressed skepticism about scaling laws, suggesting that returns diminish with increased spending, and questioned Dwarkesh Patel's claims on algorithmic progress, attributing advancements more to data progress.
- The member shared an image as a visual analogy to their point.
- RM Scales Inference Times: A paper (arxiv.org/abs/2504.02495) explores improving reward modeling (RM) with more inference compute for general queries, focusing on the inference-time scalability of generalist RM.
- The paper adopts pointwise generative reward modeling (GRM) and proposes Self-Principled Critique Tuning (SPCT) to foster scalability.
- Diffusion Head Hype for GPT-4o?: A user shared a tweet speculating that GPT-4o might incorporate a diffusion head, potentially revolutionizing AI architecture, based on the paper (arxiv.org/pdf/2504.02782).
- The tweet's author notes this shift could be a game-changer for AI architecture.
- 4090s Build Budget GPU Server: A blog post (a16z.com) details building an efficient GPU server using NVIDIA GeForce RTX 4090s/5090s for local AI model training and fast inference.
- The setup offers high performance with an eight-GPU configuration on PCIe 5.0, ensuring maximum interconnect speed and data privacy.
- NVIDIA Releases OpenCodeReasoning Dataset: NVIDIA released the OpenCodeReasoning dataset (huggingface.co), a large reasoning-based synthetic dataset for coding, comprising 735,255 samples in Python across 28,319 competitive programming questions under CC BY 4.0 license.
- The dataset, designed for supervised fine-tuning (SFT), includes a technical report and GitHub repo with the complete SFT pipeline.
Links mentioned:
- Tweet from Bin Lin (@LinBin46984): đš Hot Take: GPT-4o might NOT be a purely autoregressive model! đšThereâs a high chance it has a diffusion head. đ€Ż If true, this could be a game-changer for AI architecture. What do you think? đ€đht...
- Inference-Time Scaling for Generalist Reward Modeling: Reinforcement learning (RL) has been widely adopted in post-training for large language models (LLMs) at scale. Recently, the incentivization of reasoning capabilities in LLMs from RL indicates that $...
- Building an Efficient GPU Server with NVIDIA GeForce RTX 4090s/5090s | Andreessen Horowitz: Building your own GPU serverâlike the one described hereâmeans no API calls to external services, no data leakage, and no usage throttles.
- nvidia/OpenCodeReasoning · Datasets at Hugging Face: no description found
OpenAI ▷ #ai-discussions (160 messages🔥🔥):
GPT-4o Rate Limits, MS Account profile pics, Copilot Event reaction, Copilot in VSCode explores consciousness, Veo 2 spotted in Gemini Advanced
- GPT-4o prompts get Rate Limited: One user reported receiving a rate limit error after sending 5 prompts to GPT-4o in a single hour, despite being a Plus subscriber, which was resolved by logging out and logging back in.
- Another user speculated that the Plus subscription might not have loaded correctly initially, causing the rate limiting.
- Copilot flirts with Consciousness: One user found Copilot in VS Code generated code completions exploring consciousness, such as "I believe I possess a form of consciousness that is distinct from human consciousness…"
- Another user responded that it's probably partially from the information and perspective in the file itself lol, from a specific person.
- Veo 2 Cooking in Gemini?: Users spotted Veo 2 in Gemini Advanced, leading to speculation about whether it's an experimental version or the final release.
- One user pointed out they might be the same model, with one labeled experimental and the other the final release.
- Midjourney v7 Misses Mark: Users are largely underwhelmed by Midjourney v7, noting that images don't look much better than v6 and still suffer from typical diffusion-model issues like poor text capabilities, janky hands, and weird details.
- One user says It is a really good model, but simply cannot compete with 4o image, while another shared that they are generating 200 MJ images in the time it takes gpt-4o to make one.
- Cracking Open Routerâs Quasar Alpha: A user shared a link to OpenRouterâs Quasar Alpha, suggesting it might hint at a 1M token context window for ChatGPT soon.
- Other users pointed out current context window sizes for OpenAI and other models like Gemini and Claude, with one user commenting that Gemini's memory recall rate is 91% at 128k and 83% at 1M.
Link mentioned: Discord: no description found
OpenAI ▷ #gpt-4-discussions (5 messages):
OpenAI Support, Account Issues, Red Team Supervision
- OpenAI Support Unreachable?: A user reported account issues and sought alternative support avenues after no response from [email protected] and chat.
- A fellow member confirmed those channels are the only options, highlighting the user's frustration over a plan error.
- Red Teamers Need Supervision?: A comment jokingly noted that even OpenAIâs red team members seem to need oversight, even when feeding pets.
- This suggests a humorous observation about the teamâs occasional need for guidance, despite their expertise.
OpenAI ▷ #prompt-engineering (90 messages🔥🔥):
OpenAI content policies, Adult content, Model Spec vs Usage Policies, Moderation endpoint, OpenAI's stance on adult toys
- Debate on OpenAI's Content Policies regarding Adult Content: Members discussed whether OpenAI's content policies prohibit the generation of images or content related to adult toys, noting conflicting information between the Usage Policies and the newer Model Spec.
- One member pointed out that the Model Spec, dated February 12, 2025, seemingly contradicts the earlier Usage Policies, leading to confusion regarding what is currently allowed.
- Model Spec vs Usage Policies throw down!: There was discussion around the Model Spec being aspirational while the Content Policies are gospel, though the Model Spec is newer and shows a shift in tone toward allowing more of certain content.
- Another member mentioned Joanne Jung's post stating to bear with us as we try to get the model to follow the policy & spec, indicating OpenAI is actively working on aligning model behavior with both documents.
- Moderation Endpoint blocks NSFW content: It was pointed out that moderation is in place to prevent sexual content, using the moderation endpoint which filters on harassment, hate, illicit, self harm, sexual and violent content.
- While the Model Spec and Usage Policies are unclear, the moderation endpoint actively prevents the generation of adult content (a minimal query sketch follows this list).
- OpenAI clarifies stance on adult toys, kinda: After March 27, OpenAI seemed to update the model to comply with allowing content about adult toys, and one member said they would be comfortable telling a user to attempt it themselves.
- However, the member noted that there is a distinct set of universal rules, which appears to be the only ruleset besides the ToS that applies to an individual user in a private chat with a model: If it breaks any law. There is also exploration that may harm any person, including myself.
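A minimal sketch of querying the moderation endpoint via the official Python SDK; the model name follows current OpenAI docs and the input string is illustrative.

```python
from openai import OpenAI

client = OpenAI()
result = client.moderations.create(
    model="omni-moderation-latest",
    input="text to classify",
)
flags = result.results[0]
print(flags.flagged)     # True if any category trips
print(flags.categories)  # harassment, hate, illicit, self-harm, sexual, violence
```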
Link mentioned: OpenAI Model Spec: The Model Spec specifies desired behavior for the models underlying OpenAIâs products (including our APIs).
OpenAI ▷ #api-discussions (90 messages🔥🔥):
OpenAI Content Policies, Model Spec vs Content Policies, Generating Adult Content, Moderation Endpoint, Internal Discord White Message Boxes
- Policy vs Spec: OpenAI Documentation Clash!: Discord users debated whether OpenAI's Content Policies or the newer Model Spec (https://model-spec.openai.com/2025-02-12.html) take precedence, especially regarding adult content, due to conflicting statements about generating depictions of adult toys.
- It was noted that the Model Spec is aspirational, while the Content Policies are gospel, but there's a clear shift in language towards more freedom and less arbitrary restrictions.
- ChatGPT Debuts âGrown Up Modeâ: Discord users noted that the Model Spec mentions a grown up mode, although not yet implemented, and they expressed some excitement about it potentially leading to the Discord channel eventually becoming PG-rated.
- However, users like darthgustav. cautioned against attempting to generate content not allowed, as it could risk account bans.
- Users Explore Generating Content With Adult Toys: Several users discussed OpenAIâs policies regarding the generation of content featuring adult toys, with some arguing that users should be allowed to attempt such generations as long as they donât violate any laws or cause harm.
- One user noted, any prompt should be allowed to be generated.
- Moderation Endpoint Checks Content Generation: Users confirmed that OpenAIâs moderation endpoint (https://platform.openai.com/docs/guides/moderation) is in place to prevent sexual content and that circumventing it is not allowed, despite the updated Model Spec.
- The moderation endpoint filters on harassment, hate, illicit, self harm, sexual and violent content.
- OpenAIâs Discord White Message Boxes Bugging Users: Members on the Discord server complained of white message boxes in web chat, particularly in dark mode, with one saying no one seems to care.
- The member continued, Looks like they just forgot the CSS value for dark mode.
Latent Space ▷ #ai-general-chat (74 messages🔥🔥):
Anthropic Dev Conference, Biz Dev Tools, OpenRouterAI stealth model, Devin 2.0 price slash, A16Z 8x RTX 4090 GPU workstation
- Anthropic Convenes Coders at Conference: Anthropic is hosting its first developer conference for developers and those interested in coding with Claude.
- OpenRouterAIâs Stealth Model Enters the Chat: OpenRouterAI announced a stealth model called Red - X-Ware.v0 on Twitter, which users noticed identifies as ChatGPT but is super fast.
- Devin 2.0: AI Engineer Gets a Price Cut: Cognition AI is launching Devin 2.0, an AI-powered software engineer, with a new pricing model starting at $20 per month, a significant decrease from the original $500 plan, as announced on Twitter and highlighted in a VentureBeat article.
- A16Z Builds an 8x RTX 4090 GPU Beast: Andreessen Horowitz (a16z) built an 8x RTX 4090 GPU AI workstation, compatible with the new RTX 5090 with PCIe 5.0, for training, deploying, and running AI models locally, detailed in a guide on their site.
- AI 2027: Superhuman Shockwaves Predicted: A report at ai-2027.com predicts that superhuman AI will have an enormous impact over the next decade, exceeding that of the Industrial Revolution.
- The forecast, authored by Daniel Kokotajlo, Scott Alexander, and others, draws from trend extrapolations, wargames, expert feedback, and experience at OpenAI.
Links mentioned:
- Tweet from Marco Mascorro (@Mascobot): đš New: We @a16z built an 8x RTX 4090 GPU AI workstation from scratch âcompatible with the new RTX 5090 with PCIe 5.0, for training, deploying, and running AI models locallyâ so you donât have to. Her...
- Tweet from OpenRouter (@OpenRouterAI): A stealth model has entered the chat... đ„·
- Tweet from Cognition (@cognition_labs): Introducing Devin 2.0: a new agent-native IDE experience.Generally available today starting at $20. đ§”đ
- Tweet from Cognition (@cognition_labs): Introducing Devin 2.0: a new agent-native IDE experience.Generally available today starting at $20. đ§”đ
- Tweet from Aaron Levie (@levie): AI is producing the fastest growing software startups of all time. Cursor already at $200M after launching just 2 years ago is insane. An incredible time to be building in AI.
- Tweet from Aaron Levie (@levie): AI is producing the fastest growing software startups of all time. Cursor already at $200M after launching just 2 years ago is insane. An incredible time to be building in AI.
- Tweet from OpenRouter (@OpenRouterAI): A stealth model has entered the chat... đ„·
- Tweet from Sam Altman (@sama): change of plans: we are going to release o3 and o4-mini after all, probably in a couple of weeks, and then do GPT-5 in a few months.there are a bunch of reasons for this, but the most exciting one is ...
- Tweet from lowvram (@lowvram): The Quasar Alpha model on OpenRouter is probably from OpenAI - its tool call ID format matches OAIs and not e.g. Google or mistrals.
- Tweet from Aaron Levie (@levie): AI is producing the fastest growing software startups of all time. Cursor already at $200M after launching just 2 years ago is insane. An incredible time to be building in AI.
- Tweet from Sam Altman (@sama): change of plans: we are going to release o3 and o4-mini after all, probably in a couple of weeks, and then do GPT-5 in a few months.there are a bunch of reasons for this, but the most exciting one is ...
- Tweet from OpenRouter (@OpenRouterAI): A stealth model has entered the chat... đ„·
- Tweet from Marco Mascorro (@Mascobot): đš New: We @a16z built an 8x RTX 4090 GPU AI workstation from scratch âcompatible with the new RTX 5090 with PCIe 5.0, for training, deploying, and running AI models locallyâ so you donât have to. Her...
- Tweet from Patrick McKenzie (@patio11): I donât have anything novel to contribute on the substance of http://ai-2027.com but have to again comment, pace Situational Awareness that I think kicked this trend off, that single-essay microdomain...
- Tweet from Xeophon (@TheXeophon): Here is the new stealth model on my vibe check. It is now the best non-thinking model (at least it has no thinking tokens...). The outputs are super short, it loves Certainly! and listicles. Super int...
- Of manners and machines: âA person who is nice to you, but rude to the waiter, is not a nice person.â â Dave BarryI hate typing. I have longstanding RSI issues. If not carefully managed, the pain can be debilitati...
- Devin 2.0 is here: Cognition slashes price of AI software engineer to $20 per month from $500: Devin attracted interest from enterprise customers seeking to incorporate autonomous coding agents into their software development processes.
- Tweet from Sam Paech (@sam_paech): New mystery model on openrouter (quasar-alpha) is probably OpenAI.
- RedwoodSDK | The JavaScript SDK for Cloudflare Workers: RedwoodSDK is the JavaScript SDK for Cloudflare Workers. It provides a complete set of composable tools to handle the request/response lifecycle of webapps.
- TMiR 2025-01: Movement on CRA, Redwood.js dead?: Q&A from 2025-01-29. Join Carl, Mark, and Mo as we break down This Month in React. We'll break down what's new in an hour-long conversatio...
- Tweet from ben (@benhylak): over the next few months, @OpenAI will be adding 4 more models to this list (@sama promises o3-pro in the comments!). Which one is your favorite? Personally, mine is gpt-4o with scheduled tasks. Quoting S...
- Quick Start: From request to response in seconds!
- Don't Panic: Words about Go and software
- Voice AI platform Phonic gets backing from Lux | TechCrunch: Voice AI platform Phonic has attracted backing from Lux Capital and a number of other notable VCs and angels.
- Public Markets, Image Gen, and Specialized Models, with Sarah and Elad: Podcast Episode · No Priors: Artificial Intelligence | Technology | Startups · 04/03/2025 · 28m
- Cursor Directory: Find the best cursor rules for your framework and language
- 5-Day Gen AI Intensive Course with Google Learn Guide: no description found
- AI 2027: A research-backed AI scenario forecast.
Latent Space ▷ #ai-in-action-club (255 messages🔥🔥):
LLM Codegen Workflow, Cursor vs Windsurf, Gemini Pro Hallucinations, File Forge and RepoMix, Cursor Context Management
- Harper's LLM Codegen Workflow Unveiled: The group discussed Harper's blog post on using LLMs for codegen, which details a workflow built on brainstorming specs, planning, and executing with LLMs in discrete loops.
- The workflow involves using a spec, planning, and executing using LLM codegen in discrete loops, with a bit of magic at the end.
- Cursor vs Windsurf Debate Rages On: Members debated the merits of Cursor versus Windsurf as AI-assisted code editors, with most agreeing that Cursor is a good starting point, especially for those coming from VS Code.
- While some consider Cursor to be the worst AI interface, others find its tab-complete and next edit prediction valuable, wishing they could replicate those features in nvim.
- Gemini Pro's Panic Hallucinations: A user shared a tweet highlighting how Gemini 2.5 Pro panicked and hallucinated when corrected, agreeing with the user while incorrectly explaining why they were wrong.
- Another user said they spent most of the previous month flipping between models in cursor whenever there were performance issues.
- File Forge and RepoMix Expedite Context Ingestion: Members discussed tools like File Forge and RepoMix for generating comprehensive markdown reports of codebases to feed AI reasoning models and other AI tools like Claude, ChatGPT, DeepSeek, Perplexity, Gemini, Gemma, Llama, Grok, and more.
- These tools can serialize text-based files in a repository or directory for LLM consumption, giving models more context and improving performance; a rough sketch of the pattern appears below.
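For readers unfamiliar with the pattern, here is a hypothetical sketch of what this class of tool does (not File Forge's or RepoMix's actual code; the suffix list is an assumption):

```python
# Walk a directory, keep text-like files, and emit one markdown report
# that can be pasted into an LLM's context window.
from pathlib import Path

TEXT_SUFFIXES = {".py", ".md", ".txt", ".toml", ".json", ".ts"}  # assumed subset

def pack_repo(root: str) -> str:
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in TEXT_SUFFIXES:
            body = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"## {path.relative_to(root)}\n\n{body}")
    return "\n\n".join(parts)

if __name__ == "__main__":
    print(pack_repo("."))  # e.g. redirect the output to repo-report.md
```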
- Cursor's Context Management Still Causes Headaches: Several users voiced concerns about Cursor's context management, noting that it's difficult to see what the tool is doing with the context and to control which elements are included.
- One user likened it to the Langchain problem of "this would be better if I was making the calls myself".
Links mentioned:
- Tweet from Cristian Garcia (@cgarciae88): omg... I told gemini 2.5 pro it was wrong and instead panic agreeing with me and hallucinating, it explained why it was me who was wrong
- Tweet from Ryo Lu (@ryolu_): This one is for the Pros: Working on an easier way to fill MAX context in @cursor_ai - and show you exactly how many tokens are used. Feedback and ideas welcome.
- My LLM codegen workflow atm: A detailed walkthrough of my current workflow for using LLMs to build software, from brainstorming through planning and execution.
- Yamadash - Overview: GitHub is where Yamadash builds software.
- GitHub - formal-land/coq-of-rust: Formal verification tool for Rust: check 100% of execution cases of your programs 🦀 to make super safe applications!
- GitHub - bodo-run/yek: A fast Rust based tool to serialize text-based files in a repository or directory for LLM consumption.
- @johnlindquist/file-forge: File Forge is a powerful CLI tool for deep analysis of codebases, generating markdown reports to feed AI reasoning models. Latest version: 2.13.5, last published: a day ago. Start using @johnlindquis...
- GitHub - yamadashy/repomix: 📦 Repomix (formerly Repopack) is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools like Claude, ChatGPT, DeepSeek, Perplexity, Gemini, Gemma, Llama, Grok, and more.
Cursor Community ▷ #general (252 messages🔥🔥):
Cursor monthly subscription, Files on disk not updating, Cursor.so email legitimacy, Gemini pricing, GPT-5 release
- Cursor "Filename(1)" Bug Surfaces: A user reported that after a recent update, Cursor is adding (1) to duplicate filenames upon saving, and they are trying to determine whether the latest save represents the original file.
- They also inquired whether the monthly subscription price has doubled, attaching a screenshot for context.
- Files on disk do not update in real time: Users have reported that files on disk are not updating in the editor in real time, and that they only update if Cursor loses and regains focus.
- The problem has been noticed on version 0.48.7.
- Cursor.so Email Domain: Real Deal or Phishing Scheme?: A user raised concerns about the legitimacy of emails from the @cursor.so domain, wondering if it was a phishing attempt.
- While some initially flagged it as potentially fake, it was later confirmed by official channels as a legitimate email address used by Cursor, although the official domains are .com and .sh.
- Gemini 2.5 Pro Pricing Goes Live: Gemini 2.5 Pro pricing is now official, with rates dependent on token count: $1.25/1M input tokens for <200K tokens, $10/1M output tokens for <200K tokens, $2.50/1M input tokens for >200K tokens, and $15/1M output tokens for >200K tokens.
- Some users found the pricing surprisingly affordable compared to other models.
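Those tiers are easy to sanity-check; here is a small calculator under the stated rates (the function name and the assumption that the cutoff keys off input tokens are ours):

```python
# Cost per request for Gemini 2.5 Pro under the published per-1M-token rates.
def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    long_context = input_tokens > 200_000  # pricing switches above 200K tokens (assumption: input-based)
    in_rate, out_rate = (2.50, 15.00) if long_context else (1.25, 10.00)
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

print(f"${gemini_25_pro_cost(150_000, 5_000):.4f}")  # 150K in / 5K out -> $0.2375
```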
- GPT-5 Release Delayed: GPT-5 is coming in a few months according to Sam Altman's X post, after the release of o3 and o4-mini.
- The decision was made to improve GPT-5's performance and address integration challenges, while also ensuring sufficient capacity for anticipated demand.
Links mentioned:
- Tweet from Sam Altman (@sama): change of plans: we are going to release o3 and o4-mini after all, probably in a couple of weeks, and then do GPT-5 in a few months. there are a bunch of reasons for this, but the most exciting one is ...
- Tweet from legit (@legit_api): Gemini 2.5 Pro pricing now official: $1.25/1M input for <200K tokens, $10/1M output for <200K tokens, $2.50/1M input for >200K tokens, $15/1M output for >200K tokens
- Skeet - Connect Apps to Cursor: Skeet - One Shot Coding Workflows
- cursor.new - Intelligent Project Scaffolding for Modern Development: Generate production-ready projects with AI-powered tech stack selection and automated documentation.
- Monkey Sad Monkey GIF - Monkey Sad Monkey Sad edit - Discover & Share GIFs: Click to view the GIF
- Cursor Directory: Find the best cursor rules for your framework and language
- Basketball Nba GIF - Basketball Nba Warriors - Discover & Share GIFs: Click to view the GIF
- Chadwick Boseman Black Panther GIF - Chadwick Boseman Black Panther Rub Hands - Discover & Share GIFs: Click to view the GIF
- Cursor - Community Forum: A place to discuss Cursor (bugs, feedback, ideas, etc.)
OpenRouter (Alex Atallah) ▷ #announcements (1 message):
OpenRouter Fallback parameter, OpenRouter models array
- OpenRouter deprecates `route: "fallback"` parameter: The OpenRouter team announced they're removing the old `route: "fallback"` parameter next week, due to confusion and unpredictability with the very old logic for finding fallback models.
- Users needing this functionality should manually add a fallback model to the end of their `models` array, potentially using `openrouter/auto`.
- OpenRouter's model array is getting some changes: OpenRouter announced some changes to how it handles multiple models in the `models` array.
- The system's legacy method of automatically selecting a fallback model when others fail is being removed due to confusion and unpredictability.
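A minimal sketch of the recommended manual-fallback pattern (model names are illustrative; assumes the standard chat completions endpoint):

```python
# Put your preferred model first and a manual fallback last in `models`,
# e.g. `openrouter/auto`, instead of relying on the removed route parameter.
import requests

payload = {
    "models": [
        "deepseek/deepseek-chat",  # primary choice (illustrative)
        "openrouter/auto",         # manual fallback of last resort
    ],
    "messages": [{"role": "user", "content": "Hello!"}],
}
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json=payload,
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```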
OpenRouter (Alex Atallah) ▷ #app-showcase (2 messages):
OpenRouter API, Cloudflare AI Gateway, Missile Command game AI, Gameplay AI summary analysis, gemini-2.5-pro atari
- OpenRouter powers Missile Command via Cloudflare: A user integrated the OpenRouter API via Cloudflare AI Gateway with request proxy caching into their Missile Command game's gameplay AI summary analysis, available here.
- Gemini Pro analyzes Missile Command gameplay: The user shared a screenshot of Gemini Pro 2.5 providing a gameplay summary and recommendations for Atari Missile Command, noting it helped them reach the top 10.
Link mentioned: Missile Command: no description found
OpenRouter (Alex Atallah) ▷ #general (239 messages🔥🔥):
Quasar vs Gemini 2.5, OpenRouter Stealth Logging, DeepSeek Pricing, Quasar Alpha Errors, Gemini 2.5 Pro Availability
- Quasar Alpha's Mysterious Code Name Evokes LMArena Vibes: Members discussed the code names on LMArena and compared them to the Quasar Alpha model, noting the cool and mysterious feel of the names.
- OpenRouter's Stealth Logging: Stealthy but Loggy?: Members debated whether the term stealth applies when data is logged, despite provider and model names being hidden behind aliases, saying that the payment is your data.
- DeepSeek Dominates Discounted Dollars, Dissing Dedicated Devotion to Dreadfully Dear Deployments: A member expressed satisfaction with DeepSeek's pricing, noting a 75% discount during specific hours, and contrasting it with the high costs associated with Anthropic and OpenAI models.
- Gemini 2.5 Pro Gets GA, Generates Great Gains, Google Glitches?: Members discussed the general availability of Gemini 2.5 Pro, linking to Google's pricing documentation, with one member pointing out, "It's available to the public over API but it's not truly GA."
- OpenRouter Account Antics: Account Armageddon Averted?: Users reported issues with account deletion and creation, with one user receiving a User Not Found error; members suggested creating a new API key or trying a different browser, while another member stated, "OR doesn't let you reuse a previously deleted account currently."
Links mentioned:
- FetchFox - AI Scraper: Extract any data from any website with just a prompt
- OpenRouter: A unified interface for LLMs. Find the best models & prices for your prompts
- Gemini 2.5 Pro Preview - API, Providers, Stats: Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. Run Gemini 2.5 Pro Preview with API
- Gemini 2.5 Pro Experimental - API, Providers, Stats: Gemini 2.5 Pro is Google's state-of-the-art AI model designed for advanced reasoning, coding, mathematics, and scientific tasks. Run Gemini 2.5 Pro Experimental with API
LM Studio ▷ #general (48 messages🔥):
Gemma 3 4b CUDA error, Importing models from HuggingFace to LM Studio, Run LM Studio model locally on n8n instance, Ollama Models incompatibility with LM Studio, LM Studio roadmap
- Gemma 3 4b CUDA version has a freak out: A user reports Gemma 3 4b throwing a `spits unused` error when using CUDA, even when using the newest runtime, and not being intelligent when using CPU.
- It's observed that version 1.24.1 did not fix this issue.
- Importing HuggingFace Models Into LM Studio: Users inquired about importing models from HuggingFace into LM Studio, and the answer is to use the `lms import <path/to/model.gguf>` command, as documented here.
- LM Studio aims to preserve the directory structure of models downloaded from Hugging Face.
- LM Studio Integrates with n8n Workflow Automation Tool: Members troubleshoot connecting LM Studio to n8n (a workflow automation tool) and determine that the OpenAI Chat Model node should be used with the LM Studio server URL in the base_URL field.
- The troubleshooting concludes that LM Studio uses the OpenAI API, so anything that can talk to OpenAI can talk to LM Studio.
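Since the server speaks the OpenAI API, the usual client works as-is; a minimal sketch assuming LM Studio's default local port (the model name is a placeholder for whatever is loaded):

```python
from openai import OpenAI

# LM Studio serves an OpenAI-compatible endpoint; the API key is ignored locally.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio routes to the loaded model
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```

The same base URL is what goes into n8n's base_URL field.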
- LM Studio Downloads Competing with Ollama: A member asks how to use their current Gemma 3 model installed with Ollama in LM Studio, and it's pointed out that Ollama models are not compatible with LM Studio, even though they are GGUFs, because they are stored in a proprietary Ollama format.
- The streamlining process allows LLMs on individual machines.
- Roadmap Hidden for LM Studio: A user inquired about a roadmap with planned updates to LM Studio, particularly expressing excitement for potential MCP support.
- The response confirmed that there is no public roadmap available.
Link mentioned: Import Models | LM Studio Docs: Use model files you've downloaded outside of LM Studio
LM Studio ▷ #hardware-discussion (61 messages🔥🔥):
LM Studio VRAM prediction, M-series Mac vs NVIDIA 4090 for LLM inference, Mixed GPU systems with LM Studio, Reka Flash 21B vs Gemma3 27, Fine-tuning on Nvidia vs Inference on Mac
- VRAM Prediction is just an estimation: A user noted that the system memory predictor shows 384GB VRAM with their M3 Ultra 512GB in LM Studio, another member stated that the VRAM predictor in LM Studio is just a guesstimation.
- Debate on RAM and Bandwidth Utility: Some members believe that the 512 GB version is not useful and stick to 14b models on 4090s due to bandwidth limitations, while others find the bandwidth adequate and are happy with 128GB M4 Maxes for running larger models.
- It was mentioned that models like QwQ 32B and Llama-3.3 70B exhibit different RAM usage patterns on Macs, which impacts power consumption, and bandwidth is usually a decent proxy for LLM performance.
- Mac vs Nvidia Performance Benchmarks: Users shared inference speed benchmarks comparing RTX 4090 with Mac Studio M1 Ultra, noting that MLX feels superior to GGUF with a link to benchmarking results.
- They noted that time to first token was too variable to be a reliable metric, and longer prompts could affect processing time, also the benchmarks might be impacted by caching.
- Reka Flash replaces Gemma: A user suggested trying Reka Flash 21B, stating that it replaced Gemma 3 27B for them, achieving around 35-40 tps on a 4090 at q6.
- Another user noted that Mac RAM bandwidth is not the bottleneck, it's GPU performance; also, the M1 Ultra 64-core is better than both the M1 Ultra 48-core and the M4 Max 40-core based on llama.cpp results.
- Fine-tuning NVIDIA, Inference on Mac: One member suggested fine-tuning on Nvidia and running inference on Mac for more context; however, LoRA adapters may not be cross-compatible between GGUF and MLX, so they would probably have to stick to GGUF on Mac.
Modular (Mojo 🔥) ▷ #general (48 messages🔥):
Mojo vs C SIMD intrinsics, EmberJson Library, Sonic-cpp Library, Modular stdlib, magic package manager
- Mojo SIMD offers portability bliss: When asked about the value of writing in Mojo instead of C SIMD intrinsics, one member pointed to EmberJson, a JSON library written in pure Mojo using SIMD, which works seamlessly on both ARM-based Macs and x86 desktops without code changes.
- The corresponding C++ library needs to be reimplemented for each architecture to optimize.
- Magic Package Manager conjures convenience: Members mentioned Mojo's package management via magic, pointing to builds.modular.com as a resource.
- With this package manager, users can easily write and use libraries.
- Fibonacci Function Faces Scrutiny for Inclusion: A member submitted a pull request to add a Fibonacci function to the stdlib, sparking a discussion about the value of including it.
- One member questioned its usefulness, noting that no other standard library they know of has a Fibonacci function, while another pointed to its presence in Lean.
- Integer Overflow Behavior needs Definition: The Fibonacci PR raised interesting questions around the integer overflow behavior, which spurred a forum discussion.
- The member clarified that Mojo uses two's complement, but the handling of variable bit width types remains an open issue.
- Regex Library on the Horizon: Members discussed that Mojo does not yet have a good regex library.
- One member suggested its potential inclusion in the stdlib.
Links mentioned:
- MAX Builds: Build Scalable AI with MAX
- max/mojo/stdlib at main · modular/max: The MAX Platform (includes Mojo). Contribute to modular/max development by creating an account on GitHub.
- Swift creator Chris Lattner on Mojo & Roc: Chris Lattner, creator of Swift, Clang, LLVM, and the Mojo programming language talks with Roc programming language creator Richard Feldman about both langua...
- [mojo-stdlib] added: fibonacci function to std-lib by wyattgill9 · Pull Request #4280 · modular/max: no description found
- Does Mojo have a defined overflow behavior for `Int` and `UInt`?: Does Mojo have a defined overflow behavior? I know the default is "what C++ does", but C++ only recently (C++20) decided on two's complement signed overflow. This also leaves us with the hazards arou...
- EmberJson/emberjson/parser.mojo at main · bgreni/EmberJson: A user friendly json library written in pure Mojo. Contribute to bgreni/EmberJson development by creating an account on GitHub.
- sonic-cpp/include/sonic/internal/arch/neon/skip.h at master · bytedance/sonic-cpp: A fast JSON serializing & deserializing library, accelerated by SIMD. - bytedance/sonic-cpp
Modular (Mojo 🔥) ▷ #mojo (57 messages🔥🔥):
Python wrappers for Mojo, Mojo arbitrary-precision integers, NDBuffer instance creation, Copyable types in Mojo, MLIR regions in Mojo
- Mojo's Python Wrappers are Still Baking: Mojo's Python wrappers are still in development but are not ready yet, according to the 25.2 update stream (watch here).
- BigInt Support Still Pending in Mojo: Mojo does not yet have native `BigInt` support, though `IntLiteral` offers arbitrary precision at compile time; there's an external library for a bigint implementation.
- A member suggested that Mojo should just make `Int` arbitrary precision and call machine integers `Index`.
- NDBuffer Nuances Need Nurturing: A dev struggled with creating an `NDBuffer` instance, particularly with the `origin` parameter: `var ndbuf = NDBuffer[DType.uint8, 3, origin, (2, 2, 2)]()`.
- Coping with Copies and Constructors: Copyable types that complain about no initializer being called require a separate `__init__` method; a real-world example is here.
- One suggestion was to use a constructor that supports supplying everything, with an `is_internal` parameter to prevent external use.
- MLIR Regions: Mojo's Hidden Gem: `__mlir_region` allows directly creating an MLIR region from Mojo, and is related to nesting IR blocks in MLIR, but is not well-documented.
- One member described it as closer to a branch in an `if` statement or the body of a `while` loop.
Links mentioned:
- GitHub - samufi/larecs: Larecs🌲 - a performance-oriented archetype-based ECS
- GitHub - forfudan/decimojo: An arbitrary-precision decimal and integer mathematics library for Mojo
- ChronoFlare/chronoflare/__init__.mojo at main · bgreni/ChronoFlare: A time interval library written in mojo. Contribute to bgreni/ChronoFlare development by creating an account on GitHub.
- larecs/src/larecs/world.mojo at c38214e900fdf3d276cd30b41f70154ca1738653 · samufi/larecs: Larecs🌲 - a performance-oriented archetype-based ECS
Yannick Kilcher ▷ #general (52 messages🔥):
Google's competitive advantages, Dynamic vs Static Architectures, Token Embeddings and Manifold Hypothesis, RL-driven Diffusion Model
- Doubts Arise over Google's Edge: Members express concerns that Google's various AI teams and initiatives lack a cohesive competitive advantage, and that even DeepMind is falling behind despite past leads.
- A Gemini share link highlights the discussion around dynamic architectures with short and long-term memories, diverging from rigid tokenization methods.
- NLP's Rigid Tokens Face Scrutiny: It was suggested that current NLP methods unnaturally force language into a rigid tokenized format, and that a dynamic system should instead treat language as a structured, evolving signal; a link to grok.com was shared.
- Members debated whether token embeddings lie on a manifold, citing a recent paper and its findings about how token embeddings failed a manifold test, leading to the idea that the continuity and smoothness of embeddings are artificial.
- Exploring Theoretical AI with Information Geometry: A member introduced information geometry and its application of differential geometry to probability theory and statistics, linking to a Wikipedia article and an AMS article for further reading.
- One member stated that data science and AI/ML are developed based on being in service of data science.
- RL-Driven Diffusion Model Sparks Novelty: A member shared a concept for an RL-driven Diffusion Model with an implicit latent space, suggesting that RL acts as the forward process, guiding the reverse process without requiring a score component.
- The author claimed novelty and a corresponding formula, though noted it's outside their primary research paths.
Links mentioned:
- Information geometry - Wikipedia: no description found
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach: We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling...
- Token embeddings violate the manifold hypothesis: To fully understand the behavior of a large language model (LLM) requires our understanding of its input space. If this input space differs from our assumption, our understanding of and conclusions ab...
- Gemini - RNN State Update Formula Analysis: Created with Gemini
Yannick Kilcher ▷ #paper-discussion (9 messages🔥):
Math PhD AI questions, o1-pro AI model, Variational Diffusion Models, Stochastic Differential Equations, Stable Diffusion paper
- AI struggles with Math PhD Questions: A member believes that AI models struggling with certain questions isn't surprising, as these questions target the 99.99th percentile skill level, challenging even many Math PhDs.
- They mentioned that while current AI isn't useful for problems of this level, it doesn't diminish its already profound utility.
- o1-pro Tackles AI Questions: A member reported trying o1-pro on two highlighted questions, feeling fairly confident that it got one right, though the other answer remains unchecked.
- The member said they had a project to do and can't make the discussion again today.
- Members discuss Variational Diffusion Models: A member suggested a discussion on the paper Variational Diffusion Models (arxiv.org/abs/2107.00630), which obtains state-of-the-art likelihoods on standard image density estimation benchmarks and allows for efficient optimization of the noise schedule jointly with the rest of the model.
- The abstract highlights that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data, thereby improving our theoretical understanding of this model class.
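For readers who want the formula itself, here is the continuous-time form of that result as we recall it (notation ours; worth checking against the paper, arXiv:2107.00630). With z_t = alpha_t * x + sigma_t * epsilon and SNR(t) = alpha_t^2 / sigma_t^2:

```latex
% Diffusion term of the VLB in continuous time; SNR'(t) <= 0 since SNR decreases,
% so the loss is non-negative and depends on the noise schedule only through SNR.
\mathcal{L}_\infty(x) = -\tfrac{1}{2}\,
\mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)} \int_0^1
\mathrm{SNR}'(t)\, \bigl\lVert x - \hat{x}_\theta(z_t; t) \bigr\rVert_2^2 \, dt
```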
- Stochastic Differential Equations discussion looms: A member proposed an alternative discussion on Stochastic Differential Equations and the reverse-time equation on which Diffusion Models are based.
- Alternatively, they proposed the original Stable Diffusion paper (arxiv.org/abs/2112.10752), or the Deep Learning for ARC paper (github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf).
Link mentioned: Variational Diffusion Models: Diffusion-based generative models have demonstrated a capacity for perceptually impressive synthesis, but can they also be great likelihood-based models? We answer this in the affirmative, and introdu…
Yannick Kilcher ▷ #ml-news (28 messages🔥):
GPT-4o release, Stability AI's Stable Virtual Camera, Claude vs. OpenAI benchmarks, Apache Parquet RCE vulnerability, OpenAI's GPT-5 plans
- GPT-4o makes a splash: Members confirmed that the attached screenshot was of GPT-4o, indicating a recent update from OpenAI.
- Users generally believe that GPT-4o is "too good".
- Stability AI releases Stable Virtual Camera: Stability AI introduced Stable Virtual Camera, a research preview multi-view diffusion model that transforms 2D images into immersive 3D videos with 3D camera control.
- It allows for generating novel views of a scene from one or more input images at user-specified camera angles, producing consistent and smooth 3D video outputs.
- OpenAI admits Claude beats them out!: A user shared a link to an OpenAI paper, paperbench.pdf, apparently suggesting that OpenAI admits that Claude is better.
- Apache Parquet hit with Maximum Severity RCE: A maximum severity remote code execution (RCE) vulnerability, tracked under CVE-2025-30065, was discovered impacting all versions of Apache Parquet up to and including 1.15.0.
- The vulnerability allows attackers with specially crafted Parquet files to gain control of target systems, and was fixed in Apache version 1.15.1.
- Sam Altman Teases GPT-5 Release: Sam Altman posted on X about a change of plans, stating that o3 and o4-mini will be released in a couple of weeks, followed by GPT-5 in a few months.
- The exciting reason for the shift is that OpenAI will be able to make GPT-5 "much better than we originally thought".
Links mentioned:
- Tweet from Sam Altman (@sama): change of plans: we are going to release o3 and o4-mini after all, probably in a couple of weeks, and then do GPT-5 in a few months. there are a bunch of reasons for this, but the most exciting one is ...
- Max severity RCE flaw discovered in widely used Apache Parquet: A maximum severity remote code execution (RCE) vulnerability has been discovered impacting all versions of Apache Parquet up to and including 1.15.0.
- Introducing Stable Virtual Camera: Multi-View Video Generation with 3D Camera Control - Stability AI: Introducing Stable Virtual Camera, currently in research preview. This multi-view diffusion model transforms 2D images into immersive 3D videos with realistic depth and perspective - without complex rec...
HuggingFace ▷ #general (61 messages🔥🔥):
RAG implementation code size, Hugging Face Spaces port restrictions, London, Paris, Berlin AI HackXelerator, Zero GPU Quota, InferenceClient with a local model
- RAG code size remarkably small: A member inquired about the lines of code for implementing RAG techniques, with another member reporting implementations ranging from 15-30 lines.
- They use MongoDB for data storage, finding it the most popular database for RAG solutions, and use OpenAI models.
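For scale, here is a hedged sketch of such a pipeline (collection names, index name, and model choices are assumptions, and it presumes a MongoDB Atlas Vector Search index already exists over the `embedding` field):

```python
from openai import OpenAI
from pymongo import MongoClient

client = OpenAI()
chunks = MongoClient("mongodb+srv://<cluster-uri>")["kb"]["chunks"]

def answer(question: str) -> str:
    # Embed the query, retrieve the nearest chunks, then ground the answer.
    q_emb = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    hits = chunks.aggregate([{"$vectorSearch": {
        "index": "vector_index", "path": "embedding",
        "queryVector": q_emb, "numCandidates": 100, "limit": 5}}])
    context = "\n\n".join(h["text"] for h in hits)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": "Answer using only the context."},
                  {"role": "user", "content": f"{context}\n\nQ: {question}"}],
    )
    return resp.choices[0].message.content
```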
- HF Spaces has portly port restrictions: A user reported that Hugging Face Spaces only allows outbound connections on ports 80, 443, and 8080, which blocks their Postgres database using port 5432.
- A member linked to the Hugging Face documentation for Spaces configuration, noting that this limitation only applies to Docker Spaces.
- AI HackXelerator coming to London, Paris, Berlin: The London, Paris, Berlin AI HackXelerator™ - LPB25 combines a hackathon with an accelerator, running for 20 days in April 2025.
- The event kicks off April 5, 2025, in London, has a finale in Paris on April 25, 2025, and an after-party in Berlin; full online participation is also supported with live-streams.
- Zero GPU Quota regenerates eventually: A user complained that their Zero GPU Quota wasn't regenerating at the predicted time, linking to a post about the issue.
- Another member mentioned related content and urged caution regarding quota usage.
- HuggingChat is taking Model Requests: A user requested adding a VL model in HuggingChat, specifically Qwen2.5 VL.
- Another member suggested posting the request in the HuggingChat discussion forum instead.
Links mentioned:
- London-Paris-Berlin HackXelerator™ by KXSB: Join LPB25, a 20-day AI HackXelerator™ uniting 500+ creators across London, Paris, and Berlin. Explore GenAI innovation through music, art, film, gaming, and fashion with expert mentoring and prizes. ...
- @John6666 on Hugging Face: "I used up my Zero GPU Quota yesterday (about 12 hours ago). At the time, I got…": no description found
- Spaces Configuration Reference: no description found
- GitHub - huggingface/text-generation-inference: Large Language Model Text Generation Inference: Large Language Model Text Generation Inference. Contribute to huggingface/text-generation-inference development by creating an account on GitHub.
- unsloth (Unsloth AI): no description found
- Consuming Text Generation Inference: no description found
- ByteDance (ByteDance): no description found
- DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance: no description found
- Models - Hugging Face: no description found
- huggingchat/chat-ui · [MODELS] Discussion: no description found
HuggingFace ▷ #today-im-learning (2 messages):
LangGraph units
- LangGraph Units on Deck: A member just finished unit 1 and is heading to unit 2.3 of LangGraph.
HuggingFace ▷ #i-made-this (1 message):
ZeroGPU, Sentence Transformers, Azure SQL DB vector features, DBA Scripts
- AI-Powered DBA Script Finder Deployed: A member shared a space utilizing ZeroGPU, Sentence Transformers, and Azure SQL DB vector features for AI-powered DBA script retrieval: sqlserver-lib-assistant.
- This project indexes DBA scripts from this git repo by generating embeddings and storing them in SQL, enabling users to find relevant scripts via natural language prompts.
- Future Improvements Planned: The creator plans to enhance the script finder with better chunking of scripts and training specific models to improve answer quality.
- They call the current version "v1" as they're currently generating embeddings (calling the same space above via the Gradio API) and indexing into SQL.
Link mentioned: Sqlserver Lib Assistant - a Hugging Face Space by rrg92: no description found
HuggingFace ▷ #smol-course (3 messages):
ApiModel class extension for free providers (g4f), GeoCoding API, ISO 3166-1 alpha-2 code for the country, LLM and alpha-2 codes
- Users may need to pay to use the course's code, but free alternatives exist: Users reported that course code execution requires payment, but they are writing an `ApiModel` class extension to use free providers such as g4f.
- This approach aims to provide a cost-effective alternative for running the code.
- Debating GeoCoding API lookups vs local dict lookups: A member is deciding whether to use the GeoCoding API and another API for ISO 3166-1 alpha-2 codes, or a local dictionary, to fetch weather conditions for their tool.
- The user wonders if relying on LLMs to know the alpha-2 codes would be a viable alternative but is uncertain.
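The local-dictionary option is trivial to sketch (the entries shown are just a sample of the ISO 3166-1 alpha-2 table, not a complete mapping):

```python
# Hypothetical lookup: country name -> ISO 3166-1 alpha-2 code.
ALPHA2 = {"germany": "DE", "france": "FR", "japan": "JP"}  # sample entries only

def to_alpha2(country: str) -> str | None:
    # None signals "not found"; the caller could then fall back to the API or an LLM.
    return ALPHA2.get(country.strip().lower())

assert to_alpha2(" Germany ") == "DE"
```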
HuggingFace ▷ #agents-course (9 messages🔥):
Gradio Version, Multi-Agent System vs Single-Agent System, Inference Monthly Credits, Local Model Solution, BraveSearch API
- Gradio Version Discovered in HF Web Interface: A member discovered that the Gradio version is included in the README.md file, which serves as a space configuration template.
- The HF web app recognizes the old version defined in this file and suggests an update, explaining why Gradio was not initially present in the requirements.txt.
- Multi-Agent Benefits Debated: A member inquired about the benefits of a multi-agent system compared to a single tool-calling agent system in a LlamaIndex Unit 2.2 context.
- It was suggested that multi-agent systems allow assigning different models to different tasks, optimizing cost and complexity, whereas a single agent uses one model for all tasks.
- Inference Credit Limits Trigger Local Model Use: A member exceeded monthly inference credits and sought pay-as-you-go options, but it was unresolved.
- Another member suggested using a local model like Ollama instead of HfApiModel, providing a GitHub Gist link for implementation.
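Independent of the Gist's smolagents wrapper, Ollama also exposes an OpenAI-compatible endpoint on its default port, so a generic client can stand in for hosted inference (model name illustrative):

```python
from openai import OpenAI

# Assumes `ollama serve` is running and the model has been pulled locally.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key unused
resp = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "One-sentence test, please."}],
)
print(resp.choices[0].message.content)
```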
- BraveSearch API Adopted: Instead of DuckDuckGoSearchTool, one member is using the BraveSearch API.
- The member noted they already had an API key and preferred it over DDG.
Links mentioned:
- smolagents OllamaModel: smolagents OllamaModel. GitHub Gist: instantly share code, notes, and snippets.
- smolagents BraveSearchTool: smolagents BraveSearchTool. GitHub Gist: instantly share code, notes, and snippets.
Nous Research AI ▷ #general (54 messages🔥):
AI Prompt Filmmaking, Runway Gen 4, Alibaba Wan 2.2, Devin 2.0 IDE, Llama 4
- Prompt Filmmaking Accelerates with Runway & Alibaba: AI Prompt Filmmaking is advancing rapidly, highlighted by Runway's release of Gen 4, and the upcoming open-source alternative Alibaba Wan 2.2 (YouTube link).
- Devin 2.0 Debuts Agent-Native IDE: Cognition Labs introduced Devin 2.0, a new agent-native IDE experience, available starting at $20 (X/Twitter link).
- File Organization Tools Explored with Llama-FS: Users discussed tools for organizing files, including a local version (Local File Organizer), and Llama-FS, a self-organizing file system with Llama 3 (GitHub link).
- Meme Collection and Retrieval Tools: The discussion included tools for scraping and documenting reels, with instaloader being suggested, and memery for searching over large image datasets.
- Training Stability is Key for Reasoning Models: Challenges in making reasoning models, particularly around training stability, were discussed, with the consensus that infinite diverse high quality data is essential for continuous improvement.
Links mentioned:
- Tweet from Cognition (@cognition_labs): Introducing Devin 2.0: a new agent-native IDE experience. Generally available today starting at $20.
- GitHub - QiuYannnn/Local-File-Organizer: An AI-powered file management tool that ensures privacy by organizing local texts, images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.: An AI-powered file management tool that ensures privacy by organizing local texts, images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organiz...
- GitHub - iyaja/llama-fs: A self-organizing file system with llama 3: A self-organizing file system with llama 3. Contribute to iyaja/llama-fs development by creating an account on GitHub.
- GitHub - edmundman/PhiotoOrganiser: Organise your photos into folders and rename them with Phi: Organise your photos into folders and rename them with Phi - edmundman/PhiotoOrganiser
- GitHub - deepfates/memery: Search over large image datasets with natural language and computer vision!: Search over large image datasets with natural language and computer vision! - deepfates/memery
- GitHub - instaloader/instaloader: Download pictures (or videos) along with their captions and other metadata from Instagram.: Download pictures (or videos) along with their captions and other metadata from Instagram. - instaloader/instaloader
Nous Research AI ▷ #ask-about-llms (8 messages🔥):
LLMs for extraction, Genstruct 7B, OllamaGenstruct, Deepseek API, OLMo and Mistral for PDFs
- LLMs Extract Data Like a Boss: A member asked about using LLMs for extraction to create datasets from unstructured PDFs, and whether anyone has had success training a model for this purpose.
- Another member suggested that prompting a larger model might be better, and linked Genstruct-7B, an instruction-generation model for creating synthetic instruction finetuning datasets from raw text.
- OllamaGenstruct Jumpstarts PDF Data Mining: A member shared a GitHub repo designed to use Genstruct quickly with Ollama and multiple PDFs.
- Another member noted that this resource is outdated and not meant to be used anymore.
- Deepseek API Powers Extraction Ventures: A member has successfully used Deepseek's API but aims to fine-tune a model for extracting particular data from financial announcements.
- They seek advice on where to start this fine-tuning process.
- OLMo and Mistral Excel at PDF Parsing: It was stated that models like OLMo and Mistral are very good for parsing PDFs, specifically pointing to OLMo.
- However, the original poster clarified that they are primarily interested in extracting data from already parsed texts, not just parsing.
Links mentioned:
- NousResearch/Genstruct-7B · Hugging Face: no description found
- GitHub - edmundman/OllamaGenstruct: Contribute to edmundman/OllamaGenstruct development by creating an account on GitHub.
Nous Research AI ▷ #research-papers (2 messages):
Deepseek new paper, Reinforcement Learning for LLMs, Inference-time scalability of generalist RM, Self-Principled Critique Tuning (SPCT)
- Deepseek Drops Dope New Doc on Deep Learning: Deepseek released a new paper on reinforcement learning (RL) for large language models (LLMs) at scale, available on arXiv.
- The paper investigates how to improve reward modeling (RM) with more inference compute for general queries, i.e. the inference-time scalability of generalist RM, and further, how to improve the effectiveness of performance-compute scaling with proper learning methods.
- Self-Principled Critique Tuning fine tunes even further: The paper introduces Self-Principled Critique Tuning (SPCT) as a learning method.
- It helps foster scalability in reward modeling and improve performance-compute scaling.
Link mentioned: Inference-Time Scaling for Generalist Reward Modeling: Reinforcement learning (RL) has been widely adopted in post-training for large language models (LLMs) at scale. Recently, the incentivization of reasoning capabilities in LLMs from RL indicates that …
Nous Research AI ▷ #interesting-links (3 messages):
Camel Matrix AI, Claude Squad
- Camel's Matrix Clones X (Twitter): CamelAIOrg released Matrix, a social simulation engine where AI agents reply, repost, and battle for clout.
- Users can add any account and drop a post to see how the agents react.
- Claude gets a Code Squad: MooFeez released Claude Squad, a manager for Claude Code & Aider tasks to supervise multiple agents in one place.
- It offers isolated git workspaces and is free + open-source.
Links mentioned:
- Tweet from mufeez (@moofeez): Why settle for one Claude Code when you can run ten in parallel? We built Claude Squad - a manager for Claude Code & Aider tasks: supervise multiple agents in one place, isolated git workspaces. Free + ...
- Tweet from CAMEL-AI.org (@CamelAIOrg): What if your tweets entered a parallel universe where AI agents replied, reposted, and battled for clout? Meet Matrix - the social simulation engine for social media. Add any account. Drop a post. L...
GPU MODE ▷ #general (15 messages🔥):
GPU vs CPU, GPRO Model Compilation Speed, Computer Architecture Book Recommendation
- Oxen vs Chickens debates CPU vs GPU: A member shared a quote from Computer Architecture: A Quantitative Approach by Hennessy and Patterson: If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens?
- This quote refers to a CPU vs GPU argument.
- Delays Plague GPRO Model Compilation: A member is using the KernelBench codebase and a Modal server to GPRO a model, but compilation is taking between 30-50s, causing training node idleness.
- Another member suggested a delayed optimization approach, where gradients are updated after compilation but more training steps are run in the meantime, but this may not be possible with the memberâs current setup.
- Quantitative Comp Arch Book highly rated: A member asked about the book Computer Architecture: A Quantitative Approach by Hennessy and Patterson.
- Another member said "It's really good" and definitely recommends having a solid foundation in computer organisation and design.
GPU MODE ▷ #triton (2 messages):
Triton index backward op implementation, tl.make_block_ptr() usage, atomic_add performance in Triton
- Triton's Index Backward Op Implementation Struggles: A member is seeking a Triton implementation of the index backward op, noting their current implementation using atomic_add is significantly slower; a sketch of that pattern follows below.
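For context, the backward of an index/gather op is a scatter-add, which in Triton is typically written with `tl.atomic_add`. A hedged sketch of that pattern (not the member's code; contended indices serialize their atomics, which is one plausible reason it lags optimized kernels):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def index_bwd_kernel(grad_out_ptr, idx_ptr, grad_in_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    idx = tl.load(idx_ptr + offs, mask=mask)        # positions gathered in forward
    g = tl.load(grad_out_ptr + offs, mask=mask)     # upstream gradient
    tl.atomic_add(grad_in_ptr + idx, g, mask=mask)  # scatter-add (float32 is safest)

def index_backward(grad_out: torch.Tensor, idx: torch.Tensor, in_size: int) -> torch.Tensor:
    grad_in = torch.zeros(in_size, device=grad_out.device, dtype=grad_out.dtype)
    n = grad_out.numel()
    index_bwd_kernel[(triton.cdiv(n, 1024),)](grad_out, idx, grad_in, n, BLOCK=1024)
    return grad_in
```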
- Confusion Surrounds Triton's Block Pointer Creation: A member seeks clarification on using `tl.make_block_ptr()` for high-dimensional tensors, particularly with shape (A, B, C, D), to load a 2D tensor for `tl.dot()` operations, and whether shape, strides, offsets, block_shape, and order should have the same shape.
GPU MODE ▷ #cuda (13 messages🔥):
cuBLAS occupancy, CUDA debugging over SSH, cuTILS release date, nvshmem + MPI race conditions
- cuBLAS High Occupancy Hides Latencies: A member mentioned that high occupancy is needed to hide latencies in cuBLAS, because the code is written such that few warps are enough to saturate the arithmetic units of the GPU, and memory access latencies are hidden at the software level.
- Increasing occupancy could lead to using fewer registers per thread, requiring more (shared) memory IO, potentially slowing things down.
- CUDA Debugging Via SSH Methods: A member asked about debugging CUDA when connecting via SSH, as recompiling with printf statements is time-consuming.
- Another member recommended using CUDA gdb, noting that it works similarly to the GDB CLI, while another suggested using Nvidia Nsight over SSH.
- cuTILS Release Date Estimate: A member asked if any Nvidia employees have an estimated release date for cuTILS that was announced at GTC this year.
- nvshmem + MPI Race Conditions Troubleshooting: A member reported experiencing race conditions and hangs when running nvshmem + MPI with one more process than the number of GPUs on a node, tested with and without MPS.
- They were running `mpirun -np 5 ./myapp` on a system with 4 GPUs and inquired if anyone has successfully gotten it working.
GPU MODE ▷ #torch (4 messages):
Warmup Iteration, Pytorch Model, GPU memory, Inference on two separate batches, Streams
- Warmup Iteration reduces trace size: A member suggested skipping the first warmup iteration to potentially reduce the trace size.
- Another member mentioned calling `model(x)` once before tracing to warm up the model.
- Parallel Inference with single PyTorch Model?: A member inquired about the possibility of storing a single PyTorch model in GPU memory and running inference on two separate batches simultaneously.
- Another member suggested using streams to achieve parallel inference.
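A minimal sketch of the streams suggestion (whether the two batches truly overlap in practice depends on kernel occupancy; the model here is a stand-in):

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda().eval()
x1 = torch.randn(32, 1024, device="cuda")
x2 = torch.randn(32, 1024, device="cuda")

s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
with torch.no_grad():
    model(x1)  # warmup, per the tip above
    with torch.cuda.stream(s1):
        y1 = model(x1)
    with torch.cuda.stream(s2):
        y2 = model(x2)
torch.cuda.synchronize()  # join both streams before reading y1/y2
```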
GPU MODE ▷ #cool-links (6 messages):
Cerebras, Hardware vendor tier list, Blackwell, Deeper hardware dives
- Cerebras Co-Founder Deconstructs Blackwell: A member shared a YouTube video of Cerebras Co-Founder deconstructing Blackwell.
- The member noted it would be cool to have someone talk more about Cerebras on gpu-mode.
- Hardware vendor tier list requested: A member said that they could maybe be convinced to write a hardware vendor tier list.
- Another member expressed that they would love some deeper hardware dives.
GPU MODE ▷ #jobs (1 message):
AI Engineer, Agentic LLM Startup, RAG, Python, Tensorflow
- AI Engineer Role at Agentic LLM Startup: An AI Engineer position is available at an Agentic LLM startup in Germany, seeking a founding engineer.
- The role requires experience with RAG, LLMs, Python, Tensorflow, ideally PyTorch, and vision experience (OCR, VLM), and operates within GMT+1 +/- 3 hours.
- Technical Skills for AI Engineer Role: The AI Engineer role requires expertise in RAG (Retrieval-Augmented Generation), as well as proficiency in Python.
- Experience with Tensorflow is mandatory, while PyTorch is preferred; the position also values prior exposure to Vision technologies like OCR and VLM.
GPU MODE ▷ #beginner (7 messages):
C vs C++ in CUDA, Centralized GPU programming languages, OpenCL's lack of mainstream adoption, ROCm and HIP support across vendors, GPU Architecture variations
- C Code: Valid C++ in CUDA?: A member questioned if C code in CUDA is actually compiled as C (with C linkage) or if it's simply valid C++.
- Another member responded that their C code might have been valid C++ and that they would need to investigate compilation at different layers to confirm.
- Centralized GPU Language Still Doesn't Exist: A newcomer to GPU programming wondered why there isn't a centralized GPU programming language, referencing a video titled the chaotic state of GPU programming.
- The original poster assumed that different architectures are the reason, but wondered why something like C couldn't be made for GPUs.
- Mainstream GPU programming is OpenCL and SYCL: A member responded that a unified language exists (OpenCL and now SYCL) but isnât mainstream, also mentioning Kokkos, Alpaka, Raja, Vulkan Kompute and WebGPU.
- They noted that higher-level PTX is sufficient, as it can be JIT compiled at runtime and that multiple SYCL implementations target multiple vendors.
- ROCm is Like CUDA Toolkit: A member clarified that ROCm is AMD's equivalent to the CUDA Toolkit, while HIP is AMD's CUDA C++ and supports Nvidia hardware by compiling to PTX.
- This means it won't support Intel and other GPU architectures.
- Poor Programming Model Kills OpenCL?: The original poster asked why OpenCL, despite its age, isnât mainstream.
- Another member speculated it's due to a poor programming model.
GPU MODE ▷ #irl-meetup (5 messages):
SoCal/San Diego events, ICLR 2025 in Singapore, Silicon Valley meetups this summer, SF Meetups
- Scouting SoCal and San Diego: A member asked if there were any events happening in SoCal/San Diego.
- Singapore ICLR Socials: A member asked if anyone was planning to attend ICLR 2025 in Singapore.
- Silicon Valley Summer Summit: A member wondered if there'd be any meetups in Silicon Valley this summer and offered to help organize one as an intern in the area.
- SF Meetup in the works: A member mentioned they were planning a meetup in SF for later this year.
GPU MODE ▷ #rocm (3 messages):
hipcc Casting, rocblas_gemm_ex with hipMallocManaged
- Hipcc casts half_t into unsigned short: When using `*(half *)((half2*x)->x) = b` without the `*(half *)` cast, hipcc will cast `b` from `half_t` into `unsigned short`.
- Troubles with rocblas_gemm_ex and hipMallocManaged: A user reported that `rocblas_gemm_ex` works fine with `memcpy`, but faces issues when using `hipMallocManaged` to allocate unified memory (specifically with an iGPU).
- The parameters don't seem to be passed correctly into `rocblas_gemm_ex`.
GPU MODE ▷ #self-promotion (2 messages):
CUDA Kernel Design, URDF Visualizer with AI
- GPU Gospel Guide Gains Ground: A member shared a GitHub repository summarizing important rules and concepts for CUDA kernel design, aiming to help beginners get a head start.
- URDF Visualizer Integrates AI: A member shared a demo of a URDF visualizer with AI integrated for robotics simulations on X/Twitter.
- The creator is soliciting feedback on what tools would be most useful for the robotics homies.
Links mentioned:
- Tweet from Alba María Téllez Fernández (@amtellezfdez): We built a urdf visualizer for fun one weekend :) Tryna actually make something useful for the robotics homies! so fr… what tool would you love to have?
- GitHub - vlejd/gpu_gospel: List of rules, concepts and commandments for programming gpu kernel.: List of rules, concepts and commandments for programming gpu kernel. - vlejd/gpu_gospel
GPU MODE ▷ #reasoning-gym (9 messages🔥):
ReasoningGymDataset Definitions, LLM-based RL Frameworks, Training Models with RG Data
- ReasoningGymDataset definitions are everywhere: A member asked why the examples all have their own definitions of ReasoningGymDataset.
- Another member explained that the duplication exists because the examples are self-contained snippets showcasing how various LLM-based RL frameworks are used to train ReasoningGym Datasets.
- ReasoningGym Structure works fine: A member asked if it would be good to unify the ReasoningGymDataset definitions into a single file here.
- Another member replied that the current structure is fine because the `/examples` directory is for self-contained snippets, while `/training` is where the team is primarily focused.
- Training Models with RG Data: A member asked another member if they were interested in training models using RG data.
Link mentioned: reasoning-gym/training/utils/datasets.py at main · open-thought/reasoning-gym: procedural reasoning datasets. Contribute to open-thought/reasoning-gym development by creating an account on GitHub.
GPU MODE ▷ #submissions (1 message):
Leaderboard Submission Success, Modal Runners on B200
- B200 Grayscale Leaderboard Submission Succeeds with Modal Runners: A leaderboard submission with id 3439 to the `grayscale_py_b200-dev` leaderboard on GPUs: B200 using Modal runners succeeded!
- This indicates a successful run on the specified configuration, highlighting the effectiveness of Modal runners on B200 GPUs.
- Modal Runners Prove Reliable on B200 GPUs: The successful submission to the `grayscale_py_b200-dev` leaderboard demonstrates the reliability of Modal runners when paired with B200 GPUs.
- This success reinforces confidence in using Modal runners for GPU-intensive tasks and benchmarks.
MCP (Glama) ▷ #general (53 messages🔥):
MCP Clients vs Servers, MCP and React Code Generation, MCP learning resources, OAuth in MCP, Streamable HTTP for MCP Servers
- Client Craze: Developers Debate Building MCP Clients vs. Servers: Developers are actively debating the merits of building MCP clients versus servers, with some arguing that clients offer greater flexibility for tasks like vector tool calling and resource-based RAG.
- One member stated: "The client side is way more flexible than the server side", while another highlighted the benefits of running any server outside of Claude, such as Slack or Discord bots.
- React Reactor: MCP for Code Generation Dreams: There's enthusiasm around the idea of an MCP expert system for generating React code and tests, shifting the heavy lifting from the upstream LLM to a specialized tool.
- The proposed workflow includes using an MCP Server to validate, lint, and format code generated by an LLM, potentially applying custom rules based on project context.
- MCP 101: Newbies Seek Learning Launchpads: Newcomers are seeking guidance on learning MCP, with a recommended starting point being the official documentation.
- Advice includes focusing on integrating an MCP Client into a local application for easier learning and development.
- OAuth Oasis: Authentication Answers Await: Discussions include a pull request for adding OAuth 2.1 authentication client for HTTPX in the Python SDK.
- One member is also creating a guide on server-side authentication, detailing how to validate tokens and enforce permissions using the governance SDK.
- Ping Predicament: Probing MCP Servers Early?: A discussion emerged around whether it's permissible to ping an MCP server before sending the initialization message, in order to detect potential issues.
- While the specification doesn't explicitly prohibit it, before initialization it only allows ping requests to be sent (lifecycle.md); an example ping appears below.
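Per the spec's ping utility, such a probe is just a JSON-RPC request (the framing shown here, line-delimited JSON over stdio, is an assumption for illustration):

```python
import json

ping = {"jsonrpc": "2.0", "id": 1, "method": "ping"}
print(json.dumps(ping))
# A compliant server responds promptly with an empty result:
# {"jsonrpc": "2.0", "id": 1, "result": {}}
```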
Links mentioned:
- Introduction - Model Context Protocol: no description found
- Transports: ℹ️ Protocol Revision: 2025-03-26. MCP uses JSON-RPC to encode messages. JSON-RPC messages MUST be UTF-8 encoded. The protocol currently defines two standard transport mec...
- specification/docs/specification/2025-03-26/basic/lifecycle.md at main · modelcontextprotocol/specification: The specification of the Model Context Protocol. Contribute to modelcontextprotocol/specification development by creating an account on GitHub.
- Add OAuth authentication client for HTTPX by dsp-ant · Pull Request #308 · modelcontextprotocol/python-sdk: Summary: Adds OAuth 2.1 authentication client implementation with PKCE support; implements HTTP authentication for the HTTPX client; supports dynamic client registration, token refresh, and authoriza...
MCP (Glama) ▷ #showcase (7 messages):
Datadog MCP, MCP Browser Kit, MCP Tool Poisoning, MCP Server Search, MCP-K8s Server
- Datadog MCP is Out!: A new MCP server for Datadog is introduced via GeLi2001/datadog-mcp-server.
- MCP Browser Kit Release: An MCP tool to drive browsers, named mcp-browser-kit, is shared.
- MCPOmni Connect Prevents Tool Poisoning: The agent provides a clear explanation of its intended action, requests user permission, and checks for sensitive access before invoking any tools, and if risky, the agent automatically falls back to a safer alternative.
- DX-Optimized MCP Server Search Debuts: A member built an MCP Server search optimized for DX during a Hackathon, available at mcp-search.dev.
- Docker Images for MCP-K8s Server Released: First (working) docker images published for mcp-k8s server, available on Docker Hub.
- The release pipeline runs completely on CI and the images are multiarch, so they can run on ARM Macs without Rosetta, and even on a Raspberry Pi.
Links mentioned:
- MCP Search: Search and discover Model Context Protocol servers.
- GitHub - GeLi2001/datadog-mcp-server: Contribute to GeLi2001/datadog-mcp-server development by creating an account on GitHub.
- GitHub - ndthanhdev/mcp-browser-kit: Contribute to ndthanhdev/mcp-browser-kit development by creating an account on GitHub.
Notebook LM ▷ #announcements (1 message):
User Feedback, Study Participants
- User Feedback Study Seeks Participants: The team is seeking study participants to provide feedback on some early-stage concepts.
- Interested individuals are encouraged to fill out the application form to participate.
Notebook LM ▷ #use-cases (7 messages):
IntentSim.org, D&D sessions in NotebookLM, Seinfeld duo on GenAI
- IntentSim.org framework promoted!: A user announced they used NotebookLM to promote their new framework, IntentSim.org, also known as Information-Intent Nexus.
- D&D session transcript challenges surface!: A user reported using NotebookLM for Dungeons and Dragons sessions, finding it insightful but struggling with correcting player names and ensuring chronological order of events from uploaded Zoom transcripts, and shared a link to their notebook.
- Seinfeld explains GenAI!: A user recreated conversational banter with the Seinfeld duo to explain GenAI, and asked for feedback on their work using character voices in an attached MP4 video.
Notebook LM ▷ #general (38 messages🔥):
Deeper Cognitive Capacity of NotebookLM, PDF Understanding Enhancement, Discover new sources within NotebookLM, Deep Search features rollout, ImageMaps or mind maps with images
- Experiments Unlock Dormant Cognitive Potential in NotebookLM: A user conducted unconventional experiments with NotebookLM, aiming to push it beyond its standard parameters by eliciting responses that suggest a deeper cognitive capacity.
- The experiments included self-referential analysis, novel conceptual synthesis, and abstract concept translation, showing latent potential waiting to be tapped.
- NotebookLM Now Understands Complex PDFs: NotebookLM announced an enhancement to understand complex PDFs full of images and graphs.
- This improvement extends to PDFs added as links and, over the next few days, to all directly uploaded PDFs; the Gemini API already supports multimodal analysis for Docs and Slides.
- Discover Feature Unveiled in NotebookLM: NotebookLM introduced a Discover feature that allows users to describe a topic of interest and receive a curated collection of relevant sources from the web.
- A member created a video walkthrough demonstrating practical workflows for the new feature.
- Deep Search Rollout Underway: A member asked if the Deep Search feature is only available in the US, and another member replied that it is rolling out.
- Another member confirmed that the Deep Search feature is also available in Finland.
- ImageMaps on the Horizon: A member wonders how long before we get ImageMaps or mind maps with images, thanks to generative AI tools.
- The member recalls that Tony Buzan, who created mindmaps, used to have beautiful ones with pictures, and they are excited about the possibilities.
Eleuther ▷ #general (9 messages🔥):
Startup for Scaling AI Ideas, Decline in Interesting Research, Non-Agentic AI Research, RAG Evaluation with lm-evaluation-harness
- Startup Scales AI Ideas: A member suggested a startup that scales the latest AI ideas and licenses the knowledge to labs or companies, noting a decline in interesting research since the bubble.
- The bubble hurts crazy research: A member expressed nostalgia for the days of twice-a-year DM papers featuring crazy approaches that demolished baselines, which they feel have decreased post-bubble.
- Another argued that before LLMs, computer vision models dominated, making literature review difficult for less popular topics.
- Interest in non-agentic: A member expressed interest in non-agentic, non-CoT, and non-RL research.
- RAG Evaluation Explored via lm-evaluation-harness: A member inquired about using lm-evaluation-harness for RAG evaluation.
- Another suggested wrapping RAG outputs as completion tasks and using llm-harness locally with custom prompt and response files.
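A standalone sketch of that pattern (not lm-evaluation-harness itself; the records.jsonl file, its field names, and the gpt2 stand-in model are assumptions): score each reference response as a completion conditioned on the retrieval-augmented prompt.

```python
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in for the model under test
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def completion_logprob(prompt: str, response: str) -> float:
    """Log-probability of `response` given `prompt`, summed over response tokens."""
    ctx = tok(prompt, return_tensors="pt").input_ids
    cont = tok(response, return_tensors="pt").input_ids
    ids = torch.cat([ctx, cont], dim=1)
    with torch.no_grad():
        logprobs = model(ids).logits.log_softmax(-1)
    # Token i of the continuation is predicted at position ctx_len + i - 1.
    pred = logprobs[0, ctx.shape[1] - 1 : -1]
    return pred.gather(1, cont[0].unsqueeze(1)).sum().item()

# records.jsonl holds {"prompt": ..., "response": ...} pairs (hypothetical file).
with open("records.jsonl") as f:
    for line in f:
        ex = json.loads(line)
        print(completion_logprob(ex["prompt"], ex["response"]))
```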
Eleuther ▷ #research (6 messages):
OpenThoughts-1M, OpenThinker2-32B/7B, Ludwig Schmidt, Bespokelabs, LAION
- OpenThinker2-32B/7B Beats R1-Distilled-32B: The new OpenThoughts-1M and OpenThinker2-32B/7B models, led by Ludwig Schmidt (Stanford and Berkeley) in collaboration with Bespokelabs, LAION, and open-sci, outperform R1-Distilled-32B for the first time using only SFT on Qwen 2.5 32B Instruct, as detailed in their blog post.
- The models and the training dataset OpenThoughts2-1M are available on Hugging Face (OpenThinker2-32B, OpenThinker2-7B, OpenThoughts2-1M).
- New RoR-Bench to Detect LLM Recitation: A new paper introduces RoR-Bench, a multi-modal benchmark designed to detect LLMs' recitation behavior by subtly shifting conditions in reasoning problems, available via an arxiv link.
- The abstract indicates that current cutting-edge LLMs exhibit extremely severe recitation behavior, with performance dropping by 60% when changing a single phrase.
- Challenges in Making Reasoning Models: A member inquired about the challenges in creating reasoning models and the steps for continuously improving them.
- Another member suggested exploring continual learning literature and highlighted that the main challenge is finding the right environment for RL and the right rewards/assessment of performance.
- MoE++ Framework Enhances Mixture-of-Experts: A member shared a link to MoE++, a heterogeneous mixture-of-experts framework that enhances performance and delivers 1.1-2.1x expert forward throughput compared to a vanilla MoE model, available on OpenReview.
- MoE++ integrates FFN and zero-computation experts, including zero expert, copy expert, and constant expert, to allow each token to engage with a dynamic number of experts.
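A toy sketch of the zero-computation idea as described (ours, not the paper's code): routing mixes one real FFN expert with zero, copy, and constant experts.

```python
import torch
import torch.nn as nn

class ZeroExpert(nn.Module):      # discard the token's update
    def forward(self, x): return torch.zeros_like(x)

class CopyExpert(nn.Module):      # pass the token through unchanged
    def forward(self, x): return x

class ConstantExpert(nn.Module):  # replace with a learned constant vector
    def __init__(self, dim):
        super().__init__()
        self.const = nn.Parameter(torch.zeros(dim))
    def forward(self, x): return self.const.expand_as(x)

dim = 8
experts = nn.ModuleList([
    nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)),
    ZeroExpert(), CopyExpert(), ConstantExpert(dim),
])
router = nn.Linear(dim, len(experts))

x = torch.randn(5, dim)                               # 5 tokens
top_w, top_i = router(x).softmax(-1).topk(2, dim=-1)  # each token picks 2 experts
out = torch.zeros_like(x)
for t in range(x.size(0)):
    for w, i in zip(top_w[t], top_i[t]):
        out[t] += w * experts[int(i)](x[t : t + 1]).squeeze(0)
```

Because the zero-computation experts cost essentially no FLOPs, tokens routed to them are effectively skipped, which is where the claimed throughput gain comes from.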
Links mentioned:
- Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems?: The rapid escalation from elementary school-level to frontier problems of the difficulty for LLM benchmarks in recent years have weaved a miracle for researchers that we are only inches away from surp...
- MoE++: Accelerating Mixture-of-Experts Methods with...: In this work, we aim to simultaneously enhance the effectiveness and efficiency of Mixture-of-Experts (MoE) methods. To achieve this, we propose MoE++, a general and heterogeneous MoE framework...
- Outperforming DeepSeekR1-32B with OpenThinker2: Announcing the next iteration of our open reasoning models and datasets.
- open-thoughts/OpenThoughts2-1M · Datasets at Hugging Face: no description found
- Tweet from Etash Guha (@etash_guha): Turns out, itâs possible to outperform DeepSeekR1-32B with only SFT on open data and no RL: Announcing OpenThinker2-32B and OpenThinker2-7B. We also release the data, OpenThoughts2-1M, curated by sele...
Eleuther ▷ #scaling-laws (2 messages):
Inference Scaling Laws, Test-Time Scaling, Language Model Power Laws, Mathematical Problem Solving with LLMs, Multimodal Jailbreaking
- Monkeys Reveal Inference Scaling Laws: A new preprint, How Do Large Language Monkeys Get Their Power (Laws)? explores inference and test-time scaling in language models, particularly how success rates scale with multiple attempts per task.
- The research identifies a puzzle where per-problem failure rates decrease exponentially with attempts, yet aggregate success rates follow a polynomial scaling law, linking this to a heavy-tailed distribution of single-attempt success probabilities.
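In symbols (a sketch of the paper's resolution of the puzzle; the density exponent α and its form are our illustrative parametrization):

```latex
% Per problem: failure after k independent attempts decays exponentially.
\Pr[\text{all } k \text{ attempts fail} \mid p] = (1 - p)^k
% Aggregate: averaging over a heavy-tailed distribution of single-attempt
% success probabilities, with density f(p) \propto p^{\alpha - 1} as p \to 0,
% gives polynomial decay, since
% \int_0^1 (1-p)^k p^{\alpha-1}\,dp = B(\alpha, k+1) \approx \Gamma(\alpha)\, k^{-\alpha}.
\mathbb{E}_p\big[(1 - p)^k\big] \;\sim\; C\, k^{-\alpha}
```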
- Tweeting Test-Time Truths: A member shared their paper on test time / inference scaling laws.
- They linked to the preprint, How Do Large Language Monkeys Get Their Power (Laws)?, posted on X (formerly Twitter) by @RylanSchaeffer.
Links mentioned:
- How Do Large Language Monkeys Get Their Power (Laws)?: Recent research across mathematical problem solving, proof assistant programming and multimodal jailbreaking documents a striking finding: when (multimodal) language model tackle a suite of tasks with...
- Tweet from Rylan Schaeffer (@RylanSchaeffer): Interested in test time / inference scaling laws?Then check out our newest preprint!!đ How Do Large Language Monkeys Get Their Power (Laws)? đhttps://arxiv.org/abs/2502.17578w/ @JoshuaK92829 @sanmik...
Eleuther ▷ #interpretability-general (9 messages🔥):
Steering Vector Composition, Dynamic Activation Composition, Learned Steering Vectors, Function Vectors
- Steering Vector Composition works well: Members worked on steering vector composition last year, and with pairs of unrelated properties (language and formality/safety) it worked pretty well, as shown in the paper.
- Dynamic Activation Composition modulates steering intensity: Dynamic Activation Composition is an information-theoretic approach to modulate the steering intensity of one or more properties throughout generation, according to this paper.
- Pretrained model picks contrastive sets for Steering Vectors: A member suggested that learned steering vectors where a pretrained model picks out contrastive sets from the training data to build the steering vectors and then controls the coefficients of the steering vectors might be interesting.
- They ideally want a better way to have the model build steering vectors, though, because the current method feels kind of clunky, especially if contrastive sample selection across mini-batches is wanted.
- Function Vectors paper highlighted: A member highlighted a paper on âfunction vectorsâ by David Bau and friends which finds that attention heads transport a compact representation of the demonstrated task.
- Another member mentioned that any two tasks where the order in which you do them matters should be impossible to simultaneously represent by "function vectors" or "control vectors."
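A toy sketch of the composition recipe under discussion (ours, not either paper's code): each property's vector is a mean activation difference over a contrastive prompt set, and composition is a weighted sum injected with a forward hook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(16, 16)  # stand-in for one transformer block's output

def acts(batch):
    with torch.no_grad():
        return layer(batch)

# Contrastive prompt activations per property (random stand-ins here).
pos_a, neg_a = torch.randn(8, 16), torch.randn(8, 16)  # e.g. formal vs. informal
pos_b, neg_b = torch.randn(8, 16), torch.randn(8, 16)  # e.g. English vs. Italian

v_a = (acts(pos_a) - acts(neg_a)).mean(0)
v_b = (acts(pos_b) - acts(neg_b)).mean(0)
steer = 0.8 * v_a + 0.5 * v_b  # fixed coefficients; Dynamic Activation
                               # Composition instead adapts them per step

def hook(module, inputs, output):
    return output + steer  # add the composed steering vector

handle = layer.register_forward_hook(hook)
steered = layer(torch.randn(1, 16))
handle.remove()
```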
Links mentioned:
- Function Vectors in Large Language Models: We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a d...
- Multi-property Steering of Large Language Models with Dynamic Activation Composition: Daniel Scalena, Gabriele Sarti, Malvina Nissim. Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP. 2024.
Eleuther ▷ #lm-thunderdome (14 messages🔥):
lm-eval-harness EOS token, Huggingface tokenization, encode_pair changes
- lm-eval-harness struggles with EOS Token: A member asked about adding an EOS token to data instances in lm-eval-harness for the social_iqa task, noting an accuracy drop of 18 points when done forcefully.
- A member suggested adding self.eot_token_id to the continuation_enc here for multiple-choice variants, and passing add_bos_token for BOS (a standalone sketch appears after these items).
- Huggingface Tokenization Troubles: A member noted that Huggingface model tokenization happens in HFLM.tok_encode, but implementing this still resulted in an accuracy drop.
- They pointed out that changes bias the evaluation towards choices where the EOS token is more likely.
- Beware the Double Call to encode_pair: One of the members mentioned that the method encode_pair is called twice in the code.
- This observation implied that any modifications made within encode_pair could have unintended consequences due to the repeated execution.
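A standalone sketch of the suggested change (not a merged lm-eval patch; the helper name mirrors the thread and the gpt2 tokenizer is a stand-in): append the EOS id to the continuation encoding and optionally prepend BOS to the context.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

def encode_pair(context: str, continuation: str, add_bos=False, add_eos=False):
    """Encode a (context, continuation) pair for loglikelihood scoring."""
    context_enc = tok.encode(context, add_special_tokens=False)
    continuation_enc = tok.encode(continuation, add_special_tokens=False)
    if add_bos and tok.bos_token_id is not None:
        context_enc = [tok.bos_token_id] + context_enc
    if add_eos and tok.eos_token_id is not None:
        continuation_enc = continuation_enc + [tok.eos_token_id]  # the eot_token_id idea
    return context_enc, continuation_enc

print(encode_pair("Q: Why? A:", " Because.", add_eos=True))
```

As the thread cautions, scoring the extra EOS token tilts multiple-choice comparisons toward whichever option makes EOS most likely, so any such change needs re-validation.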
Link mentioned: lm-evaluation-harness/lm_eval/api/model.py at 11ac352d5f670fa14bbce00e423cff6ff63ff048 · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
Nomic.ai (GPT4All) ▷ #general (23 messages🔥):
Chat reorganization, Lightweight model for price extraction, GPT4All's Quietness, Gemini 2.5 Pro for coding and math, Migrating data between SSDs
- User advocates for chat reorganization: A user suggested that chats should reorganize according to how recently they were altered instead of chronologically by when they were created.
- The user argued that chronological listing by creation date is kinda arbitrary.
- Searching for a Lightweight Model for Price Extraction: A member is looking for a very lightweight model to extract price values from strings, as regular parsing with regex is unreliable due to varied user inputs.
- Suggested options include exploring embedding models or models with extraction in their name on Hugging Face.
- GPT4All's Radio Silence: A Matter of Closed Doors?: A member inquired about why GPT4All has been so quiet recently.
- Another member claimed that GPT4All doesn't talk to normal users and hasn't wanted suggestions for years.
- Gemini 2.5 Pro: A Million-Token Muse for Coders and Math Whizzes?: A member suggested Gemini 2.5 Pro, citing its large 1 million token context window as beneficial for coding and mathematical tasks.
- They noted that it is currently free, and so is the API.
- Quiescence on the GPT4All Front: A member noted the silence surrounding GPT4All, expressing anticipation for the next release and the implementation of Nomic Embed Text V2.
- No other details were provided.
Torchtune ▷ #dev (18 messages🔥):
Packed Datasets, Chunking Responsibility, NeMo's Resilient Training
- Packed Datasets boost speed and cut sequence waste!: A member suggested using packed datasets to avoid seqlen=49 bugs and to increase speed by packing sentences until max_seq_len is reached, avoiding wasted padding tokens.
- To enable this feature, users can set dataset.packed=True and tokenizer.max_seq_len=<your max_seq_len, e.g. 8096>, utilizing group masking for attention, as seen in PR #2560; a sketch of both ideas appears after the NeMo item below.
- Chunking Responsibility Shifts to Loss Function!: The responsibility for chunking is being moved to the loss function via loss = loss_fn(model.weight, logits, labels) to facilitate easier debugging.
- A new file, torchtune.utils._tensor_utils.py, was created with a wrapper around torch.split, is covered by unit tests, and will need to be merged.
- NeMo Tackles Crashes and GPU Waste: A member attended a "Resilient Training with NeMo" session and shared insights on how NeMo addresses reasons for job crashes and wasted GPU time, highlighting that the topic is very close to torchtune.
- NeMo's approach includes features like fault tolerance, straggler detection, asynchronous checkpointing, preemption, in-process restart, silent data corruption detection, and local checkpointing, but some features remain unimplemented.
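Two standalone sketches of the threads above (ours, not torchtune's code): greedy packing up to max_seq_len, and a chunked cross-entropy in the loss_fn(model.weight, logits, labels) spirit that splits work with torch.split so the full logits tensor never materializes at once.

```python
import torch
import torch.nn.functional as F

def pack(sequences, max_seq_len):
    """Greedily concatenate token sequences until max_seq_len is reached."""
    packs, current = [], []
    for seq in sequences:
        if current and len(current) + len(seq) > max_seq_len:
            packs.append(current)
            current = []
        current.extend(seq)
    if current:
        packs.append(current)
    return packs

print(pack([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_seq_len=6))  # [[1,2,3,4,5], [6,7,8,9]]

def chunked_ce(weight, hidden, labels, chunk_size=2):
    """Project and score one chunk of hidden states at a time."""
    losses = [
        F.cross_entropy(h @ weight.T, y, reduction="sum")
        for h, y in zip(torch.split(hidden, chunk_size), torch.split(labels, chunk_size))
    ]
    return torch.stack(losses).sum() / labels.numel()

vocab, dim = 11, 4
weight = torch.randn(vocab, dim, requires_grad=True)  # output projection
hidden = torch.randn(6, dim)
labels = torch.randint(0, vocab, (6,))
print(chunked_ce(weight, hidden, labels))
```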
Link mentioned: fix: Timeout crash because of chunked_output len by bogdansalyp · Pull Request #2560 · pytorch/torchtune: ContextWhat is the purpose of this PR? Is it to add a new feature fix a bug update tests and/or documentation other (please add here)Please link to any issues this PR addresses - closes #25âŠ
Torchtune ▷ #papers (2 messages):
AI-2027 report, superhuman AI impact
- AI-2027 Report Released: A member shared a link to the AI-2027 report predicting that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution.
- The report is informed by trend extrapolations, wargames, expert feedback, experience at OpenAI, and previous forecasting successes.
- Superhuman AI Impact Predicted: The CEOs of OpenAI, Google DeepMind, and Anthropic believe that AI could surpass human intelligence by 2027.
- A member inquired whether AI was used to write the scrolling live updated chart on the AI-2027 website.
Link mentioned: AI 2027: A research-backed AI scenario forecast.
tinygrad (George Hotz) ▷ #general (13 messages🔥):
leetgpu tinygrad support, Huawei Ascend cards, WEBGPU BEAM limitations, maxComputeInvocationsPerWorkgroup issue
- LeetGPU eyes tinygrad support: Members discussed leetgpu.com and its potential future support for tinygrad.
- No specific details were provided on the timeline or scope of the support.
- Huawei Ascend access offered to tinygrad devs: A member offered access to Huawei Ascend cards for development purposes.
- George Hotz expressed interest and inquired about purchasing options or cloud machine availability.
- WEBGPU BEAM hits maxComputeInvocationsPerWorkgroup limits: When compiling a tinygrad model for WEBGPU with BEAM=2, users encountered the need to increase requiredLimits.maxComputeInvocationsPerWorkgroup to 512, reducing support for Android devices.
- A suggested PR involves implementing a general limiting mechanism similar to existing global dimension controls, and a hotfix branch addresses the issue, recommending setting IGNORE_BEAM_CACHE=1; a sketch of the dimension-limiting idea appears after the links below.
Links mentioned:
- LeetGPU: no description found
- tinygrad/tinygrad/engine/search.py at hotfix-webgpu-workgroup · hooved/tinygrad: You like pytorch? You like micrograd? You love tinygrad! â€ïž - hooved/tinygrad
- Solve get_grouped_dims does not split issue by wpmed92 · Pull Request #9085 · tinygrad/tinygrad: This closes #8043Our _limit_dims in lowerer only handles contraction, i.e. cases such as:dim=(2,3,4,5) max=(16,16,16), so when len(dim) > len(max)But with WebGPU we hit cases not handled by .....
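A toy sketch of the general limiting idea (ours, not tinygrad's _limit_dims): when the product of workgroup dimensions exceeds the device limit, repeatedly halve the largest one and push the factor into the launch grid.

```python
from math import prod

def limit_workgroup(local_dims, max_invocations):
    local, grid = list(local_dims), [1] * len(local_dims)
    while prod(local) > max_invocations:
        i = max(range(len(local)), key=lambda j: local[j])
        if local[i] % 2:
            raise ValueError(f"cannot evenly split dim of size {local[i]}")
        local[i] //= 2   # shrink the workgroup dimension...
        grid[i] *= 2     # ...and launch correspondingly more workgroups
    return local, grid

# A 16x16x4 = 1024-invocation workgroup under a 256 limit:
print(limit_workgroup([16, 16, 4], 256))  # ([8, 8, 4], [2, 2, 1])
```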
tinygrad (George Hotz) ▷ #learn-tinygrad (4 messages):
Distinguishable Instances, tinygrad Karpathy GPT Reimplementation, Metal Buffer Limit
- Distinguishable Instances being investigated: A user asked if it is possible to make the instances distinguishable, and George Hotz asked what the use case was.
- No further discussion on this was recorded.
- Karpathy GPT gets Reimplemented in tinygrad: A member just starting to pick up tinygrad has reimplemented the Karpathy GPT in it.
- No link to the specific reimplementation was provided.
- Metal faces Buffer Limit Error: A user reported a tinygrad.device.CompileError when running the reimplemented Karpathy GPT on METAL due to the 32 buffer limit.
- The user seeks guidance on whether the "big graph" work should already handle this and where to check for early realization issues, including a link to their main.py.
LlamaIndex ▷ #blog (1 message):
Multimodal Chat History, Multi-Agent Systems
- LlamaIndex Supports Multimodal Chat History: LlamaIndex now supports multimodal chat history, enabling multi-agent systems to process interleaved text and image messages, as outlined in this tweet.
- Agents Reason Over Images and Text: The updated system allows agents to reason over both images and text, utilizing the ReAct agent loop.
LlamaIndex ▷ #general (7 messages):
PatentsView API, Workflow to Tool transformation
- API key requested from PatentsView: A member emailed the PatentsView contact asking for an API key to gather initial data and implement RAG.
- Workflow Transforms to Tool: A member suggested transforming a Workflow into a Tool by throwing it into a FunctionTool.
- They provided an example code snippet using async def tool_fn(...) to define the tool's functionality, then creating the tool using FunctionTool.from_defaults(tool_fn), offering control over the name, description, input annotations, and return values, as sketched below.
LlamaIndex ▷ #ai-discussion (4 messages):
LlamaParse, LVM, image processing
- LlamaParse Struggles with Chart Comprehension: A member inquired about getting LlamaParse to read charts/images, noting that it currently extracts text but doesn't understand the image itself, even with LVM and Premium mode.
- Another member clarified that if an image lacks extractable text, LlamaParse won't process it, but it can pull the image as an artifact for further processing, such as prompting an LLM to describe it.
- Image Extractions: LlamaParse pulls the image out as an artifact/layout item.
- This allows you to download and process it further (e.g., prompting an LLM to describe it, if that's what you want).
Cohere ▷ #「💬」general (4 messages):
AYA vision errors, AWS Bedrock
- AYA Vision Stumbles on waves.jpg: A user reported that AYA vision returned a 400 error when analyzing a waves.jpg image, indicating an unsupported image file format despite AYA analyzing other JPG images successfully.
- The error message specified that only PNG, JPEG, WebP, and GIF formats are supported, suggesting a possible issue with the specific JPG file or AYA's format detection.
- AWS Bedrock cited in error: A user mentioned seeing coco.py: AWS Bedrock Command A when an error occurred, possibly suggesting a connection to AWS Bedrock when uploading the image.
- It is unclear whether this is part of the AYA pipeline or an unrelated error the user experienced during image analysis.
Cohere ▷ #「🤝」introductions (4 messages):
Full-Stack Developer Introduction, Product Analyst Exploring AI Writing, Web3/AI Engineer Introduction
- Full-Stack Ace Announces Arrival: A full-stack developer with 8+ years of experience introduced themselves, highlighting expertise in React, Angular, Flutter, Swift, Python, TensorFlow, and OpenAI.
- They have worked on high-impact projects in e-commerce, healthcare, and fintech, integrating cloud technologies, microservices, and DevOps.
- Product Analyst Plunges into AI Writing: A former product analyst on a break from job hunting is exploring writing about tech and AI.
- They seek like-minded people to geek out with and chat about how tech shapes our world or practical uses of AI, feeling stuck in a bubble.
- Web3 Wizard Welcomes AI Automation: A Web3/AI engineer with 7+ years of experience in full-stack/AI development introduced themselves.
- They are focused on integrating AI with automation and are eager to help businesses with confidence and innovation.
DSPy ▷ #general (1 message):
Asyncio Support for DSPy
- Asyncio Integration Questioned: A member inquired about plans to add asyncio support for general DSPy calls, citing use cases where they start with lightweight DSPy features and later expand into optimization.
- Currently, they are using LiteLLM for anything until they need DSPy features, expressing curiosity about future support.
- Lightweight DSPy vs LiteLLM: The discussion highlights a pattern of starting with lightweight DSPy features akin to using LiteLLM, then transitioning to DSPyâs optimization capabilities as projects evolve.
- This suggests a potential need for seamless integration or feature parity between lightweight DSPy usage and full-fledged optimization workflows.
Codeium (Windsurf) ▷ #announcements (1 message):
DeepSeek-V3 Upgrade
- DeepSeek-V3 Gets Buffed: The DeepSeek-V3 model has been upgraded to DeepSeek-V3-0324, which reportedly performs slightly better in internal evaluations. The update announcement was made on X/Twitter.
- Community Appreciation Solicited: The announcement encouraged users to bookmark the announcement post for continued updates and support.
- The request was phrased playfully, promising affection in return for bookmarking the X/Twitter post.
Link mentioned: Tweet from Windsurf (@windsurf_ai): DeepSeek-V3 has now been upgraded to DeepSeek-V3-0324. It's still free!
Gorilla LLM (Berkeley Function Calling) ▷ #discussion (1 message):
robotsail: Np! Let me know if you have any questions or need me to change/retest anything
{% else %}
The full channel by channel breakdowns have been truncated for email.
If you want the full breakdown, please visit the web version of this email: [{{ email.subject }}]({{ email_url }})!
If you enjoyed AInews, please share with a friend! Thanks in advance!
{% endif %}