> AI News for 4/30/2024-5/1/2024. We checked 7 subreddits and [**373** Twitters](https://twitter.com/i/lists/1585430245762441216) and **28** Discords (**418** channels, and **5796** messages) for you. Estimated reading time saved (at 200wpm): **615 minutes**.

Anthropic continues its pattern of being (merely) 4 months behind OpenAI, releasing a team plan and iOS app on an otherwise relatively quiet day in AI. Perplexity is teasing a private Pages feature with a signup form you can access via Discord.



Table of Contents

[TOC]


AI Reddit Recap

Across r/LocalLlama, r/machinelearning, r/openai, r/stablediffusion, r/ArtificialInteligence, r/LLMDevs, r/Singularity. Comment crawling works now but still has plenty of room for improvement!

LLM Models and Frameworks

AI Agents and Robotics

AI Assistants

AI Ethics and Governance

AI Research

Stable Diffusion and Image Generation


AI Twitter Recap

all recaps done by Claude 3 Opus, best of 4 runs. We are working on clustering and flow engineering with Haiku.

Claude iOS App Launch and New Features by Anthropic

  • Claude iOS app launch: @AnthropicAI announced the release of the Claude iOS app, bringing their AI to mobile devices. The app is now available on the App Store.
  • New Team plan: @AnthropicAI introduced a Team plan for Claude with increased usage, user management, billing, and a 200K context window for complex tasks.
  • Upcoming collaboration features: @AnthropicAI teased future features like citations from reliable sources for claim verification and integrations with data repositories, while maintaining security and safety.

AI Experts Share Insights

  • Demis Hassabis on AI accelerating science: @demishassabis spoke at @TEDTalks about how AI will speed up scientific discovery and help tackle major challenges like cancer and climate change.
  • Yann LeCun critiques current LLMs: @ylecun argued that knowledge accumulation in LLMs is not a substitute for true understanding, outlining behaviors that show a lack of basic logic and common sense and an inability to acknowledge mistakes.

Personal Experiences and Reflections

  • Anthropic employee shares favorite Claude posts: @alexalbert__, an Anthropic employee, shared their top 10 humorous Claude posts and memes from the company Slack over the past two months.
  • Dealing with hand disability and career change: @jxnlco shared his experience losing the ability to code and work due to a hand disability in 2020, and why he is now consulting rather than working at a fast-paced startup.
  • Leaving Scale AI with insights on ML progress: @russelljkaplan announced his departure from @scale_AI after nearly 4 years, reflecting on the company’s growth and his unique perspective on the future of ML. He plans to share more thoughts on ML progress and his next steps.

AI Research and Updates

  • Lmsys.org offers community access to unreleased models: @lmsysorg clarified they work with model developers to provide community access to unreleased models for preview testing, aiming to bring more models as they scale and partner with open-source and commercial providers.
  • 2020 paper on RLHF+PPO for instruction following: @rasbt highlighted a 2020 paper by Stiennon et al. that used RLHF+PPO to finetune LLMs for instruction following, two years before InstructGPT.
  • Meta presents multi-token prediction for faster LLMs: @arankomatsuzaki and @rohanpaul_ai shared a Meta paper on using multi-token prediction to train LMs more efficiently, with up to 3x faster inference while maintaining or improving downstream performance.
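
As a rough illustration of the multi-token prediction idea (a sketch, not Meta's exact architecture; all names here are illustrative): a shared trunk feeds k output heads, and head i is trained against targets shifted i+1 positions ahead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative multi-token prediction heads: head i predicts the token
# (i + 1) positions ahead from the shared trunk's hidden states.
class MultiTokenHeads(nn.Module):
    def __init__(self, d_model: int, vocab_size: int, k: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(k))

    def forward(self, hidden):  # hidden: (batch, seq, d_model)
        return [head(hidden) for head in self.heads]

def multi_token_loss(logits_list, tokens):
    total = 0.0
    for i, logits in enumerate(logits_list):
        shift = i + 1  # head i is scored against targets shifted by i + 1
        total = total + F.cross_entropy(
            logits[:, :-shift].reshape(-1, logits.size(-1)),
            tokens[:, shift:].reshape(-1),
        )
    return total / len(logits_list)
```

At inference, the extra heads can drive speculative decoding, which is where speedups like the reported 3x come from.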

Other Topics

  • Machine learning book recommendations: @svpino shared his top 3 ML books covering the ML workflow, algorithms, and deep learning tools like Keras, PyTorch, and Scikit-Learn.
  • Critique of Ilya Sutskever’s arguments: @teortaxesTex questioned Sutskever’s claim that predictive objectives will succeed at creating a perfect oracle.
  • Memes and humor: @mervenoyann and @BorisMPower shared humorous images and memes.

AI Discord Recap

A summary of Summaries of Summaries

1. Large Language Model (LLM) Advancements and Benchmarks

2. Optimizations and Techniques for Efficient LLM Inference

  • Significant interest in efficient inference methods like effort/bucketMul for vector-matrix approximation, Ring Attention discussed at the LLM Paper Club, and CUDA optimizations in llm.c like Flash Attention and CUDA Graphs.

  • Debates on using binary vector representations for embeddings, inspired by biological plausibility, with connections to CLIP, Dino, and the RWKV LLM.

  • Techniques to improve transformer interpretability, like the tuned lens method, and exploration of the distributional simplicity bias in neural scaling laws.

3. Open-Source AI Tools, Libraries, and Frameworks

  • LlamaIndex gaining traction for document knowledge graphing, with the new LlamaIndex.TS v0.3 improving type safety and agent support. Discussions on using MongoDB Atlas as a vector store.

  • Widespread adoption of Axolotl for open-source LLM fine-tuning, with new features like LLaMA-3 prompt strategies and integration with dstack for orchestration.

  • Interest in llama.cpp optimizations, with the Flash Attention merge and efforts to support LLaMA 3 tokenization. LM Studio anticipating the 0.2.22 release with llama.cpp updates.

  • Tinygrad developments like renaming Scalar to ConstType, exploring const support variables, and symbolic shape handling by geohot.

4. Multimodal and Retrieval-Augmented AI Capabilities


PART 1: High level Discord summaries

CUDA MODE Discord

  • CUDA C++ Optimizing Insights: Developers shared best practices for the CUDA C++ Core Libraries that revealed performance improvements, though a Google Drive link intended for slides turned out to be empty. They also discussed accurate CUDA kernel profiling techniques, preferring NVIDIA tools like Nsight Compute and Nsight Systems over cudaEventRecord for their lower overhead and more robust profiling.

  • Triton Tackles Block Size and Debugging: In the Triton domain, engineers clarified that Triton's max block size is not limited by hardware constraints the way CUDA's is, and pointed debuggers to the Triton debugging lecture. The channel also noted using triton-nightly to benefit from recent interpreter bug fixes.

  • Sparsity Algorithm Sparks Benchmarks & Learning: AI enthusiasts discussed an algorithm that leverages activation sparsity with a batch size of 1, and the algorithm’s creator engaged, promising to share new benchmarks and insights about the speed/quality trade-offs compared to quantization methods.

  • Strides Align and Kernels Optimize in CUDA: Concerns and strategies over tensor stride alignment and kernel optimizations, like matmul_backward_bias, dominated discussion in #llmdotc. Performance advances using x128 packing, experiments with CUDA Graphs, cuDNN Flash Attention optimization, and the introduction of FP32 master weights were debated, demonstrating a drive toward more efficient CUDA programming.

  • AMD’s ROCm and Torch Nightly Discussions: Users focusing on AMD’s ROCm platform exchanged torch Nightly preferences over Torch 2.3, questioned the absence of the latest version 2.0 of flash attention in AMD’s fork, and shared the addition of a backward pass for AMD Flash Attention, leading to informative exchanges and a tutorial resource on AMD HIP.


Unsloth AI (Daniel Han) Discord

GPU Efficiency Sparks Interest: The A4000 16GB GPU is lauded for its efficiency in training, with its cost-effectiveness earning praise when compared to the A100. The B200 is touted for its potential, being forecasted to be 25x more efficient than the current H100 at a comparable cost.

Techniques in Question: Debate over employing LoRA versus QLoRA revealed that QLoRA may offer a 75% VRAM usage reduction at the possible expense of 1-2% in model accuracy. An 80-10-10 split for training data was suggested to ensure model robustness, and language model fine-tuning continues to progress, as evidenced by its application to Turkish translation.
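
In typical fine-tuning stacks that trade-off comes down to a quantization flag at load time; a minimal sketch using Unsloth's loader (model choice and hyperparameters are illustrative):

```python
from unsloth import FastLanguageModel

# load_in_4bit=True is the QLoRA path: roughly 75% less VRAM at a possible
# 1-2% accuracy cost; load_in_4bit=False keeps 16-bit LoRA with no degradation.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative model choice
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; the rank and target modules shown are common defaults.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```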

Innovations in Model Training: Users reported quantization issues with llama.cpp, leading to GitHub issues such as #3759 and #4180. Workflows for fine-tuning and training were a point of clarification, with strategies for checkpointing and inference providers like Jan and GPT4All being put forward, available at repositories like janhq/jan.

AI Development Roadmapping Proposed: Advocates for a straightforward AI project roadmap emphasized its importance, while the potential of smaller models for enhanced conversational skills is under exploration. Additionally, the concept of retrieval augmentation is gaining traction, with references to implementations such as FlagEmbedding’s GitHub repository.

Size and Performance: A noteworthy mention was that the Phi3 Mini 4k outperforms the larger 128k version in open LLM rankings, prompting a reevaluation of the efficacy of model sizes. There’s an inclination toward models like Phi3 Mini 4k for their efficiency over larger counterparts.


LM Studio Discord

  • Flashy Performance Optimizations: Flash Attention integration into llama.cpp enhances memory efficiency by moving from O(N^2) to O(N) memory complexity, eliciting community enthusiasm for the merged PR available at Flash ATTENTION support merged into llama.cpp.

  • Versatility Issues with Model Constraints: Lively discussions reveal that models face compatibility obstacles when used beyond their designed limits, such as Llama 3 not playing well with old builds and erroring out on contexts larger than 250,000 tokens, despite one user's attempt at a 1M token window with 36GB VRAM.

  • Necessity for Ample Hardware: Threads agree that using LLMs effectively requires considerable system resources, with models like Everything 7b q4 becoming sluggish on a mere 8 GB RAM, and an updated llama.cpp tokenizer error hinting at hefty RAM needs.

  • ROCm Build Roadblocks: AMD users engaged over ROCm and OpenCL integration, with reports of misread VRAM capacity on a 7900 XTX (after previously using an RX 6600), and recommendations to opt for a 7900 XTX over a 7900 GRE for assured LM Studio compatibility.

  • Chasing the Latest Model and Software Releases: The pending release of LM Studio 0.2.22 has generated buzz; it aims to fix tokenizer concerns and enhance model performance, while a llama.cpp beta is also suggested to address issues flagged by the community.

For updates on technical advancements and fixes, the community is advised to check the respective GitHub repositories and release pages for the latest commits and build updates.


Nous Research AI Discord

  • Breaking the OOD Barrier: A solution for positional out-of-distribution (OOD) issues has been proposed to help large language models generalize to longer contexts, which can be found in a recently published paper. An implementation example employing --grp-attn-n and --grp-attn-w parameters is available in the llama.cpp repository.

  • Llama-3 Leaps Ahead: Nous Research has launched Hermes 2 Pro on Llama-3 8B, touting Function Calling and Structured Output enhancements and outperforming Llama-3 8B Instruct on prominent benchmarks. A quantized version targeting efficiency without compromising advancements is also available on HuggingFace.

  • LLM Performance and Practicability: Discussions indicated that quantization to 5.5 bits per weight is a threshold before performance loss in large language models becomes significant. The new Hermes 2 Pro Llama 3 has unlearned specific tasks while gaining new ones like function calling, with the community exploring the optimization of long context lengths and integration of advanced tokenization mechanisms.

  • Data Sets and Tools for AI Innovation: A new Wikipedia RAG dataset has been released, paralleling a study on leveraging LLMs for synthesizing multilingual training data, available here. Moreover, discussion included the integration of Pydantic in the rework of Cynde and the introduction of Logfire, a platform praised for its simplified code observability, detailed here.

  • Virtual Simulation Advances: The community has seen the release of business and music industry simulators, CompSimulator and Snow Singer Simulator, aimed at providing immersive AI-driven experiences. In addition, talks from AGI House SF have spurred plans for community meetups, with a noted feature that LLAMA 3 bots on HF Chat yield consistent responses for identical messages.


Stability.ai (Stable Diffusion) Discord

  • SD3 Anticipation Fizzles Without Launch: Skepticism clouds the Stable Diffusion 3 (SD3) release, as expectations for an April or May launch were met with disappointment; there’s concern Stability AI may face criticism for statements about SD3 being free and open-source.
  • Local Interface Lineup Evaluation: AI enthusiasts are comparing Stable Diffusion local interfaces like ComfyUI, AUTOMATIC1111, Fooocus, and Forge, with recommendations hinging on user-friendliness and specific hardware requisites such as NVIDIA or AMD GPU compatibility.
  • AI-Assisted Prompt Engineering: There’s an ongoing debate about the best tools for effective image description prompts with mentions of ChatGPT, Gemini, Claude 3, and idefics2; these are potentially valuable for refining prompts to enhance image generation results.
  • AI Service and Privacy Tools: Discussions indicate trends in investing in AI services like Gemini and Claude 3, coupled with the strategic use of VPN technologies, including DNS over HTTPS, for bypassing regional restrictions or maintaining user anonymity.
  • Extension Talk for Automatic1111 Fans: Queries surfaced about embedding labels within images using an Automatic1111 extension, and whether custom interfaces such as ComfyUI offer features analogous to clip skip and stylizer.

OpenAI Discord

  • Chat Control Gets Upgraded: OpenAI has rolled out updated data controls for ChatGPT Free and Plus, allowing users to see chat history while opting out of data use for training. They also introduced Temporary Chat for one-off sessions with no chat history retention.

  • GPT-2’s Resurgence in Chatbots: Members are exploring the gpt2-chatbot with mixed feedback; it excels in certain scenarios but is also noted to fail occasionally. There’s intrigue regarding its capability for infinite generations, though access issues have been reported.

  • Dissecting AI Emotional Intelligence: In-depth discussions on AI’s potential to develop emotion have drawn parallels to human development. Emphasis lies on whether empathetic understanding or akin emotional responses are either achievable or desirable in AI systems.

  • DALL-E’s Free Tier Functionality Debates: Users have been discussing the offerings of OpenAI’s services like DALL-E for free users, balancing between business sustainability and expanding user functionalities.

  • Harnessing Positive Prompting Results: AI Engineers are exploring efficient prompt engineering, with a focus on positive prompting and meta-prompting to achieve more effective interactions with AI models, suggesting strategies like “instead of ‘x’, use ‘y’” to refine output quality.


Perplexity AI Discord

Pages Feature Prepares for Beta Liftoff: Perplexity AI announces an upcoming feature named Pages designed for crafting shareable, in-depth explorations of topics; early access to the beta version is available for interested users.

API Citations, the Missing Piece: Engineers expressed concerns about accessing citations through API requests when using Perplexity-online models, alongside discussion of discrepancies between Pro UI and API model results; the API documentation was clarified to be the go-to resource for model details.

Limitations and Glitches in Spotlight: Members discussed the daily limit of 50 Opus uses, glitches in Pro Search and referencing tools, and slow responses from AI models, with technical advice offered on possible email filtering by service providers causing login issues.

Discovery Through Shared Content: Users actively shared insights and links on diverse topics, including Microsoft Research Asia, the Vimeo API, and Tesla’s self-driving tech; plus, a shared newsletter provided a window into product development insights.

Claude 3 Policy and Model Utilization Clarified: Queries about the usage policy of Claude 3 led to discussions on whether Perplexity’s or Anthropic’s policies are applicable, while the usage of online models in the Pro UI was explained to be either finetuned or employing a search engine-style vector database for responses.


Eleuther Discord

  • Speeding Up Inference with Effort/BucketMul: A new algorithm, effort/bucketMul, was introduced, designed to significantly accelerate vector-matrix approximation and large language model (LLM) inference, promising real-time computational load adjustments and compatibility with models like Mistral. Further details can be found here.

  • Binary Beats Hypersphere for Embedding Efficiency: Discourse over embedding strategies yielded insights into the efficiency of binary vector representations for embeddings, backed by biological plausibility and computational frugality; a connection was made to the RWKV LLM, which might learn faster by applying these principles (a toy sketch follows this list). To delve deeper, read about the RWKV LLM and seminal embedding works such as CLIP and Dino.

  • Demystifying the Black Box and Improving Benchmarks: Conversations around the opacity of LLMs noted the gap between their complexity and our comprehension, with a focus on improving fairness in benchmark comparisons by avoiding training LLMs on benchmark test sets. Refer to the discussion on bias in benchmark datasets.

  • KANs Take the Lead Over MLPs: Emerging research introduced Kolmogorov-Arnold Networks (KANs), outshining Multi-Layer Perceptrons (MLPs) in terms of accuracy and interpretability with efficient scaling laws. The pivotal paper on KANs is found here.

  • Striving for Transparent LLM Computations: A member’s exposition theorized about the computational models within sequence-prediction models, discussing how tied embeddings might influence interpretability and pondering experimental methods to validate their hypotheses. Essential reads include Deriving a Model of Computation for Next-Token Prediction and papers on the tuned lens method and the concept of distributional simplicity bias.
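
Returning to the binary-embedding item above: the core trick can be shown in a few lines (a toy sketch, not the participants' method): keep only the sign of each dimension and compare vectors by the fraction of matching bits.

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.standard_normal((1000, 768)).astype(np.float32)  # stand-in float embeddings

bits = emb > 0  # 1 bit per dimension instead of 32

def hamming_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Fraction of matching bits; a cheap proxy for cosine similarity.
    return float((a == b).mean())

print(hamming_similarity(bits[0], bits[1]))
```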


HuggingFace Discord

  • Cash Prizes for CVPR Participation: HuggingFace announced CVPR competitions with a total prize pool of over $120,000, including competitions such as SnakeCLEF, FungiCLEF, and PlantCLEF slated for June 17-21, 2024.

  • Transformers and Gradio Level Up: A significant update to the Transformers library introduces new models, with Phi-3 now operable in the browser. Gradio also released v4.28.0, featuring custom components, and parallel updates arrived for the Datasets library, reaching v2.19.0 with Polars compatibility.

  • AI Tools You Should Experiment With: New AI tools and methods are shared, including a Medium post on “5 Interesting AI Tools Everyone Should Try” and a discussion on accelerating diffusion models in PyTorch 2, as suggested in Hugging Face’s documentation.

  • Med-Gemini: AI for Medicine Introduced: A YouTube video provides insights into Google’s Med-Gemini, a multimodal GenAI model designed for medical applications, promoting understanding of such models’ scope and potential.

  • Job Opportunities and Community Insights: A software engineer with extensive experience inquired about opportunities at Hugging Face and was directed to the available positions. Meanwhile, community exchanges included discussions on intent recognition issues with the Rasa chatbot framework, learning curves between PyTorch and TensorFlow, and creating instruction datasets for LLM finetuning.

  • Gradio’s Status Checkpoint: Gradio faced issues with their Share Server impacting usage on Colab; they provided a status page to keep track of progress on the fix.

  • Innovations in the AI Community: Contributions from community members feature projects like a PnPR-GCN technique for leak-free link prediction and HDR imaging challenges, articulating solutions and engaging with the wider discourse on AI advancements.

  • Lean Learning Approaches: Within reading groups, attention has been turned to topics such as graph neural networks with arXiv:2404.14928 and the application of negotiation as a metric for evaluating LLM alignment touched upon in NegotiationArena shared at arXiv:2402.05863.


LlamaIndex Discord

  • RTX 4080: Enough For Small Language Models?: Engineers discussed whether a gaming card like the RTX 4080 is suitable for running and fine-tuning smaller language models, noting the importance of VRAM but suggesting limitations in fine-tuning models larger than 7B with small batch sizes.

  • Local AI Processing Values Security: The conversation highlighted the advantage of a local PC for dealing with sensitive data and robust computing tasks over cloud solutions like Google Colab, which may raise privacy concerns.

  • Introducing Word Loom for AI Language Management: A new open specification called Word Loom was introduced, targeting the efficient management and exchange of language for AI, aiming for a clear separation of code from natural language and better composability, with detailed information found on GitHub.

  • AI Financial Genius Works Without Human Help: A groundbreaking financial assistant now boasts the ability to calculate percentage evolution, CAGR, and P/E ratios over unstructured financial reports autonomously, as highlighted in a recent tweet.

  • LlamaIndex Scores New Technical Capabilities: The latest release, LlamaIndex.TS version 0.3, brings significant improvements including agent support for various platforms, Web Streams enhancements, and a more resilient type system as announced in a tweet.


Modular (Mojo 🔥) Discord

Mojo Marches On: The Mojo developer community celebrated the first anniversary of Mojo's launch, praising the addition of traits, references, and lifetimes, which significantly enhanced the standard library. On the enhancements front, suggestions included allowing negative numbers and implementing a fallback for scalar processing, with supporting articles linked in the relevant issues.

Performance Power-ups: Innovative optimization of string allocations and conversions in Mojo cut processing time from 18.5 to 12.5 seconds for 100M records, with the latest effort reducing it further to 3.8 seconds utilizing multi-core processing techniques. A call was made to form Team-Mojo for the One Billion Row Challenge, seeing it as an opportunity for showcase and community collaboration.

Syntax and Semantics Synergy: Discussions on syntax and semantics highlighted the importance of Mojo’s syntax alignment for users and how inout in Mojo bears similarity to pass-by-reference in C++, but with its nuances. Questions about the __source_location() function led to a conversation pondering the inclusion of function_name in its output and the replacement of these features in the nightly branch.

Exploring Concurrency Considerations: The conversation speculated on Mojo's potential concurrency model, theorizing it might mirror an actor model more than a Go-style one, with a spotlight on avoiding heavy runtime inclusion. The Mojo compiler, with an LLVM backbone, has a dedicated YouTube video explaining its underpinnings.

Tweet Teasers Lead to Speculation: Modular spurred curiosity with a series of unspecified tweets, teasing intriguing developments without revealing the specifics, piquing interest for details beyond the announcements.


OpenAccess AI Collective (axolotl) Discord

Exploring Axolotl’s Model Support: In a discussion within the #axolotl-phorm-bot channel, it was clarified that Axolotl supports GaLore but not the phi-3 format. Community advice recommended checking the Hugging Face documentation for details on enabling GaLore. Meanwhile, an untested PR was highlighted as a resource for those looking to add the command-r model to Axolotl.

Strategies for Effective Chat-Tokenization: Members in #general channel debated the impact of the Beginning of Sentence (BOS) token in tokenizer behavior, and the importance of specifying it correctly in different scenarios. Also, a study on generalist foundation models prompted discussions on the effectiveness of complex prompting strategies and the challenges in rendering academic theory practical.

Best Practices for Fine-Tuning New Models: The #general-help channel was abuzz with fine-tuning discussion, with recommendations such as starting with smaller models (e.g., an 8B model) for beginners. Practical tips for dataset conversion for the ShareGPT loader and inquiries about fsdp compatibility with lora were also discussed.

Tutorial Collaboration Strikes a Chord: In the #community-showcase, a tutorial illustrating the combination of axolotl and dstack, an open-source container orchestrator, was shared and well-received, emphasizing ease-of-use and flexibility. Contributors are directed to GitHub for detailed usage.

Compute Resources for Collaboration: An offer in the #axolotl-dev channel extended compute resources to other members for the purpose of helping with triage and troubleshooting, which could be particularly useful for those involved in bug fixes and enhancements.


LAION Discord

AI Enters TOS Grey Zone: A discussion emerged around users employing AI products without consenting to terms of service, highlighting a gray area in user agreement enforcement and prompting debate on legal implications for both users and providers.

Leaderboard Integrity Challenged: There’s a push for a more transparent AI model leaderboard, emphasizing the need for openness and verifiability, while members expressed skepticism over LMSYS’s Chatbot Arena, raising concerns of lack of objectivity and opaque data practices. The notion of incorporating only open source models and filtering by open weights was put forth as a criterion for improved leaderboards.

Eager for Efficiency: Engineering conversations revolved around a multitude of optimization strategies, from considering GANs for superior model reconstruction to discussions of NATTEN's CUDA implementation and the development of projects like magvit2.

Breaking New Ground in AI and Medicine: The community took note of a published study on cardiac ultrasound utilizing OpenCLIP that was recently featured in Nature Medicine, despite some existing issues with the study.

Revolutionizing Networks and Fact-Checking: Enthusiasm was evident for the innovative Kolmogorov-Arnold Networks (KANs), poised to outdo MLPs in accuracy and interpretability (the paper on KAN), and the introduction of VisualFactChecker, a training-free pipeline designed to bolster visual content captioning fidelity (the paper on VFC).


Latent Space Discord

Decentralizing AI’s Compute Power: Prime Intellect has plunged into the exploration of decentralized AI training methodologies, aiming to rival the expansive GPU clusters employed by larger corporations. Their platform is geared towards leveraging globally distributed compute resources, as detailed in their extensive blog post.

Starcoder Rises: Hugging Face has launched a new Large Language Model called StarCoder2-15B-Instruct-v0.1, focusing primarily on code generation. They’ve made the model and pipeline open-source, inviting the community to engage, as outlined on their announcement page.

Simulating AI Societies on Consumer Tech: An experimental setup involving 300 AI agents called AI Town is reported to operate seamlessly on a MacBook M1 Max. The intriguing tweet reveals the capabilities and potential of AI simulations on consumer-level hardware.

LLM Paper Club: Ring in the Discussion: The LLM Paper Club’s upcoming event features a collaborative discussion with the StrongCompute team on the Ring Attention paper. Engineers interested in the latest research findings can join via this Zoom link.

Video Meet for the Tech-Elite: A Zoom meeting video call has been set up for a more visual interactive discussion, likely concerning ongoing work or a paper club event. The community members can join using the provided Zoom Meeting link.


OpenInterpreter Discord

Respect Is Tech’s Best Friend: A community reminder underscored the imperative of respect and constructive interaction; as the group expands, it is vital that everyone feel welcomed and valued for a collaborative future.

Open Interpreter Becomes Browser-Savvy: The Open Interpreter tool was confirmed to possess capabilities for web browsing and data scraping tasks without the need for traditional browser control, fostering direct web interactions through the AI.

Hitting the Right Note with DIY Speaker Amp: To boost the audio output from speakers, one solution recommended was an external amplifier, highlighting one potential amplifier on Amazon, though real-world application awaits confirmation upon testing.

R1’s AI Unboxing Sparks Integration Talks: An MKBHD YouTube review on the AI product, Rabbit R1, watch here, ignited discussions on its potential integration with OpenInterpreter, with engineers eager to push the envelope of interconnected AI systems.

Tunnel Vision for Successful OI Connection: Engineers traded know-how on establishing a stable connection with an OpenInterpreter server, including the method for setting up new domains with ngrok and modifying the tunnel.py file, aiming to iron out connection wrinkles—more details at ngrok domains page.


OpenRouter (Alex Atallah) Discord

  • New AI Models Hit the Ice: Snowflake Arctic 480B and FireLLaVA 13B have been released. Snowflake Arctic 480B boasts a hybrid transformer architecture optimized for coding, available at Snowflake Arctic 480B, while FireLLaVA 13B is a multimodal model from Fireworks, accessible at FireLLaVA 13B. Pricing and developer specifications have been updated to reflect their enhanced capabilities.

  • OpenRouter Gets Smarter with Efficient Load Handling: New load balancing features aim to distribute provider workloads more effectively, complemented by real-time monitoring tools for latency and provider performance at Activity page, improving overall system robustness.

  • Streamlined Resources for Developers: OpenRouter’s documentation has been updated to cover image and multimodal requests, tailored tool calls, and function calling more efficiently; details can be found at Image Requests and Tool Calls.

  • Cost Reduction in AI Services: OpenRouter has reduced prices significantly: a major 40% cut for Mythomax Extended services, alongside a modest 4% saving on Mixtral 8x7b Instruct, reinforcing the platform’s commitment to affordable AI services.

  • AI Writes with a Swedish Flair: Skribler, a tool designed to assist Swedish authors with various facets of writing by incorporating different AI models, is on the rise with a user base already willing to pay for its services; check it out at skribler.se.


AI Stack Devs (Yoko Li) Discord

Crisp Visuals Spark Interest: Hexagen World surprised members with high-quality diffusion model outputs, suggesting promising avenues for interactive AI game development.

Retro Games Reimagined with AI: The Guild discussed reviving retro games like Farmville using Generative AI, with WebSim as a potential platform for these nostalgic reboots.

Spy Games Meet Generative Towns: An intriguing concept for a 1950s-themed AI town with a communist spy character was proposed, generating interest in creating an immersive cat-and-mouse game within WebSim.

Join the AI-Animated Conversation: Those curious about AI-driven animation received an invitation to a specialized Discord group via a community link, offering room for collaborative discussions and projects in interactive AI.

Dev Discussions Highlight Compatibility Issues: AI devs tackled local setup processes, noting particular issues with Windows systems and the importance of using the correct Node version (nvm use 19). Some even considered switching to Linux, especially since games like Stellaris are supported, as evidenced by information found on WineHQ.


Cohere Discord

Command R Impresses: The Cohere community has expressed appreciation for the CommandR/R+ models, highlighting their polished performance which seemingly surpasses other large language models for an enterprise-level experience.

LLM Grammar Secrets Exposed: A discussion on LLMs (Large Language Models) and their ability to generate grammatically correct sentences revealed insights into word and sentence embeddings, and the significance of the self-attention mechanism, with a resource provided for in-depth understanding.

AI Legal Eagle Takes Flight: A webinar on constructing an AI legal assistant using Cohere’s RAG saw the community engaged, with a link to the recording made available on YouTube.

Azure Meets OAuth: Instructions for setting up OAuth with connectors on Azure using the Cohere toolkit were clarified, highlighting support for Azure integration while keeping data internal, as detailed on their GitHub page.

Multilingual Mastery in the Making: The implementation and potential of multilingual support in Command-R is under active evaluation by the community, with particular attention to languages like Norwegian and the desire for enhanced benchmarks.


LangChain AI Discord

PDF Table Extraction Proves Tricky: Engineers shared challenges with table extraction from PDFs using the unstructured library, noting subpar outcomes, particularly with multi-page tables. No solution was provided, indicating an area ripe for development or an opportunity for tool recommendations.
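
For readers hitting the same wall, a minimal sketch of the unstructured table path (flags vary by version, and multi-page tables may still split):

```python
from unstructured.partition.pdf import partition_pdf

elements = partition_pdf(
    filename="report.pdf",        # illustrative file name
    strategy="hi_res",            # layout-model-based parsing
    infer_table_structure=True,   # request table structure where detectable
)

tables = [el for el in elements if el.category == "Table"]
for table in tables:
    print(table.metadata.text_as_html)  # HTML rendering of the detected table
```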

LangChain and Llama 3 Join Forces: There was a conversation about integrating Llama 3 with LangChain, directing users to utilize Fireworks and corresponding API keys. Additionally, a mention about the re-inclusion of Google Drive libraries in a project was noted, highlighting the cyclical nature of tech dependencies.

Launch, Updates, and Spec Introductions: Noteworthy developments include the launch of QuickVid for summarizing YouTube content, the update of LangChain chatbot to 0.1.17, and the introduction of Word Loom as a potential standard for AI language management, feedback solicited at their GitHub Gist. Queries about the usefulness of a detailed performance report comparing various LLMs for content creation were also raised.

Knowledge Graph Aspirations and AI Sales Agents: Members shared insights into tools for converting documents into knowledge graphs and the development of AI-powered Sales Agents. For the former, layout parsers and Azure Doc AI were proposed, alongside exploring LangChain’s documented graph construction methods. The latter involved SalesGPT logic and a call for partnerships.

RAG Innovations and Language-Focused Tutorials: Engineers discussed a variety of RAG applications, including an Advanced RAG assistant for the French-speaking community, local training of Llama3, and an Adaptive RAG technique that responds based on query complexity. Related instructional videos were shared: French RAG Assistant, Local Agentic RAG w/ llama3, and LangGraph + Adaptive Rag + LLama3 Python Project.


Mozilla AI Discord

Mozilla AI is Hiring, Wave at Lm-buddy: Mozilla AI is currently expanding its team, with opportunities posted on their official Discord channel, and has also released Lm-buddy, a new open-source tool aimed at improving model evaluation efficiency.

LLaMA3:8b on M1 MacBook Air Confirmed for Testing: After users encountered issues with LLaMA3:8b running on M1 MacBook Air, the response indicated that testing on M1 will become a priority once other support issues are resolved.

Whispering to Llamafile: Proposals were made to integrate whisper.cpp models into llamafile for enhanced inference, despite the challenges in adding microphone and speaker functionalities.

Performance Debate Clarified: An article by Justine Tunney suggesting np.matmul performs at 29 gflops was contested, leading to a clarification that this was specific to an Intel computer on Ubuntu and that actual performance may vary.

Simultaneous Llamafiles and Path Customization Explained: Discussions in the guild confirmed that running multiple llamafiles with different models is possible, with operating systems managing the resources. Users also learned that customization using the --server --path PUBLIC_PATH option is limited to replacing .html and .js files in the zip file.


tinygrad (George Hotz) Discord

Tinygrad Undergoes Tensor Transformations: The tinygrad project implemented major updates with a commit renaming Scalar to ConstType, contributing to standardization efforts in the codebase. Discussions spotlighted the potential to optimize constant handling in operations by introducing const support variables and the significance of const Variables for operations linked to symbolic dimensions.

Graph Visualization Interest Piques for Backward Passes: The conversation included curiosity about visualizing graph diagrams for backward operations with a focus on issue #3572. There are hints at using dot files and setting GRAPH=1 for visual aid in understanding these operations.

Symbolic Dimensions Step into the Spotlight: George Hotz shared insights on working with symbolic shapes and introduced a pull request with a skipped test for symbolic arange, indicating an ongoing effort to enhance tinygrad’s capabilities with symbolic dimensions.

JIT Crafting and Mean Calculations: A dialogue on improving tinygrad’s Just-In-Time (JIT) compilation with symbolic variables led to the suggestion that a robust test would involve calculating the mean of variable-length 2D tensors. Such enhancements could refine the efficiency and performance of the JIT compiler.

CUDA Challenges on Nvidia Xavier: Technical discussions touched upon challenges faced while running EfficientNet examples on Nvidia Xavier, emphasizing the need to ensure CUDA=1 for proper script execution. Members also deliberated on whether Rednode’s representation in tinygrad could be complicating symbolic compiler logic.


Interconnects (Nathan Lambert) Discord

  • Claude Joins the AI Chat App Scene: Anthropic has released its Claude app, stirring up curiosity among members about its performance compared to OpenAI’s solutions. While no detailed comparisons were provided, one user has downloaded the app and reported a smooth initial experience, with particular kudos to Anthropic’s branding.

  • Elevating Performance Through Feedback: After receiving pointed feedback, a member significantly improved their work quality, earning commendation from their peers. Specifics of the improvement were not given, but the resulting boost in productivity was notable.

  • AI Leaderboards Under Scrutiny: An article suggests that AI leaderboards might be outdated, highlighting that the most accurate system for code generation, as per HumanEval benchmarks, is LDB. However, its reliance on expensive calls to models like GPT-4 casts a shadow on its efficiency and cost-effectiveness.

  • ML Collective Attendance: An individual confirmed sparse attendance at ML Collective meetings, indicating ongoing participation but no specific outcomes or details from the meetings were discussed.


Alignment Lab AI Discord

  • Spam Alert Across the Guild: Multiple channels within the Discord guild were infiltrated by inappropriate content that advertised adult material involving potentially underage subjects, alongside Discord invite links purportedly offering leaked content.
  • Urgent Need for Moderation: These messages violate community guidelines, hint at illegal activities, and disregard the purpose of professional discourse expected in technical discussions.
  • Unwelcome Interruptions: The spam disrupted numerous channels, ranging from those dedicated to AI discussion to collaboration and general chat, necessitating attention by moderators.
  • Content Warning for Engineers: Engineers must be cautious as the spam contains potential security risks, such as phishing attempts, that could compromise professional and personal data.
  • Call to Action: Immediate actions are advised to remove the content, ban the posters, and enhance security measures to prevent future incidents.

Skunkworks AI Discord

  • Prompt Engineering Propels LLaMA-3: The LLaMA-3 instruct prompt strategies have been updated, leading to performance improvements, with the associated changes detailed in a GitHub pull request.
  • Easing Dataset Woes: Proper usage of eot_id has resolved challenges related to dataset entry formatting, proving to be more efficient than manual </s> tagging.
  • Meta Harnesses Iterative Reasoning: New “Iterative Reasoning Preference Optimization” techniques have elevated LLaMA-2-70B-Chat’s accuracy, as demonstrated by improved scores on GSM8K and ARC-Challenge benchmarks; the paper can be read here.
  • Axolotl Fine-Tuning Success: A user reported success fine-tuning LLaMA-3 8b with Axolotl, noting enhanced model outputs.
  • Cranking Up Coding Jams: A motivational anime track, “NEVER GIVE UP YOUR WAAAAAAAAAAAAY,” was shared to possibly fuel late-night coding sessions, complete with a YouTube link and a note of Patreon support for the creators.

DiscoResearch Discord

LLaMA beats GPT-4 in Language Showdown: Results from scandeval.com indicate that LLaMA 3 outperforms GPT-4 in the ScandEval benchmark for German natural language tasks, sparking discussions about new AI model capabilities.

Accelerated Local Loads Trump Sluggish Cloud: An engineer reported that a program loads in 3 seconds on a local machine, pointing towards issues other than storage affecting slower load times when running jobs elsewhere.

QDoRA Expands LLaMA’s Middleway: Exciting progress in Large Language Model (LLM) expansion has emerged with the mention of QDoRA, a solution fostering the growth of models like LLaMA; the process is outlined in an Answer.ai blog post.

Avoiding Forgetfulness in AI Training: The guild discussed methods to prevent catastrophic forgetting during post-pretraining, referencing an Arxiv paper on enhancing Transformer blocks that helps LLMs retain old skills while learning new ones.

Fusing AI Past and Present: Guild engagement highlighted the prospect of “Non-forgetful Learning” in LLMs, where expansion techniques are crucial for merging traditional AI skills with newer, more advanced capabilities.


Datasette - LLM (@SimonW) Discord

  • Designing User-Centric Data Retrieval: A member proposed a frontend feature for Datasette allowing users to select country-specific data from a dropdown with the goal of improving user experience in data fetching.
  • Debating URL vs. UI Customization: Two user experience strategies emerged: dynamically updating the URL to display relevant data upon selection, and developing a customizable interface with “buildable” queries based on user input.

PART 2: Detailed by-Channel summaries and links

CUDA MODE ▷ #general (4 messages):

  • CUDA Best Practices Shared: The channel shared a Twitter link about best practices for CUDA C++ Core Libraries and also provided slides via a Google Drive link, but the folder was noted to have no files.

  • Prompt Action on Spam: A user flagged moderators with a role mention, and another member took swift action, confirming the removal of a spammy post.

  • Understanding PyTorch’s autograd.grad: A member posed a question about using torch.autograd.grad to obtain the diagonal of the Hessian matrix of a function output with respect to parameters, with two consecutive gradient computations.
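
The two-consecutive-gradients pattern in question looks roughly like this (a minimal sketch with a toy function):

```python
import torch

# Toy example: for loss = sum(w**3), the Hessian diagonal is 6*w.
w = torch.randn(5, requires_grad=True)
loss = (w ** 3).sum()

# First gradient with create_graph=True so the result is differentiable again.
(g,) = torch.autograd.grad(loss, w, create_graph=True)

hess_diag = torch.zeros_like(w)
for i in range(w.numel()):
    # Differentiate the i-th gradient component w.r.t. w; its i-th entry is
    # d^2 loss / dw_i^2. retain_graph keeps the graph alive for the next i.
    (row,) = torch.autograd.grad(g[i], w, retain_graph=True)
    hess_diag[i] = row[i]

print(torch.allclose(hess_diag, 6 * w.detach()))  # True
```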

Link mentioned: CCCL - Google Drive: no description found


CUDA MODE ▷ #triton (13 messages🔥):

  • Triton’s Block Size Puzzle: A member inquired about the maximum block size in Triton, thinking it would match CUDA’s limit. In response, it was explained that Triton’s block size is not fundamentally tied to the hardware and could theoretically be very large, with no direct relation to the number of threads launched per block.

  • Triton Debugging Techniques Probed: An individual sought advice on the best practices for debugging Triton kernels, finding challenges with TRITON_INTERPRET=1 and device_print. Another member encouraged reviewing a Triton debugging lecture for insights, as it might provide useful strategies.

  • Need for Triton Interpreter Bug Fixes: Following up on debugging issues, a user mentioned that the TRITON_INTERPRET=1 setting was causing abnormal program behavior. It was suggested to install Triton from source or use triton-nightly to benefit from recent interpreter bug fixes (a minimal interpreter-mode sketch follows this list).

  • Curiosity for Triton’s Release Schedule: A member asked about the expected release date for the next version of Triton, as they are currently using version 2.3. The response was that there is no solid plan yet for the upcoming release.
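
A minimal interpreter-mode debugging sketch (the kernel is illustrative; note the env var must be set before triton is imported):

```python
import os
os.environ["TRITON_INTERPRET"] = "1"  # enable the interpreter before importing triton

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    tl.device_print("x", x)  # printing from inside the kernel
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

n = 1024
x, y = torch.randn(n), torch.randn(n)  # CPU tensors work in interpreter mode
out = torch.empty_like(x)
add_kernel[(triton.cdiv(n, 128),)](x, y, out, n, BLOCK_SIZE=128)
```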

Link mentioned: Lecture 14: Practitioners Guide to Triton: https://github.com/cuda-mode/lectures/tree/main/lecture%2014


CUDA MODE ▷ #cuda (14 messages🔥):

  • Exploring CUTLASS vs CuBLAS: A member highlighted the performance of CUTLASS, which outperformed CuBLAS with a matrix multiplication benchmark (8192 x 8192 x 8192), achieving 288 Teraflops compared to CuBLAS’s 258 Teraflops. When integrated into Python, however, CUTLASS’s performance advantage disappeared, matching CuBLAS at 257 Teraflops.
  • Kernel Timing Conundrums in CUDA: A discussion emerged around accurately profiling time durations within CUDA kernels, as utilizing cudaEventRecord showed unstable timings, particularly in shared memory versions of matrix multiply kernels with varying tile sizes.
  • NVIDIA Tools for Accurate Profiling: It was suggested to use NVIDIA’s Nsight Compute or Nsight Systems for more robust profiling, as they are built to be more accurate and may incur less overhead than custom timing with cudaEventRecord (a minimal event-timing sketch follows this list).
  • Understanding Profiling Overheads: A member queried about inconsistencies between cudaEventRecord timings and ncu's Duration field, concerned that ncu's report might include profiling overhead. The response clarified that ncu runs warm-up kernels, which could account for additional reported time, but suggested its numbers are ultimately more accurate.
  • Nsight Systems vs. NCU Utility: Clarification was given that both nsys and ncu can be used for profiling CUDA kernels, with each providing different utilities and interfaces for analyzing and understanding kernel performance.
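
For reference, the event-based timing pattern under discussion looks roughly like this, shown through PyTorch's CUDA event wrappers (the matmul is a stand-in workload):

```python
import torch

a = torch.randn(8192, 8192, device="cuda")
b = torch.randn(8192, 8192, device="cuda")

torch.matmul(a, b)          # warm-up launch, excluded from timing
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
stop = torch.cuda.Event(enable_timing=True)
start.record()
torch.matmul(a, b)
stop.record()
torch.cuda.synchronize()    # wait for both events before reading them

print(f"kernel time: {start.elapsed_time(stop):.3f} ms")  # milliseconds
```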

Link mentioned: Strangely, Matrix Multiplications on GPUs Run Faster When Given “Predictable” Data! [short]: Great minds discuss flops per watt.


CUDA MODE ▷ #algorithms (5 messages):

  • Sparsity and Quality Trade-offs: The conversation revolves around an algorithm potentially leveraging batch size=1 activation sparsity, which might preserve compute and quality. However, there is concern that this approach could face limitations similar to activation sparsity when dealing with batched computations over one.

  • Effort Creator Chimes In: The creator of the algorithm mentioned joined the chat and is open to discussing their findings about its performance.

  • Benchmark Revelations: The creator provided an update that new benchmarks show effort/bucketMul performs worse in terms of speed/quality ratio when compared to quantization, with an article to come detailing these findings.

  • Quality Keeps Up with Pruning: Despite speed/quality concerns, the creator claimed that in terms of quality degradation, their method appears superior to simply pruning the smallest weights, promising to post supporting charts.

  • Direct Comparison Shared: A direct comparison was shared highlighting the difference between removing the lowest weights from a matrix and skipping the least important calculations, noting the creator’s ongoing process of learning about sparsity.


CUDA MODE ▷ #triton-puzzles (2 messages):

  • Confusion Over Sequence Length in Puzzle 9: A user expressed confusion about Puzzle 9’s terminology, specifically the parameters T and N0. The formula for z_i was also a point of confusion, as the user was unsure how to interpret it from the provided information.
  • Possible Description Conflict Noted: Another member acknowledged potential conflicting information in the problem description of Puzzle 9 and shared their assumption that N0 equals T for solving purposes.

CUDA MODE ▷ #llmdotc (809 messages🔥🔥🔥):

  • CUDA Optimization Discussions Intensify: The CUDA MODE Discord community continues to scrutinize and optimize various kernel operations. Members are experimenting with aligning tensor strides and optimizing the matmul_backward_bias kernel, with an eye on future enhancements using x128 packing for increased performance. Several iterations have been proposed for the gradient clipping and adam optimizer kernels, considering their impacts on computational efficiency and memory usage.
  • CUDA Graphs and cuDNN Flash Attention in Action: The channel’s contributors have successfully integrated optional support for cuDNN flash attention, seeing meaningful speed improvements, although the exact performance gain over current bespoke kernels remains under evaluation. CUDA graphs have been mentioned as a mechanism for optimization, though more detail is needed to understand their current state of use within the community’s codebase.
  • Comparing PyTorch and llm.c Performance: Recent discussions and benchmarks suggest that llm.c is closely matching or surpassing the performance of PyTorch for the GPT-2 model training, even outperforming PyTorch 2.3.0 by up to 32%. However, with PyTorch nightly builds showing considerable performance improvements due to recently merged PRs, llm.c is now slightly behind with a ~4% slower token processing rate.
  • Debates Over Memory Efficiency and Operation Fusing: There’s ongoing discussion about the relative merits of fusing operations like GELU with matmul kernels to save memory. Though such fusion is tricky and could potentially hurt performance, some suggest fusing into the epilogue of the preceding matmul or re-computing in backward passes could be a memory-efficient compromise. Concepts like prologue vs. epilogue fusion and matmul’s need for input/output tiles in forward/backward passes are central to these debates.
  • Potential for Master Weights in FP32: A suggestion was made to keep master weights in FP32 by default to provide a more stable and reliable implementation. This modification would imply certain changes to the optimizer update function and memory allocation scheme, with lazy initialization during the update stage as a possible approach.

Links mentioned:


CUDA MODE ▷ #rocm (8 messages🔥):

  • Issues Building with Torch 2.3: A member mentioned difficulties in building with Torch 2.3 and expressed a preference for using torch nightly instead.
  • AMD Lacks Latest Flash Attention Kernels: A member queried why AMD’s official fork has not ported version 2.0 of flash attention, despite there being newer flash attention kernels available.
  • Backward Pass Added to AMD Flash Attention: In response to a question about the backward pass for AMD Flash Attention, it was confirmed that the backward pass was indeed implemented, with a link to the ROCm flash-attention GitHub repo.
  • AMD RDNA3 Support in Flash Attention: A member asked which branch has the RDNA3 working for the ROCm flash-attention, indicating the presence of allowed_archs in the code.
  • AMD HIP Tutorial Playlist Shared: Another member found the information interesting and shared a YouTube playlist for an AMD HIP Tutorial, which covers using the HIP programming language on the ROCm platform.

Links mentioned:


Unsloth AI (Daniel Han) ▷ #general (572 messages🔥🔥🔥):

  • Efficiency vs. Power Debate: The A4000 16GB GPU was highlighted as efficient for training, costing significantly lower than an A100 per hour. The upcoming B200 is pegged as a game-changer, potentially 25x more efficient than an H100 at a similar price point.

  • Finetuning With LoRA and QLoRA: A discussion clarified the differences in VRAM usage and potential accuracy degradation between using LoRA (16bit) and QLoRA (4bit). QLoRA saves 75% of VRAM but might result in 1-2% accuracy loss, whereas LoRA has no accuracy degradation.

  • Training Advice Shared: A recommended strategy was to split datasets into 80% for training, 10% for tuning hyperparameters, and 10% held out for final model evaluation without further tuning, to avoid contaminating training data (a minimal sketch of this split follows this list).

  • Training Turkish Language Model: A user is fine-tuning Llama 3 on translation tasks for Turkish with over 430k examples. The model currently behaves like a translation bot, changing its output language based on the input language.

  • ORPO Training on Unsloth: A snippet of code was shared for training the mlabonne/orpo-dpo-mix-40k dataset using Unsloth ORPO Trainer on an RTX 4090 GPU, taking about 5 hours.

  • Unsloth Wiki Update: Contributions regarding fine-tuning and training were added to the Unsloth wiki, acknowledging the community input.
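
A minimal sketch of the 80/10/10 split described above, assuming a Hugging Face datasets object (the file name is illustrative):

```python
from datasets import load_dataset

ds = load_dataset("json", data_files="data.jsonl", split="train")

tmp = ds.train_test_split(test_size=0.2, seed=42)
train = tmp["train"]                                    # 80% for training
rest = tmp["test"].train_test_split(test_size=0.5, seed=42)
val = rest["train"]                                     # 10% for hyperparameter tuning
held_out = rest["test"]                                 # 10% hidden until final evaluation
```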

Links mentioned:


Unsloth AI (Daniel Han) ▷ #random (6 messages):

  • Size Matters Less with AI Models: A remark was made indicating that the Phi3 Mini 4k version outperforms its larger 128k counterpart on the Open LLM Leaderboard, suggesting the Mini could be the preferred choice.
  • Customized Mistral Adaptation: It was noted that Phi3 has been modified using Mistral technology but is configured to work specifically with their version of Phi.
  • Pi in the Sky: A user shared their experience running Phi-3 on an Orange Pi Zero 3, describing performance with the Q2 version of gemma 2b as “slightly fast.”

Unsloth AI (Daniel Han) ▷ #help (254 messages🔥🔥):

  • Problems with Quantization and Conversion: Users reported issues with quantization using llama.cpp, such as “failed to quant q8 gguf” messages after a large run, and manual attempts at GGUF conversion led to errors like “Vocab size mismatch.” References were made to GitHub issues #3759 and #4180 related to these problems.
  • Questions on Few-Shot Learning and Best Practices: One user inquired if it’s better to put all few-shot examples in one user turn or across multiple turns for training. Another user starsupernova suggested trial and error, and in general confirmed either approach can work.
  • Checkpointing Finetuning Process for Later Resumption: Instructions on checkpointing were shared, pointing users to the Unsloth GitHub Wiki for guidance on how to save progress and continue training later without consuming excessive storage (a checkpoint/resume sketch follows this list).
  • Choosing Inference Providers for Fine-Tuned Models: theyruinedelise recommended using Jan or GPT4All as good inference providers for a fine-tuned Llama 3 70B model with Unsloth, with a link to Jan’s GitHub repo (janhq/jan).
  • Requests for Workflow and Tutorial Clarification: Multiple users sought clarification on training workflows, saving and pushing models to Hugging Face, and how to continue training from checkpoints. For example, starsupernova advised to save both model and tokenizer to Hugging Face and confirmed that setting ref_model=None is fine when using the DPO notebook.
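
The checkpoint-and-resume flow generally comes down to two knobs in the Trainer stack that Unsloth builds on; a sketch, not the wiki's exact recipe (the model and dataset here are tiny stand-ins):

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = Dataset.from_dict({"text": ["hello world"] * 64})  # stand-in corpus

args = TrainingArguments(
    output_dir="outputs",
    max_steps=20,
    per_device_train_batch_size=2,
    save_steps=5,            # write a checkpoint every 5 optimizer steps
    save_total_limit=2,      # keep only the newest 2 to cap disk usage
)

trainer = SFTTrainer(
    model="sshleifer/tiny-gpt2",   # any causal LM; SFTTrainer also accepts a model id
    train_dataset=dataset,
    dataset_text_field="text",
    args=args,
)

trainer.train()                             # first run: checkpoints to outputs/checkpoint-*
trainer.train(resume_from_checkpoint=True)  # later run: resumes from the newest checkpoint
```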

Links mentioned:


Unsloth AI (Daniel Han) ▷ #suggestions (18 messages🔥):

  • Diverse Datasets Without VRAM Woes: Members discussed whether combining multiple datasets increases VRAM usage. The consensus was that merging datasets doesn’t affect VRAM, but rather increases training time.

  • Training Challenges with Vast Datasets: One member pondered the feasibility of fine-tuning Mistral 7B with a large dataset using 16 gigs of VRAM. Despite the huge size of the dataset, members opined that while possible, it would be very time-consuming and advised focusing on high-quality synthetic data.

  • A Guide to AI Roadmaps: A suggestion was made to create a simple roadmap for AI projects. This would ideally be a straightforward to-do list in a README.md to clarify development directions and goals.

  • Model Enhancements for Chatterboxes: Experimentation with smaller models is underway, aiming to increase conversational abilities and accuracy. This indicates a focus on refining AI for better dialogue interactions.

  • Retrieval Augmentation in the Spotlight: A link to a GitHub repository named FlagEmbedding was shared, which showcases work on retrieval and retrieval-augmented Long LLMs. This could be of interest to those looking to improve their models with retrieval mechanisms. Long_LLM/longllm_qlora on GitHub

Links mentioned:


LM Studio ▷ #💬-general (204 messages🔥🔥):

  • Flash Attention Merged into llama.cpp: The Flash Attention feature provides better memory efficiency and allows contexts to fit more easily within memory, since it needs O(N) rather than O(N^2) memory (a short note on where that saving comes from follows this list). Enthusiasm was expressed for the merged PR in llama.cpp, found here: FLASH ATTENTION support merged into llama.cpp.
  • Experiencing Issues When Loading Models: Users are discussing various issues related to loading models in LM Studio, with one sharing an error and another expressing concern about the system requirements in relation to VRAM and physical RAM.
  • Discussions on Proxy and LM Studio: Users experiencing problems when searching for models may find issues relating to corporate networks, proxies, or the need to disable IPv6 if unable to route to Hugging Face.
  • GPU Offload Clarifications: An important recommendation was to turn off GPU offload when VRAM is inadequate, as 3GB of VRAM is insufficient for certain operations in LM Studio.
  • Eagerness for LM Studio Beta: The beta release of LM Studio 0.2.22 integrating new PRs from llama.cpp was announced, enticing users to test it and provide feedback on the inferencing quality, with the additional anticipation of progress on OpenELM as seen in this update.
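
For intuition on the memory claim above: this is not the llama.cpp kernel, but PyTorch's fused `scaled_dot_product_attention` illustrates the same idea. A minimal sketch:

```python
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 1, 8, 1024, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

# Naive attention materializes the full N x N score matrix: O(N^2) memory.
scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
naive = torch.softmax(scores, dim=-1) @ v

# On fused backends (flash / memory-efficient attention), the same result is
# computed in tiles without holding the N x N matrix, so activation memory
# scales as O(N); compute remains O(N^2).
fused = F.scaled_dot_product_attention(q, k, v)

print(torch.allclose(naive, fused, atol=1e-4))  # True (up to numerics)
```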



LM Studio ▷ #🤖-models-discussion-chat (123 messages🔥🔥):

  • Exploring Model Limitations: A member queried about the downsides of downloading a 1048K context model for use with only 20k tokens, noting that the updated quantization options were limited. Concerns were also raised that the new Llama 3 quant at Q8 displayed repetitive behavior in version 0.2.20 (ROCm preview).

  • Compatibility Issues with Llama 3: Participants discussed that new Llama 3 quants are not backward compatible with older builds, with instances of repeating answers. “These models will also work if you haven’t updated to latest llama.cpp, but will still have the old broken tokenizer until you get your tool updated,” was stated on Reddit, suggesting an update is necessary for optimal use.

  • Slow Performance on Modest Hardware: Users conversed about the feasibility of running uncensored models, like Everything 7b q4, on machines with 8 GB RAM. It was indicated that the models can work but expect slow performance, with advice to close additional applications such as web browsers to free up resources.

  • Image Generation Models Availability: Within the discussion, it was clarified that LM Studio does not currently support direct image generation. A member posted a link to a GitHub repository by AUTOMATIC1111, one of the popular free and local options for image generation, separate from LM Studio’s functionality.

  • Looking for Enhanced Human-like AI Behavior: A user sought advice on creating a more vibrant and human-like AI agent, mentioning the example from a YouTube video featuring “Neuro Sama.” Tips included asking the Llama 3 to create character prompts with specific personality traits and the direction to explore more specialized channels for advanced model behaviors.



LM Studio ▷ #🧠-feedback (35 messages🔥):

  • Model Load Error Reported: A user encountered an error stating “(Exit code: 0). Please check settings and try loading the model again.” with 7.15 GB of RAM available and Linux OS specifications.
  • Various System Specs on Linux: Discussion revolved around an unusually high number of Linux users reporting limited free RAM; even someone with 64GB+ of RAM reported having only a few KB of memory free.
  • Hard Drive Chatter during Model Generation: A user noted a HDD seek sound or “chattering” coming from their computer when generating tokens with a model partially offloaded to the GPU. They clarified that the system has 96GB of RAM, and the noise was specific to HDD, not coil whine or the cooling system.
  • Problem with Llama3 Model Operation: There were queries about a Llama3 model’s performance, specifically whether it was caching to an HDD rather than staying in RAM. The model of interest was mentioned with a link: Llama-3-8B-Lexi-Uncensored-GGUF and operated at a context size of 8k tokens.
  • LM Studio vs Ollama Debate: Users shared their opinions about LM Studio and Ollama, leading to a debate on preferences where one user expressed a strong preference for LM Studio while another reminded the community to value both and avoid negative comparisons.



LM Studio ▷ #⚙-configs-discussion (9 messages🔥):

  • Llama3 Loading Woes: A member encountered an error while trying to load llama3 with a 1M token context window, despite having 36GB VRAM and 128GB RAM. The error was attributed to the excessive size of the desired context window when the system parameters are designed for a context size of 250,000.

  • Context Window Overload: Attempting to load a 100k token context window successfully maxed out the system’s capabilities, indicating that the 1M token ambition was simply too resource-intensive.

  • Quadratic to Linear: One contributor mentioned that the context issue used to be quadratic but, with current optimizations, it’s “more like linear nowadays”.

  • Configuration Misread: A member highlighted that the Readme for the model indicates the requirement of “100s of gigs of ram”. This comment implies a possible oversight in understanding the hardware requirements for large context windows.

  • Model Download Attempt: The member provided a specific directory for Llama-3-8B-Instruct-Gradient-1048k-iMat-GGUF, which suggests an effort to download or reference a specific version of the model.


LM Studio ▷ #🎛-hardware-discussion (272 messages🔥🔥):

  • Groq’s Tempting Token Generation: There was a discussion around Groq’s ability to generate 800 tokens per second for llama 3 70B, with anticipation of an upcoming paid subscription model.
  • Hardware Guidance for LLMs: A member was advised that their AMD rx 5600m 6GB VRAM with Ryzen 7 4k setup may be on the low end for running local models, suggesting they explore models listed on the app’s front page.
  • Model Download Speeds: Members discussed the download speeds of models from Hugging Face within LM Studio, with one claiming about 10MB/s and another advocating a speed comparison between direct downloads and LM Studio.
  • The Quest for Comparative Accuracy in LLMs: A user sought LLMs that could match the accuracy of ChatGPT, discussing the recent 70b llama3 and Wizard models, with mentions of performance being new and uncharted.
  • Hardware Endeavors and Puzzling Phenomenon: There were extensive discussions surrounding the optimal hardware for LLM processing, with a focus on memory speed and VRAM as limiting factors, SLI/NVLink capabilities, and an anecdote about two different models generating the same fictional city name in separate instances, prompting a mix of humor and curiosity.



LM Studio ▷ #🧪-beta-releases-chat (141 messages🔥🔥):

  • Troubleshooting Hardware Compatibility: A member reported the software running but had trouble getting Large Language Models (LLMs) to load on their hardware setup. Another participant advised that hardware with an i5-4570 and 16GB RAM is likely insufficient for most models, suggesting they could only run a 7b Q4 model effectively.

  • New LLama.cpp Commit Requested: A request was posted for the latest commit of llama.cpp to fix a tokenizer problem. A response suggested that it would be made available soon.

  • Eagerly Awaiting LM Studio 0.2.22: Dialogue surrounding LM Studio 0.2.21 issues led to anticipation for the upcoming LM Studio 0.2.22, which discussion suggested might address the current issues.

  • Release and Quick Fixes for LM Studio 0.2.22: The release of LM Studio 0.2.22 Preview Build 1 was announced with features including UI touch-ups and updated llama.cpp, and URLs for Mac and corrected Windows installers were shared. After some confusion with incorrect version labeling, a new URL was provided and confirmed to work for Windows users.

  • Model Performance Discussions After Update: Members discussed various model performances post LM Studio update, with a focus on GGUF format issues and the effectiveness of recent quantizations. A member highlighted a reasoning-gap in Llama 3 GGUF models using a ‘banana test’ and apple quantity scenario, comparing it to other formats’ performance on logical reasoning tasks.



LM Studio ▷ #autogen (4 messages):

  • Model Loading Issue Raised: A member mentioned having issues loading a model and sought assistance in resolving it.
  • Reminder of Discord Etiquette: Another member reminded to avoid spamming questions across multiple channels, advising to keep such queries in a specific channel.

LM Studio ▷ #amd-rocm-tech-preview (40 messages🔥):

  • VRAM Misreading Spotted: A member mentioned that LM Studio is incorrectly reading the VRAM capacity of their 7900xtx. They also have a 7800x3d with integrated GPU, but doubt it’s causing the issue.
  • Performance Precedent Creates Confusion: Despite having used a RX 6600 with LM Studio GPU offloading before, a member faces an error stating “no ROCm-capable device is detected” after updating to version 0.2.18. This elicits discussion about support for ROCm and OpenCL implementations with various AMD GPUs.
  • HIP SDK Support Misconceptions Clarified: Members exchanged information about the compatibility of different AMD GPUs with ROCm and the HIP SDK, stating that graphics cards like the RX 6600 and 6700XT are not supported by the HIP SDK which LM Studio utilizes.
  • Lamenting GPU Support on LM Studio: While one member considered upgrading to a 7900 GRE, another advised that they would be better off with a 7900XTX for guaranteed compatibility with LM Studio’s ROCm build. The price difference between models in their country sparked a humorous suggestion of a budget flight for hardware shopping.
  • Searching for Linux-Specific ROCm Builds: The conversation revealed that there is no ROCm build for Linux, prompting a mention of Mozilla’s work on llamafile as a potential workaround for issues related to AMD’s driver support.

LM Studio ▷ #crew-ai (2 messages):

  • CrewAI Integration with RAG: A member inquired about successfully integrating LMStudio with Retrieval-Augmented Generation for functionalities similar to PDFSearch or WebsiteSearch using CrewAI.
  • Embedder Preferences in CrewAI: The same member mentioned the possibility of assigning an embedder like huggingface within CrewAI, but expressed interest in utilizing LMStudio Nomic embed.
  • Model Performance Observations: They shared their experience testing models Gemma, llama3 fp16, and Wizardlm, finding Gemma to most align with their needs.

LM Studio ▷ #🛠-dev-chat (1 messages):

yagilb: https://x.com/lmstudioai/status/1785796240656957514


Nous Research AI ▷ #ctx-length-research (25 messages🔥):

  • Tackling Positional OOD for Context Extension: A member highlighted a solution to positional out-of-distribution (OOD) issues which allows models to generalize to longer contexts. They shared an arXiv paper that proposes this method and considered it one of the most slept-on papers for context length extension.
  • Normalizing Outliers for Better Performance: Further discussing the same paper, the member mentioned that models can maintain good performance with longer contexts by normalizing outlier values. This was a follow-up to the earlier discussion on extending context lengths in AI models.
  • Reference Implementation in llama.cpp: An example implementation for the discussed concept can be found in llama.cpp on GitHub. It employs parameters --grp-attn-n and --grp-attn-w in a server executable, which the member linked to a GitHub repository with accompanying visualization and description.
  • Debating on “Infinite” Contexts and RoPE: There was a discussion on the balance between preventing OOD issues and extending context capabilities, with some members referring to attention truncation as counterproductive. A member pointed out that “infinite” context length is misleading and mentioned the ReRoPE implementation on GitHub, which was released 9 months prior by the original RoPE author, suggesting possible plagiarism.
  • The Myth of Infinite Context: The channel had a lighthearted exchange acknowledging the impracticality of “infinite context” models, with a nod to the excessive number of related papers on arXiv and a quip about the impossibility of having enough VRAM for such models. They also referenced Google publishing one of the many papers on this topic.



Nous Research AI ▷ #off-topic (25 messages🔥):

  • Seeking the AI Swiss Army Knife: A member asked about platforms for MLOps bounties, akin to an AI-focused Fiverr, expressing a strong interest in such a service. They received suggestions that while there isn’t one dedicated to AI/MLOps, general programming bounties can be found on Replit.

  • Construction Tech Job Alert: A job opportunity was shared for a software engineer experienced in Python and JavaScript at a Miami-based construction tech company. They have projects undergoing beta testing across the US and are open to remote candidates.

  • Machine Learning on Unreal Engine: A member announced the launch of an RAG-based AI assistant for Unreal Engine, which aims to improve the workflow in game development and related fields. They invited Unreal Engine users to give it a try and provide feedback, touting its potential to speed up development and learning; check it out here.

  • A Battle of AI Assistants: Following the revelation of the RAG-based tool, another member brought up their work with a GPT-4 vision-based tool for Unreal Engine 5, emphasizing the advantages of visual inputs for specific tasks such as blueprint editing in UE5.

  • Call for Computing Power: One member inquired about potential grants or resources for data generation and evaluation, expressing a need for access to high-powered computing resources like A100 GPUs to accelerate their research beyond the limitations of their current setup.



Nous Research AI ▷ #interesting-links (9 messages🔥):

  • AI Bubble Trouble: A YouTube video titled “Is the AI bubble popping?” was shared, discussing whether the AI startup ecosystem is a bursting bubble. The video builds its narrative around three AI startups, analyzing Stability, Inflection, and Cohere. Watch the YouTube video.

  • Memories Made Digital: A GitHub repository for Memary was mentioned, a project aimed at creating long-term memory for autonomous agents, using neo4j to store memories as a graph. Interest was expressed in its novel approach and potential performance. Explore the Memary repo.

  • Sudden Shutdown of GPT-2 Chatbot: A Tweet from @itsandrewgao reported the gpt2-chatbot being turned OFFLINE unexpectedly, provoking curiosity about the sudden change. View the Tweet.

  • AI Sensemaking Challenge: A challenging problem was shared on Twitter by @VictorTaelin, who spent hours trying to solve it without success and expressed eagerness for a solution. Check out the Twitter post.

  • Advanced Reasoning for AI: An arXiv paper detailed a method for improving Chain-of-Thought (CoT) reasoning in AI by using iterative preference optimization and a specially modified loss function. This approach boosted accuracy significantly on various benchmarks such as GSM8K and MATH for Llama-2-70B-Chat. Read the arxiv paper.



Nous Research AI ▷ #announcements (1 messages):

  • Hermes 2 Goes Pro with Llama-3: Nous Research announces Hermes 2 Pro on Llama-3 8B, enhancing capabilities with Function Calling and Structured Output. Their first Llama-3 based model, it surpasses its predecessor on various benchmarks and is now available on HuggingFace.

  • Leading the Benchmarks: Hermes 2 Pro has demonstrated superior performance over Llama-3 8B Instruct on AGIEval, GPT4All Suite, TruthfulQA, and BigBench, showcasing advancements in AI evaluation metrics.

  • Explore the Quantized Version: For those interested in a lighter model, the quantized version of Hermes 2 Pro Llama-3 8B is available, offering the same advancements in a more size-efficient form on HuggingFace GGUF.

  • Collaboration Achievement: A shout-out was given to the collaboration team behind Hermes 2 Pro, which included specific members contributing to the development and customization required for this latest model release.

  • Follow the Journey on Twitter: Keep up with the latest updates by following Nous Research’s progress with Hermes 2 Pro via their Twitter announcement.


Nous Research AI ▷ #general (468 messages🔥🔥🔥):

  • Llama and Hermes Performance Discussions: Members discussed the performance differences between Hermes 2 Pro Llama 3 and previously released models. Some pointed out that Hermes 2 Pro may have unlearned tasks such as the “apple test” but also gained new capabilities like function calling.

  • Quantizing Language Models: The community debated the effectiveness of quantizing large language models (LLMs). It was noted that performance loss becomes significant below roughly 5.5 bits per weight, and that Q8 quantization normally does not result in quality loss; a toy illustration follows this list.

  • Training Challenges with Quantization: There was a consensus that 1.58-bit LLMs likely perform well in early training due to the regularizing properties of low-bit quantization but may diverge in performance as they reach the network’s capacity limit.

  • Context Length in LLMs: The topic of context length was also raised, with discussions on the practical limits and whether extensive soft prompt tuning (SPT) examples are worthwhile. It was highlighted that the longest valid samples average around 100–200k tokens for text.

  • New LLM Releases and Collaborative Efforts: Enthusiasm was shown for potential new state-of-the-art models, with an 8B LLM briefly mentioned along with interest in novel fine-tuning methods over existing models. Collaboration on these fronts is ongoing, exhibiting excitement and speculative planning from various members.
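
As a toy illustration of the bits-per-weight discussion (real GGUF k-quant schemes use per-block scales and are considerably smarter than this), round-trip error through symmetric uniform quantization grows sharply at low bit widths:

```python
import numpy as np

def quant_roundtrip(w: np.ndarray, bits: int) -> np.ndarray:
    """Quantize to 2**bits uniform levels, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)

for bits in (8, 6, 4, 2):
    mse = float(np.mean((w - quant_roundtrip(w, bits)) ** 2))
    print(f"{bits}-bit round-trip MSE: {mse:.2e}")  # error rises as bits drop
```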



Nous Research AI ▷ #ask-about-llms (16 messages🔥):

  • The Quest for a Million Contexts: An attempt to load the 1M context in LM Studio was unsuccessful, and it was clarified that models like Phi-3 128k don’t run on ollama due to issues with supporting attention window mechanisms like Rope Theta and Ring.

  • LLaMA Pull Request to the Rescue: Users reported an issue that has been resolved with a new pull request to llama.cpp, improving BPE pre-processing and adding support for LLaMA 3 and Deepseek.

  • Tokenizer Troubles and GGUFs: There was confusion about whether tokenizers were the root issue for a bug, and whether GGUFs required requantization, with some thinking the problem was addressed and others not so sure.

  • Grokking Through Reverse Engineering: A study detailed on arXiv about the phenomenon of “grokking” suggested using mechanistic interpretability to reverse-engineer learned behaviors of neural networks.

  • Ranking the Outputs of LLMs: A method for qualitatively ranking LLM outputs was sought, with a suggestion to use Argilla’s distilabel or a reward model, although clarity on executing the actual evaluation in distilabel was questioned; a reward-model sketch follows this list.
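
One common way to rank outputs, as suggested above, is to score candidates with a reward model; a sketch assuming a preference model with a sequence-classification head (the checkpoint named here is one public example, not necessarily the one discussed):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Example public reward model trained to score (question, answer) pairs.
MODEL = "OpenAssistant/reward-model-deberta-v3-large-v2"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

prompt = "Explain photosynthesis in one sentence."
candidates = [
    "Photosynthesis is how plants turn light, water, and CO2 into sugar and oxygen.",
    "It is a thing plants do.",
]

scores = []
for answer in candidates:
    inputs = tokenizer(prompt, answer, return_tensors="pt", truncation=True)
    with torch.no_grad():
        scores.append(model(**inputs).logits[0].item())  # higher = preferred

for score, answer in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:+.2f}  {answer}")
```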



Nous Research AI ▷ #rag-dataset (16 messages🔥):

  • Introducing Wikipedia RAG Dataset: A link to the Wikipedia RAG dataset on Hugging Face was shared, highlighting its relevance to the paper on Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval. This paper was published on Nov 10, 2023, and can be found here.

  • Halal & Kosher Datasets?: A member made a brief reference to the creation of datasets marked as Halal & Kosher, implying considerations for ethical or cultural compliance in dataset creation.

  • Cynde Integrates Pydantic: The new Pydantic platform is being integrated into a rework of Cynde, which is interesting to members involved in the technical development.

  • Logfire Simplifies Code Observability: The Logfire platform was introduced as a new observability platform facilitating the tracking of Pydantic models in function-call settings. Described as “intuitive” and “currently free,” it is praised for its ease of use and efficiency, with a specific reference to its capability of tracking nested CV jobs and providing significant data feedback. More about Logfire can be explored here.

  • Model Fine-Tuning for Specific Output Formats: A conversation took place around the fine-tuning of AI models to generate specific output formats, wherein a member suggests simplicity in instruction consistency to ensure proper formatting. Hermes 2 Pro - Llama-3 8B was mentioned as an example, particularly its structured output section on the Hugging Face model page.



Nous Research AI ▷ #world-sim (24 messages🔥):

  • Virtual Business and Music Stardom Simulators Introduced: CompSimulator and Snow Singer Simulator are launched, offering users immersive experiences in the business and music industries respectively, powered by advanced AI technologies.
  • Eldritch Themes in Alternate History Simulation: A member describes an alternate history simulation featuring Eldritch Nazi themes, cyberpunk influences, and an uprising in Reichskommisariat Mittelafrika.
  • Consistency in LLAMA 3 HF Chat Bot Responses: It was noted that the LLAMA 3 bot on HF Chat generates the same response for the same message sent to it.
  • World Simulation Talks & Global Community Engagement: A YouTube video featuring talks from AGI House SF is shared, inspiring plans for a community meetup in LA and a global event connecting with SF and Japan.
  • Websim Game Development Updates: A user announced a new game created on Websim, planning an update that will span from the stone age to the galactic age; the posted link led to “null,” with more features promised soon.



Stability.ai (Stable Diffusion) ▷ #general-chat (497 messages🔥🔥🔥):

  • SD3 Release Skepticism and Speculation: Multiple users expressed doubts and concerns regarding the release of Stable Diffusion 3 (SD3), mentioning claims of its release in April and anticipations for a May release, only to lament its absence. The discourse is marked by skepticism, supposing SD3 might never officially be released and speculation that Stability AI could face backlash for alleged misleading statements about SD3 being free and open-source.

  • Choosing the Right Model for Local Use: Users are actively discussing the merits and tutorials of various Stable Diffusion local interfaces, including ComfyUI, AUTOMATIC1111, Fooocus, and Forge. Preferences seem to vary, with suggestions to choose based on ease of use and the user’s specific hardware, like owning an NVIDIA vs. AMD GPU.

  • Prompt Enhancements and Descriptions with AI: Individuals are inquiring about the most effective methods for image descriptions, debating the benefits of various AI tools. Mentioned options include using ChatGPT, Gemini, and employing models such as Claude 3 and idefics2 for analyzing and improving prompts for image generation.

  • Investments in AI Service Subscriptions and VPN Use: There is active discussion and advice around investing in AI services such as Gemini and Claude 3, alongside shared practices involving VPN usage for region circumvention or maintaining privacy. Users are suggesting various VPNs and hinting at the usage of features like DNS over HTTPS for added security.

  • Creating and Using Labels in Automatic Extensions: A user queries whether there’s a way to embed labels in output images using extensions for Automatic1111, followed by inquiries about the existence of features equivalent to clip skip and stylizer within custom interfaces like ComfyUI.



OpenAI ▷ #annnouncements (1 messages):

  • More Control Over Chat History: OpenAI has updated data controls for both ChatGPT Free and Plus users. Anyone can access their chat history even if they’ve opted out of contributing training data; the update is live on web and coming soon to mobile.
  • Introducing Temporary Chat: Users have a new option for privacy with the Temporary Chat feature, allowing one-off conversations that won’t be stored in chat history.

OpenAI ▷ #ai-discussions (375 messages🔥🔥):

  • GPT-2 Chatbot Sparks Curiosity: Members have discussed the “gpt2-chatbot” model, with some stating it performs better than GPT-4 in many cases, and others noting it fails in certain unidentified scenarios. Infinite generations with gpt2-chatbot seem possible, but the model has become unavailable in some arenas.

  • AI and Emotion: A robust discussion unfolded around the concept of AI and emotions, with members pondering the potential for AI to develop its emotional awareness over time. Comparisons were made between AI evolution and human emotional development, with varying opinions on whether AI could or should strive to achieve a form of empathy or emotional understanding akin to humans.

  • Limits of the Free Tier: A conversation regarding the accessibility of OpenAI’s features like DALL-E for free users took place, with some expressing desires for added functionalities without subscriptions. The dialogue surfaced awareness of business realities and community desires for OpenAI’s product offerings.

  • AI Collaboration in Academia: One user queried the community on how to effectively collaborate with multiple AI models, like ChatGPT and Claude, in academic writing. Suggestions included the use of third-party chatbots that could retain other AI responses within the context.

  • Thoughts on DALL-E Updates: Discussion covered DALL-E’s current state and hypotheticals about future versions such as DALL-E 4. While some users noted improvements in DALL-E 3 leading to better creation results, the conversation also emphasized that good human-AI synergy remains crucial, and debated the importance of AI adapting to human cognitive patterns.



OpenAI ▷ #gpt-4-discussions (10 messages🔥):

  • GPT-2’s Exploration in Chat Systems: A member shared their experience experimenting with GPT-2 in chat system integrations. They directed further details of the discussion to a specific channel.

  • Archive Accidents and Bulk Deletion Queries: A user inadvertently archived all of their chats and inquired about bulk deletion options to handle a large volume of chats as opposed to deleting them individually.

  • Screenshot Sharing: There was a query about why screenshots cannot be posted in this channel, as a member wanted to share a humorous output of a GPT integration.

  • Directing Image-Friendly Channels: It was clarified to a member that screenshots can be shared in another channel dedicated to such content.

  • Inconsistencies in ChatGPT’s Character Limits: A discrepancy was noted by a member where ChatGPT allegedly misrepresented its character limit, allowing for inputs longer than the stated 4096 characters.

  • Clarifying ChatGPT’s Limitations and Behavior: A member explained that ChatGPT’s self-awareness is limited as it is not trained to accurately know its capabilities or version. They differentiated between the free and ChatGPT Plus versions with varying token limits, and mentioned the possibility of ChatGPT summarizing conversations when context limits are reached.


OpenAI ▷ #prompt-engineering (30 messages🔥):

  • The Challenge of Negative Prompting: Members discussed issues with negative prompting, indicating that providing examples of desired output is more effective than listing prohibited content. One suggestion included reframing instructions as “instead of x, use y.”

  • Regional Dialect Woes: A use case was presented involving avoiding specific words that have different meanings in the Argentinian dialect of Spanish. Members suggested quizzing the AI on its understanding of Argentinian Spanish and considering an approach that explains the contextual use of words rather than a list of prohibitions.

  • Harnessing Positive Prompt Efficacy: It’s highlighted that positive prompting, possibly with the structure “instead of x, use y,” is likely to yield better compliance from GPT than listing negative examples or prohibitions.

  • Metadata-Prompting Explored: For a hobbyist explorer, a simple form of meta-prompting using open variables and markdown for emphasis was discussed. It was suggested that this could enhance interactions with GPT.

  • Interactivity with AI Models: The potential of meta-prompting to facilitate interactive, dynamic, and multi-layered prompts was also outlined. Examples include using placeholders for {openVariable} to guide the AI’s behavior and structuring output templates to support the exchange; a minimal template sketch follows this list.
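
A minimal sketch of the open-variable idea using only the standard library; the variable names and markdown structure are illustrative, not a fixed convention:

```python
from string import Template

# Markdown headers add emphasis and structure; ${...} slots are the "open
# variables" filled in before the prompt is sent to the model.
META_PROMPT = Template("""\
# Role
You are an expert in ${domain}.

# Task
${task}

# Output format
- **Summary:** one sentence
- **Steps:** numbered list
""")

prompt = META_PROMPT.substitute(
    domain="Argentinian Spanish localization",
    task="Rewrite the following text for an Argentinian audience.",
)
print(prompt)
```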


OpenAI ▷ #api-discussions (30 messages🔥):

  • Mulling Over Model Prompting Techniques: Members discussed strategies for prompting OpenAI’s models, emphasizing using positive instructions and examples over negative ones, to avoid undesirable word usage. They shared insights on constructing prompts that can drive the AI to produce better outcomes without listing prohibited words and highlighted various approaches such as using phrasing like “instead of ‘x’, use ‘y’” to guide the AI’s language choices.

  • Knowledge is Power: In discussing techniques to generate a detailed Ideal Customer Persona (ICP) from LinkedIn data, a user presented a strategy involving analyzing posts and screenshots to determine a person’s demographics, psychographics, and behaviors. The aim is to have the AI act as a personal branding and target audience expert as part of a content strategy for marketing and sales.

  • Prompt Engineering 101: A member requested advice on prompt engineering as a hobbyist looking to delve deeper into interacting with AI for knowledge and coding. Other participants offered suggestions like using open variables in meta-prompting and leveraging markdown for structuring and emphasizing parts of the prompt to encourage more complex AI behaviors.

  • Meta-Prompting for Interactive Experiences: There was a consensus that meta-prompting techniques, where users create dynamic and interactive prompts for the AI, can significantly enhance the user’s ability to achieve complex tasks. The conversation included an example of how to frame a meta-prompt for the AI to act as an expert system.

  • Journey into AI Prompt Engineering: An AI enthusiast received encouragement and guidelines on starting with prompt engineering to improve their interactions with OpenAI’s models. There was a particular discussion on the role of open variables and the use of markdown in prompts, as well as the potential benefits of using web search features for researching prompt engineering technologies.


Perplexity AI ▷ #announcements (1 messages):

  • Exclusive Early Access to New ‘Pages’ Feature: A new feature called Pages is set to launch, offering an easy-to-create, shareable, in-depth exploration of any topic. Interested users can join the beta testing program for early access and the chance to provide feedback by reacting with a specific emoji and heading to the specified channel.

Perplexity AI ▷ #general (241 messages🔥🔥):

  • API Citation Woes: A member inquired about obtaining citations like [1] and seeing web UI references via API requests while using Perplexity-online models. Another member explained that the anticipated program was suspended earlier due to fraud issues with the discount codes.

  • Flaws in Pro Search and Reference Features?: Several users reported issues with Pro Search and reference features on Perplexity, noticing either redundant answers or missing references; one even claimed to face these glitches after upgrading to premium.

  • Questions Surrounding Opus Daily Limit: Discussions around the daily limit for Opus usage surfaced, with members clarifying that it’s 50 uses per day, replenished every 24 hours. Some expressed dissatisfaction with the lack of estimates for when this limit might be increased.

  • Perplexity Performance and Issues: Users shared experiences of slow responses from AI models and problems logging into accounts. There was advice to double-check spam folders for login links and speculations that service providers could block emails.

  • Clarity on Model Differences and Features: The conversation touched on the varying quality of answers from different models and features like scratchpad prompts, AI prompting inaccuracies, and context window sizes for conversations. One user confirmed the context window is indeed 32k.

Links mentioned:

  • Reka Playground: Explore the latest multimodal language models built by Reka
  • Rabbit R1: Barely Reviewable: AI in a Box. But a different box.

Perplexity AI ▷ #sharing (19 messages🔥):

  • Exploring Perplexity AI: Several users shared Perplexity AI search results exploring topics ranging from Microsoft Research Asia to the Vimeo API and queries about the Mac App Store.
  • LennysNewsletter on Product Insights: A member shared a link to Lenny’s Newsletter, which includes topics like Duolingo’s growth secret and how AI will impact product management, with an invitation to subscribe for full access.
  • Google’s Recent Layoffs: A link was circulated about Google laying off employees amidst other business adjustments.
  • Tesla’s Full Self-Driving Discussion: Automobile technology was a point of interest, with a link shared about Tesla’s full self-driving capabilities.
  • Reminder for Shareability on Discord: Perplexity AI reminded users to ensure their threads are shareable, providing a visual guide linked directly from Discord’s platform.

Link mentioned: How Perplexity builds product: Johnny Ho, co-founder and head of product, explains how he organizes his teams like slime mold, uses AI to build their AI company, and much more


Perplexity AI ▷ #pplx-api (14 messages🔥):

  • Confusion about API Citations: A member inquired about the possibility to get citations through API requests when using the perplexity-online models for web knowledge, and another member referred to earlier messages that seemingly addressed related concerns.

  • Policy Clarification for Claude 3 Use: A user asked about the usage policy for Claude 3 provided by Perplexity, especially concerning political use, and if Perplexity’s usage policy takes precedence over Anthropic’s when using their models.

  • Perplexity Pro vs. API Results Disparity: A user highlighted a discrepancy between results obtained from the Perplexity Pro interface and those from the API using the same prompt, to which a fellow member clarified that Perplexity UI and API might not be using the same model version.

  • API Documentation Clarification: In response to confusion over model versions, a user referenced the Perplexity API documentation, which lists models like llama-3-70b-instruct with details on parameters and instructed members on how to avoid prompt injections.

  • Understanding Online Models: A user questioned which online model Perplexity Pro UI uses, leading to an explanation that online models are either finetuned to use sources more effectively or employ a RAG-like approach to synthesize responses from a search engine-style vector database.

Link mentioned: Supported Models: no description found


Eleuther ▷ #general (28 messages🔥):

  • Effort/bucketMul for Efficient Inference: A new algorithm called effort/bucketMul was introduced, which claims to significantly speed up vector-matrix approximation and LLM inference. It’s described as adjustable in real-time for computational load and is compatible with models like Mistral. Algorithm launched.

  • Amateur AI Hobbyist Presents Image Patch Study: An amateur AI enthusiast shared their research on efficient image patch representation inspired by neural systems, available on arXiv. They propose a novel binary vector representation learned through unsupervised learning.

  • Discussion on Binary vs. Hypersphere Embeddings: Members discussed the merits of binary vector representations for embeddings, linking their benefits to biological plausibility and computational efficiency. One member considered applying similar principles to the RWKV LLM for potentially faster learning. RWKV LLM method.

  • Recommendations for Embedding Strategies: In response to the discussion on representations, links to foundational papers in the space, including CLIP and Dino, were shared for further reading on embedding distributions. CLIP Paper, Dino Paper.

  • Query on Image Classification with CLIP Embeddings: A member sought advice on classifying images of movie stars using CLIP embeddings, obtaining only 36% accuracy with both modified labels and prompts. They explored using cosine similarity with text descriptions but are considering alternative approaches due to the lack of improvement; a zero-shot sketch follows this list.
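
For reference, the standard zero-shot recipe compares image embeddings against text-prompt embeddings by (scaled) cosine similarity; a minimal sketch with the `transformers` CLIP API, where the label set and image path are placeholders:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["Tom Hanks", "Meryl Streep", "Denzel Washington"]  # placeholder classes
prompts = [f"a photo of the actor {name}" for name in labels]

image = Image.open("star.jpg")  # placeholder path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**inputs)

# logits_per_image holds the scaled image-text cosine similarities;
# softmax converts them into class probabilities.
probs = out.logits_per_image.softmax(dim=-1)[0]
print({name: round(float(p), 3) for name, p in zip(labels, probs)})
```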



Eleuther ▷ #research (192 messages🔥🔥):

  • Unraveling the “Black Box” Analogy: The discussion revealed varying perspectives on why large language models (LLMs) are often referred to as “black boxes.” Some participants noted the complexity of LLMs’ inner workings relative to our understanding, while others suggested that the imprecise use of such terms reflects a human tendency to parrot pithy phrases.

  • Training LLMs on Test Sets Affects Fair Comparisons: A shared link points out that LLMs trained on benchmark test sets skew the effectiveness of benchmarks and foster potentially unfair comparisons.

  • Chain-of-Thought (CoT) Rationality in LLMs: Tackling the issue of how LLMs explain their reasoning, some messages suggested that LLM-generated explanations for an answer are not trustworthy as they often do not reflect the model’s internal thought process.

  • Kolmogorov-Arnold Networks (KANs) Outperform MLPs: Highlighted was a paper that introduces Kolmogorov-Arnold Networks (KANs) as an alternative to Multi-Layer Perceptrons (MLPs), noting that KANs offer better accuracy and interpretability with faster scaling laws and potential for intuitive visualization.

  • Iterative Preference Optimization to Improve LLM Reasoning: Shared research (link) discusses an iterative method to improve LLM reasoning by optimizing the preference between competing generated CoT candidates, leading to increased accuracy in tasks like GSM8K, MATH, and others.



Eleuther ▷ #interpretability-general (34 messages🔥):

  • Exploring the Computational Model of Sequence-Prediction: A member theorized about the computational model learned by sequence-prediction models, particularly related to next-token prediction loss, predicting the existence of phase transitions in token probabilities and seeking feedback on their write-up here.

  • Connecting Prior Work with Theoretical Predictions: The member acknowledged the relevance of existing research on transformers and iterative inference, notably the tuned lens method from this paper, and discussed how findings from early decoding align with their proposed theory.

  • Discussing Model Representations with Tied Embeddings: Dialogue ensued about how models with tied embeddings, like Mamba, might affect interpretation, with speculation that tied embeddings could actually benefit the model’s representational coherence.

  • Drafting Implementation Plans for Theoretical Predictions: In response to whether implementations have been considered to test the hypotheses, a discussion took place about possibly using transformer lens and gpt-2-small to conduct experiments.

  • Exchanging Interpretability Insights: Members exchanged views on the challenges of defining and operationalizing the “atomicity” of model features. References were made to emerging concepts like the distributional simplicity bias and the Quantization Model of neural scaling laws, linking to research papers here and here.

  • Refining Interpretability Methods with Formal Languages: A suggestion was made to define an arbitrary formal grammar and train a network on sequences from that language to determine if the rules of the grammar could be considered the “true underlying features,” investigating transformers’ understandings of Dyck languages as a pertinent angle.



HuggingFace ▷ #announcements (2 messages):

  • Cash in for CVPR Competitions: HuggingFace has announced three different competitions for the CVPR event with a total prize pool of $120,000+. Participants can join SnakeCLEF, FungiCLEF, and PlantCLEF from June 17-21, 2024.
  • Transformers Library Update: The Transformers library has been updated to v4.40.0, featuring models like Phi-3, Llama 3, IDEFICS 2, and more. Additionally, Phi-3 is set to be operable within the browser, achieving about 20 tokens per second.
  • Gradio and Datasets Library Enhancements: Gradio has released a significant update with version 4.28.0, focusing on Custom Components, while the Datasets library has reached v2.19.0 with Polars compatibility and improved export functionalities.
  • Empower Your Prompts: HF Blog spotlights techniques for enhancing prompt consistency in language model outputs through a post on Structured Generations.
  • Snowflake’s Impressive Model Release: Snowflake has released a whopping 408B Dense + Hybrid MoE model, boasting 17B active parameters and a wide range of capabilities like SQL generation, coding, and instruction following. This achievement is detailed in a highlighted announcement.

Links mentioned:

  • Tweet from Fleetwood (@fleetwood___): 🚨 Phi-3 running in the browser 🚨 Hits about 20 tok/s 🏎️ Literally 3 lines of JS. Still some kinks to iron out, coming to Ratchet 0.4.0 soon.
  • Tweet from abhishek (@abhi1thakur): Can I run AutoTrain UI on Kaggle? Yes, you can!!! Check out my latest notebook, copy it, fill in your tokens and enjoy AutoTrain UI running on Kaggle Notebooks backend 🚀 Link to notebook: https://www...
  • Tweet from Vaibhav (VB) Srivastav (@reach_vb): Let's go!! Common Voice 17 - now on the Hub! 🔥 With 31,000 hours of audio (& transcriptions) across 124 languages. *sound on 🎶* 847 hours of data were added in CV 17, along with 493 hours of ...
  • Tweet from Brigitte 🤗 (@BrigitteTousi): 🔊Calling all journalists! With @fdaudens, we're excited to announce a new community on the @huggingface Hub: Journalists on Hugging Face. 📰🤗 https://huggingface.co/JournalistsonHF 1/
  • Tweet from Vaibhav (VB) Srivastav (@reach_vb): Snowflake dropped a 408B Dense + Hybrid MoE 🔥 > 17B active parameters > 128 experts > trained on 3.5T tokens > uses top-2 gating > fully apache 2.0 licensed (along with data recipe to...
  • Tweet from Sayak Paul (@RisingSayak): Custom pipelines and components in Diffusers 🎸 Wanted to use customized pipelines and other components (schedulers, unets, text encoders, etc.) in Diffusers? Found it inflexible? This 🧶 is for y...
  • Tweet from lunarflu (@lunarflu1): You can now mention people on @huggingface !

HuggingFace ▷ #general (151 messages🔥🔥):

  • Chronos Model Fine-Tuning Inquiry: A member sought guidance on fine-tuning the Chronos time-series forecasting model. They were redirected to the GitHub repository for further details.
  • Hugging Face Job Seeker: A software engineer with 10 years of experience reached out for opportunities at Hugging Face, and was directed to Hugging Face’s job openings, including a wild card position.
  • Difficulty with Rasa Framework for a Chatbot: A new member is experiencing accuracy issues with intent recognition in a sales-related chatbot using Rasa Framework and is considering making a custom NER model.
  • Spaces Newbie Questions: Members asked about receiving notifications for new replies in Space community threads, and it was noted that notifications are sent by default.
  • Kaggle and Google Collaboratory Tips Shared: Several members discuss using Kaggle and Google Colab’s free GPUs for training models, with advice exchanged on the settings to increase VRAM and Kaggle’s phone verification to enable internet access.



HuggingFace ▷ #today-im-learning (3 messages):

  • Seeking Finetuning Guidance: A member expressed interest in learning how to generate an instruction dataset for finetuning Large Language Models (LLMs).
  • In Search of Clarifications: Another member inquired for further details on what exactly the first member was referring to in terms of generating an instruction dataset for LLM finetuning.
  • Introducing Med-Gemini for Medicine: A member shared a YouTube video providing a high-level overview of Med-Gemini, Google’s multimodal GenAI models for medicine, aiming to inform and reassure interested parties about the technology.

Link mentioned: Med-Gemini: A High-Level Overview: A high-level overview on Med-Gemini, Google’s “Family” (said in the voice of Vin Diesel) of Multimodal GenAI models for medicine. Med-Gemini has folks in the…


HuggingFace ▷ #cool-finds (8 messages🔥):

  • Cool Tools for AI Enthusiasts: A Medium post entitled “5 Interesting AI Tools Everyone Should Try” was recommended, listing a variety of AI applications that could be of interest to people in the field.
  • Webloading the Future: An article on Medium discusses how to use Groq, Langchain, and Datastax to create robust Webloader RAG applications, read more about it here.
  • SQL Simplified: The Data Intelligence Alliance through its website, www.dataialliance.org, is developing a “people database” to allow individuals to interact with databases with little or no prior SQL knowledge.
  • Microscope Image Segmentation Made Easy: The GitHub repository for Micro-SAM, a project designed to simplify the process of segmenting microscopy images, is now available and can be checked out here.
  • Accelerating Diffusion Models: The Hugging Face documentation details several techniques to speed up diffusion models without compromise, highlighting how PyTorch 2 can triple the inference speed of text-to-image pipelines, particularly demonstrated with Stable Diffusion XL (SDXL); a condensed sketch follows this list.
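
Much of the PyTorch 2 speedup referenced comes from `torch.compile`; a condensed sketch of the pattern shown in those docs (the first call pays a one-time compilation cost):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# The UNet runs once per denoising step, so compiling it is where
# most of the speedup comes from.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

image = pipe("an astronaut riding a horse on the moon").images[0]
image.save("out.png")
```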



HuggingFace ▷ #i-made-this (11 messages🔥):

  • Leak-Free Link Prediction Methodology: A GitHub repository called PnPR-GCN_ACM_SAC_24 addresses the issue of information leaks in K-fold cross-validation on transitive graphs. The methodology proposed ensures data splitting without information leakage, enhancing concept prerequisite learning.

  • Aligning Scheduling with AI: A tweet from dstackai introduced a guide on using the Alignment Handbook alongside dstack to facilitate the scheduling of fine-tuning tasks on cloud or on-premises machines.

  • Iterative SDXL Inpainting on 🤗 Spaces: The inpainting SDXL sketch pad allows for iterative inpainting and version history to restore previous image versions, but the Space is currently asleep due to inactivity.

  • HDR Challenge with Display Compatibility: A member mentioned that their images are HDR encoded, recommending viewing them fullscreen for proper color representation, especially on devices like iOS/iPadOS; otherwise they may appear washed out.

  • Chat in 55 Languages with Bloom: Bloom Multilingual Chat is a Hugging Face Space where users can converse with the Bloom model in 55 languages through the use of the deep_translator Python library for query translation and back-translation.

  • Batch Process Your Moon Dreams: A new batch processing feature has been added to MoonDream2, allowing for multiple images to be processed at once. Check out the MoonDream2 batch processing here.

  • FluentlyXL V4 Unveiled: The FluentlyXL V4 model emphasizes contrast, realism, and accurate anatomy. You can try this enhanced model at Fluently Playground.



HuggingFace ▷ #reading-group (18 messages🔥):

  • Graph Papers Galore: A member highlighted a paper to read, titled “Graphs play an important role in representing complex relationships” and available at arXiv:2404.14928. They also mentioned considering other graph-related surveys but wished to avoid overextending their focus.

  • Distillation Insights on the Horizon: Participants discussed distillation in score-based models, mentioning that the Laion server contains experts in the field and suggesting papers by Segmind, discussing rectified/instaflow, lcm lora, and the piecewise rectified flow.

  • Reading Group Event Scheduled: An event for the reading group was organized and a link was provided for the participants to suggest different times, with a note to accommodate for everyone’s availability.

  • NegotiationArena: A New Playground for LLMs: Appreciation was shown for a presentation on a paper about how well Large Language Models (LLMs) can negotiate with each other using a framework called NegotiationArena, the paper can be found at arXiv:2402.05863.

  • Negotiation as an LLM Alignment Metric: A member remarked on the unique aspect of negotiating tasks as a potential metric for evaluating the alignment of LLMs, recognizing that the task differs from regular downstream tasks.



HuggingFace ▷ #computer-vision (17 messages🔥):

  • Improving YOLO Models: A member mentioned they are working on enhancing the accuracy of YOLO architectures even if it means a slower model, and recognized that modifying the architecture could be time-consuming.
  • Collaboration Sought for CNN Study: A user is looking for a partner to study and learn about Convolutional Neural Networks (CNNs) together.
  • YOLOv5 Parallel Processing Tip: A sliding-window approach for parallelism in YOLOv5 was suggested, along with an idea to look into pre-YOLO/CNN image segmentation and contour algorithms, hinting that image simplification and downsampling can yield effective results (a tiling sketch follows this list).
  • Learning Curve in PyTorch vs TensorFlow: A discussion on whether to learn PyTorch or TensorFlow for CNNs took place, where it was acknowledged TensorFlow has a steeper learning curve, though it offers more devops support from Google, while PyTorch has more academic support and community momentum.
  • Kaggle Discussion and Tool for Computer Vision: A user shared a Kaggle discussion link to their work which has been designed to assist with training or fine-tuning CV models and are seeking feedback.
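
A minimal sketch of the sliding-window idea: tile the image with overlap, detect per tile (parallelizable), then map boxes back to full-image coordinates. `run_detector` is a hypothetical stand-in for a YOLOv5 inference call:

```python
import numpy as np

def sliding_windows(image: np.ndarray, tile: int = 640, overlap: int = 64):
    """Yield (x0, y0, crop) tiles covering the image with the given overlap."""
    h, w = image.shape[:2]
    step = tile - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            yield x0, y0, image[y0:y0 + tile, x0:x0 + tile]

def run_detector(crop: np.ndarray):
    """Hypothetical detector; returns [(x, y, w, h, score), ...] per crop."""
    return []

image = np.zeros((2000, 3000, 3), dtype=np.uint8)  # placeholder image
detections = []
for x0, y0, crop in sliding_windows(image):
    for x, y, w, h, score in run_detector(crop):
        detections.append((x + x0, y + y0, w, h, score))  # global coordinates
# A final non-max suppression pass would merge duplicates in overlap zones.
```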



HuggingFace ▷ #NLP (5 messages):

  • Seeking Guidance for NLP Project: A new member is working on a chatbot project and is experiencing difficulties with intent recognition using the Rasa framework. They are considering creating a custom NER model to identify specific terms related to their business and ponder whether to “make [their] own model,” use Spacy, or utilize a pretrained model from HuggingFace to improve their bot’s performance.

  • Inquiring About Ollama Template Roles: Another member has queries regarding adding a “Reviewer” role to the Ollama template roles in order to evaluate the assistant’s response format, seeking how to implement this by way of a template. They reference existing documentation at Transformers chat templating guide.

  • Development of a Mini Emo Bot for College Tech Club: A member is building an NLP model for a Mini bot designed to interact with oral prompts, search for specific information, and provide spoken responses, potentially to be deployed on a Raspberry Pi. They request assistance and guidance as they are new to the field of NLP.


HuggingFace ▷ #diffusion-discussions (1 messages):

sayakpaul: Might be a better question for A1111 forums.


HuggingFace ▷ #gradio-announcements (1 messages):

  • Gradio Share Server Issues Alert: Gradio is currently experiencing problems with the Share Server, which might affect sharing and usage on Colab. They are actively investigating and resolving the issue, and users are directed to check the status here.
  • Gradio’s Status Transparency: Users can view Gradio’s operational uptime statistics over different time frames including the last 24 hours, 7 days, 30 days, and 90 days on their status page.
  • No Recent Updates: As of the last 7 days, there have been no new status updates, but the history can be checked for past incidents here.

Link mentioned: Gradio Status: no description found


LlamaIndex ▷ #blog (4 messages):

  • Financial Assistant AI Breakthrough: A new financial assistant can now calculate percentage evolution, CAGR, and P/E ratios over unstructured financial reports without human intervention. Brief insights shared via a post linked in a tweet about building this powerful tool.

  • Boost RAG Applications with Redis: In a collaboration between Redisinc, @tchutch94, and @seldo, learn about creating agentic Retrieval-Augmented Generation (RAG) with semantic caching. They discuss methods for enhancing quality, efficiency, and cost in this resource.

  • PulumiCorp Webinar on Deploying AI with LlamaIndex: A webinar scheduled for May 8, hosted by _ediri and @seldo, will dive into using Pulumi to deploy an AI application, focusing on LlamaIndex, onto AWS. Information about leveraging infrastructure as code for AI applications was shared in the announcement tweet.

  • Latest LlamaIndex.TS Update Announced: LlamaIndex.TS version 0.3 has been released with enhancements such as agent support for ReAct, Anthropic, OpenAI, and a generic AgentRunner class, improved Web Streams, and a more robust type system. These updates were highlighted in a tweet featuring the new version’s benefits.



LlamaIndex ▷ #general (130 messages🔥🔥):

  • Max Tokens and Embedding Models: If the content to embed exceeds the max token limit, the model will only consider the first max_length tokens and ignore the rest. This may require content chunking if the embedding model has a smaller token limit than provided data.

  • Local Async Calls for AzureOpenAI: LlamaIndex supports async calls to AzureOpenAI using acomplete and astream_complete for completions, and achat and astream_chat for chat contexts. Async allows tasks like API calls to run without blocking other operations, improving performance; a short sketch follows this list.

  • Real-time Summaries with Source Nodes: LlamaIndex can generate summaries and indicate the nodes used to form them. Streamlining this process involves optimizing prompts and utilizing source nodes information for result relevance.

  • Understanding RAG with MongoDB Atlas: Questions were raised about querying within LlamaIndex without re-uploading documents and converting them into nodes. Responses indicated that embedding models are essential for comparing queries with the indexed data to retrieve relevant material.

  • Analyzing LlamaIndex vs. Local Development Drawbacks: Ollama runs locally and can be slower compared to server-based APIs like OpenAI, but it offers privacy and cost benefits for local development. The use of embedding models in the query process is unavoidable for creating and querying indices in LlamaIndex.
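
The async methods above follow the usual asyncio pattern; a minimal sketch assuming the `llama-index-llms-azure-openai` package, with placeholder deployment names and credentials:

```python
import asyncio

from llama_index.llms.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    engine="my-gpt4-deployment",  # placeholder Azure deployment name
    model="gpt-4",
    api_key="...",                # placeholder credentials
    azure_endpoint="https://my-resource.openai.azure.com/",
    api_version="2024-02-01",
)

async def main() -> None:
    # acomplete requests run concurrently instead of blocking one another.
    summaries = await asyncio.gather(
        llm.acomplete("Summarize the Q1 revenue discussion."),
        llm.acomplete("Summarize the Q2 revenue discussion."),
    )
    for s in summaries:
        print(s.text)

    # astream_complete yields partial tokens as they arrive.
    stream = await llm.astream_complete("Define CAGR in one sentence.")
    async for chunk in stream:
        print(chunk.delta, end="")

asyncio.run(main())
```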



LlamaIndex ▷ #ai-discussion (6 messages):

  • Choosing the Right GPU for AI Tasks: The discussion revolved around the suitability of a gaming card like the RTX 4080 for running and fine-tuning smaller language models. One member advised that while VRAM is critical, even with 16 or 24GB, one should not expect to fine-tune models larger than 7B with small batch sizes.

  • Local vs Cloud Compute for Privacy Concerns: The member tuhe clarified the need for a local PC stems from dealing with sensitive data and the practicality of having a robust computer for work, rather than cloud solutions like Google Colab which may pose privacy issues.

  • Introduction to Word Loom: A new open specification called Word Loom was shared, designed for managing and exchanging language for AI, focusing on the separation of code from natural language and composability. Feedback is welcomed on the proposed update, which aims to aid the traditional globalization process, the full details of which are available on GitHub.

Link mentioned: Word Loom proposed update: Word Loom proposed update. GitHub Gist: instantly share code, notes, and snippets.


Modular (Mojo 🔥) ▷ #general (22 messages🔥):

  • Subreddit Confusion Cleared Up: A member clarified that there is a subreddit for Mojo at https://www.reddit.com/r/modular_mojo/, but the Mojo community primarily engages on GitHub and Discord.

  • Concurrency Model Speculations: The community discussed Mojo’s potential for adopting concurrency models, with a guess that it won’t follow golang-style but may lean towards an actor model, and a counterpoint emphasizing the importance of not shipping a massive runtime with the language.

  • Mojo Compiler Insights: It was shared that Mojo’s compiler is handwritten and reuses parts of LLVM, with further explanation available in a YouTube video titled “2023 LLVM Dev Mtg - Mojo 🔥: A system programming language for heterogenous computing.”

  • Type Declaration Error in Playground: An issue was raised regarding an error message when using ‘ui64’ as a type declaration, with confusion over whether custom-bitwidth integers like in Zig are supported, and highlighting that Int64 works but Int128 doesn’t.

  • First Mojo Anniversary Reflections: Members reflected on the first anniversary of Mojo’s launch, highlighting the addition of traits, references, and lifetimes as major achievements that unlocked a lot of the standard library’s potential.

Links mentioned:


Modular (Mojo 🔥) ▷ #💬︱twitter (4 messages):

  • Modular Tweets a Mystery: Modular’s latest tweet has been shared, but the content is not specified in the message.

  • Another Modular Update Hits Twitter: Check out the most recent update from Modular by following the shared link.

  • Modular Shares a Cryptic Message: A new tweet from Modular has been posted; details of the tweet are not described here.

  • Modular Continues to Tease on Twitter: There’s a new tweet from Modular that might be of interest; specifics behind the tweet are not included in the message.


Modular (Mojo 🔥) ▷ #🔥mojo (58 messages🔥🔥):

  • Julia’s @time Macro Wins Hearts: One member praised Julia’s @time macro for its ability to show allocations and expressed a desire to see a similar feature in Mojo.
  • Mystery of the ‘None’ Implementation: A search for how None is implemented in Mojo led to confusion and a discussion linking to GitHub. The inquiry highlighted an error about None not implementing the __is__ and __isnot__ methods.
  • Praise for Mojo’s Syntax: Mojo’s syntax was lauded by a user who, after evaluating various programming languages, found Mojo’s syntax almost perfectly aligned with their ideal language syntax.
  • Discussing Pass by Reference in Mojo: A conversation about using inout with structs and the Reference type in Mojo clarified that inout does pass by reference similar to C++ but is distinct in Mojo. The discussion included code samples and highlighted the ongoing development to make referencing more elegant.
  • Mojo Development Updates and Questions: Various messages touched upon Mojo’s open-source progress, the anticipation for its Windows release, and ensuring Mojo remains user-friendly and understandable without going down the complexity of Rust’s lifetime system.

Links mentioned:


Modular (Mojo 🔥) ▷ #community-projects (1 messages):

  • Call for Mojo Contributors: A member extended an invitation to contribute to Mojo, with suggestions such as allowing negative numbers, implementing a fallback for scalar processing, and exploring fast absolute tolerances from articles linked in the issues. No specific plans were set, leaving room for experimental contributions.
  • Identifying a Missing Mojo Component: Mojo currently lacks access to the PMADDUBSW instruction, critical for fast SIMD atol (ASCII-to-integer conversion), prompting workarounds that take ~4 SIMD operations. The instruction is x86-specific and unavailable on ARM architectures.

Link mentioned: PMADDUBSW — Multiply and Add Packed Signed and Unsigned Bytes: no description found


Modular (Mojo 🔥) ▷ #community-blogs-vids (3 messages):

  • Mojo Lang Sparks Enthusiasm: A new YouTube video featuring Chris Lattner discusses Mojo Lang, a potential high-performance successor to Python that leverages CPU/GPU programming techniques.
  • Podcast Love for Programming Languages: A member expressed their fondness for the podcast, sharing their excitement about the discussions on programming languages and spreading the content internally.

Link mentioned: Mojo Lang - Tomorrow’s High Performance Python? (with Chris Lattner): Mojo is the latest language from the creator of Swift and LLVM. It’s an attempt to take some of the best techniques from CPU/GPU-level programming and packag…


Modular (Mojo 🔥) ▷ #performance-and-benchmarks (7 messages):

  • Call to Form Team-Mojo for 1BRC: A member suggested forming a Team-Mojo to tackle the One Billion Row Challenge (1brc) as both a showcase and a tutorial.

  • Performance Optimization in Mojo: A member reported that optimizing string allocations and conversions cut that stage from 8 seconds to 1.3 seconds for 100M records, lowering total time from 18.5 to 12.5 seconds; the hashmap is now the bottleneck. The implementation only works on Mojo nightly and can be found on GitHub.

  • Enthusiasm for Team-Mojo’s Formation: Members expressed enthusiasm about forming team-mojo, indicating it would be a fun project to undertake.

  • Reference to Benchmarks Game: There was a suggestion to also tackle the benchmarks game, a task the team had not previously completed.

  • Multi-core Processing Update: A member proposed a pull request after enabling multi-core processing, reporting a significant improvement to 3.8 seconds for 100M records. Another member welcomed the update for further review and mentioned their intent to look into the atol function, drawing on their experience with atol-simd.

Links mentioned:


Modular (Mojo 🔥) ▷ #nightly (20 messages🔥):

  • Order Swap Still Buggy: A member mentioned that changing the order of something still causes it to break, despite fixing an initial issue.
  • Considering the Future of bool in Code: A detailed viewpoint was expressed on potentially limiting the use of bool to size 1, highlighting the importance of retaining bool as a primitive in programming and understanding the impact of such a change.
  • SEMANTICS: Could simd ternary mimic select?: A member inquired whether SIMD ternary might act like select, with another noting that even the semantics of if statements somewhat depend on the concept of being ‘boolable.’
  • WANTED: Missing __source_location() Function: Conversations involved confusion about the disappearance of the __source_location() function, with a suggestion that it may have been replaced by __call_location(). This was traced through a Sourcegraph search link, and the topic was further discussed with specific code examples and GitHub documentation links.
  • Function Names in Source Location: A member questioned the absence of the function_name in the __source_location() function output, with hints that others also share this concern.

Links mentioned:


OpenAccess AI Collective (axolotl) ▷ #general (23 messages🔥):

  • Clarifying Tokenizer Behavior: Members discussed how including a Beginning of Sentence (BOS) token in a chat template affects encoding, noting that tokenizer.encode("text") automatically adds BOS, while tokenizer.apply_chat_template(chat) adds it only if the template specifies it (see the sketch after this list).
  • Debating the Value of a Study: A link to a recent study was shared, sparking debate over its usefulness. One member praised its prompting strategy of using cosine-similarity embeddings, while another dismissed the study’s approach as overly complex for benchmarks.
  • The Practical Struggles with Model Tokens: Users expressed frustration over the implementation of new papers into practice, specifically the challenge of figuring out tokens for a model, despite the plethora of academic publications.
  • Discussing User Input Masking Strategies: A technical question surfaced about the best practice for masking out user inputs during training: whether to mask just the message or also the instructional tags, and how to ensure proper learning of the format but not user typing styles.
  • Prompting Approaches and Generalist Models: There was a brief touch on the relevance of complex prompting strategies and whether applying techniques to only generalist models somewhat misses the point when evaluating AI performance on benchmarks.
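
A minimal illustration of the difference, using Hugging Face transformers; the Llama 3 checkpoint is only a placeholder example of a tokenizer whose encode() prepends BOS by default.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# encode() applies add_special_tokens=True by default, so BOS is prepended:
print(tok.encode("text")[:3])

# apply_chat_template() renders exactly what the template string contains,
# so BOS appears only if the template itself includes it:
chat = [{"role": "user", "content": "hi"}]
print(tok.apply_chat_template(chat)[:3])
```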

Link mentioned: Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine: Generalist foundation models such as GPT-4 have displayed surprising capabilities in a wide variety of domains and tasks. Yet, there is a prevalent assumption that they cannot match specialist capabil…


OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (2 messages):

  • Offer for Compute Help in Triage: A member is extending an offer to assist with triage/troubleshooting of bugs/issues by providing compute resources. They emphasize that such help is invaluable to the project and their sanity.

OpenAccess AI Collective (axolotl) ▷ #general-help (14 messages🔥):

  • Phi3 Finetuning Underway: Some members are currently engaged in finetuning phi3. Others seeking to dive into examples or explore this further are advised to search the channel’s history for relevant details.
  • Dataset Format Wrangling for ShareGPT Loader: A member looking to finetune a model shared a JSON dataset example structured for OpenAI’s format, and then received guidance on how to convert it to the ShareGPT loader format. They were advised to replace "messages" with "conversations", "role" with "from", "content" with "value", "user" with "human", and "assistant" with "gpt".
  • Simplified Script for Dataset Conversion: A script was provided that automatically replaces the keys and maps the roles from the input JSON structure to the expected ShareGPT format (a reconstruction appears after this list).
  • Choose the Right LLaMA Model for Finetuning: In a discussion on finetuning LLaMA models, it was recommended to avoid finetuning the Meta-LLaMA-3-70B-Instruct variant, since it is already instruction-tuned and imposing a new format could degrade performance. Beginners were also advised to start with an 8B model before progressing to the more demanding 70B variants.
  • FSDP Compatibility Query for LoRA: A member inquired about using FSDP with LoRA, as opposed to QLoRA, after encountering training hangs right after model loading. The suggestion was that perhaps only QLoRA is compatible with their FSDP setup.
  • LLaMA Model’s Lengthy Output Concerns: A user reported their LLaMA 3 8B instruct model producing overly long outputs and sentences when trained on regular human conversations. They wondered whether tokens like end-of-text or punctuation need extra training emphasis, or whether more data and epochs would resolve the issue.
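
A hedged reconstruction of the kind of conversion script described above, applying the stated key and role mappings; the file names are placeholders.

```python
import json

ROLE_MAP = {"user": "human", "assistant": "gpt", "system": "system"}

def to_sharegpt(record: dict) -> dict:
    """Convert one OpenAI-style record ({"messages": [...]}) to ShareGPT format."""
    return {
        "conversations": [
            {"from": ROLE_MAP.get(m["role"], m["role"]), "value": m["content"]}
            for m in record["messages"]
        ]
    }

with open("openai_format.json") as f:
    data = json.load(f)

with open("sharegpt_format.json", "w") as f:
    json.dump([to_sharegpt(r) for r in data], f, ensure_ascii=False, indent=2)
```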

Link mentioned: Axolotl - Conversation: no description found


OpenAccess AI Collective (axolotl) ▷ #rlhf (1 messages):

gbourdin: add to my bookmarks. Thanks for this !


OpenAccess AI Collective (axolotl) ▷ #community-showcase (2 messages):

  • Axolotl Meets dstack: A tutorial demonstrating how to use axolotl with dstack, an open-source orchestrator, was shared. It allows fine-tuning AI models on any cloud or a pool of on-premise machines and is available on GitHub.
  • Community Approves: A community member responded positively to the shared tutorial, commenting on its ease of use.

Link mentioned: dstack/examples/fine-tuning/axolotl/README.md at master · dstackai/dstack: An open-source container orchestration engine for running AI workloads in any cloud or data center. https://discord.gg/u8SmfwPpMd - dstackai/dstack


OpenAccess AI Collective (axolotl) ▷ #axolotl-phorm-bot (51 messages🔥):

  • Command-r Model Fine-tuning Discussed: Members explored fine-tuning the command-r model, with suggestions to use runpod templates or to manually implement the unsupported formats. One advised consulting an untested PR on GitHub for adding the command-r model to Axolotl.

  • Fine-tuning Clarifications Provided: It was established that if specific parameters like sample packing are not compatible, they are simply ignored during the process. This led to confusion as to why a training task took unexpectedly long.

  • Axolotl Format Capabilities Queried: There were questions about Axolotl’s support for the phi-3 format and GaLore, with Phorm responding that Axolotl does not support phi-3 but does support GaLore, and details on enabling it can be found in the Hugging Face documentation.

  • Model Adaptation Features and Functions: Through the conversation, it was hinted that adapting models in Axolotl can involve custom code adjustments, and familiarizing oneself with the project’s resources on GitHub is beneficial for tasks such as enabling or configuring specific features like GaLore.

Links mentioned:


LAION ▷ #general (60 messages🔥🔥):

  • AI Compliance with Terms of Service: A participant questioned the situation where an individual is using an AI product without agreeing to its terms. This raises issues about user agreements and how they are enforced.
  • Call for a New Transparent AI Leaderboard: A user expressed the need for a new and more transparent leaderboard for AI models. They advocated for ones that feature only verifiable open source models and the ability to filter results by open weights.
  • Concerns Over LMSYS’s Objectivity and Data Practices: There were multiple concerns about the objectivity of the Chatbot Arena leaderboard managed by LMSYS; discussions touched on conflicts of interest and the lack of transparency in handling models’ ratings.
  • Inquiries and Sharing on AI Models and Datasets: Users sought more information about an AI-generated chess dataset and shared their thoughts on various models’ performances, like llama3 70b’s capabilities even when quantized to 4-bit.
  • Technical Difficulties and Development Sharing: Participants shared links to ongoing projects like magvit2 and discussed optimization techniques, including when to use GANs for better model reconstruction and NATTEN’s new fused CUDA implementation for efficiency.

Links mentioned:


LAION ▷ #research (25 messages🔥):

  • Cardiac Ultrasound AI Research Published: A member announced the publication of their study on cardiac ultrasound fine-tuning of OpenCLIP, despite acknowledging several issues with the paper. The research, after enduring an 8-month revision process, is available at Nature Medicine.

  • CLIP Fine-tune Repo Shared: Discussion touched on the GitHub repository zer0int/CLIP-fine-tune, alongside concerns over Reddit closing open API access, which has widespread implications, including effects on app developers and blind users.

  • Kolmogorov-Arnold Networks Over MLPs: A new paper proposes Kolmogorov-Arnold Networks (KANs) which outperform Multi-Layer Perceptrons in accuracy and interpretability by utilizing learnable activation functions as splines on edges. The concept has resonated with members, finding the approach to be very promising (Read the arXiv paper).

  • VisualFactChecker for Enhanced Captioning: Another paper introduces VisualFactChecker (VFC), a training-free pipeline that significantly improves captioning for images and 3D objects by incorporating fact-checking, potentially resolving issues like content hallucination. The study details methods that increase fidelity and detail in automatic captioning (View the arXiv paper).

  • Request for Chess Dataset Generation Details: In search of better training data, a member requested details on the configuration used to generate the LAION stockfish dataset, to gauge whether it would be adequate for training their chess bot or whether additional datasets would need to be generated.

Links mentioned:


Latent Space ▷ #ai-general-chat (70 messages🔥🔥):

  • Decentralized AI Training by Prime Intellect: Prime Intellect explores novel decentralized training approaches to keep up with Big Tech’s expansion of GPU clusters. For an in-depth look, read their blog post discussing the challenges faced by the open-source AI community and their platform’s aim to aggregate global compute resources.

  • AI Agents or Translation Machines?: A member debated the concept of AI agents, suggesting instead that language models could be considered “translation machines” using shared context and memory, without needing to parallelize for multiple reasons.

  • Starcoder2-Instruct Released: Hugging Face introduces StarCoder2-15B-Instruct-v0.1, a self-aligned Large Language Model (LLM) for code generation. The underlying pipeline and the model are open-source and permissive, detailed in their announcement page.

  • AI Town with World Editor: User shares an experimental set-up involving 300 AI agents operating within a simulated world called AI Town, running smoothly on a MacBook M1 Max.

  • Lilian Weng’s Insightful Yet Challenging Blog Posts: Some members expressed feeling overwhelmed by the depth and complexity of Lilian Weng’s blog posts, particularly the Transformer Family 2.0 post, questioning if they need to dedicate full-time learning to grasp the concepts shared.

Links mentioned:


Latent Space ▷ #ai-announcements (1 messages):

  • Ring Attention Paper Club Event: The StrongCompute team will make a special guest appearance at the LLM Paper Club to discuss the Ring Attention paper. Interested parties can sign up for the event through this Zoom link.

Link mentioned: LLM Paper Club (Ring Attention!) · Zoom · Luma: The StrongCompute gang (@adam_peaston, @fennecs) is covering Ring Attention today! https://arxiv.org/abs/2310.01889 Also submit and vote for our next paper:…


Latent Space ▷ #llm-paper-club-west (2 messages):

  • Zoom Meeting Link Shared: A Zoom meeting link was provided for those preferring a video call alternative. The link can be accessed at Zoom Meeting.

Link mentioned: Join our Cloud HD Video Meeting: Zoom is the leader in modern enterprise video communications, with an easy, reliable cloud platform for video and audio conferencing, chat, and webinars across mobile, desktop, and room systems. Zoom …


OpenInterpreter ▷ #general (36 messages🔥):

  • Promoting Positive Community Interactions: A reminder was issued emphasizing the importance of being respectful and constructive as the community grows and diversifies. It was stressed that everyone has an equal right to share their thoughts and should be treated well to build a better future.
  • Event Reminder and Recap Inquiry: A link to a community event was shared, and members who missed it asked for a recap. It was mentioned that the slides and a screen recording would be made available, with posted slides in a specific channel.
  • Open Interpreter’s Web Task Capabilities: Members discussed whether Open Interpreter can perform browser tasks like visiting websites and scraping data. Clarification was provided that it is indeed capable of such tasks without needing browser control.
  • Compatibility and Technical Issues Discussed: Questions surfaced about the compatibility of Open Interpreter’s OS mode with Windows, mentioning persistent errors. A member confirmed that some commands need alterations for Windows, and the package ‘tesseract’ was mentioned as a cause of issues.
  • Sharing Useful Resources: A YouTube channel was recommended as a useful resource for insights and updates related to Open Interpreter, complete with a direct link to the channel.

Links mentioned:


OpenInterpreter ▷ #O1 (31 messages🔥):

  • The Quest for the External Push Button: Members discussed issues with integrating an external push button with hardware, specifically the Atom Echo device. Code modifications were shared, notably a ButtonChecker snippet that resolved the problem, as confirmed by a member who implemented it.

  • Amplifying Audio Through External Hardware: A member provided a solution to increase the volume of speakers connected to hardware, suggesting the use of an external amplifier with a link to a potential amp, though noting they had not yet tested this setup.

  • Unboxing AI Innovations: The channel mentioned a YouTube review by MKBHD of an AI product, Rabbit R1, with a link to the video. There was a debate about the effectiveness of traditional tech reviewers in understanding and evaluating non-mainstream AI devices.

  • Connecting R1 to OpenInterpreter: Conversations circled around the idea of integrating R1 with OpenInterpreter (OI), with members discussing their anticipation and plans for doing so. There’s an eagerness to explore how these tools can work together, hoping to expand capabilities and build innovative setups.

  • NGROK Domain Customization for OI: A member shared specific steps for creating a new domain on ngrok and editing the tunnel.py file within the 01 software to fix server connection issues, offering a direct link to the ngrok domains page.

Links mentioned:


OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

  • Snowflake Arctic 480B and FireLLaVA 13B Models Launched: Announcing new models: Snowflake Arctic 480B, excellent at coding with a hybrid transformer architecture, available at [Snowflake Arctic 480B](https://openrouter.ai/models/snowflake/snowflake-arctic-instruct), and FireLLaVA 13B, an open-source multimodal model by Fireworks, available at [FireLLaVA 13B](https://openrouter.ai/models/fireworks/firellava-13b). Both come with new pricing and detailed specifications for developers.

  • Improved Load Balancing and Detailed Provider Stats: OpenRouter introduced load balancing to manage providers' load surges and now allows monitoring of latency and providers' finish reasons, enhancing performance for users, accessible on the [Activity page](https://openrouter.ai/activity).

  • Streamlined Docs for Developers: Documentation updates for image and multimodal requests, plus tool calls and function calling, are now available at [Image Requests](https://openrouter.ai/docs#images-_-multimodal-requests) and [Tool Calls](https://openrouter.ai/docs#tool-calls).

  • Feature Expansion and Price Adjustments: Announced support for logit_bias and min_p on Lepton models, a significant 40% price cut on Mythomax Extended, and a slight 4% reduction for Mixtral 8x7b Instruct. These changes reflect OpenRouter's commitment to cost-effective and advanced AI capabilities.

  • Impending API Changes and Developer Notifications: Developers are alerted to the upcoming removal of the `total_cost` field from non-streaming completions and a potential requirement of the `User-Agent` header in requests, to improve service security and efficiency.

Links mentioned:


OpenRouter (Alex Atallah) ▷ #app-showcase (1 messages):

  • Skribler - The Swedish Author’s AI Assistant: Launched a few weeks back, Skribler is a new tool aimed at Swedish writers, integrating various models via OpenRouter for different writing tasks. It’s available at skribler.se and offers features like generating suggestions for text passages, helping bridge gaps in writing, formulating dialogues, and overall support for the creative writing process, with an introduction video here.
  • Positive Reception and User Adoption: The announcement of Skribler also notes that it has already secured a group of paying users, indicating a positive reception in its target market.

Link mentioned: Skribler | Skriv med AI: no description found


OpenRouter (Alex Atallah) ▷ #general (64 messages🔥🔥):

  • OpenRouter Logging Queries: Members are asking if it’s possible to view the per-request prompt and outputs with logging enabled on OpenRouter.
  • Model Embedding Capability Inquiry: A member inquired about the availability of models that support embedding within OpenRouter.
  • Context Extension Curiosity: There’s a conversation about extending context windows in models, specifically mentioning a model with a context length extended to over 1 million tokens, and discussions of the performance of an extended LLama-3 8B model available on Hugging Face.
  • Payment Issues and Solutions Discussed: Users are discussing issues with using pre-paid credit cards on OpenRouter, suggesting that some cards may be blocked by Stripe’s fraud detection, and talking about potential solutions or alternatives for payment.
  • Stream Cancellation and Model Fall-backs: There are questions concerning the reliability of stream cancellation in OpenRouter, plus suggestions to use AWS as a potential fallback for Claude models, similar to how Azure is used for OpenAI’s models.

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #app-showcase (28 messages🔥):

  • Crisp Diffusion Model Outputs: A member mentioned that the diffusion model outputs from Hexagen World are really crisp, signaling high-quality results.

  • Retro Gaming with Generative AI: It was suggested that remaking early social media games like Farmville with Generative AI (GenAI) would be a compelling concept and WebSim could potentially be the best platform to achieve this.

  • AI Embedded Nostalgic Townsim: A member expressed interest in setting up a 1950s themed AI town in WebSim where one of the characters is a communist spy, creating an interactive game of cat-and-mouse.

  • Interactive Animation and AI Discussions: Participants interested in AI animation were invited to join a related Discord community by following a provided Discord invite link.

  • Discovery and Sharing of Hexagen World: The interactive AI concept Hexagen World was shared within the community, discovered via a Twitter post by @bennyj504, capturing the interest of several members who discussed its features and potential.

Links mentioned:


AI Stack Devs (Yoko Li) ▷ #ai-town-discuss (2 messages):

  • First-time experience with Llama3: A member expressed excitement about trying out Llama3 for the first time, indicating a new user’s interest in exploring the capabilities of this AI model.

AI Stack Devs (Yoko Li) ▷ #ai-town-dev (33 messages🔥):

  • Simple Local Setup Success: A member confirmed that setting up the system locally was very easy to accomplish.
  • Windows Compatibility Hurdle: Several members reported issues running the local version on Windows, with one getting stuck at Checking for index or schema changes… Another member clarified that Convex local does not support Windows but mentioned that work on Windows compatibility was underway.
  • Mac-Specific Run Commands Shared: For those running on Mac, it was suggested to use just convex dev for a dedicated sync and just convex logs for a separate terminal log output, offering smooth operations without interferences from npm run dev.
  • Correct Node Version is Crucial: An error related to the node version was shared by a member when trying to run the app. It was pointed out that one needs to run convex-local-backend in the same directory as npm run dev, and to ensure that the correct node version (nvm use 19) is used in both directories.
  • Switching to Linux for Development: In light of the aforementioned compatibility issues with Windows, some members considered uninstalling Windows and installing Linux, with one inquiring about how to do so and if it would affect the ability to play the game Stellaris. Another member provided a link to WineHQ indicating that Stellaris has native Mac and Linux versions, implying compatibility would not be an issue.

Link mentioned: WineHQ - Stellaris: no description found


Cohere ▷ #general (35 messages🔥):

  • Language Models and Grammar: A link to LLM University offers an explanation on how language models like LLMs manage to generate grammatically correct sentences. It talks about the concept of word and sentence embeddings, and the crucial role of self-attention, with a detailed resource available here.
  • Command R Gets Rave Reviews: Community members praise the Cohere commandR/ R+ models, lauding their high performance and contrasting them to other large language models, with comments suggesting that they offer an enterprise-level polished experience.
  • RAG-powered AI Legal Assistant Webinar: The recording of a webinar about building an AI legal assistant with Cohere’s RAG is shared and available on YouTube.
  • Azure and OAuth for Connectors Discussed: For those wondering how to set up OAuth with connectors on Azure, it is clarified that the Cohere toolkit on GitHub can be used which allows everything to run on Azure, ensuring all data remains internal with no external data sharing.
  • Exploring Multilingual Support in Command-R: The community is actively testing languages like Norwegian on Command-R, leading to discussions about language support and the need for better benchmarks, even though some languages appear to work well without official support.

Links mentioned:


Cohere ▷ #collab-opps (1 messages):

The single message in this channel did not contain enough detail or discussion points to summarize.


LangChain AI ▷ #general (24 messages🔥):

  • Seeking PDF Table Extraction Help: A member inquired about how to improve table extraction from PDFs, especially when tables span multiple pages. They are using unstructured but experiencing poor results (a minimal sketch follows).
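
A minimal sketch using the unstructured library's hi_res strategy, which often improves table fidelity; whether it stitches tables spanning multiple pages varies by document, and the file name is a placeholder.

```python
from unstructured.partition.pdf import partition_pdf

elements = partition_pdf(
    filename="report.pdf",
    strategy="hi_res",           # layout-model-based parsing; slower but better for tables
    infer_table_structure=True,  # keeps each table's HTML structure in metadata
)

for el in elements:
    if el.category == "Table":
        print(el.metadata.text_as_html)  # reconstructed table as HTML
```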

  • Integrating Llama 3 with LangChain: A member asked how to use Llama 3 through LangChain and was pointed to use Fireworks with Fireworks API Key to achieve this.
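
A minimal sketch assuming the langchain-fireworks integration package; the model slug is a plausible placeholder for a Llama 3 deployment on Fireworks.

```python
import os
from langchain_fireworks import ChatFireworks

os.environ["FIREWORKS_API_KEY"] = "..."  # your Fireworks API key

llm = ChatFireworks(model="accounts/fireworks/models/llama-v3-8b-instruct")
print(llm.invoke("Say hello in one sentence.").content)
```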

  • Looking for Document-to-Graph Conversion Tools: Members discussed the need for tools to automatically structure documents into knowledge graphs. Suggestions included using a layout parser like unstructured or Azure Doc AI and exploring LangChain’s documentation on constructing knowledge graphs.
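
A minimal sketch of the LangChain route, assuming the langchain-experimental graph transformer with an OpenAI chat model doing the extraction; the model choice and sample sentence are placeholders.

```python
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
transformer = LLMGraphTransformer(llm=llm)

docs = [Document(page_content="Marie Curie won Nobel Prizes in Physics and Chemistry.")]
graph_docs = transformer.convert_to_graph_documents(docs)

print(graph_docs[0].nodes)          # extracted entities
print(graph_docs[0].relationships)  # extracted edges
```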

  • Exploring Sales Agents with AI: A member is seeking advice on building AI-powered Sales Agents that can handle objections and maintain a human tone. They mentioned experimenting with SalesGPT logic and are open to partnerships to further this initiative.

  • Addressing AI Schema Knowledge Limitations: On a server with over 2,000 tables, a member is struggling to get an AI to comprehend all the schemas, highlighting the limits of an LLM’s knowledge of large database structures.

Links mentioned:


LangChain AI ▷ #langserve (1 messages):

  • Google Drive Libraries in Use Again: A member mentioned the necessity to use Google Drive libraries for certain operations, specifying that the drive key should be set as an environment variable. It was noted that these libraries were previously removed and then re-added to the project.

LangChain AI ▷ #share-your-work (7 messages):

  • Launch of QuickVid for YouTube Video Summarization: QuickVid introduces a new way to interact with YouTube content by providing lightning-fast summaries and fact verification. Experience the tool that can improve your YouTube experience at QuickVid.

  • Advanced Webloader RAG Creation Explained: A member shares an article on building powerful Webloader RAG applications with Groq, Langchain, and Datastax. Details can be found at this Medium post.

  • Introduction of Word Loom Spec for AI Language Management: Word Loom, an open spec for managing language for AI, aims to improve prompt management with core principles of separation of code from natural language, composability, and friendliness to mechanical comparisons and G11N techniques. Feedback on the spec is welcome, and it can be reviewed on GitHub Gist.

  • Updates to LangChain Chatbot and Documentation Challenges: The LangChain chatbot has been updated to version 0.1.17, with acknowledgement of the challenges posed by outdated documentation post-stable release. A working example of the updated chatbot can be experienced at LangChain Chatbot.

  • Consideration of LLM Performance Report for Content Creation: A member is testing various LLMs on the leaderboard for content creation use cases like scriptwriting and copywriting, and asks if a detailed report would be useful to others.

Links mentioned:


LangChain AI ▷ #tutorials (3 messages):

  • A Parisian Flavor to Advanced RAG: A new tutorial video showcases the integration of LangChain with Mistral Large and LlamaIndex to build an Advanced RAG assistant for the French-speaking community. The content is available on YouTube as “Multi-Agent RAG: LangChain et LlamaIndex portés par Mistral Large - Le vent du changement”, with the application’s code provided in the video description.

  • Training Local Llama3 with a Twist: An instructional video titled “I want Llama3 to perform 10x with my private knowledge - Local Agentic RAG w/ llama3” has been shared, illustrating how to train llama3 with private knowledge to build an agentic RAG. The video can be found here.

  • Complexity-based RAG Strategy Selection: The “LangGraph + Adaptive Rag + LLama3 Python Project: Easy AI/Chat for your Docs” video introduces an Adaptive RAG approach that adjusts its strategy according to the complexity of the query. This technique promises to optimize the performance of AI/Chat integrations with documentation.

Links mentioned:


Mozilla AI ▷ #announcements (1 messages):

  • Join the Mozilla AI Team: Mozilla AI is expanding its team and is currently hiring. Interested parties can check out the employment opportunities on their official Discord channel [here](https://discord.com/channels/1089876418936180786/1230938514955436242/1234870020916510823).

  • Introducing Lm-buddy: Mozilla AI has released a new open-source tool named Lm-buddy, designed to help evaluate models more efficiently. For more details and access, see the announcement in their channel [here](https://discord.com/channels/1089876418936180786/1230938514955436242/1234589599733518378).

  • Local LLM as Digital Jurist: There's a discussion about using a local LLM as a judge via the Prometheus framework. Details are available on the Discord channel [here](https://discord.com/channels/1089876418936180786/1234890301143912599/1234890301143912599).

Mozilla AI ▷ #llamafile (34 messages🔥):

  • M1 MacBook Air Trouble with LLaMA3: A member reported issues running LLaMA3:8b on an M1 MacBook Air, where it works fine on ollama but not on llamafile. The response was that testing on M1 will be made a priority after resolving other ongoing support issues.
  • Whisper Models Wrapped in Llamafile: A suggestion was made to wrap whisper.cpp models into llamafile for faster inference, noting that integration for microphone and speaker remains unsolved, despite ease of building whisper with cosmo libc.
  • Justine Tunney’s GEMM Blog Fact-Check: One user asked about a blog post (https://justine.lol/matmul/) stating np.matmul performs at 29 GFLOP/s, noting personal experience with much higher throughput; a response clarified the original measurement was taken on an Intel machine running Ubuntu and explained the difference in how flops were counted (a quick benchmark sketch follows this list).
  • Multiple Llamafiles Running Simultaneously: A discussion about running multiple llamafiles simultaneously with different models was confirmed to be possible. It was noted that the operating system would manage the resource allocation, and there may be a need for extra tooling for optimized use.
  • Llamafile Public Path Customization: A member asked about customization using the --server --path PUBLIC_PATH option. It was mentioned that the only tested customizability involved replacing .html and .js files in the zip, rather than external directories.
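
A quick way to reproduce such a measurement yourself; 2*n^3 flops is the standard count for an n x n matrix multiply, and results depend heavily on the BLAS backend NumPy links against.

```python
import time
import numpy as np

n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

a @ b  # warm-up so BLAS thread pools are initialized
t0 = time.perf_counter()
a @ b
dt = time.perf_counter() - t0
print(f"{2 * n**3 / dt / 1e9:.1f} GFLOP/s")
```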

Links mentioned:


tinygrad (George Hotz) ▷ #general (8 messages🔥):

  • Curiosity about Graph Diagrams for Backward Operations: Shikhar_7985 inquired about creating graph diagrams for issue #3572 involving backward passes with two reduce operations. Akshatxv mentioned that there’s a dot file that can be used, while python273 hinted at setting GRAPH=1.

  • Symbolic Shapes and Skipped Tests in Tinygrad: Georgehotz brought to attention his work on symbolic shapes in Tinygrad and shared a pull request that includes a skipped test for symbolic arange.

  • Seeking Tinygrad Knowledge Beyond Google: Lynn4400 expressed interest in learning more about Tinygrad, especially its kernels, and mentioned being influenced by a podcast by Lex Fridman. Leikowo directed them to the repo’s documentation as a good starting point for understanding Tinygrad better.

Link mentioned: tensor variable by geohot · Pull Request #4362 · tinygrad/tinygrad: no description found


tinygrad (George Hotz) ▷ #learn-tinygrad (13 messages🔥):

  • Tinygrad’s Scalar to ConstType Renaming: The project saw a commit renaming Scalar to ConstType and cast_scalar to as_const for pre-req cleanup to standardize constant argument types with dtype.

  • Exploring Const Support Variables: A member suggested refining tinygrad’s handling of constants in operations, proposing to use const support variables instead of tensor variables for simplification and asserting the bounds during the scheduling phase.

  • Symbolic JIT and Variable Mean Tests: After a discussion on the need for symbolic JIT enhancements, it was noted that a good test for verifying improvements would involve varying symbolic JIT variable values and calculating the mean of a 2D tensor with variable lengths.

  • Emphasis on Making Const Variable Work: There was a focus on getting const Variables working within tinygrad, as they are pivotal for operations involving symbolic dimensions.

  • EfficientNet CUDA Usage on Nvidia Xavier: Members discussed issues with running the efficientnet example on Nvidia Xavier, suggesting checking the use of CUDA=1 for proper script execution.

  • Technical Divisions in Symbolic Logic: A debate occurred over the differentiation between RedNode and OpNode in the tinygrad codebase, questioning whether RedNode complicates the symbolic compiler logic and should be factored out.

Links mentioned:


Interconnects (Nathan Lambert) ▷ #random (11 messages🔥):

  • Claude Released by Anthropic: Anthropic has officially released the Claude app, and some members have begun downloading it for use.
  • Quality Queries on Claude: Members are curious about how the newly minted Claude app by Anthropic compares to OpenAI’s offerings, questioning if it holds up in quality.
  • Smooth Sailing with New App: One of the members did not report any issues when using the Claude app and expressed an affinity toward Anthropic’s branding.
  • Anthropic’s Branding Wins Hearts: The conversation reflects a positive response to Anthropic’s branding strategies, with members acknowledging the appeal of its logo.
  • ML Collective Meetings Ongoing: A member confirmed they still attend ML Collective meetings, though not on a weekly basis.

Interconnects (Nathan Lambert) ▷ #reads (1 messages):

  • Rethinking AI Leaderboards: A shared article titled “AI Leaderboards are No Longer Useful” by Sayash Kapoor, Benedikt Stroebl, and Arvind Narayanan questions the usefulness of current AI leaderboards. According to HumanEval benchmarks, LDB is the most accurate publicly available system for code generation, but its high cost due to repeatedly invoking language models like GPT-4 is a significant drawback.

Link mentioned: AI leaderboards are no longer useful. It’s time to switch to Pareto curves.: What spending $2,000 can tell us about evaluating AI agents


Interconnects (Nathan Lambert) ▷ #posts (2 messages):

  • Motivation Boost Successful: In response to a blunt performance critique, a member has notably elevated their work quality, eliciting a positive and emphatic reaction from others.

Alignment Lab AI ▷ #ai-and-ml-discussion (1 messages):

  • Inappropriate Content Alert: The channel received a message promoting a Discord invite link allegedly offering access to leaked materials of questionable and potentially illegal ethics involving minors. The message includes emojis suggestive of adult content and targets everyone in the channel.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Alignment Lab AI ▷ #programming-help (1 messages):

  • Inappropriate Content Alert: A message in the channel contained an offer for free “18+ Teen Girls and onlyfans leaks” and included a Discord invite link. This content is inappropriate for the channel focused on AI alignment and programming help.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Alignment Lab AI ▷ #looking-for-collabs (1 messages):

  • Inappropriate Content Alert: A message was posted offering free leaks of 18+ Teen Girls and OnlyFans content, including a Discord invite link. This content is against community guidelines and promotes illegal activities.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Alignment Lab AI ▷ #general-chat (1 messages):

  • Inappropriate Content Alert: The channel contained a message promoting adult content including 18+ teen girls and OnlyFans leaks. The message included emojis and a Discord invite link.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Alignment Lab AI ▷ #landmark-dev (1 messages):

  • Inappropriate Content Alert: A message containing links to adult content and leaked material from OnlyFans was posted, appearing to be spam or a phishing attempt. This included an invitation to a Discord channel allegedly offering free access to such content.

Link mentioned: Discord - A New Way to Chat with Friends & Communities: Discord is the easiest way to communicate over voice, video, and text. Chat, hang out, and stay close with your friends and communities.


Alignment Lab AI ▷ #landmark-evaluation (1 messages):

  • Inappropriate Content Alert: A message was posted containing links to NSFW content, specifically promoting 18+ Teen Girls and OnlyFans leaks. The poster shared a Discord invitation link and tagged everyone.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Alignment Lab AI ▷ #open-orca-community-chat (1 messages):

  • Inappropriate Content Alert: A message containing links to potentially explicit content and an invitation to view onlyfans leaks was posted, suggesting the sharing of illegal content targeted at an 18+ audience. The post included emojis and a Discord invite link.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Alignment Lab AI ▷ #leaderboard (1 messages):

  • Inappropriate Content Alert: A message was posted containing links to explicit content, specifically referencing a Discord server with leaks from the subscription service known as OnlyFans, potentially featuring underage individuals. The message included a Discord invite link and used emojis that imply the content is adult in nature.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Alignment Lab AI ▷ #looking-for-workers (1 messages):

  • Inappropriate Content Alert: A message contained an inappropriate solicitation for adult content featuring individuals portrayed as minors, including a Discord invite link. The message was flagged for promoting objectionable material.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Alignment Lab AI ▷ #looking-for-work (1 messages):

  • Inappropriate Content Alert: A message in the channel contained an offer for adult content featuring young individuals, along with a Discord invite link. This kind of content is highly inappropriate and may violate various terms of service and laws related to the distribution of explicit material of underage subjects.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Alignment Lab AI ▷ #join-in (1 messages):

  • Inappropriate Content Alert: A message promoting adult content, specifically involving teen girls and OnlyFans leaks, was posted along with a Discord invite link. The post seems to be an attempt to drive traffic to another Discord server that may contain explicit material.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Alignment Lab AI ▷ #fasteval-dev (1 messages):

No relevant discussion to summarize: the channel’s only message was the same inappropriate spam promoting adult content and a Discord invite link.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Alignment Lab AI ▷ #qa (1 messages):

  • Inappropriate Content Alert: A message was posted that appears to promote access to adult content featuring individuals who may be under the age of consent, along with a link to a Discord server. This type of content is not only inappropriate but potentially illegal and should be reported and removed immediately.

Link mentioned: Join the e-girl paradise 🍑🍒 // +18 Discord Server!: Check out the e-girl paradise 🍑🍒 // +18 community on Discord - hang out with 16457 other members and enjoy free voice and text chat.


Skunkworks AI ▷ #general (11 messages🔥):

  • LLaMA-3 Instruct Prompt Strategies Revealed: An update to the LLaMA-3 instruct prompt strategies has been shared, claiming improvements to the model’s performance; the relevant GitHub pull request is linked.

  • Clarifying Dataset Entry Confusion: A member explained that configuring eot_id properly resolved issues with their earlier approach of manually appending <|eot_id|> to the end of every dataset entry.

  • Meta’s Iterative Reasoning Optimization Boosts Accuracy: The paper titled “Iterative Reasoning Preference Optimization” has been circulated, indicating Meta’s advancement with LLama-2-70B-Chat showing accuracy increases on multiple benchmarks like GSM8K and ARC-Challenge. The link to the paper is available here.

  • Fine-tuning LLaMA-3 with Axolotl: A user shared their experience fine-tuning LLaMA-3 8b using Axolotl, noting that the resulting model’s outputs included stray special tokens.

Links mentioned:


Skunkworks AI ▷ #off-topic (2 messages):

  • Motivational Beats to Keep You Pumping: An anime-inspired motivational track titled “NEVER GIVE UP YOUR WAAAAAAAAAAAAY” was shared, featuring an instrumental version from the anime Kill La Kill. The YouTube video encourages viewers to never give up, with a link to a Patreon for support.
  • Count Me In!: A member responded enthusiastically with “I’ll be there too,” indicating participation or support in relation to the previously shared content.

Link mentioned: NEVER GIVE UP YOUR WAAAAAAAAAAAAY: NEVA GIVE UP - https://bit.ly/2VrgAcKSong is Before my Body is Dry instrumental version from the anime Kill La KillConsider donating to our Patreon!https://w


DiscoResearch ▷ #general (1 messages):

  • Quick Load Times Locally: A member mentioned that running their program on their local machine is fast as it loads in 3 secs, suggesting that storage is not the problem when compared to slower load times after submitting a job.

DiscoResearch ▷ #benchmark_dev (1 messages):

le_mess: llama 3 seems to beat gpt4 on scandeval https://scandeval.com/german-nlg/


DiscoResearch ▷ #discolm_german (1 messages):

  • Exploring Model Expansion with qdora: A member sparked interest in LLM expansion by mentioning qdora, a middle-ground solution for models like LLaMA. They provided a link to an Answer.ai blog post discussing the process.
  • Delving into LLaMA Pro’s Non-forgetful Learning: The conversation also highlighted new post-pretraining methods aimed at preventing catastrophic forgetting in LLMs, pointing to an Arxiv paper on expanding Transformer blocks to retain old skills while acquiring new ones.

Link mentioned: LLaMA Pro: Progressive LLaMA with Block Expansion: Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretra…


Datasette - LLM (@SimonW) ▷ #llm (2 messages):

  • Datasette UX Challenge: A member seeks ideas for a user interface on the Datasette front page where users can select options from a dropdown, like choosing a country to fetch summary data related to that selection.
  • Contemplating Dynamic URLs vs. Customizable Interface: Two UX approaches were suggested for the Datasette front page: one updates the URL dynamically on each selection to take the user straight to the data, while the other lets users “build” the homepage through canned queries updated by their selections.