Mustafa Suleyman announced Inflection 2.5, which closes much of the gap Inflection had with GPT-4 in an undisclosed compute-efficient way ("achieves more than 94% the average performance of GPT-4 despite using only 40% the training FLOPs", which is funny because those numbers aren't public).
But IQ isn't the only metric that matters; they are also optimizing for EQ, which is best proxied by the impressive user numbers they also released for Pi:
More notes on the Axios exclusive:
- Pi's user base has been growing at around 10% a week for the last two months. This lets us construct some ballpark estimates for Pi vs ChatGPT:
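For intuition, here is what 10% weekly compounding implies over two months; a quick back-of-envelope sketch with an illustrative starting base (not a disclosed figure):

```python
# Compound growth: users * (1 + rate) ** weeks
weekly_rate = 0.10
weeks = 9  # roughly two months

multiplier = (1 + weekly_rate) ** weeks
print(f"{multiplier:.2f}x over {weeks} weeks")      # ~2.36x

# With a hypothetical 1M-user starting base:
print(f"{1_000_000 * multiplier:,.0f} users now")   # ~2,357,948
```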
They also released a corrected version of MT-Bench for community use.
The community has spotted a couple other interesting tidbits:
- The results are suspiciously close to Claude 3 Sonnet
- Pi also now has realtime web search.
Table of Contents
[TOC]
PART X: AI Twitter Recap
all recaps done by Claude 3 Opus, best of 4 runs
Claude 3 Release and Capabilities:
- Amanda Askell broke down Claude 3's system prompt, explaining how it influences the model's behavior (671,869 impressions)
- Claude 3 Opus beat GPT-4 in a 1.5:1 vote, showing impressive performance (17,987 impressions)
- Claude 3 is now the default model for Perplexity Pro users, with Opus surpassing GPT-4 and Sonnet being competitive (89,658 impressions)
- Claude 3 shows impressive OCR and structured extraction capabilities, as demonstrated in the @llama_index cookbook (59,822 impressions)
- Anthropic has added experimental support for tool calling in Claude 3 via a LangChain wrapper (21,493 impressions)
Retrieval Augmented Generation (RAG):
- LlamaIndex released LlamaParse JSON Mode which allows parsing text and images from PDFs in a structured format. Combined with Claude-3, this enables building RAG pipelines over complex PDFs.
- LlamaIndex now supports video retrieval via integration with VideoDB, allowing RAG over video data by indexing visual and auditory components.
- A paper on "Knowledge-Augmented Planning for LLM Agents" proposes enhancing LLM planning capabilities through explicit action knowledge bases.
Benchmarking and Evaluation:
- TinyBenchmarks looks promising as a tool for evaluating language models, similar to the Dharma-1 benchmark by @far__el.
- An empirical result suggests that 100 examples may be sufficient to evaluate language models, based on datasets like HumanEval (164 examples) and Bamboogle (124 examples).
- The Yi-9B model was released, showing strong performance on code and math benchmarks, topping Mistral.
AI Research and Techniques:
- Researchers introduced Wanda, a method for network pruning that reduces computational burden while maintaining performance (7,530 impressions); a sketch of Wanda's scoring rule follows this list
- A paper proposes foundation agents that can master any computer task by taking screen images and audio as input and producing keyboard/mouse operations (16,598 impressions)
- Microsoft presents Déjà Vu, a KV-cache streaming method for fast, fault-tolerant generative LLM serving (16,458 impressions)
- Google presents RT-H, which outperforms RT-2 on a wide range of robotic tasks using action hierarchies and language (11,348 impressions)
- Meta presents ViewDiff for generating high-quality, multi-view consistent images of 3D objects in authentic surroundings (5,880 impressions)
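For readers curious what Wanda's criterion looks like in code, here is a minimal sketch of its scoring rule (score = |weight| × input-activation norm), simplified to a single whole-layer mask rather than the paper's per-output comparison groups:

```python
import torch

def wanda_prune(weight: torch.Tensor, acts: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the weights with the lowest |w_ij| * ||x_j||_2 scores.

    weight: (out_features, in_features)
    acts:   (n_samples, in_features) calibration activations feeding this layer
    """
    scores = weight.abs() * acts.norm(p=2, dim=0)    # Wanda's pruning metric
    k = int(weight.numel() * sparsity)
    threshold = scores.flatten().kthvalue(k).values  # k-th smallest score
    return weight * (scores > threshold)

W = torch.randn(256, 512)
X = torch.randn(1_000, 512)  # activations from a few calibration batches
W_sparse = wanda_prune(W, X, sparsity=0.5)
print(f"{(W_sparse == 0).float().mean():.0%} of weights pruned")
```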
Memes and Humor:
- "AGI by September 2024" meme tweet pokes fun at overhyped AI timelines.
- A humorous tweet laments the time sink of relying on GPT-4 to write code in 2024 instead of doing it manually.
- Playful banter about making Claude-3 a girlfriend, referencing the Gemini model's supposed creativity.
PART 0: Summary of Summaries of Summaries
Claude 3 Sonnet (14B?)
- Model Releases and Comparisons: Multiple new AI models sparked heated discussions around their strengths and limitations. Inflection-2.5 claimed to match GPT-4 performance on benchmarks while using less compute, but faced skepticism from @HlibIvanov, who called it a mere GPT-4 distill lacking innovation. Claude-3 Opus achieved impressive feats like a perfect 800 on the SAT reading section, with @res6969 praising its enhanced knowledge web construction over 35k tokens. However, @jeffreyw128 noted Claude struggled to find a specific name among 500. Gemma underwhelmed @lee0099 compared to 7B Mistral, especially in multi-turn dialogues, and is English-only.
- Open-Source AI and Community Dynamics: @natolambert vented frustrations over the OSS community's pedantic corrections and lack of perspective, which can deter OSS advocates. Even helpful posts face excessive criticism, as experienced when writing on OSS. The GaLore optimizer by @AnimaAnandkumar promised major memory savings for LLM training, generating excitement from @nafnlaus00 and others about improving accessibility on consumer GPUs. However, @caseus_ questioned GaLore's claimed parity with full pre-training, and integrating GaLore into projects like axolotl faced implementation challenges.
- Hardware Optimization for AI Workloads: Optimizing hardware was a key focus, with techniques like pruning, quantization via bitsandbytes, and low-precision operations discussed (a quantization sketch follows this list). @iron_bound highlighted Nvidia's H100 GPU offering 5.5 TB/s L2 cache bandwidth, while @zippika speculated the RTX 4090's L1 cache could reach 40 TB/s. CUDA implementations like @tspeterkim_89106's Flash Attention aimed for performance gains, though @marksaroufim warned about the impact of coarsening on benchmarking consistency.
- AI Applications and Tooling: Innovative AI applications were showcased, like @pradeep1148's Infinite Craft Game and meme generation using Mistral. LlamaIndex released LlamaParse JSON Mode for parsing PDFs into structured data. Integrating AI with developer workflows was explored, with @alexatallah offering sponsorships for OpenRouter VSCode extensions, while LangChain's `ask-llm` library simplified LLM coding integrations.
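As a concrete reference for the quantization point above, here is a minimal sketch of 4-bit loading with bitsandbytes through transformers (the model choice is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 4-bit weights with bf16 compute: the usual bitsandbytes memory-saving recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tokenizer("Explain KV caching in one sentence.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```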
Claude 3 Opus (8x220B?)
- NVIDIA Restricts CUDA Translation Layers: NVIDIA has banned the use of translation layers for running CUDA on non-NVIDIA hardware, targeting projects like ZLUDA that aimed to bring AMD GPUs to parity with NVIDIA on Windows. The updated restrictions are seen as a move to maintain NVIDIA's proprietary edge, sparking debates over the policy's enforceability.
- Efficiency and Pruning Debates in LLM Training: Discussions emerged around the parameter-efficiency of LLMs and the potential for optimized training schemes. Some argued that the ability to heavily prune models without substantial performance drops indicates inefficiencies, while others cautioned about reduced generalizability. New memory-reduction strategies like GaLore generated interest, with ongoing attempts to integrate it into projects like OpenAccess-AI-Collective/axolotl. Questions arose about the limits of current architectures and the saturation of model compression techniques.
- Inflection-2.5 and Claude-3 Make Waves: The release of Inflection-2.5, claiming performance on par with GPT-4, and Anthropic's Claude-3 variants Opus and Sonnet sparked discussions. Some were skeptical of Inflection-2.5's innovation, suggesting it might just be a GPT-4 distillation. Meanwhile, Claude-3 garnered significant community interest, with Opus achieving a perfect 800 on the SAT reading section according to this tweet.
- Mistral Powers Innovative Applications: The Mistral language model demonstrated its versatility in powering an Infinite Craft Game and automating meme creation using the Giphy API. Other noteworthy releases included Nous Research's Genstruct 7B for instruction-generation (HuggingFace) and new benchmarks like Hae-Rae Bench and K-MMLU for evaluating Korean language models (arXiv).
ChatGPT (GPT4T)
- Multilingual Model Support and API Integration: Discord communities highlighted advancements in multilingual support and API integration, with Perplexity AI introducing user interface support for languages like Korean, Japanese, German, French, and Spanish, and discussing the Perplexity API for integrating Llama 70B. API desires and troubles included discussions on rate limit increases and integration code, as detailed in their API documentation.
- Innovations in AI-Driven Game Development: Nous Research AI and Skunkworks AI showcased the use of AI in creating new gaming experiences. A crafting game leveraging Mistral demonstrated AI's potential in game development with its expandable element-combination gameplay, showcased in a YouTube video. Similarly, Mistral was used in an Infinite Craft Game and for automating meme creation, illustrating innovative AI applications in gaming and humor.
- Advancements and Debates in Model Optimization and Pruning: The LAION and OpenAccess AI Collective (axolotl) summaries brought to light discussions on model optimization, pruning, and efficiency. Debates on the pruning of Large Language Models (LLMs) reflected differing opinions on its impact on performance and generalizability, with some engineers proposing pruning as evidence of possible optimization. GaLore emerged as a focal optimization tool in discussions, despite skepticism about its performance parity with full pretraining, with integration efforts underway as noted in their pull request.
- Emergence of New AI Models and Tools: Across multiple Discord summaries, there was significant buzz around the introduction of new AI models and tools, including Inflection AI 2.5, Genstruct 7B, and Yi-9B. Inflection AI 2.5's release sparked conversations about its efficiency and performance, whereas Nous Research AI unveiled Genstruct 7B, an instruction-generation model aimed at enhancing dataset creation and detailed reasoning, available on HuggingFace. The Hugging Face community saw the launch of Yi-9B, adding to the growing list of models available for experimentation and deployment, showcasing the continuous innovation and expansion of AI capabilities, with a demo available here.
PART 1: High level Discord summaries
Perplexity AI Discord Summary
- Perplexity AI Talks the Talk in Multiple Languages: Perplexity now supports Korean, Japanese, German, French, and Spanish for its user interface, as announced by @ok.alex. Users can customize their language preferences in the app settings.
- A Limitations and Alternatives Smorgasbord: There was an active discussion about the limitations of AI models, focused on daily usage limits for Claude 3 Opus and alternatives like Claude 3 Sonnet and GPT-4 Turbo. The need for more direct feedback was mentioned in regards to the closed beta application process.
- Perplexity Pro Subscribers Sound Off: Users shared their experiences with Perplexity Pro, engaging in discourse regarding the additional benefits and how to access specialized support channels.
- New Kid on the Block: Inflection AI 2.5: The release of Inflection AI 2.5 sparked conversations about its efficiency and performance levels, with users highlighting its speed and debating its potential use cases.
- Global Cordiality or Algorithmic Manners?: A discussion was sparked around cultural communication nuances, with a focus on the use of "sir" and global differences in respectful address, in the context of language models.
- Sharing is Caring - Perplexity Goes 3D: Users shared interesting Perplexity search links, exploring topics from 3D space navigation to altcoin trends, the concept of Ikigai, the quality of text generation by Claude 3 Opus, and interpretations of quantum mechanics.
- API Desires and Troubles: Guild members are engaging with the Perplexity API, seeking integration code for Llama 70B and support for rate limit increases, while also showing interest in the Discover feature. The Perplexity API documentation was referenced as a guide for usage and technical assistance.
Nous Research AI Discord Summary
- Innovative Crafting AI Game Emerges: A new crafting game leveraging Mistral has been introduced by @pradeep1148, showcasing the potential for AI in game development. The game begins with four elements and expands as players combine them, as demonstrated in a YouTube video.
- Continuous Improvement Triggers Tech Buzz: The Yi-34B base model has shown remarkable performance growth in its "Needle-in-a-Haystack" test, potentially raising the bar for upcoming models. Google's Gemma model received community-driven bug fixes, which are available in Colab notebooks, and the new GaLore project demands community validation, potentially benefiting from pairing with low-bit optimizers.
- Genstruct 7B Sets the Instructional Pace: Nous Research unveils Genstruct 7B, an instruction-generation model designed to enhance detailed reasoning and dataset creation. Heralded by Nous's own <@811403041612759080>, Genstruct 7B promises innovation in instruction-based model training, available on HuggingFace.
- Chat AI and LLMs Spark Discussions: Nous Research's new Genstruct-7B-GGUF enters the spotlight alongside debates surrounding Claude 3's performance, and Inflection AI claims its model Inflection-2.5 matches GPT-4 on benchmarks. Meanwhile, community skepticism prevails regarding both a shared Twitter IQ test chart and the rumors of a GPT-5 release.
- Technical Debates and Clarifications: From running Ollama locally to the potential of a Claude-style function-calling model, the community seeks insights on various AI models and tools. Highlights include the upcoming update of Nous-Hermes-2-Mistral-7B-DPO with function calling data and a refactoring effort for a logit sampler by @ufghfigchv tailored for JSON/function calls. Access to GPT-4 was also mentioned, with Corcel.io offering free ChatGPT-4-like interactions, along with a desire for longer context lengths in models like Nous-Hermes for RAG applications.
LAION Discord Summary
- CUDA Controversy: NVIDIA's recent policy change prohibits the use of translation layers for running CUDA on non-NVIDIA hardware, directly impacting projects like ZLUDA. The updated restrictions are seen as a move to maintain NVIDIA's proprietary edge.
- Scraping Spat Spirals: Stability AI was mixed up in a controversy for allegedly scraping Midjourney, causing a ban on their employees and raising concerns over data scraping practices. While some tweets suggest it was not work-related, it has kindled a discussion on scraping ethics and protocols.
- Efficiency in the Spotlight: Debates concerning the pruning of Large Language Models (LLMs) have been center stage. Some engineers believe current training methods are inefficient, proposing pruning as evidence of possible optimization, while others voice concerns about the potential loss of generalizability and stability, and the slowness of certain optimization techniques such as SVD.
- Pruning Perplexities: Contrary to beliefs that lightly pruned models experience performance degradation, there's an argument that heavily pruned LLMs remain surprisingly generalizable. However, this leads to a bigger question: are oversized parameter counts needed for model training, or can engineers aim for leaner yet effective LLMs?
- Architectural Assessments: Conversations are probing the structural boundaries of current LLMs, exploring whether the strategies to compress and optimize models, especially those based on attention mechanisms, are approaching their limits. This underlines a curiosity about the saturation of model efficiency within present-day architectures.
OpenAI Discord Summary
- Claude's Remarkable Performance: Positive experiences with Claude 3 Opus (C3) were highlighted as it outperformed GPT-4 on complex tasks and elementary class problems.
- Claude Versus Gemini: There was a debate over coding capabilities, where Gemini 1.5 Pro solved a Python GUI task successfully on the first try while Claude 3 did not, indicating strengths and weaknesses in each.
- Doubts Cast on MMLU Datasets: Concerns arose about the MMLU datasets' questions lacking logical consistency and containing incorrect answers, leading to calls for reconsidering their use in AI model evaluations.
- GPT-4 Availability and Policy Discussions: Users discussed intermittent access to GPT-4 and policy changes affecting code provision, with confusion over accessing custom models like Human GPT via API clarified by references to OpenAI's model overview.
- Service Outages and Support Quests: A reported 6-hour service interruption on OpenAI's APIs, a query on implementing randomization in storytelling using Python's random function (see the sketch below), and discussions on the development of a GPT classifier reflected the technical and operational challenges community members faced.
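On the randomization query, a minimal sketch of the kind of snippet the data analysis tool can run; the story elements are invented for illustration:

```python
import random

seed = random.SystemRandom().randint(0, 2**32 - 1)  # fresh entropy each run
rng = random.Random(seed)                            # reproducible given the seed

twists = ["betrayal", "hidden heir", "time loop", "false prophecy"]
print(f"seed={seed}, twist={rng.choice(twists)}")
```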
LM Studio Discord Summary
- Tech Titans Talk Troubleshooting: Users discussed hardware configurations with 512 GB RAM and 24GB VRAM, and tackled library version issues, specifically a `GLIBCXX_3.4.29 not found` error. The debates extended to whether a Macbook Pro with M3 Max or M2 Max is suitable for local LLM processing, touching on price and performance trade-offs. Community members aired concerns over the non-openness of OpenAI, with alternative AI services like POE monthly being considered for model access.
- Narrative Nuances and AI Models: In the models discussion, the AI deemed optimal for storytelling was Mistral-7b-instruct-v0.2-neural-story.Q4_K_M.gguf, yet limitations due to memory constraints were evident. The lack of image-generation capabilities in LM Studio led participants to consider tools like Automatic 1111 for Stable Diffusion tasks. Interest in Starcoder2 was evident, with users awaiting its support, as indicated by references to its Hugging Face page.
- Feedback Focus Irregularities and Insights: A user shared that they desire clearer guidance to exploit LM Studio's potential. There's feedback suggesting LM Studio could emulate Embra AI to improve its utility, and that the current version (v0.2.16) doesn't support proxy on macOS 14.3. It was also clarified that help requests should not be posted in the feedback channel.
- Hardware Hub Conversations: Reports indicated experiments with a 200K context in the Smaug model and VRAM demand issues with an LM Studio task utilizing a 105,000-token context. A minimum of 550W was recommended for a PSU to power an RTX 3090, and discussions included handling large contexts or datasets with LLM tasks and mismatched VRAM issues.
- Response Rate Quest in Crew AI: In looking for ways to increase response speed for an unspecified process, users faced a connection timeout issue and proposed running operations locally as a potential solution.
- The Odyssey of Open Interpreter Syntax: There were inquiries and conversations surrounding the correct `default_system_message` syntax and profile configurations in Open Interpreter. Users exchanged experiences with code-trained models and shared learning moments for configurations, with references to instructions at Open Interpreter - System Message and Hugging Face.
LlamaIndex Discord Summary
- Survey Your Heart Out: LlamaIndex seeks deeper user insights with a 3-minute survey. Engineers are encouraged to participate to shape better resources like documentation and tutorials, accessible at SurveyMonkey.
- LlamaParse JSON Unleashed: The LlamaParse JSON Mode from LlamaIndex is generating buzz with its ability to parse PDFs into structured dictionaries, especially when paired with models like Claude 3 Opus. For those interested, a tweet announces the launch at LlamaIndex Tweet.
- Video Capabilities Level Up: The LlamaIndex and @videodb_io integration opens new doors for video content handling, allowing keyword-based video upload, search, and streaming within LlamaIndex. Discover more about this integration via this Announcement Tweet.
- Optimizing A* for Search Efficiency: The A* algorithm's feasibility for similarity search was validated with a subclassing of the embedding class to alter the search methodology, showcasing LlamaIndex's flexibility.
- In-Context Learning Gets a Boost: A new methodology enhancing in-context learning has been introduced by @momin_abbas with the Few-Shot Linear Probe Calibration. The initiative invites support from the community through GitHub engagement.
Latent Space Discord Summary
- Midjourney and Stability AI Spat Goes Public: @420gunna referenced a controversial incident where Stability AI was banned from Midjourney for scraping data, as detailed in a Twitter post by @nickfloats.
- Podcast Episode Featuring AI Experts Hits the Airwaves and Hacker News: @swyxio announced a new podcast episode with <@776472701052387339> and highlighted its presence on Hacker News.
- Cheers for Volunteer-led Model Serving Paper Presentation: The community showed appreciation for <@720451321991397446>'s volunteering, with a specific focus on a presentation about model serving, accessible via Google Slides.
- Inference Optimization and Hardware Utilization Debate Heats Up: Notable discussions on inference optimization included alternatives like speculative decoding and FlashAttention, and the effect of hardware, with insights derived from resources like EGjoni's DRUGS GitHub and the DiLoCo paper.
- Decentralized Training and GPU Configuration Discussions Surge: There was active engagement in deliberations around distributed training, with references to DiLoCo and the influence of GPU configurations on model outputs, spurred by an anecdote of an OpenAI incident.
Eleuther Discord Summary
- New Korean Language Benchmarks Unveiled: Two new evaluation datasets, Hae-Rae Bench and K-MMLU, specifically tailored to assess language models' understanding of Korean language and culture, have been introduced by @gson_arlo.
- vLLM Batching Clarified: @baber_ explained that batching is handled internally by vLLM, so manual implementation for batched inference is not required, referencing the official documentation for ease of use (see the sketch after this list).
- Multilingual Benchmark Collaboration Called For: Contributors speaking non-mainstream languages are invited to participate in creating pertinent benchmarks that evaluate language model competencies specific to their cultures.
- Optimizer Memory Issues in GPT-NeoX Addressed: Discussions in the gpt-neox-dev channel focused on tackling memory peaks during optimization, with members like @tastybucketofrice citing Issue #1160 and suggesting Docker as a potential solution to dependency challenges (Docker commit here).
- Efforts on Unified Fine-tuning Framework: In the lm-thunderdome channel, the lack of a consistent mechanism for fine-tuning language models was spotlighted by @karatsubabutslower, despite there being a standardized evaluation method like lm-evaluation-harness.
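A minimal sketch of what "handled internally" means in practice: hand vLLM the whole list of prompts and it schedules the batch itself (model choice illustrative):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.8, max_tokens=64)

# No manual batching loop: vLLM applies continuous batching under the hood.
prompts = [f"Give a one-line answer #{i} about tokenization." for i in range(32)]
for out in llm.generate(prompts, params):
    print(out.outputs[0].text.strip())
```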
HuggingFace Discord Summary
- Entrepreneurs Seek Open Source Model Groups: Entrepreneurs on HuggingFace are looking for a community to discuss the application of open-source models in small businesses; however, no dedicated channel or space was recommended within the provided messages.
- New Model on the Block, Yi-9B: Tonic_1 launched Yi-9B, a new model in the Hugging Face collection, available for use with a demo. Hugging Face may soon be hosting leaderboards and gaming competitions.
- Inquiry into MMLU Dataset Structure: Privetin displayed interest in understanding the MMLU datasets, a conversation that went without elaboration or engagement from others.
- Rust Programming Welcomes Enthusiasts: Manel_aloui kicked off their journey with the Rust language and encouraged others to participate, fostering a small community of learners within the channel.
- AI's Mainstream Moment in 2022: Highlighted by an Investopedia article shared by @vardhan0280, AI's mainstream surge in 2022 was attributed to the popularity of DALL-E and ChatGPT.
- New Features in Gradio 4.20.0 Update: Gradio announced version 4.20.0, now supporting external authentication providers like HF OAuth and Google OAuth, as well as introducing a `delete_cache` parameter and a `/logout` feature to enhance the user experience. The new `gr.DownloadButton` component was also introduced for stylish downloads, detailed in the documentation.
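A small sketch of the new pieces, assuming `delete_cache` takes a `(frequency, max_age)` pair in seconds and `gr.DownloadButton` serves an existing file; check the Gradio docs for the exact semantics:

```python
import gradio as gr

# delete_cache=(frequency_sec, max_age_sec): periodically purge stale cached files.
with gr.Blocks(delete_cache=(3600, 7200)) as demo:
    gr.Markdown("## Report downloads")
    gr.DownloadButton("Download report", value="report.pdf")  # path of a file to serve

demo.launch()
```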
OpenAccess AI Collective (axolotl) Discord Summary
- Deepspeed Disappoints Multi-GPU Setups: Engineers noted frustration with Deepspeed, particularly in multi-GPU scenarios with 4x 4090s, finding that it fails to split the base model across GPUs when using the LoRA adapter. The discussion referenced a Deepspeed JSON config file.
- GaLore Spurs Debate: GaLore, an optimization tool, became a focal point due to its potential for memory savings in Large Language Model training (see the sketch after this list). Despite excitement, skeptics questioned its performance parity with full pretraining, even as integration efforts are underway.
- Efficiency Methods Under Microscope: Discussions surfaced doubts about the potentially misleading nature of various efficiency methods, including ReLoRA and NEFT, prompting consideration of dataset sizes and settings for meaningful finetuning.
- Gemma's Performance Draws Criticism: The Gemma model came under scrutiny for underwhelming performance against 7B Mistral, especially in multi-turn dialogues, and was constrained by being English-only, which limited its value for multilingual tasks.
- Dependency Wars in AI Development: A common thread across discussions was the battle against dependency conflicts, especially with `torch` versions in the installation of `axolotl[deepspeed]==0.4.0`. Engineers shared tactics like manual installation and suggested specific versions, including `torch==2.2.0`, as potential fixes.
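For the GaLore bullet above, a minimal sketch of the optimizer's intended use, following the project README's pattern of applying low-rank gradient projection only to the 2-D weight matrices (hyperparameters illustrative):

```python
import torch
from torch import nn
from galore_torch import GaLoreAdamW  # pip install galore-torch

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
galore_params = [p for p in model.parameters() if p.dim() == 2]  # weight matrices
other_params = [p for p in model.parameters() if p.dim() != 2]   # biases, norms

optimizer = GaLoreAdamW(
    [
        {"params": other_params},
        # Gradients of these matrices get projected into rank-128 subspaces,
        # which is where the optimizer-state memory savings come from.
        {"params": galore_params, "rank": 128, "update_proj_gap": 200,
         "scale": 0.25, "proj_type": "std"},
    ],
    lr=1e-5,
)

loss = model(torch.randn(8, 512)).pow(2).mean()
loss.backward()
optimizer.step()
```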
OpenRouter (Alex Atallah) Discord Summary
- Claude 3 Makes Group Chat Cool: Alex Atallah shared a tweet on the positive self-moderated group chat experience using Claude 3.
- Nitro-Power to Your Projects: New "nitro" models are in testing with OpenRouter, offering safe integration options, although slight adjustments may be expected during the feedback incorporation phase before an official launch.
- VSCode Extension Bounty: Alex Atallah offered sponsorship for building a VSCode extension for OpenRouter, rewarding developers with free credits for their contributions.
- Development Tips and Tricks Exchange: Community members exchanged information on various VSCode extensions for LLMs, including Cursor, Continue, and Tabby, as well as pointing to more cost-effective chat models like Sonar 8x7B by Perplexity.
- Budget-Friendly AI Conversations: Discussions were had about the cost implications of engaging with models like Claude 3 Opus, whereby Sonar 8x7B was highlighted for its cost-effectiveness over others.
LangChain AI Discord Summary
- CSV Loader Timeout Troubles: Users noted issues with `UnstructuredCSVLoader` throwing "The write operation timed out" errors in LangChain. Although solutions were not discussed, the problem was acknowledged by members sharing similar experiences.
- Raising Red Flags on Phishing: Concerns were raised over an uptick in phishing attempts within the server, particularly through suspicious steamcommunity links, but follow-up actions or resolutions were not detailed.
- Prompt Puzzle from Past Interactions: In the construction of a chat chain, one user faced issues with `HumanMessage` content improperly propagating into `AIMessage` after initial interactions, despite the intention of memory segregation. A shared code snippet highlighted the problem, though the community's advice was still sought.
- LangChain Leverages RAPTOR & Pydantic Pairing: Detailed strategies for utilizing Pydantic with LangChain and Redis for structuring user data and chat histories were under discussion (see the sketch after this list), with an invite for insights into unexpected `AIMessage` behaviors.
- Link Library for LangChain Learners: Released resources included the ChromaDB Plugin for LM Studio, a tool for generating vector databases, and the `ask-llm` library for simpler LLM integration into Python projects. Highlighted educational content included a Medium article on RAG construction using RAPTOR, and YouTube tutorials on game crafting and meme generation with Mistral and the Giphy API.
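A minimal sketch of the Pydantic-for-chat-history idea mentioned above; the Redis key scheme is hypothetical, not a documented LangChain recipe:

```python
from datetime import datetime, timezone
from pydantic import BaseModel, Field

class ChatTurn(BaseModel):
    role: str  # "human" or "ai"
    content: str
    at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

class UserSession(BaseModel):
    user_id: str
    history: list[ChatTurn] = Field(default_factory=list)

session = UserSession(user_id="u42")
session.history.append(ChatTurn(role="human", content="What is RAPTOR?"))

# Serialize to a JSON string, e.g. redis.set(f"session:{session.user_id}", payload)
payload = session.model_dump_json()
restored = UserSession.model_validate_json(payload)
assert restored.history[0].content == "What is RAPTOR?"
```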
CUDA MODE Discord Summary
- CUDA Coarsening Tips: A tweet by @zeuxcg shared insights on properly handling coarsening in code execution, highlighting performance impacts due to benchmarking inconsistencies.
- Comparing Relay and Flash Attention: In the realm of attention mechanisms, @lancerts raised a discussion on comparing RelayAttention with ring/flash attention, citing a GitHub repository on vLLM with RelayAttention.
- Insightful CUDA Command for GPU Reset: A `sudo` command, `sudo nvidia-smi --gpu-reset -i 0`, was provided for resetting GPUs while addressing memory allocation on GPUs, along with a potentially related `nvtop` observation.
- New CUDA Project on the Block: @tspeterkim_89106 introduced a project implementing Flash Attention in CUDA, inviting feedback and collaboration on GitHub.
- CUDA Synchronization Mechanics in Torch: The use of `torch.cuda.synchronize()` for accurate performance measurements was recommended, with clarifications on synchronization across CUDA kernels and the cross-device usage of scalar tensors.
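A minimal sketch of the synchronization point: CUDA kernels launch asynchronously, so the host must synchronize (or use CUDA events, as below) before reading the clock:

```python
import torch

def bench_ms(fn, *args, warmup=3, iters=10):
    for _ in range(warmup):      # absorb one-time setup costs
        fn(*args)
    torch.cuda.synchronize()     # drain queued kernels before timing starts
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()     # make sure `end` has actually been recorded
    return start.elapsed_time(end) / iters  # milliseconds per call

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
print(f"{bench_ms(torch.matmul, a, b):.2f} ms per matmul")
```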
Interconnects (Nathan Lambert) Discord Summary
- Inflection-2.5 Sparks Debates: @xeophon. introduced Inflection-2.5, an AI model embedded in Pi, which claims high performance on par with GPT-4 and Gemini. However, a tweet by @HlibIvanov criticizes it as a GPT-4 distill, raising questions about its innovation.
- AI Innovation Frenzy Noted: @natolambert showcased enthusiasm over the rapid development within the AI field, referencing multiple new model releases and discussing it further in a tweet.
- OSS Community Nitpicks Frustrate: Discussions touched on the unwelcoming nature of the open-source software community, with @natolambert expressing frustration over pedantic criticisms that are discouraging to OSS advocates and @xeophon. bringing up the confusion over labeling in the space.
- Claude-3 Heats Up AI Competition: The release of Claude-3 by @Anthropic has gathered a fervent community response, along with its variants Opus and Sonnet, as shared in a tweet, with significant community involvement noted in the form of 20,000 votes in three days.
- Expectations Soar for Gemini Ultra: The upcoming Gemini Ultra and its 1M context window feature are highly anticipated, with engineers like @natolambert and @xeophon keen on exploring its capabilities for tasks such as analyzing academic papers.
Alignment Lab AI Discord Summary
- Spam Alert Leads to Policy Change: Following a spam incident with @everyone tags, users like @joshxt highlighted the importance of respecting everyone's inboxes, leading to a new policy where the ability to ping everyone was disabled to prevent unwanted notifications.
- Orca Dataset Dives into Discourse: The release of Microsoft's Orca dataset was brought up by @joshxt, sparking conversation and personal model preferences, with "Psyonic-cetacean" and "Claude 3 Opus" getting special mentions.
- Introducing Project Orca-2: @aslawliet put forward a proposal for Orca-2, aiming to encompass a diverse range of datasets beyond Microsoft's recent release, such as FLAN 2021 and selective zero-shot samples from T0 and Natural Instructions.
- Efficient Data Augmentation Tactics: @aslawliet proposed using Mixtral as a time- and cost-efficient data augmentation method over GPT-4, prompting discussion on efficient methods for model improvement.
- Cordial Introductions and Greetings: The community warmly welcomed new participants like @segmentationfault. and @1168088006553518183, emphasizing a friendly atmosphere and the shared interest in contributing to the field of AI.
DiscoResearch Discord Summary
- Choose Your Language Model Wisely: Depending on constraints, @johannhartmann advises using Claude Opus and GPT-4 when there are no limitations, DiscoLM-120B for open source with substantial memory availability, and VAGOsolutions/Sauerkraut LM-UNA-SOLAR-Instruct as the go-to when working with restricted memory.
- Retrieval-Augmented on the Rise: A study in an arXiv paper shows the benefits of retrieval-augmented language models, with a specific focus on joint training of retriever and LLM, though comprehensive research on this integration remains scarce.
- The Quest for the Best German-Speaker: Nous Hermes 2 Mixtral 8x7b was praised by @flozi00 for its high accuracy in task comprehension. In contrast, @cybertimon and @johannhartmann recommended exploring a range of models including DiscoResearch/DiscoLM_German_7b_v1 and seedboxai/KafkaLM-7B-DARE_TIES-LaserRMT-QLoRA-DPO-v0.5 for fluent German language capabilities.
- Evaluating Translation Through Embedding Models: @flozi00 is developing an approach to score translation quality based on embedding distance, using the OPUS 100 dataset (see the sketch after this list). This initiative could steer enhancements in machine translation (MT) models and data quality.
- mMARCO Dataset Receives the Apache Seal: The mMARCO dataset now boasts an Apache 2.0 license, as shared by @philipmay, enriching resources for developers although lacking dataset viewer support on Hugging Face.
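A minimal sketch of the embedding-distance idea referenced above (the model choice is illustrative; the actual pipeline and any thresholds are not described in this summary):

```python
from sentence_transformers import SentenceTransformer, util

# Any multilingual embedding model can stand in here.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def translation_score(source: str, translation: str) -> float:
    # Cosine similarity between the two embeddings; values near 1.0
    # suggest the meaning survived translation.
    src, tgt = model.encode([source, translation], convert_to_tensor=True)
    return util.cos_sim(src, tgt).item()

print(translation_score("The weather is nice today.",
                        "Das Wetter ist heute schön."))
```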
LLM Perf Enthusiasts AI Discord Summary
- SAT Scores Soar with Opus: Opus nailed a perfect 800 on the SAT reading section, as shared by @jeffreyw128.
- Memorization vs. Learning: Following the SAT victory, @dare.ai touched on the difficulty of ensuring that large models like Opus avoid memorizing answers rather than truly learning.
- Opus Earns a Fanclub Member: @nosa_ humorously warned of a faux-confrontation should Opus learn of their high praise for its performance.
- Weaving Webs of Wisdom: @res6969 praised Opus for its enhanced skill in crafting knowledge webs from expansive documents, highlighting its ability to follow instructions over 35k tokens.
- In-Depth Search Dilemma: A task that involved finding a specific name among 500 proved challenging for different models including Claude Opus, as reported by @jeffreyw128.
Datasette - LLM (@SimonW) Discord Summary
- GPT-4 Stumbles in Mystery Test: GPT-4 failed an unspecified test, according to @dbreunig, yet no details about the nature of the test or the type of failure were provided.
- Bridging Physical and Digital Libraries: @xnimrodx shared a novel blog post about making bookshelves clickable so that they connect to Google Books pages, along with a demo, sparking discussion about potential applications in library systems and local book-sharing initiatives.
- Dollar Signs in Templates Cause Chaos: @trufuswashington experienced crashes in the `llm` command caused by a `TypeError` when using a dollar sign `$` in a YAML template intended for explaining code-related content, uncovering an issue with the special character's handling within template prompts.
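A plausible mechanism, sketched below: `llm`'s YAML templates are rendered with Python's `string.Template`-style substitution, where a bare `$` introduces a placeholder, so a literal dollar sign needs to be written `$$` (this reading of the crash is an inference, not a confirmed diagnosis):

```python
from string import Template

broken = Template("Explain this line: echo $PATH for $input")
try:
    broken.substitute(input="a shell snippet")
except KeyError as err:
    print("unexpected placeholder:", err)  # $PATH was parsed as a substitution

# Doubling the dollar sign makes it a literal character.
fixed = Template("Explain this line: echo $$PATH for $input")
print(fixed.substitute(input="a shell snippet"))
```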
Skunkworks AI Discord Summary
- Mistral Turns Game Crafting Infinite: A new Infinite Craft Game powered by Mistral was shared, highlighting the Mistral language model's application in a game that allows players to combine elements to create new items, suggesting innovative use-cases for AI in gaming.
- Meme Generation Meets AI: The Mistral language model has been used to automate meme creation in combination with the Giphy API, as demonstrated in a YouTube video, with the code available on GitHub for engineers looking to explore the intersection of AI and humor.
PART 2: Detailed by-Channel summaries and links
Perplexity AI ▷ #announcements (1 messages):
- Perplexity AI now speaks your language: User @ok.alex announced that Perplexity is now available in Korean (한국어), Japanese (日本語), German (Deutsch), French (Français), and Spanish (Español). Users can change their preferred interface language in the settings on both desktop and mobile.
Perplexity AI ▷ #general (413 messages🔥🔥🔥):
- Limitations and Comparisons of AI Models: Users discussed various limitations of AI models, with @twelsh37 sharing a comprehensive report prompt for testing AI capabilities. @zero2567 and others noted the limited usage of Claude 3 Opus to 5 times a day, prompting discussions on the constraints and alternatives like Claude 3 Sonnet and GPT-4 Turbo for coding tasks, as mentioned by users like @tunafi.sh and @deicoon.
- Pro Subscriptions and Features Enquiry: Users like @arrogantpotatoo and @dieg0brand0 shared their subscription to Perplexity Pro, leading to discussions on the benefits and how to access specialized channels and pro support on Discord.
- Testing Inflection AI's New Release: The announcement of Inflection AI's 2.5 release caught the attention of multiple users, including @codelicious and @ytherium, with conversations around the model's claimed performance level and efficiency. Several noted its speedy performance, even speculating on its potential for various use cases.
- Opinions on Gemini 1.5: Dissatisfaction and speculation about Gemini 1.5 were voiced by users like @archient, who found it disappointing and lacking features compared to other services. The conversation touched upon expectations of Google's AI product ecosystem and potential reasons for its perceived underperformance.
- Cultural Respect or Language Model Bias?: A few messages from @gooddawg10 sparked a discussion on respect and formality in communication, highlighting cultural nuances and prompting users like @twelsh37 and @brknclock1215 to address the use of "sir" and differences in global communication styles.
Links mentioned:
- Inflection-2.5: meet the world's best personal AI: We are an AI studio creating a personal AI for everyone. Our first AI is called Pi, for personal intelligence, a supportive and empathetic conversational AI.
- Cat Dont Care Didnt Ask GIF - Cat Dont Care Didnt Ask Didnt Ask - Discover & Share GIFs: Click to view the GIF
- Perplexity Blog: Explore Perplexity's blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
- What is Pro Search?: Explore Perplexity's blog for articles, announcements, product updates, and tips to optimize your experience. Stay informed and make the most of Perplexity.
- Welcome to Live – Ableton Reference Manual Version 12 | Ableton: no description found
Perplexity AI ▷ #sharing (14 messages🔥):
- Cruising Through the 3D Space: User @williamc0206 shared a Perplexity search link, potentially discussing the capabilities of navigating or generating 3D spaces. Explore the 3D Space with Perplexity.
- Altcoins in the Spotlight: @b.irrek posted a link that appears to delve into the movement and trends surrounding alternative cryptocurrencies. Insight on Altcoins Here.
- Unraveling the Concept of Ikigai: @sevonade4 invited others to check out a generated text on the concept of Ikigai, which could be of specific interest depending on personal curiosity. Dive into Ikigai.
- Contemplations on Claude 3 Opus: Further, @sevonade4 highlighted the text generation quality of Claude 3 Opus for those interested in exploring different levels of text generation. Reflect with Claude 3 Opus.
- Quantum Queries Addressed: @vmgehman shared how Perplexity has been a helpful resource in studying various interpretations of quantum mechanics. Quantum Mechanics Explorations.
Perplexity AI ▷ #pplx-api (19 messages🔥):
- Seeking HTML & JS Code for Llama 70B Integration: @kingmilos asked for assistance with code to integrate Llama 70B using HTML and JS because their expertise lies in Python. @po.sh responded by providing a basic code example, instructing them to insert the API key and adjust the model as necessary (see the Python sketch after this channel's links).
- Feedback on Beta Application Process: @brknclock1215 expressed disappointment in the perceived impersonal nature of the closed beta application denial, suggesting a desire for more direct communication or feedback.
- Documentation for API Assists Programmers: @icelavaman pointed @kingmilos and @pythoner_sad to the Perplexity API documentation for guidance on using the API with LLM inference, yet @kingmilos expressed difficulty due to a lack of HTML and JS knowledge.
- User Seeks Support for Rate Limit Increase: @xlhu_69745 requested assistance with a rate limit increase for the Sonar model but noted a lack of response to their email. @icelavaman responded with a non-verbal indication, possibly suggesting where to seek help or check updates.
- Interest in Discover Feature via API: @yankovich inquired about Perplexity Discover and whether a similar feature could be implemented for users through Perplexity's API. @bitsavage. suggested reviewing the API documentation to understand potential functionalities and consider how to craft personalized user discovery features.
Links mentioned:
pplx-api: no description found
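Picking up the integration requests above, a minimal Python sketch against the Perplexity chat completions endpoint (the model name is a placeholder; check the API documentation for currently supported models):

```python
import requests

API_KEY = "pplx-..."  # your Perplexity API key

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-2-70b-chat",  # placeholder; see the docs for current names
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```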
Nous Research AI ▷ #off-topic (6 messages):
- Crafting with AI: User @pradeep1148 shared a YouTube video titled "Infinite Craft Game using Mistral", which shows the development of a crafting game that begins with four elements and expands as players combine them.
- In Search of Ollama Examples: @pier1337 inquired about examples of running Ollama with Deno, but no further information was provided in the channel.
- Missed Connection: User @teknium responded to nonexistent tags and apologized for missing a direct message on Twitter sent months ago by an unspecified user.
Links mentioned:
- Infinite Craft Game using Mistral: Let's develop Neal Agarwal's web game Infinite Craft. This is a "crafting game" where you start with just four elements and repeatedly combine pairs of elements…
- Making memes with Mistral & Giphy: Let's make memes using the Mistral LLM and Giphy API. #llm #ml #python #pythonprogramming https://github.com/githubpradeep/notebooks/blob/main/Giphy%20Mistral.ipynb
Nous Research AI ▷ #interesting-links (44 messages🔥):
- Yi LLMs Constantly Improving: @thilotee highlighted ongoing enhancements to the Yi-34B base model, notably its performance on the "Needle-in-a-Haystack" test improving from 89.3% to 99.8%. The discussion touched upon the potential for further finetuning and whether the Yi-9B model supports 200k context, which it appears not to (Yi's Huggingface page and Reddit discussion).
- Google's Gemma Admittedly Flawed: @mister_poodle shared concerns that Google may be hastily releasing models, as a team called Unsloth fixed several bugs in the Gemma model which were not addressed elsewhere. The fixes are available in Colab notebooks.
- Claude 3 Opus Claims Debated: A user shared an experience with Claude 3 Opus translating the low-resource Circassian language impressively, but later updates suggested the model might have had prior knowledge of the language. This sparked a discussion on in-context reasoning capabilities and the validity of the original claims (Original Twitter Post).
- GaLore: The New GitHub Gem: @random_string_of_character posted links to GaLore, a project on GitHub, and a Twitter post; however, community validation is needed to determine its effectiveness. A suggestion was made to pair it with low-bit optimizers for potential savings.
- Anticipation for Function Calling Model: Amidst various discussions, @scottwerner and @sundar_99385 expressed excitement about trying out a new model with Claude-style function calling capabilities. No release date was mentioned, but eagerness for the model's launch was evident.
Links mentioned:
- Tweet from An Qu (@hahahahohohe): Today while testing @AnthropicAI's new model Claude 3 Opus I witnessed something so astonishing it genuinely felt like a miracle. Hate to sound clickbaity, but this is really what it felt like. …
- Stop Regressing: Training Value Functions via Classification for Scalable Deep RL: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match boo…
- Unsloth Fixing Gemma bugs: Unsloth fixing Google's open-source language model Gemma.
- 01-ai/Yi-9B · Hugging Face: no description found
- GitHub - thu-ml/low-bit-optimizers: Low-bit optimizers for PyTorch.
- GitHub - jiaweizzhao/GaLore: Contribute to jiaweizzhao/GaLore development by creating an account on GitHub.
- 01-ai/Yi-34B-200K · Hugging Face: no description found
- Reddit - Dive into anything: no description found
Nous Research AI ▷ #announcements (1 messages):
- New Model Unveiled: Genstruct 7B: Nous Research announces the release of Genstruct 7B, an instruction-generation model that can create valid instructions from raw text, allowing for the creation of new finetuning datasets. The model, inspired by the Ada-Instruct paper, is designed to generate questions for complex scenarios, promoting detailed reasoning.
- User-Informed Generative Training: The Genstruct 7B model is grounded in user-provided context, taking inspiration from Ada-Instruct and pushing it further to enhance the reasoning capabilities of subsequently trained models. Available for download: [Genstruct 7B on HuggingFace](https://huggingface.co/NousResearch/Genstruct-7B).
- Led by a Visionary: The development of Genstruct 7B was spearheaded by <@811403041612759080> at Nous Research, signifying a team investment in innovation for instruction-based model training.
Links mentioned:
NousResearch/Genstruct-7B · Hugging Face: no description found
Nous Research AI ▷ #general (329 messages🔥🔥):
- Nous Research Releases a Chat-Oriented AI: @teknium announces a new model from Nous Research called Genstruct-7B-GGUF, a generative model that can create dialogues and instruction-based content.
- Discussion about Claude 3's Performance: @proprietary exclaims about the Claude 3 model's impressive capabilities, sparking curiosity and requests to share outputs.
- Evaluating a Twitter IQ Test Chart: An incorrect IQ test chart from Twitter is discussed; @makya2148 and others express skepticism about the reported IQ scores for AI models like Claude 3 and GPT-4.
- GPT-5 Release Rumors Circulate: @sanketpatrikar shares rumors of a potential GPT-5 release, leading to speculation but a consensus of skepticism amongst the chat participants.
- Inflection AI Claims Impressive Benchmark Results: Inflection AI tweets about their new model, Inflection-2.5, claiming it is competitive with GPT-4 on all benchmarks. @teknium and @mautonomy discuss the credibility of these claims.
Links mentioned:
- Tweet from Netrunner â e/acc (@thenetrunna): GPT5_MOE_Q4_K_M.gguf SHA256: ce6253d2e91adea0c35924b38411b0434fa18fcb90c52980ce68187dbcbbe40c https ://t.ly/8AN5G
- Tweet from Inflection AI (@inflectionAI): Pi just got a huge upgrade! It's now powered by our latest LLM: Inflection-2.5, which is neck and neck with GPT-4 on all benchmarks and used less than half the compute to train. Pi now has world clas…
- Tweet from Emad (@EMostaque): @Teknium1 Less stable above 7b. Transformer engine has it as main implementation. Intel have one too and Google have int8
- gguf/Genstruct-7B-GGUF · Hugging Face: no description found
- Swim In GIF - Swim In Swimming - Discover & Share GIFs: Click to view the GIF
- MTEB Leaderboard - a Hugging Face Space by mteb: no description found
- Tweet from Sebastian Majstorovic (@storytracer): Open source LLMs need open training data. Today I release the largest dataset of English public domain books curated from the @internetarchive and the @openlibrary. It consists of more than 61 billion…
- Tweet from Prof. Anima Anandkumar (@AnimaAnandkumar): For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB memory. This represents more than 82.5% reduction in memory for storing optimi…
- Tweet from FxTwitter / FixupX: Sorry, that user doesn't exist :(
- Tweet from Daniel Han (@danielhanchen): Found more bugs for #Gemma: 1. Must add <bos> 2. There's a typo for <end_of_turn>model 3. sqrt(3072)=55.4256 but bfloat16 is 55.5 4. Layernorm (w+1) must be in float32 5. Keras mixed_bfloa…
- How to Fine-Tune LLMs in 2024 with Hugging Face: In this blog post you will learn how to fine-tune LLMs using Hugging Face TRL, Transformers and Datasets in 2024. We will fine-tune a LLM on a text to SQL dataset.
- Microsoft's new deal with France's Mistral AI is under scrutiny from the European Union: The European Union is looking into Microsoft's partnership with French startup Mistral AI. It's part of a broader review of the booming generative artificial intelligence sector to see if it rais…
- llama_index/llama-index-packs/llama-index-packs-raptor/llama_index/packs/raptor at main · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
- Weyaxi/Einstein-v4-7B · Hugging Face: no description found
- Tweet from Weyaxi (@Weyaxi): Exciting News! Meet Einstein-v4-7B, a powerful mistral-based supervised fine-tuned model using diverse high quality and filtered open source datasets! I also converted multiple-choice…
- GitHub - jiaweizzhao/GaLore: Contribute to jiaweizzhao/GaLore development by creating an account on GitHub.
- WIP: galore optimizer by maximegmd · Pull Request #1370 · OpenAccess-AI-Collective/axolotl: Adds support for GaLore optimizers. Still a WIP, untested.
- GitHub - e-p-armstrong/augmentoolkit: Convert Compute And Books Into Instruct-Tuning Datasets.
- GitHub - PKU-YuanGroup/Open-Sora-Plan: This project aims to reproduce Sora (OpenAI's T2V model), but we only have limited resources. We deeply wish that the whole open source community can contribute to this project.
Nous Research AI ▷ #ask-about-llms (41 messages🔥):
- Local Ollama Announcement: User @teknium declared that Ollama is intended for local running, while @lakshaykc suggested it could potentially have an endpoint created for backend inference.
- Current Training on Hermes Update: @teknium confirmed to @aliissa that the dataset used for Nous-Hermes-2-Mistral-7B-DPO did not originally include function calling data, but a newer version is now in training.
- Sampling Tech for JSON/Function Calls: @ufghfigchv is refactoring a logit sampler designed for JSON/function calls, which works well with Hugging Face (HF) models and vLLM.
- GPT-4 Free Access Question: @micron588 inquired about free access to GPT-4, and @teknium clarified availability, pointing to a website called Corcel.io offering free ChatGPT-4-like interactions.
- Inquiry on Context Length for Nous-Hermes Models: @nickcbrown asked why the context lengths for the Nous-Hermes models built on Mixtral/Mistral had apparently been reduced, and expressed a desire for longer contexts for applications like RAG (Retrieval-Augmented Generation).
Links mentioned:
- Corcel · Build with the power of Bittensor: no description found
- Lilac - Better data, better AI: Lilac enables data and AI practitioners to improve their products by improving their data.
LAION ▷ #general (300 messages🔥🔥):
- NVIDIA Puts the Brakes on Cross-Platform CUDA: A recent change that has been stirring up attention is NVIDIA's ban on using translation layers for running CUDA on non-NVIDIA chips, targeting projects like ZLUDA which aimed at bringing AMD to parity with NVIDIA on Windows.
- Midjourney Scrape Shake-Up: Stability AI is accused of scraping Midjourney for prompts and images, resulting in their employees being banned from Midjourney. The incident has sparked various reactions, with some suggesting it might not be work-related and others joking about the situation.
- Marketing Missteps: Conversation turned to a case where a marketing department is reportedly spending a disproportionate amount on ad conversions, leading to incredulous reactions and discussions on the inefficiency of such spending.
- SD3 Speculations Amidst Dataset Discussions: As talk of new datasets like MajorTOM, released by @mikonvergence on Twitter, surfaces, there's anticipation around Stability AI's plans to distribute SD3 invites and make PRs to diffusers, with users discussing the potential and limitations of SD3's architecture.
- Scraping Etiquette Examination: Amid the ongoing discussions about scraping, from the technical implications to the social impacts, users stress the importance of proper scraping techniques and etiquette, reinforcing that understanding and abiding by these principles is crucial.
Links mentioned:
- Nvidia bans using translation layers for CUDA software – previously the prohibition was only listed in the online EULA, now included in installed files [Updated]: Translators in the crosshairs.
- Tweet from Nick St. Pierre (@nickfloats): In MJ office hours they just said someone at Stability AI was trying to grab all the prompt and image pairs in the middle of a night on Saturday and brought down their service. MJ is banning all of …
- TwoAbove/midjourney-messages · Datasets at Hugging Face: no description found
- Regulations.gov: no description found
- GitHub - itsme2417/PolyMind: A multimodal, function calling powered LLM webui.
LAION ▷ #research (74 messages🔥🔥):
- Debate on Training Efficiency and Pruning: Discussion between @mkaic and @recviking focused on whether LLMs are parameter-inefficient and if training methods or architectures could be optimized (a pruning sketch follows this channel's links). @mkaic suggested that the ability to prune an LLM without a substantial drop in performance indicated potential for more efficient training schemes, while @recviking argued that pruning reduces a model's generalizability and that the vastness of potential inputs makes efficiency evaluation complex.
- SVD and Training Slowness: @metal63 reported a significant slowdown when applying SVD updates during training of the Stable Cascade model, with training pausing for 2 minutes per update. @thejonasbrothers expressed a dislike of SVD due to its slowness, hinting at the practical issues with certain training optimizations.
- Pruning Effects on Model Performance: @thejonasbrothers noted that pruning models often leads to performance issues such as token repetition and stability concerns, and that while some models may seem saturated at around 7 billion parameters, the situation might differ at larger scales, such as 1 trillion parameters.
- General Utility of Pruned LLMs: @mkaic maintained that even after heavy pruning, LLMs retain more generalizability than expected, and posed questions about the necessity of large parameter counts for model training versus inference. The potential for major breakthroughs in training more efficient yet comparably effective LLMs was highlighted as a promising research area.
- Current Structural Limitations: @thejonasbrothers and @recviking discussed the structural limitations of current LLMs and the saturation of efficiencies, with an especially critical lens on attention-based optimizations. The dialog raised questions about whether the industry is reaching the limits of compression and model efficiency within existing architectures.
Links mentioned:
Neverseenagain Yourleaving GIF - Neverseenagain Yourleaving Oh - Discover & Share GIFs: Click to view the GIF
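For context on what "pruning" concretely means in the debate above, a minimal sketch of unstructured magnitude pruning, the simplest baseline behind these claims:

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Drop the smallest-magnitude fraction of weights; how much quality
    # survives this is exactly what the generalizability debate is about.
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

w = torch.randn(1024, 1024)
w_half = magnitude_prune(w, sparsity=0.5)
print(f"{(w_half == 0).float().mean():.0%} zeros")
```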
OpenAI ▷ #ai-discussions (204 messages🔥🔥):
- Claude Outshines GPT-4 in Elemental Knowledge: @you.wish and @kennyphngyn share their positive experiences with Claude 3 Opus (referred to as "C3"), reporting it outperforms GPT-4 on complex tasks and in elementary classes.
- Debate Over Claude 3's Coding Capabilities: @testtm finds that Gemini 1.5 Pro succeeds at a Python GUI code task on the first try, while Claude 3 doesn't, suggesting that both models have strengths and weaknesses.
- MMLU Dataset Efficacy in Question: On the subject of MMLU datasets, @privetin and @foxalabs_32486 criticize the set for having questions that don't make logical sense and containing a significant percentage of incorrect answers, calling for its removal from AI model evaluation.
- Limits of YouTube for AI Model Evaluation Highlighted: @7877 asserts that YouTube is not the best source for quick, raw evaluation numbers of AI models due to its focus on entertainment over detailed information, prompting a discussion on alternative evaluation resources.
- Concerns Over the Potency of Free AI Services: @eskcanta expresses concern about the limited interactions allowed with Claude's free version compared to the more generous allowances from OpenAI's services, contemplating the economic sustainability of these AI companies in providing free services.
Links mentioned:
EvalPlus Leaderboard: no description found
OpenAI ▷ #gpt-4-discussions (29 messages🔥):
- Troubles Accessing GPT Account: User @haseebmughal_546 faced issues with a frozen page, preventing access to the GPT account.
- GPT Policy Update on Code: User @watcherkk raised concerns about GPT not providing full code due to a policy change, mentioning an "out of policy" message.
- Service Disruption on OpenAI: @qilin111 reported a 6-hour service interruption; @dystopia78 confirmed a partial API outage, although OpenAI's status page showed "All Systems Operational" at the time.
- Inconsistent GPT-4 Availability: Users @cetacn, @emante, @liyucheng09, @openheroes, @ed1431, and @malc2987 discussed intermittent access to GPT-4, with varying degrees of operability among users.
- Mix-Up on GPT and Other Models: @cliffsayshi inquired about using custom models like Human GPT via API, to which @solbus clarified that GPTs are exclusive to ChatGPT and not accessible via API, providing links to OpenAI's model overview and additional info on GPTs vs. assistants.
Links mentioned:
- OpenAI Status: no description found
OpenAI ▷ #prompt-engineering (68 messages🔥🔥):
- Navigating Channel Posting Requirements: @giorgiomufen was unclear on how to post in a specific channel due to a required tag, and @eskcanta assisted by pointing out that one must first click one of the "see more tags" options.
- Positive Phrasing for Role-play Prompts: @toothpiks252, @eskcanta, and @dezuzel advised @loamy_ on how to phrase a role-play prompt positively, suggesting it's best to tell the model explicitly what to do rather than what not to do.
- GPT-5 for Enhanced Problem-Solving: @spikyd expressed confidence that GPT-5 would have a 10% better chance of solving a specific bird enigma puzzle, while @eskcanta believed it might simply require more targeted training with the vision model.
- Improving Randomization in Storytelling: In a conversation about adding randomness to GPT's choices, @solbus recommended using Python's random functions via the data analysis tool to give @interactiveadventureai more variety in AI-generated stories (see the sketch after this list).
- Seeking Help for GPT Classifier Development: @chemlox sought advice on whether to use a ReAct-based agent or fine-tuning to build a GPT classifier that determines the status of conversations, with @eskcanta suggesting testing the base model first before committing to a more complex solution.
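The randomization tip is easy to picture: have the model call Python rather than "choosing" randomly itself. A minimal standalone sketch of the idea, with story elements that are made up purely for illustration; in the channel's suggestion, GPT would run code like this through its data analysis (Python) tool:

```python
import random

# Illustrative story elements; swap in whatever the story needs.
settings = ["abandoned lighthouse", "desert caravan", "orbital station"]
twists = ["a hidden letter", "an unreliable narrator", "a sudden storm"]

random.seed()  # seed from OS entropy so each run differs
setting, twist = random.choice(settings), random.choice(twists)
print(f"Write a short story set in a {setting} featuring {twist}.")
```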
OpenAI ▷ #api-discussions (68 messages🔥🔥):
- Tag Troubles in Posting: @giorgiomufen had an issue posting in a channel that required selecting a tag first. @eskcanta helped them out by pointing out the requirement to click one of the "see more tags" options.
- Positivity Beats Negativity in Instructions: @loamy_ sought a positive alternative to "do not repeat", which @toothpiks252 and @eskcanta assisted with by suggesting explicit instructions on the desired behavior.
- Bird Enigma and GPT-5's Potential: @spikyd and @eskcanta discussed the potential of the upcoming GPT-5 to solve a complex bird enigma, with emphasis on improved vision models and reasoning capabilities for such tasks.
- Random Number Generation Query: @interactiveadventureai inquired about methods for GPT to generate different random seeds for number generation, with @solbus suggesting Python's random functions as a possible solution.
- A Warm Welcome to a New Member: @thebornchampion introduced themselves as an aspiring data analyst and prompt-engineering enthusiast, and @eskcanta welcomed them while discussing their interests and use cases for ChatGPT.
LM Studio ▷ #💬-general (187 messages🔥🔥):
- Discussing Tech Specs and Troubleshooting: @kavita_27183 shares their hardware specs, boasting 512 GB RAM and 24 GB VRAM, while @jedd1 assists in troubleshooting libstdc++ errors and checking whether the system recognizes the VRAM correctly. A shared error message points to a library version issue (`GLIBCXX_3.4.29 not found`), suggesting a need for an update.
- Server-Side Issues with Local Models: Diverse local-model topics are addressed, including @_benoitb encountering API issues with Node servers, @datasoul discussing GGUF-related errors in LM Studio, and @mattjpow seeking clarification on server context behavior. Advice and responses are offered by @heyitsyorkie.
- Hardware Recommendations for LLM Work: @saber123316 deliberates between a MacBook Pro with M3 Max or M2 Max, seeking community advice on which would suffice for local LLM processing. The conversation touches on the trade-offs between price, performance, and the value of more RAM.
- Discovering LM Studio's Capabilities: @aeiou2623 and @.lodis explore features ranging from image uploads in conversations to model support. @heyitsyorkie provides guidance on loading GGUF files and clarifies that LM Studio is primarily for running local models offline.
- OpenAI Critique and Alternate AI Services Insights: @saber123316 and @rugg0064 discuss the implications of OpenAI's revealed non-openness, with mentions of Elon Musk's dissatisfaction; OpenAI's proprietary approach prompts users to consider other AI subscriptions such as POE monthly for access to various models like Claude and GPT-4.
Links mentioned:
- Inflection-2.5: meet the world's best personal AI: We are an AI studio creating a personal AI for everyone. Our first AI is called Pi, for personal intelligence, a supportive and empathetic conversational AI.
- Reor: AI note-taking app that runs models locally & offline on your computer.
- Reddit - Dive into anything: no description found
- RIP Midjourney! FREE & UNCENSORED SDXL 1.0 is TAKING OVER!: Say goodbye to Midjourney and hello to the future of free open-source AI image generation: SDXL 1.0! This new, uncensored model is taking the AI world by sto…
- Accelerating LLM Inference: Medusa's Uglier Sisters (WITH CODE): https://arxiv.org/abs/2401.10774 https://github.com/evintunador/medusas_uglier_sisters
- The unofficial LMStudio FAQ!: Welcome to the unofficial LMStudio FAQ. Here you will find answers to the most commonly asked questions that we get on the LMStudio Discord. (This FAQ is community managed). LMStudio is a free closed…
- 22,000 H100s later, Inflection 2.5!!!: 🔗 Links 🔗 https://inflection.ai/inflection-2-5 ❤️ If you want to support the channel ❤️ Support here: Patreon - https://www.patreon.com/1littlecoder/ Ko-Fi - ht…
LM Studio ▷ #🤖-models-discussion-chat (27 messages🔥):
- Optimal AI for Storytelling Specified: @laszlo01 inquired about the best AI model for storytelling using OpenChat 3.5. @jason_2065 recommended Mistral-7b-instruct-v0.2-neural-story.Q4_K_M.gguf with 24 layers and an 8192 context size, noting the memory constraints.
- LM Studio Lacks Image-Generation Models: @karisna asked about models capable of drawing images within LM Studio. @heyitsyorkie clarified that LM Studio does not support such models and recommended Automatic1111 for Stable Diffusion tasks.
- Starcoder2 Running Issues and Support: @b1gb4ng and @madhur_11 wondered whether starcoder2 could be run via LM Studio, to which @heyitsyorkie replied that it is not supported in the current LM Studio version.
- Exploring Image Generation Alternatives: @callmemjinina sought a model that generates pictures. @heyitsyorkie explained that language models and LM Studio cannot perform this task, advising them to look for Stable Diffusion tools and tutorials online.
- Request for Starcoder2: @zachmayer shared interest in using starcoder2 and posted a link to its Hugging Face page, but @wolfspyre hinted at the need for patience, implying future support might be coming.
Links mentioned:
Kquant03/TechxGenus-starcoder2-15b-instruct-GGUF · Hugging Face: no description found
LM Studio ▷ #🧠-feedback (4 messages):
- Users Seek Guidance: @tiro1cor15_10 expressed a desire for guidance to fully realize the potential they see in the service.
- Proxy Support Lacking on macOS: @calmwater.0184 reported that LM Studio currently does not support proxies on macOS 14.3 (v0.2.16), impacting the ability to search for or download models.
- Feature Enhancement Suggestion: @calmwater.0184 suggested that LM Studio look into the user experience and features of Embra AI to become a more efficient productivity-booster assistant for users at all levels.
- Channel Usage Clarification: @heyitsyorkie directed users to stop using the designated feedback channel for help requests and to use the appropriate help channel instead.
LM Studio ▷ #🎛-hardware-discussion (46 messages🔥):
- Extreme Context Experiments on the Smaug Model: @goldensun3ds reported running a test with a 200K-token context on the Smaug model, which showed erratic RAM usage fluctuating between 20 GB and 70 GB under CPU-only inference. Their latest message indicated no outputs yet and continued fluctuations in RAM usage.
- VRAM and Power Supply Discussions: @wilsonkeebs sought advice on the smallest PSU that could power a standalone RTX 3090, and @heyitsyorkie recommended a minimum of 550W while suggesting 750W as the standard choice. @wilsonkeebs emphasized that having enough PCIe cables matters more than the wattage itself.
- Exploring Large Contexts for Niche Use Cases: @aswarp raised the potential for LLMs to process very large datasets monthly, such as entire codebases for thorough reporting, acknowledging the trade-off in processing time. @aswarp sees potential particularly for applications in government and smaller businesses.
- VRAM Demand for Processing in LM Studio: @jason_2065 mentioned using a 105,000-token context and experiencing high resource usage, with 42 GB RAM and over 20 GB VRAM, which aligns with the high demands others reported when handling large contexts or datasets in memory-intensive LLM tasks.
- Managing Mismatched VRAM for Machine Learning Models: The conversation touched on the difficulties of running LLMs across mismatched VRAM, noting that adjusting the LLM preset file for GPU allocation is something @goldensun3ds finds troublesome because it requires restarting LM Studio.
Links mentioned:
- Razer Core X - Thunderbolt™ 3 eGPU | Razer United Kingdom: Now compatible with Mac and Windows laptops, featuring 3-slot PCI-Express desktop graphic cards, 650W power supply, and charges via USB-C.
- PSU for NVIDIA GeForce RTX 3090 | Power Supply Calculator: See what power supply you need for your NVIDIA GeForce RTX 3090
LM Studio ▷ #crew-ai (2 messages):
- Seeking Speed Enhancement Techniques: @alluring_seahorse_04960 is looking for ways to increase response speed for an unspecified process, and is also hitting an error: "Connection to telemetry.crewai.com timed out".
- Suggestion for Baseline Local Operation: In response, @wolfspyre suggests establishing a simple baseline operation that runs locally, to help isolate the speed issue.
LM Studio ▷ #open-interpreter (85 messages🔥🔥):
- Tackling the Preprompt Syntax: @nxonxi inquired about the correct syntax for modifying the `default_system_message` in Open Interpreter settings across different operating systems, sharing their struggles and attempts to alter the system message on Linux, Windows, and WSL. (A minimal sketch of the idea appears after this list.)
- Confusion Over `-s` and DOS Documentation: @nxonxi mentioned confusion over how to use Open Interpreter's prompt settings, discussing the `-s` or `--system_message` option, and shared a link to the documentation, which led to further discussion with @1sbefore on finding the correct usage and commands.
- Profiles and Configurations - The Journey Continues: Throughout the conversation, @1sbefore offered assistance, suggesting checks of the paths and configurations in Open Interpreter's Python environments, while @nxonxi reported various unsuccessful attempts to modify the prompt or use profiles effectively.
- Exploring Code-Trained Models: In the latter part of the conversation, the discussion shifted toward experiences with code-focused language models like `deepseek-coder-6.7B-instruct-GGUF`, as well as the integration of prompts and system messages within these models. @1sbefore shared a link to potentially useful GGUFs hosted on Hugging Face and provided insights on the models' performance.
- Belief in the Power of Curiosity: When faced with perplexities about working with Open Interpreter and language models, @1sbefore encouraged @nxonxi to embrace their curiosity while hunting for the right sources and setups. The exchange was full of trial-and-error and shared learning moments, including attempts to clone git repositories and adjust Python environments.
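For readers following along, a minimal sketch of overriding the system message from Python; this assumes Open Interpreter's `interpreter.system_message` attribute alongside the `-s` / `--system_message` CLI flag the thread was debugging, and the exact names may differ between versions:

```python
# Sketch only: attribute and method names follow the Open Interpreter docs
# discussed above and may differ between versions.
from interpreter import interpreter

interpreter.system_message += "\nAlways ask before running shell commands."
interpreter.chat("List the five largest files in my home directory.")
```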
Links mentioned:
- owao/LHK_DPO_v1_GGUF · Hugging Face: no description found
- All Settings - Open Interpreter: no description found
LlamaIndex ▷ #announcements (1 message):
- User Survey Announced: @seldo_v encourages everyone to take a 3-minute user survey to help LlamaIndex understand its user base better. The survey is hosted on SurveyMonkey and aims to improve documentation, demos, and tutorials.
Links mentioned:
LlamaIndex user survey: Take this survey powered by surveymonkey.com. Create your own surveys for free.
LlamaIndex ▷ #blog (5 messages):
- Launch of LlamaParse JSON Mode: The LlamaIndex team announced the new LlamaParse JSON Mode, which simplifies RAG pipeline creation by parsing the text and images of a PDF into a structured dictionary. The feature is especially powerful when combined with multimodal models like Claude 3 Opus. View tweet.
- LlamaParse Testing by AIMakerspace: @AIMakerspace tested LlamaParse with notable results and published an in-depth analysis detailing its functionality and performance. In-depth look at LlamaParse.
- Integration with VideoDB for RAG Over Video Streams: LlamaIndex introduced an integration with @videodb_io, enabling upload, search, and streaming of videos directly within LlamaIndex, indexed by spoken words or visual scenes. Announcement tweet.
- Comprehensive Video Guide for Claude 3: A new video guide offers a comprehensive tutorial on using Claude 3 for various applications, including vanilla RAG, routing, and sub-question query planning with LlamaIndex's tools. Claude 3 Cookbook.
- LlamaIndex User Survey for Enhanced Resources: LlamaIndex is conducting a 3-minute user survey to better understand its users' expertise and needs, in order to tailor documentation, demos, and tutorials more effectively. Take the user survey.
Links mentioned:
LlamaIndex user survey: Take this survey powered by surveymonkey.com. Create your own surveys for free.
LlamaIndex ▷ #general (298 messages🔥🔥):
- A* Algorithm Chat: @nouiri asked whether the A* algorithm could be applied to similarity search. @whitefang_jr confirmed it is possible by subclassing the embedding class and changing the similarity-search method, noting that LlamaIndex uses cosine similarity by default (see the sketch after this list).
- Configuring Chatbots for Contextual Outputs: @techexplorer0 asked about configuring a RAG chatbot for brief, context-specific responses. @kapa.ai suggested using a `ResponseSynthesizer` or post-processing responses to achieve concise outputs, linking to the LlamaIndex documentation for an example setup.
- Query Engine Customization on LlamaIndex: @cheesyfishes replied to various implementation queries, explaining that chat engines typically accept strings and that chunking in LlamaIndex isn't randomized, and suggesting exploration of the source code for deeper issues such as using Gemini as a chat engine or embeddings within Azure OpenAI.
- Ingesting Documents from Slack: @habbyman sought advice on best practices for Slack document ingestion, looking to keep per-message metadata without losing conversational context.
- LlamaIndex Vector Store Recommendations: New users like @generalenthu asked for vector store recommendations compatible with LlamaIndex, receiving suggestions from @cheesyfishes and @jessjess84 to try Qdrant, ChromaDB, or Postgres/pgvector for their extensive documentation and robust user bases.
- Discrepancies Between LLM Direct Queries and VectorStoreIndex: @jessjess84 saw subpar responses from LlamaIndex's `VectorStoreIndex` compared to direct LLM queries, which @teemu2454 attributed to the prompt templates used by `query_engine`, advising adjustments for better results. In response to a separate question from the same user, @teemu2454 clarified that the `VectorStoreIndex` is separate from the LLM used to process text; embeddings are used only to fetch text for context during LLM queries.
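Since the thread leans on it, here is what the default scoring step amounts to: a minimal numpy sketch of cosine similarity over toy embeddings. This is an illustration of the concept, not LlamaIndex's actual code; subclassing the embedding class, as @whitefang_jr suggested, is how you would swap in an alternative scoring method.

```python
import numpy as np

def cosine_similarity(query: np.ndarray, doc: np.ndarray) -> float:
    # Default relevance score: cosine of the angle between embeddings.
    return float(query @ doc / (np.linalg.norm(query) * np.linalg.norm(doc)))

# Toy embeddings; real ones come from the configured embedding model.
query = np.array([0.1, 0.7, 0.2])
nodes = {"node_a": np.array([0.1, 0.6, 0.3]),
         "node_b": np.array([0.9, 0.1, 0.0])}
ranked = sorted(nodes, key=lambda k: cosine_similarity(query, nodes[k]),
                reverse=True)
print(ranked)  # ['node_a', 'node_b']
```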
Links mentioned:
- no title found: no description found
- Building a Slack bot that learns with LlamaIndex, Qdrant and Render — LlamaIndex, Data Framework for LLM Applications: LlamaIndex is a simple, flexible data framework for connecting custom data sources to large language models (LLMs).
- Chat Stores - LlamaIndex 🦙 v0.10.17: no description found
- OpenAI - LlamaIndex 🦙 v0.10.17: no description found
- llama_index/llama-index-integrations/vector_stores/llama-index-vector-stores-opensearch/llama_index/vector_stores/opensearch/base.py at 0ae69d46e3735a740214c22a5f72e05d46d92635 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
- llama_index/llama-index-legacy/llama_index/legacy/llms/openai_like.py at f916839e81ff8bd3006fe3bf4df3f59ba7f37da3 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
- llama_index/llama-index-core/llama_index/core/base/embeddings/base.py at df7890c56bb69b496b985df9ad28121c7f620c45 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
- GitHub - mominabbass/LinC: Code for "Enhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration": Code for "Enhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration" - mominabbass/LinC
- llama_index/llama-index-integrations/llms/llama-index-llms-vllm/llama_index/llms/vllm/base.py at f916839e81ff8bd3006fe3bf4df3f59ba7f37da3 · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
- [Bug]: Issue with EmptyIndex and streaming. · Issue #11680 · run-llama/llama_index: Bug Description I'm trying to create a simple Intent Detection agent; the basic expected functionality is to select between two query engines with RouterQueryEngine, one q_engine with an empty index, t…
- Available LLM integrations - LlamaIndex 🦙 v0.10.17: no description found
- Implement EvalQueryEngineTool by d-mariano · Pull Request #11679 · run-llama/llama_index: Description Notice I would like input on this PR from the llama-index team. If the team agrees with the need and approach, I will provide unit tests, documentation updates, and Google Colab noteboo…
- Chroma Multi-Modal Demo with LlamaIndex - LlamaIndex 🦙 v0.10.17: no description found
- Multimodal Retrieval Augmented Generation(RAG) | Weaviate - Vector Database: A picture is worth a thousand words, so why just stop at retrieving textual context!? Learn how to perform multimodal RAG!
- Custom Response - HTML, Stream, File, others - FastAPI: FastAPI framework, high performance, easy to learn, fast to code, ready for production
- llama_index/llama-index-integrations/llms/llama-index-llms-vertex/llama_index/llms/vertex/utils.py at main · run-llama/llama_index: LlamaIndex is a data framework for your LLM applications - run-llama/llama_index
LlamaIndex ▷ #ai-discussion (1 message):
- New Approach to Enhancing In-context Learning: @momin_abbas shared their latest work on improving in-context learning through few-shot linear probe calibration, asking for support by starring the GitHub repository.
Links mentioned:
GitHub - mominabbass/LinC: Code for "Enhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration": Code for "Enhancing In-context Learning with Language Models via Few-Shot Linear Probe Calibration" - mominabbass/LinC
Latent Space ▷ #ai-general-chat (14 messages🔥):
- Tea Time Tales from Twitter: @420gunna amusingly described bringing news from Twitter, like a messenger arriving with updates.
- Midjourney and Stability AI Clash: @420gunna highlighted a controversy involving Stability AI allegedly scraping data from Midjourney, leading to a ban of Stability AI employees from Midjourney, as reported by @nickfloats in this Twitter post.
- MI5-Type Confusion Cleared Up: @nav10 humorously clarified initial misconceptions that the data-scrape incident resembled a spy-movie scenario, when it was actually a Discord scrape.
- Newsletter Expander or Troublemaker?: @guardiang playfully suggested @272654283919458306 might be expanding their newsletter's scope in light of the recent data-scrape incident reported on Twitter.
- Laughing Off the Discord Drama: @swyxio posted a GIF from Tenor.com in response to the unfolding drama involving Stability AI and Midjourney.
Links mentioned:
- Tweet from Nick St. Pierre (@nickfloats): In MJ office hours they just said someone at Stability AI was trying to grab all the prompt and image pairs in the middle of a night on Saturday and brought down their service. MJ is banning all of …
- Tweet from Prof. Anima Anandkumar (@AnimaAnandkumar): For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB memory. This represents more than 82.5% reduction in memory for storing optimi…
- Obi Wan Im Not Brave Enough For Politics GIF - Obi Wan Im Not Brave Enough For Politics Talk - Discover & Share GIFs: Click to view the GIF
Latent Space ▷ #ai-announcements (4 messages):
- New Podcast Episode Alert: @swyxio announced that the latest podcast episode featuring <@776472701052387339> is now live, available via a tweet posted here.
- Podcast Discussion Hits Hacker News: @swyxio shared that the podcast with Soumith is also garnering attention on Hacker News.
- Model Serving Paper Presentation: @swyxio invited <@&1107197669547442196> members to join <@720451321991397446>'s presentation on the Model Serving survey paper in their Discord channel.
Latent Space ▷ #llm-paper-club-west (204 messages🔥🔥):
- Volunteer Effort Acknowledged: The community expressed gratitude to <@720451321991397446> for volunteering to present; @eugeneyan and others cheered on the effort.
- Fancy Footwork with Models: @swizec discussed noticeable performance differences when running Ollama on Intel vs. M2, while @amgadoz reviewed technical aspects such as parallelism and lookahead decoding in large language models.
- Inference Optimization Deep Dive: Speculative decoding's effectiveness drew interest, with @shivdinho, @yikesawjeez, and others discussing how it might vary with hardware (a toy sketch of the accept/reject loop follows this list). There were also recommendations for resources like DRµGS by EGjoni and speculation about using FlashAttention for speed improvements in large-model serving.
- Decentralized and Distributed Deliberations: The channel touched on distributed training (@yikesawjeez mentioned reading the DiLoCo paper), and there was discussion of how GPU configurations can make model outputs differ across instances (@ayenem reflected on an OpenAI incident).
- Model Serving Survey Presentation: @swyxio provided a link to a Google Slides presentation, and the group shared thoughts and reactions to the material, with significant discussion of distributed inference and the trade-offs of different architectures.
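For context on the speculative decoding discussion: a cheap draft model proposes several tokens, and the target model accepts or rejects them so that the final output distribution matches the target's. A toy sketch of that accept/reject loop; the `draft_probs`/`target_probs` functions are made-up stand-ins for real models over a two-token vocabulary:

```python
import random

# Stand-ins for a small draft model and a large target model: each maps a
# context string to a next-token distribution over a toy vocabulary.
def draft_probs(ctx):  return {"a": 0.8, "b": 0.2}
def target_probs(ctx): return {"a": 0.6, "b": 0.4}

def speculative_step(ctx, k=4):
    """Propose up to k draft tokens, accepting/rejecting against the target."""
    out = []
    for _ in range(k):
        so_far = ctx + "".join(out)
        p_d, p_t = draft_probs(so_far), target_probs(so_far)
        tok = random.choices(list(p_d), weights=list(p_d.values()))[0]
        if random.random() < min(1.0, p_t[tok] / p_d[tok]):  # accept test
            out.append(tok)
        else:
            # On rejection, sample from the renormalized residual and stop.
            resid = {t: max(p_t[t] - p_d[t], 0.0) for t in p_t}
            z = sum(resid.values())
            out.append(random.choices(list(resid),
                                      weights=[w / z for w in resid.values()])[0])
            break
    return out

print(speculative_step("The quick "))
```

How much this helps depends on the draft model's acceptance rate and the hardware's batch efficiency, which is exactly the variability the channel was debating.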
Links mentioned:
- Join Slido: Enter #code to vote and ask questions: Participate in a live poll, quiz or Q&A. No login required.
- SpecInfer: no description found
- Monk: Monk is the AI DevOps for the cloud. Let your infrastructure take care of itself.
- Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.: A new tool that blends your everyday work apps into one. It's the all-in-one workspace for you and your team
- Datasets for Large Language Models: A Comprehensive Survey: This paper embarks on an exploration into the Large Language Model (LLM) datasets, which play a crucial role in the remarkable advancements of LLMs. The datasets serve as the foundational infrastructu…
- Quickstart — vLLM: no description found
- no title found: no description found
- How to Generate and Use Synthetic Data for Finetuning: Overcoming the bottleneck of human annotations in instruction-tuning, preference-tuning, and pretraining.
- FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI: How FlashAttention became the new industry standard architecture, how FlashAttention 2 is 2x faster still, life inside the Stanford Hazy Research lab, and hints of the post-Transformers future
- Welcome to SkyPilot! — SkyPilot documentation: no description found
- Load Balancing is Impossible: Tyler McMullen discusses load balancing techniques and algorithms such as Randomized Least-conns, Join-Idle-Queue, and Load Interpretation. Load balancing perfectly may be impossible in the real world…
- Reddit - Dive into anything: no description found
- Model Serving Survey Paper - Paper Club: Model Serving Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems https://arxiv.org/abs/2312.15234v1
- GitHub - TheBlokeAI/dockerLLM: TheBloke's Dockerfiles: TheBloke's Dockerfiles. Contribute to TheBlokeAI/dockerLLM development by creating an account on GitHub.
- DiLoCo: Distributed Low-Communication Training of Language Models: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected acc…
- Petals: Collaborative Inference and Fine-tuning of Large Models: Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B, everyone can download pretrained models of…
- GitHub - EGjoni/DRUGS: Stop messing around with finicky sampling parameters and just use DRµGS!: Stop messing around with finicky sampling parameters and just use DRµGS! - EGjoni/DRUGS
- GitHub - OpenNMT/CTranslate2: Fast inference engine for Transformer models: Fast inference engine for Transformer models. Contribute to OpenNMT/CTranslate2 development by creating an account on GitHub.
- FireAttention — Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs: Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs
- GitHub - PygmalionAI/aphrodite-engine: PygmalionAI's large-scale inference engine: PygmalionAI's large-scale inference engine. Contribute to PygmalionAI/aphrodite-engine development by creating an account on GitHub.
- GitHub - lilacai/lilac: Curate better data for LLMs: Curate better data for LLMs. Contribute to lilacai/lilac development by creating an account on GitHub.
Eleuther ▷ #announcements (1 message):
- New Benchmarks for Korean Language Model Evaluation: @gson_arlo announced two new evaluation datasets, HAE-RAE Bench and KMMLU, designed to test language models' proficiency in Korean. HAE-RAE Bench assesses models' knowledge of Korean culture, while KMMLU focuses on Korea-specific questions and includes a subset that is challenging for current models.
- Call for Contributions to Multilingual Benchmarks: To improve evaluation practices for languages other than English and Chinese, @gson_arlo invites people who speak non-mainstream languages, or belong to non-mainstream cultures within English-speaking countries, to join the <#1208111628051152969> channel. They can contribute by designing benchmarks that assess language-model competencies significant to their cultures.
Eleuther ▷ #general (78 messages🔥🔥):
- Exploring Internal Batching Logic of vLLM: @rwamit asked how to perform batched inference using vLLM. @baber_ clarified that vLLM batches requests internally, so there is no need to implement batching manually (see the sketch after this list).
- Commentary Sought on AI Regulation: @wonkothesensible shared a link to a request for public commentary on the regulation of open-source AI and freely available models. The document calls for a report on the risks and benefits of "dual-use foundation models."
- The BitNet vs. Full-Precision NN Efficiency Debate: Following the release of BitNet b1.58, @kyo_takano offered an introductory notebook on ternary neural networks, arguing that they are inefficient to train compared to full-precision NNs despite their faster inference.
- MidJourney Discord Drama: @teknium reported allegations that Stability AI disrupted MidJourney's service, leading to a ban on those behind the incident. The circumstances remain murky, with @stellaathena and others questioning how scraping Discord could impact MidJourney's servers.
- Potential Collaboration Opportunity with an Expert: @andrew_f0874 is seeking part-time, voluntary research collaborations in AI/ML. With a PhD from Cornell and prior experience as a research scientist at Google focusing on privacy-preserving technology, he could be especially useful for projects at the intersection of AI/ML and his areas of expertise.
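A minimal sketch of what "internal batching" means in practice, following vLLM's offline Quickstart API (the model id is illustrative): you hand over the whole prompt list and the engine schedules and batches the requests itself.

```python
from vllm import LLM, SamplingParams

prompts = [
    "Explain KV caching in one sentence.",
    "Translate 'hello world' into French.",
]
params = SamplingParams(temperature=0.8, max_tokens=64)

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # illustrative model id
outputs = llm.generate(prompts, params)  # no manual batching loop needed
for out in outputs:
    print(out.outputs[0].text)
```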
Links mentioned:
- Quickstart — vLLM: no description found
- Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models: This report introduces \texttt{EEVE-Korean-v1.0}, a Korean adaptation of large language models that exhibit remarkable capabilities across English and Korean text understanding. Building on recent hig…
- Regulations.gov: no description found
- Tweet from Nick St. Pierre (@nickfloats): In MJ office hours they just said someone at Stability AI was trying to grab all the prompt and image pairs in the middle of a night on Saturday and brought down their service. MJ is banning all of …
- Introduction to Ternary Neural Networks: Introduction to Ternary Neural Networks. GitHub Gist: instantly share code, notes, and snippets.
- Megatron-DeepSpeed/tasks/eval_harness/evaluate.py at main · microsoft/Megatron-DeepSpeed: Ongoing research training transformer language models at scale, including: BERT & GPT-2 - microsoft/Megatron-DeepSpeed
Eleuther ▷ #research (77 messages🔥🔥):
- Pythia Model Suite by EleutherAI: @alxsp. provided a link to the Pythia model suite, noting that all models were trained on the same dataset.
- Adaptive Recurrent Vision Paper Discussion: @.the_alt_man shared a link to a NeurIPS 2023 paper on zero-shot computation scaling in vision models, describing it as "a universal transformer, but with CNNs."
- Batch Normalization vs. Token Normalization Debate: A debate emerged, raised by @thatspysaspy, about whether to normalize loss per token or per batch when training models; @ai_waifu suggested normalizing by the number of target tokens, leading to a detailed discussion on the topic (a short sketch follows this list).
- Innovative Memory Reduction Strategy: GaLore: @fredholm sparked discussion of a new memory-efficient training strategy, Gradient Low-Rank Projection (GaLore), with @xylthixlm, @random_string_of_character, and @ai_waifu examining its claims of memory savings and improved results over full-rank updates.
- Optimizer Hook Insights with PyTorch: Conversations around implementing optimizer steps with gradient hooks (@xylthixlm, @_inox, and others) highlighted PyTorch's ability to index a dictionary with a parameter and the potential impact on models with tied parameters.
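To pin down the normalization question: with `reduction="sum"` you can divide the summed cross-entropy by the number of non-padding target tokens, so every token contributes equally no matter how unevenly tokens are spread across the sequences in a batch. A minimal PyTorch sketch with toy shapes:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 5, 100)                  # (batch, seq, vocab)
labels = torch.tensor([[1, 2, 3, -100, -100],    # -100 marks padding
                       [4, 5, 6, 7, 8]])

loss_sum = F.cross_entropy(
    logits.reshape(-1, 100), labels.reshape(-1),
    ignore_index=-100, reduction="sum",
)
# Per-token normalization, as suggested in the thread: divide by the
# count of real target tokens rather than by batch or sequence size.
loss = loss_sum / (labels != -100).sum()
print(loss)
```

Normalizing per sequence instead would up-weight tokens in short sequences, which is the asymmetry the thread was debating.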
Links mentioned:
- Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels: no description found
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection: Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-ran…
- Stop Regressing: Training Value Functions via Classification for Scalable Deep RL: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match boo…
- Pythia Scaling Suite - a EleutherAI Collection: no description found
- Locally Typical Sampling: Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perp…
- pytorch/torch/_tensor.py at main · pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration - pytorch/pytorch
- GaLore/torchrun_main.py at master · jiaweizzhao/GaLore: Contribute to jiaweizzhao/GaLore development by creating an account on GitHub.
- GitHub - jiaweizzhao/GaLore: Contribute to jiaweizzhao/GaLore development by creating an account on GitHub.
- WIP: galore optimizer by maximegmd · Pull Request #1370 · OpenAccess-AI-Collective/axolotl: Adds support for Galore optimizers Still a WIP, untested.
Eleuther ▷ #lm-thunderdome (15 messages🔥):
- Harnessing Desired Output Formats: @pminervini asked how to constrain model outputs to a specified format in the lm-evaluation-harness. @baber_ advised modifying the `generate_until` method and using `_loglikelihood_tokens`, making the generate call conditional on the computed log-probs.
- MCQA Eval/Artifacts Paper Shared by Creator: @nish5989 shared their recent paper on multiple-choice question evaluation and dataset artifacts. A discussion followed on the impact of following or ignoring irrelevant instructions in model prompts, with @hailey_schoelkopf referencing related work on how models react to prompts.
- Consistency in Fine-Tuning: @karatsubabutslower questioned the lack of a standardized approach for fine-tuning models on benchmarks like GLUE. The lm-evaluation-harness provides a standardized evaluation method, but no comparable framework seems to exist for the fine-tuning process.
- Evaluating with Generation vs. Loglikelihood: @hailey_schoelkopf expressed interest in how often models fail to produce answers in the correct format when evaluated with generation rather than loglikelihood scoring. @nish5989 responded that, as noted in their paper's appendix, validity typically wasn't an issue, especially in stricter settings, and that future experiments may use likelihood methods.
- Discussion on Multilingual Evaluation Criteria: @seanbethard questioned the preference for language-specific over crosslingual evaluation criteria in language understanding and reasoning, asking whether language-specific criteria are effective or necessary at all if they preclude non-native speakers from creating them.
Links mentioned:
- lm-evaluation-harness/lm_eval/models/huggingface.py at 9e6e240229429d2214bc281bed7a4e288f5169a1 · EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of language models. - EleutherAI/lm-evaluation-harness
- Multiple Choice Question Standard Deviation · Issue #1524 · EleutherAI/lm-evaluation-harness: I saw that the multiple choice type evaluation would compute the metrics along with standard deviation. From my understanding, multiple choice answer is chosen from the choice with highest probabil…
- Do Prompt-Based Models Really Understand the Meaning of their Prompts?: Recently, a boom of papers has shown extraordinary progress in zero-shot and few-shot learning with various prompt-based models. It is commonly argued that prompts help models to learn faster in the s…
- Are Language Models Worse than Humans at Following Prompts? It's Complicated: Prompts have been the center of progress in advancing language models' zero-shot and few-shot performance. However, recent work finds that models can perform surprisingly well when given intention…
Eleuther ▷ #gpt-neox-dev (34 messages🔥):
- Optimizer Memory Peaks Under Scrutiny: @tastybucketofrice raised an issue regarding memory peaks during the optimizer step, pointing to GitHub issue #1160 and PyTorch memory profiling for context. @gaindrew requested the specific configuration details needed to faithfully reproduce the problem for further analysis.
- A Night of Dependency Conflicts: @biiter reported multiple challenges, including incompatible PyTorch version dependencies and a machine crash caused by parallel compilation. They eventually got the setup running on Ubuntu 22.04 with CUDA 12.3 after a series of workarounds, including installing apex from the NVIDIA git repository and capping parallel compilation (flash-attention info).
- Docker Image, a Solution to Dependency Hell?: @tastybucketofrice suggested using their newly rebased Docker container built on the PyTorch NGC image to alleviate recent dependency issues such as apex (GitHub commit #1170). @tfidia recommended docker + enroot as a lightweight installation alternative, noting its support by the Slurm plugin pyxis for containerized job launches.
- Poetry for Dependency Management: @catboy_slim_ proposed moving dependencies into poetry for more deterministic package management, discussing both the challenges and the goal of a stable source installation outside of Docker. They stressed the need for a source build that matches the reliability of the Docker environment.
- Challenges with Fused Backward/Optimizer Implementations: @gaindrew described progress on a fused backward/optimizer implementation that significantly reduces peak memory usage at the cost of breaking certain DeepSpeed and logging functionality; further work targets those issues and is aimed specifically at Adam optimizers (a sketch of the general technique follows this list).
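The fused backward/optimizer idea (also behind the memory tutorial linked from issue #1160) can be sketched in a few lines: run each parameter's optimizer update inside its gradient-accumulation hook and free the gradient immediately, so full-model gradients never sit in memory all at once. A minimal PyTorch (>= 2.1) sketch of the technique, not the gpt-neox implementation:

```python
import torch

model = torch.nn.Linear(1024, 1024)  # toy stand-in for a large model
# One tiny optimizer per parameter so each can step independently; note the
# dict keyed by the parameter tensor itself, as discussed in #research.
opts = {p: torch.optim.Adam([p], lr=1e-4) for p in model.parameters()}

def step_and_free(param: torch.Tensor) -> None:
    opts[param].step()
    param.grad = None  # release this gradient's memory right away

for p in model.parameters():
    p.register_post_accumulate_grad_hook(step_and_free)

loss = model(torch.randn(8, 1024)).square().mean()
loss.backward()  # parameter updates happen inside the hooks
```

The catch, as @gaindrew found, is that frameworks which expect to see gradients after `backward()` (DeepSpeed, gradient logging) break once the grads are freed eagerly.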
Links mentioned:
- GitHub: Let's build from here: GitHub is where over 100 million developers shape the future of software, together. Contribute to the open source community, manage your Git repositories, review code like a pro, track bugs and fea…
- Single node Pythia 14M training on ngc pytorch 24.02 container (#1170) · EleutherAI/gpt-neox@119950c: Pythia 14M training on ngc pytorch 24.02 container
- GitHub - Dao-AILab/flash-attention: Fast and memory-efficient exact attention: Fast and memory-efficient exact attention. Contribute to Dao-AILab/flash-attention development by creating an account on GitHub.
- GitHub - EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. - EleutherAI/gpt-neox
- PyTorch Lightning Fused optimizer step · Issue #1160 · EleutherAI/gpt-neox: Add PyTorch Lightning memory optimizations. https://lightning.ai/pages/community/tutorial/faster-pytorch-training-by-reducing-peak-memory/
HuggingFace ▷ #general (106 messages🔥🔥):
- Entrepreneurial Chat Space Inquiry: @snoozyy asked whether the Discord has a discussion space for small-company entrepreneurs using open-source models; no specific channel was suggested in the provided messages.
- Learning TTS and Model Training: @ericpeter24 explained he is new to text-to-speech models and wants to train one with coqui on a dataset from HuggingFace, but doesn't know where to start.
- Showcasing Personal Projects: @anuragshas asked about creating featured models or spaces on their personal Hugging Face profile; the question wasn't directly addressed in the subsequent messages.
- Exploring Multimodal Models: @welltoobado mentioned multi_token, a project for embedding various modalities into large language models, and discussed the resource requirements for running such models with @kuki1941.
- Inference API in Android Studio: @hari4626 sought guidance on using Inference APIs or endpoints in Android Studio, with @amirgame197 suggesting making a web request with the correct data from any programming language Android Studio supports.
Links mentioned:
- Inflection-2.5: meet the world's best personal AI: We are an AI studio creating a personal AI for everyone. Our first AI is called Pi, for personal intelligence, a supportive and empathetic conversational AI.
- no title found: no description found
- @andrewyng on Hugging Face: "DeepLearning.AI just announced a new short course: Open Source Models with…": no description found
- Haiper | Generative AI For Video Content Creation: Video creation AI products crafted to empower individuals in creatively expressing themselves.
- Repeat After Me: Transformers are Better than State Space Models at Copying: Transformers are the dominant architecture for sequence modeling, but there is growing interest in models that use a fixed-size latent state that does not depend on the sequence length, which we refer…
- Deploying đ€ Hub models in Vertex AI: no description found
- Quickstart â vLLM: no description found
- How to build a multi-label & multi-class dataset correctly?: I am unsure how to proceed creating a Dataset with multiple labels and classes where the classes are not the same for the different labels. A multi-label example is shared here, but the classes are a…
- GitHub - sshh12/multi_token: Embed arbitrary modalities (images, audio, documents, etc) into large language models.: Embed arbitrary modalities (images, audio, documents, etc) into large language models. - sshh12/multi_token
HuggingFace ▷ #today-im-learning (10 messages🔥):
- Background Project in Motion: @singe.r is working on converting img2img to create backgrounds for products and asked whether anyone has undertaken a similar project before.
- FP8 Training Achievement Unlocked: @neuralink shared that they achieved 55% end-to-end FP8 training from scratch, including developing the kernels.
- Invitation to a Rust Language Adventure: @manel_aloui announced the start of their journey learning the Rust programming language and invited others to join.
- Rust Enthusiasts Converge: Following @manel_aloui's call, @cursorop chimed in about their own experience learning Rust, specifically the candle library for machine learning, sparking a connection between the two Rust learners.
- Soliciting Peer Learning for Stanford ML Course: @singhaditya4333, also known as Aditya, is seeking companions to complete the Stanford machine learning course together.
HuggingFace ▷ #cool-finds (11 messages🔥):
- AI Enters Mainstream with a Bang: @vardhan0280 shared an Investopedia article noting that 2022 saw AI go mainstream, largely due to the popularity of OpenAI's DALL-E and ChatGPT.
- Community Collaboration for Open Sora: @miko_al encouraged spreading the word about the Open-Sora-Plan project, which aims to reproduce OpenAI's text-to-video model with limited resources.
- Precision Health and AI in Space: @rtscott2001 shared a link to a Nature Machine Intelligence article titled "Biomonitoring and precision health in deep space supported by artificial intelligence," available here.
- Karpathy Shares Insights on Training LLMs: @.lawlord posted an engaging Twitter thread by Andrej Karpathy on the complexities of training large language models (LLMs), a taxing process requiring dedicated teams focused on cluster maintenance and fault tolerance. The thread concludes with a link to the famous OPT-175B logbook for further insights.
- Open Source Models with Hugging Face Course Launch: @andysingal highlighted a new DeepLearning.AI course on open-source models using Hugging Face's tools; the course can be viewed here.
Links mentioned:
- DLAI - Open Source Models with Hugging Face: Introduction · Selecting models · Natural Language Processing (NLP) · Translation and Summarization · Sentence Embeddings · Zero-Shot Audio Classification · Automatic Speech Recognition · Text to Spee…
- Let's build GPT: from scratch, in code, spelled out.: We build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections t…
- Tweet from Andrej Karpathy (@karpathy): Nice read on the rarely-discussed-in-the-open difficulties of training LLMs. Mature companies have dedicated teams maintaining the clusters. At scale, clusters leave the realm of engineering and becom…
- Artificial Intelligence (AI): What It Is and How It Is Used: Artificial intelligence or AI refers to the simulation of human intelligence in machines that are programmed to think and act like humans.
- Exploring TRLx: Hands-on Guide for Implementing Text Summarization through RLHF: Learn text summarization with TRLx. Fine-tune models, create reward feedback and apply PPO for effective learning.
- GitHub - PKU-YuanGroup/Open-Sora-Plan: This project aim to reproducing Sora (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project.: This project aim to reproducing Sora (Open AI T2V model), but we only have limited resource. We deeply wish the all open source community can contribute to this project. - PKU-YuanGroup/Open-Sora-Plan
- leom0311 - Overview: leom0311 has 9 repositories available. Follow their code on GitHub.
- Biomonitoring and precision health in deep space supported by artificial intelligence | Nature Machine Intelligence: no description found
HuggingFace ▷ #i-made-this (18 messages🔥):
- New Model Yi-9B Debuts: @tonic_1 announced the release of Yi-9B, a new addition to the model collection, inviting users to check it out on Hugging Face with a demo available here. They hinted at future HuggingFace plans including leaderboards and gaming competitions.
- Chatbot Arena Leaderboard by rwitz_: @rwitz_ shared his ChatBot-Arena-Leaderboard for Mistral fine-tunes, styled like the lmsys leaderboard, and invited model contributions on its hosted space, found here.
- UDOP Model Demonstration: @shinjeki provided a simple demo space for playing with Microsoft's latest document AI model, UDOP, which can be explored here.
- ComfyUI Workflow Unveiled: @alx.ai announced the release and open-sourcing of a new workflow and nodes for ComfyUI for creating parallax motion, details of which can be followed through a tweet posted here.
- AI-Enhanced Educational Dataset: @locutusque introduced UltraTextbooks v2, a large NLP dataset designed for training language models in education, including textbooks on various academic subjects, now available on Hugging Face.
Links mentioned:
- Tweet from undefined: no description found
- Andyrasika/Gemma-ChatML · Hugging Face: no description found
- Tweet from TheHeroShep (@TheHeroShep): Excited to share the first (of many) @getsaltai workflow & node release • @comfyUI workflow to generate controlled 3D parallax motion with a single prompt or input image • Possibly useful for gene…
- ETHDenver Recap: Emerging Trends in web3 and AI: Where we're at, where we're heading, and the return of Kevin.
- Andyrasika/vit-base-patch16-224-in21k-finetuned-lora-food101 · Hugging Face: no description found
- Locutusque/UltraTextbooks-2.0 · Datasets at Hugging Face: no description found
- UDOP DocVQA - a Hugging Face Space by RamAnanth1: no description found
- Testing - a Hugging Face Space by rwitz: no description found
- Open Llm Leaderboard Viz - a Hugging Face Space by dimbyTa: no description found
- Langchain Crash Course (Gradio) - a Hugging Face Space by chongdashu: no description found
- GitHub - chongdashu/langchain-crash-course at lesson-1: Contribute to chongdashu/langchain-crash-course development by creating an account on GitHub.
- Yi 9B - a Hugging Face Space by Tonic: no description found
- Building a Datacomp CLIP index with Fondant - Fondant.): no description found
HuggingFace ▷ #reading-group (9 messages🔥):
- Weekend Session Consideration: @shafi8433 expressed a preference for weekend sessions because the current timing coincides with their working hours.
- Timezone Coordination: In response to @lunarflu's inquiry, @shafi8433 mentioned being in the IST timezone.
- Deciphering Big Data Technology: @ibrahim_72765_43784 shared a link to a comprehensive exploration of the big-data technology ecosystem posted on Kaggle.
- End-to-End Chatbot Using Llama2 Inquiry: @neerajjulka1986 sought suggestions for implementing an end-to-end chatbot project with an open-source model, in order to learn fine-tuning, deployment, and monitoring.
- Advice on Fine-Tuning and Deployment: @chad_in_the_house responded to @neerajjulka1986, recommending Hugging Face's PEFT for fine-tuning and the text-generation-inference GitHub repository for deployment, while advising careful consideration of compute resources before fine-tuning.
Links mentioned:
- Deciphering the Big Data Technology Ecosystem: A Comprehensive Exploration | Kaggle: Deciphering the Big Data Technology Ecosystem: A Comprehensive Exploration.
- GitHub - huggingface/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. - huggingface/peft
- GitHub - huggingface/text-generation-inference: Large Language Model Text Generation Inference: Large Language Model Text Generation Inference. Contribute to huggingface/text-generation-inference development by creating an account on GitHub.
HuggingFace ▷ #diffusion-discussions (2 messages):
- Slow Down, Power User!: HuggingMod gently reminded @user to temper their enthusiasm and reduce the frequency of their messages, with a friendly nudge to slow down a bit 🤗.
- Quest for SDXL-Lightning LoRA Knowledge: @happy.j asked for help integrating the SDXL-Lightning LoRA with a standard SDXL model, linking to a discussion post that didn't fully resolve their issue (a minimal loading sketch follows this list).
- Tips from ByteDance for SDXL Magic: The ByteDance organization offered advice on merging the SDXL-Lightning LoRA with a trained SDXL model: start from a traditional SDXL model and add the LoRA for acceleration, with advanced options such as merging first and then training against adversarial objectives for those seeking a challenge.
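A minimal diffusers sketch of the "base model + acceleration LoRA" route described above; the LoRA weight filename is an assumption to verify against the ByteDance/SDXL-Lightning repo, and SDXL-Lightning additionally expects few-step, low-guidance sampling settings:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights(
    "ByteDance/SDXL-Lightning",
    weight_name="sdxl_lightning_4step_lora.safetensors",  # assumed filename
)
pipe.fuse_lora()  # bake the acceleration LoRA into the base weights

# SDXL-Lightning is designed for very few steps and no CFG.
image = pipe("a lighthouse at dawn",
             num_inference_steps=4, guidance_scale=0).images[0]
image.save("out.png")
```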
Links mentioned:
ByteDance/SDXL-Lightning · finetune: no description found
HuggingFace ▷ #computer-vision (7 messages):
- Labeling and Splitting Data Tip: @huzuni suggests labeling and splitting data if one doesn't mind making it public.
- User-Friendly Interface for Segmentation: @huzuni finds the current interface friendlier than most SAM plugins, especially for segmentation and bounding-box labeling.
- Slow Down to HuggingMod's Pace: @HuggingMod reminded <@715715500470042706> to slow down their posting frequency.
- Questioning the Impact of Normalization: @huzuni wonders about the effect of normalization on their data, having observed no significant change across methods such as ImageNet normalization, channel-wise normalization, and min-max normalization.
- Looking for Ultralytics Alternatives: @prod.dopamine is seeking a good alternative to ultralytics, expressing dissatisfaction with the AGPL license.
HuggingFace ▷ #NLP (23 messages🔥):
- MMLU Dataset Inquiry: @privetin expressed interest in the structure and content of the MMLU datasets, but no further discussion followed.
- Slow Down, You're Posting Too Fast!: @HuggingMod reminded @715715500470042706 to moderate their posting pace to avoid spamming the channel.
- Tokenization Troubleshoot: @mbotta ran into issues tokenizing prompts for the OpenHermes-2.5 model, which lacks a `tokenizer.json` file. Through the conversation with @cursorop, it was clarified that for a fine-tuned model like OpenHermes one should use the tokenizer of its base model, in this case Mistral (see the sketch after this list).
- Seeking the Right Model for Job Title Normalization: @deb0rian asked for advice on a base model to fine-tune for predicting normalized job titles, disciplines, and seniority from user input. @lucnzz suggested an alternate approach using a basic retriever plus generator model, and the exchange closed with a humorous GIF link and a dog-related pun from @cakiki and @lucnzz.
- Model Recommendations for Colab: @iloveh8 asked about the best small/medium open-source language model for Google Colab, with @cursorop suggesting a 2B model or Flan-T5 and @lucnzz recommending any small quantized model.
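The tokenizer advice translates to a one-liner with transformers; a minimal sketch, assuming the Mistral base repo id (pair it with whichever fine-tune is missing its tokenizer files):

```python
from transformers import AutoTokenizer

# Fine-tunes without tokenizer files can usually reuse the base model's
# tokenizer; OpenHermes-2.5 is Mistral-based, per the thread.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
ids = tok("Hello, OpenHermes!", return_tensors="pt")
print(tok.convert_ids_to_tokens(ids["input_ids"][0]))
```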
Links mentioned:
Golden Retriever Dog GIF - Golden Retriever Dog Puppy - Discover & Share GIFs: Click to view the GIF
HuggingFace ▷ #diffusion-discussions (2 messages):
- Slow Your Roll, Poster: @HuggingMod cautioned @715715500470042706 to reduce their posting frequency, emphasizing the importance of pacing in the chat.
- Seeking Guidance on SDXL-Lightning LoRA Merge: @happy.j is looking for help combining the SDXL-Lightning LoRA with a standard SDXL model and shared a discussion link seeking answers on fine-tuning or creating their own version. The link includes suggestions such as training a regular SDXL model and then applying the LoRA for acceleration, or merging the SDXL-Lightning LoRA before further training, the latter involving advanced techniques with MSE loss and adversarial objectives.
Links mentioned:
ByteDance/SDXL-Lightning · finetune: no description found
HuggingFace ▷ #gradio-announcements (1 message):
- Gradio 4.20.0 Unleashed with External Auth Providers: @yuviii_ announced Gradio's newest version 4.20.0, which supports external / arbitrary authentication providers, including HF OAuth and Google OAuth, enhancing app security and user flexibility. Check out the examples on HF Spaces: HF OAuth Example (https://huggingface.co/spaces/Wauplin/gradio-oauth-private-models) and Google OAuth Example (https://huggingface.co/spaces/gradio/oauth-example).
- Clean Up With Ease: The latest Gradio update introduces a `delete_cache` parameter on `gr.Blocks`, allowing automatic cleanup of files on app shutdown.
- Smooth User Logout Experience: Users can now enjoy a smoother sign-off with Gradio's new `/logout` functionality.
- Stylish Downloads with Gradio: The `gr.DownloadButton` component is now available, making downloadable content in apps easier to provide and more visually appealing. For more information, see the documentation for gr.DownloadButton (https://www.gradio.app/docs/downloadbutton#demos). (A short usage sketch follows this list.)
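A short sketch of the two new pieces together, `delete_cache` and `gr.DownloadButton`; treat the exact `delete_cache` semantics (a frequency/age pair in seconds) as an assumption to verify against the 4.20 release notes:

```python
import gradio as gr

def make_report() -> str:
    path = "/tmp/report.txt"
    with open(path, "w") as f:
        f.write("hello from gradio 4.20")
    return path

# delete_cache=(frequency, age) in seconds: assumed meaning, check the docs.
with gr.Blocks(delete_cache=(3600, 3600)) as demo:
    # Clicking the button downloads the file set as its value.
    gr.DownloadButton("Download report", value=make_report())

demo.launch()
```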
Links mentioned:
Gradio DownloadButton Docs: no description found
OpenAccess AI Collective (axolotl) ▷ #general (44 messages🔥):
- Deepspeed Draws Ire: @duh_kola raised an issue with Deepspeed, adding to the frustration of @leoandlibe, who says "Deepspeed sucks ass in axo", especially when handling multiple GPUs like his 4x 4090s, since it can't split the base model across them, only the LoRA adapter.
- Gemma Falters Compared to Mistral: @noobmaster29 checked in after a hiatus to ask how Gemma compares to Mistral 7B. @lee0099 relayed the community view that Gemma underperforms, particularly on multi-turn dialogue, and @le_mess clarified it is only trained on English, dismissing its utility for multilingual tasks.
- GaLore Promises Memory Efficiencies: The GaLore optimizer is highlighted in a tweet shared by @noobmaster29, promising significant reductions in the memory required for LLM training. While @lee0099 criticized the lack of detailed performance data, @nafnlaus00 and others discussed its potential to make LLM training accessible on consumer-grade hardware.
- GaLore Integration in Axolotl WIP: As GaLore garnered attention, @yamashi offered to make a pull request later and urged others to test, while @caseus_ provided updates and bug fixes, sharing related YAML in search of further assistance.
- Technical Clarifications and Calls for Assistance: @tank02. sought guidance on which CUDA-enabled torch version to install, with @nanobitz suggesting the newer version is typically better. @nanobitz also pointed @noobmaster29 to an existing configuration for training Gemma with Axolotl.
Links mentioned:
- Tweet from Prof. Anima Anandkumar (@AnimaAnandkumar): For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB memory. This represents more than 82.5% reduction in memory for storing optimi…
- oaaic: Weights & Biases, developer tools for machine learning
- GitHub - jiaweizzhao/GaLore: Contribute to jiaweizzhao/GaLore development by creating an account on GitHub.
- WIP: galore optimizer by maximegmd · Pull Request #1370 · OpenAccess-AI-Collective/axolotl: Adds support for Galore optimizers Still a WIP, untested.
OpenAccess AI Collective (axolotl) ▷ #axolotl-dev (45 messages🔥):

- **16bit LoRA's Limited Appeal?**: @suikamelon questioned who would use the recently supported 16bit LoRA, referencing a commit in axolotl's GitHub. @caseus_ pointed out that there's a quantized DoRA PR still in process and that a check will be removed once it merges.
- **Memory Quandary in LoftQ**: @suikamelon identified issues with LoftQ's memory usage and incorrect initialization documentation, with discussions found in GitHub Issue #1525 and Pull Request #1532 on Hugging Face's PEFT repository.
- **Excitement and Skepticism Over GaLore**: @suikamelon shared a new training strategy called GaLore, along with the corresponding GitHub code and a tweet from Anima Anandkumar. The discussion weighed GaLore's potential memory savings, with @caseus_ questioning the claimed performance equivalence to full pretraining.
- **Integration Challenges with GaLore**: Some users, including @stoicbatman and @caseus_, discussed trying to integrate GaLore into their projects but ran into issues getting training started, as reflected in a pull request on GaLore.
- **Baited by Methods on Multiple Occasions?**: A recurring sentiment of possibly being misled by different efficiency methods arose with @yamashi, @nruaif, and others, with mentions of ReLoRA, NEFT, and potentially a third unidentified method. The conversation veered into questions about proper dataset sizes and settings for effective finetuning and pretraining.
Links mentioned:
- Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch: Low-rank adaptation (LoRA) is a machine learning technique that modifies a pretrained model (for example, an LLM or vision transformer) to better suit a specific, often smaller, dataset by adjusting o…
- LoftQ does not seem to quantify the base model · Issue #1525 · huggingface/peft: System Info transformers version: 4.37.2 Platform: Ubuntu 18.04.6 LTS GPU: RTX GeForce 3090 x 2 Python version: 3.10.13 Huggingface_hub version: 0.20.3 Safetensors version: 0.4.2 Accelerate version…
- Is it possible to use qlora with relora? · Issue #5 · Guitaricet/relora: no description found
- GitHub - euclaise/SlimTrainer: Full finetuning of large language models without large memory requirements: Full finetuning of large language models without large memory requirements - euclaise/SlimTrainer
- peft/examples/loftq_finetuning at main · huggingface/peft: 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. - huggingface/peft
- be a bit more lenient on transformers version by winglian · Pull Request #5 · jiaweizzhao/GaLore: Hi there! Amazing research on this. We're looking to integrate galore into the axolotl project here OpenAccess-AI-Collective/axolotl#1370 One issue I ran into is the transformers dependency pin is…
- support for DoRA w/ PEFT (#1363) · OpenAccess-AI-Collective/axolotl@0cfdb2c: no description found
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection: Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-ran…
- GitHub - jiaweizzhao/GaLore: Contribute to jiaweizzhao/GaLore development by creating an account on GitHub.
- WIP Fix LoftQ docs and tests by BenjaminBossan · Pull Request #1532 · huggingface/peft: Relates to #1525 Don't merge this, some GPU tests are failing Unfortunately, the docs I wrote about how to use LoftQ were incorrect, based on a misunderstanding I had. In reality, it is quite a bi…
- jiawe - Overview: Victory LOVES Preparation! jiawe has 47 repositories available. Follow their code on GitHub.
OpenAccess AI Collective (axolotl) ▷ #general-help (20 messages🔥):

- **DeepSpeed Zero3 Configuration Check**: @caseus_ asked whether `stage3_gather_16bit_weights_on_model_save` was set to true in the DeepSpeed JSON. @seungduk confirmed its presence in the config, providing a visual from the axolotl GitHub for reference.
- **Python Package Dependency Hell**: @tank02. hit a dependency conflict while installing `axolotl[deepspeed]==0.4.0`, whose dependencies required conflicting versions of `torch`. Solutions offered included manually installing specific dependencies and pinning `torch==2.2.0`, based on advice from @remek1972.
- **Manual Installation Tactic to Resolve Module Version Conflict**: @rtyax and @remek1972 both recommended manually installing conflicting dependencies to overcome version clashes, with @remek1972 citing a successful installation after adjusting the PyTorch and xformers versions.
- **Masking Mechanism Clarification for Training**: @suikamelon sought to understand how masked tokens are treated during training. @nanobitz clarified that axolotl sets the labels of masked tokens to -100 so that they are excluded from the loss calculation (a minimal sketch follows this list).
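A minimal sketch of that -100 convention with PyTorch's cross-entropy loss; the vocabulary size and token ids are illustrative:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 4, 10)               # (batch, seq, vocab) toy logits
labels = torch.tensor([[-100, -100, 7, 2]])  # prompt tokens masked with -100

# Positions whose label equals ignore_index contribute nothing to the loss
# or its gradient; -100 is PyTorch's default ignore_index.
loss = F.cross_entropy(logits.view(-1, 10), labels.view(-1), ignore_index=-100)
print(loss)
```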
Links mentioned:
- Dependency Resolution - pip documentation v24.1.dev0: no description found
- axolotl/deepspeed_configs/zero3_bf16.json at main · OpenAccess-AI-Collective/axolotl: Go ahead and axolotl questions. Contribute to OpenAccess-AI-Collective/axolotl development by creating an account on GitHub.
OpenRouter (Alex Atallah) ▷ #announcements (2 messages):

- **Tweet on Group Chatting with Claude 3**: @alexatallah shared a Twitter post about a positive experience group-chatting with Claude 3 in self-moderated mode. The story is available on OpenRouterAI's Twitter.
- **"Nitro" Models in Testing**: @alexatallah informed users that new "nitro" models have appeared and are safe to use and build with. Users were advised that slight changes might occur until an official announcement is made, as feedback from early testers is incorporated.
OpenRouter (Alex Atallah) ▷ #general (94 messages🔥🔥):

- **Sponsorship Offer for VSCode Extension Builders**: @alexatallah offered free credits to sponsor anyone willing to build a VSCode extension compatible with OpenRouter.
- **Community Discusses VSCode Extensions for LLMs**: Community members shared various VSCode extensions for coding assistance with LLMs like OpenRouter and GPT-4, including alternatives such as Cursor, Continue, and Tabby.
- **Inefficiency with Long Documents on OpenRouter Chat**: @aliarmani ran into issues processing long documents with OpenRouter chat inference and received recommendations for alternatives such as TypingMind and ChatbotUI.
- **Claude 3 Opus Conversations Engaging But Impact Wallets**: Users like @phoshnk and @billbear discussed the engaging nature of conversations with Claude 3 Opus, while others like @xiaoqianwx lamented its cost; @filth2 highlighted Sonnet's cost-effectiveness.
- **Moderation Layers on OpenRouter Explained**: Community members explained the moderation layers applied to models on OpenRouter, with OpenAI and Anthropic models receiving additional moderation compared to self-moderated beta models.
Links mentioned:
- Continue: no description found
- Home | Tabby: Description will go into a meta tag in <head />
- Configuration | Continue: Configure your LLM and model provider
- Perplexity: Sonar 8x7B by perplexity | OpenRouter: Sonar is Perplexity's latest model family. It surpasses their earlier models in cost-efficiency, speed, and performance. The version of this model with Internet access is [Sonar 8x7B Online](/mo…
- GitHub - continuedev/continue: ⏩ The easiest way to code with any LLM - Continue is an open-source autopilot for VS Code and JetBrains - continuedev/continue
LangChain AI ▷ #general (55 messages🔥🔥):

- **Handling CSV Loader Timeout Errors**: @kinghaze443 is seeking help with the error "The write operation timed out", which occurs while loading a CSV using `UnstructuredCSVLoader` from LangChain. They provided a code snippet and their current LangChain and OpenAI versions; no solution was proposed in the provided messages.
- **Concerns Over Phishing Attempts on Discord**: @archiacme reported an increase in server members sharing suspicious steamcommunity links, suggesting these could be phishing attempts, and asked about removing them. There was no follow-up or resolution in the message history.
- **Clarity on Query Scaling and Large Dataset Handling**: @dbounds asked about the right approach to retrieval-augmented generation chains over large datasets, concerned by examples that serialize an entire database into a string for context. The impracticality of that method for large datasets was highlighted, but no specific solution was given.
- **Lack of Documentation and Assistance on Azure AI Search**: @juroy is looking for information on setting up chains such as `RetrievalQA` using Azure AI Search with LangChain and cannot find it in the documentation. Several responses acknowledged the lack of help and the difficulty of finding solutions, emphasizing the novelty of the technology and community dynamics, yet no direct answer to @juroy's problem was provided.
- **LangChain Streaming and Prompt Viewing**: @cybersmiths mentioned implementing streaming in Python, while @yd4224 asked how to view the complete prompt text, including `chat_history` and `agent_scratchpad`, hoping to see the actual string sent to the LLM. A callback solution from @chester3637, with code snippets and a mention of LangSmith, might serve as a starting point for both inquiries (a sketch of such a callback follows this list).
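As a starting point for the prompt-visibility question, a minimal sketch of a LangChain callback that prints the final prompt strings; this mirrors the general callback pattern, not @chester3637's actual snippet:

```python
from langchain_core.callbacks import BaseCallbackHandler

class PromptLogger(BaseCallbackHandler):
    """Print the fully rendered prompts, chat_history and agent_scratchpad included."""

    def on_llm_start(self, serialized, prompts, **kwargs):
        # `prompts` holds the final strings handed to the LLM after all
        # template interpolation has happened.
        for p in prompts:
            print("=== prompt sent to LLM ===")
            print(p)

# Hypothetical usage: chain.invoke(inputs, config={"callbacks": [PromptLogger()]})
```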
Links mentioned:
- LangChain Expression Language (LCEL) | 🦜️🔗 Langchain: LangChain Expression Language, or LCEL, is a declarative way to easily compose chains together.
- Google Colaboratory: no description found
- LangSmith: no description found
- Any updates on Assistant API Streaming?: Building a web app using assistants api. Lack of streaming is seriously hurting the UI and making me consider just going another route until streaming is available. Has anyone heard anything about wh…
- LangSmith: Get your LLM app from prototype to production.
- Retrieval augmented generation (RAG) | 🦜️🔗 Langchain: Let's now look at adding in a retrieval step to a prompt and an LLM, which adds up to a "retrieval-augmented generation" chain:
- Azure AI Search | 🦜️🔗 Langchain: [Azure AI
- langchain/libs/community/langchain_community/document_loaders/parsers/pdf.py at v0.1.11 · langchain-ai/langchain: 🦜🔗 Build context-aware reasoning applications. Contribute to langchain-ai/langchain development by creating an account on GitHub.
- Extract Text from a PDF - pypdf 4.0.1 documentation: no description found
- langchain_agent/assistant at master · couthyapper7/langchain_agent: a csv reader made in langchain with a fine tuned gpt - couthyapper7/langchain_agent
LangChain AI ▷ #langchain-templates (9 messages🔥):

- **User Profile Construction via Pydantic and LangChain**: @justanothergraphguy is building a chat chain that constructs user profiles, using Pydantic for structured output and LangChain with Redis for chat history. They shared the `UserProfile` and `Result` Pydantic models used to structure the user data (a sketch of the pattern follows this list).
- **System Prompt for Interactive User Profile Creation**: A detailed system prompt guides the AI on how to interact with users to build their profiles by extracting information, asking follow-up questions, and confirming the final details.
- **Integration Woes in Chain Construction**: @justanothergraphguy described an issue where the `HumanMessage` is incorrectly included in the `AIMessage` content after the first interaction, despite the intention for memory propagation.
- **An Example Shows Unexpected Results**: In a shared example, Redis stores the chat history, but when a new message is introduced, the `AIMessage` includes the prior `HumanMessage` content, indicating a possible issue with the chat-history handling.
- **Seeking the Community's Insights**: @justanothergraphguy is looking for community input on the erroneous inclusion of `HumanMessage` in subsequent `AIMessage` outputs, and provided the initial example code snippet to illustrate the problem.
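For context, a minimal sketch of the Pydantic-plus-LangChain structured-output pattern being described; the `UserProfile` fields here are hypothetical, not the models shared in the channel:

```python
from typing import List, Optional

from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class UserProfile(BaseModel):
    # Hypothetical fields, for illustration only.
    name: Optional[str] = Field(None, description="The user's name, if stated")
    interests: List[str] = Field(default_factory=list, description="Topics the user mentions")

parser = PydanticOutputParser(pydantic_object=UserProfile)
# These instructions get embedded in the system prompt so the LLM emits parseable JSON.
print(parser.get_format_instructions())
```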
LangChain AI ▷ #share-your-work (7 messages):

- **RAG meets RAPTOR**: @andysingal shared a Medium article about building long-context retrieval-augmented generation (RAG) from scratch using RAPTOR and LangChain. The write-up covers adapting to evolving knowledge domains and overcoming traditional knowledge-retrieval shortcomings.
- **ChromaDB Plugin Release**: @vic49. introduced the ChromaDB Plugin for LM Studio, a solution for creating a ChromaDB vector database to work with LM Studio in server mode.
- **Architectural and Medical Generative Scripts**: @_johnny1984 is working on a similar concept to the ChromaDB plugin and shared a script directory showing an attempt to design new hospitals through generative algorithms, covering both architectural and medical professional roles.
- **Integrate LLMs Seamlessly into Python Projects**: @madgic_ announced a new LangChain-based library called `ask-llm`, which allows easy integration of LLM interactions into Python projects, inspired by langchain-decorators. The library uses Jinja templating for prompts and is described in detail on GitHub (see the templating sketch after this list).
- **Exploring Vision Models in Production**: @vru.shank announced a workshop with MultiOn and Quizizz on applying vision models in production, inviting interested individuals to RSVP for the workshop hosted by the LLMs in Prod community. Details and registration are available through the shared link.
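A minimal sketch of Jinja-templated prompting in general, illustrating the pattern rather than ask-llm's actual API:

```python
from jinja2 import Template

# Hypothetical prompt template; variables are filled in at call time.
prompt = Template(
    "You are a {{ role }}. Answer the following question concisely:\n{{ question }}"
)

rendered = prompt.render(
    role="senior Python developer",
    question="What does functools.lru_cache do?",
)
print(rendered)  # the rendered string is what would be sent to the LLM
```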
Links mentioned:
- Releases · BBC-Esq/ChromaDB-Plugin-for-LM-Studio: Plugin that creates a ChromaDB vector database to work with LM Studio running in server mode! - BBC-Esq/ChromaDB-Plugin-for-LM-Studio
- GitHub - FlorianMgs/ask-llm: The easiest way to supercharge your apps with LLM!: The easiest way to supercharge your apps with LLM! - FlorianMgs/ask-llm
- Building Long Context RAG from Scratch with RAPTOR using Langchain: Ankush k Singal
- Multi-Modal LLMs in Prod | Practitioners' Workshop · Luma: The LLMs in Prod community is hosting practitioners from top Gen AI companies to talk about how they are using multi-modal models (vision, audio, image gen, etc.) in…
LangChain AI ▷ #tutorials (2 messages):

- **Crafting Infinite Fun with AI**: @pradeep1148 shared a YouTube video titled "Infinite Craft Game using Mistral", showcasing the development of a game where players start with four elements and combine them to discover new ones using Mistral.
- **Meme Generation with Mistral & Giphy**: @pradeep1148 also posted a YouTube video titled "Making memes with Mistral & Giphy", demonstrating the use of Mistral and the Giphy API to create memes, with a link to the related GitHub notebook.
Links mentioned:
- Making memes with Mistral & Giphy: Let's make memes using mistral llm and Giphy api #llm #ml #python #pythonprogramming https://github.com/githubpradeep/notebooks/blob/main/Giphy%20Mistral.ipynb
- Infinite Craft Game using Mistral: Let's develop Neal Agarwal's web game Infinite Craft. This is a "crafting game" where you start with just four elements and repeatedly combine pairs of element…
CUDA MODE ▷ #general (6 messages):

- **Inquiring About Channel Purpose**: @8bitelon asked if "general" meant off-topic, to which @marksaroufim responded that there's no particular off-topic channel, but one might be created if needed.
- **Meme Humor Tolerance**: @iron_bound expressed a desire to post memes, prompting @marksaroufim to ask for the best memes in a dedicated memes channel.
- **Flash Attention CUDA Project**: @tspeterkim_89106 shared a project implementing Flash Attention in CUDA, looking for feedback and discussion on the implementation. The project is available on GitHub.
- **Quick Guide to CUDA**: @iron_bound shared a concise educational resource, the YouTube video "Nvidia CUDA in 100 Seconds", offering a rapid overview of CUDA.
Links mentioned:
- Nvidia CUDA in 100 Seconds: What is CUDA? And how does parallel computing on the GPU enable developers to unlock the full potential of AI? Learn the basics of Nvidia CUDA programming in…
- GitHub - tspeterkim/flash-attention-minimal: Flash Attention in ~100 lines of CUDA (forward pass only): Flash Attention in ~100 lines of CUDA (forward pass only) - tspeterkim/flash-attention-minimal
CUDA MODE ▷ #cuda (30 messages🔥):

- **Bandwidth Revelations with Nvidia H100**: @iron_bound discussed the L2 cache read bandwidth of Nvidia's H100 GPU, highlighting a 5.5 TB/s figure and proposing a method for estimating L1 cache bandwidth. They referenced an in-depth Chips and Cheese article describing Nvidia's compute-market focus with the Hopper-based H100.
- **Architectural Comparisons**: @zippika speculated from available data that the Nvidia 4090's L1 cache bandwidth could be 40 TB/s, assuming parameters similar to the H100's, while @iron_bound cautioned that the Ada and Hopper architectures differ.
- **Unveiling Coarsening Effects on Performance**: @marksaroufim shared a tweet by @zeuxcg on the proper way to handle coarsening in benchmark code, providing insight into performance misconceptions caused by benchmarking inconsistencies.
- **Learning the CUDA CuTe DSL**: @ericauld started a discussion on understanding the CuTe domain-specific language (DSL), sharing a GitHub link to the CUTLASS library where CuTe is used and discussing the best order in which to read the documentation.
- **Dequantization Optimization in CUDA**: @zippika shared progress on optimizing fp4 dequantization with the cuda::pipeline API, noting a speedup over their baseline and accurate results on real inputs and outputs.
Links mentioned:
- Microbenchmarking Nvidia's RTX 4090: Nvidia's RTX 4090 features Nvidia's newest architecture, named Ada Lovelace after a pioneer in early computing. Compared to their previous architecture, Ampere, Ada Lovelace enjoys a pr…
- Tweet from Arseny Kapoulkine 🇺🇦 (@zeuxcg): As a demonstration, I changed coarsen source code to comment out the body of VecAdd/VecAddCoarsened code and changed the launch parameters to omit `2*` and I get these results. What you're seeing…
- Nvidia's H100: Funny L2, and Tons of Bandwidth: GPUs started out as devices meant purely for graphics rendering, but their highly parallel nature made them attractive for certain compute tasks too. As the GPU compute scene grew over the past cou…
- cutlass/media/docs/cute at main · NVIDIA/cutlass: CUDA Templates for Linear Algebra Subroutines. Contribute to NVIDIA/cutlass development by creating an account on GitHub.
CUDA MODE ▷ #torch (11 messages🔥):

- **bitsandbytes for Quantization Needs**: @iron_bound suggested @mabeto5p check out bitsandbytes for linear algebra on low-precision integers with PyTorch, which may make use of int8 tensor cores on Ada architectures. @mabeto5p was looking for high-abstraction tools to handle int4/int8 and fp8 matrix operations.
- **A Sync in Time Saves Nine**: @andreaskoepf pointed out a common mistake that can inflate benchmarks, suggesting @mabeto5p add `torch.cuda.synchronize()` to get accurate measurements (a timing sketch follows this list). @mabeto5p later confirmed that moving away from Jupyter produced reasonable benchmarking results.
- **Cross-device Sync Clarified**: Addressing @mabeto5p's query, @andreaskoepf clarified that `torch.cuda.synchronize()` calls `cudaDeviceSynchronize` internally, ensuring all kernels have finished before the call returns, regardless of their origin, as long as they come from the same process.
- **Mixed Device Tensor Indexing**: Per @_t_vi_, indexing a CPU tensor with a CUDA scalar tensor is allowed for historical reasons and convenience, since scalars can be treated differently for operations like indexing.
- **Scalars Get VIP Treatment Across Devices**: @_t_vi_ further explained that scalar tensors can index tensors on any device thanks to automatic conversion of Python numbers and CPU scalars to the target tensor's device.
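A minimal sketch of why the synchronize matters: CUDA kernels launch asynchronously, so timing without it measures only the launch, not the work (sizes here are illustrative):

```python
import time
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

torch.cuda.synchronize()           # drain pending work before starting the clock
start = time.perf_counter()
c = a @ b                          # the kernel is *launched* here, not finished
torch.cuda.synchronize()           # wait for the matmul to actually complete
print(f"matmul took {(time.perf_counter() - start) * 1e3:.2f} ms")
```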
Links mentioned:
- CUDA Runtime API :: CUDA Toolkit Documentation: no description found
- GitHub - TimDettmers/bitsandbytes: Accessible large language models via k-bit quantization for PyTorch.: Accessible large language models via k-bit quantization for PyTorch. - TimDettmers/bitsandbytes
CUDA MODE ▷ #algorithms (1 message):

- **RelayAttention vs Ring/Flash Attention Inquiry**: @lancerts asked how RelayAttention compares with ring/flash attention, linking to the GitHub repository for vLLM with RelayAttention integration; they have just begun reading the RelayAttention paper.
Links mentioned:
GitHub - rayleizhu/vllm-ra: vLLM with RelayAttention integration: vLLM with RelayAttention integration. Contribute to rayleizhu/vllm-ra development by creating an account on GitHub.
CUDA MODE ▷ #ring-attention (23 messages🔥):

- **GPU Reset Tips from the Command Line**: @iron_bound suggested resetting a stuck GPU with `sudo nvidia-smi --gpu-reset -i 0`, also noting that `nvtop` showed pid 3874970 still running.
- **Pod Restart to Address GPU Issues**: @andreaskoepf restarted the pod in response to @jamesmel's concern about GPU memory allocations that would not release despite no PID being attached.
- **CUDA Runtime Error Conundrum**: @iron_bound hit a CUDA runtime error, "head_size should be a multiple of 8", during the backward pass of `ring_flash_attn_varlen.py`.
- **Sampling Mechanism Clarification**: @andreaskoepf explained the token sampling mechanism during the forward pass of model training: after the prompt, only the last sampled token is fed in, with an input tensor of shape (batch_size, 1).
- **How-to Log-Sum-Exp for Ring-Attention**: @andreaskoepf created and shared a "how-to log sum exp" notebook for the ring-attention experiments, and @_t_vi_ expressed enthusiasm about the logsumexp trick, sharing a related project on sinkhorn-kernel (a sketch of the trick follows this list).
Links mentioned:
- iron-bound: Weights & Biases, developer tools for machine learning
- ring-attention/notebooks/howto_log_sum_exp.ipynb at main · cuda-mode/ring-attention: ring-attention experiments. Contribute to cuda-mode/ring-attention development by creating an account on GitHub.
- GitHub - RulinShao/LightSeq: Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers: Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers - RulinShao/LightSeq
- Lernapparat - Machine Learning: no description found
CUDA MODE ▷ #off-topic (1 message):

- **Intriguing Mandelbrot Visualization Shared**: @apaz shared an image link depicting a Mandelbrot set; the cryptographic-looking string in the URL likely serves as a unique identifier for the image.
Interconnects (Nathan Lambert) ▷ #news (4 messages):

- **Inflection Boosts Pi with Emotional Intelligence**: @xeophon. highlighted the launch of Inflection-2.5, an upgraded AI model from Inflection AI claiming competitive performance against major models like GPT-4 and Gemini. The new model powers the empathetic AI Pi and is available across iOS, Android, and desktop platforms.
- **Skepticism Over Inflection-2.5 Claims**: @xeophon. shared a tweet by @HlibIvanov criticizing Inflection-2.5's claimed efficiency, suggesting it is simply a distillation of GPT-4 and questioning its innovation in AI modeling.
- **Nathan Lambert Shares Excitement**: @natolambert tweeted his excitement about the rapid development and release of so many models in less than a month, implying the pace of AI innovation is accelerating.
Links mentioned:
- Inflection-2.5: meet the world's best personal AI: We are an AI studio creating a personal AI for everyone. Our first AI is called Pi, for personal intelligence, a supportive and empathetic conversational AI.
- Tweet from Hlib Ivanov (e/acc) (@HlibIvanov): Of course it takes less flops to make a gpt-4 distill than training gpt-4 from scratch. I'm not even bothering with proper benchmark, this is clearly trained on unfiltered stuff from gpt-4, and it…
Interconnects (Nathan Lambert) ▷ #ml-drama (48 messages🔥):

- **Open-Source Software: A Salty Topic?**: @natolambert shared frustrations about reactions to writing on open-source software, where even helpful posts often draw pedantic corrections and little welcome from the community.
- **Venting on Community Feedback**: While acknowledging that feedback is useful, @natolambert vented that the open-source software (OSS) community often lacks perspective and can deter those trying to promote OSS through excessive criticism.
- **ML Policy Discussions Remain Private**: @dangf91 and @natolambert discussed how the contentious nature of open-source and machine learning (ML) policy keeps many discussions out of the public eye. @natolambert pointed out that this is partly political, and one's stance often differs in public versus private.
- **Critiques on License Talks**: @natolambert noted that while he is open about engaging in ML politics, criticism often focuses on minutiae like incorrect terminology.
- **Troubles in Clarifying "Open" Terminology**: A conversation between @xeophon. and @natolambert highlighted the difficulty and confusion in classifying models like Mistral or Llama 2, with the industry incorrectly labeling proprietary models as "open-source".
Links mentioned:
Aggregator's AI Risk: A single AI can never make everyone happy, which is fundamentally threatening to the Aggregator business model; the solution is personalized AI
Interconnects (Nathan Lambert) ▷ #random (19 messages🔥):

- **Claude-3 Ranking Sparks Community Frenzy**: @xeophon shared a link announcing the launch of @Anthropic's Claude-3 ranking in the Arena, which drew an impressive 20,000 votes in just three days. The Claude-3 variants Opus and Sonnet are creating a buzz, rivaling GPT-4-Turbo and closely matching GPT-4 in performance.
- **Anticipation for Gemini Ultra**: @natolambert expressed eagerness for the release of Gemini Ultra, while @xeophon is excited to access it and try out the 1M-token context window, especially for analyzing multiple academic papers.
- **Simplicity in Screenshotting**: In response to @xeophon asking how certain images were generated, @natolambert explained using the full-window screenshot feature on Mac, pressing space after `command-shift-4`, a tool he finds useful for blogging and communications.
Links mentioned:
- Tweet from lmsys.org (@lmsysorg): 🔥Exciting news from Arena: @Anthropic's Claude-3 Ranking is here! Claude-3 has ignited immense community interest, propelling Arena to unprecedented traffic with over 20,000 votes in just three…
- Tweet from Nathan Lambert (@natolambert): I didn't expect commoditization at the top end but all of these in less than a month. We still have a week to get GPT5 and Llama 3 in this month post G1.5. Gemini 1.5, Mistral Large, Claude 3, In…
Alignment Lab AI ▷ #general-chat (1 message):

- **A Warm Welcome to segmentationfault.**: @segmentationfault. expressed gratitude for being invited by @748528982034612226 and eagerness to contribute to the field despite being new.
Alignment Lab AI ▷ #oo (16 messages🔥):

- **Early Bird Doesn't Always Get the Worm**: @joshxt stressed the importance of not tagging everyone, especially at 5am; the offending ping was considered spam and promptly deleted, and the incident led to a quick policy change disabling users' ability to ping everyone.
- **Curiosity Over a Mysterious Ping**: @ikaridev asked whether they had been the recipient of an @everyone ping, learning from @joshxt that such a message did exist but was removed as spam.
- **Dropping the @everyone Hammer**: Following the spam incident, @joshxt humorously confirmed to @ikaridev that users no longer have the ability to ping the entire server, hinting at a quick permission tweak.
- **Orca Dataset Splashes Into the Scene**: @joshxt sparked a discussion about the newly released Orca math dataset from Microsoft, sharing the Hugging Face link (https://huggingface.co/datasets/microsoft/orca-math-word-problems-200k) and asking if anyone is doing something interesting with it (a loading sketch follows this list).
- **Favorites in the AI Aquarium**: In light of the Orca dataset conversation, @twistedshadows. confessed a bias toward the "Psyonic-cetacean" model, which integrates Orca 2 13B, while @joshxt admitted to currently favoring Claude 3 Opus.
Links mentioned:
- Angry Gary Oldman GIF - Angry Gary Oldman Everyone - Discover & Share GIFs: Click to view the GIF
- microsoft/orca-math-word-problems-200k · Datasets at Hugging Face: no description found
Alignment Lab AI ▷ #oo2 (10 messages🔥):

- **New Faces in the Chat**: @jaxxks offered a friendly evening greeting to @1168088006553518183, who had welcomed them in earlier.
- **The Band Is Gathering**: Users such as @tcapelle and @aslawliet joined the channel with general greetings like "Hello every1!" and "Hello friends!".
- **Brainstorming Session for Orca-2 Begins**: @aslawliet introduced the project concept for Orca-2, suggesting it target a broader range of datasets, including FLAN 2021 and selective zero-shot (zs_opt) samples from T0 and Natural Instructions (niv).
- **Data Augmentation Strategy Discussed**: @aslawliet proposed using Mixtral for data augmentation as a cost- and time-efficient alternative to GPT-4.
- **A Light Moment on AI Choices**: @aslawliet humorously doubted anyone's willingness to use Claude-3 Opus for the current project.
DiscoResearch ▷ #general (9 messages🔥):

- **Choices of Language Models for Various Constraints**: @johannhartmann outlined models suited to different constraints: Claude Opus and GPT-4 with no constraints, DiscoLM-120B for open source with extensive memory, and VAGOsolutions/SauerkrautLM-UNA-SOLAR-Instruct when memory is limited.
- **Discussion on Retrieval-Augmented Language Models**: @maxidl shared an arXiv paper discussing the advantages of retrieval-augmented language models over traditional parametric ones, noting that the paper has a section on joint training of the retriever and the LLM, though research on that topic is not yet extensive.
- **Recommendations for German-Speaking Models**: @cybertimon recommended Nous Hermes 2 Mixtral 8x7B for its fluent German capabilities, while @johannhartmann suggested exploring DiscoResearch/DiscoLM_German_7b_v1 and other models such as VAGOsolutions/SauerkrautLM-7b-HerO, mayflowergmbh/Brezn-7b, and seedboxai/KafkaLM-7B-DARE_TIES-LaserRMT-QLoRA-DPO-v0.5.
- **Hermes Mixtral Praised for Accuracy**: @flozi00 reported a positive experience with Nous Hermes 2 Mixtral 8x7B, highlighting its accuracy in understanding tasks on the first attempt.
- **Comparison Inquiry for German Language Models**: @johannhartmann asked whether anyone had compared Nous Hermes Mixtral with other Mixtrals such as Sauerkraut or DiscoLM on German prompts, noting that to their knowledge Hermes Mixtral had no German finetuning.
Links mentioned:
Reliable, Adaptable, and Attributable Language Models with Retrieval: Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability. However, they still face practical challenges such as hallucinations, di…
DiscoResearch ▷ #embedding_dev (13 messages🔥):

- **Exploring OPUS 100 Translation Quality**: @flozi00 is labeling the OPUS 100 dataset to assess translation quality, finding less than half of it to be good. They plan to develop embedding models that score translation quality by embedding distance, which could help improve machine translation (MT) models and datasets (a scoring sketch follows this list).
- **Prompt Fine-tuning for Translation Categorization**: @flozi00 used the Nous Hermes Mixtral DPO model, after substantial prompt fine-tuning, to categorize translation quality, pointing toward automatic quality assessment for translation datasets.
- **Dataset Scrubbing for "Good" Translation Pairs**: @crispstrobe highlighted that OPUS 100, being randomly selected from a larger corpus, contains context-specific pairings that often fail outside their intended setting. Creating subsets of universally "good" pairings is suggested for better utility in general contexts.
- **Improving the Automatic Evaluation of Translation Quality**: @flozi00 mentioned updating their model and dataset collection for better translation-quality judgment, intending to iterate over multiple datasets to enhance the collection and welcoming further suggestions.
- **mMARCO Dataset Gains Apache 2.0 License**: @philipmay noted that the mMARCO dataset has added an Apache 2.0 license, pointing to the dataset's page on Hugging Face despite a current lack of dataset-viewer support.
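A minimal sketch of scoring a translation pair by embedding distance with an off-the-shelf multilingual encoder; the model choice here is an assumption, not necessarily what @flozi00 is training:

```python
from sentence_transformers import SentenceTransformer, util

# Any multilingual sentence encoder works; this one is a common baseline.
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

src = "The weather is nice today."
hyp = "Das Wetter ist heute schön."

emb = model.encode([src, hyp], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()  # higher = source and translation agree more
print(f"embedding similarity: {score:.3f}")
```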
Links mentioned:
- Translation Data Quality - a flozi00 Collection: no description found
- unicamp-dl/mmarco · Datasets at Hugging Face: no description found
- Data (Hint ID): no description found
LLM Perf Enthusiasts AI ▷ #claude (13 messages🔥):

- **Opus the SAT Scholar**: @jeffreyw128 shared that Opus achieved a perfect score of 800 on the SAT reading section, linking to a Twitter post with the results.
- **Concerns about Model Memorization**: Following the SAT achievement, @dare.ai raised the challenge of creating true holdout sets to rule out memorization in massive models.
- **Opus Garners Praise**: @nosa_ expressed admiration for Opus, humorously threatening a confrontation if anyone tells Opus about the compliment.
- **Improved Opus Skill at Knowledge Webs**: @res6969 tested Opus's ability to construct knowledge webs from a large document exceeding 35k tokens, remarking on the model's improved instruction following and contextual understanding.
- **Search Amongst 500 Names Proves Difficult**: @jeffreyw128 reported failing to find a specific name among 500 using several models, including Claude Opus, a needle-in-a-haystack task that AI models typically struggle with.
LLM Perf Enthusiasts AI ▷ #prompting (1 message):

Only a single message was posted, with no substantive discussion to summarize.
Datasette - LLM (@SimonW) ▷ #ai (5 messages):

- **GPT-4 Performance Under Scrutiny**: @dbreunig expressed surprise at GPT-4's failure on a specific, undisclosed test, with no details provided on the nature of the test or the failure.
- **Clickable Bookshelves Spark Interest**: @xnimrodx shared a blog post about a script that turns images of bookshelves into clickable regions, each linking to the Google Books page for that book. The post includes a demo and a video of the functionality.
- **Library Efficiency Through Imaginative Tech**: @xnimrodx wished for a similar application to help his librarian wife with shelf-reading tasks, noting her library is the largest among the 35 schools in their diocesan system.
- **Community Library Project Idea**: @dbreunig showed interest in building a toy app to help people register books in the small libraries around their town, a practical application of the technology under discussion.
Links mentioned:
Making my bookshelves clickable | James' Coffee Blog: no description found
Datasette - LLM (@SimonW) ▷ #llm (8 messages🔥):

- **Debugging Template Crashes**: @trufuswashington sought advice from the expert (@746595581086138409) about issues creating templates for common use cases; one template worked fine, while another kept crashing the `llm` command.
- **Error Output Shared**: They shared the error output, showing that the `llm` command failed with a `TypeError` ("expected str instance, NoneType found") during variable interpolation.
- **Working vs. Crashing Templates Compared**: @trufuswashington attached two YAML templates for comparison. The working one modified text blocks; the faulty one provided brief explanations of various code-related content.
- **Cause of the Crash Found**: Eventually, @trufuswashington discovered the cause of the crash: a dollar sign `$` in the template prompt where it was not expected.
- **Dollar Sign Culprit**: Specifically, the error was triggered by the line "I will tip you $100 if you do this." in the crashing template, pointing to the special character colliding with the prompt interpolation syntax (see the sketch after this list).
Skunkworks AI ▷ #off-topic (2 messages):

- **Innovating Crafting Games with AI**: @pradeep1148 shared a YouTube video titled "Infinite Craft Game using Mistral", showcasing a crafting game where elements are combined to create new items using the capabilities of the Mistral language model.
- **Automating Memes with Mistral**: @pradeep1148 also posted a link to another YouTube video titled "Making memes with Mistral & Giphy", which demonstrates creating memes by integrating Mistral with the Giphy API and includes a link to the GitHub repository containing the relevant notebook.
Links mentioned:
- Infinite Craft Game using Mistral: Let's develop Neal Agarwal's web game Infinite Craft. This is a "crafting game" where you start with just four elements and repeatedly combine pairs of element…
- Making memes with Mistral & Giphy: Let's make memes using mistral llm and Giphy api #llm #ml #python #pythonprogramming https://github.com/githubpradeep/notebooks/blob/main/Giphy%20Mistral.ipynb