Model: "hermes-2.5"
12/23/2023: NeurIPS Best Papers of 2023
gpt-4 palm2 hermes-2.5 mistral-7b nous-research hugging-face apple context-length malware-security video-content music-content linear-layers api-access large-language-models embedding vector-databases model-merging model-interpretability striped-hyena-architecture quantization rmsnorm attention-mechanisms
The Latent Space Pod released a 3-hour recap of the best NeurIPS 2023 papers. The Nous Research AI Discord community discussed optimizing AI performance with shorter context lengths, malware security concerns linked to Hugging Face, and shared video and music content. Technical discussions covered the DYAD research paper, which proposes a faster alternative to linear layers, Apple's ML Ferret, and accessing PaLM 2 via API. The community also explored large language models, focusing on specialized models, data scaling, embedding and vector databases, model merging, and interpretability, with mentions of Hermes 2.5, GPT-4, and Mistral. There were also conversations on the Striped Hyena architecture, quantization challenges, and fixes related to RMSNorm and the "Attention Is All You Need" paper.
12/9/2023: The Mixtral Rush
mixtral hermes-2.5 hermes-2 mistral-yarn ultrachat discoresearch fireworks-ai hugging-face mistral-ai benchmarking gpu-requirements multi-gpu quantization gptq chain-of-thought min-p-sampling top-p-sampling model-sampling model-merging model-performance small-models reasoning-consistency temperature-sampling bjoernp the_bloke rtyax kalomaze solbus calytrix
Mixtral's weights were released without code, prompting the DiscoResearch community and Fireworks AI to implement it rapidly. Despite these efforts, no significant benchmark improvements were reported, which limits its usefulness for local LLM usage but still marks progress for the small models community. Discussions in the DiscoResearch Discord compared Mixtral's performance with models like Hermes 2.5 and Hermes 2, using benchmarks such as winogrande, truthfulqa_mc2, and arc_challenge. Technical topics included GPU requirements, multi-GPU setups, and quantization via GPTQ. Benchmarking strategies such as grammar-based evaluation and chain of thought (CoT) were explored, alongside sampling techniques like Min P and Top P for improving response stability and creativity. Users also discussed GPTs' learning limitations and the adaptability of models under varying conditions, emphasizing that min_p sampling makes higher temperature settings usable for creative output.
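
To make the Min P and Top P comparison concrete, here is a minimal sketch of both filters in plain Python. It is an illustration under stated assumptions, not the implementation used by any particular inference library: the function names (`softmax`, `min_p_filter`, `top_p_filter`) are hypothetical, and applying temperature before the filter is only one possible ordering. Min P is taken here to mean "keep tokens whose probability is at least `min_p` times the probability of the most likely token", and Top P to mean "keep the smallest set of most likely tokens whose cumulative probability reaches `top_p`".

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Keep tokens with probability >= min_p * max(probs), then renormalize.

    Assumed definition of Min P: the cutoff scales with the top token's
    probability, so a confident distribution prunes more aggressively.
    """
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = [0.0] * len(probs)
    cumulative = 0.0
    for i in order:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(kept)
    return [p / total for p in kept]


if __name__ == "__main__":
    logits = [4.0, 3.0, 2.0, 0.0, -2.0]  # made-up values for illustration
    for temperature in (1.0, 2.0):
        probs = softmax(logits, temperature)
        print("temperature", temperature)
        print("  min_p=0.1 :", [round(p, 3) for p in min_p_filter(probs, 0.1)])
        print("  top_p=0.9 :", [round(p, 3) for p in top_p_filter(probs, 0.9)])
```

Because the Min P cutoff is relative to the top token, raising the temperature flattens the distribution without automatically readmitting the least likely tokens, which is one way to read the claim that min_p sampling allows higher temperature settings.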