All tags
Topic: "gpu-requirements"
12/9/2023: The Mixtral Rush
mixtral hermes-2.5 hermes-2 mistral-yarn ultrachat discoresearch fireworks-ai hugging-face mistral-ai benchmarking gpu-requirements multi-gpu quantization gptq chain-of-thought min-p-sampling top-p-sampling model-sampling model-merging model-performance small-models reasoning-consistency temperature-sampling bjoernp the_bloke rtyax kalomaze solbus calytrix
Mixtral's weights were released without inference code, prompting the DiscoResearch community and Fireworks AI to implement it rapidly. Despite these efforts, no significant benchmark improvements were reported, limiting its immediate usefulness for local LLM usage while still marking progress for the small-models community. Discussions in the DiscoResearch Discord compared Mixtral's performance against models like Hermes 2.5 and Hermes 2, with evaluations on benchmarks such as winogrande, truthfulqa_mc2, and arc_challenge. Technical topics included GPU requirements, multi-GPU setups, and quantization via GPTQ. Benchmarking strategies like grammar-based evaluation, chain of thought (CoT), and min-p sampling were explored, alongside sampling techniques such as min-p and top-p for balancing response stability and creativity. Users also discussed GPTs' learning limitations and model adaptability under varying conditions, emphasizing that min-p sampling enables higher temperature settings for creativity by pruning the low-probability tail that high temperatures would otherwise amplify.
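The min-p idea discussed above can be sketched in a few lines: instead of keeping a fixed cumulative mass (top-p), min-p keeps every token whose probability is at least some fraction of the most likely token's probability, so raising the temperature flattens the distribution without letting implausible tokens through. Below is a minimal NumPy sketch; `min_p_sample` is a hypothetical helper written for illustration, not code from the discussions.

```python
import numpy as np

def min_p_sample(logits, min_p=0.1, temperature=1.0, rng=None):
    """Illustrative min-p sampler (hypothetical helper, not from any
    discussed codebase). Keeps only tokens whose probability is at
    least min_p * (probability of the top token), then renormalizes
    and samples from the surviving tokens."""
    rng = rng or np.random.default_rng()
    # Temperature-scaled softmax, shifted for numerical stability.
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # Min-p filter: the threshold scales with the top token's probability.
    keep = probs >= min_p * probs.max()
    filtered = np.where(keep, probs, 0.0)
    filtered /= filtered.sum()
    return int(rng.choice(len(filtered), p=filtered))

# With one clearly dominant token, min-p prunes the rest regardless
# of how the remaining mass is spread.
token = min_p_sample(np.array([10.0, 0.0, 0.0]), min_p=0.5)
```

Because the cutoff is relative to the top token rather than a fixed cumulative mass, the filter adapts automatically: a confident distribution keeps few candidates, a flat one keeps many, which is why higher temperatures stay usable.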
12/8/2023 - Mamba v Mistral v Hyena
mistral-8x7b-moe mamba-3b stripedhyena-7b claude-2.1 gemini gpt-4 dialogrpt-human-vs-machine cybertron-7b-v2-gguf falcon-180b mistral-ai togethercompute stanford anthropic google hugging-face mixture-of-experts attention-mechanisms prompt-engineering alignment image-training model-deployment gpu-requirements cpu-performance model-inference long-context model-evaluation open-source chatbots andrej-karpathy tri-dao maxwellandrews raddka
Three new AI models are highlighted: Mistral's 8x7B MoE model (Mixtral), Mamba models up to 3B from Together, and StripedHyena 7B, a competitive subquadratic-attention model from Stanford's Hazy Research lab. Discussions on Anthropic's Claude 2.1 focus on its prompting techniques and alignment challenges, and Google's Gemini is noted as potentially superior to GPT-4. The community also explores Dreambooth for image training and shares resources like the DialogRPT-human-vs-machine model on Hugging Face. Deployment challenges for large language models, including CPU performance and GPU requirements, are discussed with references to Falcon 180B and transformer batching techniques. User engagement includes meme sharing and humor.