All tags
Topic: "gptq"
1/16/2024: TIES-Merging
mixtral-8x7b nous-hermes-2 frankendpo-4x7b-bf16 thebloke hugging-face nous-research togethercompute oak-ridge-national-laboratory vast-ai runpod mixture-of-experts random-gate-routing quantization gptq exl2-quants reinforcement-learning-from-human-feedback supercomputing trillion-parameter-models ghost-attention model-fine-tuning reward-models sanjiwatsuki superking__ mrdragonfox _dampf kaltcit rombodawg technotech
TheBloke's Discord community actively discusses Mixture of Experts (MoE) models, focusing on random gate routing layers and the challenges of immediate model use. There is a robust debate comparing GPTQ and EXL2 quantization, with EXL2 noted for faster execution on specialized hardware. A new model, Nous Hermes 2, based on Mixtral 8x7B and trained with RLHF, claims benchmark superiority but shows some inconsistencies. The Frontier supercomputer at Oak Ridge National Laboratory is highlighted for training a trillion-parameter LLM with 14TB of RAM, sparking discussion on open-sourcing government-funded AI research. The application of ghost attention in the academicat model also drew mixed reactions from the community. Key insights shared include "Random gate layer is good for training but not for immediate use" and "EXL2 might offer faster execution on specialized hardware."
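To make the random-gate point concrete, here is a minimal PyTorch sketch of what a randomly routed MoE layer could look like. The class name `RandomGateMoE`, the expert architecture, and all parameters are illustrative assumptions, not the routing used by Mixtral or any model discussed above; the idea is simply that each token is sent to experts chosen uniformly at random rather than by a learned gate, which can help balance expert load during training but gives no input-dependent routing at inference time.

```python
import torch
import torch.nn as nn

class RandomGateMoE(nn.Module):
    """Illustrative MoE layer that routes each token to random experts.

    A random (untrained) gate can spread load evenly across experts
    while training, but since routing ignores the input entirely, it
    is a poor choice for immediate use at inference time.
    """

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.num_experts = num_experts
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Pick top_k distinct experts uniformly at
        # random for every token, then average their outputs.
        out = torch.zeros_like(x)
        for t in range(x.shape[0]):
            chosen = torch.randperm(self.num_experts)[: self.top_k]
            for e in chosen:
                out[t] += self.experts[int(e)](x[t]) / self.top_k
        return out

# Usage sketch:
# layer = RandomGateMoE(dim=64)
# y = layer(torch.randn(10, 64))  # (10 tokens, 64 dims)
```

The per-token Python loop is deliberately naive for readability; a real implementation would batch tokens by expert assignment.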
12/9/2023: The Mixtral Rush
mixtral hermes-2.5 hermes-2 mistral-yarn ultrachat discoresearch fireworks-ai hugging-face mistral-ai benchmarking gpu-requirements multi-gpu quantization gptq chain-of-thought min-p-sampling top-p-sampling model-sampling model-merging model-performance small-models reasoning-consistency temperature-sampling bjoernp the_bloke rtyax kalomaze solbus calytrix
Mixtral's weights were released without code, prompting the DiscoResearch community and Fireworks AI to implement it rapidly. Despite these efforts, no significant benchmark improvements were reported, limiting Mixtral's immediate usefulness for local LLM usage while still marking progress for the small-models community. Discussions in the DiscoResearch Discord compared Mixtral's performance against models like Hermes 2.5 and Hermes 2 on benchmarks such as winogrande, truthfulqa_mc2, and arc_challenge. Technical topics included GPU requirements, multi-GPU setups, and quantization via GPTQ. Benchmarking strategies such as grammar-based evaluation, chain of thought (CoT), and min_p sampling were explored, alongside sampling techniques like Min P and Top P for balancing response stability and creativity. Users also discussed GPTs' learning limitations and how models adapt under varying conditions, emphasizing that min_p sampling enables higher temperature settings without losing coherence.
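For reference, min-p sampling keeps every token whose probability is at least a fixed fraction of the most likely token's probability, so the cutoff scales with the model's confidence; this is why it tolerates high temperatures better than a fixed Top P cutoff. Below is a minimal sketch of the idea. The function name `min_p_sample` and the default values are illustrative, and real sampler chains (e.g., in llama.cpp or transformers) differ in where temperature is applied relative to filtering.

```python
import torch

def min_p_sample(logits: torch.Tensor, min_p: float = 0.05,
                 temperature: float = 1.5) -> int:
    """Sample one token id using min-p filtering (illustrative sketch).

    Tokens with probability below min_p * (probability of the top
    token) are discarded; the survivors are renormalized and sampled.
    """
    probs = torch.softmax(logits / temperature, dim=-1)
    threshold = min_p * probs.max()
    # Zero out tokens below the confidence-scaled threshold.
    probs = torch.where(probs >= threshold, probs, torch.zeros_like(probs))
    probs = probs / probs.sum()  # renormalize the surviving mass
    return int(torch.multinomial(probs, num_samples=1))

# Usage sketch: at temperature 1.5, a confident distribution still
# collapses to a few candidates, while a flat one stays diverse.
# token_id = min_p_sample(torch.randn(32000))
```

Compared with Top P, which keeps a fixed cumulative probability mass regardless of how peaked the distribution is, the min-p threshold shrinks when the model is uncertain and grows when it is confident, which matches the discussion above about stability at creative temperature settings.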