Person: "mrdragonfox"
AI2 releases OLMo - the 4th open-everything LLM
olmo-1b olmo-7b olmo-65b miqu-70b mistral-medium distilbert-base-uncased ai2 allenai mistral-ai tsmc asml zeiss fine-tuning gpu-shortage embedding-chunking json-generation model-optimization reproducible-research self-correction vram-constraints programming-languages nathan-lambert lhc1921 mrdragonfox yashkhare_ gbourdin
AI2 is drawing attention in early 2024 with its new OLMo models, released in 1B and 7B sizes with a 65B model forthcoming, and an emphasis on open, reproducible research in the spirit of Pythia. The Miqu-70B model, widely discussed in connection with Mistral Medium, is praised for its self-correction behavior and speed optimizations. Discussions in TheBloke's Discord covered programming language preferences, VRAM constraints for running large models, and fine-tuning experiments with distilbert-base-uncased. The Mistral Discord highlighted the GPU shortage and its roots in the semiconductor supply chain spanning TSMC, ASML, and Zeiss, debates over open-source versus proprietary models, and fine-tuning techniques such as LoRA for low-resource languages (a sketch of that kind of setup follows below). Community insights also touched on embedding chunking strategies and improving JSON output.
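As a minimal sketch of the LoRA-style fine-tuning mentioned above (our illustration, not code shared in the Discord), here is how distilbert-base-uncased might be wrapped with low-rank adapters using Hugging Face `transformers` and `peft`; the rank, alpha, target modules, and classification task are assumptions chosen for brevity.

```python
# Hypothetical LoRA fine-tuning setup for distilbert-base-uncased.
# Hyperparameters and the binary-classification head are illustrative only.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# LoRA freezes the base weights and trains small low-rank adapters,
# which is what makes it attractive under tight VRAM budgets.
lora_config = LoraConfig(
    r=8,                                # rank of the low-rank update matrices
    lora_alpha=16,                      # scaling factor for the adapter output
    target_modules=["q_lin", "v_lin"],  # DistilBERT attention projections
    lora_dropout=0.1,
    task_type="SEQ_CLS",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights
```

The trainable-parameter count printed at the end is the practical point: only the adapter matrices are updated, so the approach fits low-resource-language experiments on modest GPUs.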
1/16/2024: TIES-Merging
mixtral-8x7b nous-hermes-2 frankendpo-4x7b-bf16 thebloke hugging-face nous-research togethercompute oak-ridge-national-laboratory vast-ai runpod mixture-of-experts random-gate-routing quantization gptq exl2-quants reinforcement-learning-from-human-feedback supercomputing trillion-parameter-models ghost-attention model-fine-tuning reward-models sanjiwatsuki superking__ mrdragonfox _dampf kaltcit rombodawg technotech
TheBloke's Discord community is actively discussing Mixture of Experts (MoE) models, focusing on random gate routing layers: as one member put it, a "random gate layer is good for training but not for immediate use" (a minimal illustration of such a layer appears below). There is a robust debate on quantization methods comparing GPTQ and EXL2 quants, with the observation that "EXL2 might offer faster execution on specialized hardware." A new model, Nous Hermes 2, based on Mixtral 8x7B and trained with RLHF, claims benchmark superiority but shows some inconsistencies. The Frontier supercomputer at Oak Ridge National Laboratory is highlighted for training a trillion-parameter LLM with 14TB of RAM, sparking discussion on open-sourcing government-funded AI research. Finally, the application of ghost attention in the academicat model drew mixed reactions from the community.
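To make the "random gate" idea concrete, here is a rough sketch (our illustration, not code from the discussion) of an MoE feed-forward layer that routes each token to a uniformly random expert; the class name, expert architecture, and sizes are assumptions.

```python
# Hypothetical random-gate MoE layer: tokens are dispatched to experts at
# random rather than through a learned router. This can help all experts
# receive gradient signal during training, but it is not a useful routing
# strategy for inference ("good for training but not for immediate use").
import torch
import torch.nn as nn

class RandomGateMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, hidden: int = 2048):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) -> flatten to one row per token
        batch, seq, dim = x.shape
        flat = x.reshape(-1, dim)
        # Uniformly random expert assignment per token (the "random gate")
        expert_ids = torch.randint(self.num_experts, (flat.size(0),), device=x.device)
        out = torch.zeros_like(flat)
        for i, expert in enumerate(self.experts):
            mask = expert_ids == i
            if mask.any():
                out[mask] = expert(flat[mask])
        return out.reshape(batch, seq, dim)
```

A learned gating network would replace the `torch.randint` call at inference time, which is why members framed random gating as a training-only trick.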