Topic: "trillion-parameter-models"
1/16/2024: TIES-Merging
mixtral-8x7b nous-hermes-2 frankendpo-4x7b-bf16 thebloke hugging-face nous-research togethercompute oak-ridge-national-laboratory vast-ai runpod mixture-of-experts random-gate-routing quantization gptq exl2-quants reinforcement-learning-from-human-feedback supercomputing trillion-parameter-models ghost-attention model-fine-tuning reward-models sanjiwatsuki superking__ mrdragonfox _dampf kaltcit rombodawg technotech
TheBloke's Discord community actively discusses Mixture of Experts (MoE) models, focusing on random gate routing layers for training and the challenges of using such models immediately. There is a robust debate on quantization methods comparing GPTQ and EXL2 quants, with EXL2 noted for faster execution on specialized hardware. A new model, Nous Hermes 2, based on Mixtral 8x7B and trained with RLHF, claims benchmark superiority but shows some inconsistencies. The Frontier supercomputer at Oak Ridge National Laboratory is highlighted for training a trillion-parameter LLM with 14TB of RAM, sparking discussions on open-sourcing government-funded AI research. Additionally, the application of ghost attention in the academicat model is explored, with mixed reactions from the community. Key insights shared include "Random gate layer is good for training but not for immediate use" and "EXL2 might offer faster execution on specialized hardware."
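As a rough illustration of the random-gate idea discussed above, here is a minimal PyTorch sketch of an MoE layer that routes each token to experts chosen uniformly at random instead of by a learned gate. The expert architecture, expert count, and top-k value are assumptions for illustration, not details from the Discord discussion.

```python
# Minimal sketch of a random-gate MoE layer (illustrative only).
# Tokens are routed to k experts chosen uniformly at random, which spreads
# gradient signal across experts during training; a learned gate would
# normally replace the random choice for actual inference.
import torch
import torch.nn as nn

class RandomGateMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.n_experts = n_experts
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        batch, seq, d = x.shape
        flat = x.reshape(-1, d)
        out = torch.zeros_like(flat)
        # Random routing: pick top_k distinct experts per token from random scores,
        # and weight each expert's output equally (1 / top_k).
        scores = torch.rand(flat.size(0), self.n_experts, device=x.device)
        idx = scores.topk(self.top_k, dim=-1).indices
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += expert(flat[mask]) / self.top_k
        return out.reshape(batch, seq, d)

if __name__ == "__main__":
    layer = RandomGateMoE(d_model=64)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```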
1/9/2024: Nous Research lands $5m for Open Source AI
qlora phi-3 mixtral ollama nous-research openai rabbit-tech context-window fine-tuning synthetic-data activation-beacon transformer-architecture seed-financing real-time-voice-agents trillion-parameter-models kenakafrosty _stilic_ teknium
Nous Research announced $5.2 million in seed financing focused on Nous-Forge, which aims to embed transformer architecture into chips for powerful servers supporting real-time voice agents and trillion-parameter models. Rabbit R1 launched a demo at CES to mixed reactions. OpenAI shipped the GPT Store and briefly leaked an upcoming personalization feature. A new paper on Activation Beacon proposes a way to significantly extend LLMs' context window, with code to be released on GitHub. Discussions also covered QLoRA, fine-tuning, synthetic data, and custom architectures for LLMs.
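For context on the QLoRA discussion, a hedged sketch of a typical 4-bit QLoRA fine-tuning setup with Hugging Face transformers and peft follows; the base model name, target modules, and hyperparameters are illustrative assumptions, not details taken from the channel.

```python
# Illustrative QLoRA setup (assumed example, not from the discussion summarized above):
# load a base model with 4-bit NF4 quantization and attach low-rank LoRA adapters,
# so only the small adapter weights are trained while the quantized base stays frozen.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

base_model = "mistralai/Mistral-7B-v0.1"  # assumed example model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, as in the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # do matmuls in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```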