Model: "miqu-1-70b"
Miqu confirmed to be an early Mistral-medium checkpoint
miqu-1-70b mistral-medium llama-2-70b-chat mixtral sqlcoder-70b codellama-70b bagelmistery-tour-v2 psyfighter-v2 mistral-ai hugging-face nous-research aiatmeta instruction-following sampling-methods fp16-quantization fine-tuning model-training context-length text-to-sql model-performance model-optimization intrstllrninja
Miqu, an open-access model, scores 74 on MMLU and 84.5 on EQ-Bench, sparking debate about how it compares to Mistral Medium; Mistral's CEO has since confirmed it is an early Mistral Medium checkpoint. Discussions in the TheBloke Discord highlight Miqu's strength in instruction following and in sampling methods like dynatemp and min-p, alongside lighter threads on browser preferences and Discord UI themes. Role-playing with models like BagelMistery Tour v2 and Psyfighter v2 is popular, as are technical talks on fp16 quantization of miqu-1-70b. Tips for training and fine-tuning with tools like Unsloth and models like Mistral 7B are shared. In the Nous Research AI Discord, the Activation Beacon method is discussed for extending LLM context length from 4K to 400K tokens, and SQLCoder-70B, fine-tuned from CodeLlama-70B, leads in text-to-SQL generation and is available on Hugging Face. There, Miqu also impresses with a reported 83.5 EQ-Bench score, fueling further speculation about its capabilities.
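For readers unfamiliar with the min-p sampling mentioned above, here is a minimal sketch of the idea in PyTorch: a token is kept only if its probability is at least a fraction (min_p) of the most likely token's probability, and sampling proceeds over the renormalized survivors. This is an illustrative implementation, not the code used in the Discord discussions; the function name and default values are assumptions.

```python
import torch

def min_p_sample(logits: torch.Tensor, min_p: float = 0.05, temperature: float = 1.0) -> int:
    """Illustrative min-p sampling sketch: keep only tokens whose probability is
    at least min_p times that of the most likely token, then sample from the
    renormalized distribution."""
    probs = torch.softmax(logits / temperature, dim=-1)
    threshold = min_p * probs.max()
    keep = probs >= threshold
    filtered = torch.where(keep, probs, torch.zeros_like(probs))
    filtered = filtered / filtered.sum()
    return int(torch.multinomial(filtered, num_samples=1).item())

# Example: sample the next token id from a dummy logit vector.
next_token = min_p_sample(torch.randn(32000))
```

Compared with plain top-p, the cutoff here scales with the model's confidence, which is why the method is often discussed alongside dynamic-temperature (dynatemp) sampling.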
RWKV "Eagle" v5: Your move, Mamba
rwkv-v5 mistral-7b miqu-1-70b mistral-medium llama-2 mistral-instruct-v0.2 mistral-tuna llama-2-13b kunoichi-dpo-v2-7b gpt-4 eleutherai mistral-ai hugging-face llamaindex nous-research rwkv lmsys fine-tuning multilinguality rotary-position-embedding model-optimization model-performance quantization speed-optimization prompt-engineering model-benchmarking reinforcement-learning andrej-karpathy
RWKV v5 "Eagle" was released with evaluation results that beat Mistral 7B, trading some English performance for broader multilingual capability. The mysterious miqu-1-70b model sparked debate about its origins, possibly a leak or distillation of Mistral Medium, or a fine-tuned Llama 2. Discussions highlighted fine-tuning techniques, including the effectiveness of 1,000 high-quality prompts over larger mixed-quality datasets, and tooling such as DeepSpeed, Axolotl, and QLoRA. The Nous Research AI community emphasized the impact of Rotary Position Embedding (RoPE) theta settings on context-length extrapolation, improving models like Mistral Instruct v0.2. Speed improvements in the Mistral Tuna kernels reduced per-token processing costs. The launch of Eagle 7B, with 7.52B parameters, showcased strong multilingual performance, surpassing other 7B-class models.
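As context for the RoPE theta discussion above, the sketch below shows where the theta hyperparameter enters the standard rotary-embedding frequency table; raising theta stretches the rotation wavelengths, which is the knob people tune for longer-context extrapolation. This is a generic illustration under the usual RoPE formula, not the specific recipe discussed in the Nous Research community; the function name and example values are assumptions.

```python
import torch

def rope_angles(head_dim: int, max_positions: int, theta: float = 10000.0) -> torch.Tensor:
    """Standard RoPE angle table: angle[pos, i] = pos / theta**(2i / head_dim).
    Larger theta means longer wavelengths per dimension, the setting discussed
    for improving long-context extrapolation. (Generic sketch.)"""
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    positions = torch.arange(max_positions, dtype=torch.float32)
    return torch.outer(positions, inv_freq)  # shape: [max_positions, head_dim // 2]

# Example: compare the common default theta with a larger value used for longer contexts.
base = rope_angles(head_dim=128, max_positions=8192)                    # theta = 10_000
stretched = rope_angles(head_dim=128, max_positions=8192, theta=1_000_000)
```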