RWKV "Eagle" v5: Your move, Mamba
rwkv-v5 mistral-7b miqu-1-70b mistral-medium llama-2 mistral-instruct-v0.2 mistral-tuna llama-2-13b kunoichi-dpo-v2-7b gpt-4 eleutherai mistral-ai hugging-face llamaindex nous-research rwkv lmsys fine-tuning multilinguality rotary-position-embedding model-optimization model-performance quantization speed-optimization prompt-engineering model-benchmarking reinforcement-learning andrej-karpathy
RWKV v5 "Eagle" was released with evaluation results beating Mistral-7B, trading some English performance for multilingual capability; the Eagle 7B checkpoint, at 7.52B parameters, surpasses other 7B-class models in multilingual performance. The mysterious miqu-1-70b model sparked debate over its origins: possibly a leak or distillation of Mistral Medium, or a fine-tuned Llama 2. Fine-tuning discussions highlighted the effectiveness of roughly 1,000 high-quality prompts over larger mixed-quality datasets, along with tools like DeepSpeed, Axolotl, and QLoRA. The Nous Research AI community emphasized the impact of Rotary Position Embedding (RoPE) theta settings on LLM context extrapolation, improving models like Mistral Instruct v0.2. Speed improvements in the Mistral "Tuna" kernels cut per-token processing costs.
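A minimal sketch of the RoPE-theta knob referenced above, assuming the standard Hugging Face transformers MistralConfig; the override value is illustrative and not from the newsletter:

```python
# Sketch (not from the newsletter): overriding RoPE theta when loading Mistral
# with Hugging Face transformers. A larger rope_theta stretches the rotary
# embedding wavelengths, the setting credited in the discussion with better
# long-context extrapolation (Mistral Instruct v0.2 ships with rope_theta=1e6
# versus the older 1e4 default).
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "mistralai/Mistral-7B-Instruct-v0.2"

config = AutoConfig.from_pretrained(model_id)
print("shipped rope_theta:", config.rope_theta)

# Hypothetical experiment: raise theta further before loading the weights.
config.rope_theta = 5_000_000.0
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```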
Adept Fuyu-Heavy: Multimodal model for Agents
fuyu-heavy fuyu-8b gemini-pro claude-2 gpt4v gemini-ultra deepseek-coder-33b yi-34b-200k goliath-120b mistral-7b-instruct-v0.2 mamba rwkv adept hugging-face deepseek mistral-ai nous-research multimodality visual-question-answering direct-preference-optimization benchmarking model-size-estimation quantization model-merging fine-tuning instruct-tuning rms-optimization heterogeneous-ai-architectures recurrent-llms contrastive-preference-optimization
Adept launched Fuyu-Heavy, a multimodal model focused on UI understanding and visual QA that outperforms Gemini Pro on the MMMU benchmark. The model uses DPO (Direct Preference Optimization), which is gaining attention as a leading tuning method. Fuyu-Heavy's size is undisclosed but estimated at between 20B and 170B parameters, smaller than rumored frontier models like Claude 2, GPT-4V, and Gemini Ultra. Meanwhile, the Mamba paper was rejected at ICLR over quality concerns. In Discord discussions, DeepSeek Coder 33B was claimed to outperform GPT-4 on coding tasks, and deployment strategies for large models like Yi-34B-200K and Goliath-120B were explored. Quantization debates surfaced mixed views on Q8 and EXL2 quants. Fine-tuning and instruct-tuning of Mistral 7B Instruct v0.2 were discussed, alongside insights on RMS optimization and heterogeneous AI architectures combining Transformers with Selective SSMs (Mamba). The potential of recurrent LLMs like RWKV and techniques like Contrastive Preference Optimization (CPO) were also noted.
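For readers unfamiliar with the DPO objective mentioned above, here is a minimal sketch of the loss from the original DPO paper (Rafailov et al., 2023), written in plain PyTorch; it is not Adept's implementation, and the function name and batch values are illustrative:

```python
# Sketch of the DPO loss: push the tuned policy to prefer the chosen response
# over the rejected one, measured relative to a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Each argument is a batch of summed sequence log-probs log pi(y|x).

    beta controls how strongly the policy may deviate from the reference.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Illustrative call with dummy log-probs for a batch of 4 preference pairs.
base = torch.randn(4)
loss = dpo_loss(base - 0.1, base - 0.9, base - 0.2, base - 0.8)
print(loss.item())
```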