Model: "mistral-instruct-v0.2"
RWKV "Eagle" v5: Your move, Mamba
rwkv-v5 mistral-7b miqu-1-70b mistral-medium llama-2 mistral-instruct-v0.2 mistral-tuna llama-2-13b kunoichi-dpo-v2-7b gpt-4 eleutherai mistral-ai hugging-face llamaindex nous-research rwkv lmsys fine-tuning multilinguality rotary-position-embedding model-optimization model-performance quantization speed-optimization prompt-engineering model-benchmarking reinforcement-learning andrej-karpathy
RWKV v5 Eagle was released with evaluation results better than those of Mistral 7B, trading some English performance for multilingual capability. The mysterious miqu-1-70b model sparked debate about its origins: it may be a leak or distillation of Mistral Medium, or a fine-tuned Llama 2. Fine-tuning discussions covered the finding that 1,000 high-quality prompts can be more effective than larger mixed-quality datasets, along with tools such as DeepSpeed, Axolotl, and QLoRA. The Nous Research AI community emphasized the impact of Rotary Position Embedding (RoPE) theta settings on LLM extrapolation, which improved models like Mistral Instruct v0.2. Speed improvements in Mistral Tuna kernels reduced token processing costs. Eagle 7B, launched with 7.52B parameters, showed strong multilingual performance, surpassing other 7B-class models.
GPT4Turbo A/B Test: gpt-4-1106-preview
gpt-4-turbo gpt-4 gpt-3.5 openhermes-2.5-mistral-7b-4.0bpw exllamav2 llama-2-7b-chat mistral-instruct-v0.2 mistrallite llama2 openai huggingface thebloke nous-research mistral-ai langchain microsoft azure model-loading rhel dataset-generation llm-on-consoles fine-tuning speed-optimization api-performance prompt-engineering token-limits memory-constraints text-generation nlp-tools context-window-extension sliding-windows rope-theta non-finetuning-context-extension societal-impact
OpenAI released a new GPT-4 Turbo version, prompting a natural experiment in summarization that compares the November 2023 and January 2024 versions. The TheBloke Discord covered troubleshooting model loading errors with OpenHermes-2.5-Mistral-7B-4.0bpw and exllamav2, the role of RHEL in ML, dataset generation for understanding GPT flaws, and running LLMs like Llama and Mistral on consoles. LangChain fine-tuning challenges for Llama 2 were also noted. The OpenAI Discord highlighted GPT-4 speed inconsistencies, API versus web performance, prompt engineering with GPT-3.5 and GPT-4 Turbo, and DALL-E typo issues in image text. Discussions also covered NLP tools like semantic-text-splitter and collaboration concerns with GPT-4 Vision on Azure. The Nous Research AI Discord focused on extending context windows, with Mistral Instruct v0.2, MistralLite, and LLaMA-2-7B-Chat reaching a 16,384-token context, plus alternatives like SelfExtend that extend context without fine-tuning. The societal impact of AI technology was also considered.