All tags
Model: "exllamav2"
Pixtral Large (124B) beats Llama 3.2 90B with updated Mistral Large 24.11
pixtral-large mistral-large-24.11 llama-3-2 qwen2.5-7b-instruct-abliterated-v2-gguf qwen2.5-32b-q3_k_m vllm llama-cpp exllamav2 tabbyapi mistral-ai sambanova nvidia multimodality vision model-updates chatbots inference gpu-optimization quantization performance concurrency kv-cache arthur-mensch
Mistral released Pixtral Large, which pairs a 1B-parameter vision encoder with the 123B-parameter Mistral Large, alongside a Mistral Large 24.11 update that brings no major new features. Pixtral Large outperforms Llama 3.2 90B on multimodal benchmarks despite its smaller vision adapter. Mistral's Le Chat chatbot received a comprehensive feature update, which Arthur Mensch framed as the company balancing product and research. Sponsor SambaNova pitches its RDUs as offering faster inference than GPUs. On Reddit, users report strong concurrency from vLLM on an RTX 3090, though FP8 KV-cache quantization caused quality issues; llama.cpp with a Q8 KV cache fared better. Users also weigh performance trade-offs among vLLM, exllamav2, and TabbyAPI for different model sizes and batching strategies.
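The KV-cache precision trade-off raised in that thread is ultimately a memory question: cutting the bytes per cached value roughly doubles how many concurrent sequences fit on a 24 GB card like the RTX 3090. A back-of-envelope sketch, using illustrative 7B-class model dimensions (32 layers, 32 KV heads, head dim 128) that are assumptions, not figures from the issue:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_val):
    # Each layer caches a K and a V tensor: 2 * heads * head_dim * seq_len values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_val

# Hypothetical 7B-class dimensions, chosen only for illustration.
layers, kv_heads, hdim, ctx = 32, 32, 128, 4096

fp16 = kv_cache_bytes(layers, kv_heads, hdim, ctx, 2)  # 2 bytes per value
q8   = kv_cache_bytes(layers, kv_heads, hdim, ctx, 1)  # ~1 byte per value (Q8/FP8)

print(f"FP16 KV cache per 4k-token sequence:  {fp16 / 2**30:.2f} GiB")
print(f"8-bit KV cache per 4k-token sequence: {q8 / 2**30:.2f} GiB")
```

At these sizes the 8-bit cache holds twice as many in-flight sequences in the same VRAM budget, which is why cache quantization matters so much for batched serving even when weight quantization is already in place.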
GPT4Turbo A/B Test: gpt-4-1106-preview
gpt-4-turbo gpt-4 gpt-3.5 openhermes-2.5-mistral-7b-4.0bpw exllamav2 llama-2-7b-chat mistral-instruct-v0.2 mistrallite llama2 openai huggingface thebloke nous-research mistral-ai langchain microsoft azure model-loading rhel dataset-generation llm-on-consoles fine-tuning speed-optimization api-performance prompt-engineering token-limits memory-constraints text-generation nlp-tools context-window-extension sliding-windows rope-theta non-finetuning-context-extension societal-impact
OpenAI released a new GPT-4 Turbo version, prompting a natural summarization experiment comparing the November 2023 and January 2024 checkpoints. The TheBloke Discord covered troubleshooting model-loading errors with OpenHermes-2.5-Mistral-7B-4.0bpw on exllamav2, debates over RHEL for ML work, dataset generation for probing GPT flaws, and running LLMs like Llama and Mistral on consoles; LangChain fine-tuning challenges for Llama2 also came up. The OpenAI Discord highlighted GPT-4 speed inconsistencies, API vs. web performance gaps, prompt engineering with GPT-3.5 and GPT-4 Turbo, and DALL-E typos in generated image text, along with NLP tools like semantic-text-splitter and collaboration concerns around GPT-4 Vision on Azure. The Nous Research AI Discord focused on extending context windows, with Mistral instruct v0.2, MistralLite, and LLaMA-2-7B-Chat reaching a 16,384-token context, plus alternatives like SelfExtend for context extension without fine-tuning. The societal impact of AI technology was also considered.
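One non-finetuning route to longer context implied by the tags (rope-theta) is raising RoPE's base `theta`, which shrinks the per-position rotation frequencies so the same weights generalize to longer sequences; SelfExtend takes a different route, remapping position indices instead. A minimal sketch of the RoPE frequency schedule, where the scaled theta value is a hypothetical choice for illustration:

```python
def rope_freqs(head_dim, theta):
    # Per-dimension-pair rotation frequencies used by RoPE:
    # freq_i = theta ** (-2i / head_dim), for i in [0, head_dim/2).
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

base   = rope_freqs(128, 10_000.0)     # common default base
scaled = rope_freqs(128, 1_000_000.0)  # raised theta (illustrative value)

# Raising theta lowers every frequency past the first, so positions rotate
# more slowly and a longer sequence fits inside the same rotation range.
assert all(s < b for s, b in zip(scaled[1:], base[1:]))
print(f"lowest freq: base={base[-1]:.2e}, scaled={scaled[-1]:.2e}")
```

The trade-off is that slower rotation also coarsens positional resolution at short range, which is one reason theta-scaled models are often lightly finetuned afterward even though the trick works zero-shot.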