Person: "carsonpoole"
1/16/2024: ArtificialAnalysis - a new model/host benchmark site
mixtral hermes-2-mixtral openchat-7b byte-mistral nous-research nvidia hugging-face summarization fine-tuning byte-level-tokenization multimodality inference-speed-optimization dataset-sharing quantization swyx gabriel_syme manojbh carsonpoole fullstack6209
Artificial Analysis launched a new model and host comparison site, highlighted by swyx. The Nous Research AI Discord discussed summarization techniques for processing around 100k tokens on NVIDIA 3090 and 2080 Ti GPUs, and adapting prompts for smaller models like OpenChat 7B. The availability of Hermes 2 Mixtral on Hugging Face's HuggingChat was noted, alongside challenges fine-tuning Mixtral with Axolotl. Other topics included byte-level tokenization experiments with Byte Mistral, multimodal training on COCO image bytes, and inference speed improvements using vLLM and llama.cpp. Members called for transparency in data sharing and for open-sourcing the Hermes 2 Mixtral dataset, compared DPO and SFT methods, and discussed running quantized LLMs on an M1 MacBook Pro.
12/25/2023: Nous Hermes 2 Yi 34B for Christmas
nous-hermes-2 yi-34b nucleusx yayi-2 ferret teknim nous-research apple mixtral deepseek qwen huggingface wenge-technology quantization model-optimization throughput-metrics batch-processing parallel-decoding tensor-parallelization multimodality language-model-pretraining model-benchmarking teknium carsonpoole casper_ai pradeep1148 osanseviero metaldragon01
Teknium released Nous Hermes 2 on Yi 34B, positioning it as a top open model compared to Mixtral, DeepSeek, and Qwen. Apple introduced Ferret, a new open-source multimodal LLM. Discussions in the Nous Research AI Discord focused on model optimization and on quantization techniques and tools such as AWQ, GPTQ, and AutoAWQ, with insights on proprietary optimization and throughput metrics. Additional highlights include the addition of the NucleusX model (a 30B model scoring 80 on MMLU) to transformers, and the YAYI 2 language model by Wenge Technology, trained on 2.65 trillion tokens. "AutoAWQ outperforms vLLM up to batch size 8" was noted, and proprietary parallel decoding and tensor parallelization across GPUs were discussed as ways to improve speed.