DBRX: Best open model (just not most efficient)
dbrx grok mixtral llama-2 mpt-7b gpt-4 databricks hugging-face mistral-ai mosaicml openai mixture-of-experts model-efficiency tokenization model-training code-generation model-architecture open-source-models benchmarking fine-tuning
Databricks Mosaic has released DBRX, a new open-source model that outperforms Grok, Mixtral, and Llama 2 on evaluations while being about 2x more efficient than Llama 2 and Grok. The model was trained on 12 trillion tokens using 3,000 H100 GPUs over 2 months, with an estimated compute cost of $10 million. It uses OpenAI's 100k-vocabulary tiktoken tokenizer and shows strong zero-shot code generation, even beating GPT-4 on the HumanEval benchmark. The team also upstreamed work to the open-source MegaBlocks library. Despite its scale and efficiency, DBRX's MMLU score is only slightly better than Mixtral's, raising questions about its scaling efficiency. The focus of DBRX is on enabling users to train models efficiently: MoE training is about 2x more FLOP-efficient than dense training, achieving similar quality with nearly 4x less compute than the previous MPT models. This release is part of the ongoing competition for open-source AI leadership, alongside models like Dolly, MPT, and Mistral. "If it activates 36B params, the model's perf should be equivalent to a 72B dense model or even 80B," says Qwen's tech lead.
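A quick back-of-the-envelope sketch of why MoE training is cheaper per token: only the top-k experts run for each token, so per-token FLOPs scale with *active* rather than *total* parameters. The figures below are DBRX's reported configuration (132B total, 36B active, 16 experts with top-4 routing); treat them as approximate.

```python
# Why a mixture-of-experts (MoE) model is cheaper per token than a dense
# model of the same total size: only the top-k experts run per token.
TOTAL_PARAMS_B = 132   # DBRX's reported total parameters, billions
ACTIVE_PARAMS_B = 36   # DBRX's reported active parameters per token
EXPERTS = 16           # experts per MoE layer (top-4 routing)

# Roughly, forward-pass FLOPs per token ~ 2 * (active parameters).
dense_flops_per_token = 2 * TOTAL_PARAMS_B * 1e9
moe_flops_per_token = 2 * ACTIVE_PARAMS_B * 1e9

print(f"dense : {dense_flops_per_token:.2e} FLOPs/token")
print(f"MoE   : {moe_flops_per_token:.2e} FLOPs/token")
print(f"ratio : {dense_flops_per_token / moe_flops_per_token:.1f}x cheaper per token")
```

This is the arithmetic behind the Qwen tech lead's remark: quality tracks something between active and total parameters, while cost tracks active parameters only.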
Google AI: Win some (Gemma, 1.5 Pro), Lose some (Image gen)
gemma-2b gemma-7b gemma gemini-pro-1.5 llama-2 llama-3 mistral google hugging-face nvidia benchmarking license-policies image-generation video-understanding long-context dataset-editing model-integration gpu-hardware bug-fixes quantization
Google's Gemma open models (2-7B parameters) outperform Llama 2 and Mistral in benchmarks but face criticism for an unusual license and poor image generation quality, which Google partially acknowledges. The upcoming Gemini Pro 1.5 model features a 1 million token context window, excelling in video understanding and needle-in-haystack tasks. Discord communities like TheBloke and LM Studio discuss mixed reception of Gemma models, anticipation for Llama 3 release, challenges in dataset editing, and hardware considerations such as NVIDIA GeForce RTX 3090 and RTX 4090 GPUs. LM Studio users report issues with version 0.2.15 Beta and ongoing integration of Gemma models, with resources shared on Hugging Face.
Karpathy emerges from stealth?
mistral-7b mixtral-8x7b zephyr-7b gpt-4 llama-2 intel mistral-ai audiogen thebloke tokenization quantization model-optimization fine-tuning model-merging computational-efficiency memory-optimization retrieval-augmented-generation multi-model-learning meta-reasoning dataset-sharing open-source ethical-ai community-collaboration andrej-karpathy
Andrej Karpathy released a comprehensive 2-hour tutorial on tokenization, detailing techniques up to GPT-4's tokenizer and noting the complexity of Llama 2's SentencePiece tokenization. Discussions in AI Discord communities covered model optimization and efficiency, focusing on quantization of models like Mistral 7B and Zephyr-7B to reduce memory usage on consumer GPUs, including Intel's new weight-only quantization algorithm. Efforts to improve computational efficiency included selective augmentation reducing costs by 57.76% and comparisons of memory tokens versus kNN retrieval for Transformers. Challenges in hardware compatibility and software issues were shared, alongside fine-tuning techniques such as LoRA and model merging. Innovative applications of LLMs in retrieval-augmented generation (RAG), multi-model learning, and meta-reasoning were explored. The community emphasized dataset sharing, open-source releases like SDXL VAE encoded datasets and Audiogen AI codecs, and ethical AI use with censorship and guardrails. Collaboration and resource sharing remain strong in these AI communities.
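The core of the tutorial is byte-pair encoding (BPE): repeatedly find the most frequent adjacent token pair and merge it into a new token. A minimal illustrative sketch (not GPT-4's actual tokenizer) in pure Python:

```python
# Minimal byte-pair-encoding (BPE) training loop, in the spirit of the
# tokenizers covered in Karpathy's tutorial. Illustrative sketch only.

def get_stats(ids):
    """Count occurrences of each adjacent pair of token ids."""
    counts = {}
    for a, b in zip(ids, ids[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

text = "aaabdaaabac"
ids = list(text.encode("utf-8"))  # start from raw bytes
merges = {}
for step in range(3):  # learn 3 merges
    stats = get_stats(ids)
    pair = max(stats, key=stats.get)  # most frequent adjacent pair
    new_id = 256 + step               # new ids start after the 256 byte values
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id

print(ids)  # compressed token sequence, shorter than the 11 input bytes
```

Production tokenizers add a regex pre-split, special tokens, and a learned vocabulary of tens of thousands of merges, but the loop above is the whole algorithm.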
MetaVoice & RIP Bard
mixtral nous-mixtral-dpo miqu-70b gpt-4 llama-2 llama-2-70b llama-2-70b-instruct coqui metavoice google openai thebloke text-to-speech voice-cloning longform-synthesis prompt-engineering direct-preference-optimization lora-fine-tuning transformers gpu-acceleration apple-silicon content-authenticity metadata ai-censorship open-source-ai model-comparison usability model-limitations
The recent shutdown of TTS startup Coqui inspired a new TTS model from a small startup called MetaVoice, supporting voice cloning and longform synthesis. Google discontinued the Bard brand in favor of Gemini. On TheBloke Discord, discussions focused on AI training with models like Mixtral, Nous Mixtral DPO, and Miqu 70B, comparing them to OpenAI's GPT models, and debated prompt engineering, lorebooks, and removing safety features via LoRA fine-tuning on models such as Llama 2 70B Instruct. Technical topics included transformer layer offloading limitations and adapting Llama 2 for Apple Silicon. On OpenAI Discord, DALL-E images now include C2PA metadata for content authenticity, sparking debates on AI censorship, metadata manipulation, and open-source AI models versus commercial giants like GPT-4. Users discussed GPT-4 usability, limitations, and practical applications.
RWKV "Eagle" v5: Your move, Mamba
rwkv-v5 mistral-7b miqu-1-70b mistral-medium llama-2 mistral-instruct-v0.2 mistral-tuna llama-2-13b kunoichi-dpo-v2-7b gpt-4 eleutherai mistral-ai hugging-face llamaindex nous-research rwkv lmsys fine-tuning multilinguality rotary-position-embedding model-optimization model-performance quantization speed-optimization prompt-engineering model-benchmarking reinforcement-learning andrej-karpathy
RWKV v5 Eagle was released with evaluation results that beat Mistral 7B, trading some English performance for multilingual capability. The mysterious miqu-1-70b model sparked debate about its origins: possibly a leak or distillation of Mistral Medium, or a fine-tuned Llama 2. Discussions highlighted fine-tuning techniques, including the effectiveness of 1,000 high-quality prompts over larger mixed-quality datasets, and tools like DeepSpeed, Axolotl, and QLoRA. The Nous Research AI community emphasized the impact of Rotary Position Embedding (RoPE) theta settings on LLM extrapolation, improving models like Mistral Instruct v0.2. Speed improvements in Mistral Tuna kernels reduced token processing costs, enhancing efficiency. The launch of Eagle 7B with 7.52B parameters showcased strong multilingual performance, surpassing other 7B-class models.
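The RoPE theta setting mentioned above controls how fast each dimension pair of a query/key rotates with position; raising theta slows the lowest frequencies, which is the usual lever for stretching usable context. A small sketch under assumed values (head dimension 128; the theta values are illustrative, not any particular model's config):

```python
# How RoPE's theta base changes rotation frequencies. Raising theta slows
# the lowest frequencies, extending the position range the model can
# distinguish. Values here are illustrative assumptions.
import math

def rope_inv_freqs(dim, theta):
    """Per-pair inverse frequencies: 1 / theta^(2i/dim), for i in 0..dim/2-1."""
    return [theta ** (-2 * i / dim) for i in range(dim // 2)]

dim = 128  # assumed head dimension
base = rope_inv_freqs(dim, 10_000.0)     # a common default base
long = rope_inv_freqs(dim, 1_000_000.0)  # raised theta for longer context

# The slowest-rotating pair completes one full turn after this many positions:
wavelength = lambda f: 2 * math.pi / f
print(f"theta=1e4: slowest wavelength ~ {wavelength(base[-1]):,.0f} positions")
print(f"theta=1e6: slowest wavelength ~ {wavelength(long[-1]):,.0f} positions")
```

With the larger theta, the slowest pair takes far more positions to wrap around, which is why raising theta (often with some continued training) helps extrapolation to longer sequences.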
12/14/2023: $1e7 for Superalignment
gemini bard gpt-4 gpt-4.5 llama-2 openai llamaindex perplexity-ai prompt-engineering api custom-gpt json bug-fixes chatbots performance tts code-generation image-recognition jan-leike patrick-collison
Jan Leike is launching a new grant initiative inspired by Patrick Collison's Fast Grants to support AI research. OpenAI introduced a new developer Twitter handle, @OpenAIDevs, for community updates. Discussions comparing Google's Gemini and Bard chatbots highlight their ability to read each other's instructions and offer unique coding solutions. Users reported various issues with GPT-4, including performance problems, customization difficulties, and a resolved bug in image recognition. There are ongoing conversations about prompt engineering challenges and new JSON mode support in Convo-lang for API use. Concerns about misuse of chatbots for illegal activities, and alternatives like Llama 2 models and the Perplexity chatbot, were also discussed.
12/13/2023 SOLAR10.7B upstages Mistral7B?
solar-10.7b llama-2 mistral-7b phi-2 gpt-4 gemini upstage nous-research openai mistral-ai microsoft depth-up-scaling pretraining synthetic-data gpu-training api-usage model-integration agi asi chat-models vision model-performance fine-tuning
Upstage released the SOLAR-10.7B model, which uses a novel Depth Up-Scaling technique built on the Llama 2 architecture, integrates Mistral 7B weights, and then undergoes continued pre-training. The Nous community finds it promising but not exceptional. Additionally, weights for the phi-2 base model were released, trained on 1.4 trillion tokens including synthetic texts created by GPT-3.5 and filtered by GPT-4, using 96 A100 GPUs over 14 days. On OpenAI's Discord, users discussed challenges with various GPT models, including incoherent outputs, API usage limitations, and issues with the GPT-4 Vision API. Conversations also covered understanding AGI and ASI, concerns about OpenAI's partnership with Axel Springer, and pricing changes for GPT Plus. Discussions included the Gemini chat model integrated into Bard and comparisons with GPT-4 performance.
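At the level of layer indices, Depth Up-Scaling is simple: duplicate the base model, drop the last m layers of one copy and the first m layers of the other, then stack the two. A sketch using the figures reported for SOLAR-10.7B (n=32 base layers, m=8 removed per copy, giving 48 layers); this manipulates indices only, not actual weights:

```python
# Sketch of SOLAR-style Depth Up-Scaling (DUS) on layer indices.
# With n=32 and m=8 this yields 2*(n - m) = 48 layers, the depth
# reported for SOLAR-10.7B. Indices only; no weights are copied here.

def depth_up_scale(n_layers, m):
    copy_a = list(range(n_layers))   # first copy of the base model
    copy_b = list(range(n_layers))   # second copy
    kept_a = copy_a[: n_layers - m]  # drop the last m layers of copy A
    kept_b = copy_b[m:]              # drop the first m layers of copy B
    return kept_a + kept_b           # stacked depth: 2 * (n_layers - m)

layers = depth_up_scale(32, 8)
print(len(layers))  # 48
```

The overlap (layers 8..23 appear in both halves) is what makes the seam between the two copies gentle enough for continued pre-training to heal.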