Model: "llama-2-70b"
a calm before the storm
o1 o1-mini qwen2.5 gpt-4 llama-2-70b llama-7b anthropic openai alibaba microsoft blackrock groq aramco disney eth-zurich pudu-robotics slack long-context kv-cache-quantization diffusion-models reinforcement-learning robotics ai-integration multilinguality model-benchmarking model-performance model-optimization adcock_brett philschmid rohanpaul_ai jvnixon kateclarktweets sama
Anthropic is raising funds at a valuation of up to $40 billion ahead of anticipated major releases. OpenAI launched new reasoning models o1 and o1-mini, with increased rate limits and a multilingual MMLU benchmark. Alibaba released the open-source Qwen2.5 model supporting 29+ languages, performing competitively with GPT-4 at lower cost. Microsoft and BlackRock plan to invest $30 billion in AI data centers, and Groq is partnering with Aramco to build the world's largest AI inference center. Robotics advances include Disney Research and ETH Zurich's diffusion-based motion generation for robots and Pudu Robotics' semi-humanoid robot. Slack and Microsoft introduced AI-powered agents integrated into their platforms. Research highlights include long-context scaling for Llama-2-70B using Dual Chunk Attention and KV cache quantization enabling a 1-million-token context on LLaMA-7B models.
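To see why KV cache quantization matters for a 1-million-token context, here is a rough back-of-the-envelope sketch. It assumes the standard LLaMA-7B shape (32 layers, hidden size 4096, full multi-head attention) and ignores quantization metadata such as scales and zero-points. The bit widths compared are illustrative assumptions, not figures from the research mentioned above, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(num_tokens, num_layers, hidden_size, bits_per_value):
    """Approximate KV cache size in bytes.

    Counts one key vector and one value vector of length hidden_size
    per layer per token. Quantization metadata (scales, zero-points)
    is ignored, so real figures would be somewhat higher.
    """
    values_per_token = 2 * num_layers * hidden_size  # keys + values
    return num_tokens * values_per_token * bits_per_value // 8


# Assumed LLaMA-7B shape: 32 layers, hidden size 4096.
LAYERS, HIDDEN = 32, 4096
TOKENS = 1_000_000

# Bit widths are illustrative assumptions, not taken from the source.
for bits in (16, 4, 2):
    gb = kv_cache_bytes(TOKENS, LAYERS, HIDDEN, bits) / 1e9
    print(f"{bits:>2}-bit KV cache at {TOKENS:,} tokens: {gb:.1f} GB")
```

Under these assumptions, an FP16 cache for one million tokens comes to roughly 524 GB, while a 2-bit cache is about 66 GB, which is the kind of reduction that brings such context lengths within reach of a single multi-GPU node.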
Not much happened today
gemini-1.5-flashmodel gemini-pro mixtral mamba-2 phi-3-medium phi-3-small gpt-3.5-turbo-0613 llama-3-8b llama-2-70b mistral-finetune twelve-labs livekit groq openai nea nvidia lmsys mistral-ai model-performance prompt-engineering data-curation ai-safety model-benchmarking model-optimization training sequence-models state-space-models daniel-kokotajlo rohanpaul_ai _arohan_ tri_dao _albertgu _philschmid sarahcat21 hamelhusain jachiam0 willdepue teknium1
Twelve Labs raised $50M in Series A funding co-led by NEA and NVIDIA's NVentures to advance multimodal AI. LiveKit secured $22M in funding. Groq announced running at 800k tokens/second. Daniel Kokotajlo resigned from OpenAI. Twitter users highlighted the Gemini 1.5 Flash model for high performance at low cost and Gemini Pro for ranking #2 in Japanese language tasks. Mixtral models can run up to 8x faster on NVIDIA RTX GPUs using TensorRT-LLM. The Mamba-2 architecture introduces state space duality, allowing larger states and faster training and outperforming previous models. Phi-3 Medium (14B) and Small (7B) models benchmark near GPT-3.5-Turbo-0613 and Llama 3 8B. Prompt engineering is emphasized for unlocking LLM capabilities. Data quality is critical for model performance, with upcoming masterclasses on data curation. Discussions on AI safety include a letter from frontier AI lab employees advocating whistleblower protections and debates on aligning AI to user intent versus the broader interests of humanity.
Llama-3-70b is GPT-4-level Open Model
llama-3-70b llama-3-8b llama-3 llama-2-70b mistral-7b grok-3 stable-diffusion-3 vasa-1 meta-ai-fair groq nvidia amazon microsoft benchmarking model-performance fine-tuning function-calling arithmetic image-generation video-generation energy-usage gpu-demand political-bias ai-safety scaling context-windows tokenization elon-musk
Meta has released Llama 3, its most capable open large language model, in 8B and 70B parameter versions that support 8K context length and outperform previous models including Llama 2 and Mistral 7B. Groq serves the Llama 3 70B model at 500-800 tokens/second, making it the fastest GPT-4-level token source. Discussions highlight AI scaling challenges, with Elon Musk stating that training Grok 3 will require 100,000 Nvidia H100 GPUs and AWS planning to acquire 20,000 B200 GPUs for a 27-trillion-parameter model. Microsoft unveiled VASA-1 for lifelike talking face generation, while Stable Diffusion 3 and its extensions received mixed impressions. Concerns about AI energy usage and political bias in AI were also discussed.
not much happened today
llama-2-70b llama-2-7b mistral-7b qwen-1.5 llava microsoft mistral-ai ollama fine-tuning synthetic-data retrieval-augmented-generation embeddings hardware-optimization performance-benchmarks model-memory multimodality
The Reddit community r/LocalLLaMA discusses fine-tuning and training LLMs, including tutorials and questions on training models with specific data such as dictionaries and synthetic datasets of 25B+ tokens. Users explore retrieval-augmented generation (RAG) challenges with models like Mistral-7B, as well as embedding generation for EEG brain activity. Discussions include hardware optimization for running Llama-2-70B locally under budget constraints and performance benchmarks for Qwen-1.5 models. There is interest in extending LLM capabilities, such as converting Llama-2-7B into a vision-capable model like LLaVA and improving model memory for longer context retention.
MetaVoice & RIP Bard
mixtral nous-mixtral-dpo miqu-70b gpt-4 llama-2-70b-instruct llama-2 llama-2-70b coqui metavoice google openai thebloke text-to-speech voice-cloning longform-synthesis prompt-engineering direct-preference-optimization lora-fine-tuning transformers gpu-acceleration apple-silicon content-authenticity metadata ai-censorship open-source-ai model-comparison usability model-limitations
Coqui, a TTS startup that recently shut down, inspired a new TTS model from a small startup called MetaVoice that supports voice cloning and longform synthesis. Google discontinued the Bard brand in favor of Gemini. On TheBloke Discord, discussions focused on AI training with models like Mixtral, Nous Mixtral DPO, and Miqu 70B, comparing them to OpenAI's GPT models. Members also debated prompt engineering, lorebooks, and removing safety features via LoRA fine-tuning on models such as Llama 2 70B Instruct. Technical topics included transformer layer offloading limitations and adapting Llama 2 for Apple Silicon. On OpenAI Discord, DALL-E images now include C2PA metadata for content authenticity, sparking debates on AI censorship, metadata manipulation, and open-source AI models versus commercial giants like GPT-4. Users discussed GPT-4 usability, limitations, and practical applications.
CodeLlama 70B beats GPT-4 on HumanEval
codellama miqu mistral-medium llama-2-70b aphrodite-engine mixtral flatdolphinmaid noromaid rpcal chatml mistral-7b activation-beacon eagle-7b rwkv-v5 openhermes2.5 nous-hermes-2-mixtral-8x7b-dpo imp-v1-3b bakllava moondream qwen-vl meta-ai-fair ollama nous-research mistral-ai hugging-face ai-ethics alignment gpu-optimization direct-prompt-optimization fine-tuning cuda-programming optimizer-technology quantization multimodality context-length dense-retrieval retrieval-augmented-generation multilinguality model-performance open-source code-generation classification vision
Meta AI surprised the community with the release of CodeLlama, an open-source model now available on platforms like Ollama and MLX for local use. The Miqu model sparked debate over its origins, possibly linked to Mistral Medium or a fine-tuned Llama-2-70B, alongside discussions on AI ethics and alignment risks. The Aphrodite engine showed strong performance on A6000 GPUs with specific configurations. Role-playing AI models such as Mixtral and Flatdolphinmaid struggled with repetitiveness, while Noromaid and Rpcal performed better, with ChatML and DPO recommended for improved responses. Learning resources like fast.ai's course were highlighted for ML/DL beginners, and fine-tuning techniques with optimizers like paged 8-bit Lion and Adafactor were discussed.
At Nous Research AI, the Activation Beacon project introduced a method for unlimited context length in LLMs using "global state" tokens, potentially transforming retrieval-augmented models. The Eagle-7B model, based on RWKV-v5, outperformed Mistral in benchmarks while offering efficiency and multilingual capabilities. OpenHermes2.5 was recommended for consumer hardware due to its quantization methods. Multimodal and domain-specific models like IMP v1-3b, Bakllava, Moondream, and Qwen-VL were explored for classification and vision-language tasks. The community emphasized centralizing AI resources for collaborative research.