Model: "llama-2-7b"
LLaDA: Large Language Diffusion Models
llada-8b llama-3-8b step-video-t2v-30b step-audio-chat-132b llama-2-7b stepfun-ai scale-ai cambridge llamaindex diffusion-models text-generation multimodality video-generation voice-processing benchmarking instruction-following model-scaling gpu-usage long-context multi-turn-dialogue arankomatsuzaki _akhaliq omarsar0 iscienceluvr gallabytes maximelabonne reach_vb
LLaDA (Large Language Diffusion Model) 8B is a breakthrough diffusion-based language model that rivals LLaMA 3 8B while training on roughly 7x fewer tokens (about 2 trillion) and 0.13 million H800 GPU hours. It introduces a novel text generation approach that predicts uniformly masked tokens through an iterative diffusion process, enabling multi-turn dialogue and instruction following. Separately, StepFun AI released two major models: Step-Video-T2V 30B, a text-to-video model generating up to 204 frames with high coherence and motion quality, and Step-Audio-Chat 132B, a voice-to-voice model. Meanwhile, new multimodal benchmarks, Scale AI's EnigmaEval and Cambridge's ZeroBench, are hard enough that current frontier models score at or near zero, underscoring the difficulty of these tasks. The community also noted the return of diffusion models to language modeling, a previously speculative architecture now scaled successfully.
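For intuition, here is a minimal, non-official sketch of the masked-diffusion generation loop described above: start from a fully masked sequence, predict every masked token, then re-mask the least-confident positions and repeat. `mask_predictor` and `MASK_ID` are hypothetical stand-ins, not LLaDA's actual sampler or vocabulary.

```python
import torch

MASK_ID = 0  # assumed placeholder id for the [MASK] token

def diffusion_generate(mask_predictor, seq_len=32, steps=8):
    tokens = torch.full((1, seq_len), MASK_ID)        # start fully masked
    for step in range(steps, 0, -1):
        logits = mask_predictor(tokens)               # (1, seq_len, vocab_size)
        probs = torch.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)                # per-position confidence + argmax token
        masked = tokens == MASK_ID
        tokens = torch.where(masked, pred, tokens)    # fill in masked positions
        # re-mask the least confident positions so later steps can revise them
        n_remask = int(seq_len * (step - 1) / steps)
        if n_remask > 0:
            remask_idx = conf.squeeze(0).argsort()[:n_remask]
            tokens[0, remask_idx] = MASK_ID
    return tokens
```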
Test-Time Training, MobileLLM, Lilian Weng on Hallucination (Plus: Turbopuffer)
llama-2-7b codegeex4-all-9b mamba facebook-research meta-ai-fair tsinghua-university hallucination-detection anti-hallucination-methods on-device-ai model-architecture rnn long-context-modeling model-scaling expressive-hidden-states code-generation lilian-weng yann-lecun
Lilian Weng released a comprehensive literature review of hallucination detection and anti-hallucination methods, covering techniques such as FactualityPrompt, SelfCheckGPT, and WebGPT. Facebook AI Research (FAIR) published MobileLLM, a sub-billion-parameter on-device language model architecture that achieves performance comparable to llama-2-7b on API-calling tasks through innovations like deep-and-thin model shapes and shared weights. A new RNN-based LLM architecture with expressive hidden states, from the test-time-training line of work, was introduced; it replaces attention and scales better than Mamba and Transformer models for long-context modeling. Additionally, Tsinghua University open-sourced CodeGeeX4-ALL-9B, a multilingual code generation model that excels at code assistance.
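As a rough illustration of the "expressive hidden states" idea behind the test-time-training line of work, the sketch below treats the hidden state as the weights of a small linear model that takes one gradient step per token on a self-supervised reconstruction loss; the shapes and the loss are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def ttt_linear_scan(x, d, lr=0.1):
    """x: (seq_len, d) token features; returns (seq_len, d) outputs."""
    W = torch.zeros(d, d)                      # hidden state = weights of a small linear model
    outputs = []
    for t in range(x.shape[0]):
        xt = x[t]                              # (d,)
        # inner-loop update: one gradient step on 0.5 * ||W xt - xt||^2
        pred = W @ xt
        grad = torch.outer(pred - xt, xt)      # gradient of the reconstruction loss w.r.t. W
        W = W - lr * grad
        outputs.append(W @ xt)                 # emit the output using the updated state
    return torch.stack(outputs)
```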
Mixtral 8x22B Instruct sparks efficiency memes
mixtral-8x22b llama-2-7b olmo-7b mistral-ai hugging-face google microsoft intel softbank nvidia multilinguality math code-generation context-window model-performance model-release retrieval-augmented-generation deepfake ai-investment ai-chip hybrid-architecture training-data guillaume-lample osanseviero _philschmid svpino
Mistral released an instruct-tuned version of its Mixtral 8x22B model, notable for using only 39B active parameters at inference while outperforming larger models, supporting 5 languages, handling a 64k-token context window, and showing strong math and code capabilities. The model is available on Hugging Face under an Apache 2.0 license for local use. Google plans to invest over $100 billion in AI, with other giants like Microsoft, Intel, and SoftBank also making large investments. The UK criminalized non-consensual deepfake porn, prompting debate over enforcement. A former Nvidia employee claims Nvidia's AI chip lead is unmatchable this decade. AI companions could become a $1 billion market. AI has surpassed humans on several basic tasks but still lags on complex ones. Zyphra introduced Zamba, a novel 7B-parameter hybrid model that outperforms LLaMA-2 7B and OLMo-7B with less training data, trained on 128 H100 GPUs over 30 days. The GroundX API advances retrieval-augmented generation accuracy.
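For local use, a minimal sketch of loading the instruct model with Hugging Face transformers is shown below; the repository id is an assumption (check the official Mixtral 8x22B Instruct listing), and the full model requires multiple high-memory GPUs.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x22B-Instruct-v0.1"  # assumed repo name; verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Solve: what is 17 * 23?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```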
not much happened today
llama-2-70b llama-2-7b mistral-7b qwen-1.5 llava microsoft mistral-ai ollama fine-tuning synthetic-data retrieval-augmented-generation embeddings hardware-optimization performance-benchmarks model-memory multimodality
The Reddit community /r/LocalLlama discusses fine-tuning and training LLMs, including tutorials and questions on training models on domain-specific data such as dictionaries and synthetic datasets of 25B+ tokens. Users explore retrieval-augmented generation (RAG) challenges with models like mistral-7b, as well as embedding generation for EEG brain activity. Discussions also cover hardware optimization for running llama-2-70b locally under budget constraints and performance benchmarks for qwen-1.5 models. There is interest in extending LLM capabilities, such as converting llama-2-7b into a vision-capable model like llava and improving model memory for longer context retention.
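A minimal sketch of the RAG loop under discussion, with `embed` and `generate` as hypothetical stand-ins for an embedding model and a local LLM such as mistral-7b:

```python
import numpy as np

def build_index(docs, embed):
    vecs = np.array([embed(d) for d in docs])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize once

def rag_answer(query, docs, index, embed, generate, k=3):
    q = embed(query)
    q = q / np.linalg.norm(q)
    scores = index @ q                          # cosine similarity against all documents
    top = np.argsort(scores)[::-1][:k]          # indices of the k best matches
    context = "\n\n".join(docs[i] for i in top)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```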
12/27/2023: NYT vs OpenAI
phi2 openhermes-2.5-mistral-7b llama-2-7b llama-2-13b microsoft-research mistral-ai apple amd model-performance fine-tuning llm-api gpu-optimization hardware-configuration multi-gpu inference-speed plugin-release conversation-history
The LM Studio Discord community extensively discussed model performance comparisons, notably between Phi2 from Microsoft Research and OpenHermes 2.5 Mistral 7B, with a focus on U.S. history knowledge and fine-tuning for improved accuracy. Technical challenges around LLM API usage, conversation history maintenance, and GPU optimization for inference speed were addressed. Hardware discussions covered DDR4 vs DDR5, multi-GPU setups, and the potential of Apple M1/M3 and AMD AI CPUs for AI workloads. The community also announced the ChromaDB Plugin v3.0.2 release, which enables image search in vector databases. Users shared practical tips on running multiple LM Studio instances and optimizing resource usage.
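As a sketch of the API-plus-conversation-history pattern raised in those threads, the snippet below talks to a local OpenAI-compatible server of the kind LM Studio exposes; the base URL, port, and model name are assumptions to adjust for your setup.

```python
from openai import OpenAI

# assumed default for LM Studio's local server; check the app's server settings
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
history = [{"role": "system", "content": "You are a concise assistant."}]

def chat(user_message, model="local-model"):
    history.append({"role": "user", "content": user_message})
    reply = client.chat.completions.create(model=model, messages=history)
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})  # persist the turn for later context
    return answer

print(chat("Who was the first U.S. president?"))
print(chat("And the third?"))  # resolved against earlier turns via the stored history
```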