Model: "claude-opus"
Granola launches team notes, while Notion launches meeting transcription
gpt-4.1 gpt-4o-mini gpt-4.1-mini claude-opus claude-sonnet claude-o3 qwen3 seed1.5-vl llama-4 am-thinking-v1 openai anthropic alibaba meta-ai-fair huggingface granola coding instruction-following benchmarking model-releases reasoning image-generation collaborative-software model-performance kevinweil scaling01 steph_palazzolo andersonbcdefg reach_vb yuchenj_uw qtnx_ _akhaliq risingsayak
GPT-4.1 is now available in ChatGPT for Plus, Pro, and Team users, with a focus on coding and instruction following, and GPT-4.1 mini replaces GPT-4o mini. Anthropic is releasing new Claude models, including Claude Opus and Claude Sonnet, though some criticism of hallucinations in Claude O3 was noted. Alibaba shared the Qwen3 Technical Report, and Seed1.5-VL posted strong benchmark results. Meta FAIR announced new models and datasets but faced criticism over Llama 4. AM-Thinking-v1 launched on Hugging Face as a 32B-scale reasoning model. Granola raised a $43M Series B and launched Granola 2.0 with a Notion-like UI. The AI ecosystem shows rapid iteration and cloning of ideas, which puts the emphasis on execution and distribution.
FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you're welcome)
llama-3-70b llama-3 wizardlm-2-8x22b claude-opus mistral-8x7b gpt-4 huggingface meta-ai-fair dbrx reka-ai mistral-ai lmsys openai datasets benchmarking quantization zero-shot-learning reasoning code-error-detection token-generation security
2024 has seen a significant increase in the size of datasets used to train large language models: RedPajama 2 offers up to 30T tokens, DBRX used 12T tokens, Reka Core/Flash/Edge used 5T tokens, and Llama 3 was trained on 15T tokens. Hugging Face released an open dataset containing 15T tokens from 12 years of filtered CommonCrawl data, which makes it possible to train models like Llama 3 if compute resources are available. On Reddit, WizardLM-2-8x22b outperformed other open LLMs, including Llama-3-70b-instruct, in reasoning and math benchmarks. Claude Opus demonstrated strong zero-shot code error spotting, surpassing Llama 3. Benchmarks revealed limitations in the LMSYS chatbot leaderboard caused by instruction-tuned models gaming the system, and a new RAG benchmark showed Llama 3 70B underperforming GPT-4, while Mixtral 8x7B remained strong. Efficient quantized versions of Llama 3 models are available on Hugging Face, with users reporting token generation limits of around 9600 tokens on a 3090 GPU. Safety news includes a UK sex offender being banned from using AI tools and GPT-4 demonstrating an 87% success rate at exploiting real vulnerabilities, which raises security concerns.
Welcome /r/LocalLlama!
cerebrum-8x7b mixtral-7b gpt-3.5-turbo gemini-pro moistral-11b-v1 claude-opus qwen-vl-chat sakana openinterpreter reddit aether-research mistral-ai nvidia lmdeploy model-merging benchmarking quantization performance-optimization deployment vision fine-tuning training-data synthetic-data rag gui
Sakana released a paper on evolutionary model merging. OpenInterpreter launched their O1 devkit. Discussions highlight Claude Haiku's underrated performance when given 10-shot examples. To mark Reddit's IPO, AINews is introducing Reddit summaries, starting with /r/LocalLlama, with subreddits such as r/machinelearning and r/openai to follow. Aether Research released Cerebrum 8x7b, based on Mixtral, which matches GPT-3.5 Turbo and Gemini Pro on reasoning tasks and sets a new open-source reasoning SOTA. Moistral 11B v1, a fine-tuned model from the creators of Cream-Phi-2, was released. A creative writing benchmark uses Claude Opus as its judge. Hobbyists are exploring 1.58-bit BitNet ternary quantization and the training of 1-bit LLMs. Nvidia's Blackwell (B200) chip supports FP4 precision quantization. LMDeploy v0.2.6+ enables efficient deployment of vision-language models such as Qwen-VL-Chat. Users are looking for GUIs for LLM APIs with plugin and RAG support. Pipelines for generating synthetic training data and for fine-tuning language models for chat are also discussed.