Company: "reddit"
Chameleon: Meta's (unreleased) GPT4o-like Omnimodal Model
chameleon gpt-4o gemini-1.5-flash claude-3 meta-ai-fair openai google-deepmind anthropic reddit multimodality early-fusion benchmarking model-training tokenization streaming tool-use vision coding hallucination-detection model-performance armen-aghajanyan sama alexandr-wang abacaj alexalbert__
Meta AI FAIR introduced Chameleon, a new multimodal model family with 7B and 34B parameter versions trained on 10T tokens of interleaved text and image data, enabling "early fusion" multimodality that can natively output any modality. While its reasoning benchmarks are modest, its "omnimodality" approach competes well with pre-GPT4o multimodal models. OpenAI launched GPT-4o, a model that excels on benchmarks like MMLU and on coding tasks, with strong multimodal capabilities but some regression in ELO scores and some hallucination issues. Google DeepMind announced Gemini 1.5 Flash, a small, fast model with a 1M-token context window, highlighting convergence trends between OpenAI and Google models. Anthropic updated Claude 3 with streaming support, forced tool use, and vision tool integration for multimodal knowledge extraction. OpenAI also partnered with Reddit, drawing industry attention.
Welcome /r/LocalLlama!
cerebrum-8x7b mixtral-7b gpt-3.5-turbo gemini-pro moistral-11b-v1 claude-opus qwen-vl-chat sakana openinterpreter reddit aether-research mistral-ai nvidia lmdeploy model-merging benchmarking quantization performance-optimization deployment vision fine-tuning training-data synthetic-data rag gui
Sakana released a paper on evolutionary model merging. OpenInterpreter launched their O1 devkit. Discussions highlight Claude Haiku's underrated performance when given 10-shot examples. To mark Reddit's IPO, AINews introduces Reddit summaries, starting with /r/LocalLlama and with subreddits like r/machinelearning and r/openai to follow. Aether Research released Cerebrum 8x7b, based on Mixtral, which matches GPT-3.5 Turbo and Gemini Pro on reasoning tasks and sets a new open-source reasoning SOTA. Moistral 11B v1, a finetuned model from the creators of Cream-Phi-2, was released. A creative writing benchmark uses Claude Opus as judge. Hobbyists explore 1.58-bit BitNet ternary quantization and 1-bit LLM training. Nvidia's Blackwell (B200) chip supports FP4 precision quantization. LMDeploy v0.2.6+ enables efficient vision-language model deployment with models like Qwen-VL-Chat. Users seek GUIs for LLM APIs with plugin and RAG support. Pipelines for generating synthetic training data and for fine-tuning language models for chat are also discussed.