Model: "meta-3d-gen"
Problems with MMLU-Pro
mmlu-pro llama-3-8b-q8 gpt4all-3.0 chatgpt claude llama gemini mobilellm runway-gen-3-alpha meta-3d-gen huggingface meta-ai-fair salesforce runway nomic-ai pineapple argil-ai benchmarking prompt-engineering model-evaluation model-performance multimodality automated-dataset-generation video-generation open-source-models ai-assistants text-to-3d deepfake transformers reasoning wenhu-chen danhendrycks clementine ylecun adcock_brett svpino rohanpaul_ai
MMLU-Pro is gaining attention as the successor to MMLU on HuggingFace's Open LLM Leaderboard V2, despite community concerns about evaluation discrepancies and prompt sensitivity: simple prompt tweaks alone produced a 10-point improvement for Llama-3-8b-q8. Meta's MobileLLM research explores running sub-billion-parameter LLMs on smartphones using shared weights and deeper architectures. Salesforce's APIGen is an automated dataset generation system for function-calling tasks, with models trained on its data reportedly outperforming larger models. Runway launched Gen-3 Alpha, an AI video generator available to paid users that creates realistic 10-second clips. Nomic AI's GPT4All 3.0 is an open-source desktop app supporting thousands of local models. AI assistants with multimodal capabilities and affordable access to multiple LLMs such as ChatGPT, Claude, Llama, and Gemini are emerging. Meta 3D Gen advances text-to-3D asset generation, while Argil AI enables deepfake video creation from text threads. Research on transformer grokking highlights advances in robust reasoning capabilities.
Not much happened today.
phi-3-mini gpt4all-3.0 yi-large meta-3d-gen meta perplexity-ai microsoft gpt4all langchainai qdrant-engine 3d-generation long-context instruction-following reinforcement-learning-from-human-feedback persona-driven-data-synthesis meta-tuning model-steering memory-retrieval multivector-search universal-query-api rohanpaul_ai andriy_mulyar cwolferesearch sarahookr
Meta introduced Meta 3D Gen, a system for end-to-end generation of 3D assets from text in under 1 minute, producing high-quality 3D assets with detailed textures. Perplexity AI updated Pro Search to handle deeper research with multi-step reasoning and code execution. Microsoft improved Phi-3 Mini with better long-context understanding and instruction following. GPT4All 3.0 launched with support for thousands of models, compatibility with the major operating systems, and local file chat. The Yi-Large model launched on the Fireworks AI Playground. Research highlights include the evolution of reinforcement learning from human feedback (RLHF), persona-driven data synthesis using a billion diverse personas, meta-tuning for few-shot generalization, and steering vectors for model behavior control. Tool updates include LangSmith improving memory retrieval and Qdrant Engine v1.10 adding a universal query API and multivector search.