Model: "phi-3"
not much happened today
helium-1 qwen-2.5 phi-4 sky-t1-32b-preview o1 codestral-25.01 phi-3 mistral llama-3 gpt-3.5 llmquoter kyutai-labs lmstudio mistralai llamaindex huggingface langchainai hyperbolic-labs replit multilinguality token-level-distillation context-windows model-performance open-source reasoning coding retrieval-augmented-generation hybrid-retrieval multiagent-systems video large-video-language-models dynamic-ui voice-interaction gpu-rentals model-optimization semantic-deduplication model-inference reach_vb awnihannun lior_on_ai sophiamyang omarsar0 skirano yuchenj_uw fchollet philschmid
Helium-1 Preview by kyutai_labs is a 2B-parameter multilingual base LLM that outperforms Qwen 2.5, trained on 2.5T tokens with a 4,096-token context window using token-level distillation from a 7B teacher model. A 4-bit build of Phi-4 was shown running in LM Studio on an M4 Max, noted for its speed and quality. Sky-T1-32B-Preview is a $450 open-source reasoning model matching o1's performance with strong benchmark scores. Codestral 25.01 by mistralai is a new SOTA coding model supporting 80+ programming languages at 2x the speed of its predecessor.
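To make the token-level distillation claim concrete, here is a minimal PyTorch sketch of the general technique: the student is trained to match the teacher's per-token output distribution via a temperature-scaled KL divergence. The shapes, temperature, and vocabulary size below are illustrative assumptions, not Kyutai's actual training setup.

```python
# Minimal sketch of token-level distillation (hypothetical setup, not
# Kyutai's training code): the small student matches the larger teacher's
# per-token distribution via a KL-divergence loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) averaged over the batch."""
    # Soften both distributions with the temperature.
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # kl_div expects log-probs for the input and probs for the target.
    kl = F.kl_div(s_log_probs, t_probs, reduction="batchmean")
    return kl * temperature ** 2  # standard gradient-scale correction

# Usage (shapes only): logits are [batch, seq_len, vocab_size].
student_logits = torch.randn(2, 128, 32000, requires_grad=True)
with torch.no_grad():
    teacher_logits = torch.randn(2, 128, 32000)  # from the frozen 7B teacher
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```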
Innovations include AutoRAG for optimizing retrieval-augmented generation pipelines, Agentic RAG for autonomous query reformulation and critique, Multiagent Finetuning, which improves reasoning by finetuning societies of models such as Phi-3, Mistral, LLaMA-3, and GPT-3.5, and VideoRAG, which incorporates video content into RAG using large video-language models (LVLMs).
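The agentic RAG pattern reduces to a small control loop: retrieve, answer, self-critique, and reformulate the query if the critique fails. A hedged sketch follows; `llm` and `retriever` are hypothetical stand-ins, not any specific library's API.

```python
# Sketch of an agentic-RAG loop (reformulate -> retrieve -> answer ->
# critique). `llm` is a text-in/text-out callable and `retriever.search`
# returns scored documents; both are assumed interfaces.
def agentic_rag(question, llm, retriever, max_rounds=3):
    query = question
    for _ in range(max_rounds):
        docs = retriever.search(query, k=5)
        context = "\n\n".join(d.text for d in docs)
        answer = llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
        # Self-critique: is the answer grounded in the retrieved context?
        verdict = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "Reply SUFFICIENT if the answer is grounded in the context, "
            "otherwise propose a better search query."
        )
        if verdict.strip().startswith("SUFFICIENT"):
            return answer
        query = verdict  # retry with the critique's reformulated query
    return answer
```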
Applications include a dynamic-UI AI chat app built by skirano on Replit, LangChain tools such as DocTalk for voice conversations with PDFs, AI travel-agent tutorials, and news-summarization agents. Hyperbolic Labs offers competitive GPU rentals, including H100, A100, and RTX 4090 cards. LLMQuoter improves RAG accuracy by extracting the key supporting quotes from retrieved context before answering.
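A rough sketch of the quote-then-answer idea behind LLMQuoter: distill the retrieved context down to verbatim quotes, then answer from the quotes alone. The prompts and the `llm` callable are illustrative assumptions, not the paper's actual implementation.

```python
# Quote-then-answer sketch in the spirit of LLMQuoter; `llm` is a
# hypothetical text-in/text-out callable.
def quote_then_answer(question, context, llm):
    quotes = llm(
        f"Context:\n{context}\n\nQuestion: {question}\n"
        "Extract the exact sentences from the context needed to answer."
    )
    # The answering step sees only the distilled quotes, not the full context.
    return llm(f"Quotes:\n{quotes}\n\nQuestion: {question}\nAnswer:")
```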
Infrastructure updates include MLX export of LLM inference from Python to C++, highlighted by fchollet, and SemHash semantic text deduplication, shared by philschmid.
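Semantic deduplication of the kind SemHash performs can be approximated in a few lines: embed every text, then greedily drop any text too similar to one already kept. This sketch uses sentence-transformers and a cosine threshold as assumptions (SemHash's own API and internals differ), and its naive pairwise comparison is only suitable for small corpora.

```python
# Illustrative semantic-dedup sketch (not SemHash's actual API): embed all
# texts, keep a text only if nothing already kept is too similar to it.
from sentence_transformers import SentenceTransformer  # assumed available

def semantic_dedup(texts, threshold=0.9):
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode(texts, normalize_embeddings=True)  # unit vectors
    kept, kept_emb = [], []
    for text, e in zip(texts, emb):
        # Cosine similarity reduces to a dot product on normalized vectors.
        if kept_emb and max(float(e @ k) for k in kept_emb) >= threshold:
            continue  # near-duplicate of something already kept
        kept.append(text)
        kept_emb.append(e)
    return kept

print(semantic_dedup(["The cat sat.", "A cat was sitting.", "Tax law 2024"]))
```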
Gemma 2 tops /r/LocalLlama vibe check
gemma-2-9b gemma-2-27b llama-3 mistral-7b phi-3 qwen gemma llamaindex mistral-ai cohere deepseek-ai nous-research eureka-labs model-comparison local-llms multilinguality model-efficiency fine-tuning ai-education ai-teaching-assistants andrej-karpathy
Gemma 2 (9B, 27B) is highlighted as the top-performing local LLM, praised for its speed, multilingual capabilities, and efficiency on consumer GPUs such as the 2080 Ti. It outperforms models like Llama 3 and Mistral 7B across a range of tasks, including non-English text processing and reasoning. The /r/LocalLlama discussion shows a strong preference for Gemma 2, with 18 mentions versus 10 for Llama 3 and 9 for Mistral; Phi 3 and Qwen also received mentions but are considered surpassed by Gemma 2. Separately, Andrej Karpathy announced the launch of Eureka Labs, an AI+Education startup aiming to build an AI-native school with AI Teaching Assistants, starting with the LLM101n course on the fundamentals of training LLMs. The initiative is seen as a significant development in AI education.
Evals: The Next Generation
gpt-4 gpt-5 gpt-3.5 phi-3 mistral-7b llama-3 scale-ai mistral-ai reka-ai openai moderna sanctuary-ai microsoft mit meta-ai-fair benchmarking data-contamination multimodality fine-tuning ai-regulation ai-safety ai-weapons neural-networks model-architecture model-training model-performance robotics activation-functions long-context sam-altman jim-fan
Scale AI highlighted data-contamination issues in benchmarks like MMLU and GSM8K, proposing a new benchmark on which Mistral overfits while Phi-3 holds up well. Reka released the VibeEval benchmark for multimodal models, addressing the limitations of multiple-choice benchmarks. Sam Altman of OpenAI called GPT-4 "dumb" and hinted at GPT-5 with AI agents as a major breakthrough. Researchers jailbroke GPT-3.5 via fine-tuning. Global calls emerged to ban AI-powered weapons, with US officials urging that humans retain control over nuclear arms. Ukraine launched an AI consular avatar, while Moderna partnered with OpenAI on medical AI. Sanctuary AI and Microsoft are collaborating on AI for general-purpose robots. MIT researchers introduced Kolmogorov-Arnold Networks (KANs), which improve neural-network efficiency by placing learnable activation functions on edges rather than fixed activations on nodes. Meta AI is training Llama 3 models with over 400 billion parameters, featuring multimodality and longer context windows.
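To illustrate the KAN idea, here is a toy layer in which every input-output edge carries its own learnable 1-D function, and outputs are plain sums over edges. Real KANs parameterize those functions with B-splines; this sketch substitutes a small Gaussian basis for brevity, so all dimensions and the basis choice are assumptions.

```python
# Toy Kolmogorov-Arnold layer: learnable activation per edge (input i ->
# output o), each built from a fixed Gaussian basis with learned coefficients.
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_basis=8):
        super().__init__()
        self.centers = torch.linspace(-2, 2, n_basis)  # fixed basis centers
        # One coefficient vector per (input, output) edge.
        self.coef = nn.Parameter(torch.randn(in_dim, out_dim, n_basis) * 0.1)

    def forward(self, x):                        # x: [batch, in_dim]
        # Evaluate the Gaussian basis functions at each input value.
        phi = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)  # [B, in, K]
        # Each edge applies its own learned 1-D function; outputs sum over edges.
        return torch.einsum("bik,iok->bo", phi, self.coef)

layer = ToyKANLayer(4, 3)
out = layer(torch.randn(16, 4))  # -> [16, 3]
```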
Snowflake Arctic: Fully Open 10B+128x4B Dense-MoE Hybrid LLM
snowflake-arctic phi-3 llama-3-70b llama-3 stable-diffusion-3 sd3-turbo gpt-3.5-turbo snowflake databricks deepseek deepspeed nvidia stable-diffusion adobe apple llamaindex lmsys openai mixture-of-experts curriculum-learning model-release image-generation video-upscaling quantization inference-speed benchmarking model-comparison open-source on-device-ai
Snowflake Arctic is a notable new foundation language model released under Apache 2.0, claiming superiority over Databricks in data-warehouse AI applications and adopting a mixture-of-experts architecture inspired by DeepSeek-MoE and DeepSpeed-MoE. The model uses a 3-stage curriculum training strategy similar to the recent Phi-3 paper. In AI image and video generation, Nvidia introduced the Align Your Steps technique to improve image quality at low step counts, while Stable Diffusion 3 and SD3 Turbo were compared on prompt understanding and image quality. Adobe launched an AI video-upscaling project that enhances blurry videos to HD, though with some high-resolution artifacts. Apple released open-source on-device language models with code and training logs, diverging from typical weights-only releases. Llama-3-70B ties for first place on the LMSYS leaderboard for English queries, and Phi-3 (~4B params) outperforms GPT-3.5 Turbo on the banana logic benchmark. Fast inference and quantization of Llama 3 models were demonstrated on MacBook devices.
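The mixture-of-experts mechanic behind a Dense-MoE hybrid like Arctic is per-token top-k routing: a small router scores the experts, and each token is processed only by its top few. The sketch below shows generic top-k routing; the dimensions, expert count, and top_k value are illustrative, not Snowflake's actual 128x4B configuration.

```python
# Generic top-k MoE layer: a linear router picks top_k experts per token and
# combines their outputs with renormalized gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                           # x: [tokens, d_model]
        gate = F.softmax(self.router(x), dim=-1)
        weights, idx = gate.topk(self.top_k, dim=-1)      # per-token experts
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e          # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE()
y = moe(torch.randn(32, 512))  # -> [32, 512]
```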
Perplexity, the newest AI unicorn
llama-3-8b llama-3-70b llama-3 llava-llama-3-8b-v1_1 phi-3 gpt-3.5 perplexity-ai meta-ai-fair hugging-face groq context-length fine-tuning quantization instruction-following model-comparison multimodality benchmarking memory-optimization model-performance daniel-gross aravind-srinivas
Perplexity doubled its valuation shortly after its Series B with a Series B-1 funding round. Significant developments around Llama 3 include context-length extension to 16K tokens, new multimodal LLaVA models outperforming their Llama 2-based predecessors, and fine-tuning improvements such as QDoRA surpassing QLoRA. The Llama-3-70B model is praised for instruction following and for its performance across quantization formats. Microsoft's Phi-3 models, released in multiple sizes, show competitive benchmark results, with the 14B model achieving 78% on MMLU and the 3.8B model nearing GPT-3.5 performance.
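Context-length extensions like the 8K-to-16K one above are commonly done with RoPE position interpolation: compress position indices so the longer window maps back onto the range seen during training. The sketch below shows linear interpolation; the scale factor and dimensions are assumptions, not the specifics of this particular Llama 3 extension.

```python
# Linear RoPE position interpolation: divide positions by `scale` so a 16K
# window reuses the angle range of the original 8K training context.
import torch

def rope_frequencies(dim, positions, base=10000.0, scale=2.0):
    """Return cos/sin rotary tables with interpolated positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Interpolate (compress) positions instead of extrapolating past training.
    angles = (positions[:, None] / scale) * inv_freq[None, :]
    return torch.cos(angles), torch.sin(angles)

cos, sin = rope_frequencies(dim=128, positions=torch.arange(16384).float())
```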
1/9/2024: Nous Research lands $5m for Open Source AI
qlora phi-3 mixtral ollama nous-research openai rabbit-tech context-window fine-tuning synthetic-data activation-beacon transformer-architecture seed-financing real-time-voice-agents trillion-parameter-models kenakafrosty _stilic_ teknium
Nous Research announced a $5.2 million seed financing focused on Nous-Forge, which aims to embed transformer architecture into chips for powerful servers supporting real-time voice agents and trillion-parameter models. Rabbit R1 launched a demo at CES to mixed reactions. OpenAI shipped the GPT Store and briefly leaked an upcoming personalization feature. A new paper on Activation Beacon proposes a way to significantly extend LLMs' context windows, with code to be released on GitHub. Discussions also covered QLoRA, fine-tuning, synthetic data, and custom architectures for LLMs.
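For reference, the QLoRA recipe that came up in those discussions combines a 4-bit quantized base model with trainable low-rank adapters. A minimal sketch using the Hugging Face transformers and peft libraries follows; the model name and hyperparameters are illustrative.

```python
# QLoRA-style setup: load the base model in 4-bit NF4, then attach LoRA
# adapters so only the low-rank weights are trained.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NormalFloat4, per the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",             # illustrative base model
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)          # adapters are the only trainables
model.print_trainable_parameters()
```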