All tags
Person: "ggerganov"
not much happened today
fastvlm mobileclip2 grok-code-fast-1 gpt-5 qwen-3-coder-30b-a3b apple hugging-face x-ai openai groq run-llama lmstudio vision model-quantization code-generation cli-workflows retrieval-augmentation embedding-models local-ai multimodality reach_vb xenovacom pcuenq awnihannun cline veggie_eric nickbaumann_ gdb benankdev loganmarkewich tom_doerr fastmcp ggerganov orionweller antoine_chaffin
Apple released three real-time vision-language models (FastVLM, MobileCLIP2) on Hugging Face with significant speed and size improvements, supporting WebGPU and Core ML. Their MLX framework now supports MXFP4 format, competing with NVFP4 for FP4 quantization. xAI launched grok-code-fast-1, outperforming Claude for code edits, while OpenAI integrated GPT-5 into Xcode 26 and released a new Responses API on Groq hardware. CLI-first agent workflows advanced with tools like SemTools, MLX local runner for Apple Silicon, and llama.vim recommending Qwen 3 Coder 30B A3B. Retrieval research highlights limitations of single-vector embeddings, promoting ColBERT-style late interaction.
Qwen with Questions: 32B open weights reasoning model nears o1 in GPQA/AIME/Math500
deepseek-r1 qwq gpt-4o claude-3.5-sonnet qwen-2.5 llama-cpp deepseek sambanova hugging-face dair-ai model-releases benchmarking fine-tuning sequential-search inference model-deployment agentic-rag external-tools multi-modal-models justin-lin clementdelangue ggerganov vikparuchuri
DeepSeek r1 leads the race for "open o1" models but has yet to release weights, while Justin Lin released QwQ, a 32B open weight model that outperforms GPT-4o and Claude 3.5 Sonnet on benchmarks. QwQ appears to be a fine-tuned version of Qwen 2.5, emphasizing sequential search and reflection for complex problem-solving. SambaNova promotes its RDUs as superior to GPUs for inference tasks, highlighting the shift from training to inference in AI systems. On Twitter, Hugging Face announced CPU deployment for llama.cpp instances, Marker v1 was released as a faster and more accurate deployment tool, and Agentic RAG developments focus on integrating external tools and advanced LLM chains for improved response accuracy. The open-source AI community sees growing momentum with models like Flux gaining popularity, reflecting a shift towards multi-modal AI models including image, video, audio, and biology.