Person: "huybery"
not much happened today
gpt-oss-120b gpt-oss-20b kimi-k2 deepseek-r1 qwen-3-32b openai huggingface microsoft llamaindex ollama baseten fireworksai cerebras groq together anthropic google uk-aisi sliding-window-attention mixture-of-experts rope context-length mxfp4-format synthetic-data reasoning-core-hypothesis red-teaming benchmarking coding-benchmarks model-performance fine-tuning woj_zaremba sama huybery drjimfan jxmnop scaling01 arunv30 kevinweil xikun_zhang_ jerryjliu0 ollama basetenco reach_vb gneubig shxf0072 _lewtun
OpenAI released gpt-oss-120b and gpt-oss-20b, its first open models since GPT-2, and both quickly trended on Hugging Face. Microsoft supports the models via Azure AI Foundry and Windows Foundry Local. Notable architectural features include sliding window attention, mixture of experts (MoE), a RoPE variant, and a 128k context length. The models use the new MXFP4 format, which llama.cpp supports. Some observers hypothesize that gpt-oss was trained on synthetic data to improve safety and performance, which would support the Reasoning Core Hypothesis. OpenAI announced a $500K red-teaming bounty with partners including Anthropic, Google, and the UK AISI. Critics point to inconsistent benchmark results: GPT-OSS-120B scores 41.8% on the Aider Polyglot coding benchmark, trailing competitors like Kimi-K2 and DeepSeek-R1. Some users note that the model excels at math and reasoning but lacks common sense and practical utility.
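As a rough illustration of the sliding window attention mentioned above, the sketch below builds a causal mask in which each position attends only to itself and a fixed number of preceding positions. The function name, sequence length, and window size are hypothetical values chosen for readability; they are not taken from the gpt-oss configuration.

```python
def sliding_window_causal_mask(seq_len, window):
    """Return mask[i][j], True when query position i may attend to key position j.

    Causal: j <= i. Sliding window: only the last `window` positions, so j > i - window.
    """
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Hypothetical toy sizes, chosen only so the pattern is easy to read.
mask = sliding_window_causal_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` allowed positions, so per-token attention cost stays bounded as the sequence grows, which is the usual motivation for the technique.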
Figma's $50+b IPO
horizon-alpha gpt-5 gemini-2.5-pro qwen3-coder qwen3-coder-flash-30b-a3b command-a-vision gpt-4.1 llama-4-maverick flux-1-krea-dev glm-4.5 voxtral openai openrouter alibaba unslothai cohere huggingface black-forest-labs diffusers ostrisai zhipu-ai together-ai mistral-ai reasoning svg-generation agentic-ai context-windows vision fine-tuning inference-time-training model-generalization open-models technical-reports scaling01 teortaxestex huybery nickfrosst aidangomez reach_vb zai_org corbtt jxmnop teknuim1
OpenAI's stealth model horizon-alpha on OpenRouter has sparked speculation that it is a precursor to GPT-5; it shows strong reasoning and SVG generation capabilities comparable to Gemini 2.5 Pro. Alibaba released the Qwen3-Coder family, including a fast Qwen3-Coder-Flash (30B-A3B) variant with agentic features and 1M context length support via UnslothAI. Cohere launched Command A Vision, a 111B parameter open-weights vision-language model that outperforms GPT-4.1 and Llama 4 Maverick on enterprise benchmarks. Black Forest Labs introduced FLUX.1 Krea [dev], an open-weights photorealism model compatible with fine-tuning tools like diffusers and ostrisai. Zhipu AI unveiled GLM-4.5, a hybrid reasoning open model with agentic capabilities, available on Together AI. Discussions highlight the rising importance of inference-time training and of generalization in reasoning models. Mistral AI released the technical report for Voxtral, continuing its open science efforts.
LlamaCon: Meta AI gets into the Llama API platform business
llama-4 qwen3 qwen3-235b-a22b qwen3-30b-a3b qwen3-4b qwen2-5-72b-instruct o3-mini meta-ai-fair cerebras groq alibaba vllm ollama llamaindex hugging-face llama-cpp model-release fine-tuning reinforcement-learning moe multilingual-models model-optimization model-deployment coding benchmarking apache-license reach_vb huybery teortaxestex awnihannun thezachmueller
Meta celebrated progress in the Llama ecosystem at LlamaCon, launching an AI Developer platform with fine-tuning and fast inference powered by Cerebras and Groq hardware, though access remains waitlisted. Meanwhile, Alibaba released the Qwen3 family of large language models, including two MoE models and six dense models ranging from 0.6B to 235B parameters, with the flagship Qwen3-235B-A22B achieving competitive benchmark results and supporting 119 languages and dialects. The Qwen3 models are optimized for coding and agentic capabilities, are Apache 2.0 licensed, and have broad deployment support, including local usage with tools like vLLM, Ollama, and llama.cpp. Community feedback highlights Qwen3's scalable performance and claims it outperforms models like OpenAI's o3-mini.
HippoRAG: First, do know(ledge) Graph
qwen-2 gpt-4 hipporag alibaba openai knowledge-graphs personalized-pagerank multi-hop-retrieval chain-of-thought implicit-reasoning sparse-autoencoders model-interpretability model-efficiency model-architecture fine-tuning reinforcement-learning rohanpaul_ai omarsar0 nabla_theta huybery
Alibaba released new open-source Qwen2 models ranging from 0.5B to 72B parameters, achieving SOTA results on benchmarks like MMLU and HumanEval. Researchers applied sparse autoencoders to interpret GPT-4 neural activity, improving feature representation. The HippoRAG paper proposes a hippocampus-inspired retrieval-augmentation method that uses knowledge graphs and Personalized PageRank for efficient multi-hop reasoning. New techniques like Stepwise Internalization enable implicit chain-of-thought reasoning in LLMs, enhancing accuracy and speed. The Buffer of Thoughts (BoT) method improves reasoning efficiency with significant cost reduction. A novel scalable MatMul-free LLM architecture, competitive with SOTA Transformers at billion-parameter scale, was also presented. "Single-Step, Multi-Hop retrieval" is highlighted as a key advancement in retrieval speed and cost.
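To make the Personalized PageRank idea above more concrete, here is a minimal sketch of power-iteration Personalized PageRank over a tiny undirected entity graph. It is not the HippoRAG implementation: the graph, the entity names, the damping factor, and the iteration count are all hypothetical, and every seed is assumed to appear in the graph.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank on an undirected graph.

    Random walks restart at the seed nodes (the entities found in the query),
    so nodes close to several seeds accumulate the most probability mass.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)
    # Restart distribution: uniform over the seed entities.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                new_rank[m] += share
        rank = new_rank
    return rank


# Hypothetical toy knowledge graph: only prof_a links to both query entities.
edges = [
    ("stanford", "prof_a"), ("stanford", "prof_b"),
    ("alzheimers", "prof_a"), ("alzheimers", "prof_c"),
    ("prof_b", "physics"), ("prof_c", "ucla"),
]
seeds = {"stanford", "alzheimers"}
rank = personalized_pagerank(edges, seeds)
best = max((n for n in rank if n not in seeds), key=rank.get)
print(best)
```

A single ranking pass surfaces the node that bridges both query entities, which is the intuition behind answering a multi-hop question in one retrieval step.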
Qwen 2 beats Llama 3 (and we don't know how)
qwen-2 llama-3 llama-3-70b gpt-4 nllb alibaba groq meta-ai-fair multilinguality benchmarking inference-speed sparse-autoencoders scaling-laws post-training instruction-following rejection-sampling execution-feedback model-release multilingual-models model-training philschmid huybery jonathanross321 awnihannun gdb nabla_theta ylecun
Alibaba released Qwen 2 models under the Apache 2.0 license, claiming to outperform Llama 3 among open models, with multilingual support in 29 languages and strong benchmark scores like MMLU 82.3 and HumanEval 86.0. Groq demonstrated ultra-fast inference on Llama-3 70B at 40,792 tokens/s, processing 4 Wikipedia articles in 200ms. Research on sparse autoencoders (SAEs) for interpreting GPT-4 neural activity showed new training methods, metrics, and scaling laws. Meta AI announced the No Language Left Behind (NLLB) model, capable of high-quality translation between 200 languages, including low-resource ones. On Qwen 2's training, the team wrote, "Our post-training phase is designed with the principle of scalable training with minimal human annotation," highlighting techniques like rejection sampling for math and execution feedback for coding.
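The pairing of rejection sampling with execution feedback mentioned above can be sketched as follows: run each candidate program against test cases and keep only the ones that pass. This is an illustrative assumption about the general technique, not Qwen 2's actual pipeline. The candidate strings stand in for model samples, and `passes_tests`, `rejection_sample`, and `solution` are hypothetical names.

```python
def passes_tests(source, tests, func_name="solution"):
    """Execute candidate source and check it against (args, expected) pairs.

    Any exception (syntax error, missing function, runtime failure) counts as
    a rejection. A real pipeline would sandbox this step; exec is used here
    only because the candidates are trusted toy strings.
    """
    namespace = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        return False


def rejection_sample(candidates, tests):
    """Keep only the candidates whose execution passes every test."""
    return [c for c in candidates if passes_tests(c, tests)]


# Hypothetical stand-ins for sampled model outputs.
candidates = [
    "def solution(a, b):\n    return a + b\n",  # correct
    "def solution(a, b):\n    return a - b\n",  # wrong result
    "def solution(a, b)\n    return a + b\n",   # syntax error
]
tests = [((1, 2), 3), ((0, 0), 0)]
kept = rejection_sample(candidates, tests)
print(len(kept))
```

The surviving samples can then serve as training data without a human labeling each one, which matches the "minimal human annotation" goal in the quote.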