All tags
Model: "qwen3-30b-a3b"
not much happened today
o3-mini o1-mini llama hunyuan-a13b ernie-4.5 ernie-4.5-21b-a3b qwen3-30b-a3b gemini-2.5-pro meta-ai-fair openai tencent microsoft baidu gemini superintelligence ai-talent job-market open-source-models multimodality mixture-of-experts quantization fp8-training model-benchmarking model-performance model-releases api model-optimization alexandr_wang shengjia_zhao jhyuxm ren_hongyu shuchaobi saranormous teortaxesTex mckbrando yuchenj_uw francoisfleuret quanquangu reach_vb philschmid
Meta has poached top AI talent from OpenAI, including Alexandr Wang joining as Chief AI Officer to work towards superintelligence, signaling a strong push for the next Llama model. The AI job market shows polarization with high demand and compensation for top-tier talent, while credentials like strong GitHub projects gain importance. The WizardLM team moved from Microsoft to Tencent to develop open-source models like Hunyuan-A13B, highlighting shifts in China's AI industry. Rumors suggest OpenAI will release a new open-source model in July, potentially surpassing existing ChatGPT models. Baidu open-sourced multiple variants of its ERNIE 4.5 model series, featuring advanced techniques like 2-bit quantization, MoE router orthogonalization loss, and FP8 training, with models ranging from 0.3B to 424B parameters. Gemini 2.5 Pro returned to the free tier of the Gemini API, enabling developers to explore its features.
LlamaCon: Meta AI gets into the Llama API platform business
llama-4 qwen3 qwen3-235b-a22b qwen3-30b-a3b qwen3-4b qwen2-5-72b-instruct o3-mini meta-ai-fair cerebras groq alibaba vllm ollama llamaindex hugging-face llama-cpp model-release fine-tuning reinforcement-learning moe multilingual-models model-optimization model-deployment coding benchmarking apache-license reach_vb huybery teortaxestex awnihannun thezachmueller
Meta celebrated progress in the Llama ecosystem at LlamaCon, launching an AI Developer platform with finetuning and fast inference powered by Cerebras and Groq hardware, though it remains waitlisted. Meanwhile, Alibaba released the Qwen3 family of large language models, including two MoE models and six dense models ranging from 0.6B to 235B parameters, with the flagship Qwen3-235B-A22B achieving competitive benchmark results and supporting 119 languages and dialects. The Qwen3 models are optimized for coding and agentic capabilities, are Apache 2.0 licensed, and have broad deployment support including local usage with tools like vLLM, Ollama, and llama.cpp. Community feedback highlights Qwen3's scalable performance and superiority over models like OpenAI's o3-mini.
Qwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1
qwen-3 qwen3-235b-a22b qwen3-30b-a3b deepseek-r1 o1 o3-mini grok-3 gemini-2.5-pro alibaba google-deepmind deepseek mistral-ai mixture-of-experts reinforcement-learning benchmarking model-release model-architecture long-context multi-agent-systems inference dataset-release awnihannun prince_canuma actuallyisaak oriolvinyalsml iscienceluvr reach_vb teortaxestex omarsar0
Qwen 3 has been released by Alibaba featuring a range of models including two MoE variants, Qwen3-235B-A22B and Qwen3-30B-A3B, which demonstrate competitive performance against top models like DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. The models introduce an "enable_thinking=True" mode with advanced soft switching for inference scaling. The release is notable for its Apache 2.0 license and broad inference platform support including MCP. The dataset improvements and multi-stage RL post-training contribute to performance gains. Meanwhile, Gemini 2.5 Pro from Google DeepMind shows strong coding and long-context reasoning capabilities, and DeepSeek R2 is anticipated soon. Twitter discussions highlight Qwen3's finegrained MoE architecture, large context window, and multi-agent system applications.