All tags
Company: "allenai"
not much happened today
qwen3-14b qwen3-32b qwen3-235b phi-4-reasoning o3-mini command-a gemini-2.5-pro o4-mini olm-o2-1b o3 alibaba together-ai scaling01 microsoft deepseek cohere google epoch-ai-research inception-labs openai allenai quantization fine-tuning reinforcement-learning benchmarking video-generation diffusion-models model-performance model-evaluation model-release text-generation cline _philschmid iscienceluvr alexalbert__ _lewtun teortaxestex sarahookr reach_vb
Qwen model family released quantized versions of Qwen3 models including 14B, 32B, and 235B parameters, with promising coding capabilities in Qwen3-235B. Microsoft launched Phi-4-reasoning, a 14B parameter model distilled from OpenAI's o3-mini, emphasizing supervised fine-tuning and reinforcement learning, outperforming larger models in some benchmarks. Cohere's Command A leads SQL performance on Bird Bench. Google introduced the TRAJAN eval for video generation temporal consistency and updated the Gemini OpenAI compatibility layer. Inception Labs launched a diffusion LLM API claiming 5x speed improvements over autoregressive models. Community rankings show OpenAI's o3 model debuting strongly in web app-building tasks. Other releases include AllenAI's OLMo2 1B and additional Phi 4 variants. "Qwen3-235B shows promise for coding" and "Phi-4-reasoning tech report emphasizes SFT gains" highlight key advancements.
not much happened today
llama-3-2 llama-3 gemma-2 phi-3-5-mini claude-3-haiku gpt-4o-mini molmo gemini-1.5 gemini meta-ai-fair openai allenai google-deepmind multimodality model-optimization benchmarks ai-safety model-distillation pruning adapter-layers open-source-models performance context-windows mira-murati demis-hassabis ylecun sama
Meta AI released Llama 3.2 models including 1B, 3B text-only and 11B, 90B vision variants with 128K token context length and adapter layers for image-text integration. These models outperform competitors like Gemma 2 and Phi 3.5-mini, and are supported on major platforms including AWS, Azure, and Google Cloud. OpenAI CTO Mira Murati announced her departure. Allen AI released Molmo, an open-source multimodal model family outperforming proprietary systems. Google improved Gemini 1.5 with Flash and Pro models. Meta showcased Project Orion AR glasses and hinted at a Quest 3S priced at $300. Discussions covered new benchmarks for multimodal models, model optimization, and AI safety and alignment.
AI2 releases OLMo - the 4th open-everything LLM
olmo-1b olmo-7b olmo-65b miqu-70b mistral-medium distilbert-base-uncased ai2 allenai mistral-ai tsmc asml zeiss fine-tuning gpu-shortage embedding-chunking json-generation model-optimization reproducible-research self-correction vram-constraints programming-languages nathan-lambert lhc1921 mrdragonfox yashkhare_ gbourdin
AI2 is gaining attention in 2024 with its new OLMo models, including 1B and 7B sizes and a 65B model forthcoming, emphasizing open and reproducible research akin to Pythia. The Miqu-70B model, especially the Mistral Medium variant, is praised for self-correction and speed optimizations. Discussions in TheBloke Discord covered programming language preferences, VRAM constraints for large models, and fine-tuning experiments with Distilbert-base-uncased. The Mistral Discord highlighted challenges in the GPU shortage affecting semiconductor production involving TSMC, ASML, and Zeiss, debates on open-source versus proprietary models, and fine-tuning techniques including LoRA for low-resource languages. Community insights also touched on embedding chunking strategies and JSON output improvements.