Person: "nptacek"
Databricks' $100B Series K
deepseek-v3.1-base deepseek-v3.1-instruct chatgpt-go qwen-image-edit databricks openai deepseek hugging-face alibaba model-release benchmarking pricing-models fine-tuning model-architecture image-editing video-generation api agentic-ai sama nickaturley kevinweil gdb sherwinwu nptacek reach_vb clementdelangue teortaxestex quixiai georgejrjrjr scaling01 alibaba_qwen linoy_tsaban ostrisai lmarena_ai
Databricks reached a $100 billion valuation, becoming a centicorn, with new Data (Lakebase) and AI (Agent Bricks) products announced alongside. OpenAI launched ChatGPT Go in India at ₹399/month (~$4.55), offering significantly higher usage limits and UPI payment support, with plans for global expansion. The DeepSeek V3.1 Base/Instruct models were quietly released on Hugging Face, showing strong coding-benchmark performance and adopting an Anthropic-style hybrid reasoning setup. Alibaba's Qwen-Image-Edit model is gaining traction through integrations and community pruning experiments. Headline claims include "DeepSeek V3.1 Base outperforms Claude 4 Opus on coding benchmarks" and "ChatGPT Go offers 10x higher message limits and 2x longer memory."
DeepSeek #1 on US App Store, Nvidia stock tanks -17%
deepseek-r1 deepseek-v3 qwen2.5-vl o1 deepseek openai nvidia langchain moe-architecture chain-of-thought fp8-precision multimodality vision agentic-ai inference-scaling gpu-optimization model-efficiency ai-chatbots memory-integration tool-use stock-market-reactions sama mervenoyann omarasar0 teortaxestex nptacek carpeetti finbarrtimbers cwolferesearch arthurrapier danhendrycks scaling01 janusflow
DeepSeek made a significant cultural impact by unexpectedly hitting mainstream news in 2025. The DeepSeek-R1 model features a massive 671B-parameter MoE architecture and demonstrates chain-of-thought (CoT) capabilities comparable to OpenAI's o1 at a lower cost. The DeepSeek V3 training recipe reportedly trains a 236B-parameter model 42% faster than its predecessor by using fp8 precision. The Qwen2.5-VL multimodal models support images and videos in sizes ranging from 3B to 72B parameters, featuring strong vision and agentic capabilities. LangChain and LangGraph integrations enable AI chatbots with memory and tool use, including applications like the DeFi Agent. Discussions highlight NVIDIA's role in hardware acceleration, with its stock dropping on fears that DeepSeek's efficiency undercuts GPU demand; even so, compute demand is expected to rise, driven by inference scaling and MoE design improvements.
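The efficiency story hinges on sparse MoE routing: each token activates only a few experts, so a 671B-parameter model pays for only a fraction of its weights per forward pass. Below is a minimal NumPy sketch of top-k expert routing; the sizes, the 2-of-8 expert choice, and the ReLU expert MLPs are illustrative assumptions, not DeepSeek's actual architecture.

```python
# Toy sketch of top-k mixture-of-experts (MoE) routing in NumPy.
# Dimensions and the 2-of-8 expert choice are illustrative only; production
# models use many more experts plus shared experts and load-balancing losses.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2
W_gate = rng.normal(scale=0.02, size=(d_model, n_experts))      # router weights
experts = [
    (rng.normal(size=(d_model, 4 * d_model)),                   # up-projection
     rng.normal(size=(4 * d_model, d_model)))                   # down-projection
    for _ in range(n_experts)
]

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(tokens):
    """tokens: (n_tokens, d_model). Each token runs only its top_k experts."""
    probs = softmax(tokens @ W_gate)                             # (n_tokens, n_experts)
    top = np.argsort(-probs, axis=-1)[:, :top_k]                 # chosen expert ids
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        weights = probs[i, top[i]]
        weights = weights / weights.sum()                        # renormalise gate weights
        for w, e in zip(weights, top[i]):
            w_up, w_down = experts[e]
            out[i] += w * (np.maximum(tok @ w_up, 0.0) @ w_down) # ReLU expert MLP
    return out

x = rng.normal(size=(4, d_model))
print(moe_layer(x).shape)   # (4, 16): same output shape, but only 2/8 experts ran per token
```

The same gating idea is what lets total parameter count grow far faster than per-token compute, which is the crux of the "efficiency vs. compute demand" debate in this issue.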
Kolmogorov-Arnold Networks: MLP killers or just spicy MLPs?
gpt-5 gpt-4 dall-e-3 openai microsoft learnable-activations mlp function-approximation interpretability inductive-bias-injection b-splines model-rearrangement parameter-efficiency ai-generated-image-detection metadata-standards large-model-training max-tegmark ziming-liu bindureddy nptacek zacharynado rohanpaul_ai svpino
Ziming Liu, a grad student of Max Tegmark, published a paper on Kolmogorov-Arnold Networks (KANs), claiming they outperform MLPs in interpretability, inductive-bias injection, function-approximation accuracy, and scaling, while being 10x slower to train but 100x more parameter-efficient. KANs place learnable activation functions, modeled as B-splines, on edges rather than fixed activations on nodes. However, it was later shown that KANs can be mathematically rearranged back into MLPs with similar parameter counts, sparking debate over their interpretability and novelty. Meanwhile, on AI Twitter, speculation about a potential GPT-5 release drew mixed impressions, OpenAI adopted the C2PA metadata standard for detecting AI-generated images with high accuracy for DALL-E 3, and Microsoft was reported to be training a large 500B-parameter model called MAI-1, potentially previewed at its Build conference, signaling increased competition with OpenAI. The claim "OpenAI's safety testing for GPT-4.5 couldn't finish in time for Google I/O launch" was also noted.
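To make the "learnable activations on edges" idea concrete, here is a minimal NumPy sketch of one KAN layer. The grid size, the use of degree-1 (hat) B-splines, and the random coefficients are illustrative assumptions rather than the paper's exact formulation.

```python
# Toy sketch of a single KAN layer: every edge (i -> j) carries its own
# learnable 1-D function phi_ij, here built from linear ("hat") B-spline
# basis functions on a fixed grid (the paper uses higher-order B-splines).
import numpy as np

rng = np.random.default_rng(0)

def hat_basis(x, grid):
    """Evaluate linear B-spline (hat) basis functions at scalar x."""
    out = np.zeros(len(grid))
    for k, g in enumerate(grid):
        left = grid[k - 1] if k > 0 else g - 1.0
        right = grid[k + 1] if k < len(grid) - 1 else g + 1.0
        if left <= x <= g:
            out[k] = (x - left) / (g - left)
        elif g < x <= right:
            out[k] = (right - x) / (right - g)
    return out

class KANLayer:
    def __init__(self, d_in, d_out, grid_size=8):
        self.grid = np.linspace(-2.0, 2.0, grid_size)
        # One coefficient vector per edge: coeffs[i, j] parameterises phi_ij.
        self.coeffs = rng.normal(scale=0.1, size=(d_in, d_out, grid_size))

    def __call__(self, x):
        """x: (d_in,). Output j sums the per-edge functions: sum_i phi_ij(x_i)."""
        d_in, d_out, _ = self.coeffs.shape
        y = np.zeros(d_out)
        for i in range(d_in):
            basis = hat_basis(x[i], self.grid)        # basis shared by edges from input i
            for j in range(d_out):
                y[j] += self.coeffs[i, j] @ basis     # learnable spline on edge (i, j)
        return y

layer = KANLayer(d_in=3, d_out=2)
print(layer(np.array([0.3, -1.2, 0.8])))   # (2,) output; no fixed node activations used
```

The "KANs are just MLPs" rearrangement argument follows from this structure: since each phi_ij is a fixed-basis expansion with learned coefficients, the layer can be rewritten as a basis expansion of the inputs followed by an ordinary linear map, i.e. an MLP-style layer with a comparable parameter count.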