Company: "perplexity_ai"

claude-fable-5 opus-4.8 sonnet-5 glm-5.2 kimi-k2.7 anthropic cursor cognition perplexity z-ai langchain vllm-project deepseek-ai multi-model-orchestration model-combination-strategies cybersecurity coding-ide benchmarking inference-optimization speculative-decoding pass-at-1 integration-testing claudeai theo omarsar0 mparakhin kimmonismus artificialanlys claudedevs cursor_ai cognition perplexity_ai zai_org hwchase17 mercor_ai scaling01 vllm_project mgoin_ jon_durbin

Anthropic re-enabled Claude Fable 5 with updated cybersecurity safeguards that route some requests to Opus 4.8. The relaunch influenced tooling adoption by Cursor, Devin, and Perplexity. Builders are adapting to frontier-model constraints with multi-model orchestration and model-combination strategies rather than relying on a single model. Fable 5 scored 16.10% on the Remote Labor Index, while Sonnet 5 ranked second on AA-Briefcase with cost-performance tradeoffs. Meanwhile, Z.ai launched ZCode, a cross-platform dev environment for GLM-5.2 with BYOK support, accompanied by guides from LangChain; hwchase17 noted developer adoption. Benchmarks show GLM-5.2 leading APEX-SWE with 55.3% Pass@1 on Integration, closely followed by Kimi K2.7, indicating a shrinking coding gap. Inference improvements include DSpark speculative decoding in vLLM for DeepSeek models at around 250 tok/s and a preview of GLM-5.2 DSpark with 1.5× faster decoding.

May 26

not much happened today

eagle-3.1 unigram-tokenizer qwen-3.5 deepseek-v4-pro mimo deep-agents-v0.6 397b-parameter-model eaglecorp vllm_project perplexity_ai alibaba lightseek nvidia mooncake flashattention kimmonismus deepseek xiaomi langchain baseten trajectory clay harvey decagon mercor rogo rlm inference-optimization long-context speculative-decoding tokenization attention-mechanisms kv-cache cache-hierarchy agent-engineering model-harness-memory-fit continual-learning quantization autoscaling memory-centric-agents evaluation-automation kimmonismus _luofuli vtrivedy10

Inference optimization is increasingly architectural: EAGLE 3.1, built in collaboration with vLLM and TorchSpec, improves speculative decoding and long-context handling. Perplexity open-sourced a rebuilt Unigram tokenizer that cuts CPU use by 5–6× and reaches 63 µs at 514 tokens. Qwen3.5 hits 580 tokens/s through joint efforts from Alibaba, LightSeek, NVIDIA, Mooncake, and FlashAttention-4 contributors. API price cuts from Chinese labs are sustainable because of structural KV-cache and attention improvements, exemplified by DeepSeek V4-Pro and Xiaomi MiMo significantly reducing caching costs. Agent engineering is shifting focus from model quality to model-harness-memory fit, with LangChain releasing Deep Agents v0.6 and tools like LangSmith Engine automating evaluation loops. Trajectory launched a continual learning platform with $15M in funding and partners like Clay and Harvey, supporting large models including a 397B-parameter model deployed on autoscaled H100 infrastructure. Open-source memory-centric agents and minimal training harnesses also gained attention.

Mar 05

GPT 5.4: SOTA Knowledge Work -and- Coding -and- CUA Model, OpenAI is so very back

gpt-5.4 gpt-5.4-pro openai cursor_ai perplexity_ai arena native-computer-use long-context efficiency steering benchmarking gpu-kernels attention-mechanisms algorithmic-optimization pipeline-optimization sama reach_vb scaling01 danshipper yuchenj_uw

OpenAI launched GPT-5.4 and GPT-5.4 Pro, unifying the mainline and Codex models and adding native computer use, a context window of up to ~1M tokens, and efficiency improvements including a new Codex /fast mode. "Steering mid-response" and "fewer tokens, faster speed" were emphasized as UX and efficiency improvements. Benchmarks showed strong results, including 75.0% on OSWorld-Verified, surpassing the human baseline, and 83% on GDPval against industry professionals. User feedback highlighted coding utility but raised concerns about pricing and overthinking. Integrations with developer tools such as Cursor, Perplexity, and Arena were announced. In systems research, FlashAttention-4 (FA4) was introduced, bringing attention to near-matmul speed on Blackwell GPUs with innovations like polynomial exp emulation and online softmax.

Aug 07, 2025

OpenAI rolls out GPT-5 and GPT-5 Thinking to >1B users worldwide; -mini and -nano help claim Pareto Frontier

gpt-5 gpt-5-mini gpt-5-nano claude-4.1-sonnet claude-4.1-opus openai cursor_ai jetbrains microsoft notion perplexity_ai factoryai model-architecture context-windows pricing-models coding long-context prompt-engineering model-benchmarking model-integration tool-use reasoning sama scaling01 jeffintime embirico mustafasuleyman cline lmarena_ai nrehiew_ ofirpress sauers_

OpenAI launched GPT-5, a unified system featuring a fast main model and a deeper thinking model with a real-time router, supporting up to 400K context length and aggressive pricing that reclaims the Pareto Frontier of Intelligence. The rollout includes variants like gpt-5-mini and gpt-5-nano with significant cost reductions, and integrations with products such as ChatGPT, Cursor AI, JetBrains AI Assistant, Microsoft Copilot, Notion AI, and Perplexity AI. Benchmarks show GPT-5 performing strongly in coding and long-context reasoning, roughly matching Claude 4.1 Sonnet/Opus on SWE-bench Verified. The launch was accompanied by a GPT-5 prompting cookbook and notable community discussions on pricing and performance.

Aug 14, 2024

Gemini Live

gemini-1.5-pro genie falcon-mamba gemini-1.5 llamaindex google anthropic tii supabase perplexity-ai llamaindex openai hugging-face multimodality benchmarking long-context retrieval-augmented-generation open-source model-releases model-integration model-performance software-engineering linear-algebra hugging-face-hub debugging omarsar0 osanseviero dbrxmosaicai alphasignalai perplexity_ai _jasonwei svpino

Google launched Gemini Live on Android for Gemini Advanced subscribers during the Pixel 9 event, featuring integrations with Google Workspace apps and other Google services. The rollout began on 8/12/2024, with iOS support planned. Cosine released Genie, an AI software engineering system achieving a 57% improvement on SWE-Bench. TII introduced Falcon Mamba, a 7B attention-free open-access model that scales to long sequences. Benchmarking showed that longer context lengths do not always improve Retrieval-Augmented Generation. Supabase launched a fully open-source, AI-powered Postgres service dubbed the "ChatGPT of databases." Perplexity AI partnered with Polymarket to integrate real-time probability predictions into search results. A tutorial demonstrated a multimodal recipe recommender built with Qdrant, LlamaIndex, and Gemini. An OpenAI engineer shared tips for success, emphasizing debugging and hard work. The connection between matrices and graphs in linear algebra was highlighted for the insight it gives into nonnegative matrices and strongly connected components. Keras 3.5.0 was released with Hugging Face Hub integration for model saving and loading.