Topic: "code-agents"
not much happened today
glm-5.2 opus-4.8 gpt-5.5 laguna-m.1 north-mini-code codex zhipu hugging-face llama-cpp unsloth poolsideai cohere ollama openai cursor_ai claude cognition sparse-attention 1m-token-inference open-weight-models model-architecture long-context mixture-of-experts quantization local-deployment workflow-automation code-agents software-configuration-management automation-primitives security model-harness agentic-coding rasbt jeremyphoward matvelloso artificialanlys zixuanli_ _xjdr gneubig _catwu
GLM-5.2 from Zhipu emerged as a leading open-weight model; its IndexShare sparse attention enables efficient 1M-token inference, and it was praised as comparable to GPT-5.5 and Opus 4.8, though it lacks vision support. Other notable open models include Laguna M.1 by Poolside AI, a 70-layer sparse MoE optimized for long-horizon coding, and North Mini Code by Cohere, with 4-bit quantization and local deployment support via Ollama. The focus is shifting from standalone models to integrated systems combining model + harness + memory + SCM, exemplified by Noumena Code / ncode, which addresses challenges in concurrent code-agent workflows. Automation tools like Codex Record & Replay, Cursor's /automate, and Artifacts in Claude Code enhance teachability, reusability, and security in AI-assisted coding workflows.
not much happened today
gpt-5 qwen2.5-7b ernie-4.5-vl-28b-a3b-thinking gemini-2.5-pro llamacloud claude-code openai baidu databricks llamaindex togethercompute sakanaailabs reasoning-benchmarks reinforcement-learning fine-tuning multimodality document-intelligence retrieval-augmented-generation agentic-systems persona-simulation code-agents guardrails sakanaailabs micahgoldblum francoisfleuret matei_zaharia jerryjliu0 omarsar0 togethercompute imjaredz theo
GPT-5 leads Sudoku-Bench, solving 33% of puzzles; the remaining 67% are unsolved, highlighting challenges in meta-reasoning and spatial logic. New training methods like GRPO fine-tuning and "Thought Cloning" show limited success. Research on "looped LLMs" suggests pretrained models perform better with repeated computation. Baidu's ERNIE-4.5-VL-28B-A3B-Thinking offers lightweight multimodal reasoning under an Apache 2.0 license, outperforming Gemini-2.5-Pro and GPT-5-High on document tasks. The Databricks ai_parse_document preview delivers cost-efficient document intelligence, outperforming GPT-5 and Claude. Pathwork AI uses LlamaCloud for underwriting automation. The Gemini File Search API enables agentic retrieval-augmented generation (RAG) with MCP server integration. Together AI and Collinear launch TraitMix for persona-driven agent simulations, integrated with Together Evals. Reports highlight the risk of long-running code agents like Claude Code reverting changes, underscoring the need for guardrails. Community consensus favors using multiple code copilots, including Claude Code, Codex, and others.