Topic: "voice-processing"
not much happened today
OpenAI expanded its Agents SDK by separating the agent harness from compute and storage, enabling long-running, durable agents with file and computer use, skills, memory, and compaction. The harness is now open source and supports execution in partner sandboxes, with integrations from Cloudflare, Modal, Vercel, and others. Cloudflare launched Project Think, a next-generation Agents SDK with durable execution and sandboxed code, alongside Agent Lee, a prompt-driven UI agent that uses sandboxed TypeScript. Cloudflare also introduced real-time voice pipelines and browser automation tools. Hermes Agent focuses on persistent skill formation by learning from completed workflows, positioning itself as a professional agent distinct from GUI-first assistants like OpenClaw. One description of its workflow management reads: "Hermes autonomously backfills tracking data, updates cron jobs, and saves workflows as reusable skills."
LLaDA: Large Language Diffusion Models
llada-8b llama-3-8b step-video-t2v-30b step-audio-chat-132b llama-2-7b stepfun-ai scale-ai cambridge llamaindex diffusion-models text-generation multimodality video-generation voice-processing benchmarking instruction-following model-scaling gpu-usage long-context multi-turn-dialogue arankomatsuzaki _akhaliq omarsar0 iscienceluvr gallabytes maximelabonne reach_vb
LLaDA (Large Language Diffusion Model) 8B is a diffusion-based language model that rivals LLaMA 3 8B while training on about 2 trillion tokens (roughly 7x fewer than LLaMA 3 8B) and using 0.13 million H800 GPU hours. It generates text by predicting masked tokens in a diffusion process, with the masking ratio sampled uniformly during training, and it supports multi-turn dialogue and instruction following. Separately, StepFun AI released two major models: Step-Video-T2V 30B, a text-to-video model that generates up to 204 frames with high coherence and motion quality, and Step-Audio-Chat 132B, a voice-to-voice model. New multimodal benchmarks, Scale AI's EnigmaEval and Cambridge's ZeroBench, show current frontier models scoring zero, which underlines how difficult these tasks are. The community also noted the return of diffusion models in language modeling, a previously speculative architecture that has now been scaled successfully.
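The masked-token prediction described for LLaDA can be illustrated with a toy sketch of the forward (noising) step. This is not LLaDA's actual code: the `MASK` symbol and the function names `forward_mask` and `sample_training_example` are hypothetical, and it assumes each token is masked independently at a ratio `t` drawn uniformly from [0, 1). A real model would then be trained to predict the original tokens at the masked positions.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration only


def forward_mask(tokens, t, rng):
    """Mask each token independently with probability t.

    Returns the noised sequence and a parallel list of booleans
    marking which positions were masked.
    """
    masked = [rng.random() < t for _ in tokens]
    noisy = [MASK if m else tok for tok, m in zip(tokens, masked)]
    return noisy, masked


def sample_training_example(tokens, rng):
    """Draw a masking ratio t uniformly, then noise the sequence at that ratio.

    The targets are the original tokens at masked positions only;
    unmasked positions contribute nothing to the prediction task.
    """
    t = rng.random()  # uniform in [0.0, 1.0)
    noisy, masked = forward_mask(tokens, t, rng)
    targets = {i: tok for i, (tok, m) in enumerate(zip(tokens, masked)) if m}
    return t, noisy, targets


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "diffusion models can also generate text".split()
    t, noisy, targets = sample_training_example(sentence, rng)
    print(f"t = {t:.2f}")
    print("noisy  :", noisy)
    print("targets:", targets)
```

Because `t` varies from nearly 0 to nearly 1 across examples, the model sees everything from lightly corrupted to almost fully masked sequences; generation can then, under the same assumptions, start from an all-mask sequence and fill tokens in over several steps.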