Topic: "search"
not much happened to end the year
deepseek-v3 code-llm o1 sonnet-3.5 deepseek smol-ai reinforcement-learning reasoning training-data mixed-precision-training open-source multimodality software-development natural-language-processing interpretability developer-tools real-time-applications search sdk-generation corbtt tom_doerr cognitivecompai alexalbert__ theturingpost svpino bindureddy
Reinforcement Fine-Tuning (RFT) is introduced as a data-efficient method for improving reasoning in LLMs from minimal training data, using selection strategies such as First-Correct Solutions (FCS) and Greedily Diverse Solutions (GDS) (see the sketch after this summary). DeepSeek-V3, a 671B-parameter MoE language model trained on 14.8 trillion tokens with FP8 mixed-precision training, highlights advances in large-scale and open-source LLMs. Predictions for AI in 2025 include growth in smaller models, broader multimodality, and continued challenges for open-source AI. On software development jobs, the expectation is that as AI automates lower-skill tasks, the remaining work will demand deeper expertise and specialization. Enhancements to CodeLLM improve coding assistance with features like in-place editing and streaming responses. Natural Language Reinforcement Learning (NLRL) offers better interpretability and richer feedback for AI planning and critique. AI hiring is growing rapidly, with startups seeking strong ML and systems engineers. New AI-powered tools such as Rivet, Buzee, and Konfig improve real-time applications, search, and SDK generation, built on technologies like Rust and V8 isolates.
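The FCS and GDS names suggest simple dataset-curation rules for picking which sampled solutions to fine-tune on. A minimal sketch of how such selection could work, assuming a hypothetical `model.generate` sampler and a `verifier` callable (these interfaces are illustrative stand-ins, not the paper's actual code):

```python
def first_correct_solution(model, problem, verifier, n_samples=8):
    """First-Correct Solutions (FCS): keep only the first sampled
    solution the verifier accepts, yielding at most one training
    example per problem."""
    for _ in range(n_samples):
        candidate = model.generate(problem)   # sample one candidate solution
        if verifier(problem, candidate):      # e.g. final answer matches ground truth
            return candidate
    return None                               # no correct sample found


def token_jaccard(a, b):
    """Crude diversity proxy: Jaccard similarity over token sets."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)


def greedily_diverse_solutions(model, problem, verifier, n_samples=8, k=3):
    """Greedily Diverse Solutions (GDS): from the correct samples,
    greedily keep up to k solutions, each time choosing the candidate
    least similar to those already kept."""
    correct = [c for c in (model.generate(problem) for _ in range(n_samples))
               if verifier(problem, c)]
    kept = []
    while correct and len(kept) < k:
        best = min(correct, key=lambda c: max(
            (token_jaccard(c, s) for s in kept), default=0.0))
        kept.append(best)
        correct.remove(best)
    return kept
```

The resulting (problem, solution) pairs would then serve as the fine-tuning set, which is where the data efficiency comes from: a handful of verified, non-redundant traces per problem rather than bulk synthetic data.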
Genesis: Generative Physics Engine for Robotics (o1-mini version)
o1 o1-preview gpt-4o claude-3.5-sonnet gemini-2.0-pro llama-3-3b llama-3-70b openai google-deepmind meta-ai-fair hugging-face function-calling structured-outputs vision performance-benchmarks sdk webrtc reasoning math code-generation transformer-architecture model-training humanoid-robots search model-efficiency dataset-sharing aidan_mclau sundarpichai adcock_brett
OpenAI launched the o1 model API with function calling, structured outputs, vision support, and developer messages, using 60% fewer reasoning tokens than o1-preview. The model excels at math and code, with a 0.76 LiveBench Coding score that outperforms Sonnet 3.5. Beta SDKs for Go and Java were also released, along with WebRTC support and 60% lower prices. Deployment of Google Gemini 2.0 Pro (Gemini Exp 1206) accelerated, showing improved coding, math, and reasoning performance. Meta AI FAIR introduced research on training transformers directly on raw bytes using dynamic entropy-based patching. Commercial humanoid robots were successfully deployed by an industry player. Hugging Face researchers demonstrated that a 3B Llama model can outperform the 70B Llama model on MATH-500 accuracy using test-time search techniques (sketched below), highlighting efficiency gains from smaller models, though concerns about reproducibility and domain-specific limitations were noted.
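The mechanism behind the small-model search result is test-time compute: sample many candidates from the small policy model and let a reward model pick the winner. A minimal sketch of two common variants, best-of-N and weighted majority voting (the `generate`, `score`, and `extract_answer` callables are hypothetical stand-ins for a policy model, a reward model, and an answer parser; the Hugging Face work also explores step-level beam search):

```python
from collections import defaultdict

def best_of_n(problem, generate, score, n=32):
    """Best-of-N: sample n full solutions from a small policy model
    and return the one the reward model scores highest."""
    candidates = [generate(problem) for _ in range(n)]
    return max(candidates, key=lambda c: score(problem, c))

def weighted_majority(problem, generate, score, extract_answer, n=32):
    """Weighted majority voting: pool reward-model scores across
    candidates that reach the same final answer, then return the
    answer with the highest total score; more robust than trusting
    a single top-scored sample."""
    totals = defaultdict(float)
    for _ in range(n):
        candidate = generate(problem)
        totals[extract_answer(candidate)] += score(problem, candidate)
    return max(totals, key=totals.get)
```

The accuracy gain comes purely from spending more inference compute, which is why a 3B model with a large enough n can close the gap to a 70B model on verifiable domains like math, and also why the trick transfers poorly to tasks without a reliable scorer, matching the domain-specific caveats noted above.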