Topic: "temporal-knowledge-graphs"
Creating a LLM-as-a-Judge
claude-3.5-sonnet claude-3.5 notebooklm simpleqa recraft-v3 anthropic openai deepmind apple zep perplexity-ai github critique-shadowing llm-judging domain-experts dataset-creation prompt-engineering error-analysis temporal-knowledge-graphs memory-layer ai-agent-memory hallucination-reduction integration hamel-husain swyx
Anthropic released details on its Claude 3.5 SWE-bench and SWE-agent work, while OpenAI introduced SimpleQA and DeepMind launched NotebookLM. Apple announced new M4 MacBooks, and a new SOTA image model, Recraft v3, emerged. Hamel Husain published a detailed 6,000-word treatise on creating LLM judges with a method called critique shadowing, which aligns an LLM judge with domain experts and addresses the problem of evaluation data that AI teams neither trust nor use. The workflow relies on expert-reviewed datasets and iterative prompt refinement. Additionally, Zep introduced a temporal knowledge graph memory layer to improve AI agent memory and reduce hallucinations. Anthropic also integrated Claude 3.5 Sonnet with GitHub Copilot, expanding access to Copilot Chat users.
Did Nvidia's Nemotron 70B train on test?
nemotron-70b llama-3.1-70b llama-3.1 ministral-3b ministral-8b gpt-4o claude-3.5-sonnet claude-3.5 nvidia mistral-ai hugging-face zep benchmarking reinforcement-learning reward-models temporal-knowledge-graphs memory-layers context-windows model-releases open-source reach_vb philschmid swyx
NVIDIA's Nemotron-70B model has drawn scrutiny: it posts strong results on Arena Hard, AlpacaEval, and MT-Bench, and surpasses GPT-4o and Claude-3.5-Sonnet on those key benchmarks using RLHF (REINFORCE) training, yet some standard benchmarks such as GPQA and MMLU Pro show no improvement over the base Llama-3.1-70B. The new HelpSteer2-Preference dataset improves some benchmarks with minimal losses elsewhere. Meanwhile, Mistral released the Ministral 3B and 8B models under the Mistral Commercial License, featuring a 128k context length and outperforming Llama-3.1 and GPT-4o on various benchmarks. Additionally, Zep introduced Graphiti, an open-source temporal knowledge graph memory layer for AI agents, built on Neo4j.