Topic: "security"
AI Engineer World's Fair: Second Run, Twice The Fun
gemini-2.5-pro google-deepmind waymo tesla anthropic braintrust retrieval-augmentation graph-databases recommendation-systems software-engineering-agents agent-reliability reinforcement-learning voice image-generation video-generation infrastructure security evaluation ai-leadership enterprise-ai mcp tiny-teams product-management design-engineering robotics foundation-models coding web-development demishassabis
The 2025 AI Engineer World's Fair is expanding to 18 tracks covering topics like Retrieval + Search, GraphRAG, RecSys, SWE-Agents, Agent Reliability, Reasoning + RL, Voice AI, Generative Media, Infrastructure, Security, and Evals. New focuses include MCP, Tiny Teams, Product Management, Design Engineering, and Robotics and Autonomy, featuring foundation models from Waymo, Tesla, and Google. The event highlights the growing importance of AI Architects and enterprise AI leadership. Additionally, Demis Hassabis announced the Gemini 2.5 Pro Preview 'I/O edition', which leads coding and web development benchmarks on LMArena.
not much happened today
prime gpt-4o qwen-32b olmo openai qwen cerebras-systems langchain vercel swaggo gin echo reasoning chain-of-thought math coding optimization performance image-processing software-development agent-frameworks version-control security robotics hardware-optimization medical-ai financial-ai architecture akhaliq jason-wei vikhyatk awnihannun arohan tom-doerr hendrikbgr jerryjliu0 adcock-brett shuchaobi stasbekman reach-vb virattt andrew-n-carr
Olmo 2 released a detailed tech report covering full pre-, mid-, and post-training details for a frontier fully open model. PRIME, an open-source reasoning solution, achieved 26.7% pass@1, surpassing GPT-4o on benchmarks. Performance improvements include Qwen 32B (4-bit) generating at >40 tokens/sec on an M4 Max and libvips resizing images 25x faster than Pillow. New tools include Swaggo/swag for Swagger 2.0 documentation, the Git-compatible VCS Jujutsu (jj), and the Portspoof security tool. Robotics advances include a weapon detection system with a meters-wide field of view and faster frame rates. Hardware benchmarks compared H100 and MI300x accelerators. Applications span medical error detection using PRIME and a financial AI agent integrating LangChainAI and the Vercel AI SDK. Architectural insights suggest the need for breakthroughs similar to SSMs or RNNs.
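As a sanity check on the libvips claim, here is a minimal Python sketch that times the same downscale through pyvips (the libvips bindings) and Pillow. The input path, target width, and run count are illustrative, and the measured speedup will vary with image size, format, and hardware.

```python
# Minimal sketch: compare resize+re-encode throughput of pyvips vs Pillow.
# "input.jpg" is a hypothetical local test image; pyvips and Pillow must be installed.
import io
import time

import pyvips
from PIL import Image


def time_pyvips(path: str, width: int = 512, runs: int = 20) -> float:
    start = time.perf_counter()
    for _ in range(runs):
        # thumbnail() decodes and resizes in one streaming pass, then re-encodes
        pyvips.Image.thumbnail(path, width).write_to_buffer(".jpg")
    return (time.perf_counter() - start) / runs


def time_pillow(path: str, width: int = 512, runs: int = 20) -> float:
    start = time.perf_counter()
    for _ in range(runs):
        img = Image.open(path)
        height = int(img.height * width / img.width)
        out = img.resize((width, height), Image.LANCZOS)
        buf = io.BytesIO()
        out.save(buf, format="JPEG")  # re-encode so both paths do comparable work
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print("pyvips:", time_pyvips("input.jpg"), "s/run")
    print("Pillow:", time_pillow("input.jpg"), "s/run")
```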
Anthropic launches the Model Context Protocol
claude-3.5-sonnet claude-desktop anthropic amazon zed sourcegraph replit model-context-protocol integration json-rpc agentic-behaviors security tool-discovery open-protocol api-integration system-integration prompt-templates model-routing alex-albert matt-pocock hwchase17
Anthropic has launched the Model Context Protocol (MCP), an open protocol designed to enable seamless integration between large language model applications and external data sources and tools. MCP supports diverse resources such as file contents, database records, API responses, live system data, screenshots, and logs, identified by unique URIs. It also includes reusable prompt templates, system and API tools, and JSON-RPC 2.0 transports with streaming support. MCP allows servers to request LLM completions through clients with priorities on cost, speed, and intelligence, hinting at an upcoming model router by Anthropic. Launch partners like Zed, Sourcegraph, and Replit have reviewed MCP favorably, while some developers express skepticism about its provider exclusivity and adoption potential. The protocol emphasizes security, testing, and dynamic tool discovery, with guides and videos available from community members such as Alex Albert and Matt Pocock. This development follows Anthropic's recent $4 billion fundraise from Amazon and aims to advance terminal-level integration for Claude Desktop.
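To make the transport concrete, here is a sketch of two message shapes described above as plain JSON-RPC 2.0 payloads: a client reading a URI-identified resource, and a server asking the client for an LLM completion with cost, speed, and intelligence priorities. Field names follow the MCP documentation at launch, but the exact payloads here are illustrative rather than normative.

```python
# Illustrative JSON-RPC 2.0 messages in the shape MCP uses.
import json

# Client -> server: read a resource identified by a unique URI.
read_resource = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "resources/read",
    "params": {"uri": "file:///var/log/app.log"},
}

# Server -> client: request an LLM completion ("sampling"), expressing
# preferences over cost, speed, and intelligence instead of naming a model.
create_message = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "sampling/createMessage",
    "params": {
        "messages": [
            {"role": "user", "content": {"type": "text", "text": "Summarize the attached log."}}
        ],
        "modelPreferences": {
            "costPriority": 0.2,
            "speedPriority": 0.3,
            "intelligencePriority": 0.9,
        },
        "maxTokens": 512,
    },
}

print(json.dumps(read_resource, indent=2))
print(json.dumps(create_message, indent=2))
```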
FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you're welcome)
llama-3-70b llama-3 wizardlm-2-8x22b claude-opus mistral-8x7b gpt-4 huggingface meta-ai-fair dbrx reka-ai mistral-ai lmsys openai datasets benchmarking quantization zero-shot-learning reasoning code-error-detection token-generation security
2024 has seen a significant increase in dataset sizes for training large language models, with Redpajama 2 offering up to 30T tokens, DBRX at 12T tokens, Reka Core/Flash/Edge with 5T tokens, and Llama 3 trained on 15T tokens. Hugging Face released an open dataset containing 15T tokens from 12 years of filtered CommonCrawl data, enabling training of models like Llama 3 if compute resources are available. On Reddit, WizardLM-2-8x22b outperformed other open LLMs including Llama-3-70b-instruct on reasoning and math benchmarks. Claude Opus demonstrated strong zero-shot code error spotting, surpassing Llama 3. Benchmarks revealed limitations in the LMSYS chatbot leaderboard due to instruction-tuned models gaming the system, and a new RAG benchmark showed Llama 3 70B underperforming compared to GPT-4, while Mistral 8x7B remained strong. Efficient quantized versions of Llama 3 models are available on Hugging Face, with users reporting token generation limits around 9600 tokens on a 3090 GPU. On the safety front, a UK sex offender was banned from using AI tools, and GPT-4 demonstrated an 87% success rate at exploiting real vulnerabilities, raising security concerns.
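For readers who want to inspect the data, a minimal sketch using the Hugging Face datasets library to stream the corpus (published under the HuggingFaceFW/fineweb dataset id) without downloading all 15T tokens:

```python
# Stream FineWeb from the Hugging Face Hub; streaming avoids a full download.
from datasets import load_dataset

fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)

for i, example in enumerate(fineweb):
    # Each record carries the extracted web text plus metadata such as the URL.
    print(example["text"][:200])
    if i >= 2:
        break
```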
1/11/2024: Mixing Experts vs Merging Models
gpt-4-turbo gpt-4-0613 mixtral deepseekmoe phixtral deepseek-ai hugging-face nous-research teenage-engineering discord mixture-of-experts model-merging fine-tuning rag security discord-tos model-performance prompt-engineering function-calling semantic-analysis data-frameworks ash_prabaker shacrw teknium 0xevil everyoneisgross ldj pramod8481 mgreg_42266 georgejrjrjr kenakafrosty
18 guilds, 277 channels, and 1342 messages were analyzed, with an estimated reading time saved of 187 minutes. The community switched to GPT-4 turbo and discussed the rise of Mixture of Experts (MoE) models like Mixtral, DeepSeekMOE, and Phixtral. Model merging techniques, including naive linear interpolation and "frankenmerges" such as SOLAR and Goliath, are driving new performance gains on open leaderboards. Discussions in the Nous Research AI Discord covered AI playgrounds supporting prompt and RAG parameters, security concerns about third-party cloud usage, debates on Discord bots and TOS, skepticism about Teenage Engineering's cloud LLM, and performance differences between GPT-4 0613 and GPT-4 turbo. The community also explored fine-tuning strategies involving DPO, LoRA, and safetensors, integration of RAG with API calls, semantic differences between MoE and dense LLMs, and data frameworks like LlamaIndex and SciPhi-AI's synthesizer. Issues with anomalous characters in fine-tuning were also raised.
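A minimal sketch of the naive linear-interpolation merge mentioned above, assuming two checkpoints that share an architecture; the file paths and interpolation weight are illustrative.

```python
# Naive linear interpolation of two model checkpoints: merged = a*A + (1-a)*B.
import torch


def linear_merge(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    """Interpolate every shared floating-point tensor; copy everything else."""
    merged = {}
    for name, tensor_a in state_a.items():
        tensor_b = state_b[name]
        assert tensor_a.shape == tensor_b.shape, f"shape mismatch at {name}"
        if tensor_a.is_floating_point():
            merged[name] = alpha * tensor_a + (1.0 - alpha) * tensor_b
        else:
            merged[name] = tensor_a.clone()  # e.g. integer buffers
    return merged


if __name__ == "__main__":
    # Hypothetical checkpoint files containing plain state dicts.
    sd_a = torch.load("model_a.bin", map_location="cpu")
    sd_b = torch.load("model_b.bin", map_location="cpu")
    torch.save(linear_merge(sd_a, sd_b, alpha=0.5), "merged.bin")
```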
1/6-7/2024: LLaMA Pro - an alternative to PEFT/RAG??
llama-3 llama-3-1-1b llama-3-8-3b gpt-4 gpt-3.5 dall-e openai mistral-ai llamaindex langchain fine-tuning model-expansion token-limits privacy multilinguality image-generation security custom-models model-training yannic-kilcher
New research papers introduce promising Llama extensions, including TinyLlama, a compact 1.1B-parameter model pretrained on about 1 trillion tokens for 3 epochs, and LLaMA Pro, an 8.3B-parameter model expanding LLaMA2-7B with additional training on 80 billion tokens of code and math data. LLaMA Pro adds layers to avoid catastrophic forgetting and balances language and code tasks, but faces scrutiny for not building on newer models like Mistral or Qwen. Meanwhile, OpenAI Discord discussions reveal insights on GPT-4 token limits, privacy reassurances, fine-tuning for GPT-3.5, challenges with multi-language image recognition, custom GPT creation requiring ChatGPT Plus, and security concerns in GPT deployment. Users also share tips on dynamic image generation with DALL-E and logo creation.
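A simplified sketch of the block-expansion idea behind LLaMA Pro: duplicate existing transformer blocks, zero-initialize the projections that write into the residual stream so the copies start as identity mappings, then train only the new blocks on code and math data. The module names (o_proj, down_proj) are illustrative of LLaMA-style blocks, not the paper's exact implementation.

```python
# Block expansion sketch: interleave zero-initialized copies of existing blocks.
import copy

import torch.nn as nn


def expand_blocks(layers: nn.ModuleList, every: int = 4) -> nn.ModuleList:
    expanded = []
    for i, block in enumerate(layers):
        expanded.append(block)
        if (i + 1) % every == 0:
            new_block = copy.deepcopy(block)
            # Zero the projections that add into the residual stream, so the
            # copied block initially contributes nothing (identity behavior).
            for name, module in new_block.named_modules():
                if isinstance(module, nn.Linear) and name.endswith(("o_proj", "down_proj")):
                    nn.init.zeros_(module.weight)
                    if module.bias is not None:
                        nn.init.zeros_(module.bias)
            expanded.append(new_block)
    return nn.ModuleList(expanded)
```

During continued pretraining, the original blocks would stay frozen and only the inserted copies receive gradients, which is how the approach avoids catastrophic forgetting of the base model's language ability.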