Company: "box"
not much happened today
gpt-4o claude-3.5-sonnet phi-3.5-mini phi-3.5-moe phi-3.5-vision llama-3-1-405b qwen2-math-72b openai anthropic microsoft meta-ai-fair hugging-face langchain box fine-tuning benchmarking model-comparison model-performance diffusion-models reinforcement-learning zero-shot-learning math model-efficiency ai-regulation ai-safety ai-engineering prompt-engineering swyx ylecun
OpenAI launched GPT-4o finetuning with a case study on Cosine. Anthropic released Claude 3.5 Sonnet with 8k-token output. Microsoft's Phi team introduced Phi-3.5 in three variants, Mini (3.8B), MoE (16x3.8B), and Vision (4.2B), all noted for sample efficiency. Meta released Llama 3.1 405B, which is deployable on Google Cloud Vertex AI and offers GPT-4-level capabilities. Qwen2-Math-72B achieved state-of-the-art math benchmark performance and has a Gradio demo. Discussions included model comparisons such as ViT vs. CNN, as well as the Mamba architecture. Tool updates featured the DSPy roadmap, Flux Schnell improving diffusion speed on M1 Max, and LangChain community events. Research highlights included zero-shot DUP prompting for math reasoning and fine-tuning best practices. AI ethics coverage included California's AI Safety Bill SB 1047 and regulatory concerns from Yann LeCun. Swyx commented on AI engineer roles. A "Chat with PDF" feature is now available for Box Enterprise Plus users.
not much happened today
grok-2 claude-3.5-sonnet claude-3.5 gpt-4 chatgpt-4o-latest anthropic x-ai google-deepmind openai mistral-ai meta-ai-fair salesforce box prompt-caching model-performance vision fine-tuning multilinguality ai-safety design-automation document-processing ai-agents ai-integration ai-job-market ai-acceleration humor demis-hassabis francois-chollet
Anthropic rolled out prompt caching in its API, reducing input costs by up to 90% and latency by 80%, which effectively enables "instant fine-tuning" through longer cached prompts. xAI released Grok-2, a new model competing with frontier models from Google DeepMind, OpenAI, Anthropic, Mistral AI, and Meta AI FAIR; it supports vision and text inputs and integrates external image generation models. Claude 3.5 Sonnet is reported to outperform GPT-4 in coding and reasoning, while ChatGPT-4o-latest shows reasoning improvements. François Chollet proposed a theory defining intelligence as the efficiency with which past information is operationalized for future tasks. The Aya project involves 3,000 collaborators building multilingual AI datasets. Demis Hassabis discussed AI hype and safe AI development in a podcast. Tools such as Dora AI for Figma and Box's AI API enhance design automation and document processing. Salesforce released DEI, an open framework for AI software engineering agents, with a 55% resolve rate on SWE-Bench Lite. Industry trends highlight rapid AI integration, the importance of networking in the AI job market, and a potential OpenAI GPT-4 expansion in response to competitors. Memes included humor about Apple Vision Pro.
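The "up to 90%" input-cost reduction comes from billing repeated reads of a cached prompt prefix at a fraction of the normal input price. The sketch below works through that arithmetic under stated assumptions: the multipliers (1.25x the base input price to write the cache, 0.1x to read it) are assumed values to check against Anthropic's current pricing page, and the function names and the $3-per-million-token price in the usage note are hypothetical illustrations, not part of any API.

```python
# Assumed pricing multipliers (verify against Anthropic's current pricing):
CACHE_WRITE_MULTIPLIER = 1.25  # first request pays extra to write the prefix to the cache
CACHE_READ_MULTIPLIER = 0.10   # later requests read the cached prefix at a discount


def input_cost(prefix_tokens: int, suffix_tokens: int, calls: int,
               price_per_token: float, cached: bool) -> float:
    """Total input cost of `calls` requests that share the same prompt prefix.

    `prefix_tokens` is the shared (cacheable) part, e.g. a long document;
    `suffix_tokens` is the per-request part, e.g. the user's question.
    """
    if calls <= 0:
        return 0.0
    if not cached:
        # Every request pays full price for the whole prompt.
        return calls * (prefix_tokens + suffix_tokens) * price_per_token
    # First request writes the prefix to the cache at a premium.
    first = (prefix_tokens * CACHE_WRITE_MULTIPLIER + suffix_tokens) * price_per_token
    # Remaining requests read the prefix from the cache at a discount.
    rest = (calls - 1) * (prefix_tokens * CACHE_READ_MULTIPLIER
                          + suffix_tokens) * price_per_token
    return first + rest


def savings(prefix_tokens: int, suffix_tokens: int, calls: int,
            price_per_token: float) -> float:
    """Fraction of input cost saved by caching the shared prefix."""
    uncached = input_cost(prefix_tokens, suffix_tokens, calls, price_per_token, cached=False)
    cached = input_cost(prefix_tokens, suffix_tokens, calls, price_per_token, cached=True)
    return 1.0 - cached / uncached


if __name__ == "__main__":
    # Hypothetical workload: a 100k-token document queried 10 times.
    print(f"{savings(100_000, 1_000, 10, 3e-6):.1%}")
```

With these assumed numbers, ten queries over a 100k-token document cost about $0.68 instead of $3.03, a saving of roughly 78%. The saving only approaches the 90% ceiling when the cached prefix dominates the prompt and is reused many times, because the first request pays the write premium and the per-request suffix is never discounted.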
not much happened today
llama-3 llama-3-1 grok-2 claude-3.5-sonnet gpt-4-turbo nous-research nvidia salesforce goodfire-ai anthropic x-ai google-deepmind box langchain fine-tuning prompt-caching mechanistic-interpretability model-performance multimodality agent-frameworks software-engineering-agents api document-processing text-generation model-releases vision image-generation efficiency scientific-discovery fchollet demis-hassabis
GPT-5 was delayed again amid a quiet news day. Nous Research released Hermes 3, a finetune of Llama 3 base models that rivals FAIR's instruct tunes but sparked debate over emergent "existential crisis" behavior in a model trained with 6% roleplay data. Nvidia introduced a Minitron finetune of Llama 3.1. Salesforce launched a DEI agent scoring 55% on SWE-Bench Lite. Goodfire AI secured $7M in seed funding for mechanistic interpretability work. Anthropic rolled out prompt caching in its API, cutting input costs by up to 90% and latency by 80%, which aids coding assistants and large document processing. xAI released Grok-2, which matches Claude 3.5 Sonnet and GPT-4 Turbo on the LMSYS leaderboard and supports vision+text inputs and image generation integration. Claude 3.5 Sonnet reportedly outperforms GPT-4 in coding and reasoning. François Chollet defined intelligence as the efficient operationalization of past information for future tasks. Salesforce's DEI framework surpasses the performance of individual agents. Google DeepMind's Demis Hassabis discussed AGI's role in scientific discovery and safe AI development. The Dora AI plugin generates landing pages in under 60 seconds, boosting web team efficiency. The Box AI API beta enables document chat, data extraction, and content summarization. LangChain updated its Python and JavaScript integration docs.
How Carlini Uses AI
gemma-2-2b gpt-3.5-turbo-0613 mixtral-8x7b gen-3-alpha segment-anything-model-2 stable-fast-3d groq intel deepmind box figure-ai openai google meta-ai-fair nvidia stability-ai runway benchmarking adversarial-attacks large-language-models text-generation multimodality robotics emotion-detection structured-data-extraction real-time-processing teleoperation 3d-generation text-to-video nicholas-carlini chris-dixon rasbt
Groq's shareholders' net worth rises while others' falls, with Intel's CEO expressing concern. Nicholas Carlini of DeepMind gains recognition and criticism for his extensive AI writings, including an 80,000-word treatise on AI use and a benchmark for large language models. Chris Dixon comments on AI Winter skepticism, emphasizing long-term impact. Box introduces an AI API for extracting structured data from documents, highlighting both the potential and the risks of LLM-driven solutions. Recent AI developments include Figure AI launching the advanced humanoid robot Figure 02, OpenAI rolling out Advanced Voice Mode for ChatGPT with emotion detection, Google open-sourcing the Gemma 2 2B model, which matches GPT-3.5-Turbo-0613 performance, Meta AI FAIR releasing Segment Anything Model 2 (SAM 2) for real-time object tracking, NVIDIA showcasing Project GR00T for humanoid teleoperation with Apple Vision Pro, Stability AI launching Stable Fast 3D for rapid 3D asset generation, and Runway unveiling Gen-3 Alpha for AI text-to-video generation.