All tags
Person: "arav-srinivas"
not much happened today
smollm2 llama-3-2 stable-diffusion-3.5 claude-3.5-sonnet gemini openai anthropic google meta-ai-fair suno-ai perplexity-ai on-device-ai model-performance robotics multimodality ai-regulation model-releases natural-language-processing prompt-engineering agentic-ai ai-application model-optimization sam-altman akhaliq arav-srinivas labenz loubnabenallal1 alexalbert fchollet stasbekman svpino rohanpaul_ai hamelhusain
ChatGPT Search was launched by Sam Altman, who called it his favorite feature since ChatGPT's original launch, doubling his usage. Comparisons were made between ChatGPT Search and Perplexity with improvements noted in Perplexity's web navigation. Google introduced a "Grounding" feature in the Gemini API & AI Studio enabling Gemini models to access real-time web information. Despite Gemini's leaderboard performance, developer adoption lags behind OpenAI and Anthropic. SmolLM2, a new small, powerful on-device language model, outperforms Meta's Llama 3.2 1B. A Claude desktop app was released for Mac and Windows. Meta AI announced robotics advancements including Meta Sparsh, Meta Digit 360, and Meta Digit Plexus. Stable Diffusion 3.5 Medium, a 2B parameter model with a permissive license, was released. Insights on AGI development suggest initial inferiority but rapid improvement. Anthropic advocates for early targeted AI regulation. Discussions on ML specialization predict training will concentrate among few companies, while inference becomes commoditized. New AI tools include Suno AI Personas for music creation, PromptQL for natural language querying over data, and Agent S for desktop task automation. Humor was shared about Python environment upgrades.
Not much technical happened today
whisper-v3-turbo llama-3 llamaindex openai poolside liquidai perplexity-ai meta-ai-fair cohere fujitsu mixture-of-experts context-windows model-optimization fine-tuning quantization model-training alignment synthetic-data model-architecture agentic-ai nick-turley arav-srinivas francois-fleuret finbarr-timbers lewtun francois-chollet jerry-j-liu mmitchell-ai jxnlco
OpenAI announced raising $6.6B in new funding at a $157B valuation, with ChatGPT reaching 250M weekly active users. Poolside raised $500M to advance AGI development. LiquidAI introduced three new MoE models (1B, 3B, 40B) with a 32k context window and efficient token handling. OpenAI released Whisper V3 Turbo, an open-source multilingual model with significant speed improvements. Meta AI FAIR is hiring research interns focusing on LLM reasoning, alignment, synthetic data, and novel architectures. Cohere partnered with Fujitsu to launch Takane, a custom Japanese model. Technical discussions included challenges in LoRA fine-tuning, float8 quantization in Keras, and new tools like create-llama for agent templates. Industry commentary raised concerns about AI development priorities and highlighted freelancing opportunities in AI.
Contextual Position Encoding (CoPE)
cope gemini-1.5-flash gemini-1.5-pro claude gpt-3 meta-ai-fair google-deepmind anthropic perplexity-ai langchain openai positional-encoding transformers counting copying language-modeling coding external-memory tool-use model-evaluation inference-speed model-benchmarking scaling research-synthesis jason-weston alexandr-wang karpathy arav-srinivas
Meta AI researcher Jason Weston introduced CoPE, a novel positional encoding method for transformers that incorporates context to create learnable gates, enabling improved handling of counting and copying tasks and better performance on language modeling and coding. The approach can potentially be extended with external memory for gate calculation. Google DeepMind released Gemini 1.5 Flash and Pro models optimized for fast inference. Anthropic announced general availability of tool use for Claude, enhancing its ability to orchestrate tools for complex tasks. Alexandr Wang launched SEAL Leaderboards for private, expert evaluations of frontier models. Karpathy reflected on the 4th anniversary of GPT-3, emphasizing scaling and practical improvements. Perplexity AI launched Perplexity Pages to convert research into visually appealing articles, described as an "AI Wikipedia" by Arav Srinivas.
Grok-1 in Bio
grok-1 mixtral miqu-70b claude-3-opus claude-3 claude-3-haiku xai mistral-ai perplexity-ai groq anthropic openai mixture-of-experts model-release model-performance benchmarking finetuning compute hardware-optimization mmlu model-architecture open-source memes sam-altman arthur-mensch daniel-han arav-srinivas francis-yao
Grok-1, a 314B parameter Mixture-of-Experts (MoE) model from xAI, has been released under an Apache 2.0 license, sparking discussions on its architecture, finetuning challenges, and performance compared to models like Mixtral and Miqu 70B. Despite its size, its MMLU benchmark performance is currently unimpressive, with expectations that Grok-2 will be more competitive. The model's weights and code are publicly available, encouraging community experimentation. Sam Altman highlighted the growing importance of compute resources, while Grok's potential deployment on Groq hardware was noted as a possible game-changer. Meanwhile, Anthropic's Claude continues to attract attention for its "spiritual" interaction experience and consistent ethical framework. The release also inspired memes and humor within the AI community.
DeepMind SIMA: one AI, 9 games, 600 tasks, vision+language ONLY
llama-3 claude-3-opus claude-3 gpt-3.5-turbo deepmind cognition-labs deepgram modal-labs meta-ai-fair anthropic multimodality transformer software-engineering ai-agents ai-infrastructure training text-to-speech speech-to-text real-time-processing model-architecture benchmarking andrej-karpathy arav-srinivas francois-chollet yann-lecun soumith-chintala john-carmack
DeepMind SIMA is a generalist AI agent for 3D virtual environments evaluated on 600 tasks across 9 games using only screengrabs and natural language instructions, achieving 34% success compared to humans' 60%. The model uses a multimodal Transformer architecture. Andrej Karpathy outlines AI autonomy progression in software engineering, while Arav Srinivas praises Cognition Labs' AI agent demo. François Chollet expresses skepticism about automating software engineering fully. Yann LeCun suggests moving away from generative models and reinforcement learning towards human-level AI. Meta's Llama-3 training infrastructure with 24k H100 Cluster Pods is shared by Soumith Chintala and Yann LeCun. Deepgram's Aura offers low-latency speech APIs, and Modal Labs' Devin AI demonstrates document navigation and interaction with ComfyUI. Memes and humor circulate in the AI community.
Fixing Gemma
gemma claude-3-opus claude-3 mistral-large gpt-4 google unsloth anthropic mistral-ai finetuning numerical-precision benchmarking structured-data-extraction adaptive-equalizer information-theory hallucination-detection model-stability daniel-han yann-lecun francois-chollet arav-srinivas _aidan_clark_
Google's Gemma model was found unstable for finetuning until Daniel Han from Unsloth AI fixed 8 bugs, improving its implementation. Yann LeCun explained technical details of a pseudo-random bit sequence for adaptive equalizers, while François Chollet discussed the low information bandwidth of the human visual system. Arav Srinivas reported that Claude 3 Opus showed no hallucinations in extensive testing, outperforming GPT-4 and Mistral-Large in benchmarks. Reflections from Yann LeCun highlight ongoing AI progress toward human-level intelligence. The community is shifting pipelines to work better with Claude models, and emotional experiences in ML development were shared by Aidan Clark.