Person: "clementdelangue"
not much happened today
seedance-1.0 codex claude-code kling-2.1 veo-3 bytedance morph-labs huggingface deeplearning.ai figure-ai langchain sakana-ai video-generation autoformalization ai-assisted-coding api-design context-engineering reinforcement-learning ai-evals hypernetworks model-fine-tuning foundation-models andrew_ng hwchase17 adcock_brett clementdelangue akhaliq jxmnop hamelhusain sh_reya
ByteDance showcased an impressive state-of-the-art video generation model called Seedance 1.0 without releasing it, while Morph Labs announced Trinity, an autoformalization system for Lean. Hugging Face Transformers deprecated TensorFlow/JAX support. Andrew Ng of DeepLearning.AI highlighted the rise of the GenAI Application Engineer role, emphasizing skills in AI building blocks and AI-assisted coding tools like Codex and Claude Code. Engineering teams are increasingly testing API designs against LLMs for usability. Figure AI's CEO stressed speed as a key competitive advantage, and LangChain introduced the concept of Context Engineering for AI agents. Reinforcement learning on LLMs shows transformative potential, and the community values AI evals and data work. Sakana AI released Text-to-LoRA, a hypernetwork method for generating task-specific LoRA adapters from natural-language task descriptions, enabling efficient model customization (a sketch follows below). The video generation race heats up with ByteDance's Seedance model praised for quality, challenging American labs, alongside models like Kling 2.1 and Veo 3.
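For intuition on the Text-to-LoRA idea, here is a minimal PyTorch sketch of a hypernetwork that maps a task-description embedding to the low-rank factors of a LoRA adapter. The class name, MLP shape, and all dimensions are illustrative assumptions, not Sakana's actual architecture.

```python
import torch
import torch.nn as nn

class TextToLoRAHypernet(nn.Module):
    """Hypothetical sketch: map a task-description embedding to LoRA factors."""
    def __init__(self, embed_dim=768, hidden=256, d_model=1024, rank=8):
        super().__init__()
        self.d_model, self.rank = d_model, rank
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * d_model * rank),  # enough outputs for A and B
        )

    def forward(self, task_emb):
        flat = self.net(task_emb)
        a, b = flat.split(self.d_model * self.rank, dim=-1)
        A = a.view(-1, self.rank, self.d_model)   # LoRA down-projection
        B = b.view(-1, self.d_model, self.rank)   # LoRA up-projection
        return A, B                               # delta_W = B @ A

# Embed a description like "summarize legal contracts" with any text encoder,
# then generate an adapter for a frozen base layer with no gradient fine-tuning.
hypernet = TextToLoRAHypernet()
A, B = hypernet(torch.randn(1, 768))
delta_W = torch.bmm(B, A)  # (1, 1024, 1024) additive weight update
```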
not much happened today
codex claude-4-opus claude-4-sonnet gemini-2.5-pro gemini-2.5 qwen-2.5-vl qwen-3 playdiffusion openai anthropic google perplexity-ai bing playai suno hugging-face langchain-ai qwen mlx assemblyai llamacloud fine-tuning model-benchmarking text-to-video agentic-ai retrieval-augmented-generation open-source-models speech-editing audio-processing text-to-speech ultra-low-latency multimodality public-notebooks sama gdb kevinweil lmarena_ai epochairesearch reach_vb wightmanr deeplearningai mervenoyann awnihannun jordirib1 aravsrinivas omarsar0 lioronai jerryjliu0 nerdai tonywu_71 _akhaliq clementdelangue _mfelfel
OpenAI rolled out Codex to ChatGPT Plus users with internet access and fine-grained controls, improving memory features for free users. Anthropic's Claude 4 Opus and Sonnet models lead coding benchmarks, while Google's Gemini 2.5 Pro and Flash models gain recognition with new audio capabilities. Qwen 2.5-VL and Qwen 3 quantizations are noted for versatility and support. Bing Video Creator launched globally enabling text-to-video generation, and Perplexity Labs sees increased demand for travel search. New agentic AI tools and RAG innovations include LlamaCloud and FedRAG. Open-source releases include Holo-1 for web navigation and PlayAI's PlayDiffusion for speech editing. Audio and multimodal advances feature Suno's music editing upgrades, Google's native TTS in 24+ languages, and Universal Streaming's ultra-low latency speech-to-text. Google NotebookLM now supports public notebooks. "Codex's internet access brings tradeoffs, with explicit warnings about risk" and "Gemini 2.5 Pro is cited as a daily driver by users".
not much happened today
deepseek-r1-0528 pali-gemma-2 gemma-3 shieldgemma-2 txgemma gemma-3-qat gemma-3n-preview medgemma dolphingemma signgemma claude-4 opus-4 claude-sonnet-4 codestral-embed bagel qwen nemotron-cortexa gemini-2.5-pro deepseek-ai huggingface gemma claude bytedance qwen nemotron sakana-ai-labs benchmarking model-releases multimodality code-generation model-performance long-context reinforcement-learning model-optimization open-source yuchenj_uw _akhaliq clementdelangue osanseviero alexalbert__ guillaumelample theturingpost lmarena_ai epochairesearch scaling01 nrehiew_ ctnzr
DeepSeek released R1-0528, an updated R1 checkpoint, available on Hugging Face and through inference partners. The Gemma model family continues prolific development, including PaliGemma 2, Gemma 3, and others. Claude 4 and its variants Opus 4 and Claude Sonnet 4 show top benchmark performance, including new SOTA on ARC-AGI-2 and WebDev Arena. Codestral Embed introduces a 3072-dimensional code embedder. BAGEL, an open-source multimodal model by ByteDance, supports reading, reasoning, drawing, and editing over long mixed contexts. Benchmarking highlights include Nemotron-CORTEXA topping SWEBench and Gemini 2.5 Pro's results on VideoGameBench. Discussion of whether even random rewards can improve RL training focused on Qwen models. "Opus 4 NEW SOTA ON ARC-AGI-2. It's happening - I was right" and "Claude 4 launch has dev moving at a different pace" reflect excitement in the community.
How To Scale Your Model, by DeepMind
qwen-0.5 google-deepmind deepseek hugging-face transformers inference high-performance-computing robotics sim2real mixture-of-experts reinforcement-learning bias-mitigation rust text-generation open-source omarsar0 drjimfan tairanhe99 guanyashi lioronai _philschmid awnihannun clementdelangue
Researchers at Google DeepMind (GDM) released a comprehensive "little textbook" titled "How To Scale Your Model", covering modern Transformer architectures, inference optimizations beyond O(N^2) attention, and high-performance computing concepts like rooflines; it includes practice problems, and the authors are engaging with reader comments in real time (a toy roofline calculation follows below). On AI Twitter, key updates include the open-sourced ASAP humanoid robotics model, which reproduces athletic motions inspired by Cristiano Ronaldo, LeBron James, and Kobe Bryant; a new Mixture-of-Agents paper proposing the Self-MoA method for improved LLM output aggregation; training of reasoning LLMs with DeepSeek's GRPO algorithm demonstrated on Qwen 0.5; findings on bias in LLMs used as judges, highlighting the need for multiple independent evaluations; and the release of mlx-rs, a Rust library for machine learning with examples including Mistral text generation. Additionally, Hugging Face launched an AI app store featuring over 400,000 apps, with 2,000 new daily additions and 2.5 million weekly visits, enabling AI-powered app search and categorization.
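For a flavor of the rooflines chapter: the roofline model says a kernel's runtime is bounded below by the larger of its compute time and its memory-traffic time. A toy calculation with made-up hardware numbers, not figures from the book:

```python
def roofline_time(flops, bytes_moved, peak_flops, mem_bandwidth):
    """Roofline model: a kernel can finish no faster than the slower of
    its compute time and its memory-traffic time."""
    return max(flops / peak_flops, bytes_moved / mem_bandwidth)

# Illustrative numbers (not a specific chip): a bf16 matmul with
# m = n = k = 4096 on hardware doing 1e15 FLOP/s and 1e12 bytes/s.
m = n = k = 4096
flops = 2 * m * n * k                      # one multiply-add = 2 FLOPs
bytes_moved = 2 * (m * k + k * n + m * n)  # bf16 = 2 bytes per element
print(roofline_time(flops, bytes_moved, 1e15, 1e12))  # compute-bound here
```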
Mistral Small 3 24B and Tulu 3 405B
mistral-small-3 tulu-3-405b llama-3 tiny-swallow-1.5b qwen-2.5-max deepseek-v3 claude-3.5-sonnet gemini-1.5-pro gpt4o-mini llama-3-3-70b mistral-ai ai2 sakana-ai alibaba_qwen deepseek ollama llamaindex reinforcement-learning model-fine-tuning local-inference model-performance model-optimization on-device-ai instruction-following api training-data natural-language-processing clementdelangue dchaplot reach_vb
Mistral AI released Mistral Small 3, a 24B-parameter model optimized for local inference with low latency and 81% accuracy on MMLU, competing with Llama 3.3 70B, Qwen-2.5 32B, and GPT4o-mini. AI2 released Tülu 3 405B, a finetune of Llama 3 405B trained with Reinforcement Learning from Verifiable Rewards (RLVR), competitive with DeepSeek v3 (a toy sketch of the RLVR reward follows below). Sakana AI launched TinySwallow-1.5B, a Japanese language model distilled with TAID for on-device use. Alibaba's Qwen team released Qwen 2.5 Max, trained on 20 trillion tokens, with performance comparable to DeepSeek V3, Claude 3.5 Sonnet, and Gemini 1.5 Pro, plus updated API pricing. These releases highlight advances in open models, efficient inference, and reinforcement learning techniques.
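RLVR replaces a learned reward model with a programmatic check that the output is verifiably correct, giving binary credit. A toy sketch of that reward function; the `verify` callable is a stand-in for illustration, not AI2's implementation:

```python
def verifiable_reward(prompt, completion, verify):
    """RLVR-style reward: binary credit from an automatic checker,
    with no learned reward model in the loop."""
    return 1.0 if verify(prompt, completion) else 0.0

# A math problem whose correctness is checkable by string match.
r = verifiable_reward(
    "What is 17 * 24?",
    "17 * 24 = 408",
    verify=lambda p, c: c.strip().endswith("408"),
)
assert r == 1.0
```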
not much happened today
phi-4 reinforce++ arc-agi-2 ai21-labs ollama langchain togethercompute groq reinforcement-learning ppo model-optimization memory-efficiency python-packages vision text-extraction frontend-code-generation workflow-automation coding-agents compute-cost-reduction ethical-ai agi-benchmarks scam-alerts sebastien-bubeck fchollet tom-doerr arohan_ bindureddy hwchase17 jonathanross321 clementdelangue vikhyatk
Sebastien Bubeck introduced REINFORCE++, enhancing classical REINFORCE with PPO-inspired techniques for 30% faster training (a rough sketch follows below). Microsoft's Phi-4 was released under the MIT License, accessible via Ollama. François Chollet announced plans for ARC-AGI-2 and a next-generation AGI benchmark. LangChain launched 10 new integration packages to boost LLM application development. Tom Doerr introduced Ollama-OCR, a Python package for text extraction using vision language models. Rohan Anil optimized Shampoo for memory efficiency, reducing usage from 20 to 6 bytes per parameter. Bindu Reddy showcased CodeLLM v1 for frontend code generation and highlighted LlamaIndex Workflows for academic summarization and slide generation. Harrison Chase collaborated with Together Compute to enhance WebDev Arena with complex coding agents for LLM coding evaluations. Jonathan Ross detailed Groq's mission to reduce compute costs by 1000x amid rising generative AI spending. Clement Delangue warned about scams involving false claims of association with AI21. Vikhyat Korrapati raised concerns about the ethical implications and trade-offs of AGI. Memes and humor included creative AI prompts and critiques of LLM behaviors.
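As a rough illustration of "REINFORCE with PPO-inspired techniques": the sketch below combines a plain REINFORCE objective with batch-normalized rewards as advantages (no learned critic), PPO-style ratio clipping, and a crude KL drift penalty. This is a hedged approximation of the general recipe, not the paper's exact formulation.

```python
import torch

def reinforce_pp_loss(logp_new, logp_old, rewards, clip_eps=0.2, kl_coef=0.01):
    """Sketch of a REINFORCE loss with PPO-inspired additions:
    normalized rewards as advantages and ratio clipping per update."""
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    policy_loss = -torch.min(unclipped, clipped).mean()
    kl_reg = kl_coef * (logp_new - logp_old).mean()  # crude drift penalty
    return policy_loss + kl_reg

loss = reinforce_pp_loss(torch.randn(8), torch.randn(8), torch.randn(8))
```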
not much happened this weekend
o3 o1 opus sonnet octave openai langchain hume x-ai amd nvidia meta-ai-fair hugging-face inference-time-scaling model-ensembles small-models voice-cloning fine-math-dataset llm-agent-framework benchmarking software-stack large-concept-models latent-space-reasoning mechanistic-interpretability planning speech-language-models lisa-su clementdelangue philschmid neelnanda5
The o3 model gained significant attention, with discussions of its capabilities and implications, including an OpenAI board member referencing "AGI." LangChain released its State of AI 2024 survey. Hume announced OCTAVE, a 3B-parameter, API-only speech-language model with voice cloning. x.ai secured a $6B Series C funding round. Discussions highlighted inference-time scaling, model ensembles, and the surprising generalization ability of small models. New tools and datasets include FineMath, the best open math dataset on Hugging Face, and frameworks for LLM agents. Industry updates cover a 5-month benchmarking of AMD MI300X vs Nvidia H100 + H200, insights from a meeting with Lisa Su on AMD's software stack, and open AI engineering roles. Research innovations include Large Concept Models (LCM) from Meta AI, Chain of Continuous Thought (Coconut) for latent-space reasoning, and mechanistic interpretability initiatives.
Meta Apollo - Video Understanding up to 1 hour, SOTA Open Weights
apollo-1b apollo-3b apollo-7b veo-2 imagen-3 llama-3-70b llama-3b command-r7b llama-1b llama-8b chatgpt meta-ai-fair hugging-face google-deepmind openai figure-ai klarna cohere notion video-understanding scaling-consistency benchmarking temporal-ocr egocentric-perception spatial-perception reasoning video-generation physics-simulation voice-features map-integration language-expansion test-time-compute-scaling humanoid-robots ai-integration search-optimization self-recognition self-preference-bias akhaliq _lewtun clementdelangue adcock_brett rohanpaul_ai swyx shaneguML
Meta released Apollo, a new family of state-of-the-art video-language models available in 1B, 3B, and 7B sizes, featuring "Scaling Consistency" for efficient scaling and introducing ApolloBench, which speeds up video understanding evaluation by 41× across five temporal perception categories. Google DeepMind launched Veo 2, a 4K video generation model with improved physics and camera control, alongside an enhanced Imagen 3 image model. OpenAI globally rolled out ChatGPT search with advanced voice and map features and discussed a potential $2,000/month "ChatGPT Max" tier. Research highlights include achieving Llama 70B performance from Llama 3B via test-time compute scaling (the simplest such recipe is sketched below) and expanding Command R7B language support from 10 to 23 languages. Industry updates feature Figure AI delivering humanoid robots commercially and Klarna reducing its workforce through AI. Notion integrated Cohere Rerank for better search. Studies reveal LLMs can recognize their own writing style and show self-preference bias. Discussions note video processing progress outpacing text due to better signal-per-compute and data evaluation.
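The simplest test-time compute scaling recipe behind "small model matches big model" results is best-of-N sampling: draw many candidates from the small model and let an external verifier or reward model pick one. A minimal sketch; the `generate` and `score` callables are assumptions, not a specific paper's code:

```python
def best_of_n(prompt, generate, score, n=16):
    """Sample n candidates and keep the one the scorer prefers.
    More samples trade inference FLOPs for answer quality."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

# Toy usage with stand-in callables:
import random
answer = best_of_n(
    "2+2?",
    generate=lambda p: random.choice(["3", "4"]),
    score=lambda p, c: 1.0 if c == "4" else 0.0,
)
```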
Qwen with Questions: 32B open weights reasoning model nears o1 in GPQA/AIME/Math500
deepseek-r1 qwq gpt-4o claude-3.5-sonnet qwen-2.5 llama-cpp deepseek sambanova hugging-face dair-ai model-releases benchmarking fine-tuning sequential-search inference model-deployment agentic-rag external-tools multi-modal-models justin-lin clementdelangue ggerganov vikparuchuri
DeepSeek R1 leads the race for "open o1" models but has yet to release weights, while Qwen's Junyang (Justin) Lin released QwQ, a 32B open-weight model that outperforms GPT-4o and Claude 3.5 Sonnet on several benchmarks. QwQ appears to be a fine-tuned version of Qwen 2.5, emphasizing sequential search and reflection for complex problem-solving. SambaNova promotes its RDUs as superior to GPUs for inference tasks, highlighting the shift from training to inference in AI systems. On Twitter, Hugging Face announced CPU deployment for llama.cpp instances, Marker v1 was released as a faster, more accurate PDF-to-markdown conversion tool, and Agentic RAG developments focus on integrating external tools and advanced LLM chains for improved response accuracy. The open-source AI community sees growing momentum with models like Flux gaining popularity, reflecting a shift toward multi-modal AI models spanning image, video, audio, and biology.
DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality
nemotron-70b claude claude-3.5-sonnet gpt-4o deepseek meta-ai-fair wandb nvidia anthropic hugging-face perplexity-ai multimodality image-generation speech-synthesis fine-tuning model-merging benchmarking open-source model-optimization reinforcement-learning bindureddy aravsrinivas danielhanchen clementdelangue cwolferesearch
DeepSeek Janus and Meta SpiRit-LM are two notable multimodal AI models recently released, showcasing advances in image generation and speech synthesis respectively. DeepSeek Janus decouples the vision encoders for image understanding and image generation, achieving better results in both tasks. Meta's SpiRit-LM is an expressive speech-and-writing model that generates pitch and style units, improving over standard TTS. Additionally, W&B Weave offers LLM observability and multimodal fine-tuning tools. Industry updates include Nvidia's Nemotron 70B model underperforming, Meta open-sourcing Movie Gen Bench for media generation benchmarking, Perplexity launching internal search with multi-step reasoning, and Anthropic updating its Claude apps. Open-source progress includes Hugging Face's gradient-accumulation fix in transformers and advocacy for open-source AI to prevent Big Tech dominance. "Model merging for combining skills of multiple models" is also highlighted.
The AI Nobel Prize
claude-3.5-sonnet reka-flash got openai anthropic reka-ai zep artificial-neural-networks nobel-prize knowledge-graphs memory-layers real-time-voice-api vision fine-tuning prompt-caching multimodality function-calling ocr open-source single-sign-on software-testing ai-assisted-coding ai-ethics geoff-hinton john-hopfield philschmid alexalbert mervenoyann clementdelangue svpino bindureddy ylecun rohanpaul_ai
Geoffrey Hinton and John Hopfield won the Nobel Prize in Physics for their work on artificial neural networks, with a 14-page award citation detailing their contributions. Zep released a new community edition of its low-latency memory layer for AI agents, emphasizing knowledge graphs for memory. At OpenAI's DevDay, new features were introduced including a real-time voice API, vision model fine-tuning, and prompt caching with a 50% discount on reused tokens. Anthropic's Claude 3.5 Sonnet was recognized as the best model currently available. Reka AI Labs updated their Reka Flash model with enhanced multimodal and function-calling capabilities. GOT (General OCR Theory), a unified end-to-end OCR model, achieved 98.79% accuracy on OCR benchmarks. Discussions on open-source AI models highlighted their role in fostering competition and decentralization. Software development insights included the importance of Single Sign-On (SSO), thorough testing, and AI-assisted coding workflows. Ethical and societal topics covered critiques of tax policies and the appointment of France's first Minister of AI.
not much happened today
llama-3-2 llama-3 molmo meta-ai-fair google-deepmind hugging-face on-device-ai multimodality chip-design retrieval-augmented-generation rag benchmarking reliability ai-regulation free-speech pytorch-optimization demis-hassabis clementdelangue svpino awnihannun osanseviero omarsar0 sarahookr ylecun
Meta released Llama 3.2, including lightweight 1B and 3B models for on-device AI with capabilities like summarization and retrieval-augmented generation. Molmo, a new multimodal model, was introduced with a large dense captioning dataset. Google DeepMind announced AlphaChip, an AI-driven chip design method improving TPU and CPU designs. Hugging Face surpassed 1 million free public models, highlighting the value of smaller specialized models. Discussions covered challenges in scaling RAG applications, the future of on-device AI running ChatGPT-level models, reliability issues in larger LLMs, and new Elo benchmarking accepted at NeurIPS 2024. AI ethics and regulation topics included free speech responsibilities and California's SB-1047 bill potentially affecting open-source AI. "AlphaChip transformed computer chip design," and "ChatGPT-level AI on mobile devices predicted within a year."
not much happened today + AINews Podcast?
superforecaster-ai llama-3 reflection-70b glean sambanova cerebras stanford google apple hugging-face lmsys prompt-engineering research-ideas inference-speed retrieval-augmented-generation evaluation-methods visual-intelligence on-device-ai model-performance benchmarking novelty-detection danhendrycks benjamin-clavie bclavie bindureddy swyx borismpower corbtt drjimfan clementdelangue rohanpaul_ai
Glean doubled its valuation again. Dan Hendrycks' Superforecaster AI generates plausible election forecasts with interesting prompt engineering. A Stanford study found that LLM-generated research ideas are statistically more novel than those by expert humans. SambaNova announced faster inference for llama-3 models, surpassing Cerebras. Benjamin Clavie gave a notable talk on retrieval-augmented generation techniques. Strawberry is reported to launch in two weeks. Google Illuminate offers AI-generated podcast discussions about papers and books. Apple unveiled new AI features in iOS 18, including visual intelligence and improved Siri, with on-device and cloud processing for camera-based event additions. The Reflection 70B model sparked controversy over performance claims. Experts highlighted the unreliability of traditional benchmarks like MMLU and HumanEval, recommending alternative evaluation methods such as LMSys Chatbot Arena and Hugging Face's open-sourced Lighteval suite. The AI research community continues to explore AI's role in generating novel research ideas and improving benchmarking.
Rombach et al: FLUX.1 [pro|dev|schnell], $31m seed for Black Forest Labs
gemma-2-2b gpt-3.5-turbo-0613 mixtral-8x7b flux-1 stability-ai google-deepmind nvidia text-to-image text-to-video model-benchmarking open-weight-models model-distillation safety-classifiers sparse-autoencoders ai-coding-tools rohanpaul_ai fchollet bindureddy clementdelangue ylecun svpino
Stable Diffusion co-creator Robin Rombach launched FLUX.1 at his new startup Black Forest Labs (backed by a $31M seed), a text-to-image model in three variants: pro (API only), dev (open-weight, non-commercial), and schnell (Apache 2.0). FLUX.1 outperforms Midjourney and Ideogram on Black Forest Labs' internal ELO evaluations, and the company plans to expand into text-to-video. Google DeepMind released Gemma-2 2B, a 2-billion-parameter open-weights model that outperforms larger models like GPT-3.5-Turbo-0613 and Mixtral-8x7b on Chatbot Arena, optimized with NVIDIA TensorRT-LLM. The release includes safety classifiers (ShieldGemma) and sparse autoencoder analysis tools (Gemma Scope). Discussions highlight benchmarking discrepancies and US government support for open-weight AI models. Critiques of AI coding tools' productivity gains were also noted.
Is this... OpenQ*?
deepseek-coder-v2 llama-3-8b nemotron-4-340b stable-diffusion-3-medium deepseek_ai anthropic runwayml openai apple nvidia stability-ai luma-labs reward-tampering test-time-search mathematical-reasoning process-supervision fine-tuning on-device-ai video-generation cost-efficiency context-length coding image-understanding multimodality adcock_brett clementdelangue svpino
DeepSeek-Coder-V2 promises GPT-4-Turbo-beating performance at a fraction of the cost, excelling in coding and math with support for 338 programming languages and a 128K context length. Anthropic released new research on reward tampering. Runway launched Gen-3 Alpha, its video generation answer to Sora. A series of papers explore "test-time" search techniques that improve mathematical reasoning with models like LLaMa-3 8B. Apple announced Apple Intelligence with a smarter Siri and image/document understanding, partnered with OpenAI to integrate ChatGPT into iOS 18, and released 20 new CoreML models with LoRA fine-tuning for specialization. NVIDIA released Nemotron-4 340B, an open model matching GPT-4 performance. Stability AI released Stable Diffusion 3 Medium weights. Luma Labs launched Dream Machine for 5-second video generation from text and images.
Francois Chollet launches $1m ARC Prize
gpt-4 chatgpt openai apple togethercompute benchmarking agi pattern-recognition skill-acquisition privacy on-device-ai mixed-precision-quantization mixture-of-experts multimodality agentic-ai francois-chollet karpathy svpino philschmid clementdelangue sama gdb miramurati kevin-weil sarah-friar
François Chollet critiques current paths to AGI, emphasizing the importance of benchmarks that resist saturation and focus on skill acquisition and open-ended problem solving. The ARC-AGI puzzles exemplify "easy for humans, hard for AI" challenges to measure progress toward AGI. Meanwhile, Apple announces integration of ChatGPT into iOS, iPadOS, and macOS through a partnership with OpenAI, enabling AI-powered features like document summarization and photo analysis with privacy-preserving measures. Discussions highlight Apple's focus on deep AI integration and on-device models optimized with techniques like mixed-precision quantization, though some skepticism remains about their AI capabilities compared to GPT-4. Additionally, Together Compute introduces a Mixture of Agents approach achieving strong performance on AlpacaEval 2.0.
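Mixture of Agents layers several proposer LLMs whose drafts are synthesized by an aggregator LLM. A single-round sketch under the assumption that each model is exposed as a simple prompt-to-text callable; the prompt wording is illustrative, not Together's:

```python
def mixture_of_agents(prompt, proposers, aggregator):
    """One Mixture-of-Agents round: proposer LLMs draft answers,
    then an aggregator LLM synthesizes them into a single response."""
    drafts = [propose(prompt) for propose in proposers]
    numbered = "\n\n".join(f"[{i + 1}] {d}" for i, d in enumerate(drafts))
    return aggregator(
        f"{prompt}\n\nCandidate answers:\n{numbered}\n\n"
        "Combine their strengths into one improved final answer."
    )

# Toy usage with stand-in callables:
final = mixture_of_agents(
    "Name a prime number.",
    proposers=[lambda p: "2", lambda p: "4", lambda p: "7"],
    aggregator=lambda p: "7",  # a real aggregator is another LLM call
)
```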
5 small news items
llama-3 xLSTM openai cohere deepmind hugging-face nvidia mistral-ai uncertainty-quantification parameter-efficient-fine-tuning automated-alignment model-efficiency long-context agentic-ai fine-tuning inference-optimization leopold-aschenbrenner will-brown rohanpaul_ai richardmcngo omarsar0 hwchase17 clementdelangue sophiamyang
OpenAI announced that ChatGPT's voice mode is "coming soon." Leopold Aschenbrenner launched Situational Awareness, a five-part series on AGI timelines predicting trillion-dollar compute clusters if current AI progress continues. Will Brown released a comprehensive GenAI Handbook. Cohere completed a $450 million funding round at a $5 billion valuation. DeepMind research on uncertainty quantification in LLMs and an xLSTM model outperforming transformers were highlighted. Studies on the geometry of concepts in LLMs and methods to eliminate matrix multiplication for efficiency gains were shared. Discussions covered parameter-efficient fine-tuning (PEFT) and automated alignment of LLMs. New tools include LangGraph for AI agents, LlamaIndex with longer context windows, and Hugging Face's integration with NVIDIA NIM for Llama 3. Mistral AI released a fine-tuning API for its models.
Mamba-2: State Space Duality
mamba-2 mamba transformer++ llama-3-70b gpt-3 hugging-face state-space-models perplexity training-efficiency data-pruning benchmarking multimodality video-analysis _albertgu tri_dao arankomatsuzaki _akhaliq clementdelangue karpathy
Mamba-2, a new state space model (SSM), outperforms predecessors like Mamba and Transformer++ in perplexity and wall-clock time, featuring 8x larger states and 50% faster training. It introduces state space duality (SSD), connecting SSMs and linear attention (see the unrolled recurrence below). The FineWeb-Edu dataset, a high-quality subset of the 15-trillion-token FineWeb dataset filtered with llama-3-70b for educational quality, enables better and faster LLM learning, potentially reducing the tokens needed to surpass GPT-3 performance. Additionally, perplexity-based data pruning using a 125M-parameter model improves downstream performance and reduces pretraining steps by up to 1.45x. The Video-MME benchmark evaluates multi-modal LLMs on video analysis across multiple visual domains and video lengths.
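The duality in one line: unrolling the selective-SSM recurrence shows each output is a lower-triangular (semiseparable) mixing of the inputs, which is exactly a masked linear-attention form. Notation here is the standard selective-SSM form and may differ from the paper's exact symbols:

```latex
h_t = A_t\,h_{t-1} + B_t\,x_t, \qquad y_t = C_t^{\top} h_t
\quad\Longrightarrow\quad
y_t = \sum_{s=1}^{t} C_t^{\top}\Big(\prod_{r=s+1}^{t} A_r\Big) B_s\, x_s
```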
DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost
deepseek-v2 llama-3-120b llama-3-400b gpt-4 mistral phi claude gemini mai-1 med-gemini deepseek-ai mistral-ai microsoft openai scale-ai tesla nvidia google-deepmind mixture-of-experts multi-head-attention model-inference benchmarking overfitting robotics teleoperation open-source multimodality hallucination-detection fine-tuning medical-ai model-training erhartford maximelabonne bindureddy adcock_brett drjimfan clementdelangue omarsar0 rohanpaul_ai
DeepSeek V2 introduces a new state-of-the-art MoE model with 236B parameters and a novel Multi-Head Latent Attention mechanism, achieving faster inference and surpassing GPT-4 on AlignBench. Llama 3 120B shows strong creative writing skills, while Microsoft is reportedly developing a 500B parameter LLM called MAI-1. Research from Scale AI highlights overfitting issues in models like Mistral and Phi, whereas GPT-4, Claude, Gemini, and Llama maintain benchmark robustness. In robotics, Tesla Optimus advances with superior data collection and teleoperation, LeRobot marks a move toward open-source robotics AI, and Nvidia's DrEureka automates robot skill training. Multimodal LLM hallucinations are surveyed with new mitigation strategies, and Google's Med-Gemini achieves SOTA on medical benchmarks with fine-tuned multimodal models.
Zero to GPT in 1 Year
gpt-4-turbo claude-3-opus mixtral-8x22b zephyr-141b medical-mt5 openai anthropic mistral-ai langchain hugging-face fine-tuning multilinguality tool-integration transformers model-evaluation open-source-models multimodal-llms natural-language-processing ocr model-training vik-paruchuri sam-altman greg-brockman mira-murati abacaj mbusigin akhaliq clementdelangue
GPT-4 Turbo reclaimed the top leaderboard spot with significant improvements in coding, multilingual, and English-only tasks, now rolled out in paid ChatGPT. Despite this, Claude Opus remains superior in creativity and intelligence. Mistral AI released the powerful open-source Mixtral-8x22B, with finetunes like Zephyr 141B quickly following. LangChain enhanced tool integration across models, and Hugging Face introduced Transformers.js for running transformers in browsers. Medical mT5, an open-source multilingual text-to-text model for the medical domain, was shared. The community also highlighted research on LLMs as regressors and shared practical advice on OCR/PDF data modeling from Vik Paruchuri's journey.
The Era of 1-bit LLMs
bitnet-b1.58 hugging-face quantization model-optimization energy-efficiency fine-tuning robotics multimodality ai-security ethics humor swyx levelsio gdb npew _akhaliq osanseviero mmitchell_ai deliprao nearcyan clementdelangue
The Era of 1-bit LLMs research, including the BitNet b1.58 model, introduces a ternary-parameter approach that matches full-precision Transformer LLMs in performance while drastically reducing energy costs (up to 38x). This innovation promises new scaling laws and hardware designs optimized for 1-bit LLMs; a sketch of the quantization step follows below. Discussions on AI Twitter covered AGI's societal impact, robotics with multimodal models, fine-tuning techniques like ResLoRA, and AI security efforts at Hugging Face. Ethical considerations in generative AI and humor within the AI community were also prominent topics.
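The core of b1.58 is absmean quantization: scale each weight matrix by its mean absolute value and round entries to {-1, 0, +1}, so matrix multiplies reduce to additions, subtractions, and skips. A minimal sketch based on a reading of the paper's RoundClip formulation, not the official code:

```python
import torch

def absmean_ternary(w, eps=1e-5):
    """BitNet b1.58-style quantization sketch: scale a weight matrix by its
    mean absolute value, then round each entry to {-1, 0, +1}."""
    gamma = w.abs().mean()
    w_q = torch.clamp(torch.round(w / (gamma + eps)), min=-1, max=1)
    return w_q, gamma

w = torch.randn(4, 4)
w_q, gamma = absmean_ternary(w)
w_approx = w_q * gamma  # stand-in for w inside matmuls; the ternary values
                        # turn multiplies into adds, subtracts, or skips
```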