All tags
Topic: "function-calling"
Mistral's Agents API and the 2025 LLM OS
qwen claude-4 chatgpt o3 o4 mistral-ai langchain-ai openai meta-ai-fair agent-frameworks multi-agent-systems tool-use code-execution web-search model-context-protocol persistent-memory function-calling open-source no-code reinforcement-learning model-performance agent-orchestration omarsar0 simonw swyx scaling01
The LLM OS concept has evolved since 2023, with Mistral AI releasing a new Agents API that includes code execution, web search, persistent memory, and agent orchestration. LangChainAI introduced the Open Agent Platform (OAP), an open-source no-code platform for intelligent agents. OpenAI plans to develop ChatGPT into a super-assistant by H1 2025, competing with Meta. Discussions around Qwen models focus on reinforcement learning effects, while Claude 4 performance is also noted. The AI Engineer World's Fair is calling for volunteers.
not much happened today; New email provider for AINews
gpt-4.1 gpt-4o gpt-4o-mini gemini-2.5-flash seaweed-7b claude embed-4 grok smol-ai resend openai google bytedance anthropic cohere x-ai email-deliverability model-releases reasoning video-generation multimodality embedding-models agentic-workflows document-processing function-calling tool-use ai-coding adcock_brett swyx jerryjliu0 alexalbert omarsar0
Smol AI is migrating its AI news email service to Resend to improve deliverability and enable new features like personalizable AI news and a "Hacker News of AI." Recent AI model updates include OpenAI's API-only GPT-4.1, Google Gemini 2.5 Flash reasoning model, ByteDance Seaweed 7B-param video AI, Anthropic Claude's values system, Cohere Embed 4 multimodal embedding model, and xAI Grok updates with Memory and Studio features. Discussions also cover agentic workflows for document automation and AI coding patterns.
not much happened today
gpt-2 r1 gemma-3 gemmacoder3-12b qwen2.5-omni openai deepseek berkeley alibaba togethercompute nvidia azure runway langchain bmw amazon open-source function-calling benchmarking code-reasoning multimodality inference-speed image-generation voice-generation animation robotics realtime-transcription webrtc sama clémentdelangue lioronai scaling01 cognitivecompai osanseviero jack_w_rae ben_burtenshaw theturingpost vipulved kevinweil tomlikesrobots adcock_brett juberti
OpenAI plans to release its first open-weight language model since GPT-2 in the coming months, signaling a move towards more open AI development. DeepSeek launched its open-source R1 model earlier this year, challenging perceptions of China's AI progress. Gemma 3 has achieved function calling capabilities and ranks on the Berkeley Function-Calling Leaderboard, while GemmaCoder3-12b improves code reasoning performance on LiveCodeBench. Alibaba_Qwen's Qwen2.5-Omni introduces a novel Thinker-Talker system and TMRoPE for multimodal input understanding. The TogetherCompute team achieved 140 TPS on a 671B parameter model, outperforming Azure and DeepSeek API on Nvidia GPUs. OpenAI also expanded ChatGPT features with image generation for all free users and a new voice release. Runway Gen-4 enhances animation for miniature dioramas, and LangChain launched a chat-based generative UI agent. Commercial deployment of Figure 03 humanoid robots at BMW highlights advances in autonomy and manufacturing scaling. New tools include OpenAI's realtime transcription API with WebRTC support and Amazon's Nova Act AI browser agent.
Promptable Prosody, SOTA ASR, and Semantic VAD: OpenAI revamps Voice AI
gpt-4o-transcribe gpt-4o-mini-tts o1-pro kokoro-82m openai replicate speech-to-text text-to-speech voice-activity-detection prompt-engineering real-time-processing model-release api function-calling structured-outputs model-performance juberti sama reach_vb kevinweil omarsar0
OpenAI has launched three new state-of-the-art audio models in their API, including gpt-4o-transcribe, a speech-to-text model outperforming Whisper, and gpt-4o-mini-tts, a text-to-speech model with promptable prosody allowing control over timing and emotion. The Agents SDK now supports audio, enabling voice agents. OpenAI also updated turn detection for real-time voice activity detection (VAD) based on speech content. Additionally, OpenAI's o1-pro model is available to select developers with advanced features like vision and function calling, though at higher compute costs. The community shows strong enthusiasm for these audio advancements, with a radio contest for TTS creations underway. Meanwhile, Kokoro-82M v1.0 emerges as a leading open weights TTS model with competitive pricing on Replicate.
not much happened today
jamba-1.6 mistral-ocr qwq-32b o1 o3-mini instella llama-3-2-3b gemma-2-2b qwen-2-5-3b babel-9b babel-83b gpt-4o claude-3-7-sonnet ai21-labs mistral-ai alibaba openai amd anthropic hugging-face multimodality ocr multilinguality structured-output on-prem-deployment reasoning benchmarking api open-source model-training gpu-optimization prompt-engineering function-calling
AI21 Labs launched Jamba 1.6, touted as the best open model for private enterprise deployment, outperforming Cohere, Mistral, and Llama on benchmarks like Arena Hard. Mistral AI released a state-of-the-art multimodal OCR model with multilingual and structured output capabilities, available for on-prem deployment. Alibaba Qwen introduced QwQ-32B, an open-weight reasoning model with 32B parameters and cost-effective usage, showing competitive benchmark scores. OpenAI released o1 and o3-mini models with advanced API features including streaming and function calling. AMD unveiled Instella, open-source 3B parameter language models trained on AMD Instinct MI300X GPUs, competing with Llama-3.2-3B and others. Alibaba also released Babel, open multilingual LLMs performing comparably to GPT-4o. Anthropic launched Claude 3.7 Sonnet, enhancing reasoning and prompt engineering capabilities.
Genesis: Generative Physics Engine for Robotics (o1-mini version)
o1 o1-preview gpt-4o claude-3.5-sonnet gemini-2.0-pro llama-3-3b llama-3-70b openai google-deepmind meta-ai-fair hugging-face function-calling structured-outputs vision performance-benchmarks sdk webrtc reasoning math code-generation transformer-architecture model-training humanoid-robots search model-efficiency dataset-sharing aidan_mclau sundarpichai adcock_brett
OpenAI launched the o1 model API featuring function calling, structured outputs, vision support, and developer messages, achieving 60% fewer reasoning tokens than its preview. The model excels in math and code with a 0.76 LiveBench Coding score, outperforming Sonnet 3.5. Beta SDKs for Go and Java and WebRTC support with 60% lower prices were also released. Google Gemini 2.0 Pro (Gemini Exp 1206) deployment accelerated, showing improved coding, math, and reasoning performance. Meta AI FAIR introduced research on training transformers directly on raw bytes using dynamic entropy-based patching. Commercial humanoid robots were successfully deployed by an industry player. Hugging Face researchers demonstrated that their 3B Llama model can outperform the 70B Llama model on MATH-500 accuracy using search techniques, highlighting efficiency gains with smaller models. Concerns about reproducibility and domain-specific limitations were noted.
Genesis: Generative Physics Engine for Robotics (o1-2024-12-17)
o1 gemini-2.0-pro openai google carnegie-mellon-university universal-physics-engine robotics-simulation physics-simulation photo-realistic-rendering generative-data simulation-platform open-source function-calling vision performance-benchmarks sdk realtime-api zhou-xian aidan_mclau sundar-pichai
Genesis is a newly announced universal physics engine developed by a large-scale collaboration led by CMU PhD student Zhou Xian. It integrates multiple state-of-the-art physics solvers to simulate diverse materials and physical phenomena, targeting robotics applications with features like lightweight, ultra-fast simulation, photo-realistic rendering, and generative data capabilities. The engine is open source and designed for robotics simulation beyond just video generation. Additionally, OpenAI released the o1 model to API with advanced features like function calling and vision support, showing strong math and coding performance. Google teased updates on Gemini 2.0 Pro, accelerating deployment for advanced users.
o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO Finetuning
o1-2024-12-17 o1 o1-pro 4o 4o-mini gemini-2-0-flash claude-3.5-sonnet claude-3.5 openai google google-deepmind function-calling structured-outputs vision reasoning webrtc realtime-api preference-tuning fine-tuning api model-performance aidan_mclau kevinweil simonw michpokrass morgymcg juberti
OpenAI launched the o1 API with enhanced features including vision inputs, function calling, structured outputs, and a new
reasoning_effort
parameter, achieving 60% fewer reasoning tokens on average. The o1 pro variant is confirmed as a distinct implementation coming soon. Improvements to the Realtime API with WebRTC integration offer easier usage, longer sessions (up to 30 minutes), and significantly reduced pricing (up to 10x cheaper with mini models). DPO Preference Tuning for fine-tuning is introduced, currently available for the 4o model. Additional updates include official Go and Java SDKs and OpenAI DevDay videos. The news also highlights discussions on Google Gemini 2.0 Flash model's performance reaching 83.6% accuracy. The AI Nobel Prize
claude-3.5-sonnet reka-flash got openai anthropic reka-ai zep artificial-neural-networks nobel-prize knowledge-graphs memory-layers real-time-voice-api vision fine-tuning prompt-caching multimodality function-calling ocr open-source single-sign-on software-testing ai-assisted-coding ai-ethics geoff-hinton john-hopfield philschmid alexalbert mervenoyann clementdelangue svpino bindureddy ylecun rohanpaul_ai
Geoff Hinton and John Hopfield won the Nobel Prize in Physics for their work on Artificial Neural Networks. The award citation spans 14 pages highlighting their contributions. Zep released a new community edition of their low-latency memory layer for AI agents, emphasizing knowledge graphs for memory. At OpenAI's DevDay, new features like real-time voice API, vision model fine-tuning, and prompt caching with a 50% discount on reused tokens were introduced. Anthropic's Claude 3.5 Sonnet was recognized as the best model currently. Reka AI Labs updated their Reka Flash model with enhanced multimodal and function calling capabilities. The GOT (Generic OCR Transformer) achieved 98.79% accuracy on OCR benchmarks. Discussions on open-source AI models highlighted their role in fostering competition and decentralization. Software development insights included the importance of Single Sign-On (SSO), thorough testing, and AI-assisted coding workflows. Ethical and societal topics covered critiques of tax policies and the appointment of France's first Minister of AI.
OpenAI Realtime API and other Dev Day Goodies
gpt-4o-realtime-preview gpt-4o openai livekit agora twilio grab automat voice-activity-detection function-calling ephemeral-sessions auto-truncation vision-fine-tuning model-distillation prompt-caching audio-processing
OpenAI launched the gpt-4o-realtime-preview Realtime API featuring text and audio token processing with pricing details and future plans including vision and video support. The API supports voice activity detection modes, function calling, and ephemeral sessions with auto-truncation for context limits. Partnerships with LiveKit, Agora, and Twilio enhance audio components and AI virtual agent voice calls. Additionally, OpenAI introduced vision fine-tuning with only 100 examples improving mapping accuracy for Grab and RPA success for Automat. Model distillation and prompt caching features were also announced, including free eval inference for users opting to share data.
Ideogram 2 + Berkeley Function Calling Leaderboard V2
llama-3-70b gpt-4 phi-3.5 functionary-llama-3-70b llama-3 ideogram midjourney berkeley openai hugging-face microsoft meta-ai-fair baseten kai claude functionary function-calling benchmarking image-generation model-optimization vision multimodality model-performance fine-tuning context-windows cybersecurity code-analysis ai-assisted-development
Ideogram returns with a new image generation model featuring color palette control, a fully controllable API, and an iOS app, reaching a milestone of 1 billion images created. Meanwhile, Midjourney released a Web UI but still lacks an API. In function calling, the Berkeley Function Calling Leaderboard (BFCL) updated to BFCL V2 • Live, adding 2251 live, user-contributed function documentation and queries to improve evaluation quality. GPT-4 leads the leaderboard, but the open-source Functionary Llama 3-70B finetune from Kai surpasses Claude. On AI model releases, Microsoft launched three Phi-3.5 models with impressive reasoning and context window capabilities, while Meta AI FAIR introduced UniBench, a unified benchmark suite for over 50 vision-language model tasks. Baseten improved Llama 3 inference speed by up to 122% using Medusa. A new cybersecurity benchmark, Cyberbench, featuring 40 CTF tasks, was released. Additionally, Codegen was introduced as a tool for programmatic codebase analysis and AI-assisted development. "Multiple functions > parallel functions" was highlighted as a key insight in function calling.
not much happened today
gpt-4-0613 gpt-3.5-turbo-0613 gpt-4o-2024-08-06 mistral-large-2 gpt4-turbo claude-3-opus idefics3-llama bigllama-3.1-1t-instruct llama-3-120b-instruct openai mistral-ai meta-ai-fair structured-outputs function-calling json-schema benchmarking multimodality context-windows model-scaling ai-hardware vision speech-processing robotics ai-regulation sama rohanpaul_ai corbtt guillaumelample mervenoyann maximelabonne aidan_mclau adcock_brett ylecun
OpenAI introduced structured outputs in their API with a new "strict" mode and a "response_format" parameter, supporting models like gpt-4-0613, gpt-3.5-turbo-0613, and the new gpt-4o-2024-08-06. They also halved the price of gpt-4o to $2.50 per million tokens. Mistral Large 2 outperforms gpt4-turbo and claude-3-opus on hard benchmarks and coding tasks. Idefics3-Llama offers multimodal capabilities with a 10k token context window. BigLlama-3.1-1T-Instruct is an upscaled version of llama-3-120b-instruct. New benchmark "big_model_smell" measures creativity and reliability. Figure 02 robot features advanced AI hardware with onboard vision language model, enhanced battery, and speech-to-speech reasoning. Yann LeCun expressed concerns about California's SB1047 regulation.
Mistral Large 2 + RIP Mistral 7B, 8x7B, 8x22B
mistral-large-2 mistral-nemo-12b llama-3.1-8b llama-3.1-70b llama-3.1 llama-3-405b yi-34b-200k gpt-4o mistral-ai meta-ai-fair groq togethercompute code-generation math function-calling reasoning context-windows model-deprecation pretraining posttraining benchmarking
Mistral Large 2 introduces 123B parameters with Open Weights under a Research License, focusing on code generation, math performance, and a massive 128k context window, improving over Mistral Large 1's 32k context. It claims better function calling capabilities than GPT-4o and enhanced reasoning. Meanwhile, Meta officially released Llama-3.1 models including Llama-3.1-70B and Llama-3.1-8B with detailed pre-training and post-training insights. The Llama-3.1 8B model's 128k context performance was found underwhelming compared to Mistral Nemo and Yi 34B 200K. Mistral is deprecating older Apache open-source models, focusing on Large 2 and Mistral Nemo 12B. The news also highlights community discussions and benchmarking comparisons.
DataComp-LM: the best open-data 7B model/benchmark/dataset
mistral-nemo-12b gpt-4o-mini deepseek-v2-0628 mistral-7b llama-3 gemma-2 qwen-2 datacomp hugging-face openai nvidia mistral-ai deepseek dataset-design scaling-laws model-benchmarking model-performance fine-tuning multilinguality function-calling context-windows open-source-models model-optimization cost-efficiency benchmarking sam-altman guillaume-lample philschmid miramurati
DataComp team released a competitive 7B open data language model trained on only 2.5T tokens from the massive DCLM-POOL dataset of 240 trillion tokens, showing superior scaling trends compared to FineWeb. OpenAI launched GPT-4o mini, a cost-effective model with 82% MMLU and performance near GPT-4-Turbo, aimed at developers for broad applications. NVIDIA and Mistral jointly released the Mistral NeMo 12B model featuring a 128k token context window, FP8 checkpoint, multilingual support, and Apache 2.0 licensing. DeepSeek announced DeepSeek-V2-0628 as the top open-source model on the LMSYS Chatbot Arena leaderboard with strong rankings in coding, math, and hard prompts. This news highlights advances in dataset design, model efficiency, and open-source contributions in the AI community.
Nothing much happened today
chameleon-7b chameleon-30b xlam-1b gpt-3.5 phi-3-mini mistral-7b-v3 huggingface truth_terminal microsoft apple openai meta-ai-fair yi axolotl amd salesforce function-calling multimodality model-releases model-updates model-integration automaticity procedural-memory text-image-video-generation
HuggingFace released a browser-based timestamped Whisper using transformers.js. A Twitter bot by truth_terminal became the first "semiautonomous" bot to secure VC funding. Microsoft and Apple abruptly left the OpenAI board amid regulatory scrutiny. Meta is finalizing a major upgrade to Reddit comments addressing hallucination issues. The Yi model gained popularity on GitHub with 7.4K stars and 454 forks, with potential integration with Axolotl for pregeneration and preprocessing. AMD technologies enable household/small business AI appliances. Meta released Chameleon-7b and Chameleon-30b models on HuggingFace supporting unified text and image tokenization. Salesforce's xLAM-1b model outperforms GPT-3.5 in function calling despite its smaller size. Anole pioneered open-source multimodal text-image-video generation up to 720p 144fps. Phi-3 Mini expanded from 3.8B to 4.7B parameters with function calling, competing with Mistral-7b v3. "System 2 distillation" in humans relates to automaticity and procedural memory.
GraphRAG: The Marriage of Knowledge Graphs and RAG
gemma-2 llama-3-70b claude-3.5-sonnet nemotron-340b qwen2-72b llama-3 microsoft-research anthropic nvidia hugging-face retrieval-augmented-generation knowledge-graphs token-usage inference-time attention-mechanisms instruction-following coding math long-range-reasoning synthetic-data dataset-release fine-tuning context-windows function-calling travis-fischer rasbt alexandr-wang osanseviero rohanpaul_ai hamelhusain svpino aaaazzam omarsar0
Microsoft Research open sourced GraphRAG, a retrieval augmented generation (RAG) technique that extracts knowledge graphs from sources and clusters them for improved LLM answers, though it increases token usage and inference time. Gemma 2 models were released focusing on efficient small LLMs with innovations like sliding window attention and RMS norm, nearly matching the larger Llama 3 70B. Anthropic's Claude 3.5 Sonnet leads in instruction following and coding benchmarks, while Nvidia's Nemotron 340B model was released in June. Qwen2-72B tops the HuggingFace Open LLM leaderboard excelling in math and long-range reasoning. Discussions on RAG highlighted its limitations and improvements in context usage via function calls. A persona-driven synthetic data generation approach introduced 1 billion personas, with a fine-tuned model matching GPT-4 performance on math benchmarks at 7B scale. The 200GB AutoMathText dataset was also noted for math data synthesis.
Ways to use Anthropic's Tool Use GA
claude-3-opus haiku opus convnext anthropic amazon google tool-use function-calling agentic-ai streaming vision parallelization delegation debate specialization open-science superintelligence convolutional-networks self-attention ai-research yann-lecun alex-albert sainingxie
Anthropic launched general availability of tool use/function calling with support for streaming, forced use, and vision, alongside Amazon and Google. Alex Albert shared five architectures for agentic tool use: delegation, parallelization, debate, specialization, and tool suite experts. Anthropic also introduced a self-guided course on tool use. Yann LeCun emphasized ethical open science funding, gradual emergence of superintelligence with safety guardrails, and convolutional networks for image/video processing as competitive with vision transformers. He also noted growth in AI researchers across industry, academia, and government.
Not much happened today
command-r-35b goliath-120 miqu-120 llama-3-8b tensorrt-llm llama-cpp gpt2-chat gpt-4-turbo llama-3 deepmind-alphazero anthropic openai perplexity-ai amazon apple microsoft deepmind creative-writing context-windows benchmarking model-performance self-learning function-calling retrieval-augmented-generation ai-assistants on-device-ai ai-lobbying copyright-infringement code-reasoning image-generation
Anthropic released a team plan and iOS app about 4 months after OpenAI. The Command-R 35B model excels at creative writing, outperforming larger models like Goliath-120 and Miqu-120. The Llama-3 8B model now supports a 1 million token context window, improving long-context understanding with minimal training on a single 8xA800 GPU machine. TensorRT-LLM benchmarks show it is 30-70% faster than llama.cpp on consumer hardware. A benchmark suggests GPT2-Chat may have better reasoning than GPT-4-Turbo, though results are debated. Demos include a self-learning Llama-3 voice agent running locally on Jetson Orin and a Self-Learning Large Action Model (LAM). Amazon CodeWhisperer was renamed to Q Developer, expanding its generative AI assistant capabilities. Apple plans an AI-enabled Safari browser with an on-device LLM in iOS 18 and macOS 15. Big Tech dominates AI lobbying in Washington, while major U.S. newspapers sued OpenAI and Microsoft for copyright infringement. DeepMind's AlphaZero became the greatest chess player in 9 hours, and their Naturalized Execution Tuning (NExT) method improves LLM code reasoning by 14-26%. Stable Diffusion is used for diverse image generation applications.
Llama-3-70b is GPT-4-level Open Model
llama-3-70b llama-3-8b llama-3 llama-2-70b mistral-7b grok-3 stable-diffusion-3 vasa-1 meta-ai-fair groq nvidia amazon microsoft benchmarking model-performance fine-tuning function-calling arithmetic image-generation video-generation energy-usage gpu-demand political-bias ai-safety scaling context-windows tokenization elon-musk
Meta has released Llama 3, their most capable open large language model with 8B and 70B parameter versions supporting 8K context length and outperforming previous models including Llama 2 and Mistral 7B. Groq serves the Llama 3 70B model at 500-800 tokens/second, making it the fastest GPT-4-level token source. Discussions highlight AI scaling challenges with Elon Musk stating that training Grok 3 will require 100,000 Nvidia H100 GPUs, and AWS planning to acquire 20,000 B200 GPUs for a 27 trillion parameter model. Microsoft unveiled VASA-1 for lifelike talking face generation, while Stable Diffusion 3 and its extensions received mixed impressions. Concerns about AI energy usage and political bias in AI were also discussed.
Gemini Pro and GPT4T Vision go GA on the same day by complete coincidence
gemini-1.5-pro gpt-4-turbo llama-3 orca-2.5-7b functionary-v2.4 cosxl google openai meta-ai-fair hugging-face cohere million-token-context-window audio-processing file-api text-embedding function-calling reasoning direct-nash-optimization contrastive-learning code-interpreter diffusion-models neural-odes inference-speed multilingual-dataset image-editing no-code-development
At Google Cloud Next, Gemini 1.5 Pro was released with a million-token context window, available in 180+ countries, featuring 9.5 hours of audio understanding, a new File API for nearly unlimited free uploads, and the Gecko-1b-256/768 embedding model. GPT-4 Turbo with Vision became generally available in the API with a major update improving reasoning capabilities. Meta Platforms plans to launch smaller versions of Llama 3 next week. The Orca 2.5 7B model using Direct Nash Optimization outperforms older GPT-4 versions in AlpacaEval. New releases include Functionary-V2.4 with enhanced function calling and code interpretation, and CosXL models for image editing. Research highlights include continuous U-Nets for diffusion models achieving up to 80% faster inference and a massive multilingual dataset with ~5.6 trillion word tokens. Creative applications include a no-code touch screen game made with Gemini 1.5 and AI-generated novel trailers.
Mixture of Depths: Dynamically allocating compute in transformer-based language models
octopus-v2 deepmind transformer-efficiency dynamic-compute-allocation mixture-of-experts mixture-of-depths top-k-routing algorithmic-reasoning visual-autoregressive-modeling on-device-models function-calling scaling-laws piotrpadlewski
DeepMind introduces the Mixture-of-Depths (MoD) technique, dynamically allocating FLOPs across transformer layers to optimize compute usage, achieving over 50% faster forward passes without training impact. MoD selectively processes tokens using top-k routing, improving efficiency and potentially enabling faster ultra-long context handling. The method can combine with Mixture-of-Experts (MoE) for decoupled routing of queries, keys, and values. Reddit discussions highlight concerns about LLM hype overshadowing other AI tech, improvements in transformer efficiency, a new Think-and-Execute framework boosting algorithmic reasoning by 10-20%, and Visual Autoregressive modeling (VAR) surpassing diffusion models in image quality and speed. On-device model Octopus v2 outperforms GPT-4 in function calling accuracy and latency.
1/17/2024: Help crowdsource function calling datasets
mistral-7b dolphin-2.7-mixtral-8x7b mega-dolphin dolphin-2.6-mistral-7b-dpo llama-cpp lm-studio mistral-ai microsoft hugging-face apple function-calling quantization model-performance gpu-optimization model-selection closed-source memory-optimization linux-server api-fees headless-mode yagilb heyitsyorkie
LM Studio updated its FAQ clarifying its closed-source status and perpetual freeness for personal use with no data collection. The new beta release includes fixes and hints at upcoming 2-bit quantization support. For gaming, models like Dolphin 2.7 Mixtral 8x7B, MegaDolphin, and Dolphin 2.6 Mistral 7B DPO with Q4_K_M quantization were recommended. Discussions highlighted that single powerful GPUs outperform multi-GPU setups due to bottlenecks, with older GPUs like Tesla P40 being cost-effective. Microsoft's AutoGen Studio was introduced but has issues and requires API fees for open-source models. Linux users are advised to use llama.cpp over LM Studio due to lack of headless mode. Additional tools like LLMFarm for iOS and various Hugging Face repositories were also mentioned. "LM Studio must be running to use the local inference server as there is no headless mode available" and "matching model size to GPU memory is key for performance" were notable points.
1/11/2024: Mixing Experts vs Merging Models
gpt-4-turbo gpt-4-0613 mixtral deepseekmoe phixtral deepseek-ai hugging-face nous-research teenage-engineering discord mixture-of-experts model-merging fine-tuning rag security discord-tos model-performance prompt-engineering function-calling semantic-analysis data-frameworks ash_prabaker shacrw teknium 0xevil everyoneisgross ldj pramod8481 mgreg_42266 georgejrjrjr kenakafrosty
18 guilds, 277 channels, and 1342 messages were analyzed with an estimated reading time saved of 187 minutes. The community switched to GPT-4 turbo and discussed the rise of Mixture of Experts (MoE) models like Mixtral, DeepSeekMOE, and Phixtral. Model merging techniques, including naive linear interpolation and "frankenmerges" by SOLAR and Goliath, are driving new performance gains on open leaderboards. Discussions in the Nous Research AI Discord covered topics such as AI playgrounds supporting prompt and RAG parameters, security concerns about third-party cloud usage, debates on Discord bots and TOS, skepticism about Teenage Engineering's cloud LLM, and performance differences between GPT-4 0613 and GPT-4 turbo. The community also explored fine-tuning strategies involving DPO, LoRA, and safetensors, integration of RAG with API calls, semantic differences between MoE and dense LLMs, and data frameworks like llama index and SciPhi-AI's synthesizer. Issues with anomalous characters in fine-tuning were also raised.
12/30/2023: Mega List of all LLMs
deita-v1.0 mixtral amazon-titan-text-express amazon-titan-text-lite nous-research hugging-face amazon mistral-ai local-attention computational-complexity benchmarking model-merging graded-modal-types function-calling data-contamination training-methods stella-biderman euclaise joey00072
Stella Biderman's tracking list of LLMs is highlighted, with resources shared for browsing. The Nous Research AI Discord discussed the Local Attention Flax module focusing on computational complexity, debating linear vs quadratic complexity and proposing chunking as a solution. Benchmark logs for various LLMs including Deita v1.0 with its SFT+DPO training method were shared. Discussions covered model merging, graded modal types, function calling in AI models, and data contamination issues in Mixtral. Community insights were sought on Amazon Titan Text Express and Amazon Titan Text Lite LLMs, including a unique training strategy involving bad datasets. Several GitHub repositories and projects like DRUGS, MathPile, CL-FoMo, and SplaTAM were referenced for performance and data quality evaluations.