Topic: "context-length"
ChatGPT Canvas GA
llama-3-70b llama-3-1-8b tgi-v3 deepseek-v2.5-1210 coconut openai deepseek-ai meta-ai-fair huggingface cognition-labs hyperbolic google-deepmind code-execution gpt-integration model-finetuning gradient-checkpointing context-length latent-space-reasoning performance-optimization gpu-memory-optimization kubernetes gpu-marketplace ai-capabilities employment-impact neurips-2024 ai-scaling humor arav_srinivas sama jonathan-frankle dylan
OpenAI launched ChatGPT Canvas to all users, featuring code execution and GPT integration, effectively replacing Code Interpreter with a Google Docs-like interface. DeepSeek announced their V2.5-1210 update, improving performance on MATH-500 (82.8%) and LiveCodeBench. Meta AI (FAIR) introduced COCONUT, a new continuous latent space reasoning paradigm. Hugging Face released TGI v3, processing 3x more tokens and running 13x faster than vLLM on long prompts. Cognition Labs released Devin, an AI developer building Kubernetes operators. Hyperbolic raised a $12M Series A to build an open AI platform with an H100 GPU marketplace. Discussions covered AI capabilities and employment impact, plus NeurIPS 2024 announcements with Google DeepMind demos and a debate on AI scaling. On Reddit, Llama 3.3-70B supports 90K-context finetuning using Unsloth with gradient checkpointing and Apple's Cut Cross Entropy (CCE) algorithm, fitting in 41GB of VRAM, while Llama 3.1-8B reaches 342K context lengths with Unsloth, far beyond its native limit.
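The long-context recipe above pairs Unsloth's offloaded gradient checkpointing with a Cut Cross Entropy-style loss so the full-vocabulary logits are never materialized at once. Below is a minimal, hedged sketch of that kind of setup; the checkpoint id, LoRA hyperparameters, and the exact way CCE is enabled are assumptions, not the precise configuration behind the 90K/41GB figure.

```python
# Minimal sketch: long-context LoRA finetuning with Unsloth.
# The model id below is an assumed 4-bit checkpoint name; adjust to whatever
# model and revision you actually use.
from unsloth import FastLanguageModel

max_seq_length = 90_000  # long-context target from the report

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.3-70B-Instruct-bnb-4bit",  # assumed checkpoint id
    max_seq_length=max_seq_length,
    load_in_4bit=True,  # 4-bit base weights so the 70B model fits in VRAM
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # offloaded checkpointing for long contexts
)
# Per the report, recent Unsloth builds integrate a Cut Cross Entropy-style
# fused loss kernel, so the (seq_len x vocab) logits tensor is never
# materialized for the full 90K sequence.
```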
Is this... OpenQ*?
deepseek-coder-v2 llama-3-8b nemotron-4-340b stable-diffusion-3-medium deepseek_ai anthropic runwayml openai apple nvidia stability-ai luma-labs reward-tampering test-time-search mathematical-reasoning process-supervision fine-tuning on-device-ai video-generation cost-efficiency context-length coding image-understanding multimodality adcock_brett clementdelangue svpino
DeepSeek-Coder-V2 promises GPT-4-Turbo-beating performance at a fraction of the cost. Anthropic released new research on reward tampering. Runway launched Gen-3 Alpha, its video-generation answer to Sora. A series of papers explores "test-time" search techniques that improve mathematical reasoning with models like Llama 3 8B. Apple announced Apple Intelligence with a smarter Siri and image/document understanding, partnered with OpenAI to integrate ChatGPT into iOS 18, and released 20 new CoreML models with LoRA fine-tuning for specialization. NVIDIA released Nemotron-4 340B, an open model matching GPT-4 performance. DeepSeek-Coder-V2 excels in coding and math, supporting 338 programming languages and a 128K context length. Stability AI released the Stable Diffusion 3 Medium weights. Luma Labs launched Dream Machine for 5-second video generation from text and images.
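The summary doesn't spell out the search procedures those papers use; as a rough illustration of the test-time search idea, here is a toy best-of-N reranking loop where `generate_candidates` and `verifier_score` are hypothetical stand-ins for an LLM sampler and a process/outcome reward model.

```python
# Toy sketch of test-time search via best-of-N reranking: sample several
# candidate solutions and keep the one a verifier scores highest.
import random
from typing import Callable, List

def best_of_n(problem: str,
              generate_candidates: Callable[[str, int], List[str]],
              verifier_score: Callable[[str, str], float],
              n: int = 16) -> str:
    candidates = generate_candidates(problem, n)
    return max(candidates, key=lambda sol: verifier_score(problem, sol))

# Example with dummy stand-ins for the sampler and verifier:
if __name__ == "__main__":
    gen = lambda p, k: [f"candidate answer {i}" for i in range(k)]
    score = lambda p, s: random.random()
    print(best_of_n("What is 17 * 23?", gen, score))
```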
Apple's OpenELM beats OLMo with 50% of its dataset, using DeLighT
openelm llama-3 llama-3-8b-instruct llama-3-70b apple meta-ai-fair google layer-wise-scaling context-length quantization ai-alignment open-source ai-regulation eric-schmidt sebastian-raschka
Apple advances its AI presence with the release of OpenELM, its first relatively open large language model, available in sizes from 270M to 3B parameters and featuring a novel layer-wise scaling architecture inspired by the DeLighT paper. Meanwhile, Meta's Llama 3 family pushes context length boundaries, with community models supporting over 160K tokens and an 8B-Instruct model with 262K context length released on Hugging Face, alongside performance improvements in quantized versions. A new paper on AI alignment finds KTO to be the best-performing method, while noting sensitivity to training data volume. In AI ethics and regulation, former Google CEO Eric Schmidt warns about the risks of open-source AI empowering bad actors and geopolitical rivals, while a U.S. proposal aims to enforce "Know Your Customer" rules to end anonymous cloud usage.
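Layer-wise scaling means the transformer blocks are no longer identical: attention head count and FFN width grow with depth rather than staying fixed. The sketch below illustrates the general idea with made-up alpha/beta ranges; it is not OpenELM's actual configuration.

```python
# Toy illustration of layer-wise scaling (the idea OpenELM borrows from DeLighT):
# heads and FFN width are interpolated linearly across layers instead of being
# constant. The alpha/beta ranges here are invented for illustration.
def layer_wise_dims(n_layers: int, d_model: int, head_dim: int,
                    alpha=(0.5, 1.0), beta=(2.0, 4.0)):
    dims = []
    for i in range(n_layers):
        t = i / max(n_layers - 1, 1)           # 0 at the first layer, 1 at the last
        n_heads = max(1, round((alpha[0] + t * (alpha[1] - alpha[0])) * d_model / head_dim))
        ffn_dim = int((beta[0] + t * (beta[1] - beta[0])) * d_model)
        dims.append({"layer": i, "n_heads": n_heads, "ffn_dim": ffn_dim})
    return dims

for cfg in layer_wise_dims(n_layers=4, d_model=1024, head_dim=64):
    print(cfg)
```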
Perplexity, the newest AI unicorn
llama-3-8b llama-3-70b llama-3 llava-llama-3-8b-v1_1 phi-3 gpt-3.5 perplexity-ai meta-ai-fair hugging-face groq context-length fine-tuning quantization instruction-following model-comparison multimodality benchmarking memory-optimization model-performance daniel-gross aravind-srinivas
Perplexity doubles its valuation shortly after its Series B with a Series B-1 funding round. Significant developments around Llama 3 include context length extension to 16K tokens, new multimodal LLaVA models outperforming Llama 2, and fine-tuning improvements like QDoRA surpassing QLoRA. The Llama-3-70B model is praised for instruction following and for its performance across quantization formats. Microsoft's Phi-3 models, released in multiple sizes, show competitive benchmark results, with the 14B model achieving 78% on MMLU and the 3.8B model nearing GPT-3.5 performance.
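The summary doesn't say how the 16K extension was achieved; one common community approach is RoPE scaling applied at load time, sketched below with an assumed base model id and scaling factor rather than the exact recipe behind that release.

```python
# One common way community extensions stretch Llama 3's context window:
# RoPE scaling at load time. The model id and factor are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed base model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "dynamic", "factor": 2.0},  # 8K native -> ~16K effective
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```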
Mergestral, Meta MTIAv2, Cohere Rerank 3, Google Infini-Attention
mistral-8x22b command-r-plus rerank-3 infini-attention llama-3 sd-1.5 cosxl meta-ai-fair mistral-ai cohere google stability-ai hugging-face ollama model-merging training-accelerators retrieval-augmented-generation linear-attention long-context foundation-models image-generation rag-pipelines model-benchmarking context-length model-performance aidan_gomez ylecun swyx
Meta announced its new MTIA v2 chips, designed to accelerate training and inference with an improved architecture and PyTorch 2.0 integration. Mistral released the 8x22B Mixtral model, which the community merged back into a dense model to effectively create a 22B Mistral ("Mergestral"). Cohere launched Rerank 3, a foundation model enhancing enterprise search and retrieval-augmented generation (RAG) systems with support for 100+ languages. Google published a paper on Infini-attention, an ultra-scalable linear attention mechanism demonstrated on 1B and 8B models at 1 million sequence length. Additionally, Meta's Llama 3 is expected to start rolling out soon. Other notable updates include Command R+, an open model surpassing GPT-4 in chatbot performance with a 128k context length, and advancements in Stable Diffusion models and RAG pipelines.
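At a high level, Infini-attention folds past keys and values into a fixed-size compressive memory and reads it back with linear attention, so memory cost stays bounded as sequence length grows. The sketch below follows that update rule in simplified form (no delta rule, no gating against local attention) and is illustrative rather than the paper's implementation.

```python
# Simplified sketch of Infini-attention's compressive memory: stream segments
# of keys/values into a fixed-size matrix M and normalizer z, then retrieve
# with linear attention.
import torch
import torch.nn.functional as F

def sigma(x):                        # ELU + 1 keeps activations positive
    return F.elu(x) + 1.0

def memory_update(M, z, K, V):       # M: (d_k, d_v), z: (d_k,)
    M = M + sigma(K).transpose(-2, -1) @ V
    z = z + sigma(K).sum(dim=-2)
    return M, z

def memory_read(M, z, Q):            # linear-attention style retrieval
    q = sigma(Q)
    return (q @ M) / (q @ z).unsqueeze(-1).clamp(min=1e-6)

d_k, d_v, seg = 64, 64, 128
M = torch.zeros(d_k, d_v)
z = torch.zeros(d_k)
for _ in range(4):                    # fold four segments into the memory
    K, V = torch.randn(seg, d_k), torch.randn(seg, d_v)
    M, z = memory_update(M, z, K, V)
Q = torch.randn(seg, d_k)
print(memory_read(M, z, Q).shape)     # torch.Size([128, 64])
```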
Miqu confirmed to be an early Mistral-medium checkpoint
miqu-1-70b mistral-medium llama-2-70b-chat mixtral sqlcoder-70b codellama-70b bagelmistery-tour-v2 psyfighter-v2 mistral-ai hugging-face nous-research aiatmeta instruction-following sampling-methods fp16-quantization fine-tuning model-training context-length text-to-sql model-performance model-optimization intrstllrninja
Miqu, an open-access model, scores 74 on MMLU and 84.5 on EQ-Bench, sparking debates about its performance relative to Mistral Medium; Mistral's CEO confirmed it is a leaked early Mistral Medium checkpoint. Discussions in the TheBloke Discord highlight Miqu's strength in instruction following and sampling methods like dynatemp and min-p, alongside tangents on browser preferences and Discord UI themes. Role-playing with models like BagelMistery Tour v2 and Psyfighter v2 is popular, as are technical talks on fp16 quantization of Miqu-1-70b. Training and fine-tuning tips are shared for tools like Unsloth and models like Mistral 7B. In the Nous Research AI Discord, the Activation Beacon method is discussed for extending LLM context length from 4K to 400K tokens. SQLCoder-70B, fine-tuned on CodeLlama-70B, leads in text-to-SQL generation and is available on Hugging Face. Miqu also impresses with an 83.5 EQ-Bench score, fueling speculation about its capabilities.
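Of the sampling methods mentioned, min-p is easy to show concretely: keep only tokens whose probability is at least a fraction `min_p` of the top token's probability, renormalize, and sample. A toy version (dynatemp omitted):

```python
# Toy implementation of min-p sampling: filter the distribution relative to
# the most likely token, renormalize, then sample.
import torch

def min_p_sample(logits: torch.Tensor, min_p: float = 0.1,
                 temperature: float = 1.0) -> int:
    probs = torch.softmax(logits / temperature, dim=-1)
    threshold = min_p * probs.max()                      # dynamic cutoff
    filtered = torch.where(probs >= threshold, probs, torch.zeros_like(probs))
    filtered = filtered / filtered.sum()                 # renormalize survivors
    return int(torch.multinomial(filtered, num_samples=1))

print(min_p_sample(torch.randn(32000)))  # sample a token id from random logits
```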
CodeLlama 70B beats GPT-4 on HumanEval
codellama miqu mistral-medium llama-2-70b aphrodite-engine mixtral flatdolphinmaid noromaid rpcal chatml mistral-7b activation-beacon eagle-7b rwkv-v5 openhermes2.5 nous-hermes-2-mixtral-8x7b-dpo imp-v1-3b bakllava moondream qwen-vl meta-ai-fair ollama nous-research mistral-ai hugging-face ai-ethics alignment gpu-optimization direct-prompt-optimization fine-tuning cuda-programming optimizer-technology quantization multimodality context-length dense-retrieval retrieval-augmented-generation multilinguality model-performance open-source code-generation classification vision
Meta AI surprised the community with the release of CodeLlama 70B, an open-source model now available on platforms like Ollama and MLX for local use. The Miqu model sparked debate over its origins, possibly an early Mistral Medium checkpoint or a fine-tuned Llama-2-70b, alongside discussions on AI ethics and alignment risks. The Aphrodite engine showed strong performance on A6000 GPUs with specific configurations. Role-playing AI models such as Mixtral and Flatdolphinmaid faced challenges with repetitiveness, while Noromaid and Rpcal performed better, with ChatML and DPO recommended for improved responses. Learning resources like fast.ai's course were highlighted for ML/DL beginners, and fine-tuning techniques with memory-saving optimizers like paged 8-bit Lion and Adafactor were discussed.
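For the optimizer discussion, here is a hedged sketch of selecting such memory-saving optimizers with the Hugging Face Trainer; `adafactor` is a standard built-in option, while the paged 8-bit Lion name shown in the comment is an assumption that depends on your transformers/bitsandbytes versions.

```python
# Sketch: choosing a low-memory optimizer for fine-tuning via TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    optim="adafactor",          # low-memory optimizer built into transformers
    # optim="paged_lion_8bit",  # assumed name for paged 8-bit Lion (bitsandbytes)
)
```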
At Nous Research AI, the Activation Beacon project introduced a method for unlimited context length in LLMs using "global state" tokens, potentially transforming retrieval-augmented models. The Eagle-7B model, based on RWKV-v5, outperformed Mistral in benchmarks with greater efficiency and multilingual capability. OpenHermes 2.5 was recommended for consumer hardware thanks to its quantization options. Multimodal and domain-specific models like IMP v1-3b, BakLLaVA, Moondream, and Qwen-VL were explored for classification and vision-language tasks. The community emphasized centralizing AI resources for collaborative research.
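On the consumer-hardware point, a minimal sketch of loading OpenHermes 2.5 with 4-bit bitsandbytes quantization follows; the repo id is assumed, and GGUF builds run through llama.cpp are the other common route.

```python
# Sketch: 4-bit quantized loading of OpenHermes 2.5 on consumer hardware.
# The repo id is an assumption; substitute the checkpoint you actually use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "teknium/OpenHermes-2.5-Mistral-7B"  # assumed Hugging Face repo id
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```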
1/4/2024: Jeff Bezos backs Perplexity's $520m Series B.
wizardcoder-33b-v1.1 mobilellama-1.4b-base shearedllama tinyllama mixtral-8x7b perplexity anthropic google nous-research mistral-ai hugging-face document-recall rnn-memory synthetic-data benchmarking multi-gpu-support context-length model-architecture sliding-window-attention model-parallelism gpu-optimization jeff-bezos
Perplexity announced their Series B funding round with notable investor Jeff Bezos, who previously invested in Google 25 years ago. Anthropic is raising $750 million, projecting at least $850 million in annualized revenue next year and implementing "brutal" changes to its Terms of Service. Discussions in the Nous Research AI Discord cover document recall limits over gigabytes of data, RNN memory/compute trade-offs, synthetic datasets, and benchmarking of models like WizardCoder-33B-V1.1, MobileLLaMA-1.4B-Base, ShearedLLaMA, and TinyLLaMA. Other highlights include Unsloth optimizations for multi-GPU systems, AI rap voice models, context-extending code, and architectural innovations such as applying Detectron/ViT backbones to LLMs, sliding window attention in Mistral, and parallelizing Mixtral 8x7B with FSDP and HF Accelerate.
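Sliding window attention, mentioned above, restricts each token to attend only to a recent window of positions, which is what keeps Mistral's attention cost roughly linear in sequence length. A toy mask construction:

```python
# Toy construction of a Mistral-style sliding-window attention mask:
# each query position attends to itself and the previous (window - 1) keys.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    i = torch.arange(seq_len).unsqueeze(1)   # query positions, column vector
    j = torch.arange(seq_len).unsqueeze(0)   # key positions, row vector
    return (j <= i) & (i - j < window)       # causal AND within the window

print(sliding_window_mask(seq_len=8, window=4).int())
```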
12/23/2023: NeurIPS Best Papers of 2023
gpt-4 palm2 hermes-2.5 mistral-7b nous-research hugging-face apple context-length malware-security video-content music-content linear-layers api-access large-language-models embedding vector-databases model-merging model-interpretability striped-hyena-architecture quantization rmsnorm attention-mechanisms
The Latent Space pod released a 3-hour recap of the best NeurIPS 2023 papers. The Nous Research AI Discord community discussed optimizing AI performance with shorter context lengths, malware security concerns linked to Hugging Face, and shared insights on video and music content. Technical discussions included the DYAD research paper proposing a faster alternative to linear layers, Apple's ML Ferret machine learning tool, and accessing PaLM 2 via API. The community also explored large language models with a focus on specialized models, data scaling, embedding/vector databases, model merging, and interpretability, with mentions of Hermes 2.5, GPT-4, and Mistral. There were also conversations on the StripedHyena architecture, quantization challenges, and fixes related to RMSNorm and the "Attention is All You Need" paper.
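Since RMSNorm comes up in those fixes, here is a minimal reference implementation: activations are scaled by their root mean square with a learned gain, with no mean subtraction or bias as in LayerNorm.

```python
# Minimal RMSNorm: normalize by the root mean square of the last dimension,
# then apply a learned per-feature gain.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

print(RMSNorm(8)(torch.randn(2, 8)).shape)  # torch.Size([2, 8])
```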