Topic: "quantization"
AI Engineer World's Fair Talks Day 1
gemini-2.5 gemma claude-code mistral cursor anthropic openai aie google-deepmind meta-ai-fair agent-based-architecture open-source model-memorization scaling-laws quantization mixture-of-experts language-model-memorization model-generalization langgraph model-architecture
Mistral launched Mistral Code, and Cursor released version 1.0. Anthropic improved its Claude Code plans, while OpenAI announced expanded connections for ChatGPT. The day was dominated by AIE keynotes and tracks including GraphRAG, RecSys, and Tiny Teams. On Reddit, Google open-sourced the DeepSearch stack for building AI agents with Gemini 2.5 and LangGraph, enabling flexible agent architectures and integration with local LLMs like Gemma. A new Meta paper analyzed language model memorization, showing GPT-style transformers store about 3.5–4 bits/parameter and exploring the transition from memorization to generalization, with implications for Mixture-of-Experts models and quantization effects.
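As a rough illustration of what 3.5–4 bits per parameter implies, the sketch below converts a parameter count into an approximate memorization budget in megabytes; the model sizes and the 3.6 bits/parameter midpoint are illustrative assumptions, not figures from the paper.

```python
# Back-of-the-envelope capacity implied by ~3.5-4 bits of memorization per parameter.
# Illustrative arithmetic only; model sizes below are examples, not from the Meta paper.
def memorization_capacity_mb(n_params: float, bits_per_param: float = 3.6) -> float:
    """Approximate raw memorization capacity in megabytes."""
    total_bits = n_params * bits_per_param
    return total_bits / 8 / 1e6  # bits -> bytes -> megabytes

for n in (1e9, 8e9, 70e9):
    print(f"{n / 1e9:.0f}B params ≈ {memorization_capacity_mb(n):,.0f} MB of raw capacity")
```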
not much happened today
deepseek-r1-0528 o3 gemini-2.5-pro claude-opus-4 deepseek_ai openai gemini meta-ai-fair anthropic x-ai ollama hugging-face alibaba bytedance xiaomi reasoning reinforcement-learning benchmarking quantization local-inference model-evaluation open-weights transparency post-training agentic-benchmarks long-context hallucination-detection teortaxestex wenfeng danielhanchen awnihannun reach_vb abacaj
DeepSeek R1-0528 release brings major improvements in reasoning, hallucination reduction, JSON output, and function calling, matching or surpassing closed models like OpenAI o3 and Gemini 2.5 Pro on benchmarks such as Artificial Analysis Intelligence Index, LiveBench, and GPQA Diamond. The model ranks #2 globally in open weights intelligence, surpassing Meta AI, Anthropic, and xAI. Open weights and technical transparency have fueled rapid adoption across platforms like Ollama and Hugging Face. Chinese AI labs including DeepSeek, Alibaba, ByteDance, and Xiaomi now match or surpass US labs in model releases and intelligence, driven by open weights strategies. Reinforcement learning post-training is critical for intelligence gains, mirroring trends seen at OpenAI. Optimized quantization techniques (1-bit, 4-bit) and local inference enable efficient experimentation on consumer hardware. New benchmarks like LisanBench test knowledge, planning, memory, and long-context reasoning, with OpenAI o3 and Claude Opus 4 leading. Discussions highlight concerns about benchmark contamination and overemphasis on RL-tuned gains.
DeepSeek-R1-0528 - Gemini 2.5 Pro-level model, SOTA Open Weights release
deepseek-r1-0528 gemini-2.5-pro qwen-3-8b qwen-3-235b deepseek-ai anthropic meta-ai-fair nvidia alibaba google-deepmind reinforcement-learning benchmarking model-performance open-weights reasoning quantization post-training model-comparison artificialanlys scaling01 cline reach_vb zizhpan andrewyng teortaxestex teknim1 lateinteraction abacaj cognitivecompai awnihannun
DeepSeek R1-0528 marks a significant upgrade, closing the gap with proprietary models like Gemini 2.5 Pro and surpassing models from Anthropic, Meta, NVIDIA, and Alibaba on several benchmarks. The Chinese open-weights model leads a number of AI benchmarks, with gains driven by reinforcement learning post-training rather than architecture changes, and it shows sharply increased reasoning-token usage (about 23K tokens per question). The China-US AI race intensifies as Chinese labs accelerate innovation through transparency and an open research culture. Key benchmarks include AIME 2024, LiveCodeBench, and GPQA Diamond.
Prime Intellect's INTELLECT-2 and PRIME-RL advance distributed reinforcement learning
intellect-2 dreamo qwen gemini-2.5-pro dynamic-byte-latent-transformer gen-4-references mistral-medium-3 le-chat-enterprise primeintellect bytedance qwen gemma meta-ai-fair runwayml mistral-ai google distributed-training reinforcement-learning gpu-clusters model-optimization quantization multimodality agentic-ai video-understanding fine-tuning _akhaliq reach_vb osanseviero aiatmeta c_valenzuelab lmarena_ai adcock_brett
Prime Intellect released INTELLECT-2, a decentralized GPU training and RL framework with a vision for distributed AI training overcoming colocation limits. ByteDance launched DreamO, a unified image customization model on Hugging Face. Qwen released models optimized for GPTQ, GGUF, and AWQ quantization. Gemma surpassed 150 million downloads on Hugging Face. Meta released weights for the Dynamic Byte Latent Transformer and the Collaborative Reasoner framework to improve language model efficiency and reasoning. RunwayML introduced Gen-4 References, a near-realtime model requiring no fine-tuning. Mistral AI released Mistral Medium 3, a strong multimodal model, and Le Chat Enterprise, an agentic AI assistant for business. Google updated Gemini 2.5 Pro Preview with video understanding and UI improvements. "Airbnb for spare GPUs from all over the world" highlights the ongoing challenges and potential of distributed GPU training.
not much happened today
qwen3-14b qwen3-32b qwen3-235b phi-4-reasoning o3-mini command-a gemini-2.5-pro o4-mini olm-o2-1b o3 alibaba together-ai scaling01 microsoft deepseek cohere google epoch-ai-research inception-labs openai allenai quantization fine-tuning reinforcement-learning benchmarking video-generation diffusion-models model-performance model-evaluation model-release text-generation cline _philschmid iscienceluvr alexalbert__ _lewtun teortaxestex sarahookr reach_vb
Alibaba's Qwen team released quantized versions of the Qwen3 family, including the 14B, 32B, and 235B parameter models, with Qwen3-235B showing promising coding capabilities. Microsoft launched Phi-4-reasoning, a 14B parameter model distilled from OpenAI's o3-mini, emphasizing supervised fine-tuning and reinforcement learning and outperforming larger models on some benchmarks. Cohere's Command A leads SQL performance on Bird Bench. Google introduced the TRAJAN eval for video generation temporal consistency and updated the Gemini OpenAI compatibility layer. Inception Labs launched a diffusion LLM API claiming 5x speed improvements over autoregressive models. Community rankings show OpenAI's o3 model debuting strongly in web app-building tasks. Other releases include AllenAI's OLMo2 1B and additional Phi-4 variants. "Qwen3-235B shows promise for coding" and "Phi-4-reasoning tech report emphasizes SFT gains" highlight key advancements.
not much happened today
deepseek-r1 gemma-3 gemma-3-27b openai nvidia deepseek hugging-face fp8 model-efficiency hardware-requirements quantization benchmarking model-deployment open-source sam-altman
DeepSeek R1 demonstrates significant efficiency using FP8 precision, outperforming Gemma 3 27B in benchmarks with a Chatbot Arena Elo Score of 1363 vs. 1338, requiring substantial hardware like 32 H100 GPUs and 2,560GB VRAM. OpenAI labels DeepSeek as "state-controlled" and calls for bans on "PRC-produced" models, sparking community backlash accusing OpenAI and Sam Altman of anti-competitive behavior. Discussions emphasize DeepSeek's openness and affordability compared to OpenAI, with users highlighting its local and Hugging Face deployment options. Meanwhile, Gemma 3 receives mixed community feedback on creativity and worldbuilding.
Gemma 3 beats DeepSeek V3 in Elo, 2.0 Flash beats GPT4o with Native Image Gen
gemma-3 gemini-1.5-pro gemini-2 o1-preview o3-mini-high deepseek-v3 claude-3.7-sonnet qwen-2.5-max google-deepmind openai multimodality multilinguality context-window quantization image-generation model-benchmarking model-performance vision reach_vb _philschmid danielhanchen lmarena_ai osanseviero
Google DeepMind launched the Gemma 3 family of models featuring a 128k context window, multimodal input (image and video), and multilingual support for 140+ languages. The Gemma 3-27B model ranks among the top open models on LMArena benchmarks, outperforming several competitors and matching Gemini-1.5-Pro on benchmarks. Additionally, Gemini 2 introduced Flash Native Image Generation with advanced image editing capabilities, a feature teased by OpenAI but not launched. The updates highlight significant advances in context length, multimodality, and model efficiency via quantization.
Project Stargate: $500b datacenter (1.7% of US GDP) and Gemini 2 Flash Thinking 2
gemini-2.0-flash deepseek-r1 qwen-32b openai softbank oracle arm microsoft nvidia huggingface deepseek-ai long-context quantization code-interpretation model-distillation open-source agi-research model-performance memory-optimization noam-shazeer liang-wenfeng
Project Stargate, a US "AI Manhattan project" led by OpenAI and SoftBank and supported by Oracle, Arm, Microsoft, and NVIDIA, was announced at a scale invoking comparisons to the original Manhattan Project, which cost roughly $35B inflation-adjusted. Despite Microsoft's reduced role as exclusive compute partner, the project is serious but not immediately practical. Meanwhile, Noam Shazeer revealed a second major update to Gemini 2.0 Flash Thinking, enabling a 1M-token long context that is usable immediately. Additionally, AI Studio introduced a new code interpreter feature. On Reddit, the DeepSeek R1 distillation into Qwen 32B was released for free on HuggingChat, sparking discussions on self-hosting, performance issues, and quantization techniques. DeepSeek's CEO Liang Wenfeng highlighted their focus on fundamental AGI research, efficient MLA architecture, and commitment to open-source development despite export restrictions, positioning DeepSeek as a potential alternative to closed-source AI trends.
not much happened to end the week
gemini deepseek-r1 o1 chatgpt gpt-4 claude-3.5-sonnet o1-preview o1-mini gpt4o qwq-32b google-deepmind deeplearningai amazon tesla x-ai alibaba ollama multimodality benchmarking quantization reinforcement-learning ai-safety translation reasoning interpretability model-comparison humor yoshua-bengio kevinweil ylecun
AI News for 11/29/2024-11/30/2024 covers key updates including the Gemini multimodal model advancing in musical structure understanding, a new quantized SWE-Bench for benchmarking at 1.3 bits per task, and the launch of the DeepSeek-R1 model focusing on transparent reasoning as an alternative to o1. The establishment of the 1st International Network of AI Safety Institutes highlights global collaboration on AI safety. Industry updates feature Amazon's Olympus AI model, Tesla's Optimus, and experiments with ChatGPT as a universal translator. Community reflections emphasize the impact of large language models on daily life and medical AI applications. Discussions include scaling sparse autoencoders to gpt-4 and the need for transparency in reasoning LLMs. The report also notes humor around ChatGPT's French nickname.
OLMo 2 - new SOTA Fully Open LLM
llama-3-1-8b olmo-2 qwen2-5-72b-instruct smolvlm tulu-3 ai2 huggingface intel reinforcement-learning quantization learning-rate-annealing ocr fine-tuning model-training vision
AI2 has updated OLMo-2 to roughly Llama 3.1 8B equivalent, training with 5T tokens and using learning rate annealing and new high-quality data (Dolmino). They credit Tülu 3 and its "Reinforcement Learning with Verifiable Rewards" approach. On Reddit, Qwen2.5-72B instruct model shows near lossless performance with AutoRound 4-bit quantization, available on HuggingFace in 4-bit and 2-bit versions, with discussions on MMLU benchmark and quantization-aware training. HuggingFace released SmolVLM, a 2B parameter vision-language model running efficiently on consumer GPUs, supporting fine-tuning on Google Colab and demonstrating strong OCR capabilities with adjustable resolution and quantization options.
Pixtral Large (124B) beats Llama 3.2 90B with updated Mistral Large 24.11
pixtral-large mistral-large-24.11 llama-3-2 qwen2.5-7b-instruct-abliterated-v2-gguf qwen2.5-32b-q3_k_m vllm llama-cpp exllamav2 tabbyapi mistral-ai sambanova nvidia multimodality vision model-updates chatbots inference gpu-optimization quantization performance concurrency kv-cache arthur-mensch
Mistral released Pixtral Large, a 124B multimodal model pairing a 1B-parameter vision encoder with an updated 123B-parameter Mistral Large 24.11, though the Large update itself lacks major new features. Pixtral Large outperforms Llama 3.2 90B on multimodal benchmarks despite having a smaller vision adapter. Mistral's Le Chat chatbot received comprehensive feature updates, reflecting a company focus on balancing product and research as noted by Arthur Mensch. SambaNova sponsors inference with their RDUs, offering faster AI model processing than GPUs. On Reddit, vLLM shows strong concurrency performance on an RTX 3090 GPU, with quantization challenges noted for the FP8 kv-cache but better results using llama.cpp with a Q8 kv-cache. Users discuss performance trade-offs between vLLM, exllamav2, and TabbyAPI for different model sizes and batching strategies.
Common Corpus: 2T Open Tokens with Provenance
qwen-2.5-coder claude-3.5-sonnet janusflow-1.3b ocronos-vintage pleais huggingface langchainai deepseek alibaba anthropic provenance ocr multilingual-datasets prompt-engineering multimodality image-generation code-generation quantization model-scaling inference-efficiency tim-dettmers tom-doerr omarsar0 swyx madiator reach_vb
Pleais via Huggingface released Common Corpus, the largest fully open multilingual dataset with over 2 trillion tokens including detailed provenance information. They also introduced OCRonos-Vintage, a 124M-parameter OCR correction model that efficiently fixes digitization errors on CPU and GPU, unlocking knowledge from PDFs. On AI tools, LangChainAI launched Prompt Canvas for collaborative prompt engineering, while DeepSeek released JanusFlow 1.3B, a unified multimodal LLM integrating autoregressive and rectified flow models for enhanced image understanding and generation. Alibaba Cloud announced Qwen2.5-Coder, a code-focused LLM with advanced coding capabilities, and Claude 3.5 Sonnet was highlighted for superior code generation. Discussions on quantization challenges and scaling laws for precision by Tim Dettmers and others emphasized the impact of low-precision training on model scalability and inference efficiency. "Scaling Laws for Precision" paper insights and alternative efficiency methods were also noted.
BitNet was a lie?
qwen-2.5-coder-32b-instruct gpt-4o llama-3 sambanova alibaba hugging-face quantization scaling-laws model-efficiency fine-tuning model-performance code-generation open-source unit-testing ci-cd tanishq-kumar tim-dettmers
Scaling laws for quantization have been mapped out by a group led by Chris Ré, analyzing over 465 pretraining runs and finding that benefits plateau around FP6 precision. Lead author Tanishq Kumar highlights that longer training and more data increase sensitivity to quantization, explaining challenges with models like Llama-3. Tim Dettmers, author of QLoRA, warns that the era of efficiency gains from low-precision quantization is ending, signaling a shift from scaling to optimizing existing resources. Additionally, Alibaba announced Qwen 2.5-Coder-32B-Instruct, which matches or surpasses GPT-4o on coding benchmarks, and open-source initiatives like DeepEval for LLM testing are gaining traction.
not much happened today
flux-schnell meta-ai-fair anthropic togethercompute hugging-face audio-generation quantization prompt-caching long-term-memory llm-serving-framework hallucination-detection ai-safety ai-governance geoffrey-hinton john-hopfield demis-hassabis rohanpaul_ai svpino hwchase17 shreyar philschmid mmitchell_ai bindureddy
Geoffrey Hinton and John Hopfield won the Nobel Prize in Physics for foundational work on neural networks linking AI and physics. Meta AI introduced a 13B parameter audio generation model as part of Meta Movie Gen for video-synced audio. Anthropic launched the Message Batches API enabling asynchronous processing of up to 10,000 queries at half the cost. Together Compute released Flux Schnell, a free model for 3 months. New techniques like PrefixQuant quantization and Prompt Caching for low-latency inference were highlighted by rohanpaul_ai. LangGraph added long-term memory support for persistent document storage. Hex-LLM framework was introduced for TPU-based low-cost, high-throughput LLM serving from Hugging Face models. Discussions on AI safety emphasized gender equality in science, and concerns about premature AI regulation by media and Hollywood were raised.
Not much technical happened today
whisper-v3-turbo llama-3 llamaindex openai poolside liquidai perplexity-ai meta-ai-fair cohere fujitsu mixture-of-experts context-windows model-optimization fine-tuning quantization model-training alignment synthetic-data model-architecture agentic-ai nick-turley arav-srinivas francois-fleuret finbarr-timbers lewtun francois-chollet jerry-j-liu mmitchell-ai jxnlco
OpenAI announced raising $6.6B in new funding at a $157B valuation, with ChatGPT reaching 250M weekly active users. Poolside raised $500M to advance AGI development. LiquidAI introduced three new MoE models (1B, 3B, 40B) with a 32k context window and efficient token handling. OpenAI released Whisper V3 Turbo, an open-source multilingual model with significant speed improvements. Meta AI FAIR is hiring research interns focusing on LLM reasoning, alignment, synthetic data, and novel architectures. Cohere partnered with Fujitsu to launch Takane, a custom Japanese model. Technical discussions included challenges in LoRA fine-tuning, float8 quantization in Keras, and new tools like create-llama for agent templates. Industry commentary raised concerns about AI development priorities and highlighted freelancing opportunities in AI.
Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)
llama-3-2 llama-3-1 claude-3-haiku gpt-4o-mini molmo-72b molmo-7b gemma-2 phi-3-5 llama-3-2-vision llama-3-2-3b llama-3-2-20b meta-ai-fair ai2 qualcomm mediatek arm ollama together-ai fireworks-ai weights-biases cohere weaviate multimodality vision context-windows quantization model-release tokenization model-performance model-optimization rag model-training instruction-following mira-murati daniel-han
Meta released Llama 3.2 with new multimodal versions including 3B and 20B vision adapters on a frozen Llama 3.1, showing competitive performance against Claude Haiku and GPT-4o-mini. AI2 launched multimodal Molmo 72B and 7B models outperforming Llama 3.2 in vision tasks. Meta also introduced new 128k-context 1B and 3B models competing with Gemma 2 and Phi 3.5, with collaborations hinted with Qualcomm, Mediatek, and Arm for on-device AI; the 1B and 3B models were trained on roughly 9 trillion tokens. Partner launches include Ollama, Together AI offering free 11B model access, and Fireworks AI. Additionally, a new RAG++ course from Weights & Biases, Cohere, and Weaviate offers systematic evaluation and deployment guidance for retrieval-augmented generation systems based on extensive production experience.
Reflection 70B, by Matt from IT Department
llama-3.1-70b llama-3 claude-3.5-sonnet hyperwrite glaive fine-tuning chain-of-thought instruction-following synthetic-data quantization model-evaluation prompt-engineering matt-shumer sahil-chaudhary
The Reflection Tuning technique has been used by a two-person team from Hyperwrite and Glaive to finetune llama-3.1-70b, showing strong performance improvements with minimal synthetic data. The approach builds on the concept of adding "thinking" and "reflection" steps to model outputs, related to the Chain of Thought method. Despite some criticisms like contamination concerns, worse coding performance, and reliance on system prompts, the model has received positive reception and comparisons to claude-3.5-sonnet. The work highlights efficient instruction tuning and synthetic data generation for large models.
Summer of Code AI: $1.6b raised, 1 usable product
ltm-2 llama-3-1-405b gemini-advanced cognition poolside codeium magic google-deepmind nvidia google-cloud long-context model-efficiency custom-hardware cuda training-stack gpu-scaling neural-world-models diffusion-models quantization nat-friedman ben-chess rohan-paul
Code + AI is emphasized as a key modality in AI engineering, highlighting productivity and verifiability benefits. Recent major funding rounds include Cognition AI raising $175M, Poolside raising $400M, Codeium raising $150M, and Magic raising $320M. Magic announced their LTM-2 model with a 100 million token context window, claiming its sequence-dimension algorithm is roughly 1000x cheaper than Llama 3.1 405B's attention at that context length, with drastically lower memory requirements. Magic's stack is built from scratch with custom CUDA and no open-source foundations, is partnered with Google Cloud, and is powered by NVIDIA H100 and GB200 GPUs, aiming to scale to tens of thousands of GPUs. Google DeepMind revealed updates to Gemini Advanced with customizable expert "Gems." Neural game engines like GameNGen can run DOOM in a diffusion model trained on 0.9B frames. The content also references LLM quantization research by Rohan Paul.
Gemma 2 2B + Scope + Shield
gemma-2b gemma-2-9b gemma-2-27b llama-3-1-405b sam-2 gpt-3.5 vicuna alpacaeval g-eval google-deepmind anthropic meta-ai-fair openai perplexity-ai nvidia lmsys knowledge-distillation leaderboards model-interpretability finetuning harm-detection video-segmentation voice publishers-program robotics-data-scaling quantization llm-evaluation prompt-engineering
Gemma 2B, a 2 billion parameter model trained on 2 trillion tokens and distilled from a larger unnamed LLM, has been released by Google DeepMind and shows strong leaderboard performance despite weaknesses in math. The Gemma series, including 9B and 27B models, has gained popularity since its June release. The team also released 400 SAEs for interpretability, inspired by Anthropic's research. A finetuned classifier called ShieldGemma outperforms Meta's LlamaGuard in harm detection. Meanwhile, Meta AI announced Llama-3.1-405B reaching #3 on the Overall Arena leaderboard, and released SAM 2, a video and image segmentation model with significant speed improvements. OpenAI is rolling out an advanced Voice Mode to Plus users. Perplexity AI launched a Publishers Program with major media partners and a status page. NVIDIA introduced Project GR00T for scaling robot data using Apple Vision Pro and generative simulation. Interest in quantization for compressing LLMs is growing, and LLM-as-a-Judge implementations from Vicuna, AlpacaEval, and G-Eval highlight the effectiveness of simple prompts and domain-specific evaluation.
not much happened today
sam-2 gemini-1.5-pro chatgpt midjourney-v6.1 meta-ai-fair google-deepmind scale-ai apple canva hugging-face object-segmentation quantization web-development-framework adversarial-robustness on-device-ai open-source robotics voice vision jeremyphoward demis-hassabis ylecun maartengrootendorst jimfan
Meta released SAM 2, a unified model for real-time object segmentation with a new dataset 4.5x larger and 53x more annotated than previous ones. FastHTML, a new Python web framework by Jeremy Howard, enables easy creation and deployment of interactive web apps. Scale AI launched the SEAL Leaderboard on adversarial robustness, topped by Gemini 1.5 Pro from Google DeepMind. Apple published a technical report on their Intelligence Foundation Language Models for on-device and server use. Yann LeCun emphasized the importance of open source AI in an article co-authored with Martin Casado and Ion Stoica. Maarten Grootendorst's "Visual Guide to Quantization" on efficient LLM inference went viral. ChatGPT started rolling out advanced voice and vision-enabled modes to select users. Leonardo AI was acquired by Canva. Jim Fan shared insights on Project Groot augmenting human demonstration data for robotics. Midjourney v6.1 was released.
There's Ilya!
chameleon-7b chameleon-34b deepseek-coder-v2 gpt-4-turbo claude-3-opus voco-llama safe-superintelligence-inc openai anthropic meta deepseek google-deepmind parallel-decoding code-generation quantization training-dynamics vision benchmarks datasets image-captioning reasoning memory-optimization ilya-sutskever jan-leike ylecun akhaliq philschmid rohanpaul_ai mervenoyann fchollet
Ilya Sutskever has co-founded Safe Superintelligence Inc shortly after leaving OpenAI, while Jan Leike moved to Anthropic. Meta released new models including Chameleon 7B and 34B with mixed-modal input and unified token space quantization. DeepSeek-Coder-V2 shows code capabilities comparable to GPT-4 Turbo, supporting 338 programming languages and 128K context length. Consistency Large Language Models (CLLMs) enable parallel decoding generating multiple tokens per step. Grokked Transformers demonstrate reasoning through training dynamics affecting memory formation and generalization. VoCo-LLaMA compresses vision tokens with LLMs improving video temporal correlation understanding. The BigCodeBench benchmark evaluates LLMs on 1,140 coding tasks across 139 Python libraries, topped by DeepSeek-Coder-V2 and Claude 3 Opus. PixelProse is a large 16M image-caption dataset with reduced toxicity.
Talaria: Apple's new MLOps Superweapon
gemma mixtral phi dbrx apple google mistral-ai microsoft mosaic quantization on-device-ai adapter-models model-optimization model-latency lossless-quantization low-bit-palletization token-generation model-benchmarking human-evaluation craig-federighi andrej-karpathy
Apple Intelligence introduces a small (~3B parameter) on-device model and a larger server model running on Apple Silicon with Private Cloud Compute, aiming to surpass Google Gemma, Mistral Mixtral, Microsoft Phi, and Mosaic DBRX. The on-device model features a novel lossless quantization strategy using mixed 2-bit and 4-bit LoRA adapters averaging 3.5 bits-per-weight, enabling dynamic adapter hot-swapping and efficient memory management. Apple credits the Talaria tool for optimizing quantization and model latency, achieving time-to-first-token latency of about 0.6 ms per prompt token and a generation rate of 30 tokens per second on iPhone 15 Pro. Apple focuses on an "adapter for everything" strategy with initial deployment on SiriKit and App Intents. Performance benchmarks rely on human graders, emphasizing consumer-level adequacy over academic dominance. The Apple ML blog also mentions an Xcode code-focused model and a diffusion model for Genmoji.
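For intuition on the reported 3.5 bits-per-weight average, the snippet below solves for the split between 2-bit and 4-bit weight groups that would produce a given average; the resulting 25% figure is an inference from the average, not a number Apple has published.

```python
# Solve for the mix of 2-bit and 4-bit weight groups yielding a target average
# bits-per-weight. Illustrative only; Apple has not published the exact split.
def low_bit_fraction(avg_bpw: float, low_bits: int = 2, high_bits: int = 4) -> float:
    """Fraction f of weights at low_bits so that low_bits*f + high_bits*(1-f) == avg_bpw."""
    return (high_bits - avg_bpw) / (high_bits - low_bits)

print(low_bit_fraction(3.5))  # 0.25 -> roughly a quarter of weight groups at 2-bit
```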
GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version)
gpt-4o gpt-3.5 llama-3 openai hugging-face nous-research eleutherai hazyresearch real-time-reasoning coding-capabilities fine-tuning knowledge-distillation hardware-optimization quantization multimodality mixture-of-experts efficient-attention model-scaling depth-upscaling transformer-architecture gpu-optimization prompt-engineering
OpenAI launched GPT-4o, a frontier model supporting real-time reasoning across audio, vision, and text, now free for all ChatGPT users with enhanced coding capabilities and upcoming advanced voice and video features. Discussions cover open-source LLMs like Llama 3, fine-tuning techniques including knowledge distillation for GPT-3.5, and hardware optimization strategies such as quantization. Emerging architectures include multimodal integrations with ChatGPT voice and Open Interpreter API, Mixture of Experts models combining autoregressive and diffusion approaches, and novel designs like the YOCO architecture and ThunderKittens DSL for efficient GPU use. Research advances in efficient attention methods like Conv-Basis using FFT and model scaling techniques such as depth upscaling were also highlighted.
Quis promptum ipso promptiet?
llama-3-70b llama-3-120b llama-3 llama-cpp anthropic openai zoominfo neuralink prompt-engineering chain-of-thought rag quantization cuda-graphs gpu-optimization thought-controlled-devices modeling-consciousness conference sama gdb bindureddy svpino rohanpaul_ai alexalbert__ abacaj
Anthropic released upgrades to their Workbench Console, introducing new prompt engineering features like chain-of-thought reasoning and prompt generators that significantly reduce development time, exemplified by their customer Zoominfo. OpenAI teased a "magic" new development coming soon, speculated to be a new LLM replacing GPT-3.5 in the free tier or a search competitor. The open-source community highlighted Llama 3 70B as "game changing" with new quantized weights for Llama 3 120B and CUDA graph support for llama.cpp improving GPU performance. Neuralink demonstrated a thought-controlled mouse, sparking interest in modeling consciousness from brain signals. The ICLR 2024 conference is being held in Asia for the first time, generating excitement.
A quiet weekend
llama-3 dolphin-2.9 pixart-sigma llama-3-70b microsoft coca-cola uber lmsys nous-research mistral-ai ar-interfaces transformers algorithmic-tasks turing-test graph-algorithms embeddings generative-ai model-optimization llm-inference quantization model-deployment yann-lecun
Yann LeCun predicts a shift to AR interfaces with AI assistants in 10-15 years, moving away from smartphones. The Dolphin-2.9 model based on Llama-3 was released, addressing earlier quality issues. PixArt Sigma, a 0.6B parameter model, achieves Stable Diffusion 3.0 level performance with complete prompt adherence and local usability. Research shows transformers can use meaningless filler tokens for algorithmic tasks with dense supervision. AI-generated restaurant reviews can pass the Turing test, fooling humans and AI detectors. Uber uses graph algorithms and learned embeddings for ETA prediction. Coca-Cola and Microsoft announced a 5-year AI partnership to accelerate cloud and generative AI initiatives. The Llama-3 70B model can run on a single 4GB GPU using AirLLM optimization without quantization but is slow. Mistral.rs is introduced as a fast LLM inference platform with quantization and OpenAI API compatibility. Only 5% of LLM applications make it from prototype to production due to challenges, especially in enterprise settings. EXL2 and GGUF quantization methods for Llama models show similar perplexity at a given model size, with Llama-3 degrading more under quantization than Llama-2 relative to full precision.
Apple's OpenELM beats OLMo with 50% of its dataset, using DeLighT
openelm llama-3 llama-3-8b-instruct llama-3-70b apple meta-ai-fair google layer-wise-scaling context-length quantization ai-alignment open-source ai-regulation eric-schmidt sebastian-raschka
Apple advances its AI presence with the release of OpenELM, its first relatively open large language model available in sizes from 270M to 3B parameters, featuring a novel layer-wise scaling architecture inspired by the DeLighT paper. Meanwhile, Meta's LLaMA 3 family pushes context length boundaries with models supporting over 160K tokens and an 8B-Instruct model with 262K context length released on Hugging Face, alongside performance improvements in quantized versions. A new paper on AI alignment highlights KTO as the best-performing method, with sensitivity to training data volume noted. In AI ethics and regulation, former Google CEO Eric Schmidt warns about the risks of open-source AI empowering bad actors and geopolitical rivals, while a U.S. proposal aims to enforce "Know Your Customer" rules to end anonymous cloud usage.
Snowflake Arctic: Fully Open 10B+128x4B Dense-MoE Hybrid LLM
snowflake-arctic phi-3 llama-3-70b llama-3 stable-diffusion-3 sd3-turbo gpt-3.5-turbo snowflake databricks deepseek deepspeed nvidia stable-diffusion adobe apple llamaindex lmsys openai mixture-of-experts curriculum-learning model-release image-generation video-upscaling quantization inference-speed benchmarking model-comparison open-source on-device-ai
Snowflake Arctic is a notable new foundation language model released under Apache 2.0, claiming superiority over Databricks in data warehouse AI applications and adopting a mixture-of-experts architecture inspired by DeepSeekMOE and DeepSpeedMOE. The model employs a 3-stage curriculum training strategy similar to the recent Phi-3 paper. In AI image and video generation, Nvidia introduced the Align Your Steps technique improving image quality at low step counts, while Stable Diffusion 3 and SD3 Turbo models were compared for prompt understanding and image quality. Adobe launched an AI video upscaling project enhancing blurry videos to HD, though with some high-resolution artifacts. Apple released open-source on-device language models with code and training logs, diverging from typical weight-only releases. The Llama-3-70b model ties for first place on the LMSYS leaderboard for English queries, and Phi-3 (4B params) outperforms GPT-3.5 Turbo in the banana logic benchmark. Fast inference and quantization of Llama 3 models were demonstrated on MacBook devices.
Perplexity, the newest AI unicorn
llama-3-8b llama-3-70b llama-3 llava-llama-3-8b-v1_1 phi-3 gpt-3.5 perplexity-ai meta-ai-fair hugging-face groq context-length fine-tuning quantization instruction-following model-comparison multimodality benchmarking memory-optimization model-performance daniel-gross aravind-srinivas
Perplexity doubles its valuation shortly after its Series B with a Series B-1 funding round. Significant developments around Llama 3 include context length extension to 16K tokens, new multimodal LLaVA models outperforming Llama 2, and fine-tuning improvements like QDoRA surpassing QLoRA. The Llama-3-70B model is praised for instruction following and performance across quantization formats. Phi-3 models by Microsoft released in multiple sizes show competitive benchmark results, with the 14B model achieving 78% on MMLU and the 3.8B model nearing GPT-3.5 performance.
FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you're welcome)
llama-3-70b llama-3 wizardlm-2-8x22b claude-opus mistral-8x7b gpt-4 huggingface meta-ai-fair dbrx reka-ai mistral-ai lmsys openai datasets benchmarking quantization zero-shot-learning reasoning code-error-detection token-generation security
2024 has seen a significant increase in dataset sizes for training large language models, with Redpajama 2 offering up to 30T tokens, DBRX at 12T tokens, Reka Core/Flash/Edge with 5T tokens, and Llama 3 trained on 15T tokens. Huggingface released an open dataset containing 15T tokens from 12 years of filtered CommonCrawl data, enabling training of models like Llama 3 if compute resources are available. On Reddit, WizardLM-2-8x22b outperformed other open LLMs including Llama-3-70b-instruct in reasoning and math benchmarks. Claude Opus demonstrated strong zero-shot code error spotting, surpassing Llama 3. Benchmarks revealed limitations in the LMSYS chatbot leaderboard due to instruction-tuned models gaming the system, and a new RAG benchmark showed Llama 3 70B underperforming compared to GPT-4, while Mistral 8x7B remained strong. Efficient quantized versions of Llama 3 models are available on Huggingface, with users reporting token generation limits around 9600 tokens on a 3090 GPU. Safety concerns include a UK sex offender being banned from using AI tools and GPT-4 demonstrating an 87% success rate at exploiting real vulnerabilities.
Anime pfp anon eclipses $10k A::B prompting challenge
command-r-plus-104b stable-diffusion-1.5 openai ollama huggingface quantization model-optimization streaming prompt-engineering self-prompting image-composition character-lora-training model-size open-source-licenses memes humor victor-taelin futuristfrog
Victor Taelin issued a $10k challenge to GPT models, initially achieving only 10% success with state-of-the-art models, but community efforts surpassed 90% success within 48 hours, highlighting GPT capabilities and common skill gaps. In Reddit AI communities, Command R Plus (104B) is running quantized on M2 Max hardware via Ollama and llama.cpp forks, with GGUF quantizations released on Huggingface. Streaming text-to-video generation is now available through the st2v GitHub repo. WD Tagger v3 was released for mass auto-captioning datasets with a WebUI. Lesser-known prompting techniques like self-tagging and generational frameworks produced thought-provoking outputs in OpenAI discussions, including experiments with self-evolving system prompts. Stable Diffusion users discussed image composition importance for training character LoRAs and best checkpoints for video game character generation. Discussions also covered scarcity of 5B parameter models and open(ish) licenses for open source AI. Memes included jokes about ChatGPT and Gemini training data differences.
ReALM: Reference Resolution As Language Modeling
flan-t5 gpt-4 apple openai hugging-face stability-ai reference-resolution finetuning quantization retrieval-augmented-generation open-source coding-agents podcast-generation image-generation ai-industry-trends takuto-takizawa
Apple is advancing in AI with a new approach called ReALM: Reference Resolution As Language Modeling, which improves understanding of ambiguous references using three contexts and finetunes a smaller FLAN-T5 model that outperforms GPT-4 on this task. In Reddit AI news, an open-source coding agent SWE-agent achieves 12.29% on the SWE-bench benchmark, and RAGFlow introduces a customizable retrieval-augmented generation engine. A new quantization method, QuaRot, enables efficient 4-bit inference. AI applications include a t-shirt design generator, podgenai for GPT-4 based podcast generation, and an open-source model from HuggingFace that runs without a GPU. Industry discussions focus on the impact of large language models on the AI field and efforts to decentralize AI development. Takuto Takizawa joins Stability AI Japan as Head of Sales & Partnerships.
Evals-based AI Engineering
jamba bamboo qwen-1.5-moe grok-1.5 llama2-7b openai mistral-ai x-ai llamaindex evaluation fine-tuning prompt-engineering voice-cloning quantization model-optimization code-generation context-windows hamel-husain alec-radford
Hamel Husain emphasizes the importance of comprehensive evals in AI product development, highlighting evaluation, debugging, and behavior change as key iterative steps. OpenAI released a voice engine demo showcasing advanced voice cloning from small samples, raising safety concerns. Reddit discussions introduced new models like Jamba (hybrid Transformer-SSM with MoE), Bamboo (7B LLM with high sparsity based on Mistral), Qwen1.5-MoE (efficient parameter activation), and Grok 1.5 (128k context length, surpassing GPT-4 in code generation). Advances in quantization include 1-bit Llama2-7B models outperforming full precision and the QLLM quantization toolbox supporting GPTQ/AWQ/HQQ methods.
Welcome /r/LocalLlama!
cerebrum-8x7b mixtral-7b gpt-3.5-turbo gemini-pro moistral-11b-v1 claude-opus qwen-vl-chat sakana openinterpreter reddit aether-research mistral-ai nvidia lmdeploy model-merging benchmarking quantization performance-optimization deployment vision fine-tuning training-data synthetic-data rag gui
Sakana released a paper on evolutionary model merging. OpenInterpreter launched their O1 devkit. Discussions highlight Claude Haiku's underrated performance with 10-shot examples. Coinciding with Reddit's IPO, AINews introduces Reddit summaries starting with /r/LocalLlama, with subreddits like r/machinelearning and r/openai coming next. Aether Research released Cerebrum 8x7b based on Mixtral, matching GPT-3.5 Turbo and Gemini Pro on reasoning tasks and setting a new open-source reasoning SOTA. Moistral 11B v1, a finetuned model from the Cream-Phi-2 creators, was released. A creative writing benchmark uses Claude Opus as judge. Hobbyists explore 1.58-bit BitNet ternary quantization and 1-bit LLM training. Nvidia's Blackwell (B200) chip supports FP4 precision quantization. LMDeploy v0.2.6+ enables efficient vision-language model deployment with models like Qwen-VL-Chat. Users seek GUIs for LLM APIs with plugin and RAG support. Pipelines for synthetic training data generation and fine-tuning language models for chat are discussed.
FSDP+QLoRA: the Answer to 70b-scale AI for desktop class GPUs
qlora fsdp inflection-2.5 gpt-4 answer.ai hugging-face meta-ai-fair nvidia inflectionai model-training quantization memory-optimization gradient-checkpointing cpu-offloading fine-tuning model-sharding reinforcement-learning chain-of-thought benchmarking jeremy_howard tim_dettmers yann_lecun
Jeremy Howard and collaborators released a new tool combining FSDP, QLoRA, and HQQ to enable training 70b-parameter models on affordable consumer GPUs like RTX 4090s with only 24GB of VRAM, overcoming traditional memory constraints that previously required data center GPUs costing over $150k. The approach shards quantized models across multiple GPUs and uses techniques like gradient checkpointing and CPU offloading to achieve efficient training on desktop-class hardware. The blogpost details the challenges and solutions of integrating these methods, highlighting a significant cost reduction from $150k to under $2.5k for training large language models. Additionally, Twitter recaps mention Inflection AI's Inflection-2.5 model rivaling GPT-4 in benchmarks with less compute, and Grok improving speed by 3x. Yann LeCun discusses multi-step reasoning training for LLMs.
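A minimal sketch of the QLoRA half of this recipe, assuming Hugging Face transformers, peft, and bitsandbytes: load the base model in 4-bit NF4, enable gradient checkpointing, and attach LoRA adapters. The multi-GPU FSDP sharding that Answer.AI added requires extra integration not shown here, and the model id and LoRA hyperparameters are illustrative.

```python
# QLoRA-style setup: 4-bit NF4 base model plus trainable LoRA adapters.
# Sketches only the QLoRA portion of the Answer.AI FSDP+QLoRA recipe; the
# FSDP sharding across GPUs uses additional custom wiring not shown here.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",          # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)
model.gradient_checkpointing_enable()      # trade compute for memory, as in the post

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only the LoRA weights are trained
```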
The Era of 1-bit LLMs
bitnet-b1.58 hugging-face quantization model-optimization energy-efficiency fine-tuning robotics multimodality ai-security ethics humor swyx levelsio gdb npew _akhaliq osanseviero mmitchell_ai deliprao nearcyan clementdelangue
The Era of 1-bit LLMs research, including the BitNet b1.58 model, introduces a ternary parameter approach that matches full-precision Transformer LLMs in performance while drastically reducing energy costs by 38x. This innovation promises new scaling laws and hardware designs optimized for 1-bit LLMs. Discussions on AI Twitter highlight advances in AGI societal impact, robotics with multimodal models, fine-tuning techniques like ResLoRA, and AI security efforts at Hugging Face. Ethical considerations in generative AI and humor within the AI community are also prominent topics.
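A minimal PyTorch sketch of the absmean ternary weight quantization described in the BitNet b1.58 paper: scale by the mean absolute weight, then round and clip to {-1, 0, +1}. Training details and the custom low-bit kernels that deliver the reported energy savings are out of scope here.

```python
# "Absmean" ternary quantization as described in BitNet b1.58:
# gamma = mean(|W|); W_q = round(W / gamma) clipped to {-1, 0, +1}.
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    gamma = w.abs().mean().clamp(min=eps)      # per-tensor scale
    w_q = (w / gamma).round().clamp_(-1, 1)    # ternary values {-1, 0, +1}
    return w_q, gamma                          # dequantize as w_q * gamma

w = torch.randn(4, 8)
w_q, gamma = absmean_ternary(w)
print(w_q)
```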
Welcome Interconnects and OpenRouter
mistral-large miqu mixtral gpt-4 mistral-7b mistral-ai openai perplexity-ai llamaindex qwen langchain model-comparison model-optimization quantization role-playing story-writing code-clarity ai-assisted-decompilation asynchronous-processing quantum-computing encoder-based-diffusion open-source hardware-experimentation rag-systems nathan-lambert alex-atallah
Analysis of 22 Discord guilds, 349 channels, and 12,885 messages revealed active discussions on model comparisons and optimizations involving Mistral AI, Miqu, and GGUF quantized models. Highlights include comparing Mistral Large with GPT-4, focusing on cost-effectiveness and performance, and exploring quantization techniques like GPTQ and QLoRA to reduce VRAM usage. Advanced applications such as role-playing, story-writing, code clarity, and AI-assisted decompilation were emphasized, alongside development of tools like an asynchronous summarization script for Mistral 7b. The intersection of quantum computing and AI was discussed, including DARPA-funded projects and encoder-based diffusion techniques for image processing. Community efforts featured new Spanish LLM announcements, hardware experimentation, and open-source initiatives, with platforms like Perplexity AI and LlamaIndex noted for innovation and integration. Speculation about Mistral AI's open-source commitment and tools like R2R for rapid RAG deployment highlighted the collaborative spirit.
Google AI: Win some (Gemma, 1.5 Pro), Lose some (Image gen)
gemma-2b gemma-7b gemma gemini-pro-1.5 llama-2 llama-3 mistral google hugging-face nvidia benchmarking license-policies image-generation video-understanding long-context dataset-editing model-integration gpu-hardware bug-fixes quantization
Google's Gemma open models (2B and 7B parameters) outperform Llama 2 and Mistral in benchmarks but face criticism for an unusual license and poor image generation quality, which Google partially acknowledges. The upcoming Gemini Pro 1.5 model features a 1 million token context window, excelling in video understanding and needle-in-haystack tasks. Discord communities like TheBloke and LM Studio discuss mixed reception of Gemma models, anticipation for the Llama 3 release, challenges in dataset editing, and hardware considerations such as NVIDIA GeForce RTX 3090 and RTX 4090 GPUs. LM Studio users report issues with version 0.2.15 Beta and ongoing integration of Gemma models, with resources shared on Hugging Face.
Karpathy emerges from stealth?
mistral-7b mixtral-8x7b zephyr-7b gpt-4 llama-2 intel mistral-ai audiogen thebloke tokenization quantization model-optimization fine-tuning model-merging computational-efficiency memory-optimization retrieval-augmented-generation multi-model-learning meta-reasoning dataset-sharing open-source ethical-ai community-collaboration andrej-karpathy
Andrej Karpathy released a comprehensive 2-hour tutorial on tokenization, detailing techniques up to GPT-4's tokenizer and noting the complexity of Llama 2 tokenization with SentencePiece. Discussions in AI Discord communities covered model optimization and efficiency, focusing on quantization of models like Mistral 7B and Zephyr-7B to reduce memory usage for consumer GPUs, including Intel's new weight-only quantization algorithm. Efforts to improve computational efficiency included selective augmentation reducing costs by 57.76% and memory token usage versus kNN for Transformers. Challenges in hardware compatibility and software issues were shared, alongside fine-tuning techniques such as LoRA and model merging. Innovative applications of LLMs in retrieval-augmented generation (RAG), multi-model learning, and meta-reasoning were explored. The community emphasized dataset sharing, open-source releases like SDXL VAE encoded datasets and Audiogen AI codecs, and ethical AI use with censorship and guardrails. Collaboration and resource sharing remain strong in these AI communities.
Companies liable for AI hallucination is Good Actually for AI Engineers
mistral-next large-world-model sora babilong air-canada huggingface mistral-ai quantization retrieval-augmented-generation fine-tuning cuda-optimization video-generation ai-ethics dataset-management open-source community-driven-development andrej-karpathy
Air Canada faced a legal ruling requiring it to honor refund policies communicated by its AI chatbot, setting a precedent for corporate liability in AI engineering accuracy. The tribunal ordered a refund of $650.88 CAD plus damages after the chatbot misled a customer about bereavement travel refunds. Meanwhile, AI community discussions highlighted innovations in quantization techniques for GPU inference, Retrieval-Augmented Generation (RAG) and fine-tuning of LLMs, and CUDA optimizations for PyTorch models. New prototype models like Mistral-Next and the Large World Model (LWM) were introduced, showcasing advances in handling large text contexts and video generation with models like Sora. Ethical and legal implications of AI autonomy were debated alongside challenges in dataset management. Community-driven projects such as the open-source TypeScript agent framework bazed-af emphasize collaborative AI development. Additionally, benchmarks like BABILong for up to 10M context evaluation and tools from karpathy were noted.
AI gets Memory
miqumaid-v2-70b mixtral-8x7b-qlora mistral-7b phi-2 medalpaca aya openai langchain thebloke cohere unsloth-ai mistral-ai microsoft rag memory-modeling context-windows open-source finetuning sequential-fine-tuning direct-preference-optimization rlhf ppo javascript-python-integration hardware-optimization gpu-overclocking quantization model-training large-context multilinguality joanne-jang
AI Discords analysis covered 20 guilds, 312 channels, and 6901 messages. The report highlights the divergence of RAG style operations for context and memory, with implementations like MemGPT rolling out in ChatGPT and LangChain. The TheBloke Discord discussed open-source large language models such as the Large World Model with contexts up to 1 million tokens, and the Cohere aya model supporting 101 languages. Roleplay-focused models like MiquMaid-v2-70B were noted for performance improvements with enhanced hardware. Finetuning techniques like Sequential Fine-Tuning (SFT) and Direct Preference Optimization (DPO) were explained, with tools like Unsloth AI's apply_chat_template preferred over Alpaca. Integration of JavaScript and Python via JSPyBridge in the SillyTavern project was also discussed. Training challenges with Mixtral 8x7b qlora versus Mistral 7b were noted. The LM Studio Discord focused on hardware limitations affecting large model loading, medical LLMs like medAlpaca, and hardware discussions around GPU upgrades and overclocking. Anticipation for IQ3_XSS 1.5 bit quantization support in LM Studio was expressed.
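For context on the chat-template workflow mentioned above, here is the generic Hugging Face equivalent using tokenizer.apply_chat_template; Unsloth's own helper may differ in signature, and the model id is illustrative.

```python
# Chat-template formatting via Hugging Face's tokenizer.apply_chat_template,
# shown as the generic equivalent of the Unsloth helper mentioned above
# (Unsloth's own API may differ). The model id is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "Summarize Direct Preference Optimization in one line."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # model-specific chat markup instead of a hand-written Alpaca prompt
```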
The Dissection of Smaug (72B)
smaug-72b qwen-1.0 qwen-1.5 gpt-4 mistral-7b miqumaid wizardlm_evol_instruct_v2_196k openhermes-2.5 abacus-ai hugging-face nous-research laion thebloke lm-studio intel nvidia elevenlabs fine-tuning model-merging quantization web-ui model-conversion hardware-setup privacy image-generation optical-character-recognition prompt-engineering bindureddy
Abacus AI launched Smaug 72B, a large finetune of Qwen 1.0, which remains unchallenged on the Hugging Face Open LLM Leaderboard despite skepticism from Nous Research. LAION introduced a local voice assistant model named Bud-E with a notable demo. The TheBloke Discord community discussed model performance trade-offs between large models like GPT-4 and smaller quantized models, fine-tuning techniques using datasets like WizardLM_evol_instruct_V2_196k and OpenHermes-2.5, and challenges in web UI development and model merging involving Mistral-7b and MiquMaid. The LM Studio Discord highlighted issues with model conversion from PyTorch to gguf, hardware setups involving Intel Xeon CPUs and Nvidia P40 GPUs, privacy concerns, and limitations in image generation and web UI availability.
Qwen 1.5 Released
qwen-1.5 mistral-7b sparsetral-16x7b-v2 bagel-7b-v0.4 deepseek-math-7b-instruct deepseek qwen mistral-ai hugging-face meta-ai-fair quantization token-context multilinguality retrieval-augmented-generation agent-planning code-generation sparse-moe model-merging fine-tuning direct-preference-optimization character-generation ascii-art kanji-generation vr retinal-resolution light-field-passthrough frozen-networks normalization-layers
Chinese AI models Yi, Deepseek, and Qwen are gaining attention for strong performance, with Qwen 1.5 offering up to 32k token context and compatibility with Hugging Face transformers and quantized models. The TheBloke Discord discussed topics like quantization of a 70B LLM, the introduction of the Sparse MoE model Sparsetral based on Mistral, debates on merging vs fine-tuning, and Direct Preference Optimization (DPO) for character generation. The Nous Research AI Discord covered challenges in Japanese Kanji generation, AI scams on social media, and Meta's VR headset prototypes showcased at SIGGRAPH 2023. Discussions also included fine-tuning frozen networks and new models like bagel-7b-v0.4, DeepSeek-Math-7b-instruct, and Sparsetral-16x7B-v2.
Less Lazy AI
hamster-v0.2 flan-t5 miqu-1-120b-gguf qwen2 axolotl openai hugging-face nous-research h2oai apple model-merging fine-tuning quantization vram-optimization plugin-development chatbot-memory model-training bug-reporting api-compatibility philschmid
The AI Discord summaries for early 2024 cover various community discussions and developments. Highlights include 20 guilds, 308 channels, and 10449 messages analyzed, saving an estimated 780 minutes of reading time. Key topics include the Polymind Plugin Puzzle integrating the PubMed API, roleplay with HamSter v0.2, VRAM challenges in Axolotl training, fine-tuning tips for FLAN-T5, and innovative model merging strategies. The Nous Research AI community discussed GPT-4's lyricism issues, quantization techniques using llama.cpp, frankenmerging with models like miqu-1-120b-GGUF, anticipation for Qwen2, and tools like text-generation-webui and ExLlamaV2. The LM Studio community reported a bug where the app continues running after UI closure, with a workaround to forcibly terminate the process. These discussions reflect ongoing challenges and innovations in AI model training, deployment, and interaction.
The Core Skills of AI Engineering
miqumaid olmo aphrodite awq exl2 mistral-medium internlm ssd-1b lora qlora loftq ai2 hugging-face ai-engineering quantization fine-tuning open-source model-deployment data-quality tokenization prompt-adherence distillation ai-security batching hardware role-playing eugene-yan
AI Discords for 2/2/2024 analyzed 21 guilds, 312 channels, and 4782 messages saving an estimated 382 minutes of reading time. Discussions included Eugene Yan initiating a deep dive into AI engineering challenges, highlighting overlaps between software engineering and data science skills. The TheBloke Discord featured talks on MiquMaid, OLMo (an open-source 65B LLM by AI2 under Apache 2.0), Aphrodite model batching, AWQ quantization, and LoRA fine-tuning techniques like QLoRA and LoftQ. The LAION Discord discussed SSD-1B distillation issues, data quality optimization with captioning datasets like BLIP, COCO, and LLaVA, and tokenization strategies for prompt adherence in image generation. Other topics included AI security with watermarking, superconductors and carbon nanotubes for hardware, and deployment of LLMs via Hugging Face tools.
Trust in GPTs at all time low
llama-3 mistral-medium llava-1.6 miquella-120b-gguf tinymodels miqumaid harmony-4x7b-bf16 smaug-34b-v0.1 openai hugging-face mistral-ai nous-research bittensor context-management fine-tuning model-merging quantization gpu-servers visual-reasoning ocr dataset-release incentive-structures nick-dobos manojbh teknium arthurmensch
Discord communities were analyzed with 21 guilds, 312 channels, and 8530 messages reviewed, saving an estimated 628 minutes of reading time. Discussions highlighted challenges with GPTs and the GPT store, including critiques of the knowledge files capability and context management issues. The CUDA MODE Discord was introduced for CUDA coding support. Key conversations in the TheBloke Discord covered Xeon GPU server cost-effectiveness, Llama3 and Mistral Medium model comparisons, LLaVA-1.6's visual reasoning and OCR capabilities, and the leaked Miqu 70B model. Technical topics included fine-tuning TinyLlama and MiquMaid+Euryale models, and model merging with examples like Harmony-4x7B-bf16 and Smaug-34B-v0.1. The Nous Research AI Discord discussed style influence in LLMs, quantization issues, Bittensor incentives for AI model improvements, and the identification of MIQU as Mistral Medium. The release of the Open Hermes 2.5 dataset on Hugging Face was also announced. "Discussions pointed towards the need for better context management in GPTs, contrasting with OpenAI's no-code approach."
CodeLLama 70B beats GPT4 on HumanEval
codellama miqu mistral-medium llama-2-70b aphrodite-engine mixtral flatdolphinmaid noromaid rpcal chatml mistral-7b activation-beacon eagle-7b rwkv-v5 openhermes2.5 nous-hermes-2-mixtral-8x7b-dpo imp-v1-3b bakllava moondream qwen-vl meta-ai-fair ollama nous-research mistral-ai hugging-face ai-ethics alignment gpu-optimization direct-prompt-optimization fine-tuning cuda-programming optimizer-technology quantization multimodality context-length dense-retrieval retrieval-augmented-generation multilinguality model-performance open-source code-generation classification vision
Meta AI surprised the community with the release of CodeLlama 70B, an open-source model now available on platforms like Ollama and MLX for local use. The Miqu model sparked debate over its origins, possibly linked to Mistral Medium or a fine-tuned Llama-2-70b, alongside discussions on AI ethics and alignment risks. The Aphrodite engine showed strong performance on A6000 GPUs with specific configurations. Role-playing AI models such as Mixtral and Flatdolphinmaid faced challenges with repetitiveness, while Noromaid and Rpcal performed better, with ChatML and DPO recommended for improved responses. Learning resources like fast.ai's course were highlighted for ML/DL beginners, and fine-tuning techniques with optimizers like Paged 8bit Lion and Adafactor were discussed.
At Nous Research AI, the Activation Beacon project introduced a method for unlimited context length in LLMs using "global state" tokens, potentially transforming retrieval-augmented models. The Eagle-7B model, based on RWKV-v5, outperformed Mistral in benchmarks with efficiency and multilingual capabilities. OpenHermes2.5 was recommended for consumer hardware due to its quantization methods. Multimodal and domain-specific models like IMP v1-3b, Bakllava, Moondream, and Qwen-vl were explored for classification and vision-language tasks. The community emphasized centralizing AI resources for collaborative research.
RWKV "Eagle" v5: Your move, Mamba
rwkv-v5 mistral-7b miqu-1-70b mistral-medium llama-2 mistral-instruct-v0.2 mistral-tuna llama-2-13b kunoichi-dpo-v2-7b gpt-4 eleutherai mistral-ai hugging-face llamaindex nous-research rwkv lmsys fine-tuning multilinguality rotary-position-embedding model-optimization model-performance quantization speed-optimization prompt-engineering model-benchmarking reinforcement-learning andrej-karpathy
RWKV v5 Eagle was released with evaluation results surpassing Mistral 7B, trading some English performance for multilingual capabilities. The mysterious miqu-1-70b model sparked debate about its origins, possibly a leak or distillation of Mistral Medium or a fine-tuned Llama 2. Discussions highlighted fine-tuning techniques, including the effectiveness of 1,000 high-quality prompts over larger mixed-quality datasets, and tools like Deepspeed, Axolotl, and QLoRA. The Nous Research AI community emphasized the impact of Rotary Position Embedding (RoPE) theta settings on LLM extrapolation, improving models like Mistral Instruct v0.2. Speed improvements in Mistral Tuna kernels reduced token-processing costs. The launch of Eagle 7B with 7.52B parameters showcased strong multilingual performance, surpassing other 7B-class models.
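The RoPE theta discussion boils down to one base constant in the rotary embedding: a larger base slows the rotation frequencies, which is the knob behind most context-extension recipes. A minimal NumPy sketch of the mechanism (shapes and values are illustrative):

```python
# Rotary Position Embedding (RoPE): raising the theta base slows the per-position
# rotation, which is what long-context "RoPE theta" tuning adjusts.
import numpy as np

def rope_angles(positions: np.ndarray, head_dim: int, theta_base: float = 10_000.0) -> np.ndarray:
    """Rotation angles of shape (seq_len, head_dim // 2)."""
    inv_freq = 1.0 / (theta_base ** (np.arange(0, head_dim, 2) / head_dim))
    return np.outer(positions, inv_freq)

def apply_rope(x: np.ndarray, theta_base: float = 10_000.0) -> np.ndarray:
    """Rotate query/key vectors x of shape (seq_len, head_dim) pairwise."""
    seq_len, head_dim = x.shape
    angles = rope_angles(np.arange(seq_len), head_dim, theta_base)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# A larger base (e.g. 1e6 instead of 1e4) makes angles grow more slowly with position,
# so positions far beyond the training length still land in a familiar angular range.
q = np.random.randn(8, 64)
print(apply_rope(q, theta_base=10_000.0).shape, apply_rope(q, theta_base=1_000_000.0).shape)
```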
Adept Fuyu-Heavy: Multimodal model for Agents
fuyu-heavy fuyu-8b gemini-pro claude-2 gpt4v gemini-ultra deepseek-coder-33b yi-34b-200k goliath-120b mistral-7b-instruct-v0.2 mamba rwkv adept hugging-face deepseek mistral-ai nous-research multimodality visual-question-answering direct-preference-optimization benchmarking model-size-estimation quantization model-merging fine-tuning instruct-tuning rms-optimization heterogeneous-ai-architectures recurrent-llms contrastive-preference-optimization
Adept launched Fuyu-Heavy, a multimodal model focused on UI understanding and visual QA, outperforming Gemini Pro on the MMMU benchmark. The model uses Direct Preference Optimization (DPO), which is gaining attention as a leading tuning method. Fuyu-Heavy's size is undisclosed but estimated at between 20B and 170B parameters, smaller than rumored frontier models like Claude 2, GPT4V, and Gemini Ultra. Meanwhile, Mamba was rejected at ICLR over quality concerns. In Discord discussions, DeepSeek Coder 33B was claimed to outperform GPT-4 in coding tasks, and deployment strategies for large models like Yi-34B-200K and Goliath-120B were explored. Quantization debates highlighted mixed views on Q8 and EXL2 quants. Fine-tuning and instruct-tuning of Mistral 7B Instruct v0.2 were discussed, alongside insights on RMS optimization and heterogeneous AI architectures combining Transformers and Selective SSM (Mamba). The potential of recurrent LLMs like RWKV and techniques like Contrastive Preference Optimization (CPO) were also noted.
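DPO itself is compact: it trains the policy with a logistic loss over the policy-versus-reference log-probability margins of chosen and rejected responses. A minimal PyTorch sketch of that loss (the tensor values are toy numbers):

```python
# Direct Preference Optimization (DPO) loss on a batch of (chosen, rejected) pairs.
# Inputs are summed token log-probs of each response under the policy and the frozen
# reference model; beta controls how far the policy may drift from the reference.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # Maximize the gap between the chosen and rejected margins.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy batch of four preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5, -20.1, -7.3]),
                torch.tensor([-14.2, -11.0, -25.4, -9.9]),
                torch.tensor([-13.0, -10.0, -21.0, -8.0]),
                torch.tensor([-13.5, -10.5, -24.0, -9.0]))
print(loss)
```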
RIP Latent Diffusion, Hello Hourglass Diffusion
gpt-4 latent-diffusion stable-diffusion meta-ai-fair openai hugging-face diffusion-models transformers image-generation model-efficiency fine-tuning quantization prompt-engineering roleplay training-optimization katherine-crowson lucidrains
Katherine Crowson from Stable Diffusion introduces a hierarchical pure transformer backbone for diffusion-based image generation that efficiently scales to megapixel resolutions with under 600 million parameters, improving upon the original ~900M parameter model. This architecture processes local and global image phenomena separately, enhancing efficiency and resolution without latent steps. Additionally, Meta's Self Rewarding LM paper has inspired lucidrains to begin an implementation. Discord summaries highlight GPT-4's robustness against quantification tricks, discussions on open-source GPT-0 alternatives, challenges in DPO training on limited VRAM with suggestions like QLoRA and rmsprop, and efforts to improve roleplay model consistency through fine-tuning and merging. Philosophical debates on AI sentience and GPT-4 customization for markdown and translation tasks were also noted.
Nightshade poisons AI art... kinda?
mistral-7b falcon-7b mistral-ai hugging-face mixture-of-experts gpu-parallelism quantization fine-tuning model-merging ai-detection role-playing benchmarking
Over the weekend of 1/19-20/2024, discussions in TheBloke Discord covered key topics including Mixture of Experts (MoE) model efficiency, GPU parallelism, and quantization strategies. Users debated the effectiveness of AI detection tools like GPTZero and explored fine-tuning challenges with models such as Mistral 7B and Falcon 7B. Community interest was strong in developing simpler, community-powered quantization services and understanding model merging techniques. Ethical considerations around AI applications like AI girlfriend sites were also discussed.
1/17/2024: Help crowdsource function calling datasets
mistral-7b dolphin-2.7-mixtral-8x7b mega-dolphin dolphin-2.6-mistral-7b-dpo llama-cpp lm-studio mistral-ai microsoft hugging-face apple function-calling quantization model-performance gpu-optimization model-selection closed-source memory-optimization linux-server api-fees headless-mode yagilb heyitsyorkie
LM Studio updated its FAQ to clarify that it is closed-source, remains free for personal use, and collects no data. The new beta release includes fixes and hints at upcoming 2-bit quantization support. For gaming, models like Dolphin 2.7 Mixtral 8x7B, MegaDolphin, and Dolphin 2.6 Mistral 7B DPO with Q4_K_M quantization were recommended. Discussions highlighted that a single powerful GPU outperforms multi-GPU setups due to bottlenecks, with older GPUs like the Tesla P40 being cost-effective. Microsoft's AutoGen Studio was introduced but has issues and requires API fees for open-source models. Linux users are advised to use llama.cpp over LM Studio due to the lack of a headless mode. Additional tools like LLMFarm for iOS and various Hugging Face repositories were also mentioned. "LM Studio must be running to use the local inference server as there is no headless mode available" and "matching model size to GPU memory is key for performance" were notable points.
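The "matching model size to GPU memory" rule of thumb is simple arithmetic over parameter count and bits per weight; a rough estimator is sketched below (the 1.2x overhead factor for KV cache and activations is a ballpark assumption):

```python
# Back-of-the-envelope memory estimate for holding model weights at a given bit width.
def weight_memory_gb(n_params_billion: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Approximate GiB for the weights, times a rough 1.2x fudge factor for
    KV cache and activations (the overhead value is an assumption)."""
    n_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return n_bytes * overhead / 1024**3

for bits in (16, 8, 4, 2):
    print(f"7B model @ {bits:>2}-bit ~ {weight_memory_gb(7, bits):5.1f} GiB")
# fp16 comes out around 16 GiB, a Q4_K_M-style 4-bit quant around 4 GiB, and a
# 2-bit quant around 2 GiB -- hence the interest in matching the quantization
# level to the GPU actually available.
```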
1/16/2024: ArtificialAnalysis - a new model/host benchmark site
mixtral hermes-2-mixtral openchat-7b byte-mistral nous-research nvidia hugging-face summarization fine-tuning byte-level-tokenization multimodality inference-speed-optimization dataset-sharing quantization swyx gabriel_syme manojbh carsonpoole fullstack6209
Artificial Analysis launched a new models-and-hosts comparison site, highlighted by swyx. The Nous Research AI Discord discussed innovative summarization techniques using NVIDIA 3090 and 2080 Ti GPUs to process around 100k tokens, and adapting prompts for smaller models like OpenChat 7B. The availability of Hermes 2 Mixtral on Hugging Face's HuggingChat was noted, alongside fine-tuning challenges with Mixtral using Axolotl. Discussions included byte-level tokenization experiments with Byte Mistral, multimodal training on COCO image bytes, and inference speed improvements using vLLM and llama.cpp. Calls for transparency in data sharing and open-sourcing the Hermes 2 Mixtral dataset were emphasized, with comparisons of DPO and SFT methods and quantized LLM use on an M1 MacBook Pro.
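Byte-level tokenization, as in the Byte Mistral experiments, replaces a learned vocabulary with raw UTF-8 bytes, so any text or binary payload maps to a 256-entry vocabulary; a minimal sketch of the encode/decode round trip:

```python
# Byte-level "tokenization": every UTF-8 byte is a token id in [0, 255], so any text
# (or raw image bytes, as in the COCO experiments) fits a 256-entry vocab.
def encode_bytes(text: str) -> list[int]:
    return list(text.encode("utf-8"))

def decode_bytes(token_ids: list[int]) -> str:
    return bytes(token_ids).decode("utf-8", errors="replace")

ids = encode_bytes("Byte Mistral handles any script: こんにちは")
print(len(ids), "byte tokens; round-trip:", decode_bytes(ids))
# Trade-off: no out-of-vocabulary failures, but sequences are several times longer
# than with a subword tokenizer, which stresses context length and inference speed.
```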
1/16/2024: TIES-Merging
mixtral-8x7b nous-hermes-2 frankendpo-4x7b-bf16 thebloke hugging-face nous-research togethercompute oak-ridge-national-laboratory vast-ai runpod mixture-of-experts random-gate-routing quantization gptq exl2-quants reinforcement-learning-from-human-feedback supercomputing trillion-parameter-models ghost-attention model-fine-tuning reward-models sanjiwatsuki superking__ mrdragonfox _dampf kaltcit rombodawg technotech
TheBloke's Discord community actively discusses Mixture of Experts (MoE) models, focusing on random gate routing layers for training and the challenges of immediate model use. There is a robust debate on quantization methods, comparing GPTQ and EXL2 quants, with EXL2 noted for faster execution on specialized hardware. A new model, Nous Hermes 2, based on Mixtral 8x7B and trained with RLHF, claims benchmark superiority but shows some inconsistencies. The Frontier supercomputer at Oak Ridge National Laboratory is highlighted for training a trillion-parameter LLM with 14TB RAM, sparking discussions on open-sourcing government-funded AI research. Additionally, the application of ghost attention in the academicat model is explored, with mixed reactions from the community. "Random gate layer is good for training but not for immediate use," and "EXL2 might offer faster execution on specialized hardware," are key insights shared.
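The "random gate layer is good for training but not for immediate use" point concerns how an MoE layer assigns tokens to experts; a minimal PyTorch sketch contrasting random routing with learned top-2 gating (dimensions and expert count are illustrative):

```python
# Minimal Mixture-of-Experts layer: random routing vs. learned top-2 gating.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x, random_routing: bool = False):
        # x: (tokens, d_model)
        if random_routing:
            # Random gate: spreads tokens evenly so every expert gets gradient during
            # training, but the routing carries no information at inference time.
            logits = torch.rand(x.size(0), len(self.experts), device=x.device)
        else:
            logits = self.gate(x)                       # learned routing scores
        weights, idx = logits.topk(self.top_k, dim=-1)  # pick top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(10, 64)
print(moe(tokens, random_routing=True).shape, moe(tokens).shape)
```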
12/25/2023: Nous Hermes 2 Yi 34B for Christmas
nous-hermes-2 yi-34b nucleusx yayi-2 ferret teknim nous-research apple mixtral deepseek qwen huggingface wenge-technology quantization model-optimization throughput-metrics batch-processing parallel-decoding tensor-parallelization multimodality language-model-pretraining model-benchmarking teknium carsonpoole casper_ai pradeep1148 osanseviero metaldragon01
Teknium released Nous Hermes 2 on Yi 34B, positioning it as a top open model compared to Mixtral, DeepSeek, and Qwen. Apple introduced Ferret, a new open-source multimodal LLM. Discussions in the Nous Research AI Discord focused on AI model optimization and quantization techniques like AWQ, GPTQ, and AutoAWQ, with insights on proprietary optimization and throughput metrics. Additional highlights include the addition of the NucleusX model, a 30B model scoring 80 on MMLU, to transformers, and the YAYI 2 language model by Wenge Technology trained on 2.65 trillion tokens. "AutoAWQ outperforms vLLM up to batch size 8" was noted, and proprietary parallel decoding and tensor parallelization across GPUs were discussed for speed improvements.
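AWQ, GPTQ, and AutoAWQ all build on group-wise low-bit weight quantization; a minimal round-to-nearest sketch of that shared core (group size and bit width are illustrative, and the real methods add activation-aware scaling or rounding-error compensation on top):

```python
# Group-wise round-to-nearest weight quantization: the shared core that AWQ and GPTQ
# refine (AWQ rescales salient channels first, GPTQ compensates rounding error).
import torch

def quantize_groupwise(w: torch.Tensor, bits: int = 4, group_size: int = 128):
    """Quantize a (rows, cols) weight matrix in groups along the columns.
    Assumes cols is divisible by group_size."""
    qmax = 2 ** (bits - 1) - 1                               # e.g. 4-bit -> codes in [-8, 7]
    rows, cols = w.shape
    groups = w.reshape(rows, cols // group_size, group_size)
    scale = groups.abs().amax(dim=-1, keepdim=True) / qmax   # one scale per group
    q = torch.clamp(torch.round(groups / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor, shape) -> torch.Tensor:
    return (q.float() * scale).reshape(shape)

w = torch.randn(256, 512)
q, s = quantize_groupwise(w)
print("mean abs error:", (w - dequantize(q, s, w.shape)).abs().mean().item())
```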
12/23/2023: NeurIPS Best Papers of 2023
gpt-4 palm2 hermes-2.5 mistral-7b nous-research hugging-face apple context-length malware-security video-content music-content linear-layers api-access large-language-models embedding vector-databases model-merging model-interpretability striped-hyena-architecture quantization rmsnorm attention-mechanisms
The Latent Space Pod released a 3-hour recap of the best NeurIPS 2023 papers. The Nous Research AI Discord community discussed optimizing AI performance with shorter context lengths, malware security concerns linked to HuggingFace, and shared insights on video and music content. Technical discussions included the DYAD research paper proposing a faster alternative to linear layers, Apple's ML Ferret machine learning tool, and accessing PALM2 via API. The community also explored Large Language Models focusing on specialized models, data scaling, embedding/vector databases, model merging, and interpretability, with mentions of Hermes 2.5, GPT-4, and Mistral. Additionally, there were conversations on the Striped Hyena Architecture, quantization challenges, and fixes related to RMSNorm and the "Attention is All You Need" paper.
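Since the RMSNorm fixes came up, here is the operation itself in a few lines of PyTorch: a standard formulation rather than the specific patch discussed.

```python
# RMSNorm: LayerNorm without mean-centering, normalizing by the root-mean-square.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))  # learned per-channel gain
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

norm = RMSNorm(4096)
print(norm(torch.randn(2, 16, 4096)).shape)  # shape is preserved
```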
12/11/2023: Mixtral beats GPT3.5 and Llama2-70B
mixtral-8x7b gpt-4 gpt-3.5-turbo llama-3 openhermes-2.5 llava-v1.5-13b-gptq mistral-ai openai huggingface sparse-mixture-of-experts fine-tuning quantization gpu-hardware transformers model-deployment open-source coding-datasets
Mistral AI announced the Mixtral 8x7B model featuring a Sparse Mixture of Experts (SMoE) architecture, sparking discussions on its potential to rival GPT-4. The community debated GPU hardware options for training and fine-tuning transformer models, including RTX 4070s, A4500, RTX 3090s with nvlink, and A100 GPUs. Interest was expressed in fine-tuning Mixtral and generating quantized versions, alongside curating high-quality coding datasets. Resources shared include a YouTube video on open-source model deployment, an Arxiv paper, GitHub repositories, and a blog post on Mixture-of-Experts. Discussions also touched on potential open-source releases of GPT-3.5 Turbo and llama-3, and running OpenHermes 2.5 on Mac M3 Pro with VRAM considerations.
12/9/2023: The Mixtral Rush
mixtral hermes-2.5 hermes-2 mistral-yarn ultrachat discoresearch fireworks-ai hugging-face mistral-ai benchmarking gpu-requirements multi-gpu quantization gptq chain-of-thought min-p-sampling top-p-sampling model-sampling model-merging model-performance small-models reasoning-consistency temperature-sampling bjoernp the_bloke rtyax kalomaze solbus calytrix
Mixtral's weights were released without code, prompting the Disco Research community and Fireworks AI to implement it rapidly. Despite efforts, no significant benchmark improvements were reported, limiting its usefulness for local LLM usage but marking progress for the small models community. Discussions in the DiscoResearch Discord covered Mixtral's performance compared to models like Hermes 2.5 and Hermes 2, with evaluations on benchmarks such as winogrande, truthfulqa_mc2, and arc_challenge. Technical topics included GPU requirements, multi-GPU setups, and quantization via GPTQ. Benchmarking strategies like grammar-based evaluation, chain of thought (CoT), and min_p sampling were explored, alongside model sampling techniques like Min P and Top P to enhance response stability and creativity. Users also discussed GPTs' learning limitations and the adaptability of models under varying conditions, emphasizing min_p sampling's role in enabling higher temperature settings for creativity.
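Min P and Top P differ in how they prune the next-token distribution before sampling: Top P keeps the smallest high-probability prefix, while Min P keeps everything above a fraction of the top token's probability, which is why it tolerates higher temperatures. A minimal sketch of both filters (the thresholds are illustrative):

```python
# Top-p (nucleus) vs. min-p filtering of next-token logits before sampling.
import torch

def top_p_filter(logits: torch.Tensor, top_p: float = 0.9) -> torch.Tensor:
    """Keep the smallest set of tokens whose probabilities sum to at least top_p."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    keep_sorted = (cumulative - sorted_probs < top_p).float()  # always keeps the top token
    keep = torch.zeros_like(probs).scatter(-1, sorted_idx, keep_sorted)
    return logits.masked_fill(keep == 0, float("-inf"))

def min_p_filter(logits: torch.Tensor, min_p: float = 0.05) -> torch.Tensor:
    """Keep tokens whose probability is at least min_p * prob(top token); the cutoff
    scales with model confidence, which is what lets temperature go higher."""
    probs = torch.softmax(logits, dim=-1)
    threshold = min_p * probs.max(dim=-1, keepdim=True).values
    return logits.masked_fill(probs < threshold, float("-inf"))

# Toy next-token distribution over a 32k vocabulary, sampled at temperature 1.5.
logits = torch.randn(32000)
filtered = min_p_filter(logits) / 1.5
next_token = torch.multinomial(torch.softmax(filtered, dim=-1), num_samples=1)
print(next_token.item())
```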