***

##->Archived News:<-

Date: (MM/DD/YYYY) | Description:
------ | ------
07/23/2024 | Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
07/22/2024 | llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
07/18/2024 | Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
07/18/2024 | Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
07/16/2024 | Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
07/16/2024 | MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
07/13/2024 | Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271
07/09/2024 | Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
07/07/2024 | Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
07/02/2024 | Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
06/28/2024 | Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
06/27/2024 | Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
06/27/2024 | Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
06/25/2024 | Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
06/23/2024 | Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931
06/18/2024 | Meta Research releases multimodal 34B, audio, and multi-token prediction models: https://ai.meta.com/blog/meta-fair-research-new-releases
06/17/2024 | DeepSeek-Coder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
06/14/2024 | Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
06/14/2024 | Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c
06/11/2024 | Google releases RecurrentGemma, based on a hybrid RNN architecture: https://hf.co/google/recurrentgemma-9b-it
06/06/2024 | Qwen2 released, with better benchmarks than Llama 3: https://qwenlm.github.io/blog/qwen2/
06/01/2024 | KV cache quantization support merged: https://github.com/ggerganov/llama.cpp/pull/7527
05/31/2024 | K2: Fully-reproducible model outperforming Llama 2 70B using 35% less compute: https://hf.co/LLM360/K2
05/29/2024 | Mistral releases Codestral-22B: https://mistral.ai/news/codestral/
05/28/2024 | DeepSeek-V2 support officially merged: https://github.com/ggerganov/llama.cpp/pull/7519
05/24/2024 | Draft PR adds support for Jamba: https://github.com/ggerganov/llama.cpp/pull/7531
05/23/2024 | Cohere releases 8B & 35B Aya 23 with multilingual capabilities: https://hf.co/collections/CohereForAI/c4ai-aya-23-664f4cda3fa1a30553b221dc
05/22/2024 | Mistral v0.3 models with function calling and extended vocab: https://github.com/mistralai/mistral-inference#model-download
05/21/2024 | Fork of llama.cpp adds DeepSeek-V2 support: https://hf.co/leafspark/DeepSeek-V2-Chat-GGUF
05/21/2024 | Microsoft launches Phi-3 small (7B) and medium (14B) under MIT: https://aka.ms/phi3-hf
05/16/2024 | DeepSeek AI releases 16B V2-Lite: https://hf.co/deepseek-ai/DeepSeek-V2-Lite-Chat
05/14/2024 | PaliGemma, Gemma 2, and LLM Comparator: https://developers.googleblog.com/gemma-family-and-toolkit-expansion-io-2024
05/12/2024 | Yi-1.5 released with improved coding, math, and reasoning capabilities: https://hf.co/collections/01-ai/yi-15-2024-05-663f3ecab5f815a3eaca7ca8
05/11/2024 | Japanese 13B model trained on CPU supercomputer: https://hf.co/Fugaku-LLM/Fugaku-LLM-13B
05/11/2024 | OneBit: Towards Extremely Low-bit LLMs: https://github.com/xuyuzhuang11/OneBit
05/10/2024 | Gemma 2B with 10M context: https://hf.co/mustafaaljadery/gemma-2B-10M
05/08/2024 | Refuel LLM-2 for data labeling, enrichment, and cleaning: https://hf.co/refuelai/Llama-3-Refueled
05/08/2024 | OpenAI releases its Model Spec: https://cdn.openai.com/spec/model-spec-2024-05-08.html
05/06/2024 | IBM releases Granite Code Models: https://github.com/ibm-granite/granite-code-models
05/02/2024 | Nvidia releases Llama3-ChatQA-1.5, which excels at QA & RAG: https://chatqa-project.github.io/
05/01/2024 | KAN: Kolmogorov-Arnold Networks: https://arxiv.org/abs/2404.19756
05/01/2024 | Orthogonalized Llama-3-8b: https://hf.co/hjhj3168/Llama-3-8b-Orthogonalized-exl2
04/27/2024 | Refusal in LLMs is mediated by a single direction: https://alignmentforum.org/posts/jGuXSZgv6qfdhMCuJ
04/24/2024 | Snowflake Arctic Instruct 128x3B MoE released: https://hf.co/Snowflake/snowflake-arctic-instruct
04/23/2024 | Phi-3 Mini model released: https://hf.co/microsoft/Phi-3-mini-128k-instruct-onnx
04/21/2024 | Llama 3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
04/18/2024 | Llama 3 8B & 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
04/17/2024 | Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/
04/15/2024 | Microsoft AI unreleases WizardLM 2: https://web.archive.org/web/20240415221214/https://wizardlm.github.io/WizardLM2/
04/15/2024 | Microsoft AI releases WizardLM 2, including a Mixtral 8x22B finetune: https://wizardlm.github.io/WizardLM2/
04/09/2024 | Mistral releases Mixtral-8x22B: https://twitter.com/MistralAI/status/1777869263778291896
04/09/2024 | Llama 3 coming in the next month: https://techcrunch.com/2024/04/09/meta-confirms-that-its-llama-3-open-source-llm-is-coming-in-the-next-month/
04/08/2024 | StableLM 2 12B released: https://huggingface.co/stabilityai/stablelm-2-12b
04/05/2024 | Qwen1.5-32B released with GQA: https://huggingface.co/Qwen/Qwen1.5-32B
04/04/2024 | Command R+ released with GQA, 104B params, 128K context: https://huggingface.co/CohereForAI/c4ai-command-r-plus
03/28/2024 | MiniGemini: Dense and MoE vision models: https://github.com/dvlab-research/MiniGemini
03/28/2024 | Jamba 52B MoE released with 256k context: https://huggingface.co/ai21labs/Jamba-v0.1
03/27/2024 | Databricks releases 132B MoE model: https://huggingface.co/collections/databricks/dbrx-6601c0852a0cdd3c59f71962
03/23/2024 | Mistral releases 7B v0.2 base model with 32k context: https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar
03/23/2024 | Grok support merged: https://github.com/ggerganov/llama.cpp/pull/6204
03/17/2024 | xAI open-sources Grok: https://github.com/xai-org/grok
03/15/2024 | Control vector support merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/5970
03/11/2024 | New 35B RAG model with 128K context: https://huggingface.co/CohereForAI/c4ai-command-r-v01
03/11/2024 | This week, xAI will open source Grok: https://twitter.com/elonmusk/status/1767108624038449405
02/28/2024 | The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits: https://arxiv.org/abs/2402.17764
02/27/2024 | Mistral re-adds the notice to their website: https://www.reddit.com/r/LocalLLaMA/comments/1b18817/mistral_changing_and_then_reversing_website/
02/26/2024 | Mistral partners with Microsoft, removes mentions of open models from website: https://siliconangle.com/2024/02/26/now-microsoft-partner-mistral-ai-challenges-openai-three-new-llms/
02/21/2024 | Google releases two open models, Gemma: https://blog.google/technology/developers/gemma-open-models/
02/18/2024 | 1.5-bit quant for lcpp merged: https://github.com/ggerganov/llama.cpp/pull/5453
02/17/2024 | Kobold.cpp 1.58 prebuilt released: https://github.com/LostRuins/koboldcpp/releases/tag/v1.58
02/16/2024 | Exl2 adds Qwen support: https://github.com/turboderp/exllamav2/issues/334
02/16/2024 | ZLUDA for lcpp merged; however, the outlook is questionable at best: https://github.com/vosen/ZLUDA/pull/102
06/23/2023 | Ooba's preset arena results and SuperHOT 16k prototype released
06/22/2023 | Vicuna 33B (preview), OpenLLaMA 7B scaled, and MPT 30B released
06/20/2023 | SuperHOT Prototype 2 w/ 8K context released >>94191797
06/18/2023 | Minotaur 15B 8K, WizardLM 7B Uncensored v1.0, and Vicuna 1.3 released
06/17/2023 | exllama support merged into ooba; API server rewrite merged into llama.cpp
06/16/2023 | OpenLLaMA 13B released
06/16/2023 | Airoboros GPT-4 v1.2 released
06/16/2023 | Robin-33B-V2 released
06/16/2023 | Dan's 30B Personality Engine LoRA released
06/14/2023 | WizardCoder 15B released
06/14/2023 | Full CUDA GPU acceleration merged into llama.cpp
06/10/2023 | First Landmark Attention models released >>93993800
06/08/2023 | OpenLLaMA 3B and 7B released
06/07/2023 | StarCoderPlus / StarChat-β released
06/07/2023 | chronos-33b released
06/06/2023 | RedPajama 7B released, plus Instruct & Chat variants
06/06/2023 | WizardLM 30B v1.0 released
06/05/2023 | k-quantization released for llama.cpp
06/03/2023 | Nous-Hermes-13b released
06/03/2023 | WizardLM-Uncensored-Falcon-40b released
05/27/2023 | FalconLM releases Falcon-7B & 40B, new foundational models
05/26/2023 | BluemoonRP 30B 4K released
05/25/2023 | QLoRA and 4-bit bitsandbytes released
05/23/2023 | exllama transformer rewrite offers around 2x t/s increases for GPU models
05/22/2023 | SuperHOT 13B prototype & WizardLM Uncensored 30B released
05/19/2023 | RTX 30 series 15% performance gains; quantization breaking changes again >>93536523
05/19/2023 | PygmalionAI releases 13B Pyg & Meth
05/18/2023 | VicunaUnlocked-30B released
05/14/2023 | llama.cpp quantization change breaks current Q4 & Q5 models; they must be quantized again
05/13/2023 | llama.cpp GPU acceleration merged into master >>93403996 >>93404319
05/10/2023 | GPU-accelerated token generation >>93334002
05/06/2023 | MPT 7B, a 65k-context model trained on 1T tokens: https://huggingface.co/mosaicml/mpt-7b-storywriter
05/05/2023 | GPT4-x-AlpacaDente2-30b released: https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b
05/04/2023 | Allegedly leaked document from Google, fretting over open-source LLMs: https://www.semianalysis.com/p/google-we-have-no-moat-and-neither
05/04/2023 | StarCoder, a 15.5B parameter model trained on 80+ programming languages: https://huggingface.co/bigcode/starcoderbase
04/30/2023 | Uncucked Vicuna 13B released: https://huggingface.co/reeducator/vicuna-13b-free
04/30/2023 | PygmalionAI releases two 7B LLaMA-based models: https://huggingface.co/PygmalionAI
04/29/2023 | GPT4 X Alpasta 30B merge: https://huggingface.co/MetaIX/GPT4-X-Alpasta-30b-4bit
04/25/2023 | Proxy script for Tavern via Kobold/webui, increases LLaMA output quality: https://github.com/anon998/simple-proxy-for-tavern
04/23/2023 | OASST 30B released & quantized: https://huggingface.co/MetaIX/OpenAssistant-Llama-30b-4bit
04/22/2023 | SuperCOT LoRA (by kaiokendev), merged by helpful anons: https://huggingface.co/tsumeone/llama-30b-supercot-4bit-128g-cuda https://huggingface.co/ausboss/llama-13b-supercot-4bit-128g
04/22/2023 | OASST "releases" XORs again, deletes them soon after... again
04/21/2023 | StableLM models perform terribly and are apparently broken: https://github.com/Stability-AI/StableLM/issues/30