Abstract Algorithms

mlops

4 articles across 4 sub-topics

Fine-Tuning LLMs with LoRA and QLoRA: A Practical Deep-Dive

TLDR: LoRA freezes the base model and trains two tiny matrices per layer — 0.1 % of parameters, 70 % less GPU memory, near-identical quality. QLoRA adds 4-bit NF4 quantization of the frozen base, enabling 70B fine-tuning on 2× A100 80 GB instead of 8...

Apr 19, 2026•31 min read

Ai Agents(1)

Build vs Buy: Deploying Your Own LLM vs Using ChatGPT, Gemini, and Claude APIs

TLDR: Use the API until you hit $10K/month or a hard data privacy requirement. Then add a semantic cache. Then evaluate hybrid routing. Self-hosting full model serving is only cost-effective at > 50M tokens/day with a dedicated MLOps team. The build ...

Apr 19, 2026•34 min read

Ai Architecture(1)

MLOps Model Serving and Monitoring Patterns for Production Readiness

TLDR: Production ML reliability depends on joining inference serving, data-quality signals, and rollback automation into one operating loop. TLDR: This dedicated deep dive focuses on the internals, failure behavior, performance trade-offs, and rollou...

Mar 13, 2026•12 min read

Ai(1)

LLM Model Naming Conventions: How to Read Names and Why They Matter

TLDR: LLM names encode practical decisions: model family, size, training stage, context window, format, and quantization level. If you can decode naming conventions, you can avoid costly deployment mistakes and choose the right checkpoint faster. �...

Mar 9, 2026•12 min read