Packing Intelligence into Fewer Bits: Non-Linear Quantization in LLMs

A 70-billion-parameter LLM stored in 16-bit floats needs roughly 140 GB of memory, more than most GPUs can hold. Quantization shrinks the model by replacing those 16-bit floats with much smaller in...

Apr 14, 2026 Deep Learning

Decoding RAG Evaluation: When Your Pipeline Fails, Who Is to Blame?

Retrieval-Augmented Generation (RAG) has rapidly become the enterprise standard for bridging the gap between static Large Language Models (LLMs) and dynamic, proprietary data. By fetching relevant ...

Mar 26, 2026 Deep Learning

Epilogue: The Grand Unifying Theory of A/B Testing (Enter the GLM)

Welcome to the epilogue of our six-part series on experimentation and A/B testing! Over the past few months, we’ve covered a massive amount of ground. We started with the foundational statistics of...

Mar 23, 2026 Statistics

Beyond Traditional A/B Testing: Multi-Armed and Contextual Bandits

Welcome to the final installment of our A/B Testing series! Over the past several posts, we’ve covered the entire statistical foundation of experimentation - from p-values, confidence intervals, an...

Mar 19, 2026 Statistics

Beyond A/B Testing: Causal Inference in the Wild (DiD, PSM, and IV)

If you’ve been following my series on statistical testing, you’re already comfortable with 2-sample t-tests, ANOVA, and Chi-Square tests. Those tools are fantastic for randomized, perfectly control...

Mar 18, 2026 Statistics

A Practical Introduction to LLM Quantization and Linear Mapping

Why Quantization Matters for LLMs. Modern LLMs are enormous, not just in parameter count, but in the memory and compute they demand at inference time. A model with billions of parameters stored in ...

Mar 16, 2026 Deep Learning

KV Cache: The Trick That Lets LLMs Remember Without Recomputing

KV Cache: How LLMs Avoid Recomputing the Past. Large language models generate text one token at a time. At every step, the model attends to all previous tokens. Naively, this would require recomput...

Mar 12, 2026 Deep Learning

Demystifying LLM Temperature: The Math Behind the Magic of Token Sampling

If you’ve played with any Large Language Model (LLM) API, you’ve likely tweaked the temperature slider. The conventional wisdom is simple: “Low temperature = boring and factual, High temperature = ...

Feb 28, 2026 Deep Learning

Introduction to AdTech: The Post-Cookie Frontier - Identity, Privacy, and What Comes Next

In the previous posts, we explored how an ad impression is auctioned in milliseconds and how machine learning models decide whether and how much to bid. However, every system we’ve discussed - from...

Dec 15, 2025 AdTech

Introduction to AdTech: The Intelligence Layer - Machine Learning in the Millisecond

In the previous posts, we explored the plumbing of AdTech: how ad tags, redirects, pixels, and browsers coordinate to make a single impression possible. But for a data scientist or engineer, the mo...

Dec 12, 2025 AdTech