Concept: Monte Carlo Better than Quadrature

Core Intuition

Q: Why Monte Carlo method is often the default method for high-dimensional integral approximation, instead of Quadrature methods? I got this question while reading Fearnhead et al. (2025), Scalable Monte Carlo for Bayesian Learning, p.6.

This comes directly from the following:

  • Suppose we apply a cubature rule based on a grid of equally spaced points in each dimension.
  • The spacing of these points will be
  • Then there will be points in total
  • If we have a cubature whose error decays in , for some power , then the error decays at a rate of
  • Note for Monte Carlo method: its error decays at a rate of , by the central limit theorem

Mathematical Foundation

Q: Why cubature whose error decays in ? Taylor’s theorem with integral remainder.

Taylor’s Theorem with Remainder

For a function with continuous derivatives on an interval containing and :

where the remainder can be written as:

Proof of the Error Rate Bound in

Step 1: Take absolute value of the remainder:

Step 2: Apply triangle inequality:

Step 3: Use (definition of sup norm):

Step 4: Evaluate the integral. Let :

Step 5: Combine:

Step 6: Set and :

Derivation of Taylor’s Theorem with Remainder Formula

The key idea is to repeatedly apply integration by parts to express the error in terms of the highest derivative.

Starting Point: Fundamental Theorem of Calculus

This is just:

Step 1: Integration by Parts (once)

Apply integration by parts to with:

  • , so
  • , so

So:

Step 2: Integration by Parts (again)

Apply integration by parts to with:

  • , so
  • , so

So:

Pattern Recognition

After k integration by parts, we get:

Setting k = r:

Intuition

The integral form shows that the remainder is a weighted average of over the interval , with weights that give more importance to points closer to .

Key Equation

For curvature: If we have a cubature whose error decays in . for some power , then the error decays at a rate of .

As , , so .

For Monte Carlo: The error decays at a rate of , by the central limit theorem.

Component of

Insights

  • Curse of dimensionality breaks quadrature: cubature error scales as , so for fixed and , doubling halves the exponent. Monte Carlo error is completely dimension-free.
  • Crossover dimension: Monte Carlo beats an order- cubature rule when . For a second-order rule (), the crossover is ; for most practical integrands in ML/Bayesian settings, is in the hundreds or thousands.
  • Higher-order rules don’t rescue quadrature at scale: increasing shifts the crossover dimension, but collecting enough smooth derivatives of in high is itself intractable.
  • Monte Carlo’s slow 1D rate is a feature in high : looks poor compared to in 1D, but its constant does not grow exponentially with .

Pitfalls

  • Monte Carlo is not always preferable: in or (for ), a well-chosen quadrature rule is faster to converge; defaulting to Monte Carlo in low dimensions wastes samples.
  • Rate assumes i.i.d. samples: the CLT rate breaks down for correlated samples (e.g., MCMC chains), which introduce an effective-sample-size penalty.
  • Ignoring variance: the CLT bound hides the integrand variance ; a high-variance integrand can make Monte Carlo impractical even when the dimension favors it. Variance reduction (importance sampling, control variates) is then essential.
  • Confusing (total points) with (points per dimension): the grid has points, so comparing quadrature and Monte Carlo at the same (not the same ) is critical.

Connections

  • Quasi-Monte Carlo todo: low-discrepancy sequences beat i.i.d. draws, achieving error for moderate .
  • MCMC todo: Markov chains replace i.i.d. draws when the target is known up to a constant; rate survives asymptotically via ergodic theory.
  • Importance Sampling todo: reweights a surrogate distribution to reduce variance; rate holds with a smaller constant.
  • Curse of Dimensionality todo: cubature rate is the canonical example of exponential cost growth with .
  • Central Limit Theorem todo: source of Monte Carlo’s guarantee for i.i.d. estimators.

References

[1] Fearnhead et al. (2025), Scalable Monte Carlo for Bayesian Learning