Concept: Monte Carlo Better than Quadrature

Core Intuition

Q: Why is Monte Carlo, rather than quadrature, often the default method for approximating high-dimensional integrals? This question came up while reading Fearnhead et al. (2025), Scalable Monte Carlo for Bayesian Learning, p. 6.

This comes directly from the following:

Suppose we apply a cubature rule based on a grid of $m + 1$ equally spaced points in each dimension.
The spacing of these points will be $δ = 1/ m$ .
Then there will be $n = (m + 1)^{d}$ points in total.
If we have a cubature rule whose error decays as $δ^{r}$ , for some power $r$ , then the error decays at a rate of $m^{- r} ≃ n^{- r / d}$ .
By contrast, the Monte Carlo error decays at a rate of $n^{- 1/2}$ , by the central limit theorem, regardless of $d$ .
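The comparison can be tried numerically. The following is a minimal sketch, not taken from the book: it assumes an illustrative integrand (a product of cosines on $[0, 1]^{d}$ , whose exact integral is $\sin(1)^{d}$ ), uses the tensor-product trapezoid rule as the order $r = 2$ cubature, and gives both methods the same budget $n = (m + 1)^{d}$ . The function names are hypothetical.

```python
import itertools
import math
import random


def f(x):
    """Illustrative integrand: product of cosines, exact integral sin(1)**d."""
    return math.prod(math.cos(xi) for xi in x)


def grid_estimate(d, m):
    """Tensor-product trapezoid rule (order r = 2), m + 1 points per dimension."""
    h = 1.0 / m
    nodes = [i * h for i in range(m + 1)]
    # Trapezoid weights: h/2 at the two endpoints, h in the interior.
    weights = [h / 2 if i in (0, m) else h for i in range(m + 1)]
    total = 0.0
    for idx in itertools.product(range(m + 1), repeat=d):
        w = math.prod(weights[i] for i in idx)
        total += w * f([nodes[i] for i in idx])
    return total


def mc_estimate(d, n, rng):
    """Plain Monte Carlo: average f over n i.i.d. uniform points in [0, 1]^d."""
    return sum(f([rng.random() for _ in range(d)]) for _ in range(n)) / n


if __name__ == "__main__":
    d, m = 6, 4
    n = (m + 1) ** d  # same number of function evaluations for both methods
    exact = math.sin(1.0) ** d
    print("n =", n)
    print("grid error:", abs(grid_estimate(d, m) - exact))
    print("MC error:  ", abs(mc_estimate(d, n, random.Random(0)) - exact))
```

Here $d = 6$ exceeds $2 r = 4$ , so the rate comparison favours Monte Carlo. A single Monte Carlo run is only one random draw, so its error should be read as a typical magnitude rather than a guarantee.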

Mathematical Foundation

Q: Why would a cubature rule have an error that decays as $δ^{r}$ ? The answer comes from Taylor’s theorem with integral remainder.

Taylor’s Theorem with Remainder

For a function $f$ with $r$ continuous derivatives on an interval containing $x_{0}$ and $x$ :

f (x) = \sum_{k = 0}^{r - 1} \frac{f^{(k)} (x_{0})}{k !} (x - x_{0})^{k} + R_{r} (x)

where the remainder can be written as:

R_{r} (x) = \frac{1}{( r - 1 )!} \int_{x_{0}}^{x} (x - t)^{r - 1} f^{(r)} (t) d t

Proof of the Error Rate Bound in $δ^{r}$

Step 1: Take absolute value of the remainder:

∣ R_{r} (x) ∣ = \frac{1}{( r - 1 )!} ∣ \int_{x_{0}}^{x} (x - t)^{r - 1} f^{(r)} (t) d t ∣

Step 2: Apply the triangle inequality for integrals (taking $x \geq x_{0}$ ; the case $x < x_{0}$ is symmetric):

∣ R_{r} (x) ∣ \leq \frac{1}{( r - 1 )!} \int_{x_{0}}^{x} ∣ x - t ∣^{r - 1} ∣ f^{(r)} (t) ∣ d t

Step 3: Use $∣ f^{(r)} (t) ∣ \leq ∣ f^{(r)} ∣_{\infty}$ (definition of sup norm):

∣ R_{r} (x) ∣ \leq \frac{∣ f ^{(r)} ∣ _{\infty}}{( r - 1 )!} \int_{x_{0}}^{x} ∣ x - t ∣^{r - 1} d t

Step 4: Evaluate the integral. Let $h = ∣ x - x_{0} ∣$ :

\int_{x_{0}}^{x} ∣ x - t ∣^{r - 1} d t = \int_{0}^{h} s^{r - 1} d s = \frac{h ^{r}}{r}

Step 5: Combine:

∣ R_{r} (x) ∣ \leq \frac{∣ f ^{(r)} ∣ _{\infty}}{( r - 1 )!} \cdot \frac{h ^{r}}{r} = \frac{∣ f ^{(r)} ∣ _{\infty}}{r !} h^{r}

Step 6: Set $δ = h = ∣ x - x_{0} ∣$ and $C = \frac{1}{r !}$ :

∣ R_{r} (x) ∣ \leq C \cdot δ^{r} \cdot ∣ f^{(r)} ∣_{\infty}
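As a sanity check on the bound, here is a small sketch under an assumed example: $f = \exp$ , $x_{0} = 0$ and $x = δ$ , so every derivative is $\exp$ and $∣ f^{(r)} ∣_{\infty} = e^{δ}$ on $[0, δ]$ . The helper names are hypothetical.

```python
import math


def taylor_remainder_exp(delta, r):
    """R_r(delta) for f = exp about x0 = 0: exp(delta) minus the first r terms."""
    poly = sum(delta**k / math.factorial(k) for k in range(r))
    return math.exp(delta) - poly


def remainder_bound_exp(delta, r):
    """Bound C * delta^r * sup|f^(r)| with C = 1/r! and sup|f^(r)| = exp(delta)."""
    return math.exp(delta) * delta**r / math.factorial(r)


if __name__ == "__main__":
    for r in (1, 2, 3, 4):
        for delta in (0.5, 0.25, 0.125):
            rem = taylor_remainder_exp(delta, r)
            bound = remainder_bound_exp(delta, r)
            print(f"r={r} delta={delta}: remainder={rem:.3e} bound={bound:.3e}")
```

Halving $δ$ should shrink the remainder by a factor of roughly $2^{r}$ , which is the $δ^{r}$ decay that the cubature argument relies on.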

Derivation of Taylor’s Theorem with Remainder Formula

The key idea is to repeatedly apply integration by parts to express the error in terms of the highest derivative.

Starting Point: Fundamental Theorem of Calculus

f (x) = f (x_{0}) + \int_{x_{0}}^{x} f^{'} (t) d t

This is just: $f (x) - f (x_{0}) = \int_{x_{0}}^{x} f^{'} (t) d t$

Step 1: Integration by Parts (once)

Apply integration by parts to $\int_{x_{0}}^{x} f^{'} (t) d t$ with:

$u = f^{'} (t)$ , so $d u = f^{''} (t) d t$
$d v = d t$ , so $v = t - x$

\int_{x_{0}}^{x} f^{'} (t) d t = [f^{'} (t) (t - x)]_{x_{0}}^{x} - \int_{x_{0}}^{x} (t - x) f^{''} (t) d t

= - f^{'} (x_{0}) (x_{0} - x) + \int_{x_{0}}^{x} (x - t) f^{''} (t) d t

= f^{'} (x_{0}) (x - x_{0}) + \int_{x_{0}}^{x} (x - t) f^{''} (t) d t

So:

f (x) = f (x_{0}) + f^{'} (x_{0}) (x - x_{0}) + \int_{x_{0}}^{x} (x - t) f^{''} (t) d t

Step 2: Integration by Parts (again)

Apply integration by parts to $\int_{x_{0}}^{x} (x - t) f^{''} (t) d t$ with:

$u = f^{''} (t)$ , so $d u = f^{'''} (t) d t$
$d v = (x - t) d t$ , so $v = - \frac{( x - t ) ^{2}}{2}$

\int_{x_{0}}^{x} (x - t) f^{''} (t) d t = [- \frac{( x - t ) ^{2}}{2} f^{''} (t)]_{x_{0}}^{x} + \int_{x_{0}}^{x} \frac{( x - t ) ^{2}}{2} f^{'''} (t) d t

= \frac{( x - x _{0} ) ^{2}}{2} f^{''} (x_{0}) + \int_{x_{0}}^{x} \frac{( x - t ) ^{2}}{2} f^{'''} (t) d t

So:

f (x) = f (x_{0}) + f^{'} (x_{0}) (x - x_{0}) + \frac{f ^{''} ( x _{0} )}{2 !} (x - x_{0})^{2} + \int_{x_{0}}^{x} \frac{( x - t ) ^{2}}{2 !} f^{'''} (t) d t

Pattern Recognition

After $k - 1$ integrations by parts (so that the integrand involves $f^{(k)}$ ), we get:

f (x) = \sum_{j = 0}^{k - 1} \frac{f^{(j)} (x_{0})}{j !} (x - x_{0})^{j} + \int_{x_{0}}^{x} \frac{( x - t )^{k - 1}}{( k - 1 )!} f^{(k)} (t) d t

Setting $k = r$ gives the remainder:

R_{r} (x) = \frac{1}{( r - 1 )!} \int_{x_{0}}^{x} (x - t)^{r - 1} f^{(r)} (t) d t
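The integral form can be checked numerically against the direct definition of the remainder. This is a sketch under assumptions: the test function is $f = \sin$ (chosen because $f^{(k)} (t) = \sin(t + k π / 2)$ is available in closed form), and the integral is approximated with a hand-written composite Simpson rule. All names are hypothetical.

```python
import math


def simpson(g, a, b, n=200):
    """Composite Simpson's rule with n (even) subintervals; also handles b < a."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def sin_derivative(k, t):
    """k-th derivative of sin at t, using sin^(k)(t) = sin(t + k*pi/2)."""
    return math.sin(t + k * math.pi / 2)


def remainder_direct(x0, x, r):
    """f(x) minus the Taylor polynomial of degree r - 1 about x0."""
    poly = sum(
        sin_derivative(k, x0) * (x - x0) ** k / math.factorial(k) for k in range(r)
    )
    return math.sin(x) - poly


def remainder_integral(x0, x, r):
    """Integral form: (1/(r-1)!) * int_{x0}^{x} (x - t)^(r-1) f^(r)(t) dt."""
    def integrand(t):
        return (x - t) ** (r - 1) * sin_derivative(r, t)
    return simpson(integrand, x0, x) / math.factorial(r - 1)


if __name__ == "__main__":
    for r in (1, 2, 3, 4):
        print(r, remainder_direct(0.3, 1.1, r), remainder_integral(0.3, 1.1, r))
```

The two columns should agree up to the quadrature error of the Simpson rule, for $x$ on either side of $x_{0}$ .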

Intuition

The integral form shows that the remainder is a weighted integral of $f^{(r)} (t)$ over the interval between $x_{0}$ and $x$ , with weights proportional to $(x - t)^{r - 1}$ . These weights are largest at $t = x_{0}$ and vanish at $t = x$ , so values of $f^{(r)}$ near $x_{0}$ matter most.

Key Equation

For cubature: if a cubature rule has an error that decays as $δ^{r}$ , for some power $r$ , then the error decays at a rate of $m^{- r} ≃ n^{- r / d}$ .

f (x) = \sum_{k = 0}^{r - 1} \frac{f^{(k)} (x_{0})}{k !} (x - x_{0})^{k} + R_{r} (x),

R_{r} (x) = \frac{1}{( r - 1 )!} \int_{x_{0}}^{x} (x - t)^{r - 1} f^{(r)} (t) d t,

∣ R_{r} (x) ∣ \leq C \cdot δ^{r} \cdot ∣ f^{(r)} ∣_{\infty}

Since $δ = 1/ m$ and $n = (m + 1)^{d}$ , we have $δ^{r} = m^{- r} ≃ n^{- r / d}$ .

For Monte Carlo: The error decays at a rate of $n^{- 1/2}$ , by the central limit theorem.
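To see what these rates mean in terms of cost, one can invert them: reaching error $ε$ needs roughly $n ≈ ε^{- d / r}$ grid points but only $n ≈ ε^{- 2}$ Monte Carlo samples. The sketch below assumes all constants (including the integrand variance and $∣ f^{(r)} ∣_{\infty}$ ) are equal to 1, which is a simplification for illustration only; the function names are hypothetical.

```python
import math


def grid_points_needed(eps, d, r):
    """Solve n**(-r/d) = eps for n, ignoring constants."""
    return eps ** (-d / r)


def mc_samples_needed(eps):
    """Solve n**(-1/2) = eps for n, ignoring constants."""
    return eps ** (-2)


if __name__ == "__main__":
    eps, r = 0.01, 2
    print(f"Monte Carlo: n ~ {mc_samples_needed(eps):.1e}")
    for d in (1, 2, 4, 10, 20):
        print(f"d={d:2d}: grid n ~ {grid_points_needed(eps, d, r):.1e}")
```

With $r = 2$ the two costs coincide at $d = 4$ , and the grid cost grows exponentially in $d$ beyond that.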

Insights

Curse of dimensionality breaks quadrature: cubature error scales as $n^{- r / d}$ , so for fixed $n$ and $r$ , doubling $d$ halves the exponent. The Monte Carlo rate $n^{- 1/2}$ does not depend on $d$ at all.
Crossover dimension: Monte Carlo beats an order- $r$ cubature rule when $d > 2 r$ . For a second-order rule ( $r = 2$ ), the crossover is $d > 4$ ; for most practical integrands in ML/Bayesian settings, $d$ is in the hundreds or thousands, far beyond any realistic crossover.
Higher-order rules don’t rescue quadrature at scale: increasing $r$ only raises the crossover dimension to $2 r$ , and an order- $r$ rule needs $f$ to have $r$ bounded derivatives, which is a strong assumption in high $d$ .
Monte Carlo’s slow 1D rate is a feature in high $d$ : $n^{- 1/2}$ looks poor compared to $n^{- r}$ in 1D, but its exponent does not degrade as $d$ grows.

Pitfalls

Monte Carlo is not always preferable: when $d < 2 r$ (for example $d < 4$ with a second-order rule), a well-chosen quadrature rule converges faster; defaulting to Monte Carlo in low dimensions wastes samples.
Rate assumes i.i.d. samples: for correlated samples (e.g., MCMC chains), the $n^{- 1/2}$ rate can still hold under suitable ergodicity conditions, but with a larger constant, which is usually expressed as an effective-sample-size penalty.
Ignoring variance: the CLT bound hides the integrand variance $σ^{2}$ ; a high-variance integrand can make Monte Carlo impractical even when the dimension favors it. Variance reduction (importance sampling, control variates) is then essential.
Confusing $n$ (total points) with $m$ (points per dimension): the grid has $n = (m + 1)^{d}$ points, so comparing quadrature and Monte Carlo at the same $n$ (not the same $m$ ) is critical.
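The variance pitfall can be made concrete by reporting a standard error alongside every Monte Carlo estimate. The following is a minimal sketch, assuming i.i.d. uniform samples on $[0, 1]^{d}$ and using the sample standard deviation as a plug-in for $σ$ ; the function name `mc_with_stderr` is hypothetical.

```python
import math
import random
import statistics


def mc_with_stderr(f, d, n, rng):
    """Return (estimate, standard error) for the integral of f over [0, 1]^d.

    The standard error is s / sqrt(n), where s is the sample standard
    deviation of the n function values (assumes i.i.d. samples).
    """
    values = [f([rng.random() for _ in range(d)]) for _ in range(n)]
    mean = statistics.fmean(values)
    stderr = statistics.stdev(values) / math.sqrt(n)
    return mean, stderr


if __name__ == "__main__":
    rng = random.Random(1)
    est, se = mc_with_stderr(lambda x: x[0], d=3, n=10_000, rng=rng)
    print(f"estimate = {est:.4f} +/- {se:.4f}")
```

A large standard error at an affordable $n$ is the signal that variance reduction is needed, regardless of how favourable the dimension is.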

Connections

Quasi-Monte Carlo todo: low-discrepancy sequences beat i.i.d. draws, achieving $O ((\log n)^{d} / n)$ error for moderate $d$ .
MCMC todo: Markov chains replace i.i.d. draws when the target is known up to a constant; $n^{- 1/2}$ rate survives asymptotically via ergodic theory.
Importance Sampling todo: reweights a surrogate distribution to reduce variance; $n^{- 1/2}$ rate holds with a smaller constant.
Curse of Dimensionality todo: $n^{- r / d}$ cubature rate is the canonical example of exponential cost growth with $d$ .
Central Limit Theorem todo: source of Monte Carlo’s $n^{- 1/2}$ guarantee for i.i.d. estimators.

References

[1] Fearnhead et al. (2025), Scalable Monte Carlo for Bayesian Learning
