Interactive Probability & Statistics

Welcome to this interactive guide to probability and statistics. This module uses hands-on simulations to make abstract concepts tangible, from the shape of probability distributions to the power of the Central Limit Theorem. By manipulating parameters and generating data, you can build a strong intuition for how to model uncertainty and interpret data.

Learning Objectives

  • Distinguish between the shape and properties of thin-tailed (Exponential) and fat-tailed (Pareto) distributions.
  • Contrast the "memoryless" property of the thin-tailed Exponential distribution with the scale-invariance of the fat-tailed Pareto distribution using conditional probability.
  • Observe how the Central Limit Theorem's convergence is rapid for thin-tailed data but slow and volatile for fat-tailed data.
  • Analyze how correlation is stable in thin-tailed systems but highly sensitive to outliers in fat-tailed systems.
  • Evaluate how regression models provide consistent fits for thin-tailed data but become unstable and less reliable with fat-tailed data.

Distribution (Thin-Tail)

The Exponential distribution models the waiting time between events in a Poisson point process. It's "thin-tailed": its tail probability $P(X > x) = e^{-\lambda x}$ decays exponentially, so extreme values are rare. Its probability density function (PDF) is:

$f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$

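Adjust the rate parameter ($\lambda$) and generate samples to see how the empirical data aligns with the theoretical PDF and CDF, $F(x; \lambda) = 1 - e^{-\lambda x}$. The theoretical mean is $1/\lambda$ and the variance is $1/\lambda^2$.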
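The sketch below draws exponential samples and compares their mean, variance, and empirical CDF with the formulas above. It is a minimal Python illustration using only the standard library, not the page's own simulation code; $\lambda = 2$, the seed, and the sample size are arbitrary choices, and `random.Random.expovariate` is called with the rate $\lambda$ (so its mean is $1/\lambda$).

```python
import math
import random


def exp_pdf(x: float, lam: float) -> float:
    """Exponential PDF: lam * exp(-lam * x) for x >= 0, else 0 (for overlaying on a histogram)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x: float, lam: float) -> float:
    """Exponential CDF: 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


lam = 2.0                      # rate parameter (illustrative value)
rng = random.Random(0)         # fixed seed so runs are repeatable
xs = [rng.expovariate(lam) for _ in range(100_000)]  # expovariate takes the rate

emp_mean = sum(xs) / len(xs)
emp_var = sum((x - emp_mean) ** 2 for x in xs) / (len(xs) - 1)
emp_cdf_at_1 = sum(x <= 1.0 for x in xs) / len(xs)   # empirical CDF at x = 1

print(f"mean     : {emp_mean:.4f} (theory {1 / lam:.4f})")
print(f"variance : {emp_var:.4f} (theory {1 / lam**2:.4f})")
print(f"P(X <= 1): {emp_cdf_at_1:.4f} (theory {exp_cdf(1.0, lam):.4f})")
```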

(Interactive panels: PDF & Sample Histogram; CDF; sample Mean and Var readouts.)

Distribution (Fat-Tail)

The Pareto distribution models phenomena where a small number of items have a large effect (the "80/20 rule"). It's "fat-tailed": its tail probability decays as a power law rather than exponentially, so extreme values are much more likely. For a minimum value ($x_m$) of 1, its PDF is:

$f(x; \alpha) = \frac{\alpha}{x^{\alpha+1}}$ for $x \ge 1$

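Adjust the shape ($\alpha$). As $\alpha$ decreases, the tail $P(X > x) = x^{-\alpha}$ gets "fatter," and the probability of very large values increases. The mean is $\alpha/(\alpha-1)$ for $\alpha > 1$ and the variance is $\frac{\alpha}{(\alpha-1)^2(\alpha-2)}$ for $\alpha > 2$; the variance is infinite if $\alpha \le 2$, and the mean is infinite if $\alpha \le 1$.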
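A minimal Python sketch of the Pareto moments and their sample counterparts. The helpers `pareto_mean` and `pareto_var` are hypothetical names implementing the mean and variance formulas for $x_m = 1$; `random.Random.paretovariate(alpha)` from the standard library draws from a Pareto distribution with minimum value 1. The $\alpha$ values and seed are illustrative.

```python
import math
import random


def pareto_mean(alpha: float) -> float:
    """Mean of Pareto(x_m = 1): alpha / (alpha - 1) if alpha > 1, else infinite."""
    return alpha / (alpha - 1) if alpha > 1 else math.inf


def pareto_var(alpha: float) -> float:
    """Variance of Pareto(x_m = 1): alpha / ((alpha-1)^2 (alpha-2)) if alpha > 2, else infinite."""
    return alpha / ((alpha - 1) ** 2 * (alpha - 2)) if alpha > 2 else math.inf


rng = random.Random(1)
emp_means = {}
for alpha in (3.0, 2.5, 1.5):
    # paretovariate(alpha) returns draws with minimum value 1 (x_m = 1).
    xs = [rng.paretovariate(alpha) for _ in range(100_000)]
    emp_means[alpha] = sum(xs) / len(xs)
    print(f"alpha={alpha}: sample mean {emp_means[alpha]:.3f}, "
          f"theory mean {pareto_mean(alpha):.3f}, theory var {pareto_var(alpha):.3f}, "
          f"largest draw {max(xs):.1f}")
```

For $\alpha = 1.5$ the theoretical variance prints as `inf`, and the largest draw tends to be orders of magnitude above the mean.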

(Interactive panels: PDF & Sample Histogram; CDF; sample Mean and Var readouts.)

Conditional Probability (Thin-Tail)

Imagine a world where income is distributed exponentially (thin-tailed). Because the tail decays exponentially, incomes cluster around the average and very large incomes are rare. The plot shows the probability of doubling your income, given you've already earned an amount $a$. Formally, this is $P(X > 2a \mid X > a)$.

For the Exponential distribution, this probability is $e^{-\lambda a}$, which decreases as your income $a$ grows: the richer you are, the less likely you are to double your income. The reason is the memoryless property, $P(X > a + t \mid X > a) = e^{-\lambda t}$: the chance of gaining a further amount $t$ is the same no matter how much you already have. Doubling means gaining a further $a$, which gets harder as $a$ grows, so high incomes do not compound into ever larger ones and income inequality stays limited.

$P(X > 2a \mid X > a) = \frac{P(X > 2a)}{P(X > a)} = \frac{e^{-2\lambda a}}{e^{-\lambda a}} = e^{-\lambda a}$
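A quick Monte Carlo check of this formula, as a minimal Python sketch under illustrative assumptions ($\lambda = 1$, a fixed seed, 200,000 draws). `empirical_cond_double` is a hypothetical helper that estimates $P(X > 2a \mid X > a)$ as the fraction of samples above $a$ that also exceed $2a$.

```python
import math
import random


def cond_double_exp(lam: float, a: float) -> float:
    """Theoretical P(X > 2a | X > a) = e^{-2 lam a} / e^{-lam a} = e^{-lam a}."""
    return math.exp(-lam * a)


def empirical_cond_double(xs: list[float], a: float) -> float:
    """Among samples above a, the fraction that are also above 2a."""
    above = [x for x in xs if x > a]
    return sum(x > 2 * a for x in above) / len(above) if above else math.nan


lam = 1.0
rng = random.Random(2)
xs = [rng.expovariate(lam) for _ in range(200_000)]

results = {}
for a in (0.5, 1.0, 2.0, 3.0):
    results[a] = empirical_cond_double(xs, a)
    print(f"a={a}: empirical {results[a]:.3f}, theory {cond_double_exp(lam, a):.3f}")
```

The estimates fall as $a$ grows, tracking $e^{-a}$.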

Conditional Probability (Fat-Tail)

Now, imagine a world where income follows a Pareto distribution (fat-tailed), which is a much better model for the upper tail of real-world income and wealth. The plot shows the probability of doubling your income, $P(X > 2a \mid X > a)$, given you've already earned an amount $a$.

For the Pareto distribution (with $a \ge x_m = 1$), this probability is $(1/2)^\alpha$, a constant that does not depend on your current income $a$. This remarkable result is called scale-invariance. In this model, a millionaire is exactly as likely to double their wealth as a billionaire is. This "rich get richer" dynamic is why fat-tailed distributions are essential for understanding phenomena with extreme inequality.

$P(X > 2a \mid X > a) = \frac{P(X > 2a)}{P(X > a)} = \frac{(2a)^{-\alpha}}{a^{-\alpha}} = (1/2)^\alpha$
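The same Monte Carlo check for the Pareto case, again a minimal Python sketch with illustrative settings ($\alpha = 1.5$, $x_m = 1$, a fixed seed). The estimate should hover near $2^{-\alpha} \approx 0.354$ for every threshold $a \ge 1$, with more noise at large $a$ because fewer samples exceed it. The helper names are hypothetical.

```python
import math
import random


def cond_double_pareto(alpha: float, a: float) -> float:
    """For Pareto(x_m = 1) and a >= 1: P(X > 2a | X > a) = (2a)^-alpha / a^-alpha = 2^-alpha."""
    if a < 1:
        raise ValueError("the formula assumes a >= x_m = 1")
    return 0.5 ** alpha


def empirical_cond_double(xs: list[float], a: float) -> float:
    """Among samples above a, the fraction that are also above 2a."""
    above = [x for x in xs if x > a]
    return sum(x > 2 * a for x in above) / len(above) if above else math.nan


alpha = 1.5
rng = random.Random(3)
xs = [rng.paretovariate(alpha) for _ in range(200_000)]  # Pareto with x_m = 1

results = {}
for a in (1.0, 3.0, 10.0):
    results[a] = empirical_cond_double(xs, a)
    print(f"a={a}: empirical {results[a]:.3f}, theory {cond_double_pareto(alpha, a):.3f}")
```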

Sample Mean of Thin-Tailed Distribution

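The Central Limit Theorem (CLT) states that, for independent draws from a distribution with finite mean $\mu$ and finite variance $\sigma^2$, the distribution of the sample mean is approximately normal for large sample sizes $n$, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, even if the underlying distribution is not normal. For a thin-tailed distribution like the Exponential, this convergence happens relatively quickly. Adjust the parameters below to see how the distribution of sample means approaches a bell curve.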
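A minimal Python sketch of this experiment, assuming sample size $n = 30$, 5,000 repetitions, and $\lambda = 1$ (all illustrative). It compares the observed spread of the sample means with the CLT prediction $\sigma/\sqrt{n}$ and reports the same summary statistics as the panel. `sample_means` is a hypothetical helper.

```python
import math
import random
import statistics


def sample_means(draw, n: int, reps: int) -> list[float]:
    """Return `reps` sample means, each averaging `n` independent calls to draw()."""
    return [sum(draw() for _ in range(n)) / n for _ in range(reps)]


lam, n, reps = 1.0, 30, 5_000          # illustrative settings
rng = random.Random(4)
means = sample_means(lambda: rng.expovariate(lam), n, reps)

theory_mean = 1 / lam                  # mu = 1/lambda
theory_sd = (1 / lam) / math.sqrt(n)   # sigma / sqrt(n), with sigma = 1/lambda
q1, med, q3 = statistics.quantiles(means, n=4)

print(f"theory  : mean {theory_mean:.3f}, sd {theory_sd:.3f}")
print(f"observed: mean {statistics.fmean(means):.3f}, sd {statistics.stdev(means):.3f}")
print(f"min {min(means):.3f}, Q1 {q1:.3f}, median {med:.3f}, Q3 {q3:.3f}, max {max(means):.3f}")
```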

(Interactive panel: simulation controls; theoretical mean and standard deviation of the sample mean; observed min, max, Q1, median, and Q3 of the simulated sample means.)

Sample Mean of Fat-Tailed Distribution

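The classical CLT applies to the Pareto distribution only when its variance is finite ($\alpha > 2$), and even then the distribution of sample means converges to normal much more slowly than in the thin-tailed case. Occasional extreme draws keep the sample means right-skewed with a long upper tail, so their spread varies a lot from run to run. For $1 < \alpha \le 2$ the mean is finite but the variance is infinite, so the classical CLT, with its $\sigma/\sqrt{n}$ scaling, does not apply even though the sample mean still converges to the true mean. Adjust $\alpha$ and see how it affects the result.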
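A minimal Python sketch of the fat-tailed version, assuming $\alpha = 2.5$ (finite variance), $n = 30$, and 5,000 repetitions, all chosen for illustration. Even though the theoretical standard deviation is finite, the sample means are right-skewed: the largest sample mean lies far above the median, while the smallest sits only a little below it, because every draw is at least 1.

```python
import math
import random
import statistics


def pareto_mean(alpha: float) -> float:
    """Mean of Pareto(x_m = 1); infinite for alpha <= 1."""
    return alpha / (alpha - 1) if alpha > 1 else math.inf


def pareto_var(alpha: float) -> float:
    """Variance of Pareto(x_m = 1); infinite for alpha <= 2."""
    return alpha / ((alpha - 1) ** 2 * (alpha - 2)) if alpha > 2 else math.inf


alpha, n, reps = 2.5, 30, 5_000        # alpha > 2, so the variance is finite
rng = random.Random(5)
means = [sum(rng.paretovariate(alpha) for _ in range(n)) / n for _ in range(reps)]

theory_mean = pareto_mean(alpha)
theory_sd = math.sqrt(pareto_var(alpha) / n)   # sigma / sqrt(n)
q1, med, q3 = statistics.quantiles(means, n=4)

print(f"theory  : mean {theory_mean:.3f}, sd {theory_sd:.3f}")
print(f"observed: min {min(means):.3f}, Q1 {q1:.3f}, median {med:.3f}, "
      f"Q3 {q3:.3f}, max {max(means):.3f}")
# Right skew: the gap above the median dwarfs the gap below it.
print(f"max - median = {max(means) - med:.3f}, median - min = {med - min(means):.3f}")
# With alpha <= 2 the CLT's sigma / sqrt(n) is infinite.
print("alpha = 1.5 -> theoretical sd:", math.sqrt(pareto_var(1.5) / n))
```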

(Interactive panel: simulation controls; theoretical mean and standard deviation of the sample mean; observed min, max, Q1, median, and Q3 of the simulated sample means.)

Correlation with Thin-Tailed Data

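Here we generate X and Y independently from an Exponential distribution and construct Z = 2X + Y. Because $\mathrm{Cov}(X, Z) = 2\,\mathrm{Var}(X)$ and, when X and Y have equal variance, $\mathrm{Var}(Z) = 5\,\mathrm{Var}(X)$, the theoretical correlation is $\mathrm{Corr}(X, Z) = 2/\sqrt{5} \approx 0.894$, while Corr(X, Y) = 0. With thin-tailed data, correlation is stable: the sample Corr(X, Y) stays near 0, the sample Corr(X, Z) stays high and positive, and the histogram of sample Corr(X, Z) values across repeated samples is narrow.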
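A minimal Python sketch of this simulation, assuming $n = 100$ points per sample, 2,000 repetitions, and rate 1 for both X and Y. `pearson` and `simulate` are hypothetical helpers written for illustration, not the page's code.

```python
import math
import random
import statistics


def pearson(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(draw, n: int, reps: int) -> tuple[float, list[float]]:
    """Repeat: draw independent X, Y, set Z = 2X + Y; return last Corr(X,Y) and all Corr(X,Z)."""
    corr_xz = []
    corr_xy = math.nan
    for _ in range(reps):
        xs = [draw() for _ in range(n)]
        ys = [draw() for _ in range(n)]
        zs = [2 * x + y for x, y in zip(xs, ys)]
        corr_xy = pearson(xs, ys)
        corr_xz.append(pearson(xs, zs))
    return corr_xy, corr_xz


theory = 2 / math.sqrt(5)   # Corr(X, 2X + Y) when X, Y are independent with equal variance
rng = random.Random(6)
last_xy, corrs = simulate(lambda: rng.expovariate(1.0), n=100, reps=2_000)
q1, med, q3 = statistics.quantiles(corrs, n=4)
print(f"last Corr(X,Y) = {last_xy:.3f}; theoretical Corr(X,Z) = {theory:.3f}")
print(f"Corr(X,Z): min {min(corrs):.3f}, Q1 {q1:.3f}, median {med:.3f}, "
      f"Q3 {q3:.3f}, max {max(corrs):.3f}")
```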

(Interactive panel: simulation controls; last sample Corr(X,Y); distribution of sample Corr(X,Z) with its theoretical value, min, max, Q1, median, and Q3.)

Correlation with Fat-Tailed Data

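Here we generate data from a Pareto distribution and apply the same construction, Z = 2X + Y with X and Y independent. When $\alpha > 2$ the variance is finite and the theoretical Corr(X, Z) is again $2/\sqrt{5} \approx 0.894$; when $\alpha \le 2$ the variance is infinite and the population correlation is undefined. Either way, a single extreme draw can dominate a sample: a huge X pulls the sample correlation toward 1, while a huge Y pulls it toward 0. The histogram of sample correlations is therefore much wider than in the thin-tailed case, showing this volatility.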
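To see the contrast directly, this minimal Python sketch runs the same Z = 2X + Y experiment on Exponential and Pareto data side by side, assuming $\alpha = 2.5$ so that the theoretical correlation is still $2/\sqrt{5}$. The interquartile range (IQR) of the sample correlations serves as a simple measure of spread; the helper names and settings are illustrative.

```python
import math
import random
import statistics


def pearson(xs: list[float], ys: list[float]) -> float:
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr_xz_samples(draw, n: int, reps: int) -> list[float]:
    """Sample Corr(X, Z) for `reps` fresh samples, with Z = 2X + Y and X, Y independent."""
    out = []
    for _ in range(reps):
        xs = [draw() for _ in range(n)]
        zs = [2 * x + draw() for x in xs]   # each Y is a fresh, independent draw
        out.append(pearson(xs, zs))
    return out


def iqr(values: list[float]) -> float:
    """Interquartile range Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


rng = random.Random(7)
alpha = 2.5   # > 2, so the population correlation is defined (2/sqrt(5))
exp_corrs = corr_xz_samples(lambda: rng.expovariate(1.0), n=100, reps=1_000)
par_corrs = corr_xz_samples(lambda: rng.paretovariate(alpha), n=100, reps=1_000)

for name, cs in (("Exponential", exp_corrs), ("Pareto", par_corrs)):
    print(f"{name:11s}: min {min(cs):.3f}, median {statistics.median(cs):.3f}, "
          f"max {max(cs):.3f}, IQR {iqr(cs):.3f}")
```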

(Interactive panel: simulation controls; last sample Corr(X,Y); distribution of sample Corr(X,Z) with its theoretical value, min, max, Q1, median, and Q3.)

Regression with Thin-Tailed Data

With thin-tailed data, the fitted regression is stable. The R-squared histogram across repeated samples is narrow, showing that different samples produce very similar model fits.
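The page does not state which regression it fits. As an assumption for illustration, this Python sketch regresses Z = 2X + Y on X by ordinary least squares, where theory gives slope 2, intercept $E[Y] = 1/\lambda$, and $R^2 = \mathrm{Corr}(X, Z)^2 = 4/5$. The helper names and settings ($\lambda = 1$, $n = 100$, 1,000 repetitions) are hypothetical.

```python
import random
import statistics


def ols_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares fit y = b0 + b1 x; returns (b0, b1, R^2)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    sst = sum((y - my) ** 2 for y in ys)                          # total sum of squares
    return b0, b1, 1 - sse / sst


def simulate_fits(draw, n: int, reps: int) -> list[tuple[float, float, float]]:
    """Hypothetical setup: regress Z = 2X + Y on X for `reps` fresh samples of size n."""
    fits = []
    for _ in range(reps):
        xs = [draw() for _ in range(n)]
        zs = [2 * x + draw() for x in xs]   # Y drawn independently of X
        fits.append(ols_fit(xs, zs))
    return fits


rng = random.Random(8)
fits = simulate_fits(lambda: rng.expovariate(1.0), n=100, reps=1_000)
b0, b1, _ = fits[-1]
r2s = [r2 for _, _, r2 in fits]
q1, med, q3 = statistics.quantiles(r2s, n=4)
print(f"last equation: z = {b0:.2f} + {b1:.2f} x")
print(f"R^2: theory 0.800, min {min(r2s):.3f}, Q1 {q1:.3f}, median {med:.3f}, "
      f"Q3 {q3:.3f}, max {max(r2s):.3f}")
```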

(Interactive panel: model controls; last fitted equation; distribution of R-squared with its theoretical value, min, max, Q1, median, and Q3.)

Regression with Fat-Tailed Data

Outliers in fat-tailed data can dramatically affect the regression line. The R-squared histogram is much wider, showing that the model's explanatory power is highly volatile and sensitive to the specific sample drawn.
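Under an assumed setup (regressing Z = 2X + Y on X by least squares, since the page does not name its model), this minimal Python sketch compares the spread of $R^2$ for Exponential and Pareto ($\alpha = 2.5$) data. With $\alpha > 2$ the theoretical $R^2$ is still $0.8$, so any extra spread comes from sampling volatility rather than a different population value. Helper names and settings are hypothetical.

```python
import random
import statistics


def ols_r2(xs: list[float], ys: list[float]) -> float:
    """R^2 of the least-squares line y = b0 + b1 x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


def r2_samples(draw, n: int, reps: int) -> list[float]:
    """R^2 from regressing Z = 2X + Y on X over `reps` fresh samples (hypothetical setup)."""
    out = []
    for _ in range(reps):
        xs = [draw() for _ in range(n)]
        zs = [2 * x + draw() for x in xs]   # Y drawn independently of X
        out.append(ols_r2(xs, zs))
    return out


def iqr(values: list[float]) -> float:
    """Interquartile range Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


rng = random.Random(9)
exp_r2 = r2_samples(lambda: rng.expovariate(1.0), n=100, reps=1_000)
par_r2 = r2_samples(lambda: rng.paretovariate(2.5), n=100, reps=1_000)
for name, r2s in (("Exponential", exp_r2), ("Pareto", par_r2)):
    print(f"{name:11s}: min {min(r2s):.3f}, median {statistics.median(r2s):.3f}, "
          f"max {max(r2s):.3f}, IQR {iqr(r2s):.3f}")
```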

(Interactive panel: model controls; last fitted equation; distribution of R-squared with its theoretical value, min, max, Q1, median, and Q3.)
