{
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {
  "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
  "language_info": {"name": "python", "version": "3.9.0"}
 },
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c01",
   "metadata": {},
   "source": [
    "# How AdaBoost Reweights Misclassified Samples\n",
    "\n",
    "This notebook implements AdaBoost from scratch, recording the full weight matrix (N samples × T rounds) to visualise exactly how sample weights evolve. We show weight concentration, effective sample size, and what happens when label noise is present."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c02",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "np.random.seed(42)\n",
    "import sklearn; print(f'sklearn {sklearn.__version__}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c03",
   "metadata": {},
   "source": ["## 1. Dataset — Small Enough to Inspect Individual Weights"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c04",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Source: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "X, y = make_classification(\n",
    "    n_samples=200, n_features=4, n_informative=3,\n",
    "    n_redundant=1, flip_y=0.02, random_state=42\n",
    ")\n",
    "y_signed = np.where(y == 0, -1, 1)  # AdaBoost uses {-1, +1}\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, test_size=0.25, random_state=42, stratify=y\n",
    ")\n",
    "X_train_a, _, y_train_a, _ = train_test_split(\n",
    "    X, y_signed, test_size=0.25, random_state=42, stratify=y\n",
    ")\n",
    "\n",
    "N_train = len(X_train_a)\n",
    "print(f'Training samples: {N_train}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c05",
   "metadata": {},
   "source": ["## 2. EDA"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c06",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(1, 2, figsize=(12, 4))\n",
    "\n",
    "axes[0].bar(['Class 0', 'Class 1'], np.bincount(y_train),\n",
    "            color=['#6366f1','#22c55e'], edgecolor='k')\n",
    "axes[0].set_title('Training Class Distribution')\n",
    "\n",
    "for c, col in zip([0,1], ['#6366f1','#22c55e']):\n",
    "    axes[1].scatter(X_train[y_train==c, 0], X_train[y_train==c, 1],\n",
    "                    alpha=0.6, s=25, color=col, label=f'Class {c}')\n",
    "axes[1].set_xlabel('Feature 0'); axes[1].set_ylabel('Feature 1')\n",
    "axes[1].set_title('Training Data (first 2 features)'); axes[1].legend()\n",
    "\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c07",
   "metadata": {},
   "source": ["## 3. AdaBoost with Full Weight Recording"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c08",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.metrics import accuracy_score\n",
    "\n",
    "T = 40  # rounds\n",
    "weight_matrix = np.zeros((T + 1, N_train))  # rows=rounds, cols=samples\n",
    "weight_matrix[0] = 1.0 / N_train\n",
    "\n",
    "alphas, stumps = [], []\n",
    "weighted_errors = []\n",
    "train_acc_staged, test_acc_staged = [], []\n",
    "\n",
    "for t in range(T):\n",
    "    w = weight_matrix[t]\n",
    "    stump = DecisionTreeClassifier(max_depth=1, random_state=t)\n",
    "    stump.fit(X_train_a, y_train_a, sample_weight=w)\n",
    "    preds = stump.predict(X_train_a)\n",
    "\n",
    "    wrong = (preds != y_train_a).astype(float)\n",
    "    err = np.clip((w * wrong).sum(), 1e-10, 1 - 1e-10)\n",
    "    alpha = 0.5 * np.log((1 - err) / err)\n",
    "\n",
    "    w_new = w * np.exp(-alpha * y_train_a * preds)\n",
    "    w_new /= w_new.sum()\n",
    "\n",
    "    weight_matrix[t + 1] = w_new\n",
    "    alphas.append(alpha)\n",
    "    stumps.append(stump)\n",
    "    weighted_errors.append(err)\n",
    "\n",
    "    # Staged prediction\n",
    "    def ens_pred(X_in):\n",
    "        return np.sign(sum(a * st.predict(X_in) for a, st in zip(alphas, stumps)))\n",
    "    train_acc_staged.append(accuracy_score(y_train_a, ens_pred(X_train_a)))\n",
    "    test_acc_staged.append(accuracy_score(\n",
    "        np.where(y_test == 0, -1, 1), ens_pred(X_test)\n",
    "    ))\n",
    "\n",
    "print(f'Final train acc: {train_acc_staged[-1]:.4f}')\n",
    "print(f'Final test acc:  {test_acc_staged[-1]:.4f}')\n",
    "print(f'Weight matrix shape: {weight_matrix.shape}  (rounds+1 × samples)')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c09",
   "metadata": {},
   "source": ["## 4. Weight Heatmap — Full N×T Matrix"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c10",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sort samples by final weight (descending) for better visualisation\n",
    "final_order = np.argsort(weight_matrix[-1])[::-1]\n",
    "wm_sorted = weight_matrix[:, final_order]\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(14, 5))\n",
    "im = ax.imshow(np.log1p(wm_sorted * N_train).T,\n",
    "               aspect='auto', cmap='YlOrRd', interpolation='nearest')\n",
    "plt.colorbar(im, ax=ax, label='log(1 + w × N)  [warm = high weight]')\n",
    "ax.set_xlabel('Boosting Round')\n",
    "ax.set_ylabel('Sample (sorted by final weight, high→low)')\n",
    "ax.set_title('Weight Heatmap: log-scaled\\nTop rows = hardest samples, warm = high weight')\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c11",
   "metadata": {},
   "source": ["## 5. Individual Sample Weight Trajectories"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c12",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Top 5 hardest (high final weight) and top 5 easiest (low final weight)\n",
    "hard_idx  = np.argsort(weight_matrix[-1])[::-1][:5]\n",
    "easy_idx  = np.argsort(weight_matrix[-1])[:5]\n",
    "\n",
    "rounds_axis = np.arange(T + 1)\n",
    "fig, axes = plt.subplots(1, 2, figsize=(14, 4))\n",
    "\n",
    "for idx in hard_idx:\n",
    "    axes[0].plot(rounds_axis, weight_matrix[:, idx] * N_train,\n",
    "                 linewidth=1.5, alpha=0.85)\n",
    "axes[0].axhline(1.0, color='gray', linestyle='--', linewidth=1, label='Uniform weight')\n",
    "axes[0].set_title('Hardest 5 Samples — Weight Trajectories\\n(each ends with w >> 1/N)')\n",
    "axes[0].set_xlabel('Round'); axes[0].set_ylabel('Weight × N  (>1 = above uniform)')\n",
    "axes[0].legend()\n",
    "\n",
    "for idx in easy_idx:\n",
    "    axes[1].plot(rounds_axis, weight_matrix[:, idx] * N_train,\n",
    "                 linewidth=1.5, alpha=0.85)\n",
    "axes[1].axhline(1.0, color='gray', linestyle='--', linewidth=1, label='Uniform weight')\n",
    "axes[1].set_title('Easiest 5 Samples — Weight Trajectories\\n(each ends with w << 1/N)')\n",
    "axes[1].set_xlabel('Round'); axes[1].set_ylabel('Weight × N')\n",
    "axes[1].legend()\n",
    "\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c13",
   "metadata": {},
   "source": ["## 6. Weight Concentration — Effective Sample Size"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c14",
   "metadata": {},
   "outputs": [],
   "source": [
    "# ESS = 1 / sum(w_i^2)  — falls as weights concentrate\n",
    "ess = [1.0 / (weight_matrix[t] ** 2).sum() for t in range(T + 1)]\n",
    "\n",
    "# Cumulative weight share of top-k%\n",
    "top_pct = 0.10\n",
    "k = max(1, int(top_pct * N_train))\n",
    "top_k_share = []\n",
    "for t in range(T + 1):\n",
    "    w = weight_matrix[t]\n",
    "    top_k_share.append(np.sort(w)[::-1][:k].sum())\n",
    "\n",
    "fig, axes = plt.subplots(1, 2, figsize=(13, 4))\n",
    "\n",
    "axes[0].plot(np.arange(T+1), ess, color='#6366f1', linewidth=2)\n",
    "axes[0].axhline(N_train, color='gray', linestyle='--', label=f'Max ESS = {N_train}')\n",
    "axes[0].set_xlabel('Round'); axes[0].set_ylabel('ESS')\n",
    "axes[0].set_title('Effective Sample Size Across Rounds\\n(falls as weights concentrate)')\n",
    "axes[0].legend()\n",
    "\n",
    "axes[1].plot(np.arange(T+1), [s*100 for s in top_k_share],\n",
    "             color='#f59e0b', linewidth=2)\n",
    "axes[1].set_xlabel('Round')\n",
    "axes[1].set_ylabel(f'Weight share of top {int(top_pct*100)}% samples (%)')\n",
    "axes[1].set_title(f'Weight Concentration: top {int(top_pct*100)}% samples\\n(rises as boosting progresses)')\n",
    "\n",
    "plt.tight_layout(); plt.show()\n",
    "print(f'ESS at round 0:  {ess[0]:.1f}  (= N = {N_train})')\n",
    "print(f'ESS at round {T}: {ess[-1]:.1f}  ({100*ess[-1]/N_train:.1f}% of N)')\n",
    "print(f'Top {int(top_pct*100)}% weight share: {top_k_share[0]*100:.1f}% → {top_k_share[-1]*100:.1f}%')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c15",
   "metadata": {},
   "source": ["## 7. Noise Experiment — Label Flip Effect on Weights"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c16",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inject label noise: flip 5 specific samples\n",
    "noise_count = 5\n",
    "noise_idx = np.random.choice(N_train, noise_count, replace=False)\n",
    "\n",
    "y_train_noisy = y_train_a.copy()\n",
    "y_train_noisy[noise_idx] *= -1  # flip labels\n",
    "\n",
    "print(f'Flipped labels for samples: {noise_idx}')\n",
    "\n",
    "# Run boosting on noisy labels\n",
    "wm_noisy = np.zeros((T + 1, N_train))\n",
    "wm_noisy[0] = 1.0 / N_train\n",
    "alphas_n, stumps_n = [], []\n",
    "test_acc_noisy = []\n",
    "\n",
    "y_test_signed = np.where(y_test == 0, -1, 1)\n",
    "\n",
    "for t in range(T):\n",
    "    w = wm_noisy[t]\n",
    "    s = DecisionTreeClassifier(max_depth=1, random_state=t)\n",
    "    s.fit(X_train_a, y_train_noisy, sample_weight=w)\n",
    "    preds = s.predict(X_train_a)\n",
    "    wrong = (preds != y_train_noisy).astype(float)\n",
    "    err = np.clip((w * wrong).sum(), 1e-10, 1-1e-10)\n",
    "    alpha = 0.5 * np.log((1 - err) / err)\n",
    "    w_new = w * np.exp(-alpha * y_train_noisy * preds)\n",
    "    w_new /= w_new.sum()\n",
    "    wm_noisy[t+1] = w_new\n",
    "    alphas_n.append(alpha); stumps_n.append(s)\n",
    "    test_acc_noisy.append(accuracy_score(\n",
    "        y_test_signed,\n",
    "        np.sign(sum(a * st.predict(X_test) for a, st in zip(alphas_n, stumps_n)))\n",
    "    ))\n",
    "\n",
    "print(f'Noisy final test acc: {test_acc_noisy[-1]:.4f}  (clean: {test_acc_staged[-1]:.4f})')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c17",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(1, 2, figsize=(14, 4))\n",
    "\n",
    "# Noisy sample weight trajectories vs clean samples\n",
    "non_noise_idx = [i for i in range(N_train) if i not in noise_idx]\n",
    "for idx in non_noise_idx[:5]:\n",
    "    axes[0].plot(np.arange(T+1), wm_noisy[:, idx] * N_train,\n",
    "                 color='#94a3b8', linewidth=1, alpha=0.5)\n",
    "for idx in noise_idx:\n",
    "    axes[0].plot(np.arange(T+1), wm_noisy[:, idx] * N_train,\n",
    "                 color='#ef4444', linewidth=2, label=f'Noisy sample {idx}')\n",
    "axes[0].axhline(1.0, color='gray', linestyle='--', linewidth=1)\n",
    "axes[0].set_xlabel('Round'); axes[0].set_ylabel('Weight × N')\n",
    "axes[0].set_title('Noisy Samples (red) vs Clean Samples (gray)\\nNoisy labels → runaway weights')\n",
    "axes[0].legend(fontsize=7)\n",
    "\n",
    "# Accuracy comparison: clean vs noisy\n",
    "axes[1].plot(np.arange(1, T+1), test_acc_staged, color='#22c55e', linewidth=2, label='Clean labels')\n",
    "axes[1].plot(np.arange(1, T+1), test_acc_noisy,  color='#ef4444', linewidth=2, label='5 noisy labels')\n",
    "axes[1].set_xlabel('Round'); axes[1].set_ylabel('Test Accuracy')\n",
    "axes[1].set_title('Test Accuracy: Clean vs Noisy Labels\\n(noisy accuracy degrades as noisy weights dominate)')\n",
    "axes[1].legend()\n",
    "\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c18",
   "metadata": {},
   "source": ["## 8. Weight Distribution Over Rounds (Histogram)"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c19",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(1, 4, figsize=(16, 3), sharey=True)\n",
    "round_snapshots = [0, 5, 15, 39]\n",
    "\n",
    "for ax, rnd in zip(axes, round_snapshots):\n",
    "    w = weight_matrix[rnd]\n",
    "    ax.hist(w * N_train, bins=20, color='#6366f1', edgecolor='white', alpha=0.85)\n",
    "    ax.axvline(1.0, color='#ef4444', linestyle='--', linewidth=1.5, label='Uniform')\n",
    "    ax.set_title(f'Round {rnd}\\nmax={w.max()*N_train:.2f}×')\n",
    "    ax.set_xlabel('Weight × N')\n",
    "    if rnd == 0: ax.set_ylabel('Count')\n",
    "    ax.legend(fontsize=7)\n",
    "\n",
    "plt.suptitle('Weight Distribution Histograms Across Rounds\\n(distribution spreads as boosting progresses)', y=1.02)\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c20",
   "metadata": {},
   "source": [
    "## 9. Discussion\n",
    "\n",
    "1. **Weight heatmap reveals which samples are hard.** The top rows (hardest samples by final weight) are warm from early rounds — they were hard from the start. Some samples fluctuate (correctly classified in some rounds, misclassified in others), showing how different stumps have different error regions.\n",
    "\n",
    "2. **ESS quantifies the cost of concentration.** Starting at N=150, ESS drops to roughly 30–60 by round 40 — meaning late-round stumps are trained on the equivalent of only 20–40 effective examples. This is the mechanism of diminishing returns from additional rounds.\n",
    "\n",
    "3. **Noisy labels → runaway weights.** The 5 flipped-label samples receive exponentially growing weights because no stump can classify them correctly (their labels are wrong). By round 20–30, these 5 samples hold a dominant fraction of total weight, and the ensemble begins fitting them at the expense of correct examples.\n",
    "\n",
    "4. **The weight ratio per round is interpretable.** For ε_t = 0.3, the ratio between a misclassified and correct sample grows by (1-0.3)/0.3 ≈ 2.3 per round. After 15 rounds of consistent misclassification, a sample weighs 2.3^15 ≈ 100,000 times its initial value — explaining the extreme concentration seen in the heatmap."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c21",
   "metadata": {},
   "source": [
    "## 10. Next Steps\n",
    "\n",
    "- **Article 9: Gradient Boosting in Python** — replaces the discrete reweighting with a continuous residual-fitting approach, directly addressing AdaBoost's noise sensitivity\n",
    "- **Article 10: XGBoost for Real Business Problems** — adds second-order gradients and regularisation to the gradient boosting framework\n",
    "- **Article 14: Boosting with Noisy Data: Challenges and Fixes** — practical techniques for dealing with the weight concentration failure mode"
   ]
  }
 ]
}
