{
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {
  "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"},
  "language_info": {"name": "python", "version": "3.9.0"}
 },
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c01",
   "metadata": {},
   "source": [
    "# Boosting Explained Simply with Python\n",
    "\n",
    "This notebook builds a minimal AdaBoost loop from scratch, then scales up to scikit-learn's production AdaBoostClassifier. We track staged error across rounds to show exactly how boosting reduces error sequentially, and compare against a single stump and a full decision tree baseline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c02",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "np.random.seed(42)\n",
    "import sklearn; print(f'sklearn {sklearn.__version__}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c03",
   "metadata": {},
   "source": ["## 1. Dataset"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c04",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Source: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "X, y = make_classification(\n",
    "    n_samples=1000, n_features=10, n_informative=6,\n",
    "    n_redundant=2, flip_y=0.05, random_state=42\n",
    ")\n",
    "# The from-scratch loop in section 4 uses labels in {-1, +1}, so y * prediction is +1 for a hit\n",
    "# and -1 for a miss; sklearn's AdaBoostClassifier takes the original {0, 1} labels as-is.\n",
    "y_ada = np.where(y == 0, -1, 1)\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)\n",
    "X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(X, y_ada, test_size=0.25, random_state=42)\n",
    "\n",
    "print(f'Train: {X_train.shape}  Test: {X_test.shape}')\n",
    "print(f'Class balance: {np.bincount(y_train)}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c05",
   "metadata": {},
   "source": ["## 2. EDA"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c06",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(1, 3, figsize=(15, 4))\n",
    "\n",
    "# Class distribution\n",
    "axes[0].bar(['Class 0', 'Class 1'], np.bincount(y), color=['#6366f1','#22c55e'], edgecolor='k')\n",
    "axes[0].set_title('Class Distribution')\n",
    "\n",
    "# Feature distributions\n",
    "for c, col in zip([0,1], ['#6366f1','#22c55e']):\n",
    "    axes[1].hist(X[y==c, 0], bins=20, alpha=0.6, label=f'Class {c}', color=col)\n",
    "axes[1].set_title('Feature 0 Distribution by Class'); axes[1].legend()\n",
    "\n",
    "# Scatter of the first two features (columns 0 and 1; not ranked by importance)\n",
    "axes[2].scatter(X[y==0,0], X[y==0,1], alpha=0.4, s=15, color='#6366f1', label='Class 0')\n",
    "axes[2].scatter(X[y==1,0], X[y==1,1], alpha=0.4, s=15, color='#22c55e', label='Class 1')\n",
    "axes[2].set_title('Feature 0 vs Feature 1'); axes[2].legend()\n",
    "\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c07",
   "metadata": {},
   "source": ["## 3. Baseline — Single Decision Stump"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c08",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.metrics import accuracy_score\n",
    "\n",
    "stump = DecisionTreeClassifier(max_depth=1, random_state=42)\n",
    "stump.fit(X_train, y_train)\n",
    "stump_acc = accuracy_score(y_test, stump.predict(X_test))\n",
    "print(f'Single stump accuracy: {stump_acc:.4f}')\n",
    "\n",
    "full_tree = DecisionTreeClassifier(random_state=42)\n",
    "full_tree.fit(X_train, y_train)\n",
    "tree_acc = accuracy_score(y_test, full_tree.predict(X_test))\n",
    "print(f'Full tree accuracy:    {tree_acc:.4f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c09",
   "metadata": {},
   "source": ["## 4. AdaBoost from Scratch"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c10",
   "metadata": {},
   "outputs": [],
   "source": [
    "N = len(X_train_a)\n",
    "n_rounds = 50\n",
    "weights = np.ones(N) / N\n",
    "\n",
    "alphas, stumps_scratch = [], []\n",
    "train_errors, test_errors = [], []\n",
    "# Running alpha-weighted votes: each round adds one stump's vote instead of re-scoring every stump\n",
    "agg_train = np.zeros(N)\n",
    "agg_test = np.zeros(len(X_test_a))\n",
    "\n",
    "for t in range(n_rounds):\n",
    "    # Train weak learner on current weights\n",
    "    s = DecisionTreeClassifier(max_depth=1, random_state=t)\n",
    "    s.fit(X_train_a, y_train_a, sample_weight=weights)\n",
    "    preds_tr = s.predict(X_train_a)\n",
    "\n",
    "    # Weighted error (weights sum to 1), clipped to keep the log finite\n",
    "    wrong = (preds_tr != y_train_a).astype(float)\n",
    "    err = (weights * wrong).sum()\n",
    "    err = np.clip(err, 1e-10, 1 - 1e-10)\n",
    "\n",
    "    # Learner weight\n",
    "    alpha = 0.5 * np.log((1 - err) / err)\n",
    "\n",
    "    # Update sample weights: misclassified samples (y * pred = -1) are scaled up by exp(alpha)\n",
    "    weights *= np.exp(-alpha * y_train_a * preds_tr)\n",
    "    weights /= weights.sum()\n",
    "\n",
    "    alphas.append(alpha)\n",
    "    stumps_scratch.append(s)\n",
    "\n",
    "    # Staged predictions: sign of the accumulated weighted vote\n",
    "    agg_train += alpha * preds_tr\n",
    "    agg_test += alpha * s.predict(X_test_a)\n",
    "\n",
    "    train_errors.append(np.mean(np.sign(agg_train) != y_train_a))\n",
    "    test_errors.append(np.mean(np.sign(agg_test) != y_test_a))\n",
    "\n",
    "print(f'From-scratch AdaBoost ({n_rounds} rounds):')\n",
    "print(f'  Train error: {train_errors[-1]:.4f}')\n",
    "print(f'  Test error:  {test_errors[-1]:.4f}')\n",
    "print(f'  Test accuracy: {1 - test_errors[-1]:.4f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c11",
   "metadata": {},
   "source": ["## 5. Staged Error Plot"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c12",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10, 4))\n",
    "rounds = np.arange(1, n_rounds + 1)\n",
    "plt.plot(rounds, train_errors, 'o-', color='#6366f1', markersize=3, label='Train Error (scratch)')\n",
    "plt.plot(rounds, test_errors, 's-', color='#f59e0b', markersize=3, label='Test Error (scratch)')\n",
    "plt.axhline(1 - stump_acc, color='#94a3b8', linestyle='--', label=f'Single Stump ({1-stump_acc:.3f})')\n",
    "plt.axhline(1 - tree_acc, color='#22c55e', linestyle=':', label=f'Full Tree ({1-tree_acc:.3f})')\n",
    "plt.xlabel('Boosting Round'); plt.ylabel('Error Rate')\n",
    "plt.title('Boosting: Staged Train and Test Error by Round')\n",
    "plt.legend(); plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c13",
   "metadata": {},
   "source": ["## 6. Learner Weights Across Rounds"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c14",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10, 3))\n",
    "plt.bar(np.arange(1, len(alphas)+1), alphas, color='#6366f1', alpha=0.8)\n",
    "plt.xlabel('Round'); plt.ylabel('Alpha (learner weight)')\n",
    "plt.title('Learner Weights α_t Across Rounds\\n(higher α = better learner = more influence)')\n",
    "plt.tight_layout(); plt.show()\n",
    "print(f'Max alpha: {max(alphas):.4f} at round {np.argmax(alphas)+1}')\n",
    "print(f'Min alpha: {min(alphas):.4f} at round {np.argmin(alphas)+1}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c15",
   "metadata": {},
   "source": ["## 7. Sample Weight Evolution"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c16",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Re-run and record weight snapshots\n",
    "weights2 = np.ones(N) / N\n",
    "weight_history = [weights2.copy()]\n",
    "\n",
    "for t in range(10):\n",
    "    s = DecisionTreeClassifier(max_depth=1, random_state=t)\n",
    "    s.fit(X_train_a, y_train_a, sample_weight=weights2)\n",
    "    preds_tr = s.predict(X_train_a)\n",
    "    wrong = (preds_tr != y_train_a).astype(float)\n",
    "    err = np.clip((weights2 * wrong).sum(), 1e-10, 1-1e-10)\n",
    "    alpha = 0.5 * np.log((1 - err) / err)\n",
    "    weights2 *= np.exp(-alpha * y_train_a * preds_tr)\n",
    "    weights2 /= weights2.sum()\n",
    "    weight_history.append(weights2.copy())\n",
    "\n",
    "fig, axes = plt.subplots(1, 4, figsize=(16, 3), sharey=True)\n",
    "for ax, rnd in zip(axes, [0, 2, 5, 9]):\n",
    "    w = weight_history[rnd]\n",
    "    ax.scatter(range(N), np.sort(w), s=5, alpha=0.7, color='#6366f1')\n",
    "    ax.set_title(f'Round {rnd}\\nmax_w={w.max():.4f}')\n",
    "    ax.set_xlabel('Sample (sorted)')\n",
    "    if rnd == 0: ax.set_ylabel('Weight')\n",
    "\n",
    "plt.suptitle('Sample Weight Distribution Across Rounds\\n(weights concentrate on hard examples)', y=1.02)\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c17",
   "metadata": {},
   "source": ["## 8. sklearn AdaBoostClassifier — Full Scale"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c18",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.ensemble import AdaBoostClassifier\n",
    "\n",
    "ada = AdaBoostClassifier(\n",
    "    estimator=DecisionTreeClassifier(max_depth=1),\n",
    "    n_estimators=200, learning_rate=0.5, random_state=42\n",
    ")\n",
    "ada.fit(X_train, y_train)\n",
    "\n",
    "# Staged test scores\n",
    "staged_test  = [accuracy_score(y_test, p) for p in ada.staged_predict(X_test)]\n",
    "staged_train = [accuracy_score(y_train, p) for p in ada.staged_predict(X_train)]\n",
    "\n",
    "plt.figure(figsize=(10, 4))\n",
    "rounds200 = np.arange(1, 201)\n",
    "plt.plot(rounds200, staged_train, color='#6366f1', linewidth=1.5, label='Train Accuracy')\n",
    "plt.plot(rounds200, staged_test,  color='#f59e0b', linewidth=1.5, label='Test Accuracy')\n",
    "plt.axhline(stump_acc, color='#94a3b8', linestyle='--', label=f'Single Stump ({stump_acc:.3f})')\n",
    "plt.axhline(tree_acc,  color='#22c55e', linestyle=':', label=f'Full Tree ({tree_acc:.3f})')\n",
    "plt.xlabel('Boosting Rounds'); plt.ylabel('Accuracy')\n",
    "plt.title('AdaBoostClassifier: Staged Accuracy (200 rounds)')\n",
    "plt.legend(); plt.tight_layout(); plt.show()\n",
    "\n",
    "print(f'Best test accuracy: {max(staged_test):.4f} at round {np.argmax(staged_test)+1}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c19",
   "metadata": {},
   "source": ["## 9. Learning Rate Sweep"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c20",
   "metadata": {},
   "outputs": [],
   "source": [
    "lrs = [0.01, 0.05, 0.1, 0.5, 1.0]\n",
    "plt.figure(figsize=(10, 4))\n",
    "\n",
    "for lr in lrs:\n",
    "    m = AdaBoostClassifier(\n",
    "        estimator=DecisionTreeClassifier(max_depth=1),\n",
    "        n_estimators=200, learning_rate=lr, random_state=42\n",
    "    )\n",
    "    m.fit(X_train, y_train)\n",
    "    scores = [accuracy_score(y_test, p) for p in m.staged_predict(X_test)]\n",
    "    # Size the x-axis from the staged results in case boosting stops early\n",
    "    plt.plot(np.arange(1, len(scores) + 1), scores, linewidth=1.5, label=f'lr={lr}')\n",
    "\n",
    "plt.xlabel('Rounds'); plt.ylabel('Test Accuracy')\n",
    "plt.title('Learning Rate vs Boosting Rounds\\n(lower lr = slower convergence, often better peak)')\n",
    "plt.legend(); plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c21",
   "metadata": {},
   "source": ["## 10. Cross-Validation Comparison"]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c22",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import cross_val_score\n",
    "\n",
    "models_cv = {\n",
    "    'Single Stump': DecisionTreeClassifier(max_depth=1, random_state=42),\n",
    "    'Full Tree':    DecisionTreeClassifier(random_state=42),\n",
    "    'AdaBoost (50)': AdaBoostClassifier(\n",
    "        estimator=DecisionTreeClassifier(max_depth=1),\n",
    "        n_estimators=50, learning_rate=0.5, random_state=42),\n",
    "    'AdaBoost (200)': AdaBoostClassifier(\n",
    "        estimator=DecisionTreeClassifier(max_depth=1),\n",
    "        n_estimators=200, learning_rate=0.5, random_state=42),\n",
    "}\n",
    "\n",
    "cv_scores = {}\n",
    "for name, model in models_cv.items():\n",
    "    scores = cross_val_score(model, X, y, cv=10, scoring='accuracy', n_jobs=-1)\n",
    "    cv_scores[name] = scores\n",
    "    print(f'{name:20s}: {scores.mean():.4f} ± {scores.std():.4f}')\n",
    "\n",
    "plt.figure(figsize=(9, 4))\n",
    "# Pass plain lists rather than dict views so boxplot receives ordinary sequences\n",
    "plt.boxplot(list(cv_scores.values()), labels=list(cv_scores.keys()), patch_artist=True,\n",
    "            boxprops=dict(facecolor='#e0e7ff'),\n",
    "            medianprops=dict(color='#4f46e5', linewidth=2))\n",
    "plt.ylabel('10-Fold CV Accuracy')\n",
    "plt.title('Boosting vs Baselines: Stability Across Folds')\n",
    "plt.tight_layout(); plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c23",
   "metadata": {},
   "source": [
    "## 11. Discussion\n",
    "\n",
    "1. **Sequential correction is visible.** In the staged error plot, training error should fall as rounds accumulate, though not necessarily at every single round: as long as each stump beats chance on the weighted data, AdaBoost guarantees a shrinking upper bound on the training error (the exponential loss), not a monotone 0-1 error. Test error typically drops quickly in the early rounds and then flattens out.\n",
    "\n",
    "2. **Alpha tracks quality.** Since `alpha = 0.5 * ln((1 - err) / err)`, a stump with lower weighted error gets a larger vote. Early stumps are fit on near-uniform weights and tend to reach lower weighted error, so their alphas tend to be larger. In later rounds the weights concentrate on hard examples, the weighted error drifts toward 0.5, and alpha shrinks toward 0.\n",
    "\n",
    "3. **Weight concentration is the mechanism.** The weight scatter plots start from uniform weights at round 0 and show the weight mass concentrating onto a subset of hard examples by round 9. This is what forces each successive stump to focus on the regions that earlier stumps got wrong.\n",
    "\n",
    "4. **Learning rate controls the speed-accuracy trade-off.** The learning rate scales each stump's alpha, so lower values need more rounds to reach the same training error but often settle at a better test plateau. Its role is similar to the step size in gradient descent, since AdaBoost can be read as stagewise minimisation of the exponential loss."
   ]
  },
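  {
   "cell_type": "markdown",
   "id": "c23a",
   "metadata": {},
   "source": [
    "A minimal numeric sketch of the alpha and weight-update behaviour discussed above, assuming a hypothetical five-sample round: the arrays `y_true`, `y_pred` and `w` are made up for illustration and are not taken from the dataset in this notebook. It applies the same alpha and weight-update formulas as the from-scratch loop, then prints alpha for a few assumed weighted errors to show the vote shrinking as the error approaches 0.5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c23b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical toy round (illustration only): 5 samples, uniform weights, last sample misclassified\n",
    "y_true = np.array([1, 1, -1, -1, 1])\n",
    "y_pred = np.array([1, 1, -1, -1, -1])\n",
    "w = np.full(5, 0.2)\n",
    "\n",
    "err = w[y_pred != y_true].sum()               # weighted error of this stump\n",
    "alpha = 0.5 * np.log((1 - err) / err)         # learner weight, same formula as section 4\n",
    "w_new = w * np.exp(-alpha * y_true * y_pred)  # up-weight the miss, down-weight the hits\n",
    "w_new /= w_new.sum()                          # renormalise so the weights sum to 1\n",
    "\n",
    "print(f'err={err:.2f}  alpha={alpha:.4f}')\n",
    "# After renormalising, the misclassified sample holds half of the total weight\n",
    "print('updated weights:', np.round(w_new, 3))\n",
    "\n",
    "# Alpha shrinks toward 0 as the weighted error approaches 0.5 (assumed error values)\n",
    "for e in [0.10, 0.30, 0.45, 0.49]:\n",
    "    print(f'err={e:.2f} gives alpha={0.5 * np.log((1 - e) / e):.4f}')"
   ]
  },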
  {
   "cell_type": "markdown",
   "id": "c24",
   "metadata": {},
   "source": [
    "## 12. Next Steps\n",
    "\n",
    "- **Article 7: AdaBoost in Python with a Simple Classification Example** — full AdaBoost with probability outputs, SAMME.R variant\n",
    "- **Article 8: How AdaBoost Reweights Misclassified Samples** — step-by-step weight update visualisation\n",
    "- **Article 9: Gradient Boosting in Python** — extending sequential correction to arbitrary loss functions\n",
    "- **Article 10: XGBoost for Real Business Problems** — production-grade implementation with regularisation"
   ]
  }
 ]
}