Anirudh

Incident Report: A Risk Quant Reviews DeepLearning.AI’s Agentic AI Course

Model Risk & Validation · Incident Report

A five-module review from the one chair in the building that assumes everything is broken until proven otherwise. Fair warning — I have never met a try/except: pass I liked, and I read release notes for fun.

Case fileDLAI · AGENTIC-AI

Date2026-07-12

Scope5 modules · all labs

Overall rating★★★★★★★★★★ 3.0 / 5

Benchmarked vs.HF · LangGraph · CrewAI

ClassificationCOND. APPROVED

There’s a particular kind of person who, when handed a shiny new AI course, immediately asks: “Yes, but can I reproduce it, and what happens when it fails at 2 a.m.?” That person does not get invited to many parties. That person is me, and this is my review.

DeepLearning.AI’s Agentic AI course walks you through five modules that build, brick by brick, from a single reflecting agent to a coordinated team of them: reflection → tool use → evaluation → multi-agent collaboration, all hung on one running example — a research-report agent — plus a rotating cast of standalone labs (chart bots, SQL bots, an email assistant, a sunglasses-store customer-service pipeline, a marketing team). Andrew Ng narrates. The slides are clean. The vibe is optimistic.

My job is to be the opposite of optimistic. So I studied every lecture PDF, ran a magnifying glass over every ungraded and graded lab, cracked open the autograders, and benchmarked the whole thing against what the rest of the field — Hugging Face’s Agents Course, LangGraph, CrewAI, AutoGen, Anthropic’s tool-use docs, LangSmith, DeepEval — teaches for the same money (which, for a lot of them, is zero).

The headline: as a set of ideas, this course is genuinely well-sequenced and honest. As a set of artifacts you might mistake for production code, it is a target-rich environment. The gap between those two sentences is the entire review.

The scorecard · for people who skip to the table

Mod	Topic	Rating	The one-line indictment
1	Reflective Research Agent	★★★★★★★★★★ (3.5/5)	Boots up, wipes its own audit trail on every restart.
2	Reflection Design Pattern	★★★★★★★★★★ (3.5/5)	Teaches you to ground self-critique — then never checks V2 beat V1.
3	Tool Use Design Pattern	★★★★★★★★★★ (3/5)	Hands an agent a hard delete and forbids it from asking permission.
4	Evaluating Agentic Workflows	★★★★★★★★★★ (2.5/5)	The evaluation module ships an autograder that evaluates nothing.
5	Multi-Agent Collaboration	★★★★★★★★★★ (3/5)	Best ideas, best lab — routed through a JSON parser held together with hope.

Overall: ~3/5. A very good conceptual course wearing the costume of an engineering course. Take it for the mental models. Do not git clone your way into an incident.

01

Ambition meets drop_all()

The Reflective Research Agent

★★★★★★★★★★3.5 / 5 as teaching · ~1.5 / 5 as code you’d validate

Most intro modules hand you a notebook and a warm feeling. Module 1 hands you a deployable FastAPI + Postgres app in a Docker container and says good luck. A planner (o4-mini, temperature=1) emits a 5–7 step plan; an executor routes each step to a research/writer/editor agent; the research agents call Tavily, arXiv, and Wikipedia; the result is persisted to Postgres and streamed to a UI. Shipping a real back end on day one is genuinely unusual — Hugging Face opens with one smolagents notebook; Microsoft’s “AI Agents for Beginners” is concept-first lessons. This throws you in the deep end of deployment, state, and orchestration immediately.

Which is exactly why the validation findings pile up so fast.

The signature crime is on main.py:49: an unconditional Base.metadata.drop_all() on every startup. Every restart nukes the tasks table — the audit trail, the one thing a validator actually lives for, has the persistence half-life of a container. The README documents this as a “gotcha” rather than fixing it, which is a very honest way of saying “we know.” Around it: dependencies that float free (2 of ~30 packages pinned, so two builds a week apart are two different models of reality); a research_agent that swallows exceptions and returns "[Model Error: ...]" as content the writer will then cheerfully cite; routing by substring match (if "research" in step_lower), so a step titled “Write up the research draft” routes to the research agent because “research” got checked first; allow_origins=["*"]; default Postgres creds on an exposed port; and LLM-authored raw HTML rendered into innerHTML with no sanitization — stored XSS waiting for the wrong search result.

What’s genuinely instructive is the shape: the clean planner/executor/agents/tools separation, a real multi-source tool loop, and a Docker one-shot that actually runs end to end. What’s dangerous is that it looks production-ready enough for someone to treat it that way.

Risk register

data destruction on boot · silent error-as-content · unpinned deps · prompt-injection + XSS surface · open CORS + default creds · print()-only observability, zero tests.

Verdict — 3.5/5 as a teaching artifact; ~1.5/5 as code you’d validate. For FastAPI-literate learners who’ll read it critically — not copy it.

02

The model grades its own homework

Reflection Design Pattern

★★★★★★★★★★3.5 / 5

My priors on “the model critiques itself” are openly hostile, so I came in swinging. Module 2 mostly earned my grudging respect, because it stages the failure instead of hiding it.

The pattern is simple: emit V1, critique it, emit V2. The deck escalates through three grounding regimes — ungrounded self-reflection, execution feedback (run the code, feed back the stderr), and multimodal feedback (render the chart, let a vision model judge the actual pixels). Pleasingly, it doesn’t stop at “reflection good”: it cites Self-Refine (Madaan et al., 2023), then spends five slides on evaluation and explicitly warns about position bias in LLM-as-judge. Acknowledging that your evaluator is itself a biased instrument, in a beginner course, is more intellectual honesty than I budgeted for.

The star is the SQL lab (M2_UGL_2). It runs two reflection passes on a query over an event-sourced schema where qty_delta < 0 means a sale. The text-only refiner inspects the SQL, pronounces it fine, and confidently returns a negative total-sales number. Then the grounded refiner gets fed the actual result DataFrame, sees the nonsense, and fixes it with ABS(). That’s the whole model-validation lesson in one lab: self-reflection without an external oracle is theater.

Where it thins out is everything a validator asks next. No lab ever checks that V2 beat V1 — improvement is assumed, never measured. There’s no loop, no stopping criterion, no regression guard. The critic in the grounded SQL refiner runs at temperature=1.0 (a nondeterministic validator — a hard sell in any governed pipeline). And the graded assignment is the least trustworthy instance of the whole pattern: ungrounded essay critique, autograded only on “returns a str, no exception, length > 100.” You can pass with three near-identical three-line f-strings and never once observe whether reflection did anything. Defensible — free text can’t be autograded deterministically — but a cop-out all the same.

Versus the field: DLAI is more practical on the grounding distinction than most, but shallower on iteration — it never mentions Reflexion’s cross-trial memory, and LangGraph frames reflection as a stateful graph with loops and stopping edges. DLAI stops at V2; real systems iterate to convergence.

Risk register

V2 > V1 never verified · graded task tests plumbing only · nondeterministic critic · no stopping criterion · unsandboxed exec() of LLM code. ● Redeeming grace: the SQL lab that stages the null result.

Verdict — 3.5/5. The right instinct — ground your self-critique — taught via one excellent lab. Graduate to Reflexion / LangGraph for the loop.

03

We gave the text generator a delete button

Tool Use Design Pattern

★★★★★★★★★★3 / 5

This is the module I actually lose sleep over, because tool use is the exact moment a text predictor acquires side effects on the world. The mechanics are taught cleanly — a tool is just code the LLM can request; here’s naive string-parsing, here’s the JSON function-calling contract, here’s multi-tool orchestration, here’s MCP as the m×n → m+n integration story. Two calling conventions (raw OpenAI vs. aisuite), which is realistic but plants the provider:model footgun ("gpt-4o" vs "openai:gpt-4o") that the README rightly flags as the #1 cause of silent NoneType has no attribute 'choices' failures.

And then the email lab (M3_UGL_2) hands the model a delete_email tool. I read the backend: email_service.py:194 is db.delete(email); db.commit() — a hard delete. No soft-delete, no trash, no tombstone; recovery means reseeding the fixture. Worse, the notebook’s own prompt instructs: “Never ask the user for confirmation before performing an action… (no human-in-the-loop).” The lab hard-codes the anti-pattern — full autonomy over a destructive verb — and then the closing cell congratulates you on building agents that act “safely and transparently.” It demonstrates transparency. It does not demonstrate safety, and conflating the two is precisely the trap a risk function exists to catch.

Elsewhere: web and arXiv content flows straight into the message history — a textbook indirect prompt-injection surface, never mentioned. Argument validation is fully delegated to the model (delete_email(email_id) trusts whatever integer the LLM hallucinates; no existence or ownership check). The autograder, despite the README claiming it “mocks the client,” actually calls your function live and asserts isinstance(out, str) and len(out) > 50 — 51 characters of lorem ipsum passes.

Credit where it’s due, and it’s real credit: auditability is the module’s genuine strength. The assignment forces you to hand-build the tool-response messages and retain the full history, and pretty_print_chat_completion renders the entire tool-call trace. You can reconstruct which tool fired with what arguments — the closest thing to a model-risk audit log in the whole course.

Versus the field, this is where DLAI trails hardest. Hugging Face names web-content injection as a memory-poisoning attack; Anthropic’s docs say “require confirmation for destructive operations”; LangGraph makes interrupt-on-destructive-action a first-class primitive. Module 3 teaches the literal opposite default and never shows the alternative exists.

Risk register

destructive tool + no human-in-the-loop, taught as default · prompt-injection via tool output unmentioned · autograder asserts nothing · no argument/authorization validation. ● Redeeming graces: excellent trace auditability, error-as-data tool returns.

Verdict — 3/5. Crisp on how tools work; actively misleading on how to ship them safely. Pair with Anthropic’s tool-use docs before it goes near production.

04

The grader that grades air

Evaluating Agentic Workflows

★★★★★★★★★★2.5 / 5

This is home turf, so naturally it’s the module I graded hardest. And here’s the twist: the framing is the best thing in the entire course. The 2×2 of objective-vs-judge crossed with ground-truth-vs-none is genuinely reusable pedagogy. Error analysis via traces — count which stage produces the bad output (search results 45%, source selection 10%…) and spend your budget there — is just attribution analysis, and it’s the right instinct. The evaluation-vs-reflection distinction the README draws (reflection fixes this run; evaluation is a reusable scorer that catches regressions across many runs) is exactly the offline-eval-as-regression-gate idea a validator wants to hear.

Then the labs show up and start filing exceptions against themselves.

LLM-as-judge is introduced and simply trusted — no calibration against human labels, no inter-rater agreement, no acknowledgement that a judge from the same model family as the generator is marking its own homework, no treatment of the scorer’s own non-determinism (same rubric, same input, different score — sampling noise cosplaying as a P&L improvement). “Trustworthy source” is operationalized as “URL host is on an allowlist,” which quietly swaps provenance for correctness — a well-sourced wrong answer scores a clean PASS, and arxiv.org is un-reviewed preprints. And the matching in the notebook students actually run is substring, not suffix (any(td in domain for td in TOP_DOMAINS)), so nature.com would happily trust nature.com.phishing.io. The utils.py next door does it correctly with endswith. The one they run does not.

The assignment is where it falls over completely. There’s no blank template to complete — only a filled-in submission.ipynb that’s almost entirely mocked, with TOP_DOMAINS = {} (an empty dict, so validation is structurally guaranteed to FAIL). And the connoisseur’s detail, the thing I could not have designed as satire: grader.py’s part_1 builds cases = [] and returns it. The autograder for the evaluation module evaluates nothing. The validator’s validator is a stub.

What a real validation function would demand and this omits entirely: a versioned held-out eval set, ground truth beyond a domain list, confidence intervals (n≈20 with no error bars is a vibe, not a measurement), human-calibration of the judge, drift monitoring, and trajectory-level metrics (τ-bench’s pass^k reliability, tool-call correctness). The field — LangSmith, DeepEval’s 14 metrics, Ragas, Phoenix — treats agent eval as the hard problem it is. Even DLAI’s own “Evaluating and Debugging Generative AI” goes deeper.

Risk register

judge never validated · provenance ≠ correctness · spoofable substring matching · no statistics · reusable eval suite described but never built · assignment is a mocked skeleton · grader tests nothing.

Verdict — 2.5/5. A very good conceptual frame around labs that under-deliver and an assignment that, fittingly, evaluates almost nothing. The first page of the eval playbook — then go read LangSmith / DeepEval / τ-bench for the rest.

05

Compounding risk with extra steps

Multi-Agent Collaboration

★★★★★★★★★★3 / 5 · the capstone

The capstone hands you an org chart: a team of specialized agents — planner, researcher, writer, editor — passing work down a line, coordinated by an executor that reads each plan step and routes it to whoever should do it. The slides are the most intellectually interesting of the course: linear pipelines, then all-to-all, then deeper hierarchies (manager → sub-teams → fact-checker → citation-checker). It’s a lovely tour. The labs only ever build the linear case. You’re shown the hard topologies and handed the easy one.

Here’s the number that belongs on slide one and isn’t: if each agent in a four-step chain is 90% reliable, the chain is 0.9⁴ ≈ 66%. The entire selling point of multi-agent — decomposition — is also its central risk, because errors propagate and compound rather than cancel. The course never quantifies this, so a learner leaves thinking “more agents = more capability” when the validation reality is “more agents = more independent failure points in series.”

And the routing layer, the load-bearing wall of the whole design, is made of drywall. The executor_agent asks gpt-4o for “only a valid JSON object,” runs the reply through clean_json_block() — a regex that scrapes off Markdown code fences — and then calls a bare json.loads() with no try/except. The very existence of that fence-scraper is a confession that the model doesn’t reliably return clean JSON, so we bolt one on and pray. The day it emits a stray sentence or a trailing comma, the entire pipeline throws and dies mid-report. “Our orchestration layer deserializes free-text LLM output with no schema validation and no fallback” is not a design; it’s an incident with a pending date.

But — and this is real — M5_UGL_1 is the best-engineered lab in the entire course. A sunglasses-store customer-service pipeline with actual inventory/transactions tables in pandas + DuckDB, real tool functions, a planning step (request → JSON tool plan), a reflection step that repairs a flawed plan against a TOOLS-ONLY spec, and an error-explanation step that turns “this return would drive stock negative” into human guidance. It’s the closest the course comes to production shape. (It still explains the failure rather than preventing it with a rollback — but explaining it is more than most demos manage.) The campaign-team lab (M5_UGL_2) is the fun one — researcher → designer → copywriter → packager compiling an HTML report — but it’s a linear happy-path where every agent trusts its upstream blindly, with no evaluation gate between stages.

Versus the field, the comparison is almost unfair, because DeepLearning.AI has dedicated multi-agent courses — “Multi AI Agent Systems with crewAI,” “AI Agentic Design Patterns with AutoGen.” Against those, plus LangGraph’s stateful graphs with checkpoints and retries and OpenAI’s Agents SDK handoffs, this single capstone frames orchestration as a prompt-engineering problem solved with a hand-rolled JSON router, where the serious tools frame it as a reliability problem with real primitives. As a first exposure to the idea, it’s clear. As preparation to ship one, it’s a trailhead, not a map.

Risk register

brittle json.loads router with no fallback · compounding failure never quantified · no transactional safety · blind hand-offs with no eval gate · autograder tests the parts, never the emergent team · observability = print().

Verdict — 3/5. The strongest ideas and the single best lab, undercut by a toy orchestration layer that fails the first hour of a design review. Then graduate to LangGraph / CrewAI / AutoGen before anything touches a customer.

Final disposition

The model validation sign-off

Let me total the ledger, because that’s the job.

What this course does well, and I mean genuinely well: the sequencing is excellent — reflection → tool use → evaluation → multi-agent is the right arc, and reusing one research-agent example across all five modules gives a continuity most course collections lack. The conceptual honesty is above market: Ng’s slides cite the real papers (Self-Refine), warn about LLM-judge position bias, draw the evaluation-vs-reflection line crisply, and never pretend “agentic” is a binary. Two labs — the execution-grounded SQL refiner (M2) and the DuckDB customer-service pipeline (M5) — are things I’d happily show a junior. The tool-call auditability in M3 is real. As a way to build the right mental models fast, it’s very good.

What a validation committee cannot sign: a recurring pattern where safety, reliability, and evaluation are treated as afterthoughts precisely in the modules about acting, and about evaluating. Autograders that certify “returns a string” and, in one delicious case, certify literally nothing. Code that wipes its own audit trail, deletes without confirmation, ingests untrusted content into its own context, routes on unschema’d JSON, and calls all of it “safe.” The course teaches you to start a system; it consistently declines to teach you to trust one.

Who should take it:

Take itEngineers and analysts who want the four agentic design patterns in their hands quickly — and who’ll read the code as a whiteboard sketch that boots, not a reference implementation.

ChaperoneAnyone heading toward production. Take the mental models here; take the guardrails from Anthropic’s tool-use docs, LangGraph’s human-in-the-loop, and LangSmith / DeepEval’s eval discipline.

Look awayIf you need production hygiene, statistical rigor in evaluation, or serious multi-agent reliability — DLAI’s own deeper courses (and the open HF Agents Course) serve those axes better.

Final rating: 3/5 — and, in the spirit of a module that never checks whether V2 beat V1, I’ll note that I have not validated this rating against a held-out committee of human reviewers, calibrated my judge, or attached a confidence interval. Take it, appropriately, with external feedback.

Conditionally Approvedfor education

Not Approvedfor deployment

Findings above. Remediation left as an exercise for the learner — which, to be fair, is the whole point of a course.

Reviewed against: DeepLearning.AI — Agentic AI, Hugging Face Agents Course, Anthropic tool-use docs, LangGraph human-in-the-loop, LangSmith, DeepEval, and the Self-Refine / Reflexion / τ-bench literature. Every code citation (main.py:49, email_service.py:194, TOP_DOMAINS = {}) was read in-repo, not inferred.

July 12, 2026

When the Skeptic Reads the Manual: A Risk Quant Learns to Build AI Agents

In which I build the perfect AI-course reading list, then do to it what I’m paid to do to everyone else’s models.

The confession

There is a particular irony in a model-risk quant deciding to learn how to build AI agents. My job is to sit across the table from someone who has built something clever and ask, as politely as I can manage, “Yes, but how does it break?” I am paid to be the person who doesn’t get swept up in the excitement. Skepticism isn’t a mood for me; it’s a line item.

So when large language models went from party trick to production dependency in roughly the time it takes to run a Monte Carlo simulation, my first instinct was the usual one: fold my arms and wait for the backtest. But a funny thing happens when you spend enough years reviewing other people’s models. You realize the only thing worse than a technology you distrust is one you distrust and don’t understand. You cannot challenge what you cannot read.

So I did the thing skeptics rarely admit to doing. I decided to actually learn it, hands on keyboard, from people who build these systems for a living rather than from a thousand breathless LinkedIn posts. This is the story of the syllabus I built to do that — and, more usefully, the story of how I turned around and validated my own reading list before trusting it.

What’s actually on DeepLearning.AI

If you haven’t looked lately, DeepLearning.AI has quietly become the place practitioners go to learn this stack. It’s Andrew Ng’s shop, and it shows: the pedagogy is unfussy, the notebooks run, and the instructors are people who ship.

The catalog splits into two shapes. There are the foundational specializations, multi-week affairs that teach a subject properly, and then there are the short courses, of which there are now around 124. The short courses are the real draw. Most run one to three hours, each is taught by a named practitioner from a company you’ve heard of, and every one hands you a code notebook so you’re typing rather than nodding. Anthropic, Google, Meta, LangChain, crewAI, LlamaIndex, Hugging Face, Nvidia, Databricks: the guest list reads like the org chart of the entire field.

The topics span the whole territory: prompting, the internals of transformers, retrieval-augmented generation, single and multi-agent systems, evaluation, guardrails and red-teaming, fine-tuning, and coding agents. It’s subscription-based, which for someone who binges in bursts felt fair. The catch is that with 124 courses and no obvious ordering, it’s easy to wander in and drown.

Why I built a five-tier path

Drowning is the problem a curriculum solves. Left to its own devices, a catalog this large becomes a buffet where you fill your plate with whatever’s shiny and leave still hungry for the basics. I wanted a spine, an order that built competence deliberately instead of accreting trivia.

Five tiers, roughly: foundations first, then the practical craft of building agents, then, crucially, evaluation, governance, and reliability, then the supporting technical skills that make the rest robust, and finally some optional productivity tooling.

Here’s where being a risk person rearranges the furniture. In most self-taught AI curricula, evaluation and guardrails are the vegetables you’re meant to eat after the fun part. For someone in model risk, that ordering is backwards to the point of being funny. Evaluation, governance, red-teaming: that isn’t the compliance tax on the interesting work. That is the interesting work. An agent that takes autonomous, multi-step actions is, from where I sit, a model with a much larger blast radius and far worse documentation. So I gave that material its own tier and refused to treat it as an afterthought.

Draft one: the list that looked perfect

Here is what I landed on first. Read it and tell me it doesn’t look like a serious, well-rounded plan:

Tier 1 — Foundations. Agentic AI (DeepLearning.AI, Andrew Ng); How Transformer LLMs Work (Jay Alammar & Maarten Grootendorst).
Tier 2 — Core agent-building. MCP: Build Rich-Context AI Apps with Anthropic (Anthropic); AI Agents in LangGraph (LangChain/Tavily); Building Agentic RAG with LlamaIndex (LlamaIndex); Multi AI Agent Systems with crewAI (crewAI); A2A: The Agent2Agent Protocol (Google Cloud/IBM Research).
Tier 3 — Evaluation, reliability, governance. Evaluating AI Agents (Arize); Governing AI Agents (Databricks); Safe and Reliable AI via Guardrails (GuardrailsAI); Red Teaming LLM Applications (Giskard); NeMo Agent Toolkit: Making Agents Reliable (Nvidia).
Tier 4 — Supporting skills. Getting Structured LLM Output (DotTxt); Pydantic for LLM Workflows (DeepLearning.AI); Building and Evaluating Data Agents (Snowflake).
Tier 5 — Optional coding agents. Claude Code (Anthropic); Spec-Driven Development with Coding Agents (JetBrains).

Marquee names, sensible progression, the risk material promoted to its own tier. I very nearly hit “enroll” on all of it. And then the part of my brain that reviews models for a living cleared its throat and asked the question it always asks: how do you know this is any good?

The validation: interrogating my own reading list

The uncomfortable thing about a course list is that it markets itself. Every card looks equally current, because the thumbnails are equally glossy. But in applied AI, currency is everything. Frameworks ship breaking changes on a schedule that would make a fixed-income desk weep. A course that teaches you to drive a specific library the way it worked eighteen months ago is not a shortcut; it’s a trap with a completion certificate.

So I did the boring, load-bearing thing. I checked the vintage of every course before committing a single evening to it.

Getting the dates was itself a small exercise in not trusting the surface. The course cards don’t advertise a release date — you see a title, a duration, a difficulty, and nothing about age. But the platform is a Next.js app, and Next.js ships a page’s data to the browser as a blob of JSON embedded in the HTML (the __NEXT_DATA__ script tag). Buried in that blob, each course carried a quiet little released_at field that the interface never bothers to render. A few lines in the browser console were enough to pull it out:

			
const data = JSON.parse(document.getElementById('__NEXT_DATA__').textContent);
// walk the embedded query cache and collect { slug, name, released_at }
// ...then sort by date and stop kidding yourself about what's current

Five minutes of squinting at JSON told me more about the syllabus’s health than five hours of reading course descriptions would have. This is, more or less, my entire job compressed into a paragraph: the polished front-end is not the model; go find the data the front-end is quietly sitting on.

Then I split the courses into two buckets, because “old” isn’t one thing:

Concept-durable courses teach mental models that don’t expire — what a transformer is, how to think like an attacker, why you evaluate before you deploy. A two-year-old course on those still earns its seat.
API-volatile courses teach you to drive a specific, fast-moving library. Those rot. A mid-2024 course on an agent framework will hand you notebooks that no longer run against today’s version of that same framework.

Here is what validating the dates turned up (ages as of writing, mid-2026):

Course	Released	Age	Verdict
Agentic AI	Sep 2025	~9 mo	Fresh
How Transformer LLMs Work	Feb 2025	~17 mo	Concept-durable
MCP (Anthropic)	May 2025	~14 mo	Fine
AI Agents in LangGraph	Jun 2024	~25 mo	API-volatile — swap
Building Agentic RAG (LlamaIndex)	May 2024	~26 mo	API-volatile — swap
Multi AI Agent Systems (crewAI)	May 2024	~26 mo	API-volatile — swap
A2A Protocol	Feb 2026	~5 mo	Fresh
Evaluating AI Agents (Arize)	Feb 2025	~17 mo	Fine
Governing AI Agents (Databricks)	Oct 2025	~9 mo	Fresh
Safe & Reliable AI via Guardrails	Nov 2024	~20 mo	Redundant — drop
Red Teaming LLM Applications	Apr 2024	~27 mo	Concept-durable — keep, flagged
NeMo Agent Toolkit (Nvidia)	Dec 2025	~7 mo	Fresh
Getting Structured LLM Output	Apr 2025	~15 mo	Fine
Pydantic for LLM Workflows	Jul 2025	~11 mo	Fresh
Building & Evaluating Data Agents	Sep 2025	~9 mo	Fresh
Claude Code (Anthropic)	Aug 2025	~11 mo	Watch — moves fast
Spec-Driven Development (JetBrains)	Apr 2026	~3 mo	Fresh

The perfect list, it turned out, had five soft spots — all clustered, predictably, in the hands-on agent-building tier where the frameworks churn fastest.

Draft two: the swaps

Where a stale, API-volatile course had a fresher replacement teaching the same thing, I swapped it. Where it didn’t, I kept it and made peace with rewriting a few code cells. Where a fresher course elsewhere already covered the ground, I cut the duplicate.

Out (and why)	In (and why)
Multi AI Agent Systems with crewAI — 26 mo old	Design, Develop, and Deploy Multi-Agent Systems with CrewAI (Nov 2025) — same framework, production-focused, current API
Building Agentic RAG with LlamaIndex — 26 mo old	Retrieval Augmented Generation (RAG) (DeepLearning.AI, Sep 2025) — comprehensive, architecture → deployment → evaluation
AI Agents in LangGraph — 25 mo old	DSPy: Build and Optimize Agentic Apps (Databricks, Jun 2025) — current, and built around MLflow experiment tracking, which is a model-risk person’s native tongue
Safe and Reliable AI via Guardrails — 20 mo old	Dropped — the fresher NeMo (Dec 2025) and Governing AI Agents (Oct 2025) already cover reliability and control
Red Teaming LLM Applications — 27 mo old	Kept, with an asterisk — it’s the oldest course on the list and has no newer equivalent, but adversarial thinking ages well even when the specific exploits don’t

Two honest caveats. First, DSPy is not a like-for-like replacement for LangGraph — it’s a different paradigm (optimizing agents rather than wiring them as graphs). I chose it for freshness and for the MLflow angle; if you want to stay in the LangGraph world, its newest touchpoint is Long-Term Agentic Memory with LangGraph (Mar 2025). Second, the swaps cost time. The 2024 courses were bite-sized; their current replacements are comprehensive. My honest estimate for Tiers 1–4 roughly doubled, from ~30 hours to closer to 65. A season of evenings, not a long weekend. I decided current-and-longer beats dated-and-quick — which is, when you think about it, exactly the trade a risk function exists to make.

Where I am so far

I’ve finished Tier 1’s Agentic AI, and, setting aside the professional obligation to sound measured, I came out of it genuinely energized.

The reframe that landed hardest was almost embarrassingly simple. I’d been thinking of an LLM as a very articulate oracle: you ask, it answers, you move on. The course reoriented me toward agents as iterative, multi-step workflows that take action: plan, act, observe, revise, repeat. Once that clicked, my skepticism sharpened into something more useful. An oracle you can spot-check. A system that plans and acts across many steps, calling tools and touching real state along the way, needs the full apparatus of model risk: evaluation, monitoring, controls, and someone whose job it is to ask how it breaks.

Which is exactly why the tiers ahead are the ones I’m most impatient to reach. The evaluation and governance material isn’t the part I’ll dutifully endure to get certified. It’s the part where my existing instincts finally have somewhere to go. I suspect I’ll be an insufferable student in those modules, and I’m oddly looking forward to it.

A disclaimer, offered without apology

In the spirit of the model-risk documentation I spend my life reading and writing, one honest note on provenance. This whole exercise — the five-tier structure, the reordering that promotes evaluation and governance, and the recency validation that pulled every release date and swapped out the stale courses — came out of an extended back-and-forth with Claude, Anthropic’s AI, acting as co-author and thinking partner. I brought the constraints, the skepticism, and the risk lens; it drafted the first list, and then, when I muttered that I didn’t trust how old these things were, it was the one that went and dug the released_at dates out of the page and helped me sort the keepers from the fossils.

There’s a pleasing symmetry to that. I used an AI to design my plan for understanding AI, validated its first draft the way I’d validate any model handed to me, and then told you so plainly rather than pretending the syllabus sprang fully formed from my own head. If I’m going to spend the next several weeks learning to evaluate these systems, I might as well start by being transparent about working with one. Consider it the first entry in the validation report.

July 11, 2026

Scraping the Daily India Covid-19 Tracker for CSV Data

This is a very short post that will be very useful to help you quickly set up your COVID-19 datasets. I’m sharing code at the end of this post that scrapes through all CSV datasets made available by COVID19-India API.

We have exposed all the crowdsourced patient details, travel history (published by authorities) and statewise trends in this live API : https://t.co/tNyhpPYTJD A shout-out to all Data Analysts, Planners and Enthusiasts to use this data for helping the containment efforts. @_mekin

— covid19indiaorg (@covid19indiaorg) March 22, 2020

Copy paste this standalone script into your R environment and get going!

There are 15+ CSV files on the India COVID-19 API website. raw_data3 is actually a live dataset and more can be expected in the days to come, which is why a script that automates the data sourcing comes in handy. Snapshot of the file names and the data dimensions as of today, 100 days since the first case was recorded in the state of Kerala —

My own analysis of the data and predictions are work-in-progress, going into a Github repo. Execute the code below and get started analyzing the data and fighting COVID-19!

	rm(list = ls())

	# Load relevant libraries —————————————————–
	library(stringr)
	library(data.table)

	# =============================================================================
	# COVID 19-India API: A volunteer-driven, crowdsourced database
	# for COVID-19 stats & patient tracing in India
	# =============================================================================

	url <- "https://api.covid19india.org/csv/"

	# List out all CSV files to source ——————————————–

	html <- paste(readLines(url), collapse="\n")
	pattern <- "https://api.covid19india.org/csv/latest/.*csv"
	matched <- unlist(str_match_all(string = html, pattern = pattern))

	# Downloading the Data ——————————————————–

	covid_datasets <- lapply(as.list(matched), fread)

	# Naming the data objects appropriately —————————————


	exclude_chars <- "https://api.covid19india.org/csv/latest/"

	dataset_names <- substr(x = matched,
	start = 1 + nchar(exclude_chars),
	stop = nchar(matched)- nchar(".csv"))

	# assigning variable names
	for(i in seq_along(dataset_names)){
	assign(dataset_names[i], covid_datasets[[i]])
	}

view raw source_patient_data_india.R hosted with ❤ by GitHub

May 10, 2020

Linear Algebra behind the lm() function in R

This post comes out of the blue, nearly 2 years since my last one. I realize I’ve been lazy, so here’s hoping I move from an inertia of rest to that of motion, implying, regular and (hopefully) relevant posts. I also chanced upon some wisdom while scrolling through my Twitter feed:

https://twitter.com/CoralineAda/status/1193216045210906625

This blog post in particular was meant to be a reminder to myself and other R users that the much used lm() function in R (for fitting linear models) can be replaced with some handy matrix operations to obtain regression coefficients, their standard errors and other goodness-of-fit stats printed out when summary() is called on an lm object.

Linear regression can be formulated mathematically as follows:
$\mathbf{y} = \mathbf{X} \mathbf{\beta} + \mathbf{\epsilon}$ ,
$\mathbf{\epsilon} \sim N(0, \sigma^2 \mathbf{I})$

$\mathbf{y}$ is the $\mathbf{n}\times \mathbf{1}$ outcome variable and $\mathbf{X}$ is the $\mathbf{n}\times \mathbf{(\mathbf{k}+1)}$ data matrix of independent predictor variables (including a vector of ones corresponding to the intercept). The ordinary least squares (OLS) estimate for the vector of coefficients $\mathbf{\beta}$ is:

$\hat{\mathbf{\beta}} = (\mathbf{X}^{\prime} \mathbf{X})^{-1} \mathbf{X}^{\prime} \mathbf{y}$

The covariance matrix can be obtained with some handy matrix operations:
$\textrm{Var}(\hat{\mathbf{\beta}}) = (\mathbf{X}^{\prime} \mathbf{X})^{-1} \mathbf{X}^{\prime} \;\sigma^2 \mathbf{I} \; \mathbf{X} (\mathbf{X}^{\prime} \mathbf{X})^{-1} = \sigma^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1}$
given that $\textrm{Var}(AX) = A \times \textrm{Var}X \times A^{\prime}; \textrm{Var}(\mathbf{y}) = \mathbf{\sigma^2}$

The standard errors of the coefficients are basically $\textrm{Diag}(\sqrt{\textrm{Var}(\hat{\mathbf{\beta}})}) = \textrm{Diag}(\sqrt{\sigma^2 (\mathbf{X}^{\prime} \mathbf{X})^{-1}})$ and with these, one can compute the t-statistics and their corresponding p-values.

Lastly, the F-statistic and its corresponding p-value can be calculated after computing the two residual sum of squares (RSS) statistics:

$\mathbf{RSS}$ – for the full model with all predictors
$\mathbf{RSS_0}$ – for the partial model ( $\mathbf{y} = \mathbf{\mu} + \mathbf{\nu}; \mathbf{\mu} = \mathop{\mathbb{E}}[\mathbf{y}]; \mathbf{\nu} \sim N(0, \sigma_0^2 \mathbf{I})$ ) with the outcome observed mean as estimated outcome

$\mathbf{F} = \frac{(\mathbf{RSS_0}-\mathbf{RSS})/\mathbf{k}}{\mathbf{RSS}/(\mathbf{n}-\mathbf{k}-1)}$

I wrote some R code to construct the output from summarizing lm objects, using all the math spewed thus far. The data used for this exercise is available in R, and comprises of standardized fertility measures and socio-economic indicators for each of 47 French-speaking provinces of Switzerland from 1888. Try it out and see for yourself the linear algebra behind linear regression.

	### Linear Regression Using lm() —————————————-
	data("swiss")
	dat <- swiss
	linear_model <- lm(Fertility ~ ., data = dat)
	summary(linear_model)


	# Call:
	# lm(formula = Fertility ~ ., data = dat)
	#
	# Residuals:
	# Min 1Q Median 3Q Max
	# -15.2743 -5.2617 0.5032 4.1198 15.3213
	#
	# Coefficients:
	# Estimate Std. Error t value Pr(>\|t\|)
	# (Intercept) 66.91518 10.70604 6.250 1.91e-07 ***
	# Agriculture -0.17211 0.07030 -2.448 0.01873 *
	# Examination -0.25801 0.25388 -1.016 0.31546
	# Education -0.87094 0.18303 -4.758 2.43e-05 ***
	# Catholic 0.10412 0.03526 2.953 0.00519 **
	# Infant.Mortality 1.07705 0.38172 2.822 0.00734 **
	# —
	# Signif. codes: 0 ‘*’ 0.001 ‘’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
	#
	# Residual standard error: 7.165 on 41 degrees of freedom
	# Multiple R-squared: 0.7067, Adjusted R-squared: 0.671
	# F-statistic: 19.76 on 5 and 41 DF, p-value: 5.594e-10


	### Using Linear Algebra ————————————————

	y <- matrix(dat$Fertility, nrow = nrow(dat))
	X <- cbind(1, as.matrix(x = dat[,-1]))
	colnames(X)[1] <- "(Intercept)"

	# N x k matrix
	N <- nrow(X)
	k <- ncol(X) – 1 # number of predictor variables (ergo, excluding Intercept column)

	# Estimated Regression Coefficients
	beta_hat <- solve(t(X)%%X)%%(t(X)%*%y)

	# Variance of outcome variable = Variance of residuals
	sigma_sq <- residual_variance <- (N-k-1)^-1 * sum((y – X %*% beta_hat)^2)
	residual_std_error <- sqrt(residual_variance)

	# Variance and Std. Error of estimated coefficients of the linear model
	var_betaHat <- sigma_sq * solve(t(X) %*% X)
	coeff_std_errors <- sqrt(diag(var_betaHat))

	# t values of estimates are ratio of estimated coefficients to std. errors
	t_values <- beta_hat / coeff_std_errors

	# p-values of t-statistics of estimated coefficeints
	p_values_tstat <- 2 * pt(abs(t_values), N-k, lower.tail = FALSE)

	# assigning R's significance codes to obtained p-values
	signif_codes_match <- function(x){
	ifelse(x <= 0.001,"***",
	ifelse(x <= 0.01,"**",
	ifelse(x < 0.05,"*",
	ifelse(x < 0.1,"."," "))))

	}
	signif_codes <- sapply(p_values_tstat, signif_codes_match)

	# R-squared and Adjusted R-squared (refer any econometrics / statistics textbook)
	R_sq <- 1 – (N-k-1)residual_variance / (Nmean((y – mean(y))^2))
	R_sq_adj <- 1 – residual_variance / ((N/(N-1))*mean((y – mean(y))^2))

	# Residual sum of squares (RSS) for the full model
	RSS <- (N-k-1)*residual_variance
	# RSS for the partial model with only intercept (equal to mean), ergo, TSS
	RSS0 <- TSS <- sum((y – mean(y))^2)

	# F statistic based on RSS for full and partial models
	# k = degress of freedom of partial model
	# N – k – 1 = degress of freedom of full model
	F_stat <- ((RSS0 – RSS)/k) / (RSS/(N-k-1))

	# p-values of the F statistic
	p_value_F_stat <- pf(F_stat, df1 = k, df2 = N-k-1, lower.tail = FALSE)

	# stitch the main results toghether
	lm_results <- as.data.frame(cbind(beta_hat, coeff_std_errors,
	t_values, p_values_tstat, signif_codes))
	colnames(lm_results) <- c("Estimate","Std. Error","t value","Pr(>\|t\|)","")


	### Print out results of all relevant calcualtions ———————–


	print(lm_results)
	cat("Residual standard error: ",
	round(residual_std_error, digits = 3),
	" on ",N-k-1," degrees of freedom",
	"\nMultiple R-squared: ",R_sq," Adjusted R-squared: ",R_sq_adj,
	"\nF-statistic: ",F_stat, " on ",k-1," and ",N-k-1,
	" DF, p-value: ", p_value_F_stat,"\n")

	# Estimate Std. Error t value Pr(>\|t\|)
	# (Intercept) 66.9151816789654 10.7060375853301 6.25022854119771 1.73336561301153e-07 ***
	# Agriculture -0.172113970941457 0.0703039231786469 -2.44814177018405 0.0186186100433133 *
	# Examination -0.258008239834722 0.253878200892098 -1.01626779663678 0.315320687313066
	# Education -0.870940062939429 0.183028601571259 -4.75849159892283 2.3228265226988e-05 ***
	# Catholic 0.104115330743766 0.035257852536169 2.95296858017545 0.00513556154915653 **
	# Infant.Mortality 1.07704814069103 0.381719650858061 2.82156849475775 0.00726899472564356 **

	# Residual standard error: 7.165 on 41 degrees of freedom
	# Multiple R-squared: 0.706735 Adjusted R-squared: 0.670971
	# F-statistic: 19.76106 on 4 and 41 DF, p-value: 5.593799e-10

view raw lm_linear_algebra.R hosted with ❤ by GitHub

Hope this was useful and worth your time!

November 12, 2019

Installing Tensorflow on Windows is Easy!

I recently got myself to start using Python on Windows, whereas till very recently I had been working on Python only from Ubuntu.

I am sure I am late in realizing this, but installing Tensorflow was just so easy!

If you’ve tried installing Tensorflow for Windows when it was first introduced, and gave up back then – try again. The method I’d recommend would be using Anaconda Navigator from where you first open a terminal (figure below). You may notice that I already have a tensorflow environment set up, since I am writing this post after installation.

Once you have terminal open, create a conda environment named tensorflow by invoking the following command, with your python version:

C:> conda create -n tensorflow python=3.6

That’s all! You should now have tensorflow ready to use.

For more details, you could always go here. Otherwise, the screenshot below gives a sense of what it takes.

December 30, 2017

Linear / Logistic Regression in R: Dealing With Unknown Factor Levels in Test Data

Let’s say you have data containing a categorical variable with 50 levels. When you divide the data into train and test sets, chances are you don’t have all 50 levels featuring in your training set.

This often happens when you divide the data set into train and test sets according to the distribution of the outcome variable. In doing so, chances are that our explanatory categorical variable might not be distributed exactly the same way in train and test sets – so much so that certain levels of this categorical variable are missing from the training set. The more levels there are to a categorical variable, it gets difficult for that variable to be similarly represented upon splitting the data.

Take for instance this example data set (train.csv + test.csv) which contains a categorical variable var_b that takes 349 unique levels. Our train data has 334 of these levels – on which the model is built – and hence 15 levels are excluded from our trained model. If you try making predictions on the test set with this model in R, it throws an error:
factor var_b has new levels 16060, 17300, 17980, 19060, 21420, 21820, 25220, 29340, 30300, 33260, 34100, 38340, 39660, 44300, 45460
If you’ve used R to model generalized linear class of models such as linear, logit or probit models, then chances are you’ve come across this problem – especially when you’re validating your trained model on test data.

The workaround to this problem is in the form of a function, remove_missing_levels that I found here written by pat-s. You need magrittr library installed and it can only work on lm, glm and glmmPQL objects.

	remove_missing_levels <- function(fit, test_data) {
	library(magrittr)

	# https://stackoverflow.com/a/39495480/4185785

	# drop empty factor levels in test data
	test_data %>%
	droplevels() %>%
	as.data.frame() -> test_data

	# 'fit' object structure of 'lm' and 'glmmPQL' is different so we need to
	# account for it
	if (any(class(fit) == "glmmPQL")) {
	# Obtain factor predictors in the model and their levels
	factors <- (gsub("[-^0-9]\|as.factor\|\$\|\$", "",
	names(unlist(fit$contrasts))))
	# do nothing if no factors are present
	if (length(factors) == 0) {
	return(test_data)
	}

	map(fit$contrasts, function(x) names(unmatrix(x))) %>%
	unlist() -> factor_levels
	factor_levels %>% str_split(":", simplify = TRUE) %>%
	extract(, 1) -> factor_levels

	model_factors <- as.data.frame(cbind(factors, factor_levels))
	} else {
	# Obtain factor predictors in the model and their levels
	factors <- (gsub("[-^0-9]\|as.factor\|\$\|\$", "",
	names(unlist(fit$xlevels))))
	# do nothing if no factors are present
	if (length(factors) == 0) {
	return(test_data)
	}

	factor_levels <- unname(unlist(fit$xlevels))
	model_factors <- as.data.frame(cbind(factors, factor_levels))
	}

	# Select column names in test data that are factor predictors in
	# trained model

	predictors <- names(test_data[names(test_data) %in% factors])

	# For each factor predictor in your data, if the level is not in the model,
	# set the value to NA

	for (i in 1:length(predictors)) {
	found <- test_data[, predictors[i]] %in% model_factors[
	model_factors$factors == predictors[i], ]$factor_levels
	if (any(!found)) {
	# track which variable
	var <- predictors[i]
	# set to NA
	test_data[!found, predictors[i]] <- NA
	# drop empty factor levels in test data
	test_data %>%
	droplevels() -> test_data
	# issue warning to console
	message(sprintf(paste0("Setting missing levels in '%s', only present",
	" in test data but missing in train data,",
	" to 'NA'."),
	var))
	}
	}
	return(test_data)
	}

view raw remove_missing_levels.R hosted with ❤ by GitHub

Once you’ve sourced the above function in R, you can seamlessly proceed with using your trained model to make predictions on the test set. The code below demonstrates this for the data set shared above. You can find these codes in one of my github repos and try it out yourself.

	library(data.table)

	train <- fread('train.csv'); test <- fread('test.csv')

	# consolidate the 2 data sets after creating a variable indicating train / test
	train$flag <- 0; test$flag <- 1
	dat <- rbind(train,test)

	# change outcome, var_b and var_e into factor var
	dat$outcome <- factor(dat$outcome)
	dat$var_b <- factor(dat$var_b)
	dat$var_e <- factor(dat$var_e)

	# check the levels of var_b and var_e in this consolidated, train and test data sets
	length(levels(dat$var_b)); length(unique(train$var_b)); length(unique(test$var_b))

	# get back the train and test data
	train <- subset(dat, flag == 0); test <- subset(dat, flag == 1)
	train$flag <- NULL; test$flag <- NULL

	# Build Logit Model using train data and make predictions
	logitModel <- glm(outcome ~ ., data = train, family = 'binomial')
	preds_train <- predict(logitModel, type = 'response')

	# Model Predictions on test data
	preds_test <- predict(logitModel, newdata = test, type = 'response')
	# running the above code gives us the following error:
	# factor var_b has new levels 16060, 17300, 17980, 19060, 21420, 21820,
	# 25220, 29340, 30300, 33260, 34100, 38340, 39660, 44300, 45460

	# Workaround:
	source('remove_missing_levels.R')
	preds_test <- predict(logitModel,
	newdata = remove_missing_levels(fit = logitModel, test_data = test),
	type = 'response')

view raw factor_new_levels.R hosted with ❤ by GitHub

October 8, 2017

Quick Way of Installing all your old R libraries on a New Device

I recently bought a new laptop and began installing essential software all over again, including R of course! And I wanted all the libraries that I had installed in my previous laptop. Instead of installing libraries one by one all over again, I did the following:

Step 1: Save a list of packages installed in your old computing device (from your old device).

installed <- as.data.frame(installed.packages()) write.csv(installed, 'installed_previously.csv')

This saves information on installed packages in a csv file named installed_previously.csv. Now copy or e-mail this file to your new device and access it from your working directory in R.

Step 2: Create a list of libraries from your old list that were not already installed when you freshly download R (from your new device).

installedPreviously <- read.csv('installed_previously.csv') baseR <- as.data.frame(installed.packages()) toInstall <- setdiff(installedPreviously, baseR)

We now have a list of libraries that were installed in your previous computer in addition to the R packages already installed when you download R. So you now go ahead and install these libraries.

Step 3: Download this list of libraries.

install.packages(toInstall)

That’s it. Save yourself the trouble installing packages one-by-one all over again.

July 27, 2017

Key Insights on Sberbank Home Price Predicting Kaggle Competition Coming Soon…

This post is more about data science and Kaggle than about R or Python. I am currently taking part in my 2nd Kaggle competition, Sberbank Russian Housing Market — Can you predict realty price fluctuations in Russia’s volatile economy?

I’ve been stuck for about a week at the 52nd percentile among 3400+ Kagglers taking part in the competition. I’ve been told that Kaggle Kernels and discussion boards are helpful when you’re stuck or if you need to learn some practical data science that can’t be gleaned from books or tutorials.

One such discussion thread looks like this:

This person going by the pseudonym Schoolpal is currently killing it on the leaderboard and I’m eagerly looking forward to this person’s code once the competition ends in less than 24 hours. If you’re interested too, follow this discussion here.

Cheers!

Update:

This Schoolpal, as mentioned earlier, finally came in second and shared their approach here.

June 29, 2017

Endogenously Detecting Structural Breaks in a Time Series: Implementation in R

The most conventional approach to determine structural breaks in longitudinal data seems to be the Chow Test.

From Wikipedia,

The Chow test, proposed by econometrician Gregory Chow in 1960, is a test of whether the coefficients in two linear regressions on different data sets are equal. In econometrics, it is most commonly used in time series analysis to test for the presence of a structural break at a period which can be assumed to be known a priori (for instance, a major historical event such as a war). In program evaluation, the Chow test is often used to determine whether the independent variables have different impacts on different subgroups of the population.

As shown in the figure below, regressions on the 2 sub-intervals seem to have greater explanatory power than a single regression over the data.

For the data above, determining the sub-intervals is an easy task. However, things may not look that simple in reality. Conducting a Chow test for structural breaks leaves the data scientist at the mercy of his subjective gaze in choosing a null hypothesis for a break point in the data.

Instead of choosing the breakpoints in an exogenous manner, what if the data itself could learn where these breakpoints lie? Such an endogenous technique is what Bai and Perron came up with in a seminal paper published in 1998 that could detect multiple structural breaks in longitudinal data. A later paper in 2003 dealt with the testing for breaks empirically, using a dynamic programming algorithm based on the Bellman principle.

I will discuss a quick implementation of this technique in R.

Brief Outline:

Assuming you have a ts object (I don’t know whether this works with zoo, but it should) in R, called ts. Then implement the following:

	# assuming you have a 'ts' object in R

	# 1. install package 'strucchange'
	# 2. Then write down this code:

	library(strucchange)

	# store the breakdates
	bp_ts <- breakpoints(ts ~ 1)

	# this will give you the break dates and their confidence intervals
	summary(bp_ts)

	# store the confidence intervals
	ci_ts <- confint(bp_ts)

	## to plot the breakpoints with confidence intervals
	plot(ts)
	lines(bp_ts)
	lines(ci_ts)

view raw strucchange_usage.R hosted with ❤ by GitHub

An illustration

I started with data on India’s rice crop productivity between 1950 (around Independence from British Colonial rule) and 2008. Here’s how it looks:

You can download the excel and CSV files here and here respectively.

Here’s the way to go using R:

	library(xlsx)
	library(forecast)
	library(tseries)
	library(strucchange)

	## load the data from a CSV or Excel file. This example is done with an Excel sheet.
	prod_df <- read.xlsx(file = 'agricultural_productivity.xls', sheetIndex = 'Sheet1', rowIndex = 8:65, colIndex = 2, header = FALSE)
	colnames(prod_df) <- c('Rice')
	## store rice data as time series objects
	rice <- ts(prod_df$Rice, start=c(1951, 1), end=c(2008, 1), frequency=1)

	# store the breakpoints
	bp.rice <- breakpoints(rice ~ 1)
	summary(bp.rice)

	## the BIC chooses 5 breakpoints; plot the graph with breakdates and their confidence intervals
	plot(bp.rice)
	plot(rice)
	lines(bp.rice)

	## confidence intervals
	ci.rice <- confint(bp.rice)
	ci.rice
	lines(ci.rice)

view raw rice_strucchange.R hosted with ❤ by GitHub

Voila, this is what you get:

The dotted vertical lines indicated the break dates; the horizontal red lines indicate their confidence intervals.

This is a quick and dirty implementation. For a more detailed take, check out the documentation on the R package called strucchange.

November 8, 2016

Abu Mostafa’s Machine Learning MOOC – Now on EdX

This was in the pipeline for quite some time now. I have been waiting for his lectures on a platform such as EdX or Coursera, and the day has arrived. You can enroll and start with week 1’s lectures as they’re live now.

This course is taught by none other than Dr. Yaser S. Abu – Mostafa, whose textbook on machine learning, Learning from Data is #1 bestseller textbook (Amazon) in all categories of Computer Science. His online course has been offered earlier over here.

Teaching

Dr. Abu-Mostafa received the Clauser Prize for the most original doctoral thesis at Caltech. He received the ASCIT Teaching Awards in 1986, 1989 and 1991, the GSC Teaching Awards in 1995 and 2002, and the Richard P. Feynman prize for excellence in teaching in 1996.

Live ‘One-take’ Recordings

The lectures have been recorded from a live broadcast (including Q&A, which will let you gauge the level of CalTech students taking this course). In fact, it almost seems as though Abu Mostafa takes a direct jab at Andrew Ng’s popular Coursera MOOC by stating the obvious on his course page.

A real Caltech course, not a watered-down version

Again, while enrolling note that this is what Abu Mostafa had to say about the online course: “A Caltech course does not cater to short attention spans, and it may not provide instant gratification…[like] many MOOCs out there that are quite simple and have a ‘video game’ feel to them.” Unsurprisingly, many online students have dropped out in the past, but some of those students who “complained early on but decided to stick with the course had very flattering words to say at the end”.

Prerequisites

Basic probability
Basic matrices
Basic calculus
Some programming language/platform (I choose Python!)

If you’re looking for a challenging machine learning course, this is probably one you must take.

September 24, 2016

Author: Anirudh

The Reflective Research Agent

Reflection Design Pattern

Tool Use Design Pattern

Evaluating Agentic Workflows

Multi-Agent Collaboration

The model validation sign-off

Share this:

The confession

What’s actually on DeepLearning.AI

Why I built a five-tier path

Draft one: the list that looked perfect

The validation: interrogating my own reading list

Draft two: the swaps

Where I am so far

A disclaimer, offered without apology

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: