A production agent system, built on Microsoft Graph and the Claude Agent SDK, running continuously for ten weeks against a real operating problem.
This is a long-form technical case study of an agent architecture I built and run. It is a production system, not a demo. The numbers are real. The code is shipping. The problem domain is my own job search, which gave me a ninety-day forcing function and the fastest possible feedback loop. The patterns behind it are the ones any organization moving infrastructure teams into agent territory is going to keep running into: grounding, skill isolation, deterministic state, writable memory, and operator integration.
Production agent architecture | Continuous operation since Feb 6, 2026 | Claude Agent SDK · Microsoft Graph · Python
Section 1 | Positioning
I built the agent architecture enterprises are about to need.
My twenty years are in Microsoft infrastructure. Tenants, identity, endpoints, Exchange, the plumbing. In February my position at National Business Technologies was eliminated in a reduction in force, and I had ninety days to replace a principal-level role. I treated it as a forcing function. Rather than running a job search on spreadsheets and manual follow-ups, I rebuilt the whole thing as an agent system and became my own Customer Zero.
Today it runs my pipeline end to end. 145 opportunities tracked, six active interview threads, continuous operation since February 6. Scheduled jobs run unattended on cron, regenerating the dashboard, classifying new email, and scraping job boards without a session open. Inside the session, it scans my inbox via Graph, drafts recruiter replies in my voice, tailors resumes, generates interview prep documents grounded in confirmed experience, and refuses to fabricate anything the source material does not support. It is the same pattern enterprises now need for tenant assessment, license optimization, oversharing review, and agent governance.
The reason this is worth a long read is specific. Most Microsoft infrastructure engineers have not yet shipped a production agent. I have. This document is how it is built, why the choices are the ones that transfer to a customer engagement, and where the pattern is directly portable to the work a Microsoft cloud practice is already being asked to do.
Section 2 | By the numbers
The system is live, not a demo.
These numbers come from the running system. Every row is an opportunity I acted on. Every folder holds drafted correspondence, screening notes, and prep artifacts the agent produced or helped produce.
145 | Opportunities managed | Feb 6 to Apr 16, one operator
6 | Active interview threads | Running concurrently
129 | Opportunity folders | JD, resume, prep, correspondence
71 | Days of continuous run | Autonomous morning digest and triage
16 | Specialist skills | Routed by task signature
15 | Python services | Graph API, tracker, digest, health
10 | Scheduled and startup jobs | 6 Cowork cron, 4 Windows Task Scheduler
17 | Context files | Confirmed experience, rules, lessons
Every metric above is queryable from the running system. The tracker is an .xlsx with a regenerated HTML view. The skills live in a plain directory the agent reads at session start. The run counts come from the action log.
Section 3 | Capabilities
What it actually does, in operator terms.
Inbox triage with accountability
Paginates the Outlook inbox via Graph. Finds recruiter replies, application confirmations, interview invites, and DOL mail. Never relies on keyword search because staffing firms use unpredictable senders. Writes every new recruiter contact to a tracker row in the same pass.
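A minimal sketch of the pagination step, not the system's actual code. The names `iter_inbox_messages` and `fetch_page` are hypothetical, and authentication is left to whatever callable is injected. The only Graph behavior it relies on is that list responses carry results under `value` and, when more pages exist, a continuation URL under `@odata.nextLink`.

```python
from typing import Callable, Dict, Iterator, Optional

# Graph endpoint for the signed-in user's inbox messages.
INBOX_URL = "https://graph.microsoft.com/v1.0/me/mailFolders/inbox/messages"


def iter_inbox_messages(
    fetch_page: Callable[[str], Dict],
    first_url: str = INBOX_URL,
    max_pages: Optional[int] = None,
) -> Iterator[Dict]:
    """Yield every message by following @odata.nextLink until it is absent.

    fetch_page is a hypothetical callable that performs an authenticated GET
    and returns the decoded JSON body.
    """
    url: Optional[str] = first_url
    pages = 0
    while url:
        body = fetch_page(url)
        yield from body.get("value", [])
        pages += 1
        if max_pages is not None and pages >= max_pages:
            break
        # No nextLink means this was the last page, which ends the loop.
        url = body.get("@odata.nextLink")
```

Injecting `fetch_page` keeps the loop testable without a tenant: a dictionary of canned pages can stand in for the network.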
Draft generation in operator voice
Produces Outlook drafts directly via Graph instead of asking the operator to copy and paste. Embeds the full signature block inline because the Outlook auto-signature is disabled. Handles decline, interest, redirect, and thank-you flows with separate templates.
Grounded interview prep
Builds company and role prep docs keyed to the job description and the operator's confirmed experience. Refuses to claim tools the operator has not used. Spells out every acronym on first use. Includes a meeting-join button pulled from the calendar invite.
Posting screener
Reads full job descriptions before queueing anything. Checks that the core platform is Microsoft. Verifies the posting link is live. Rejects with tracker evidence so the record stays auditable for the DOL.
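The screening checks above can be sketched as a function that records the outcome of every check, so a rejection carries its own evidence. This is an illustration under assumptions: the `Posting` fields, the `ScreenResult` shape, and the evidence strings are all hypothetical, and the real screener's judgment happens in the model rather than in three boolean tests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Posting:
    # All field names here are illustrative assumptions.
    title: str
    core_platform: str
    link_is_live: bool
    full_jd_read: bool


@dataclass
class ScreenResult:
    accepted: bool
    evidence: List[str] = field(default_factory=list)


def screen_posting(posting: Posting) -> ScreenResult:
    """Run every check and record the outcome of each as evidence."""
    checks = [
        (posting.full_jd_read, "full job description read"),
        (posting.core_platform.lower() == "microsoft", "core platform is Microsoft"),
        (posting.link_is_live, "posting link is live"),
    ]
    evidence = [f"{'PASS' if ok else 'FAIL'}: {label}" for ok, label in checks]
    return ScreenResult(accepted=all(ok for ok, _ in checks), evidence=evidence)
```

Running all checks instead of stopping at the first failure is deliberate: the evidence list written to the tracker then shows the full picture for each rejected posting.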
Resume tailoring with provenance
Variants per role, built from a branded docx template. Strips AI and tool metadata from docProps XML and runs pikepdf on the PDF before delivery. Every change is traceable to the source bullet in the master resume.
Scheduled execution
Six cron tasks run unattended via the Cowork scheduled-tasks MCP: morning digest (06:00), dashboard and briefing (06:00 weekdays), email classification (09:00 weekdays), LinkedIn sweep (07:00 weekdays), Dice and Indeed sweep (07:15 Mon/Thu), session-usage monitor (22:00). Four more run via Windows Task Scheduler: the action and chatbot API servers start at login, rclone backup fires nightly at 23:00, and a toast notifier hangs on a manual trigger. Sessions open onto work already done.
Morning digest and dashboard
Python service regenerates a digest page, a command-center page, and an actions page. Runs on a Cowork cron schedule at 06:00 ET. The operator opens one URL in the morning and sees the whole pipeline.
Section 4 | Architecture
How the pieces fit.
Natural-language in, deterministic state out.
The architecture has four tiers. A natural-language entry point, a skill router, a tool layer that mixes Model Context Protocol connectors with Python services, and a state layer on disk. Probabilistic work stays in the model. Deterministic work stays in Python. The two halves never pretend to be each other.
Tier 1 | Interface
Natural language, voice or text
Operator gives a request in plain English. No DSL, no form. The session-start skill fires on every new session and gates all other work until the inbox is scanned and the tracker is current.
▼
Tier 2 | Skill Router
16 specialist skills, markdown-defined
Each skill is a markdown file with a description field the model matches against the request. The router picks one or more skills based on task signature. Skills are isolated: the resume-work skill cannot rewrite the email-and-drafts skill. This is the same pattern Claude Agent SDK uses for production agents.
▼
Tier 3a | MCP connectors
ms365 | dice | indeed | microsoft-learn
Graph API access for mail, calendar, SharePoint, OneDrive. Job board search. Microsoft documentation lookup. Added April 2 after the outlook-assistant MCP was deprecated.
Tier 3b | Python services
Deterministic work
action_server.py, chatbot_server.py, generate_morning_digest.py, generate_tracker_html.py, graph_api_helper.py. 15 services total. Anything that needs to be the same every time runs here, not in the model.
→
Tier 4 | State
Disk, not a database
Job_Tracker.xlsx is the spine. action_log.json, sent_log.json, action_queue.json track the agent's own work. MEMORY.md and a memory/ directory hold persistent facts across sessions.
Ground truth
CLAUDE.md and context/
The agent reads CLAUDE.md at session start. Identity rules, compensation floors, writing rules, thoroughness rules. 17 context files hold confirmed experience. Anything not in here is not a claim the agent will make.
Deterministic | Python
Probabilistic | Model + skill
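To make Tier 2 concrete, here is a hedged sketch of how a directory of markdown skills could be turned into a name-to-description catalog for the model to match against. It assumes each skill file opens with a `---` delimited front-matter block containing a `description:` line. That layout, and the function names, are assumptions for illustration. The matching itself is done by the model and is not reproduced here.

```python
from pathlib import Path
from typing import Dict


def read_description(skill_text: str) -> str:
    """Return the description from a leading '---' front-matter block, or ''."""
    lines = skill_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter; the body is never scanned
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip()
    return ""


def build_catalog(skills_dir: Path) -> Dict[str, str]:
    """Map each skill file's stem to its description, in sorted filename order."""
    return {
        path.stem: read_description(path.read_text(encoding="utf-8"))
        for path in sorted(skills_dir.glob("*.md"))
    }
```

The catalog is read-only by construction: nothing in this path writes to a skill file, which is one simple way to keep skills from rewriting each other.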
Section 4.5 | The agent map
All sixteen skills, drawn three ways.
Structure, sequence, and system stack.
The four-tier block above is the bones. The three visuals below are the muscle. Constellation shows what each skill does and which ones share a tool. Swim lane shows how they execute in order during a normal morning run. Layered stack shows every moving part named in one place.
Map 1. Skill Constellation
Structure
Main Agent in the center. Sixteen specialist skills grouped into four functional domains. The tool line under each cluster name is the primary MCP or Python library the skills in that domain invoke.
Map 2. Daily Workflow
Sequence
A normal weekday morning, left to right. Each step is one skill acting on state produced by the step before it. No step is skipped. The session gate on the far left forbids any downstream work until the inbox scan and tracker refresh are complete.
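The gate can be pictured as a small piece of deterministic state that every downstream step must consult. The sketch below is an assumption about shape, not the shipped implementation: `SessionGate`, `GateNotSatisfied`, and the method names are hypothetical.

```python
class GateNotSatisfied(RuntimeError):
    """Raised when downstream work is requested before the gate is cleared."""


class SessionGate:
    """Tracks the two preconditions and blocks work until both hold."""

    def __init__(self) -> None:
        self.inbox_scanned = False
        self.tracker_current = False

    def mark_inbox_scanned(self) -> None:
        self.inbox_scanned = True

    def mark_tracker_current(self) -> None:
        self.tracker_current = True

    def require_open(self, task: str) -> None:
        """Raise with the list of pending preconditions, or return silently."""
        missing = []
        if not self.inbox_scanned:
            missing.append("inbox scan")
        if not self.tracker_current:
            missing.append("tracker refresh")
        if missing:
            raise GateNotSatisfied(f"{task} blocked; pending: {', '.join(missing)}")
```

Raising an exception, instead of returning a flag, means a step that forgets to check the result still cannot proceed.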
01
Session Gate
session-start
Reads CLAUDE.md. Loads rules, identity facts, comp floors. Refuses to do anything else until the inbox has been scanned.
State: CLAUDE.md | context/
02
Inbox Triage
email-and-drafts
Paginates Outlook via Graph. Finds recruiter replies, ATS confirmations, interview invites, DOL mail. Stages drafts for each new recruiter contact.
Tool: ms365 MCP
03
Draft Generation
email-and-drafts | post-screen-thankyou
Writes Outlook drafts directly. Embeds signature inline. Selects template by flow: decline, redirect, interest, thank-you.
Tool: ms365 MCP | create-draft-email
04
Prep Sync
interview-prep
For any confirmed interview on the calendar, rebuilds the prep doc. Pulls Teams join link. Adds it as a button on the doc and on the dashboard.
Tool: ms365 calendar | python-docx
05
Tracker Regen
tracker-and-dashboard
Any change to Job_Tracker.xlsx triggers generate_tracker_html.py. The HTML view is never edited by hand.
Tool: openpyxl | Python
06
Morning Digest
action-queue | scheduled task
Renders a digest page with open actions, interview schedule, proactive leads, and pending DOL items. One URL for the whole pipeline.
Tool: Python service | Cowork cron
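Step 05 is the clearest example of deterministic work, so a sketch may help. It assumes change detection by content hash and takes already-loaded rows as input. Reading the workbook is omitted, and every function name here is hypothetical and not taken from generate_tracker_html.py.

```python
import hashlib
import html
from typing import Dict, List, Optional, Sequence, Tuple


def file_digest(data: bytes) -> str:
    """Content hash used to decide whether the tracker changed."""
    return hashlib.sha256(data).hexdigest()


def render_rows(columns: Sequence[str], rows: List[Dict[str, str]]) -> str:
    """Render rows to an HTML table; the same input always gives the same output."""
    head = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(r.get(c, ''))}</td>" for c in columns) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def regenerate_if_changed(
    tracker_bytes: bytes,
    last_digest: Optional[str],
    columns: Sequence[str],
    rows: List[Dict[str, str]],
) -> Tuple[str, Optional[str]]:
    """Return (digest, html), with html None when the tracker is unchanged."""
    digest = file_digest(tracker_bytes)
    if digest == last_digest:
        return digest, None
    return digest, render_rows(columns, rows)
```

A byte-level hash is a conservative trigger: it may fire on a re-save that changes no cell values, which costs one redundant rebuild but never misses a real change.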
Map 3. Layered System Stack
System
The whole thing as a stack. Operator on top, disk state on the bottom, request flow downward, state flow upward. Every named component in the running system appears somewhere on this map. Orange outlines are MCP connectors. Gold fills are skill boundaries. Italic caption text is the underlying tool.
Layer 1 | Operator
Daniel
Natural-language requests, voice or text. One operator, no queue.
Layer 2 | Orchestrator
Main Agent | Skill Router
Matches request to skill descriptions. Enforces the session-start gate. Loads CLAUDE.md and context/ on every new session.
The maps above describe the shape of the system. The Command Center is where the operator lives. One URL opened at the start of the day, three stacked views, every piece of state the pipeline produced since the last run. The views below are rendered against sample data because the live pipeline holds confidential recruiter correspondence and active interview threads. The layout, the field structure, and the rendering pattern are exactly what ships. Company names are placeholders from the Microsoft documentation tradition: Fabrikam, Contoso, Northwind, Adventure Works, and friends.
Sample data only. Rows, recruiters, dates, and compensation figures are fabricated. The real dashboard runs locally on the operator's own machine.
Pipeline Snapshot
regenerated from Job_Tracker.xlsx on every change
Company | Role | Stage | Next Action | Contact | Updated
Fabrikam Industries | Principal M365 Architect | Round 2 Scheduled | Prep doc due Fri | Alex Morgan, talent recruiter | Apr 15
Contoso Ltd | Director of IT Infrastructure | Awaiting Feedback | Follow up Apr 24 | Jordan Chen, direct | Apr 12
Northwind Traders | Senior Cloud Engineer | Applied | ATS confirmation received | direct apply | Apr 14
Adventure Works | Cloud Governance Lead | Screen Scheduled | Screen Apr 21 10:00 AM ET | Sam Rivera, agency | Apr 16
Wingtip Toys | Azure Platform Architect | Prospect | Research company, confirm comp floor | proactive lead | Apr 11
Tailwind Traders | Principal Cloud Architect | Declined | Rate below floor, logged | Casey Patel, staffing | Apr 10
Proseware | Senior M365 Engineer | Applied | ATS silent past SLA | direct apply | Apr 08
Lucerne Publishing | IT Director | Phone Screen Done | Waiting decision | Morgan Bell, search firm | Apr 09
Action Queue
drafts awaiting operator review | stale threshold 48h
Draft reply staged | 2h ago
Recruiter follow-up at Fabrikam Industries. Full signature block embedded inline. Needs operator review and send.
amorgan@fabrikam.example | Staged in Outlook
Interview confirm | 6h ago
Adventure Works screen confirmed for Apr 21 10:00 AM ET. Calendar invite accepted. Prep doc scheduled for build Fri.
srivera@recruitco.example | Prep doc queued
Stale draft audit | 18h ago
Two drafts to Proseware recruiter unsent past 48h threshold. Operator decision required: send, rewrite, or close out.
draft_queue.json | Operator action
Proactive lead | Today 06:15
New posting detected on Dice. Title: Principal Cloud Architect at Fabrikam Industries. Core platform Microsoft: yes. Posting link live: yes. Compensation inside floor. Queued for operator review.
dice MCP | screened by posting-screener skill | Queued
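The stale-draft audit in the queue above reduces to a time comparison against the 48-hour threshold. A minimal sketch, assuming each draft is a dictionary with hypothetical `sent` and `staged_at` keys:

```python
from datetime import datetime, timedelta
from typing import Dict, List

STALE_AFTER = timedelta(hours=48)  # threshold shown on the sample dashboard


def find_stale_drafts(drafts: List[Dict], now: datetime) -> List[Dict]:
    """Return unsent drafts staged more than STALE_AFTER before `now`."""
    return [
        d for d in drafts
        if not d["sent"] and now - d["staged_at"] > STALE_AFTER
    ]
```

Passing `now` in as an argument keeps the check reproducible: the same inputs flag the same drafts regardless of when the audit runs.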
Morning Digest | Apr 17 2026
regenerated each morning by a Cowork cron task at 06:00 ET
Adventure Works | Screen | 10:00 AM ET | Teams join button active on prep doc
Interviews this week
2 confirmed | 1 pending recruiter confirmation
Stale drafts
2 recruiter drafts past 48h threshold | flagged for operator action
Proactive leads
5 new postings matched against criteria | 3 queued | 2 auto-rejected below comp floor
DOL / unemployment
RESEA complete | UI active | next certification Sunday | job-search log current
Run health
session-start last run 06:00 ET | action_server uptime 71 days | no errors in last scan
This view is where the architecture pays off. The operator does not open Job_Tracker.xlsx, action_log.json, or the skill directory during a normal morning run. One URL holds the whole pipeline. Replace the operator with a customer tenant administrator and the rendering pattern still holds. Different data, same state spine, same deterministic rebuild every time the pipeline changes.
Section 5 | Decisions that made it real
The choices I would bring to a customer conversation.
01
Split deterministic and probabilistic work cleanly.
Anything that must be the same every time runs in Python. Anything that has to sound like a person runs in the model. The tracker calculation, the HTML regeneration, the log write, the file copy: Python. The email draft, the screening judgment, the prep-doc narrative: model. The two halves never pretend to be each other.
Why it matters: Customers will try to let an agent move money, delete tenants, or compute payroll. The right answer is to bracket the probabilistic part with deterministic guardrails, not to trust the model to behave.
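One way to bracket the probabilistic part is an allow-list: the model may propose any action it likes, but only handlers registered in Python can run. The sketch below illustrates that idea under assumptions. The registry, the `log_contact` handler, and the proposal shape (`action` plus `args`) are hypothetical and do not describe action_server.py.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: only actions listed here can ever execute.
ALLOWED_ACTIONS: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that adds a handler to the allow-list under `name`."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        ALLOWED_ACTIONS[name] = fn
        return fn
    return wrap


@register("log_contact")
def log_contact(company: str, contact: str) -> str:
    return f"logged {contact} at {company}"


def execute_proposal(proposal: Dict[str, Any]) -> Any:
    """Run a model-proposed action only if it names a registered handler."""
    name = proposal.get("action")
    handler = ALLOWED_ACTIONS.get(name)
    if handler is None:
        raise PermissionError(f"action not allowed: {name!r}")
    args = proposal.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("args must be an object")
    return handler(**args)
```

A proposal such as `{"action": "delete_tenant"}` fails closed with `PermissionError` because no such handler was ever registered, however persuasive the model's reasoning.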
02
Ground every claim in source files.
The agent is not allowed to claim experience I cannot back up. CLAUDE.md has explicit identity rules. Context files hold confirmed work history. If a prep doc would need a capability I do not have, the agent either omits it or flags the gap. No fabrication, no inflation, no plausible hallucination.
Why it matters: Customer-facing agents that fabricate will destroy trust on the first contact. Grounding is the only pattern that scales.
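The omit-or-flag rule can be illustrated with a deterministic check that splits a document's claims into those the context files support and those they do not. This is a deliberately simple stand-in: exact, case-insensitive matching against a set of confirmed items is an assumption, and `split_claims` is a hypothetical name.

```python
from typing import Iterable, List, Set, Tuple


def split_claims(
    claimed: Iterable[str], confirmed: Set[str]
) -> Tuple[List[str], List[str]]:
    """Partition claims into (supported, gaps) using case-insensitive matching."""
    known = {c.casefold() for c in confirmed}
    supported: List[str] = []
    gaps: List[str] = []
    for claim in claimed:
        (supported if claim.casefold() in known else gaps).append(claim)
    return supported, gaps
```

Anything that lands in `gaps` is either dropped from the draft or surfaced to the operator, which matches the two outcomes described above.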
03
Make the agent refuse shortcuts.
If a more rigorous method is available to verify something, the agent uses it, even if it costs more tool calls. Verify by querying Sent Items, not by asking the operator. Paginate the full inbox rather than running a keyword search, because staffing firms use unpredictable senders. Fix data quality issues in the same pass that finds them rather than deferring them to a list for later.
Why it matters: Shortcut behavior is how agents drift. Building the opposite habit into the system prompt and into the skill descriptions is what keeps behavior stable over weeks of runs.
04
Close the loop with writable memory.
The agent writes lessons back to skill files, context files, and auto-memory after each task. CLAUDE.md stays lean; archived lessons live in context/lessons-archive.md. The next session gets a sharper tool than the last one. This is the feedback loop most agent pilots are missing.
Why it matters: Without a writable memory tier, an agent starts every engagement cold. With one, it compounds.
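A writable memory tier can be as plain as an append to a markdown file, provided repeated lessons do not pile up. The sketch below assumes lessons are stored one per line as `- ` bullets. That format and the `record_lesson` name are illustrative assumptions, not the system's actual memory layout.

```python
from pathlib import Path


def record_lesson(memory_file: Path, lesson: str) -> bool:
    """Append a lesson as a bullet unless it is already recorded.

    Returns True when the file was changed.
    """
    entry = f"- {lesson.strip()}"
    text = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    if entry in text.splitlines():
        return False  # already known; keep the file lean
    # Guard against a file whose last line lacks a trailing newline.
    prefix = "" if not text or text.endswith("\n") else "\n"
    with memory_file.open("a", encoding="utf-8") as fh:
        fh.write(prefix + entry + "\n")
    return True
```

Making the write idempotent is the design choice that matters: a lesson learned twice should sharpen the file once, not grow it on every run.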
05
Treat the operator as the integration point.
The agent pushes state into tools the operator already uses: Outlook drafts, Excel tracker, HTML dashboard. The operator never has to learn a new interface. The agent adapts to the environment rather than asking the environment to adapt.
Why it matters: This is the move that unlocks adoption. Customers will not migrate to a new console. They will accept an agent that makes their existing console better.
Section 6 | Translation
Why this pattern matches the work enterprises are being asked to do.
Same architecture, bigger surface area.
What I built for my own pipeline
What a Microsoft cloud customer now needs
Inbox triage that writes back to a tracker. Agent paginates Graph, classifies recruiter correspondence, stages Outlook drafts, updates the .xlsx spine.
Tenant assessment, oversharing review, license optimization, and migration. All four share the same architectural shape: paginate a Graph source, classify each record, stage an action, write back with a full audit log. My pipeline runs that pattern on mail and a tracker. A customer engagement runs the same pattern on SharePoint permissions, Exchange rules, license entitlements, or a full tenant transfer. The pattern is the portable piece. Scale and the zero-loss bar on migration are what a customer tenant adds.
Grounded interview prep. Refuses to claim tools the operator has not used. Pulls facts from context files. Every acronym spelled out on first use.
Agent governance for customer-facing agents. Grounding is the foundational discipline, and it is the one I have built into my own system. Enterprise agent governance adds what I have not yet shipped at customer scale: multi-tenant data isolation, Purview DLP and sensitivity labels, data residency controls, and audit export for regulated industries. The grounding muscle transfers. The rest is M365 plumbing that already maps to components I know how to wire.
16 skills, one router. Task signatures pick the right specialist. Skills stay isolated. The router cannot rewrite a skill's behavior at runtime.
Copilot Studio and Agent 365 patterns. Topics, actions, and tools in Copilot Studio are the direct analogue of my skills, Python services, and MCP connectors. The primitives are the same; the packaging and licensing differ. Translation between Claude Agent SDK and Copilot Studio is pattern recognition, not net new learning. The architectural vocabulary carries over cleanly.
Session gate, memory tier, deterministic state. Agent cannot do downstream work until upstream state is verified. Memory writes after every task. State lives on disk, not in the model.
Customer Zero discipline for agent rollouts. I ran the pilot on myself, instrumented everything, wrote the lessons back to the guardrails, and widened scope only after the loop closed. A one-operator pipeline is smaller than a customer fleet by several orders of magnitude. The discipline is what transfers. The blast radius is what gets sized up inside a customer engagement.
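The shared shape named in the first row of the table (paginate a source, classify each record, stage an action, write back with an audit log) can be sketched independently of what the records are. The function names and the audit entry fields below are assumptions for illustration. The classifier and the staging step are injected, because in the system described here the first is model work and the second is a Graph call.

```python
import json
from typing import Callable, Dict, Iterable, List, Optional


def run_pass(
    records: Iterable[Dict],
    classify: Callable[[Dict], Optional[str]],
    stage: Callable[[Dict, str], str],
) -> List[Dict]:
    """Classify each record, stage an action where one is needed, and
    return an audit entry for every record, acted on or not."""
    audit: List[Dict] = []
    for record in records:
        label = classify(record)
        staged = stage(record, label) if label is not None else None
        audit.append({"id": record["id"], "label": label, "staged": staged})
    return audit


def to_json_lines(audit: List[Dict]) -> str:
    """Serialize audit entries one per line with stable key order."""
    return "\n".join(json.dumps(entry, sort_keys=True) for entry in audit)
```

Swapping mail messages for SharePoint permission entries or license assignments changes the `records`, `classify`, and `stage` arguments and nothing else. Records that need no action still produce an audit entry, so the log shows what was examined as well as what was changed.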
The architecture above is pattern transfer from a pipeline I built and run. I have not yet applied the same pattern at customer scale on a Copilot Studio or Agent 365 engagement. That is the honest distinction, and it is why Customer Zero is the right framing. My twenty years are in the Microsoft infrastructure these agents plug into: tenants, identity, Exchange, SharePoint, endpoints, the M365 control plane. I know how those components wire together. What the agents need is the right tenant to live in. Given the platform depth and the architectural fluency, the ramp into a customer conversation is short.
The short version.
Most Microsoft infrastructure engineers have not yet shipped a production agent. I have. The system has been running for ten weeks. The architecture is in Section 4. The operator view is in Section 4.6. The decisions that made it real are in Section 5. The translation from my pipeline to customer work is in Section 6.
If you are building a Microsoft cloud practice, running a Copilot or Agent 365 customer-zero program, or staffing a Principal Architect role where this pattern is part of the job, I would be glad to walk through any of it live.
Daniel Lepel
Principal Microsoft Cloud Architect
daniel@lepel.us
212-252-9200
Albany, NY Capital District
Agent Architecture Case Study | daniellepel.com | Published April 2026